

Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1912–1923, October 25–29, 2014, Doha, Qatar. ©2014 Association for Computational Linguistics

Joint Inference for Knowledge Base Population

Liwei Chen¹, Yansong Feng¹*, Jinghui Mo¹, Songfang Huang², and Dongyan Zhao¹

¹ICST, Peking University, Beijing, China
²IBM China Research Lab, Beijing, China

{chenliwei,fengyansong,mojinghui,zhaodongyan}@pku.edu.cn
[email protected]

Abstract

Populating a Knowledge Base (KB) with new knowledge facts from reliable text resources usually consists of linking name mentions to KB entities and identifying relationships between entity pairs. However, the task often suffers from errors propagating from upstream entity linkers to downstream relation extractors. In this paper, we propose a novel joint inference framework to allow interactions between the two subtasks and find an optimal assignment by addressing the coherence among preliminary local predictions: whether the types of entities meet the expectations of relations explicitly or implicitly, and whether the local predictions are globally compatible. We further measure the confidence of the extracted triples by looking at the details of the complete extraction process. Experiments show that the proposed framework can significantly reduce the error propagation and thus obtain more reliable facts, and outperforms competitive baselines with state-of-the-art relation extraction models.

1 Introduction

Recent advances in natural language processing have made it possible to construct structured KBs from online encyclopedia resources, at an unprecedented scale and much more efficiently than traditional manual editing. However, in those KBs, entities which are popular in the community usually have more knowledge facts, e.g., the basketball player LeBron James, the actor Nicholas Cage, etc., while most other entities often have fewer facts. On the other hand, knowledge facts need to be updated as entities develop, such as changes in a cabinet, a marriage event, or an acquisition between two companies.

In order to address the above issues, we can populate existing KBs from reliable text resources, e.g., newswire, which usually involves enriching KBs with new entities and populating KBs with new knowledge facts, in the form of <Entity, Relation, Entity> triples. In this paper, we focus on the latter: identifying relationships between two existing KB entities. This task can be intuitively handled in a pipeline paradigm, that is, name mentions in the texts are first linked to entities in the KB (entity linking, EL), and then the relationships between them are identified (relation extraction, RE). It is worth mentioning that the first task, EL, is different from named entity recognition (NER) in traditional information extraction (IE) tasks: NER recognizes and classifies entity mentions in the texts (into several predefined types), whereas EL links the mentions to their corresponding entities in the KB. Such pipeline systems often suffer from errors propagating from upstream to downstream, since only the local best results are passed to the next step. One idea to solve the problem is to allow interactions among the local predictions of both subtasks and jointly select an optimal assignment that eliminates possible errors in the pipeline.

Let us first look at an example. Suppose we are extracting knowledge facts from the two sentences in Figure 1: in sentence [1], if we are more confident to extract the relation fb:org.headquarters¹, we will then be prompted to select Bryant University, which indeed favors the RE prediction that requires an organization as its subject. On the other side, if we are sure to link to Kobe Bryant in sentence [2], we will probably select fb:pro_athlete.teams, whose subject position expects an athlete, e.g., an NBA player. It is not difficult to see that the argument type expectations of relations can encourage the two subtasks to interact with each other and select coherent predictions for both of them.

¹The prefix fb indicates that the relations are defined in Freebase.



Sentence 1: … [Bryant] is a private university located in [Smithfield]. …
Sentence 2: … Shaq and [Bryant] led the [Lakers] to three consecutive championships …

[Figure 1: Two example sentences from which we can harvest knowledge facts. For each mention, the original figure shows candidate entities (Bryant University; Bryant, Illinois; Kobe Bryant; Smithfield, Rhode Island; Smithfield, Illinois; Los Angeles Lakers; Laguna Lakers) and candidate relations (fb:people.person.place_of_birth, fb:sports.pro_athlete.teams, fb:org.org.headquarters, fb:business.board_member.leader_of).]

In KBs with well-defined schemas, such as Freebase, type requirements can be collected and utilized explicitly (Yao et al., 2010). However, in other KBs with less reliable or even no schemas, it is more appropriate to implicitly capture the type expectations for a given relation (Riedel et al., 2013).

Furthermore, previous RE approaches usually process each triple individually, ignoring whether those local predictions are compatible with each other. For example, suppose the local predictions for the two sentences above are <Kobe Bryant, fb:org.headquarters, Smithfield, Rhode Island> and <Kobe Bryant, fb:pro_athlete.teams, Los Angeles Lakers>, respectively. These, in fact, disagree with each other with respect to the KB, since, in most cases, these two relations cannot share subjects. Now we can see that either the relation predictions or the EL results for "Bryant" are incorrect. Such disagreements provide us an effective way to remove the possibly incorrect predictions that cause the incompatibilities.

On the other hand, automatically extracted knowledge facts inevitably contain errors, especially triples collected from the open domain. Extractions with confidence scores will be more than useful for users to make proper decisions according to their requirements, such as trading recall for precision, or supporting approximate queries.

In this paper, we propose a joint framework to populate an existing KB with new knowledge facts extracted from reliable text resources. The joint framework is designed to address the error propagation issue in a pipeline system, where subtasks are optimized in isolation and locally. We find an optimal configuration from the top k results of both subtasks, which maximizes the scores of each step, fulfills the argument type expectations of relations in the KB, which can be captured explicitly or implicitly, and avoids globally incoherent predictions. We formulate this optimization problem in an Integer Linear Program (ILP) framework, and further adopt a logistic regression model to measure the reliability of the whole process and assign confidences to all extracted triples to facilitate further applications. The experiments on a real-world case study show that our framework can eliminate error propagation in pipeline systems by taking relations' argument type expectations and global compatibilities into account, and thus outperforms pipeline approaches based on state-of-the-art relation extractors by a large margin. Furthermore, we investigate both explicit and implicit type clues for relations, and provide suggestions about which to choose according to the characteristics of existing KBs. Additionally, our proposed confidence estimations can help to achieve a precision of over 85% for a considerable amount of high quality extractions.

In the rest of the paper, we first review related work and then define the knowledge base population task that we address in this paper. Next we detail the proposed framework and present our experiments and results. Finally, we conclude the paper with future directions.

2 Related Work

Knowledge base population (KBP), the task of extending existing KBs with entities and relations, has been studied in the TAC-KBP evaluations (Ji et al., 2011), which contain three tasks. The entity linking task links entity mentions to existing KB nodes and creates new nodes for entities absent from the current KBs, which can be considered a kind of entity population (Dredze et al., 2010; Tamang et al., 2012; Cassidy et al., 2011). The slot-filling task populates new relations into the KB (Tamang et al., 2012; Roth et al., 2012; Liu and Zhao, 2012), but the relations are limited to a predefined set of attributes according to the types of entities. In contrast, our RE models only require minimal supervision and do not need well-annotated training data. Our framework is therefore easy to adapt to new scenarios and suits real-world applications. The cold-start task aims at constructing a KB from scratch in a slot-filling style (Sun et al., 2012; Monahan and Carpenter, 2012).

Entity linking is a crucial part of many KB related tasks. Many EL models explore local contexts of entity mentions to measure the similarity between mentions and candidate entities (Han et al., 2011; Han and Sun, 2011; Ratinov et al., 2011; Cheng and Roth, 2013). Some methods further exploit global coherence among candidate entities in the same document by assuming that these entities should be closely related (Han et al., 2011; Ratinov et al., 2011; Sen, 2012; Cheng and Roth, 2013). There are also approaches that regard entity linking as a ranking task (Zhou et al., 2010; Chen and Ji, 2011). Lin et al. (2012) propose an approach to detect and type entities that are currently not in the KB.

Note that the EL task in KBP is different from the named entity mention extraction task, mainly in the ACE task style, which identifies the boundaries and types of entity mentions and does not explicitly link entity mentions to a KB (ACE, 2004; Florian et al., 2006; Florian et al., 2010; Li and Ji, 2014), and is thus different from our work.

Meanwhile, relation extraction has also been studied extensively in recent years, ranging from supervised learning methods (ACE, 2004; Zhao and Grishman, 2005; Li and Ji, 2014) to unsupervised open extraction (Fader et al., 2011; Carlson et al., 2010). There are also models with distant supervision (DS), utilizing reliable text resources and existing KBs to predict relations for a large amount of text (Mintz et al., 2009; Riedel et al., 2010; Hoffmann et al., 2011; Surdeanu et al., 2012). These distantly supervised models can extract relations from open-domain texts and do not need much human involvement. Hence, DS is more suitable for our task than other traditional RE approaches.

Joint inference over multiple local models has been applied to many NLP tasks. Our task is different from the traditional joint IE works in the ACE framework (Singh et al., 2013; Li and Ji, 2014; Kate and Mooney, 2010), which jointly extract and/or classify named entity mentions into several predefined types and identify, at the sentence level, which relation a specific sentence describes (between a pair of entity mentions in that sentence). Li and Ji (2014) follow the ACE task definitions and present a neat incremental joint framework to simultaneously extract entity mentions and relations by structured perceptron. In contrast, we link entity mentions from a text corpus to their corresponding entities in an existing KB and identify the relations between pairs of entities based on that text corpus. Choi et al. (2006) jointly extract the expressions and sources of opinions as well as the linking relations (i.e., a source entity expresses an opinion expression) between them, while we focus on jointly modeling EL and RE in the open domain, which is a different and challenging task.

Since automatically extracted knowledge facts inevitably contain errors, many approaches manage to assign confidences to those extracted facts (Fader et al., 2011; Wick et al., 2013). Wick et al. (2013) also point out that confidence estimation should be a crucial part of automated KB construction and will play a key role in the wide application of automatically built KBs. We thus propose to model the reliability of the complete extraction process, taking into account, for each populated triple, the argument type expectations of the relation, the coherence with other predictions, and the triples in the existing KB.

3 Task Definition

We formalize our task as follows. We are given a set of entities sampled from an existing KB, $E = \{e_1, e_2, \ldots, e_{|E|}\}$; a set of canonicalized relations from the same KB, $R = \{r_1, r_2, \ldots, r_{|R|}\}$; a set of sentences extracted from a news corpus, $SN = \{sn_1, sn_2, \ldots, sn_{|SN|}\}$, each containing two mentions $m_1$ and $m_2$ whose candidate entities belong to $E$; and a set of text fragments $T = \{t_1, t_2, \ldots, t_{|T|}\}$, where $t_i$ contains its corresponding target sentence $sn_i$ and acts as its context. Our task is to link those mentions to entities in the given KB, identify the relationships between entity pairs, and populate new knowledge facts into the KB.

4 The Framework

We propose to perform joint inference over the subtasks involved. For each sentence with two entity mentions, we first employ a preliminary EL model and a preliminary RE model to obtain entity candidates and possible relation candidates between the two mentions, respectively. Our joint inference framework will then find an optimal assignment by taking the preliminary prediction scores, the argument type expectations of relations, and the global compatibilities among the predictions into account. In the task of KBP, an entity pair may appear in multiple sentences as different relation instances, and the crucial point is whether we can identify all the correct relations for an entity pair.



Sentence 1: … [Bryant] is a private university located in [Smithfield]. …
Sentence 2: … Shaq and [Bryant] led the [Lakers] to three consecutive championships from 2000 to 2002. …

[Figure 2: An example of our joint inference framework. The top and bottom are two example sentences with entity mentions in square brackets, candidate entities in white boxes (Bryant University; Bryant, Illinois; Kobe Bryant; Smithfield, Rhode Island; Smithfield, Illinois; Los Angeles Lakers; Laguna Lakers), candidate relations in grey boxes (fb:people.place_of_birth, fb:pro_athlete.teams, fb:org.headquarters, fb:business.leader_of), and solid lines with arrows between relations and entities representing their preference scores, with thickness indicating the preferences' value. A "Disagreement!" marker flags the incompatible candidates.]

Thus, after finding an optimal sentence-level assignment, we aggregate those local predictions by ORing them into the entity pair level. Finally, we employ a regression model to capture the reliability of the complete extraction process.
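As a concrete illustration of this aggregation step, the short Python sketch below ORs sentence-level predictions into entity-pair-level relation sets; the predictions and entity names are hypothetical:

```python
from collections import defaultdict

# Hypothetical sentence-level predictions after joint inference:
# (subject entity, relation, object entity), one per sentence.
sentence_preds = [
    ("Bryant_University", "fb:org.headquarters", "Smithfield,_Rhode_Island"),
    ("Kobe_Bryant", "fb:pro_athlete.teams", "Los_Angeles_Lakers"),
    ("Kobe_Bryant", "fb:pro_athlete.teams", "Los_Angeles_Lakers"),
    ("Kobe_Bryant", "NA", "Los_Angeles_Lakers"),
]

# OR-aggregation: an entity pair holds a relation if any supporting
# sentence was assigned that relation; NA (no relation) is dropped.
pair_level = defaultdict(set)
for subj, rel, obj in sentence_preds:
    if rel != "NA":
        pair_level[(subj, obj)].add(rel)

print(dict(pair_level))
```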

4.1 Preliminary Models

Entity Linking The preliminary EL model can be any approach that outputs a score for each entity candidate. Note that a recall-oriented model will be more than welcome, since we expect to introduce more potentially correct local predictions into the inference step. In this paper, we adopt an unsupervised approach (Han et al., 2011) to avoid preparing training data. Note that the challenging NIL problem, i.e., identifying which entity mentions have no corresponding entities in the KB (labeled as NIL) and clustering those mentions, is left for future work. For each mention we retain the entities with the top p scores for the succeeding inference step.

Relation Extraction The choice of RE model is also broad. Any sentence-level extractor whose results can easily be aggregated to the entity pair level can be utilized here (again, a recall-oriented version is welcome), such as Mintz++ mentioned in (Surdeanu et al., 2012), which we adapt into a Maximum Entropy version. We also include a special label, NA, to represent the case where there is no predefined relationship between an entity pair. For each sentence, we retain the relations with the top q scores for the inference step, and we say that the sentence supports those candidate relations. As features for the RE models, we use the same features (lexical and syntactic) as previous work (Chen et al., 2014; Mintz et al., 2009; Riedel et al., 2010; Hoffmann et al., 2011).

4.2 Relations' Expectations for Argument Types

In most KBs' schemas, canonicalized relations are designed to expect specific types of entities as their arguments. For example, in Figure 2, it is likely that the entity Kobe Bryant takes the subject position of the relation fb:pro_athlete.teams, but it is unlikely for this entity to take the subject position of the relation fb:org.headquarters. Making use of these type requirements can encourage the framework to select relation and entity candidates which are coherent with each other, and discard incoherent choices.

In order to obtain the preference scores between the entities in $E$ and the relations in $R$, we generate two matrices with $|E|$ rows and $|R|$ columns, whose element $sp_{ij}$ indicates the preference score of entity $i$ and relation $j$. The matrix $S_{subj}$ is for relations and their subjects, and the matrix $S_{obj}$ is for relations and their objects. We initialize the two matrices using the KB as follows: for entity $i$ and relation $j$, if relation $j$ takes entity $i$ as its subject/object in the KB, the element at position $(i, j)$ of the corresponding matrix will be 1, otherwise it will be 0. Note that in our experiments, we do not count the triples that are evaluated in the testing data when building the matrices. Now the problem is how to obtain the unknown elements in the matrices.

Explicit Type Information Intuitively, we should examine whether the explicit types of the entities fulfill the expectations of relations in the KB. For each unknown element $S_{subj}(i, j)$, we first obtain the type of entity $i$, which is collected from the lowest level of the KB's type hierarchy, and examine whether another entity with the same type takes the subject position of relation $j$ in the initial matrix. If such an entity exists, $S_{subj}(i, j) = 1$, otherwise 0. For example, for the subject Jay Fletcher Vincent and the relation fb:pro_athlete.teams, we first obtain the subject's type, basketball player, and then go through the initial matrix and find another entity, Kobe Bryant, with the same type taking the subject position of fb:pro_athlete.teams, indicating that Jay Fletcher Vincent may also take the relation fb:pro_athlete.teams. The matrix $S_{obj}$ is processed in the same way.
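To make the fill-in rule concrete, here is a small Python sketch on toy data; the entity list, type lookup, and initial matrix are hypothetical stand-ins for the Freebase-derived inputs:

```python
import numpy as np

# Hypothetical toy inputs; the real matrices are initialized from KB triples.
entities = ["Kobe_Bryant", "Jay_Fletcher_Vincent", "Bryant_University"]
relations = ["fb:pro_athlete.teams", "fb:org.headquarters"]
entity_type = {  # lowest-level type in the KB's type hierarchy
    "Kobe_Bryant": "basketball_player",
    "Jay_Fletcher_Vincent": "basketball_player",
    "Bryant_University": "university",
}

# S_subj[i, j] = 1 iff entity i is a known subject of relation j in the KB.
S_subj = np.zeros((len(entities), len(relations)))
S_subj[0, 0] = 1.0  # <Kobe_Bryant, fb:pro_athlete.teams, ...> is in the KB

# Types already seen in the subject position of each relation (initial matrix).
seen_types = [
    {entity_type[entities[i]] for i in range(len(entities)) if S_subj[i, j]}
    for j in range(len(relations))
]
# Fill-in rule: an unknown cell becomes 1 if another entity of the same type
# already takes the subject position of that relation.
for i, e in enumerate(entities):
    for j in range(len(relations)):
        if S_subj[i, j] == 0 and entity_type[e] in seen_types[j]:
            S_subj[i, j] = 1.0

print(S_subj)  # Jay_Fletcher_Vincent now also prefers fb:pro_athlete.teams
```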

Implicit Type Expectations In practice, few KBs have well-defined schemas. In order to make our framework more flexible, we need an approach that implicitly captures the relations' type expectations, which will also be represented as preference scores.

Inspired by Riedel et al. (2013), who use a matrix factorization approach to capture the associations between textual patterns, relations and entities based on large text corpora, we adopt a collaborative filtering (CF) method to compute the preference scores between entities and relations based on statistics obtained from an existing KB.

In CF, the preferences between customers and items are calculated via matrix factorization over the initial customer-item matrix. In our framework, we compute the preference scores between entities and relations via the same approach over the two initialized matrices $S_{subj}$ and $S_{obj}$, resulting in two entity-relation matrices with estimated preference values. We use ALS-WR (Zhou et al., 2008) to process the matrices and compute the preference of a relation taking an entity as its subject and object, respectively. We normalize the preference scores of each entity using their means $\mu$ and standard deviations $\sigma$.
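A compact NumPy sketch of this factorization step follows; the plain ALS below is a simplified stand-in for ALS-WR (Zhou et al., 2008), and the toy matrix, rank, and hyperparameters are hypothetical:

```python
import numpy as np

def als(S, rank=2, lam=0.1, iters=20, seed=0):
    """Plain alternating least squares on a dense 0/1 entity-relation
    matrix; a simplified stand-in for ALS-WR (Zhou et al., 2008)."""
    rng = np.random.default_rng(seed)
    n_e, n_r = S.shape
    E = rng.normal(scale=0.1, size=(n_e, rank))  # entity factors
    R = rng.normal(scale=0.1, size=(n_r, rank))  # relation factors
    reg = lam * np.eye(rank)
    for _ in range(iters):
        E = S @ R @ np.linalg.inv(R.T @ R + reg)    # fix R, solve for E
        R = S.T @ E @ np.linalg.inv(E.T @ E + reg)  # fix E, solve for R
    return E @ R.T  # estimated preference scores

# Toy initial subject matrix (rows: entities, columns: relations).
S_subj = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 0.0]])
P = als(S_subj)
# Normalize each entity's scores by its mean and standard deviation,
# as done in the paper.
P = (P - P.mean(axis=1, keepdims=True)) / (P.std(axis=1, keepdims=True) + 1e-9)
print(P.round(2))
```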

4.3 Compatibilities among Predicted Triples

The second aspect we investigate is whether the extracted triples are compatible with all other knowledge facts. For example, according to the KB, the two relations fb:org.headquarters and fb:pro_athlete.teams in Figure 2 cannot share the same entity as their subjects; if such sharing happens, either the relation predictions or the entity predictions are incorrect. The clues can be roughly grouped into three categories: whether two relations can share the same subject, whether two relations can share the same object, and whether one relation's subject can be the other relation's object.

Global compatibilities among local predictions have been investigated by several joint models (Li et al., 2011; Li and Ji, 2014; Chen et al., 2014) to eliminate the errors propagating in a pipeline system. Specifically, Chen et al. (2014) utilized clues concerning the compatibilities of relations in the task of relation extraction. Following (Li et al., 2011; Chen et al., 2014), we extend the idea of global compatibilities to the entity and relation predictions during knowledge base population. We examine the pointwise mutual information (PMI) between the argument sets of two relations to collect such clues. For example, if we want to learn whether two relations can share the same subject, we first collect the subject sets of both relations from the KB, and then compute the PMI value between them. If the value is lower than a certain threshold (set to -3 in this paper), the clue that the two relations cannot share the same subject is added. These clues can easily be integrated into an optimization framework in the form of constraints.
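The exact PMI estimator over argument sets is not spelled out in the paper; below is one plausible instantiation in Python, with hypothetical subject sets and the paper's threshold of -3:

```python
import math

def pmi(set_a, set_b, n_total):
    """PMI between two argument sets, treating each set as an event over
    the n_total KB entities (one plausible reading of the paper's clue
    mining; the exact estimator is an assumption)."""
    inter = len(set_a & set_b)
    if inter == 0:
        return float("-inf")  # the sets never overlap
    p_a, p_b, p_ab = len(set_a) / n_total, len(set_b) / n_total, inter / n_total
    return math.log(p_ab / (p_a * p_b))

# Hypothetical subject sets collected from the KB.
subj_teams = {"Kobe_Bryant", "LeBron_James", "Jay_Fletcher_Vincent"}
subj_hq = {"Bryant_University", "IBM", "Peking_University"}

THRESHOLD = -3  # the threshold used in the paper
clues = set()
if pmi(subj_teams, subj_hq, n_total=10000) < THRESHOLD:
    # Clue: these two relations cannot share the same subject.
    clues.add(("fb:pro_athlete.teams", "fb:org.headquarters"))
print(clues)
```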

4.4 Integer Linear Program Formulation

Now we describe how we aggregate the above components and formulate the joint inference problem in an ILP framework. For each candidate entity $e$ of mention $m$ in text fragment $t$, we define a boolean decision variable $d^{m,e}_t$, which denotes whether this entity is selected into the final configuration or not. Similarly, for each candidate relation $r$ of fragment $t$, we define a boolean decision variable $d^{r}_t$. In order to introduce the preference scores into the model, we also need a decision variable $d^{r,m,e}_t$, which denotes whether both relation $r$ and candidate entity $e$ of mention $m$ are selected in $t$.

We use $s^{t,m,e}_{el}$ to represent the score of mention $m$ in $t$ disambiguated to entity $e$, which is output by the EL model; $s^{t,r}_{re}$ to represent the score of relation $r$ assigned to $t$, which is output by the RE model; and $s^{r,e}_{p}$ the explicit/implicit preference score between relation $r$ and entity $e$.

Our goal is to find the best assignment to the variables $d^{r}_t$ and $d^{m,e}_t$, such that it maximizes the overall scores of the two subtasks and the coherence among the preliminary predictions, while also satisfying the constraints between the predicted triples. Our objective function can be written as:

$$\max \;\; el \times conf_{ent} + re \times conf_{rel} + sp \times coh_{e\text{-}r} \qquad (1)$$

where $el$, $re$ and $sp$ are three weighting parameters tuned on the development set. $conf_{ent}$ is the overall score of entity linking:

$$conf_{ent} = \sum_{t} \sum_{m \in M(t)} \sum_{e \in C_e(m)} s^{t,m,e}_{el} \, d^{m,e}_t \qquad (2)$$

where $M(t)$ is the set of mentions in $t$, and $C_e(m)$ is the candidate entity set of mention $m$. $conf_{rel}$ represents the overall score of relation extraction:

$$conf_{rel} = \sum_{t} \sum_{r \in C_r(t)} s^{t,r}_{re} \, d^{r}_t \qquad (3)$$

where $C_r(t)$ is the set of candidate relations in $t$. $coh_{e\text{-}r}$ is the coherence between the candidate relations and entities in the framework:

$$coh_{e\text{-}r} = \sum_{t} \sum_{r \in C_r(t)} \sum_{m \in M(t)} \sum_{e \in C_e(m)} s^{r,e}_{p} \, d^{r,m,e}_t \qquad (4)$$

Now we describe the constraints used in our ILPproblem. The first kind of constraints is intro-duced to ensure that each mention should be dis-ambiguated to only one entity:

∀t,∀m ∈M(t),∑

e∈Ce(m)

dm,et ≤ 1 (5)

The second type of constraints ensure that each en-tity mention pair in one sentence can only take onerelation label:

∀t,∑

r∈Cr(t)

dtr ≤ 1 (6)

The third is introduced to ensure the decision vari-able dr,m,e

t equals 1 if and only if both the corre-sponding variables dr

t and dm,et equal 1.

∀t,∀r ∈ Cr(t), ∀m ∈M(t),∀e ∈ Ce(m)dr,m,e

t ≤ drt (7)

dr,m,et ≤ dm,e

t (8)

drt + dm,e

t ≤ dr,m,et + 1 (9)

As for the compatibility constraints, we need another type of boolean decision variable. If a mention $m_1$ in $t_1$ and another mention $m_2$ in $t_2$ share an entity candidate $e$, we add a variable $y$ for this mention pair, which equals 1 if and only if both $d^{m_1,e}_{t_1}$ and $d^{m_2,e}_{t_2}$ equal 1. We therefore add the following constraints for each mention pair $m_1$ and $m_2$ satisfying this condition:

$$y \le d^{m_1,e}_{t_1} \qquad (10)$$
$$y \le d^{m_2,e}_{t_2} \qquad (11)$$
$$d^{m_1,e}_{t_1} + d^{m_2,e}_{t_2} \le y + 1 \qquad (12)$$

We then add the following constraints for each such mention pair to avoid incompatible predictions. For all $r_1 \in C_r(t_1)$, $r_2 \in C_r(t_2)$:

If $(r_1, r_2) \in C_{sr}$, $p(m_1) = subj$, $p(m_2) = subj$:
$$d^{r_1}_{t_1} + d^{r_2}_{t_2} + y \le 2 \qquad (13)$$

If $(r_1, r_2) \in C_{ro}$, $p(m_1) = obj$, $p(m_2) = obj$:
$$d^{r_1}_{t_1} + d^{r_2}_{t_2} + y \le 2 \qquad (14)$$

If $(r_1, r_2) \in C_{sro}$, $p(m_1) = obj$, $p(m_2) = subj$:
$$d^{r_1}_{t_1} + d^{r_2}_{t_2} + y \le 2 \qquad (15)$$

where $p(m)$ returns the position of mention $m$, either $subj$ (subject) or $obj$ (object). $C_{sr}$ is the set of relation pairs which cannot share the same subject, $C_{ro}$ the set of pairs which cannot share the same object, and $C_{sro}$ the set of pairs in which one relation's subject cannot be the other's object.

We use IBM ILOG Cplex² to solve the above ILP problem.

²http://www.cplex.com
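The paper solves this program with Cplex; as a self-contained illustration of objective (1) and constraints (5)-(9) for a single fragment, here is a sketch using the open-source PuLP modeler, with toy candidates, scores and weights (all hypothetical):

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum

# Toy inputs for one fragment t (all names and scores are hypothetical).
mentions = {"Bryant": ["Kobe_Bryant", "Bryant_University"],
            "Lakers": ["Los_Angeles_Lakers"]}
s_re = {"fb:pro_athlete.teams": 0.6, "fb:org.headquarters": 0.3}   # s^{t,r}_re
s_el = {("Bryant", "Kobe_Bryant"): 0.5,                            # s^{t,m,e}_el
        ("Bryant", "Bryant_University"): 0.4,
        ("Lakers", "Los_Angeles_Lakers"): 0.9}
s_p = {("fb:pro_athlete.teams", "Kobe_Bryant"): 1.0}               # s^{r,e}_p
el_w, re_w, sp_w = 4, 1, 1  # weights tuned on the development set

prob = LpProblem("joint_kbp", LpMaximize)
d_me = {k: LpVariable(f"dme_{i}", cat="Binary") for i, k in enumerate(s_el)}
d_r = {r: LpVariable(f"dr_{i}", cat="Binary") for i, r in enumerate(s_re)}
triples = [(r, m, e) for r in s_re for (m, e) in s_el]
d_rme = {p: LpVariable(f"drme_{i}", cat="Binary") for i, p in enumerate(triples)}

# Objective (1): weighted EL, RE and relation-entity coherence terms.
prob += (el_w * lpSum(s_el[k] * d_me[k] for k in s_el)
         + re_w * lpSum(s_re[r] * d_r[r] for r in s_re)
         + sp_w * lpSum(s_p.get((r, e), 0.0) * v
                        for (r, m, e), v in d_rme.items()))

for m, cands in mentions.items():                 # (5) one entity per mention
    prob += lpSum(d_me[(m, e)] for e in cands) <= 1
prob += lpSum(d_r.values()) <= 1                  # (6) one relation per fragment
for (r, m, e), v in d_rme.items():                # (7)-(9) consistency
    prob += v <= d_r[r]
    prob += v <= d_me[(m, e)]
    prob += d_r[r] + d_me[(m, e)] <= v + 1

prob.solve()
print([k for k, v in d_me.items() if v.value() == 1],
      [r for r, v in d_r.items() if v.value() == 1])
```

With these toy scores, the coherence term tips the solution toward Kobe Bryant and fb:pro_athlete.teams, mirroring the Figure 2 example; the cross-fragment compatibility constraints (10)-(15) would be added the same way over pairs of fragments.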



Table 1: The features used to calculate the confidence scores.

Type | Feature
Real | The RE score of the relation.
Real | The EL score of the subject.
Real | The EL score of the object.
Real | The preference score between the relation and the subject.
Real | The preference score between the relation and the object.
Real | The ratio of the highest and the second highest relation score in this entity pair.
Real | The ratio of the current relation score and the maximum relation score in this entity pair.
Real | The ratio of the number of sentences supporting the current relation and the total number of sentences in this entity pair.
Real | Whether the extracted triple is coherent with the KB according to the constraints in Section 4.3.

4.5 Confidence Estimation for Extracted Triples

Automatically extracted triples inevitably contain errors and typically exhibit high recall but low precision. Since our aim is to populate the extracted triples into an existing KB, which requires highly reliable knowledge facts, we need a measure of confidence for the extracted triples, so that others can properly utilize them.

Here, we use a logistic regression model to measure the reliability of the process: how the entities are disambiguated, how the relationships are identified, and whether those predictions are compatible. The features we use are listed in Table 1; they are all efficiently computable and independent of specific relations or entities. We manually annotate 1000 triples as correct or incorrect to prepare the training data.
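A minimal sketch of such a confidence model with scikit-learn follows; the feature rows and labels below are fabricated for illustration (the paper trains on 1000 manually annotated triples):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row of Table 1 features per extracted triple:
# [RE score, EL subj score, EL obj score, subj preference, obj preference,
#  top1/top2 relation-score ratio, score/max-score ratio,
#  supporting-sentence ratio, KB-coherence flag]
X = np.array([
    [0.9, 0.8, 0.7, 1.0, 1.0, 2.0, 1.0, 0.6, 1.0],
    [0.3, 0.4, 0.5, 0.0, 0.0, 1.1, 0.4, 0.1, 0.0],
    [0.7, 0.9, 0.6, 1.0, 0.0, 1.5, 0.8, 0.4, 1.0],
    [0.2, 0.3, 0.2, 0.0, 0.0, 1.0, 0.2, 0.1, 0.0],
])
y = np.array([1, 0, 1, 0])  # manual correct/incorrect annotations

model = LogisticRegression().fit(X, y)
confidence = model.predict_proba(X)[:, 1]  # P(triple is correct)
high_quality = confidence >= 0.85          # e.g., keep only reliable facts
print(confidence.round(2), high_quality)
```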

5 Experiments

We evaluate the proposed framework in a real-world scenario: given a set of news texts with entity mentions and a KB, a model should find more, and more accurate, new knowledge facts between pairs of those entities.

5.1 Dataset

We use the New York Times dataset from 2005 to 2007 as the text corpus, and Freebase as the KB. We divide the corpus into two equal parts, one for creating training data for the RE models using the distant supervision strategy (we do not need training data for EL), and the other as the testing data.

For the convenience of experimentation, we randomly sample a subset of entities for testing. We first collect all sentences containing two mentions which may refer to the sampled entities, and prune them as follows (see the sketch below): (1) there should be no more than 10 words between the two mentions; (2) the prior probability of a mention referring to the target entity must be higher than a threshold (set to 0.1 in this paper), which filters out impossible mappings; (3) the two mentions should not belong to different clauses. The resulting test set is split into 10 parts and a development set, each with roughly 3500 entity pairs, which leads to on average 200,000 variables and 900,000 constraints per split and may take 1 hour for Cplex to solve. Note that we do not count the triples that will be evaluated in the testing data when we learn the preferences and the clues from the KB.
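A rough Python sketch of these pruning heuristics; the prior table is hypothetical, and the clause test is a crude stand-in for the paper's actual preprocessing:

```python
# Hypothetical anchor-text priors P(entity | mention).
PRIOR_THRESHOLD = 0.1
mention_prior = {
    ("Bryant", "Kobe_Bryant"): 0.70,
    ("Bryant", "Bryant_University"): 0.20,
    ("Bryant", "Bryant,_Illinois"): 0.05,
}

def prune_candidates(mention):
    """(2) Keep only candidate entities whose prior passes the threshold."""
    return [e for (m, e), p in mention_prior.items()
            if m == mention and p >= PRIOR_THRESHOLD]

def keep_sentence(tokens, i1, i2):
    """(1) at most 10 words between the two mentions; (3) a crude proxy
    for the same-clause requirement: no clause punctuation in between."""
    lo, hi = sorted((i1, i2))
    between = tokens[lo + 1:hi]
    return len(between) <= 10 and not ({";", ",", "."} & set(between))

print(prune_candidates("Bryant"))  # "Bryant,_Illinois" is filtered out
```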

5.2 Experimental Setup

We compare our framework with three baselines. The first one, ME-pl, is the pipeline system constructed from the entity linker of (Han et al., 2011) and the MaxEnt version of the Mintz++ extractor mentioned in (Surdeanu et al., 2012). The second and third baselines are pipeline systems constructed from the same linker and two state-of-the-art DS approaches, MultiR (Hoffmann et al., 2011) and MIML-RE (Surdeanu et al., 2012), respectively. They are referred to as MultiR-pl and MIML-pl in the rest of this paper.

We also implement several variants of our framework to investigate two components: whether to use explicit (E) or implicit (I) argument type expectations, and whether to take global (G) compatibilities into account, resulting in four variants: ME-JE, ME-JI, ME-JEG and ME-JIG.

We tune the parameters in the objective function on the development set to re = 1, el = 4, sp = 1. The numbers of preliminary results retained for the inference step are set to p = 2, q = 3. Three metrics are used in our experiments: (1) the precision of extracted triples, i.e., the ratio of the number of correct triples to the total number of extracted triples; (2) the number of correct triples (NoC); (3) the number of correct triples among the results ranked in the top n. The third metric is crucial for KBP, since most users are only interested in knowledge facts with high confidences.



Table 2: The results of our joint frameworks and the three baselines.

Approach | Precision | NoC | Top 50 | Top 100
ME-pl | 28.7±0.8 | 725±12 | 38±2 | 75±4
MultiR-pl | 31.0±0.8 | 647±15 | 39±2 | 71±3
MIML-pl | 33.2±0.6 | 608±16 | 40±3 | 74±5
ME-JE | 32.8±0.7 | 768±10 | 46±2 | 90±3
ME-JEG | 34.2±0.5 | 757±8 | 46±2 | 90±3
ME-JI | 34.5±1.0 | 784±9 | 43±3 | 88±3
ME-JIG | 35.7±1.0 | 772±8 | 43±3 | 88±4

We compare the extracted triples against Freebase to compute the precision, which may underestimate the performance since Freebase is incomplete. Since we do not have exact annotations for the EL, it is difficult to calculate exact recall; we therefore use NoC instead. We evaluate our framework on the 10 subsets of the testing dataset and report means and standard deviations.

5.3 Overall Performance

We are interested to find out: (a) whether the task benefits from the joint inference, i.e., can we collect more correct facts, or with higher precision? (b) whether the argument type expectations (explicit and implicit) and global compatibility do their jobs as we expected, and how to choose between these components? (c) whether the framework can work with other RE models? (d) whether we can find a suitable approach to measure the confidence or uncertainty of an extraction so that users or other applications can better utilize the extracted KB facts?

Let us first look at the performance of the baselines and our framework in Table 2 for an overview. Comparing the three pipeline systems, we find that, using the same entity linker, MIML-pl performs best in precision with slightly fewer correct triples, while ME-pl performs worst. This is not surprising: ME-pl, as a strong and high-recall baseline, outputs the most correct triples. As for the results with high confidences, MultiR-pl outputs more correct triples in the top 50 results than ME-pl, and MIML-pl performs better than or comparably to ME-pl in the top n results.

After performing the joint inference, ME-JE improves over ME-pl by 4.1% in precision with 43 more correct triples on average, and yields better performance in the top n results. By taking global compatibilities into consideration, ME-JEG further improves the precision to 34.2% on average with slightly fewer correct triples, indicating that both argument type expectations and global compatibilities are useful in improving the performance.

[Figure 3: The numbers of correct triples v.s. the precisions for different approaches (ME-pl, MultiR-pl, MIML-pl, ME-JEG, ME-JIG); precision (0.4–1.0) is plotted against the number of correct triples (0–500).]

Argument type information helps to select correct and coherent predictions from the candidates output by EL and RE, while global compatibilities further prune incorrect triples that cause disagreements, although a few correct ones may be incorrectly eliminated. We can also observe that ME-JIG attains even higher overall precision than ME-JEG, but ME-JEG collects more correct triples than ME-JIG among the top n predictions, showing that explicit type expectations, with more accurate type information, may perform better on high-confidence results.

Furthermore, even though MultiR-pl and MIML-pl are based on state-of-the-art RE approaches, our model (for example, ME-JIG) still outperforms them on all metrics, with precision 4.7% higher than MultiR-pl and 2.5% higher than MIML-pl. Our model extracts 125 more correct triples than MultiR-pl and 164 more than MIML-pl, and performs better in the top n results as well.

In previous RE tasks, Precision-Recall curves are mostly used to evaluate system performance. In our task, since it is difficult to calculate recall exactly, we use the number of correct triples instead, and plot Precision-NoC curves to show the performance of the competitors and our approaches in more detail. For each value of NoC, the precision is the average over the ten splits of the testing dataset.

As shown in Figure 3, our approaches (ME-JEG and ME-JIG) obtain higher precision at each NoC value, and their curves are much smoother than those of the pipeline systems.



Table 3: The results of our joint frameworks with the MultiR sentence extractor.

Approach | Precision | NoC | Top 50 | Top 100
MultiR-pl | 31.0±0.8 | 647±15 | 39±2 | 71±3
MultiR-JEG | 36.9±0.8 | 687±15 | 46±2 | 88±3
MultiR-JIG | 38.5±0.9 | 700±15 | 45±2 | 88±3

This indicates that our framework is more suitable for harvesting high quality knowledge facts. Comparing the two kinds of type clues, we can see that the explicit ones perform better when the confidence control is high and the number of correct triples is small, after which the two are comparable. Since the precision of high-confidence triples is crucial for the task of KBP, we still suggest choosing the explicit clues when a well-defined schema is available in the KB, although implicit type expectations can result in higher overall precision.

5.4 Adapting the MultiR Sentence Extractor into the Framework

The preliminary relation extractor of our framework is not limited to the MaxEnt³ extractor; it can be any sentence-level, recall-oriented relation extractor. To further investigate the generalization of our joint inference framework, we also try to fit other sentence-level relation extractors into the framework. Since MIML-RE does not output sentence-level results, we only adapt MultiR, with both global compatibilities and explicit/implicit type expectations, named MultiR-JEG and MultiR-JIG, respectively. Since the scores output by the original MultiR are unnormalized and difficult to apply directly in our framework, we normalize them and re-tune the framework's parameters accordingly. The parameters are set to re = 1, el = 32, sp = 16.

As seen in Table 3, MultiR-JEG helps MultiR obtain about 40 more correct triples on average and achieves 5.9% higher precision, as well as significant improvements in the top n correct predictions. For MultiR-JIG, the improvements are 7.5% in precision and 53 in the number of correct triples. In terms of top n results, the explicit and implicit type expectations perform comparably. We also observe that our framework improves MultiR as much as it does MaxEnt, indicating that our joint framework generalizes well across different RE models.

³http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html

[Figure 4: The numbers of correct triples v.s. the precisions for approaches with the MultiR extractor (MultiR-pl, MultiR-JEG, MultiR-JIG).]

[Figure 5: The precisions of different models (ME-pl, MultiR-pl, MIML-pl, ME-JEG, ME-JIG, MultiR-JEG, MultiR-JIG) under confidence thresholds from 0.5 to 1. The error bars represent the standard deviations of the results.]

We further plot Precision-NoC curves for MultiR-JEG and MultiR-JIG in Figure 4, showing that our framework yields better performance and smoother curves with the MultiR extractor. It is interesting to see that with the MultiR extractor, the two kinds of expectations perform comparably.

5.5 Results with Confidence Estimations

Now we investigate the results from another perspective with the help of the confidence estimations. We calculate the precisions of the competitors and our approaches at confidence thresholds from 0.5 to 1. The results are summarized in Figure 5. Note that the results across different approaches are not directly comparable; we put them in the same figure only to save space.

In Figure 5, intuitively, as the confidence threshold goes up, the extraction precision should increase, indicating that triples with higher confidences are more likely to be correct.



However, lower thresholds tend to yield estimates with smaller standard deviations, because those precisions are computed over many more triples than at higher thresholds, so the randomness is smaller.

On the other hand, our joint frameworks provide more evidence with which to capture the reliability of an extraction. For example, the precisions of MultiR-JIG and ME-JIG both stay around 85% when the confidence is higher than 0.85, with about 120 correct triples, indicating that by setting a proper threshold, we can obtain a considerable amount of high quality knowledge facts at an acceptable precision, which is crucial for KBP. We cannot harvest such an amount of high quality knowledge facts from the other three pipeline systems.

6 Conclusions

In this paper, we propose a joint framework for the task of populating KBs with new knowledge facts, which performs joint inference over the two subtasks: it maximizes their preliminary scores, fulfills the type expectations of relations, and avoids global incompatibilities among all local predictions to find an optimal assignment. Experimental results show that our framework can significantly reduce the error propagation in pipeline systems and outperforms competitive pipeline systems with state-of-the-art RE models. Regarding the explicit argument type expectations versus the implicit ones, the latter can result in higher overall precision, while the former performs better at acquiring high quality knowledge facts under higher confidence control, indicating that if the KB has a well-defined schema we can use explicit type requirements for the KBP task, and if not, our model can still perform well by mining the implicit ones. Our framework also generalizes well to other preliminary RE models. Furthermore, we assign extraction confidences to all extracted facts to facilitate further applications. By setting a suitable threshold, our framework can populate high quality, reliable knowledge facts into existing KBs.

For future work, we will address the NIL issue of EL, since we currently assume all entities can be linked to the KB. It would also be interesting to jointly model the two subtasks through structured learning, instead of joint inference only. Currently we only use the coherence between the extracted triples and the KB to estimate confidences; it would be nice to model this directly in a joint model.

Acknowledgments

We would like to thank Heng Ji, Kun Xu, Dong Wang and Junyang Rao for their helpful discussions and the anonymous reviewers for their insightful comments that improved the work considerably. This work was supported by the National High Technology R&D Program of China (Grant No. 2012AA011101, 2014AA015102), the National Natural Science Foundation of China (Grant No. 61272344, 61202233, 61370055) and the joint project with IBM Research. Any correspondence should be directed to Yansong Feng.

References

ACE. 2004. The automatic content extraction projects. http://projects.ldc.upenn.edu/ace.

Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam Hruschka Jr., and Tom Mitchell. 2010. Toward an architecture for never-ending language learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), pages 1306–1313. AAAI Press.

Taylor Cassidy, Zheng Chen, Javier Artiles, Heng Ji, Hongbo Deng, Lev-Arie Ratinov, Jing Zheng, Jiawei Han, and Dan Roth. 2011. Entity linking system description. In TAC 2011.

Zheng Chen and Heng Ji. 2011. Collaborative ranking: A case study on entity linking. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, pages 771–781, Stroudsburg, PA, USA. Association for Computational Linguistics.

Liwei Chen, Yansong Feng, Songfang Huang, Yong Qin, and Dongyan Zhao. 2014. Encoding relation requirements for relation extraction via joint inference. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, pages 818–827, Stroudsburg, PA, USA. Association for Computational Linguistics.

Xiao Cheng and Dan Roth. 2013. Relational inference for wikification. In EMNLP.

Yejin Choi, Eric Breck, and Claire Cardie. 2006. Joint extraction of entities and relations for opinion recognition. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, EMNLP '06, pages 431–439, Stroudsburg, PA, USA. Association for Computational Linguistics.

Mark Dredze, Paul McNamee, Delip Rao, Adam Gerber, and Tim Finin. 2010. Entity disambiguation for knowledge base population. In Coling 2010.

Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, pages 1535–1545, Stroudsburg, PA, USA. Association for Computational Linguistics.

Radu Florian, Hongyan Jing, Nanda Kambhatla, and Imed Zitouni. 2006. Factorizing complex models: A case study in mention detection. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 473–480. Association for Computational Linguistics.

Radu Florian, John F. Pitrelli, Salim Roukos, and Imed Zitouni. 2010. Improving mention detection robustness to noisy input. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 335–345. Association for Computational Linguistics.

Xianpei Han and Le Sun. 2011. A generative entity-mention model for linking entities with knowledge base. In Proceedings of ACL, HLT '11, pages 945–954, Stroudsburg, PA, USA. Association for Computational Linguistics.

Xianpei Han, Le Sun, and Jun Zhao. 2011. Collective entity linking in web text: a graph-based method. In SIGIR, SIGIR '11, pages 765–774, New York, NY, USA. ACM.

Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th ACL-HLT - Volume 1, HLT '11, pages 541–550, Stroudsburg, PA, USA. ACL.

Heng Ji, Ralph Grishman, and Hoa Dang. 2011. Overview of the TAC 2011 knowledge base population track. In Proceedings of TAC.

Rohit J. Kate and Raymond J. Mooney. 2010. Joint entity and relation extraction using card-pyramid parsing. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, CoNLL '10, pages 203–212, Stroudsburg, PA, USA. Association for Computational Linguistics.

Qi Li and Heng Ji. 2014. Incremental joint extraction of entity mentions and relations. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, pages 402–412, Stroudsburg, PA, USA. Association for Computational Linguistics.

Qi Li, Sam Anzaroot, Wen-Pin Lin, Xiang Li, and Heng Ji. 2011. Joint inference for cross-document information extraction. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM '11, pages 2225–2228, New York, NY, USA. ACM.

Thomas Lin, Mausam, and Oren Etzioni. 2012. No noun phrase left behind: Detecting and typing unlinkable entities. In Proceedings of the 2012 EMNLP-CoNLL, EMNLP-CoNLL '12, pages 893–903, Stroudsburg, PA, USA. Association for Computational Linguistics.

Fang Liu and Jun Zhao. 2012. SWEAT2012: Pattern based English slot filling system for knowledge base population at TAC 2012. In TAC 2012.

Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP: Volume 2, ACL '09, pages 1003–1011.

Sean Monahan and Dean Carpenter. 2012. Lorify: A knowledge base from scratch. In TAC 2012.

Lev Ratinov, Dan Roth, Doug Downey, and Mike Anderson. 2011. Local and global algorithms for disambiguation to Wikipedia. In ACL.

Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Machine Learning and Knowledge Discovery in Databases, volume 6323 of Lecture Notes in Computer Science, pages 148–163. Springer Berlin / Heidelberg.

Sebastian Riedel, Limin Yao, Benjamin M. Marlin, and Andrew McCallum. 2013. Relation extraction with matrix factorization and universal schemas. In Joint Human Language Technology Conference/Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL '13), June.

Benjamin Roth, Grzegorz Chrupala, Michael Wiegand, Mittul Singh, and Dietrich Klakow. 2012. Generalizing from Freebase and patterns using cluster-based distant supervision for TAC KBP slot filling 2012. In TAC 2012.

Prithviraj Sen. 2012. Collective context-aware topic models for entity disambiguation. In Proceedings of the 21st International Conference on World Wide Web, WWW '12, pages 729–738, New York, NY, USA. ACM.

Sameer Singh, Sebastian Riedel, Brian Martin, Jiaping Zheng, and Andrew McCallum. 2013. Joint inference of entities, relations, and coreference. In Proceedings of the 2013 Workshop on Automated Knowledge Base Construction, AKBC '13, pages 1–6, New York, NY, USA. ACM.

Ang Sun, Xin Wang, Sen Xu, Yigit Kiran, Shakthi Poornima, Andrew Borthwick, and Ralph Grishman. 2012. Intelius-NYU TAC-KBP2012 cold start system. In TAC 2012.

Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D. Manning. 2012. Multi-instance multi-label learning for relation extraction. In EMNLP-CoNLL, pages 455–465. ACL.

Suzanne Tamang, Zheng Chen, and Heng Ji. 2012. Entity linking system and slot filling validation system. In TAC 2012.

Michael Wick, Sameer Singh, Ari Kobren, and Andrew McCallum. 2013. Assessing confidence of knowledge base content with an experimental study in entity resolution. In AKBC 2013.

Limin Yao, Sebastian Riedel, and Andrew McCallum. 2010. Collective cross-document relation extraction without labelled data. In Proceedings of EMNLP, EMNLP '10, pages 1013–1023, Stroudsburg, PA, USA. ACL.

Shubin Zhao and Ralph Grishman. 2005. Extracting relations with integrated information using kernel methods. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, ACL '05, pages 419–426, Stroudsburg, PA, USA. Association for Computational Linguistics.

Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. 2008. Large-scale parallel collaborative filtering for the Netflix prize. In Proceedings of the 4th International Conference on Algorithmic Aspects in Information and Management, AAIM '08, pages 337–348, Berlin, Heidelberg. Springer-Verlag.

Yiping Zhou, Lan Nie, Omid Rouhani-Kalleh, Flavian Vasile, and Scott Gaffney. 2010. Resolving surface forms to Wikipedia topics. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING '10, pages 1335–1343, Stroudsburg, PA, USA. Association for Computational Linguistics.