
Semantic Web 0 (2015) 1–0. IOS Press.

Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods
Heiko Paulheim, Data and Web Science Group, University of Mannheim, B6 26, 68159 Mannheim, Germany
E-mail: [email protected]

Abstract. In recent years, several web knowledge graphs, both free and commercial, have been created. While Google coined the term "Knowledge Graph" in 2012, there are also a few openly available knowledge graphs, with DBpedia, YAGO, and Freebase being among the most prominent ones. Those graphs are often constructed from semi-structured knowledge, such as Wikipedia, or harvested from the web with a combination of statistical and linguistic methods. The results are large-scale knowledge graphs that try to make a good trade-off between completeness and correctness. In order to further increase the utility of such knowledge graphs, various refinement methods have been proposed, which try to infer and add missing knowledge to the graph, or identify erroneous pieces of information. In this article, we provide a survey of such knowledge graph refinement approaches, with a dual look at both the methods being proposed and the evaluation methodologies used.

Keywords: Knowledge Graphs, Refinement, Completion, Error Detection, Evaluation

1. Introduction

Knowledge graphs on the web are a backbone of many information systems that require access to structured knowledge, be it domain-specific or domain-independent. The idea of feeding intelligent systems and agents with general, formalized knowledge of the world dates back to classic Artificial Intelligence research in the 1980s [89]. More recently, with the advent of Linked Open Data [5] sources like DBpedia [55], and with Google's announcement of the Google Knowledge Graph in 2012^1, representations of general world knowledge as graphs have drawn a lot of attention again.

There are various ways of building such knowledge graphs. They can be curated like Cyc [56], edited by the crowd like Freebase [9] and Wikidata [101], or extracted from large-scale, semi-structured web knowledge

^1 http://googleblog.blogspot.co.uk/2012/05/introducing-knowledge-graph-things-not.html

bases such as Wikipedia, like DBpedia [55] and YAGO [98]. Furthermore, information extraction methods for unstructured or semi-structured information have been proposed, which lead to knowledge graphs like NELL [14], PROSPERA [69], or Knowledge Vault [21].

Whichever approach is taken for constructing a knowledge graph, the result will never be perfect [10]. As a model of the real world or a part thereof, formalized knowledge cannot reasonably reach full coverage, i.e., contain information about each and every entity in the real world. Furthermore, it is unlikely, in particular when heuristic methods are applied, that the knowledge graph is fully correct – there is usually a trade-off between coverage and correctness, which is addressed differently in each knowledge graph [108].

To address those shortcomings, various methods for knowledge graph refinement have been proposed. In many cases, those methods are developed by researchers outside the organizations or communities which create the knowledge graphs. They rather take an existing knowledge graph and try to increase its

1570-0844/15/$27.50 © 2015 – IOS Press and the authors. All rights reserved


coverage and/or correctness by various means. Thus, the focus of this survey is not knowledge graph construction, but knowledge graph refinement.

For this survey, we view knowledge graph construction as a construction from scratch, i.e., using a set of operations on one or more sources to create a knowledge graph. In contrast, refinement assumes that there is already a knowledge graph given, which is improved, e.g., by adding missing knowledge or identifying and removing errors. Usually, those methods directly use the information given in a knowledge graph, e.g., as training information for automatic approaches. Thus, the methods for construction and refinement may be similar, but not the same, since the latter work on a given graph, while the former do not.

It is important to note that for many knowledge graphs, one or more refinement steps are applied when creating and/or before publishing the graph. For example, logical reasoning is applied on some knowledge graphs for validating the consistency of statements in the graph, and removing the inconsistent statements. Such post-processing operations (i.e., operations applied after the initial construction of the graph) would be considered refinement methods for this survey, and are included in the survey.

Decoupling knowledge base construction and refinement has several advantages. First, it allows – at least in principle – for developing methods for refining arbitrary knowledge graphs, which can then be applied to improve multiple knowledge graphs.^2 In contrast to fine-tuning the heuristics that create a single knowledge graph, the impact of such generic refinement methods can thus be larger. Second, evaluating refinement methods in isolation from the knowledge graph construction step allows for a better understanding and a cleaner separation of effects, i.e., it facilitates more qualified statements about the effectiveness of a proposed approach.

The rest of this article is structured as follows. Section 2 gives a brief introduction to knowledge graphs in the Semantic Web. In sections 3 and 4, we present a categorization of approaches and evaluation methodologies. In sections 5 and 6, we present the review of methods for completion (i.e., increasing coverage) and error detection (i.e., increasing correctness) of knowledge graphs. We conclude with a critical reflection of the findings in section 7, and a summary in section 8.

^2 See section 7.2 for a critical discussion.

2. Knowledge Graphs in the Semantic Web

From the early days, the Semantic Web has promoted a graph-based representation of knowledge, e.g., by pushing the RDF standard^3. In such a graph-based knowledge representation, entities, which are the nodes of the graph, are connected by relations, which are the edges of the graph (e.g., Shakespeare has written Hamlet), and entities can have types, denoted by is a relations (e.g., Shakespeare is a writer, Hamlet is a play). In many cases, the sets of possible types and relations are organized in a schema or ontology, which defines their interrelations and restrictions on their usage.
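As a minimal sketch, such a representation can be modeled as a set of (subject, relation, object) triples; the entity and relation names below are illustrative only and not taken from any particular knowledge graph:

```python
# A graph as a set of (subject, relation, object) triples.
graph = {
    ("Shakespeare", "hasWritten", "Hamlet"),  # relation between entities
    ("Shakespeare", "isA", "Writer"),         # entity type ("is a")
    ("Hamlet", "isA", "Play"),
}

# A tiny schema restricting a relation's usage to certain types.
schema = {"hasWritten": ("Writer", "Play")}  # (domain, range)

def types_of(entity):
    """All types asserted for an entity via the 'isA' relation."""
    return {o for s, p, o in graph if s == entity and p == "isA"}

print(types_of("Shakespeare"))  # {'Writer'}
```

Real knowledge graphs express the same idea in RDF, where the schema additionally defines class hierarchies and domain/range restrictions.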

With the advent of Linked Data [5], it was proposed to interlink different datasets in the Semantic Web. By means of interlinking, the collection of datasets could be understood as one large, global knowledge graph (although very heterogeneous in nature). To date, roughly 1,000 datasets are interlinked in the Linked Open Data cloud, with the majority of links connecting identical entities in two datasets [92].

The term Knowledge Graph was coined by Google in 2012, referring to their use of semantic knowledge in web search ("Things, not strings"), and has recently also been used to refer to Semantic Web knowledge bases such as DBpedia or YAGO. From a broader perspective, any graph-based representation of some knowledge could be considered a knowledge graph (this would include any kind of RDF dataset, as well as description logic ontologies). However, there is no common definition of what a knowledge graph is and what it is not. Instead of attempting a formal definition, we restrict ourselves to a minimum set of characteristics of knowledge graphs, which we use to tell knowledge graphs apart from other collections of knowledge which we would not consider as such. A knowledge graph

1. mainly describes real world entities and their interrelations, organized in a graph.

2. defines possible classes and relations of entitiesin a schema.

3. allows for potentially interrelating arbitrary entities with each other.

4. covers various topical domains.

The first two criteria clearly define the focus of a knowledge graph to be the actual instances (A-box in

^3 http://www.w3.org/RDF/


description logic terminology), with the schema (T-box) playing only a minor role. Typically, this means that the number of instance-level statements is several orders of magnitude larger than that of schema-level statements (cf. Table 1). In contrast, the schema can remain rather shallow, at a small degree of formalization. In that sense, mere ontologies without any instances (such as DOLCE [27]) would not be considered knowledge graphs. Likewise, we do not consider WordNet [66] a knowledge graph, since it is mainly concerned with common nouns and words^4 and their relations (although a few proper nouns, i.e., instances, are also included).^5

The third criterion introduces the possibility to define arbitrary relations between instances, which are not restricted in their domain and/or range. This is a property which is hardly found in relational databases, which follow a strict schema.

Furthermore, knowledge graphs are supposed to cover at least a major portion of the domains that exist in the real world, and are not supposed to be restricted to only one domain (such as geographic entities). In that sense, large, but single-domain datasets, such as Geonames, would not be considered a knowledge graph.

Knowledge graphs on the Semantic Web are typically provided using Linked Data [5] as a standard. They can be built using different methods: they can be curated by an organization or a small, closed group of people, crowd-sourced by a large, open group of individuals, or created with heuristic, automatic or semi-automatic means. In the following, we give an overview of existing knowledge graphs, both open and company-owned.

2.1. OpenCyc

OpenCyc is a freely available version of the Cyc knowledge base, which is one of the oldest knowledge graphs, dating back to the 1980s [56]. Rooted in traditional artificial intelligence research, it is a curated knowledge graph, developed and maintained by Cycorp Inc.^6. A Semantic Web endpoint to OpenCyc also

^4 The question of whether words as such are real world entities or not is of philosophical nature and not answered within the scope of this article.

^5 Nevertheless, it is occasionally used for evaluating knowledge graph refinement methods, as we will show in the subsequent sections.

^6 http://www.cyc.com/

exists, containing links to DBpedia and other LOD datasets.

OpenCyc contains roughly 120,000 instances and 2.5 million facts defined for those instances; its schema comprises a type hierarchy of roughly 45,000 types, and 19,000 possible relations.^7

2.2. Freebase

Curating a universal knowledge graph is an endeavour which is infeasible for most individuals and organizations. To date, more than 900 person years have been invested in the creation of Cyc [90], with gaps still existing. Thus, distributing that effort on as many shoulders as possible through crowdsourcing is the way taken by Freebase, a public, editable knowledge graph with schema templates for most kinds of possible entities (i.e., persons, cities, movies, etc.). After MetaWeb, the company running Freebase, was acquired by Google, Freebase was shut down on March 31st, 2015.

The last version of Freebase contains roughly 50 million entities and 3 billion facts^8. Freebase's schema comprises roughly 27,000 entity types and 38,000 relation types.^9

2.3. Wikidata

Like Freebase, Wikidata is a collaboratively edited knowledge graph, operated by the Wikimedia Foundation^10, which also hosts the various language editions of Wikipedia. After the shutdown of Freebase, the data contained in Freebase is being moved to Wikidata.^11 A particularity of Wikidata is that for each axiom, provenance metadata can be included – such as the source and date for the population figure of a city [101].

^7 These numbers have been gathered by our own inspection of the 2012 version of OpenCyc, available from http://sw.opencyc.org/

^8 http://www.freebase.com
^9 These numbers have been gathered by queries against Freebase's query endpoint.
^10 http://wikimediafoundation.org/
^11 http://plus.google.com/109936836907132434202/posts/3aYFVNf92A1


To date, Wikidata contains roughly 16 million instances^12 and 66 million statements^13. Its schema defines roughly 23,000 types^14 and 1,600 relations^15.

2.4. DBpedia

DBpedia is a knowledge graph which is extracted from structured data in Wikipedia. The main source for this extraction are the key-value pairs in the Wikipedia infoboxes. In a crowd-sourced process, types of infoboxes are mapped to the DBpedia ontology, and keys used in those infoboxes are mapped to properties in that ontology. Based on those mappings, a knowledge graph can be extracted [55].

The most recent version of the main DBpedia (i.e., DBpedia 2015-04, extracted from the English Wikipedia based on dumps from February/March 2015) contains 4.8 million entities and 176 million statements about those entities.^16 The ontology comprises 735 classes and 2,800 relations.^17

2.5. YAGO

Like DBpedia, YAGO is also extracted from Wikipedia. YAGO builds its classification implicitly from the category system in Wikipedia and the lexical resource WordNet [66], with infobox properties manually mapped to a fixed set of attributes. While DBpedia creates different interlinked knowledge graphs for each language edition of Wikipedia [12], YAGO aims at an automatic fusion of knowledge extracted from various Wikipedia language editions, using different heuristics [64].

The latest release of YAGO, i.e., YAGO3, contains 4.6 million entities and 26 million facts about those entities. The schema comprises roughly 488,000 types and 77 relations [64].

^12 http://www.wikidata.org/wiki/Wikidata:Statistics
^13 http://tools.wmflabs.org/wikidata-todo/stats.php
^14 http://tools.wmflabs.org/wikidata-exports/miga/?classes#_cat=Classes
^15 http://www.wikidata.org/wiki/Special:ListProperties
^16 http://dbpedia.org/services-resources/datasets/dataset-2015-04/dataset-2015-04-statistics
^17 http://dbpedia.org/dbpedia-data-set-2015-04

2.6. NELL

While DBpedia and YAGO use semi-structured content as a base, methods for extracting knowledge graphs from unstructured data have been proposed as well. One of the earliest approaches working at web scale was the Never Ending Language Learning (NELL) project [14]. The project works on a large-scale corpus of web sites and exploits a coupled process which learns text patterns corresponding to type and relation assertions, and applies them to extract new entities and relations. Reasoning is applied for consistency checking and for removing inconsistent axioms. The system is still running today, continuously extending its knowledge base. While not published using Semantic Web standards, it has been shown that the data in NELL can be transformed to RDF and provided as Linked Open Data as well [110].

In its most recent version (i.e., the 945th iteration), NELL contains roughly 2 million entities and 433,000 relations between those. The NELL ontology defines 285 classes and 425 relations.^18

2.7. Google’s Knowledge Graph

Google's Knowledge Graph was introduced to the public in 2012, which was also when the term knowledge graph as such was coined. Google itself is rather secretive about how their Knowledge Graph is constructed; there are only a few external sources that discuss some of the mechanisms of information flow into the Knowledge Graph based on experience^19. From those, it can be assumed that major semi-structured web sources, such as Wikipedia, contribute to the Knowledge Graph, as well as structured markup (like schema.org Microdata [65]) on web pages and contents from Google's online social network Google+.

According to [21], Google's Knowledge Graph contains 18 billion statements about 570 million entities, with a schema of 1,500 entity types and 35,000 relation types.

^18 These numbers have been derived from the promotion heatmap at http://rtw.ml.cmu.edu/resources/results/08m/NELL.08m.945.heatmap.html.

^19 E.g., http://www.techwyse.com/blog/search-engine-optimization/seo-efforts-to-get-listed-in-google-knowledge-graph/


2.8. Google’s Knowledge Vault

The Knowledge Vault is another project by Google. It extracts knowledge from different sources, such as text documents, HTML tables, and structured annotations on the Web with Microdata or Microformats. Extracted facts are combined using both the extractors' confidence values and prior probabilities for the statements, which are computed using the Freebase knowledge graph (see above). From those components, a confidence value for each fact is computed, and only the confident facts are taken into the Knowledge Vault [21].

According to [21], the Knowledge Vault contains roughly 45 million entities and 271 million fact statements, using 1,100 entity types and 4,500 relation types.

2.9. Yahoo!’s Knowledge Graph

Like Google, Yahoo! also has an internal knowledge graph, which is used to improve search results. The knowledge graph builds on both public data (e.g., Wikipedia and Freebase) and closed commercial sources for various domains. It uses wrappers for different sources and monitors evolving sources, such as Wikipedia, for constant updates.

Yahoo!'s knowledge graph contains roughly 3.5 million entities and 1.4 billion relations. Its schema, which is aligned with schema.org, comprises 250 types of entities and 800 types of relations [6].

2.10. Microsoft’s Satori

Satori is Microsoft's equivalent to Google's Knowledge Graph.^20 Although almost no public information on the construction, the schema, or the data volume of Satori is available, it was said in 2012 to consist of 300 million entities and 800 million relations, and to use RDF as its data representation format.^21

2.11. Facebook’s Entities Graph

Although the majority of the data in the online social network Facebook^22 is perceived as connections

^20 http://blogs.bing.com/search/2013/03/21/understand-your-world-with-bing/
^21 http://research.microsoft.com/en-us/projects/trinity/query.aspx, currently offline, accessible through the Internet Archive.
^22 http://www.facebook.com/

between people, Facebook also works on extracting a knowledge graph which contains many more entities. The information people provide as personal information (e.g., their home town, the schools they went to), as well as their likes (movies, bands, books, etc.), often represents entities, which can be linked both to people and among each other. By parsing textual information and linking it to Wikipedia, the graph also contains links among entities, e.g., the writer of a book. Although not many public numbers about Facebook's Entities Graph exist, it is said to contain more than 100 billion connections between entities.^23

2.12. Summary

Table 1 summarizes the characteristics of the knowledge graphs discussed above. It can be observed that the graphs differ in basic measures, such as the number of entities and relations, as well as in the size of the schema they use, i.e., the number of classes and relations. From these differences, it can be concluded that the knowledge graphs must differ in other characteristics as well, such as average node degree, density, or connectivity.

3. Categorization of Knowledge Graph Refinement Approaches

Knowledge graph refinement methods can differ along several dimensions. For this survey, we distinguish the overall goal of the method, i.e., completion vs. correction of the knowledge graph; the refinement target (e.g., entity types, relations between entities, or literal values); and the data used by the approach (i.e., only the knowledge graph itself, or further external sources). All three dimensions are orthogonal.

There are a few research fields which are related to knowledge graph refinement: ontology learning mainly deals with learning a concept-level description of a domain, such as a hierarchy (e.g., cities are places) [13,63]. Likewise, description logic learning is mainly concerned with refining that concept-level description [54]. As stated above, the focus of knowledge graphs, in contrast, is rather the instance (A-box) level, not so much the concept (T-box) level. Following that notion, we consider those works as knowledge

^23 http://www.facebook.com/notes/facebook-engineering/under-the-hood-the-entities-graph/10151490531588920


Table 1: Overview of popular knowledge graphs. The table depicts the number of instances and facts, as well as the number of different types and relations defined in their schema. Instances denotes the number of instances or A-box concepts defined in the graph, Facts denotes the number of statements about those instances, Types denotes the number of different types or classes defined in the schema, and Relations denotes the number of different relations defined in the schema. Microsoft's Satori and Facebook's Entities Graph are not shown, because to the best of our knowledge, no detailed recent numbers on these graphs are publicly available.

Name Instances Facts Types Relations

DBpedia (English)          4,806,150     176,043,129     735      2,813
YAGO                       4,595,906      25,946,870     488,469  77
Freebase                   49,947,845     3,041,722,635  26,507   37,781
Wikidata                   15,602,060     65,993,797     23,157   1,673
NELL                       2,006,896      432,845        285      425
OpenCyc                    118,499        2,413,894      45,153   18,526
Google's Knowledge Graph   570,000,000    18,000,000,000 1,500    35,000
Google's Knowledge Vault   45,000,000     271,000,000    1,100    4,469
Yahoo! Knowledge Graph     3,443,743      1,391,054,990  250      800

graph refinement approaches which focus on refining the A-box. Approaches that only focus on the T-box are not considered for this survey; however, if the schema or ontology is refined as a means to ultimately improve the A-box, those works are included in the survey.

3.1. Completion vs. Error Detection

There are two main goals of knowledge graph refinement: (a) adding missing knowledge to the graph, i.e., completion, and (b) identifying wrong information in the graph, i.e., error detection. From a data quality perspective, those goals relate to the data quality dimensions free-of-error and completeness [84].

3.2. Target of Refinement

Both completion and error detection approaches can be further distinguished by the targeted kind of information in the knowledge graph. For example, some approaches are targeted towards completing or correcting entity type information, while others target (either specific or arbitrary) relations between entities, interlinks between different knowledge graphs, or literal values, such as numbers. While the latter can be of any datatype (strings, numbers, dates, etc.), most research focuses on numerical or date-valued literals.

Another strand of research targets the extension of the schema used by the knowledge graph (i.e.,

the T-box), not the data (the A-box). However, as discussed above, approaches focusing purely on the schema without an impact on the instance level are not considered for this survey.

3.3. Internal vs. External Methods

A third distinguishing property is the data used by an approach. While internal approaches only use the knowledge graph itself as input, external methods use additional data, such as text corpora. In the widest sense, approaches making use of human knowledge, such as crowdsourcing [1] or games with a purpose [102], can also be viewed as external methods.

4. Categorization of Evaluation Methods

There are different possible ways to evaluate knowledge graph refinement. On a high level, we can distinguish methodologies that use only the knowledge graph at hand, and methodologies that use external knowledge, such as human annotation.

4.1. Partial Gold Standard

One common evaluation strategy is to use a partial gold standard. In this methodology, a subset of the graph's entities or relations is selected and labeled manually. Other evaluations use external knowledge graphs and/or databases as partial gold standards.


For completion tasks, this means that all axioms that should be there are collected, whereas for correction tasks, a set of axioms in the graph is manually labeled as correct or incorrect. The quality of completion approaches is usually measured in recall, precision, and F-measure, whereas for correction methods, accuracy and/or area under the ROC curve (AUC) are often used alternatively or in addition.

Sourcing partial gold standards from humans can lead to high-quality data (given that the knowledge graph and the ontology it uses are not overly complex), but it is costly, so those gold standards are usually small. Exploiting other knowledge graphs based on knowledge graph interlinks is sometimes proposed to yield larger-scale gold standards, but has two sources of errors: errors in the target knowledge graph, and errors in the linkage between the two. For example, it has been reported that 20% of the interlinks between DBpedia and Freebase are incorrect [107], and that roughly half of the owl:sameAs links between knowledge graphs connect two things which are related, but not exactly the same (such as the company Starbucks and a particular Starbucks coffee shop) [33].
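As an illustration, the metrics used with partial gold standards can be computed from two sets of axioms, an approach's predictions and the manually labeled gold standard; the triples below are invented for illustration only:

```python
def evaluate_against_gold(predicted, gold):
    """Precision, recall, and F-measure of predicted axioms against a
    (partial) gold standard, both given as sets of triples."""
    tp = len(predicted & gold)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {("Berlin", "capitalOf", "Germany"),
        ("Paris", "capitalOf", "France")}
predicted = {("Berlin", "capitalOf", "Germany"),   # correct
             ("Madrid", "capitalOf", "France")}    # wrong
print(evaluate_against_gold(predicted, gold))  # (0.5, 0.5, 0.5)
```

Note that with a *partial* gold standard, precision is only meaningful for predictions that fall within the labeled subset.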

4.2. Knowledge Graph as Silver Standard

Another evaluation strategy is to use the given knowledge graph itself as a test dataset. Since the knowledge graph is not perfect (otherwise, refinement would not be necessary), it cannot be considered a gold standard. However, assuming that the given knowledge graph is already of reasonable quality, we call this method silver standard evaluation, as already proposed in other works [32,44,73].

The silver standard method is usually applied to measure the performance of knowledge graph completion approaches, where it is analyzed how well relations in a knowledge graph can be replicated by a knowledge graph completion method. As for gold standard evaluations, the result quality is usually measured in recall, precision, and F-measure. In contrast to using human annotations, large-scale evaluations are easily possible. The silver standard method is not suitable for error detection, since it assumes the knowledge graph to be correct.
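One way such an evaluation can be set up is sketched below: hold out a random fraction of the graph's facts and check how many of them a completion method replicates. Here, `completion_method` is a placeholder for any function that maps a set of triples to a set of predicted triples; the toy graph is invented:

```python
import random

def silver_standard_eval(graph, completion_method, holdout=0.2, seed=42):
    """Hold out a fraction of the graph's facts (the silver standard)
    and measure how well the completion method replicates them."""
    facts = sorted(graph)
    random.Random(seed).shuffle(facts)
    k = max(1, int(len(facts) * holdout))
    test, train = set(facts[:k]), set(facts[k:])
    # Only novel predictions (not already in the training part) count.
    predicted = completion_method(train) - train
    # Caveat: predictions absent from the graph count as false positives
    # here even if they hold in the real world (open world assumption).
    tp = len(predicted & test)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(test) if test else 0.0
    return precision, recall

toy = {("Berlin", "capitalOf", "Germany"), ("Paris", "capitalOf", "France"),
       ("Rome", "capitalOf", "Italy"), ("Madrid", "capitalOf", "Spain"),
       ("Vienna", "capitalOf", "Austria")}
# A trivial "method" that predicts every fact of the full toy graph
print(silver_standard_eval(toy, lambda train: set(toy)))  # (1.0, 1.0)
```

Because the held-out facts are themselves imperfect, such scores should be read as a lower bound on an approach's actual quality.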

There are two variants of silver standard evaluations: in the more common one, the entire knowledge graph is taken as input to the approach at hand, and the evaluation is then also carried out on the entire knowledge graph. As this may lead to an overfitting effect (in particular for internal methods), some works also foresee splitting the graph into a training and a test partition, which, however, is not as straightforward as, e.g., for propositional classification tasks [71], which is why most papers use the former method. Furthermore, split and cross validation do not fully solve the overfitting problem. For example, if a knowledge graph, by construction, has a bias towards certain kinds of information (e.g., relations are more complete for some classes than for others), approaches overadapting to that bias will be rated better than those which do not (and which may actually perform better).

A problem with this approach is that the knowledge graph itself is not perfect (otherwise, it would not need refinement), thus, this evaluation method may sometimes underrate the evaluated approach. More precisely, most knowledge graphs follow the open world assumption, i.e., an axiom not present in the knowledge graph may or may not hold. Thus, if a completion approach correctly predicts the existence of an axiom missing in the knowledge graph, this would count as a false positive and thus lower precision.

4.3. Retrospective Evaluation

For retrospective evaluations, the output of a given approach is given to human judges for annotation, who then label completions or flagged errors as correct or incorrect. The quality metric is usually accuracy or precision, along with a statement about the total number of completions or errors found with the approach, and ideally also with a statement about the agreement of the human judges.

In many cases, automatic refinement methods lead to a very large number of findings, e.g., lists of tens of thousands of axioms which are potentially erroneous. Thus, retrospective evaluations are often carried out only on samples of the results. For some approaches which produce higher level artifacts – such as error patterns or completion rules – as intermediate results, a feasible alternative is to evaluate those artifacts instead of the actually affected axioms.

While partial gold standards can be reused for comparing different methods, this is not the case for retrospective evaluations. On the other hand, retrospective evaluations may make sense in cases where the interesting class is rare. For example, when evaluating error detection methods, a sample for a partial gold standard from a high-quality graph is likely not to contain a meaningful number of errors. In those cases, retrospective evaluation methodologies are often preferred over partial gold standards.



Another advantage of retrospective evaluations is that they allow a very detailed analysis of an approach's results. In particular, inspecting the errors made by an approach often reveals valuable findings about the advantages and limitations of a particular approach.

Table 2 sums up the different evaluation methodologies and contrasts their advantages and disadvantages.

4.4. Computational Performance

In addition to the performance w.r.t. correctness and/or completeness of results, computational performance considerations become more important as knowledge graphs become larger. Typical performance measures for this aspect are runtime measurements, as well as memory consumption.

Besides explicit measurement of computational performance, a "soft" indicator for computational performance is whether an approach has been evaluated (or at least the results have been materialized) on an entire large-scale knowledge graph, or only on a subgraph. The latter is often done when applying evaluations on a partial gold standard, where the respective approach is only executed on entities contained in that partial gold standard.

5. Approaches for Completion of Knowledge Graphs

Completion of knowledge graphs aims at increasing the coverage of a knowledge graph. Depending on the target information, methods for knowledge graph completion either predict missing entities, missing types for entities, and/or missing relations that hold between entities.

In this section, we survey methods for knowledge graph completion. We distinguish internal and external methods, and further group the approaches by the completion target.

5.1. Internal Methods

Internal methods use only the knowledge contained in the knowledge graph itself to predict missing information.

5.1.1. Methods for Completing Type Assertions

Predicting a type or class for an entity given some characteristics of the entity is a very common problem in machine learning, known as classification. The classification problem is supervised, i.e., it learns a classification model based on labeled training data, typically the set of entities in a knowledge graph (or a subset thereof) which have types attached. In machine learning, binary and multi-class prediction problems are distinguished. In the context of knowledge graphs, in particular the latter are interesting, since most knowledge graphs contain entities of more than two different types. Depending on the graph at hand, it might be worthwhile distinguishing multi-label classification, which allows for assigning more than one class to an instance (e.g., Arnold Schwarzenegger being both an Actor and a Politician), and single-label classification, which only assigns one class to an instance [100].

For internal methods, the features used for classification are usually the relations which connect an entity to other entities [80,86], i.e., they are a variant of link-based classification problems [31]. For example, an entity which has a director relation is likely to be a Movie.

In [78,79], we propose a probabilistic method, which is based on conditional probabilities, e.g., the probability of a node being of type Actor is high if there are ingoing edges of type cast. Such probabilities are exploited by the SDType algorithm, which is currently deployed for DBpedia and adds around 3.4 million additional type statements to the knowledge graph.
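The statistical intuition behind this can be sketched as follows. This is a simplified, hypothetical re-implementation of the idea, not the actual SDType code: types are scored by averaging the type distributions observed for the relations an entity participates in.

```python
from collections import Counter, defaultdict

def type_distributions(triples, types):
    """For each (relation, position), estimate the distribution of types
    observed in that position, i.e., P(type | relation use)."""
    counts = defaultdict(Counter)
    for s, p, o in triples:
        for t in types.get(s, []):
            counts[(p, "subject")][t] += 1
        for t in types.get(o, []):
            counts[(p, "object")][t] += 1
    return {key: {t: n / sum(c.values()) for t, n in c.items()}
            for key, c in counts.items()}

def predict_types(entity, triples, dists):
    """Average the type distributions over all relation uses of the entity."""
    scores, uses = Counter(), 0
    for s, p, o in triples:
        if s == entity:
            key = (p, "subject")
        elif o == entity:
            key = (p, "object")
        else:
            continue
        uses += 1
        for t, prob in dists.get(key, {}).items():
            scores[t] += prob
    return {t: v / uses for t, v in scores.items()} if uses else {}
```

For example, if all known subjects of director are typed Movie, an untyped entity with an outgoing director edge is predicted to be a Movie with a high score.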

In [95], the use of Support Vector Machines (SVMs) has been proposed to type entities in DBpedia and Freebase. The authors also exploit interlinks between the knowledge graphs and classify instances in one knowledge graph based on properties present in the other, in order to increase coverage and precision. Nickel et al. [72] propose the use of matrix factorization to predict entity types in YAGO.

Since many knowledge graphs come with a class hierarchy, e.g., defined in a formal ontology, the type prediction problem could also be understood as a hierarchical classification problem. Despite a larger body of work existing on methods for hierarchical classification [93], there are, to the best of our knowledge, no applications of those methods to knowledge graph completion.

Table 2: Overview on evaluation methods with their advantages and disadvantages

Methodology                        | Advantages                                                                          | Disadvantages
Partial Gold Standard              | highly reliable results; reusable                                                   | costly to produce; balancing problems
Knowledge Graph as Silver Standard | large-scale evaluation feasible; subjectiveness is minimized                        | less reliable results; prone to overfitting
Retrospective Evaluation           | applicable to disbalanced problems; allows for more detailed analysis of approaches | not reusable; approaches cannot be compared directly

In data mining, association rule mining [37] is a method that analyzes the co-occurrence of items in itemsets and derives association rules from those co-occurrences. For predicting missing information in knowledge graphs, those methods can be exploited, e.g., in the presence of redundant information. For example, in DBpedia, different type systems (i.e., the DBpedia ontology and YAGO, among others) are used in parallel, which are populated with different methods (Wikipedia infoboxes and categories, respectively). This ensures both enough overlap to learn suitable association rules, as well as a number of entities that only have a type in one of the systems, to which the rules can be applied. In [76], we exploit such association rules to predict missing types in DBpedia based on such redundancies.
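A minimal sketch of this idea follows. It assumes hypothetical parallel type assignments per entity and mines rules of the form "YAGO type implies DBpedia type" by support and confidence; thresholds and type names are illustrative only:

```python
from collections import Counter
from itertools import product

def mine_type_rules(entity_types, min_conf=0.8, min_support=2):
    """Mine rules 'type A implies type B' from co-occurring types.

    entity_types: dict mapping an entity to its set of types from all
    parallel type systems. Returns {(A, B): confidence}."""
    antecedent, pair = Counter(), Counter()
    for types in entity_types.values():
        for a in types:
            antecedent[a] += 1
        for a, b in product(types, types):
            if a != b:
                pair[(a, b)] += 1
    return {(a, b): n / antecedent[a]
            for (a, b), n in pair.items()
            if n >= min_support and n / antecedent[a] >= min_conf}

def apply_rules(types, rules):
    """Add consequent types for entities matching a rule's antecedent."""
    new = set(types)
    for a, b in rules:
        if a in new:
            new.add(b)
    return new
```

An entity that only has a YAGO type then receives the corresponding DBpedia ontology type whenever a sufficiently confident rule fires.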

In [96], the use of topic modeling for type prediction is proposed. Entities in a knowledge graph are represented as documents, on which Latent Dirichlet Allocation (LDA) [7] is applied for finding topics. By analyzing the co-occurrence of topics and entity types, new types can be assigned to entities based on the topics detected for those entities.

5.1.2. Methods for Predicting Relations

While primarily used for adding missing type assertions, classification methods can also be used to predict the existence of relations. To that end, Socher et al. [97] propose to train a tensor neural network to predict relations based on chains of other relations, e.g., if a person is born in a city in Germany, then the approach can predict (with a high probability) that the nationality of that person is German. The approach is applied to Freebase and WordNet. A similar approach is presented in [49], where the authors show that refining such a problem with schema knowledge – either defined or induced – can significantly improve the performance of link prediction. In [48], an approach similar to association rule mining is used to find meaningful chains of relations for relation prediction. Similarly, in [109], an embedding of pairwise entity relations into a lower dimensional space is learned, which is then used to predict the existence of relations in Freebase.
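The chain-style patterns exploited by these approaches can be illustrated with a single hand-written rule. This is a toy sketch with hypothetical relation names and data; the cited systems learn such patterns from the graph rather than taking them as input:

```python
def apply_chain_rule(triples, p1, p2, head):
    """Apply a rule p1(x, y) and p2(y, z) implies head(x, z), returning
    the predicted triples that are not yet present in the graph."""
    kg = set(triples)
    by_subject = {}
    for s, p, o in kg:
        by_subject.setdefault((s, p), []).append(o)
    predicted = set()
    for s, p, o in kg:
        if p != p1:
            continue
        for z in by_subject.get((o, p2), []):
            if (s, head, z) not in kg:
                predicted.add((s, head, z))
    return predicted
```

Applied to a graph containing bornIn(anna, berlin) and locatedIn(berlin, Germany), the rule bornIn ∘ locatedIn → citizenOf predicts citizenOf(anna, Germany).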

Likewise, association rule mining can be used for predicting relations as well. In [45], the mining of association rules which predict relations between entities in DBpedia from Wikipedia categories is proposed.24

5.2. External Methods

External methods use sources of knowledge – such as text corpora or other knowledge graphs – which are not part of the knowledge graph itself. Those external sources can be linked from the knowledge graph, such as knowledge graph interlinks or links to web pages, e.g., Wikipedia pages describing an entity, or exist without any relation to the knowledge graph at hand, such as large text corpora.

5.2.1. Methods for Completing Type Assertions

For type prediction, there are also classification methods that use external data. In contrast to the internal classification methods described above, external data is used to create a feature representation of an entity.

Nuzzolese et al. [74] propose the usage of the Wikipedia link graph to predict types in a knowledge graph using a k-nearest neighbors classifier. Given that a knowledge graph contains links to Wikipedia, interlinks between Wikipedia pages are exploited to create feature vectors, e.g., based on the categories of the related pages. Since links between Wikipedia pages are not constrained, there are typically more interlinks between Wikipedia pages than between the corresponding entities in the knowledge graph.

Aprosio et al. [3] use types of entities in different DBpedia language editions (each of which can be understood as a knowledge graph connected to the others) as features for predicting missing types. The authors use a k-NN classifier with different distance measures (i.e., kernel functions), such as overlap in article categories. In their setting, a combination of different distance measures is reported to provide the best results.

24 Note that since Wikipedia categories are part of the DBpedia knowledge graph, we consider this approach an internal one.

Another set of approaches uses abstracts in DBpedia to extract definitional clauses, e.g., using Hearst patterns [35]. Such approaches have been proposed by Gangemi et al. [28] and Kliegr [46], where the latter uses abstracts in the different languages in order to increase coverage and precision.
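A single Hearst-style pattern over abstracts could be sketched like this. The regular expression is a deliberately crude, hypothetical pattern for "X is a Y" sentences; the cited approaches use much richer pattern sets and proper linguistic processing:

```python
import re

# Crude definitional pattern: "<entity> is a/an [adjectives] <type>".
IS_A = re.compile(r"^(?P<entity>.+?) is an? (?:[a-z]+ )*(?P<type>[A-Za-z]+)")

def extract_type(abstract):
    """Return an (entity, type) candidate from a definitional sentence,
    or None if the pattern does not match."""
    m = IS_A.match(abstract)
    if not m:
        return None
    return m.group("entity"), m.group("type")
```

Note how the greedy adjective group makes the last matched word the type candidate, so "Albert Einstein is a theoretical physicist." yields the head noun rather than the adjective.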

5.2.2. Methods for Predicting Relations

Lange et al. [51] learn patterns on Wikipedia abstracts using Conditional Random Fields [50]. A similar approach, but on entire Wikipedia articles, is proposed by [106].25

Another common method for the prediction of a relation between two entities is distant supervision. Typically, such approaches use large text corpora. As a first step, entities in the knowledge graph are linked to the text corpus by means of Named Entity Recognition [39,88]. Then, based on the relations in the knowledge graph, those approaches search for text patterns which correspond to relation types (such as: Y's book X being a pattern for the relation author holding between X and Y), and apply those patterns to find additional relations in the text corpus. Such methods have been proposed by Mintz et al. [67] for Freebase, and by Aprosio et al. [4] for DBpedia. In both cases, Wikipedia is used as a text corpus. In [30], a similar setting with DBpedia and two text corpora – the English Wikipedia and an English-language news corpus – is used, the latter showing less reliable results. A similar approach is followed in the RdfLiveNews prototype, where RSS feeds of news companies are used to address the aspect of timeliness in DBpedia, i.e., extracting new information that is either outdated or missing in DBpedia [29].
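The pattern-harvesting step can be sketched as follows. This is a toy version under strong simplifying assumptions: entities are already linked as literal strings in the text, and a "pattern" is just the character span between two entity mentions, with no NER, tokenization, or scoring:

```python
from collections import defaultdict

def harvest_patterns(sentences, known_relations):
    """Collect the text between two linked entity mentions as a pattern
    for every relation holding between them in the knowledge graph."""
    patterns = defaultdict(set)
    for sent in sentences:
        for s, rel, o in known_relations:
            if s in sent and o in sent and sent.index(s) < sent.index(o):
                start = sent.index(s) + len(s)
                patterns[rel].add(sent[start:sent.index(o)].strip())
    return patterns

def apply_patterns(sentence, entity_pair, patterns):
    """Predict relations for a new entity pair via harvested patterns."""
    s, o = entity_pair
    if s not in sentence or o not in sentence:
        return set()
    start = sentence.index(s) + len(s)
    middle = sentence[start:sentence.index(o)].strip()
    return {rel for rel, pats in patterns.items() if middle in pats}
```

Harvesting "wrote" from a sentence about a known author pair then allows labeling the same pattern between a previously unconnected pair.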

West et al. [104] propose the use of web search engines to fill gaps in knowledge graphs. Like in the works discussed above, they first discover lexicalizations for relations. Then, they use those lexicalizations to formulate search engine queries for filling missing relation values. Thus, they use the whole Web as a corpus, and combine information retrieval and extraction for knowledge graph completion.

While text is unstructured, some approaches have been proposed that use semi-structured data for completing knowledge graphs. In particular, approaches leveraging structured data in Wikipedia are found in the literature. Those are most often used together with DBpedia, so that there are already links between the entities and the corpus of background knowledge, i.e., no Named Entity Recognition has to be performed, in contrast to the distant supervision approaches discussed above.

25 Although both approaches do not explicitly mention DBpedia, but aim at completing missing key-value pairs in infoboxes, this can be directly transferred to extending DBpedia.

Muñoz et al. [68] propose extraction from tables in Wikipedia. They argue that for two entities co-occurring in a Wikipedia table, it is likely that the corresponding entities should share an edge in the knowledge graph. To fill in those edges, they first extract a set of candidates from the tables, using all possible relations that hold between at least one pair of entities in two columns. Then, based on a labeled subset of that extraction, they apply classification using various features to identify those relations that should actually hold in the knowledge graph.
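The candidate generation step can be sketched like this. It is a simplified, hypothetical version in which a table is a list of rows of already-disambiguated entities, and every relation holding between at least one entity pair of two columns becomes a candidate for all rows (the subsequent classification step is omitted):

```python
def table_candidates(rows, kg_triples):
    """Generate candidate triples for all column pairs of an entity table."""
    kg = set(kg_triples)
    n_cols = len(rows[0])
    candidates = set()
    for i in range(n_cols):
        for j in range(n_cols):
            if i == j:
                continue
            # relations observed between column i and column j in the KG
            relations = {p for (s, p, o) in kg
                         if any(r[i] == s and r[j] == o for r in rows)}
            for r in rows:
                for p in relations:
                    if (r[i], p, r[j]) not in kg:
                        candidates.add((r[i], p, r[j]))
    return candidates
```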

Ritze et al. [87] extend this approach to arbitrary HTML tables. This requires not only that pairs of table columns are matched to properties in the DBpedia ontology, but also that rows in the table are matched to entities in DBpedia. The authors propose an iterative approach to solve those two problems. The approach is evaluated on a gold standard mapping for a sample of HTML tables from the WebDataCommons Web Table corpus26. Since such tables can also contain literal values (such as population figures), the approach is capable of completing both relations between entities, and literal values for entities.

In [82], we have proposed the use of list pages in Wikipedia for generating both type and relation assertions in knowledge graphs, based on statistical methods. The idea is that entities appear together in list pages for a reason, and it should be possible to identify the common pattern appearing for the majority of the instances in the list page. For example, instances linked from the page List of Jewish-American Writers should all be typed as Writer and include an edge religion to Jewish, as well as an edge nationality to United States of America. Once such patterns are found for the majority of the list items, they can be applied to the remaining ones to fill gaps in the knowledge graph.
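A sketch of this majority-pattern idea follows, under a hypothetical data model: each list member is described by a set of statements, and statements shared by more than a threshold fraction of members are propagated to the remaining members (the actual work uses more elaborate statistical tests):

```python
from collections import Counter

def list_page_patterns(members, threshold=0.5):
    """Find statements shared by more than `threshold` of list members."""
    counts = Counter()
    for statements in members.values():
        counts.update(statements)
    n = len(members)
    return {st for st, c in counts.items() if c / n > threshold}

def complete_members(members, patterns):
    """Add the majority statements to every member missing them."""
    return {e: set(sts) | patterns for e, sts in members.items()}
```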

Many knowledge graphs contain links to other knowledge graphs. Those are often created automatically [70]. Interlinks between knowledge graphs can be used to fill gaps in one knowledge graph from information defined in another knowledge graph. If a mapping both on the instance and on the schema level is known, it can be exploited for filling gaps in knowledge graphs on both sides.

26 http://webdatacommons.org/webtables/

One work in this direction is presented by Bryl and Bizer [12], where different language versions of DBpedia (each of which can be seen as a knowledge graph of its own) are used to fill missing values in the English language DBpedia (the one which is usually meant when referring to DBpedia).

Dutta et al. [23] propose a probabilistic mapping between knowledge graphs. Based on distributions of types and properties, they create a mapping between knowledge graphs, which can then be used to derive additional, missing facts in the knowledge graphs. To that end, the type systems used by two knowledge graphs are mapped to one another. Then, types holding in one knowledge graph can be used to predict those that should hold in another.

6. Approaches for Error Detection in Knowledge Graphs

Like the completion methods discussed in the previous section, methods for identifying errors in knowledge graphs can target various types of information, i.e., type assertions, relations between individuals, literal values, and knowledge graph interlinks.

In this section, we survey methods for detecting errors in knowledge graphs. As in the previous section, we distinguish internal and external methods, and further group the approaches by the error detection target.

6.1. Internal Methods

Internal methods use only the information given in a knowledge graph to find out whether an axiom in the knowledge graph is plausible or not.

6.1.1. Methods for Finding Erroneous Type Assertions

Type assertions are most often more correct in knowledge graphs than relation assertions [79]. Hence, methods for finding erroneous type assertions are rather rare. One such method is proposed by Ma et al. [62], who use inductive logic programming for learning disjointness axioms, and then apply those disjointness axioms for identifying potentially wrong type assertions.

6.1.2. Methods for Finding Erroneous Relations

For building Knowledge Vault, Dong et al. use classification to tell relations which should hold in a knowledge graph from those which should not [21]. Like in the work by Muñoz et al. discussed above, each relation is used as an instance in the classification problem, with the existence of the relation in the knowledge graph being used as a binary class. This classification is used as a cleansing step after the knowledge extraction process. While the creation of positive training examples from the knowledge graph is quite straightforward, the authors propose the creation of negative training examples by applying a Local Closed World Assumption, assuming that a relation r between two entities e1 and e2 does not hold if it is not present in the knowledge graph, and there is a relation r between e1 and another entity e3.
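Under the Local Closed World Assumption as described, negative training examples can be generated as follows (a minimal sketch over toy triples; the entity and relation names are illustrative):

```python
def lcwa_negatives(kg_triples, candidate_pairs, relation):
    """Label (e1, e2) as a negative example for `relation` iff the triple
    is absent from the graph but e1 has `relation` to some other entity
    (Local Closed World Assumption)."""
    kg = set(kg_triples)
    has_relation = {s for (s, p, o) in kg if p == relation}
    return [(e1, relation, e2) for e1, e2 in candidate_pairs
            if (e1, relation, e2) not in kg and e1 in has_relation]
```

Pairs whose subject has no known value for the relation at all remain unknown rather than negative, which is exactly the "local" part of the assumption.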

In [79], we have proposed a statistical method for finding wrong statements within a knowledge graph. For each type of relation, we compute the characteristic distribution of subject and object types for each edge, i.e., each instantiation of the relation. Edges in the graph whose subject and object types strongly deviate from the characteristic distributions are then identified as potential errors.
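This intuition can be sketched as follows. It is a simplified, hypothetical scoring, not the actual SDValidate implementation: an edge is scored by the relative frequency of its object's type among all objects of that relation, so low scores flag unusual statements.

```python
from collections import Counter, defaultdict

def object_type_dists(triples, types):
    """Relative frequency of object types per relation."""
    counts = defaultdict(Counter)
    for s, p, o in triples:
        for t in types.get(o, []):
            counts[p][t] += 1
    return {p: {t: n / sum(c.values()) for t, n in c.items()}
            for p, c in counts.items()}

def edge_score(triple, dists, types):
    """Score of an edge: highest probability of any of its object's types.
    Low scores indicate statements deviating from the usual pattern."""
    s, p, o = triple
    dist = dists.get(p, {})
    return max((dist.get(t, 0.0) for t in types.get(o, [])), default=0.0)
```

A director edge pointing at a Book instead of the usual Person would receive a low score and be ranked high in the list of suspected errors.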

Reasoning is a field of study in the artificial intelligence community which deals with automatically deriving proofs for theorems, and with uncovering contradictions in a set of axioms [89]. The techniques developed in this field have been widely adopted in the semantic web community, leading to the development of a larger number of ontology reasoners [19,20,61].

For exploiting reasoning for error checking in knowledge graphs, a rich ontology is required, which defines the possible types of nodes and edges in a knowledge graph, as well as the restrictions that hold on them. For example, if a person is defined to be the capital of a state, this is a contradiction, since capitals are cities, and cities and persons are disjoint, i.e., no entity can be a city and a person at the same time. Reasoning is often used at the building stage of a knowledge graph, i.e., when new axioms are about to be added. For example, NELL and PROSPERA perform reasoning at that point to determine whether the new axiom is plausible or not, and discard implausible ones [14,69]. For real-world knowledge graphs, reasoning can be difficult due to the presence of errors and noise in the data [85,42].

Works using reasoning as a refinement operation for knowledge graphs have also been proposed. However, many knowledge graphs, such as DBpedia, come with ontologies that are not rich enough to perform reasoning for inconsistency detection – for example, they lack the class disjointness assertions needed for an inference as in the example above. Therefore, approaches exploiting reasoning are typically used in conjunction with methods for enriching ontologies, such as statistical methods, as proposed in [41] and [99], or association rule mining, as in [52]. In all of those works, the ontology at hand is enriched with further axioms, which can then be used for detecting inconsistencies. For example, if a reasoner concludes that an entity should be both a person and an organization, and a disjointness axiom between the two types has been added in the enrichment step, the reasoner can state that one out of a few axioms in the knowledge graph has to be wrong.
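A minimal sketch of disjointness-based inconsistency detection follows. In practice the disjointness axioms would come from an enrichment step as described; here they are hard-coded toy data, and full ontology reasoning is reduced to a direct set check:

```python
def inconsistent_entities(entity_types, disjoint_pairs):
    """Return entities typed with two classes declared disjoint."""
    disjoint = {frozenset(p) for p in disjoint_pairs}
    conflicts = {}
    for entity, types in entity_types.items():
        hits = [p for p in disjoint if p <= set(types)]
        if hits:
            conflicts[entity] = hits
    return conflicts
```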

In [83], a light-weight reasoning approach is proposed to compare actual and defined domains and ranges of relations in a knowledge graph schema. The authors propose a set of heuristics for fixing the schema if the actual and the defined domain or range strongly deviate.

6.1.3. Methods for Finding Erroneous Literal Values

Outlier detection or anomaly detection methods aim at identifying those instances in a dataset that deviate from the majority of the data, i.e., that follow different characteristics than the rest of the data [15,38].

As outlier detection in most cases deals with numeric data, numeric literals are a natural target for those methods. In [105], we have proposed the application of different univariate outlier detection methods (such as interquartile range or kernel density estimation) to DBpedia. Although outlier detection does not only identify errors, but also natural outliers (such as the population of very large cities), it has been shown that the vast majority of outliers identified are actual errors in DBpedia, mostly resulting from mistakes made when parsing strings using various number formats and units of measurement.
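As an illustration, interquartile-range (IQR) based outlier detection on numeric literals can be sketched like this (standard IQR fences with the usual factor 1.5; the toy values stand in for hypothetical population figures, one of which is a parsing error):

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] as outlier candidates."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

As noted above, a flagged value may also be a natural outlier, so the output is a candidate list for further checking, not a list of confirmed errors.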

To lower the influence of natural outliers, an extension of that approach has been presented in [24], where the instance set under inspection is first split into smaller subsets. For example, population values are inspected for countries, cities, and towns in isolation; thus, the distributions are more homogeneous, which leads to a higher precision in error identification. Furthermore, the approach foresees cross-checking found outliers with other knowledge graphs in order to further reduce the influence of natural outliers, which makes it a mixed approach with both an internal and an external component.

6.1.4. Methods for Finding Erroneous Knowledge Graph Interlinks

In [77], we have shown that outlier detection is not only applicable to numerical values, but also to other targets, such as knowledge graph interlinks. To that end, the interlinks are represented as multi-dimensional feature vectors, e.g., with each type of the respective entity in both knowledge graphs being a binary feature. In that feature space, standard outlier detection techniques such as Local Outlier Factor [11] or cluster-based outlier detection [34] can be used to assign outlier scores. Based on those scores, implausible links, such as an owl:sameAs assertion between a person and a book, can be identified based only on the overall distribution of all links, where such a combination is infrequent.

The work in [57] tries to learn arithmetic relations between attributes, e.g., lessThan or greaterThan, using probabilistic modeling. For example, the birth date of a person must be before her death date, the total area of a country must be larger than the area covered by water, etc. Violations of those relations are then used to identify errors.27
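Checking such constraints is straightforward once they are known; a sketch follows, with two hand-written constraints standing in for the learned ones (attribute names and data are hypothetical):

```python
def constraint_violations(entities, constraints):
    """Return (entity, constraint) pairs where an arithmetic relation
    between two attributes is violated.

    constraints: list of (attr_a, relation, attr_b) with relation in
    {"lessThan", "greaterThan"}."""
    violations = []
    for name, attrs in entities.items():
        for a, rel, b in constraints:
            if a not in attrs or b not in attrs:
                continue
            va, vb = attrs[a], attrs[b]
            if rel == "lessThan" and not va < vb:
                violations.append((name, (a, rel, b)))
            elif rel == "greaterThan" and not va > vb:
                violations.append((name, (a, rel, b)))
    return violations
```

Note that, as remarked in the footnote, a violation only flags a pair of statements as jointly impossible; it does not say which of the two is the wrong one.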

6.2. External Methods

Purely automatic external methods for error detection in knowledge graphs are still rare. Semi-automatic approaches, which exploit human knowledge, have also been proposed.

6.2.1. Methods for Finding Erroneous Relations

Most external methods are targeted at finding erroneous relations in knowledge graphs. One of the few works is DeFacto [53]. The system uses a database of lexicalizations for predicates in DBpedia. Based on those lexicalizations, it transforms statements in DBpedia into natural language sentences, and uses a web search engine to find web pages containing those sentences. Statements with no or only very few web pages supporting the corresponding sentences are then assigned a low confidence score.

27 Note that an error here is not a single statement, but a pair of statements that cannot be true at the same time. Thus, the approach does not trivially lead to a fully automatic repairing mechanism (unless both statements are removed, which means that most likely, one correct statement is removed as well).

Page 13: Knowledge Graph Refinement: A Survey of …pdfs.semanticscholar.org/93b6/329091e215b9ef007a85c07635...2 Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods

Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods 13

Apart from fully automatic methods, semi-automatic methods involving users have been proposed for validating knowledge graphs, such as crowdsourcing with microtasks [1]. In order to increase the user involvement and motivation, game-based approaches (i.e., games with a purpose) have been proposed [36,47,94,102]. In a wider sense, those can also be viewed as external methods, with the human in the loop being the external source of information.

Generally, a crucial issue with human computation is the size of web scale knowledge graphs. In [79], it has been argued that validating the entire DBpedia knowledge graph with the crowdsourcing approach proposed in [1] – extrapolating the reported task completion times – would take more than 3,000 years. To overcome such scaling problems, we have recently proposed a clustering of inconsistencies identified by automatic means, which allows presenting only representative examples to the human for inspection [81]. We have shown that most of the clusters have a common root cause in the knowledge graph construction (e.g., a wrong mapping rule or a programming error), so that by inspecting only a few dozen examples (and addressing the respective root causes), millions of statements can be corrected.

6.2.2. Methods for Finding Erroneous Literal Values

While most of the crowdsourcing approaches above focus on relations in the knowledge graph, the work in [1] uses similar mechanisms for validating knowledge graph interlinks and literal values.

In [59], an automatic approach using knowledge graph interlinks for detecting wrong numerical values is proposed. The authors exploit links between identical resources and apply different matching functions between properties in the individual sources. Facts in one knowledge graph are assumed to be wrong if multiple other sources have a consensus for a conflicting fact (e.g., a radically different population figure).
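A sketch of such a consensus check follows, under hypothetical simplifications: values for the same property are already matched across sources, and a value is flagged when the other sources agree with each other but disagree with it (the threshold and tolerance are illustrative):

```python
import statistics

def flag_by_consensus(own_value, other_values, tolerance=0.1, min_sources=3):
    """Flag `own_value` as suspicious if at least `min_sources` other
    sources agree with each other (within a relative `tolerance`) but
    disagree with our value."""
    if len(other_values) < min_sources:
        return False
    consensus = statistics.median(other_values)
    others_agree = all(abs(v - consensus) <= tolerance * abs(consensus)
                       for v in other_values)
    we_deviate = abs(own_value - consensus) > tolerance * abs(consensus)
    return others_agree and we_deviate
```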

7. Findings from the Survey

From the survey in the last two sections, we can observe that there are quite a few works proposed for knowledge graph refinement, both for automatic completion and for error detection. Tables 3 to 5 sum up the results from the previous sections.

By taking a closer look at those results, we can derive some interesting findings, both with respect to the approaches, as well as with respect to evaluation methodologies.

7.1. Approaches

A first interesting observation is that our distinction between completion and error detection is a strict one. That is, there exist almost no approaches which do both completion and correction at the same time. The only exception we found is the pairing of the two approaches SDType and SDValidate [79], which are two closely related algorithms that share the majority of the computations and can output both completion axioms and errors.

For many of the approaches, it is not obvious why they were only used for one purpose. For example, many of the probabilistic and NLP-based completion approaches search for evidence for missing axioms, e.g., by means of scanning text corpora. Similarly, many completion approaches ultimately compute a confidence score, which is then combined with a suitable threshold for completing a knowledge graph. In principle, they could also be used for error detection by flagging axioms for which no or only little evidence was found, or those with a low confidence score, as wrong.

Furthermore, in particular in the machine learning area, approaches exist which can be used for simultaneously creating a predictive model and creating weights for pieces of information. For example, random forests can assign weights to attributes [58], whereas boosting assigns weights to instances [25], which can also be interpreted as outlier scores [16]. Such approaches could be a starting point for developing methods for simultaneous completion and error detection in knowledge graphs.

Along the same lines, there are hardly any error detection approaches which are also suitable for correcting errors, i.e., which suggest fixes for the errors found. Here, a combination of completion and error detection methods could be of great value: once an error is detected, the erroneous axiom(s) could be removed, and a correction algorithm could try to find a new (and, in the best case, more accurate) replacement for the removed axiom(s).

In addition to the strict separation of completion and correction, we also observe that most of the approaches focus on only one target, i.e., types, relations, literals, etc. Approaches that simultaneously try to complete or correct, e.g., type and relation assertions in a knowledge graph, are also quite rare.

For the approaches that perform completion, all works examined in this survey try to add missing types for or relations between existing entities in the knowledge graph. In contrast, we have not observed



Table 3: Overview of knowledge graph completion approaches (part 1). Abbreviations used: Target (T=Types, R=Relations), Type (I=internal, E=external), Evaluation (RE=retrospective evaluation, PG=partial gold standard, either available (a) or unavailable (n/a), KG=evaluation against knowledge graph, SV=split validation, CV=cross validation), Metrics (P/R=precision and recall, A=accuracy, AUC-PR=area under the precision-recall curve, ROC=area under the ROC curve, T=total new statements). Comp.: evaluation or materialization carried out on whole knowledge graph or not, Perf.: computational performance reported or not.

Paper                   | Target | Type | Methods and Sources                                      | Knowledge Graph(s)            | Eval.    | Metrics      | Comp. | Perf.
Paulheim [76]           | T      | I    | Association Rule Mining                                  | DBpedia                       | RE       | P, T         | no    | yes
Nickel et al. [72]      | T      | I    | Matrix Factorization                                     | YAGO                          | KG (CV)  | P/R, AUC-PR  | yes   | yes
Paulheim/Bizer [78,79]  | T      | I    | Likelihood based                                         | DBpedia, OpenCyc, NELL        | KG, RE   | P/R, T       | yes   | no
Sleeman et al. [96]     | T      | I    | Topic Modeling                                           | DBpedia                       | KG (SV)  | P/R          | no    | no
Nuzzolese et al. [74]   | T      | E    | different machine learning methods, Wikipedia link graph | DBpedia                       | PG (a)   | P/R          | yes   | no
Gangemi et al. [28]     | T      | E    | NLP on Wikipedia abstracts                               | DBpedia                       | PG (a)   | P/R          | no    | yes
Kliegr [46]             | T      | E    | NLP on Wikipedia abstracts                               | DBpedia                       | PG (n/a) | P/R          | no    | no
Aprosio et al. [3]      | T      | E    | k-NN classification, different DBpedia language editions | DBpedia                       | PG (n/a) | P/R          | yes   | no
Dutta et al. [23]       | T      | E    | Knowledge graph fusion (statistical)                     | NELL, DBpedia                 | PG (a)   | P/R          | no    | no
Sleeman/Finin [95]      | T      | I,E  | SVM, using other KGs                                     | DBpedia, Freebase, Arnetminer | KG       | P/R          | no    | no
Socher et al. [97]      | R      | I    | Neural Tensor Network                                    | WordNet, Freebase             | KG (SV)  | A            | no    | no
Krompaß et al. [49]     | R      | I    | Latent variable models                                   | DBpedia, Freebase, YAGO       | KG (SV)  | AUC-PR, ROC  | no    | no
Kolthoff/Dutta [48]     | R      | I    | Rule mining                                              | DBpedia, YAGO                 | KG       | P/R          | no    | no
Zhao et al. [109]       | R      | I    | Learning Embeddings                                      | WordNet, Freebase             | KG       | P            | no    | no

Page 15: Knowledge Graph Refinement: A Survey of …pdfs.semanticscholar.org/93b6/329091e215b9ef007a85c07635...2 Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods

Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods 15Ta

ble

4:O

verv

iew

ofkn

owle

dge

grap

hco

mpl

etio

nap

proa

ches

(par

t2)

.A

bbre

viat

ions

used

:Ta

rget

(T=T

ypes

,R

=Rel

atio

ns),

Type

(I=i

nter

nal,

E=e

xter

nal)

,Eva

luat

ion

(RE

=ret

rosp

ectiv

eev

alua

tion,

PG=p

artia

lgol

dst

anda

rd,e

ither

avai

labl

e(a

)oru

nava

ilabl

e(n

/a),

KG

=eva

luat

ion

agai

nstk

now

l-ed

gegr

aph,

SV=s

plit

valid

atio

n,C

V=c

ross

valid

atio

n),

Met

rics

(P/R

=pre

cisi

onan

dre

call,

A=a

ccur

acy,

AU

C-P

R=a

rea

unde

rpr

ecis

ion-

reca

ll-cu

rve,

T=t

otal

new

stat

emen

ts).

Com

p.:e

valu

atio

nor

mat

eria

lizat

ion

carr

ied

outo

nw

hole

know

ledg

egr

aph

orno

t,Pe

rfor

man

ce:c

ompu

tatio

nalp

erfo

rman

cere

port

edor

not.

Pape

rTa

rget

Type

Met

hods

and

Sour

ces

Kno

wle

dge

Gra

ph(s

)E

val.

Met

rics

Who

leC

omp.

Gal

árra

gaet

al.[

26]

RI

Ass

ocia

tion

Rul

eM

inin

gYA

GO

,DB

pedi

aK

G,R

EA

,Tye

sye

sK

imet

al.[

45]

RI

Ass

ocia

tion

Rul

eM

inin

gD

Bpe

dia

RE

P,T

yes

noB

ryl/B

izer

[12]

RE

Fusi

onof

DB

pedi

ala

ngua

geed

ition

sD

Bpe

dia

PG(a

)A

nono

Apr

iosi

oet

al.[

4]R

ED

ista

ntsu

perv

isio

nan

dN

LP

tech

niqu

eson

Wik

iped

iate

xtco

rpus

DB

pedi

aK

GP/

Rno

no

Lan

geet

al.[

51]

RE

Patte

rnle

arni

ngon

Wik

iped

iaab

stra

cts

(DB

pedi

a)K

G(C

V)

P/R

yes

yes

Wu

etal

.[10

6]R

EPa

ttern

lear

ning

onW

ikip

edia

artic

les,

Web

sear

ch(D

Bpe

dia)

KG

(CV

)P/

Rno

no

Min

tzet

al.[

67]

RE

Dis

tant

supe

rvis

ion

and

NL

Pte

chni

ques

onW

ikip

edia

text

corp

us

Free

base

KG

P/R

nono

Ger

ber/

Ngo

nga

Ngo

mo

[30]

RE

Dis

tant

supe

rvis

ion

and

NL

Pte

chni

ques

ontw

oco

rpor

aD

Bpe

dia

KG

,RE

P/R

nono

Ger

bere

tal.

[29]

RE

NL

Pan

dpa

ttern

min

ing

onR

SSne

ws

feed

sD

Bpe

dia

PGP/

R,A

yes

yes

Wes

teta

l.[1

04]

RE

sear

chen

gine

sFr

eeba

seK

GP/

R,r

ank

nono

Mu

noz

etal

.[68

]R

ESt

atis

tical

mea

sure

s,m

achi

nele

arni

ng,

usin

gW

ikip

edia

ta-

bles

DB

pedi

aPG

(a)

P/R

yes

no

Ritz

eet

al.[

87]

R,L

ESc

hem

aan

din

stan

cem

atch

-in

gus

ing

HT

ML

tabl

esD

Bpe

dia

PG(a

)P/

Rye

sno

Paul

heim

/Pon

zetto

[82]

T,R

ESt

atis

tical

mea

sure

s,W

ikip

edia

listp

ages

DB

pedi

a–

–no

no

Page 16: Knowledge Graph Refinement: A Survey of …pdfs.semanticscholar.org/93b6/329091e215b9ef007a85c07635...2 Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods

16 Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods

Table5:

Overview

oferror

detectionapproaches.

Abbreviations

used:Target

(T=Types,

R=R

elations,L

=Literals,

I=Interlinks,S=Schem

a),Type

(I=internal,E

=external),E

valuation(R

E=retrospective

evaluation,PG

=partialgold

standard,either

available(a)

orunavailable

(n/a),K

G=evaluation

againstknow

ledgegraph,SV

=splitvalidation,C

V=cross

validation),Metrics

(P/R=precision

andrecall,A

=accuracy,AU

C-PR

=areaunder

precision-recall-curve,R

OC

=Area

undertheR

OC

curve,T=totalnew

statements,R

MSE

=RootM

eanSquared

Error).C

omp.:evaluation

ormaterialization

carriedouton

whole

knowledge

graphornot,Perform

ance:computationalperform

ancereported

ornot.

PaperTarget

TypeM

ethodsand

SourcesK

nowledge

Graph(s)

Eval.

Metrics

Whole

Com

p.M

aetal.[62]

TI

Reasoning,

Association

Rule

Mining

DB

pedia,Zhishi.m

eR

EP,T

yesno

Dong

etal.[21]R

IC

lassificationK

nowledge

Vault

KG

(SV)

AU

C-PR

yesno

Lietal.[57]

LI

ProbabilisticM

odelingD

Bpedia

RE

P,Tyes

yesN

akasholeetal.[69]

RI

Reasoning

ProsperaR

EP,T

yesyes

Paulheim/B

izer[79]R

IProbabilistic

DB

pedia,NE

LL

RE

Pyes

noL

ehmann

etal.[53]R

EText

patterninduction,W

ebsearch

enginesD

Bpedia

KG

(CV

)P/R

,RO

C,

RM

SEno

yes

Töpperetal.[99]

R,S

IStatisticalm

ethods,Reason-

ingD

Bpedia

RE

P,Tyes

no

Jangetal.[41]

RI

Statisticalmethods

DB

pediaR

EP,R

yesno

Lehm

ann/Bühm

ann[52]

R,T

IR

easoning,ILP

DB

pedia,Open

Cyc,

sevensm

allerontolo-gies

RE

Ayes

yes

Wienand/Paulheim

[105]L

IO

utlierDetection

DB

pediaR

EP,T

nono

Fleischhackeretal.[24]L

I,EO

utlierD

etectionand

Data

Fusionw

ithotherK

GD

Bpedia,N

EL

LR

EA

UC

-R

OC

nono

Liu

etal.[59]L

EM

atchingto

otherKG

sD

Bpedia

RE

P,Tno

noPaulheim

[77]I

IO

utlierDetection

DB

pedia+

two

linkedgraphs

PG(a)

P/R,R

OC

yesno

Acosta

etal.[1]L

,IE

Crow

dsourcingD

Bpedia

PG(a)

Pno

noW

aitelonisetal.[102]

RE

Quiz

game

DB

pediaR

EP,T

noyes

Hees

etal.[36]R

ETw

o-playergame

DB

pediaR

ET

noyes

Paulheimand

Gangem

i[81]R

ER

easoning,clustering,

hu-m

aninspection

DB

pediaR

EP,T

yesyes

Page 17: Knowledge Graph Refinement: A Survey of …pdfs.semanticscholar.org/93b6/329091e215b9ef007a85c07635...2 Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods

Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods 17

any approaches which populate the knowledge graphwith new entities. Here, entity set expansion methods,which have been deeply investigated in the NLP field[75,91,103], would be an interesting fit to further in-crease the coverage of knowledge graphs, especiallyfor less well-known long tail entities.

Another interesting observation is that, although the discussed works address knowledge graphs, only very few of them are, in the end, genuinely graph-based approaches. In many cases, simplistic transformations to a propositional problem formulation are taken. Here, methods from the graph mining literature still await their application to knowledge graphs. In particular, for many of the methods applied in the works discussed above – such as outlier detection or association rule mining – graph-based variants have been proposed in the literature [2,43]. Likewise, graph kernel functions – which can be used in Support Vector Machines as well as other machine learning algorithms – have been proposed for RDF graphs [40,60,18] and hence could be applied to many web knowledge graphs.
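To illustrate what a genuinely graph-based feature representation can look like, here is a minimal sketch of Weisfeiler-Lehman-style relabeling over an RDF-like triple set, in the spirit of the RDF graph kernels cited above [18,40,60]; the triples are invented for illustration and the code is a simplification, not the implementation of any of those kernels.

```python
from collections import Counter

def wl_features(triples, iterations=2):
    """Weisfeiler-Lehman-style relabeling for an RDF-like triple set:
    each iteration refines a node's label with the sorted multiset of
    (predicate, neighbor label) pairs over its outgoing edges. The
    resulting label histogram can feed a graph kernel or any
    feature-based learner without flattening the graph first."""
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    labels = {n: "*" for n in nodes}  # uniform start label
    features = Counter(labels.values())
    for _ in range(iterations):
        # The comprehension reads the old `labels` dict before rebinding.
        labels = {n: labels[n] + "|" + str(sorted(
                      (p, labels[o]) for s, p, o in triples if s == n))
                  for n in nodes}
        features.update(labels.values())
    return features

# Two hypothetical entities with identical neighborhoods receive
# identical refined labels, making their structural similarity explicit:
g = [("Berlin", "type", "City"), ("Berlin", "locatedIn", "Germany"),
     ("Hamburg", "type", "City"), ("Hamburg", "locatedIn", "Germany")]
feats = wl_features(g)
```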

7.2. Evaluation Methodologies

For evaluation methodologies, our first observation is that a variety of evaluation metrics is used across the papers examined. Precision and recall (or precision and the total number of statements for retrospective evaluations) are clearly the most used metrics, with others – such as ROC curves, accuracy, or Root Mean Squared Error – occasionally being used as well.

With respect to the overall methodology, the results are more mixed. Evaluations using the knowledge graph as a silver standard, retrospective evaluations, and evaluations based on partial gold standards appear at roughly equal frequency, with retrospective evaluations mostly used for error detection. The latter is not too surprising: due to the high quality of most knowledge graphs used in the evaluations, partial gold standards based on random samples are likely to contain only few errors. For partial gold standards, it is crucial to point out that the majority of authors make those partial gold standards public²⁸, which allows for replication and comparison.

²⁸ For this survey, we counted a partial gold standard as public if there was a working download link in the paper, but we did not make any additional efforts to search for the gold standard, such as contacting the authors.
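The methodological distinction matters when computing metrics: against a partial gold standard, only statements the gold standard actually labels can be judged, and everything else must be ignored rather than counted as an error. A minimal sketch with entirely hypothetical statement sets:

```python
def precision_recall_partial(predicted, gold_correct, gold_wrong):
    """Precision/recall against a partial gold standard: only predicted
    statements labeled by the gold standard (as correct or wrong) are
    judged; unjudged statements are ignored, not penalized."""
    judged = [s for s in predicted if s in gold_correct or s in gold_wrong]
    tp = sum(1 for s in judged if s in gold_correct)
    precision = tp / len(judged) if judged else 0.0
    recall = tp / len(gold_correct) if gold_correct else 0.0
    return precision, recall

# Hypothetical type assertions produced by a completion approach:
predicted = {("Berlin", "City"), ("Berlin", "Person"), ("Ruby", "Language")}
gold_correct = {("Berlin", "City"), ("Paris", "City")}
gold_wrong = {("Berlin", "Person")}
print(precision_recall_partial(predicted, gold_correct, gold_wrong))  # (0.5, 0.5)
```

Note that ("Ruby", "Language") does not affect precision at all, which is exactly why partial gold standards on random samples are ill-suited for error detection on high-quality graphs.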

The major knowledge graph used in the evaluations is DBpedia. This, in principle, makes the results comparable to a certain extent, although roughly each year, a new version of DBpedia is published, so that papers from different years are likely to be evaluated on slightly different knowledge graphs.

That being said, we have observed that roughly two out of three approaches evaluated on DBpedia are evaluated on DBpedia alone. Along the same lines, about half of the approaches reviewed in this survey are only evaluated on a single knowledge graph. This, in many cases, limits the significance of the results. For some works, it is clear that they can only work on a specific knowledge graph by design, e.g., because they exploit the implicit linkage between a DBpedia entity and the corresponding Wikipedia page.

As discussed in section 2, knowledge graphs differ heavily in their characteristics. Thus, for an approach evaluated on only one graph, it is unclear whether it would perform similarly on another knowledge graph with different characteristics, or whether it exploits some (maybe not even obvious) characteristics of that knowledge graph, and/or overfits to particular characteristics of that graph.

Last, but not least, we have observed that only a minority of approaches have been evaluated on a whole, large-scale knowledge graph. Moreover, statements about computational performance are only rarely included in the corresponding papers²⁹. In the age of large-scale knowledge graphs, we think that this is a dimension that should not be neglected.

²⁹ Even though we were relaxed on this policy and also counted informal statements about the computational performance as a performance evaluation.

In order to make future works on knowledge graph refinement comparable, it would be useful to have a common selection of benchmarks. This has been done in other fields of the semantic web as well, such as for schema and instance matching [22], reasoning [8], or question answering [17]. Such benchmarks could serve for comparing both the qualitative and the computational performance of different approaches.

8. Conclusion

In this paper, we have presented a survey on knowledge graph refinement methods. We distinguish completion from error detection, and internal from external methods. We have shown that a large body of works exists which apply very different methods, ranging from techniques from the machine learning field to NLP related techniques.

The survey has revealed that there are, at the moment, hardly any approaches which simultaneously try to improve completeness and correctness of knowledge graphs, and that most approaches address only one target, such as type or relation assertions, or literal values. Holistic solutions which simultaneously improve the quality of knowledge graphs in many different aspects are currently not observed.

Looking at the evaluation methods, the picture is quite diverse. The different methodologies – using the knowledge graph itself as a silver standard, using a partial gold standard, and performing a retrospective evaluation – are about equally distributed. Furthermore, approaches are often only evaluated on one specific knowledge graph. This makes it hard to compare approaches and to make general statements about their relative performance.

In addition, scalability issues are only rarely addressed by current research works. In the light of the advent of web-scale knowledge graphs, however, this is an aspect which will be of growing importance.

To sum up, this survey shows that automatic knowledge graph refinement is a relevant and flourishing research area. At the same time, it has pointed out some uncharted territories on the research map, which we hope will inspire researchers in the area.

References

[1] Maribel Acosta, Amrapali Zaveri, Elena Simperl, DimitrisKontokostas, Sören Auer, and Jens Lehmann. Crowdsourcinglinked data quality assessment. In The Semantic Web–ISWC2013, pages 260–276. Springer, 2013.

[2] Leman Akoglu, Hanghang Tong, and Danai Koutra. Graphbased anomaly detection and description: a survey. Data Min-ing and Knowledge Discovery, pages 1–63, 2014.

[3] Alessio Palmero Aprosio, Claudio Giuliano, and AlbertoLavelli. Automatic expansion of DBpedia exploitingWikipedia cross-language information. In The Semantic Web:Semantics and Big Data, pages 397–411. Springer, 2013.

[4] Alessio Palmero Aprosio, Claudio Giuliano, and AlbertoLavelli. Extending the Coverage of DBpedia Properties us-ing Distant Supervision over Wikipedia. In NLP&DBpedia,2013.

[5] Christian Bizer, Tom Heath, and Tim Berners-Lee. Linkeddata-the story so far. International journal on semantic weband information systems, 5(3):1–22, 2009.

[6] Roi Blanco, Berkant Barla Cambazoglu, Peter Mika, andNicolas Torzec. Entity recommendations in web search. InThe Semantic Web–ISWC 2013, pages 33–48. Springer, 2013.

[7] David M Blei, Andrew Y Ng, and Michael I Jordan. La-tent dirichlet allocation. the Journal of machine Learning re-search, 3:993–1022, 2003.

[8] Jürgen Bock, Peter Haase, Qiu Ji, and Raphael Volz. Bench-marking OWL reasoners. In Proc. of the ARea2008 Work-shop, Tenerife, Spain (June 2008), 2008.

[9] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge,and Jamie Taylor. Freebase: a collaboratively created graphdatabase for structuring human knowledge. In Proceedingsof the 2008 ACM SIGMOD international conference on Man-agement of data, pages 1247–1250. ACM, 2008.

[10] Antoine Bordes and Evgeniy Gabrilovich. Constructing andMining Web-scale Knowledge Graphs. In Proceedings of the20th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining, pages 1967–1967, 2014.

[11] Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, andJörg Sander. LOF: identifying density-based local outliers.29(2):93–104, 2000.

[12] Volha Bryl and Christian Bizer. Learning conflict resolu-tion strategies for cross-language Wikipedia data fusion. InProceedings of the companion publication of the 23rd inter-national conference on World wide web companion, pages1129–1134, 2014.

[13] Paul Buitelaar, Philipp Cimiano, and Bernardo Magnini. On-tology learning from text: methods, evaluation and applica-tions, volume 123. IOS press, 2005.

[14] Andrew Carlson, Justin Betteridge, Richard C Wang, Este-vam R Hruschka Jr, and Tom M Mitchell. Coupled semi-supervised learning for information extraction. In Proceed-ings of the third ACM international conference on Web searchand data mining, pages 101–110. ACM, 2010.

[15] Varun Chandola, Arindam Banerjee, and Vipin Kumar.Anomaly detection: A survey. ACM Computing Surveys(CSUR), 41(3), 2009.

[16] Nathalie Cheze and Jean-Michel Poggi. Iterated Boosting forOutlier Detection. In Data Science and Classification, pages213–220. Springer, 2006.

[17] Philipp Cimiano, Vanessa Lopez, Christina Unger, ElenaCabrio, Axel-Cyrille Ngonga Ngomo, and Sebastian Wal-ter. Multilingual question answering over linked data (qald-3): Lab overview. In Information Access Evaluation. Multi-linguality, Multimodality, and Visualization, pages 321–332.Springer, 2013.

[18] Gerben KD de Vries. A fast approximation of the Weisfeiler-Lehman graph kernel for RDF data. In Machine Learning andKnowledge Discovery in Databases, pages 606–621. 2013.

[19] Kathrin Dentler, Ronald Cornet, ACM ten Teije, Nicolette Fde Keizer, et al. Comparison of reasoners for large ontologiesin the OWL 2 EL profile. 2011.

[20] Li Ding, Pranam Kolari, Zhongli Ding, and Sasikanth Avan-cha. Using ontologies in the semantic web: A survey. In On-tologies, pages 79–113. Springer, 2007.

[21] Xin Luna Dong, K Murphy, E Gabrilovich, G Heitz, W Horn,N Lao, Thomas Strohmann, Shaohua Sun, and Wei Zhang.Knowledge Vault: A Web-scale approach to probabilisticknowledge fusion, 2014.

[22] Zlatan Dragisic, Kai Eckert, Jerome Euzenat, Daniel Faria,Alfio Ferrara, Roger Granada, Valentina Ivanova, ErnestoJimenez-Ruiz, Andreas Kempf, Patrick Lambrix, et al. Re-sults of theOntology Alignment Evaluation Initiative 2014.In International Workshop on Ontology Matching, pages 61–

Page 19: Knowledge Graph Refinement: A Survey of …pdfs.semanticscholar.org/93b6/329091e215b9ef007a85c07635...2 Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods

Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods 19

104, 2014.[23] Arnab Dutta, Christian Meilicke, and Simone Paolo Ponzetto.

A Probabilistic Approach for Integrating HeterogeneousKnowledge Sources. In The Semantic Web: Trends and Chal-lenges, pages 286–301. Springer, 2014.

[24] Daniel Fleischhacker, Heiko Paulheim, Volha Bryl, JohannaVölker, and Christian Bizer. Detecting Errors in NumericalLinked Data Using Cross-Checked Outlier Detection. In TheSemantic Web–ISWC 2014, pages 357–372. Springer, 2014.

[25] Yoav Freund and Robert E Schapire. A desicion-theoreticgeneralization of on-line learning and an application toboosting. In Computational learning theory, pages 23–37.Springer, 1995.

[26] Luis Antonio Galárraga, Christina Teflioudi, Katja Hose, andFabian Suchanek. AMIE: association rule mining under in-complete evidence in ontological knowledge bases. In Pro-ceedings of the 22nd international conference on World WideWeb, pages 413–422, 2013.

[27] Aldo Gangemi, Nicola Guarino, Claudio Masolo, andAlessandro Oltramari. Sweetening WordNet with DOLCE.AI Magazine, (Fall), 2003.

[28] Aldo Gangemi, Andrea Giovanni Nuzzolese, Valentina Pre-sutti, Francesco Draicchio, Alberto Musetti, and Paolo Cian-carini. Automatic typing of DBpedia entities. In The Seman-tic Web–ISWC 2012, pages 65–81. Springer, 2012.

[29] Daniel Gerber, Sebastian Hellmann, Lorenz Bühmann, Tom-maso Soru, Ricardo Usbeck, and Axel-Cyrille NgongaNgomo. Real-time RDF extraction from unstructured datastreams. In The Semantic Web–ISWC 2013, pages 135–150.2013.

[30] Daniel Gerber and A-C Ngonga Ngomo. Bootstrapping thelinked data web. In Workshop on Web Scale Knowledge Ex-traction, volume 2011, 2011.

[31] Lise Getoor and Christopher P Diehl. Link mining: a survey.ACM SIGKDD Explorations Newsletter, 7(2):3–12, 2005.

[32] T. Groza, A. Oellrich, and N. Collier. Using silver and semi-gold standard corpora to compare open named entity recog-nisers. In IEEE International Conference on Bioinformaticsand Biomedicine (BIBM), pages 481–485, 2013.

[33] Harry Halpin, PatrickJ. Hayes, JamesP. McCusker, DeborahL.McGuinness, and HenryS. Thompson. When owl:sameAs Is-nâAZt the Same: An Analysis of Identity in Linked Data. InThe Semantic Web âAS ISWC 2010, pages 305–320. SpringerBerlin Heidelberg, 2010.

[34] Zengyou He, Xiaofei Xu, and Shengchun Deng. Discover-ing cluster-based local outliers. Pattern Recognition Letters,24(9):1641–1650, 2003.

[35] Marti A Hearst. Automatic acquisition of hyponyms fromlarge text corpora. In Proceedings of the 14th conference onComputational linguistics-Volume 2, pages 539–545. Associ-ation for Computational Linguistics, 1992.

[36] Jörn Hees, Thomas Roth-Berghofer, Ralf Biedert, BenjaminAdrian, and Andreas Dengel. Betterrelations: using a gameto rate linked data triples. In KI 2011: Advances in ArtificialIntelligence, pages 134–138. Springer, 2011.

[37] Jochen Hipp, Ulrich Güntzer, and Gholamreza Nakhaeizadeh.Algorithms for association rule miningâATa general sur-vey and comparison. ACM sigkdd explorations newsletter,2(1):58–64, 2000.

[38] Victoria J Hodge and Jim Austin. A survey of outlier detec-tion methodologies. Artificial Intelligence Review, 22(2):85–

126, 2004.[39] Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino,

Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, BilyanaTaneva, Stefan Thater, and Gerhard Weikum. Robust disam-biguation of named entities in text. In Proceedings of theConference on Empirical Methods in Natural Language Pro-cessing, pages 782–792, 2011.

[40] Yi Huang, Maximilian Nickel, Volker Tresp, and Hans-PeterKriegel. A scalable kernel approach to learning in semanticgraphs with applications to linked data. In 1st Workshop onMining the Future Internet, 2010.

[41] Saemi Jang, Megawati, Jiyeon Choi, and Mun Yong Yi. Semi-Automatic Quality Assessment of Linked Data without Re-quiring Ontology. In Workshop on NLP and DBpedia, 2015.

[42] Qiu Ji, Zhiqiang Gao, and Zhisheng Huang. Reasoning withnoisy semantic data. In The Semanic Web: Research and Ap-plications, pages 497–502. Springer, 2011.

[43] Chuntao Jiang, Frans Coenen, and Michele Zito. A survey offrequent subgraph mining algorithms. The Knowledge Engi-neering Review, 28(01):75–105, 2013.

[44] Ning Kang, Erik M van Mulligen, and Jan A Kors. Trainingtext chunkers on a silver standard corpus: can silver replacegold? BMC bioinformatics, 13(1):17, 2012.

[45] Jiseong Kim, Eun-Kyung Kim, Yousung Won, Sangha Nam,and Key-Sun Choi. The Association Rule Mining Systemfor Acquiring Knowledge of DBpedia from Wikipedia Cate-gories. In Workshop on NLP and DBpedia, 2015.

[46] Tomáš Kliegr. Linked Hypernyms: Enriching DBpedia withTargeted Hypernym Discovery. Web Semantics: Science, Ser-vices and Agents on the World Wide Web.

[47] Magnus Knuth, Johannes Hercher, and Harald Sack. Collab-oratively Patching Linked Data. In Workshop on Usage Anal-ysis and the Web of Data (USEWOD, 2012.

[48] Christian Kolthoff and Arnab Dutta. Semantic Relation Com-position in Large Scale Knowledge Bases. In 3rd Workshopon Linked Data for Information Extraction, 2015.

[49] Denis Krompaß, Stephan Baier, and Volker Tresp. Type-Constrained Representation Learning in Knowledge Graphs.In International Semantic Web Conference, 2015.

[50] John Lafferty, Andrew McCallum, and Fernando CN Pereira.Conditional random fields: Probabilistic models for segment-ing and labeling sequence data. In 18th International Confer-ence on Machine Learning, pages 282–289, 2001.

[51] Dustin Lange, Christoph Böhm, and Felix Naumann. Extract-ing structured information from Wikipedia articles to popu-late infoboxes. In Proceedings of the 19th ACM Conferenceon Information and Knowledge Management (CIKM), pages1661–1664, Toronto, Canada, 0 2010.

[52] Jens Lehmann and Lorenz Bühmann. ORE-a tool for repair-ing and enriching knowledge bases. In The Semantic Web–ISWC 2010, pages 177–193. Springer, 2010.

[53] Jens Lehmann, Daniel Gerber, Mohamed Morsey, and Axel-Cyrille Ngonga Ngomo. Defacto-deep fact validation. In TheSemantic Web–ISWC 2012, pages 312–327. Springer, 2012.

[54] Jens Lehmann and Pascal Hitzler. Concept Learning inDescription Logics Using Refinement Operators. MachineLearning, 78:203–250, 2010.

[55] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dim-itris Kontokostas, Pablo N. Mendes, Sebastian Hellmann,Mohamed Morsey, Patrick van Kleef, SÃuren Auer, andChristian Bizer. DBpedia – A Large-scale, Multilingual

Page 20: Knowledge Graph Refinement: A Survey of …pdfs.semanticscholar.org/93b6/329091e215b9ef007a85c07635...2 Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods

20 Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods

Knowledge Base Extracted from Wikipedia. Semantic WebJournal, 2013.

[56] Douglas B Lenat. CYC: A large-scale investment in knowl-edge infrastructure. Communications of the ACM, 38(11):33–38, 1995.

[57] Huiying Li, Yuanyuan Li, Feifei Xu, and Xinyu Zhong. Prob-abilistic Error Detecting in Numerical Linked Data. InDatabase and Expert Systems Applications, pages 61–75.Springer, 2015.

[58] Andy Liaw and Matthew Wiener. Classification and Regres-sion by randomForest. R news, 2(3):18–22, 2002.

[59] Shuangyan Liu, Mathieu dâAZAquin, and Enrico Motta. To-wards Linked Data Fact Validation through Measuring Con-sensus. 2015.

[60] Uta Lösch, Stephan Bloehdorn, and Achim Rettinger. Graphkernels for RDF data. In The Semantic Web: Research andApplications, pages 134–148. 2012.

[61] Marko Luther, Thorsten Liebig, Sebastian Böhm, and OlafNoppens. Who the Heck is the Father of Bob? In The Seman-tic Web: Research and Applications, pages 66–80. Springer,2009.

[62] Yanfang Ma, Huan Gao, Tianxing Wu, and Guilin Qi. Learn-ing Disjointness Axioms With Association Rule Mining andIts Application to Inconsistency Detection of Linked Data.In Dongyan Zhao, Jianfeng Du, Haofen Wang, Peng Wang,Donghong Ji, and Jeff Z. Pan, editors, The Semantic Web andWeb Science, volume 480 of Communications in Computerand Information Science, pages 29–41. Springer, 2014.

[63] Alexander Maedche. Ontology learning for the semantic web.Springer Science & Business Media, 2002.

[64] Farzaneh Mahdisoltani, Joanna Biega, and Fabian M.Suchanek. YAGO3: A Knowledge Base from MultilingualWikipedias. In Conference on Innovative Data Systems Re-search, 2015.

[65] Robert Meusel, Petar Petrovski, and Christian Bizer.The WebDataCommons Microdata, RDFa and MicroformatDataset Series. In The Semantic Web–ISWC 2014, pages 277–292. Springer, 2014.

[66] George A Miller. WordNet: a lexical database for English.Communications of the ACM, 38(11):39–41, 1995.

[67] Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. Dis-tant supervision for relation extraction without labeled data.In Proceedings of the Joint Conference of the 47th AnnualMeeting of the ACL and the 4th International Joint Confer-ence on Natural Language Processing of the AFNLP, pages1003–1011, 2009.

[68] Emir Muñoz, Aidan Hogan, and Alessandra Mileo. Triplify-ing Wikipedia’s Tables. In Linked Data for Information Ex-traction, 2013.

[69] Ndapandula Nakashole, Martin Theobald, and GerhardWeikum. Scalable knowledge harvesting with high precisionand high recall. In Proceedings of the fourth ACM interna-tional conference on Web search and data mining, pages 227–236. ACM, 2011.

[70] Markus Nentwig, Michael Hartung, Axel-Cyrille NgongaNgomo, and Erhard Rahm. A Survey of Current Link Dis-covery Frameworks. Semantic Web Journal, 2015.

[71] Jennifer Neville and David Jensen. Iterative classification inrelational data. In Proc. AAAI-2000 Workshop on LearningStatistical Models from Relational Data, pages 13–20, 2000.

[72] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel.

Factorizing YAGO: Scalable Machine Learning for LinkedData. In Proceedings of the 21st International Conference onWorld Wide Web, pages 271–280, 2012.

[73] Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy,and James R Curran. Learning multilingual named entityrecognition from Wikipedia. Artificial Intelligence, 194:151–175, 2013.

[74] Andrea Giovanni Nuzzolese, Aldo Gangemi, Valentina Pre-sutti, and Paolo Ciancarini. Type inference through the anal-ysis of Wikipedia links. In LDOW, 2012.

[75] Patrick Pantel, Eric Crestan, Arkady Borkovsky, Ana-MariaPopescu, and Vishnu Vyas. Web-scale Distributional Simi-larity and Entity Set Expansion. In Proceedings of the 2009Conference on Empirical Methods in Natural Language Pro-cessing: Volume 2 - Volume 2, pages 938–947, 2009.

[76] Heiko Paulheim. Browsing linked open data with auto com-plete. Semantic Web Challenge, 2012.

[77] Heiko Paulheim. Identifying Wrong Links between Datasetsby Multi-dimensional Outlier Detection. In InternationalWorkshop on Debugging Ontologies and Ontology Mappings,2014.

[78] Heiko Paulheim and Christian Bizer. Type inference on noisyrdf data. In The Semantic Web–ISWC 2013, pages 510–525.Springer, 2013.

[79] Heiko Paulheim and Christian Bizer. Improving the Qualityof Linked Data Using Statistical Distributions. InternationalJournal on Semantic Web and Information Systems (IJSWIS),10(2):63–86, 2014.

[80] Heiko Paulheim and Johannes Fürnkranz. Unsupervised gen-eration of data mining features from linked open data. In 2ndinternational conference on web intelligence, mining and se-mantics, page 31. ACM, 2012.

[81] Heiko Paulheim and Aldo Gangemi. Serving DBpedia withDOLCE–More than Just Adding a Cherry on Top. In Inter-national Semantic Web Conference, 2015.

[82] Heiko Paulheim and Simone Paolo Ponzetto. Extending DB-pedia with Wikipedia List Pages. In 1st International Work-shop on NLP and DBpedia, 2013.

[83] Youen Péron, Frédéric Raimbault, Gildas Ménier, and Pierre-François Marteau. On the detection of inconsistencies in RDFdata sets and their correction at ontological level. Technicalreport, 2011.

[84] Leo L Pipino, Yang W Lee, and Richard Y Wang. Data qual-ity assessment. Communications of the ACM, 45(4):211–218,2002.

[85] Axel Polleres, Aidan Hogan, Andreas Harth, and StefanDecker. Can we ever catch up with the Web? Semantic Web,1(1):45–52, 2010.

[86] Petar Ristoski and Heiko Paulheim. A Comparison of Propo-sitionalization Strategies for Creating Features from LinkedOpen Data. In Linked Data for Knowledge Discovery, 2014.

[87] Dominique Ritze, Oliver Lehmberg, and Christian Bizer.Matching HTML Tables to DBpedia. In Proceedings of the5th International Conference on Web Intelligence, Miningand Semantics, page 10. ACM, 2015.

[88] Giuseppe Rizzo and Raphaël Troncy. NERD: evaluatingnamed entity recognition tools in the web of data. InWorkshop on Web Scale Knowledge Extraction (WEKEX’11),2011.

[89] Stuart Russell and Peter Norvig. Artificial intelligence: amodern approach. 1995.

Page 21: Knowledge Graph Refinement: A Survey of …pdfs.semanticscholar.org/93b6/329091e215b9ef007a85c07635...2 Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods

Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods 21

[90] Samuel Sarjant, Catherine Legg, Michael Robinson, andOlena Medelyan. All you can eat ontology-building:Feeding wikipedia to Cyc. In Proceedings of the 2009IEEE/WIC/ACM International Joint Conference on Web In-telligence and Intelligent Agent Technology-Volume 01, pages341–348. IEEE Computer Society, 2009.

[91] Luis Sarmento, Valentin Jijkuon, Maarten de Rijke, and Eu-genio Oliveira. More like these: growing entity classes fromseeds. In Proceedings of the sixteenth ACM conferenceon Conference on information and knowledge management,pages 959–962. ACM, 2007.

[92] Max Schmachtenberg, Christian Bizer, and Heiko Paulheim.Adoption of the Linked Data Best Practices in Different Top-ical Domains. In International Semantic Web Conference,2014.

[93] Carlos N Silla Jr and Alex A Freitas. A survey of hierarchi-cal classification across different application domains. DataMining and Knowledge Discovery, 22(1-2):31–72, 2011.

[94] Katharina Siorpaes and Martin Hepp. Games with a Purpose for the Semantic Web. IEEE Intelligent Systems, (3):50–60, 2008.

[95] Jennifer Sleeman and Tim Finin. Type Prediction for Efficient Coreference Resolution in Heterogeneous Semantic Graphs. In Semantic Computing (ICSC), 2013 IEEE Seventh International Conference on, pages 78–85. IEEE, 2013.

[96] Jennifer Sleeman, Tim Finin, and Anupam Joshi. Topic Modeling for RDF Graphs. In 3rd Workshop on Linked Data for Information Extraction, 2015.

[97] Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Ng. Reasoning With Neural Tensor Networks for Knowledge Base Completion. In C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 926–934. Curran Associates, Inc., 2013.

[98] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In 16th International Conference on World Wide Web, pages 697–706, 2007.

[99] Gerald Töpper, Magnus Knuth, and Harald Sack. DBpedia ontology enrichment for inconsistency detection. In Proceedings of the 8th International Conference on Semantic Systems, pages 33–40. ACM, 2012.

[100] G. Tsoumakas et al. Multi Label Classification: An Overview. International Journal of Data Warehousing and Mining, 3(3):1–13, 2007.

[101] Denny Vrandecic and Markus Krötzsch. Wikidata: a free collaborative knowledgebase. Communications of the ACM, 57(10):78–85, 2014.

[102] Jörg Waitelonis, Nadine Ludwig, Magnus Knuth, and Harald Sack. WhoKnows? Evaluating linked data heuristics with a quiz that cleans up DBpedia. Interactive Technology and Smart Education, 8(4):236–248, 2011.

[103] Richard C. Wang and William W. Cohen. Iterative set expansion of named entities using the web. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on, pages 1091–1096. IEEE, 2008.

[104] Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, and Dekang Lin. Knowledge base completion via search-based question answering. In Proceedings of the 23rd International Conference on World Wide Web, pages 515–526, 2014.

[105] Dominik Wienand and Heiko Paulheim. Detecting incorrect numerical data in DBpedia. In The Semantic Web: Trends and Challenges, pages 504–518. Springer, 2014.

[106] Fei Wu, Raphael Hoffmann, and Daniel S. Weld. Information extraction from Wikipedia: Moving down the long tail. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 731–739. ACM, 2008.

[107] Amrapali Zaveri, Dimitris Kontokostas, Mohamed A. Sherif, Lorenz Bühmann, Mohamed Morsey, Sören Auer, and Jens Lehmann. User-driven Quality Evaluation of DBpedia. In 9th International Conference on Semantic Systems (I-SEMANTICS '13), 2013.

[108] Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann, Sören Auer, and Pascal Hitzler. Quality assessment methodologies for linked open data. Semantic Web Journal, 2013.

[109] Yu Zhao, Sheng Gao, Patrick Gallinari, and Jun Guo. Knowledge base completion by learning pairwise-interaction differentiated embeddings. Data Mining and Knowledge Discovery, 29(5):1486–1504, 2015.

[110] Antoine Zimmermann, Christophe Gravier, Julien Subercaze, and Quentin Cruzille. Nell2RDF: Read the Web, and Turn it into RDF. In Knowledge Discovery and Data Mining meets Linked Open Data, pages 2–8, 2013.


Revision of submission 1083-2295

Dear editors, dear reviewers,

We would like to thank all the reviewers for their numerous constructive comments. Based on those comments, we have prepared a carefully revised version of the paper.

The following changes have been made based on the reviews:

Reviewer comments and corresponding changes:

1. R1-R3: Define "Knowledge Graph". Although we do not come up with a complete definition, we define some criteria a knowledge graph should fulfill, and discuss how some kinds of knowledge bases (e.g., relational databases, general ontologies) do not meet those criteria and are therefore not considered to be KGs.

2. R1: Include Facebook, Microsoft, and Yahoo! KGs. The survey of knowledge graphs now also covers those commercial developments.

3. R1: Define the boundary between knowledge graph construction and refinement. Two additional paragraphs have been added to the introduction, defining the boundary and setting the scope of this survey more crisply.

4. R1: Include non-automatic methods. Non-automatic methods like crowdsourcing and games with a purpose have been included as "full members" of the survey.

5. R1: Provide evidence that Google+ contents are used in the KG. A footnote with a source (and a short caveat) has been inserted.

6. R1: Add explanations for the columns in Table 1. Explanations have been added to the table caption.

7. R1: Add sources for the data in Table 1. Precise sources for all numbers have been added in the text in Section 2.

8. R1: Discuss Knowledge Vault in the text. A textual description has been included for each of the knowledge graphs in Table 1.

9. R1: Define "genuine semantic web knowledge graph". The expression has been rephrased for clarity.

10. R1: Rephrase "targeted kind of information". After a long while of thinking about an easy-to-grasp phrasing, this has been changed to "Target of refinement".

11. R1: Explain silver standard. An explanation has been added.

12. R1: Use "retrospective evaluation" instead of "ex post". The term has been replaced.

13. R1: Rethink the organization of Sections 5 and 6 – why introduce another taxonomy of approaches? The presentation of the works is now aligned with the taxonomy in Section 3.


14. R1: Check whether the work in Section 6.4.1 is actually on knowledge graphs. We have checked whether the approaches have been evaluated on at least one dataset following our criteria for what a knowledge graph is (see 1). As a consequence, four works have been discarded from the survey.

15. R2: Clarify how DLs and knowledge graphs differ; distinguish against the DL learning literature. The characterization of knowledge graphs (see 1) now explicitly differentiates between ontologies/DLs and KGs. The difference to DL learning is now explained at the beginning of Section 3.

16. R2: Rephrase the statement and replace the citation for the origin of reasoning, which is not a genuine SW technique. The corresponding paragraph has been rephrased and augmented with further citations.

17. R3: Explain the difference to ontology learning. The difference to ontology learning is now explained at the beginning of Section 3. Due to the explication of those differences (see also 15), a few works concerned with ontology learning have been removed from the survey.

18. R3: Include four papers by the AKSW group. Three of the papers have been included in the survey. The ISWC 2013 paper on ORE has not been included, since the scope of the survey has been slightly narrowed following a comment by R1 (see 14).

In addition to the reviewers' comments, the following changes were made:

• Section 2 has been restructured, also briefly explaining how the knowledge graphs are constructed.

• A tabular summary of evaluation approaches has been added (Table 2).

• Numbers for knowledge graphs have been updated wherever newer numbers were available (e.g., for the most recent DBpedia release).

• Besides the sources named by reviewer 3, a few more very recent publications (that had not been published at the time the last version was submitted) have been included in the survey.

• The whole paper was proofread for typos and grammar problems.

We are looking forward to your feedback on the revised version.

Best regards,

Heiko Paulheim