16
AI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic heterogeneity in the Web 1 Jacobo Rouces a,, Gerard de Melo b and Katja Hose a a Department of Computer Science, Aalborg University, Denmark E-mails: [email protected], [email protected] b Department of Computer Science, Rutgers University–New Brunswick, USA E-mail: [email protected] Abstract. An increasing number of structured knowledge bases have become available on the Web, enabling many new forms of analyses and applications. However, the fact that the data is being published by different parties with different vocabularies and ontologies means that there is a high degree of heterogeneity and no common schema. At the same time, the abundance of different human languages across unstructured data presents a similar problem, because most text mining tools only cater to the English language. This paper presents solutions for these two kinds of heterogeneity. It introduces Klint, a Web-based system that automatically creates mappings to transform knowledge from heterogeneous sources into FrameBase, which is a broad linked data schema that enables the representation of a wide range of knowledge. With Klint, a user can review and edit the mappings with a streamlined interface, which in turn allows for human-level accuracy with minimum human effort. The paper further describes how FrameBase can be extended to support multilingual labels, which can aid in extending current tools for integrating English text into FrameBase knowledge. Keywords: Linked data, data integration, multilingual text mining 1. Introduction The Web of Data includes a rich and increasing number of structured knowledge bases, and has en- abled many new applications and forms of analyses. Such open knowledge bases often provide their data in RDF format [34], based on Subject-Predicate-Object triples. When coupled with the use of global identifiers in the form of Internationalized Resource Identifiers (IRIs), this facilitates the linking and merging of data from disparate sources into a single graph, usually re- ferred to as the LOD (Linked Open Data) cloud. How- ever, despite these advantages, different datasets may still model equivalent or overlapping information us- ing different vocabularies and different triple patterns. This poses two important problems. Querying data can be cumbersome. In order to capture all pertinent knowledge, a structured 1 The research leading to these results has received funding from the Danish Council for Independent Research (DFF) under grant agreement No. DFF-4093-00301, as well as the DARPA SocialSim program. * Corresponding author. E-mail: [email protected]. query will have to consist of a disjunction of all possible semantic patterns occurring in the myr- iad of heterogeneous vocabularies used in the data. Linking data is typically carried out by creating triples with a predicate owl:sameAs connect- ing RDF terms deemed to have the same mean- ing [3,15]. However, this may be infeasible when information is captured in structurally different forms across different datasets. Beyond these, there is also the inevitable challenge that most data on the Web is in the form of natural language [17]. This can be regarded as a further, par- ticularly challenging case of heterogeneity, either if text is considered in its most primitive form as strings of characters, or if it is modelled as a syntactic or semantic network as emitted by one of the state-of- the-art natural language processing libraries, such as CoreNLP [38]. Automatic integration could solve this challenge, either from fully structured and disambiguated het- erogeneous sources, or from syntactic/semantic net- works produced using existing text mining methods. 0921-7126/18/$35.00 © 2018 – IOS Press and the authors. All rights reserved

Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

AI Communications 31 (2018) 3–18 3DOI 10.3233/AIC-170745IOS Press

Addressing structural and linguisticheterogeneity in the Web1

Jacobo Rouces a,∗, Gerard de Melo b and Katja Hose a

a Department of Computer Science, Aalborg University, DenmarkE-mails: [email protected], [email protected] Department of Computer Science, Rutgers University–New Brunswick, USAE-mail: [email protected]

Abstract. An increasing number of structured knowledge bases have become available on the Web, enabling many new formsof analyses and applications. However, the fact that the data is being published by different parties with different vocabulariesand ontologies means that there is a high degree of heterogeneity and no common schema. At the same time, the abundance ofdifferent human languages across unstructured data presents a similar problem, because most text mining tools only cater to theEnglish language. This paper presents solutions for these two kinds of heterogeneity. It introduces Klint, a Web-based system thatautomatically creates mappings to transform knowledge from heterogeneous sources into FrameBase, which is a broad linkeddata schema that enables the representation of a wide range of knowledge. With Klint, a user can review and edit the mappingswith a streamlined interface, which in turn allows for human-level accuracy with minimum human effort. The paper furtherdescribes how FrameBase can be extended to support multilingual labels, which can aid in extending current tools for integratingEnglish text into FrameBase knowledge.

Keywords: Linked data, data integration, multilingual text mining

1. Introduction

The Web of Data includes a rich and increasingnumber of structured knowledge bases, and has en-abled many new applications and forms of analyses.Such open knowledge bases often provide their data inRDF format [34], based on Subject-Predicate-Objecttriples. When coupled with the use of global identifiersin the form of Internationalized Resource Identifiers(IRIs), this facilitates the linking and merging of datafrom disparate sources into a single graph, usually re-ferred to as the LOD (Linked Open Data) cloud. How-ever, despite these advantages, different datasets maystill model equivalent or overlapping information us-ing different vocabularies and different triple patterns.This poses two important problems.

– Querying data can be cumbersome. In orderto capture all pertinent knowledge, a structured

1The research leading to these results has received funding fromthe Danish Council for Independent Research (DFF) under grantagreement No. DFF-4093-00301, as well as the DARPA SocialSimprogram.

*Corresponding author. E-mail: [email protected].

query will have to consist of a disjunction of allpossible semantic patterns occurring in the myr-iad of heterogeneous vocabularies used in thedata.

– Linking data is typically carried out by creatingtriples with a predicate owl:sameAs connect-ing RDF terms deemed to have the same mean-ing [3,15]. However, this may be infeasible wheninformation is captured in structurally differentforms across different datasets.

Beyond these, there is also the inevitable challengethat most data on the Web is in the form of naturallanguage [17]. This can be regarded as a further, par-ticularly challenging case of heterogeneity, either iftext is considered in its most primitive form as stringsof characters, or if it is modelled as a syntactic orsemantic network as emitted by one of the state-of-the-art natural language processing libraries, such asCoreNLP [38].

Automatic integration could solve this challenge,either from fully structured and disambiguated het-erogeneous sources, or from syntactic/semantic net-works produced using existing text mining methods.

0921-7126/18/$35.00 © 2018 – IOS Press and the authors. All rights reserved

Page 2: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

4 J. Rouces et al. / Addressing structural and linguistic heterogeneity in the Web

However, it is often an AI-hard problem, especiallysince many applications may expect knowledge basesto have a high accuracy. This is because the accuracyis generally high in linked data KBs [23], owing to thefact that many KBs were built manually (e.g., folkso-nomical approaches as for Wikidata) or using integra-tion rules that exploit highly reliable markup and hy-perlinking structures (Wikipedia infoboxes and taxo-nomical links in DBpedia [35] and YAGO [53]).

In this paper, we present two attempts at tacklingthese issues, both related to FrameBase, a dataset thathas been able to integrate a wide range of knowledgefrom other sources.

Firstly, we describe Klint (Knowledge integrator),a Web-based system enabling semi-automatic schemaintegration. Given one or more existing RDF ontolo-gies, Klint generates tentative integration rules forthese ontologies by representing mappings in a uni-fied schema. For this unified schema, Klint relies onFrameBase [44], a wide-coverage, highly expressiveand extensible schema that can be used to representand integrate [47] a wide spectrum of knowledge frommany sources in a homogeneous and seamless way.2

Simultaneously, Klint offers an agile and simple inter-face that enables the user to inspect and adapt the ten-tative integration rules, achieving the desired balancebetween precision and scalability.

Secondly, we introduce a method for adding multi-lingual support to FrameBase. There are already sev-eral systems to extract FrameBase-modelled knowl-edge from natural language [2,11], but these are re-stricted to input sources given in the English language.Linked data is increasingly being embraced in domainssuch as government data, digital libraries, cultural her-itage, and linguistics. As an increasing number of or-ganizations around the world seek to adopt the prin-ciples of Linked data, it is important to also facilitatethe interlinking of data captured in different naturallanguages. This is particularly important for multina-tional institutions such as within the European Union.We evaluate results obtained for the Swedish language,but the method is generalizable to other languages.

This paper is structured as follows. Section 2 dis-cusses related work in this area. Section 3 introducesthe FrameBase schema, on top of which the work de-scribed in this paper is based. Section 4 describes thedetails of our Klint system for semi-automatic inte-gration of heterogeneous structured data. Section 5presents the algorithms used to add multilingual sup-port to FrameBase, and provides an evaluation of theresults. Finally, Section 6 provides a conclusion.

2See http://www.framebase.org/.

2. Related work

In this section we present previous research that isrelated to the work described in this paper. Section 2.1describes work in the field of ontology alignment. Sec-tion 2.2 describes existing tools for ontology visual-ization and edition, and for the creation of mappingsbetween different ontologies. Section 2.3 describes ex-isting work addressing the problem of linguistic het-erogeneity and multilinguality in the context of ontolo-gies and linked data.

2.1. Ontology alignment

Integrating knowledge from different sources is along-standing, ubiquitous problem [31]. In the con-text of linked data and ontology alignment [16,22],there has been extensive work in tasks such as identi-fying equivalent classes from different ontologies, andin some cases also equivalent instances and proper-ties [4,36,41,52].

Since knowledge can be expressed in structurallydifferent ways, one-to-one equivalences are not suffi-cient to exhaustively connect equivalent informationacross knowledge bases, and complex mappings be-come necessary. There has been some work address-ing the problem of declaring and finding these map-pings, either semi-automatically or fully automatically.The EDOAL (Expressive and Declarative OntologyAlignment Language) format [14] has been proposedto express complex relationships between properties,and complex correspondence patterns between ontolo-gies have been described and classified in an ontol-ogy [49]. However, these works do not address theproblem of identifying these correspondence patterns.The iMAP tool [21] searches a space of possible com-plex relationships between the values of entries intwo knowledge bases, e.g., room-price = room-rate * (1 + tax-rate). However, it is limitedto certain types of attribute combinations. The S-Matchtool [30] makes use of formal reasoning to prove possi-ble matches between ontology classes, involving unionand intersection operators, but it does not address com-plex matching of properties beyond this. The work ofRitze et al. [43] uses a rule-based approach to detectspecific kinds of complex alignment patterns betweenentries in small ontologies.

FrameBase [44,45] declares integration rules usingSPARQL CONSTRUCT queries, which enables arbi-trarily complex mappings between external knowledgebases and a hub schema based on frames (classes repre-

Page 3: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

J. Rouces et al. / Addressing structural and linguistic heterogeneity in the Web 5

senting events or processes) and frame elements (prop-erties of frames), extracted from FrameNet [26,48].Different methods have been developed to automati-cally create certain types of complex rules [45]. Thiswill be explained in more depth in Section 3.

2.2. Graphical tools for addressing structuralheterogeneity

There are several tools for visualizing linked data [7,13,29], but these do not provide facilities to create ar-bitrary mappings.

WebProtégé [55] is a widely used ontology engi-neering tool for RDF, with extensive support for OWLreasoning, but it is designed to facilitate creating indi-vidual ontologies, not mappings between them.

LodLive [9] allows users to browse RDF resourcesusing a dynamic visual graph and to create 1-to-1 linksbetween resources stored in different endpoints. How-ever, it does not allow for creating the more complexkinds of mappings we consider in this work.

Fusion [1] provides facilities to create mappings thatare more complex than 1-to-1 relations between enti-ties, but it is still limited to mappings to a single rela-tion in the source ontology, and hence does not supportarbitrary mappings. More specifically, it is limited toone of the following two kinds.

– Mapping a single property in the source ontologyto a path3 in another, but not allowing arbitrarymappings between graphs. For instance, in Geo-Names,4 the property geo:parentFeatureconnects cities with provinces and also provinceswith countries. Hence, the pattern X geo:parentFeature / geo:parentFeatureY can be mapped to X locatedIn Y to connecta city with the country it is located in. The inter-face for this kind of mapping is graphical, but notbased on graphs.

– Fusion also allows for mapping a single propertyin the source ontology to a complex pattern in an-other, which is defined in the imperative program-ming language Ruby.

FrameBase and Klint, on the contrary, allow com-plex mappings by means of arbitrarily complex butdeclarative graph patterns implemented in SPARQL.

3In the sense of property paths in SPARQL [32], i.e., a sequenceof properties.

4http://www.geonames.org/ontology/.

2.3. Linguistic heterogeneity

There are knowledge bases that include multilinguallabels, such as YAGO 3 [37], BabelNet [40], MENTA[18], and DBpedia [35]. A large part of the entitiesand relations in these ontologies are obtained fromWikipedia articles and their mutual hyperlinks. As partof this approach, multilingual labels can be obtainedby exploiting the cross-lingual links in Wikipedia [19],which constitutes a fundamental part of these ontolo-gies. However, these ontologies are not explicitly builtwith the purpose of knowledge integration.

In order to solve the problem of linguistic het-erogeneity in the context of knowledge bases, therehas been work on linking knowledge entries withlexical labels in different languages, using machinetranslation [27,28], graph models [56], or a combina-tion of dictionary and monolingual ontology mappingtools [54].

Our work differs from these two types of previ-ous efforts in several ways. First, we do not use thecross-lingual links in Wikipedia, as done by YAGO 3and DBpedia. Instead, we add multilingual labels toFrameBase by exploiting its connections to WordNetand FrameNet as well as the existence of high-qualitymultilingual versions of these linguistic resources. Sec-ond, we do not attempt to directly and reciprocallyinterlink multilingual knowledge bases, since Frame-Base is meant as a central and unique hub for inte-grating knowledge, where multilinguality can be ex-pressed by means of labels that will be annotated withLemon [39], reflecting their language and morphosyn-tactic properties.

3. Brief overview of FrameBase

FrameBase [44–47] provides a unified schema de-signed to integrate knowledge from heterogeneousdatasets such as those in the linked data cloud. Whatmakes FrameBase particularly well-suited for integrat-ing heterogeneous knowledge is that it is highly ex-pressive, both from a lexical perspective (containingthousands of lexical entries), but also structurally, be-cause the frame-based representation it relies on canrepresent n-ary relations in an extensible way.

The backbone of FrameBase stems from FrameNet[26,48], a linguistic resource that compiles frames toannotate how specific words, denoted as Lexical Units(LUs), evoke them when they appear in text. Frame-Base extends its frame coverage using synsets, which

Page 4: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

6 J. Rouces et al. / Addressing structural and linguistic heterogeneity in the Web

Fig. 1. Example of how FrameBase represents the knowledge that two entities, John and Mary, married in 1964. Blue entities constitute reifiedknowledge while green ones constitute dereified knowledge.

are sets of synonymous disambiguated words takenfrom another linguistic resource called WordNet [24].However, it is possible to understand FrameBase with-out detailed knowledge of these resources.

The core elements of the FrameBase schema areclasses that correspond to frames, which are concep-tual structures representing situations, events, or pro-cesses of any kind. The frame classes have propertiescalled Frame Elements (FEs), in the sense that eachof these properties has a specific frame class as itsrdfs:domain. For instance, a frame:Forming_relationships.wedding.noun frame classhas FE properties such as Partner_1, Partner_2,Means, Place, and Time. A frame:Building.construction.noun frame class has FE prop-erties such as Agent (the entity that performs theconstruction), Components, Created_entity,Instrument, Manner, Place and Time. Thesetwo frame classes are examples from the lowest partof the frame hierarchy in FrameBase, which con-tains specific terms, such as “wedding” and “construc-tion”. These terms possess very specific meanings,and they are also represented as lexical labels (usingrdfs:label), although additional labels may ex-ist. In the frame IRI, the terms are qualified with thename of upper, more general frames, which allowsus to cope with polysemous words. In the examplesabove, the macroframes are “Forming relationships”and “Building”. As an example of polysemy, the frame

class frame:Containing.house.verb repre-sents a situation in which a Container holds someContents within its physical boundaries, whereasthe frame class frame:Provide_lodging.house.verb represents situations in which a Hostprovides a temporary residence for a Lodger.

Figure 1 provides an example of how FrameBaserepresents the knowledge that two entities, John andMary, married in 1964 (with the frame classes and FEproperties depicted in blue).

In general, frame classes in FrameBase are or-ganized into a hierarchy of three different kinds offrames: microframes, miniframes, and macroframes.Figure 2 shows some example frames to illustrate thishierarchy.

– Microframes have very specific meanings and aredirectly connected to sense-disambiguated terms.Microframes are divided into two categories.

* LU-microframes, each associated with an LUin FrameNet. Some examples are illustrated asgreen nodes in Fig. 2, such as :Quitting_a_place.defection.noun.

* synset-microframes, each associated with asynset in WordNet. Some examples areillustrated as blue nodes in Fig. 2, such as:Quitting_a_place.00055315.desertion.n.

Page 5: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

J. Rouces et al. / Addressing structural and linguistic heterogeneity in the Web 7

Fig. 2. Example illustrating the hierarchy of frames. The prefix before the colon is “http://framebase.org/frame/”. Here, “..” repre-sents further stemming of the IRIs in order to fit the text in the nodes.

– Miniframes cluster together microframes withnear-equivalent meaning. Two examples are illus-trated as pink nodes in Fig. 2: :Quitting_a_place.m.defect.verb and :Quitting_a_place.m.retreat.verb (in order to bemore easily interpretable, they are represented

with one of the inheriting LUs, but the .m. com-ponent distinguishes the IRIs from microframeswith the same LU).

– Macroframes, finally, constitute an upper level ofgeneral types of situations or events. For example,:Quitting_a_place in Fig. 2.

Page 6: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

8 J. Rouces et al. / Addressing structural and linguistic heterogeneity in the Web

The purpose of miniframes is offering an interme-diate level of granularity, in which microframes rep-resenting equivalent concepts can be merged (“wed-ding”, “spouse”, and “marry” are under the sameminiframe), thus reducing data sparsity. Still, a con-flation of clearly different concepts is avoided, aswould happen if macroframes were used (“wedding”,“spouse”, and “marry” are under the same macroframeas “divorce”).

The FrameBase hierarchy contains around 130,000frames in total, and supports efficient RDFS inferencewith a few extensions [10].

The FE properties are also imported from FrameNet.FE properties change across macroframes, but micro-frames and miniframes inheriting a given macroframeshare the same FE properties. For instance, both micro-frames frame:Building.assembly.noun andframe:Building.construct.verb, which aresubclasses of the macroframe frame:Building,share FE properties such as fe:Building.Created_entity (the entity that is created in theact of building), fe:Building.Agent (the agentthat builds the created entity), and fe:Building.Components (the components that form the createdentity).

The main purpose of FrameBase is being able to rep-resent a wide range of knowledge in a homogeneousway. This should ease the difficulty associated withjointly querying data represented by an unboundedmix of different schemas that are heterogeneous notonly at a lexical level but also structurally. This struc-tural heterogeneity also prevents linking data by meansof binary properties such as owl:sameAs (for in-stance, consider a knowledge base with a propertymarriedWith and another with a class Marriage).

Knowledge can directly be created under the Frame-Base schema, or integrated from existing knowledgebases by means of integration rules (usually imple-mented as SPARQL CONSTRUCT queries), which al-low what could be termed as “complex linked data”, or“non-binary linked data”.

The frame-based representation is the core ofFrameBase and is chosen because it is very expres-sive and unambiguous, while striking a balance interms of space complexity. One cannot unambiguouslyrepresent multiple marriages of a given person usingmarriedWith properties, unless some workaroundsare used that entail a much higher triple overhead [44].However, FrameBase additionally provides a secondlevel of representation that is more concise, both inthe knowledge base and in queries, in which only two

FEs are specified for a frame. In this representation,the two FE properties, e.g. the two partners in a mar-riage, are directly connected via Direct Binary Pred-icates (DBPs). Figure 1 shows DBPs in green. Thetwo kinds of representations that FrameBase offersare seamlessly connected via so-called reification anddereification rules. Each type is the converse of theother, and both can be expressed together as bicon-ditional ReDer (Reification-Dereification) rules. Be-low we provide some examples. For instance, the firstexample rule converts between a triple (connecting atransformed entity with the cause for its transforma-tion), and a pattern (with an entity representing a trans-formation event that is connected to the transformedentity and the cause). The latter representation is moreverbose, but is necessary when additional informationneeds to be specified, such as the time of the transfor-mation or the resulting entity.

PREFIX frame: <http://framebase.org/frame/>PREFIX fe: <http://framebase.org/fe/>PREFIX : <http://framebase.org/dbp/>

?S :Cause_change.isTransformedByCause ?O�?R a frame:Cause_change.transform.verb ,?R fe:Cause_change.Entity ?S ,?R fe:Cause_change.Cause ?O .

?S :Cause_to_amalgamate.isUnitedBy ?O�?R a frame:Cause_to_amalgamate.unite.verb ,?R fe:Cause_to_amalgamate.Parts ?S ,?R fe:Cause_to_amalgamate.Agent ?O .

?S :Fluidic_motion.flowsFromSource ?O�?R a frame:Fluidic_motion.flow.verb ,?R fe:Fluidic_motion.Fluid ?S ,?R fe:Fluidic_motion.Source ?O .

Figure 3 provides a general overview of FrameBase.This paper also introduces a new version of Frame-Base, with an extended set of synset-microframes anda more homogeneous structure in the IRIs. Previousversions of FrameBase (1.x) used identifiers for synset-microframes that included the name of the parentmacroframe [44]. The parent macroframe was iden-tified using an automatic method that was not ableto map the entirety of WordNet Synsets into Frame-Base. Some synsets should not even be mapped to anexisting macroframe because this may not be avail-able in the original FrameNet resource. However, in-cluding these orphan synset-microframes turned out

Page 7: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

J. Rouces et al. / Addressing structural and linguistic heterogeneity in the Web 9

to be advisable because even without a proper in-tegration into the upper hierarchy (which can beadded later), these synsets increase the expressivityof FrameBase. Hence, a new syntax has been de-fined for FrameBase 2.0, which removes parent frameinformation from synset-microframe IRIs, and al-lows orphan synset-microframes to be defined as chil-dren of the upper frame frame:Top_frame (whichdoes not exist in FrameNet). Accordingly, the num-ber of synset-microframes has increased from 6,418 to117,659.

We have developed an integration engine that usesReDer rules to map properties from source KBs intoFrameBase, matching them with DBPs after a processof canonicalization [45,47]. Figure 4 includes an ex-ample. In order to map classes from the source KBs,a Support Vector Machine model is trained to classify

Fig. 3. Overview of the structure of FrameBase system (ontologyand rules).

(Class, Frame) pairs based on a range of lexical simi-larity features.

Because of its ties to computational linguistics,FrameBase is not only appropriate for integrating het-erogeneous structured knowledge, but also for inte-grating knowledge extracted from natural language.Currently, there are several third-party systems thatextract FrameBase-modelled knowledge from naturallanguage [2,11], but these are restricted to source textsprovided in the English language.

4. Klint

We will describe Klint from two different angles.First, Section 4.1 functionally describes the featuresof the application. Then, Section 4.2 provides non-functional details about the algorithm used for repre-senting the graph. Finally, Section 4.3 illustrates thesteps in Klint to edit a graph representing an integra-tion rule.

4.1. Working modes

Klint supports three working modes:

– The Assisted Schema Integration mode, whichhelps a user curate and create rules to integrateknowledge from external sources into Frame-Base. It is explained in detail in Section 4.1.1.

– The Visual Knowledge Building mode, which as-sists a user in creating FrameBase knowledge di-rectly. It is explained in detail in Section 4.1.2.

– The Visual Query Building mode, which assistsa user in creating queries to retrieve existingFrames. It is explained in detail in Section 4.1.3.

Fig. 4. Two rules automatically created by the integration engine, integrating different properties from Wikidata: “time of death” and “place ofdeath”. There is some knowledge overlap among them (the death event of type “to die” and the property “protagonist”), and also with the resultsfrom the rule in Fig. 5. This is impossible to capture by simply linking the source schemas via equivalence or subsumption properties.

Page 8: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

10 J. Rouces et al. / Addressing structural and linguistic heterogeneity in the Web

Fig. 5. Example of Klint’s interface: integrating elements from DBpedia – Klint used the contextual and lexical information from thesource elements to suggest two candidate values for the integrated type (selected node, “conflict”), for which the correct assigned value,Hostile_encounter-conflict.n was the first suggestion. The FrameBase properties were auto-inserted and some with high lexicaloverlap were automatically integrated as well. The complex structures that invoke some additional frames were created using the direct searchfunction.

4.1.1. Assisted schema integration modeUnder the Assisted Schema Integration Mode, the

user can integrate one or more entire knowledge bases(KBs) into FrameBase with reduced effort in compar-ison to an entirely manual or non-graphical approach.The input KBs can be loaded from an RDF file or aSPARQL endpoint. Other formats can also be used af-ter pre-processing with a suitable RDF converter.5

Integration heuristics. Klint automatically createscomplex integration rules for each element (i.e. beyond1-to-1 equivalences between entities) in the sourceschema, using integration algorithms based on linguis-tic annotations in FrameBase [47]. These heuristics cancreate two kinds of integration rules:

– Property-Frame integration rules, where a prop-erty P in the source KB is mapped to a pat-tern consisting of an instantiated frame (that rep-resents a situation or eventuality evoked by P ).The frame is connected to the subject and ob-ject of P by two FE properties that represent “se-mantic roles” [25]. Figure 4 includes examples ofthese rules. These integration rules are created by

5http://www.w3.org/wiki/ConverterToRdf.

matching the properties of the source KB with theFrameBase DBPs after a process of canonicaliza-tion [45,47].

– Class-Frame integration rules, where a class ismapped to a frame instance, and some of its out-going properties are mapped to FE properties ofthat frame, when there is textual overlap for all ofthem.

Graphical interface. Each integration rule is repre-sented as a graph with nodes and edges in the rightpane of Klint’s graphical interface (Fig. 5). Users cannavigate across different integration rules with the but-tons at the top bar, making modifications in a givengraph if necessary. The nodes can be of differenttypes.

– Grounded nodes represent single entities, and canthemselves be of three types:

* Source nodes (green) represent resources fromthe source KB and connect two variable nodes.

* FrameBase nodes (blue) represent FrameBaseresources and also connect variable nodes.They provide the translation of the source pat-tern to FrameBase.

Page 9: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

J. Rouces et al. / Addressing structural and linguistic heterogeneity in the Web 11

* Auxiliary nodes (gray) represent resourcesfrom third-party KBs, usually representingcommon idioms or very specific entities.

– Variable nodes (presented in red) represent uni-versally quantified variables over entities. Theybind the pattern from the source KB to the inte-grated FrameBase pattern. The remaining nodesare classified according to the type of entity theyrepresent.

The nodes are connected via directed edges repre-senting triples. Since an RDF triple involves three el-ements, each triple is represented by two successiveedges, one from the subject to the predicate and an-other from the predicate to the object.

In the Assisted Schema Integration Mode, bothedges and any of the above-mentioned types of nodescan be added, deleted, and edited. When a node isselected, the node is highlighted and the left panelis activated, where the user can change its name andunique identifier (in RDF, this is an IRI). Klint auto-matically deduces which consecutive pairs of edges de-fine triples.

Automatic suggestions. When a new FrameBasenode is created, its identifier is initially unspecified.Klint will try to use its context (the type of the neigh-boring nodes) and the integration engine to suggestpossible values automatically. Specifically, if the nodeis a property emanating from a frame class, the FEproperties of that frame class will be suggested, andif it is the subject of a FE property, the frame class towhich this FE property belongs will be suggested. Incase there are too many suggestions, the search func-tion may be used instead.

4.1.2. Visual knowledge buildingIn this mode, the user is able to introduce Frame-

Base nodes as well as external nodes, but not variablenodes. Accordingly, the resulting graph then representsknowledge rather than a query, and can be exportedin any of the common RDF formats. Unlike the As-sisted Schema Integration mode, this is not meant toproduce massive amounts of data from external struc-tured sources, but rather serves as a source-neutral wayof evaluating the expressivity of FrameBase when cre-ating small examples of knowledge.

4.1.3. Visual query buildingIn this mode, the user can introduce FrameBase

nodes and variable nodes, but not external nodes. Theresulting graph represents any kind of query, with thepurpose of retrieving existing knowledge under the

FrameBase schema, instead of an integration rule im-plemented specifically as a CONSTRUCT query, as inthe case of the Assisted Schema Integration mode.

Specifically, the user may choose between the fol-lowing different options:

– Obtain a SELECT SPARQL query, suited for aFrameBase KB, selecting all variables.

– Obtain a CONSTRUCT SPARQL query, suitedfor a FrameBase KB, extracting all the knowledgethat follows a given pattern.

– Run the SELECT/CONSTRUCT SPARQL querydirectly, visualizing the results.

4.2. Representation layer

RDF graphs are commonly regarded as directedlabelled graphs, with each Subject-Predicate-Objecttriple as an edge between the subject and the object,and the predicate as its edge label. However, this isa simplification, as predicates can also be subjects orobjects of some triples, and RDF graphs are truly bi-partite graphs [33], where each triple is representedby two consecutive directed edges: subject–predicateand predicate–object ones. The graph representation inKlint maintains the bipartite model, so RDF is fullysupported, but at the same time it maintains a data layerequivalent to a directed labelled graph (where eachtriple is an edge labelled with the predicate), whichis the one used to provide an intuitive interface to theuser. In order to make the presentation of this grapheven more clear, Klint relies on a combination of visualclues: it uses physics simulation algorithms to main-tain a similar orientation for edges of the same triple;it uses a distinctly different representation for subjectand object nodes in relation to predicates; and it cre-ates alias nodes for predicates, so if the same resourceis used twice as a predicate in different triples, or aspredicate in one and as a subject in another, then it isrepresented with two different nodes that are internallylinked to the same resource. In order to create human-readable labels for nodes, Klint tries to find an explicitlabel for a node, and if it fails to find it, it uses a heuris-tic based on extracting the last part of the IRI, which isoften human-readable.

When users want to make a modification in thegraph, they can do so in a simple, visual way. Subject–Predicate–Object triples can be added or removed byadding or removing edges between nodes. The sys-tem automatically creates a new triple after subject-predicate and predicate-object edges are created, shar-ing the predicate. Temporary links are shown for un-finished subject-predicate edges.

Page 10: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

12 J. Rouces et al. / Addressing structural and linguistic heterogeneity in the Web

4.3. Editing an integration rule: Example

We next consider the steps one takes to edit a graphrepresenting an integration rule so as to make it moreprecise. Figure 6 presents an integration rule that ex-presses that the DBpedia property almaMater can berepresented in FrameBase by using an instance of theStudying frame with FE properties Student andInstitution. While this, viewed as inference, isvalid, it is not entirely complete, since the term and theproperty “alma mater” is only used for cases where thestudies have been finished. This can be represented indifferent ways, but we will choose a relatively simpleone for illustration purposes: adding a third FE prop-erty “Time” with a literal value “past”.

While we have thus far not carried out any full-fledged user studies with untrained users, the manualconstruction of integration rules within the ePOOLICEEU project [46] showed that the use of a graphical in-terface of this sort would be advantageous.

5. Multilingual support

We add multilingual support to FrameBase byadding multilingual labels to microframe classes,alongside the already existing ones in English. Weconsider this more appropriate than creating new mi-croframe classes for terms in another language, sincemicroframe classes are semantic elements that oughtto be language-independent. While there may be con-notational and denotational differences in nuance be-tween translated terms, and there are terms that donot have a direct translation into other languages, thefirst phenomenon can be disregarded in most applica-tions of knowledge representation and linked data, andthe second phenomenon only pertains to a compara-bly small number of cases (and FrameNet as well asFrameBase allow for multi-word expressions, whichcan potentially alleviate this).

We do not consider multilingual labels of RDF termssuch as FE properties, macroframes or FrameBase-specific classes and properties, because the lexical la-bels for these terms are not meant to be used in textmining.

5.1. Method description

We implement our method for the case of Swedish,but it can be applied for another second language(which we will denote throughout the paper as L).The method is a combination of two algorithms, which

have different requirements and can work indepen-dently. This makes it more robust for the case that Ldoes have partial coverage among computation lexi-cons – for instance, there is a WordNet extension butnot a FrameNet one.

5.1.1. WordNet-based methodThis method, described in Algorithm 1, depends on

the availability of a mapping from Princeton Word-Net synsets to some lexicon for the other language (wedenote this mapping as PwnMap()). The lexicon canbe a multilingual WordNet or another kind of lexicon.In our case, we use SALDO, a Swedish lexicon [6],for which there is a mapping consisting of 6,900 pairsof a WordNet word sense and a SALDO identifier.6

SALDO identifiers are different from WordNet synsetsin several ways. For instance, instead of being orga-nized in a taxonomical hierarchy of “is a” relations,they are connected by means of a “non-classical” as-sociation relation that connects word senses with theirprimary and secondary descriptors. The relation be-tween a word sense and its primary descriptor is themost common and also forms a tree, but unlike “is a”,it connects a word sense with another one that is se-mantically and/or morphologically less complex, usu-ally more frequent, stylistically more unmarked, andtypically acquired earlier in first and second languageacquisition [6]. Some examples are “fönster” (window)having “öppning” (opening) as primary descriptor (and“hus” or “house” as secondary descriptor), “operahus”(opera house) having “opera” as primary descriptor,and “dålig” (bad) having “bra” (good) as primary de-scriptor (and “motsats” or “opposite” as secondary de-scriptor).

This algorithm can be generalized for any languagefor which there is a multilingual WordNet version. Cur-rently there are manually created open-source Word-Nets for over 30 languages, while many further lan-guages are covered by automatically produced Word-Nets such as the Universal WordNet [20].

5.1.2. FrameNet-based methodAlgorithm 2 uses the following two resources.

– A function LUL() that maps English BerkeleyFrameNet frame names to sets of L-languageLUs. This can be obtained from any second-language FrameNet that uses frames from the En-glish Berkeley FrameNet.

6https://spraakbanken.gu.se/swe/resurs/wordnet-saldo.

Page 11: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

J. Rouces et al. / Addressing structural and linguistic heterogeneity in the Web 13

(a) (b)

(c) (d)

Fig. 6. (a) The starting point is the initial integration rule in the lower part of the screen, which includes a frame instance labeled “An entity”.After clicking the empty area in the upper space, a new node is created. Klint marks it green, assuming it belongs to a source KB. (b) Once anedge has been created between the frame instance and the new node, Klint recognizes that this is meant to be a property whose domain is the typeof the frame instance, i.e. the frame class, and therefore, when the new node is selected, it automatically proposes all the FE properties of thatframe as possible values, including “Time”. (c) After selecting “Apply” for “Time”, the new node is re-assigned as a FrameBase node. (d) Aftercreating a literal node “past” and a second edge from “Time” to “Past”, Klint recognizes the new triple and updates the rule.

Page 12: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

14 J. Rouces et al. / Addressing structural and linguistic heterogeneity in the Web

Algorithm 1 WordNet-based algorithm1: for all synset-microframe F in FrameBase do2: for all English-word-sense W in WordNet associated to the synset associated to F do3: if PwnMap(W) exists then4: L-word-sense S ← PwnMap(W)

5: for all L-lemma L ∈ lemmas(S) do6: Add L as lexical label of M , with a language tag for L.

Algorithm 2 FrameNet-based algorithm1: for all microframe M in FrameBase do2: for all L ∈ Den–L(posAnnotatedLabelen(M)) do3: if L is in LUL(macroframeOf (M)) then4: Add L as a label for M , with a language tag for L.

– A dictionary function Den–L() that maps(lemma, POS) pairs between English and the sec-ond language L, i.e. a POS-annotated but sense-oblivious translation table.

This algorithm relies on the premise thatamong the possible translations for the termassociated with an English LU (the elements ofD(posAnnotatedLabelen(M))), those that are alsopresent in the non-English FrameNet as terms associ-ated with LUs of the same macroframe (i.e., belong-ing to LUL(macroframeOf (M))) are correct transla-tions for the English LU. In other words, L-languagetranslations owing to unrelated senses (homonyms) ofthe English term are filtered out because they do nottend as terms associated with LUs in the L-languageFrameNet. This assumption will be shown to hold trueempirically in the results section.

We test this algorithm with Swedish in Section 5.2,but it can easily be extended to other languages.

– As mentioned above, LUL() can be obtained fromany second-language FrameNet that uses framesfrom the English Berkeley FrameNet. For our ex-periments in Swedish, we use SweFN++ [5].Similar datasets for other languages are the Ger-man FrameNet from the SALSA project [8],the Spanish FrameNet [51] and the JapaneseFrameNet [42]. There has also been work on au-tomatic machine learning-based translations ofFrameNet [12].

– Because Den–L() does not need to map wordsenses, it can be obtained from any English–Ldictionary that uses English senses different fromWordNet or FrameNet, or makes no word sensedistinction at all either for English or L. For ourexperiments in Swedish, we obtain Den–L() from

DBnary [50], a dataset with multilingual lexicaldata extracted from Wiktionary. Wiktionary con-tains data for numerous further languages. Thecoverage for the results of both algorithms shouldbe higher when applied to languages such asSpanish, German, French, or Chinese, which havea high number of speakers and consequently tendto be well covered in computational linguistics re-sources.

A particular example of a mistaken translation thatis avoided by the algorithm is the following one.The LU-microframe frame:Color.green.noun,whose direct macroframe parent is frame:Color,has the label ‘green’, which has two Swedish transla-tions in DBnary: ‘grön’ (green color) and ‘miljövän-lig’ (green in the sense of environmentally friendly).Since the first is a LU in SweFN++ but the secondis not, the first is chosen as the Swedish label for theLU-microframe, while the second one is filtered out.

5.2. Results and evaluation

The results from Algorithm 1 covered 6,796 synset-microframes of a total of 117,659 in FrameBase 2.0,producing in total 7,331 Swedish labels. This does notneed further evaluation for correctness because it relieson manual mappings.

The results from Algorithm 2 on LU-microframesproduced 3,542 labels for 2,847 LU-microframes outof a total of 11,114. We evaluated 100 labels, of which99 were correct. The labels were sampled randomly,with a uniform distribution, independently, without re-placement. A label was judged correct if any possibleEnglish sentence was deemed possible so that (1) theEnglish LU defining the LU-microframe was being

Page 13: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

J. Rouces et al. / Addressing structural and linguistic heterogeneity in the Web 15

Table 1

Results of adding Swedish labels to FrameBase

synset-microframes LU-microframes Labels Precision

Total in FrameBase 117,659 11,114 – –

Algorithm 1 6,796 0 7,331 1†

Algorithm 2 0 2,847 3,542 0.99†Lossless use of manual mappings.

Table 2

Examples of obtained Swedish annotations

<http://framebase.org/frame/Motion.drift.verb> rdfs:label "driva"@sv

<http://framebase.org/frame/Corroding.rust.verb> rdfs:label "rosta"@sv

<http://framebase.org/frame/Achieving_first.discovery.noun> rdfs:label "upptäckt"@sv

<http://framebase.org/frame/Achieving_first.invent.verb> rdfs:label "uppfinna"@sv

<http://framebase.org/frame/Desirability.pathetic.adjective> rdfs:label "patetisk"@sv

<http://framebase.org/frame/Sufficiency.insufficient.adjective> rdfs:label "otillräcklig"@sv

<http://framebase.org/frame/Communicate_categorization.represent.verb> rdfs:label "föreställa"@sv

<http://framebase.org/frame/Being_located.located.adjective> rdfs:label "belägen"@sv

<http://framebase.org/frame/Hostile_encounter.war.noun> rdfs:label "krig"@sv

<http://framebase.org/frame/Rite.circumcision.noun> rdfs:label "omskärelse"@sv

<http://framebase.org/frame/Part_orientational.western.adjective> rdfs:label "västra"@sv

<http://framebase.org/frame/Accoutrements.goggles.noun> rdfs:label "glasögon"@sv

<http://framebase.org/frame/Experiencer_focus.abhor.verb> rdfs:label "hata"@sv

<http://framebase.org/frame/Experiencer_focus.abhor.verb> rdfs:label "avsky"@sv

<http://framebase.org/frame/Desirability.awful.adjective> rdfs:label "otäck"@sv

<http://framebase.org/frame/Likelihood.probable.adjective> rdfs:label "sannolik"@sv

<http://framebase.org/frame/Contingency.independent.adjective> rdfs:label "oberoende"@sv

<http://framebase.org/frame/Physical_artworks.photograph.noun> rdfs:label "fotografi"@sv

<http://framebase.org/frame/Intoxication.sober.adjective> rdfs:label "nykter"@sv

<http://framebase.org/frame/Containers.suitcase.noun> rdfs:label "resväska"@sv

<http://framebase.org/frame/Observable_body_parts.sole.noun> rdfs:label "fotsula"@sv

<http://framebase.org/frame/Mental_property.cynical.adjective> rdfs:label "cynisk"@sv

<http://framebase.org/frame/Word_relations.holonym.noun> rdfs:label "holonym"@sv

<http://framebase.org/frame/Stage_of_progress.sophisticated.adjective> rdfs:label "sofistikerad"@sv

<http://framebase.org/frame/Using.utilise.verb> rdfs:label "använda"@sv

<http://framebase.org/frame/Death.drown.verb> rdfs:label "drunkna"@sv

<http://framebase.org/frame/Successful_action.fail.verb> rdfs:label "fallera"@sv

<http://framebase.org/frame/Evaluative_comparison.comparable.adjective> rdfs:label "jämförbar"@sv

<http://framebase.org/frame/Food.cracker.noun> rdfs:label "kex"@sv

<http://framebase.org/frame/Calendric_unit.week.noun> rdfs:label "vecka"@sv

<http://framebase.org/frame/Attempt_suasion.discourage.verb> rdfs:label "avråda"@sv

<http://framebase.org/frame/Compliance.obey.verb> rdfs:label "lyda"@sv

<http://framebase.org/frame/Self_motion.jump.verb> rdfs:label "hoppa"@sv

<http://framebase.org/frame/Mental_property.naive.adjective> rdfs:label "naiv"@sv

<http://framebase.org/frame/Commitment.promise.verb> rdfs:label "lova"@sv

<http://framebase.org/frame/Food.egg.noun> rdfs:label "ägg"@sv

<http://framebase.org/frame/Judgment_communication.laud.verb> rdfs:label "prisa"@sv

used to evoke the macroframe, and (2) the transla-tion of the sentence would include the Swedish labelas translation of the English LU. The evaluation wasjointly performed by two annotators, and it is availableat http://www.framebase.org.

This evaluation yields a precision of 0.99, and therecall, under the simplifying assumption of one correcttranslation per LU, is 0.26. The somewhat low cover-age is due to the low coverage of SweFn++ in relationto the Berkeley English FrameNet, but the high preci-

Page 14: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

16 J. Rouces et al. / Addressing structural and linguistic heterogeneity in the Web

sion confirms the validity of the assumption made inSection 5. More details about the results are availablein Table 1.

We also tested Algorithm 2 on synset-macroframes,and we used SweFn++ as gold standard for eval-uation. In this case, it can only be applied to the18,907 synset-microframes that have a parent otherthan frame:Top_frame. The obtained precision fora sample of 100 was rather low (0.38). This is due totwo factors: synsets in WordNet are based on substan-tially more fine-grained sense distinctions in compari-son with LUs in FrameNet, and SweFn++ is not com-plete (for instance, it has a constraint of one SALDOID per WordNet synset). In order to alleviate this, weused a weighted synonymy graph included in SALDO,where SALDO IDs are weighted by similarity from 0to 5. Expanding the gold standard with neighbors con-nected by a value higher or equal to 4 increased theprecision to 67%, whereas using a threshold of 3 in-creased the precision to 93%. Table 2 lists a few correctexamples of the resulting annotations.

Besides simple lexical annotations using rdfs:label, we also create morpho-syntactically rich an-notations using Lemon [39].

6. Conclusion

We have presented two approaches to advance theintegration of heterogeneous knowledge – both struc-tured and non-structured – into FrameBase, which is aschema able to represent a wide range of knowledgeusing linguistic structures.

Klint is a Web-based framework that allows usersto supervise an automatic integration of heterogeneousstructured knowledge by providing a user-friendlygraph-based interface that enables the reviewing andcuration of complex integration rules produced bystate-of-the-art integration algorithms. We have alsodemonstrated how FrameBase can be extended to sup-port multilingual labels by reusing publicly availablelexical resources, which has the potential to benefit textmining in languages other than English.

References

[1] S. Araujo, G.-J. Houben, D. Schwabe and J. Hidders, Fusion– Visually exploring and eliciting relationships in linked data,in: The Semantic Web – ISWC 2010, Springer-Verlag, Berlin,2010, pp. 1–15.

[2] V. Basile, E. Cabrio and C. Schon, KNEWS: Using logical andlexical semantics to extract knowledge from natural language,in: Proceedings of the 22nd European Conference on ArtificialIntelligence (ECAI 2016), 2016.

[3] C. Bizer, T. Heath and T. Berners-Lee, Linked data – The storyso far, IJSWIS 5(3) (2009), 1–22.

[4] C. Böhm, G. de Melo, F. Naumann and G. Weikum, LINDA:Distributed Web-of-data-scale entity matching, in: Proceed-ings of CIKM 2012 (CIKM ’12), 2012, pp. 2104–2108. ISBN978-1-4503-1156-4. doi:10.1145/2396761.2398582.

[5] L. Borin, D. Dannélls, M. Forsberg, M. Toporowska Gronos-taj and D. Kokkinakis, Swedish FrameNet++, in: SwedishLanguage Technology Conference 2010, 2010, https://svn.spraakdata.gu.se/sb/fnplusplus/pub/sltc2010.pdf.

[6] L. Borin, M. Forsberg and L. Lönngren, SALDO: A touch ofyin to WordNet’s yang, Language Resources and Evaluation47(4) (2013), 1191–1211. doi:10.1007/s10579-013-9233-4.

[7] J.M. Brunetti, S. Auer, R. García, J. Klímek and M. Necaský,Formal linked data visualization model, in: Proceedings of In-ternational Conference on Information Integration and Web-Based Applications & Services (IIWAS ’13), ACM, NewYork, NY, USA, 2013. ISBN 978-1-4503-2113-6. doi:10.1145/2539150.2539162.

[8] A. Burchardt, K. Erk, A. Frank, A. Kowalski, S. Padó andM. Pinkal, The SALSA corpus: A German corpus resourcefor lexical semantics, in: Proceedings of LREC 2006, 2006,pp. 969–974.

[9] D.V. Camarda, S. Mazzini and A. Antonuccio, LodLive, ex-ploring the Web of data, in: Proceedings of the 8th Interna-tional Conference on Semantic Systems (I-SEMANTICS ’12),ACM, New York, NY, USA, 2012, pp. 197–200. ISBN 978-1-4503-1112-0. doi:10.1145/2362499.2362532.

[10] J.J. Carroll, I. Dickinson, C. Dollin, D. Reynolds, A. Seaborneand K. Wilkinson, Jena: Implementing the Semantic Web Rec-ommendations, in: WWW ’04, 2004. ISBN 1-58113-912-8.doi:10.1145/1013367.1013381.

[11] F. Corcoglioniti, M. Rospocher and A.P. Aprosio, A 2-phaseframe-based knowledge extraction framework, in: Proceedingsof ACM Symposium on Applied Computing (SAC’16), 2016,pp. 354–361.

[12] O. Culo and G. de Melo, Source–Path–Goal: Investigating thecross-linguistic potential of frame-semantic text analysis, it –Information Technology 54 (2012), 147–152.

[13] A.-S. Dadzie and M. Rowe, Approaches to visualising linkeddata: A survey, Semantic Web 2(2) (2011), 89–124.

[14] J. David, J. Euzenat, F. Scharffe and C. Trojahn dos San-tos, The alignment API 4.0, Semantic Web 2(1) (2011),3–10, http://www.semantic-web-journal.net/sites/default/files/swj60_1.pdf.

[15] G. de Melo, Not quite the same: Identity constraints for theWeb of Linked Data, in: Proceedings of AAAI 2013, AAAIPress, Menlo Park, CA, USA, 2013, pp. 1092–1098, http://www.aaai.org/ocs/index.php/AAAI/AAAI13/paper/view/6491. ISBN 978-1-57735-615-8.

[16] G. de Melo, F. Suchanek and A. Pease, Integrating YAGO intothe Suggested Upper Merged Ontology, in: Proceedings of the20th IEEE International Conference on Tools with Artificial In-telligence (ICTAI 2008), IEEE Computer Society, Los Alami-tos, CA, USA, 2008.

Page 15: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

J. Rouces et al. / Addressing structural and linguistic heterogeneity in the Web 17

[17] G. de Melo and G. Weikum, Language as a foundation of theSemantic Web, in: Proceedings of ISWC 2008, CEUR WS,Vol. 401, CEUR, Karlsruhe, Germany, 2008.

[18] G. de Melo and G. Weikum, MENTA: Inducing multilingualtaxonomies from Wikipedia, in: Proceedings of CIKM 2010,ACM, 2010, pp. 1099–1108.

[19] G. de Melo and G. Weikum, Untangling the cross-linguallink structure of Wikipedia, in: Proceedings of the 48th An-nual Meeting of the Association for Computational Linguistics,Association for Computational Linguistics, Stroudsburg, PA,USA, 2010, pp. 844–853, http://www.aclweb.org/anthology/P10-1087. ISBN 978-1-932432-66-4.

[20] G. de Melo and G. Weikum, UWN: A large multilingual lexicalknowledge base, in: Proceedings of the 50th Annual Meetingof the Association for Computational Linguistics, Associationfor Computational Linguistics, Stroudsburg, PA, USA, 2012,pp. 151–156. ISBN 978-1-937284-27-5.

[21] R. Dhamankar, Y. Lee, A. Doan, A. Halevy and P. Domin-gos, iMAP: Discovering complex semantic matches betweendatabase schemas, in: SIGMOD 2004, 2004, pp. 383–394.ISBN 1-58113-859-8. doi:10.1145/1007568.1007612.

[22] J. Euzenat and P. Shvaiko, Ontology Matching, Springer, 2007.[23] M. Färber, F. Bartscherer, C. Menne and A. Rettinger, Linked

data quality of DBpedia, Freebase, OpenCyc, Wikidata, andYAGO, Semantic Web, 2016.

[24] C. Fellbaum (ed.), WordNet: An Electronic Lexical Database,MIT Press, 1998.

[25] C.J. Fillmore, The case for case reopened, Syntax and Seman-tics 8 (1977), 59–82.

[26] C.J. Fillmore, C.R. Johnson and M.R. Petruck, Backgroundto FrameNet, International Journal of Lexicography 16(3)(2003), 235–250. doi:10.1093/ijl/16.3.235.

[27] B. Fu, R. Brennan and D. O’Sullivan, Cross-lingual ontologymapping – An investigation of the impact of machine trans-lation, in: The Semantic Web, Springer-Verlag, Berlin, 2009,pp. 1–15.

[28] B. Fu, R. Brennan and D. O’Sullivan, Cross-lingual ontologymapping and its use on the multilingual Semantic Web, MSW571 (2010), 13–20.

[29] T. Ge, Y. Wang, G. de Melo and H. Li, Visualizing and curatingknowledge graphs over time and space, in: Proceedings of ACL2016, ACL, 2016.

[30] F. Giunchiglia, P. Shvaiko and M. Yatskevich, S-Match: An al-gorithm and an implementation of semantic matching, in: TheSemantic Web: Research and Applications, Springer, Berlin,2004, pp. 61–75. doi:10.1007/978-3-540-25956-5_5.

[31] B. Golshan, A. Halevy, G. Mihaila and W.-C. Tan, Data inte-gration: After the teenage years, in: Proceedings of the 36thACM SIGMOD–SIGACT–SIGAI Symposium on Principles ofDatabase Systems (PODS ’17), ACM, New York, NY, USA,2017, pp. 101–106. ISBN 978-1-4503-4198-1. doi:10.1145/3034786.3056124.

[32] S. Harris and A. Seaborne, SPARQL 1.1 query language, W3CRecommendation, W3C Consortium, 2013.

[33] J. Hayes and C. Gutierrez, Bipartite graphs as intermediatemodel for RDF, in: ISWC 2004, 2004, pp. 47–61.

[34] P. Hayes and P. Patel-Schneider, RDF 1.1 semantics, Technicalreport, W3C, 2014.

[35] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas,P.N. Mendes, S. Hellmann, M. Morsey, P. Van Kleef, S. Auer et

al., DBpedia – A large-scale, multilingual knowledge base ex-tracted from Wikipedia, Semantic Web 6(2) (2015), 167–195.

[36] J. Li, J. Tang, Y. Li and Q. Luo, RiMOM: A dynamic mul-tistrategy ontology alignment framework, IEEE TKDE 21(8)(2009), 1218–1232. doi:10.1109/TKDE.2008.202.

[37] F. Mahdisoltani, J. Biega and F. Suchanek, Yago3: A knowl-edge base from multilingual Wikipedias, in: 7th Biennial Con-ference on Innovative Data Systems Research, 2015, CIDRConference.

[38] C.D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S.J. Bethardand D. McClosky, The Stanford CoreNLP natural languageprocessing toolkit, in: Association for Computational Linguis-tics (ACL) System Demonstrations, 2014, pp. 55–60, http://www.aclweb.org/anthology/P/P14/P14-5010.

[39] J. McCrae, D. Spohr and P. Cimiano, Linking lexical resourcesand ontologies on the Semantic Web with lemon, in: TheSemantic Web: Research and Applications, Springer, Berlin,2011, pp. 245–259.

[40] R. Navigli and S.P. Ponzetto, BabelNet: The automatic con-struction, evaluation and application of a wide-coverage mul-tilingual semantic network, Artificial Intelligence 193 (2012),217–250. doi:10.1016/j.artint.2012.07.001.

[41] A.-C. Ngonga Ngomo and S. Auer, LIMES: A time-efficientapproach for large-scale link discovery on the Web of Data, in:Proceedings of the Twenty-Second International Joint Confer-ence on Artificial Intelligence (IJCAI’11), AAAI Press, 2011,pp. 2312–2317. ISBN 978-1-57735-515-1. doi:10.5591/978-1-57735-516-8/IJCAI11-385.

[42] K. Ohara, Semantic annotations in Japanese FrameNet: Com-paring frames in Japanese and English, in: Proceedingsof LREC 2012, European Language Resources Association(ELRA), Istanbul, Turkey, 2012. ISBN 978-2-9517408-7-7.

[43] D. Ritze, C. Meilicke, O. Sváb-Zamazal and H. Stucken-schmidt, A pattern-based ontology matching approach for de-tecting complex correspondences, in: OM’10, 2008, http://dblp.uni-trier.de/db/conf/semweb/om2009.html#RitzeMSS08.

[44] J. Rouces, G. de Melo and K. Hose, FrameBase: RepresentingN-ary relations using semantic frames, in: Proceedings of the12th Extended Semantic Web Conference, ESWC, 2015.

[45] J. Rouces, G. de Melo and K. Hose, Heuristics for connectingheterogeneous knowledge via FrameBase, in: ESWC’16, 2015.

[46] J. Rouces, G. de Melo and K. Hose, Representing specializedevents with FrameBase, in: DeRiVE ’15, 2015.

[47] J. Rouces, G. de Melo and K. Hose, Complex schema map-ping and linking data: Beyond binary predicates, in: LDOW’16,2016.

[48] J. Ruppenhofer, M. Ellsworth, M.R.L. Petruck, C.R. Johnsonand J. Scheffczyk, FrameNet II: Extended Theory and Prac-tice, ICSI, 2006.

[49] F. Scharffe and D. Fensel, Correspondence patterns for on-tology alignment, in: Proceedings of the 16th InternationalConference on Knowledge Engineering: Practice and Patterns(EKAW ’08), Springer-Verlag, Berlin, 2008, pp. 83–92. ISBN978-3-540-87695-3. doi:10.1007/978-3-540-87696-0_10.

[50] G. Sérasset, DBnary: Wiktionary as a Lemon-based multilin-gual lexical resource in RDF, Semantic Web 6(4) (2015), 355–361. doi:10.3233/SW-140147.

Page 16: Addressing structural and linguisticgerard.demelo.org/papers/framebase-klint.pdfAI Communications 31 (2018) 3–18 3 DOI 10.3233/AIC-170745 IOS Press Addressing structural and linguistic

18 J. Rouces et al. / Addressing structural and linguistic heterogeneity in the Web

[51] C. Subirats, Spanish FrameNet: A frame-semantic analysis ofthe Spanish lexicon, in: Multilingual FrameNets in Computa-tional Lexicography: Methods and Applications, H. Boas, ed.,Mouton de Gruyter, Berlin, 2009, pp. 135–162.

[52] F.M. Suchanek, S. Abiteboul and P. Senellart, PARIS: Prob-abilistic alignment of relations, instances, and schema, Proc.VLDB Endow. 5(3) (2011), 157–168. doi:10.14778/2078331.2078332.

[53] F.M. Suchanek, G. Kasneci and G. Weikum, YAGO: A largeontology from Wikipedia and WordNet, Web Semantics: Sci-ence, Services and Agents on the World Wide Web 6(3) (2008),203–217. doi:10.1016/j.websem.2008.06.001.

[54] C. Trojahn, P. Quaresma and R. Vieira, A framework for mul-tilingual ontology mapping, in: Proceedings of LREC 2008,ACM, 2008.

[55] T. Tudorache, C. Nyulas, N.F. Noy and M.A. Musen, WebPro-tégé: A collaborative ontology editor and knowledge acquisi-tion tool for the Web, Semantic web 4(1) (2013), 89–99.

[56] Z. Wang, J. Li, Z. Wang and J. Tang, Cross-lingual knowledgelinking across Wiki knowledge bases, in: Proceedings of the21st International Conference on World Wide Web (WWW ’12),ACM, New York, NY, USA, 2012, pp. 459–468. ISBN 978-1-4503-1229-5. doi:10.1145/2187836.2187899.