A Flexible Framework for Understanding the Dynamics of Evolving RDF Datasets: Extended Version

Yannis Roussakis¹, Ioannis Chrysakis¹, Kostas Stefanidis¹, Giorgos Flouris¹, and Yannis Stavrakas²

¹ Institute of Computer Science, FORTH, Heraklion, Greece
{rousakis,hrysakis,kstef,fgeo}@ics.forth.gr

² Institute for the Management of Information Systems, ATHENA, Athens, Greece
[email protected]

Abstract. The dynamic nature of Web data gives rise to a multitude of problems related to the description and analysis of the evolution of RDF datasets, which are important to a large number of users and domains, such as the curators of biological information, where changes are constant and interrelated. In this paper, we propose a framework that enables identifying, analysing and understanding these dynamics. Our approach is flexible enough to capture the peculiarities and needs of different applications on dynamic data, while being formally robust due to the satisfaction of the completeness and unambiguity properties. In addition, our framework allows the persistent representation of the detected changes between versions, in a manner that enables easy and efficient navigation among versions, automated processing and analysis of changes, cross-snapshot queries (spanning across different versions), as well as queries involving both changes and data. Our work is evaluated using real Linked Open Data, and exhibits good scalability properties.

1 Introduction

With the growing complexity of the Web, we face a completely different way of creating, disseminating and consuming big volumes of information. The recent explosion of the Data Web and the associated Linked Open Data (LOD) initiative has led to large-scale corporate, government, and even user-generated data from different domains (e.g., DBpedia, Freebase, YAGO) being published online and becoming available to a wide spectrum of users [21]. Dynamicity is an indispensable part of LOD; LOD datasets are constantly evolving for several reasons, such as the inclusion of new experimental evidence or observations, or the correction of erroneous conceptualizations [22]. Understanding this evolution by finding and analysing the differences (deltas) between datasets has proved to play a crucial role in various curation tasks, like the synchronization of autonomously developed dataset versions [3], the visualization of the evolution history of a dataset [13], and the synchronization of interconnected LOD datasets [15]. Deltas are also necessary in certain applications that require access to previous versions of a dataset to support historical or cross-snapshot queries [20], in order to review past states of the dataset, understand the evolution process (e.g., to identify trends in the domain of interest), or detect the source of errors in the current modelling. Unfortunately, it is often difficult, or even infeasible, for curators or editors to accurately record such deltas; studies have shown that manually created deltas are often incomplete or erroneous, even for centrally curated datasets [15]. In addition, such a recording would require a closed and controlled system, and is thus not suitable for the chaotic nature of the Web.

To study the dynamics of LOD, we propose a framework for detecting and analysing changes and the evolution history of LOD datasets. This would allow remote users of a dataset to identify changes, even if they have no access to the actual change process. Apart from identifying the change, we focus on empowering users to perform sophisticated analysis on the evolution data, so as to understand how datasets (or parts of them) evolve, and how this evolution is related to the data itself. For instance, one could be interested in specific types of evolution, e.g., transfers of soccer players, along a certain timeframe, e.g., DBpedia versions v3.7-v3.9, with emphasis on specific parts of the data, e.g., only for strikers being transferred to Spanish teams. This motivating example is further discussed in Section 2, where we give an informal description of our framework. We restrict ourselves to RDF³ datasets, as RDF is the de facto standard for representing knowledge in LOD. Analysis of the evolution history is based on SPARQL [18], a W3C standard for querying RDF datasets. Details on RDF and SPARQL appear in Section 3.

Regarding change detection, our framework acknowledges that there is no one-size-fits-all solution, and that different uses (or users) of the data may require a different set of changes to be reported, since the importance and frequency of changes vary across application domains. For this reason, our framework supports both simple and complex changes. Simple changes are meant to capture fine-grained types of evolution. They are defined at design time and should meet the formal requirements of completeness and unambiguity, which guarantee that the detection process is well-behaved [15]. Complex changes are meant to capture more coarse-grained, or specialized, changes that are useful for the application at hand; this allows a customized behaviour of the change detection process, depending on the actual needs of the application. Complex changes are totally dynamic, and defined at run-time, greatly enhancing the flexibility of our approach. More details on the definition of changes are given in Section 4.

To support the flexibility required by complex changes, our detection process is based on SPARQL queries (one per defined change) that are provided to the algorithm as configuration parameters; as a result, the core detection algorithm is agnostic to the set of simple or complex changes used, thereby allowing new changes to be easily defined. Furthermore, to support sophisticated analysis of the evolution process, we propose an ontology of changes, which allows the persistent representation of the detected changes, in a manner that permits easy and efficient navigation among versions, analysis of the deltas, cross-snapshot or historical queries, and the raising of changes as first class citizens. This, in a multi-version repository, allows queries that refer uniformly to both the data and its evolution. This framework provides a generic basis for analyzing the dynamics of LOD datasets, and is described in Section 5.

In our experimental evaluation (Section 6), we used 3 real RDF datasets of different sizes to study the number of simple and complex changes that usually occur in real-world settings, and provide an analysis of their types. Moreover, we report the evaluation results of the efficiency of our change detection process and quantify the effect of the size of the compared versions and the number of detected changes on the performance of the algorithm. To our knowledge, this is the first time that change detection has been evaluated for datasets of this size.

3 http://www.w3.org/RDF/

Fig. 1. Motivating Example

2 Motivating Example

In our work, we provide a change recognition method, which, given two dataset versions $D_{old}$, $D_{new}$, produces their delta ($\Delta$), i.e., a formal description of the changes that were made to get $D_{new}$ from $D_{old}$. The naive approach is to express the delta with low-level changes (consisting of triple additions and deletions). Our approach builds two more layers on top of low-level changes, each adding a semantically richer change vocabulary.

Low-level changes are easy to define and detect, and have several nice properties [23]. For example, assume two DBpedia versions of a partial ontology with information about football teams (Figure 1 (top)), in which the RDF class of Real Madrid CF is a subclass of SoccerClub. Commonly, change detection compares the current with the previous dataset version and returns the low-level delta containing the added triples: (Mikel Lasa, team, Real Madrid CF), (Mikel Lasa, name, Mikel Lasa), (Mikel Lasa, type, Athlete). Clearly, the representation of changes at the level of (added/deleted) triples leads to a syntactic delta, which does not properly capture the intent behind a change and generates results that are not intuitive enough for the human user. What we would like to report is: Add Player(“Mikel Lasa”, Real Madrid CF), which corresponds to the actual essence of the change.
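As an illustration of the low-level delta described above, the following minimal Python sketch computes added and deleted triples by set difference. It assumes that each version fits in memory as a set of (subject, predicate, object) string tuples; the function name `low_level_delta` and the underscore-separated identifiers are hypothetical and serve only this example.

```python
# Assumption: a version is an in-memory set of (subject, predicate, object)
# string tuples. Names below are illustrative, not taken from the framework.

def low_level_delta(d_old, d_new):
    """Return (added, deleted): the triples in d_new but not d_old, and vice versa."""
    added = d_new - d_old
    deleted = d_old - d_new
    return added, deleted


d_old = {("Real_Madrid_CF", "subClassOf", "SoccerClub")}
d_new = d_old | {
    ("Mikel_Lasa", "team", "Real_Madrid_CF"),
    ("Mikel_Lasa", "name", "Mikel Lasa"),
    ("Mikel_Lasa", "type", "Athlete"),
}

added, deleted = low_level_delta(d_old, d_new)
for triple in sorted(added):
    print("added:", triple)
```

The three added triples are exactly the syntactic delta of the running example; nothing in this output says that a player joined a team, which is the gap the higher layers of changes are meant to close.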

In order to achieve this, we need an intermediary level of changes, called simple changes. Simple changes are fine-grained, predefined and domain-agnostic changes. In our example, the low-level changes found as added triples reflect three simple changes, namely, two Add Property Instance changes, for the property:team and property:name, and one Add Type To Individual change, for denoting the type of athlete (Figure 1). Interestingly, a simple change can group a set of different low-level changes.

However, it is still not easy for a user who is not a domain expert and is not familiar with the notion of triples to define simple changes. To address this problem, changes of coarser granularity are needed. The main idea is to group simple changes into complex ones that are data model agnostic and carry domain-specific semantics, thereby making the description of the evolution (delta) more human-understandable and concise. In our example, the three simple changes can be grouped under one complex change, called Add Player. The change’s definition includes two arguments: Add Player(“Mikel Lasa”, Real Madrid CF). Such a complex change consumes the corresponding simple changes; thus, there is no need to report them further.

In a nutshell, complex changes are user-defined, custom changes, which intend to capture changes from the application perspective. Different applications are expected to use different sets of complex changes. Complex changes are defined at a semantic level, and may be used to capture coarse-grained changes that happen often; or changes that the curator wants to highlight because they are somehow useful or interesting for a specific domain or application; or changes that indicate an abnormal situation or type of evolution. Thus, complex changes build upon simple ones because, intuitively, complex changes are much easier to define on top of simple changes.

On the other hand, complex changes, being coarse-grained, cannot capture all evolution aspects; moreover, it would be unrealistic to assume that complex changes would be defined in a way that captures all possible evolution types. Thus, simple changes are necessary as a “default” set of changes for describing evolution types that are not interesting, common, or coarse-grained enough to be expressed using complex changes.

3 Preliminaries

We consider two disjoint sets $U$, $L$, denoting the URIs and literals (we ignore here blank nodes, which can be avoided when data are published according to the LOD paradigm); the set $T = U \times U \times (U \cup L)$ is the set of all RDF triples. A version $D_i$ is a set of RDF triples ($D_i \subseteq T$); a dataset $D$ is a sequence of versions $D = \langle D_1, \ldots, D_n \rangle$.

SPARQL 1.1 [18] is the official W3C recommendation language for querying RDF graphs. The building block of a SPARQL statement is a triple pattern $tp$ that is like an RDF triple, but may contain variables (prefixed with character ?); variables are taken from an infinite set of variables $V$, disjoint from the sets $U$, $L$, so the set of triple patterns is: $TP = (U \cup V) \times (U \cup V) \times (U \cup L \cup V)$. SPARQL triple patterns can be combined into graph patterns $gp$, using operators like join (“.”), optional (OPTIONAL) and union (UNION) [1], and may also include conditions (using FILTER). In this work, we are only interested in SELECT SPARQL queries, which are of the form: “SELECT $v_1, \ldots, v_n$ WHERE $gp$”, where $n > 0$, $v_i \in V$ and $gp$ is a graph pattern.

Evaluation of SPARQL queries is based on mappings, which are partial functions $\mu : V \mapsto U \cup L$ that associate variables with URIs or literals (abusing notation, $\mu(tp)$ is used to denote the result of replacing the variables in $tp$ with their assigned values according to $\mu$). Then, the evaluation of a SPARQL triple pattern $tp$ on a dataset $D$ returns a set of mappings (denoted by $[[tp]]^{D}$), such that $\mu(tp) \in D$ for $\mu \in [[tp]]^{D}$. This idea is extended to graph patterns by considering the semantics of the various operators (e.g., $[[tp_1 \text{ UNION } tp_2]]^{D} = [[tp_1]]^{D} \cup [[tp_2]]^{D}$). Given a SPARQL query “SELECT $v_1, \ldots, v_n$ WHERE $gp$”, its result when applied on $D$ is $(\mu(v_1), \ldots, \mu(v_n))$ for $\mu \in [[gp]]^{D}$. For the precise semantics and further details on the evaluation of SPARQL queries, the reader is referred to [16, 1].
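To make the notion of mappings concrete, the following Python sketch evaluates a single triple pattern over a small in-memory dataset. It is a toy model under stated assumptions, not a SPARQL engine: triples and patterns are string tuples, any term starting with "?" is treated as a variable (so literals beginning with "?" are not supported), and the names `eval_triple_pattern` and `apply_mapping` are hypothetical.

```python
# Toy evaluation of one triple pattern; a mapping mu is a dict from variable
# names (strings starting with "?") to URIs/literals. Illustrative only.

def is_var(term):
    return term.startswith("?")


def eval_triple_pattern(tp, dataset):
    """Return all mappings mu such that mu(tp) is a triple of the dataset."""
    mappings = []
    for triple in dataset:
        mu = {}
        matched = True
        for pattern_term, value in zip(tp, triple):
            if is_var(pattern_term):
                # A variable occurring twice must be bound to the same value.
                if mu.setdefault(pattern_term, value) != value:
                    matched = False
                    break
            elif pattern_term != value:
                matched = False
                break
        if matched:
            mappings.append(mu)
    return mappings


def apply_mapping(mu, tp):
    """Compute mu(tp): replace bound variables of tp by their values."""
    return tuple(mu.get(term, term) for term in tp)


dataset = {
    ("Mikel_Lasa", "team", "Real_Madrid_CF"),
    ("Mikel_Lasa", "type", "Athlete"),
}
tp = ("?player", "team", "?club")
results = eval_triple_pattern(tp, dataset)
```

Each returned mapping satisfies the property stated in the text: applying it to the pattern yields a triple that belongs to the dataset.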

4 Semantics

4.1 Language of Changes

We assume a set $\mathcal{L} = \{c_1, \ldots, c_n\}$ of changes, which is disjoint from $V$, $U$, $L$. The set $\mathcal{L}$ is called a language of changes and is partitioned into the set of simple changes (denoted by $\mathcal{L}_s$) and the set of complex changes (denoted by $\mathcal{L}_c$). Each change has a certain arity (e.g., Add Player has two arguments); given a change $c$, a change specification is an expression of the form $c(p_1, \ldots, p_n)$, where $n$ is the arity of $c$, and $p_1, \ldots, p_n \in V$.

As was made obvious in Section 2, the detection semantics of a change specification are determined by the changes that it consumes and the related conditions. Formally:

Definition 1. Given a simple change $c \in \mathcal{L}_s$ and its change specification $c(p_1, \ldots, p_n)$, the detection semantics of $c(p_1, \ldots, p_n)$ is defined as a tuple $\langle \delta, \phi_{old}, \phi_{new} \rangle$ where:

– $\delta$ determines the consumed changes of $c$ and is a pair $\delta = (\delta^+, \delta^-)$, where $\delta^+$, $\delta^-$ are sets of triple patterns (corresponding to the added/deleted triples, respectively).
– $\phi_{old}$, $\phi_{new}$ are graph patterns, called the conditions for $D_{old}$, $D_{new}$, respectively.

Definition 2. Given a complex change $c \in \mathcal{L}_c$ and its change specification $c(p_1, \ldots, p_n)$, the detection semantics of $c(p_1, \ldots, p_n)$ is defined as a tuple $\langle \delta, \phi_{old}, \phi_{new} \rangle$ where:

– $\delta$ determines the consumed changes of $c$ and is a set of change specifications from $\mathcal{L}_s$, i.e., $\delta = \{c_1(p^1_1, \ldots, p^1_{n_1}), \ldots, c_m(p^m_1, \ldots, p^m_{n_m})\}$ where $\{c_1, \ldots, c_m\} \subseteq \mathcal{L}_s$.
– $\phi_{old}$, $\phi_{new}$ are graph patterns, called the conditions for $D_{old}$, $D_{new}$, respectively.

In our running example, the detection semantics of Add Property Instance(Mikel Lasa, team, Real Madrid CF) are: $\delta^+$ = {(Mikel Lasa, team, Real Madrid CF)}, $\delta^- = \emptyset$, $\phi_{old}$ = “ ”, $\phi_{new}$ = “ ”. Additionally, the detection semantics of Add Player(“Mikel Lasa”, Real Madrid CF) are: Add Property Instance(Mikel Lasa, team, Real Madrid CF), Add Property Instance(Mikel Lasa, name, “Mikel Lasa”), Add Type To Individual(Mikel Lasa, Athlete).

The structure of the above definitions determines the SPARQL to be used for detection (see Subsection 5.2 and the Appendix). Any actual detection will give specific values (URIs or literals) to the variables appearing in a change specification. For example, when Add Property Instance is detected, the returned result should specify the subject and object of the instance added to the property; essentially, this corresponds to an association of the three variables (parameters) of Add Property Instance to specific URIs/literals. Formally, for a change $c$, a change instantiation is an expression of the form $c(x_1, \ldots, x_n)$, where $n$ is the arity of $c$, and $x_1, \ldots, x_n \in U \cup L$.

4.2 Detection Semantics

Simple changes. For simple changes, a detectable change instantiation corresponds to a certain assignment of the variables in $\delta^+$, $\delta^-$, $\phi_{old}$, $\phi_{new}$, such that the conditions ($\phi_{old}$, $\phi_{new}$) are true in the underlying datasets, and the triples in $\delta^+$, $\delta^-$ have been added/deleted, respectively, from $D_{old}$ to get $D_{new}$. Formally:

Definition 3. A change instantiation $c(x_1, \ldots, x_n)$ of a simple change specification $c(p_1, \ldots, p_n)$ is detectable for the pair $D_{old}$, $D_{new}$ iff there is a $\mu \in [[\phi_{old}]]^{D_{old}} \cap [[\phi_{new}]]^{D_{new}}$ such that for all $tp \in \delta^+$: $\mu(tp) \in D_{new} \setminus D_{old}$, for all $tp \in \delta^-$: $\mu(tp) \in D_{old} \setminus D_{new}$, and for all $i$: $\mu(p_i) = x_i$.
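The detectability condition can be illustrated with a small Python sketch. This is a simplification under explicit assumptions and not the SPARQL-based detection of Section 5.2: the conditions $\phi_{old}$, $\phi_{new}$ are taken to be empty, so mappings are obtained only by matching the patterns of $\delta^+$ against added triples and those of $\delta^-$ against deleted triples; every parameter is assumed to occur in some pattern; and all function names are hypothetical.

```python
# Simplified detection of a simple change with empty conditions.
# Versions are sets of string triples; variables start with "?".

def match(tp, triple, mu):
    """Extend mapping mu so that mu(tp) == triple; return None if impossible."""
    mu = dict(mu)
    for pattern_term, value in zip(tp, triple):
        if pattern_term.startswith("?"):
            if mu.setdefault(pattern_term, value) != value:
                return None
        elif pattern_term != value:
            return None
    return mu


def join(patterns, triples, mappings):
    """Keep the extensions of each mapping under which every pattern matches some triple."""
    for tp in patterns:
        mappings = [
            extended
            for mu in mappings
            for triple in triples
            if (extended := match(tp, triple, mu)) is not None
        ]
    return mappings


def detect_simple(params, delta_plus, delta_minus, d_old, d_new):
    """Return the set of detectable instantiations (x_1, ..., x_n)."""
    added, deleted = d_new - d_old, d_old - d_new
    mappings = join(delta_plus, added, [{}])
    mappings = join(delta_minus, deleted, mappings)
    return {tuple(mu[p] for p in params) for mu in mappings}


d_old = {("Real_Madrid_CF", "subClassOf", "SoccerClub")}
d_new = d_old | {
    ("Mikel_Lasa", "team", "Real_Madrid_CF"),
    ("Mikel_Lasa", "name", "Mikel Lasa"),
}
# Add_Property_Instance(?s, ?p, ?o): one added triple pattern, nothing deleted.
detected = detect_simple(("?s", "?p", "?o"), [("?s", "?p", "?o")], [], d_old, d_new)
```

With non-empty conditions, the candidate mappings would additionally have to belong to the evaluation of $\phi_{old}$ on $D_{old}$ and of $\phi_{new}$ on $D_{new}$; that filter is what would, for instance, keep a type triple from being reported as a property instance.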

Simple changes must satisfy the properties of completeness and unambiguity; this guarantees that the detection process exhibits a sound and deterministic behaviour [15]. Essentially, what we need to show is that each change that the dataset underwent is properly captured by one, and only one, simple change. Formally:

Definition 4. A detectable change instantiation $c(x_1, \ldots, x_n)$ of a simple change specification $c(p_1, \ldots, p_n)$ consumes $t \in D_{new} \setminus D_{old}$ (respectively, $t \in D_{old} \setminus D_{new}$) iff there is a $\mu \in [[\phi_{old}]]^{D_{old}} \cap [[\phi_{new}]]^{D_{new}}$ and a $tp \in \delta^+$ (respectively, $tp \in \delta^-$) such that $\mu(tp) = t$ and for all $i$: $\mu(p_i) = x_i$.

The concept of consumption represents the fact that low-level changes are “assigned” to simple ones, essentially allowing a grouping (partitioning) of low-level changes into simple ones. To fulfil its purpose, this “partitioning” should be perfect, as dictated by the properties of completeness and unambiguity. Formally:

Definition 5. A set of simple changes $C$ is called complete iff for any pair of versions $D_{old}$, $D_{new}$ and for all $t \in (D_{new} \setminus D_{old}) \cup (D_{old} \setminus D_{new})$, there is a detectable instantiation $c(x_1, \ldots, x_n)$ of some $c \in C$ such that $c(x_1, \ldots, x_n)$ consumes $t$.

Definition 6. A set of simple changes $C$ is called unambiguous iff for any pair of versions $D_{old}$, $D_{new}$ and for all $t \in (D_{new} \setminus D_{old}) \cup (D_{old} \setminus D_{new})$, if $c, c' \in C$ and $c(x_1, \ldots, x_n)$, $c'(x'_1, \ldots, x'_m)$ are detectable and consume $t$, then $c(x_1, \ldots, x_n) = c'(x'_1, \ldots, x'_m)$.

In a nutshell, completeness guarantees that all low-level changes are associated with at least one simple change, thereby making the reported delta complete (i.e., not missing any change); unambiguity guarantees that no race conditions will emerge between simple changes attempting to consume the same low-level change (see Figure 2 for a visualization of the notions of completeness and unambiguity). The combination of these two properties guarantees that the delta is produced in a complete and deterministic manner. Regarding the simple changes, $\mathcal{L}_s$, used in this work (for a complete list, see the Appendix), the following holds:

Proposition 1. The simple changes in $\mathcal{L}_s$ are complete and unambiguous.
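The two properties can be checked mechanically for one given pair of versions, as in the following Python sketch. This does not prove Proposition 1, which quantifies over all pairs of versions; it only tests whether a given assignment of low-level changes to detected simple change instantiations forms a perfect partition. The data layout (a dictionary from instantiations to the low-level changes they consume) and the name `check_partition` are assumptions made for the example.

```python
# Check, for one pair of versions, whether consumption is complete (every
# low-level change has at least one consumer) and unambiguous (at most one).

def check_partition(low_level_changes, consumption):
    """consumption maps each detected instantiation to the low-level changes it consumes."""
    consumers = {change: set() for change in low_level_changes}
    for instantiation, consumed in consumption.items():
        for change in consumed:
            consumers.setdefault(change, set()).add(instantiation)
    complete = all(len(consumers[c]) >= 1 for c in low_level_changes)
    unambiguous = all(len(consumers[c]) <= 1 for c in low_level_changes)
    return complete, unambiguous


# Low-level changes are tagged "+" (added) or "-" (deleted).
t_team = ("+", ("Mikel_Lasa", "team", "Real_Madrid_CF"))
t_type = ("+", ("Mikel_Lasa", "type", "Athlete"))
low_level = {t_team, t_type}

good = {
    ("Add_Property_Instance", ("Mikel_Lasa", "team", "Real_Madrid_CF")): {t_team},
    ("Add_Type_To_Individual", ("Mikel_Lasa", "Athlete")): {t_type},
}
# A deliberately faulty set: the type triple is claimed twice.
ambiguous = dict(good)
ambiguous[("Add_Property_Instance", ("Mikel_Lasa", "type", "Athlete"))] = {t_type}

print(check_partition(low_level, good))
print(check_partition(low_level, ambiguous))
```

The second call shows the situation that unambiguity rules out: two different simple change instantiations competing for the same added triple.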

Fig. 2. Visualization of Completeness and Unambiguity

Complex Changes. As complex changes can be freely defined by the user, it would be unrealistic to assume that they will have any quality guarantees, such as completeness or unambiguity. As a consequence, the detection process may lead to non-deterministic consumption of simple changes and conflicts; to avoid this, complex changes are associated with a priority level, which is used to resolve such conflicts.

The detection of complex changes is defined on top of simple ones. A complex change is detectable if its conditions are true for some assignment, while at the same time the corresponding simple changes in $\delta$ are detectable. However, this naive definition could lead to problems, as it could happen that the same detectable simple change instantiation simultaneously contributes to the detection of two (or more) complex changes. Such a case would lead to undesirable race conditions, so we define a total order (called priority, and denoted by $<$) over $\mathcal{L}_c$, which helps disambiguate these cases. This leads to the following definitions:

Definition 7. A complex change instantiation $c(x_1, \ldots, x_n)$ is initially detectable for the pair $D_{old}$, $D_{new}$ iff there is a $\mu \in [[\phi_{old}]]^{D_{old}} \cap [[\phi_{new}]]^{D_{new}}$ such that $c'(\mu(p'_1), \ldots, \mu(p'_m))$ is detectable for all $c'(p'_1, \ldots, p'_m) \in \delta$, and $\mu(p_i) = x_i$ for $i = 1, \ldots, n$.

Definition 8. An initially detectable complex change instantiation $c(x_1, \ldots, x_n)$ consumes a simple change instantiation $c'(x'_1, \ldots, x'_m)$ iff $c'(p'_1, \ldots, p'_m) \in \delta$ and there is a $\mu \in [[\phi_{old}]]^{D_{old}} \cap [[\phi_{new}]]^{D_{new}}$ such that for all $i$, $\mu(p_i) = x_i$, $\mu(p'_i) = x'_i$.

Definition 9. A complex change instantiation $c(x_1, \ldots, x_n)$ is detectable for the pair $D_{old}$, $D_{new}$ iff it is initially detectable for the pair $D_{old}$, $D_{new}$ and there is no initially detectable change instantiation $c'(x'_1, \ldots, x'_m)$ such that $c < c'$ and $c$, $c'$ have at least one consumed simple change instantiation in common.
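The priority-based filtering of Definition 9 can be sketched in Python as follows. The sketch assumes that the initially detectable instantiations and the simple changes each of them consumes have already been computed; priorities are modelled as integers where a larger number means a higher priority; and the complex change Add_Team_Member is a hypothetical competitor invented for the example.

```python
# Literal reading of Definition 9: an initially detectable instantiation is
# detectable unless some initially detectable instantiation of a strictly
# higher-priority change shares a consumed simple change instantiation with it.

def resolve(candidates, priority):
    """candidates: {(change_name, args): set of consumed simple instantiations}."""
    detectable = []
    for inst, consumed in candidates.items():
        blocked = any(
            priority[inst[0]] < priority[other[0]] and bool(consumed & other_consumed)
            for other, other_consumed in candidates.items()
        )
        if not blocked:
            detectable.append(inst)
    return detectable


s_team = ("Add_Property_Instance", ("Mikel_Lasa", "team", "Real_Madrid_CF"))
s_name = ("Add_Property_Instance", ("Mikel_Lasa", "name", "Mikel Lasa"))
s_type = ("Add_Type_To_Individual", ("Mikel_Lasa", "Athlete"))

candidates = {
    ("Add_Player", ("Mikel Lasa", "Real_Madrid_CF")): {s_team, s_name, s_type},
    # Hypothetical lower-priority change competing for the same "team" triple.
    ("Add_Team_Member", ("Mikel_Lasa", "Real_Madrid_CF")): {s_team},
}
priority = {"Add_Player": 2, "Add_Team_Member": 1}

detected = resolve(candidates, priority)
```

Note that, following the definition, an instantiation is blocked by any higher-priority initially detectable instantiation, whether or not that one turns out to be detectable itself.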

5 Change Detection for Evolution Analysis

5.1 Representing Detected Changes

We treat detected changes (i.e., change instantiations) as first-class citizens in order to be able to perform queries analysing the evolution of datasets. Further, we are interested in performing combined queries, in which both the datasets and the changes should be considered to get an answer. To achieve this, the representation of the changes that are detected on the data cannot be separated from the data itself.

For example, consider the following query: “return all the left backs born before 1980, which were transferred to Athletic Bilbao between versions $D_{old}$ and $D_{new}$ and used to play for Real Madrid CF in any version”. Such a query requires access to the changes (to identify transfers to Athletic Bilbao), and to the data (to identify which of those transfers were related to left backs born before 1980); in addition, it requires access to all previous versions (cross-snapshot query) to determine whether any of the potential results (players) used to play for Real Madrid CF in any version.

To answer such queries, the repository should include all versions, as well as their changes. We opt to store the changes in a structured form; their representation should include connections with the actual entities (e.g., teams or players) and the versions that they refer to. This can be achieved by representing changes as RDF entities, with connections to the actual data and versions, so that a detectable change can be associated with the corresponding data entities that it refers to.

In particular, we propose the use of an adequate schema (that we call the ontology of changes) for storing the detected changes, thereby allowing a supervisory view of the detected changes and their association with the entities they refer to in the actual datasets, and facilitating the formulation and answering of queries that refer to both the data and their evolution (see Figure 3). In a nutshell, the schema in our representation describes the change specifications and detection semantics, whereas the detected changes (change instantiations) are classified as instances under this schema. More specifically, at schema level, we introduce one class for each simple and complex change $c \in \mathcal{L}$. Each such class $c$ is subsumed by one of the main classes “Simple Change” or “Complex Change”, indicating the type of $c$. Each change is also associated with its user-defined name, a number of properties (one per parameter), and the names of these parameters (not shown in Figure 3 to avoid cluttering the image).

For complex changes, we also store information regarding the changes being consumed by each complex change, as well as the SPARQL query used for its detection, which is automatically generated at change definition time; this is done for efficiency, to avoid having to generate this query in every run of the detection process. Note that the information related to complex changes is generated on the fly at change creation time (in contrast to simple changes, which are built into the ontology at design time). All schema information is stored in a dataset-specific named graph (“D/changes/App1/schema”, for a dataset D and a related application App1); this is necessary because each different application may adopt a different set of complex changes.

Fig. 3. The Ontology of Changes

At instance level, we introduce one individual for each detectable change instantiation $c(x_1, \ldots, x_n)$ in each pair of versions (AddPI1 and AddPlayer1). This individual is associated with the values of its parameters, which are essentially URIs or literals from the actual dataset versions. This provides the “link” between the change repository and the data, thereby allowing queries involving both the changes and the data. In addition, complex changes are connected with their consumed simple ones. The triples that describe this information are stored in an adequate named graph (e.g., “D/changes/v1-v2”, for the changes detected between v1, v2 of the dataset D).
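The instance-level representation can be pictured with the following Python sketch, which turns a change instantiation into quads (triple plus named graph). The individual names AddPI1 and AddPlayer1 and the graph name come from the text; everything else is an assumption for illustration, in particular the `co:` prefix, the parameter property names and the `co:consumes` property, since the actual vocabulary of the ontology of changes is given only in Figure 3.

```python
# Hypothetical vocabulary: "co:" prefix, parameter properties and co:consumes
# are illustrative stand-ins for the terms of the ontology of changes.

def change_to_quads(change_id, change_class, params, graph, consumes=()):
    """Describe one detected change instantiation as (s, p, o, graph) quads."""
    quads = [(change_id, "rdf:type", change_class, graph)]
    for prop, value in params.items():
        # One property per parameter; values are URIs/literals of the dataset.
        quads.append((change_id, prop, value, graph))
    for simple_id in consumes:
        # Complex changes are linked to the simple changes they consume.
        quads.append((change_id, "co:consumes", simple_id, graph))
    return quads


graph = "D/changes/v1-v2"
simple = change_to_quads(
    "AddPI1",
    "co:Add_Property_Instance",
    {"co:api_subject": "Mikel_Lasa", "co:api_property": "team",
     "co:api_object": "Real_Madrid_CF"},
    graph,
)
complex_ = change_to_quads(
    "AddPlayer1",
    "co:Add_Player",
    {"co:ap_name": "Mikel Lasa", "co:ap_team": "Real_Madrid_CF"},
    graph,
    consumes=["AddPI1"],
)
```

Because the parameter values are the same URIs and literals that occur in the dataset versions, a query can join these quads with the data itself, which is the link the text refers to.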

5.2 Change Detection Process and Storage

To detect simple and complex changes, we rely on plain SPARQL queries, which are generated from the information drawn from the detection semantics of the corresponding changes (Definitions 1 and 2). For simple changes, this information is known at design time, so the query is loaded from a configuration file, whereas for complex changes, the corresponding query is generated once at change-creation time (run-time) and is loaded from the ontology of changes (see Figure 3). For examples of such queries, see the Appendix. The results of the generated queries determine the change instantiations that are detectable; these results determine the actual triples to be inserted in the ontology of changes.

The SPARQL queries used for detecting a simple change are SELECT queries, whose returned values are the values of the change instantiation; thus, for each variable in the change specification, we put one variable in the SELECT clause. Then, the WHERE clause of the query includes the triple patterns that should (or should not) be found in each of the versions in order for a change instantiation to be detectable; more specifically, the triple patterns in $\delta^+$ must be found in $D_{new}$ but not in $D_{old}$, the triple patterns in $\delta^-$ must be found in $D_{old}$ but not in $D_{new}$, and the graph patterns in $\phi_{old}$, $\phi_{new}$ should be applied in $D_{old}$, $D_{new}$, respectively.
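One possible shape of such a query is sketched below as a small Python generator. This is an assumption about how the description above could be realised, not the queries actually used (those are listed in the Appendix): each version is assumed to live in its own named graph, absence is expressed with the standard SPARQL 1.1 `FILTER NOT EXISTS` construct, terms are assumed to be already written in SPARQL syntax, and the function name and graph URIs are hypothetical.

```python
# Build a SELECT query for a simple change from its detection semantics.
# Assumptions: one named graph per version; terms already in SPARQL syntax.

def fmt(tp):
    return " ".join(tp) + " ."


def build_simple_change_query(params, delta_plus, delta_minus, g_old, g_new,
                              phi_old="", phi_new=""):
    lines = ["SELECT " + " ".join(params) + " WHERE {"]
    for tp in delta_plus:
        # Added triples: present in the new version, absent from the old one.
        lines.append(f"  GRAPH <{g_new}> {{ {fmt(tp)} }}")
        lines.append(f"  FILTER NOT EXISTS {{ GRAPH <{g_old}> {{ {fmt(tp)} }} }}")
    for tp in delta_minus:
        # Deleted triples: present in the old version, absent from the new one.
        lines.append(f"  GRAPH <{g_old}> {{ {fmt(tp)} }}")
        lines.append(f"  FILTER NOT EXISTS {{ GRAPH <{g_new}> {{ {fmt(tp)} }} }}")
    if phi_old:
        lines.append(f"  GRAPH <{g_old}> {{ {phi_old} }}")
    if phi_new:
        lines.append(f"  GRAPH <{g_new}> {{ {phi_new} }}")
    lines.append("}")
    return "\n".join(lines)


query = build_simple_change_query(
    params=("?s", "?p", "?o"),
    delta_plus=[("?s", "?p", "?o")],
    delta_minus=[],
    g_old="http://example.org/v1",
    g_new="http://example.org/v2",
)
print(query)
```

Keeping the query text as data, generated from $\delta^+$, $\delta^-$ and the conditions, is what lets the core detection algorithm stay agnostic to the set of changes in use.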

The generation of the SPARQL queries for the complex changes follows a similar pattern. The main difference is that complex changes check the existence of simple changes in the ontology of changes, rather than triples in the two versions (as is the case with simple changes detection); therefore, complex changes should be detected after the detection of simple changes and their storage in the ontology. Note also that the considered simple changes should not have been marked as “consumed” by other detectable changes of a higher priority; thus, it is important for queries associated with complex changes to be executed in a particular order, as implied by their priority.

Following detection, the information about the detectable (simple or complex) change instantiations is stored in the ontology of changes along with any new consumptions of simple changes. To do so, we process each result row to create the corresponding triple blocks, as specified in Section 5.1. This is done as a separate process that first stores the triple blocks in a file (on disk) and subsequently uploads them into the triple store (in our implementation, we use Virtuoso⁴ and its bulk loading process for triple ingestion). Note that the detection and storing of changes could be done in one step, if one used an adequately defined SPARQL INSERT statement⁵ that identified the detectable change instantiations, created the corresponding triple blocks and inserted them in the ontology using a single statement. However, this approach turned out to be slower by 1 to 2 orders of magnitude, partly because it does not exploit bulk updates based on multiple threads, and also because bulk loading is much faster.

6 Experimental Evaluation

Our evaluation focuses on identifying the number and type of simple and complex changes that usually occur in real-world settings, studying the performance of our change detection process and quantifying the effect of different parameters on the performance of the algorithm. Our experiments are based on the changes defined in the Appendix.

Setting. For the management of linked data (e.g., storage of datasets and query execution), we worked with a scalable triple store, namely the open source version of Virtuoso Universal Server⁴, v7.10.3209 (note that our work is not bound to any specific infrastructure or triple store). Virtuoso is hosted on a machine which uses an Intel Xeon E5-2630 at 2.30GHz, with 384GB of RAM, running Debian Linux (wheezy) with Linux kernel 3.16.4. The system uses a 7TB RAID-5 HDD configuration. From the total amount of memory, we dedicated 64GB to Virtuoso and 5GB to the implemented application. Moreover, taking into account that the CPU provides 12 cores with 2 threads each, we decided to use a multi-threaded implementation; specifically, we noticed that the use of 8 threads during the creation of the RDF triples along with the ingestion process gave us optimal results for our setting. This was one more reason to select Virtuoso for our implementation, as it allows the concurrent use of multiple threads during ingestion. To eliminate the effects of hot/cold starts, cached OS information etc., each change detection process was executed 10 times and the average times were considered.
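The measurement protocol (repeat each run 10 times and report the mean) can be expressed with a small helper. This is an illustrative Python sketch; the helper name and the stand-in task are assumptions, not part of the evaluated system.

```python
import time


def average_runtime(task, runs=10):
    """Execute task() `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


# Stand-in workload; in the experiments the task would be one full change
# detection process for a pair of versions.
print(average_runtime(lambda: sum(range(100_000))))
```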

For our experimental evaluation, we used 3 real RDF datasets of different sizes: a subset of the English DBpedia⁶ (consisting of article categories, instance types, labels and mapping-based properties), and the FMA⁷ and EFO⁸ datasets. Table 1 summarizes the sizes of the evaluated versions of these datasets. To evaluate the performance of the complex change detection process, we created 3 sets of complex changes, one for each dataset. To do this, we exploited domain experts' knowledge⁹, so as to have sets of changes that reflect real users' needs and show similar characteristics, namely (i) the same number of complex changes in the sets and (ii) very close numbers of simple changes consumed by the complex changes in the sets. Table 2 presents the particular complex changes used for each dataset along with the number of simple changes consumed (for the definition of the changes, see the Appendix).

4 http://virtuoso.openlinksw.com
5 http://www.w3.org/TR/2013/REC-sparql11-update-20130321/
6 http://dbpedia.org
7 http://sig.biostr.washington.edu/projects/fm/AboutFM.html
8 http://www.ebi.ac.uk/efo/

Table 1. Evaluated Datasets: Versions and Sizes

Dataset   Version   # Triples
DBpedia   v3.7      49M
DBpedia   v3.8      63M
DBpedia   v3.9      68M
FMA       v1.4      1.51M
FMA       v3.0      1.67M
FMA       v3.1      1.71M
EFO       v2.44     0.38M
EFO       v2.45     0.38M
EFO       v2.46     0.39M
EFO       v2.47     0.39M
EFO       v2.48     0.4M
EFO       v2.49     0.4M
EFO       v2.50     0.42M

Table 2. Sets of Complex Changes for DBpedia, FMA and EFO (number of consumed simple changes in parentheses)

DBpedia               FMA                      EFO
Add Subject (1)       Add Concept (1)          Add Definition (1)
Delete Subject (1)    Delete Concept (1)       Add Synonym (1)
Add Thing (1)         Add Restriction (1)      Delete Definition (1)
Delete Thing (1)      Delete Restriction (1)   Delete Synonym (1)
Add Athlete (1)       Add Synonym (1)          Mark as Obsolete (2)
Update Label (2)      Update Comment (2)       Update Comment (2)
Add Place (2)         Update Domain (2)        Update Domain (2)
Delete Place (2)      Update Range (2)         Update Label (2)
Add Person (3)        Add Observation (3)      Update Range (2)
Delete Person (3)     Delete Observation (3)   Update Property (4)

For DBpedia and FMA, let DBp1, DBp2 and FMA1, FMA2 stand for the pairs of versions (v3.7, v3.8), (v3.8, v3.9), and (v1.4, v3.0), (v3.0, v3.1), respectively. Similarly, we denote with EFO1 the pair of versions (v2.44, v2.45) of the EFO dataset, with EFO2 the pair of versions (v2.45, v2.46), and so forth, up to EFO6 for the pair (v2.49, v2.50). To our knowledge, this is the first time that change detection has been evaluated for datasets of this size.

Detected Simple Changes. Figure 4 summarizes the number and type of simple changes that appear in the evaluated datasets. We note the large number of changes that occurred during DBpedia evolution compared to the FMA and EFO datasets, due mostly to its bigger size. However, even though the version sizes of FMA are much smaller than those of DBpedia (Table 1), there are cases in which the number of changes between two FMA versions is of the same order of magnitude as the number of changes between two DBpedia versions (e.g., Add Property Instance). This is explained by the fact that FMA contains experimental biological results and measurements that change over time, thus new versions are vastly different from previous ones. Moreover, observe that the majority of changes (in all datasets except EFO) are applied to the data level (e.g., Add Property Instance), whereas in EFO we also have changes which are applied to the schema (we notice a big number of Add Superclass changes, expressing a modification of the hierarchy of the EFO schema).

9 http://www.ebi.ac.uk/

Fig. 4. Detected Simple Changes

Performance of Simple Change Detection. Table 3 reports on the performance of the detection process for the employed datasets. We split the results in two parts, namely triple creation and triple ingestion; the former includes the execution of the SPARQL queries for detection and the identification of the triples to be inserted in the ontology of changes, whereas the latter is the actual enrichment of the ontology of changes. Our main conclusion is that the number of simple changes is a more crucial factor for performance than the sizes of the compared versions. This observation is clearest in the DBpedia dataset, where the evolution between v3.7 and v3.8 produces about twice as many changes as the evolution between v3.8 and v3.9; despite the fact that in the second case we compare larger dataset versions (Table 1), the execution time in the former case is almost twice as large. Note that this conclusion holds for both triple creation and ingestion. Overall, our approach is about 1 order of magnitude faster than the most relevant approach (presented in [15]). To show this, we performed an additional experiment with the largest dataset used in [15], namely the GO¹⁰ dataset (versions v22-09-2009 and v20-04-2010) with about 0.2M triples per version. In this experiment, our approach needs 1.52 sec, while [15] requires 33.13 sec.

Detected Complex Changes. Figure 5 summarizes the number of complex changes per type for the evaluated datasets. Clearly, the size of the datasets determines the number of the complex changes that occurred during the datasets' evolution; abstractly speaking, the bigger the dataset (see Table 1), the more the changes. From Figure 5, we can identify the particular types of complex changes that are the most popular ones.
Specifically, in DBpedia, changes like Add Subject, Add Thing and Add Person are very common (on average, there are 2.7M, 1M and 0.5M changes, respectively). In FMA, we observe a big number of Add Concept, Delete Concept, Add Observation, Delete Observation and Add Synonym changes with about 140K, 140K, 140K, 140K and 46K changes, respectively. EFO is the smallest dataset with the smallest number of changes; for example, we count about 7K, 7K and 1.5K Add Synonym, Delete Synonym and Add Definition changes. Overall, the majority of complex changes are applied to the data level.

10 http://geneontology.org

Table 3. Performance of Simple Change Detection

Version Pair   # Simple Changes   # Ingested Triples   Triple Creation (sec)   Triple Ingestion (sec)   Duration (sec)
DBp1           20.7M              74.8M                412                     143                      555
DBp2           9.3M               32M                  235                     73                       308
FMA1           2.7M               8.8M                 113                     12                       125
FMA2           2.5M               9.7M                 140                     12                       152
EFO1           0.1K               0.3K                 0.33                    0.11                     0.44
EFO2           59K                180K                 0.9                     1.63                     2.53
EFO3           0.3K               1K                   0.22                    0.79                     1.01
EFO4           1.9K               6.4K                 0.64                    0.33                     0.97
EFO5           0.6K               2K                   0.57                    0.2                      0.77
EFO6           2.8K               8.9K                 0.47                    0.39                     0.86

Fig. 5. Detected Complex Changes
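The figures reported for simple change detection can be cross-checked with a few lines of arithmetic. The Python sketch below only reuses numbers stated in Table 3 and in the GO experiment; the variable names are illustrative, and the derived rates are approximations because the table values are rounded.

```python
# (number of simple changes, total duration in seconds), copied from Table 3.
table3 = {
    "DBp1": (20.7e6, 555),
    "DBp2": (9.3e6, 308),
    "FMA1": (2.7e6, 125),
    "FMA2": (2.5e6, 152),
}

# Detected simple changes per second for each pair of versions.
rates = {pair: changes / seconds for pair, (changes, seconds) in table3.items()}

# DBp1 versus DBp2: ratio of the number of changes and of the durations.
change_ratio = table3["DBp1"][0] / table3["DBp2"][0]
time_ratio = table3["DBp1"][1] / table3["DBp2"][1]

# GO dataset experiment: 33.13 sec for [15] versus 1.52 sec for this approach.
speedup = 33.13 / 1.52

for pair, rate in rates.items():
    print(f"{pair}: {rate:,.0f} changes/sec")
print(f"DBp1/DBp2 changes: {change_ratio:.2f}x, duration: {time_ratio:.2f}x")
print(f"Speedup on GO: {speedup:.1f}x")
```

The DBp1 pair has roughly 2.2 times the changes of DBp2 and takes roughly 1.8 times as long, even though DBp2 compares larger versions, which is in line with the observation that the number of changes dominates the cost.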

Performance of Complex Change Detection. Table 4 reports on the performance of the complex change detection process for the employed RDF datasets. Again, we provide execution times for both the triple creation, i.e., for the execution of the SPARQL queries for detecting the triples to be inserted in the ontology of changes, and the triple ingestion, i.e., for the actual enrichment of the ontology of changes. Moreover, we show the size, in number of triples, of the ontology of changes per dataset; the ontology of changes, as produced after identifying the simple changes, is used for searching for complex changes, instead of the actual dataset versions. The bigger the size of the ontology of changes, the higher the execution time (for both triple creation and ingestion). Given that, typically, the ontologies of changes contain much fewer triples than the dataset versions, searching for complex changes needs much less time, compared to the time required for searching simple changes. The reported small execution times are affected as well by the smaller number of complex changes identified, compared to the number of the identified simple changes. Finally, note that here we follow a multi-threaded implementation only for triple ingestion. Due to the fact that unambiguity does not hold for complex changes, we cannot perform triple creation in parallel.

Table 4. Performance of Complex Change Detection

Version Pair   # Complex Changes   Ontology of Changes Size   # Ingested Triples   Triple Creation (sec)   Triple Ingestion (sec)   Duration (sec)
DBp1           5.79M               74.8M                      100.2M               136.5                   52.8                     189.3
DBp2           5.67M               32M                        53.6M                130.7                   48                       178.7
FMA1           616.4K              8.8M                       13.6M                20.93                   19.65                    40.58
FMA2           627.8K              9.7M                       13.1M                20.84                   19.23                    40.07
EFO1           36                  0.3K                       0.5K                 0.79                    0.04                     0.83
EFO2           15.7M               180K                       243.6K               1.17                    0.93                     2.1
EFO3           39                  1K                         1.2K                 0.08                    0.02                     0.1
EFO4           1M                  6.4K                       11.1K                0.35                    0.21                     0.56
EFO5           287                 2K                         3.4K                 0.57                    0.06                     0.63
EFO6           1M                  8.9K                       14.3K                0.44                    0.07                     0.51

7 Related Work

In general, approaches for change detection can be classified into low-level and high-level ones, based on the types of changes they support. Low-level change detection approaches report simple add/delete operations, which are not concise or intuitive enough to human users, while focusing on machine readability. [4] discusses a low-level detection approach for propositional Knowledge Bases (KBs), which can be easily extended to apply to KBs represented under any classical knowledge representation formalism. This work presents a number of desirable formal properties for change detection languages, like delta uniqueness and reversibility of changes. Similar properties appear in [23], where a low-level change detection formalism for RDFS datasets is presented. [10] describes a low-level change detection approach for the Description Logic EL; the focus is on a concept-based description of changes, and the returned delta is a set of concepts whose position in the class hierarchy changed. [11] presents a low-level change detection approach for DL-Lite ontologies, which focuses on a semantical description of the changes. Recently, [8] introduced a scalable approach for reasoning-aware low-level change detection that uses an RDBMS, while [12] supports change detection between RDF datasets containing blank nodes. All these works result in non-concise, low-level deltas, which are difficult for a human to understand.

High-level change detection approaches provide more human-readable deltas. Although there is no agreed-upon list of changes necessary for any given context, various high-level operations, along with the intuition behind them, have been proposed. Specifically, [9, 14] describe a fixed-point algorithm for detecting changes, implemented in PromptDiff. The algorithm incorporates heuristic-based matchers to detect changes between two versions, thus introducing uncertainty in the results. [17] proposes the Change Definition Language (CDL) as a means to define high-level changes. A change is defined and detected using temporal queries over a version log that contains recordings of the applied low-level changes. The version log must be updated whenever a change occurs; this rules out the use of this approach in non-curated or distributed environments. In general, these approaches do not present formal semantics of high-level operations, or of the corresponding detection process; thus, no useful formal properties can be guaranteed.

The most relevant work is [15], which presents an approach for detecting high-level changes. In that work, unlike our approach, a fixed set of high-level changes is proposed, without providing facilities related to representing the detected changes and answering cross-snapshot queries, or queries accessing both the changes and the data; as such, it only partly addresses the problem of analyzing datasets' dynamics. Our approach also exhibits significantly improved performance and scalability (see Section 6). In [2] the authors focus on formally defining high-level changes as sequences of triples, but do not describe a detection process or a specific language of changes, while [6] proposes an interesting high-level change detection algorithm that takes into account the semantics of OWL. Using a layered approach designed for OWL as well, [7] focuses on representing changes only at schema level.

The idea of using SPARQL query templates to identify evolution patterns is also used in [19]; however, that paper aims to identify problems caused during ontology evolution, rather than analyse the evolution and report or represent changes. Work complementary to ours is presented in [5]; it defines a SPARQL-like language for expressing complex changes and querying the ontology of changes in a user-friendly manner. In contrast, our work provides the semantics of the created complex changes, and the changes ontology, upon which the evolution analysis will be made.

8 Conclusions

The dynamicity of LOD datasets makes the automatic identification of deltas between versions increasingly important for several reasons, such as storing and communication efficiency, visualization and documentation of deltas, efficient synchronization and study of the dataset evolution history. In this paper, we proposed an approach to cope with the dynamicity of Web datasets via the management of changes between versions. We advocated in favour of a flexible, extensible and triple-store independent approach, which prescribes (i) the definition of custom, application-specific changes, and their management (definition, storage, detection) in a manner that ensures the satisfaction of formal properties, like completeness and unambiguity, (ii) the flexibility and customization of the considered changes, via complex changes that can be defined at run-time, and (iii) the easy configuration of a scalable detection mechanism, via a generic algorithm that builds upon SPARQL queries easily generated from the changes' definitions.

An important feature of our work, in which we handle real dataset snapshots, is the ability to perform sophisticated analysis on top of the detected changes, via the representation of the detected changes in an ontology and their treatment as first-class citizens. This allows queries spanning multiple versions of the data (cross-snapshot), as well as queries involving both the evolution history and the data.

Acknowledgments. This work was partially supported by the EU FP7 projects DIACHRON (#601043) and IdeaGarden (#318552).


References

1. M. Arenas, C. Gutierrez, and J. Perez. On the semantics of SPARQL. In Semantic Web Information Management - A Model-Based Perspective. Springer, 2009.
2. S. Auer and H. Herre. A versioning and evolution framework for RDF knowledge bases. In PSI, 2007.
3. R. Cloran and B. Irvin. Transmitting RDF graph deltas for a cheaper semantic Web. In SATNAC, 2005.
4. E. Franconi, T. Meyer, and I. Varzinczak. Semantic diff as the basis for knowledge base versioning. In NMR, 2010.
5. T. Galani, Y. Stavrakas, G. Papastefanatos, and G. Flouris. Supporting complex changes in RDF(S) knowledge bases. In MEPDaW-15, 2015.
6. G. Groner, F. S. Parreiras, and S. Staab. Semantic recognition of ontology refactoring. In ISWC, 2010.
7. J. Hartmann, R. Palma, Y. Sure, P. Haase, and M. C. Suarez-Figueroa. OMV ontology metadata vocabulary. In Ontology Patterns for the Semantic Web Workshop, 2005.
8. D.-H. Im, S.-W. Lee, and H.-J. Kim. Backward inference and pruning for RDF change detection using RDBMS. J. Information Science, 39(2):238–255, 2013.
9. M. Klein, A. Proefschrift, M. Christiaan, A. Klein, and J. M. Akkermans. Change management for distributed ontologies. Technical report, VU University Amsterdam, 2004.
10. B. Konev, D. Walther, and F. Wolter. The logical difference problem for description logic terminologies. In IJCAR, 2008.
11. R. Kontchakov, F. Wolter, and M. Zakharyaschev. Can you tell the difference between DL-Lite ontologies? In KR, 2008.
12. D.-H. Lee, D.-H. Im, and H.-J. Kim. A change detection technique for RDF documents containing nested blank nodes. In PSI, 2007.
13. N. F. Noy, A. Chugh, W. Liu, and M. A. Musen. A framework for ontology evolution in collaborative environments. In ISWC, 2006.
14. N. F. Noy and M. A. Musen. PromptDiff: A fixed-point algorithm for comparing ontology versions. In AI, 2002.
15. V. Papavasileiou, G. Flouris, I. Fundulaki, D. Kotzinos, and V. Christophides. High-level change detection in RDF(S) KBs. ACM Trans. Database Syst., 38(1), 2013.
16. J. Perez, M. Arenas, and C. Gutierrez. Semantics and complexity of SPARQL. In ISWC, 2006.
17. P. Plessers, O. De Troyer, and S. Casteleyn. Understanding ontology evolution: A change detection approach. Web Semant., 5(1):39–49, 2007.
18. E. Prud'hommeaux, S. Harris, and A. Seaborne. SPARQL 1.1 Query Language. Technical report, W3C, 2013.
19. C. Riess, N. Heino, S. Tramp, and S. Auer. EvoPat – pattern-based evolution and refactoring of RDF knowledge bases. In ISWC 2010, 2010.
20. K. Stefanidis, I. Chrysakis, and G. Flouris. On designing archiving policies for evolving RDF datasets on the Web. In ER, 2014.
21. K. Stefanidis, V. Efthymiou, M. Herchel, and V. Christophides. Entity resolution in the Web of data. In WWW, 2014.
22. J. Umbrich, M. Hausenblas, A. Hogan, A. Polleres, and S. Decker. Towards dataset dynamics: Change frequency of Linked Open Data sources. In LDOW, 2010.
23. D. Zeginis, Y. Tzitzikas, and V. Christophides. On computing deltas of RDF/S knowledge bases. ACM Trans. Web, 5(3):14:1–14:36, 2011.


A Representing Changes

A.1 Ontology of Changes: Schema Level

Figure 6 shows the high-level concepts in the ontology. We represent each change category (simple and complex) with a class, while we use the general class Change for generalizing them. In the following, we provide examples (additional to those in the main paper) illustrating issues regarding the representation of changes.

Fig. 6. Top-level Concepts

For simple changes, the corresponding change definition is modelled as a subclass of the main class Simple Change, and is associated with adequate properties to represent its parameters (see Figure 7); each such property models the type of the parameter (e.g., whether it is a URI or a literal, via the classes rdfs:Resource, rdfs:Literal respectively) and its name (which is a descriptive name that allows a more intuitive interaction with the user, useful also during the construction of complex changes that consume said simple change). Also, a descriptive name (cname) is captured for each type of change. Note that conditions do not need to be stored.

For complex changes, similar ideas are used; however, note that the information related to complex changes is generated on the fly at change creation time (in contrast to simple changes, which are built into the ontology at design time). In addition, more detailed information related to the change should be available in the ontology for the change detection process, because, unlike simple changes, this information is not known at design time and cannot be embedded in the code.

Each complex change is modelled as a subclass of the main class Complex Change, and is associated with adequate properties to represent its parameters (see Figure 8). Again, parameters have a type and a name, which are modelled in the same manner as in simple changes. In addition, each complex change is associated with the simple change(s) that it consumes, and includes also properties describing its (user-defined) descriptive name and its priority.

Finally, for each complex change, the SPARQL query used for its detection is automatically generated at change definition time; this is done for efficiency, to avoid having to generate this query in every run of the detection process. The SPARQL query encapsulates all the relevant information about the detection of the change, so further information on the complex change (namely conditions and associations) need not be stored.

Fig. 7. Representation of Simple Changes
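To make the idea of generating the detection query at definition time concrete, the following Python sketch produces a query with the same shape as the Update Property listing in Appendix C. It is an illustration under several assumptions rather than the framework's generator: every consumed simple change is taken to have exactly two parameters, all consumed changes are joined on their first parameter, the class and property names follow the co:<Class> and co:<abbr>_p1/_p2 pattern of that listing (with underscores assumed), and prefix declarations are omitted as in the Appendix.

```python
def complex_change_query(consumed, graph="<ChangesOntology>"):
    """Build a detection query for a complex change (illustrative sketch).

    consumed: list of (simple_change_class, property_abbreviation) pairs,
              e.g. [("Delete_Domain", "dd"), ("Add_Domain", "ad")].
    """
    if not consumed:
        raise ValueError("a complex change must consume at least one simple change")
    lines, select = [], ["?var1"]
    for i, (cls, abbr) in enumerate(consumed, start=1):
        sc = f"?sc{i}"
        # Match an instance of the consumed simple change and its parameters.
        lines.append(f"{sc} a co:{cls} ; co:{abbr}_p1 {sc}1 ; co:{abbr}_p2 {sc}2 .")
        # Skip instances already consumed by another detected change.
        lines.append(f"FILTER NOT EXISTS {{ ?dcc{i} co:consumes {sc} . }}")
        if i == 1:
            lines.append("BIND (?sc11 AS ?var1)")
        else:
            # Join on the first parameter of the first consumed change.
            lines.append(f"FILTER ({sc}1 = ?sc11)")
        lines.append(f"BIND ({sc}2 AS ?var{i + 1})")
        select.append(f"?var{i + 1}")
    body = "\n    ".join(lines)
    return (f"SELECT {' '.join(select)} WHERE {{\n"
            f"  GRAPH {graph} {{\n    {body}\n  }}\n}}")


print(complex_change_query([("Delete_Domain", "dd"), ("Delete_Range", "dr"),
                            ("Add_Domain", "ad"), ("Add_Range", "ar")]))
```

Storing the resulting string with the change definition means that each detection run only has to execute it, which is the efficiency argument made above.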

Fig. 8. Representation of Complex Changes

A.2 Ontology of Changes: Data Level

The instance level is used to store the detectable simple and complex changes. Each detectable change between any two versions is represented using a different individual associated through adequate properties with all relevant information, namely, the versions between which the change was detected and the exact values of its parameters (of the change instantiation). For complex changes, we additionally need to include consumption information. Examples of such instantiations for simple and complex changes are shown in Figures 9 and 10, for the detectable changes Add Type To Individual(md5:ecfddff607550c2def892a823eb16a8f, SSIS:Asset) and Mark as Obsolete(efo:EFO_0004151), respectively.

Fig. 9. Representation of Simple Change Detection

Fig. 10. Representation of Complex Change Detection


B Simple Changes

Change name and parameters: Add Type Class(CLASS)
Intuition: Used to add a class (CLASS)
δ+: (CLASS, rdf:type, rdfs:Class)
δ−: –
φold: –
φnew: –
SPARQL used for detection:
    SELECT ?a WHERE {
      GRAPH <v2> { ?a rdf:type rdfs:Class . }
      FILTER NOT EXISTS { GRAPH <v1> { ?a rdf:type rdfs:Class . } }
    }

Change name and parameters: Delete Type Class(CLASS)
Intuition: Used to delete a class (CLASS)
δ+: –
δ−: (CLASS, rdf:type, rdfs:Class)
φold: –
φnew: –
SPARQL used for detection:
    SELECT ?a WHERE {
      GRAPH <v1> { ?a rdf:type rdfs:Class . }
      FILTER NOT EXISTS { GRAPH <v2> { ?a rdf:type rdfs:Class . } }
    }


Change name and parameters: Add Type Property(PROPERTY)
Intuition: Used to add a property (PROPERTY)
δ+: (PROPERTY, rdf:type, rdf:Property)
δ−: –
φold: –
φnew: –
SPARQL used for detection:
    SELECT ?a WHERE {
      GRAPH <v2> { ?a rdf:type rdf:Property . }
      FILTER NOT EXISTS { GRAPH <v1> { ?a rdf:type rdf:Property . } }
    }

Change name and parameters: Delete Type Property(PROPERTY)
Intuition: Used to delete a property (PROPERTY)
δ+: –
δ−: (PROPERTY, rdf:type, rdf:Property)
φold: –
φnew: –
SPARQL used for detection:
    SELECT ?a WHERE {
      GRAPH <v1> { ?a rdf:type rdf:Property . }
      FILTER NOT EXISTS { GRAPH <v2> { ?a rdf:type rdf:Property . } }
    }


Change name and parameters: Add Superclass(SUBCLASS, SUPERCLASS)
Intuition: Used to add a class subsumption relationship between SUBCLASS and SUPERCLASS
δ+: (SUBCLASS, rdfs:subClassOf, SUPERCLASS)
δ−: –
φold: –
φnew: –
SPARQL used for detection:
    SELECT ?a ?b WHERE {
      GRAPH <v2> { ?a rdfs:subClassOf ?b }
      FILTER NOT EXISTS { GRAPH <v1> { ?a rdfs:subClassOf ?b } }
    }

Change name and parameters: Delete Superclass(SUBCLASS, SUPERCLASS)
Intuition: Used to delete a class subsumption relationship between SUBCLASS and SUPERCLASS
δ+: –
δ−: (SUBCLASS, rdfs:subClassOf, SUPERCLASS)
φold: –
φnew: –
SPARQL used for detection:
    SELECT ?a ?b WHERE {
      GRAPH <v1> { ?a rdfs:subClassOf ?b }
      FILTER NOT EXISTS { GRAPH <v2> { ?a rdfs:subClassOf ?b } }
    }


Change name and parameters: Add Superproperty(SUBPROP, SUPERPROP)
Intuition: Used to add a property subsumption relationship between SUBPROP and SUPERPROP
δ+: (SUBPROP, rdfs:subPropertyOf, SUPERPROP)
δ−: –
φold: –
φnew: –
SPARQL used for detection:
    SELECT ?a ?b WHERE {
      GRAPH <v2> { ?a rdfs:subPropertyOf ?b }
      FILTER NOT EXISTS { GRAPH <v1> { ?a rdfs:subPropertyOf ?b } }
    }

Change name and parameters: Delete Superproperty(SUBPROP, SUPERPROP)
Intuition: Used to delete a property subsumption relationship between SUBPROP and SUPERPROP
δ+: –
δ−: (SUBPROP, rdfs:subPropertyOf, SUPERPROP)
φold: –
φnew: –
SPARQL used for detection:
    SELECT ?a ?b WHERE {
      GRAPH <v1> { ?a rdfs:subPropertyOf ?b }
      FILTER NOT EXISTS { GRAPH <v2> { ?a rdfs:subPropertyOf ?b } }
    }


Change name and parameters: Add Type To Individual(INDIV, TYPE)
Intuition: Used to add a new type (TYPE) to individual INDIV
δ+: (INDIV, rdf:type, TYPE)
δ−: –
φold: –
φnew: FILTER (TYPE != rdfs:Class && TYPE != rdf:Property && TYPE != rdfs:Resource)
SPARQL used for detection:
    SELECT ?a ?b WHERE {
      GRAPH <v2> { ?a rdf:type ?b . }
      FILTER NOT EXISTS { GRAPH <v1> { ?a rdf:type ?b . } }
      FILTER (?b != rdfs:Class &&
              ?b != rdf:Property &&
              ?b != rdfs:Resource)
    }

Change name and parameters: Delete Type From Individual(INDIV, TYPE)
Intuition: Used to delete a type (TYPE) from individual INDIV
δ+: –
δ−: (INDIV, rdf:type, TYPE)
φold: FILTER (TYPE != rdfs:Class && TYPE != rdf:Property && TYPE != rdfs:Resource)
φnew: –
SPARQL used for detection:
    SELECT ?a ?b WHERE {
      GRAPH <v1> { ?a rdf:type ?b . }
      FILTER NOT EXISTS { GRAPH <v2> { ?a rdf:type ?b . } }
      FILTER (?b != rdfs:Class &&
              ?b != rdf:Property &&
              ?b != rdfs:Resource)
    }


Change name and parameters: Add Property Instance(SUB, OBJ, PROP)
Intuition: Used to add a property (PROP) instantiation between a given subject SUB and a given object OBJ
δ+: (SUB, PROP, OBJ)
δ−: –
φold: –
φnew: FILTER (PROP != rdfs:subClassOf && PROP != rdfs:subPropertyOf && PROP != rdf:type && PROP != rdfs:comment && PROP != rdfs:label && PROP != rdfs:domain && PROP != rdfs:range)
SPARQL used for detection:
    SELECT ?a1 ?b ?a2 WHERE {
      GRAPH <v2> { ?a1 ?b ?a2 . }
      FILTER NOT EXISTS { GRAPH <v1> { ?a1 ?b ?a2 . } }
      FILTER (?b != rdfs:subClassOf &&
              ?b != rdfs:subPropertyOf &&
              ?b != rdf:type &&
              ?b != rdfs:comment &&
              ?b != rdfs:label &&
              ?b != rdfs:domain &&
              ?b != rdfs:range)
    }


Change name and parameters: Delete Property Instance(SUB, OBJ, PROP)
Intuition: Used to delete a property (PROP) instantiation between a given subject SUB and a given object OBJ
δ+: –
δ−: (SUB, PROP, OBJ)
φold: FILTER (PROP != rdfs:subClassOf && PROP != rdfs:subPropertyOf && PROP != rdf:type && PROP != rdfs:comment && PROP != rdfs:label && PROP != rdfs:domain && PROP != rdfs:range)
φnew: –
SPARQL used for detection:
    SELECT ?a1 ?b ?a2 WHERE {
      GRAPH <v1> { ?a1 ?b ?a2 . }
      FILTER NOT EXISTS { GRAPH <v2> { ?a1 ?b ?a2 . } }
      FILTER (?b != rdfs:subClassOf &&
              ?b != rdfs:subPropertyOf &&
              ?b != rdf:type &&
              ?b != rdfs:comment &&
              ?b != rdfs:label &&
              ?b != rdfs:domain &&
              ?b != rdfs:range)
    }

Change name and parameters: Add Domain(PROPERTY, DOMAIN)
Intuition: Used to add a domain DOMAIN in a given property PROPERTY
δ+: (PROPERTY, rdfs:domain, DOMAIN)
δ−: –
φold: –
φnew: –
SPARQL used for detection:
    SELECT ?a ?b WHERE {
      GRAPH <v2> { ?a rdfs:domain ?b }
      FILTER NOT EXISTS { GRAPH <v1> { ?a rdfs:domain ?b } }
    }


Change name and parameters: Delete Domain(PROPERTY, DOMAIN)
Intuition: Used to delete a domain DOMAIN from a given property PROPERTY
δ+: –
δ−: (PROPERTY, rdfs:domain, DOMAIN)
φold: –
φnew: –
SPARQL used for detection:
    SELECT ?a ?b WHERE {
      GRAPH <v1> { ?a rdfs:domain ?b }
      FILTER NOT EXISTS { GRAPH <v2> { ?a rdfs:domain ?b } }
    }

Change name and parameters: Add Range(PROPERTY, RANGE)
Intuition: Used to add a range RANGE in a given property PROPERTY
δ+: (PROPERTY, rdfs:range, RANGE)
δ−: –
φold: –
φnew: –
SPARQL used for detection:
    SELECT ?a ?b WHERE {
      GRAPH <v2> { ?a rdfs:range ?b }
      FILTER NOT EXISTS { GRAPH <v1> { ?a rdfs:range ?b } }
    }


Change name and parameters: Delete Range(PROPERTY, RANGE)
Intuition: Used to delete a range RANGE from a given property PROPERTY
δ+: –
δ−: (PROPERTY, rdfs:range, RANGE)
φold: –
φnew: –
SPARQL used for detection:
    SELECT ?a ?b WHERE {
      GRAPH <v1> { ?a rdfs:range ?b }
      FILTER NOT EXISTS { GRAPH <v2> { ?a rdfs:range ?b } }
    }

Change name and parameters: Add Comment(ENTITY, COMMENT)
Intuition: Used to add a comment COMMENT to a given entity (ENTITY)
δ+: (ENTITY, rdfs:comment, COMMENT)
δ−: –
φold: –
φnew: –
SPARQL used for detection:
    SELECT ?a ?b WHERE {
      GRAPH <v2> { ?a rdfs:comment ?b }
      FILTER NOT EXISTS { GRAPH <v1> { ?a rdfs:comment ?b } }
    }


Change name and parameters: Delete Comment(ENTITY, COMMENT)
Intuition: Used to delete a comment COMMENT from a given entity (ENTITY)
δ+: –
δ−: (ENTITY, rdfs:comment, COMMENT)
φold: –
φnew: –
SPARQL used for detection:
    SELECT ?a ?b WHERE {
      GRAPH <v1> { ?a rdfs:comment ?b }
      FILTER NOT EXISTS { GRAPH <v2> { ?a rdfs:comment ?b } }
    }

Change name and parameters: Add Label(ENTITY, LABEL)
Intuition: Used to add a label LABEL to a given entity (ENTITY)
δ+: (ENTITY, rdfs:label, LABEL)
δ−: –
φold: –
φnew: –
SPARQL used for detection:
    SELECT ?a ?b WHERE {
      GRAPH <v2> { ?a rdfs:label ?b }
      FILTER NOT EXISTS { GRAPH <v1> { ?a rdfs:label ?b } }
    }

Change name and parameters: Delete_Label(ENTITY, LABEL)

Intuition: Used to delete a label LABEL from a given entity ENTITY

δ+: –
δ−: (ENTITY, rdfs:label, LABEL)
φold: –
φnew: –

SPARQL used for detection:

SELECT ?a ?b WHERE {
  GRAPH <v1> { ?a rdfs:label ?b } .
  FILTER NOT EXISTS { GRAPH <v2> { ?a rdfs:label ?b } }
}
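
The detection queries for simple changes share one pattern: a triple with a fixed predicate is reported as added when it occurs in <v2> but not in <v1>, and as deleted in the opposite case. The following Python sketch mimics that pattern with in-memory sets. The function name detect_simple_changes, the tuple encoding of triples and the ex: identifiers are illustrative assumptions and are not part of the framework's implementation.

```python
def detect_simple_changes(v1, v2, predicate):
    """Return (added, deleted) triples that use `predicate`.

    v1 and v2 are sets of (subject, predicate, object) tuples standing in
    for the named graphs <v1> and <v2>.
    """
    old = {t for t in v1 if t[1] == predicate}
    new = {t for t in v2 if t[1] == predicate}
    added = new - old      # present in v2 only: the Add_* case
    deleted = old - new    # present in v1 only: the Delete_* case
    return added, deleted


# Hypothetical toy versions of a dataset.
v1 = {("ex:Person", "rdfs:label", "Person"),
      ("ex:Person", "rdfs:comment", "A human being")}
v2 = {("ex:Person", "rdfs:label", "Human"),
      ("ex:Person", "rdfs:comment", "A human being")}

added, deleted = detect_simple_changes(v1, v2, "rdfs:label")
print(added)    # candidates for Add_Label
print(deleted)  # candidates for Delete_Label
```

Set difference plays the role of FILTER NOT EXISTS here; on real datasets the comparison would be delegated to the triple store, as in the queries above.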

C Complex Changes

Change name and parameters: Update_Property(PROP, OLDDOM, OLDRAN, NEWDOM, NEWRAN)

Intuition: Used to update both the domain and the range of PROP, from OLDDOM to NEWDOM and from OLDRAN to NEWRAN respectively

δ: Delete_Domain(PROP, OLDDOM), Delete_Range(PROP, OLDRAN), Add_Domain(PROP, NEWDOM), Add_Range(PROP, NEWRAN)
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 ?var2 ?var3 ?var4 ?var5 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Delete_Domain; co:dd_p1 ?sc11; co:dd_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    BIND (?sc11 AS ?var1).
    BIND (?sc12 AS ?var2).
    ?sc2 a co:Delete_Range; co:dr_p1 ?sc21; co:dr_p2 ?sc22.
    FILTER NOT EXISTS { ?dcc2 co:consumes ?sc2. }.
    FILTER (?sc21 = ?sc11).
    BIND (?sc22 AS ?var3).
    ?sc3 a co:Add_Domain; co:ad_p1 ?sc31; co:ad_p2 ?sc32.
    FILTER NOT EXISTS { ?dcc3 co:consumes ?sc3. }.
    FILTER (?sc31 = ?sc11).
    BIND (?sc32 AS ?var4).
    ?sc4 a co:Add_Range; co:ar_p1 ?sc41; co:ar_p2 ?sc42.
    FILTER NOT EXISTS { ?dcc4 co:consumes ?sc4. }.
    FILTER (?sc41 = ?sc11).
    BIND (?sc42 AS ?var5).
  }
}

Priority: 1

Change name and parameters: Update_Range(PROP, NEWRAN, OLDRAN)

Intuition: Used to update the range of PROP from OLDRAN to NEWRAN

δ: Add_Range(PROP, NEWRAN), Delete_Range(PROP, OLDRAN)
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 ?var2 ?var3 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Add_Range; co:ar_p1 ?sc11; co:ar_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    BIND (?sc11 AS ?var1).
    BIND (?sc12 AS ?var3).
    ?sc2 a co:Delete_Range; co:dr_p1 ?sc21; co:dr_p2 ?sc22.
    FILTER NOT EXISTS { ?dcc2 co:consumes ?sc2. }.
    FILTER (?sc21 = ?sc11).
    BIND (?sc22 AS ?var2).
  }
}

Priority: 2
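
The detection query for Update_Range joins an Add_Range and a Delete_Range that refer to the same property and that no complex change has consumed yet. A minimal Python sketch of that join follows. The SimpleChange class, the consumed flag (standing in for an incoming co:consumes link) and the sample data are assumptions made for illustration. Marking changes as consumed right after a match is likewise an assumption about what happens once a complex change has been detected.

```python
from dataclasses import dataclass


@dataclass
class SimpleChange:
    kind: str               # e.g. "Add_Range" or "Delete_Range"
    prop: str               # first parameter (the property)
    value: str              # second parameter (the range)
    consumed: bool = False  # stands in for an incoming co:consumes link


def detect_update_range(changes):
    """Return (PROP, NEWRAN, OLDRAN) tuples and mark the paired changes as consumed."""
    detected = []
    for add in changes:
        if add.kind != "Add_Range" or add.consumed:
            continue
        for delete in changes:
            if (delete.kind == "Delete_Range" and not delete.consumed
                    and delete.prop == add.prop):
                add.consumed = delete.consumed = True
                detected.append((add.prop, add.value, delete.value))
                break
    return detected


# Hypothetical simple changes detected between two versions.
changes = [
    SimpleChange("Add_Range", "ex:knows", "ex:Agent"),
    SimpleChange("Delete_Range", "ex:knows", "ex:Person"),
    SimpleChange("Add_Range", "ex:age", "xsd:integer"),
]
updates = detect_update_range(changes)
print(updates)
```

Unlike the SPARQL query, which returns every matching combination, this sketch pairs each addition with the first matching deletion only. The unmatched Add_Range on ex:age stays unconsumed and remains a plain simple change.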

Change name and parameters: Update_Domain(PROP, NEWDOM, OLDDOM)

Intuition: Used to update the domain of PROP from OLDDOM to NEWDOM

δ: Add_Domain(PROP, NEWDOM), Delete_Domain(PROP, OLDDOM)
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 ?var2 ?var3 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Add_Domain; co:ad_p1 ?sc11; co:ad_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    BIND (?sc11 AS ?var1).
    BIND (?sc12 AS ?var3).
    ?sc2 a co:Delete_Domain; co:dd_p1 ?sc21; co:dd_p2 ?sc22.
    FILTER NOT EXISTS { ?dcc2 co:consumes ?sc2. }.
    FILTER (?sc21 = ?sc11).
    BIND (?sc22 AS ?var2).
  }
}

Priority: 3
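
Update_Property, Update_Range and Update_Domain compete for the same simple changes, which is where the Priority field and the FILTER NOT EXISTS { ... co:consumes ... } clauses come into play. The sketch below illustrates one plausible reading, stated here as an assumption: complex changes are detected in ascending order of their priority number, and a simple change consumed by one complex change is unavailable to the ones detected later. The function name, the tuple layout and the sample data are hypothetical.

```python
def detect_in_priority_order(simple, definitions):
    """Detect complex changes, smallest priority number first (assumed ordering).

    simple:      list of (kind, prop, value) simple changes
    definitions: list of (priority, name, required_kinds)
    Returns a list of (name, prop); at most one instance per name and property.
    """
    consumed = set()  # indices of simple changes already used
    detected = []
    props = sorted({prop for _, prop, _ in simple})
    for _, name, kinds in sorted(definitions, key=lambda d: d[0]):
        for prop in props:
            picked = []
            for kind in kinds:
                idx = next((i for i, (k, p, _) in enumerate(simple)
                            if k == kind and p == prop
                            and i not in consumed and i not in picked), None)
                if idx is None:
                    break          # a required simple change is missing
                picked.append(idx)
            else:
                consumed.update(picked)
                detected.append((name, prop))
    return detected


simple = [
    ("Delete_Domain", "ex:p", "ex:A"), ("Delete_Range", "ex:p", "ex:B"),
    ("Add_Domain", "ex:p", "ex:C"), ("Add_Range", "ex:p", "ex:D"),
    ("Add_Range", "ex:q", "ex:E"), ("Delete_Range", "ex:q", "ex:F"),
]
definitions = [
    (2, "Update_Range", ("Add_Range", "Delete_Range")),
    (1, "Update_Property",
     ("Delete_Domain", "Delete_Range", "Add_Domain", "Add_Range")),
    (3, "Update_Domain", ("Add_Domain", "Delete_Domain")),
]
detected = detect_in_priority_order(simple, definitions)
print(detected)
```

Under this reading the four changes on ex:p are reported once, as an Update_Property, rather than as a separate Update_Range and Update_Domain, while the changes on ex:q still yield an Update_Range.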

Change name and parameters: Update_Label(ENTITY, NEWLAB, OLDLAB)

Intuition: Used to update the label of an entity ENTITY from OLDLAB to NEWLAB

δ: Add_Label(ENTITY, NEWLAB), Delete_Label(ENTITY, OLDLAB)
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 ?var2 ?var3 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Add_Label; co:al_p1 ?sc11; co:al_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    BIND (?sc11 AS ?var1).
    BIND (?sc12 AS ?var3).
    ?sc2 a co:Delete_Label; co:dl_p1 ?sc21; co:dl_p2 ?sc22.
    FILTER NOT EXISTS { ?dcc2 co:consumes ?sc2. }.
    FILTER (?sc21 = ?sc11).
    BIND (?sc22 AS ?var2).
  }
}

Priority: 4

Change name and parameters: Update_Comment(ENTITY, NEWCOM, OLDCOM)

Intuition: Used to update the comment of an entity ENTITY from OLDCOM to NEWCOM

δ: Add_Comment(ENTITY, NEWCOM), Delete_Comment(ENTITY, OLDCOM)
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 ?var2 ?var3 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Add_Comment; co:ac_p1 ?sc11; co:ac_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    BIND (?sc11 AS ?var1).
    BIND (?sc12 AS ?var3).
    ?sc2 a co:Delete_Comment; co:dc_p1 ?sc21; co:dc_p2 ?sc22.
    FILTER NOT EXISTS { ?dcc2 co:consumes ?sc2. }.
    FILTER (?sc21 = ?sc11).
    BIND (?sc22 AS ?var2).
  }
}

Priority: 5

Change name and parameters: Add_Definition(SUBJECT, DEFINITION)

Intuition: Used to express the addition of a definition DEFINITION for a specific subject SUBJECT

δ: Add_Property_Instance(SUBJECT, PROPERTY, DEFINITION), where PROPERTY = http://www.ebi.ac.uk/efo/definition
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 ?var2 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Add_Property_Instance; co:api_p1 ?sc11; co:api_p2 ?sc12; co:api_p3 ?sc13.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://www.ebi.ac.uk/efo/definition>).
    BIND (?sc11 AS ?var1).
    BIND (?sc13 AS ?var2).
  }
}

Priority: 6

Change name and parameters: Delete_Definition(SUBJECT, DEFINITION)

Intuition: Used to express the deletion of a definition DEFINITION for a specific subject SUBJECT

δ: Delete_Property_Instance(SUBJECT, PROPERTY, DEFINITION), where PROPERTY = http://www.ebi.ac.uk/efo/definition
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 ?var2 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Delete_Property_Instance; co:dpi_p1 ?sc11; co:dpi_p2 ?sc12; co:dpi_p3 ?sc13.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://www.ebi.ac.uk/efo/definition>).
    BIND (?sc11 AS ?var1).
    BIND (?sc13 AS ?var2).
  }
}

Priority: 7

Change name and parameters: Add_Synonym(SUBJECT, SYNONYM)

Intuition: Used to express the addition of a synonym SYNONYM for a specific term SUBJECT

δ: Add_Property_Instance(SUBJECT, PROPERTY, SYNONYM), where PROPERTY = http://www.ebi.ac.uk/efo/alternative_term
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 ?var2 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Add_Property_Instance; co:api_p1 ?sc11; co:api_p2 ?sc12; co:api_p3 ?sc13.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://www.ebi.ac.uk/efo/alternative_term>).
    BIND (?sc11 AS ?var1).
    BIND (?sc13 AS ?var2).
  }
}

Priority: 8

Change name and parameters: Delete_Synonym(SUBJECT, SYNONYM)

Intuition: Used to express the deletion of a synonym SYNONYM for a specific term SUBJECT

δ: Delete_Property_Instance(SUBJECT, PROPERTY, SYNONYM), where PROPERTY = http://www.ebi.ac.uk/efo/alternative_term
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 ?var2 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Delete_Property_Instance; co:dpi_p1 ?sc11; co:dpi_p2 ?sc12; co:dpi_p3 ?sc13.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://www.ebi.ac.uk/efo/alternative_term>).
    BIND (?sc11 AS ?var1).
    BIND (?sc13 AS ?var2).
  }
}

Priority: 9

Change name and parameters: Mark_as_Obsolete(OBS_CLASS, REASON)

Intuition: Used to express the classification of a class OBS_CLASS as obsolete within the dataset version, along with a reason (REASON) for this classification

δ: Add_Superclass(OBS_CLASS, SUPERCLASS), Add_Property_Instance(OBS_CLASS, PROPERTY, REASON), where SUPERCLASS = http://www.geneontology.org/formats/oboInOwl#ObsoleteClass, PROPERTY = http://www.ebi.ac.uk/efo/reason_for_obsolescence
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 ?var2 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Add_Superclass; co:asc_p1 ?sc11; co:asc_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://www.geneontology.org/formats/oboInOwl#ObsoleteClass>).
    BIND (?sc11 AS ?var1).
    ?sc2 a co:Add_Property_Instance; co:api_p1 ?sc21; co:api_p2 ?sc22; co:api_p3 ?sc23.
    FILTER NOT EXISTS { ?dcc2 co:consumes ?sc2. }.
    FILTER (?sc22 = <http://www.ebi.ac.uk/efo/reason_for_obsolescence>).
    FILTER (?sc21 = ?sc11).
    BIND (?sc23 AS ?var2).
  }
}

Priority: 10

Change name and parameters: Add_Concept(OBJECT)

Intuition: Used to express the addition of a concept term OBJECT within the new dataset version

δ: Add_Type_To_Individual(OBJECT, TYPE), where TYPE = http://fma/dataset#Concept_name
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Add_Type_To_Individual; co:atti_p1 ?sc11; co:atti_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://fma/dataset#Concept_name>).
    BIND (?sc11 AS ?var1).
  }
}

Priority: 11

Change name and parameters: Add_Observation(OBSERVATION, NAME, AUTHOR, DATE)

Intuition: Used to express the addition of an observation (OBSERVATION) described by a name (NAME), which came from an author (AUTHOR) and took place at a specific date (DATE)

δ: Add_Property_Instance(OBSERVATION, PROPERTY1, NAME), Add_Property_Instance(OBSERVATION, PROPERTY2, AUTHOR), Add_Property_Instance(OBSERVATION, PROPERTY3, DATE), where PROPERTY1 = http://fma/dataset#name, PROPERTY2 = http://fma/dataset#author, PROPERTY3 = http://fma/dataset#Date_entered_modified
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 ?var2 ?var3 ?var4 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Add_Property_Instance; co:api_p1 ?sc11; co:api_p2 ?sc12; co:api_p3 ?sc13.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://fma/dataset#name>).
    BIND (?sc11 AS ?var1).
    BIND (?sc13 AS ?var2).
    ?sc2 a co:Add_Property_Instance; co:api_p1 ?sc21; co:api_p2 ?sc22; co:api_p3 ?sc23.
    FILTER NOT EXISTS { ?dcc2 co:consumes ?sc2. }.
    FILTER (?sc22 = <http://fma/dataset#author>).
    FILTER (?sc11 = ?sc21).
    BIND (?sc23 AS ?var3).
    ?sc3 a co:Add_Property_Instance; co:api_p1 ?sc31; co:api_p2 ?sc32; co:api_p3 ?sc33.
    FILTER NOT EXISTS { ?dcc3 co:consumes ?sc3. }.
    FILTER (?sc32 = <http://fma/dataset#Date_entered_modified>).
    FILTER (?sc11 = ?sc31).
    BIND (?sc33 AS ?var4).
  }
}

Priority: 12

Change name and parameters: Delete_Observation(OBSERVATION, NAME, AUTHOR, DATE)

Intuition: Used to express the deletion of an observation (OBSERVATION) described by a name (NAME), which came from an author (AUTHOR) and took place at a specific date (DATE)

δ: Delete_Property_Instance(OBSERVATION, PROPERTY1, NAME), Delete_Property_Instance(OBSERVATION, PROPERTY2, AUTHOR), Delete_Property_Instance(OBSERVATION, PROPERTY3, DATE), where PROPERTY1 = http://fma/dataset#name, PROPERTY2 = http://fma/dataset#author, PROPERTY3 = http://fma/dataset#Date_entered_modified
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 ?var2 ?var3 ?var4 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Delete_Property_Instance; co:dpi_p1 ?sc11; co:dpi_p2 ?sc12; co:dpi_p3 ?sc13.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://fma/dataset#name>).
    BIND (?sc11 AS ?var1).
    BIND (?sc13 AS ?var2).
    ?sc2 a co:Delete_Property_Instance; co:dpi_p1 ?sc21; co:dpi_p2 ?sc22; co:dpi_p3 ?sc23.
    FILTER NOT EXISTS { ?dcc2 co:consumes ?sc2. }.
    FILTER (?sc22 = <http://fma/dataset#author>).
    FILTER (?sc11 = ?sc21).
    BIND (?sc23 AS ?var3).
    ?sc3 a co:Delete_Property_Instance; co:dpi_p1 ?sc31; co:dpi_p2 ?sc32; co:dpi_p3 ?sc33.
    FILTER NOT EXISTS { ?dcc3 co:consumes ?sc3. }.
    FILTER (?sc32 = <http://fma/dataset#Date_entered_modified>).
    FILTER (?sc11 = ?sc31).
    BIND (?sc33 AS ?var4).
  }
}

Priority: 13

Change name and parameters: Add_Restriction(RESTRICTION)

Intuition: Used to express the addition of a restriction term RESTRICTION within the new dataset version

δ: Add_Type_To_Individual(RESTRICTION, TYPE), where TYPE = http://www.w3.org/2002/07/owl#Restriction
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Add_Type_To_Individual; co:atti_p1 ?sc11; co:atti_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://www.w3.org/2002/07/owl#Restriction>).
    BIND (?sc11 AS ?var1).
  }
}

Priority: 14

Change name and parameters: Delete_Restriction(RESTRICTION)

Intuition: Used to express the deletion of a restriction term RESTRICTION from the new dataset version

δ: Delete_Type_From_Individual(RESTRICTION, TYPE), where TYPE = http://www.w3.org/2002/07/owl#Restriction
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Delete_Type_From_Individual; co:dtti_p1 ?sc11; co:dtti_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://www.w3.org/2002/07/owl#Restriction>).
    BIND (?sc11 AS ?var1).
  }
}

Priority: 15

Change name and parameters: Add_Subject(ELEMENT, CATEGORY)

Intuition: Used to express the addition of an element ELEMENT which belongs to a specific category CATEGORY

δ: Add_Property_Instance(ELEMENT, PROPERTY, CATEGORY), where PROPERTY = http://purl.org/dc/terms/subject
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 ?var2 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Add_Property_Instance; co:api_p1 ?sc11; co:api_p2 ?sc12; co:api_p3 ?sc13.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://purl.org/dc/terms/subject>).
    BIND (?sc11 AS ?var1).
    BIND (?sc13 AS ?var2).
  }
}

Priority: 16

Change name and parameters: Delete_Subject(ELEMENT, CATEGORY)

Intuition: Used to express the deletion of an element ELEMENT which belongs to a specific category CATEGORY

δ: Delete_Property_Instance(ELEMENT, PROPERTY, CATEGORY), where PROPERTY = http://purl.org/dc/terms/subject
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 ?var2 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Delete_Property_Instance; co:dpi_p1 ?sc11; co:dpi_p2 ?sc12; co:dpi_p3 ?sc13.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://purl.org/dc/terms/subject>).
    BIND (?sc11 AS ?var1).
    BIND (?sc13 AS ?var2).
  }
}

Priority: 17

Change name and parameters: Add_Thing(THING)

Intuition: Used to express the addition of a thing term THING within the new dataset version

δ: Add_Type_To_Individual(THING, TYPE), where TYPE = http://www.w3.org/2002/07/owl#Thing
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Add_Type_To_Individual; co:atti_p1 ?sc11; co:atti_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://www.w3.org/2002/07/owl#Thing>).
    BIND (?sc11 AS ?var1).
  }
}

Priority: 18

Change name and parameters: Delete_Thing(THING)

Intuition: Used to express the deletion of a thing term THING from the new dataset version

δ: Delete_Type_From_Individual(THING, TYPE), where TYPE = http://www.w3.org/2002/07/owl#Thing
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Delete_Type_From_Individual; co:dtti_p1 ?sc11; co:dtti_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://www.w3.org/2002/07/owl#Thing>).
    BIND (?sc11 AS ?var1).
  }
}

Priority: 19

Change name and parameters: Add_Person(PERSON)

Intuition: Used to express the addition of a person term PERSON within the new dataset version

δ: Add_Type_To_Individual(PERSON, TYPE1), Add_Type_To_Individual(PERSON, TYPE2), Add_Type_To_Individual(PERSON, TYPE3), where TYPE1 = http://xmlns.com/foaf/0.1/Person, TYPE2 = http://dbpedia.org/ontology/Person, TYPE3 = http://schema.org/Person
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Add_Type_To_Individual; co:atti_p1 ?sc11; co:atti_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://xmlns.com/foaf/0.1/Person>).
    BIND (?sc11 AS ?var1).
    ?sc2 a co:Add_Type_To_Individual; co:atti_p1 ?sc21; co:atti_p2 ?sc22.
    FILTER NOT EXISTS { ?dcc2 co:consumes ?sc2. }.
    FILTER (?sc22 = <http://dbpedia.org/ontology/Person>).
    FILTER (?sc21 = ?sc11).
    ?sc3 a co:Add_Type_To_Individual; co:atti_p1 ?sc31; co:atti_p2 ?sc32.
    FILTER NOT EXISTS { ?dcc3 co:consumes ?sc3. }.
    FILTER (?sc32 = <http://schema.org/Person>).
    FILTER (?sc31 = ?sc11).
  }
}

Priority: 20
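
Add_Person is reported only when all three person types are added to the same individual. Adding just one or two of them leaves the corresponding Add_Type_To_Individual changes as simple changes. The following Python sketch shows this all-or-nothing grouping. The function name, the pair encoding of Add_Type_To_Individual changes and the ex: individuals are illustrative assumptions.

```python
PERSON_TYPES = {
    "http://xmlns.com/foaf/0.1/Person",
    "http://dbpedia.org/ontology/Person",
    "http://schema.org/Person",
}


def detect_add_person(type_additions):
    """Return the individuals that received every type in PERSON_TYPES.

    type_additions is an iterable of (individual, type) pairs, one pair per
    unconsumed Add_Type_To_Individual change.
    """
    types_by_individual = {}
    for individual, rdf_type in type_additions:
        types_by_individual.setdefault(individual, set()).add(rdf_type)
    # Subset test: all three required types must have been added.
    return sorted(ind for ind, types in types_by_individual.items()
                  if PERSON_TYPES <= types)


# Hypothetical simple changes: ex:alice gets all three types, ex:bob only one.
additions = [
    ("ex:alice", "http://xmlns.com/foaf/0.1/Person"),
    ("ex:alice", "http://dbpedia.org/ontology/Person"),
    ("ex:alice", "http://schema.org/Person"),
    ("ex:bob", "http://schema.org/Person"),
]
persons = detect_add_person(additions)
print(persons)
```

The subset test corresponds to the three joined patterns of the query, each constrained to one of the type IRIs and to the same individual.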

Change name and parameters: Delete_Person(PERSON)

Intuition: Used to express the deletion of a person term PERSON from the new dataset version

δ: Delete_Type_From_Individual(PERSON, TYPE1), Delete_Type_From_Individual(PERSON, TYPE2), Delete_Type_From_Individual(PERSON, TYPE3), where TYPE1 = http://xmlns.com/foaf/0.1/Person, TYPE2 = http://dbpedia.org/ontology/Person, TYPE3 = http://schema.org/Person
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Delete_Type_From_Individual; co:dtti_p1 ?sc11; co:dtti_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://xmlns.com/foaf/0.1/Person>).
    BIND (?sc11 AS ?var1).
    ?sc2 a co:Delete_Type_From_Individual; co:dtti_p1 ?sc21; co:dtti_p2 ?sc22.
    FILTER NOT EXISTS { ?dcc2 co:consumes ?sc2. }.
    FILTER (?sc22 = <http://dbpedia.org/ontology/Person>).
    FILTER (?sc21 = ?sc11).
    ?sc3 a co:Delete_Type_From_Individual; co:dtti_p1 ?sc31; co:dtti_p2 ?sc32.
    FILTER NOT EXISTS { ?dcc3 co:consumes ?sc3. }.
    FILTER (?sc32 = <http://schema.org/Person>).
    FILTER (?sc31 = ?sc11).
  }
}

Priority: 21

Change name and parameters: Add_Place(PLACE)

Intuition: Used to express the addition of a place term PLACE within the new dataset version

δ: Add_Type_To_Individual(PLACE, TYPE1), Add_Type_To_Individual(PLACE, TYPE2), where TYPE1 = http://schema.org/Place, TYPE2 = http://dbpedia.org/ontology/Place
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Add_Type_To_Individual; co:atti_p1 ?sc11; co:atti_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://schema.org/Place>).
    BIND (?sc11 AS ?var1).
    ?sc2 a co:Add_Type_To_Individual; co:atti_p1 ?sc21; co:atti_p2 ?sc22.
    FILTER NOT EXISTS { ?dcc2 co:consumes ?sc2. }.
    FILTER (?sc22 = <http://dbpedia.org/ontology/Place>).
    FILTER (?sc21 = ?sc11).
  }
}

Priority: 22

Change name and parameters: Delete_Place(PLACE)

Intuition: Used to express the deletion of a place term PLACE from the new dataset version

δ: Delete_Type_From_Individual(PLACE, TYPE1), Delete_Type_From_Individual(PLACE, TYPE2), where TYPE1 = http://schema.org/Place, TYPE2 = http://dbpedia.org/ontology/Place
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Delete_Type_From_Individual; co:dtti_p1 ?sc11; co:dtti_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://schema.org/Place>).
    BIND (?sc11 AS ?var1).
    ?sc2 a co:Delete_Type_From_Individual; co:dtti_p1 ?sc21; co:dtti_p2 ?sc22.
    FILTER NOT EXISTS { ?dcc2 co:consumes ?sc2. }.
    FILTER (?sc22 = <http://dbpedia.org/ontology/Place>).
    FILTER (?sc21 = ?sc11).
  }
}

Priority: 23

Change name and parameters: Add_Athlete(ATHLETE)

Intuition: Used to express the addition of a new athlete (ATHLETE)

δ: Add_Type_To_Individual(ATHLETE, TYPE), where TYPE = http://dbpedia.org/ontology/Athlete
φold: –
φnew: –
Associations: –

SPARQL used for detection:

SELECT ?var1 WHERE {
  GRAPH <ChangesOntology> {
    ?sc1 a co:Add_Type_To_Individual; co:atti_p1 ?sc11; co:atti_p2 ?sc12.
    FILTER NOT EXISTS { ?dcc1 co:consumes ?sc1. }.
    FILTER (?sc12 = <http://dbpedia.org/ontology/Athlete>).
    BIND (?sc11 AS ?var1).
  }
}

Priority: 24