142
Exchanging more than Complete Data Marcelo Arenas PUC Chile Joint work with Jorge P´ erez (U. de Chile) and Juan Reutter (U. Edinburgh) M. Arenas Exchanging more than Complete Data - RR2011 1 / 68

Exchanging more than Complete Data

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Exchanging more than Complete Data

Exchanging more than Complete Data

Marcelo ArenasPUC Chile

Joint work with Jorge Perez (U. de Chile) and Juan Reutter (U. Edinburgh)

M. Arenas – Exchanging more than Complete Data - RR2011 1 / 68

Page 2: Exchanging more than Complete Data

Outline: First part

◮ The data exchange problem◮ Some fundamental results in relational data exchange

◮ The need for a more general data exchange framework◮ Two important scenarios: Incomplete databases (open-world

databases: RDF) and knowledge bases

M. Arenas – Exchanging more than Complete Data - RR2011 2 / 68

Page 3: Exchanging more than Complete Data

Outline: First part

◮ The data exchange problem◮ Some fundamental results in relational data exchange

◮ The need for a more general data exchange framework◮ Two important scenarios: Incomplete databases (open-world

databases: RDF) and knowledge bases

M. Arenas – Exchanging more than Complete Data - RR2011 3 / 68

Page 4: Exchanging more than Complete Data

The problem of data exchange

Given: A source schema S, a target schema T and a specificationΣ of the relationship between these schemas

Data exchange: Problem of materializing an instance of T givenan instance of S

◮ Target instance should reflect the source data as accurately aspossible, given the constraints imposed by Σ and T

◮ It should be efficiently computable

◮ It should allow one to evaluate queries on the target in a waythat is semantically consistent with the source data

M. Arenas – Exchanging more than Complete Data - RR2011 4 / 68

Page 5: Exchanging more than Complete Data

Data exchange in a picture

Query Q

Schema S

Σ

Schema T

M. Arenas – Exchanging more than Complete Data - RR2011 5 / 68

Page 6: Exchanging more than Complete Data

Data exchange in a picture

Query Q

Schema S

Σ

Schema T

M. Arenas – Exchanging more than Complete Data - RR2011 5 / 68

Page 7: Exchanging more than Complete Data

Data exchange in a picture

Query Q

Schema S

Σ

Schema T

M. Arenas – Exchanging more than Complete Data - RR2011 5 / 68

Page 8: Exchanging more than Complete Data

Data exchange in a picture

Schema TSchema S

Σ

M. Arenas – Exchanging more than Complete Data - RR2011 5 / 68

Page 9: Exchanging more than Complete Data

Data exchange in a picture

Query Q

Schema S

Σ

Schema T

M. Arenas – Exchanging more than Complete Data - RR2011 5 / 68

Page 10: Exchanging more than Complete Data

Data exchange: Some fundamental questions

What are the challenges in the area?

◮ What is a good language for specifying the relationshipbetween source and target data?

◮ Expressiveness versus complexity

◮ What is a good instance to materialize?

◮ What does it mean to answer a query over target data?

◮ How do we answer queries over target data? Can we do thisefficiently?

M. Arenas – Exchanging more than Complete Data - RR2011 6 / 68

Page 11: Exchanging more than Complete Data

Exchanging relational data

The data exchange problem has been extensively studied in therelational world.

◮ It has also been commercially implemented: IBM Clio

Relational data exchange setting:

◮ Source and target schemas: Relational schemas

◮ Relationship between source and target schemas:Source-to-target tuple-generating dependencies (st-tgds)

Semantics of data exchange has been precisely defined.

◮ Efficient algorithms for materializing target instances and foranswering queries over the target schema have been developed

M. Arenas – Exchanging more than Complete Data - RR2011 7 / 68

Page 12: Exchanging more than Complete Data

Schema mapping: The key component in relational data

exchange

Schema mapping: M = (S,T,Σ)

◮ S and T are disjoint relational schemas

◮ Σ is a finite set of st-tgds:

∀x∀y (ϕ(x , y) → ∃z ψ(x , z))

ϕ(x , y ): conjunction of relational atomic formulas over S

ψ(x , z): conjunction of relational atomic formulas over T

M. Arenas – Exchanging more than Complete Data - RR2011 8 / 68

Page 13: Exchanging more than Complete Data

Relational schema mappings: An example

Example

◮ S: Employee(name)

◮ T: Dept(name, number)

◮ Σ:

∀x

(

Employee(x) → ∃y Dept(x , y)

)

M. Arenas – Exchanging more than Complete Data - RR2011 9 / 68

Page 14: Exchanging more than Complete Data

Relational schema mappings: An example

Example

◮ S: Employee(name)

◮ T: Dept(name, number)

◮ Σ:

∀x

(

Employee(x) → ∃y Dept(x , y)

)

Note

We omit universal quantifiers in st-tgds:

Employee(x) → ∃y Dept(x , y)

M. Arenas – Exchanging more than Complete Data - RR2011 9 / 68

Page 15: Exchanging more than Complete Data

Relational data exchange problem

Fixed: M = (S,T,Σ)

Problem: Given instance I of S, find an instance J of T such that(I , J) satisfies Σ

◮ (I , J) satisfies ϕ(x , y ) → ∃z ψ(x , z) if whenever I satisfiesϕ(a, b), there is a tuple c such that J satisfies ψ(a, c)

M. Arenas – Exchanging more than Complete Data - RR2011 10 / 68

Page 16: Exchanging more than Complete Data

Relational data exchange problem

Fixed: M = (S,T,Σ)

Problem: Given instance I of S, find an instance J of T such that(I , J) satisfies Σ

◮ (I , J) satisfies ϕ(x , y ) → ∃z ψ(x , z) if whenever I satisfiesϕ(a, b), there is a tuple c such that J satisfies ψ(a, c)

Notation

J is a solution for I under M

◮ SolM(I ): Set of solutions for I under M

M. Arenas – Exchanging more than Complete Data - RR2011 10 / 68

Page 17: Exchanging more than Complete Data

The notion of solution: Example

Example

◮ S: Employee(name)

◮ T: Dept(name, number)

◮ Σ: Employee(x) → ∃y Dept(x , y)

Solutions for I = {Employee(Peter)}:

M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68

Page 18: Exchanging more than Complete Data

The notion of solution: Example

Example

◮ S: Employee(name)

◮ T: Dept(name, number)

◮ Σ: Employee(x) → ∃y Dept(x , y)

Solutions for I = {Employee(Peter)}:

J1: {Dept(Peter,1)}

M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68

Page 19: Exchanging more than Complete Data

The notion of solution: Example

Example

◮ S: Employee(name)

◮ T: Dept(name, number)

◮ Σ: Employee(x) → ∃y Dept(x , y)

Solutions for I = {Employee(Peter)}:

J1: {Dept(Peter,1)}

J2: {Dept(Peter,1), Dept(Peter,2)}

M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68

Page 20: Exchanging more than Complete Data

The notion of solution: Example

Example

◮ S: Employee(name)

◮ T: Dept(name, number)

◮ Σ: Employee(x) → ∃y Dept(x , y)

Solutions for I = {Employee(Peter)}:

J1: {Dept(Peter,1)}

J2: {Dept(Peter,1), Dept(Peter,2)}

J3: {Dept(Peter,1), Dept(John,1)}

M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68

Page 21: Exchanging more than Complete Data

The notion of solution: Example

Example

◮ S: Employee(name)

◮ T: Dept(name, number)

◮ Σ: Employee(x) → ∃y Dept(x , y)

Solutions for I = {Employee(Peter)}:

J1: {Dept(Peter,1)}

J2: {Dept(Peter,1), Dept(Peter,2)}

J3: {Dept(Peter,1), Dept(John,1)}

J4: {Dept(Peter,n1)}

M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68

Page 22: Exchanging more than Complete Data

The notion of solution: Example

Example

◮ S: Employee(name)

◮ T: Dept(name, number)

◮ Σ: Employee(x) → ∃y Dept(x , y)

Solutions for I = {Employee(Peter)}:

J1: {Dept(Peter,1)}

J2: {Dept(Peter,1), Dept(Peter,2)}

J3: {Dept(Peter,1), Dept(John,1)}

J4: {Dept(Peter,n1)}

J5: {Dept(Peter,n1), Dept(Peter,n2)}

M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68

Page 23: Exchanging more than Complete Data

Canonical universal solution

Algorithm (Chase)

Input : M = (S,T,Σ) and an instance I of S

Output : Canonical universal solution J⋆ for I under M

let J⋆ := empty instance of T

for every ϕ(x , y) → ∃z ψ(x , z) in Σ do

for every a, b such that I satisfies ϕ(a, b) do

create a fresh tuple n of pairwise distinct null valuesinsert ψ(a, n) into J⋆

M. Arenas – Exchanging more than Complete Data - RR2011 12 / 68

Page 24: Exchanging more than Complete Data

Canonical universal solution: Example

Example

Consider mapping M specified by dependency:

Employee(x) → ∃y Dept(x , y)

Canonical universal solution forI = {Employee(Peter), Employee(John)}:

◮ For a = Peter do

◮ Create a fresh null value n1

◮ Insert Dept(Peter, n1) into J⋆

◮ For a = John do

◮ Create a fresh null value n2

◮ Insert Dept(John, n2) into J⋆

Result: J⋆ = {Dept(Peter, n1), Dept(John, n2)}

M. Arenas – Exchanging more than Complete Data - RR2011 13 / 68

Page 25: Exchanging more than Complete Data

Query answering in data exchange

Given: Mapping M, source instance I and query Q over the targetschema

◮ What does it mean to answer Q?

M. Arenas – Exchanging more than Complete Data - RR2011 14 / 68

Page 26: Exchanging more than Complete Data

Query answering in data exchange

Given: Mapping M, source instance I and query Q over the targetschema

◮ What does it mean to answer Q?

Definition (Certain answers)

certainM(Q, I ) =⋂

J is a solution for I under M

Q(J)

M. Arenas – Exchanging more than Complete Data - RR2011 14 / 68

Page 27: Exchanging more than Complete Data

Certain answers: Example

Example

Consider mapping M specified by:

Employee(x) → ∃y Dept(x , y)

Given instance I = {Employee(Peter)}:

certainM(∃y Dept(x , y), I ) = {Peter}certainM(Dept(x , y), I ) = ∅

M. Arenas – Exchanging more than Complete Data - RR2011 15 / 68

Page 28: Exchanging more than Complete Data

Query rewriting: An approach for answering queries

How can we compute certain answers?

◮ Naıve algorithm does not work: infinitely many solutions

M. Arenas – Exchanging more than Complete Data - RR2011 16 / 68

Page 29: Exchanging more than Complete Data

Query rewriting: An approach for answering queries

How can we compute certain answers?

◮ Naıve algorithm does not work: infinitely many solutions

Approach proposed in [FKMP03]: Query Rewriting

Given a mapping M and a target query Q, compute a queryQ⋆ such that for every source instance I with canonicaluniversal solution J⋆:

certainM(Q, I ) = Q⋆(J⋆)

M. Arenas – Exchanging more than Complete Data - RR2011 16 / 68

Page 30: Exchanging more than Complete Data

Query rewriting over the canonical universal solution

Theorem (FKMP03)

Given a mapping M specified by st-tgds and a union ofconjunctive queries Q, there exists a query Q⋆ such that for everysource instance I with canonical universal solution J⋆:

certainM(Q, I ) = Q⋆(J⋆)

M. Arenas – Exchanging more than Complete Data - RR2011 17 / 68

Page 31: Exchanging more than Complete Data

Query rewriting over the canonical universal solution

Theorem (FKMP03)

Given a mapping M specified by st-tgds and a union ofconjunctive queries Q, there exists a query Q⋆ such that for everysource instance I with canonical universal solution J⋆:

certainM(Q, I ) = Q⋆(J⋆)

Proof idea: Assume that C(a) holds whenever a is a constant.

Then:

Q⋆(x1, . . . , xm) = C(x1) ∧ · · · ∧ C(xm) ∧ Q(x1, . . . , xm)

M. Arenas – Exchanging more than Complete Data - RR2011 17 / 68

Page 32: Exchanging more than Complete Data

Computing certain answers: Complexity

Data complexity: Data exchange setting and query are consideredto be fixed.

Corollary (FKMP03)

For mappings given by st-tgds, certain answers for UCQ can becomputed in polynomial time (data complexity)

M. Arenas – Exchanging more than Complete Data - RR2011 18 / 68

Page 33: Exchanging more than Complete Data

Relational data exchange: Some lessons learned

Key steps in the development of the area:

◮ Definition of schema mappings: Precise syntax and semantics◮ Definition of the notion of solution

◮ Identification of good solutions

◮ Polynomial time algorithms for materializing good solutions

◮ Definition of target queries: Precise semantics

◮ Polynomial time algorithms for computing certain answers forUCQ

M. Arenas – Exchanging more than Complete Data - RR2011 19 / 68

Page 34: Exchanging more than Complete Data

Relational data exchange: Some lessons learned

Key steps in the development of the area:

◮ Definition of schema mappings: Precise syntax and semantics◮ Definition of the notion of solution

◮ Identification of good solutions

◮ Polynomial time algorithms for materializing good solutions

◮ Definition of target queries: Precise semantics

◮ Polynomial time algorithms for computing certain answers forUCQ

Creating schema mappings is a time consuming and expensiveprocess

◮ Manual or semi-automatic process in general

M. Arenas – Exchanging more than Complete Data - RR2011 19 / 68

Page 35: Exchanging more than Complete Data

Outline: First part

◮ The data exchange problem◮ Some fundamental results in relational data exchange

◮ The need for a more general data exchange framework◮ Two important scenarios: Incomplete databases (open-world

databases: RDF) and knowledge bases

M. Arenas – Exchanging more than Complete Data - RR2011 20 / 68

Page 36: Exchanging more than Complete Data

Ongoing project: Reusing schema mappings

ΣSU

ΣST

S T U

ΣTU

M. Arenas – Exchanging more than Complete Data - RR2011 21 / 68

Page 37: Exchanging more than Complete Data

Ongoing project: Reusing schema mappings

ΣSU

ΣST

S T U

ΣTU

M. Arenas – Exchanging more than Complete Data - RR2011 21 / 68

Page 38: Exchanging more than Complete Data

Ongoing project: Reusing schema mappings

ΣSU?

ΣST

S T U

ΣTU

M. Arenas – Exchanging more than Complete Data - RR2011 21 / 68

Page 39: Exchanging more than Complete Data

Ongoing project: Reusing schema mappings

ΣSU?

ΣST

S T U

ΣTU

We need some operators for schema mappings

M. Arenas – Exchanging more than Complete Data - RR2011 21 / 68

Page 40: Exchanging more than Complete Data

Ongoing project: Reusing schema mappings

ΣSU = ΣST ◦ ΣTU

ΣST

S T U

ΣTU

We need some operators for schema mappings

◮ Composition in the above case

M. Arenas – Exchanging more than Complete Data - RR2011 21 / 68

Page 41: Exchanging more than Complete Data

Metadata management

Contributions mentioned in the previous slides are just a first steptowards the development of a general framework for data exchange.

In fact, as pointed in [B03],

many information system problems involve not only the designand integration of complex application artifacts, but also theirsubsequent manipulation.

M. Arenas – Exchanging more than Complete Data - RR2011 22 / 68

Page 42: Exchanging more than Complete Data

Metadata management

This has motivated the need for the development of a generalinfrastructure for managing schema mappings.

The problem of managing schema mappings is called metadata

management.

High-level algebraic operators, such as compose, are used tomanipulate mappings.

◮ What other operators are needed?

M. Arenas – Exchanging more than Complete Data - RR2011 23 / 68

Page 43: Exchanging more than Complete Data

An inverse operator is also needed

ΣVS?

ΣST

S T U

ΣTU

V

ΣSVΣVS = Σ−1SV

(Σ−1VS ◦ ΣST) ◦ ΣTUΣ−1

VS ◦ ΣST

M. Arenas – Exchanging more than Complete Data - RR2011 24 / 68

Page 44: Exchanging more than Complete Data

An inverse operator is also needed

ΣVS?

ΣST

S T U

ΣTU

V

ΣSVΣVS = Σ−1SV

(Σ−1VS ◦ ΣST) ◦ ΣTUΣ−1

VS ◦ ΣST

M. Arenas – Exchanging more than Complete Data - RR2011 24 / 68

Page 45: Exchanging more than Complete Data

An inverse operator is also needed

ΣVS?

ΣST

S T U

ΣTU

V

ΣSV

(Σ−1VS ◦ ΣST) ◦ ΣTUΣ−1

VS ◦ ΣST

ΣVS = Σ−1SV

M. Arenas – Exchanging more than Complete Data - RR2011 24 / 68

Page 46: Exchanging more than Complete Data

An inverse operator is also needed

ΣVS = Σ−1SV

ΣST

S T U

ΣTU

V

ΣSV

(Σ−1VS ◦ ΣST) ◦ ΣTUΣ−1

VS ◦ ΣST

M. Arenas – Exchanging more than Complete Data - RR2011 24 / 68

Page 47: Exchanging more than Complete Data

An inverse operator is also needed

Σ−1SV ◦ ΣST

ΣST

S T U

ΣTU

V

ΣSV

(Σ−1VS ◦ ΣST) ◦ ΣTU

ΣVS = Σ−1SV

Composition and inverse operators have to be combined

M. Arenas – Exchanging more than Complete Data - RR2011 24 / 68

Page 48: Exchanging more than Complete Data

An inverse operator is also needed

Σ−1SV ◦ ΣST

ΣST

S T U

ΣTU

V

ΣSV

(Σ−1SV ◦ ΣST) ◦ ΣTU

ΣVS = Σ−1SV

Composition and inverse operators have to be combined

M. Arenas – Exchanging more than Complete Data - RR2011 24 / 68

Page 49: Exchanging more than Complete Data

Metadata management: A more general data exchange

framework is needed

Composition and inverse operators have been extensively studied in therelational world.

◮ Semantics, computation, . . .

Combining these operators is an open issue.

M. Arenas – Exchanging more than Complete Data - RR2011 25 / 68

Page 50: Exchanging more than Complete Data

Metadata management: A more general data exchange

framework is needed

Composition and inverse operators have been extensively studied in therelational world.

◮ Semantics, computation, . . .

Combining these operators is an open issue.

◮ Key observation: A target instance of a mapping can be the sourceinstance of another mapping

M. Arenas – Exchanging more than Complete Data - RR2011 25 / 68

Page 51: Exchanging more than Complete Data

Metadata management: A more general data exchange

framework is needed

Composition and inverse operators have been extensively studied in therelational world.

◮ Semantics, computation, . . .

Combining these operators is an open issue.

◮ Key observation: A target instance of a mapping can be the sourceinstance of another mapping

◮ Sources instances may contain null values

M. Arenas – Exchanging more than Complete Data - RR2011 25 / 68

Page 52: Exchanging more than Complete Data

Metadata management: A more general data exchange

framework is needed

Composition and inverse operators have been extensively studied in therelational world.

◮ Semantics, computation, . . .

Combining these operators is an open issue.

◮ Key observation: A target instance of a mapping can be the sourceinstance of another mapping

◮ Sources instances may contain null values

There is a need for a data exchange framework that can handle databases

with incomplete information.

M. Arenas – Exchanging more than Complete Data - RR2011 25 / 68

Page 53: Exchanging more than Complete Data

Data exchange in the RDF world

There is an increasing interest in publishing relational data as RDF

◮ Resulted in the creation of the W3C RDB2RDF Working Group

The problem of translating relational data into RDF can be seen as adata exchange problem

◮ Schema mappings can be used to describe how the relational data isto be mapped into RDF

M. Arenas – Exchanging more than Complete Data - RR2011 26 / 68

Page 54: Exchanging more than Complete Data

Data exchange in the RDF world

There is an increasing interest in publishing relational data as RDF

◮ Resulted in the creation of the W3C RDB2RDF Working Group

The problem of translating relational data into RDF can be seen as adata exchange problem

◮ Schema mappings can be used to describe how the relational data isto be mapped into RDF

But there is a mismatch here: A relational database under a closed-worldsemantics is to be translated into an RDF graph under an open-worldsemantics

◮ There is a need for a data exchange framework that can handleboth databases with complete and incomplete information

M. Arenas – Exchanging more than Complete Data - RR2011 26 / 68

Page 55: Exchanging more than Complete Data

Data exchange in the RDF world

An issue discussed at the W3C RDB2RDF Working Group: Is amapping information preserving?

◮ In particular: For the default mapping defined by this group

How can we address this issue?

◮ Metadata management can help us

M. Arenas – Exchanging more than Complete Data - RR2011 27 / 68

Page 56: Exchanging more than Complete Data

Data exchange in the RDF world

An issue discussed at the W3C RDB2RDF Working Group: Is amapping information preserving?

◮ In particular: For the default mapping defined by this group

How can we address this issue?

◮ Metadata management can help us

Question to answer: Is a mapping invertible?

M. Arenas – Exchanging more than Complete Data - RR2011 27 / 68

Page 57: Exchanging more than Complete Data

Data exchange in the RDF world

An issue discussed at the W3C RDB2RDF Working Group: Is amapping information preserving?

◮ In particular: For the default mapping defined by this group

How can we address this issue?

◮ Metadata management can help us

Question to answer: Is a mapping invertible?

◮ This time an RDF graph is to be translated into a relationaldatabase!

M. Arenas – Exchanging more than Complete Data - RR2011 27 / 68

Page 58: Exchanging more than Complete Data

Data exchange in the RDF world

An issue discussed at the W3C RDB2RDF Working Group: Is amapping information preserving?

◮ In particular: For the default mapping defined by this group

How can we address this issue?

◮ Metadata management can help us

Question to answer: Is a mapping invertible?

◮ This time an RDF graph is to be translated into a relationaldatabase!

◮ We want to have a unifying framework for all these cases

M. Arenas – Exchanging more than Complete Data - RR2011 27 / 68

Page 59: Exchanging more than Complete Data

But these are not the only reasons . . .

Nowadays several applications use knowledge bases to represent data.

◮ A knowledge base has not only data but also rules that allows toinfer new data

◮ In the Semantics Web: RDFS and OWL ontologies

M. Arenas – Exchanging more than Complete Data - RR2011 28 / 68

Page 60: Exchanging more than Complete Data

But these are not the only reasons . . .

Nowadays several applications use knowledge bases to represent data.

◮ A knowledge base has not only data but also rules that allows toinfer new data

◮ In the Semantics Web: RDFS and OWL ontologies

In a data exchange application over the Semantics Web:

The input is a mapping and a source specification including dataand rules, and the output is a target specification also includingdata and rules

M. Arenas – Exchanging more than Complete Data - RR2011 28 / 68

Page 61: Exchanging more than Complete Data

But these are not the only reasons . . .

Nowadays several applications use knowledge bases to represent data.

◮ A knowledge base has not only data but also rules that allows toinfer new data

◮ In the Semantics Web: RDFS and OWL ontologies

In a data exchange application over the Semantics Web:

The input is a mapping and a source specification including dataand rules, and the output is a target specification also includingdata and rules

There is a need for a data exchange framework that can handle

knowledge bases.

M. Arenas – Exchanging more than Complete Data - RR2011 28 / 68

Page 62: Exchanging more than Complete Data

Knowledge exchange: A more general data exchange

framework is needed

Example

Assume given the following source knowledge base:

Data:

Father

Andy BobBob DannyDanny Eddie

Mother

Carrie Bob

Rules:

Father(x , y) → Parent(x , y)

Mother(x , y) → Parent(x , y)

Parent(x , y) ∧ Parent(y , z) → Grandparent(x , z)

M. Arenas – Exchanging more than Complete Data - RR2011 29 / 68

Page 63: Exchanging more than Complete Data

Knowledge exchange: A more general data exchange

framework is needed

Example (cont’d)

Given a mapping:

Father(x , y) → F(x , y)

Grandparent(x , y) → G(x , y)

What is a good translation of the initial knowledge base?

M. Arenas – Exchanging more than Complete Data - RR2011 30 / 68

Page 64: Exchanging more than Complete Data

Knowledge exchange: A more general data exchange

framework is needed

Example (cont’d)

Given a mapping:

Father(x , y) → F(x , y)

Grandparent(x , y) → G(x , y)

What is a good translation of the initial knowledge base?

Data:

F

Andy BobBob DannyDanny Eddie

G

Andy DannyCarrie DannyBob Eddie

Rules: ∅

M. Arenas – Exchanging more than Complete Data - RR2011 30 / 68

Page 65: Exchanging more than Complete Data

Knowledge exchange: A more general data exchange

framework is needed

Example (cont’d)

Our first alternative does not include any translation of the source rules:

Father(x , y) → Parent(x , y)

Mother(x , y) → Parent(x , y)

Parent(x , y) ∧ Parent(y , z) → Grandparent(x , z)

M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68

Page 66: Exchanging more than Complete Data

Knowledge exchange: A more general data exchange

framework is needed

Example (cont’d)

Our first alternative does not include any translation of the source rules:

Mother(x , y) → Parent(x , y)

Parent(x , y) ∧ Parent(y , z) → Grandparent(x , z)

M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68

Page 67: Exchanging more than Complete Data

Knowledge exchange: A more general data exchange

framework is needed

Example (cont’d)

Our first alternative does not include any translation of the source rules:

Parent(x , y) ∧ Parent(y , z) → Grandparent(x , z)

M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68

Page 68: Exchanging more than Complete Data

Knowledge exchange: A more general data exchange

framework is needed

Example (cont’d)

Our first alternative does not include any translation of the source rules:

F(x , y) ∧ F(y , z) → G(x , z)

M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68

Page 69: Exchanging more than Complete Data

Knowledge exchange: A more general data exchange

framework is needed

Example (cont’d)

Our first alternative does not include any translation of the source rules:

F(x , y) ∧ F(y , z) → G(x , z)

What data should we materialize?

M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68

Page 70: Exchanging more than Complete Data

Knowledge exchange: A more general data exchange

framework is needed

Example (cont’d)

Our first alternative does not include any translation of the source rules:

F(x , y) ∧ F(y , z) → G(x , z)

What data should we materialize?

F

Andy BobBob DannyDanny Eddie

G

Andy DannyCarrie DannyBob Eddie

M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68

Page 71: Exchanging more than Complete Data

Knowledge exchange: A more general data exchange

framework is needed

Example (cont’d)

Our first alternative does not include any translation of the source rules:

F(x , y) ∧ F(y , z) → G(x , z)

What data should we materialize?

F

Andy BobBob DannyDanny Eddie

G

Carrie Danny

M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68

Page 72: Exchanging more than Complete Data

Knowledge exchange: A more general data exchange

framework is needed

Example (cont’d)

Our first alternative does not include any translation of the source rules:

F(x , y) ∧ F(y , z) → G(x , z)

What data should we materialize?

F

Andy BobBob DannyDanny Eddie

G

Carrie Danny

Is this a good translation? Why?

M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68

Page 73: Exchanging more than Complete Data

One can exchange more than complete data

◮ In data exchange one starts with a database instance (withcomplete information).

◮ What if we have an initial object that has severalinterpretations?

◮ A representation of a set of possible instances

◮ We propose a new general formalism to exchangerepresentations of possible instances

◮ We apply it to the problems of exchanging instances withincomplete information and exchanging knowledge bases

M. Arenas – Exchanging more than Complete Data - RR2011 32 / 68

Page 74: Exchanging more than Complete Data

Outline: Second part

◮ Formalism for exchanging representations systems

◮ Applications to incomplete instances

◮ Applications to knowledge bases

◮ Concluding remarks

M. Arenas – Exchanging more than Complete Data - RR2011 33 / 68

Page 75: Exchanging more than Complete Data

Outline: Second part

◮ Formalism for exchanging representations systems

◮ Applications to incomplete instances

◮ Applications to knowledge bases

◮ Concluding remarks

M. Arenas – Exchanging more than Complete Data - RR2011 34 / 68

Page 76: Exchanging more than Complete Data

Representation systems

A representation system R = (W, rep) consists of:

◮ a set W of representatives

◮ a function rep that assigns a set of instances to every elementin W

rep(V) = {I1, I2, I3, . . .} for every V ∈ W

Uniformity assumption: For every V ∈ W, there exists a relationalschema S (the type of V) such that rep(V) ⊆ Inst(S)

M. Arenas – Exchanging more than Complete Data - RR2011 35 / 68

Page 77: Exchanging more than Complete Data

Representation systems

A representation system R = (W, rep) consists of:

◮ a set W of representatives

◮ a function rep that assigns a set of instances to every elementin W

rep(V) = {I1, I2, I3, . . .} for every V ∈ W

Uniformity assumption: For every V ∈ W, there exists a relationalschema S (the type of V) such that rep(V) ⊆ Inst(S)

Incomplete instances and knowledge bases are representationsystems

M. Arenas – Exchanging more than Complete Data - RR2011 35 / 68

Page 78: Exchanging more than Complete Data

In classical data exchange we consider only complete data

Recall that given M = (S,T,Σ), I ∈ Inst(S) and J ∈ Inst(T): J isa solution for I under M if (I , J) |= Σ

J ∈ SolM(I )

M. Arenas – Exchanging more than Complete Data - RR2011 36 / 68

Page 79: Exchanging more than Complete Data

In classical data exchange we consider only complete data

Recall that given M = (S,T,Σ), I ∈ Inst(S) and J ∈ Inst(T): J isa solution for I under M if (I , J) |= Σ

J ∈ SolM(I )

This can be extended to set of instances. Given X ⊆ Inst(S):

SolM(X ) =⋃

I∈X

SolM(I )

M. Arenas – Exchanging more than Complete Data - RR2011 36 / 68

Page 80: Exchanging more than Complete Data

Extending the definition to representation systems

Given:

◮ a mapping M = (S,T,Σ)

◮ a representation system R = (W, rep)

◮ U ,V ∈ W of types S and T, respectively

M. Arenas – Exchanging more than Complete Data - RR2011 37 / 68

Page 81: Exchanging more than Complete Data

Extending the definition to representation systems

Given:

◮ a mapping M = (S,T,Σ)

◮ a representation system R = (W, rep)

◮ U ,V ∈ W of types S and T, respectively

Definition (APR11)

V is an R-solution of U under M if

rep(V) ⊆ SolM(rep(U))

M. Arenas – Exchanging more than Complete Data - RR2011 37 / 68

Page 82: Exchanging more than Complete Data

Extending the definition to representation systems

Given:

◮ a mapping M = (S,T,Σ)

◮ a representation system R = (W, rep)

◮ U ,V ∈ W of types S and T, respectively

Definition (APR11)

V is an R-solution of U under M if

rep(V) ⊆ SolM(rep(U))

Or equivalently: V is an R-solution of U if for every J ∈ rep(V),there exists I ∈ rep(U) such that J ∈ SolM(I ).

M. Arenas – Exchanging more than Complete Data - RR2011 37 / 68

Page 83: Exchanging more than Complete Data

Universal solutions

What is a good solution in this framework?

M. Arenas – Exchanging more than Complete Data - RR2011 38 / 68

Page 84: Exchanging more than Complete Data

Universal solutions

What is a good solution in this framework?

Definition (APR11)

V is an universal R-solution of U under M if

rep(V) = SolM(rep(U))

M. Arenas – Exchanging more than Complete Data - RR2011 38 / 68

Page 85: Exchanging more than Complete Data

Strong representation systems

Let C be a class of mappings.

M. Arenas – Exchanging more than Complete Data - RR2011 39 / 68

Page 86: Exchanging more than Complete Data

Strong representation systems

Let C be a class of mappings.

Definition (APR11)

R = (W, rep) is a strong representation system for C if for everyM ∈ C and for every U ∈ W , there exists aV ∈ W :

rep(V) = SolM(rep(U))

M. Arenas – Exchanging more than Complete Data - RR2011 39 / 68

Page 87: Exchanging more than Complete Data

Strong representation systems

Let C be a class of mappings.

Definition (APR11)

R = (W, rep) is a strong representation system for C if for everyM ∈ C from S to T, and for every U ∈ W , there exists aV ∈ W :

rep(V) = SolM(rep(U))

M. Arenas – Exchanging more than Complete Data - RR2011 39 / 68

Page 88: Exchanging more than Complete Data

Strong representation systems

Let C be a class of mappings.

Definition (APR11)

R = (W, rep) is a strong representation system for C if for everyM ∈ C from S to T, and for every U ∈ W of type S, there exists aV ∈ W :

rep(V) = SolM(rep(U))

M. Arenas – Exchanging more than Complete Data - RR2011 39 / 68

Page 89: Exchanging more than Complete Data

Strong representation systems

Let C be a class of mappings.

Definition (APR11)

R = (W, rep) is a strong representation system for C if for everyM ∈ C from S to T, and for every U ∈ W of type S, there exists aV ∈ W of type T:

rep(V) = SolM(rep(U))

M. Arenas – Exchanging more than Complete Data - RR2011 39 / 68

Page 90: Exchanging more than Complete Data

Strong representation systems

Let C be a class of mappings.

Definition (APR11)

R = (W, rep) is a strong representation system for C if for everyM ∈ C from S to T, and for every U ∈ W of type S, there exists aV ∈ W of type T:

rep(V) = SolM(rep(U))

If R = (W, rep) is a strong representation system, then theuniversal solutions for the representatives in W can be representedin the same system.

M. Arenas – Exchanging more than Complete Data - RR2011 39 / 68

Page 91: Exchanging more than Complete Data

Outline: Second part

◮ Formalism for exchanging representations systems

◮ Applications to incomplete instances

◮ Applications to knowledge bases

◮ Concluding remarks

M. Arenas – Exchanging more than Complete Data - RR2011 40 / 68

Page 92: Exchanging more than Complete Data

Motivating questions

What is a strong representation system for the class of mappingsspecified by st-tgds?

◮ Are instances including nulls enough?

Can the fundamental data exchange problems be solved inpolynomial time in this setting?

◮ Computing (universal) solutions

◮ Computing certain answers

M. Arenas – Exchanging more than Complete Data - RR2011 41 / 68

Page 93: Exchanging more than Complete Data

Naive instances

We have already considered naive instances: Instances with null values

◮ Example: Canonical universal solution

A naive instance I has labeled nulls:

R(1, n1)R(n1, 2)R(1, n2)

M. Arenas – Exchanging more than Complete Data - RR2011 42 / 68

Page 94: Exchanging more than Complete Data

Naive instances

We have already considered naive instances: Instances with null values

◮ Example: Canonical universal solution

A naive instance I has labeled nulls:

R(1, n1)R(n1, 2)R(1, n2)

The interpretations of I are constructed by replacing nulls by constants:

rep(I) = {K | µ(I) ⊆ K for some valuation µ}

M. Arenas – Exchanging more than Complete Data - RR2011 42 / 68

Page 95: Exchanging more than Complete Data

Are naive instances expressive enough?

Naive instances have been extensively used in data exchange:

Proposition (FKMP03)

Let M = (S,T,Σ), where Σ is a set of st-tgds. Then for everyinstance I of S, there exists a naive instance J of T such that:

rep(J ) = SolM(I )

In fact, the canonical universal solution satisfies the propertymentioned above.

M. Arenas – Exchanging more than Complete Data - RR2011 43 / 68

Page 96: Exchanging more than Complete Data

Are naive instances expressive enough?

But naive instances are not expressive enough to deal withincomplete information in the source instances:

Proposition (APR11)

Naive instances are not a strong representation system for the classof mappings specified by st-tgds

M. Arenas – Exchanging more than Complete Data - RR2011 44 / 68

Page 97: Exchanging more than Complete Data

Are naive instances expressive enough?

Example

Consider a mapping M specified by:

Manager(x , y) → Reports(x , y)

Manager(x , x) → SelfManager(x)

The canonical universal solution for I = {Manager(n,Peter)} under M:

J = {Reports(n,Peter)}

But J is not a good solution for I.

◮ It cannot represent the fact that if n is given value Peter, thenSelfManager(Peter) should hold in the target.

M. Arenas – Exchanging more than Complete Data - RR2011 45 / 68

Page 98: Exchanging more than Complete Data

Conditional instances

What should be added to naive instances to obtain a strongrepresentation system?

M. Arenas – Exchanging more than Complete Data - RR2011 46 / 68

Page 99: Exchanging more than Complete Data

Conditional instances

What should be added to naive instances to obtain a strongrepresentation system?

◮ Answer from database theory: Conditions on the nulls

M. Arenas – Exchanging more than Complete Data - RR2011 46 / 68

Page 100: Exchanging more than Complete Data

Conditional instances

What should be added to naive instances to obtain a strongrepresentation system?

◮ Answer from database theory: Conditions on the nulls

Conditional instances: Naive instances plus tuple conditions

A tuple condition is a positive Boolean combinations of:

◮ equalities and inequalities between nulls, and between nullsand constants

M. Arenas – Exchanging more than Complete Data - RR2011 46 / 68

Page 101: Exchanging more than Complete Data

Conditional instances

Example

R(1, n1) n1 = n2

R(n1, n2) n1 6= n2 ∨ n2 = 2

M. Arenas – Exchanging more than Complete Data - RR2011 47 / 68

Page 102: Exchanging more than Complete Data

Conditional instances

Example

R(1, n1) n1 = n2

R(n1, n2) n1 6= n2 ∨ n2 = 2

Semantics:

M. Arenas – Exchanging more than Complete Data - RR2011 47 / 68

Page 103: Exchanging more than Complete Data

Conditional instances

Example

R(1, n1) n1 = n2

R(n1, n2) n1 6= n2 ∨ n2 = 2

Semantics:

µ(n1) = µ(n2) = 2 µ(n1) = µ(n2) = 3 µ(n1) = 2, µ(n2) = 3

M. Arenas – Exchanging more than Complete Data - RR2011 47 / 68

Page 104: Exchanging more than Complete Data

Conditional instances

Example

R(1, n1) n1 = n2

R(n1, n2) n1 6= n2 ∨ n2 = 2

Semantics:

µ(n1) = µ(n2) = 2R(1, 2)R(2, 2)

µ(n1) = µ(n2) = 3 µ(n1) = 2, µ(n2) = 3

M. Arenas – Exchanging more than Complete Data - RR2011 47 / 68

Page 105: Exchanging more than Complete Data

Conditional instances

Example

R(1, n1) n1 = n2

R(n1, n2) n1 6= n2 ∨ n2 = 2

Semantics:

µ(n1) = µ(n2) = 2R(1, 2)R(2, 2)

µ(n1) = µ(n2) = 3R(1, 3)

µ(n1) = 2, µ(n2) = 3

M. Arenas – Exchanging more than Complete Data - RR2011 47 / 68

Page 106: Exchanging more than Complete Data

Conditional instances

Example

R(1, n1) n1 = n2

R(n1, n2) n1 6= n2 ∨ n2 = 2

Semantics:

µ(n1) = µ(n2) = 2R(1, 2)R(2, 2)

µ(n1) = µ(n2) = 3R(1, 3)

µ(n1) = 2, µ(n2) = 3

R(2, 3)

M. Arenas – Exchanging more than Complete Data - RR2011 47 / 68

Page 107: Exchanging more than Complete Data

Conditional instances

Example

R(1, n1) n1 = n2

R(n1, n2) n1 6= n2 ∨ n2 = 2

Semantics:

µ(n1) = µ(n2) = 2R(1, 2)R(2, 2)

µ(n1) = µ(n2) = 3R(1, 3)

µ(n1) = 2, µ(n2) = 3

R(2, 3)

Interpretations of a conditional instance I:

rep(I) = {K | µ(I) ⊆ K for some valuation µ}

M. Arenas – Exchanging more than Complete Data - RR2011 47 / 68

Page 108: Exchanging more than Complete Data

Positive conditional instances

Many problems are intractable over conditional instances.

◮ We also consider a restricted class of conditional instances

Positive conditional instances: Conditional instances withoutinequalities

M. Arenas – Exchanging more than Complete Data - RR2011 48 / 68

Page 109: Exchanging more than Complete Data

(Positive) conditional instances are enough

Theorem (APR11)

Both conditional instances and positive conditional instances are strongrepresentation systems for the class of mappings specified by st-tgds.

Example

Consider again the mapping M specified by:

Manager(x , y) → Reports(x , y)

Manager(x , x) → SelfManager(x)

The following is a universal solution for I = {Manager(n,Peter)}

Reports(n,Peter) trueSelfManager(Peter) n = Peter

M. Arenas – Exchanging more than Complete Data - RR2011 49 / 68

Page 110: Exchanging more than Complete Data

Positive conditional instances are exactly the needed

representation system

Positive conditional instances are minimal:

Theorem (APR11)

All the following are needed to obtain a strong representationsystem for the class of mappings specified by st-tgds:

◮ equalities between nulls

◮ equalities between constant and nulls

◮ conjunctions and disjunctions

Conditional instances are enough but not minimal.

M. Arenas – Exchanging more than Complete Data - RR2011 50 / 68

Page 111: Exchanging more than Complete Data

Positive conditional instance can be used in practice!

Let M = (S,T,Σ), where Σ is a set of st-tgds.

M. Arenas – Exchanging more than Complete Data - RR2011 51 / 68

Page 112: Exchanging more than Complete Data

Positive conditional instance can be used in practice!

Let M = (S,T,Σ), where Σ is a set of st-tgds.

Theorem (APR11)

There exists a polynomial time algorithm that, given a positiveconditional instance I over S, computes a positive conditional instanceJ over T that is a universal solution for I under M.

M. Arenas – Exchanging more than Complete Data - RR2011 51 / 68

Page 113: Exchanging more than Complete Data

Positive conditional instance can be used in practice!

Let M = (S,T,Σ), where Σ is a set of st-tgds.

Theorem (APR11)

There exists a polynomial time algorithm that, given a positiveconditional instance I over S, computes a positive conditional instanceJ over T that is a universal solution for I under M.

Let Q be a union of conjunctive queries over T.

Q(J ) =⋂

J∈rep(J )

Q(J)

certainM(Q, I) =⋂

J is a solution for I under M

Q(J )

M. Arenas – Exchanging more than Complete Data - RR2011 51 / 68

Page 114: Exchanging more than Complete Data

Positive conditional instance can be used in practice!

Theorem (APR11)

There exists a polynomial time algorithm that, given a positiveconditional instance I over S, computes certainM(Q, I).

M. Arenas – Exchanging more than Complete Data - RR2011 52 / 68

Page 115: Exchanging more than Complete Data

Positive conditional instance can be used in practice!

Theorem (APR11)

There exists a polynomial time algorithm that, given a positiveconditional instance I over S, computes certainM(Q, I).

The same result holds for the class of unions of conjunctive queries withat most one inequality per disjunct.

◮ The other important class of queries in the data exchange area forwhich certain answers can be computed in polynomial time

M. Arenas – Exchanging more than Complete Data - RR2011 52 / 68

Page 116: Exchanging more than Complete Data

Outline: Second part

◮ Formalism for exchanging representations systems

◮ Applications to incomplete instances

◮ Applications to knowledge bases

◮ Concluding remarks

M. Arenas – Exchanging more than Complete Data - RR2011 53 / 68

Page 117: Exchanging more than Complete Data

The semantics of knowledge bases is given by sets of

instances

Knowledge base over S: (I ,Γ) such that

◮ I ∈ Inst(S)

◮ Γ a set of rules over S

Semantics: finite models

Mod(I ,Γ) = {K ∈ Inst(S) | I ⊆ K and K |= Γ}

M. Arenas – Exchanging more than Complete Data - RR2011 54 / 68

Page 118: Exchanging more than Complete Data

We can apply our formalism to knowledge bases

(I2,Γ2) is a KB-solution for (I1,Γ1) under M if:

Mod(I2,Γ2) ⊆ SolM(Mod(I1,Γ1))

(I2,Γ2) is a universal KB-solution for (I1,Γ1) under M if:

Mod(I2,Γ2) = SolM(Mod(I1,Γ1))

M. Arenas – Exchanging more than Complete Data - RR2011 55 / 68

Page 119: Exchanging more than Complete Data

Motivating questions

Same as for the case of instances with incomplete information.

◮ Constructing universal KB-solutions

◮ Answering target queries

New fundamental problem: Construct solutions including as muchimplicit knowledge as possible.

M. Arenas – Exchanging more than Complete Data - RR2011 56 / 68

Page 120: Exchanging more than Complete Data

What are good knowledge-base solutions?

First alternative: universal KB-solutions

But there exist some other KB-solutions desirable to materialize

◮ Minimality comes into play

M. Arenas – Exchanging more than Complete Data - RR2011 57 / 68

Page 121: Exchanging more than Complete Data

What are good knowledge-base solutions?

First alternative: universal KB-solutions

But there exist some other KB-solutions desirable to materialize

◮ Minimality comes into play

Given sets X , Y of instances:

◮ X ≡min Y if X and Y coincide in the minimal instances under ⊆

Definition

(I2, Γ2) is a minimal KB-solution of (I1, Γ1) under M if:

Mod(I2, Γ2) ≡min SolM(Mod(I1, Γ1))

M. Arenas – Exchanging more than Complete Data - RR2011 57 / 68

Page 122: Exchanging more than Complete Data

Two requirements to construct minimal knowledge-base

solutions

Given (I1,Γ1) and M, when constructing a minimal KB-solution(I2,Γ2) we would like:

M. Arenas – Exchanging more than Complete Data - RR2011 58 / 68

Page 123: Exchanging more than Complete Data

Two requirements to construct minimal knowledge-base

solutions

Given (I1,Γ1) and M, when constructing a minimal KB-solution(I2,Γ2) we would like:

1. Γ2 to only depend on Γ1 and M:

Γ2 is safe for Γ1 and M

M. Arenas – Exchanging more than Complete Data - RR2011 58 / 68

Page 124: Exchanging more than Complete Data

Two requirements to construct minimal knowledge-base

solutions

Given (I1,Γ1) and M, when constructing a minimal KB-solution(I2,Γ2) we would like:

1. Γ2 to only depend on Γ1 and M:

Γ2 is safe for Γ1 and M

Definition

Γ2 is safe for Γ1 and M, if for every I1 there exists I2:

(I2,Γ2) is a minimal KB-solution of (I1,Γ1) under M

M. Arenas – Exchanging more than Complete Data - RR2011 58 / 68

Page 125: Exchanging more than Complete Data

Two requirements to construct minimal knowledge-base

solutions

2. Γ2 to be as informative as possible (thus minimizing the sizeof I2):

M. Arenas – Exchanging more than Complete Data - RR2011 59 / 68

Page 126: Exchanging more than Complete Data

Two requirements to construct minimal knowledge-base

solutions

2. Γ2 to be as informative as possible (thus minimizing the sizeof I2):

Definition

Γ2 is optimal-safe if for every other safe set Γ′:

Γ2 |= Γ′

M. Arenas – Exchanging more than Complete Data - RR2011 59 / 68

Page 127: Exchanging more than Complete Data

Computing minimal KB-solutions

To obtain algorithms for computing minimal KB-solutions, we needto specify the language used in knowledge bases.

◮ Full st-tgd:

∀x∀y (ϕ(x , y) → ψ(x))

M. Arenas – Exchanging more than Complete Data - RR2011 60 / 68

Page 128: Exchanging more than Complete Data

Computing minimal KB-solutions

To obtain algorithms for computing minimal KB-solutions, we needto specify the language used in knowledge bases.

◮ Full st-tgd:

∀x∀y (ϕ(x , y) → ψ(x))

Theorem (APR11)

There exists a polynomial-time algorithm that, givenM = (S,T,Σ), where Σ is a set of full st-tgds, and given a set Γ1

of full tgds over S, computes a set Γ2 of second-order logicsentences over T that is optimal-safe for Γ1 and M.

M. Arenas – Exchanging more than Complete Data - RR2011 60 / 68

Page 129: Exchanging more than Complete Data

Computing minimal KB-solutions

Unfortunately, first-order logic is no expressive enough.

Theorem (APR11)

There exist M = (S,T,Σ), where Σ is a set of full st-tgds, and aset Γ1 of full tgds over S such that:

no FO-sentence is optimal-safe for Γ1 and M.

M. Arenas – Exchanging more than Complete Data - RR2011 61 / 68

Page 130: Exchanging more than Complete Data

Computing minimal KB-solutions

Unfortunately, first-order logic is no expressive enough.

Theorem (APR11)

There exist M = (S,T,Σ), where Σ is a set of full st-tgds, and aset Γ1 of full tgds over S such that:

no FO-sentence is optimal-safe for Γ1 and M.

How can we deal with these problems in practice?

M. Arenas – Exchanging more than Complete Data - RR2011 61 / 68

Page 131: Exchanging more than Complete Data

Computing minimal KB-solutions

Unfortunately, first-order logic is no expressive enough.

Theorem (APR11)

There exist M = (S,T,Σ), where Σ is a set of full st-tgds, and aset Γ1 of full tgds over S such that:

no FO-sentence is optimal-safe for Γ1 and M.

How can we deal with these problems in practice?

◮ We need to restrict the language used to specify knowledgebases: Description logics [ABC11]

M. Arenas – Exchanging more than Complete Data - RR2011 61 / 68

Page 132: Exchanging more than Complete Data

Outline: Second part

◮ Formalism for exchanging representations systems

◮ Applications to incomplete instances

◮ Applications to knowledge bases

◮ Concluding remarks

M. Arenas – Exchanging more than Complete Data - RR2011 62 / 68

Page 133: Exchanging more than Complete Data

We can exchange more than complete data

We propose a general formalism to exchange representationsystems

◮ Applications to incomplete instances

◮ Applications to knowledge bases

Next step: Apply our general setting to the Semantic Web

◮ Semantic Web data has nulls (blank nodes)

◮ Semantic Web specifications have rules (RDFS, OWL)

Lots of interesting problems to solve if knowledge bases arespecified by means of description logics.

◮ Better results can be obtained

M. Arenas – Exchanging more than Complete Data - RR2011 63 / 68

Page 134: Exchanging more than Complete Data

We can exchange more than complete data

We propose a general formalism to exchange representationsystems

◮ Applications to incomplete instances

◮ Applications to knowledge bases

Next step: Apply our general setting to the Semantic Web

◮ Semantic Web data has nulls (blank nodes)

◮ Semantic Web specifications have rules (RDFS, OWL)

Lots of interesting problems to solve if knowledge bases arespecified by means of description logics.

◮ Better results can be obtained

M. Arenas – Exchanging more than Complete Data - RR2011 63 / 68

Page 135: Exchanging more than Complete Data

We can exchange more than complete data

We propose a general formalism to exchange representationsystems

◮ Applications to incomplete instances

◮ Applications to knowledge bases

Next step: Apply our general setting to the Semantic Web

◮ Semantic Web data has nulls (blank nodes)

◮ Semantic Web specifications have rules (RDFS, OWL)

Lots of interesting problems to solve if knowledge bases arespecified by means of description logics.

◮ Better results can be obtained

M. Arenas – Exchanging more than Complete Data - RR2011 63 / 68

Page 136: Exchanging more than Complete Data

Thank you!

M. Arenas – Exchanging more than Complete Data - RR2011 64 / 68

Page 137: Exchanging more than Complete Data

Bibliography

[ABC11] M. Arenas, E. Botoeva, D. Calvanese. Knowledge Base Exchange.DL 2011.

[APR11] M. Arenas, J. Perez, J. Reutter. Data Exchange beyond CompleteData. PODS 2011.

[B03] P. A. Bernstein. Applying Model Management to Classical MetaData Problems. CIDR 2003.

[FKMP03] R. Fagin, P. G. Kolaitis, R. J. Miller, L. Popa. Data Exchange:Semantics and Query Answering. ICDT 2003.

M. Arenas – Exchanging more than Complete Data - RR2011 65 / 68

Page 138: Exchanging more than Complete Data

Bonus track: Computation of solutions and its associated

decision problem

Decision problem: Check-KB-Sol

Input: M = (S,T,Σ), where Σ is a set of st-tgds(I1,Γ1) KB over S with Γ1 a set of tgds(I2,Γ2) KB over T with Γ2 a set of tgds

Output: Is (I2,Γ2) a KB-solution of (I1,Γ1) under M?

M. Arenas – Exchanging more than Complete Data - RR2011 66 / 68

Page 139: Exchanging more than Complete Data

Bonus track: Computation of solutions and its associated

decision problem

Decision problem: Check-KB-Sol

Input: M = (S,T,Σ), where Σ is a set of st-tgds(I1,Γ1) KB over S with Γ1 a set of tgds(I2,Γ2) KB over T with Γ2 a set of tgds

Output: Is (I2,Γ2) a KB-solution of (I1,Γ1) under M?

Theorem (APR11)

Check-KB-Sol is undecidable (even for a fixed M).

M. Arenas – Exchanging more than Complete Data - RR2011 66 / 68

Page 140: Exchanging more than Complete Data

Bonus track: Computation of solutions and its associated

decision problem

Undecidability is a consequence of using ∃ in knowledge bases.

◮ We need to restrict the input

Check-Full-KB-Sol: Γ1, Γ2 are assumed to be sets of full tgds

M. Arenas – Exchanging more than Complete Data - RR2011 67 / 68

Page 141: Exchanging more than Complete Data

Bonus track: Computation of solutions and its associated

decision problem

Theorem (APR11)

Check-Full-KB-Sol is EXPTIME-complete.

M. Arenas – Exchanging more than Complete Data - RR2011 68 / 68

Page 142: Exchanging more than Complete Data

Bonus track: Computation of solutions and its associated

decision problem

Theorem (APR11)

Check-Full-KB-Sol is EXPTIME-complete.

Theorem (APR11)

If M = (S,T,Σ) is fixed:

Check-Full-KB-Sol is ∆P2 [O(log n)]-complete.

∆P2 [O(log n)]: PNP with a logarithmic number of

calls to the NP oracle.

M. Arenas – Exchanging more than Complete Data - RR2011 68 / 68