Debugging Schema Mappings with Routes. Laura Chiticariu, UC Santa Cruz (joint work with Wang-Chiew Tan)

Debugging Schema Mappings with Routes


Page 1: Debugging Schema Mappings  with Routes

1

Debugging Schema Mappings

with Routes

Laura Chiticariu, UC Santa Cruz

(joint work with Wang-Chiew Tan)

Page 2: Debugging Schema Mappings  with Routes

2

SPIDER: A Schema Mapping Debugger

Today 14:00-15:30 Thursday 11:00-12:30

Demo group B

Page 3: Debugging Schema Mappings  with Routes

3

Schema Mappings

A schema mapping is a logical assertion that describes the correspondence between two schemas
Key element in data exchange and data integration systems

Data Exchange [FKMP05]
Translate data conforming to a source schema S into data conforming to a target schema T so that the schema mapping M is satisfied

[Figure: source instance I (schema S) is related by M to target instance J (schema T)]

Page 4: Debugging Schema Mappings  with Routes

4

Debugging a Data Exchange Today

Debugging at the (low) level of the implementation (XQuery/XSLT/Java)
1. Specific to the data exchange engine
2. Specific to the implementation language: XQuery, SQL, etc.

Debugging at the level of schema mappings: NO SUPPORT!!!

[Figure: source instance I (schema S) is related by M to target instance J (schema T)]

Page 5: Debugging Schema Mappings  with Routes

5

Debugging Schema Mappings

Debugging schema mappings: the process of exploring, understanding and refining a schema mapping through the use of (test) data at the level of schema mappings

[Figure: source instance I (schema S) is related by M to target instance J (schema T)]

Page 6: Debugging Schema Mappings  with Routes

6

Outline

Overview
Motivation
Debugging schema mappings with routes
  Motivating example
  What are routes?
  Computing routes
  Related work
Performance evaluation
Conclusions

Page 7: Debugging Schema Mappings  with Routes

7

Motivation

Schema mappings are good
  Higher-level, declarative programming constructs
  Hide implementation details, allow for optimization
  Typically easier to understand vs. SQL/XSLT/XQuery/Java
  Serve a similar goal as model management [Bernstein03, MBHR05]

Uniformity in specifying and debugging
  Reduce programming effort by allowing a user to specify and debug at the level of schema mappings

Schema mappings are often generated by schema matching tools
  Close to user’s intention, but may need further refinements
  Hard to understand without the help of tools

Page 8: Debugging Schema Mappings  with Routes

8

Language for Schema Mappings

Tuple generating dependencies (tgds)
  ∀x (φ(x) → ∃y ψ(x,y))
Equality generating dependencies (egds)
  ∀x (φ(x) → x1 = x2)

Remarks:
  Widely used for relational schema mappings in data exchange and data integration [Kolaitis05, Lenzerini02]
  TGDs generalize LAV, GAV and are equivalent to GLAV assertions in the terminology of data integration
  Extended to handle XML data exchange [PVMHF02]

Page 9: Debugging Schema Mappings  with Routes

9

Relational Schema Mappings [FKMP03]

Schema mapping M = (S, T, Σst ∪ Σt)
  S, T: relational schemas with no relation symbols in common
  Source-to-target dependencies Σst:
    Source-to-target tgds (s-t tgds): φ_S(x) → ∃y ψ_T(x,y)
  Target dependencies Σt:
    Target tgds: φ_T(x) → ∃y ψ_T(x,y)
    Target egds: φ_T(x) → x1 = x2

[Figure: source instance I (schema S) is related by Σst to target instance J (schema T); Σt constrains the target]

Page 10: Debugging Schema Mappings  with Routes

10

Example Schema Mapping

Source schema S (MANHATTAN CREDIT)
  CardHolders(cardNo, limit, ssn, name)
  Dependents(accNo, ssn, name)
Target schema T (FARGO FINANCE)
  Accounts(accNo, creditLine, accHolder)
  Clients(ssn, name)

Source-to-target dependencies, Σst:
  m1: CardHolders(cn,l,s,n) → ∃L (Accounts(cn,L,s) ∧ Clients(s,n))
  m2: Dependents(an,s,n) → Clients(s,n)
Target dependencies, Σt:
  m3: Clients(s,n) → ∃A ∃L (Accounts(A,L,s))

Source instance I
  CardHolders: (123, $15K, ID1, Alice)
  Dependents: (123, ID2, Bob)
Target instance J (a solution for I under the schema mapping)
  Accounts: (123, L1, ID1), (A2, L2, ID2)
  Clients: (ID1, Alice), (ID2, Bob)

Page 11: Debugging Schema Mappings  with Routes

11

Example Debugging Scenario 1

Unknown credit limit?
15K is not copied over to the target

Source instance I
  CardHolders: (123, $15K, ID1, Alice)
  Dependents: (123, ID2, Bob)
Target instance J
  Accounts: (123, L1, ID1), (A2, L2, ID2)
  Clients: (ID1, Alice), (ID2, Bob)

A route for the Accounts tuple
  CardHolders(123, $15K, ID1, Alice) --m1--> Accounts(123, L1, ID1), Clients(ID1, Alice)

m1: CardHolders(cn,l,s,n) → ∃L (Accounts(cn,L,s) ∧ Clients(s,n))

Page 12: Debugging Schema Mappings  with Routes

12

Example Debugging Scenario 1

Unknown credit limit?
15K is not copied over to the target

Source instance I
  CardHolders: (123, $15K, ID1, Alice)
  Dependents: (123, ID2, Bob)
Target instance J
  Accounts: (123, L1, ID1), (A2, L2, ID2)
  Clients: (ID1, Alice), (ID2, Bob)

A route for the Accounts tuple
  CardHolders(123, $15K, ID1, Alice) --m1--> Accounts(123, L1, ID1), Clients(ID1, Alice)

m1: CardHolders(cn,l,s,n) → (Accounts(cn,l,s) ∧ Clients(s,n))

Page 13: Debugging Schema Mappings  with Routes

13

Example Debugging Scenario 2

Unknown account number?
123 is not copied over to the target as Bob’s account number

Source instance I
  CardHolders: (123, $15K, ID1, Alice)
  Dependents: (123, ID2, Bob)
Target instance J
  Accounts: (123, L1, ID1), (A2, L2, ID2)
  Clients: (ID1, Alice), (ID2, Bob)

Route for Accounts tuple with accNo A2
  Dependents(123, ID2, Bob) --m2--> Clients(ID2, Bob) --m3--> Accounts(A2, L2, ID2)

m2: Dependents(an,s,n) → Clients(s,n)

Page 14: Debugging Schema Mappings  with Routes

14

Example Debugging Scenario 2

Unknown account number?
123 is not copied over to the target as Bob’s account number

Source instance I
  CardHolders: (123, $15K, ID1, Alice)
  Dependents: (123, ID2, Bob)
Target instance J
  Accounts: (123, L1, ID1), (A2, L2, ID2)
  Clients: (ID1, Alice), (ID2, Bob)

Route for Accounts tuple with accNo A2
  Dependents(123, ID2, Bob) --m2--> Clients(ID2, Bob) --m3--> Accounts(A2, L2, ID2)

m’2: CardHolders(an,l,s’,n’) ∧ Dependents(an,s,n) → Accounts(an,l,s) ∧ Clients(s,n)

Page 15: Debugging Schema Mappings  with Routes

15

Debugging Schema Mappings with Routes

Main intuition: routes describe the relationships between source and target data with the schema mapping

Definition: Let M be a schema mapping, I be a source instance, J be a solution for I under M, and Js ⊆ J.
A route for Js with M and (I,J) is a finite non-empty sequence of satisfaction steps

  (I, ∅) --m1,h1--> (I, J1) --m2,h2--> … --mn,hn--> (I, Jn)

such that:
  Ji ⊆ J and mi ∈ Σst ∪ Σt, where 1 ≤ i ≤ n
  Js ⊆ Jn

Page 16: Debugging Schema Mappings  with Routes

16

Example of Satisfaction Step

CardHolders(123, $15K, ID1, Alice) --m1,h1--> Accounts(123, L1, ID1), Clients(ID1, Alice)

m1: CardHolders(cn, l, s, n) → ∃L (Accounts(cn, L, s) ∧ Clients(s, n))
h1 = {cn ↦ 123, l ↦ $15K, s ↦ ID1, n ↦ Alice, L ↦ L1}

Unknown credit limit?

Source instance I
  CardHolders: (123, $15K, ID1, Alice)
  Dependents: (123, ID2, Bob)
Target instance J
  Accounts: (123, L1, ID1), (A2, L2, ID2)
  Clients: (ID1, Alice), (ID2, Bob)
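The satisfaction step with m1 and h1 can be replayed in a few lines. This is a sketch under assumptions: atoms are encoded as (relation, variables) pairs, facts as (relation, values) pairs, and `satisfaction_step` is a hypothetical helper name rather than anything taken from the tool.

```python
def apply_h(atoms, h):
    """Instantiate every atom (relation, variables) with the assignment h."""
    return {(rel, tuple(h[v] for v in variables)) for rel, variables in atoms}


# m1: CardHolders(cn,l,s,n) -> exists L (Accounts(cn,L,s) and Clients(s,n))
m1_lhs = [("CardHolders", ("cn", "l", "s", "n"))]
m1_rhs = [("Accounts", ("cn", "L", "s")), ("Clients", ("s", "n"))]
h1 = {"cn": "123", "l": "$15K", "s": "ID1", "n": "Alice", "L": "L1"}

I_facts = {
    ("CardHolders", ("123", "$15K", "ID1", "Alice")),
    ("Dependents", ("123", "ID2", "Bob")),
}
J_facts = {
    ("Accounts", ("123", "L1", "ID1")),
    ("Accounts", ("A2", "L2", "ID2")),
    ("Clients", ("ID1", "Alice")),
    ("Clients", ("ID2", "Bob")),
}


def satisfaction_step(lhs, rhs, h, known, solution):
    """Return the target facts witnessed by one step, or raise if h is invalid.

    known    -- facts available before the step (source facts plus J_i)
    solution -- the full target instance J that the step must stay inside
    """
    if not apply_h(lhs, h) <= known:
        raise ValueError("h does not map the LHS into the known facts")
    produced = apply_h(rhs, h)
    if not produced <= solution:
        raise ValueError("h does not map the RHS into the solution J")
    return produced


J1 = satisfaction_step(m1_lhs, m1_rhs, h1, I_facts, J_facts)
print(sorted(J1))
```

The step shows why the credit limit is unknown: l is bound to $15K on the left-hand side but never appears on the right-hand side, where the existential L is bound to L1 instead.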

Page 17: Debugging Schema Mappings  with Routes

17

Compute all routes

The schema mapping M is fixed
Input: source instance I, a solution J for I under M, a set of target tuples Js ⊆ J
Output: a forest representing all routes for Js

Algorithm idea:
  For each tuple t in Js, consider every possible σ ∈ Σst ∪ Σt and h for witnessing t
  Do the same for all target tuples encountered during the process until tuples from the source instance are obtained

Page 18: Debugging Schema Mappings  with Routes

18

Compute all routes: A simple example

Σst:
  σ1: S1(x) → T1(x)
  σ2: S2(x) → T2(x) ∧ T6(x)
Σt:
  σ3: T2(x) → T3(x)
  σ4: T3(x) → T4(x)
  σ5: T4(x) ∧ T1(x) → T5(x)
  σ6: T4(x) ∧ T6(x) → T7(x)
  σ7: T5(x) → T3(x)

Source instance, I: S1(a), S2(a)
A solution, J: T1(a), …, T7(a)

T7(a) <-- T4(a), T6(a)   [σ6, x ↦ a]

Page 19: Debugging Schema Mappings  with Routes

19

Compute all routes: A simple example

Σst:
  σ1: S1(x) → T1(x)
  σ2: S2(x) → T2(x) ∧ T6(x)
Σt:
  σ3: T2(x) → T3(x)
  σ4: T3(x) → T4(x)
  σ5: T4(x) ∧ T1(x) → T5(x)
  σ6: T4(x) ∧ T6(x) → T7(x)
  σ7: T5(x) → T3(x)

Source instance, I: S1(a), S2(a)
A solution, J: T1(a), …, T7(a)

T7(a) <-- T4(a), T6(a)   [σ6]
T4(a) <-- T3(a)          [σ4, x ↦ a]

Page 20: Debugging Schema Mappings  with Routes

20

Compute all routes: A simple example

Σst:
  σ1: S1(x) → T1(x)
  σ2: S2(x) → T2(x) ∧ T6(x)
Σt:
  σ3: T2(x) → T3(x)
  σ4: T3(x) → T4(x)
  σ5: T4(x) ∧ T1(x) → T5(x)
  σ6: T4(x) ∧ T6(x) → T7(x)
  σ7: T5(x) → T3(x)

Source instance, I: S1(a), S2(a)
A solution, J: T1(a), …, T7(a)

T7(a)

T4(a) T6(a)

6

T3(a)

4

T5(a)

7

Page 21: Debugging Schema Mappings  with Routes

21

Compute all routes: A simple example

Σst:
  σ1: S1(x) → T1(x)
  σ2: S2(x) → T2(x) ∧ T6(x)
Σt:
  σ3: T2(x) → T3(x)
  σ4: T3(x) → T4(x)
  σ5: T4(x) ∧ T1(x) → T5(x)
  σ6: T4(x) ∧ T6(x) → T7(x)
  σ7: T5(x) → T3(x)

Source instance, I: S1(a), S2(a)
A solution, J: T1(a), …, T7(a)

T7(a)

T4(a) T6(a)

6

T3(a)

T5(a)

4

7

T4(a) T1(a)

5

S1(a)

1

Page 22: Debugging Schema Mappings  with Routes

22

Compute all routes: A simple example

Σst:
  σ1: S1(x) → T1(x)
  σ2: S2(x) → T2(x) ∧ T6(x)
Σt:
  σ3: T2(x) → T3(x)
  σ4: T3(x) → T4(x)
  σ5: T4(x) ∧ T1(x) → T5(x)
  σ6: T4(x) ∧ T6(x) → T7(x)
  σ7: T5(x) → T3(x)

Source instance, I: S1(a), S2(a)
A solution, J: T1(a), …, T7(a)

T7(a)

T4(a) T6(a)

6

T3(a)

T5(a)

4

7

S2(a)

2

T4(a) T1(a)

5

T2(a)

S2(a)

3

2

S1(a)

1

Page 23: Debugging Schema Mappings  with Routes

23

Compute all routes: A simple example

Σst:
  σ1: S1(x) → T1(x)
  σ2: S2(x) → T2(x) ∧ T6(x)
Σt:
  σ3: T2(x) → T3(x)
  σ4: T3(x) → T4(x)
  σ5: T4(x) ∧ T1(x) → T5(x)
  σ6: T4(x) ∧ T6(x) → T7(x)
  σ7: T5(x) → T3(x)

Source instance, I: S1(a), S2(a)
A solution, J: T1(a), …, T7(a)

T7(a)

T4(a) T6(a)

6

T3(a)

T5(a)

4

7

S2(a)

8

T4(a) T1(a)

5

T2(a)

S2(a)

3

2

S1(a)

1

Route for T7(a): 2, 3, 4, 8, 6
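The backward exploration on this example can be sketched as follows. It is a simplified illustration, not the SPIDER implementation: all atoms carry the same constant a, so facts are reduced to relation names, the assignment h is always x ↦ a, and the names `deps` and `route_forest` are hypothetical.

```python
# Dependencies of the simple example: name -> (LHS relations, RHS relations).
deps = {
    "sigma1": (["S1"], ["T1"]),
    "sigma2": (["S2"], ["T2", "T6"]),
    "sigma3": (["T2"], ["T3"]),
    "sigma4": (["T3"], ["T4"]),
    "sigma5": (["T4", "T1"], ["T5"]),
    "sigma6": (["T4", "T6"], ["T7"]),
    "sigma7": (["T5"], ["T3"]),
}
source = {"S1", "S2"}                      # I = {S1(a), S2(a)}
target = {f"T{i}" for i in range(1, 8)}    # J = {T1(a), ..., T7(a)}


def route_forest(selected):
    """Map each reachable target fact to its branches (dependency, LHS facts).

    Every fact is expanded at most once, which keeps the forest polynomial
    even when the number of distinct routes is exponential.
    """
    forest = {}
    pending = list(selected)
    while pending:
        fact = pending.pop()
        if fact in forest or fact in source:
            continue  # already explored, or a source fact (a leaf)
        forest[fact] = [
            (name, lhs)
            for name, (lhs, rhs) in deps.items()
            if fact in rhs and all(f in source or f in target for f in lhs)
        ]
        for _name, lhs in forest[fact]:
            pending.extend(lhs)
    return forest


forest = route_forest(["T7"])
for fact in sorted(forest):
    print(fact, forest[fact])
```

In the resulting forest T3 has two branches, one through sigma3 (from T2) and one through sigma7 (from T5), which is where alternative routes for T7(a) come from.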

Page 24: Debugging Schema Mappings  with Routes

24

Properties of compute all routes

Completeness:
  Let F denote the route forest returned by our algorithm on Js. If R is a minimal route for Js, then it is represented in F.

Running time: polynomial in the sizes of I, J and Js
  Every “branch of a tuple”, once explored, is never explored again
  Polynomial number of branches for each tuple since M is fixed

Challenge:
  Exponentially many routes, but polynomial-size representation constructed in polynomial time

Page 25: Debugging Schema Mappings  with Routes

25

Compute one route

Our experimental results indicate that compute all routes can be expensive
  Generate one route fast and alternative routes as needed?

Our solution: adapt compute all routes to compute only one route
  Non-exhaustive: stops when one witness is found. A witness that uses source tuples is preferred
  Inference procedure: to deduce all consequences of a proven tuple and avoid recomputation of “branches”
    Key step for polynomial time analysis

Completeness: If there is a route for Js, then our algorithm will produce a route for Js
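To make the idea of producing a single route concrete, here is a deliberately simpler stand-in technique: forward inference over the earlier simple example followed by pruning of unneeded steps. It is not the adapted compute-all-routes algorithm described above; it only shares the property that it returns some route whenever one exists. Facts are again reduced to relation names, and `deps`, `source` and `one_route` are hypothetical names.

```python
deps = {
    "sigma1": (["S1"], ["T1"]),
    "sigma2": (["S2"], ["T2", "T6"]),
    "sigma3": (["T2"], ["T3"]),
    "sigma4": (["T3"], ["T4"]),
    "sigma5": (["T4", "T1"], ["T5"]),
    "sigma6": (["T4", "T6"], ["T7"]),
    "sigma7": (["T5"], ["T3"]),
}
source = {"S1", "S2"}


def one_route(selected):
    """Return one route (list of dependency names) for the selected facts, or None."""
    proven = set(source)
    steps = []
    changed = True
    # Forward phase: fire any dependency whose LHS is proven and whose RHS adds facts.
    while changed and not set(selected) <= proven:
        changed = False
        for name, (lhs, rhs) in deps.items():
            if set(lhs) <= proven and not set(rhs) <= proven:
                proven |= set(rhs)
                steps.append(name)
                changed = True
    if not set(selected) <= proven:
        return None
    # Pruning phase: walk backwards and keep only steps that produce a needed fact.
    needed = set(selected)
    route = []
    for name in reversed(steps):
        lhs, rhs = deps[name]
        if needed & set(rhs):
            route.append(name)
            needed -= set(rhs)
            needed |= set(lhs) - source
    route.reverse()
    return route


print(one_route(["T7"]))
```

The forward phase derives sigma1 and sigma5 as well, but pruning removes them because neither T1 nor T5 is needed to witness T7.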

Page 26: Debugging Schema Mappings  with Routes

26

Related work

Commercial data exchange systems
  e.g., Altova MapForce, Stylus Studio
  Use “lower-level” languages (e.g., XSLT, XQuery) to specify the exchange
  Debugging is done at this low level
  Source tuple centric

Data viewer [YMHF01]
  Constructs an “example” source instance that illustrates the behavior of the schema mapping
  Complementary to our approach
  Works only for relational schema mappings

Page 27: Debugging Schema Mappings  with Routes

27

Related work

Computing routes for target data is related to computing provenance (aka lineage) of data

         SQL                                   Schema mappings
Eager    DBNotes [BCTV04], Mondrian [GKM06]    MXQL system [VMM05]
Lazy     [CWW00], [CW00a, CW00b]               Our routes approach

Page 28: Debugging Schema Mappings  with Routes

28

Empirical Evaluation

Implementation: on top of the Clio data exchange system from IBM Almaden Research Center
  Scalable: push computation to the database
  Handles relational and XML schema mappings [PVMHF02]

Testbed:
  Created relational and XML schema mappings based on the TPC-H schema
  Created schema mappings based on Mondial, DBLP and Amalgam schemas

Methodology - measured the influence of:
  The sizes of I, J and Js
  The complexity of Σst ∪ Σt, i.e., the number of tgds and the number of atoms in each tgd

Setup: P4 2.8GHz, 2GB RAM, 256MB DB2 buffer pool

Our regret: No benchmark on which to base our comparisons

Page 29: Debugging Schema Mappings  with Routes

29

ComputeOneRoute with Rel. schema mapping: Influence of the Sizes of I and J

TGDs with 1 join in the LHS and RHS
Routes with 3 satisfaction steps for each selected tuple

[Plot: compute one route time (sec, axis 0 to 8) vs. # selected target tuples (1 to 20); series: I:10MB, J:60MB; I:50MB, J:300MB; I:100MB, J:600MB]

Page 30: Debugging Schema Mappings  with Routes

30

ComputeOneRoute with Rel. schema mapping: Influence of the Complexity of Σst ∪ Σt

TGDs with 0 to 3 joins in the LHS and RHS
Routes with 3 satisfaction steps for each selected tuple
Size of I = 100MB, Size of J = 600MB

[Plot: compute one route time (sec, axis 0 to 20) vs. # selected target tuples (1 to 20); series: no joins, 1 join, 2 joins, 3 joins]

Page 31: Debugging Schema Mappings  with Routes

31

ComputeOneRoute vs. ComputeAllRoutes

TGDs with 1 join in the LHS and RHS
Routes with 3 satisfaction steps
Size of I = 100MB, Size of J = 600MB

[Plot: running time (sec, log scale from 0.001 to 1000) vs. # selected target tuples (1 to 20); series: computeOneRoute, computeAllRoutes]

Page 32: Debugging Schema Mappings  with Routes

32

Experimental results with Mondial, DBLP and Amalgam

     Schemas          Total Elem.   Atomic Elems.   Nest. Depth   Inst. Size   |Σst|/|Σt|
S    DBLP1 (XML)      65            57              1             640KB        10/14
     DBLP2 (XML)      20            12              4             850KB
T    Amalgam (rel)    117           100             1             1.1MB
S    Mondial1 (rel)   157           129             1             1 MB         13/25
T    Mondial2 (XML)   144           112             4             1.2MB

Page 33: Debugging Schema Mappings  with Routes

33

Experimental results with Mondial, DBLP and Amalgam

     Schemas          Total Elem.   Atomic Elems.   Nest. Depth   Inst. Size   |Σst|/|Σt|
S    DBLP1 (XML)      65            57              1             640KB        10/14
     DBLP2 (XML)      20            12              4             850KB
T    Amalgam (rel)    117           100             1             1.1MB
S    Mondial1 (rel)   157           129             1             1 MB         13/25
T    Mondial2 (XML)   144           112             4             1.2MB

Two DBLP schemas and datasets, both XML: DBLP1, DBLP2
First relational schema from Amalgam test suite

Page 34: Debugging Schema Mappings  with Routes

34

Experimental results with Mondial, DBLP and Amalgam

     Schemas          Total Elem.   Atomic Elems.   Nest. Depth   Inst. Size   |Σst|/|Σt|
S    DBLP1 (XML)      65            57              1             640KB        10/14
     DBLP2 (XML)      20            12              4             850KB
T    Amalgam (rel)    117           100             1             1.1MB
S    Mondial1 (rel)   157           129             1             1 MB         13/25
T    Mondial2 (XML)   144           112             4             1.2MB

Two DBLP schemas and datasets, both XML: DBLP1, DBLP2
First relational schema from Amalgam test suite
Two Mondial schemas and datasets: one relational (Mondial1), the other XML (Mondial2)
Designed Σst and used the foreign key constraints as Σt

Page 35: Debugging Schema Mappings  with Routes

35

Experimental results with Mondial, DBLP and Amalgam

     Schemas          Total Elem.   Atomic Elems.   Nest. Depth   Inst. Size   |Σst|/|Σt|
S    DBLP1 (XML)      65            57              1             640KB        10/14
     DBLP2 (XML)      20            12              4             850KB
T    Amalgam (rel)    117           100             1             1.1MB
S    Mondial1 (rel)   157           129             1             1 MB         13/25
T    Mondial2 (XML)   144           112             4             1.2MB

Compute one route: under 3 seconds for 1-10 randomly selected tuples
Compute all routes: can take much longer
  18 seconds to construct the route forest for 10 selected tuples in the target instance of Mondial
  Compute one route took under 1 second

Page 36: Debugging Schema Mappings  with Routes

36

Conclusions

Debugging schema mappings with routes
  Complete, polynomial time algorithms for computing routes
  Extension for routes for selected source data
  Routes have declarative semantics, based on the logical satisfaction of tgds

What we don’t do: illustrate data merging

Future work:
  Illustrate grouping semantics for nested schema mappings
  Adapt target instance to changes in the schema mapping and data sources

Page 37: Debugging Schema Mappings  with Routes

37

SPIDER: A Schema Mappings Debugger

Compute one/all routes
Alternative routes
Guided computation of routes
Standard debugging features
  Breakpoints
  “Watch” windows
Schema-level routes

Today 14:00-15:30
Thursday 11:00-12:30
Demo group B

Page 38: Debugging Schema Mappings  with Routes

38

Thank you!

Page 39: Debugging Schema Mappings  with Routes

39

How do we do it?

[Figure: the schema mappings debugger takes source schema S, source instance I, target schema T, target instance J and the schema mapping M, and produces routes]

Witness selected target data with source data and M

Page 40: Debugging Schema Mappings  with Routes

40

How do we do it?

[Figure: the schema mappings debugger takes source schema S, source instance I, target schema T, target instance J and the schema mapping M, and produces routes]

Illustrate consequences of selected source data with M

Page 41: Debugging Schema Mappings  with Routes

41

Key Concept: ROUTES - describe the relationships between source and target data with the schema mapping

[Figure: the schema mappings debugger takes source schema S, source instance I, target schema T, target instance J and the schema mapping M, and produces routes]

Page 42: Debugging Schema Mappings  with Routes

42

Clio

A semi-automatic schema mapping system
  Supports user-guided mapping from source to target with constraints
  Schema mapping language: a nested extension of tgds and egds
  Automatically generates XQuery/SQL/XSLT scripts for the actual data transfer based on the schema mapping
  Generates universal solutions under relational-to-relational schema mappings

Implemented our techniques on top of Clio, but…
  Routes have declarative semantics
  Independent of Clio’s transformation engine

[Figure: a mapping between the source and target schemas is compiled into XQuery/SQL/XSLT that transforms source data into target data]

Page 43: Debugging Schema Mappings  with Routes

43

Related work

Computing routes for target data is related to computing provenance (aka lineage) of data

         SQL                                   Schema mappings
Eager    DBNotes [BCTV04], Mondrian [GKM06]    MXQL system [VMM05]
Lazy     [CWW00], [CW00a, CW00b]               Our routes approach

[Figure: eager approach, where Q is re-engineered into Q’, which also outputs provenance information]

Page 44: Debugging Schema Mappings  with Routes

44

Related work

Computing routes for target data is related to computing provenance (aka lineage) of data

         SQL                                   Schema mappings
Eager    DBNotes [BCTV04], Mondrian [GKM06]    MXQL system [VMM05]
Lazy     [CWW00], [CW00a, CW00b]               Our routes approach

[Figure: lazy approach, with no reengineering of the query Q]

Page 45: Debugging Schema Mappings  with Routes

45

Related work

Approaches to computing provenance:
  Eager: changes the transformation to carry provenance information
    Requires re-engineering of Q to Q’. No subsequent source access or access to the definition of Q or Q’.
  Lazy: does not change the transformation
    No re-engineering of Q. Subsequent source access and access to the definition of Q may be needed.

[Figure: eager approach, where Q is re-engineered into Q’, which also outputs provenance information]

Page 46: Debugging Schema Mappings  with Routes

46

Related work

Computing routes for target data is related to computing provenance (aka lineage) of data

         SQL                                   Schema mappings
Eager    DBNotes [BCTV04], Mondrian [GKM06]    MXQL system [VMM05]
Lazy     [CWW00], [CW00a, CW00b]               Our routes approach

Page 47: Debugging Schema Mappings  with Routes

47

Programming Languages vs. Schema Mappings

Debugging programming languages vs. debugging schema mappings

Procedural PL
  We may have a specification (e.g. compute x² on input x) which completely determines the output
    Well-defined notion of correct answer
    The program is an implementation of the specification
    If the correct answer is not obtained, there’s a bug – need to debug the implementation
  However, the specification may also not be that concrete
    E.g., build a visual interface for …

Functional PL
  Debugging is performed by analyzing a trace of the execution
  Declarative approach for debugging [Nilsson94]

Schema mapping IS the specification
  Infinite number of solutions consistent with the schema mapping
  Best we can do: look at the target instance – if something looks wrong (e.g., the clients’ names are not copied to the target) go back to the schema mapping and try to refine it (or debug it)

Page 48: Debugging Schema Mappings  with Routes

48

Related Work: Computing Provenance of Data over SQL queries

Compute the provenance of relational data in a view in data warehouses [CWW2000]
  The provenance of a tuple t in a view is described as the tuples in the base tables that witness the existence of t

         SQL                              Schema mappings
Eager    DBNotes [BCTV2004] [CTV2005]     MXQL system [VMM2005]
Lazy     [CWW2000]                        Our approach

DB:
  R: (1, 2), (4, 5)
  S: (2, 3), (6, 7)

View definition: T(a,c) :- R(a,b) ∧ S(b,c)
  T: (1, 3)

Provenance answered using two reverse queries:
  R(a,b) :- R(a,b) ∧ S(b,c) ∧ a=1 ∧ c=3
  S(b,c) :- R(a,b) ∧ S(b,c) ∧ a=1 ∧ c=3
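The two reverse queries can be evaluated directly on the small database shown on this slide. The sketch below uses plain Python set comprehensions in place of SQL; the function name `lineage` is a hypothetical helper, not an API of the cited system.

```python
# Base tables from the slide.
R = {(1, 2), (4, 5)}
S = {(2, 3), (6, 7)}

# View definition: T(a,c) :- R(a,b), S(b,c)
T = {(a, c) for (a, b) in R for (b2, c) in S if b == b2}


def lineage(a0, c0):
    """Evaluate the two reverse queries for the view tuple T(a0, c0)."""
    joined = [
        ((a, b), (b2, c))
        for (a, b) in R
        for (b2, c) in S
        if b == b2 and a == a0 and c == c0
    ]
    r_part = {r for r, _ in joined}  # R(a,b) :- R(a,b), S(b,c), a=a0, c=c0
    s_part = {s for _, s in joined}  # S(b,c) :- R(a,b), S(b,c), a=a0, c=c0
    return r_part, s_part


print(T)
print(lineage(1, 3))
```

The provenance of T(1,3) consists of R(1,2) and S(2,3); the tuples R(4,5) and S(6,7) do not join and contribute nothing.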

Page 49: Debugging Schema Mappings  with Routes

49

Related Work: Computing Provenance of Data over SQL queries

DBNotes: an annotation management system for relational databases
  Each data value has zero or more annotations
  pSQL: a query language for propagating annotations
    3 propagation schemes: DEFAULT, DEFAULT-ALL, CUSTOM
    By default, annotations propagate according to provenance

Eager approach: annotations propagate along with data as data is transformed through queries
  Provenance information readily available in the output
  Automatically trace the provenance and flow of data over multiple transformation steps
  Systematically maintains provenance annotations that describe the exact location of data values

         SQL                              Schema mappings
Eager    DBNotes [BCTV2004] [CTV2005]     MXQL system [VMM2005]
Lazy     [CWW2000]                        Our approach

DB1:
  R: (1, 2), (4, 5)
  S: (2, 3), (6, 7)
Transformation: T(a,c) :- R(a,b) ∧ S(b,c)
DB2:
  T: (1, 3)

Page 50: Debugging Schema Mappings  with Routes

50

Related Work: Computing the Provenance of Data over Schema Mappings

MXQL system over relational/XML schema mappings
  Eager approach
    Additional info about source schema elements and mappings that contribute to the creation of target data is propagated and stored
    Our approach is lazy: no reengineering
  Non-automatic approach for answering provenance
    The additional info needs to be queried using MXQL
    We automatically compute routes for selected data
  Data involved in the transformation not considered
    Our routes contain information about schema elements, dependencies and data involved

         SQL                              Schema mappings
Eager    DBNotes [BCTV2004] [CTV2005]     MXQL system [VMM2005]
Lazy     [CWW2000]                        Our approach

Page 51: Debugging Schema Mappings  with Routes

51

Related Work: the Data Viewer

Schema mapping M = (S, T, Σst ∪ Σt)
  S: Dept(dID,dName) and Emp(dID,name)
  T: DeptEmp(dID,dName,employee)
  Σst: Dept(id,n) → ∃E DeptEmp(id,n,E)
       Dept(id,n) ∧ Emp(id,e) → DeptEmp(id,n,e)
  Σt = ∅

Example source instance created to illustrate M
  Dept
    (D1, Computer Science): department that has at least one employee (will join with Emp)
    (D2, Anthropology): department with no employee (will not join with Emp)
  Emp
    (D1, Alice): employee of a department
    (D3, Bob): employee with no department (will not appear in the target)

Page 52: Debugging Schema Mappings  with Routes

52

Universal Solutions [FKMP02]

Definition: Given two instances K1 and K2, a homomorphism h: K1 → K2 is a function h: Const ∪ Var → Const ∪ Var such that:
  h(c) = c for all constants c
  For every fact R(a1, …, an) ∈ K1, the fact R(h(a1), …, h(an)) ∈ K2

Example: J1 = {V(1,N1), V(N2,2)}, J2 = {V(1,2)}
  h: J1 → J2 is h = {1 ↦ 1, N1 ↦ 2, N2 ↦ 1, 2 ↦ 2}

Definition: Let M = (S, T, Σst ∪ Σt) be a schema mapping. If I is a source instance, then a universal solution for I is a solution J for I such that for every solution J’ for I, there exists a homomorphism h: J → J’

Example: Σst: R(x) → ∃N V(x,N)
              U(x) → ∃N V(N,x)
  Source instance I = {R(1), U(2)}
  J2 = {V(1,2)} is not a universal solution for I
  J1 = {V(1,N1), V(N2,2)} is a universal solution for I
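The homomorphism example can be verified by brute force. This sketch assumes a hypothetical convention that labeled nulls are strings starting with "N" and constants are integers; `find_homomorphism` is an illustrative helper that tries every assignment of nulls to values of the second instance.

```python
from itertools import product


def is_null(value):
    """Assumed convention: labeled nulls are strings such as 'N1'."""
    return isinstance(value, str) and value.startswith("N")


def find_homomorphism(K1, K2):
    """Return a mapping of the nulls of K1 that sends every fact of K1 into K2, or None.

    Constants are left unchanged, as required by h(c) = c.
    """
    nulls = sorted({v for _rel, args in K1 for v in args if is_null(v)})
    values = sorted({v for _rel, args in K2 for v in args}, key=str)
    for image in product(values, repeat=len(nulls)):
        h = dict(zip(nulls, image))
        if all((rel, tuple(h.get(v, v) for v in args)) in K2 for rel, args in K1):
            return h
    return None


J1 = {("V", (1, "N1")), ("V", ("N2", 2))}
J2 = {("V", (1, 2))}

print(find_homomorphism(J1, J2))
print(find_homomorphism(J2, J1))
```

A homomorphism exists from J1 to J2 but not from J2 to J1, which is consistent with J1 being universal and J2 not.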

Page 53: Debugging Schema Mappings  with Routes

53

Homomorphism

Definition: Let φ(x) be a conjunction of atoms and K be an instance. A homomorphism h: φ(x) → K is such that h(φ(x)) = { R(h(z)) ∈ K | R(z) is a rel. atom in φ(x) }

Example: Two homomorphisms from Accounts(u,v,w) ∧ Clients(w,x) to the target instance J

Target instance J
  Accounts: (123, L1, ID1), (A2, L2, ID2)
  Clients: (ID1, Alice), (ID2, Bob)
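The two homomorphisms in this example can be enumerated with a small backtracking matcher. The encoding (atoms as (relation, variables) pairs, the instance as a list of facts) and the name `homomorphisms` are assumptions made for illustration.

```python
def homomorphisms(atoms, instance, h=None):
    """Yield every assignment of variables that maps all atoms onto facts of instance."""
    h = h or {}
    if not atoms:
        yield dict(h)
        return
    (rel, variables), rest = atoms[0], atoms[1:]
    for fact_rel, args in instance:
        if fact_rel != rel or len(args) != len(variables):
            continue
        extended = dict(h)
        # setdefault binds a fresh variable, or returns the existing binding to compare.
        if all(extended.setdefault(v, a) == a for v, a in zip(variables, args)):
            yield from homomorphisms(rest, instance, extended)


J = [
    ("Accounts", ("123", "L1", "ID1")),
    ("Accounts", ("A2", "L2", "ID2")),
    ("Clients", ("ID1", "Alice")),
    ("Clients", ("ID2", "Bob")),
]
conjunction = [("Accounts", ("u", "v", "w")), ("Clients", ("w", "x"))]

results = list(homomorphisms(conjunction, J))
for h in results:
    print(h)
```

The shared variable w forces the account holder to match the client's ssn, so only the Alice and Bob pairings survive.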

Page 54: Debugging Schema Mappings  with Routes

54

A Satisfaction Step

Definition: Let σ be a tgd ∀x φ(x) → ∃y ψ(x,y). Let K and K1 be instances such that:
  K1 ⊆ K
  K ⊨ σ
Let h: φ(x) ∧ ψ(x,y) → K be a homomorphism such that h is also a homomorphism from φ(x) to K1.
Let K2 = K1 ∪ h(ψ(x,y)).
Then the result of satisfying σ on K1 with homomorphism h and solution K is K2.

  K1 --σ,h--> K2

Page 55: Debugging Schema Mappings  with Routes

55

Satisfaction Step: Remark 1

Satisfaction step ≠ chase step [FKMP02]
  Definition based on logical satisfaction of tgds, not tied to implementation of the exchange

Example:
  Σst: EmpPhone(x,y) → ∃z Emp(x,y,z)   (1)
       EmpFax(x,z) → ∃y Emp(x,y,z)     (2)
  Σt:  Emp(x,y,z) ∧ Emp(x,u,v) → y=u ∧ z=v
  I = { s1: EmpPhone(Mary, p123), s2: EmpFax(Mary, f567) }
  J = { t: Emp(Mary, p123, f567) }

Two routes for t: s1 → t and s2 → t
  Both routes make an assumption about the values taken by the existentials (z and y are assumed to be f567 and p123, respectively)
  The egd is not used in the routes

We don’t have satisfaction steps with egds
  If K satisfies an egd σ, then K1 also satisfies σ, since K1 ⊆ K

Page 56: Debugging Schema Mappings  with Routes

56

Satisfaction Step: Remark 2

Satisfaction step ≠ solution-aware chase step [FKT05]

Example:
  Σst: S(x) → ∃N T(x,N)
  I = {S(1)}
  J = {t1: T(1,N1), t2: T(1,N2)} is a solution for I

A route for J: ⟨I, ∅⟩ --h1--> ⟨I, {t1}⟩ --h2--> ⟨I, {t1,t2}⟩
  h1 = {x ↦ 1, N ↦ N1} and h2 = {x ↦ 1, N ↦ N2}

No solution-aware chase sequence produces both t1 and t2

Page 57: Debugging Schema Mappings  with Routes

57

Computing all routes for target tuples

The schema mapping M is fixed
Input:
  source instance I
  target instance J
  a set of target tuples Js ⊆ J
Output: a route forest for Js that concisely represents all routes for Js

Algorithm idea: reverse chase
  For each tuple R(a) in Js, consider every possible σ and h for witnessing R(a)
  Do the same for all target tuples encountered during the construction
  Do not consider the same tuple twice

Page 58: Debugging Schema Mappings  with Routes

58

Computing all routes: Properties

Running time: polynomial in the sizes of M, I and J
  At most |I|+|J| tuples in the forest
  Polynomial number of branches for each tuple
  A branch is not explored twice
  Reverse chase is efficient: push the computation to the database

Completeness: the route forest embeds every minimal route for Js
  A minimal route for Js is a route for Js with no redundant satisfaction steps

Page 59: Debugging Schema Mappings  with Routes

59

Computing one route for Js

Running time: polynomial in the sizes of M, I and J

Completeness: if there is a route for Js, then the algorithm will find a route for Js

Much faster compared to computing all routes
  No need to explore the entire route forest
  Possible to construct additional routes as needed

Page 60: Debugging Schema Mappings  with Routes

60

Some implementation details

Scalable approach: steps in routes are discovered by pushing computation to the database engine

Example:
  Source-to-target tgd: S(x,y) → ∃U ∃V (T1(x,U) ∧ T2(U,V,y))
  T1(a,b) matched against RHS
  LHS query: S(a,y) is executed against the source instance using the db
  RHS query: T1(a,b) ∧ T2(b,V,c) is executed against the target instance using the db
    Each binding for y generates one RHS query
    Design choice to decouple LHS and RHS queries

Extended for XML schema mappings
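The decoupled LHS and RHS queries can be illustrated with an in-memory SQLite database. This is a sketch only: the talk's system runs on DB2 via Clio, and the table layouts, sample rows and the `witnesses` function below are all assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE S  (x TEXT, y TEXT);          -- source relation
    CREATE TABLE T1 (x TEXT, u TEXT);          -- target relations
    CREATE TABLE T2 (u TEXT, v TEXT, y TEXT);
    INSERT INTO S  VALUES ('a', 'c'), ('a', 'd'), ('z', 'c');
    INSERT INTO T1 VALUES ('a', 'b');
    INSERT INTO T2 VALUES ('b', 'v1', 'c');
    """
)


def witnesses(x, u):
    """Find assignments witnessing T1(x, u) with S(x,y) -> exists U,V (T1(x,U) and T2(U,V,y))."""
    found = []
    # LHS query: S(x, y) against the source instance, with x fixed by the matched fact.
    lhs_rows = conn.execute(
        "SELECT y FROM S WHERE x = ? ORDER BY y", (x,)
    ).fetchall()
    for (y,) in lhs_rows:
        # RHS query: one query per binding of y, against the target instance.
        rhs_rows = conn.execute(
            "SELECT T2.v FROM T1 JOIN T2 ON T1.u = T2.u "
            "WHERE T1.x = ? AND T1.u = ? AND T2.y = ?",
            (x, u, y),
        ).fetchall()
        for (v,) in rhs_rows:
            found.append({"x": x, "y": y, "U": u, "V": v})
    return found


print(witnesses("a", "b"))
```

Here the LHS query returns two bindings for y (c and d), but only y = c has a matching T2 tuple, so a single assignment witnesses T1(a,b).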

Page 61: Debugging Schema Mappings  with Routes

61

Comparison with Approaches for Evaluating Datalog

Top-down techniques: OLDT, QSQ, Rule/goal graphs
  Similarities: use memoization to avoid redundant computation and infinite loops
  Major difference: the target instance J is available and we leverage it to:
    Obtain completely instantiated facts during reverse chase
    Hence, avoid redundant computation earlier

Magic set rewriting technique:
  Possible to obtain all tuples that contribute to the creation of Js
  However, need to recover the routes from the evaluation of the magic rules

Page 62: Debugging Schema Mappings  with Routes

62

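Top-down Example

Σst:
  σ1: S1(x,y) → T1(x,y)
  σ2: S2(x,y,u) → T2(x,y,u)
Σt:
  σ3: T1(x,y) ∧ T2(y,z,u) → T3(x,z)

I: S1(1,2)
J: T1(1,2), T3(1,3)

[Figure: top-down evaluation tree. The goal T3(1,3) expands to T1(1,y) ∧ T2(y,3,u); T1(1,y) reduces to S1(1,y), giving y ↦ 2; T2(2,3,u) then reduces to S2(2,3,u)]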