39
Answering Tree Pattern Queries Using Views Laks V.S. Lakshmanan, Hui (Wendy) Wang, and Zheng (Jessica) Zhao University of British Columbia Vancouver, BC Amazon.com

Answering Tree Pattern Queries Using Views

  • Upload
    jaimin

  • View
    44

  • Download
    1

Embed Size (px)

DESCRIPTION

Answering Tree Pattern Queries Using Views. Laks V.S. Lakshmanan , Hui (Wendy) Wang , and Zheng (Jessica) Zhao University of British Columbia Vancouver, BC Amazon.com. Outline. Motivation Problems Studied Without schema With schema Recursive schemas Related Work - PowerPoint PPT Presentation

Citation preview

Page 1: Answering Tree Pattern Queries Using Views

Answering Tree Pattern Queries Using Views

Laks V.S. Lakshmanan, Hui (Wendy) Wang, and Zheng (Jessica) Zhao

University of British ColumbiaVancouver, BC

Amazon.com

Page 2: Answering Tree Pattern Queries Using Views

Outline

Motivation Problems Studied Without schema With schema Recursive schemas Related Work Summary & Future Work

Page 3: Answering Tree Pattern Queries Using Views

Motivation 1/3

Integration of existing data sources. Local as view (LAV) – one of the well-

known approaches. Each source = a materialized view over

some global database. Answer to query over global DB =

answer to query using (materialized) views.

Page 4: Answering Tree Pattern Queries Using Views

Motivation 2/3 <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>

Source = View “//Trials//Trial” over some DB containing clinical data – trials, their status, patient data, etc.

Consider query Q: //Trials[//Status]//Trial over [unknown] original DB.

How can and should we answer it using above source?

Page 5: Answering Tree Pattern Queries Using Views

Motivation 3/3<PharmaLab> (1) <Trials @type=“T1”> (2) <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> </Trials> <Trials @type=“T2”> (13) <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial> </Trials> </PharmaLab>

<Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>

//Trials//Trial//Trials//Trial?? ??

On

e p

ossib

le o

rig

inal D

B

Page 6: Answering Tree Pattern Queries Using Views

Motivation 3/3<PharmaLab> (1) <Trials @type=“T1”> (2) <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> …

</Trial> </Trials> <Trials @type=“T2”> (13) <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial> </Trials> </PharmaLab>

<Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>

//Trials//Trial//Trials//Trial?? ??

On

e p

ossib

le o

rig

inal D

BQ: //Trials[//Status]//Trial

Page 7: Answering Tree Pattern Queries Using Views

Motivation 3/3<PharmaLab> (1) <Trials @type=“T1”> (2) <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> </Trials> <Trials @type=“T2”> (13) <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial> </Trials> </PharmaLab>

<Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>

//Trials//Trial//Trials//Trial

On

e p

ossib

le o

rig

inal D

B

◦ “●◦ “●[//Status]”[//Status]” { (3) }

Contained rewriting

Page 8: Answering Tree Pattern Queries Using Views

Problems Studied 1/3 Equivalent Rewriting: Given Q and

views V, find an equivalent rewriting of Q using V, i.e., an expression E s.t. V◦E Ξ Q, over all possible input DBs. Appropriate for query optimization.

Contained Rewriting: Given Q and V, find an expression E s.t. V◦E Q overall all possible input DBs, and V◦E is maximal among all such rewritings. Most appropriate for information integration

[Halevy, Lenzerini, Pottinger & Halevy].

Page 9: Answering Tree Pattern Queries Using Views

Problems Studied 2/3No Schema: Given Q and V, find a

maximally contained rewriting (MCR) of Q using V.

With Schema: Given Q and V, and a schema prescribing possible input DBs, find a maximally contained rewriting of Q using V.

Focus: Tree Pattern Queries (XP/,//, [ ]). Schema without cycles, union, and

recursion.

Page 10: Answering Tree Pattern Queries Using Views

Problems Studied 3/3

Given Q & V: R Ξ V ◦ E Q.

Compensation queryRewriting query

Want MCR in the absence and in the presence of a schema.

//a[//b]/c //a

b c

Page 11: Answering Tree Pattern Queries Using Views

Without Schema 1/6

Question 1: Does an MCR always exist?

/a/a

bb

cc

VV

/b/b

d

Q1Q1

/a/a

b

dd

Q2Q2

No MCR for Q1 and for Q2.

What went wrong?

distinguished (answer) node

Page 12: Answering Tree Pattern Queries Using Views

Without Schema 2/6

Trial

//Trials//Trials //Trials//Trials

StatusStatus Patient

(1)(1)

(2)(2) (3)(3)Trial

//Trials//Trials

StatusStatus Patient

(1)(1)

(2)(2) (3)(3)

Unfulfilled obligationsClip Away Tree (CAT)

f

f – useful embedding

VVQQ

V

E

Page 13: Answering Tree Pattern Queries Using Views

Without Schema 3/6Theorem: Q, V – tree pattern queries.

Then Q is answerable using V iff there is a useful embedding from Q to V.

aa

dd

baa

//a//a

c c

bba a aa

b

cc

2

1

77

33

44

55V ee

bb

ccQ

1,21,2

1:{2}, 2:{}1:{2}, 2:{}

2:{6}2:{6}

6:{7}6:{7}

1:{2,3}, 2:{3}1:{2,3}, 2:{3}

2:{6}, 3:{4} 2:{6}, 3:{4}

4:{5}, 6:{7} 4:{5}, 6:{7}

Testing Existence of MCR:

//a//a

6

Page 14: Answering Tree Pattern Queries Using Views

Without Schema 4/6

Two embeddings – corresponding irredundant CRs.

aa

baa

cb

cc

//a//a

d e

aa

baa

cb

cc

//a//a

a eb

c d

need for expressing

MCR!

Page 15: Answering Tree Pattern Queries Using Views

Without Schema 5/6Can test existence of MCR in poly time. However, MCRs can be exponentially

large (closure issue).

eedd

aa aa

c

bb

//a//a

aa

//a//a

c

bb

c

bbVVQQ

How many irredundant CRs

are possible?

Page 16: Answering Tree Pattern Queries Using Views

Without Schema 5/6

eedd

aa aa

c

bb

//a//a

aa

//a//a

c

bb

c

bbVV

QQc

bb

aa

//a//a

d e

Page 17: Answering Tree Pattern Queries Using Views

Without Schema 5/6

eedd

aa aa

c

bb

//a//a

aa

//a//a

c

bb

c

bbVV

QQc

bb

aa

//a//a

d

a/b/c/e

Page 18: Answering Tree Pattern Queries Using Views

Without Schema 5/6

eedd

aa aa

c

bb

//a//a

aa

//a//a

c

bb

c

bbVV

QQc

bb

aa

//a//a

ea/b

e

c

Page 19: Answering Tree Pattern Queries Using Views

Without Schema 5/6

eedd

aa aa

c

bb

//a//a

aa

//a//a

c

bb

c

bbVV

QQc

bb

aa

//a//a

a/b/c/ea/b

e

cMCR = union of exponential

# CRs in the worst case!

Page 20: Answering Tree Pattern Queries Using Views

Without Schema 6/6

Summary: Can test existence of MCR in poly

time. Exact characterization.

MCR may be union of exponentially many CRs in the worst case.

Algorithm for generating MCR.

Page 21: Answering Tree Pattern Queries Using Views

With Schema 1/6

Given Query Q, view V, schema S. Infer all constraints C implied by S. Chase V w.r.t. C. Look for MCR of Q w.r.t. chased view.

Page 22: Answering Tree Pattern Queries Using Views

With Schema 2/6AuctionsAuctions

AuctionAuction

open_auctionopen_auctionclosed_auctionclosed_auction

bidsbids

personperson itemitem

namename

**

** ??

++ ??

++ ++

E.g. constraints:•c_a has ≤ 1 bids

child•Every Auction

having a person desc also

has an item desc.

•every path from Auction to name

goes via bids.

Page 23: Answering Tree Pattern Queries Using Views

With Schema 3/6

//Auction//Auction

o_ao_a c_ac_a

bidsbids bidsbids

VV

//Auction//Auction

bidsbids bidsbids

personperson itemitem

namenameQQ

Page 24: Answering Tree Pattern Queries Using Views

With Schema 3/6AuctionsAuctions

AuctionAuction

open_auctionopen_auction closed_auctionclosed_auction

bidsbids

personperson itemitem

namename

**

** ??

++ ??

++ ++

o_ao_a c_ac_a

bidsbids bidsbids

//Auction//Auction

person item

name

p i

n

Page 25: Answering Tree Pattern Queries Using Views

With Schema 4/6

o_ao_a c_ac_a

bidsbids bidsbids

//Auction//Auction

person item

name

p i

n

//Auction//Auction

bidsbids bidsbids

personperson itemitem

namenameQQ

MCR = identity query.

Page 26: Answering Tree Pattern Queries Using Views

With Schema 5/6Another Example: AuctionsAuctions

AuctionAuction

closed_auctionclosed_auction

bidsbids

personperson

itemitem

namename

**

** ??

++

++

open_auctionopen_auction

buyer?

//Auction

nameitem

person

Q

//Auction

VHow to answer Q using V?

Page 27: Answering Tree Pattern Queries Using Views

With Schema 5/6 Another Example: AuctionsAuctions

AuctionAuction

closed_auctionclosed_auction

bidsbids

personperson

itemitem

name

**

** ??

++

++

open_auctionopen_auction

buyer?

//Auction

nameitem

person

Q

//Auction

namnamee

item

So what’s the compensation query?

Page 28: Answering Tree Pattern Queries Using Views

With Schema 5/6 Another Example: AuctionsAuctions

AuctionAuction

closed_auctionclosed_auction

bidsbids

personperson

itemitem

name

**

** ??

++

++

open_auctionopen_auction

buyer?

//Auction

nameitem

person

Q

//Auction

namename

item

MCR = V ◦ “●//name”

Page 29: Answering Tree Pattern Queries Using Views

With Schema 6/6

Challenges and Highlights: Naïve chase can explode.

Make chase context aware.

Exact characterization of schema w/o recursion and union in terms of constraints.

Efficient algo. for inferring the constraints. Efficient algo. for chase. And for finding MCR. MCR is unique, if it exists.

Page 30: Answering Tree Pattern Queries Using Views

Recursive Schemas 1/2a

b

c d**

?

//a

b

V

//a

b b

c dQ

What is the MCR?

Page 31: Answering Tree Pattern Queries Using Views

Recursive Schemas 2/2a

b

c d**

?

//a

b

V

//a

b b

c dQ//a

b

c d

Page 32: Answering Tree Pattern Queries Using Views

Recursive Schemas 2/2a

b

c d**

?

//a

b

V

//a

b b

c dQ//a

b

c

d

b

Page 33: Answering Tree Pattern Queries Using Views

Recursive Schemas 2/2a

b

c d**

?

//a

b

V

//a

b b

c dQ//a

b

c

db

Page 34: Answering Tree Pattern Queries Using Views

Recursive Schemas 2/2a

b

c d**

?

//a

b

V

//a

b b

c dQ//a

b

c

b

d

b

MCR = union of four CRs.

Behavior similar to no schema.

Page 35: Answering Tree Pattern Queries Using Views

Related Work 1/2

QAV for relational – huge body of work [Halevy 01].

Regular path queries and semi-structured DBs [Grahne&Thomo 03, Calvenese 00,Papakonstantinou&Vassalos 99].

Equivalent rewrites for fragments of XQuery and XPath [Deutsch&Tannen 03, Tang&Zhou 05, Xu&Ozsoyoglu 05].

Page 36: Answering Tree Pattern Queries Using Views

Related Work 2/2 Key differences b/w equivalent &

contained rewriting: Unique rewriting (even w/o schema). MCR may involve union of (possibly

exponentially many) CRs. Study of contained rewriting in

presence of schema. Lot of work on semantic caching

[Chen+ 02], heuristics for using materialized views for optimizing XPath [Balmin+ 04], mine views worth materializing, XPath containment, … .

Page 37: Answering Tree Pattern Queries Using Views

Summary & Future Work 1/2

QAV using (maximally) contained rewriting ( information integration).

Without schema: existence, characterization, closure, generation of MCR.

With Schema: extract essence using constraints, chase, similar problems as above.

Impact of recursion. Experiments.

Page 38: Answering Tree Pattern Queries Using Views

Summary & Future Work 2/2

Impact of wildcard, disjunction, order …

Impact of union, recursion, … Other integration models (e.g., GLAV) QAV for XQuery.

Page 39: Answering Tree Pattern Queries Using Views