Evaluation of Partial Path Queries on XML Data


Stefanos Souldatos (NTUA, GREECE)
Xiaoying Wu (NJIT, USA)
Dimitri Theodoratos (NJIT, USA)
Theodore Dalamagas (NTUA, GREECE)
Timos Sellis (NTUA, GREECE)

Partial path queries

Query processing

Query evaluation

Experiments

Conclusion


Difficulties on Querying XML Data

[Figure: the XML tree of theHotel.gr, with nodes labelled City, Island, Location, Athens, Center, Heraklio, Chania, Poros and Creta.]

Search problem
Name: Xiaoying Wu
Place: Athens Center, Heraklio
Purpose: Sightseeing
Problem: structural difference

[Figure: the theHotel.gr tree, annotated with Parthenon (438 BC) and Phaistos’ Disk (1700 BC).]

Search problem
Name: Theodore Dalamagas
Place: Islands
Purpose: Sea sports
Problem: structural inconsistency

[Figure: the theHotel.gr tree, annotated with Windsurf and Jet ski.]

Search problem
Name: Dimitri Theodoratos
Place: Heraklio
Purpose: HDMS Conference
Problem: unknown structure

[Figure: the theHotel.gr tree, annotated with HDMS 2008.]

Search problem
Name: Stefanos Souldatos
Place: Any island
Purpose: Escape from PhD!
Problem: multiple sources

[Figure: three sources, theHotel.gr, hotels.gr and holidays.gr, covering 1400 islands.]

Can we use existing query languages (XPath, XQuery) to express our queries?

Can we use existing techniques to evaluate our queries?

Path Queries in XPath

[Figure: three versions of a query over theHotel.gr, City and Island, ranging from no structure to full structure.]

no structure (keywords):
//theHotel.gr[descendant-or-self::*[ancestor-or-self::City][ancestor-or-self::Island]]

partial path queries:
//theHotel.gr//City[descendant-or-self::*[ancestor-or-self::Island]]

full structure (path patterns):
/theHotel.gr/City//Island
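To make the gap between a full path pattern and a partial path query concrete, here is a minimal Python sketch. It does not run XPath; it mimics the two queries on the root-to-leaf label paths of a tiny, made-up document. The document `DOC` and the helper names `label_paths`, `full_match` and `partial_match` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the same City/Island information nested in different orders.
DOC = """<theHotel.gr>
  <City><Island/></City>
  <Island><City/></Island>
  <Location><City/></Location>
</theHotel.gr>"""

def label_paths(elem, prefix=()):
    """Yield the tag sequence of every root-to-leaf path."""
    path = prefix + (elem.tag,)
    children = list(elem)
    if not children:
        yield path
    for child in children:
        yield from label_paths(child, path)

def full_match(path):
    """Mimic /theHotel.gr/City//Island on a single path."""
    return path[:2] == ("theHotel.gr", "City") and "Island" in path[2:]

def partial_match(path):
    """Mimic the partial path query: City below theHotel.gr, Island anywhere on the path."""
    hotels = [i for i, tag in enumerate(path) if tag == "theHotel.gr"]
    cities = [i for i, tag in enumerate(path) if tag == "City"]
    city_below_hotel = any(h < c for h in hotels for c in cities)
    return city_below_hotel and "Island" in path

paths = list(label_paths(ET.fromstring(DOC)))
full = [p for p in paths if full_match(p)]
partial = [p for p in paths if partial_match(p)]
print(full)     # paths accepted by the full path pattern
print(partial)  # paths accepted by the partial path query
```

In this toy document the full pattern accepts only the City-above-Island nesting, while the partial version also accepts the Island-above-City nesting, which is the structural difference the earlier search problems run into.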

Partial Path Queries

Notation used in the query diagrams:
root node r (optional)
query node labelled by “a”
child relationship
descendant relationship

[Figure: an example partial path query over nodes labelled r, a, b, c and d.]

Partial Path Queries

partial path query -> QUERY PROCESSING -> partial path query in canonical form -> QUERY EVALUATION

[Figure: the example partial path query and its canonical form.]


Query Processing

1. Full form
2. Satisfiability
3. Redundant nodes
4. Canonical form

1. Full form

[Figure: the example query, with the relationships derived by IR1 and IR4 added step by step.]

INFERENCE RULES
(IR1) |- r//ai
(IR2) x/y |- x//y
(IR3) x//y, y//z |- x//z
(IR4) x/ai, x//bj |- ai//bj
(IR5) ai/x, bj//x |- bj//ai
(IR6) x/y, y/w, x//z, z//w |- x/z
(IR7) x/y, x//z, w/z, w//y |- x/z
(IR8) x/y, y/w, x/z |- z/w
(IR9) x//y, y//w, x/z |- z//w
(IR10) x/y, w/y, w/z |- x/z
(IR11) x//y, w/y, w//z |- x//z
(IR12) x/y, y/w, z/w |- x/z
(IR13) x//y, y//w, z/w |- x//z

x, y, z, w: query nodes
ai, bj: nodes labelled by a, b
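The inference rules can be read as a fixpoint computation: keep adding derived relationships until nothing new appears. The sketch below is only an illustration under stated assumptions. It implements IR2, IR3 and IR4 and omits the other rules, and the representation (sets of `(x, y)` pairs plus a `label` dictionary) and the name `full_form` are hypothetical, not taken from the slides.

```python
def full_form(child, desc, label):
    """Close the relationships under IR2, IR3 and IR4 only (other rules omitted).

    child: set of (x, y) pairs meaning x/y
    desc:  set of (x, y) pairs meaning x//y
    label: dict mapping each query node to its label
    """
    child, desc = set(child), set(desc)
    changed = True
    while changed:
        changed = False
        new = set()
        # IR2: x/y |- x//y
        new |= child
        # IR3: x//y, y//z |- x//z
        new |= {(x, z) for (x, y) in desc for (y2, z) in desc if y == y2}
        # IR4: x/ai, x//bj |- ai//bj, where ai and bj carry different labels
        new |= {(a, b) for (x, a) in child for (x2, b) in desc
                if x == x2 and label[a] != label[b]}
        if not new <= desc:
            desc |= new
            changed = True
    return child, desc

# Tiny example: r/a and r//b.
example_child, example_desc = full_form(
    {("r", "a")}, {("r", "b")}, {"r": "r", "a": "a", "b": "b"})
print(sorted(example_desc))
```

On this input IR2 adds r//a and IR4 adds a//b, since a is the child of r and b lies somewhere below r on the same path.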

2. Satisfiability

A query is unsatisfiable if its full form contains a trivial cycle.

[Figure: a trivial cycle between nodes x and y.]
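A minimal sketch of the satisfiability test, assuming that a "trivial cycle" means a pair of nodes related in both directions (x//y together with y//x) in the full form. That reading, the set-of-pairs representation and the name `has_trivial_cycle` are assumptions for illustration.

```python
def has_trivial_cycle(desc):
    """Return True if some pair (x, y) in desc also appears reversed as (y, x).

    desc is the set of descendant relationships of the full form,
    given as (x, y) pairs meaning x//y.
    """
    return any((y, x) in desc for (x, y) in desc)

print(has_trivial_cycle({("a", "b"), ("b", "a")}))  # a//b and b//a
print(has_trivial_cycle({("a", "b"), ("b", "c")}))  # a plain chain
```

Because the check runs on the full form, longer cycles in the original query show up as two-node cycles once IR3 has been applied to closure.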

3. Redundant nodes

A node y is redundant if one of the following patterns occurs:

[Figure: four patterns, a) to d), drawn over the nodes x, y and z.]

4. Canonical form

canonical form of satisfiable query = full form – IR2 – IR3 – redundant nodes

The canonical form of a query is a directed acyclic graph (dag).
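The "– IR2 – IR3" part of the definition can be sketched as dropping every descendant relationship that IR2 or IR3 could re-derive from the remaining ones. The code below is a simplified illustration: it assumes the input is an acyclic full form, it does not remove redundant nodes, and the name `drop_derivable` is hypothetical.

```python
def drop_derivable(child, desc):
    """Remove descendant edges that IR2 or IR3 can re-derive.

    child: set of (x, y) pairs meaning x/y
    desc:  set of (x, y) pairs meaning x//y (assumed closed and acyclic)
    """
    child, desc = set(child), set(desc)
    every = child | desc
    nodes = {n for edge in every for n in edge}
    kept = set()
    for (x, z) in desc:
        by_ir2 = (x, z) in child                      # x/z already implies x//z
        by_ir3 = any((x, y) in every and (y, z) in every
                     for y in nodes - {x, z})         # x .. y .. z implies x//z
        if not (by_ir2 or by_ir3):
            kept.add((x, z))
    return child, kept

# Full form of the query r/a, r//b from a three-node example.
kept_child, kept_desc = drop_derivable(
    {("r", "a")}, {("r", "a"), ("r", "b"), ("a", "b")})
print(sorted(kept_child), sorted(kept_desc))
```

What remains is the child edge r/a and the descendant edge a//b, a small dag in which every dropped relationship is still implied.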


Evaluation Algorithms

Based on PathStack [Bruno et al. ’02]

Produce all possible path queries…
Decompose into root-to-leaf paths…
PartialMJ: Decompose a spanning tree into paths…

Extending PathStack [Bruno et al. ’02]

PartialPathStack: Produce a topological order of the query nodes and extend PathStack to handle it…

Based on PathStack

1. Producing all possible path queries…

[Figure: the example partial path query over the nodes r, a, b, c, d, e, f and g, followed by the path queries produced from it.]

Problems:
too many queries to evaluate
multiple traversals of the XML tree
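A rough way to see why this approach yields too many queries is to count the orders in which the query nodes can be lined up on a single path. The sketch below enumerates every ordering that respects the edges of a query dag. Treating each such ordering as one candidate path query is a simplifying assumption: it ignores the child versus descendant distinction and any merging of nodes. The name `linear_orders` is hypothetical.

```python
def linear_orders(nodes, edges):
    """Yield every ordering of nodes in which x precedes y for each edge (x, y)."""
    preds = {n: {x for (x, y) in edges if y == n} for n in nodes}

    def extend(prefix, remaining):
        if not remaining:
            yield tuple(prefix)
            return
        for n in sorted(remaining):
            # A node may be placed once all of its predecessors are placed.
            if preds[n] <= set(prefix):
                yield from extend(prefix + [n], remaining - {n})

    yield from extend([], set(nodes))

# Small dag: r above a and b, a above c.
orders = list(linear_orders("rabc", [("r", "a"), ("r", "b"), ("a", "c")]))
for order in orders:
    print("//".join(order))
```

Even this four-node example yields several orderings, and the count grows quickly as the query leaves more relationships unspecified.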

Based on PathStack

2. Decomposing into root-to-leaf paths…

[Figure: the example query dag decomposed into its root-to-leaf paths, each evaluated with PathStack.]

Problems:
path overlaps
more than one component to evaluate
intermediate results
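The decomposition step can be sketched as a plain enumeration of the root-to-leaf paths of the query dag. The edge-list representation and the name `root_to_leaf_paths` are assumptions for illustration, and the sketch covers only the decomposition, not the evaluation with PathStack.

```python
def root_to_leaf_paths(root, edges):
    """Return every path from root to a node without outgoing edges."""
    succ = {}
    for x, y in edges:
        succ.setdefault(x, []).append(y)

    def walk(node, path):
        path = path + [node]
        if node not in succ:          # leaf of the dag
            yield path
        else:
            for nxt in sorted(succ[node]):
                yield from walk(nxt, path)

    return list(walk(root, []))

# Diamond-shaped dag: b and c both sit between a and d.
paths = root_to_leaf_paths(
    "r", [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
for p in paths:
    print(p)
```

Both paths in the example contain r, a and d, which illustrates the path overlaps listed among the problems: the shared nodes are matched once per path and the per-path results still have to be combined.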

Based on PathStack

PartialMJ. Using a spanning tree…

Remove edges to create a spanning tree

[Figure: the example query dag, its spanning tree, and the paths of the spanning tree, each evaluated with PathStack.]

Join conditions (identity, structural, path)

Problems:
path overlaps
more than one component to evaluate
intermediate results
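One simple way to obtain a spanning tree from a query dag is to keep a single incoming edge per node and set the other edges aside. The sketch below does exactly that. It assumes a dag in which every node is reachable from one root, and it keeps whichever incoming edge comes first in the input, which is not necessarily the choice PartialMJ makes. The name `spanning_tree` is hypothetical.

```python
def spanning_tree(edges):
    """Split the edges of a single-rooted dag into tree edges and removed edges.

    The first incoming edge seen for each node is kept; later ones are removed.
    """
    parent = {}
    removed = []
    for x, y in edges:
        if y in parent:
            removed.append((x, y))    # y already has a parent in the tree
        else:
            parent[y] = x
    tree = [(p, c) for c, p in parent.items()]
    return tree, removed

tree, removed = spanning_tree(
    [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
print(tree)     # edges kept in the spanning tree
print(removed)  # edges set aside
```

The edges set aside still express constraints of the original query, so they have to be checked again when the results of the individual paths are combined.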

Extending PathStack

PartialPathStack. Employ a topological order…

[Figure: the example query dag over the nodes r, a, b, c, d, e, f and g, and a topological order of its nodes, which PartialPathStack takes as input.]
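A topological order of the query nodes can be produced with the standard library. The sketch below uses `graphlib.TopologicalSorter` (Python 3.9 or later). The edge-list input and the name `topological_order` are assumptions for illustration, and the sketch covers only the ordering step, not the stack handling of PartialPathStack.

```python
from graphlib import TopologicalSorter

def topological_order(edges):
    """Return the nodes of a dag so that x comes before y for every edge (x, y)."""
    preds = {}
    for x, y in edges:
        preds.setdefault(x, set())
        preds.setdefault(y, set()).add(x)   # TopologicalSorter expects predecessors
    return list(TopologicalSorter(preds).static_order())

query_edges = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
order = topological_order(query_edges)
print(order)
```

Several valid orders usually exist for the same dag. This sketch returns one of them and does not claim to match the order shown on the slides.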

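PartialPathStack Example

[Figure: an XML tree with the nodes r, a1, b1, d1, c1, e1, d2, c2 and e2, next to a query over the nodes r, a, b, d, c and e, in which c and e are the sink nodes.]

Stacks, one per query node: Sr, Sa, Sb, Sd, Sc, Se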

The slides then animate the run on this example:

r, a1, b1, d1 and c1 are pushed onto their stacks, one node per step.
e1 is pushed and the output step (OUTPUT!!!) produces the result ra1b1d1c1e1.
d2 and c2 are pushed and the output step produces the result ra1b1d1c2e1.
c2 leaves the stacks, e2 is pushed, and the output step produces the result ra1b1d1c1e2.

results
ra1b1d1c1e1
ra1b1d1c2e1
ra1b1d1c1e2

only one component to evaluate
no intermediate results

Evaluation Algorithms

Problems:
Many queries / components to evaluate
Path overlaps
Intermediate results

Algorithm:
Produce all path queries…
Decompose into paths…
PartialMJ (spanning tree)
PartialPathStack

PartialPathStack vs PathStack

PathStack
• Path queries
• Indegree = 1
• Outdegree = 1
• O(input + output)

PartialPathStack
• Partial path queries
• Indegree > 1
• Outdegree > 1
• O(input*indegree + output*outdegree)

[Figure: a path query and a partial path query over the nodes r, a, b, c, d, e, f and g.]
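To relate the two cost expressions to a concrete query, the sketch below computes the largest indegree and outdegree found in a query graph. Reading "indegree" and "outdegree" in the bounds as these maxima over the query nodes is an assumption, and the name `max_degrees` is hypothetical.

```python
from collections import Counter

def max_degrees(edges):
    """Return (max indegree, max outdegree) over the nodes of a query graph."""
    indeg = Counter(y for _, y in edges)
    outdeg = Counter(x for x, _ in edges)
    return max(indeg.values(), default=0), max(outdeg.values(), default=0)

path_query = [("r", "a"), ("a", "b")]
partial_query = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(max_degrees(path_query))     # a plain path
print(max_degrees(partial_query))  # a dag with a fork and a join
```

For a plain path both values are 1, and the PartialPathStack expression then reduces to the PathStack one.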


Queries Used in the Experiments

[Figure: four query shapes over the nodes r, a, b, c, d, e and f, labelled Q1/Q5, Q2/Q6, Q3/Q7 and Q4/Q8.]

Experiment 1

Execution time on Treebank… (2.5 million nodes)

[Figure: execution-time chart, with the callouts "path queries" and "too many results".]

Execution time on Synthetic data… (2.5 million nodes, IBM AlphaWorks XML generator)

Experiment 2

Execution time varying the size of the XML tree… (1 - 3 million nodes)

[Figure: execution-time charts comparing PartialMJ and PartialPathStack on the queries Q2, Q3 and Q7.]


Conclusion

                           Evaluation   Containment   Heuristics for Containment
Partial Path Queries       CIKM ’07     SSDBM ’06     CIKM ’06
Queries with repetitions   ?            SSDBM ’06     CIKM ’06
Partial Tree Queries       ?            SSDBM ’06     CIKM ’06

Questions?
