Join-Queries between two Spatial Datasets Indexed by a Single R*-tree

Michael Vassilakopoulos (*) (Dept. of Computer Science and Biomedical Informatics, University of Central Greece, Greece)
Antonio Corral (Department of Languages and Computing, University of Almeria, Spain)
Nikitas N. Karanikolas (Department of Informatics, Technological Educational Institute of Athens, Greece)
(*) speaker

37th Int. Conf. on Current Trends in Theory and Practice of Computer Science, Jan. 2011, Nový Smokovec, Slovakia


Outline

Problem and Motivation
Contribution and Background
R-trees
2 Datasets in 1 Tree (2D1T)
Join-query Algorithms
The New Algorithm for 2D1T
Experimental Results (Intersection Join, K-CPQ, Buffer Query, LRU-buffer)
Conclusions and Future Work


Problem and Motivation (1)

Among the most frequent queries appearing in Spatial Databases is the Spatial Join Query:

find all pairs of objects (O, O’) ∈ R × S, where θ(O.G, O’.G) = TRUE

Examples of θ: intersects, contains, is enclosed by, distance, north-west, adjacent, meets, etc.

Usually, the intersection join is an efficient filter.
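
The definition above can be illustrated with a minimal nested-loop sketch. It assumes that each object is a dictionary whose "G" field holds an axis-aligned rectangle (xmin, ymin, xmax, ymax) and that θ is "intersects"; the names `spatial_join`, `intersects` and the sample data are hypothetical and are not taken from the slides.

```python
from itertools import product


def intersects(g1, g2):
    """Predicate theta for rectangles given as (xmin, ymin, xmax, ymax)."""
    return (g1[0] <= g2[2] and g2[0] <= g1[2]
            and g1[1] <= g2[3] and g2[1] <= g1[3])


def spatial_join(R, S, theta):
    """Return all pairs (O, O') in R x S whose geometries satisfy theta."""
    return [(o, o2) for o, o2 in product(R, S) if theta(o["G"], o2["G"])]


# Hypothetical sample data: two small sets of rectangles.
R = [{"id": "r1", "G": (0, 0, 2, 2)}, {"id": "r2", "G": (5, 5, 6, 6)}]
S = [{"id": "s1", "G": (1, 1, 3, 3)}, {"id": "s2", "G": (8, 8, 9, 9)}]

pairs = [(o["id"], o2["id"]) for o, o2 in spatial_join(R, S, intersects)]
print(pairs)
```

This brute-force form examines every pair in R × S, which is exactly the cost that index-based join algorithms try to avoid.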


Problem and Motivation (2)

Current social and technological advancements contribute to the production of large datasets (e.g. datasets of Geographical Information Systems applications).

The size of such datasets raises the cost of join processing, so the importance of the data indexing method and of the query processing technique rises.


Problem and Motivation (3)

R-trees (and family) are considered good choices for indexing spatial datasets to process join queries in Spatial Databases.

When joining two datasets, a common assumption is that each dataset is indexed by a different R*-tree.

In this paper, we index both datasets by a single R*-tree, so that spatial locality between different datasets is embedded in data indexing.


Contribution

We present:

an R*-tree variation able to index two datasets (2D1T), taking advantage of the spatial locality between different datasets,

a new algorithm for processing join queries on 2D1T by Breadth-First traversal, where at each level we follow Best-First selection,

results (I/O and execution time) of comparative experimentation between 2D2T (each dataset indexed by its own R*-tree) and 2D1T solutions for Intersection Join queries (on non-point datasets), K-CPQ and Buffer Queries (on point datasets).


Background

Intersection Join: discovers the pairs of objects that intersect each other. E.g. “find all trails that go through some forest”.

K-CPQ: discovers the K pairs of objects that have the K smallest distances between them. It is a combination of join and nearest neighbor queries. E.g. “find the 3 closest pairs of cities and archeological sites”.

Buffer Query: discovers pairs of objects that are within a threshold distance of each other. E.g. “find house-power line pairs that are within 50 meters of each other”.
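
The two distance-based queries can be sketched by brute force on point data. This is only an illustration of what the queries return, not of how the slides' algorithms compute them; the function names and the sample coordinates are hypothetical, and Euclidean distance is assumed.

```python
import heapq
from itertools import product
from math import dist


def k_closest_pairs(P, Q, k):
    """K-CPQ: the k pairs (p, q) from P x Q with the smallest distances."""
    return heapq.nsmallest(k, product(P, Q), key=lambda pq: dist(*pq))


def buffer_query(P, Q, threshold):
    """Buffer Query: all pairs (p, q) from P x Q within the threshold distance."""
    return [(p, q) for p, q in product(P, Q) if dist(p, q) <= threshold]


# Hypothetical sample points.
cities = [(0.0, 0.0), (10.0, 0.0)]
sites = [(1.0, 0.0), (10.0, 3.0), (50.0, 50.0)]

closest = k_closest_pairs(cities, sites, 2)
nearby = buffer_query(cities, sites, 5.0)
print(closest)
print(nearby)
```

The K-CPQ fixes the number of answers and lets the distance vary, while the Buffer Query fixes the distance and lets the number of answers vary.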


R-Tree (1)

Clusters of spatial objects can be grouped into Minimum Bounding Rectangles (MBRs).

MBRs can be recursively grouped into larger MBRs.

[Figure: spatial objects grouped into MBRs labeled R1 to R12]
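
The grouping described on this slide can be sketched as follows, assuming 2D points and rectangles stored as (xmin, ymin, xmax, ymax). The helper names and the sample clusters are hypothetical; the sketch only shows how an MBR is computed and how MBRs nest, not how an R-tree chooses its clusters.

```python
def mbr_of_points(points):
    """Smallest axis-aligned rectangle enclosing a cluster of 2D points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_union(mbrs):
    """Smallest rectangle enclosing a group of MBRs (the recursive step)."""
    return (min(m[0] for m in mbrs), min(m[1] for m in mbrs),
            max(m[2] for m in mbrs), max(m[3] for m in mbrs))


# Hypothetical clusters of points.
cluster_1 = [(1, 1), (2, 3), (3, 2)]
cluster_2 = [(6, 5), (7, 8)]

leaf_1 = mbr_of_points(cluster_1)
leaf_2 = mbr_of_points(cluster_2)
parent = mbr_union([leaf_1, leaf_2])
print(leaf_1, leaf_2, parent)
```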


R-Tree (2)

Nested MBRs can be organized as a tree (R-tree).

[Figure: the MBRs R1 to R12 arranged as a tree, with R10, R11 and R12 at the upper level and R1 to R9 below them as the nodes that contain points]

The R*-tree is the most popular R-tree variation.


2 Datasets in 1 Tree (2D1T)

Each MBR has a dataset flag showing whether it contains data related to D1 ('a'), D2 ('b'), or both ('c'). This flag does not affect the placement of data in the tree.

Subscript of 'a' / 'b': identifier of the element.
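
One way such a flag could be maintained and used is sketched below. The slides do not show how the flag is computed or consulted, so both helpers are assumptions: `combine_flags` derives a parent's flag from its children's flags, and `may_join` tells whether a pair of subtrees can still contribute a (D1, D2) result pair.

```python
def combine_flags(child_flags):
    """Flag of a parent MBR: 'a' or 'b' if all children agree, else 'c'."""
    flags = set(child_flags)
    if not flags:
        raise ValueError("a node needs at least one child")
    if flags == {"a"}:
        return "a"
    if flags == {"b"}:
        return "b"
    return "c"


def may_join(flag_1, flag_2):
    """False only when both subtrees hold data of the same single dataset."""
    return not (flag_1 == flag_2 and flag_1 in ("a", "b"))


print(combine_flags(["a", "a"]), combine_flags(["a", "b"]), may_join("a", "a"))
```

Because the flag is derived from the contents only, it is consistent with the statement that it does not affect the placement of data in the tree.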


Join-query Algorithms

Search algorithms (Best-First, Depth-First and Breadth-First Search) can be applied on tree-like structures for spatial queries.

We have implemented all of them for all three spatial join queries (Intersection Join, K-CPQ and Buffer Query) on 2D2T and 2D1T.

In 2D1T, since there is only one tree, Self-Join variations of the algorithms are utilized.

Plane-sweep is utilized to save CPU cost (it restricts the possible combinations of pairs of MBRs).
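
The effect of plane-sweep can be sketched on two lists of rectangles. This is a generic textbook-style sweep along the x axis, written under the assumption that rectangles are (xmin, ymin, xmax, ymax) tuples; it is not claimed to be the authors' implementation, and the function name and sample data are hypothetical.

```python
def plane_sweep_pairs(rs, ss):
    """Intersecting pairs (r, s); only x-overlapping candidates are compared."""
    rs, ss = sorted(rs), sorted(ss)  # tuples sort by xmin first
    out = []
    i = j = 0
    while i < len(rs) and j < len(ss):
        if rs[i][0] <= ss[j][0]:
            r, k = rs[i], j
            # Scan only the rectangles of ss that start before r ends in x.
            while k < len(ss) and ss[k][0] <= r[2]:
                if r[1] <= ss[k][3] and ss[k][1] <= r[3]:
                    out.append((r, ss[k]))
                k += 1
            i += 1
        else:
            s, k = ss[j], i
            # Symmetric case: scan the rectangles of rs that start before s ends.
            while k < len(rs) and rs[k][0] <= s[2]:
                if s[1] <= rs[k][3] and rs[k][1] <= s[3]:
                    out.append((rs[k], s))
                k += 1
            j += 1
    return out


# Hypothetical sample rectangles.
rs = [(0, 0, 2, 2), (5, 5, 7, 7)]
ss = [(1, 1, 3, 3), (6, 0, 8, 4), (6, 6, 9, 9)]
sweep_result = plane_sweep_pairs(rs, ss)
print(sweep_result)
```

Compared with testing every pair, the sweep skips all pairs whose x intervals cannot overlap, which is the restriction of combinations mentioned above.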


The New Algorithm for 2D1T

We devised a Breadth-First Traversal algorithm.

It synchronously traverses the R-tree in breadth-first order, while processing the spatial predicate (join condition) one level at a time.

At each level, it creates a list with the entries that satisfy the spatial predicate, the Intermediate Candidate Entry, to be accessed at the next level.

When the leaf level is reached, two separate lists are created, one from each dataset.

Intersection plane-sweep is applied to both lists to get the final result.

Global optimization is applied level-by-level.
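
The level-by-level idea can be sketched on a toy in-memory tree. This is a simplified illustration under several assumptions, not the authors' algorithm: the `Entry` class, the filtering rule in `filter_level` and the sample tree are hypothetical, the final step uses a plain nested loop instead of plane-sweep, and no Best-First selection or global optimization is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    mbr: tuple   # (xmin, ymin, xmax, ymax)
    flag: str    # 'a' (D1 only), 'b' (D2 only) or 'c' (both)
    children: list = field(default_factory=list)  # empty for data objects


def intersects(r, s):
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]


def filter_level(entries):
    """Keep the entries of one level that may still lead to a result pair."""
    kept = []
    for e in entries:
        if e.flag == "c":            # holds both datasets: always a candidate
            kept.append(e)
            continue
        other = "b" if e.flag == "a" else "a"
        if any(o is not e and o.flag in (other, "c") and intersects(e.mbr, o.mbr)
               for o in entries):
            kept.append(e)
    return kept


def breadth_first_join(root):
    """Descend one level at a time, then join the two object lists."""
    level = [root]
    while level and level[0].children:
        level = [c for e in filter_level(level) for c in e.children]
    list_a = [e for e in level if e.flag == "a"]
    list_b = [e for e in level if e.flag == "b"]
    return [(a.mbr, b.mbr) for a in list_a for b in list_b
            if intersects(a.mbr, b.mbr)]


# Hypothetical two-level tree: four leaf nodes under one root.
a1, b1 = Entry((0, 0, 2, 2), "a"), Entry((1, 1, 3, 3), "b")
a2, b2 = Entry((10, 10, 11, 11), "a"), Entry((10.5, 10.5, 12, 12), "b")
a3 = Entry((20, 20, 21, 21), "a")
leaves = [Entry((0, 0, 3, 3), "c", [a1, b1]), Entry(a2.mbr, "a", [a2]),
          Entry(b2.mbr, "b", [b2]), Entry(a3.mbr, "a", [a3])]
root = Entry((0, 0, 21, 21), "c", leaves)

result = breadth_first_join(root)
print(result)
```

In this toy run the leaf node that holds only a3 is dropped at the leaf-node level, because no entry that may contain D2 data intersects it, so its objects are never read.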


Experimental Results (1)

Experimental settings

We used real datasets (line segments), representing roads (NArd: 569,120 segments and Grrd: 23,268 segments), railroads (NArr: 191,637 segments) and rivers (Grri: 24,650 segments).

For producing point datasets, we transformed the MBRs of NArd and NArr into points.

For larger (smaller) datasets, page size was set to 4 (1) Kbytes.

Environment used: MacBook Pro, Intel Core 2 Duo, 2.4 GHz, 4 GB RAM, gcc.

Performance measurements: I/O activity (page accesses) and response time.


Experimental Results (2)

Results from the creation of the trees show that the 2D1T is slightly smaller, in size, than the sum of the two R*-trees that make up 2D2T (in terms of number of nodes).

E.g. for pairs of line-segment (point) datasets for NArd and NArr, the 2D1T nodes are 5393 (5490), while the sum of the 2D2T nodes is 5543 (5697).
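
To put "slightly smaller" into numbers, the node counts quoted above can be turned into relative savings. The function name is hypothetical; the four node counts are the ones given on this slide.

```python
def relative_saving(nodes_2d1t, nodes_2d2t):
    """Fraction of nodes saved by 2D1T relative to the 2D2T total."""
    return (nodes_2d2t - nodes_2d1t) / nodes_2d2t


line_segments = relative_saving(5393, 5543)
points = relative_saving(5490, 5697)
print(f"line segments: {line_segments:.1%}, points: {points:.1%}")
```

For these counts the saving is roughly 2.7% of the nodes for the line-segment datasets and roughly 3.6% for the point datasets.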


Intersection Join Results

Algorithm for 2D2T: BFT; algorithms for 2D1T: Self-BFT and New.

New is 3.7 to 4.6 times better in I/O than BFT and 11.7 to 8.2 times worse in response time.

Self-BFT is more than 10 times worse than the New algorithm in I/O and more than 3 times better than the New algorithm in response time.


K-CPQ Results

Algorithm for 2D2T: Heap; algorithms for 2D1T: Self-Heap and New.

New is 3.8 times better in I/O than Heap and 2 times worse in response time.

Self-Heap is more than 3.8 times worse than New in I/O and more than 1.4 times better in response time.

[Figures: NArd x NArr]


Buffer Query Results

Algorithm for 2D2T: Heap; algorithms for 2D1T: Self-Heap and New.

New is 3.6 to 4.3 times better in I/O than the Heap algorithm and 4.2 to 5.2 times in response time.

Self-Heap is more than 9 times worse than the New algorithm in I/O and more than 1.6 times better in response time.

[Figures: NArd x NArr]


LRU Buffer Effect

Intersection Join (left) by using DFT for 2D2T and by using Self-BFT and New for 2D1T.

K-CPQ (right) by using DFT for 2D2T and Self-Heap and New for 2D1T.

The I/O performance of the New algorithm was always the best and invariant to the buffer size (contrary to DFT).

[Figures: NArd x NArr (left: Intersection Join, right: K-CPQ)]
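
How an LRU buffer changes the number of page accesses can be sketched with a small simulation. The `LRUBuffer` class and both access traces are hypothetical. The sketch only shows that a trace which revisits pages is sensitive to the buffer size while a trace which touches each page once is not; the slides do not state that this is the reason for the behavior of the New algorithm.

```python
from collections import OrderedDict


class LRUBuffer:
    """Page buffer with least-recently-used replacement; counts disk reads."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.disk_reads = 0

    def access(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # buffer hit: refresh recency
            return
        self.disk_reads += 1                  # buffer miss: read from disk
        if self.capacity == 0:
            return
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)    # evict least recently used page
        self.pages[page_id] = True


def count_reads(trace, capacity):
    buffer = LRUBuffer(capacity)
    for page_id in trace:
        buffer.access(page_id)
    return buffer.disk_reads


revisiting_trace = [1, 2, 3, 1, 2, 3, 1, 2, 3]   # pages are accessed repeatedly
single_pass_trace = [1, 2, 3, 4, 5, 6]           # each page is accessed once

for capacity in (0, 2, 3):
    print(capacity, count_reads(revisiting_trace, capacity),
          count_reads(single_pass_trace, capacity))
```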


Conclusions

We presented a tree that simultaneously stores two spatial datasets, taking advantage of spatial locality (2D1T).

We presented a New join algorithm for 2D1T.

We compared the performance of 2D2T and 2D1T for several join queries (Intersection Join, K-CPQ, Buffer Query).

2D1T exhibits much better I/O performance, while 2D2T exhibits better CPU performance.

The winner depends on the balance between CPU power and I/O efficiency of the computing system.


Future Work

To study the CPU cost of the New algorithm and focus on reducing it.

To consider 2D1T variants of structures with non-overlapping nodes, such as R+-trees or Quadtrees, as they have been used in:

Y. J. Kim and J. Patel. Performance Comparison of the R*-tree and the Quadtree for kNN and Distance Join Queries. IEEE TKDE, 22(7), pp. 1014-1027, 2010.

In this paper, it is concluded that “an often dismissed index structure (the Quadtree) can be a better choice than the widely used R*-tree for index-based k-NN query and distance join algorithms when indices are constructed dynamically”.


Thank you for your attention

users.ucg.gr/~mvasilako