Continuous Query Processing on Spatio-Temporal Data Streams
Rimma V. Nehme
Department of Computer Sciences, Worcester Polytechnic Institute
Thesis Advisor: Elke A. Rundensteiner
Thesis Reader: Michael A. Gennert
June 27, 2005

Outline
- Motivation
- Part I: SCUBA
  - Motivation
  - Moving Clusters
  - SCUBA Algorithm
  - Analysis of SCUBA
  - Evaluation
  - Conclusions
  - Future Work
- Part II: Performance vs. Accuracy
  - Discrete vs. Continuous Model
  - Accuracy Model
  - Evaluation
  - Conclusions

Are we there yet?

Motivation
- Monitor the traffic in the red areas.
- Continuously return the area covered by the herd during the migration.
- Send a notification to all cell phone users within a range of 2 miles that we are having a 50%-off lunch sale.

Challenges
- Scalability: large number of objects; large number of queries
- Limited resources: memory; CPU
- Real-time response requirement: reduce the number of computations

The challenge is to provide fast query responses in update-intensive environments.

[Figure legend: moving objects; dynamic range query; dynamic kNN query]

Novel idea: exploit the fact that objects and queries move in groups (i.e., clusters) to optimize the execution.

Big Picture

- Traditional execution: SR [SR01], DQ [LPM02], CNN [TPS], TPR [SJL00]
- Shared execution: SINA [MXA04], SEA-CNN [XMA05], Q-Index [PXK+02]
- My work (SCUBA): shared cluster-based execution

Use clustering as a means to improve execution for densely moving objects and queries.

Proposed Solution: Moving Clusters

Main idea: abstract individual entities into clusters based on common attributes:
- Direction
- Speed
- Spatial position

The execution of continuous moving queries on moving objects is then abstracted as a join-between moving clusters and a join-within moving clusters.

Example query (figure): "Continuously retrieve the police car closest to me."

SCUBA: Scalable Cluster-Based Algorithm for Evaluating Continuous Spatio-Temporal Queries on Moving Objects

Architecture Overview: SCUBA-Enabled Motion Operator Execution

[Figure: the SCUBA motion operator inside the CAPE engine. Moving-object and moving-query data streams feed the operator, which maintains moving clusters in memory; when the time interval expires, it performs a grid-based join between/within clusters and emits a results data stream. CAPE components shown: Stream Generator, Stream Receiver, Query Plan Generator, Execution Engine, Execution Scheduler, Statistics Gatherer, and Raindrop Workhorse; end users submit queries over the Internet. Legend: control flow, data flow, moving object, range query.]

I present the system in the context of continuous spatio-temporal range queries.

Moving Cluster Representation in SCUBA
- Centroid
- Actual cluster size, bounded by the maximum cluster size ΘD
- Velocity vector
- Cluster members: moving objects and moving queries

Cluster member representation inside a cluster: each member (e.g., a moving object) is positioned relative to the cluster centroid.
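Storing members relative to the centroid means a whole cluster can be moved with one update. A minimal Python sketch, assuming polar offsets (distance r and angle θ from the centroid) in the spirit of the r1 to r5 labels on the load-shedding slides; the `MovingCluster` class and its methods are hypothetical names, not SCUBA's code:

```python
import math
from dataclasses import dataclass, field

@dataclass
class MovingCluster:
    cx: float                # centroid x
    cy: float                # centroid y
    vx: float                # velocity vector, x component
    vy: float                # velocity vector, y component
    max_size: float          # maximum cluster size (Theta_D); not enforced in this sketch
    members: dict = field(default_factory=dict)  # member id -> (r, theta) relative to centroid

    @property
    def size(self):
        """Actual cluster size: distance from the centroid to the furthest member."""
        return max((r for r, _ in self.members.values()), default=0.0)

    def add(self, mid, x, y):
        """Store a member by its polar offset from the centroid."""
        dx, dy = x - self.cx, y - self.cy
        self.members[mid] = (math.hypot(dx, dy), math.atan2(dy, dx))

    def absolute(self, mid):
        """Recover a member's absolute position from the centroid and its offset."""
        r, theta = self.members[mid]
        return self.cx + r * math.cos(theta), self.cy + r * math.sin(theta)

    def move(self, dt):
        """Shift the cluster along its velocity vector; member offsets stay unchanged."""
        self.cx += self.vx * dt
        self.cy += self.vy * dt
```

For example, a member added at (3, 4) to a cluster centered at the origin has r = 5; after `move(10)` with velocity (1, 0), `absolute` places it at (13, 4) without touching the member record.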

SCUBA Execution

SCUBA produces results every Δ time units. SCUBA has three phases:
- Phase I: Cluster Pre-Join Maintenance (in-memory clustering of incoming objects and queries)
  - Formation of new clusters
  - Dissolving “empty” clusters
  - Expanding existing clusters
- Phase II: Cluster-Based Joining (when the Δ timeout expires)
- Phase III: Cluster Post-Join Maintenance
  - Dissolving “expiring” clusters
  - Relocating “non-expiring” clusters in the grid based on their velocity vectors

[Figure: execution flow over objects and queries with the steps in-memory clustering, timeout, cluster-based joining, send results, and clusters position update; each phase ends with DONE.]
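The three-phase cycle can be sketched as a driver loop: every update is clustered on arrival, and whenever Δ time units have elapsed the join and post-join maintenance run before clustering resumes. This is a minimal Python sketch, assuming updates arrive in timestamp order; `run_scuba` and its callback parameters are hypothetical names:

```python
from typing import Callable, Iterable, List, Tuple

def run_scuba(updates: Iterable[Tuple[str, float]], delta: float,
              cluster: Callable[[str, float], None],
              join: Callable[[], list],
              maintain: Callable[[], None]) -> List[Tuple[float, list]]:
    """Drive the three phases over a timestamp-ordered stream of (member id, timestamp)."""
    results = []
    next_eval = delta
    for mid, ts in updates:
        # The delta interval expired before this update arrived: evaluate first.
        while ts >= next_eval:
            results.append((next_eval, join()))  # Phase II: cluster-based join, send results
            maintain()                           # Phase III: post-join maintenance
            next_eval += delta
        cluster(mid, ts)                         # Phase I: incremental clustering on arrival
    return results
```

Passing the phases in as callbacks keeps the timing logic separate from the clustering and join code; in a real operator the callbacks would share the cluster and grid state.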

Phase I: Cluster Pre-Join Maintenance

Clustering is done incrementally (upon the arrival of updates).

Location update format: (ID, Loc_t, t, Speed, CNLoc, ...)

Use two thresholds plus the destination:
- ΘD: distance threshold
- ΘS: speed threshold
- Destination: connection node (CNLoc)

The clustering algorithm is based on the Leader-Follower clustering algorithm (J. A. Hartigan. Clustering Algorithms. John Wiley and Sons, 1975).

Clustering new object example:
(1) A new moving object arrives.
(2) Hash the object into the grid.
(3) Add the object to the cluster and update the cluster attributes: centroid position, radius, average speed, member count.
(4) If the cluster has expanded, check for overlap with neighboring cells (make new entries if necessary).
(5) If the object left its existing cluster for a new cluster and the old cluster is “empty”, dissolve the old cluster.

[Figure: moving clusters M1, M2, M3 and the new object's parent cluster.]
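The five steps can be sketched as a leader-follower pass over one location update: the update follows the first nearby cluster that satisfies ΘD, ΘS, and the same destination, or else leads a new cluster. This is a minimal Python sketch, not SCUBA's implementation. Assumptions: the grid cell size `CELL` is hypothetical, the destination is compared as a connection-node ID, and attributes are recomputed from the members (O(n) per update) instead of incrementally as the slide describes; all function names are hypothetical.

```python
import math
from dataclasses import dataclass, field

THETA_D = 100.0  # distance threshold (value from the experimental settings)
THETA_S = 10.0   # speed threshold (value from the experimental settings)
CELL = 50.0      # hypothetical grid cell side length

@dataclass(eq=False)  # identity equality, so list.remove() drops the right cluster
class Cluster:
    cx: float
    cy: float
    speed: float
    dest: int                                    # destination connection node (CNLoc)
    radius: float = 0.0
    members: dict = field(default_factory=dict)  # member id -> (x, y, speed)
    cells: list = field(default_factory=list)    # grid cells the cluster is registered in

def cells_in_box(x, y, r):
    """Grid cells overlapped by the square of half-width r centered at (x, y)."""
    i0, i1 = int((x - r) // CELL), int((x + r) // CELL)
    j0, j1 = int((y - r) // CELL), int((y + r) // CELL)
    return [(i, j) for i in range(i0, i1 + 1) for j in range(j0, j1 + 1)]

def register(grid, c):
    c.cells = cells_in_box(c.cx, c.cy, c.radius)
    for cell in c.cells:
        grid.setdefault(cell, []).append(c)

def unregister(grid, c):
    for cell in c.cells:
        grid[cell].remove(c)
    c.cells = []

def refresh(c):
    """Recompute centroid, average speed, and radius from the current members."""
    pts = list(c.members.values())
    c.cx = sum(p[0] for p in pts) / len(pts)
    c.cy = sum(p[1] for p in pts) / len(pts)
    c.speed = sum(p[2] for p in pts) / len(pts)
    c.radius = max(math.hypot(p[0] - c.cx, p[1] - c.cy) for p in pts)

def follows(c, x, y, speed, dest):
    return (math.hypot(x - c.cx, y - c.cy) <= THETA_D
            and abs(speed - c.speed) <= THETA_S and dest == c.dest)

def process_update(grid, home, mid, x, y, speed, dest):
    """Leader-follower clustering of one location update (hypothetical helper)."""
    # (1)-(2) hash the update into the grid and probe cells within Theta_D for a leader
    nearby = (c for cell in cells_in_box(x, y, THETA_D) for c in grid.get(cell, []))
    target = next((c for c in nearby if follows(c, x, y, speed, dest)), None)
    old = home.get(mid)
    if old is not None and old is not target:
        # (5) the member left its old cluster: shrink it, or dissolve it if now empty
        unregister(grid, old)
        del old.members[mid]
        if old.members:
            refresh(old)
            register(grid, old)
    if target is None:
        target = Cluster(x, y, speed, dest)  # no cluster qualifies: start a new one
    else:
        unregister(grid, target)
    # (3) add the member and update centroid, radius, average speed, member count
    target.members[mid] = (x, y, speed)
    refresh(target)
    # (4) register the (possibly expanded) cluster in every cell it now overlaps
    register(grid, target)
    home[mid] = target
    return target
```

Probing every cell within ΘD of the update guarantees that any cluster whose centroid is close enough is found, because each cluster is registered in the cell containing its centroid.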

Phase II: Cluster-Based Joining

[Figure: location updates arrive and are clustered incrementally (Phase I); when Δ expires, the cluster-based join runs (Phase II). Legend: join-between = overlap; ignored; join-within = query results.]

Phase II: Cluster-Based Joining (cont.)
- Join-Between: between two clusters
- Join-Within:
  - For each cluster (joining the objects and queries inside it)
  - For two overlapping clusters (cross-join between the objects and queries from the two clusters)

[Figure legend: join-between = overlap; ignored; join-within = query results.]
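The two join kinds can be sketched for circular range queries: join-within matches objects and queries inside one cluster, and join-between uses a circle-overlap test to decide whether two clusters need a cross-join at all. A minimal Python sketch, assuming circular range queries and that each cluster's radius already covers its member queries' ranges (otherwise the overlap test would not be a safe filter); `Cluster`, `overlaps`, `match`, and `cluster_join` are hypothetical names, and the grid that SCUBA uses to find candidate pairs is omitted:

```python
import math
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class Cluster:
    cx: float
    cy: float
    radius: float                                # assumed to cover member query ranges too
    objects: dict = field(default_factory=dict)  # object id -> (x, y)
    queries: dict = field(default_factory=dict)  # query id -> (x, y, range radius)

def overlaps(a, b):
    """Join-between: circular clusters can only contribute results if they intersect."""
    return math.hypot(a.cx - b.cx, a.cy - b.cy) <= a.radius + b.radius

def match(objects, queries):
    """Range-query join: (query, object) pairs where the object lies in the query's circle."""
    return {(qid, oid)
            for qid, (qx, qy, qr) in queries.items()
            for oid, (ox, oy) in objects.items()
            if math.hypot(ox - qx, oy - qy) <= qr}

def cluster_join(clusters):
    results = set()
    for c in clusters:                       # join-within each single cluster
        results |= match(c.objects, c.queries)
    for a, b in combinations(clusters, 2):   # join-between; non-overlapping pairs are ignored
        if overlaps(a, b):                   # cross-join the two overlapping clusters
            results |= match(a.objects, b.queries) | match(b.objects, a.queries)
    return results
```

The saving comes from the overlap test: members of non-overlapping clusters are never compared pairwise.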

Phase III: Cluster Post-Join Maintenance
- Dissolve “expiring” clusters.
- Relocate “non-expiring” clusters back into the grid based on their velocity vectors.

[Figure: the grid is cleared; a cluster reaching its connection node is dissolved; the other clusters get updated positions and are inserted into the grid.]
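Post-join maintenance can be sketched as rebuilding the grid from scratch: each cluster either expires or is advanced by its velocity vector and re-inserted. A minimal Python sketch under a hypothetical definition of "expiring" (the cluster would reach its destination connection node within the next Δ), with each survivor indexed only by the cell of its new centroid; `post_join_maintenance` and the field names are hypothetical:

```python
import math
from dataclasses import dataclass

@dataclass
class Cluster:
    cid: int
    cx: float
    cy: float
    vx: float              # velocity vector
    vy: float
    dest: tuple            # (x, y) of the destination connection node

def post_join_maintenance(clusters, delta, cell):
    """Clear the grid, dissolve expiring clusters, relocate the rest by their velocity."""
    grid = {}              # start from an empty grid (the old one is cleared)
    survivors = []
    for c in clusters:
        travel = math.hypot(c.vx, c.vy) * delta
        remaining = math.hypot(c.dest[0] - c.cx, c.dest[1] - c.cy)
        if travel >= remaining:
            continue       # expiring: reaches its connection node within delta, so dissolve
        c.cx += c.vx * delta
        c.cy += c.vy * delta
        # relocate: insert the cluster into the cell containing its new centroid
        grid.setdefault((int(c.cx // cell), int(c.cy // cell)), []).append(c.cid)
        survivors.append(c)
    return survivors, grid
```

Relocating by the velocity vector lets a cluster keep its members across evaluation intervals without waiting for each member to report again.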

Data Structures
- Objects Table
- Queries Table
- ClusterHome Table
- ClusterStorage Table
- ClusterGrid
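The slide only names the tables. One plausible reading, sketched in Python: ClusterHome maps each member to its cluster, ClusterStorage maps each cluster to its members, and ClusterGrid maps grid cells to cluster IDs. These semantics, the `ScubaTables` class, and its `move` method are hypothetical; the sketch shows how the tables stay consistent when a member changes clusters:

```python
from dataclasses import dataclass, field

@dataclass
class ScubaTables:
    objects: dict = field(default_factory=dict)          # "O1" -> latest location update
    queries: dict = field(default_factory=dict)          # "Q4" -> latest query update
    cluster_home: dict = field(default_factory=dict)     # member id -> id of its cluster
    cluster_storage: dict = field(default_factory=dict)  # cluster id -> set of member ids
    cluster_grid: dict = field(default_factory=dict)     # (i, j) grid cell -> set of cluster ids

    def move(self, member, cluster_id, cells):
        """Assign a member to a cluster, dissolving its previous cluster if it empties."""
        old = self.cluster_home.get(member)
        if old is not None and old != cluster_id:
            self.cluster_storage[old].discard(member)
            if not self.cluster_storage[old]:
                # dissolve the empty cluster: drop it from storage and from every grid cell
                del self.cluster_storage[old]
                for ids in self.cluster_grid.values():
                    ids.discard(old)
        self.cluster_home[member] = cluster_id
        self.cluster_storage.setdefault(cluster_id, set()).add(member)
        for cell in cells:
            self.cluster_grid.setdefault(cell, set()).add(cluster_id)
```

Keeping both directions of the member/cluster mapping makes "which cluster is this member in?" and "who is in this cluster?" constant-time lookups.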

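Moving Cluster-Based Load Shedding

Focus: discarding data inside moving clusters.

[Figure: a cluster of maximum size ΘD with its velocity vector; objects O1, O2, O3 and queries Q4, Q5, each labeled with its position relative to the centroid (r1 through r5).]

Case 1: No load shedding (all relative positions of cluster members are preserved).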

Moving Cluster-Based Load Shedding (cont.)

[Figure: a cluster of size ΘD with its velocity vector; cluster members: O1, O2, O3, Q4, Q5.]

Case 2: Full shedding (all relative positions of cluster members are discarded)
- The cluster is the sole representation of the movements of its members.
- Assume all objects satisfy all queries inside the cluster.
- No join-within is needed.

Moving Cluster-Based Load Shedding (cont.)

Case 3: Partial shedding (only the relative positions of some, the furthest, cluster members are maintained)
- Introduce a new structure, the nucleus, to abstract the discarded members.
- Assume all objects satisfy all queries inside the nucleus.
- No join-within is needed for cluster nucleus members.

[Figure: a cluster of size ΘD with its velocity vector and a nucleus bounded by the nucleus threshold ΘN, with ΘN = 0.45 * ΘD; member O2 lies outside the nucleus.]
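The three cases can be sketched with one parameter: members closer to the centroid than ΘN lose their positions and join the nucleus, and every nucleus object is assumed to satisfy every nucleus query. ΘN = 0 is Case 1, and ΘN ≥ ΘD is Case 2 if all members lie within ΘD. A minimal Python sketch, assuming Cartesian offsets from the centroid and circular range queries; how nucleus members interact with retained members is not stated on the slides, so treating shed members as sitting at the centroid is a hypothetical choice, as are the function names:

```python
import math

def shed(objects, queries, theta_n):
    """Partial shedding: members within theta_n of the centroid lose their positions.

    objects: id -> (dx, dy); queries: id -> (dx, dy, range), offsets from the centroid.
    """
    def in_nucleus(dx, dy):
        return theta_n > 0 and math.hypot(dx, dy) <= theta_n

    nucleus_o = {o for o, (dx, dy) in objects.items() if in_nucleus(dx, dy)}
    nucleus_q = {q: rng for q, (dx, dy, rng) in queries.items() if in_nucleus(dx, dy)}
    kept_o = {o: p for o, p in objects.items() if o not in nucleus_o}
    kept_q = {q: p for q, p in queries.items() if q not in nucleus_q}
    return nucleus_o, nucleus_q, kept_o, kept_q

def join_within(nucleus_o, nucleus_q, kept_o, kept_q):
    # Nucleus: assume every object satisfies every query, so no join is performed.
    results = {(q, o) for q in nucleus_q for o in nucleus_o}
    # Retained members: exact join on their preserved relative positions.
    results |= {(q, o) for q, (qx, qy, r) in kept_q.items()
                for o, (ox, oy) in kept_o.items() if math.hypot(ox - qx, oy - qy) <= r}
    # Mixed pairs (hypothetical choice): treat shed members as sitting at the centroid.
    results |= {(q, o) for q, (qx, qy, r) in kept_q.items()
                for o in nucleus_o if math.hypot(qx, qy) <= r}
    results |= {(q, o) for q, r in nucleus_q.items()
                for o, (ox, oy) in kept_o.items() if math.hypot(ox, oy) <= r}
    return results
```

With the slide's ΘN = 0.45 * ΘD, comparing the result against the ΘN = 0 run shows the accuracy cost: in the test data below, (Q4, O1) and (Q4, O3) appear only because of the nucleus assumption, so they are false positives.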

Experimental Settings

We use the Network-based Generator of Moving Objects to generate a set of moving objects and moving queries in Worcester County (TIGER/Line files).

Unless mentioned otherwise, the following parameters are used:
- 10,000 moving objects and 10,000 moving queries. Each moving object or query reports its new information (if changed) every time unit.
- The percentage of objects and queries that report a change of information is 100%.
- The speed of objects and queries is set to medium.
- ΘD = 100 (spatial units), ΘS = 10 (spatial units/time unit), ΘN = 0 (no load shedding).
- Grid: 100x100.

Experimental Results: Varying Grid Cell Sizes
- The performance of regular grid-based execution improves with finer granularity of grid cells (but its memory requirements increase as well).

[Figure: (a) join time (in secs) and (b) memory consumption (in MB) versus grid cell count (50x50, 75x75, 100x100, 125x125, 150x150) for REGULAR and SCUBA.]

Experimental Results (cont.)

Varying skew factor:
- The higher the skew factor, the denser the objects and queries (i.e., the more clusterable they are).
- EXPERIMENTS TO FINISH

Incremental vs. non-incremental clustering:
- Join time improves slightly with non-incremental clustering.
- But the clustering wait time outweighs the advantage of the faster join.

[Figure: non-incremental (offline) clustering time and join time (in secs) for incremental clustering (Increm.) and non-incremental clustering with iter = 1, 3, 5, and 10.]

Experimental Results (cont.): Moving Cluster-Based Load Shedding
- Vary ΘN relative to ΘD.
- Accuracy is measured in terms of false positives (FP) and false negatives (FN).
- Measure the average number of FP and FN (per object and query).
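Both error measures fall out of set differences between the exact result pairs (no shedding) and the pairs produced under shedding. A minimal Python sketch; normalizing by a caller-supplied count of objects and queries is a hypothetical reading of "per object and query", and `shedding_error` is a hypothetical name:

```python
def shedding_error(exact, approx, n_entities):
    """Average false positives and false negatives, normalized by an entity count.

    exact, approx: sets of (query id, object id) result pairs.
    """
    false_pos = approx - exact   # reported pairs that are not true results
    false_neg = exact - approx   # true results that were missed
    return len(false_pos) / n_entities, len(false_neg) / n_entities
```

For example, two spurious pairs and one missed pair over four objects and queries give (0.5, 0.25).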

Experimental Results (cont.): Cluster Maintenance
- Cluster maintenance time is cheap relative to the join time.
- EXPERIMENTS TO FINISH

Contributions

I proposed:
- SCUBA, a novel cluster-based algorithm for continuously evaluating a set of concurrent continuous spatio-temporal queries. SCUBA is a generic model that is applicable to any location-aware server.
- Shared cluster-based execution, through which SCUBA achieves scalability: objects and queries with similar attributes are grouped into clusters, and the execution of a set of concurrent continuous queries is abstracted as a join-between and join-within moving clusters.
- Moving cluster-based load shedding with two alternatives (full and partial shedding of cluster members) to reduce resource usage while maintaining accurate answers.

Experimental results show that SCUBA outperforms a regular grid-based indexing scheme when executing on densely moving objects.

Future Work
- Non-circular clusters
- Extend to other types of spatio-temporal queries: CKNN, aggregate
- Hierarchical clustering (merge and break down clusters)
- Use real sensor data

Part II: Additional Work

Accuracy vs. Performance Tradeoff in Location-Aware Services

Part II: Accuracy vs. Performance Tradeoff

Motion can be described as:
(a) a list of discrete positions
(b) a continuous function

[Figure: (a) and (b) plotted over time.]

Related Works: Discrete & Continuous
- Discrete: mSTOMM [SDK02], MobiEyes [GL04], SINA [MXA04], SEA-CNN [XMA05], Q-Index [PXK+02]
- Continuous: DOMINO [WCL02], A Framework for Representing Moving Objects [BBH04], MON-Tree [AG04], CHOROCHRONOS/TB-tree [PJT00], Continuous Nearest Neighbor Search [TPS02], Dynamic Queries [LPM02]

Discrete:
- Faster, simpler computations (join)
- Smaller memory requirements
- Poor approximation of actual movement
- Poor accuracy, especially with infrequent updates or when objects move fast
- Knows nothing about the object between updates
- Load shedding has a dramatic effect on accuracy

Continuous:
- Slower, more complex computations (join)
- Larger memory requirements
- Better approximation of actual movement
- Higher accuracy
- Can answer questions about the durations of events
- Can load shed while still giving relatively good-quality answers

I investigate when each model is more appropriate for a location-aware server.

Linear Continuous Model

Use linear segments to approximate the movement between updates. Common justifications:
- Simple.
- Arbitrarily complex movements can be approximated using piece-wise linear movements.
- Movement is constrained within a road network (roads tend to be linear).

Other functions describing motion can be plugged into the system.

[Figure: map with Chicago, Washington, DC, and Los Angeles.]
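Under the linear model, a position between two updates is a linear interpolation, and the time during which an object is inside a circular range query is the solution of a quadratic in time. That interval is what lets the continuous model answer questions about event durations. A minimal Python sketch, assuming circular range queries, that the object and the query center both move linearly, and that their updates share the same timestamps; `position` and `time_inside` are hypothetical names:

```python
import math

def position(p0, p1, t0, t1, t):
    """Linear interpolation of a position between two updates at times t0 < t1."""
    f = (t - t0) / (t1 - t0)
    return (p0[0] + f * (p1[0] - p0[0]), p0[1] + f * (p1[1] - p0[1]))

def time_inside(o0, o1, q0, q1, radius, t0, t1):
    """Sub-interval of [t0, t1] during which the object lies inside a circular range query.

    The object moves o0 -> o1 and the query center q0 -> q1, both linearly.
    Returns (start, end) or None.
    """
    dt = t1 - t0
    # relative position d(s) = d0 + v * s, for s = t - t0
    d0 = (o0[0] - q0[0], o0[1] - q0[1])
    v = (((o1[0] - o0[0]) - (q1[0] - q0[0])) / dt,
         ((o1[1] - o0[1]) - (q1[1] - q0[1])) / dt)
    # inside when |d(s)|^2 <= radius^2, i.e. a*s^2 + b*s + c <= 0
    a = v[0] ** 2 + v[1] ** 2
    b = 2 * (d0[0] * v[0] + d0[1] * v[1])
    c = d0[0] ** 2 + d0[1] ** 2 - radius ** 2
    if a == 0:  # no relative motion: inside for the whole interval or not at all
        return (t0, t1) if c <= 0 else None
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # the relative path never comes within the query radius
    s_in = (-b - math.sqrt(disc)) / (2 * a)
    s_out = (-b + math.sqrt(disc)) / (2 * a)
    start, end = max(t0, t0 + s_in), min(t1, t0 + s_out)
    return (start, end) if start <= end else None
```

For an object moving from (0, 0) to (10, 0) over ten time units past a static query of radius 2 centered at (5, 0), the object is inside from t = 3 to t = 7. A discrete model sampling only at t = 0 and t = 10 would miss this result entirely.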

Accuracy vs. Performance Tradeoff

The continuous model is MORE ACCURATE, but it is MORE EXPENSIVE.

Accuracy model: comparison between discrete and continuous results.

Assumptions:
- The continuous model is more accurate (100% accuracy).
- The discrete model is compared to the continuous model.

Idea:
- Construct continuous segments out of the discrete answers.
- Compare them to the continuous results.

Accuracy Model

Step 1: Calculate the average result segment length.

Step 2: Multiply the average result segment length by the number of discrete results.

Step 3: Calculate the accuracy.

According to our model, the discrete model is ~30% as accurate as the continuous model.
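The three steps can be written as a small function. The slide does not spell out Step 3, so this Python sketch assumes accuracy is the discrete-derived result length divided by the total continuous result length (taken as 100% accurate), capped at 100%; `discrete_accuracy` is a hypothetical name, and Step 1's average is supplied by the caller:

```python
def discrete_accuracy(avg_segment_length, n_discrete_results, continuous_length):
    """Estimate discrete-model accuracy (%) against the continuous model.

    Step 1 is assumed done by the caller (avg_segment_length).
    Step 3's ratio and the 100% cap are assumptions, not taken from the slides.
    """
    if continuous_length <= 0:
        raise ValueError("continuous results must have a positive total length")
    # Step 2: length of continuous result credited to the discrete answers
    reconstructed = avg_segment_length * n_discrete_results
    # Step 3: compare against the continuous results, treated as 100% accurate
    return min(100.0, 100.0 * reconstructed / continuous_length)
```

With made-up inputs, six discrete results with an average segment length of 2 against a continuous total of 16 give 12 / 16, i.e. 75%.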

Accuracy Examples
- Scenario 1: an object location update was received every time the object entered, stayed in, and left the query. Accuracy ≈ 100%.
- Scenario 2: an object location update was received only once while the object was inside the query. Accuracy ≈ 50%.
- Scenario 3: no location update was received at any point in time while the object was inside the query. Accuracy ≈ 0%.

Experimental Results

We compare the performance of the two models by:
- varying the speed of objects and queries;
- varying the update probability of objects and queries.

We use the Network-based Generator of Moving Objects to generate a set of moving objects and moving queries in Worcester County (TIGER/Line files).
- 5,000 moving objects and 5,000 moving queries. Each moving object or query reports its new information (if changed) every time unit.
- Results are computed every 2 time units.
- Unless mentioned otherwise, the percentage of objects and queries that report a change of information is 100%.

Accuracy and Performance (Varying Speed)

[Figure: join time and accuracy (percent) of the discrete model (DM) and the continuous model (CM) for six speeds from very slow to very fast. Panels: Accuracy (percent, continuous vs. discrete model) and Join Time (in msecs, discrete vs. continuous model), for speed settings Speed_250, Speed_150, Speed_100, Speed_50, Speed_20, and Speed_1.]

Accuracy vs. Scalability (Varying Update Probability)

[Figure: join time (in msecs) per evaluation interval (1 to 25) for the continuous model at 100%, 90%, 75%, 50%, and 25% update probability and the discrete model at 100%; average join time (in msecs) and average accuracy (percent) for each update probability.]

Update probability = frequency of updates from objects and queries (100% = every timestamp; 50% = every other timestamp).

Conclusions

The continuous model is preferred when:
1. objects move fast;
2. not all location updates are received (e.g., load shedding occurs);
3. location updates arrive out of sync due to network delay (in this case, we assume the system would load shed this data, as it is outside the current window of execution).

The discrete model is preferred when:
1. objects move slowly, or
2. location updates are very frequent.

With only 75% of location updates, the continuous model can give higher accuracy with better performance than the discrete model.

Next step: dynamically switch between location modeling techniques based on the attributes of the arriving data and on performance and accuracy requirements.

References

[SDK02] D. Stojanović and S. Djordjević-Kajan. Location-based Web services for tracking and visual route analysis of mobile objects. In Proceedings of the Yu INFO Conference, Kopaonik, 2002, CD-ROM (in Serbian).

[GL04] B. Gedik and L. Liu. MobiEyes: Distributed Processing of Continuously Moving Queries on Moving Objects in a Mobile System. In EDBT, 2004.

[MXA04] M. Mokbel, X. Xiong, and W. Aref. SINA: Scalable Incremental Processing of Continuous Queries in Spatio-temporal Databases. In SIGMOD, 2004.

[PXK+02] S. Prabhakar, Y. Xia, D. Kalashnikov, W. Aref, and S. Hambrusch. Query Indexing and Velocity Constrained Indexing: Scalable Techniques for Continuous Queries on Moving Objects. IEEE Transactions on Computers, 51(10):1124–1140, 2002.

[XMA05] X. Xiong, M. Mokbel, and W. Aref. SEA-CNN: Scalable Processing of Continuous K-Nearest Neighbor Queries in Spatio-temporal Databases. In ICDE, 2005.

[WCL02] O. Wolfson, H. Cao, H. Lin, G. Trajcevski, F. Zhang, and N. Rishe. Management of Dynamic Location Information in DOMINO. In EDBT, pages 769–771, 2002.

[BBH04] L. Becker, H. Blunck, K. Hinrichs, and J. Vahrenhold. A Framework for Representing Moving Objects. In Proceedings of the 14th International Conference on Database and Expert Systems Applications (DEXA 2004), Berlin, pages 854–863, 2004.

[AG04] V. T. Almeida and R. H. Güting. Indexing the Trajectories of Moving Objects in Networks. Technical Report 309, FernUniversität Hagen, Fachbereich Informatik, 2004.

[PJT00] D. Pfoser, C. S. Jensen, and Y. Theodoridis. Novel Approaches to the Indexing of Moving Object Trajectories. In Proceedings of the 26th International Conference on Very Large Data Bases, pages 395–406, 2000.

[TPS02] Y. Tao, D. Papadias, and Q. Shen. Continuous Nearest Neighbor Search. In VLDB, 2002.

[LPM02] I. Lazaridis, K. Porkaew, and S. Mehrotra. Dynamic Queries over Mobile Objects. In EDBT, 2002.

[SR01] Z. Song and N. Roussopoulos. K-Nearest Neighbor Search for Moving Query Point. In SSTD, 2001.

[TPS] Y. Tao, D. Papadias, and Q. Shen. Continuous Nearest Neighbor Search. In VLDB, 2002.

[SJL00] S. Saltenis, C. S. Jensen, S. T. Leutenegger, and M. A. Lopez. Indexing the Positions of Continuously Moving Objects. In SIGMOD, 2000.

Acknowledgments: Elke A. Rundensteiner, DSRG, Michael Gennert, George Heineman, Thomas Brinkhoff

Thank You

The End