

1

Dynamically Adaptive Distributed System for Processing CompleX Continuous Queries

Bin Liu, Yali Zhu, Mariana Jbantova, Brad Momberger, and Elke A. Rundensteiner

VLDB’05, August 31st 2005

Presented by Yali Zhu

Department of Computer Science, Worcester Polytechnic Institute, U.S.A.

2

Uncertainties in Stream Query Processing

(Diagram labels: Register Continuous Queries, Streaming Data, Stream Query Engine, Streaming Result)

Real-time and accurate responses required

May have time-varying rates and high volumes

Available resources for executing each operator may vary over time.

Distribution and run-time adaptations are required.

High workload of queries

Memory and CPU resource limitations

3

DAX (D-CAPE) System Architecture

(Architecture diagram; component labels follow.)

Query Processor

CAPE-Continuous Query Processing Engine

Local Statistics Gatherer

Local Adaptation Controller

Local Plan Migrator

Data Distributor

Data Receiver

Repository

Distribution Manager

Global Adaptation Controller

Global Plan Migrator

Runtime Monitor

Query Plan Manager

Connection Manager

Repository

Streaming Data

Network

Stream Generator

Application Server

End User

4

Distributed Adaptation Techniques

Workload Relocation: Operator-level, Partition-level

Query Plan Reshaping

Data Spilling

5

Initial Distribution

(Diagram labels: Distribution Manager, Stream Source, Application, Machine 1, Machine 2; query plan with operators 1 to 8)

Operator      Processor
Operator 1    QP 1
Operator 2    QP 1
Operator 3    QP 2
Operator 4    QP 2
Operator 5    QP 1
Operator 6    QP 1
Operator 7    QP 2
Operator 8    QP 2

6

Workload Relocation – Operator-level

(Diagram labels: Distribution Manager, Stream Source, Application, Machine 1, Machine 2; query plan with operators 1 to 8)

Operator      Processor
Operator 7    QP 1
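
The operator-to-processor table on these two slides can be illustrated with a minimal sketch. It assumes the Distribution Manager keeps a simple mapping from operator to query processor and that relocation amounts to reassigning one entry. The function names `initial_distribution`, `relocate`, and `load` are hypothetical and are not taken from the D-CAPE code base, and a real relocation would also have to move the operator's state and redirect its data connections.

```python
# Hypothetical sketch: these names are illustrative, not D-CAPE APIs.

def initial_distribution():
    """Operator-to-processor table as listed on the Initial Distribution slide."""
    return {1: "QP 1", 2: "QP 1", 3: "QP 2", 4: "QP 2",
            5: "QP 1", 6: "QP 1", 7: "QP 2", 8: "QP 2"}


def relocate(table, operator, target):
    """Return a new table with one operator reassigned to `target`."""
    if operator not in table:
        raise KeyError(f"unknown operator {operator}")
    updated = dict(table)  # leave the caller's table untouched
    updated[operator] = target
    return updated


def load(table):
    """Count operators per processor (a stand-in for a real cost metric)."""
    counts = {}
    for qp in table.values():
        counts[qp] = counts.get(qp, 0) + 1
    return counts


table = initial_distribution()
after = relocate(table, 7, "QP 1")  # the move shown on the slide
```

Counting operators is only a placeholder for whatever run-time statistics drive the relocation decision.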

7

Workload Relocation – Partition-level

(Diagram labels: input streams A, B, C; SplitA, SplitB, SplitC; machines m1, m2)

Problem of operator-level adaptation: Operators have large states. Moving them across machines can be expensive.

Solution as partition-level adaptation:

Partition state-intensive operators [Gra90,SH03,LR05]

Distribute the partitioned plan across multiple machines

How to partition and relocate multi-way joins at run time?
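
As a rough illustration of the split idea, the sketch below hash-partitions the three input streams on a join key and maps partitions to the machines m1 and m2. It is an assumption-laden sketch, not the authors' algorithm: the `Split` class, the `partition_of` helper, the CRC32 hash, the four partitions, and the shared `placement` table are all hypothetical choices made for the example.

```python
import zlib


def partition_of(key, num_partitions):
    """Deterministic hash partitioning on the join key (CRC32 is an assumption)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


class Split:
    """Routes tuples of one input stream to the machine that owns their partition."""

    def __init__(self, stream, num_partitions, placement):
        self.stream = stream
        self.num_partitions = num_partitions
        self.placement = placement  # shared table: partition id -> machine

    def route(self, key):
        return self.placement[partition_of(key, self.num_partitions)]


# One placement table shared by SplitA, SplitB and SplitC, so tuples with the
# same join key from all three streams meet on the same machine.
placement = {0: "m1", 1: "m1", 2: "m2", 3: "m2"}
splits = [Split(name, 4, placement) for name in ("A", "B", "C")]


def machines_for(key):
    """Set of machines that receive tuples with this join key."""
    return {s.route(key) for s in splits}


def relocate_partition(partition, machine):
    """Reassign one partition; a real system would also ship its state."""
    placement[partition] = machine
```

Because only one partition's state has to move at a time, a relocation can be much cheaper than moving a whole state-intensive operator.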

8

Dynamic Plan Reshaping and Migration

(Diagram labels: Distribution Manager; machines M1, M2; query plans built from op1, op2, op3, op4, shown before and after reshaping)

Migration Protocol

11-way handshaking

How does the protocol guarantee correct query results?

How to integrate with across-machine workload relocation?

9

State Spill

(Diagram labels: A, B, C; Secondary Storage)

Push part of operator state onto disk

Quick relief of memory overflow problem

How to keep high run-time query throughput?

How to integrate with across-machine workload relocation?
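
The idea of pushing part of an operator's state to disk can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the `SpillableState` class, the tuple-count memory limit, the "spill the largest partition" policy, and the use of `pickle` files are all hypothetical choices.

```python
import os
import pickle
import tempfile


class SpillableState:
    """Partitioned operator state that spills whole partitions to disk."""

    def __init__(self, memory_limit, spill_dir):
        self.memory_limit = memory_limit  # max tuples kept in memory (assumed >= 0)
        self.spill_dir = spill_dir
        self.in_memory = {}  # partition id -> list of tuples
        self.spilled = {}    # partition id -> list of spill file paths

    def size(self):
        return sum(len(tuples) for tuples in self.in_memory.values())

    def insert(self, partition, tup):
        self.in_memory.setdefault(partition, []).append(tup)
        while self.size() > self.memory_limit:
            self._spill_largest()

    def _spill_largest(self):
        # Assumed policy: evict the partition holding the most tuples.
        victim = max(self.in_memory, key=lambda p: len(self.in_memory[p]))
        tuples = self.in_memory.pop(victim)
        fd, path = tempfile.mkstemp(dir=self.spill_dir, suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(tuples, f)
        self.spilled.setdefault(victim, []).append(path)

    def load_spilled(self, partition):
        """Read back every tuple spilled for one partition."""
        tuples = []
        for path in self.spilled.get(partition, []):
            with open(path, "rb") as f:
                tuples.extend(pickle.load(f))
        return tuples


with tempfile.TemporaryDirectory() as spill_dir:
    state = SpillableState(memory_limit=4, spill_dir=spill_dir)
    for i in range(6):
        state.insert(i % 2, ("tuple", i))
    spilled_partitions = sorted(state.spilled)
    recovered = state.load_spilled(spilled_partitions[0])
    in_memory_size = state.size()
```

Spilled partitions still have to be joined later to produce complete results, which is where the throughput question on this slide comes in.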

10

Summary

Key Words

Distributed system

Continuous queries (multi-way joins)

Various unique run-time adaptation techniques

Demo Sessions: Wednesday 2-3:30, Friday 9-10
