1
Dynamically Adaptive Distributed System for Processing CompleX Continuous Queries
Bin Liu, Yali Zhu, Mariana Jbantova, Brad Momberger, and Elke A. Rundensteiner
VLDB'05, August 31st 2005
Presented by Yali Zhu
Department of Computer Science, Worcester Polytechnic Institute
U.S.A
2
Uncertainties in Stream Query Processing
[Figure: continuous queries are registered with the Stream Query Engine; streaming data flows in and streaming results flow out.]
Streaming data: may have time-varying rates and high volumes
Streaming result: real-time and accurate responses required
High workload of queries
Memory and CPU resource limitations
Available resources for executing each operator may vary over time.
Distribution and run-time adaptations are required.
3
DAX (D-CAPE) System Architecture
[Architecture diagram. Component labels:
  CAPE - Continuous Query Processing Engine: Data Receiver, Data Distributor, Query Processor, Local Statistics Gatherer, Local Adaptation Controller, Local Plan Migrator, Repository
  Distribution Manager: Runtime Monitor, Connection Manager, Query Plan Manager, Global Adaptation Controller, Global Plan Migrator, Repository
  Stream Generator, Application Server, End User
  Streaming Data, Network]
4
Distributed Adaptation Techniques
Workload Relocation
  Operator-level
  Partition-level
Query Plan Reshaping
Data Spilling
5
Initial Distribution
[Figure: the Distribution Manager holds a query plan of operators 1 to 8, running from the Stream Source to the Application, and distributes its operators across Machine 1 and Machine 2.]
Distribution table:
Operator     Processor
Operator 1   QP 1
Operator 2   QP 1
Operator 3   QP 2
Operator 4   QP 2
Operator 5   QP 1
Operator 6   QP 1
Operator 7   QP 2
Operator 8   QP 2
6
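Workload Relocation – Operator-level
[Figure: the Distribution Manager, Machine 1, and Machine 2 with the query plan of operators 1 to 8, running from the Stream Source to the Application, after one operator has been relocated.]
Updated distribution table entry:
Operator     Processor
Operator 7   QP 1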
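Operator-level relocation can be pictured as an update to the distribution table. The following is a minimal sketch, not the DAX implementation: the table contents come from the Initial Distribution slide, while the function names `relocate_operator` and `operators_on` are hypothetical and introduced only for illustration.

```python
# Distribution table from the Initial Distribution slide:
# operator number -> query processor (QP).
distribution = {
    1: "QP 1", 2: "QP 1", 3: "QP 2", 4: "QP 2",
    5: "QP 1", 6: "QP 1", 7: "QP 2", 8: "QP 2",
}


def relocate_operator(table, operator, target):
    """Return a new table with `operator` reassigned to `target`.

    Hypothetical helper: a real system would also have to move the
    operator's state and reconnect its input and output streams.
    """
    if operator not in table:
        raise KeyError(f"unknown operator {operator}")
    updated = dict(table)  # leave the original table untouched
    updated[operator] = target
    return updated


def operators_on(table, processor):
    """List the operators currently assigned to `processor`."""
    return sorted(op for op, qp in table.items() if qp == processor)


# Relocate operator 7 from QP 2 to QP 1, as on the slide.
after = relocate_operator(distribution, 7, "QP 1")
qp1_after = operators_on(after, "QP 1")
qp2_after = operators_on(after, "QP 2")
```

The sketch covers only the bookkeeping. The cost the next slide points to, moving the operator's state between machines, is deliberately left out.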
7
Workload Relocation – Partition-level
[Figure: inputs A, B, and C pass through split operators SplitA, SplitB, and SplitC, which route the partitions to machines m1 and m2.]
Problem of operator-level adaptation:
  Operators have large states.
  Moving them across machines can be expensive.
Solution as partition-level adaptation:
  Partition state-intensive operators [Gra90, SH03, LR05]
  Distribute the partitioned plan across multiple machines
How to partition and relocate multi-way joins at run time?
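One way to picture partition-level adaptation is sketched below, under stated assumptions. The slide does not give the partitioning function or the placement policy, so the modulo partitioning on an integer join key, the round-robin initial placement, and the class `PartitionedJoinState` are all hypothetical. The point of the sketch is that tuples from A, B, and C with the same join key land in the same partition, so a relocation moves one partition's state rather than the whole operator.

```python
from collections import defaultdict


class PartitionedJoinState:
    """Hypothetical model of a multi-way join whose state is partitioned."""

    def __init__(self, num_partitions, machines):
        self.num_partitions = num_partitions
        # Assumed policy: round-robin initial placement of partitions.
        self.placement = {
            p: machines[p % len(machines)] for p in range(num_partitions)
        }
        # state[partition][stream] -> tuples buffered for that stream
        self.state = {p: defaultdict(list) for p in range(num_partitions)}

    def split(self, stream, key, payload):
        """Route one tuple the way a split operator would.

        Assumes integer join keys; the same key always maps to the
        same partition, regardless of the input stream.
        """
        partition = key % self.num_partitions
        self.state[partition][stream].append((key, payload))
        return partition, self.placement[partition]

    def relocate(self, partition, target):
        """Reassign one partition; return how many tuples must move."""
        self.placement[partition] = target
        return sum(len(t) for t in self.state[partition].values())


plan = PartitionedJoinState(num_partitions=4, machines=["m1", "m2"])
route_a = plan.split("A", 5, "a5")  # key 5 -> partition 1
route_b = plan.split("B", 5, "b5")  # same key, same partition
route_c = plan.split("C", 8, "c8")  # key 8 -> partition 0
moved = plan.relocate(1, "m1")      # only partition 1's tuples move
```

Relocating partition 1 moves the two tuples buffered there and leaves partition 0 in place, which is the cost advantage over moving an entire operator.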
8
Dynamic Plan Reshaping and Migration
[Figure: a query plan with operators op1, op2, op3, and op4, shown before and after reshaping, with its operators spread across machines M1 and M2 under the Distribution Manager.]
Migration Protocol: 11-way handshaking
How does the protocol guarantee correct query results?
How to integrate with across-machine workload relocation?
9
State Spill
[Figure: operator states for inputs A, B, and C, with part of the state moved to Secondary Storage.]
Push part of operator state onto disk
Quick relief of memory overflow problem
How to keep high run-time query throughput?
How to integrate with across-machine workload relocation?
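The idea of pushing part of the operator state to disk can be sketched as follows. This is an illustrative sketch only: the slide does not say which part of the state is spilled or how memory is measured, so the tuple-count limit, the largest-partition-first policy, and the class `SpillableState` are assumptions.

```python
import os
import pickle
import tempfile


class SpillableState:
    """Hypothetical operator state that spills partitions to disk."""

    def __init__(self, memory_limit, spill_dir):
        self.memory_limit = memory_limit  # assumed unit: tuples in memory
        self.spill_dir = spill_dir
        self.in_memory = {}  # partition id -> list of tuples
        self.on_disk = {}    # partition id -> spill file path

    def memory_used(self):
        return sum(len(tuples) for tuples in self.in_memory.values())

    def insert(self, partition, tup):
        self.in_memory.setdefault(partition, []).append(tup)
        # Spill until the in-memory state fits the limit again.
        while self.memory_used() > self.memory_limit:
            self.spill_largest()

    def spill_largest(self):
        # Assumed policy: evict the partition holding the most tuples.
        victim = max(self.in_memory, key=lambda p: len(self.in_memory[p]))
        path = os.path.join(self.spill_dir, f"partition_{victim}.pkl")
        # Append mode, so repeated spills of one partition accumulate.
        with open(path, "ab") as f:
            pickle.dump(self.in_memory.pop(victim), f)
        self.on_disk[victim] = path

    def read_spilled(self, partition):
        """Read back every batch spilled for `partition`."""
        tuples = []
        with open(self.on_disk[partition], "rb") as f:
            while True:
                try:
                    tuples.extend(pickle.load(f))
                except EOFError:
                    break
        return tuples


spill_dir = tempfile.mkdtemp()
state = SpillableState(memory_limit=3, spill_dir=spill_dir)
for i in range(3):
    state.insert(0, ("a", i))   # partition 0 fills up to the limit
state.insert(1, ("b", 0))       # exceeds the limit, partition 0 is spilled
```

Spilling relieves memory pressure at once, but the spilled tuples no longer take part in in-memory joins until they are read back, which is why the slide asks how to keep run-time throughput high.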
10
Summary
Key Words
  Distributed system
  Continuous queries (multi-way joins)
  Various unique run-time adaptation techniques
Demo Sessions: Wednesday 2-3:30, Friday 9-10