Dynamic Resource Management
In a Massively Parallel Stream
Processing Engine
KASPER GRUD SKAT MADSEN, PHD STUDENT
YONGLUAN ZHOU, ASSOCIATE PROFESSOR
LINK TO PAPER
About this presentation
This presentation was given by Kasper Grud Skat Madsen on 20/10/2015 at CIKM 2015
SIGIR graciously provided a travel grant
Massively Parallel
Stream Processing Engine
Designed to process streaming data in a highly distributed fashion
Job is modelled as DAG (O,E)
Vertices are operator instances
Edges are communication channels between vertices
Job is allocated to nodes, s.t.
Workload is reasonably balanced at submission
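As a toy illustration of the allocation above (not the paper's actual scheduler), here is a greedy placement of operator instances onto nodes so that load stays roughly balanced at submission; all operator names and load figures are invented:

```python
# Toy sketch (not the paper's scheduler): greedily place operator
# instances so that node load stays roughly balanced at submission.
# Operator names and load figures are invented.
operators = {"src": 2.0, "parse": 3.0, "join": 5.0, "agg": 4.0}  # instance -> load
edges = [("src", "parse"), ("parse", "join"), ("join", "agg")]   # channels of the DAG
nodes = ["node1", "node2"]

def allocate(operators, nodes):
    """Assign each operator instance to the currently least-loaded node."""
    load = {n: 0.0 for n in nodes}
    placement = {}
    # Heaviest instances first, so large operators spread across nodes
    for op, cost in sorted(operators.items(), key=lambda kv: -kv[1]):
        target = min(load, key=load.get)
        placement[op] = target
        load[target] += cost
    return placement, load

placement, load = allocate(operators, nodes)
```

With the loads above, both nodes end up with a total load of 7.0, i.e. a reasonably balanced initial allocation.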
Assumptions
Passive Fault Tolerance
Support for Horizontal Scaling (see paper for details)
Motivation & Contributions
The input rate and data distribution of streaming inputs may change over time
Need for horizontal scaling at runtime
Need for load balancing at runtime
Both issues can be handled with state migration
Migration typically incurs overhead proportional to the size of the state to migrate
It is therefore imperative to make state migration as cheap as possible
Paper contributions
Design a scheduler to let Apache Storm support horizontal scaling
Checkpoint-Assisted Low-Latency State-Migration
Checkpoint Allocation: Using correlation to do better
Fault Tolerance
Upstream FT
Operators buffer output until the downstream operators no longer depend on it
Buffers are potentially unbounded
Passive FT
Extension of upstream with checkpoints
A checkpoint is created periodically and stored on an external node
After a checkpoint is created, the buffers can safely be trimmed
Active FT
Not considered in this work
Passive FT - Example
[Figure: example job with 7 operators allocated across 3 nodes. Each operator is annotated with its last seen timestamp (Last TS, e.g. 99), its last checkpointed timestamp (Last CP, e.g. 90), and an output buffer of recent tuples (OutBuf: 101, 100, 99, 98, 97, 96, 95, …).]
Each operator maintains
Output buffer
Last seen timestamp (Last TS)
Last checkpointed timestamp (Last CP)
Checkpoints are made periodically
Upstream output buffers are trimmed based on the downstream "Last CP"
In case of fault
Checkpoint is loaded
Upstream output buffers are replayed
Regular processing can continue
Supports At-Least-Once / Exactly-Once
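The bookkeeping above can be sketched as follows for a single upstream/downstream pair; the Operator class and its fields are invented for illustration and do not correspond to Storm's API:

```python
# Sketch of the passive-FT bookkeeping for a single upstream/downstream
# pair; the Operator class and its fields are invented for illustration.
import collections

class Operator:
    def __init__(self):
        self.state = {}                      # operator state (here: key counts)
        self.out_buf = collections.deque()   # buffered output tuples (ts, key)
        self.last_ts = 0                     # last seen timestamp
        self.last_cp = 0                     # last checkpointed timestamp
        self.checkpoint = None               # stored on an external node in reality

    def process(self, ts, key):
        self.state[key] = self.state.get(key, 0) + 1
        self.last_ts = ts
        self.out_buf.append((ts, key))       # keep output until downstream checkpoints

    def make_checkpoint(self):
        self.checkpoint = (self.last_ts, dict(self.state))
        self.last_cp = self.last_ts

    def trim(self, downstream_last_cp):
        # Output covered by the downstream "Last CP" is safe to drop
        while self.out_buf and self.out_buf[0][0] <= downstream_last_cp:
            self.out_buf.popleft()

    def recover(self):
        # On failure: reload the checkpoint; upstream then replays its buffer
        self.last_ts, self.state = self.checkpoint[0], dict(self.checkpoint[1])

upstream, downstream = Operator(), Operator()
for ts in range(1, 6):
    upstream.process(ts, "a")
    downstream.process(ts, "a")
downstream.make_checkpoint()       # downstream Last CP = 5
upstream.trim(downstream.last_cp)  # upstream buffer can now be emptied
```

After a fault, the downstream operator reloads its checkpoint and the upstream operator replays everything still in its output buffer, which is what yields at-least-once delivery.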
State-Migration
Direct State-Migration
Pause execution (op2)
Install new operator on target node (install op 2 on node 2)
Serialize state (op 2) & send to new node (node 2)
Redirect tuples to new node (node 1 → node 2)
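The four direct-migration steps can be sketched as follows; plain dicts stand in for cluster nodes, and nothing here is a Storm API:

```python
# Minimal sketch of direct state migration for operator "op2"; plain
# dicts stand in for cluster nodes, and nothing here is a Storm API.
import pickle

def direct_migrate(old_node, new_node, op_id):
    old_node["paused"].add(op_id)                          # 1. pause execution
    new_node["operators"][op_id] = None                    # 2. install operator on target
    blob = pickle.dumps(old_node["operators"].pop(op_id))  # 3. serialize state...
    new_node["operators"][op_id] = pickle.loads(blob)      #    ...and send it over
    old_node["routes"][op_id] = new_node["name"]           # 4. redirect tuples
    old_node["paused"].discard(op_id)

old = {"name": "node1", "operators": {"op2": {"a": 5}}, "paused": set(), "routes": {}}
new = {"name": "node2", "operators": {}, "paused": set(), "routes": {}}
direct_migrate(old, new, "op2")
```

Note that processing of op 2 is paused for the entire serialize-and-ship step, so the latency penalty grows with the size of the state; this is the overhead the CP-assisted variant avoids.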
CP-Assisted State-Migration
Migrate to node which contains the newest checkpoint
Redirect tuples and buffer at new node (node 1 → node 2)
Install new operator on target node & convert cp to state (install op 2 on node 2)
Replay all upstream buffers (from op 1 on node 1)
[Figure: example job with 4 operators on 3 nodes; operator 2 is migrated from node 1 to node 2.]
Checkpoint-Assisted Low-Latency State-Migration
Step 1: Migrate to the node which contains the newest checkpoint
[Figure: operator 2 is migrated from node 1 to node 2.]
Step 2: Duplicate tuples to both old and new node
[Figure: the new operator 2 buffers incoming data while the old operator 2 continues processing.]
Step 3: Convert checkpoint to state & replay buffers
[Figure: the new operator 2 converts the checkpoint to state and processes the output buffer from op 1; the old operator 2 continues processing.]
Step 4: Finish processing output buffers
[Figure: the new operator 2 finishes processing the output buffers, then processes the tuples buffered during migration; the old operator 2 continues processing.]
Step 5: When the new node has "caught up", synchronize
[Figure: operator 2 on node 2 has caught up, so a synchronization is performed. The old operator 2 stops receiving inputs and is then converted to a checkpoint.]
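The five steps can be condensed into a sketch for a simple counting operator; function and variable names are illustrative, not the paper's implementation:

```python
# Sketch of the five migration steps for a simple counting operator;
# function and variable names are illustrative, not the paper's code.
def cp_assisted_migrate(checkpoint, upstream_buffer, live_stream):
    """checkpoint: (ts, state) already stored on the target node.
    upstream_buffer: upstream output tuples replayed after the checkpoint.
    live_stream: tuples that arrive while the migration is in progress."""
    # Steps 1-2: tuples are duplicated to old and new node; the target
    # buffers them while the old operator keeps processing (no pause).
    buffered = list(live_stream)

    # Step 3: convert the local checkpoint into operator state.
    cp_ts, state = checkpoint
    state = dict(state)

    # Steps 3-4: replay the upstream output buffer, then the tuples
    # buffered during migration, skipping what the checkpoint covers.
    for ts, key in list(upstream_buffer) + buffered:
        if ts > cp_ts:
            state[key] = state.get(key, 0) + 1

    # Step 5: the target has caught up; after synchronization the old
    # operator stops receiving inputs and is converted to a checkpoint.
    return state
```

Because the old operator keeps processing until the synchronization point, the job never pauses for the full state transfer, which is where the low latency comes from.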
Optimizing checkpoint allocation
for partial checkpoints
Optimize the allocation of checkpoints such that the loads of the processing node
and the checkpointing node are negatively correlated
This is only needed when using partial checkpoints
The problem is NP-hard; heuristic solution:
Calculate the "impact" of allocating each checkpoint to each node
Calculate the "importance" of each checkpoint
Assign the checkpoint with the largest "importance"
Loop, until no more checkpoints
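One possible reading of the greedy loop, with simplified stand-ins for the paper's exact definitions (here "impact" is the Pearson correlation between an operator's load trace and a candidate node's load trace, and "importance" is the spread between an operator's best and worst placements); all traces and names are made up:

```python
# One possible reading of the greedy heuristic; "impact" and
# "importance" are simplified stand-ins for the paper's definitions.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def allocate_checkpoints(op_load, node_load, home):
    """op_load: operator -> load trace; node_load: node -> load trace;
    home: operator -> node that processes it (checkpoints go elsewhere)."""
    # "Impact" of placing an operator's checkpoint on a node: how
    # correlated that node's load is with the operator's load.
    impact = {op: {n: pearson(trace, node_load[n])
                   for n in node_load if n != home[op]}
              for op, trace in op_load.items()}
    placement = {}
    remaining = set(op_load)
    while remaining:  # loop until no more checkpoints
        # Most "important" first: largest gap between best and worst node
        op = max(remaining,
                 key=lambda o: max(impact[o].values()) - min(impact[o].values()))
        # Place on the least-correlated (ideally negatively correlated) node
        placement[op] = min(impact[op], key=impact[op].get)
        remaining.discard(op)
    return placement
```

For instance, an operator whose load trace rises while node 3's load falls would have its checkpoint placed on node 3, so that recovery work lands on a node that tends to be idle when the operator is busy.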
Evaluation – State-Migration
Executed with 25 nodes on Amazon EC2
Real dataset: Airline On-Time (provided by United States Department of Transportation)
Apache Storm with stabilization period of 500 seconds, then one migration
Evaluation – Checkpoint Allocation
Executed with 25 nodes on Amazon EC2
Real dataset: Airline On-Time (provided by United States Department of Transportation)
Job: 4 Operators, 3 Highly Correlated, 1 Highly Uncorrelated
Apache Storm executed for 48 minutes (with 15 min statistics collection)