Latency-aware Elastic Scaling for Distributed Data Stream Processing Systems

Thomas Heinze, Zbigniew Jerzak, Gregor Hackenbroich, Christof Fetzer

May 27, 2014

Abstract

Elastic scaling allows a data stream processing system to react to a dynamically changing query or event workload by automatically scaling in or out. Thereby, both unpredictable load peaks and underload situations can be handled. However, each scaling decision comes with a latency penalty due to the required operator movements. Therefore, in practice an elastic system may improve system utilization, but it cannot provide the latency guarantees defined by a service level agreement (SLA). In this paper we introduce an elastic scaling system which optimizes utilization under latency constraints defined by an SLA. Specifically, we present a model which estimates the latency spike created by a set of operator movements. We use this model to build a latency-aware elastic operator placement algorithm which minimizes the number of latency violations. We show that our solution reduces the 90th percentile of the end-to-end latency by up to 30% and the number of latency violations by 50%, while achieving a system utilization comparable to a scaling strategy that does not use latency as an optimization target.



Utilization in Cloud Environments

Twitter's cluster[1] has an average CPU utilization below 20%, although ~80% of its resources are reserved.

The Google cluster trace[2] shows an average CPU utilization of 25-35% and an average memory utilization of 40%.

The average utilization in public clouds is estimated to be between 6% and 12%[3].

[1] Benjamin Hindman, et al. "Mesos: A Platform for Fine-grained Resource Sharing in the Data Center." NSDI, 2011.

[2] Charles Reiss, et al. "Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis." SoCC, 2012.

[3] Arunchandar Vasan, et al. "Worth Their Watts? An Empirical Study of Datacenter Servers." HPCA, 2010.


Elasticity

Users need to reserve the required resources, but have:

Limited understanding of the performance of the system

Limited knowledge of the workload characteristics

Figure: Workload over time compared with static and elastic provisioning; a static allocation alternates between overprovisioning and underprovisioning as the load varies.


Elastic Data Stream Processing[1,2]

Long-standing continuous queries over potentially infinite data streams

Small memory footprint (MB to GB) for most use cases, fast scale out

Strict requirements on end-to-end latency

Unpredictable workload with high variability (within seconds to minutes)

Load balancing influences running queries

Figure: Input streams are processed by the data stream processing engine, which produces output streams.

[1] V. Gulisano, et al. "StreamCloud: An Elastic and Scalable Data Streaming System." IEEE TPDS, 2012.

[2] R. Fernandez, et al. "Integrating Scale Out and Fault Tolerance in Stream Processing Using Operator State Management." SIGMOD, 2013.


Impact on End-to-End Latency

Figure: Two time series over ~400 time units. Top: number of used hosts and number of queries over time. Bottom: end-to-end latency [s] and number of moved operators over time; latency spikes align with operator movements.


Outline

1. Introduction

2. Movement Cost Modeling

3. Latency-aware Elastic Scaling

4. Evaluation

5. Conclusion and Future Work


Operator Movement Protocol

Pause & Resume protocol[1]

Figure: Movement of operator A1 between hosts: the upstream operators feeding A1 are paused, A1's state is extracted on the source host and transferred to the target host, and processing resumes there. A sketch of this sequence follows below.

[1] M. Shah, et al. "Flux: An Adaptive Partitioning Operator for Continuous Query Systems." ICDE, 2003.
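As a rough illustration, the following sketch shows the pause & resume sequence using hypothetical engine primitives (pause_output, drain_input_queues, extract_state, and so on); it sketches the general protocol, not the engine's actual API.

```python
# Minimal sketch of a Flux-style pause & resume operator movement.
# All engine primitives used here are hypothetical.

def move_operator(op, source_host, target_host):
    """Move operator `op` from source_host to target_host."""
    # 1. Pause the producers feeding `op`; their output tuples are
    #    buffered upstream for the duration of the move.
    upstream = op.upstream_operators()
    for producer in upstream:
        producer.pause_output(op)

    # 2. Drain in-flight tuples so `op`'s state is consistent.
    op.drain_input_queues()

    # 3. Extract, transfer, and restore the operator state.
    state = source_host.extract_state(op)
    target_host.install_operator(op, state)

    # 4. Re-route the streams and resume the producers; replaying the
    #    buffered tuples causes the latency spike modeled next.
    for producer in upstream:
        producer.reconnect_output(op, target_host)
        producer.resume_output(op)
```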


Movement Cost Model

For every operator $op$ that is paused by a set of movements $ops_{moved}$, the model first estimates the queue length built up during the pause and then the resulting latency spike:

\[ pauseTime(op) = \max\{\, moveTime(o) \mid o \in succ(op),\ o \in ops_{moved} \,\} \]

\[ moveTime(o) = f(stateSize(o),\ |ops_{moved}|,\ load(h, t)) \]

\[ ql(op) = pauseTime(op) \cdot inputRate(op, t) \]

\[ delay_{proc}(op) = 0.5 \cdot ql(op) \cdot procTime(op, t) \]

\[ delay_{arrival}(op) = 0.5 \cdot ql(op) \,/\, inputRate(op, t) \]

\[ latSpike(op) = pauseTime(op) + delay_{proc}(op) - delay_{arrival}(op) \]

\[ lat(q_i, t+1) = lat(q_i, t) + \sum_{op \in pausedOps(q_i)} latSpike(op) \]

Here $delay_{proc}$ is the average wait for draining the built-up queue, while $delay_{arrival}$ corrects for tuples that arrive during the pause and therefore wait, on average, only half of it.
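The model maps directly onto code. Below is a minimal sketch of how the estimate could be computed, assuming hypothetical monitoring accessors (input_rate, proc_time) and a move-time predictor move_time standing in for the fitted function f above.

```python
# Sketch of the movement cost model above. The accessors input_rate,
# proc_time and the predictor move_time are hypothetical stand-ins
# for the engine's monitoring layer and the fitted function f.

def pause_time(op, moved_ops, move_time):
    """Pause time of op: longest move among its moved successors."""
    moved_succ = [o for o in op.successors if o in moved_ops]
    return max((move_time(o) for o in moved_succ), default=0.0)

def latency_spike(op, moved_ops, move_time, input_rate, proc_time):
    """Estimated latency spike caused by pausing op during a move."""
    pt = pause_time(op, moved_ops, move_time)
    rate = input_rate(op)
    ql = pt * rate                                # tuples queued while paused
    delay_proc = 0.5 * ql * proc_time(op)         # avg. queue-drain wait
    delay_arrival = 0.5 * ql / rate if rate else 0.0  # pause already waited
    return pt + delay_proc - delay_arrival

def predicted_latency(lat_now, paused_ops, moved_ops,
                      move_time, input_rate, proc_time):
    """lat(q, t+1): current latency plus the spikes of all paused ops."""
    return lat_now + sum(
        latency_spike(op, moved_ops, move_time, input_rate, proc_time)
        for op in paused_ops)
```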


Validation

Confusion matrix of estimated vs. measured latency peak class (rows: estimated, columns: measured):

            Measured
Estimated   Small   Medium   Large
Small       68%     0%       0%
Medium      19%     7%       0%
Large       2%      1%       3%

Estimated latency peaks for 378 scaling decisions, grouped into three classes:

small: 0.0 s < lat < 5.0 s

medium: 5.0 s < lat < 10.0 s

large: lat > 10.0 s

High precision and no under-estimation: 78% of the decisions are classified correctly (68% + 7% + 3% on the diagonal), and the estimate never falls into a smaller class than the measurement.


LATENCY-AWARE ELASTIC SCALING


Latency-aware Elastic Scaling

User-defined threshold for the maximum tolerable latency

Differentiate between necessary and optional scaling decisions:

Necessary: must respond to overloaded hosts

Optional: releasing a host can be postponed

General idea: take a scaling decision only if its estimated movement cost stays below the latency threshold (see the sketch below)
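As an illustration, a minimal sketch of this gating, assuming a hypothetical estimate_spike function that evaluates the movement cost model from the previous section:

```python
# Sketch of latency-aware gating of scaling decisions. The decision
# object and estimate_spike are hypothetical helpers.

LATENCY_THRESHOLD = 2.0  # user-defined maximum latency spike [s]

def gate(decision, estimate_spike):
    if decision.necessary:
        # An overloaded host must be relieved now; the selection step
        # below still picks the movements with the smallest spike.
        return "apply"
    # Optional decision (e.g. releasing an underloaded host): apply it
    # only if the estimated spike stays below the latency threshold.
    if estimate_spike(decision.movements) <= LATENCY_THRESHOLD:
        return "apply"
    return "postpone"
```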


When should we scale ?

Upper threshold: "If the CPU utilization of a host is larger than x for y seconds, the host is marked as overloaded."

Lower threshold: "If the average CPU utilization of all hosts is smaller than z for w seconds, a host is marked as underloaded."

Additional parameters: target utilization, grace period. A sketch of this trigger follows below.
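A minimal sketch of such a trigger, assuming the 10-second monitoring interval from the evaluation setup; the threshold values are placeholders, not the paper's settings.

```python
# Sketch of the threshold-based scaling trigger. x, y, z, w are the
# parameters named on this slide; the default values are placeholders.

from collections import deque

class ScalingTrigger:
    def __init__(self, x=0.9, y=30, z=0.4, w=60, interval=10):
        self.x, self.z = x, z
        self.over_n = y // interval    # samples that must exceed x
        self.under_n = w // interval   # samples whose average must stay below z
        self.samples = {}              # host -> recent CPU utilization samples

    def observe(self, host, cpu):
        n = max(self.over_n, self.under_n)
        self.samples.setdefault(host, deque(maxlen=n)).append(cpu)

    def overloaded(self, host):
        """CPU utilization of `host` above x for y seconds."""
        recent = list(self.samples.get(host, ()))[-self.over_n:]
        return len(recent) == self.over_n and min(recent) > self.x

    def underloaded(self):
        """Average CPU utilization of all hosts below z for w seconds."""
        windows = [list(s)[-self.under_n:] for s in self.samples.values()]
        if not windows or any(len(w) < self.under_n for w in windows):
            return False
        # Average across hosts at each of the last w seconds' samples.
        averages = [sum(col) / len(col) for col in zip(*windows)]
        return max(averages) < self.z
```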


Latency-aware Scaling

Scale-out decisions:

- Move a subset of operators to resolve the overload (a subset selection problem; see the sketch below)
- Best solution: the set of operators that creates the minimal latency peak

Scale-in decisions:

- Release the host with the minimal latency peak, if possible
- If the estimated latency exceeds the threshold: move away only a subset of the operators from the host with the minimal peak
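A minimal sketch of the scale-out case; the greedy ordering is an illustrative assumption, not necessarily the paper's exact subset search, and estimate_spike and the cpu_load attribute are hypothetical.

```python
# Sketch of latency-aware operator selection for scale-out. The greedy
# strategy is an illustrative assumption; estimate_spike applies the
# movement cost model to a proposed set of movements.

def select_movements(host, target_load, estimate_spike):
    """Choose operators to move off an overloaded `host` until its load
    drops to `target_load`, preferring small estimated latency spikes."""
    remaining = host.cpu_load
    moved = []
    for op in sorted(host.operators, key=lambda o: estimate_spike([o])):
        if remaining <= target_load:
            break
        moved.append(op)
        remaining -= op.cpu_load
    return moved   # total estimated spike: estimate_spike(moved)
```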


EVALUATION


System Architecture[1]

Figure: Architecture of the elastic data stream processing engine: input streams are handled by multiple elastic CEP engine instances (processing), overseen by a coordination layer that contains the operator placement component; results leave the system as output streams.

[1] Thomas Heinze, et al. "Auto-scaling Techniques for Elastic Data Stream Processing." SMDB, 2014.


Setup

Private cloud environment with 10 hosts

Financial workload taken from the Frankfurt Stock Exchange

Varying query workload with up to 35 queries

Characteristics such as CPU load and latency are measured in 10-second intervals

Latency is evaluated via the number of threshold violations (> 2, 3, 4, 5 sec.), counted as sketched below

Two baseline algorithms: a state size heuristic and a CPU load heuristic
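For concreteness, a minimal sketch of the violation metric; the variable names are illustrative.

```python
# Sketch of the latency-violation metric: one end-to-end latency
# sample is recorded per 10-second measurement interval.

def count_violations(latencies, threshold):
    """Number of intervals whose latency exceeds the threshold [s]."""
    return sum(1 for lat in latencies if lat > threshold)

# One count per reported threshold:
# {t: count_violations(latencies, t) for t in (2, 3, 4, 5)}
```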


Different Latency Thresholds

Figure: Left: average latency and average, minimum, and maximum utilization for latency thresholds of 2, 3, and 4 seconds. Right: number of latency violations (for limits of 2, 3, 4, and 5 seconds) per latency threshold.

The configured latency threshold is reflected in the measured latencies and violation counts.


Comparison with Baseline

Figure: Left: number of latency violations (limits of 2, 3, 4, and 5 seconds) for our model and the StateSize and CPULoad baseline strategies. Right: average latency and average, minimum, and maximum utilization per selection strategy.

Our model outperforms the state-of-the-art solutions.


Different Utilization Thresholds

Figure: Left: number of latency violations (limits of 2, 3, 4, and 5 seconds) for lower utilization thresholds of 0.3, 0.4, and 0.5. Right: average latency and average, minimum, and maximum utilization per lower threshold.

Our model slows down overly aggressive scaling policies.


Summary

Elastic scaling of a data stream processing system has negative effects on the end-to-end latency

We introduced a model for estimating the cost of operator movements

Latency-aware elastic scaling significantly reduces the number of latency violations (up to 50% fewer violations)

Future Work:

Latency violations caused by overload remain; we plan to employ pro-active scaling strategies