Latency-aware Elastic Scaling for Distributed Data Stream Processing Systems

Thomas Heinze, Zbigniew Jerzak, Gregor Hackenbroich, Christof Fetzer

May 27, 2014

Abstract

Elastic scaling allows a data stream processing system to react to a dynamically changing query or event workload by automatically scaling in or out. Thereby, both unpredictable load peaks and underload situations can be handled. However, each scaling decision comes with a latency penalty due to the required operator movements. Therefore, in practice an elastic system may improve system utilization, but it cannot provide the latency guarantees defined by a service level agreement (SLA). In this paper we introduce an elastic scaling system which optimizes utilization under latency constraints defined by an SLA. Specifically, we present a model which estimates the latency spike created by a set of operator movements. We use this model to build a latency-aware elastic operator placement algorithm which minimizes the number of latency violations. We show that our solution reduces the 90th percentile of the end-to-end latency by up to 30% and the number of latency violations by 50%, while achieving a system utilization comparable to a scaling strategy that does not use latency as an optimization target.



Utilization in Cloud Environments

Twitter's cluster[1] has an average CPU utilization below 20%, although ~80% of its resources are reserved.

The Google cluster trace[2] shows an average CPU utilization of 25-35% and an average memory utilization of 40%.

The average utilization in public clouds is estimated to be between 6% and 12%[3].

[1] Benjamin Hindman, et al. "Mesos: A Platform for Fine-grained Resource Sharing in the Data Center." NSDI, 2011.

[2] Charles Reiss, et al. "Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis." SoCC, 2012.

[3] Arunchandar Vasan, et al. "Worth Their Watts? An Empirical Study of Datacenter Servers." HPCA, 2010.


Elasticity

Users need to reserve the required resources, but have:

Limited understanding of the performance of the system

Limited knowledge of the workload characteristics

Figure: Workload over time compared with static and elastic provisioning; a static allocation alternates between overprovisioning and underprovisioning as the load varies.


Elastic Data Stream Processing[1,2]

Long-standing continuous queries over potentially infinite data streams

Small memory footprint (MB to GB) for most use cases, fast scale out

Strict requirements on end-to-end latency

Unpredictable workload with high variability (within seconds to minutes)

Load balancing influences running queries

Figure: Input streams are processed by the data stream processing engine, which produces output streams.

[1] V. Gulisano, et al. "StreamCloud: An Elastic and Scalable Data Streaming System." IEEE TPDS, 2012.

[2] R. Fernandez, et al. "Integrating Scale Out and Fault Tolerance in Stream Processing Using Operator State Management." SIGMOD, 2013.


Impact on End-to-End Latency

Figure: Two time series over ~400 time units. Top: number of used hosts and number of queries over time. Bottom: end-to-end latency [s] and number of moved operators over time; latency spikes align with operator movements.


Outline

1. Introduction

2. Movement Cost Modeling

3. Latency-aware Elastic Scaling

4. Evaluation

5. Conclusion and Future Work


Operator Movement Protocol

Pause & Resume protocol[1]

Figure: Movement of operator A1 between hosts: the upstream operators feeding A1 are paused, A1's state is extracted on the source host and transferred to the target host, and processing resumes there. A sketch of this sequence follows below.

[1] M. Shah, et al. "Flux: An Adaptive Partitioning Operator for Continuous Query Systems." ICDE, 2003.
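As a rough illustration, the following sketch shows the pause & resume sequence using hypothetical engine primitives (pause_output, drain_input_queues, extract_state, and so on); it sketches the general protocol, not the engine's actual API.

```python
# Minimal sketch of a Flux-style pause & resume operator movement.
# All engine primitives used here are hypothetical.

def move_operator(op, source_host, target_host):
    """Move operator `op` from source_host to target_host."""
    # 1. Pause the producers feeding `op`; their output tuples are
    #    buffered upstream for the duration of the move.
    upstream = op.upstream_operators()
    for producer in upstream:
        producer.pause_output(op)

    # 2. Drain in-flight tuples so `op`'s state is consistent.
    op.drain_input_queues()

    # 3. Extract, transfer, and restore the operator state.
    state = source_host.extract_state(op)
    target_host.install_operator(op, state)

    # 4. Re-route the streams and resume the producers; replaying the
    #    buffered tuples causes the latency spike modeled next.
    for producer in upstream:
        producer.reconnect_output(op, target_host)
        producer.resume_output(op)
```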


Movement Cost Model

For every operator $op$ that is paused by a set of movements $ops_{moved}$, the model first estimates the queue length built up during the pause and then the resulting latency spike:

\[ pauseTime(op) = \max\{\, moveTime(o) \mid o \in succ(op),\ o \in ops_{moved} \,\} \]

\[ moveTime(o) = f(stateSize(o),\ |ops_{moved}|,\ load(h, t)) \]

\[ ql(op) = pauseTime(op) \cdot inputRate(op, t) \]

\[ delay_{proc}(op) = 0.5 \cdot ql(op) \cdot procTime(op, t) \]

\[ delay_{arrival}(op) = 0.5 \cdot ql(op) \,/\, inputRate(op, t) \]

\[ latSpike(op) = pauseTime(op) + delay_{proc}(op) - delay_{arrival}(op) \]

\[ lat(q_i, t+1) = lat(q_i, t) + \sum_{op \in pausedOps(q_i)} latSpike(op) \]

Here $delay_{proc}$ is the average wait for draining the built-up queue, while $delay_{arrival}$ corrects for tuples that arrive during the pause and therefore wait, on average, only half of it.
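The model maps directly onto code. Below is a minimal sketch of how the estimate could be computed, assuming hypothetical monitoring accessors (input_rate, proc_time) and a move-time predictor move_time standing in for the fitted function f above.

```python
# Sketch of the movement cost model above. The accessors input_rate,
# proc_time and the predictor move_time are hypothetical stand-ins
# for the engine's monitoring layer and the fitted function f.

def pause_time(op, moved_ops, move_time):
    """Pause time of op: longest move among its moved successors."""
    moved_succ = [o for o in op.successors if o in moved_ops]
    return max((move_time(o) for o in moved_succ), default=0.0)

def latency_spike(op, moved_ops, move_time, input_rate, proc_time):
    """Estimated latency spike caused by pausing op during a move."""
    pt = pause_time(op, moved_ops, move_time)
    rate = input_rate(op)
    ql = pt * rate                                # tuples queued while paused
    delay_proc = 0.5 * ql * proc_time(op)         # avg. queue-drain wait
    delay_arrival = 0.5 * ql / rate if rate else 0.0  # pause already waited
    return pt + delay_proc - delay_arrival

def predicted_latency(lat_now, paused_ops, moved_ops,
                      move_time, input_rate, proc_time):
    """lat(q, t+1): current latency plus the spikes of all paused ops."""
    return lat_now + sum(
        latency_spike(op, moved_ops, move_time, input_rate, proc_time)
        for op in paused_ops)
```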


Validation

Confusion matrix of estimated vs. measured latency peak class (rows: estimated, columns: measured):

            Measured
Estimated   Small   Medium   Large
Small       68%     0%       0%
Medium      19%     7%       0%
Large       2%      1%       3%

Estimated latency peaks for 378 scaling decisions, grouped into three classes:

small: 0.0 s < lat < 5.0 s

medium: 5.0 s < lat < 10.0 s

large: lat > 10.0 s

High precision and no under-estimation: 78% of the decisions are classified correctly (68% + 7% + 3% on the diagonal), and the estimate never falls into a smaller class than the measurement.


LATENCY-AWARE ELASTIC SCALING


Latency-aware Elastic Scaling

User-defined threshold for the maximum tolerable latency

Differentiate between necessary and optional scaling decisions:

Necessary: must respond to overloaded hosts

Optional: releasing a host can be postponed

General idea: take a scaling decision only if its estimated movement cost stays below the latency threshold (see the sketch below)
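As an illustration, a minimal sketch of this gating, assuming a hypothetical estimate_spike function that evaluates the movement cost model from the previous section:

```python
# Sketch of latency-aware gating of scaling decisions. The decision
# object and estimate_spike are hypothetical helpers.

LATENCY_THRESHOLD = 2.0  # user-defined maximum latency spike [s]

def gate(decision, estimate_spike):
    if decision.necessary:
        # An overloaded host must be relieved now; the selection step
        # below still picks the movements with the smallest spike.
        return "apply"
    # Optional decision (e.g. releasing an underloaded host): apply it
    # only if the estimated spike stays below the latency threshold.
    if estimate_spike(decision.movements) <= LATENCY_THRESHOLD:
        return "apply"
    return "postpone"
```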


When should we scale ?

Upper threshold: "If the CPU utilization of a host is larger than x for y seconds, the host is marked as overloaded."

Lower threshold: "If the average CPU utilization of all hosts is smaller than z for w seconds, a host is marked as underloaded."

Additional parameters: target utilization, grace period. A sketch of this trigger follows below.
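A minimal sketch of such a trigger, assuming the 10-second monitoring interval from the evaluation setup; the threshold values are placeholders, not the paper's settings.

```python
# Sketch of the threshold-based scaling trigger. x, y, z, w are the
# parameters named on this slide; the default values are placeholders.

from collections import deque

class ScalingTrigger:
    def __init__(self, x=0.9, y=30, z=0.4, w=60, interval=10):
        self.x, self.z = x, z
        self.over_n = y // interval    # samples that must exceed x
        self.under_n = w // interval   # samples whose average must stay below z
        self.samples = {}              # host -> recent CPU utilization samples

    def observe(self, host, cpu):
        n = max(self.over_n, self.under_n)
        self.samples.setdefault(host, deque(maxlen=n)).append(cpu)

    def overloaded(self, host):
        """CPU utilization of `host` above x for y seconds."""
        recent = list(self.samples.get(host, ()))[-self.over_n:]
        return len(recent) == self.over_n and min(recent) > self.x

    def underloaded(self):
        """Average CPU utilization of all hosts below z for w seconds."""
        windows = [list(s)[-self.under_n:] for s in self.samples.values()]
        if not windows or any(len(w) < self.under_n for w in windows):
            return False
        # Average across hosts at each of the last w seconds' samples.
        averages = [sum(col) / len(col) for col in zip(*windows)]
        return max(averages) < self.z
```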


Latency-aware Scaling

Scale-out decisions:

- Move a subset of operators to resolve the overload (a subset selection problem; see the sketch below)
- Best solution: the set of operators that creates the minimal latency peak

Scale-in decisions:

- Release the host with the minimal latency peak, if possible
- If the estimated latency exceeds the threshold: move away only a subset of the operators from the host with the minimal peak
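A minimal sketch of the scale-out case; the greedy ordering is an illustrative assumption, not necessarily the paper's exact subset search, and estimate_spike and the cpu_load attribute are hypothetical.

```python
# Sketch of latency-aware operator selection for scale-out. The greedy
# strategy is an illustrative assumption; estimate_spike applies the
# movement cost model to a proposed set of movements.

def select_movements(host, target_load, estimate_spike):
    """Choose operators to move off an overloaded `host` until its load
    drops to `target_load`, preferring small estimated latency spikes."""
    remaining = host.cpu_load
    moved = []
    for op in sorted(host.operators, key=lambda o: estimate_spike([o])):
        if remaining <= target_load:
            break
        moved.append(op)
        remaining -= op.cpu_load
    return moved   # total estimated spike: estimate_spike(moved)
```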


EVALUATION


System Architecture[1]

Figure: Architecture of the elastic data stream processing engine: input streams are handled by multiple elastic CEP engine instances (processing), overseen by a coordination layer that contains the operator placement component; results leave the system as output streams.

[1] Thomas Heinze, et al. "Auto-scaling Techniques for Elastic Data Stream Processing." SMDB, 2014.


Setup

Private cloud environment with 10 hosts

Financial workload taken from the Frankfurt Stock Exchange

Varying query workload with up to 35 queries

Characteristics such as CPU load and latency are measured in 10-second intervals

Latency is evaluated via the number of threshold violations (> 2, 3, 4, 5 sec.), counted as sketched below

Two baseline algorithms: a state size heuristic and a CPU load heuristic
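For concreteness, a minimal sketch of the violation metric; the variable names are illustrative.

```python
# Sketch of the latency-violation metric: one end-to-end latency
# sample is recorded per 10-second measurement interval.

def count_violations(latencies, threshold):
    """Number of intervals whose latency exceeds the threshold [s]."""
    return sum(1 for lat in latencies if lat > threshold)

# One count per reported threshold:
# {t: count_violations(latencies, t) for t in (2, 3, 4, 5)}
```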


Different Latency Thresholds

Figure: Left: average latency and average, minimum, and maximum utilization for latency thresholds of 2, 3, and 4 seconds. Right: number of latency violations (for limits of 2, 3, 4, and 5 seconds) per latency threshold.

The configured latency threshold is reflected in the measured latencies and violation counts.


Comparison with Baseline

Figure: Left: number of latency violations (limits of 2, 3, 4, and 5 seconds) for our model and the StateSize and CPULoad baseline strategies. Right: average latency and average, minimum, and maximum utilization per selection strategy.

Our model outperforms the state-of-the-art solutions.


Different Utilization Thresholds

Figure: Left: number of latency violations (limits of 2, 3, 4, and 5 seconds) for lower utilization thresholds of 0.3, 0.4, and 0.5. Right: average latency and average, minimum, and maximum utilization per lower threshold.

Our model slows down overly aggressive scaling policies.


Summary

Elastic scaling of a data stream processing system has negative effects on the end-to-end latency

We introduced a model for estimating the cost of operator movements

Latency-aware elastic scaling significantly reduces the number of latency violations (up to 50% fewer violations)

Future Work:

Latency violations caused by overload remain; we plan to employ pro-active scaling strategies