University of Pisa, Italy
PPoPP 2016 - Barcelona
Tiziano De Matteis, Gabriele Mencagli
KEEP CALM AND REACT WITH FORESIGHT
STRATEGIES FOR LOW-LATENCY AND ENERGY-EFFICIENT ELASTIC DATA STREAM PROCESSING
INTRODUCTION
Recent years have been characterized by an explosion of data streams generated by a variety of sources: social networks, sensors, stock markets, ...
Data Stream Processing (DaSP) applications: real-time processing of continuous data streams with stringent Quality of Service (QoS) requirements in a very dynamic environment. Requirements:
Parallelism to obtain performance
Elasticity to handle dynamicity
Cost effectiveness
Goal: latency-aware and energy-efficient scaling strategies with predictive capabilities
Stateful Operator
BACKGROUND
Applications are expressed as graphs of operators (vertices) that communicate through streams (edges). We will focus on stateful operators.
In many contexts, the physical input stream conveys tuples belonging to multiple logical substreams. Examples from network monitoring, financial applications, social networks, ...
These applications require maintaining a separate state (e.g. a window) for each substream and applying the computation on a per-substream basis.
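To make the per-substream state concrete, here is a minimal sketch of a count-based sliding window kept separately for each key. The class name `KeyedSlidingWindows` and the triggering rule (first result when the window is full, then one result every `slide` tuples) are assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict, deque


class KeyedSlidingWindows:
    """Count-based sliding window kept separately for each key (substream).

    Hypothetical helper for illustration only.
    """

    def __init__(self, size, slide):
        self.size = size
        self.slide = slide
        # One bounded buffer per key: old tuples are evicted automatically.
        self.windows = defaultdict(lambda: deque(maxlen=size))
        # Tuples received per key since the last triggered computation.
        self.since_last = defaultdict(int)

    def insert(self, key, value):
        """Add a tuple; return the window content if it triggers, else None."""
        window = self.windows[key]
        window.append(value)
        self.since_last[key] += 1
        if len(window) == self.size and self.since_last[key] >= self.slide:
            self.since_last[key] = 0
            return list(window)
        return None


if __name__ == "__main__":
    w = KeyedSlidingWindows(size=4, slide=2)
    for v in range(1, 7):
        print(v, w.insert("A", v))
```

Only the tuples that make `insert` return a window would start a computation, so the cost of the operator depends on how often each key triggers.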
BACKGROUND
Parallelized partitioned stateful operator: each state partition is owned by an operator replica
○ Splitter: distributes tuples using a hash function that maps each key to a replica in [1:n]
○ Merger: collects the results from the replicas
The most used parallel schema, implemented in various DaSP frameworks (e.g. Storm)
[Figure: input stream → SPLITTER → REPLICA 1 ... REPLICA n → MERGER → output stream]
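The splitter logic can be sketched as follows. This is only an illustration: the use of CRC32 as hash function and the names `replica_for` and `split` are assumptions, since the slides do not say which hash function is used.

```python
import zlib


def replica_for(key: str, n_replicas: int) -> int:
    """Map a key to a replica index in [1, n_replicas] (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % n_replicas + 1


def split(tuples, n_replicas):
    """Group (key, value) tuples by the replica that owns the key."""
    queues = {r: [] for r in range(1, n_replicas + 1)}
    for key, value in tuples:
        queues[replica_for(key, n_replicas)].append((key, value))
    return queues


if __name__ == "__main__":
    stream = [("AAPL", 1), ("MSFT", 2), ("AAPL", 3)]
    for replica, items in split(stream, 4).items():
        print(replica, items)
```

All tuples with the same key reach the same replica, so each replica can keep the state of its keys without synchronization. Changing the number of replicas changes the mapping, which is why a reconfiguration implies moving state between replicas.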
Scaling strategies will change the operator configuration (e.g. number of replicas, CPU frequency, ...) in order to face all the D-* challenges
DYNAMICITY
○ the arrival rate;
○ keys frequency distribution;
○ processing time per tuple.
[Figure: SYSTEM and CONTROLLER blocks, with disturbances]
MODEL PREDICTIVE CONTROL
Model Predictive Control (MPC) approach: actions are taken by using a model to predict the future system behavior over a limited prediction horizon h.
[Figure: controller blocks: Disturbance Forecaster, System Model, Optimizer, decision variables]
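The system is observed through the measured disturbances at each control step, and their future values are estimated.
A system model is used to compare and evaluate alternative configurations.
An optimization problem is solved over the prediction horizon.
The result is a reconfiguration trajectory: one configuration for each step of the horizon.
Only the first configuration of the trajectory is applied; the procedure is repeated at the next control step.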
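One control step of this scheme can be sketched as below. The sketch is a simplification under stated assumptions: trajectories are enumerated exhaustively, and the function `mpc_step` together with the cost functions `step_cost` and `switch_cost` are hypothetical placeholders, not the optimizer used by the authors.

```python
from itertools import product


def mpc_step(current, candidates, forecasts, step_cost, switch_cost):
    """Return (first configuration, whole trajectory) of the cheapest plan.

    current:     configuration in use now
    candidates:  admissible configurations
    forecasts:   predicted disturbances, one entry per step of the horizon
    step_cost:   step_cost(config, disturbance) -> float (QoS + resource cost)
    switch_cost: switch_cost(previous, config) -> float
    """
    if not forecasts:
        raise ValueError("the prediction horizon must contain at least one step")
    best_cost, best_traj = float("inf"), None
    # Exhaustive search: fine for a sketch, exponential in the horizon length.
    for traj in product(candidates, repeat=len(forecasts)):
        cost, prev = 0.0, current
        for config, disturbance in zip(traj, forecasts):
            cost += step_cost(config, disturbance) + switch_cost(prev, config)
            prev = config
        if cost < best_cost:
            best_cost, best_traj = cost, traj
    # Receding horizon: only the first configuration is applied.
    return best_traj[0], best_traj


if __name__ == "__main__":
    # Toy setting (assumed): a configuration is a number of replicas,
    # each replica serves 100 tuples/s, a disturbance is an arrival rate.
    def step_cost(n, rate):
        return n + (10.0 if n * 100 < rate else 0.0)

    def switch_cost(prev, n):
        return 0.5 * abs(n - prev)

    print(mpc_step(1, [1, 2, 3, 4], [150, 350], step_cost, switch_cost))
```

In the toy setting the planned trajectory grows from 2 to 4 replicas, but only the first value is applied; at the next control step the forecasts are refreshed and the plan is recomputed.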
ELASTIC OPERATOR
The parallel schema now incorporates the controller
[Figure: splitter (S), replicas (R), merger (M) and CONTROLLER]
Measured disturbances (for the previous control step):
○ (T_A, σ_A): mean and standard deviation of the inter-arrival time per triggering tuple;
○ {p_k}: keys frequency distribution;
○ {t_k}: computation time for each key.
Decision variables: u = (n, f), the number of replicas (n) and the CPU frequency (f).
System models: used to predict the values of the QoS variables under a given configuration.
SYSTEM MODELS
Latency (or, more formally, the response time): we use a Queueing Theory approach. For each control step it is expressed as the sum of two terms, the waiting time W_Q and the processing time.
To find W_Q we model the operator as a G/G/1 queueing system (Kingman's approximation).
A feedback mechanism is used to increase the precision of the model.
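For reference, the standard form of Kingman's approximation for a G/G/1 queue is W_Q ≈ (ρ / (1 − ρ)) · ((c_a² + c_s²) / 2) · T_S, with utilization ρ = T_S / T_A and coefficients of variation c_a = σ_A / T_A and c_s = σ_S / T_S. A minimal sketch follows; the symbols T_S and σ_S (mean and standard deviation of the service time) and the function names are assumptions, and how the service time of the whole parallel operator is obtained from n and f is not covered here.

```python
def kingman_waiting_time(t_a, sigma_a, t_s, sigma_s):
    """Approximate mean waiting time of a G/G/1 queue (Kingman's formula).

    t_a, sigma_a: mean and standard deviation of the inter-arrival time
    t_s, sigma_s: mean and standard deviation of the service time (assumed inputs)
    """
    rho = t_s / t_a  # utilization factor
    if rho >= 1.0:
        return float("inf")  # the queue is unstable: waiting time grows unbounded
    ca2 = (sigma_a / t_a) ** 2  # squared coefficient of variation of arrivals
    cs2 = (sigma_s / t_s) ** 2  # squared coefficient of variation of service
    return (rho / (1.0 - rho)) * ((ca2 + cs2) / 2.0) * t_s


def response_time(t_a, sigma_a, t_s, sigma_s):
    """Latency = waiting time + processing time."""
    return kingman_waiting_time(t_a, sigma_a, t_s, sigma_s) + t_s


if __name__ == "__main__":
    # Times in milliseconds, chosen only for illustration.
    print(response_time(t_a=2.0, sigma_a=2.0, t_s=1.0, sigma_s=1.0))
```

The waiting time grows sharply as ρ approaches 1, which is why a configuration that is only slightly under-provisioned can produce large latency violations.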
SYSTEM MODELS
Power: owing to the unbounded nature of DaSP computations, minimizing the instantaneous power is the main way to reduce energy consumption.
Power at each control step is proportional to the number of replicas, the CPU frequency and the square of the supply voltage (which depends on f).
Rationale: computation time is inversely proportional to frequency. That is, halving the frequency doubles the computation time but uses less than half the power.
This model will be used to compare different operator configurations.
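A minimal sketch of such a relative power model is given below. The voltage/frequency relation `example_voltage` is a made-up linear law used only to make the example runnable; real values depend on the processor and must be measured.

```python
def relative_power(n_replicas, freq_ghz, voltage_of):
    """Relative power, proportional to n * f * V(f)^2 (no absolute unit)."""
    v = voltage_of(freq_ghz)
    return n_replicas * freq_ghz * v * v


def example_voltage(freq_ghz):
    """Hypothetical linear voltage/frequency relation, for illustration only."""
    return 0.6 + 0.2 * freq_ghz


if __name__ == "__main__":
    full = relative_power(4, 2.0, example_voltage)
    half = relative_power(4, 1.0, example_voltage)
    print(f"full frequency: {full:.2f}, half frequency: {half:.2f}")
```

Because the voltage decreases together with the frequency, the half-frequency configuration uses less than half the power, in line with the rationale above. Being relative, the model is only meant for ranking configurations against each other.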
EXPERIMENTS
Our control strategies have been evaluated on a high-frequency trading (HFT) application running on a multicore
[Figure: Source → HFT elastic operator (S, R, M, CONTROLLER) → Consumer; the input is a stream of financial quotes]
Two different datasets (2836 symbols):
○ a real one (trading day, accelerated 100x)
○ a synthetic one (random walk arrival rate)
All the dynamicity factors have to be handled.
Operator computation: fitting on aggregated quotes, with a window of 1000 tuples and a slide of 25 tuples.
EVALUATION
Two control strategies:
○ Lat-Node: resource cost depends on the number of used cores;
○ Lat-Power: resource cost depends on the power consumed.
The arrival rate is predicted with a Holt-Winters filter.
We explicitly consider the case of =0.
Control step = 1 second.
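As an illustration of this kind of forecaster, the sketch below implements Holt's linear method (double exponential smoothing), that is, the Holt-Winters filter without the seasonal component. The smoothing parameters `alpha` and `beta` and the initialization are assumptions; the values used in the experiments are not given in the slides.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    """Forecast the next `horizon` values with Holt's linear (non-seasonal) method.

    series:  past observations, e.g. arrival rates measured at each control step
    alpha:   smoothing factor for the level (assumed value)
    beta:    smoothing factor for the trend (assumed value)
    """
    if len(series) < 2:
        raise ValueError("at least two observations are needed")
    level = series[1]
    trend = series[1] - series[0]
    for x in series[2:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # k-step-ahead forecast: extrapolate the last level along the last trend.
    return [level + (k + 1) * trend for k in range(horizon)]


if __name__ == "__main__":
    rates = [10.0, 20.0, 30.0, 40.0]  # tuples/s, illustrative values
    print(holt_forecast(rates, horizon=2))
```

With a control step of 1 second, a horizon of h steps corresponds to forecasting the arrival rate for the next h seconds.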
Strategies evaluated in terms of SASO properties:
○ Stability: no frequent reconfigurations;
○ Accuracy: minimize the QoS violations;
○ Settling time: find a stable configuration quickly;
○ Overshoot: do not overestimate the configuration.
Target architecture: dual-CPU Intel Sandy Bridge Xeon E5-2650, 16 physical cores with DVFS support.
STABILITY
Considering all the scenarios:
The switching cost reduces the number of reconfigurations. This effect is partially mitigated by increasing the horizon length.
ACCURACY
We detect a QoS violation each time the average latency is higher than a threshold δ (δ = 1.5 ms for the synthetic workload, δ = 7 ms for the real workload)
The switching cost allows the strategy to reach a better accuracy. This is partially offset by increasing the horizon length.
OVERSHOOT
We considered the resource consumption.
The use of the switching cost causes overshoot. This can be mitigated by using a longer horizon.
RESOURCE CONSUMPTION
We studied the power consumption (CPU cores) of the Lat-Node and Lat-Power strategies
Average power saving of 18.2% and 16.5%
SETTLING
In case of sudden workload changes, the strategy should be able to rapidly reach the right configuration
The switching cost reduces the average reconfiguration amplitude. Better settling time can be achieved with longer prediction horizons.
OTHER APPROACHES
We compare our approach with a peak-load configuration and two reactive strategies:
○ one based on policy rules;
○ an algorithm developed for IBM SPL, not intended for latency
Strategy       # Reconf.   QoS Violations   # Replicas
Rule-based     47.42       76               6.89
SPL-strategy   40.18       230              4.63
Lat-Node       11          30               9.97
Peak-load      -           15               12
Our approach performs fewer reconfigurations and incurs fewer QoS violations than the two reactive strategies (the SPL strategy is throughput-oriented)
CONCLUSIONS
In this work we have studied and implemented strategies for elastic DaSP operators:
○ predictive approach by using MPC methods;
○ take into account power consumption, while providing latency guarantees;
○ our strategies exhibit good stability, accuracy and lower resource consumption.
Future work:
○ extend the work to distributed memory architectures;
○ integrate the strategies in a complete graph context (not only a single operator)
ADDITIONAL REFERENCE AND ATTRIBUTIONS
References:
○ Artifact of the paper available at: https://github.com/tizianodem/elastic-hft
○ The application was developed in FastFlow, a C++ parallel programming framework for multicores: http://calvados.di.unipi.it/
○ For energy statistics and CPU frequency scaling we used the Mammut library available at: https://github.com/DanieleDeSensi/Mammut
Attribution
○ Icons used in slides 2 and 4 were designed by Freepik from www.flaticon.com