Load Shedding in Network Monitoring...

Introduction Load Shedding Fairness and NE Custom Load Shedding Conclusions

Load Shedding in Network

Monitoring Applications

Pere Barlet Ros

Advisor: Dr. Gianluca Iannaccone

Co-advisor: Prof. Josep Solé Pareta

Universitat Politècnica de Catalunya (UPC)Departament d’Arquitectura de Computadors

December 15, 2008

1 / 33

Outline

1 Introduction

2 Prediction and Load Shedding Scheme

3 Fairness of Service and Nash Equilibrium

4 Custom Load Shedding

5 Conclusions and Future Work

2 / 33

Outline

1 Introduction

MotivationRelated workContributions

3 / 33

Motivation

Network monitoring is crucial for operating data networks

Traffic engineering, network troubleshooting, anomaly detection . . .

Monitoring systems are prone to dramatic overload situations

Link speeds, anomalous traffic, bursty traffic nature . . .

Complexity of traffic analysis methods

Overload situations lead to uncontrolled packet loss

Severe and unpredictable impact on the accuracy of applications

. . . when results are most valuable!!

4 / 33

Motivation

Network monitoring is crucial for operating data networks

Traffic engineering, network troubleshooting, anomaly detection . . .

Monitoring systems are prone to dramatic overload situations

Link speeds, anomalous traffic, bursty traffic nature . . .

Complexity of traffic analysis methods

Overload situations lead to uncontrolled packet loss

Severe and unpredictable impact on the accuracy of applications

. . . when results are most valuable!!

Load Shedding Scheme

Efficiently handle extreme overload situations

Over-provisioning is not feasible

4 / 33

Related Work

Load shedding in data stream management systems (DSMS)

Examples: Aurora, STREAM, TelegraphCQ, Borealis, . . .

Based on declarative query languages (e.g., CQL)

Small set of operators with known (and constant) cost

Maximize an aggregate performance metric (utility or throughput)

Limitations

Restrict the type of metrics and possible uses

Assume explicit knowledge of operators’ cost and selectivity

Not suitable for non-cooperative environments

Resource management in network monitoring systems

Restricted to a pre-defined set of metrics

Limit the amount of allocated resources in advance

5 / 33

Contributions

1 Prediction method

Operates w/o knowledge of application cost and implementation

Does not rely on a specific model for the incoming traffic

2 Load shedding scheme

Anticipates overload situations and avoids packet loss

Relies on packet and flow sampling (equal sampling rates)

Extensions:

3 Packet-based scheduler

Applies different sampling rates to different queries

Ensures fairness of service with non-cooperative applications

4 Support for custom-defined load shedding methods

Safely delegates load shedding to non-cooperative applications

Still ensures robustness and fairness of service

6 / 33

Outline

1 Introduction

2 Prediction and Load Shedding SchemeCase Study: Intel CoMoPrediction Methodology

Load Shedding SchemeEvaluation and Operational Results

7 / 33

Case Study: Intel CoMo

CoMo (Continuous Monitoring)1

Open-source passive monitoring system

Framework to develop and execute network monitoring applications

Open (shared) network monitoring platform

Traffic queries are defined as plug-in modules written in C

Contain complex computations

1http://como.sourceforge.net

8 / 33

Case Study: Intel CoMo

CoMo (Continuous Monitoring)1

Open-source passive monitoring system

Framework to develop and execute network monitoring applications

Open (shared) network monitoring platform

Traffic queries are defined as plug-in modules written in C

Contain complex computations

Traffic queries are black boxes

Arbitrary computations and data structures

Load shedding cannot use knowledge of the queries

1http://como.sourceforge.net

8 / 33

Load Shedding Approach

Working Scenario

Monitoring system supporting multiple arbitrary queries

Single resource: CPU cycles

Approach: Real-time modeling of the queries’ CPU usage

1 Find correlation between traffic features and CPU usage

Features are query agnostic with deterministic worst case cost

2 Exploit the correlation to predict CPU load

3 Use the prediction to decide the sampling rate

9 / 33

System Overview

Figure: Prediction and Load Shedding Subsystem

10 / 33

Traffic Features vs CPU Usage

0 10 20 30 40 50 60 70 80 90 1000

Packets

0 10 20 30 40 50 60 70 80 90 1000

Time (s)

Figure: CPU usage compared to the number of packets, bytes and flows

11 / 33

Traffic Features vs CPU Usage

1800 2000 2200 2400 2600 2800 30001.4

2.8x 10

packets/batch

new_5tuple_hashes < 500

500 <= new_5tuple_hashes < 700

700 <= new_5tuple_hashes < 1000

new_5tuple_hashes >= 1000

Figure: CPU usage versus the number of packets and flows

12 / 33

Prediction Methodology2

Multiple Linear Regression (MLR)

Yi = β0 + β1X1i + β2X2i + · · · + βpXpi + εi , i = 1, 2, . . . ,n.

Yi = n observations of the response variable (measured cycles)

Xji = n observations of the p predictors (traffic features)

βj = p regression coefficients (unknown parameters to estimate)

εi = n residuals (OLS minimizes SSE)

2P. Barlet-Ros et al. "Load Shedding in Network Monitoring Applications", Proc. of USENIX Annual Technical Conference, 2007.

13 / 33

Prediction Methodology2

Feature Selection

Variant of the Fast Correlation-Based Filter (FCBF)

Removes irrelevant and redundant predictors

Reduces significantly the cost and improves accuracy of the MLR

2P. Barlet-Ros et al. "Load Shedding in Network Monitoring Applications", Proc. of USENIX Annual Technical Conference, 2007.

13 / 33

Load Shedding Scheme3

Prediction and Load Shedding subsystem

1 Each 100ms of traffic is grouped into a batch of packets

2 The traffic features are efficiently extracted from the batch (multi-resolution bitmaps)

3 The most relevant features are selected (using FCBF) to be used by the MLR

4 MLR predicts the CPU cycles required by each query to run

5 Load shedding is performed to discard a portion of the batch

6 CPU usage is measured (using TSC) and fed back to the prediction system

3P. Barlet-Ros et al. "On-line Predictive Load Shedding for Network Monitoring", Proc. of IFIP-TC6 Networking, 2007.

14 / 33

Results: Load Shedding Performance

09 am 10 am 11 am 12 pm 01 pm 02 pm 03 pm 04 pm 05 pm0

sage [cycle

CoMo cycles

Load shedding cycles

Query cycles

Predicted cycles

CPU limit

Figure: Stacked CPU usage (Predictive Load Shedding)

15 / 33

Results: Load Shedding Performance

0 2 4 6 8 10 12 14 16

CPU usage [cycles/batch]

usage)

CPU limit per batch

Predictive

Original

Reactive

Figure: CDF of the CPU usage per batch

16 / 33

Results: Packet Loss

11x 10

DAG drops

(a) Original CoMo

11x 10

DAG drops

Unsampled

(b) Reactive Load Shedding

11x 10

DAG drops

Unsampled

(c) Predictive Load Shedding

Figure: Link load and packet drops

17 / 33

Results: Error of the Queries

Query original reactive predictive

application (pkts) 55.38% ±11.80 10.61% ±7.78 1.03% ±0.65

application (bytes) 55.39% ±11.80 11.90% ±8.22 1.17% ±0.76

counter (pkts) 55.03% ±11.45 9.71% ±8.41 0.54% ±0.50

counter (bytes) 55.06% ±11.45 10.24% ±8.39 0.66% ±0.60

flows 38.48% ±902.13 12.46% ±7.28 2.88% ±3.34

high-watermark 8.68% ±8.13 8.94% ±9.46 2.19% ±2.30

top-k destinations 21.63 ±31.94 41.86 ±44.64 1.41 ±3.32

Table: Errors in the query results (mean ± stdev )

Original Reactive Predictive0

18 / 33

Results: Load Shedding Overhead

sage [cycle

CoMo cycles

Load shedding cycles

Query cycles

Predicted cycles

CPU limit

Figure: Stacked CPU usage (Predictive Load Shedding)

19 / 33

Outline

1 Introduction

3 Fairness of Service and Nash EquilibriumLoad Shedding Strategy

Evaluation

20 / 33

Fairness with Non-Cooperative Users4

Previous scheme does not differentiate among queries

An equal sampling rate is applied to all queries

Better results can be obtained applying different sampling rates

Considering the tolerance to sampling of each query

Challenges

Queries are black boxes (i.e. external information is needed)

Users are non-cooperative (selfish) in nature

4P. Barlet-Ros et al. "Robust Network Monitoring in the presence of Non-Cooperative Traffic Queries", Computer Networks, 2008.

21 / 33

Fairness with Non-Cooperative Users4

Previous scheme does not differentiate among queries

An equal sampling rate is applied to all queries

Better results can be obtained applying different sampling rates

Considering the tolerance to sampling of each query

Challenges

Queries are black boxes (i.e. external information is needed)

Users are non-cooperative (selfish) in nature

Game theory-based strategy

Variant of max-min fair share allocation policy

Users provide a minimum sampling rate

Single Nash Equilibrium when users provide correct information

4P. Barlet-Ros et al. "Robust Network Monitoring in the presence of Non-Cooperative Traffic Queries", Computer Networks, 2008.

21 / 33

Load Shedding Strategy (I)

Load shedding strategy

1 If (1), (2) are not satisfied, disable queries with greatest mq × dq

2 Compute cq ’s that satisfy (1) and (2):

mmfs_cpu: Maximize minimum allocation of cycles (cq)

mmfs_pkt: Maximize minimum sampling rate (pq)

Constraints

∀q∈Q (mq × dq) ≤ cq ≤ dq (1)∑

cq ≤ C (2)

Notation

bdq = Cycles demanded by query q ∈ Q (prediction)

mq = Minimum sampling rate of query q ∈ Q

cq = Cycles allocated to query q ∈ Q

C = System capacity in CPU cycles

22 / 33

Load Shedding Strategy (II)

Advantages

1 Single Nash Equilibrium (NE) at C|Q| (fair share of C)

2 Enforces users to provide small mq constraints

3 Non-cooperative users have no incentive to lie

4 Optimal solution can be computed in polynomial time

23 / 33

Load Shedding Strategy (II)

Advantages

1 Single Nash Equilibrium (NE) at C|Q| (fair share of C)

2 Enforces users to provide small mq constraints

3 Non-cooperative users have no incentive to lie

4 Optimal solution can be computed in polynomial time

Limitations of previous approaches

Maximize an aggregate performance metric (utility, throughput)

Difficult to compute (assumptions on cost, selectivity, utility func.)

NE when all players ask for the maximum possible allocation

Not suitable for non-cooperative environments

23 / 33

Results: Performance Evaluation

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

overload level (K)

accura

no_lshed

reactive

eq_srates

mmfs_cpu

mmfs_pkt

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

overload level (K)

accura

no_lshed

reactive

eq_srates

mmfs_cpu

mmfs_pkt

Figure: Average (left) and minimum (right) accuracy with fixed mq constraints

24 / 33

Outline

1 Introduction

Proposed MethodEnforcement Policy

25 / 33

Custom Load Shedding5

Not all monitoring applications are robust against sampling

Previous scheme penalizes these queries (mq = 1)

Queries can compute more accurate results with other methods

Examples: sample-and-hold, compute summaries, . . .

The solution of providing multiple mechanisms is unviable

The monitoring system supports arbitrary queries

5P. Barlet-Ros et al. "Custom Load Shedding for Non-Cooperative Network Monitoring Applications", submitted to IEEE Infocom 2009.

26 / 33

Custom Load Shedding

Delegates the task of shedding load to end users

Queries know their implementation (system == black boxes)

Queries can compete under fair conditions for shared resources

26 / 33

Custom Load Shedding

Delegates the task of shedding load to end users

Queries know their implementation (system == black boxes)

Queries can compete under fair conditions for shared resources

Requires an enforcement mechanism

26 / 33

Example: P2P detector query

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

load shedding rate

flow sampling

packet sampling

custom load shedding

Figure: Accuracy error of a signature-based P2P detector query

27 / 33

Enforcement Policy

10.90.80.70.60.50.40.30.20.10

load shedding rate (rq)

selfish prediction

selfish actual

custom prediction

custom actual

Figure: Actual vs predicted CPU usage (p2p-detector)

MLR update

history_value =dq

1−rq

28 / 33

Outline

1 Introduction

5 Conclusions and Future WorkConclusions

Future Work

29 / 33

Conclusions

Effective load shedding methods are now a basic requirement

Increasing data rates, number of users and application complexity

Robustness against traffic anomalies and attacks

Predictive load shedding scheme

Operates without knowledge of the traffic queries

Does not rely on a specific model for the input traffic

Anticipates overload situations avoiding uncontrolled packet loss

Graceful performance degradation (sampling & custom methods)

Suitable for non-cooperative environments

Results in two operational networks show that:

The system is robust against severe overload

The impact on the accuracy of the results is minimized

30 / 33

Future Work

Study other system resources

Examples: memory, disk bandwidth, storage space, . . .

Multi-dimensional load shedding schemes

Extend the prediction model

Study queries with non-linear relationships with traffic features

Include payload-related and entropy-based features

Address resource management problem in a distributed platform

Load balancing and distribution techniques

Other metrics: bandwidth between nodes, query delays, . . .

31 / 33

Main Publications

P. Barlet-Ros, D. Amores-López, G. Iannaccone, J. Sanjuàs-Cuxart, and J.Solé-Pareta. On-line Predictive Load Shedding for Network Monitoring. In Proc.of IFIP-TC6 Networking, Atlanta, USA, May 2007.

P. Barlet-Ros, G. Iannaccone, J. Sanjuàs-Cuxart, D. Amores-López, and J.Solé-Pareta. Load Shedding in Network Monitoring Applications. In Proc. ofUSENIX Annual Technical Conf., Santa Clara, USA, June 2007.

P. Barlet-Ros, J. Sanjuàs-Cuxart, J. Solé-Pareta, and G. Iannaccone. RobustResource Allocation for Online Network Monitoring. In Proc. of Intl.Telecommunications Network Workshop on QoS in Multiservice IP Networks(ITNEWS-QoSIP), Venice, Italy, Feb. 2008.

P. Barlet-Ros, G. Iannaccone, J. Sanjuàs-Cuxart, and J. Solé-Pareta. RobustNetwork Monitoring in the presence of Non-Cooperative Traffic Queries.Computer Networks, Oct. 2008 (in press).

Under submission:

P. Barlet-Ros, G. Iannaccone, J. Sanjuàs-Cuxart, and J. Solé-Pareta. CustomLoad Shedding for Non-Cooperative Monitoring Applications. Submitted to IEEEINFOCOM, Aug. 2008.

32 / 33

Availability

The source code of load shedding system is publicly available athttp://loadshedding.ccaba.upc.edu

The CoMo monitoring system is available athttp://como.sourceforge.net

Acknowledgments

This work was funded by a University Research Grant awarded by the Intel ResearchCouncil and the Spanish Ministry of Education under contracts TSI2005-07520-C03-02(CEPOS) and TEC2005-08051-C03-01 (CATARO)

We thank CESCA and UPCnet for allowing us to evaluate the load shedding prototype in theCatalan NREN and UPC network respectively.

33 / 33

Pere Barlet Ros

December 15, 2008

33 / 33

Appendix Backup Slides *

Outline

6 Backup SlidesTestbed ScenarioIntroduction

CoMo SystemPrediction MethodPrediction Validation and EvaluationLoad Shedding Scheme

Robustness Against AttacksNash EquilibriumCustom Load SheddingPublications

-3 / 33

Testbed Scenario

scenario

optical

spli t ters

optical

spli t ters

1 Gbps1 Gbps

PC-1: online CoMo

(2 x DAG 4.3GE)

PC-2: trace collection

(2 x DAG 4.3GE)

clock synchronization

INTERNET

RedIRIS UPC networkScientific Ring

(~70 inst i tut ions)

Figure: CESCA and UPCnet scenarios

-2 / 33

Datasets

Name Date/TimeTrace Pkts Bytes Load (Mbps)(GB) (M) (GB) avg/max/min

ABILENE 14/Aug/02 09:00-11:00 34.1 532.4 370.6 411.9/623.8/286.2

CENIC 17/Mar/05 15:50-16:20 3.8 59.5 56.0 249.7/936.9/079.1

CESCA-I 02/Nov/05 16:30-17:00 8.3 103.7 81.1 360.5/483.3/197.3

CESCA-II 11/Apr/06 08:00-08:30 30.9 49.4 29.9 133.0/212.2/096.2

UPC-I 07/Nov/07 18:00-18:30 54.7 95.2 53.0 253.5/399.0/177.8

Table: Traces in our dataset

Name Date/TimeTrace Pkts Bytes Load (Mbps)(GB) (M) (GB) avg/max/min

CESCA-III 24/Oct/06 09:00-17:00 155.5 2908.2 2764.8 750.4/973.6/129.0

CESCA-IV 25/Oct/06 09:00-17:00 152.5 2867.2 2652.2 719.9/967.5/218.0

CESCA-V 05/Dec/06 09:00-17:00 138.6 2037.8 1484.8 403.3/771.6/131.0

UPC-II 24/Apr/08 09:00-09:30 47.6 61.3 46.5 222.2/282.1/176.9

Table: Online executions

-1 / 33

Monitoring Applications (Traffic Queries)

Query Description Method Costapplication Port-based application classification packet lowautofocus High volume traffic clusters per subnet packet medcounter Traffic load in packets and bytes packet lowflows Per-flow classification and number of active flows flow lowhigh-watermark High watermark of link utilization over time packet lowp2p-detector Signature-based P2P detector custom highpattern-search Identifies sequences of bytes in the payload packet highsuper-sources Detection of sources with largest fan-out flow medtop-k Ranking of the top-k destination IP addresses packet lowtrace Full-payload packet collection custom med

applic

counte

pattern

-searc

super-

(cycle

0 / 33

Challenges

1 Monitoring applications operate in real-time with live traffic

Load shedding scheme must be lightweight

Quickly adapt to overload to prevent packet loss

2 Monitoring applications are non-cooperative

They will try to obtain the maximum resource share

The system must ensure fairness of service and avoid starvation

3 Applications are arbitrary with unknown resource demands

Input data is continuous, highly variable and unpredictable

Applications developed by 3rd parties using imperative languages

No assumptions about the input traffic or application cost

1 / 33

Problem Space

Static (offline) Dynamic (online)

Static query assignment

Resource provisioning

Load shedding

Query scheduling

Global

Placement of monitors

Placement of queries

Dissemination of queries

Load distribution

Table: Resource management problem space

2 / 33

General-Purpose Network Monitoring

Developing and deploying monitoring applications is complex

Continuous streams, high-speed sources, highly variable rates, . . .

Different network devices, transport tech., system configs

E.g., NICs, DAG cards, NetFlow, NPs, SNMP, . . .

Most application code is dedicated to deal with lateral aspects

Increased application complexity

Large development times

High probability of introducing undesired errors

Open network monitoring infrastructures

Abstract away the inner workings of the infrastructure

No tailor made for a single specific application

Examples: Scriptroute, FLAME, Intel CoMo, . . .

Research community demands open monitoring infrastructures

Share measurement data from multiple network viewpoints

Fast prototype applications to evaluate novel analysis methods

Study properties of network traffic and protocols

3 / 33

CoMo Architecture

Callback Description Process

check() Stateful filterscapture

update() Packet processing

export() Processes tuples sent by captureexportaction() Decides what to do with a record

store() Stores records to disk

load() Loads records from diskquery

print() Formats records

4 / 33

Work Hypothesis

Our thesis

Cost of maintaining data structures needed to execute a querycan be modeled looking at a set of traffic features

Empirical observation

Different overhead when performing basic operations on thestate while processing incoming traffic

E.g., creating or updating entries, looking for a valid match, etc.

Cost is dominated by the overhead of some of these operations

5 / 33

Work Hypothesis

Our thesis

Cost of maintaining data structures needed to execute a querycan be modeled looking at a set of traffic features

Empirical observation

Different overhead when performing basic operations on thestate while processing incoming traffic

E.g., creating or updating entries, looking for a valid match, etc.

Cost is dominated by the overhead of some of these operations

Our method

Models queries’ cost by considering the right set of traffic features

5 / 33

Traffic Features

No. Traffic aggregate

1 src-ip

2 dst-ip

3 protocol

4 <src-ip, dst-ip>

5 <src-port, proto>

6 <dst-port, proto>

7 <src-ip, src-port, proto>

8 <dst-ip, dst-port, proto>

9 <src-port, dst-port, proto>

10 <src-ip, dst-ip, src-port, dst-port, proto>

Counters per traffic aggregate

1 Number of unique items in a batch

2 Number of new items in a measurement interval

3 Number of repeated items in a batch

4 Number of repeated items in a measurement interval

6 / 33

Multiple Linear Regression

Cost = O(min(np2, n2p))

OLS assumptions

X variables are independent (no multicollinearity)

εi is normally distributed and expected value of ε vector is 0

No correlation between residuals and exhibit constant variance

Covariance between predictors and residuals is 0

7 / 33

Fast Correlation Based Filter6

FCBF Algorithm

1 Selecting relevant predictors

Correlation of each predictor (X ) and response variable (Y ):

(Xi−X)(Yi−Y )√Pn

i=1(Xi−X )2

i=1(Yi−Y )2

Predictors r < FCBF threshold are discarded

2 Removing redundant predictors

Sort predictors according to r

Iteratively compute r ′ among each pair of predictors

If r ′ > r , the predictor lower in the list is removed

O(n p log p)

6L. Yu and H. Liu. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution, Proc. of ICML, 2003.

8 / 33

Resource Usage Measurements

Time-stamp counter (TSC)

64-bit counter incremented by the processor at each clock cycle

rdtsc: Reads TSC counter

Sources of measurement noise

1 Instruction reordering: Serializing instruction (e.g., cpuid)

2 Context switches: Discard sample from MLR history (rusagestructure, sched_setscheduler to SCHED_FIFO)

3 Disk accesses: Query memory accesses compete with CoModisk accesses

9 / 33

Prediction parameters (n and FCBF threshold)

0 10 20 30 40 50 60 70 80 90 1000

tive e

History (s)

MLR error vs. cost (100 executions)

0 10 20 30 40 50 60 70 80 90 1000

Cost (C

average error

average cost

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90

tive e

FCBF threshold

FCBF error vs. cost (100 executions)

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90

Cost (C

average error

average cost

Figure: Prediction error versus cost as a function of the amount of history

used to compute the Multiple Linear Regression (left) and as a function of the

Fast Correlation-Based Filter threshold (right)

10 / 33

Prediction parameters (broken down by query)

0 10 20 30 40 50 60 70 80 90 1000

tive e

History (s)

MLR error (100 executions)

application

counter

pattern−search

top−k

high−watermark

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90

tive e

FCBF threshold

FCBF error (100 executions)

application

counter

pattern−search

top−k

high−watermark

Figure: Prediction error broken down by query as a function of the amount of

history used to compute the Multiple Linear Regression (left) and as a

function of the Fast Correlation-Based Filter threshold (right)

11 / 33

Prediction Accuracy (CESCA-II trace)

0 200 400 600 800 1000 1200 1400 1600 18000

Time (s)

tive e

Prediction error (5 executions − 7 queries)

average error: 0.012364max error: 0.13867

average

Query Mean Stdev Selected features

application 0.0110 0.0095 packets, bytescounter 0.0048 0.0066 packetsflows 0.0319 0.0302 new dst-ip,dst-port,protohigh-watermark 0.0064 0.0077 packetspattern-search 0.0198 0.0169 bytestop-k destinations 0.0169 0.0267 packetstrace 0.0090 0.0137 bytes, packets

12 / 33

Prediction Accuracy (CESCA-I trace)

0 200 400 600 800 1000 1200 1400 1600 18000

Time (s)

tive e

Prediction error (5 executions − 7 queries)

average error: 0.0065407max error: 0.19061

average

Query Mean Stdev Selected features

application 0.0068 0.0060 repeated 5-tuple, packetscounter 0.0046 0.0053 packetsflows 0.0252 0.0203 new dst-ip, dst-port, protohigh-watermark 0.0059 0.0063 packetspattern-search 0.0098 0.0093 packetstop-k destinations 0.0136 0.0183 new 5-tuple, packetstrace 0.0092 0.0132 packets

13 / 33

Prediction Overhead

Prediction phase Overhead

Feature extraction 9.070%

FCBF 1.702%MLR 0.201%

TOTAL 10.973%

Table: Prediction overhead (5 executions)

14 / 33

When to shed load

When the prediction exceeds the available cycles

avail_cycles = (0.1 × CPU frequency) − overhead

Overhead is measured using the time-stamp counter (TSC)

Corrected according to prediction error and buffer space

How and where to shed load

Packet and Flow sampling (hash based)

The same sampling rate is applied to all queries

How much load to shed

Maximum sampling rate that keeps CPU usage < avail_cycles

srate = avail_cyclespred_cycles

15 / 33

When to shed load

15 / 33

When to shed load

15 / 33

Load Shedding Algorithm

Load shedding algorithm (simplified version)

pred_cycles = 0;foreach qi in Q do

fi = feature_extraction(bi);si = feature_selection(fi, hi);pred_cycles += mlr(fi , si , hi);

if avail_cycles < pred_cycles × (1 + error ) thenforeach qi in Q do

bi = sampling(bi , qi , srate);fi = feature_extraction(bi);

foreach qi in Q doquery_cyclesi = run_query(bi , qi , srate);hi = update_mlr_history(hi, fi , query_cyclesi);

16 / 33

Reactive Load Shedding

Sampling rate at time t :

sratet = min

(1, max

(α, sratet−1 ×

avail_cyclest − delay

consumed_cyclest−1

consumed_cyclest−1: cycles used in the previous batch

sratet−1: sampling rate applied to the previous batch

delay : cycles by which avail_cyclest−1 was exceeded

α: minimum sampling rate to be applied

17 / 33

Alternative Prediction Approaches (DDoS attacks)8

0 5 10 15 20 25 300

Time (s)

actual

predicted

0 5 10 15 20 25 300

Time (s)

lative

(a) EWMA

0 5 10 15 20 25 300

Time (s)C

actual

predicted

0 5 10 15 20 25 300

Time (s)

lative

(b) SLR

0 5 10 15 20 25 300

Time (s)

actual

predicted

0 5 10 15 20 25 300

Time (s)

lative

(c) MLR

Figure: Prediction accuracy under DDoS attacks (flows query)

8P. Barlet-Ros et al. "Robust Resource Allocation for Online Network Monitoring", Proc. of IT-NEWS (QoSIP), 2008.

18 / 33

Impact of DDoS Attacks9

0 5 10 15 20 25 30 35 40 45 504

12x 10

Time (s)

no load shedding

load shedding

CPU threshold

5% bounds

(a) CPU usage

0 5 10 15 20 25 30 35 40 45 500

Time (s)

tive e

no load shedding

packet sampling

flow sampling

(b) Error in the query results

Figure: CPU usage and accuracy under DDoS attacks (flows query)

9P. Barlet-Ros et al. "Robust Resource Allocation for Online Network Monitoring", Proc. of IT-NEWS (QoSIP), 2008.

19 / 33

Notation and Definitions

Symbol Definition

Q Set of q continuous traffic queriesC System capacity in CPU cyclescdq Cycles demanded by the query q ∈ Q (prediction)mq Minimum sampling rate constraint of the query q ∈ Qc Vector of allocated cyclescq Cycles allocated to the query q ∈ Qp Vector of sampling ratespq Sampling rate applied to the query q ∈ Qaq Action of the query q ∈ Qa−q Actions of all queries in Q except aquq Payoff function of the query q ∈ Q

Max-min fair share allocation policy

A vector of allocated cycles c is max-min fair if it is feasible, and foreach q ∈ Q and feasible c for which cq < cq, there is some q′ wherecq ≥ cq′ and cq′ > cq′ .

20 / 33

Nash Equilibrium

Payoff function

uq(aq, a−q) =

aq + mmfsq(C −∑

i:ui>0

ai), if∑

i:ai≤aq

ai ≤ C

0, if∑

i:ai≤aq

ai > C

Definition: Nash Equilibrium (NE)

A NE is an action profile a∗ with the property that no player i can dobetter by choosing an action profile different from a∗i , given that everyplayer j adheres to a∗j

Theorem

Our resource allocation game has a single NE when all playersdemand C

|Q| cycles.

21 / 33

Proof: a∗ is a NE

a∗ is a NE if ui(a∗) ≥ ui(ai , a

∗−i) for every player i and action ai , if

all other players keep their actions fixed to C|Q|

ai > C|Q| : ui(ai , a

∗−i) = 0

ai < C|Q| : ui(ai , a

∗−i) = ai + mmfsi(

C|Q| − ai) ≤

C|Q| (mmfsi(x) ≤ x)

Proof: a∗ is the only NE

For any profile other than a∗i there is at least one query that hasan incentive to change∑

i ai > C: queries with largest demands have incentive to obtaina non-zero payoff∑

i ai < C: queries have incentive to capture spare cycles∑

i ai = C and ∃i : ai 6=CQ

: incentive to disable the query with

largest demand

22 / 33

Accuracy Function

sampling rate

accura

1 − εq

Figure: Accuracy of a generic query

23 / 33

Accuracy with Minimum Constraints

Query mqAccuracy (mean ±stdev, K = 0.5)

no_lshed reactive eq_srates mmfs_cpu mmfs_pktapplication 0.03 0.57 ±0.50 0.81 ±0.40 0.99 ±0.04 1.00 ±0.00 1.00 ±0.03autofocus 0.69 0.00 ±0.00 0.00 ±0.00 0.05 ±0.12 0.97 ±0.06 0.98 ±0.04counter 0.03 0.00 ±0.00 0.02 ±0.12 1.00 ±0.00 1.00 ±0.00 0.99 ±0.01flows 0.05 0.00 ±0.00 0.66 ±0.46 0.99 ±0.01 0.95 ±0.07 0.95 ±0.06high-watermark 0.15 0.62 ±0.48 0.98 ±0.01 0.98 ±0.01 1.00 ±0.01 0.97 ±0.02pattern-search 0.10 0.66 ±0.08 0.63 ±0.18 0.69 ±0.07 0.20 ±0.08 0.41 ±0.08super-sources 0.93 0.00 ±0.00 0.00 ±0.00 0.00 ±0.00 0.95 ±0.04 0.95 ±0.04top-k 0.57 0.42 ±0.50 0.67 ±0.47 0.96 ±0.09 0.99 ±0.03 0.96 ±0.07trace 0.10 0.66 ±0.08 0.63 ±0.18 0.68 ±0.01 0.64 ±0.17 0.41 ±0.08

Table: Sampling rate constraints (mq) and average accuracy with K = 0.5

24 / 33

Example of Minimum Sampling Rates

00.10.20.30.40.50.60.70.80.91 0

sampling rate (pq)

accura

1−εq)

high−watermark

top−k

p2p−detector

Figure: Accuracy as a function of the sampling rate (high-watermark, top-k

and p2p-detector queries using packet sampling)

25 / 33

Results: Performance Evaluation

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

overload level (K)

accura

no_lshed

reactive

eq_srates

mmfs_cpu

mmfs_pkt

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

overload level (K)

accura

no_lshed

reactive

eq_srates

mmfs_cpu

mmfs_pkt

Figure: Average (left) and minimum (right) accuracy with fixed mq constraints

Overload level (K)

C′ = C × (1 − K ), K = 1 − CP

q∈Qbdq

26 / 33

Enforcement Policy

10.90.80.70.60.50.40.30.20.10

load shedding rate

selfish prediction

selfish actual

custom prediction

custom actual

custom−corrected prediction

custom−corrected actual

Figure: Actual vs predicted CPU usage

kq parameter

kq = 1 −dq,rq=0

dq,rq=1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

load shedding rate

actual CPU usage

expected CPU usage

kq = 0.82

Figure: Actual vs expected CPU usage

Corrected load shedding rate

rq = min(1,

r ′qkq

27 / 33

Performance Evaluation

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

overload level (K)

accura

no_lshed

eq_srates

mmfs_pkt

custom

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

overload level (K)

accura

no_lshed

eq_srates

mmfs_pkt

custom

Figure: Average (left) and minimum (right) accuracy at increasing K levels

28 / 33

0 200 400 600 800 1000 1200 1400 1600 18000

0.5prediction error

tive e

0 200 400 600 800 1000 1200 1400 1600 18000

0.5delay compared to real time

0 200 400 600 800 1000 1200 1400 1600 18000

system accuracy

accura

0 200 400 600 800 1000 1200 1400 1600 18000

0.5load shedding overhead

time (s)

cycles available to CoMo prediction actual (after sampling)

Figure: Performance of a system that supports custom load shedding

29 / 33

0 200 400 600 800 1000 1200 1400 1600 1800

0 200 400 600 800 1000 1200 1400 1600 18000

0.5prediction error

tive e

0 200 400 600 800 1000 1200 1400 1600 18000

0.5delay compared to real time

0 200 400 600 800 1000 1200 1400 1600 18000

system accuracy

accura

time (s)

cycles available to CoMo prediction actual (after sampling)

Figure: Performance with a selfish query (every 3 minutes)

30 / 33

Online Performance

09:00 09:05 09:10 09:15 09:20 09:25 09:300

18x 10

time [hh:mm]

sage [cycle

CoMo + load_shedding overhead

Query cycles

Predicted cycles

CPU limit

09:00 09:05 09:10 09:15 09:20 09:25 09:300

time [hh:mm]

total packets

buffer occupation (KB)

DAG drops

Figure: Stacked CPU usage (left) and buffer occupation (right)

31 / 33

Publication Statistics

USENIX Annual Technical Conference

Estimated impact: 2.64 (rank: 7 of 1221)10

Acceptance ratio: 26.5% (31/117)

Computer Networks Journal

Impact factor: 0.829 (25/66)11

IFIP-TC6 Networking

ITNEWS-QoSIP

IEEE INFOCOM (under submission)

Estimated impact: 1.39 (rank: 133 of 1221)10

10According to CiteSeer: http://citeseer.ist.psu.edu/impact.html11Journal Citation Reports (JCR)

32 / 33

Pere Barlet Ros

December 15, 2008

33 / 33

Load Shedding in Network Monitoring...

Documents

Eskom Load Shedding Schedule - sun.ac.za Wor… · Eskom Load Shedding Schedule PLEASE NOTE Eskom Western Cape will be implementing new load shedding schedules from 1 February 2015

intelligent load shedding scheme

Under Voltage Load Shedding Guidelines

Blackout Avoidance & Undervoltage Load Shedding

Based on Load Shedding 1

UNDERVOLTAGE LOAD SHEDDING - beckwithelectric.com

Load Shedding Management

Load Shedding in Network Monitoring Applicationspeople.ac.upc.edu/pbarlet/reports/barlet-ros-phd_thesis.pdf · Load Shedding in Network Monitoring Applications Pere Barlet-Ros Advisor:

Load Shedding Controller PML630 Product Guide - ABB Group · PDF fileload-shedding function triggers the fast load-shedding function to initiate the load shed action. The manual load-shedding

Moving, Capturing and Load Shedding

Undervoltage Load Shedding Standards/PRC...Underfrequency Load Shedding). Clearly identify and separate centrally controlled undervoltage‐based load shedding due to the reliability

Intelligent Load Shedding

PML630 Compact Load Shedding Solution

UNDERVOLTAGE LOAD SHEDDING GUIDELINES - … Load...UNDERVOLTAGE LOAD SHEDDING GUIDELINES JULY 1999 Prepared by Undervoltage Load Shedding Task Force (UVLSTF)* Technical Studies Subcommittee

Load Shedding Philosophy Ppt

Load Shedding Exercise

Load Shedding

SISTEM LOAD SHEDDING PADA GENERATOR

Avoiding Load Shedding - AMEU 2015 Presentation… · Avoiding Load Shedding using a “Virtual” Power Station Avoid stage 1 and 2 load shedding through customer participation

Eskom Load Shedding Schedule - bvm.gov.za · Eskom Load Shedding Schedule PLEASE NOTE Eskom Western Cape will be implementing new load shedding schedules from 1 …