The Continuous Distributed Monitoring Model

Farzad Nozarian

fnozarian@aut.ac.ir

Chalmers University of Technology

18/04/2016

Outline

Introduction

Countdown Problem

Monitoring Entropy

Geometric Approach

Sampling

What Is the Problem?

Several processing nodes receive streams of data items

The goal is to monitor a function over the union of items with minimum communication cost

Examples of monitoring functions:

Simple countdown

Tracking the entropy

Distinct elements

Sampling

Top-k items

Motivation and Applications

Monitoring the global health of the network in a large ISP

Tracking the usage of resources in distributed data centers by social networks

Tracking global changes by collecting information from sensors

What Are the Challenges?

Continuous monitoring: real-time tracking, rather than a one-shot query

Streaming: data is received at a very high speed

Distributed processing: each node only sees part of the global stream, and communication cost is important

Trivial Solutions

Centralizing all the items: high communication cost!

Periodic polling:

Frequent polling: high communication

Infrequent polling: delay in identifying events

Summarizing information in complex functions

Parameter tuning for the frequency of polling

The Countdown Problem


A threshold monitoring problem with many applications

Identifying when the total number of observations reaches a threshold $N$

Trivial solution: observers notify the coordinator by sending a bit when an event is observed, costing $O(N)$ communication

But we can improve it!

A First Approach

Idea: there are many events at each site before reaching the threshold $N$

At least one of the $k$ sites should see $N/k$ items before the threshold is reached

Every site waits to see at least $N/k$ items before reporting to the coordinator

After receiving a report from an observer, the coordinator updates $N$ to the remaining slack and informs all nodes

The total communication is $O(k^2 \log(N/k))$
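The first approach can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: the name `slack_countdown`, the encoding of the input as a list of site indices, and the message accounting (one report, then `k` poll requests, `k` replies and `k` new bounds per synchronization) are choices made here, not details from the slides. Once the remaining slack is smaller than the number of sites, the sketch falls back to the trivial one-message-per-event protocol.

```python
import random


def slack_countdown(events, k, threshold):
    """Simulate the slack-based countdown protocol.

    events    -- iterable of site indices in [0, k), one per observed event
    k         -- number of sites (hypothetical parameter name)
    threshold -- total number of events to detect (hypothetical name)
    Returns (time of detection or None, number of messages sent).
    """
    unsynced = [0] * k                 # events per site since the last synchronization
    known = 0                          # exact total known to the coordinator
    bound = (threshold - known) // k   # per-site bound: slack / k
    messages = 0
    for t, site in enumerate(events, start=1):
        unsynced[site] += 1
        if bound >= 1:
            if unsynced[site] >= bound:
                # One report, then the coordinator polls every site,
                # collects the local counts and sends out the new bound.
                messages += 1 + 3 * k
                known += sum(unsynced)
                unsynced = [0] * k
                bound = (threshold - known) // k
        else:
            # Slack is below k: fall back to one message per event.
            messages += 1
            known += 1
            unsynced[site] = 0
        if known >= threshold:
            return t, messages
    return None, messages


# Usage: 10 sites, events spread uniformly at random, threshold of 100,000.
rng = random.Random(0)
stream = [rng.randrange(10) for _ in range(100_000)]
print(slack_countdown(stream, k=10, threshold=100_000))
```

The simulation detects the threshold at exactly the event that reaches it, because no site can hold more than `bound - 1` unreported events between synchronizations.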

A Quadratic Improvement

Idea: wait for more updates before reporting to the coordinator

The protocol runs over rounds

In round $j$, all nodes wait to receive $\lfloor 2^{-j} N/k \rfloor$ items before reporting to the coordinator

The coordinator starts the $(j+1)$th round after receiving $k$ messages

The total communication is $O(k \log(N/k))$
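A round-based protocol of this kind can be sketched as follows. Several details are assumptions made for the illustration rather than facts from the slides: the batch size `threshold // (2**j * k)` in round `j`, the cost of `k` messages to announce a new round, and a final phase in which every event is reported individually once the batch size would drop to zero. The names `round_countdown`, `pending` and `batch` are hypothetical.

```python
import random


def round_countdown(events, k, threshold):
    """Simulate a round-based countdown; returns (detection time, messages)."""
    pending = [0] * k    # events seen at each site but not yet reported
    reported = 0         # events the coordinator has been told about
    messages = 0
    received = 0         # reports received in the current round
    j = 1                # current round
    raw = threshold // (2 ** j * k)
    final = raw == 0     # final phase: report every single event
    batch = max(1, raw)

    for t, site in enumerate(events, start=1):
        pending[site] += 1
        # Let sites report until nobody holds a full batch. A new round
        # shrinks the batch, so other sites may have to report as well.
        progress = True
        while progress:
            progress = False
            for i in range(k):
                if pending[i] >= batch:
                    pending[i] -= batch
                    reported += batch
                    messages += 1
                    received += 1
                    progress = True
                    if not final and received == k:
                        j += 1
                        received = 0
                        messages += k   # announce the new round to all sites
                        raw = threshold // (2 ** j * k)
                        final = raw == 0
                        batch = max(1, raw)
        if reported >= threshold:
            return t, messages
    return None, messages


# Usage: same setting as a 10-site stream with threshold 100,000.
rng = random.Random(0)
stream = [rng.randrange(10) for _ in range(100_000)]
print(round_countdown(stream, k=10, threshold=100_000))
```

Each non-final round costs `k` reports plus `k` announcements, and the batch size halves every round, which is where a bound of the form $O(k \log(N/k))$ comes from.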

Monitoring Entropy


Monitoring non-monotone functions

Let $f_i$ denote the number of occurrences of item $i$

Let $m$ denote the total number of items

The union of the input streams implicitly defines a probability distribution given by $p_i = f_i / m$

The goal is to monitor the entropy of this distribution, $H = \sum_i p_i \log(1/p_i)$
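As a concrete reference point, the quantity being monitored can be computed centrally from the merged frequency counts. The sketch below assumes base-2 logarithms and uses hypothetical names (`empirical_entropy`, `streams`); it shows what the coordinator would compute once it holds the full frequency distribution, not the monitoring protocol itself.

```python
import math
from collections import Counter


def empirical_entropy(streams):
    """Entropy (in bits) of the item distribution over the union of streams."""
    freq = Counter()
    for stream in streams:
        freq.update(stream)          # merge per-site counts into global counts
    total = sum(freq.values())       # total number of items
    if total == 0:
        return 0.0                   # convention chosen here for empty input
    # Sum of p_i * log2(1 / p_i), with p_i = count_i / total.
    return sum((c / total) * math.log2(total / c) for c in freq.values())


# Usage: two sites, each seeing part of the global stream.
print(empirical_entropy([["a", "b", "a"], ["c", "a"]]))
```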

Entropy Protocol

The protocol proceeds in multiple rounds

In the first round, the coordinator collects a constant number of items from the sites

In each subsequent round, the coordinator does the following:

Computes the parameter for the round

Runs the approximate countdown protocol with that parameter

Collects the frequency distribution from all sites and computes the current entropy

The Geometric Approach

The Geometric Approach (1/2)

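Goal: monitoring arbitrary non-linear threshold functions, i.e. tracking whether $f(v) > \tau$ for a function $f$ of the global data $v$ and a threshold $\tau$

A geometric fact: the convex hull of the sites' vectors can be fully covered by spheres, one per site

Idea: break down the testing of $f(v) > \tau$ or $f(v) \le \tau$ into local conditions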

The Geometric Approach (2/2)

Each site checks whether its sphere is monochromatic

When all the constraints are upheld: the query result remains unchanged and no communication is required

When a constraint is violated: new data is gathered from the streams and new constraints are set on the streams
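A local monochromaticity test might look like the sketch below. It rests on assumptions that are not spelled out on these slides: each site's sphere is taken to be the ball whose diameter runs from the coordinator's estimate vector `e` to the site's drift vector `u`, and the monitored function is taken to be the Euclidean norm, chosen only because its minimum and maximum over a ball have a closed form. The function names are hypothetical.

```python
import math


def site_ball(e, u):
    """Ball with the segment from e to u as its diameter: (center, radius)."""
    center = [(a + b) / 2 for a, b in zip(e, u)]
    radius = math.dist(e, u) / 2
    return center, radius


def is_monochromatic(e, u, tau):
    """True if f(x) = ||x|| is on one side of tau everywhere in the site's ball."""
    center, radius = site_ball(e, u)
    norm_c = math.hypot(*center)
    lowest = max(0.0, norm_c - radius)   # minimum of ||x|| over the ball
    highest = norm_c + radius            # maximum of ||x|| over the ball
    return highest <= tau or lowest > tau


# Usage: the site stays silent while its ball does not cross the threshold.
e, u = (1.0, 0.0), (2.0, 0.0)
print(is_monochromatic(e, u, tau=3.0), is_monochromatic(e, u, tau=1.5))
```

For a general function, the site would need some other way to bound the function over its ball; the closed form here is specific to the norm.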

Sampling


Given inputs of total size $n$, draw a sample of size $s$, uniform over all subsets of size $s$

Sampling cases: infinite windows, sliding windows

Sampling applications: approximate query answering, query planning, number of distinct elements, heavy hitters

Infinite Windows (1/2)

Each site associates a random weight with each observation

Coordinator maintains the following variables:

Set $P$: the random sample, holding the items with weight no more than $u$

Weight $u$: the $s$-th smallest weight so far in the system

Each site $i$ only maintains $u_i$, its local view of the $s$-th smallest weight

Infinite Windows (2/2)

Protocol outline:

Each site $i$ sends an element with weight smaller than $u_i$ to the coordinator

The coordinator updates $P$ and $u$ if the weight of the received item is smaller than $u$

The coordinator replies to site $i$ with the current value of $u$
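The protocol outline can be simulated in a few lines of Python. The sketch assumes weights drawn uniformly from [0, 1), a sample size `s`, and a threshold that starts at 1.0 until `s` items have been seen; the names `distributed_sample`, `u` and `local_u` are chosen here for illustration. Each forwarded item is counted as two messages, one to the coordinator and one reply.

```python
import heapq
import random


def distributed_sample(events, k, s, rng):
    """events: iterable of (site, item). Returns (sample, messages)."""
    u = 1.0                  # coordinator: s-th smallest weight so far
    heap = []                # coordinator: (-weight, item) for the s smallest weights
    local_u = [1.0] * k      # each site's possibly stale view of u
    messages = 0
    for site, item in events:
        w = rng.random()     # random weight attached to this observation
        if w < local_u[site]:
            messages += 1    # site forwards the item to the coordinator
            if w < u:
                if len(heap) < s:
                    heapq.heappush(heap, (-w, item))
                else:
                    # Evict the item with the largest weight in the sample.
                    heapq.heapreplace(heap, (-w, item))
                if len(heap) == s:
                    u = -heap[0][0]
            messages += 1    # coordinator replies with the current u
            local_u[site] = u
    return [item for _, item in heap], messages


# Usage: 4 sites, 1000 items, sample of size 10.
stream = [(i % 4, i) for i in range(1000)]
sample, cost = distributed_sample(stream, k=4, s=10, rng=random.Random(1))
print(sorted(sample), cost)
```

Because a site's view of the threshold is never smaller than the coordinator's, no item that belongs in the sample is ever withheld; stale views only cause extra messages.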

Thank You :)

Support Slides

A First Approach (long Ver.)

Idea: there are many events at each site before reaching the threshold $N$

At least one of the $k$ sites should see $N/k$ items before the threshold is reached

Algorithm steps:

Initially, each site reports to the coordinator whenever its number of observed items exceeds $N/k$

The coordinator computes the current slack based on the sum of all local counts: $N' = N - n$ ($n$ is the current count)

Each site sets the upper bound $N'/k$ on its local count

The total communication is $O(k^2 \log(N/k))$

Approximate Countdown

Improve the cost by approximating the answer

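Let $\epsilon$ be the approximation parameter

Report 0 if count $\le (1-\epsilon)N$; report 1 if count $\ge N$

Similar to the previous approach, but now terminate when the bound on the unreported count reaches $\epsilon N$

The number of rounds is reduced to $O(\log(1/\epsilon))$

The total communication is $O(k \log(1/\epsilon))$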

Randomized Countdown Protocol (1/2)

If $k$ grows very large, the cost will be high

Allow the algorithm to give a wrong answer with a small probability $\delta$

Randomization replaces the dependency on $k$ with a dependency on the parameter $\delta$

Randomized Countdown Protocol (2/2)

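With the randomization parameters determined by the analysis:

Each site collects batches of $O(\epsilon^2 N / k)$ observations

With probability $1/k$ it sends a message for a batch, otherwise it remains silent

The coordinator waits until it receives $O(1/\epsilon^2)$ messages, then terminates

The total communication cost is $O(\frac{1}{\epsilon^2} \log \frac{1}{\delta})$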

Geometric Computational Model (1/2)

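Each site $i$ has a $d$-dimensional vector $v_i$, called its local statistics vector

Let $w_i$ be the weights assigned to the streams

Define the global statistics vector $v$ as the weighted average of the $v_i$s: $v = \sum_i w_i v_i / \sum_i w_i$

Let $f: \mathbb{R}^d \to \mathbb{R}$ be an arbitrary monitoring function

Goal: determining, at any given time, whether $f(v) > \tau$ for a given threshold $\tau$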

Geometric Computational Model (2/2)

$v_i'$ is the last statistics vector collected from node $i$

The coordinator constructs the estimate vector $e$, the weighted average of the $v_i'$s

Each node also maintains the following parameters:

Delta vector: $\Delta v_i = v_i - v_i'$

Drift vector: $u_i = e + \Delta v_i$

Decomposing relies on the following fact: $v = \sum_i w_i u_i / \sum_i w_i$
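The decomposition fact can be checked with a short derivation. The notation is an assumption made here, since the symbols did not survive on the slide: $v_i$ is the current local statistics vector of node $i$, $v_i'$ the last vector it sent, $w_i$ its weight, $e$ the weighted average of the $v_i'$, $v$ the weighted average of the $v_i$, and $u_i = e + (v_i - v_i')$ the drift vector.

```latex
\begin{align*}
\frac{\sum_i w_i u_i}{\sum_i w_i}
  &= \frac{\sum_i w_i \,(e + v_i - v_i')}{\sum_i w_i} \\
  &= e + \frac{\sum_i w_i v_i}{\sum_i w_i} - \frac{\sum_i w_i v_i'}{\sum_i w_i} \\
  &= e + v - e \;=\; v
\end{align*}
```

Under these definitions the global vector is a convex combination of the drift vectors, so it always lies in their convex hull.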

Geometric Interpretation

The convex hull of $e, u_1, \ldots, u_n$ can be fully covered by spheres with radius $\|e - u_i\|/2$ centered at $(e + u_i)/2$

[Figure: a central vector and the drift vectors $u_1, \ldots, u_5$]
