1 Distributed Streams Algorithms for Sliding Windows Phillip B. Gibbons, Srikanta Tirthapura

Distributed Streams Algorithms for Sliding Windows

Phillip B. Gibbons,

Srikanta Tirthapura

Abstract

• Algorithm for estimating aggregate functions over a “sliding window” of the N most recent data items in one or more streams.

Single stream

• The first E-approximation scheme for number of 1’s in a sliding window.

• The first E-approximation scheme for the sum of integers in [0..R] in a sliding window.

• Both algorithms are optimal in worst case time and space.

• Both algorithms are deterministic

Distributed Streams

• The first randomized E-approximation scheme for the number of 1’s in a sliding window over the union of distributed streams.

• Network Monitoring

• Data Warehousing

• Telecommunications

• Sensor Networks

• Multiple Data Source - Distributed Stream Model

• Only the most recent data is important - “Sliding Window”

The Goal in the algorithms

• Approximating a function F while minimizing :

• 1. The total memory

• 2. The time take by each party to

process a data item

• 3. The time to produce an estimate -

query time

Definition 1-An -approximation scheme for a quantity X • A randomized procedure that, given any

positive <1 and <1, compute an estimate :

• -approximate :An estimate whose worst case relative error is at most

An Example for Basic Counting Problem

Algorithms for Distributed Stream

• Each party observes only its own stream

• Each party communicates with other parties only when estimate is requested

• Each party sends a message to a Referee who computes the estimate

The Idea

• Storing a wave consisting of many random samples of the stream.

• Samples that contain only the recent items are sampled at a high probability, while those containing old items are sampled at a lower probability

Contributions• Introducing a data structures called

• Presenting the first E-approximation scheme for Basic Counting.

• Presenting the first E-approximation scheme for the sum of integers in [0..R]. Both optimal in worst case space, processing time and query time.

Contributions

• Presenting the first randomized -approximation for the number of 1’s in a sliding window over the union of distributed streams

Related Work

• From the paper of Datar et al :

• Using Exponential Histogram data base

Exponential Histogram • Maintain more information about

recently seen items, less about old items.

• k0 most recent 1’s are assigned to individual bucket

• The K1 next most recent 1’s are assigned to bucket size 2.

• The K2 next most recent 1’s are assigned to bucket size 4.

• So on until last N items are assigned to some bucket

Exponential Histogram• Each ki is either or • The last bucket is discarded if its position no

longer falls within the window• If the new item is a 1, it is assigned to a new

bucket of size 1.

• If this make , then the two least recent buckets of size 1 are merged to form a bucket of size 2.

• If k1 in now too large, the two least recent buckets of size 2 are merged

• So on resulting in a cascading of up to log N bucket merges in the worst case.

• The approach using waves avoids this cascading

The Basic Wave

• Assumption : is an integer.• Counters:

1. pos - the current length of stream2. rank - the current number of 1’s in the stream.

• The wave contains the position of the recent 1’s in the stream, arranged at different “levels”.

• For i=1,2,..,l-1, level i contains the positions of the most recent 1-bits whose 1-rank is a multiple of

An Example for Basic Wave

• The crest of the wave is always over the largest 1-rank

• N=48, 1/E=3, l=5

Estimation Steps:• Let s=max(0,pos-n+1)

{estimation number of 1’s in [s,pos]}

• Let p1 be the maximum position less than s, and p2 the minimum position greater/equal then s.

• Let r1 and r2 be the rank-1 of p1 and p2 respectively.

• Return = rank-r+1 where r= r2 if r2-r1 =1 otherwise r=(r1+r2)/2

LEMMA 1

• The procedure returns an estimate that is within a relative error of E of the actual number of 1’s in the window.

Proof• Let j be the smallest numbered level containing

position p1.• By returning the midpoint of the range [r1,r2] , we

guarantee that the absolute error is at most (r2-r1)/2

• There is at most a gap between r1 and its next larger position r2.

• Thus the absolute error in our estimate is at most

• Let r3 be the earliest 1-rank at level j-1.• r3> r1, r3>=r2. • by definition

Improvement• Use modulo N’ counters for pos and

rank, store the positions in the wave as modulo N’ numbers - Take only log N’ bits.

• Keep track of both the largest 1-rank discarded (r1) and the smallest 1-rank (r2) still in the wave - Number of 1’s answer in O(1).

• Instead of storing a single position in multiple levels, store each position only at its maximal level.

Improvement

Improvement• The positions at each level are stored in a

fixed length queue so that each time new position is added , the position at the end of the queue is removed.

• Maintaining a doubly link list of the position in the wave in increasing order.

• By storing the difference between consecutive positions instead of the absolute positions - reduce the space from to

The deterministic wave algorithm• Upon receiving a stream bit b:

1.Increment pos (modulo N’=2N)2.If the head(p,r) of the linked list L has expired (p<=pos-N), then discard it from L and from its queue, and store r as the largest 1-rank discarded

• 3.If b=1 then do:(a)Increment rank, and determine the corresponding wave level j, the largest j such that rank is a multiple of (b)If the level j queue is full,discard the tail of the queue and splice it out of L(c)Add(pos,rank) to the head of the level j queue and the tail of L

Answering a query for a sliding window of size N:

• 1. Let r1 the largest 1-rank discarded. (If no such r1, return rank as exact answer.) Let r2 be 1-rank at the head of the linked list L. (If L is empty, return 0).

• 2. Return rank-r+1, where r=r2 if r2-r1=1 and otherwise r=(r1+r2)/2

• Space -

• Process time for each item - O(1)• Estimate time - O(1)

• In related work (Datar et al)

• Space -

• Process time for each item - O(log(EN))

Sum of Bounded Integers• The sum over a sliding window can

range from 0 to NR.

• Let N’ be smallest power of 2 greater than/equal to 2RN.

• Counters(modulo N’):pos - the current lengthtotal - the running sum

• l=log(2ENR) levels.

• Storing triple for each item (p,v,z)v-the value for the data itemz-the partial sum trough this item

• The answer for query is the midpoint of the interval [total-z2+v2,total-z1)

The Algorithm for the sum of last N items in a data stream• Upon receiving a stream value v between 0 to R:• 1.Increment pos (modulo N’=2N)• 2.If the head(p,v’,z) of the linked list L has expired

(p<=pos-N), then discard it from L and from its queue, and store z as the largest partial sum discarded

• 3.If v>0 then do:• (a)Determine the largest j such that some number in

(total,total+v) is a multiple of Add v to total.• (b)If the level j queue is full,discard the tail of the

queue and splice it out of L• (c)Add(pos,v,total) to the head of the level j queue and

the tail of L

Step 3a• The desired wave level is the largest

position j such that some number y in the interval (total,total+v] has 0’s in all positions less than j.

• y-1 and y differ in bit position j.

• If bit j changes from 1 to 0 at any point in [total,total+v],then j is not the largest

• j is the position of the most-significant bit that is 0 in total and 1 in total+v.

• j is the most -significant bit that is 1 in bitwise xor between total and total+v

Answering a query for a sliding window of size N :

• 1. Let z1 be the largest partial sum discarded from L. (If no such z1, return total as exact answer.) Let (pos,v2,z2) be the head of the linked list L. (If L is empty, return 0).

• 2. Return [total - (z1+z2-v2)/2]

• Space -O(1/E(logN+logR)) memory word of O(logN+logR)

• Process time for each item - O(1)• Estimate time - O(1)

• In related work (Datar et al)

• Space - O(1/E(logN+logR)) buckets of logN+log(logN+logR)

• Process time for each item - O(logN+logR)

Distributed Streams• Tree definitions for sliding window

over a collection of t>1 distributed

stream:1. Seeking the total number of 1’s in the last N

items in each of the t streams (tN items in total)

2. A single logical stream has been split

arbitrarily among the parties. Each party receives

items that include a sequence number in the

logical stream. Seeking the total number of 1’s in

the last N items in the logical stream.

3.Seeking the total number of 1’s in the last N

items in the position-wise union of the t streams

Solution for First Scenario :

• Applying single stream algorithm to each stream.

• To answer a query, each party sends its count to the Referee.

• The Referee sums the answers.

• Because each individual count is within E relative error, so is the total.

Solution for Second Scenario :

• To answer a query, each party sends its wave to the Referee.

• The Referee computes the maximum sequence number over all the parties use each wave to obtain an estimate over the resulting window, and sum the result.

• Because each individual count is within E relative error, so is the total.

Randomized Waves • Contains the positions of the recent 1’s in

the data stream, stored at different levels.• Each level i contains the most recently

selected positions of the 1-bits, where a position is selected into level i with probability

• The deterministic wave select 1 out of every 1-bits at regular interval.

• A randomized wave selects an expected 1 out of every 1-bits random interval.

• The randomize wave retains more position per level.

The Basic Randomized Wave• Let N’ be the power of 2 that is at least 2N• Let d=logN’• Let E<1 be the desired error probability• Each Party Pj maintains a basic

randomized wave for its stream consisting of d+1 queues, Qj(0),..,Qj(d), one for each level.

• Using a psedo-random hash function h to map positions to levels, according to exponential distribution

The Steps for Maintaining the Randomized Wave:• Party Pj, upon receiving a stream bit b:

1.Increment pos (modulo N’=2N)2.Discard any position p in the tail of a queue that has expired (p<=pos-N)3.If b=1 then for l= 0,..,h(pos) do:(a) If the level l queue Qj(l) is full, then discard the tail of Qj(l)(b) Add pos to the head of Qj(l).

• The sample for each level, stored in a queue, contains the most recent position selected into the level. (c=36)

• Consider a queue Qj(l) contains all the 1-bitwise the interval [I,pos] whose position i. Then Qj(l) contains all the 1-bits in the interval [i,pos] whose positions hash to a value greater than equal to l.

• As we move from level l to l+1, the range may increase.

• The queues at lower numbered levels may have ranges that fail to contain the window, but as we move to higher levels, we will find a level whose contains the window

Answering a query for a sliding window of size n<=N • After each party has observed pos bits:

1. Each party j sends its wave, {Qj(0),..,Qj(logN’))}, to the Referee, let s=max(0,pos-n+1). Then W=[s,pos] is the desired window.2.For j=1,..,t let lj be the minimum level such that the tail of Qj(lj) is a position p<=s.3.Let l*=max{lj},j=0,..,t. Let U be the union of all positions in Q1(l*),..Qt(l*).4. Return

• The algorithm returns an estimate for Union Counting Problem for any sliding window of size n<=N that is within a relative error E with probability greater than 2/3

• space -

1 Distributed Streams Algorithms for Sliding Windows Phillip B. Gibbons, Srikanta Tirthapura

Documents

Gibbons v ogden

American Soldier Color - FILET CROCHET AFGHAN …superlativestitchery.com/images/American_Soldier_in_Color_GRAPH... · C Tina Gibbons Gibbons Gibbons YinaGibbons Ina Gibbons C Tina

Gibbons Capitulo1

1963 Gibbons Yearbook - Landmark - Cardinal Gibbons H.S., Raleigh, N.C

1947 Gibbons Yearbook - Cathedralite - Cardinal Gibbons H.S., Raleigh, N.C

1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

Onion Curve: A Space Filling Curve with Near-Optimal ClusteringOnion Curve: A Space Filling Curve with Near-Optimal Clustering Pan Xu #1, Cuong Nguyen 2, Srikanta Tirthapura 3 # University

Gibbons Michael

‘Jaina Epistemology’ by S. Srikanta Sastri (Oct. 1941) Jaina Epistemology’ by S. Srikanta Sastri (Oct. 1941) Page 7 the horse neighed, the parasol unfurled itself, the chauris

Vincent Gibbons - Introduction

1 Analysis of Link Reversal Routing Algorithms Srikanta Tirthapura (Iowa State University) and Costas Busch (Renssaeler Polytechnic Institute)

1 Concurrent Counting is harder than Queuing Costas Busch Rensselaer Polytechnic Intitute Srikanta Tirthapura…

ICDCS 05Adaptive Counting Networks Srikanta Tirthapura Elec. And Computer Engg. Iowa State University

H. S. Krishnaswamy Iyengar on Dr S. Srikanta Sastri in ... · H. S. Krishnaswamy Iyengar on Dr S. Srikanta Sastri in ‘Sudha’ Magazine – Varada Vyakthi Page 1

Grinling Gibbons

CASE STUDY GIBBONS

2007 Gibbons Yearbook - Landmark - Cardinal Gibbons High School, Raleigh, N.C

Gibbons SenSys08 Keynote

TAREA ANALISIS GIBBONS

Gail Gibbons