Count / Top-k Continuous Queries on P2P Networks 01/11/2006


Page 1: Count / Top-k Continuous Queries on P2P Networks

Count / Top-k Continuous Queries on P2P Networks

01/11/2006

Page 2: Count / Top-k Continuous Queries on P2P Networks

Outline

Problem Definition P2P Architecture Count Top-K Experiment Setup Future Work

Page 3: Count / Top-k Continuous Queries on P2P Networks

Streaming Data in P2P

P2P: dynamically changing topology, large scale, …

Streaming data: continuous, unbounded, rapid, time-varying, noisy

P2P + streaming data: dynamic in both data and topology

Page 4: Count / Top-k Continuous Queries on P2P Networks

Objective and Goal

Objective
    Issue a continuous query to estimate count and top-K

Goal
    Lower the communication cost
    Lightweight maintenance
    Approximate answers
    An adaptive and progressive approach

Page 5: Count / Top-k Continuous Queries on P2P Networks

Naïve approach

Flood the overlay continuously

Pros
    Closer to the exact answer

Cons
    Network congestion
    Still not real-time

Page 6: Count / Top-k Continuous Queries on P2P Networks

The State-of-the-Art

Count
    Focus on one-time answers in P2P
    Deal with streaming data only

Top-K
    P2P environment without streaming data
    Distributed environment, not P2P

Page 7: Count / Top-k Continuous Queries on P2P Networks

P2P architecture

Assumption

Hierarchical P2P (focused)
    Super-peer hierarchical structure
    Query issuer is a super-peer
    Super-peers connect with other super-peers
    Each peer belongs to only one super-peer

Pure unstructured P2P
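The hierarchical assumptions can be captured in a small data structure. This is only an illustrative sketch: the names `SuperPeer`, `Overlay`, `link`, and `join` are hypothetical and do not come from the slides. It enforces the one constraint stated above, that each peer belongs to only one super-peer.

```python
from dataclasses import dataclass, field


@dataclass
class SuperPeer:
    """A super-peer with its attached peers and links to other super-peers."""
    name: str
    peers: set = field(default_factory=set)
    neighbors: set = field(default_factory=set)  # names of linked super-peers


class Overlay:
    """Hypothetical container for a super-peer overlay (not from the slides)."""

    def __init__(self):
        self.super_peers = {}
        self.owner = {}  # peer name -> name of its single super-peer

    def add_super_peer(self, name):
        self.super_peers[name] = SuperPeer(name)

    def link(self, a, b):
        # Super-peers connect with other super-peers (undirected link).
        self.super_peers[a].neighbors.add(b)
        self.super_peers[b].neighbors.add(a)

    def join(self, peer, super_peer):
        # Each peer belongs to only one super-peer.
        if peer in self.owner:
            raise ValueError(f"{peer} already belongs to {self.owner[peer]}")
        self.super_peers[super_peer].peers.add(peer)
        self.owner[peer] = super_peer


overlay = Overlay()
overlay.add_super_peer("S1")
overlay.add_super_peer("S2")
overlay.link("S1", "S2")
overlay.join("P1", "S1")
overlay.join("P2", "S2")
```

A second `overlay.join("P1", "S2")` raises `ValueError`, which is how this sketch keeps group membership unambiguous.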

Page 8: Count / Top-k Continuous Queries on P2P Networks

Big picture

Group

Accumulate information within a group based on the constraint and statistics

Set Constraint

Report changes

Approximated answer

Page 9: Count / Top-k Continuous Queries on P2P Networks

Group in hierarchical P2P

[Figure: hierarchical P2P network; legend: issuer, coordinator, peer]

Page 10: Count / Top-k Continuous Queries on P2P Networks

Group in hierarchical P2P

[Figure: hierarchical P2P network with nodes labeled 3, 1, 4, 2]

Page 11: Count / Top-k Continuous Queries on P2P Networks

Group in hierarchical P2P

[Figure: hierarchical P2P network with nodes labeled 4, 3, 3, 1, 4, 2]

Page 12: Count / Top-k Continuous Queries on P2P Networks

Group in hierarchical P2P

[Figure: hierarchical P2P network with nodes labeled 4, 3, 3, 1, 4, 2]

Page 13: Count / Top-k Continuous Queries on P2P Networks

After partition

[Figure: network partitioned into Group1, Group2, Group3]

,01,... 0ii N C

Assume we have N objects and K Groups after partition

$C_{i,j}$: count at each peer, with $i = 1, \ldots, N$ and $j = 1, \ldots, K$

Page 14: Count / Top-k Continuous Queries on P2P Networks

User-specified Epsilon

[Figure: Group1, Group2, Group3]

User-specified ε (precision)

Page 15: Count / Top-k Continuous Queries on P2P Networks

Consider a group

[Figure: a group with a coordinator node and peers P1, P2, P3, P4; objects O1, O2, O3]

Page 16: Count / Top-k Continuous Queries on P2P Networks

Each node maintains distribution information for the objects it owns

[Figure: peers P1, P2, P3, P4, each with a chart of count (#) per object and a rate R1, R2, R3, R4]

Page 17: Count / Top-k Continuous Queries on P2P Networks

Initial step: polling

[Figure: coordinator node and peers P1, P2, P3, P4]

Page 18: Count / Top-k Continuous Queries on P2P Networks

Initial step: polling

[Figure: coordinator node and peers P1, P2, P3, P4]

Page 19: Count / Top-k Continuous Queries on P2P Networks

Information at coordinator after polling

[Figure: peers P1, P2, P3, P4 and a chart of count (#) per object with values 22, 26, 33]

Page 20: Count / Top-k Continuous Queries on P2P Networks

Statistics information

Each cell: latest real value / estimated value

        P1      P2      P3      P4      Δ
O1      1/1     6/6     10/10   5/5     22
O2      11/11   13/13   5/5     9/7     36
O3      15/15   6/6     3/3     9/9     33
R       0.3     0.2     -0.05   0.6
T       15      15      17      13

Δ: change value for each object
R: maximum changing rate (+/-) of objects in each peer
T: updated time stamp
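One way to hold this table at the coordinator is a nested mapping. The layout and the helper name `estimated_total` are assumptions made for illustration. With the slide's numbers, summing the estimated values per object reproduces the last column (22, 36, 33); that is an observation about these numbers, not a definition given in the slides.

```python
# (latest real value, estimated value) per object and peer, copied from the slide.
stats = {
    "O1": {"P1": (1, 1), "P2": (6, 6), "P3": (10, 10), "P4": (5, 5)},
    "O2": {"P1": (11, 11), "P2": (13, 13), "P3": (5, 5), "P4": (9, 7)},
    "O3": {"P1": (15, 15), "P2": (6, 6), "P3": (3, 3), "P4": (9, 9)},
}
rate = {"P1": 0.3, "P2": 0.2, "P3": -0.05, "P4": 0.6}  # R row
stamp = {"P1": 15, "P2": 15, "P3": 17, "P4": 13}       # T row


def estimated_total(obj):
    """Sum the estimated values (second entry of each cell) over all peers."""
    return sum(estimated for _real, estimated in stats[obj].values())


totals = {obj: estimated_total(obj) for obj in stats}
print(totals)  # {'O1': 22, 'O2': 36, 'O3': 33}
```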

Page 21: Count / Top-k Continuous Queries on P2P Networks

Update to Coordinator

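[Figure: peers send their change vectors to the coordinator: $(\Delta_{1,1}, \Delta_{2,1}, \Delta_{3,1})$, $(\Delta_{1,2}, \Delta_{2,2}, \Delta_{3,2})$, $(\Delta_{1,3}, \Delta_{2,3}, \Delta_{3,3})$; label T2]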

Page 22: Count / Top-k Continuous Queries on P2P Networks

Calculate Count

$$C_{i,0}^{(l+1)} = C_{i,0}^{(l)} + \sum_{j=1}^{K} \Delta_{i,j}$$
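A minimal sketch of the count step, assuming it adds the changes reported by all K groups to the previous count of each object. The function name `update_counts` and the change values in the example are hypothetical; only the starting counts (22, 36, 33) come from the slides.

```python
def update_counts(counts, deltas):
    """Return counts[i] plus the sum over groups j of deltas[j][i]."""
    updated = list(counts)
    for group_delta in deltas:
        if len(group_delta) != len(counts):
            raise ValueError("each group must report one change per object")
        for i, change in enumerate(group_delta):
            updated[i] += change
    return updated


counts = [22, 36, 33]        # aggregated counts for O1, O2, O3 at step l
deltas = [                   # hypothetical changes, one row per group
    [1, -2, 0],
    [0, 3, 1],
    [2, 0, -1],
]
new_counts = update_counts(counts, deltas)
print(new_counts)  # [25, 37, 33]
```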

Page 23: Count / Top-k Continuous Queries on P2P Networks

Redistribute Epsilon

$\delta_i = f(\varepsilon, \Delta, C_{i,0})$

$w_i = \max(\Delta_i) / C_{x,0}$, where $x$ is the $i$-index of $\max(\Delta_i)$

$\delta_i = w_i \, \varepsilon \, C_{x,0} / \sum w_i$
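The redistribution can be sketched as follows. This is one possible reading of the formulas, not a confirmed implementation: it assumes each peer reports a vector of per-object changes, that Max picks the change with the largest magnitude, and that x is the index of that object. The function name `redistribute_epsilon` and all example values except the counts are hypothetical.

```python
def redistribute_epsilon(epsilon, deltas, counts):
    """Split the precision budget epsilon into a per-peer threshold delta.

    deltas[p] is the list of per-object changes reported by peer p and
    counts[x] is the aggregated count of object x (assumed to be positive).
    """
    weights = {}
    pivot = {}
    for peer, changes in deltas.items():
        # x: index of the object with the largest absolute change (assumption).
        x = max(range(len(changes)), key=lambda i: abs(changes[i]))
        pivot[peer] = x
        weights[peer] = abs(changes[x]) / counts[x]
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all weights are zero; the formula is undefined")
    return {
        peer: weights[peer] * epsilon * counts[pivot[peer]] / total
        for peer in deltas
    }


counts = [22, 36, 33]
deltas = {"P1": [1, 2, 3], "P2": [0, 4, 1], "P4": [11, 0, 0]}  # hypothetical
thresholds = redistribute_epsilon(0.1, deltas, counts)
```

Under this reading, a peer whose largest change is bigger receives a proportionally larger threshold, since the weight times the count of object x is just that largest change.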

Page 24: Count / Top-k Continuous Queries on P2P Networks

Visiting sequence

[Figure: peers P1, P2, P3, P4]

Pick the peers that would violate δ
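The slides do not say how a violation is predicted. The sketch below assumes a simple rule: a peer's projected drift is the magnitude of its rate R multiplied by the time elapsed since its last update T, and the peer is visited when that drift exceeds its δ. The function name `peers_to_visit`, the drift rule, the δ values, and the current time are all hypothetical; R and T are taken from the statistics table.

```python
def peers_to_visit(rate, stamp, delta, now):
    """Return peers whose projected drift exceeds delta, largest drift first."""
    # Assumed drift model: |R| * (time since the peer's last update).
    drift = {peer: abs(rate[peer]) * (now - stamp[peer]) for peer in rate}
    violators = [peer for peer in drift if drift[peer] > delta[peer]]
    return sorted(violators, key=lambda peer: drift[peer], reverse=True)


rate = {"P1": 0.3, "P2": 0.2, "P3": -0.05, "P4": 0.6}   # R row
stamp = {"P1": 15, "P2": 15, "P3": 17, "P4": 13}        # T row
delta = {peer: 4.0 for peer in rate}                    # hypothetical thresholds
order = peers_to_visit(rate, stamp, delta, now=30)
print(order)  # ['P4', 'P1']
```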

Page 25: Count / Top-k Continuous Queries on P2P Networks

Update information

Group

        P1      P2      P3      P4      Δ
O1      1/1     6/6     10/10   8/8     -
O2      11/11   11/11   5/5     6/6     -
O3      15/15   5/5     3/3     11/11   -
R       0.3     0.4     -0.05   0.2
T       15      30      17      33

Page 26: Count / Top-k Continuous Queries on P2P Networks

For nodes that were not visited

Group

        P1      P2      P3      P4      Δ
O1      1/2     6/6     10/9    8/8     25
O2      11/13   11/11   5/4     6/6     34
O3      15/18   5/5     3/2     11/11   36
R       0.3     0.4     -0.05   0.2
T       15      30      17      33

Page 27: Count / Top-k Continuous Queries on P2P Networks

Un-notified Leave

[Figure: group with peers P1, P2, P3, P4]

Ping P1
P1 is dead
Remove P1's information
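A minimal sketch of this cleanup, assuming the coordinator keeps per-peer information in a dictionary. The function name `remove_dead_peers` and the `ping` callable are hypothetical placeholders; a real `ping` would send a network probe with a timeout.

```python
def remove_dead_peers(table, ping):
    """Drop every peer whose ping fails and return the names removed."""
    dead = [peer for peer in table if not ping(peer)]
    for peer in dead:
        del table[peer]  # remove the dead peer's information
    return dead


# Stand-in for a real liveness probe: P1 left without notification.
alive = {"P2", "P3", "P4"}
table = {"P1": {"T": 15}, "P2": {"T": 30}, "P3": {"T": 17}, "P4": {"T": 33}}
removed = remove_dead_peers(table, lambda peer: peer in alive)
print(removed)  # ['P1']
```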

Page 28: Count / Top-k Continuous Queries on P2P Networks

Experiment Setup

Generate synthetic data sets from statistical distributions for
    Streaming data
    Lifetime of peers

Metrics
    Message size
    Communication cost
    Response latency
    Result accuracy

Page 29: Count / Top-k Continuous Queries on P2P Networks

Top-K

Use regression to predict the reasonable trend of changes

Once an updated result is required, the super-peer only needs to ask the doubtful peers about the doubtful objects

Update its counting list and return the top-k objects
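The slides name regression but give no details, so the following is a sketch under stated assumptions: a least-squares line is fitted to each object's (time, count) history, and an object is treated as "doubtful" when its predicted count lies within a margin of the boundary between rank k and rank k+1. The function names, the margin rule, and the sample history are all hypothetical.

```python
import heapq


def linear_trend(samples):
    """Least-squares slope and intercept for a list of (time, count) samples."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return 0.0, mean_c  # a single time point: assume a flat trend
    slope = sum((t - mean_t) * (c - mean_c) for t, c in samples) / var_t
    return slope, mean_c - slope * mean_t


def predict_counts(history, now):
    """Predict each object's count at time `now` from its fitted line."""
    predicted = {}
    for obj, samples in history.items():
        slope, intercept = linear_trend(samples)
        predicted[obj] = slope * now + intercept
    return predicted


def doubtful_objects(predicted, k, margin):
    """Objects whose prediction is within `margin` of the top-k boundary."""
    ranked = sorted(predicted.values(), reverse=True)
    if len(ranked) <= k:
        return []
    boundary = (ranked[k - 1] + ranked[k]) / 2
    return sorted(o for o, c in predicted.items() if abs(c - boundary) <= margin)


def top_k(predicted, k):
    """Return the k objects with the largest predicted counts."""
    return heapq.nlargest(k, predicted, key=predicted.get)


history = {  # hypothetical (time, count) observations per object
    "O1": [(0, 10), (1, 12), (2, 14)],
    "O2": [(0, 30), (1, 29), (2, 28)],
    "O3": [(0, 20), (1, 20), (2, 20)],
}
predicted = predict_counts(history, now=4)
ask_about = doubtful_objects(predicted, k=2, margin=1.5)
result = top_k(predicted, k=2)
```

In this example only O1 and O3 sit near the boundary, so only the peers holding them would need to be asked before the list is updated.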

Page 30: Count / Top-k Continuous Queries on P2P Networks

Future Work

Connect and recommend latent good friends for each user
    Good friends: the ones with the same interests (behaviors)

Exploit currently connected peers to discover good friends bit by bit

Design a system that forms clusters reflecting the current interests of individual peers and connects them based on their similarity, using the users' social networks

Page 31: Count / Top-k Continuous Queries on P2P Networks

Advantages

Reduce search time and diminish query traffic by using the friends list

By utilizing the different strengths of arcs/edges/ties (the strength of friendship), social networks outperform random-walk networks in quickly finding target objects

Page 32: Count / Top-k Continuous Queries on P2P Networks

Example

[Figure: example network with nodes at Level 1 and Level 2]

Page 33: Count / Top-k Continuous Queries on P2P Networks

Example

[Figure: one link has larger weight than another; nodes are annotated with Score(N_i)]

$score(N_{i+1}) = sim(N_i, N_{i+1}) \cdot score(N_i)$, where $sim$ is the similarity
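Reading the score formula as a one-step recurrence, in which a node's score is its predecessor's score multiplied by the similarity between the two, scores along a path can be sketched as below. The function name `propagate_scores`, the starting score of 1.0, and the similarity values are hypothetical.

```python
def propagate_scores(path, sim, start_score=1.0):
    """Score each node on a path: score(next) = sim(current, next) * score(current)."""
    scores = {path[0]: start_score}
    for current, nxt in zip(path, path[1:]):
        scores[nxt] = sim(current, nxt) * scores[current]
    return scores


# Hypothetical similarities between neighboring nodes.
similarity = {("A", "B"): 0.5, ("B", "C"): 0.5}
scores = propagate_scores(["A", "B", "C"], lambda a, b: similarity[(a, b)])
print(scores)  # {'A': 1.0, 'B': 0.5, 'C': 0.25}
```

With similarities below 1, scores shrink with each level, so nodes reached through stronger ties rank higher.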