The 30th International Conference on Distributed Computing Systems, June 2010, Genoa, Italy

Parameterized Maximum and Average Degree Approximation in Topic-based Publish-Subscribe Overlay Network Design

Melih Onus, TOBB University of Economics and Technology
Andrea W. Richa, Arizona State University

Publish/Subscribe (Pub/Sub)

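[Figure: nodes N1 to N5 attached to a message bus, each labeled with its subscription set; Publish(M1, A) sends message M1 over the bus to subscribers of topic A]

Subscription(N1) = {B, C, D}
Subscription(N2) = {A, B, C, E}
Subscription(N3) = {A, D}
Subscription(N4) = {A, B, X}
Subscription(N5) = {A, X}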
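
The delivery semantics in the figure can be sketched with a toy in-memory bus. This is only an illustration: the `MessageBus` class and its method names are assumptions made for this example and do not come from the talk; the subscription sets are the ones shown on the slide.

```python
from collections import defaultdict


class MessageBus:
    """Toy topic-based message bus (hypothetical, for illustration only)."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # topic -> set of node names
        self.inbox = defaultdict(list)       # node -> list of (message, topic)

    def subscribe(self, node, topics):
        for topic in topics:
            self.subscribers[topic].add(node)

    def publish(self, message, topic):
        # Deliver the message to every node subscribed to the topic.
        receivers = sorted(self.subscribers.get(topic, ()))
        for node in receivers:
            self.inbox[node].append((message, topic))
        return receivers


# Subscription sets as shown on the slide.
SUBSCRIPTIONS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}

bus = MessageBus()
for node, topics in SUBSCRIPTIONS.items():
    bus.subscribe(node, topics)

receivers = bus.publish("M1", "A")
print(receivers)  # nodes whose subscription set contains topic A
```

N1 receives nothing here because topic A is not in its subscription set.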

Scalability of Pub/Sub

Most traditional pub/sub systems are geared towards small-scale deployment
– E.g., Isis MDS, TIB, MQSeries, Gryphon

New generation of applications…
– Large data centers: Amazon, Google, Yahoo, eBay, …
– RSS, feed/news readers, on-line stock trading and banking
– Web 2.0, Second Life

…drive dramatic growth in scale
– 10,000s of nodes, 1000s of topics, Internet-wide distribution

Emerging systems address this trend using P2P techniques

Overlay-Based Pub/Sub

[Figure: overlay among nodes N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}; the publication (M1, A) is forwarded along overlay links, with one node labeled "Relay"]

Example systems: SCRIBE, Corona, Feedtree, Sub-2-Sub, TERA, ...

Overlay Topologies for Pub/Sub

A “good” overlay will allow for efficient and simple publication routing
– Small routing tables, low load on relays, low latency

Ideally, the overlay is topic-connected: i.e., one connected component for each topic-induced sub-graph
– Most existing implementations construct topic-connected overlays

Topic-Connectivity

Topics B,C,X,E are connected

Topics A and D are disconnected

[Figure: overlay on nodes N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]
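
Topic-connectivity can be checked with one graph search per topic, restricted to the nodes subscribed to that topic. The sketch below is a minimal illustration; the function name is made up for this example, and the edge set `EDGES` is hypothetical (the links in the figure are not recoverable from the text). It was chosen only so that the outcome matches the slide: B, C, X, E connected, A and D disconnected.

```python
from collections import deque


def topic_connectivity(subscriptions, edges):
    """Return {topic: bool}, True if the topic-induced subgraph is connected."""
    adjacency = {node: set() for node in subscriptions}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    result = {}
    for topic in set().union(*subscriptions.values()):
        members = {n for n, subs in subscriptions.items() if topic in subs}
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            # Follow only links whose other endpoint also subscribes to the topic.
            for neighbor in adjacency[node] & members:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        result[topic] = seen == members
    return result


# Subscription sets from the slide.
SUBSCRIPTIONS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
# Hypothetical overlay links (assumption, not taken from the figure).
EDGES = [("N1", "N2"), ("N2", "N4"), ("N4", "N5")]

status = topic_connectivity(SUBSCRIPTIONS, EDGES)
print(sorted(t for t, ok in status.items() if not ok))  # disconnected topics
```

With these links N3 has no neighbor at all, which is why both of its topics (A and D) end up disconnected.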

Topic-Connectivity: Simple Solution

[Figure: overlay on nodes N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]

Node degree grows linearly with the subscription size
Roughly twice as big as the subscription size for rings/trees
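
To see where the factor of two comes from, the sketch below assumes that the simple solution builds one separate ring per topic (the slide mentions rings/trees but does not spell out the construction, so this reading is an assumption). It counts per-topic links at each node; the function name and the ordering of nodes on each ring are arbitrary choices for the example.

```python
def per_topic_ring_links(subscriptions):
    """Count per-topic ring links at each node (one independent ring per topic)."""
    links = {node: 0 for node in subscriptions}
    for topic in sorted(set().union(*subscriptions.values())):
        members = sorted(n for n, subs in subscriptions.items() if topic in subs)
        if len(members) < 2:
            continue  # a single subscriber needs no links
        if len(members) == 2:
            ring = [(members[0], members[1])]
        else:
            ring = [(members[i], members[(i + 1) % len(members)])
                    for i in range(len(members))]
        for u, v in ring:
            links[u] += 1
            links[v] += 1
    return links


# Subscription sets from the earlier slides.
SUBSCRIPTIONS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}

links = per_topic_ring_links(SUBSCRIPTIONS)
for node in sorted(links):
    print(node, links[node], "<=", 2 * len(SUBSCRIPTIONS[node]))
```

Each ring contributes at most two links per member, so a node never holds more than twice as many per-topic links as it has subscriptions.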

Scalability of the Simple Solution

Negative impact on performance due to
– CPU load: neighbor monitoring, message processing
– Connection maintenance and header overhead
– Memory overhead: per-link state associated with routing and/or compression schemes being used, etc.

Scalability barrier for large systems offering a wide range of subscription choices

Can we do better?

The MinMax-TCO Problem

Minimum Maximum Degree Topic-Connected Overlay (MinMax-TCO) problem:
– For a set of nodes V, set of topics T, and Interest: V × T → {true, false}
– Construct a topic-connected overlay G with the minimum possible maximum degree

TCO (decision version):
– Decide whether there is a topic-connected overlay with maximum degree at most k (for a given k)
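
The problem statement can be written compactly as an optimization problem. The notation V_t (subscribers of topic t), G[V_t] (induced subgraph), and deg_G(v) is introduced here for convenience and is an assumption, not notation taken from the slides.

```latex
% Assumed notation: V_t = nodes interested in topic t,
% G[V_t] = subgraph of G induced by V_t, \deg_G(v) = degree of v in G.
\[
V_t = \{\, v \in V : \mathit{Interest}(v, t) = \mathit{true} \,\}, \qquad t \in T
\]
\[
\min_{G = (V, E)} \; \max_{v \in V} \deg_G(v)
\quad \text{subject to} \quad
G[V_t] \text{ is connected for every } t \in T .
\]
```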

GM Algorithm

The GM algorithm can have a maximum degree of Θ(n) even when a constant maximum degree overlay network exists.

Complexity of MinMax-TCO

Lemma: MinMax-TCO(V, T, Interest, k) ∈ NP
Proof: Topic connectivity is verifiable in polynomial time

Lemma: MinMax-TCO(V, T, Interest, k) is NP-hard
Proof:
1. Define an auxiliary problem, Single Node TCO (SN-TCO), which is to decide if there is a topic-connected overlay in which the degree of a single given node is ≤ d
2. Set Cover is polynomially reducible to SN-TCO
3. SN-TCO is polynomially reducible to TCO

Theorem: MinMax-TCO is NP-complete

Approximating MinMax-TCO

The idea: exploiting subscription overlaps
– Connecting the nodes with overlapping interests improves connectivity of several topics at once

Overlay Design Algorithm (ODA):
– Start from a singleton connected component for each (v, t) ∈ V × T
– At each iteration: among the edges that increase the maximum degree minimally, add the one that reduces the number of connected components for the largest number of topics
– Stop once there is a single connected component for each topic
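
The greedy loop described above can be sketched as follows. This is a naive, unoptimized illustration under stated assumptions, not the authors' implementation: the names `oda` and `find` are made up for the example, ties are broken by node order, and per-topic components are tracked with a simple union-find structure.

```python
from itertools import combinations


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def oda(subscriptions):
    """Greedy sketch: minimal max-degree increase first, then largest benefit."""
    nodes = sorted(subscriptions)
    topics = sorted(set().union(*subscriptions.values()))
    # One union-find forest per topic, over the nodes interested in that topic.
    parent = {t: {v: v for v in nodes if t in subscriptions[v]} for t in topics}
    degree = {v: 0 for v in nodes}
    edges = []

    def benefit(u, v):
        # Number of topics whose component count drops if (u, v) is added.
        return sum(1 for t in subscriptions[u] & subscriptions[v]
                   if find(parent[t], u) != find(parent[t], v))

    while True:
        current_max = max(degree.values())
        best = None
        for u, v in combinations(nodes, 2):
            b = benefit(u, v)
            if b == 0:
                continue
            new_max = max(current_max, degree[u] + 1, degree[v] + 1)
            key = (new_max, -b)  # smaller resulting max degree, then larger benefit
            if best is None or key < best[0]:
                best = (key, u, v)
        if best is None:
            break  # no edge merges anything: every topic is connected
        _, u, v = best
        for t in subscriptions[u] & subscriptions[v]:
            ru, rv = find(parent[t], u), find(parent[t], v)
            if ru != rv:
                parent[t][ru] = rv
        degree[u] += 1
        degree[v] += 1
        edges.append((u, v))
    return edges, degree


SUBSCRIPTIONS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
edges, degree = oda(SUBSCRIPTIONS)
print(edges, max(degree.values()))
```

Dropping the `new_max` component from the sort key would turn this sketch into the plain maximum-benefit rule that the talk attributes to the GM algorithm.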

ODA Running Time

O(|V|^4 |T|)
– At most |V|^2 iterations
– At most |V|^2 edges inspected at each iteration
– At most |T| steps to inspect an edge

Can be optimized to run in O(|V|^2 |T|)
– For each e ∈ V × V, weight(e) = the number of connected components merged by e
– At each iteration, output the heaviest edge and adjust the other edge weights accordingly
– Stop once there are no more edges with weight > 0

Approximability Results

Lemma: The number of edges in the overlay constructed by GM ≤ log(|V||T|) · OPT

Proof: Similar to that of the approximation ratio of the greedy algorithm for Set Cover

Uses Maximum Weighted Matching
Uses Edge Coloring

Theorem: No algorithm can approximate MinMax-TCO within a constant factor (unless P=NP)

Proof: The existence of such an algorithm would imply the existence of a constant-factor approximation for Set Cover, which is known to be impossible (unless P=NP)

ODA Algorithm

The ODA algorithm can have an average degree of Θ(n) even when a constant average degree overlay network exists.

[Figure: example overlays on nodes v1, v2, v3, …, vn-1, vn]

ODA and GM Algorithms

GM Algorithm: Choose the edge with maximum benefit
– Average Degree: O(log nt) approximation
– Maximum Degree: O(n) approximation

ODA Algorithm: Choose the edge with maximum benefit among the ones that increase the maximum degree minimally
– Average Degree: O(n) approximation
– Maximum Degree: O(log nt) approximation

How to approximate both average and maximum degree?

Parameterized Algorithm

e1: Edge with maximum benefit

e2: Edge with maximum benefit among the ones that increase the maximum degree minimally

If w(e2) > w(e1) / k, choose e2

Otherwise, choose e1

1 < k < n
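
The selection rule can be isolated in a few lines. In this sketch each candidate is a tuple (edge, weight, resulting maximum degree); that representation, the function name `choose_edge`, and the sample candidates are assumptions made for illustration only.

```python
def choose_edge(candidates, k):
    """Pick between e1 (max benefit) and e2 (max benefit at minimal max degree).

    candidates: list of (edge, weight, resulting_max_degree), all with weight > 0.
    """
    # e1: edge with maximum benefit overall.
    e1 = max(candidates, key=lambda c: c[1])
    # e2: edge with maximum benefit among those that increase the
    # maximum degree minimally.
    smallest_max = min(c[2] for c in candidates)
    e2 = max((c for c in candidates if c[2] == smallest_max), key=lambda c: c[1])
    # Rule from the slide: choose e2 if w(e2) > w(e1) / k, otherwise e1.
    return e2 if e2[1] > e1[1] / k else e1


# Hypothetical candidates: ("a", "b") has the highest weight but would raise
# the maximum degree to 4; the other two keep it at 3.
candidates = [
    (("a", "b"), 6, 4),
    (("c", "d"), 2, 3),
    (("e", "f"), 1, 3),
]

print(choose_edge(candidates, 2)[0])  # w(e2) = 2 is not > 6 / 2, so e1 is chosen
print(choose_edge(candidates, 4)[0])  # w(e2) = 2 > 6 / 4, so e2 is chosen
```

A larger k makes the rule favor the degree-preserving edge more often, while a k close to 1 makes it behave like the pure maximum-benefit choice.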

Algorithms

GM Algorithm:
– Average Degree: O(log nt) approximation
– Maximum Degree: O(n) approximation

ODA Algorithm:
– Average Degree: O(n) approximation
– Maximum Degree: O(log nt) approximation

P-ODA Algorithm:
– Average Degree: O(k · log nt) approximation
– Maximum Degree: O((n/k) · log nt) approximation
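
As a worked illustration of the trade-off in the two P-ODA bounds, the factors k and n/k can be balanced against each other. The choice k = √n below is just one example setting of the parameter, not a recommendation made in the talk.

```latex
% The two P-ODA approximation factors from the slide (n nodes, t topics, 1 < k < n):
\[
\text{average degree: } O(k \log nt),
\qquad
\text{maximum degree: } O\!\left(\tfrac{n}{k} \log nt\right).
\]
% They coincide when k = n/k, i.e. when k = \sqrt{n}:
\[
k = \sqrt{n}
\;\Longrightarrow\;
k \log nt \;=\; \frac{n}{k} \log nt \;=\; \sqrt{n}\,\log nt .
\]
```

Increasing k beyond √n tightens the maximum degree bound at the expense of the average degree bound, and decreasing it does the opposite.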

Constant Diameter Overlays

Constant Diameter Topic-Connected Overlay (CD-TCO) problem:
– For a set of nodes V, set of topics T, and Interest: V × T → {true, false}
– Construct a topic-connected, constant diameter overlay G with the minimum possible average degree

The GM algorithm can have a diameter of Θ(n), where n is the number of nodes in the pub/sub system.

Constant Diameter Overlay Algorithm

Constant Diameter Overlay Design Algorithm:
– At each iteration:
  • Find the number of neighbors for each node
  • Add a star which connects the maximum number of nodes
  • Remove topics which are connected by the star
– Stop once there is a single connected component for each topic

Number of neighbors of node u: [formula not recovered from the slide]
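
The star-based loop can be sketched as below. The slide's formula for the number of neighbors is not available in the text, so this sketch assumes it means the number of other nodes that share at least one not-yet-connected topic with u; that definition, the function name, and the tie-breaking by node order are all assumptions.

```python
def constant_diameter_overlay(subscriptions):
    """Greedy star sketch: repeatedly add the star that reaches the most nodes."""
    nodes = sorted(subscriptions)
    all_topics = set().union(*subscriptions.values())
    # Topics with a single subscriber are trivially connected.
    remaining = {t for t in all_topics
                 if sum(t in subscriptions[v] for v in nodes) > 1}
    edges = set()
    centers = []

    def neighbors(u):
        # ASSUMED definition: nodes sharing at least one remaining topic with u.
        live = subscriptions[u] & remaining
        return [v for v in nodes if v != u and subscriptions[v] & live]

    while remaining:
        center = max(nodes, key=lambda u: len(neighbors(u)))
        for v in neighbors(center):
            edges.add(frozenset((center, v)))
        # Every remaining topic of the center is now connected by its star.
        remaining -= subscriptions[center]
        centers.append(center)
    return centers, edges


SUBSCRIPTIONS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
centers, edges = constant_diameter_overlay(SUBSCRIPTIONS)
print(centers, len(edges))
```

In this sketch every topic ends up covered by a star centered at one of its subscribers, so any two subscribers of the same topic are at most two hops apart within that topic.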

Constant Diameter Overlay Algorithm I

Constant Diameter Overlay Design Algorithm I:
– At each iteration:
  • Find the weight for each node
  • Add a star which connects the node with maximum weight
  • Remove topics which are connected by the star
– Stop once there is a single connected component for each topic

Weight of node u: [formula not recovered from the slide]

Constant Diameter Overlay Algorithm II

Constant Diameter Overlay Design Algorithm II:
– At each iteration:
  • Find the number of neighbors for each node
  • Add a star which connects the node with maximum density
  • Remove topics which are connected by the star
– Stop once there is a single connected component for each topic

Density of node u: [formula not recovered from the slide]

Experimental Results I

[Plot: average node degree, varying #nodes]
– #topics: 100
– #subscriptions: 10
– Uniform distribution

Only 2.3 times more edges

Experimental Results II

[Plot: average node degree, varying #topics]
– #nodes: 100
– #subscriptions: 20
– Uniform distribution

Only 1.9 times more edges

Experimental Results III

[Plot: average node degree, varying #subscriptions]
– #nodes: 100
– #topics: 100
– Uniform distribution

Only 1.8 times more edges

Conclusions

Formal study of the problem of designing efficient and scalable overlay topologies for pub/sub

Empirical evaluation showed effectiveness of our approximation algorithm on practical inputs

Parameterized algorithm with low maximum and average degree

Defined the Constant Diameter Topic-Connected Overlay (CD-TCO) problem and presented empirical results

Future Directions

Study the dynamic case

Investigate other overlay design problems

Study the distributed case
– Partial knowledge of other nodes' interests
– Dynamically changing interest assignments

Proving diameter results theoretically

Thank You!