Topics in Database Systems: Data Management in Peer-to-Peer Systems

1p2p, Fall 06

Topics in Database Systems: Data Management in Peer-to-Peer Systems

PART 1: Replication and other issues

2p2p, Fall 06

Agenda για σήμερα

1. Περιγραφή των εργασιών του μαθήματος2. Γενικά για Replication3. Replication Theory for Unstructured (Cohen et al paper)4. Epidemic Algorithms for Updates (Demers et al paper)

3p2p, Fall 06

Term Projects

Εργασίες τριών τύπων

Έχουν κάποιο «ερευνητικό» χαρακτήρα – χρειάζεται να σκεφτείτε

Δεν υπάρχει μία λύση (άρα την ίδια εργασία παραπάνω από μια ομάδες)

Θα ήθελα 3 άτομα ανά ομάδα

Αν έχετε κάποια άλλη ιδέα – γίνεται αλλά όχι «αυτόματα»

Θα φτιάξετε μια web σελίδα για το project – την οποία θα μου στείλετε replicate content and not index (for durability)!!!

4p2p, Fall 06

Term Projects

ΕΡΓΑΣΙΑ ΤΥΠΟΥ I ================= Θα επιλέξετε ένα άρθρο από μια λίστα από άρθρα Τα άρθρα αφορούν προβλήματα διαχείρισης δεδομένων είτε σε κεντρικοποιημένα συστήματα είτε σε κατανεμημένα συστήματα χωρίς τις ιδιότητες των συστημάτων ομοτίμων. Στόχος της εργασίας είναι η σχεδίαση μια εκδοχής του προβλήματος κατάλληλης για ένα σύστημα ομοτίμων κόμβων. Η εργασία σας θα πρέπει να περιέχει μια μορφή αξιολόγησης της προσέγγισής σας. Αυτή μπορεί να είναι θεωρητική (πχ, εκτίμηση πολυπλοκότητας της λύσης, απόδειξη της ορθότητας ή άλλων ιδιοτήτων (πχ εξισορρόπιση φορτίου) της λύσης) ή/και να περιλαμβάνει μια μικρή υλοποίηση. Θα παραδώσετε ένα άρθρο που θα έχει την μορφή ερευνητικής εργασίας (θα δοθούν οδηγίες). Επίσης, θα παρουσιάσετε την εργασία σας στο μάθημα (θα δοθούν οδηγίες).

5p2p, Fall 06

Term Projects

Άρθρα για τις Εργασίες Τύπου Ι[1-3] Διαλέξτε οποιοδήποτε (ένα) από τα sections 3, 4 ή 5 από το: M. Stonebraker, P. M. Aoki, W. Litwin, A. Pfeffer, A. Sah, J. Sidell C. Staelin and A. Yu. Mariposa: A Wide-Area Distributed Database System. VLDB J., 5(1), 1996, 48-63.

[4] Εξετάστε πως το παρακάτω που συζητήσαμε στο μάθημα μπορεί να προσαρμοστεί για p2p: A. J. Demers, D. H. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. E. Sturgis, D. C. Swinehart, D. B. Terry: Epidemic Algorithms for Replicated Database Maintenance. PODC 1987: 1-12

[5] Θεωρείστε μια κατανεμημένη (p2p) εκδοχή ενός bitmap index. Για τα bitmap indexes μπορείτε να συμβουλευτείτε οποιοδήποτε βιβλίο βάσεων δεδομένων ή/και το παρακάτω P. E. O'Neil and D. Quass. Improved Query Performance with Variant Indexes. Proc. SIGMOD Conference, 1997, 38-49.

[6] Εξετάστε πως το παρακάτω που αφορά sensor networks μπορεί να εφαρμοστεί σε p2p συστήματα D. Zeinalipour-Yazti, Z. Vagena, D. Gunopulos, V. Kalogeraki, V. Tsotras, M. Vlachos, N. Koudas, D. Srivastava The Threshold Join Algorithm for Top-k Queries in Distributed Sensor Networks, DMSN Workshop, 2005.

6p2p, Fall 06

Term ProjectsΕΡΓΑΣΙΑ ΤΥΠΟΥ ΙΙ =================== Θα επιλέξετε ένα άρθρο που αφορά θέματα της περιοχής των συστημάτων ομότιμων κόμβων που δεν έχουμε καλύψει στο μάθημα, συγκεκριμένα: (i) security, (ii) trust/reputation, (iii) incentives, (iv) publish-subscribe συστήματα.

Παρουσίαση του άρθρου στο μάθημα.

(α) προτείνετε κάποια επέκταση του άρθρου, πχ εφαρμογή του σε άλλο τύπο overlay, βελτίωση κάποιου χαρακτηριστικού του κλπ. Σε αυτήν την περίπτωση, θα πρέπει να συμπεριλάβετε και κάποια μορφή αξιολόγησης της επέκτασης. Αυτή μπορεί να είναι θεωρητική (πχ, εκτίμηση πολυπλοκότητας της λύσης κλπ) ή/και να περιλαμβάνει μια μικρή υλοποίηση, είτε (β) να υλοποιήσετε ένα ικανοποιητικό κομμάτι του άρθρου.

Θα παραδώσετε ένα άρθρο που θα έχει την μορφή ερευνητικής εργασίας (θα δοθούν οδηγίες).

Επίσης, θα δώσετε μια δεύτερη παρουσίαση στο μάθημα αυτή τη φορά της εργασία σας (θα δοθούν οδηγίες).

7p2p, Fall 06

Term Projects

Άρθρα για τις Εργασίες Τύπου ΙI

• Security E. Sit and R. Morris: Security Considerations for Peer-to-Peer Distributed Hash Tables. IPTPS 2002: 261-269 D. S. Wallach: A Survey of Peer-to-Peer Security Issues. ISSS 2002: 42-57

• Incentives M. Feldman, K. Lai, I. Stoica and J. Chuang: Robust incentive techniques for peer-to-peer networks. ACM Conference on Electronic Commerce 2004: 102-111

• Trust/Reputation S. D. Kamvar, M. T. Schlosser, H. Garcia-Molina: The Eigentrust algorithm for reputation management in P2P networks. WWW 2003: 640-651

• Publish/subscribe M. Bender, S. Michel, S. Parkitny, and G. Weikum A Comparative Study of Pub/Sub Methods in Structured P2P Networks. DBISP2P 2006, Seoul, South Korea, Springer, 2006

8p2p, Fall 06

Term Projects

ΕΡΓΑΣΙΑ ΤΥΠΟΥ ΙIΙ ===================

Θα επιλέξετε ένα από τα συστήματα που αφορούν λογισμικό συστημάτων ομότιμων κόμβων.

Θα πρέπει να εγκαταστήσετε το σχετικό λογισμικό και να κατασκευάσετε μια μικρή εφαρμογή.

Θα παραδώσετε ένα άρθρο που θα περιλαμβάνει ένα σύντομο εγχειρίδιο για το σύστημα και μια περιγραφή της εφαρμογή σας.

Επίσης, θα παρουσιάσετε την εργασία σας στο μάθημα (θα δοθούν οδηγίες). Η παρουσίαση θα πρέπει να περιλαμβάνει και ένα σύντομο demo.

9p2p, Fall 06

Term Projects

Τα Συστήματα για τις Εργασίες Τύπου IΙI

[1] OpenDHT OpenDHT is a publicly accessible distributed hash table (DHT) service.

[2] P2: Declarative Networking: P2 is a system which uses a high-level declarative language to express overlay networks in a highly compact and reusable form

[3] PeerSim: PeerSim is a simulation environment for P2P protocols in java.

http://opendht.org/

http://p2.cs.berkeley.edu/

http://peersim.sourceforge.net/

10p2p, Fall 06

Term Projects

Προθεσμίες:

Δεκ 7: Σχηματισμός ομάδων και επιλογή εργασίας Δεκ 14: 1-2 σελίδες "πρόταση εργασίας" (project proposal) (θα δοδούν οδηγίες) Δεκ 21: πιθανών να έχουμε μια μικρή παρουσίαση/συζήτηση των εργασιών την τελευταία εβδομάδα πριν τα Χριστούγεννα Ιαν 11: Παρουσιάσεις άρθρων Ομάδας ΙΙ Ιαν 18: " "

Ιαν 25: Παράδοση Εργασίας (για το άρθρο, θα δοθούν οδηγίες)

Θα υπάρχει ένα τελικό workshop που θα παρουσιαστούν οι εργασίες όλων των ομάδων.

11p2p, Fall 06


1. Περιγραφή των εργασιών του μαθήματος

2. Γενικά για Replication

3. Replication Theory for Unstructured (Cohen et al paper)4. Epidemic Algorithms for Updates (Demers et al paper)

12p2p, Fall 06

Types of Replication

Two types of replication Metadata/Index: replicate index entries

Data/Document replication: replicate the actual data (e.g., music files)

Metadata vs Data(+) “Lighter” storage and bandwidth wise(+) Sizes of replicated objects more uniform(-) Adds an extra hop for actually getting the data(-) More frequent updates(-) Less durability/availability

13p2p, Fall 06

Types of Replication

Caching vs Replication

Cache: Store data retrieved from a previous request (client-initiated)

Replication: More proactive, a copy of a data item may be stored at a node even if the node has not requested it

14p2p, Fall 06

Reasons for Replication

Reasons for replication

Performanceload balancing locality: place copies close to the requestor

geographic locality (more choices for the next step in search)

reduce number of hops

AvailabilityIn case of failuresPeer departures

15p2p, Fall 06

Reasons for Replication

Besides storage, cost associated with replication: Consistency Maintenance

Make reads faster in the expense of slower writes

16p2p, Fall 06

• No proactive replication (Gnutella)– Hosts store and serve only what they requested– A copy can be found only by probing a host with a copy

• Proactive replication of “keys” (= meta data + pointer) for search efficiency (FastTrack, DHTs)

• Proactive replication of “copies” – for search and download efficiency, anonymity. (Freenet)

17p2p, Fall 06

Issues

Which items (data/metadata) to replicate

Based on popularityIn traditional distributed systems, also rate of read/write

cost benefit:the ratio: read-savings/write-increase

Where to replicate (allocation schema)

18p2p, Fall 06

Issues

How/When to update

Both data items and metadata

19p2p, Fall 06

“Database-Flavored” Replication Control Protocols

Lets assume the existence of a data item x with copies x1, x2, …, xn

x: logical data itemxi’s: physical data items

A replication control protocol is responsible for mapping each read/write on a logical data item (R(x)/W(x)) to a set of read/writes on a (possibly) proper subset of the physical data item copies of x

20p2p, Fall 06

One Copy Serializability

Correctness

A DBMS for a replicated database should behave like a DBMS managing a one-copy (i.e., non-replicated) database insofar as users can tell

One-copy serializable (1SR)the schedule of transactions on a replicated database be equivalent to a serial execution of those transactions on a one-copy database

One-copy schedule: replace operation of data copies with operations on data items

21p2p, Fall 06

ROWA

Read One/Write All (ROWA)A replication control protocol that maps each read to only one copy of the item and each write to a set of writes on all physical data item copies.

Even if one of the copies is unavailable an update transaction cannot terminate

22p2p, Fall 06

Write-All-Available

Write-all-available A replication control protocol that maps each read to only one copy of the item and each write to a set of writes on all available physical data item copies.

23p2p, Fall 06

Quorum-Based Voting

Read quorum Vr and a write quorum Vw to read or write a data item

If a given data item has a total of V votes, the quorums have to obey the following rules:

1. Vr + Vw > V2. Vw > V/2

Rule 1 ensures that a data item is not read or written by two transactions concurrently (R/W)Rule 2 ensures that two write operations from two transactions cannot occur concurrently on the same data item (W/W)

24p2p, Fall 06

Distributing Writes

Immediate writes

Deffered writesAccess only one copy of the data item, it delays the distribution of writes to other sites until the transaction has terminated and is ready to commit.It maintains an intention list of deferred updatesAfter the transaction terminates, it send the appropriate portion of the intention list to each site that contains replicated copiesOptimizations – aborts cost less – may delay commitment – delays the detection of conflicts

Primary or master copyUpdates at a single copy per item

25p2p, Fall 06

Eager vs Lazy Replication

Eager replication: keeps all replicas synchronized by updating all replicas in a single transaction

Lazy replication: asynchronously propagate replica updates to other nodes after the replicating transaction commits

In p2p, lazy replication (or soft state)

26p2p, Fall 06

Update Propagation

Stateless or State-full (the “item-owners” know which nodes holds copies of the item)

Who initiates the update: Push by the server item (copy) that changes Pull by the client holding the copy

27p2p, Fall 06

Update Propagation

When Periodic Immediate Lazy: when an inconsistency is detected Threshold-based: Freshness (e.g., number of updates or actual time)

Value Expiration-Time: Items expire (become invalid) after that time (most often used in p2p) Adaptive periodic:

Reduce or increase period based on the updates seen between two successive updates

Stateless or State-full (the “item-owners” know which nodes holds copies of the item)

28p2p, Fall 06

Path-length

Neighbor state

Total path latency

Per-hop latency volume Multiple

routesreplica

sDimension

s (d) O(dn1/d) O(d) - - -

Realities (r) O(r) - O(r) O(r)

MAXPEERS (p) O(1/p) O(p) O(p)* O(p)*

Hash functions

(k)- - - Ο(k) - O(k)

RTT-weighted routing

- - - - -

Uniform partitioning heuristic

Reduced variance

Reduces variance - - Reduced

variance - -

Summary: Design parameters and performance (CAN)

* Only on replicated data

29p2p, Fall 06

ReplicationEach node maintain a successor list of its r nearest successors

Upon failure, use the next successor in the list Modify stabilize to fix the list

CHORD: Failures

Other nodes may attempt to send requests through the failed node Use alternate nodes found in the routing table of preceding nodes or in the successor list

30p2p, Fall 06

A lookup fails, if all r nodes in the successor list fail. All fail with probability 2-r (independent failures) = 1/N

Theorem: If we use a successor list with r = Ο(logN) in an initially stable network and then every node fails with probability 1/2, then

with high probability, find_successor returns the closest living successor

the expected time to execute find_successor in the failed network is O(logN)

CHORD: Failures

31p2p, Fall 06

Store replicas of a key at the k nodes succeeding the key

Successor list helps to keep the number of replicas per item known

Other approach: store a copy per region

CHORD: Replication

32p2p, Fall 06

BATON: Failures

Upon node departure or failure, the parent can reconstruct the entries

Assume node x fails, any detected failures of x are reported to its parent y

y regenerates the routing tables of x – Theorem 2

Messages are routed Sideways (redundancy similar to CHORD) Up-down (can find its parent through its neighbors)

There is “routing” redundancy

33p2p, Fall 06

Replication - Beehive

Proactive – model-driven replication Passive (demand-driven) replication such as caching objects along a lookup path

Hint for BATONBeehiveThe length of the average query path reduced by one when an object is proactively replicated at all nodes logically preceding that node on all query pathsBATONRange queriesMany paths to data

Any ideas?

34p2p, Fall 06


1. Περιγραφή των εργασιών του μαθήματος2. Γενικά για Replication

3. Replication Theory for Unstructured (Cohen et al paper)

4. Epidemic Algorithms for Updates (Demers et al paper)

35p2p, Fall 06

Replication Theory: Replica Allocation Policies in Unstructured P2P Systems

E. Cohen and S. Shenker, “Replication Strategies in Unstructured Peer-to-Peer Networks”. SIGCOMM 2002 Q. Lv et al, “Search and Replication in Unstructured Peer-to-Peer Networks”, ICS’02 – Replication Part

36p2p, Fall 06

Question: how to use replication to improve search efficiency in unstructured networks?

Replication: Allocation Scheme

How many copies of each object so that the search overhead for the object is minimized, assuming that the total amount of storage for objects in the network is fixed

37p2p, Fall 06

Replication Theory - Model

Assume m objects and n nodesEach node capacity ρ, total capacity R = n ρ

How to allocate R among the m objects? Determine ri number of copies (distinct nodes) that hold a copy of i

Σ i=1, m ri = R (R total capacity)Also, pi = ri/R – Fraction of total capacity allocated to I

Allocation represented by the vector

(p1, p2, …. pm) = (r1/R, r2/R, rm/R)

38p2p, Fall 06


Assume that object i is requested with relative rates qi, we normalize it by setting

Σ i=1, m qi = 1

For convenience, assume 1 << ri n and that q1 q2 … qm

Map the query distribution q to an allocation vector p

39p2p, Fall 06


Assume all nodes equal capacity ρ, ρ = R/n

R m (at least one copy per item)m > ρ (else, the problem is trivial, maintain copies of all items everywhere)

Bounds for pi

At least one copy, ri 1, Lower value l = 1/R At most n copies, ri n, Upper value, u = n/R

40p2p, Fall 06

Replication Theory

Assume that searches go on until a copy is found

We want to determine ri that minimizes the average search size (number of nodes probed) to locate an item i

Need to compute average search size per item

Searches consist of randomly probing sites until the desired object is found: search at each step draws a node uniformly at random and asks whether it has a copy

41p2p, Fall 06

Search Example

2 probes 4 probes

42p2p, Fall 06

Replication Theory

The probability Pr(k) that the object I is found at the k’th probe is given

Pr(k) = Pr(not found in the previous k-1 probes) Pr(found in one (the kth)

probe) =

(1 – ri/n)k-1 * ri/n

k (search size: step at which the item is found) is a random variable with geometric distribution and θ = ri/n =>

expectation n/ri

43p2p, Fall 06

Replication Theory

Ai: Expectation (average search size) for object i is the inverse of the fraction of sites that have replicas of the object

Ai = n/ri

The average search size A of all the objects (average number of nodes probed per object query)

A = Σi qi Ai = n Σi qi/ri

Minimize: A = n Σi qi/ri

44p2p, Fall 06

Replication Theory

If we have no limit on ri, replicate everything everywhereThen, the average search size

Ai = n/ri = 1

Search becomes trivial

How to allocate these R replicas among the m objects: how many replicas per object

Assume a limit on R and that the average number of replicas per site ρ = R/n is fixed

45p2p, Fall 06

Replication Theory

Minimize: Σi qi/pi

Subject to Σpi = 1 and l pi u

MonotonicitySince q1 q2 … qm, we must have

p1 p2 … pm

More copies to more popular, but how many?

46p2p, Fall 06

Uniform Replication

Create the same number of replicas for each objectri = R/m

Average search size for uniform replicationAi = n/ri = m/ρ

Auniform = Σi qi m/ρ = m/ρ (m n/R)

Which is independent of the query distribution

47p2p, Fall 06

Proportional Replication

Create a number of replicas for each object proportional to the query rate

ri = R qi

It makes sense to allocate more copies to objects that are frequently queried, this should reduce the search size for the more popular objects

48p2p, Fall 06

Proportional Replication

Number of replicas for each object:ri = R qi

Average search size for uniform replicationAi = n/ri = n/R qi

Aproportioanl = Σi qi n/R qi = m/ρ = Auniform

again independent of the query distribution

Why? Objects whose query rate are greater than average (>1/m) do better with proportional, and the other do better with uniform

The weighted average balances out to be the same

49p2p, Fall 06

Uniform and Proportional Replication

Summary:• Uniform Allocation: pi = 1/m

•Simple, resources are divided equally

• Proportional Allocation: pi = qi•“Fair”, resources per item proportional to demand• Reflects current P2P practices

Example: 3 items, q1=1/2, q2=1/3, q3=1/6Uniform Proportional

50p2p, Fall 06

Space of Possible Allocations

q i+1/q i ? p i+1/p iAs the query rate decreases, how much does the ratio of allocated replicas behave

Reasonable:p i+1/p i 1

=1 for uniform

So what is the optimal way to allocate replicas so that A is minimized?

51p2p, Fall 06

Space of Possible Allocations

Definition: Allocation p1, p2, p3,…, pm is “in-between” Uniform and Proportional if

for 1< i <m, q i+1/q i < p i+1/p i < 1

(=1 for uniform, = for proportial, we want to favor popular but not too much)

Theorem1: All (strictly) in-between strategies are (strictly) better than Uniform and Proportional

Theorem2: p is worse than Uniform/Proportional if for all i, p i+1/p i > 1 (more popular gets less) OR for all i, q i+1/q i > p i+1/p i (less popular gets less than “fair share”)

Proportional and Uniform are the worst “reasonable” strategies

52p2p, Fall 06

q2/q1

p 2/p

1Space of allocations on 2 items

Worse than prop/uniMore popular item gets less.

Worse than prop/uni

More popular gets more thanits proportional share

Better than prop/uni

Uniform

Proportional

SR

53p2p, Fall 06

So, what is the best strategy?

54p2p, Fall 06

Square-Root Replication

Find ri that minimizes A,

A = Σi qi Ai = n Σi qi/ri

This is done for ri = λ √qi where λ = R/Σi √qi

Then the average search size isAoptimal = 1/ρ (Σi √qi)2

55p2p, Fall 06

How much can we gain by using SR ?w

i iq Zipf-like query rates

Auniform/ASR

56p2p, Fall 06

Other Metrics: Discussion

Utilization rate, the rate of requests that a replica of an object i receives

Ui = R qi/ri

For uniform replication, all objects have the same average search size, but replicas have utilization rates proportional to their query rates

Proportional replication achieves perfect load balancing with all replicas having the same utilization rate, but average search sizes vary with more popular objects having smaller average search sizes than less popular ones

57p2p, Fall 06

Replication: Summary

58p2p, Fall 06

Pareto Distribution (for the queries)

59p2p, Fall 06

Pareto Distribution (for the queries)

Both model Power-law distributions

Zipf: what is the size (popularity) of the r-th ranked -- y ~ r-b Pareto: how many have size > r (look at the frequency distribution)P[X > x] ~ x-k

P[X = x] ~ x-(k+1) = x-a

"The r-th hottest item has n queries" is equivalent to saying "r items have n or more queries". This is exactly the definition of the Pareto distribution, except the x and y axes are flipped. Whereas for Zipf, we have r (rank) and compute n, in Pareto we have n and compute r (rank) Reference: http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html

60p2p, Fall 06

Replication (summary)

Each object i is replicated on ri nodes and the total number of objects stored is R, that is

Σ i=1, m ri = R(1) Uniform: All objects are replicated at the same number

of nodesri = R/m

(2) Proportional: The replication of an object is proportional to the query probability of the object

ri qi

(3) Square-root: The replication of an object i is proportional to the square root of its query probability qi

ri √qi

61p2p, Fall 06

What is the search size of a query ?

Soluble queries: number of probes until answer is found.

Insoluble queries: maximum search size

Query is soluble if there are sufficiently many copies of the item.

Query is insoluble if item is rare or non existent.

Assumption that there is at least one copy per object

62p2p, Fall 06

• SR is best for soluble queries

• Uniform minimizes cost of insoluble queries

OPT is a hybrid of Uniform and SR

Tuned to balance cost of soluble and insoluble queries: uniformly allocate a minimum number of copies per item, use SR for the rest

What is the optimal strategy?

63p2p, Fall 06

UniformSR

10^4 items, Zipf-like w=1.5

All Soluble

85% Soluble

All Insoluble

64p2p, Fall 06

We now know what we need.

How do we get there?

65p2p, Fall 06

Replication Algorithms

• Fully distributed where peers communicate through random probes; minimal bookkeeping; and no more communication than what is needed for search.

• Converge to/obtain SR allocation when query rates remain steady.

Uniform and Proportional are “easy”– Uniform: When item is created, replicate its key in a fixed

number of hosts.– Proportional: for each query, replicate the key in a fixed

number of hosts (need to know or estimate the query rate)

Desired properties of algorithm:

66p2p, Fall 06

Replication - Implementation

Two strategies are popular

Owner ReplicationWhen a search is successful, the object is stored at the requestor node only (used in Gnutella)

Path ReplicationWhen a search succeeds, the object is stored at all nodes along the path from the requestor node to the provider node (used in Freenet)Following the reverse path back to the requestor

67p2p, Fall 06

Achieving Square-Root Replication

How can we achieve square-root replication in practice?

Assume that each query keeps track of the search size Each time a query is finished the object is copied to a number of sites proportional to the number of probes

On average object i will be replicated on c n/ri times each time a query is issued (for some constant c)

It can be shown that this gives square root

68p2p, Fall 06

Replication - Conclusion

Thus, for Square-root replication

an object should be replicated at a number of nodes that is proportional to the number of probes that the search required

69p2p, Fall 06


If a p2p system uses k-walkers, the number of nodes between the requestor and the provider node is 1/k of the total nodes visited (number of probes)

Then, path replication should result in square-root replication

Problem: Tends to replicate nodes that are topologically along the same path

70p2p, Fall 06


Random ReplicationWhen a search succeeds, we count the number of nodes on the path between the requestor and the providerSay pThen, randomly pick p of the nodes that the k walkers visited to replicate the object

Harder to implement

71p2p, Fall 06

Achieving Square-Root Replication

What about replica deletion?Steady state: creation time equal with the deletion time

The lifetime of replicas must be independent of object identity or query rateFIFO or random deletions is okLRU or LFU no

72p2p, Fall 06

Replication: EvaluationStudy the three replication strategies in the Random graph network topologySimulation Details• Place the m distinct objects randomly into the network• Query generator generates queries according to a Poisson process at 5 queries/sec• Zipf-distribution of queries among the m objects (with a = 1.2)• For each query, the initiator is chosen randomly• Then a 32-walker random walk with state keeping and checking every 4 steps• Each sites stores at most objAllow (40) objects• Random Deletion• Warm-up period of 10,000 secs• Snapshots every 2,000 query chunks

73p2p, Fall 06

Replication: Evaluation

For each replication strategy

What kind of replication ratio distribution does the strategy generate?

What is the average number of messages per node in a system using the strategy

What is the distribution of number of hops in a system using the strategy

74p2p, Fall 06

Evaluation: Replication Ratio

Both path and random replication generates replication ratios quite close to square-root of query rates

75p2p, Fall 06

Evaluation: Messages

Path replication and random replication reduces the overall message traffic by a factor of 3 to 4

76p2p, Fall 06

Evaluation: Hops

Much of the traffic reduction comes from reducing the number of hops

Path and random, better than ownerFor example, queries that finish with 4 hops, 71% owner, 86% path, 91% random

77p2p, Fall 06

Summary

• Random Search/replication Model: probes to “random” hosts

• Proportional allocation – current practice• Uniform allocation – best for insoluble queries

• Soluble queries: • Proportional and Uniform allocations are two

extremes with same average performance• Square-Root allocation minimizes Average

Search Size

• OPT (all queries) lies between SR and Uniform• SR/OPT allocation can be realized by simple

algorithms.

78p2p, Fall 06

Discussion

Cohen et al paperPath replication overshoots or undershoot the fixed point if queries arrive in large bursts or time between search and subsequent copy generator is large – more involved algorithms than path replication

Extensions for variable size issues or nodes with heterogeneous capacities

Many issues:Other types of graphs, adaptability, etc …

79p2p, Fall 06


1. Περιγραφή των εργασιών του μαθήματος2. Γενικά για Replication3. Replication Theory for Unstructured (Cohen et al paper)

4. Epidemic Algorithms for Updates (Demers et al paper)

80p2p, Fall 06

Replication & Unstructured P2P

epidemic algorithms

81p2p, Fall 06

Replication Policy How many copies Where (owner, path, random path)

Update Policy Synchronous vs Asynchronous Master Copy

82p2p, Fall 06

Methods for spreading updates:

Push: originate from the site where the update appeared To reach the sites that hold copies

Pull: the sites holding copies contact the master siteExpiration times

Epidemics for spreading updates

83p2p, Fall 06

Update at a single site

Randomized algorithms for distributing updates and driving replicas towards consistency

Ensure that the effect of every update is eventually reflected to all replicas:Sites become fully consistent only when all updating activity has stopped and the system has become quiescent

Analogous to epidemics

A. Demers et al, Epidemic Algorithms for Replicated Database Maintenance, SOSP 87

84p2p, Fall 06


Direct mail: each new update is immediately mailed from its originating site to all other sites

(+) Timely & reasonably efficient(-) Not all sites know all other sites (stateless)(-) Mails may be lost

Anti-entropy: every site regularly chooses another site at random and by exchanging content resolves any differences between them

(+) Extremely reliable but requires exchanging content and resolving updates(-) Propagates updates much more slowly than direct mail

85p2p, Fall 06


Rumor mongering: Sites are initially “ignorant”; when a site receives a new update it becomes a “hot rumor” While a site holds a hot rumor, it periodically chooses another site at random and ensures that the other site has seen the update

When a site has tried to share a hot rumor with too many sites that have already seen it, the site stops treating the rumor as hot and retains the update without propagating it further

Rumor cycles can be more frequent that anti-entropy cycles, because they require fewer resources at each site, but there is a chance that an update will not reach all sites

86p2p, Fall 06

Anti-entropy and rumor spreading are examples of epidemic algorithms

Three types of sites: Infective: A site that holds an update that is willing to share is hold Susceptible: A site that has not yet received an update Removed: A site that has received an update but is no longer willing to share

Anti-entropy: simple epidemic where all sites are always either infective or susceptible

87p2p, Fall 06

A set S of n sites, each storing a copy of a database

The database copy at site s S is a time varying partial function s.ValueOf: K {u:V x t :T}

set of keys set of values set of timestamps (totally ordered by <

V contains the element NILs.ValueOf[k] = {NIL, t}: item with k has been deleted from the database

Assume, just one items.ValueOf {u:V x t:T}thus, an ordered pair consisting of a value and a timestampThe first component may be NIL indicating that the item was deleted by the time indicated by the second component

88p2p, Fall 06

The goal of the update distribution process is to drive the system towards

s, s’ S: s.ValueOf = s’.ValueOf

Operation invoked to update the database

Update[u:V] s.ValueOf {r, Now{})

89p2p, Fall 06

Direct Mail

At the site s where an update occurs:For each s’ S

PostMail[to:s’, msg(“Update”, s.ValueOf)

Each site s’ receiving the update message: (“Update”, (u, t))

If s’.ValueOf.t < t

s’.ValueOf (u, t)

The complete set S must be known to s (stateful server) PostMail messages are queued so that the server is not delayed (asynchronous), but may fail when queues overflow or their destination are inaccessible for a long time n (number of sites) messages per update traffic proportional to n and the average distance between sites

s originator of the updates’ receiver of the update

90p2p, Fall 06

Anti-Entropy

At each site s periodically execute:For some s’ S

ResolveDifference[s, s’]

Three ways to execute ResolveDifference:Push (sender (server) - driven)

If s.Valueof.t > s’.Valueof.t

s’.ValueOf s.ValueOf

Pull (receiver (client) – driven)If s.Valueof.t < s’.Valueof.t

s.ValueOf s’.ValueOf

Push-Pulls.Valueof.t > s’.Valueof.t s’.ValueOf s.ValueOfs.Valueof.t < s’.Valueof.t s.ValueOf s’.ValueOf

s s’

s pushes its value to s’

s pulls s’ and gets s’

value

91p2p, Fall 06

Anti-Entropy

Assume that Site s’ is chosen uniformly at random from the set S Each site executes the anti-entropy algorithm once per period

It can be proved that An update will eventually infect the entire population Starting from a single affected site, this can be achieved in time proportional to the log of the population size

92p2p, Fall 06

Anti-EntropyLet pi be the probability of a site remaining susceptible (has not received the update) after the i cycle of anti-entropy

For pull,A site remains susceptible after the i+1 cycle, if (a) it was susceptible after the i cycle and (b) it contacted a susceptible site in the i+1 cycle

pi+1 = (pi)2

For push,A site remains susceptible after the i+1 cycle, if (a) it was susceptible after the i cycle and (b) no infectious site choose to contact in the i+1 cycle

pi+1 = pi (1 – 1/n)n(1-pi)

1 – 1/n (site is not contacted by a node)n(1-pi) number of infectious nodes at cycle iPull is preferable than

push

93p2p, Fall 06

Anti-Entropy

More next week

Documents

Topics in Database Systems: Data Management in Peer-to-Peer Systems