Data Currency in Replicated DHTs
Reza Akbarinia, Esther Pacitti and Patrick Valduriez
University of Nantes, France, INRIA
ACM SIGMOD 2007
Presenter Jerry Wu
Motivation
• P2P data sharing systems
– Enable a large number of users to share a massive number of files
– Query, reply, send request, download
• Message forwarding on these systems
– Flooding: KaZaA, Gnutella
– DHT: CAN, Chord, Pastry, etc.
Distributed Hash Table (DHT)
• Use hash functions to locate files
– h(meta data) = k (for identification)
– g(k) = k1 (for routing)
[Figure: peers A, B, C, D, E, F and user U; the metadata of FreeLoop.mp3 maps to g(k) = k1, which is held by peer A.]
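The two hash steps above can be sketched in Python. This is only an illustration: SHA-1 for both h and g, the 16-bit identifier space, and the Chord-like rule "the first peer at or after a position is responsible for it" are assumptions, not details given on the slides. Peer names mirror the figure.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # assumed small identifier space, chosen for readability


def h(metadata: str) -> int:
    """Identification hash: metadata -> key k."""
    digest = hashlib.sha1(metadata.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


def g(k: int) -> int:
    """Routing hash: key k -> position k1 in the identifier space."""
    digest = hashlib.sha1(b"g:" + str(k).encode()).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


def responsible_peer(position: int, peers: dict) -> str:
    """First peer at or after `position`, wrapping around (assumed rule)."""
    ids = sorted(peers)
    return peers[ids[bisect_left(ids, position) % len(ids)]]


# Hypothetical peers, placed in the identifier space by hashing their names.
peers = {h(name): name for name in "ABCDEF"}

k = h("FreeLoop.mp3")   # identification
k1 = g(k)               # routing
owner = responsible_peer(k1, peers)
```

Any peer can repeat the same two hash computations to find `owner`, which is why no flooding is needed.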
Data Replication
• What if node A fails?
• Duplicate several copies
[Figure: peers A, B, C, D, E, F and user U; FreeLoop.mp3 is replicated with three hash functions: g(h(FreeLoop.mp3)) = k1 (A), g2(h(FreeLoop.mp3)) = k2 (D), g3(h(FreeLoop.mp3)) = k3 (E).]
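A small Python sketch of how one key can be mapped to several replica positions. The salted SHA-256 construction and the names `make_hash` and `replica_keys` are hypothetical; the slides only say that g, g2 and g3 are different hash functions.

```python
import hashlib

SPACE = 2 ** 16  # assumed identifier space


def make_hash(salt: str):
    """Build one member of the hash family by salting SHA-256 (an assumption)."""
    def g(k: int) -> int:
        digest = hashlib.sha256(f"{salt}:{k}".encode()).digest()
        return int.from_bytes(digest, "big") % SPACE
    return g


# H = {g, g2, g3}: three replication hash functions, so |H| = 3.
H = [make_hash(f"g{i}") for i in range(1, 4)]


def replica_keys(k: int) -> list:
    """Positions k1, k2, k3 at which the file with key k is replicated."""
    return [g(k) for g in H]


k = 12345  # stands in for h(FreeLoop.mp3)
positions = replica_keys(k)
```

If the peer responsible for the first position fails, the file is still reachable through the other positions.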
Basic Operations
• putH(meta key k, File D)
– Insert a file into the DHT
• getH(meta key k)
– Retrieve the file from the DHT
• H: { g(k, D) | g is used as a hash function }
• |H|: the replication level of the system
– Each file is stored at |H| peers
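The put and get operations can be illustrated with an in-memory stand-in for the DHT. `ToyDHT`, its method names, and the salted SHA-256 positions are all hypothetical; each position plays the role of one peer.

```python
import hashlib


class ToyDHT:
    """In-memory stand-in for a DHT; each position plays the role of one peer."""

    def __init__(self, replication_level: int = 3):
        # H: one salted hash function per replica, so |H| = replication_level.
        self.H = [f"g{i}" for i in range(1, replication_level + 1)]
        self.peers = {}  # position -> {key: file}

    def _position(self, g: str, k: str) -> int:
        digest = hashlib.sha256(f"{g}:{k}".encode()).digest()
        return int.from_bytes(digest, "big")

    def put(self, g: str, k: str, file: str) -> None:
        """Store the file at the peer selected by hash function g."""
        self.peers.setdefault(self._position(g, k), {})[k] = file

    def get(self, g: str, k: str):
        """Fetch the file from the peer selected by g, or None if absent."""
        return self.peers.get(self._position(g, k), {}).get(k)

    def put_all(self, k: str, file: str) -> None:
        """Store the file once per hash function in H."""
        for g in self.H:
            self.put(g, k, file)


dht = ToyDHT(replication_level=3)
dht.put_all("k", "FreeLoop.mp3")

# Simulate the failure of the peer reached through the first hash function.
del dht.peers[dht._position("g1", "k")]
survivors = [dht.get(g, "k") for g in dht.H]
```

After the simulated failure, `survivors` holds `None` for the lost replica and the file for the remaining two.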
Additional Problems
• If the owner can modify the data …
• The nature of P2P systems
– Peers can join and leave dynamically
• What happens to an update made while some peers are away and rejoin later?
• Concurrent updates?
Solution
• What if we have a timestamp for each update/insert transaction?
– The currency of the file is judged by its timestamp
– FileX = File + timestamp
– Put (k, FileX) instead of (k, File) into the DHT
• Then we know the freshness of the file
• Only the latest update can succeed
How Can We Get A Timestamp?
• KTS (Key-based Timestamp Service)
– Issues a timestamp for each transaction
– gen_ts(key k)
  • Generates a timestamp w.r.t. key k
– last_ts(key k)
  • Returns the last timestamp issued for key k
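The two KTS calls can be sketched with one integer counter per key. This is a single-process illustration under that assumption; the class name `KTS` and the choice of 0 for "no timestamp issued yet" are hypothetical.

```python
class KTS:
    """Toy key-based timestamp service: one monotonic counter per key."""

    def __init__(self):
        self._counters = {}  # key -> last timestamp issued for that key

    def gen_ts(self, k: str) -> int:
        """Issue a new timestamp for key k, larger than all earlier ones."""
        self._counters[k] = self._counters.get(k, 0) + 1
        return self._counters[k]

    def last_ts(self, k: str) -> int:
        """Return the last timestamp issued for key k (0 if none, by assumption)."""
        return self._counters.get(k, 0)


kts = KTS()
t1 = kts.gen_ts("k")
t2 = kts.gen_ts("k")
```

Timestamps are ordered per key only: `t2 > t1` for key "k", while another key starts its own sequence.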
The New DHT Functions
• Based on the KTS service
• Insert(key k, FileX D, hash function set Hr)
– Insert or update a file with identity key k in the DHT
• Retrieve(k, Hr)
– Retrieve the latest copy of the file with identity key k
Insert A File
[Figure: user U inserts P.avi among peers A to G. h(P.avi) = k; the KTS timestamp service returns gen_ts(k) = tA; putg(k, (tA, P.avi)) stores the pair at g(k) = k1 (A) and putg2(k, (tA, P.avi)) stores it at g2(k) = k2 (C).]
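The insert flow in the figure can be sketched as follows. Replicas are modeled as plain dictionaries and an unreachable peer as `None`; these modeling choices and the function names are assumptions made for the example.

```python
def gen_ts(counters: dict, k: str) -> int:
    """Toy gen_ts: increment and return the per-key counter."""
    counters[k] = counters.get(k, 0) + 1
    return counters[k]


def insert(k: str, data: str, replicas: list, counters: dict):
    """Stamp the data once, then put (ts, data) at every reachable replica."""
    ts = gen_ts(counters, k)
    stored = 0
    for replica in replicas:
        if replica is None:      # peer currently unreachable (assumed model)
            continue
        replica[k] = (ts, data)  # corresponds to put_g(k, (ts, data))
        stored += 1
    return ts, stored


counters, peer_a, peer_c = {}, {}, {}
first = insert("k", "P.avi", [peer_a, peer_c], counters)
# Peer A is away during the second insert, so its copy becomes stale.
second = insert("k", "P.avi v2", [None, peer_c], counters)
```

After the second call, peer A still holds the copy with the older timestamp, which is exactly the situation the retrieve operation has to detect.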
Retrieve A File
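[Figure: user U retrieves P.avi among peers A to G. h(P.avi) = k; the KTS timestamp service returns last_ts(k) = tA; getg(k) and getg2(k) query g(k) = k1 (A) and g2(k) = k2 (C); the replies are (t0, P.avi) and (tA, P.avi), and the copy stamped tA is the current one.]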
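A sketch of the retrieve flow: ask for last_ts(k), then probe replicas until one carries that timestamp. Falling back to the newest copy seen when no current copy is reachable is an assumption of this example, as are the function name and the dictionary model of a replica.

```python
def retrieve(k: str, replicas: list, last_ts: int):
    """Return (data, probes): the current copy and how many replicas were asked."""
    newest = None
    for probes, replica in enumerate(replicas, start=1):
        entry = replica.get(k)          # corresponds to get_g(k)
        if entry is None:
            continue
        ts, data = entry
        if ts == last_ts:               # fresh copy found, stop early
            return data, probes
        if newest is None or ts > newest[0]:
            newest = entry
    # No current copy reachable: return the newest one seen (assumed fallback).
    return (newest[1] if newest else None), len(replicas)


stale = {"k": (1, "P.avi (old)")}
fresh = {"k": (2, "P.avi")}
result = retrieve("k", [stale, fresh], last_ts=2)
```

Here the first replica is stale, so the second probe returns the current copy and `result` is `("P.avi", 2)`.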
Update A File
• putg(k, (tsx, File D))
• If (tsx > ts0) then
– Update File D

Key   TS    File
k     ts0   File D (P.avi)
k1    ts1   File D1 (X.mp3)
k2    ts2   File D2 (Y.m4v)
k3    ts3   File D3 (Z.tar)
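The update rule at a single peer can be sketched like this. The table is a dictionary from key to (timestamp, file); the function name `put_if_newer` is hypothetical.

```python
def put_if_newer(table: dict, k: str, ts_x: int, file: str) -> bool:
    """Apply the update only if ts_x is newer than the stored timestamp."""
    current = table.get(k)
    if current is None or ts_x > current[0]:
        table[k] = (ts_x, file)
        return True
    return False


table = {"k": (5, "P.avi")}
accepted_new = put_if_newer(table, "k", 7, "P.avi v7")
# A concurrent update with an older timestamp arrives late and is rejected.
accepted_late = put_if_newer(table, "k", 6, "P.avi v6")
```

Because the comparison uses timestamps rather than arrival order, concurrent updates converge on the one with the latest timestamp.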
Retrieval Cost Analysis
• C = Ckts + N * Cret
• Ckts = Cret = O(log n), n = # of peers
• Let X be the random variable of N
• N: number of retries to get the latest copy
• pt: the probability of finding a fresh copy
• Prob(X = i) = pt * (1 - pt)^(i-1)
• |Hr| = number of replicas of the system
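A numeric sketch of the expected number of probes under the distribution above. It assumes the search stops after |Hr| probes at the latest, so the remaining probability mass is placed on X = |Hr|; that truncation is an assumption of this example.

```python
def expected_probes(p_t: float, n_replicas: int) -> float:
    """E[X] for Prob(X = i) = p_t * (1 - p_t)**(i - 1), truncated at n_replicas."""
    q = 1.0 - p_t
    partial = sum(i * p_t * q ** (i - 1) for i in range(1, n_replicas))
    # All remaining probability mass: the search ends at the last replica.
    return partial + n_replicas * q ** (n_replicas - 1)


def expected_cost(c_kts: float, c_ret: float, p_t: float, n_replicas: int) -> float:
    """C = Ckts + N * Cret, with N replaced by its expectation."""
    return c_kts + expected_probes(p_t, n_replicas) * c_ret


probes = expected_probes(0.3, 13)
```

Under this assumption the expectation equals (1 - (1 - pt)^|Hr|) / pt, which always stays below 1 / pt, so the expected cost is bounded by Ckts + Cret / pt.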
Retrieval Cost Analysis
• Then, how can we get a timestamp?
– Key-based Timestamp Service (KTS)
The KTS Service
• Use the same DHT but with a different hash function hts
[Figure: steps 1 to 4 of a timestamp request. TimeStampRequest(k) is routed through the hash table with Req(k, hts); Req(k, hts) = p, the peer that timestamps key k.]
The KTS Service
• How can node p generate timestamps w.r.t. key k?
– Direct initialization: receive the counters from a leaving peer
  • The DHT system distributes the load of the leaving peer to its neighbors
– Indirect initialization: send a file request w.r.t. key k to obtain the latest timestamp
  • Takes place if the leaving peer fails
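Both initialization paths can be sketched for a peer that has just become responsible for timestamping some keys. The class and method names are hypothetical, and replicas are again modeled as dictionaries from key to (timestamp, file).

```python
class TimestampPeer:
    """Toy peer that takes over timestamping for a set of keys."""

    def __init__(self):
        self.counters = {}  # key -> last timestamp known for that key

    def direct_init(self, handed_over: dict) -> None:
        """Direct: a peer leaving normally hands its counters over."""
        self.counters.update(handed_over)

    def indirect_init(self, k: str, replicas: list) -> None:
        """Indirect: the previous peer failed, so read the replicas of k
        and continue from the largest timestamp found (0 if none)."""
        stamps = [r[k][0] for r in replicas if k in r]
        self.counters[k] = max(stamps, default=0)

    def gen_ts(self, k: str) -> int:
        """After initialization, increase the counter on every request."""
        self.counters[k] = self.counters.get(k, 0) + 1
        return self.counters[k]


p = TimestampPeer()
p.direct_init({"k1": 4})
p.indirect_init("k2", [{"k2": (7, "Y.m4v")}, {"k2": (9, "Y.m4v")}, {}])
next_k1, next_k2 = p.gen_ts("k1"), p.gen_ts("k2")
```

Indirect initialization is only correct if at least one reachable replica carries the latest timestamp, which is why its failure probability matters.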
The KTS Service
• Indirect initialization
– The probability of failure, pf
– pf = (1 - pt)^|H|
– If pt = 30%, |H| = 13, then pf < 1%
• After initialization, increase the timestamp on every timestamp request
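The failure probability of indirect initialization can be checked numerically. The helper `min_replication_level` is a hypothetical addition that searches for the smallest |H| meeting a target.

```python
def failure_probability(p_t: float, replication_level: int) -> float:
    """pf = (1 - pt)^|H|: no replica holds a current copy."""
    return (1.0 - p_t) ** replication_level


def min_replication_level(p_t: float, target: float) -> int:
    """Smallest |H| with pf < target; requires 0 < p_t <= 1 and target > 0."""
    level = 1
    while failure_probability(p_t, level) >= target:
        level += 1
    return level


pf = failure_probability(0.3, 13)           # about 0.0097
needed = min_replication_level(0.3, 0.01)   # 13
```

With pt = 30%, twelve replicas still give pf of about 1.4%, so 13 is the smallest replication level that brings pf under 1%.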
Experiments And Simulations
• Environments
– 64-node cluster
– 10,000 nodes on the SimJava platform
• Metrics
– Response time: time to return a current replica in response to a query
– Communication cost: # of messages sent to answer a query
The Competitor - BRICKS
• Uses a function to map key k to multiple keys (k1, k2, k3, k4, …)
• Each replica has a version number
– Concurrent update problems
– Must retrieve all replicas to find the newest one
Response Time VS DHT Size
Communication Cost VS DHT Size
Response Time VS # of Replicas
Failure Rate VS Response Time
Conclusion
• Pros
– Using the DHT itself to provide the timestamp service is smart!
– Considers the concurrent update problem
– Easy to apply to existing DHTs
• Cons
– The KTS service can add communication overhead
Thank You