Ivy: A Read/Write Peer-to-Peer File System
A. Muthitacharoen, R. Morris, T. M. Gil, and B. Chen
In Proceedings of OSDI '02
2003-4-29
Presenter : Chul Lee
What is IVY?
• A multi-user read/write peer-to-peer file system
• No centralized/dedicated components
• Single file system image
• Conventional file system interface
– Case study of DHT use!
Ivy uses DHT
• DHT provides
– Simple API
• put(key, value) and get(key) → value
– Availability (replication)
– Robustness (integrity checking)
[Figure: layered architecture]
Distributed application (Ivy)
  put(key, data) / get(key) → data
Distributed hash table (DHash)
  lookup(key) → node IP address
Lookup service (Chord)
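The put/get interface and the integrity check can be sketched as follows. This is a minimal in-memory stand-in, not the real DHash API: the class `ToyDHash`, the helpers `put_content_hash` and `get_verified`, and the choice of SHA-1 as the content hash are assumptions made for illustration.

```python
import hashlib


class ToyDHash:
    """Hypothetical in-memory block store standing in for a DHT."""

    def __init__(self):
        self._blocks = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._blocks[key] = value

    def get(self, key: bytes) -> bytes:
        return self._blocks[key]


def put_content_hash(dht: ToyDHash, value: bytes) -> bytes:
    """Store a block under the hash of its contents and return the key."""
    key = hashlib.sha1(value).digest()  # assumption: SHA-1 content hash
    dht.put(key, value)
    return key


def get_verified(dht: ToyDHash, key: bytes) -> bytes:
    """Fetch a block and check that its contents still hash to the key."""
    value = dht.get(key)
    if hashlib.sha1(value).digest() != key:
        raise ValueError("block failed integrity check")
    return value


dht = ToyDHash()
key = put_content_hash(dht, b"hello")
print(get_verified(dht, key))
```

Because the key is derived from the data, a reader can detect a block that an untrusted node has altered without trusting that node.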
Prob.: Shared Data w/ DHT
[Figure: a file system tree (Root Inode, Directory Block, File1/File2/File3 Inodes, File3 Data) stored as blocks on DHT nodes across the Internet]
Challenges
• Consistency of file system meta-data
• Locking is an unattractive approach over unreliable participants.
• Undo modifications by untrustworthy participants
• Operate while partitioned, repair conflicting updates
Solution: Log Based
• Update: Each participant maintains a log of changes to the file system
• Lookup: Each participant scans all logs
Software Structure
• Local NFS loop-back server
[Figure: App → (system calls) → NFS Client in the kernel → (NFS RPCs) → Ivy Server at user level → DHT Nodes across the Internet]
Example: Using Log
Local NFS Client → Local Ivy Server (request → reply)
LOOKUP(“d”, I-Num=10) → I-Num=1000
CREATE(“aaa”, I-Num=1000) → I-Num=9956
WRITE(“hello”, 0, I-Num=9956) → OK
• echo hello > d/aaa
• LOOKUP finds the I-Number of directory “d”
• CREATE creates file “aaa” in directory “d”
• WRITE writes “hello” at offset 0 in file “aaa”
Using Log: File Creation
Type: Create, I-num: 9956
Type: Link, Dir I-num: 1000, File I-num: 9956, Name: “aaa”
Type: Write, I-num: 9956, Offset: 0, Data: “hello”
…
Log Head
• A log record describes a change to the file system
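A rough sketch of the log as a linked list, using the three records from this slide. The names `Record` and `Log` and the field names are hypothetical, and the sketch assumes the log head points at the most recent record, with each record pointing back to the one before it.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Record:
    kind: str                         # e.g. "Create", "Link", "Write"
    fields: dict                      # record-specific fields
    prev: Optional["Record"] = None   # older record, or None at the log's start


class Log:
    """One participant's log: a linked list reached through the log head."""

    def __init__(self) -> None:
        self.head: Optional[Record] = None

    def append(self, kind: str, **fields) -> Record:
        # The new record points at the previous head and becomes the head.
        self.head = Record(kind, fields, self.head)
        return self.head

    def scan(self) -> Iterator[Record]:
        # Walk from the newest record backwards in time.
        rec = self.head
        while rec is not None:
            yield rec
            rec = rec.prev


# Records produced by: echo hello > d/aaa
log = Log()
log.append("Create", inum=9956)
log.append("Link", dir_inum=1000, file_inum=9956, name="aaa")
log.append("Write", inum=9956, offset=0, data=b"hello")

for rec in log.scan():
    print(rec.kind, rec.fields)
```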
Using Log: Lookup
Type: Link, Dir I-num: 1000, File I-num: 9956, Name: “aaa”
Type: Link, Dir I-num: 1000, File I-num: 9876, Name: “bbb”
Type: Remove, Dir I-num: 1000, Name: “aaa”
• A scan follows the log backwards in time
• LOOKUP(name, dir I-num): last Link, but stop at Remove
• READDIR(dir I-num): accumulate Links, minus Removes
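The two scan rules can be sketched over a single log as below. The record layout (dicts with `type`, `dir`, `file`, `name`) and the function names are assumptions for illustration, and the three records above are assumed to be listed oldest first. A full implementation would scan every log in the view, which this sketch does not attempt.

```python
def lookup(records_newest_first, dir_inum, name):
    """Return the file I-number for name in dir, or None if absent or removed."""
    for rec in records_newest_first:
        if rec.get("dir") != dir_inum or rec.get("name") != name:
            continue
        if rec["type"] == "Remove":
            return None          # stop at Remove: older Links are dead
        if rec["type"] == "Link":
            return rec["file"]   # most recent Link wins
    return None


def readdir(records_newest_first, dir_inum):
    """Accumulate Links for dir, skipping names removed more recently."""
    removed = set()
    entries = {}
    for rec in records_newest_first:
        if rec.get("dir") != dir_inum:
            continue
        name = rec["name"]
        if name in removed or name in entries:
            continue             # a newer record already decided this name
        if rec["type"] == "Remove":
            removed.add(name)
        elif rec["type"] == "Link":
            entries[name] = rec["file"]
    return entries


# The three records from the slide, oldest first (assumed order).
oldest_first = [
    {"type": "Link", "dir": 1000, "file": 9956, "name": "aaa"},
    {"type": "Link", "dir": 1000, "file": 9876, "name": "bbb"},
    {"type": "Remove", "dir": 1000, "name": "aaa"},
]
newest_first = list(reversed(oldest_first))

print(lookup(newest_first, 1000, "aaa"))
print(lookup(newest_first, 1000, "bbb"))
print(readdir(newest_first, 1000))
```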
Contributions
• Multi-user read/write peer-to-peer storage system
• Distributed file system with useful integrity properties based on untrusted components
• Use of distributed hash tables as a building block
Design
• DHash – maps keys to arbitrary values
• Log Data Structure – a linked list
• View – a set of logs
• Combining logs – ordering records across logs
• Snapshot – state of the file system
User Cooperation: Views
• Set of logs that comprise the file system
• View block – an immutable DHash content-hash block
Combining Logs
• Ivy orders records using version vectors
• Seq. field – starts from zero for each log
• Version vector: tuple (U:V) for each log
– U: DHash key of the log-head
– V: Sequence number of the most recent record
• Example: (A:5 B:7)
– < (A:6 B:7) BUT concurrent with (A:6 B:6)
• Public keys used to order in case of concurrency
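The version vector comparison in the example can be sketched like this. The function name `compare` and the dict representation are assumptions; a missing entry is treated as -1 (no record seen yet), since sequence numbers start from zero. The public-key tie-break for concurrent records is not modeled.

```python
def compare(v1: dict, v2: dict) -> str:
    """Compare two version vectors given as {log_id: sequence_number}."""
    keys = set(v1) | set(v2)
    # Assumption: a log absent from a vector counts as -1 (nothing seen).
    le = all(v1.get(k, -1) <= v2.get(k, -1) for k in keys)
    ge = all(v1.get(k, -1) >= v2.get(k, -1) for k in keys)
    if le and ge:
        return "equal"
    if le:
        return "before"
    if ge:
        return "after"
    return "concurrent"


base = {"A": 5, "B": 7}
print(compare(base, {"A": 6, "B": 7}))
print(compare(base, {"A": 6, "B": 6}))
```

Two vectors are concurrent when each is ahead of the other in at least one log, which is the case a tie-break has to resolve.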
Snapshots
• Each Ivy participant constructs a private snapshot for speed
• Contains the entire state of the file system
• Each snapshot stored in DHash for persistency as content-hash blocks
Concurrent Updates
• Ivy does not serialize all updates
• Problem
– Unlink(“a”) and rename(“a”, “b”) at the same time
– Ivy correctly lets only one take effect
– But it may return “success” status for both
Partitioned Updates
• Ivy is not directly aware of partitions
– Ivy’s design maximizes availability at the expense of consistency
– Letting updates proceed in all partitions
• All updates during a partition are concurrent updates
• Conflict resolution -> “lc” tools
WAN Evaluation on MAB
• Modified Andrew Benchmark
• 4 DHash nodes
• Round-trip times: 9, 16, 82 milliseconds
• No DHash replication
• 4 logs
• One active writer
WAN Performance
Phase      Ivy     NFS
Mkdir     11.2     4.8
Write     89.2    42.0
Stat      65.6    47.8
Read      65.8    55.6
Compile  144.2   130.2
Total    376.0   280.4
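To read the table as relative overhead, the per-phase ratio of Ivy to NFS can be computed directly from the numbers above. This is only arithmetic on the slide's figures; the variable and function names are illustrative.

```python
phases = ["Mkdir", "Write", "Stat", "Read", "Compile"]
ivy = [11.2, 89.2, 65.6, 65.8, 144.2]
nfs = [4.8, 42.0, 47.8, 55.6, 130.2]


def slowdown(ivy_time: float, nfs_time: float) -> float:
    """Ratio of Ivy's time to NFS's time for one phase."""
    return ivy_time / nfs_time


ratios = {p: round(slowdown(i, n), 2) for p, i, n in zip(phases, ivy, nfs)}
total_ratio = round(sum(ivy) / sum(nfs), 2)

for phase, ratio in ratios.items():
    print(f"{phase:8s} {ratio:.2f}x")
print(f"{'Total':8s} {total_ratio:.2f}x")
```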
Summary
• Exploring use of DHTs as a building block
• Case study of DHT use: Ivy
– Read/write peer-to-peer file system
• Suitable for small groups of cooperating participants who do not have a single central server
Critiques
• Logs grow indefinitely
• Scanning all logs for each request
• Relies on the DHT’s block availability and robustness