Ivy: A Read/Write Peer-to-Peer File System

A. Muthitacharoen, R. Morris, T. M. Gil, and B. Chen. In Proceedings of OSDI '02

2003-4-29

Presenter: Chul Lee


What is Ivy?

• A multi-user read/write peer-to-peer file system

• No centralized/dedicated components

• Single file system image

• Conventional file system interface

– Case study of DHT use!

Ivy uses DHT

• DHT provides
– Simple API
• put(key, value) and get(key) → value
– Availability (replication)
– Robustness (integrity checking)

[Figure: layered DHT architecture]
• Distributed application (Ivy)
– put(key, data) and get(key) → data
• Distributed hash table (DHash)
– lookup(key) → node IP address
• Lookup service (Chord)
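To make the put/get contract concrete, here is a minimal single-process sketch of a DHash-style content-hash block store, assuming SHA-1 keys (DHash keys content-hash blocks by the SHA-1 of their contents). `ToyDHash` is a hypothetical name; the sketch ignores Chord lookup, replication, and public-key blocks, and only shows how a reader can check a block's integrity against its own key.

```python
import hashlib


class ToyDHash:
    """Hypothetical in-memory stand-in for DHash content-hash blocks."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, value: bytes) -> str:
        # For a content-hash block the key is derived from the value itself.
        key = hashlib.sha1(value).hexdigest()
        self._blocks[key] = value
        return key

    def get(self, key: str) -> bytes:
        value = self._blocks[key]
        # Integrity check: the returned bytes must hash back to the key.
        if hashlib.sha1(value).hexdigest() != key:
            raise ValueError(f"block {key} failed integrity check")
        return value


store = ToyDHash()
key = store.put(b"hello")
print(store.get(key))  # b'hello'
```

Because the key is the hash of the value, `put` takes only the value here; the slide's generic put(key, value) form also covers DHash's public-key blocks, which this sketch leaves out.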

Problem: Shared Data with a DHT

[Figure: a file system tree (root inode → directory block → File1, File2, and File3 inodes → File3 data) whose blocks are stored on DHT nodes spread across the Internet]

Challenges

• Consistency of file system meta-data

• Locking is unattractive when participants are unreliable

• Undo modifications by untrustworthy participants

• Operate while partitioned, repair conflicting updates

Solution: Log-Based

• Update: Each participant maintains a log of changes to the file system

• Lookup: Each participant scans all logs

Software Structure

• Local NFS loop-back server

[Figure: App → (system calls) → kernel NFS client → (NFS RPCs) → user-level Ivy server → DHT node, which communicates with other DHT nodes over the Internet]

Example: Using Log

Local NFS client → local Ivy server (request → reply):
LOOKUP("d", I-Num=10) → I-Num=1000
CREATE("aaa", I-Num=1000) → I-Num=9956
WRITE("hello", 0, I-Num=9956) → OK

• echo hello > d/aaa
• LOOKUP finds the I-Number of directory "d"
• CREATE creates file "aaa" in directory "d"
• WRITE writes "hello" at offset 0 in file "aaa"

Using Log: File Creation

Type: Create | I-num: 9956

Type: Link | Dir I-num: 1000 | File I-num: 9956 | Name: "aaa"

Type: Write | I-num: 9956 | Offset: 0 | Data: "hello"

LogHead (points to the newest record)

• A log record describes a change to the file system

Using Log: Lookup

Type: Link | Dir I-num: 1000 | File I-num: 9956 | Name: "aaa"

Type: Link | Dir I-num: 1000 | File I-num: 9876 | Name: "bbb"

Type: Remove | Dir I-num: 1000 | Name: "aaa"

• A scan follows the log backwards in time
• LOOKUP(name, dir I-num): take the most recent Link, but stop at a Remove
• READDIR(dir I-num): accumulate Links, minus Removes
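The two scan rules can be sketched as follows, assuming a single log already materialized as a list ordered newest first. `Link`, `Remove`, `lookup`, and `readdir` are hypothetical names; real Ivy merges many logs and relies on snapshots rather than rescanning from scratch.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Link:
    dir_inum: int
    file_inum: int
    name: str


@dataclass(frozen=True)
class Remove:
    dir_inum: int
    name: str


Record = Union[Link, Remove]


def lookup(newest_first: list[Record], name: str, dir_inum: int) -> Optional[int]:
    """Return the i-number bound to `name` in directory `dir_inum`, or None."""
    for rec in newest_first:  # scan backwards in time
        if rec.dir_inum != dir_inum or rec.name != name:
            continue
        if isinstance(rec, Remove):
            return None  # a newer Remove hides any older Link
        return rec.file_inum  # the most recent Link wins
    return None


def readdir(newest_first: list[Record], dir_inum: int) -> dict[str, int]:
    """Accumulate Links, minus Removes, for one directory."""
    entries: dict[str, int] = {}
    decided: set[str] = set()  # names already settled by a newer record
    for rec in newest_first:
        if rec.dir_inum != dir_inum or rec.name in decided:
            continue
        decided.add(rec.name)
        if isinstance(rec, Link):
            entries[rec.name] = rec.file_inum
    return entries


# The log from the slide, listed newest first.
log = [Remove(1000, "aaa"), Link(1000, 9876, "bbb"), Link(1000, 9956, "aaa")]
print(lookup(log, "aaa", 1000))  # None
print(lookup(log, "bbb", 1000))  # 9876
print(readdir(log, 1000))        # {'bbb': 9876}
```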

Contributions

• Multi-user read/write peer-to-peer storage system

• Distributed file system with useful integrity properties based on untrusted components

• Use of distributed hash tables as a building block

Design

• DHash – maps keys to arbitrary values

• Log Data Structure – a linked list

• View – a set of logs

• Combining logs – ordering records across logs

• Snapshot – state of the file system

Log Data Structure

• A linked list of immutable log records

Log record types

• Roughly correspond to NFS update operations
• 160-bit i-numbers serve as file handles
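A minimal sketch of the linked-list layout, assuming a plain dict stands in for DHash and JSON is the record encoding (both hypothetical choices; Ivy's real log-head is a signed public-key block, not an in-memory attribute). Each append stores an immutable content-hash record that points to its predecessor, then moves the head to it; the `seq` field mirrors the per-log sequence numbers used when combining logs.

```python
import hashlib
import json
from typing import Iterator, Optional


class ToyLog:
    """Hypothetical in-memory stand-in for one participant's Ivy log."""

    def __init__(self, store: dict[str, bytes]) -> None:
        self.store = store               # stands in for DHash: key -> block bytes
        self.head: Optional[str] = None  # log-head: key of the newest record
        self.next_seq = 0                # per-log sequence number, starts at zero

    def append(self, record: dict) -> str:
        # Each record is immutable once stored and points to its predecessor.
        body = dict(record, seq=self.next_seq, prev=self.head)
        block = json.dumps(body, sort_keys=True).encode()
        key = hashlib.sha1(block).hexdigest()  # content-hash key
        self.store[key] = block
        self.head = key                        # only the head ever changes
        self.next_seq += 1
        return key

    def records_newest_first(self) -> Iterator[dict]:
        # Follow prev pointers from the head back to the first record.
        key = self.head
        while key is not None:
            record = json.loads(self.store[key])
            yield record
            key = record["prev"]


log = ToyLog({})
log.append({"type": "Create", "inum": 9956})
log.append({"type": "Link", "dir": 1000, "inum": 9956, "name": "aaa"})
log.append({"type": "Write", "inum": 9956, "offset": 0, "data": "hello"})
print([r["type"] for r in log.records_newest_first()])  # ['Write', 'Link', 'Create']
```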

User Cooperation: Views

• Set of logs that comprise the file system

• View block – an immutable DHash content-hash block
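Because the view block is a content-hash block, its key is fixed by its contents. The sketch below assumes the view is encoded as a JSON list of sorted log-head keys and uses placeholder strings such as "head-A" in place of real DHash keys (all hypothetical). It shows the consequence: the same set of logs always yields the same block key, and changing membership produces a different block.

```python
import hashlib
import json


def make_view_block(log_head_keys: list[str]) -> tuple[str, bytes]:
    """Encode a view as one immutable content-hash block; return (key, block)."""
    # Sorting gives a canonical encoding, so the same set of logs
    # always produces the same bytes and therefore the same key.
    block = json.dumps(sorted(log_head_keys)).encode()
    return hashlib.sha1(block).hexdigest(), block


key1, _ = make_view_block(["head-B", "head-A"])
key2, _ = make_view_block(["head-A", "head-B"])
key3, _ = make_view_block(["head-A", "head-B", "head-C"])
print(key1 == key2)  # True: same set of logs, same view block
print(key1 == key3)  # False: a different set of logs is a different block
```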

Combining Logs

• Ivy orders records using version vectors
– Seq. field: starts from zero for each log
– Version vector: a tuple (U:V) for each log
– U: DHash key of the log-head
– V: sequence number of the most recent record

• Example: (A:5 B:7) < (A:6 B:7), but (A:5 B:7) is concurrent with (A:6 B:6)

• Public keys are used to order concurrent records
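The comparison rule can be sketched as below. It assumes version vectors are dicts from a log's key to a sequence number and that public keys compare as byte strings; `compare` and `precedes` are hypothetical helpers, and these slides do not specify exactly how Ivy compares public keys.

```python
def compare(a: dict[str, int], b: dict[str, int]) -> str:
    """Compare two version vectors mapping log key -> latest sequence number."""
    logs = set(a) | set(b)
    # A log missing from a vector means none of its records was seen;
    # sequence numbers start at zero, so -1 sorts below every real entry.
    a_le_b = all(a.get(k, -1) <= b.get(k, -1) for k in logs)
    b_le_a = all(b.get(k, -1) <= a.get(k, -1) for k in logs)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def precedes(vv_a: dict[str, int], pubkey_a: bytes,
             vv_b: dict[str, int], pubkey_b: bytes) -> bool:
    """True if record A is ordered before record B."""
    relation = compare(vv_a, vv_b)
    if relation == "before":
        return True
    if relation == "after":
        return False
    # Concurrent (or identical) vectors: fall back to the logs' public keys.
    return pubkey_a < pubkey_b


print(compare({"A": 5, "B": 7}, {"A": 6, "B": 7}))  # before
print(compare({"A": 5, "B": 7}, {"A": 6, "B": 6}))  # concurrent
```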

Snapshots

• Each Ivy participant constructs a private snapshot for speed

• Contains the entire state of the file system

• Each snapshot is stored in DHash as content-hash blocks for persistence
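Snapshots pay off because they can be brought up to date incrementally. The sketch below assumes a snapshot remembers, per log, the sequence number of the newest record it already includes, so an update touches only records beyond that mark. `Snapshot`, `update`, and the dict record format are hypothetical, and the sketch keeps a single directory in memory rather than storing blocks in DHash.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical snapshot of a single directory, for brevity."""
    entries: dict[str, int] = field(default_factory=dict)  # name -> i-number
    applied: dict[str, int] = field(default_factory=dict)  # log key -> last seq included


def update(snap: Snapshot, logs: dict[str, list[dict]]) -> int:
    """Fold records newer than the snapshot into it; return how many were applied.

    `logs` maps a log key to that log's records, oldest first. Records are
    applied log by log, which ignores cross-log ordering; a real merge would
    interleave them by version vector.
    """
    count = 0
    for log_key, records in logs.items():
        mark = snap.applied.get(log_key, -1)
        for rec in records:
            if rec["seq"] <= mark:
                continue  # already reflected in the snapshot
            if rec["type"] == "Link":
                snap.entries[rec["name"]] = rec["inum"]
            elif rec["type"] == "Remove":
                snap.entries.pop(rec["name"], None)
            snap.applied[log_key] = rec["seq"]
            count += 1
    return count


logs = {"A": [{"seq": 0, "type": "Link", "name": "aaa", "inum": 9956}]}
snap = Snapshot()
print(update(snap, logs), snap.entries)  # 1 {'aaa': 9956}
logs["A"].append({"seq": 1, "type": "Remove", "name": "aaa"})
print(update(snap, logs), snap.entries)  # 1 {}
print(update(snap, logs))                # 0
```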

Snapshot Data Structure

Application Semantics

• Concurrent Updates

• Partitioned Updates / Conflict Resolution

Concurrent Updates

• Ivy does not serialize all updates

• Problem: unlink("a") and rename("a", "b") at the same time
– Ivy correctly lets only one take effect
– But it may return "success" status for both
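The sketch below illustrates how both calls can report success, assuming each participant checks an operation only against its own current view before appending a record, and assuming the tie-break happens to order the unlink first. `apply_op` and the i-number 42 are hypothetical.

```python
def apply_op(entries: dict[str, int], op: tuple) -> bool:
    """Apply one directory operation; return True if it took effect."""
    kind, *args = op
    if kind == "unlink":
        (name,) = args
        return entries.pop(name, None) is not None
    if kind == "rename":
        old, new = args
        if old not in entries:
            return False
        entries[new] = entries.pop(old)
        return True
    raise ValueError(f"unknown operation {kind!r}")


start = {"a": 42}
unlink, rename = ("unlink", "a"), ("rename", "a", "b")

# Each participant validates against its own, not-yet-merged view: both succeed.
print(apply_op(dict(start), unlink), apply_op(dict(start), rename))  # True True

# Once the concurrent records are merged (unlink ordered first here),
# replaying them shows that only one actually takes effect.
merged = dict(start)
print([apply_op(merged, op) for op in (unlink, rename)], merged)  # [True, False] {}
```

Whichever order the tie-break picks, the second operation finds "a" already gone, so exactly one of the two changes the directory.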

Partitioned Updates

• Ivy is not directly aware of partitions
– Ivy's design maximizes availability at the expense of consistency
– It lets updates proceed in all partitions

• All updates during a partition are concurrent updates

• Conflict resolution -> “lc” tools

WAN Evaluation on MAB

• Modified Andrew Benchmark
• 4 DHash nodes

• Round-trip times: 9, 16, 82 milliseconds

• No DHash replication

• 4 logs

• One active writer

WAN Performance

Phase     Ivy     NFS
Mkdir     11.2    4.8
Write     89.2    42.0
Stat      65.6    47.8
Read      65.8    55.6
Compile   144.2   130.2
Total     376.0   280.4
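One way to read the table is as per-phase slowdown ratios. The snippet below simply divides the two columns (the slides give no units, but the ratios are unitless either way) and checks that the phases add up to the reported totals.

```python
import math

# (Ivy, NFS) per MAB phase, copied from the table.
phases = {
    "Mkdir": (11.2, 4.8),
    "Write": (89.2, 42.0),
    "Stat": (65.6, 47.8),
    "Read": (65.8, 55.6),
    "Compile": (144.2, 130.2),
}

for name, (ivy, nfs) in phases.items():
    print(f"{name:<8} {ivy / nfs:.2f}x")

ivy_total = sum(ivy for ivy, _ in phases.values())
nfs_total = sum(nfs for _, nfs in phases.values())
# The phases add up to the reported totals (376.0 and 280.4).
assert math.isclose(ivy_total, 376.0) and math.isclose(nfs_total, 280.4)
print(f"{'Total':<8} {ivy_total / nfs_total:.2f}x")
```

This prints 2.33x (Mkdir), 2.12x (Write), 1.37x (Stat), 1.18x (Read), 1.11x (Compile), and 1.34x overall: Ivy's overhead relative to NFS is largest in the update-heavy Mkdir and Write phases and smallest in Compile.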

Summary

• Exploring use of DHTs as a building block

• Case study of DHT use: Ivy
– Read/write peer-to-peer file system

• Suitable for small groups of cooperating participants who do not have a single central server

Critiques

• Logs grow indefinitely

• Scanning all logs for each request

• Relies on the DHT's block availability and robustness

Discussion

• DHT interface ~ Disk Sector R/W interface

• Performance vs. semantics

• Other applications of DHTs?
– DB, LDAP server…