July 2003 Sorrento: A Self-Organizing Distributed File System on Large-scale Clusters Hong Tang, Aziz Gulbeden and Tao Yang Department of Computer Science, University of California, Santa Barbara




Information Management Challenges

“Disk full (again)!”
  Cause: increasing storage demand.
  Options: adding more disks, reorganizing data, removing garbage.

“Where are the data?”
  Cause 1: scattered storage repositories.
  Cause 2: disk corruptions (crashes).
  Options: exhaustive search; indexing; backup.

Management headaches!
Nightmares for data-intensive applications and online services.


A Better World

A single repository – virtual disk.
A uniform hierarchical namespace.
Expand storage capacity on-demand.
Resilient to disk failures through data redundancy.
Fast and ubiquitous access.
Inexpensive storage.


Cluster-based Storage Systems

Turn a generic cluster into a storage system.

[Figure: clients connected over a LAN to a storage cluster.]


Why?

Clusters provide:
  Cost-effective computing platform.
  Incremental scalability.
  High availability.


Design Objectives

Programmability
  Virtualization of distributed storage resources.
  Uniform namespace for data addressing.

Manageability
  Incremental expansion.
  Self-adaptive to node additions and departures.
  Almost-zero administration.

Performance
  Performance monitoring.
  Intelligent data placement and migration.

24/7 availability
  Replication support.


Design Choices

Use commodity components as much as possible.
Shared-nothing architecture.
Functionally symmetric servers (serverless).
User-level file system.
  Daemons run as user processes.
  Possible to make it mountable through kernel modules.


Data Organization Model

User-perceived files are split into variable-length segments (data objects).

Data objects are linked by index objects.

Data and index objects are stored in their entirety as files within native file systems.

Objects are addressed through location-transparent GUIDs.
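
The model above can be illustrated with a minimal in-memory sketch. The class names, the `split_file` and `read_file` helpers, and the use of random UUIDs as GUIDs are all assumptions made for illustration; the slides do not specify the on-disk format or the GUID scheme.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """A variable-length segment of a user-perceived file."""
    guid: str
    payload: bytes


@dataclass
class IndexObject:
    """Links data objects, in file order, by their GUIDs."""
    guid: str
    segment_guids: list = field(default_factory=list)


def new_guid() -> str:
    # Hypothetical GUID scheme: a random UUID rendered as hex.
    return uuid.uuid4().hex


def split_file(content: bytes, segment_lengths):
    """Split content into segments; any remainder becomes a final segment.

    Returns (index_object, store), where store maps GUID -> DataObject.
    """
    store = {}
    index = IndexObject(guid=new_guid())
    offset = 0
    lengths = list(segment_lengths) + [len(content)]  # last entry absorbs the rest
    for length in lengths:
        chunk = content[offset:offset + length]
        if not chunk:
            break
        obj = DataObject(guid=new_guid(), payload=chunk)
        store[obj.guid] = obj
        index.segment_guids.append(obj.guid)
        offset += len(chunk)
    return index, store


def read_file(index: IndexObject, store) -> bytes:
    """Reassemble the file by following the index object's GUID list."""
    return b"".join(store[g].payload for g in index.segment_guids)
```

Because the index refers to segments only by GUID, a segment can move to another node without the index changing, which is the point of location-transparent addressing.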


Multi-level Data Consistency Model

Level 0: best effort, without any guarantee. I/O operations may be reordered.

Level 1: time-ordered I/O operations. Missed writes may still be observed.

Level 2: open-to-close session consistency. The effects of the I/O operations within an open-to-close session are either ALL visible or NONE visible to others. A session may be aborted when there is a write/write conflict.

Level 3: Level 2 plus file sharing and automatic conflict resolution.
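
One way to picture Level 2 is the sketch below: a session buffers its writes privately and publishes them all at once on close, aborting if another session committed in the meantime. The version-counter check is an assumption chosen for brevity; the slides do not say how Sorrento detects write/write conflicts, and the names `VersionedFile`, `Session`, and `ConflictError` are hypothetical.

```python
class ConflictError(Exception):
    """Raised when a session is aborted because of a write/write conflict."""


class VersionedFile:
    """Committed state of one file: its bytes plus a version counter."""

    def __init__(self, data: bytes = b""):
        self.data = data
        self.version = 0


class Session:
    """Open-to-close session: writes stay private until close()."""

    def __init__(self, f: VersionedFile):
        self._file = f
        self._base_version = f.version      # version observed at open
        self._buffer = bytearray(f.data)    # private working copy
        self._dirty = False

    def read(self) -> bytes:
        return bytes(self._buffer)

    def write(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if end > len(self._buffer):
            # Zero-fill so that writes past the end are well defined.
            self._buffer.extend(b"\0" * (end - len(self._buffer)))
        self._buffer[offset:end] = data
        self._dirty = True

    def close(self) -> None:
        if not self._dirty:
            return
        if self._file.version != self._base_version:
            # Someone else committed since we opened: abort, publish nothing.
            raise ConflictError("file changed since open")
        # Publish every buffered write in a single step.
        self._file.data = bytes(self._buffer)
        self._file.version += 1
```

Until `close()` succeeds, other readers of the `VersionedFile` see none of the session's writes; afterwards they see all of them.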


System Architecture

Proxy Module
  Data location and placement.
  Monitor multicast channel.

Server Module
  Export local storage.

Namespace Server
  Maintain a global directory tree.
  Translate filenames to root-object GUIDs.

[Figure: a namespace server and several nodes, each running a proxy and a server, connected by a LAN.]


Accessing a File

[Figure: namespace server and proxy/server nodes on a LAN, annotated with the steps below.]

1. Wants to access /foo/bar
2. Ask for “/foo/bar”'s GUID
3. Get “/foo/bar”'s GUID
4. Determine the server to contact
5. Ask for the root object
6. Retrieve the data
7. Contact other servers if necessary
8. Close file
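
The steps on this slide can be walked through with a small single-process sketch. Everything here is a stand-in: the namespace is a plain dict, each server's store is a dict keyed by GUID, the root object is assumed to be a list of data-object GUIDs, and `pick_server` uses simple modulo hashing only to keep the example short.

```python
import hashlib


def pick_server(guid: str, servers: list) -> str:
    """Step 4: map a GUID to a server (plain modulo hashing, for brevity)."""
    digest = hashlib.sha1(guid.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def put(guid: str, value, servers: list, stores: dict) -> None:
    """Place an object on whichever server its GUID hashes to."""
    stores[pick_server(guid, servers)][guid] = value


def read_file(path: str, namespace: dict, servers: list, stores: dict) -> bytes:
    # Steps 2-3: ask the namespace server for the root-object GUID.
    root_guid = namespace[path]
    # Step 4: determine which server holds the root object.
    root_server = pick_server(root_guid, servers)
    # Step 5: fetch the root object (assumed: a list of data-object GUIDs).
    root_object = stores[root_server][root_guid]
    # Steps 6-7: retrieve each data object, contacting other servers as needed.
    chunks = [stores[pick_server(g, servers)][g] for g in root_object]
    # Step 8: nothing to release in this in-memory sketch.
    return b"".join(chunks)
```

A real proxy would issue network requests where this sketch indexes into `stores`.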


Project Status

Distributed data placement and location protocol.
  To appear in SuperComputing 2003.
Prototype implementation done by summer 2003.
Production usage by end of 2003.
Project Web page: http://www.cs.ucsb.edu/~gulbeden/sorrento/


Evaluation

We are planning to use trace-driven evaluation.
  Enables us to find problems without adding much to the system.
  Performance of various applications can be measured without porting.
  Allows us to reproduce and identify any potential problem.

Applications that can benefit from the system are:
  Web crawler.
  Protein sequence matching.
  Parallel I/O applications.


Project Status and Development Plan

Most software modules are implemented, including consistent hashing, UDP request/response management, persistent hash table, file block cache, thread pool, and load statistics collection.

We are working on building a running prototype.

Milestones:
  Bare-bones runtime system.
  Add dynamic migration.
  Add version-based data management and replication.
  Add kernel VFS switch.


Conclusion

Project website http://www.cs.ucsb.edu/~gulbeden/sorrento


Proxy Module

Consists of:
  Dispatcher: listens for incoming requests.
  Thread pool: processes requests from local applications.
  Subscriber: monitors the multicast channel.

Stores:
  A set of live hosts.
  Address of the Namespace Server.
  Set of opened file handles.

Accesses data by hashing the GUID of the object.
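
Since the deck lists consistent hashing among the implemented modules, GUID-based lookup over the set of live hosts might look roughly like the ring below. This is a generic consistent-hashing sketch, not Sorrento's actual protocol: the `HashRing` class, the SHA-1 hash, and the 64 virtual points per host are all assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring over the current set of live hosts."""

    def __init__(self, hosts=(), replicas: int = 64):
        self._replicas = replicas   # virtual points per host (assumed value)
        self._points = []           # sorted list of (position, host)
        for host in hosts:
            self.add(host)

    def add(self, host: str) -> None:
        for i in range(self._replicas):
            bisect.insort(self._points, (_hash(f"{host}#{i}"), host))

    def remove(self, host: str) -> None:
        self._points = [p for p in self._points if p[1] != host]

    def locate(self, guid: str) -> str:
        """Return the host owning the first ring point at or after the GUID's hash."""
        if not self._points:
            raise LookupError("no live hosts")
        i = bisect.bisect_left(self._points, (_hash(guid), ""))
        return self._points[i % len(self._points)][1]   # wrap around the ring
```

The property that matters for node additions and departures is that removing a host only relocates the objects that host owned; every other GUID keeps its owner.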


Server Module

Consists of:
  Dispatcher: Listens for requests (UDP or TCP).
  Thread Pool: Handles requests for local operations.
  Local Storage: Stores the local data.

Stores:
  Global block table partition.
  INode Map.
  Physical local store.

[Figure: server request flow. The Server Dispatcher (UDP / TCP) waits for requests and enqueues them on a queue; the Thread Pool handles each request against Local Storage (Create, Open, Close, Read, Write, Append, Remove) and responds.]
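
The dispatcher, queue, and thread pool arrangement can be sketched in a single process as follows. The network side (UDP/TCP) is left out, the storage is an in-memory dict, only four of the listed operations are modeled, and all class and method names are hypothetical.

```python
import queue
import threading


class LocalStorage:
    """In-memory stand-in for the node's local store (thread-safe)."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def handle(self, op: str, guid: str, data: bytes = b"") -> bytes:
        with self._lock:
            if op == "create":
                self._objects[guid] = b""
                return b""
            if op == "append":
                self._objects[guid] += data
                return b""
            if op == "read":
                return self._objects[guid]
            if op == "remove":
                del self._objects[guid]
                return b""
        raise ValueError(f"unknown operation: {op}")


class Server:
    """Requests are enqueued by the dispatcher and served by worker threads."""

    def __init__(self, workers: int = 4):
        self.storage = LocalStorage()
        self._requests = queue.Queue()
        self._threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(workers)
        ]
        for t in self._threads:
            t.start()

    def enqueue_request(self, op: str, guid: str, data: bytes = b"") -> queue.Queue:
        """Dispatcher side: queue a request and return a one-slot reply queue."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((op, guid, data, reply))
        return reply

    def _worker(self) -> None:
        while True:
            item = self._requests.get()
            if item is None:            # shutdown sentinel
                break
            op, guid, data, reply = item
            try:
                reply.put(("ok", self.storage.handle(op, guid, data)))
            except Exception as exc:    # report failures instead of killing the worker
                reply.put(("error", repr(exc)))

    def shutdown(self) -> None:
        for _ in self._threads:
            self._requests.put(None)
        for t in self._threads:
            t.join()
```

A caller waits on the returned reply queue, for example `server.enqueue_request("read", guid).get(timeout=5)`. Requests enqueued without waiting may be served by different workers in any order.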


Choice I: SAN

Distributed and heterogeneous devices.
Dedicated fast network.
Storage virtualization.
Volume-based.
Each volume managed by a dedicated server.
Volume map.


Choice I: SAN (cont)

Cost Disadvantage

Scalability Manageability

Change the volume map.
Reorganize data on the old volume.

Handling disk failures:
  Exclude failed disks from volume maps.
  Restore data to spare disks.

Conclusions:
  Hard to automate.
  Prone to human errors (at large scale).


SAN: Storage Area Networks

Distributed and heterogeneous devices.
Dedicated fast network.
Storage virtualization.
Volume-based.
Each volume managed by a dedicated server.
Volume map.


Management Challenges of SAN

Expanding an existing volume:
  Change the volume map.
  Reorganize data on the old volume.

Handling disk failures:
  Exclude failed disks from volume maps.
  Restore data to spare disks.

Conclusions:
  Hard to automate.
  Prone to human errors (at large scale).