Towards a Scalable File System
Progress on adapting BlobSeer to WAN scale for the HGMDS distributed metadata system

Viet-Trung Tran, Gabriel Antoniu, Alexandru Costan (INRIA - Rennes)
In collaboration with Kohei Hiraga, Osamu Tatebe (U Tsukuba)

FP3C meeting, Bordeaux, 2-3 September 2011



Page 2: Progress on adapting BlobSeer to WAN scale

Plan

1. Background and context
2. Goal
3. Approach and solution
4. Preliminary evaluation
5. Conclusion

FP3C meeting – Bordeaux, 2-3 September 2011 - 2

Page 3: Progress on adapting BlobSeer to WAN scale


1. Background: BlobSeer & HGMDS

Page 4: Progress on adapting BlobSeer to WAN scale


BlobSeer: A large-scale data management service

Generic data-management platform for huge, unstructured data
•  Huge data (TB): BLOBs
•  Highly concurrent, fine-grained access (MB): read/write/append
•  Prototype available

Key design features
•  Decentralized metadata management
•  Beyond MVCC: multiversioning exposed to the user
•  Lock-free write access through versioning

A back-end for higher-level, sophisticated data management systems
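As a rough illustration of the multiversioning idea, here is a minimal sketch in Python. The class and method names are hypothetical, not BlobSeer's actual interface: each write publishes a new immutable version, so readers pin a version and are never blocked by concurrent writers.

```python
# Hypothetical sketch of BlobSeer-style multiversioning (not the real API).
# Each write produces a new immutable snapshot; old versions stay readable.

class VersionedBlob:
    def __init__(self):
        self.versions = [b""]  # version 0: the empty blob

    def write(self, offset, data):
        """Publish a new version; earlier versions are untouched (lock-free reads)."""
        cur = bytearray(self.versions[-1])
        end = offset + len(data)
        if end > len(cur):
            cur.extend(b"\x00" * (end - len(cur)))
        cur[offset:end] = data
        self.versions.append(bytes(cur))
        return len(self.versions) - 1  # the new version number

    def append(self, data):
        return self.write(len(self.versions[-1]), data)

    def read(self, version, offset, size):
        """Read against an explicit version, exposed to the user."""
        return self.versions[version][offset:offset + size]

blob = VersionedBlob()
v1 = blob.append(b"hello")
v2 = blob.append(b" world")
assert blob.read(v1, 0, 5) == b"hello"         # v1 unaffected by later writes
assert blob.read(v2, 0, 11) == b"hello world"
```

A real implementation would of course store page-level deltas rather than full copies; the point here is only that writers never overwrite what readers see.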

Page 5: Progress on adapting BlobSeer to WAN scale


BlobSeer: Architecture

Clients
•  Perform fine-grained blob accesses
Providers
•  Store the pages of the blob
Provider manager
•  Monitors the providers
•  Favours data load balancing
Metadata providers
•  Store information about page location
Version manager
•  Ensures concurrency control
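To make the division of roles concrete, the following sketch shows a write path across these components. All names, the in-memory stand-ins, and the round-robin policy and page size are illustrative assumptions, not BlobSeer code:

```python
# Illustrative sketch of the write path across BlobSeer's roles.
# Hypothetical in-memory stand-ins; the page size is chosen arbitrarily.

PAGE_SIZE = 4

class Provider:
    def __init__(self, name):
        self.name, self.pages = name, {}

class ProviderManager:
    """Monitors providers and favours load balancing (round-robin here)."""
    def __init__(self, providers):
        self.providers = providers
        self.next = 0

    def allocate(self):
        p = self.providers[self.next % len(self.providers)]
        self.next += 1
        return p

def write_blob(data, pm, metadata):
    """Split data into pages, store each on a provider chosen by the
    provider manager, and record page -> provider location metadata."""
    for i in range(0, len(data), PAGE_SIZE):
        page_id = i // PAGE_SIZE
        prov = pm.allocate()
        prov.pages[page_id] = data[i:i + PAGE_SIZE]
        metadata[page_id] = prov.name  # what metadata providers store

providers = [Provider("p0"), Provider("p1")]
meta = {}
write_blob(b"abcdefgh", ProviderManager(providers), meta)
assert meta == {0: "p0", 1: "p1"}  # pages balanced across providers
```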


Page 6: Progress on adapting BlobSeer to WAN scale


HGMDS: A distributed metadata management system for global file systems

•  Multi-master file system metadata server (MDS)
•  Manages the inode structure
•  High-latency networks do not affect metadata operation performance, for both reads and writes
•  One MDS per site
•  Metadata versioning using vector clocks for collision detection
•  Automatic collision resolution on the system side
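A minimal sketch of vector-clock collision detection, with one clock entry per site/MDS. This is a generic simplification of the technique, not HGMDS code; the function names are made up:

```python
# Sketch of vector-clock comparison for inode metadata versions.
# One clock entry per MDS/site; a generic simplification, not HGMDS code.

def dominates(a, b):
    """True if clock a has seen every update recorded in clock b."""
    sites = set(a) | set(b)
    return all(a.get(s, 0) >= b.get(s, 0) for s in sites)

def compare(a, b):
    if dominates(a, b) and dominates(b, a):
        return "equal"
    if dominates(a, b):
        return "a_newer"
    if dominates(b, a):
        return "b_newer"
    return "collision"  # concurrent updates: resolve automatically by a rule

# Sites A and B each update the same inode without seeing each other's update:
clock_a = {"A": 2, "B": 1}
clock_b = {"A": 1, "B": 2}
assert compare(clock_a, clock_b) == "collision"
assert compare({"A": 2, "B": 2}, clock_a) == "a_newer"  # supersedes A's version
```

On a collision, a multi-master system like the one described here can apply a deterministic rule (e.g. prefer the update from a designated site) so that all replicas converge without user intervention.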

[Diagram: one HGMDS instance per site (A, B, C), connected over the Internet. File system clients issue mkdir/rmdir/create/stat/unlink to their local MDS; updates propagate between sites in the background.]

Page 7: Progress on adapting BlobSeer to WAN scale


2. Goal: a joint architecture integrating BlobSeer and HGMDS

Page 8: Progress on adapting BlobSeer to WAN scale


Goal

•  BlobSeer: data management, typically on a single site
•  HGMDS: metadata management, global scale, multiple sites

Idea: build a global file system deployed on multiple sites by integrating BlobSeer with HGMDS.

Potential benefits:
•  HGMDS: efficient multi-site file metadata management
•  BlobSeer: concurrency-optimized access to globally shared data

Page 9: Progress on adapting BlobSeer to WAN scale


3. Our approach and solution

Page 10: Progress on adapting BlobSeer to WAN scale

Two approaches

•  Multiple BlobSeer instances: one BlobSeer per site
•  A single BlobSeer-WAN instance over geographically distributed sites


Page 11: Progress on adapting BlobSeer to WAN scale

1st approach: one BlobSeer instance per site


Client

Page 12: Progress on adapting BlobSeer to WAN scale

1st approach: Zoom

High latency when accessing remote BLOBs:
•  Too many remote requests for small pieces of metadata

Page 13: Progress on adapting BlobSeer to WAN scale

2nd approach: a single BlobSeer-WAN instance over geographically distributed sites

Multiple version managers
•  One version manager per site
Multiple provider managers
•  One provider manager per site
On each site
•  Multiple data providers and metadata servers
•  Data providers are under the control of the local provider manager


Page 14: Progress on adapting BlobSeer to WAN scale

Idea: leverage locality to avoid remote metadata accesses

Metadata I/O is resolved locally


Page 15: Progress on adapting BlobSeer to WAN scale

2nd approach: I/O scheme in BlobSeer-WAN

Writing
•  Publish the new version on the local version manager
•  Write metadata locally, on the local metadata servers
•  Write data locally, on the local data providers

Reading (read-your-writes in many cases)
•  Ask the local version manager for a version
•  Local metadata accesses
•  Access remote/local data providers if necessary
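The read side of this scheme can be sketched as a local-first lookup. Everything below is a hypothetical stand-in (dicts instead of RPC-connected services), intended only to show that metadata is resolved on the local site and the WAN is crossed only when a data page lives elsewhere:

```python
# Sketch of the local-first read in BlobSeer-WAN: metadata is resolved on the
# local site; data pages are fetched locally when present, remotely otherwise.
# Hypothetical stand-ins; real components would communicate over the network.

def read_page(page_id, local_meta, local_providers, remote_providers):
    location = local_meta[page_id]  # metadata I/O resolved locally
    if location in local_providers:
        return local_providers[location].get(page_id), "local"
    # WAN fallback: only data pages stored on another site cost a remote trip
    return remote_providers[location].get(page_id), "remote"

local_meta = {0: "p0", 1: "p9"}   # page 1 lives on a provider at a remote site
local_providers = {"p0": {0: b"abcd"}}
remote_providers = {"p9": {1: b"efgh"}}

assert read_page(0, local_meta, local_providers, remote_providers) == (b"abcd", "local")
assert read_page(1, local_meta, local_providers, remote_providers) == (b"efgh", "remote")
```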


Page 16: Progress on adapting BlobSeer to WAN scale

Vector clocks and optimistic metadata replication


Page 17: Progress on adapting BlobSeer to WAN scale

Expected benefits


•  On WAN: BlobSeer coordinates with HGMDS to provide a global versioning file system
   - Low-latency metadata I/O
   - Eventual consistency model
   - Load balancing / fault tolerance

•  On LAN:
   - Distributed version management
   - Load balancing / fault tolerance

Page 18: Progress on adapting BlobSeer to WAN scale


4. Preliminary evaluation: BlobSeer-WAN on Grid'5000 (G5K)

Page 19: Progress on adapting BlobSeer to WAN scale


Testbed

Using two sites of Grid'5000:
•  Rennes: 40 nodes
   - 30 nodes reserved for BlobSeer services
   - 10 nodes for clients
•  Grenoble: 40 nodes
   - 30 nodes reserved for BlobSeer services
   - 10 nodes for clients
•  10 Gbps interconnect network between the sites

Page 20: Progress on adapting BlobSeer to WAN scale


Concurrent appending: 512 MB/client

Page 21: Progress on adapting BlobSeer to WAN scale


5. Conclusion and ongoing work

Page 22: Progress on adapting BlobSeer to WAN scale


Summary

Discussed the integration of BlobSeer and HGMDS:
•  A BlobSeer-WAN extension is required

BlobSeer-WAN
•  Preliminary results look encouraging
•  Performance of BlobSeer-WAN on two sites is similar to that of vanilla BlobSeer on a single site
•  Prototype available in BlobSeer's repository, under branches/BlobSeer-WAN-dev/

HGMDS
•  Implementation almost done
•  Works across multiple sites
•  Collisions automatically resolved by a rule

Page 23: Progress on adapting BlobSeer to WAN scale

Next steps

•  A more extensive evaluation of BlobSeer-WAN
•  Integrate BlobSeer-WAN with HGMDS
•  Preliminary evaluation of HGMDS + BlobSeer-WAN on Grid'5000 and on the Japanese clusters
•  Submit a co-authored paper by Spring 2012
•  Next internships: Kohei @ Inria Rennes


Page 24: Progress on adapting BlobSeer to WAN scale

Thank you!

FP3C meeting 2 – 3 September 2011