
Page 1: GMount: An Ad-hoc and Locality-Aware Distributed File System by using SSH and FUSE

Nan Dun, Kenjiro Taura, Akinori Yonezawa
Graduate School of Information Science and Technology, The University of Tokyo

Page 2: Today You May Have

  • Computing resources across different administration domains
    ◦ InTrigger (JP), Tsubame (JP), T2K-Tokyo (JP)
    ◦ Grid5000 (FR), D-Grid (DE), INFN Grid (IT), National Grid Service (UK)
    ◦ Open Science Grid (US)
  • Workloads to run on all the available resources
    ◦ Finding supernovae
    ◦ Gene decoding
    ◦ Weather simulation, etc.

Page 3: Scenario I

  • How to share your data among arbitrary machines across different domains?

Page 4: Ways of Sharing

  • Option 1: Staging your data
    ◦ Too troublesome: SCP, FTP, GridFTP, etc.
  • Option 2: Conventional DFSs
    ◦ Ask your administrators!
      Which one? NFS, OpenAFS, PVFS, GPFS, Lustre, GoogleFS, Gfarm?
      Only for you? Believe me, they won't do so.
      Quota, security, policy? Headaches...
      Configure and install, even if admins are supposed to do their job...
  • Option 3: GMount
    ◦ Build a DFS by yourself, on the fly!

Page 5: Scenario II

  • You have many clients/resources, and you want more servers.

Page 6: Ways of Scaling

  • Option 1: Conventional DFSs
    ◦ File servers are fixed at deploy time
      Fixed number of MDS (Metadata Servers)
      Fixed number of DSS (Data Storage Servers)
    ◦ Ask your administrators again to append more DSS
  • Option 2: GMount
    ◦ No metadata server
    ◦ File servers scale with the clients: as long as you have more nodes,
      you have more DSS
    ◦ Especially beneficial if your workloads prefer a large amount of
      local writes

Page 7: Scenario III

  • What happens when clients access nearby files in wide-area environments?

Page 8: File Lookup in Wide-Area

  • High latency: DFSs with a central MDS
    ◦ The central MDS is far away from some clients
  • Locality-aware: GMount
    ◦ Search nearby nodes first
    ◦ Send a high-latency message only if the target file cannot be found locally

Page 9: Impression of Usage

  • Prerequisites
    1. You can SSH login to some nodes
    2. Each node has an export directory holding the data you want to share
    3. Specify a mountpoint via which the DFS can be accessed
       ◦ Simply make an empty directory on each node

Page 10: Impression of Usage

  • Just one command, and you are done!
    ◦ gmnt /export/directory /mountpoint
    ◦ GMount will create a DFS at the mountpoint: a UNION of all export
      directories that can be mutually accessed by all nodes

  [Figure: mutual access between Host001 and Host002. Each host exports
  dir1 and dir2 with its own files (dat1 on Host001; dat2, dat3, dat4 on
  Host002); after mounting, both hosts see the union dat1, dat2, dat3,
  dat4 under their mountpoints.]
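The union semantics can be sketched in a few lines of Python. union_listing is a hypothetical illustration with placeholder paths, not part of GMount:

    import os

    # Illustration of the UNION view: merge the directory listings from
    # every node's export, as in the Host001/Host002 figure above.
    def union_listing(export_dirs):
        entries = set()
        for d in export_dirs:
            entries.update(os.listdir(d))
        return sorted(entries)

    # Placeholder paths standing in for the two hosts' export directories:
    print(union_listing(["/exports/host001/dir1", "/exports/host002/dir1"]))
    # e.g. ['dat1', 'dat2']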

Page 11: Enabling Techniques

  • Building blocks
    ◦ FUSE, SSHFS, and SSHFS-MUX
      To create a basic userspace file system
      To utilize existing SSH authentication and data transfer features
    ◦ Grid and cluster shell (GXP)
      To efficiently execute commands in parallel
  • Core ideas
    ◦ Scalable All-Mount-All algorithm
      To let all nodes hierarchically and simultaneously share with each other
    ◦ Locality-aware optimization
      To make file access prefer closer files

Page 12: FUSE and SSHFS Magic

  • FUSE [fuse.sf.net]
    ◦ Framework for quickly building userspace file systems
    ◦ Widely available (Linux kernel > 2.6.14)
  • SSHFS [fuse.sf.net/sshfs.html]
    ◦ Manipulate files on remote hosts as if they were local
    ◦ $ sshfs myhost.net:/export /mount
    ◦ Limitation: can only mount one host at a time
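To make the FUSE idea concrete, here is a minimal read-only userspace file system in Python. It uses the fusepy binding, which is an assumption; the slides only name the FUSE framework itself.

    import errno, stat, sys
    from fuse import FUSE, Operations, FuseOSError  # fusepy binding

    class HelloFS(Operations):
        """Exposes a single read-only file, /hello, from userspace."""
        DATA = b"hello from userspace\n"

        def getattr(self, path, fh=None):
            if path == "/":
                return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2}
            if path == "/hello":
                return {"st_mode": stat.S_IFREG | 0o444,
                        "st_nlink": 1, "st_size": len(self.DATA)}
            raise FuseOSError(errno.ENOENT)

        def readdir(self, path, fh):
            return [".", "..", "hello"]

        def read(self, path, size, offset, fh):
            return self.DATA[offset:offset + size]

    if __name__ == "__main__":
        # $ python hellofs.py /mount
        FUSE(HelloFS(), sys.argv[1], foreground=True)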

Page 13: FUSE and SSHFS Magic (cont.)

  • SSHFS-MUX: manipulate multiple hosts simultaneously
    ◦ A$ sshfsm B:/export C:/export /mount
    ◦ Priority lookup: e.g., C:/export will be accessed before B:/export

  [Figure: B's /export holds dir1, dir2, dat1; C's /export holds dir1,
  dir2, dat2, dat3; A's /mount shows their union: dir1, dir2, dat1,
  dat2, dat3.]
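The priority-lookup rule can be sketched as follows. resolve is a hypothetical helper illustrating the semantics only, not SSHFS-MUX code; branches are ordered highest priority first, so for the command above C:/export would come before B:/export in the list:

    import os

    def resolve(path, branches):
        """Return the first branch that contains path (highest priority first)."""
        for root in branches:
            candidate = os.path.join(root, path.lstrip("/"))
            if os.path.exists(candidate):
                return candidate
        raise FileNotFoundError(path)

    # Placeholder local stand-ins for C's and B's mounted exports on host A:
    print(resolve("dir1/dat2", ["/mnt/c_export", "/mnt/b_export"]))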

Page 14: Problem Setting

  • INPUT: an export directory at each node holding the data to share,
    e.g., for 3 nodes: E1, E2, E3 (each exported at /export)
  • OUTPUT: a DFS mount directory at each node, M1, M2, M3 (each at /mount),
    such that M1 = M2 = M3 = E1 ∪ E2 ∪ E3

Page 15: A Straightforward Approach

  • Goal: M1 = M2 = M3 = E1 ∪ E2 ∪ E3
  • Execution example for 3 nodes:
    1$ sshfsm 1:/export 2:/export 3:/export /mount
    2$ sshfsm 1:/export 2:/export 3:/export /mount
    3$ sshfsm 1:/export 2:/export 3:/export /mount
  • What if we have 100 nodes? Scalability!

  [Figure: every node mounts every node's export directly.]

Page 16: Scalable Approach: Phase I

  • Phase I (One-Mount-All): node 1, the root, mounts every node's export
    1$ sshfsm 1:/export 2:/export 3:/export /mount
    giving M1 = E1 ∪ E2 ∪ E3

  [Figure: spanning tree with node 1 as root and nodes 2, 3 as children;
  only the root's mountpoint is populated after this phase.]

Page 17: Scalable Approach: Phase II

  • Phase II (All-Mount-One): every other node mounts the root's union
    2$ sshfsm 1:/mount /mount
    3$ sshfsm 1:/mount /mount
    giving M2 = M3 = M1 = E1 ∪ E2 ∪ E3

  [Figure: nodes 2 and 3 each mount node 1's /mount, so all three nodes
  now see the same union.]
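A minimal sketch of how the two phases translate into per-node commands for this flat, single-root case. make_plan is a hypothetical helper, not GMount's actual planner, which generalizes this to a K-ary spanning tree (Pages 18 and 20):

    def make_plan(nodes, export="/export", mount="/mount"):
        """Return {node: sshfsm command} for a single root over all nodes."""
        root, rest = nodes[0], nodes[1:]
        plan = {}
        # Phase I (One-Mount-All): the root unions every node's export.
        sources = " ".join(f"{n}:{export}" for n in nodes)
        plan[root] = f"sshfsm {sources} {mount}"
        # Phase II (All-Mount-One): every other node mounts the root's union.
        for n in rest:
            plan[n] = f"sshfsm {root}:{mount} {mount}"
        return plan

    for host, cmd in make_plan(["1", "2", "3"]).items():
        print(f"{host}$ {cmd}")
    # 1$ sshfsm 1:/export 2:/export 3:/export /mount
    # 2$ sshfsm 1:/mount /mount
    # 3$ sshfsm 1:/mount /mount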

Page 18: Comparison

                                Straightforward    Scalable
  Connections                   9  (N^2)           4  (O(K log_K N))
  SSH daemons in each node      3  (N)             2  (K)

  Counts are for the 3-node example; K is the number of children per node
  in the spanning tree.

Page 19: Further Optimization

  • Locality-aware lookup: each non-root node also mounts its own export,
    so nearby files are found without asking the root (see the sketch below)
    ◦ Before: 2$ sshfsm 1:/mount /mount
              3$ sshfsm 1:/mount /mount
    ◦ After:  2$ sshfsm 1:/mount 2:/export /mount
              3$ sshfsm 1:/mount 3:/export /mount
      (node 1 runs sshfsm 1:/export 2:/export 3:/export /mount in both cases)
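Building on the make_plan sketch after Page 17, the locality-aware variant only changes the Phase II command: each node appends its own export as a later, higher-priority source (Page 13's priority rule), so local files resolve without a wide-area hop. Again a sketch, not GMount's planner:

    def make_plan_locality_aware(nodes, export="/export", mount="/mount"):
        plan = make_plan(nodes, export, mount)   # from the Page 17 sketch
        root = nodes[0]
        for n in nodes[1:]:
            # Own export last: SSHFS-MUX gives later sources lookup priority.
            plan[n] = f"sshfsm {root}:{mount} {n}:{export} {mount}"
        return plan

    for host, cmd in make_plan_locality_aware(["1", "2", "3"]).items():
        print(f"{host}$ {cmd}")
    # 1$ sshfsm 1:/export 2:/export 3:/export /mount
    # 2$ sshfsm 1:/mount 2:/export /mount
    # 3$ sshfsm 1:/mount 3:/export /mount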

Page 20: Recursively and Hierarchically Constructing

  • Hierarchical grouping, sharing, and lookup
    ◦ Nodes share with each other at the same level
    ◦ Each group exports its union to the upper level
    ◦ File lookup happens in the local group first
    ◦ Lookup goes upward only if the file is not found locally
      (a minimal sketch of this rule follows)
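A minimal sketch of the local-first, then-upward rule. Group is a hypothetical structure for illustration; dicts stand in for real branch probes:

    class Group:
        def __init__(self, branches, parent=None):
            self.branches = branches  # ordered local sources, highest priority first
            self.parent = parent      # upper-level group, if any

        def lookup(self, path):
            for b in self.branches:           # search the local group first
                if path in b:
                    return b[path]
            if self.parent is not None:       # only now pay the WAN round trip
                return self.parent.lookup(path)
            raise FileNotFoundError(path)

    leaf_a, leaf_b = {"/dat1": "A"}, {"/dat2": "B"}
    upper = Group([{"/dat3": "remote"}])
    local = Group([leaf_a, leaf_b], parent=upper)
    print(local.lookup("/dat2"))  # found locally, no upward message
    print(local.lookup("/dat3"))  # escalates to the upper level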

Page 21: How to Execute Many Mounts in Parallel?

  • Grid and Cluster Shell (GXP) [Taura '04]
    ◦ Simultaneously operates hundreds of nodes
    ◦ Scalable and efficient
    ◦ Works across different administration domains
    ◦ Install at one node; it deploys itself to all nodes
    ◦ Also a useful tool for daily Grid interaction
    ◦ Programmable parallel execution framework
  • In GMount: efficiently executes SSHFS-MUX in parallel on many nodes

Page 22: Summary of GMount Executions

  1. Grab nodes by GXP
     ◦ Assign the starting node as master, the others as workers
  2. Master gathers info and makes a mount plan
     ◦ Get the number of nodes and the information of each node
     ◦ Build a spanning tree, derive the mount plan, and send it to the workers
  3. Execute the plan (see the sketch below)
     ◦ Workers execute the mount plan and send results back to the master
     ◦ Master aggregates the results and prompts the user
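A simplified stand-in for step 3: GMount drives execution through GXP, but the fan-out can be sketched with plain ssh and a thread pool. Hostnames and commands here are placeholders:

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    def run_remote(host, command):
        """Run one node's mount command over ssh; return (host, rc, stderr)."""
        proc = subprocess.run(["ssh", host, command],
                              capture_output=True, text=True)
        return host, proc.returncode, proc.stderr

    # Hypothetical plan for the 3-node example:
    phase1 = {"node1": "sshfsm node1:/export node2:/export node3:/export /mount"}
    phase2 = {"node2": "sshfsm node1:/mount node2:/export /mount",
              "node3": "sshfsm node1:/mount node3:/export /mount"}

    with ThreadPoolExecutor(max_workers=32) as pool:
        for phase in (phase1, phase2):  # Phase II depends on Phase I's mounts
            for host, rc, err in pool.map(lambda kv: run_remote(*kv),
                                          phase.items()):
                print(f"{host}: {'mounted' if rc == 0 else 'failed: ' + err.strip()}")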

Page 23: Deal with Real Environments

  • Utilize network topology information
    ◦ Group nodes by implicit/explicit network affinity: IP address affinity,
      or network topology information if available (sketch below)
  • NAT/Firewall
    ◦ Overcome by cascade mount: specify gateways as the roots of internal
      nodes and cascade inside-outside traffic

  [Figure: a cluster behind NAT/firewall; gateway node 1 is the root for
  internal LAN nodes 2-7 and relays traffic between inside and outside.]
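A sketch of the implicit, IP-based grouping: nodes whose addresses share a prefix (here a /24) are assumed to sit in the same cluster/LAN. group_by_prefix and the addresses are illustrative only:

    from collections import defaultdict

    def group_by_prefix(addrs, prefix_octets=3):
        """Bucket IPv4 addresses by their first prefix_octets octets."""
        groups = defaultdict(list)
        for addr in addrs:
            key = ".".join(addr.split(".")[:prefix_octets])
            groups[key].append(addr)
        return dict(groups)

    nodes = ["157.82.22.1", "157.82.22.9", "133.11.23.4"]  # placeholder addresses
    print(group_by_prefix(nodes))
    # {'157.82.22': ['157.82.22.1', '157.82.22.9'], '133.11.23': ['133.11.23.4']}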

Page 24: Evaluation

  • Experimental environment
    ◦ InTrigger: a distributed cluster of clusters spanning 15 sites in Japan
  • Experiments
    ◦ Performance of the building block (SSHFS-MUX): I/O and metadata performance
    ◦ File system construction time vs. system size: mount and unmount time
    ◦ I/O performance vs. spanning tree shape
    ◦ Metadata performance vs. fraction of local accesses

Page 25: InTrigger Platform (http://www.intrigger.jp)

  • Over 300 nodes across 12 sites
    ◦ Representative platform for wide-area environments
    ◦ Heterogeneous wide-area links
    ◦ NAT enabled at 2 sites
  • Unified software environment
    ◦ Linux 2.6.18, FUSE 2.7.3, OpenSSH 4.3p2, SSHFS-MUX 1.1, GXP 3.03

  [Figure: map of InTrigger sites in Japan with per-site node counts.]

Page 26: File System Construction Time

  [Figure: mount and unmount time (sec) vs. number of sites and nodes, from
  1 site (69 nodes) to 12 sites (329 nodes), for all-nodes-per-site and
  4-nodes-per-site configurations. Construction takes under 10 seconds for
  329 nodes nationwide.]

Page 27: Parallel I/O Performance

  [Figure: two panels of aggregate I/O throughput vs. number of concurrent
  clients (2-32) for GMount with K = 4, 8, 16; one panel also includes
  Gfarm-FUSE for comparison.]

  • The limited SSH transfer rate is the primary bottleneck
  • Performance also depends on the tree shape

Page 28: Metadata Operation Performance

  • Gfarm: a wide-area DFS with a central metadata server
    ◦ Clients first query the metadata server for file locations
    ◦ Clients may be distant from the metadata server
  • Locality awareness
    ◦ Clients prefer to access files stored in nodes close to them
      (within the same cluster/LAN)
    ◦ Percent of local access:
      P_local = (number of local accesses) / (total number of accesses)
      where a local access goes to a node within the same cluster/LAN

Page 29: Metadata: GMount in WAN

  [Figure: aggregate operation latency (sec) vs. percent of local access
  (0.2 to 1.0) for mkdir, rmdir, open+close, stat EXIST, stat NONEXIST,
  utime, chmod u+x, and unlink; reference lines mark Gfarm in WAN and
  Gfarm in LAN. Locality awareness saves network latency.]

Page 30: Highlights

                     Conventional DFS                  GMount DFS
  Resources          Fixed within domain,              Ad-hoc,
                     fixed at deploy time              scales on demand
  Volume quota       Policy dependent                  Sum of local volumes
  Firewall           Optional                          OK
  Wide-area          Potentially high-latency file     Distributed metadata and
                     lookup with a central MDS         locality-aware file lookup
  Data persistence   Permanent storage                 On-demand sharing
  Data redundancy    Yes                               No
  Authentication     GSI, shared key, etc.             SSH
  Deploy privilege   Administrator                     User
  Prerequisites      Kernel source, DB, etc.           SSH, FUSE, Python
  Enabling effort    Weeks to months                   Minutes to hours
  Implementation     Years                             Months

Page 31: Future Work

  • SFTP limitations
    ◦ Not fully POSIX-compatible: rename and link operations
    ◦ Limited receive buffer [Rapier et al. '08]: low data transfer rate
      on long-fat networks
    ◦ SFTP extended attribute support: piggyback file locations during lookup
  • Performance enhancement
    ◦ SSHFS-MUX local mount operation (done!)
  • Fault tolerance
    ◦ Tolerate connection drops

Page 32: Available as OSS

  • SSHFS-MUX: http://sshfsmux.googlecode.com/
  • Grid and Cluster Shell (GXP): http://sourceforge.net/projects/gxp/

Page 33: Thank You!