Information Management NTU
Distributed File Systems
Purposes of a Distributed File System
Sharing of storage and information across a network
Convenience (and efficiency) of a conventional file system
Persistent storage that most other services (e.g., Web servers) need
Properties of Storage Systems
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Other properties include availability, timing guarantees, etc.
Files
Files are an abstraction of permanent storage.
A file is typically defined as a sequence of similar-sized data items along with a set of attributes.
A directory is a file that provides a mapping from text names to internal file identifiers.
File Attributes
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
File Systems
Responsible for the (a) organization, (b) storage, (c) retrieval, (d) naming, (e) sharing, and (f) protection of files.
Provide a set of programming operations that characterize the file abstraction, particularly operations to read and write subsequences of data items beginning at any point of a file.
File System Modules
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
A basic distributed file system implements all of the above plus modules for client-server communication and distributed naming and location of files.
UNIX File Operations
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Distributed File System Requirements
Transparency: access, location, mobility, performance, and scaling transparency.
Concurrency (and Consistency)
Replication/Caching (and Consistency)
Hardware/operating system heterogeneity
Fault-Tolerance
Security (Access Control, Authentication)
Efficiency
A File Service Architecture
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Note: The modules communicate with one another by remote procedure calls.
File Service Components
Flat file service: implementing operations on the contents of files, which are referred to by unique file identifiers (UFIDs)
Directory service: mapping text names of files (including directories) to their UFIDs
Client module: integrating and extending the previous two services under a single application programming interface
* Why is this structure more open and configurable?
Flat File Service Operations
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Difference from UNIX
Immediate access to files using UFIDs (without open or close)
Read or write starts at the position indicated by a parameter
All operations, except create, are repeatable
Allows a stateless implementation
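The points above can be illustrated with a minimal sketch, assuming a toy in-memory service. The class name `FlatFileService` and its method signatures are hypothetical and are not the operations from the textbook figure. Because each read or write carries its own position, the server holds no per-client pointer, and a retransmitted request has the same effect as the original.

```python
import os


class FlatFileService:
    """Toy in-memory flat file service keyed by UFID (hypothetical API)."""

    def __init__(self):
        self._files = {}  # UFID -> bytearray holding the file contents

    def create(self):
        # Not repeatable: every call makes a new file with a fresh UFID.
        ufid = os.urandom(8).hex()
        self._files[ufid] = bytearray()
        return ufid

    def read(self, ufid, pos, n):
        # The position is an argument, so the server keeps no read/write pointer.
        return bytes(self._files[ufid][pos:pos + n])

    def write(self, ufid, pos, data):
        buf = self._files[ufid]
        if len(buf) < pos:
            buf.extend(b"\0" * (pos - len(buf)))  # zero-fill any gap
        buf[pos:pos + len(data)] = data


service = FlatFileService()
ufid = service.create()
service.write(ufid, 0, b"hello world")
service.write(ufid, 6, b"there")
service.write(ufid, 6, b"there")  # a retransmitted duplicate changes nothing
print(service.read(ufid, 0, 11))
```

Repeating `create`, by contrast, would produce a second file, which is why it is the one exception.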
Access Control
Conventional access rights checks (at open calls) not feasible
Two ‘stateless’ approaches:
* Capability (by manipulating the UFID)
* User identity sent with every request (adopted in NFS and AFS)
Main problem: forged requests; some authentication mechanism is needed
Capabilities and UFIDs
A capability is a binary value that acts as an access key; it can be encoded in the UFID.
Basic construction of a UFID:
file group id + file number + random number
Additional field: permissions
Additional field: encryption of the permission field
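A rough sketch of how such a UFID could resist tampering is given below. It is an illustration under assumptions: the field names, the `SERVER_KEY` secret, and the helper functions are hypothetical, and a keyed hash (HMAC) is used here as a stand-in for the "encryption of the permission field" mentioned above, not as the scheme the slides describe.

```python
import hashlib
import hmac
import os
from collections import namedtuple

# Field names are illustrative; the slide only lists the components.
UFID = namedtuple("UFID", "group_id file_number random_part permissions check")

SERVER_KEY = os.urandom(32)  # hypothetical secret known only to the file server


def _seal(group_id, file_number, random_part, permissions):
    # Keyed hash over all fields; stands in for the encrypted permission field.
    msg = f"{group_id}:{file_number}:{random_part.hex()}:{permissions}".encode()
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).digest()


def issue(group_id, file_number, permissions):
    random_part = os.urandom(8)  # makes valid UFIDs hard to guess
    check = _seal(group_id, file_number, random_part, permissions)
    return UFID(group_id, file_number, random_part, permissions, check)


def allows(ufid, right):
    # The server recomputes the check field; edited permissions will not match.
    expected = _seal(ufid.group_id, ufid.file_number,
                     ufid.random_part, ufid.permissions)
    return hmac.compare_digest(expected, ufid.check) and right in ufid.permissions


read_only = issue(group_id=7, file_number=42, permissions="r")
forged = read_only._replace(permissions="rw")  # client edits the permission field
print(allows(read_only, "r"), allows(read_only, "w"), allows(forged, "w"))
```

The server needs no per-client table to perform this check, which keeps the approach stateless.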
Directory Service Operations
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Note: Each directory is stored as an ordinary file with a UFID.
The Network File System (NFS)
Introduced by Sun Microsystems in 1985, now an Internet standard
Runs on top of RPC (RFC 1831)
Implemented on most operating systems
Version described here: UNIX implementation of NFS Version 3 (RFC 1813, June 1995)
Most recent version: NFS Version 4 (RFC 3010, December 2000)
NFS Architecture
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Note: Each computer can act as both a client and a server.
The Virtual File System Module
Access transparency
File handles (file identifiers):
‘filesystem identifier’ + ‘i-node number’ + ‘i-node generation number’
One VFS structure for each mounted filesystem relates a remote filesystem (identified by its file handle obtained at mount time) to a local directory on which it is mounted
One v-node per open file indicates whether a file is local or remote, etc.
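The role of the generation number in a file handle can be sketched as follows. This is a toy model under assumptions: `ToyServer` and its methods are hypothetical and do not reflect the real NFS server code. The idea shown is that an i-node number may be reused after a file is removed, and the generation number lets the server reject handles that refer to the earlier file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    filesystem_id: int
    inode_number: int
    generation: int


class ToyServer:
    """Hypothetical server for one filesystem that reuses i-node numbers."""

    def __init__(self, filesystem_id):
        self.filesystem_id = filesystem_id
        self._generation = {}  # i-node number -> current generation
        self._live = set()     # i-node numbers currently in use

    def create(self, inode_number):
        # Each reuse of an i-node number bumps its generation number.
        self._generation[inode_number] = self._generation.get(inode_number, 0) + 1
        self._live.add(inode_number)
        return FileHandle(self.filesystem_id, inode_number,
                          self._generation[inode_number])

    def remove(self, inode_number):
        self._live.discard(inode_number)

    def is_valid(self, handle):
        return (handle.filesystem_id == self.filesystem_id
                and handle.inode_number in self._live
                and self._generation.get(handle.inode_number) == handle.generation)


server = ToyServer(filesystem_id=1)
old = server.create(100)
server.remove(100)
new = server.create(100)  # same i-node number, new generation
print(server.is_valid(old), server.is_valid(new))
```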
The NFS Client Module in UNIX
Integrated with the kernel
Emulates the UNIX file system primitives
A single client module serves all user-level processes
The encryption key for authentication is stored in the kernel
Caches file blocks
NFS Server Operations
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
NFS Server Operations (cont’d)
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Remote File Accesses
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
File System Information in UNIX
saturn:~ 35 % df -k
Filesystem                      kbytes   capacity  Mounted on
/dev/dsk/c0t3d0s0               143903   91%       /
/dev/dsk/c0t3d0s6               267943   99%       /usr
/dev/dsk/c0t3d0s3               15383    3%        /tmp
galaxy:/usr/local.real          4030440  53%       /usr/local
lucky:/var/mail.real            564648   86%       /var/mail
cosmos:/home.real/student/xxx   3941760  60%       /home/xxx
galaxy:/home.real/faculty/yyy   2964512  51%       /home/yyy
* Note: The output of ‘df -k’ has been edited.
Caching
Server caching:
* read-ahead
* write-through
* delayed-write with the commit operation
Client caching:
* cache validation (freshness interval and validation timestamp, modification timestamp and getattr, …)
* bio-daemon (for read-ahead and delayed-write caching at the client side)
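The client-side validation check can be sketched roughly as follows, assuming the usual textbook formulation: a cached entry is used without contacting the server while it is within its freshness interval; otherwise the client issues a getattr and compares modification timestamps. The names `CacheEntry`, `read_cached`, and `getattr_mtime` are hypothetical.

```python
class CacheEntry:
    def __init__(self, data, modified_at, validated_at):
        self.data = data
        self.modified_at = modified_at    # server modification time seen at last check
        self.validated_at = validated_at  # local time of the last validation


def read_cached(entry, now, freshness_interval, getattr_mtime):
    """Return (data or None if stale, whether the server was contacted)."""
    if now - entry.validated_at < freshness_interval:
        return entry.data, False          # still fresh: no server traffic
    server_mtime = getattr_mtime()        # stands in for a getattr request
    if server_mtime == entry.modified_at:
        entry.validated_at = now          # unchanged on the server: revalidate
        return entry.data, True
    return None, True                     # changed on the server: must refetch


entry = CacheEntry(b"v1", modified_at=10, validated_at=100)
print(read_cached(entry, now=102, freshness_interval=3, getattr_mtime=lambda: 10))
print(read_cached(entry, now=105, freshness_interval=3, getattr_mtime=lambda: 10))
print(read_cached(entry, now=110, freshness_interval=3, getattr_mtime=lambda: 20))
```

A shorter freshness interval improves consistency but increases the number of getattr calls.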
Achievements of NFS
Access and location transparency
Mobility transparency (partially)
Read-only file replication: the automounter
Fault-tolerance: stateless servers, the automounter
Efficiency: caching of disk blocks (main problem: frequent use of getattr)
Nonachievements: scalability, concurrency and consistency, security (Kerberos), ...
The Andrew File System (AFS)
Developed at CMU
Current versions: AFS-2, AFS-3
Compatible with NFS
Main achievement over (older) NFS: better scalability by minimizing client-server communication
Key characteristics: whole-file serving and caching (partial file caching allowed in AFS-3)
Observations on UNIX File Usage
Files are mostly small
Read operations are more common
Sequential accesses are more common
Most files are written by one user
Files are referenced in bursts
AFS Architecture
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
AFS File Name Space
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
System Call Interception in AFS
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
AFS System Calls Implementation
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Cache Consistency
A callback promise is issued when Vice supplies a copy of a file to a Venus process.
The callback promise is stored with the cached copy and is in either the valid or the cancelled state.
When Venus handles an open, it checks the cache.
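The callback mechanism can be sketched as a toy model. The classes below are hypothetical simplifications, and the behavior after the cache check (refetching when the promise is cancelled, and cancelling other clients' promises when a file is stored) follows the usual description of AFS and is an assumption beyond what the slide states.

```python
class Vice:
    """Toy server: remembers which clients hold a callback promise per file."""

    def __init__(self):
        self.files = {}
        self.holders = {}  # file name -> set of Venus processes with a promise

    def fetch(self, name, venus):
        self.holders.setdefault(name, set()).add(venus)  # issue a promise
        return self.files[name]

    def store(self, name, data, writer):
        self.files[name] = data
        for venus in self.holders.get(name, set()) - {writer}:
            venus.break_callback(name)                   # cancel other promises
        self.holders[name] = {writer}


class Venus:
    """Toy client cache manager."""

    def __init__(self, vice):
        self.vice = vice
        self.cache = {}    # file name -> [data, promise_is_valid]
        self.fetches = 0

    def open(self, name):
        entry = self.cache.get(name)
        if entry is None or not entry[1]:  # missing, or promise cancelled
            self.fetches += 1
            entry = [self.vice.fetch(name, self), True]
            self.cache[name] = entry
        return entry[0]

    def close(self, name, data):
        self.cache[name] = [data, True]
        self.vice.store(name, data, self)

    def break_callback(self, name):
        if name in self.cache:
            self.cache[name][1] = False


vice = Vice()
vice.files["f"] = b"v1"
a, b = Venus(vice), Venus(vice)
a.open("f")
a.open("f")            # served from the cache: promise still valid
b.open("f")
b.close("f", b"v2")    # the update cancels a's promise
print(a.open("f"), a.fetches)
```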
The Vice Service Interface
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Enhancements to NFS and AFS
Spritely NFS: add open and close, use callbacks
NQNFS (Not Quite NFS): use callbacks and leases
WebNFS: allow browsers and other applications to interact with an NFS server directly
NFS Version 4 (RFC 3010, December 2000): incorporating all of the above and more
DCE/DFS (based on AFS): use callbacks and write tokens (with a lifetime)
New Features of NFS Version 4
Adoption of the RPCSEC_GSS (RFC 2203) security protocol
Multiple operations in one request
Better migration and replication abilities
A client may query the location(s) of a file system.
Introduction of open and close operations
Lease-based file locking
Callback-based delegation of files
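The general idea behind lease-based locking can be sketched as below. This is a generic illustration, not the NFS Version 4 protocol: `LeaseLock` and its methods are hypothetical, and time is passed in explicitly to keep the example deterministic. A lock is held only for a limited duration unless renewed, so a crashed client cannot block others indefinitely.

```python
class LeaseLock:
    """Toy lock whose ownership expires unless the holder renews it."""

    def __init__(self, duration):
        self.duration = duration
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, client, now):
        # Grant if free, already held by this client, or the lease has expired.
        if self.holder in (None, client) or now >= self.expires_at:
            self.holder, self.expires_at = client, now + self.duration
            return True
        return False

    def renew(self, client, now):
        if self.holder == client and now < self.expires_at:
            self.expires_at = now + self.duration
            return True
        return False


lock = LeaseLock(duration=30)
print(lock.acquire("A", now=0))    # A obtains the lock
print(lock.acquire("B", now=10))   # B is refused while A's lease is live
print(lock.renew("A", now=20))     # A extends its lease to time 50
print(lock.acquire("B", now=50))   # A stopped renewing, so B takes over
```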
New Design Approaches
Background:
* high-performance storage technology (e.g., RAID)
* log-structured file systems (e.g., Sprite, BSD LFS)
* high-performance switched networks (e.g., ATM, high-speed Ethernet)
Goals: high scalability and fault-tolerance
Main ideas: distribute file data among many nodes, separate responsibilities, …
Constraints: high level of trust
More Recent File System Designs
xFS
Serverless: all data, metadata, and control can be located anywhere in the system; any machine can take over the responsibilities of a failed one
Frangipani
Two-layer structure:
* the Petal distributed virtual disk system
* the Frangipani server module
Both designs utilize RAID-style striping, log-structured file storage, etc.
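RAID-style striping with parity can be illustrated with a generic sketch. It shows plain XOR parity over equal-sized blocks and is not the actual layout used by xFS or Petal; the function names and the block size are assumptions chosen for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    # Byte-wise XOR across equal-length blocks.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_stripe(data, n_data, block_size):
    # Split the data into n_data blocks (zero-padded) and append one parity block.
    padded = data.ljust(n_data * block_size, b"\0")
    blocks = [padded[i * block_size:(i + 1) * block_size] for i in range(n_data)]
    return blocks + [xor_blocks(blocks)]


def recover(stripe, lost_index):
    # Any single missing block equals the XOR of all the surviving blocks.
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = make_stripe(b"log segment data", n_data=4, block_size=4)
print(recover(stripe, lost_index=2) == stripe[2])
```

Each block of the stripe would be placed on a different node, so the loss of any one node leaves the data recoverable.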
Log-based Striping in xFS
Source: T.E. Anderson et al., Serverless Network File Systems, ACM TOCS 1996
An xFS Configuration
Source: T.E. Anderson et al., Serverless Network File Systems, ACM TOCS 1996
A Frangipani Configuration
Source: C.A. Thekkath et al., Frangipani: A Scalable Distributed File System, ACM SOSP 1997