Content
1. Ceph Architecture
2. Ceph Components
3. Performance Evaluation
4. Ceph Demo
5. Conclusion
Ceph Architecture
What is Ceph?
Ceph is a distributed file system that provides excellent performance, scalability, and reliability.
Features
Decoupled data and metadata
Dynamic distributed metadata management
Reliable autonomic distributed object storage
Ceph Architecture
Object-based Storage
Traditional storage stack (entirely inside the operating system): Applications → System Call Interface → File System → Logical Block Interface → Block I/O Manager → Hard Drive.
Object-based storage stack: the file system is split into a File System Client Component, which stays in the operating system beneath the System Call Interface, and a File System Storage Component, which moves onto the Object-based Storage Device together with the Logical Block Interface and Block I/O Manager.
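The practical difference is the interface the device exposes. Below is a minimal sketch (plain Python, not Ceph code) contrasting block addressing with object addressing; the names BlockDevice and ObjectStore are illustrative assumptions.

```python
# Minimal sketch (not Ceph code) contrasting the two interfaces.
# BlockDevice and ObjectStore are hypothetical names.

class BlockDevice:
    """Traditional storage: the host file system addresses fixed-size blocks."""
    def __init__(self, block_size=4096, num_blocks=1024):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def write_block(self, lba, data):
        assert len(data) == self.block_size
        self.blocks[lba] = data

    def read_block(self, lba):
        return self.blocks[lba]


class ObjectStore:
    """Object-based storage: the device manages its own layout;
    callers address variable-size named objects."""
    def __init__(self):
        self.objects = {}

    def write(self, object_id, data):
        self.objects[object_id] = bytes(data)

    def read(self, object_id):
        return self.objects[object_id]


if __name__ == "__main__":
    osd = ObjectStore()
    osd.write("inode-123.0", b"hello ceph")
    print(osd.read("inode-123.0"))
```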
Ceph Components
The system consists of four parts: clients, a metadata server (MDS) cluster, an object storage (OSD) cluster, and a cluster monitor.
Clients perform metadata operations against the MDS cluster and file I/O directly against the object storage cluster; the MDS cluster uses metadata I/O to the OSD cluster to persist its state.
Ceph Components
Client Operation
A client sends an open request to the metadata cluster and receives back a capability together with the inode number, file size, and striping strategy.
Reads and writes then go directly to the object storage cluster; CRUSH is used to map each placement group (PG) to OSDs.
On close, the client returns the capability and reports the details of its reads/writes to the metadata cluster, which is responsible for capability management. (A sketch of this exchange follows below.)
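The exchange above can be summarized in a few lines of code. This is a hedged sketch under assumed names (Capability, MetadataCluster, and ObjectStorageCluster are illustrative, not Ceph's actual interfaces):

```python
# Hedged sketch of the client/MDS/OSD exchange described above.
# All classes and fields are illustrative, not Ceph's actual API.
from dataclasses import dataclass

@dataclass
class Capability:
    inode: int
    size: int
    stripe_unit: int      # striping strategy handed out by the MDS
    may_read: bool
    may_write: bool

class MetadataCluster:
    def open(self, path, mode):
        # MDS resolves the path and grants a capability.
        return Capability(inode=123, size=0, stripe_unit=4 << 20,
                          may_read=True, may_write="w" in mode)

    def close(self, cap, new_size):
        # Client returns the capability and reports the new file size.
        print(f"inode {cap.inode} closed, size={new_size}")

class ObjectStorageCluster:
    def __init__(self):
        self.objects = {}
    def write(self, inode, stripe_index, data):
        # CRUSH would map (inode, stripe) -> PG -> OSDs; here we just store it.
        self.objects[(inode, stripe_index)] = data

if __name__ == "__main__":
    mds, osds = MetadataCluster(), ObjectStorageCluster()
    cap = mds.open("/data/file", "w")          # open request -> capability
    osds.write(cap.inode, 0, b"payload")       # file I/O goes straight to OSDs
    mds.close(cap, new_size=7)                 # close request reports details
```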
Ceph Components
Client Synchronization
POSIX semantics vs. relaxed consistency: when a file is opened by multiple clients and at least one of them writes, I/O becomes synchronous, which is a performance killer.
Solution: the HPC extensions to POSIX.
The default is consistency/correctness; optionally, the extensions relax consistency for both data and metadata.
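As a rough illustration of that decision, the sketch below assumes an O_LAZY-style open flag (the lazy-I/O flag from the proposed POSIX HPC extensions) that lets an application opt out of synchronous I/O; the constant value and the function are invented:

```python
# Sketch of the consistency decision described above. O_LAZY stands in
# for the lazy-I/O flag of the POSIX HPC extensions; its value here is
# illustrative, not a real constant.
O_LAZY = 0x10000

def io_mode(num_writers, num_readers, flags):
    shared_write = num_writers > 0 and (num_readers + num_writers) > 1
    if shared_write and not (flags & O_LAZY):
        return "synchronous"   # default: correctness, every op hits the OSDs
    return "buffered"          # relaxed: client may cache reads and writes

if __name__ == "__main__":
    print(io_mode(num_writers=1, num_readers=3, flags=0))        # synchronous
    print(io_mode(num_writers=1, num_readers=3, flags=O_LAZY))   # buffered
```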
Ceph Components
Namespace Operations
Ceph optimizes for the most common metadata access scenario (readdir followed by stat), but by default "correct" behavior is provided at some cost.
Example of that cost: a stat operation on a file opened by multiple writers.
Applications for which coherent behavior is unnecessary can use the extensions instead; a sketch of the readdir/stat fast path follows.
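Below is a small sketch of the readdir-followed-by-stat optimization under assumed names (FakeMDS, DirClient, and readdir_plus are illustrative, not Ceph's client API):

```python
# Sketch (not Ceph's client API) of the readdir-followed-by-stat
# optimization: the readdir reply carries the attributes of every entry,
# so the stat calls that usually follow are served from the client cache.
class FakeMDS:
    """Stand-in for the metadata cluster, purely for illustration."""
    def __init__(self, tree):
        self.tree = tree                        # {dir_path: {name: attrs}}
    def readdir_plus(self, path):
        return list(self.tree[path].items())    # names *and* attributes
    def stat(self, path):
        d, _, name = path.rpartition("/")
        return self.tree[d][name]

class DirClient:
    def __init__(self, mds):
        self.mds = mds
        self.attr_cache = {}
    def readdir(self, path):
        entries = self.mds.readdir_plus(path)   # one round trip
        for name, attrs in entries:
            self.attr_cache[f"{path}/{name}"] = attrs
        return [name for name, _ in entries]
    def stat(self, path):
        # Fast (relaxed) path: answered locally when the preceding
        # readdir already cached the attributes.
        if path in self.attr_cache:
            return self.attr_cache[path]
        return self.mds.stat(path)              # otherwise, a round trip

if __name__ == "__main__":
    mds = FakeMDS({"/logs": {"a.log": {"size": 10}, "b.log": {"size": 20}}})
    c = DirClient(mds)
    for name in c.readdir("/logs"):
        print(name, c.stat(f"/logs/{name}"))    # no extra MDS round trips
```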
Ceph Components
Metadata Storage
Advantages of per-MDS journals (eventually pushed to the OSDs):
Sequential updates are more efficient and reduce the re-write workload.
The on-disk storage layout can be optimized for future read access.
Failure recovery is easier, since the journal can simply be rescanned.
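A minimal sketch of that journaling idea, with an invented MetadataJournal class standing in for the real MDS journal:

```python
# Hedged sketch of a per-MDS metadata journal: updates are appended
# sequentially, lazily pushed to the object store, and rescanned to
# recover state after a crash. The structure is illustrative, not Ceph's.
class MetadataJournal:
    def __init__(self, osd_store):
        self.entries = []           # in-memory tail of the journal
        self.osd_store = osd_store  # durable store (list of flushed segments)

    def append(self, update):
        # Sequential append: cheap, and repeated updates to the same
        # inode can be coalesced before they ever reach the OSDs.
        self.entries.append(update)

    def flush(self):
        # Eventually push the journal segment to the OSD cluster.
        self.osd_store.append(list(self.entries))
        self.entries.clear()

    @staticmethod
    def recover(osd_store):
        # Failure recovery: rescan every flushed segment in order.
        state = {}
        for segment in osd_store:
            for inode, attrs in segment:
                state[inode] = attrs   # later entries win
        return state

if __name__ == "__main__":
    store = []
    journal = MetadataJournal(store)
    journal.append((1, {"size": 0}))
    journal.append((1, {"size": 4096}))          # re-write stays in the journal
    journal.flush()
    print(MetadataJournal.recover(store))        # {1: {'size': 4096}}
```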
Ceph Components
Dynamic Subtree Partitioning
Cached metadata is adaptively distributed across a set of MDS nodes along the directory hierarchy.
Each MDS measures the popularity of its metadata, and busy subtrees are migrated; migration preserves locality. (See the sketch below.)
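The sketch below shows the load-balancing idea only; the rebalance function, counters, and threshold are invented for illustration and are not Ceph's actual balancer:

```python
# Sketch of the idea behind dynamic subtree partitioning: each MDS tracks
# per-subtree popularity and hands whole subtrees to a less loaded peer.
def rebalance(mds_load, threshold=1.5):
    """mds_load: {mds_id: {subtree_path: popularity}} -> list of migrations."""
    totals = {m: sum(subs.values()) for m, subs in mds_load.items()}
    avg = sum(totals.values()) / len(totals)
    migrations = []
    for mds, total in totals.items():
        if total > threshold * avg:
            # Move the hottest whole subtree (preserving locality) to the
            # currently least loaded MDS.
            victim = max(mds_load[mds], key=mds_load[mds].get)
            target = min(totals, key=totals.get)
            migrations.append((victim, mds, target))
    return migrations

if __name__ == "__main__":
    load = {"mds0": {"/home": 900, "/tmp": 50}, "mds1": {"/var": 100}}
    print(rebalance(load))   # e.g. [('/home', 'mds0', 'mds1')]
```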
Ceph Components
Traffic Control for Metadata Access
Challenge: partitioning can balance the workload, but it cannot deal with hot spots or flash crowds.
Ceph's solution: heavily read directories are selectively replicated across multiple nodes to distribute load, while directories that are very large or experiencing a heavy write workload have their contents hashed by file name across the cluster (see the sketch below).
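A toy routing policy capturing both cases; the field names, thresholds, and hash scheme are assumptions, not Ceph's implementation:

```python
# Sketch of the metadata-routing policy described above.
import hashlib
import random

def mds_for_lookup(dirent, filename, mds_ids):
    """Pick which MDS a client should ask for a dentry."""
    if dirent["read_hot"]:
        # Heavily read directory: replicated, any replica can answer.
        return random.choice(dirent["replicas"])
    if dirent["huge"] or dirent["write_hot"]:
        # Very large or write-heavy directory: contents hashed by file
        # name across the whole cluster.
        h = int(hashlib.sha1(filename.encode()).hexdigest(), 16)
        return mds_ids[h % len(mds_ids)]
    return dirent["authority"]   # normal case: one authoritative MDS

if __name__ == "__main__":
    d = {"read_hot": False, "write_hot": True, "huge": False,
         "replicas": [0, 1], "authority": 0}
    print(mds_for_lookup(d, "output-0421.dat", mds_ids=[0, 1, 2, 3]))
```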
CRUSH
CRUSH(x) → (osd1, osd2, osd3)
Inputs: x is the placement group, plus the hierarchical cluster map and the placement rules.
Output: a list of OSDs.
Advantages: anyone can calculate an object's location, and the cluster map is only infrequently updated.
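The sketch below is not the real CRUSH algorithm; it is a deterministic hash-based stand-in that demonstrates the key property, namely that any party holding the cluster map computes the same OSD list without consulting a central table:

```python
# Not the real CRUSH algorithm: a minimal deterministic stand-in showing
# that placement can be computed anywhere from (pg_id, cluster_map) alone.
import hashlib

def place(pg_id, cluster_map, replicas=3):
    """Map a placement group to `replicas` distinct OSDs from cluster_map."""
    chosen = []
    attempt = 0
    while len(chosen) < replicas:
        key = f"{pg_id}:{attempt}".encode()
        h = int(hashlib.sha1(key).hexdigest(), 16)
        osd = cluster_map[h % len(cluster_map)]
        if osd not in chosen:          # keep replicas on distinct OSDs
            chosen.append(osd)
        attempt += 1
    return chosen

if __name__ == "__main__":
    cluster_map = [f"osd.{i}" for i in range(8)]
    # Client and OSD compute the same answer independently.
    print(place("pg-0x1a", cluster_map))
    print(place("pg-0x1a", cluster_map))
```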
Conclusion
Strengths: easy scalability to petabyte capacity, high performance under varying workloads, strong reliability.
Weaknesses: the MDS and OSDs are implemented in user space; the primary replica may become a bottleneck under heavy write load; N-way replication lacks storage efficiency.
References
Sage A. Weil, Scott A. Brandt, Ethan L. Miller, and Darrell D. E. Long, "Ceph: A Scalable, High-Performance Distributed File System," OSDI '06: 7th USENIX Symposium on Operating Systems Design and Implementation.
M. Tim Jones, "Ceph: A Linux petabyte-scale distributed file system," IBM developerWorks, online document.
Technical talk presented by Sage Weil at LCA 2010.
Sage Weil's PhD dissertation, "Ceph: Reliable, Scalable, and High-Performance Distributed Storage" (PDF).
"CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data" (PDF) and "RADOS: A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters" (PDF) discuss two of the most interesting aspects of the Ceph file system.
"Building a Small Ceph Cluster" gives instructions for building a Ceph cluster along with tips for distribution of assets.
"Ceph: Distributed Network File System," KernelTrap.