SCALABLE POSIX FILE SYSTEMS IN THE CLOUD
Jason Callaway | @jasoncallaway | blog.jasoncallaway.com | January 2016
THE RED HAT STORAGE MISSION
To offer a unified, open software-defined storage portfolio that delivers a range of data services for next-generation workloads, thereby accelerating the transition to modern IT infrastructures.
THE FUTURE OF STORAGE
Traditional Storage: complex, proprietary silos. Each silo pairs a custom GUI, proprietary software, and proprietary hardware, with its own admins and users.
Open, Software-Defined Storage: standardized, unified, open platforms. A single control plane (API, GUI) serves admins and users, with open source software (Ceph, Gluster, and more) running on standard hardware (standard computers and disks).
WHY BOTHER?
• Proprietary hardware → common, off-the-shelf hardware: lower cost, standardized supply chain
• Scale-up architecture → scale-out architecture: increased operational flexibility
• Hardware-based intelligence → software-based intelligence: more programmability, agility, and control
• Closed development process → open development process: more flexible, well-integrated technology
A RISING TIDE
"By 2020, between 70% and 80% of unstructured data will be held on lower-cost storage managed by SDS environments." (Gartner, "IT Leaders Can Benefit From Disruptive Innovation in the Storage Industry")
"By 2019, 70% of existing storage array products will also be available as software-only versions." (Gartner, "Innovation Insight: Separating Hype From Hope for Software-Defined Storage")
"By 2016, server-based storage solutions will lower storage hardware costs by 50% or more." (Gartner, "Innovation Insight: Separating Hype From Hope for Software-Defined Storage")
SDS-P MARKET SIZE BY SEGMENT
Market size is projected to increase approximately 20% year over year between 2015 and 2019.
[Chart: combined block storage, file storage, object storage, and hyperconverged segments, 2013-2019: $457B, $592B, $706B, $859B, $1,029B, $1,195B, $1,349B. Source: IDC]
Software-Defined Storage is leading a shift in the global storage industry, with far-reaching effects.
Open Software-Defined Storage is a fundamental reimagining of how storage infrastructure works. It provides substantial economic and operational advantages, and it is quickly becoming ideally suited to a growing number of use cases.
THE JOURNEY
TODAY → EMERGING → FUTURE: cloud infrastructure, cloud-native apps, analytics, hyper-convergence, containers, ???, ???
THE RED HAT STORAGE PORTFOLIO
[Diagram: Ceph management and Gluster management layered over Ceph data services and Gluster data services; open source software running on standard hardware]
• Shared-nothing, scale-out architecture provides durability and adapts to changing demands
• Self-managing and self-healing features reduce operational overhead
• Standards-based interfaces and full APIs ease integration with applications and systems
• Supported by the experts at Red Hat
GROWING INNOVATION COMMUNITIES
lOver 11M downloads in the last 12 monthslIncreased development velocity, authorship, and discussion has resulted in rapid feature expansion.
lContributions from Intel, SanDisk, SUSE,and DTAG.lPresenting Ceph Days in cities around the world and quarterly virtual Ceph Developer Summit events.
78 AUTHORS/mo
1500 COMMITS/mo
258 POSTERS/mo
41 AUTHORS/mo
259 COMMITS/mo
166 POSTERS/mo
PARTNER SOLUTIONS
SanDisk: all-flash arrays, optimized for Ceph
SanDisk sells the InfiniFlash storage arrays, designed for use with Red Hat Ceph Storage. Optimizations contributed by SanDisk deliver high performance that allows Ceph customers to service new workloads.
Our relationship includes:
• Engineering and product collaboration
• Community thought leadership
Supermicro: systems designed with storage in mind
Supermicro's Red Hat Ceph Storage optimized solutions offer durable, software-defined, scale-out storage platforms in 1U/2U/4U form factors and are designed to maximize performance, density, and capacity.
Customers can expect to see:
• Reference architectures, validated for performance, density, and capacity
• Whitepapers and datasheets that support Red Hat Storage solutions
Intel: accelerating software-defined storage
Through silicon innovation and software optimization, Intel pushes the envelope on open, software-defined storage. A key contributor, Intel recently donated significant hardware to the Ceph project.
Intel development efforts have included:
• SSD and performance optimizations
• CephFS development
REFINEMENTS FOR PETABYTE-SCALE OPERATORS
Optimized for large-scale deployments
Version 1.3 of Red Hat Ceph Storage is the first major release since Ceph joined the Red Hat Storage product portfolio, and it incorporates feedback from customers who have deployed in production at large scale.
Areas of improvement:
• Robustness at scale
• Performance tuning
• Operational efficiency
Enhanced for flexibility and performance
Version 3.1 of Red Hat Gluster Storage contains many new features and capabilities aimed at bolstering data protection, performance, security, and client compatibility.
New capabilities include:
• Erasure coding
• Tiering
• Bit rot detection
• NFSv4 client support
RED HAT GLUSTER STORAGE
Nimble file storage for petabyte-scale workloads
TARGET USE CASES
• Analytics: machine analytics with Splunk, big data analytics with Hadoop
• Enterprise file sharing, media streaming, active archives
• Enterprise virtualization
OVERVIEW: RED HAT GLUSTER STORAGE
• Purpose-built as a scale-out file store with a straightforward architecture suitable for public, private, and hybrid cloud
• Simple to install and configure, with a minimal hardware footprint
• Offers mature NFS, SMB, and HDFS interfaces for enterprise use
Customer highlight: Intuit uses Red Hat Gluster Storage to provide flexible, cost-effective storage (rich media and archival) for its industry-leading financial offerings.
OVERVIEW: TERMINOLOGY
Brick: the basic unit of storage, represented by an export directory on a server in the trusted storage pool.
Cluster: a group of linked computers working together closely, in many respects forming a single computer.
FUSE: Filesystem in Userspace, a loadable kernel module for Unix-like operating systems that lets non-privileged users create their own file systems without editing kernel code.
Geo-Replication: a continuous, asynchronous, and incremental replication service from one site to another over Local Area Networks (LANs), Wide Area Networks (WANs), and across the Internet.
OVERVIEW: TERMINOLOGY
Metadata: data providing information about one or more other pieces of data. There is no special metadata storage concept in GlusterFS; metadata is stored with the file data itself.
Namespace: an abstract container or environment created to hold a logical grouping of unique identifiers or symbols. Each Gluster volume exposes a single namespace as a POSIX mount point that contains every file in the cluster.
Volume: a logical collection of bricks. Most Gluster management operations happen on the volume.
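To make the terminology concrete, here is a minimal sketch of turning one brick per server into a volume and mounting its namespace; the hostnames (server1, server2) and the brick path /bricks/brick1 are illustrative, not taken from this deck.

    # Form the trusted storage pool (run on server1)
    gluster peer probe server2

    # Create a volume from one brick (export directory) per server
    gluster volume create myvol server1:/bricks/brick1 server2:/bricks/brick1
    gluster volume start myvol

    # Mount the volume's single POSIX namespace with the FUSE native client
    mount -t glusterfs server1:/myvol /mnt/myvol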
OVERVIEW: GLUSTER ARCHITECTURE
OVERVIEW: VOLUME PERMUTATIONS
• Distributed: namespace is distributed horizontally across n bricks
• Replicated: namespace is synchronously replicated to an identical namespace
• Striped: namespace stripes data across bricks
• Dispersed: namespace uses erasure coding
These base types combine as follows (a command-level sketch follows the table):

No Redundancy       | Replicated (synchronous) | Dispersed (erasure codes)
Distributed         | Distributed              | Distributed
Striped             | Striped                  | -
Distributed-Striped | Distributed-Striped      | -
Geo-replicated      | Geo-replicated           | Geo-replicated
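A hedged sketch of how a few of these permutations map onto volume-create commands; host and brick names are illustrative:

    # Distributed (no redundancy): files hashed across bricks
    gluster volume create dist-vol server1:/bricks/b1 server2:/bricks/b1

    # Replicated: every file written synchronously to both bricks
    gluster volume create repl-vol replica 2 server1:/bricks/b2 server2:/bricks/b2

    # Distributed-replicated: data distributed across replica pairs
    gluster volume create dist-repl-vol replica 2 \
        server1:/bricks/b3 server2:/bricks/b3 \
        server1:/bricks/b4 server2:/bricks/b4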
ERASURE CODING
Storing more data with less hardware
Standard replicated back ends are very durable and can recover very quickly, but they have an inherently large capacity overhead. Erasure-coded back ends reconstruct corrupted or lost data by using information about the data stored elsewhere in the system. Providing failure protection with erasure coding eliminates the need for RAID, consumes far less space than replication, and can be appropriate for capacity-optimized use cases.
[Diagram: an object/file encoded into data chunks 1-4 plus parity chunks X and Y, spread across an erasure-coded pool/volume in the storage cluster]
Up to 75% reduction in TCO
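A minimal sketch of a dispersed volume matching the 4-data/2-parity figure above (six bricks, any two of which can be lost); the volume and brick names are illustrative:

    # disperse 6 redundancy 2 = 4 data + 2 parity fragments per file
    gluster volume create ec-vol disperse 6 redundancy 2 \
        server{1..6}:/bricks/ec
    gluster volume start ec-vol

    # Capacity overhead: 6/4 = 1.5x raw-to-usable, versus 3x for
    # replica 3 - the source of the large TCO reduction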
TIERING
Cost-effective flash acceleration
Optimally, infrequently accessed data is served from less expensive storage systems while frequently accessed data is served from faster, more expensive ones. However, manually moving data between storage tiers can be time-consuming and expensive. Red Hat Gluster Storage 3.1 now supports automated promotion and demotion of data between "hot" and "cold" subvolumes based on frequency of access.
[Diagram: object/file I/O served by a hot subvolume (flash, replicated) and a cold subvolume (rotational, erasure-coded) within the storage cluster]
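A hedged sketch of attaching a hot tier to the dispersed volume from earlier, using the tiering commands that appeared around Gluster 3.7 (exact syntax varied across releases; hosts and paths are illustrative):

    # Put a replicated flash tier in front of the cold, erasure-coded volume
    gluster volume tier ec-vol attach replica 2 \
        server1:/ssd/brick server2:/ssd/brick

    # Watch files being promoted and demoted between tiers
    gluster volume tier ec-vol status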
BIT ROT DETECTION
Detection of silent data corruption
Bit rot is data corruption resulting from silent hardware failures, and it leads to deterioration in performance and integrity. Red Hat Gluster Storage 3.1 provides a mechanism to scan data periodically and detect bit rot. Using the SHA256 algorithm, checksums are computed when files are accessed and compared against previously stored values. If they do not match, an error is logged for the storage admin.
[Diagram: a flipped bit detected among stored data raises an alert to the admin]
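A short sketch of enabling and operating bit rot detection with the gluster CLI; the volume name is illustrative, and the scrub settings shown are one reasonable choice among the supported options:

    # Enable checksum signing and the scrubber for the volume
    gluster volume bitrot myvol enable

    # Scrub periodically; other frequencies include hourly, daily, monthly
    gluster volume bitrot myvol scrub-frequency weekly

    # Keep scrubbing from competing with client I/O
    gluster volume bitrot myvol scrub-throttle lazy

    # Report objects flagged as corrupted
    gluster volume bitrot myvol scrub status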
SECURITY
Scalable and secure NFSv4 client support
Using NFS-Ganesha, a user-space NFS server implementation, Red Hat Gluster Storage 3.1 provides client access with simplified failover and failback in the case of a node or network failure. Supporting both NFSv3 and NFSv4 clients, NFS-Ganesha introduces ACLs for additional security, Kerberos authentication, and dynamic export management.
[Diagram: clients mounting over NFS through clustered NFS-Ganesha heads in front of the storage cluster]
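A hedged sketch of exporting a volume through NFS-Ganesha and mounting it from a client, loosely following the RHGS 3.1 workflow (HA prerequisites such as shared storage and the cluster definition file are omitted; names are illustrative):

    # On the cluster: disable Gluster's built-in NFS, start NFS-Ganesha,
    # and export the volume
    gluster volume set myvol nfs.disable on
    gluster nfs-ganesha enable
    gluster volume set myvol ganesha.enable on

    # On a client: mount over NFSv4 through the Ganesha virtual IP
    mount -t nfs -o vers=4 ganesha-vip:/myvol /mnt/myvol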
OVERVIEW: MAXIMUMS
• 64 nodes per cluster
• 8 volumes per LVM RAID
• Brick size bounded by the XFS maximum recommended size:
  • 100 TB certified / 8 EB maximum on RHEL 6
  • 500 TB certified / 8 EB maximum on RHEL 7
• 16 PB usable per distributed-replicated cluster (500 TB x 64 nodes / 2 replication factor)
• ~4 PB on EC2 with 16 TB EBS volumes
• Performance guidance: http://blog.gluster.org/category/performance/
RED HAT GLUSTER STORAGE: TARGET WORKLOADS
WHAT TOOL DO I USE?

Use Case                             | Gluster      | Ceph
File (NFS, CIFS, FUSE native client) | Works great! | Tech preview in June
OLTP-like                            | Maybe        | Works great!
Block                                | Nope         | Works great!
Object                               | Works great! | Works great!
Cloud                                | Works great! | Nope
Big Data                             | Works great! | Works great!
Geo-replication                      | Works great! | It's complicated
FOCUSED SET OF USE CASES
• ANALYTICS: big data analytics with Hadoop; machine data analytics with Splunk
• CLOUD INFRASTRUCTURE: virtual machine storage with OpenStack; object storage for tenant applications
• RICH MEDIA AND ARCHIVAL: cost-effective storage for rich media streaming; active archives
• SYNC AND SHARE: file sync and share with ownCloud
• ENTERPRISE VIRTUALIZATION: storage for conventional virtualization with RHEV
BIG DATA ANALYTICS
In-place Hadoop analytics in a POSIX-compatible environment
FEATURES
• Allows the Hortonworks Data Platform 2.1 to be deployed on Red Hat Gluster Storage
• Hadoop tools can operate on data in place
• Access to the Hadoop ecosystem of tools
• Access to non-Hadoop analytics tools
• Consistent operating model: Hadoop can run directly on Red Hat Gluster Storage nodes
BENEFITS
• Flexible, unified enterprise big data repository
• Better analytics (Hadoop and non-Hadoop)
• Familiar POSIX-compatible file system and tools
• Start small, scale as big data needs grow
• Multi-volume support (HDFS is single-volume)
• Unified management (Hortonworks HDP Ambari and Red Hat Gluster Storage)
[Diagram: the Hadoop MapReduce framework running against either the Hadoop Distributed File System or a Red Hat Gluster Storage cluster]
MACHINE DATA ANALYTICS
High-performance, scale-out, online cold storage for Splunk Enterprise
[Diagram: hot/warm data optimized for performance on 10s of TB of Splunk server DAS; cold data optimized for cost, capacity, and elasticity on Red Hat Storage Server on commodity x86 servers]
FEATURES
• Multiple ingest options using NFS and FUSE
• Expand storage pools without incurring downtime
• Support for both clustered and non-clustered configurations
BENEFITS
• Run high-speed indexing and search on Splunk's cold data store
• Pay-as-you-grow economics for Splunk cold data
• Reduce ingestion time for data with standard protocols
• "Always online", fast, disk-based storage pools provide constant access to historical data
RICH MEDIA
Massively scalable, flexible, and cost-effective storage for image, video, and audio content
FEATURES
• Support for multi-petabyte storage clusters on commodity hardware
• Erasure coding and replication for capacity-optimized or performance-optimized pools
• Support for standard file and object protocols
• Snapshot and replication capabilities for high availability and disaster recovery
BENEFITS
• Provides massive and linear scalability in on-premise or cloud environments
• Offers robust data protection with an optimal blend of price and performance
• Standard protocols allow access to broadcast content anywhere, on any device
• Cost-effective, high-performance storage for on-demand rich media content
[Diagram: unstructured image, video, and audio content stored on a Red Hat Gluster Storage cluster or a Red Hat Ceph Storage cluster]
ACTIVE ARCHIVES
Open source, capacity-optimized archival storage on commodity hardware
FEATURES
• Cache tiering to enable "temperature"-based storage
• Erasure coding to support archive and cold storage use cases
• Support for industry-standard file and object access protocols
BENEFITS
• Store data based on its access frequency
• Store data on premise or in a public or hybrid cloud
• Achieve durability while reducing raw capacity requirements and limiting cost
• Deploy on industry-standard hardware
[Diagram: unstructured file data, unstructured object data, and volume backups landing on a Red Hat Gluster Storage cluster or a Red Hat Ceph Storage cluster]
FILE SYNC AND SHARE
Powerful, software-defined, scale-out, on-premise storage for file sync and share with ownCloud
FEATURES
• Secure file sync and share with enterprise-grade auditing and accounting
• Combined solution of Red Hat Gluster Storage, ownCloud, and HP ProLiant SL4550 Gen8 servers
• Deployed on premise, managed by internal IT
• Access sync and share data from mobile devices, desktop systems, and web browsers
BENEFITS
• Secure collaboration with consumer-grade ease of use
• Lower risk by storing data on premise
• Conform to corporate data security and compliance policies
• Lower total cost of ownership with standard, high-density servers and open source
[Diagram: web browser, mobile application, and desktop OS clients connecting to ownCloud Enterprise Edition backed by Red Hat Gluster Storage]
ENTERPRISE VIRTUALIZATION
Scalable, reliable storage for Red Hat Enterprise Virtualization
FEATURES
• Reliably store virtual machine images in a distributed Red Hat Gluster Storage volume
• Manage storage through the RHEV-M console
• Deploy on standard hardware of choice
• Seamlessly grow and shrink storage infrastructure when demand changes
BENEFITS
• Reduce operational complexity by eliminating dependency on complex and expensive SAN infrastructure
• Deploy efficiently on less expensive, easier-to-provision standard hardware
• Achieve centralized visibility and control of server and storage infrastructure
ROADMAP DETAIL
ROADMAP: RED HAT GLUSTER STORAGE

TODAY (v3.1): Gluster 3.7, RHEL 6, 7
• MGMT: device management; geo-replication, snapshots; dashboard
• CORE: erasure coding; tiering; bit rot detection; snapshot scheduling
• FILE: active/active NFSv4; SMB 3 (basic subset)
• SEC: SELinux; SSL encryption (in-flight)

V3.2 (H1-2016): Gluster 3.8, RHEL 6, 7
• MGMT: dynamic provisioning of resources
• CORE: inode quotas; faster self-heal; controlled rebalance
• FILE: SMB 3 (advanced features); multi-channel
• SEC: at-rest encryption

FUTURE (v4.0 and beyond): Gluster 4, RHEL 7
• MGMT: new UI; Gluster REST API
• CORE: compression; deduplication; highly scalable control plane; next-gen replication/distribution
• FILE: pNFS
• PERF: QoS; client-side caching
DETAIL: RED HAT GLUSTER STORAGE 3.1
These features were introduced in the most recent release of Red Hat Gluster Storage, and are now supported by Red Hat.
• Device management, dashboard (MGMT): support in the console for discovery, format, and creation of bricks based on recommended best practices; an improved dashboard that shows vital statistics of pools.
• Snapshots, geo-replication (MGMT): new support in the console for snapshotting and geo-replication features.
• Tiering (CORE): new features that allow creation of a tier of fast media (SSDs, flash) accompanying slower media, supporting policy-based movement of data between tiers and enhancing create/read/write performance for many small-file workloads.
• Bit rot detection (CORE): ability to detect silent data corruption in files via signing and scrubbing, enabling long-term retention and archival of data without fear of "bit rot".
• Snapshot scheduling (CORE): ability to schedule periodic execution of snapshots easily, without the complexity of custom automation scripts (see the sketch after this list).
• Backup hooks (CORE): features that enable incremental, efficient backup of volumes using standard commercial backup tools, providing time savings over full-volume backups.
• Erasure coding (CORE): introduction of erasure-coded volumes (dispersed volumes) that provide cost-effective durability and increase usable capacity when compared to standard RAID and replication.
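A hedged sketch of the snapshot scheduler mentioned above; Gluster ships a snap_scheduler.py helper for this, and the job name, cron schedule, and volume name here are illustrative:

    # One-time initialization on each node, then turn the scheduler on
    snap_scheduler.py init
    snap_scheduler.py enable

    # Snapshot volume myvol every day at 01:00 (cron syntax)
    snap_scheduler.py add "daily-myvol" "0 1 * * *" myvol

    # Review configured schedules
    snap_scheduler.py list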
DETAIL: RED HAT GLUSTER STORAGE 3.1
These features were introduced in the most recent release of Red Hat Gluster Storage, and are now supported by Red Hat.
• Small file (PERF): optimizations to enhance small-file performance, especially with small-file create and write operations.
• Rebalance (PERF): optimizations that result in enhanced rebalance speed at large scale.
• SELinux enforcing mode (SECURITY): introduction of the ability to operate with SELinux in enforcing mode, increasing security across an entire deployment.
• NFSv4, multi-headed (PROTOCOL): support for data access via clustered, active-active NFSv4 endpoints, based on the NFS-Ganesha project.
• SMB 3, subset of features (PROTOCOL): enhancements to SMB 3 protocol negotiation, copy-data offload, and support for in-flight data encryption.
SCALABLE POSIX FILESYSTEM IN THE CLOUD
IN THE CLOUD: GLUSTER WORKS GREAT IN AWS
[Diagram: a VPC spanning us-east-1a and us-east-1c, with four m4.xlarge instances per availability zone, each carrying a root volume and a 10 GB data volume]
IN THE CLOUD: GLUSTER WORKS GREAT IN AWS
[Diagram: the same VPC across us-east-1a and us-east-1c; the four 10 GB bricks in each availability zone form a 40 GB Gluster distributed volume, with replication between the two zones]
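A hedged sketch of building the layout in the figure: one distributed-replicated volume whose replica pairs span the two availability zones, so a whole zone can fail without data loss. The hostnames (az-a-1 through az-a-4 in us-east-1a, az-c-1 through az-c-4 in us-east-1c) are illustrative:

    # Each replica pair crosses availability zones
    gluster volume create awsvol replica 2 \
        az-a-1:/bricks/b1 az-c-1:/bricks/b1 \
        az-a-2:/bricks/b1 az-c-2:/bricks/b1 \
        az-a-3:/bricks/b1 az-c-3:/bricks/b1 \
        az-a-4:/bricks/b1 az-c-4:/bricks/b1
    gluster volume start awsvol

    # Eight 10 GB bricks at replica 2 yield the 40 GB usable shown above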
SEEMS LEGIT… SINGLE-THREAD PERF FOR GOOGLE EARTH

Type       | Operation | Protocol    | Test      | Size  | MB/s
Random     | Read      | NFS         | POSIX AIO | 16 MB | 97.11
Random     | Write     | NFS         | POSIX AIO | 16 MB | 55.12
Sequential | Read      | NFS         | POSIX AIO | 16 MB | 130.55
Sequential | Write     | NFS         | POSIX AIO | 16 MB | 56.26
Random     | Read      | FUSE client | POSIX AIO | 16 MB | 156.34
Random     | Write     | FUSE client | POSIX AIO | 16 MB | 83.23
Sequential | Read      | FUSE client | POSIX AIO | 16 MB | 146.07
Sequential | Write     | FUSE client | POSIX AIO | 16 MB | 82.55
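The deck does not name the benchmark tool, but comparable single-thread POSIX AIO numbers can be gathered with fio; the mount point and job parameters below are illustrative:

    # Single-threaded sequential read over the FUSE mount, POSIX AIO engine
    fio --name=seqread --directory=/mnt/myvol \
        --ioengine=posixaio --iodepth=1 --numjobs=1 \
        --rw=read --bs=1M --size=16M

    # Random-write variant of the same job
    fio --name=randwrite --directory=/mnt/myvol \
        --ioengine=posixaio --iodepth=1 --numjobs=1 \
        --rw=randwrite --bs=1M --size=16M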
DEVOPS: DEPLOY & SCALE GLUSTER WITH ANSIBLE
DEVOPS: SHOULD HAVE THAT DONE IN AN HOUR OR SO…
tinyurl.com/gluster-ansible
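A hedged sketch of driving the deployment; the repository layout, inventory, and playbook names below are assumptions for illustration, not taken from the linked shortlink:

    # Fetch the automation behind tinyurl.com/gluster-ansible
    git clone <repository-behind-the-shortlink> gluster-ansible
    cd gluster-ansible

    # Provision the EC2 instances and converge them into a Gluster pool
    ansible-playbook -i inventory site.yml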
IN THE CLOUD: GLUSTER WORKS GREAT IN AWS
[Diagram repeated: the two-zone, eight-node cluster with its two 40 GB distributed volumes and cross-zone replication]
PUSH BUTTON: SCALE OUT YOUR POSIX STORAGE!
[Diagram: the same VPC grown from eight to ten m4.xlarge instances, each with a root volume and a 10 GB data volume]
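A minimal sketch of the scale-out step the slide alludes to: probe the new instances, add their bricks as another cross-zone replica pair, and rebalance (hostnames are illustrative, continuing the earlier example):

    # Bring the two new instances into the trusted pool
    gluster peer probe az-a-5
    gluster peer probe az-c-5

    # Add bricks in multiples of the replica count: one more cross-zone pair
    gluster volume add-brick awsvol az-a-5:/bricks/b1 az-c-5:/bricks/b1

    # Spread existing data onto the new bricks
    gluster volume rebalance awsvol start
    gluster volume rebalance awsvol status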
DEMO VIDEO: https://youtu.be/-wsJjrLQKqk
THANK YOU