Distributed Petabyte-Scale Cloud Storage with GlusterFS The Future of GlusterFS and Gluster.org John Mark Walker GlusterFS Community Guy Red Hat, Inc. February 28, 2012

vBACD - Distributed Petabyte-Scale Cloud Storage with GlusterFS - 2/28


DESCRIPTION

GlusterFS is an open source scale-out NAS solution. The software is powerful and flexible, and it simplifies the task of managing unstructured file data, whether you have a few terabytes of storage or multiple petabytes. It's no secret that unstructured data is growing rapidly. Gluster provides a solution that scales capacity and performance as you need it and is an ideal fit for an IT environment that is increasingly virtualized and moving to the cloud.

There are two key ways that GlusterFS benefits cloud builders:

1. Storage layer for VMs. If you're deploying Xen or KVM VMs on a private cloud, storing them on GlusterFS lets you migrate to different hypervisors, suspend and resume quickly (even on another hypervisor), scale out far beyond what other filesystems will allow, and use N-way replication for DR and HA.
2. Unified storage layer for applications. With GlusterFS 3.3, you will be able to access your application data stores through an object (S3, Swift-style) interface as well as a traditional POSIX-compatible NAS interface. This unified approach lets developers and admins access the same data store using a variety of methods.

In this session, attendees will learn steps for deployment and some common use cases.

Speaker Bio

John Mark is an experienced veteran of all things open source and a self-described agitprop, agitator and advocate for those who volunteer countless, unpaid hours for a particular project or community. He first fell down the slippery slope of open source as a web developer at VA Linux Systems and eventually switched to the community team, beginning a career that has now lasted over ten years. Along the way, John Mark made stops at young, up-and-coming startups, such as Groundwork, Hyperic and then Gluster (later acquired by Red Hat). In between, there was a brief interlude at IDG World Expo, where he was the conference director for LinuxWorld, GridWorld and OSBC. His advice for companies who want to "do community" is to trust your community and give them the space to "just try s***." John Mark loves to perform community karaoke, and is available for weddings, funerals and Bar/Bat Mitzvahs.




February 28, 2012 The Future of Gluster.org - John Mark Walker

The Roots of GlusterFS

● Distributed storage solutions difficult to find
● Decided to write their own
● No filesystem experts – Pro & Con
● Applied lessons from microkernel architecture
  – GNU Hurd


The Roots of GlusterFS

● All storage solutions were either
  ● Too expensive, or...
  ● Not scalable, or...
  ● Single purpose, or...
  ● Unable to support legacy apps, or...
  ● Unable to support new apps, or...
  ● Some combo of the above, but not very well


The Roots of GlusterFS

● The challenge:
  ● Create a storage system that was...
    – Scalable
    – Seamlessly integrated in the data center
    – Future-proof
● The solution: GlusterFS
  ● Scalable, with DHT
  ● POSIX-compliant
  ● Stackable
  ● User-space


GlusterFS Client Architecture

● Creating a file system in user space
● Utilizes fuse module
  – Kernel goes through fuse, which hands off to glusterd

[Diagram: Applications; Linux kernel (Fuse, Ext4); glusterd]


No Centralized Metadata

[Diagram: Clients A, B and C; Servers X, Y and Z, each storing files and extended attributes]
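The idea of locating files without a central metadata server can be illustrated with a small sketch. This is not GlusterFS's actual algorithm: the hash function (MD5), the equal-sized ranges, and the brick names are assumptions chosen for illustration. The point is only that any client can compute a file's location from its name, so no lookup service is needed.

```python
import hashlib
from typing import Sequence


def pick_brick(filename: str, bricks: Sequence[str]) -> str:
    """Choose a brick from the file name alone, with no lookup table.

    Illustrative only: MD5 and equal hash ranges are assumptions,
    not the hash or layout that GlusterFS itself uses.
    """
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    # Interpret the first 4 bytes as an unsigned 32-bit hash value.
    hash_value = int.from_bytes(digest[:4], "big")
    # Split the 32-bit hash space into equal contiguous ranges, one per brick.
    range_size = 2**32 // len(bricks)
    # Clamp so that leftover values at the top of the space map to the last brick.
    index = min(hash_value // range_size, len(bricks) - 1)
    return bricks[index]


# Hypothetical brick names, one per server in the diagram.
bricks = ["serverX:/export/brick", "serverY:/export/brick", "serverZ:/export/brick"]
for name in ["report.pdf", "vm-image.qcow2", "notes.txt"]:
    print(name, "->", pick_brick(name, bricks))
```

Because the result depends only on the name and the brick list, Clients A, B and C would all reach the same answer independently.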


What is a Translator?

● Add/remove layers
● Reorder layers
● Move layers between client and server
● Implement new layers
  ● e.g. encryption
● Replace old layers
  ● e.g. replication

[Diagram: translator stack]
FUSE Interface Layer
Performance Layer
Distribution Layer
Replication Layer
Protocol Layer
Local Filesystem Layer
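The stackable-layer idea can be sketched in a few lines. Real GlusterFS translators are not written this way; the class names, the single `write` operation, and the byte-reversing stand-in for encryption are all hypothetical, chosen only to show how layers can be added, reordered, or replaced without touching their neighbours.

```python
class Translator:
    """One layer in the stack; passes each call to the layer below it."""

    def __init__(self, child=None):
        self.child = child

    def write(self, path, data):
        return self.child.write(path, data)


class Storage(Translator):
    """Bottom layer: stands in for the local filesystem."""

    def __init__(self):
        super().__init__()
        self.files = {}

    def write(self, path, data):
        self.files[path] = data
        return len(data)


class Reverse(Translator):
    """Toy stand-in for an encryption layer: reverses the bytes."""

    def write(self, path, data):
        return self.child.write(path, data[::-1])


class Replicate(Translator):
    """Sends each write to every child instead of a single one."""

    def __init__(self, children):
        super().__init__()
        self.children = children

    def write(self, path, data):
        results = [child.write(path, data) for child in self.children]
        return min(results)


# Build a stack: transform layer on top, replication below, two stores at the bottom.
a, b = Storage(), Storage()
top = Reverse(Replicate([a, b]))
top.write("/hello.txt", b"abc")
print(a.files, b.files)
# prints: {'/hello.txt': b'cba'} {'/hello.txt': b'cba'}
```

Swapping `Reverse` for a different layer, or removing it, changes behaviour without any change to `Replicate` or `Storage`.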


Some Features

● Distributed, replicated and/or striped volumes
● Global namespace
● High availability
● Geo-replication
● Rebalancing
● Remove or replace bricks
● Self healing
● Volume profile and top metrics
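How a distributed-replicated volume combines the two modes can be sketched as follows. The sketch assumes that consecutive bricks in the brick list form one replica set and that files are then distributed across the sets; the function name and brick names are hypothetical, and the details should be checked against the Gluster documentation.

```python
def replica_sets(bricks, replica_count):
    """Group a flat brick list into replica sets of `replica_count` bricks each.

    Assumption: consecutive bricks form one replica set; files are then
    distributed across the resulting sets.
    """
    if replica_count < 1 or len(bricks) % replica_count != 0:
        raise ValueError("number of bricks must be a multiple of the replica count")
    return [bricks[i:i + replica_count] for i in range(0, len(bricks), replica_count)]


# Four hypothetical bricks with 2-way replication give two replica sets.
print(replica_sets(["s1:/b", "s2:/b", "s3:/b", "s4:/b"], 2))
# prints: [['s1:/b', 's2:/b'], ['s3:/b', 's4:/b']]
```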


No one ever expects the Red Hat acquisition


Red Hat Invests in GlusterFS

● Unstructured data volume to grow 44x by 2020
● Cloud and virtualization are driving scale-out storage growth
● Scale-out storage shipments to exceed 63,000 PB by 2015 (74% CAGR)
● 40% of core cloud spend related to storage
● GlusterFS-based solutions up to 50% less than other storage systems


Red Hat Invests in GlusterFS

● GlusterFS adds to the Red Hat stack
  ● Complements other Red Hat offerings
  ● Many integration points
● More engineers hacking on GlusterFS than ever before

[Diagram: RHEL, RHEV, Bare Metal, Clouds, JBoss; GlusterFS Unified Storage]


Red Hat Invests in GlusterFS

● Acceleration of community investment
● GlusterFS needs to be “bigger than Red Hat”
● Transformation of GlusterFS from product to project
  – From “open core” to upstream
● More resources for engineering and community outreach
● Red Hat's success rests on economies of scale
  – Critical mass of users and developers


Join a Winning Team

● We're hiring hackers and engineers
● Looking for community collaborators
  ● ISVs, students, IT professionals, fans, et al.

“Join me, and together, we can rule the galaxy...”


The Immediate Future


The Gluster Community

● 300,000+ downloads
  ● ~35,000/month
  ● >300% increase Y/Y
● 1,000+ deployments
  ● 45 countries
● 2,000+ registered users
  ● Mailing lists, Forums, etc.

Global adoption


The Gluster Community

● Why are we changing?
  ● Only 1 non-Red Hat core contributor
    – There were 2, but he acquired us
  ● Want to be the software standard for distributed storage
  ● Want to be more inclusive, more community-driven

Goal: create global ecosystem that supports ISVs, service providers and more


Towards “Real” Open Source

● GlusterFS, prior to acquisition
  ● “Open Core”
  ● Tied directly to Gluster products
    – No differentiation
  ● Very little outside collaboration
  ● Contributors had to assign copyright to Gluster
    – Discouraged would-be contributors


Towards “Real” Open Source

[Diagram: Open Source Code and Commercial Product]

“Open Core”
● All engineering controlled by project/product sponsor
● No innovation outside of core engineering team
● All open source features also in commercial product
● Many features in commercial product not in open source code


Towards “Real” Open Source

[Diagram: Open Source Code and Commercial Products]

“Real” Open Source
● Many points of collaboration and innovation in open source project
● Engineering team from multiple sources
● Project and product do not completely overlap
● Commercial products are hardened, more secure and thoroughly tested


Towards “Real” Open Source

[Diagram: Fedora Linux and RHEL]

“Real” Open Source
● Enables more innovation on the fringes
● Engineering team from multiple sources
● Open source project is “upstream” from commercial product
● “Downstream” products are hardened, more secure and thoroughly tested


Towards “Real” Open Source

[Diagram: GlusterFS and Red Hat Storage]

“Real” Open Source
● Enables more innovation on the fringes
● Engineering team from multiple sources
● Open source project is “upstream” from commercial product
● “Downstream” products are hardened, more secure and thoroughly tested


Project Roadmaps


GlusterFS 3.3 ETA in Q2/Q3 2012

What's New in GlusterFS 3.3
● New features
  ● Unified File & Object access
  ● Hadoop / HDFS compatibility
● New Volume Type
  ● Replicated + striped (+ distributed) volumes
● Enhancements to Distributed volumes (DHT translator)
  ● Rebalance can migrate open files
  ● Remove-brick can migrate data to remaining bricks
● Enhancements to Replicated volumes (AFR translator)
  ● Change replica count on an active volume, add replication to distribute-only volumes
  ● Granular locking – Much faster self-healing for large files
  ● Proactive self-heal process starts without FS stat
  ● Round-trip reduction for lower latency
  ● Quorum enforcement – avoid split brain scenarios
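Why quorum enforcement helps avoid split brain can be shown with a minimal sketch. It assumes a simple strict-majority rule; the function names are hypothetical, and the actual GlusterFS quorum options and tie-breaking behaviour are not modelled here.

```python
def has_quorum(reachable: int, replica_count: int) -> bool:
    """True when strictly more than half of the replicas are reachable."""
    return reachable * 2 > replica_count


def try_write(reachable: int, replica_count: int) -> str:
    """Accept a write only when a majority of replicas can see it."""
    if not has_quorum(reachable, replica_count):
        raise PermissionError("quorum not met: refusing write to avoid split brain")
    return "write accepted"


# With 3 replicas, a client that sees 2 may write; a client that sees 1 may not.
print(try_write(2, 3))
print(has_quorum(1, 3))
```

Under a strict-majority rule, two partitions of the same replica set can never both hold a majority, so they cannot both accept conflicting writes.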


File and Object Storage

● Traditional SAN / NAS support either file or block storage

● New storage methodologies implement RESTful APIs over HTTP

● Demand for unifying the storage infrastructure increasing

● Treats files as objects and volumes as buckets

● Available now in 3.3 betas

● Soon to be backported to 3.2.x

● Contributing to OpenStack project
  ● Re-factored Swift API
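The "files as objects, volumes as buckets" mapping can be pictured with a short sketch. The mount root, function name, and validation rules below are hypothetical and are not the GlusterFS object interface itself; the sketch only shows how a bucket and object key could resolve to the same file that a POSIX client sees.

```python
from pathlib import PurePosixPath

# Hypothetical location where volumes are mounted on the gateway host.
MOUNT_ROOT = PurePosixPath("/mnt/gluster")


def object_to_path(bucket: str, key: str) -> PurePosixPath:
    """Map a bucket (volume) and object key to a file path on the mounted volume."""
    key_path = PurePosixPath(key)
    parts = key_path.parts
    if not bucket or "/" in bucket or bucket in (".", ".."):
        raise ValueError("invalid bucket name")
    if not parts or key_path.is_absolute() or ".." in parts:
        raise ValueError("invalid object key")
    return MOUNT_ROOT / bucket / key_path


print(object_to_path("media", "photos/2012/cat.jpg"))
# prints: /mnt/gluster/media/photos/2012/cat.jpg
```

A file written through the NAS interface at that path would then be readable as the object `photos/2012/cat.jpg` in bucket `media`, and vice versa.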


Technology Integrations

[Diagram: OpenStack Imaging Services, Compute; Unified File & Object Storage; API Layer; Mobile Apps, Web Clients, Enterprise Software Ecosystem]

GlusterFS used as VM storage system
● Pause and restart VMs, even on another hypervisor
● HA and DR for VMs
● Faster VM deployment
● vMotion-like capability

Shared storage ISOs and appliances
● oVirt / RHEV
● CloudStack
● OpenStack

Goal: The standard for cloud storage


HDFS/Hadoop Compatibility

● HDFS compatibility library
  ● Simultaneous file and object access within Hadoop
● Benefits
  ● Legacy app access to MapReduce applications
  ● Enables data storage consolidation
  ● Simplify and unify storage deployments
  ● Provide users with file level access to data
  ● Enable legacy applications to access data via NFS
    ● Analytic apps can access data without modification


The Gluster Community

● What is changing?
  ● HekaFS / CloudFS being folded into Gluster project
    – HekaFS == GlusterFS + multi-tenancy and SSL for auth and data encryption
    – HekaFS.org
    – ETA ~9 months


What else?


GlusterFS Advisory Board

● Advisory board
  ● Consists of industry and community leaders from Facebook, Citrix, Fedora, and OpenStack
    – Richard Wareing, Storage Engineer, Facebook
    – Jeff Darcy, Filesystem Engineer, Red Hat; Founder, HekaFS Project
    – AB Periasamy, Co-Founder, GlusterFS project
    – Ewan Mellor, Xen Engineer, Citrix; Member, OpenStack project
    – David Nalley, CloudStack Community Mgr; Fedora Advisory Board
    – Louis Zuckerman, Sr. System Administrator, Picture Marketing
    – Joe Julian, Sr. System Administrator, Ed Wyse Beauty Products
    – Greg DeKoenigsberg, Community VP, Eucalyptus; co-founder, Fedora
    – John Mark Walker, Gluster.org Community Guy (Chair)


Gluster.org Web Site

● Services for users and developers
  ● Developer section with comprehensive docs
  ● Collaborative project hosting
  ● Continuing development of end user documentation and interactive tools
  ● Published roadmaps
  ● Transparent feature development


GlusterFS Downloads

● Where's the code?
  ● GlusterFS 3.3
    – Simultaneous file + object
    – HDFS compatibility
    – Improved self-healing + VM hosting
      ● Granular locking
    – Beta 3 due Feb/Mar 2012
    – http://download.gluster.org/pub/gluster/glusterfs


Gluster.org Services

● Gluster.org
  ● Portal into all things GlusterFS
● Community.gluster.org
  ● Self-support site; Q&A; HOWTOs; tutorials
● Patch review, CI
  ● review.gluster.com
● #gluster
  ● IRC channel on Freenode


Development Process

● Source code
  ● Hosted at github.com/gluster
● Bugs and Feature Requests
  ● Bugzilla.redhat.com – select GlusterFS from menu
● Patches
  ● Submit via Gerrit at review.gluster.com
● See Development Work Flow doc:
  ● gluster.org/community/documentation/index.php/Development_Work_Flow


Thank You

● GlusterFS contacts
  ● Gluster.org/interact/mailinglists
  ● @RedHatStorage & @GlusterOrg
  ● #gluster on Freenode
● My contact info
  ● [email protected]
  ● Twitter & identi.ca: @johnmark