PURITY//FB AND ITS CORE TECHNOLOGIES
MODERN STORAGE SOFTWARE POWERING INDUSTRY’S FIRST DATA HUB

TECHNICAL BRIEF



TABLE OF CONTENTS

INTRODUCTION
DATA LIVES IN A COMPLEX SPRAWL OF SILOS
NEW CLASS OF STORAGE TO UNIFY ALL
FOUR ESSENTIAL INGREDIENTS OF A DATA HUB
MEET THE INDUSTRY’S FIRST DATA HUB: FLASHBLADE
POWERED BY PURITY//FB
STORAGE SOFTWARE LIKE THE WORLD HAS NEVER SEEN
High Throughput for File & Object on a Single Platform
Multi-Dimensional Performance for Any Data
Seamless, Native Scale-Out Architecture
Massively Parallel Architecture
CUSTOMER SPOTLIGHT: MAN AHL
CUSTOMER SPOTLIGHT: COUNTY OF LOS ANGELES, DEPT OF PUBLIC HEALTH
CONCLUSION


INTRODUCTION

Data is the fuel powering the modern enterprise, unlocking new possibilities as analytics and AI applications turn data into insight. Whereas the great promises of “big data” have fallen short in the past, a new generation of tools fueled by tremendous amounts of data, such as Apache Spark and Google’s TensorFlow, is driving a tidal wave of innovation. Today, a data warehouse may hold data that could improve AI models, while AI models may generate information that drives data warehouse analytics.

Yet data is often stuck in complex infrastructure and sprawling silos. The storage industry is largely to blame: legacy vendors promote data silos, such as Exadata or Data Domain systems, and data lakes that were originally built to store unstructured data, not to deliver it in real time. The new era of analytics and AI needs a modern storage architecture, because in today’s era of intelligence, unified and shared data is more valuable than merely stored data.

DATA LIVES IN A COMPLEX SPRAWL OF SILOS

For today’s enterprises, data is the most important asset, but it is often locked in multiple silos. In the world of modern analytics, there are four classes of silo. Data warehouses, such as Exadata appliances, require massive throughput. Data lakes are built to store raw unstructured data. Streaming analytics requires real-time processing and data agility. And AI clusters are often built with HPC-style storage systems. The problem? Each application silo, from DAS to HPC, has required a different storage architecture.

[Figure: Application silos (data warehouse, backup, data lake, streaming analytics, AI cluster), each on its own storage: DB appliances, Tier 1 SAN, test/dev, DAS, vendor-specific DAS, JBOD, object cloud, archive cloud, tape archive, backup appliances, HPC systems, and AFAs.]

NEW CLASS OF STORAGE TO UNIFY ALL

It’s time to rethink storage, with the goal of a single storage platform that builds on the key strengths of each silo and unifies them. Enter the data hub.

A data hub is a data-centric storage architecture that powers analytics and AI while enabling enterprises to consolidate and share data in today’s rapidly evolving, data-first world. Unlike data lake and legacy DAS architectures, which were primarily designed to store data, a data hub is designed to share data.

[Figure: The data hub unifies data lake, streaming analytics, AI cluster, and backup & data warehouse workloads through four qualities: high-throughput file & object, native scale-out, multi-dimensional performance, and massively parallel.]

FOUR ESSENTIAL INGREDIENTS OF A DATA HUB

Not every storage architecture is a data hub. A data hub takes the key strengths of each silo, the unique features that make each one capable at its own task, and integrates them into a single unified platform. Any storage vendor can build a data hub, but it must be architected with these four features:

MULTI-DIMENSIONAL PERFORMANCE
Streaming analytics requires storage that delivers performance for any data, small or large, with any I/O pattern, sequential or random.

MASSIVELY PARALLEL
Powered by tens of thousands of compute cores with GPUs, AI clusters need storage that is also massively parallel.

SEAMLESS, NATIVE SCALE-OUT
The data lake, redefined as scale-out infrastructure, enables applications to run at any scale by adding resources as needed.

HIGH THROUGHPUT FILE & OBJECT
Backup and data warehouse applications require tremendous data throughput to accelerate query times and batched analytics.


MEET THE INDUSTRY’S FIRST DATA HUB: FLASHBLADE™

FlashBlade is a storage system unlike anything the storage industry has delivered before. From software to hardware, everything is tuned to deliver the four essential qualities of a data hub.

FlashBlade is built from the ground up to unify file and object on a single scale-out platform that consolidates all data-intensive applications, from backup and data protection all the way to AI clusters. FlashBlade offers a native scale-out architecture that grows seamlessly to deliver data to any application. Architected to deliver unbiased performance for any unstructured data, it delivers multi-dimensional performance for any data and any I/O pattern. And it is massively parallel, built on a modern software system designed to scale out, delivering performance to tens of thousands of clients accessing billions of objects.

POWERED BY PURITY//FB

Under the hood of FlashBlade lies one of the most powerful examples of storage software ever built: Purity//FB. Its architecture can be represented as layers of innovation, all working in tandem to power a data hub.

• NFS file and S3 object protocols are native to the Purity//FB software stack. To accelerate both legacy file-based applications and modern cloud-native workloads, Purity//FB eliminates the need for any gateways, delivering the highest performance and efficiency.

• At the core of Purity//FB is a massively distributed transactional database. Built on a modern key-value architecture, this design enables FlashBlade to distribute everything, both data and metadata.

• Data comes in all forms and sizes and is accessed in unpredictable ways. With its variable block metadata engine, Purity//FB delivers high performance regardless of data type, size, or access pattern. And with intelligent load balancing technology, no resource, data path, or metadata server is ever the bottleneck.

• Purity//FB is designed for flash from day 1. DirectFlash™ eliminates the legacy protocol overhead and I/O concurrency limitations common in traditional storage systems, and enables the software to speak to each flash chip using a massively parallel data path.
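The brief does not describe how Purity//FB places keys, so the following is only a minimal sketch of the general idea behind a key-value design that distributes both data and metadata: if metadata entries and data blocks are all addressed by keys, a hash of each key can decide which blade owns it. The sketch assumes consistent hashing with SHA-256 and virtual nodes; the blade names (`blade-N`) and key prefixes (`meta:`, `data:`) are hypothetical, and none of this is Purity//FB’s actual placement scheme.

```python
import bisect
import hashlib
from collections import Counter


def _hash(key: str) -> int:
    """Stable 64-bit hash of a key (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping keys to hypothetical blades."""

    def __init__(self, blades, vnodes=64):
        # Each blade contributes `vnodes` points on the ring to smooth out the load.
        self._ring = sorted((_hash(f"{blade}#{i}"), blade) for blade in blades for i in range(vnodes))
        self._points = [point for point, _ in self._ring]

    def owner(self, key: str) -> str:
        """Return the blade owning `key`: the first ring point after the key's hash, wrapping around."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing([f"blade-{n}" for n in range(8)])

# Metadata entries and data blocks are both just keys, so both spread over every blade.
keys = [f"meta:/projects/file{i}" for i in range(5000)]
keys += [f"data:/projects/file{i}:block{j}" for i in range(1000) for j in range(5)]
load = Counter(ring.owner(k) for k in keys)
print(sorted(load.items()))

# Scaling out: adding a ninth blade only adds ring points, so every key that moves
# moves to the new blade; the rest stay where they are.
bigger = HashRing([f"blade-{n}" for n in range(9)])
moved = sum(ring.owner(k) != bigger.owner(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys move when blade-8 joins")
```

Consistent hashing is one common way scale-out systems grow without reshuffling everything; a real system would also need replication or erasure coding and transactional updates, which this sketch omits.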

[Figure: Purity//FB architecture layers: NFS file and S3 object protocols, client load balancing, variable block metadata engine, massively distributed transaction database (key-value), and DirectFlash, together delivering the data hub’s four qualities for data lake, streaming analytics, AI cluster, and backup & data warehouse workloads.]

STORAGE SOFTWARE LIKE THE WORLD HAS NEVER SEEN

Purity//FB represents a giant leap for the scale-out storage industry. From industry-standard file and object protocols down to the management of every flash chip, Purity//FB is built from the ground up to bring the power of all-flash to every layer of software.


High Throughput for File & Object on a Single Platform

SINGLE PLATFORM FOR BEST ROI
FlashBlade eliminates the need for separate file and object systems, and the inevitable inefficiency of over-provisioning capacity for each.

NO GATEWAYS
Legacy approaches typically use gateways to layer on additional protocols. In Purity//FB, file and object protocols are built natively for performance.

BUILT FOR FLASH FROM DAY 1
Most scale-out software uses decades-old designs originally built for spinning disk. Purity//FB is modern storage software built for all-flash.

Multi-Dimensional Performance for Any Data

VARIABLE BLOCK SIZE FOR HIGHEST EFFICIENCY
Older software uses a fixed block size, leaving a lot of efficiency on the table. Purity//FB tailors the block size to each object to maximize capacity efficiency.
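To make the fixed-versus-variable block argument concrete, here is a small capacity calculation. The 32 KiB fixed block, the 512-byte allocation granule standing in for “variable” blocks, and the object sizes are all hypothetical values chosen for illustration, not Purity//FB parameters.

```python
FIXED_BLOCK = 32 * 1024  # hypothetical fixed block size of an older design
GRANULE = 512            # hypothetical smallest unit a variable-size block is rounded to


def round_up(size: int, unit: int) -> int:
    """Round `size` up to the next multiple of `unit`."""
    return -(-size // unit) * unit


# Hypothetical object sizes in bytes, from tiny files to a 1 MB object.
object_sizes = [700, 4_000, 12_345, 33_000, 1_000_000]

logical = sum(object_sizes)                                  # bytes actually stored
fixed = sum(round_up(s, FIXED_BLOCK) for s in object_sizes)  # whole fixed blocks per object
variable = sum(round_up(s, GRANULE) for s in object_sizes)   # block sized to each object

print(f"logical bytes:  {logical:,}")
print(f"fixed blocks:   {fixed:,} allocated ({logical / fixed:.1%} efficient)")
print(f"variable block: {variable:,} allocated ({logical / variable:.1%} efficient)")
```

With these numbers, the fixed-block layout allocates 1,179,648 bytes for 1,050,045 bytes of data (about 89% efficient), while the variable layout allocates 1,051,648 bytes (about 99.8%). The gap is widest for small objects: the 700-byte object occupies about 2% of its 32 KiB block.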

TUNED FOR EVERYTHING ARCHITECTURE
Existing solutions often optimize for a subset of I/O, such as large, sequential files. Purity//FB is natively designed to deliver performance for any data.

DISTRIBUTE CLIENTS & DATA ACROSS ALL RESOURCES
Purity//FB distributes all data and every client request to ensure no resource is ever a bottleneck.
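The brief does not say how Purity//FB balances client requests. As a hedged illustration of why distributing every request matters, the sketch below compares pinning each client to one blade (a stand-in for the federated designs the brief contrasts with) against greedily sending each request to the least-loaded blade. The workload, the cost units, and the blade names are hypothetical.

```python
import heapq
from collections import defaultdict

BLADES = ["blade-0", "blade-1", "blade-2", "blade-3"]  # hypothetical blades

# Hypothetical workload of (client, cost) pairs: one hot client issuing many
# small requests, and three quieter clients issuing fewer, larger ones.
requests = [("client-0", 1)] * 40 + [(f"client-{c}", 4) for c in (1, 2, 3) for _ in range(5)]


def static_pinning(requests, blades):
    """Every request from a client goes to that client's fixed blade."""
    load = defaultdict(int)
    for client, cost in requests:
        blade = blades[int(client.split("-")[1]) % len(blades)]
        load[blade] += cost
    return dict(load)


def least_loaded(requests, blades):
    """Each request goes to whichever blade currently has the least outstanding work."""
    heap = [(0, blade) for blade in blades]  # (outstanding work, blade)
    heapq.heapify(heap)
    for _client, cost in requests:
        load, blade = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, blade))
    return {blade: load for load, blade in heap}


static = static_pinning(requests, BLADES)
dynamic = least_loaded(requests, BLADES)
print("pinned:      ", static, "max", max(static.values()))
print("least-loaded:", dynamic, "max", max(dynamic.values()))
```

With one hot client, pinning leaves one blade with 40 units of work while the others carry 20 each; the least-loaded policy ends at 26, 26, 26, and 22. Greedy least-loaded assignment also guarantees that no blade finishes more than the largest single request’s cost above the average load.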

Seamless, Native Scale-Out Architecture

DISTRIBUTE EVERYTHING: METADATA, FILES, OBJECTS
Legacy systems rely on a federation of nodes, pools, pairs, caches, and metadata servers. Purity//FB is scale-out at its core, offering simplicity without compromise.

DYNAMICALLY SELF-TUNES, SELF-HEALS
Purity//FB self-tunes and self-heals to deliver performance and resiliency, while legacy systems require constant retuning for performance.

UNIFIED DATABASE
At its core, Purity//FB is built on modern distributed metadata database technology, eliminating the performance hotspots found in legacy systems.

Massively Parallel Architecture

Legacy storage software has been built up over many years on serial protocols and spinning disk, with layers of gateways and inefficiencies added to support SSDs. Purity//FB is engineered from the ground up for flash, with a scale-out metadata architecture capable of handling billions of files and objects while delivering unprecedented performance. The software speaks to each flash chip, using a massively parallel data path to accelerate data access for today’s data-intensive applications.
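DirectFlash’s internals are not described in this brief, so the following is only a toy model of why a parallel data path to many flash chips helps: a buffer is striped across simulated chips, each access sleeps for a fixed latency, and reads are issued either one chip at a time or all at once from a thread pool. `FakeChip`, the 10 ms latency, and the 16-chip count are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CHIP_LATENCY_S = 0.01  # hypothetical per-access latency of one simulated flash chip
NUM_CHIPS = 16         # hypothetical number of chips the data is striped across


class FakeChip:
    """Stand-in for one flash chip: stores chunks and sleeps to mimic access latency."""

    def __init__(self):
        self.pages = {}

    def write(self, addr, chunk):
        time.sleep(CHIP_LATENCY_S)
        self.pages[addr] = chunk

    def read(self, addr):
        time.sleep(CHIP_LATENCY_S)
        return self.pages[addr]


def stripe(data: bytes, n: int) -> list[bytes]:
    """Split data into n contiguous chunks (trailing chunks may be shorter)."""
    size = -(-len(data) // n)
    return [data[i * size:(i + 1) * size] for i in range(n)]


chips = [FakeChip() for _ in range(NUM_CHIPS)]
payload = bytes(range(256)) * 64  # 16 KiB of sample data

with ThreadPoolExecutor(max_workers=NUM_CHIPS) as pool:
    # Write one chunk to every chip concurrently.
    list(pool.map(lambda pair: pair[0].write(0, pair[1]), zip(chips, stripe(payload, NUM_CHIPS))))

    # Serial path: visit the chips one after another.
    start = time.perf_counter()
    serial = b"".join(chip.read(0) for chip in chips)
    serial_time = time.perf_counter() - start

    # Parallel path: read all chips at once; pool.map keeps results in chip order.
    start = time.perf_counter()
    parallel = b"".join(pool.map(lambda chip: chip.read(0), chips))
    parallel_time = time.perf_counter() - start

print(f"serial read:   {serial_time * 1000:.0f} ms")
print(f"parallel read: {parallel_time * 1000:.0f} ms")
```

In this model the serial read costs roughly 16 chip latencies and the parallel read roughly one, which is the intuition behind speaking to every chip at once rather than funneling requests through a serial protocol.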

CUSTOMER SPOTLIGHT: MAN AHL

Man AHL is a pioneer in the field of systematic quantitative investing. Its entire business is based on creating and executing computer models to make investment decisions. The firm has adopted the data hub with FlashBlade to deliver a single unified platform on which data scientists and engineers innovate using analytics and AI applications.

[Figure: Man AHL workloads on one platform: AI, analytics (MLlib), and data warehouse.]


[Figure: Legacy software reaches flash through gateways and serial SCSI, SAS, and SATA protocols.]

“Our quants want to test a model, get the results, and then test another one – all day long. So a 10-20X improvement in performance is a game-changer when it comes to creating a time-to-market advantage for us.” – GARY COLLIER, CO-CTO

CUSTOMER SPOTLIGHT: COUNTY OF LOS ANGELES, DEPARTMENT OF PUBLIC HEALTH

Customers often start their analytics journeys by consolidating the infrastructure that holds most of their data: backup appliances. The Los Angeles County Department of Public Health had a sprawl of silos, comprising racks of Data Domain systems and a DAS-centric data lake. Today, the Department is actively unifying all of its data on a single FlashBlade to accelerate backup and restore times while offering faster analytics capabilities to its data teams.

[Figure: Old architecture: Avamar, racks of Data Domain, racks of data lake, and tape archive serving production backup and analytics. New architecture: production backup & analytics consolidated on one platform, plus tape archive.]

CONCLUSION

Data is stuck in a complex sprawl of silos, and legacy storage architectures are largely to blame. The storage industry offers many data silos, such as backup appliances, data warehouse appliances, and data lakes. Each silo is useful for its original task, but none is built to share and deliver data with simplicity and speed.

A data hub is a new class of storage architecture designed to unify and deliver data for modern analytics and AI workloads. It takes the key strengths of each silo, from data warehouses to AI clusters, and integrates them into a single unified platform.

FlashBlade is the industry’s first data hub, and Purity//FB is the engine that makes it possible. To learn more about FlashBlade and Purity//FB, visit www.purestorage.com/datahub.


© 2018 Pure Storage, Inc. All rights reserved.

Pure Storage, FlashBlade, DirectFlash, and the “P” Logo are trademarks or registered trademarks of Pure Storage, Inc. in the U.S. and other countries. Other company, product, and service names may be trademarks or service marks of others.

The Pure Storage product described in this documentation is distributed under a license agreement and may be used only in accordance with the terms of the agreement. The license agreement restricts its use, copying, distribution, decompilation, and reverse engineering. No part of this documentation may be reproduced in any form by any means without prior written authorization from Pure Storage, Inc. and its licensors, if any.

THE DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. PURE STORAGE SHALL NOT BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH THE FURNISHING, PERFORMANCE, OR USE OF THIS DOCUMENTATION. THE INFORMATION CONTAINED IN THIS DOCUMENTATION IS SUBJECT TO CHANGE WITHOUT NOTICE.

ps_wp8p_purity-fb-and-data-hub_01