PURITY//FB AND ITS CORE TECHNOLOGIES
MODERN STORAGE SOFTWARE POWERING THE INDUSTRY'S FIRST DATA HUB
TECHNICAL BRIEF
TABLE OF CONTENTS
INTRODUCTION
DATA LIVES IN A COMPLEX SPRAWL OF SILOS
NEW CLASS OF STORAGE TO UNIFY ALL
FOUR ESSENTIAL INGREDIENTS OF A DATA HUB
MEET THE INDUSTRY'S FIRST DATA HUB: FLASHBLADE
POWERED BY PURITY//FB
STORAGE SOFTWARE LIKE THE WORLD HAS NEVER SEEN
High Throughput for File & Object on a Single Platform
Multi-Dimensional Performance for Any Data
Seamless, Native Scale-Out Architecture
Massively Parallel Architecture
CUSTOMER SPOTLIGHT: MAN AHL
CUSTOMER SPOTLIGHT: COUNTY OF LOS ANGELES, DEPARTMENT OF PUBLIC HEALTH
CONCLUSION
INTRODUCTION
Data is the fuel powering the modern enterprise, unlocking new possibilities as analytics and AI applications turn data into insight. Although the great promises of "big data" fell short in the past, a new generation of data-hungry tools, such as Apache Spark and Google's TensorFlow, is driving a tidal wave of innovation. Data now flows in both directions: a data warehouse may hold data that could improve AI models, while AI models may generate information that drives data warehouse analytics.
Yet data is often trapped in complex infrastructure and sprawling silos. The storage industry is largely to blame: legacy vendors promote siloed systems, like Exadata or Data Domain, and data lakes that were originally built to store unstructured data, not to deliver it in real time. The new era of analytics and AI needs a modern storage architecture, because in today's era of intelligence, unified and shared data is more valuable than data that is merely stored.
DATA LIVES IN A COMPLEX SPRAWL OF SILOS
For today's enterprises, data is the most important asset, but it is often locked in multiple silos. In modern analytics, there are four classes of silo. Data warehouses, like Exadata appliances, require massive throughput. Data lakes are built to store raw, unstructured data. Streaming analytics requires real-time responsiveness and data agility. And AI clusters are often built on HPC-style storage systems. The problem? From DAS to HPC, each application silo has required a different storage architecture.
[Figure: Today's data silos. Data warehouse, backup, data lake, streaming analytics, tier 1 SAN and test/dev, and AI cluster workloads spread across DB appliances, backup appliances, tape and cloud archives, object cloud, JBOD, DAS, vendor-specific DAS, all-flash arrays (AFA), and HPC systems.]
NEW CLASS OF STORAGE TO UNIFY ALL
It’s time to rethink storage, with the goal of a single storage platform that builds on key strengths of each silo and
unifies them. Enter the data hub.
A data hub is a data-centric storage architecture that powers analytics and AI while enabling enterprises to
consolidate and share data in today's rapidly evolving, data-first world. Unlike data lake and legacy DAS architectures,
which were primarily designed to store data, a data hub is designed to share data.
[Figure: The data hub. One platform combining high-throughput file & object, native scale-out, multi-dimensional performance, and massive parallelism to serve backup & data warehouse, data lake, streaming analytics, and AI cluster workloads.]
FOUR ESSENTIAL INGREDIENTS OF A DATA HUB
Not every storage architecture is a data hub. A data hub takes the key strengths of each silo, the unique features
that make each one capable at its own task, and integrates them into a single unified platform. Any storage vendor
can build a data hub, but it must be architected with these four features:
MULTI-DIMENSIONAL PERFORMANCE
Streaming analytics requires storage that delivers performance for any data, small or large, with
any I/O pattern, sequential or random.
MASSIVELY PARALLEL
Powered by tens of thousands of GPU compute cores, AI clusters need storage that is
massively parallel as well.
SEAMLESS, NATIVE SCALE-OUT
The data lake, redefined as scale-out infrastructure, lets applications run at any scale by
adding more resources as needed.
HIGH THROUGHPUT FILE & OBJECT
Backup and data warehouse applications require tremendous data throughput to accelerate
query times and batched analytics.
MEET THE INDUSTRY’S FIRST DATA HUB: FLASHBLADE™
FlashBlade is a storage system unlike anything the storage industry has ever delivered. From software to hardware,
everything is tuned to deliver on these four essential qualities of a data hub.
FlashBlade is built from the ground up to unify file and object storage on a single scale-out platform that consolidates all data-intensive applications, from backup and data protection to AI clusters. Its native scale-out architecture grows seamlessly to deliver data to any application. Architected to deliver unbiased performance for any unstructured data, without favoring particular file sizes or access patterns, it provides multi-dimensional performance for any data and any I/O. And it is massively parallel: built on modern software designed to scale out, it delivers performance to tens of thousands of clients accessing billions of objects.
POWERED BY PURITY//FB
Under the hood of FlashBlade lies one of the most powerful examples of storage software ever built: Purity//FB.
The architecture of Purity//FB can be represented by layers of innovations, all working in tandem to power a data hub.
• NFS file and S3 object protocols are native to the Purity//FB software stack. Because it needs no protocol gateways, Purity//FB accelerates both legacy file-based applications and modern cloud-native workloads without the performance and efficiency overhead that a gateway layer adds.
• At the core of Purity//FB is a massively distributed transactional database. Built on a modern key-value architecture, this design enables FlashBlade to distribute everything: both data and metadata.
• Data comes in all forms and sizes and is accessed in unpredictable ways. With its variable block metadata engine, Purity//FB delivers high performance regardless of data type, size, or access pattern. And with intelligent load balancing, it is designed so that no single resource, data path, or metadata server becomes the bottleneck.
• Purity//FB has been designed for flash from day 1. DirectFlash™ eliminates the legacy protocol overhead and I/O concurrency limitations common in traditional storage systems, and enables the software to speak to each flash chip over a massively parallel data path.
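The brief does not describe how Purity//FB's client load balancing decides where a request goes. As a rough illustration only, the sketch below routes each request to whichever blade currently has the fewest outstanding requests, one common way to keep any single resource from becoming the bottleneck. The class name `LeastLoadedBalancer` and the blade names are hypothetical.

```python
class LeastLoadedBalancer:
    """Hypothetical least-outstanding-requests router (not Purity//FB's actual algorithm)."""

    def __init__(self, blade_names):
        # Outstanding (in-flight) request count per blade.
        self.outstanding = {name: 0 for name in blade_names}

    def route(self) -> str:
        # Pick the blade with the fewest in-flight requests; ties go to the lowest name.
        name = min(self.outstanding, key=lambda n: (self.outstanding[n], n))
        self.outstanding[name] += 1
        return name

    def complete(self, name: str) -> None:
        # Called when a request finishes, freeing capacity on that blade.
        if self.outstanding[name] == 0:
            raise ValueError(f"no outstanding requests on {name}")
        self.outstanding[name] -= 1


if __name__ == "__main__":
    balancer = LeastLoadedBalancer(["blade-1", "blade-2", "blade-3"])
    print([balancer.route() for _ in range(4)])  # ['blade-1', 'blade-2', 'blade-3', 'blade-1']
    balancer.complete("blade-2")
    print(balancer.route())  # blade-2
```

A production balancer would likely also weigh request size and blade health; this sketch only counts in-flight requests.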
[Figure: The Purity//FB stack. NFS file and S3 object protocols, client load balancing, variable block metadata engine, massively distributed transactional key-value database, and DirectFlash, supporting backup & data warehouse, data lake, streaming analytics, and AI cluster workloads through high-throughput file & object, native scale-out, multi-dimensional performance, and massive parallelism.]
STORAGE SOFTWARE LIKE THE WORLD HAS NEVER SEEN
Purity//FB represents a giant leap for the scale-out storage industry. From industry-standard file and object protocols to
managing every flash chip, Purity//FB is built from the ground up to bring the power of all-flash to every layer of software.
High Throughput for File & Object on a Single Platform
SINGLE PLATFORM FOR BEST ROI
FlashBlade eliminates the need for separate file and object systems, along with the capacity over-provisioning inefficiencies that separate systems inevitably bring.
NO GATEWAYS
Legacy approaches typically use gateways to layer on additional protocols. In Purity//FB, file and object protocols are built natively for performance.
BUILT FOR FLASH FROM DAY 1
Most scale-out software uses decades-old designs originally built for spinning disk. Purity//FB is modern storage software built for all-flash.
Multi-Dimensional Performance for Any Data
VARIABLE BLOCK SIZE FOR HIGHEST EFFICIENCY
Older software uses a fixed block size, leaving a lot of efficiency on the table. Purity//FB tailors block size to each object to maximize capacity efficiency.
TUNED-FOR-EVERYTHING ARCHITECTURE
Existing solutions often optimize for a subset of I/O, like large, sequential files. Purity//FB is natively designed to deliver performance for any data.
DISTRIBUTE CLIENTS & DATA ACROSS ALL RESOURCES
Purity//FB distributes all data and every client request across all resources so that no single resource becomes a bottleneck.
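To make the fixed-versus-variable block size trade-off concrete, the sketch below compares how much capacity a set of objects consumes when each is rounded up to a fixed block size versus a much finer allocation granularity. The 32 KiB fixed block, the 512-byte granularity, and the object sizes are all hypothetical; the brief does not state Purity//FB's actual block sizes, and modeling "variable block" as fine-grained rounding is a simplification.

```python
def allocated_bytes(object_sizes, block_size):
    """Capacity consumed when each object is rounded up to a whole number of blocks."""
    total = 0
    for size in object_sizes:
        blocks = -(-size // block_size)  # ceiling division without floats
        total += blocks * block_size
    return total


def capacity_efficiency(object_sizes, block_size):
    """Fraction of allocated capacity that holds actual object data."""
    return sum(object_sizes) / allocated_bytes(object_sizes, block_size)


if __name__ == "__main__":
    sizes = [100, 1_000, 5_000, 70_000]  # hypothetical object sizes in bytes
    for label, block in [("fixed 32 KiB", 32 * 1024), ("variable, 512 B granularity", 512)]:
        print(f"{label}: {allocated_bytes(sizes, block)} bytes allocated, "
              f"{capacity_efficiency(sizes, block):.1%} efficient")
```

With these inputs, the fixed layout allocates 196,608 bytes for 76,100 bytes of data (about 39% efficient), while the fine-grained layout allocates 76,800 bytes (about 99%). Small objects dominate the waste, because each one still occupies a full fixed block.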
Seamless, Native Scale-Out Architecture
DISTRIBUTE EVERYTHING: METADATA, FILES, OBJECTS
Legacy systems rely on a federation of nodes, pools, pairs, caches, and metadata servers. Purity//FB is scale-out at its core, offering simplicity without compromise.
DYNAMICALLY SELF-TUNES, SELF-HEALS
Purity//FB self-tunes and self-heals to deliver performance and resiliency, while legacy systems require constant retuning for performance.
UNIFIED DATABASE
At its core, Purity//FB is built on modern distributed metadata database technology, eliminating the performance hotspots found in legacy systems.
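The brief does not say how Purity//FB places data and metadata across blades. Consistent hashing is a well-known technique, swapped in here purely for illustration, that shows why a scale-out design can grow by adding resources without reshuffling everything: when a node joins, only the keys it takes over move. The `HashRing` class, the virtual-node count, and the blade names are hypothetical.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    # Stable 64-bit ring position from SHA-256 (built-in hash() is salted per process).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative, not Purity//FB's scheme)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions of all virtual nodes
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _position(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node after the key, wrapping at the end.
        idx = bisect.bisect_right(self._points, _position(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    keys = [f"object-{i}" for i in range(10_000)]
    ring = HashRing([f"blade-{n}" for n in range(1, 5)])
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("blade-5")
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    print(f"{len(moved) / len(keys):.1%} of keys moved, all to blade-5:",
          all(ring.node_for(k) == "blade-5" for k in moved))
```

Going from four to five blades moves roughly one fifth of the keys, all onto the new blade. By contrast, a naive `hash(key) % N` placement would remap about 80% of keys for the same change, since a key keeps its node only when its remainders modulo 4 and 5 agree.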
Massively Parallel Architecture
Legacy storage software has been built up over many years on serial protocols and spinning disk, with layers of gateways
and inefficiencies added to support SSDs. Purity//FB is engineered from the ground up for flash, with a scale-out
metadata architecture capable of handling billions of files and objects while delivering unprecedented performance.
The software speaks to each flash chip, using a massively parallel data path to accelerate data access for today's
data-intensive applications.
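DirectFlash's internal interface is not described in this brief. As a simplified illustration of why a massively parallel data path helps, the sketch below stripes an object across several simulated flash chips and reads all stripes concurrently, reassembling them in order. The chips are in-memory lists and the 4-byte stripe unit is hypothetical, chosen only so the example stays readable.

```python
from concurrent.futures import ThreadPoolExecutor

STRIPE_UNIT = 4  # hypothetical stripe size in bytes, tiny for readability


def stripe(data: bytes, num_chips: int, unit: int = STRIPE_UNIT):
    """Split data into fixed-size stripes placed round-robin on simulated chips."""
    chips = [[] for _ in range(num_chips)]
    layout = []  # (chip index, slot on that chip) for each stripe, in data order
    for offset in range(0, len(data), unit):
        chip = (offset // unit) % num_chips
        layout.append((chip, len(chips[chip])))
        chips[chip].append(data[offset:offset + unit])
    return chips, layout


def parallel_read(chips, layout) -> bytes:
    """Read every stripe concurrently with a pool sized to the chip count, then reassemble."""
    def read_stripe(location):
        chip, slot = location
        return chips[chip][slot]  # stands in for an independent read on one chip

    with ThreadPoolExecutor(max_workers=len(chips)) as pool:
        # Executor.map yields results in input order, so stripes rejoin correctly.
        return b"".join(pool.map(read_stripe, layout))


if __name__ == "__main__":
    payload = b"massively parallel data path"
    chips, layout = stripe(payload, num_chips=3)
    print([len(c) for c in chips])  # [3, 2, 2]
    print(parallel_read(chips, layout) == payload)  # True
```

Here the threads only model concurrency; on real hardware, the gain comes from overlapping independent reads on many devices at once instead of funneling them through one serial path.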
CUSTOMER SPOTLIGHT: MAN AHL
Man AHL is a pioneer in the field of systematic quantitative investing. Its entire business is based on creating and
executing computer models to make investment decisions. The firm has adopted FlashBlade as its data hub, delivering
a single unified platform on which data scientists and engineers innovate with analytics and AI applications.
[Figure: Man AHL's data hub on FlashBlade, serving data warehouse, analytics, and AI workloads, including MLlib.]
“Our quants want to test a model, get the results, and then test another one, all day long. So a 10-20X improvement in performance is a game-changer when it comes to creating a time-to-market advantage for us.”
GARY COLLIER, CO-CTO
CUSTOMER SPOTLIGHT: COUNTY OF LOS ANGELES, DEPARTMENT OF PUBLIC HEALTH
Customers often start their analytics journeys by consolidating the infrastructure that holds most of their data:
backup appliances. The Public Health Department in Los Angeles County had a sprawl of silos, comprising racks of
Data Domain appliances and a DAS-centric data lake. Today, the Department is actively unifying all its data on a single
FlashBlade to accelerate backup and restore times while offering faster analytics capabilities to its data teams.
[Figure: Old architecture: racks of Data Domain and Avamar for production backup, racks of data lake for analytics, and tape archive. New architecture: production backup & analytics consolidated on FlashBlade, with tape archive.]
CONCLUSION
Data is stuck in a complex sprawl of silos – and legacy storage architectures are largely to blame. The storage industry
offers many data silos, like backup appliances, data warehouse appliances, and data lakes. Each silo is useful for its
original task, but is not built to share and deliver data with simplicity and speed.
A data hub is a new class of storage architecture designed to unify and deliver data for modern analytics and AI
workloads. It takes the key strengths of each silo, from data warehouses to AI clusters, and integrates them into a
single unified platform.
FlashBlade is the industry’s first data hub, and Purity//FB is the engine that makes this possible. To learn more about
FlashBlade and Purity//FB, please visit us at www.purestorage.com/datahub.
© 2018 Pure Storage, Inc. All rights reserved.
Pure Storage, FlashBlade, DirectFlash, and the “P” Logo are trademarks or registered trademarks of Pure Storage, Inc. in the U.S. and other countries. Other company, product, and service names may be trademarks or service marks of others.
The Pure Storage product described in this documentation is distributed under a license agreement and may be used only in accordance with the terms of the agreement. The license agreement restricts its use, copying, distribution, decompilation, and reverse engineering. No part of this documentation may be reproduced in any form by any means without prior written authorization from Pure Storage, Inc. and its licensors, if any.
THE DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. PURE STORAGE SHALL NOT BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH THE FURNISHING, PERFORMANCE, OR USE OF THIS DOCUMENTATION. THE INFORMATION CONTAINED IN THIS DOCUMENTATION IS SUBJECT TO CHANGE WITHOUT NOTICE.
ps_wp8p_purity-fb-and-data-hub_01