Intel Confidential — Do Not Forward The Importance of Fast, Scalable Storage for Today’s HPC Bill Webster High Performance Data Division, Intel Corporation

The Importance of Fast, Scalable Storage for Today’s HPC


DESCRIPTION

Today, data drives discovery, and discoveries are key to creating sustained advantages. The better your critical workflows can create and access data, the better you’ll be able to discover new, innovative solutions to important problems, or to create entirely new products. More than ever before, data-intensive applications need the sustained performance and virtually unlimited scalability that only parallel storage software delivers. Designed for maximum performance and scale, storage solutions powered by Lustre software deliver the performance at scale to meet today’s storage requirements. As the most widely used parallel storage system for HPC, Lustre-powered storage is the ideal storage foundation. But scalable, high-performance storage by itself solves only half the problem. Today’s users expect storage solutions that deliver sustained performance, scale upward to near-limitless capacities, and are simple to install and manage. Intel® Enterprise Edition for Lustre* software combines the straight-line speed and scale of Lustre with the bottom-line need for lower management complexity and cost. As the recognized leader in the development and support of the Lustre file system, Intel has the expertise to make storage solutions for data-intensive applications faster, smarter and easier.


Page 1: The Importance of Fast, Scalable Storage for Today’s HPC


The Importance of Fast, Scalable Storage for Today’s HPC

Bill Webster

High Performance Data Division, Intel Corporation

Page 2: The Importance of Fast, Scalable Storage for Today’s HPC

Some Data About Data…

2.5 quintillion bytes of data are created daily1

>80% of today’s data is unstructured

~90% of the world’s data has been created within the last 2 years

1 Source: IBM

Page 3: The Importance of Fast, Scalable Storage for Today’s HPC

The Case for Fast, Scalable Storage

Solving important problems drives technology investments

Fast storage is critical for maximum application performance

Lustre software was created for performance at large scale

Storage fueled by Lustre* is stable, flexible and highly efficient

Lustre is the most widely used parallel storage for HPC1

Over 60% of the fastest 100 HPC sites worldwide rely on Lustre2

1 Source: IDC research

2 Intel analysis of www.top500.org rankings, December 2013

* Some names and brands may be claimed as the property of others.

Page 4: The Importance of Fast, Scalable Storage for Today’s HPC

HPC Places Unique Demands on Storage

• Workloads are diverse and dynamic, and applications are compute- or data-intensive – often both

• The value of HPC storage is measured by speed, scale & IOPS

• To meet these requirements, HPC storage needs to:

  • Scale out for increased I/O and capacity

  • Perform I/O in parallel for maximum throughput

  • Support a virtually unlimited number of clients

• Commercial “HPC” needs the same level of performance

Lustre was architected for speed, scale and IOPS

Page 5: The Importance of Fast, Scalable Storage for Today’s HPC

HPC Storage Software

Introducing the Lustre file system

Page 6: The Importance of Fast, Scalable Storage for Today’s HPC


What is Lustre*?

Open source, distributed, parallel, clustered file system

Designed for maximum performance at massive scale

POSIX compliant – key for supporting applications

Global, shared name space – all clients can access all data

Very resource efficient and cost effective


Page 7: The Importance of Fast, Scalable Storage for Today’s HPC

What Makes Lustre* So Important?

Purpose-built for speed and scale:

Speed: Unmatched performance

Openness: choice of storage platforms

Efficiency: Achieves 90%+ utilization of storage resources

Affordable: Low CAPEX and OPEX

Scale-out: Independently scale storage capacity and bandwidth

Stable and reliable: Backed by Intel, the worldwide leader in Lustre support


Page 8: The Importance of Fast, Scalable Storage for Today’s HPC


Good Fit Applications for Lustre*…

Financial analysis – modeling risk exposure and portfolio valuation

Geosciences – weather forecasting and climate modeling

Bioinformatics – genomics, proteomics, drug discovery

Energy – exploration, reservoir modeling, wind energy

Engineering – CAE, CFD and FEA for aerospace, automotive

SCIENCE, ANALYTICS, ENGINEERING


Page 9: The Importance of Fast, Scalable Storage for Today’s HPC

What Does a Lustre* Solution Look Like?

Diagram components:

• Management Network

• High Performance Data Network (InfiniBand, 10GbE)

• Intel Manager for Lustre* (requires Enterprise Edition)

• Metadata Servers

• Metadata Target (MDT)

• Management Target (MGT)

• Object Storage Servers

• Object Storage Targets (OSTs)

• Lustre Clients – diskless compute servers
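To make the role of the Object Storage Targets concrete, here is a minimal sketch of how a file striped round-robin (RAID-0 style) across several targets maps a byte offset to one of them. The function name `locate_stripe`, the 1 MiB stripe size and the stripe count of 4 are illustrative assumptions rather than values from this deck, and the code is a simplified model, not Lustre's implementation.

```python
from typing import NamedTuple


class StripeLocation(NamedTuple):
    ost_index: int      # which stripe object (0 .. stripe_count - 1) holds the byte
    object_offset: int  # byte offset inside that stripe object


def locate_stripe(file_offset: int, stripe_size: int, stripe_count: int) -> StripeLocation:
    """Map a file byte offset to (stripe object, offset) under round-robin striping."""
    if file_offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; stripe size and count must be > 0")
    chunk = file_offset // stripe_size     # index of the stripe-sized chunk
    ost_index = chunk % stripe_count       # chunks rotate across the stripe objects
    full_rounds = chunk // stripe_count    # completed passes over all objects
    object_offset = full_rounds * stripe_size + file_offset % stripe_size
    return StripeLocation(ost_index, object_offset)


MIB = 1024 * 1024
# Assumed layout: 1 MiB stripes over 4 objects (hypothetical values).
for offset in (0, MIB, 4 * MIB, 5 * MIB + 10):
    print(offset, locate_stripe(offset, MIB, 4))
```

Because consecutive chunks land on different targets, a large sequential read or write can be served by several object storage servers at once, which is where the parallel throughput comes from.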


Page 10: The Importance of Fast, Scalable Storage for Today’s HPC

Management Servers


Page 11: The Importance of Fast, Scalable Storage for Today’s HPC

Storage Servers


Page 12: The Importance of Fast, Scalable Storage for Today’s HPC

Compute clients


Page 13: The Importance of Fast, Scalable Storage for Today’s HPC

Interconnect fabric


Page 14: The Importance of Fast, Scalable Storage for Today’s HPC

The Results? Fast, scalable storage & I/O


• Over 2 TB/s achieved

• 500–750 GB/s in production

• 80,000+ I/O operations per second
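Aggregate figures like these are reached by adding object storage servers, since bandwidth scales out with server count. As a rough sizing sketch, assuming ideal linear scaling and a hypothetical per-server bandwidth (neither is stated in this deck), the number of servers needed for a target throughput could be estimated like this:

```python
import math


def servers_needed(target_gb_per_s: float, per_server_gb_per_s: float) -> int:
    """Estimate server count for a target aggregate bandwidth, assuming linear scaling."""
    if target_gb_per_s <= 0 or per_server_gb_per_s <= 0:
        raise ValueError("bandwidth values must be positive")
    return math.ceil(target_gb_per_s / per_server_gb_per_s)


# 5 GB/s per object storage server is an assumed, illustrative figure.
ASSUMED_PER_SERVER_GB_PER_S = 5.0
for target in (500, 750, 2000):  # GB/s targets from the slide (2 TB/s = 2000 GB/s)
    print(target, servers_needed(target, ASSUMED_PER_SERVER_GB_PER_S))
```

Real deployments rarely scale perfectly linearly, so an estimate of this kind is best read as a lower bound on the hardware required.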


Page 15: The Importance of Fast, Scalable Storage for Today’s HPC

Intel® Lustre Solutions

Enterprise Edition for Lustre* software

Page 16: The Importance of Fast, Scalable Storage for Today’s HPC

Intel® Enterprise Edition for Lustre*

Intel® Manager for Lustre is the heart of all Intel EE for Lustre based solutions.

Page 17: The Importance of Fast, Scalable Storage for Today’s HPC


Intel® Manager for Lustre*

The ‘dashboard’ canvas displays a variety of charts that illustrate performance levels and resource utilization.

Visual system status indicator

Configure, create and optimize Lustre file systems

Intelligent, intuitive logging – understand how your storage is performing quickly and easily


Page 18: The Importance of Fast, Scalable Storage for Today’s HPC


A word about Big Data.

Page 19: The Importance of Fast, Scalable Storage for Today’s HPC

The Convergence of HPC and Big Data

• Big Data problems are getting larger

  • More compute power. More files. More capacity and data throughput

• MapReduce workloads are being added to HPC environments

  • 1 in 3 HPC sites have deployed Hadoop1

• But MapReduce workloads run differently than typical HPC applications

  • Compute nodes are diskless – no local storage

  • By default, Hadoop expects local storage within each node

• Lustre storage accelerates the value of Hadoop

  • Improves application performance

  • Boosts storage efficiency and lowers management complexity

1 Source: IDC research

Page 20: The Importance of Fast, Scalable Storage for Today’s HPC

Intel® Enterprise Edition for Lustre* software

Includes the Hadoop ‘adapter’ for Lustre

• Replacement for HDFS

• Shared, parallel storage optimizes performance

• Lowers management complexity

• Maximizes utilization of storage resources

Page 21: The Importance of Fast, Scalable Storage for Today’s HPC

Case Study: Wellcome Trust Sanger Institute

Challenge: Improved processes and lab equipment led to exponential increases in the volume of data being generated – but storage budgets were growing slowly.

Large data sets are difficult to proactively manage, and can easily overwhelm storage resources. Un-optimized storage had a direct, negative impact on application performance – slowing the time for breakthrough results.

Solution: Exploit the power and scale of HPC-class storage, powered by Lustre* software and supported by Intel.

Benefits provided:

• Openness – Broad array of storage vendors and products

• Global namespace – All clients can access all data

• Performance – Upwards of 1 TB/s

• Capacity – Virtually unlimited file system and per-file sizes

• Confidence – Backed by Intel expertise with Lustre

• 10-15 TB of processed data weekly

• Processed data is small fraction of overall storage capacity

• Stored in iRODS data warehouse

• BAM or FASTA format files

• Use pattern matching algorithms like BWA and BLAST

• Lustre offers immense scalable capacity

• Now have 8 production Lustre file systems – and are planning to add more

• Performance was main goal – but scale, flexibility, efficiency were critical
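For a sense of scale, the weekly volume of processed data quoted in this case study can be annualized with simple arithmetic. The sketch below assumes a constant weekly rate over a 52-week year, which the slide does not state; the helper name `yearly_volume_tb` is illustrative.

```python
WEEKS_PER_YEAR = 52  # assumption: constant rate across a 52-week year


def yearly_volume_tb(weekly_tb: float, weeks: int = WEEKS_PER_YEAR) -> float:
    """Annualize a weekly data volume given in terabytes."""
    return weekly_tb * weeks


low, high = yearly_volume_tb(10), yearly_volume_tb(15)
print(f"{low:.0f}-{high:.0f} TB of processed data per year")
```

Under that assumption the processed data alone grows by several hundred terabytes a year, and the slide notes that it is only a small fraction of overall capacity.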


Page 22: The Importance of Fast, Scalable Storage for Today’s HPC


Thank You.
