
Building Systems for Big Data and Big Compute

Steve Scott, Cray CTO

Smoky Mountains Conference, September 1, 2016


We’ve Been Doing “Big Data” For a Long Time

Massive Datasets

High Performance Memory, Interconnects, and Storage



Disruptive Memory Technology


● Standard DDR memory BW has not kept pace with CPUs

● HBM:
  ● ~10x higher BW, ~10x less energy/bit
  ● Costs ~2x DDR4 per bit

[Chart: Today’s DDR4 vs. Future HBM3 — bandwidth (GB/s) and pJ/bit for 4 channels of 2.4 GHz DDR4 vs. 4 stacks of gen-3 HBM on package]

May want more, smaller nodes, with better BW and capacity per op
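
The gap in that chart follows from simple arithmetic. Below is a rough back-of-the-envelope sketch: the DDR4 numbers follow from the standard (64-bit channels at 2400 MT/s), while the HBM per-stack rate is an illustrative assumption chosen to match the ~10x claim above, not a vendor spec.

```python
# Back-of-the-envelope peak bandwidth comparison.
# DDR4: 64-bit channel, 2400 MT/s (from the DDR4 standard).
# HBM: per-stack rate is an ASSUMPTION chosen to illustrate the ~10x claim.

def ddr4_bw_gbs(channels=4, transfer_rate_mts=2400, bus_bytes=8):
    """Peak DDR4 bandwidth in GB/s: channels x MT/s x bytes per transfer."""
    return channels * transfer_rate_mts * bus_bytes / 1000.0

def hbm_bw_gbs(stacks=4, per_stack_gbs=200.0):
    """Peak HBM bandwidth in GB/s, given an assumed per-stack rate."""
    return stacks * per_stack_gbs

ddr = ddr4_bw_gbs()   # ~77 GB/s for 4 channels of DDR4-2400
hbm = hbm_bw_gbs()    # ~800 GB/s under the assumed 200 GB/s per stack
print(f"DDR4 ~{ddr:.0f} GB/s, HBM ~{hbm:.0f} GB/s, ratio ~{hbm / ddr:.0f}x")
```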


Most “Big Data” Jobs Aren’t That Big


● Aggregate data is becoming very large, but most analytic jobs are modest
  ● Typical data analytics workloads: 10 GB mean, 100 GB 95th percentile
  ● Prabhat: big HPC analytics jobs are ~10x larger than that
  ● Many data analytics jobs run on a handful of cores

● Meanwhile, the APEX procurement wants multiple PB of memory!

[Illustration: 3 PB of main memory vs. a 1 TB “Big Data” job]



● I’ll interpret “Big Data” as meaning data analytics
  ● Extracting knowledge/insight from data
  ● As opposed to simulation and modeling, which generally produce data


Convergence of HPC and Big Data


What do we need to be doing in HPC that is different from what we have done in the past?


What is an optimal design for HPC?


Node Architecture A
• Dual Haswell nodes @ 2.4 GHz
• 128 GB DDR4 @ 2.66 GHz
• 12.5 GB/s/node network bandwidth

Node Architecture B
• Dual Haswell nodes @ 2.6 GHz
• 256 GB DDR4 @ 2.66 GHz
• 25 GB/s/node network bandwidth

Which of these is better? It’s not at all clear.
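
A quick sketch of why: the two designs trade clock rate against memory capacity and network bandwidth, so which is “better” depends on which ratio your workloads are limited by. The core count and per-core flop rate below are illustrative assumptions (16 cores per socket, Haswell AVX2 FMA), not the actual procurement configurations.

```python
# Ratios for the two hypothetical node designs above.
# ASSUMPTIONS for illustration: 2 sockets x 16 cores, 16 DP flops/cycle/core
# (Haswell AVX2 FMA); real core counts and prices would change the picture.

def node_ratios(ghz, dram_gb, net_gbs, sockets=2, cores=16, flops_per_cycle=16):
    peak_gflops = sockets * cores * flops_per_cycle * ghz
    return {
        "peak_gflops": round(peak_gflops),
        "dram_gb_per_tflops": round(dram_gb / (peak_gflops / 1000), 1),
        "net_bytes_per_flop": round(net_gbs / peak_gflops, 4),
    }

a = node_ratios(ghz=2.4, dram_gb=128, net_gbs=12.5)
b = node_ratios(ghz=2.6, dram_gb=256, net_gbs=25.0)
print("A:", a)  # fewer flops, and less memory and network per flop
print("B:", b)  # roughly 2x the memory and network per flop, at higher node cost
```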


Landscape of Parallel Computing Research (Berkeley – 2006/2008):
§ Map Reduce
§ N-body methods
§ Graph traversal
§ Graphical models
§ Dense and sparse linear algebra
§ Spectral methods
§ Structured and unstructured grids
§ Combinational logic
§ Dynamic programming
§ Backtrack and branch-and-bound
§ Finite-state machines

State of Big Data: Use Cases and Ogre Patterns (NIST 2014):
§ Basic statistics – simple Map Reduce implementation
§ Generalized n-body problems
§ Graph-theoretic computations
§ Linear algebraic computations
§ Optimizations – e.g., linear programming
§ Integration/machine learning
§ Alignment problems – e.g., BLAST

Data Analytics can be considered just another set of workloads in a sea of workloads.
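
As a concrete example of that overlap, the first NIST entry above, basic statistics via Map Reduce, is the same map-then-associative-reduce pattern HPC codes use for reductions. A minimal single-process sketch (a real Hadoop or Spark job distributes exactly these two steps):

```python
from functools import reduce

# Map phase: each record contributes a (count, sum) pair.
# Reduce phase: pairwise addition, which is associative and therefore
# parallelizes across nodes exactly as Hadoop/Spark would run it.
records = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]

mapped = map(lambda x: (1, x), records)
count, total = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), mapped)
print("mean =", total / count)  # 18.0
```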



Generalizations About Analytics Workloads


● Data-centric workloads
  ⇒ Larger memories and local SSDs are helpful

● Vertical data motion is important
  ● Hadoop and Spark effectively move computation to the data, do initial filtering of data locally
  ⇒ Don’t (usually) need much network bandwidth

● Notable exceptions: graph analytics and machine learning
  ● Graph analytics
    ● Can’t partition the data! So really hard to scale! (many get discouraged)
    ● Wants a network that can do fine-grained RDMA well (similar to some HPC)
  ● Machine learning
    ● Training can be parallelized, can use lots of data, and requires global communication (see the sketch below)
    ● Wants a very high performance network and memory system
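
The global communication in training is typically an allreduce over gradients: each worker computes a gradient on its shard of the data, and the per-step sum across all workers is exactly the kind of collective a strong HPC network accelerates. A minimal data-parallel sketch using mpi4py and NumPy (the model and gradient here are stand-ins, not a real training setup):

```python
# Data-parallel training step: every rank computes a gradient on its own data
# shard, then a single allreduce sums (and here averages) gradients globally.
# Run with, e.g.: mpirun -n 4 python train_step.py  (mpi4py + NumPy assumed)
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
size = comm.Get_size()

weights = np.zeros(1000)              # toy model: 1000 parameters
local_grad = np.random.rand(1000)     # stand-in for this rank's gradient

# The global communication step: one allreduce per training iteration.
global_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
global_grad /= size

weights -= 0.01 * global_grad         # identical SGD update on every rank
```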


Merging of HPC and Data Analytics


● Urika-GD: custom graph analytics engine
● Urika-XA: Hadoop, Spark, NoSQL
● Urika-GX (“Athena”): integrated system with the Cray Graph Engine, an open analytics framework, and the Aries network (Hadoop/Spark + graph analytics + HPC)
● XC40 (“Minerva”): the world’s leading supercomputer, targeting HPC + analytics workflows

Why combine HPC and analytics solutions in a single box? HPC underneath the covers.


Building an Analytics Machine


● Urika-GX approach:
  ● 48 Haswell nodes per cabinet
  ● Aries network
  ● Up to 512 GB DRAM per node
  ● Dual SATA HDDs per node
  ● Up to 4 TB SSD per node

● XC40 approach:
  ● 192 Haswell nodes per cabinet
  ● Aries network
  ● Up to 256 GB DRAM per node
  ● DataWarp 12 TB SSD blades, which can be dynamically shared across the system

But we still need to address the Lustre metadata bottleneck for codes that do lots of “local” file I/O.


Using Shifter to Accelerate Per-Node I/O


• Demonstrated > 100x speedup vs. straight Lustre on IOPS benchmark at 256 nodes

• Demonstrated Spark scaling to 50,000 cores in CUG 2016 paper

“NAS storage surprisingly close to local SSDs”

https://cug.org/proceedings/cug2016_proceedings/includes/files/pap125.pdf
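
Much of that speedup comes from pointing Spark’s node-local scratch space (shuffle spills, block-manager files) at per-node storage instead of the shared Lustre namespace, so the metadata-heavy small-file I/O never reaches the Lustre metadata server. A hedged sketch of the relevant Spark setting; the mount point is a placeholder for whatever per-node path Shifter or a local SSD provides on a given system:

```python
# Point Spark's node-local scratch space at per-node storage rather than the
# shared Lustre namespace. "/tmp/pernode_cache" is a PLACEHOLDER for whatever
# per-node mount Shifter or a local SSD provides on a given system.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("shuffle-on-node-local-storage")
        .set("spark.local.dir", "/tmp/pernode_cache"))  # shuffle spills, block files

sc = SparkContext(conf=conf)

# Shuffle-heavy work now does its small-file I/O against node-local storage,
# so it never touches the Lustre metadata server.
counts = (sc.parallelize(range(10**6))
            .map(lambda x: (x % 100, 1))
            .reduceByKey(lambda a, b: a + b))
print(counts.count())  # 100 distinct keys
```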


Resource Management and Scheduling


Picture from Malte Schwarzkopf’s blog: http://www.firmament.io/blog/scheduler-architectures.html

● Analytics workloads can have very different scheduling needs than HPC workloads
  ● May want very fine-grained scheduling (cores, not nodes)
  ● May have long-running services processing streaming data
  ● May need to dynamically expand/contract
  ● May be tied to real-time events such as experimental control or output processing
  ● May be interactive/bursty (database utilization depends on queries)


Other Analytics Implications (mostly SW)


● Greater diversity of programming languages & environments
  ● Python, R, Julia, Spark, Scala, ML frameworks, etc.
  ● MPI + OpenMP is a foreign concept to the analytics community
  ● Openness and container support are important

● Cloud interoperability
  ● E.g.: source data from cloud ➝ compute/analyze ➝ store data back in cloud

● Data movement between apps
  ● HPC tends to focus on accelerating single applications
  ● Analytics workloads usually involve pipelines
  ● Shared data formats can allow data exchange in memory
    ● E.g.: Arrow in-memory data structure specification for columnar data (see the sketch below)
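
For instance, an upstream Python stage can hand a columnar table to a downstream consumer through Arrow’s language-independent in-memory format instead of serializing rows to CSV on disk. A minimal sketch using pyarrow (the file name and columns are made up for illustration):

```python
# Producer: build an Arrow table and write it in Arrow's IPC file format.
# A consumer in another language (Java/Spark, C++, R, ...) can map the same
# bytes without a row-by-row conversion. File name and columns are made up.
import pyarrow as pa

table = pa.table({
    "event_id": [1, 2, 3],
    "energy":   [13.2, 7.8, 22.1],
})

with pa.OSFile("pipeline_stage1.arrow", "wb") as sink:
    writer = pa.ipc.new_file(sink, table.schema)
    writer.write_table(table)
    writer.close()

# Consumer side (could be a separate process or tool): zero-copy read.
with pa.memory_map("pipeline_stage1.arrow") as source:
    loaded = pa.ipc.open_file(source).read_all()
print(loaded.column("energy"))
```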


Takeaways


● Strong motivation for HPC + Big Data in a single system
  ● Growing desire for HPC + analytic workflows
  ● More efficient when data can be transferred in memory/SSD
  ● Utilization is better with systems that can be dynamically provisioned

● Big Data is just another set of workloads
  ● Not that different (we already build machines to handle big data)
  ● On average, probably want more memory per node for analytics
  ● Some workloads don’t need much network, but others need a strong network
  ● May argue for heterogeneous systems (we already do that for HPC)

● Biggest issue may be resource management/scheduling
  ● A few other software issues, but no show stoppers for converged systems


Thank You! Questions?