Boost Your Productivity with GPGPUs and IBM Platform Computing Software. NVIDIA GTC 2013, Chris Porter, IBM, March 2013. © 2012 IBM Corporation

Page 1

Platform Computing

Boost Your Productivity with GPGPUs and IBM Platform Computing Software

NVIDIA GTC 2013

Chris Porter, IBM

March 2013

© 2012 IBM Corporation

Page 2

Agenda

• IBM Platform Computing offerings
• GPGPU Adoption in the HPC Market
• GPGPU Scheduling & Management
  - IBM Platform Computing Solutions for GPGPUs
  - Benefits from Intelligent GPU Scheduling & Management
  - Use Case Examples
• Summary

Page 3

IBM PLATFORM COMPUTING OFFERINGS

Page 4

IBM Platform Computing: the leader in cluster, grid and HPC cloud management software

• Acquired by IBM in 2012 as part of its mainstream Technical Computing strategy
• 20-year history delivering leading workload and resource management software for technical computing and big data/analytics environments
• 2,000+ global customers, including 23 of the 30 largest enterprises
• Market-leading scheduling engine with high performance, mission-critical reliability and extreme scalability
• Comprehensive capability, from ready-to-deploy complete cluster systems to large global grids to HPC clouds
• Large ISV and global partner ecosystem
• Global services and support coverage

De facto standard for commercial HPC
60% of top financial services firms
Over 5 million CPUs under management

Page 5

IBM Platform Computing offerings

Platform LSF Family (Workload Management)
Scalable, comprehensive workload management suite for heterogeneous compute environments
• Unmatched experience through market share
• Powerful multi-policy scheduling engine
• Unmatched scalability through high-end accounts
• Unmatched breadth of offering due to the extent of add-ons

Platform HPC for System x (Workload Management)
Simplified, integrated, purpose-built HPC management software integrated with systems
• All-in-one integrated solution with a leading web interface
• Applicable to the smallest of clusters
• Leverages the Platform LSF technology base
• Hardware bundled for turnkey purchasing and deployment

Platform Symphony Family (Workload Management, Analytics)
High-throughput, low-latency compute and data-intensive analytics applications
• Leading experience, with 50%+ of major investment banks as customers (translates to other industries)
• High scalability and better application performance due to fast, low-latency (sub-millisecond) processing
• Proven business model for sharing grid infrastructure
• Supports both compute- and data-intensive applications

Platform Cluster Manager (Infrastructure, Flexible Clusters)
Provisioning and management of HPC clusters
• Hardware bundled for turnkey purchasing and deployment
• Scalable offerings that simplify deploying and managing everything from small clusters to global HPC clouds
• Broad heterogeneous support enables managing diverse technologies and multiple workload managers
• Enables multi-tenant HPC clouds

Page 6

The Application Accelerator Storm

• GPU adoption is increasing
  – 53 systems on the Top500 list released in November 2012 use GPGPUs
  – GPGPUs are penetrating both high-end and mainstream HPC
• NVIDIA is leading the accelerator race
  – Hundreds of thousands of trained CUDA developers worldwide
  – 50 systems on the latest Top500 list are powered by NVIDIA
• Other accelerator technologies are emerging
  – Intel: Xeon Phi coprocessor
  – AMD: FireStream

Page 7

GPGPU ADOPTION IN THE HPC MARKET

Page 8

Market Landscape: Technical Applications Are Exploding

Figure: adoption drivers (creativity, productivity, quality) and technical application areas (CAE, EDA, financial, geoscience, government & education, life sciences, technical processing, visualization)

Page 9

The Big Buzz in HPC: Hybrid Computing

• Hybrid computing: CPUs and GPUs working together
• Applications taking advantage of GPUs
  – Life Sciences: Unipro UGENE, Agile Molecule, many others
  – Financial Services: Volmaster FX, ClusterTech Financial Library, many others
  – Manufacturing: Fidesys, Ansys, 3ds, many others
  – Oil and Gas: Acceleware Seismic Solvers, many others

When do I use them? What is the ROI? How do I schedule jobs to them? How do I maximize utilization? Various published benchmarks show dramatic performance increases.

Page 10

GPU SCHEDULING & MANAGEMENT

Page 11

What do Intelligent GPU Scheduling and Management Bring to You?

• Improved application performance by allocating GPU-suitable workloads to GPU resources, freeing up CPUs for other types of workloads.
• Reduced infrastructure cost by maximizing cluster utilization.
• Simplified system management via an easy-to-use GUI and timely alerts.
• Increased productivity for administrators and application developers.

Intelligent scheduling improves cluster efficiency

Page 12

GPUs: Schedule, Monitor & Manage

• DEPLOY: Quickly deploy workload to GPU resources
  – Easy job submission to GPUs in a cluster via CUDA job submission wrappers
  – Install CUDA across a cluster in a couple of clicks
• MANAGE: Easily manage heterogeneous clusters
  – Deploy & manage both CPU & GPU resources in the same cluster
  – Remotely manage & view the status of your jobs

Take immediate advantage of the exceptional HPC performance provided by GPUs
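The slides say jobs reach GPUs through CUDA job submission wrappers but do not describe how those wrappers work. As a hypothetical sketch only, one common pattern is for a wrapper to receive the GPU indices the scheduler assigned and expose only those devices to the job through the standard `CUDA_VISIBLE_DEVICES` environment variable. The names `gpu_env` and `run_on_gpus` are illustrative, and how the wrapper learns its assigned GPUs is left to the caller as an assumption.

```python
import os
import subprocess
from typing import Dict, List, Sequence


def gpu_env(assigned_gpus: Sequence[int], base: Dict[str, str]) -> Dict[str, str]:
    """Copy `base` and restrict CUDA_VISIBLE_DEVICES to the GPUs assigned to this job."""
    env = dict(base)
    # The CUDA runtime enumerates only the listed devices, renumbered from 0.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(gpu) for gpu in assigned_gpus)
    return env


def run_on_gpus(command: List[str], assigned_gpus: Sequence[int]) -> int:
    """Launch `command` so it sees only its assigned GPUs; return its exit code."""
    completed = subprocess.run(command, env=gpu_env(assigned_gpus, dict(os.environ)))
    return completed.returncode


# Usage: a job that the scheduler (hypothetically) placed on GPUs 1 and 3.
print(gpu_env([1, 3], {"PATH": "/usr/bin"})["CUDA_VISIBLE_DEVICES"])  # 1,3
```

Building a fresh environment rather than mutating `os.environ` keeps the wrapper process itself unaffected, so it could launch several jobs with different GPU sets.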

Page 13

GPUs: Schedule, Monitor & Manage

• MONITOR: Monitor GPU metrics

– GPU slot utilization, temperature & status

– Detect ECC error accumulation
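The slide lists ECC error accumulation as a monitored condition but not how it is detected. A minimal sketch, assuming the monitor keeps the previous sample of each GPU's cumulative ECC error count (for example, as read from `nvidia-smi`), is to flag every GPU whose count has grown since then. The function name and the dictionary input format are hypothetical.

```python
from typing import Dict, List


def gpus_with_new_ecc_errors(previous: Dict[int, int], current: Dict[int, int]) -> List[int]:
    """Return the indices of GPUs whose cumulative ECC error count grew since the last sample.

    Both arguments map a GPU index to its cumulative ECC error count (hypothetical format).
    """
    flagged = []
    for gpu, count in sorted(current.items()):
        # A GPU missing from the previous sample is treated as having had zero errors.
        if count > previous.get(gpu, 0):
            flagged.append(gpu)
    return flagged


# Usage: GPU 1 went from 3 to 5 errors between two samples; GPU 0 stayed clean.
print(gpus_with_new_ecc_errors({0: 0, 1: 3}, {0: 0, 1: 5}))  # [1]
```

Comparing successive samples, rather than alerting on any nonzero total, separates ongoing accumulation from an old historical count; a counter reset (for example after a reboot) shows up as a decrease and is not flagged.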


Page 14

Scheduling to GPGPUs Today

• Managing the latest GPGPUs and CUDA (v5.0) applications using:
  – IBM Platform LSF
  – IBM Platform HPC
  – IBM Platform Symphony
• GPU ELIM (external load information manager) provides:
  – Monitoring and detection of GPUs
  – Grouping of hosts with GPU(s) into a resource group
  – User-configured compute slots on these hosts
• GPU-enablement is the responsibility of the application developer

Figure: on each compute host in resource group RG_GPU, the GPU ELIM runs alongside the LIM and passes information on the host's GPU(s) to the LSF scheduler.
GPU management: number of GPUs; number of GPUs in "normal", "exclusive" and "prohibited" mode
GPU monitoring: mode (normal, exclusive, prohibited), temperature, ECC error count
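To make the ELIM's reporting role concrete, here is a hypothetical sketch of the step that turns GPU state into a load report. It assumes the standard LSF ELIM convention of writing one line per sampling interval to standard output: the number of load indices, followed by name/value pairs. It also assumes the GPU data comes from `nvidia-smi --query-gpu=compute_mode,temperature.gpu,ecc.errors.uncorrected.volatile.total --format=csv,noheader,nounits`. The index names (`ngpus`, `ngpus_normal`, and so on) are illustrative, not the names IBM's GPU ELIM uses; a real ELIM would also need them declared as resources in the cluster configuration and would loop, sampling periodically.

```python
from typing import Dict

# Hypothetical LSF resource names for the per-mode GPU counts shown on the slide.
MODE_TO_INDEX: Dict[str, str] = {
    "Default": "ngpus_normal",
    "Exclusive_Thread": "ngpus_excl",
    "Exclusive_Process": "ngpus_excl",
    "Prohibited": "ngpus_prohibited",
}


def elim_report(smi_csv: str) -> str:
    """Turn nvidia-smi CSV rows ("compute_mode, temperature, ecc_count") into one ELIM line."""
    counts = {"ngpus": 0, "ngpus_normal": 0, "ngpus_excl": 0, "ngpus_prohibited": 0}
    max_temp = 0
    ecc_total = 0
    for row in smi_csv.strip().splitlines():
        mode, temp, ecc = (field.strip() for field in row.split(","))
        counts["ngpus"] += 1
        index = MODE_TO_INDEX.get(mode)
        if index is not None:  # unknown modes are counted only in ngpus
            counts[index] += 1
        max_temp = max(max_temp, int(temp))
        # GPUs without ECC reporting print a placeholder such as "[N/A]"; count it as zero.
        ecc_total += int(ecc) if ecc.isdigit() else 0
    indices = dict(counts, gpu_maxtemp=max_temp, gpu_ecc=ecc_total)
    pairs = " ".join(f"{name} {value}" for name, value in indices.items())
    # ELIM report: number of indices, then name/value pairs.
    return f"{len(indices)} {pairs}"


sample = """Default, 45, 0
Exclusive_Process, 61, 2"""
print(elim_report(sample))
# 6 ngpus 2 ngpus_normal 1 ngpus_excl 1 ngpus_prohibited 0 gpu_maxtemp 61 gpu_ecc 2
```

Parsing a string rather than calling `nvidia-smi` directly keeps the formatting logic testable on hosts without GPUs; the sampling loop around it is the part a real ELIM would add.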

Page 15

USE CASES

Page 16

Use Case #1: Simple Use Case

• NVIDIA GPGPUs only
• CUDA 5.0 and older
• Simple monitoring statistics

Figure: jobs submitted to an LSF cluster whose GPU hosts report through the ELIM

Page 17

Use Case #2: Complex Use Case

• Multiple GPGPUs / accelerators, OR
• Use of CUDA features newer than 3.2, OR
• Monitoring of memory and GPU core utilization

Figure: jobs submitted to an LSF cluster

Page 18

Use Case #3: NUMA optimization within a single server

Figure: a server with two CPUs, each with its own memory, and three GPUs attached over PCI Express links of different widths (one x16, two x8), so GPU bandwidth differs by CPU socket (asymmetric bandwidth)

Page 19

Use Case #3: NUMA Optimization within a single server

Asymmetric bandwidth requires:

– LSF: schedule non-GPU jobs to hosts without GPUs first
– LSF: schedule non-GPU jobs to cores with low GPU bandwidth
– LSF: schedule GPU jobs to cores with maximum GPU bandwidth
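The three placement rules above amount to an ordering over free cores. The sketch below is a hypothetical greedy illustration, not the LSF scheduling plugin: it assumes each core carries a `gpu_bandwidth` value (here, the PCIe lane width between its socket and the nearest GPU, 0 on GPU-less hosts), and the `Core` type and `pick_cores` function are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Core:
    host: str
    core_id: int
    host_has_gpu: bool
    # Hypothetical metric: PCIe lanes from this core's socket to its nearest GPU (0 if none).
    gpu_bandwidth: int


def pick_cores(free: List[Core], n: int, gpu_job: bool) -> List[Core]:
    """Pick n free cores following the three placement rules on the slide."""
    if gpu_job:
        # GPU jobs: only GPU hosts qualify; take the cores with the most GPU bandwidth first.
        candidates = sorted(
            (c for c in free if c.host_has_gpu), key=lambda c: -c.gpu_bandwidth
        )
    else:
        # Non-GPU jobs: hosts without GPUs first (False sorts before True),
        # then the cores with the least GPU bandwidth.
        candidates = sorted(free, key=lambda c: (c.host_has_gpu, c.gpu_bandwidth))
    if len(candidates) < n:
        raise ValueError(f"only {len(candidates)} suitable free cores, {n} requested")
    return candidates[:n]


free = [
    Core("cpu01", 0, host_has_gpu=False, gpu_bandwidth=0),
    Core("gpu01", 0, host_has_gpu=True, gpu_bandwidth=16),  # socket on the x16 link
    Core("gpu01", 1, host_has_gpu=True, gpu_bandwidth=8),   # socket on an x8 link
]
print([(c.host, c.core_id) for c in pick_cores(free, 1, gpu_job=True)])   # [('gpu01', 0)]
print([(c.host, c.core_id) for c in pick_cores(free, 2, gpu_job=False)])  # [('cpu01', 0), ('gpu01', 1)]
```

The CPU-only job spills onto the x8 core rather than the x16 one, leaving the best-connected core free for the next GPU job; a production scheduler would also weigh load, memory locality and fairness.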


Page 20

Use Case #4: NUMA optimization for multi-server MPI jobs

MPI job optimization
– MPI selects optimal cores for the MPI processes of multi-host jobs

Figure legend: GPU MPI job, CPU-only MPI job, GPU serial jobs

Page 21

Use Case #4: NUMA Optimization for multi-server MPI jobs

MPI-based multi-server GPU and non-GPU jobs
– LSF, single server: LSF scheduling plugin controls core placement
– LSF, multiple servers: LSF scheduling plugin does nothing
– MPI, single server: MPI scheduler does nothing
– MPI, multiple servers: MPI scheduler controls core placement

Page 22

SUMMARY

Page 23

Reality and Conclusions

Don’t have applications developed for GPUs?
• Many ISVs are working hard to adopt CUDA and/or OpenCL for their applications

IBM Platform Computing has solutions available for:
• Managing both GPU and CPU resources in a cluster
• Monitoring & visualizing important parameters for GPUs
• Scheduling serial jobs to available and functional GPUs
• Scheduling parallel jobs to available and functional GPUs
• Scheduling & optimizing mixed-mode serial and parallel workloads

Page 24

QUESTIONS?