Is remote GPU virtualization useful? Federico Silla Technical University of Valencia Spain


Page 1: Is remote GPU virtualization useful? Federico Silla, Technical University of Valencia, Spain. HPC Advisory Council Spain

Is remote GPU virtualization useful?

Federico Silla

Technical University of Valencia

Spain


HPC Advisory Council Spain Conference 2015 2/57

Outline

What is “remote GPU virtualization”?


We deal with GPUs, obviously!


Basics of GPU computing

Basic behavior of CUDA


Remote GPU virtualization

A software technology that enables a more flexible use of GPUs in computing facilities

No GPU


Basics of remote GPU virtualization



Remote GPU virtualization vision

Remote GPU virtualization allows a new vision of a GPU deployment, moving from the usual cluster configuration:

[Figure: physical configuration. Each node has a CPU, main memory, a network interface, and GPUs (each with its own GPU memory) attached through PCI-e; the nodes are joined by the interconnection network.]

to the following one:

[Figure: logical configuration. The nodes reach the GPUs through logical connections over the interconnection network.]


Outline

Why is “remote GPU virtualization” needed?


Outline

What is the problem with GPU-enabled clusters?


Characteristics of GPU-based clusters

A GPU-enabled cluster is a set of independent, self-contained nodes that follow the shared-nothing approach:

• Nothing is directly shared among nodes (MPI is required for aggregating computing resources within the cluster, including GPUs)

• GPUs can only be used within the node they are attached to

[Figure: six nodes, each with a CPU, main memory, a network interface, and two GPUs (each with its own GPU memory) attached through PCI-e, joined by the interconnection network.]


First concern with accelerated clusters

• Applications can only use the GPUs located within their node:

  • Non-accelerated applications keep GPUs idle in the nodes where they use all the cores

[Figure: the six-node cluster, with four nodes highlighted.]

A CPU-only application spreading over these four nodes would make their GPUs unavailable for accelerated applications
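The first concern can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: the six nodes with two GPUs each follow the figure, while the 12 cores per node and the helper names (`Node`, `usable_gpus_local_only`, `usable_gpus_remote`) are hypothetical. It also ignores the CPU time that a GPU server process would need on the node hosting the GPU.

```python
from dataclasses import dataclass


@dataclass
class Node:
    gpus: int        # GPUs physically installed in the node
    free_cores: int  # CPU cores not yet taken by any job


def usable_gpus_local_only(nodes):
    """Shared-nothing model: a GPU is usable only by a job that also
    obtains a CPU core on the node the GPU is attached to."""
    return sum(n.gpus for n in nodes if n.free_cores > 0)


def usable_gpus_remote(nodes):
    """Remote GPU virtualization model (simplified): a GPU is usable
    as long as some core is free anywhere in the cluster."""
    any_free_core = any(n.free_cores > 0 for n in nodes)
    return sum(n.gpus for n in nodes) if any_free_core else 0


# Six nodes with two GPUs each (as in the figure); 12 cores per node is an assumption.
cluster = [Node(gpus=2, free_cores=12) for _ in range(6)]

# A CPU-only application takes every core of four nodes.
for node in cluster[:4]:
    node.free_cores = 0

local_only = usable_gpus_local_only(cluster)
with_remote = usable_gpus_remote(cluster)
print(f"usable GPUs without virtualization: {local_only}")
print(f"usable GPUs with remote virtualization: {with_remote}")
```

In this model, 8 of the 12 GPUs are blocked by the CPU-only application when GPUs can only be used locally, whereas all 12 stay reachable once GPUs can be used from other nodes.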


Money leakage in current clusters?

For some workloads, GPUs may be idle for significant periods of time:

• Initial acquisition costs not amortized

• Space: GPUs reduce CPU density

• Energy: idle GPUs keep consuming power

[Figure: idle power (Watts) over time (s) for a 1 GPU node and a 4 GPUs node; the plot is annotated with 25%.]

• 1 GPU node: Two E5-2620V2 sockets and 32GB DDR3 RAM. One Tesla K20 GPU

• 4 GPUs node: Two E5-2620V2 sockets and 128GB DDR3 RAM. Four Tesla K20 GPUs


Second concern with accelerated clusters

• Applications can only use the GPUs located within their node:

  • Non-MPI multi-GPU applications running on a subset of nodes cannot make use of the tremendous GPU resources available at other cluster nodes (even if they are idle)

[Figure: the six-node cluster, with one node running a non-MPI multi-GPU application.]

None of the GPUs in the other nodes can be used by the multi-GPU application being executed


One more concern with accelerated clusters

• Do applications completely squeeze the GPUs available in the cluster?

  • When a GPU is assigned to an application, computational resources inside the GPU may not be fully used:

    • Application presenting low level of parallelism

    • CPU code being executed (GPU assigned ≠ GPU working)

    • GPU-core stall due to lack of data

    • etc.

[Figure: the six-node cluster.]


Why is performance lost in GPU clusters?

In summary …

• There are scenarios where GPUs are available but cannot be used

• Accelerated applications do not make use of GPUs 100% of the time

In conclusion …

• GPU cycles are lost, thus reducing cluster performance


We need something more in the cluster

The current model for using GPUs is too rigid

What is missing is ... some flexibility for using the GPUs in the cluster

A way of seamlessly sharing GPUs across nodes in the cluster (remote GPU virtualization)


Remote GPU virtualization vision

[Figure: the same cluster without GPU virtualization and with GPU virtualization. Legend: real local GPUs; virtualized remote GPUs.]

GPU virtualization allows all nodes to access all GPUs


Remote GPU virtualization frameworks

Several efforts have been made to implement remote GPU virtualization in recent years:

Publicly available:

• rCUDA (CUDA 7.0)

• GVirtuS (CUDA 3.2)

• DS-CUDA (CUDA 4.1)

NOT publicly available:

• vCUDA (CUDA 1.1)

• GViM (CUDA 1.1)

• GridCUDA (CUDA 2.3)

• V-GPU (CUDA 4.0)


Remote GPU virtualization frameworks

[Figure: four panels comparing the frameworks: H2D pageable, D2H pageable, H2D pinned, D2H pinned. Test bed: FDR InfiniBand + K20.]


Outline

Cons of “remote GPU virtualization”?


Problem with remote GPU virtualization

The main drawback of remote GPU virtualization is the reduced bandwidth to the remote GPU

No GPU


Using InfiniBand networks


Initial transfers within rCUDA

[Figure: four panels (H2D pageable, D2H pageable, H2D pinned, D2H pinned). Series: rCUDA EDR Orig, rCUDA EDR Opt, rCUDA FDR Orig, rCUDA FDR Opt.]


Optimized transfers within rCUDA

[Figure: four panels (H2D pageable, D2H pageable, H2D pinned, D2H pinned). Series: rCUDA EDR Orig, rCUDA EDR Opt, rCUDA FDR Orig, rCUDA FDR Opt. The H2D pinned and D2H pinned panels are both annotated "Almost 100% of available BW".]
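To put "available BW" in context, the sketch below estimates the nominal peak payload bandwidth of the links involved. None of these figures come from the slides: the per-lane signalling rates and line encodings are the commonly published values for 4x FDR and 4x EDR InfiniBand and for PCI-e Gen2 and Gen3 x16, and pairing the K20 with PCI-e Gen2 and the K40 with PCI-e Gen3 is an assumption. Protocol overheads are ignored, so real transfers achieve less.

```python
def link_gbytes_per_s(gbaud_per_lane, lanes, payload_bits, coded_bits):
    """Nominal peak payload bandwidth of a serial link in GB/s (10**9 bytes/s).

    gbaud_per_lane: signalling rate of one lane, in Gbit/s on the wire
    payload_bits / coded_bits: line-encoding efficiency (for example 64/66)
    """
    return gbaud_per_lane * lanes * payload_bits / coded_bits / 8


# Nominal values (assumptions, not measurements from the talk).
fdr_4x = link_gbytes_per_s(14.0625, 4, 64, 66)      # InfiniBand FDR, 4 lanes
edr_4x = link_gbytes_per_s(25.78125, 4, 64, 66)     # InfiniBand EDR, 4 lanes
pcie2_x16 = link_gbytes_per_s(5.0, 16, 8, 10)       # PCI-e Gen2 x16 (assumed for K20)
pcie3_x16 = link_gbytes_per_s(8.0, 16, 128, 130)    # PCI-e Gen3 x16 (assumed for K40)

for name, value in [("FDR 4x", fdr_4x), ("EDR 4x", edr_4x),
                    ("PCI-e Gen2 x16", pcie2_x16), ("PCI-e Gen3 x16", pcie3_x16)]:
    print(f"{name:15s} {value:6.2f} GB/s")

print(f"FDR / PCI-e Gen2: {fdr_4x / pcie2_x16:.2f}")
print(f"EDR / PCI-e Gen3: {edr_4x / pcie3_x16:.2f}")
```

Under these assumptions the network link is somewhat narrower than the local PCI-e link in both pairings, which is one way to read the "reduced bandwidth" drawback; the plots report how close rCUDA gets to the network's share.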


rCUDA optimizations on applications

• Several applications executed with CUDA and rCUDA

  • K20 GPU and FDR InfiniBand

  • K40 GPU and EDR InfiniBand

[Figure: results per application. Lower is better.]


Outline

Pros of “remote GPU virtualization”?


1: more GPUs for a single application

GPU virtualization is useful for multi-GPU applications

• Without GPU virtualization: only the GPUs in the node can be provided to the application

• With GPU virtualization: many GPUs in the cluster can be provided to the application

[Figure: the cluster without GPU virtualization (GPUs attached to each node through PCI-e) and with GPU virtualization (GPUs reached through logical connections over the interconnection network).]


1: more GPUs for a single application

64 GPUs!


1: more GPUs for a single application

MonteCarlo Multi-GPU (from NVIDIA samples)

[Figure: two plots, one labeled "Lower is better" and one labeled "Higher is better". Test bed: FDR InfiniBand + NVIDIA Tesla K20.]


2: busy CPU cores do not block GPUs

[Figure: physical configuration (GPUs attached to the nodes through PCI-e) and logical configuration (GPUs reached through logical connections over the interconnection network).]


3: easier cluster upgrade

No GPU

• A cluster without GPUs may be easily upgraded to use GPUs with rCUDA


GPU-enabled


4: GPU task migration

• Box A has 4 GPUs but only one is busy

• Box B has 8 GPUs but only two are busy

1. Move jobs from Box B to Box A and switch off Box B

2. Migration should be transparent to applications (decided by the global scheduler)

TRUE GREEN GPU COMPUTING
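The consolidation step in the Box A / Box B example can be sketched as follows. This is a minimal illustration, not how rCUDA or any scheduler implements migration: the `Box` class and the `consolidate` function are hypothetical names, and moving a job is reduced to updating counters.

```python
from dataclasses import dataclass


@dataclass
class Box:
    name: str
    total_gpus: int
    busy_gpus: int
    powered_on: bool = True

    @property
    def free_gpus(self):
        return self.total_gpus - self.busy_gpus


def consolidate(source, target):
    """Move every GPU job from source to target if they all fit,
    then switch source off. Returns True when the move happened."""
    if source.busy_gpus > target.free_gpus:
        return False  # not enough idle GPUs in target; leave both boxes as they are
    target.busy_gpus += source.busy_gpus
    source.busy_gpus = 0
    source.powered_on = False
    return True


# Numbers from the slide: Box A has 4 GPUs (1 busy), Box B has 8 GPUs (2 busy).
box_a = Box("A", total_gpus=4, busy_gpus=1)
box_b = Box("B", total_gpus=8, busy_gpus=2)

migrated = consolidate(source=box_b, target=box_a)
print(migrated, box_a.busy_gpus, box_a.free_gpus, box_b.powered_on)
```

After the call, Box A runs three jobs on its four GPUs and Box B, the larger box, no longer needs to stay powered on.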


5: virtual machines can easily access GPUs

• The GPU is assigned by using PCI passthrough exclusively to a single virtual machine

• Concurrent usage of the GPU is not possible


[Figure: two scenarios: high performance network available; low performance network available.]


Application performance with KVM

[Figure: four panels: LAMMPS, CUDA-MEME, CUDASW++, GPU-BLAST. Test bed: FDR InfiniBand + K20.]


Application performance with Xen

[Figure: four panels: LAMMPS, CUDA-MEME, CUDASW++, GPU-BLAST. Test bed: FDR InfiniBand + K20.]


Overhead of rCUDA within VMs

[Figure: two panels, KVM and Xen. Test bed: FDR InfiniBand + K20. Overhead annotations, in extraction order: 1.6%, 2.5%, 0.5%, 0.07%, 6.1%, 2.9%, 0.7%, 2.3%.]


Outline

What happens at the cluster level?


rCUDA at cluster level … Slurm

• GPUs can be shared among jobs running in remote clients

  • Job scheduler required for coordination

  • Slurm was selected

[Figure: nine applications, App 1 to App 9.]
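The need for coordination can be pictured with a toy allocator that hands out GPUs from a cluster-wide pool instead of per node. This is a hypothetical sketch, not Slurm's algorithm or the rCUDA integration: the `allocate` function is invented for illustration, each GPU is given to one job at a time (the slide notes that GPUs can also be shared among jobs), and the one-GPU-per-node layout and the 1-GPU and 4-GPU requests are assumptions in the spirit of the test applications listed later.

```python
def allocate(jobs, gpus_per_node, nodes):
    """Greedily give each job its requested number of GPUs from a
    cluster-wide pool, regardless of which node hosts each GPU.

    jobs: list of (job_name, gpus_wanted) in submission order
    Returns (placement, pending): placement maps a job name to a list of
    (node, gpu_index) pairs; pending lists jobs that did not fit.
    """
    pool = [(n, g) for n in range(nodes) for g in range(gpus_per_node)]
    placement, pending = {}, []
    for name, wanted in jobs:
        if wanted <= len(pool):
            placement[name] = [pool.pop(0) for _ in range(wanted)]
        else:
            pending.append(name)  # wait until GPUs are released
    return placement, pending


# Eight nodes with one GPU each (assumption); requests of 1 or 4 GPUs.
jobs = [("App 1", 1), ("App 2", 4), ("App 3", 1), ("App 4", 4)]
placement, pending = allocate(jobs, gpus_per_node=1, nodes=8)

for name, gpus in placement.items():
    print(name, gpus)
print("pending:", pending)
```

In this run the 4-GPU job "App 2" is served with GPUs from four different nodes, which a purely node-local allocation could not do when each node holds a single GPU.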


Ongoing work: studying rCUDA+Slurm

Applications used for tests:

• GPU-Blast (21 seconds; 1 GPU; 1599 MB)

• LAMMPS (15 seconds; 4 GPUs; 876 MB)

• MCUDA-MEME (165 seconds; 4 GPUs; 151 MB)

• GROMACS (167 seconds)

• NAMD (11 minutes)

• BarraCUDA (10 minutes; 1 GPU; 3319 MB)

• GPU-LIBSVM (5 minutes; 1 GPU; 145 MB)

• MUMmerGPU (5 minutes; 1 GPU; 2804 MB)

[Labels on the list: Non-GPU; Short execution time; Long execution time; Set 1; Set 2.]

Three workloads:

• Set 1

• Set 2

• Set 1 + Set 2

Three workload sizes:

• Small (100 jobs)

• Medium (200 jobs)

• Large (400 jobs)


Test bench for studying rCUDA+Slurm

• Dual socket E5-2620v2 Intel Xeon + 32GB RAM + K20 GPU

• FDR InfiniBand based cluster

[Figure: three configurations, each with a node with the Slurm scheduler: 4 GPU nodes, 8 GPU nodes, and 16 GPU nodes.]


Results for Set 1

[Figure: results for execution time with 4, 8, and 16 nodes. Lower is better.]


Results for Set 1

[Figure: GPU utilization with 4, 8, and 16 nodes. Plot labels: lower is better; higher is better.]


Results for Set 1

[Figure: results for energy consumption with 4, 8, and 16 nodes. Lower is better.]


Reducing the number of GPUs

[Figure: annotations 35% less, 39% less, 41% less. Lower is better.]

Cost (in Euros):

a) 16 nodes + 16 K20 … 16*2500 + 16*2000 = 72000 Euros

b) 16 nodes + 4 K20 … 16*2500 + 4*2000 = 48000 Euros

With acquisition expenses reduced by 33%, performance is still 40% better
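The cost figures can be reproduced with a few lines of arithmetic. The unit prices (2500 Euros per node, 2000 Euros per K20) are the ones used in the slide; the function and constant names are only illustrative.

```python
NODE_COST_EUR = 2500  # price per node, as used in the slide
K20_COST_EUR = 2000   # price per K20 GPU, as used in the slide


def cluster_cost(nodes, gpus):
    """Acquisition cost in Euros of a cluster with the given node and GPU counts."""
    return nodes * NODE_COST_EUR + gpus * K20_COST_EUR


cost_full = cluster_cost(nodes=16, gpus=16)    # option a)
cost_reduced = cluster_cost(nodes=16, gpus=4)  # option b)
saving = (cost_full - cost_reduced) / cost_full

print(cost_full, cost_reduced, f"{saving:.1%}")
```

Removing 12 of the 16 GPUs saves 24000 Euros, one third of the original budget, which is the 33% quoted above.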


Energy when removing GPUs

[Figure: annotations 39% less, 42% less, 44% less. Lower is better.]

Costs:

… with acquisition expenses reduced by 33%, performance is still 40% better …

Additionally, the electricity bill is significantly reduced


GPU utilization when removing GPUs

Higher is better


Outline

… in summary …


Pros and cons of rCUDA

• Cons:

  1. Reduced bandwidth to remote GPU (really a concern??)

• Pros:

  1. Many GPUs for a single application

  2. Concurrent GPU access to virtual machines

  3. Increased cluster throughput

  4. Similar performance with smaller investment

  5. Easier (cheaper) cluster upgrade

  6. Migration of GPU jobs

  7. Reduced energy consumption

  8. Increased GPU utilization


Back to the initial question

… is remote GPU virtualization useful?


Get a free copy of rCUDA at http://www.rcuda.net

@rcuda_

More than 650 requests worldwide

rCUDA is a development by Technical University of Valencia

Sergio Iserte, Fernando Campos, Carlos Reaño, Javier Prades


Thanks! Questions?
