CUDA in the Cloud – Enabling HPC Workloads in OpenStack
John Paul Walters, Computer Scientist, USC Information Sciences Institute
[email protected]
With special thanks to Andrew Younge (Indiana Univ.) and Massimo Bernaschi (IAC-CNR)
Outline
§ Introduction
§ OpenStack Background
§ Supporting Heterogeneity
§ Supporting GPUs
§ Performance Results
§ Current Status
§ Future Work
Introduction
§ Cloud computing is traditionally seen as a resource for IT – Web servers, databases
§ More recently, researchers have begun to leverage the public cloud as an HPC resource – an AWS virtual cluster ranked #102 on the Top500 list
§ Major difference between HPC and IT in the cloud: – Types of resources, heterogeneity
§ Our contribution: we're developing heterogeneous HPC extensions for the OpenStack cloud computing platform
Why A Heterogeneous Cloud?
§ Typical cloud infrastructures focus on commodity hardware
  – Choose number of CPUs, memory, and disk
  – Often not a good match for technical computing
§ Advantages of heterogeneity
  – Leverage GPUs and other accelerators
  – Power efficiency, performance
  – Process technical workloads
§ Move away from batch/grid schedulers
  – Give users access to customizable instances
  – Dedicated virtual machines
Benefits of Heterogeneity
Figure captions:
– 136.2 seconds vs. 139.5 seconds: SGI UV100 rendering 1926 objects
– Tilera vs. x86 video transcoding
– CPU: 10^8 samples, GPU: 10^10 samples
Each architecture performs well on different applications.
OpenStack Background
§ OpenStack founded by Rackspace and NASA
§ In use by Rackspace, HP, and others for their public clouds
§ Open source with hundreds of participating companies
§ In use for both public and private clouds
§ Current stable release: OpenStack Folsom
  – OpenStack Grizzly to be released in April
Chart: Google Trends searches for common open source IaaS projects (openstack, cloudstack, opennebula, eucalyptus cloud)
OpenStack Architecture
Image Source: Ken Pepple (http://ken.pepple.info/openstack/2012/09/25/openstack-folsom-architecture/)
Supporting Heterogeneity
• Current cloud services offer limited resource options
  • m1.large = 7.5 GB RAM, 2 virtual cores, 850 GB disk
  • x86 and x86_64 only
• Users should be able to customize instance types to their needs (sketched below)
  • E.g. 16 GB RAM, 4 virtual cores, SSE4.2, 2x Kepler GPUs
  • Hypervisor type
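In OpenStack, custom instance types like these can be expressed as flavors with extra specs. Below is a minimal sketch using the standard nova client; the flavor name, ID, and extra-spec keys (cpu_arch, gpu_arch, gpus) are illustrative assumptions, not the exact keys used by our scheduler:

    # Create a 16 GB RAM / 20 GB disk / 4-VCPU flavor (name and ID are arbitrary)
    nova flavor-create cg1.medium 100 16384 20 4
    # Attach architecture hints as extra specs (illustrative keys)
    nova flavor-key cg1.medium set cpu_arch=x86_64
    nova flavor-key cg1.medium set gpu_arch=fermi
    nova flavor-key cg1.medium set gpus=2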
Supporting GPUs
§ We're pursuing two approaches to GPU support in OpenStack
  – LXC support for container-based VMs
    • Good for non-virtualization-friendly GPUs
    • Near-native performance
    • Can't run non-Linux guests
  – Xen support for fully virtualized guests (passthrough sketch below)
    • Can run Windows and other non-Linux guests
    • Some overhead, especially for PCIe transfers
§ Our OpenStack work currently supports GPU-enabled LXC containers
  – Xen support is in development; preliminary results are shown today
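For the Xen path, the GPU reaches the guest via PCI passthrough. A rough sketch of the usual setup is shown below; the PCI address 0000:05:00.0 is a placeholder, and the exact mechanism depends on the dom0 kernel and Xen version:

    # dom0 kernel command line: hide the GPU from dom0 so it can be assigned
    xen-pciback.hide=(0000:05:00.0)
    # guest (domU) configuration file: pass the hidden device through
    pci = [ '0000:05:00.0' ]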
Integrating GPU Support into OpenStack
§ Our GPU work is based on OpenStack's libvirt module
  – Libvirt supports KVM/QEMU and LXC
  – Libvirt-based Xen support is experimental
§ After an instance boots, the LXC container's GPUs are whitelisted and their major/minor device IDs are passed into the container (sketch below)
§ GPU instances are launched just like any other:
  euca-run-instances -k <my key> -t cg1.small <machine image>
  – cg1 instance types represent GPU-enabled instances
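Concretely, the whitelisting amounts to allowing the NVIDIA character devices (major number 195) in the container's device cgroup and creating the corresponding nodes in its /dev. A minimal sketch is shown below; the cgroup and rootfs paths are assumptions and differ between deployments:

    # allow the NVIDIA devices in the container's device cgroup
    echo 'c 195:* rwm' > /cgroup/devices/libvirt/lxc/instance-0001/devices.allow
    # create the device nodes inside the container's root filesystem
    mknod -m 666 <rootfs>/dev/nvidia0   c 195 0
    mknod -m 666 <rootfs>/dev/nvidiactl c 195 255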
Test Results: LXC and Xen
§ 3 data sets (all using CUDA 5)
  – NVIDIA CUDA samples
  – SHOC multi-GPU
  – BFS/Graph500 multi-GPU
§ Hardware
  – LXC tests: 2x Xeon E5520, 96 GB RAM, 1x Tesla S2050 (4 GPUs), RHEL 6.1 with a non-stock 2.6.38.2 kernel
  – Xen tests: 2x Xeon X5660, 192 GB RAM, 2x NVIDIA Tesla C2075, CentOS 6.3 with Xen 4.1.2 and a stock 2.6.32-279 guest kernel
LXC CUDA 5 Samples
Chart: LXC CUDA Samples, performance relative to base (relative performance on the y-axis; higher is better)
LXC Bandwidth Results
Chart: LXC vs. Base, host-to-device bandwidth, pinned (bandwidth in MB/s vs. data size in KB; series: Base, LXC VM)
Chart: LXC vs. Base, device-to-host bandwidth, pinned (bandwidth in MB/s vs. data size in KB; series: Base, LXC VM)
LXC mirrors base bandwidth
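These curves are the kind of sweep produced by the stock CUDA bandwidthTest sample (one of the CUDA samples used). Assuming that sample, a similar pinned-memory sweep could be reproduced as below; the size range is illustrative:

    ./bandwidthTest --memory=pinned --htod --mode=range --start=1048576 --end=33554432 --increment=1048576
    ./bandwidthTest --memory=pinned --dtoh --mode=range --start=1048576 --end=33554432 --increment=1048576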
LXC SHOC Results
Chart: LXC SHOC Benchmarks, 1 node, 4 GPUs, performance relative to base (higher is better)
LXC Graph500 Results
Chart: BFS/Graph500 with 2 and 4 GPUs on 1 node; median TEPS vs. problem size (16-22); series: LXC 2 GPUs, Base 2 GPUs, LXC 4 GPUs, Base 4 GPUs
TEPS: Traversed Edges Per Second
Xen CUDA 5 Samples
Chart: Xen CUDA Samples, performance relative to base (relative performance on the y-axis; higher is better)
Xen Bandwidth Results
Chart: Xen vs. Base, host-to-device bandwidth, pinned (bandwidth in MB/s vs. data size in KB; series: Base, Xen VM)
Chart: Xen vs. Base, device-to-host bandwidth, pinned (bandwidth in MB/s vs. data size in KB; series: Base, Xen VM)
Xen SHOC Benchmarks
Chart: Xen SHOC Benchmarks, 1 node, 2 GPUs, performance relative to base (higher is better)
Xen Graph500 Results
Chart: BFS/Graph500 with 2 GPUs on 1 node; median TEPS vs. problem size (16-22); series: Xen VM, Base
TEPS: Traversed Edges per Second
Current Status
§ Source code is available now – https://github.com/usc-isi/nova (see the checkout sketch below)
§ Includes support for heterogeneity
  – GPU-enabled LXC instances
  – Bare-metal provisioning
  – Architecture-aware scheduler
§ We’re working towards integrating additional support into the OpenStack Grizzly and H-releases
Future Work
§ Add support for high-speed networking as a resource
  – InfiniBand via SR-IOV
  – Enable features like GPUDirect
§ Add support for the Kepler architecture
§ Merge the Xen support work into upstream OpenStack