Accelerating High-Throughput Computing through OpenCL

Andrei Dafinoiu, Joshua Higgins, Violeta Holmes

High-Performance Computing Research Group

University of Huddersfield

Huddersfield, United Kingdom

OverviewIntroduction

Resources

Motivation

Experiment Design

Results and Performance

Conclusions

OpenCL

• OpenCL -> A programming

framework for heterogeneous

compute platforms

• Supports CPU, GPU, DSP, and

other accelerators

• Programming API based on C/C++

Introduction

• High-Throughput Computing

• HTCondor

• QGGCondor

Aim of project• To expand the capabilities of the QGGCondor.

• To evaluate the efficiency and flexibility of OpenCL for use within a heterogeneous HTC environment.

• To increase the visibility of GPGPU computing

HTCondor SetupClassAdds, what are they ?

Old ClassAdd HTCondor Proposed New ClassAdd

Environment SetupReasoning:

- Unreliable environment for benchmarking purposes.

- Desire to execute benchmarks over the live system.

Method:

- Extracted hostnames of 1000 condor machines.

- Script based generation of jobs.

Resource Discovery

AMD 5600

1%AMD 6400

AMD 6500

4%GTX 610

4%GTX 670

GTX 750 TI

GTX 970

Not Detected

GPU DISTRIBUTION

AMD 5600 AMD 6400 AMD 6500 K600 GTX 610

GTX 670 GTX 750 TI GTX 970 Not Detected

Experiment DesignFast-Fourier Transform

1000 iterations -> to ensure precision

17 FFT sizes -> from 2^8 to 2^24

701 machines -> that reported GPUs

Resulting in:

-Aprox. 12 million FFT calculations

-Aprox. 28 thousand CPU hours

Results

2^8 2^11 2^14 2^17 2^20 2^23

FFT Size

1D FFT on CPU

Intel I50

2^8 2^11 2^14 2^17 2^20 2^23

FFT Size

1D FFT on GPU

AMD 6500

2^8 2^11 2^14 2^17 2^20 2^23

FFT Size

QGGCondor GPU FFT Performance

AMD 6500 AMD 6400 GTX 970 GTX 750 Ti GTX 670 AMD 5600

Comparison with GPU cluster

2^8 2^11 2^14 2^17 2^20 2^23

FFT Size

Average Performance

Condor C2050 AMD NVIDIA• On average, a single compute

GPU is negligibly faster.

• Newer-gen GPGPUs outperform

older compute counterparts.

P.S: NVIDIA C2050 was released in 2011

Conclusion•HTCondor GPU integration was successful with single entity ClassAdd definition.

• OpenCL based implementations over HTCondor are straightforward (without specific optimizations) however machines need dedicated graphics drivers installed.

• QGGCondor GPU performance varies greatly however the average is marginally slower than that of a dedicated compute GPU.

Thank you for your attention

Questions ?

Accelerating High-Throughput Computing through OpenCL · Accelerating High-Throughput Computing...

Documents

Lecture 11: “GPGPU” computing and the CUDA/OpenCL

High Throughput Computing on P2P Networks

OpenCL Parallel Computing for CPUs and GPUs 201003

What Is High Throughput Distributed Computing

An Introduction to OpenCL for Scientific Computing

Condor: High-throughput Computing From Clusters to Grid Computing

High Throughput Computing

Introduction to GPU Computing with OpenCL - Nvidiadeveloper.download.nvidia.com/CUDA/training/NVIDIA... · Introduction to GPU Computing with OpenCL. ... //nvdeveloper.nvidia.com/login.asp

High Throughput Computing and Protein Structure

Parallel Programming Concepts GPU Computing with OpenCL

OpenCL Heterogeneous Parallel Computing

GPU-Programmierung: OpenCL€¦ · Einsatzgebiete von GPU-Computing Entwicklung von GPU-Computing 2 OpenCL Entwicklung Architektur Spracheigenschaften Vergleich mit CUDA Beispiel

OpenCL - SPEEDUP Parallel Computing for Heterogeneous Devices Ofer Rosenberg Visual Computing Software Division, Intel Based on Khronos OpenCL …

HAIBO XIE, PH.D. haibo.xie@amd to OpenCL.pdf · FUNDAMENTALS FOR OPENCL PROGRAMMING Parallel computing thinking ‒Parallel computing thinking is a must-have for OpenCL programming

High Throughput Computing Technologies

OpenCL (pdf presentation) - Beyond Programmable Shadings08.idav.ucdavis.edu/munshi-opencl.pdf · •OpenCL – Open Computing Language ... OpenCL Software Stack. Beyond Programmable

Parallel Programming Concepts GPU Computing … Programming Concepts GPU Computing with OpenCL ... Programming Models AMD: ATI Stream SDK ... NVIDIA, 2009. NVIDIA OpenCL Best Practices

Throughput-Optimized OpenCL-based FPGA …Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks ... et al. Optimizing FPGA-based accelerator

GPU-Computing mit CUDA und OpenCL in der Praxis

High Throughput Computing - University of · PDF fileHTCondor Grid Computing ... Tue Apr 7 2015 21 High Throughput Computing High Throughput Computing (HTC) means getting lots