Training Program on GPU Programming with CUDA
31st July, 7th Aug, 14th Aug 2011
CUDA Teaching Center @ UoM


Sanath Jayasena, CUDA Teaching Center @ UoM

Day 1, Session 1

Introduction

Outline

• Training Program Description
• CUDA Teaching Center at UoM

Subject Matter
• Introduction to GPU Computing
• GPU Computing with CUDA
• CUDA Programming Basics


Overview of Training Program

• 3 Sundays, starting 31st July
• Schedule and program outline

• Main resource persons
– Sanath Jayasena, Jayathu Samarawickrama, Kishan Wimalawarna, Lochandaka Ranathunga
– From the Dept of Computer Science & Engineering and the Dept of Electronic & Telecommunication Engineering (Faculty of Engineering), and the Faculty of IT


CUDA Teaching Center

• UoM was selected as a CTC
– A group of people from multiple Depts
– http://research.nvidia.com/content/cuda-teaching-centers

• Benefits
– Donation of hardware by NVIDIA (GeForce GTX 480s and Tesla C2070)
– Access to other resources

• Expectations
– Use of the resources for teaching/research and industry collaboration

GPU Computing: Introduction

• Graphics Processing Units (GPUs)
– High-performance many-core processors that can be used to accelerate a wide range of applications

• GPGPU: General-Purpose computation on Graphics Processing Units

• GPUs have led the race for floating-point performance since the start of the 21st century

• GPUs are being used as parallel processors



• Until the end of the 20th century, general computing relied on advances in hardware to increase the speed of software/apps
• These gains have slowed down since then due to
– Power consumption issues
– Limited productivity gains within a single processor

• Switch to multi-core and many-core models
– Multiple processing units (processor cores) are used in each chip to increase the processing power
– Impact on software developers?



• A sequential program runs on only one of the cores, and that core will not become significantly faster with new processor generations

• With each new generation of processors
– The software that will continue to enjoy performance improvements will be parallel programs
– In such programs, multiple threads of execution cooperate to complete the work faster


CPU-GPU Performance Gap


Source: CUDA Prog. Guide 4.0

CPU-GPU Performance Gap


Source: CUDA Prog. Guide 4.0

GPGPU & CUDA

• GPU designed as a numeric computing engine
– Will not perform as well as CPUs on some tasks
– Most applications will use both CPUs and GPUs

• CUDA
– NVIDIA's parallel computing architecture aimed at increasing computing performance by harnessing the power of the GPU
– A programming model


More Details on GPUs

• A GPU is typically an expansion card installed in a PCI Express x16 slot

• Market leaders: NVIDIA, Intel, AMD (ATI)
– Example NVIDIA GPUs (donated to UoM): GeForce GTX 480 and Tesla C2070


Example Specifications

Specification                                      GeForce GTX 480    Tesla C2070
Peak double-precision floating-point performance   650 GFLOPS         515 GFLOPS
Peak single-precision floating-point performance   1300 GFLOPS        1030 GFLOPS
CUDA cores                                         480                448
Frequency of CUDA cores                            1.40 GHz           1.15 GHz
Memory size (GDDR5)                                1536 MB            6 GB
Memory bandwidth                                   177.4 GB/s         150 GB/s
ECC memory                                         No                 Yes
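As a rough consistency check on the single-precision figures, peak throughput can be estimated as cores × clock × 2. This back-of-the-envelope sketch assumes that each CUDA core completes one fused multiply-add per clock cycle and that a fused multiply-add counts as two floating-point operations; it is not presented as NVIDIA's official method.

```latex
\[
P_{\text{peak, SP}} = N_{\text{cores}} \times f_{\text{clock}} \times 2\ \tfrac{\text{FLOP}}{\text{cycle}}
\]
\[
\text{GTX 480: } 480 \times 1.40\ \text{GHz} \times 2 = 1344\ \text{GFLOPS} \approx 1300\ \text{GFLOPS}
\]
\[
\text{Tesla C2070: } 448 \times 1.15\ \text{GHz} \times 2 = 1030.4\ \text{GFLOPS} \approx 1030\ \text{GFLOPS}
\]
```

The Tesla C2070 double-precision figure in the table is exactly half of its single-precision figure (1030 / 2 = 515).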


CPU vs. GPU Architecture

The GPU devotes more transistors to computation (data processing) rather than to data caching and flow control


CPU-GPU Communication
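A minimal sketch of the usual CPU-GPU communication pattern, assuming a single CUDA-capable GPU: data is copied from host memory to device memory, processed by a kernel, and copied back. The kernel name `double_elements` and the array size are illustrative only.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 8

// Each thread doubles one element of the array in GPU memory.
__global__ void double_elements(int *data) {
    int i = threadIdx.x;   // a single block of N threads, so threadIdx.x is the index
    data[i] *= 2;
}

int main(void) {
    int host[N];
    for (int i = 0; i < N; i++) host[i] = i;

    int *dev = NULL;
    cudaMalloc((void **)&dev, N * sizeof(int));                      // allocate GPU memory
    cudaMemcpy(dev, host, N * sizeof(int), cudaMemcpyHostToDevice);  // CPU -> GPU
    double_elements<<<1, N>>>(dev);                                  // compute on the GPU
    cudaMemcpy(host, dev, N * sizeof(int), cudaMemcpyDeviceToHost);  // GPU -> CPU
    cudaFree(dev);

    for (int i = 0; i < N; i++) printf("%d%s", host[i], (i + 1 < N) ? " " : "\n");
    return 0;
}
```

The device-to-host `cudaMemcpy` waits for the preceding kernel on the default stream to finish, so no explicit synchronization is needed before printing.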


CUDA Architecture
• CUDA is NVIDIA's solution to access the GPU
• Can be seen as an extension to C/C++
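To illustrate the "extension to C/C++" point, the sketch below uses the three CUDA function qualifiers: `__global__` (a kernel launched from the host), `__device__` (callable only from GPU code) and `__host__ __device__` (compiled for both CPU and GPU). The function names are hypothetical.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// __host__ __device__: compiled for both the CPU and the GPU.
__host__ __device__ int square(int x) { return x * x; }

// __device__: callable only from code running on the GPU.
__device__ int square_plus_one(int x) { return square(x) + 1; }

// __global__: a kernel, launched from the host and executed on the device.
__global__ void compute(int x, int *result) { *result = square_plus_one(x); }

int main(void) {
    int *dev_result, result;
    cudaMalloc((void **)&dev_result, sizeof(int));
    compute<<<1, 1>>>(5, dev_result);
    cudaMemcpy(&result, dev_result, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_result);

    // square() runs on the host here; square_plus_one() ran on the device.
    printf("host square(5) = %d, device square(5) + 1 = %d\n", square(5), result);
    return 0;
}
```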

CUDA Software Stack


CUDA Architecture
There are two main parts:

1. Host (CPU part): Single Program, Single Data
2. Device (GPU part): Single Program, Multiple Data
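The host/device split can be sketched by writing the same operation twice: a host loop in which one thread processes every element, and a kernel in which each GPU thread runs the same code on a different element. The names are hypothetical, and the sketch assumes N is small enough for a single block.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 16

// Host: one thread walks over all the data (single program, single data).
void scale_on_host(const float *in, float *out, float k, int n) {
    for (int i = 0; i < n; i++) out[i] = k * in[i];
}

// Device: every thread runs this same program on a different element
// (single program, multiple data).
__global__ void scale_on_device(const float *in, float *out, float k, int n) {
    int i = threadIdx.x;
    if (i < n) out[i] = k * in[i];
}

int main(void) {
    float in[N], out_host[N], out_dev[N];
    for (int i = 0; i < N; i++) in[i] = (float)i;

    scale_on_host(in, out_host, 3.0f, N);

    float *d_in, *d_out;
    cudaMalloc((void **)&d_in, N * sizeof(float));
    cudaMalloc((void **)&d_out, N * sizeof(float));
    cudaMemcpy(d_in, in, N * sizeof(float), cudaMemcpyHostToDevice);
    scale_on_device<<<1, N>>>(d_in, d_out, 3.0f, N);   // N threads, one per element
    cudaMemcpy(out_dev, d_out, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int mismatches = 0;
    for (int i = 0; i < N; i++)
        if (out_host[i] != out_dev[i]) mismatches++;
    printf("mismatches: %d\n", mismatches);
    return mismatches != 0;
}
```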


CUDA Architecture

Grid Architecture

The Grid
1. A group of threads all running the same kernel
2. Can run multiple grids at once

The Block
1. Grids are composed of blocks
2. Each block is a logical unit containing a number of coordinating threads and some amount of shared memory
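The threads of a block coordinating through the block's shared memory can be illustrated with a per-block sum. This minimal sketch assumes a block size that is a power of two. Each block reduces its own slice of the input in `__shared__` memory and uses `__syncthreads()` as a barrier between steps. Names and sizes are illustrative.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256   // must be a power of two for this reduction
#define NUM_BLOCKS 4

// Each block sums its own slice of the input using on-chip shared memory.
__global__ void block_sum(const int *in, int *block_totals) {
    __shared__ int partial[THREADS_PER_BLOCK];   // one copy per block
    int tid = threadIdx.x;
    partial[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();   // wait until every thread in the block has written

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0) block_totals[blockIdx.x] = partial[0];
}

int main(void) {
    const int n = THREADS_PER_BLOCK * NUM_BLOCKS;
    int in[THREADS_PER_BLOCK * NUM_BLOCKS], totals[NUM_BLOCKS];
    for (int i = 0; i < n; i++) in[i] = 1;   // so each block should total 256

    int *d_in, *d_totals;
    cudaMalloc((void **)&d_in, n * sizeof(int));
    cudaMalloc((void **)&d_totals, NUM_BLOCKS * sizeof(int));
    cudaMemcpy(d_in, in, n * sizeof(int), cudaMemcpyHostToDevice);
    block_sum<<<NUM_BLOCKS, THREADS_PER_BLOCK>>>(d_in, d_totals);
    cudaMemcpy(totals, d_totals, NUM_BLOCKS * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_totals);

    for (int b = 0; b < NUM_BLOCKS; b++) printf("block %d total = %d\n", b, totals[b]);
    return 0;
}
```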

Some Applications of GPGPU

Computational Structural Mechanics

Bio-Informatics and Life Sciences

Computational Electromagnetics and Electrodynamics

Computational Finance


Some Applications…

Computational Fluid Dynamics

Data Mining, Analytics, and Databases

Imaging and Computer Vision

Medical Imaging


Some Applications…

Molecular Dynamics

Numerical Analytics

Weather, Atmospheric, Ocean Modeling and Space Sciences


CUDA Programming Basics

Accessing/Using the CUDA-GPUs

• You have been given access to our cluster
– User accounts on 192.248.8.13x
– It is a Linux system

• CUDA Toolkit and SDK for development
– Includes the CUDA C/C++ compiler for GPUs (“nvcc”)
– A C/C++ compiler for the CPU (host) code is also needed

• NVIDIA device drivers are needed to run programs
– They let programs communicate with the hardware
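Once the toolkit and driver are in place, a short program can confirm which GPUs are visible. The sketch below uses the CUDA runtime calls `cudaGetDeviceCount` and `cudaGetDeviceProperties`. The file name is an assumption; it could be compiled with `nvcc device_query.cu -o device_query`.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // Typically means no driver or no CUDA-capable GPU is visible.
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA devices found: %d\n", count);

    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s\n", dev, prop.name);
        printf("  compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  multiprocessors:    %d\n", prop.multiProcessorCount);
        printf("  global memory:      %zu MB\n", prop.totalGlobalMem / (1024 * 1024));
        printf("  ECC enabled:        %s\n", prop.ECCEnabled ? "yes" : "no");
    }
    return 0;
}
```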


Example Program 1
• “__global__” says the function is to be compiled to run on a “device” (GPU), not the “host” (CPU)
• Angle brackets “<<<” and “>>>” pass the launch configuration (number of blocks and threads per block) to the runtime


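#include <cuda.h>
#include <stdio.h>

// An empty kernel: __global__ marks device code that is launched from the host
__global__ void kernel (void) { }

int main (void)
{
    kernel <<< 1, 1 >>> ();    // launch 1 block of 1 thread on the GPU
    printf("Hello World!\n");  // printed by the host (CPU)
    return 0;
}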

A function executed on the GPU (device) is usually called a “kernel”
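A variant of Example Program 1 in which the kernel itself prints. This sketch assumes a GPU of compute capability 2.0 or higher (such as the Fermi-based GTX 480 and Tesla C2070), which device-side `printf` requires, and CUDA 4.0 or later for `cudaDeviceSynchronize`.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Device-side printf requires compute capability 2.0 or higher.
__global__ void hello_kernel(void) {
    printf("Hello World from GPU thread %d!\n", threadIdx.x);
}

int main(void) {
    hello_kernel<<<1, 4>>>();   // one block of four threads
    cudaDeviceSynchronize();    // launches are asynchronous; wait so the GPU output is flushed
    printf("Hello World from the CPU!\n");
    return 0;
}
```

The order in which lines from different GPU threads appear is not guaranteed.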

Example Program 2 – Part 1


As can be seen in Part 2:

• We can pass parameters to a kernel as we would with any C function

• We need to allocate memory to do anything useful on a device, such as returning values to the host

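Example Program 2 – Part 2

int main (void) {
    int c, *dev_c;
    // allocate one int in GPU (device) memory
    cudaMalloc ((void **) &dev_c, sizeof (int));
    // the kernel "add" from Part 1 writes 2 + 7 into *dev_c
    add <<< 1, 1 >>> (2, 7, dev_c);
    // copy the result back into host memory
    cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("2 + 7 = %d\n", c);
    cudaFree(dev_c);
    return 0;
}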
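For reference, a complete, self-contained version of Example Program 2 is sketched below. The body of the `add` kernel is an assumption inferred from the call `add <<< 1, 1 >>> (2,7, dev_c)`, and the `CHECK` macro is a hypothetical helper (not part of the CUDA API) added to report CUDA errors.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical error-checking helper: abort with a message if a CUDA call fails.
#define CHECK(call)                                                         \
    do {                                                                    \
        cudaError_t err_ = (call);                                          \
        if (err_ != cudaSuccess) {                                          \
            fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                             \
        }                                                                   \
    } while (0)

// Assumed kernel for Part 1: a and b are passed by value, the sum goes to device memory.
__global__ void add(int a, int b, int *c) {
    *c = a + b;
}

int main(void) {
    int c;
    int *dev_c;
    CHECK(cudaMalloc((void **)&dev_c, sizeof(int)));
    add<<<1, 1>>>(2, 7, dev_c);
    CHECK(cudaGetLastError());   // reports errors from the kernel launch itself
    CHECK(cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost));
    printf("2 + 7 = %d\n", c);
    CHECK(cudaFree(dev_c));
    return 0;
}
```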


Example Program 3

Within host (CPU) code, call the kernel using <<< and >>>, specifying the grid size (number of blocks) and the block size (number of threads per block); more details later
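One possible program of this kind, assuming the common pattern of launching one block per array element: the kernel below adds two vectors and uses the block index `blockIdx.x` to pick its element. Names and sizes are illustrative, not taken from the original slides.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 10

// Launched with N blocks of one thread each: blockIdx.x selects the element.
__global__ void vec_add(const int *a, const int *b, int *c) {
    int i = blockIdx.x;
    if (i < N) c[i] = a[i] + b[i];
}

int main(void) {
    int a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = i * i; }

    int *dev_a, *dev_b, *dev_c;
    cudaMalloc((void **)&dev_a, N * sizeof(int));
    cudaMalloc((void **)&dev_b, N * sizeof(int));
    cudaMalloc((void **)&dev_c, N * sizeof(int));
    cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice);

    vec_add<<<N, 1>>>(dev_a, dev_b, dev_c);   // grid of N blocks, 1 thread per block

    cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; i++) printf("%d + %d = %d\n", a[i], b[i], c[i]);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return 0;
}
```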


Example Program 3 (contd.)


Note: Details on threads and thread IDs will come later

Example Program 4


Grids, Blocks and Threads


• A grid of size 6 (3x2 blocks)

• Each block has 12 threads (4x3)
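The grid described here can be launched with `dim3` values for a 3 × 2 grid of 4 × 3 blocks, 72 threads in total. In this sketch (the kernel name `record_ids` is hypothetical) each thread computes a unique global ID from its block and thread indices and writes it to its own slot.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Each thread writes a globally unique ID built from its 2-D block and thread indices.
__global__ void record_ids(int *out) {
    int block_id = blockIdx.y * gridDim.x + blockIdx.x;        // 0..5
    int threads_per_block = blockDim.x * blockDim.y;           // 12
    int local_id = threadIdx.y * blockDim.x + threadIdx.x;     // 0..11
    int global_id = block_id * threads_per_block + local_id;   // 0..71
    out[global_id] = global_id;
}

int main(void) {
    dim3 grid(3, 2);    // 3 x 2 = 6 blocks
    dim3 block(4, 3);   // 4 x 3 = 12 threads per block
    const int total = 6 * 12;

    int host[total];
    int *dev;
    cudaMalloc((void **)&dev, total * sizeof(int));
    record_ids<<<grid, block>>>(dev);
    cudaMemcpy(host, dev, total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    int ok = 1;
    for (int i = 0; i < total; i++)
        if (host[i] != i) ok = 0;
    printf("threads launched: %d, all IDs unique and in order: %s\n", total, ok ? "yes" : "no");
    return ok ? 0 : 1;
}
```

Numbering blocks row by row and threads row by row within a block is one common convention; any mapping works as long as each thread gets a distinct index.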

Conclusion

• In this session we discussed
– Introduction to GPU Computing
– GPU Computing with CUDA
– CUDA Programming Basics

• Next session
– Data Parallelism
– CUDA Programming Model
– CUDA Threads


References for this Session

• Chapters 1 and 2 of: D. Kirk and W. Hwu, Programming Massively Parallel Processors, Morgan Kaufmann, 2010

• Chapters 1-4 of: J. Sanders and E. Kandrot, CUDA by Example, Addison-Wesley, 2010

• Chapters 1-2 of: NVIDIA CUDA C Programming Guide, NVIDIA Corporation, 2006-2011 (Versions 3.2 and 4.0)

