GPU Architecture and Programming
GPU vs CPU: https://www.youtube.com/watch?v=fKK933KK6Gg
GPU Architecture
• GPUs (Graphics Processing Units) were originally designed as graphics accelerators, used for real-time graphics rendering.
• Starting in the late 1990s, the hardware became increasingly programmable, culminating in NVIDIA's first GPU in 1999.
• CPU + GPU is a powerful combination:
– CPUs consist of a few cores optimized for serial processing.
– GPUs consist of thousands of smaller, more efficient cores designed for parallel performance.
– Serial portions of the code run on the CPU, while parallel portions run on the GPU.
Architecture of GPU
Image copied from http://www.pgroup.com/lit/articles/insider/v2n1a5.htm
Image copied from http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/NVIDIA_CUDA_Tutorial_No_NDA_Apr08.pdf
CUDA Programming
• CUDA (Compute Unified Device Architecture) is a parallel programming platform created by NVIDIA for its GPUs.
• By using CUDA, you can write programs that directly access the GPU.
• The CUDA platform is accessible to programmers via CUDA libraries and extensions to programming languages like C, C++, and Fortran.
– C/C++ programmers use “CUDA C/C++”, compiled with the nvcc compiler
– Fortran programmers can use CUDA Fortran, compiled with the PGI CUDA Fortran compiler
• Terminology:
– Host: the CPU and its memory (host memory)
– Device: the GPU and its memory (device memory)
Programming Paradigm
Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
Each parallel function of the application is executed as a kernel
Programming Flow
1. Copy input data from CPU memory to GPU memory
2. Load GPU program and execute
3. Copy results from GPU memory to CPU memory
• Each parallel function of the application is executed as a kernel
• That means GPUs are programmed as a sequence of kernels; typically, each kernel completes execution before the next kernel begins.
• Fermi has some support for multiple, independent kernels to execute simultaneously, but most kernels are large enough to fill the entire machine.
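The three-step flow above can be sketched as a minimal complete program. This is an illustrative reconstruction, not code from the slides; the kernel name `scale` and the buffer names are assumptions:

```cuda
#include <stdio.h>

// Trivial kernel standing in for the "GPU program" in step 2.
__global__ void scale(float *data, float factor) {
    data[threadIdx.x] *= factor;
}

int main(void) {
    const int n = 8;
    float h[8];                        // host buffer
    for (int i = 0; i < n; i++) h[i] = (float)i;

    float *d;                          // device buffer
    cudaMalloc((void **)&d, n * sizeof(float));

    // 1. Copy input data from CPU memory to GPU memory
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // 2. Load GPU program and execute
    scale<<<1, n>>>(d, 2.0f);

    // 3. Copy results from GPU memory to CPU memory
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("h[3] = %f\n", h[3]);
    cudaFree(d);
    return 0;
}
```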
Hello World! Example
• __global__ is a CUDA C/C++ keyword meaning:
– mykernel() will be executed on the device
– mykernel() will be called from the host
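The slide's code is not preserved in this text. A minimal sketch of the Hello World example it describes (a reconstruction of the standard tutorial program, not the slide's exact code):

```cuda
#include <stdio.h>

// __global__ marks a kernel: executed on the device, called from the host.
__global__ void mykernel(void) {
}

int main(void) {
    mykernel<<<1, 1>>>();        // launch 1 block of 1 thread on the device
    printf("Hello World!\n");    // printed by the host
    return 0;
}
```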
Addition Example
• Since add() runs on the device, the pointers a, b, and c must point to device memory
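The slide images for this example are missing from the text. A sketch of the single-element addition, assuming the usual host variables a, b, c and device pointers d_a, d_b, d_c (a reconstruction, not the slide's exact code):

```cuda
#include <stdio.h>

// Runs on the device: a, b, c must point to device memory.
__global__ void add(int *a, int *b, int *c) {
    *c = *a + *b;
}

int main(void) {
    int a = 2, b = 7, c;              // host copies
    int *d_a, *d_b, *d_c;             // device pointers
    int size = sizeof(int);

    // Allocate device memory
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);

    // Copy inputs from host to device
    cudaMemcpy(d_a, &a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, size, cudaMemcpyHostToDevice);

    // Launch add() on the GPU
    add<<<1, 1>>>(d_a, d_b, d_c);

    // Copy the result back to the host
    cudaMemcpy(&c, d_c, size, cudaMemcpyDeviceToHost);
    printf("%d + %d = %d\n", a, b, c);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```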
Vector Addition Example
Kernel Function:
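The kernel itself is not preserved here. A sketch of the block-parallel version, assuming one block per element (a reconstruction):

```cuda
// One block per element: blockIdx.x selects which element this block handles.
__global__ void add(int *a, int *b, int *c) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}
```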
main:
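The main program is also missing from the text. A sketch that allocates N-element arrays on host and device, copies in, launches one block per element, and copies back (reconstructed; the value of N is an assumption):

```cuda
#include <stdio.h>
#include <stdlib.h>
#define N 512

__global__ void add(int *a, int *b, int *c) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}

int main(void) {
    int size = N * sizeof(int);
    int *a = (int *)malloc(size);     // host arrays
    int *b = (int *)malloc(size);
    int *c = (int *)malloc(size);
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    int *d_a, *d_b, *d_c;             // device arrays
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);

    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    add<<<N, 1>>>(d_a, d_b, d_c);     // N blocks, 1 thread each

    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
    printf("c[10] = %d\n", c[10]);

    free(a); free(b); free(c);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```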
Alternative 1:
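The code for this alternative is not preserved. A sketch, assuming it uses threads instead of blocks (one thread per element, all in a single block):

```cuda
// Alternative 1: index by thread instead of by block.
__global__ void add(int *a, int *b, int *c) {
    c[threadIdx.x] = a[threadIdx.x] + b[threadIdx.x];
}

// Launched with 1 block of N threads: add<<<1, N>>>(d_a, d_b, d_c);
```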
Alternative 2:
int globalThreadId = threadIdx.x + blockIdx.x * M;           // M is the number of threads in a block
int globalThreadId = threadIdx.x + blockIdx.x * blockDim.x;  // equivalent, using the built-in blockDim
• So the kernel becomes
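The kernel on this slide is missing; a sketch combining blocks and threads with the global thread index defined above (a reconstruction):

```cuda
// Combined indexing: many blocks, many threads per block.
__global__ void add(int *a, int *b, int *c) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    c[index] = a[index] + b[index];
}
```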
• The main becomes
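The revised main is also missing. A sketch launching N total threads grouped into blocks (reconstructed; the values of N and THREADS_PER_BLOCK are assumptions):

```cuda
#include <stdio.h>
#include <stdlib.h>
#define N (2048 * 2048)
#define THREADS_PER_BLOCK 512

__global__ void add(int *a, int *b, int *c) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    c[index] = a[index] + b[index];
}

int main(void) {
    int size = N * sizeof(int);
    int *a = (int *)malloc(size), *b = (int *)malloc(size), *c = (int *)malloc(size);
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = i; }

    int *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    // N threads total, grouped into blocks of THREADS_PER_BLOCK
    add<<<N / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(d_a, d_b, d_c);

    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
    printf("c[100] = %d\n", c[100]);

    free(a); free(b); free(c);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```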
Handling Arbitrary Vector Sizes
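When the vector length n is not a multiple of the block size, the last block has threads past the end of the array. A sketch of the bounds-checked kernel and its launch (a reconstruction, not the slide's exact code):

```cuda
// Pass n and guard against out-of-range threads in the last block.
__global__ void add(int *a, int *b, int *c, int n) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < n)
        c[index] = a[index] + b[index];
}

// Launch with enough blocks to cover all n elements (round up):
//   add<<<(n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(d_a, d_b, d_c, n);
```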