
Page 1: Code gpu with cuda - CUDA introduction

CODE GPU WITH CUDA

CUDA INTRODUCTION

Created by Marina Kolpakova ( ) for cuda.geek Itseez

Page 2: Code gpu with cuda - CUDA introduction

OUTLINE

Terminology
Definition
Programming model
Execution model
Memory model
CUDA kernel

Page 3: Code gpu with cuda - CUDA introduction

OUT OF SCOPE

CUDA API overview

Page 4: Code gpu with cuda - CUDA introduction

TERMINOLOGY

Device: a CUDA-capable NVIDIA GPU

Device code: code executed on the device

Host: an x86/x64/ARM CPU

Host code: code executed on the host

Kernel: a concrete device function

Page 5: Code gpu with cuda - CUDA introduction

CUDA

CUDA is a Compute Unified Device Architecture. CUDA includes:

1. Capable GPU hardware and driver
2. Device ISA, GPU assembler, compiler
3. A C++-based high-level language and the CUDA Runtime

CUDA defines:
a programming model
an execution model
a memory model

Page 6: Code gpu with cuda - CUDA introduction

PROGRAMMING MODEL

A kernel is executed by many threads

Page 7: Code gpu with cuda - CUDA introduction

PROGRAMMING MODEL

Threads are grouped into blocks

Each thread has a thread ID

Page 8: Code gpu with cuda - CUDA introduction

PROGRAMMING MODEL

Thread blocks form an execution grid

Each block has a block ID
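
The block and thread IDs above are what a kernel combines to find its place in the grid. A minimal sketch (a hypothetical kernel, not from the slides) for a 2D grid:

```cuda
// Sketch: each thread derives its unique 2D position from its
// block ID (blockIdx), block size (blockDim) and thread ID (threadIdx).
__global__ void where_am_i(int* x_out, int* y_out, int width)
{
    // Global coordinates of this thread within the whole grid.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    // Flattened row-major index, e.g. for addressing an image.
    int idx = y * width + x;
    x_out[idx] = x;
    y_out[idx] = y;
}
```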

Page 9: Code gpu with cuda - CUDA introduction

EXECUTION (HW MAPPING) MODEL

A single thread is executed on a core

Page 10: Code gpu with cuda - CUDA introduction

EXECUTION (HW MAPPING) MODEL

Each block is executed by one SM and does not migrate. The number of concurrent blocks that can reside on an SM depends on available resources.

Page 11: Code gpu with cuda - CUDA introduction

EXECUTION (HW MAPPING) MODEL

Threads in a block can cooperate via shared memory and synchronization. There is no hardware support for cooperation between threads from different blocks.
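
Such in-block cooperation can be sketched as a tree reduction: threads stage values in shared memory and use the __syncthreads() barrier between steps. This example (the kernel name and block size are assumptions for illustration) sums 256 values per block:

```cuda
#define CTA_SIZE 256

// Sketch: threads of one block cooperatively sum CTA_SIZE inputs.
__global__ void block_sum(const float* in, float* out)
{
    __shared__ float buffer[CTA_SIZE];

    int tid = threadIdx.x;
    buffer[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads(); // wait until every thread has stored its value

    // Tree reduction: halve the number of active threads each step.
    for (int stride = CTA_SIZE / 2; stride > 0; stride /= 2)
    {
        if (tid < stride)
            buffer[tid] += buffer[tid + stride];
        __syncthreads(); // barrier before the next step
    }

    if (tid == 0)
        out[blockIdx.x] = buffer[0]; // one partial sum per block
}
```

Note the barrier synchronizes only threads of the same block, which is exactly why cross-block cooperation needs a different mechanism (e.g. separate kernel launches).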

Page 12: Code gpu with cuda - CUDA introduction

EXECUTION (HW MAPPING) MODEL

One kernel (or, on sm_20+, multiple kernels) is executed on the device
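
Concurrent kernels on sm_20+ are expressed through CUDA streams. A sketch (the kernels and launch configuration here are hypothetical):

```cuda
// Sketch: kernels launched into different streams may overlap
// on sm_20+ devices, resources permitting.
__global__ void k1(float* a) { a[threadIdx.x] += 1.f; }
__global__ void k2(float* b) { b[threadIdx.x] *= 2.f; }

void launch_concurrently(float* dev_a, float* dev_b)
{
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Independent launches in independent streams: the device
    // is free to execute them concurrently.
    k1<<<1, 256, 0, s1>>>(dev_a);
    k2<<<1, 256, 0, s2>>>(dev_b);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
}
```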

Page 13: Code gpu with cuda - CUDA introduction

MEMORY MODEL

Each thread has its own registers

Page 14: Code gpu with cuda - CUDA introduction

MEMORY MODEL

Each thread has its own local memory

Page 15: Code gpu with cuda - CUDA introduction

MEMORY MODEL

A block has shared memory. A pointer to shared memory is valid while the block is resident:

__shared__ float buffer[CTA_SIZE];

Page 16: Code gpu with cuda - CUDA introduction

MEMORY MODEL

The whole grid is able to access global and constant memory
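
A short sketch of grid-wide constant memory (names here are assumptions for illustration): the host fills a __constant__ array once, and every thread in the grid can then read it.

```cuda
// Sketch: constant memory is declared at file scope and is
// read-only from device code.
__constant__ float coeffs[16];

__global__ void scale(const float* in, float* out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    // Every thread in the grid can read the same constant data.
    out[tid] = in[tid] * coeffs[tid % 16];
}

void upload_coeffs(const float* host_coeffs)
{
    // Host side: write constant memory before launching the kernel.
    cudaMemcpyToSymbol(coeffs, host_coeffs, 16 * sizeof(float));
}
```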

Page 17: Code gpu with cuda - CUDA introduction

BASIC CUDA KERNEL

Work for GPU threads is represented as a kernel:
a kernel represents a task for a single thread (scalar notation)
every thread in a particular grid executes the same kernel
threads use their threadIdx and blockIdx to dispatch work
a kernel function is marked with the __global__ keyword

Common kernel structure:
1. Retrieve the position in the grid (widely named tid)
2. Load data from GPU memory
3. Perform the compute work
4. Write the result back into GPU memory

__global__ void kernel(float* in, float* out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = in[tid];
}

Page 18: Code gpu with cuda - CUDA introduction

KERNEL EXECUTION

void execute_kernel(const float* host_in, float* host_out, int size)
{
    float *device_in, *device_out;
    cudaMalloc((void**)&device_in,  size * sizeof(float));
    cudaMalloc((void**)&device_out, size * sizeof(float));

    // 1. Upload data into device memory
    cudaMemcpy(device_in, host_in, size * sizeof(float), cudaMemcpyHostToDevice);

    // 2. Configure kernel launch (size is assumed to be a multiple of 256)
    dim3 block(256);
    dim3 grid(size / 256);

    // 3. Execute kernel
    kernel<<<grid, block>>>(device_in, device_out);

    // 4. Wait till completion
    cudaThreadSynchronize();

    // 5. Download results into host memory
    cudaMemcpy(host_out, device_out, size * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(device_in);
    cudaFree(device_out);
}

Page 19: Code gpu with cuda - CUDA introduction

FINAL WORDS

CUDA is a set of capable GPU hardware, driver, GPU ISA, GPU assembler, compiler, C++-based high-level language and runtime which enables programming of NVIDIA GPUs.
A CUDA function (kernel) is called on a grid of blocks.
A kernel runs on unified programmable cores.
A kernel is able to access registers and local memory, share memory inside a block of threads, and access RAM through global, texture and constant memories.