CODE GPU WITH CUDA
INTRODUCTION
Created by Marina Kolpakova (cuda.geek) for Itseez
OUTLINE
Terminology
Definition
Programming model
Execution model
Memory model
CUDA kernel
OUT OF SCOPE
CUDA API overview
TERMINOLOGY
Device: CUDA-capable NVIDIA GPU
Device code: code executed on the device
Host: x86/x64/ARM CPU
Host code: code executed on the host
Kernel: a concrete device function
CUDA
CUDA is a Compute Unified Device Architecture. CUDA includes:
1. Capable GPU hardware and driver
2. Device ISA, GPU assembler, compiler
3. C++-based high-level language, CUDA Runtime

CUDA defines:
a programming model
an execution model
a memory model
PROGRAMMING MODEL
A kernel is executed by many threads
PROGRAMMING MODEL
Threads are grouped into blocks
Each thread has a thread ID
PROGRAMMING MODEL
Thread blocks form an execution grid
Each block has a block ID
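The thread and block IDs above can be combined into a unique per-thread index. A minimal sketch (the kernel name `whoami` is hypothetical, not from the slides):

```cuda
// Each thread derives a unique global index from its block ID
// and thread ID (1-D grid of 1-D blocks).
__global__ void whoami(int *out)
{
    int global_id = blockIdx.x * blockDim.x + threadIdx.x;
    out[global_id] = global_id;
}

// Launch example: 4 blocks of 8 threads cover global IDs 0..31
// whoami<<<4, 8>>>(device_out);
```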
EXECUTION (HW MAPPING) MODEL
A single thread is executed on a core
EXECUTION (HW MAPPING) MODEL
Each block is executed by one SM and does not migrate
The number of concurrent blocks that can reside on an SM depends on available resources
EXECUTION (HW MAPPING) MODEL
Threads in a block can cooperate via shared memory and synchronization
There is no hardware support for cooperation between threads from different blocks
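A sketch of such intra-block cooperation (the kernel `shift_left` is a hypothetical example and assumes a launch with exactly `CTA_SIZE` threads per block):

```cuda
#define CTA_SIZE 256

// Each thread loads one element into shared memory; after a barrier
// every thread reads a value written by a neighbouring thread of the
// same block. No such barrier exists between different blocks.
__global__ void shift_left(const float *in, float *out)
{
    __shared__ float buffer[CTA_SIZE];

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    buffer[threadIdx.x] = in[tid];

    __syncthreads(); // wait until the whole block has finished loading

    out[tid] = buffer[(threadIdx.x + 1) % CTA_SIZE];
}
```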
EXECUTION (HW MAPPING) MODEL
One or multiple (sm_20+) kernels are executed on the device
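Concurrent kernel execution is typically expressed through streams. A hedged sketch (the kernels `kernel_a`/`kernel_b` are hypothetical; overlap is only possible on sm_20+ devices and when resources permit):

```cuda
__global__ void kernel_a(float *p) { p[threadIdx.x] += 1.f; }
__global__ void kernel_b(float *p) { p[threadIdx.x] *= 2.f; }

void launch_concurrently(float *d_a, float *d_b)
{
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Kernels in different non-default streams may overlap on sm_20+
    kernel_a<<<1, 256, 0, s1>>>(d_a);
    kernel_b<<<1, 256, 0, s2>>>(d_b);

    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
}
```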
MEMORY MODEL
A thread has its own registers

MEMORY MODEL
A thread has its own local memory
MEMORY MODEL
A block has shared memory
A pointer to shared memory is valid while the block is resident:

__shared__ float buffer[CTA_SIZE];
MEMORY MODEL
A grid is able to access global and constant memory
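A sketch of grid-wide constant memory (the symbol `coeffs` and kernel `scale` are hypothetical; constant memory is written from the host with `cudaMemcpyToSymbol`):

```cuda
// Visible (read-only) to every thread of the grid
__constant__ float coeffs[4];

__global__ void scale(const float *in, float *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = in[tid] * coeffs[tid % 4];
}

// Host side:
// float h_coeffs[4] = {1.f, 2.f, 3.f, 4.f};
// cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs));
```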
BASIC CUDA KERNEL
Work for GPU threads is represented as a kernel
A kernel represents a task for a single thread (scalar notation)
Every thread in a particular grid executes the same kernel
Threads use their threadIdx and blockIdx to dispatch work
A kernel function is marked with the __global__ keyword

Common kernel structure:
1. Retrieving position in the grid (widely named tid)
2. Loading data from GPU's memory
3. Performing compute work
4. Writing back the result into GPU's memory
__global__ void kernel(float* in, float* out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = in[tid];
}
KERNEL EXECUTION
void execute_kernel(const float* host_in, float* host_out, int size)
{
    float *device_in, *device_out;
    cudaMalloc((void**)&device_in,  size * sizeof(float));
    cudaMalloc((void**)&device_out, size * sizeof(float));

    // 1. Upload data into device memory
    cudaMemcpy(device_in, host_in, size * sizeof(float), cudaMemcpyHostToDevice);

    // 2. Configure kernel launch (size is assumed to be a multiple of 256)
    dim3 block(256);
    dim3 grid(size / 256);

    // 3. Execute kernel
    kernel<<<grid, block>>>(device_in, device_out);

    // 4. Wait till completion
    cudaThreadSynchronize();

    // 5. Download results into host memory
    cudaMemcpy(host_out, device_out, size * sizeof(float), cudaMemcpyDeviceToHost);
}
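The function above omits error handling and cleanup for brevity. A hedged sketch of what a fuller version would add (the helper `check` is hypothetical; the runtime API calls shown are standard):

```cuda
#include <cstdio>

// Every CUDA Runtime API call returns a cudaError_t
void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess)
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
}

// After the launch one would typically do:
// check(cudaGetLastError(), "kernel launch");
// check(cudaThreadSynchronize(), "kernel execution");
// cudaFree(device_in);
// cudaFree(device_out);
```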
FINAL WORDS
CUDA is a set of capable GPU hardware, a driver, a GPU ISA, a GPU assembler, a compiler, a C++-based high-level language and a runtime, which together enable programming of NVIDIA GPUs
A CUDA function (kernel) is called on a grid of blocks
A kernel runs on unified programmable cores
A kernel is able to access registers and local memory, share memory inside a block of threads and access RAM through global, texture and constant memories
THE END
BY CUDA.GEEK, 2013–2015