
PARRAY: The Array-Based GPU Programming Technology Yifeng Chen School of EECS Peking University, China.


PARRAY: The Array-Based GPU Programming Technology

Yifeng Chen

School of EECS, Peking University, China


Two Conflicting Approaches for Programmability in HPC

Top-down Approach
• Core programming model is high-level (e.g. functional parallel languages)
• Must rely on heavy heuristic runtime optimization
• Add low-level program constructs to improve low-level control
• Risks:
  • Programmers tend to avoid using “extra” constructs.
  • Low-level controls do not fit well into the core model.

Bottom-up Approach (PARRAY)
• Core programming model exposes the memory hierarchy
• Same algorithm, same performance, same intellectual challenge, but shorter code
• Runtime optimization possible, but not part of the core model.


Basic Notation

• Dimensions in a tree
• A dimension may refer to another array type.
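The two bullets can be made concrete with a small C sketch. It is not PARRAY's implementation: the `Dim` struct and the `dim_size` and `dim_offset` helpers are hypothetical names used only for illustration, and a contiguous row-major layout is assumed. The sketch models a type such as `[2][[2048][4096]]` from the later slides, where the second dimension is itself built from two sub-dimensions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a dimension tree (not PARRAY's internal representation).
   A leaf carries a plain size; an inner node is the product of its sub-dimensions. */
typedef struct Dim {
    size_t size;             /* used only when nsub == 0 */
    const struct Dim *sub;   /* sub-dimensions, outermost first */
    size_t nsub;             /* number of sub-dimensions */
} Dim;

/* Number of elements covered by a dimension. */
size_t dim_size(const Dim *d) {
    if (d->nsub == 0) return d->size;
    size_t n = 1;
    for (size_t i = 0; i < d->nsub; i++) n *= dim_size(&d->sub[i]);
    return n;
}

/* Row-major offset inside an inner node: indices[i] indexes sub-dimension i. */
size_t dim_offset(const Dim *node, const size_t *indices) {
    size_t off = 0;
    for (size_t i = 0; i < node->nsub; i++) {
        size_t extent = dim_size(&node->sub[i]);
        assert(indices[i] < extent);   /* index must lie inside the sub-dimension */
        off = off * extent + indices[i];
    }
    return off;
}
```

Under these assumptions, an offset computed for the inner `[2048][4096]` node can be passed as the second index of the outer node, which is one way to read "a dimension may refer to another array type".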


Motivating Examples for PARRAY


Thread Arrays


Generating CUDA+Pthread

#parray {pthd [2]} P
#parray {paged float [2][[2048][4096]]} H
#parray {dmem float # H_1} D
#parray {[#P][#D]} G
float* host;
_pa_pthd* p;
#mainhost{
    #create P(p)
    #create H(host)
    #detour P(p) {
        float* dev;
        INIT_GPU($tid$);
        #create D(dev)
        #insert DataTransfer(dev, G, host, H){}
    }
    #destroy H(host)
    #destroy P(p)
}

Generated Pthread calls (slide labels): pthread_create, sem_post, sem_wait, pthread_join
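The slide names four Pthread calls but the generated code itself is not shown. The sketch below is therefore only a guess at the general shape of such code, not PARRAY's actual output: two host threads each copy their half of a host array into a private buffer, signal completion through a semaphore, and are then joined. The names `Worker`, `worker_main` and `scatter_with_threads` are hypothetical, plain `memcpy` stands in for a host-to-device copy, and POSIX unnamed semaphores (as on Linux) are assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <string.h>

enum { NTHREADS = 2 };   /* matches the [2] in "pthd [2]" */

typedef struct {
    int tid;             /* worker index, plays the role of $tid$ */
    const float *host;   /* whole host array: NTHREADS chunks back to back */
    float *dev;          /* stand-in for this worker's device buffer */
    size_t chunk;        /* elements per worker */
    sem_t *done;         /* posted once the copy has finished */
} Worker;

static void *worker_main(void *arg) {
    Worker *w = (Worker *)arg;
    /* A CUDA version would select GPU w->tid and copy to device memory here. */
    memcpy(w->dev, w->host + (size_t)w->tid * w->chunk, w->chunk * sizeof(float));
    sem_post(w->done);
    return NULL;
}

/* Copies chunk i of host into dev[i], one thread per chunk. Returns 0 on success. */
int scatter_with_threads(const float *host, float *dev[NTHREADS], size_t chunk) {
    assert(host != NULL && chunk > 0);
    pthread_t th[NTHREADS];
    Worker w[NTHREADS];
    sem_t done;
    int started = 0;
    if (sem_init(&done, 0, 0) != 0) return -1;
    for (int i = 0; i < NTHREADS; i++) {
        w[i] = (Worker){ i, host, dev[i], chunk, &done };
        if (pthread_create(&th[i], NULL, worker_main, &w[i]) != 0) break;
        started++;
    }
    for (int i = 0; i < started; i++) sem_wait(&done);           /* wait for every copy */
    for (int i = 0; i < started; i++) pthread_join(th[i], NULL); /* reclaim the threads */
    sem_destroy(&done);
    return started == NTHREADS ? 0 : -1;
}
```

Compared with this hand-written version, the PARRAY source on the slide expresses the same scatter through the array types `H`, `D` and `G` and a single `DataTransfer` command.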


Generating MPI or IB/verbs

#parray { mpi [2] } M
#parray { paged float [2][[2048][4096]] } H
#parray { [#M][#H_1] } G

float* host;
_pa_mpi* m;

#mainhosts{
    #create M(m)
    #create H(host)
    #detour M(m) {
        float* dev;
        #create H_1(dev)
        #insert DataTransfer(dev, G, host, H){}
    }
    #destroy H(host)
    #destroy M(m)
}

Generated MPI call (slide label): MPI_Scatter


Other Communication Patterns

• ALLTOALL
• BCAST
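The slide's figures for these two patterns did not survive extraction. As a reminder of what the patterns mean, the following C sketch models both on a single process, with "ranks" represented as consecutive buffers in one array. It is a model of the data movement only (the MPI counterparts are `MPI_Alltoall` and `MPI_Bcast`); the function names `model_alltoall` and `model_bcast` are hypothetical and nothing here is PARRAY-generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* send and recv each hold nranks buffers laid out back to back;
   every buffer consists of nranks blocks of `block` floats.
   After the call, block s of rank r's recv buffer is block r of rank s's send buffer. */
void model_alltoall(const float *send, float *recv, size_t nranks, size_t block) {
    assert(send != recv);                      /* in-place exchange is not modelled */
    size_t per_rank = nranks * block;
    for (size_t src = 0; src < nranks; src++)
        for (size_t dst = 0; dst < nranks; dst++)
            memcpy(recv + dst * per_rank + src * block,
                   send + src * per_rank + dst * block,
                   block * sizeof(float));
}

/* bufs holds nranks buffers of `count` floats; the root's buffer is copied to all others. */
void model_bcast(float *bufs, size_t nranks, size_t count, size_t root) {
    assert(root < nranks);
    for (size_t r = 0; r < nranks; r++)
        if (r != root)
            memcpy(bufs + r * count, bufs + root * count, count * sizeof(float));
}
```

Seen this way, all-to-all is a transposition of the (rank, block) index pair, which is the kind of index rearrangement an array-type notation can describe.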


One-Line CUDA Code


Large-Scale FFT in 20 lines
• Deeply optimized algorithm (ICS 2010)
• Zero-copy for hmem


(Before Nov 2011)


Direct Simulation of Turbulent Flows

Scale
• Up to 14336 3D Single-Precision
• 12 distributed arrays, each with 11 TB data (128 TB total)
• Entire Tianhe-1A with 7168 nodes

Progress
• 4096 3D completed
• 8192 3D half-way and 14336 3D tested for performance.

Software Technologies
• PARRAY (ACM PPoPP’12) code only 300 lines.
• Programming-level resilience technology for stable computation

Conclusion: GPU-accelerated large simulation on entire Tianhe-1A is feasible.
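The data sizes quoted under Scale can be checked with a few lines of C. The sketch assumes that "14336 3D" means a grid of 14336 points in each of three dimensions, that single precision means 4 bytes per element, and that TB is read as 2^40 bytes; none of these readings is stated on the slide. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for an n x n x n grid with elem_bytes bytes per element. */
uint64_t cubic_grid_bytes(uint64_t n, uint64_t elem_bytes) {
    assert(n > 0 && elem_bytes > 0);
    return n * n * n * elem_bytes;
}

/* Convert a byte count to units of 2^40 bytes. */
double bytes_to_tib(uint64_t bytes) {
    return (double)bytes / (double)((uint64_t)1 << 40);
}
```

Under those assumptions, `bytes_to_tib(cubic_grid_bytes(14336, 4))` is about 10.7 per array and twelve such arrays come to about 128.6, which agrees with the slide's rounded figures of 11 TB and 128 TB.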


Discussions

Can other programming models benefit from PARRAY ideas?
• MPI (more expressive datatype)
• OpenACC (optimization for coalescing accesses)
• PGAS (generating PGAS library calls)
• IB/verbs (directly generating Zero-Copy IB calls)

PARRAY helps, if you can write it down!
• Any index expressions using add/mul/mod/div
• Irregular structures must be encoded into arrays and then benefit from PARRAY.
• Generating Pthread + CUDA + MPI (future support of FPGA and MIC possible) + macros
• Macros are compiled out: no performance loss.
• Typical training = 3 days. Friendly to engineers, geophysicists…
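As an illustration of an index expression built only from add, mul, mod and div, the C sketch below transposes a row-major matrix by mapping each destination offset to a source offset. It is plain C rather than PARRAY notation, and the names `transpose_src_offset` and `transpose` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* For a row-major rows x cols array, return the source offset of the element
   that lands at offset i in the transposed (cols x rows) array. */
size_t transpose_src_offset(size_t i, size_t rows, size_t cols) {
    assert(rows > 0 && cols > 0 && i < rows * cols);
    size_t r = i % rows;   /* row of the element in the source */
    size_t c = i / rows;   /* column of the element in the source */
    return r * cols + c;
}

/* Apply the index expression element by element; src and dst must not overlap. */
void transpose(const float *src, float *dst, size_t rows, size_t cols) {
    for (size_t i = 0; i < rows * cols; i++)
        dst[i] = src[transpose_src_offset(i, rows, cols)];
}
```

Because the mapping is a closed-form expression in `i`, it can be evaluated independently for every element, which is what makes such layouts suitable for macro expansion into GPU or communication code.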
