106
ONEAPI SINGLE PROGRAMMING MODEL TO DELIVER CROSS-ARCHITECTURE PERFORMANCE MODULE 2 INTRODUCTION TO DPC++

oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

ONEAPISINGLE PROGRAMMINGMODEL TO DELIVER CROSS-ARCHITECTURE PERFORMANCE

MODULE 2INTRODUCTION TO DPC++

Page 2: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

ONEAPI TRAINING SERIES2

▷ Module 1: Getting Started with oneAPI▷ Module 2: Introduction to DPC++▷ Module 3: Fundamentals of DPC++, part 1 of 2▷ Module 4: Fundamentals of DPC++, part 2 of 2▷ Modules 5+: Deeper dives into specific DPC++ features,oneAPI libraries and tools

https://oneapi.comhttps://software.intel.com/en-us/oneapi

https://tinyurl.com/book-dpcpphttp://tinyurl.com/oneapimodule?2

oneAPI module 2: Introduction to DPC++ 2 / 97

Page 3: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

1 DPC++, SYCL, C++, and a Heterogeneous Universe

2 Anatomy of a DPC++ program

3 Revisit Doubler - DPC++

4 Device Selection in DPC++

5 What is Data Parallel Computing?

6 DevCloud - Try oneAPI easily

oneAPI module 2: Introduction to DPC++ 3 / 97

Page 4: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

RESOURCES4

▷ Book (Chapters 1-4 Preview)▷ oneAPI Toolkit(s)▷ Training, Support, Forums,Example Code

All availableFree

https://software.intel.com/en-us/oneapi https://tinyurl.com/book-dpcpphttp://tinyurl.com/oneapimodule?2

oneAPI module 2: Introduction to DPC++ 4 / 97

Page 5: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

§1. DPC++, SYCL, C++, AND A HETEROGENEOUS UNIVERSE

Page 6: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

1 DPC++, SYCL, C++, and a Heterogeneous Universe

2 Anatomy of a DPC++ program

3 Revisit Doubler - DPC++

4 Device Selection in DPC++

5 What is Data Parallel Computing?

6 DevCloud - Try oneAPI easily

oneAPI module 2: Introduction to DPC++ 6 / 97

Page 7: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

WHAT IS DPC++?7

DPC++ implements cross-platform data parallelism support (extends C++).

▷ adheres to the SYCL specification▷ implements cross-platform abstraction layer for dataparallelism

▷ open source implementation (github) with all featuressupported

▷ utilizes Clang and LLVM▷ product implementation, support, and tools available fromIntel

▷ DPC++ book in progress - first four chapters available (free)

oneAPI module 2: Introduction to DPC++ 7 / 97

Page 8: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

WHAT IS SYCL?8

SYCL is an industry-wide standardization effort to definecross-platform data parallelism support for C++.▷ pronounced `sickle' `sick ell' /"sik(@)l/▷ cross-platform abstraction layer for data parallelism▷ single source programming▷ extends modern C++▷ defined by a Khronos standards group▷ Intel is a participant in the standards group, as are many more▷ Most of DPC++ is already part of SYCL▷ Intel's contributes back new additions

oneAPI module 2: Introduction to DPC++ 8 / 97

Page 9: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

SYCL AND C++9

▷ SYCL is based on purely modern C++.▷ SYCL should feel familiar to C++11 programmers.▷ SYCL will run ahead of C++ standardization for support ofheterogeneous platforms and parallelism.

▷ ISO C++ of the future, may look a lot like SYCL.

oneAPI module 2: Introduction to DPC++ 9 / 97

Page 10: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DIVERSEWORKLOADS DEMAND DIVERSE ARCHITECTURES10

The future is adiversemix of scalar, vector, matrix, and spatialarchitectures deployed in CPU, GPU, AI, FPGA, and other accelerators.

oneAPI module 2: Introduction to DPC++ 10 / 97

Page 11: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++11

The same programming language cansupport all SVMS architectures.

oneAPI module 2: Introduction to DPC++ 11 / 97

Page 12: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++12

Data Parallel C++provides the features and abstractionnecessary to deliver uncompromisedperformance on SVMS architectures.

oneAPI module 2: Introduction to DPC++ 12 / 97

Page 13: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

PROGRAMMING IN DPC++13

DPC++ implements cross-platform data parallelism support (extends C++).

▷ Write `kernels'▷ Control when/where/how they might be accelerated

oneAPI module 2: Introduction to DPC++ 13 / 97

Page 14: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

§2. ANATOMY OF A DPC++ PROGRAM

Page 15: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

1 DPC++, SYCL, C++, and a Heterogeneous Universe

2 Anatomy of a DPC++ program

3 Revisit Doubler - DPC++

4 Device Selection in DPC++

5 What is Data Parallel Computing?

6 DevCloud - Try oneAPI easily

oneAPI module 2: Introduction to DPC++ 15 / 97

Page 16: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

SOFTWARE PROGRAMMINGMODEL16

▷ Platform model: what to program (host and devices)▷ Execution model: how to control (command queues)▷ Kernel model: how to program (kernels)▷ Memory model: how to feed with data (memory model)

oneAPI module 2: Introduction to DPC++ 16 / 97

Page 17: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

COMPILATION AND EXECUTION17

oneAPI module 2: Introduction to DPC++ 17 / 97

Page 18: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

COMPILATION18

oneAPI module 2: Introduction to DPC++ 18 / 97

Page 19: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

EXECUTION19

oneAPI module 2: Introduction to DPC++ 19 / 97

Page 20: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

EXECUTION - HOST20

oneAPI module 2: Introduction to DPC++ 20 / 97

Page 21: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

EXECUTION - DEVICE(S)21

oneAPI module 2: Introduction to DPC++ 21 / 97

Page 22: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

EXECUTION22

oneAPI module 2: Introduction to DPC++ 22 / 97

Page 23: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

PLATFORM TERMINOLOGY23

▷ Platform = Host + Devices▷ Host makes API calls (queue work, define memory)▷ Devices run Kernels▷ host device always available

oneAPI module 2: Introduction to DPC++ 23 / 97

Page 24: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

PLATFORM INFORMATION24

There are three important sets of capabilities to consider for a SYCL device:the platform version,the version of a device, andthe extensions supported.

oneAPI module 2: Introduction to DPC++ 24 / 97

Page 25: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DEVCLOUD25

https://software.intel.com/en-us/devcloud/oneapi

oneAPI module 2: Introduction to DPC++ 25 / 97

Page 26: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

PUZZLES26

code for this module: http://tinyurl.com/oneapimodule?2

oneAPI module 2: Introduction to DPC++ 26 / 97

Page 27: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

JUMP ON DEVCLOUD27

$ ssh devcloud...login to the devcloud...

$ wget tinyurl.com/oneapimodule?2 -O 2.tz$ tar xvfz 2.tz...fetch and unpack code I'll be playing with for module 2...

$ pbsnodes -l free...list of free nodes...

$ pbsnodes s001-145...information about node s001-145...

$ pbsnodes | more...lots more detail...$ pbsnodes | grep properties...useful properties list...$ pbsnodes | grep fpga...useful fpga oriented list...

oneAPI module 2: Introduction to DPC++ 27 / 97

Page 28: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

HELLO QSUB28

$ mkdir mytst$ cd mytst

$ cat - > myhello.shecho "HELLO, WORLD!"^D

$ qsub myhello.sh

$ qstatJob ID Name ...------ ----------3463 myhello.sh ...

$ qstat

oneAPI module 2: Introduction to DPC++ 28 / 97

Page 29: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

HELLO QSUB29

$ ls -ltotal 8-rw-r--r-- 1 u27938 u27938 21 Oct 14 22:58 myhello.sh-rw------- 1 u27938 u27938 0 Oct 14 22:58 myhello.sh.e3463-rw------- 1 u27938 u27938 603 Oct 14 22:58 myhello.sh.o3463

$ cat myhello.sh.o3463

######################################################################### Date: Mon Oct 14 22:58:58 PDT 2019# Job ID: 3463.v-qsvr-nda.aidevcloud# User: u27938# Resources: neednodes=1:ppn=2,nodes=1:ppn=2,walltime=06:00:00########################################################################

HELLO, WORLD!

######################################################################### End of output for job 3463.v-qsvr-nda.aidevcloud# Date: Mon Oct 14 22:58:59 PDT 2019########################################################################

oneAPI module 2: Introduction to DPC++ 29 / 97

Page 30: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

HELLO QSUB30

...use a particular node...$ qsub -lnodes=s001-n155:ppn=2

...use a node based on a property...$ qsub -lnodes=1:ppn=2:fpga_compile$ qsub -lnodes=1:ppn=2:gpu$ qsub -lnodes=1:ppn=2:skl$ qsub -lnodes=1:ppn=2:cfl

Command docs: https://tinyurl.com/pbsnodes

oneAPI module 2: Introduction to DPC++ 30 / 97

Page 31: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

CURIOUS2.CPP - SOURCE CODE31

#include <CL/sycl.hpp>

using namespace cl::sycl;

int main() {unsigned number = 0;auto MyPlatforms =

platform::get_platforms();

/* Loop through the platformsSYCL can findthere is always ONE */

for (auto &OnePlatform : MyPlatforms) {std::cout<< ++number << " found..."<< std::endl<< "Platform: "<< OnePlatform.get_info<info::platform::name>()<< std::endl;

/* Loop through the devices SYCL can findthere is always ONE */

auto MyDevices = OnePlatform.get_devices();for (auto &OneDevice : MyDevices ) {std::cout<< " Device: "<< OneDevice.get_info<info::device::name>()<< std::endl;

}std::cout << std::endl;

}}

oneAPI module 2: Introduction to DPC++ 31 / 97

Page 32: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

CURIOUS2.CPP - DOWNLOAD32

...if you did not already fetch and unpack code for module 2...$ wget tinyurl.com/oneapimodule?2 -O 2.tz$ tar xvfz 2.tz

$ cd module02

$ cat mycode.sh#!/bin/shif [ -f /opt/intel/inteloneapi/setvars.sh ]; then

source /opt/intel/inteloneapi/setvars.sh 2>/dev/null 1>/dev/nullficd module02make cleanmake curious2./curious2

oneAPI module 2: Introduction to DPC++ 32 / 97

Page 33: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

CURIOUS2.CPP - RUN33

$ qsub mycode.sh3592...

$ qstat 3592Job ID Name-------- ---------3592... mycode.sh

$ qstat 3592qstat: Unknown Job Id Error 3592...

$ ls -l *3592*-rw------- 1 u27938 u27938 0 Oct 15 13:40 mycode.sh.e3592-rw------- 1 u27938 u27938 1072 Oct 15 13:40 mycode.sh.o3592

oneAPI module 2: Introduction to DPC++ 33 / 97

Page 34: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

CURIOUS2.CPP - RESULTS 1 OF 234

$ cat mycode.sh.o3592######################################################################### Date: Tue Oct 15 13:40:17 PDT 2019# Job ID: 3592.v-qsvr-nda.aidevcloud# User: u27938# Resources: neednodes=1:ppn=2,nodes=1:ppn=2,walltime=06:00:00########################################################################

rm -f curious curious2 verycurious doubler doubler2 doubler3 *~ *.o a.outdpcpp curious2.cpp -o curious21 found...Platform: Intel(R) FPGA Emulation Platform for OpenCL(TM)Device: Intel(R) FPGA Emulation Device

2 found...Platform: Intel(R) FPGA SDK for OpenCL(TM)Device: pac_a10 : Intel PAC Platform (pac_ee00000)

3 found...Platform: Intel(R) OpenCLDevice: Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz

oneAPI module 2: Introduction to DPC++ 34 / 97

Page 35: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

CURIOUS2.CPP - RESULTS 2 OF 235

3 found...Platform: Intel(R) OpenCLDevice: Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz

4 found...Platform: SYCL host platformDevice: SYCL host device

######################################################################### End of output for job 3592.v-qsvr-nda.aidevcloud# Date: Tue Oct 15 13:40:26 PDT 2019

oneAPI module 2: Introduction to DPC++ 35 / 97

Page 36: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

VERYCURIOUS.CPP36

#include <CL/sycl.hpp>

int main() {unsigned number = 0;auto MyPlatforms = cl::sycl::platform::get_platforms();

/* Loop through the platforms SYCL can findthere is always ONE */

for (auto &OnePlatform : MyPlatforms) {std::cout << ++number << " found..."

<< std::endl<< "Platform: "<< OnePlatform.get_info<cl::sycl::info::platform::name>()<< std::endl;

/* Loop through the devices SYCL can findthere is always ONE */

auto MyDevices = OnePlatform.get_devices();for (auto &OneDevice : MyDevices ) {

std::cout << " Device: "<< OneDevice.get_info<cl::sycl::info::device::name>()<< std::endl;

}std::cout << std::endl;

}}

oneAPI module 2: Introduction to DPC++ 36 / 97

Page 37: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

VERYCURIOUS - PLATFORM 1 OF 437

$ make verycuriousdpcpp verycurious.cpp -o verycurious

$ ./verycurious1 found...Platform:

cl::sycl::info::platform::profile is 'EMBEDDED_PROFILE'cl::sycl::info::platform::version is 'OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 19.2'cl::sycl::info::platform::name is 'Intel(R) FPGA Emulation Platform for OpenCL(TM)'cl::sycl::info::platform::vendor is 'Intel(R) Corporation'cl::sycl::info::platform::extensions is :

1) cl_khr_icd2) cl_khr_byte_addressable_store3) cl_intel_fpga_host_pipe4) cles_khr_int645) cl_khr_il_program

Device: Intel(R) FPGA Emulation Deviceis_host() = Nois_cpu() = Nois_gpu() = Nois_accelerator() = Yescl::sycl::info::device::vendor_id is '4466'cl::sycl::info::device::max_compute_units is '12'cl::sycl::info::device::max_work_item_dimensions is '3'...cl::sycl::info::device::name is 'Intel(R) FPGA Emulation Device'cl::sycl::info::device::vendor is 'Intel(R) Corporation'cl::sycl::info::device::driver_version is '2019.8.9.0'...

oneAPI module 2: Introduction to DPC++ 37 / 97

Page 38: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

VERYCURIOUS - PLATFORM 2 OF 438

2 found...Platform:

cl::sycl::info::platform::profile is 'FULL_PROFILE'cl::sycl::info::platform::version is 'OpenCL 2.1 'cl::sycl::info::platform::name is 'Intel(R) OpenCL HD Graphics'cl::sycl::info::platform::vendor is 'Intel(R) Corporation'cl::sycl::info::platform::extensions is :

1) cl_khr_3d_image_writes2) cl_khr_byte_addressable_store3) cl_khr_fp164) cl_khr_depth_images5) cl_khr_global_int32_base_atomics

...37) cl_intel_advanced_motion_estimation38) cl_intel_va_api_media_sharing

Device: Intel(R) Gen9 HD Graphics NEOis_host() = Nois_cpu() = Nois_gpu() = Yesis_accelerator() = Nocl::sycl::info::device::vendor_id is '32902'cl::sycl::info::device::max_compute_units is '24'cl::sycl::info::device::max_work_item_dimensions is '3'...cl::sycl::info::device::name is 'Intel(R) Gen9 HD Graphics NEO'cl::sycl::info::device::vendor is 'Intel(R) Corporation'cl::sycl::info::device::driver_version is '19.34.13890'...

oneAPI module 2: Introduction to DPC++ 38 / 97

Page 39: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

VERYCURIOUS - PLATFORM 3 OF 439

3 found...Platform:

cl::sycl::info::platform::profile is 'FULL_PROFILE'cl::sycl::info::platform::version is 'OpenCL 2.1 LINUX'cl::sycl::info::platform::name is 'Intel(R) OpenCL'cl::sycl::info::platform::vendor is 'Intel(R) Corporation'cl::sycl::info::platform::extensions is :

1) cl_khr_icd2) cl_khr_global_int32_base_atomics3) cl_khr_global_int32_extended_atomics4) cl_khr_local_int32_base_atomics5) cl_khr_local_int32_extended_atomics

...16) cl_khr_fp6417) cl_khr_image2d_from_buffer

Device: Intel(R) Xeon(R) E-2176G CPU @ 3.70GHzis_host() = Nois_cpu() = Yesis_gpu() = Nois_accelerator() = Nocl::sycl::info::device::vendor_id is '32902'cl::sycl::info::device::max_compute_units is '12'cl::sycl::info::device::max_work_item_dimensions is '3'...cl::sycl::info::device::name is 'Intel(R) Xeon(R) E-2176G CPU @ 3.70GHz'cl::sycl::info::device::vendor is 'Intel(R) Corporation'cl::sycl::info::device::driver_version is '2019.9.9.0'...

oneAPI module 2: Introduction to DPC++ 39 / 97

Page 40: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

VERYCURIOUS - PLATFORM 4 OF 440

4 found...Platform:

cl::sycl::info::platform::profile is 'FULL PROFILE'cl::sycl::info::platform::version is '1.2'cl::sycl::info::platform::name is 'SYCL host platform'cl::sycl::info::platform::vendor is ''cl::sycl::info::platform::extensions is :NO extensions

Device: SYCL host deviceis_host() = Yesis_cpu() = Nois_gpu() = Nois_accelerator() = Nocl::sycl::info::device::vendor_id is '32902'cl::sycl::info::device::max_compute_units is '12'cl::sycl::info::device::max_work_item_dimensions is '3'...cl::sycl::info::device::name is 'SYCL host device'cl::sycl::info::device::vendor is ''cl::sycl::info::device::driver_version is '1.2'...

oneAPI module 2: Introduction to DPC++ 40 / 97

Page 41: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

COMMAND GROUPS AND QUEUES41

▷ Host code creates Queue(s) to command devices▷ Multiple Queues allowed per Device▷ but - a Queue cannot command multiple devices

▷ Command Groups are submitted into Queues

oneAPI module 2: Introduction to DPC++ 41 / 97

Page 42: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

COMMAND GROUPS AND QUEUES41

▷ Host code creates Queue(s) to command devices▷ Multiple Queues allowed per Device▷ but - a Queue cannot command multiple devices

▷ Command Groups are submitted into Queues

oneAPI module 2: Introduction to DPC++ 41 / 97

Page 43: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

RESTRICTIONS ON KERNEL CODE42

Supported include:

▷ lambdas▷ operator overloading▷ templates▷ classes▷ static polymorphism▷ share data with host via

accessors▷ read-only values of host

variables subject via lambdacaptures

Not supported:

▷ dynamic polymorphism▷ dynamic memory allocations▷ static variables▷ function pointers▷ pointer structure members▷ runtime type information▷ exception handling

oneAPI module 2: Introduction to DPC++ 42 / 97

Page 44: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

§3. REVISIT DOUBLER - DPC++

Page 45: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

1 DPC++, SYCL, C++, and a Heterogeneous Universe

2 Anatomy of a DPC++ program

3 Revisit Doubler - DPC++

4 Device Selection in DPC++

5 What is Data Parallel Computing?

6 DevCloud - Try oneAPI easily

oneAPI module 2: Introduction to DPC++ 44 / 97

Page 46: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

"HELLO DOUBLER" DPC++45

#include <CL/sycl.hpp>#include <iostream>#include <array>#include <cstdio>#define SIZE 1024

using namespace cl::sycl;

int main() {std::array<int, SIZE> myArray;for (int i = 0; i<SIZE; ++i)

myArray[i] = i;

printf("Value at start: myArray[42] is %d.\n",myArray[42]);{

queue myQ; /* use defaults today *//* (queue parameters possible - future topic) */

range<1> mySize{SIZE};buffer<int, 1> bufferA(myArray.data(), mySize);

myQ.submit([&](handler &myHandle) {auto deviceAccessorA =

bufferA.get_access<access::mode::read_write>(myHandle);myHandle.parallel_for<class uniqueID>(mySize,

[=](id<1> index){

deviceAccessorA[index] *= 2;}

);});

}printf("Value at finish: myArray[42] is %d.\n",myArray[42]);

}

oneAPI module 2: Introduction to DPC++ 45 / 97

Page 47: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

SCOPES46

1 printf("Value at start: myArray[42] is %d.\n",myArray[42]);2 {3 queue myQ; /* use defaults today */4 /* (queue parameters possible - future topic) */5 range<1> mySize{SIZE};6 buffer<int, 1> bufferA(myArray.data(), mySize);78 myQ.submit([&](handler &myHandle) {9 auto deviceAccessorA =

10 bufferA.get_access<access::mode::read_write>(myHandle);11 myHandle.parallel_for<class uniqueID>(mySize,12 [=](id<1> index)13 {14 deviceAccessorA[index] *= 2;15 }16 );17 });18 }19 printf("Value at finish: myArray[42] is %d.\n",myArray[42]);

▷ command scopelines 8-17

▷ kernel scopelines 11-16

▷ application scopeall else(default))

oneAPI module 2: Introduction to DPC++ 46 / 97

Page 48: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

SCOPES47

1 printf("Value at start: myArray[42] is %d.\n",myArray[42]);2 {3 queue myQ; /* use defaults today */4 /* (queue parameters possible - future topic) */5 range<1> mySize{SIZE};6 buffer<int, 1> bufferA(myArray.data(), mySize);78 myQ.submit([&](handler &myHandle) {9 auto deviceAccessorA =

10 bufferA.get_access<access::mode::read_write>(myHandle);11 myHandle.parallel_for<class uniqueID>(mySize,12 [=](id<1> index)13 {14 deviceAccessorA[index] *= 2;15 }16 );17 });18 }19 printf("Value at finish: myArray[42] is %d.\n",myArray[42]);

▷ command scopelines 8-17

▷ kernel scopelines 11-16

▷ application scopeall else(default))

oneAPI module 2: Introduction to DPC++ 47 / 97

Page 49: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

SCOPES48

1 printf("Value at start: myArray[42] is %d.\n",myArray[42]);2 {3 queue myQ; /* use defaults today */4 /* (queue parameters possible - future topic) */5 range<1> mySize{SIZE};6 buffer<int, 1> bufferA(myArray.data(), mySize);78 myQ.submit([&](handler &myHandle) {9 auto deviceAccessorA =

10 bufferA.get_access<access::mode::read_write>(myHandle);11 myHandle.parallel_for<class uniqueID>(mySize,12 [=](id<1> index)13 {14 deviceAccessorA[index] *= 2;15 }16 );17 });18 }19 printf("Value at finish: myArray[42] is %d.\n",myArray[42]);

▷ command scopelines 8-17

▷ kernel scopelines 11-16

▷ application scopeall else(default))

oneAPI module 2: Introduction to DPC++ 48 / 97

Page 50: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

SCOPES49

1 printf("Value at start: myArray[42] is %d.\n",myArray[42]);2 {3 queue myQ; /* use defaults today */4 /* (queue parameters possible - future topic) */5 range<1> mySize{SIZE};6 buffer<int, 1> bufferA(myArray.data(), mySize);78 myQ.submit([&](handler &myHandle) {9 auto deviceAccessorA =

10 bufferA.get_access<access::mode::read_write>(myHandle);11 myHandle.parallel_for<class uniqueID>(mySize,12 [=](id<1> index)13 {14 deviceAccessorA[index] *= 2;15 }16 );17 });18 }19 printf("Value at finish: myArray[42] is %d.\n",myArray[42]);

▷ command scopelines 8-17

▷ kernel scopelines 11-16

▷ application scopeall else(default))

oneAPI module 2: Introduction to DPC++ 49 / 97

Page 51: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

WHY THE { } BLOCK?50

1 printf("Value at start: myArray[42] is %d.\n",myArray[42]);2 {3 queue myQ; /* use defaults today */4 /* (queue parameters possible - future topic) */5 range<1> mySize{SIZE};6 buffer<int, 1> bufferA(myArray.data(), mySize);78 myQ.submit([&](handler &myHandle) {9 auto deviceAccessorA =

10 bufferA.get_access<access::mode::read_write>(myHandle);11 myHandle.parallel_for<class uniqueID>(mySize,12 [=](id<1> index)13 {14 deviceAccessorA[index] *= 2;15 }16 );17 });18 }19 printf("Value at finish: myArray[42] is %d.\n",myArray[42]);

▷ Putting all the SYCL workinside the { } blockensures all SYCL workhas concluded.

▷ Destruction of bufferA isa barrier (causes a wait)

oneAPI module 2: Introduction to DPC++ 50 / 97

Page 52: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

USE ACCESSORS51

1 printf("Value at start: myArray[42] is %d.\n",myArray[42]);2 {3 queue myQ; /* use defaults today */4 /* (queue parameters possible - future topic) */5 range<1> mySize{SIZE};6 buffer<int, 1> bufferA(myArray.data(), mySize);78 myQ.submit([&](handler &myHandle) {9 auto deviceAccessorA =

10 bufferA.get_access<access::mode::read_write>(myHandle);11 myHandle.parallel_for<class uniqueID>(mySize,12 [=](id<1> index)13 {14 deviceAccessorA[index] *= 2;15 }16 );17 });18 printf("Premature peek: myArray[42] is %d.\n",myArray[42]);19 }20 printf("Value at finish: myArray[42] is %d.\n",myArray[42]);

▷ When I ran it - it printed42, 42, and 84.

▷ The Premature line isnot defined.

▷ Should use an accessor!

oneAPI module 2: Introduction to DPC++ 51 / 97

Page 53: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

▷ Doubler, like other DPC++ kernels, can be mapped to all architectures.▷ The suitability of each architecture is algorithm dependent.

oneAPI module 2: Introduction to DPC++ 52 / 97

Page 54: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

PROGRAMMING IN DPC++53

DPC++ implements cross-platform data parallelism support (extends C++).

▷ Write `kernels'▷ Control when/where/how they might be accelerated

oneAPI module 2: Introduction to DPC++ 53 / 97

Page 55: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

§4. DEVICE SELECTION IN DPC++

Page 56: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

1 DPC++, SYCL, C++, and a Heterogeneous Universe

2 Anatomy of a DPC++ program

3 Revisit Doubler - DPC++

4 Device Selection in DPC++

5 What is Data Parallel Computing?

6 DevCloud - Try oneAPI easily

oneAPI module 2: Introduction to DPC++ 55 / 97

Page 57: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ DEVICE SELECTION56

▷ Algorithms▷ Devices

oneAPI module 2: Introduction to DPC++ 56 / 97

Page 58: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ DEVICE SELECTION57

▷ In a system withoutaccelerators, we usethe CPU.

▷ The grayed outdevices are not inthis system!

oneAPI module 2: Introduction to DPC++ 57 / 97

Page 59: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ DEVICE SELECTION58

▷ Even withaccelertors present,all algorithms can goto the CPU.

▷ In module 2 - we'lldiscuss the threeways this canhappen: Default,Host, or CPU.

oneAPI module 2: Introduction to DPC++ 58 / 97

Page 60: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ DEVICE SELECTION59

▷ We can controldevices to use.

▷ Some algorithmsuse the top device.

oneAPI module 2: Introduction to DPC++ 59 / 97

Page 61: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ DEVICE SELECTION60

▷ In a system with anaccelerator, but notthe one we wanted.

▷ Let's use the CPU.

oneAPI module 2: Introduction to DPC++ 60 / 97

Page 62: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ DEVICE SELECTION61

▷ Consider a differentprogram.

▷ This one hasalgorithms we havefigured out how tomap to multipledevices.

oneAPI module 2: Introduction to DPC++ 61 / 97

Page 63: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ DEVICE SELECTION62

▷ The top device canbe used.

▷ Algorithms M, N, andT have code tailoredfor it.

oneAPI module 2: Introduction to DPC++ 62 / 97

Page 64: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ DEVICE SELECTION63

▷ The bottom devicecan be used as well.

▷ Same algorithms(M,N,T) plusAlgorithm S.

oneAPI module 2: Introduction to DPC++ 63 / 97

Page 65: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ DEVICE SELECTION64

▷ A single systemmight have bothacceleratorsavailable.

▷ We can choose themapping we want -based on bestmatch, applicationbalance, datamovement, etc.

oneAPI module 2: Introduction to DPC++ 64 / 97

Page 66: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ DEVICE SELECTION65

▷ In a system with thetop device.

▷ Both programs canbe accelerated.

oneAPI module 2: Introduction to DPC++ 65 / 97

Page 67: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ DEVICE SELECTION66

▷ In a system with thebottom device.

▷ Only our secondexample had usesfor it.

oneAPI module 2: Introduction to DPC++ 66 / 97

Page 68: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ DEVICE SELECTION67

▷ In system with BOTHdevices.

▷ Both programs haveoptions.

oneAPI module 2: Introduction to DPC++ 67 / 97

Page 69: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ DEVICE SELECTION68

▷ In a system with NOaccelerators.

▷ Both programs usethe CPU only.

oneAPI module 2: Introduction to DPC++ 68 / 97

Page 70: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ DEVICE SELECTION69

▷ Reconsidering asystem with bothaccelerators.

▷ Choices for thesecond program caninclude both. Wehave full control inour program!

oneAPI module 2: Introduction to DPC++ 69 / 97

Page 71: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

ONEAPI FOR CROSS-ARCHITECTURE PERFORMANCE70

oneAPI module 2: Introduction to DPC++ 70 / 97

Page 72: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DATA PARALLEL C++ (DPC++)71

Standards-BasedCross-Architecture Language▷ Language to deliver uncompromised parallel programming productivity and

performance across CPUs and accelerators• Allows code reuse across hardware targets...• while permitting custom tuning for a specific accelerator• Open, cross-industry alternative to single architecture proprietary language

▷ Based on C++• Delivers C++ productivity benefits, using common and familiar C and C++ constructs• Incorporates SYCL (from the Khronos Group) to support data parallelism andheterogeneous programming

▷ Language enhancements being driven through community project• Extensions to simplify data parallel programming• Open and cooperative development for continued evolution

▷ Builds upon many years of experience in architecture and compilers

oneAPI module 2: Introduction to DPC++ 71 / 97

Page 73: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DATA PARALLEL C++ (DPC++)71

Standards-BasedCross-Architecture Language▷ Language to deliver uncompromised parallel programming productivity and

performance across CPUs and accelerators• Allows code reuse across hardware targets...• while permitting custom tuning for a specific accelerator• Open, cross-industry alternative to single architecture proprietary language

▷ Based on C++• Delivers C++ productivity benefits, using common and familiar C and C++ constructs• Incorporates SYCL (from the Khronos Group) to support data parallelism andheterogeneous programming

▷ Language enhancements being driven through community project• Extensions to simplify data parallel programming• Open and cooperative development for continued evolution

▷ Builds upon many years of experience in architecture and compilers

oneAPI module 2: Introduction to DPC++ 71 / 97

Page 74: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DATA PARALLEL C++ (DPC++)71

Standards-BasedCross-Architecture Language▷ Language to deliver uncompromised parallel programming productivity and

performance across CPUs and accelerators• Allows code reuse across hardware targets...• while permitting custom tuning for a specific accelerator• Open, cross-industry alternative to single architecture proprietary language

▷ Based on C++• Delivers C++ productivity benefits, using common and familiar C and C++ constructs• Incorporates SYCL (from the Khronos Group) to support data parallelism andheterogeneous programming

▷ Language enhancements being driven through community project• Extensions to simplify data parallel programming• Open and cooperative development for continued evolution

▷ Builds upon many years of experience in architecture and compilers

oneAPI module 2: Introduction to DPC++ 71 / 97

Page 75: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DATA PARALLEL C++ (DPC++)71

Standards-BasedCross-Architecture Language▷ Language to deliver uncompromised parallel programming productivity and

performance across CPUs and accelerators• Allows code reuse across hardware targets...• while permitting custom tuning for a specific accelerator• Open, cross-industry alternative to single architecture proprietary language

▷ Based on C++• Delivers C++ productivity benefits, using common and familiar C and C++ constructs• Incorporates SYCL (from the Khronos Group) to support data parallelism andheterogeneous programming

▷ Language enhancements being driven through community project• Extensions to simplify data parallel programming• Open and cooperative development for continued evolution

▷ Builds upon many years of experience in architecture and compilers

oneAPI module 2: Introduction to DPC++ 71 / 97

Page 76: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DATA PARALLEL C++ (DPC++)71

Standards-BasedCross-Architecture Language▷ Language to deliver uncompromised parallel programming productivity and

performance across CPUs and accelerators• Allows code reuse across hardware targets...• while permitting custom tuning for a specific accelerator• Open, cross-industry alternative to single architecture proprietary language

▷ Based on C++• Delivers C++ productivity benefits, using common and familiar C and C++ constructs• Incorporates SYCL (from the Khronos Group) to support data parallelism andheterogeneous programming

▷ Language enhancements being driven through community project• Extensions to simplify data parallel programming• Open and cooperative development for continued evolution

▷ Builds upon many years of experience in architecture and compilers

oneAPI module 2: Introduction to DPC++ 71 / 97

Page 77: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DATA PARALLEL C++ (DPC++)71

Standards-BasedCross-Architecture Language▷ Language to deliver uncompromised parallel programming productivity and

performance across CPUs and accelerators• Allows code reuse across hardware targets...• while permitting custom tuning for a specific accelerator• Open, cross-industry alternative to single architecture proprietary language

▷ Based on C++• Delivers C++ productivity benefits, using common and familiar C and C++ constructs• Incorporates SYCL (from the Khronos Group) to support data parallelism andheterogeneous programming

▷ Language enhancements being driven through community project• Extensions to simplify data parallel programming• Open and cooperative development for continued evolution

▷ Builds upon many years of experience in architecture and compilers

oneAPI module 2: Introduction to DPC++ 71 / 97

Page 78: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

WHY NOT AN EXISTING LANGUAGE?72

▷ Portable languages are either serial (C++, Fortran, Java, ...) orhigh-level (MATLAB, Python, ...).

▷ Data Parallel languages are either proprietary (CUDA) orlow-level (OpenCL).

▷ Intel has developed DPC++ by building on C++, embracingopen standard SYCL, and filling in real-world gaps withsolutions today.

▷ The result is open, and available today to enable data parallelprogramming across SVMS architectures.

oneAPI module 2: Introduction to DPC++ 72 / 97

Page 79: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

WHY NOT AN EXISTING LANGUAGE?72

▷ Portable languages are either serial (C++, Fortran, Java, ...) orhigh-level (MATLAB, Python, ...).

▷ Data Parallel languages are either proprietary (CUDA) orlow-level (OpenCL).

▷ Intel has developed DPC++ by building on C++, embracingopen standard SYCL, and filling in real-world gaps withsolutions today.

▷ The result is open, and available today to enable data parallelprogramming across SVMS architectures.

oneAPI module 2: Introduction to DPC++ 72 / 97

Page 80: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

WHY NOT AN EXISTING LANGUAGE?72

▷ Portable languages are either serial (C++, Fortran, Java, ...) orhigh-level (MATLAB, Python, ...).

▷ Data Parallel languages are either proprietary (CUDA) orlow-level (OpenCL).

▷ Intel has developed DPC++ by building on C++, embracingopen standard SYCL, and filling in real-world gaps withsolutions today.

▷ The result is open, and available today to enable data parallelprogramming across SVMS architectures.

oneAPI module 2: Introduction to DPC++ 72 / 97

Page 81: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DATA PARALLEL C++73

The language is:

C+++

SYCL+

Additional Features

oneAPI module 2: Introduction to DPC++ 73 / 97

Page 82: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DATA PARALLEL C++73

The language is:

C+++

SYCL+

Additional Featuressuch as...

ndrange subgroups, USM, ordered queue

oneAPI module 2: Introduction to DPC++ 73 / 97

Page 83: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DATA PARALLEL C++74

The implementation is:

Clang+

LLVM+

Runtime

oneAPI module 2: Introduction to DPC++ 74 / 97

Page 84: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

§5. WHAT IS DATA PARALLEL COMPUTING?

Page 85: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

WHAT IS DATA PARALLEL COMPUTING?76

oneAPI module 2: Introduction to DPC++ 76 / 97

Page 86: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

WHAT IS DATA PARALLEL COMPUTING?77

Independent Computations(preferably, a lot of them!)

oneAPI module 2: Introduction to DPC++ 77 / 97

Page 87: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

HARDWARE FOR DATA PARALLEL COMPUTING78

▷ Pipelining▷ SIMD (vector)▷ Multithreading▷ Multicore▷ Combinations of the above

IndependentComputations

(preferably, a lot of them!)

oneAPI module 2: Introduction to DPC++ 78 / 97

Page 88: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

HARDWARE FOR DATA PARALLEL COMPUTING79

▷ Pipelining▷ SIMD (vector)▷ Multithreading▷ Multicore▷ Combinations of the above

IndependentComputations

(preferably, a lot of them!)

Compiler and runtime map the IndependentComputations

onto various data parallel hardware(s).

oneAPI module 2: Introduction to DPC++ 79 / 97

Page 89: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

INTEL® XEON® SCALABLE PROCESSOR80

▷ Pipelining (1993)▷ SIMD (1996)▷ Multithreading(2002)

▷ Multicore (2005)▷ Combinationsof the above

oneAPI module 2: Introduction to DPC++ 80 / 97

Page 90: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

GRAPHICS81

▷ Pipelining▷ SIMD (vector)▷ Multithreading▷ Multicore▷ Combinations of the above

oneAPI module 2: Introduction to DPC++ 81 / 97

Page 91: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

FPGAS82

▷ Compiler sythesizes customSIMD pipeline, and replicates

▷ Effectively Pipelining +SIMD + Multi*

oneAPI module 2: Introduction to DPC++ 82 / 97

Page 92: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++83

The same programming language cansupport all SVMS architectures.

oneAPI module 2: Introduction to DPC++ 83 / 97

Page 93: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++84

Data Parallel C++provides the features and abstractionnecessary to deliver uncompromisedperformance on SVMS architectures.

oneAPI module 2: Introduction to DPC++ 84 / 97

Page 94: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++85

▷ DPC++ can be completely built from open source (all features)▷ github project, community collaboration▷ Full documentation of all Intel® extensions▷ C++, and SYCL, of course are open and well defined as well▷ Goal of DPC++ open source project is tomerge with top trunk of LLVM and Clang

▷ Intel has product version of DPC++ as well (freely available)

oneAPI module 2: Introduction to DPC++ 85 / 97

Page 95: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

ONEAPI TRAINING SERIES86

▷ Module 1: Getting Started with oneAPI▷ Module 2: Introduction to DPC++▷ Module 3: Fundamentals of DPC++, part 1 of 2▷ Module 4: Fundamentals of DPC++, part 2 of 2▷ Modules 5+: Deeper dives into specific DPC++ features,oneAPI libraries and tools

https://oneapi.comhttps://software.intel.com/en-us/oneapi

https://tinyurl.com/book-dpcpphttp://tinyurl.com/oneapimodule?2

oneAPI module 2: Introduction to DPC++ 86 / 97

Page 96: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ - READY TO DIVE IN87

▷ Modules 3 and 4 dive into DPC++▷ We have covered the ``why'' and ``what'' for oneAPI andDPC++

oneAPI module 2: Introduction to DPC++ 87 / 97

Page 97: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ - READY TO DIVE IN88

▷ Modules 3 and 4 dive into DPC++▷ We have covered the ``why'' and ``what'' for oneAPI andDPC++

oneAPI module 2: Introduction to DPC++ 88 / 97

Page 98: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ - READY TO DIVE IN89

▷ Modules 3 and 4 dive into DPC++▷ We have covered the ``why'' and ``what'' for oneAPI andDPC++

oneAPI module 2: Introduction to DPC++ 89 / 97

Page 99: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ - READY TO DIVE IN90

▷ Modules 3 and 4 dive into DPC++▷ We have covered the ``why'' and ``what'' for oneAPI andDPC++

oneAPI module 2: Introduction to DPC++ 90 / 97

Page 100: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DPC++ - READY TO DIVE IN91

▷ Modules 3 and 4 dive into DPC++▷ We have covered the ``why'' and ``what'' for oneAPI andDPC++

oneAPI module 2: Introduction to DPC++ 91 / 97

Page 101: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

ONEAPI TRAINING SERIES92

▷ Module 1: Getting Started with oneAPI▷ Module 2: Introduction to DPC++▷ Module 3: Fundamentals of DPC++, part 1 of 2▷ Module 4: Fundamentals of DPC++, part 2 of 2▷ Modules 5+: Deeper dives into specific DPC++ features,oneAPI libraries and tools

https://oneapi.comhttps://software.intel.com/en-us/oneapi

https://tinyurl.com/book-dpcpphttp://tinyurl.com/oneapimodule?2

oneAPI module 2: Introduction to DPC++ 92 / 97

Page 102: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

§6. DEVCLOUD - TRY ONEAPI EASILY

Page 103: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

1 DPC++, SYCL, C++, and a Heterogeneous Universe

2 Anatomy of a DPC++ program

3 Revisit Doubler - DPC++

4 Device Selection in DPC++

5 What is Data Parallel Computing?

6 DevCloud - Try oneAPI easily

oneAPI module 2: Introduction to DPC++ 94 / 97

Page 104: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

DEVCLOUD95

https://software.intel.com/en-us/devcloud/oneapi

oneAPI module 2: Introduction to DPC++ 95 / 97

Page 105: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

RESOURCES96

▷ Book (Chapters 1-4 Preview)▷ oneAPI Toolkit(s)▷ Training, Support, Forums,Example Code

All availableFree

https://software.intel.com/en-us/oneapi https://tinyurl.com/book-dpcpphttp://tinyurl.com/oneapimodule?2

oneAPI module 2: Introduction to DPC++ 96 / 97

Page 106: oneAPI, Module 2 - Colfax International...ONEAPITRAININGSERIES 2 Module1:GettingStartedwithoneAPI Module2:IntroductiontoDPC++ Module3:FundamentalsofDPC++,part1of2 Module4:FundamentalsofDPC++,part2of2

NEXT FOR YOU97

▷ Sign up for DevCloud▷ Try the oneAPI toolkit▷ Watch module 1 if you skipped it▷ Join us for Modules 3+, to continue diving into DPC++

https://software.intel.com/en-us/oneapi

oneAPI module 2: Introduction to DPC++ 97 / 97