Languages such as JavaScript may receive a lot of hype nowadays, but for high-performance, close-to-the-metal computing, C++ is still king. This webinar takes you on a tour of the HPC universe, with a focus on parallelism, be it instruction-level (SIMD), data-level, task-based (multithreading, OpenMP), or cluster-based (MPI). We also discuss how specific hardware can significantly accelerate computation by looking at two such technologies: NVIDIA CUDA and Intel Xeon Phi. (Some scarier tech, such as FPGAs, is also mentioned.) These slides were used as part of the May 29, 2014 webinar, High-Performance Computing with C++. You can watch the webinar on the JetBrainsTV YouTube channel: http://youtu.be/JcSrwxDb-Fs
An overview of available technologies for computation
A look at managed vs. unmanaged code
How to leverage the capabilities of the x86 architecture
What COTS and specialized acceleration hardware exists and how to use it
Native code Managed code
More portable. But C++ is also portable, provided you do not use platform-specific things.
In theory, gets optimized for various platforms. In practice, this isn't great.
Does not permit low-level interaction with the processor.
Additional safety (managed): array bounds checks, type conversion checks, etc.
Not always portable (e.g. .NET is only partially portable, excluding UI, WCF, ...)
Typically supports garbage collection.
Has ways of interacting with native code (JNI, P/Invoke, C++/CLI).
Developer vs. software productivity?
Managed languages are simpler to use.
This talk focuses on CPU-bound problems
Some problems bottleneck on I/O
SSDs made things a lot better
Optimization mechanisms
Don't expect CPU clock speed to pick up
PC/server architecture does not scale
The only way to accelerate computation is to provide more entities to compute on.
Instruction-level
Thread-level
Machine-level
Via inline assembly
Via intrinsics
Compiler vectorization
Use magical compilers (e.g. the Intel SPMD Program Compiler, ispc)
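To make the intrinsics route concrete, here is a minimal sketch of element-wise array addition. It assumes an x86 target where the compiler defines `__SSE__` (GCC and Clang do on x86-64); elsewhere it falls back to a plain loop, which is also what a vectorizing compiler would start from. The function name `add_arrays` is made up for this illustration and is not from the webinar.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#endif

// Element-wise sum: out[i] = a[i] + b[i].
// add_arrays is a hypothetical name chosen for this sketch.
std::vector<float> add_arrays(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    std::size_t i = 0;
#if defined(__SSE__)
    // Process four floats per iteration in one 128-bit register.
    // Unaligned loads/stores are used because std::vector gives no
    // 16-byte alignment guarantee.
    for (; i + 4 <= a.size(); i += 4) {
        const __m128 va = _mm_loadu_ps(&a[i]);
        const __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&out[i], _mm_add_ps(va, vb));
    }
#endif
    // Scalar tail, or the whole array when SSE is not available.
    for (; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

Calling `add_arrays({1, 2, 3, 4, 5, 6}, {10, 20, 30, 40, 50, 60})` handles the first four elements in the SSE loop and the last two in the scalar tail.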
SIMD things
Processing data in an array
OpenMP
Intel Threading Building Blocks / Parallel Patterns Library (MS)
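As an illustration of processing an array with OpenMP, the sketch below sums the squares of a vector using a parallel loop with a reduction. It assumes the compiler is invoked with OpenMP enabled (for example `-fopenmp` on GCC or Clang); without that flag the pragma is ignored and the loop simply runs on one thread. The name `sum_of_squares` is hypothetical.

```cpp
#include <vector>

// Returns the sum of data[i] * data[i] over all elements.
// sum_of_squares is a hypothetical name chosen for this sketch.
double sum_of_squares(const std::vector<double>& data) {
    double total = 0.0;
    // A signed loop index keeps the loop valid for older OpenMP versions.
    const long n = static_cast<long>(data.size());
    // Each thread accumulates a private partial sum; OpenMP adds the
    // partial sums into total when the loop finishes.
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += data[i] * data[i];
    }
    return total;
}
```

The reduction clause is the important design choice here: without it, all threads would update `total` concurrently, which is a data race.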
GPGPU
Expansion boards
Custom chips
Hardware platforms: NVIDIA, ATI
Software platforms for computation: CUDA, OpenCL, C++ AMP
Typically 2; effectiveness drops off after that
PCI bus congestion, but depends on usage patterns
CUDA is the principal commercially successful GPGPU platform
CUDA is supported by many software manufacturers (Photoshop, MATLAB, etc.)
In many domains (e.g. video transcoding), the situation with GPU leveraging is dire
In terms of performance, it is thought that CUDA has better floating-point, AMD better integer math
CUDA is actually a managed technology
CUDA is not device-independent
CUDA C is the primary development language
A GPU has several streaming multiprocessors (SMs)
Each SM has lots of processors (SPs)
We can launch a large number of threads in parallel
The very large number of SPs ensures that even at lower clock speeds, the GPU wins out over the CPU
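The way a large number of threads maps onto data can be sketched on the CPU. The code below is plain C++, not CUDA: it emulates a one-dimensional grid of thread blocks with two nested loops and uses the same index formula a CUDA kernel would use (`blockIdx.x * blockDim.x + threadIdx.x`). The function name `scale_emulated` and the block size are assumptions for this illustration; on a real GPU the loops do not exist and each (block, thread) pair runs as its own thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU-side emulation of a 1D CUDA launch that multiplies every element
// by factor. scale_emulated is a hypothetical name for this sketch.
std::vector<float> scale_emulated(const std::vector<float>& in, float factor,
                                  std::size_t block_dim) {
    assert(block_dim > 0);
    const std::size_t n = in.size();
    // Round up so that a final, partially filled block is still launched.
    const std::size_t grid_dim = (n + block_dim - 1) / block_dim;
    std::vector<float> out(n);
    for (std::size_t block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (std::size_t thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            // Global element index of this (block, thread) pair.
            const std::size_t i = block_idx * block_dim + thread_idx;
            // Threads in the last block may fall past the end of the array.
            if (i < n) {
                out[i] = in[i] * factor;
            }
        }
    }
    return out;
}
```

With five elements and a block size of two, three blocks are launched and the sixth thread is masked out by the bounds check.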
A look at CUDA development
GPU does not support ordinary x86.
Running several tasks on a GPU is difficult.
Branch divergence: branching code (a simple if) makes threads in the same warp execute each path one after another, turning parallel computation into sequential.
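One common way to sidestep divergence is to express a choice as arithmetic rather than control flow, so that every thread runs the same instruction sequence. The sketch below shows the idea in plain C++ as an analogy, not as CUDA code. The function names are hypothetical, the select form assumes finite inputs (multiplying infinity by zero yields NaN), and whether the compiler emits a branch for either form is not guaranteed.

```cpp
#include <cstddef>
#include <vector>

// Branching form: each element takes one of two code paths.
std::vector<float> clamp_branching(const std::vector<float>& in, float limit) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] > limit) {
            out[i] = limit;
        } else {
            out[i] = in[i];
        }
    }
    return out;
}

// Select form: the comparison produces a 0/1 weight and both candidate
// values are blended, so the source contains no data-dependent branch.
std::vector<float> clamp_select(const std::vector<float>& in, float limit) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const float over = static_cast<float>(in[i] > limit);  // 1.0f or 0.0f
        out[i] = over * limit + (1.0f - over) * in[i];
    }
    return out;
}
```

Both functions return the same result for finite inputs; the second does slightly more arithmetic per element in exchange for a uniform execution path.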
How do you plug a few CPUs into a motherboard? You cannot.
The architecture doesn't scale. (And never will.)
An alternative is to put a coprocessor on the PCI bus
Commercial coprocessor implementation from Intel
PCI board with 60x cores
Supports x86!
Supports different technologies
Runs its own micro Linux (not a driver)
Can be used in either independent or offload mode
Requires special development tools (Intel C++ compiler)
Intel makes a lot of tools for C++ developers
To work with Xeon Phi, you need
Same as in ordinary PCs, i.e., OpenMP, MPI, pthreads
Other models coming soon
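Offload mode combined with OpenMP can be sketched with the standard OpenMP 4.0 `target` construct. This is an illustration under assumptions, not the toolchain setup from the webinar: it assumes a compiler with OpenMP target support and an available device. Without OpenMP enabled, the pragmas are ignored and the loop runs on the host. The function name `saxpy_offload` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Computes y[i] = a * x[i] + y[i], asking OpenMP to run the loop on an
// attached device when one is available.
// saxpy_offload is a hypothetical name chosen for this sketch.
void saxpy_offload(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const float* xp = x.data();
    float* yp = y.data();
    const long n = static_cast<long>(x.size());
    // map(to: ...) copies x to the device; map(tofrom: ...) copies y to
    // the device before the loop and back to the host afterwards.
    #pragma omp target map(to: xp[0:n]) map(tofrom: yp[0:n])
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The explicit `map` clauses matter because the coprocessor has its own memory: data has to cross the PCI bus in both directions, so offloading pays off only when the computation outweighs the transfer.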
FPGA: Field-Programmable Gate Array
Design your own CPU processing mechanic
Middle ground between a hard-wired ASIC and a very flexible general-purpose CPU
Uses special hardware description languages (HDLs): VHDL, Verilog.
There are others (SystemC, OpenCL) and higher-level solutions (e.g., MATLAB, Embeddr).
Intrinsically parallel
Low-power
Better scalability
Not a COTS solution
FPGA lets us offload some tasks from the CPU
FPGA is a lot less flexible. Not so good for math.
FPGA is a low-level construct.
FPGAs are relatively expensive to operate.
FPGAs do not directly compete with ordinary CPUs
They gain an advantage from their highly asynchronous nature
The goal is to pre-program an FPGA to solve a single problem very quickly
E.g., protocol parsing in hardware (a so-called feed handler)
JetBrains is working on the C++ IDE
And C++ support in ReSharper
Questions?