Intrinsics Lecture 1
Manfred Liebmann
Technische Universität München
Chair of Optimal Control
Center for Mathematical Sciences, M17
[email protected]
January 12, 2016




Programming with Intrinsics

What are intrinsics?

Intrinsics are functions that the compiler replaces with the proper assembly instructions. Intrinsics are primarily used to access the vector processing capabilities of modern CPUs.

• Long history of intrinsics

  – MMX: Multi Media Extensions, 8 x 64-bit registers (1997)
  – SSE/SSE2/SSE3/SSSE3/SSE4.x: Streaming SIMD Extensions, 8 x 128-bit registers (1999)
  – AVX/AVX2/FMA: Advanced Vector Extensions, 16 x 256-bit registers (2008)
  – AVX-512/KNC: Advanced Vector Extensions, 32 x 512-bit registers (2012)
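As a minimal sketch of what "the compiler replaces the call with the proper instruction" means, the function below adds four floats with one SSE addition. The name add4 is hypothetical; the intrinsics come from <xmmintrin.h>, and the sketch assumes an x86 target with SSE enabled (SSE is part of the x86-64 baseline).

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: out[i] = a[i] + b[i] for i = 0..3.
void add4(const float* a, const float* b, float* out)
{
    __m128 va   = _mm_loadu_ps(a);     // unaligned load of four floats
    __m128 vb   = _mm_loadu_ps(b);
    __m128 vsum = _mm_add_ps(va, vb);  // four additions at once (ADDPS)
    _mm_storeu_ps(out, vsum);          // unaligned store of four floats
}
```

The unaligned load and store variants are used so that the caller does not have to guarantee 16-byte alignment.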


Choose the Right Header!

Intrinsics are supported by all modern C/C++ compilers.

• Every generation has its own header!

  – #include <mmintrin.h>  // MMX
  – #include <xmmintrin.h> // SSE
  – #include <emmintrin.h> // SSE2
  – #include <pmmintrin.h> // SSE3
  – #include <tmmintrin.h> // SSSE3
  – #include <smmintrin.h> // SSE4.1
  – #include <nmmintrin.h> // SSE4.2
  – #include <ammintrin.h> // SSE4A
  – #include <wmmintrin.h> // AES
  – #include <immintrin.h> // AVX
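One way to pick a header at compile time is sketched below. It assumes the GCC/Clang convention of predefining __AVX__ and __SSE2__ when the corresponding instruction set is enabled (for example with -mavx); other compilers may signal this differently. The macro SIMD_WIDTH_FLOATS and the function simd_width_floats are hypothetical names used only for illustration.

```cpp
// Assumption: __AVX__ / __SSE2__ are predefined by the compiler when the
// instruction set is enabled on the command line (GCC/Clang convention).
#if defined(__AVX__)
  #include <immintrin.h>   // AVX
  #define SIMD_WIDTH_FLOATS 8
#elif defined(__SSE2__)
  #include <emmintrin.h>   // SSE2
  #define SIMD_WIDTH_FLOATS 4
#else
  #define SIMD_WIDTH_FLOATS 1  // no vector unit assumed, scalar fallback
#endif

// Hypothetical helper: number of floats processed per vector operation.
constexpr int simd_width_floats() { return SIMD_WIDTH_FLOATS; }
```

Code that loops over an array can then step by simd_width_floats() elements and select the matching intrinsics behind the same preprocessor checks.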


Advanced Vector Extensions (AVX)

Intel Advanced Vector Extensions (AVX) is a set of instructions for doing Single Instruction Multiple Data (SIMD) operations on Intel architecture CPUs. These instructions extend the previous SIMD offerings, MMX instructions and Intel Streaming SIMD Extensions (SSE).

Intel Intrinsics Guide

https://software.intel.com/sites/landingpage/IntrinsicsGuide/

Complete interactive reference for all intrinsic functions!

Instruction Set Architecture (ISA) Extensions

https://software.intel.com/en-us/isa-extensions
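Because AVX is an extension, not every x86 CPU provides it. The sketch below checks for it at run time. It assumes GCC or Clang on an x86 target, where __builtin_cpu_supports is available; other compilers need a different mechanism. The function name best_vector_isa is hypothetical.

```cpp
// Assumption: GCC/Clang on x86; __builtin_cpu_supports is a compiler builtin,
// not part of standard C++.
// Hypothetical helper: name of the widest vector extension the CPU reports.
const char* best_vector_isa()
{
    if (__builtin_cpu_supports("avx2")) return "AVX2";
    if (__builtin_cpu_supports("avx"))  return "AVX";
    if (__builtin_cpu_supports("sse2")) return "SSE2";
    return "scalar";
}
```

A program can use such a check to choose between an AVX code path and a fallback before calling any AVX intrinsic.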


Intel AVX Suffix Markings

All modern C++ compilers support the same intrinsic operations, which simplifies using Intel AVX from C or C++ code. Intrinsics are functions that the compiler replaces with the proper assembly instructions. Most Intel AVX intrinsic names have the following format:

_mm256_op_suffix(data_type param1, data_type param2, data_type param3)

where _mm256 is the prefix for working on the new 256-bit registers; op is the operation, like add for addition or sub for subtraction; and suffix denotes the type of data to operate on, with the first letters denoting packed (p), extended packed (ep), or scalar (s). The remaining letters are the types given in the table below.

• Suffix Markings

  [s/d]      : Single- or double-precision floating point
  [i/u]nnn   : Signed or unsigned integer of bit size nnn, where nnn is 128, 64, 32, 16, or 8
  [ps/pd/sd] : Packed single, packed double, or scalar double
  epi32      : Extended packed 32-bit signed integer
  si256      : Scalar 256-bit integer
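The naming scheme can be read off a few real intrinsics. The sketch below uses only AVX intrinsics from <immintrin.h>; the function name suffix_demo is hypothetical, and the target attribute is a GCC/Clang-specific way (an assumption here) to enable AVX for one function without compiling the whole file with -mavx.

```cpp
#include <immintrin.h>  // AVX intrinsics

// Assumption: GCC/Clang function attribute enabling AVX for this function only.
// Hypothetical demo: writes 8 floats, 4 doubles and 8 ints.
__attribute__((target("avx")))
void suffix_demo(float* f8, double* d4, int* i8)
{
    // _ps: packed single precision, eight floats per register
    __m256 a = _mm256_set1_ps(1.5f);
    _mm256_storeu_ps(f8, _mm256_add_ps(a, a));       // each lane: 1.5 + 1.5

    // _pd: packed double precision, four doubles per register
    __m256d c = _mm256_set1_pd(0.25);
    _mm256_storeu_pd(d4, _mm256_sub_pd(c, _mm256_set1_pd(1.0)));  // 0.25 - 1.0

    // _epi32: extended packed 32-bit signed integers, eight per register
    __m256i e = _mm256_set1_epi32(7);
    // _si256: the whole register treated as one 256-bit integer value
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(i8), e);
}
```

The function must only be called on a CPU that supports AVX.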


Intel AVX Intrinsics Data Types

• Data Types

  __m256  : 256-bit as eight single-precision floating-point values
  __m256d : 256-bit as four double-precision floating-point values
  __m256i : 256-bit as integers (bytes, words, etc.)
  __m128  : 128-bit single-precision floating point (32 bits each)
  __m128d : 128-bit double-precision floating point (64 bits each)

Figure 1: Intel AVX and Intel SSE data types
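The lane counts in the list follow directly from the type sizes, as the sketch below illustrates. The constants kFloatLanes and kDoubleLanes and the function sum_lanes are hypothetical names; the target attribute is again a GCC/Clang-specific assumption.

```cpp
#include <immintrin.h>
#include <cstddef>

// A 256-bit register is 32 bytes wide.
static_assert(sizeof(__m256) == 32, "__m256 is expected to be 256 bits");

// Lanes per register: register size divided by element size.
constexpr std::size_t kFloatLanes  = sizeof(__m256)  / sizeof(float);   // 8
constexpr std::size_t kDoubleLanes = sizeof(__m256d) / sizeof(double);  // 4

// Hypothetical helper: loads eight floats into one register, stores them
// back to memory and sums the lanes in scalar code.
// Assumption: GCC/Clang attribute enabling AVX for this function only.
__attribute__((target("avx")))
float sum_lanes(const float* src)
{
    __m256 v = _mm256_loadu_ps(src);
    float lanes[kFloatLanes];
    _mm256_storeu_ps(lanes, v);
    float sum = 0.0f;
    for (std::size_t i = 0; i < kFloatLanes; ++i)
        sum += lanes[i];
    return sum;
}
```

Storing a register to a plain array, as done here, is a simple way to inspect individual lanes while debugging.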


Mandelbrot Set Code Example

Pseudocode for calculating the Mandelbrot set.

z, p are complex numbers

for each point p on the complex plane
    z = 0
    for count = 0 to max_iterations
        if abs(z) > 2.0
            break
        z = z*z + p
    set color at p based on count reached
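The inner loop of the pseudocode can be sketched in plain floating-point arithmetic, without a complex number type. The function name escape_count and its parameters are hypothetical. The test abs(z) > 2.0 is written as |z|^2 > 4 to avoid a square root, and the loop runs at most max_iters times (an assumption about how the loop bound is meant).

```cpp
// Hypothetical helper: number of iterations performed for the point
// c = cr + i*ci before |z| exceeds 2, capped at max_iters.
int escape_count(float cr, float ci, int max_iters)
{
    float zr = 0.0f, zi = 0.0f;  // z = 0
    int count = 0;
    for (; count < max_iters; ++count) {
        float zr2 = zr * zr;
        float zi2 = zi * zi;
        if (zr2 + zi2 > 4.0f)    // abs(z) > 2.0
            break;
        // z = z*z + c, split into real and imaginary parts
        float new_zi = 2.0f * zr * zi + ci;
        zr = zr2 - zi2 + cr;
        zi = new_zi;
    }
    return count;
}
```

For example, escape_count(0.0f, 0.0f, 100) returns 100 because z stays at zero, while escape_count(2.0f, 2.0f, 100) returns 1 because z leaves the radius-2 disk after one update.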


Mandelbrot Set Visualization

Figure 2: Mandelbrot set 0.29768+0.48354i to 0.29778+0.48364i with 4096 max iterations


Simple Mandelbrot C++ STL Code

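#include <iostream>
#include <complex>

using namespace std;

int main(int argc, char** argv)
{
    // Corners of the region of the complex plane to render
    float x1 = 0.29768f, y1 = 0.48364f, x2 = 0.29778f, y2 = 0.48354f;
    int width = 2048, height = 2048, maxIters = 4096;
    // One iteration count per pixel, stored row by row
    unsigned short *image = new unsigned short[width * height];
    // Step between neighbouring pixels in x and y
    float dx = (x2-x1)/width, dy = (y2-y1)/height;
    for (int j = 0; j < height; ++j) {
        for (int i = 0; i < width; ++i) {
            complex<float> c(x1+dx*i, y1+dy*j), z(0,0);
            int count = -1;
            // norm(z) is |z|^2, so the test |z| < 2 becomes norm(z) < 4
            while ((++count < maxIters) && (norm(z) < 4.0f))
                z = z*z+c;
            // Index the buffer so that image still points to its start
            image[j * width + i] = count;
        }
    }
    delete[] image;
    return 0;
}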
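A possible AVX counterpart of the inner loop, handling eight pixels at once, is sketched below. This is not the code behind the AVX column of the benchmark, which the slides do not show. The function name escape_count8 and its parameters are hypothetical, the target attribute is a GCC/Clang-specific assumption, and counts are kept in float lanes so that only AVX (not AVX2) is needed.

```cpp
#include <immintrin.h>
#include <cassert>

// Hypothetical helper: iteration counts for eight points c = cr8[l] + i*ci8[l].
// Assumption: GCC/Clang attribute enabling AVX for this function only.
__attribute__((target("avx")))
void escape_count8(const float* cr8, const float* ci8, int max_iters, int* out8)
{
    assert(max_iters < (1 << 24));  // counts must stay exact in a float lane
    const __m256 cr   = _mm256_loadu_ps(cr8);
    const __m256 ci   = _mm256_loadu_ps(ci8);
    const __m256 four = _mm256_set1_ps(4.0f);
    const __m256 one  = _mm256_set1_ps(1.0f);
    __m256 zr = _mm256_setzero_ps();
    __m256 zi = _mm256_setzero_ps();
    __m256 counts = _mm256_setzero_ps();

    for (int k = 0; k < max_iters; ++k) {
        __m256 zr2 = _mm256_mul_ps(zr, zr);
        __m256 zi2 = _mm256_mul_ps(zi, zi);
        // Lane mask: all bits set where |z|^2 <= 4 (false for NaN lanes)
        __m256 active = _mm256_cmp_ps(_mm256_add_ps(zr2, zi2), four, _CMP_LE_OQ);
        if (_mm256_movemask_ps(active) == 0)
            break;                                   // every lane has escaped
        // Add 1 only in lanes that are still inside the radius-2 disk
        counts = _mm256_add_ps(counts, _mm256_and_ps(active, one));
        // z = z*z + c for all lanes; escaped lanes keep diverging harmlessly
        __m256 zrzi = _mm256_mul_ps(zr, zi);
        zr = _mm256_add_ps(_mm256_sub_ps(zr2, zi2), cr);
        zi = _mm256_add_ps(_mm256_add_ps(zrzi, zrzi), ci);
    }

    float tmp[8];
    _mm256_storeu_ps(tmp, counts);
    for (int l = 0; l < 8; ++l)
        out8[l] = static_cast<int>(tmp[l]);
}
```

The loop leaves early only when all eight lanes have escaped, so the cost per group of pixels is set by its slowest point. Here a lane counts iterations while |z|^2 <= 4, following the pseudocode, which differs from the strict comparison in the STL code only for points that land exactly on the circle of radius 2.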


Mandelbrot Set Benchmark

Cores   STL       FPU       AVX

1       63.5186   11.9445   1.64415
2       50.1687   9.42479   1.26957
4       42.7716   8.02288   1.05672
8       23.2062   4.34219   0.569152
16      13.9921   2.62823   0.345063

Table 1: Total runtimes in seconds for the Mandelbrot set benchmark with a 2048 x 2048 grid on 2x Intel Xeon E5-2650 @ 2.00GHz.
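The table can be turned into speedup figures with a few lines of code. The sketch below copies the runtimes from Table 1; the names Run, kRuns, speedup and parallel_efficiency are hypothetical. On one core the AVX version comes out roughly 7.3 times faster than the FPU version, which is close to the eight single-precision lanes of a 256-bit register, while going from 1 to 16 cores speeds up the AVX version by a factor of about 4.8.

```cpp
// Runtimes in seconds, copied from Table 1.
struct Run { int cores; double stl, fpu, avx; };

const Run kRuns[] = {
    {  1, 63.5186, 11.9445, 1.64415  },
    {  2, 50.1687, 9.42479, 1.26957  },
    {  4, 42.7716, 8.02288, 1.05672  },
    {  8, 23.2062, 4.34219, 0.569152 },
    { 16, 13.9921, 2.62823, 0.345063 },
};

// How many times faster a run with time t is than the baseline run.
double speedup(double baseline, double t)
{
    return baseline / t;
}

// Fraction of ideal linear scaling reached on the given number of cores.
double parallel_efficiency(double t1, double tp, int cores)
{
    return t1 / (tp * cores);
}
```

For instance, speedup(kRuns[0].fpu, kRuns[0].avx) gives the single-core vectorization gain, and parallel_efficiency(kRuns[0].avx, kRuns[4].avx, 16) gives the scaling efficiency of the AVX version on 16 cores.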
