SWAR: MMX, SSE, SSE2 Multiplatform Programming (SWAR - Programmazione multipiattaforma)

22 November 2002

Speaker: dott. Matteo Roffilli

[email protected]

What’s SWAR?

SWAR = SIMD Within A Register

SIMD = Single Instruction Multiple Data

MMX, SSE, SSE2, 3DNow! = Intel/AMD implementations of SWAR
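The "SIMD within a register" idea can be tried even without MMX or SSE, by treating an ordinary 32-bit integer as four packed 8-bit lanes. The sketch below is only an illustration of the concept; the helper name swar_add_u8x4 is hypothetical and does not come from the talk.

```c
#include <stdint.h>

/* Add four unsigned 8-bit lanes packed in one 32-bit word.
   Each lane wraps around on overflow; no carry leaks into the next lane.
   (Helper name is an assumption made for this sketch.) */
uint32_t swar_add_u8x4(uint32_t x, uint32_t y)
{
    const uint32_t low7 = 0x7F7F7F7Fu;  /* low 7 bits of every lane */
    const uint32_t top  = 0x80808080u;  /* top bit of every lane    */

    /* Adding only the low 7 bits can never carry out of a lane. */
    uint32_t sum = (x & low7) + (y & low7);

    /* The top bit of each lane is then added without carry, using XOR. */
    return sum ^ ((x ^ y) & top);
}
```

For example, swar_add_u8x4(0x01FF7F10, 0x01010110) yields 0x02008020: the lane holding 0xFF wraps to 0x00 without disturbing its neighbours. MMX and SSE do the same kind of lane-wise work in hardware, on wider registers.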

Why SWAR?

Enable parallel computing on commercial hardware

Boost your application up to 2x at 0 € cost

Enjoy yourself learning assembler ☺

Why not SWAR?

Hard low-level programming

Requires knowledge of assembler

Not portable code (yet!)

X86 Instruction Set

Pentium I

General Purpose Instruction

X87 FPU Instruction

System Instruction

General Purpose Instruction

Data transfer
Binary integer arithmetic
Decimal arithmetic
Logic operations
Shift and rotate
Bit and byte operations
Program control
String
Flag control
Segment register operations

X87 FPU Instruction

Floating-point

Integer

Binary-coded decimal (BCD) operands

SIMD Instruction

SIMD: Single Instruction Multiple Data

MMX, SSE and SSE2 instructions:

Provide a group of instructions that perform SIMD operations on packed integer and/or packed floating-point data elements contained in the 64-bit MMX or the 128-bit XMM registers.

Enable increased performance on a wide variety of multimedia and communications applications.

What’s new in Pentium II

Pentium II = Pentium I + MMX

MMX: MultiMedia Extensions
  57 new instructions
  Eight 64-bit wide MMX registers
  Four new data types

What’s new in Pentium III

Pentium III = Pentium II + SSE

SSE: Streaming SIMD Extensions
  70 new instructions
  Three categories:
    SIMD floating-point instructions
    New media instructions
    Streaming memory instructions

The implementation of SSE

SSE has 128-bit architectural width

Implemented by double-cycling the existing 64-bit data paths

Delivers a realized 1.5 to 2x speedup

Adds only 10% die size overhead

SSE in detail 1

Eight 128-bit registers, named XMM0 to XMM7. The principal data type used by SSE instructions is the packed single-precision floating-point operand: four 32-bit floats are packed in each of the eight XMM registers.

The SSE instruction set also includes some additional integer instructions that operate on the classic 64-bit MMX registers (MM0 to MM7).
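The packed single-precision layout described above can be seen from C through the SSE intrinsics in xmmintrin.h. This is a minimal sketch, assuming a compiler that ships that header (present-day GCC, Clang and MSVC do); the talk itself uses inline assembly instead, and the function name add4 is made up for this example.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128 and the _mm_*_ps functions */

/* Add two vectors of four floats with one packed instruction (addps).
   (Function name is an assumption made for this sketch.) */
void add4(const float a[4], const float b[4], float out[4])
{
    __m128 va   = _mm_loadu_ps(a);      /* four floats into one XMM register */
    __m128 vb   = _mm_loadu_ps(b);
    __m128 vsum = _mm_add_ps(va, vb);   /* four additions at once */
    _mm_storeu_ps(out, vsum);           /* XMM register back to memory */
}
```

Each __m128 value corresponds to one XMM register holding four 32-bit floats; the compiler chooses which of XMM0 to XMM7 to use.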

SSE in detail 2

1. Arithmetic instructions

2. Logical instructions

3. Compare instructions

4. Shuffle instructions

5. Conversion instructions

6. Data movement instructions

7. State management instructions

8. Additional SIMD integer instructions

9. Cacheability control instructions
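Several of the categories listed above usually appear together in one routine. The sketch below combines data movement, compare and logical instructions to zero out negative values without a branch. It assumes the intrinsics header xmmintrin.h is available; the function name clamp_negative4 is hypothetical.

```c
#include <xmmintrin.h>

/* Replace every negative lane with 0.0f, without branching.
   (Function name is an assumption made for this sketch.) */
void clamp_negative4(const float in[4], float out[4])
{
    __m128 v    = _mm_loadu_ps(in);        /* data movement: movups      */
    __m128 zero = _mm_setzero_ps();        /* all four lanes = 0.0f      */
    __m128 mask = _mm_cmpgt_ps(v, zero);   /* compare: all-ones if v > 0 */
    __m128 res  = _mm_and_ps(v, mask);     /* logical: keep or clear     */
    _mm_storeu_ps(out, res);               /* data movement: movups      */
}
```

The compare instruction produces a bit mask per lane rather than setting CPU flags, which is why it is normally followed by a logical instruction.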

What’s new in Pentium 4

Pentium 4 = Pentium III + SSE2

SSE2: Streaming SIMD Extensions 2
  144 new instructions for double-precision FP
  Cache and memory management
  Enables 3-D graphics, video decoding/encoding, speech recognition
  MMX integer instructions extended to 128-bit registers
  Separate register for data movement

SIMD-FP Instruction

The SIMD-FP feature introduces a new register file containing eight 128-bit registers

Each register can hold a vector of four IEEE single-precision FP data elements

This allows four single-precision FP operations to be carried out within a single instruction
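In practice "four operations per instruction" means walking an array four floats at a time and finishing the leftover elements one by one. The following is a sketch under the assumption that xmmintrin.h is available; scale_array is a hypothetical name.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Multiply every element of x[0..n-1] by k, four floats per instruction.
   (Function name is an assumption made for this sketch.) */
void scale_array(float *x, size_t n, float k)
{
    __m128 vk = _mm_set1_ps(k);          /* k copied into all four lanes */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {         /* main loop: 4 elements per step */
        __m128 v = _mm_loadu_ps(x + i);
        _mm_storeu_ps(x + i, _mm_mul_ps(v, vk));
    }
    for (; i < n; ++i)                   /* scalar tail: last n % 4 elements */
        x[i] *= k;
}
```

The scalar tail keeps the function correct for any length, at the price of a few non-SIMD iterations.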

Aligned or not aligned?

Why use aligned memory?
  Fast memory-to-register load
  Requires a dedicated malloc function

Why use unaligned memory?
  Slow memory-to-register load
  Fast integration into existing code
  Does not require a dedicated malloc function
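The trade-off above can be sketched with intrinsics: _mm_load_ps maps to the aligned load (movaps) and requires a 16-byte-aligned address, while _mm_loadu_ps maps to movups and accepts any address. As one possible "dedicated malloc", the example uses _mm_malloc and _mm_free, which GCC and Clang make available through xmmintrin.h; the function names sum_floats and alloc_aligned_floats are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>   /* on GCC and Clang this also brings in _mm_malloc / _mm_free */

/* Sum n floats (n must be a multiple of 4). Uses the aligned load when the
   pointer is 16-byte aligned, the unaligned load otherwise. */
float sum_floats(const float *p, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    int aligned = ((uintptr_t)p % 16) == 0;

    for (size_t i = 0; i < n; i += 4) {
        __m128 v = aligned ? _mm_load_ps(p + i)    /* movaps: alignment required */
                           : _mm_loadu_ps(p + i);  /* movups: any address        */
        acc = _mm_add_ps(acc, v);
    }

    float lanes[4];
    _mm_storeu_ps(lanes, acc);                     /* add the four partial sums */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* Allocate n floats on a 16-byte boundary; release the block with _mm_free. */
float *alloc_aligned_floats(size_t n)
{
    return (float *)_mm_malloc(n * sizeof(float), 16);
}
```

Memory returned by plain malloc is not guaranteed to be 16-byte aligned on every platform, which is why existing code is usually easier to adapt with the unaligned loads.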

Repetition helps learning!

MMX: 57 instructions for INTEGER

SSE: 70 instructions for FLOAT

SSE2: 144 instructions for DOUBLE
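To complement the float case, here is a sketch of the SSE2 side: two doubles per instruction, plus the integer operations that SSE2 carries over to the 128-bit registers. It assumes the SSE2 intrinsics header emmintrin.h is available; the function names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, __m128i */

/* Two double-precision additions in one instruction (addpd). */
void add2_double(const double a[2], const double b[2], double out[2])
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));
}

/* Eight 16-bit integer additions in one instruction (paddw),
   performed on a 128-bit register instead of a 64-bit MMX register. */
void add8_int16(const short a[8], const short b[8], short out[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi16(va, vb));
}
```

The number of lanes follows from the element size: a 128-bit register holds two doubles, four floats or eight 16-bit integers.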

Portable Code Writing

AT&T vs Intel syntax

Linux vs Windows syntax

GNU GCC 2.95 vs MS Visual Studio 6

How to write SWAR code?

Intel compiler 6 Linux/Windows

Direct Assembly

Inline Assembly

Link portable library (libSIMD)

Real problem

Intel Compiler: not FREE!

Direct Asm: too hard for newbies!

libSIMD: still under development!

Inline Asm: not portable but easy!

Linux Inline Assembly

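/* assumes float a[8] (inputs) and float r[4] (results); r is a name added here */
__asm__ __volatile__(
    /* load a[0..3] and a[4..7] into 128-bit registers */
    "movups %1, %%xmm0\n\t"
    "movups %2, %%xmm1\n\t"
    /* multiply the four pairs */
    "mulps %%xmm1, %%xmm0\n\t"
    /* store the four products */
    "movups %%xmm0, %0\n\t"
    : "=m"(r[0])
    : "m"(a[0]), "m"(a[4])
    : "xmm0", "xmm1", "memory"
);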
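For readers who want to try the GCC form, the sketch below wraps the same four instructions in a complete function. The function name mul4_inline_asm, the result buffer r and the plain C fallback are assumptions added for illustration; the talk only shows the bare asm statement.

```c
/* Multiply a[0..3] by a[4..7] element-wise and store the products in r[0..3].
   (Function name and the r parameter are assumptions made for this sketch.) */
void mul4_inline_asm(const float a[8], float r[4])
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__(
        "movups (%0), %%xmm0\n\t"      /* xmm0 = a[0..3]              */
        "movups 16(%0), %%xmm1\n\t"    /* xmm1 = a[4..7]              */
        "mulps  %%xmm1, %%xmm0\n\t"    /* four multiplications at once */
        "movups %%xmm0, (%1)\n\t"      /* r[0..3] = products          */
        :                              /* no register outputs         */
        : "r"(a), "r"(r)               /* pointers in general registers */
        : "xmm0", "xmm1", "memory"     /* what the asm overwrites     */
    );
#else
    /* Plain C fallback for other compilers or CPUs. */
    int i;
    for (i = 0; i < 4; ++i)
        r[i] = a[i] * a[i + 4];
#endif
}
```

Passing the pointers in registers and declaring a "memory" clobber is one way to tell the compiler that the asm reads and writes the arrays; the clobber list also names the two XMM registers so the compiler does not keep live values in them.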

Windows Inline Assembly

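/* assumes float *a points to 8 floats and float *r to 4 floats; r is a name added here */
__asm {
    /* load a[0..3] and a[4..7] into 128-bit registers */
    mov    edi, a
    movups xmm0, [edi]
    movups xmm1, [edi+16]
    /* multiply the four pairs */
    mulps  xmm0, xmm1
    /* store the four products */
    mov    edi, r
    movups [edi], xmm0
}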

Side by side

Windows (MSVC, Intel syntax):

__asm {
    mov    edi, a
    movups xmm0, [edi]
    mulps  xmm0, xmm1
}

Linux (GCC, AT&T syntax):

__asm__ __volatile__(
    "movups %0, %%xmm0\n\t"
    "mulps %%xmm1, %%xmm0\n\t"
    :
    : "m"(a[0]), "m"(a[4])
);
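As a point of comparison with the two syntaxes above, the same multiplication can be written once with compiler intrinsics. This is an alternative to the inline assembly shown in the talk, not the author's method, and it assumes a compiler that provides xmmintrin.h (present-day GCC, Clang and MSVC do). The function name mul4_intrinsics and the result buffer r are hypothetical.

```c
#include <xmmintrin.h>

/* r[i] = a[i] * a[i + 4] for i = 0..3, with no compiler-specific asm syntax.
   (Function name and the r parameter are assumptions made for this sketch.) */
void mul4_intrinsics(const float a[8], float r[4])
{
    __m128 lo = _mm_loadu_ps(a);          /* movups: a[0..3] */
    __m128 hi = _mm_loadu_ps(a + 4);      /* movups: a[4..7] */
    _mm_storeu_ps(r, _mm_mul_ps(lo, hi)); /* mulps, then movups to r */
}
```

The operand order and the register names disappear from the source, so the AT&T versus Intel difference no longer matters; the code is still tied to x86 processors with SSE.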

How to compile?

Windows:
  MS Visual Studio 6
  AssemblyPower Pack 6

Linux:
  Kernel 2.4.x
  GCC 2.95 / 3.x
  Binutils > 2.13
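Whatever the toolchain, it helps to check at compile time which SIMD level the compiler has been told to target. The sketch below relies on predefined macros of present-day compilers, which may differ from the versions listed above: GCC and Clang define __MMX__, __SSE__ and __SSE2__ when the matching -mmmx, -msse and -msse2 options are in effect, and MSVC exposes _M_IX86_FP and _M_X64. The function name compiled_simd_level is hypothetical.

```c
/* Report which SIMD level this translation unit was compiled for.
   (Function name is an assumption made for this sketch.) */
const char *compiled_simd_level(void)
{
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";
#elif defined(__SSE__) || (defined(_M_IX86_FP) && _M_IX86_FP == 1)
    return "SSE";
#elif defined(__MMX__)
    return "MMX";
#else
    return "none";
#endif
}
```

This only reflects the compiler settings; it does not test whether the CPU that runs the program actually supports the instructions.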

How to debug?

Directly in assembler!

Online Resources

Intel tutorial
AMD tutorial and source code
Sourceforge.net
IBM whitepapers
MS tutorial
SPRITE Linux Day Proceedings

THANKS FOR YOUR ATTENTION!

Any questions?

Speaker: dott. Matteo Roffilli

[email protected]