CS220 April 25, 2007

AT&T syntax MMX

DESCRIPTION

Professional Assembly Language


AT&T syntax MMX

• Most MMX documents use Intel syntax: OPERATION DEST, SRC

• We use AT&T syntax: OPERATION SRC, DEST

• Always remember: DEST = DEST OPERATION SRC

(Note: the reversed operand direction of some x87 FP subtraction and division instructions in AT&T syntax is a historical assembler bug that gcc and gas keep for compatibility.)


Multiplication

• Apart from multiplication, conversion, and comparison, the MMX instructions are straightforward.

• PMADDWD mm/m64, mm


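• PMULHW mm/m64, mm
  Doubleword product -> word, keep the high part (upper 16 bits of each signed 16x16 product)

• PMULLW mm/m64, mm
  Doubleword product -> word, keep the low part (lower 16 bits of each 16x16 product)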
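The three multiply instructions can be modelled lane by lane in plain C. This is only a sketch of the documented semantics, not code from the slides; the function names are made up for illustration, and index 0 is assumed to be the lowest lane of the register.

```c
#include <stdint.h>

/* PMULLW, one lane: low 16 bits of the signed 16x16 -> 32-bit product. */
static uint16_t pmullw_lane(int16_t a, int16_t b) {
    uint32_t p = (uint32_t)((int32_t)a * (int32_t)b);
    return (uint16_t)(p & 0xFFFFu);
}

/* PMULHW, one lane: high 16 bits of the same 32-bit product. */
static uint16_t pmulhw_lane(int16_t a, int16_t b) {
    uint32_t p = (uint32_t)((int32_t)a * (int32_t)b);
    return (uint16_t)(p >> 16);
}

/* PMADDWD src, dst: multiply the four word pairs, then add adjacent
 * products so that two doubleword results remain. */
static void pmaddwd(const int16_t src[4], const int16_t dst[4], int32_t out[2]) {
    for (int i = 0; i < 2; i++) {
        uint32_t p0 = (uint32_t)((int32_t)dst[2 * i] * src[2 * i]);
        uint32_t p1 = (uint32_t)((int32_t)dst[2 * i + 1] * src[2 * i + 1]);
        /* Unsigned add wraps like the hardware in the one overflow case
         * (all four inputs equal to 0x8000). */
        out[i] = (int32_t)(p0 + p1);
    }
}
```

Combining PMULHW and PMULLW on the same operands and interleaving the results with PUNPCKLWD/PUNPCKHWD recovers the full 32-bit products.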


Conversion

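• PACKSSDW mm/m64, mm
  doubleword->word (signed saturation)

• PACKSSWB mm/m64, mm
  word->byte (signed saturation)

• PACKUSWB mm/m64, mm
  word->byte (unsigned saturation)

(MMX has no PACKUSDW; the only unsigned pack is PACKUSWB.)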
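Saturation is what separates the pack instructions from a plain truncation. The following C sketch models PACKSSDW and PACKUSWB under the assumption that array index 0 is the lowest lane; the helper names are illustrative only.

```c
#include <stdint.h>

/* Clamp a signed doubleword into the signed word range. */
static int16_t sat_s16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Clamp a signed word into the unsigned byte range. */
static uint8_t sat_u8(int16_t v) {
    if (v > 255) return 255;
    if (v < 0) return 0;
    return (uint8_t)v;
}

/* PACKSSDW src, dst: the low two result words come from dst,
 * the high two from src. */
static void packssdw(const int32_t src[2], const int32_t dst[2], int16_t out[4]) {
    out[0] = sat_s16(dst[0]);
    out[1] = sat_s16(dst[1]);
    out[2] = sat_s16(src[0]);
    out[3] = sat_s16(src[1]);
}

/* PACKUSWB src, dst: the low four result bytes come from dst,
 * the high four from src. */
static void packuswb(const int16_t src[4], const int16_t dst[4], uint8_t out[8]) {
    for (int i = 0; i < 4; i++) {
        out[i] = sat_u8(dst[i]);
        out[i + 4] = sat_u8(src[i]);
    }
}
```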


How to do interleave pack?

• PACKSSDW %mm0, %mm0
• PACKSSDW %mm1, %mm1
• PUNPCKLWD %mm1, %mm0

(interleave the low end 16-bit values of the operands)


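• PUNPCKHBW mm/m64, mm
  Low parts of the original 64 bits are ignored
  byte_src+byte_dst=word_dst (the source byte becomes the high byte of each word, the destination byte the low byte)

• PUNPCKLBW mm/m64/m32, mm
  High parts of the original 64 bits are ignored
  byte_src+byte_dst=word_dst (same pairing, applied to the low four bytes)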
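The byte interleave can be written out in C to make the lane order explicit. This is a sketch of the semantics only, with index 0 taken as the lowest byte and function names chosen for illustration.

```c
#include <stdint.h>

/* PUNPCKLBW src, dst: interleave the low four bytes of dst and src.
 * Each result word has the dst byte in its low half and the src byte
 * in its high half. */
static void punpcklbw(const uint8_t src[8], const uint8_t dst[8], uint8_t out[8]) {
    for (int i = 0; i < 4; i++) {
        out[2 * i] = dst[i];
        out[2 * i + 1] = src[i];
    }
}

/* PUNPCKHBW src, dst: the same interleave applied to the high four bytes. */
static void punpckhbw(const uint8_t src[8], const uint8_t dst[8], uint8_t out[8]) {
    for (int i = 0; i < 4; i++) {
        out[2 * i] = dst[i + 4];
        out[2 * i + 1] = src[i + 4];
    }
}
```

With an all-zero source, the same operation zero-extends unsigned bytes to words.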


How to do non-interleaved unpack?

• MOVQ %mm0, %mm2
  (copy mm0 into mm2)

• PUNPCKLDQ %mm1, %mm0
  (replace the two high end words of mm0 with the two low end words of mm1; leave the two low end words of mm0 in place)

• PUNPCKHDQ %mm1, %mm2
  (move the two high end words of mm2 to the two low end words of mm2; place the two high end words of mm1 in the two high end words of mm2)

The results are left in mm0 and mm2.
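The three-instruction sequence can be checked with a small C model that treats each register as two doublewords (index 0 is the low half). The function name and parameter layout are assumptions made for this sketch.

```c
#include <stdint.h>

/* Models: movq %mm0,%mm2 ; punpckldq %mm1,%mm0 ; punpckhdq %mm1,%mm2 */
static void unpack_noninterleaved(const uint32_t mm0_in[2], const uint32_t mm1[2],
                                  uint32_t mm0_out[2], uint32_t mm2_out[2]) {
    /* punpckldq: low halves of mm0 and mm1 end up together in mm0 */
    mm0_out[0] = mm0_in[0];
    mm0_out[1] = mm1[0];
    /* punpckhdq on the saved copy: high halves end up together in mm2 */
    mm2_out[0] = mm0_in[1];
    mm2_out[1] = mm1[1];
}
```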


• PCMPEQW mm/m64, mm

• PCMPGTW mm/m64, mm
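The compare instructions do not set flags; they write a mask of all 1's or all 0's into each destination field. A C sketch of the word variants, with illustrative function names and index 0 as the lowest lane:

```c
#include <stdint.h>

/* PCMPEQW src, dst: each word becomes 0xFFFF if equal, else 0. */
static void pcmpeqw(const int16_t src[4], const int16_t dst[4], uint16_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (dst[i] == src[i]) ? 0xFFFFu : 0u;
}

/* PCMPGTW src, dst: each word becomes 0xFFFF if dst > src (signed), else 0. */
static void pcmpgtw(const int16_t src[4], const int16_t dst[4], uint16_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (dst[i] > src[i]) ? 0xFFFFu : 0u;
}
```

The resulting masks are typically combined with PAND, PANDN and POR to select between values without branching.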


Rule of Thumb

• Only shift instructions can take an immediate operand

• Only the MOVD instruction can take a 32-bit general-purpose register

• PUNPCKL can take a 32-bit memory source

• All other instructions deal with 64-bit registers or memory. No immediate operands!


Constant numbers

• Generate a zero in mm0 (either instruction works):
    PXOR %mm0, %mm0
    PANDN %mm0, %mm0

• Generate all 1's in register mm1, which is -1 in each of the packed data type fields (PCMPEQW or PCMPEQD work equally well):
    PCMPEQB %mm1, %mm1

• Generate the constant 1 in every packed-byte [or packed-word] (or packed-dword) field:
    PXOR %mm0, %mm0
    PCMPEQB %mm1, %mm1
    PSUBB %mm1, %mm0 [PSUBW %mm1, %mm0] (PSUBD %mm1, %mm0)

• Generate the signed constant 2^n - 1 in every packed-word (or packed-dword) field:
    PCMPEQB %mm1, %mm1
    PSRLW $(16-n), %mm1 (PSRLD $(32-n), %mm1)

• Generate the signed constant -2^n in every packed-word (or packed-dword) field:
    PCMPEQB %mm1, %mm1
    PSLLW $n, %mm1 (PSLLD $n, %mm1)
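The arithmetic behind these recipes is easy to verify on a single lane in C. The helpers below are illustrative names, not part of the slides; they return the raw bit pattern of one field and assume 0 <= n <= 15.

```c
#include <stdint.h>

/* PXOR + PCMPEQB + PSUBB on one byte lane: 0 - (-1) wraps to 1. */
static uint8_t const_one_byte(void) {
    uint8_t zero = 0;        /* pxor   */
    uint8_t ones = 0xFF;     /* pcmpeq */
    return (uint8_t)(zero - ones);
}

/* PCMPEQ + PSRLW $(16-n) on one word lane: the low n bits stay set,
 * which is 2^n - 1. */
static uint16_t const_pow2_minus1(unsigned n) {
    return (uint16_t)(0xFFFFu >> (16u - n));
}

/* PCMPEQ + PSLLW $n on one word lane: the low n bits are cleared,
 * which is the two's complement pattern of -2^n. */
static uint16_t const_neg_pow2(unsigned n) {
    return (uint16_t)(0xFFFFu << n);
}
```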


Examples

• Absolute value of a vector of signed words (result in %mm1):
    movq %mm0, %mm1      # make a copy of source data
    psraw $15, %mm0      # replicate sign bit
    pxor %mm0, %mm1      # take 1's complement of just the negative fields
    psubsw %mm0, %mm1    # add 1 to just the negative fields

PXOR/XOR a number with all 0s, get itself
PXOR/XOR a number with all 1s, get NOT(itself)

After the shift, each field of %mm0 is either all 0's or all 1's
For a positive number, it subtracts 0's (0)
For a negative number, it subtracts 1's (-1)
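The same sign-mask trick can be followed on a single word in C. This sketch assumes the final subtraction saturates (as PSUBSW does), so the most negative input maps to 32767 rather than wrapping; the function name is made up for illustration.

```c
#include <stdint.h>

/* One-lane model of: psraw $15 ; pxor ; psubsw */
static int16_t abs_word(int16_t x) {
    int16_t mask = (x < 0) ? -1 : 0;          /* psraw $15: all 1's or all 0's */
    int16_t flipped = (int16_t)(x ^ mask);    /* pxor: NOT(x) only when negative */
    int32_t diff = (int32_t)flipped - mask;   /* subtract 0 or -1 */
    if (diff > INT16_MAX) diff = INT16_MAX;   /* signed saturation */
    return (int16_t)diff;
}
```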


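Dot Product

#include <stdio.h>

int main(void)
{
    int i;
    int result;
    unsigned short a[] = {1, 2, 3, 4, 5, 6, 7, 8};
    unsigned short b[] = {2, 4, 6, 8, 10, 12, 14, 16};

    /* clear the accumulator register */
    __asm__("pxor %mm7,%mm7");

    /* each iteration loads four words from a and b (movq reads 8 bytes) */
    for (i = 0; i < sizeof(a)/sizeof(short); i += 4) {
        __asm__("movq %0,%%mm0\n\t"
                "movq %1,%%mm1\n\t"
                "pmaddwd %%mm1,%%mm0\n\t"   /* two doubleword partial sums */
                "paddd %%mm0,%%mm7"         /* accumulate in mm7 */
                :
                : "m" (a[i]), "m" (b[i]));
    }

    /* add the high and low doublewords of mm7, store the result, leave MMX mode */
    __asm__("movq %%mm7,%%mm0\n\t"
            "psrlq $32,%%mm0\n\t"
            "paddd %%mm7,%%mm0\n\t"
            "movd %%mm0,%0\n\t"
            "emms"
            : "=m" (result));

    printf("dot product: %d\n", result);
    return 0;
}

movd moves the lower 32 bits of mm0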
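A scalar reference is handy for checking the MMX version. The sketch below mirrors its structure (two doubleword partial sums, as PMADDWD and PADDD would keep them, added together at the end). It assumes the length is a multiple of 4 and that the inputs fit in signed 16-bit words, since PMADDWD treats its operands as signed; the function name is illustrative.

```c
#include <stdint.h>

/* Scalar model of the pmaddwd/paddd accumulation pattern. */
static int32_t dot_words(const int16_t *a, const int16_t *b, int n) {
    int32_t acc[2] = {0, 0};                  /* the two halves of mm7 */
    for (int i = 0; i + 4 <= n; i += 4) {
        acc[0] += a[i] * b[i] + a[i + 1] * b[i + 1];
        acc[1] += a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3];
    }
    return acc[0] + acc[1];                   /* psrlq $32 + paddd */
}
```

For the two arrays used on this slide the function returns 408.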


Weathercaster

• PCMPEQ (packed compare for equality) is performed on the weathercaster and blue-screen images, yielding a bitmask that traces the outline of the weathercaster.

• This bitmask image is PANDNed (packed and not) with the weathercaster image, yielding the first intermediate image: now the weathercaster has no background behind her.

• The same bitmask image is PANDed (packed and) with the weather map image, yielding the second intermediate image.

• The two intermediate images are PORed (packed or) together, resulting in the final composite of the weathercaster over the weather map.
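The four steps above can be sketched per byte in C. This is an illustration under simplifying assumptions (one byte per pixel, a single key value standing in for the blue-screen image), and the function and parameter names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Composite fg over bg wherever fg differs from the key colour. */
static void chroma_key(const uint8_t *fg, const uint8_t *bg, uint8_t key,
                       uint8_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint8_t mask = (fg[i] == key) ? 0xFF : 0x00;   /* PCMPEQB */
        uint8_t person = (uint8_t)(~mask & fg[i]);     /* PANDN   */
        uint8_t map = (uint8_t)(mask & bg[i]);         /* PAND    */
        out[i] = (uint8_t)(person | map);              /* POR     */
    }
}
```

The MMX version does the same work on eight bytes per instruction, with no branch per pixel.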


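Address or Content?

    .section .rodata
mybytes:
    .byte 'a','b','c','d','e','f','g','h'
mystr:
    .ascii "abcdefghijklmnopqrstuvwxyz"
    .text
    .globl main
    .type main, @function
main:
    pushl %ebp
    movl %esp, %ebp
    movl mybytes, %eax            # content: first 4 bytes at mybytes
    movl $mybytes, %ebx           # address of mybytes
    movl (mybytes), %edx          # content: same as "movl mybytes, %edx"
    movl (%ebx), %edx             # content at the address held in %ebx
    xorl %ecx, %ecx               # index = 0
    movl $mystr, %ebx             # address of mystr
    movq (%ebx,%ecx,8),%mm0       # content: 8 bytes at mystr
    leal mystr, %ebx              # address of mystr
    movq (%ebx,%ecx,8),%mm1
    leal (mystr), %ebx            # address of mystr
    movq (%ebx,%ecx,8),%mm2
    movq mystr(,%ecx,8),%mm3      # content
    movq mystr,%mm4               # content
    movq (mystr),%mm5             # content
    subl $8, %esp
    movq %mm0, (%esp)
    emms                          # leave MMX mode before returning
    leave
    ret
    .size main, .-main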

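Content in %eax and %edx (%ecx is zeroed and used as the index):
    0x64636261 == "abcd"

Content in %ebx:
    Address

Content in %mm0-%mm5:
    0x6867666564636261 == "abcdefgh"

In the hexadecimal value, the leftmost byte comes from the high address and the rightmost byte from the low address. In the string, the leftmost character is at the low address and the rightmost at the high address. The lowest byte is 0x61 == 97 == 'a'.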
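The byte order shown here is ordinary little-endian layout. A short C sketch that assembles a 64-bit value the way a little-endian load such as movq would, independent of the host byte order (the function name is made up for illustration):

```c
#include <stdint.h>

/* Build a 64-bit value from 8 bytes, lowest address = least significant. */
static uint64_t le_load64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}
```

Calling it on the first eight bytes of "abcdefgh" gives 0x6867666564636261, with 'a' (0x61) in the lowest byte.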


Misc

• Context switching
  – FP mode to MMX mode: 28 cycles
  – MMX mode to FP mode: 53 cycles

    FP_code: …... ……
    MMX_code: …... EMMS (*mark the FP tag word as empty*)
    FP_code 1: …... …...

• Also FNSAVE and FRSTOR


MMX Instruction Set

Category       Mnemonic             Different Opcodes   Description
Arithmetic     PADD[B,W,D]          3                   Add with wrap-around on [byte, word, doubleword]
               PADDS[B,W]           2                   Add signed with saturation on [byte, word]
               PADDUS[B,W]          2                   Add unsigned with saturation on [byte, word]
               PSUB[B,W,D]          3                   Subtract with wrap-around on [byte, word, doubleword]
               PSUBS[B,W]           2                   Subtract signed with saturation on [byte, word]
               PSUBUS[B,W]          2                   Subtract unsigned with saturation on [byte, word]
               PMULHW               1                   Packed multiply high on words
               PMULLW               1                   Packed multiply low on words
               PMADDWD              1                   Packed multiply on words and add resulting pairs
Comparison     PCMPEQ[B,W,D]        3                   Packed compare for equality [byte, word, doubleword]
               PCMPGT[B,W,D]        3                   Packed compare greater than [byte, word, doubleword]
Conversion     PACKUSWB             1                   Pack words into bytes (unsigned with saturation)
               PACKSS[WB,DW]        2                   Pack [words into bytes, doublewords into words] (signed with saturation)
               PUNPCKH[BW,WD,DQ]    3                   Unpack (interleave) high-order [bytes, words, doublewords] from MMX register
               PUNPCKL[BW,WD,DQ]    3                   Unpack (interleave) low-order [bytes, words, doublewords] from MMX register
Logical        PAND                 1                   Bitwise AND
               PANDN                1                   Bitwise AND NOT
               POR                  1                   Bitwise OR
               PXOR                 1                   Bitwise XOR
Shift          PSLL[W,D,Q]          6                   Packed shift left logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRL[W,D,Q]          6                   Packed shift right logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRA[W,D]            4                   Packed shift right arithmetic [word, doubleword] by amount specified in MMX register or by immediate value
Data Transfer  MOV[D,Q]             4                   Move [doubleword, quadword] to MMX register or from MMX register
State Mgmt     EMMS                 1                   Empty MMX state