Professional Assembly Language
CS220
April 25, 2007
AT&T syntax MMX
• Most MMX documents use Intel syntax: OPERATION DEST, SRC
• We use AT&T syntax: OPERATION SRC, DEST
• Always remember: DEST = DEST OPERATION SRC
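(Please note: the reversed operand direction of some floating-point subtraction and division instructions in AT&T syntax is a historical bug inherited from AT&T-derived assemblers, which gcc and the GNU assembler keep for compatibility.)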
Multiplication
• Except for multiplication, conversion, and comparison, all other MMX instructions are straightforward.
• PMADDWD mm/m64, mm
  Word x word -> doubleword products, then add adjacent pairs
• PMULHW mm/m64, mm
  Doubleword -> word, keep high part
• PMULLW mm/m64, mm
  Doubleword -> word, keep low part
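The three multiply instructions above differ only in which part of each 32-bit product they keep. The following is a minimal scalar sketch in portable C, not MMX code: a `uint64_t` stands in for an MMX register, and the `model_*` function names are made up for illustration. Argument order follows the AT&T convention (source first, destination second).

```c
#include <stdint.h>

/* Extract 16-bit lane i (0 = lowest) of a 64-bit value as a signed word. */
static int16_t word_lane(uint64_t r, int i) {
    return (int16_t)(uint16_t)(r >> (16 * i));
}

/* Model of pmullw src, dst: signed multiply, keep the LOW 16 bits of each product. */
uint64_t model_pmullw(uint64_t src, uint64_t dst) {
    uint64_t out = 0;
    for (int i = 0; i < 4; i++) {
        int32_t p = (int32_t)word_lane(dst, i) * word_lane(src, i);
        out |= (uint64_t)((uint32_t)p & 0xFFFFu) << (16 * i);
    }
    return out;
}

/* Model of pmulhw src, dst: signed multiply, keep the HIGH 16 bits of each product. */
uint64_t model_pmulhw(uint64_t src, uint64_t dst) {
    uint64_t out = 0;
    for (int i = 0; i < 4; i++) {
        int32_t p = (int32_t)word_lane(dst, i) * word_lane(src, i);
        out |= (uint64_t)(((uint32_t)p >> 16) & 0xFFFFu) << (16 * i);
    }
    return out;
}

/* Model of pmaddwd src, dst: four word products, adjacent pairs summed
   into two doublewords. */
uint64_t model_pmaddwd(uint64_t src, uint64_t dst) {
    uint64_t out = 0;
    for (int i = 0; i < 2; i++) {
        int32_t lo = (int32_t)word_lane(dst, 2 * i) * word_lane(src, 2 * i);
        int32_t hi = (int32_t)word_lane(dst, 2 * i + 1) * word_lane(src, 2 * i + 1);
        uint32_t sum = (uint32_t)lo + (uint32_t)hi;   /* wraps on overflow */
        out |= (uint64_t)sum << (32 * i);
    }
    return out;
}
```

Combining the results of pmullw and pmulhw (for example with the unpack instructions) recovers the full 32-bit products.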
Conversion
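• PACKSSDW mm/m64, mm
  doubleword -> word (signed saturation)
• PACKSSWB mm/m64, mm
• PACKUSWB mm/m64, mm
  word -> byte (signed / unsigned saturation)
(MMX has no PACKUSDW; the only unsigned pack is PACKUSWB.)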
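Packing narrows each field and clamps values that do not fit, rather than truncating them. The sketch below models PACKSSDW and PACKUSWB in portable C under the same assumptions as a teaching aid: a `uint64_t` stands in for an MMX register, and the `model_*` and `sat_*` names are illustrative only. The destination's fields land in the low half of the result and the source's fields in the high half.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the signed 16-bit range. */
static int16_t sat_s16(int32_t v) {
    return v > 32767 ? 32767 : v < -32768 ? -32768 : (int16_t)v;
}

/* Clamp a signed 16-bit value to the unsigned 8-bit range. */
static uint8_t sat_u8(int16_t v) {
    return v > 255 ? 255 : v < 0 ? 0 : (uint8_t)v;
}

/* Model of packssdw src, dst: words 0-1 come from dst, words 2-3 from src. */
uint64_t model_packssdw(uint64_t src, uint64_t dst) {
    uint64_t out = 0;
    for (int i = 0; i < 2; i++) {
        int32_t d = (int32_t)(uint32_t)(dst >> (32 * i));
        int32_t s = (int32_t)(uint32_t)(src >> (32 * i));
        out |= (uint64_t)(uint16_t)sat_s16(d) << (16 * i);
        out |= (uint64_t)(uint16_t)sat_s16(s) << (16 * (i + 2));
    }
    return out;
}

/* Model of packuswb src, dst: bytes 0-3 come from dst, bytes 4-7 from src. */
uint64_t model_packuswb(uint64_t src, uint64_t dst) {
    uint64_t out = 0;
    for (int i = 0; i < 4; i++) {
        int16_t d = (int16_t)(uint16_t)(dst >> (16 * i));
        int16_t s = (int16_t)(uint16_t)(src >> (16 * i));
        out |= (uint64_t)sat_u8(d) << (8 * i);
        out |= (uint64_t)sat_u8(s) << (8 * (i + 4));
    }
    return out;
}
```

For example, the doubleword 70000 packs to 0x7FFF (32767), and the word -1 packs to the unsigned byte 0.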
How to do interleave pack?
• PACKSSDW %mm0, %mm0
• PACKSSDW %mm1, %mm1
• PUNPCKLWD %mm1, %mm0
  (interleave the low-end 16-bit values of the operands)
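• PUNPCKHBW mm/m64, mm
  Low parts of the original 64 bits are ignored
  byte_src + byte_dst = word_dst (source byte in the high half, destination byte in the low half)
• PUNPCKLBW mm/m64/m32, mm
  High parts of the original 64 bits are ignored
  byte_src + byte_dst = word_dst (source byte in the high half, destination byte in the low half)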
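The byte unpack instructions interleave bytes from the two operands, so each result word holds one destination byte in its low half and one source byte in its high half. A minimal portable C sketch of that behavior follows; it is a scalar model, not MMX code, and the `model_*` names are illustrative. With a zeroed source it shows the common use of widening unsigned bytes to words.

```c
#include <stdint.h>

/* Interleave four bytes of dst and src, starting at byte index `first`. */
static uint64_t interleave_bytes(uint64_t src, uint64_t dst, int first) {
    uint64_t out = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t d = (dst >> (8 * (first + i))) & 0xFF;
        uint64_t s = (src >> (8 * (first + i))) & 0xFF;
        out |= d << (16 * i);        /* destination byte: low half of word i  */
        out |= s << (16 * i + 8);    /* source byte:      high half of word i */
    }
    return out;
}

/* Model of punpcklbw src, dst: uses bytes 0-3, ignores the high parts. */
uint64_t model_punpcklbw(uint64_t src, uint64_t dst) {
    return interleave_bytes(src, dst, 0);
}

/* Model of punpckhbw src, dst: uses bytes 4-7, ignores the low parts. */
uint64_t model_punpckhbw(uint64_t src, uint64_t dst) {
    return interleave_bytes(src, dst, 4);
}
```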
How to do non-interleaved unpack?
• MOVQ %mm0, %mm2
• PUNPCKLDQ %mm1, %mm0
  (replace the two high-end words of mm0 with the two low-end words of mm1; leave the two low-end words of mm0 in place)
• PUNPCKHDQ %mm1, %mm2
  (move the two high-end words of mm2 to the two low-end words of mm2; place the two high-end words of mm1 in the two high-end words of mm2)
Results are in mm0 and mm2.
Comparison
• PCMPEQW mm/m64, mm
• PCMPGTW mm/m64, mm
  Each result field is all 1's if the comparison is true, all 0's otherwise
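The compare instructions do not set flags; they write a mask into the destination, with each word field set to all 1's or all 0's. The following portable C sketch models that for PCMPEQW and PCMPGTW. It is a scalar illustration only: a `uint64_t` stands in for an MMX register and the `model_*` names are made up. PCMPGTW compares signed values and tests destination > source.

```c
#include <stdint.h>

/* Model of pcmpeqw src, dst: word field = 0xFFFF where dst == src, else 0. */
uint64_t model_pcmpeqw(uint64_t src, uint64_t dst) {
    uint64_t out = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t d = (uint16_t)(dst >> (16 * i));
        uint16_t s = (uint16_t)(src >> (16 * i));
        if (d == s)
            out |= (uint64_t)0xFFFF << (16 * i);
    }
    return out;
}

/* Model of pcmpgtw src, dst: word field = 0xFFFF where dst > src (signed), else 0. */
uint64_t model_pcmpgtw(uint64_t src, uint64_t dst) {
    uint64_t out = 0;
    for (int i = 0; i < 4; i++) {
        int16_t d = (int16_t)(uint16_t)(dst >> (16 * i));
        int16_t s = (int16_t)(uint16_t)(src >> (16 * i));
        if (d > s)
            out |= (uint64_t)0xFFFF << (16 * i);
    }
    return out;
}
```

The resulting mask is typically combined with PAND, PANDN and POR to select between two vectors without branching.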
Rule of Thumb
• Only shift instructions can take an immediate operand
• Only the movd instruction can take a 32-bit general-purpose register
• Punpckl can take a 32-bit memory source
• All other instructions operate on 64-bit registers or memory. No immediate operands!
Constant numbers
• Generate a zero in mm0 (either instruction works):
    PXOR %mm0, %mm0
    PANDN %mm0, %mm0
• Generate all 1's in register mm1, which is -1 in each of the packed data type fields (PCMPEQW or PCMPEQD work equally well):
    PCMPEQB %mm1, %mm1
• Generate the constant 1 in every packed-byte [or packed-word] (or packed-dword) field:
    PXOR %mm0, %mm0
    PCMPEQB %mm1, %mm1
    PSUBB %mm1, %mm0 [PSUBW %mm1, %mm0] (PSUBD %mm1, %mm0)
• Generate the signed constant 2^n - 1 in every packed-word (or packed-dword) field:
    PCMPEQB %mm1, %mm1
    PSRLW $(16-n), %mm1 (PSRLD $(32-n), %mm1)
• Generate the signed constant -2^n in every packed-word (or packed-dword) field:
    PCMPEQB %mm1, %mm1
    PSLLW $n, %mm1 (PSLLD $n, %mm1)
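The last two recipes work because shifting an all-1's word right by 16-n leaves n low bits set (2^n - 1), while shifting it left by n clears the n low bits (-2^n in two's complement). The portable C sketch below checks that reasoning with scalar models of the word shifts. The function names are illustrative, and a `uint64_t` stands in for an MMX register.

```c
#include <stdint.h>

/* Model of psrlw $count, reg: logical right shift of each 16-bit field. */
uint64_t model_psrlw(uint64_t r, unsigned count) {
    uint64_t out = 0;
    if (count > 15)
        return 0;                      /* shifting by 16 or more clears every field */
    for (int i = 0; i < 4; i++) {
        uint64_t lane = (r >> (16 * i)) & 0xFFFF;
        out |= (lane >> count) << (16 * i);
    }
    return out;
}

/* Model of psllw $count, reg: logical left shift of each 16-bit field. */
uint64_t model_psllw(uint64_t r, unsigned count) {
    uint64_t out = 0;
    if (count > 15)
        return 0;
    for (int i = 0; i < 4; i++) {
        uint64_t lane = (r >> (16 * i)) & 0xFFFF;
        out |= ((lane << count) & 0xFFFF) << (16 * i);
    }
    return out;
}

/* 2^n - 1 in every word: all 1's (the pcmpeq step), then psrlw $(16-n). */
uint64_t words_pow2_minus_1(unsigned n) {
    return model_psrlw(UINT64_C(0xFFFFFFFFFFFFFFFF), 16 - n);
}

/* -2^n in every word: all 1's (the pcmpeq step), then psllw $n. */
uint64_t words_neg_pow2(unsigned n) {
    return model_psllw(UINT64_C(0xFFFFFFFFFFFFFFFF), n);
}
```

The sketch assumes 1 <= n <= 15, the range in which both constants fit in a signed word.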
Examples
• Absolute value of a vector of signed words
    movq %mm0, %mm1      # make a copy of the source data
    psraw $15, %mm0      # replicate the sign bit across each word
    pxor %mm0, %mm1      # take the one's complement of just the negative fields
    psubsw %mm0, %mm1    # add 1 to just the negative fields
PXOR/XOR a number with all 0's: get the number itself
PXOR/XOR a number with all 1's: get NOT(number)
After psraw, each word of %mm0 is either all 0's or all 1's
For a positive number, psubsw subtracts all 0's (0)
For a negative number, psubsw subtracts all 1's (-1)
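The same three steps (sign mask, conditional complement, conditional add of 1) can be traced field by field in plain C. This is a scalar sketch for checking the reasoning, not MMX code; `model_abs_words` is an illustrative name, and the final subtraction is modeled as a signed saturating subtract on words.

```c
#include <stdint.h>

/* Scalar model of the absolute-value sequence on four signed words. */
uint64_t model_abs_words(uint64_t x) {
    uint64_t out = 0;
    for (int i = 0; i < 4; i++) {
        int16_t v = (int16_t)(uint16_t)(x >> (16 * i));
        int16_t mask = (v < 0) ? -1 : 0;          /* psraw $15: 0 or all 1's      */
        int16_t flipped = (int16_t)(v ^ mask);    /* pxor: complement negatives   */
        int32_t diff = (int32_t)flipped - mask;   /* subtract 0 or -1             */
        if (diff > 32767)                         /* signed saturation            */
            diff = 32767;
        out |= (uint64_t)(uint16_t)diff << (16 * i);
    }
    return out;
}
```

Because the subtraction saturates, the one value with no positive counterpart, -32768, yields 32767 instead of wrapping back to -32768.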
Dot Product
#include <stdio.h>

int main(void)
{
    int i;
    int result;
    unsigned short a[] = {1, 2, 3, 4, 5, 6, 7, 8};
    unsigned short b[] = {2, 4, 6, 8, 10, 12, 14, 16};

    __asm__("pxor %mm7,%mm7");                  /* clear the accumulator mm7 */

    for (i = 0; i < sizeof(a)/sizeof(short); i += 4) {
        __asm__("movq %0,%%mm0\n\t"             /* four words of a */
                "movq %1,%%mm1\n\t"             /* four words of b */
                "pmaddwd %%mm1,%%mm0\n\t"       /* two partial sums of products */
                "paddd %%mm0,%%mm7"             /* accumulate in mm7 */
                :
                : "m" (a[i]), "m" (b[i]));
    }
    __asm__("movq %%mm7,%%mm0\n\t"
            "psrlq $32,%%mm0\n\t"               /* bring the high doubleword down */
            "paddd %%mm7,%%mm0\n\t"             /* add the two partial sums */
            "movd %%mm0,%0\n\t"
            "emms"
            : "=m" (result));
    printf("dot product: %d\n", result);
    return 0;
}
movd moves the lower 32 bits of mm0
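For checking the result of the MMX version, a scalar reference can mirror its structure: two running sums play the role of the two doubleword fields of mm7, and the final addition corresponds to the psrlq/paddd pair. This is a portable C sketch with illustrative function names; it assumes the element count is a multiple of 4 and that the values are small enough not to overflow, as in the slide's data.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for the pmaddwd-based dot product.
   Assumes n is a multiple of 4; any tail elements are ignored. */
int32_t model_dot_product(const uint16_t *a, const uint16_t *b, size_t n) {
    int32_t acc_lo = 0;   /* low doubleword of the accumulator  */
    int32_t acc_hi = 0;   /* high doubleword of the accumulator */
    for (size_t i = 0; i + 4 <= n; i += 4) {
        /* pmaddwd treats the words as signed and sums adjacent products */
        acc_lo += (int16_t)a[i]     * (int16_t)b[i]
                + (int16_t)a[i + 1] * (int16_t)b[i + 1];
        acc_hi += (int16_t)a[i + 2] * (int16_t)b[i + 2]
                + (int16_t)a[i + 3] * (int16_t)b[i + 3];
    }
    return acc_lo + acc_hi;   /* fold the two partial sums together */
}

/* The arrays from the slide, wrapped in a helper for convenience. */
int32_t slide_dot_product(void) {
    static const uint16_t a[] = {1, 2, 3, 4, 5, 6, 7, 8};
    static const uint16_t b[] = {2, 4, 6, 8, 10, 12, 14, 16};
    return model_dot_product(a, b, sizeof(a) / sizeof(a[0]));
}
```

For the slide's arrays the expected value is 2 * (1 + 4 + 9 + ... + 64) = 408.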
Weathercaster
• PCMPEQ (packed compare for equality) is performed on the weathercaster and blue-screen images, yielding a bitmask that traces the outline of the weathercaster.
• This bitmask image is PANDNed (packed and not) with the weathercaster image, yielding the first intermediate image: now the weathercaster has no background behind her.
• The same bitmask image is PANDed (packed and) with the weather map image, yielding the second intermediate image.
• The two intermediate images are PORed (packed or) together, resulting in the final composite of the weathercaster over the weather map.
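The four steps above amount to a branch-free select driven by a compare mask. The sketch below models them in portable C for eight pixels at a time. It assumes 8-bit pixels (the slide does not state the pixel format), uses a `uint64_t` in place of an MMX register, and the function names are illustrative.

```c
#include <stdint.h>

/* Model of a packed byte compare: byte field = 0xFF where the bytes match. */
static uint64_t model_pcmpeqb(uint64_t src, uint64_t dst) {
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t d = (dst >> (8 * i)) & 0xFF;
        uint64_t s = (src >> (8 * i)) & 0xFF;
        if (d == s)
            out |= (uint64_t)0xFF << (8 * i);
    }
    return out;
}

/* Composite eight 8-bit pixels: keep the caster where she differs from the
   blue screen, take the weather map everywhere else. */
uint64_t composite8(uint64_t caster, uint64_t blue, uint64_t map) {
    uint64_t mask = model_pcmpeqb(blue, caster);  /* 1's where background  */
    uint64_t fg = ~mask & caster;                 /* pandn: caster only    */
    uint64_t bg = mask & map;                     /* pand:  map only       */
    return fg | bg;                               /* por:   final picture  */
}
```

In the usage below, 0x10 is an arbitrary value chosen to represent the blue-screen color.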
Address or Content?
    .section .rodata
mybytes:
    .byte 'a','b','c','d','e','f','g','h'
mystr:
    .ascii "abcdefghijklmnopqrstuvwxyz"
    .text
    .globl main
    .type main, @function
main:
    pushl %ebp
    movl %esp, %ebp
    movl mybytes, %eax            # content
    movl $mybytes, %ebx           # address
    movl (mybytes), %edx          # content
    movl (%ebx), %edx             # content
    xorl %ecx, %ecx               # index = 0
    movl $mystr, %ebx             # address
    movq (%ebx,%ecx,8),%mm0       # content
    leal mystr, %ebx              # address
    movq (%ebx,%ecx,8),%mm1       # content
    leal (mystr), %ebx            # address
    movq (%ebx,%ecx,8),%mm2       # content
    movq mystr(,%ecx,8),%mm3      # content
    movq mystr,%mm4               # content
    movq (mystr),%mm5             # content
    subl $8, %esp
    movq %mm0, (%esp)
    leave
    ret
    .size main, .-main
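Content in %eax and %edx: 0x64636261 == "abcd" (in this listing %ecx is cleared by xorl and used as the index)
Content in %ebx: address
Content in %mm0-%mm5: 0x6867666564636261 == "abcdefgh"
In the hexadecimal value, the high address is on the left and the low address is on the right.
In the string, the low address is on the left and the high address is on the right.
The byte at the lowest address is 0x61 == 97 == 'a'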
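The register values above follow from x86 being little-endian: the byte at the lowest address becomes the least significant byte of the register. The portable C sketch below reproduces that byte ordering with explicit shifts, so it gives the same answer on any host. The function names `load_le32` and `load_le64` are illustrative.

```c
#include <stdint.h>

/* Assemble a 32-bit value the way an x86 movl load does:
   the byte at the lowest address ends up in the lowest 8 bits. */
uint32_t load_le32(const unsigned char *p) {
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* Same idea for the 64-bit movq load into an MMX register. */
uint64_t load_le64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}
```

Applied to the string "abcdefgh", `load_le32` gives 0x64636261 and `load_le64` gives 0x6867666564636261, matching the values on the slide.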
Misc
• Context switching
  – FP mode to MMX mode: 28 cycles
  – MMX mode to FP mode: 53 cycles
  FP_code: …... ……
  MMX_code: …... EMMS (*mark the FP tag word as empty*)
  FP_code 1: …... …...
• Also FNSAVE and FRSTOR
MMX Instruction Set
Category       Mnemonic             Different Opcodes   Description
Arithmetic     PADD[B,W,D]          3                   Add with wrap-around on [byte, word, doubleword]
               PADDS[B,W]           2                   Add signed with saturation on [byte, word]
               PADDUS[B,W]          2                   Add unsigned with saturation on [byte, word]
               PSUB[B,W,D]          3                   Subtract with wrap-around on [byte, word, doubleword]
               PSUBS[B,W]           2                   Subtract signed with saturation on [byte, word]
               PSUBUS[B,W]          2                   Subtract unsigned with saturation on [byte, word]
               PMULHW               1                   Packed multiply high on words
               PMULLW               1                   Packed multiply low on words
               PMADDWD              1                   Packed multiply on words and add resulting pairs
Comparison     PCMPEQ[B,W,D]        3                   Packed compare for equality [byte, word, doubleword]
               PCMPGT[B,W,D]        3                   Packed compare greater than [byte, word, doubleword]
Conversion     PACKUSWB             1                   Pack words into bytes (unsigned with saturation)
               PACKSS[WB,DW]        2                   Pack [words into bytes, doublewords into words] (signed with saturation)
               PUNPCKH[BW,WD,DQ]    3                   Unpack (interleave) high-order [bytes, words, doublewords] from MMX register
               PUNPCKL[BW,WD,DQ]    3                   Unpack (interleave) low-order [bytes, words, doublewords] from MMX register
Logical        PAND                 1                   Bitwise AND
               PANDN                1                   Bitwise AND NOT
               POR                  1                   Bitwise OR
               PXOR                 1                   Bitwise XOR
Shift          PSLL[W,D,Q]          6                   Packed shift left logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRL[W,D,Q]          6                   Packed shift right logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRA[W,D]            4                   Packed shift right arithmetic [word, doubleword] by amount specified in MMX register or by immediate value
Data Transfer  MOV[D,Q]             4                   Move [doubleword, quadword] to MMX register or from MMX register
State Mgmt     EMMS                 1                   Empty MMX state
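The table distinguishes three kinds of addition: wrap-around, signed saturation and unsigned saturation. The portable C sketch below contrasts them on packed bytes. It is a scalar model for illustration, with a `uint64_t` standing in for an MMX register and made-up `model_*` names.

```c
#include <stdint.h>

/* Model of paddb src, dst: each byte sum wraps around modulo 256. */
uint64_t model_paddb(uint64_t src, uint64_t dst) {
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        unsigned sum = (unsigned)((dst >> (8 * i)) & 0xFF)
                     + (unsigned)((src >> (8 * i)) & 0xFF);
        out |= (uint64_t)(sum & 0xFF) << (8 * i);
    }
    return out;
}

/* Model of paddusb src, dst: unsigned sums are clamped to 255. */
uint64_t model_paddusb(uint64_t src, uint64_t dst) {
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        unsigned sum = (unsigned)((dst >> (8 * i)) & 0xFF)
                     + (unsigned)((src >> (8 * i)) & 0xFF);
        if (sum > 255)
            sum = 255;
        out |= (uint64_t)sum << (8 * i);
    }
    return out;
}

/* Model of paddsb src, dst: signed sums are clamped to [-128, 127]. */
uint64_t model_paddsb(uint64_t src, uint64_t dst) {
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        int sum = (int8_t)(uint8_t)(dst >> (8 * i))
                + (int8_t)(uint8_t)(src >> (8 * i));
        if (sum > 127)
            sum = 127;
        if (sum < -128)
            sum = -128;
        out |= (uint64_t)(uint8_t)sum << (8 * i);
    }
    return out;
}
```

Adding 0x20 to the byte 0xF0 gives 0x10 with wrap-around, 0xFF with unsigned saturation, and 0x10 with signed saturation (where 0xF0 means -16). Adding 0x20 to 0x70 gives 0x90, 0x90 and 0x7F respectively.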