Lecture 3: Instruction Set Architecture
Interface
[Figure: the instruction set as the interface between software/compiler and hardware]
Design Space of ISA
Five Primary Dimensions
• Number of explicit operands: 0, 1, 2, 3
• Operand Storage: Where besides memory?
• Effective Address: How is memory location specified?
• Type & Size of Operands: byte, int, float, vector, . . . How is it specified?
• Operations: add, sub, mul, . . . How is it specified?
Other Aspects
• Successor: How is it specified?
• Conditions: How are they determined?
• Encodings: Fixed or variable?
• Parallelism
ISA Metrics
• Orthogonality
– No special registers, few special cases, all operand modes available with any data type or instruction type
• Completeness
– Support for a wide range of operations and target applications
• Regularity
– No overloading for the meanings of instruction fields
• Streamlined
– Resource requirements can be easily determined
Ease of compilation
Basic ISA Classes
Accumulator:
  1 address     add A          acc ← acc + mem[A]
  1+x address   addx A         acc ← acc + mem[A + x]
Stack:
  0 address     add            tos ← tos + next
General Purpose Register:
  2 address     add A B        EA(A) ← EA(A) + EA(B)
  3 address     add A B C      EA(A) ← EA(B) + EA(C)
Load/Store:
  3 address     add Ra Rb Rc   Ra ← Rb + Rc
                load Ra Rb     Ra ← mem[Rb]
                store Ra Rb    mem[Rb] ← Ra
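As a rough illustration of how two of these classes differ, the sketch below computes mem[2] = mem[0] + mem[1] on a toy accumulator machine and a toy load/store machine. The function names, the tuple encoding of instructions, and the accumulator `load`/`store` instructions are assumptions made for this example; the slide lists only `add` and `addx` for the accumulator class.

```python
def run_accumulator(mem, program):
    """Run 1-address instructions against memory; return the accumulator."""
    acc = 0
    for op, addr in program:
        if op == "load":        # acc <- mem[A]   (assumed, not on the slide)
            acc = mem[addr]
        elif op == "add":       # acc <- acc + mem[A]
            acc = acc + mem[addr]
        elif op == "store":     # mem[A] <- acc   (assumed, not on the slide)
            mem[addr] = acc
        else:
            raise ValueError(f"unknown instruction: {op}")
    return acc


def run_load_store(mem, regs, program):
    """Run load/store instructions; only load and store touch memory."""
    for instr in program:
        op = instr[0]
        if op == "load":        # Ra <- mem[Rb]
            _, ra, rb = instr
            regs[ra] = mem[regs[rb]]
        elif op == "store":     # mem[Rb] <- Ra
            _, ra, rb = instr
            mem[regs[rb]] = regs[ra]
        elif op == "add":       # Ra <- Rb + Rc
            _, ra, rb, rc = instr
            regs[ra] = regs[rb] + regs[rc]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs


# Accumulator version: three instructions, each naming one memory address.
acc_mem = [5, 7, 0]
run_accumulator(acc_mem, [("load", 0), ("add", 1), ("store", 2)])

# Load/store version: four instructions, addresses held in registers r1..r3.
ls_mem = [5, 7, 0]
ls_regs = {"r1": 0, "r2": 1, "r3": 2}
run_load_store(ls_mem, ls_regs, [
    ("load", "r4", "r1"),
    ("load", "r5", "r2"),
    ("add", "r6", "r4", "r5"),
    ("store", "r6", "r3"),
])
```

Both runs leave the sum in memory word 2. The load/store version needs one more instruction here, but its operands stay in registers where later instructions can reuse them.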
Stack Machines
• Instruction set:
+, -, *, /, . . .
push A, pop A
• Example: a*b - (a+c*b)
push a
push b
*
push a
push c
push b
*
+
-
[Figure: expression tree for a*b - (a+c*b) and the stack contents after each instruction]
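The example program can be traced with a small stack-machine sketch. The function name `run_stack_program`, the string form of the instructions, and the sample values for a, b and c are assumptions for illustration only.

```python
import operator

# Zero-address arithmetic: each operator pops two values and pushes one.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def run_stack_program(program, variables):
    """Execute push/pop/arithmetic instructions; return (result, trace)."""
    stack = []
    trace = []  # copy of the stack after every instruction
    for instr in program:
        if instr.startswith("push "):
            stack.append(variables[instr.split()[1]])
        elif instr.startswith("pop "):
            variables[instr.split()[1]] = stack.pop()
        elif instr in OPS:
            right = stack.pop()   # top of stack
            left = stack.pop()    # next on stack
            stack.append(OPS[instr](left, right))
        else:
            raise ValueError(f"unknown instruction: {instr}")
        trace.append(list(stack))
    return (stack[-1] if stack else None), trace


# The program from the slide for a*b - (a+c*b).
PROGRAM = ["push a", "push b", "*", "push a", "push c", "push b", "*", "+", "-"]
result, trace = run_stack_program(PROGRAM, {"a": 2, "b": 3, "c": 4})
```

With these sample values the trace shows a and b each being pushed twice, which is the "repeated operands" cost that a register machine avoids.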
Arguments Against Stacks
• Data does not always “surface” when needed
– Constants, repeated operands, common sub-expressions, so TOP and SWAP instructions are required
• Code density is about equal to that of GPR instruction sets
– Registers have short addresses
– Keep things in registers and reuse them
• Slightly simpler to write a poor compiler, but not an optimizing compiler
Performance derived from fast registers, not the way they are used.
A "Typical" RISC
• 32-bit fixed format instruction (3 formats)
• 32 32-bit GPRs (R0 contains zero, double-precision values take a register pair)
• 3-address, reg-reg arithmetic instruction
• Single address mode for load/store: base + displacement
• Simple branch conditions
• Delayed branch
Example: MIPS
Register-Register:    Op [31:26] | Rs1 [25:21] | Rs2 [20:16] | Rd [15:11] | shamt [10:6] | Opx [5:0]
Register-Immediate:   Op [31:26] | Rs1 [25:21] | Rd [20:16] | immediate [15:0]
Branch:               Op [31:26] | Rs1 [25:21] | Rs2/Opx [20:16] | immediate [15:0]
Jump / Call:          Op [31:26] | target [25:0]
Overview of MIPS
• simple instructions all 32 bits wide
• very structured
• only three instruction formats
R    op | rs | rt | rd | shamt | funct
I    op | rs | rt | 16 bit address
J    op | 26 bit address
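The three formats can be packed into 32-bit words with a few shifts. This is a minimal sketch that assumes the standard MIPS field widths (op 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits), which the slide does not state; the function names and the opcode and register numbers in the usage lines are likewise taken from the standard MIPS encoding rather than from the slide.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """R format: op | rs | rt | rd | shamt | funct."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op, rs, rt, imm):
    """I format: op | rs | rt | 16-bit immediate or address."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(op, target):
    """J format: op | 26-bit address."""
    return (op << 26) | (target & 0x3FFFFFF)


# Register numbers: $s1 = 17, $s2 = 18, $s3 = 19.
add_word = encode_r(0, 18, 19, 17, 0, 0x20)   # add $s1, $s2, $s3
lw_word = encode_i(0x23, 18, 17, 100)         # lw  $s1, 100($s2)
j_word = encode_j(0x02, 2500)                 # j   2500

print(f"{add_word:08x} {lw_word:08x} {j_word:08x}")
```

Every instruction comes out exactly 32 bits wide, and the op field sits in the same place in all three formats.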
Addresses in Branches and Jumps
• Instructions:
bne $t4,$t5,Label    Next instruction is at Label if $t4 ≠ $t5
beq $t4,$t5,Label    Next instruction is at Label if $t4 = $t5
j Label              Next instruction is at Label
• Formats:
I    op | rs | rt | 16 bit address
J    op | 26 bit address
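A short sketch of how the 16-bit and 26-bit address fields turn into byte addresses. It assumes the usual MIPS rules: both fields count words, branches are relative to PC + 4, and jumps keep the upper four bits of PC + 4. The function names are made up for this example.

```python
def branch_target(pc, offset_field):
    """PC-relative target for beq/bne: PC + 4 + (sign-extended offset * 4)."""
    offset = offset_field & 0xFFFF
    if offset & 0x8000:           # sign-extend the 16-bit field
        offset -= 0x10000
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


def jump_target(pc, target_field):
    """Target for j/jal: upper 4 bits of PC + 4, then the 26-bit field * 4."""
    return ((pc + 4) & 0xF0000000) | ((target_field & 0x3FFFFFF) << 2)


pc = 0x1000
forward = branch_target(pc, 25)        # beq $s1, $s2, 25  -> PC + 4 + 100
backward = branch_target(pc, 0xFFFF)   # offset of -1 word -> back to pc
jump = jump_target(pc, 2500)           # j 2500            -> 10000
```

Because the fields count words rather than bytes, a 16-bit branch offset reaches about ±128 KB around the branch and a 26-bit jump field covers a 256 MB region.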
To summarize: MIPS operands
Name | Example | Comments
32 registers | $s0-$s7, $t0-$t9, $zero, $a0-$a3, $v0-$v1, $gp, $fp, $sp, $ra, $at | Fast locations for data. In MIPS, data must be in registers to perform arithmetic. MIPS register $zero always equals 0. Register $at is reserved for the assembler to handle large constants.
2^30 memory words | Memory[0], Memory[4], ..., Memory[4294967292] | Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.
MIPS assembly language
Category | Instruction | Example | Meaning | Comments
Arithmetic | add | add $s1, $s2, $s3 | $s1 = $s2 + $s3 | Three operands; data in registers
Arithmetic | subtract | sub $s1, $s2, $s3 | $s1 = $s2 - $s3 | Three operands; data in registers
Arithmetic | add immediate | addi $s1, $s2, 100 | $s1 = $s2 + 100 | Used to add constants
Data transfer | load word | lw $s1, 100($s2) | $s1 = Memory[$s2 + 100] | Word from memory to register
Data transfer | store word | sw $s1, 100($s2) | Memory[$s2 + 100] = $s1 | Word from register to memory
Data transfer | load byte | lb $s1, 100($s2) | $s1 = Memory[$s2 + 100] | Byte from memory to register
Data transfer | store byte | sb $s1, 100($s2) | Memory[$s2 + 100] = $s1 | Byte from register to memory
Data transfer | load upper immediate | lui $s1, 100 | $s1 = 100 * 2^16 | Loads constant in upper 16 bits
Conditional branch | branch on equal | beq $s1, $s2, 25 | if ($s1 == $s2) go to PC + 4 + 100 | Equal test; PC-relative branch
Conditional branch | branch on not equal | bne $s1, $s2, 25 | if ($s1 != $s2) go to PC + 4 + 100 | Not equal test; PC-relative
Conditional branch | set on less than | slt $s1, $s2, $s3 | if ($s2 < $s3) $s1 = 1; else $s1 = 0 | Compare less than; for beq, bne
Conditional branch | set less than immediate | slti $s1, $s2, 100 | if ($s2 < 100) $s1 = 1; else $s1 = 0 | Compare less than constant
Unconditional jump | jump | j 2500 | go to 10000 | Jump to target address
Unconditional jump | jump register | jr $ra | go to $ra | For switch, procedure return
Unconditional jump | jump and link | jal 2500 | $ra = PC + 4; go to 10000 | For procedure call
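The "Meaning" column can be read as executable semantics. The sketch below interprets a straight-line subset of the table (add, sub, addi, lui, lw, sw, slt). The tuple instruction format, the `run` function, and modelling memory as a dictionary keyed by byte address are assumptions for illustration; branches, jumps and byte accesses are left out.

```python
MASK = 0xFFFFFFFF  # registers hold 32-bit values


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def run(program, regs, memory):
    """Execute (mnemonic, operands...) tuples in order; no control flow."""
    for instr in program:
        op, args = instr[0], instr[1:]
        if op == "add":
            rd, rs, rt = args
            regs[rd] = (regs[rs] + regs[rt]) & MASK
        elif op == "sub":
            rd, rs, rt = args
            regs[rd] = (regs[rs] - regs[rt]) & MASK
        elif op == "addi":
            rt, rs, imm = args
            regs[rt] = (regs[rs] + imm) & MASK
        elif op == "lui":
            rt, imm = args
            regs[rt] = (imm << 16) & MASK          # constant * 2^16
        elif op == "lw":
            rt, offset, base = args
            regs[rt] = memory[regs[base] + offset]
        elif op == "sw":
            rt, offset, base = args
            memory[regs[base] + offset] = regs[rt]
        elif op == "slt":
            rd, rs, rt = args
            regs[rd] = 1 if to_signed(regs[rs]) < to_signed(regs[rt]) else 0
        else:
            raise ValueError(f"unsupported instruction: {op}")
        regs["$zero"] = 0                          # $zero always equals 0
    return regs, memory


regs = {"$zero": 0, "$s1": 0, "$s2": 1000, "$s3": 0, "$t0": 0}
memory = {1100: 7}
run([
    ("lw", "$s1", 100, "$s2"),     # $s1 = Memory[$s2 + 100]
    ("addi", "$s1", "$s1", 100),   # $s1 = $s1 + 100
    ("sw", "$s1", 100, "$s2"),     # Memory[$s2 + 100] = $s1
    ("lui", "$s3", 100),           # $s3 = 100 * 2^16
    ("slt", "$t0", "$s1", "$s3"),  # $t0 = 1 if $s1 < $s3 else 0
], regs, memory)
```

Arithmetic only ever names registers, so the memory update takes a separate `lw` and `sw` around the `addi`.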
Alternative Architectures
• Design alternative:
– provide more powerful operations
– goal is to reduce number of instructions executed
– danger is a slower cycle time and/or a higher CPI
• Sometimes referred to as “RISC vs. CISC”
– virtually all new instruction sets since 1982 have been RISC
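The trade-off between instruction count, CPI and cycle time is usually written as the standard CPU performance equation. The slide does not spell it out, and the symbol names below are chosen for illustration.

```latex
\[
\text{CPU time} = \text{IC} \times \text{CPI} \times T_{\text{cycle}}
\]
```

Here IC is the number of instructions executed, CPI the average cycles per instruction, and T_cycle the clock cycle time. More powerful operations lower IC, but they pay off only if CPI and T_cycle do not grow by a larger factor.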
Most Popular ISA of All Time: The Intel 80x86
• 1978: 8086
– extension to 8080 (8 bit accumulator machine)
– 16 bit, additional registers
• 1980: 8087 floating point coprocessor
– adds 60 instructions
– hybrid stack/register scheme
• 1982: 80286
– 24-bit address space
– memory mapping and protection model
• 1985: 80386
– 32-bit address space
– 32-bit GP registers
– paging
• 1989-95
– 80486, Pentium, Pentium Pro
• 1997
– MMX added
80x86
• Complexity:
– Instructions from 1 to 17 bytes long
– one operand must act as both a source and destination
– one operand can come from memory
– complex addressing modes, e.g., “base or scaled index with 8 or 32 bit displacement”
• Saving grace:
– the most frequently used instructions are not too difficult to build
– compilers avoid the portions of the architecture that are slow
Intel 80x86 Integer Registers
Intel 80x86 Floating Point Registers
Usage of Intel 80x86 Floating Point Registers
Second operand | NASA 7 | Spice
Stack (2nd operand ST(1)) | 0.3% | 2.0%
Register (2nd operand ST(i), i>1) | 23.3% | 8.3%
Memory | 76.3% | 89.7%
Above are dynamic instruction percentages (i.e., based on counts of executed instructions)
80x86 Addressing/Protection
80x86 Instruction Format
• 8086 in black; 80386 extensions in color
(Base reg + 2^Scale x Index reg)
80x86 Instruction Encoding: Mod, Reg, R/M Field
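reg field (w from opcode):
r | w=0 | w=1, 16b | w=1, 32b
0 | AL | AX | EAX
1 | CL | CX | ECX
2 | DL | DX | EDX
3 | BL | BX | EBX
4 | AH | SP | ESP
5 | CH | BP | EBP
6 | DH | SI | ESI
7 | BH | DI | EDI

r/m field (depends on mod and machine mode):
r/m | mod=0, 16b | mod=0, 32b | mod=1, 16b | mod=1, 32b | mod=2, 16b | mod=2, 32b | mod=3
0 | addr=BX+SI | =EAX | BX+SI+d8 | EAX+d8 | BX+SI+d16 | EAX+d32 | same as reg field
1 | addr=BX+DI | =ECX | BX+DI+d8 | ECX+d8 | BX+DI+d16 | ECX+d32 | same as reg field
2 | addr=BP+SI | =EDX | BP+SI+d8 | EDX+d8 | BP+SI+d16 | EDX+d32 | same as reg field
3 | addr=BP+DI | =EBX | BP+DI+d8 | EBX+d8 | BP+DI+d16 | EBX+d32 | same as reg field
4 | addr=SI | =(sib) | SI+d8 | (sib)+d8 | SI+d16 | (sib)+d32 | same as reg field
5 | addr=DI | =d32 | DI+d8 | EBP+d8 | DI+d16 | EBP+d32 | same as reg field
6 | addr=d16 | =ESI | BP+d8 | ESI+d8 | BP+d16 | ESI+d32 | same as reg field
7 | addr=BX | =EDI | BX+d8 | EDI+d8 | BX+d16 | EDI+d32 | same as reg field

First address specifier: Reg=3 bits, R/M=3 bits, Mod=2 bits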
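The first address specifier can be pulled apart with shifts and masks. This sketch assumes the standard x86 byte layout (mod in bits 7:6, reg in bits 5:3, r/m in bits 2:0), which the slide gives only as field widths, and covers only the 32-bit mode columns of the table. The function names are made up for this example.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_modrm(byte):
    """Split the address specifier byte into (mod, reg, rm)."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


def describe_rm32(mod, rm):
    """Describe the r/m operand in 32-bit mode, following the table."""
    if mod == 3:
        return REG32[rm]                  # register operand, same as reg field
    disp = {0: "", 1: "+d8", 2: "+d32"}[mod]
    if rm == 4:
        return "(sib)" + disp             # a scale/index/base byte follows
    if mod == 0 and rm == 5:
        return "d32"                      # 32-bit direct address
    return REG32[rm] + disp


mod, reg, rm = decode_modrm(0xD8)         # 0b11_011_000
print(mod, REG32[reg], describe_rm32(mod, rm))
```

The two special cases, r/m = 4 and mod = 0 with r/m = 5, are where the encoding stops being regular: ESP cannot be used as a plain base and EBP cannot be used without a displacement.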
80x86 Instruction Encoding: Sc/Index/Base field
sib | Index | Base
0 | EAX | EAX
1 | ECX | ECX
2 | EDX | EDX
3 | EBX | EBX
4 | no index | ESP
5 | EBP | if mod=0, d32; if mod≠0, EBP
6 | ESI | ESI
7 | EDI | EDI
Base + Scaled Index Mode
Used when: mod = 0, 1, 2 in 32-bit mode AND r/m = 4!
2-bit Scale Field, 3-bit Index Field, 3-bit Base Field
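Putting the scale, index and base fields together gives the effective address Base + 2^Scale x Index (+ displacement). The sketch below assumes the standard byte layout (scale in bits 7:6, index in bits 5:3, base in bits 2:0); the function names and register values are illustrative.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_sib(byte):
    """Split the scale/index/base byte into (scale, index, base)."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def sib_address(sib, mod, regs, disp=0):
    """Effective address for base + scaled index mode (mod = 0, 1 or 2)."""
    scale, index, base = decode_sib(sib)
    addr = disp
    if not (base == 5 and mod == 0):      # base=5 with mod=0 means d32 only
        addr += regs[REG32[base]]
    if index != 4:                        # index=4 means "no index"
        addr += regs[REG32[index]] << scale
    return addr & 0xFFFFFFFF


regs = {name: 0 for name in REG32}
regs.update({"EAX": 0x1000, "EBX": 3, "ESP": 0x2000})

# 0x98 = 0b10_011_000: scale 2 (x4), index EBX, base EAX, plus an 8-bit disp.
array_elem = sib_address(0x98, mod=1, regs=regs, disp=8)
# 0x24 = 0b00_100_100: no index, base ESP.
stack_top = sib_address(0x24, mod=0, regs=regs)
```

A scale of 2 multiplies the index by 4, which matches indexing an array of 32-bit words.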
80x86 Addressing Mode Usage for 32-bit Mode
Addressing Mode | Gcc | Espr. | NASA7 | Spice | Avg.
Register indirect | 10% | 10% | 6% | 2% | 7%
Base + 8-bit disp | 46% | 43% | 32% | 4% | 31%
Base + 32-bit disp | 2% | 0% | 24% | 10% | 9%
Indexed | 1% | 0% | 1% | 0% | 1%
Based indexed + 8b disp | 0% | 0% | 4% | 0% | 1%
Based indexed + 32b disp | 0% | 0% | 0% | 0% | 0%
Base + Scaled Indexed | 12% | 31% | 9% | 0% | 13%
Base + Scaled Index + 8b disp | 2% | 1% | 2% | 0% | 1%
Base + Scaled Index + 32b disp | 6% | 2% | 2% | 33% | 11%
32-bit Direct | 19% | 12% | 20% | 51% | 26%
80x86 Length Distribution
[Figure: bar chart of % instructions at each length (0% to 30%) against length in bytes (1 to 11) for Espresso, Gcc, Spice, and NASA7]
Instruction Counts: 80x86 v. DLX
SPEC pgm | x86 | DLX | DLX÷86
gcc | 3,771,327,742 | 3,892,063,460 | 1.03
espresso | 2,216,423,413 | 2,801,294,286 | 1.26
spice | 15,257,026,309 | 16,965,928,788 | 1.11
nasa7 | 15,603,040,963 | 6,118,740,321 | 0.39
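The last column is the DLX instruction count divided by the x86 count. A minimal sketch that recomputes it from the counts in the table (the variable and function names are illustrative):

```python
# (x86 count, DLX count) per SPEC program, copied from the table.
COUNTS = {
    "gcc": (3_771_327_742, 3_892_063_460),
    "espresso": (2_216_423_413, 2_801_294_286),
    "spice": (15_257_026_309, 16_965_928_788),
    "nasa7": (15_603_040_963, 6_118_740_321),
}


def dlx_to_x86_ratio(x86_count, dlx_count):
    """Instructions executed on DLX per instruction executed on x86."""
    return dlx_count / x86_count


ratios = {
    program: round(dlx_to_x86_ratio(x86, dlx), 2)
    for program, (x86, dlx) in COUNTS.items()
}

for program, ratio in ratios.items():
    print(f"{program:10s} {ratio:.2f}")
```

A ratio above 1 means DLX executed more instructions than x86 on that program. For three of the four programs the difference is under 30%, and on nasa7 DLX executed fewer than half as many.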