Lecture 2: Instruction Set Principles & Examples
Prof. Tao Li
Advanced Computer Architecture
EEL 5764
What Is ISA?
Instruction set architecture is the structure of a computer that a machine language programmer (or a compiler) must understand to write a correct (timing independent) program for that machine.
For IBM System/360, 1964
A History of ISA Evolution
• 1960s: stack architectures - a good match for high-level languages
• 1970s: ISAs were enriched to make the compiler’s job easier
- software costs were a concern
• 1980s: high clock speed and high parallelism – RISC, pipelined and superscalar architectures
• today: ISAs designed in 1980 are still around!
• tomorrow: ISA virtualization
ISAs for Different Processors (today’s market)
• ISAs for all three segments are very similar
• Desktops: equal emphasis for int and fp, little regard for code size and power
• Servers: little need for high floating-point performance
• Embedded: emphasis on low cost and power – code size is important, floating-point may be optional
• Desktops and embedded also care about multimedia apps
-- hence, use special media extension instructions
ISA Type 1: Stack
Implicit operands on stack
Ex. C = A + B
Push A
Push B
Add
Pop C
Used in 60’s-70’s; now in Java VM
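The stack example above can be traced with a minimal sketch of a stack machine. The function name `run_stack_program`, the tuple-based program encoding, and the values of A and B are assumptions made for illustration only; they are not from the lecture.

```python
def run_stack_program(program, memory):
    """Execute (op, operand) pairs on a tiny stack machine (illustrative only)."""
    stack = []
    for op, operand in program:
        if op == "push":
            # Push the value of a memory variable onto the stack.
            stack.append(memory[operand])
        elif op == "add":
            # Both operands are implicit: they are the top two stack entries.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "pop":
            # Pop the top of stack into a memory variable.
            memory[operand] = stack.pop()
        else:
            raise ValueError(f"unknown op: {op}")
    return memory


# C = A + B, with hypothetical values for A and B.
memory = {"A": 3, "B": 4}
program = [("push", "A"), ("push", "B"), ("add", None), ("pop", "C")]
result = run_stack_program(program, memory)
print(result["C"])
```

Note that `add` names no operands at all, which is what "implicit operands on stack" means.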
ISA Type 2: Accumulator
• The accumulator provides an implicit input, and is the implicit place to store the result.
• Ex. C = A + B
Load A
Add B
Store C
• Used before 1980
ISA Type 3: General-purpose Registers
• General-purpose registers are preferred by compilers
– Reduce memory traffic
– Improve program speed
– Improve code density
• Usage of general-purpose registers
– Holding temporary variables in expression evaluation
– Passing parameters
– Holding variables
Register-memory
• There is no implicit operand
• One input operand is in a register, and one is in memory
• Ex. C = A + B
Load R1, A
Add R3, R1, B
Store R3, C
• Processors include VAX, 80x86
Register-register (Load-store)
• Both operands are registers
• Values in memory must be loaded into a register and stored back
• Ex. C = A + B
Load R1, A
Load R2, B
Add R3, R1, R2
Store R3, C
• Processors: MIPS, SPARC
Variants of GPR Architecture
• Number of operands in ALU instructions: two or three
Add R1, R2, R3
Add R1, R2
• Maximal number of memory operands in ALU instructions: zero, one, two, or three
Zero memory operands:
Load R1, A
Load R2, B
Add R3, R1, R2
One memory operand:
Load R1, A
Add R3, R1, B
• Three popular combinations
– register-register (load-store): 0 memory, 3 operands
– register-memory: 1 memory, 2 operands
– memory-memory: 2 memories, 2 operands; or 3 memories, 3 operands
Variants of GPR Architecture
• Register-Register (0 mem, 3 ops)
– Advantages: simple, fixed-length, simple code-generation, easy pipelining and parallelism extraction
– Disadvantages: high instr count and code size
– Examples: Alpha, MIPS, ARM, PowerPC, SPARC
• Register-Memory (1 mem, 2 ops)
– Advantages: can access data without doing a load, small code size
– Disadvantages: one of the operands is destroyed, instr latency is variable
– Examples: Intel 80x86, Motorola 68000
• Memory-Memory (2 mem, 2 ops) or (3, 3)
– Advantages: most compact code size, doesn’t waste registers
– Disadvantages: variation in instr size (hard to decode), frequent memory accesses, variable instr latency
– Examples: VAX
How Many Registers?
If the number of registers increases:
• Advantages
– Allocate more variables in registers (fast accesses)
– Reducing code spill
– Reducing memory traffic
• Disadvantages
– Longer register specifiers (difficult encoding)
– Increasing register access time (physical registers)
– More registers to save in context switch
MIPS64: 32 general-purpose registers
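The "longer register specifiers" cost can be made concrete with a little arithmetic. The sketch below assumes each register is named by a plain binary index; the helper name `specifier_bits` is hypothetical.

```python
def specifier_bits(num_registers):
    """Bits needed to name one register out of num_registers (binary index)."""
    return (num_registers - 1).bit_length()


for n in (8, 16, 32, 64):
    bits = specifier_bits(n)
    # A three-operand ALU instruction carries three register specifiers.
    print(f"{n} registers: {bits} bits each, {3 * bits} bits for three operands")
```

With 32 registers each specifier takes 5 bits, so a three-operand instruction spends 15 bits of a 32-bit word on register names.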
RISC vs. CISC
• Complex Instruction Set Computer: if you do it in hardware, it’s fast; hence, implement every functionality in hardware
– rich instruction set
– complex decoding
– complex analysis to identify dependences
• Reduced Instruction Set Computer: by using a few simple instruction primitives, the hardware is simpler
– easy to extract parallelism
– easy to effect high clock speeds
• x86 is CISC and is popular for compatibility reasons – CISC instrs are converted to RISC instrs in hardware
ISA and Performance
CPU time = #inst × CPI × cycle time
• RISC with Register-Register instructions
– Simple, fixed-length instruction encoding
– Simple code generation
– Regularity in CPI
– Higher instruction counts
– Lower instruction density
• CISC with Register-memory instructions
– No extra load in accessing data in memory
– Operands being not equivalent
– Restricted #registers due to encoding memory address
– Irregularity in CPI
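The CPU time formula can be evaluated directly. In the sketch below, the instruction counts, CPIs, and cycle time are hypothetical numbers chosen only to show how a higher instruction count can be offset by a lower CPI; they are not measurements from the lecture.

```python
def cpu_time(instruction_count, cpi, cycle_time_ns):
    """CPU time = #inst x CPI x cycle time, returned in nanoseconds."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical machines: the RISC-like one executes 20% more instructions
# but has a lower, more regular CPI at the same cycle time.
risc = cpu_time(instruction_count=1_200_000, cpi=1.2, cycle_time_ns=1.0)
cisc = cpu_time(instruction_count=1_000_000, cpi=2.0, cycle_time_ns=1.0)
print(f"RISC-like: {risc / 1e6:.2f} ms, CISC-like: {cisc / 1e6:.2f} ms")
```

Under these assumed inputs the RISC-like machine finishes first; different inputs can reverse the outcome, which is why all three factors must be considered together.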
Memory Addressing
Instructions see registers, constant values, and memory
• Addressing mode decides how to specify an object to access
– Object can be memory location, register, or a constant
– Memory addressing is complicated
• Memory addressing involves many factors
– Memory addressing mode
– Object size
– Byte ordering
– Alignment
For a memory location, its effective address is calculated from register contents, an immediate address, and the PC, as specified by the addressing mode
Interpreting Memory Addresses
• Most computers are byte addressed and also allow access to half words (16 bits), words (32), and double words (64)
• Accesses are usually required to be aligned: a half word cannot have an odd address, a double word must have an address A, where A mod 8 = 0, etc.
• Misalignment increases hardware complexity and worsens performance (if data cross cache line boundaries)
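The alignment rule (address A mod size = 0) is a one-line check. The helper name `is_aligned` and the sample addresses are assumptions for illustration.

```python
def is_aligned(address, size_bytes):
    """An access of size_bytes is aligned when address mod size_bytes == 0."""
    return address % size_bytes == 0


print(is_aligned(0x1001, 2))  # half word at an odd address
print(is_aligned(0x1008, 8))  # double word at a multiple of 8
print(is_aligned(0x1004, 8))  # double word at a multiple of 4 only
```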
Little and Big Endian
• Consider a 64-bit quantity, composed of bytes 0-7 (LSB-MSB)
• In Little-Endian format, memory address A will contain byte 0, address A+1 will contain byte 1, …, address A+7 will contain byte 7
Advantage: easier to organize bytes, half-words, words, double words, etc. into registers (Alpha, x86)
• In Big-Endian format, memory address A will contain byte 7, address A+1 will contain byte 6, …, address A+7 will contain byte 0
Advantage: values are stored in the order they are printed out, the sign is available early (Motorola)
Endianness Example
• Consider the hexadecimal number: MSB 0x43fa27c77156ab91 LSB
• Two options:
Address: 7 6 5 4 3 2 1 0
Little-endian: 43 fa 27 c7 71 56 ab 91
Big-endian: 91 ab 56 71 c7 27 fa 43
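The same example can be checked with Python's built-in `int.to_bytes`. Note that `bytes.hex()` prints from the lowest address (0) to the highest (7), which is the reverse of the address order used on the slide.

```python
value = 0x43FA27C77156AB91

# Byte at index 0 of each result is the byte stored at the lowest address.
little = value.to_bytes(8, byteorder="little")
big = value.to_bytes(8, byteorder="big")

print(little.hex())  # lowest address holds the LSB (0x91)
print(big.hex())     # lowest address holds the MSB (0x43)
```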
MIPS Data Addressing Modes
• Register
ADD $16, $7, $8
• Immediate
ADDI $17, $7, 100
• Displacement
LW $18, 100($9)
Memory Addressing Seen in CISC
• Direct (absolute): ADD R1, (1001)
• Register indirect: SUB R2, (R1)
• Indexed: ADD R1, (R2 + R3)
• Scaled: SUB R2, 100(R2)[R3]
• Autoincrement: ADD R1, (R2)+
• Autodecrement: SUB R2, -(R1)
• Memory indirect: ADD R1, @(R3)
And more … (see textbook p98)
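A rough sketch of how each of these modes forms an effective address follows. The function `effective_address`, its keyword parameters, and the register and memory contents are all hypothetical; `scale` stands for the operand size d in bytes, and the semantics follow the usual textbook definitions rather than any specific machine.

```python
def effective_address(mode, regs, mem, *, base=None, index=None, disp=0, scale=1):
    """Return the effective address for one addressing mode (illustrative)."""
    if mode == "direct":
        return disp                      # address is a constant
    if mode == "register_indirect":
        return regs[base]                # address is held in a register
    if mode == "indexed":
        return regs[base] + regs[index]  # sum of two registers
    if mode == "scaled":
        return disp + regs[base] + regs[index] * scale
    if mode == "autoincrement":
        address = regs[base]             # use the address, then bump the register
        regs[base] += scale
        return address
    if mode == "autodecrement":
        regs[base] -= scale              # bump the register, then use the address
        return regs[base]
    if mode == "memory_indirect":
        return mem[regs[base]]           # memory holds the address of the operand
    raise ValueError(f"unknown mode: {mode}")


regs = {"R1": 0x2000, "R2": 0x100, "R3": 4}
mem = {4: 0x3000}
print(effective_address("scaled", regs, mem, base="R2", index="R3", disp=100, scale=4))
print(hex(effective_address("memory_indirect", regs, mem, base="R3")))
```

The autoincrement and autodecrement modes modify the base register as a side effect, which is one reason complex modes make dependence analysis harder.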
Choosing Memory Addressing Modes
Choosing complex addressing modes
– Close to addressing in high-level language
– May reduce instruction counts
– Increase implementation complexity (may increase cycle time)
– Increase CPI
RISC ISA comes with simple memory addressing, and CISC ISA with complex ones
How Often Are Those Address Modes Used?
Usage of address modes, VAX machine, SPEC89
Usage of Immediate Operands In RISC
Alpha, SPEC CINT2000 & CFP2000
Immediate Size in RISC
Alpha, SPEC CINT2000 & CFP2000
Displacement Size in RISC
Displacement bit size: Alpha ISA, SPEC CPU2000 Integer and FP
Common Operations
Operator Type | Examples
Arithmetic/Logical | Add, sub, and, or, mult, div
Data transfer | Loads/stores
Control | Branch, jump, call, return
System | OS call, virtual memory management
Floating point | FP add, sub, mult, div
Decimal | Decimal add, sub, mult, decimal to character conversions
String | Move, compare, search
Graphics | Compression/decompression, vertex/pixel ops
Dynamic Instruction Mix (MIPS)
Instruction | SPEC2K Int | SPEC2K FP
Load | 26% | 15%
Store | 10% | 2%
Add | 19% | 23%
Compare | 5% | 2%
Cond br | 12% | 4%
Cond mv | 2% | 0%
Jump | 1% | 0%
LOGIC | 18% | 4%
FP load | | 15%
FP store | | 7%
FP others | | 19%
Common Operations
80x86 instruction | Integer average (% total executed)
Load | 22%
Conditional branch | 20%
Compare | 16%
Store | 12%
Add | 8%
And | 6%
Sub | 5%
Call/Return | 2%
Move register-register | 4%
Control Transfer Instructions
• Conditional branches (75% - Int) (82% - FP)
• Jumps (6% - Int) (10% - FP)
• Procedure calls/returns (19% - Int) (8% - FP)
• Design issues:
– How do you specify the target address?
– How do you specify the condition?
– What happens on a procedure call/return?
Review of MIPS ISA
Instruction Types
• Data transfer: Load and store
LW $16, 100($2)
LB $17, 200($2)
• Integer arithmetic/logic
ADDI $8, $16, 17
SLT $10, $8, $9
• Floating point arithmetic
ADD.D $f0, $f1, $f0
• Control instructions (branches and jumps)
BEQ $0, $1, loop
J _fprintf
• A few others
MIPS Instruction Format
I-type: opcode (6) | rs (5) | rt (5) | immediate/offset (16)
R-type: opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
J-type: opcode (6) | offset (26)
• opcode: 6-bit operation of the instruction
• rs: first source register
• rt: second source register
• rd: destination register
• immediate: immediate value or displacement
• shamt: shift amount
• funct: function variants
• offset: 26-bit jump target field; it is shifted left by 2 and combined with the upper bits of the PC (not added to it)
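The field layouts above can be exercised with a short bit-packing sketch. The helper names are hypothetical; the opcode and funct values used (0x23 for LW, opcode 0 with funct 0x20 for ADD) are the standard MIPS encodings, and the two instructions are the ones shown on the addressing-modes slide.

```python
def encode_r_type(opcode, rs, rt, rd, shamt, funct):
    """Pack R-type fields: 6 | 5 | 5 | 5 | 5 | 6 bits."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i_type(opcode, rs, rt, immediate):
    """Pack I-type fields: 6 | 5 | 5 | 16 bits (immediate truncated to 16 bits)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def decode_common_fields(word):
    """Extract the fields shared by R-type and I-type instructions."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "low16": word & 0xFFFF,
    }


# ADD $16, $7, $8  ->  rd=16, rs=7, rt=8
add_word = encode_r_type(opcode=0, rs=7, rt=8, rd=16, shamt=0, funct=0x20)
# LW $18, 100($9)  ->  rt=18, rs=9, offset=100
lw_word = encode_i_type(opcode=0x23, rs=9, rt=18, immediate=100)
print(f"{add_word:#010x} {lw_word:#010x}")
print(decode_common_fields(lw_word))
```

Because opcode, rs, and rt sit at the same bit positions in both formats, the decoder can read the source registers before it knows which format it is looking at.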
Simplified Instruction Set
• LW/SW Instructions
I-type: opcode (6) | rs (5) | rt (5) | offset (16)
• R-type Arithmetic Instructions
R-type: opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
• Branch Instructions
I-type: opcode (6) | rs (5) | rt (5) | offset (16)
Normal Usage of Registers
Name | Register number | Usage
$zero | 0 | the constant value 0
$v0-$v1 | 2-3 | values for results and expression evaluation
$a0-$a3 | 4-7 | arguments
$t0-$t7 | 8-15 | temporaries
$s0-$s7 | 16-23 | saved
$t8-$t9 | 24-25 | more temporaries
$gp | 28 | global pointer
$sp | 29 | stack pointer
$fp | 30 | frame pointer
$ra | 31 | return address
Simple Implementation
• Available datapath elements
[Figure: datapath elements: instruction memory, program counter, adder, registers, ALU, sign-extension unit (16 to 32 bits), data memory unit]
Operations in Instruction Execution
1. Fetch the inst at PC; PC incr by 4
2. Decode the inst; Read $a and $b
• LW $a, 100($b)
3. Add $b and 100
4. Read memory
5. Write $a
• ADD $c, $a, $b
3. Add $a and $b
4. Write $c
• SW $a, 100($b)
3. Add $b and 100
4. Write memory
• BEQ $a, $b, offset
3. Compare $a and $b
4. If equal, PC <= PC + offset
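The execution steps above can be traced with a minimal interpreter sketch. Everything here is an assumption for illustration: the tuple instruction encoding, the register names, the dictionary-based memory, and treating the BEQ offset as a byte offset added to the already incremented PC (real MIPS shifts a word offset left by 2).

```python
def run(program, regs, mem, max_steps=100):
    """Run a {pc: instruction} program; return the final PC (illustrative)."""
    pc = 0
    for _ in range(max_steps):
        if pc not in program:
            break
        instr = program[pc]              # 1. fetch the inst at PC
        pc += 4                          #    PC incr by 4
        op = instr[0]                    # 2. decode; registers are read below
        if op == "LW":                   # ("LW", rt, offset, rs)
            _, rt, offset, rs = instr
            address = regs[rs] + offset  # 3. add base register and offset
            regs[rt] = mem[address]      # 4. read memory, 5. write register
        elif op == "SW":                 # ("SW", rt, offset, rs)
            _, rt, offset, rs = instr
            address = regs[rs] + offset  # 3. add base register and offset
            mem[address] = regs[rt]      # 4. write memory
        elif op == "ADD":                # ("ADD", rd, rs, rt)
            _, rd, rs, rt = instr
            regs[rd] = regs[rs] + regs[rt]   # 3. add, 4. write register
        elif op == "BEQ":                # ("BEQ", rs, rt, offset)
            _, rs, rt, offset = instr
            if regs[rs] == regs[rt]:     # 3. compare
                pc = pc + offset         # 4. if equal, PC <= PC + offset
        else:
            raise ValueError(f"unknown op: {op}")
    return pc


program = {
    0: ("LW", "a", 100, "b"),
    4: ("ADD", "c", "a", "a"),
    8: ("BEQ", "c", "c", 4),     # always taken: skips the instruction at 12
    12: ("ADD", "c", "c", "c"),
    16: ("SW", "c", 104, "b"),
}
regs = {"a": 0, "b": 0, "c": 0}
mem = {100: 21}
final_pc = run(program, regs, mem)
print(final_pc, regs["c"], mem[104])
```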
Single-cycle Implementation
• This one supports LW, SW, BEQ, and ALU instructions, e.g. ADD
[Figure: single-cycle datapath with PC, instruction memory, registers, sign extend, shift left 2, adders, ALU, data memory, and multiplexers; control signals RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, ALUOp, PCSrc]
Regularize Instruction Execution
LW/SW | ALU | Branch
IF | IF | IF
ID/REG | ID/REG | ID/REG
EX | EX | EX
MEM | -- | --
WB | WB | --
Pipelined Instruction Executions
80x86 ISA Early History
Source: An Alternative to RISC: The Intel 80x86. Appendix D of the textbook. Available at http://www.mkp.com/CA3.
• 1978: 16-bit 8086
– Extended from 8-bit 8080
– Hybrid of accumulator and GPR
• 1980: 8087 FP coprocessor
– Add 60 FP instructions to 8086
– Hybrid of stack and GPR
• 1982: 80286
– 24-bit address
– Virtual memory management
• 1985: 32-bit 80386
– Add new addressing modes, paging support
– Nearly a GPR machine
80x86 Integer Registers
Addressing Modes
• Absolute: 16-bit or 32-bit displacement
• Register indirect: selected registers EAX, ECX, EDX, EBX, ESI, and EDI
• Displacement: selected registers + 16- or 32-bit offset
• Indexed: reg + reg
• Based indexed: reg + reg + 8- or 16-bit disp
• Base plus scaled indexed: reg + reg*d
• Base plus scaled index with displacement: reg + reg*d + 8- or 16-bit disp
Byte order: little endian
Instruction Encoding Examples
Remarks
…, its [80x86] checkered ancestry has led to an architecture that is difficult to explain and impossible to love.
Textbook, appendix D
The x86 isn’t all that complex – it just doesn’t make a lot of sense.
Mike Johnson, Leader of 80x86 Design at AMD