Lecture 2: Instruction Set Principles & Examples · Instruction set architecture is the structure of a computer that a machine language programmer (or a compiler) must understand

Lecture 2: Instruction Set Principles & Examples

Prof. Tao Li

Advanced Computer Architecture

EEL 5764

What Is ISA?

Instruction set architecture is the structure of a computer that a machine language programmer (or a compiler) must understand to write a correct (timing independent) program for that machine.

For IBM System/360, 1964

A History of ISA Evolution

• 1960s: stack architectures, a good match for high-level languages
• 1970s: ISAs were enriched to make the compiler’s job easier
– software costs were a concern
• 1980s: high clock speed and high parallelism – RISC, pipelined and superscalar architectures
• today: ISAs designed in 1980 are still around!
• tomorrow: ISA virtualization

ISAs for Different Processors (today’s market)

• ISAs for all three segments are very similar
• Desktops: equal emphasis for int and fp, little regard for code size and power
• Servers: little need for high floating-point performance
• Embedded: emphasis on low cost and power – code size is important, floating-point may be optional
• Desktops and embedded also care about multimedia apps
– hence, use special media extension instructions

ISA Type 1: Stack

• Implicit operands on stack
• Ex. C = A + B
    Push A
    Push B
    Add
    Pop C

Used in 60’s-70’s; now in Java VM
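The Push/Add/Pop sequence above can be traced with a small interpreter. This is only an illustrative sketch: the function name `run_stack_program`, the dictionary used as memory, and the values of A and B are assumptions, not part of the lecture.

```python
def run_stack_program(program, memory):
    """Run Push/Add/Pop instructions; ALU operands are implicit on the stack."""
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "Push":
            # Copy a memory variable onto the top of the stack.
            stack.append(memory[args[0]])
        elif op == "Add":
            # Both inputs and the result are implicit: top two stack entries.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "Pop":
            # Move the top of the stack back to a memory variable.
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return memory


# Hypothetical initial values for A and B.
memory = run_stack_program(["Push A", "Push B", "Add", "Pop C"], {"A": 2, "B": 3})
print(memory["C"])
```

Note that Add names no operands at all, which is what keeps stack code compact.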

ISA Type 2: Accumulator

• The accumulator provides an implicit input, and is the implicit place to store the result.

• Ex. C = A + B
    Load A
    Add B
    Store C

• Used before 1980

ISA Type 3: General-purpose Registers

• General-purpose registers are preferred by compilers
– Reduce memory traffic
– Improve program speed
– Improve code density
• Usage of general-purpose registers
– Holding temporary variables in expression evaluation
– Passing parameters
– Holding variables

Register-memory

• There is no implicit operand
• One input operand is in a register, and one is in memory
• Ex. C = A + B
    Load R1, A
    Add R3, R1, B
    Store R3, C

• Processors include VAX, 80x86

Register-register (Load-store)

• Both operands are registers
• Values in memory must be loaded into a register and stored back
• Ex. C = A + B
    Load R1, A
    Load R2, B
    Add R3, R1, R2
    Store R3, C

• Processors: MIPS, SPARC

Variants of GPR Architecture

• Number of operands in ALU instructions: two or three
    Add R1, R2, R3        Add R1, R2
• Maximal number of memory operands in ALU instructions: zero, one, two, or three
    Load R1, A            Load R1, A
    Load R2, B            Add R3, R1, B
    Add R3, R1, R2
• Three popular combinations
– register-register (load-store): 0 memory, 3 operands
– register-memory: 1 memory, 2 operands
– memory-memory: 2 memories, 2 operands; or 3 memories, 3 operands

Variants of GPR Architecture

Type | Advantages | Disadvantages | Examples
Register-Register (0 mem, 3 ops) | Simple, fixed-length, simple code-generation, easy pipelining and parallelism extraction | High instr count and code size | Alpha, MIPS, ARM, PowerPC, SPARC
Register-Memory (1 mem, 2 ops) | Can access data without doing a load, small code size | One of the operands is destroyed, instr latency is variable | Intel 80x86, Motorola 68000
Memory-Memory (2 mem, 2 ops) or (3, 3) | Most compact code size, doesn’t waste registers | Variation in instr size (hard to decode), frequent memory accesses, variable instr latency | VAX

How Many Registers?

If the number of registers increases:

• Advantages
– Allocate more variables in registers (fast accesses)
– Reducing spill code
– Reducing memory traffic
• Disadvantages
– Longer register specifiers (difficult encoding)
– Increasing register access time (physical registers)
– More registers to save in context switch

MIPS64: 32 general-purpose registers

RISC vs. CISC

• Complex Instruction Set Computer: if you do it in hardware, it’s fast; hence, implement every functionality in hardware
– rich instruction set
– complex decoding
– complex analysis to identify dependences
• Reduced Instruction Set Computer: by using a few simple instruction primitives, the hardware is simpler
– easy to extract parallelism
– easy to effect high clock speeds
• x86 is CISC and is popular for compatibility reasons – CISC instrs are converted to RISC instrs in hardware

ISA and Performance

CPU time = #inst × CPI × cycle time

• RISC with register-register instructions
– Simple, fixed-length instruction encoding
– Simple code generation
– Regularity in CPI
– Higher instruction counts
– Lower instruction density
• CISC with register-memory instructions
– No extra load in accessing data in memory
– Operands are not equivalent
– Restricted number of registers due to encoding memory addresses
– Irregularity in CPI
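The trade-off in CPU time = #inst × CPI × cycle time can be made concrete with a short calculation. The numbers below are hypothetical and chosen only to show how a higher instruction count can be offset by a lower CPI; they are not measurements from the lecture.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU time = #inst x CPI x cycle time, returned in seconds."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: the register-register version executes 20% more
# instructions but has a lower, more regular CPI at the same 1 ns cycle time.
risc_time = cpu_time(instruction_count=1_200_000, cpi=1.2, cycle_time_s=1e-9)
cisc_time = cpu_time(instruction_count=1_000_000, cpi=2.0, cycle_time_s=1e-9)

print(f"register-register: {risc_time * 1e3:.2f} ms")
print(f"register-memory:   {cisc_time * 1e3:.2f} ms")
```

With different assumed values the comparison can just as easily go the other way; the point is that all three factors must be considered together.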

Memory Addressing

Instructions see registers, constant values, and memory

• Addressing mode decides how to specify an object to access
– Object can be memory location, register, or a constant
– Memory addressing is complicated
• Memory addressing involves many factors
– Memory addressing mode
– Object size
– Byte ordering
– Alignment

For a memory location, its effective address is calculated from register contents, an immediate address, and the PC, as specified by the addressing mode

Interpreting Memory Addresses

• Most computers are byte addressed and also allow access to half words (16 bits), words (32), and double words (64)
• Accesses are usually required to be aligned: a half word cannot have an odd address, a double word must have an address A, where A mod 8 = 0, etc.
• Misalignment increases hardware complexity and worsens performance (if data cross cache line boundaries)
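The alignment rule (address A is aligned for an object of size s when A mod s = 0) and the cache-line concern can be checked with two small helpers. This is a sketch: the helper names and the 64-byte line size are assumptions for illustration only.

```python
def is_aligned(address, size_bytes):
    """An access is aligned when the address is a multiple of the object size."""
    return address % size_bytes == 0


def crosses_cache_line(address, size_bytes, line_bytes=64):
    """True when the first and last byte of the access fall in different lines.

    line_bytes=64 is an assumed cache line size, not a value from the slides.
    """
    first_line = address // line_bytes
    last_line = (address + size_bytes - 1) // line_bytes
    return first_line != last_line


print(is_aligned(16, 8))          # double word at address 16
print(is_aligned(3, 2))           # half word at an odd address
print(crosses_cache_line(60, 8))  # misaligned double word near a line end
```

An aligned access whose size divides the line size never straddles two lines, which is one reason hardware prefers to require alignment.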

Little and Big Endian

• Consider a 64-bit quantity, composed of bytes 0-7 (LSB-MSB)
• In Little-Endian format, memory address A will contain byte 0, address A+1 will contain byte 1, …, address A+7 will contain byte 7
Advantage: easier to organize bytes, half-words, words, double words, etc. into registers (Alpha, x86)
• In Big-Endian format, memory address A will contain byte 7, address A+1 will contain byte 6, …, address A+7 will contain byte 0
Advantage: values are stored in the order they are printed out, the sign is available early (Motorola)


Endianness Example

• Consider the hexadecimal number:
    MSB 0x 43fa27c77156ab91 LSB
• Two options:
    address          7  6  5  4  3  2  1  0
    Little-endian   43 fa 27 c7 71 56 ab 91
    Big-endian      91 ab 56 71 c7 27 fa 43
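The two byte orders for this value can be reproduced with Python's built-in `int.to_bytes`. In the sketch below, index i of each byte string corresponds to memory address A+i, so the strings read from address 0 upward (the reverse of a layout that lists address 7 first).

```python
value = 0x43fa27c77156ab91

# Index i of each bytes object is the byte stored at address A + i.
little = value.to_bytes(8, byteorder="little")
big = value.to_bytes(8, byteorder="big")

for address in range(8):
    print(f"A+{address}: little={little[address]:02x}  big={big[address]:02x}")

# Reading the bytes back with the matching order recovers the original value.
assert int.from_bytes(little, byteorder="little") == value
assert int.from_bytes(big, byteorder="big") == value
```

Little-endian places the least significant byte (0x91) at the lowest address, while big-endian places the most significant byte (0x43) there.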

MIPS Data Addressing Modes

• Register
    ADD $16, $7, $8
• Immediate
    ADDI $17, $7, 100
• Displacement
    LW $18, 100($9)

Memory Addressing Seen in CISC

• Direct (absolute): ADD R1, (1001)
• Register indirect: SUB R2, (R1)
• Indexed: ADD R1, (R2 + R3)
• Scaled: SUB R2, 100(R2)[R3]
• Autoincrement: ADD R1, (R2)+
• Autodecrement: SUB R2, -(R1)
• Memory indirect: ADD R1, @(R3)

And more … (see textbook p98)
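How each of these modes forms its effective address can be sketched as a single function. The function name, the mode strings, the dictionary-based register file and memory, and the default operand size of 4 bytes are all assumptions made for illustration; the per-mode arithmetic follows the usual textbook definitions.

```python
def effective_address(mode, regs, mem, *, base=None, index=None, disp=0, size=4):
    """Return the effective address of a memory operand.

    regs maps register names to values and mem maps addresses to values;
    the auto-increment/decrement modes update regs as a side effect.
    """
    if mode == "direct":              # (1001): the address is the constant
        return disp
    if mode == "register_indirect":   # (R1): the address is in a register
        return regs[base]
    if mode == "indexed":             # (R2 + R3): sum of two registers
        return regs[base] + regs[index]
    if mode == "scaled":              # 100(R2)[R3]: disp + base + index * size
        return disp + regs[base] + regs[index] * size
    if mode == "autoincrement":       # (R2)+: use the address, then step forward
        address = regs[base]
        regs[base] += size
        return address
    if mode == "autodecrement":       # -(R1): step back first, then use
        regs[base] -= size
        return regs[base]
    if mode == "memory_indirect":     # @(R3): the register points to the address
        return mem[regs[base]]
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical register and memory contents.
regs = {"R1": 400, "R2": 200, "R3": 8}
mem = {8: 1000}
print(effective_address("scaled", regs, mem, base="R2", index="R3", disp=100))
print(effective_address("memory_indirect", regs, mem, base="R3"))
```

Memory indirect needs an extra memory read just to find the address, which is one way complex modes raise CPI.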

Choosing Memory Addressing Modes

Choosing complex addressing modes
• Close to addressing in high-level language
• May reduce instruction counts
• Increase implementation complexity (may increase cycle time)
• Increase CPI

RISC ISA comes with simple memory addressing, and CISC ISA with complex ones

How Often Are Those Addressing Modes Used?

Usage of address modes, VAX machine, SPEC89

Usage of Immediate Operands In RISC

Alpha, SPEC CINT2000 & CFP2000

Immediate Size in RISC

Alpha, SPEC CINT2000 & CFP2000

Displacement Size in RISC

Displacement bit size: Alpha ISA, SPEC CPU2000 Integer and FP

Common Operations

Operator Type | Examples
Arithmetic/Logical | Add, sub, and, or, mult, div
Data transfer | Loads/stores
Control | Branch, jump, call, return
System | OS call, virtual memory management
Floating point | FP add, sub, mult, div
Decimal | Decimal add, sub, mult, decimal to character conversions
String | Move, compare, search
Graphics | Compression/decompression, vertex/pixel ops

Dynamic Instruction Mix (MIPS)

Instruction | SPEC2K Int | SPEC2K FP
Load | 26% | 15%
Store | 10% | 2%
Add | 19% | 23%
Compare | 5% | 2%
Cond br | 12% | 4%
Cond mv | 2% | 0%
Jump | 1% | 0%
LOGIC | 18% | 4%
FP load | | 15%
FP store | | 7%
FP others | | 19%

Common Operations

80x86 instruction | Integer average (% total executed)
Load | 22%
Conditional branch | 20%
Compare | 16%
Store | 12%
Add | 8%
And | 6%
Sub | 5%
Move register-register | 4%
Call/Return | 2%

Control Transfer Instructions

• Conditional branches (75% - Int) (82% - FP)

• Jumps (6% - Int) (10% - FP)

• Procedure calls/returns (19% - Int) (8% - FP)

• Design issues:
– How do you specify the target address?
– How do you specify the condition?
– What happens on a procedure call/return?

Review of MIPS ISA

Instruction Types

• Data transfer: Load and store
    LW $16, 100($2)
    LB $17, 200($2)
• Integer arithmetic/logic
    ADDI $8, $16, 17
    SLT $10, $8, $9
• Floating point arithmetic
    ADD.D $f0, $f1, $f0
• Control instructions (branches and jumps)
    BEQ $0, $1, loop
    J _fprintf
• A few others

MIPS Instruction Format

I-type
    opcode | rs | rt | immediate/offset
    6      | 5  | 5  | 16

R-type
    opcode | rs | rt | rd | shamt | funct
    6      | 5  | 5  | 5  | 5     | 6

J-type
    opcode | offset
    6      | 26

• opcode: 6-bit operation of the instruction
• rs: first source register
• rt: second source register
• rd: destination register
• immediate: immediate value or displacement
• shamt: shift amount
• funct: function variants
• offset: 26-bit jump target field (shifted left by 2 and combined with the upper bits of the PC)
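Packing the R-type and I-type fields into a 32-bit word can be sketched with shifts and masks. The helper names `encode_r` and `encode_i` are hypothetical; the field widths come from the formats above, and the opcode and funct values used in the example (funct 0x20 for ADD, opcode 0x23 for LW) are the standard MIPS encodings.

```python
def encode_r(rs, rt, rd, shamt, funct, opcode=0):
    """Pack an R-type instruction: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rs, rt, immediate):
    """Pack an I-type instruction: opcode(6) rs(5) rt(5) immediate(16).

    The immediate is masked to 16 bits so negative values are stored
    in two's complement.
    """
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


# ADD $16, $7, $8  -> rd=16, rs=7, rt=8, funct=0x20
add_word = encode_r(rs=7, rt=8, rd=16, shamt=0, funct=0x20)
# LW $18, 100($9)  -> rt=18, rs=9, opcode=0x23
lw_word = encode_i(opcode=0x23, rs=9, rt=18, immediate=100)

print(f"ADD $16, $7, $8  = 0x{add_word:08x}")
print(f"LW  $18, 100($9) = 0x{lw_word:08x}")
```

Because every format keeps opcode, rs, and rt in the same bit positions, the decoder can read the register numbers before it knows the instruction type.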

Simplified Instruction Set

• LW/SW Instructions (I-type)
    opcode | rs | rt | offset
    6      | 5  | 5  | 16

• R-type Arithmetic Instructions (R-type)
    opcode | rs | rt | rd | shamt | funct
    6      | 5  | 5  | 5  | 5     | 6

• Branch Instructions (I-type)
    opcode | rs | rt | offset
    6      | 5  | 5  | 16

Normal Usage of Registers

Name | Register number | Usage
$zero | 0 | the constant value 0
$v0-$v1 | 2-3 | values for results and expression evaluation
$a0-$a3 | 4-7 | arguments
$t0-$t7 | 8-15 | temporaries
$s0-$s7 | 16-23 | saved
$t8-$t9 | 24-25 | more temporaries
$gp | 28 | global pointer
$sp | 29 | stack pointer
$fp | 30 | frame pointer
$ra | 31 | return address

Simple Implementation

• Available datapath elements

[Figure: datapath elements. Instruction memory, program counter, and adder; register file (two 5-bit read register inputs, one write register input, write data, two read data outputs, RegWrite control) and ALU (ALU control input, Zero and ALU result outputs); 16-to-32-bit sign-extension unit and data memory unit (address, write data, read data, MemRead and MemWrite controls).]

Operations in Instruction Execution

1. Fetch the inst at PC; PC incr by 4
2. Decode the inst; read $a and $b

• LW $a, 100($b)
3. Add $b and 100
4. Read memory
5. Write $a
• ADD $c, $a, $b
3. Add $a and $b
4. Write $c
• SW $a, 100($b)
3. Add $b and 100
4. Write memory
• BEQ $a, $b, offset
3. Compare $a and $b
4. If equal, PC<=PC+offset
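The per-instruction steps above can be sketched as a tiny interpreter for the four instruction kinds. Everything about the representation is an assumption made for illustration: instructions are tuples, registers and memory are dictionaries, the program is a dictionary keyed by byte address, and the branch offset is treated as a byte offset added to the already-incremented PC (a real MIPS branch shifts a 16-bit word offset left by 2).

```python
def step(instr, pc, regs, mem):
    """Execute one instruction and return the next PC."""
    next_pc = pc + 4                  # step 1: fetch, PC incremented by 4
    op, x, y, z = instr               # step 2: decode and read operands
    if op == "LW":                    # ("LW", dest, base, imm)
        address = regs[y] + z         # step 3: add base register and immediate
        regs[x] = mem[address]        # steps 4-5: read memory, write register
    elif op == "SW":                  # ("SW", src, base, imm)
        address = regs[y] + z         # step 3: add base register and immediate
        mem[address] = regs[x]        # step 4: write memory
    elif op == "ADD":                 # ("ADD", dest, src1, src2)
        regs[x] = regs[y] + regs[z]   # steps 3-4: add, write register
    elif op == "BEQ":                 # ("BEQ", src1, src2, byte_offset)
        if regs[x] == regs[y]:        # step 3: compare
            next_pc += z              # step 4: PC <= PC + offset
    else:
        raise ValueError(f"unsupported instruction: {op}")
    return next_pc


def run(program, regs, mem):
    """Run from address 0 until the PC leaves the program."""
    pc = 0
    while pc in program:
        pc = step(program[pc], pc, regs, mem)
    return pc


# Hypothetical program: load two values, add them, store the sum,
# then branch over the final instruction.
program = {
    0: ("LW", "t0", "zero", 100),
    4: ("LW", "t1", "zero", 104),
    8: ("ADD", "t2", "t0", "t1"),
    12: ("SW", "t2", "zero", 108),
    16: ("BEQ", "t2", "t2", 4),
    20: ("ADD", "t2", "t2", "t2"),
}
regs = {"zero": 0, "t0": 0, "t1": 0, "t2": 0}
mem = {100: 5, 104: 7}
final_pc = run(program, regs, mem)
print(mem[108], regs["t2"], final_pc)
```

All four kinds share steps 1 and 2 and differ only afterwards, which is the regularity the single-cycle and pipelined designs rely on.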

Single-cycle Implementation

• This one supports LW, SW, BEQ, and ALU instructions, e.g. ADD

[Figure: single-cycle datapath. PC, instruction memory, register file, sign extend (16 to 32 bits), shift left 2, two adders, ALU with ALU control, data memory, and four multiplexers; control signals RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg, RegWrite, and PCSrc.]

Regularize Instruction Execution

LW/SW  | ALU    | Branch
IF     | IF     | IF
ID/REG | ID/REG | ID/REG
EX     | EX     | EX
MEM    | --     | --
WB     | WB     | --

Pipelined Instruction Executions
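Once every instruction follows the same five stages, successive instructions can overlap, each starting one cycle after the previous one. The sketch below builds such a chart under idealized assumptions that are not stated in the slides: no hazards or stalls, and one new instruction entering the pipeline per cycle. The function name and the "." marker for idle slots are illustrative choices.

```python
STAGES = ["IF", "ID/REG", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions, stages=STAGES):
    """Return one row per instruction; column c holds the stage active in cycle c."""
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        row = ["."] * total_cycles
        for s, name in enumerate(stages):
            # Instruction i enters stage s in cycle i + s.
            row[i + s] = name
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_diagram(3)):
    print(f"inst {i}: " + " ".join(f"{cell:6}" for cell in row))
```

Under these assumptions, n instructions on a k-stage pipeline finish in n + k - 1 cycles instead of n × k.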

80x86 ISA Early History

Source: An Alternative to RISC: The Intel 80x86. Appendix D of the textbook. Available at http://www.mkp.com/CA3.

• 1978: 16-bit 8086
– Extended from 8-bit 8080
– Hybrid of accumulator and GPR
• 1980: 8087 FP coprocessor
– Add 60 FP instructions to 8086
– Hybrid of stack and GPR
• 1982: 80286
– 24-bit address
– Virtual memory management
• 1985: 32-bit 80386
– Add new addressing modes, paging support
– Nearly a GPR machine

80x86 Integer Registers

Addressing Modes

• Absolute: 16-bit or 32-bit displacement
• Register indirect: selected registers EAX, ECX, EDX, EBX, ESI, and EDI
• Displacement: selected registers + 16- or 32-bit offset
• Indexed: reg + reg
• Based indexed: reg + reg + 8- or 16-bit disp
• Base plus scaled indexed: reg + reg*d
• Base plus scaled index with displacement: reg + reg*d + 8- or 16-bit disp

Byte order: little endian

Instruction Encoding Examples

Remarks

…, its [80x86] checkered ancestry has led to an architecture that is difficult to explain and impossible to love.

Textbook, appendix D

The x86 isn’t all that complex – it just doesn’t make a lot of sense.

Mike Johnson, Leader of 80x86 Design at AMD
