ECE369

Chapter 2

Instruction Set Architecture

• A very important abstraction

– interface between hardware and low-level software

– standardizes instructions, machine language bit patterns, etc.

– advantage: different implementations of the same architecture

– disadvantage: sometimes prevents using new innovations

• Modern instruction set architectures:

– IA-32, PowerPC, MIPS, SPARC, ARM, and others

The MIPS Instruction Set

• Used as the example throughout the book

• Stanford MIPS commercialized by MIPS Technologies (www.mips.com)

• Large share of embedded core market

– Applications in consumer electronics, network/storage equipment, cameras, printers, …

• Typical of many modern ISAs

– See MIPS Reference Data tear-out card, and Appendixes B and E

MIPS arithmetic

• All arithmetic instructions have 3 operands

• Operand order is fixed (destination first)

Example:

C code: a = b + c

MIPS ‘code’: add a, b, c

(we’ll talk about registers in a bit)

“The natural number of operands for an operation like addition is three…requiring every instruction to have exactly three operands, no more and no less, conforms to the philosophy of keeping the hardware simple”

MIPS arithmetic

• Design Principle: simplicity favors regularity.

• Of course this complicates some things...

C code: a = b + c + d;

MIPS code: add a, b, c
           add a, a, d

• Operands must be registers, only 32 registers provided

• Each register contains 32 bits

• Design Principle: smaller is faster. Why?

Registers vs. Memory

[Figure: the processor (control and datapath), memory, and I/O (input and output)]

• Arithmetic instruction operands must be registers; only 32 registers are provided

• Compiler associates variables with registers

• What about programs with lots of variables?

Memory Organization

• Viewed as a large, single-dimension array, with an address.

• A memory address is an index into the array

• "Byte addressing" means that the index points to a byte of memory.

[Figure: memory drawn as an array of bytes; addresses 0, 1, 2, 3, 4, 5, 6, ... each hold 8 bits of data]

Memory Organization

• Bytes are nice, but most data items use larger "words"

• For MIPS, a word is 32 bits or 4 bytes.

• 2^32 bytes with byte addresses from 0 to 2^32-1

• 2^30 words with byte addresses 0, 4, 8, ... 2^32-4

• Words are aligned, i.e., what are the 2 least significant bits of a word address?

[Figure: memory drawn as an array of words; byte addresses 0, 4, 8, 12, ... each hold 32 bits of data]

Registers hold 32 bits of data
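The alignment question above can be checked mechanically. The sketch below is only an illustration: the helper names is_word_aligned and word_to_byte_address are made up for this example, and it assumes the 32-bit byte addresses described on the slide. An aligned word address is a multiple of 4, so its two least significant bits are 00.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when a byte address is word aligned,
   i.e. its two least significant bits are 00. */
int is_word_aligned(uint32_t byte_address) {
    return (byte_address & 0x3u) == 0;
}

/* Hypothetical helper: byte address of the n-th 32-bit word (0, 4, 8, ...). */
uint32_t word_to_byte_address(uint32_t word_index) {
    assert(word_index < (1u << 30));   /* only 2^30 words exist */
    return word_index << 2;            /* multiply by 4 */
}
```

For instance, word 3 lives at byte address 12, and the last word (index 2^30-1) lives at byte address 2^32-4.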

Instructions

• Load and store instructions

• Example:

C code: A[12] = h + A[8];

# $s3 stores base address of A and $s2 stores h

MIPS code: lw  $t0, 32($s3)
           add $t0, $s2, $t0
           sw  $t0, 48($s3)

• Can refer to registers by name (e.g., $s2, $t2) instead of number

• Store word has destination last

• Remember arithmetic operands are registers, not memory!

Can’t write: add 48($s3), $s2, 32($s3)
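The offsets 32 and 48 in the example are the element indices 8 and 12 multiplied by 4 bytes per word. The following is a minimal sketch of that sequence in C, assuming a small toy memory; the names memory, MEM_WORDS, load_word, store_word and add_h_to_a8 are invented for the illustration and are not part of MIPS.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64                 /* size of the toy memory, chosen arbitrarily */

static int32_t memory[MEM_WORDS];    /* stored as words, addressed by byte */

/* lw: read the word at byte address base + offset (must be aligned). */
int32_t load_word(uint32_t base, int16_t offset) {
    uint32_t address = base + (uint32_t)(int32_t)offset;
    assert(address % 4 == 0 && address / 4 < MEM_WORDS);
    return memory[address / 4];
}

/* sw: write a word to byte address base + offset (must be aligned). */
void store_word(int32_t value, uint32_t base, int16_t offset) {
    uint32_t address = base + (uint32_t)(int32_t)offset;
    assert(address % 4 == 0 && address / 4 < MEM_WORDS);
    memory[address / 4] = value;
}

/* A[12] = h + A[8], with the base address of A in s3 and h in s2. */
void add_h_to_a8(uint32_t s3, int32_t s2) {
    int32_t t0 = load_word(s3, 32);  /* lw  $t0, 32($s3) */
    t0 = s2 + t0;                    /* add $t0, $s2, $t0 */
    store_word(t0, s3, 48);          /* sw  $t0, 48($s3) */
}
```

The arithmetic happens only on the local variables standing in for registers; memory is touched only by the load and the store.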

Instructions

• Example:

C code: g = h + A[i];

# $s3 stores base address of A
# g, h and i are in $s1, $s2 and $s4

add $t1, $s4, $s4   # t1 = 2*i
add $t1, $t1, $t1   # t1 = 4*i
add $t1, $t1, $s3   # t1 = 4*i + s3
lw  $t0, 0($t1)     # t0 = A[i]
add $s1, $s2, $t0   # g = h + A[i]

So far we’ve learned:

• MIPS

– loading words but addressing bytes

– arithmetic on registers only

• Instruction          Meaning

add $s1, $s2, $s3      $s1 = $s2 + $s3
sub $s1, $s2, $s3      $s1 = $s2 - $s3
lw  $s1, 100($s2)      $s1 = Memory[$s2+100]
sw  $s1, 100($s2)      Memory[$s2+100] = $s1

Policy of Use Conventions

Name      Register number   Usage
$zero     0                 the constant value 0
$v0-$v1   2-3               values for results and expression evaluation
$a0-$a3   4-7               arguments
$t0-$t7   8-15              temporaries
$s0-$s7   16-23             saved
$t8-$t9   24-25             more temporaries
$gp       28                global pointer
$sp       29                stack pointer
$fp       30                frame pointer
$ra       31                return address

Register 1 ($at) reserved for assembler, 26-27 for operating system

MIPS Format

• Instructions, like registers and words

– are also 32 bits long

– add $t1, $s1, $s2

– Registers: $t1=9, $s1=17, $s2=18

• Instruction Format:

000000  10001  10010  01001  00000  100000
op      rs     rt     rd     shamt  funct
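The field layout above can be packed into a 32-bit word with shifts and ORs. This is a sketch, not an assembler: the function name encode_r_type is hypothetical, and the field widths (6, 5, 5, 5, 5, 6 bits) are the ones shown for the format.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the six R-type fields into one 32-bit instruction word. */
uint32_t encode_r_type(uint32_t op, uint32_t rs, uint32_t rt,
                       uint32_t rd, uint32_t shamt, uint32_t funct) {
    assert(op < 64 && funct < 64);                        /* 6-bit fields */
    assert(rs < 32 && rt < 32 && rd < 32 && shamt < 32);  /* 5-bit fields */
    return (op << 26) | (rs << 21) | (rt << 16) |
           (rd << 11) | (shamt << 6) | funct;
}
```

Calling encode_r_type(0, 17, 18, 9, 0, 32) for add $t1, $s1, $s2 gives 0x02324820, which is the bit pattern 000000 10001 10010 01001 00000 100000. Note that the destination $t1 lands in the rd field, after the two sources.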

Machine Language

• Consider the load-word and store-word instructions

– What would the regularity principle have us do?

– New principle: Good design demands a compromise

• Introduce a new type of instruction format

– I-type for data transfer instructions

– other format was R-type for register

• Example: lw $t0, 32($s2)

35  18  8   32
op  rs  rt  16 bit number

• Where's the compromise?
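A sketch of I-type packing follows, under the same caveats as any illustration here: encode_i_type is a made-up name, and the layout assumed is op (6 bits), rs (5), rt (5), then a 16-bit number. For lw $t0, 32($s2) the base register $s2 (18) goes in rs and the loaded register $t0 (8) goes in rt.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the I-type fields; the immediate is stored as a 16-bit
   two's complement number in the low half of the word. */
uint32_t encode_i_type(uint32_t op, uint32_t rs, uint32_t rt,
                       int16_t immediate) {
    assert(op < 64 && rs < 32 && rt < 32);
    return (op << 26) | (rs << 21) | (rt << 16) | (uint16_t)immediate;
}
```

With this helper, encode_i_type(35, 18, 8, 32) yields 0x8E480020. One way to read the compromise: every instruction stays 32 bits long, at the price of a second format whose last 16 bits are interpreted differently.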

Summary

A[300] = h + A[300]   # $t1 = base address of A, $s2 stores h
                      # use $t0 for temporary register

lw  $t0, 1200($t1)    op, rs, rt, address:            35, 9, 8, 1200
add $t0, $s2, $t0     op, rs, rt, rd, shamt, funct:   0, 18, 8, 8, 0, 32
sw  $t0, 1200($t1)    op, rs, rt, address:            43, 9, 8, 1200

instruction  format  op  rs   rt   rd   shamt  funct  address
add          R       0   reg  reg  reg  0      32     na
sub          R       0   reg  reg  reg  0      34     na
lw           I       35  reg  reg  na   na     na     address
sw           I       43  reg  reg  na   na     na     address

Summary of Instructions We Have Seen So Far

Summary of New Instructions

Example

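void swap(int* v, int k)
{
    int temp;

    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
}

swap: sll $t0, $a1, 2     # $t0 = k * 4 (byte offset of v[k])
      add $t0, $t0, $a0   # $t0 = address of v[k]
      lw  $t1, 0($t0)     # $t1 = v[k]
      lw  $t2, 4($t0)     # $t2 = v[k+1]
      sw  $t2, 0($t0)     # v[k] = $t2
      sw  $t1, 4($t0)     # v[k+1] = $t1
      jr  $31             # return to caller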

Control Instructions

Using If-Else

$s0 = f, $s1 = g, $s2 = h, $s3 = i, $s4 = j, $s5 = k

Where is 0,1,2,3 stored?

Addresses in Branches

• Instructions:

bne $t4,$t5,Label     Next instruction is at Label if $t4≠$t5
beq $t4,$t5,Label     Next instruction is at Label if $t4=$t5

• Formats:

I    op  rs  rt  16 bit address

• What if the “Label” is too far away (16 bit address is not enough)?

Addresses in Branches and Jumps

• Instructions:

bne $t4,$t5,Label     Next instruction is at Label if $t4 != $t5
beq $t4,$t5,Label     Next instruction is at Label if $t4 = $t5
j Label               Next instruction is at Label

• Formats:

I    op  rs  rt  16 bit address
J    op  26 bit address
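How far each format can reach follows from how MIPS turns the address field into a target. The sketch below uses hypothetical helper names (branch_target, jump_target) and assumes the standard MIPS rules: a branch field is a signed word offset relative to PC+4, and a jump field replaces the low 28 bits of PC+4.

```c
#include <assert.h>
#include <stdint.h>

/* I-type branch: target = (PC + 4) + 16-bit signed word offset * 4. */
uint32_t branch_target(uint32_t pc, int16_t word_offset) {
    return pc + 4u + (uint32_t)((int32_t)word_offset * 4);
}

/* J-type jump: keep the upper 4 bits of PC + 4 and fill the rest
   with the 26-bit field shifted left by 2. */
uint32_t jump_target(uint32_t pc, uint32_t field26) {
    assert(field26 < (1u << 26));
    return ((pc + 4u) & 0xF0000000u) | (field26 << 2);
}
```

A branch can therefore reach roughly 2^15 instructions in either direction, while a jump can reach anywhere within the current 256 MB region. When a label is out of branch range, one common approach is to branch on the opposite condition around a j to the far label.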

Control Flow

• We have: beq, bne, what about Branch-if-less-than?

if (a < b)   # a in $s0, b in $s1

slt $t0, $s0, $s1       # $t0 gets 1 if a < b
bne $t0, $zero, Less    # go to Less if $t0 is not 0

Combination of slt and bne implements branch on less than.

While Loop

while (save[i] == k)   # i, j and k correspond to registers
    i = i + j;         # $s3, $s4 and $s5
                       # array base address at $s6

Loop: add $t1, $s3, $s3     # $t1 = 2*i
      add $t1, $t1, $t1     # $t1 = 4*i
      add $t1, $t1, $s6     # $t1 = address of save[i]
      lw  $t0, 0($t1)       # $t0 = save[i]
      bne $t0, $s5, Exit    # exit if save[i] != k
      add $s3, $s3, $s4     # i = i + j
      j   Loop

Exit:

What does this code do?

Overview of MIPS

• simple instructions all 32 bits wide

• very structured, no unnecessary baggage

• only three instruction formats

R    op  rs  rt  rd  shamt  funct
I    op  rs  rt  16 bit address
J    op  26 bit address

Arrays vs. Pointers

clear1( int array[ ], int size)

{

int i;

for (i=0; i<size; i++)

array[i]=0;

}

clear2(int* array, int size)

{

int* p;

for( p=&array[0]; p<&array[size]; p++)

*p=0;

}

CPI for arithmetic, data transfer, and branch instructions is 1, 2, and 1 respectively. Which code is faster?

Clear1

clear1(int array[], int size)
{
    int i;
    for (i = 0; i < size; i++)
        array[i] = 0;
}

array in $a0, size in $a1, i in $t0

       add  $t0,$zero,$zero   # i=0, register $t0=0
loop1: add  $t1,$t0,$t0       # $t1=i*2
       add  $t1,$t1,$t1       # $t1=i*4
       add  $t2,$a0,$t1       # $t2=address of array[i]
       sw   $zero,0($t2)      # array[i]=0
       addi $t0,$t0,1         # i=i+1
       slt  $t3,$t0,$a1       # $t3=(i<size)
       bne  $t3,$zero,loop1   # if (i<size) go to loop1

Clear2, Version 2

clear2(int* array, int size)
{
    int* p;
    for (p = &array[0]; p < &array[size]; p++)
        *p = 0;
}

array and size in registers $a0 and $a1

       add  $t0,$a0,$zero     # p = address of array[0]
       add  $t1,$a1,$a1       # $t1 = size*2
       add  $t1,$t1,$t1       # $t1 = size*4, distance of last element
       add  $t2,$a0,$t1       # $t2 = address of array[size]
loop2: sw   $zero,0($t0)      # memory[p]=0
       addi $t0,$t0,4         # p = p+4
       slt  $t3,$t0,$t2       # $t3=(p<&array[size])
       bne  $t3,$zero,loop2   # if (p<&array[size]) go to loop2

Array vs. Pointer

Array version (clear1): 7 instructions inside loop

       add  $t0,$zero,$zero   # i=0, register $t0=0
loop1: add  $t1,$t0,$t0       # $t1=i*2
       add  $t1,$t1,$t1       # $t1=i*4
       add  $t2,$a0,$t1       # $t2=address of array[i]
       sw   $zero,0($t2)      # array[i]=0
       addi $t0,$t0,1         # i=i+1
       slt  $t3,$t0,$a1       # $t3=(i<size)
       bne  $t3,$zero,loop1   # if (i<size) go to loop1

Pointer version (clear2): 4 instructions inside loop

       add  $t0,$a0,$zero     # p = address of array[0]
       add  $t1,$a1,$a1       # $t1 = size*2
       add  $t1,$t1,$t1       # $t1 = size*4
       add  $t2,$a0,$t1       # $t2 = address of array[size]
loop2: sw   $zero,0($t0)      # memory[p]=0
       addi $t0,$t0,4         # p = p+4
       slt  $t3,$t0,$t2       # $t3=(p<&array[size])
       bne  $t3,$zero,loop2   # if (p<&array[size]) go to loop2
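The instruction counts can be turned into cycles per loop iteration using the CPI values given earlier (arithmetic 1, data transfer 2, branch 1). The helper below is a sketch with an invented name, cycles_per_iteration, and it assumes slt and addi count as arithmetic instructions.

```c
#include <assert.h>

/* CPI values from the slides: arithmetic 1, data transfer 2, branch 1. */
enum { CPI_ARITHMETIC = 1, CPI_DATA_TRANSFER = 2, CPI_BRANCH = 1 };

/* Cycles for one pass through a loop body, given how many
   instructions of each class the body contains. */
int cycles_per_iteration(int arithmetic, int data_transfer, int branch) {
    assert(arithmetic >= 0 && data_transfer >= 0 && branch >= 0);
    return arithmetic * CPI_ARITHMETIC
         + data_transfer * CPI_DATA_TRANSFER
         + branch * CPI_BRANCH;
}
```

The array loop has 5 arithmetic instructions, 1 sw and 1 bne, so cycles_per_iteration(5, 1, 1) is 8. The pointer loop has 2 arithmetic instructions, 1 sw and 1 bne, so cycles_per_iteration(2, 1, 1) is 5. Under these assumptions the pointer version is faster per iteration.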

Summary

Other Issues

• More reading:

– support for procedures

– linkers, loaders, memory layout

– stacks, frames, recursion

– manipulating strings and pointers

– interrupts and exceptions

– system calls and conventions

• Some of these we'll talk more about later

• We have already talked about compiler optimizations

Elaboration

• What if there are more than 4 parameters for a function call?

– Addressable via frame pointer

– References to variables in the stack have the same offset

What is the Use of Frame Pointer?

Variables local to procedure do not fit in registers !!!

Nested Procedures

function_main() {
    function_a(var_x);   /* passes argument using $a0 */
    :                    /* function is called with “jal” instruction */
    return;
}

function_a(int size) {
    function_b(var_y);   /* passes argument using $a0 */
    :                    /* function is called with “jal” instruction */
    return;
}

function_b(int count) {
    :
    return;
}

Resource Conflicts ???

Stack

• Last-in-first-out queue

• Register #29 reserved as stack pointer

• Points to most recently allocated address

• Grows from higher to lower address

– Subtract from $sp to allocate space

• Adding data – Push

• Removing data – Pop
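The push and pop operations can be sketched in C with a stack that grows toward lower addresses. Everything named here (stack_memory, STACK_WORDS, sp, push, pop) is an assumption made for illustration; on MIPS the pointer is register $sp and the moves are ordinary addi, sw and lw instructions.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16                      /* toy stack size, chosen arbitrarily */

static int32_t stack_memory[STACK_WORDS];
static uint32_t sp = STACK_WORDS * 4;       /* byte address; stack starts empty */

/* Push: subtract from sp, then store at the new top of stack. */
void push(int32_t value) {
    assert(sp >= 4);                        /* room for one more word */
    sp -= 4;                                /* addi $sp, $sp, -4 */
    stack_memory[sp / 4] = value;           /* sw   value, 0($sp) */
}

/* Pop: load from the top of stack, then add to sp. */
int32_t pop(void) {
    assert(sp < STACK_WORDS * 4);           /* stack must not be empty */
    int32_t value = stack_memory[sp / 4];   /* lw   value, 0($sp) */
    sp += 4;                                /* addi $sp, $sp, 4 */
    return value;
}
```

After push(10) and push(20), the pointer has moved down by 8 bytes, and the first pop returns 20, the most recently pushed value.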

Function Call and Stack Pointer

Recursive Procedures Invoke Clones !!!

int fact (int n) {
    if (n < 1)
        return (1);
    else
        return (n * fact(n-1));
}

“n” corresponds to $a0

Program starts with the label of the procedure “fact”

How many registers do we need to save on the stack?

Registers $a0 and $ra

Factorial Code

200 fact: addi $sp, $sp, -8     # adjust stack for 2 items
204       sw   $ra, 4($sp)      # save return address
          sw   $a0, 0($sp)      # save argument n
          slti $t0, $a0, 1      # is n < 1?
          beq  $t0, $zero, L1   # if not, go to L1
          addi $v0, $zero, 1    # return result
          addi $sp, $sp, 8      # pop items off stack
          jr   $ra              # return to calling proc.
      L1: addi $a0, $a0, -1     # decrement n
236       jal  fact             # call fact(n-1)
240       lw   $a0, 0($sp)      # restore “n”
          lw   $ra, 4($sp)      # restore address
          addi $sp, $sp, 8      # pop 2 items
          mul  $v0, $a0, $v0    # return n*fact(n-1)
          jr   $ra              # return to caller

Caller:

    :
100 fact(3)
104 add ….

ra = 104, a0 = 3, sp = 40, v0 =

Assembly to Hardware Example

int i;
int k = 0;
for (i = 0; i < 3; i++) {
    k = k + i + 1;
}
k = k / 2;

k is stored in $t0; i is stored in $t1
3 stored in $t2; $t3 used as temp

      add  $t0,$zero,$zero   # k=0, register $t0=0
      add  $t1,$zero,$zero   # i=0, register $t1=0
      addi $t2,$zero,3       # $t2=3
loop: add  $t0,$t0,$t1       # k = k + i
      addi $t0,$t0,1         # k = k + 1
      addi $t1,$t1,1         # i=i+1
      slt  $t3,$t1,$t2       # $t3= (i<3)
      bne  $t3,$zero,loop    # if (i<3) go to loop
      srl  $t0,$t0,1         # k = k / 2
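To check that the assembly computes the same value as the C loop, it can be replayed one instruction at a time with C variables standing in for registers. This is a sketch: the function name run_loop is hypothetical, and the do-while shape is chosen because the assembly tests its condition at the bottom of the loop.

```c
#include <assert.h>
#include <stdint.h>

/* Replays the register-level loop and returns the final $t0 (k). */
int32_t run_loop(void) {
    int32_t t0 = 0;              /* add  $t0,$zero,$zero : k = 0 */
    int32_t t1 = 0;              /* add  $t1,$zero,$zero : i = 0 */
    int32_t t2 = 3;              /* addi $t2,$zero,3 */
    int32_t t3;
    do {
        t0 = t0 + t1;            /* add  $t0,$t0,$t1 : k = k + i */
        t0 = t0 + 1;             /* addi $t0,$t0,1   : k = k + 1 */
        t1 = t1 + 1;             /* addi $t1,$t1,1   : i = i + 1 */
        t3 = (t1 < t2);          /* slt  $t3,$t1,$t2 */
    } while (t3 != 0);           /* bne  $t3,$zero,loop */
    assert(t1 == 3);             /* loop ends when i reaches 3 */
    return (int32_t)((uint32_t)t0 >> 1);   /* srl $t0,$t0,1 : k = k / 2 */
}
```

The loop body runs three times (k becomes 1, 3, then 6), and the final shift right by one bit halves k to 3.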

Assembly to Hardware Example

Instruction Types?

      add  $t0,$zero,$zero    R-Type
      add  $t1,$zero,$zero    R-Type
      addi $t2,$zero,3        I-Type
loop: add  $t0,$t0,$t1        R-Type
      addi $t0,$t0,1          I-Type
      addi $t1,$t1,1          I-Type
      slt  $t3,$t1,$t2        R-Type
      bne  $t3,$zero,loop     I-Type
      srl  $t0,$t0,1          R-Type

R    op  rs  rt  rd  shamt  funct
I    op  rs  rt  16 bit address

How do we represent in machine language?

$t0 is reg 8, $t1 is reg 9, $t2 is reg 10, $t3 is reg 11

R    op  rs  rt  rd  shamt  funct      (6 bits, 5 bits, 5 bits, 5 bits, 5 bits, 6 bits)
I    op  rs  rt  16 bit address

 0:       add  $t0,$zero,$zero    000000_00000_00000_01000_00000_100000
 4:       add  $t1,$zero,$zero    000000_00000_00000_01001_00000_100000
 8:       addi $t2,$zero,3        001000_00000_01010_0000000000000011
12: loop: add  $t0,$t0,$t1        000000_01000_01001_01000_00000_100000
16:       addi $t0,$t0,1          001000_01000_01000_0000000000000001
20:       addi $t1,$t1,1          001000_01001_01001_0000000000000001
24:       slt  $t3,$t1,$t2        000000_01001_01010_01011_00000_101010
28:       bne  $t3,$zero,loop     000101_01011_00000_1111111111111011
32:       srl  $t0,$t0,1          000000_00000_01000_01000_00001_000010

bne: next PC = PC+4+BR Addr, where BR Addr is the 16-bit field (-5) counted in words: 28 + 4 + (-5 × 4) = 12, the address of loop.
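Going the other way, from bits back to fields, is a matter of shifting and masking. The sketch below uses made-up names (Fields, decode_fields, sign_extend_16) and simply extracts every field of both formats; which ones are meaningful depends on the opcode.

```c
#include <assert.h>
#include <stdint.h>

/* All fields of an instruction word, for both R and I formats. */
typedef struct {
    uint32_t op, rs, rt, rd, shamt, funct;
    int32_t immediate;           /* sign-extended low 16 bits */
} Fields;

/* Interpret a 16-bit field as a two's complement number. */
int32_t sign_extend_16(uint32_t field) {
    assert(field <= 0xFFFFu);
    return (field & 0x8000u) ? (int32_t)field - 0x10000 : (int32_t)field;
}

Fields decode_fields(uint32_t word) {
    Fields f;
    f.op        = word >> 26;
    f.rs        = (word >> 21) & 0x1Fu;
    f.rt        = (word >> 16) & 0x1Fu;
    f.rd        = (word >> 11) & 0x1Fu;
    f.shamt     = (word >> 6) & 0x1Fu;
    f.funct     = word & 0x3Fu;
    f.immediate = sign_extend_16(word & 0xFFFFu);
    return f;
}
```

For example, the slt word 000000_01001_01010_01011_00000_101010 (0x012A582A) decodes to rs = 9, rt = 10, rd = 11 and funct = 42, and the 16-bit branch field 1111111111111011 sign-extends to -5.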

How do we represent in machine language?

op rs rt rd shamt funct

0 000000_00000_00000_01000_00000_100000

4 000000_00000_00000_01001_00000_100000

8 001000_00000_01010_0000000000000011

12 000000_01000_01001_01000_00000_100000

16 001000_01000_01000_0000000000000001

20 001000_01001_01001_0000000000000001

24 000000_01001_01010_01011_00000_101010

28 000101_01011_00000_1111111111111011

32 000000_00000_01000_01000_00001_000010

Instruction Memory

Representation in MIPS Datapath

op rs rt rd shamt funct

0 000000_00000_00000_01000_00000_100000

4 000000_00000_00000_01001_00000_100000

8 001000_00000_01010_0000000000000011

12 000000_01000_01001_01000_00000_100000

16 001000_01000_01000_0000000000000001

20 001000_01001_01001_0000000000000001

24 000000_01001_01010_01011_00000_101010

28 000101_01011_00000_1111111111111011

32 000000_00000_01000_01000_00001_000010

Instruction Memory

Big Picture

Compiler

Addressing Modes

Our Goal

add $t1, $s1, $s2 ($t1=9, $s1=17, $s2=18)

000000  10001  10010  01001  00000  100000
op      rs     rt     rd     shamt  funct

Assembly Language vs. Machine Language

• Assembly provides convenient symbolic representation

– much easier than writing down numbers

– e.g., destination first

• Machine language is the underlying reality

– e.g., destination is no longer first

• Assembly can provide 'pseudoinstructions'

– e.g., “move $t0, $t1” exists only in Assembly

– would be implemented using “add $t0,$t1,$zero”

• When considering performance you should count real instructions

Summary

• Instruction complexity is only one variable

– lower instruction count vs. higher CPI / lower clock rate

• Design Principles:

– simplicity favors regularity

– smaller is faster

– good design demands compromise

– make the common case fast

• Instruction set architecture

– a very important abstraction indeed!