71
1 EE334 Chapter Four (Part A) The Processor: Datapath and Control

Chapter Four (Part A) The Processor: Datapath and Control

  • Upload
    others

  • View
    15

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Chapter Four (Part A) The Processor: Datapath and Control

1EE334 Spring 2010

Chapter Four

(Part A)

The Processor: Datapath and Control

Page 2: Chapter Four (Part A) The Processor: Datapath and Control

2EE334 Spring 2010

• We're ready to look at an implementation of the MIPS

• Simplified to contain only:

– memory-reference instructions: lw, sw

– arithmetic-logical instructions: add, sub, and, or, slt

– control flow instructions: beq, j

• Generic Implementation:

– use the program counter (PC) to supply instruction address

– get the instruction from memory

– read registers

– use the instruction to decide exactly what to do

• All instructions use the ALU after reading the registers

Why? memory-reference? arithmetic? control flow?

The Processor: Datapath & Control

Page 3: Chapter Four (Part A) The Processor: Datapath and Control

3EE334 Spring 2010

• Abstract / Simplified View:

Two types of functional units:

– elements that operate on data values (combinational)

– elements that contain state (sequential)

More Implementation Details

Registers

Register#

Data

Register#

Data

memory

Address

Data

Register#

PC Instruction ALU

Instruction

memory

Address

Page 4: Chapter Four (Part A) The Processor: Datapath and Control

4EE334 Spring 2010

• The set-reset latch

– output depends on present inputs and also on past inputs

An unclocked state element

A B NOR

0 0 1

0 1 0

1 0 0

1 1 0

R S Q Q

0 0 Q Q

0 1 1 0

1 0 0 1

1 1 0 0 Not allowed

R

S

Q

Q

Page 5: Chapter Four (Part A) The Processor: Datapath and Control

5EE334 Spring 2010

• Output is equal to the stored value inside the element

(don't need to ask for permission to look at the value)

• Change of state (value) is based on the clock

• Latches: whenever the inputs change, and the clock is

asserted

• Flip-flop: state changes only on a clock edge

(edge-triggered methodology)

"logically true",

— could mean electrically low

A clocking methodology defines when signals can be read and written

— wouldn't want to read a signal at the same time it was being written

Latches and Flip-flops

Page 6: Chapter Four (Part A) The Processor: Datapath and Control

6EE334 Spring 2010

• Two inputs:

– the data value to be stored (D)

– the clock signal (C) indicating when to read & store D

• Two outputs:

– the value of the internal state (Q) and it's complement

D-latch

Q

C

D

_Q

D

C

Q

Page 7: Chapter Four (Part A) The Processor: Datapath and Control

7EE334 Spring 2010

D flip-flop

• Output changes only on the clock edge

QQ

_Q

Q

_Q

Dlatch

D

C

Dlatch

DD

C

C

D

C

Q

Page 8: Chapter Four (Part A) The Processor: Datapath and Control

8EE334 Spring 2010

Our Implementation

• An edge triggered methodology

• Typical execution:

– read contents of some state elements,

– send values through some combinational logic

– write results to one or more state elements

Clock cycle

State

element

1

Combinational logic

State

element

2

Page 9: Chapter Four (Part A) The Processor: Datapath and Control

9EE334 Spring 2010

• Built using D flip-flops

Register File

Mu

x

Register 0

Register 1

Register n – 1

Register n

M

ux

Read data 1

Read data 2

Read register

number 1

Read register

number 2

Read registernumber 1 Read

data 1

Readdata 2

Read registernumber 2

Register fileWriteregister

Writedata Write

Page 10: Chapter Four (Part A) The Processor: Datapath and Control

10EE334 Spring 2010

Register File

• Note: we still use the real clock to determine when to write

n-to-1

decoder

Register 0

Register 1

Register n – 1

C

C

D

D

Register n

C

C

D

D

Register number

Write

Register data

0

1

n – 1

n

Page 11: Chapter Four (Part A) The Processor: Datapath and Control

11EE334 Spring 2010

Simple Implementation

• Include the functional units we need for each instruction

Why do we need this stuff?

PC

Instructionmemory

Instructionaddress

Instruction Add Sum

ALU control

RegWrite

RegistersWriteregister

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Writedata

ALUresult

ALU

Data

Data

Registernumbers Zero

5

5

5 3

16 32Sign

extend

b. Sign-extension unit

MemRead

MemWrite

Datamemory

Writedata

Readdata

a. Data memory unit

Addressa. Instruction memory b. Program counter c. Adder

a. Registers b. ALU

Page 12: Chapter Four (Part A) The Processor: Datapath and Control

12EE334 Spring 2010

Building the Datapath

• Use multiplexers to stitch them together

P C

In s tru c tio n

m e m o ry

R e a d

a d d re s s

In s tru c tio n

1 6 3 2

A d d A L Ur e s u lt

Mux

R e g is te rs

W ritere g is te r

W rite

d a ta

R e a d

d a ta 1

R e a d

d a ta 2

R e a dre g is te r 1

R e a d

re g is te r 2

S h ift

le f t 2

4

Mux

A L U o p e ra tio n3

R e g W rite

M e m R e a d

M e m W rite

P C S rc

A L U S rc

M e m to R e g

A L U

re su lt

Z e ro

A L U

D a ta

m e m o ry

A d d re s s

W ri te

d a ta

R e a d

d a ta Mux

S ig n

e x te n d

A d d

Instruction Fetch

Instruction

Decode/Operand

Fetch

Execution

Memory

WB

Page 13: Chapter Four (Part A) The Processor: Datapath and Control

13EE334 Spring 2010

Building the Datapath

• Use multiplexers to stitch them together

P C

In s tru c tio n

m e m o ry

R e a d

a d d re s s

In s tru c tio n

1 6 3 2

A d d A L Ur e s u lt

Mux

R e g is te rs

W ritere g is te r

W rite

d a ta

R e a d

d a ta 1

R e a d

d a ta 2

R e a dre g is te r 1

R e a d

re g is te r 2

S h ift

le f t 2

4

Mux

A L U o p e ra tio n3

R e g W rite

M e m R e a d

M e m W rite

P C S rc

A L U S rc

M e m to R e g

A L U

re su lt

Z e ro

A L U

D a ta

m e m o ry

A d d re s s

W ri te

d a ta

R e a d

d a ta Mux

S ig n

e x te n d

A d d

Page 14: Chapter Four (Part A) The Processor: Datapath and Control

14EE334 Spring 2010

Control

• Selecting the operations to perform (ALU, read/write, etc.)

• Controlling the flow of data (multiplexer inputs)

• Information comes from the 32 bits of the instruction

• Example:

add $8, $17, $18 Instruction Format:

000000 10001 10010 01000 00000 100000

op rs rt rd shamt funct

• ALU's operation based on instruction type and function code

Page 15: Chapter Four (Part A) The Processor: Datapath and Control

15EE334 Spring 2010

• e.g., what should the ALU do with this instruction

• Example: lw $1, 100($2)

35 2 1 100

op rs rt 16 bit offset

• ALU control input

000 AND

001 OR

010 add

110 subtract

111 set-on-less-than

• Why is the code for subtract 110 and not 011?

Control

Page 16: Chapter Four (Part A) The Processor: Datapath and Control

16EE334 Spring 2010

• Needs to support Logic and Arithmetic Operation

– AND, OR

– ADD, Subtract (using two’s complement)

• Needs to support the set-on-less-than instruction (slt)

– remember: slt is an arithmetic instruction

– produces a 1 if rs < rt and 0 otherwise

– use subtraction: (a-b) < 0 implies a < b

• Needs to support test for equality (beq $t5, $t6, $t7)

– use subtraction: (a-b) = 0 implies a = b

ALU

Page 17: Chapter Four (Part A) The Processor: Datapath and Control

17EE334 Spring 2010

Supporting slt

• Can we figure out the idea?0

3

Result

Operation

a

1

CarryIn

CarryOut

0

1

Binvert

b 2

Less

0

3

Result

Operation

a

1

CarryIn

0

1

Binvert

b 2

Less

Set

Overflow

detectionOverflow

a.

b.

Page 18: Chapter Four (Part A) The Processor: Datapath and Control

18EE334 Spring 2010

Set

a31

0

ALU0 Result0

CarryIn

a0

Result1

a1

0

Result2

a2

0

Operation

b31

b0

b1

b2

Result31

Overflow

Binvert

CarryIn

Less

CarryIn

CarryOut

ALU1

Less

CarryIn

CarryOut

ALU2

Less

CarryIn

CarryOut

ALU31

Less

CarryIn

Page 19: Chapter Four (Part A) The Processor: Datapath and Control

19EE334 Spring 2010

Test for equality

• Notice control lines:

000 = and

001 = or

010 = add

110 = subtract

111 = slt

•Note: zero is a 1 when the result is zero!

Set

a31

0

Result0a0

Result1a1

0

Result2a2

0

Operation

b31

b0

b1

b2

Result31

Overflow

Bnegate

Zero

ALU0

Less

CarryIn

CarryOut

ALU1

Less

CarryIn

CarryOut

ALU2

Less

CarryIn

CarryOut

ALU31

Less

CarryIn

Page 20: Chapter Four (Part A) The Processor: Datapath and Control

20EE334 Spring 2010

• Must describe hardware to compute 3-bit ALU control input

– given instruction type

00 = lw, sw

01 = beq,

11 = arithmetic

– function code for arithmetic

• Describe it using a truth table (can turn into gates):

ALUOp

computed from instruction type

(see next slide)

Control

ALUOp Funct field Operation

ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0 ALU control

0 0 X X X X X X 010 lw/sw

0 1 X X X X X X 110 beq

1 X X X 0 0 0 0 010 add

1 X X X 0 0 1 0 110 sub

1 X X X 0 1 0 0 000 and

1 X X X 0 1 0 1 001 or

1 X X X 1 0 1 0 111 slt

ALU control

ALUresult

ALUZero

3

Page 21: Chapter Four (Part A) The Processor: Datapath and Control

21EE334 Spring 2010

Control

Instruction RegDst ALUSrc

Memto-

Reg

Reg

Write

Mem

Read

Mem

Write Branch ALUOp1 ALUp0

R-format 1 0 0 1 0 0 0 1 0lw 0 1 1 1 1 0 0 0 0

sw X 1 X 0 0 1 0 0 0

beq X 0 X 0 0 0 1 0 1

PC

Instructionmemory

Readaddress

Instruction[31– 0]

Instruction [20– 16]

Instruction [25– 21]

Add

Instruction [5– 0]

MemtoReg

ALUOp

MemWrite

RegWrite

MemRead

Branch

R egDst

ALUSrc

Instruction [31– 26]

4

16 32Instruction [15– 0]

0

0Mux

0

1

Control

AddALU

result

Mux

0

1

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Read

register 2

Signextend

Shiftleft 2

Mux

1

ALUresult

Zero

Datamemory

Writedata

Readdata

Mux

1

Instruction [15– 11]

ALUcontrol

ALU

Address

Page 22: Chapter Four (Part A) The Processor: Datapath and Control

22EE334 Spring 2010

Control (R-format instruction)

Instruction RegDst ALUSrc

Memto-

Reg

Reg

Write

Mem

Read

Mem

Write Branch ALUOp1 ALUp0

R-format 1 0 0 1 0 0 0 1 0lw 0 1 1 1 1 0 0 0 0

sw X 1 X 0 0 1 0 0 0

beq X 0 X 0 0 0 1 0 1

PC

Instructionmemory

Readaddress

Instruction[31– 0]

Instruction [20– 16]

Instruction [25– 21]

Add

Instruction [5– 0]

MemtoReg

ALUOp

MemWrite

RegWrite

MemRead

Branch

R egDst

ALUSrc

Instruction [31– 26]

4

16 32Instruction [15– 0]

0

0Mux

0

1

Control

AddALU

result

Mux

0

1

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Read

register 2

Signextend

Shiftleft 2

Mux

1

ALUresult

Zero

Datamemory

Writedata

Readdata

Mux

1

Instruction [15– 11]

ALUcontrol

ALU

Address

Op

co

de

rsrt

rdsha

mt

fun

c

1

0 0

0

0

1

10

0

Page 23: Chapter Four (Part A) The Processor: Datapath and Control

23EE334 Spring 2010

Control (lw instruction)

Instruction RegDst ALUSrc

Memto-

Reg

Reg

Write

Mem

Read

Mem

Write Branch ALUOp1 ALUp0

R-format 1 0 0 1 0 0 0 1 0lw 0 1 1 1 1 0 0 0 0

sw X 1 X 0 0 1 0 0 0

beq X 0 X 0 0 0 1 0 1

PC

Instructionmemory

Readaddress

Instruction[31– 0]

Instruction [20– 16]

Instruction [25– 21]

Add

Instruction [5– 0]

MemtoReg

ALUOp

MemWrite

RegWrite

MemRead

Branch

R egDst

ALUSrc

Instruction [31– 26]

4

16 32Instruction [15– 0]

0

0Mux

0

1

Control

AddALU

result

Mux

0

1

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Read

register 2

Signextend

Shiftleft 2

Mux

1

ALUresult

Zero

Datamemory

Writedata

Readdata

Mux

1

Instruction [15– 11]

ALUcontrol

ALU

Address

Op

co

de

rsrt

imm

ed

iate

0

1 1

0

1

1

00

0

Page 24: Chapter Four (Part A) The Processor: Datapath and Control

24EE334 Spring 2010

Control (beq instruction)

Instruction RegDst ALUSrc

Memto-

Reg

Reg

Write

Mem

Read

Mem

Write Branch ALUOp1 ALUp0

R-format 1 0 0 1 0 0 0 1 0lw 0 1 1 1 1 0 0 0 0

sw X 1 X 0 0 1 0 0 0

beq X 0 X 0 0 0 1 0 1

PC

Instructionmemory

Readaddress

Instruction[31– 0]

Instruction [20– 16]

Instruction [25– 21]

Add

Instruction [5– 0]

MemtoReg

ALUOp

MemWrite

RegWrite

MemRead

Branch

R egDst

ALUSrc

Instruction [31– 26]

4

16 32Instruction [15– 0]

0

0Mux

0

1

Control

AddALU

result

Mux

0

1

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Read

register 2

Signextend

Shiftleft 2

Mux

1

ALUresult

Zero

Datamemory

Writedata

Readdata

Mux

1

Instruction [15– 11]

ALUcontrol

ALU

Address

Op

co

de

rsrt

imm

ed

iate

X

0 X

0

0

0

01

1

Page 25: Chapter Four (Part A) The Processor: Datapath and Control

25EE334 Spring 2010

Control (Part 1)

• Simple combinational logic (truth tables)

Operation2

Operation1

Operation0

Operation

ALUOp1

F3

F2

F1

F0

F (5– 0)

ALUOp0

ALUOp

ALU control block

ALUOp Funct field Operation

ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0

0 0 X X X X X X 010 lw/sw

0 1 X X X X X X 110 branch

1 X X X 0 0 0 0 010 add

1 X X X 0 0 1 0 110 sub

1 X X X 0 1 0 0 000 and

1 X X X 0 1 0 1 001 or

1 X X X 1 0 1 0 111 slt

Page 26: Chapter Four (Part A) The Processor: Datapath and Control

26EE334 Spring 2010

Control (Part 2)

• Simple combinational logic (truth tables)

R-format Iw sw beq

Op0

Op1

Op2

Op3

Op4

Op5

Inputs

Outputs

RegDst

ALUSrc

MemtoReg

RegWrite

MemRead

MemWrite

Branch

ALUOp1

ALUOpO

From MIPS Data Reference

Opcode [31:26]

0 000000 R-format inst

4 000100 branch inst.

35 100011 lw inst.

43 101011 sw inst.

Instruction RegDst ALUSrc

Memto-

Reg

Reg

Write

Mem

Read

Mem

Write Branch ALUOp1 ALUp0

R-format 1 0 0 1 0 0 0 1 0

lw 0 1 1 1 1 0 0 0 0

sw X 1 X 0 0 1 0 0 0

beq X 0 X 0 0 0 1 0 1

0

0

0

0

0

0

Page 27: Chapter Four (Part A) The Processor: Datapath and Control

27EE334 Spring 2010

• All of the logic is combinational

• We wait for everything to settle down, and the right thing to be done

– ALU might not produce “right answer” right away

– we use write signals along with clock to determine when to write

• Cycle time determined by length of the longest path

Our Simple Control Structure

We are ignoring some details like setup and hold times

Clock cycle

State

element

1

Combinational logic

State

element

2

Page 28: Chapter Four (Part A) The Processor: Datapath and Control

28EE334 Spring 2010

Single Cycle Implementation

• Calculate cycle time assuming negligible delays except:

– memory (2ns), ALU and adders (2ns), register file access (1ns)

MemtoReg

MemRead

MemWrite

ALUOp

ALUSrc

RegDst

PC

Instructionmemory

Readaddress

Instruction[31– 0]

Instruction [20– 16]

Instruction [25– 21]

Add

Instruction [5– 0]

RegWrite

4

16 32Instruction [15– 0]

0

Registers

Writeregister

Writedata

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Signextend

ALUresult

Zero

Datamemory

Address Readdata

Mux

1

0

Mux

1

0

Mux

1

0

Mux

1

Instruction [15– 11]

ALUcontrol

Shift

left 2

PCSrc

ALU

AddALU

result

Page 29: Chapter Four (Part A) The Processor: Datapath and Control

29EE334 Spring 2010

Where we are headed

• Single Cycle Problems:

– what if we had a more complicated instruction like floating

point?

– wasteful of area

• One Solution:

– use a “smaller” cycle time

– have different instructions take different numbers of cycles

– a “multicycle” datapath:

PC

Memory

Address

Instructionor data

Data

Instructionregister

Registers

Register #

Data

Register #

Register #

ALU

Memory

dataregister

A

B

ALUOut

Page 30: Chapter Four (Part A) The Processor: Datapath and Control

30EE334 Spring 2010

• We will be reusing functional units

– ALU used to compute address and to increment PC

– Memory used for instruction and data

• Our control signals will not be determined solely by instruction

– e.g., what should the ALU do for a “subtract” instruction?

• We’ll use a finite state machine for control

Multicycle Approach

Page 31: Chapter Four (Part A) The Processor: Datapath and Control

31EE334 Spring 2010

• Finite state machines:

– a set of states and

– next state function (determined by current state and the input)

– output function (determined by current state and possibly input)

– We’ll use a Moore machine (output based only on current state)

Review: finite state machines

Next-statefunction

Current state

Clock

Outputfunction

Next

state

Outputs

Inputs

Page 32: Chapter Four (Part A) The Processor: Datapath and Control

32EE334 Spring 2010

• Break up the instructions into steps, each step takes a cycle

– balance the amount of work to be done

– restrict each cycle to use only one major functional unit

• At the end of a cycle

– store values for use in later cycles (easiest thing to do)

– introduce additional “internal” registers

Multicycle Approach

Shiftleft 2

PC

Memory

MemData

Writedata

Mux

0

1

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Mux

0

1

Mux

0

1

4

Instruction[15–0]

Signextend

3216

Instruction[25–21]

Instruction[20–16]

Instruction[15–0]

Instructionregister

1 Mux

0

3

2

Mux

ALUresult

ALU

Zero

Memorydata

register

Instruction[15–11]

A

B

ALUOut

0

1

Address

Page 33: Chapter Four (Part A) The Processor: Datapath and Control

33EE334 Spring 2010

• Instruction Fetch

• Instruction Decode and Register Fetch

• Execution, Memory Address Computation, or Branch Completion

• Memory Access or R-type instruction completion

• Write-back

INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!

Five Execution Steps

Page 34: Chapter Four (Part A) The Processor: Datapath and Control

34EE334 Spring 2010

R-format instruction (ALU inst)

Shiftleft 2

PC

Memory

MemData

Writedata

Mux

0

1

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Mux

0

1

Mux

0

1

4

Instruction[15–0]

Signextend

3216

Instruction[25–21]

Instruction[20–16]

Instruction[15–0]

Instructionregister

1 Mux

0

3

2

Mux

ALUresult

ALU

Zero

Memorydata

register

Instruction[15–11]

A

B

ALUOut

0

1

Address

Step: IF Cycle: 1Step: OF (ID) Cycle: 2Step: EX Cycle: 3Step: WB Cycle: 4

FORMAT

opcode rs rt rd shamt funct

Page 35: Chapter Four (Part A) The Processor: Datapath and Control

35EE334 Spring 2010

Load instruction

Shiftleft 2

PC

Memory

MemData

Writedata

Mux

0

1

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Mux

0

1

Mux

0

1

4

Instruction[15–0]

Signextend

3216

Instruction[25–21]

Instruction[20–16]

Instruction[15–0]

Instructionregister

1 Mux

0

3

2

Mux

ALUresult

ALU

Zero

Memorydata

register

Instruction[15–11]

A

B

ALUOut

0

1

Address

Step: IF Cycle: 1Step: OF (ID) Cycle: 2Step: EX Cycle: 3Step: MEM Cycle: 4Step: WB Cycle: 5

FORMAT

opcode rs rt immediate

Page 36: Chapter Four (Part A) The Processor: Datapath and Control

36EE334 Spring 2010

branch instruction

Shiftleft 2

PC

Memory

MemData

Writedata

Mux

0

1

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Mux

0

1

Mux

0

1

4

Instruction[15–0]

Signextend

3216

Instruction[25–21]

Instruction[20–16]

Instruction[15–0]

Instructionregister

1 Mux

0

3

2

Mux

ALUresult

ALU

Zero

Memorydata

register

Instruction[15–11]

A

B

ALUOut

0

1

Address

Step: IF Cycle: 1Step: OF (ID) Cycle: 2Step: EX Cycle: 3

FORMAT

opcode rs rt immediate

Page 37: Chapter Four (Part A) The Processor: Datapath and Control

37EE334 Spring 2010

• Use PC to get instruction and put it in the Instruction Register.

• Increment the PC by 4 and put the result back in the PC.

• Can be described succinctly using RTL "Register-Transfer Language"

IR = Memory[PC];

PC = PC + 4;

Can we figure out the values of the control signals?

What is the advantage of updating the PC now?

Step 1: Instruction Fetch

address

Page 38: Chapter Four (Part A) The Processor: Datapath and Control

38EE334 Spring 2010

• Read registers rs and rt in case we need them

• Compute the branch address in case the instruction is a branch

• RTL:

A = Reg[IR[25-21]];

B = Reg[IR[20-16]];

ALUOut = PC + (sign-extend(IR[15-0]) << 2);

• We aren't setting any control lines based on the instruction type

(we are busy "decoding" it in our control logic)

Step 2: Instruction Decode and Register

Fetch

Page 39: Chapter Four (Part A) The Processor: Datapath and Control

39EE334 Spring 2010

• ALU is performing one of three functions, based on instruction type

• Memory Reference:

ALUOut = A + sign-extend(IR[15-0]);

• R-type:

ALUOut = A op B;

• Branch:

if (A==B) PC = ALUOut;

Step 3: EX (instruction dependent)

Page 40: Chapter Four (Part A) The Processor: Datapath and Control

40EE334 Spring 2010

• Loads and stores access memory

MDR = Memory[ALUOut];

or

Memory[ALUOut] = B;

• R-type instructions finish

Reg[IR[15-11]] = ALUOut;

The write actually takes place at the end of the cycle on the edge

Step 4: (R-type or memory-access)

Page 41: Chapter Four (Part A) The Processor: Datapath and Control

41EE334 Spring 2010

• Reg[IR[20-16]]= MDR;

What about all the other instructions?

Write-back step

Page 42: Chapter Four (Part A) The Processor: Datapath and Control

42EE334 Spring 2010

Summary:

Step name

Action for R-type

instructions

Action for memory-reference

instructions

Action for

branches

Action for

jumps

Instruction fetch IR = Memory[PC]PC = PC + 4

Instruction A = Reg [IR[25-21]]decode/register fetch B = Reg [IR[20-16]]

ALUOut = PC + (sign-extend (IR[15-0]) << 2)

Execution, address ALUOut = A op B ALUOut = A + sign-extend if (A ==B) then PC = PC [31-28] IIcomputation, branch/ (IR[15-0]) PC = ALUOut (IR[25-0]<<2)jump completion

Memory access or R-type Reg [IR[15-11]] = Load: MDR = Memory[ALUOut]completion ALUOut or

Store: Memory [ALUOut] = B

Memory read completion Load: Reg[IR[20-16]] = MDR

Page 43: Chapter Four (Part A) The Processor: Datapath and Control

43EE334 Spring 2010

• How many cycles will it take to execute this code?

lw $t2, 0($t3)

lw $t3, 4($t3)

beq $t2, $t3, Label #assume not

add $t5, $t2, $t3

sw $t5, 8($t3)

Label: ...

• What is going on during the 8th cycle of execution?

• In what cycle does the actual addition of $t2 and $t3 takes place?

Simple Questions

Page 44: Chapter Four (Part A) The Processor: Datapath and Control

44EE334 Spring 2010

• Value of control signals is dependent upon:

– what instruction is being executed

– which step is being performed

• Use the information we’ve accumulated to specify a finite state machine

– specify the finite state machine graphically, or

– use microprogramming

• Implementation can be derived from specification

Implementing the Control

Page 45: Chapter Four (Part A) The Processor: Datapath and Control

45EE334 Spring 2010

MemtoReg ALUSrcB ALUOp

MemRead

IorD MemWrite IRWrite RegDst RegWrite ALU SrcA

Page 46: Chapter Four (Part A) The Processor: Datapath and Control

46EE334 Spring 2010

Page 47: Chapter Four (Part A) The Processor: Datapath and Control

47EE334 Spring 2010

Instruction Fetch (IF)

Page 48: Chapter Four (Part A) The Processor: Datapath and Control

48EE334 Spring 2010

Instruction Decode (ID)

Page 49: Chapter Four (Part A) The Processor: Datapath and Control

49EE334 Spring 2010

Address Computation (EX) (Memory: Load or Store)

Page 50: Chapter Four (Part A) The Processor: Datapath and Control

50EE334 Spring 2010

Memory Read (MEM) (Load instruction)

Page 51: Chapter Four (Part A) The Processor: Datapath and Control

51EE334 Spring 2010

Write Back (WB) (Load instruction)

Page 52: Chapter Four (Part A) The Processor: Datapath and Control

52EE334 Spring 2010

Instruction Fetch (IF) Jump

Page 53: Chapter Four (Part A) The Processor: Datapath and Control

53EE334 Spring 2010

Instruction Decode (ID) Jump

Page 54: Chapter Four (Part A) The Processor: Datapath and Control

54EE334 Spring 2010

Execute (EX) Jump

Page 55: Chapter Four (Part A) The Processor: Datapath and Control

How many state bits will we need?

Graphical Specification of FSM

PCWrite

PCSource = 10

ALUSrcA = 1

ALUSrcB = 00ALUOp = 01

PCWriteCondPCSource = 01

ALUSrcA =1

ALUSrcB = 00ALUOp= 10

RegDst = 1RegWrite

MemtoReg = 0

MemWriteIorD = 1

MemRead

IorD = 1

ALUSrcA = 1

ALUSrcB = 10ALUOp = 00

RegDst =0RegWrite

MemtoReg=1

ALUSrcA = 0

ALUSrcB = 11

ALUOp = 00

MemRead

ALUSrcA = 0IorD = 0

IRWriteALUSrcB = 01

ALUOp = 00PCWrite

PCSource = 00

Instruction fetchInstruction decode/

register fetch

Jumpcompletion

BranchcompletionExecution

Memory address

computation

Memoryaccess

Memoryaccess R-type completion

Write-back step

(Op

='J

')

(Op

='L

W')

4

01

9862

753

Start

Page 56: Chapter Four (Part A) The Processor: Datapath and Control

Graphical Specification of FSMMemRead

ALUSrcA = 0IorD = 0

IRWriteALUSrcB = 01

ALUOp = 00PCWrite

PCSource = 00

0

Start

Page 57: Chapter Four (Part A) The Processor: Datapath and Control

ALUSrcA = 0

ALUSrcB = 11

ALUOp = 00

MemRead

ALUSrcA = 0IorD = 0

IRWriteALUSrcB = 01

ALUOp = 00PCWrite

PCSource = 00

Instruction fetchInstruction decode/

register fetch0

1

Start

Page 58: Chapter Four (Part A) The Processor: Datapath and Control

PCWrite

PCSource = 10

ALUSrcA = 1

ALUSrcB = 00ALUOp = 01

PCWriteCondPCSource = 01

ALUSrcA =1

ALUSrcB = 00ALUOp= 10

RegDst = 1RegWrite

MemtoReg = 0

MemWriteIorD = 1

MemRead

IorD = 1

ALUSrcA = 1

ALUSrcB = 10ALUOp = 00

RegDst=0RegWrite

MemtoReg=1

ALUSrcA = 0

ALUSrcB = 11

ALUOp = 00

MemRead

ALUSrcA = 0IorD = 0

IRWriteALUSrcB = 01

ALUOp = 00PCWrite

PCSource = 00

Instruction fetchInstruction decode/

register fetch

Jumpcompletion

BranchcompletionExecution

Memory address

computation

Memoryaccess

Memoryaccess R-type completion

Write-back step

(Op

='J

')

(Op

='L

W')

4

01

9862

753

Start

Page 59: Chapter Four (Part A) The Processor: Datapath and Control

59EE334 Spring 2010

• Implementation:

Finite State Machine for Control

PCWrite

PCWriteCond

IorD

MemtoReg

PCSource

ALUOp

ALUSrcB

ALUSrcA

RegWrite

RegDst

NS3

NS2

NS1

NS0

Op5

Op4

Op3

Op2

Op1

Op0

S3

S2

S1

S0

State register

IRWrite

MemRead

MemWrite

Instruction register

opcode field

Outputs

Control logic

Inputs

Page 60: Chapter Four (Part A) The Processor: Datapath and Control

60EE334 Spring 2010

OR

Plane

AND

Plane

PLA Implementation

• If I picked a horizontal or vertical line could you explain it?

Op5

Op4

Op3

Op2

Op1

Op0

S3

S2

S1

S0

IorD

IRWrite

MemRead

MemWrite

PCWrite

PCWriteCond

MemtoReg

PCSource1

ALUOp1

ALUSrcB0

ALUSrcA

RegWrite

RegDst

NS3

NS2

NS1

NS0

ALUSrcB1

ALUOp0

PCSource0

Page 61: Chapter Four (Part A) The Processor: Datapath and Control

61EE334 Spring 2010

PLA Implementation

• If I picked a horizontal or vertical line could you explain it?

Op5

Op4

Op3

Op2

Op1

Op0

S3

S2

S1

S0

IorD

IRWrite

MemRead

MemWrite

PCWrite

PCWriteCond

MemtoReg

PCSource1

ALUOp1

ALUSrcB0

ALUSrcA

RegWrite

RegDst

NS3

NS2

NS1

NS0

ALUSrcB1

ALUOp0

PCSource0

Page 62: Chapter Four (Part A) The Processor: Datapath and Control

62EE334 Spring 2010

• ROM = "Read Only Memory"

– values of memory locations are fixed ahead of time

• A ROM can be used to implement a truth table

– if the address is m-bits, we can address 2m entries in the ROM.

– our outputs are the bits of data that the address points to.

m is the "height", and n is the "width"

ROM Implementation

m n

0 0 0 0 0 1 1

0 0 1 1 1 0 0

0 1 0 1 1 0 0

0 1 1 1 0 0 0

1 0 0 0 0 0 0

1 0 1 0 0 0 1

1 1 0 0 1 1 0

1 1 1 0 1 1 1

Page 63: Chapter Four (Part A) The Processor: Datapath and Control

63EE334 Spring 2010

• How many inputs are there?

6 bits for opcode, 4 bits for state = 10 address lines

(i.e., 210 = 1024 different addresses)

• How many outputs are there?

16 datapath-control outputs, 4 state bits = 20 outputs

• ROM is 210 x 20 = 20K bits (and a rather unusual size)

• Rather wasteful, since for lots of the entries, the outputs are the

same

— i.e., opcode is often ignored

ROM Implementation

Page 64: Chapter Four (Part A) The Processor: Datapath and Control

64EE334 Spring 2010

• Break up the table into two parts

— 4 state bits tell you the 16 outputs, 24 x 16 bits of ROM

— 10 bits tell you the 4 next state bits, 210 x 4 bits of ROM

— Total: 4.3K bits of ROM

• PLA is much smaller

— can share product terms

— only need entries that produce an active output

— can take into account don't cares

• Size is (#inputs #product-terms) + (#outputs #product-terms)

For this example = (10x17)+(20x17) = 460 PLA cells

• PLA cells usually about the size of a ROM cell (slightly bigger)

ROM vs PLA

Page 65: Chapter Four (Part A) The Processor: Datapath and Control

65EE334 Spring 2010

• Complex instructions: the "next state" is often current state + 1

Another Implementation Style

AddrCtl

Outputs

PLA or ROM

State

Address select logic

Op[5

–0]

Adder

Instruction register

opcode field

1

Control unit

Input

PCWrite

PCWriteCond

IorD

MemtoReg

PCSource

ALUOp

ALUSrcB

ALUSrcA

RegWrite

RegDst

IRWrite

MemRead

MemWrite

BWrite

Page 66: Chapter Four (Part A) The Processor: Datapath and Control

66EE334 Spring 2010

State number Address-control action Value of AddrCtl

0 Use incremented state 3

1 Use dispatch ROM 1 1

2 Use dispatch ROM 2 2

3 Use incremented state 3

4 Replace state number by 0 0

5 Replace state number by 0 0

6 Use incremented state 3

7 Replace state number by 0 0

8 Replace state number by 0 0

9 Replace state number by 0 0

DetailsDispatch ROM 1 Dispatch ROM 2

Op Opcode name Value Op Opcode name Value

000000 R-format 0110 100011 lw 0011

000010 jmp 1001 101011 sw 0101

000100 beq 1000

100011 lw 0010

101011 sw 0010

State

Op

Adder

1

PLA or ROM

Mux

3 2 1 0

Dispatch ROM 1Dispatch ROM 2

0

AddrCtl

Address select logic

Instruction register

opcode field

Page 67: Chapter Four (Part A) The Processor: Datapath and Control

67EE334 Spring 2010

Microprogramming

• What are the “microinstructions” ?

PCWrite

PCWriteCondIorD

MemtoReg

PCSource

ALUOp

ALUSrcB

ALUSrcA

RegWrite

AddrCtl

Outputs

Microcode memory

IRWrite

MemRead

MemWrite

RegDst

Control unit

Input

Microprogram counter

Address select logic

Op[5

–0]

Adder

1

Datapath

Instruction registeropcode field

BWrite

Page 68: Chapter Four (Part A) The Processor: Datapath and Control

68EE334 Spring 2010

• A specification methodology

– appropriate if hundreds of opcodes, modes, cycles, etc.

– signals specified symbolically using microinstructions

• Will two implementations of the same architecture have the same microcode?

• What would a microassembler do?

Microprogramming

Label

ALU

control SRC1 SRC2

Register

control Memory

PCWrite

control Sequencing

Fetch Add PC 4 Read PC ALU Seq

Add PC Extshft Read Dispatch 1

Mem1 Add A Extend Dispatch 2

LW2 Read ALU Seq

Write MDR Fetch

SW2 Write ALU Fetch

Rformat1 Func code A B Seq

Write ALU Fetch

BEQ1 Subt A B ALUOut-cond Fetch

JUMP1 Jump address Fetch

Page 69: Chapter Four (Part A) The Processor: Datapath and Control

Microinstruction formatField name Value Signals active Comment

Add ALUOp = 00 Cause the ALU to add.

ALU control Subt ALUOp = 01 Cause the ALU to subtract; this implements the compare for

branches.

Func code ALUOp = 10 Use the instruction's function code to determine ALU control.

SRC1 PC ALUSrcA = 0 Use the PC as the first ALU input.

A ALUSrcA = 1 Register A is the first ALU input.

B ALUSrcB = 00 Register B is the second ALU input.

SRC2 4 ALUSrcB = 01 Use 4 as the second ALU input.

Extend ALUSrcB = 10 Use output of the sign extension unit as the second ALU input.

Extshft ALUSrcB = 11 Use the output of the shift-by-two unit as the second ALU input.

Read Read two registers using the rs and rt fields of the IR as the register

numbers and putting the data into registers A and B.

Write ALU RegWrite, Write a register using the rd field of the IR as the register number and

Register RegDst = 1, the contents of the ALUOut as the data.

control MemtoReg = 0

Write MDR RegWrite, Write a register using the rt field of the IR as the register number and

RegDst = 0, the contents of the MDR as the data.

MemtoReg = 1

Read PC MemRead, Read memory using the PC as address; write result into IR (and

lorD = 0 the MDR).

Memory Read ALU MemRead, Read memory using the ALUOut as address; write result into MDR.

lorD = 1

Write ALU MemWrite, Write memory using the ALUOut as address, contents of B as the

lorD = 1 data.

ALU PCSource = 00 Write the output of the ALU into the PC.

PCWrite

PC write control ALUOut-cond PCSource = 01, If the Zero output of the ALU is active, write the PC with the contents

PCWriteCond of the register ALUOut.

jump address PCSource = 10, Write the PC with the jump address from the instruction.

PCWrite

Seq AddrCtl = 11 Choose the next microinstruction sequentially.

Sequencing Fetch AddrCtl = 00 Go to the first microinstruction to begin a new instruction.

Dispatch 1 AddrCtl = 01 Dispatch using the ROM 1.

Dispatch 2 AddrCtl = 10 Dispatch using the ROM 2.

Page 70: Chapter Four (Part A) The Processor: Datapath and Control

70EE334 Spring 2010

• No encoding:

– 1 bit for each datapath operation

– faster, requires more memory (logic)

– used for VAX 780 — an astonishing 400K of memory!

• Lots of encoding:

– send the microinstructions through logic to get control signals

– uses less memory, slower

• Historical context of CISC:

– Too much logic to put on a single chip with everything else

– Use a ROM (or even RAM) to hold the microcode

– It’s easy to add new instructions

Maximally vs. Minimally Encoded

Page 71: Chapter Four (Part A) The Processor: Datapath and Control

71EE334 Spring 2010

Microcode: Trade-offs

• Distinction between specification and implementation is sometimes blurred

• Specification Advantages:

– Easy to design and write

– Design architecture and microcode in parallel

• Implementation (off-chip ROM) Advantages

– Easy to change since values are in memory

– Can emulate other architectures

– Can make use of internal registers

• Implementation Disadvantages, SLOWER now that:

– Control is implemented on same chip as processor

– ROM is no longer faster than RAM

– No need to go back and make changes