Computer Organization and Architecture Chapter 5: The Processor: Datapath and Control Yu-Lun Kuo Computer Sciences and Information Engineering University of Tunghai, Taiwan [email protected]


Computer Organization and Architecture

Chapter 5: The Processor:

Datapath and Control

Yu-Lun Kuo

Computer Sciences and Information Engineering

University of Tunghai, Taiwan

[email protected]


5.1 Introduction

• The performance of a machine depends on

– Instruction count

– Clock cycle time

– Clock cycles per instruction (CPI)

• The compiler and the instruction set architecture

– Determine the instruction count required for a given program
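The three factors above combine in the classic performance equation: CPU time = instruction count x CPI x clock cycle time. A minimal Python sketch, assuming made-up numbers (the function name `cpu_time_ps` and the example values are illustrative and do not come from the slides):

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time = instruction count x CPI x clock cycle time (picoseconds)."""
    return instruction_count * cpi * cycle_time_ps


# Hypothetical program: 1000 instructions, CPI of 1, 600 ps clock cycle.
print(cpu_time_ps(1000, 1, 600))
```

Lowering any one factor without raising the others reduces execution time, which is why the implementation choices in this chapter (cycle time and CPI) matter.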


5.1 Introduction

• Both the clock cycle time and the clock cycles per instruction (CPI)

– Determined by the implementation of the processor

• We construct the datapath and control unit for two different implementations of the MIPS instruction set

– Single cycle implementation

– Multi cycle implementation


5.1 Introduction

• We are going to see how the processor is implemented

– starting with a very simple processor, and adding some more complexity


Basic MIPS Implementation

• Include a subset of the MIPS instructions

– Memory-reference instructions: lw and sw

– The ALU instructions: add, sub, and, or, slt

– Control flow instructions: beq and j

• Generic implementation

– Use the program counter (PC) to supply the instruction address

– Fetch the instruction from memory

– Read one or two registers

– Use the instruction to decide exactly what to do

Basic MIPS Implementation

• All instructions use the ALU after reading the registers (except jump)

– Memory-reference instructions use the ALU for address calculation

– Arithmetic-logical instructions use it to execute the operation

– Branches use it for comparison

Our Processor, sort of…

What's missing

• How to combine inputs that are "joined" together

• How to tell each component what to do

Multiplexers and Controllers

• In the previous figure, two or more "wires" go into the same input of a component

– This is because, depending on the instruction being executed, a different input should be provided

• So, based on the instruction, we need to decide which input should be selected

• This is done with a multiplexer (MUX)

[Figure: a multiplexer with inputs 1 through n, one selected output, and a control input of ceil(log2(n)) bits]
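The behaviour of a multiplexer can be sketched in a few lines of Python. This is only an illustration: the names `select_bits` and `mux` are assumptions, and real hardware selects with gates rather than list indexing.

```python
def select_bits(n_inputs):
    """Control bits needed to choose among n_inputs, i.e. ceil(log2(n))."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using integers only.
    return (n_inputs - 1).bit_length()


def mux(inputs, select):
    """Return the input chosen by the control value `select`."""
    return inputs[select]


# A 2-input MUX needs 1 control bit; a 5-input MUX needs 3.
print(select_bits(2), select_bits(5))
print(mux([10, 20], 1))
```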

What about the Control?

• So great, now we can control multiplexers

– We need a controller that sends the appropriate control bits to all the multiplexers and the components

• Besides, there are other things to control

– Example: the ALU has a set of control bits that tell it what to do:

2-bit control

00: ADD

01: SUB

10: MUL

11: SHIFT

Control Unit (Simplified)

[Figure: simplified control unit. Labels: instruction register, PC, Add, offset, 4, and a MUX with input 0, input 1 and a 0-or-1 select]

A More Complete Picture

5.2 Logic Design Conventions

• The functional units in the MIPS implementation consist of two different types of logic elements

– Elements that operate on data values (combinational)

» Outputs depend only on the current inputs

» Always produce the same output for the same input

» Have no internal storage

– Elements that contain state (sequential)

» Have at least two inputs and one output

» Input: the data value to be written into the element

» Input: the clock, which determines when the data value is written

» Output: the value that was written in a previous clock cycle

Clocking Methodology

• Clocking methodology

– Defines when signals can be read and when they can be written

» If a signal is written at the same time it is read, the value read may be the old value, the new value, or a mix of the two. Computer designs cannot tolerate such unpredictability

– The clock cycle (period) is divided into two portions

» high clock

» low clock

[Figure: one clock cycle, with the rising edge and the falling edge marked]

Edge-triggered Clocking

• Edge-triggered clocking

– State changes (in state elements) occur only at a clock edge

– Either the rising edge or the falling edge is used

• Typical execution:

– Read contents of some state elements

– Send values through some combinational logic

– Write results to one or more state elements

[Figure: state element 1 feeds combinational logic, which feeds state element 2, all within one clock cycle]

The Clock

• In the above, we want to use the value in state element #1 to modify the value in state element #2. This takes one clock cycle

– All signals must have stabilized before the clock edge

[Figure: timing over one clock cycle. State element #1 is stable, the combinational circuit is stable by the edge, and state element #2 is updated on the edge]

Read/Write in a Clock Cycle

• An important implication of edge-triggered clocking

– A state element can be read and written in the same clock cycle

– We will say things like: "reads happen in the first half of the clock cycle, writes happen in the second half"

[Figure: same timing diagram as before. State element #1 is stable, the combinational circuit is stable by the edge, and state element #2 is updated on the edge]

Write Control Signal (p.291)

• Both the clock signal and the write control signal are inputs

– The state element is changed only when

» The write control signal is asserted

» A clock edge occurs

– Assuming a rising-edge update:

» While the control bit stays at 0, nothing happens

» If we set the control bit to 1, the state element will be updated at the next rising edge
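A small Python model can make the write control signal concrete. It is a behavioural sketch only: the class name `Register` and the method `rising_edge` are assumptions for illustration, and one method call stands for one rising clock edge.

```python
class Register:
    """Edge-triggered state element with a write control signal."""

    def __init__(self, value=0):
        # Output: the value written at an earlier clock edge.
        self.value = value

    def rising_edge(self, data_in, write):
        """Model one rising clock edge; update only if write is asserted."""
        if write:
            self.value = data_in


reg = Register(5)
old = reg.value                       # read during the cycle: old value
reg.rising_edge(old + 1, write=False)
print(reg.value)                      # control bit was 0: unchanged
reg.rising_edge(old + 1, write=True)
print(reg.value)                      # control bit was 1: updated at the edge
```

Because the update happens only at the edge, the same register can be read and written within one cycle without the read seeing a half-written value.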

Busses and bus width

• Many of the state elements and combinational elements take multi-bit inputs (often 32-bit inputs)

• The term “bus” refers to a wire that carries more than one bit

– multiple 1-bit wires, really

• We simply indicate the width of the busses as follows:

[Figure: bus lines annotated with their widths (for example 16 and 8), next to a control signal]

Building a Datapath

• A datapath element is a unit in the processor that operates on or holds data

– instruction memory, data memory, register file, ALU, adders

• Let's re-examine the datapath elements we only barely introduced earlier

Building a Datapath

• Start by looking at which datapath elements each instruction needs

– Also show their control signals

• Instruction memory

– Memory unit that stores the instructions of a program and supplies an instruction given an address

• Program Counter (PC)

– Register that holds the address of the current instruction

– 32-bit register that is written at the end of every clock cycle (so it does not need a write control signal)

• Adder

– Increments the PC to the address of the next instruction

– Combinational; can be built from an ALU wired to always add

The Three Elements

• Two state elements are needed to store and access instructions: the instruction memory and the program counter

– The instruction memory only provides read access

– Its output at any time reflects the contents of the location specified by the address input

• An adder is needed to compute the next instruction address (+4 bytes)

– ALU wired to always perform an add

Fetching Instructions

[Figure: the PC (32 bits) drives the read address of the instruction memory, which outputs the 32-bit instruction; an adder computes PC + 4 and feeds it back to the PC]

• The PC gets updated in one clock cycle because we use edge-triggered clocking

– During the cycle: the read address is presented and the instruction is retrieved from the instruction memory

– At the clock edge: PC + 4 is latched into the PC

Register File

• The processor's 32 general-purpose registers

– Are stored in a structure called the register file

• Register file

– Collection of registers in which any register can be read or written by specifying the number of the register in the file

[Figure: register file. Inputs: Read register number 1, Read register number 2 and Write register (5 bits each), Write data (32 bits), the Write control signal and the Clock. Outputs: Read data 1 and Read data 2 (32 bits each)]

Datapath: Instruction Store/Fetch & PC Increment

[Figure: three elements used to store and fetch instructions and increment the PC: (a) instruction memory, (b) program counter, (c) adder. In the combined datapath, the PC feeds the read address of the instruction memory and an adder that adds 4]

Animating the Datapath

Instruction <- MEM[PC]

PC <- PC + 4
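The two register-transfer statements above can be sketched as one Python function. The dictionary-based instruction memory and the name `fetch` are assumptions made for illustration; the instruction words in the example are arbitrary.

```python
def fetch(instr_mem, pc):
    """Instruction <- MEM[PC]; next PC <- PC + 4 (byte addresses, 32 bits)."""
    instruction = instr_mem[pc]
    next_pc = (pc + 4) & 0xFFFFFFFF
    return instruction, next_pc


# Hypothetical instruction memory: byte address -> 32-bit instruction word.
instr_mem = {0: 0x11111111, 4: 0x22222222}
pc = 0
instruction, pc = fetch(instr_mem, pc)
print(hex(instruction), pc)
```

Both results are computed from the old PC during the cycle; the new PC value is only latched at the clock edge.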

[Figure: the PC drives the ADDR input of the instruction memory, whose RD output is the instruction; an ADD unit adds 4 to the PC]

What about R-type instructions

• These instructions take 3 registers as arguments:

– 1 output register

– 2 input registers

– Example: add $t1, $t2, $t3

» Reads $t2 and $t3 and writes $t1

• We need an input that contains the data to be written into the output register

– Typically comes from the ALU

• We need a Write signal to trigger the register write on the next clock edge

– A write at any time during the clock cycle could lead to race conditions if that register is also read

Datapath: R-Type Instruction

[Figure: two elements used to implement R-type instructions: (a) the register file, with three 5-bit register numbers (Read register 1, Read register 2, Write register), Write data, the RegWrite control and the outputs Read data 1 and Read data 2; (b) the ALU, with a 3-bit ALU operation control and the outputs ALU result and Zero. In the datapath, the instruction supplies the register numbers, the read data feed the ALU, and the ALU result returns to Write data]

Register File and ALU

[Figure: register file connected to the ALU. The three 5-bit register numbers are extracted from the 32-bit instruction code; Read data 1 and Read data 2 (32 bits each) feed the ALU, which takes a 4-bit Operation control and produces a 32-bit result and the zero output; Write data (32 bits) and RegWrite control the register write]

add t1, t1, t2 (sketch)

[Figure: the instruction supplies t1 and t2 as the read registers and t1 as the write register; the ALU result returns to Write data; RegWrite must be set only at the next edge]

Animating the Datapath (R-type)

add rd, rs, rt

R[rd] <- R[rs] + R[rt];

[Figure: R-type instruction fields (op, rs, rt, rd, shamt, funct). rs and rt select the read registers (RN1, RN2) and rd the write register (WN), 5 bits each; RD1 and RD2 feed the ALU (3-bit Operation, Zero output), whose result goes to WD under control of RegWrite]

What about the Load/Store

• Example: lw t1, offset(t2)

– The memory address is computed by adding the 16-bit signed offset to the input register

– The offset is 16 bits, but memory addresses are 32 bits

– Therefore, the offset must be sign-extended into a 32-bit value before being added to the input register

– The memory has both read and write control

» MemWrite control signal

» MemRead control signal
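Sign extension is easy to get wrong, so here is a minimal Python sketch of it and of the address calculation. The function names are assumptions for illustration; values are kept as unsigned 32-bit integers, so a negative offset shows up as a large number that wraps around when added.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Extend a 16-bit two's-complement value to 32 bits by copying bit 15."""
    value &= 0xFFFF
    if value & 0x8000:           # sign bit set: fill the upper 16 bits with 1s
        value |= 0xFFFF0000
    return value


def effective_address(base, offset16):
    """Base register plus sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(offset16)) & MASK32


print(hex(sign_extend16(0x0004)))              # positive offset is unchanged
print(hex(sign_extend16(0xFFFC)))              # 0xFFFC encodes -4
print(hex(effective_address(0x1000, 0xFFFC)))  # 0x1000 - 4
```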

Datapath: Load/Store Instruction

[Figure: two additional elements used to implement loads and stores: (a) the data memory unit, with Address and Write data inputs, a Read data output and the MemRead and MemWrite controls; (b) the sign-extension unit, 16 bits in and 32 bits out. In the datapath, the ALU adds Read data 1 and the sign-extended offset to form the address, Read data 2 supplies the store data, and the memory's Read data returns to the register file's Write data]

Implementing Load/Store

[Figure: the sign-extension unit (16 bits to 32 bits) and the data memory unit (32-bit Address, Write data and Read data; MemRead and MemWrite controls)]

Implementing lw s1,offset(s2)

[Figure: for lw, the instruction supplies s2 as the read register, the 16-bit offset (sign-extended to 32 bits) and s1 as the write register; the ALU performs an add to form the memory address; MemRead is set, MemWrite is not set, and RegWrite is set on the next edge]

Animating the Datapath (Load)

[Figure: load datapath. The instruction fields op, rs, rt and the 16-bit offset/immediate drive the register file (RN1, RN2, WN, WD, RD1, RD2, RegWrite), the sign-extension unit (16 to 32 bits), the ALU (3-bit Operation, Zero) and the data memory (ADDR, WD, RD, MemRead, MemWrite)]

lw rt, offset(rs)

R[rt] <- MEM[R[rs] + s_extend(offset)];

Animating the Datapath (Store)

[Figure: store datapath, with the same elements as the load datapath: register file, sign-extension unit, ALU and data memory]

sw rt, offset(rs)

MEM[R[rs] + s_extend(offset)] <- R[rt];
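The load and store register transfers can be sketched in Python as follows. This is a behavioural illustration under stated assumptions: registers are a plain list, data memory is a dictionary from address to word, uninitialized memory reads as 0, and all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def s_extend(offset16):
    """Sign-extend a 16-bit offset to 32 bits."""
    offset16 &= 0xFFFF
    return offset16 | 0xFFFF0000 if offset16 & 0x8000 else offset16


def lw(regs, mem, rt, rs, offset):
    """R[rt] <- MEM[R[rs] + s_extend(offset)]"""
    address = (regs[rs] + s_extend(offset)) & MASK32
    regs[rt] = mem.get(address, 0)


def sw(regs, mem, rt, rs, offset):
    """MEM[R[rs] + s_extend(offset)] <- R[rt]"""
    address = (regs[rs] + s_extend(offset)) & MASK32
    mem[address] = regs[rt]


regs, mem = [0] * 32, {}
regs[9], regs[8] = 0x100, 42
sw(regs, mem, rt=8, rs=9, offset=0xFFFC)    # store 42 at 0x100 - 4
lw(regs, mem, rt=10, rs=9, offset=0xFFFC)   # load it back into register 10
print(regs[10], mem)
```

Both instructions use the ALU the same way (an add for the address); they differ only in which memory control signal is asserted and in whether the register file is written.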

What about the Branch (beq)

• 2 registers are compared

• To do a branch we must

– Compute the branch's target address based on its offset

– Decide whether the branch is taken or not taken

» Taken: if the operands are equal, the branch target address becomes the new PC

PC = (PC + 4) + 4 * (offset field)

» Not taken: if the operands are not equal, PC = PC + 4 as usual

Branch Datapath

[Figure: branch datapath]

• No shift hardware is required for the shift left by 2: simply connect the wires from input to output, each shifted left 2 bits

Animating the Datapath (branch)

beq rs, rt, offset

[Figure: branch datapath. The instruction fields op, rs, rt and the 16-bit offset/immediate drive the register file and the ALU, which compares RD1 and RD2 and produces Zero; the offset is sign-extended (16 to 32 bits), shifted left by 2 and added to PC + 4 from the instruction datapath]

if (R[rs] == R[rt]) then PC <- PC + 4 + (s_extend(offset) << 2)
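The branch decision and target calculation can be sketched in Python. The function names are assumptions for illustration; the comparison is modeled the way the datapath does it, by subtracting and testing whether the result is zero.

```python
MASK32 = 0xFFFFFFFF


def s_extend(offset16):
    """Sign-extend a 16-bit offset to 32 bits."""
    offset16 &= 0xFFFF
    return offset16 | 0xFFFF0000 if offset16 & 0x8000 else offset16


def branch_target(pc, offset16):
    """(PC + 4) + 4 * offset, with the offset counted in words."""
    return (pc + 4 + (s_extend(offset16) << 2)) & MASK32


def next_pc_beq(pc, rs_value, rt_value, offset16):
    """New PC for beq: branch target if the operands are equal, else PC + 4."""
    zero = ((rs_value - rt_value) & MASK32) == 0   # ALU subtract, Zero output
    return branch_target(pc, offset16) if zero else (pc + 4) & MASK32


print(hex(next_pc_beq(0x100, 7, 7, 3)))       # taken: 0x100 + 4 + 12
print(hex(next_pc_beq(0x100, 7, 8, 3)))       # not taken: 0x100 + 4
print(hex(next_pc_beq(0x100, 7, 7, 0xFFFF)))  # offset -1: back to the branch
```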

Putting It All Together

• The simplest design is one in which

– All instructions are executed in a single clock cycle

• In this case, every element of the datapath can be used only once per clock cycle

– Any element needed more than once must be duplicated

– In practice this means a few extra adders here and there

– And we need separate data and instruction memories

• Let's first put together the pieces for the R-type (ALU) instructions and the memory instructions, as they are quite similar

Altogether (not quite)

• We "simply" add multiplexers to choose between the datapath for the ALU instructions and the datapath for the memory instructions

[Figure: combining the datapaths for R-type instructions and load/stores using two multiplexors]

Animating the Datapath: R-type Instruction

add rd,rs,rt

[Figure: combined register file, sign-extension unit, ALU and data memory datapath with the ALUSrc and MemtoReg multiplexers, shown for an R-type instruction]

Animating the Datapath: Load Instruction

lw rt,offset(rs)

[Figure: the same combined datapath with the ALUSrc and MemtoReg multiplexers, shown for a load instruction]

Animating the Datapath: Store Instruction

sw rt,offset(rs)

[Figure: the same combined datapath with the ALUSrc and MemtoReg multiplexers, shown for a store instruction]

Adding instruction fetch

[Figure: the combined datapath with the PC, the instruction memory and the PC + 4 adder added in front of the register file, ALU and data memory]

• A separate instruction memory is needed because the instruction read and the data access occur in the same clock cycle

• A separate adder is needed because the ALU operation and the PC increment occur in the same clock cycle

Complete Altogether

[Figure: the datapath after adding branch capability and another multiplexor]

• The instruction address is either PC+4 or the branch target address; a new multiplexor selects between them

• An extra adder is needed because both adders operate in each cycle

• Important note: in a single-cycle implementation, data cannot be stored during an instruction; it only moves through combinational logic

• Question: is the MemRead signal really needed? Think of RegWrite!

5.4 What now?

• At this point we've identified most of the components of an almost complete datapath for a very simple implementation of the MIPS ISA

• Let us now design the logic that makes it all work

– i.e., how we set the control signals

Datapath Executing add

add rd, rs, rt

[Figure: the complete single-cycle datapath (PC, instruction memory, register file, sign extension, ALU, data memory, branch adder with shift left 2, and the ALUSrc, MemtoReg and PCSrc multiplexers), shown executing add]

Datapath Executing lw

lw rt,offset(rs)

[Figure: the complete single-cycle datapath, shown executing lw]

Datapath Executing sw

sw rt,offset(rs)

[Figure: the complete single-cycle datapath, shown executing sw]

Datapath Executing beq

beq r1,r2,offset

[Figure: the complete single-cycle datapath, shown executing beq]

Control Unit

• Let's go through the types of control signals that need to be generated

• An important set of signals is for the ALU

• Our ALU has four control lines:

ALU control lines   Function
0000                AND
0001                OR
0010                add
0110                subtract
0111                set on less than
1100                NOR

Controlling the ALU

• Depending on the instruction, the ALU will need to perform one of the first five functions (NOR is not needed for this subset)

– For load/store: the ALU needs to add

– For R-type instructions: depends on the 6-bit function field in the low-order bits of the instruction (remember Chapter 2)

– For branch: the ALU needs to subtract

Controlling the ALU

• We can generate the 4-bit ALU control using a small control unit that takes:

– 2 control bits called ALUOp

» add (00), subtract (01), depends on the function field (10)

– The instruction's 6-bit function field

Determining ALU Control Bits

Instruction opcode   ALUOp   Instruction operation   Funct field   Desired ALU action   ALU control input
lw                   00      load                    XXXXXX        add                  0010
sw                   00      store                   XXXXXX        add                  0010
beq                  01      branch                  XXXXXX        subtract             0110
R-type               10      add                     100000        add                  0010
R-type               10      subtract                100010        subtract             0110
R-type               10      and                     100100        and                  0000
R-type               10      or                      100101        or                   0001
R-type               10      set on <                101010        set on <             0111

XXXXXX: don't care
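The table above maps directly onto a small lookup function. The sketch below is one way to write it in Python; the names `alu_control` and `FUNCT_TO_ALU` are assumptions for illustration, and the bit patterns are copied from the table.

```python
# Funct field -> 4-bit ALU control input, used when ALUOp is 10 (R-type).
FUNCT_TO_ALU = {
    0b100000: 0b0010,   # add
    0b100010: 0b0110,   # subtract
    0b100100: 0b0000,   # AND
    0b100101: 0b0001,   # OR
    0b101010: 0b0111,   # set on less than
}


def alu_control(alu_op, funct):
    """Return the 4-bit ALU control input for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:          # lw / sw: add, funct is a don't care
        return 0b0010
    if alu_op == 0b01:          # beq: subtract, funct is a don't care
        return 0b0110
    if alu_op == 0b10:          # R-type: decided by the funct field
        return FUNCT_TO_ALU[funct]
    raise ValueError("ALUOp value not used in this subset")


print(format(alu_control(0b00, 0b111111), "04b"))   # lw or sw
print(format(alu_control(0b10, 0b101010), "04b"))   # slt
```

In hardware the same mapping is a few gates derived from the truth table, where the don't-care entries allow the logic to be simplified.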

Design ALU Control Unit

• Designing logic

– Useful to create a truth table for the interesting combinations of the function code field and the ALUOp bits

– It can be optimized and then turned into gates

ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  Operation
0       0       X   X   X   X   X   X   010
X       1       X   X   X   X   X   X   110
1       X       X   X   0   0   0   0   010
1       X       X   X   0   0   1   0   110
1       X       X   X   0   1   0   0   000
1       X       X   X   0   1   0   1   001
1       X       X   X   1   0   1   0   111

The Three Instruction Classes

• R-type, load and store, and branch formats

– Need to add a multiplexor to select which field of the instruction is used to indicate the destination register

» Bits 20:16 (rt) for load

» Bits 15:11 (rd) for R-type instructions

R-type:         op = 0 [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
Load & store:   op = 35 or 43 [31:26] | rs [25:21] | rt [20:16] | address [15:0]
Branch:         op = 4 or 5 [31:26] | rs [25:21] | rt [20:16] | address [15:0]
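Extracting the fields from a 32-bit instruction word is just shifting and masking. A minimal Python sketch, assuming the bit positions listed above (the function names `decode` and `write_register` are illustrative):

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op":    (instr >> 26) & 0x3F,   # bits 31:26
        "rs":    (instr >> 21) & 0x1F,   # bits 25:21
        "rt":    (instr >> 16) & 0x1F,   # bits 20:16
        "rd":    (instr >> 11) & 0x1F,   # bits 15:11 (R-type only)
        "shamt": (instr >> 6) & 0x1F,    # bits 10:6  (R-type only)
        "funct": instr & 0x3F,           # bits 5:0   (R-type only)
        "imm":   instr & 0xFFFF,         # bits 15:0  (load/store/branch)
    }


def write_register(fields, reg_dst):
    """Destination register multiplexor: rd if reg_dst is 1, else rt."""
    return fields["rd"] if reg_dst else fields["rt"]


# add $t1, $t2, $t3: op = 0, rs = $t2 (10), rt = $t3 (11), rd = $t1 (9), funct = 0x20
word = (10 << 21) | (11 << 16) | (9 << 11) | 0x20
fields = decode(word)
print(fields)
print(write_register(fields, reg_dst=1))   # R-type writes rd
```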

New Control Signals

• RegDst: destination comes from rt vs. rd

• RegWrite: register should be written

• ALUSrc: ALU operand from register vs. instruction

• PCSrc: PC from adder vs. branch target

• MemRead: for lw

• MemWrite: for store

• MemtoReg: register write from ALU vs. memory


The Seven Control Signals

Signal name, effect when deasserted, effect when asserted:

• RegDst

– Deasserted: the register destination number comes from the rt field ([20:16])

– Asserted: the register destination number comes from the rd field ([15:11])

• RegWrite

– Deasserted: none

– Asserted: the write register is written with the value on the Write data input

• ALUSrc

– Deasserted: the second ALU operand comes from Read data 2

– Asserted: the second ALU operand is the sign-extended lower 16 bits of the instruction

• PCSrc

– Deasserted: the PC is replaced by the output of the adder that computes PC+4

– Asserted: the PC is replaced by the output of the adder that computes the branch target

• MemRead

– Deasserted: none

– Asserted: data memory contents designated by the address are put on the Read data output

• MemWrite

– Deasserted: none

– Asserted: data memory contents designated by the address are replaced by the value on the Write data input

• MemtoReg

– Deasserted: the value fed to the register Write data input comes from the ALU

– Asserted: the value fed to the register Write data input comes from the data memory

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1

[Figure: the single-cycle datapath with the control unit. Instruction [31-26] feeds Control, which generates RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite; Instruction [5-0] feeds the ALU control; Instruction [25-21], [20-16] and [15-11] supply the register numbers and Instruction [15-0] feeds the sign-extension unit]

Determining control signals for the MIPS datapath based on instruction opcode

• PCSrc cannot be set directly from the opcode: the outcome of the zero test is required
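The control table above can be written down as a lookup keyed by opcode. The Python sketch below is illustrative: the names `CONTROL`, `OPCODE_CLASS`, `control_signals` and `pc_src` are assumptions, `None` stands for a don't-care (X), and PCSrc is formed as Branch AND Zero, which is the usual way to combine the opcode-derived Branch signal with the zero test.

```python
# One row per instruction class; None marks a don't-care (X) entry.
CONTROL = {
    "R-format": dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                     MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    "lw":       dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                     MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    "sw":       dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                     MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    "beq":      dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                     MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

# Opcode (instruction bits 31:26) -> instruction class.
OPCODE_CLASS = {0: "R-format", 35: "lw", 43: "sw", 4: "beq"}


def control_signals(opcode):
    """Look up the control signals for a 6-bit opcode."""
    return CONTROL[OPCODE_CLASS[opcode]]


def pc_src(branch, zero):
    """PCSrc needs both the Branch signal and the ALU Zero output."""
    return branch & zero


signals = control_signals(4)                     # beq
print(signals["ALUOp"], pc_src(signals["Branch"], zero=1))
```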

Control Signals: R-Type Instruction

[Figure: the single-cycle datapath with the control signal values (shown in blue) for an R-type instruction: RegDst = 1, ALUSrc = 0, MemtoReg = 0, RegWrite = 1, MemRead = 0, MemWrite = 0, PCSrc = 0; the ALU operation value depends on funct]

Control Signals: lw Instruction

[Figure: the single-cycle datapath with the control signal values (shown in blue) for lw: RegDst = 0, ALUSrc = 1, MemtoReg = 1, RegWrite = 1, MemRead = 1, MemWrite = 0, PCSrc = 0; ALU operation = 010]

5 516

RD1

RD2

RN1 RN2 WN

WD

RegWrite

Register File

Operation

ALU

3

EXTND

16 32

Zero

RD

WD

MemRead

DataMemory

ADDRMemWrite

5

Instruction I32

MUX

ALUSrc

MemtoReg

ADD

<<2

RD

InstructionMemory

ADDR

PC

4

ADD

ADD

MUX

MUX

PCSrc

MUX RegDst

5

rdI[15:11]

rtI[20:16]

rsI[25:21]

immediate/offsetI[15:0]

0

1

0

11

0

10

Control Signals: sw Instruction

0

Control signals

shown in blue

X010

1

X

0

1

0

Control Signals: beq Instruction

[Figure: single-cycle datapath with the control signals for beq shown in blue: RegDst = X, ALUSrc = 0, MemtoReg = X, RegWrite = 0, MemRead = 0, MemWrite = 0, ALU operation = 110, PCSrc = 1 if Zero = 1.]

Single-Cycle Design Problems (p.314)

• With a fixed-period clock, every instruction uses exactly one clock cycle, which implies

– CPI = 1

– Cycle time is determined by the length of the longest instruction path (load)

» But several instructions could run in a shorter clock cycle: waste of time

– Resources used more than once in the same cycle need to be duplicated

» Waste of hardware and chip area

Performance of Single-Cycle

• Assume functional unit delays as follows

– Memory units: 200 ps

– ALU and adders: 100 ps

– Register file (read/write): 50 ps

» Multiplexors, control unit, PC accesses, sign extension, wires: no delay

• Assume instruction mix as follows

– all loads take the same time and comprise 25%

– all stores take the same time and comprise 10%

– R-format instructions comprise 45%

– branches comprise 15%

– jumps comprise 5%

• Compare the performance of

– (a) a single-cycle implementation using a fixed-period clock with

– (b) one using a variable-period clock where each instruction executes in one clock cycle that is only as long as it needs to be (not really practical but pretend it’s possible!)

Solution (1/3)

• CPU time = Instruction_count x CPI x clock_cycle

• CPU time = Instruction_count x clock_cycle (CPI=1)

– We need only find the clock cycle time, since instruction count and CPI are the same for both implementations

Instruction class   Functional units used by the instruction class
R-type              Inst. fetch   Reg. access   ALU   Reg. access
Load word           Inst. fetch   Reg. access   ALU   Memory access   Reg. access
Store word          Inst. fetch   Reg. access   ALU   Memory access
Branch              Inst. fetch   Reg. access   ALU
Jump                Inst. fetch

Solution (2/3)

• Machine with a single clock for all instructions

– Clock cycle is determined by the longest instruction: 600 ps

• Machine with a variable clock

– Find the average clock cycle length

– 400*45% + 600*25% + 550*10% + 350*15% + 200*5% = 447.5 ps

» It is clearly faster

Instruction class   Inst. memory   Reg. read   ALU operation   Data memory   Reg. write   Total
R-type              200            50          100             0             50           400 ps
Load word           200            50          100             200           50           600 ps
Store word          200            50          100             200           -            550 ps
Branch              200            50          100             0             -            350 ps
Jump                200            -           -               -             -            200 ps
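The comparison can be checked with a few lines of arithmetic. The following is a minimal sketch using the unit delays and instruction mix stated in this example; the dictionary and function names (`DELAY_PS`, `UNITS_USED`, `MIX`, `latency_ps`) are illustrative choices, not part of the original material.

```python
# Functional-unit delays in picoseconds, as given in the example.
DELAY_PS = {"imem": 200, "reg_read": 50, "alu": 100, "dmem": 200, "reg_write": 50}

# Functional units on the path of each instruction class.
UNITS_USED = {
    "R-type": ("imem", "reg_read", "alu", "reg_write"),
    "load":   ("imem", "reg_read", "alu", "dmem", "reg_write"),
    "store":  ("imem", "reg_read", "alu", "dmem"),
    "branch": ("imem", "reg_read", "alu"),
    "jump":   ("imem",),
}

# Instruction mix (fractions of the dynamic instruction count).
MIX = {"R-type": 0.45, "load": 0.25, "store": 0.10, "branch": 0.15, "jump": 0.05}


def latency_ps(instr_class):
    """Time one instruction class needs when the units are used back to back."""
    return sum(DELAY_PS[unit] for unit in UNITS_USED[instr_class])


# Fixed clock: every instruction pays for the slowest class.
fixed_cycle = max(latency_ps(c) for c in UNITS_USED)
# Variable clock: the average cycle is the mix-weighted latency.
variable_cycle = sum(MIX[c] * latency_ps(c) for c in MIX)
speedup = fixed_cycle / variable_cycle

print(f"{fixed_cycle} ps, {variable_cycle:.1f} ps, {speedup:.2f}x")
# prints "600 ps, 447.5 ps, 1.34x"
```

Since the instruction count and CPI are the same in both cases, the ratio of cycle times is also the ratio of execution times.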

Solution (3/3)

$$\frac{\text{CPU Performance}_{\text{variable clock}}}{\text{CPU Performance}_{\text{single clock}}} = \frac{\text{CPU Execution Time}_{\text{single clock}}}{\text{CPU Execution Time}_{\text{variable clock}}} = \frac{IC \times \text{CPU Clock Cycle}_{\text{single clock}}}{IC \times \text{CPU Clock Cycle}_{\text{variable clock}}} = \frac{\text{CPU Clock Cycle}_{\text{single clock}}}{\text{CPU Clock Cycle}_{\text{variable clock}}} = \frac{600}{447.5} = 1.34$$

• Unfortunately, implementing a variable-speed clock for each instruction class is extremely difficult

– Overhead for such an approach could be larger than any advantage gained

Example: Practice

• Consider a machine with an additional floating point unit. Assume functional unit delays as follows

– memory: 2 ns, ALU and adders: 2 ns, FPU add: 8 ns, FPU multiply: 16 ns, register file access (read or write): 1 ns

– multiplexors, control unit, PC accesses, sign extension, wires: no delay

• Assume instruction mix as follows

– all loads take the same time and comprise 31%

– all stores take the same time and comprise 21%

– R-format instructions comprise 27%

– branches comprise 5%

– jumps comprise 2%

– FP adds and subtracts take the same time and together comprise 7%

– FP multiplies and divides take the same time and together comprise 7%

• Compare the performance of (a) a single-cycle implementation using a fixed-period clock with (b) one using a variable-period clock where each instruction executes in one clock cycle that is only as long as it needs to be (not really practical but pretend it’s possible!)

Solution

• Clock period for fixed-period clock = longest instruction time = 20 ns

• Average clock period for variable-period clock

= 8 × 31% + 7 × 21% + 6 × 27% + 5 × 5% + 2 × 2% + 20 × 7% + 12 × 7%

= 8.1 ns

• Therefore, performance(var-period) / performance(fixed-period) = 20/8.1 ≈ 2.5

Instruction class   Instr. mem.   Register read   ALU oper.   Data mem.   Register write   FPU add/sub   FPU mul/div   Total time (ns)
Load word           2             1               2           2           1                -             -             8
Store word          2             1               2           2           -                -             -             7
R-format            2             1               2           0           1                -             -             6
Branch              2             1               2           -           -                -             -             5
Jump                2             -               -           -           -                -             -             2
FP mul/div          2             1               -           -           1                -             16            20
FP add/sub          2             1               -           -           1                8             -             12

5.5 Multi-Cycle Implementation

• The design of a multi-cycle implementation

• The idea is to have the functional units plus a set of additional registers

– to hold important values between the cycles of a single instruction

• This way a functional unit can be shared between cycles of the same instruction

– provided some multiplexers are added to decide where the input should come from

– This sharing can help reduce the amount of hardware required

Multi-cycle Design

• Major Advantages

– Instructions can take different numbers of clock cycles

– Functional units are shared within the execution of a single instruction

• Compare with single-cycle version

– Single memory unit is used for both instructions and data

– Single ALU (not ALU and two adders)

– One or more registers are added after every functional unit to hold the output

» Until the value is used in a subsequent clock cycle

Multi-cycle Design

• Each clock cycle can accommodate at most one of the following operations

– Memory access

– Register file access (two reads or one write)

– ALU operation

• So, data produced by one of these three functional units must be saved

– Into a temporary register for use on a later cycle

Temporary Register

• Instruction register (IR)

– Saves the output of the memory for an instruction read

• Memory data register (MDR)

– Saves the output of the memory for a data read

• A and B registers

– Hold the register operand values read from the register file

• ALUOut register

– Holds the output of the ALU

Multi-cycle vs. single-cycle

• Single memory for data and instructions

• Single ALU, no extra adders

• Extra registers to hold data between clock cycles

[Figure: the single-cycle datapath next to a high-level view of the multicycle datapath (PC, memory, instruction register, memory data register, register file, A and B registers, ALU, ALUOut).]

Multicycle Datapath

[Figure: basic multicycle MIPS datapath that handles R-type instructions and loads/stores; new internal registers are in red ovals, new multiplexors in blue ovals.]

Breaking Instructions into Steps

• Our goal is to break up the instructions into steps so that

– Each step takes one clock cycle

– The amount of work to be done in each step/cycle is about equal

– Each cycle uses each major functional unit at most once, so that such units do not have to be replicated

– Functional units can be shared between different cycles within one instruction

• Data produced at the end of one cycle and used in a later cycle must be stored!

Breaking Instructions into Steps

• For MIPS, we can think of the instruction running in five 1-cycle stages

1. Instruction fetch and PC increment (IF)

2. Instruction decode and register fetch (ID)

3. Execution, memory address computation, or branch completion (EX)

4. Memory access or R-type instruction completion (MEM)

5. Memory read completion (WB)

– Each MIPS instruction takes from 3 to 5 cycles (steps)


Step 1: Instruction Fetch & PC Increment (IF)

IR = Memory[PC]; PC = PC + 4;

• Use PC to get instruction and write the instruction into instruction register (IR)

• Increment the PC by 4 and put the result back in the PC

– The new value of the PC is not visible until the next clock cycle (stored into ALUOut)

• In this step we don’t know yet what the instruction does


Step 2: Instruction Decode and Register Fetch (ID)

• Read registers rs and rt in case we need them

– Read them from the register file and store the values into the temporary register A and B

• Compute the branch address with the ALU and save it in a temporary register

A = Reg[IR[25-21]];
B = Reg[IR[20-16]];
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
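The field extraction and branch-target arithmetic of this step can be sketched in software. This is only a model of the register-transfer statements above, assuming a 32-bit instruction word held in an integer; the helper names `sign_extend16`, `decode_fields`, and `branch_target` are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_fields(ir):
    """Return (rs, rt, imm16) from IR[25-21], IR[20-16], and IR[15-0]."""
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    imm16 = ir & 0xFFFF
    return rs, rt, imm16


def branch_target(pc, ir):
    """ALUOut = PC + (sign-extend(IR[15-0]) << 2); pc already holds PC + 4."""
    return (pc + (sign_extend16(ir) << 2)) & MASK32


if __name__ == "__main__":
    # beq $t0, $zero, +3 words: opcode 4, rs = 8, rt = 0, immediate = 3
    beq = (0x04 << 26) | (8 << 21) | (0 << 16) | 3
    print(decode_fields(beq))
    print(hex(branch_target(0x00401014, beq)))
```

The target is computed speculatively for every instruction, before the opcode is known to be a branch; if the instruction turns out not to be a branch, the value in ALUOut is simply overwritten in the next step.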


Step 3: Execution, Address Computation or Branch Completion (EX)

• Action to be taken depending on the instruction class

– Memory reference (lw and sw, [rs]+offset)

ALUOut = A + sign-extend(IR[15-0]);

– Arithmetic-logical instruction (R-type)

ALUOut = A op B;

– Branch (A-B ? 0)

if (A==B) PC = ALUOut;

– Jump

PC = PC[31-28] || (IR[25-0] << 2);
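The four cases of this step can be written as one small dispatch function. This is a sketch under simplifying assumptions: the instruction class is passed in as a string rather than decoded from the opcode, the R-type operation is supplied as a Python callable, and the names `ex_step` and `jump_target` are illustrative.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def jump_target(pc, ir):
    """PC[31-28] concatenated with IR[25-0] shifted left by 2."""
    return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)


def ex_step(kind, pc, ir, a, b, alu_out, op=None):
    """Apply step 3 for one instruction class and return the new (pc, alu_out)."""
    if kind == "mem":        # lw / sw: address = A + sign-extended offset
        alu_out = (a + sign_extend16(ir)) & MASK32
    elif kind == "rtype":    # ALUOut = A op B
        alu_out = op(a, b) & MASK32
    elif kind == "branch":   # beq: ALUOut already holds the branch target
        if a == b:
            pc = alu_out
    elif kind == "jump":
        pc = jump_target(pc, ir)
    else:
        raise ValueError(f"unknown instruction class: {kind}")
    return pc, alu_out


if __name__ == "__main__":
    j_exit = (0x02 << 26) | (0x00401024 >> 2)   # j to address 0x00401024
    print([hex(v) for v in ex_step("jump", 0x00401020, j_exit, 0, 0, 0)])
```

Branches and jumps finish in this step because the PC is the only state they update, which is why they need three cycles in total.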


Step 4: Memory access or R-type Instruction Completion (MEM)

• A load or store instruction accesses memory, and an arithmetic-logical instruction writes its result

– If the instruction is a load

» The value is retrieved from memory and stored into the memory data register (MDR)

– If the instruction is a store

» Data is written to memory

– If the instruction is an R-type instruction

» The ALU result held in the temporary register (ALUOut) is written to rd


Step 5: Memory Read Completion (WB)

• Loads complete by writing back the value from memory

– Write the load data, which was stored into MDR

– Write back into the register rt

Reg[IR[20-16]]= MDR;


Summary of Instruction Execution

Step 1: IF (instruction fetch), all instructions
  IR = Memory[PC]
  PC = PC + 4

Step 2: ID (instruction decode/register fetch), all instructions
  A = Reg[IR[25-21]]
  B = Reg[IR[20-16]]
  ALUOut = PC + (sign-extend(IR[15-0]) << 2)

Step 3: EX (execution, address computation, branch/jump completion)
  R-type: ALUOut = A op B
  Memory reference: ALUOut = A + sign-extend(IR[15-0])
  Branch: if (A == B) then PC = ALUOut
  Jump: PC = PC[31-28] || (IR[25-0] << 2)

Step 4: MEM (memory access or R-type completion)
  R-type: Reg[IR[15-11]] = ALUOut
  Load: MDR = Memory[ALUOut]
  Store: Memory[ALUOut] = B

Step 5: WB (memory read completion)
  Load: Reg[IR[20-16]] = MDR
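To see how the steps add up to a per-instruction cycle count, the summary can be turned into a tiny interpreter. The sketch below is a software model only, not the hardware: the class name `MultiCycleCPU` and its methods are hypothetical, memory is a dictionary keyed by byte address, only lw, sw, beq, j, add, and sub are handled, and $zero is not hard-wired to 0.

```python
MASK32 = 0xFFFFFFFF


def sext16(value):
    """Sign-extend the low 16 bits of value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


class MultiCycleCPU:
    """Executes one instruction at a time and counts the steps (cycles) used."""

    def __init__(self, memory):
        self.mem = dict(memory)   # byte address -> 32-bit word
        self.reg = [0] * 32
        self.pc = 0
        self.cycles = 0

    def run_one(self):
        # Step 1 (IF): IR = Memory[PC]; PC = PC + 4
        ir = self.mem[self.pc]
        self.pc = (self.pc + 4) & MASK32
        # Step 2 (ID): read rs and rt, compute the branch target speculatively
        opcode = ir >> 26
        a = self.reg[(ir >> 21) & 31]
        b = self.reg[(ir >> 16) & 31]
        alu_out = (self.pc + (sext16(ir) << 2)) & MASK32
        # Step 3 (EX) and later steps depend on the instruction class
        if opcode == 0x02:                       # j
            self.pc = (self.pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
            steps = 3
        elif opcode == 0x04:                     # beq
            if a == b:
                self.pc = alu_out
            steps = 3
        elif opcode == 0x00:                     # R-type: add (0x20) or sub (0x22)
            funct = ir & 0x3F
            if funct not in (0x20, 0x22):
                raise ValueError("only add and sub are modelled")
            alu_out = (a + b if funct == 0x20 else a - b) & MASK32
            self.reg[(ir >> 11) & 31] = alu_out  # Step 4: Reg[rd] = ALUOut
            steps = 4
        elif opcode == 0x23:                     # lw
            alu_out = (a + sext16(ir)) & MASK32
            mdr = self.mem[alu_out]              # Step 4: MDR = Memory[ALUOut]
            self.reg[(ir >> 16) & 31] = mdr      # Step 5: Reg[rt] = MDR
            steps = 5
        elif opcode == 0x2B:                     # sw
            alu_out = (a + sext16(ir)) & MASK32
            self.mem[alu_out] = b                # Step 4: Memory[ALUOut] = B
            steps = 4
        else:
            raise ValueError(f"unsupported opcode {opcode:#x}")
        self.cycles += steps
        return steps
```

Running a load, an add, and a store through this model gives 5, 4, and 4 cycles respectively, matching the step counts in the summary.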

The schematic view

IF (uses the memory), ID (uses the register file), EX (uses the ALU), Mem (uses the memory), WB (uses the register file)

Very important to remember the content of this slide

Multicycle Execution Step (1): Instruction Fetch

IR = Memory[PC];
PC = PC + 4;

[Figure: multicycle datapath for this step; the ALU adds the constant 4 to the PC to produce PC + 4.]

Multicycle Execution Step (2): Instruction Decode & Register Fetch

A = Reg[IR[25-21]]; (A = Reg[rs])
B = Reg[IR[20-16]]; (B = Reg[rt])
ALUOut = PC + (sign-extend(IR[15-0]) << 2);

[Figure: multicycle datapath for this step; PC holds PC + 4, A holds Reg[rs], B holds Reg[rt], and ALUOut holds the branch target address.]

Multicycle Execution Step (3): Memory Reference Instructions

ALUOut = A + sign-extend(IR[15-0]);

[Figure: multicycle datapath for this step; PC holds PC + 4, A holds Reg[rs], B holds Reg[rt], and ALUOut holds the memory address.]

Multicycle Execution Step (3): ALU Instruction (R-Type)

ALUOut = A op B

[Figure: multicycle datapath for this step; PC holds PC + 4, A holds Reg[rs], B holds Reg[rt], and ALUOut holds the R-type result.]

Multicycle Execution Step (3): Branch Instructions

if (A == B) PC = ALUOut;

[Figure: multicycle datapath for this step; A holds Reg[rs], B holds Reg[rt], ALUOut holds the branch target address, and the PC receives the branch target address when the branch is taken.]

Multicycle Execution Step (3): Jump Instruction

PC = PC[31-28] concat (IR[25-0] << 2)

[Figure: multicycle datapath for this step; the PC receives the jump address, A holds Reg[rs], B holds Reg[rt], and ALUOut still holds the branch target address.]

Multicycle Execution Step (4): Memory Access - Read (lw)

MDR = Memory[ALUOut];

[Figure: multicycle datapath for this step; PC holds PC + 4, A holds Reg[rs], B holds Reg[rt], ALUOut holds the memory address, and MDR receives the memory data.]

Multicycle Execution Step (4): Memory Access - Write (sw)

Memory[ALUOut] = B;

[Figure: multicycle datapath for this step; PC holds PC + 4, A holds Reg[rs], and B holds Reg[rt], which is written to memory.]

Multicycle Execution Step (4): ALU Instruction (R-Type)

Reg[IR[15:11]] = ALUOut

[Figure: multicycle datapath for this step; PC holds PC + 4, A holds Reg[rs], B holds Reg[rt], and the R-type result in ALUOut is written to the register file.]

Multicycle Execution Step (5): Memory Read Completion (lw)

Reg[IR[20-16]] = MDR;

[Figure: multicycle datapath for this step; PC holds PC + 4, A holds Reg[rs], B holds Reg[rt], ALUOut holds the memory address, and the memory data in MDR is written to the register file.]

Multicycle Datapath with Control I

[Figure: multicycle datapath with control lines and the ALU control block added (IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB, ALUOp); not all control lines are shown.]

Multicycle Datapath with Control II

[Figure: complete multicycle MIPS datapath (with branch and jump capability), showing the main control block and all control lines: PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst. Callouts mark the new multiplexor, the new gates, and the jump address path.]

Action of the Control Signals

• Action of the 1-bit control signals

– RegDst, RegWrite

– ALUSrcA

– MemRead, MemWrite, MemtoReg

– IorD

– IRWrite

– PCWrite, PCWriteCond

• Action of the 2-bit control signals

– ALUOp

– ALUSrcB

– PCSource

Multicycle Control Step (1): Fetch

IR = Memory[PC];
PC = PC + 4;

[Figure: multicycle datapath with the control signal values for the fetch step marked on the control lines.]

Multicycle Control Step (2): Instruction Decode & Register Fetch

A = Reg[IR[25-21]]; (A = Reg[rs])
B = Reg[IR[20-16]]; (B = Reg[rt])
ALUOut = PC + (sign-extend(IR[15-0]) << 2);

[Figure: multicycle datapath with the control signal values for the decode step marked on the control lines.]

Multicycle Control Step (3): Memory Reference Instructions

ALUOut = A + sign-extend(IR[15-0]);

[Figure: multicycle datapath with the control signal values for the memory address computation marked on the control lines.]

Multicycle Control Step (3): ALU Instruction (R-Type)

ALUOut = A op B;

[Figure: multicycle datapath with the control signal values for R-type execution marked on the control lines; the ALU operation value (???) depends on the instruction.]

Multicycle Control Step (3): Branch Instructions

if (A == B) PC = ALUOut;

[Figure: multicycle datapath with the control signal values for branch completion marked on the control lines; the PC write is enabled only if Zero = 1.]

Multicycle Execution Step (3): Jump Instruction

PC = PC[31-28] concat (IR[25-0] << 2);

[Figure: multicycle datapath with the control signal values for jump completion marked on the control lines.]

Multicycle Control Step (4): Memory Access - Read (lw)

MDR = Memory[ALUOut];

[Figure: multicycle datapath with the control signal values for the memory read marked on the control lines.]

Multicycle Execution Steps (4): Memory Access - Write (sw)

Memory[ALUOut] = B;

[Figure: multicycle datapath with the control signal values for the memory write marked on the control lines.]

Multicycle Control Step (4): ALU Instruction (R-Type)

Reg[IR[15:11]] = ALUOut; (Reg[Rd] = ALUOut)

[Figure: multicycle datapath with the control signal values for R-type completion marked on the control lines.]

Multicycle Execution Steps (5): Memory Read Completion (lw)

Reg[IR[20-16]] = MDR;

[Figure: multicycle datapath with the control signal values for memory read completion marked on the control lines.]

CPI in a Multicycle CPU

• What is the CPI assuming each step requires 1 clock cycle?

– An instruction mix of 25% loads, 10% stores, 11% branches, 2% jumps, and 52% ALU

– Solution:

» Number of clock cycles from previous slide for each instruction class:

• loads 5, stores 4, ALU 4, branches 3, jumps 3

– CPI = CPU clock cycles / instruction count

$$= \frac{\sum_i \left(\text{instruction count}_{\text{class } i} \times \text{CPI}_{\text{class } i}\right)}{\text{instruction count}}$$

$$= \sum_i \left(\frac{\text{instruction count}_{\text{class } i}}{\text{instruction count}}\right) \times \text{CPI}_{\text{class } i}$$

= 0.25 × 5 + 0.10 × 4 + 0.52 × 4 + 0.11 × 3 + 0.02 × 3

= 4.12

– Better than the worst-case CPI of 5.0
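The weighted sum is easy to reproduce. A minimal sketch, using the cycle counts and instruction mix from this slide; the names `CYCLES`, `MIX`, and `average_cpi` are illustrative.

```python
# Clock cycles per instruction class in the multicycle design.
CYCLES = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}

# Instruction mix as fractions of the instruction count.
MIX = {"load": 0.25, "store": 0.10, "alu": 0.52, "branch": 0.11, "jump": 0.02}


def average_cpi(cycles, mix):
    """CPI = sum over classes of (fraction of instructions) x (cycles for that class)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction mix must sum to 1")
    return sum(mix[name] * cycles[name] for name in mix)


print(f"{average_cpi(CYCLES, MIX):.2f}")  # prints "4.12"
```

Changing the mix in `MIX` shows how sensitive the average is to the share of loads, the only class that needs all five cycles.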

Conclusion

• If instructions take different amounts of time, multi-cycle is better

• We haven’t dived into the gory details of implementing a multi-cycle processor

– What we’ve covered spans Sections 5.1, 5.2, 5.3, 5.4, and a small subset of Section 5.5

– This is all you need to read in the book

» Don’t worry about most of the stuff in Section 5.5

• We are now ready to talk about our “big” topic: Pipelining

• Q & A

– Chapter 5: Datapath and Control

– Single-Cycle Implementation vs. Multi-Cycle Implementation

» MIPS instruction types and formats

» What is a datapath? What are the datapath elements of MIPS?

» What are the five steps of the MIPS datapath?

– Control unit design

» What are the two kinds of control unit design? Describe their implementations and compare them.

– Exception and Interrupt

» Definitions

» Operations

Example

Assume the base address of word array A is stored in register $s0. The following code computes A[2] = | A[0] + A[1] |.

Highlight the running path of the following instructions in blue in the simple datapath and mark the control signals. Assume the first instruction is stored at address 0040 1000hex.

        lw   $t0, 0($s0)
        lw   $t1, 4($s0)
        add  $t1, $t1, $t0
        slt  $t0, $t1, $zero
        beq  $t0, $zero, Label
        sub  $t1, $zero, $t1
        sw   $t1, 8($s0)
        j    Exit
Label:  sw   $t1, 8($s0)
Exit:
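Since the first instruction sits at 0040 1000hex and every MIPS instruction is 4 bytes, the address of each instruction and the values encoded in the beq and j instructions follow directly. The sketch below works them out; the list `PROGRAM` and the helper names are illustrative, and the encodings assume the standard MIPS rules (branch offset counted in words from PC + 4, jump field equal to target bits 27-2).

```python
BASE = 0x00401000  # address of the first instruction (from the example)

# (label or None, instruction text), in program order.
PROGRAM = [
    (None, "lw $t0, 0($s0)"),
    (None, "lw $t1, 4($s0)"),
    (None, "add $t1, $t1, $t0"),
    (None, "slt $t0, $t1, $zero"),
    (None, "beq $t0, $zero, Label"),
    (None, "sub $t1, $zero, $t1"),
    (None, "sw $t1, 8($s0)"),
    (None, "j Exit"),
    ("Label", "sw $t1, 8($s0)"),
]

# Each instruction occupies 4 bytes.
ADDRESS = {index: BASE + 4 * index for index in range(len(PROGRAM))}
LABELS = {label: ADDRESS[i] for i, (label, _) in enumerate(PROGRAM) if label}
LABELS["Exit"] = BASE + 4 * len(PROGRAM)   # Exit follows the last instruction


def branch_offset(branch_index, label):
    """16-bit beq immediate: word distance from PC + 4 to the target."""
    return (LABELS[label] - (ADDRESS[branch_index] + 4)) // 4


def jump_field(label):
    """26-bit j field: target address bits 27-2."""
    return (LABELS[label] >> 2) & 0x03FFFFFF


if __name__ == "__main__":
    for index, (_, text) in enumerate(PROGRAM):
        print(f"{ADDRESS[index]:08x}  {text}")
    print("beq offset:", branch_offset(4, "Label"))
    print("j field:", hex(jump_field("Exit")))
```

With these values the beq at 0040 1010hex carries an offset of 3 words, so a taken branch lands on the sw at Label (0040 1020hex).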

The Simple Datapath with Controls

[Figure: simple single-cycle datapath with controls: PC, instruction memory, register file, sign-extend, ALU and ALU control, data memory, and multiplexors driven by RegDst, ALUSrc, Mem2Reg, RegWrite, MemRead, MemWrite, Branch, and ALUOp.]

lw $t0, 0($s0) / lw $t1, 4($s0)

[Figure: simple datapath with controls; the path used by the lw instructions is highlighted.]

The Setting of Control Lines

Instructions   RegDst   ALUSrc   Mem2Reg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
lw             0        1        1         1          1         0          0        0        0

add $t1, $t1, $t0 / slt $t0, $t1, $zero / sub $t1, $zero, $t1

[Figure: simple datapath with controls; the path used by the R-type instructions is highlighted.]

The Setting of Control Lines

Instructions   RegDst   ALUSrc   Mem2Reg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
lw             0        1        1         1          1         0          0        0        0
R-type         1        0        0         1          0         0          0        1        0

Page 142: Computer Organization and Architecture Chapter 5: The Processor:  Datapath and Control

beq $t0, $zero, Label (the case $t0 = zero)

[Figure: single-cycle datapath for this instruction. Components: PC, Instruction Memory (Read address, Instruction [31:0]), Register file (Read register 1/2, Write register, Write data, Read data 1/2), Sign-extend (16 to 32), ALU (Zero output), ALU control (Instruction [5:0], ALUOp), Data Memory (Address, Write data, Read data), Shift left 2, constant 4, four multiplexers. Control input: Instruction [31:26]. Control signals: RegDst, ALUSrc, Mem2Reg, RegWrite, MemRead, MemWrite, Branch, ALUOp. Instruction fields: [25:21], [20:16], [15:11], [15:0].]

Page 143: Computer Organization and Architecture Chapter 5: The Processor:  Datapath and Control

beq $t0, $zero, Label (the case $t0 != zero)

[Figure: single-cycle datapath for this instruction. Components: PC, Instruction Memory (Read address, Instruction [31:0]), Register file (Read register 1/2, Write register, Write data, Read data 1/2), Sign-extend (16 to 32), ALU (Zero output), ALU control (Instruction [5:0], ALUOp), Data Memory (Address, Write data, Read data), Shift left 2, constant 4, four multiplexers. Control input: Instruction [31:26]. Control signals: RegDst, ALUSrc, Mem2Reg, RegWrite, MemRead, MemWrite, Branch, ALUOp. Instruction fields: [25:21], [20:16], [15:11], [15:0].]

Page 144: Computer Organization and Architecture Chapter 5: The Processor:  Datapath and Control

The Setting of Control Lines

Instruction  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1        1         1        0         0       0       0
R-type       1       0       0        1         0        0         0       1       0
beq          x       0       x        0         0        0         1       0       1
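
The two beq cases differ only in the ALU's Zero output: the branch target is selected when Branch and Zero are both 1, and PC + 4 otherwise. A minimal Python sketch of that selection follows; it is not from the slides, and the function names and sample addresses are assumptions for illustration.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit immediate (the 16 -> 32 Sign-extend unit)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def next_pc(pc, imm16, branch, zero):
    """Model the PC-select multiplexer of the single-cycle datapath."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Shift left 2 turns the word offset into a byte offset.
    target = (pc_plus_4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    # The branch is taken only when Branch = 1 and the ALU reports Zero = 1.
    return target if (branch and zero) else pc_plus_4


# Hypothetical values: beq at address 0x00400000 with a word offset of 3.
taken = next_pc(0x00400000, 3, branch=1, zero=1)
not_taken = next_pc(0x00400000, 3, branch=1, zero=0)
```

Because ALUOp = 01 makes the ALU subtract, Zero = 1 exactly when the two registers being compared hold equal values.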

Page 145: Computer Organization and Architecture Chapter 5: The Processor:  Datapath and Control

sw $t1, 8($s0)

[Figure: single-cycle datapath for this instruction. Components: PC, Instruction Memory (Read address, Instruction [31:0]), Register file (Read register 1/2, Write register, Write data, Read data 1/2), Sign-extend (16 to 32), ALU (Zero output), ALU control (Instruction [5:0], ALUOp), Data Memory (Address, Write data, Read data), Shift left 2, constant 4, four multiplexers. Control input: Instruction [31:26]. Control signals: RegDst, ALUSrc, Mem2Reg, RegWrite, MemRead, MemWrite, Branch, ALUOp. Instruction fields: [25:21], [20:16], [15:11], [15:0].]

Page 146: Computer Organization and Architecture Chapter 5: The Processor:  Datapath and Control

The Setting of Control Lines

Instruction  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1        1         1        0         0       0       0
R-type       1       0       0        1         0        0         0       1       0
beq          x       0       x        0         0        0         1       0       1
sw           x       1       x        0         0        1         0       0       0
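
The completed table is in effect a lookup from the opcode, Instruction [31:26], to the nine control lines. As a rough sketch (not from the slides), it can be written as a Python dictionary. The names SIGNALS, CONTROL_TABLE and main_control are hypothetical, don't-care entries (x) are represented by None, and the opcodes are the standard MIPS encodings.

```python
X = None  # don't care: the signal's value does not affect this instruction

SIGNALS = ("RegDst", "ALUSrc", "Mem2Reg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# Keyed by the 6-bit opcode; rows copied from the control-line table.
CONTROL_TABLE = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 1, 0),  # R-type
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0, 0),  # lw
    0b101011: (X, 1, X, 0, 0, 1, 0, 0, 0),  # sw
    0b000100: (X, 0, X, 0, 0, 0, 1, 0, 1),  # beq
}


def main_control(instruction):
    """Decode the control lines from a 32-bit instruction word."""
    opcode = (instruction >> 26) & 0x3F
    return dict(zip(SIGNALS, CONTROL_TABLE[opcode]))


lw_signals = main_control(0x8E080004)  # lw $t0, 4($s0)
sw_signals = main_control(0xAE090008)  # sw $t1, 8($s0)
```

An opcode outside these four raises KeyError in this sketch, reflecting that the table covers only the instruction subset discussed on the slides.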