55:035 Computer Architecture and Organization Lecture 9


Outline: Building a CPU

Basic Components
MIPS Instructions
Basic 5 Steps for CPU
Single-Cycle Design
Multi-cycle Design
Comparison of Single and Multi-cycle Designs

Overview

Brief look
  Digital logic
CPU Datapath
  MIPS Example

Digital Logic

[Figure: three building blocks. D-type flip-flop: input D, output Q, edge-triggered clock. Multiplexer: data inputs A and B (labelled 0 and 1), select input S, output F. D-type flip-flop with enable: a multiplexer (inputs 0 and 1, select EN) in front of a D flip-flop, edge-triggered clock.]
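A minimal behavioural sketch of the multiplexer and the flip-flop with enable may help make the figure concrete. It assumes that select value 0 picks input A and 1 picks input B, and that the enabled flip-flop recirculates its own Q when EN = 0; the names `mux2` and `DFlipFlopEn` are illustrative, not from the lecture.

```python
def mux2(a, b, s):
    """2-to-1 multiplexer: output F is A when S == 0 and B when S == 1 (assumed labelling)."""
    return b if s else a


class DFlipFlopEn:
    """D flip-flop with enable, modelled as a mux in front of a plain D flip-flop."""

    def __init__(self):
        self.q = 0  # stored bit; initial value 0 is an assumption

    def clock_edge(self, d, en):
        # On the clock edge: EN = 1 loads D, EN = 0 keeps the old Q.
        self.q = mux2(self.q, d, en)
        return self.q
```

Calling `clock_edge(1, 0)` on a fresh flip-flop leaves Q at 0, while `clock_edge(1, 1)` stores the 1.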

Digital Logic: Registers

[Figure: registers built from D-type flip-flops with enable, sharing one edge-triggered clock and one EN input. 1 bit: D, Q. 4 bits: D3..D0, Q3..Q0. N bits: D, Q.]

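Digital Logic: Tri-state Driver (Buffer)

[Figure: buffer with data input "in", control input "drive", and output "out".]

In  Drive  Out
0   0      Z
1   0      Z
0   1      0
1   1      1

What is Z? It is the high-impedance state: when Drive = 0 the output is disconnected and drives neither 0 nor 1.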
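The truth table can be sketched in a few lines, together with the usual reason tri-state drivers exist: several of them can share one wire as long as at most one drives at a time. The string `"Z"` and the helper `resolve_bus` are modelling assumptions for illustration only.

```python
Z = "Z"  # stand-in for the high-impedance state


def tristate(inp, drive):
    """Tri-state buffer: pass the input through when drive == 1, otherwise output Z."""
    return inp if drive else Z


def resolve_bus(outputs):
    """Value seen on a wire shared by several tri-state outputs (hypothetical helper)."""
    driven = [v for v in outputs if v != Z]
    if not driven:
        return Z  # nobody is driving the wire
    if len(set(driven)) > 1:
        raise ValueError("bus contention: drivers disagree")
    return driven[0]
```

For example, `resolve_bus([tristate(1, 0), tristate(0, 1)])` yields 0, because only the second buffer is driving.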

Digital Logic: Adder/Subtractor or ALU

[Figure: ALU block with inputs A, B, Carry-in and control input Add/sub (or ALUop); outputs F and Carry-out.]

Overview

Brief look
  Digital logic
How to Design a CPU Datapath
  MIPS Example

Designing a CPU: 5 Steps

1. Analyze the instruction set → datapath requirements
   MIPS: ADD, SUB, ORI, LW, SW, BR
   Meaning of each instruction given by RTL (register transfers)
   2 types of registers: CPU/ISA registers, temporary registers
2. Datapath requirements → select the datapath components
   ALU, register file, adder, data memory, etc
3. Assemble the datapath
   Datapath must support planned register transfers
   Ensure all instructions are supported
4. Analyze datapath control required for each instruction
5. Assemble the control logic

Step 1a: Analyze ISA

All MIPS instructions are 32 bits long. Three instruction formats:

R-type   op [31:26]   rs [25:21]   rt [20:16]   rd [15:11]   shamt [10:6]   funct [5:0]
         6 bits       5 bits       5 bits       5 bits       5 bits         6 bits

I-type   op [31:26]   rs [25:21]   rt [20:16]   immediate [15:0]
         6 bits       5 bits       5 bits       16 bits

J-type   op [31:26]   target address [25:0]
         6 bits       26 bits

R: registers, I: immediate, J: jumps
These formats intentionally chosen to simplify design

Step 1b: Analyze ISA

Meaning of the fields:
  op: operation of the instruction
  rs, rt, rd: the source and destination register specifiers
    Destination is either rd (R-type) or rt (I-type)
  shamt: shift amount
  funct: selects the variant of the operation in the “op” field
  immediate: address offset or immediate value
  target address: target address of the jump instruction

[Figure: the R-type, I-type, and J-type field layouts from Step 1a, repeated.]
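Because every format places its fields at fixed bit positions, field extraction is just shifting and masking. The following sketch assumes the instruction is held as a Python integer; the function name `decode` and the dictionary keys are illustrative. It extracts every field unconditionally, which mirrors how the hardware wires all fields out and lets control decide which ones matter.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into all of its possible fields."""
    return {
        "op": (instr >> 26) & 0x3F,     # bits 31:26, all formats
        "rs": (instr >> 21) & 0x1F,     # bits 25:21, R- and I-type
        "rt": (instr >> 16) & 0x1F,     # bits 20:16, R- and I-type
        "rd": (instr >> 11) & 0x1F,     # bits 15:11, R-type
        "shamt": (instr >> 6) & 0x1F,   # bits 10:6,  R-type
        "funct": instr & 0x3F,          # bits 5:0,   R-type
        "imm16": instr & 0xFFFF,        # bits 15:0,  I-type
        "target": instr & 0x3FFFFFF,    # bits 25:0,  J-type
    }


# addu $3, $1, $2: op = 0, rs = 1, rt = 2, rd = 3, shamt = 0, funct = 0x21
ADDU_3_1_2 = (1 << 21) | (2 << 16) | (3 << 11) | 0x21
```

Decoding `ADDU_3_1_2` gives `rs` = 1, `rt` = 2, `rd` = 3, and `funct` = 0x21.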

MIPS ISA: subset for today

ADD and SUB (R-type: op | rs | rt | rd | shamt | funct)
  addU rd, rs, rt
  subU rd, rs, rt

OR Immediate (I-type: op | rs | rt | immediate)
  ori rt, rs, imm16

LOAD and STORE Word (I-type: op | rs | rt | immediate)
  lw rt, rs, imm16
  sw rt, rs, imm16

BRANCH (I-type: op | rs | rt | immediate)
  beq rs, rt, imm16

Step 2: Datapath Requirements

REGISTER FILE
  MIPS ISA requires 32 registers, 32b each
  Called a register file
    Contains 32 entries
    Each entry is 32b

AddU rd,rs,rt or SubU rd,rs,rt
  Read two sources rs, rt
  Operation rs + rt or rs – rt
  Write destination rd ← rs+/-rt

Requirements
  Read two registers (rs, rt)
  Perform ALU operation
  Write a third register (rd)

[Figure: REGFILE with register-number inputs RdReg1, RdReg2, WrReg (5 bits each), data input WrData, control input RegWrite, and outputs RdData1, RdData2 ("How to implement?"); ALU with control input ALUop and outputs Result and Zero?.]
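One way to picture the register file's interface is as a small class with two combinational read ports and one clocked write port. This is a behavioural sketch only, not an answer to "How to implement?" at the gate level; the class and method names are illustrative, and the treatment of register 0 as hardwired zero is a MIPS convention that the slide itself does not mention.

```python
class RegFile:
    """32 x 32-bit register file: two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rd_reg1, rd_reg2):
        # Reads are combinational: the outputs follow the register numbers.
        return self.regs[rd_reg1], self.regs[rd_reg2]

    def clock_edge(self, wr_reg, wr_data, reg_write):
        # The write takes effect only on a clock edge with RegWrite asserted.
        # Register 0 stays zero (MIPS convention, assumed here).
        if reg_write and wr_reg != 0:
            self.regs[wr_reg] = wr_data & 0xFFFFFFFF
```

With this model, an ADDU reads `rs` and `rt` through `read`, and the ALU result is stored into `rd` by the next `clock_edge` call.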

Step 3: Datapath Assembly

ADDU rd, rs, rt
SUBU rd, rs, rt
  Need an ALU
  Hook it up to REGISTER FILE
  REGFILE has 2 read ports (rs, rt), 1 write port (rd)

Parameters come from instruction fields: rs, rt, rd
Control signals depend upon instruction fields
  E.g.: ALUop = f(Instruction) = f(op, funct)

[Figure: rs, rt, rd drive RdReg1, RdReg2, WrReg of REGFILE; RdData1 and RdData2 feed the ALU; the ALU Result returns to WrData. Control signals: RegWrite, ALUop. The ALU also outputs Zero?.]

Steps 2 and 3: ORI Instruction

ORI rt, rs, Imm16
  Need new ALUop for ‘OR’ function, hook up to REGFILE
  1 read port (rs), 1 write port (rt), 1 const value (Imm16)

Control signals depend upon instruction fields
  E.g.: ALUsrc = f(Instruction) = f(op, funct)

[Figure: rs drives RdReg1; rt (not rd) drives WrReg; Imm16 (16 bits) passes through ZERO-EXTEND to a multiplexer, selected by ALUsrc, on the second ALU input. Control signals: RegWrite, ALUop, ALUsrc.]

Steps 2 and 3: Destination Register

Must select proper destination, rd or rt
  Depends on instruction type
    R-type may write rd
    I-type may write rt

[Figure: previous datapath plus a multiplexer, selected by RegDst, that chooses rd or rt as WrReg. Control signals: RegDst, RegWrite, ALUsrc, ALUop.]

Steps 2 and 3: Load Word

LW rt, rs, Imm16
  Need Data Memory: data ← Mem[Addr]
    Addr is rs+Imm16, Imm16 is signed, use ALU for +
  Store in rt: rt ← Mem[rs+Imm16]

[Figure: previous datapath plus DATAMEM (Addr input, RdData output) and a multiplexer, selected by MemtoReg, that chooses the REGFILE WrData. The extender becomes SIGN/ZERO-EXTEND, controlled by ExtOp. Control signals: RegDst, RegWrite, ExtOp, ALUsrc, ALUop, MemtoReg.]
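The difference between the zero-extension used by ORI and the sign-extension used by LW is easy to get wrong, so a small sketch may help. The polarity of ExtOp (1 = sign-extend, 0 = zero-extend) is an assumption, since the slides do not state it, and the function names are illustrative.

```python
def extend(imm16, ext_op):
    """Widen a 16-bit immediate to 32 bits.

    ext_op = 1 sign-extends, ext_op = 0 zero-extends (assumed polarity).
    """
    imm16 &= 0xFFFF
    if ext_op and (imm16 & 0x8000):
        return imm16 | 0xFFFF0000  # copy the sign bit into bits 31:16
    return imm16


def lw_address(rs_value, imm16):
    """Effective address for LW: rs + sign_ext(Imm16), wrapped to 32 bits."""
    return (rs_value + extend(imm16, 1)) & 0xFFFFFFFF
```

For instance, an immediate of 0xFFFC sign-extends to 0xFFFFFFFC (that is, -4), so `lw_address(0x1000, 0xFFFC)` is 0x0FFC.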

Steps 2 and 3: Store Word

SW rt, rs, Imm16
  Need Data Memory: Mem[Addr] ← data
    Addr is rs+Imm16, Imm16 is signed, use ALU for +
  Store in Mem: Mem[rs+Imm16] ← rt

[Figure: previous datapath plus a WrData input on DATAMEM and a MemWrite control signal. Control signals: RegDst, RegWrite, ExtOp, ALUsrc, ALUop, MemtoReg, MemWrite.]

Writes: Need to Control Timing

Problem: write to data memory
  Data can come anytime
  Addr must come first
  MemWrite must come after Addr
    Else? Writes to wrong Addr!

Solution: use ideal data memory
  Assume everything works ok
  How to fix this for real?
    One solution: synchronous memory
    Another solution: delay MemWr to come late

Problems?: write to register file
  Does RegWrite signal come after WrReg number?
  When does the write to a register happen?
  Read from same register as being written?

Missing Pieces: Instruction Fetching

Where does the Instruction come from?
  From instruction memory, of course!
  Recall: stored-program concept
  Alternatives? How about hard-coding wires and switches…? This is how ENIAC was programmed!

How to branch?
  BEQ rs, rt, Imm16

Instruction Processing

Fetch instruction
Execute instruction
Fetch next instruction
Execute next instruction
Fetch next instruction
Execute next instruction
Etc…

How to maintain sequence? Use a counter!
Branches (out of sequence)? Load the counter!

Instruction Processing

Program Counter
  Points to current instruction
  Address to instruction memory
    Instr ← InstrMem[PC]

Next instruction: counts up by 4
  Remember: memory is byte-addressable, instructions are 4 bytes
    PC ← PC + 4

Branch instruction: replace PC contents

Step 1: Analyze Instructions

Register Transfer Language…

op | rs | rt | rd | shamt | funct = InstrMem[ PC ]
op | rs | rt | Imm16 = InstrMem[ PC ]

Instr   Register Transfers
ADDU    R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU    R[rd] ← R[rs] – R[rt]; PC ← PC + 4
ORI     R[rt] ← R[rs] OR zero_ext(Imm16); PC ← PC + 4
LOAD    R[rt] ← MEM[ R[rs] + sign_ext(Imm16) ]; PC ← PC + 4
STORE   MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4
BEQ     if ( R[rs] == R[rt] ) then PC ← PC + 4 + { sign_ext(Imm16) || b’00’ } else PC ← PC + 4
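The register transfers for the six instructions translate almost line for line into an instruction-level interpreter. This is a sketch under stated assumptions: memories are Python dictionaries keyed by byte address, uninitialised data reads return 0, and the helpers `i_type`, `r_type`, and `step` are illustrative names. The opcode and funct values are the standard MIPS encodings.

```python
MASK = 0xFFFFFFFF


def sign_ext(imm16):
    """Interpret a 16-bit field as a signed Python integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def i_type(op, rs, rt, imm16):
    """Hypothetical helper: assemble an I-type instruction word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF)


def r_type(rs, rt, rd, funct):
    """Hypothetical helper: assemble an R-type instruction word (op = 0, shamt = 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def step(pc, regs, mem, instr_mem):
    """Execute one instruction and return the next PC."""
    instr = instr_mem[pc]                       # Instr <- InstrMem[PC]
    op = instr >> 26
    rs, rt, rd = (instr >> 21) & 31, (instr >> 16) & 31, (instr >> 11) & 31
    imm16, funct = instr & 0xFFFF, instr & 0x3F
    next_pc = pc + 4
    if op == 0 and funct == 0x21:               # ADDU
        regs[rd] = (regs[rs] + regs[rt]) & MASK
    elif op == 0 and funct == 0x23:             # SUBU
        regs[rd] = (regs[rs] - regs[rt]) & MASK
    elif op == 0x0D:                            # ORI: bitwise OR, zero-extended
        regs[rt] = regs[rs] | imm16
    elif op == 0x23:                            # LOAD
        regs[rt] = mem.get((regs[rs] + sign_ext(imm16)) & MASK, 0)
    elif op == 0x2B:                            # STORE
        mem[(regs[rs] + sign_ext(imm16)) & MASK] = regs[rt]
    elif op == 0x04:                            # BEQ
        if regs[rs] == regs[rt]:
            next_pc = pc + 4 + (sign_ext(imm16) << 2)
    else:
        raise ValueError("instruction not in today's subset")
    return next_pc & MASK
```

A branch with Imm16 = 0xFFFF (that is, -1) whose registers are equal returns the branch's own address, since PC + 4 - 4 = PC.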

Steps 2 and 3: Datapath & Assembly

PC: a register
  Counter, counts by +4
  Provides address to Instruction Memory

[Figure: PC drives the Read address of Instruction Memory, which outputs Instruction[31:0]; an adder computes PC + 4 and feeds it back to PC.]

Steps 2 and 3: Datapath & Assembly

PC: a register
  Counter, counts by +4
  Sometimes, must add SignExtend{Imm16||b’00’} for branch instructions

Note: the sign-extender for Imm16 is already in the datapath (everything else is new)

[Figure: Instruction Memory output split into Instruction[25:21], Instruction[20:16], Instruction[15:11], and Instruction[15:0] (Imm16); Imm16 passes through Sign/Zero Extend (16 to 32 bits, controlled by ExtOp) and Shift Left 2 into a second adder with PC + 4; a multiplexer selected by PCSrc chooses between PC + 4 and the branch adder result as the next PC.]

Steps 2 and 3: Add Previous Datapath

[Figure: complete single-cycle datapath. PC, Instruction Memory, and the PC + 4 and branch-target adders are joined to the Register File, Sign/Zero Extend, ALU with ALU Control (driven by Instruction[5:0], funct), and Data Memory. Control signals: RegDst, RegWrite, ExtOp, ALUSrc, ALUOp, MemWrite, MemtoReg, PCSrc.]

What have we done?
  Created a simple CPU datapath
    Control still missing (next slide)
  Single-cycle CPU
    Every instruction takes 1 clock cycle
    Clocking?

One Clock Cycle

Clock Locations
  PC, REGFILE have clocks

Operation
  On rising edge, PC will get new value
    Maybe REGFILE will have one value updated as well
  After rising edge
    PC and REGFILE can’t change
    New value out of PC
    Instruction out of INSTRMEM
    Instruction selects registers to read from REGFILE
    Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc
    ALU does its work
    DataMem may be read (depending on instruction)
    Result value goes back to REGFILE
    New PC value goes back to PC
    Await next clock edge

Lots to do in only 1 clock cycle!!

Missing Steps?

Control is missing (Steps 4 and 5 we mentioned earlier)
  Generate the green signals
    ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc
  These are all f(Instruction), where f() is a logic expression
  Will look at control strategies in upcoming lecture

Implementation Details
  How to implement REGFILE?
    Read port: tristate buffers? Multiplexer? Memory?
    Two read ports: two of above?
    Write port: how to write only 1 register?
  How to control writes to memory? To register file?

More instructions
  Shift instructions
  Jump instruction
  Etc

1-Cycle CPU Datapath

[Figure: the complete single-cycle datapath, repeated from the earlier slide.]

1-cycle CPU Datapath + Control

[Figure: the single-cycle datapath with a Control block driven by Instruction[31:26]. Control outputs: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite. ALU control takes Instruction[5:0]. PCSrc selects the next PC.]

1-cycle CPU Control – Lookup Table

Input or Output   Signal Name   R-format   Lw   Sw   Beq
Inputs            Op5           0          1    1    0
                  Op4           0          0    0    0
                  Op3           0          0    1    0
                  Op2           0          0    0    1
                  Op1           0          1    1    0
                  Op0           0          1    1    0
Outputs           RegDst        1          0    X    X
                  ALUSrc        0          1    1    0
                  MemtoReg      0          1    X    X
                  RegWrite      1          1    0    0
                  MemRead       0          1    0    0
                  MemWrite      0          0    1    0
                  Branch        0          0    0    1
                  ALUOp1        1          0    0    0
                  ALUOp0        0          0    0    1

Also: I-type instructions (ORI) & ExtOp (sign-extend control), etc.
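Since the single-cycle controller is a pure function of the opcode, the table can be written directly as a dictionary lookup. The sketch below copies the four columns of the table; representing a don't-care X as `None` and packing ALUOp1/ALUOp0 into one 2-bit number are choices made for this illustration.

```python
X = None  # don't care

NAMES = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
         "MemRead", "MemWrite", "Branch", "ALUOp")

# Keys are Op5..Op0; ALUOp is the pair (ALUOp1, ALUOp0) read as a 2-bit value.
CONTROL = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 0b10),  # R-format
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0b00),  # lw
    0b101011: (X, 1, X, 0, 0, 1, 0, 0b00),  # sw
    0b000100: (X, 0, X, 0, 0, 0, 1, 0b01),  # beq
}


def control(instr):
    """Look up the control signals for a 32-bit instruction word."""
    opcode = (instr >> 26) & 0x3F
    return dict(zip(NAMES, CONTROL[opcode]))
```

An opcode outside the table raises `KeyError`, which is where ORI, the jump instruction, and the ExtOp signal would have to be added.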

1-cycle CPU + Jump Instruction

[Figure: the single-cycle datapath extended for jumps. Instruction[25:0] and PC + 4 [31..28] are combined to form Jump address [31..0]; Instruction[31:26] drives the controller.]

1-cycle CPU Problems?

Every instruction 1 cycle
Some instructions “do more work”
  Eg, lw must read from DATAMEM
All instructions must have same clock period…
  Many instructions run slower than necessary

Tricky timing on MemWrite, RegWrite(?) signals
  Write signal must come *after* address is stable

Need extra resources…
  PC+4 adder, ALU for BEQ instruction, DATAMEM+INSTRMEM

Performance!

Single-Cycle CPU Performance
  Execute one instruction per clock cycle (CPI=1)
  Clock cycle time? Note dataflow includes:
    INSTRMEM read
    REGFILE access
    Sign extension
    ALU operation
    DATAMEM read
    REGFILE/PC write

Not every instruction uses all resources (eg, DATAMEM read)

Can we change clock period for each instruction?
  No! (Why not?)
  One clock period: the worst case!
  This is why a single-cycle CPU is not good for performance
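The "worst case" argument can be made concrete with a small calculation. All delay values below are hypothetical, chosen only to illustrate the reasoning, and the list of resources each instruction class uses is a simplification that ignores sign extension and multiplexer delays.

```python
# Hypothetical component delays in picoseconds (illustrative only).
DELAY_PS = {
    "INSTRMEM": 200,
    "REGFILE_READ": 100,
    "ALU": 200,
    "DATAMEM": 200,
    "REGFILE_WRITE": 100,
}

# Resources on the critical path of each instruction class (simplified).
USES = {
    "R-type": ["INSTRMEM", "REGFILE_READ", "ALU", "REGFILE_WRITE"],
    "LW": ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM", "REGFILE_WRITE"],
    "SW": ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM"],
    "BEQ": ["INSTRMEM", "REGFILE_READ", "ALU"],
}


def path_delay(instr_class):
    """Total delay along the path this instruction class exercises."""
    return sum(DELAY_PS[unit] for unit in USES[instr_class])


def single_cycle_period():
    """The clock must fit the slowest instruction, so take the maximum."""
    return max(path_delay(c) for c in USES)
```

With these numbers LW needs 800 ps, so every instruction gets an 800 ps cycle, even BEQ, which would finish in 500 ps.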

1-cycle CPU Datapath + Controller

[Figure: the single-cycle datapath with controller and jump support, repeated from the earlier slide.]

1-cycle CPU Summary

Operation
  1 cycle per instruction
  Control signals held fixed during entire cycle (except BRANCH)
  Only 2 registers
    PC, updated every clock cycle
    REGFILE, updated when required
  During clock cycle, data flows from register-outputs to register-inputs
  Fixed clock frequency / period

Performance
  1 instruction per cycle
  Slowest instruction determines clock frequency

Outstanding issue: MemWrite timing
  Assume this signal writes to memory at end of clock cycle

Multi-cycle CPU Goals

Improve performance
  Break each instruction into smaller steps / multiple cycles
    LW instruction: 5 cycles
    SW instruction: 4 cycles
    R-type instruction: 4 cycles
    Branch, Jump: 3 cycles
  Aim for 5x clock frequency
    Complex instructions (eg, LW): 5 cycles, same performance as before
    Simple instructions (eg, ADD): fewer cycles, faster

Save resources (gates/transistors)
  Re-use ALU over multiple cycles
  Put INSTR + DATA in same memory

MemWrite timing solved?

Multi-cycle CPU Datapath

Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)
Move signal paths (+4, Shift Left 2)

[Figure: multi-cycle datapath with a single Memory (Address, Write data, MemData), PC, Registers (RdReg1, RdReg2, Write reg, Write data, RdData1, RdData2), Sign Extend, Shift Left 2, one ALU (ALU result, Zero), and five multiplexers.]

Multi-cycle CPU Datapath

Add registers + control signals (IR, MDR, A, B, ALUOut)
Registers with no control signal load value every clock cycle (eg, PC)

[Figure: the same datapath with the Instruction Register, Memory Data Register, A, B, and ALUOut registers added.]

Instruction Execution Example

Execute a “Load Word” instruction
  LW rt, 0(rs)

5 Steps
1. Fetch instruction
2. Read registers
3. Compute address
4. Read data
5. Write registers

Load Word Instruction Sequence

1. Fetch Instruction    InstructionRegister ← Mem[PC]
2. Read Registers       A ← Registers[Rs]
3. Compute Address      ALUOut ← A + {SignExt(Imm16)}
4. Read Data            MDR ← Memory[ALUOut]
5. Write Registers      Registers[Rt] ← MDR

[Figure: one copy of the multi-cycle datapath per step, followed by a final copy labelled "All 5 Steps Shown".]

Multi-cycle Load Word: Recap

1. Fetch Instruction    InstructionRegister ← Mem[PC]
2. Read Registers       A ← Registers[Rs]
3. Compute Address      ALUOut ← A + {SignExt(Imm16)}
4. Read Data            MDR ← Memory[ALUOut]
5. Write Registers      Registers[Rt] ← MDR

Missing Steps?

Multi-cycle Load Word: Recap

1. Fetch Instruction    InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers       A ← Registers[Rs]
3. Compute Address      ALUOut ← A + {SignExt(Imm16)}
4. Read Data            MDR ← Memory[ALUOut]
5. Write Registers      Registers[Rt] ← MDR

Missing Steps?
  Must increment the PC
  Do it as part of the instruction fetch (in step 1)
  Need PCWrite control signal
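The five register transfers can be traced in code, with one local variable per temporary register (IR, A, ALUOut, MDR). This sketch assumes a single memory for instructions and data, modelled as a dictionary keyed by byte address; the function name `run_lw` is illustrative.

```python
def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed Python integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_lw(pc, regs, mem):
    """Carry out one LW instruction as five cycles; return the updated PC."""
    # Cycle 1: fetch instruction, increment PC
    ir = mem[pc]
    pc = (pc + 4) & 0xFFFFFFFF
    # Cycle 2: read registers
    a = regs[(ir >> 21) & 0x1F]
    # Cycle 3: compute address (no shift: the offset is a byte offset)
    alu_out = (a + sign_ext16(ir & 0xFFFF)) & 0xFFFFFFFF
    # Cycle 4: read data
    mdr = mem[alu_out]
    # Cycle 5: write register rt
    regs[(ir >> 16) & 0x1F] = mdr
    return pc
```

Each assignment to `ir`, `a`, `alu_out`, and `mdr` corresponds to a temporary register being loaded at the end of its cycle, which is why a value computed in one step is available to the next.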

Multi-cycle R-Type Instruction

1. Fetch Instruction    InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers       A ← Registers[Rs]; B ← Registers[Rt]
3. Compute Value        ALUOut ← A op B
4. Write Registers      Registers[Rd] ← ALUOut

RTL describes data flow action in each clock cycle
  Control signals determine precise data flow
  Each step implies unique control values

Multi-cycle R-Type Instruction: Control Signal Values

1. Fetch Instruction    InstructionRegister ← Mem[PC]; PC ← PC + 4
   MemRead=1, ALUSrcA=0, IorD=0, IRWrite, ALUSrcB=01, ALUop=00, PCWrite, PCSource=00
2. Read Registers       A ← Registers[Rs]; B ← Registers[Rt]
   ALUSrcA=0, ALUSrcB=11, ALUop=00
3. Compute Value        ALUOut ← A op B
   ALUSrcA=1, ALUSrcB=00, ALUop=10
4. Write Registers      Registers[Rd] ← ALUOut
   RegDst=1, RegWrite, MemtoReg=0

Each step implies unique control values
  Fixed for entire cycle
  “Default value” implied if unspecified

Check Your Work: Is RTL Valid?

1. Datapath check
   Within one cycle…
     Each cycle has valid data flow path (path exists)
     Each register gets only one new value
   Across multiple cycles…
     Register value is defined in a previous (earlier in time) clock cycle before it is used
       Eg, “A ← 3” must occur before “B ← A”
     Make sure register value doesn’t disappear if set >1 cycle earlier

2. Control signal check
   Each cycle, RTL describing the datapath flow implies a value for each control signal
     0 or 1 or default or don’t care
   Each control signal gets only one fixed value the entire cycle

3. Overall check
   Does the sequence of steps work?

Multi-cycle BEQ Instruction

1. Fetch Instruction
   InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers, Precompute Target
   A ← Registers[Rs] ; B ← Registers[Rt] ; ALUOut ← PC + {SignExt{Imm16},b’00’}
3. Compare Registers, Conditional Branch
   if( (A – B) == 0 ) PC ← ALUOut

Green shows PC calculation flow (in parallel with other operations)

Multi-cycle Datapath with Control Signals

[Figure: multi-cycle datapath with jump support (Instr[25:0] and PC[31..28] form Jump address [31..0]) and its control signals: PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSrc, ALUOp, ALUSrcA, ALUSrcB, RegWrite, RegDst. ALU Control takes Instruction[5:0].]

Multi-cycle Datapath with Controller

[Figure: the same datapath with a controller driven by Instr[31:26].]


Multi-cycle CPU Control: Overview

General approach: Finite State Machine (FSM)
  Need details in each branch of control…
    Precise outputs for each state (Mealy depends on inputs, Moore does not)
    Precise “next state” for each state (can depend on inputs)

[Figure: outline of the state diagram; branches are labelled "Control Signal Outputs".]

How to Implement FSM?

Manually with logic gates + FFs
  Bubble diagram, next-state table, state assignment
  Karnaugh map for each state bit, each output bit (painful!)

High-level language description (eg, Verilog, VHDL)
  Describe FSM bubble diagram (next-states, output values)
  Automatically synthesized into gates + FFs

Microcode (µ-code) description
  Sequence through many µ-ops for each CPU instruction
    One µ-op (µ-instruction) sends correct control signal for 1 cycle
    µ-op similar to one bubble in FSM
  Acts like a mini-CPU within a CPU
    µPC: microcode program counter
    Microcode storage memory contains µ-ops
  Can look similar to RTL or some new “assembly language”
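A next-state function is the core of any of these implementations. The sketch below models only the sequencing part of a control FSM (no control outputs) and checks that it reproduces the cycle counts given in the multi-cycle goals: 5 for LW, 4 for SW and R-type, 3 for BEQ and J. The state names and the instruction-class strings are hypothetical labels chosen for this illustration.

```python
def next_state(state, instr_class):
    """Next state of the control FSM; only DECODE and MEM_ADDR look at the input."""
    if state == "FETCH":
        return "DECODE"
    if state == "DECODE":
        return {"LW": "MEM_ADDR", "SW": "MEM_ADDR", "R-type": "EXECUTE",
                "BEQ": "BRANCH", "J": "JUMP"}[instr_class]
    if state == "MEM_ADDR":
        return "MEM_READ" if instr_class == "LW" else "MEM_WRITE"
    if state == "MEM_READ":
        return "WRITEBACK"
    if state == "EXECUTE":
        return "R_COMPLETE"
    # MEM_WRITE, WRITEBACK, R_COMPLETE, BRANCH, JUMP all finish the instruction.
    return "FETCH"


def cycles(instr_class):
    """Count clock cycles from FETCH until the FSM returns to FETCH."""
    state, count = "FETCH", 0
    while True:
        count += 1
        state = next_state(state, instr_class)
        if state == "FETCH":
            return count
```

Attaching a dictionary of control-signal values to each state would turn this into a Moore machine, since the outputs would then depend on the state alone.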

FSM Specification: Bubble Diagram

[Figure: bubble diagram of the control FSM.]

Can build this by examining RTL
It is possible to automatically convert RTL into this form!

FSM: Gates + FFs Implementation

[Figure: FSM high-level organization.]

FSM: Microcode Implementation

[Figure: microcode controller. Blocks: Microcode Storage (memory), Microprogram Counter, Adder with constant 1, Address Select Logic. Labels: Inputs, Outputs, Datapath control outputs, Sequencing control, Inputs from instruction register opcode field.]

Multi-cycle CPU with Control FSM

[Figure: the multi-cycle datapath with the control FSM attached. Instr[31:26] feeds the FSM; labels include "FSM Control Outputs" and "Conditional Branch".]

Control FSM: Overview

General approach: Finite State Machine (FSM)
  Need details in each branch of control…

Detailed FSM

[Figure: complete control FSM, divided into regions: Instruction Fetch, Memory Reference, R-Type, Branch, Jump.]

Detailed FSM: Instruction Fetch
Detailed FSM: Memory Reference (LW, SW)
Detailed FSM: R-Type Instruction
Detailed FSM: Branch Instruction
Detailed FSM: Jump Instruction

[Figures: one state-diagram fragment for each heading above.]

Performance Comparison

Single-cycle CPU vs Multi-cycle CPU

Simple Comparison

CPU                Instructions   Clock cycles
Single-cycle CPU   All            1
Multi-cycle CPU    LW             5
Multi-cycle CPU    SW, R-type     4
Multi-cycle CPU    BEQ, J         3

What’s really happening?

[Figure: Load Word instruction on the single-cycle and multi-cycle CPUs, ideal case. Steps: Fetch, Decode, CalcAddr, Memory, Write.]

In practice, steps differ in speeds…

[Figure: Load Word instruction on both CPUs with unequal step times. Labels: "Violation!", "Wasted time!".]

Single-cycle vs Multi-cycle: LW instruction faster for single-cycle

[Figure: Load Word on both CPUs. Labels: "Violation fixed!", "Now wasted time is larger!".]

Single-cycle vs Multi-cycle: SW instruction ~ same speed

[Figure: SW on both CPUs. Steps: Fetch, Decode, CalcAddr, Memory. Labels: "Wasted time!", "Speed diff".]

Single-cycle vs Multi-cycle: BEQ, J instruction faster for multi-cycle

[Figure: BEQ, J on both CPUs. Steps: Fetch, Decode, CalcAddr. Labels: "Wasted time!", "Speed diff".]

Performance Summary

Which CPU implementation is faster?
  LW: single-cycle is faster
  SW, R-type: about the same
  BEQ, J: multi-cycle is faster

Real programs use a mix of these instructions

Overall performance depends on instruction frequency!
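The dependence on instruction frequency can be worked through numerically. In the sketch below the cycle counts come from the multi-cycle goals, but the instruction mix is hypothetical, chosen only to illustrate the calculation, and the clock ratio (multi-cycle frequency divided by single-cycle frequency) is a free parameter.

```python
from fractions import Fraction

# Cycles per instruction class on the multi-cycle CPU.
CYCLES = {"LW": 5, "SW": 4, "R-type": 4, "BEQ": 3, "J": 3}

# Hypothetical instruction mix in percent (illustrative only).
MIX_PERCENT = {"LW": 25, "SW": 10, "R-type": 45, "BEQ": 15, "J": 5}


def average_cpi(mix_percent):
    """Frequency-weighted average cycles per instruction."""
    total = sum(mix_percent.values())
    weighted = sum(pct * CYCLES[name] for name, pct in mix_percent.items())
    return Fraction(weighted, total)


def multi_cycle_speedup(mix_percent, clock_ratio):
    """Single-cycle time per instruction divided by multi-cycle time per instruction.

    The single-cycle CPU has CPI = 1, so the ratio is clock_ratio / average CPI.
    """
    return Fraction(clock_ratio) / average_cpi(mix_percent)
```

With this mix the average CPI is 4.05. At the ideal 5x clock the multi-cycle CPU is faster (speedup 100/81, about 1.23), but if the achievable clock is only 4x faster, the speedup drops to 80/81 and the single-cycle CPU wins.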

Implementation Summary

Single-cycle CPU
  1 instruction per cycle (eg, 1 MHz → 1 MIPS)
  No “wasted time” on most complex instruction
  Large wasted time on simpler instructions
  Simple controller (just a lookup table or memory)
  Simple instructions

Multi-cycle CPU
  << 1 instruction per cycle (eg, 1 MHz → 0.2 MIPS)
  Small time wasted on most complex instruction
    Hence, this instruction always slower than single-cycle CPU
  Small time wasted on simple instructions
    Eliminates “large wasted time” by using fewer clock cycles
  Complex controller (FSM)
  Potential to create complex instructions