81
55:035 Computer Architecture and Organization Lecture 9

55:035 Computer Architecture and Organization Lecture 9

Embed Size (px)

Citation preview

Page 1: 55:035 Computer Architecture and Organization Lecture 9

55:035 Computer Architecture and Organization

Lecture 9

Page 2: 55:035 Computer Architecture and Organization Lecture 9

Outline Building a CPU

Basic Components MIPS Instructions Basic 5 Steps for CPU Single-Cycle Design Multi-cycle Design Comparison of Single and Multi-cycle Designs

255:035 Computer Architecture and Organization

Page 3: 55:035 Computer Architecture and Organization Lecture 9

Overview Brief look

Digital logic

CPU Datapath MIPS Example

355:035 Computer Architecture and Organization

Page 4: 55:035 Computer Architecture and Organization Lecture 9

Digital Logic

D Q

D-type Flip-flop

Clock(edge-triggered)

S (Select input)

A

BF

0

1

Multiplexer

D-type Flip-flop with Enable

Clock(edge-triggered)

D QEN

0

1D Q

DQ

EN(enable)

Clock(edge-triggered)

455:035 Computer Architecture and Organization

Page 5: 55:035 Computer Architecture and Organization Lecture 9

Digital Logic

1 Bit

D Q

Clock(edge-triggered)

EN

4 Bits

Clock(edge-triggered)

D3 Q3

EN

D2 Q2D1 Q1D0 Q0

Registers

N Bits

D Q

Clock(edge-triggered)

EN

555:035 Computer Architecture and Organization

Page 6: 55:035 Computer Architecture and Organization Lecture 9

Digital Logic

outin

drive

Tri-state Driver (Buffer)In Drive Out

0 0 Z

1 0 Z

0 1 0

1 1 1

What is Z ??

655:035 Computer Architecture and Organization

Page 7: 55:035 Computer Architecture and Organization Lecture 9

Digital Logic

Adder/Subtractor or ALU

A B

F

Carry-out

Add/sub or ALUop

Carry-in

755:035 Computer Architecture and Organization

Page 8: 55:035 Computer Architecture and Organization Lecture 9

Overview Brief look

Digital logic

How to Design a CPU Datapath MIPS Example

855:035 Computer Architecture and Organization

Page 9: 55:035 Computer Architecture and Organization Lecture 9

Designing a CPU: 5 Steps Analyze the instruction set datapath requirements

MIPS: ADD, SUB, ORI, LW, SW, BR Meaning of each instruction given by RTL (register transfers) 2 types of registers: CPU/ISA registers, temporary registers

Datapath requirements select the datapath components ALU, register file, adder, data memory, etc

Assemble the datapath Datapath must support planned register transfers Ensure all instructions are supported

Analyze datapath control required for each instruction Assemble the control logic

955:035 Computer Architecture and Organization

Page 10: 55:035 Computer Architecture and Organization Lecture 9

Step 1a: Analyze ISA All MIPS instructions are 32 bits long. Three instruction formats:

R-type

I-type

J-type

R: registers, I: immediate, J: jumps These formats intentionally chosen to simplify design

op target address

02631

6 bits 26 bits

op rs rt rd shamt funct

061116212631

6 bits 6 bits5 bits5 bits5 bits5 bits

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

1055:035 Computer Architecture and Organization

Page 11: 55:035 Computer Architecture and Organization Lecture 9

Step 1b: Analyze ISA

Meaning of the fields: op: operation of the instruction rs, rt, rd: the source and destination register specifiers

Destination is either rd (R-type), or rt (I-type) shamt: shift amount funct: selects the variant of the operation in the “op” field immediate: address offset or immediate value target address: target address of the jump instruction

op target address02631

6 bits 26 bits

op rs rt rd shamt funct061116212631

6 bits 6 bits5 bits5 bits5 bits5 bits

op rs rt immediate016212631

6 bits 16 bits5 bits5 bits

R-type

I-type

J-type

1155:035 Computer Architecture and Organization

Page 12: 55:035 Computer Architecture and Organization Lecture 9

MIPS ISA: subset for today ADD and SUB

addU rd, rs, rt subU rd, rs, rt

OR Immediate: ori rt, rs, imm16

LOAD and STORE Word lw rt, rs, imm16 sw rt, rs, imm16

BRANCH: beq rs, rt, imm16

op rs rt rd shamt funct

061116212631

6 bits 6 bits5 bits5 bits5 bits5 bits

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

1255:035 Computer Architecture and Organization

Page 13: 55:035 Computer Architecture and Organization Lecture 9

Step 2: Datapath RequirementsREGISTER FILE

MIPS ISA requires 32 registers, 32b each

Called a register file Contains 32 entries Each entry is 32b

AddU rd,rs,rt or SubU rd,rs,rt Read two sources rs, rt Operation rs + rt or rs – rt Write destination rd ← rs+/-rt

Requirements Read two registers (rs, rt) Perform ALU operation Write a third register (rd)

RdReg1

RdReg2

WrReg

WrData

RdData1

RdData2

RegWrite

REGFILE

RegisterNumbers(5 bits ea)

How toimplement?

ALU

ALUop

Result

Zero?

1355:035 Computer Architecture and Organization

Page 14: 55:035 Computer Architecture and Organization Lecture 9

Step 3: Datapath Assembly ADDU rd, rs, rt SUBU rd, rs, rt

Need an ALU Hook it up to REGISTER FILE REGFILE has 2 read ports (rs,rt), 1 write port (rd)

rsParametersCome FromInstructionFields

rt

rd

Control Signals DependUpon Instruction Fields

Eg:ALUop = f(Instruction) = f(op, funct)

RdReg1

RdReg2

WrReg

WrData

RdData1

RdData2

RegWrite

REGFILE

ALU

ALUop

Result

Zero?

1455:035 Computer Architecture and Organization

Page 15: 55:035 Computer Architecture and Organization Lecture 9

Steps 2 and 3: ORI Instruction ORI rt, rs, Imm16

Need new ALUop for ‘OR’ function, hook up to REGFILE 1 read port (rs), 1 write port (rt), 1 const value (Imm16)

rs

FromInstruction

rt

rt rdX

RdReg1

RdReg2

WrReg

WrData

RdData1

RdData2

RegWrite

REGFILE

ZERO-EXTEND

ALU

ALUop

Result

Zero?

16-bitsImm16

ALUsrc

0

1Control SignalsDepend UponInstruction Fields

E.g.:ALUsrc = f(Instruction) = f(op, funct)

1555:035 Computer Architecture and Organization

Page 16: 55:035 Computer Architecture and Organization Lecture 9

Steps 2 and 3 Destination Register Must select proper destination, rd or rt

Depends on Instruction Type R-type may write rd I-type may write rt

FromInstruction

RdReg1

RdReg2

WrReg

WrData

RdData1

RdData2

REGFILE

rs

rt

rd

ZERO-EXTEND

ALU

ALUop

Result

Zero?

ALUsrc

0

1

RegDst

1

0

16-bitsImm16

RegWrite

1655:035 Computer Architecture and Organization

Page 17: 55:035 Computer Architecture and Organization Lecture 9

Steps 2 and 3: Load Word LW rt, rs, Imm16

Need Data Memory: data ← Mem[Addr] Addr is rs+Imm16, Imm16 is signed, use ALU for +

Store in rt: rt ← Mem[rs+Imm16]

RdReg1

RdReg2

WrRegWrData

RdData1

RdData2REGFILE

rs

rt

rd

SIGN/ZERO-

EXTEND

ALU

ALUop

Result

Zero?

ALUsrc

0

1

RegDst

1

0

Imm16

RegWrite

AddrRdData

MemtoReg

0

1

DATAMEM

ExtOp

1755:035 Computer Architecture and Organization

Page 18: 55:035 Computer Architecture and Organization Lecture 9

Steps 2 and 3: Store Word SW rt, rs, Imm16

Need Data Memory: Mem[Addr] ← data Addr is rs+Imm16, Imm16 is signed, use ALU for +

Store in Mem: Mem[rs+Imm16] ← rt

RdReg1

RdReg2

WrReg

WrData

RdData1

RdData2

REGFILE

rs

rt

rd

SIGN/ZERO-

EXTEND

ALU

ALUop

Result

Zero?

ALUsrc

0

1

RegDst

1

0

Imm16

RegWrite

AddrRdData

WrData

MemtoReg

1

0

DATAMEM

ExtOp

MemWrite

1855:035 Computer Architecture and Organization

Page 19: 55:035 Computer Architecture and Organization Lecture 9

Writes: Need to Control Timing Problem: write to data memory

Data can come anytime Addr must come first MemWrite must come after Addr

Else? writes to wrong Addr!

Solution: use ideal data memory Assume everything works ok How to fix this for real? One solution: synchronous memory Another solution: delay MemWr to come late

Problems?: write to register file Does RegWrite signal come after WrReg number? When does the write to a register happen? Read from same register as being written?

1955:035 Computer Architecture and Organization

Page 20: 55:035 Computer Architecture and Organization Lecture 9

Missing Pieces: Instruction Fetching Where does the Instruction come from?

From instruction memory, of course!

Recall: stored-program concept Alternatives? How about hard-coding wires and switches…? This

is how ENIAC was programmed!

How to branch? BEQ rs, rt, Imm16

2055:035 Computer Architecture and Organization

Page 21: 55:035 Computer Architecture and Organization Lecture 9

Instruction Processing Fetch instruction Execute instruction

Fetch next instruction Execute next instruction

Fetch next instruction Execute next instruction

Etc…

How to maintain sequence? Use a counter! Branches (out of sequence) ? Load the counter!

2155:035 Computer Architecture and Organization

Page 22: 55:035 Computer Architecture and Organization Lecture 9

Instruction Processing Program Counter

Points to current instruction

Address to instruction memory Instr ← InstrMem[PC]

Next instruction: counts up by 4 Remember: memory is byte-addressable, instructions are 4 bytes

PC ← PC + 4

Branch instruction: replace PC contents

2255:035 Computer Architecture and Organization

Page 23: 55:035 Computer Architecture and Organization Lecture 9

Step 1: Analyze Instructions Register Transfer Language…

op | rs | rt | rd | shamt | funct = InstrMem[ PC ]

op | rs | rt | Imm16 = InstrMem[ PC ]

Instr Register Transfers

ADDU R[rd] ← R[rs] + R[rt]; PC ← PC + 4

SUBU R[rd] ← R[rs] – R[rt]; PC ← PC + 4

ORI R[rt] ← R[rs] + zero_ext(Imm16); PC ← PC + 4

LOAD R[rt] ← MEM[ R[rs] + sign_ext(Imm16)]; PC ← PC + 4

STORE MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4

BEQ if ( R[rs] == R[rt] ) then PC ← PC + 4 + { sign_ext(Imm16)] || b’00’ } else

PC ← PC + 42355:035 Computer Architecture and Organization

Page 24: 55:035 Computer Architecture and Organization Lecture 9

Steps 2 and 3: Datapath & Assembly

PC: a register Counter, counts by +4 Provides address to Instruction Memory

Add

Readaddress

InstructionMemory

Instruction[31:0]

PC

Instruction[31:0]

4

2455:035 Computer Architecture and Organization

Page 25: 55:035 Computer Architecture and Organization Lecture 9

Steps 2 and 3: Datapath & Assembly

Add AddAdd

result

Readaddress

InstructionMemory

Instruction[31:0]

PC

0Mux1

Sign/Zero

Extend

Instruction[25:21]

Instruction[20:16]

Instruction[15:11]

Instruction[15:0] (Imm16)

16 32

PCSrcShiftLeft 2

4

PC: a register Counter, counts by +4 Sometimes, must add

SignExtend{Imm16||b’00’} for branch instructionsNote: the sign-extender for Imm16

is already in the datapath(everything else is new)

ExtOp25

Page 26: 55:035 Computer Architecture and Organization Lecture 9

Steps 2 and 3: Add Previous Datapath

Add Add

ALU

Addresult

ALUresult

Zero

Readaddress

InstructionMemory

Instruction[31:0]

RegisterFile

DataMemory

PC

Addr-ess

Readdata

Writedata

0Mux1

1Mux0

0Mux1

0Mux1

ALUControl

Sign/Zero

Extend

Writereg.

Readreg. 1

Readreg. 2

Readdata 2

Readdata 1

Writedata

Instruction[25:21]

Instruction[20:16]

Instruction[15:11]

Instruction[15:0] (Imm16)

Instruction[5:0] (funct)

16 32

RegWrite

RegDst

ALUSrc

MemWrite

PCSrc

MemtoReg

ALUOp

ShiftLeft 2

4

ExtOp

Page 27: 55:035 Computer Architecture and Organization Lecture 9

What have we done? Created a simple CPU datapath

Control still missing (next slide)

Single-cycle CPU Every instruction takes 1 clock cycle Clocking ?

2755:035 Computer Architecture and Organization

Page 28: 55:035 Computer Architecture and Organization Lecture 9

One Clock Cycle Clock Locations

PC, REGFILE have clocks

Operation On rising edge, PC will get new value

Maybe REGFILE will have one value updated as well After rising edge

PC and REGFILE can’t change New value out of PC Instruction out of INSTRMEM Instruction selects registers to read from REGFILE Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc ALU does its work DataMem may be read (depending on instruction) Result value goes back to REGFILE New PC value goes back to PC Await next clock edge

Lots to do in only1 clockcycle !!

2855:035 Computer Architecture and Organization

Page 29: 55:035 Computer Architecture and Organization Lecture 9

Missing Steps? Control is missing (Steps 4 and 5 we mentioned earlier)

Generate the green signals ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc

These are all f(Instruction), where f() is a logic expression Will look at control strategies in upcoming lecture

Implementation Details How to implement REGFILE?

Read port: tristate buffers? Multiplexer? Memory? Two read ports: two of above? Write port: how to write only 1 register?

How to control writes to memory? To register file?

More instructions Shift instructions Jump instruction Etc

2955:035 Computer Architecture and Organization

Page 30: 55:035 Computer Architecture and Organization Lecture 9

1-Cycle CPU Datapath

Add Add

ALU

Addresult

ALUresult

Zero

Readaddress

InstructionMemory

Instruction[31:0]

RegisterFile

DataMemory

PC

Addr-ess

Readdata

Writedata

0Mux1

1Mux0

0Mux1

0Mux1

ALUControl

Sign/Zero

Extend

Writereg.

Readreg. 1

Readreg. 2

Readdata 2

Readdata 1

Writedata

Instruction[25:21]

Instruction[20:16]

Instruction[15:11]

Instruction[15:0] (Imm16)

Instruction[5:0] (funct)

16 32

RegWrite

RegDst

ALUSrc

MemWrite

PCSrc

MemtoReg

ALUOp

ShiftLeft 2

4

ExtOp

Page 31: 55:035 Computer Architecture and Organization Lecture 9

1-cycle CPU Datapath + Control

PCSrc

Instruction[25:21]

Instruction[20:16]

Instruction[15:11]

Instruction[15:0]

Instruction[5:0]

Instruction[31:26]

Sign/Zero

Extend

DataMemory

Addr-ess

Readdata

Writedata

ALUALU

result

Zero

Readaddress

InstructionMemory

Instruction[31:0]

Add

PC

4

AddAdd

resultShiftLeft 2

RegisterFile

Writereg.

Readreg. 1

Readreg. 2

Readdata 2

Readdata 1

Writedata

RegDst

BranchMemReadMemtoRegALUOpMemWriteALUSrcRegWrite

ALUcontrol

Con-trol

Page 32: 55:035 Computer Architecture and Organization Lecture 9

Input or Output Signal Name R-format Lw Sw Beq

Inputs

Op5 0 1 1 0

Op4 0 0 0 0

Op3 0 0 1 0

Op2 0 0 0 1

Op1 0 1 1 0

Op0 0 1 1 0

Outputs

RegDst 1 0 X X

ALUSrc 0 1 1 0

MemtoReg 0 1 X X

RegWrite 1 1 0 0

MemRead 0 1 0 0

MemWrite 0 0 1 0

Branch 0 0 0 1

ALUOp1 1 0 0 0

ALUOp0 0 0 0 1

Also: I-type instructions (ORI) & ExtOp (sign-extend control), etc.

1-cycle CPU Control – Lookup Table

Page 33: 55:035 Computer Architecture and Organization Lecture 9

1-cycle CPU + Jump Instruction

Instruction[31:26]

Instruction[25:0]

PC + 4 [31..28]

Jump address [31..0]

Instruction[25:21]

Instruction[20:16]

Instruction[15:11]

Instruction[15:0]

Instruction[5:0]

Page 34: 55:035 Computer Architecture and Organization Lecture 9

1-cycle CPU Problems? Every instruction 1 cycle Some instructions “do more work”

Eg, lw must read from DATAMEM All instructions must have same clock period…

Many instructions run slower than necessary

Tricky timing on MemWrite, RegWrite(?) signals Write signal must come *after* address is stable

Need extra resources… PC+4 adder, ALU for BEQ instruction, DATAMEM+INSTRMEM

3455:035 Computer Architecture and Organization

Page 35: 55:035 Computer Architecture and Organization Lecture 9

Performance! Single-Cycle CPU Performance

Execute one instruction per clock cycle (CPI=1) Clock cycle time? Note dataflow includes:

INSTRMEM read REGFILE access Sign extension ALU operation DATAMEM read REGFILE/PC write

Not every instruction uses all resources (eg, DATAMEM read) Can we change clock period for each instruction?

No! (Why not?) One clock period: the worst case! This is why a single-cycle CPU is not good for performance

3555:035 Computer Architecture and Organization

Page 36: 55:035 Computer Architecture and Organization Lecture 9

1-cycle CPU Datapath + Controller

Instruction[31:26]

Instruction[25:0]

PC + 4 [31..28]

Jump address [31..0]

Instruction[25:21]

Instruction[20:16]

Instruction[15:11]

Instruction[15:0]

Instruction[5:0]

Page 37: 55:035 Computer Architecture and Organization Lecture 9

1-cycle CPU Summary Operation

1 cycle per instruction Control signals held fixed during entire cycle (except BRANCH) Only 2 registers

PC, updated every clock cycle REGFILE, updated when required

During clock cycle, data flows from register-outputs to register-inputs Fixed clock frequency / period

Performance 1 instruction per cycle Slowest instruction determines clock frequency

Outstanding issue: MemWrite timing Assume this signal writes to memory at end of clock cycle

3755:035 Computer Architecture and Organization

Page 38: 55:035 Computer Architecture and Organization Lecture 9

Multi-cycle CPU Goals Improve performance

Break each instruction into smaller steps / multiple cycles LW instruction 5 cycles SW instruction 4 cycles R-type instruction 4 cycles Branch, Jump 3 cycles

Aim for 5x clock frequency Complex instructions (eg, LW) 5 cycles same performance as before Simple instructions (eg, ADD) fewer cycles faster

Save resources (gates/transistors) Re-use ALU over multiple cycles Put INSTR + DATA in same memory

MemWrite timing solved?

3855:035 Computer Architecture and Organization

Page 39: 55:035 Computer Architecture and Organization Lecture 9

Multi-cycle CPU Datapath

Instruction[25:21]

Instruction[20:16]

Instruction[15:11]

Instruction[15:0]

Instruction[5:0]

Instr[15:0]

InstructionRegister

MemoryData

Register

ALUOut

A

B

MemoryMemData

Address

Writedata

Registers

RdData1

RdData2

RdReg2

RdReg1

Writereg

Writedata

Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB) Move signal paths (+4, Shift Left 2)

4

ShiftLeft 2

SignExtend

PC

Mux

Mux

ALU

ALUresult

Zero

Mux

Mux

Mux

Page 40: 55:035 Computer Architecture and Organization Lecture 9

Multi-cycle CPU Datapath

Instruction[25:21]

Instruction[20:16]

Instruction[15:11]

Instruction[15:0]

Instruction[5:0]

Instr[15:0]

ALUOut

A

B

MemoryMemData

Address

Writedata

Registers

RdData1

RdData2

RdReg2

RdReg1

Writereg

Writedata

Add registers + control signals (IR, MDR, A, B, ALUOut) Registers with no control signal load value every clock cycle (eg, PC)

4

ShiftLeft 2

SignExtend

PC

Mux

Mux

ALU

ALUresult

Zero

Mux

Mux

Mux

InstructionRegister

MemoryData

Register

Page 41: 55:035 Computer Architecture and Organization Lecture 9

Instruction Execution Example Execute a “Load Word” instruction

LW rt, 0(rs)

5 Steps1. Fetch instruction2. Read registers3. Compute address4. Read data5. Write registers

4155:035 Computer Architecture and Organization

Page 42: 55:035 Computer Architecture and Organization Lecture 9

Load Word Instruction Sequence

1. Fetch InstructionInstructionRegister ← Mem[PC]

Instruction[25:21]

Instruction[20:16]

Instruction[15:11]

Instruction[5:0]

Instr[15:0]

ALUOut

A

BWritedata

Registers

RdData1

RdData2

RdReg2

RdReg1

Writereg

Writedata

4

ShiftLeft 2

SignExtend

PC

Mux

Mux

ALU

ALUresult

Zero

Mux

Mux

Mux

InstructionRegister

MemoryData

Register

Instruction[15:0]

MemoryMemData

Address

Page 43: 55:035 Computer Architecture and Organization Lecture 9

Load Word Instruction Sequence

2. Read RegistersA ← Registers[Rs]

Instruction[20:16]

Instruction[15:11]

Instruction[15:0]

Instruction[5:0]

Instr[15:0]

ALUOut

A

B

MemoryMemData

Address

Writedata

Registers

RdData2

RdReg2

Writereg

Writedata

4

ShiftLeft 2

SignExtend

PC

Mux

Mux

ALU

ALUresult

Zero

Mux

Mux

Mux

InstructionRegister

MemoryData

Register

Instruction[25:21]

RdData1

RdReg1

Page 44: 55:035 Computer Architecture and Organization Lecture 9

Load Word Instruction Sequence

3. Compute AddressALUOut ← A + {SignExt(Imm16),b’00’}

Instruction[25:21]

Instruction[20:16]

Instruction[15:0]

Instruction[5:0]

Instr[15:0]

B

MemoryMemData

Address

Writedata

Registers

RdData1

RdData2

RdReg2

RdReg1

Writereg

Writedata

4

ShiftLeft 2

SignExtend

PC

Mux

Mux

ALU

ALUresult

Zero

Mux

Mux

Mux

InstructionRegister

MemoryData

Register

Instruction[15:11]

ALUOut

A

Page 45: 55:035 Computer Architecture and Organization Lecture 9

Load Word Instruction Sequence

4. Read DataMDR ← Memory[ALUOut]

Instruction[25:21]

Instruction[20:16]

Instruction[15:11]

Instruction[15:0]

Instruction[5:0]

Instr[15:0]

A

BWritedata

Registers

RdData1

RdData2

RdReg2

RdReg1

Writereg

Writedata

4

ShiftLeft 2

SignExtend

PC

Mux

Mux

ALU

ALUresult

Zero

Mux

Mux

Mux

InstructionRegister

MemoryData

Register

ALUOut

MemoryMemData

Address

Page 46: 55:035 Computer Architecture and Organization Lecture 9

Load Word Instruction Sequence

5. Write RegistersRegisters[Rt] ← MDR

Instruction[25:21]

Instruction[20:16]

Instruction[15:11]

Instruction[15:0]

Instruction[5:0]

Instr[15:0]

ALUOut

A

B

MemoryMemData

Address

Writedata

Registers

RdData1

RdData2

RdReg2

RdReg1

4

ShiftLeft 2

SignExtend

PC

Mux

Mux

ALU

ALUresult

Zero

Mux

Mux

Mux

InstructionRegister

MemoryData

Register

Writereg

Writedata

Page 47: 55:035 Computer Architecture and Organization Lecture 9

Load Word Instruction Sequence

All 5 Steps Shown

Instruction[5:0]

Instr[15:0]

BWritedata

Registers

RdData2

RdReg2

4

ShiftLeft 2

SignExtend

PC

Mux

Mux

ALU

ALUresult

Zero

Mux

Mux

Mux

InstructionRegister

MemoryData

Register

Instruction[25:21]

Instruction[20:16]

Instruction[15:11]

Instruction[15:0]

ALUOut

MemoryMemData

AddressRdData1

RdReg1

Writereg

Writedata

A

Page 48: 55:035 Computer Architecture and Organization Lecture 9

Multi-cycle Load Word: Recap1. Fetch Instruction InstructionRegister ← Mem[PC]

2. Read Registers A ← Registers[Rs]

3. Compute Address ALUOut ← A + {SignExt(Imm16)}

4. Read Data MDR ← Memory[ALUOut]

5. Write Registers Registers[Rt] ← MDR

Missing Steps?

4855:035 Computer Architecture and Organization

Page 49: 55:035 Computer Architecture and Organization Lecture 9

Multi-cycle Load Word: Recap1. Fetch Instruction InstructionRegister ← Mem[PC]; PC ← PC + 4

2. Read Registers A ← Registers[Rs]

3. Compute Address ALUOut ← A + {SignExt(Imm16)}

4. Read Data MDR ← Memory[ALUOut]

5. Write Registers Registers[Rt] ← MDR

Missing Steps? Must increment the PC Do it as part of the instruction fetch (in step 1) Need PCWrite control signal

4955:035 Computer Architecture and Organization

Page 50: 55:035 Computer Architecture and Organization Lecture 9

Multi-cycle R-Type Instruction1. Fetch Instruction InstructionRegister ← Mem[PC]; PC ← PC + 4

2. Read Registers A ← Registers[Rs]; B ← Registers[Rt]

3. Compute Value ALUOut ← A op B

4. Write Registers Registers[Rd] ← ALUOut

RTL describes data flow action in each clock cycle Control signals determine precise data flow Each step implies unique control values

5055:035 Computer Architecture and Organization

Page 51: 55:035 Computer Architecture and Organization Lecture 9

Multi-cycle R-Type Instruction: Control Signal Values1. Fetch Instruction InstructionRegister ← Mem[PC]; PC ← PC + 4

MemRead=1, ALUSrcA=0, IorD=0, IRWrite, ALUSrcB=01, ALUop=00, PCWrite, PCSource=00

2. Read Registers A ← Registers[Rs]; B ← Registers[Rt]ALUSrcA=0, ALUSrcB=11, ALUop=00

3. Compute Value ALUOut ← A op BALUSrcA=1, ALUSrcB=00, ALUop=10

4. Write Registers Registers[Rd] ← ALUOutRegDst=1, RegWrite, MemtoReg=0

Each step implies unique control values Fixed for entire cycle “Default value” implied if unspecified

5155:035 Computer Architecture and Organization

Page 52: 55:035 Computer Architecture and Organization Lecture 9

Check Your Work – Is RTL Valid ? 1. Datapath check

Within one cycle… Each cycle has valid data flow path (path exists) Each register gets only one new value

Across multiple cycles… Register value is defined before use in previous (earlier in time) clock cycle

Eg, “A 3” must occur before “B A” Make sure register value doesn’t disappear if set >1 cycle earlier

2. Control signal check Each cycle, RTL describing the datapath flow implies a value for each control

signal 0 or 1 or default or don’t care

Each control signal gets only one fixed value the entire cycle

3. Overall check Does the sequence of steps work ?

5255:035 Computer Architecture and Organization

Page 53: 55:035 Computer Architecture and Organization Lecture 9

Multi-cycle BEQ Instruction

1. Fetch InstructionInstructionRegister ← Mem[PC]; PC ← PC + 4

2. Read Registers, Precompute TargetA ← Registers[Rs] ; B ← Registers[Rt] ; ALUOut ← PC + {SignExt{Imm16},b’00’}

3. Compare Registers, Conditional Branchif( (A – B) ==0 ) PC ← ALUOut

Green shows PC calculation flow (in parallel with other operations)

5355:035 Computer Architecture and Organization

Page 54: 55:035 Computer Architecture and Organization Lecture 9

Multi-cycle Datapath with Control Signals

Instr[25:21]

Instr[20:16]

Instr[15:0]

Instr[15:0]

Instruction[5:0]

In[15:11]

Instr[25:0]

PC[31..28]

Jumpaddress

[31..0]

PCWrite

IorDMemRead

MemWrite

MemtoReg

IRWritePCSrc

ALUOp

ALUSrcA

ALUSrcB

RegWrite

RegDst

ALUControl

5455:035 Computer Architecture and Organization

Page 55: 55:035 Computer Architecture and Organization Lecture 9

Multi-cycle Datapath with Controller

Instr.[31:26]

Instr[31:26]

Instr[25:21]

Instr[20:16]

Instr[15:0]

Instr[15:0]

Instruction[5:0]

In[15:11]

Instr[25:0]

PC[31..28]

Jumpaddress

[31..0]

Page 56: 55:035 Computer Architecture and Organization Lecture 9

Multi-cycle BEQ Instruction

1. Fetch InstructionInstructionRegister ← Mem[PC]; PC ← PC + 4

2. Read Registers, Precompute TargetA ← Registers[Rs] ; B ← Registers[Rt] ; ALUOut ← PC + {SignExt{Imm16},b’00’}

3. Compare Registers, Conditional Branchif( (A – B) ==0 ) PC ← ALUOut

Green shows PC calculation flow (in parallel with other operations)

5655:035 Computer Architecture and Organization

Page 57: 55:035 Computer Architecture and Organization Lecture 9

Multi-cycle Datapath with Control Signals

Instr[25:21]

Instr[20:16]

Instr[15:0]

Instr[15:0]

Instruction[5:0]

In[15:11]

Instr[25:0]

PC[31..28]

Jumpaddress

[31..0]

PCWrite

IorDMemRead

MemWrite

MemtoReg

IRWritePCSrc

ALUOp

ALUSrcA

ALUSrcB

RegWrite

RegDst

ALUControl

5755:035 Computer Architecture and Organization

Page 58: 55:035 Computer Architecture and Organization Lecture 9

Multi-cycle Datapath with Controller

Instr.[31:26]

Instr[31:26]

Instr[25:21]

Instr[20:16]

Instr[15:0]

Instr[15:0]

Instruction[5:0]

In[15:11]

Instr[25:0]

PC[31..28]

Jumpaddress

[31..0]

Page 59: 55:035 Computer Architecture and Organization Lecture 9

Multi-cycle CPU Control: Overview

General approach: Finite State Machine (FSM) Need details in each branch of control…

Precise outputs for each state (Mealy depends on inputs, Moore does not) Precise “next state” for each state (can depend on inputs)

ControlSignalOutputs

ControlSignalOutputs

5955:035 Computer Architecture and Organization

Page 60: 55:035 Computer Architecture and Organization Lecture 9

How to Implement FSM ? Manually with logic gates + FFs

Bubble diagram, next-state table, state assignment Karnaugh map for each state bit, each output bit (painful!)

High-level language description (eg, Verilog, VHDL) Describe FSM bubble diagram (next-states, output values) Automatically synthesized into gates + FFs

Microcode (µ-code) description Sequence through many µ-ops for each CPU instruction

One µ-op (µ-instruction) sends correct control signal for 1 cycle µ-op similar to one bubble in FSM

Acts like a mini-CPU within a CPU µPC: microcode program counter Microcode storage memory contains µ-ops

Can look similar to RTL or some new “assembly language”

6055:035 Computer Architecture and Organization

Page 61: 55:035 Computer Architecture and Organization Lecture 9

FSM Specification: Bubble Diagram

Can build thisby examiningRTL

It is possible toautomaticallyconvert RTLinto this form !

61

Page 62: 55:035 Computer Architecture and Organization Lecture 9

FSM: Gates + FFs Implementation

FSMHigh-level

Organization

6255:035 Computer Architecture and Organization

Page 63: 55:035 Computer Architecture and Organization Lecture 9

FSM: Microcode Implementation

Adder

1

Datapathcontroloutputs

Sequencingcontrol

Inputs from instructionregister opcode field

MicrocodeStorage

(memory)

Inputs

Outputs

Microprogram Counter

Address Select Logic

6355:035 Computer Architecture and Organization

Page 64: 55:035 Computer Architecture and Organization Lecture 9

Multi-cycle CPU with Control FSM

Instr.[31:26]

Instr[31:26]

Instr[25:21]

Instr[20:16]

Instr[15:0]

Instr[15:0]

Instruction[5:0]

In[15:11]

Instr[25:0]

PC[31..28]

Jumpaddress

[31..0]

FSMControlOutputs

ConditionalBranch

Page 65: 55:035 Computer Architecture and Organization Lecture 9

Control FSM: Overview

General approach: Finite State Machine (FSM) Need details in each branch of control…

6555:035 Computer Architecture and Organization

Page 66: 55:035 Computer Architecture and Organization Lecture 9

Detailed FSM

66

Page 67: 55:035 Computer Architecture and Organization Lecture 9

Detailed FSMInstruction

Fetch

MemoryReference

Branch JumpR-Type

67

Page 68: 55:035 Computer Architecture and Organization Lecture 9

Detailed FSM: Instruction Fetch

6855:035 Computer Architecture and Organization

Page 69: 55:035 Computer Architecture and Organization Lecture 9

Detailed FSM: Memory Reference

LW SW

69

Page 70: 55:035 Computer Architecture and Organization Lecture 9

Detailed FSM: R-Type Instruction

7055:035 Computer Architecture and Organization

Page 71: 55:035 Computer Architecture and Organization Lecture 9

Detailed FSM: Branch Instruction

7155:035 Computer Architecture and Organization

Page 72: 55:035 Computer Architecture and Organization Lecture 9

Detailed FSM: Jump Instruction

7255:035 Computer Architecture and Organization

Page 73: 55:035 Computer Architecture and Organization Lecture 9

Performance Comparison

Single-cycle CPU

vs

Multi-cycle CPU

7355:035 Computer Architecture and Organization

Page 74: 55:035 Computer Architecture and Organization Lecture 9

Simple Comparison

Single-cycle CPU

1 clock cycle

5 clock cycles

Multi-cycle CPU

4 clock cycles

Multi-cycle CPU

3 clock cycles

Multi-cycle CPU

SW, R-type

BEQ, J

LW

All

Page 75: 55:035 Computer Architecture and Organization Lecture 9

What’s really happening?

Single-cycle CPU

Multi-cycle CPU

( Load Word Instruction )

Fetch Decode Memory WriteCalcAddr

Ideally:

7555:035 Computer Architecture and Organization

Page 76: 55:035 Computer Architecture and Organization Lecture 9

In practice, steps differ in speeds…

Single-cycle CPU

Multi-cycle CPU

Fetch Decode MemoryCalcAddr

Fetch Decode MemoryCalcAddr

Write

Write

Violation!Wasted time!

Load Word Instruction

7655:035 Computer Architecture and Organization

Page 77: 55:035 Computer Architecture and Organization Lecture 9

Single-cycle vs Multi-cycleLW instruction faster for single-cycle

Single-cycle CPU

Fetch Decode MemoryCalcAddr

Fetch Decode MemoryCalcAddr

Write

Write

Violation fixed!

Multi-cycle CPU

Now wasted time is larger!

7755:035 Computer Architecture and Organization

Page 78: 55:035 Computer Architecture and Organization Lecture 9

Single-cycle vs Multi-cycleSW instruction ~ same speed

Single-cycle CPU

Fetch Decode MemoryCalcAddr

Fetch Decode MemoryCalcAddr

Multi-cycle CPU

Wasted time!

Speed diff

7855:035 Computer Architecture and Organization

Page 79: 55:035 Computer Architecture and Organization Lecture 9

Single-cycle vs Multi-cycleBEQ, J instruction faster for multi-cycle

Single-cycle CPU

Fetch DecodeCalcAddr

Fetch DecodeCalcAddr

Wasted time!

Speed diff

Multi-cycle CPU

7955:035 Computer Architecture and Organization

Page 80: 55:035 Computer Architecture and Organization Lecture 9

Performance Summary Which CPU implementation is faster?

LW single-cycle is faster SW,R-type about the same BEQ,J multi-cycle is faster

Real programs use a mix of these instructions

Overall performance depends instruction frequency !

8055:035 Computer Architecture and Organization

Page 81: 55:035 Computer Architecture and Organization Lecture 9

Implementation Summary Single-cycle CPU

1 instruction per cycle (eg, 1MHz 1 MIPS) No “wasted time” on most complex instruction Large wasted time on simpler instructions Simple controller (just a lookup table or memory) Simple instructions

Multi-cycle CPU << 1 instruction per cycle (eg, 1MHz 0.2 MIPS) Small time wasted on most complex instruction

Hence, this instruction always slower than single-cycle CPU Small time wasted on simple instructions

Eliminates “large wasted time” by using fewer clock cycles Complex controller (FSM) Potential to create complex instructions

8155:035 Computer Architecture and Organization