55:035 Computer Architecture and Organization
Lecture 9
Outline: Building a CPU
  Basic Components
  MIPS Instructions
  Basic 5 Steps for CPU
  Single-Cycle Design
  Multi-cycle Design
  Comparison of Single and Multi-cycle Designs
Overview
  Brief look: Digital logic
  CPU Datapath
  MIPS Example
Digital Logic
[Figure: D-type flip-flop (input D, output Q, edge-triggered clock). Multiplexer (inputs A and B on ports 0 and 1, select input S, output F). D-type flip-flop with enable (EN), drawn as a multiplexer in front of a D-type flip-flop, with an edge-triggered clock.]
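The flip-flop with enable can be sketched behaviourally in Python. This is a minimal model, assuming the multiplexer feeds the old Q back when EN=0 and passes D when EN=1, and that each call to the hypothetical `clock_edge` method represents one active clock edge. The names `mux2`, `DFlipFlopEN`, and `clock_edge` are illustrative and do not come from the slides.

```python
def mux2(s, a, b):
    """2-to-1 multiplexer: returns a when s == 0, b when s == 1."""
    return b if s else a


class DFlipFlopEN:
    """Edge-triggered D flip-flop with enable, modelled one clock edge at a time."""

    def __init__(self, q=0):
        self.q = q  # currently stored bit

    def clock_edge(self, d, en):
        # The mux feeds back the old Q when EN=0 and passes D when EN=1;
        # the plain D flip-flop then stores whatever the mux outputs.
        self.q = mux2(en, self.q, d)
        return self.q


ff = DFlipFlopEN()
ff.clock_edge(d=1, en=0)   # enable low: the stored value is held
held = ff.q
ff.clock_edge(d=1, en=1)   # enable high: D is loaded
loaded = ff.q
```

An N-bit register with enable is then N such flip-flops sharing one clock and one EN.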
Digital Logic
Registers
[Figure: 1-bit register (D, Q, EN, edge-triggered clock); 4-bit register (D3..D0, Q3..Q0, shared EN and clock); N-bit register (D, Q, EN, edge-triggered clock).]
Digital Logic
Tri-state Driver (Buffer)
[Figure: buffer with input "in", output "out", and control input "drive"]
  In  Drive  Out
  0   0      Z
  1   0      Z
  0   1      0
  1   1      1
What is Z?
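The truth table can be written as a small Python function. This sketch assumes Z, the high-impedance (not driving) state, is represented by the string "Z"; the names `tristate` and `bus` are illustrative. The `bus` helper shows the usual reason for using tri-state drivers: several drivers can share one wire as long as at most one of them is enabled.

```python
Z = "Z"  # assumed marker for the high-impedance (undriven) state


def tristate(inp, drive):
    """Tri-state buffer: pass inp through when drive == 1, else output Z."""
    return inp if drive else Z


def bus(*outputs):
    """Resolve a shared wire: Z if nobody drives it, error if drivers conflict."""
    driven = [v for v in outputs if v != Z]
    if not driven:
        return Z
    if len(driven) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return driven[0]


# Reproduce the four rows of the truth table: (In, Drive) -> Out
table = {(i, d): tristate(i, d) for d in (0, 1) for i in (0, 1)}
shared = bus(tristate(1, 1), tristate(0, 0))  # only the first driver is enabled
```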
Digital Logic
Adder/Subtractor or ALU
[Figure: block with inputs A, B, and Carry-in, control input Add/sub or ALUop, and outputs F and Carry-out]
Overview
  Brief look: Digital logic
  How to Design a CPU Datapath
  MIPS Example
Designing a CPU: 5 Steps
1. Analyze the instruction set → datapath requirements
     MIPS: ADD, SUB, ORI, LW, SW, BR
     Meaning of each instruction given by RTL (register transfers)
     2 types of registers: CPU/ISA registers, temporary registers
2. Datapath requirements → select the datapath components
     ALU, register file, adder, data memory, etc.
3. Assemble the datapath
     Datapath must support planned register transfers
     Ensure all instructions are supported
4. Analyze datapath control required for each instruction
5. Assemble the control logic
Step 1a: Analyze ISA
All MIPS instructions are 32 bits long. Three instruction formats:

R-type
  bits    31-26   25-21   20-16   15-11   10-6    5-0
  field   op      rs      rt      rd      shamt   funct
  width   6 bits  5 bits  5 bits  5 bits  5 bits  6 bits
I-type
  bits    31-26   25-21   20-16   15-0
  field   op      rs      rt      immediate
  width   6 bits  5 bits  5 bits  16 bits
J-type
  bits    31-26   25-0
  field   op      target address
  width   6 bits  26 bits

R: registers, I: immediate, J: jumps
These formats were intentionally chosen to simplify design
Step 1b: Analyze ISA
Meaning of the fields:
  op: operation of the instruction
  rs, rt, rd: the source and destination register specifiers
    Destination is either rd (R-type) or rt (I-type)
  shamt: shift amount
  funct: selects the variant of the operation in the "op" field
  immediate: address offset or immediate value
  target address: target address of the jump instruction
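The field layout can be checked with a few shifts and masks. The Python sketch below assumes an instruction is given as a 32-bit integer; the function name `decode` and the dictionary keys are illustrative choices. It extracts every field regardless of format, which mirrors the hardware: all fields are always wired out, and the control logic decides which ones are used.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its named fields."""
    return {
        "op":     (instr >> 26) & 0x3F,   # bits 31-26
        "rs":     (instr >> 21) & 0x1F,   # bits 25-21
        "rt":     (instr >> 16) & 0x1F,   # bits 20-16
        "rd":     (instr >> 11) & 0x1F,   # bits 15-11 (R-type)
        "shamt":  (instr >> 6) & 0x1F,    # bits 10-6  (R-type)
        "funct":  instr & 0x3F,           # bits 5-0   (R-type)
        "imm16":  instr & 0xFFFF,         # bits 15-0  (I-type)
        "target": instr & 0x3FFFFFF,      # bits 25-0  (J-type)
    }


# R-type example: addu with rd=3, rs=1, rt=2 (op = 0, funct = 0x21)
r_word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x21
r_fields = decode(r_word)

# I-type example: lw with rt=2, rs=1, imm16=4 (op = 0x23)
i_fields = decode(0x8C220004)
```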
MIPS ISA: subset for today
ADD and SUB (R-type: op | rs | rt | rd | shamt | funct)
  addu rd, rs, rt
  subu rd, rs, rt
OR Immediate (I-type: op | rs | rt | immediate)
  ori rt, rs, imm16
LOAD and STORE Word (I-type: op | rs | rt | immediate)
  lw rt, rs, imm16
  sw rt, rs, imm16
BRANCH (I-type: op | rs | rt | immediate)
  beq rs, rt, imm16
Step 2: Datapath Requirements
REGISTER FILE
  MIPS ISA requires 32 registers, 32b each
  Called a register file
    Contains 32 entries
    Each entry is 32b
ADDU rd,rs,rt or SUBU rd,rs,rt
  Read two sources rs, rt
  Operation rs + rt or rs – rt
  Write destination rd ← rs +/- rt
Requirements
  Read two registers (rs, rt)
  Perform ALU operation
  Write a third register (rd)
  How to implement?
[Figure: REGFILE with register-number inputs RdReg1, RdReg2, WrReg (5 bits each), data input WrData, control input RegWrite, and data outputs RdData1, RdData2. ALU with control input ALUop and outputs Result and Zero?]
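These requirements can be sketched as a small behavioural model. The Python below is only an illustration under stated assumptions: reads are combinational, the write takes effect on a call to the hypothetical `clock_edge` method, and the ALU accepts the made-up ALUop strings "add" and "sub". Port names follow the REGFILE labels; special handling of register 0 is not mentioned in the slides and is omitted.

```python
class RegFile:
    """32 x 32-bit register file with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rd_reg1, rd_reg2):
        # Both read ports are combinational: data is available immediately.
        return self.regs[rd_reg1], self.regs[rd_reg2]

    def clock_edge(self, wr_reg, wr_data, reg_write):
        # The write port only takes effect on a clock edge with RegWrite = 1.
        if reg_write:
            self.regs[wr_reg] = wr_data & 0xFFFFFFFF


def alu(aluop, a, b):
    """Tiny ALU: returns (Result, Zero) for the assumed ALUops 'add' and 'sub'."""
    result = (a + b if aluop == "add" else a - b) & 0xFFFFFFFF
    return result, result == 0


rf = RegFile()
rf.clock_edge(1, 7, 1)            # preload rs = register 1
rf.clock_edge(2, 5, 1)            # preload rt = register 2
a, b = rf.read(1, 2)              # read two sources
result, zero = alu("sub", a, b)   # perform the ALU operation
rf.clock_edge(3, result, 1)       # write the third register (rd = 3)
rf.clock_edge(4, 9, 0)            # RegWrite = 0: nothing is written
```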
Step 3: Datapath Assembly
ADDU rd, rs, rt
SUBU rd, rs, rt
  Need an ALU
  Hook it up to REGISTER FILE
  REGFILE has 2 read ports (rs, rt), 1 write port (rd)
Parameters come from instruction fields: rs, rt, rd
Control signals depend upon instruction fields
  E.g.: ALUop = f(Instruction) = f(op, funct)
[Figure: rs, rt, rd drive RdReg1, RdReg2, WrReg of REGFILE; RdData1 and RdData2 feed the ALU; the ALU Result returns to WrData; control signals RegWrite and ALUop; ALU output Zero?]
Steps 2 and 3: ORI Instruction
ORI rt, rs, Imm16
  Need new ALUop for 'OR' function, hook up to REGFILE
  1 read port (rs), 1 write port (rt), 1 const value (Imm16)
Control signals depend upon instruction fields
  E.g.: ALUsrc = f(Instruction) = f(op, funct)
[Figure: rs drives RdReg1; rt (rd is crossed out) drives WrReg; the 16-bit Imm16 passes through ZERO-EXTEND; a multiplexer controlled by ALUsrc selects RdData2 (0) or the extended immediate (1) as the second ALU input; the ALU Result returns to WrData; control signals RegWrite and ALUop; ALU output Zero?]
Steps 2 and 3: Destination Register
Must select proper destination, rd or rt
  Depends on instruction type
  R-type may write rd
  I-type may write rt
[Figure: the previous datapath plus a multiplexer controlled by RegDst that selects rd (1) or rt (0) as WrReg; other labels unchanged: RdReg1, RdReg2, WrData, RdData1, RdData2, RegWrite, ZERO-EXTEND, 16-bit Imm16, ALUsrc, ALUop, Result, Zero?]
Steps 2 and 3: Load Word
LW rt, rs, Imm16
  Need Data Memory: data ← Mem[Addr]
  Addr is rs+Imm16, Imm16 is signed, use ALU for +
  Store in rt: rt ← Mem[rs+Imm16]
[Figure: the previous datapath plus DATAMEM (Addr from the ALU Result, output RdData); the extender becomes SIGN/ZERO-EXTEND, controlled by ExtOp; a multiplexer controlled by MemtoReg selects the ALU Result (0) or the DATAMEM RdData (1) as REGFILE WrData]
Steps 2 and 3: Store Word
SW rt, rs, Imm16
  Need Data Memory: Mem[Addr] ← data
  Addr is rs+Imm16, Imm16 is signed, use ALU for +
  Store in Mem: Mem[rs+Imm16] ← rt
[Figure: the previous datapath plus a WrData input on DATAMEM and a MemWrite control signal; other labels unchanged: REGFILE ports, SIGN/ZERO-EXTEND with ExtOp, ALUsrc, ALUop, RegDst, RegWrite, MemtoReg, Addr, RdData]
Writes: Need to Control Timing
Problem: write to data memory
  Data can come anytime
  Addr must come first
  MemWrite must come after Addr
    Else? Writes to wrong Addr!
Solution: use ideal data memory
  Assume everything works ok
  How to fix this for real?
    One solution: synchronous memory
    Another solution: delay MemWrite to come late
Problems?: write to register file
  Does RegWrite signal come after WrReg number?
  When does the write to a register happen?
  Read from same register as being written?
Missing Pieces: Instruction Fetching
Where does the Instruction come from?
  From instruction memory, of course!
  Recall: stored-program concept
  Alternatives? How about hard-coding wires and switches…? This is how ENIAC was programmed!
How to branch?
  BEQ rs, rt, Imm16
Instruction Processing
  Fetch instruction
  Execute instruction
  Fetch next instruction
  Execute next instruction
  Fetch next instruction
  Execute next instruction
  Etc…
How to maintain sequence? Use a counter!
Branches (out of sequence)? Load the counter!
Instruction Processing
Program Counter
  Points to current instruction
  Address to instruction memory: Instr ← InstrMem[PC]
Next instruction: counts up by 4
  Remember: memory is byte-addressable, instructions are 4 bytes
  PC ← PC + 4
Branch instruction: replace PC contents
Step 1: Analyze Instructions
Register Transfer Language…
  op | rs | rt | rd | shamt | funct = InstrMem[ PC ]
  op | rs | rt | Imm16 = InstrMem[ PC ]

  Instr   Register Transfers
  ADDU    R[rd] ← R[rs] + R[rt]; PC ← PC + 4
  SUBU    R[rd] ← R[rs] – R[rt]; PC ← PC + 4
  ORI     R[rt] ← R[rs] OR zero_ext(Imm16); PC ← PC + 4
  LOAD    R[rt] ← MEM[ R[rs] + sign_ext(Imm16) ]; PC ← PC + 4
  STORE   MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4
  BEQ     if ( R[rs] == R[rt] ) then PC ← PC + 4 + { sign_ext(Imm16) || b'00' }
          else PC ← PC + 4
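The register transfers for these six instructions can be exercised with a small interpreter. The Python sketch below is an illustration only: the instruction is passed as a hypothetical tuple `(name, rs, rt, rd, imm16)` rather than a 32-bit word, memory is a dictionary keyed by byte address, and ORI is implemented as a bitwise OR, which is what the instruction name denotes.

```python
MASK = 0xFFFFFFFF  # keep values within 32 bits


def sign_ext(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def zero_ext(imm16):
    return imm16 & 0xFFFF


def step(instr, R, MEM, pc):
    """Apply the register transfer for one instruction; returns the new PC."""
    name, rs, rt, rd, imm16 = instr
    if name == "ADDU":
        R[rd] = (R[rs] + R[rt]) & MASK
    elif name == "SUBU":
        R[rd] = (R[rs] - R[rt]) & MASK
    elif name == "ORI":
        R[rt] = R[rs] | zero_ext(imm16)
    elif name == "LOAD":
        R[rt] = MEM.get((R[rs] + sign_ext(imm16)) & MASK, 0)
    elif name == "STORE":
        MEM[(R[rs] + sign_ext(imm16)) & MASK] = R[rt]
    elif name == "BEQ":
        if R[rs] == R[rt]:
            # sign_ext(Imm16) followed by two zero bits is the offset times 4
            return (pc + 4 + (sign_ext(imm16) << 2)) & MASK
    else:
        raise ValueError(name)
    return (pc + 4) & MASK


R, MEM, pc = [0] * 32, {}, 0
pc = step(("ORI", 0, 1, 0, 0x00FF), R, MEM, pc)   # R[1] = 0xFF
pc = step(("STORE", 0, 1, 0, 8), R, MEM, pc)      # MEM[8] = R[1]
pc = step(("LOAD", 0, 2, 0, 8), R, MEM, pc)       # R[2] = MEM[8]
pc = step(("BEQ", 1, 2, 0, 0xFFFF), R, MEM, pc)   # taken, offset of -1 word
```

Every branch of `step` ends by producing a new PC, which matches the table: each register transfer is paired with a PC update.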
Steps 2 and 3: Datapath & Assembly
PC: a register
  Counter, counts by +4
  Provides address to Instruction Memory
[Figure: PC feeds the Read address of Instruction Memory, which outputs Instruction[31:0]; an adder computes PC + 4 and feeds it back to PC]
Steps 2 and 3: Datapath & Assembly
PC: a register
  Counter, counts by +4
  Sometimes, must add SignExtend{Imm16||b'00'} for branch instructions
Note: the sign-extender for Imm16 is already in the datapath (everything else is new)
[Figure: PC with a +4 adder; Instruction Memory outputs Instruction[31:0], split into Instruction[25:21], Instruction[20:16], Instruction[15:11], and Instruction[15:0] (Imm16); Sign/Zero Extend (16 → 32 bits, controlled by ExtOp) followed by Shift Left 2; a second adder produces the branch target (Add result); a multiplexer controlled by PCSrc selects the next PC]
Steps 2 and 3: Add Previous Datapath
[Figure: complete single-cycle datapath. PC, Instruction Memory, PC + 4 adder, branch adder with Shift Left 2, Register File (Read reg. 1, Read reg. 2, Write reg., Write data, Read data 1, Read data 2), Sign/Zero Extend (16 → 32), ALU (ALU result, Zero) with ALU Control fed by Instruction[5:0] (funct), and Data Memory (Address, Write data, Read data). Control signals: RegWrite, RegDst, ALUSrc, MemWrite, PCSrc, MemtoReg, ALUOp, ExtOp.]
What have we done?
  Created a simple CPU datapath
  Control still missing (next slide)
Single-cycle CPU
  Every instruction takes 1 clock cycle
  Clocking?
One Clock Cycle
Clock Locations
  PC, REGFILE have clocks
Operation
  On rising edge, PC will get new value
    Maybe REGFILE will have one value updated as well
  After rising edge
    PC and REGFILE can't change
    New value out of PC
    Instruction out of INSTRMEM
    Instruction selects registers to read from REGFILE
    Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc.
    ALU does its work
    DataMem may be read (depending on instruction)
    Result value goes back to REGFILE
    New PC value goes back to PC
    Await next clock edge
Lots to do in only 1 clock cycle!
Missing Steps?
Control is missing (Steps 4 and 5 we mentioned earlier)
  Generate the green signals
    ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc.
  These are all f(Instruction), where f() is a logic expression
  Will look at control strategies in upcoming lecture
Implementation Details
  How to implement REGFILE?
    Read port: tristate buffers? Multiplexer? Memory?
    Two read ports: two of above?
    Write port: how to write only 1 register?
  How to control writes to memory? To register file?
More instructions
  Shift instructions
  Jump instruction
  Etc.
1-Cycle CPU Datapath
[Figure: complete single-cycle datapath with PC, Instruction Memory, Register File, Sign/Zero Extend, Shift Left 2, two adders, ALU with ALU Control, and Data Memory; control signals RegWrite, RegDst, ALUSrc, MemWrite, PCSrc, MemtoReg, ALUOp, ExtOp]
1-cycle CPU Datapath + Control
[Figure: single-cycle datapath with a Control unit driven by Instruction[31:26] that generates RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; ALU control is driven by ALUOp and Instruction[5:0]; PCSrc selects the next PC]
1-cycle CPU Control – Lookup Table
  Input or Output  Signal Name  R-format  Lw  Sw  Beq
  Inputs           Op5          0         1   1   0
                   Op4          0         0   0   0
                   Op3          0         0   1   0
                   Op2          0         0   0   1
                   Op1          0         1   1   0
                   Op0          0         1   1   0
  Outputs          RegDst       1         0   X   X
                   ALUSrc       0         1   1   0
                   MemtoReg     0         1   X   X
                   RegWrite     1         1   0   0
                   MemRead      0         1   0   0
                   MemWrite     0         0   1   0
                   Branch       0         0   0   1
                   ALUOp1       1         0   0   0
                   ALUOp0       0         0   0   1
Also: I-type instructions (ORI) & ExtOp (sign-extend control), etc.
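Because every output is a function of the six opcode bits only, the table maps directly onto a dictionary lookup. The Python sketch below is one way to write it down, not the lecture's implementation: the name `CONTROL`, the use of `None` for don't-care (X) entries, and the packing of ALUOp1/ALUOp0 into a single 2-bit `ALUOp` value are all assumptions made for illustration.

```python
# One entry per column of the table, keyed by the opcode bits Op5..Op0.
# None stands for a don't-care (X) entry.
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0,
                   MemWrite=0, Branch=0, ALUOp=0b10),      # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1,
                   MemWrite=0, Branch=0, ALUOp=0b00),      # lw
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0, MemRead=0,
                   MemWrite=1, Branch=0, ALUOp=0b00),      # sw
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0, MemRead=0,
                   MemWrite=0, Branch=1, ALUOp=0b01),      # beq
}


def control(instr):
    """Return the control signals for a 32-bit instruction word."""
    opcode = (instr >> 26) & 0x3F   # Instruction[31:26]
    return CONTROL[opcode]


lw_signals = control(0x8C220004)    # an lw word (op = 100011)
beq_signals = control(0x10220001)   # a beq word (op = 000100)
```

Adding ORI or ExtOp would mean one more entry and one more key per entry, which is why a lookup table (or a small memory) is enough for the single-cycle controller.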
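1-cycle CPU + Jump Instruction
[Figure: single-cycle datapath and control extended for jumps; Instruction[25:0] and PC + 4 [31..28] are combined to form Jump address [31..0]; other labels: Instruction[31:26], Instruction[25:21], Instruction[20:16], Instruction[15:11], Instruction[15:0], Instruction[5:0]]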
1-cycle CPU Problems?
Every instruction 1 cycle
Some instructions "do more work"
  E.g., lw must read from DATAMEM
  All instructions must have same clock period…
  Many instructions run slower than necessary
Tricky timing on MemWrite, RegWrite(?) signals
  Write signal must come *after* address is stable
Need extra resources…
  PC+4 adder, ALU for BEQ instruction, DATAMEM+INSTRMEM
Performance!
Single-Cycle CPU Performance
  Execute one instruction per clock cycle (CPI=1)
  Clock cycle time? Note dataflow includes:
    INSTRMEM read
    REGFILE access
    Sign extension
    ALU operation
    DATAMEM read
    REGFILE/PC write
  Not every instruction uses all resources (e.g., DATAMEM read)
Can we change clock period for each instruction?
  No! (Why not?)
  One clock period: the worst case!
  This is why a single-cycle CPU is not good for performance
1-cycle CPU Datapath + Controller
[Figure: single-cycle datapath with controller and jump path; labels: Instruction[31:26], Instruction[25:0], PC + 4 [31..28], Jump address [31..0], Instruction[25:21], Instruction[20:16], Instruction[15:11], Instruction[15:0], Instruction[5:0]]
1-cycle CPU Summary
Operation
  1 cycle per instruction
  Control signals held fixed during entire cycle (except BRANCH)
  Only 2 registers
    PC, updated every clock cycle
    REGFILE, updated when required
  During clock cycle, data flows from register-outputs to register-inputs
  Fixed clock frequency / period
Performance
  1 instruction per cycle
  Slowest instruction determines clock frequency
Outstanding issue: MemWrite timing
  Assume this signal writes to memory at end of clock cycle
Multi-cycle CPU Goals
Improve performance
  Break each instruction into smaller steps / multiple cycles
    LW instruction → 5 cycles
    SW instruction → 4 cycles
    R-type instruction → 4 cycles
    Branch, Jump → 3 cycles
  Aim for 5x clock frequency
    Complex instructions (e.g., LW) → 5 cycles → same performance as before
    Simple instructions (e.g., ADD) → fewer cycles → faster
Save resources (gates/transistors)
  Re-use ALU over multiple cycles
  Put INSTR + DATA in same memory
MemWrite timing solved?
Multi-cycle CPU Datapath
Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)
Move signal paths (+4, Shift Left 2)
[Figure: multi-cycle datapath with PC, a single Memory (Address, Write data, MemData), Instruction Register, Memory Data Register, Registers (RdReg1, RdReg2, Write reg, Write data, RdData1, RdData2), A and B registers, Sign Extend, Shift Left 2, constant 4, ALU (ALU result, Zero), ALUOut, and multiplexers]
Multi-cycle CPU Datapath
Add registers + control signals (IR, MDR, A, B, ALUOut)
Registers with no control signal load value every clock cycle (e.g., PC)
[Figure: the same multi-cycle datapath, showing Instruction Register, Memory Data Register, A, B, and ALUOut]
Instruction Execution Example
Execute a "Load Word" instruction
  LW rt, 0(rs)
5 Steps
  1. Fetch instruction
  2. Read registers
  3. Compute address
  4. Read data
  5. Write registers
Load Word Instruction Sequence
1. Fetch Instruction
     InstructionRegister ← Mem[PC]
[Figure: multi-cycle datapath, highlighting Memory Address, MemData, and Instruction[15:0]]
Load Word Instruction Sequence
2. Read Registers
     A ← Registers[Rs]
[Figure: multi-cycle datapath, highlighting Instruction[25:21], RdReg1, and RdData1]
Load Word Instruction Sequence
3. Compute Address
     ALUOut ← A + SignExt(Imm16)
[Figure: multi-cycle datapath, highlighting A and ALUOut]
Load Word Instruction Sequence
4. Read Data
     MDR ← Memory[ALUOut]
[Figure: multi-cycle datapath, highlighting ALUOut, Memory Address, and MemData]
Load Word Instruction Sequence
5. Write Registers
     Registers[Rt] ← MDR
[Figure: multi-cycle datapath, highlighting Write reg and Write data]
Load Word Instruction Sequence
All 5 Steps Shown
[Figure: multi-cycle datapath with the paths of all five Load Word steps highlighted]
Multi-cycle Load Word: Recap
1. Fetch Instruction    InstructionRegister ← Mem[PC]
2. Read Registers       A ← Registers[Rs]
3. Compute Address      ALUOut ← A + {SignExt(Imm16)}
4. Read Data            MDR ← Memory[ALUOut]
5. Write Registers      Registers[Rt] ← MDR
Missing Steps?
Multi-cycle Load Word: Recap
1. Fetch Instruction    InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers       A ← Registers[Rs]
3. Compute Address      ALUOut ← A + {SignExt(Imm16)}
4. Read Data            MDR ← Memory[ALUOut]
5. Write Registers      Registers[Rt] ← MDR
Missing Steps?
  Must increment the PC
  Do it as part of the instruction fetch (in step 1)
  Need PCWrite control signal
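The five steps can be walked through in software, one temporary register per step. The Python sketch below is a behavioural illustration only: `run_lw` is a hypothetical helper, memory is a single dictionary holding both instructions and data (matching the goal of putting INSTR + DATA in the same memory), and each commented block stands for one clock cycle.

```python
def sign_ext(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_lw(mem, regs, pc):
    """Walk one LW through its five cycles; returns (new_pc, trace)."""
    trace = []
    # Cycle 1. Fetch instruction, and increment the PC in the same cycle
    ir = mem[pc]
    pc = pc + 4
    trace.append("fetch instruction")
    # Cycle 2. Read registers
    rs, rt, imm16 = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F, ir & 0xFFFF
    a = regs[rs]
    trace.append("read registers")
    # Cycle 3. Compute address
    alu_out = (a + sign_ext(imm16)) & 0xFFFFFFFF
    trace.append("compute address")
    # Cycle 4. Read data
    mdr = mem[alu_out]
    trace.append("read data")
    # Cycle 5. Write registers
    regs[rt] = mdr
    trace.append("write registers")
    return pc, trace


mem = {0: 0x8C220004, 0x104: 1234}   # an lw word (rt=2, rs=1, imm16=4) at address 0, data at 0x104
regs = [0] * 32
regs[1] = 0x100
new_pc, trace = run_lw(mem, regs, 0)
```

The local variables `ir`, `a`, `alu_out`, and `mdr` play the role of the added registers IR, A, ALUOut, and MDR: each holds a value produced in one cycle for use in a later one.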
Multi-cycle R-Type Instruction
1. Fetch Instruction    InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers       A ← Registers[Rs]; B ← Registers[Rt]
3. Compute Value        ALUOut ← A op B
4. Write Registers      Registers[Rd] ← ALUOut
RTL describes data flow action in each clock cycle
  Control signals determine precise data flow
  Each step implies unique control values
Multi-cycle R-Type Instruction: Control Signal Values
1. Fetch Instruction    InstructionRegister ← Mem[PC]; PC ← PC + 4
     MemRead=1, ALUSrcA=0, IorD=0, IRWrite, ALUSrcB=01, ALUop=00, PCWrite, PCSource=00
2. Read Registers       A ← Registers[Rs]; B ← Registers[Rt]
     ALUSrcA=0, ALUSrcB=11, ALUop=00
3. Compute Value        ALUOut ← A op B
     ALUSrcA=1, ALUSrcB=00, ALUop=10
4. Write Registers      Registers[Rd] ← ALUOut
     RegDst=1, RegWrite, MemtoReg=0
Each step implies unique control values
  Fixed for entire cycle
  "Default value" implied if unspecified
Check Your Work – Is RTL Valid?
1. Datapath check
  Within one cycle…
    Each cycle has valid data flow path (path exists)
    Each register gets only one new value
  Across multiple cycles…
    Register value is defined in a previous (earlier in time) clock cycle before it is used
      E.g., "A ← 3" must occur before "B ← A"
    Make sure register value doesn't disappear if set >1 cycle earlier
2. Control signal check
  Each cycle, RTL describing the datapath flow implies a value for each control signal
    0 or 1 or default or don't care
  Each control signal gets only one fixed value the entire cycle
3. Overall check
  Does the sequence of steps work?
Multi-cycle BEQ Instruction
1. Fetch Instruction
     InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers, Precompute Target
     A ← Registers[Rs]; B ← Registers[Rt]; ALUOut ← PC + {SignExt(Imm16), b'00'}
3. Compare Registers, Conditional Branch
     if ( (A – B) == 0 ) PC ← ALUOut
Green shows PC calculation flow (in parallel with other operations)
Multi-cycle Datapath with Control Signals
[Figure: multi-cycle datapath with jump path (Instr[25:0], PC[31..28], Jump address [31..0]) and control signals PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSrc, ALUOp, ALUSrcA, ALUSrcB, RegWrite, RegDst; ALU Control is fed by Instruction[5:0]]
Multi-cycle Datapath with Controller
[Figure: the same multi-cycle datapath with a controller driven by Instr[31:26]]
Multi-cycle CPU Control: Overview
General approach: Finite State Machine (FSM)
Need details in each branch of control…
  Precise outputs for each state (Mealy depends on inputs, Moore does not)
  Precise "next state" for each state (can depend on inputs)
[Figure: FSM overview; label: Control Signal Outputs]
How to Implement FSM?
Manually with logic gates + FFs
  Bubble diagram, next-state table, state assignment
  Karnaugh map for each state bit, each output bit (painful!)
High-level language description (e.g., Verilog, VHDL)
  Describe FSM bubble diagram (next-states, output values)
  Automatically synthesized into gates + FFs
Microcode (µ-code) description
  Sequence through many µ-ops for each CPU instruction
    One µ-op (µ-instruction) sends correct control signal for 1 cycle
    µ-op similar to one bubble in FSM
  Acts like a mini-CPU within a CPU
    µPC: microcode program counter
    Microcode storage memory contains µ-ops
  Can look similar to RTL or some new "assembly language"
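As an illustration of describing the bubble diagram as data (next states only, outputs omitted), the Python sketch below encodes a next-state table and counts the cycles each instruction class takes. The state names (`FETCH`, `DECODE`, `MEM_ADDR`, and so on) are hypothetical; the expected cycle counts (LW 5, SW 4, R-type 4, Branch and Jump 3) are the ones stated in the multi-cycle goals. A hardware description language would express the same table; Python is used here only to keep the sketch runnable.

```python
# Next state for each (state, instruction class); None means "any class".
NEXT = {
    ("FETCH", None): "DECODE",
    ("DECODE", "LW"): "MEM_ADDR",
    ("DECODE", "SW"): "MEM_ADDR",
    ("DECODE", "RTYPE"): "EXECUTE",
    ("DECODE", "BEQ"): "BRANCH",
    ("DECODE", "J"): "JUMP",
    ("MEM_ADDR", "LW"): "MEM_READ",
    ("MEM_ADDR", "SW"): "MEM_WRITE",
    ("MEM_READ", None): "REG_WRITE_MDR",
    ("EXECUTE", None): "REG_WRITE_ALU",
}


def next_state(state, iclass):
    """Look up the next state; the last state of each instruction returns to FETCH."""
    return NEXT.get((state, iclass), NEXT.get((state, None), "FETCH"))


def cycles(iclass):
    """Count clock cycles from FETCH until the FSM is back at FETCH."""
    state, n = "FETCH", 0
    while True:
        n += 1
        state = next_state(state, iclass)
        if state == "FETCH":
            return n


counts = {c: cycles(c) for c in ("LW", "SW", "RTYPE", "BEQ", "J")}
```

Only the `DECODE` and `MEM_ADDR` states look at the instruction class, which is where the diagram branches; every other state has a single successor.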
FSM Specification: Bubble Diagram
[Figure: bubble diagram of the control FSM]
Can build this by examining RTL
It is possible to automatically convert RTL into this form!
FSM: Gates + FFs Implementation
[Figure: FSM high-level organization]
FSM: Microcode Implementation
[Figure: microcode controller; labels: Microprogram Counter, Adder (+1), Address Select Logic, Microcode Storage (memory) with Inputs and Outputs, Datapath control outputs, Sequencing control, Inputs from instruction register opcode field]
Multi-cycle CPU with Control FSM
[Figure: multi-cycle datapath driven by the control FSM; labels: FSM Control Outputs, Conditional Branch, Instr[31:26], Instr[25:0], PC[31..28], Jump address [31..0]]
Control FSM: Overview
General approach: Finite State Machine (FSM)
Need details in each branch of control…
Detailed FSM
[Figure: FSM partitioned into Instruction Fetch, Memory Reference, R-Type, Branch, and Jump]
Detailed FSM: Instruction Fetch
Detailed FSM: Memory Reference
[Figure: LW and SW paths]
Detailed FSM: R-Type Instruction
Detailed FSM: Branch Instruction
Detailed FSM: Jump Instruction
Performance Comparison
Single-cycle CPU vs Multi-cycle CPU
Simple Comparison
  CPU               Instructions   Clock cycles
  Single-cycle CPU  All            1
  Multi-cycle CPU   LW             5
  Multi-cycle CPU   SW, R-type     4
  Multi-cycle CPU   BEQ, J         3
What's really happening? ( Load Word Instruction )
Ideally:
[Figure: timing of one Load Word on the single-cycle CPU (one long cycle) and on the multi-cycle CPU (steps Fetch, Decode, CalcAddr, Memory, Write)]
In practice, steps differ in speeds…
Load Word Instruction
[Figure: timing of Fetch, Decode, CalcAddr, Memory, Write on the single-cycle and multi-cycle CPUs; annotations: "Violation!" and "Wasted time!"]
Single-cycle vs Multi-cycle
LW instruction faster for single-cycle
[Figure: timing of Fetch, Decode, CalcAddr, Memory, Write on the single-cycle and multi-cycle CPUs; annotations: "Violation fixed!" and "Now wasted time is larger!"]
Single-cycle vs Multi-cycle
SW instruction ~ same speed
[Figure: timing of Fetch, Decode, CalcAddr, Memory on the single-cycle and multi-cycle CPUs; annotations: "Wasted time!" and "Speed diff"]
Single-cycle vs Multi-cycle
BEQ, J instruction faster for multi-cycle
[Figure: timing of Fetch, Decode, CalcAddr on the single-cycle and multi-cycle CPUs; annotations: "Wasted time!" and "Speed diff"]
Performance Summary
Which CPU implementation is faster?
  LW → single-cycle is faster
  SW, R-type → about the same
  BEQ, J → multi-cycle is faster
Real programs use a mix of these instructions
Overall performance depends on instruction frequency!
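The dependence on instruction frequency can be made concrete with a short calculation. In the Python sketch below, only the cycle counts come from the slides; the instruction mix and both clock periods are hypothetical numbers chosen for illustration, with the multi-cycle period set slightly above one fifth of the single-cycle period to reflect the time wasted when steps differ in speed.

```python
CYCLES = {"LW": 5, "SW": 4, "RTYPE": 4, "BEQ": 3, "J": 3}   # from the slides


def avg_cpi(mix):
    """Weighted average cycles per instruction; mix holds fractions summing to 1."""
    return sum(frac * CYCLES[name] for name, frac in mix.items())


def time_per_instr_single(period):
    """Single-cycle CPU: CPI = 1, so the average time is one clock period."""
    return 1 * period


def time_per_instr_multi(mix, period):
    """Multi-cycle CPU: average CPI times the (shorter) clock period."""
    return avg_cpi(mix) * period


# Hypothetical numbers, for illustration only.
mix = {"LW": 0.25, "SW": 0.10, "RTYPE": 0.45, "BEQ": 0.15, "J": 0.05}
single_period = 5.0   # ns, assumed: set by the slowest instruction (LW)
multi_period = 1.2    # ns, assumed: a little more than 1/5 of the above

single_time = time_per_instr_single(single_period)
multi_time = time_per_instr_multi(mix, multi_period)
lw_multi_time = CYCLES["LW"] * multi_period
```

With these assumed numbers the multi-cycle CPU is faster on average even though a lone LW is slower on it; a mix dominated by loads would reverse the outcome.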
Implementation Summary
Single-cycle CPU
  1 instruction per cycle (e.g., 1 MHz → 1 MIPS)
  No "wasted time" on most complex instruction
  Large wasted time on simpler instructions
  Simple controller (just a lookup table or memory)
  Simple instructions
Multi-cycle CPU
  << 1 instruction per cycle (e.g., 1 MHz → 0.2 MIPS)
  Small time wasted on most complex instruction
    Hence, this instruction always slower than single-cycle CPU
  Small time wasted on simple instructions
    Eliminates "large wasted time" by using fewer clock cycles
  Complex controller (FSM)
  Potential to create complex instructions