Computer Organization and Architecture
Chapter 5: The Processor:
Datapath and Control
Yu-Lun Kuo
Computer Sciences and Information Engineering
University of Tunghai, Taiwan
5.1 Introduction
• The performance of a machine is determined by three key factors
– Instruction count
– Clock cycle time
– Clock cycles per instruction (CPI)
• The compiler and the instruction set architecture
– Determine the instruction count required for a given program
5.1 Introduction
• Both the clock cycle time and the CPI
– Are determined by the implementation of the processor
• We construct the datapath and control unit for two different implementations of the MIPS instruction set
– Single cycle implementation
– Multi cycle implementation
5.1 Introduction
• We are going to see how the processor is implemented
– starting with a very simple processor, and adding some more complexity
Basic MIPS Implementation
• Include a subset of the MIPS instruction set
– Memory-reference instructions: lw and sw
– The ALU instructions: add, sub, and, or, slt
– Control flow instructions: beq and j
• Generic implementation
– Use the program counter (PC) to supply the instruction address
– Fetch the instruction from memory
– Read one or two registers
– Use the instruction to decide exactly what to do
Basic MIPS Implementation
• All instructions use the ALU after reading the registers (except jump)
– Memory-reference instructions use the ALU for address calculation
– Arithmetic-logical instructions use it for operation execution
– Branches use it for comparison
Our Processor, sort of…
What's missing
• How to combine inputs that are "joined" together?
• How to tell each component what to do?
Multiplexers and Controllers
• In the previous figure we have two or more “wires” going into the input of a component
– This is because, depending on the instruction being executed, a different input should be provided
• So, based on the instruction, we need to decide which input should be selected
• This is done with a multiplexer
[Figure: multiplexer (MUX) with inputs 1 to n, one selected output, and a control input of ceil(log2(n)) bits]
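The control width of a multiplexer can be checked with a few lines of Python. This is only a sketch: `select_bits` and `mux` are hypothetical helper names introduced for illustration, not part of the slides.

```python
def select_bits(n_inputs: int) -> int:
    """Number of control bits needed to choose among n_inputs: ceil(log2(n))."""
    if n_inputs < 1:
        raise ValueError("a multiplexer needs at least one input")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding.
    return (n_inputs - 1).bit_length()


def mux(inputs, control: int):
    """Model of a multiplexer: the control value picks which input is forwarded."""
    return inputs[control]
```

For example, a 2-input multiplexer needs one control bit, while choosing among 5 inputs needs three.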
What about the Control?
• So great, now we can control multiplexers
– We need a controller that sends the appropriate control bits to all the multiplexers and the components
• Besides, there are other things to control
– Example: the ALU has a set of control bits that tell it what to do:
2-bit control
00: ADD
01: SUB
10: MUL
11: SHIFT
Control Unit (Simplified)
[Figure: simplified control unit; labels: instruction register, PC, Add, offset, 4, and a MUX with input 0, input 1 and a 1-bit control (0 or 1)]
A More Complete Picture
5.2 Logic Design Conventions
• The functional units in the MIPS implementation consist of two different types of logic elements
– Elements that operate on data values (combinational)
» Outputs depend only on the current inputs
» Always produce the same output for the same input
» Have no internal storage
– Elements that contain state (sequential)
» Have at least two inputs and one output
» Input: the data value to be written into the element
» Input: the clock, which determines when the data value is written
» Output: the value that was written in a previous clock cycle
Clocking Methodology
• Clocking methodology
– Defines when signals can be read and when they can be written
» If a signal is written at the same time it is read, the value read could be the old value, the new value, or a mix of the two. Computer designs cannot tolerate such unpredictability
– The clock cycle/period is divided into two portions
» high clock
» low clock
[Figure: one clock cycle, with the rising edge and the falling edge marked]
Edge-triggered Clocking
• Edge-triggered clocking
– State changes (in state elements) occur only at a clock edge
– Uses either the rising edge or the falling edge
• Typical execution:
– Read contents of some state elements
– Send values through some combinational logic
– Write results to one or more state elements
[Figure: state element 1 feeds combinational logic, which feeds state element 2, all within one clock cycle]
The Clock
• In the above, we want to use the value in state element #1 to modify the value in state element #2; this takes one clock cycle
– All signals must have stabilized before the clock edge
[Figure: timing over one clock cycle: state element #1 is stable, the combinational circuit output is stable by the edge, and state element #2 is updated on the edge]
Read/Write in a Clock Cycle
• A great implication of edge-triggered clocking
– A state element can be read and written in the same clock cycle
– We will say things like: “reads happen in the first half of the clock cycle, writes happen in the second half”
[Figure: timing over one clock cycle: state element #1 is stable, the combinational circuit output is stable by the edge, and state element #2 is updated on the edge]
Write Control Signal (p.291)
• Both the clock signal and the write control signal are inputs
– The state element is changed only when
» The write control signal is asserted
» Clock edge occurs
– Assuming a rising-edge update:
» While the control bit stays at 0, nothing happens
» If we set the control bit to 1, the state element will be updated at the next rising edge
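The behaviour of a state element with a write control signal can be mimicked in software. The following is a minimal sketch assuming a rising-edge update; the class `StateElement` and its attribute names are hypothetical and chosen only for illustration.

```python
class StateElement:
    """Edge-triggered register with a write control signal (rising-edge update assumed)."""

    def __init__(self, value: int = 0):
        self.value = value      # what readers see during the current cycle
        self.data_in = value    # value waiting at the data input
        self.write = False      # write control signal

    def rising_edge(self) -> None:
        """Called once per clock edge; state changes only if write is asserted."""
        if self.write:
            self.value = self.data_in


reg = StateElement(5)
reg.data_in = reg.value + 1   # read the element and compute its next value in the same cycle
reg.rising_edge()             # write deasserted: nothing happens
held = reg.value
reg.write = True
reg.rising_edge()             # write asserted: the element is updated at this edge
updated = reg.value
```

Because the stored value changes only inside `rising_edge`, reading the element and preparing its next value in the same cycle does not create a race.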
Busses and bus width
• Many of the state elements and combinational elements take multi-bit inputs (often 32-bit inputs)
• The term “bus” refers to a wire that carries more than one bit
– multiple 1-bit wires, really
• We simply indicate the width of a bus with a number on the wire:
[Figure: busses labeled with widths 16 and 8, and a control signal]
Building a Datapath
• A datapath element is a unit in the processor that operates on or holds data
– instruction memory, data memory, register file, ALU, adders
• Let’s re-examine the datapath elements we only barely introduced earlier
Building a Datapath
• Start by looking at which datapath elements each instruction needs
– Also show their control signals
• Instruction memory
– Memory unit that stores the instructions of a program and supplies an instruction given an address
• Program counter (PC)
– 32-bit register that is written at the end of every clock cycle (so it does not need a write control signal)
• Adder
– Increments the PC to the address of the next instruction
– Combinational; can be built from an ALU wired to always add
The Three Elements
• Two state elements are needed to store and access instructions: the instruction memory and the PC
– The instruction memory only needs to provide read access
– Its output at any time reflects the contents of the location specified by the address input
• An adder is needed to compute the next instruction address (+4 bytes)
– An ALU wired to always perform an add
Fetching Instructions
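[Figure: instruction fetch datapath: the 32-bit PC drives the read address of the instruction memory, which outputs the 32-bit instruction; an adder computes PC + 4]
• The PC gets updated in one clock cycle because we use edge-triggered clocking
– The PC supplies the read address and the instruction is retrieved from instruction memory
– PC + 4 is latched into the PC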
Register File
• The processor’s 32 general-purpose registers
– Stored in a structure called register file
• Register file
– Collection of registers in which any register can be read or written by specifying the number of the register in the file
[Figure: register file with inputs Read register number 1, Read register number 2 and Write register (5 bits each), Write data (32 bits), the Write control signal and Clock; outputs Read data 1 and Read data 2 (32 bits each)]
Datapath: Instruction Store/Fetch & PC Increment
[Figure: the three elements used to store and fetch instructions and increment the PC: (a) instruction memory, (b) program counter, (c) adder; in the datapath the PC feeds the read address of the instruction memory and an adder adds 4 to the PC]
Animating the Datapath
Instruction <- MEM[PC]
PC <- PC + 4
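The two register-transfer statements above can be sketched in Python. This is an illustration under stated assumptions: the instruction memory is modelled as a dictionary keyed by byte address, and `fetch` is a hypothetical helper name.

```python
def fetch(instruction_memory: dict, pc: int):
    """One fetch step: Instruction <- MEM[PC]; PC <- PC + 4."""
    instruction = instruction_memory[pc]   # read port of the instruction memory
    next_pc = (pc + 4) & 0xFFFFFFFF        # adder wired to add 4; the PC is 32 bits wide
    return instruction, next_pc


# Two example words: add $t1, $t2, $t3 and lw $t0, 4($t1).
program = {0: 0x014B4820, 4: 0x8D280004}
first, pc_after_first = fetch(program, 0)
second, pc_after_second = fetch(program, pc_after_first)
```

In hardware both actions happen in the same clock cycle; the function returns the new PC rather than updating it in place to make that explicit.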
[Figure: the PC drives the ADDR input of the instruction memory, whose RD output is the instruction; an adder computes PC + 4]
What about R-type instructions
• These instructions take 3 registers as arguments:
– 1 output register
– 2 input registers
– Example: add $t1, $t2, $t3
» Reads $t2 and $t3 and writes $t1
• We need an input that carries the data to be written into the output register
– Typically comes from the ALU
• We need a Write signal to trigger the register write on the next clock edge
– A write at an arbitrary time during the clock cycle could lead to race conditions if that register is also read
Datapath: R-Type Instruction
[Figure: the two elements used to implement R-type instructions: (a) the register file, with three 5-bit register number inputs, a Write data input, Read data 1 and Read data 2 outputs and the RegWrite control; (b) the ALU, with a 3-bit ALU operation control, an ALU result output and a Zero output. In the datapath, Read data 1 and Read data 2 feed the ALU and the ALU result returns to Write data]
Register File and ALU
[Figure: register file and ALU. The three 5-bit register numbers are extracted from the 32-bit instruction code; Read data 1 and Read data 2 (32 bits each) feed the ALU, which has a 4-bit Operation control, a 32-bit result and a zero output; RegWrite controls the register write]
Add t1, t1, t2 (sketch)
[Figure: add t1, t1, t2 on the register file and ALU. The instruction supplies t1 and t2 as the read registers and t1 as the write register; RegWrite must be set only at the next edge]
Animating the Datapath (R-type)
add rd, rs, rt
R[rd] <- R[rs] + R[rt];
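As a rough illustration of the register transfer above, the sketch below pulls the R-type fields out of a 32-bit instruction word and performs the add. It assumes the standard MIPS field positions (op 31:26, rs 25:21, rt 20:16, rd 15:11, shamt 10:6, funct 5:0); the function names are hypothetical.

```python
def decode_r_type(instruction: int) -> dict:
    """Split a 32-bit R-type instruction word into its six fields."""
    return {
        "op": (instruction >> 26) & 0x3F,
        "rs": (instruction >> 21) & 0x1F,
        "rt": (instruction >> 16) & 0x1F,
        "rd": (instruction >> 11) & 0x1F,
        "shamt": (instruction >> 6) & 0x1F,
        "funct": instruction & 0x3F,
    }


def execute_add(regs: list, instruction: int) -> None:
    """R[rd] <- R[rs] + R[rt], wrapping to 32 bits like the hardware adder."""
    f = decode_r_type(instruction)
    regs[f["rd"]] = (regs[f["rs"]] + regs[f["rt"]]) & 0xFFFFFFFF


regs = [0] * 32
regs[10], regs[11] = 7, 35          # $t2 and $t3 are registers 10 and 11
execute_add(regs, 0x014B4820)       # encoding of add $t1, $t2, $t3
```

After the call, register 9 ($t1) holds the sum of registers 10 and 11.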
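[Figure: R-type datapath. The instruction fields are op, rs, rt, rd, shamt, funct; rs, rt and rd (5 bits each) drive RN1, RN2 and WN of the register file; RD1 and RD2 feed the ALU (3-bit Operation control, Zero output); the ALU result returns to WD under control of RegWrite]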
What about the Load/Store
• Ex. lw t1, offset(t2)
– The memory address is computed by adding the 16-bit signed offset to the input register
– The offset is 16 bits, but memory addresses are 32 bits
– Therefore, the offset must be sign-extended into a 32-bit value before being added to the input register
– The memory has both read and write control
» MemWrite control signal
» MemRead control signal
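Sign extension and the address calculation can be sketched as follows. This is a minimal illustration, with 32-bit values held as non-negative Python integers; `sign_extend16` and `effective_address` are hypothetical names.

```python
def sign_extend16(value: int) -> int:
    """Extend a 16-bit two's-complement value to 32 bits by copying bit 15 upward."""
    value &= 0xFFFF
    if value & 0x8000:               # negative offset: fill the upper 16 bits with ones
        return value | 0xFFFF0000
    return value


def effective_address(base: int, offset16: int) -> int:
    """Memory address for lw/sw: base register plus sign-extended offset, modulo 2**32."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF
```

With a base of 0x1000, an offset field of 0xFFFC (which encodes -4) yields the address 0x0FFC.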
Datapath: Load/Store Instruction
[Figure: the two additional elements used to implement loads and stores: (a) the data memory unit, with Address and Write data inputs, a Read data output and the MemRead and MemWrite controls; (b) the sign-extension unit (16 bits in, 32 bits out). In the datapath, the ALU adds Read data 1 and the sign-extended offset to form the address, Read data 2 supplies the memory Write data, and the memory Read data returns to the register file Write data]
Implementing Load/Store
[Figure: the data memory unit (32-bit Address, Write data and Read data; MemRead and MemWrite controls) and the sign-extension unit (16 bits to 32 bits)]
Implementing lw s1,offset(s2)
[Figure: lw s1, offset(s2). The instruction supplies s2 as the read register, s1 as the write register and the 16-bit offset; the sign-extended offset is added to s2 to form the memory address; MemRead is set, MemWrite is not set, and RegWrite is set on the next edge]
Animating the Datapath (Load)
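[Figure: load datapath. Instruction fields: op, rs, rt, offset/immediate; rs and rt (5 bits each) address the register file, the 16-bit offset is sign-extended to 32 bits, the ALU computes the address, and the memory output RD returns to the register file WD under RegWrite, with MemRead asserted]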
lw rt, offset(rs)
R[rt] <- MEM[R[rs]+s_extend(offset)];
Animating the Datapath (Store)
[Figure: store datapath. Instruction fields: op, rs, rt, offset/immediate; the ALU adds RD1 and the sign-extended 16-bit offset to form the memory address, and RD2 supplies the memory WD, with MemWrite asserted]
sw rt, offset(rs)
MEM[R[rs]+sign_extend(offset)] <- R[rt]
What about the Branch (beq)
• Two registers are compared
• To do a branch we must
– Compute the branch's target address based on its offset
– Decide whether the branch is taken or not taken
» Taken: if the operands are equal, the branch target address becomes the new PC: PC = (PC + 4) + 4 × (sign-extended offset field)
» Not taken: if the operands are not equal, PC = PC + 4 as usual
Branch Datapath
[Figure: branch datapath]
• No shift hardware is required for the shift left by 2: simply connect wires from input to output, each shifted left 2 bits
Animating the Datapath (branch)
beq rs, rt, offset
[Figure: branch datapath. Instruction fields: op, rs, rt, offset/immediate; the ALU compares RD1 and RD2 and drives Zero; the 16-bit offset is sign-extended, shifted left by 2 and added to PC + 4 from the instruction fetch datapath]
if (R[rs] == R[rt]) then PC <- PC+4 + (s_extend(offset) << 2)
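The next-PC selection for beq can be sketched in Python as below. It is an illustration only: the names are hypothetical, and 32-bit values are kept as non-negative integers masked to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Extend a 16-bit two's-complement value to 32 bits."""
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value


def next_pc_beq(pc: int, rs_val: int, rt_val: int, offset16: int) -> int:
    """Return the PC after a beq: branch target if the operands are equal, else PC + 4."""
    pc_plus_4 = (pc + 4) & MASK32
    zero = ((rs_val - rt_val) & MASK32) == 0          # the ALU subtracts and reports Zero
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & MASK32
    return target if zero else pc_plus_4
```

An offset field of 0xFFFF (that is, -1 word) on a taken branch at address 0x100 leads back to 0x100, since the target is computed relative to PC + 4.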
Putting It All Together
• The simplest design is one in which
– all instructions are executed in a single clock cycle
• In this case, every element of the datapath is used only once per clock cycle
– An element needed more than once in a cycle must be duplicated
– Here that means only a few extra adders
– And we need separate data and instruction memories
• Let's first put together the pieces for the R-type (ALU) instructions and the memory instructions, as they are quite similar.
Altogether (not quite)
• We "simply" add multiplexers for choosing between the datapath for the ALU instructions and the memory instructions
[Figure: combining the datapaths for R-type instructions and load/stores using two multiplexors]
Animating the Datapath: R-type Instruction
add rd, rs, rt
[Figure: combined R-type and load/store datapath with the ALUSrc and MemtoReg multiplexers, highlighting the path used by an R-type instruction]
Animating the Datapath: Load Instruction
lw rt, offset(rs)
[Figure: combined R-type and load/store datapath with the ALUSrc and MemtoReg multiplexers, highlighting the path used by a load]
Animating the Datapath: Store Instruction
sw rt, offset(rs)
[Figure: combined R-type and load/store datapath with the ALUSrc and MemtoReg multiplexers, highlighting the path used by a store]
Adding instruction fetch
[Figure: the combined datapath with the instruction fetch portion added: PC, instruction memory and the adder that computes PC + 4, in front of the register file, ALU, data memory and the ALUSrc and MemtoReg multiplexers]
• Separate instruction memory, as instruction and data reads occur in the same clock cycle
• Separate adder, as ALU operations and the PC increment occur in the same clock cycle
Complete Altogether
[Figure: the datapath with branch capability and another multiplexor added]
• The instruction address is either PC + 4 or the branch target address
• An extra adder is needed, as both adders operate in each cycle
• A new multiplexor selects between the two addresses
• Important note: in a single-cycle implementation, data cannot be stored during an instruction; it only moves through combinational logic
• Question: is the MemRead signal really needed? Think of RegWrite!
5.4 What now?
• At this point we've identified most of the components of an almost complete datapath for a very simple implementation of the MIPS ISA
• Let us now design the logic that makes it all work
– i.e., how we set the control signals
Datapath Executing add
add rd, rs, rt
[Figure: complete single-cycle datapath (instruction memory, register file, ALU, data memory, sign extension, shift left 2, the two adders, and the ALUSrc, MemtoReg and PCSrc multiplexers), highlighting the path used by add]
Datapath Executing lw
lw rt,offset(rs)
[Figure: complete single-cycle datapath, highlighting the path used by lw]
Datapath Executing sw
sw rt,offset(rs)
[Figure: complete single-cycle datapath, highlighting the path used by sw]
Datapath Executing beq
beq r1,r2,offset
[Figure: complete single-cycle datapath, highlighting the path used by beq]
Control Unit
• Let's go through the types of control signals that need to be generated
• An important set of signals is for the ALU
• Our ALU has a 4-bit control input:
ALU control lines   Function
0000                AND
0001                OR
0010                add
0110                subtract
0111                set on less than
1100                NOR
Controlling the ALU
• Depending on the instruction, the ALU will need to perform one of the first five functions (NOR is not needed for this subset)
– For load/store: the ALU needs to add
– For R-type instructions: depends on the 6-bit function field in the low-order bits of the instruction (remember Chapter 2)
– For branch: the ALU needs to subtract
Controlling the ALU
• We can generate the 4-bit ALU control using a small control unit that takes:
– 2 control bits called ALUOp
» add (00), subtract (01), determined by the function field (10)
– the instruction's 6-bit function field
Determining ALU Control Bits
Instruction opcode   ALUOp   Instruction operation   Funct field   Desired ALU action   ALU control input
lw                   00      load                    XXXXXX        add                  0010
sw                   00      store                   XXXXXX        add                  0010
beq                  01      branch                  XXXXXX        subtract             0110
R-type               10      add                     100000        add                  0010
R-type               10      subtract                100010        subtract             0110
R-type               10      and                     100100        and                  0000
R-type               10      or                      100101        or                   0001
R-type               10      set on less than        101010        set on less than     0111
(XXXXXX = don't care)
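The mapping in the table above can be written as a small Python function. This is a sketch of the table's behaviour rather than of gate-level logic; `alu_control` and `R_TYPE_FUNCT` are hypothetical names.

```python
# Funct field -> 4-bit ALU control input, for the R-type rows of the table.
R_TYPE_FUNCT = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # and
    0b100101: 0b0001,  # or
    0b101010: 0b0111,  # set on less than
}


def alu_control(alu_op: int, funct: int) -> int:
    """Return the 4-bit ALU control input for a 2-bit ALUOp and a 6-bit funct field."""
    if alu_op == 0b00:            # lw / sw: funct is a don't care, ALU adds
        return 0b0010
    if alu_op == 0b01:            # beq: funct is a don't care, ALU subtracts
        return 0b0110
    if alu_op == 0b10:            # R-type: the funct field decides
        return R_TYPE_FUNCT[funct & 0x3F]
    raise ValueError("ALUOp 11 is not used in this subset")
```

A hardware implementation would exploit the don't-care entries to shrink the logic; the dictionary lookup here only reproduces the input/output behaviour.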
Design ALU Control Unit
• Designing logic
– It is useful to create a truth table for the interesting combinations of the function code field and the ALUOp bits
– It can be optimized and then turned into gates
Truth table (the Operation column shows the low-order 3 bits of the ALU control input):
ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  Operation
0       0       X   X   X   X   X   X   010
X       1       X   X   X   X   X   X   110
1       X       X   X   0   0   0   0   010
1       X       X   X   0   0   1   0   110
1       X       X   X   0   1   0   0   000
1       X       X   X   0   1   0   1   001
1       X       X   X   1   0   1   0   111
The Three Instruction Classes
• R-type, load and store, and branch formats
– Need to add a multiplexor to select which field of the instruction is used to indicate the destination register
» Bit positions 20:16 (rt) for load
» Bit positions 15:11 (rd) for R-type instructions
Instruction formats (field and bit positions):
R-type         op = 0 [31:26]          rs [25:21]   rt [20:16]   rd [15:11]   shamt [10:6]   funct [5:0]
Load & store   op = 35 or 43 [31:26]   rs [25:21]   rt [20:16]   address [15:0]
Branch         op = 4 or 5 [31:26]     rs [25:21]   rt [20:16]   address [15:0]
New Control Signals
• RegDst: destination comes from rt vs. rd
• RegWrite: register should be written
• ALUSrc: ALU operand from register vs. instruction
• PCSrc: PC from adder vs. branch target
• MemRead: for lw
• MemWrite: for store
• MemtoReg: register write from ALU vs. memory
The Seven Control Signals
• RegDst
– Deasserted: the register destination number comes from the rt field (bits 20:16)
– Asserted: the register destination number comes from the rd field (bits 15:11)
• RegWrite
– Deasserted: none
– Asserted: the write register is written with the value on the write data input
• ALUSrc
– Deasserted: the second ALU operand comes from Read data 2
– Asserted: the second ALU operand is the sign-extended lower 16 bits of the instruction
• PCSrc
– Deasserted: the PC is replaced by the output of the adder that computes PC + 4
– Asserted: the PC is replaced by the output of the adder that computes the branch target
• MemRead
– Deasserted: none
– Asserted: data memory contents designated by the address are put on the Read data output
• MemWrite
– Deasserted: none
– Asserted: data memory contents designated by the address are replaced by the value on the write data input
• MemtoReg
– Deasserted: the value fed to the register write data input comes from the ALU
– Asserted: the value fed to the register write data input comes from the data memory
Instruction   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
R-format      1        0        0          1          0         0          0        1        0
lw            0        1        1          1          1         0          0        0        0
sw            X        1        X          0          0         1          0        0        0
beq           X        0        X          0          0         0          1        0        1
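The control table can be captured as a lookup keyed by opcode. The sketch below assumes the opcode values 0 (R-format), 35 (lw), 43 (sw) and 4 (beq) from the instruction formats, represents don't-care entries (X) as `None`, and derives PCSrc from Branch and the ALU Zero output; `CONTROL` and `pc_src` are hypothetical names.

```python
CONTROL = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
             MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),      # R-format
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),      # lw
    43: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),      # sw (None = don't care)
    4:  dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),      # beq (None = don't care)
}


def pc_src(opcode: int, zero: bool) -> bool:
    """PCSrc depends on the opcode (Branch) and on the ALU Zero output."""
    return bool(CONTROL[opcode]["Branch"]) and zero
```

Only beq with a true Zero output selects the branch target; every other combination keeps PC + 4.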
[Figure: the single-cycle MIPS datapath with the control unit. The control signals (RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite) are determined from the instruction opcode, Instruction [31-26]; Instruction [5-0] feeds the ALU control]
• PCSrc cannot be set directly from the opcode: the outcome of the zero test is required
Control Signals: R-Type Instruction
[Figure: complete single-cycle datapath annotated with the control signal values for an R-type instruction (shown in blue); the ALU operation value depends on funct. The RegDst multiplexer selects between rt = I[20:16] and rd = I[15:11]; rs = I[25:21]; immediate/offset = I[15:0]]
Control Signals: lw Instruction
[Figure: complete single-cycle datapath annotated with the control signal values for lw (shown in blue); the ALU operation is 010 (add)]
Control Signals: sw Instruction
[Figure: complete single-cycle datapath annotated with the control signal values for sw (shown in blue); RegDst and MemtoReg are don't cares (X) and the ALU operation is 010 (add)]
Control Signals: beq Instruction
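[Figure: complete single-cycle datapath annotated with the control signal values for beq (shown in blue); RegDst and MemtoReg are don't cares (X), the ALU operation is 110 (subtract), and PCSrc is 1 if Zero = 1]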
Single-Cycle Design Problems (p.314)
• Assuming a fixed-period clock, every instruction uses one clock cycle, which implies
– CPI = 1
– Cycle time is determined by the length of the longest instruction path (load)
» But several instructions could run in a shorter clock cycle: waste of time
– Resources used more than once in the same cycle need to be duplicated
» Waste of hardware and chip area
Performance of Single-Cycle
• Assume the following functional unit delays
– Memory units: 200 ps
– ALU and adders: 100 ps
– Register file (read or write): 50 ps
– Multiplexors, control unit, PC accesses, sign extension, wires: no delay
• Assume the instruction mix as follows
– all loads take the same time and comprise 25%
– all stores take the same time and comprise 10%
– R-format instructions comprise 45%
– branches comprise 15%
– jumps comprise 5%
• Compare the performance of
– (a) a single-cycle implementation using a fixed-period clock with
– (b) one using a variable-period clock where each instruction executes in one clock cycle that is only as long as it needs to be (not really practical, but pretend it's possible!)
Solution (1/3)
• CPU time = Instruction_count × CPI × clock_cycle
• CPU time = Instruction_count × clock_cycle (since CPI = 1)
– We need only find the clock cycle time, since instruction count and CPI are the same for both implementations
Instruction class   Functional units used by the instruction class
R-type              Inst. fetch, Reg. access, ALU, Reg. access
Load word           Inst. fetch, Reg. access, ALU, Memory access, Reg. access
Store word          Inst. fetch, Reg. access, ALU, Memory access
Branch              Inst. fetch, Reg. access, ALU
Jump                Inst. fetch
Solution (2/3)
• Machine with a single clock for all instructions
– The clock cycle is determined by the longest instruction: 600 ps
• Machine with a variable clock
– Find the average clock cycle length
– 400 × 45% + 600 × 25% + 550 × 10% + 350 × 15% + 200 × 5% = 447.5 ps
» It is clearly faster
Instruction class   Inst. memory   Reg. read   ALU operation   Data memory   Reg. write   Total
R-type              200            50          100             0             50           400 ps
Load word           200            50          100             200           50           600 ps
Store word          200            50          100             200                        550 ps
Branch              200            50          100             0                          350 ps
Jump                200                                                                   200 ps
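The comparison can be reproduced with a short calculation. The sketch below takes the per-class totals from the table and the instruction mix from the problem statement; the variable names are chosen for illustration only.

```python
# Total delay per instruction class, in picoseconds (from the table above).
DELAY_PS = {"R-type": 400, "Load word": 600, "Store word": 550, "Branch": 350, "Jump": 200}
# Instruction mix in percent (from the problem statement).
MIX_PERCENT = {"R-type": 45, "Load word": 25, "Store word": 10, "Branch": 15, "Jump": 5}

# Fixed-period clock: every instruction pays for the slowest class.
fixed_clock_ps = max(DELAY_PS.values())

# Variable-period clock: average of the class delays weighted by the mix.
variable_clock_ps = sum(DELAY_PS[c] * MIX_PERCENT[c] for c in DELAY_PS) / 100

# Instruction count and CPI are equal for both machines, so the ratio of
# clock cycle times is the performance ratio.
speedup = fixed_clock_ps / variable_clock_ps
```

This gives a fixed clock of 600 ps, an average variable clock of 447.5 ps, and a ratio of about 1.34.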
Solution (3/3)
• Unfortunately, implementing a variable-speed clock for each instruction class is extremely difficult
– Overhead for such an approach could be larger than any advantage gained
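\[
\frac{\text{CPU Performance}_{\text{variable clock}}}{\text{CPU Performance}_{\text{single clock}}}
= \frac{\text{CPU Execution Time}_{\text{single clock}}}{\text{CPU Execution Time}_{\text{variable clock}}}
= \frac{IC \times \text{CPU Clock Cycle}_{\text{single clock}}}{IC \times \text{CPU Clock Cycle}_{\text{variable clock}}}
= \frac{\text{CPU Clock Cycle}_{\text{single clock}}}{\text{CPU Clock Cycle}_{\text{variable clock}}}
= \frac{600}{447.5} = 1.34
\]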
Example: Practice
• Consider a machine with an additional floating point unit. Assume functional unit delays as follows
– memory: 2 ns, ALU and adders: 2 ns, FPU add: 8 ns, FPU multiply: 16 ns, register file access (read or write): 1 ns
– multiplexors, control unit, PC accesses, sign extension, wires: no delay
• Assume the instruction mix as follows
– all loads take the same time and comprise 31%
– all stores take the same time and comprise 21%
– R-format instructions comprise 27%
– branches comprise 5%
– jumps comprise 2%
– FP adds and subtracts take the same time and together comprise 7%
– FP multiplies and divides take the same time and together comprise 7%
• Compare the performance of (a) a single-cycle implementation using a fixed-period clock with (b) one using a variable-period clock where each instruction executes in one clock cycle that is only as long as it needs to be (not really practical, but pretend it's possible!)
Solution
• Clock period for fixed-period clock = longest instruction time = 20 ns
• Average clock period for variable-period clock
• = 8 × 31% + 7 × 21% + 6 × 27% + 5 × 5% + 2 × 2% + 20 × 7% + 12 × 7%
= 8.1 ns
• Therefore, performance(variable-period) / performance(fixed-period) = 20 / 8.1 ≈ 2.5
Instruction class   Instr. mem.   Register read   ALU oper.   Data mem.   Register write   FPU add/sub   FPU mul/div   Total time (ns)
Load word           2             1               2           2           1                                            8
Store word          2             1               2           2                                                        7
R-format            2             1               2           0           1                                            6
Branch              2             1               2                                                                    5
Jump                2                                                                                                  2
FP mul/div          2             1                                       1                              16            20
FP add/sub          2             1                                       1                8                           12
5.5 Multi-Cycle Implementation
• The idea of a multi-cycle implementation is to have the functional units plus a set of additional registers
– to hold important values in between the cycles of a single instruction
• This way a functional unit can be shared between cycles of the same instruction
– provided some multiplexers are added to decide where the input should come from
– This sharing can help reduce the amount of hardware required
Multi-cycle Design
• Major advantages
– Instructions can take different numbers of clock cycles
– Functional units can be shared within the execution of a single instruction
• Compared with the single-cycle version
– A single memory unit is used for both instructions and data
– A single ALU (rather than an ALU and two adders)
– One or more registers are added after every functional unit to hold its output
» Until the value is used in a subsequent clock cycle
Multi-cycle Design
• The clock cycle can accommodate at most one of the following operations
– Memory access
– Register file access (two reads or one write)
– ALU operation
• So, data produced by one of these three functional units must be saved
– Into a temporary register for use on a later cycle
Temporary Register
• Instruction register (IR)
– Saves the output of the memory for an instruction read
• Memory data register (MDR)
– Saves the output of the memory for a data read
• A and B registers
– Hold the register operand values read from the register file
• ALUOut register
– Holds the output of the ALU
Multi-cycle vs. single-cycle
• Single memory for data and instructions
• Single ALU, no extra adders
• Extra registers to hold data between clock cycles
[Figure: single-cycle datapath: PC, instruction memory, register file, ALU, data memory, sign extend, shift left 2, two adders and three multiplexors, with the RegWrite, MemRead, MemWrite, PCSrc, ALUSrc and MemtoReg controls]
[Figure: multicycle datapath (high-level view): PC, a single memory for instructions or data, instruction register, memory data register, register file, A and B registers, ALU and ALUOut]
Multicycle Datapath
Basic multicycle MIPS datapath handles R-type instructions and load/stores:
new internal register in red ovals, new multiplexors in blue ovals
Breaking Instructions into Steps
• Our goal is to break up the instructions into steps so that
– Each step takes one clock cycle
– The amount of work to be done in each step/cycle is about equal
– Each cycle uses each major functional unit at most once, so that such units do not have to be replicated
– Functional units can be shared between different cycles within one instruction
• Data produced at the end of one cycle and used in the next must be stored!
Breaking Instructions into Steps
• For MIPS, we can think of an instruction as running in five one-cycle steps
1. Instruction fetch and PC increment (IF)
2. Instruction decode and register fetch (ID)
3. Execution, memory address computation, or branch completion (EX)
4. Memory access or R-type instruction completion (MEM)
5. Memory read completion (WB)
– Each MIPS instruction takes from 3 to 5 cycles (steps)
Step 1: Instruction Fetch & PC Increment (IF)
IR = Memory[PC]; PC = PC + 4;
• Use PC to get instruction and write the instruction into instruction register (IR)
• Increment the PC by 4 and put the result back in the PC
– The new value of the PC is not visible until the next clock cycle (stored into ALUOut)
• In this step we don’t know yet what the instruction does
Step 2: Instruction Decode and Register Fetch (ID)
• Read registers rs and rt in case we need them
– Read them from the register file and store the values into the temporary register A and B
• Compute the branch address with the ALU and save it in a temporary register
A = Reg[IR[25-21]];
B = Reg[IR[20-16]];
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
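The field extraction and the speculative branch-target computation can be sketched as follows. The helper names `sign_extend16` and `decode` are illustrative assumptions, and `pc` is taken to be the already incremented PC from step 1.

```python
def sign_extend16(value):
    """Sign-extend the low 16 bits of value to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def decode(ir, pc, regs):
    """Step 2 (ID): read rs and rt, and precompute the branch target."""
    rs = (ir >> 21) & 0x1F                      # IR[25-21]
    rt = (ir >> 16) & 0x1F                      # IR[20-16]
    imm = sign_extend16(ir)                     # IR[15-0]
    a = regs[rs]
    b = regs[rt]
    alu_out = (pc + (imm << 2)) & 0xFFFFFFFF    # pc already holds PC + 4
    return a, b, alu_out

# beq $t0, $zero, +3 words: opcode 4, rs = 8 ($t0), rt = 0 ($zero), immediate = 3
ir = (4 << 26) | (8 << 21) | (0 << 16) | 3
regs = [0] * 32
regs[8] = 7
a, b, alu_out = decode(ir, pc=0x00401014, regs=regs)
print(a, b, hex(alu_out))
```

The target is computed before the instruction is known to be a branch; if it is not a branch, ALUOut is simply overwritten in step 3.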
Step 3: Execution, Address Computation or Branch Completion (EX)
• The action taken depends on the instruction class
– Memory reference (lw and sw, [rs]+offset)
ALUOut = A + sign-extend(IR[15-0]);
– Arithmetic-logical instruction (R-type)
ALUOut = A op B;
– Branch (test whether A - B = 0)
if (A == B) PC = ALUOut;
– Jump
PC = PC[31-28] || (IR[25-0] << 2);
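The four cases of step 3 can be sketched as one Python function. The function names, the string tags for the instruction classes, and the `op` callback for the R-type operation are all assumptions made for this illustration.

```python
MASK = 0xFFFFFFFF

def jump_target(pc, ir):
    """PC[31-28] || (IR[25-0] << 2), where pc is the already incremented PC."""
    return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)

def execute(instr_class, a, b, imm, pc, alu_out, ir=0, op=None):
    """Step 3 (EX): return the pair (new PC, new ALUOut)."""
    if instr_class == "mem":        # lw / sw: address = [rs] + offset
        return pc, (a + imm) & MASK
    if instr_class == "rtype":      # ALUOut = A op B
        return pc, op(a, b) & MASK
    if instr_class == "beq":        # taken only when A - B == 0
        return (alu_out if a == b else pc), alu_out
    if instr_class == "j":
        return jump_target(pc, ir), alu_out
    raise ValueError(f"unknown instruction class: {instr_class}")

mem_result = execute("mem", a=0x10010000, b=0, imm=8, pc=4, alu_out=0)
taken = execute("beq", a=5, b=5, imm=0, pc=0x14, alu_out=0x20)
not_taken = execute("beq", a=5, b=6, imm=0, pc=0x14, alu_out=0x20)
jumped = execute("j", a=0, b=0, imm=0, pc=0x00401020, alu_out=0,
                 ir=(2 << 26) | 0x00100409)
```

Branches and jumps finish here because the only state they change is the PC, which is why they take three cycles.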
Step 4: Memory access or R-type Instruction Completion (MEM)
• A load or store instruction accesses memory; an arithmetic-logical instruction writes its result
– If the instruction is a load
» The value retrieved from memory is stored into the memory data register (MDR)
– If the instruction is a store
» Data is written to memory
– If the instruction is an R-type instruction
» The ALU result held in the temporary register ALUOut is written to register rd
Step 5: Memory Read Completion (WB)
• Loads complete by writing back the value from memory
– The load data, which was stored into MDR in the previous step, is written back into register rt
Reg[IR[20-16]] = MDR;
Summary of Instruction Execution
• Step 1 (IF): instruction fetch
– All instructions: IR = Memory[PC]; PC = PC + 4
• Step 2 (ID): instruction decode/register fetch
– All instructions: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2)
• Step 3 (EX): execution, address computation, branch/jump completion
– R-type: ALUOut = A op B
– Memory reference: ALUOut = A + sign-extend(IR[15-0])
– Branch: if (A == B) then PC = ALUOut
– Jump: PC = PC[31-28] || (IR[25-0] << 2)
• Step 4 (MEM): memory access or R-type completion
– R-type: Reg[IR[15-11]] = ALUOut
– Load: MDR = Memory[ALUOut]
– Store: Memory[ALUOut] = B
• Step 5 (WB): memory read completion
– Load: Reg[IR[20-16]] = MDR
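The summary can be turned into a small step-by-step interpreter. The sketch below is a behavioural model only, not the hardware: a single dictionary plays the role of the shared memory, all function and variable names are hypothetical, and the guard that keeps $zero at 0 is standard MIPS behaviour rather than something stated on the slides.

```python
MASK = 0xFFFFFFFF

def sign_extend16(v):
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def signed32(v):
    return v - (1 << 32) if v & 0x80000000 else v

# R-type operations keyed by the funct field IR[5-0]
ALU_OPS = {
    0x20: lambda a, b: a + b,                           # add
    0x22: lambda a, b: a - b,                           # sub
    0x24: lambda a, b: a & b,                           # and
    0x25: lambda a, b: a | b,                           # or
    0x2A: lambda a, b: int(signed32(a) < signed32(b)),  # slt
}

def run_instruction(pc, regs, mem):
    """Run one instruction through the steps; return (new_pc, cycles_used)."""
    # Step 1 (IF)
    ir = mem[pc]
    pc = (pc + 4) & MASK
    # Step 2 (ID)
    opcode = ir >> 26
    rs, rt, rd = (ir >> 21) & 31, (ir >> 16) & 31, (ir >> 11) & 31
    imm = sign_extend16(ir)
    a, b = regs[rs], regs[rt]
    alu_out = (pc + (imm << 2)) & MASK
    # Step 3 (EX)
    if opcode == 2:                                     # j
        return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2), 3
    if opcode == 4:                                     # beq
        return (alu_out if a == b else pc), 3
    if opcode == 0:                                     # R-type
        alu_out = ALU_OPS[ir & 0x3F](a, b) & MASK
        if rd != 0:                                     # Step 4; $zero stays 0
            regs[rd] = alu_out
        return pc, 4
    if opcode in (35, 43):                              # lw / sw
        alu_out = (a + imm) & MASK
        if opcode == 43:                                # Step 4 (sw)
            mem[alu_out] = b
            return pc, 4
        mdr = mem[alu_out]                              # Step 4 (lw)
        if rt != 0:
            regs[rt] = mdr                              # Step 5 (lw)
        return pc, 5
    raise ValueError(f"unsupported opcode {opcode}")

# Demo: load a word, double it, store it back one word later.
S0, T0, T1 = 16, 8, 9
mem = {
    0: (35 << 26) | (S0 << 21) | (T0 << 16) | 0,        # lw  $t0, 0($s0)
    4: (T0 << 21) | (T0 << 16) | (T1 << 11) | 0x20,     # add $t1, $t0, $t0
    8: (43 << 26) | (S0 << 21) | (T1 << 16) | 4,        # sw  $t1, 4($s0)
    0x100: 21,
}
regs = [0] * 32
regs[S0] = 0x100
pc, total_cycles = 0, 0
for _ in range(3):
    pc, cycles = run_instruction(pc, regs, mem)
    total_cycles += cycles
print(mem[0x104], total_cycles)
```

The cycle count returned for each class (3, 4, or 5) follows directly from the last step in which that class still has work to do.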
The schematic view
• IF: uses the memory
• ID: uses the register file
• EX: uses the ALU
• Mem: uses the memory
• WB: uses the register file
Very important to remember the content of this slide
Multicycle Execution Step (1): Instruction Fetch
IR = Memory[PC];
PC = PC + 4;
[Figure: multicycle datapath (PC, Memory, IR, MDR, Registers, A, B, ALU, ALUOut) for this step; labeled values: 4, PC + 4.]
Multicycle Execution Step (2): Instruction Decode & Register Fetch
A = Reg[IR[25-21]]; (A = Reg[rs])
B = Reg[IR[20-16]]; (B = Reg[rt])
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
[Figure: multicycle datapath for this step; labeled values: Reg[rs], Reg[rt], PC + 4, branch target address.]
Multicycle Execution Step (3): Memory Reference Instructions
ALUOut = A + sign-extend(IR[15-0]);
[Figure: multicycle datapath for this step; labeled values: Reg[rs], Reg[rt], PC + 4, memory address.]
Multicycle Execution Step (3): ALU Instruction (R-Type)
ALUOut = A op B;
[Figure: multicycle datapath for this step; labeled values: Reg[rs], Reg[rt], PC + 4, R-type result.]
Multicycle Execution Step (3): Branch Instructions
if (A == B) PC = ALUOut;
[Figure: multicycle datapath for this step; labeled values: Reg[rs], Reg[rt], branch target address.]
Multicycle Execution Step (3): Jump Instruction
PC = PC[31-28] concat (IR[25-0] << 2);
[Figure: multicycle datapath for this step; labeled values: Reg[rs], Reg[rt], branch target address, jump address.]
Multicycle Execution Step (4): Memory Access - Read (lw)
MDR = Memory[ALUOut];
[Figure: multicycle datapath for this step; labeled values: PC + 4, Reg[rs], Reg[rt], memory address, memory data.]
Multicycle Execution Step (4): Memory Access - Write (sw)
Memory[ALUOut] = B;
[Figure: multicycle datapath for this step; labeled values: PC + 4, Reg[rs], Reg[rt].]
Multicycle Execution Step (4): ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOut;
[Figure: multicycle datapath for this step; labeled values: Reg[rs], Reg[rt], PC + 4, R-type result.]
Multicycle Execution Step (5): Memory Read Completion (lw)
Reg[IR[20-16]] = MDR;
[Figure: multicycle datapath for this step; labeled values: PC + 4, Reg[rs], Reg[rt], memory address, memory data.]
Multicycle Datapath with Control I
[Figure: multicycle datapath with control lines and the ALU control block added; not all control lines are shown. Control lines visible: IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB, ALUOp.]
Multicycle Datapath with Control II
Complete multicycle MIPS datapath (with branch and jump capability), showing the main control block and all control lines
[Figure: complete multicycle datapath. The control block takes Op[5-0] = Instruction[31-26] as input; its outputs are PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, and RegDst. Annotations: new multiplexor; new gates; for the jump address, Instruction[25-0] is shifted left 2 (26 to 28 bits) and combined with PC[31-28] to form Jump address [31-0].]
Action of the Control Signals
• Action of the 1-bit control signals
– RegDst, RegWrite
– ALUSrcA
– MemRead, MemWrite, MemtoReg
– IorD
– IRWrite
– PCWrite, PCWriteCond
• Action of the 2-bit control signals
– ALUOp
– ALUSrcB
– PCSource
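One way to see how these signals are used is to tabulate them per control state. The sketch below encodes the usual textbook finite-state controller for this multicycle datapath; the state names are invented for illustration, the values are an assumption to be checked against the annotated figures that follow, and signals not listed for a state are modelled as 0 even where the hardware treats them as don't-cares.

```python
ALL_SIGNALS = ("PCWrite", "PCWriteCond", "IorD", "MemRead", "MemWrite",
               "IRWrite", "MemtoReg", "PCSource", "ALUOp", "ALUSrcB",
               "ALUSrcA", "RegWrite", "RegDst")

# Control outputs per state (2-bit signals are written as small integers).
CONTROL = {
    "fetch":      dict(MemRead=1, IRWrite=1, IorD=0, ALUSrcA=0, ALUSrcB=1,
                       ALUOp=0, PCWrite=1, PCSource=0),
    "decode":     dict(ALUSrcA=0, ALUSrcB=3, ALUOp=0),
    "mem_addr":   dict(ALUSrcA=1, ALUSrcB=2, ALUOp=0),
    "mem_read":   dict(MemRead=1, IorD=1),
    "mem_write":  dict(MemWrite=1, IorD=1),
    "load_wb":    dict(RegDst=0, RegWrite=1, MemtoReg=1),
    "r_execute":  dict(ALUSrcA=1, ALUSrcB=0, ALUOp=2),
    "r_complete": dict(RegDst=1, RegWrite=1, MemtoReg=0),
    "branch":     dict(ALUSrcA=1, ALUSrcB=0, ALUOp=1, PCWriteCond=1, PCSource=1),
    "jump":       dict(PCWrite=1, PCSource=2),
}

# States visited after fetch and decode, keyed by opcode.
OPCODE_PATHS = {
    35: ["mem_addr", "mem_read", "load_wb"],   # lw
    43: ["mem_addr", "mem_write"],             # sw
    0:  ["r_execute", "r_complete"],           # R-type
    4:  ["branch"],                            # beq
    2:  ["jump"],                              # j
}

def states_for(opcode):
    """Sequence of control states for one instruction."""
    return ["fetch", "decode"] + OPCODE_PATHS[opcode]

def signals(state):
    """Full signal vector for a state; unlisted signals default to 0."""
    out = dict.fromkeys(ALL_SIGNALS, 0)
    out.update(CONTROL[state])
    return out

for opcode in (35, 43, 0, 4, 2):
    print(opcode, states_for(opcode))
```

The length of each state sequence equals the number of clock cycles the instruction takes.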
Multicycle Control Step (1): Fetch
IR = Memory[PC];
PC = PC + 4;
[Figure: multicycle datapath annotated with the control-signal values for this step.]
Multicycle Control Step (2): Instruction Decode & Register Fetch
A = Reg[IR[25-21]]; (A = Reg[rs])
B = Reg[IR[20-16]]; (B = Reg[rt])
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
[Figure: multicycle datapath annotated with the control-signal values for this step.]
Multicycle Control Step (3): Memory Reference Instructions
ALUOut = A + sign-extend(IR[15-0]);
[Figure: multicycle datapath annotated with the control-signal values for this step.]
Multicycle Control Step (3): ALU Instruction (R-Type)
ALUOut = A op B;
[Figure: multicycle datapath annotated with the control-signal values for this step.]
Multicycle Control Step (3): Branch Instructions
if (A == B) PC = ALUOut;
[Figure: multicycle datapath annotated with the control-signal values for this step; the PC write enable is annotated "1 if Zero=1".]
Multicycle Execution Step (3): Jump Instruction
PC = PC[31-28] concat (IR[25-0] << 2);
[Figure: multicycle datapath annotated with the control-signal values for this step.]
Multicycle Control Step (4): Memory Access - Read (lw)
MDR = Memory[ALUOut];
[Figure: multicycle datapath annotated with the control-signal values for this step.]
Multicycle Execution Step (4): Memory Access - Write (sw)
Memory[ALUOut] = B;
[Figure: multicycle datapath annotated with the control-signal values for this step.]
Multicycle Control Step (4): ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOut; (Reg[rd] = ALUOut)
[Figure: multicycle datapath annotated with the control-signal values for this step.]
Multicycle Execution Step (5): Memory Read Completion (lw)
Reg[IR[20-16]] = MDR;
[Figure: multicycle datapath annotated with the control-signal values for this step.]
CPI in a Multicycle CPU
• What is the CPI, assuming each step requires 1 clock cycle?
– An instruction mix of 25% loads, 10% stores, 11% branches, 2% jumps, and 52% ALU
– Solution:
» Number of clock cycles for each instruction class (from the previous slide):
• loads 5, stores 4, ALU 4, branches 3, jumps 3
– CPI = CPU clock cycles / instruction count
\[
\begin{aligned}
\mathrm{CPI} &= \frac{\sum_i \left(\text{instruction count}_i \times \mathrm{CPI}_i\right)}{\text{instruction count}} \\
&= \sum_i \frac{\text{instruction count}_i}{\text{instruction count}} \times \mathrm{CPI}_i \\
&= 0.25 \times 5 + 0.10 \times 4 + 0.52 \times 4 + 0.11 \times 3 + 0.02 \times 3 \\
&= 4.12
\end{aligned}
\]
– Better than the worst-case CPI of 5.0
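The same weighted average can be checked with a few lines of Python. The dictionary names `mix` and `cycles` are just labels chosen for this sketch; the numbers are the ones given in the slide.

```python
# Fraction of executed instructions in each class
mix = {"loads": 0.25, "stores": 0.10, "ALU": 0.52, "branches": 0.11, "jumps": 0.02}
# Clock cycles needed by each class in the multicycle design
cycles = {"loads": 5, "stores": 4, "ALU": 4, "branches": 3, "jumps": 3}

# CPI is the mix-weighted average of the per-class cycle counts
cpi = sum(mix[c] * cycles[c] for c in mix)
print(f"CPI = {cpi:.2f}")  # CPI = 4.12
```

Changing the mix shows how sensitive the result is to loads: they are the only class that needs all five cycles.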
Conclusion
• If instructions take different amounts of time, multi-cycle is better
• We haven’t dived into the gory details of implementing a multi-cycle processor
– What we have covered corresponds to Sections 5.1, 5.2, 5.3, 5.4, and a small subset of Section 5.5
– This is all you need to read in the book
» Don’t worry about most of the material in Section 5.5
• We are now ready to talk about our “big” topic: Pipelining
• Q & A
– Chapter 5: Datapath and Control
– Single-Cycle Implementation vs. Multi-Cycle Implementation
» MIPS Instruction types and formats
» What is Datapath? What are the datapath elements of MIPS?
» What are the five steps of MIPS datapath?
– Control unit design
» What are the two kinds of control unit design? Describe their implementations and compare them.
– Exception and Interrupt
» Definitions
» Operations
Example
Assume the base address of word array A is stored in register $s0. The following code computes A[2] = | A[0] + A[1] |. Highlight in blue the path that each of the following instructions takes through the simple datapath, and mark the control signals. Assume the first instruction is stored at address 0040 1000 (hex).
       lw   $t0, 0($s0)
       lw   $t1, 4($s0)
       add  $t1, $t1, $t0
       slt  $t0, $t1, $zero
       beq  $t0, $zero, Label
       sub  $t1, $zero, $t1
       sw   $t1, 8($s0)
       j    Exit
Label: sw   $t1, 8($s0)
Exit:
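Since the first instruction sits at 0040 1000 (hex) and every instruction is one word, the addresses of the labels, the 16-bit offset stored in the beq, and the 26-bit field stored in the j can all be worked out. The following Python sketch does that arithmetic; the variable names are hypothetical and the list simply mirrors the program above.

```python
BASE = 0x00401000

program = [
    "lw $t0, 0($s0)",
    "lw $t1, 4($s0)",
    "add $t1, $t1, $t0",
    "slt $t0, $t1, $zero",
    "beq $t0, $zero, Label",
    "sub $t1, $zero, $t1",
    "sw $t1, 8($s0)",
    "j Exit",
    "sw $t1, 8($s0)",      # Label:
]
# Index of the instruction each label refers to (Exit follows the last one)
labels = {"Label": 8, "Exit": 9}

def addr(index):
    """Byte address of the instruction at the given index."""
    return BASE + 4 * index

beq_index = program.index("beq $t0, $zero, Label")
j_index = program.index("j Exit")

# Branch offsets are counted in words, relative to the address after the beq
beq_offset = (addr(labels["Label"]) - (addr(beq_index) + 4)) // 4
# The jump keeps bits 27-2 of the target; PC[31-28] supplies the top four bits
j_field = (addr(labels["Exit"]) >> 2) & 0x03FFFFFF

print(hex(addr(labels["Label"])), hex(addr(labels["Exit"])))
print(beq_offset, hex(j_field))
```

The jump works here because the j instruction and Exit share the same upper four address bits.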
The Simple Datapath with Controls
[Figure: single-cycle datapath with PC, instruction memory, register file, sign-extend, ALU, ALU control, and data memory. The control unit takes Instruction[31:26] and drives RegDst, ALUSrc, Mem2Reg, RegWrite, MemRead, MemWrite, Branch, and ALUOp. Instruction fields shown: [25:21], [20:16], [15:11], [15:0], [5:0].]
lw $t0, 0($s0) / lw $t1, 4($s0)
[Figure: the simple datapath with controls, with the path used by these instructions highlighted.]
The Setting of Control Lines
Instruction  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1        1         1        0         0       0       0
add $t1, $t1, $t0 / slt $t0, $t1, $zero / sub $t1, $zero, $t1
[Figure: the simple datapath with controls, with the path used by these instructions highlighted.]
The Setting of Control Lines
Instruction  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1        1         1        0         0       0       0
R-type       1       0       0        1         0        0         0       1       0
beq $t0, $zero, Label (the case $t0 = zero)
[Figure: the simple datapath with controls, with the path used by this instruction highlighted.]
beq $t0, $zero, Label (the case $t0 != zero)
[Figure: the simple datapath with controls, with the path used by this instruction highlighted.]
The Setting of Control Lines
Instruction  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1        1         1        0         0       0       0
R-type       1       0       0        1         0        0         0       1       0
beq          x       0       x        0         0        0         1       0       1
sw $t1, 8($s0)
[Figure: the simple datapath with controls, with the path used by this instruction highlighted.]
The Setting of Control Lines
Instruction  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1        1         1        0         0       0       0
R-type       1       0       0        1         0        0         0       1       0
beq          x       0       x        0         0        0         1       0       1
sw           x       1       x        0         0        1         0       0       0
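The completed table is in effect a lookup from opcode to control lines. A minimal Python sketch of that lookup follows; the opcode numbers are the standard MIPS encodings (0 for R-type, 35 for lw, 43 for sw, 4 for beq), while the names `control` and `pc_src` and the use of `None` for a don't-care are choices made only for this illustration.

```python
SIGNALS = ("RegDst", "ALUSrc", "Mem2Reg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0")

X = None  # don't-care: the signal's value does not affect the result

TABLE = {
    "lw":     (0, 1, 1, 1, 1, 0, 0, 0, 0),
    "R-type": (1, 0, 0, 1, 0, 0, 0, 1, 0),
    "beq":    (X, 0, X, 0, 0, 0, 1, 0, 1),
    "sw":     (X, 1, X, 0, 0, 1, 0, 0, 0),
}

OPCODES = {0: "R-type", 35: "lw", 43: "sw", 4: "beq"}

def control(opcode):
    """Control-line settings for the instruction class of this opcode."""
    return dict(zip(SIGNALS, TABLE[OPCODES[opcode]]))

def pc_src(branch, zero):
    """The branch target is selected only when Branch AND Zero are both 1."""
    return branch & zero

for opcode in (35, 0, 4, 43):
    print(OPCODES[opcode], control(opcode))
```

`pc_src` distinguishes the two beq cases shown earlier: with `$t0` equal to zero the ALU raises Zero and the branch target is selected, otherwise PC + 4 is kept.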