Upload
owen-montgomery
View
215
Download
0
Embed Size (px)
Citation preview
Computer Structure - Multi-Cycle Datapath
Adding the Jump InstructionThe j instruction uses pseudodirect
addressing, the upper 4 bits of PC+4 are concatenated (מחוברים) to the 26 bits (shifted left by 2) of the address in the J-type instruction.
Shiftleft 2
PC
Instructionmemory
Readaddress
Instruction[31– 0]
Datamemory
Readdata
Writedata
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Instruction [15– 11]
Instruction [20– 16]
Instruction [25– 21]
Add
ALUresult
Zero
Instruction [5– 0]
MemtoReg
ALUOp
MemWrite
RegWrite
MemRead
Branch
JumpRegDst
ALUSrc
Instruction [31– 26]
4
Mux
Instruction [25– 0] Jump address [31– 0]
PC+4 [31– 28]
Signextend
16 32Instruction [15– 0]
1
Mux
1
0
Mux
0
1
Mux
0
1
ALUcontrol
Control
Add ALUresult
Mux
0
1 0
ALU
Shiftleft 2
26 28
Address
1/13
Computer Structure - Multi-Cycle Datapath
Operation of the DatapathLet's see the stages of execution of a R-type
instruction add $t1,$t2,$t3:1. An instruction is fetched from memory, the PC is incremented
2. Two registers $t2 and $t3 are read from the register file.
3. The ALU operates on the data read from the register file.
4. The results of the ALU is written into the register $t3.
This doesn't really happen in 4 steps because the implementation is combinational, but at the end of the clock cycle the result is written into the destination register.
2/13
Computer Structure - Multi-Cycle Datapath
Let's see the stages of execution of branch instruction beq $t1,$t2,L1:
1. An instruction is fetched from memory, the PC is incremented
2. Two registers $t1 and $t2 are read from the register file.
2a. The branch ALU computes the sum of the PC and the sign-extended L1.
3. The ALU operates on the data read from the register file.
4. The output of the ALU decides if the PC is PC+4 or PC+L1.
Let's look at lw $t1,offset($t2)1. An instruction is fetched from memory, the PC is incremented
2. The register $t2 is read from the register file.
3. The ALU computes the sum of $t2 and the sign-extended offset.
4. The sum from the ALU is used as the address for the data memory.
5. The data from memory is written into register $t1.
Let's look at j L21. An instruction is fetched from memory, the PC is incremented
2. The PC is set to the L2 label (concatenated with the 4 MSBs of the PC).
Computer Structure - Multi-Cycle Datapath
Goal: Design a Multi-Cycle DatapathThe datapath presented completes in a single
machine cycle. Every instruction completes in the same time.
This is inefficient as the cycle time must be that of the longest instruction executed.
We will assume the following delays for each unit:Memory: 2 ns (nanoseconds)ALU: 2 nsRegister File: 1nsWires, MUXs, Sign-extend, PC ...: 0ns
3/13
Computer Structure - Multi-Cycle Datapath
Performance of Single-Cycle MachinesWhich implementation is faster?
1. A clock cycle of fixed length.2. A clock cycle of varying length.
Lets look at the time needed by each instruction:
Inst. Fetch Reg. Rd ALU op Memory Reg. Wr TotalR-Type 2 1 2 0 1 6nsLoad 2 1 2 2 1 8nsStore 2 1 2 2 7nsBranch 2 1 2 5nsJump 2 2ns
In a fixed clock datapath the clock cycle is 8ns.
4/13
Computer Structure - Multi-Cycle Datapath
Fixed vs. Variable Cycle PerformanceA program has the following instruction mix: 24%
loads, 12% stores, 44% R-type, 18% branches, 2% jumps.
CPU execution time = Instruction count * Cycle timeFixed Cycle ET (100 inst.) = 100*8ns = 800ns Variable cycle ET (100 inst.) =
8*24 + 7*12 + 6*44 + 5*18 + 2*2 = 634 nsThe variable clock is better but much harder to
implement.Single cycle is only 634/800 = 79% slower. Why not
use it?
5/13
Computer Structure - Multi-Cycle Datapath
For the former instruction mix a fixed cycle clock might do. But what about multiplication, division, floating point operations. A division instruction can take 40ns. It is obvious that a better solution is needed.
If we brake each instruction into several steps, we can use these steps to build a multicycle implementation. Each step takes 1 cycle. Instructions such as jumps and branches can complete in less cycles.
Another advantage is that the multicycle implementation allows a functional unit to be used more than once in each instruction as long as it is used on different clock cycles.
Computer Structure - Multi-Cycle Datapath
A Multicycle ImplementationWe now have only a single memory unit and
a single ALU. In addition we need registers to hold the output of each stage.
PC
Memory
Address
Instructionor data
Data
Instructionregister
Registers
Register #
Data
Register #
Register #
ALU
Memorydata
register
A
B
ALUOut
6/13
Computer Structure - Multi-Cycle Datapath
We have now added several new registers(which are transparent to the programmer, they can’t be accessed by software) and some new MUXs: Instruction Register (IR) - the instruction fetched Memory Data Register (MDR) - data read from memory A, B - registers read from the register file ALUOut - result of ALU operation
The new MUXs added are: An additional MUX to the 1st ALU input, chooses between the A
register and the PC. The MUX on the 2nd ALU input is changed from a 2-way to a 4-way
MUX. The additional inputs are the constant 4 (used to increment the PC) and the sign-extended and shifted offset field (used in beq).
There are 3 possible sources for the PC value: 1. The output of the ALU which is PC+4; 2. The register ALUOut which is the address of the computed branch target; 3. The lower 26 bits of the IR shifted left by 2, concatenated with the 4 upper bits of the PC.
Computer Structure - Multi-Cycle Datapath
Multicycle Diagram
Shiftleft 2
MemtoReg
IorD MemRead MemWrite
PC
Memory
MemData
Writedata
Mux
0
1
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Instruction[15– 11]
Mux
0
1
Mux
0
1
4
ALUOpALUSrcB
RegDst RegWrite
Instruction[15– 0]
Instruction [5– 0]
Signextend
3216
Instruction[25– 21]
Instruction[20– 16]
Instruction[15– 0]
Instructionregister
1 Mux
0
3
2
ALUcontrol
Mux
0
1ALU
resultALU
ALUSrcA
ZeroA
B
ALUOut
IRWrite
Address
Memorydata
register
7/13
Shiftleft 2
PCMux
0
1
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Instruction[15– 11]
Mux
0
1
Mux
0
1
4
Instruction[15– 0]
Signextend
3216
Instruction[25– 21]
Instruction[20– 16]
Instruction[15– 0]
Instructionregister
ALUcontrol
ALUresult
ALUZero
Memorydata
register
A
B
IorD
MemRead
MemWrite
MemtoReg
PCWriteCond
PCWrite
IRWrite
ALUOp
ALUSrcB
ALUSrcA
RegDst
PCSource
RegWrite
Control
Outputs
Op[5– 0]
Instruction[31-26]
Instruction [5– 0]
Mux
0
2
Jumpaddress [31-0]Instruction [25– 0] 26 28
Shiftleft 2
PC [31-28]
1
1 Mux
0
3
2
Mux
0
1ALUOut
Memory
MemData
Writedata
Address
Computer Structure - Multi-Cycle Datapath
The Instruction Execution Stages (1,2)
1. Instruction Fetch (IF) - Fetch the instruction from memory and compute the address of the next sequential address:IR = Memory[PC];PC= PC + 4;
2. Instruction Decode (ID) and register fetch - get the registers from the register file and compute the potential branch address (even if it isn't needed in the future):A = Reg[IR[25-21]];B = Reg[IR[20-16]];ALUOut = PC + (sign-extended(IR[15-0])<<2);
8/13
Computer Structure - Multi-Cycle Datapath
The Instruction Execution Stages (3)
3. Execution (EX), Memory address computation or branch completion - In this stage the operation is determined by the the instruction class: A. Memory reference: ALUOut = A + sign-extended(IR[15-0]);B. R-type: ALUOut = A op B;C. Branch: if (A == B) PC = ALUOut;D. Jump: PC = PC[31-28] cat (IR[25-0]<<2)
9/13
Computer Structure - Multi-Cycle Datapath
The Instruction Execution Stages (4,5)
4. Memory access (Mem) or R-type completion - During this step the load/store instruction accesses memory or the AL instruction write its results.A. Memory reference: MDR = Memory[ALUOut]; (load) Memory[ALUOut] = B; (store)B. R-type: Reg[IR[15-11]] = ALUOut;
5. Memory read completion ( Write Back - WB ) step - The load completes by writing the value from memory into a register.Reg[IR[20-16]]= MDR;
10/13
Computer Structure - Multi-Cycle Datapath
Cycles Per Instruction (CPI)The CPI of a program defines how many cycles an
average instruction takes. Assuming an instruction mix of:
22% loads, 11% stores, 49% R-type, 16% branch, 2% jumps
Clock cycles for each instruction format is:Loads: 5; Stores: 4; R-type: 4; Branches: 3; Jumps: 3
CPI = 0.22*5 + (0.11 + 0.49)*4 + (0.16 + 0.02)*3 = 4.04
This is better than a CPI of 5.00 if all instructions took the same number of cycles.
11/13
Computer Structure - Multi-Cycle Datapath
One of the most hardest parts of control is implementing exceptions and interrupts, events other than branches and jumps which change the normal flow of instruction execution.
An exception is an unexpected event that happens during program execution such as an arithmetic overflow or an illegal instruction (which are the only 2 in our design).
An interrupt is an event that is external to the processor, such as requests by I/O devices.
When an exception occurs the machine must save the address of the offending instruction in the exception program counter (EPC), and then transfer execution to the OS. The OS might service the exception and return control to the program or terminate execution.
The OS uses vectored interrupts in order to process the interrupts. In a vectored interrupt the address to which control is transferred is determined by the exception cause. The OS knows the cause of the exception by the address that is jumped to.
In MIPS all exceptions jump to the same routine. there the OS reads a
register called the Cause register and decides how to process the interrupt depending on the value in the cause register.
Computer Structure - Multi-Cycle Datapath
Exceptions When an exception occurs the cause is written into the
Cause Register. The Cause Register holds 0 for an undefined instruction and
1 for an arithmetic overflow. The EPC holds the address of the instruction which caused
the exception, the OS might need to return to the program. 2 control signals are needed to write to the EPC and cause
registers (EPCWrite and CauseWrite). A signal is needed to set the LSB of the Cause register
(IntCause). IntCause is defined by the control if it can't decode the
instruction or if the ALU signals an overflow. The next PC MUX now has 4 inputs, the exception handler addr is added
12/13
Computer Structure - Multi-Cycle Datapath
Datapath with Exceptions
Shiftleft 2
Memory
MemData
Writedata
Mux
0
1
Instruction[15– 11]
Mux
0
1
4
Instruction[15– 0]
Signextend
3216
Instruction[25– 21]
Instruction[20– 16]
Instruction[15– 0]
Instructionregister
ALUcontrol
ALUresult
ALUZero
Memorydata
register
A
B
IorD
MemRead
MemWrite
MemtoReg
PCWriteCond
PCWrite
IRWrite
Control
Outputs
Op[5– 0]
Instruction[31-26]
Instruction [5– 0]
Mux
0
2
Jumpaddress [31-0]Instruction [25– 0] 26 28
Shiftleft 2
PC [31-28]
1
Address
EPC
CO 00 00 00 3
Cause
ALUOp
ALUSrcB
ALUSrcA
RegDst
PCSource
RegWrite
EPCWriteIntCauseCauseWrite
1
0
1 Mux
0
3
2
Mux
0
1
Mux
0
1
PC
Mux
0
1
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
ALUOut
13/13