
Computer Structure - Multi-Cycle Datapath

Adding the Jump Instruction

The j instruction uses pseudodirect addressing: the upper 4 bits of PC+4 are concatenated (joined) with the 26-bit address field of the J-type instruction, shifted left by 2, to form the 32-bit jump target.
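As a rough illustration of the concatenation described above, the following Python sketch computes a jump target. The function name jump_target and the sample values are assumptions made for this example only.

```python
def jump_target(pc: int, instruction: int) -> int:
    """Pseudodirect target: PC+4[31-28] joined with (instruction[25-0] << 2)."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper4 = pc_plus_4 & 0xF0000000          # keep bits 31-28 of PC+4
    low28 = (instruction & 0x03FFFFFF) << 2  # 26-bit field becomes 28 bits
    return upper4 | low28


# Hypothetical j instruction: opcode 000010, address field 0x0100000.
instr = (0x02 << 26) | 0x0100000
print(hex(jump_target(0x00400018, instr)))  # 0x400000
```

The shift by 2 works because instructions are word aligned, so the two low bits of any instruction address are always 0 and need not be stored in the instruction.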

[Figure: single-cycle datapath with the jump instruction added. Instruction[25-0] is shifted left by 2 (26 bits to 28 bits) and joined with PC+4[31-28] to form Jump address[31-0]; an additional MUX, controlled by the Jump signal, selects this address as the next PC. Control outputs: RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite.]


Operation of the Datapath

Let's see the stages of execution of an R-type instruction, add $t1,$t2,$t3:

1. An instruction is fetched from memory, the PC is incremented.

2. Two registers, $t2 and $t3, are read from the register file.

3. The ALU operates on the data read from the register file.

4. The result of the ALU is written into the register $t1.

This doesn't really happen in 4 steps because the implementation is combinational, but at the end of the clock cycle the result is written into the destination register.



Let's see the stages of execution of the branch instruction beq $t1,$t2,L1:

1. An instruction is fetched from memory, the PC is incremented.

2. Two registers, $t1 and $t2, are read from the register file.

2a. The branch adder computes the sum of PC+4 and the sign-extended offset to L1, shifted left by 2.

3. The ALU operates on the data read from the register file.

4. The Zero output of the ALU decides whether the next PC is PC+4 or the branch target L1.

Let's look at lw $t1,offset($t2):

1. An instruction is fetched from memory, the PC is incremented.

2. The register $t2 is read from the register file.

3. The ALU computes the sum of $t2 and the sign-extended offset.

4. The sum from the ALU is used as the address for the data memory.

5. The data from memory is written into register $t1.

Let's look at j L2:

1. An instruction is fetched from memory, the PC is incremented.

2. The PC is set to the L2 label (the 26-bit address field shifted left by 2, concatenated with the 4 MSBs of PC+4).
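The beq walk-through above can be sketched in Python as follows. This is only an illustration; the helper names sign_extend16 and beq_next_pc are assumptions of this example, and register values are passed in directly instead of being read from a register file.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def beq_next_pc(pc: int, rs_val: int, rt_val: int, imm16: int) -> int:
    """Return the next PC for beq: the branch target if equal, else PC+4."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Branch adder: PC+4 plus the sign-extended offset shifted left by 2.
    target = (pc_plus_4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    # The ALU subtracts the two registers; Zero is set when they are equal.
    zero = ((rs_val - rt_val) & 0xFFFFFFFF) == 0
    return target if zero else pc_plus_4


print(hex(beq_next_pc(0x100, 5, 5, 3)))  # 0x110 (taken)
print(hex(beq_next_pc(0x100, 5, 6, 3)))  # 0x104 (not taken)
```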


Goal: Design a Multi-Cycle Datapath

The datapath presented so far completes every instruction in a single machine cycle, so every instruction takes the same time.

This is inefficient, as the cycle time must be that of the longest instruction executed.

We will assume the following delays for each unit:
Memory: 2 ns (nanoseconds)
ALU: 2 ns
Register File: 1 ns
Wires, MUXs, Sign-extend, PC ...: 0 ns



Performance of Single-Cycle Machines

Which implementation is faster?

1. A clock cycle of fixed length.
2. A clock cycle of varying length.

Let's look at the time needed by each instruction (in ns):

Inst.    Fetch  Reg. Rd  ALU op  Memory  Reg. Wr  Total
R-type   2      1        2       0       1        6 ns
Load     2      1        2       2       1        8 ns
Store    2      1        2       2                7 ns
Branch   2      1        2                        5 ns
Jump     2                                        2 ns

In a fixed clock datapath the clock cycle is 8 ns.
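The totals in the table follow directly from the assumed unit delays. The short Python sketch below recomputes them; the dictionary names are made up for this example, and the list of units used by each instruction class is read off the table rows.

```python
# Unit delays in ns, as assumed in the text (wires, MUXs etc. count as 0).
UNIT_DELAY_NS = {"fetch": 2, "reg_read": 1, "alu": 2, "mem": 2, "reg_write": 1}

# Functional units each instruction class passes through.
UNITS_USED = {
    "R-type": ["fetch", "reg_read", "alu", "reg_write"],
    "Load": ["fetch", "reg_read", "alu", "mem", "reg_write"],
    "Store": ["fetch", "reg_read", "alu", "mem"],
    "Branch": ["fetch", "reg_read", "alu"],
    "Jump": ["fetch"],
}

latencies = {
    kind: sum(UNIT_DELAY_NS[unit] for unit in units)
    for kind, units in UNITS_USED.items()
}
# A fixed clock must fit the slowest instruction.
fixed_cycle_ns = max(latencies.values())

print(latencies)
print(fixed_cycle_ns)  # 8
```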



Fixed vs. Variable Cycle Performance

A program has the following instruction mix: 24% loads, 12% stores, 44% R-type, 18% branches, 2% jumps.

CPU execution time = Instruction count * Cycle time
Fixed cycle ET (100 inst.) = 100*8 ns = 800 ns
Variable cycle ET (100 inst.) = 8*24 + 7*12 + 6*44 + 5*18 + 2*2 = 634 ns

The variable clock is better but much harder to implement. The fixed (single-cycle) clock is only 800/634 = 1.26 times slower (the variable clock needs 634/800 = 79% of the time). Why not use it?
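The comparison can be checked with a few lines of Python. This is a sketch using the instruction mix and per-instruction times given above; the variable names are assumptions of this example.

```python
# Time per instruction class in ns (from the single-cycle timing table).
LATENCY_NS = {"load": 8, "store": 7, "r_type": 6, "branch": 5, "jump": 2}
# Instruction counts per 100 instructions (the mix given in the text).
MIX_PER_100 = {"load": 24, "store": 12, "r_type": 44, "branch": 18, "jump": 2}

instruction_count = sum(MIX_PER_100.values())
# Fixed clock: every instruction takes as long as the slowest one.
fixed_ns = instruction_count * max(LATENCY_NS.values())
# Variable clock: every instruction takes exactly its own time.
variable_ns = sum(MIX_PER_100[k] * LATENCY_NS[k] for k in MIX_PER_100)
slowdown = fixed_ns / variable_ns

print(fixed_ns, variable_ns, round(slowdown, 2))  # 800 634 1.26
```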



For the former instruction mix a fixed cycle clock might do. But what about multiplication, division and floating-point operations? A division instruction can take 40 ns, and the fixed cycle would have to stretch to fit it. Clearly a better solution is needed.

If we break each instruction into several steps, we can use these steps to build a multicycle implementation. Each step takes 1 cycle. Instructions such as jumps and branches can complete in fewer cycles.

Another advantage is that the multicycle implementation allows a functional unit to be used more than once in each instruction as long as it is used on different clock cycles.


A Multicycle Implementation

We now have only a single memory unit and a single ALU. In addition we need registers to hold the output of each stage.

[Figure: high-level view of the multicycle datapath: PC, a single Memory (instruction or data), Instruction register, Memory data register, Registers, A, B, ALU, ALUOut.]


We have now added several new registers (which are transparent to the programmer; they can't be accessed by software) and some new MUXs:
Instruction Register (IR) - the instruction fetched
Memory Data Register (MDR) - data read from memory
A, B - registers read from the register file
ALUOut - result of ALU operation

The new MUXs added are:
An additional MUX on the 1st ALU input, which chooses between the A register and the PC.
The MUX on the 2nd ALU input is changed from a 2-way to a 4-way MUX. The additional inputs are the constant 4 (used to increment the PC) and the sign-extended and shifted offset field (used in beq).

There are 3 possible sources for the PC value:
1. The output of the ALU, which is PC+4;
2. The register ALUOut, which holds the computed branch target address;
3. The lower 26 bits of the IR shifted left by 2, concatenated with the 4 upper bits of the PC.
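The choice between the three PC sources can be pictured as a small selection function. The Python sketch below is an illustration only: the function name next_pc is hypothetical, and the selector values 0, 1 and 2 are an assumption that follows the order of the list above.

```python
def next_pc(pc_source: int, alu_result: int, alu_out: int, pc: int, ir: int) -> int:
    """Model the MUX that selects the next PC value."""
    # Source 3 in the list: IR[25-0] << 2 joined with the upper 4 bits of the PC.
    jump_address = (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    sources = {
        0: alu_result,    # PC+4 straight from the ALU
        1: alu_out,       # branch target held in ALUOut
        2: jump_address,  # jump target
    }
    return sources[pc_source]


ir = (0x02 << 26) | 3  # hypothetical j instruction with address field 3
print(hex(next_pc(2, 0x104, 0x200, 0x10000004, ir)))  # 0x1000000c
```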


Multicycle Diagram

[Figure: multicycle datapath with its control lines: IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB, ALUOp.]

[Figure: complete multicycle datapath with the control unit, whose input is Op[5-0] = Instruction[31-26]. The added control signals are PCWrite, PCWriteCond and PCSource; a 3-input MUX (inputs 0, 1, 2) selects the next PC, with the jump address formed from Instruction[25-0] shifted left by 2 (26 bits to 28 bits) and PC[31-28].]

The Instruction Execution Stages (1,2)

1. Instruction Fetch (IF) - Fetch the instruction from memory and compute the address of the next sequential instruction:
    IR = Memory[PC];
    PC = PC + 4;

2. Instruction Decode (ID) and register fetch - Read the registers from the register file and compute the potential branch address (even if it turns out not to be needed):
    A = Reg[IR[25-21]];
    B = Reg[IR[20-16]];
    ALUOut = PC + (sign-extended(IR[15-0]) << 2);



The Instruction Execution Stages (3)

3. Execution (EX), memory address computation or branch completion - In this stage the operation is determined by the instruction class:
    A. Memory reference: ALUOut = A + sign-extended(IR[15-0]);
    B. R-type: ALUOut = A op B;
    C. Branch: if (A == B) PC = ALUOut;
    D. Jump: PC = PC[31-28] cat (IR[25-0] << 2);



The Instruction Execution Stages (4,5)

4. Memory access (Mem) or R-type completion - During this step a load/store instruction accesses memory, or an arithmetic-logic (R-type) instruction writes its result:
    A. Memory reference: MDR = Memory[ALUOut]; (load) or Memory[ALUOut] = B; (store)
    B. R-type: Reg[IR[15-11]] = ALUOut;

5. Memory read completion (Write Back, WB) - The load completes by writing the value from memory into a register:
    Reg[IR[20-16]] = MDR;
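The five steps can be traced with a small Python sketch. It is an illustration under simplifying assumptions, not the hardware design: memory is a dictionary keyed by byte address, only add, sub, lw, sw, beq and j are handled, and the names run_instruction and state are made up for this example.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_instruction(state: dict) -> int:
    """Run one instruction step by step; return the number of cycles it used."""
    reg, mem = state["reg"], state["mem"]
    # Step 1 (IF): IR = Memory[PC]; PC = PC + 4
    ir = mem[state["pc"]]
    state["pc"] = (state["pc"] + 4) & 0xFFFFFFFF
    # Step 2 (ID): read A and B, precompute the potential branch target
    op = ir >> 26
    rs, rt, rd = (ir >> 21) & 31, (ir >> 16) & 31, (ir >> 11) & 31
    imm = sign_extend16(ir)
    a, b = reg[rs], reg[rt]
    alu_out = (state["pc"] + (imm << 2)) & 0xFFFFFFFF
    # Step 3 (EX): depends on the instruction class
    if op == 0x02:  # j completes here
        state["pc"] = (state["pc"] & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
        return 3
    if op == 0x04:  # beq completes here
        if a == b:
            state["pc"] = alu_out
        return 3
    if op == 0x00:  # R-type: this sketch knows only add and sub
        funct = ir & 0x3F
        if funct == 0x20:
            alu_out = (a + b) & 0xFFFFFFFF
        elif funct == 0x22:
            alu_out = (a - b) & 0xFFFFFFFF
        else:
            raise ValueError("unsupported funct field")
        reg[rd] = alu_out  # Step 4: R-type completion
        return 4
    alu_out = (a + imm) & 0xFFFFFFFF  # memory address computation
    if op == 0x2B:  # sw
        mem[alu_out] = b  # Step 4: memory write
        return 4
    if op == 0x23:  # lw
        mdr = mem[alu_out]  # Step 4: memory read into MDR
        reg[rt] = mdr       # Step 5: write back
        return 5
    raise ValueError("unsupported opcode")


# add $t1,$t2,$t3 with $t1 = register 9, $t2 = 10, $t3 = 11
ADD_T1_T2_T3 = (10 << 21) | (11 << 16) | (9 << 11) | 0x20
state = {"pc": 0, "reg": [0] * 32, "mem": {0: ADD_T1_T2_T3}}
state["reg"][10], state["reg"][11] = 7, 5
cycles = run_instruction(state)
print(cycles, state["reg"][9], state["pc"])  # 4 12 4
```

Each instruction class returns as soon as its last step is done, which is why the cycle counts differ between classes.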



Cycles Per Instruction (CPI)

The CPI of a program is the average number of clock cycles an instruction takes. Assume an instruction mix of:

22% loads, 11% stores, 49% R-type, 16% branches, 2% jumps

The number of clock cycles for each instruction class is:
Loads: 5; Stores: 4; R-type: 4; Branches: 3; Jumps: 3

CPI = 0.22*5 + (0.11 + 0.49)*4 + (0.16 + 0.02)*3 = 4.04

This is better than the CPI of 5.00 that would result if every instruction took the same number of cycles.
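The CPI above is a weighted average, which a few lines of Python make explicit. This is a sketch with the mix and cycle counts from the text; the dictionary names are assumptions of this example.

```python
# Fraction of executed instructions in each class.
MIX = {"load": 0.22, "store": 0.11, "r_type": 0.49, "branch": 0.16, "jump": 0.02}
# Clock cycles needed by each class in the multicycle design.
CYCLES = {"load": 5, "store": 4, "r_type": 4, "branch": 3, "jump": 3}

# Weighted average: each class contributes its share times its cycle count.
cpi = sum(MIX[kind] * CYCLES[kind] for kind in MIX)
print(round(cpi, 2))  # 4.04
```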



One of the hardest parts of control is implementing exceptions and interrupts: events other than branches and jumps that change the normal flow of instruction execution.

An exception is an unexpected event that happens during program execution, such as an arithmetic overflow or an illegal instruction (these are the only two in our design).

An interrupt is an event that is external to the processor, such as a request by an I/O device.

When an exception occurs the machine must save the address of the offending instruction in the exception program counter (EPC), and then transfer execution to the OS. The OS might service the exception and return control to the program, or terminate execution.

One way to tell the OS the cause is vectored interrupts. In a vectored interrupt the address to which control is transferred is determined by the exception cause, so the OS knows the cause from the address that is jumped to.

In MIPS all exceptions jump to the same routine. There the OS reads a register called the Cause register and decides how to process the exception depending on its value.


Exceptions

When an exception occurs the cause is written into the Cause register.
The Cause register holds 0 for an undefined instruction and 1 for an arithmetic overflow.
The EPC holds the address of the instruction that caused the exception; the OS might need to return to the program.
Two control signals are needed to write to the EPC and Cause registers (EPCWrite and CauseWrite).
A signal is needed to set the LSB of the Cause register (IntCause).
The value of IntCause is chosen by the control, which raises an exception if it can't decode the instruction or if the ALU signals an overflow.
The next-PC MUX now has 4 inputs: the exception handler address is added.
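The bookkeeping described above can be sketched in Python. This is an illustration only: the function names are hypothetical, the state dictionary stands in for the hardware registers, and the handler address 0xC0000000 is an assumption based on the constant that appears in the datapath figure.

```python
HANDLER_ADDRESS = 0xC0000000  # assumed constant; not taken from a specification
CAUSE_UNDEFINED_INSTRUCTION = 0
CAUSE_ARITHMETIC_OVERFLOW = 1


def add_with_overflow_check(a: int, b: int):
    """32-bit add; overflow when equal-signed operands give a different sign."""
    result = (a + b) & 0xFFFFFFFF
    sign_a, sign_b, sign_r = a >> 31 & 1, b >> 31 & 1, result >> 31 & 1
    return result, (sign_a == sign_b and sign_r != sign_a)


def take_exception(state: dict, cause: int, instruction_address: int) -> None:
    """Save the offending address and cause, then jump to the handler."""
    state["epc"] = instruction_address  # EPCWrite
    state["cause"] = cause              # CauseWrite, value chosen by IntCause
    state["pc"] = HANDLER_ADDRESS       # fourth input of the next-PC MUX


state = {"pc": 0x00400024, "epc": 0, "cause": 0}
result, overflow = add_with_overflow_check(0x7FFFFFFF, 1)
if overflow:
    # The PC was already incremented, so the offending instruction is at PC - 4.
    take_exception(state, CAUSE_ARITHMETIC_OVERFLOW, state["pc"] - 4)
print(hex(state["epc"]), state["cause"], hex(state["pc"]))
```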



Datapath with Exceptions

[Figure: multicycle datapath with exception support. Added to the previous datapath: the EPC and Cause registers, the control signals EPCWrite, IntCause and CauseWrite, and a fourth input (3) on the next-PC MUX carrying the constant exception handler address, shown in the figure as C0 00 00 00.]
