kory-martin
04/20/23 16:42 1 of 86
Pipelining
Chapter 6

Overview of Pipelining
Pipelining is an implementation technique in which multiple instructions are overlapped in execution.
Pipelining improves performance by increasing instruction throughput.
The execution time of an individual instruction is not decreased.

Analogy
Doing laundry:
1. Put clothes in washer to wash.
2. Put clothes in dryer to dry.
3. Put clothes on table to fold.
4. Put clothes away.

Analogy
Non-pipelined: [figure]
Pipelined: [figure]

Example
Assume that the operation times for the major functional units are:
200 ps for memory access
200 ps for ALU operation
100 ps for register access

MIPS Instructions
5 stages for a MIPS instruction:
Fetch → Reg. Read → ALU Op. → Data access → Reg. Write
lw  $s1, 100($s2)
sw  $s1, 100($s2)
add $s1, $s2, $s3
beq $s1, $s2, 25

Example
Execution time for each instruction class (all times in ps):
Instruction  Fetch  Reg read  ALU op  Data access  Reg write  Total time
lw           200    100       200     200          100        800 ps
sw           200    100       200     200                     700 ps
add          200    100       200                  100        600 ps
beq          200    100       200                             500 ps
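
The totals in this table can be reproduced with a few lines of Python. This is only a sketch: the dictionary names (`STAGE_PS`, `STAGES_USED`) and the helper `total_time_ps` are made up for illustration, while the latencies and the stages each instruction uses come from the example.

```python
# Stage latencies in picoseconds, as given in the example.
STAGE_PS = {
    "fetch": 200,
    "reg_read": 100,
    "alu": 200,
    "data_access": 200,
    "reg_write": 100,
}

# Stages each instruction class actually uses (illustrative names).
STAGES_USED = {
    "lw":  ["fetch", "reg_read", "alu", "data_access", "reg_write"],
    "sw":  ["fetch", "reg_read", "alu", "data_access"],
    "add": ["fetch", "reg_read", "alu", "reg_write"],
    "beq": ["fetch", "reg_read", "alu"],
}

def total_time_ps(instr):
    """Sum the latencies of the stages used by one instruction class."""
    return sum(STAGE_PS[stage] for stage in STAGES_USED[instr])

totals = {instr: total_time_ps(instr) for instr in STAGES_USED}

# The slowest instruction class determines a single-cycle clock period.
slowest = max(totals, key=totals.get)

if __name__ == "__main__":
    for instr, ps in totals.items():
        print(f"{instr:4s} {ps} ps")
    print("slowest:", slowest)
```

The slowest entry is `lw`, which is why it sets the clock period of a single-cycle design.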

Example
For the single-cycle design:
Must allow for the slowest instruction, lw.
So the time required for every instruction is 800 ps.

Example
Non-pipelined for three lw instructions: [figure]
The time between the first and the fourth instructions is 3 x 800 ps = 2400 ps.

Example
For the pipelined multi-cycle design:
Each clock cycle must be long enough to accommodate the slowest operation.
So the time required for every clock cycle is 200 ps.

Example
Pipelined for three lw instructions: [figure]
The time between the first and the fourth instructions is 3 x 200 ps = 600 ps.
2400/600 = 4, a fourfold performance improvement.
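
The fourfold figure compares the time between instruction starts. A small sketch can contrast that with the total time to finish a short program. The function names and the formula for the pipelined case (the first instruction takes all five stages, each later one completes one cycle after its predecessor, no stalls) are assumptions of this sketch, not something stated on the slides.

```python
def nonpipelined_time_ps(n, instr_ps=800):
    """Total time for n instructions when each takes a full 800 ps cycle."""
    return n * instr_ps

def pipelined_time_ps(n, stages=5, cycle_ps=200):
    """Total time for n instructions in an ideal 5-stage pipeline (no stalls)."""
    return (stages + n - 1) * cycle_ps

def speedup(n):
    return nonpipelined_time_ps(n) / pipelined_time_ps(n)

if __name__ == "__main__":
    for n in (3, 1000, 1_000_000):
        print(n, nonpipelined_time_ps(n), pipelined_time_ps(n), round(speedup(n), 3))
```

For only three lw instructions the total times are 2400 ps and 1400 ps, so the end-to-end ratio is below 4. As the instruction count grows, the ratio approaches the 800/200 = 4 throughput improvement quoted above.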

Pipeline Hazards
Structural hazards
Data hazards
Control hazards

Structural Hazards
There is a structural hazard when the hardware cannot support the combination of instructions that we want to execute in the same clock cycle.
Analogy: Having a washer/dryer combination.

Example
What happens if we execute four lw instructions one after another? [figure]
The 1st instruction is accessing data while the 4th instruction is being fetched.
Solution
Have two separate memories:
One for instructions
One for data
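
The conflict described above can be located mechanically. The sketch below assumes a single shared memory, back-to-back lw instructions issued one per cycle, and the usual stage names IF, ID, EX, MEM, WB; the helper names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_in_cycle(instr_index, cycle):
    """Stage occupied by instruction instr_index (0-based) in cycle (1-based), or None."""
    pos = cycle - 1 - instr_index
    return STAGES[pos] if 0 <= pos < len(STAGES) else None

def memory_conflicts(num_instrs):
    """List (cycle, instr doing data access, instr being fetched), 1-based instruction numbers."""
    conflicts = []
    last_cycle = num_instrs + len(STAGES) - 1
    for cycle in range(1, last_cycle + 1):
        fetching = [i for i in range(num_instrs) if stage_in_cycle(i, cycle) == "IF"]
        accessing = [i for i in range(num_instrs) if stage_in_cycle(i, cycle) == "MEM"]
        if fetching and accessing:
            conflicts.append((cycle, accessing[0] + 1, fetching[0] + 1))
    return conflicts

if __name__ == "__main__":
    print(memory_conflicts(4))
```

With four loads the only clash is in cycle 4, between the data access of instruction 1 and the fetch of instruction 4. With separate instruction and data memories the two accesses no longer compete.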

Data Hazards
Data hazards occur when the pipeline must be stalled because one step must wait for another to complete.
They arise from the dependence of one instruction on an earlier one that is still in the pipeline.
add $s0, $t0, $t1
sub $t2, $s0, $t3
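
The dependence in this pair is a read-after-write on $s0. A minimal sketch of how such dependences could be found in a list of three-register ALU instructions follows. The parser and function names are invented for illustration, and the check is deliberately simple: it handles only the `op dest, src1, src2` form and does not notice when a register is overwritten in between.

```python
def parse(line):
    """Split 'add $s0, $t0, $t1' into (op, dest, [sources])."""
    op, rest = line.split(None, 1)
    regs = [r.strip() for r in rest.split(",")]
    return op, regs[0], regs[1:]

def raw_dependences(program):
    """Return (producer index, consumer index, register) for each read-after-write pair."""
    parsed = [parse(line) for line in program]
    deps = []
    for j, (_, _, sources) in enumerate(parsed):
        for i in range(j):
            dest = parsed[i][1]
            if dest in sources:
                deps.append((i, j, dest))
    return deps

program = [
    "add $s0, $t0, $t1",
    "sub $t2, $s0, $t3",
]

if __name__ == "__main__":
    print(raw_dependences(program))
```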

Solution 1
Compilers can remove the data hazard by moving non-dependent instructions in between.

Solution 2
Observation: we don’t need to wait for the add instruction to complete before trying to resolve the data hazard.
As soon as the ALU creates the sum for the add, we can supply it as an input for the subtract.

Forwarding
Forwarding or bypassing is when extra hardware is added to retrieve the missing item early from the internal resources.
Forwarding paths are valid only if the destination stage is later in time than the source stage.
What happens when we have a sub instruction after a lw instruction? [figure]

Pipeline Stall
Even with forwarding, we need to stall one stage for a load-use data hazard.
This is referred to as a pipeline stall.

Example of reordering code
Consider the following code segment in C:
A = B + E;
C = B + F;
Assume that all variables are in memory and are addressable as offsets from $t0.

Example of reordering code
The corresponding MIPS code is:
lw  $t1, 0($t0)     # load B; offset from $t0
lw  $t2, 4($t0)     # load E
add $t3, $t1, $t2   # B + E
sw  $t3, 12($t0)    # store A
lw  $t4, 8($t0)     # load F
add $t5, $t1, $t4   # B + F
sw  $t5, 16($t0)    # store C

Example of reordering code
What are the problems?
lw  $t1, 0($t0)     # load B; offset from $t0
lw  $t2, 4($t0)     # load E
add $t3, $t1, $t2   # B + E
sw  $t3, 12($t0)    # store A
lw  $t4, 8($t0)     # load F
add $t5, $t1, $t4   # B + F
sw  $t5, 16($t0)    # store C
Each add uses a register ($t2, then $t4) loaded by the lw immediately before it, so each is a load-use hazard that needs a stall.

Example of reordering code
Code re-ordered with no stalls:
lw  $t1, 0($t0)     # load B; offset from $t0
lw  $t2, 4($t0)     # load E
lw  $t4, 8($t0)     # load F
add $t3, $t1, $t2   # B + E
sw  $t3, 12($t0)    # store A
add $t5, $t1, $t4   # B + F
sw  $t5, 16($t0)    # store C
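
The effect of the reordering can be checked by counting load-use stalls in both versions. The sketch below assumes a pipeline with forwarding in which a stall is needed only when an instruction reads a register loaded by the lw directly before it. The function names and the simple register-extraction regex are illustrative.

```python
import re

def regs(text):
    """All register names in an instruction, in order of appearance."""
    return re.findall(r"\$\w+", text)

def load_use_stalls(program):
    """Count instructions that read a register loaded by the immediately preceding lw."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.split()[0] != "lw":
            continue
        loaded = regs(prev)[0]
        cur_regs = regs(cur)
        # sw reads every register it names; other instructions write the first one.
        sources = cur_regs if cur.split()[0] == "sw" else cur_regs[1:]
        if loaded in sources:
            stalls += 1
    return stalls

original = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2", "sw $t3, 12($t0)",
    "lw $t4, 8($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]
reordered = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)", "add $t3, $t1, $t2",
    "sw $t3, 12($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]

if __name__ == "__main__":
    print(load_use_stalls(original), load_use_stalls(reordered))
```

Under these assumptions the original order incurs two stalls (one per add) and the reordered version none, because moving the third lw up puts an independent instruction between each load and its use.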

Control Hazards
A control hazard (also called branch hazard) arises from the need to make a decision based on the results of one instruction while others are executing.
The proper instruction cannot execute in the proper clock cycle because the instruction that was fetched is not the one that is needed.
Caused by the branch instruction.

Pipelined Datapath [figures]

Pipelined DP for lw [one figure per stage]
Instruction fetch
Instruction decode
Instruction execute
Memory access
Write back

Pipelined DP for sw
Note that in the memory access stage for sw:
The register containing the data to be stored was read in an earlier stage and stored in ID/EX.
The only way to make the data available during the MEM stage is to place the data into the EX/MEM pipeline register in the EX stage, just as we stored the effective address into EX/MEM.

Key Point
Each logical component of the datapath (such as instruction memory, register read ports, ALU, data memory, and register write ports) can be used only within a single pipeline stage.
Otherwise we would have a structural hazard.

A bug
There is a bug in the pipeline design for the load instruction.
What’s the problem?
During the write back stage of the load, we need the write register number.
What instruction is supplying this register number at this point in time?
It’s not the original lw instruction!

Pipelined DP for lw
To properly handle write back, the write register number must travel with the load through the ID/EX, EX/MEM, and MEM/WB pipeline registers. [figure]

Pipelined Control
The pipeline registers (IF/ID, ID/EX, EX/MEM, and MEM/WB) are written at each clock cycle, so there are no separate write signals for them.
To specify control for the pipeline, we need only set the control values during each pipeline stage.
Each control line is associated with a component active in only a single pipeline stage.

Pipelined Control
Divide the control lines into five groups:
1. Instruction fetch: same operation in every clock cycle, therefore always asserted.
2. Instruction decode: same as 1.
3. Execution/address calculation: the signals to be set are RegDst, ALUOp and ALUSrc.
4. Memory access: the signals to be set are Branch, MemRead and MemWrite. PCSrc is derived from Branch and the ALU's Zero output.
5. Write back: the signals to be set are MemtoReg and RegWrite.

Pipelined Control
The 9 control signals [table]
Implementing pipelined control means setting the nine control lines to these values in each stage for each instruction.
Since the control lines start with the EX stage, we can create the control information during instruction decode.
4 of the 9 control lines are used in the EX stage; 5 are passed on to the EX/MEM register.
3 of the 9 lines are used in the MEM stage; 2 are passed on to the MEM/WB register.
2 of the 9 control lines are used in the WB stage.
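
The way the nine lines are peeled off stage by stage can be sketched as follows. The slide's own table did not survive extraction, so the values below follow the usual textbook settings for this datapath and should be treated as an assumption. `None` marks a don't-care, ALUOp is split into two one-bit lines, and the names `CONTROL`, `pass_through` and `line_count` are made up for this illustration.

```python
# Control values grouped by the stage that consumes them (assumed textbook values).
CONTROL = {
    "R-format": {
        "EX":  {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB":  {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX":  {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB":  {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "EX":  {"RegDst": None, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB":  {"RegWrite": 0, "MemtoReg": None},
    },
    "beq": {
        "EX":  {"RegDst": None, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
        "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB":  {"RegWrite": 0, "MemtoReg": None},
    },
}

def pass_through(control):
    """Return the control groups held in ID/EX, EX/MEM and MEM/WB for one instruction."""
    id_ex = dict(control)                                      # created during decode
    ex_mem = {k: v for k, v in id_ex.items() if k != "EX"}     # EX group consumed
    mem_wb = {k: v for k, v in ex_mem.items() if k != "MEM"}   # MEM group consumed
    return id_ex, ex_mem, mem_wb

def line_count(groups):
    return sum(len(group) for group in groups.values())

if __name__ == "__main__":
    print([line_count(g) for g in pass_through(CONTROL["lw"])])
```

Counting the lines held in each pipeline register gives 9, then 5, then 2, matching the counts on the slides.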

Pipelined Control [figure]

Data Hazards
Pipelined dependences for 5 instructions [figure]

Forwarding [figure]

Datapath with Forwarding Unit [figure]
Ignores forwarding of a store value to a store instruction.

Forwarding Unit
The forwarding unit controls the ALU multiplexors to replace the value from a general-purpose register with the value from the proper pipeline register.
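
A minimal sketch of the decision the forwarding unit makes for one ALU operand is given below. It follows the commonly taught conditions: forward from EX/MEM if that instruction writes the needed register, otherwise from MEM/WB, and never for register 0. The select encoding (`0b10`, `0b01`, `0b00`) and the parameter names are assumptions, since the slides do not spell them out.

```python
def forward_select(src_reg, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Choose the ALU operand source for register number src_reg.

    0b10: take the ALU result held in EX/MEM (most recent producer)
    0b01: take the value held in MEM/WB
    0b00: take the value read from the register file
    """
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return 0b10
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return 0b01
    return 0b00

if __name__ == "__main__":
    # add $s0, $t0, $t1 followed by sub $t2, $s0, $t3: $s0 is register 16, $t3 is 11.
    print(forward_select(16, True, 16, False, 0))  # first operand of sub
    print(forward_select(11, True, 16, False, 0))  # second operand of sub
```

Checking EX/MEM before MEM/WB matters when both hold a pending write to the same register: the newer value is the one in EX/MEM.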

Data Hazards and Stalls
One case where forwarding cannot solve the problem is when an instruction tries to read a register following a load instruction that writes the same register.
E.g. a lw followed by a sub.
Since the dependence between the lw and the following and instruction goes backward in time, this hazard cannot be solved by forwarding. [figure]

Inserting a Stall [figure]
The and instruction is turned into a nop.
All instructions beginning with the and instruction are delayed one cycle.

Hazard Detection Unit [figure]
The hazard detection unit controls the writing of the PC and IF/ID registers plus the multiplexor that chooses between the real control values and all 0s.
The hazard detection unit stalls and deasserts the control fields if the load-use hazard test is true.
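
The load-use hazard test can be sketched as a small function. The condition is the usual one: the instruction in EX is a load, and its destination register matches a source register of the instruction in ID. The output names `PCWrite`, `IF_ID_Write` and `zero_controls` are labels chosen for this sketch.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID reads the register a load in EX will write."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)

def hazard_unit_outputs(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """On a hazard: freeze PC and IF/ID, and select all-0 control values (a bubble)."""
    stall = load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt)
    return {"PCWrite": not stall, "IF_ID_Write": not stall, "zero_controls": stall}

if __name__ == "__main__":
    # lw $2, 20($1) in EX, and $4, $2, $5 in ID: the and reads $2.
    print(hazard_unit_outputs(True, 2, 2, 5))
```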

Control Hazard
Pipeline hazards involving branches.
The branch instruction decides whether to branch in the MEM stage (clock cycle 4 in the figure).
In the meantime, three following instructions will have begun execution.

Solutions for Control Hazards
1. Assume branch not taken
Continue execution down the sequential instruction stream.
If the branch is taken, the instructions that are in the pipeline must be discarded.
Execution continues at the branch target.
If branches are untaken half the time, and if it costs little to discard the instructions, then this optimization halves the cost of control hazards.
Discarding instructions means flushing the instructions in the IF, ID, and EX stages of the pipeline.
Change the original control values to 0s, and let them percolate through the pipeline.

Solutions for Control Hazards
2. Reducing the delay of branches
Reduce the cost of the taken branch.
Move the branch execution earlier in the pipeline so that fewer instructions need to be flushed.
Requires two actions to occur earlier:
i. Computing the branch target address.
ii. Evaluating the branch decision.

Solutions for Control Hazards
2. Reducing the delay of branches
i. Computing the branch target address.
Easy.
Already have the PC and the immediate field in the IF/ID pipeline register.
Just move the branch adder from the EX stage to the ID stage.
The address calculation will be performed for all instructions, but only used when needed.

Branch adder location
Move from EX to ID stage [figure]

Solutions for Control Hazards
2. Reducing the delay of branches
ii. Evaluating the branch decision.
Harder.
Need to compare the two registers read during the ID stage.
During ID, we must:
Decode the instruction.
Decide whether a bypass to the equality unit is needed. The source can come from the EX/MEM or MEM/WB pipeline registers.
Complete the comparison.
Set the PC to the branch address if necessary.

Solutions for Control Hazards
2. Reducing the delay of branches
ii. Evaluating the branch decision.
The values in a branch comparison are needed during ID but may be produced later in time. This can cause a data hazard, and a stall might be needed.
Ex. If an ALU instruction immediately preceding a branch produces one of the operands for the comparison in the branch, a stall will be required. Why?
Because the EX stage for the ALU instruction will occur after the ID cycle of the branch.
Ex. If a load instruction immediately preceding a branch produces one of the operands for the comparison in the branch, two stalls will be required.
Because the result from the load appears at the end of the MEM cycle but is needed at the beginning of the ID cycle of the branch.

Solutions for Control Hazards
2. Reducing the delay of branches
Moving the branch execution to the ID stage is an improvement since it reduces the penalty of a branch to only one instruction if the branch is taken, namely, the one currently being fetched.
Flushing that instruction zeros the instruction field of the IF/ID pipeline register.
Clearing the register transforms the fetched instruction into a nop.
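
To get a feel for what the smaller penalty buys, here is a rough back-of-the-envelope sketch. It assumes the predict-not-taken scheme, so only taken branches pay the flush penalty: three instructions when the branch resolves in MEM, one when it resolves in ID. The 20% branch frequency is a made-up illustrative number, the 50% taken rate echoes the "untaken half the time" case above, and all other stalls are ignored.

```python
def average_cpi(branch_fraction, taken_fraction, flush_penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when only taken branches pay the flush penalty."""
    return base_cpi + branch_fraction * taken_fraction * flush_penalty_cycles

# Illustrative inputs only; these are not measurements.
BRANCH_FRACTION = 0.2
TAKEN_FRACTION = 0.5

cpi_resolve_in_mem = average_cpi(BRANCH_FRACTION, TAKEN_FRACTION, 3)
cpi_resolve_in_id = average_cpi(BRANCH_FRACTION, TAKEN_FRACTION, 1)

if __name__ == "__main__":
    print(round(cpi_resolve_in_mem, 2), round(cpi_resolve_in_id, 2))
```

With these inputs the average CPI drops from about 1.3 to about 1.1. The figures only show the direction of the effect and depend entirely on the assumed mix.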

Solutions for Control Hazards
3. Dynamic branch prediction
Assuming a branch is not taken is one simple form of branch prediction.
With deeper pipelines and multiple issue, the branch penalty increases in terms of instructions lost.
A simple static branch prediction wastes too much performance.
It is possible to try to predict branch behavior dynamically (i.e. during program execution).

Dynamic Branch Prediction
Implementation:
A branch prediction buffer or branch history table is used.
This is a small memory indexed by the lower portion of the address of the branch instruction.
The memory contains a bit that says whether the branch was recently taken or not.
Look up the address of the instruction to see if a branch was taken the last time this instruction was executed.
If so, then begin fetching new instructions from the same place as the last time.

Dynamic Branch Prediction
The bit may have been put there by another branch instruction that has the same low-order address bits.
If the hint is wrong then:
The incorrectly predicted instructions are deleted.
The prediction bit is inverted and stored back.
The proper sequence is fetched and executed.

Dynamic Branch Prediction
Problem:
If the branch is almost always taken, we will likely predict incorrectly twice, rather than once, when it is not taken.
Example:
Consider a loop branch that branches nine times in a row, then is not taken once on the tenth time. What is the prediction accuracy assuming the prediction bit for this branch remains in the prediction buffer?

Dynamic Branch Prediction
Answer:
The steady-state prediction behavior will mispredict on the first and last loop iterations.
Mispredicting the last iteration is inevitable since the prediction bit will say taken, as the branch has been taken nine times in a row at that point.
Mispredicting on the first iteration happens because the bit is flipped on prior execution of the last iteration of the loop.
The prediction accuracy for this branch that is taken 90% of the time is only 80% (8 out of 10).
Ideally, the accuracy of the predictor should match the taken branch frequency for these highly regular branches.
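
The 80% figure can be reproduced with a tiny simulation of a 1-bit predictor. The function name is illustrative. Starting the bit at "not taken" models the steady state, where the previous run of the loop ended with a not-taken branch.

```python
def one_bit_accuracy(outcomes, initial_prediction):
    """Fraction of correct predictions when the bit simply remembers the last outcome."""
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken  # 1-bit scheme: store the most recent behaviour
    return correct / len(outcomes)

# One pass over the loop branch: taken nine times, then not taken once.
loop = [True] * 9 + [False]

steady_state_accuracy = one_bit_accuracy(loop, initial_prediction=False)

if __name__ == "__main__":
    print(steady_state_accuracy)
```

The two misses are the first iteration (the bit still says not taken) and the last (the bit says taken).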

Dynamic Branch Prediction
A 2-bit prediction scheme.
A prediction must be wrong twice before it is changed.

2-bit prediction scheme [figure]

Solutions for Control Hazards
4. Scheduling the branch delay slot

Partial MIPS Instructions
Instruction  OP (6)  rs (5)  rt (5)  rd (5)  shamt (5)  funct (6)
LW           35      rs      rt      offset
SW           43      rs      rt      offset
BEQ          4       rs      rt      offset
ADD          0       rs      rt      rd      0          32
SUB          0       rs      rt      rd      0          34
AND          0       rs      rt      rd      0          36
OR           0       rs      rt      rd      0          37
SLT          0       rs      rt      rd      0          42
ADDI         8       rs      rt      imm
OUT          63      rs
* All numbers are in decimal.
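
As a worked example of these field layouts, the sketch below packs the fields into 32-bit words. The helper names are invented. The field widths follow the table header, with a 16-bit offset or immediate filling the remaining bits of the I-format rows, and the register numbers used ($s1 = 17, $s2 = 18, $s3 = 19) are the standard MIPS numbering.

```python
def encode_r(rs, rt, rd, funct, shamt=0, op=0):
    """Pack an R-format instruction: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, imm):
    """Pack an I-format instruction: op(6) rs(5) rt(5) immediate/offset(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# add $s1, $s2, $s3  ->  rd = $s1, rs = $s2, rt = $s3, funct = 32
add_word = encode_r(rs=18, rt=19, rd=17, funct=32)

# lw $s1, 100($s2)   ->  op = 35, base rs = $s2, destination rt = $s1
lw_word = encode_i(op=35, rs=18, rt=17, imm=100)

if __name__ == "__main__":
    print(f"{add_word:#010x} {lw_word:#010x}")
```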