ECE473 Computer Organization and Architecture · ECE473 Lec 15.3 Pipeline Hazards •Where one...

Preview:

Citation preview

Lec 15.1ECE473

Pipeline: Control Hazard

ECE473 Computer Architecture and Organization

Lecturer: Prof. Yifeng Zhu

Fall, 2015

Portions of these slides are derived from:

Dave Patterson © UCB

Lec 15.2ECE473

Pipelining Outline

• Introduction– Defining Pipelining

– Pipelining Instructions

• Hazards– Structural hazards

– Data Hazards

– Control Hazards \

• Performance

• Controller implementation

Lec 15.3ECE473

Pipeline Hazards

• Where one instruction cannot immediatelyfollow another

• Types of hazards– Structural hazards - attempt to use same resource twice

– Control hazards - attempt to make decision before condition is evaluated

– Data hazards - attempt to use data before it is ready

• Can always resolve hazards by waiting

Lec 15.4ECE473

Control Hazards

A control hazard is when we need to find thedestination of a branch, and can’t fetch any newinstructions until we know that destination.

A branch is either– Taken: PC <= PC + 4 + Imm

– Not Taken: PC <= PC + 4

Lec 15.5ECE473

Control Hazard on BranchesThree Stage Stall

Control Hazards

10: beq r1,r3,36

14: and r2,r3,r5

18: or r6,r1,r7

22: add r8,r1,r9

36: xor r10,r1,r11

Reg

AL

U

DMemIfetch Reg

Reg

AL

U

DMemIfetch Reg

Reg

AL

U

DMemIfetch Reg

Reg

AL

U

DMemIfetch Reg

Reg

AL

U

DMemIfetch Reg

The penalty when branch take is 3 cycles!

Lec 15.6ECE473

Basic Pipelined Processor

In our original Design, branches have a penalty of 3 cycles

Lec 15.7ECE473

Reducing Branch Delay

Move following to ID stage

a) Branch-target address calculation

b) Branch condition decision

Reduced penalty (1 cycle) when branch take!

Lec 15.8ECE473

Reducing Branch Delay: move branch logic to ID stage ->

beq

writes PC

here

new PC

used here

0 2 4 6 8 10 12

IF ID EX MEM WB

16

add $r4,$r5,$r6

beq $r0,$r1,tgt IF ID EX MEM WB

IF ID EX MEM WBsw $s4,200($t5)

18

BUBBLE BUBBLE BUBBLE BUBBLE BUBBLE

STALL

Lec 15.9ECE473

Control Hazard Solution #1

• Stall– stop loading instructions until result is available

Lec 15.10ECE473

Control Hazard Solution #2 Branch Prediction

• Just stalling for each branch is not practical

• Common assumption: branch not taken

• When assumption fails: flush three instructions

Reg

Reg

CC 1

Time (in clock cycles)

40 beq $1, $3, 7

Program

execution

order

(in instructions)

IM Reg

IM DM

IM DM

IM DM

DM

DM Reg

Reg Reg

Reg

Reg

RegIM

44 and $12, $2, $5

48 or $13, $6, $2

52 add $14, $2, $2

72 lw $4, 50($7)

CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 CC 8 CC 9

Reg

(Fig. 6.37)

Lec 15.11ECE473

Static Branch Prediction

For every branch, predict whether the branch will be taken or not taken.

Predicting branch not taken:

1. Speculatively fetch and execute in-line instructions following the branch

2. If prediction incorrect flush pipeline of speculated instructions

• Convert these instructions to NOPs by clearing pipeline registers

• These have not updated memory or registers at time of flush

Predicting branch taken:

1. Speculatively fetch and execute instructions at the branch target address

2. Useful only if target address known earlier than branch outcome

• May require stall cycles till target address known

• Flush pipeline if prediction is incorrect

• Must ensure that flushed instructions do not update memory/registers

Lec 15.12ECE473

Flush instructions in Branch Hazard

36 sub $10, $4, $8

40 beq $1, $3, 7 # taget = 40 + 4 + 7*4 = 72

44 and $12, $2, $5

48 or $13, $2, $6

52 ….

….

72 lw $4, 50($7)

Lec 15.13ECE473

Control Hazard - Stall

beq

writes PC

here

new PC

used here

0 2 4 6 8 10 12

IF ID EX MEM WB

16

add $r4,$r5,$r6

beq $r0,$r1,tgt IF ID EX MEM WB

IF ID EX MEM WBsw $s4,200($t5)

18

BUBBLE BUBBLE BUBBLE BUBBLE BUBBLE

STALL

Lec 15.14ECE473

Control Hazard - Correct Prediction

Fetch assuming

branch taken

0 2 4 6 8 10 12

IF ID EX MEM WB

16

add $r4,$r5,$r6

beq $r0,$r1,tgt IF ID EX MEM WB

IF ID EX MEM WBtgt:sw $s4,200($t5)

18

Lec 15.15ECE473

Control Hazard - Incorrect Prediction

“Squashed”

instruction

0 2 4 6 8 10 12

IF ID EX MEM WB

16

add $r4,$r5,$r6

beq $r0,$r1,tgt IF ID EX MEM WB

IF ID EX MEM WB

18

BUBBLE BUBBLE BUBBLE BUBBLE

tgt:sw $s4,200($t5)(incorrect - STALL)

IF

or $r8,$r8,$r9

Lec 15.16ECE473

Lec 15.17ECE473

Flush instructions at IF stage in Branch Hazard

Turn the instructions at IF stage into nop.

Lec 15.18ECE473

Flush instructions at IF stage in Branch Hazard

Turn the instructions at IF stage into nop.

2

zero control signals

Lec 15.19ECE473

Branch Behavior in Programs

• Based on SPEC benchmarks on DLX– Branches occur with a frequency of 14% to 16% in integer

programs and 3% to 12% in floating point programs.

– About 75% of the branches are forward branches

– 60% of forward branches are taken

– 80% of backward branches are taken

– 67% of all branches are taken

• Why are branches (especially backward branches) more likely to be taken than not taken?

Lec 15.20ECE473

1-Bit Branch Prediction

• Branch History Table (BHT): Lower bits of PC address index table of 1-bit values

– Says whether or not branch taken last time

– No address check (saves HW, but may not be right branch)

– If prediction is wrong, invert prediction bit

a31a30…a11…a2a1a0 branch instruction

1K-entry BHT

10-bit index

0

1

1

prediction bit

Instruction memory

Hypothesis: branch will do the same again.

1 = branch was last taken

0 = branch was last not taken

Lec 15.21ECE473

1-Bit Branch Prediction

• Example:

Consider a loop branch that is taken 9 times in a row and then not taken once. What is the prediction accuracy of 1-bit predictor for this branch assuming only this branch ever changes its corresponding prediction bit?

– Answer: 80%. Because there are two mispredictions – one on the first iteration and one on the last iteration. Why?

Lec 15.22ECE473

• Solution: 2-bit scheme where change prediction only if get misprediction twice

Red: stop, not taken

Green: go, taken

2-Bit Branch Prediction(Jim Smith, 1981)

T

T

NT

Predict Taken

Predict Not

Taken

Predict Taken

Predict Not

Taken

11 10

01 00T

NT

T

NT

NT

Lec 15.23ECE473

2-bit Predictor Statistics

Prediction accuracy of 4K-entry 2-bit prediction buffer on SPEC89 benchmarks:accuracy is lower for integer programs (gcc, espresso, eqntott, li) than for FP

Lec 15.24ECE473

2-bit Predictor Statistics

Prediction accuracy of 4K-entry 2-bit prediction buffer vs. “infinite” 2-bit buffer:increasing buffer size from 4K does not significantly improve performance

Lec 15.25ECE473

Control Hazard Solution #3Delay Branches

• Delayed branches – code rearranged by compiler to place independent instruction after every branch (in delay slot).

add $R4,$R5,$R6

beq $R1,$R2,20

lw $R3,400($R0)

beq $R1,$R2,20

add $R4,$R5,$R6

lw $R3,400($R0)

Lec 15.26ECE473

Scheduling the Delay Slot

Lec 15.27ECE473

Delayed Branch

• Instruction in branch delay slot is always executed

• Compiler (tries to) move a useful instruction into delay slot.

(a) From before the Branch: Always helpful when possible

ADD R1, R2, R3

BEQZ R2, L1 BEQZ R2, L1

DELAY SLOT ADD R1, R2, R3

- -

L1: L1:

• If the ADD instruction were: ADD R2, R1, R3 the move would not be possible

Lec 15.28ECE473

Delayed Branch(b) From the Target: Helps when branch is taken. May duplicate

instructions

ADD R2, R1, R3 ADD R2, R1, R3

BEQZ R2, L1 BEQZ R2, L2

DELAY SLOT SUB R4, R5, R6

- -

L1: SUB R4, R5, R6 L1: SUB R4, R5, R6

L2: L2:

Instructions between BEQ and SUB (in fall through) must not use R4.

Why is instruction at L1 duplicated? What if R5 or R6 changed?

Lec 15.29ECE473

Delayed Branch

( c ) From Fall Through: Helps when branch is not taken.

ADD R2, R1, R3 ADD R2, R1, R3

BEQZ R2, L1 BEQZ R2, L1

DELAY SLOT SUB R4, R5, R6

SUB R4, R5, R6 -

-

L1: L1:

Instructions at target (L1 and after) must not use R4 till set again.

• Cancelling (Nullifying) Branch:Branch instruction indicates direction of prediction.

If mispredicted the instruction in the delay slot is cancelled.

Greater flexibility for compiler to schedule instructions.

Lec 15.30ECE473

Delayed Branch• Limitations of delayed branch

–Compiler may not find appropriate instructions to fill delay slots. Then it fills delay slots with no-ops.

–Visible architectural feature – likely to change with new implementations

»Pipeline structure is exposed to compiler. Need to know how many delay slots.

Lec 15.31ECE473

Summary - Control Hazard Solutions

• Stall - stop fetching instr. until result is available– Significant performance penalty

– Hardware required to stall

• Predict - assume an outcome and continue fetching (undo if prediction is wrong) – Performance penalty only when guess wrong

– Hardware required to "squash" instructions

• Delayed branch - specify in architecture that following instruction is always executed– Compiler re-orders instructions into delay slot

– Insert "NOP" (no-op) operations when can't use (~50%)

– This is how original MIPS worked