74
Pipelining 3: Hazards/Forwarding/Prediction 1

Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

  • Upload
    others

  • View
    11

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

Pipelining 3: Hazards/Forwarding/Prediction

1

Page 2: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

pipeline stages

fetch — instruction memory, most PC computation

decode — reading register file

execute — computation, condition code read/write

memory — memory read/write

writeback — writing register file, writing Stat register

common case: fetch next instruction in next cyclecan’t for conditional jump, return

read/write in same stage avoids reading wrong valueget value updated for prior instruction (not earlier/later)don’t want to halt until everything else is done

2

Page 3: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

pipeline stages

fetch — instruction memory, most PC computation

decode — reading register file

execute — computation, condition code read/write

memory — memory read/write

writeback — writing register file, writing Stat register

common case: fetch next instruction in next cyclecan’t for conditional jump, return

read/write in same stage avoids reading wrong valueget value updated for prior instruction (not earlier/later)don’t want to halt until everything else is done

2

Page 4: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

pipeline stages

fetch — instruction memory, most PC computation

decode — reading register file

execute — computation, condition code read/write

memory — memory read/write

writeback — writing register file, writing Stat register

common case: fetch next instruction in next cyclecan’t for conditional jump, return

read/write in same stage avoids reading wrong valueget value updated for prior instruction (not earlier/later)

don’t want to halt until everything else is done

2

Page 5: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

pipeline stages

fetch — instruction memory, most PC computation

decode — reading register file

execute — computation, condition code read/write

memory — memory read/write

writeback — writing register file, writing Stat register

common case: fetch next instruction in next cyclecan’t for conditional jump, return

read/write in same stage avoids reading wrong valueget value updated for prior instruction (not earlier/later)

don’t want to halt until everything else is done

2

Page 6: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

Changelog

Changes made in this version not seen in first lecture:13 March 2018: correct PC update rearranging HCL example to check ifcondition codes NOT taken for correcting misprediction.

2

Page 7: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

last time

adding pipelining:divide into stagesvalues that cross stages go into pipeline registerseach stage: read from previous, write to next

pipeline execution:instruction 1 in writebackinstruction 2 in memory…instruction 5 in fetch

hazards — pipeline can’t work “naturally”data: wrong valuecontrol: wrong instruction to fetchgeneric solution: stalling

3

Page 8: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

stalling costs

with only stalling:

extra 3 cycles (total 4) for every ret

extra 2 cycles (total 3) for conditional jmp

up to 3 extra cycles for data dependencies

can we do better?

can’t easily read memory earlymight be written in previous instruction

trick: guess and check

trick: use values waiting to get to register file

4

Page 9: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

stalling costs

with only stalling:

extra 3 cycles (total 4) for every ret

extra 2 cycles (total 3) for conditional jmp

up to 3 extra cycles for data dependencies

can we do better?

can’t easily read memory earlymight be written in previous instruction

trick: guess and check

trick: use values waiting to get to register file

5

Page 10: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

stalling costs

with only stalling:

extra 3 cycles (total 4) for every ret

extra 2 cycles (total 3) for conditional jmp

up to 3 extra cycles for data dependencies

can we do better?

can’t easily read memory earlymight be written in previous instruction

trick: guess and check

trick: use values waiting to get to register file

6

Page 11: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

stalling costs

with only stalling:

extra 3 cycles (total 4) for every ret

extra 2 cycles (total 3) for conditional jmp

up to 3 extra cycles for data dependencies

can we do better?

can’t easily read memory earlymight be written in previous instruction

trick: guess and check

trick: use values waiting to get to register file

7

Page 12: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

revisiting data hazards

stalling worked

but very unsatisfying — wait 2 extra cycles to use anything?!

observation: value ready before it would be needed(just not stored in a way that let’s us get it)

8

Page 13: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

motivation

PC

Instr.Mem.

register filesrcA

srcB

R[srcA]R[srcB]

dstE

next R[dstE]

dstM

next R[dstM]

split

0xF

ADDADD

add 2

// initially %r8 = 800,// %r9 = 900, etc.addq %r8, %r9addq %r9, %r8addq ...addq ...

fetch rA rB R[srcA] R[srcB] dstE next R[dstE] dstEcycle PC rA rB R[srcA] R[srcB] dstE next R[dstE] dstE0 0x01 0x2 8 92 9 8 800 900 93 900 800 8 1700 94 1700 8

fetch/decode decode/execute execute/writeback

should be 1700

9

Page 14: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

motivation

PC

Instr.Mem.

register filesrcA

srcB

R[srcA]R[srcB]

dstE

next R[dstE]

dstM

next R[dstM]

split

0xF

ADDADD

add 2

// initially %r8 = 800,// %r9 = 900, etc.addq %r8, %r9addq %r9, %r8addq ...addq ...

fetch rA rB R[srcA] R[srcB] dstE next R[dstE] dstEcycle PC rA rB R[srcA] R[srcB] dstE next R[dstE] dstE0 0x01 0x2 8 92 9 8 800 900 93 900 800 8 1700 94 1700 8

fetch/decode decode/execute execute/writeback

should be 1700

9

Page 15: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

forwarding

PC

Instr.Mem.

register filesrcA

srcB

R[srcA]R[srcB]

dstE

next R[dstE]

dstM

next R[dstM]

split

0xF

ADDADD

add 2

addq %r8, %r9 // (1)addq %r9, %r8 // (2)

addq %r10, %r9 // (2b)reg #s 9, 8 from (2)reg # 9,R8=800; R9=900 (1)(2)old R9=900,R8=800

(2)R9=1700 (forwarded)R8=800

(2b)R10=1000,R9=1700 (forwarded)

new R9=1700 (1)

MUX

MUX

10

Page 16: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

forwarding

PC

Instr.Mem.

register filesrcA

srcB

R[srcA]R[srcB]

dstE

next R[dstE]

dstM

next R[dstM]

split

0xF

ADDADD

add 2

addq %r8, %r9 // (1)addq %r9, %r8 // (2)

addq %r10, %r9 // (2b)

reg #s 9, 8 from (2)reg # 9,R8=800; R9=900 (1)

(2)old R9=900,R8=800

(2)R9=1700 (forwarded)R8=800

(2b)R10=1000,R9=1700 (forwarded)

new R9=1700 (1)

MUX

MUX

10

Page 17: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

forwarding

PC

Instr.Mem.

register filesrcA

srcB

R[srcA]R[srcB]

dstE

next R[dstE]

dstM

next R[dstM]

split

0xF

ADDADD

add 2

addq %r8, %r9 // (1)addq %r9, %r8 // (2)

addq %r10, %r9 // (2b)reg #s 9, 8 from (2)reg # 9,R8=800; R9=900 (1)

(2)old R9=900,R8=800

(2)R9=1700 (forwarded)R8=800

(2b)R10=1000,R9=1700 (forwarded)

new R9=1700 (1)

MUX

MUX

10

Page 18: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

forwarding

PC

Instr.Mem.

register filesrcA

srcB

R[srcA]R[srcB]

dstE

next R[dstE]

dstM

next R[dstM]

split

0xF

ADDADD

add 2

addq %r8, %r9 // (1)addq %r9, %r8 // (2)

addq %r10, %r9 // (2b)reg #s 9, 8 from (2)reg # 9,R8=800; R9=900 (1)

(2)old R9=900,R8=800

(2)R9=1700 (forwarded)R8=800

(2b)R10=1000,R9=1700 (forwarded)

new R9=1700 (1)

MUX

MUX

10

Page 19: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

forwarding

PC

Instr.Mem.

register filesrcA

srcB

R[srcA]R[srcB]

dstE

next R[dstE]

dstM

next R[dstM]

split

0xF

ADDADD

add 2

addq %r8, %r9 // (1)addq %r9, %r8 // (2)

addq %r10, %r9 // (2b)reg #s 9, 8 from (2)reg # 9,R8=800; R9=900 (1)(2)old R9=900,R8=800

(2)R9=1700 (forwarded)R8=800

(2b)R10=1000,R9=1700 (forwarded)

new R9=1700 (1)

MUX

MUX

10

Page 20: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

forwarding

PC

Instr.Mem.

register filesrcA

srcB

R[srcA]R[srcB]

dstE

next R[dstE]

dstM

next R[dstM]

split

0xF

ADDADD

add 2

addq %r8, %r9 // (1)addq %r9, %r8 // (2)addq %r10, %r9 // (2b)

reg #s 9, 8 from (2)reg # 9,R8=800; R9=900 (1)(2)old R9=900,R8=800

(2)R9=1700 (forwarded)R8=800

(2b)R10=1000,R9=1700 (forwarded)

new R9=1700 (1)

MUX

MUX

10

Page 21: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

forwarding: MUX conditions

PC

Instr.Mem.

register filesrcA

srcB

R[srcA]R[srcB]

dstE

next R[dstE]

dstM

next R[dstM]

split

0xF

ADDADD

add 2

MUX

MUX

addq %r8, %r9 // (1)addq %r9, %r8 // (2)

d_valA= [condition : e_valE;1 : reg_outputA;

];What could condition be?a. W_rA == reg_srcAb. W_dstE == reg_srcAc. e_dstE == reg_srcAd. d_rB == reg_srcAe. something else

11

Page 22: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

forwarding: MUX conditions

PC

Instr.Mem.

register filesrcA

srcB

R[srcA]R[srcB]

dstE

next R[dstE]

dstM

next R[dstM]

split

0xF

ADDADD

add 2

MUX

MUX

addq %r8, %r9 // (1)addq %r9, %r8 // (2)

d_valA= [condition : e_valE;1 : reg_outputA;

];What could condition be?a. W_rA == reg_srcAb. W_dstE == reg_srcAc. e_dstE == reg_srcAd. d_rB == reg_srcAe. something else

11

Page 23: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

forwarding: MUX conditions

PC

Instr.Mem.

register filesrcA

srcB

R[srcA]R[srcB]

dstE

next R[dstE]

dstM

next R[dstM]

split

0xF

ADDADD

add 2

MUX

MUX

addq %r8, %r9 // (1)addq %r9, %r8 // (2)

d_valA= [condition : e_valE;1 : reg_outputA;

];What could condition be?a. W_rA == reg_srcAb. W_dstE == reg_srcAc. e_dstE == reg_srcAd. d_rB == reg_srcAe. something else

11

Page 24: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

forwarding: MUX conditions

PC

Instr.Mem.

register filesrcA

srcB

R[srcA]R[srcB]

dstE

next R[dstE]

dstM

next R[dstM]

split

0xF

ADDADD

add 2

MUX

MUX

addq %r8, %r9 // (1)addq %r9, %r8 // (2)

d_valA= [condition : e_valE;1 : reg_outputA;

];What could condition be?a. W_rA == reg_srcAb. W_dstE == reg_srcAc. e_dstE == reg_srcAd. d_rB == reg_srcAe. something else

11

Page 25: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

forwarding: MUX conditions

PC

Instr.Mem.

register filesrcA

srcB

R[srcA]R[srcB]

dstE

next R[dstE]

dstM

next R[dstM]

split

0xF

ADDADD

add 2

MUX

MUX

addq %r8, %r9 // (1)addq %r9, %r8 // (2)

d_valA= [condition : e_valE;1 : reg_outputA;

];What could condition be?a. W_rA == reg_srcAb. W_dstE == reg_srcAc. e_dstE == reg_srcAd. d_rB == reg_srcAe. something else

11

Page 26: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

some forwarding paths

cycle # 0 1 2 3 4 5 6 7 8addq %r8, %r9 F D E M Wsubq %r9, %r11 F D E M Wmrmovq 4(%r11), %r10 F D E M Wrmmovq %r9, 8(%r11) F D E M Wxorq %r10, %r9 F D E M W

12

Page 27: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

some forwarding paths

cycle # 0 1 2 3 4 5 6 7 8addq %r8, %r9 F D E M Wsubq %r9, %r11 F D E M Wmrmovq 4(%r11), %r10 F D E M Wrmmovq %r9, 8(%r11) F D E M Wxorq %r10, %r9 F D E M W

12

Page 28: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

some forwarding paths

cycle # 0 1 2 3 4 5 6 7 8addq %r8, %r9 F D E M Wsubq %r9, %r11 F D E M Wmrmovq 4(%r11), %r10 F D E M Wrmmovq %r9, 8(%r11) F D E M Wxorq %r10, %r9 F D E M W

12

Page 29: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

some forwarding paths

cycle # 0 1 2 3 4 5 6 7 8addq %r8, %r9 F D E M Wsubq %r9, %r11 F D E M Wmrmovq 4(%r11), %r10 F D E M Wrmmovq %r9, 8(%r11) F D E M Wxorq %r10, %r9 F D E M W

12

Page 30: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

some forwarding paths

cycle # 0 1 2 3 4 5 6 7 8addq %r8, %r9 F D E M Wsubq %r9, %r11 F D E M Wmrmovq 4(%r11), %r10 F D E M Wrmmovq %r9, 8(%r11) F D E M Wxorq %r10, %r9 F D E M W

12

Page 31: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

some forwarding paths

cycle # 0 1 2 3 4 5 6 7 8addq %r8, %r9 F D E M Wsubq %r9, %r11 F D E M Wmrmovq 4(%r11), %r10 F D E M Wrmmovq %r9, 8(%r11) F D E M Wxorq %r10, %r9 F D E M W

12

Page 32: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

some forwarding paths

cycle # 0 1 2 3 4 5 6 7 8addq %r8, %r9 F D E M Wsubq %r9, %r11 F D E M Wmrmovq 4(%r11), %r10 F D E M Wrmmovq %r9, 8(%r11) F D E M Wxorq %r10, %r9 F D E M W

12

Page 33: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

some forwarding paths

cycle # 0 1 2 3 4 5 6 7 8addq %r8, %r9 F D E M Wsubq %r9, %r11 F D E M Wmrmovq 4(%r11), %r10 F D E M Wrmmovq %r9, 8(%r11) F D E M Wxorq %r10, %r9 F D E M W

12

Page 34: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

some forwarding paths

cycle # 0 1 2 3 4 5 6 7 8addq %r8, %r9 F D E M Wsubq %r9, %r11 F D E M Wmrmovq 4(%r11), %r10 F D E M Wrmmovq %r9, 8(%r11) F D E M Wxorq %r10, %r9 F D E M W

12

Page 35: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

multiple forwarding paths (1)

cycle # 0 1 2 3 4 5 6 7 8addq %r10, %r8 F D E M Waddq %r11, %r8 F D E M Waddq %r12, %r8 F D E M W

13

Page 36: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

multiple forwarding paths (1)

cycle # 0 1 2 3 4 5 6 7 8addq %r10, %r8 F D E M Waddq %r11, %r8 F D E M Waddq %r12, %r8 F D E M W

13

Page 37: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

multiple forwarding HCL (1)

/* decode output: valA */d_valA = [

...reg_srcA == e_dstE : e_valE;

/* forward from end of execute */

reg_srcA == m_dstE : m_valE;/* forward from end of memory */

...1 : reg_outputA;

];14

Page 38: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

multiple forwarding paths (2)

cycle # 0 1 2 3 4 5 6 7 8addq %r10, %r8 F D E M Waddq %r11, %r12 F D E M Waddq %r12, %r8 F D E M W

15

Page 39: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

multiple forwarding paths (2)

cycle # 0 1 2 3 4 5 6 7 8addq %r10, %r8 F D E M Waddq %r11, %r12 F D E M Waddq %r12, %r8 F D E M W

15

Page 40: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

multiple forwarding paths (2)

cycle # 0 1 2 3 4 5 6 7 8addq %r10, %r8 F D E M Waddq %r11, %r12 F D E M Waddq %r12, %r8 F D E M W

15

Page 41: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

multiple forwarding HCL (2)

d_valA = [...reg_srcA == e_dstE : e_valE;...1 : reg_outputA;

];...d_valB = [

...reg_srcB == m_dstE : m_valE;...1 : reg_outputA;

];16

Page 42: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

hazards versus dependencies

dependency — X needs result of instruction Y?

hazard — will it not work in some pipeline?before extra work is done to “resolve” hazardslike forwarding or stalling or branch prediction

17

Page 43: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

ex.: dependencies and hazards (1)addq %rax, %rbx

subq %rax, %rcx

irmovq $100, %rcx

addq %rcx, %r10

addq %rbx, %r10

where are dependencies?which are hazards in our pipeline?which are resolved with forwarding?

18

Page 44: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

ex.: dependencies and hazards (1)addq %rax, %rbx

subq %rax, %rcx

irmovq $100, %rcx

addq %rcx, %r10

addq %rbx, %r10

where are dependencies?which are hazards in our pipeline?which are resolved with forwarding?

18

Page 45: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

ex.: dependencies and hazards (1)addq %rax, %rbx

subq %rax, %rcx

irmovq $100, %rcx

addq %rcx, %r10

addq %rbx, %r10

where are dependencies?which are hazards in our pipeline?which are resolved with forwarding?

18

Page 46: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

ex.: dependencies and hazards (1)addq %rax, %rbx

subq %rax, %rcx

irmovq $100, %rcx

addq %rcx, %r10

addq %rbx, %r10

where are dependencies?which are hazards in our pipeline?which are resolved with forwarding?

18

Page 47: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

ex.: dependencies and hazards (2)mrmovq 0(%rax) %rbx

addq %rbx %rcx

jne foo

addq %rcx %rdx

mrmovq (%rdx) %rcx

foo:

where are dependencies?which are hazards in our pipeline?which are resolved with forwarding?

19

Page 48: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

pipeline with different hazards

example: 4-stage pipeline:fetch/decode/execute+memory/writeback

// 4 stage // 5 stageaddq %rax, %r8 // // Wsubq %rax, %r9 // W // Mxorq %rax, %r10 // EM // Eandq %r8, %r11 // D // D

addq/andq is hazard with 5-stage pipeline

addq/andq is not a hazard with 4-stage pipeline

20

Page 49: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

pipeline with different hazards

example: 4-stage pipeline:fetch/decode/execute+memory/writeback

// 4 stage // 5 stageaddq %rax, %r8 // // Wsubq %rax, %r9 // W // Mxorq %rax, %r10 // EM // Eandq %r8, %r11 // D // D

addq/andq is hazard with 5-stage pipeline

addq/andq is not a hazard with 4-stage pipeline

20

Page 50: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

exercise: different pipeline

split execute into two stages: F/D/E1/E2/M/W

result only available after second execute stage

where does forwarding, stalls occur?cycle # 0 1 2 3 4 5 6 7 8

addq %rcx, %r9 F D E1 E2 M Waddq %r9, %rbx F D D E1 E2 M Waddq %rax, %r9 F F D E1 E2 M Wrmmovq %r9, (%rbx) F D E1 E2 M W

21

Page 51: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

exercise: different pipeline

split execute into two stages: F/D/E1/E2/M/W

cycle # 0 1 2 3 4 5 6 7 8addq %rcx, %r9 F D E1 E2 M Waddq %r9, %rbx F D E1 E2 M Waddq %r9, %rbx F D D E1 E2 M Waddq %rax, %r9 F D E1 E2 M Waddq %rax, %r9 F F D E1 E2 M Wrmmovq %r9, (%rbx) F D E1 E2 M Wrmmovq %r9, (%rbx) F D E1 E2 M W

22

Page 52: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

exercise: different pipeline

split execute into two stages: F/D/E1/E2/M/W

cycle # 0 1 2 3 4 5 6 7 8addq %rcx, %r9 F D E1 E2 M Waddq %r9, %rbx F D E1 E2 M Waddq %r9, %rbx F D D E1 E2 M Waddq %rax, %r9 F D E1 E2 M Waddq %rax, %r9 F F D E1 E2 M Wrmmovq %r9, (%rbx) F D E1 E2 M Wrmmovq %r9, (%rbx) F D E1 E2 M W

22

Page 53: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

exercise: different pipeline

split execute into two stages: F/D/E1/E2/M/W

cycle # 0 1 2 3 4 5 6 7 8addq %rcx, %r9 F D E1 E2 M Waddq %r9, %rbx F D E1 E2 M Waddq %r9, %rbx F D D E1 E2 M Waddq %rax, %r9 F D E1 E2 M Waddq %rax, %r9 F F D E1 E2 M Wrmmovq %r9, (%rbx) F D E1 E2 M Wrmmovq %r9, (%rbx) F D E1 E2 M W

22

Page 54: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

exercise: different pipeline

split execute into two stages: F/D/E1/E2/M/W

cycle # 0 1 2 3 4 5 6 7 8addq %rcx, %r9 F D E1 E2 M Waddq %r9, %rbx F D E1 E2 M Waddq %r9, %rbx F D D E1 E2 M Waddq %rax, %r9 F D E1 E2 M Waddq %rax, %r9 F F D E1 E2 M Wrmmovq %r9, (%rbx) F D E1 E2 M Wrmmovq %r9, (%rbx) F D E1 E2 M W

22

Page 55: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

exercise: different pipeline

split execute into two stages: F/D/E1/E2/M/W

cycle # 0 1 2 3 4 5 6 7 8addq %rcx, %r9 F D E1 E2 M Waddq %r9, %rbx F D E1 E2 M Waddq %r9, %rbx F D D E1 E2 M Waddq %rax, %r9 F D E1 E2 M Waddq %rax, %r9 F F D E1 E2 M Wrmmovq %r9, (%rbx) F D E1 E2 M Wrmmovq %r9, (%rbx) F D E1 E2 M W

22

Page 56: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

stalling costs

with only stalling:

extra 3 cycles (total 4) for every ret

extra 2 cycles (total 3) for conditional jmp

up to 3 extra cycles for data dependencies

can we do better?

can’t easily read memory earlymight be written in previous instruction

trick: guess and check

trick: use values waiting to get to register file

23

Page 57: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

when do instructions change things?

… other than pipeline registers/PC:stage changesfetch (none)decode (none)execute condition codesmemory memory writeswriteback register writes/stat changes

to “undo” instruction during fetch/decode:forget everything in pipeline registers

24

Page 58: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

when do instructions change things?

… other than pipeline registers/PC:stage changesfetch (none)decode (none)execute condition codesmemory memory writeswriteback register writes/stat changes

to “undo” instruction during fetch/decode:forget everything in pipeline registers

24

Page 59: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

making guesses

subq %rcx, %raxjne LABELxorq %r10, %r11xorq %r12, %r13...

LABEL: addq %r8, %r9rmmovq %r10, 0(%r11)

speculate: jne will goto LABEL

right: 2 cycles faster!

wrong: forget before execute finishes25

Page 60: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

jXX: speculating right

time fetch decode execute memory writeback1 subq2 jne subq3 addq [?] jne subq (set ZF)4 rmmovq [?] addq [?] jne (use ZF) OPq5 irmovq rmmovq addq jne (done) OPq

subq %r8, %r8jne LABEL...

LABEL: addq %r8, %r9rmmovq %r10, 0(%r11)irmovq $1, %r11

were waiting/nothing

26

Page 61: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

jXX: speculating right

time fetch decode execute memory writeback1 subq2 jne subq3 addq [?] jne subq (set ZF)4 rmmovq [?] addq [?] jne (use ZF) OPq5 irmovq rmmovq addq jne (done) OPq

subq %r8, %r8jne LABEL...

LABEL: addq %r8, %r9rmmovq %r10, 0(%r11)irmovq $1, %r11

were waiting/nothing

26

Page 62: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

jXX: speculating wrong

time fetch decode execute memory writeback1 subq2 jne subq3 addq [?] jne subq (set ZF)4 rmmovq [?] addq [?] jne (use ZF) OPq5 xorq nothing nothing jne (done) OPq

subq %r8, %r8jne LABELxorq %r10, %r11...

LABEL: addq %r8, %r9rmmovq %r10, 0(%r11)

“squash” wrong guessesfetch correct next instruction

27

Page 63: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

jXX: speculating wrong

time fetch decode execute memory writeback1 subq2 jne subq3 addq [?] jne subq (set ZF)4 rmmovq [?] addq [?] jne (use ZF) OPq5 xorq nothing nothing jne (done) OPq

subq %r8, %r8jne LABELxorq %r10, %r11...

LABEL: addq %r8, %r9rmmovq %r10, 0(%r11)

“squash” wrong guesses

fetch correct next instruction

27

Page 64: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

jXX: speculating wrong

time fetch decode execute memory writeback1 subq2 jne subq3 addq [?] jne subq (set ZF)4 rmmovq [?] addq [?] jne (use ZF) OPq5 xorq nothing nothing jne (done) OPq

subq %r8, %r8jne LABELxorq %r10, %r11...

LABEL: addq %r8, %r9rmmovq %r10, 0(%r11)

“squash” wrong guesses

fetch correct next instruction

27

Page 65: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

performance

kind portion cycles (predict) cycles (stall)not-taken jXX 3% 3 3

taken jXX 5% 1 3ret 1% 4 4

others 91% 1* 1*

hypothetical instruction mix

predict: 3 × .03 + 1 × .05 + 4 × .01 + 1 × .91 =1.09 cycles/instr.

stall: 3 × .03 + 3 × .05 + 4 × .01 + 1 × .91 =1.19 cylces/instr.

* — ignoring data hazards 28

Page 66: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

performance

kind portion cycles (predict) cycles (stall)not-taken jXX 3% 3 3

taken jXX 5% 1 3ret 1% 4 4

others 91% 1* 1*

hypothetical instruction mix

predict: 3 × .03 + 1 × .05 + 4 × .01 + 1 × .91 =1.09 cycles/instr.

stall: 3 × .03 + 3 × .05 + 4 × .01 + 1 × .91 =1.19 cylces/instr.

* — ignoring data hazards 28

Page 67: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

PC update (adding stall)

PC

MUX

convert icode icode (from instr. mem)

+2+10…

to instr. mem

MUX

control logic need to stall?“taken” (from execute), ret ready?

jump targetaddr. after mispredicted jump/ret address

29

Page 68: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

PC update (adding stall)

PC

MUX

convert icode icode (from instr. mem)

+2+10…

to instr. mem

MUX

control logic need to stall?“taken” (from execute), ret ready?

jump targetaddr. after mispredicted jump/ret address

29

Page 69: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

PC update (rearranged)

predicted PC(replaces PC)

MUX

convert icode icode (from instr. mem)

need to stall?

+2+10jump/call target…

MUX

control logic

to stall logic

taken?; etc

…address after mispred. jump

address from ret to instr. mem.

same logic as before — but happens in next cycleinputs are from slightly different place…(e.g. ‘taken?’ from execute to memory registers, not execute directly)

30

Page 70: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

PC update (rearranged)

predicted PC(replaces PC)

MUX

convert icode icode (from instr. mem)

need to stall?

+2+10jump/call target…

MUX

control logic

to stall logic

taken?; etc

…address after mispred. jump

address from ret to instr. mem.

same logic as before — but happens in next cycleinputs are from slightly different place…(e.g. ‘taken?’ from execute to memory registers, not execute directly)

30

Page 71: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

PC update (rearranged)

predicted PC(replaces PC)

MUX

convert icode icode (from instr. mem)

need to stall?

+2+10jump/call target…

MUX

control logic

to stall logic

taken?; etc

…address after mispred. jump

address from ret to instr. mem.

same logic as before — but happens in next cycleinputs are from slightly different place…(e.g. ‘taken?’ from execute to memory registers, not execute directly)

30

Page 72: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

PC update (rearranged)

predicted PC(replaces PC)

MUX

convert icode icode (from instr. mem)

need to stall?

+2+10jump/call target…

MUX

control logic

to stall logic

taken?; etc

…address after mispred. jump

address from ret to instr. mem.

same logic as before — but happens in next cycleinputs are from slightly different place…(e.g. ‘taken?’ from execute to memory registers, not execute directly)

30

Page 73: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

rearranged PC update in HCL

/* replacing the PC register: */register fF {

predictedPC: 64 = 0;}

/* actual input to instruction memory */pc = [

conditionCodesSaidNotTaken : jumpValP;/* from later in pipeline */

...1: F_predictedPC;

];

31

Page 74: Pipelining 3: Hazards/Forwarding/Predictioncr4bd/3330/S2018/slides/20180313--slides-1up.pdfpipelinestages fetch—instructionmemory,mostPCcomputation decode—readingregisterfile execute—computation,conditioncoderead/write

why rearrange PC update?

either workscorrect PC at beginning or end of cycle?still some time in cycle to do so…

maybe easier to think about branch prediction this way?

32