
CA226 — Advanced Computer Architecture

Stephen Blott <[email protected]>


…Today:

• data hazards


…Recall:

• the MIPS pipeline implements instruction level parallelism

• ideally, up to five instructions are executed (in part) on any clock cycle

• if one instruction were to exit the pipeline on each cycle:

• then the CPI would be 1

and, ideally, the MIPS pipeline approaches a CPI of 1


MIPS Pipeline


Example

    daddi r1,r1,1
    daddi r2,r2,1
    daddi r3,r3,1
    daddi r4,r4,1
    daddi r5,r5,1

Note

Note to self: see pipeline.s.


Speedup

Ideally:

• each instruction takes 5 cycles to execute

• however, 5 instructions are in the pipeline

• so the number of cycles per instruction approaches 1
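The arithmetic behind this can be sketched in a few lines of Python. This is a minimal sketch, assuming an ideal five-stage pipeline with no stalls, in which the first instruction takes five cycles and each later instruction completes one cycle after its predecessor. The function names `ideal_cycles` and `ideal_cpi` are illustrative only; a simulator may count cycles slightly differently.

```python
def ideal_cycles(n_instructions, stages=5):
    """Cycles to run n instructions on an ideal pipeline with no stalls.

    The first instruction needs `stages` cycles; each later instruction
    completes one cycle after its predecessor.
    """
    if n_instructions <= 0:
        return 0
    return stages + (n_instructions - 1)


def ideal_cpi(n_instructions, stages=5):
    """Average cycles per instruction for the ideal, stall-free case."""
    return ideal_cycles(n_instructions, stages) / n_instructions


if __name__ == "__main__":
    # Repeating the five-instruction block pushes the CPI towards 1.
    for n in (5, 50, 500):
        print(n, ideal_cycles(n), round(ideal_cpi(n), 3))
```

Under these assumptions, five instructions take 9 cycles (a CPI of 1.8), while 500 instructions take 504 cycles (a CPI just above 1).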

Note

Note to self: observe the effect on CPI of repeating the previous block of instructions.


Hazards

The major hurdle to effective pipeline implementation is:

• hazards


Types of Hazard

Structural hazards
    resource conflicts; the hardware cannot support all instruction combinations simultaneously

Data hazards
    when one instruction depends upon the result (which is not yet available) of a previous instruction (today)

Control hazards
    when the address of the next instruction cannot be determined immediately


Data Hazards — Example

Consider:

    dadd r1,r2,r3     ; instruction 1
    dsub r4,r1,r5     ; instruction 2
    and  r6,r1,r5     ; instruction 3
    or   r8,r1,r9     ; instruction 4
    xor  r10,r1,r11   ; instruction 5

Instructions 2, 3, 4 and 5:

• each depends upon the result of instruction 1 (they all read r1)
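The dependency can be found mechanically by comparing each instruction's source registers with the destination of an earlier instruction. The following is a minimal sketch, assuming only three-operand register instructions of the form `op rd,rs,rt`; the helper names `parse` and `raw_dependents` are made up for this illustration.

```python
def parse(instruction):
    """Split 'dadd r1,r2,r3' into (opcode, destination, [sources]).

    Only three-operand register instructions are handled here.
    """
    opcode, operands = instruction.split(None, 1)
    dest, *sources = [r.strip() for r in operands.split(",")]
    return opcode, dest, sources


def raw_dependents(program, producer_index):
    """Return 1-based numbers of later instructions that read the producer's result."""
    _, dest, _ = parse(program[producer_index])
    dependents = []
    for i in range(producer_index + 1, len(program)):
        _, later_dest, sources = parse(program[i])
        if dest in sources:
            dependents.append(i + 1)
        if later_dest == dest:
            break  # the register is overwritten; later reads see the new value
    return dependents


PROGRAM = [
    "dadd r1,r2,r3",
    "dsub r4,r1,r5",
    "and r6,r1,r5",
    "or r8,r1,r9",
    "xor r10,r1,r11",
]

if __name__ == "__main__":
    print(raw_dependents(PROGRAM, 0))
```

For the program above, this reports instructions 2, 3, 4 and 5 as readers of r1.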


Ok …

Turn off forwarding, and let’s try running that …

Note to self:

• see hazards1.s.


Illustration

Table 1. Two Read-After-Write (RAW) pipeline stalls:

                  1      2      3      4      5      6      7
dadd r1,r2,r3     IF     ID     Ex     Mem    WB*
dsub r4,r1,r5            IF     ID     RAW    RAW    *Ex
and r6,r1,r5                    IF     stall  stall  ID
or r8,r1,r9                                          IF

Note

This assumes that we can both write and read the register file in a single clock cycle. Typically, the write happens in the first half of the cycle, and the read in the second half.


Observations

This is known as a read after write (or RAW) stall:

• instruction 2 is blocked at ID because one of its arguments (registers) is not yet available

• in this case, all subsequent instructions are blocked too, which is known as a pipeline stall


Next, …

Consider:

• the effect of replacing instruction 2 with a nop instruction (or any other, non-dependent instruction)


Illustration

Table 2. Still one RAW stall:

                  1      2      3      4      5      6      7
dadd r1,r2,r3     IF     ID     Ex     Mem    WB*
nop                      IF     ID     Ex     Mem    WB
and r6,r1,r5                    IF     ID     RAW    *Ex    Mem
or r8,r1,r9                            IF     stall  ID     Ex


Next, …

Finally, consider:

• the effect of replacing instruction 3 with a nop instruction (or any other, non-dependent instruction)


Illustration

Table 3. No stalls:

                  1      2      3      4      5      6      7
dadd r1,r2,r3     IF     ID     Ex     Mem    WB*
nop                      IF     ID     Ex     Mem    WB
nop                             IF     ID     Ex     Mem
or r8,r1,r9                            IF     ID     *Ex    Mem


…

We could:

• find (two) other (independent) instructions to insert between such write-read dependencies

• but such dependencies are common, and we rarely have enough instructions to fill the gaps
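The three illustrations (two stalls, one stall, no stalls) follow a simple rule, sketched below. This is a minimal sketch, assuming forwarding is off, a five-stage pipeline, and a register file that is written in the first half of a cycle and read in the second half. The function name `stalls_without_forwarding` is made up for this illustration.

```python
def stalls_without_forwarding(distance):
    """RAW stall cycles when forwarding is off.

    `distance` is how many instructions after the producer the consumer
    appears (1 = immediately after). Assumes a five-stage pipeline whose
    register file is written in the first half of a cycle and read in the
    second half, so the consumer's ID may coincide with the producer's WB.
    """
    if distance < 1:
        raise ValueError("the consumer must follow the producer")
    # The producer's WB is 3 cycles after its ID, so the consumer's ID
    # must be delayed until it is at least 3 cycles behind.
    return max(0, 3 - distance)


if __name__ == "__main__":
    for d in (1, 2, 3, 4):
        print(d, stalls_without_forwarding(d))
```

A distance of 1 corresponds to the dsub directly after the dadd, a distance of 2 to one intervening nop, and a distance of 3 to two intervening nops.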


…

However, such hazards are not insurmountable:

• the ALU produces the necessary value in cycle 3 (although it is not written back to the register file until cycle 5)

• that value is not needed by instruction 2 until cycle 4


Table 4. The value is available after cycle 3:

                  1      2      3      4      5      6      7
dadd r1,r2,r3     IF     ID     Ex**   Mem    WB*
dsub r4,r1,r5            IF     ID     RAW    RAW    *Ex
and r6,r1,r5                    IF     stall  stall  ID
or r8,r1,r9                                          IF


Forwarding

Solution:

• data paths are added:

    • EX/Mem.ALUOutput → ID/EX.A (output)
    • EX/Mem.ALUOutput → ID/EX.B (output)
    • Mem/WB.ALUOutput → ID/EX.A (output)
    • Mem/WB.ALUOutput → ID/EX.B (output)

• when a read-after-write is detected, the ALU input (either ID/EX.A or ID/EX.B) is switched to one of the two available ALUOutput pipeline registers (EX/Mem or Mem/WB)
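The selection made for each ALU input can be sketched as a small decision function. This is a minimal sketch, assuming registers are identified by number, that both earlier instructions are ALU instructions (so their results sit in ALUOutput), and that the most recent result takes priority. The function name `select_alu_input` and its parameters are hypothetical and not taken from any real hardware description.

```python
def select_alu_input(source_reg, ex_mem_dest, mem_wb_dest):
    """Choose where an ALU operand should come from.

    source_reg  -- register number read by the instruction entering Ex
    ex_mem_dest -- destination register of the instruction one ahead, or None
    mem_wb_dest -- destination register of the instruction two ahead, or None
    """
    if source_reg == 0:
        return "ID/EX"             # r0 is always zero, so nothing is forwarded
    if ex_mem_dest == source_reg:
        return "EX/Mem.ALUOutput"  # the newest value wins
    if mem_wb_dest == source_reg:
        return "Mem/WB.ALUOutput"
    return "ID/EX"                 # no hazard: use the value read in ID


if __name__ == "__main__":
    # dsub r4,r1,r5 directly after dadd r1,r2,r3: operand A reads r1.
    print(select_alu_input(1, ex_mem_dest=1, mem_wb_dest=None))
    # and r6,r1,r5 two instructions after the dadd.
    print(select_alu_input(1, ex_mem_dest=4, mem_wb_dest=1))
```

Checking EX/Mem before Mem/WB matters when two earlier instructions write the same register: the later write is the one the program expects to see.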


MIPS Pipeline


Forwarding

                  1      2      3      4      5      6      7
dadd r1,r2,r3     IF     ID     Ex**   Mem    WB
dsub r4,r1,r5            IF     ID     **Ex   Mem    WB
and r6,r1,r5                    IF     ID     Ex     Mem    WB
or r8,r1,r9                            IF     ID     Ex     Mem

One of:

• EX/Mem.ALUOutput → ID/EX.A
• EX/Mem.ALUOutput → ID/EX.B


Forwarding

                  1      2      3      4      5      6      7
dadd r1,r2,r3     IF     ID     Ex     Mem**  WB
nop                      IF     ID     Ex     Mem    WB
and r6,r1,r5                    IF     ID     **Ex   Mem    WB
or r8,r1,r9                            IF     ID     Ex     Mem

One of:

• Mem/WB.ALUOutput → ID/EX.A
• Mem/WB.ALUOutput → ID/EX.B


The WinMIPS64 Simulator

The WinMIPS64 simulator:

• supports forwarding, which can be either enabled or disabled

• see: Configure/Enable Forwarding


…

Try turning on forwarding:

• and running the example again … (hazards1.s)


Now, consider the following …

    daddi r1,r2,123   ; instruction 1
    ld    r4,0(r1)    ; instruction 2
    sd    r4,8(r1)    ; instruction 3

Here:

• there is a RAW dependency between the daddi instruction and the address calculation in both of the following instructions

• the address calculation is handled by the ALU, so these are handled by forwarding, as before


Illustration

Table 5. No stalls due to address calculation:

                  1      2      3      4      5      6      7
daddi r1,r2,123   IF     ID     Ex**   Mem++  WB
ld r4,0(r1)              IF     ID     **Ex   Mem    WB
sd r4,8(r1)                     IF     ID     ++Ex   Mem    WB

• EX/Mem.ALUOutput → ID/EX.A for cycle 4
• Mem/WB.ALUOutput → ID/EX.A for cycle 5




And, again …

    daddi r1,r2,123   ; instruction 1
    ld    r4,0(r1)    ; instruction 2
    sd    r4,8(r1)    ; instruction 3

Also:

• the sd instruction depends upon the result of the ld


Table 6. This can be solved by forwarding too:

                  1      2      3      4      5      6      7
daddi r1,r2,123   IF     ID     Ex     Mem    WB
ld r4,0(r1)              IF     ID     Ex     Mem**  WB
sd r4,8(r1)                     IF     ID     Ex     **Mem  WB

Here:

• Mem/WB.LMD → EX/MEM.B for cycle 6


In full …

                  1      2      3      4      5      6      7
daddi r1,r2,123   IF     ID     Ex++   Mem==  WB
ld r4,0(r1)              IF     ID     ++Ex   Mem**  WB
sd r4,8(r1)                     IF     ID     ==Ex   **Mem  WB

• EX/Mem.ALUOutput → ID/EX.A for cycle 4
• Mem/WB.ALUOutput → ID/EX.A for cycle 5
• Mem/WB.LMD → EX/MEM.B for cycle 6


…

In all:

• four pipeline stalls are eliminated (note to self: see stalls1.s)


MIPS Pipeline


Unfortunately …

Forwarding cannot solve all RAW problems:

    ld   r1,n(r0)
    dadd r2,r1,r0


Table 7. You can’t forward backwards in time:

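                  1      2      3      4      5      6      7
ld r1,n(r0)       IF     ID     Ex     Mem**  WB
dadd r2,r1,r0            IF     ID     **Ex   Mem    WB

Clearly:

• this is not possible: the loaded value only exists at the end of cycle 4, but the dadd would need it at the start of cycle 4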


An Insurmountable Stall

Table 8. An inevitable stall of one cycle:

                  1      2      3      4      5      6      7
ld r1,n(r0)       IF     ID     Ex     Mem**  WB
dadd r2,r1,r0            IF     ID     RAW    **Ex   Mem


More generally, …

Unlike arithmetic instructions:

• loads yield values only after the Mem stage of the pipeline, so a stall cannot be avoided when the very next instruction needs the loaded value in Ex
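The difference between arithmetic instructions and loads can be expressed as a comparison between the cycle in which a value first becomes usable and the cycle in which the consumer wants it. This is a minimal sketch, assuming forwarding is enabled on the five-stage pipeline used throughout; the table `STAGE_OFFSET` and the function `stalls_with_forwarding` are names made up for this illustration.

```python
# Stage offsets relative to the cycle in which an instruction is fetched.
STAGE_OFFSET = {"IF": 0, "ID": 1, "Ex": 2, "Mem": 3, "WB": 4}


def stalls_with_forwarding(produced_in, needed_in, distance=1):
    """Stall cycles for a RAW dependency when forwarding is enabled.

    produced_in -- stage at the end of which the producer's value exists
                   ("Ex" for ALU instructions, "Mem" for loads)
    needed_in   -- stage at the start of which the consumer needs the value
                   ("Ex" for ALU operands, "Mem" for the data of a store)
    distance    -- how many instructions after the producer the consumer is
    """
    first_usable = STAGE_OFFSET[produced_in] + 1  # the cycle after it is produced
    wanted = distance + STAGE_OFFSET[needed_in]   # unstalled cycle of the use
    return max(0, first_usable - wanted)


if __name__ == "__main__":
    print(stalls_with_forwarding("Ex", "Ex"))    # dadd followed by dsub
    print(stalls_with_forwarding("Mem", "Ex"))   # ld followed by dadd
    print(stalls_with_forwarding("Mem", "Mem"))  # ld followed by sd of the loaded value
```

Under these assumptions only the load followed directly by an ALU use costs a cycle, which matches the single inevitable stall shown in Table 8.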


Suggestion

When possible, replace:

    dadd r3,r2,r1   ; some other, unrelated instruction
    ld   r4,N(r0)
    dadd r6,r5,r4   ; stall - can't forward backwards!


Suggestion

With:

    ld   r4,N(r0)
    dadd r3,r2,r1   ; some other, unrelated instruction
    dadd r6,r5,r4   ; doesn't stall - can forward from ld

Now:

• when the final dadd reaches Ex, Mem/WB.LMD is available for forwarding


Note

A good compiler (or you!) should be able to spot such stalls and reorder the operations.

We spot such stalls by observing that an ALU instruction immediately follows a load upon which it depends.
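That observation can be turned into a simple check over a list of instructions. This is a minimal sketch, assuming only ld, sd and three-operand ALU instructions, and assuming that a store needs only its base register in Ex (its data is needed later, in Mem). The helper names `registers` and `load_use_stalls` are made up for this illustration.

```python
import re

REG = re.compile(r"\br\d+\b")


def registers(instruction):
    """Return (opcode, destination or None, set of registers needed in Ex)."""
    text = instruction.split(";")[0]  # drop any comment
    opcode, operands = text.split(None, 1)
    regs = REG.findall(operands)
    if opcode == "sd":
        # sd rt,offset(base): only the base register is needed in Ex.
        return opcode, None, {regs[1]}
    return opcode, regs[0], set(regs[1:])


def load_use_stalls(program):
    """List 0-based indices of instructions that directly follow a load they need in Ex."""
    stalled = []
    for i in range(1, len(program)):
        prev_op, prev_dest, _ = registers(program[i - 1])
        _, _, sources = registers(program[i])
        if prev_op == "ld" and prev_dest in sources:
            stalled.append(i)
    return stalled


BEFORE = ["dadd r3,r2,r1", "ld r4,N(r0)", "dadd r6,r5,r4"]
AFTER = ["ld r4,N(r0)", "dadd r3,r2,r1", "dadd r6,r5,r4"]

if __name__ == "__main__":
    print(load_use_stalls(BEFORE))
    print(load_use_stalls(AFTER))
```

The first ordering flags the final dadd; the second ordering flags nothing, because an unrelated instruction now sits between the load and its use.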


Example

Compile:

    int a = b + c;
    int d = e + f;

Note to self:

• see psched1.s and psched2.s.


Example

First, spot the problem:

    ld   r1,b(r0)   ; a = b + c
    ld   r2,c(r0)
    dadd r5,r1,r2
    sd   r5,a(r0)

    ld   r1,e(r0)   ; d = e + f
    ld   r2,f(r0)
    dadd r5,r1,r2
    sd   r5,d(r0)


Example

Then, rewrite instructions such that there are no stalls:

    ld   r1,b(r0)   ; a = b + c
    ld   r2,c(r0)
    dadd r5,r1,r2   ; stall, r2 not ready
    sd   r5,a(r0)

    ld   r1,e(r0)   ; d = e + f
    ld   r2,f(r0)
    dadd r5,r1,r2   ; stall, r2 not ready
    sd   r5,d(r0)


Example

Well, it’s helpful to use different registers:

    ld   r1,b(r0)   ; a = b + c
    ld   r2,c(r0)
    dadd r5,r1,r2   ; stall, r2 not ready
    sd   r5,a(r0)

    ld   r3,e(r0)   ; d = e + f
    ld   r4,f(r0)
    dadd r5,r3,r4   ; stall, r4 not ready
    sd   r5,d(r0)


Example

No stalls:

    ld   r1,b(r0)
    ld   r2,c(r0)
    ld   r3,e(r0)   ; prevent stall (pulled up)
    dadd r5,r1,r2   ; no stall

    ld   r4,f(r0)
    sd   r5,a(r0)   ; prevent stall (pushed down)
    dadd r5,r3,r4   ; no stall
    sd   r5,d(r0)
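Reordering is only acceptable if the program still computes the same result. One way to gain some confidence is to interpret both orderings and compare the final memory contents. This is a minimal sketch, assuming a toy interpreter for just ld, sd and dadd, with memory modelled as a dictionary keyed by label; the function `run` and the starting values are made up for this illustration and are not how WinMIPS64 works.

```python
def run(program, memory):
    """Interpret a tiny subset of MIPS64 (ld, sd, dadd) and return the final memory.

    Memory is modelled as a dict from label to value, and every address is
    written as label(r0); this is a simplification for illustration only.
    """
    memory = dict(memory)
    regs = {"r0": 0}
    for line in program:
        opcode, operands = line.split(None, 1)
        args = [a.strip() for a in operands.split(",")]
        if opcode == "ld":
            regs[args[0]] = memory[args[1].split("(")[0]]
        elif opcode == "sd":
            memory[args[1].split("(")[0]] = regs[args[0]]
        elif opcode == "dadd":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        else:
            raise ValueError("unsupported opcode: " + opcode)
    return memory


ORIGINAL = [
    "ld r1,b(r0)", "ld r2,c(r0)", "dadd r5,r1,r2", "sd r5,a(r0)",
    "ld r1,e(r0)", "ld r2,f(r0)", "dadd r5,r1,r2", "sd r5,d(r0)",
]
SCHEDULED = [
    "ld r1,b(r0)", "ld r2,c(r0)", "ld r3,e(r0)", "dadd r5,r1,r2",
    "ld r4,f(r0)", "sd r5,a(r0)", "dadd r5,r3,r4", "sd r5,d(r0)",
]

if __name__ == "__main__":
    start = {"a": 0, "b": 2, "c": 3, "d": 0, "e": 10, "f": 20}  # arbitrary values
    print(run(ORIGINAL, start) == run(SCHEDULED, start))
```

A single run with arbitrary inputs is not a proof of equivalence, but it would catch a mistake such as storing r5 to a after it has been overwritten by the second dadd.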


…

This is known as:

• pipeline scheduling

In this case:

• use two extra registers

• avoid two stalls

• 13 cycles, instead of 15


Aside

The "13 versus 15 cycles" statement is misleading:

• it includes cycles for the pipeline to fill and empty

Actually:

• disregarding the filling of the pipeline:

• it’s 8 cycles, instead of 10

so a speedup of 1.25
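The two ways of counting give noticeably different speedups, as this short calculation shows. This is a minimal sketch; the five-cycle adjustment is simply the difference between the two pairs of figures quoted above (15 versus 10, and 13 versus 8), and the names `speedup` and `PIPELINE_FILL` are made up for this illustration.

```python
def speedup(cycles_before, cycles_after):
    """Ratio of old to new cycle counts; values above 1 mean faster."""
    return cycles_before / cycles_after


# Cycles excluded in the aside above (15 - 5 = 10 and 13 - 5 = 8).
PIPELINE_FILL = 5

if __name__ == "__main__":
    whole_run = speedup(15, 13)
    steady_state = speedup(15 - PIPELINE_FILL, 13 - PIPELINE_FILL)
    print(round(whole_run, 3))  # about 1.154
    print(steady_state)         # 1.25
```

The longer the program, the less the fill and drain cycles matter, so the steady-state figure is the better guide for real code.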


Summary 1

Forwarding is simple:

• if the necessary data is available somewhere in the pipeline by the time it is needed, then it can be forwarded to where it’s needed

The implementation in hardware of these strategies is an engineering decision:

• it is correct, in all cases, to stall the pipeline when such hazards are detected

• forwarding, however, improves performance at the cost of some additional complexity


Summary 2

Some types of (RAW) stall are unavoidable:

• however, it is often possible to reorder instructions such that they do not occur


Done