CMPE 421 Parallel Computer Architecture Part 1 Pipeline: HAZARD


Pipelining MIPS

Let us examine why the pipeline cannot run at full speed. There are some cases where the next instruction cannot begin executing immediately. These limits to pipelining are known as hazards.

What makes it hard?

• structural hazards: different instructions, at different stages in the pipeline, want to use the same hardware resource (resource conflict)

• control hazards: the succeeding instruction to put into the pipeline depends on the outcome of a previous branch instruction already in the pipeline. The control decision determines the execution path, such as when the instruction changes the PC

• data hazards: an instruction in the pipeline requires data to be computed by a previous instruction still in the pipeline

Before actually building the pipelined datapath and control, we first briefly examine these potential hazards individually…

Structural Hazards

Structural hazard: inadequate hardware to simultaneously support all instructions in the pipeline in the same clock cycle.

E.g., suppose a single (not separate) instruction and data memory, with one read port, in the pipeline below; then there is a structural hazard between the first and fourth lw instructions.

MIPS was designed to be pipelined: structural hazards are easy to avoid!

[Figure: pipelined execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0); lw $4, 400($0), issued 2 ns apart through the stages Instruction fetch, Reg, ALU, Data access, Reg. Time axis: 2 to 14 ns; vertical axis: program execution order (in instructions). Annotation: hazard if single memory.]
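The single-memory conflict described above can be checked with a small sketch. It assumes one instruction is issued per cycle, that every instruction uses memory in IF, and that only lw/sw use memory in MEM; the function name `memory_conflicts` and the tuple layout are illustrative, not from the slides.

```python
# Stage order of the five-stage pipeline; IF and MEM both touch memory.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def memory_conflicts(ops, single_memory=True):
    """Return (cycle, fetching_index, accessing_index) for each memory clash.

    ops: opcode strings in program order, one issued per cycle.
    Assumption: only lw/sw use memory in MEM; every instruction uses it in IF.
    """
    if not single_memory:
        return []  # separate instruction and data memories never clash
    conflicts = []
    for i, op in enumerate(ops):
        if op not in ("lw", "sw"):
            continue
        # Instruction i enters IF in cycle i + 1, so it is in MEM three cycles later.
        mem_cycle = i + 1 + STAGES.index("MEM")
        fetcher = mem_cycle - 1  # index of the instruction in IF during that cycle
        if fetcher < len(ops):
            conflicts.append((mem_cycle, fetcher, i))
    return conflicts


program = ["lw", "lw", "lw", "lw"]
print(memory_conflicts(program))                       # one shared memory
print(memory_conflicts(program, single_memory=False))  # separate memories
```

With the four lw instructions, the only clash reported is in cycle 4, between the fetch of the fourth lw and the data access of the first, which matches the hazard marked in the figure.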

Structural Hazard

Ex 1: Suppose we have one memory unit instead of separate instruction and data memories.

[Figure: four overlapped instructions, each passing through Inst Fetch, Reg Read, ALU, Data Access, Reg Write.]

When a load or store word instruction is used, its MEM stage tries to access memory while a later instruction is being fetched; because there is only a single memory, a conflict occurs.

Structural Hazard

• Consider a load followed immediately by an R-type instruction

• Processor only has a single write port

[Figure: pipeline timing over clock cycles 1 to 9 for the sequence R-type, R-type, Load, R-type, R-type. R-type instructions use four stages (IF, RF/ID, EX, WB); the load uses five (IF, RF/ID, EX, MEM, WB). A bubble is marked in the diagram.]

Structural Hazard

• Solutions

• Delay instruction until functional unit is ready

• Hardware inserts a pipeline stall or a bubble that delays execution of all instructions that follow (previous instructions continue)

• Increases CPI from the ideal value of 1

• Build more sophisticated functional units so that all combinations of instructions can be accommodated

• Example: Allow two simultaneous writes to the register file

Structural Hazard Solution

Write Back Stall Solution: delay the R-type register write by one cycle, so that R-type instructions also pass through a MEM stage before WB.

[Figure: the R-type stages become IF, RF/ID, EX, MEM, WB. Pipeline timing over clock cycles 1 to 9 for the sequence R-type, R-type, Load, R-type, R-type, with every instruction now using five stages.]
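A minimal sketch of why the delayed write removes the conflict, assuming one instruction is issued per cycle and each instruction writes the register file in its last stage. The names `wb_cycle` and `write_port_conflicts` are hypothetical helpers.

```python
from collections import defaultdict


def wb_cycle(start_cycle, n_stages):
    """Cycle in which an instruction entering IF at start_cycle reaches its last stage (WB)."""
    return start_cycle + n_stages - 1


def write_port_conflicts(kinds, stages_for):
    """Map each cycle with more than one register write to the instruction indices involved.

    kinds: instruction kinds in program order, one issued per cycle (assumption).
    stages_for: number of pipeline stages used by each kind.
    """
    writers = defaultdict(list)
    for i, kind in enumerate(kinds):
        writers[wb_cycle(i + 1, stages_for[kind])].append(i)
    return {cycle: idx for cycle, idx in writers.items() if len(idx) > 1}


kinds = ["R", "R", "lw", "R", "R"]
print(write_port_conflicts(kinds, {"R": 4, "lw": 5}))  # four-stage R-type
print(write_port_conflicts(kinds, {"R": 5, "lw": 5}))  # R-type write delayed by one cycle
```

In the four-stage case the load and the R-type that follows it both reach WB in cycle 7; once every instruction takes five stages, each write-back falls in its own cycle.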

Control Hazards

Control hazard: need to make a decision based on the result of a previous instruction still executing in the pipeline.

Solution 1: Stall the pipeline

[Figure: beq $1, $2, 40; add $4, $5, $6; lw $3, 300($0) through the stages Instruction fetch, Reg, ALU, Data access, Reg. Time axis: 2 to 16 ns; vertical axis: program execution order (in instructions). Labels: 4 ns, 2 ns, 2 ns; pipeline stall; bubble.]

Note that the branch outcome is computed in the ID stage with added hardware (later…)

Control Hazards

Solution 2: Predict the branch outcome, e.g., predict branch-not-taken:

[Figure, prediction success: beq $1, $2, 40; add $4, $5, $6; lw $3, 300($0), issued 2 ns apart. Time axis: 2 to 14 ns; vertical axis: program execution order (in instructions).]

[Figure, prediction failure: beq $1, $2, 40; add $4, $5, $6; or $7, $8, $9, with a row of bubbles and a 4 ns gap. Time axis: 2 to 14 ns; vertical axis: program execution order (in instructions).]

Prediction success: the pipeline proceeds without a stall.

Prediction failure: undo (= flush) lw
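The predict-not-taken policy can be sketched as a function that returns what enters the pipeline after a branch. It assumes the branch outcome is known in ID (as noted for Solution 1), so exactly one speculatively fetched instruction is turned into a bubble on a misprediction; `fetch_stream` is a hypothetical name and the short opcode strings are placeholders.

```python
def fetch_stream(branch_taken, fall_through, target):
    """Instructions entering the pipeline after a branch, one slot per cycle.

    Assumption: the branch resolves in ID, so exactly one fall-through
    instruction has been fetched when the outcome becomes known.
    """
    if not branch_taken:
        # Prediction success: the sequential instructions simply continue.
        return list(fall_through)
    # Prediction failure: the speculatively fetched instruction is flushed
    # (it becomes a bubble) and fetching restarts at the branch target.
    return ["bubble"] + list(target)


print(fetch_stream(False, ["add", "lw"], ["or"]))
print(fetch_stream(True, ["add", "lw"], ["or"]))
```

A correct prediction costs nothing, while a wrong one costs one slot under this assumption.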

Control Hazards

Solution 3: Delayed branch: always execute the sequentially next statement, with the branch taking effect after a one-instruction delay. It is the compiler's job to find a statement that is independent of the branch outcome to put in the slot.

MIPS does this, but it is an option in SPIM (Simulator -> Settings).

[Figure: beq $1, $2, 40; add $4, $5, $6 (delayed branch slot); lw $3, 300($0), issued 2 ns apart. Time axis: 2 to 14 ns; vertical axis: program execution order (in instructions).]

Delayed branch: beq is followed by an add that is independent of the branch outcome.

Review: Pipelining Multiple Instructions

The instructions in Figures 6-19, 6-20 and 6-21 were independent: none of them used the results calculated by any of the others (the register numbers are different).

Data Hazards

Problem with starting the next instruction before the first is finished: dependencies that “go backward in time” are data hazards.

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

[Figure: the five instructions above in program execution order through the pipeline stages (IM, Reg, DM, Reg) over clock cycles CC 1 to CC 9. Value of register $2: 10 in CC 1 to CC 4, 10/-20 in CC 5, -20 in CC 6 to CC 9.]
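The backward-in-time dependencies in this sequence can be found mechanically. The sketch below assumes no forwarding, that instruction k (counting from 0) is in its register-read stage in cycle k + 2 and writes back in cycle k + 5, and that a register written in a cycle can be read later in that same cycle (consistent with the 10/-20 entry for CC 5). The tuple layout and the name `raw_hazards` are illustrative.

```python
def raw_hazards(program):
    """List (producer_index, consumer_index, register) triples that read too early.

    program: list of (text, dest, sources); dest is None if nothing is written.
    Assumptions: no forwarding; a consumer is safe once it sits at least
    three slots after its producer.
    """
    hazards = []
    for j, (_, _, sources) in enumerate(program):
        # Only the two instructions immediately before j can still be unwritten.
        for i in range(max(0, j - 2), j):
            dest = program[i][1]
            if dest is not None and dest in sources:
                hazards.append((i, j, dest))
    return hazards


program = [
    ("sub $2, $1, $3", "$2", ["$1", "$3"]),
    ("and $12, $2, $5", "$12", ["$2", "$5"]),
    ("or $13, $6, $2", "$13", ["$6", "$2"]),
    ("add $14, $2, $2", "$14", ["$2", "$2"]),
    ("sw $15, 100($2)", None, ["$15", "$2"]),
]
for producer, consumer, reg in raw_hazards(program):
    print(program[consumer][0], "reads", reg, "too early after", program[producer][0])
```

Under these assumptions only the and and the or read $2 before sub has written it; add reads it in CC 5, the cycle of the write, and sw reads it afterwards.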

Solution to Data Hazards

Data hazard: an instruction needs data from the result of a previous instruction still executing in the pipeline.

• Occurs when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions

• Caused by several different types of dependencies

Solution 1: Forward data if possible…

Solution 2: Or change the relative timing of instructions (insert stalls)

[Figure: instruction pipeline diagram (IF, ID, EX, MEM, WB; time axis 2 to 10) for add $s0, $t0, $t1, where shading indicates use (left = write, right = read); and the same diagram for add $s0, $t0, $t1 followed by sub $t2, $s0, $t3 in program execution order.]

Without forwarding (blue line), data has to go back in time; with forwarding (red line), data is available in time.

Data Hazards: SOLUTION 1

• Don’t wait for the instruction to complete before trying to resolve the data hazard

• As soon as the ALU creates the sum for the “add”, we can supply it as an input for the sub

• Adding extra H/W to retrieve the missing item early from the internal resources is called forwarding or bypassing

Remark: A forwarding path from the output of the memory access stage of the first instruction to the input of the execution stage of the instruction that immediately follows is invalid (backward in time).

Data Dependency Types

Three classifications of data dependencies for instruction j following instruction i:

• Read after Write (RAW): instr. j tries to read an operand before instr. i writes it

• Write after Write (WAW): instr. j tries to write an operand before instr. i writes its value. Since register writes only occur in WB, the pipeline we have been discussing does not have this type of dependency

• Write after Read (WAR): instr. j tries to write a destination before it is read by instr. i. This also does not occur in the pipeline we have been discussing, since all reads happen early in the ID/RF stage and all writes happen late in the WB stage

WAW and WAR hazards arise in later, more complicated pipelines.
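The three classifications can be expressed as a small check on register names. This is a sketch, not a full dependence analysis: each instruction is reduced to a (dest, sources) pair, `classify` is a hypothetical helper, and only the first example pair comes from the slides; the other two are made up for illustration.

```python
def classify(i_instr, j_instr):
    """Dependency types between an earlier instruction i and a later instruction j.

    Each instruction is (dest, sources); dest is None if nothing is written.
    """
    i_dest, i_srcs = i_instr
    j_dest, j_srcs = j_instr
    kinds = []
    if i_dest is not None and i_dest in j_srcs:
        kinds.append("RAW")  # j reads what i writes
    if i_dest is not None and i_dest == j_dest:
        kinds.append("WAW")  # both write the same register
    if j_dest is not None and j_dest in i_srcs:
        kinds.append("WAR")  # j overwrites something i still has to read
    return kinds


# sub $2, $1, $3 followed by and $12, $2, $5 (from the slides)
print(classify(("$2", ["$1", "$3"]), ("$12", ["$2", "$5"])))
# Hypothetical pair: add $1, $2, $3 followed by sub $2, $4, $5
print(classify(("$1", ["$2", "$3"]), ("$2", ["$4", "$5"])))
# Hypothetical pair: add $1, $2, $3 followed by or $1, $4, $5
print(classify(("$1", ["$2", "$3"]), ("$1", ["$4", "$5"])))
```

The function reports which dependencies exist between two instructions; whether a dependency becomes a hazard depends on the pipeline, as the bullets above explain.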

Data Hazards

Forwarding may not be enough (a hybrid solution is required), e.g., if an R-type instruction following a load uses the result of the load. This is called a load-use data hazard.

lw $s0, 20($t1)
sub $t2, $s0, $t3

[Figure, top: lw followed immediately by sub (IF, ID, EX, MEM, WB; time axis 2 to 14; program execution order in instructions). Without a stall it is impossible to provide the input to the sub instruction in time.]

[Figure, bottom: the same pair with a row of bubbles between them. With a one-stage stall (solution 2), forwarding can get the data to the sub instruction in time (solution 1).]
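A rough sketch of the hybrid rule: with forwarding in place, the only remaining stall in this simple model is one bubble whenever an instruction uses a register loaded by the instruction directly before it. The tuple layout (op, dest, sources) and the name `stalls_with_forwarding` are assumptions made for illustration.

```python
def stalls_with_forwarding(program):
    """Count the bubbles needed when full forwarding is available.

    program: list of (op, dest, sources).
    Assumption: the only case forwarding cannot cover is an instruction that
    uses a loaded register in the very next slot; it costs one bubble.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev[0] == "lw" and prev[1] in cur[2]:
            stalls += 1
    return stalls


load_use = [
    ("lw", "$s0", ["$t1"]),          # lw  $s0, 20($t1)
    ("sub", "$t2", ["$s0", "$t3"]),  # sub $t2, $s0, $t3
]
alu_use = [
    ("add", "$s0", ["$t0", "$t1"]),  # add $s0, $t0, $t1
    ("sub", "$t2", ["$s0", "$t3"]),  # sub $t2, $s0, $t3
]
print(stalls_with_forwarding(load_use))
print(stalls_with_forwarding(alu_use))
```

The lw/sub pair needs one bubble, while the add/sub pair needs none because the ALU result can be forwarded directly.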

Reordering Code to Avoid Pipeline Stall (Alternative Software Solution)

Example:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)    # data hazard: $t2 is used right after it is loaded
sw $t0, 4($t1)

Reordered code (the two sw instructions are interchanged):
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)
sw $t2, 0($t1)
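One way to convince oneself that the interchange is safe is to run both orderings on a toy memory and compare the results, then look for adjacent load-use pairs. The sketch below models only lw and sw with $t1 as the base register; the memory contents and the helper names `run` and `load_use_pairs` are hypothetical.

```python
def run(program, memory, base=0):
    """Execute lw/sw instructions on a dict-based memory; $t1 holds `base`."""
    mem = dict(memory)
    regs = {"$t1": base}
    for op, reg, offset in program:
        addr = regs["$t1"] + offset
        if op == "lw":
            regs[reg] = mem[addr]
        else:  # sw
            mem[addr] = regs[reg]
    return mem


def load_use_pairs(program):
    """Indices k where instruction k is a lw and instruction k+1 stores that register."""
    return [k for k, (a, b) in enumerate(zip(program, program[1:]))
            if a[0] == "lw" and b[0] == "sw" and b[1] == a[1]]


original = [("lw", "$t0", 0), ("lw", "$t2", 4), ("sw", "$t2", 0), ("sw", "$t0", 4)]
reordered = [("lw", "$t0", 0), ("lw", "$t2", 4), ("sw", "$t0", 4), ("sw", "$t2", 0)]
memory = {0: 11, 4: 22}  # made-up initial contents

print(run(original, memory) == run(reordered, memory))
print(load_use_pairs(original), load_use_pairs(reordered))
```

Both orderings swap the two memory words, but only the original has a store sitting directly behind the load that produces its value.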

Revisiting Hazards

So far our datapath and control have ignored hazards. We shall revisit data hazards and control hazards and enhance our datapath and control to handle them in hardware…

Data Hazards and Forwarding

Problem with starting an instruction before previous ones are finished: data dependencies that go backward in time, called data hazards.

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

$2 = 10 before sub; $2 = -20 after sub

[Figure: the five instructions above in program execution order through the pipeline stages (IM, Reg, DM, Reg) over clock cycles CC 1 to CC 9. Value of register $2: 10 in CC 1 to CC 4, 10/-20 in CC 5, -20 in CC 6 to CC 9.]

Software Solution

Have the compiler guarantee that there are never any data hazards, by rearranging instructions to insert independent instructions between instructions that would otherwise have a data hazard between them, or, if such rearrangement is not possible, by inserting nops.

sub $2, $1, $3
lw $10, 40($3)
slt $5, $6, $7
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

or

sub $2, $1, $3
nop
nop
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

Such compiler solutions may not always be possible, and nops slow the machine down.

MIPS: nop = “no operation” = 00…0 (32 bits) = sll $0, $0, 0
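The nop-insertion variant can be sketched as a single pass over the program. It assumes no forwarding and that a register written in WB can be read in the register-read stage of the same cycle, so a consumer must sit at least three slots after its producer; `insert_nops` and the tuple layout are illustrative, not a real compiler pass.

```python
def insert_nops(program):
    """Insert nops so that no instruction reads a register within two slots of its write.

    program: list of (text, dest, sources); dest is None if nothing is written.
    Assumptions: no forwarding; same-cycle write-then-read of the register
    file works, so a distance of three slots is enough.
    """
    out = []
    for text, dest, sources in program:
        needed = 0
        # back = 1 is the instruction just emitted, back = 2 the one before it.
        for back, (_, prev_dest, _) in enumerate(reversed(out[-2:]), start=1):
            if prev_dest is not None and prev_dest in sources:
                needed = max(needed, 3 - back)
        out.extend([("nop", None, [])] * needed)
        out.append((text, dest, sources))
    return out


program = [
    ("sub $2, $1, $3", "$2", ["$1", "$3"]),
    ("and $12, $2, $5", "$12", ["$2", "$5"]),
    ("or $13, $6, $2", "$13", ["$6", "$2"]),
    ("add $14, $2, $2", "$14", ["$2", "$2"]),
    ("sw $15, 100($2)", None, ["$15", "$2"]),
]
for text, _, _ in insert_nops(program):
    print(text)
```

For the sub/and/or/add/sw sequence this places two nops after sub and nowhere else, as in the second listing above.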


REVIEW: Solution to HAZARDS

How About Register File Access?

[Figure: add $1, ; Inst 1; Inst 2; add $2,$1, in instruction order through IM, Reg, ALU, DM, Reg over time (clock cycles). Marked: the clock edge that controls register writing and the clock edge that controls loading of the pipeline state registers.]

Fix the register file access hazard by doing reads in the second half of the cycle and writes in the first half.

Register Usage Can Cause Data Hazards

add $1,
sub $4,$1,$5
and $6,$1,$7
xor $4,$1,$5
or $8,$1,$9

[Figure: the five instructions above through IM, Reg, ALU, DM, Reg.]

Dependencies backward in time cause hazards: read before write data hazard.

Loads Can Cause Data Hazards

lw $1,4($2)
sub $4,$1,$5
and $6,$1,$7
xor $4,$1,$5
or $8,$1,$9

[Figure: the five instructions above in instruction order through IM, Reg, ALU, DM, Reg.]

Dependencies backward in time cause hazards: load-use data hazard.

One Way to “Fix” a Data Hazard

add $1,
stall
stall
sub $4,$1,$5
and $6,$1,$7

[Figure: the instructions above in instruction order through IM, Reg, ALU, DM, Reg, with two stall slots after the add.]

Can fix a data hazard by waiting (stall), but this impacts CPI.

Another Way to “Fix” a Data Hazard

add $1,
sub $4,$1,$5
and $6,$1,$7
xor $4,$1,$5
or $8,$1,$9

[Figure: the five instructions above in instruction order through IM, Reg, ALU, DM, Reg.]

Fix data hazards by forwarding results as soon as they are available to where they are needed.

Forwarding paths are valid only if the destination stage is later in time than the source stage. Forwarding is harder if there are multiple results to forward per instruction or if an instruction needs to write a result early in the pipeline.
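The selection a forwarding unit makes for one ALU operand can be sketched as follows. The slides do not spell out the conditions, so this follows the usual textbook formulation as an assumption: the pipeline-register names EX/MEM and MEM/WB, the rule that $0 is never forwarded, and the function name `forward_source` are not taken from the source. Load results, which are not ready in EX/MEM, are outside this sketch.

```python
def forward_source(src_reg, ex_mem, mem_wb):
    """Pick where an ALU operand should come from.

    ex_mem / mem_wb: (reg_write, dest_reg) for the instructions one and two
    slots ahead of the current one (assumed pipeline-register contents).
    Returns "EX/MEM", "MEM/WB" or "REGFILE".
    """
    for name, (reg_write, dest) in (("EX/MEM", ex_mem), ("MEM/WB", mem_wb)):
        # Checked in this order so that the most recent producer wins.
        if reg_write and dest != "$0" and dest == src_reg:
            return name
    return "REGFILE"


# sub $4,$1,$5 directly after an add that writes $1
print(forward_source("$1", ex_mem=(True, "$1"), mem_wb=(False, None)))
# and $6,$1,$7 two slots after that add (sub is now in EX/MEM)
print(forward_source("$1", ex_mem=(True, "$4"), mem_wb=(True, "$1")))
# an operand nobody ahead is writing
print(forward_source("$7", ex_mem=(True, "$4"), mem_wb=(True, "$1")))
```

Both forwarding sources lie later in time than the stage that produced the value, which is the validity condition stated above.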

Forwarding with Load-use Data Hazards

lw $1,4($2)
sub $4,$1,$5
and $6,$1,$7
xor $4,$1,$5
or $8,$1,$9

[Figure: the five instructions above in instruction order through IM, Reg, ALU, DM, Reg.]

Will still need one stall cycle even with forwarding.

Branch Instructions Cause Control Hazards

beq
lw
Inst 3
Inst 4

[Figure: the four instructions above in instruction order through IM, Reg, ALU, DM, Reg.]

Dependencies backward in time cause hazards.

One Way to “Fix” a Control Hazard

Another “solution” is to put in enough extra hardware so that we can test registers, calculate the branch address, and update the PC during the second stage of the pipeline. That would reduce the number of stalls to only one.

A third approach is to use prediction to handle branches, e.g., always predict that branches will be untaken. When right, the pipeline proceeds at full speed. When wrong, we have to stall (and make sure that no instruction that shouldn’t have completes, i.e., changes machine state).

One Way to “Fix” a Control Hazard

beq
stall
stall
stall
lw
Inst 3

[Figure: the instructions above in instruction order through IM, Reg, ALU, DM, Reg, with three stall slots after the beq.]

Fix the branch hazard by waiting (stall), but this affects CPI.
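The effect of branch stalls on CPI can be estimated with a one-line model: the ideal CPI of 1 plus the fraction of instructions that are branches times the stall cycles each branch costs. The 25% branch fraction below is a made-up figure for illustration, and `effective_cpi` is a hypothetical helper.

```python
def effective_cpi(branch_fraction, stall_cycles_per_branch, base_cpi=1.0):
    """CPI when every branch stalls the pipeline for a fixed number of cycles."""
    return base_cpi + branch_fraction * stall_cycles_per_branch


branch_fraction = 0.25  # assumed share of branches in the instruction mix
print(effective_cpi(branch_fraction, 3))  # three stall cycles per branch
print(effective_cpi(branch_fraction, 1))  # one stall cycle per branch
```

Under this assumed mix, moving the branch decision earlier so that only one stall remains lowers the CPI from 1.75 to 1.25.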