Exam #2 Results - csl.skku.educsl.skku.edu/uploads/ICE3003S11/10-hazards.pdf · ICE3003: Computer...

Preview:

Citation preview

1ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Exam #2 ResultsExam #2 Results

Pipeline HazardsPipeline Hazards

Jin-Soo Kim (jinsookim@skku.edu)Computer Systems Laboratory

Sungkyunkwan Universityhttp://csl.skku.edu

3ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

HazardsHazards What are hazards?

• Situations that prevent starting the next instruction in the next cycle

Structure hazards• A required resource is busy

Data hazards• Need to wait for previous instruction to complete its

data read/write

Control hazards• Deciding on control action depends on previous

instruction

4ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Structure HazardsStructure Hazards Conflict for use of a resource

In MIPS pipeline with a single memory• Load/store requires data access• Instruction fetch would have to stall for that cycle Would cause a pipeline “bubble”

Hence, pipelined datapaths require separate instruction/data memories• Or separate instruction/data caches

5ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Data Hazards (1)Data Hazards (1) An instruction depends on completion of data

access by a previous instruction

add    $s0, $t0, $t1sub    $t2, $s0, $t3

6ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Data Hazards (2)Data Hazards (2) Read After Write (RAW)

• Inst J tries to read operand before Inst I writes it:

• Caused by a “(true) dependence”. This hazard results from an actual need for communication.

I: add  $t1, $t2, $t3J: sub  $t4, $t1, $t3

7ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Data Hazards (3)Data Hazards (3) Write After Read (WAR)

• Inst J writes operand before Inst I reads it:

• Called an “anti-dependence” by compiler writers. This results from the reuse of the name “$t1”.

I: sub  $t4, $t1, $t3J: add  $t1, $t2, $t3K: add $t6, $t1, $t7

8ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Data Hazards (4)Data Hazards (4) Write After Write (WAW)

• Inst J writes operand before Inst I writes it:

• Called an “output dependence” by compiler writers. This also results from the reuse of the name “$t1”.

I: sub  $t1, $t4, $t3J: add  $t1, $t2, $t3K: add $t6, $t1, $t7

9ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

ForwardingForwarding Forwarding (aka Bypassing)

• Use result when it is computed• Don’t wait for it to be stored in a register• Requires extra connections in the datapath

10ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Forwarding ExampleForwarding Example Consider this sequence:

We can resolve hazards with forwarding• How do we detect when to forward?

sub   $2,  $1, $3and   $12, $2, $5or    $13, $6, $2add   $14, $2, $2sw $15, 100($2)

11ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Dependencies & ForwardingDependencies & Forwarding

12ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Forwarding Conditions (1)Forwarding Conditions (1) Detecting the need to forward

• Pass register numbers along pipeline– e.g., ID/EX.RegisterRs = register number for Rs sitting in

ID/EX pipeline register

• ALU operand register numbers in EX stage are given by ID/EX.RegisterRs & ID/EX.RegisterRt

• Data hazards when1a. EX/MEM.RegisterRd = ID/EX.RegisterRs1b. EX/MEM.RegisterRd = ID/EX.RegisterRt2a. MEM/WB.RegisterRd = ID/EX.RegisterRs2b. MEM/WB.RegisterRd = ID/EX.RegisterRt

Fwd from EX/MEMpipeline reg

Fwd from MEM/WBpipeline reg

13ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Forwarding Conditions (2)Forwarding Conditions (2) Detecting the need to forward (cont’d)

• But only if forwarding instruction will write to a register!– EX/MEM.RegWrite, MEM/WB.RegWrite

• And only if Rd for that instruction is not $zero– EX/MEM.RegisterRd ≠ 0,

MEM/WB.RegisterRd ≠ 0

14ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Forwarding Conditions (3)Forwarding Conditions (3) Forwarding paths

15ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Forwarding Conditions (4)Forwarding Conditions (4) EX hazard

• if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)and (EX/MEM.RegisterRd = ID/EX.RegisterRs))

ForwardA = 10 • if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)

and (EX/MEM.RegisterRd = ID/EX.RegisterRt))ForwardB = 10

MEM hazard• if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)

and (MEM/WB.RegisterRd = ID/EX.RegisterRs))ForwardA = 01

• if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)and (MEM/WB.RegisterRd = ID/EX.RegisterRt))

ForwardB = 01

16ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Double Data HazardDouble Data Hazard Consider this sequence:

Both hazards occur• Want to use the most recent

Revise MEM hazard condition• Only forward if EX hazard condition isn’t true

add   $1, $1, $2add   $1, $1, $3add   $1, $1, $4

17ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Rev’d Forwarding ConditionsRev’d Forwarding Conditions MEM hazard

• if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)

and (EX/MEM.RegisterRd = ID/EX.RegisterRs))and (MEM/WB.RegisterRd = ID/EX.RegisterRs))

ForwardA = 01

• if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)

and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) and (MEM/WB.RegisterRd = ID/EX.RegisterRt))

ForwardB = 01

18ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Datapath with ForwardingDatapath with Forwarding

19ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Load-Use Data Hazard (1)Load-Use Data Hazard (1) Load-Use data hazards

• Can’t always avoids stalls by forwarding• If value not computed when needed• Can’t forward backward in time!

20ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Load-Use Data Hazard (2)Load-Use Data Hazard (2)

Need to stall for one cycle

21ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Load-Use Data Hazard (3)Load-Use Data Hazard (3) Load-use hazard detection

• Check when using instruction is decoded in ID stage• ALU operand register numbers in ID stage are given

by IF/ID.RegisterRs & IF/ID.RegisterRt• Load-use hazard when

– ID/EX.MemRead and((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))

• If detected, stall and insert bubble

22ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Load-Use Data Hazard (4)Load-Use Data Hazard (4) How to stall the pipeline?

• Force control values in ID/EX register to 0– EX, MEM, and WB do nop (no-operation)

• Prevent update of PC and IF/ID register– Using instruction is decoded again– Following instruction is fetched again– 1-cycle stall allows MEM to read data for lw

» Can subsequently forward to EX stage

23ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Load-Use Data Hazard (5)Load-Use Data Hazard (5) Stall/Bubble in the pipeline

Stall inserted here

24ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Load-Use Data Hazard (6)Load-Use Data Hazard (6) Stall/Bubble in the pipeline (cont’d)

Or, more accurately...

25ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Datapath + Hazard DetectionDatapath + Hazard Detection

26ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Compiler Approach (1)Compiler Approach (1) Stalls reduce performance

• But are required to get correct results

Compiler can arrange code to avoid hazards and stalls• Requires knowledge of the pipeline structure

27ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Compiler Approach (2)Compiler Approach (2) Code scheduling to avoid stalls

• Reorder code to avoid use of load result in the next instruction

(C code)   A = B + E;   C = B + F;

lw $t1, 0($t0)lw $t2, 4($t0)add $t3, $t1, $t2sw $t3, 12($t0)lw $t4, 8($t0)add $t5, $t1, $t4sw $t5, 16($t0)

lw $t1, 0($t0)lw $t2, 4($t0)lw $t4, 8($t0) add $t3, $t1, $t2sw $t3, 12($t0)add $t5, $t1, $t4sw $t5, 16($t0)

stall 

stall 

13 cycles 11 cycles

28ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Control Hazards (1)Control Hazards (1) Branch determines flow of control

• Fetching next instruction depends on branch outcome

• Pipeline can’t always fetch correct instruction– Still working on ID stage of branch

In MIPS pipeline• Need to compare registers and compute target early

in the pipeline• Add hardware to do it in ID stage

29ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Control Hazards (2)Control Hazards (2) Stall on branch

• If branch outcome determined in MEM:

PC

Flush theseinstructions(Set controlvalues to 0)

30ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Control Hazards (3)Control Hazards (3) Reducing branch delay

• Move hardware to determine outcome to ID stage– Target address adder– Register comparator

Example: branch taken

36:  sub $10, $4, $840:  beq $1,  $3, 744:  and $12, $2, $548:  or  $13, $2, $652:  add $14, $4, $256:  slt $15, $6, $7

...72:  lw $4, 50($7) 

31ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Control Hazards (4)Control Hazards (4)

32ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Control Hazards (5)Control Hazards (5)

33ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Data Hazards for Branches (1)Data Hazards for Branches (1) Data hazards for branches

• If a comparison register is a destination of 2nd or 3rd preceding ALU instruction Can resolve using forwarding

add $4, $5, $6

add $1, $2, $3

beq $1, $4, target

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

34ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Data Hazards for Branches (2)Data Hazards for Branches (2) Data hazards for branches (cont’d)

• If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction Need 1 stall cycle

beq stalled

add $4, $5, $6

lw $1, addr

beq $1, $4, target

IF ID EX MEM WB

IF ID EX MEM WB

IF ID

ID EX MEM WB

35ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Data Hazards for Branches (3)Data Hazards for Branches (3) Data hazards for branches (cont’d)

• If a comparison register is a destination of immediately preceding load instruction Need 2 stall cycles

beq stalled

beq stalled

lw $1, addr

beq $1, $0, target

IF ID EX MEM WB

IF ID

ID

ID EX MEM WB

36ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Branch Prediction (1)Branch Prediction (1) Branch prediction

• Longer pipelines can’t readily determine branch outcome early– Stall penalty becomes unacceptable

• Predict outcome of branch– Only stall if prediction is wrong

• In MIPS pipeline– Can predict branches not taken– Fetch instruction after branch, with no delay

37ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Branch Prediction (2)Branch Prediction (2) Static branch prediction

• Based on typical branch behavior• Example: loop and if-statement branches

– Predict backward branches taken– Predict forward branches not taken

Dynamic branch prediction• Hardware measures actual branch behavior

– e.g., record recent history of each branch

• Assume future behavior will continue the trend– When wrong, stall while re-fetching, and update history

38ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Static Branch PredictionStatic Branch Prediction MIPS with “Predict Not Taken”

Prediction correct

Prediction incorrect

39ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Dynamic Branch Prediction (1)Dynamic Branch Prediction (1) Dynamic branch prediction

• In deeper and superscalar pipelines, branch penalty is more significant

• Use dynamic prediction– Branch prediction buffer (aka branch history table)– Indexed by recent branch instruction addresses– Stores outcome (taken/not taken)– To execute a branch

» Check table, expect the same outcome» Start fetching from fall-through or target» If wrong, flush pipeline and flip prediction

40ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Dynamic Branch Prediction (2)Dynamic Branch Prediction (2) 1-bit predictor: shortcoming

• Inner loop branches mispredicted twice!

– Mispredict as taken on last iteration of inner loop– Then mispredict as not taken on first iteration of inner loop

next time around

outer: ……

inner: ……beq …, …, inner…beq …, …, outer

41ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Dynamic Branch Prediction (3)Dynamic Branch Prediction (3) 2-bit predictor

• Only change prediction on two successive mispredictions

42ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Dynamic Branch Prediction (4)Dynamic Branch Prediction (4) Calculating the branch target

• Even with predictor, still need to calculate the target address– 1-cycle penalty for a taken branch

• Branch target buffer– Cache of target addresses– Indexed by PC when instruction fetched

» If hit and instruction is branch predicted taken, can fetch target immediately

43ICE3003: Computer Architecture | Spring 2011 | Jin-Soo Kim (jinsookim@skku.edu)

Compiler ApproachCompiler Approach Delayed branch

• Branch delay slot filled by a useful instruction• Done by compilers and assemblers

Recommended