79
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Chapter: 4.64.8

Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Prof. Hakim WeatherspoonCS 3410, Spring 2015Computer ScienceCornell University

See P&H Chapter: 4.6‐4.8

Page 2: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Prelim next weekTuesday at 7:30. Go to location based on netid[a‐g]* →  MRS146: Morrison Hall 146[h‐l]* →  RRB125: Riley‐Robb Hall 125[m‐n]*→ RRB105: Riley‐Robb Hall 105[o‐s]* →  MVRG71: M Van Rensselaer Hall G71[t‐z]* →  MVRG73: M Van Rensselaer Hall G73

Prelim reviewsTODAY, Tue, Feb 24 @ 7:30pm in Olin 255Sat, Feb 28 @ 7:30pm in Upson B17

Prelim conflictsContact Deniz Altinbuken <[email protected]

Page 3: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Prelim1:• Time: We will start at 7:30pm sharp, so come early• Location: on previous slide• Closed Book

• Cannot use electronic device or outside material

• Practice prelims are online in CMS

• Material covered everything up to end of this week• Everything up to and including data hazards• Appendix B (logic, gates, FSMs, memory, ALUs) • Chapter 4 (pipelined [and non] MIPS processor with hazards)• Chapters 2 (Numbers / Arithmetic, simple MIPS instructions)• Chapter 1 (Performance)• HW1, Lab0, Lab1, Lab2, C‐Lab0, C‐Lab1

Page 4: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

RISC and Pipelined Processor: Putting it all togetherData Hazards• Data dependencies• Problem, detection, and solutions

– (delaying, stalling, forwarding, bypass, etc)• Hazard detection unit• Forwarding unit

Next time• Control Hazards

What is the next instruction to execute ifa branch is taken?  Not taken?

Page 5: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Simplicity favors regularity• 32 bit instructions

Smaller is faster• Small register file

Make the common case fast• Include support for constants

Good design demands good compromises• Support for different type of interpretations/classes

Page 6: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

All MIPS instructions are 32 bits long, has 3 formats

R‐type

I‐type

J‐type 

op rs rt rd shamt func6 bits 5 bits 5 bits 5 bits 5 bits 6 bits

op rs rt immediate6 bits 5 bits 5 bits 16 bits

op immediate (target address)6 bits 26 bits

Page 7: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Arithmetic/Logical• R‐type: result and two source registers, shift amount• I‐type:  16‐bit immediate with sign/zero extension

Memory Access• load/store between registers and memory• word, half‐word and byte operations

Control flow• conditional branches: pc‐relative addresses• jumps: fixed offsets, register absolute

Page 8: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Arithmetic/Logical• ADD, ADDU, SUB, SUBU, AND, OR, XOR, NOR, SLT, SLTU• ADDI, ADDIU, ANDI, ORI, XORI, LUI, SLL, SRL, SLLV, SRLV, SRAV, SLTI, SLTIU

• MULT, DIV, MFLO, MTLO, MFHI, MTHIMemory Access

• LW, LH, LB, LHU, LBU, LWL, LWR• SW, SH, SB, SWL, SWR

Control flow• BEQ, BNE, BLEZ, BLTZ, BGEZ, BGTZ• J, JR, JAL, JALR, BEQL, BNEL, BLEZL, BGTZL

Special• LL, SC, SYSCALL, BREAK, SYNC, COPROC

Page 9: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Principle:Throughput increased by parallel executionBalanced pipeline very important

Else slowest stage dominates performance

Pipelining:• Identify pipeline stages• Isolate stages from each other• Resolve pipeline hazards (this and next lecture)

Page 10: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Five stage “RISC” load‐store architecture

1. Instruction fetch (IF)– get instruction from memory, increment PC

2. Instruction Decode (ID)– translate opcode into control signals and read registers

3. Execute (EX)– perform ALU operation,  compute jump/branch targets

4.Memory (MEM)– access memory if needed

5.Writeback (WB)– update register file

Page 11: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

• Each instruction goes through the 5 stages• Each stage takes one clock cycle

• So slowest stage determines clock cycle time

Page 12: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

1 2 3 4 5 6 7 8 9Clock cycle

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

add

lw

Page 13: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

The pipeline achievesA) Latency: 1, throughput: 1 instr/cycleB) Latency: 5, throughput: 1 instr/cycleC) Latency: 1, throughput: 1/5 instr/cycleD) Latency: 5, throughput: 5 instr/cycleE) None of the above

Page 14: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

1 2 3 4 5 6 7 8 9Clock cycle

Latency:  5 cyclesThroughput:  1 instr/cycleConcurrency:  5

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

add

lw

CPI = 1CPI =

Page 15: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

• Each instruction goes through the 5 stages• Each stage takes one clock cycle

• So slowest stage determines clock cycle time

• Stages must share information. How?• Add pipeline registers (flip‐flops) to pass results 

between different stages

Page 16: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

extend

registerfile

control

alu

memory

din dout

addrPC

memory

newpc

computejump/branch

targets

+4

Fetch Decode Execute Memory WB

Page 17: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Write‐BackMemory

InstructionFetch Execute

InstructionDecode

extend

registerfile

control

alu

memory

din dout

addrPC

memory

newpc

inst

IF/ID ID/EX EX/MEM MEM/WB

imm

BA

ctrl

ctrl

ctrl

BD D

M

computejump/branch

targets

+4

Page 18: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

• Each instruction goes through the 5 stages• Each stage takes one clock cycle

• So slowest stage determines clock cycle time

• Stages must share information. How?• Add pipeline registers (flip‐flops) to pass results 

between different stages

And is this it? Not quite….

Page 19: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

3 kinds• Structural hazards

– Multiple instructions want to use same unit

• Data hazards– Results of instruction needed before ready

• Control hazards– Don’t know which side of branch to take

Will get back to thisFirst, how to pipeline when no hazards

Page 20: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Write‐BackMemory

InstructionFetch Execute

InstructionDecode

extend

registerfile

control

alu

memory

din dout

addrPC

memory

newpc

inst

IF/ID ID/EX EX/MEM MEM/WB

imm

BA

ctrl

ctrl

ctrl

BD D

M

computejump/branch

targets

+4

Page 21: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

add  r3, r1, r2; nand  r6, r4, r5; lw  r4, 20(r2); add  r5, r2, r5; sw  r7, 12(r3); 

Page 22: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Assume eight‐register machineRun the following code on a pipelined datapath

add r3 r1    r2   ;  reg 3 = reg 1 + reg 2nand r6 r4    r5   ;  reg 6 = ~(reg 4 & reg 5)lw r4 20 (r2)  ;  reg 4 =  Mem[reg2+20]add r5 r2    r5   ;  reg 5 = reg 2 + reg 5sw r7    12(r3)   ;  Mem[reg3+12] = reg 7

Slides thanks to Sally McKee

Page 23: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

PC

Register file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11‐15

Bits 16‐20

op

Rt

imm

valB

valA

PC+4PC+4target

ALUresult

op

dest

valB

op

dest

ALUresult

mdata

instruction

0

R2

R3

R4

R5

R1

R6

R0

R7

regAregB

Bits 26‐31

data

dest

IF/ID ID/EX EX/MEM MEM/WB

extend

MUX

Rd

Page 24: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

data

dest

IF/ID ID/EX EX/MEM MEM/WB

extend

0MUX

0

At time 1, Fetchadd r3 r1 r2

0

Time: 0

4

Page 25: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

PC

Register file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11‐15

Bits 16‐20

nop

0

0

0

040

0

nop

0

0

nop

0

0

0

0

add  3 1  2

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits 26‐31

data

dest

Fetch:add 3 1 2

add 3 1 2

IF/ID ID/EX EX/MEM MEM/WB

extend

0MUX

0

/    add

/ 3

/ 4

/ 36

/ 9

/ 2

Time: 1/ 2

48

Page 26: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

PC

Register file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11‐15

Bits 16‐20

add

3

9

36

480

0

nop

0

0

nop

0

0

0

0nand  6 4  5

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

12

Bits 26‐31

data

dest

Fetch:nand 6 4 5

nand 6 4 5           add 3 1 2

IF/ID ID/EX EX/MEM MEM/WB

extend

2MUX

3

/ 45

/    add/    nand

/ 9/ 6

/ 8

/ 18

/ 7

/ 5 / 3

/ 4

36

9

3

Time: 2/ 3

812

Page 27: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

PC

Register file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11‐15

Bits 16‐20

nand

6

7

18

8124

45

add

3

9

nop

0

0

0

0lw  4  20(2)

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

45

Bits 26‐31

data

dest

Fetch:lw 4 20(2)

lw 4 20(2)           nand 6 4 5           add 3 1 2

36

9

3

IF/ID ID/EX EX/MEM MEM/WB

extend

5MUX

6 32

/ 45

/ 3

nand (18 · 7)

18 = 01  00107 = 00  0111‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐3 = 11  1101

/  ‐3

/    add

/ 18

/ 7

/    nand

/ 7

/ 6

/ 8

Time: 3/ 4

1216

Page 28: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

PC

Register file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11‐15

Bits 16‐20

lw

20

18

9

12168

‐3

nand

6

7

add

3

45

0

0add  5 2  5 

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

24

Bits 26‐31

data

dest

Fetch:add 5 2 5

add 5 2 5          lw 4 20(2)          nand 6 4 5      add 3 1 2

18

7

6

45

3

IF/ID ID/EX EX/MEM MEM/WB

extend

4MUX

0 65

Time: 4

1620

Page 29: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

PC

Register file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11‐15

Bits 16‐20

add

5

7

9

162012

29

lw

4

18

nand

6

‐3

0

0sw  7 12(3)

945187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

25

Bits 26‐31

data

dest

Fetch:sw 7  12(3)

sw 7 12(3)          add 5 2 5           lw 4 20 (2)          nand 6 4 5             add   3 1 2

9

20

4

‐3

6

45

3

IF/ID ID/EX EX/MEM MEM/WB

extend

5MUX

5 04

Time: 5

2024

Page 30: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

PC

Register file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11‐15

Bits 16‐20

sw

12

22

45

2016

16

add

5

7

lw

4

29

99

0945187

36

‐3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

37

Bits 26‐31

data

dest

No moreinstructions

sw 7 12(3)        add 5 2 5        lw 4 20(2)         nand 6 4 5

9

7

5

29

4

‐3

6

IF/ID ID/EX EX/MEM MEM/WB

extend

7MUX

0 55

Time: 6

2428

Page 31: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

PC

Register file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11‐15

Bits 16‐20

20

57

sw

7

22

add

5

16

0

0945997

36

‐3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits 26‐31

data

dest

No moreinstructions

nop nop sw 7 12(3)            add 5 2 5           lw 4 20(2)

45

7

12

16

5

99

4

IF/ID ID/EX EX/MEM MEM/WB

extend

MUX

07

Time: 7

2832

Page 32: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

PC

Register file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11‐15

Bits 16‐20

sw

7

57

0

9459916

36

‐3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits 26‐31

data

dest

No moreinstructions

nop nop nop sw 7 12(3)          add  5 2 5

2257

22

16

5

Slides thanks to Sally McKee

IF/ID ID/EX EX/MEM MEM/WB

extend

MUX

Time: 8

3236

Page 33: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

PC

Register file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11‐15

Bits 16‐20

9459916

36

‐3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits 21‐23

data

dest

No moreinstructions

nop nop nop nop sw 7 12(3)

IF/ID ID/EX EX/MEM MEM/WB

extend

MUX

Time: 9

3640

Page 34: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Pipelining is a powerful technique to mask latencies and increase throughput

• Logically, instructions execute one at a time• Physically, instructions execute in parallel

– Instruction level parallelism

Abstraction promotes decoupling• Interface (ISA) vs. implementation (Pipeline)

Page 35: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

See P&H Chapter: 4.7‐4.8

Page 36: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

3 kinds• Structural hazards

– Multiple instructions want to use same unit

• Data hazards– Results of instruction needed before

• Control hazards– Don’t know which side of branch to take

Page 37: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

What about data dependencies (also known as a data hazard in a pipelined processor)?i.e. add r3, r1, r2

sub r5, r3, r4

Need to detect and then fix such hazards

Page 38: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Data Hazards• register file reads occur in stage 2 (ID) • register file writes occur in stage 5 (WB)• next instructions may read values about to be written

– i.e instruction may need values that are being computed further down the pipeline

– in fact, this is quite common

Page 39: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

Clock cycle1 2 3 4 5 6 7 8 9

sub r5, r3, r4

lw r6,  4(r3)

or r5, r3, r5

sw r6, 12(r3)

add r3, r1, r2

time

Page 40: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

How many data hazards due to r3 only

A) 1B) 2C) 3D) 4E) 5

sub r5, r3, r4

lw r6,  4(r3)

or r5, r3, r5

sw r6, 12(r3)

add r3, r1, r2

Page 41: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

Clock cycle1 2 3 4 5 6 7 8 9

2. sub r5, r3, r4

3. lw r6,  4(r3)

4. or r5, r3, r5

5. sw r6, 12(r3)

1. add r3, r1, r2

time

r3 = 10

r3 = 20

r3 = 10 r3 = 20

r3 = 10

r3 = 10

r3 = 10

OK

Page 42: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

Clock cycle1 2 3 4 5 6 7 8 9

sub r5, r3, r4

lw r6,  4(r3)

or r5, r3, r5

sw r6, 12(r3)

add r3, r1, r2r3 = 10

r3 = 20

OK

r3 = 10 r3 = 20time

r3 = 10

r3 = 10

r3 = 10

r3 = 20

r6 = Mem[r3 + 4]

Page 43: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Data Hazards• register file reads occur in stage 2 (ID) • register file writes occur in stage 5 (WB)• next instructions may read values about to be written

i.e. add r3, r1, r2sub r5, r3, r4

How to detect? 

Page 44: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

IF/ID

+4

ID/EX EX/MEM MEM/WB

mem

din dout

addrinst

PC+4

OP

BA

Rt

BD

MD

PC+4

imm

OP

Rd

OP

Rd

PC

instmem

Rd

Ra Rb

DB

A

Rd

Detecting Data Hazards

IF/ID.Ra ≠ 0 &&(IF/ID.Ra==ID/Ex.RdIF/ID.Ra==Ex/M.RdIF/ID.Ra==M/W.Rd)

Page 45: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Data Hazards• register file reads occur in stage 2 (ID) • register file writes occur in stage 5 (WB)• next instructions may read values about to be written

How to detect? Logic in ID stage:stall = (IF/ID.Ra != 0 && 

(IF/ID.Ra == ID/EX.Rd || IF/ID.Ra == EX/M.Rd || IF/ID.Ra == M/WB.Rd))

|| (same for Rb)

Page 46: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

IF/ID

+4

ID/EX EX/MEM MEM/WB

mem

din dout

addr

PC

instmem

Rd

Ra Rb

DB

A

detecthazard Rd

add r3, r1, r2sub r5, r3, r5or r6, r3, r4 add r6, r3, r8

inst

PC+4

OP

BA

Rt

BD

MD

PC+4

imm

OP

Rd

OP

Rd

Page 47: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Data hazards occur when a operand (register)  depends on the result of a previous instruction that may not be computed yet.  A pipelined processor needs to detect data hazards.

Page 48: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

What to do if data hazard detected?

Page 49: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

What to do if data hazard detected?A) Wait/StallB) Reorder in Software (SW)C) Forward/BypassD) All the aboveE) None.  We will use some other method

Page 50: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

What to do if data hazard detected?A) Wait/StallB) Reorder in Software (SW)C) Forward/BypassD) All the aboveE) None.  We will use some other method

Discuss today

Page 51: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

What to do if data hazard detected?

Options• Nothing

– Change the ISA to match implementation

• Stall– Pause current and subsequent instructions till safe

• Forward/bypass– Forward data value to where it is needed

Page 52: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

How to stall an instruction in ID stage• prevent IF/ID pipeline register update

– stalls the ID stage instruction

• convert ID stage instr into nop for later stages– innocuous “bubble” passes through pipeline

• prevent PC update– stalls the next (IF stage) instruction

Page 53: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

IF/ID

+4

ID/EX EX/MEM MEM/WB

mem

din dout

addr

PC

instmem

Rd

Ra Rb

DB

A

detecthazard Rd

add r3, r1, r2sub r5, r3, r5or r6, r3, r4 add r6, r3, r8

inst

PC+4

OP

BA

Rt

BD

MD

PC+4

imm

OP

Rd

OP

Rd

If detect hazard

WE=0

MemWr=0RegWr=0

Page 54: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Clock cycle1 2 3 4 5 6 7 8

add r3, r1, r2

sub r5, r3, r5

or r6, r3, r4

add r6, r3, r8

time

Page 55: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Clock cycle1 2 3 4 5 6 7 8

add r3, r1, r2

sub r5, r3, r5

or r6, r3, r4

add r6, r3, r8

r3 = 10

r3 = 20

time

IF ID Ex M W

IF ID Ex M W

IF ID Ex M

ID ID ID

IF IF IF

IF ID Ex

Stalls3 Stall

Page 56: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

datamem

B

A

B

D

M

Dinstmem

DrD B

A

Rd RdRd

WE

WE

Op

WE

Op

rA rB

PC

+4

Opnop

inst

/stall

add r3,r1,r2

(MemWr=0RegWr=0)

NOP = If(IF/ID.rA ≠ 0 &&(IF/ID.rA==ID/Ex.RdIF/ID.rA==Ex/M.RdIF/ID.rA==M/W.Rd))

sub r5,r3,r5

or r6,r3,r4 (WE=0)

Page 57: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

datamem

B

A

B

D

M

Dinstmem

DrD B

A

Rd RdRd

WE

WE

Op

WE

Op

rA rB

PC

+4

Opnop

inst

/stall

nop

(MemWr=0RegWr=0)

NOP = If(IF/ID.rA ≠ 0 &&(IF/ID.rA==ID/Ex.RdIF/ID.rA==Ex/M.RdIF/ID.rA==M/W.Rd))

add r3,r1,r2sub r5,r3,r5

(MemWr=0RegWr=0)

or r6,r3,r4 (WE=0)

Page 58: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

datamem

B

A

B

D

M

Dinstmem

DrD B

A

Rd RdRd

WE

WE

Op

WE

Op

rA rB

PC

+4

Opnop

inst

/stall

(MemWr=0RegWr=0)

NOP = If(IF/ID.rA ≠ 0 &&(IF/ID.rA==ID/Ex.RdIF/ID.rA==Ex/M.RdIF/ID.rA==M/W.Rd))

add r3,r1,r2sub r5,r3,r5 nop nop

(MemWr=0RegWr=0)

(MemWr=0RegWr=0)

or r6,r3,r4 (WE=0)

Page 59: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Clock cycle1 2 3 4 5 6 7 8

add r3, r1, r2

sub r5, r3, r5

or r6, r3, r4

add r6, r3, r8

r3 = 10

r3 = 20

time

IF ID Ex M W

IF ID Ex M W

IF ID Ex M

ID ID ID

IF IF IF

IF ID Ex

Stalls3 Stall

Page 60: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

How to stall an instruction in ID stage• prevent IF/ID pipeline register update

– stalls the ID stage instruction

• convert ID stage instr into nop for later stages– innocuous “bubble” passes through pipeline

• prevent PC update– stalls the next (IF stage) instruction

Page 61: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Data hazards occur when a operand (register)  depends on the result of a previous instruction that may not be computed yet.  A pipelined processor needs to detect data hazards.

Stalling, preventing a dependent instruction from advancing, is one way to resolve data hazards.  

Stalling introduces NOPs (“bubbles”) into a pipeline.  Introduce NOPs by (1) preventing the PC from updating, (2) preventing writes to IF/ID registers from changing, and (3) preventing writes to memory and register file.  *Bubbles in pipeline significantly decrease performance.

Page 62: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

What to do if data hazard detected?A) Wait/StallB) Reorder in Software (SW)C) Forward/Bypass

Page 63: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Forwarding bypasses some pipelined stages forwarding a result to a dependent instruction operand (register).

Three types of forwarding/bypass• Forwarding from Ex/Mem registers to Ex stage (MEx)• Forwarding from Mem/WB register to Ex stage (WEx)• RegisterFile Bypass

Page 64: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

datamem

imm

B

A

B

D

M

Dinstmem

DB

A

Rd Rd

Rb

WE

WE

MC

Ra

MC

detecthazard

Three types of forwarding/bypass• Forwarding from Ex/Mem registers to Ex stage (MEx)• Forwarding from Mem/WB register to Ex stage (W Ex)• RegisterFile Bypass

IF/ID ID/Ex Ex/Mem Mem/WB

forwardunit

Page 65: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

datamem

imm

B

A

B

D

M

Dinstmem

DB

A

Rd Rd

Rb

WE

WE

MC

Ra

MC

forwardunit

detecthazard

Three types of forwarding/bypass• Forwarding from Ex/Mem registers to Ex stage (MEx)• Forwarding from Mem/WB register to Ex stage (W Ex)• RegisterFile Bypass

IF/ID ID/Ex Ex/Mem Mem/WB

Page 66: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Ex/MEM to EX Bypass• EX needs ALU result that is still in MEM stage• Resolve:

Add a bypass from EX/MEM.D to start of EX

How to detect? Logic in Ex Stage:forward = (Ex/M.WE && EX/M.Rd != 0 &&

ID/Ex.Ra == Ex/M.Rd)|| (same for Rb)

Page 67: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

datamem

imm

B

A

B

D

M

Dinstmem

DB

A

Rd Rd

Rb

WE

WE

MC

Ra

MC

forwardunit

detecthazard

Three types of forwarding/bypass• Forwarding from Ex/Mem registers to Ex stage (MEx)• Forwarding from Mem/WB register to Ex stage (W Ex)• RegisterFile Bypass

IF/ID ID/Ex Ex/Mem Mem/WB

Page 68: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

add r3, r1, r2

sub r5, r3, r1

datamem

instmem

DB

A

IF ID Ex M W

IF ID Ex M W

Page 69: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Mem/WB to EX Bypass• EX needs value being written by WB• Resolve:

Add bypass from WB final value to start of EX 

How to detect? Logic in Ex Stage:forward = (M/WB.WE && M/WB.Rd != 0 &&

ID/Ex.Ra == M/WB.Rd &&not (Ex/M.WE && Ex/M.Rd != 0 &&

ID/Ex.Ra == Ex/M.Rd)|| (same for Rb)

Check pg. 311

Page 70: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

datamem

imm

B

A

B

D

M

Dinstmem

DB

A

Rd Rd

Rb

WE

WE

MC

Ra

MC

forwardunit

detecthazard

Three types of forwarding/bypass• Forwarding from Ex/Mem registers to Ex stage (MEx)• Forwarding from Mem/WB register to Ex stage (W Ex)• RegisterFile Bypass

IF/ID ID/Ex Ex/Mem Mem/WB

Page 71: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

add r3, r1, r2

sub r5, r3, r1

or r6, r3, r4

datamem

instmem

DB

A

IF ID Ex M W

IF IDIF W

Ex M WID Ex M

Page 72: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Register File Bypass• Reading a value that is currently being written

Detect:((Ra == MEM/WB.Rd) or (Rb == MEM/WB.Rd))and (WB is writing a register)

Resolve:Add a bypass around register file (WB to ID)Better: (Hack) just negate register file clock– writes happen at end of first half of each clock cycle– reads happen during second half of each clock cycle

Page 73: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

datamem

imm

B

A

B

D

M

Dinstmem

DB

A

Rd Rd

Rb

WE

WE

MC

Ra

MC

forwardunit

detecthazard

Three types of forwarding/bypass• Forwarding from Ex/Mem registers to Ex stage (MEx)• Forwarding from Mem/WB register to Ex stage (W Ex)• RegisterFile Bypass

IF/ID ID/Ex Ex/Mem Mem/WB

Page 74: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

add r3, r1, r2

sub r5, r3, r1

or r6, r3, r4

add r6, r3, r8 

datamem

instmem

DB

A

IF ID Ex M W

IF IDIF W

Ex M WID Ex MIF ID Ex M W

Page 75: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Clock cycle1 2 3 4 5 6 7 8

add r3, r1, r2

sub r5, r3, r5

or r6, r3, r4

add r6, r3, r8

IF ID Ex M W

IF ID

IF W

Ex M W

ID Ex M

IF ID Ex

r3 = 10

r3 = 20

time

M W

Page 76: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Clock cycle1 2 3 4 5 6 7 8

add r3, r1, r2

sub r5, r3, r4

lw r6,  4(r3)

or r5, r3, r5

sw r6, 12(r3)

IF ID Ex M W

IF ID

IF W

Ex M W

ID Ex M

IF ID Ex

time

M W

IF ID Ex M W

Page 77: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

datamem

imm

B

A

B

D

M

Dinstmem

DB

A

Rd Rd

Rb

WE

WE

MC

Ra

MC

forwardunit

detecthazard

Three types of forwarding/bypass• Forwarding from Ex/Mem registers to Ex stage (MEx)• Forwarding from Mem/WB register to Ex stage (W Ex)• Register File Bypass

IF/ID ID/Ex Ex/Mem Mem/WB

Page 78: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Data hazards occur when a operand (register)  depends on the result of a previous instruction that may not be computed yet.  A pipelined processor needs to detect data hazards.

Stalling, preventing a dependent instruction from advancing, is one way to resolve data hazards.  Stalling introduces NOPs (“bubbles”) into a pipeline.  Introduce NOPs by (1) preventing the PC from updating, (2) preventing writes to IF/ID registers from changing, and (3) preventing writes to memory and register file.  Bubbles (nops) in pipeline significantly decrease performance.

Forwarding bypasses some pipelined stages forwarding a result to a dependent instruction operand (register).  Better performance than stalling.

Page 79: Prof. Hakim Weatherspoon CS 3410, Spring 2015...RISC and Pipelined Processor: Putting it all together Data Hazards • Data dependencies • Problem, detection, and solutions – (delaying,

Stall• Pause current and all subsequent instructions

Forward/Bypass• Try to steal correct value from elsewhere in pipeline• Otherwise, fall back to stalling or require a delay slot

Tradeoffs?