Lec 9: Pipelining
Kavita Bala
CS 3410, Fall 2008
Computer Science
Cornell University
© Kavita Bala, Computer Science, Cornell University
Basic Pipelining
Five-stage “RISC” load-store architecture:
1. Instruction Fetch (IF)
• get instruction from memory
2. Instruction Decode (ID)
• translate opcode into control signals and read registers
3. Execute (EX)
• perform ALU operation
4. Memory (MEM)
• access memory if load/store
5. Writeback (WB)
• update register file
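To make the division of labor concrete, here is a toy interpreter (a hand-rolled sketch, not the lecture's datapath; the tuple encoding is invented) that performs each stage's job for the add/nand/lw/sw instructions used in this lecture:

```python
# Toy interpreter mirroring the five stages for this simple load-store ISA.
# Instructions are (op, a, b, c) tuples: add a b c, nand a b c, lw a off(c),
# sw a off(c) -- an invented encoding for illustration.
def run(program, regs, mem):
    pc = 0
    while pc < len(program):
        op, a, b, c = program[pc]               # 1. IF: fetch
        if op == "add":                         # 2-3. ID + EX: read regs, ALU op
            result = regs[b] + regs[c]
        elif op == "nand":
            result = ~(regs[b] & regs[c])
        elif op in ("lw", "sw"):
            addr = regs[c] + b                  # EX computes base + offset
        if op == "lw":                          # 4. MEM: access memory
            result = mem[addr]
        elif op == "sw":
            mem[addr] = regs[a]
        if op != "sw":                          # 5. WB: update register file
            regs[a] = result
        pc += 1

regs = [0, 7, 10, 0, 0, 0, 0, 5]
mem = {30: 42}
run([("add", 3, 1, 2), ("lw", 4, 20, 2), ("sw", 7, 12, 3)], regs, mem)
print(regs[3], regs[4], mem[29])  # 17 42 5
```

A pipelined implementation overlaps these steps across instructions; the rest of the lecture is about what goes wrong when it does.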
Following slides thanks to Sally McKee
Sample Code (Simple)
• Assume an eight-register machine
• Run the following code on a pipelined datapath:
add 3 1 2   ; reg 3 = reg 1 + reg 2
nand 6 4 5  ; reg 6 = ~(reg 4 & reg 5)
lw 4 20(2)  ; reg 4 = Mem[reg 2 + 20]
add 5 2 5   ; reg 5 = reg 2 + reg 5
sw 7 12(3)  ; Mem[reg 3 + 12] = reg 7
[Figure: the five-stage pipelined datapath: PC, instruction memory, register file (R0-R7), ALU, and data memory, separated by the IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers, which carry op, dest, offset, valA, valB, PC+1, branch target, ALU result, and memory data (mdata) between stages; instruction bits 21-23, 15-17, and 0-2 select the register operands regA, regB, and dest]
Time Graphs
Time:  1   2   3   4   5   6   7   8   9
add    IF  ID  EX  MEM WB
nand       IF  ID  EX  MEM WB
lw             IF  ID  EX  MEM WB
add                IF  ID  EX  MEM WB
sw                     IF  ID  EX  MEM WB
(IF = fetch, ID = decode, EX = execute, MEM = memory, WB = writeback)
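The arithmetic behind this graph can be sketched in a few lines (a hand-rolled model, not from the slides): with no hazards, instruction i occupies stage i + s during cycle i + s + 1, so n instructions finish in n + 4 cycles.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_of(instr_index, cycle):
    """Stage a given instruction occupies in a 1-based cycle (no hazards)."""
    s = cycle - 1 - instr_index          # instruction 0 fetches in cycle 1
    return STAGES[s] if 0 <= s < len(STAGES) else None

def total_cycles(n):
    return n + len(STAGES) - 1           # 4 fill cycles + one per instruction

print(stage_of(2, 5))    # EX: lw (third instruction) executes in cycle 5
print(total_cycles(5))   # 9: the five instructions above finish in cycle 9
```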
Pipelining Recap
• Powerful technique for masking latencies
– Logically, instructions execute one at a time
– Physically, instructions execute in parallel: instruction-level parallelism
• Decouples the processor model from the implementation
– Interface vs. implementation
• BUT dependencies between instructions complicate the implementation
What Can Go Wrong?
• Data hazards
– register reads occur in stage 2
– register writes occur in stage 5
– could read the wrong value if it is about to be written
• Control hazards
– a branch instruction may change the PC in stage 4
– what do we fetch before that?
Handling Data Hazards
• Detect and Stall
– If hazards exist, stall the processor until they go away
– Safe, but not great for performance
• Detect and Forward
– If hazards exist, fix up the pipeline to get the correct value (if possible)
– Most common solution for high performance
Handling Data Hazards III
• Detect
– dest of early instrs == source of current
• Forward
– New bypass datapaths route computed data to where it is needed
– New MUX and control to pick the right data
• Beware: stalling may still be required even in the presence of forwarding
Different Types of Hazards
• Use register value in subsequent instruction
add 3 1 2
or 6 3 1
nand 5 3 4
sub 6 3 1
Data Hazards: AL ops
add 3 1 2
nand 5 3 4

Time:  1   2   3   4   5   6
add    IF  ID  EX  MEM WB
nand       IF  ID  EX  MEM WB

If not careful, you read the wrong value of R3
Handling Data Hazards: Forwarding
• No point forwarding to decode
• Forward to EX stage
• From output of ALU and MEM stages
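A sketch of the resulting bypass mux at each ALU input (signal and function names are invented for illustration, not the slide's control logic): prefer the EX/Mem result, then the Mem/WB result, and fall back to the value read from the register file in decode.

```python
def bypass(src_reg, regfile_val, exmem, memwb):
    """exmem, memwb: (dest_reg, value) pairs; dest_reg is None if that
    in-flight instruction writes no register."""
    if exmem[0] == src_reg:
        return exmem[1]        # forward from the ALU output (EX/Mem)
    if memwb[0] == src_reg:
        return memwb[1]        # forward from the memory stage (Mem/WB)
    return regfile_val         # no in-flight writer: decoded value is fine

# nand 5 3 4 right after add 3 1 2: take add's ALU result for R3,
# not the stale register-file value.
print(bypass(3, 0, (3, 17), (None, None)))  # 17
```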
Data Hazards: AL ops

add 3 1 2
nand 5 3 4

Time:  1   2   3   4   5   6
add    IF  ID  EX  MEM WB
nand       IF  ID  EX  MEM WB

With forwarding, add's ALU result is bypassed straight into nand's EX stage, so nand uses the correct R3.
Data Hazards: AL ops

add 3 1 2
and 8 0 2
nand 5 3 4

Time:  1   2   3   4   5   6   7
add    IF  ID  EX  MEM WB
and        IF  ID  EX  MEM WB
nand           IF  ID  EX  MEM WB

Now nand needs R3 two instructions after add produced it, so the value is forwarded from the MEM stage.
Forwarding Illustration
[Figure: per-instruction pipeline diagrams (IM, Reg, ALU, DM, Reg for each instruction, offset by one cycle), with the ALU output of add $3 routed back to the ALU input of nand $5 $3]
Data Hazards: AL ops

add 3 1 2
and 8 0 2
and 8 2 0
nand 5 3 4

Time:  1   2   3   4   5   6   7   8
add    IF  ID  EX  MEM WB
and        IF  ID  EX  MEM WB
and            IF  ID  EX  MEM WB
nand               IF  ID  EX  MEM WB

Write in the first half of the cycle, read in the second half of the cycle: nand decodes in cycle 5, the same cycle add writes back, so it reads the new R3 with no forwarding needed.
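That write-first, read-second convention can be sketched as follows (a hand-rolled model; the class and method names are invented):

```python
class RegFile:
    """Register file that writes in the first half of a cycle and reads in
    the second half, so a same-cycle WB and ID see the new value."""
    def __init__(self, n=8):
        self.regs = [0] * n

    def tick(self, wb_dest, wb_val, id_regA, id_regB):
        if wb_dest is not None:
            self.regs[wb_dest] = wb_val                # first half: writeback
        return self.regs[id_regA], self.regs[id_regB]  # second half: decode reads

rf = RegFile()
valA, valB = rf.tick(3, 17, 3, 4)  # add writes R3 while nand reads R3 and R4
print(valA)  # 17: nand sees the freshly written value
```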
Data Hazards: lw
lw 3 12(zero)
and 8 0 2
nand 5 3 4

Time:  1   2   3   4   5   6   7
lw     IF  ID  EX  MEM WB
and        IF  ID  EX  MEM WB
nand           IF  ID  EX  MEM WB
A tricky case

add 1 1 2
add 1 1 3
add 1 1 4

[Figure: per-instruction pipeline diagrams (IM, Reg, ALU, DM, Reg): each add needs the R1 value produced by the add immediately before it]
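Why is this tricky? Two in-flight instructions (in EX/Mem and in Mem/WB) both want to supply R1, and the forwarding control must pick the younger one. A sketch (the helper name is invented):

```python
# EX/Mem must take priority over Mem/WB when both carry the same dest reg.
def pick(src, exmem, memwb, regfile):
    for dest, val in (exmem, memwb):      # youngest producer first
        if dest == src:
            return val
    return regfile[src]

regfile = {1: 1, 2: 2, 3: 3, 4: 4}
v1 = regfile[1] + regfile[2]                               # add 1 1 2 -> 3
v2 = pick(1, (1, v1), (None, None), regfile) + regfile[3]  # add 1 1 3 -> 6
v3 = pick(1, (1, v2), (1, v1), regfile) + regfile[4]       # add 1 1 4 -> 10
wrong = pick(1, (1, v1), (1, v2), regfile) + regfile[4]    # stale order -> 7
print(v3, wrong)  # 10 7
```

Checking Mem/WB first would silently use the one-instruction-older value of R1, which is why the bypass priority matters.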
Another tricky case

add 0 4 1
add 3 0 4
Handling Data Hazards: Summary
• Forward
– New bypass datapaths route computed data to where it is needed
• Beware: stalling may still be required even in the presence of forwarding
• For lw
– MIPS 2000/3000: a delay slot
– MIPS 4000 onwards: stall (but really rely on the compiler to treat it like a delay slot)
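The stall decision for loads can be sketched as a single check (names are invented; this assumes the one-bubble load-use penalty of this five-stage pipeline): stall when the instruction in EX is a lw whose destination matches a source of the instruction in decode.

```python
def must_stall(ex_op, ex_dest, id_regA, id_regB):
    """One bubble needed: lw's data exists only after MEM, too late for
    the immediately following instruction's EX stage."""
    return ex_op == "lw" and ex_dest in (id_regA, id_regB)

print(must_stall("lw", 6, 6, 2))   # True: sw 6 12(2) right after lw 6 24(3)
print(must_stall("add", 3, 3, 4))  # False: an ALU result can be forwarded
```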
Detection
• Detection:
– Compare regA with previous DestRegs
– Compare regB with previous DestRegs
– regA and regB going to the ALU
– Previous DestRegs: the instructions currently in ALU, Memory, and Writeback
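Those comparisons amount to a few equality checks per cycle; a sketch (function and signal names invented):

```python
def hazards(regA, regB, dests_in_flight):
    """dests_in_flight: dest registers of the instructions currently in
    EX, MEM, and WB (None where the instruction writes no register)."""
    return [(src, stage)
            for src in (regA, regB)
            for stage, dest in zip(("EX", "MEM", "WB"), dests_in_flight)
            if dest is not None and dest == src]

# nand 5 3 4 in decode while add 3 1 2 is in EX: its regA (R3) matches.
print(hazards(3, 4, (3, None, None)))  # [(3, 'EX')]
```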
[Figure: the pipelined datapath again, with the regA and regB fields in decode compared against the dest fields held in the ID/EX, EX/Mem, and Mem/WB pipeline registers]
Sample Code
Which data hazards do you see?
add 3 1 2
nand 5 3 4
add 7 6 3
lw 6 24(3)
sw 6 12(2)
Sample Code
Answer: nand 5 3 4, add 7 6 3, and lw 6 24(3) all read the R3 written by add 3 1 2; sw 6 12(2) reads the R6 written by the lw immediately before it, a load-use hazard that forwarding alone cannot fix.
Hazard (cycle-by-cycle walkthrough of the sample code on the datapath):
[Figure: first half of cycle 3: nand 5 3 4 in decode; the regA comparison flags R3, whose value add is still computing]
[Figure: end of cycle 3]
[Figure: new hazard, first half of cycle 4: add 7 6 3 in decode also needs R3]
[Figure: end of cycle 4]
[Figure: first half of cycle 5]
[Figure: end of cycle 5]
[Figure: first half of cycle 6: hazard: sw 6 12(2) needs the R6 that lw is still loading, so the pipeline stalls]
[Figure: end of cycle 6: a nop is inserted]
[Figure: first half of cycle 7: the hazard persists while the bubble drains]
[Figure: end of cycle 7]
[Figure: first half of cycle 8]
[Figure: end of cycle 8]
Control Hazards
beqz 3 goToZero
nand 5 1 4
add 6 7 8

Time:  1   2   3   4   5   6
beqz   IF  ID  EX  MEM WB
nand       IF  ID  EX  MEM WB
Control Hazards for 4-stage pipeline
beqz 3 goToZero
nand 5 1 4
add 6 7 8

Time:  1    2    3    4    5
beqz   F/D  EX   MEM  WB
nand        F/D  EX   MEM  WB
Control Hazards
• Stall
– Inject NOPs into the pipeline when the next instruction is not known
– Pros: simple, clean; Cons: slow
• Delay Slots
– Tell the programmer that the N instructions after a jump will always be executed, no matter what the outcome of the branch
– Pros: the compiler may be able to fill the slots with useful instructions; Cons: breaks the abstraction boundary
• Speculative Execution
– Insert instructions into the pipeline
– Replace instructions with NOPs if the branch comes out opposite of what the processor expected
– Pros: clean model, fast; Cons: complex
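The performance trade-off among these options can be sketched with back-of-the-envelope CPI arithmetic (the branch fraction, penalty, and prediction accuracy below are illustrative assumptions, not numbers from the slides):

```python
def cpi(branch_frac, strategy, penalty, predict_accuracy=0.0):
    """Cycles per instruction, charging wasted fetch slots to each branch.
    penalty = instructions fetched before the branch outcome is known."""
    if strategy == "stall":
        wasted = penalty                            # always pay in full
    elif strategy == "speculate":
        wasted = (1 - predict_accuracy) * penalty   # pay only on mispredicts
    elif strategy == "delay_slot":
        wasted = 0                                  # best case: slots all filled
    return 1 + branch_frac * wasted

# 20% branches, branch resolved in stage 4 of the 5-stage pipeline
# (3 instructions fetched too early):
print(cpi(0.2, "stall", 3))                            # 1.6
print(cpi(0.2, "speculate", 3, predict_accuracy=0.9))  # ~1.06
```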
Control Hazards for 4-stage pipeline
beqz 3 goToZero
nand 5 1 4
add 6 7 8
goToZero: lui 2 1

Time:  1    2    3    4    5
beqz   F/D  EX   MEM  WB
nand        F/D  EX   MEM  WB
lui              F/D  EX   MEM  WB

Delay slot! nand is fetched before the branch resolves and executes regardless; the taken branch then fetches lui 2 1.
Hazard summary
• Data hazards
– Forward
– Stall for lw/sw
• Control hazards
– Delay slot
Pipelining for Other Circuits
• Pipelining can be applied to any combinational logic circuit
• What's the circuit latency at 2 ns per gate?
• What's the throughput?
• What is the stage breakdown? Where would you place latches?
• What's the throughput after pipelining?
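As a worked example of the arithmetic these questions ask for (the circuit is invented: assume a chain of 6 gate levels, and ignore latch overhead):

```python
GATE_DELAY_NS = 2          # from the question: 2 ns per gate
LEVELS = 6                 # assumed depth of the example circuit

latency = LEVELS * GATE_DELAY_NS      # unpipelined: 12 ns per result
unpipelined_throughput = 1 / latency  # one result every 12 ns

# Place latches every 2 levels -> 3 stages, 4 ns clock:
STAGE_LEVELS = 2
cycle_ns = STAGE_LEVELS * GATE_DELAY_NS
pipelined_throughput = 1 / cycle_ns   # one result every 4 ns in steady state

print(latency, cycle_ns)  # 12 4
```

Latency per result does not improve (it slightly worsens once latch delay is counted), but throughput triples, which is the whole point of pipelining.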