120
1 Chapter 6: Pipelining Chapter 6: Pipelining

Chapter 6: Pipelining

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

1

Chapter 6: PipeliningChapter 6: Pipelining

2

OutlineOutline

An overview of pipeliningA pipelined datapathPipelined controlData hazards and forwardingData hazards and stallsBranch hazardsExceptionsSuperscalar and dynamic pipelining

3

Laundry example:

Ann, Brian, Cathy, Daveeach have one load ofclothes to wash, dry,and fold

Washer takes 30 minutes

Dryer takes 40 minutes

“Folder”takes 20 minutes

A B C D

Pipelining Is Natural!Pipelining Is Natural!

4

Sequential laundry takes 6 hours for 4 loadsIf they learned pipelining, how long would it take?

A

B

C

D

30 40 20 30 40 20 30 40 20 30 40 20

6 PM 7 8 9 10 11 Midnight

Task

Order

Time

Sequential LaundrySequential Laundry

5

Pipelined laundry takes 3.5 hours for 4 loads

A

B

C

D

6 PM 7 8 9 10 11 Midnight

Task

Order

Time

30 40 40 40 40 20

Pipelined Laundry: Start ASAPPipelined Laundry: Start ASAP

6

Pipelining LessonsPipelining LessonsDoesn’t help latency of

single task, but throughputof entirePipeline rate limited by

slowest stageMultiple tasks working at

same time using differentresourcesPotential speedup =

Number of pipe stagesUnbalanced stage length;

time to “fill”& “drain”the pipeline reducespeedupStall for dependences

A

B

C

D

6 PM 7 8 9

Task

Order

Time

30 40 40 40 40 20

7

SingleSingle--, Multi, Multi--Cycle, vs. PipelineCycle, vs. Pipeline

Clk

Cycle 1

Multiple Cycle Implementation:

Ifetch Reg Exec Mem Wr

Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10

Load Ifetch Reg Exec Mem Wr

Ifetch Reg Exec MemLoad Store

Pipeline Implementation:

Ifetch Reg Exec Mem WrStore

Clk

Single Cycle Implementation:

Load Store Waste

IfetchR-type

Ifetch Reg Exec Mem WrR-type

Cycle 1 Cycle 2

8

Pipelining MIPS Execution

Instructionfetch

Reg ALU Dataaccess

Reg

8 nsInstruction

fetchReg ALU Data

accessReg

8 nsInstruction

fetch

8 ns

Time

lw $1, 100($0)

lw $2, 200($0)

lw $3, 300($0)

2 4 6 8 10 12 14 16 18

2 4 6 8 10 12 14

...

Programexecutionorder(in instructions)

Instructionfetch

Reg ALUData

accessReg

Time

lw $1, 100($0)

lw $2, 200($0)

lw $3, 300($0)

2 nsInstruction

fetchReg ALU

Dataaccess

Reg

2 nsInstruction

fetchReg ALU

Dataaccess

Reg

2 ns 2 ns 2 ns 2 ns 2 ns

Programexecutionorder(in instructions)

Fig. 6.3

9

Instr.

Order

Time (clock cycles)

Inst 0

Inst 1

Inst 2

Inst 4

Inst 3A

LUIm Reg Dm Reg

AL

UIm Reg Dm Reg

AL

UIm Reg Dm Reg

AL

UIm Reg Dm Reg

AL

UIm Reg Dm Reg

Why Pipeline? Because theWhy Pipeline? Because theResources Are There!Resources Are There!

Single-cycle

Datapath

10

HazardHazard

Limits to pipelining: Hazards prevent next instructionfrom executing during its designated clock cycle–Structural hazards: Hardware cannot support this combination of

instructions - two instructions need the same resource.–Data hazards: Instruction depends on result of prior instruction

still in the pipeline–Control hazards: Pipelining of branches & other instructions that

change the PCCommon solution is to stall the pipeline until the hazard is

resolved, inserting one or more “bubbles”in the pipelineTo do this, hardware or software must detect that a hazard

has occurred.

11

Structural HazardsStructural Hazards

Structural hazards occur when two or more instructionsneed the same resource.

Common methods for eliminating structural hazards are:– Duplicate resources– Pipeline the resource– Reorder the instructions

It may be too expensive too eliminate a structural hazard,in which case the pipeline should stall.

When the pipeline stalls, no instructions are issued untilthe hazard has been resolved.

What are some examples of structural hazards?

12

One Memory Port Structural HazardsOne Memory Port Structural HazardsFigure 3.6, Page 142Figure 3.6, Page 142

Time (clock cycles)

Instr.

Order

Load

Instr 1

Instr 2

Instr 3

Instr 4

Reg ALU DMemIfetch Reg

Reg ALU DMemIfetch Reg

Reg ALU DMemIfetch Reg

Reg ALU DMemIfetch Reg

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7Cycle 5

Reg ALU DMemIfetch Reg

13

One Memory Port Structural HazardsOne Memory Port Structural HazardsFigure 3.7, Page 143Figure 3.7, Page 143

Instr.

Order

Time (clock cycles)

Load

Instr 1

Instr 2

Stall

Instr 3

Reg ALU DMemIfetch Reg

Reg ALU DMemIfetch Reg

Reg ALU DMemIfetch Reg

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7Cycle 5

Reg ALU DMemIfetch Reg

Bubble Bubble Bubble BubbleBubble

14

OutlineOutline

An overview of pipeliningA pipelined datapath (6.2)Pipelined controlData hazards and forwardingData hazards and stallsBranch hazardsExceptionsSuperscalar and dynamic pipelining

15

Designing a Pipelined ProcessorDesigning a Pipelined Processor

Examine the datapath and control diagram–Starting with single- or multi-cycle datapath?–Single- or multi-cycle control?

Partition datapath into stages:–IF (instruction fetch)–ID (instruction decode and register file read)–EX (execution or address calculation)–MEM (data memory access)–WB (write back)

Associate resources with stagesEnsure that flows do not conflict, or figure out how to

resolveAssert control in appropriate stage

16

Use Multicycle Execution Steps

Step nameAction for R-type

instructionsAction for memory-reference

instructionsAction forbranches

Action forjumps

Instruction fetch IR = Memory[PC]PC = PC + 4

Instruction A = Reg [IR[25-21]]decode/register fetch B = Reg [IR[20-16]]

ALUOut = PC + (sign-extend (IR[15-0]) << 2)

Execution, address ALUOut = A op B ALUOut = A + sign-extend if (A ==B) then PC = PC [31-28] IIcomputation, branch/ (IR[15-0]) PC = ALUOut (IR[25-0]<<2)jump completion

Memory access or R-type Reg [IR[15-11]] = Load: MDR = Memory[ALUOut]completion ALUOut or

Store: Memory [ALUOut] = B

Memory read completion Load: Reg[IR[20-16]] = MDR

But, use single-cycle datapath ...

17

Split Single-cycle Datapath

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Instruction

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

ReaddataAddress

Datamemory

1

ALUresult

Mux

ALUZero

IF: Instruction fetch ID: Instruction decode/register file read

EX: Execute/address calculation

MEM: Memory access WB: Write back

Fig. 6.9

FeedbackPath

18

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM MEM/WB

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

1

ALUresult

Mux

ALUZero

ID/EX

Datamemory

Address

Pipeline registers (latches)

Add Pipeline Registers

Fig. 6.11

19

IF: Instruction Fetch–Fetch the instruction from the Instruction Memory

ID: Instruction Decode–Registers fetch and instruction decode

EX: Calculate the memory addressMEM: Read the data from the Data MemoryWB: Write the data back to the register file

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5

Ifetch Reg/Dec Exec Mem WrLoad

ConsiderConsider loadload

20

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7

Ifetch Reg/Dec Exec Mem Wr1st lw

Ifetch Reg/Dec Exec Mem Wr2nd lw

Ifetch Reg/Dec Exec Mem Wr3rd lw

PipeliningPipelining loadload

5 functional units in the pipeline datapath are:– Instruction Memory for the Ifetch stage– Register File’s Read ports (busA and busB) for the Reg/Dec stage– ALU for the Exec stage– Data Memory for the MEM stage– Register File’s Write port (busW) for the WB stage

21

•IR = mem[PC]; PC = PC + 4

IF Stage of load

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM MEM/WB

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

1

ALUresult

Mux

ALUZero

ID/EX

Instruction fetch

lw

Address

Datamemory

IR, PC+4Fig. 6.12

22

ID Stage of load• A = Reg[IR[25-21]]; B = Reg[IR[20-16]];

ALUout = PC + (sign-ext(IR[15-0]) << 2) (some ops moved to thenext stage)

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

1

ALUresult

Mux

ALUZero

ID/EX MEM/WB

Instruction decode

lw

Address

Datamemory

Fig. 6.12

23

EX Stage of load•ALUout = A + sign-ext(IR[15-0])

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

1

ALUresult

Mux

ALUZero

ID/EX MEM/WB

Execution

lw

Address

Datamemory

Fig. 6.13

24

MEM State of load•MDR = mem[ALUout]

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

Datamemory

1

ALUresult

Mux

ALUZero

ID/EX MEM/WB

Memory

lw

Address

Fig. 6.14

25

WB Stage of load•Reg[IR[20-16]] = MDR

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writedata

ReaddataData

memory

1

ALUresult

Mux

ALUZero

ID/EX MEM/WB

Write backlw

Writeregister

Address

Fig. 6.14

Who willsupply thisaddress?

26

Cycle 1 Cycle 2 Cycle 3 Cycle 4

Ifetch Reg/Dec Exec WrR-type

The Four Stages of RThe Four Stages of R--typetype

IF: fetch the instruction from the InstructionMemory

ID: registers fetch and instruction decodeEX: ALU operates on the two register operandsWB: write ALU output back to the register file

27

We have a structural hazard:–Two instructions try to write to the register file at the same time!–Only one write port

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec Mem WrLoad

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec WrR-type

Ops! We have a problem!

Pipelining RPipelining R--type andtype and loadload

28

Important ObservationImportant Observation

Ifetch Reg/Dec Exec Mem WrLoad1 2 3 4 5

Ifetch Reg/Dec Exec WrR-type1 2 3 4

Each functional unit can only be used once per instructionEach functional unit must be used at the same stage for all

instructions:–Load uses Register File’s write port during its 5th stage

–R-type uses Register File’s write port during its 4th stage

Several ways to solve: forwarding, adding pipeline bubble,making instructions same length

29

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Exec Mem WrLoad

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Exec WrR-type Mem

Exec

Exec

Exec

Exec

1 2 3 4 5

Solution: Delay RSolution: Delay R--typetype’’s Writes WriteDelay R-type’s register write by one cycle:

–R-type also use Reg File’s write port at Stage 5–MEM is a NOP stage: nothing is being done.

R-type alsohas 5 stages

30

Cycle 1 Cycle 2 Cycle 3 Cycle 4

Ifetch Reg/Dec Exec MemStore Wr

The Four Stages ofThe Four Stages of storestore

IF: fetch the instruction from the Instruction Memory ID: registers fetch and instruction decode EX: calculate the memory address MEM: write the data into the Data Memory

Add an extra stage: WB: NOP

31

IF: fetch the instruction from the Instruction Memory ID: registers fetch and instruction decodeEX:

–compares the two register operand–select correct branch target address–latch into PC

Add two extra stages:MEM: NOPWB: NOP

Cycle 1 Cycle 2 Cycle 3 Cycle 4

Ifetch Reg/Dec Exec MemBeq Wr

The Three Stages ofThe Three Stages of beqbeq

32

PipelinedPipelined DatapathDatapath

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM MEM/WB

Mux

0

1

Add

PC

0

Address

Writedata

Mux

1Registers

Readdata1

Readdata2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

Datamemory

1

ALUresult

Mux

ALUZero

ID/EX

Fig. 6.17

33

Graphically RepresentingPipelines

• Can help with answering questions like:– How many cycles to execute this code?– What is the ALU doing during cycle 4?– Help understand datapaths

IM Reg DM Reg

IM Reg DM Reg

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

lw $10, 20($1)

Programexecutionorder(in instructions)

sub $11, $2, $3

ALU

ALU

34

Example 1: Cycle 1

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM MEM/WB

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

1

ALUresult

Mux

ALUZero

ID/EX

Instruction fetch

lw $10, 20($1)

Address

Datamemory

Clock 1

Fig. 6.18

35

Example 1: Cycle 2

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM MEM/WB

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

1

ALUresult

Mux

ALUZero

ID/EX

Instruction decode

lw $10, 20($1)

Instruction fetch

sub $11, $2, $3

Address

Datamemory

Clock 2

Fig. 6.18

36

Instructionmemory

Address

4

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM MEM/WB

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata

1

ALUresult

Mux

ALUZero

ID/EX

Execution

lw $10, 20($1)

Instruction decode

sub $11, $2, $3

3216Sign

extend

Address

Datamemory

Clock 3

Example 1: Cycle 3

Fig. 6.18

37

Instructionmemory

Address

4

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM MEM/WB

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

3216Sign

extend

Writeregister

Writedata

Memory

lw $10, 20($1)

Readdata

1

ALUresult

Mux

ALUZero

ID/EX

Execution

sub $11, $2, $3

Datamemory

Address

Clock 4

Example 1: Cycle 4

Fig. 6.18

38

Instructionmemory

Address

4

32

0

Add Addresult

1

ALUresult

Zero

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEMID/EX MEM/WB

Write backMux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Mux

ALUReaddata

Writeregister

Writedata

lw $10, 20($1)

Memory

sub $11, $2, $3

Address

Datamemory

Clock 5

Example 1: Cycle 5

Fig. 6.18

39

Instructionmemory

Address

4

32

0

Add Addresult

1

ALUresult

Zero

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEMID/EX MEM/WB

Write backMux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Mux

ALUReaddata

Writeregister

Writedata

sub $11, $2, $3

Address

Datamemory

Clock 6

Example 1: Cycle 6

Fig. 6.18

40

OutlineOutline

An overview of pipeliningA pipelined datapathPipelined control (6.3)Data hazards and forwardingData hazards and stallsBranch hazardsExceptionsSuperscalar and dynamic pipelining

41

Pipeline Control: Control Signals

PC

Instructionmemory

Address

Inst

ruct

ion

Instruction[20–16]

MemtoReg

ALUOp

Branch

RegDst

ALUSrc

4

16 32Instruction[15–0]

0

0Registers

Writeregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Signextend

Mux

1Write

data

Read

data Mux

1

ALUcontrol

RegWrite

MemRead

Instruction[15–11]

6

IF/ID ID/EX EX/MEM MEM/WB

MemWrite

Address

Datamemory

PCSrc

Zero

AddAdd

result

Shiftleft 2

ALUresult

ALUZero

Add

0

1

Mux

0

1

Mux

Fig. 6.22

42

Execution/Address Calculationstage control lines

Memory access stagecontrol lines

Write-back stagecontrol lines

RegDst

ALUOp1

ALUOp0

ALUSrc Branch

MemRead

MemWrite

Regwrite

Mem toReg

1 1 0 0 0 0 0 1 00 0 0 1 0 1 0 1 1X 0 0 1 0 0 1 0 XX 0 1 0 1 0 0 0 X

Fig. 6.25

Group Signals According toStages

•Can use control signals of single-cycleCPU (Fig. 6.23, 6.24 <==> 5.12, 5.16)

R-type

lw

sw

beq

43

•Pass control signals along just like the data–Main control generates control signals during ID

Data Stationary Control

Control

EX

M

WB

M

WB

WB

IF/ID ID/EX EX/MEM MEM/WB

Instruction

Fig. 6.26

44

IF/ID

Register

ID/E

xR

egister

Ex/M

EM

Register

ME

M/W

BR

egister

ID EX MEM

ExtOp

ALUOp

RegDst

ALUSrc

BranchMemWr

MemtoRegRegWr

MainControl

ExtOp

ALUOp

RegDst

ALUSrc

MemtoRegRegWr

MemtoRegRegWr

MemtoRegRegWr

BranchMemWr Branch

MemW

WB

Data Stationary Control (cont.)Data Stationary Control (cont.)

Signals for EX (ExtOp, ALUSrc, ...) are used 1 cycle later Signals for MEM (MemWr, Branch) are used 2 cycles later Signals for WB (MemtoReg, MemWr) are used 3 cycles later

45

WB Stage of load•Reg[IR[20-16]] = MDR

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writedata

ReaddataData

memory

1

ALUresult

Mux

ALUZero

ID/EX MEM/WB

Write backlw

Writeregister

Address

Fig. 6.14

Who willsupply thisaddress?

46

Datapath with Control

PC

Instructionmemory

Inst

ruct

ion

Add

Instruction[20–16]

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

16 32Instruction[15–0]

0

0

Mux

0

1

Add Addresult

RegistersW riteregister

W ritedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Signextend

Mux

1

ALUresult

Zero

Writedata

Readdata

Mux

1

ALUcontrol

Shiftleft 2

Reg

Writ

e

MemRead

Control

ALU

Instruction[15–11]

6

EX

M

WB

M

WB

WBIF/ID

PCSrc

ID/EX

EX/MEM

MEM/WB

Mux

0

1

Mem

Writ

e

AddressData

memory

Address

Fig. 6.27

47

lw $10, 20($1)

sub $11, $2, $3

and $12, $4, $5

or $13, $6, $7

add $14, $8, $9

LetLet’’s Try it Outs Try it Out

48

Example 2: Cycle 1

Instructionmemory

Instruction[20–16]

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

Instruction[15–0]

0

Mux

0

1

Add Addresult

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Signextend

Mux

1

ALUresult

Zero

ALUcontrol

Shiftleft 2

Reg

Writ

e

MemRead

Control

ALU

Instruction[15–11]

EX

M

WB

M

WB

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: before<1> EX: before<2> MEM: before<3> WB: before<4>

MEM/WB

IF: lw $10, 20($1)

000

00

0000

000

00

000

0

00

00

0

00

Mux

0

1

Add

PC

0

Datamemory

Address

Writedata

Readdata

Mux

1

Mem

Writ

e

Address

Clock 1

49

Example 2: Cycle 2

WB

EX

M

Instructionmemory

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

0

Mux

0

1

Add Addresult

Writeregister

Writedata

Mux

1

ALUresult

Zero

ALUcontrol

Shiftleft 2

Reg

Writ

e

ALU

M

WB

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: lw $10, 20($1) EX: before<1> MEM: before<2> WB: before<3>

MEM/WB

IF: sub $11, $2, $3

010

11

0001

000

00

000

0

00

00

0

00

Mux

0

1

Add

PC

0Writedata

Readdata

Mux

1

lwControl

Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

X

10

20

X

1

Instruction[20–16]

Instruction[15–0] Sign

extend

Instruction[15–11]

20

$X

$1

10

X

MemRead

Mem

Writ

e

Datamemory

Address

Address

Clock 2

50

Instructionmemory

Address

Instruction[20–16]

Mem

toR

eg

Branch

ALUSrc

4

Instruction[15–0]

0

1

Add Addresult

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

ALUresult

Shiftleft 2

Reg

Writ

e

MemRead

Control

ALU

Instruction[15–11]

EX

M

WB

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: sub $11, $2, $3 EX: lw $10, . . . MEM: before<1> WB: before<2>

MEM/WB

IF: and $12, $4, $5

000

10

1100

010

11

000

1

00

00

0

00

Mux

0

1

Add

PC

0Writedata

Readdata

Mux

1

Mem

Writ

e

sub

11

X

X

3

2

X

$3

$2

X

11

$1

20

10

Mux

0

Mux

1

ALUOp

RegDst

ALUcontrol

M

WB

Zero

Signextend

Datamemory

Address

Clock 3

Example 2: Cycle 3

51

WB

EX

M

Instructionmemory

Address

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

0

0

1

Add Addresult

Writeregister

Writedata 1

ALUresult

ALUcontrol

Shiftleft 2

Reg

Writ

e

M

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: and $12, $2, $3 EX: sub $11, . . . MEM: lw $10, . . . WB: before<1>

MEM/WB

IF: or $13, $6, $7

000

10

1100

000

10

101

0

11

10

0

00

Mux

0

1

Add

PC

0Writedata

Mux

1

andControl

Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

12

X

X

5

4

Instruction[20–16]

Instruction[15–0]

Instruction[15–11]

X

$5

$4

X

12

MemRead

Mem

Writ

e

$3

$2

11

Mux

Mux

ALUAddress Read

dataData

memory

10

WB

Zero

Signextend

Clock 4

Example 2: Cycle 4

52

Example 2: Cycle 5

Instructionmemory

Address

Instruction[20–16]

Branch

ALUSrc

4

Instruction[15–0]

0

1

Add Addresult

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

ALUresult

Shiftleft 2

Reg

Writ

e

MemRead

Control

ALU

Instruction[15–11]

EX

M

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: or $13, $6, $7 EX: and $12, . . . MEM: sub $11, . . . WB: lw $10, . . .

MEM/WB

IF: add $14, $8, $9

000

10

1100

000

10

101

0

10

00

0

Mux

0

1

Add

PC

0Writedata

Readdata

Mux

1

Mem

Writ

e

or

13

X

X

7

6

X

$7

$6

X

13

$4

Mux

0

Mux

1

ALUOp

RegDst

ALUcontrol

M

WB

11 10

10$5

12

WB

Mem

toR

eg

11

Zero

Datamemory

Address

Signextend

Clock 5

53

Example 2: Cycle 6

WB

EX

M

Instructionmemory

Address

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

0

0

1

Add Addresult

1

ALUresult

ALUcontrol

Shiftleft 2

Reg

Writ

e

M

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: add $14, $8, $9 EX: or $13, . . . MEM: and $12, . . . WB: sub $11, . . .

MEM/WB

IF: after<1>

000

10

1100

000

10

101

0

10

00

0

10

Mux

0

1

Add

PC

0Writedata

Mux

1

addControl

Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

14

X

X

9

8

Instruction[20–16]

Instruction[15–0]

Instruction[15–11]

X

$9

$8

X

14

MemRead

Mem

Writ

e

$7

$6

13

Mux

Mux

ALUReaddata

12

WB

11

11

Writeregister

Writedata

Zero

Datamemory

Address

Signextend

Clock 6

54

Example 2: Cycle 7

Fig. 6.34

Instructionmemory

Address

Instruction[20–16]

Branch

ALUSrc

4

Instruction[15–0]

0

1

Add Addresult

RegistersWriteregister

Writedata

ALUresult

Shiftleft 2

Reg

Writ

e

MemRead

Control

ALU

Instruction[15–11]

Signextend

EX

M

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: after<1> EX: add $14, . . . MEM: or $13, . . . WB: and $12, . . .

MEM/WB

IF: after<2>

000

00

0000

000

10

101

0

10

00

0

Mux

0

1

Add

PC

0Writedata

Readdata

Mux

1

Mem

Writ

e

$8

Mux

0

Mux

1

ALUOp

RegDst

ALUcontrol

M

WB

13 12

12$9

14

WB

Mem

toR

eg

10

Readdata 1

Readdata 2

Readregister 1

Readregister 2 Zero

Datamemory

Address

Clock 7

55

Example 2: Cycle 8

Fig. 6.34

WB

EX

M

Instructionmemory

Address

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

0

0

1

Add Addresult

1

ALUresult

Zero

ALUcontrol

Shiftleft 2

Reg

Writ

e

M

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: after<2> EX: after<1> MEM: add $14, . . . WB: or $13, . . .

MEM/WB

IF: after<3>

000

00

0000

000

00

000

0

10

00

0

10

Mux

0

1

Add

PC

0Writedata

Mux

1

Control

Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Instruction[20–16]

Instruction[15–0] Sign

extend

Instruction[15–11]

MemRead

Mem

Writ

e

Mux

Mux

ALUReaddata

14

WB

13

13

Writeregister

Writedata

Datamemory

Address

Clock 8

56

WB

EX

M

Instructionmemory

Address

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

0

0

1

Add Addresult

1

ALUresult

Zero

ALUcontrol

Shiftleft 2

Reg

Writ

e

M

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: after<3> EX: after<2> MEM: after<1> WB: add $14, . . .

MEM/WB

IF: after<4>

000

00

0000

000

00

000

0

00

00

0

10

Mux

0

1

Add

PC

0Writedata

Mux

1

Control

Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Instruction[20–16]

Instruction[15–0] Sign

extend

Instruction[15–11]

MemRead

Mem

Writ

e

Mux

Mux

ALUReaddata

WB

14

14

Writeregister

Writedata

Datamemory

Address

Clock 9

Example 2: Cycle 9

57

Summary of Pipeline BasicsSummary of Pipeline Basics

Pipelining is a fundamental concept–Multiple steps using distinct resources–Utilize capabilities of datapath by pipelined instruction

processing Start next instrunction while working on the current one Limited by length of longest stage (plus fill/flush) Need to detect and resolve hazards

What makes it easy in MIPS?–All instructions are of the same length–Just a few instruction formats–Memory operands only in loads and stores

What makes pipelining hard? hazards

58

OutlineOutline

An overview of pipeliningA pipelined datapathPipelined controlData hazards and forwarding (6.4)Data hazards and stalls (6.5)Branch hazardsExceptionsSuperscalar and dynamic pipelining

59

Pipeline HazardsPipeline Hazards

Pipeline Hazards:–Structural hazards: attempt to use the same resource in two

different ways at the same time Ex.: combined washer/dryer or folder busy doing something else

(watching TV)–Data hazards: attempt to use item before ready

Instruction depends on result of prior instruction still in the pipeline–Control hazards: attempt to make decision before condition is

evaluated Ex.: wash football uniforms and need to see result of previous load to get

proper detergent level Branch instructions

Can always resolve hazards by waiting–pipeline control must detect the hazard–take action (or delay action) to resolve hazards

60

Mem

Instr.

Order

Time

Load

Instr 1

Instr 2

Instr 3

Instr 4A

LUMem Reg Mem Reg

AL

UMem Reg Mem Reg

AL

UMem Reg Mem RegA

LUReg Mem Reg

AL

UMem Reg Mem Reg

Use 2 memory: data memory and instructionmemory

Structural Hazard: Single MemoryStructural Hazard: Single Memory

61

Pipeline Hazards IllustratedPipeline Hazards Illustrated

IF ID EX MEM WB

StructuralHazard

IF ID ….

62

Data Hazards

IM Reg

IM Reg

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

sub $2, $1, $3

Programexecutionorder(in instructions)

and $12, $2, $5

IM Reg DM Reg

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9

10 10 10 10 10/ -2 0 -20 -20 -20 -20

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

Value ofregister $2:

DM Reg

Reg

Reg

Reg

DM

Fig. 6.28

63

Types of Data HazardsTypes of Data Hazards

Three types: (inst. i1 followed by inst. i2)RAW (read after write):

i2 tries to read operand before i1 writes itWAR (write after read):

i2 tries to write operand before i1 reads it–Gets wrong operand, e.g., autoincrement addr.–Can’t happen in MIPS 5-stage pipeline because:

All instructions take 5 stages, and reads are always in stage 2, and writesare always in stage 5

WAW (write after write):i2 tries to write operand before i1 writes it–Leaves wrong result ( i1’s not i2’s); occur only in pipelines that

write in more than one stage–Can’t happen in MIPS 5-stage pipeline because:

All instructions take 5 stages, and writes are always in stage 5

64

Pipeline Hazards IllustratedPipeline Hazards Illustrated

IF ID EX MEM WB

IF ID EX Mem

RAW (read after write) Data Hazard

WAW Data Hazard(write after write)

IF ID EX MEM WB WAR Data Hazard(write after read)

IF ID EX MEM WB

IF ID EX MEM WB

65

Handling Data HazardsHandling Data Hazards

Use simple, fixed designs– Eliminate WAR by always fetching operands early (ID) in pipeline– Eliminate WAW by doing all write backs in order (last stage, static)– These features have a lot to do with ISA design

Internal forwarding in register file:– Write in first half of clock and read in second half– Read delivers what is written, resolve hazard between sub and add

Detect and resolve remaining ones– Compiler inserts NOP– Forward– Stall

66

Software SolutionSoftware Solution

Have compiler guarantee no hazardsWhere do we insert the NOPs?

sub $2, $1, $3and $12, $2, $5or $13, $6, $2add $14, $2, $2sw $15, 100($2)

Problem: this really slows us down!

67

Data Hazards

IM Reg

IM Reg

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

sub $2, $1, $3

Programexecutionorder(in instructions)

and $12, $2, $5

IM Reg DM Reg

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9

10 10 10 10 10/ -2 0 -20 -20 -20 -20

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

Value ofregister $2:

DM Reg

Reg

Reg

Reg

DM

Fig. 6.28Insert two nops

68

Data Hazards : Forwarding

IM Reg

IM Reg

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

sub $2, $1, $3

Programexecutionorder(in instructions)

and $12, $2, $5

IM Reg DM Reg

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9

10 10 10 10 10/ -2

0 -20 -20 -20 -20

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

Value ofregister $2:

DM Reg

Reg

Reg

Reg

DM Fig. 6.28

69

Pipeline with Forwarding

PC Instructionmemory

Registers

Mux

Mux

Control

ALU

EX

M

WB

M

WB

WB

ID/EX

EX/MEM

MEM/WB

Datamemory

Mux

Forwardingunit

IF/ID

Inst

ruct

ion

Mux

RdEX/MEM.RegisterRd

MEM/WB.RegisterRd

Rt

Rt

Rs

IF/ID.RegisterRd

IF/ID.RegisterRt

IF/ID.RegisterRt

IF/ID.RegisterRs

Fig. 6.32

ForwardA

ForwardB

70

Detecting Data HazardsDetecting Data Hazards

Hazard conditions:–1a. EX/MEM.RegisterRd = ID/EX.RegisterRs–1b. EX/MEM.RegisterRd = ID/EX.RegisterRt–2a. MEM/WB.RegisterRd = ID/EX.RegisterRs–2b. MEM/WB.RegisterRd = ID/EX.RegisterRt

Two optimizations:–Don’t forward if instruction does not write register

=> check if RegWrite is asserted–Don’t forward if destination register is $0

=> check if RegisterRd = 0

71

Detecting Data Hazards (cont.)Detecting Data Hazards (cont.)

Hazard conditions using control signals:–At EX stage:

EX/MEM.RegWrite and (EX/MEM.RegRd0)and (EX/MEM.RegRd=ID/EX.RegRs)

–At MEM stage:MEM/WB.RegWrite and (MEM/WB.RegRd0)

and (MEM/WB.RegRd=ID/EX.RegRs)–(replace ID/EX.RegRt for ID/EX.RegRs for the other

two conditions)

72

•Use temporary results, e.g., those in pipeline registers,don’t wait for them to be written

Resolving Hazards: Forwarding

IM R eg

IM Reg

CC 1 C C 2 C C 3 C C 4 C C 5 C C 6

Time (in c lock cyc les)

sub $2, $1 , $3

Programexecution orde r(in instruc tions)

and $12 , $2, $5

IM R eg D M R eg

IM DM R eg

IM D M Re g

C C 7 C C 8 C C 9

10 10 10 10 10/–20 –2 0 –20 –20 –20

or $13 , $6, $2

add $14 , $2, $2

sw $15 , 100($2)

Value o f register $2 :

DM R eg

Reg

Reg

R eg

X X X –20 X X X X XVa lue of EX/M EM :X X X X –20 X X X XVa lue o f M EM /W B :

D M

Fig. 6.29

73

Pipeline with Forwarding

PC Instructionmemory

Registers

Mux

Mux

Control

ALU

EX

M

WB

M

WB

WB

ID/EX

EX/MEM

MEM/WB

Datamemory

Mux

Forwardingunit

IF/ID

Inst

ruct

ion

Mux

RdEX/MEM.RegisterRd

MEM/WB.RegisterRd

Rt

Rt

Rs

IF/ID.RegisterRd

IF/ID.RegisterRt

IF/ID.RegisterRt

IF/ID.RegisterRs

Fig. 6.32

ForwardA

ForwardB

74

Forwarding LogicForwarding Logic

Forwarding: input to ALU from any pipe reg.–Add multiplexors to ALU input–Control forwarding in EX => carry Rs in ID/EX

Control signals for forwarding:–If both WB and MEM forward, e.g., add $1,$1,$2; add$1,$1,$3; add $1,$1,$4; => let MEM forward

–EX hazard: if (EX/MEM.RegWrite and (EX/MEM.RegRd0)

and (EX/MEM.RegRd=ID/EX.RegRs)) ForwardA=10–MEM hazard:

if (MEM/WB.RegWrite and (MEM/WB.RegRd0)and (EX/MEM.RegRd ID/EX.Reg.Rs)and (MEM/WB.RegRd=ID/EX.RegRs)) ForwardA=01

75

Example 3: Cycle 3

PC Instructionmemory

Registers

Mux

Mux

Mux

EX

M

WB

WB

Datamemory

Mux

Forwardingunit

Inst

ruct

ion

IF/ID

and $4, $2, $5 sub $2, $1, $3

ID/EX

before<1>

EX/MEM

before<2>

MEM/WB

or $4, $4, $2

Clock 3

2

5

10 10

$2

$5

5

2

4

$1

$3

3

1

2

Control

ALU

M

WB

76

Example 3: Cycle 4Example 3: Cycle 4

Fig. 6.41

PC Instructionmemory

Registers

Mux

Mux

Mux

EX

M

WB

M

WB

Datamemory

Mux

Forwardingunit

Inst

ruct

ion

IF/ID

or $4, $4, $2 and $4, $2, $5

ID/EX

sub $2, . . .

EX/MEM

before<1>

MEM/WB

add $9, $4, $2

Clock 4

4

6

10 10

$4

$2

6

2

4

$2

$5

5

2

4

Control

ALU

10

2

WB

77

Example 3: Cycle 5Example 3: Cycle 5

Fig. 6.42

PC Instructionmemory

Registers

Mux

Mux

Mux

EX

M

WB

M

WB

Datamemory

Mux

Forwardingunit

Inst

ruct

ion

IF/ID

add $9, $4, $2 or $4, $4, $2

ID/EX

and $4, . . .

EX/MEM

sub $2, . . .

MEM/WB

after<1>

Clock 5

4

2

10 10

$4

$2

2

4

9

$4

$2

4

2

24

Control

ALU

10

WB

2

1

4

78

Example 3: Cycle 6Example 3: Cycle 6

Fig. 6.42

PC Instructionmemory

Mux

Mux

Mux

EX

M

WB

M

WB

Datamemory

Mux

Forwardingunit

after<1>after<2> add $9, $4, $2 or $4, . . .

EX/MEM

and $4, . . .

MEM/WB

Clock 6

10

$4

$2

2

4

9

ALU

10

4

WB

4

1

Registers

Inst

ruct

ion

IF/ID

ID/EX

4

Control

79

•lw can still cause a hazard:–if is followed by an instruction to read the loaded reg.

Reg

IM

Reg

Reg

IM

lw $2, 20($1)

(in instructions)

and $4, $2, $5

IM Reg DM Reg

IM DM Reg

IM DM Reg

or $8, $2, $6

add $9, $4, $2

slt $1, $6, $7

DM Reg

Reg

Reg

DM

Can't Always Forward

Use stalling orcompiler toresolve

Fig. 6.34

80

Stalling•Stall pipeline by keeping instructions in same

stage and inserting an NOP instead

lw$2, 20($1)

order(in instructions)

and $4, $2, $5

or $8, $2, $6

add $9, $4, $2

slt $1, $6, $7

Reg

IM

Reg

Reg

IM DM

IM Reg DM RegIM

IM DM Reg

IM DM Reg

DM Reg

RegReg

Reg

bubble

Fig. 6.35

81

PCInstruction

memory

Registers

Mux

Mux

Mux

Control

ALU

EX

M

WB

M

WB

WB

ID/EX

EX/MEM

MEM/WB

Datamemory

Mux

Hazarddetection

unit

Forwardingunit

0

Mux

IF/ID

Inst

ruct

ion

ID/EX.MemReadIF

/IDW

rite

PC

Writ

e

ID/EX.RegisterRt

IF/ID.RegisterRd

IF/ID.RegisterRtIF/ID.RegisterRt

IF/ID.RegisterRs

RtRs

Rd

Rt EX/MEM.RegisterRd

MEM/WB.RegisterRd

Pipeline with Stalling Unit•Forwarding controls ALU inputs, hazard

detection controls PC, IF/ID, control signals

Fig. 6.36

82

Handling StallsHandling Stalls

Hazard detection unit in ID to insert stall between a loadinstruction and its use:if (ID/EX.MemRead and

((ID/EX.RegisterRt = IF/ID.RegisterRs) or(ID/EX.RegisterRt = IF/ID.registerRt))stall the pipeline for one cycle

(ID/EX.MemRead=1 indicates a load instruction)

How to stall?–Stall instruction in IF and ID: not change PC and IF/ID

=> the stages re-execute the instructions–What to move into EX: insert an NOP by changing EX, MEM,

WB control fields of ID/EX pipeline register to 0 as control signals propagate, all control signals to EX, MEM, WB are

deasserted and no registers or memories are written

83

Example 4: Cycle 2Hazard

detectionunit

0

MuxIF/ID

Writ

e

PC

Writ

e

ID/EX.RegisterRt

ID/EX.MemRead

M

WB

$1

$X

X

1

2

before<3>

PC Instructionmemory

Registers

Mux

Mux

Mux

EX WB

Datamemory

Mux

Forwardingunit

Inst

ruct

ion

IF/ID

ID/EX

EX/MEM

MEM/WB

and $4, $2, $5 lw $2, 20($1) before<1> before<2>

Clock 2

1

1

X

X11

Control

ALU

M

WB

84

Example 4: Cycle 3Hazard

detectionunit

0

MuxIF

/IDW

rite

PC

Writ

e

ID/EX.RegisterRt

lw $2, 20($1)

PC Instructionmemory

Registers

Mux

Mux

Mux

EX

M

WB

WB

Datamemory

Mux

Forwardingunit

Inst

ruct

ion

IF/ID

and $4, $2, $5

ID/EX

before<1>

EX/MEM

before<2>

MEM/WB

or $4, $4, $2

Clock 3

2

5

2

500 11

$2

$5

5

2

4

$1

$X

X

1

2

Control

ALU

M

WB

ID/EX.MemRead

85

$2

$5

5

2

24

WB

Hazarddetection

unit

0

MuxIF

/IDW

rite

PC

Writ

e

ID/EX.RegisterRt

PCInstruction

memory

Registers

Mux

Mux

Mux

EX

M

WB

Datamemory

Mux

Inst

ruct

ion

IF/ID

and $4, $2, $5 bubble

ID/EX

lw $2, . . .

EX/MEM

before<1>

MEM/WB

Clock 4

2

2

5

510

11

00

$2

$5

5

2

4

Control

ALU

M

WB

Forwardingunit

ID/EX.MemRead

or $4, $4, $2

Example 4: Cycle 4

86

Hazarddetection

unit

0

MuxIF

/IDW

rite

PC

Writ

e

ID/EX.RegisterRt

2

bubble lw $2, . . .

PCInstruction

memory

Registers

Mux

Mux

Mux

EX

M

WB

M

WB

Datamemory

Mux

Forwardingunit

Inst

ruct

ion

IF/ID

and $4, $2, $5

ID/EX

EX/MEM

MEM/WB

add $9, $4, $2

Clock 5

2

210 10

11

$4

$2

2

4

4

4

2

4

$2

$5

5

2

4

Control

ALU

0

WB

ID/EX.MemRead

or $4, $4, $2

Example 4: Cycle 5

87

Example 4: Cycle 6Example 4: Cycle 6

Fig. 6.49

PC Instructionmemory

Hazarddetection

unit

0

MuxIF

/IDW

rite

PC

Writ

e

ID/EX.RegisterRt

bubble

Registers

Mux

Mux

EX

M

WB

M

WB

Datamemory

Mux

Inst

ruct

ion

IF/ID

add $9, $4, $2

ID/EX

and $4, . . .

EX/MEM

MEM/WB

Clock 6

4

4

2

210 10

$4

$2

2

4

49

$2

2

Control

ALU

10

WB0

after<1>

Forwardingunit

$4

4

4

or $4, $4, $2

ID/EX.MemRead

Mux

88

Example 4: Cycle 7Example 4: Cycle 7

Fig. 6.49

Registers

Inst

ruct

ion

ID/EX

4

Control

PC Instructionmemory

IF/ID

Writ

e

PC

Writ

e

add $9, $4, $2 or $4, . . . and $4, . . .after<2> after<1>

Clock 7

Mux

Mux

Mux

EX

M

WB

M

WB

Datamemory

Mux

Forwardingunit

EX/MEM

MEM/WB

10 10

$4

$2

2

4

9

ALU

10

WB

44

1

Hazarddetection

unit

0

Mux

ID/EX.RegisterRt

ID/EX.MemRead

IF/ID

89

OutlineOutline

An overview of pipeliningA pipelined datapathPipelined controlData hazards and forwardingData hazards and stallsBranch hazards (6.6)ExceptionsSuperscalar and dynamic pipelining

90

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

i on

IF/ID EX/MEM MEM/WB

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

1

ALUresult

Mux

ALUZero

ID/EX

Datamemory

Address

Feedback PathFeedback Path

Fig. 6.11

91

Pipeline Datapath with ControlSignals

PC

Instructionmemory

Address

Inst

ruct

ion

Instruction[20–16]

MemtoReg

ALUOp

Branch

RegDst

ALUSrc

4

16 32Instruction[15–0]

0

0Registers

Writeregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Signextend

Mux

1Write

data

Read

data Mux

1

ALUcontrol

RegWrite

MemRead

Instruction[15–11]

6

IF/ID ID/EX EX/MEM MEM/WB

MemWrite

Address

Datamemory

PCSrc

Zero

AddAdd

result

Shiftleft 2

ALUresult

ALUZero

Add

0

1

Mux

0

1

Mux

Fig. 6.22

92

Pipeline Hazards IllustratedPipeline Hazards Illustrated

IF ID EX MEM WB

StructuralHazard

IF ID EX MEM WB

IF ID EX Mem

RAW (read after write) Data Hazard

WAW Data Hazard(write after write)

IF ID EX MEM WB WAR Data Hazard(write after read)

IF ID EX MEM WB

IF ID EX MEM WB

IF ID ….

Control HazardIF ID EX MEM WB

IF ID ….

93

•When decide to branch, other inst. are in pipeline!

Reg

Reg

40 beq $1, $3, 7

order(in instructions)

IM Reg

IM DM

IM DM

IM DM

DM

DM Reg

Reg Reg

Reg

Reg

RegIM

44 and $12, $2, $5

48 or $13, $6, $2

52 add $14, $2, $2

72 lw $4, 50($7)

Reg

Branch Hazards

Fig. 6.37

94

Handling Branch HazardHandling Branch Hazard

Predict branch always not taken–Need to add hardware for flushing inst. if wrong–Branch decision made at MEM => need to flush instruction in

IF/ID, ID/MEM by changing control values to 0Reduce delay of taken branch by moving branch execution

earlier in the pipeline–Move up branch address calculation to ID–Check branch equality at ID (using XOR) by comparing the two

registers read during ID–Branch decision made at ID => one instruction to flush–Add a control signal, IF.Flush, to zero instruction field of IF/ID

=> making the instruction an NOPDynamic branch predictionCompiler rescheduling, delay branch

95

Pipeline with Flushing

PC Instructionmemory

4

Registers

Mux

Mux

Mux

ALU

EX

M

WB

M

WB

WB

ID/EX

0

EX/MEM

MEM/WB

Datamemory

Mux

Hazarddetection

unit

Forwardingunit

IF.Flush

IF/ID

Signextend

Control

Mux

=

Shiftleft 2

Mux

Fig. 6.41

96

Example 5: Cycle 3

PC Instructionmemory

4

Registers

Signextend

Mux

Mux

Control

EX

M

WB

M

WB

WB

Mux

Hazarddetection

unit

Forwardingunit

Mux

IF.F lush

IF/ID

and $12, $2, $5 beq $1, $3, 7 sub $10, $4, $8

MEM/WB

EX/MEM

ID/EX

Clock 3

72 44

48 44

28

7

$1

$3

10

48

72

72

0

$4

$8

ALU Datamemory

Mux

Shiftleft 2

before<1> before<2>

=

Fig. 6.38

97

Example 5: Cycle 4

Mux

0

bubble (nop)lw $4, 50($7)

Clock 4

beq $1, $3, 7 sub $10, . . . before<1>

PC Instructionmemory

4

Registers

Signextend

Mux

Mux

Control

EX

M

WB

M

WB

WB

Mux

Hazarddetection

unit

Forwardingunit

IF.F lush

IF/ID

MEM/WB

EX/MEM

ID/EX

76 72

76 72

$1

$3

10

76

ALU Datamemory

Mux

Shiftleft 2

=

Fig. 6.38

98

Predict-not-taken + branch decision at ID=> the following instruction is always executed=> branches take effect 1 cycle later

Instr.

Order

Time (clock cycles)

add

beq

misc

AL

UMem Reg Mem Reg

AL

UMem Reg Mem Reg

Mem

AL

UReg Mem Reg

lw Mem

AL

UReg Mem Reg

Delayed BranchDelayed Branch

99

Dynamic Branch PredictionDynamic Branch Prediction Performance = ƒ(accuracy, cost of misprediction)Branch History Table: Lower bits of PC address

index table of 1-bit values–Says whether or not branch taken last time–No address check

Problem: in a loop, 1-bit BHT will cause twomispredictions (avg is 9 iterations before exit):–End of loop case, when it exits instead of looping as

before–First time through loop on next time through code, when

it predicts exit instead of looping

100

11--Bit PredictionBit Prediction

For each branch, keep track of what happened lasttime and use that outcome as the prediction

What are prediction accuracies for branches 1 and2 below:while (1) {

for (i=0;i<10;i++) { branch-1…

}for (j=0;j<20;j++) { branch-2…

}}

101

22--Bit PredictionBit Prediction

For each branch, maintain a 2-bit saturating counter:if the branch is taken: counter = min(3,counter+1)if the branch is not taken: counter = max(0,counter-1)

If (counter >= 2), predict taken, else predict not takenAdvantage: a few atypical branches will not influence the

prediction (a better measure of “the common case”)Especially useful when multiple branches share the same

counter (some bits of the branch PC are used to index intothe branch predictor)

Can be easily extended to N-bits (in most processors, N=2)

102

NN--bit Branch Prediction Buffersbit Branch Prediction Buffers

When the counter is greater than or equal to one-half of its maximum value (2n-1), the branch ispredicted as taken.

The counter is increased on a taken branch anddecremented on an untaken branch.

A branch buffer can be implemented as a smallcache accessed during the IF stage.

103

NN--bit Branch Prediction Buffersbit Branch Prediction Buffers

Use an n-bit saturating counterOnly the loop exit causes a misprediction2-bit predictor almost as good as any general n-bit predictor

104

Basic Branch Prediction BuffersBasic Branch Prediction Buffers

IR:

PC:

Branch Instruction

+ Branch Target

BHT

a.k.a. Branch History Table (BHT) - Small direct-mapped cache of T/NT bits

PC + 4

T (predict taken)

NT (predict not- taken)

105

OutlineOutline

An overview of pipeliningA pipelined datapathPipelined controlData hazards and forwardingData hazards and stallsBranch hazardsExceptionsSuperscalar and dynamic pipelining

106

What about Exceptions?What about Exceptions?

•Another form of branch hazard–How to stop the pipeline? restart?–Who caused the interrupt?–Who to serve first, if multiple interrupts at the

same time?

107

Handling ExceptionsHandling Exceptions

How to stop the pipeline? restart?Suppose overflow occur at add $1,$2,$1

–Disable writes of instructions till trap hits WB, e.g.,flush following instructions using IF.Flush, ID.Flush,EX.Flush to cause multiplexers to zero control signals(overflow exception detected at EX => flushoffending instruction)

–Force trap instruction into IF, e.g., fetch from 40000040hex by adding 4000 0040hex to PC input MUX

–Save address of offending instruction in EPC

108

Pipeline with Exception

PC Instructionmemory

4

Registers

Signextend

Mux

Mux

Mux

Control

ALU

EX

M

WB

M

WB

WB

ID/EX

EX/MEM

MEM/WB

Mux

Datamemory

Mux

Hazarddetection

unit

Forwardingunit

IF.Flush

IF/ID

=

ExceptPC

40000040

0

Mux

0

Mux

0

Mux

ID.Flush EX.Flush

Cause

Shiftleft 2

Fig. 6.55

109

Who caused the exception ?Who caused the exception ?5 instructions executing in 5 stage pipelineWho caused the exception? Need to know in which stage

an exception can occur => help determine cause

Stage Problem interrupts occurringIF Page fault; misaligned memory access;

memory-protection violationID Undefined or illegal opcodeEX Arithmetic exceptionMEM Page fault; misaligned memory access;

memory error; mem-protection violation;

110

When to Serve?When to Serve?

Who to serve first, if multiple interrupts atthe same time?–Multiple interrupts: use priority hardware to

choose the earliest instruction to interrupt–External interrupts: flexible in when to

interrupt

111

OutlineOutline

An overview of pipeliningA pipelined datapathPipelined controlData hazards and forwardingData hazards and stallsBranch hazardsExceptionsSuperscalar and dynamic pipelining

112

Instruction Level ParallelismInstruction Level Parallelism

How to increase the potential amount of ILP:

Increase the depth of the pipeline to overlap moreinstructions–super-pipeline

Launch multiple instructions–Static multiple issue (decision made by compiler before

execution)–Dynamic multiple issue (decision made during

execution by the processor)

113

Different Pipelined DesignsPipelining

Super-scalar- Issue multiple scalarinstructions per cycle

VLIW (EPIC)- Each instruction specifiesmultiple scalar operations

- Compiler determines parallelism

Limitation

Issue rate, FU stalls, FU depth

Hazard resolution

Packing

IF D Ex M W

IF D Ex

IF D Ex M WIF D Ex M W

M W

IF D Ex M WIF D Ex M W

IF D Ex M WIF D Ex M W

IF D Ex M WEx M WEx M WEx M W

114

Static Multiple IssueStatic Multiple Issue

Use compiler to assist with packinginstructions and handling hazard

Very Long Instruction Word (VLIW)Explicitly Parallel Instruction Computer

(EPIC) (Intel IA-64)

115

A Static Two-issue Datapath

PC Instructionmemory

4

RegistersMux

Mux

ALU

Mux

Datamemory

Mux

40000040

Signextend Sign

extend

ALU Address

Writedata

Fig. 6.45

116

Dynamic Multiple IssueDynamic Multiple Issue

The hardware performs the scheduling?–hardware tries to find instructions to execute–out of order execution is possible–speculative execution and dynamic branch

prediction

Superscalar

117

Superscalar: Three Primary UnitsSuperscalar: Three Primary Units

Commitunit

Instruction fetchand decode unit

In-order issue

In-order commit

Load/Store

Floatingpoint

IntegerInteger …Functionalunits

Out-of-order execute

Reservationstation

Reservationstation

Reservationstation

Reservationstation

Fig. 6. 49

118

Independent INT and FP issue to separate pipelines

INT Reg Inst Issueand Bypass

FP Reg

Operand /ResultBusses

INT Unit

I-Cache

Load /StoreUnit

FP Add FP Mul

D-Cache

Simple SuperscalarSimple Superscalar

119

Dynamic SchedulingDynamic Scheduling

All modern processors are very complicated–DEC Alpha 21264: 9 stage pipeline, 6

instruction issue–PowerPC and Pentium: branch history table–Compiler technology important

120

SummarySummary

Pipelines pass control information down thepipe just as data moves down pipe

Forwarding/stalls handled by local controlExceptions stop the pipelineMIPS instruction set architecture made

pipeline visible (delayed branch, delayedload)

More performance from deeper pipelines,parallelism