Csci 136 Computer Architecture II – Branch Hazards, Exceptions

Xiuzhen Cheng, cheng@gwu.edu

Announcement

Homework assignment #10: due before class, April 12

Readings: Sections 6.4 – 6.5

Problems: 6.17-6.19, 6.21-6.22, 6.33-6.36, 6.39-6.40 (six of them will be graded. Your TA will give hints in the lab sections.)

Project #3 is due on April 10, 2005

Quiz #4: April 12, 2005

Final: Thursday, May 12, 12:40PM-2:40PM

Note: you must pass the final to pass this course!

Review on Data Hazards, Forwarding, Stall

When does a data hazard happen?
    Data dependencies

Using forwarding to overcome data hazards
    Data is available after the ALU stage

Forwarding conditions

Stall the pipeline for load-use instructions
    Data is available after the MEM stage (lw instruction)

Hazard detection conditions
    Why in the ID stage?
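As a reminder of what the forwarding conditions look like, here is a minimal Python sketch of the forwarding-unit check for one ALU operand. It follows the usual textbook formulation (EX hazard checked before MEM hazard). The function name `forward_select` and the dictionary layout of the pipeline registers are assumptions made for illustration, not part of the slides.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Return the 2-bit forwarding mux select for one ALU operand.

    src_reg: register number (Rs or Rt) read by the instruction now in EX
    ex_mem, mem_wb: dicts with 'RegWrite' (bool) and 'RegisterRd' (int);
                    this layout is a hypothetical stand-in for the pipeline registers
    """
    # EX hazard: the result of the previous instruction sits in EX/MEM
    if (ex_mem["RegWrite"] and ex_mem["RegisterRd"] != 0
            and ex_mem["RegisterRd"] == src_reg):
        return 0b10
    # MEM hazard: the value sits in MEM/WB (older ALU result or loaded data)
    if (mem_wb["RegWrite"] and mem_wb["RegisterRd"] != 0
            and mem_wb["RegisterRd"] == src_reg):
        return 0b01
    # No hazard: use the value read from the register file
    return 0b00


# Example: sub $2, $1, $3 is in MEM while and $12, $2, $5 is in EX
sel = forward_select(2, {"RegWrite": True, "RegisterRd": 2},
                     {"RegWrite": False, "RegisterRd": 0})
print(format(sel, "02b"))
```

Checking the EX/MEM register first gives priority to the most recent result when both pipeline registers hold a write to the same register.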

[Figure: pipelined datapath diagram for the review; surviving labels: Sign-extend, PC+4]

LW and SW

lw $5, 0($15)
sw $5, 100($15)

lw $5, 0($15)
beq $5, $0, Exit
sw $5, 100($15)

lw $5, 0($15)
add $8, $8, $8
sw $5, 100($15)

SW is in MEM Stage

MEM/WB.RegWrite and EX/MEM.MemWrite and

MEM/WB.RegisterRd = EX/MEM.RegisterRd and

MEM/WB.RegisterRd != 0

[Figure: datapath with lw in the WB stage and sw in the MEM stage; surviving labels: Sign-Ext, EX/MEM, Data memory]

lw $5, 0($15)
sw $5, 100($15)

SW is in EX Stage

ID/EX.MemWrite and MEM/WB.RegWrite and

MEM/WB.RegisterRd = ID/EX.RegisterRt and

MEM/WB.RegisterRd != 0

[Figure: datapath with lw in the WB stage and sw in the EX stage]
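The two store-forwarding conditions above translate directly into boolean checks. The sketch below is only an illustration: the function names and the dictionaries standing in for the pipeline registers are hypothetical, and it assumes that the EX/MEM "RegisterRd" field carries the register number sw wants to store, as the first condition implies.

```python
def forward_to_sw_in_mem(mem_wb, ex_mem):
    """lw is in WB, sw is in MEM (lw immediately followed by sw)."""
    return (mem_wb["RegWrite"] and ex_mem["MemWrite"]
            and mem_wb["RegisterRd"] == ex_mem["RegisterRd"]
            and mem_wb["RegisterRd"] != 0)


def forward_to_sw_in_ex(id_ex, mem_wb):
    """lw is in WB, sw is in EX (one instruction between lw and sw)."""
    return (id_ex["MemWrite"] and mem_wb["RegWrite"]
            and mem_wb["RegisterRd"] == id_ex["RegisterRt"]
            and mem_wb["RegisterRd"] != 0)


# lw $5, 0($15) followed directly by sw $5, 100($15)
lw_in_wb = {"RegWrite": True, "RegisterRd": 5}
sw_in_mem = {"MemWrite": True, "RegisterRd": 5}
print(forward_to_sw_in_mem(lw_in_wb, sw_in_mem))

# lw $5, 0($15); add $8, $8, $8; sw $5, 100($15)
sw_in_ex = {"MemWrite": True, "RegisterRt": 5}
print(forward_to_sw_in_ex(sw_in_ex, lw_in_wb))
```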

More Cases

lw $15, 0($8)      # load-use,
sw $5, 100($15)    # stall pipeline

R-Type followed by sw?
    The result from the R-Type instruction will be saved into memory
    The R-Type instruction will overwrite the base register for sw

An Example

40: lw $2, 20($1)

44: and $4, $2, $5

48: or $8, $2, $4

Clock Cycle 1: IF: lw $2, 20($1)

Clock Cycle 2: IF: and $4, $2, $5 | ID: lw $2, 20($1)

Clock Cycle 3: IF: or $8, $2, $4 | ID: and $4, $2, $5 | EX: lw $2, 20($1)

Clock Cycle 4: IF: or $8, $2, $4 | ID: and $4, $2, $5 | EX: bubble | MEM: lw $2, 20($1)

Clock Cycle 5: ID: or $8, $2, $4 | EX: and $4, $2, $5 | MEM: bubble | WB: lw $2, 20($1)

The load-use hazard is detected in clock cycle 3 while the and instruction is in ID, so and and or are held for one cycle and a bubble enters EX in clock cycle 4.

[Figures: datapath snapshots for clock cycles 1 to 5; control-signal and register values omitted]
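The stall in this example can be reproduced with a small stage-occupancy simulation. This is a minimal sketch, not the hardware: the instruction encoding (`dest`, `srcs`, `load`), the `BUBBLE` marker, and the function names are all assumptions made for illustration. The hazard test is the usual load-use check: the instruction in ID reads the register that a load in EX is going to write.

```python
BUBBLE = ("bubble", {"dest": 0, "srcs": (), "load": False})

PROGRAM = [
    ("lw $2, 20($1)",  {"dest": 2, "srcs": (1,),   "load": True}),
    ("and $4, $2, $5", {"dest": 4, "srcs": (2, 5), "load": False}),
    ("or $8, $2, $4",  {"dest": 8, "srcs": (2, 4), "load": False}),
]


def load_use_hazard(in_id, in_ex):
    """True when the instruction in ID reads the register a load in EX will write."""
    if in_id is None or in_ex is None:
        return False
    return in_ex[1]["load"] and in_ex[1]["dest"] in in_id[1]["srcs"]


def simulate(program, cycles):
    """Return, per clock cycle, the instruction text in IF, ID, EX, MEM, WB."""
    stages = [None] * 5
    next_fetch = 0
    history = []
    for _ in range(cycles):
        in_if, in_id, in_ex, in_mem, _ = stages
        if load_use_hazard(in_id, in_ex):
            # Stall: hold IF and ID, send a bubble into EX
            stages = [in_if, in_id, BUBBLE, in_ex, in_mem]
        else:
            fetched = program[next_fetch] if next_fetch < len(program) else None
            next_fetch += 1
            stages = [fetched, in_if, in_id, in_ex, in_mem]
        history.append([s[0] if s else "-" for s in stages])
    return history


for clock, row in enumerate(simulate(PROGRAM, 5), start=1):
    print(clock, row)
```

The check is made for the instruction in ID because that is the last point at which it can still be held back before it enters EX with a stale operand.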

Branch Hazards

Control hazard: an attempt to make a decision before the condition is evaluated

[Figure: pipeline diagram; the branch decision is made in the MEM stage and the three instructions fetched after the branch are flushed]

Observations

The branch decision does not occur until the MEM stage, so 3 CCs are wasted (current design, non-optimized).

Is it possible to reduce the branch delay? Yes.

In the EX stage? Two CCs of branch delay.

In the ID stage? One CC of branch delay.
    How? For beq $x, $y, label, compute $x xor $y and then OR all the result bits, which is much faster than an ALU operation. We also have a separate ALU (adder) to compute the branch address.

3 strategies: delayed branch, static branch prediction, dynamic branch prediction
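The ID-stage equality test described above (XOR the two registers, then OR all the bits of the result) can be sketched in Python as follows. The function name `beq_taken` and the table name `BRANCH_DELAY` are illustrative only; the delay values are the ones stated on this slide.

```python
MASK32 = 0xFFFFFFFF


def beq_taken(x, y):
    """Equality test without a subtraction: XOR bitwise, then OR-reduce the result."""
    diff = (x ^ y) & MASK32
    any_bit_set = 0
    for i in range(32):                 # OR together the 32 bits of the XOR result
        any_bit_set |= (diff >> i) & 1
    return any_bit_set == 0             # no differing bit: registers equal, branch taken


# Clock cycles lost on a taken branch, by the stage in which the branch is resolved
BRANCH_DELAY = {"MEM": 3, "EX": 2, "ID": 1}

print(beq_taken(5, 5), beq_taken(5, 4), BRANCH_DELAY["ID"])
```

In hardware the XOR gates work in parallel and the OR is a single reduction tree, so no carry has to propagate as it would in an ALU subtraction.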

Delayed Branch

The instruction following the branch is always executed.

Only one instruction (the one in the delay slot) is executed this way.

The delay slot is filled by the compiler or assembler, with about a 50% success rate.

Losing popularity. Why?
    More pipeline stages
    Superscalar

Scheduling the Branch Delay Slot

Independent instruction: the best choice.

B is a good choice when the branch is likely to be taken.

It must be safe to execute the sub instruction when the branch goes in the unexpected direction.

Static Branch Prediction

Assume the branch will not be taken. If the prediction is wrong, clear the effects of the sequentially fetched instructions.

How to discard instructions in the pipeline?
    Branch decision made at the MEM stage: the instructions in the IF, ID, and EX stages need to be discarded.
    Branch decision made at the ID stage: only the IF/ID pipeline register needs to be flushed!

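[Figure: pipeline diagram with the branch decision made in the MEM stage; three instructions are flushed]

[Figure: datapath with the branch decision moved to the ID stage and the IF.Flush control signal]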

Pipelined Branch – An Example

[Figures: two datapath snapshots of a taken branch; surviving labels: instruction addresses 36, 40, 44 and 72, branch offset 28, registers $4 and $8, IF.Flush]
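The numbers that survive from the example (branch at address 40, offset 28, target 72) fit the usual MIPS target calculation, PC + 4 + (sign-extended immediate << 2). The sketch below shows that arithmetic together with the effect of IF.Flush. The immediate value 7 is an assumption inferred from 28 / 4, and the helper names are hypothetical.

```python
def sign_extend16(imm):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc, imm16):
    """Target = address of the next instruction plus the word offset times 4."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


NOP = 0x00000000  # the all-zero word encodes sll $0, $0, 0 in MIPS


def next_if_id(fetched_word, if_flush):
    """IF.Flush zeroes the instruction field of IF/ID, turning the fetch into a nop."""
    return NOP if if_flush else fetched_word


# Branch at address 40 with an assumed immediate of 7 (offset 28 bytes)
print(branch_target(40, 7))
# The instruction fetched at 44 is discarded when the branch is taken
print(hex(next_if_id(0x12345678, if_flush=True)))
```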

Dynamic Branch Prediction

Static branch prediction is crude!

Take history into consideration
    If a branch was taken last time, fetch the next instruction from the same place as last time

Branch prediction buffer: indexed by the lower bits of the branch instruction's address

This memory contains a bit (or bits) that tells whether the branch was recently taken or not

Is the prediction correct? Any bad effect?

1-bit prediction scheme

2-bit prediction scheme

[Figure: state diagram of the 2-bit prediction scheme, with two "predict taken" states and two "predict not taken" states; transitions are labeled taken / not taken]

Observation

Since we moved the branch decision to the ID stage, we need to copy the forwarding-control hardware to the ID stage too!

beq following lw
    The hazard detection unit should still work.

In-Class Exercise

Consider a loop branch that branches nine times in a row, then is not taken once. What is the prediction accuracy for this branch, assuming the prediction bit for this branch remains in the prediction buffer?

With 1-bit prediction?

With 2-bit prediction?
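One way to check an answer to this exercise is to simulate both predictors on the stated pattern (nine taken, then one not taken, repeated). The sketch below is an illustration under stated assumptions: the 2-bit predictor is modeled as a saturating counter, which may differ in some transitions from the state diagram on the slide, although for this pattern only the two "predict taken" states are visited. The starting states are also assumptions: the 1-bit predictor starts in the state left by the previous loop exit, and the 2-bit predictor starts in the strongest taken state.

```python
def one_bit_accuracy(outcomes, predict_taken=False):
    """1-bit scheme: predict whatever the branch did last time."""
    correct = 0
    for taken in outcomes:
        correct += (predict_taken == taken)
        predict_taken = taken
    return correct / len(outcomes)


def two_bit_accuracy(outcomes, state=3):
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""
    correct = 0
    for taken in outcomes:
        correct += ((state >= 2) == taken)
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


# Nine taken branches followed by one not-taken branch, repeated many times
pattern = ([True] * 9 + [False]) * 100

print("1-bit:", one_bit_accuracy(pattern))
print("2-bit:", two_bit_accuracy(pattern))
```

In steady state the 1-bit scheme mispredicts twice per pass through the loop (the exit and the first iteration of the next pass), while the 2-bit scheme mispredicts only the exit.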

Performance Comparison

Compare the performance of single-cycle, multi-cycle and pipelined datapaths

200ps for memory access, 100ps for ALU operation, 50ps for register file access

25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU ops

For the pipelined datapath, 50% of loads are immediately followed by an instruction that uses the result

Branch delay on misprediction is 1 clock cycle, and 25% of branches are mispredicted

Jump delay is 1 clock cycle
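The comparison can be worked through numerically. The sketch below uses the latencies and instruction mix given above. The multi-cycle cycle counts per instruction class (5 for loads, 4 for stores and ALU ops, 3 for branches and jumps) are an assumption taken from the usual textbook multi-cycle design, not from this slide, and all variable names are illustrative.

```python
MEM_PS, ALU_PS, REG_PS = 200, 100, 50
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}

# Single-cycle: the clock must fit the longest instruction (lw):
# instruction fetch + register read + ALU + data memory + register write
single_clock_ps = MEM_PS + REG_PS + ALU_PS + MEM_PS + REG_PS
single_time_ps = 1.0 * single_clock_ps            # CPI is 1

# Multi-cycle: the clock fits the slowest step; cycle counts are ASSUMED
MULTI_CYCLES = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}
multi_clock_ps = max(MEM_PS, ALU_PS, REG_PS)
multi_cpi = sum(MIX[k] * MULTI_CYCLES[k] for k in MIX)
multi_time_ps = multi_cpi * multi_clock_ps

# Pipelined: base CPI of 1 plus the stall cycles stated on the slide
PIPE_CPI = {
    "load": 1 + 0.50 * 1,      # half of the loads cause a 1-cycle load-use stall
    "store": 1,
    "alu": 1,
    "branch": 1 + 0.25 * 1,    # 25% mispredicted, 1-cycle penalty
    "jump": 1 + 1,             # every jump costs 1 extra cycle
}
pipe_clock_ps = max(MEM_PS, ALU_PS, REG_PS)
pipe_cpi = sum(MIX[k] * PIPE_CPI[k] for k in MIX)
pipe_time_ps = pipe_cpi * pipe_clock_ps

print("single-cycle:", single_time_ps, "ps per instruction")
print("multi-cycle: CPI", round(multi_cpi, 4), "->", round(multi_time_ps, 1), "ps")
print("pipelined:   CPI", round(pipe_cpi, 4), "->", round(pipe_time_ps, 1), "ps")
```

Under these assumptions the average time per instruction is 600 ps for single-cycle, about 824 ps for multi-cycle (CPI 4.12), and about 234.5 ps for the pipelined datapath (CPI about 1.17).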

Exceptions

Exceptions: events other than branches or jumps that change the normal flow of instruction execution
    Internal to the processor: arithmetic overflow, undefined instruction, etc.
    External: interrupts, such as I/O interrupts

Use arithmetic overflow as an example
    When an overflow is detected, we need to transfer control to the exception handling routine at location 0x8000 0180 immediately, because we do not want the invalid value to contaminate other registers or memory locations
    Similar idea to the branch hazard
    Detected in the EX stage
    De-assert all control signals in the EX and ID stages, and flush IF/ID

[Figure: datapath with exception support; the handler address 80000180 appears as an input on the PC side]

Example

sub $11, $2, $4

and $12, $2, $5

or $13, $2, $6

add $1, $2, $1 # overflow occurs

slt $15, $6, $7

lw $16, 50($7)

Exception handling routine:

0x8000 0180: sw $25, 1000($0)

0x8000 0184: sw $26, 1004($0)
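The control response to the overflow in this example can be sketched as follows. This is a simplified illustration: the signal names `zero_id_controls` and `zero_ex_controls` are hypothetical stand-ins for "de-assert all control signals in the ID and EX stages", and the sketch leaves out saving the address of the offending instruction.

```python
HANDLER_ADDRESS = 0x80000180


def add_overflows(a, b):
    """Signed 32-bit overflow: both operands share a sign that the sum does not have."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    total = (a + b) & 0xFFFFFFFF
    sign_a, sign_b, sign_sum = a >> 31, b >> 31, total >> 31
    return sign_a == sign_b and sign_sum != sign_a


def exception_control(overflow_in_ex, sequential_pc):
    """Next PC and flush signals for one clock cycle."""
    if overflow_in_ex:
        return {
            "next_pc": HANDLER_ADDRESS,   # fetch the handler next
            "IF.Flush": True,             # discard the instruction in IF/ID
            "zero_id_controls": True,     # instruction in ID becomes a bubble
            "zero_ex_controls": True,     # the overflowing add must not write $1
        }
    return {
        "next_pc": sequential_pc,
        "IF.Flush": False,
        "zero_id_controls": False,
        "zero_ex_controls": False,
    }


# add $1, $2, $1 with operand values chosen (as an assumption) to overflow
signals = exception_control(add_overflows(0x7FFFFFFF, 1), sequential_pc=0x58)
print(hex(signals["next_pc"]), signals["IF.Flush"])
```

Zeroing the EX-stage controls is what keeps the invalid sum out of the register file, which is the "contamination" the slide warns about.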

[Figures: datapath snapshots of the example at clock 6 and clock 7, showing the handler address 80000180]

Questions?