EECS 470 ILP and Exceptions Lecture 7 Coverage: Chapter 3



Optimizing CPU Performance

• Golden Rule: tCPU = Ninst*CPI*tCLK

• Given this, what are our options?
  – Reduce the number of instructions executed
  – Reduce the cycles to execute an instruction
  – Reduce the clock period

• Our first focus: reducing CPI
  – Approach: Instruction Level Parallelism (ILP)
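The golden rule is plain multiplication, so each option scales tCPU by the same factor as the term it shrinks. The sketch below uses hypothetical numbers: 1e9 instructions, CPI = 1.0, and a 0.5 ns (2 GHz) clock. `cpu_time_ns` is an illustrative name and does not come from the lecture.

```python
def cpu_time_ns(n_inst: float, cpi: float, t_clk_ns: float) -> float:
    """Golden rule: tCPU = Ninst * CPI * tCLK (result in nanoseconds)."""
    return n_inst * cpi * t_clk_ns

# Hypothetical baseline: 1e9 instructions, CPI = 1.0, 0.5 ns clock (2 GHz).
base = cpu_time_ns(1e9, 1.0, 0.5)

# Each option attacks one factor; halving any one of them halves tCPU.
fewer_insts = cpu_time_ns(0.5e9, 1.0, 0.5)
lower_cpi = cpu_time_ns(1e9, 0.5, 0.5)    # e.g., by exploiting ILP
faster_clk = cpu_time_ns(1e9, 1.0, 0.25)

for label, t in [("base", base), ("fewer insts", fewer_insts),
                 ("lower CPI", lower_cpi), ("faster clock", faster_clk)]:
    print(f"{label:12s} {t / 1e9:.2f} s")
```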


Why ILP?


• Requirements
  – Parallelism
  – Large window
  – Limited control deps
  – Eliminate “false” deps
  – Find run-time deps
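Here is one way to make the "false" deps concrete. The sketch classifies register dependences between two instructions using the standard RAW/WAR/WAW terms. Only RAW reflects real data flow; WAR and WAW exist only because a register name is reused. `Inst` and `deps` are hypothetical names, and the model looks at registers only, not memory.

```python
from typing import NamedTuple, Optional, Tuple

class Inst(NamedTuple):
    dst: Optional[str]      # register written (None if the instruction writes none)
    srcs: Tuple[str, ...]   # registers read

def deps(older: Inst, younger: Inst) -> set:
    """Register dependences of `younger` on `older` in program order."""
    kinds = set()
    if older.dst is not None and older.dst in younger.srcs:
        kinds.add("RAW")  # true dependence: younger needs older's value
    if younger.dst is not None and younger.dst in older.srcs:
        kinds.add("WAR")  # false (anti) dependence: only the name is shared
    if older.dst is not None and older.dst == younger.dst:
        kinds.add("WAW")  # false (output) dependence: only the name is shared
    return kinds

# r3 = r2 + r3 ; r4 = r3 - r2  -> the second really needs the first's r3
print(deps(Inst("r3", ("r2", "r3")), Inst("r4", ("r3", "r2"))))  # {'RAW'}
# r4 = r3 - r2 ; r3 = r5 + r6  -> reuses the name r3 only
print(deps(Inst("r4", ("r3", "r2")), Inst("r3", ("r5", "r6"))))  # {'WAR'}
# r1 = r2 ... ; r1 = r3 ...    -> both write r1, no value flows
print(deps(Inst("r1", ("r2",)), Inst("r1", ("r3",))))            # {'WAW'}
```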


How Much ILP is There?


How Large Must the “Window” Be?


ALU Operation GOOD, Branch BAD

Expected Number of Branches Between Mispredicts

E(X) ~ 1/(1-p), where p is the branch prediction accuracy

E.g., p = 95%: E(X) ~ 20 branches, i.e., 100-ish insts at roughly 5 insts per branch
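The formula is the mean of a geometric distribution. This holds if each branch is predicted correctly with probability p, independently of the others, which is a simplifying assumption. The sketch evaluates the closed form and checks it with a seeded Monte Carlo run. The 5-instructions-per-branch figure is an assumed branch density, and `expected_branches` and `simulate` are illustrative names.

```python
import random

def expected_branches(p: float) -> float:
    """E(X) = 1 / (1 - p): mean number of branches up to and including the
    next mispredict, assuming independent predictions with accuracy p."""
    return 1.0 / (1.0 - p)

def simulate(p: float, trials: int = 20_000, seed: int = 470) -> float:
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        n = 1
        while rng.random() < p:   # correct prediction: keep fetching
            n += 1
        total += n                # the n-th branch was the mispredict
    return total / trials

p = 0.95
insts_per_branch = 5  # assumed: about one branch every 5 instructions
print(expected_branches(p))                     # about 20 branches
print(expected_branches(p) * insts_per_branch)  # about 100 instructions
print(simulate(p))                              # close to 20
```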


How Accurate are Branch Predictors?


Impact of Physical Storage Limitations

• Each instruction “in flight” must have storage for its result
  – Really worse than this because of misspeculation…


Registers GOOD, Memory BAD

• Benefits of registers
  – Well-described deps
  – Fast access
  – Finite resource

• Memory loses these benefits for flexibility

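*p = …
*q = …
… = *p

? Does the final load of *p depend on the store to *q? Only if p and q alias (p == q), which in general is known only at run time.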
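A small sketch of why the answer is a run-time question. A dict stands in for memory, and the stored values 1 and 2 are hypothetical stand-ins for the elided right-hand sides. The same three statements return different values depending only on whether p == q. `run` and `load_may_pass_store` are illustrative names, not a real memory-disambiguation API.

```python
def run(p: int, q: int) -> int:
    """*p = ...; *q = ...; ... = *p, with a dict standing in for memory."""
    mem = {}
    mem[p] = 1      # *p = ...  (1 is a hypothetical stand-in value)
    mem[q] = 2      # *q = ...  (2 is a hypothetical stand-in value)
    return mem[p]   # ... = *p

def load_may_pass_store(load_addr: int, store_addr: int) -> bool:
    """A load may be reordered ahead of an older store only if the
    addresses differ, which generally becomes known only at run time."""
    return load_addr != store_addr

print(run(0x100, 0x200))  # 1: no alias, the load sees the *p store
print(run(0x100, 0x100))  # 2: p == q, the load depends on the *q store
```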


“Bottom Line” for an Ambitious Design


First Optimization: Out-of-Order Writeback


Playing by the Rules: In-order Writeback

Cycle   1     2     3     4     5     6     7     8     9
DIV.D   IF    ID    D1    D2    D3    D4    D5    MEM   WB
ADD           IF    ID    EX    MEM   WB


What’s wrong with this picture?

Divide by zero! If DIV.D faults after ADD has already written back, the register state no longer corresponds to any point in sequential program order.



Fix: stall ADD so that writebacks occur in order

Cycle   1     2     3     4     5     6     7     8     9     10
DIV.D   IF    ID    D1    D2    D3    D4    D5    MEM   WB
ADD           IF    ID    stall stall stall stall EX    MEM   WB
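The timing in the two diagrams can be reproduced with a simplified model. It assumes one instruction is fetched per cycle, each instruction passes through IF, ID, n EX cycles, MEM, and WB, and stalls are inserted before EX. `Inst` and `writeback_schedule` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Inst:
    name: str
    ex_cycles: int  # execute-stage cycles: 1 for ADD, 5 for DIV.D (D1-D5)

def writeback_schedule(program, in_order=True):
    """Return (name, wb_cycle, stall_cycles) for each instruction.

    Simplified model (an assumption): instruction i is fetched in cycle
    i + 1, then spends one cycle in ID, ex_cycles in EX, one in MEM, and
    one in WB. With in_order=True, an instruction stalls until its WB
    falls after the previous instruction's WB.
    """
    schedule = []
    last_wb = 0
    for i, inst in enumerate(program):
        # IF at i+1, ID at i+2, EX at i+3 .. i+2+n, MEM at i+3+n, WB at i+4+n
        natural_wb = i + 4 + inst.ex_cycles
        wb = max(natural_wb, last_wb + 1) if in_order else natural_wb
        schedule.append((inst.name, wb, wb - natural_wb))
        last_wb = wb
    return schedule

program = [Inst("DIV.D", 5), Inst("ADD", 1)]
print(writeback_schedule(program, in_order=False))  # [('DIV.D', 9, 0), ('ADD', 6, 0)]
print(writeback_schedule(program, in_order=True))   # [('DIV.D', 9, 0), ('ADD', 10, 4)]
```

Without the constraint, ADD writes back in cycle 6, ahead of DIV.D in cycle 9. Forcing in-order writeback costs ADD the four stall cycles shown in the diagram.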


Another Way to Get in the Same Mess

• Many systems use microcode
  – Simplifies mapping of complex instructions to CPU resources

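• IA-32 add-with-carry
  – ADC (EAX),EBX

    tmp = MEM[EAX]
    tmp = tmp + EBX + CF, update CF     Side effect!
    MEM[EAX] = tmp                      Potential fault!

  – If the store faults after CF has already been updated, restarting the instruction after the handler adds the wrong carry.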
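A sketch of the hazard, under these assumptions: values are 32 bits wide, memory is a Python dict, and a hypothetical `PageFault` is raised once by the store micro-op. The first version updates CF before the store, as in the micro-op sequence above. The second holds the new carry in a temporary and commits it only after the store succeeds. All function names are illustrative.

```python
MASK32 = 0xFFFF_FFFF

class PageFault(Exception):
    """Hypothetical fault raised by the store micro-op."""

def adc_naive(regs, mem, addr, faulting):
    """ADC (EAX),EBX as three micro-ops, with CF updated before the store."""
    tmp = mem[addr]                        # tmp = MEM[EAX]
    tmp = tmp + regs["EBX"] + regs["CF"]   # tmp = tmp + EBX + CF ...
    regs["CF"] = tmp >> 32                 # ... update CF (side effect!)
    if addr in faulting:                   # MEM[EAX] = tmp (potential fault!)
        faulting.discard(addr)             # fault once; the handler "fixes" it
        raise PageFault(hex(addr))
    mem[addr] = tmp & MASK32

def adc_precise(regs, mem, addr, faulting):
    """Same micro-ops, but CF is held in a temporary until nothing can fault."""
    tmp = mem[addr]
    tmp = tmp + regs["EBX"] + regs["CF"]
    new_cf = tmp >> 32
    if addr in faulting:
        faulting.discard(addr)
        raise PageFault(hex(addr))
    mem[addr] = tmp & MASK32
    regs["CF"] = new_cf                    # side effect committed last

def run_with_restart(adc, regs, mem, addr, faulting):
    """Run the instruction; on a fault, 'handle' it and restart from scratch."""
    try:
        adc(regs, mem, addr, faulting)
    except PageFault:
        adc(regs, mem, addr, faulting)

for adc in (adc_naive, adc_precise):
    regs, mem = {"EBX": 1, "CF": 0}, {0x1000: 0xFFFF_FFFF}
    run_with_restart(adc, regs, mem, 0x1000, faulting={0x1000})
    print(adc.__name__, hex(mem[0x1000]), regs["CF"])
# adc_naive 0x1 1    <- wrong: the restart added the already-updated carry
# adc_precise 0x0 1  <- matches a single, uninterrupted execution
```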


Exceptions and Interrupts

Exception type      Sync/Async   Maskable?   Restartable?
I/O request         Async        Yes         Yes
System call         Sync         No          Yes
Breakpoint          Sync         Yes         Yes
Overflow            Sync         Yes         Yes
Page fault          Sync         No          Yes
Misaligned access   Sync         No          Yes
Memory protect      Sync         No          Yes
Machine check       Async/Sync   No          No
Power failure       Async        No          No


Solution: Precise Interrupts

• Implementation approaches
  – Don’t
    • E.g., Cray-1
  – Force in-order WB
    • E.g., ARM SA-1
  – Force in-order checks
    • E.g., Alpha 21064
  – Buffer speculative results
    • E.g., P4, Alpha 21264
    • History buffer
    • Future file/Reorder buffer

[Figure: Precise state vs. speculative state. At the PC, all earlier instructions have completely finished, and no instruction from the PC onward has executed at all.]


Precise Interrupts via the Reorder Buffer

• @ Alloc
  – Allocate result storage at Tail

• @ Sched
  – Get inputs (search the ROB from Tail to Head, then the ARF)
  – Wait until all inputs are ready

• @ WB
  – Write result/fault to ROB
  – Indicate result is ready

• @ CT
  – Wait until the inst @ Head is done
  – If fault, initiate handler
  – Else, write result to ARF
  – Deallocate entry from ROB

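• Reorder Buffer (ROB)
  – Circular queue of spec state
  – May contain multiple definitions of same register

[Figure: pipeline IF, ID, Alloc, Sched, EX, MEM, CT. Instructions are allocated in order, execute in any order, and commit (CT) in order. The ROB has Head and Tail pointers, and each entry holds PC, Dst regID, Dst value, and Except?. At CT, results move from the ROB to the ARF.]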
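The four steps above can be sketched as a circular buffer. This is a simplified model, not a full design. It tracks only a destination register per entry, has no real scheduler, and flushes the whole ROB when a faulting entry reaches the Head. `Entry`, `ReorderBuffer`, and the status strings are hypothetical names. The usage at the end shows why the Tail-to-Head search matters when the ROB holds two definitions of the same register.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Entry:
    pc: int
    dst: str                     # Dst regID
    value: Optional[int] = None  # Dst value, filled in at WB
    done: bool = False
    fault: bool = False          # Except?

class ReorderBuffer:
    """Circular queue of speculative state (simplified sketch)."""

    def __init__(self, size: int, arf: Dict[str, int]):
        self.slots: List[Optional[Entry]] = [None] * size
        self.head = 0   # oldest in-flight instruction
        self.tail = 0   # next slot to allocate
        self.count = 0
        self.arf = arf  # architected register file

    def alloc(self, pc: int, dst: str) -> int:
        """@ Alloc: allocate result storage at Tail."""
        if self.count == len(self.slots):
            raise RuntimeError("ROB full: stall allocation")
        idx = self.tail
        self.slots[idx] = Entry(pc, dst)
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return idx

    def read(self, reg: str) -> Tuple[bool, Optional[int]]:
        """@ Sched: search Tail-to-Head for the youngest definition, then ARF.
        Returns (ready, value); ready is False if the producer has not
        written back yet, so the consumer must wait."""
        for k in range(self.count):
            e = self.slots[(self.tail - 1 - k) % len(self.slots)]
            if e.dst == reg:
                return e.done, e.value
        return True, self.arf[reg]

    def writeback(self, idx: int, value: Optional[int] = None,
                  fault: bool = False) -> None:
        """@ WB: write result/fault to the ROB and mark it ready."""
        e = self.slots[idx]
        e.value, e.fault, e.done = value, fault, True

    def commit(self) -> str:
        """@ CT: retire the Head entry in program order."""
        if self.count == 0:
            return "empty"
        e = self.slots[self.head]
        if not e.done:
            return "wait"
        if e.fault:
            # Discard all speculative state; the ARF still holds precise state.
            self.slots = [None] * len(self.slots)
            self.head = self.tail = self.count = 0
            return f"fault at pc={e.pc:#x}: initiate handler"
        self.arf[e.dst] = e.value
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return "committed"

# Two in-flight definitions of r1: readers must see the youngest one.
rob = ReorderBuffer(4, {"r1": 0, "r2": 7})
first = rob.alloc(0x100, "r1")
second = rob.alloc(0x104, "r1")
rob.writeback(first, 10)
print(rob.read("r1"))   # (False, None): youngest r1 not written back yet
rob.writeback(second, 20)
print(rob.read("r1"))   # (True, 20)
print(rob.read("r2"))   # (True, 7): not in the ROB, so read from the ARF
print(rob.commit(), rob.commit(), rob.arf)
```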


Reorder Buffer Example

Code Sequence

f1 = f2 / f3
r3 = r2 + r3
r4 = r3 - r2

Initial Conditions

- Reorder buffer empty
- f2 = 3.0
- f3 = 2.0
- r2 = 6
- r3 = 5

ROB contents over time (H = Head, T = Tail)

H T
regID: f1, result: ?, Except: ?

H T
regID: f1, result: ?, Except: ?
regID: r3, result: ?, Except: ?

H T
regID: f1, result: ?, Except: ?
regID: r3, result: 11, Except: n
regID: r4, result: ?, Except: ?
(r4 reads its r3 input, 11, from the ROB)

regID: r8, result: 2, Except: n
regID: r8, result: 2, Except: n
regID: r8, result: 2, Except: n


ROB contents over time, continued (H = Head, T = Tail)

H T
regID: f1, result: ?, Except: ?
regID: r3, result: 11, Except: n
regID: r4, result: 5, Except: n

H T
regID: f1, result: ?, Except: y
regID: r3, result: 11, Except: n
regID: r4, result: 5, Except: n
regID: r8, result: 2, Except: n
regID: r8, result: 2, Except: n

H T
regID: f1, result: ?, Except: y
regID: r3, result: 11, Except: n
regID: r4, result: 5, Except: n


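ROB contents over time, continued (H = Head, T = Tail)

H T
(empty: f1 reached the Head with Except: y, so the fault handler is initiated and the ROB is flushed; r3 and r4 never write the ARF)

H T
first inst of fault handler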
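The example can be replayed in a few lines. The initial conditions and results below are taken from the slides. As on the slides, the divide is simply marked as faulting, and the sketch does not model why. The model is an assumption:

* The ROB is a deque with the Head on the left.
* Each instruction's sources are looked up before its own entry is allocated, so the lookup sees only older producers.
* `read`, `alloc`, and `commit` are illustrative names.

```python
from collections import deque

# Initial conditions from the example: ROB empty, f2 = 3.0, f3 = 2.0, r2 = 6, r3 = 5
arf = {"f2": 3.0, "f3": 2.0, "r2": 6, "r3": 5}
rob = deque()  # rob[0] is the Head, rob[-1] is the Tail

def read(reg):
    """Search the ROB Tail-to-Head, then fall back to the ARF."""
    for entry in reversed(rob):
        if entry["regID"] == reg:
            return entry["result"]
    return arf[reg]

def alloc(reg_id):
    """Allocate a new entry at the Tail; result and Except start unknown ('?')."""
    rob.append({"regID": reg_id, "result": None, "except": None})
    return rob[-1]

div = alloc("f1")                          # f1 = f2 / f3 (long latency, still running)
a, b = read("r2"), read("r3")              # sources read before r3's new entry exists
add = alloc("r3")                          # r3 = r2 + r3
add["result"], add["except"] = a + b, "n"  # 6 + 5 = 11
a, b = read("r3"), read("r2")              # r3 = 11 now comes from the ROB
sub = alloc("r4")                          # r4 = r3 - r2
sub["result"], sub["except"] = a - b, "n"  # 11 - 6 = 5
div["except"] = "y"                        # as on the slides, the divide reports a fault

def commit():
    """Retire from the Head in order; a faulting Head flushes the ROB."""
    while rob:
        head = rob[0]
        if head["except"] is None:
            return "wait"
        if head["except"] == "y":
            rob.clear()  # r3 and r4 are discarded and never reach the ARF
            return "initiate fault handler"
        arf[head["regID"]] = head["result"]
        rob.popleft()
    return "empty"

print(commit())  # initiate fault handler
print(arf)       # {'f2': 3.0, 'f3': 2.0, 'r2': 6, 'r3': 5}: still the precise state
```

Because r3 and r4 completed but never committed, the handler sees r3 = 5, exactly as if execution had stopped just before the divide.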