amari-dains
EECS 470: ILP and Exceptions
Lecture 7
Coverage: Chapter 3
Optimizing CPU Performance
• Golden Rule: tCPU = Ninst*CPI*tCLK
• Given this, what are our options?
– Reduce the number of instructions executed
– Reduce the cycles to execute an instruction
– Reduce the clock period
• Our first focus: Reducing CPI
– Approach: Instruction Level Parallelism (ILP)
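A minimal sketch of the Golden Rule as arithmetic. The function name and the sample numbers (one million instructions, a 1 ns clock, CPI of 2.0 versus 1.0) are illustrative assumptions, not values from the lecture.

```python
def cpu_time(n_inst, cpi, t_clk):
    """tCPU = Ninst * CPI * tCLK, with t_clk in seconds."""
    return n_inst * cpi * t_clk

# Hypothetical workload: same instruction count and clock, different CPI.
base = cpu_time(1_000_000, 2.0, 1e-9)
with_ilp = cpu_time(1_000_000, 1.0, 1e-9)

# With Ninst and tCLK held fixed, halving CPI halves execution time.
speedup = base / with_ilp
print(f"base={base:.6f}s  with_ilp={with_ilp:.6f}s  speedup={speedup:.2f}x")
```

Because the three factors multiply, an improvement in CPI only pays off if it does not lengthen the clock period or inflate the instruction count by a comparable amount.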
Why ILP?
• Requirements
– Parallelism
– Large window
– Limited control deps
– Eliminate “false” deps
– Find run-time deps
How Much ILP is There?
How Large Must the “Window” Be?
ALU Operation GOOD, Branch BAD
Expected Number of Branches Between Mispredicts
E(X) ~ 1/(1-p), where p is the branch prediction accuracy
E.g., p = 95%, E(X) ~ 20 brs, 100-ish insts
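The estimate above can be reproduced with a short sketch. The function names are hypothetical, and the figure of roughly one branch every five instructions is an assumption chosen to match the slide's "100-ish insts".

```python
def branches_between_mispredicts(p):
    """E(X) ~ 1/(1-p) for prediction accuracy p (0 <= p < 1)."""
    return 1.0 / (1.0 - p)

def insts_between_mispredicts(p, insts_per_branch=5):
    """Scale by an assumed average number of instructions per branch."""
    return branches_between_mispredicts(p) * insts_per_branch

for p in (0.90, 0.95, 0.99):
    brs = branches_between_mispredicts(p)
    insts = insts_between_mispredicts(p)
    print(f"p={p:.2f}: ~{brs:.0f} branches, ~{insts:.0f} instructions")
```

The relationship is steeply nonlinear: under these assumptions, moving from 95% to 99% accuracy grows the usable window about fivefold.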
How Accurate are Branch Predictors?
Impact of Physical Storage Limitations
• Each instruction “in flight” must have storage for its result
– Really worse than this because of misspeculation…
Registers GOOD, Memory BAD
• Benefits of registers
– Well described deps
– Fast access
– Finite resource
• Memory gives up these benefits in exchange for flexibility
*p = …
*q = …
… = *p
?
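A small sketch of why the question mark is there: whether the final load depends on the store through q is only known once the addresses are known. Memory is modeled as a Python dict and the function name is hypothetical.

```python
def load_after_stores(mem, p, q, vp, vq):
    """Models:  *p = vp;  *q = vq;  ... = *p"""
    mem[p] = vp
    mem[q] = vq
    return mem[p]

# Different addresses: the load sees the first store.
no_alias = load_after_stores({}, p=0x100, q=0x200, vp=1, vq=2)

# Same address: the load must see the second store instead.
alias = load_after_stores({}, p=0x100, q=0x100, vp=1, vq=2)

print(no_alias, alias)
```

With registers the producer of each operand is named in the instruction itself; here the dependence only shows up when p and q are compared at run time.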
“Bottom Line” for an Ambitious Design
First Optimization: Out-of-Order Writeback
Playing by the Rules: In-order Writeback
DIV.D   IF  ID  D1  D2  D3  D4  D5  MEM WB
ADD         IF  ID  EX  MEM WB
What’s wrong with this picture?
Divide by Zero! If DIV.D faults, ADD has already written back its result.
Playing by the rules, ADD stalls so that it writes back after DIV.D:
DIV.D   IF  ID  D1  D2  D3  D4  D5  MEM WB
ADD         IF  ID  EX  MEM WB
            stall stall stall stall
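The four stall cycles can be checked with a little arithmetic. This is a sketch under two assumptions not stated on the slide: ADD is fetched one cycle after DIV.D, and only one instruction may write back per cycle. The helper names are hypothetical.

```python
DIV_D = ["IF", "ID", "D1", "D2", "D3", "D4", "D5", "MEM", "WB"]
ADD = ["IF", "ID", "EX", "MEM", "WB"]

def wb_cycle(fetch_cycle, stages):
    """Cycle of the last stage (WB) when the first stage runs in fetch_cycle."""
    return fetch_cycle + len(stages) - 1

def stalls_for_in_order_wb(older_wb, younger_wb):
    """Stall cycles needed so the younger instruction writes back after the older one."""
    return max(0, older_wb + 1 - younger_wb)

div_wb = wb_cycle(1, DIV_D)   # DIV.D fetched in cycle 1
add_wb = wb_cycle(2, ADD)     # ADD fetched in cycle 2 (assumption)
stalls = stalls_for_in_order_wb(div_wb, add_wb)
print(div_wb, add_wb, stalls)
```

Under these assumptions DIV.D writes back in cycle 9 and an unstalled ADD in cycle 6, so ADD has to be held for four cycles, matching the four stalls in the diagram.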
Another Way to Get in the Same Mess
• Many systems use microcode
– Simplifies mapping of complex instructions to CPU resources
• IA-32 add-with-carry
– ADC (EAX),EBX
    tmp = MEM[EAX]
    tmp = tmp + EBX + CF, update CF    (Side Effect!)
    MEM[EAX] = tmp                     (Potential Fault!)
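A toy model of the hazard, assuming the final store can fault after CF has already been updated. The `StoreFault` class, the `writable` set standing in for page protection, and the naive "fix the page and re-run" handler are all hypothetical; this is not how any real IA-32 implementation is structured.

```python
class StoreFault(Exception):
    """Hypothetical fault raised when the final store hits a non-writable address."""

def adc_mem(mem, writable, regs, flags):
    """Microcode-style ADC (EAX),EBX on 32-bit values."""
    addr = regs["EAX"]
    tmp = mem[addr]                          # tmp = MEM[EAX]
    total = tmp + regs["EBX"] + flags["CF"]
    flags["CF"] = total >> 32                # update CF: the side effect
    tmp = total & 0xFFFFFFFF
    if addr not in writable:                 # the store is the potential fault
        raise StoreFault(addr)
    mem[addr] = tmp                          # MEM[EAX] = tmp

mem = {0x10: 0xFFFFFFFF}
regs = {"EAX": 0x10, "EBX": 1}
flags = {"CF": 0}
writable = set()                             # address 0x10 starts read-only

try:
    adc_mem(mem, writable, regs, flags)
except StoreFault:
    cf_after_fault = flags["CF"]             # CF was already changed
    writable.add(0x10)                       # handler makes the address writable
    adc_mem(mem, writable, regs, flags)      # naive restart of the instruction

naive_result = mem[0x10]
print(cf_after_fault, hex(naive_result))
```

The correct result is 0 with CF set, but the naive restart consumes the CF written by the faulting attempt and stores 1. Restarting is only safe if the side effect is buffered until the instruction can no longer fault.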
Exceptions and Interrupts
Exception Type      Sync/Async   Maskable?   Restartable?
I/O request         Async        Yes         Yes
System call         Sync         No          Yes
Breakpoint          Sync         Yes         Yes
Overflow            Sync         Yes         Yes
Page fault          Sync         No          Yes
Misaligned access   Sync         No          Yes
Memory Protect      Sync         No          Yes
Machine Check       Async/Sync   No          No
Power failure       Async        No          No
Solution: Precise Interrupts
• Implementation approaches
– Don’t
    • E.g., Cray-1
– Force in-order WB
    • E.g., ARM SA-1
– Force in-order checks
    • E.g., Alpha 21064
– Buffer speculative results
    • E.g., P4, Alpha 21264
    • History buffer
    • Future file/Reorder buffer
[Figure: instruction stream divided at the PC. On one side, Instructions Completely Finished (Precise State); on the other, No Instruction Has Executed At All. Other labels: Speculative State, MEM.]
Precise Interrupts via the Reorder Buffer
• @ Alloc
– Allocate result storage at Tail
• @ Sched
– Get inputs (ROB T-to-H then ARF)
– Wait until all inputs ready
• @ WB
– Write results/fault to ROB
– Indicate result is ready
• @ CT
– Wait until inst @ Head is done
– If fault, initiate handler
– Else, write results to ARF
– Deallocate entry from ROB
• Reorder Buffer (ROB)
– Circular queue of spec state
– May contain multiple definitions of same register
[Figure: pipeline stages IF, ID, Alloc, Sched, EX, CT around the ROB (Head, Tail) and the ARF. ROB entry fields: PC, Dst regID, Dst value, Except?. Ordering labels: In-order, Any order, In-order.]
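The four steps above can be sketched as a small software model. This is an illustration under assumptions, not the lecture's design: a `deque` stands in for the circular queue, the class and method names are hypothetical, the PC values and the initial f1 and r4 values are made up, and the divide is simply assumed to fault.

```python
from collections import deque

class ROB:
    def __init__(self, arf):
        self.arf = dict(arf)          # architected register file
        self.entries = deque()        # left end = Head, right end = Tail

    def alloc(self, pc, dst):
        """@ Alloc: allocate result storage at Tail."""
        entry = {"pc": pc, "dst": dst, "value": None, "done": False, "fault": False}
        self.entries.append(entry)
        return entry

    def read(self, reg, before):
        """@ Sched: search older ROB entries Tail-to-Head, then the ARF.
        Returns (ready, value)."""
        older = []
        for e in self.entries:
            if e is before:
                break
            older.append(e)
        for e in reversed(older):     # newest older definition wins
            if e["dst"] == reg:
                return e["done"], e["value"]
        return True, self.arf[reg]

    def writeback(self, entry, value=None, fault=False):
        """@ WB: write result or fault to the ROB and mark it ready."""
        entry["value"], entry["fault"], entry["done"] = value, fault, True

    def commit(self):
        """@ CT: retire the Head entry if it is done."""
        if not self.entries or not self.entries[0]["done"]:
            return "wait"
        head = self.entries[0]
        if head["fault"]:
            self.entries.clear()      # discard all speculative state
            return "fault"            # caller would start the handler
        self.arf[head["dst"]] = head["value"]
        self.entries.popleft()
        return "commit"

rob = ROB({"f1": 0.0, "f2": 3.0, "f3": 2.0, "r2": 6, "r3": 5, "r4": 0})
div = rob.alloc(0, "f1")              # f1 = f2 / f3
add = rob.alloc(4, "r3")              # r3 = r2 + r3
sub = rob.alloc(8, "r4")              # r4 = r3 - r2

_, a = rob.read("r2", before=add)
_, b = rob.read("r3", before=add)
rob.writeback(add, a + b)             # finishes before the divide
status_early = rob.commit()           # Head (divide) is not done yet

_, new_r3 = rob.read("r3", before=sub)   # taken from the ROB, not the ARF
_, c = rob.read("r2", before=sub)
rob.writeback(sub, new_r3 - c)

rob.writeback(div, fault=True)        # assumed fault on the divide
status = rob.commit()
print(status_early, status, rob.arf["r3"], rob.arf["r4"])
```

Even though both integer instructions finished first, neither result reaches the ARF: commit happens only at Head, so the fault on the oldest instruction discards them and the architected state stays precise.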
Reorder Buffer Example
Code Sequence
    f1 = f2 / f3
    r3 = r2 + r3
    r4 = r3 - r2
Initial Conditions
    - reorder buffer empty
    - f2 = 3.0
    - f3 = 2.0
    - r2 = 6
    - r3 = 5
ROB over Time (H = Head, T = Tail)
H T
    regID: f1   result: ?    Except: ?
H T
    regID: f1   result: ?    Except: ?
    regID: r3   result: ?    Except: ?
H T
    regID: f1   result: ?    Except: ?
    regID: r3   result: 11   Except: n
    regID: r4   result: ?    Except: ?
[Figure labels: r3; each snapshot also shows an entry regID: r8, result: 2, Except: n]
ROB over Time, continued (H = Head, T = Tail)
H T
    regID: f1   result: ?    Except: ?
    regID: r3   result: 11   Except: n
    regID: r4   result: 5    Except: n
H T
    regID: f1   result: ?    Except: y
    regID: r3   result: 11   Except: n
    regID: r4   result: 5    Except: n
H T
    regID: f1   result: ?    Except: y
    regID: r3   result: 11   Except: n
    regID: r4   result: 5    Except: n
[Figure labels: two snapshots also show an entry regID: r8, result: 2, Except: n]
ROB over Time, after the fault (H = Head, T = Tail)
H T
    (empty)
H T
    first inst of fault handler