© A. Moshovos (ECE, Toronto) ECE1773 – Spring 2002
ILP, cont.
• Maintaining Sequential Appearance
  – Precise Interrupts
  – RUU approach to OoO Scheduling
Superscalar Processors: The Big Picture
[Figure: two columns. Program Form: static program, dynamic inst. stream (trace), execution window, completed instructions. Processing Phase: fetch and CT prediction, dispatch / dataflow, inst. issue, inst. execution, inst. reorder & commit]
A Generic Superscalar OOO Processor
[Figure: block diagram with Pre-decode, I-CACHE, buffer, Rename, Dispatch, two schedulers, Reorder buffer, two RFs, two sets of FUs, and Memory Interface]
Maintaining Sequential Semantics
• What if execution gets interrupted at an arbitrary point?
  – All insts. before that point commit
  – None thereafter do
• We’ll focus on interrupts
• Same mechanisms used today to support SPECULATIVE EXECUTION
• “Definition”: Instr. executes speculatively up to complete. We don’t know yet if we should have executed this instr. Verification happens at commit (if ever).
Interrupts
• Examples
  – Power Failing, Arithmetic Overflow
  – I/O Device Request, OS Call, Page Fault
  – Invalid Opcode, Breakpoint, Protection Viol.
• Aka Faults, Exceptions, or Traps
• Requirements
  – Surprise Jump (to vectored Address)
  – Linking Return Address
  – Saving State
  – Changing State (e.g., kernel mode)
Classifying Interrupts
• 1a. Synchronous
  – Function of program state: overflow, page fault, etc.
• 1b. Asynchronous
  – e.g., External device or malfunction
• 2a. User Request
  – OS Call
• 2b. Coerced
  – From OS or hardware: page fault, protection violation
• 3a. User Maskable
  – User can disable processing
• 3b. Non-Maskable
  – Guess!!!
Classifying Interrupts, contd.
• 4a. Between Instructions
  – Usually Asynchronous
• 4b. Within an Instruction
  – Usually Synchronous
  – Harder to deal with, why???
• 5a. Resume
  – As if nothing happened as far as the program is concerned
• 5b. Catastrophic
  – Say, bye bye, program is leaving us
Restartable Pipelines
• Interrupts within an instruction are not catastrophic
• Most machines support this
  – Needed for virtual memory
• Some machines did not support this
  – Cost & Slowdown
• PRECISE INTERRUPTS are key
  – As if the interrupt happened at a well-defined point in the original sequential order
  – First let’s consider a simple DLX-style pipeline
Precise Interrupts
• Sequential Semantics
• Complete instructions before the offending instruction
• Squash (effects of) instructions after
• Save PC
• Force trap instruction into FETCH stage
  – divert execution to interrupt handler
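The steps on this slide can be sketched in a few lines of Python. This is only an illustration, not the pipeline from the lecture: the window layout, the function name `take_precise_interrupt`, and the handler address `HANDLER_PC` are all assumptions made for the example.

```python
HANDLER_PC = 0x80  # hypothetical vectored handler address


def take_precise_interrupt(in_flight, regs, fault_index):
    """Apply the precise-interrupt steps to a window of in-flight instructions.

    in_flight: list of dicts in program order, each with "pc", "dest", "result".
    fault_index: position of the offending instruction in that list.
    Returns (saved_pc, next_fetch_pc).
    """
    # 1. Complete every instruction older than the offending one.
    for inst in in_flight[:fault_index]:
        regs[inst["dest"]] = inst["result"]
    # 2. Save the PC of the offending instruction so it can be restarted.
    saved_pc = in_flight[fault_index]["pc"]
    # 3. Squash the offending instruction and everything younger:
    #    their results are dropped without ever touching regs.
    in_flight.clear()
    # 4. Divert fetch to the interrupt handler.
    return saved_pc, HANDLER_PC
```

After the call, the register state looks as if execution stopped exactly in front of the offending instruction, which is the property the handler relies on.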
Precise Interrupts
• Jim Smith and Andrew Pleszkun Paper
• Original work was for a “simple” pipeline
• Today the same principles are used in virtually all modern microprocessors
  – Support for SPECULATIVE EXECUTION
    • executing instruction without knowing whether we should
    • more on this later
  – and of course, precise interrupts
• We’ll stick to precise interrupts for the time being
Do the Simple Thing First
• Modify State only when all preceding insts. are KNOWN to be exception free.
• Mechanism: Result Shift Register
• Stage = cycle
• At FETCH: Reserve all stages for the duration of the instruction

          stage  FU   DR  V  PC
oldest    1      DIV  R1  1  1000
          n-1    ADD  R2  1  1001
youngest  n      SUB  R3  1  1002
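A rough model of the reservation rule may help. The sketch below assumes that an instruction with latency L claims stage L plus every still-free earlier stage, and stalls if stage L is already taken; the class name, method names, and the `RESERVED` marker are made up for this example.

```python
RESERVED = object()  # marks a stage blocked on behalf of an older instruction


class ResultShiftRegister:
    def __init__(self, n_stages):
        # stages[0] is stage 1: the entry that writes back on the next tick.
        self.stages = [None] * n_stages

    def try_reserve(self, latency, entry):
        """Claim stage `latency`; return False (stall) if it is taken."""
        if self.stages[latency - 1] is not None:
            return False
        self.stages[latency - 1] = entry
        # Block all free earlier stages so that a younger, faster
        # instruction cannot write back ahead of this one.
        for i in range(latency - 1):
            if self.stages[i] is None:
                self.stages[i] = RESERVED
        return True

    def tick(self):
        """Advance one cycle; return the entry writing back, if any."""
        done = self.stages.pop(0)
        self.stages.append(None)
        return None if done is RESERVED else done
```

With a 4-stage register, reserving a 4-cycle DIV blocks a 1-cycle ADD issued right behind it until the DIV has written back, which is how this scheme forces in-order completion.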
Simple Solution Discussion
• Essentially In-Order Completion
  – Simple
    • Easy to implement
  – Performance?
    • Execution overlap still possible
    • Writebacks in order
    • Amplifies latencies
    • Dependent Instructions wait longer
Allowing out-of-order completes
• Add one more state for instruction execution:
  – COMPLETE & COMMIT
• COMPLETE:
  – Result calculated
  – Dependent instructions can use
  – BUT, don’t know if preceding instructions are all OK
  – I.e., don’t know if this instruction should have executed now based on the original program order
• COMMIT:
  – All preceding instructions executed with no problems
  – Can safely commit state changes
OOO Complete & IO Commit
• Want: Out-of-Order Completion
  – Allow OOO completion
  – Maintain in-order COMMIT
  – Allow maximum overlap
  – Guarantee precise state if needed
• How does this improve performance?
[Figure: timelines comparing In-Order Complete with OOO Complete (in-order commits) for the sequence DIV R3, _, _ ; ADD R1, _, _ ; ADD _, R1, _]
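A back-of-the-envelope timing model shows where the gain comes from. The latencies (10 cycles for DIV, 1 for ADD), the one-instruction-per-cycle issue rate, and the function name are assumptions chosen for illustration; they are not taken from the slide.

```python
def completion_times(program, in_order_complete):
    """Return the cycle at which each instruction's result becomes usable.

    program: list of (dest, srcs, latency) in program order; instruction i
    can start no earlier than cycle i.
    """
    ready = {}   # register -> cycle at which its new value is available
    times = []
    prev = -1    # completion time of the previous instruction
    for cycle, (dest, srcs, latency) in enumerate(program):
        # Wait for the issue slot and for every source operand.
        start = max([cycle] + [ready.get(s, 0) for s in srcs])
        done = start + latency
        if in_order_complete:
            # A result may not appear before the preceding one has.
            done = max(done, prev + 1)
        prev = done
        if dest is not None:
            ready[dest] = done
        times.append(done)
    return times


# DIV R3, _, _ ; ADD R1, _, _ ; ADD _, R1, _
program = [("R3", [], 10), ("R1", [], 1), (None, ["R1"], 1)]
in_order = completion_times(program, in_order_complete=True)
ooo = completion_times(program, in_order_complete=False)
```

Under these assumptions the dependent ADD finishes at cycle 12 with in-order completion and at cycle 3 with OOO completion; commits still wait for the DIV in both cases.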
Reorder Buffer
• Result Shift Register:
  – Reserve Result Bus
  – Out-of-Order Completion
• Reorder Buffer
  – Defer Commits and do them in-order
  – Allow OOO Completes by buffering state
[Figure: Result Shift Register (decides when to complete) beside REORDER BUFFER (decides when to commit); entries in both are in motion]

Result Shift Register
st.  FU   V  TAG
     ADD     5
     DIV     4

REORDER BUFFER
      TAG  DR  RES  V  E  PC
head  4    R1             1000
tail  5    R2             1001

res = result, v = valid, e = result NYA
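The reorder buffer side can be sketched as a FIFO that accepts results in any order and releases them from the head only. The field names below, in particular the `exc` exception flag, are assumptions for this example and do not map one-to-one onto the columns in the figure.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()  # leftmost entry is the head (oldest)
        self.next_tag = 0

    def allocate(self, dest, pc):
        """Reserve a tail entry at issue time; return its tag."""
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append({"tag": tag, "dest": dest, "pc": pc,
                             "result": None, "valid": False, "exc": False})
        return tag

    def complete(self, tag, result=None, exc=False):
        """Record a result (or an exception) out of program order."""
        for e in self.entries:
            if e["tag"] == tag:
                e.update(result=result, valid=True, exc=exc)
                return
        raise KeyError(tag)

    def commit(self, regs):
        """Retire finished entries from the head, in order.

        Returns the PC of a faulting instruction, or None.
        """
        while self.entries and self.entries[0]["valid"]:
            e = self.entries[0]
            if e["exc"]:
                self.entries.clear()  # squash the faulting inst. and younger
                return e["pc"]
            regs[e["dest"]] = e["result"]
            self.entries.popleft()
        return None
```

If the ADD behind a DIV completes first, `commit` leaves the register file untouched until the DIV at the head is valid, then retires both in order.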
Reorder Buffer Complications
• State is kept in the reorder buffer
• Have to bypass from every entry
  – Need to determine latest write w/ respect to the consuming instruction
[Figure: RF and RB]
Essentially:
1. In-Order Commits
2. Buffer speculative state till commit
Speculative State Updates
• Two fundamental approaches
  – Do changes but keep a record of old state
  – Everything OK?
    • Just discard record of changes
    • HISTORY BUFFER
  – Keep two states:
    • Architectural and Speculative
    • On COMPLETE write state to Speculative
    • On ISSUE read from speculative
    • On COMMIT write to Architectural
    • On Error, throw out Speculative state
    • FUTURE FILE
History Buffer
• Allow out-of-order register file updates
• At decode record current value of target register in RB
  – notice that this is the previous value the register had
• On Commit?
  – Do nothing, state is fine
• On Exception?
  – Use History to UNDO changes made
[Figure: RF and HB, with paths for results, source operands, destination registers, and exception]
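A minimal sketch of the undo log follows. It assumes that no two in-flight instructions write the same register (otherwise out-of-order updates would already be unsafe) and uses a sequence number per instruction; the class and method names are hypothetical.

```python
class HistoryBuffer:
    def __init__(self, regs):
        self.regs = regs  # the one and only register file
        self.log = []     # (seq, dest, old_value), oldest first

    def decode(self, seq, dest):
        """Record the value the target register holds before the write."""
        self.log.append((seq, dest, self.regs[dest]))

    def complete(self, dest, value):
        """Update the register file immediately, in any order."""
        self.regs[dest] = value

    def commit_oldest(self):
        """Common case: the state is already right, just drop the record."""
        self.log.pop(0)

    def exception(self, faulting_seq):
        """Undo the faulting instruction and everything younger."""
        while self.log and self.log[-1][0] >= faulting_seq:
            _, dest, old = self.log.pop()  # youngest first
            self.regs[dest] = old
```

The undo loop is the scan through the HB that makes interrupt response slow: its cost grows with the number of younger instructions in flight.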
History Buffer Discussion
• Simple Mechanism
• Additional Register File Port
• Single Source for Input Operands
• Normal Instruction processing Not changed by much
  – Control mostly unchanged
  – Nothing to do on Commit for the common case
• Slow response to Interrupts
  – Need to scan through HB
  – Complex?
Future File: The Optimist’s View
• Two Register Files:
  – One updated Out-of-Order (FUTURE)
    • assume no exception will occur
  – One updated in Order (ARCHITECTURAL)
• Advantage: No delay to restore state on exception
[Figure: RF, RB, and FF, with paths for source operands and results]
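The two-file arrangement can be sketched as below. It is a simplification: the in-order commit path (a reorder buffer in the figure) is reduced to a direct `commit` call, and all names are assumptions for this example.

```python
class FutureFile:
    def __init__(self, regs):
        self.arch = dict(regs)    # updated in order, always precise
        self.future = dict(regs)  # updated out of order, optimistic

    def read(self, src):
        """Source operands come from the future file."""
        return self.future[src]

    def complete(self, dest, value):
        """A finished result goes to the future file right away."""
        self.future[dest] = value

    def commit(self, dest, value):
        """In-order commit updates the architectural file."""
        self.arch[dest] = value

    def exception(self):
        """Discard speculative state; the architectural file is precise."""
        self.future = dict(self.arch)
```

Nothing has to be undone entry by entry on an exception: the precise state already sits in the architectural file, and the future file is simply reloaded from it.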
How Do These Relate to Register Renaming?
• Physical Registers provide sufficient storage for both speculative and architectural state
• It’s the register map table that determines what is the current state
• On interrupt we have to restore the map table
  – Values are there in the physical register file
• History and Future approaches still valid
  – History: keep track of changes to register map table
  – On interrupt undo them one by one
  – Future: keep two tables
    • Speculative: updated at decode
    • Architectural: updated at commit
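Both recovery styles can be tried on one small map-table model. The sketch assumes a simple free list and an initial identity mapping; `RenameMaps` and its methods are invented for illustration, and a real design would pick one of the two recovery paths rather than keep both.

```python
class RenameMaps:
    def __init__(self, arch_regs, n_phys):
        self.spec = {r: i for i, r in enumerate(arch_regs)}  # updated at decode
        self.arch = dict(self.spec)                          # updated at commit
        self.free = list(range(len(arch_regs), n_phys))
        self.history = []  # (dest, old_phys, new_phys), oldest first

    def decode(self, dest):
        """Give dest a fresh physical register and log the change."""
        new = self.free.pop(0)
        self.history.append((dest, self.spec[dest], new))
        self.spec[dest] = new
        return new

    def commit(self):
        """Retire the oldest mapping; its predecessor becomes free."""
        dest, old, new = self.history.pop(0)
        self.arch[dest] = new
        self.free.append(old)

    def recover_history(self):
        """History style: undo map changes one by one, youngest first."""
        while self.history:
            dest, old, new = self.history.pop()
            self.spec[dest] = old
            self.free.append(new)

    def recover_future(self):
        """Future style: copy the architectural table over the speculative."""
        self.free.extend(new for _, _, new in self.history)
        self.history.clear()
        self.spec = dict(self.arch)
```

Either way only the table is repaired; the register values themselves never move, since they stay in the physical register file.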
RUU
• Sohi’s Paper
• Common Mechanism for Precise Interrupts and OOO Execution
• Register Update Unit
  – A collection of Reservation stations
  – Organized as a FIFO queue
  – Instructions Enter In-order at FETCH
  – They Exit In-Order at COMMIT
    • Register File updates happen at this point.
• Simplescalar follows this model
  – Well, mostly
  – Cuts corners on when Completes become visible
RUU: OOO Execution
• Decode:
  – Check RUU for most recent write to register
  – If none found, read value from RF
    • Do it in parallel really
  – If found, link to producer with a TAG
    • RUU number is the TAG
• Issue
  – Wait till all input operands are ready
• Complete
  – Broadcast value and RUU ID
    • Waiting instructions will pick value up
• Commit
  – Head and Tail pointer for FIFO operation
  – Only when everyone before has committed
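The four phases can be walked through with a toy RUU. This is a sketch under simplifying assumptions: a growing counter stands in for the RUU entry number used as the tag, a completed but uncommitted value is read straight out of the RUU at decode, and all names are made up for the example.

```python
class RUU:
    def __init__(self, regs):
        self.regs = regs
        self.entries = []  # FIFO: index 0 is the head (oldest)
        self.next_id = 0

    def _lookup(self, src):
        # Scan youngest to oldest for the most recent writer of src.
        for e in reversed(self.entries):
            if e["dest"] == src:
                return ("val", e["value"]) if e["done"] else ("tag", e["id"])
        return ("val", self.regs[src])  # no writer in flight: use the RF

    def decode(self, op, dest, srcs):
        entry = {"id": self.next_id, "op": op, "dest": dest,
                 "ops": [self._lookup(s) for s in srcs],
                 "done": False, "value": None}
        self.next_id += 1
        self.entries.append(entry)
        return entry["id"]

    def ready(self, ruu_id):
        """Issue condition: every input operand holds a value."""
        e = next(e for e in self.entries if e["id"] == ruu_id)
        return all(kind == "val" for kind, _ in e["ops"])

    def complete(self, ruu_id, value):
        """Broadcast (tag, value); waiting instructions pick it up."""
        for e in self.entries:
            if e["id"] == ruu_id:
                e["done"], e["value"] = True, value
            e["ops"] = [("val", value) if o == ("tag", ruu_id) else o
                        for o in e["ops"]]

    def commit(self):
        """Retire from the head only; the RF is updated here."""
        while self.entries and self.entries[0]["done"]:
            e = self.entries.pop(0)
            self.regs[e["dest"]] = e["value"]

    def flush(self):
        """Interrupt: drop every uncommitted instruction."""
        self.entries.clear()
```

Because the register file changes only in `commit`, flushing the RUU is enough to leave a precise state behind.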
Where is the Rename Table?
• It’s the RUU
  – @ decode insts scan for the most recent update to register
  – If none found, then register in register file
  – Otherwise, get RUU entry # as tag
• Interrupts?
  – Simply flush RUU
• Pros/Cons:
  – Associative lookup for decode
  – RUU ports limit when consumers can read a value