© A. Moshovos (ECE, Toronto) ECE1773 – Spring 2002

ILP, cont.

• Maintaining Sequential Appearance
  – Precise Interrupts
  – RUU approach to OoO Scheduling

Superscalar Processors: The Big Picture

Program Form:
• Static program
• Dynamic inst. stream (trace)
• Execution window
• Completed instructions

Processing Phase:
• Fetch and CT prediction
• Dispatch / dataflow
• Inst. issue
• Inst. execution
• Inst. reorder & commit

A Generic Superscalar OOO Processor

[Figure: block diagram with Pre-decode, I-CACHE, buffer, Rename, Dispatch, two schedulers, Reorder buffer, two RFs, two groups of FUs, and the Memory Interface]

Maintaining Sequential Semantics

• What if execution gets interrupted at an arbitrary point?
  – All insts. before that point commit
  – None thereafter
• We’ll focus on interrupts
• Same mechanisms used today to support SPECULATIVE EXECUTION
• “Definition”: Instr. executes speculatively up to complete. We don’t know yet if we should have executed this instr. Verification happens at commit (if ever).

Interrupts

• Examples
  – Power Failing, Arithmetic Overflow
  – I/O Device Request, OS Call, Page Fault
  – Invalid Opcode, Breakpoint, Protection Viol.
• Aka Faults, Exceptions, or Traps
• Requirements
  – Surprise Jump (to vectored Address)
  – Linking Return Address
  – Saving State
  – Changing State (e.g., kernel mode)

Classifying Interrupts

• 1a. Synchronous
  – Function of program state: overflow, page fault, etc.
• 1b. Asynchronous
  – e.g., External device or malfunction
• 2a. User Request
  – OS Call
• 2b. Coerced
  – From OS or hardware: page fault, protection violation
• 3a. User Maskable
  – User can disable processing
• 3b. Non-Maskable
  – Guess!!!

Classifying Interrupts, contd.

• 4a. Between Instructions
  – Usually Asynchronous
• 4b. Within an Instruction
  – Usually Synchronous
  – Harder to deal with, why???
• 5a. Resume
  – As if nothing happened as far as the program is concerned
• 5b. Catastrophic
  – Say, bye bye, program is leaving us

Restartable Pipelines

• Interrupts within an instruction are not catastrophic
• Most machines support this
  – Needed for virtual memory
• Some machines did not support this
  – Cost & Slowdown
• PRECISE INTERRUPTS are key
  – As if the interrupt happened at a well-defined point in the original sequential order
  – First let’s consider a simple DLX-style pipeline

Precise Interrupts

• Sequential Semantics
• Complete instructions before the offending instruction
• Squash (effects of) instructions after
• Save PC
• Force trap instruction into FETCH stage
  – divert execution to interrupt handler
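
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mechanism of any particular machine: the `Machine` class, the `take_precise_interrupt` function, the `saved_pc` field, and the choice to save the offending instruction's PC (so that it can be re-executed, as after a page fault) are all hypothetical names and choices made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Machine:
    regs: Dict[str, int]
    fetch_pc: int = 0
    saved_pc: int = -1  # hypothetical register holding the interrupted PC


def take_precise_interrupt(m: Machine,
                           in_flight: List[Tuple[int, str, int]],
                           fault_index: int,
                           handler_pc: int) -> None:
    """in_flight holds (pc, dest_reg, result) tuples, oldest first."""
    # 1. Complete every instruction before the offending one.
    for _pc, dest, result in in_flight[:fault_index]:
        m.regs[dest] = result
    # 2. Squash the offending instruction and everything after it:
    #    their results are simply never written.
    faulting_pc = in_flight[fault_index][0]
    in_flight.clear()
    # 3. Save the PC and divert fetch to the interrupt handler.
    m.saved_pc = faulting_pc
    m.fetch_pc = handler_pc


m = Machine(regs={"R1": 0, "R2": 0, "R3": 0})
window = [(1000, "R1", 5), (1001, "R2", 6), (1002, "R3", 7)]
take_precise_interrupt(m, window, fault_index=1, handler_pc=0x80)
print(m.regs, m.saved_pc, hex(m.fetch_pc))
```

Only R1 is updated: the instruction at PC 1001 faults, so neither its result nor that of the younger instruction at 1002 becomes visible.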

Precise Interrupts

• Jim Smith and Andrew Pleszkun Paper
• Original work was for a “simple” pipeline
• Today the same principles are used in virtually all modern microprocessors
  – Support for SPECULATIVE EXECUTION
    • executing an instruction without knowing whether we should
    • more on this later
  – and of course, precise interrupts
• We’ll stick to precise interrupts for the time being

Do the Simple Thing First

• Modify State only when all preceding insts. are KNOWN to be exception free.
• Mechanism: Result Shift Register
• Stage = cycle
• At FETCH: Reserve all stages for the duration of the instruction

            stage   FU    DR   V   PC
oldest      1       DIV   R1   1   1000
            n-1     ADD   R2   1   1001
youngest    n       SUB   R3   1   1002
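
A minimal Python sketch of a result shift register that forces in-order completion. It assumes one common reading of "reserve all stages": an instruction with latency `k` takes stage `k` and also fills every still-free earlier stage, so a younger, shorter instruction cannot slip ahead of it. The class and function names, the latencies (4 cycles for DIV, 1 for ADD and SUB), and the one-issue-per-cycle loop are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    fu: Optional[str]    # None marks a reserved-but-empty filler stage
    dest: Optional[str]
    pc: Optional[int]


class ResultShiftRegister:
    def __init__(self, n_stages: int):
        # stages[1] writes back next; stages[0] is unused padding.
        self.stages: List[Optional[Slot]] = [None] * (n_stages + 1)

    def try_issue(self, fu: str, dest: str, pc: int, latency: int) -> bool:
        if self.stages[latency] is not None:
            return False                      # stage taken: stall this cycle
        self.stages[latency] = Slot(fu, dest, pc)
        for i in range(1, latency):           # block the earlier free stages
            if self.stages[i] is None:
                self.stages[i] = Slot(None, None, None)
        return True

    def tick(self) -> Optional[Slot]:
        """Advance one cycle; return the instruction that writes back, if any."""
        leaving = self.stages[1]
        self.stages = [None] + self.stages[2:] + [None]
        if leaving is not None and leaving.fu is not None:
            return leaving
        return None

    def empty(self) -> bool:
        return all(s is None for s in self.stages[1:])


def run(program, n_stages):
    rsr = ResultShiftRegister(n_stages)
    pending = list(program)
    completed = []
    while pending or not rsr.empty():
        if pending and rsr.try_issue(*pending[0]):
            pending.pop(0)
        done = rsr.tick()
        if done is not None:
            completed.append(done.pc)
    return completed


program = [("DIV", "R1", 1000, 4), ("ADD", "R2", 1001, 1), ("SUB", "R3", 1002, 1)]
print(run(program, n_stages=4))  # [1000, 1001, 1002]
```

The ADD stalls behind the filler stages left by the DIV, which is exactly the latency amplification that in-order completion pays for its simplicity.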

Simple Solution Discussion

• Essentially In-Order Completion
  – Simple
    • Easy to implement
  – Performance?
    • Execution overlap still possible
    • Writebacks in order
    • Amplifies latencies
    • Dependent Instructions wait longer

Allowing out-of-order completes

• Add one more state for instruction execution:
  – COMPLETE & COMMIT
• COMPLETE:
  – Result calculated
  – Dependent instructions can use
  – BUT, don’t know if preceding instructions are all OK
  – I.e., don’t know if this instruction should have executed now based on the original program order
• COMMIT:
  – All preceding instructions executed with no problems
  – Can safely commit state changes

OOO Complete & IO Commit

• Want: Out-of-Order Completion
  – Allow OOO completion
  – Maintain in-order COMMIT
  – Allow maximum overlap
  – Guarantee precise state if needed
• How does this improve performance?

[Figure: timelines (time runs downward) comparing In-Order Complete with OOO Complete plus in-order commits, for the sequence:
  DIV R3, _, _
  ADD R1, _, _
  ADD _, R1, _ ]

Reorder Buffer

• Result Shift Register:
  – Reserve Result Bus
  – Out-of-Order Completion
• Reorder Buffer
  – Defer Commits and do them in-order
  – Allow OOO Completes by buffering state

Result Shift Register (when to complete; entries move in the direction marked "motion" in the figure):

st.   FU    V   TAG
      ADD       5
      DIV       4

REORDER BUFFER (when to commit; entries move in the direction marked "motion" in the figure):

        TAG   DR   RES   V   E   PC
head    4     R1                 1000
tail    5     R2                 1001

res = result, v = valid, e = result NYA
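
The reorder buffer idea can be sketched as a small Python class: entries are allocated in program order, filled in whatever order results arrive, and retired strictly from the head. The names `RobEntry`, `ReorderBuffer`, `allocate`, `complete`, and `commit` are hypothetical, and the result shift register that schedules the result bus is left out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RobEntry:
    tag: int
    dest: str
    pc: int
    result: Optional[int] = None
    valid: bool = False       # result has been produced
    exception: bool = False


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # head = oldest, tail = youngest
        self.next_tag = 0

    def allocate(self, dest: str, pc: int) -> int:
        entry = RobEntry(self.next_tag, dest, pc)
        self.next_tag += 1
        self.entries.append(entry)
        return entry.tag

    def complete(self, tag: int, result: int, exception: bool = False) -> None:
        """May be called in any order (out-of-order completion)."""
        for e in self.entries:
            if e.tag == tag:
                e.result, e.valid, e.exception = result, True, exception
                return
        raise KeyError(tag)

    def commit(self, regfile: Dict[str, int]) -> Optional[int]:
        """Retire from the head, in order. Return the PC of an excepting
        instruction (after squashing everything younger), else None."""
        while self.entries and self.entries[0].valid:
            head = self.entries.popleft()
            if head.exception:
                self.entries.clear()
                return head.pc
            regfile[head.dest] = head.result
        return None


rob = ReorderBuffer()
regfile: Dict[str, int] = {}
t_div = rob.allocate("R1", 1000)
t_add = rob.allocate("R2", 1001)
rob.complete(t_add, 7)       # ADD finishes first
rob.commit(regfile)          # nothing retires: DIV at the head is not done
rob.complete(t_div, 3)
rob.commit(regfile)          # now both retire, in program order
print(regfile)
```

Because state changes happen only in `commit`, an exception found at the head leaves the register file exactly as the sequential program would have left it.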

Reorder Buffer Complications

• State is kept in the reorder buffer
• Have to bypass from every entry
  – Need to determine latest write w/ respect to the consuming instruction

[Figure: RF and RB]

Essentially:
1. In-Order Commits
2. Buffer speculative state till commit

Speculative State Updates

• Two fundamental approaches
  – Do changes but keep a record of old state
  – Everything OK?
    • Just discard record of changes
    • HISTORY BUFFER
  – Keep two states:
    • Architectural and Speculative
    • On COMPLETE write state to Speculative
    • On ISSUE read from Speculative
    • On COMMIT write to Architectural
    • On Error, throw out Speculative state
    • FUTURE FILE

History Buffer

• Allow out-of-order register file updates
• At decode record current value of target register in the HB
  – notice that this is the previous value the register had
• On Commit?
  – Do nothing, state is fine
• On Exception?
  – Use History to UNDO changes made

[Figure: RF and HB; labels: results, source operands, destination registers, exception]
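
A minimal Python sketch of the history buffer approach. It assumes that no two in-flight instructions write the same register (the issue logic is taken to stall on such hazards), so the value read at decode really is the previous architectural value. The class name, method names, and sequence numbers are hypothetical.

```python
from typing import Dict, List, Tuple


class HistoryBufferMachine:
    def __init__(self, regs: Dict[str, int]):
        self.rf = dict(regs)
        # (sequence number, destination register, old value), oldest first
        self.history: List[Tuple[int, str, int]] = []
        self.next_seq = 0

    def decode(self, dest: str) -> int:
        """Record the value the destination holds before it is overwritten."""
        seq = self.next_seq
        self.next_seq += 1
        self.history.append((seq, dest, self.rf[dest]))
        return seq

    def complete(self, dest: str, value: int) -> None:
        self.rf[dest] = value          # register file is updated out of order

    def commit_oldest(self) -> None:
        self.history.pop(0)            # state is already fine, just drop the record

    def exception(self, seq: int) -> None:
        """Undo, youngest first, every write from instruction `seq` onward."""
        while self.history and self.history[-1][0] >= seq:
            _, dest, old = self.history.pop()
            self.rf[dest] = old


hb = HistoryBufferMachine({"R1": 10, "R2": 20, "R3": 30})
s0 = hb.decode("R1")
s1 = hb.decode("R2")
s2 = hb.decode("R3")
hb.complete("R3", 33)      # youngest finishes first
hb.complete("R2", 22)
hb.complete("R1", 11)
hb.commit_oldest()         # s0 commits, nothing else to do
hb.exception(s1)           # s1 faults: roll back s2, then s1
print(hb.rf)
```

The undo loop walks the buffer entry by entry, which is the "slow response to interrupts" cost of this scheme.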

History Buffer Discussion

• Simple Mechanism
• Additional Register File Port
• Single Source for Input Operands
• Normal Instruction processing Not changed by much
  – Control mostly unchanged
  – Nothing to do on Commit for the common case
• Slow response to Interrupts
  – Need to scan through HB
  – Complex?

Future File: The Optimist’s View

• Two Register Files:
  – One updated Out-of-Order (FUTURE)
    • assume no exception will occur
  – One updated in Order (ARCHITECTURAL)
• Advantage: No delay to restore state on exception

[Figure: RF, RB and FF; labels: source operands, results]
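
A minimal Python sketch of the future file idea, with two register files and a reorder buffer feeding the architectural one. As assumptions for the example: in-flight instructions write distinct registers, `exception()` is called only once all older instructions have committed, and the class and method names are hypothetical. Copying the architectural file stands in for what hardware would do by simply switching to it.

```python
from typing import Dict, List, Optional


class FutureFileMachine:
    def __init__(self, regs: Dict[str, int]):
        self.arch = dict(regs)      # updated in order, at commit
        self.future = dict(regs)    # updated out of order, at complete
        self.rob: List[List] = []   # [dest, value]; value is None until complete

    def decode(self, dest: str) -> List:
        entry = [dest, None]
        self.rob.append(entry)
        return entry

    def read(self, reg: str) -> int:
        return self.future[reg]     # source operands come from the future file

    def complete(self, entry: List, value: int) -> None:
        entry[1] = value
        self.future[entry[0]] = value

    def commit(self) -> bool:
        """Retire the head if it has completed; return whether it did."""
        if not self.rob or self.rob[0][1] is None:
            return False
        dest, value = self.rob.pop(0)
        self.arch[dest] = value
        return True

    def exception(self) -> None:
        self.rob.clear()
        self.future = dict(self.arch)   # throw out the speculative state


ff = FutureFileMachine({"R1": 1, "R2": 2})
a = ff.decode("R1")
b = ff.decode("R2")
ff.complete(b, 20)                       # younger instruction completes first
print(ff.read("R2"), ff.arch["R2"])      # future sees 20, architectural still 2
ff.exception()                           # older instruction faults
print(ff.read("R2"))
```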

How Do These Relate to Register Renaming?

• Physical Registers provide sufficient storage for both speculative and architectural state
• It’s the register map table that determines what is the current state
• On interrupt we have to restore the map table
  – Values are there in the physical register file
• History and Future approaches still valid
  – History: keep track of changes to register map table
  – On interrupt undo them one by one
  – Future: keep two tables
    • Speculative: updated at decode
    • Architectural: updated at commit
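
Both recovery styles can be sketched on a register map table in Python. The `RenameMaps` class, its free list, and the method names are hypothetical, and the example ignores details such as running out of physical registers. `interrupt_history` undoes logged mappings one by one; `interrupt_future` copies the architectural table over the speculative one.

```python
from typing import Dict, List, Tuple


class RenameMaps:
    def __init__(self, arch_regs: List[str], n_phys: int):
        self.spec: Dict[str, int] = {r: i for i, r in enumerate(arch_regs)}
        self.arch: Dict[str, int] = dict(self.spec)
        self.free: List[int] = list(range(len(arch_regs), n_phys))
        # (arch reg, new physical reg, previous physical reg), oldest first
        self.log: List[Tuple[str, int, int]] = []

    def rename_dest(self, reg: str) -> int:
        """At decode: give the destination a fresh physical register."""
        new = self.free.pop(0)
        self.log.append((reg, new, self.spec[reg]))
        self.spec[reg] = new
        return new

    def commit(self) -> None:
        """At commit: the oldest mapping becomes architectural."""
        reg, new, old = self.log.pop(0)
        self.arch[reg] = new
        self.free.append(old)          # previous mapping is no longer needed

    def interrupt_history(self) -> None:
        while self.log:                # undo youngest first, one by one
            reg, new, old = self.log.pop()
            self.spec[reg] = old
            self.free.append(new)

    def interrupt_future(self) -> None:
        self.free.extend(new for _, new, _ in self.log)
        self.log.clear()
        self.spec = dict(self.arch)    # speculative table := architectural


def demo(recover):
    m = RenameMaps(["R1", "R2"], n_phys=6)
    m.rename_dest("R1")    # R1 -> p2
    m.rename_dest("R1")    # R1 -> p3
    m.rename_dest("R2")    # R2 -> p4
    m.commit()             # first write to R1 commits, p0 is freed
    recover(m)             # interrupt: discard the two uncommitted renames
    return m.spec, sorted(m.free)


print(demo(RenameMaps.interrupt_history))
print(demo(RenameMaps.interrupt_future))
```

Both calls end with the same map, since only the table is restored and the values themselves never leave the physical register file.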

RUU

• Sohi’s Paper
• Common Mechanism for Precise Interrupts and OOO Execution
• Register Update Unit
  – A collection of Reservation stations
  – Organized as a FIFO queue
  – Instructions Enter In-order at FETCH
  – They Exit In-Order at COMMIT
    • Register File updates happen at this point.
• SimpleScalar follows this model
  – Well, mostly
  – Cuts corners on when Completes become visible

RUU: OOO Execution

• Decode:
  – Check RUU for most recent write to register
  – If none found, read value from RF
    • Do it in parallel really
  – If found, link to producer with a TAG
    • RUU number is the TAG
• Issue
  – Wait till all input operands are ready
• Complete
  – Broadcast value and RUU ID
    • Waiting instructions will pick value up
• Commit
  – Head and Tail pointer for FIFO operation
  – Only when everyone before has committed
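
The four steps above can be sketched as a toy RUU in Python. This is a simplified illustration, not Sohi's design or SimpleScalar's code: the names `RuuEntry`, `RUU`, `decode`, `ready`, `execute`, `commit`, and `flush` are hypothetical, timing and port limits are ignored, and a consumer is allowed to read a finished but uncommitted result straight from the producer's entry.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class RuuEntry:
    tag: int
    dest: str
    op: Callable[..., int]
    operands: List[Tuple[str, int]]   # ("val", value) or ("tag", producer tag)
    result: Optional[int] = None
    done: bool = False


class RUU:
    def __init__(self, regfile: Dict[str, int]):
        self.rf = dict(regfile)
        self.fifo: List[RuuEntry] = []    # index 0 is the head (oldest)
        self.next_tag = 0

    def decode(self, dest: str, op: Callable[..., int], srcs: List[str]) -> int:
        operands = []
        for reg in srcs:
            # Most recent in-flight write to reg, if any (scan youngest first).
            producer = next((e for e in reversed(self.fifo) if e.dest == reg), None)
            if producer is None:
                operands.append(("val", self.rf[reg]))
            elif producer.done:
                operands.append(("val", producer.result))
            else:
                operands.append(("tag", producer.tag))
        entry = RuuEntry(self.next_tag, dest, op, operands)
        self.next_tag += 1
        self.fifo.append(entry)
        return entry.tag

    def ready(self) -> List[RuuEntry]:
        """Entries whose input operands are all available (may issue)."""
        return [e for e in self.fifo
                if not e.done and all(kind == "val" for kind, _ in e.operands)]

    def execute(self, tag: int) -> None:
        entry = next(e for e in self.fifo if e.tag == tag)
        entry.result = entry.op(*(v for _, v in entry.operands))
        entry.done = True
        # Complete: broadcast (tag, value); waiting entries pick the value up.
        for e in self.fifo:
            e.operands = [("val", entry.result) if o == ("tag", tag) else o
                          for o in e.operands]

    def commit(self) -> int:
        """Retire finished entries from the head; return how many retired."""
        retired = 0
        while self.fifo and self.fifo[0].done:
            head = self.fifo.pop(0)
            self.rf[head.dest] = head.result
            retired += 1
        return retired

    def flush(self) -> None:
        self.fifo.clear()                 # interrupt: drop all in-flight work


ruu = RUU({"R2": 8, "R4": 2})
t_div = ruu.decode("R3", operator.floordiv, ["R2", "R4"])   # long-latency DIV
t_add = ruu.decode("R1", operator.add, ["R2", "R4"])
t_use = ruu.decode("R5", operator.add, ["R1", "R2"])        # waits on t_add's tag
ruu.execute(t_add)            # completes before the older DIV
ruu.execute(t_use)
print(ruu.commit())           # DIV at the head is not done, nothing retires
ruu.execute(t_div)
print(ruu.commit(), ruu.rf)
```

Note that `decode` plays the role of the rename table: the tag it hands back is simply the RUU entry number of the most recent producer.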

Where is the Rename Table?

• It’s the RUU
  – @ decode insts scan for the most recent update to register
  – If none found, then register in register file
  – Otherwise, get RUU entry # as tag
• Interrupts?
  – Simply flush RUU
• Pros/Cons:
  – Associative lookup for decode
  – RUU ports limit when consumers can read a value