Runahead Execution:
An Alternative to Very Large Instruction Windows for Out-of-order
Processors
Onur Mutlu, The University of Texas at Austin
Jared Stark, Microprocessor Research, Intel Labs
Chris Wilkerson, Desktop Platforms Group, Intel Corp
Yale N. Patt, The University of Texas at Austin
Presented by: Mark Teper
Outline
- The Problem
- Related Work
- The Idea: Runahead Execution
- Details
- Results
- Issues
Brief Overview
Instruction window: the set of in-order instructions that have not yet been committed.
Scheduling window: the set of unexecuted instructions that have not yet been selected for execution.
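The two windows above can be illustrated with a minimal sketch (a hypothetical model, not code from the paper): the instruction window holds all in-flight instructions in program order, and commit is in order, so the oldest unfinished instruction blocks everything behind it.

```python
# Minimal sketch (assumed model): an in-order instruction window.
from collections import deque

class Window:
    def __init__(self, size):
        self.size = size
        self.slots = deque()            # program order, oldest first

    def full(self):
        return len(self.slots) >= self.size

    def insert(self, insn):
        assert not self.full()
        self.slots.append(insn)

    def commit_ready(self):
        # Commit is in order: only the oldest instruction may leave,
        # and only once it has finished executing.
        committed = []
        while self.slots and self.slots[0]["done"]:
            committed.append(self.slots.popleft())
        return committed

window = Window(4)
for i in range(4):
    window.insert({"pc": i, "done": i != 1})   # pc=1 is a long-latency load
print(window.full())                            # True: window is full
print([x["pc"] for x in window.commit_ready()]) # [0]: commit blocks at pc=1
```

The unfinished instruction at pc=1 stalls commit even though younger instructions have finished, which is exactly the problem the next slides describe.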
What can go wrong?
[Diagram: program flow feeding the instruction window, scheduling windows, and execution units]
The Problem
A long-running instruction (e.g., a load that misses to memory) cannot commit, so the instruction window fills up behind it and the processor stalls.
[Diagram: program flow through the instruction window, showing unexecuted, executing, long-running, and committed instructions]
Filling the Instruction Window
[Chart: larger instruction windows yield better IPC]
Related Work
Caches: alter the size and structure of caches; attempt to reduce unnecessary memory reads.
Prefetching: attempt to fetch data into a nearby cache before it is needed; both hardware and software techniques exist.
Other techniques: the waiting instruction buffer (WIB); long-latency block retirements.

Approximate access latencies:
- CPU to L1 cache: 1 cycle
- CPU to L2 cache: 10 cycles
- CPU to memory: 1000 cycles
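The latencies above show why even a large window struggles: a back-of-the-envelope calculation (window size and fill rate are illustrative assumptions, not figures from the paper) makes the gap concrete.

```python
# Back-of-the-envelope (illustrative numbers): cycles a full instruction
# window can hide versus the cost of one main-memory access.
MEM_LATENCY = 1000    # cycles for a memory access (from the slide)
WINDOW_SIZE = 128     # assumed instruction window size
FILL_RATE   = 1       # assumed instructions fetched per cycle

cycles_hidden = WINDOW_SIZE // FILL_RATE   # cycles until the window is full
stall_cycles  = MEM_LATENCY - cycles_hidden
print(stall_cycles)   # 872 cycles during which the processor does nothing
```

Under these assumptions, most of the miss latency is pure stall, which motivates either a much larger window or an alternative such as runahead.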
RunAhead Execution
Continue executing instructions during long stalls; disregard the results once the stalled data is available.
[Diagram: program flow through the instruction window, with runahead execution proceeding past the long-running instruction]
Benefits
- Acts as a high-accuracy prefetcher: software prefetchers have less information, and hardware prefetchers cannot analyze the code as well.
- Trains (biases) branch predictors ahead of time.
- Makes use of cycles that are otherwise wasted.
Entering RunAhead
- The processor can enter runahead mode at any point; the paper uses L2 cache misses as the trigger.
- The architecture must be able to checkpoint and restore register state, including the branch-history register and the return address stack.
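The checkpoint/restore requirement can be sketched as follows (a minimal model with assumed names, not the paper's hardware design): on entry, snapshot the architectural state; on exit, throw away everything runahead produced and restore the snapshot.

```python
# Hypothetical sketch of entering/exiting runahead mode: checkpoint the
# architectural state (registers, branch-history register, return address
# stack) on entry, and restore it on exit. All names are assumptions.
import copy

class CpuState:
    def __init__(self):
        self.regs = {f"r{i}": 0 for i in range(8)}
        self.branch_history = 0
        self.return_stack = []

class RunaheadCpu:
    def __init__(self):
        self.state = CpuState()
        self.checkpoint = None
        self.runahead = False

    def enter_runahead(self):
        self.checkpoint = copy.deepcopy(self.state)  # snapshot everything
        self.runahead = True

    def exit_runahead(self):
        self.state = self.checkpoint    # discard all runahead-mode results
        self.checkpoint = None
        self.runahead = False

cpu = RunaheadCpu()
cpu.state.regs["r1"] = 42
cpu.enter_runahead()
cpu.state.regs["r1"] = 7            # speculative update during runahead
cpu.exit_runahead()
print(cpu.state.regs["r1"])         # 42: checkpointed value restored
```

Real hardware would checkpoint with shadow register files rather than a deep copy, but the invariant is the same: no runahead-mode update survives the restore.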
Handling the Avoided Read
The load that triggers runahead returns immediately; its value is marked INV, and the processor continues fetching and executing instructions.

  ld  r1, [r2]     ; misses in L2 -> r1 marked INV
  add r3, r2, r2   ; both sources valid -> r3 valid
  add r3, r1, r2   ; r1 is INV -> r3 marked INV
  mov r1, 0        ; immediate write -> r1 valid again
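The INV-propagation rules in the snippet above can be sketched as a small dataflow pass (an assumed model for illustration, not the paper's implementation):

```python
# Sketch (assumed model): each register carries an INV bit during runahead.
# A load that misses sets INV; any op with an INV source produces an INV
# result; writing a known value (e.g. an immediate) clears INV.
def run(insns):
    inv = set()                       # registers currently marked INV
    for op, dst, *srcs in insns:
        if op == "ld_miss":           # L2 miss: result is bogus
            inv.add(dst)
        elif any(s in inv for s in srcs):
            inv.add(dst)              # INV propagates through dataflow
        else:
            inv.discard(dst)          # result is computable, so valid
    return inv

program = [
    ("ld_miss", "r1", "r2"),        # ld  r1, [r2] -> r1 INV
    ("add",     "r3", "r2", "r2"),  # r3 valid
    ("add",     "r3", "r1", "r2"),  # r1 INV -> r3 INV
    ("mov",     "r1"),              # mov r1, 0 -> r1 valid again
]
print(sorted(run(program)))         # ['r3']
```

Note that INV is not sticky per register: overwriting an INV register with a computable value makes it valid again, as the final `mov` shows.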
Executing Instructions in RunAhead
- Instructions are fetched and executed as normal.
- Instructions are pseudo-retired out of the instruction window in program order.
- If an instruction's source registers are INV, it can be retired without executing.
- No data is ever made observable outside the CPU.
Branches during RunAhead
Divergence points: an incorrect branch prediction on an INV value.
Does the branch depend on an INV value?
- Yes: assume the predictor is correct and continue execution.
- No: evaluate the branch. Was the branch predictor correct?
  - Yes: continue execution.
  - No: flush the instruction queue.
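The decision tree above can be written out directly (a sketch of the slide's flowchart; the function name and signature are assumptions):

```python
# Hypothetical sketch of the branch-handling decision tree during runahead.
def handle_branch(depends_on_inv, predicted_taken, actual_taken):
    """Return the action the runahead pipeline takes for one branch."""
    if depends_on_inv:
        # Outcome is unknowable: trust the predictor and keep going.
        return "follow prediction"
    if predicted_taken == actual_taken:
        return "continue"             # prediction verified against outcome
    return "flush"                    # mispredict: flush the instruction queue

print(handle_branch(True,  True,  False))  # follow prediction
print(handle_branch(False, True,  True))   # continue
print(handle_branch(False, True,  False))  # flush
```

A branch that depends on an INV value is a potential divergence point: if the predictor was wrong there, runahead silently wanders down the wrong path, which is why such branches cannot be verified.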
Exiting RunAhead
- Occurs when the stalling memory access finally returns.
- The checkpointed architectural state is restored, and all instructions in the machine are flushed.
- The processor starts fetching again at the instruction that caused runahead execution; the paper presents an optimization in which fetching restarts slightly before the stalled access returns.
Biasing Branch Predictors
RunAhead can cause a branch predictor to be trained twice on the same branch. Several alternatives:
(1) Always train the branch predictor.
(2) Never train the branch predictor.
(3) Keep a list of branches predicted during runahead.
(4) Use a separate branch predictor for runahead mode.
RunAhead Cache
RunAhead execution disregards stores: they must not produce externally observable results. However, store data is needed for store-to-load communication, as in the loop below. Solution: the runahead cache.

Loop:
  …
  store r1, [r2]     ; the later load must see this value
  add   r1, r3, r1
  store r1, [r4]
  load  r1, [r2]     ; reads the value stored above
  bne   r1, r5, Loop
Stores and Loads in RunAhead
Loads:
1. If the address is INV, the data is automatically INV.
2. Otherwise, look in: (1) the store buffer, (2) the runahead cache.
3. Finally, go to memory: on a cache hit, treat the data as valid; on a miss, treat it as INV and do not stall.
Stores:
1. Use the store buffer as usual.
2. On pseudo-retirement: if the address is INV, ignore the store; otherwise, write the data to the runahead cache.
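The load lookup order and store handling above can be sketched as a small model (class and method names are assumptions for illustration; real hardware uses tagged cache structures, not dictionaries):

```python
# Sketch of the slide's load/store rules in runahead mode (assumed model).
INV = object()   # sentinel for an invalid address or value

class RunaheadMemory:
    def __init__(self, memory):
        self.memory = memory          # backing memory (addr -> value)
        self.store_buffer = {}        # in-flight stores, as usual
        self.runahead_cache = {}      # pseudo-retired runahead stores

    def load(self, addr):
        if addr is INV:
            return INV                # INV address -> data automatically INV
        if addr in self.store_buffer:
            return self.store_buffer[addr]
        if addr in self.runahead_cache:
            return self.runahead_cache[addr]
        # Go to memory: a hit is valid data; a miss is INV, never a stall.
        return self.memory.get(addr, INV)

    def store(self, addr, value):
        if addr is not INV:           # INV address: ignore the store
            self.store_buffer[addr] = value

    def pseudo_retire(self, addr):
        # On runahead pseudo-retirement, move the store into the runahead
        # cache so later runahead loads can still observe it.
        if addr in self.store_buffer:
            self.runahead_cache[addr] = self.store_buffer.pop(addr)

mem = RunaheadMemory({0x10: 5})
mem.store(0x20, 9)
print(mem.load(0x20))          # 9: forwarded from the store buffer
mem.pseudo_retire(0x20)
print(mem.load(0x20))          # 9: now served by the runahead cache
print(mem.load(0x10))          # 5: memory hit, valid data
print(mem.load(0x30) is INV)   # True: miss treated as INV, no stall
```

The key property is the last line: a load that would normally stall the pipeline instead returns INV immediately, so runahead keeps making forward progress.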
Run-Ahead Cache Results
Not passing data from stores to loads resulted in poor performance: a significant number of loads returned INV results.
[Chart: performance with and without the runahead cache]
Details: Architecture
Results
[Chart]
Results (2)
[Chart]
Issues
- Some wrong assumptions about future machines: the paper's future baseline corresponds poorly to modern architectures.
- Not many details on the architectural requirements of the technique, which increases hardware area and power requirements.