Wish Branches
A Review of “Wish Branches: Enabling Adaptive and Aggressive Predicated Execution”
Russell Dodd - October 24, 2006
Motivation
Traditional Execution
Control dependencies can cause delays or stalls.
Hard-to-predict branches incur performance penalties.
Predicated Execution
Converts the control dependencies of branches into data dependencies.
Typical hardware cannot override the compiler’s choice of branch type.
If easy-to-predict branches are predicated unnecessarily, the result is processing overhead and possibly a performance loss.
Cannot remove backward branches.
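As a rough illustration of turning a control dependency into a data dependency, the C sketch below writes the same computation twice: once with a branch, and once in an if-converted form where both arms are computed and a predicate selects the result. The function names and the arithmetic are illustrative assumptions, not taken from the paper, and a real compiler would emit predicated machine instructions rather than C.

```c
#include <assert.h>
#include <stdbool.h>

/* Branching version: which assignment runs depends on control flow. */
int select_branch(bool cond, int b, int c)
{
    int a;
    if (cond) {
        a = b + 1;
    } else {
        a = c - 1;
    }
    return a;
}

/* If-converted version: both arms always execute; the predicate p
 * only decides which result is kept (a data dependency on p). */
int select_predicated(bool cond, int b, int c)
{
    bool p = cond;        /* predicate-defining compare */
    int t_then = b + 1;   /* computed whether or not p holds */
    int t_else = c - 1;   /* computed whether or not p holds */
    return p ? t_then : t_else;
}

/* Sanity check: the two versions must agree for the given inputs. */
void check_equivalent(bool cond, int b, int c)
{
    assert(select_branch(cond, b, c) == select_predicated(cond, b, c));
}
```

The predicated version never mispredicts, but it always pays for both arms, which is the overhead the slide refers to for easy-to-predict branches.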
Russell Dodd - October 24, 2006 Adapted from H. Kim et al
Proposed Solution
The compiler generates code with dual functionality: predicated and non-predicated.
The code is the same as predicated code, except that the predicated conditional branches are not removed.
These conditional branches are called wish branches.
Proposed Solution
Wish Branches
When a wish branch is fetched, the hardware uses a confidence estimator to estimate at runtime how hard the branch is to predict.
If the branch is hard to predict, the hardware executes the predicated version of the code to avoid a misprediction.
Otherwise, for easy-to-predict branches, it uses the branch predictor and ignores the predicate information.
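The fetch-time decision described here can be sketched in C as follows. This is a minimal sketch that assumes a per-branch saturating counter of consecutive correct predictions as the confidence estimator. The slides do not say which estimator is used, so the counter, CONF_MAX, CONF_THRESHOLD and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum wish_mode { HIGH_CONFIDENCE, LOW_CONFIDENCE };

/* Hypothetical parameters: not taken from the paper or the slides. */
#define CONF_MAX       15u  /* counter saturates here */
#define CONF_THRESHOLD 8u   /* at or above this: trust the predictor */

struct confidence_counter {
    unsigned count;         /* consecutive correct predictions so far */
};

/* Called when the branch resolves: count up on a correct prediction,
 * reset to zero on a misprediction. */
void confidence_update(struct confidence_counter *c, bool prediction_correct)
{
    assert(c != NULL);
    if (!prediction_correct) {
        c->count = 0;
    } else if (c->count < CONF_MAX) {
        c->count++;
    }
}

/* Called when a wish branch is fetched.
 * HIGH_CONFIDENCE: follow the branch predictor, ignore predicates.
 * LOW_CONFIDENCE:  execute the predicated version of the code. */
enum wish_mode wish_branch_mode(const struct confidence_counter *c)
{
    assert(c != NULL);
    return c->count >= CONF_THRESHOLD ? HIGH_CONFIDENCE : LOW_CONFIDENCE;
}
```

With these assumed parameters, a branch starts in low confidence mode, moves to high confidence after eight correct predictions in a row, and drops back on the first misprediction.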
Comparison
Figure 1 from “Wish Branches: Enabling Adaptive and Aggressive Predicated Execution”
(Figure not reproduced. Labels on the slides: C-code; Condition==FALSE; low confidence mode; high confidence mode; not used.)
Consequences
If the confidence estimate is accurate:
The processor either avoids the overhead of predicated execution or eliminates a branch misprediction.
If the wish branch is mispredicted:
In high confidence mode, the pipeline must be flushed.
In low confidence mode, the processor does not flush the pipeline; it executes the code as conventional predicated code.
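The cases on this slide can be summarized as a small decision function. The C sketch below is only a restatement of the slide under stated assumptions: the enum names and the three-way outcome are hypothetical labels, not terminology from the paper.

```c
#include <assert.h>
#include <stdbool.h>

enum wish_mode { HIGH_CONFIDENCE, LOW_CONFIDENCE };

/* Hypothetical labels for what the processor ends up paying. */
enum wish_outcome {
    NO_PENALTY,            /* predictor was trusted and was right */
    PREDICATION_OVERHEAD,  /* predicated code executed, no flush */
    PIPELINE_FLUSH         /* predictor was trusted and was wrong */
};

enum wish_outcome wish_branch_outcome(enum wish_mode mode,
                                      bool prediction_correct)
{
    assert(mode == HIGH_CONFIDENCE || mode == LOW_CONFIDENCE);

    if (mode == LOW_CONFIDENCE) {
        /* Predicated code runs whether or not the prediction was
         * right, so a misprediction does not cause a flush. */
        return PREDICATION_OVERHEAD;
    }
    /* High confidence: behaves like a normal conditional branch. */
    return prediction_correct ? NO_PENALTY : PIPELINE_FLUSH;
}
```

The only flush is the high confidence misprediction, which is why the accuracy of the confidence estimator matters so much.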
Wish Loops
Used for backward branches, since predication cannot directly remove them.
Use predication to reduce the branch misprediction penalty without removing the branch.
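To make the idea concrete, the C sketch below models a loop body guarded by a predicate: once the loop condition fails, any further iterations the front end fetches have no effect. It is a software model only. The struct, the function name and the fetched_iterations parameter (how many iterations the front end happened to fetch) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct loop_state {
    int a;
    int i;
};

/* Models: do { a++; i++; } while (i < limit);
 * executed for however many iterations the front end fetched,
 * with every body instruction guarded by the predicate p. */
struct loop_state run_predicated_loop(int fetched_iterations, int limit)
{
    assert(fetched_iterations >= 1);

    struct loop_state s = { 0, 0 };
    bool p = true;               /* extra instruction: initialize predicate */

    for (int n = 0; n < fetched_iterations; n++) {
        if (p) {                 /* (p) a++; (p) i++; */
            s.a++;
            s.i++;
        }
        p = p && (s.i < limit);  /* once false, p stays false */
    }
    return s;
}
```

With limit 3, fetching three or five iterations both leave a and i at 3, since the two extra iterations act as no-ops. Fetching only two iterations leaves the loop unfinished, which is the case that still needs a flush.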
Comparison
Figure 2 from “Wish Branches: Enabling Adaptive and Aggressive Predicated Execution”
(Figure not reproduced. Labels on the slide: C-code; extra instruction to initialize the predicate.)
A Closer Look at the Wish Loop in Low Confidence
C-Code
do {
    a++;
    i++;
} while (i < 3);
Iteration 1: a = 1, i = 1, i < 3
Iteration 2: a = 2, i = 2, i < 3
Iteration 3: a = 3, i = 3, i !< 3, exit
What if a misprediction occurs during the loop?
Three cases can occur:
1 Early Exit
The loop iterates fewer times than it should.
The processor needs to execute the body one more time.
Requires a pipeline flush, just like a normal mispredicted branch.
(Slide annotation: “Early stop here”)
A Closer Look at the Wish Loop in Low Confidence
C-Code: the same do-while loop as on the previous slide.
Iteration 1: a = 1, i = 1, i < 3
Iteration 2: a = 2, i = 2, i < 3
Iteration 3: a = 3, i = 3, i !< 3, exit
Extra iteration: a = 4, i = 4, i !< 3, exit
2 Late Exit
The loop iterates a few more times than it should.
The extra blocks are allowed to go through the data path as no-ops.
Performs better than a normal backward branch, since the misprediction penalty is reduced.
(Slide annotation: “Late stop here”)
A Closer Look at the Wish Loop in Low Confidence
C-Code: the same do-while loop as on the previous slides.
Iteration 1: a = 1, i = 1, i < 3
Iteration 2: a = 2, i = 2, i < 3
Iteration 3: a = 3, i = 3, i !< 3, exit
…
Extra iteration N: a = N, i = N, i !< 3, exit
3 No Exit
Flush any extra blocks and continue.
Similar to a normal mispredicted branch.
If the extra blocks were treated as no-ops, as in the late-exit case, energy would be wasted.
(Slide annotation: “No exit”)
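The three misprediction cases can be told apart with a small classifier, sketched below in C. It assumes that "no exit" means the front end is still fetching loop iterations when the misprediction is detected, which the slides imply but do not spell out. The enum, the function names and the still_in_loop flag are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum loop_exit_case { CORRECT_EXIT, EARLY_EXIT, LATE_EXIT, NO_EXIT };

/* actual_iters:  iterations the loop should really execute
 * fetched_iters: iterations the front end fetched so far
 * still_in_loop: assumed flag, true if the front end is still
 *                fetching the loop when the misprediction is detected */
enum loop_exit_case classify_wish_loop(int actual_iters, int fetched_iters,
                                       bool still_in_loop)
{
    assert(actual_iters >= 1 && fetched_iters >= 1);

    if (still_in_loop) {
        /* Only a misprediction once fetch has run past the real exit. */
        assert(fetched_iters > actual_iters);
        return NO_EXIT;
    }
    if (fetched_iters < actual_iters) {
        return EARLY_EXIT;   /* left the loop too soon */
    }
    if (fetched_iters > actual_iters) {
        return LATE_EXIT;    /* extra iterations become no-ops */
    }
    return CORRECT_EXIT;
}

/* Per the slides: early exit and no exit flush; late exit does not. */
bool wish_loop_needs_flush(enum loop_exit_case c)
{
    return c == EARLY_EXIT || c == NO_EXIT;
}
```

For the three-iteration loop on these slides, fetching two iterations is an early exit, fetching four and then leaving is a late exit, and still fetching at iteration N is a no exit.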
Wish Branches in Complex Control Flow
Figure 3 from “Wish Branches: Enabling Adaptive and Aggressive Predicated Execution”
(Figure not reproduced. Panel labels on the slide: C-code; Normal; Predicated; Wish Branch.)
Wish Branch Support
ISA
Implementable in existing branch instructions.
Requires two hint bits for four choices (normal branch, wish jump, wish join and wish loop).
Compiler
Needs to decide which branches should be converted to wish branches.
Hardware
Requires an accurate confidence estimator.
Requires changes to the processor front-end and the branch misprediction detection module.
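Two hint bits are enough for the four branch types because 2^2 = 4. The C sketch below shows one possible encoding. The numeric values, the position of the bits in the hint field and the function names are hypothetical, since the slides do not give the actual IA-64 encoding.

```c
#include <assert.h>

/* Hypothetical assignment of the four choices to two bits. */
enum wish_kind {
    NORMAL_BRANCH = 0,
    WISH_JUMP     = 1,
    WISH_JOIN     = 2,
    WISH_LOOP     = 3
};

/* Assumed: the two lowest bits of a branch hint field. */
#define WISH_HINT_MASK 0x3u

/* Store the wish-branch kind in the hint field, keeping other bits. */
unsigned encode_wish_hint(unsigned hint_field, enum wish_kind kind)
{
    assert((unsigned)kind <= WISH_HINT_MASK);
    return (hint_field & ~WISH_HINT_MASK) | (unsigned)kind;
}

/* Recover the wish-branch kind from the hint field. */
enum wish_kind decode_wish_hint(unsigned hint_field)
{
    return (enum wish_kind)(hint_field & WISH_HINT_MASK);
}
```

Because the kind lives in hint bits, a processor without wish branch support could in principle ignore them and treat every wish branch as a normal branch.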
Advantages
Dynamically eliminates the performance overhead of predicated execution.
Allows compilers to generate code more aggressively, since the processor corrects the decisions at runtime.
Exploits predicated execution to reduce branch misprediction penalties for backward branches.
Increases code adaptivity to machines since processors dynamically decide when not to use predication.
Disadvantages
Requires extra branch instructions.
Increases contention for branch predictor table entries.
Increases interference in pattern history tables.
Reduces the size of basic blocks by adding control dependencies.
Reduces the scope of optimization.
Performance Evaluation
Implemented in the IA-64 ISA since it has full support for predication.
Used a superscalar out-of-order processor model.
An excerpt from “Wish Branches: Enabling Adaptive and Aggressive Predicated Execution” pg 54.
Results
Figure 4 from “Wish Branches: Enabling Adaptive and Aggressive Predicated Execution”
Creates large skew on average.
Results
Table 1 from “Wish Branches: Enabling Adaptive and Aggressive Predicated Execution”
Analysis
Wish branches improve performance because they divide the work between the compiler and the micro-architecture.
Compilers excel at analyzing control-flow graphs and generating predicated code.
Micro-architectures excel at making runtime decisions: whether to use predicated execution or branch prediction, based on dynamic program information.
Future Directions
Develop compiler algorithms to decide which branches to convert to wish branches.
Combat the reduced scope for code optimization.
Develop more accurate confidence estimation mechanisms.
Develop a hardware wish loop predictor.
My Thoughts
Branches are a large portion of code (~20%), so it’s worth the effort to improve them.
No mention of code size or instruction count.
The dynamic micro-instruction count is given in a previous paper, but no comparison to other styles is given.
Branch Predicting vs. Wish Branching
Hypothetically: if a branch predictor can be improved so that it is accurate most of the time at runtime, is it worthwhile to invest in dynamically choosing the branch type (predicated/normal) to improve prediction further?
My Thoughts
More hardware isn’t that big of a deal in a billion-transistor world.
(A previous paper details the hardware requirements and their complexity: H. Kim et al., “Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution”.)
Software would not require a large investment, since it is based on existing predicated execution.
However, compiler support to enable heuristics for optimization could be expensive. How much?
The improvement in execution time is a really interesting development for future processors.
How far could prediction be nested? Predict the confidence estimator’s results…