Generalized Index-Set Splitting
Christopher Barton
Arie Tal
Bob Blainey
Jose Nelson Amaral
Generalized Index-Set Splitting
● Overview
● Compile-time unknown split points
● Loops with multiple split points
● Removing “dead” inductive-branches
● Nested inductive branches
● Controlling code growth
● Results
● Conclusions
Index-Set Splitting
● Divide the index-set (range) of a loop into sub-ranges
● Each sub-range represents a separate loop

Original loop:

    for (i = 0; i < 100; i++) {
        if (i < 5)
            a[i] = 2 * a[i];
        else
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

After splitting:

    for (i = 0; i < 5; i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i = 5; i < 100; i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
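The equivalence of the two versions above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function names and the initial array contents are assumptions made for illustration.

```c
#include <string.h>

#define N 100

/* Original loop: the inductive branch is tested on every iteration. */
static void original_loop(int a[N], int b[N]) {
    for (int i = 0; i < N; i++) {
        if (i < 5)
            a[i] = 2 * a[i];
        else
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
}

/* After index-set splitting: two branch-free loops over 0..5 and 5..100. */
static void split_loop(int a[N], int b[N]) {
    for (int i = 0; i < 5; i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (int i = 5; i < N; i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
}

/* Hypothetical helper: returns 1 if both versions leave a and b identical. */
int versions_agree(void) {
    int a1[N], b1[N], a2[N], b2[N];
    for (int i = 0; i < N; i++)
        a1[i] = a2[i] = i + 1;   /* arbitrary test data */
    original_loop(a1, b1);
    split_loop(a2, b2);
    return memcmp(a1, a2, sizeof a1) == 0 &&
           memcmp(b1, b2, sizeof b1) == 0;
}
```

Calling versions_agree() from a test driver is enough to confirm that the split preserves the results for this input.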
Compile-time Unknown Split Points
● When the lower bound, upper bound, or split point is unknown at compile time:

Original loop:

    for (i = 0; i < 100; i++) {
        if (i < m)
            a[i] = 2 * a[i];
        else
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Split at m:

    for (i = 0; i < m; i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i = m; i < 100; i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Problem: m may be smaller than 0 or greater than 100!
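To see the problem concretely, the sketch below runs the unclamped pair of loops without touching any array and only records which indices would be used. The struct trace type and the function names are hypothetical, introduced for this illustration only.

```c
#include <limits.h>

/* Hypothetical record of what the loops did: number of iterations and the
   lowest and highest index used. No array is accessed, so the
   out-of-range cases can be observed safely. */
struct trace {
    int count;
    int min_index;
    int max_index;
};

static void record(struct trace *t, int i) {
    t->count++;
    if (i < t->min_index) t->min_index = i;
    if (i > t->max_index) t->max_index = i;
}

/* Split at m with no clamping of the bounds. */
struct trace naive_split_trace(int m) {
    struct trace t = {0, INT_MAX, INT_MIN};
    for (int i = 0; i < m; i++)      /* first loop: 0 to m */
        record(&t, i);
    for (int i = m; i < 100; i++)    /* second loop: m to 100 */
        record(&t, i);
    return t;
}
```

With m = 40 the trace covers exactly the 100 original iterations. With m = -3 the second loop starts at index -3, and with m = 120 the first loop runs up to index 119, so both cases leave the original range 0 to 100.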
Compile-time Unknown Split Points
● To overcome this problem, we will use the following sub-ranges:
   – First loop: 0 to min(m, 100)
   – Second loop: max(m, 0) to 100

    for (i = 0; i < min(m,100); i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i = max(m,0); i < 100; i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
Compile-time Unknown Split Points
● m ≤ 0: the first loop does not execute; the second loop iterates from 0 to 100
● 0 < m < 100: the first loop iterates from 0 to m; the second loop iterates from m to 100
● 100 ≤ m: the first loop iterates from 0 to 100; the second loop does not execute

    for (i = 0; i < min(m,100); i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i = max(m,0); i < 100; i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
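The three cases can be confirmed by counting iterations of each clamped loop. This is a small sketch under the assumption that min and max are plain integer helpers; min_int, max_int, and struct counts are names invented here.

```c
static int min_int(int x, int y) { return x < y ? x : y; }
static int max_int(int x, int y) { return x > y ? x : y; }

/* Hypothetical result type: iterations executed by each of the two loops. */
struct counts {
    int first;
    int second;
};

/* Runs the two clamped loops and counts their iterations. */
struct counts clamped_split_counts(int m) {
    struct counts c = {0, 0};
    for (int i = 0; i < min_int(m, 100); i++)   /* 0 to min(m,100) */
        c.first++;
    for (int i = max_int(m, 0); i < 100; i++)   /* max(m,0) to 100 */
        c.second++;
    return c;
}
```

For every m the two counts add up to 100, which is the iteration count of the original loop.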
Compile-time Unknown Split Points
● In general:
   – Given a range with lower bound lb, upper bound ub, and split point sp, we define two sub-ranges:
      ● lb to min(sp, ub)
      ● max(sp, lb) to ub

    for (i = lb; i < min(sp,ub); i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i = max(sp,lb); i < ub; i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

No other conditions are needed in addition to the loop structure
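The general rule can be exercised for arbitrary lb, ub, and sp. The sketch below compares the branching loop with the two clamped loops; the array length LEN, the helper names, and the precondition 0 <= lb and ub <= LEN are assumptions of this example, not something the slides specify.

```c
#include <string.h>

#define LEN 32   /* assumed array length; callers keep 0 <= lb and ub <= LEN */

static int min_int(int x, int y) { return x < y ? x : y; }
static int max_int(int x, int y) { return x > y ? x : y; }

/* Loop with the inductive branch (i < sp) still inside. */
static void reference_loop(int a[], int b[], int lb, int ub, int sp) {
    for (int i = lb; i < ub; i++) {
        if (i < sp)
            a[i] = 2 * a[i];
        else
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
}

/* The two sub-ranges: lb to min(sp,ub) and max(sp,lb) to ub. */
static void split_loops(int a[], int b[], int lb, int ub, int sp) {
    for (int i = lb; i < min_int(sp, ub); i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (int i = max_int(sp, lb); i < ub; i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
}

/* Hypothetical helper: returns 1 if both versions produce the same arrays. */
int split_matches_reference(int lb, int ub, int sp) {
    int a1[LEN], a2[LEN], b1[LEN] = {0}, b2[LEN] = {0};
    for (int i = 0; i < LEN; i++)
        a1[i] = a2[i] = i + 1;   /* arbitrary test data */
    reference_loop(a1, b1, lb, ub, sp);
    split_loops(a2, b2, lb, ub, sp);
    return memcmp(a1, a2, sizeof a1) == 0 &&
           memcmp(b1, b2, sizeof b1) == 0;
}
```

Trying split points below lb, inside the range, and above ub shows that the clamping alone keeps the loops inside the original range, with no extra guard around either loop.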
Loops With Multiple Split Points
● Multiple split points can be handled iteratively
● Instead, we build a sub-range tree:

    for (i = 0; i < 100; i++) {
        if (i < m)
            a[i] = 2 * a[i];
        if (i < n)
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Sub-range tree:

    0,100                              [split by if (i < m)]
        0,min(m,100)                   [split by if (i < n)]
            0,min(m,n,100)
            max(0,n),min(m,100)
        max(m,0),100                   [split by if (i < n)]
            max(m,0),min(n,100)
            max(m,n,0),100
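The leaves of the sub-range tree follow from applying the min/max rule once per inductive branch. The sketch below computes the four leaves for given m and n and checks that every index from 0 to 99 falls in exactly one of them. The struct layout and function names are assumptions for illustration; the slides do not describe the compiler's internal representation.

```c
struct range { int lo; int hi; };   /* sub-range lo to hi, hi excluded */

struct leaves {
    struct range then_then;   /* i <  m, i <  n: 0,min(m,n,100)      */
    struct range then_else;   /* i <  m, i >= n: max(0,n),min(m,100) */
    struct range else_then;   /* i >= m, i <  n: max(m,0),min(n,100) */
    struct range else_else;   /* i >= m, i >= n: max(m,n,0),100      */
};

static int min_int(int x, int y) { return x < y ? x : y; }
static int max_int(int x, int y) { return x > y ? x : y; }

/* Split node r at split point sp into its two children. */
static void split_node(struct range r, int sp,
                       struct range *then_r, struct range *else_r) {
    then_r->lo = r.lo;
    then_r->hi = min_int(sp, r.hi);
    else_r->lo = max_int(sp, r.lo);
    else_r->hi = r.hi;
}

/* Build the third level of the tree for the loop over 0..100. */
struct leaves build_leaves(int m, int n) {
    struct range root = {0, 100}, left, right;
    struct leaves out;
    split_node(root, m, &left, &right);                    /* if (i < m) */
    split_node(left, n, &out.then_then, &out.then_else);   /* if (i < n) */
    split_node(right, n, &out.else_then, &out.else_else);  /* if (i < n) */
    return out;
}

static int contains(struct range r, int i) {
    return r.lo <= i && i < r.hi;
}

/* Returns 1 if each index 0..99 lies in exactly one leaf sub-range. */
int covers_exactly_once(int m, int n) {
    struct leaves l = build_leaves(m, n);
    for (int i = 0; i < 100; i++) {
        int hits = contains(l.then_then, i) + contains(l.then_else, i) +
                   contains(l.else_then, i) + contains(l.else_else, i);
        if (hits != 1)
            return 0;
    }
    return 1;
}
```

Leaves whose lower bound is not below their upper bound are simply empty: for m = 30 and n = 70 the leaf max(0,n),min(m,100) is 70 to 30 and contributes no iterations.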
Loops with multiple split points
● The first level in the sub-range tree corresponds to the original loop range

    for (i = 0; i < 100; i++) {
        if (i < m)
            a[i] = 2 * a[i];
        if (i < n)
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

First level of the sub-range tree:

    0,100
Loops with multiple split points
● The second level of the tree corresponds to the two loops created to remove the (i < m) inductive branch

    for (i = 0; i < min(m,100); i++) {
        a[i] = 2 * a[i];
        if (i < n)
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i = max(m,0); i < 100; i++) {
        if (i < n)
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Second level of the sub-range tree:

    0,min(m,100)    max(m,0),100
Removing Dead Inductive-Branches
● To easily remove dead inductive-branches, we mark each edge of the sub-range tree as Then or Else, according to the outcome of the condition that created it

    for (i = max(0,n); i < min(m,100); i++) {
        if (i < m)
            a[i] = 2 * a[i];
        if (i < n)
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Sub-range tree with marked edges:

    0,100
        Then (i < m):  0,min(m,100)
            Then (i < n):  0,min(m,n,100)
            Else (i < n):  max(0,n),min(m,100)
        Else (i < m):  max(m,0),100
            Then (i < n):  max(m,0),min(n,100)
            Else (i < n):  max(m,n,0),100
Removing Dead Inductive-Branches
● We examine the path that leads from the root of the sub-range tree to the desired sub-range node

    for (i = max(0,n); i < min(m,100); i++) {
        if (i < m)
            a[i] = 2 * a[i];
        if (i < n)
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Path to the node max(0,n),min(m,100):

    0,100 → Then (i < m) → 0,min(m,100) → Else (i < n) → max(0,n),min(m,100)

On this path (i < m) is always true and (i < n) is always false, so both tests are dead inside this loop.
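For the node max(0,n),min(m,100), the path from the root takes the Then edge of (i < m) and the Else edge of (i < n), so the first test always succeeds and the second always fails inside that loop. The sketch below compares the loop with both tests left in against the simplified loop; the function names are hypothetical.

```c
#include <string.h>

#define N 100

static int min_int(int x, int y) { return x < y ? x : y; }
static int max_int(int x, int y) { return x > y ? x : y; }

/* Loop for the node max(0,n),min(m,100) with both tests still present. */
static void leaf_with_branches(int a[N], int b[N], int m, int n) {
    for (int i = max_int(0, n); i < min_int(m, N); i++) {
        if (i < m)                /* always true on this sub-range  */
            a[i] = 2 * a[i];
        if (i < n)                /* always false on this sub-range */
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
}

/* Same loop after removing the dead inductive-branches. */
static void leaf_simplified(int a[N], int b[N], int m, int n) {
    for (int i = max_int(0, n); i < min_int(m, N); i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
}

/* Hypothetical helper: returns 1 if both versions produce the same arrays. */
int leaf_versions_agree(int m, int n) {
    int a1[N], a2[N], b1[N] = {0}, b2[N] = {0};
    for (int i = 0; i < N; i++)
        a1[i] = a2[i] = i + 1;   /* arbitrary test data */
    leaf_with_branches(a1, b1, m, n);
    leaf_simplified(a2, b2, m, n);
    return memcmp(a1, a2, sizeof a1) == 0 &&
           memcmp(b1, b2, sizeof b1) == 0;
}
```

The comparison holds whether the sub-range is non-empty (for example m = 70, n = 30) or empty (m = 30, n = 70).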
Nested Inductive-branches
● Nested branches only split the nodes of the ranges to which they apply

    for (i = 0; i < 100; i++) {
        if (i < m)
            a[i] = 2 * a[i];
        else if (i < n)
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Sub-range tree:

    0,100
        0,min(m,100)
        max(m,0),100
            max(m,0),min(n,100)
            max(m,n,0),100
Nested Inductive-branches
● Nested branches only split the nodes of the ranges to which they apply
● The result will be three loops:

    for (i = 0; i < min(m,100); i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i = max(m,0); i < min(n,100); i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i = max(m,n,0); i < 100; i++) {
        b[i] = a[i] * a[i];
    }
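The three loops can be checked against the original nested form for different orderings of m and n. The following is a minimal sketch; the helper names and test data are assumptions, and max(m,n,0) is written as a nested call to a two-argument helper.

```c
#include <string.h>

#define N 100

static int min_int(int x, int y) { return x < y ? x : y; }
static int max_int(int x, int y) { return x > y ? x : y; }

/* Original loop with the nested inductive branch. */
static void nested_original(int a[N], int b[N], int m, int n) {
    for (int i = 0; i < N; i++) {
        if (i < m)
            a[i] = 2 * a[i];
        else if (i < n)
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
}

/* The three loops that result from the sub-range tree. */
static void nested_split(int a[N], int b[N], int m, int n) {
    for (int i = 0; i < min_int(m, N); i++) {
        a[i] = 2 * a[i];
        b[i] = a[i] * a[i];
    }
    for (int i = max_int(m, 0); i < min_int(n, N); i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
    for (int i = max_int(max_int(m, n), 0); i < N; i++) {
        b[i] = a[i] * a[i];
    }
}

/* Hypothetical helper: returns 1 if both versions produce the same arrays. */
int nested_versions_agree(int m, int n) {
    int a1[N], b1[N], a2[N], b2[N];
    for (int i = 0; i < N; i++)
        a1[i] = a2[i] = i + 1;   /* arbitrary test data */
    nested_original(a1, b1, m, n);
    nested_split(a2, b2, m, n);
    return memcmp(a1, a2, sizeof a1) == 0 &&
           memcmp(b1, b2, sizeof b1) == 0;
}
```

When n is not greater than m the middle loop is empty, which matches the original code: the else-if can never fire once i has reached m.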
Controlling Code Growth
● We descend the sub-range tree using a greedy breadth-first approach

    for (i = 0; i < 100; i++) {
        if (i < m)
            a[i] = 2 * a[i];
        if (i < n)
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }

Sub-range tree, by level:

    Level 1: 0,100
    Level 2: 0,min(m,100) | max(m,0),100
    Level 3: 0,min(m,n,100) | max(0,n),min(m,100) | max(m,0),min(n,100) | max(m,n,0),100
Controlling Code Growth
● We stop when we reach all the leaf nodes, or the code growth limit
● The selected nodes designate the loops that will be generated

    for (i = 0; i < min(m,100); i++) {
        a[i] = 2 * a[i];
        if (i < n)
            a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i = max(m,0); i < min(n,100); i++) {
        a[i] = 5 * a[i];
        b[i] = a[i] * a[i];
    }
    for (i = max(m,n,0); i < 100; i++) {
        b[i] = a[i] * a[i];
    }

Selected nodes: 0,min(m,100); max(m,0),min(n,100); max(m,n,0),100
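One possible reading of the breadth-first descent with a code growth limit is sketched below. The slides do not give the cost model or the exact greedy criterion, so everything here is an assumption: each internal node carries an invented split_cost, the budget is a plain integer, and a node is split whenever its cost still fits in the remaining budget. The cost numbers in example_tree are made up and were chosen so that a budget of 8 selects the same three nodes as the loops above.

```c
#define MAX_NODES 16

struct node {
    const char *range;   /* sub-range as text, for reference only        */
    int then_child;      /* index of the Then child, or -1 for a leaf    */
    int else_child;      /* index of the Else child, or -1 for a leaf    */
    int split_cost;      /* hypothetical code growth caused by splitting */
};

/* Sub-range tree for the loop with (i < m) and (i < n); costs are invented. */
static const struct node example_tree[7] = {
    {"0,100",                1,  2, 4},
    {"0,min(m,100)",         3,  4, 5},
    {"max(m,0),100",         5,  6, 3},
    {"0,min(m,n,100)",      -1, -1, 0},
    {"max(0,n),min(m,100)", -1, -1, 0},
    {"max(m,0),min(n,100)", -1, -1, 0},
    {"max(m,n,0),100",      -1, -1, 0},
};

/* Breadth-first descent: split a node if its cost fits in the remaining
   budget, otherwise select it as a loop to generate. Writes the selected
   node indices to selected[] and returns how many were selected. */
int select_loops(const struct node *tree, int root, int budget,
                 int selected[]) {
    int queue[MAX_NODES];
    int head = 0, tail = 0, count = 0;

    queue[tail++] = root;
    while (head < tail) {
        int id = queue[head++];
        const struct node *nd = &tree[id];
        int is_leaf = nd->then_child < 0;

        if (!is_leaf && nd->split_cost <= budget) {
            budget -= nd->split_cost;        /* pay for the extra loop */
            queue[tail++] = nd->then_child;
            queue[tail++] = nd->else_child;
        } else {
            selected[count++] = id;          /* generate this loop as is */
        }
    }
    return count;
}
```

With a budget of 8, the root and the node max(m,0),100 are split, while 0,min(m,100) is kept whole with its (i < n) test inside. A large budget reaches all four leaves, and a budget of 0 keeps the original loop.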
ISS Opportunities in SPEC2000
[Figure: bar chart. Y-axis: Number of ISS Opportunities (0 to 25). X-axis: Benchmarks (bzip2, crafty, eon, gap, gzip, perlbmk, twolf, vpr, applu, apsi, equake, fma3d, galgel, lucas, sixtrack). Series: No Loop Fusion, Loop Fusion.]
Compilation Times for SPEC2000
[Figure: bar chart. Y-axis: Compilation Time Increase (%) (-3 to 8). X-axis: Benchmarks (gzip, lucas, bzip2, equake, vpr, crafty, apsi, twolf, galgel, applu, perlbmk, eon, gap, sixtrack, fma3d).]
Runtimes for SPEC2000
[Figure: bar chart. Y-axis: Runtime Reduction (%) (-3 to 2). X-axis: Benchmark (lucas, bzip2, equake, vpr, crafty, apsi, twolf, galgel, applu, perlbmk, gap, sixtrack, fma3d).]
Conclusions
● Presented a technique to remove loop-variant branches from loops by splitting the original loop into several loops with varying ranges
● Implemented in IBM’s XL Compiler suite and showed ISS opportunities in benchmark suites
● Exploring ways ISS can be used to improve the effectiveness of other optimizations