March 14, 2002 1
CMPUT680 - Winter 2006
Topic C: Loop Fusion
Kit Barton
www.cs.ualberta.ca/~cbarton
Outline
• Definition of loop fusion
• Basic concepts
• Prerequisites of loop fusion
• A loop fusion algorithm
• Example
Loop Fusion
• Combine 2 or more loops into a single loop
• This cannot violate any dependencies between the loop bodies
• Several conditions must be met for fusion to occur
• Often these conditions are not initially satisfied
Advantages of Loop Fusion
• Save increment and branch instructions
• Creates opportunities for data reuse
• Provide more instructions to instruction scheduler to balance the use of functional units
Disadvantages of Loop Fusion
• Increases code size, affecting instruction cache performance
• Increase register pressure within a loop
• Could cause the formation of loops with more complex control flow
Background
• There has been extensive work done on loop fusion
• Most has focused on weighted loop fusion (Gao et al., Kennedy and McKinley, Megiddo and Sarkar)
• Extensive work has also been done on performing loop fusion to increase parallelism
Weighted Loop Fusion
• Associates non-negative weights with each pair of loop nests
• Weights are a measurement of the expected gain if the two loops are fused
• Gains include potential for array contraction, data reuse and improved local register allocation
Optimal Loop Fusion
• Fuse loops to optimize data reuse, taking into consideration resource constraints and register usage
• This problem is NP-Hard
Maximal Loop Fusion
• Our approach is to perform maximal loop fusion
• Fuse as many loops as possible, without considering resource constraints
• Fuse loops as soon as possible, not considering the consequences
Dominators and Post Dominators (Allen & Kennedy, p. 150, 353)
• A node x in a directed graph G with a single entry node dominates node y in G if every path from the entry node of G to y must pass through x
• A node x in a directed graph G with a single exit node post-dominates node y in G if every path from y to the exit node of G must pass through x
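The two definitions can be checked on a small graph. The sketch below uses the standard iterative data-flow formulation of dominators, which is one common way to compute them and is not taken from the slides. It assumes the CFG is given as a dictionary of successor lists in which every node appears as a key; the function names and the diamond-shaped example graph are illustrative only.

```python
def dominators(succ, entry):
    """Return {node: set of nodes that dominate it} for a CFG of successor lists."""
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for n, outs in succ.items():
        for s in outs:
            preds[s].add(n)
    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def post_dominators(succ, exit_node):
    """Post-dominators are the dominators of the reversed graph, rooted at the exit."""
    rev = {n: [] for n in succ}
    for n, outs in succ.items():
        for s in outs:
            rev[s].append(n)
    return dominators(rev, exit_node)


# Hypothetical diamond CFG: A branches to B and C, which both reach D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
dom = dominators(cfg, "A")
pdom = post_dominators(cfg, "D")
```

In this graph A dominates every node and D post-dominates every node, while neither B nor C dominates D because each can be bypassed through the other branch.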
Requirements for Loop Fusion
i. Loops must have identical iteration counts (be conforming)
ii. Loops must be control-flow equivalent
iii. Loops must be adjacent
iv. There cannot be any negative distance dependencies between the loops
Non-conforming Loops
• If iteration counts are different, one loop must be manipulated to make the iteration counts the same
1. Loop peeling
2. Introduce a guard into one of the loops
Loop Peeling
• Find the difference between the iteration count of the two loops (n)
• Duplicate the body of the loop with the higher iteration count n times
• Update the iteration count of the peeled loop
Loop Peeling Example
Before:
while (i < 10) {
    a[i] = a[i - 1] * 2;
    i++;
}
while (j < 12) {
    b[j] = b[j - 1] - 2;
    j++;
}
After peeling two iterations from the second loop:
while (i < 10) {
    a[i] = a[i - 1] * 2;
    i++;
}
while (j < 10) {
    b[j] = b[j - 1] - 2;
    j++;
}
b[j] = b[j - 1] - 2;
j++;
b[j] = b[j - 1] - 2;
j++;
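As a quick check that peeling preserves behaviour, the sketch below runs the original pair of loops and a peeled version in which the two conforming loops have also been fused. The start values i = j = 1 and the array contents are assumptions, since the slide does not show them.

```python
def original(a, b):
    i = j = 1  # assumed start values; the slide does not show them
    while i < 10:
        a[i] = a[i - 1] * 2
        i += 1
    while j < 12:
        b[j] = b[j - 1] - 2
        j += 1
    return a, b


def peeled_and_fused(a, b):
    i = j = 1
    # After peeling, both loops run 9 iterations and can share one loop.
    while i < 10:
        a[i] = a[i - 1] * 2
        b[j] = b[j - 1] - 2
        i += 1
        j += 1
    # The two peeled iterations of the longer loop run after it.
    b[j] = b[j - 1] - 2
    j += 1
    b[j] = b[j - 1] - 2
    j += 1
    return a, b


a0 = [1] + [0] * 9      # hypothetical input data
b0 = [100] + [0] * 11
result_original = original(a0[:], b0[:])
result_peeled = peeled_and_fused(a0[:], b0[:])
```

Fusing is safe here only because the two bodies touch different arrays; the peeled statements are exactly the extra code outside the loop that peeling generates.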
Guarding Iterations
• Increase the iteration count of the loop with fewer iterations
• Insert a guard branch around statements that would not normally be executed
Guarding Iterations Example
Before:
while (i < 10) {
    a[i] = a[i - 1] * 2;
    i++;
}
while (j < 12) {
    b[j] = b[j - 1] - 2;
    j++;
}
After guarding the first loop:
while (i < 12) {
    if (i < 10) {
        a[i] = a[i - 1] * 2;
    }
    i++;
}
while (j < 12) {
    b[j] = b[j - 1] - 2;
    j++;
}
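The guarded form can be compared against the original in the same way. In this sketch the shorter loop is extended to 12 iterations, its body is wrapped in a guard, and the two loops are then fused. The increment sits outside the guard so that the loop still terminates. The start values and array contents are assumptions.

```python
def original(a, b):
    i = j = 1  # assumed start values; the slide does not show them
    while i < 10:
        a[i] = a[i - 1] * 2
        i += 1
    while j < 12:
        b[j] = b[j - 1] - 2
        j += 1
    return a, b


def guarded_and_fused(a, b):
    i = 1
    while i < 12:
        if i < 10:               # guard: skip iterations the first loop never had
            a[i] = a[i - 1] * 2
        b[i] = b[i - 1] - 2
        i += 1                   # increment outside the guard
    return a, b


a0 = [1] + [0] * 9      # hypothetical input data
b0 = [100] + [0] * 11
result_original = original(a0[:], b0[:])
result_guarded = guarded_and_fused(a0[:], b0[:])
```

No code is left outside the loop, at the price of a branch that is evaluated on every iteration.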
Loop Peeling
• Advantage:
  • Does not generate control flow within a loop body
• Disadvantage:
  • Generates additional code outside of loops, which could possibly intervene between other loops
Guarding Iterations
• Advantages:
  • Does not introduce intervening code
  • Can be “undone” later
• Disadvantage:
  • Generates control flow within a loop
Control Flow Equivalence
• Two loops are control-flow equivalent if when one executes, the other also executes
[Figure: two example CFGs, one containing Loop 1, BB and Loop 2, the other containing Loop 1, Loop 3, BB and Loop 2]
Determining Control Flow Equivalence
• Use the concepts of dominators and post dominators. Two loops L1 and L2 are control-flow equivalent if the following two conditions are true:
  • L1 dominates L2; and
  • L2 post dominates L1.
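The two conditions translate directly into a test. The sketch below decides dominance straight from the definition: x dominates y if removing x makes y unreachable from the entry. That is simple rather than efficient, and it is a substitute for whatever dominator computation a real compiler would use. Each loop is treated as a single CFG node, and the example graph and its node names are hypothetical.

```python
def reachable(succ, start, removed=None):
    """Nodes reachable from start when the node `removed` is deleted."""
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n == removed or n in seen:
            continue
        seen.add(n)
        stack.extend(succ[n])
    return seen


def dominates(succ, entry, x, y):
    # x dominates y if every path from entry to y passes through x.
    return x == y or y not in reachable(succ, entry, removed=x)


def post_dominates(succ, exit_node, x, y):
    # Post-dominance is dominance in the reversed graph, rooted at the exit.
    rev = {n: [] for n in succ}
    for n, outs in succ.items():
        for s in outs:
            rev[s].append(n)
    return dominates(rev, exit_node, x, y)


def control_flow_equivalent(succ, entry, exit_node, l1, l2):
    return dominates(succ, entry, l1, l2) and post_dominates(succ, exit_node, l2, l1)


# Hypothetical CFG: after L1 control goes either through L3 or through BB, then to L2.
cfg = {
    "entry": ["L1"],
    "L1": ["L3", "BB"],
    "L3": ["L2"],
    "BB": ["L2"],
    "L2": ["exit"],
    "exit": [],
}
l1_l2 = control_flow_equivalent(cfg, "entry", "exit", "L1", "L2")
l1_l3 = control_flow_equivalent(cfg, "entry", "exit", "L1", "L3")
```

Here L1 and L2 always execute together, while L3 sits on only one branch and so is not control-flow equivalent to L1.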
Intervening Code
• Two loops are adjacent if there are no statements between the two loops
• Can be determined using the CFG:
  • If the immediate successor of the first loop is the second loop, the two loops are adjacent
• If two loops are not adjacent, there is intervening code between them.
Dealing with Non-Adjacent Loops
• If two loops are not adjacent, we attempt to make them adjacent by moving the intervening code
• Intervening code can be moved, as long as no data dependencies are violated:
  • Above the first loop
  • Below the second loop
  • Both
Intervening Code Example
• Assume CFG has 20 nodes
• 0-5 are above Loop 1
• 17-19 are below Loop 2
• What algorithm should be used to determine which nodes are between Loop 1 and Loop 2?
[Figure: CFG with Loop 1 at the top, nodes 6-16 in between, and Loop 2 at the bottom]
Gathering Intervening Code
• Given two loops L1 and L2, a basic block B is intervening code between L1 and L2 if and only if:
  o B is strictly dominated by L1
  o B is not dominated by L2
• Once the dominance relations are known, the set subtraction can be efficiently computed using bit vectors
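A minimal sketch of the bit-vector subtraction, using Python integers as bit vectors with bit n standing for basic block n. The two dominance sets are made-up inputs for a 20-block CFG, not values computed from a real graph.

```python
def to_bits(nodes):
    """Pack a collection of block numbers into an integer bit vector."""
    bits = 0
    for n in nodes:
        bits |= 1 << n
    return bits


def from_bits(bits):
    """Unpack a bit vector into a sorted list of block numbers."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


# Hypothetical dominance sets for a CFG with blocks 0..19.
dominated_by_l1 = to_bits(range(6, 20))    # blocks strictly dominated by L1
dominated_by_l2 = to_bits(range(17, 20))   # blocks dominated by L2

# Set subtraction is a single AND with the complement.
intervening = dominated_by_l1 & ~dominated_by_l2
intervening_blocks = from_bits(intervening)
```

The subtraction costs one machine operation per word of the bit vector, regardless of how many blocks each set contains.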
Intervening Code Example
[Figure: CFG with Loop 1 at the top, nodes 6-16 in between, and Loop 2 at the bottom]
Loop 1:     0000 0011 1111 1111 1111 1
Loop 2:     0000 0000 0000 0000 1111 1
Difference: 0000 0011 1111 1111 0000 0
Analyze Intervening Code
• Build a DDG of the intervening code
• Put all nodes with no predecessors into queue
• For each node in the queue:
  • If there are no dependencies between the node and the loop
    • Mark node as moveable
    • Add all of the node's immediate successors to the queue
• All nodes marked can be moved around the loop
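The worklist above might be sketched as follows. Statements and loops are modelled as sets of defined and used variables, which is an assumption made for illustration. The sketch also adds one safeguard that the slide does not state: a node is marked only once all of its DDG predecessors are marked. The statement names and the DDG come from the non-adjacent loops example in this deck.

```python
from collections import deque


def depends(a, b):
    """True if a and b share a flow, anti or output dependence."""
    return bool(a["defs"] & b["uses"] or a["uses"] & b["defs"] or a["defs"] & b["defs"])


def moveable_nodes(stmts, ddg_succ, loop):
    preds = {n: set() for n in stmts}
    for n, outs in ddg_succ.items():
        for s in outs:
            preds[s].add(n)
    queue = deque(n for n in stmts if not preds[n])   # nodes with no predecessors
    marked = []
    while queue:
        n = queue.popleft()
        if n in marked or depends(stmts[n], loop):
            continue
        if not preds[n] <= set(marked):   # assumption: wait for all predecessors
            continue
        marked.append(n)
        queue.extend(ddg_succ.get(n, []))
    return marked


stmts = {
    "b": {"defs": {"b"}, "uses": {"a"}},        # b := a * 2
    "c": {"defs": {"c"}, "uses": {"b"}},        # c := b + 6
    "g": {"defs": {"g"}, "uses": set()},        # g := 0
    "h": {"defs": {"h"}, "uses": {"g"}},        # h := g + 10
    "if": {"defs": {"d", "e"}, "uses": {"c"}},  # if (c < 100) d := c/2 else e := c * 2
}
ddg = {"b": ["c"], "c": ["if"], "g": ["h"]}
loop1 = {"defs": {"a", "i"}, "uses": {"a", "i", "N"}}   # a += i; i++
loop2 = {"defs": {"f", "j"}, "uses": {"g", "j", "N"}}   # f := g + 6; j++

around_loop1 = moveable_nodes(stmts, ddg, loop1)
around_loop2 = moveable_nodes(stmts, ddg, loop2)
```

Checked against Loop 1, only g and h are free to move; checked against Loop 2, the chain b, c and the if statement is free to move.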
Non-Adjacent loops example
while (i < N) {
    a += i;
    i++;
}
b := a * 2;
c := b + 6;
g := 0;
h := g + 10;
if (c < 100)
    d := c/2;
else
    e := c * 2;
while (j < N) {
    f := g + 6;
    j++;
}
Intervening code:
b := a * 2;
c := b + 6;
g := 0;
if (c < 100)
d := c/2;
else
e := c * 2;
h := g + 10;
Non-Adjacent loops example
Before:
while (i < N) {
    a += i;
    i++;
}
b := a * 2;
c := b + 6;
g := 0;
h := g + 10;
if (c < 100)
    d := c/2;
else
    e := c * 2;
while (j < N) {
    f := g + 6;
    j++;
}
After moving the intervening code:
g := 0;
h := g + 10;
while (i < N) {
    a += i;
    i++;
}
while (j < N) {
    f := g + 6;
    j++;
}
b := a * 2;
c := b + 6;
if (c < 100)
    d := c/2;
else
    e := c * 2;
Non-Adjacent loops example
b := a * 2;
c := b + 6;
g := 0;
if (c < 100)
d := c/2;
else
e := c * 2;
h := g + 10;
Node Queue
b := a * 2;
g := 0;
DDG Loop 2
Moveable Nodes
c := b + 6;
if (c < 100)
d := c/2;
else
e := c * 2;
b := a * 2;
c := b + 6;
if (c < 100)
d := c/2;
else
e := c * 2;
while (j < N) {
f := g + 6;
j++;
}
Non-Adjacent loops example
b := a * 2;
c := b + 6;
g := 0;
if (c < 100)
d := c/2;
else
e := c * 2;
h := g + 10;
Node Queue
b := a * 2;
g := 0;
DDG Loop 1
Moveable Nodes
h := g + 10;
g := 0;
h := g + 10;
while (i < N) {
a += i;
i++;
}
Dependencies Preventing Fusion
Can the following loops be fused?
i = j = 1;
while (i < 10) {
    a[i] = c[i] + 10;
    i++;
}
while (j < 10) {
    b[j] = a[j+1] * 2;
    j++;
}
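Running the two loops separately and then in a naively fused form shows why the answer is no. The array sizes, the zero initial contents and the input c are assumptions made so the example can run.

```python
def separate(c):
    a = [0] * 11
    b = [0] * 11
    for i in range(1, 10):
        a[i] = c[i] + 10
    for j in range(1, 10):
        b[j] = a[j + 1] * 2
    return a, b


def naive_fused(c):
    a = [0] * 11
    b = [0] * 11
    for i in range(1, 10):
        a[i] = c[i] + 10
        b[i] = a[i + 1] * 2   # reads a[i+1] one iteration before it is written
    return a, b


c = list(range(11))           # hypothetical input
sep_a, sep_b = separate(c)
fus_a, fus_b = naive_fused(c)
```

The a arrays agree, but the fused loop computes every b[i] from a stale a[i+1]: the value the second loop needs is produced one iteration later than it is consumed, which is the negative-distance dependence that rules out fusion.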
Dependencies Preventing Fusion
• If we look at the array access patterns of a[], we see the following
a[i] = c[i] + 10;
b[j] = a[j+1] * 2;
Dependencies Preventing Fusion
• By aligning the array access patterns, we get the following:
a[i] = c[i] + 10;
b[j] = a[j+1] * 2;
Loop Alignment
Before:
i = j = 1;
while (i < 10) {
    a[i] = c[i] + 10;
    i++;
}
while (j < 10) {
    b[j] = a[j+1] * 2;
    j++;
}
After peeling the first iteration of the first loop:
j = 1;
i = 2;
a[1] = c[1] + 10;
while (i < 10) {
    a[i] = c[i] + 10;
    i++;
}
while (j < 10) {
    b[j] = a[j+1] * 2;
    j++;
}
Loop Alignment
• Loop alignment can be used to remove dependencies between loop bodies
• Easy to do when all dependencies have the same distance
• Gets tricky when there are multiple dependencies with different distances
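For the single-distance case, alignment followed by fusion might look like the sketch below. The first iteration of the producer loop is peeled so that the write to a[j+1] and the read of a[j+1] land in the same iteration, and the leftover last iteration of the consumer loop is peeled off the end. The array sizes, zero initial contents and input c are assumptions.

```python
def separate(c):
    a = [0] * 11
    b = [0] * 11
    for i in range(1, 10):
        a[i] = c[i] + 10
    for j in range(1, 10):
        b[j] = a[j + 1] * 2
    return a, b


def aligned_fused(c):
    a = [0] * 11
    b = [0] * 11
    a[1] = c[1] + 10              # peeled first iteration of the first loop
    for j in range(1, 9):         # eight fused iterations
        a[j + 1] = c[j + 1] + 10  # first loop body, shifted by one
        b[j] = a[j + 1] * 2       # now reads the value just written
    b[9] = a[10] * 2              # peeled last iteration of the second loop
    return a, b


c = list(range(11))               # hypothetical input
result_separate = separate(c)
result_aligned = aligned_fused(c)
```

With one dependence the shift is a single constant. With several dependences of different distances no single shift satisfies all of them at once, which is where alignment becomes tricky.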
Putting it all together
• We’ve seen ways to deal with each of the preconditions of loop fusion
• If the conditions are not met, we apply transformations to try and modify the code
• If the transformations are successful, loop fusion can occur
• But in what order should these transformations be applied?
Loop Fusion Algorithm
For each Ni from outermost to innermost:
    Gather control equivalent loops in Ni into LoopSets
    For each set Si in LoopSets
        remove non-eligible loops from Si
        FusedLoops = true
        Direction = forward
        while FusedLoops == true
            if |Si| < 2 break
            Compute Dominance Relation
            FusedLoops = LoopFusionPass(Si, Direction)
            Reverse Direction
Loop Fusion Algorithm
LoopFusionPass(S, Direction)
    FusedLoops = false
    For each pair of loops Lj and Lk in S such that Lj dominates Lk in Direction
        if (DependenceDistance(Lj, Lk) < 0) continue
        if (InterveningCode(Lj, Lk) == true and
            IsInterveningCodeMoveable(Lj, Lk) == false) continue
        d = | IterationCount(Lj) - IterationCount(Lk) |
        if (Lj and Lk are non-conforming and (d cannot be determined at compile time or d > MAXPEEL)) continue
        if (Lj and Lk are non-conforming) Peel iterations
        MoveInterveningCode(Lj, Lk)
        if InterveningCode(Lj, Lk) == false
            FuseLoops(Lj, Lk)
            FusedLoops = true
    Return FusedLoops
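A toy model of the pass can make the conformance and peeling decisions concrete. Everything in it is a simplification and an assumption: loops carry only a symbolic bound plus an offset, the dependence and intervening-code tests are stubbed out as caller-supplied predicates, the direction argument is dropped, and MAX_PEEL is an arbitrary value. It is not the implementation behind the slides.

```python
from dataclasses import dataclass

MAX_PEEL = 2  # hypothetical peeling limit


@dataclass
class Loop:
    name: str
    bound: str    # symbolic upper bound, e.g. "n"
    offset: int   # iteration count is bound + offset


def peel_distance(lj, lk):
    """Iteration-count difference, or None when unknown at compile time."""
    if lj.bound != lk.bound:
        return None
    return abs(lj.offset - lk.offset)


def loop_fusion_pass(loops, distance_ok, code_moveable):
    loops = list(loops)
    fused_any = False
    j = 0
    while j < len(loops):
        k = j + 1
        while k < len(loops):
            lj, lk = loops[j], loops[k]
            d = peel_distance(lj, lk)
            if (not distance_ok(lj, lk) or not code_moveable(lj, lk)
                    or d is None or d > MAX_PEEL):
                k += 1
                continue
            # Peel d iterations off the longer loop, then fuse into slot j.
            loops[j] = Loop(lj.name + "+" + lk.name, lj.bound,
                            min(lj.offset, lk.offset))
            del loops[k]
            fused_any = True
        j += 1
    return loops, fused_any


loop_set = [Loop("L1", "n", 0), Loop("L2", "n", -1),
            Loop("L3", "m", 0), Loop("L4", "n", -2)]
always = lambda lj, lk: True   # stub: no blocking dependence, code always moveable
result, fused = loop_fusion_pass(loop_set, always, always)
```

With these stubs the loops bounded by n collapse into one loop of n-2 iterations, while the loop bounded by m stays separate because its iteration-count difference cannot be determined.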
Example
L1: do i1 = 1, n
      a(i1) = a(i1) * k1
    end do
L2: do i2 = 1, n-1
      d(i2) = a(i2) - b(i2+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Loop Set
L1
L2
L3
L4
Peeling Loop 1
Before:
L1: do i1 = 1, n
      a(i1) = a(i1) * k1
    end do
L2: do i2 = 1, n-1
      d(i2) = a(i2) - b(i2+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
After:
S7: a(1) = a(1) * k1
L1: do i1 = 1, n-1
      a(i1+1) = a(i1+1) * k1
    end do
L2: do i2 = 1, n-1
      d(i2) = a(i2) - b(i2+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Fuse L1 and L2
Before:
S7: a(1) = a(1) * k1
L1: do i1 = 1, n-1
      a(i1+1) = a(i1+1) * k1
    end do
L2: do i2 = 1, n-1
      d(i2) = a(i2) - b(i2+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
After:
S7: a(1) = a(1) * k1
L5: do i5 = 1, n-1
      a(i5+1) = a(i5+1) * k1
      d(i5) = a(i5) - b(i5+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Compare L5 and L3
• We now compare loops L5 and L3
• They are not adjacent, but the intervening code can move
• Difference in iteration count is not known, so fusion fails
S7: a(1) = a(1) * k1
L5: do i5 = 1, n-1
      a(i5+1) = a(i5+1) * k1
      d(i5) = a(i5) - b(i5+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Compare L5 and L4
S7: a(1) = a(1) * k1
L5: do i5 = 1, n-1
      a(i5+1) = a(i5+1) * k1
      d(i5) = a(i5) - b(i5+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Intervening Code
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
Peel L5
Before:
S7: a(1) = a(1) * k1
L5: do i5 = 1, n-1
      a(i5+1) = a(i5+1) * k1
      d(i5) = a(i5) - b(i5+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
After:
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Move Intervening Code
Before:
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
After:
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Reverse Pass
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Loop Set
L5
L3
L4
Sorted in Reverse Dominance Direction
L4
L3
L5
Compare L4 and L3
• Compare L4 and L3
• No dependencies to prevent fusion
• Difference in iteration count cannot be determined at compile time
• Fusion fails
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Compare L4 and L5
Intervening Code
L3: do i3 = 1, m
ds = ds + d(i3)
end do
Full program:
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Move Intervening Code
Before:
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
After:
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
Fuse L4 and L5
Before:
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
After:
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L6: do i6 = 1, n-2
      a(i6+2) = a(i6+2) * k1
      d(i6+1) = a(i6+1) - b(i6+2) * k2
      b(i6) = a(i6) + b(i6) / c(i6)
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
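As a sanity check on the whole walkthrough, the sketch below transcribes the starting program and the final fused program into Python and compares their results on one made-up input. Lists are padded so that the Fortran-style 1-based indices can be used directly. The values of n, m, k1, k2 and the array contents are assumptions, and the peeled statements require n to be at least 3.

```python
def original(a, b, c, d, n, m, k1, k2):
    a, b, c, d = a[:], b[:], c[:], d[:]
    for i in range(1, n + 1):          # L1
        a[i] = a[i] * k1
    for i in range(1, n):              # L2
        d[i] = a[i] - b[i + 1] * k2
    ds = 0.0                           # S1
    for i in range(1, m + 1):          # L3
        ds = ds + d[i]
    c[n - 2] = n if n < m else m       # S2-S5
    for i in range(1, n - 1):          # L4
        b[i] = a[i] + b[i] / c[i]
    return a, b, c, d, ds


def fused(a, b, c, d, n, m, k1, k2):
    a, b, c, d = a[:], b[:], c[:], d[:]
    a[1] = a[1] * k1                   # S7
    a[2] = a[2] * k1                   # S8
    d[1] = a[1] - b[2] * k2            # S9
    ds = 0.0                           # S1
    c[n - 2] = n if n < m else m       # S2-S5
    for i in range(1, n - 1):          # L6: L1, L2 and L4 fused
        a[i + 2] = a[i + 2] * k1
        d[i + 1] = a[i + 1] - b[i + 2] * k2
        b[i] = a[i] + b[i] / c[i]
    for i in range(1, m + 1):          # L3
        ds = ds + d[i]
    return a, b, c, d, ds


n, m, k1, k2 = 8, 5, 3.0, 0.5          # hypothetical sizes and constants
a0 = [0.0] + [float(i) for i in range(1, n + 1)] + [0.0]
b0 = [0.0] + [float(2 * i) for i in range(1, n + 1)] + [0.0]
c0 = [2.0] * (n + 2)
d0 = [0.0] * (n + 2)
result_original = original(a0, b0, c0, d0, n, m, k1, k2)
result_fused = fused(a0, b0, c0, d0, n, m, k1, k2)
```

Both versions perform the same arithmetic on each element in the same order, so the results match exactly for this input. One run on one input is evidence, not a proof that the transformation is legal in general.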