amelia-bennett
Middle and Back End
AST -> translation -> IR1 -> translation -> IR2 -> (other IRs and translations) -> asm
Back-end Structure
[Figure: back-end pipeline with the components IR, instruction selector, Assem, register allocator, TempMap, Assem, instruction scheduler]
Instruction Selection
int f (int x, int y)
{
  int a; int b; int c; int d;
  a = x + y;
  b = a + 4;
  c = b * 2;
  d = c / 8;
  return d;
}
x: 8(%ebp)
y: 12(%ebp)
Positions for a, b, c, d cannot be determined during this phase.
int f (int x, int y)
{
  int a,b,c,d;
  int t1, t2;
  // Prolog
  pushl %ebp
  movl %esp, %ebp
  movl 8(%ebp), t1
  movl 12(%ebp), t2
  movl t1, a
  addl t2, a
  movl a, b
  addl $4, b
  movl b, %eax
  imult $2
  movl %eax, c
  movl c, %eax
  cltd
  idivl $8
  movl %eax, d
  movl d, %eax
  // Epilog
  leave
  ret
}
Register allocation
After instruction selection, there may be some variables left
Basic idea:
  put as many of these variables as possible into registers (speed!)
  put a variable into memory only if the registers are out of supply
This process is called register allocation
  one of the most popular and important optimizations in modern compilers
Register Allocation
Suppose that register allocation determines the following for the code produced by instruction selection (we will discuss how to do this a little later):
  a => %eax
  b => %eax
  c => %eax
  d => %eax
  t1 => %eax
  t2 => %edx
(this data structure is called a temp map)
Rewriting
With the given temp map:
  a => %eax
  b => %eax
  c => %eax
  d => %eax
  t1 => %eax
  t2 => %edx
we can rewrite the code accordingly, to generate the final assembly code.
  .text
  .globl f
f:
  pushl %ebp
  movl %esp, %ebp
  movl 8(%ebp), t1      // t1 becomes %eax
  movl 12(%ebp), t2     // t2 becomes %edx
  movl t1, a            // becomes movl %eax, %eax
  addl t2, a            // becomes addl %edx, %eax
  movl a, b
  addl $4, b
  movl b, %eax
  imult $2
  movl %eax, c
  movl c, %eax
  cltd
  idivl $8
  movl %eax, d
  movl d, %eax
  leave
  ret
The rest is left to you!
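The rewriting step can be sketched in a few lines of Python. This is only an illustration, not the lab's implementation: the function name `rewrite` and the regular expression are assumptions, and the sketch treats every bare identifier that appears in the temp map as a temporary.

```python
import re

# Temp map from the slide: each temporary is mapped to a physical register.
TEMP_MAP = {"a": "%eax", "b": "%eax", "c": "%eax", "d": "%eax",
            "t1": "%eax", "t2": "%edx"}

def rewrite(instr, temp_map):
    """Replace every temporary in one instruction by its register."""
    def substitute(match):
        name = match.group(0)
        # Opcodes such as movl are not in the map and are left alone.
        return temp_map.get(name, name)
    # Match bare identifiers only: the lookbehind skips register names
    # (%eax), immediates ($4) and the inside of longer identifiers.
    return re.sub(r"(?<![%$\w])[A-Za-z_]\w*", substitute, instr)

code = ["movl 8(%ebp), t1", "movl 12(%ebp), t2", "movl t1, a", "addl t2, a"]
rewritten = [rewrite(i, TEMP_MAP) for i in code]
```

Applied to the first four body instructions, this gives the same result as the annotations on the slide; the self-move it produces is what the next phase cleans up.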
Peep-hole Optimization
  .globl f
f:
  pushl %ebp
  movl %esp, %ebp
  movl 8(%ebp), %eax
  movl 12(%ebp), %edx
  movl %eax, %eax
  addl %edx, %eax
  movl %eax, %eax
  addl $4, %eax
  movl %eax, %eax
  imult $2
  movl %eax, %eax
  movl %eax, %eax
  cltd
  idivl $8
  movl %eax, %eax
  movl %eax, %eax
  leave
  ret
Peep-hole optimizations try to improve the code by examining it through a small code window. They are local in nature.
For example, we can use a code window of width 1 to eliminate the obvious redundancy of the form:
  movl r, r
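A width-1 window amounts to looking at one instruction at a time. The following Python sketch shows the idea; the function name `peephole` is an assumption, and the operand parsing is deliberately naive (it splits on commas, so it would need more care for memory operands that contain commas).

```python
def peephole(instrs):
    """Drop self-moves of the form 'movl r, r' using a window of width 1."""
    kept = []
    for instr in instrs:
        parts = instr.split(None, 1)          # opcode, then the operand text
        if len(parts) == 2 and parts[0] == "movl":
            operands = [op.strip() for op in parts[1].split(",")]
            if len(operands) == 2 and operands[0] == operands[1]:
                continue                      # movl r, r has no effect
        kept.append(instr)
    return kept

body = ["movl 8(%ebp), %eax", "movl 12(%ebp), %edx", "movl %eax, %eax",
        "addl %edx, %eax", "movl %eax, %eax", "addl $4, %eax"]
cleaned = peephole(body)
```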
Final Assembly
// This function does NOT need a (stack) frame!
  .text
  .globl f
f:
  pushl %ebp
  movl %esp, %ebp
  movl 8(%ebp), %eax
  movl 12(%ebp), %edx
  addl %edx, %eax
  addl $4, %eax
  imult $2
  cltd
  idivl $8
  leave
  ret
int f (int x, int y)
{
  int a; int b; int c; int d;
  a = x + y;
  b = a + 4;
  c = b * 2;
  d = c / 8;
  return d;
}
Register Allocation
Register allocation determines a temp map for the code produced by instruction selection:
  a => %eax
  b => %eax
  c => %eax
  d => %eax
  t1 => %eax
  t2 => %edx
How to generate such a temp map?
Key observation: two variables can reside in one register iff they are NOT live simultaneously.
Liveness Analysis
So, we can perform liveness analysis to calculate the live variable information.
Between each two statements of the code produced by instruction selection, we mark the liveOut set.
[Figure: the code annotated with liveOut sets; the annotations that survive are {eax}, {eax}, {d}, {eax}, {…}]
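For straight-line code such as f, the liveOut sets can be computed in one backward pass. The sketch below is an illustration under stated assumptions: each instruction is given by hand as a (text, defs, uses) triple, and there are no branches (with branches the pass would have to be iterated to a fixed point).

```python
def liveness(instrs, live_at_exit=frozenset()):
    """Return the liveOut set of every instruction, in program order.

    instrs is a list of (text, defs, uses) triples for straight-line code.
    """
    live = set(live_at_exit)
    live_out = []
    for _text, defs, uses in reversed(instrs):
        live_out.append(set(live))
        # liveIn = uses + (liveOut - defs); it is the liveOut of the
        # instruction just before this one.
        live = (live - set(defs)) | set(uses)
    live_out.reverse()
    return live_out

# The last instructions of f, with defs and uses written out by hand.
tail = [
    ("movl %eax, d", {"d"}, {"eax"}),
    ("movl d, %eax", {"eax"}, {"d"}),
    ("ret", set(), {"eax"}),
]
sets = liveness(tail)
```

For these three instructions the pass yields {d} after the first move and {eax} after the second, in line with the annotations on the slide.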
Interference Graph (IG)
Register allocation determines that (the temp map):
  a => %eax
  b => %eax
  c => %eax
  d => %eax
  t1 => %eax
  t2 => %edx
Interferences in the code produced by instruction selection:
  t2 ∞ t1
  a ∞ t2
[Figure: IG with nodes a, b, c, d, t1, t2; a, b, c, d and t1 are labeled %eax, t2 is labeled %edx]
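Building the IG can be folded into the same backward pass as liveness: at each definition, the defined variable interferes with everything else that is live afterwards. The Python sketch below makes the same assumptions as before (hand-written defs and uses, straight-line code) and, to stay short, tracks only temporaries and ignores the physical registers; the name `build_interference_graph` is hypothetical.

```python
def build_interference_graph(instrs, live_at_exit=()):
    """Return the IG as a dict: node -> set of interfering nodes."""
    graph = {}
    live = set(live_at_exit)          # liveOut of the instruction visited
    for _text, defs, uses in reversed(instrs):
        for d in defs:
            graph.setdefault(d, set())
            for v in live - {d}:      # d and v are live at the same time
                graph.setdefault(v, set())
                graph[d].add(v)
                graph[v].add(d)
        for u in uses:
            graph.setdefault(u, set())
        live = (live - set(defs)) | set(uses)
    return graph

# First instructions of f after instruction selection (temporaries only).
head = [
    ("movl 8(%ebp), t1", {"t1"}, set()),
    ("movl 12(%ebp), t2", {"t2"}, set()),
    ("movl t1, a", {"a"}, {"t1"}),
    ("addl t2, a", {"a"}, {"a", "t2"}),
    ("movl a, b", {"b"}, {"a"}),
    ("addl $4, b", {"b"}, {"b"}),
]
ig = build_interference_graph(head, live_at_exit={"b"})
```

The only edges found are t1-t2 and a-t2, the two interferences listed on the slide.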
Steps in Register Allocator
Do liveness analysis
Build the interference graph (IG)
  draw an edge between any two variables which are live simultaneously
Color the IG with K colors (registers)
  K is the number of available registers on a machine
  a classical problem in graph theory
  NP-complete (for K>=3), thus one must use heuristics
Allocate physical registers to variables
History
Early work by Cocke suggests that register allocation can be viewed as a graph coloring problem (1971)
The first working allocator is Chaitin’s, for the IBM PL/1 compiler (1981)
  later, the IBM PL.8 compiler
Had some impact on RISC
History, cont
The more recent graph coloring allocator is due to Briggs (1992)
For now, graph coloring is the most popular allocator, used in many production compilers
  e.g., GCC
But more advanced allocators have been invented in recent years
  so, is graph coloring a lesson abandoned?
  more in the next few lectures …
Graph coloring
Once we have the interference graph, we can try to color the graph with K colors
  K: number of machine registers
  adjacent nodes get different colors
But this is an NP-complete problem (for K>=3)
So we must use some heuristics
Kempe’s Allocator
Kempe’s Theorem
[Kempe] Given a graph G with a node n such that degree(n)<K, G is K-colorable iff (G-{n}) is K-colorable (remove n and all edges connected to n)
Proof?
[Figure: a node n with degree(n)<K]
Kempe’s Algorithm
kempe(graph G, int K)
  while (there is any node n, degree(n)<K)
    remove this node n
    assign a color to the removed node n   // greedy
  if (G is empty)   // i.e., G is K-colorable
    return success;
  return failure;
Example
[Figure: graph with nodes a, b, c, d, e]
K = 4 (colors 1, 2, 3, 4)
degree(a) = 3<4: remove node “a”, assign the first available color
degree(b) = 2<4: remove node “b”, assign the first available color
degree(c) = 2<4: remove node “c”, assign the first available color
degree(d) = 1<4: remove node “d”, assign the first available color
degree(e) = 0<4: remove node “e”, assign the first available color
Here, we want to choose the node with the lowest degree. What kind of data structure should we use?
Example
[Figure: graph with nodes a, b, c, d, e]
K = 3 (colors 1, 2, 3)
So this graph is 3-colorable.
But if we have three colors, we can NOT apply the Kempe algorithm. (Why?)
We can refine it to the following one:
kempe(graph G, int K)
  stack = [];
  while (G is not empty)
    if (there is a node n, degree(n)<K)
      remove n and push it onto the stack;
    else   // every remaining node has degree>=K
      remove some node and push it anyway;
  pop the stack and assign colors
Essentially, this is a lazy algorithm!
Example
[Figure: graph with nodes a, b, c, d, e, with the label “significant” (degree >= K)]
K = 3 (colors 1, 2, 3)
remove node “a”, push onto the stack
remove node “b”, push onto the stack
remove node “c”, push onto the stack
remove node “d”, push onto the stack
remove node “e”, push onto the stack
stack after simplification (bottom to top): a, b, c, d, e
pop the stack, assign suitable colors:
  pop “e”
  pop “d”
  pop “c”
  pop “b”
  pop “a”
Moral
Kempe’s algorithm:
  step #1: simplify
    remove graph nodes, be optimistic
  step #2: select
    assign a color for each node, be lazy
You should use this algorithm for your lab6 first
But what if the select phase fails?
  not enough colors (registers)!
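The two phases can be sketched in Python as follows. Two things here are assumptions, not part of the slides: the edge set of the example graph (the slides only give the degrees, so the edges below are one hypothetical graph consistent with them) and the rule of always removing a node of lowest degree, with ties broken by name.

```python
def kempe_color(graph, k):
    """Return (coloring, uncolored); graph maps node -> set of neighbours."""
    work = {n: set(adj) for n, adj in graph.items()}    # mutable copy
    stack = []
    # Simplify: a lowest-degree node has degree < k whenever such a node
    # exists; otherwise it is removed anyway (be optimistic).
    while work:
        node = min(work, key=lambda n: (len(work[n]), n))
        stack.append(node)
        for neighbour in work.pop(node):
            work[neighbour].discard(node)
    # Select: pop nodes and give each the first color not used by its
    # already-colored neighbours in the original graph (be lazy).
    coloring, uncolored = {}, []
    while stack:
        node = stack.pop()
        used = {coloring[n] for n in graph[node] if n in coloring}
        free = [c for c in range(1, k + 1) if c not in used]
        if free:
            coloring[node] = free[0]
        else:
            uncolored.append(node)      # select failed for this node
    return coloring, uncolored

# Hypothetical edges, consistent with degree(a) = 3, then 2, 2, 1, 0.
G = {"a": {"b", "c", "d"}, "b": {"a", "c", "e"}, "c": {"a", "b", "d", "e"},
     "d": {"a", "c", "e"}, "e": {"b", "c", "d"}}
colors3, failed3 = kempe_color(G, 3)
colors2, failed2 = kempe_color(G, 2)
```

With K = 3 every node starts with degree >= 3, yet select still colors all of them; with K = 2 select runs out of colors for one node, which is the failure case discussed next.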
Example
[Figure: graph with nodes a, b, c, d, e]
K = 2 (colors 1, 2)
remove node “a”, push onto the stack
Failure
It’s often the case that Kempe’s algorithm fails
  the IG is not K-colorable, or the heuristic does not find a coloring
The basic idea is to generate spilling code
  some variables should be put into memory, instead of into registers
Usually, spilled variables reside in the call stack
Should modify code using such variables:
  for a variable use: read from the memory
  for a variable def: store into the memory
Spill code generation
The effect of spill code is to turn long live ranges into shorter ones
  this may introduce more temporaries
The register allocator should start over after generating spill code
We’ll talk about this shortly
Chaitin’s Allocator
Chaitin’s Algorithm
Build: build the interference graph (IG)
Simplify: simplify the graph
Spill: for a significant node, mark it as a potential spill (ps), remove it and continue
Select: pop nodes and try to assign colors
  if this fails for a potential spill node, mark the potential spill as an actual spill and continue
Start over: generate spill code for actual spills and start over from step #1 (build)
Chaitin’s Algorithm
[Figure: flow chart with the boxes build, simplify, potential spill, select, actual spill]
Step 1: build the IG
a = 1
b = 2
c = a+b
d = a+c
e = a+b
f = d+e
[Figure: IG with nodes a, b, c, d, e, f]
K = 2 (colors 1, 2)
Step 2: simplification
(same code and IG as in step 1, K = 2)
Nodes are removed and pushed onto the stack one at a time; ps marks a potential spill:
  push f
  push e
  push c (ps)
  push d (ps)
  push a
  push b
stack after simplification (bottom to top): f, e, c ps, d ps, a, b
Step 3: selection
(same code and IG as in step 1, K = 2)
Pop the stack and try to assign colors:
  pop b: assign a color
  pop a: assign a color
  pop d (ps): no color left, actual spill, a fake color
  pop c (ps): no color left, actual spill, a fake color
  pop e: assign a color
  pop f: assign a color
There are two spills: c and d. One must rewrite the code.
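One round of simplify, potential spill and select can be sketched in Python and run on this example. Several things are assumptions made for the sketch: the edge set (derived by hand from the liveness of the six statements, since the slide only shows the figure), the spill heuristic (spill the candidate with the fewest defs and uses), and the function name `chaitin_round`.

```python
def chaitin_round(graph, k, spill_cost):
    """Return (coloring, potential spills, actual spills) for one round."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack, potential = [], set()
    while work:
        low = [n for n in work if len(work[n]) < k]
        if low:                                   # simplify
            node = min(low, key=lambda n: (len(work[n]), n))
        else:                                     # potential spill
            node = min(work, key=lambda n: (spill_cost[n], n))
            potential.add(node)
        stack.append(node)
        for neighbour in work.pop(node):
            work[neighbour].discard(node)
    coloring, actual = {}, []
    while stack:                                  # select
        node = stack.pop()
        used = {coloring[n] for n in graph[node] if n in coloring}
        free = [c for c in range(1, k + 1) if c not in used]
        if free:
            coloring[node] = free[0]
        else:
            actual.append(node)                   # becomes an actual spill
    return coloring, potential, actual

# Assumed IG for a=1; b=2; c=a+b; d=a+c; e=a+b; f=d+e.
IG = {"a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b"},
      "d": {"a", "b", "e"}, "e": {"d"}, "f": set()}
# Assumed cost: number of defs plus uses of each variable.
COST = {"a": 4, "b": 3, "c": 2, "d": 2, "e": 2, "f": 1}
coloring, potential, actual = chaitin_round(IG, 2, COST)
```

Under these assumptions the nodes are pushed in the order f, e, c, d, a, b, and select reports d and c as the actual spills, as on the slides. A different spill heuristic could pick other candidates.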
Step 4: code rewriting (actual spill)
Original code:
  a = 1
  b = 2
  c = a+b
  d = a+c
  e = a+b
  f = d+e
Rewritten code:
  a = 1
  b = 2
  x1 = a+b
  M[l_c] = x1
  x2 = M[l_c]
  x3 = a+x2
  M[l_d] = x3
  e = a+b
  x4 = M[l_d]
  f = x4+e
There are two spills: c and d. Suppose the memory addresses for c and d are l_c and l_d (two integers indicating stack offsets). Then for each use, generate a read; for each def, generate a write.
What’s special about the xi? They can NOT be spilled any more. (Why?)
The statements that get rewritten are the ones that mention c or d: c = a+b, d = a+c, f = d+e
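The rewriting rule (a read before each use, a write after each def) is mechanical enough to sketch in Python. The statement encoding, a (dest, operands) pair, and the name `spill` are assumptions made for this illustration; fresh temporaries are numbered x1, x2, … as on the slide.

```python
def spill(stmts, spilled, first_fresh=1):
    """Rewrite three-address statements so spilled variables live in memory.

    stmts is a list of (dest, operands) pairs; returns a list of strings.
    """
    counter = first_fresh
    out = []
    for dest, operands in stmts:
        new_ops = []
        for op in operands:
            if op in spilled:                 # use: read from the memory
                temp = f"x{counter}"
                counter += 1
                out.append(f"{temp} = M[l_{op}]")
                new_ops.append(temp)
            else:
                new_ops.append(op)
        rhs = "+".join(new_ops)
        if dest in spilled:                   # def: store into the memory
            temp = f"x{counter}"
            counter += 1
            out.append(f"{temp} = {rhs}")
            out.append(f"M[l_{dest}] = {temp}")
        else:
            out.append(f"{dest} = {rhs}")
    return out

prog = [("a", ["1"]), ("b", ["2"]), ("c", ["a", "b"]),
        ("d", ["a", "c"]), ("e", ["a", "b"]), ("f", ["d", "e"])]
rewritten = spill(prog, {"c", "d"})
```

A real allocator would also record x1 to x4 as unspillable so that later rounds never pick them.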
Step 4: … start over
a = 1
b = 2
x1 = a+b
M[l_c] = x1
x2 = M[l_c]
x3 = a+x2
M[l_d] = x3
e = a+b
x4 = M[l_d]
f = x4+e
[Figure: IG with nodes a, b, e, f, x1, x2, x3, x4]
K = 2 (colors 1, 2)
The other steps are left to you.
This graph can NOT be colored with 2 colors. (There is a K3 sub-graph: a, b and x1 are live at the same time.)
Very EXPENSIVE!
So, we have to do another iteration to generate spill code (keep in mind that you can NOT spill x1, x2, x3 and x4) …
Code spill (2nd time)
This time a is spilled. The code from the previous round becomes:
x5 = 1
M[l_a] = x5
b = 2
x6 = M[l_a]
x1 = x6+b
M[l_c] = x1
x2 = M[l_c]
x7 = M[l_a]
x3 = x7+x2
M[l_d] = x3
x8 = M[l_a]
e = x8+b
x4 = M[l_d]
f = x4+e
[Figure: new IG with nodes b, e, f, x1, x2, x3, x4, x5, x6, x7, x8]
K = 2 (colors 1, 2)
This graph is still not 2-colorable. Why?
So we should continue to generate spill code, and start over…
There are 3 variables remaining: b, e, f.
Which one should be spilled?
Suppose we spill b.
Third Round
x5 = 1
M[l_a] = x5
x9 = 2
M[l_b] = x9
x6 = M[l_a]
x10 = M[l_b]
x1 = x6+x10
M[l_c] = x1
x2 = M[l_c]
x7 = M[l_a]
x3 = x7+x2
M[l_d] = x3
x8 = M[l_a]
x11 = M[l_b]
e = x8+x11
x4 = M[l_d]
f = x4+e
[Figure: IG with nodes e, f, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11]
K = 2 (colors 1, 2)
We have spilled all of a, b, c, and d.
This has the effect of chopping up all long live ranges into small live ranges!
Spilling a use
For a statement like this:
  t = u + v
if we mark u as an actual spill, rewrite to:
  u’ = M[l_u]
  t = u’+v
where u’ can NOT be a candidate for future spill (unspillable)
Spilling a def
For a statement like this:
  t = u + v
if we mark t as an actual spill, rewrite to:
  t’ = u+v
  M[l_t] = t’
where t’ can NOT be a candidate for future spill (unspillable)
Spilled temps
Where should these variables be spilled to?
  function frames!
[Figure: stack frame between %ebp and %esp, from higher to lower addresses: …, arg1, arg0, ret addr, old ebp, Spill_0, Spill_1, …]
The compiler maintains an internal counter. Each time the compiler finds an actual spill, it increases the counter and assigns a location for that spilled variable.
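The counter can be as simple as the following Python sketch. The class name `Frame` and its methods are hypothetical, and the offsets assume the layout in the figure, with Spill_0 directly below the saved %ebp and 4-byte slots; if the prologue pushes callee-saved registers first, the offsets shift accordingly.

```python
class Frame:
    """Hands out one 4-byte stack slot per actual spill."""

    def __init__(self):
        self.count = 0          # number of spill slots handed out so far
        self.slots = {}         # spilled temp -> slot index

    def slot_for(self, temp):
        if temp not in self.slots:
            self.slots[temp] = self.count
            self.count += 1
        return self.slots[temp]

    def operand(self, temp):
        # Assumed layout: Spill_i lives at -4*(i+1)(%ebp).
        return f"{-4 * (self.slot_for(temp) + 1)}(%ebp)"

frame = Frame()
loc_c = frame.operand("c")
loc_d = frame.operand("d")
```

After register allocation finishes, `frame.count` is the number n of spill slots the frame has to reserve.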
Frame
Suppose we put the frame on the stack:
  .text
  .globl f
f:
  pushl %ebp
  movl %esp, %ebp
  pushl %ebx
  pushl %edi
  pushl %esi
  subl $(n*4), %esp
n is the number of all spills, which can only be determined after register allocation.
Some improvements
We can speed up the graph-coloring-based register allocator in several ways
But:
  To finish first, first finish
  KISS: keep it simple and stupid
  Don’t be too smart by half
Your Tiger compiler must produce correct target code first
#1: Good data structures
For live sets
  bit-vector? or other data structures?
For IG
  adjacency list? adjacency matrix? both?
Similar for other data structures
Using good interfaces will let you write dead simple code and enhance it later
#2: frame slot allocation
x5 = 1
M[l_a] = x5
x9 = 2
M[l_b] = x9
x6 = M[l_a]
x10 = M[l_b]
x1 = x6+x10
M[l_c] = x1
x2 = M[l_c]
x7 = M[l_a]
x3 = x7+x2
M[l_d] = x3
x8 = M[l_a]
x11 = M[l_b]
e = x8+x11
x4 = M[l_d]
f = x4+e
Allocating every spilled temp to its own frame slot can use a lot of memory!
A better idea is to share a frame slot between spilled temps iff they are not live simultaneously: frame slot allocation!
#2: frame slot allocation
[Figure: interference graph over the frame slots l_a, l_b, l_c, l_d]
How many different colors are required to color this graph?
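One way to check an answer is to compute, for each spill location, the range from its first store to its last load, and let two locations share a slot only if their ranges do not overlap. The Python sketch below does this for the straight-line code of the third round; treating a live range as a single interval is an assumption that holds only because there are no branches, and the function names are hypothetical.

```python
import re

CODE = ["x5 = 1", "M[l_a] = x5", "x9 = 2", "M[l_b] = x9", "x6 = M[l_a]",
        "x10 = M[l_b]", "x1 = x6+x10", "M[l_c] = x1", "x2 = M[l_c]",
        "x7 = M[l_a]", "x3 = x7+x2", "M[l_d] = x3", "x8 = M[l_a]",
        "x11 = M[l_b]", "e = x8+x11", "x4 = M[l_d]", "f = x4+e"]

def slot_ranges(code):
    """Map each location l_x to (first index, last index) where it occurs."""
    ranges = {}
    for i, stmt in enumerate(code):
        for name in re.findall(r"M\[(l_\w+)\]", stmt):
            first, _ = ranges.get(name, (i, i))
            ranges[name] = (first, i)
    return ranges

def share_slots(ranges):
    """Greedily give each location the lowest slot free over its range."""
    assigned = {}
    for name in sorted(ranges, key=lambda n: ranges[n]):
        lo, hi = ranges[name]
        busy = {assigned[o] for o in assigned
                if ranges[o][0] <= hi and lo <= ranges[o][1]}
        slot = 0
        while slot in busy:
            slot += 1
        assigned[name] = slot
    return assigned

slots = share_slots(slot_ranges(CODE))
```

Under these assumptions l_c and l_d never overlap, so they can share a slot, while l_a and l_b each need their own: three slots instead of four.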
#3: coalescing
Suppose we have a move statement:
  t = u
What’s the potential benefit of allocating both t and u to the same register r?
  r = r
The move becomes redundant and can be deleted.
This is called coalescing
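On the interference graph, coalescing t and u means merging the two nodes into one, which is only legal if they do not interfere. A minimal Python sketch follows; the name `coalesce` is an assumption, and the sketch does not address whether merging makes the graph harder to color.

```python
def coalesce(graph, t, u):
    """Merge node u into node t; return None if t and u interfere."""
    if u in graph[t]:
        return None        # live at the same time: they need two registers
    merged = {n: set(adj) for n, adj in graph.items() if n != u}
    for neighbour in graph[u]:
        # Every edge that touched u now touches t instead.
        merged[neighbour].discard(u)
        merged[neighbour].add(t)
        merged[t].add(neighbour)
    return merged

# IG of the temporaries at the start of f: t1 and t2 interfere, a and t2 do.
ig = {"t1": {"t2"}, "t2": {"t1", "a"}, "a": {"t2"}, "b": set()}
after = coalesce(ig, "a", "t1")       # for the move: movl t1, a
```

Once a and t1 are one node they get one register, so `movl t1, a` turns into a self-move that the peep-hole pass removes.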
Briggs’ Allocator