Register Allocation: Graph Coloring
Compiler, Baojian Hua


Middle and Back End

AST → translation → IR1 → translation → IR2 → other IRs and translations → asm

Back-end Structure

IR → instruction selector → Assem → register allocator (produces a TempMap) → Assem → instruction scheduler → Assem

Instruction Selection

    int f (int x, int y){
      int a; int b; int c; int d;
      a = x + y;
      b = a + 4;
      c = b * 2;
      d = c / 8;
      return d;
    }

Arguments: x is at 8(%ebp), y is at 12(%ebp).

Positions for a, b, c, d cannot be determined during this phase.

    int f (int x, int y){
      int a, b, c, d;
      int t1, t2;
      pushl %ebp            // prolog
      movl %esp, %ebp
      movl 8(%ebp), t1
      movl 12(%ebp), t2
      movl t1, a
      addl t2, a
      movl a, b
      addl $4, b
      movl b, %eax
      imull $2, %eax
      movl %eax, c
      movl c, %eax
      cltd
      idivl $8
      movl %eax, d
      movl d, %eax
      leave                 // epilog
      ret
    }

Register allocation
After instruction selection, there may be some variables left.
Basic idea:
  put as many of these variables as possible into registers (speed!)
  put them into memory only if the registers run out
This process is called register allocation, one of the most important optimizations in modern compilers.

Register Allocation

Suppose that register allocation determines the following (we will discuss how to do this a little later):
  a => %eax
  b => %eax
  c => %eax
  d => %eax
  t1 => %eax
  t2 => %edx
This data structure is called a temp map.

Rewriting

With the given temp map (a, b, c, d, t1 => %eax; t2 => %edx), we can rewrite the code accordingly to generate the final assembly code:

    .text
    .globl f
    f:
      pushl %ebp
      movl %esp, %ebp
      movl 8(%ebp), t1
      movl 12(%ebp), t2
      movl t1, a
      addl t2, a
      movl a, b
      addl $4, b
      movl b, %eax
      imull $2, %eax
      movl %eax, c
      movl c, %eax
      cltd
      idivl $8
      movl %eax, d
      movl d, %eax
      leave
      ret

The rest is left to you!
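The rewriting step is mechanical: every temp operand is replaced by its register from the temp map, and everything else is left alone. A minimal C sketch, assuming a hypothetical `Instr` type with string operands (a real back end would use its own Assem representation):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One temp map entry: temp name => physical register. */
typedef struct {
    const char *temp;
    const char *reg;
} TempMapEntry;

/* Hypothetical two-operand instruction such as "movl src, dst";
   src or dst is NULL when the instruction lacks that operand. */
typedef struct {
    const char *op;
    const char *src;
    const char *dst;
} Instr;

/* Look an operand up in the temp map. Operands that are not temps
   (registers, immediates, memory operands like "8(%ebp)") come back
   unchanged. */
const char *map_operand(const TempMapEntry *map, size_t n, const char *operand) {
    assert(operand != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(map[i].temp, operand) == 0)
            return map[i].reg;
    return operand;
}

/* Rewrite every operand of every instruction in place. */
void rewrite(Instr *code, size_t len, const TempMapEntry *map, size_t n) {
    for (size_t i = 0; i < len; i++) {
        if (code[i].src != NULL) code[i].src = map_operand(map, n, code[i].src);
        if (code[i].dst != NULL) code[i].dst = map_operand(map, n, code[i].dst);
    }
}
```

With the temp map above, movl t1, a becomes movl %eax, %eax: exactly the kind of redundant move that peep-hole optimization removes.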

Peep-hole Optimization

    .globl f
    f:
      pushl %ebp
      movl %esp, %ebp
      movl 8(%ebp), %eax
      movl 12(%ebp), %edx
      movl %eax, %eax
      addl %edx, %eax
      movl %eax, %eax
      addl $4, %eax
      movl %eax, %eax
      imull $2, %eax
      movl %eax, %eax
      movl %eax, %eax
      cltd
      idivl $8
      movl %eax, %eax
      movl %eax, %eax
      leave
      ret

Peep-hole optimizations try to improve the code by examining it through a small sliding code window; they work locally. For example, a code window of width 1 is enough to eliminate the obvious redundancy of the form:

    movl r, r
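A width-1 window only has to look at one instruction at a time. A minimal C sketch, assuming the same kind of hypothetical string-based `Instr` (op, src, dst), that drops every `movl r, r` by compacting the array in place:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical two-operand instruction; src/dst may be NULL. */
typedef struct {
    const char *op;
    const char *src;
    const char *dst;
} Instr;

/* Width-1 peep-hole window: "movl r, r" does nothing, so drop it.
   Compacts the array in place and returns the new length. */
size_t drop_self_moves(Instr *code, size_t len) {
    assert(code != NULL || len == 0);
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        int self_move = strcmp(code[i].op, "movl") == 0 &&
                        code[i].src != NULL && code[i].dst != NULL &&
                        strcmp(code[i].src, code[i].dst) == 0;
        if (!self_move)
            code[out++] = code[i];
    }
    return out;
}
```

Applied to the rewritten code of f, it removes all seven `movl %eax, %eax` lines, leaving the body of the final assembly.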

Final Assembly

    // This function does
    // NOT need a (stack)
    // frame!
    .text
    .globl f
    f:
      pushl %ebp
      movl %esp, %ebp
      movl 8(%ebp), %eax
      movl 12(%ebp), %edx
      addl %edx, %eax
      addl $4, %eax
      imull $2, %eax
      cltd
      idivl $8
      leave
      ret

    int f (int x, int y){
      int a; int b; int c; int d;
      a = x + y;
      b = a + 4;
      c = b * 2;
      d = c / 8;
      return d;
    }

Register Allocation

Register allocation determines a temp map:
  a => %eax
  b => %eax
  c => %eax
  d => %eax
  t1 => %eax
  t2 => %edx

How can such a temp map be generated? Key observation: two variables can reside in the same register iff they are NOT live simultaneously.

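Liveness Analysis

Take again the code of f after instruction selection (temps t1, t2, a, b, c, d, plus %eax and %edx). We can perform liveness analysis to calculate the live variable information: between every two consecutive statements, we mark the liveOut set of the first one. For example, after movl %eax, d only d is live, and after movl d, %eax only %eax is live.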
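For straight-line code such as f, liveness needs a single backward pass: liveIn = use ∪ (liveOut − def), and the liveOut of a statement is the liveIn of the next one. A minimal C sketch, assuming at most 32 variables so that a live set fits in a uint32_t bit vector (code with branches would instead iterate over the control-flow graph until no set changes):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Live sets as bit vectors: bit v is set iff variable v is live. */
typedef uint32_t LiveSet;
#define BIT(v) ((LiveSet)1u << (v))

typedef struct {
    LiveSet use;  /* variables read by the statement    */
    LiveSet def;  /* variables written by the statement */
} Stmt;

/* Straight-line liveness, one backward walk:
     live_in[i]  = use[i] | (live_out[i] & ~def[i])
     live_out[i] = live_in[i+1],  live_out[n-1] = live_at_exit */
void liveness(const Stmt *s, size_t n, LiveSet live_at_exit, LiveSet *live_out) {
    assert(s != NULL || n == 0);
    LiveSet live = live_at_exit;
    for (size_t i = n; i-- > 0;) {
        live_out[i] = live;
        live = s[i].use | (live & ~s[i].def);   /* becomes live_in[i] */
    }
}
```

Encoding f with one bit each for t1, t2, a, b, c, d, %eax and %edx (treating cltd as reading %eax and writing %edx, idivl as reading and writing both, and leave/ret as reading %eax), it yields liveOut = {t1, t2} after movl 12(%ebp), t2, {d} after movl %eax, d, and {%eax} after movl d, %eax.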


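Interference Graph (IG)

Register allocation determines the temp map: a, b, c, d, t1 => %eax; t2 => %edx.

The IG has one node per variable (a, b, c, d, t1, t2) and two edges: {t1, t2} and {a, t2}. Coloring it with registers gives a, b, c, d and t1 the color %eax, and t2 the color %edx.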

Steps in a Register Allocator
  1. Do liveness analysis.
  2. Build the interference graph (IG): draw an edge between any two variables that are live simultaneously.
  3. Color the IG with K colors (registers):
       K is the number of available registers on the machine;
       this is a classical problem in graph theory, NP-complete (for K >= 3), so one must use heuristics.
  4. Allocate physical registers to variables.
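Building the IG follows directly from liveness. A common practical formulation, used in this sketch, is: at each statement, every variable it defines interferes with every other variable live right after it. Minimal C, assuming straight-line code, at most 32 variables, and an adjacency bit matrix (one uint32_t row per node):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXV 32
typedef uint32_t Set;                 /* bit-vector set of variables */
#define BIT(v) ((Set)1u << (v))

typedef struct { Set use, def; } Stmt;

/* IG as a bit matrix: bit w of adj[v] is set iff v and w interfere. */
typedef struct { Set adj[MAXV]; } IG;

static void add_edge(IG *g, int v, int w) {
    if (v == w) return;               /* no self loops */
    g->adj[v] |= BIT(w);
    g->adj[w] |= BIT(v);
}

/* Walk backwards keeping the live-out set; each def interferes with
   everything live after its statement. */
void build_ig(const Stmt *s, size_t n, Set live_at_exit, IG *g) {
    Set live = live_at_exit;          /* live-out of statement i */
    for (int v = 0; v < MAXV; v++) g->adj[v] = 0;
    for (size_t i = n; i-- > 0;) {
        for (int d = 0; d < MAXV; d++)
            if (s[i].def & BIT(d))
                for (int w = 0; w < MAXV; w++)
                    if (live & BIT(w)) add_edge(g, d, w);
        live = s[i].use | (live & ~s[i].def);   /* live-in of i */
    }
}

int degree(const IG *g, int v) {
    assert(v >= 0 && v < MAXV);
    int k = 0;
    for (Set s = g->adj[v]; s; s &= s - 1) k++;   /* count set bits */
    return k;
}

int interferes(const IG *g, int v, int w) {
    return (g->adj[v] >> w) & 1u;
}
```

For the six-statement program of the Chaitin example later in this lecture (a = 1 … f = d+e), it produces the edges {a,b}, {a,c}, {b,c}, {a,d}, {b,d}, {d,e} and leaves f isolated.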

History
  Early work by Cocke suggested that register allocation can be viewed as a graph coloring problem (1971).
  The first working allocator was Chaitin's, for IBM's PL/I compiler (1981).
  Later came the IBM PL.8 compiler, which had some impact on RISC.

History, cont.
  A more recent graph coloring allocator is due to Briggs (1992).
  Today, graph coloring is among the most popular allocation approaches, used in many production compilers, e.g., GCC.
  But more advanced allocators have been invented in recent years.
  So, is graph coloring an abandoned lesson? More in the next few lectures …

Graph coloring
Once we have the interference graph, we try to color it with K colors:
  K is the number of machine registers;
  adjacent nodes must get different colors.
But this problem is NP-complete (for K >= 3), so we must use some heuristics.

Kempe’s Allocator

Kempe's Theorem [Kempe]: Given a graph G with a node n such that degree(n) < K, G is K-colorable iff G - {n} is K-colorable (G with n and all edges connected to n removed).

Proof? (Figure: node n with fewer than K neighbors, i.e., degree(n) < K.)

Kempe's Algorithm

    kempe(graph G, int K)
      while (there is a node n with degree(n) < K)
        remove node n from G
        assign a color to the removed node n   // greedy
      if (G is empty)   // i.e., G is K-colorable
        return success;
      return failure;
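The success test of this loop is Kempe's simplification: keep removing a node of degree < K and see whether the graph empties. A minimal C sketch over a bit-matrix IG with at most 32 nodes; it only performs the simplification test and does not assign colors:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAXN 32
typedef uint32_t Set;   /* adj[v]: bit w set iff v and w interfere */

/* Degree of v counting only neighbours still in the graph. */
static int degree_in(const Set *adj, Set alive, int v) {
    int k = 0;
    for (Set s = adj[v] & alive; s; s &= s - 1) k++;
    return k;
}

/* Repeatedly remove some node with degree < K. If every node gets
   removed, G is K-colorable (Kempe's theorem); if no such node is
   left while nodes remain, this heuristic gives up. */
bool kempe_simplify(const Set *adj, int n, int K) {
    assert(n >= 0 && n <= MAXN && K > 0);
    Set alive = (n == MAXN) ? ~(Set)0 : (((Set)1u << n) - 1);
    bool progress = true;
    while (alive && progress) {
        progress = false;
        for (int v = 0; v < n; v++)
            if (((alive >> v) & 1u) && degree_in(adj, alive, v) < K) {
                alive &= ~((Set)1u << v);   /* remove v and its edges */
                progress = true;
                break;
            }
    }
    return alive == 0;                      /* empty graph => success */
}
```

The exact edges of the example graph that follows are not recoverable from the slides; one hypothetical graph matching its degrees has edges {a,b}, {a,c}, {a,d}, {b,c}, {b,e}, {c,d}, {c,e}, {d,e}. On it, kempe_simplify succeeds for K = 4 and gets stuck for K = 3, since then every node has degree >= 3.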


Here, we want to choose the node with the lowest degree. What kind of data structure should we use?


Example (K = 4, colors 1, 2, 3, 4): a graph with nodes a, b, c, d, e
  degree(a) = 3 < 4: remove node "a", assign the first available color
  degree(b) = 2 < 4: remove node "b", assign the first available color
  degree(c) = 2 < 4: remove node "c", assign the first available color
  degree(d) = 1 < 4: remove node "d", assign the first available color
  degree(e) = 0 < 4: remove node "e", assign the first available color

Example (K = 3, colors 1, 2, 3): the same graph with nodes a, b, c, d, e

So this graph is 3-colorable. But with three colors we can NOT apply Kempe's algorithm as stated. (Why?) We can refine it as follows:

    kempe(graph G, int K)
      stack = [];
      while (G is not empty)
        if (there is a node n with degree(n) < K)
          remove n from G and push it onto the stack;
        else
          remove some node n with degree(n) >= K and push it anyway;
      pop the stack, assigning each node a color not used by its neighbors

Essentially, this is a lazy algorithm!
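The refined, optimistic version can be sketched in C as follows. Assumptions: a bit-matrix IG with at most 32 nodes, the lowest-numbered eligible node is removed first, and when simplification gets stuck the significant node pushed is simply the lowest-numbered one (a real allocator picks it with a spill heuristic):

```c
#include <assert.h>
#include <stdint.h>

#define MAXN 32
typedef uint32_t Set;   /* adj[v]: bit w set iff v and w interfere */

static int degree_in(const Set *adj, Set alive, int v) {
    int k = 0;
    for (Set s = adj[v] & alive; s; s &= s - 1) k++;
    return k;
}

/* Simplify: push a node of degree < K if one exists, otherwise push a
   significant node anyway (be optimistic).
   Select: pop nodes, giving each the lowest color its already-colored
   neighbours do not use (be lazy); color[v] = -1 means none was left.
   Returns the number of uncolored nodes. */
int kempe_color(const Set *adj, int n, int K, int *color) {
    assert(n >= 0 && n <= MAXN && K > 0 && K <= 32);
    int stack[MAXN], top = 0;
    Set alive = (n == MAXN) ? ~(Set)0 : (((Set)1u << n) - 1);
    while (alive) {
        int pick = -1;
        for (int v = 0; v < n && pick < 0; v++)
            if (((alive >> v) & 1u) && degree_in(adj, alive, v) < K) pick = v;
        for (int v = 0; v < n && pick < 0; v++)   /* all significant */
            if ((alive >> v) & 1u) pick = v;
        alive &= ~((Set)1u << pick);
        stack[top++] = pick;
    }
    int failed = 0;
    for (int v = 0; v < n; v++) color[v] = -1;
    while (top > 0) {
        int v = stack[--top];
        Set used = 0;                              /* neighbours' colors */
        for (int w = 0; w < n; w++)
            if (((adj[v] >> w) & 1u) && color[w] >= 0) used |= (Set)1u << color[w];
        for (int k = 0; k < K; k++)
            if (!((used >> k) & 1u)) { color[v] = k; break; }
        if (color[v] < 0) failed++;
    }
    return failed;
}
```

On the hypothetical graph {a,b}, {a,c}, {a,d}, {b,c}, {b,e}, {c,d}, {c,e}, {d,e} with K = 3, it pushes a (significant), b, c, d, e in that order and then colors all five nodes; with K = 2 at least one node is left uncolored.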


Example (K = 3, colors 1, 2, 3), step by step:
  remove node "a" (significant: degree(a) = 3 >= K), push onto the stack
  remove node "b", push onto the stack
  remove node "c", push onto the stack
  remove node "d", push onto the stack
  remove node "e", push onto the stack
  pop the stack, assign suitable colors: pop "e", pop "d", pop "c", pop "b", pop "a"

Moral
Kempe's algorithm:
  step #1: simplify: remove graph nodes, be optimistic
  step #2: select: assign a color to each node, be lazy
You should use this algorithm for your lab6 first. But what if the select phase fails?
  Not enough colors (registers)!

Example (K = 2, colors 1, 2): nodes a, b, c, d, e


Failure
It is often the case that Kempe's algorithm fails:
  the IG may not be K-colorable (or the heuristic cannot find a K-coloring).
The basic idea is to generate spill code:
  some variables are put into memory instead of into registers; usually, spilled variables reside in the call stack;
  code using such variables must be modified: for each use of such a variable, read it from memory; for each def, store it into memory.

Spill code generation
The effect of spill code is to turn long live ranges into shorter ones.
This may introduce more temporaries, so the register allocator should start over after generating spill code. We'll talk about this shortly.

Chaitin’s Allocator

Chaitin's Algorithm
  Build: build the interference graph (IG)
  Simplify: simplify the graph
  Spill: if only significant nodes (degree >= K) remain, mark one as a potential spill (ps), remove it and continue
  Select: pop nodes and try to assign colors; if this fails for a potential spill node, mark it as an actual spill and continue
  Start over: generate spill code for the actual spills and start over from step #1 (build)

Chaitin's Algorithm (flow)
  build → simplify → potential spill → select → actual spill → back to build

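Step 1: build the IG (K = 2, colors 1, 2)

    a = 1
    b = 2
    c = a+b
    d = a+c
    e = a+b
    f = d+e

Nodes: a, b, c, d, e, f. Edges (from liveness): {a,b}, {a,c}, {b,c}, {a,d}, {b,d}, {d,e}; f interferes with nothing.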

Step 2: simplification (K = 2)
  push f (degree 0)
  push e (degree 1)
  now a, b, c, d all have degree >= 2: push c as a potential spill (ps)
  still no node with degree < 2: push d as a potential spill (ps)
  push a
  push b
Stack (bottom to top): f, e, c (ps), d (ps), a, b

Step 3: selection (pop the stack)
  pop b: assign color 1
  pop a: a interferes with b, so assign color 2
  pop d (ps): its neighbors a and b use both colors, so d becomes an actual spill (give it a fake color)
  pop c (ps): likewise, c becomes an actual spill (fake color)
  pop e: its only neighbor d is spilled, so e gets a color
  pop f: no neighbors, so f gets a color

There are two spills: c and d. One must rewrite the code.
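Steps 2 and 3 for one round can be sketched in C. Assumptions: a bit-matrix IG with at most 32 nodes, and, as the potential-spill heuristic, the significant node with the lowest caller-supplied spill cost (here simply the number of times each variable occurs in the code). Chaitin-style allocators typically use a cost divided by degree and weighted by loop nesting instead:

```c
#include <assert.h>
#include <stdint.h>

#define MAXN 32
typedef uint32_t Set;
#define BIT(v) ((Set)1u << (v))

static int degree_in(const Set *adj, Set alive, int v) {
    int k = 0;
    for (Set s = adj[v] & alive; s; s &= s - 1) k++;
    return k;
}

/* One round of simplify / potential spill / select.
   Returns the set of actual spills (colored -1); if it is non-empty the
   caller must rewrite the code and start over from build. */
Set chaitin_round(const Set *adj, int n, int K, const int *spill_cost, int *color) {
    assert(n > 0 && n <= MAXN && K > 0 && K <= 32);
    int stack[MAXN], top = 0;
    Set alive = (n == MAXN) ? ~(Set)0 : BIT(n) - 1;
    Set potential = 0;
    while (alive) {
        int pick = -1;
        for (int v = 0; v < n && pick < 0; v++)          /* simplify */
            if ((alive & BIT(v)) && degree_in(adj, alive, v) < K) pick = v;
        if (pick < 0) {                                  /* potential spill */
            for (int v = 0; v < n; v++)
                if ((alive & BIT(v)) && (pick < 0 || spill_cost[v] < spill_cost[pick]))
                    pick = v;
            potential |= BIT(pick);
        }
        alive &= ~BIT(pick);
        stack[top++] = pick;
    }
    Set actual = 0;
    for (int v = 0; v < n; v++) color[v] = -1;
    while (top > 0) {                                    /* select */
        int v = stack[--top];
        Set used = 0;
        for (int w = 0; w < n; w++)
            if ((adj[v] & BIT(w)) && color[w] >= 0) used |= BIT(color[w]);
        for (int k = 0; k < K && color[v] < 0; k++)
            if (!(used & BIT(k))) color[v] = k;
        if (color[v] < 0) {
            assert(potential & BIT(v));   /* only potential spills can fail */
            actual |= BIT(v);
        }
    }
    return actual;
}
```

With the IG of the example ({a,b}, {a,c}, {b,c}, {a,d}, {b,d}, {d,e}), K = 2 and costs a:4, b:3, c:2, d:2, e:2, f:1, it marks c and then d as potential spills and returns exactly {c, d}. It pushes e before f (the slides push f first), which does not change the result.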

Step 4: code rewriting (actual spill)

Before:

    a = 1
    b = 2
    c = a+b
    d = a+c
    e = a+b
    f = d+e

After:

    a = 1
    b = 2
    x1 = a+b
    M[l_c] = x1
    x2 = M[l_c]
    x3 = a+x2
    M[l_d] = x3
    e = a+b
    x4 = M[l_d]
    f = x4+e

There are two spills: c and d. Suppose the memory addresses for c and d are l_c and l_d (two integers indicating stack offsets). Then for each use, generate a read; for each def, generate a write.

What's special about the xi? They can NOT be spilled any more. (Why?)

Step 4: … start over

Rebuild the IG (K = 2) for the rewritten code:

    a = 1
    b = 2
    x1 = a+b
    M[l_c] = x1
    x2 = M[l_c]
    x3 = a+x2
    M[l_d] = x3
    e = a+b
    x4 = M[l_d]
    f = x4+e

Nodes: a, b, x1, x2, x3, x4, e, f. The other steps are left to you. This graph can NOT be colored with 2 colors: it has a K3 subgraph (a, b and x1 all interfere with each other).

Veryyyyyyyy EXPENSIVE!

So we have to do another iteration to generate spill code (keep in mind that you can NOT spill x1, x2, x3 and x4) …

Code spill (2nd time): spill a

    x5 = 1
    M[l_a] = x5
    b = 2
    x6 = M[l_a]
    x1 = x6+b
    M[l_c] = x1
    x2 = M[l_c]
    x7 = M[l_a]
    x3 = x7+x2
    M[l_d] = x3
    x8 = M[l_a]
    e = x8+b
    x4 = M[l_d]
    f = x4+e

Rebuild the IG (K = 2): nodes b, e, f and x1 through x8.

This graph is still not 2-colorable. Why?

So we must continue to generate spill code and start over… Three spillable variables remain: b, e, f. Which one should be spilled? Suppose we spill b.

Third Round: spill b

    x5 = 1
    M[l_a] = x5
    x9 = 2
    M[l_b] = x9
    x6 = M[l_a]
    x10 = M[l_b]
    x1 = x6+x10
    M[l_c] = x1
    x2 = M[l_c]
    x7 = M[l_a]
    x3 = x7+x2
    M[l_d] = x3
    x8 = M[l_a]
    x11 = M[l_b]
    e = x8+x11
    x4 = M[l_d]
    f = x4+e

IG (K = 2): nodes e, f and x1 through x11.

We have spilled all of a, b, c, and d. This has the effect of chopping up all long live ranges into small live ranges!

Spilling a use
For a statement like this:

    t = u + v

if we mark u as an actual spill, rewrite it to:

    u' = M[l_u]
    t = u' + v

where u' can NOT be a candidate for future spills (it is unspillable).

Spilling a def
For a statement like this:

    t = u + v

if we mark t as an actual spill, rewrite it to:

    t' = u + v
    M[l_t] = t'

where t' can NOT be a candidate for future spills (it is unspillable).
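Both rewriting rules fit in one function. A minimal C sketch over a hypothetical three-address form (ADD, LOAD, STORE, with variables and frame slots numbered by integers): it emits a load before a spilled use and a store after a spilled def, and records each fresh temp as unspillable:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ADD, LOAD, STORE } Kind;

/* ADD:   dst = src1 + src2
   LOAD:  dst = M[slot]
   STORE: M[slot] = src1          (unused fields are -1) */
typedef struct { Kind kind; int dst, src1, src2, slot; } Instr;

/* Rewrite "t = u + v" for spilled variable s living in frame slot
   `slot`. Fresh temps come from *next_temp and are marked in
   unspillable[]. Writes at most 3 instructions to out, returns count. */
int spill_add(Instr in, int s, int slot, int *next_temp,
              bool *unspillable, Instr *out) {
    assert(in.kind == ADD);
    int n = 0;
    if (in.src1 == s || in.src2 == s) {     /* spilled use: u' = M[l_u] */
        int u2 = (*next_temp)++;
        unspillable[u2] = true;
        out[n++] = (Instr){LOAD, u2, -1, -1, slot};
        if (in.src1 == s) in.src1 = u2;
        if (in.src2 == s) in.src2 = u2;
    }
    if (in.dst == s) {                      /* spilled def: t' = u+v; M[l_t] = t' */
        int t2 = (*next_temp)++;
        unspillable[t2] = true;
        in.dst = t2;
        out[n++] = in;
        out[n++] = (Instr){STORE, -1, t2, -1, slot};
    } else {
        out[n++] = in;
    }
    return n;
}
```

For d = a + c, spilling c produces x = M[l_c]; d = a + x, and spilling d produces x' = a + c; M[l_d] = x', matching the rewrite in Step 4.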

Spilled temps
Where should these variables be spilled to? Function frames!

    …
    arg1
    arg0
    ret addr
    old ebp      <- %ebp
    Spill_0
    Spill_1
    …            <- %esp

(Higher addresses are at the top; the stack grows downward.)

The compiler maintains an internal counter. Each time it finds an actual spill, it increments the counter and assigns a frame location to that spilled variable.

Frame
Suppose we put the frame on the stack:

    .text
    .globl f
    f:
      pushl %ebp
      movl %esp, %ebp
      pushl %ebx
      pushl %edi
      pushl %esi
      subl $(n*4), %esp

n is the number of all spills, which can only be determined after register allocation.
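The spill counter and this prologue together determine each slot's address. A minimal sketch with hypothetical names, assuming 4-byte slots and exactly the prologue on this slide (so the three saved registers sit between the old %ebp and the spill area, unlike the simpler frame picture above):

```c
#include <assert.h>

/* Assumed layout for the prologue above:
     old %ebp at 0(%ebp)
     %ebx, %edi, %esi at -4, -8, -12(%ebp)
     n spill slots below them, reserved by subl $(n*4), %esp
   so spill slot i lives at -(16 + 4*i)(%ebp). */
typedef struct {
    int nspills;   /* the compiler's internal spill counter */
} Frame;

/* Called once per actual spill: hand out the next slot number. */
int alloc_spill_slot(Frame *fr) {
    return fr->nspills++;
}

/* %ebp-relative byte offset of a spill slot. */
int spill_offset(int slot) {
    assert(slot >= 0);
    return -(16 + 4 * slot);
}

/* The n*4 in "subl $(n*4), %esp"; final only after allocation. */
int spill_area_bytes(const Frame *fr) {
    return 4 * fr->nspills;
}
```

For the four spills of the running example (l_a, l_b, l_c, l_d) this gives offsets -16, -20, -24 and -28 from %ebp and a 16-byte spill area.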

Some improvements
We can speed up the graph-coloring-based register allocator in several ways. But:
  To finish first, first finish.
  KISS: keep it simple and stupid. Don't be too smart by half.
  Your Tiger compiler must produce correct target code first.

#1: Good data structures
  For live sets: bit-vectors? Or other data structures?
  For the IG: adjacency lists? An adjacency matrix? Both?
  Similarly for other data structures. A good interface lets you write dead simple code first and enhance it later.
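One way to get "both" for the IG is to keep the two representations behind a tiny interface. A minimal C sketch with hypothetical names (ig_new, ig_add_edge, ig_interferes, ig_free):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* IG with both representations:
   - a bit matrix, for O(1) "do u and v interfere?" queries;
   - adjacency lists, for walking a node's neighbours in O(degree). */
typedef struct {
    int n;
    unsigned char *matrix;   /* matrix[u*n+v] != 0 iff u and v interfere */
    int **adj;               /* adj[v]: neighbours of v                  */
    int *degree;             /* number of entries in adj[v]              */
    int *cap;                /* allocated capacity of adj[v]             */
} IG;

IG *ig_new(int n) {
    assert(n > 0);
    IG *g = malloc(sizeof *g);
    if (g == NULL) abort();
    g->n = n;
    g->matrix = calloc((size_t)n * (size_t)n, 1);
    g->adj = malloc((size_t)n * sizeof *g->adj);
    g->degree = calloc((size_t)n, sizeof *g->degree);
    g->cap = calloc((size_t)n, sizeof *g->cap);
    if (!g->matrix || !g->adj || !g->degree || !g->cap) abort();
    for (int v = 0; v < n; v++) g->adj[v] = NULL;
    return g;
}

bool ig_interferes(const IG *g, int u, int v) {
    assert(u >= 0 && u < g->n && v >= 0 && v < g->n);
    return g->matrix[u * g->n + v] != 0;
}

static void push_neighbour(IG *g, int u, int v) {
    if (g->degree[u] == g->cap[u]) {            /* grow adj[u] */
        int cap = g->cap[u] ? 2 * g->cap[u] : 4;
        int *p = realloc(g->adj[u], (size_t)cap * sizeof *p);
        if (p == NULL) abort();
        g->adj[u] = p;
        g->cap[u] = cap;
    }
    g->adj[u][g->degree[u]++] = v;
}

void ig_add_edge(IG *g, int u, int v) {
    if (u == v || ig_interferes(g, u, v)) return;   /* no self loops or duplicates */
    g->matrix[u * g->n + v] = g->matrix[v * g->n + u] = 1;
    push_neighbour(g, u, v);
    push_neighbour(g, v, u);
}

void ig_free(IG *g) {
    for (int v = 0; v < g->n; v++) free(g->adj[v]);
    free(g->adj);
    free(g->degree);
    free(g->cap);
    free(g->matrix);
    free(g);
}
```

The matrix costs n*n bytes but answers interference queries in constant time; the lists keep neighbour walks during simplify proportional to the degree. The rest of the allocator only sees the four functions, so either half can be replaced later.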

#2: frame slot allocation

    x5 = 1
    M[l_a] = x5
    x9 = 2
    M[l_b] = x9
    x6 = M[l_a]
    x10 = M[l_b]
    x1 = x6+x10
    M[l_c] = x1
    x2 = M[l_c]
    x7 = M[l_a]
    x3 = x7+x2
    M[l_d] = x3
    x8 = M[l_a]
    x11 = M[l_b]
    e = x8+x11
    x4 = M[l_d]
    f = x4+e

Allocating every spilled temp to its own frame slot can use a lot of memory! A better idea is to let spilled temps share a frame slot iff they are not live simultaneously: frame slot allocation!

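#2: frame slot allocation (cont.)

The spill slots l_a, l_b, l_c and l_d form an interference graph of their own (two slots interfere iff their stored values are needed at the same time). How many different colors are required to color this graph?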
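Frame slot allocation is graph coloring again, with physical slots as colors. In straight-line code each spilled temp's memory copy is needed from its store to its last load, so the conflict graph is an interval graph, and greedy coloring in order of the stores uses the fewest slots. A minimal C sketch, assuming ranges are given as statement indices sorted by their start; the names are hypothetical:

```c
#include <assert.h>

/* Memory live range of one spilled temp in straight-line code:
   stored at statement `first`, last loaded at statement `last`. */
typedef struct { int first, last; } Range;

/* Two temps conflict iff their stored values are needed at once. */
static int overlaps(Range x, Range y) {
    return x.first < y.last && y.first < x.last;
}

/* Greedy coloring: each temp, in order of its store, takes the lowest
   physical slot not held by an overlapping temp placed earlier.
   Fills slot[] and returns the number of physical slots needed. */
int share_slots(const Range *r, int n, int *slot) {
    int nslots = 0;
    for (int i = 0; i < n; i++) {
        assert(r[i].first < r[i].last);
        assert(i == 0 || r[i - 1].first <= r[i].first);
        int s = 0, clash;
        do {                              /* find the lowest free slot */
            clash = 0;
            for (int j = 0; j < i && !clash; j++)
                clash = (slot[j] == s && overlaps(r[i], r[j]));
            if (clash) s++;
        } while (clash);
        slot[i] = s;
        if (s + 1 > nslots) nslots = s + 1;
    }
    return nslots;
}
```

For the code above, numbering statements from 1, the ranges are l_a = [2, 13), l_b = [4, 14), l_c = [8, 9) and l_d = [12, 16).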


#3: coalescing
Suppose we have a move statement:

    t = u

What's the potential benefit of allocating both t and u to the same register r? The move becomes r = r, which can simply be deleted.

This is called coalescing.
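The simplest form merges t and u whenever they do not interfere (aggressive coalescing; Briggs' allocator uses a more conservative test). A minimal C sketch on a bit-matrix IG, with a hypothetical alias array recording which node each coalesced temp was merged into:

```c
#include <assert.h>
#include <stdint.h>

#define MAXN 32
typedef uint32_t Set;   /* adj[v]: bit w set iff v and w interfere */
#define BIT(v) ((Set)1u << (v))

/* Follow alias links to the representative of a coalesced group. */
int find(const int *alias, int v) {
    while (alias[v] != v) v = alias[v];
    return v;
}

/* Try to coalesce the move "t = u". If the representatives of t and u
   do not interfere, merge u into t: u's edges move over to t, and both
   will get the same register, turning the move into "r = r".
   Returns 1 if t and u now share a node, 0 if they interfere. */
int coalesce_move(Set *adj, int n, int *alias, int t, int u) {
    assert(n <= MAXN && t >= 0 && t < n && u >= 0 && u < n);
    t = find(alias, t);
    u = find(alias, u);
    if (t == u) return 1;                 /* already coalesced */
    if (adj[t] & BIT(u)) return 0;        /* they interfere: keep the move */
    for (int w = 0; w < n; w++)
        if (adj[u] & BIT(w)) {            /* redirect edge u--w to t--w */
            adj[w] = (adj[w] & ~BIT(u)) | BIT(t);
            adj[t] |= BIT(w);
        }
    adj[u] = 0;
    alias[u] = t;
    return 1;
}
```

Merging can raise the degree of the combined node and make the graph harder to color, which is why more careful coalescing criteria exist.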

Briggs’ Allocator
