Optimal Polynomial Time Algorithms for Register Assignment. Presented at the Chinese University of Hong Kong - Fernando M. Q. Pereira - August 28th, 2007 - University of California, Los Angeles


Optimal Polynomial Time Algorithms for Register Assignment

Presented at the Chinese University of Hong Kong

- Fernando M. Q. Pereira -

August 28th, 2007

University of California, Los Angeles


Background


Register Allocation

Assign physical locations to the variables in a program.

Registers are fast, but few. Memory is large, but slow.

Constraints: variables simultaneously alive must be assigned to different physical locations.

If there are not enough registers, some variables must be mapped into memory. These are called spilled variables.


Spill Free Register Allocation

Instance: program P and K registers.

Problem: can each of the variables of P be mapped to one of the K registers such that variables simultaneously alive are given different registers?


Liveness? Live Range?

A variable is alive if it can be used in the future.

The live range of a variable is the collection of program points where it is alive.

1) a := 1
2) b := 2
3) c := a
4) d := b
5) e := c
6) a := d
7) ret a + e

[Figure: the live ranges of a, b, c, d and e drawn beside the program; a has two separate segments.]
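The live ranges in the figure can be recomputed with a backward scan over the program. The following is a minimal sketch for straight-line code only; the `(defined, used)` encoding of statements and the name `live_before` are assumptions made for illustration, not part of the talk.

```python
# Each statement is (defined variable or None, list of used variables).
# This encoding of the seven-statement example is an assumption for illustration.
PROGRAM = [
    ("a", []),           # 1) a := 1
    ("b", []),           # 2) b := 2
    ("c", ["a"]),        # 3) c := a
    ("d", ["b"]),        # 4) d := b
    ("e", ["c"]),        # 5) e := c
    ("a", ["d"]),        # 6) a := d
    (None, ["a", "e"]),  # 7) ret a + e
]

def live_before(program):
    """Return, for each statement, the set of variables alive just before it."""
    live = set()
    result = [None] * len(program)
    for i in range(len(program) - 1, -1, -1):
        defined, used = program[i]
        live.discard(defined)  # scanning backwards, a definition ends the live range
        live.update(used)      # a use means the variable is alive before this point
        result[i] = set(live)
    return result

for number, alive in enumerate(live_before(PROGRAM), start=1):
    print(number, sorted(alive))
```

Under this encoding, a is alive before statements 2 and 3 and again before statement 7, which matches the two segments of a in the drawing.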


Quiz 1

How many registers?

1) a := 1
2) b := 2
3) c := a
4) d := b
5) e := c
6) a := d
7) ret a + e

Is there a general algorithm? Is this problem in P or NP?

a(R1) := 1
b(R2) := 2
c(R1) := a(R1)
d(R2) := b(R2)
e(R3) := c(R1)
a(R1) := d(R2)
ret a(R1) + e(R3)


Register Allocation and Graphs

SFRA = Graph coloring [Chaitin81]

a := 1
b := 2
c := a
d := b
e := c
a := d
ret a + e

[Figure: live ranges and interference graph of a, b, c, d and e.]

SFRA is NP-complete…
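To make the link between SFRA and coloring concrete, here is a small brute-force sketch on the running example. The edge list is an assumption read off the live ranges, taking the convention that a variable last used by a copy does not interfere with the copy's target; the function names are illustrative. The search is exponential, which is in line with the NP-completeness remark on the slide.

```python
from itertools import product

# Interference edges of the running example (assumed convention: a value last
# used at a statement does not interfere with the variable defined there).
NODES = ["a", "b", "c", "d", "e"]
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")]

def colorable(nodes, edges, k):
    """Brute-force search for a proper coloring with k colors (exponential)."""
    for colors in product(range(k), repeat=len(nodes)):
        assignment = dict(zip(nodes, colors))
        if all(assignment[u] != assignment[v] for u, v in edges):
            return assignment
    return None

def min_registers(nodes, edges):
    """Smallest k for which a proper k-coloring exists."""
    k = 1
    while colorable(nodes, edges, k) is None:
        k += 1
    return k

print(min_registers(NODES, EDGES))  # 3: the graph is a cycle of five nodes
```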


Example

a := 1
b := 2
c := a
d := b
e := c
a := d
ret a + e

a(R1), b(R2), c(R1), d(R3), e(R2)

Three registers: R1, R2 and R3

R1 := 1
R2 := 2
R1 := R1
R3 := R2
R2 := R1
R1 := R3
ret R1 + R2


Live Range Splitting

Live ranges are split via copy instructions and/or renaming of variables.

May reduce the degree of the interference graph.

[Figure, panels (a) to (f): a program with the statements a := 1, b := 1, := b, c := 1, := a, := c, and the same program after a is split into a1 and a2 by the copy a2 := a1, each shown with drawings over its variables (a, b, c and a1, a2, b, c).]


Quiz 2

If I can split live ranges, how many registers?

a := 1
b := 2
c := a
d := b
e := c
a := d
ret a + e

a1 := 1
b := 2
c := a1
d := b
e := c
a2 := d
ret a2 + e

[Figure: live ranges before splitting (a, b, c, d, e) and after splitting (a1, a2, b, c, d, e).]


Quiz 3

P or NP?

Instance: program P, K registers.

Problem: is there a way to split the live ranges of P so that all its variables can fit into K registers?

This problem has a polynomial-time solution! Three independent proofs in 2005:

Philip Brisk, WLS’05

Florent Bouchez, INRIA, Master’s thesis

Sebastian Hack, CC’06


Quiz 4, and a bit of intuition…

Is coloring of circular-arc graphs in P or NP?

Is coloring of interval graphs in P or NP?

[Figure: a circular-arc drawing and an interval drawing over a, b, c, d, e.]
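For interval graphs the question has a well-known positive answer: sorting the intervals by start point and reusing the color of any interval that has already ended gives an optimal coloring. The sketch below applies this to the live ranges of the split program from Quiz 2; the half-open `[definition, last use)` encoding and the name `color_intervals` are assumptions for illustration.

```python
import heapq

def color_intervals(intervals):
    """Greedy coloring of half-open intervals given as {name: (start, end)}."""
    coloring = {}
    free = []       # min-heap of colors released by intervals that have ended
    active = []     # min-heap of (end, color) for intervals still open
    next_color = 0
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1]):
        # Release the colors of all intervals that end at or before this start.
        while active and active[0][0] <= start:
            _, color = heapq.heappop(active)
            heapq.heappush(free, color)
        if free:
            color = heapq.heappop(free)
        else:
            color = next_color
            next_color += 1
        coloring[name] = color
        heapq.heappush(active, (end, color))
    return coloring

# Live ranges of the split program as [definition, last use); assumed encoding.
RANGES = {"a1": (1, 3), "b": (2, 4), "c": (3, 5),
          "d": (4, 6), "e": (5, 7), "a2": (6, 7)}
coloring = color_intervals(RANGES)
print(coloring, len(set(coloring.values())))
```

With these ranges the sketch uses two colors, whereas the unsplit program, whose live ranges wrap around like arcs on a circle, needs three registers.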


Intuition on Live Range Splitting

[Figure: drawings over a, b, c, d, e, and the same drawings after a is split into a1 and a2.]


SSA-Form: the new hope.

Static Single Assignment [CFR+91].

Intermediate program representation.

Each variable is defined only once.

1) a1 := 1
2) b := 2
3) c := a1
4) d := b
5) e := c
6) a2 := d
7) ret a2 + e

[Figure: live ranges of a1, b, c, d, e and a2.]


Polynomial time SFRA

[Brisk05,Bouchez05,Hack06]: the interference graph of SSA-form programs is chordal.

Chordal graphs can be colored in polynomial time.

SFRA has a polynomial-time solution for SSA-form programs.

Any program can be converted to SSA-form.

The SSA-form program never requires more registers than the original program.
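One standard way to color a chordal graph optimally is to visit the vertices in maximum cardinality search (MCS) order and give each the smallest color not used by its already colored neighbors. The sketch below is a simple quadratic-time version of that textbook technique, not code from the talk; the example graph and the function names are assumptions for illustration.

```python
def maximum_cardinality_search(graph):
    """Return the vertices in MCS order. For a chordal graph, the reverse of
    this order is a perfect elimination ordering."""
    weight = {v: 0 for v in graph}
    remaining = set(graph)
    order = []
    while remaining:
        # Pick an unvisited vertex with the most visited neighbors
        # (ties broken alphabetically to keep the result deterministic).
        v = max(sorted(remaining), key=lambda u: weight[u])
        order.append(v)
        remaining.remove(v)
        for n in graph[v]:
            if n in remaining:
                weight[n] += 1
    return order

def color_chordal(graph):
    """Greedy coloring along the MCS order; optimal when the graph is chordal."""
    coloring = {}
    for v in maximum_cardinality_search(graph):
        used = {coloring[n] for n in graph[v] if n in coloring}
        color = 0
        while color in used:
            color += 1
        coloring[v] = color
    return coloring

# A small chordal graph: two triangles sharing the edge b-c, plus a pendant e.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
print(color_chordal(GRAPH))
```

The largest clique in this example has three vertices, and the sketch uses exactly three colors.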


Quiz 5: RA in basic blocks

A basic block is a sequence of instructions with no branches.

What does the interference graph of an SSA-form basic block look like?

Give a polynomial-time algorithm for register assignment in basic blocks.
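One possible answer to the second question is a single forward scan that releases a register at the last use of its variable and hands a free register to each new definition. The sketch below assumes that every variable is defined inside the block and that nothing is live after it; the statement encoding and the name `assign_registers` are illustrative.

```python
def assign_registers(block, k):
    """Scan an SSA-form basic block once.

    block: list of (defined variable or None, list of used variables).
    Returns {variable: register index}, or None if k registers do not suffice.
    Assumes every used variable is defined earlier in the same block.
    """
    last_use = {}
    for i, (_, used) in enumerate(block):
        for v in used:
            last_use[v] = i
    free = list(range(k - 1, -1, -1))  # stack of free registers, register 0 on top
    assignment = {}
    for i, (defined, used) in enumerate(block):
        for v in set(used):
            if last_use[v] == i:
                free.append(assignment[v])  # operand dies here, release its register
        if defined is not None:
            if not free:
                return None
            assignment[defined] = free.pop()
            if defined not in last_use:     # never used: release immediately
                free.append(assignment[defined])
    return assignment

BLOCK = [
    ("a1", []), ("b", []), ("c", ["a1"]), ("d", ["b"]),
    ("e", ["c"]), ("a2", ["d"]), (None, ["a2", "e"]),
]
print(assign_registers(BLOCK, 2))
print(assign_registers(BLOCK, 1))
```

Because operands are released before the definition is placed, a copy such as c := a1 may reuse the register of a1 when a1 dies at that statement.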


Too good, but…

… real computer architectures are a little too surreal…


There are more things in x86, Horatio…

The polynomial time register assignment algorithm is too abstract.

Some computer architectures are messy:

Pre-colored registers.

Registers of different sizes.

Testimony: no publicly available implementation for x86 after two years.


Pre-colored registers

Some variables must be assigned to particular registers.

Ex.: calling conventions, division, etc.

Function call (PowerPC):

a := 10;
b := 2;
R0 := a;
R1 := b;
call(R0, R1);

Division (x86):

a := 10;
b := 2;
AX := a;
(AL,AH) := DIV AX, b;
d := AL; // quotient
r := AH; // remainder


Quiz 6: pre-coloring extension

Is pre-coloring extension of interval graphs in P or NP?

[Figure: two examples, labelled "easy :)" and "difficult :(".]

Pre-coloring extension is NP-complete for interval graphs [Biro92] and even for unit-interval graphs [Marx06].


Alias Register Allocation

Aliased registers can be used independently, or in combination.

Ex.: x86, Sun SPARC, MIPS floating point numbers, etc.

Ex.: aliased registers in the Pentium:

32 bits: EAX EBX ECX EDX
16 bits: AX BX CX DX
8 bits: AH AL BH BL CH CL DH DL
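The register table can be read as an overlap relation: two registers alias when they share at least one 8-bit part. A minimal sketch of that relation follows; the `PARTS` table and the function `aliases` are illustrative names, and the unnamed upper 16 bits of the 32-bit registers are deliberately ignored.

```python
# Map each register to the named 8-bit registers it contains. The upper half
# of EAX, EBX, ECX and EDX has no name of its own, so it is left out here.
PARTS = {}
for letter in "ABCD":
    low, high = letter + "L", letter + "H"
    PARTS[low] = {low}
    PARTS[high] = {high}
    PARTS[letter + "X"] = {low, high}
    PARTS["E" + letter + "X"] = {low, high}

def aliases(r1, r2):
    """True if r1 and r2 share storage, so writing one can change the other."""
    return bool(PARTS[r1] & PARTS[r2])

print(aliases("AX", "AH"))   # True: AH is the upper byte of AX
print(aliases("AH", "AL"))   # False: the two bytes can be used independently
print(aliases("EAX", "BX"))  # False: different register families
```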


Quiz 7: Weighted Coloring

What is the optimal 1-2-coloring of the graph on the left?

[Figure: intervals a, b, c, d, e and their interference graph, followed by two colorings.]

Shipbuilding: a(23), b(0), c(12), d(3), e(1)

Alias RA: a(01), b(2), c(01), d(4), e(3)


Alias Register Allocation

Alias Register Allocation is similar to the shipbuilding problem [Gol04, p. 204].

Alias Register Allocation is NP-complete [LPP07] for interval graphs.

And so is the shipbuilding problem...


What can SSA do?

The SSA transformation is too weak to handle alias register allocation and programs with pre-colored variables.


Register Allocation by Puzzle Solving

Polynomial time 1-2-coloring extension with live range splitting.


Aliased Register Allocation with Pre-coloring

Instance: program P containing variables that are either short or long, 2K available registers, plus a partial function that associates variables with registers. Long variables are assigned two registers {2i, 2i+1}, 0 ≤ i < K, and short variables are assigned one register.

Problem: is it possible to extend this partial function so that it constitutes a valid register allocation of P? The register allocator is allowed to split live ranges.
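The rules of the instance can be written down as a checker for one candidate assignment. The sketch below is only a validity test at the level of an interference graph and ignores live range splitting; the data layout and the names `occupied` and `is_valid` are assumptions for illustration.

```python
def occupied(kind, first):
    """Registers used by a variable whose first register is `first`."""
    return {first, first + 1} if kind == "long" else {first}

def is_valid(kinds, interferences, placement, precolored, k):
    """Check a complete placement {variable: first register} with 2k registers.

    kinds: {variable: "short" or "long"}; interferences: list of pairs;
    precolored: the partial function given with the instance.
    """
    for var, first in placement.items():
        if kinds[var] == "long" and first % 2 != 0:
            return False  # long variables need an aligned pair {2i, 2i+1}
        if not all(0 <= r < 2 * k for r in occupied(kinds[var], first)):
            return False
    if any(placement.get(v) != r for v, r in precolored.items()):
        return False      # the placement must extend the pre-coloring
    return all(
        occupied(kinds[u], placement[u]).isdisjoint(occupied(kinds[v], placement[v]))
        for u, v in interferences
    )

KINDS = {"x": "long", "y": "short", "z": "short"}
INTERFERENCES = [("x", "y"), ("y", "z"), ("x", "z")]
PRECOLORED = {"y": 2}

print(is_valid(KINDS, INTERFERENCES, {"x": 0, "y": 2, "z": 3}, PRECOLORED, 2))  # True
print(is_valid(KINDS, INTERFERENCES, {"x": 2, "y": 2, "z": 0}, PRECOLORED, 2))  # False
```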


In other words…

Optimal spill free register allocation.

x86, Ultra SPARC, MIPS, PowerPC, … as far as I know, any register based architecture.

Heuristics for spilling.


Heuristics for spilling?

Optimal solution for spill free register allocation.

If it is not possible to find an optimal register assignment for program P, variables of P must be stored in memory.

Finding the minimum number of variables that must be spilled is NP-complete.

Finding the largest K-colorable induced subgraph of a chordal graph is NP-complete [Yannakakis87].


[PP07] - The Main Ideas

Elementary Programs and Elementary graphs.

Elementary programs have elementary interference graphs.

Any well structured program can be converted to an elementary program.

Each connected component of an elementary graph is a clique substitution of P3.


[PP07] - The Main Ideas


Elementary Programs

P is an elementary program if:

1. P is strict

2. P is in static single assignment form

3. For any variable v of P, LR(v) contains at most one program point outside the basic block that contains def(v)

4. If two variables u,v of P interfere, then either def(u) = def(v), or kill(u) = kill(v)

5. If two variables u,v of P interfere, then either LR(u) ⊆ LR(v), or LR(v) ⊆ LR(u)


(a) Strict program (b) Elementary program


Interference graph


Clique Substitution of P3

P3 is a path with three vertices.

[Figure: P3, K2 and K3, and the clique substitution P3[K2, K2, K3], with its three cliques labelled X clique, Y clique and Z clique.]


Elementary Graphs

Definition: G is an elementary graph if and only if every connected component of G is a clique substitution of P3.

Theorem: An elementary program has an elementary interference graph.
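A clique substitution of P3 can be built directly from three vertex lists: each list becomes a clique, and the middle clique is fully connected to the two outer ones, mirroring the two edges of the path. The sketch below constructs P3[K2, K2, K3]; the function name and the vertex labels are assumptions for illustration.

```python
from itertools import combinations

def clique_substitution_p3(x, y, z):
    """Build P3[K|x|, K|y|, K|z|] as an adjacency map.

    X, Y and Z are cliques; every X vertex is adjacent to every Y vertex and
    every Y vertex to every Z vertex. X and Z are not connected to each other.
    """
    graph = {v: set() for v in (*x, *y, *z)}

    def connect(u, v):
        graph[u].add(v)
        graph[v].add(u)

    for clique in (x, y, z):
        for u, v in combinations(clique, 2):
            connect(u, v)
    for u in y:
        for v in (*x, *z):
            connect(u, v)
    return graph

G = clique_substitution_p3(["x1", "x2"], ["y1", "y2"], ["z1", "z2", "z3"])
edges = sum(len(neighbors) for neighbors in G.values()) // 2
print(len(G), edges)  # 7 vertices, 15 edges
```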


Aligned 1-2-coloring extension

Instance: Graph G with nodes that are either short or long, 2K available colors, plus a partial function that associates nodes with colors. Long nodes are assigned two colors {2i, 2i+1}, 0 ≤ i < K, and short nodes are assigned one.

Problem: is it possible to extend this partial function so that it constitutes a valid coloring of G?


Graph Hierarchy


The Puzzles

The Board:

The Pieces:


From graphs to puzzles

Given PX,Y,Z we build a puzzle:

Vertex → piece

Color → column

X-clique → upper row

Y-clique → both rows

Z-clique → lower row

Pre-coloring → some pieces are already on the board

Theorem: Aligned 1-2-coloring extension for clique substitutions of P3 and puzzle solving are equivalent under linear-time reductions.
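The correspondence above can be sketched as a small data conversion. Everything in this block is a hypothetical encoding: the `Piece` record, the row names, and the assumption that a long vertex becomes a piece two columns wide are illustrative choices, not the representation used in the talk.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Piece:
    vertex: str
    rows: Tuple[str, ...]   # rows of the board the piece covers
    width: int              # columns covered: 1 for short, 2 for long (assumed)
    column: Optional[int]   # first column if the vertex is pre-colored, else None

# X-clique -> upper row, Y-clique -> both rows, Z-clique -> lower row.
ROWS = {"X": ("upper",), "Y": ("upper", "lower"), "Z": ("lower",)}

def to_pieces(cliques, kinds, precoloring):
    """cliques: {"X": [...], "Y": [...], "Z": [...]};
    kinds: {vertex: "short" or "long"}; precoloring: {vertex: first color}."""
    pieces = []
    for label, vertices in cliques.items():
        for v in vertices:
            width = 2 if kinds[v] == "long" else 1
            pieces.append(Piece(v, ROWS[label], width, precoloring.get(v)))
    return pieces

PIECES = to_pieces(
    {"X": ["x1"], "Y": ["y1"], "Z": ["z1"]},
    {"x1": "short", "y1": "long", "z1": "short"},
    {"y1": 0},
)
for piece in PIECES:
    print(piece)
```

In this encoding a pre-colored vertex simply arrives with its column already fixed, which corresponds to a piece that is on the board before solving starts.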


Rules, Patterns and matches

match

Don’t match


Example Program


Our Solution


Counter-example 1

Lesson: use a size-2 piece before two size-1 pieces


Counter-example 2

Lesson: statements 7-10 must come before statements 11-14


Counter-example 3

Lesson: statement 15 must come before statements 11-14


Counter-example 4

Lesson: the order of statements 11-14 is crucial


Running Complexity

Theorem: a puzzle is solvable if, and only if, our program succeeds on the puzzle.

Our puzzle solving program runs in linear time.


Spilling

Visit each puzzle once.

If the puzzle is not solvable, then remove some pieces and try to solve again.

Each time we remove a piece, we also remove all other pieces that stem from the same variable in the original program.

Spill farthest use first.
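The retry loop described on this slide can be sketched for a single puzzle as follows. The puzzle solver is passed in as a callable, and `toy_solver` is a stand-in that only counts pieces; the names `solve_with_spilling`, `next_use` and `origin` are hypothetical, and removing pieces of the same variable from other puzzles is left out.

```python
def solve_with_spilling(pieces, next_use, origin, solve):
    """Try to solve one puzzle, spilling variables until it becomes solvable.

    pieces: piece identifiers; next_use[p]: distance to the next use of p;
    origin[p]: variable of the original program that p stems from;
    solve: callable returning a solution, or None if the puzzle is unsolvable.
    """
    remaining = list(pieces)
    spilled = set()
    while True:
        solution = solve(remaining)
        if solution is not None:
            return solution, spilled
        if not remaining:
            return None, spilled  # nothing left to remove
        victim = max(remaining, key=lambda p: next_use[p])  # farthest use first
        spilled.add(origin[victim])
        # Remove every piece that stems from a spilled variable.
        remaining = [p for p in remaining if origin[p] not in spilled]

def toy_solver(pieces):
    """Stand-in for the real solver: succeeds when at most two pieces remain."""
    return list(pieces) if len(pieces) <= 2 else None

NEXT_USE = {"p1": 5, "p2": 1, "p3": 9, "p4": 7}
ORIGIN = {"p1": "u", "p2": "w", "p3": "v", "p4": "v"}
print(solve_with_spilling(["p1", "p2", "p3", "p4"], NEXT_USE, ORIGIN, toy_solver))
```

Here p3 has the farthest next use, so its variable v is spilled, and p4 leaves the puzzle with it because it stems from the same variable.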


Experimental Results

Puzzle solver has been implemented in the LLVM [CV04] framework.

Compile C programs to x86 target.

Over one million lines of code compiled!

We have compared our allocator with LLVM’s default algorithm, and a well-known graph coloring heuristic.


Benchmarks

Benchmark               LoC       Asm       btcode
ASCI Purple:smg2000     74,875    73,039    303,037
SPEC2000:175.vpr        70,253    52,917    173,475
SPEC2000:188.ammp       54,335    35,567    149,245
MallocBench:espresso    52,853    45,041    250,770
SPEC2000:197.parser     49,388    32,849    163,025
SPEC2000:164.gzip       39,157    8,130     46,188
(six more)              …         …         …
Total                   409,540   286,900   1,345,898


Types of Puzzles


Number of Iterations

Benchmark               Puzzles   Avg    Max   Once
ASCI Purple:smg2000     52,791    1.33   8     33,822
SPEC2000:175.vpr        47,276    1.10   10    45,575
SPEC2000:188.ammp       33,428    1.09   9     28,515
MallocBench:espresso    43,791    1.06   3     38,925
SPEC2000:197.parser     30,868    1.05   4     28,992
SPEC2000:164.gzip       7,840     1.06   3     6,718
(six more)              …         …      …     …
Total                   251,428   1.13   10    213,411


Execution Time of Generated Code

Data normalized with respect to GCC -O2.


Conclusion

If you want to do register allocation for the Pentium, your problem is to solve a collection of puzzles.

Fast compilation time, competitive code quality.

Many possible directions for future research.