23
1 Improving Compiler Heuristics with Machine Learning Mark Stephenson Una-May O’Reilly Martin C. Martin Saman Amarasinghe Massachusetts Institute of Technology System Complexities Compiler complexity Open Research Compiler ~3.5 million lines of C/C++ code Trimaran’s compiler ~ 800,000 lines of C code Architecture Complexity MMU and improved FPU Speculative execution Branch prediction Pentium® (3M) Superscalar Pentium 4 (55M) Features

Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

1

Improving CompilerHeuristics with

Machine LearningMark StephensonUna-May O’Reilly

Martin C. MartinSaman Amarasinghe

Massachusetts Institute of Technology

System Complexities Compiler complexity

Open Research Compiler ~3.5 million lines of

C/C++ code Trimaran’s compiler

~ 800,000 lines ofC code

Architecture Complexity MMU andimprovedFPU

Speculativeexecution

Branchprediction

Pentium®(3M)

Superscalar

Pentium 4(55M)

Features

Page 2: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

2

NP-Completeness

Many compiler problems are NP-complete Thus, implementations can’t be optimal

Compiler writers rely on heuristics In practice, heuristics perform well …but, require a lot of tweaking

Heuristics often have a focal point Rely on a single priority function

Priority Functions

A heuristic’s Achilles heel A single priority or cost function often dictates the

efficacy of a heuristic Priority functions rank the options available to a

compiler heuristic List scheduling (identifying instructions in worklist to

schedule first) Graph coloring register allocation (selecting nodes to

spill) Hyperblock formation (selecting paths to include)

Any priority function is legal

Page 3: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

3

Our Proposal

Use machine-learning techniques toautomatically search the priority functionspace Increases compiler’s performance Reduces compiler design complexity

Can focus on a small portion of anoptimization algorithm Don’t need to worry about legality checking

Small change can yield big payoffs Clear specification in terms of input/output Prevalent in compiler heuristics

Qualities of Priority Functions

Page 4: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

4

An Example OptimizationHyperblock Scheduling

Conditional execution is potentially veryexpensive on a modern architecture

Modern processors try to dynamically predictthe outcome of the condition This works great for predictable branches… But some conditions can’t be predicted

If they don’t predict correctly you waste a lotof time

Example OptimizationHyperblock Scheduling

if (a[1] == 0)

else

Assume a[1] is 0 time

reso

urce

s

Misprediction

Page 5: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

5

Example OptimizationHyperblock Scheduling

if (a[1] == 0)

else

Assume a[1] is 0 time

reso

urce

s

Example OptimizationHyperblock Scheduling (using predication)

if (a[1] == 0)

else

Assume a[1] is 0 time

reso

urce

s

Processor simply discards resultsof instructions that weren’tsupposed to be run

Page 6: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

6

Example OptimizationHyperblock Scheduling

There are unclear tradeoffs In some situations, hyperblocks are faster than traditional

execution In others, hyperblocks impair performance

Many factors affect this decision Accuracy of branch predictor Availability of parallel execution resources Effectiveness of the compiler’s scheduler Parallelizability and predictability of the program

Hard to model

Example OptimizationHyperblock Scheduling [Mahlke]

Find predicatable regions of control flow Enumerate paths of control in region

Exponential, but in practice it’s okay Prioritize paths based on four path

characteristics The priority function we want to optimize

Add paths to hyperblock in priority order

Page 7: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

7

Trimaran’s Priority Function

!"#

=free hazard is if :1

hazard has if :25.0

i

i

isegment

segmenthazard

jNj

ii

heightdep

heightdepratiodep

_max

__

1!=

=

)__1.2(_ iiiii ratioopratiodephazardratioexecpriority !!""=

jNj

ii

heightop

heightopratioop

_max

__

1!=

=

Favor frequentlyexecuted paths Favor short

paths

Penalize paths with hazards Favor parallel paths

Our Approach

Trimaran uses four characteristics

What are the important characteristics of ahyperblock formation priority function?

Our approach: Extract all the characteristicsyou can think of and let a learning techniquefind the priority function

Page 8: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

8

Genetic Programming

Searching algorithm analogous to Darwinianevolution Maintain a population of expressions

num_ops 2.3 predictability 4.1

- /

*

Genetic Programming

Searching algorithm analogous to Darwinianevolution Maintain a population of expressions Selection

The fittest expressions in the population are morelikely to reproduce

Reproduction Crossing over subexpressions of two expressions

Mutation

Page 9: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

9

General Flow

Create initial population(initial solutions)

Evaluation

Selection

Randomly generated initialpopulation seeded with thecompiler writer’s bestguess

Create Variants

done?

General Flow

Create initial population(initial solutions)

Evaluation

Selection

Create Variants

done?

Compiler is modified to usethe given expression as apriority function

Each expression isevaluated by compiling andrunning the benchmark(s)

Fitness is the relativespeedup over Trimaran’spriority function on thebenchmark(s)

Page 10: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

10

General Flow

Create initial population(initial solutions)

Evaluation

Selection

Create Variants

done?

Just as with NaturalSelection, the fittestindividuals are more likelyto survive

General Flow

Create initial population(initial solutions)

Evaluation

Selection

Create Variants

done?

Use crossover andmutation to generate newexpressions

And thus, generate newcompilers

Page 11: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

11

Experimental Setup

Collect results using Trimaran Simulate a VLIW machine

64 GPRs, 64 FPRs, 128 PRs 4 fully pipelined integer FUs 2 fully pipelined floating point FUs 2 memory units (L1:2, L2:7, L3:35)

Replace priority functions in IMPACT with ourGP expression parser and evaluator

Outline of Results High-water mark

Create compilers specific to a given applicationand a given data set

Essentially partially evaluating the application Application-specific compilers

Compiler trained for a given application and dataset, but run with an alternate data set

General-purpose compiler Compiler trained on multiple applications and

tested on an unrelated set of applications

Page 12: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

12

Training the Priority Function

Compiler

A.c B.c C.c D.c

A B C D

1 2

Training the Priority FunctionApplication-Specific Compilers

Compiler

A.c B.c C.c D.c

A B C D

1 2

Page 13: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

13

Hyperblock ResultsApplication-Specific Compilers (High-Water Mark)

1.54

1.23

0

0.5

1

1.5

2

2.5

3

3.5

129.

com

pres

s

g721

enco

de

g721

deco

de

huff_

dec

huff_

enc

raw

caud

io

raw

daud

io

toas

t

mpe

g2de

c

Ave

rage

Spee

dup

Training input Novel input

(add (sub (cmul (gt (cmul $b0 0.8982 $d17)…$d7)) (cmul $b0 0.6183 $d28)))

(add (div $d20 $d5) (tern $b2 $d0 $d9))

Training the Priority FunctionGeneral-Purpose Compilers

Compiler

A.c B.c C.c D.c

A B C D

1

Page 14: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

14

Hyperblock ResultsGeneral-Purpose Compiler

1.44

1.25

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5de

codr

le4

codr

le4

g721

deco

de

g721

enco

de

raw

daud

io

raw

caud

io

toas

t

mpe

g2de

c

124.

m88

ksim

129.

com

pres

s

huff_

enc

huff_

dec

Ave

rage

Spee

dup

Train data set Novel data set

Validation of GeneralityTesting General-Purpose Applicability

1.09

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6

unep

ic

djpe

g

rast

a

023.

eqnt

ott

132.

ijpeg

052.

alvi

nn

147.

vorte

x

085.

cc1 art

130.

li

osde

mo

mip

map

Ave

rage

Spe

edup

Page 15: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

15

Running Time Application specific compilers

~1 day using 15 processors General-purpose compilers

Dynamic Subset Selection [Gathercole] Run on a subset of the training benchmarks at a time

Memoize fitnesses ~1 week using 15 processors

This is a one time process! Performed by the compiler vendor

(add (sub (mul exec_ratio_mean 0.8720) 0.9400) (mul 0.4762 (cmul (not has_pointer_deref)

(mul 0.6727 num_paths) (mul 1.1609

(add (sub (mul (div num_ops dependence_height) 10.8240) exec_ratio) (sub (mul (cmul has_unsafe_jsr predict_product_mean 0.9838)

(sub 1.1039 num_ops_max)) (sub (mul dependence_height_mean num_branches_max) num_paths)))))))

GP Hyperblock SolutionsGeneral Purpose

Intron that doesn’t affectsolution

Page 16: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

16

(add (sub (mul exec_ratio_mean 0.8720) 0.9400) (mul 0.4762 (cmul (not has_pointer_deref)

(mul 0.6727 num_paths) (mul 1.1609

(add (sub (mul (div num_ops dependence_height) 10.8240) exec_ratio) (sub (mul (cmul has_unsafe_jsr predict_product_mean 0.9838)

(sub 1.1039 num_ops_max)) (sub (mul dependence_height_mean num_branches_max) num_paths)))))))

GP Hyperblock SolutionsGeneral Purpose

Favor paths that don’thave pointer dereferences

GP Hyperblock SolutionsGeneral Purpose

(add (sub (mul exec_ratio_mean 0.8720) 0.9400) (mul 0.4762 (cmul (not has_pointer_deref)

(mul 0.6727 num_paths) (mul 1.1609

(add (sub (mul (div num_ops dependence_height) 10.8240) exec_ratio) (sub (mul (cmul has_unsafe_jsr predict_product_mean 0.9838)

(sub 1.1039 num_ops_max)) (sub (mul dependence_height_mean num_branches_max) num_paths)))))))

Favor highly parallel(fat) paths

Page 17: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

17

GP Hyperblock SolutionsGeneral Purpose

(add (sub (mul exec_ratio_mean 0.8720) 0.9400) (mul 0.4762 (cmul (not has_pointer_deref)

(mul 0.6727 num_paths) (mul 1.1609

(add (sub (mul (div num_ops dependence_height) 10.8240) exec_ratio) (sub (mul (cmul has_unsafe_jsr predict_product_mean 0.9838)

(sub 1.1039 num_ops_max)) (sub (mul dependence_height_mean num_branches_max) num_paths)))))))

If a path calls asubroutine that mayhave side effects,penalize it

Our Proposal

Use machine-learning techniques toautomatically search the priority functionspace Increases compiler’s performance Reduces compiler design complexity

Page 18: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

18

Eliminate the Human from theLoop

So far we have tried to improve existingpriority functions Still a lot of person-hours were spent creating the

initial priority functions Observation: the human-created priority

functions are often eliminated in the 1st

generation What if we start from a completely random

population (no human-generated seed)?

Another ExampleRegister Allocation

An old, established problem Hundreds of papers on the subject

Priority-Based Register Allocation[Chow,Hennessey] Uses a priority function to determine the worth of

allocating a register Let’s throw our GP system at the problem

and see what it comes up with

Page 19: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

19

Register Allocation ResultsGeneral-Purpose Compiler

1.03

1.03

0.90

0.95

1.00

1.05

1.10

1.15

1.20

raw

caud

io

raw

daud

io

huff_

enc

huff_

dec

129.

com

pres

s

mpe

g2de

c

g721

enco

de

g721

deco

de

aver

age

Spee

dup

Train data set Novel data set

Validation of GeneralityTesting General-Purpose Applicability

1.02

0.94

0.96

0.98

1.00

1.02

1.04

1.06

1.08

deco

drle

4

codr

le4

130.

li

djpe

g

unep

ic

124.

m88

ksim

023.

eqnt

ott

132.

ijpeg

085.

cc1

147.

vorte

x

aver

age

Spee

dup

Page 20: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

20

Importance of Priority FunctionsSpeedup over a constant priority function

1.32 1.

36

0.80

1.00

1.20

1.40

1.60

1.80

2.00

2.20

raw

caud

io

raw

daud

io

huff_

enc

huff_

dec

129.

com

pres

s

mpe

g2de

c

g721

enco

de

g721

deco

de

aver

age

Spee

dup

(ove

r no

func

tion)

Trimaran's function GP's function

Advantages our SystemProvides

Engineers can focus their energy on moreimportant tasks

Can quickly retune for architectural changes Can quickly retune when compiler changes Can provide compilers catered toward

specific application suites e.g., consumer may want a compiler that excels

on scientific benchmarks

Page 21: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

21

Related Work

Calder et al. [TOPLAS-19] Fine tuned static branch prediction heuristics Requires a priori classification by a supervisor

Monsifrot et al. [AIMSA-02] Classify loops based on amenability to unrolling Also used a priori classification

Cooper et al. [Journal of Supercomputing-02] Use GAs to solve phase ordering problems

Conclusion

Performance talk and a complexity talk Take a huge compiler, optimize one priority

function with GP and get exciting speedups Take a well-known heuristic and create

priority functions for it from scratch There’s a lot left to do

Page 22: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

22

Why Genetic Programming? Many learning techniques rely on having pre-

classified data (labels) e.g., statistical learning, neural networks, decision

trees Priority functions require another approach

Reinforcement learning Unsupervised learning

Several techniques that might work well e.g., hill climbing, active learning, simulated

annealing

Page 23: Improving Compiler Heuristics with Machine Learninggroups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/... · 3 Our Proposal Use machine-learning techniques to automatically

23

Why Genetic Programming

Benefits of GP Capable of searching high-dimensional spaces It is a distributed algorithm The solutions are human readable

Nevertheless…there are other learningtechniques that may also perform well

Genetic ProgrammingReproduction

num_ops 2.3 predictability 4.1

- /

*

branches 7.5

* 1.1

/

branches 7.5

*

predictability 4.1

/

*

1.1

/

num_ops 2.3

-