Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. [email protected]


Page 1: Compiler Research  in HPC Lab

Compiler Research in HPC Lab

R. Govindarajan
High Performance Computing Lab

[email protected]

Page 2: Compiler Research  in HPC Lab

Organization

• HPC Lab Research Overview
• Compiler Analysis & Optimizations
    • Precise Dataflow Analysis
    • Energy Reduction for Embedded Systems
        • Array Allocation for Partitioned Memory Arch.
        • Dynamic Voltage Scaling
    • Integrated Spill Code Generation & Scheduling
• Conclusions

Page 3: Compiler Research  in HPC Lab

HPC Team (or HPC XI)

• Mruggesh Gajjar
• B.C. Girish
• R. Karthikeyan
• R. Manikantan
• Santosh Nagarakatte
• Rupesh Nasre
• Sreepathi Pai
• Kaushik Rajan
• T.S. Rajesh Kumar
• V. Santhosh Kumar
• Aditya Thakur

Coach: R. Govindarajan

Page 4: Compiler Research  in HPC Lab

HPC Lab Research Overview

• Compiler Optimizations: traditional analysis & optimizations, power-aware compiling techniques, compilation techniques for embedded systems
• Computer Architecture: superscalar architecture, architecture-compiler interaction, application-specific processors, embedded systems
• High Performance Computing: cluster computing, HPC applications

Page 5: Compiler Research  in HPC Lab

Compiler Research in HPC Lab

• ILP Compilation Techniques
• Compiling Techniques for Embedded Systems
• Compiling Techniques for Application-Specific Systems
• Dataflow Analysis

Page 6: Compiler Research  in HPC Lab

ILP Compilation Techniques

• Instruction scheduling
• Software pipelining
• Register allocation
• Power/energy-aware compilation techniques
• Compiling techniques for embedded systems/application-specific processors (DSP, Network Processors, …)

Page 7: Compiler Research  in HPC Lab

Compiling Techniques for Embedded Systems

• Power-aware software pipelining method (using an integer linear program formulation)
• Simple Offset Assignment for code-size reduction
• Loop transformation and memory bank assignment for power reduction
• Compiler-assisted Dynamic Voltage Scaling
• Memory layout problem for embedded systems
• MMX code generation using vectorization

Page 8: Compiler Research  in HPC Lab

Compiling Techniques for Application-Specific Systems

• Framework for exploring the application design space for network applications
• Compiling techniques for streaming applications and program models
    • Buffer-aware, schedule-size-aware, throughput-optimal schedules

Page 9: Compiler Research  in HPC Lab

Compiler Analysis

• Precise Dataflow Analysis
• Pointer Analysis

Page 10: Compiler Research  in HPC Lab

So, What is the Connection?

Compiler problems are:

• Optimization problems: solved by formulating the problem as an Integer Linear Program.
    • Involves non-trivial effort!
    • Efficient formulation for reducing execution time!
    • Other evolutionary approaches can also be used.
• Graph-theoretic problems: leverage existing well-known approaches.
• Modelled using automata: elegant problem formulation to ensure correctness.

Page 11: Compiler Research  in HPC Lab

Precise Dataflow Analysis

The Problem: Improve the precision of the data-flow analysis used in compiler optimization.

Page 12: Compiler Research  in HPC Lab

Constant Propagation

[Figure: control-flow graph with nodes start, A, B: x=1, C: x=2, D, E, F, G: y = x, H, I, J, end. "…" marks statements unrelated to x or y; { } encloses data-flow information; nc means "not constant".]

The data-flow information {x = 1} and {x = 2} merges into {x = nc}.

Can't replace the use of x at G with a constant.
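The loss of precision at a merge can be sketched in a few lines of Python. This is only an illustration of the constant-propagation meet, not code from the talk; the names `NC`, `meet` and `merge`, and the choice to treat a missing entry as "no information yet", are assumptions made for the sketch.

```python
NC = "nc"  # "not constant", as in the slide's legend


def meet(a, b):
    """Combine the values of one variable arriving along two paths.

    None stands for "no information yet" (an assumption of this sketch).
    """
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else NC


def merge(facts_a, facts_b):
    """Meet two data-flow facts (dicts from variable name to value)."""
    names = facts_a.keys() | facts_b.keys()
    return {v: meet(facts_a.get(v), facts_b.get(v)) for v in names}


# Facts flowing out of B (x=1) and C (x=2) meet at the merge node.
at_merge = merge({"x": 1}, {"x": 2})
print(at_merge)
```

Because the two incoming values differ, x becomes `nc` after the merge, which is why the use of x at G cannot be replaced by a constant in the original graph.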

Page 13: Compiler Research  in HPC Lab

Overview of our Solution

[Figure: the original control-flow graph (start, A, B: x=1, C: x=2, D, E, F, G: y = x, H, I, J, end) beside the restructured graph with nodes start, A0, B0, C0, D1, E1, F1, G1, D2, E2, F2, G2, H0, I0, J0, end.]

Can replace uses of x at G1 and G2.

Page 14: Compiler Research  in HPC Lab

Challenges

The Problem: Improve precision of data-flow analysis

Approach: Restructuring control-flow of the program

Challenges:
• Developed generic framework
• Guarantees optimization opportunities
• Handles the precision and code size trade-off
• Approach is simple and clean

Page 15: Compiler Research  in HPC Lab

A brief look at our example.

[Figure: the example control-flow graph (start, A, B: x=1, C: x=2, D, E, F, G: y = x, H, I, J, end), annotated with data-flow information {x = 1}, {x = 2} and {x = nc}, where nc means "not constant".]

At control-flow merge D, we lose precision.

Page 16: Compiler Research  in HPC Lab

[Figure: the example control-flow graph (start, A, B: x=1, C: x=2, D, E, F, G: y = x, H, I, J, end), with one region marked.]

Need to duplicate this in order to optimize node G…

Page 17: Compiler Research  in HPC Lab

[Figure: the example control-flow graph (start, A, B: x=1, C: x=2, D, E, F, G: y = x, H, I, J, end).]

…such that paths with differing dataflow information do not intersect.

Page 18: Compiler Research  in HPC Lab

[Figure: the example control-flow graph (start, A, B: x=1, C: x=2, D, E, F, G: y = x, H, I, J, end), with one region marked.]

No need to duplicate this.

Page 19: Compiler Research  in HPC Lab

Control-flow Graph = Automaton

View a control-flow graph G as a finite automaton with
• states as nodes
• start state as entry node
• accepting state as exit node
• transitions as the edges

Page 20: Compiler Research  in HPC Lab

The Automaton

[Figure: the example control-flow graph (start, A, B: x=1, C: x=2, D, E, F, G: y = x, H, I, J, end) beside the split automaton for D, which has states 0, 1 and 2 and transitions labeled B-D, C-D, G-H and G-I.]

Page 22: Compiler Research  in HPC Lab

CFG x Automaton = Split Graph

[Figure: the example control-flow graph (start, A, B: x=1, C: x=2, D, E, F, G: y = x, H, I, J, end) combined with the split automaton for D (states 0, 1, 2; transitions labeled B-D, C-D, G-H, G-I) gives the split graph with nodes start, A0, B0, C0, D1, E1, F1, G1, D2, E2, F2, G2, H0, I0, J0, end.]
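The product construction can be sketched as follows. The figure's edges do not survive in this transcript, so both `CFG_EDGES` and the transition table `SPLIT` are reconstructions assumed for illustration (a diamond A→{B,C}→D, D→{E,F}→G, G→{H,I}→J; B-D moves state 0 to 1, C-D moves 0 to 2, and G-H or G-I returns to 0). The function name `split_graph` is likewise hypothetical.

```python
from collections import deque

# Assumed CFG edges (the slide's figure is not recoverable from the text).
CFG_EDGES = [
    ("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
    ("D", "E"), ("D", "F"), ("E", "G"), ("F", "G"),
    ("G", "H"), ("G", "I"), ("H", "J"), ("I", "J"),
]

# Assumed split automaton for D: (state, CFG edge) -> next state.
# Any edge not listed leaves the automaton state unchanged.
SPLIT = {
    (0, ("B", "D")): 1,
    (0, ("C", "D")): 2,
    (1, ("G", "H")): 0, (1, ("G", "I")): 0,
    (2, ("G", "H")): 0, (2, ("G", "I")): 0,
}


def split_graph(cfg_edges, automaton, entry, start_state=0):
    """Build the reachable part of the CFG x automaton product."""
    succs = {}
    for u, v in cfg_edges:
        succs.setdefault(u, []).append(v)
    start = (entry, start_state)
    nodes, edges = {start}, []
    work = deque([start])
    while work:
        node, q = work.popleft()
        for nxt in succs.get(node, []):
            q2 = automaton.get((q, (node, nxt)), q)
            edges.append(((node, q), (nxt, q2)))
            if (nxt, q2) not in nodes:
                nodes.add((nxt, q2))
                work.append((nxt, q2))
    return nodes, edges


nodes, edges = split_graph(CFG_EDGES, SPLIT, "A")
print(sorted(f"{n}{q}" for n, q in nodes))
```

Under these assumptions the reachable product nodes are exactly the split-graph labels listed on the slide: D, E, F and G appear twice (states 1 and 2), while A, B, C, H, I and J appear once in state 0.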

Page 23: Compiler Research  in HPC Lab

Energy Reduction: Array Alloc. for Partitioned Memory Arch.

• Dynamic energy reduction in the memory subsystem.
    • Memory subsystem consumes significant energy
    • Many embedded applications are array intensive
    • Memory architecture with multiple banks
• Exploiting various low-power modes of partitioned memory architectures.
    • Put idle memory banks in low-power mode
    • Allocate arrays to memory banks such that more memory banks can be in low-power mode for a longer duration

Page 24: Compiler Research  in HPC Lab

Partitioned Memory Architectures

• Memory banks with low-power modes: Active, Stand-by, Napping, Power-down, Disabled.
• Resynchronization time: time to move from a lower-power mode to Active mode.

Mode         Resynch. Time (cycles)   Energy Consumed (nJ)
Active       0                        0.718
Standby      2                        0.468
Napping      30                       0.0206
Power Down   9000                     0.00875
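To make the trade-off in the table concrete, here is a minimal sketch of one possible policy: for a known idle gap, pick the lowest-energy mode whose resynchronization time still fits in the gap. The policy itself, the name `pick_mode`, and the decision to ignore any energy spent on the mode transition are assumptions of this sketch, not something the slides specify.

```python
# (mode, resynchronization time in cycles, energy in nJ), taken from the table.
MODES = [
    ("Active", 0, 0.718),
    ("Standby", 2, 0.468),
    ("Napping", 30, 0.0206),
    ("Power Down", 9000, 0.00875),
]


def pick_mode(idle_cycles):
    """Lowest-energy mode whose resynchronization time fits in the idle gap."""
    feasible = [m for m in MODES if m[1] <= idle_cycles]
    return min(feasible, key=lambda m: m[2])[0]


for gap in (1, 10, 100, 10000):
    print(gap, pick_mode(gap))
```

Longer idle gaps admit deeper modes, which is the reason for allocating arrays so that banks stay idle for longer stretches.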

Page 25: Compiler Research  in HPC Lab

Motivating Example

Example:

    float a[N], d[N];
    double b[N], c[N];

    L1: for (ia = 0; ia < N; ia++)
            d[ia] = a[ia] + k;

    L2: for (ia = 0; ia < N; ia++)
            a[ia] = b[ia] * k;

    L3: for (ia = 0; ia < N; ia++)
            c[ia] = d[ia] / k;

    L4: for (ia = 0; ia < N; ia++)
            b[ia] = c[ia] - k;

    L5: for (ia = 0; ia < N; ia++)
            b[ia] = d[ia] + k;

Arrays a, d ~ 1 MB each
Arrays b, c ~ 2 MB each
Memory bank size = 4 MB

[Figure: Array Relation Graph over the arrays a, b, c and d, with edge weights 2N, N, 8N, 4N and N.]

Memory banks active for a total of 32N cycles!

Page 26: Compiler Research  in HPC Lab

Motivating Example -- Our Approach

• Array allocation requires partitioning the Array Relation Graph (ARG)!

• Partition the graph such that each subgraph can be accommodated in a memory bank.

• The weights of edges across subgraphs are the cost of keeping multiple banks active together. Minimize them!

• Arrays b and c go in one subgraph, and a and d in another.

[Figure: Array Relation Graph over the arrays a, b, c and d, with edge weights 2N, N, 8N, 4N and N, partitioned into {b, c} and {a, d}.]

Memory banks active for a total of 23N cycles!
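The partitioning step can be sketched with a brute-force search over two banks. The array sizes and the 4 MB bank capacity come from the example; the assignment of weights to individual edges is NOT recoverable from this transcript, so the `EDGES` table below is a hypothetical mapping (in units of N) chosen only for illustration, and `best_two_bank_partition` is a hypothetical name.

```python
from itertools import product

SIZES_MB = {"a": 1, "b": 2, "c": 2, "d": 1}  # from the example
CAPACITY_MB = 4                               # from the example

# Hypothetical edge weights (units of N); the real mapping is in the lost figure.
EDGES = {
    ("a", "d"): 4,
    ("b", "c"): 8,
    ("a", "b"): 1,
    ("c", "d"): 1,
    ("b", "d"): 2,
}


def best_two_bank_partition(sizes, edges, capacity):
    """Try every assignment of arrays to two banks; keep the cheapest feasible cut."""
    arrays = sorted(sizes)
    best = None
    for assignment in product((0, 1), repeat=len(arrays)):
        bank_of = dict(zip(arrays, assignment))
        loads = [0, 0]
        for name, bank in bank_of.items():
            loads[bank] += sizes[name]
        if max(loads) > capacity:
            continue  # some bank would overflow
        cut = sum(w for (u, v), w in edges.items() if bank_of[u] != bank_of[v])
        if best is None or cut < best[0]:
            best = (cut, bank_of)
    return best


cut, bank_of = best_two_bank_partition(SIZES_MB, EDGES, CAPACITY_MB)
print(cut, bank_of)
```

With these assumed weights the search places b and c together and a and d together, matching the partition stated on the slide. Exhaustive search is only workable for a handful of arrays; a real allocator would need a graph-partitioning heuristic.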

Page 27: Compiler Research  in HPC Lab

Dynamic Voltage Scaling

• Dynamically vary the CPU frequency and supply voltage.

• Dynamic power is proportional to C * V^2 * f, where
    C = capacitance
    V = supply voltage
    f = operating frequency

• Processors support different voltage (and frequency) modes and can switch between them.

• AMD, Transmeta and XScale processors provide support for DVS and have multiple operating frequencies.
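A quick arithmetic check of the proportionality shows why lowering voltage together with frequency pays off. The function name `relative_dynamic_power` is hypothetical, and the sketch assumes the capacitance C stays fixed.

```python
def relative_dynamic_power(v_scale, f_scale):
    """Dynamic power relative to a baseline, from P proportional to C * V^2 * f.

    v_scale and f_scale are the new voltage and frequency divided by the
    baseline values; C is assumed unchanged.
    """
    return v_scale ** 2 * f_scale


print(relative_dynamic_power(1.0, 0.5))  # frequency halved, voltage unchanged
print(relative_dynamic_power(0.5, 0.5))  # voltage and frequency both halved
```

Halving only the frequency halves the power, while halving both voltage and frequency cuts it to one eighth, because voltage enters quadratically.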

Page 28: Compiler Research  in HPC Lab

Compiler Assisted DVS

• Identify program regions where DVS can be performed.
• For each program region, identify the voltage (frequency) mode to operate in, such that energy is minimized.
• Ensure that performance is not degraded.

Page 29: Compiler Research  in HPC Lab

Motivating Example

Freq.   Metric       P1     P2     P3     P4     P5     Total
200     Exec. Time   151    6827   335    6827   335    14475
200     Energy       82     125    39     125    39     410
300     Exec. Time   100    4552   223    4552   223    9650
300     Energy       149    163    72     163    72     619
400     Exec. Time   76     3414   168    3414   168    7240
400     Energy       198    274    176    274    176    1098
DVS     Freq.        200    400    300    400    300    --
DVS     Exec. Time   151    3414   223    3414   223    7425
DVS     Energy       82     274    72     274    72     778

With DVS, compared with running every region at 400: 2% increase in execution time, 30% decrease in energy.

Page 30: Compiler Research  in HPC Lab

DVS Problem Formulation

• Program divided into a number of regions.
• Assign an operating frequency to each program region.
• Constraint: marginal increase in execution time of the program.
• Objective: minimize program energy consumption.
• This is a Multiple Choice Knapsack Problem.
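The formulation can be tried out on the motivating example's numbers with an exhaustive search, which is feasible here because there are only 3^5 combinations. The per-region times and energies are copied from the motivating example's table; the 3% execution-time budget relative to the all-400 run is an assumption of this sketch (the slides say only "marginal increase"), and `choose_modes` is a hypothetical name. A real compiler would use a knapsack or ILP solver rather than enumeration.

```python
from itertools import product

FREQS = (200, 300, 400)

# Per-region values at 200, 300 and 400, from the motivating example's table.
TIME = {
    "P1": (151, 100, 76),
    "P2": (6827, 4552, 3414),
    "P3": (335, 223, 168),
    "P4": (6827, 4552, 3414),
    "P5": (335, 223, 168),
}
ENERGY = {
    "P1": (82, 149, 198),
    "P2": (125, 163, 274),
    "P3": (39, 72, 176),
    "P4": (125, 163, 274),
    "P5": (39, 72, 176),
}


def choose_modes(time, energy, budget):
    """Pick one frequency per region, minimizing energy within a time budget."""
    regions = sorted(time)
    best = None
    for choice in product(range(len(FREQS)), repeat=len(regions)):
        total_time = sum(time[r][m] for r, m in zip(regions, choice))
        if total_time > budget:
            continue
        total_energy = sum(energy[r][m] for r, m in zip(regions, choice))
        if best is None or total_energy < best[0]:
            best = (total_energy, total_time, tuple(FREQS[m] for m in choice))
    return best


fastest = sum(t[-1] for t in TIME.values())  # every region at 400
best = choose_modes(TIME, ENERGY, budget=1.03 * fastest)  # assumed 3% slack
print(best)
```

Under that budget the search selects 200, 400, 300, 400, 300 for P1 to P5, the same assignment as the DVS row of the example, with a total time of 7425. The per-region energies sum to 774, slightly below the total of 778 listed in the example; the transcript does not say what accounts for the difference.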

Page 31: Compiler Research  in HPC Lab

Compiler Problem as Optimization Problem

Integrated register allocation, spill code generation and scheduling in software-pipelined loops

Problem: Given a machine M, a loop L, and a software-pipelined schedule S with initiation interval II, perform register allocation, generate spill code if necessary, and schedule the spill code such that the register requirement of the schedule does not exceed the number of registers and the resource constraints are met!

Page 32: Compiler Research  in HPC Lab

Modeling Liverange

Live Range Representation

[Figure: live range A, with its def at time 0 and a use at time 7, laid out on a time axis from 0 to 7 for each register R0 … Rn. Each register r and time step t has a decision variable TN A,r,t, running from TN A,0,0 … TN A,0,7 for R0 to TN A,n,0 … TN A,n,7 for Rn.]

Page 33: Compiler Research  in HPC Lab

Modeling Spill Stores

Store decision variables

[Figure: live range A (def at time 0, uses marked on a time axis from 1 to 7) for each register R0 … Rn, with candidate spill stores marked. Store decision variables STN A,r,t are shown for t = 1 … 4 in each register: STN A,0,1 … STN A,0,4 through STN A,n,1 … STN A,n,4.]

Latencies: Load : 1, Store : 1, Instruction : 1

Page 34: Compiler Research  in HPC Lab

Modeling Spill Loads

Load decision variables

[Figure: live range A (def at time 0, uses marked on a time axis from 1 to 7) for each register R0 … Rn, with candidate spill loads marked. Load decision variables LTN A,r,t are shown for t = 3 … 6 in each register: LTN A,0,3 … LTN A,0,6 through LTN A,n,3 … LTN A,n,6.]

Latencies: Load : 1, Store : 1, Instruction : 1

Page 35: Compiler Research  in HPC Lab

Constraints: Overview

• Every live range must be in a register at the definition time and the use time.
• A spill load can take place only if the spill store has already taken place.
• After a spill store, a live range can continue or cease to exist.
• Ensure that the spill loads and stores don't saturate the memory units.
• Minimize the number of spill loads and stores.

Page 36: Compiler Research  in HPC Lab

Objective

• No objective function: just a constraint-solving problem!
• Minimize the number of spill loads and stores:

    minimize the sum over all i, r, t of (STN i,r,t + LTN i,r,t)

Page 37: Compiler Research  in HPC Lab

Conclusions

• Compiler research is fun!
• It is cool to do compiler research!
• But, remember Proebsting's Law: compiler technology doubles CPU power every 18 years!

Plenty of opportunities in compiler research!

However, NO VACANCY in HPC lab this year!