This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC. Survey of Program Transformation Technologies, LLNL-PRES-607473. Chunhua (Leo) Liao, Daniel J. Quinlan, and Adrian Prantl. Software Institute for Abstractions and Methodologies for HPC Simulations Codes on Future Architectures workshop, Dec. 10th, 2012, Chicago, IL, USA.

Survey of Program Transformation Technologies


Page 1: Survey of Program Transformation Technologies

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

Survey of Program Transformation Technologies

LLNL-PRES-607473

Chunhua (Leo) Liao, Daniel J. Quinlan, and Adrian Prantl

Software Institute for Abstractions and Methodologies for HPC Simulations Codes on Future Architectures workshop

Dec. 10th, 2012 Chicago, IL, USA

Page 2: Survey of Program Transformation Technologies

Lawrence Livermore National Laboratory 2

Outline

Big picture

Program transformation techniques

• T1: String-based (scripting)

• T2: Compiler-based (direct IR modification)

• T3: Rule-based term rewrite

• T4: Semantic patches

Our relevant efforts

Summary

Page 3: Survey of Program Transformation Technologies


What is a program transformation?

Definition: modifications to an input program to generate an output program

— A program: a sequence of statements/instructions written to perform a specified task with a computer

Approaches: — Manual: prohibitively expensive

– E.g. Porting to a new platform: 160 lines per programmer day*: 17 years for 1 million SLOC

— Automated (semi-automated)

*P. Newcomb, R. Doblar, Automated Transformation of Legacy Systems, CrossTalk, December 2001.
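The 17-year figure above follows from simple division. A minimal sketch of the arithmetic, assuming the estimate counts 365 calendar days per year for a single programmer; the function name and the 250-working-day alternative are illustrative assumptions, not taken from the cited article (Python is used here only for illustration).

```python
def porting_years(sloc: int, lines_per_day: float, days_per_year: float) -> float:
    """Years one programmer needs to port `sloc` lines at a fixed daily rate."""
    programmer_days = sloc / lines_per_day
    return programmer_days / days_per_year


if __name__ == "__main__":
    # 1,000,000 SLOC at 160 lines per programmer day = 6,250 programmer days.
    print(round(porting_years(1_000_000, 160, 365), 1))  # calendar days (assumed)
    print(round(porting_years(1_000_000, 160, 250), 1))  # working days (assumed)
```

Under the calendar-day assumption the result rounds to 17 years; counting only working days would make the manual effort look even larger.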

[Figure: Program X → Modification → Program Y]

Page 4: Survey of Program Transformation Technologies

Program transformations in software life-cycle/programming models

Code generation
• Programming model implementation: e.g., OpenMP, UPC, X10
• DSLs to general-purpose code
• Compilation: source to binary

Program optimizations
• Loop unrolling, tiling, interchanging
• Inlining, parallelization, vectorization
• Autotuning (empirical tuning): code variant generation

Code migration/porting
• Fortran to C++, C++ to X10, etc.
• Linux to Windows, desktop to embedded systems, CPUs to GPUs, …

Code refactoring
• Variable renaming, code obfuscation
• Push member method up, extract code to new method, …

Aspect-oriented programming
• Cross-cutting issues: inject support for logging, resilience, persistence, etc.

Program analysis
• Normalization, instrumentation (coverage analysis)

Page 5: Survey of Program Transformation Technologies

Challenges to practical program transformations

Sheer size
• Applications with millions of lines

Multiple programming languages
• Real applications mix C/C++/Fortran, OpenMP/CUDA, Python/Perl, ...

Multiple configurations
• #if ... #elif ... #endif due to algorithm, library, and platform variants

Representation of programs
• Parsing and constructing an internal representation

Analysis
• Where (location and scope) and when (eligibility and profitability)

Modification
• Correctness: individual transformations and their combinations

Page 6: Survey of Program Transformation Technologies


Outline

Big picture

Program transformation techniques

• T1: String-based (scripting)

• T2: Compiler-based (direct IR modification)

• T3: Rule-based term rewrite

• T4: Semantic patches

Our relevant efforts

Summary

Page 7: Survey of Program Transformation Technologies

T1: String-based transformation (scripting)

Peephole transformations: variable renaming, etc.
• Representation: original string format
• Analysis: pattern matching using regular expressions
• Modification: string replacement

✔ No need to parse the input program
✔ Easy to learn and quick to use
✔ Widely and immediately available

sed -e 's/old_pattern/new_stuff/g' inputFileName > outputFileName

✘ Insufficient information: no symbol resolution, no control flow graph (CFG)
✘ Only for localized, simple transformations
✘ No access to advanced analysis
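The limitations listed above can be seen in a small sketch. It uses Python's standard `re` module as a stand-in for sed; the sample C fragment and the function names are illustrative assumptions. Plain replacement corrupts unrelated identifiers, and even a word-boundary pattern still rewrites the text inside the string literal, because a regular expression has no notion of symbols or token kinds.

```python
import re

# Illustrative C fragment held as a plain string (no parsing involved).
SOURCE = 'int i = 0; int idx = i + 1; printf("i = %d", i);'


def naive_rename(src: str, old: str, new: str) -> str:
    """Plain substring replacement, comparable to sed 's/old/new/g'."""
    return src.replace(old, new)


def word_rename(src: str, old: str, new: str) -> str:
    """Replace whole-word matches only, using \\b word boundaries."""
    return re.sub(r"\b" + re.escape(old) + r"\b", new, src)


if __name__ == "__main__":
    print(naive_rename(SOURCE, "i", "count"))  # also damages "int", "idx", "printf"
    print(word_rename(SOURCE, "i", "count"))   # still edits the string literal
```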

Page 8: Survey of Program Transformation Technologies

T2: Direct IR (Intermediate Representation) modification

Based on mature compiler technologies
• Representation:
- Parsing to IR: high levels to low levels, with symbol tables
• Analysis:
- Control flow, data flow, dependence, etc.
• Modification:
- Procedural code directly manipulating the IR
• Abundant choices: GCC, Open64, ROSE, Cetus, LLVM, …
- Classic vs. source-to-source
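The representation/analysis/modification split can be sketched with Python's standard `ast` module standing in for a compiler IR such as the ones named above (an assumption made purely for illustration; it is not ROSE or any listed compiler, and `ast.unparse` needs Python 3.9 or later). Because the rename is applied to identifier nodes rather than to characters, the string literal and the similarly named variable are left alone. This sketch has no symbol table, so it ignores scoping.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Procedural IR modification: rewrite identifier nodes named `old`."""

    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id == self.old:
            return ast.copy_location(ast.Name(id=self.new, ctx=node.ctx), node)
        return node


def rename(source: str, old: str, new: str) -> str:
    tree = ast.parse(source)                     # representation: source -> AST
    tree = RenameVariable(old, new).visit(tree)  # modification on the AST
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)                     # AST -> source


SOURCE = 'i = 0\nidx = i + 1\nprint("i = %d" % i)\n'

if __name__ == "__main__":
    print(rename(SOURCE, "i", "count"))
```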

Page 9: Survey of Program Transformation Technologies

Example: Open64 using multiple levels of IRs

[Figure: Open64 compiler structure. Front ends (GCC for C/C++, Cray for Fortran) feed a chain of IR levels: Very High WHIRL, High WHIRL, Mid WHIRL, Low WHIRL, then code generation for IA-64, x86, MIPS, … Components shown: Inliner, Inter Procedural Analyzer, Loop Nest Optimizer, Global Scalar Optimizer, lowering steps (Lower to High IR, Lower I/O, Lower Mid W, Lower all), and WHIRL2 C/Fortran producing .w2c.c and .w2c.f. Options shown: -O0, -O2/O3, -O3 -IPA, -INLINE, -CLIST/-FLIST. One branch is annotated "Take either path (only for f90)".]

http://www.open64.net/

Page 10: Survey of Program Transformation Technologies

Example: ROSE using a single high-level AST

[Figure: ROSE-based source-to-source tools. Input source code (C/C++/Fortran, OpenMP/UPC) → EDG front end / Open Fortran Parser → Abstract Syntax Tree (AST), on which generic and custom analyses/transformations/optimizations operate → Unparser → analyzed/transformed/optimized source code → vendor compiler → machine executable.]

Developed at LLNL

http://www.roseCompiler.org

Page 11: Survey of Program Transformation Technologies

ROSE AST for a simple program

1. int main()
2. {
3.   int i=0;
4.   i++;
5.   return i;
6. }

[Figure: AST of the program above. The node labels S2, S3, S4, and S5 appear to correspond to source lines 2 to 5.]
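To make the idea of an AST for a small program concrete, here is a minimal sketch that parses an analogous three-statement function and lists its statement nodes. It uses Python's standard `ast` module as a stand-in, so the node names are Python's, not ROSE's Sg* classes; the sample program is an assumed Python analogue of the C code above (`ast.dump` with `indent` needs Python 3.9 or later).

```python
import ast

# Assumed Python analogue of: int i=0; i++; return i;
SOURCE = "def main():\n    i = 0\n    i += 1\n    return i\n"


def statement_kinds(source: str) -> list:
    """Node type names of the statements in the first function's body."""
    func = ast.parse(source).body[0]
    return [type(stmt).__name__ for stmt in func.body]


if __name__ == "__main__":
    print(statement_kinds(SOURCE))
    # Full tree, one node per line, for inspection.
    print(ast.dump(ast.parse(SOURCE), indent=2))
```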

Page 12: Survey of Program Transformation Technologies


Procedural code to create a for loop

// Grab a scope in which the code will be built
SgBasicBlock *func_body = func_def->get_definition()->get_body();

// for (i=0; ...; ...)
SgStatement *init_stmt = buildAssignStatement(buildVarRefExp("i"), buildIntVal(0));

// for (...; i<100; ...): the condition is an expression, wrapped here in an expression statement
SgExprStatement *cond_stmt = buildExprStatement(buildLessThanOp(buildVarRefExp("i"), buildIntVal(100)));

// for (...; ...; i++): postfix i++, not ++i
SgExpression *incr_exp = buildPlusPlusOp(buildVarRefExp("i"), SgUnaryOp::postfix);

// j++; as loop body statement
SgStatement *loop_body = buildExprStatement(buildPlusPlusOp(buildVarRefExp("j"), SgUnaryOp::postfix));

// build for (i=0; i<100; i++) { j++; }
SgForStatement *for_stmt = buildForStatement(init_stmt, cond_stmt, incr_exp, loop_body);

appendStatement(for_stmt, func_body);

Bottom-up construction

Generated code:

void foo()
{
  int i;
  int j;
  for (i=0; i<100; i++)
    j++;
}

✔ Detailed representation of the input program
✔ Familiar APIs: C/C++ interfaces
✔ Access to advanced compiler analysis
✔ Arbitrary transformations: trivial to radical
✘ Parsing/building IR is hard, especially for C++
✘ Learning curve for the IR (AST)
✘ Tedious/error-prone coding for traversal, pattern matching, IR manipulation, etc.

Note: the builder interface transparently handles edges and symbols.
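The bottom-up style can be tried without a ROSE installation. The sketch below builds a comparable counting loop from its leaves upward and appends it to an existing function, using Python's standard `ast` module as a stand-in for the ROSE builder interface (an assumption for illustration; the helper names are hypothetical, and `ast.unparse` needs Python 3.9 or later). Unlike the ROSE builders, nothing here manages symbols, so the caller must make sure the variables exist.

```python
import ast


def build_counting_loop(index: str, limit: int, counter: str) -> ast.For:
    """Build `for index in range(limit): counter += 1` from the leaves up."""
    target = ast.Name(id=index, ctx=ast.Store())
    iterable = ast.Call(func=ast.Name(id="range", ctx=ast.Load()),
                        args=[ast.Constant(value=limit)], keywords=[])
    body_stmt = ast.AugAssign(target=ast.Name(id=counter, ctx=ast.Store()),
                              op=ast.Add(), value=ast.Constant(value=1))
    return ast.For(target=target, iter=iterable, body=[body_stmt], orelse=[])


def append_to_function(source: str, func_name: str, stmt: ast.stmt) -> str:
    """Append `stmt` to the body of the named top-level function."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            node.body.append(stmt)
    ast.fix_missing_locations(tree)  # new nodes need line/column info
    return ast.unparse(tree)


SOURCE = "def foo():\n    j = 0\n"

if __name__ == "__main__":
    print(append_to_function(SOURCE, "foo", build_counting_loop("i", 100, "j")))
```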

Page 13: Survey of Program Transformation Technologies

T3: Rule-based term rewriting

Rewriting systems:
- Rule (lhs → rhs): a logic formalism to express a transformation between objects
- If and only if the rewrite system formed by the rules is both confluent and terminating, the order of rule application is not significant and the system converges to a normal form
- Otherwise, a rewrite strategy is needed to control which rule is applied first

Term rewriting: rules for terms, i.e., nested expressions of atoms, functors, lists, and variables.
- [add_expr(X, int_val(2))]

[Figure annotations on this term: the square brackets form a list, add_expr and int_val are functors, X is a variable, and 2 is an atom.]

[Figure: two tree diagrams, one with nodes s1 to s4 and one with nodes s1 to s3]

Page 14: Survey of Program Transformation Technologies

Term rewriting applied to programs

Programs -> trees/terms -> rewriting -> trees/terms -> programs
• Representation: abstract syntax tree (AST) == nested terms
• Analysis: pattern matching
• Modification: substitution (term replacement) specified via rules/strategies

Example structure of a rewrite rule: lhs -> rhs
[Figure: pattern to match (lhs) and pattern to substitute (rhs)]

Example application of rewriting:
[Figure: input structure that contains the pattern, and output structure after rewriting. Colors indicate node types; a box indicates arbitrary substructure.]
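Pattern matching and substitution over nested terms can be sketched in a few lines. In this minimal illustration, terms are Python tuples whose first element is the functor, and strings starting with an uppercase letter act as pattern variables (a Prolog-like convention assumed here, as is the example rule `add_expr(X, int_val(0)) -> X`, which is not from the slides).

```python
def is_var(t) -> bool:
    """Pattern variables are strings starting with an uppercase letter."""
    return isinstance(t, str) and t[:1].isupper()


def match(pattern, term, env=None):
    """Return variable bindings if `pattern` matches `term`, else None."""
    env = dict(env or {})
    if is_var(pattern):
        if pattern in env:                      # variable already bound
            return env if env[pattern] == term else None
        env[pattern] = term
        return env
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return env if pattern == term else None


def substitute(template, env):
    """Instantiate the right-hand side with the bindings in `env`."""
    if is_var(template):
        return env[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, env) for t in template)
    return template


def rewrite(rule, term):
    """Apply rule (lhs, rhs) at the root of `term`; None if it does not match."""
    lhs, rhs = rule
    env = match(lhs, term)
    return None if env is None else substitute(rhs, env)


# Hypothetical rule: add_expr(X, int_val(0)) -> X
ADD_ZERO = (("add_expr", "X", ("int_val", 0)), "X")
```

For example, `rewrite(ADD_ZERO, ("add_expr", ("var_ref", "a"), ("int_val", 0)))` returns the term bound to `X`, while a term ending in `int_val(2)` is left unmatched.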

Page 15: Survey of Program Transformation Technologies

Example: Stratego/XT (aka Spoofax)

Stratego/XT is an implementation of a term rewrite system
• Stratego: a language for specifying transformations
- Rewrite rules: transformations
- Custom strategies to apply rewrite rules: traversals like innermost, topdown, bottomup, repeat, etc.
• XT: a collection of tools
- Parser generator: parser to generate nested terms
- Pretty printer generator: unparser to generate source code
- Grammar tools

http://strategoxt.org/Spoofax/

Page 16: Survey of Program Transformation Technologies

Term rewrite: example

Rule R1: lhs_term -> rhs_term
While(e, stm) -> If(e, DoWhile(stm, e))

Before: while(e) { stmts; }
After: if(e) do { stmts; } while(e);

✔ More user-friendly than others
✔ High-level transformation rules and their application strategies
✘ Limited access to compiler analysis
✘ Learning curve for the transformation language

Strategies:
simplify = bottomup(repeat(R1 <+ ... <+ Rn))
simplify = topdown(R1 <+ ... <+ Rn)
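The rule and the two strategies can be imitated with ordinary higher-order functions. This is a sketch in Python, not Stratego: terms are tuples with the constructor name first, a strategy returns a new term or `None` on failure, and `choice` plays the role of `<+`. One assumption is made explicit in the code: `topdown` is combined with a `try_` wrapper so that nodes matched by no rule are left unchanged instead of failing the traversal.

```python
def r1_while_to_dowhile(term):
    """R1: While(e, stm) -> If(e, DoWhile(stm, e)); None if not applicable."""
    if isinstance(term, tuple) and len(term) == 3 and term[0] == "While":
        _, e, stm = term
        return ("If", e, ("DoWhile", stm, e))
    return None


def choice(*strategies):
    """s1 <+ s2 <+ ...: first strategy that succeeds wins."""
    def apply(term):
        for s in strategies:
            result = s(term)
            if result is not None:
                return result
        return None
    return apply


def try_(s):
    """Apply s; keep the term unchanged if s fails (assumed wrapper)."""
    def apply(term):
        result = s(term)
        return term if result is None else result
    return apply


def repeat(s):
    """Apply s until it fails; never fails itself."""
    def apply(term):
        while True:
            result = s(term)
            if result is None:
                return term
            term = result
    return apply


def all_children(s):
    """Apply s to every child of a tuple term; leaves are returned as-is."""
    def apply(term):
        if not isinstance(term, tuple):
            return term
        return (term[0],) + tuple(s(child) for child in term[1:])
    return apply


def bottomup(s):
    def apply(term):
        return s(all_children(apply)(term))   # children first, then the node
    return apply


def topdown(s):
    def apply(term):
        return all_children(apply)(s(term))   # node first, then the children
    return apply


simplify_bottomup = bottomup(repeat(choice(r1_while_to_dowhile)))
simplify_topdown = topdown(try_(choice(r1_while_to_dowhile)))
```

Both strategies rewrite nested `While` terms; they differ only in whether a node is visited before or after its children.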

Page 17: Survey of Program Transformation Technologies

T4: Semantic patches - Coccinelle

Coccinelle: program matching and transformation tool for unpreprocessed C code
• Systematic bug finding and fixing
• Collateral evolutions: changes to APIs -> changes to client code

Semantic Patch Language (SmPL)
• Declarative, patch-like syntax
• Supports type declarations
• Language-aware pattern matching (NOT literal matching!)
• Abstracts away differences in spacing, indentation, comments, coding style variants, irrelevant code, etc.

http://coccinelle.lip6.fr/

Page 18: Survey of Program Transformation Technologies

Transformation Engine*

[Figure: Coccinelle transformation engine. One input path: parse C files → translate to IR/CFG. Other input path: parse semantic patch → expand isomorphisms → translate to CTL. The two paths meet at matching using model checking → modify IR/CFG → unparse.]

CTL: Computation Tree Logic, with extra features

*Source: http://coccinelle.lip6.fr/Intro_gen.pdf

Page 19: Survey of Program Transformation Technologies

Semantic Patch Language (SmPL) Examples

Example: Replace boolean expression

@@ // type declaration
expression E;
constant C;
@@
- !E & C // pattern
+ !(E & C) // replacement

Example: Change from kmalloc() with explicit init to kzalloc() with built-in init.

@@
expression x;
expression E1,E2;
@@
- x = kmalloc(E1,E2);
+ x = kzalloc(E1,E2);
... // ignore irrelevant code
- memset(x, 0, E1);

✔ Intuitive user input using patch-like language
✔ Used in Linux kernel development
✘ Only supports the C language
✘ Limited access to compiler analysis
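The effect of the kmalloc/kzalloc patch can be imitated on a toy statement list. This is only a sketch of the idea of metavariable binding plus "..." skipping; it is not how Coccinelle works internally (Coccinelle matches on a control flow graph using model checking). The statement encoding, the function names, and the rule that a reassignment of `x` stops the search are all assumptions made for this illustration.

```python
def is_call(expr, name: str, nargs: int) -> bool:
    """True if expr is a ("call", name, args) tuple with nargs arguments."""
    return (isinstance(expr, tuple) and len(expr) == 3 and expr[0] == "call"
            and expr[1] == name and len(expr[2]) == nargs)


def apply_kzalloc_patch(stmts):
    """Rewrite x = kmalloc(E1, E2); ... memset(x, 0, E1); to x = kzalloc(E1, E2); ..."""
    out = list(stmts)
    i = 0
    while i < len(out):
        stmt = out[i]
        if stmt[0] == "assign" and is_call(stmt[2], "kmalloc", 2):
            x = stmt[1]                 # binds metavariable x
            e1, e2 = stmt[2][2]         # binds metavariables E1, E2
            for j in range(i + 1, len(out)):   # the "..." part
                later = out[j]
                if later == ("call", "memset", (x, "0", e1)):
                    out[i] = ("assign", x, ("call", "kzalloc", (e1, e2)))
                    del out[j]
                    break
                if later[0] == "assign" and later[1] == x:
                    break               # assumed safety rule: x was overwritten
        i += 1
    return out


EXAMPLE = [
    ("assign", "buf", ("call", "kmalloc", ("len", "GFP_KERNEL"))),
    ("call", "log", ("buf",)),
    ("call", "memset", ("buf", "0", "len")),
    ("assign", "p", ("call", "kmalloc", ("n", "GFP_KERNEL"))),  # no memset follows
]

if __name__ == "__main__":
    for s in apply_kzalloc_patch(EXAMPLE):
        print(s)
```

The second allocation is left untouched because no matching `memset` follows it, mirroring the requirement that the whole pattern match before any line is changed.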

Page 20: Survey of Program Transformation Technologies

Comparison

Criterion | T1: Scripting (sed) | T2: Direct IR modification | T3: Term rewriting | T4: Semantic patching
Program representation | String | IR/AST | AST terms | IR/Control flow graph
Transformation specification | s/in/out/g | Low-level procedural code | High-level language for rules/strategies | Declarative patch-like language
Input languages | All | All (limited by frontends) | All (limited by frontends) | C only
Output | String | IR/Source or binary | Source (AST terms) | Source
Analysis | Regular expression | Data flow, control flow, dependence, etc. | Pattern match | Pattern match/Model checking
Ease of use | Easy | Hard | Medium | Easy
Power | Weak | Strong | Medium | Medium
Robustness | Weak | Strong | Medium | Medium

Page 21: Survey of Program Transformation Technologies

D-TEC – “DSL Technology for Exascale Computing”

[Figure: D-TEC project overview. Labels shown: DSL 1..N specification; DSL 1..N programs; DOE apps; Rosebud DSL compiler generator; parser generator; rewrite system; grammar system; ROSE-based DSL compiler; semantic analysis; migration process; ROSE front-end; recording & mapping; compiler analysis & transformations; sketch-based transformations; levels of ROSE AST; refinement/transformations; refinement/lowering; manual refinement; machine learning & formal methods; parameterized abstract machine model; vendor compiler; exec; X10/SEEC runtime; scalable data structures; resilience; performance tools.]

http://www.dtec-xstack.org

Page 22: Survey of Program Transformation Technologies

Minitermite: term rewriting leveraging ROSE

• Minitermite connects ROSE with Stratego/XT and other term-based tools
• Rewrites C++ and Fortran
• Retains column/line/preprocessing info
• Released with ROSE already!
• Work in progress to bring semantic-patch-like functionality

Example (http://compose-hpc.sourceforge.net): ROSE+Stratego were used to transform parts of NWChem (2.9M LOC, Fortran + Global Arrays) to add instrumentation and increase parallelism. The transformation improves performance by up to 4×. [PNNL]

[Figure: workflow. Source code (C, C++, Fortran) → ROSE frontend → ROSE abstract syntax tree (AST) → Minitermite (src2term --stratego) → term representation → Stratego/XT with rewrite rules → transformed term representation → Minitermite (term2src --stratego) → transformed ROSE AST → ROSE unparser → transformed source code (C, C++, Fortran).]

Page 23: Survey of Program Transformation Technologies

Summary

Program transformation
• Indispensable for each stage of the software life-cycle
• Code generation, analysis, optimization, migration, reverse-engineering, etc.

Difficulties
• Theory: accurate eligibility and profitability analysis, correctness of transformation
• Practice: parsing, sheer size, mixed and complex languages, multiple configurations, diverse requirements, etc.

Solution
• Re-usable, common transformation infrastructure combining multiple techniques
- Parsing, analysis, direct AST modification, rewrite system, etc.
• Customized tools built in collaboration between compiler experts and end users

Page 24: Survey of Program Transformation Technologies


Thank You!

Questions?

Page 25: Survey of Program Transformation Technologies

Many other tools

• http://www.txl.ca/: The TXL Programming Language. Rule-based source-to-source transformation. Traversals are part of rewrite rules.
• http://www.semdesigns.com: DMS Software Reengineering Toolkit. Commercial product.
• http://www.meta-environment.org/: ASF+SDF Meta-Environment for interactive program analysis and transformation. It combines SDF (Syntax Definition Formalism), ASF (Algebraic Specification Formalism), and other technologies.
• http://www.rascal-mpl.org/: Rascal, a meta-programming language that combines source code analysis and manipulation.
• http://www.eclipse.org/aspectj/: AspectJ, an aspect-oriented programming extension to Java.

Page 26: Survey of Program Transformation Technologies

Taxonomy of program transformation*

1. Translation (across languages)
1.1 Synthesis
1.2 Reverse Engineering
1.3 Migration
1.4 Analysis
2. Rephrasing (within one language)

[Figure: translations connect High-Level Language X, High-Level Language Y, Low-Level Language Z, and an Aspect Language; rephrasing stays within a single language.]

*Source: http://www.program-transformation.org/

Page 27: Survey of Program Transformation Technologies

Generic transformation steps

1. Parse: input program → Intermediate Representation (IR) / Abstract Syntax Tree (AST)
2. Analysis*
3. Transformation*
4. Unparse (pretty printer): IR/AST → output program

*Interleaved and/or repeated
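The four steps, including the repeated analysis/transformation pair, can be sketched as a small constant folder. Python's standard `ast` module stands in for the IR (an assumption for illustration; `ast.unparse` needs Python 3.9 or later), and the class and function names are hypothetical. The analysis decides whether a binary operation has two integer constant operands; the transformation replaces it with its value; the loop repeats until a pass changes nothing.

```python
import ast
import operator

# Supported operators for folding (an illustrative subset).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class FoldConstants(ast.NodeTransformer):
    def __init__(self):
        self.changed = False

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # handle nested operations first
        # Analysis: is this operation eligible for folding?
        eligible = (isinstance(node.left, ast.Constant)
                    and isinstance(node.right, ast.Constant)
                    and isinstance(node.left.value, int)
                    and isinstance(node.right.value, int)
                    and type(node.op) in _OPS)
        if not eligible:
            return node
        # Transformation: replace the subtree with a single constant.
        self.changed = True
        value = _OPS[type(node.op)](node.left.value, node.right.value)
        return ast.copy_location(ast.Constant(value=value), node)


def transform(source: str) -> str:
    tree = ast.parse(source)              # 1. parse
    while True:                           # 2-3. analysis + transformation, repeated
        folder = FoldConstants()
        tree = folder.visit(tree)
        if not folder.changed:
            break
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)              # 4. unparse


if __name__ == "__main__":
    print(transform("x = 2 * 3 + 4\ny = x + (1 + 1)\n"))
```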

Page 28: Survey of Program Transformation Technologies

Variants of Direct IR Modification (IR = intermediate representation)

Input: languages
• Depend on the compiler frontends (parsers)

Levels of intermediate representation (IR):
• High level (close to source code, complex, many idioms)
• Low level (close to binary instructions, normalized, minimal)

Driver: data flow analysis in the compiler
• Programs: data (values) + control (control flow graph, call graph)
• Harder to implement for source-to-source (complex IR)
• Examples: reaching definitions, live variables, constant propagation, partial redundancy elimination, loop optimizations, …

APIs to manipulate IR (and symbol tables)
• Traversal, creation, copy, removal, insertion, ...

Output: depends on the level of IR
• Source-to-source compilers can produce human-readable, compilable output with comments/preprocessing info, ...

Page 29: Survey of Program Transformation Technologies