This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
Survey of Program Transformation Technologies
LLNL-PRES-607473
Chunhua (Leo) Liao, Daniel J. Quinlan, and Adrian Prantl
Software Institute for Abstractions and Methodologies for HPC Simulations Codes on Future Architectures workshop
Dec. 10th, 2012 Chicago, IL, USA
Lawrence Livermore National Laboratory 2
Outline
Big picture
Program transformation techniques
• T1: String-based (scripting)
• T2: Compiler-based (direct IR modification)
• T3: Rule-based term rewrite
• T4: Semantic patches
Our relevant efforts
Summary
What is a program transformation?
Definition: modifications to an input program to generate an output program
— A program: a sequence of statements/instructions written to perform a specified task with a computer
Approaches:
— Manual: prohibitively expensive
– E.g., porting to a new platform at 160 lines per programmer-day*: 1 million SLOC takes 6,250 programmer-days, or about 17 years for one programmer
— Automated (semi-automated)
*P. Newcomb, R. Doblar, Automated Transformation of Legacy Systems, CrossTalk, December 2001.
[Figure: Program X → Modification → Program Y]
Program transformations in software life-cycle/programming models
Code generation
• Programming model implementation: e.g., OpenMP, UPC, X10
• DSLs to general purpose code
• Compilation: source to binary
Program optimizations
• Loop unrolling, tiling, interchanging
• Inlining, parallelization, vectorization
• Autotuning (empirical tuning): code variant generation
Code migration/porting
• Fortran to C++, C++ to X10, etc.
• Linux to Windows, desktop to embedded systems, CPUs to GPUs, …
Code refactoring
• Variable renaming, code obfuscation
• Push member method up, extract code to new method, …
Aspect-oriented programming
• Cross-cutting issues: inject support for logging, resilience, persistence, etc.
Program analysis
• Normalization, instrumentation (coverage analysis)
…
Challenges to practical program transformations
Sheer size
• Applications with millions of lines
Multiple programming languages
• Real apps mix C/C++/Fortran, OpenMP/CUDA, Python/Perl, ...
Multiple configurations
• #if .. #elif .. #endif due to algorithm, library, platform variants
Representation of programs
• Parsing and constructing internal representation
Analysis
• Where (location & scope) and when (eligibility & profitability)
Modification
• Correctness: individual transformations and their combinations
[Figure: program transformation]
T1: String-based transformation (scripting)
Peephole transformations: variable renaming, etc.
• Representation: original string format
• Analysis: pattern match using regular expressions
• Modification: string replacement
✔ No need to parse input program
✔ Easy to learn and quick to use
✔ Widely/immediately available
sed -e 's/old_pattern/new_stuff/g' inputFileName > outputFileName
✘ Insufficient information: no symbol resolution, no control flow graph (CFG)
✘ Only for localized simple transformation
✘ No access to advanced analysis
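A minimal sketch of the trade-off above, using Python's `re` module in place of sed. The C snippet and the names `i` and `count` are made up for illustration. It shows that a word-boundary regular expression avoids the worst accidents of plain substring replacement, yet still rewrites text inside string literals and comments, because the script has no notion of symbols or scopes.

```python
import re

# Hypothetical input: rename the variable 'i' to 'count'.
source = '''int i = 0;
int idx = i + 1;
printf("i = %d\\n", i);  /* i is the counter */
'''

# Naive substitution, in the spirit of: sed -e 's/i/count/g'
# It corrupts every identifier containing the letter 'i' (e.g. 'int').
naive = source.replace("i", "count")

# Word-boundary match: leaves 'int', 'idx', and 'printf' alone,
# but still rewrites the 'i' inside the string literal and the comment.
bounded = re.sub(r"\bi\b", "count", source)

print(bounded)
```

Distinguishing the variable `i` from the same characters in a string or comment needs a parsed representation, which is what the remaining techniques provide.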
T2: Direct IR (Intermediate Representation) modification
Based on mature compiler technologies
• Representation:
— Parsing to IR: High levels to low levels, with symbol tables
• Analysis:
— Control flow, data flow, dependence, etc.
• Modification:
— Procedural code directly manipulating IR
• Abundant choices: GCC, Open64, ROSE, Cetus, LLVM, …
— Classic vs. source-to-source
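Example: Open64 using multiple levels of IRs
[Figure: Open64 compilation pipeline. Front ends (GCC C/C++, Cray Fortran) produce WHIRL, which is lowered through four levels: Very high WHIRL, High WHIRL, Mid WHIRL, Low WHIRL. Components shown: Inliner (-INLINE), Inter Procedural Analyzer, Loop Nest Optimizer, Global Scalar Optimizer, and Code Generation for IA-64, x86, MIPS, …. Lowering steps shown: Lower to High IR, Lower I/O, Lower Mid W, Lower all. WHIRL2 C/Fortran (-CLIST/-FLIST) emits .w2c.c/.w2c.f source; one step is marked "take either path (only for f90)". Paths are labeled -O0, -O2/O3, and -O3 -IPA.]
http://www.open64.net/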
Example: ROSE using a single high level AST
[Figure: ROSE-based source-to-source tools. Input source code (C/C++/Fortran, OpenMP/UPC) → EDG front-end/Open Fortran Parser → Abstract Syntax Tree (AST), on which generic and custom analyses/transformations/optimizations operate → Unparser → analyzed/transformed/optimized source code → vendor compiler → machine executable.]
Developed at LLNL
http://www.roseCompiler.org
ROSE AST for a simple program
1. int main()
2. {
3. int i=0;
4. i++;
5. return i;
6. }
[Figure: ROSE AST of the program, with nodes labeled S2 through S5 for the statements on lines 2 to 5.]
Procedural code to create a for loop
// Grab a scope in which the code will be built
SgBasicBlock* func_body = func_def->get_definition()->get_body();
…
// for (i=0; ...): loop initialization
SgStatement* init_stmt = buildAssignStatement(buildVarRefExp("i"), buildIntVal(0));
// for (...; i<100; ...): the test is an expression, wrapped here in an expression statement
SgExprStatement* cond_stmt =
    buildExprStatement(buildLessThanOp(buildVarRefExp("i"), buildIntVal(100)));
// for (...; ...; i++): postfix increment, not ++i
SgExpression* incr_exp =
    buildPlusPlusOp(buildVarRefExp("i"), SgUnaryOp::postfix);
// j++; as the loop body statement
SgStatement* loop_body =
    buildExprStatement(buildPlusPlusOp(buildVarRefExp("j"), SgUnaryOp::postfix));
// Build for (i=0; i<100; i++) j++;
SgForStatement* for_stmt = buildForStatement(init_stmt, cond_stmt, incr_exp, loop_body);
appendStatement(for_stmt, func_body);
Bottom-up construction produces:
void foo()
{
  int i;
  int j;
  for (i=0; i<100; i++)
    j++;
}
✔ Detailed representation of input program
✔ Familiar APIs: C/C++ interfaces
✔ Access to advanced compiler analysis
✔ Arbitrary transformation: trivial to radical
✘ Parsing/building IR is hard, especially for C++
✘ Learning curve to IR (AST)
✘ Tedious/error-prone coding for traversal, pattern match, IR manipulation, etc.
Note: the build functions transparently handle edges and symbols.
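The same bottom-up style can be tried without a ROSE installation. The sketch below is an analogy, not the ROSE API: it uses Python's standard `ast` module (Python 3.9+ for `ast.unparse`) to build the leaves first, then the enclosing `for` statement, then attaches it to a function body and unparses. The host function `foo` and the Python loop `for i in range(100): j += 1` are stand-ins for the C loop on the slide.

```python
import ast

# Host function that will receive the loop (stands in for func_body on the slide).
module = ast.parse("def foo():\n    j = 0\n    return j\n")
func_def = module.body[0]

# Leaves first: loop variable, iteration space, loop body.
loop_var = ast.Name(id="i", ctx=ast.Store())
iter_space = ast.Call(
    func=ast.Name(id="range", ctx=ast.Load()),
    args=[ast.Constant(value=100)],
    keywords=[],
)
loop_body = ast.AugAssign(
    target=ast.Name(id="j", ctx=ast.Store()),
    op=ast.Add(),
    value=ast.Constant(value=1),
)

# Then the enclosing statement: for i in range(100): j += 1
for_stmt = ast.For(target=loop_var, iter=iter_space, body=[loop_body], orelse=[])

# Insert before the final 'return j' and fill in line/column info.
func_def.body.insert(len(func_def.body) - 1, for_stmt)
ast.fix_missing_locations(module)

generated = ast.unparse(module)
print(generated)
```

Even in this small case the construction code is several times longer than the loop it produces, which is the tedium the last ✘ item refers to.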
T3: Rule-based term rewriting
Rewriting systems:
— Rule (lhs → rhs): a logic formalism to express transformation between objects
— Iff a rewrite system formed by the rules is both confluent and terminating, the order of rule application is not significant and the system converges to a normal form.
— Otherwise, a rewrite strategy is needed to control which rule is applied first
Term rewriting: rules for terms, i.e. nested expressions of atoms, functors, lists, and variables.
— Example term: [add_expr(X, int_val(2))] (figure callouts label its functor, atom, variable, and list parts)
[Figure: a tree with nodes s1 to s4 is rewritten into a tree with nodes s1 to s3.]
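A toy illustration of rewriting to a normal form, assuming terms are encoded as nested Python tuples such as `("add_expr", X, ("int_val", 0))`. The two rules and the `mul_expr` and `var` functors are hypothetical, chosen only because each rule shrinks the term (so rewriting terminates) and their left-hand sides do not overlap (so the order of application does not change the result).

```python
def rule_add_zero(t):
    # add_expr(X, int_val(0)) -> X
    if isinstance(t, tuple) and len(t) == 3 and t[0] == "add_expr" and t[2] == ("int_val", 0):
        return t[1]
    return None  # rule does not apply

def rule_mul_one(t):
    # mul_expr(X, int_val(1)) -> X
    if isinstance(t, tuple) and len(t) == 3 and t[0] == "mul_expr" and t[2] == ("int_val", 1):
        return t[1]
    return None

RULES = [rule_add_zero, rule_mul_one]

def rewrite_once(t):
    """Apply the first rule that matches at the root, or return None."""
    for rule in RULES:
        out = rule(t)
        if out is not None:
            return out
    return None

def normalize(t):
    """Innermost rewriting: normalize the arguments, then the root, until no rule applies."""
    if isinstance(t, tuple):
        t = (t[0],) + tuple(normalize(c) for c in t[1:])
    out = rewrite_once(t)
    return normalize(out) if out is not None else t

# (x * 1) + 0  rewrites to  x
term = ("add_expr", ("mul_expr", ("var", "x"), ("int_val", 1)), ("int_val", 0))
normal_form = normalize(term)
print(normal_form)
```

With a rule set that is not confluent or not terminating, the fixed innermost order used in `normalize` would itself be a design decision, which is where explicit strategies come in.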
Term rewriting applied to programs
Programs -> trees/ terms -> rewriting -> trees/ terms -> programs
• Representation: abstract syntax tree (AST) == nested terms
• Analysis: pattern match
• Modification: substitution (term replacement) specified via rules/strategies
Example structure of a rewrite rule: lhs -> rhs
[Figure: a pattern to match (lhs) and a pattern to substitute (rhs). Colors indicate node types; a box indicates arbitrary substructure.]
Example application of rewriting:
[Figure: an input structure that contains the pattern, and the output structure after rewriting.]
Example: Stratego/XT (aka Spoofax)
Stratego/XT is an implementation of a term rewriting system
• Stratego: a language for specifying transformations
— Rewrite rules: transformations
— Custom strategies to apply rewrite rules: traversals like innermost, topdown, bottomup, repeat, etc.
• XT: collection of tools
— Parser generator: parser to generate nested terms
— Pretty printer generator: unparser to generate source code
— Grammar tools
http://strategoxt.org/Spoofax/
Term rewrite: example
Rule R1: lhs_term -> rhs_term
While(e, stm) -> If(e, DoWhile(stm, e))
Before: while(e) { stmts; }
After: if(e) do { stmts; } while(e);
✔ More user-friendly than others
✔ High level transformation rules and their application strategies
✘ Limited access to compiler analysis
✘ Learning curve for the transformation language
Strategies:
simplify = bottomup(repeat(R1 <+ ... <+ Rn))
simplify = topdown(R1 <+ ... <+ Rn)
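A rough Python model of rule R1 and the traversal strategies on this slide; it is not Stratego. Terms are nested tuples, a rule signals failure by returning `None`, and `choice`, `try_`, `repeat`, `bottomup`, and `topdown` are hand-written combinators named after their Stratego counterparts. One assumption of the sketch: `topdown` is wrapped in `try_` so that nodes where no rule matches are left unchanged.

```python
def r1_while(t):
    # While(e, stm) -> If(e, DoWhile(stm, e))
    if isinstance(t, tuple) and len(t) == 3 and t[0] == "While":
        _, e, stm = t
        return ("If", e, ("DoWhile", stm, e))
    return None  # failure: the rule does not match

def choice(*rules):
    """R1 <+ ... <+ Rn: the first rule that succeeds wins."""
    def run(t):
        for rule in rules:
            out = rule(t)
            if out is not None:
                return out
        return None
    return run

def try_(s):
    """Apply s; on failure return the term unchanged."""
    def run(t):
        out = s(t)
        return t if out is None else out
    return run

def repeat(s):
    """Apply s until it fails; always succeeds."""
    def run(t):
        while True:
            out = s(t)
            if out is None:
                return t
            t = out
    return run

def all_children(s):
    """Apply s to every direct subterm."""
    def run(t):
        if not isinstance(t, tuple):
            return t
        return (t[0],) + tuple(s(c) for c in t[1:])
    return run

def bottomup(s):
    def run(t):
        return s(all_children(run)(t))   # children first, then the node
    return run

def topdown(s):
    def run(t):
        return all_children(run)(s(t))   # node first, then the children
    return run

# Nested loops: while(a) { while(b) { s; } }
prog = ("While", ("Var", "a"), ("While", ("Var", "b"), ("Stmt", "s")))

simplify_bu = bottomup(repeat(choice(r1_while)))
simplify_td = topdown(try_(choice(r1_while)))
result_bu = simplify_bu(prog)
result_td = simplify_td(prog)
print(result_bu)
```

For this single rule both traversal orders reach the same term; with several interacting rules the choice of strategy can change the result, which is why Stratego keeps rules and strategies separate.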
T4: Semantic patches - Coccinelle
Coccinelle: program matching and transformation tool for unpreprocessed C code
• Systematic bug finding and fixing
• Collateral evolutions: changes to APIs -> changes to client code
Semantic Patch Language (SmPL)
• Declarative, patch-like syntax
• Supports type declarations
• Language-aware pattern matching (NOT literal matching!)
• Abstracts away differences in spacing, indentation, comments, coding style variants, irrelevant code, etc.
http://coccinelle.lip6.fr/
Transformation Engine*
[Figure: Coccinelle transformation engine. C files are parsed and translated to IR/CFG. The semantic patch is parsed, its isomorphisms are expanded, and it is translated to CTL (Computation Tree Logic, with extra features). Matching uses model checking; the IR/CFG is then modified and unparsed.]
*Source http://coccinelle.lip6.fr/Intro_gen.pdf
Semantic Patch Language (SmPL) Examples
Example: Replace boolean expression
@@
// type declaration
expression E; constant C;
@@
- !E & C // pattern
+ !(E & C) // replacement
Example: Change from kmalloc() with explicit init to kzalloc() with built-in init.
@@
expression x;
expression E1,E2;
@@
- x = kmalloc(E1,E2);
+ x = kzalloc(E1,E2);
... // ignore irrelevant code
- memset(x, 0, E1);
✔ Intuitive user input using patch-like language
✔ Used in Linux kernel development
✘ Only supports the C language
✘ Limited access to compiler analysis
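Why the first semantic patch is a bug fix and not a style change: in C, unary `!` binds tighter than binary `&`, so `!E & C` means `(!E) & C`. The sketch below emulates C's integer semantics in Python (the helper names `c_not`, `buggy`, and `fixed` are illustrative only) and searches small values for cases where the two forms disagree.

```python
def c_not(x: int) -> int:
    """C's logical negation on integers: 1 if x == 0, otherwise 0."""
    return 1 if x == 0 else 0

def buggy(e: int, c: int) -> int:
    # C parses !E & C as (!E) & C
    return c_not(e) & c

def fixed(e: int, c: int) -> int:
    # The patched form: !(E & C)
    return c_not(e & c)

# Find small (E, C) pairs on which the two expressions differ.
mismatches = [(e, c) for e in range(4) for c in range(4) if buggy(e, c) != fixed(e, c)]
print(mismatches)
```

For example, with flags `E = 2` and mask `C = 1` the bit is not set, so the intended test `!(E & C)` is true, while the unpatched expression evaluates to 0.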
Comparison
Criterion | T1: Scripting (Sed) | T2: Direct IR modification | T3: Term rewriting | T4: Semantic patching
Program representation | String | IR/AST | AST terms | IR/Control Flow Graph
Transformation specification | s/in/out/g | Low level procedural code | High level language for rules/strategies | Declarative patch-like language
Input languages | All | All (limited by frontends) | All (limited by frontends) | C only
Output | String | IR/Source or binary | Source (AST terms) | Source
Analysis | Regular expression | Data flow, control flow, dependence, etc. | Pattern-match | Pattern-match/Model checking
Ease of use | Easy | Hard | Medium | Easy
Power | Weak | Strong | Medium | Medium
Robustness | Weak | Strong | Medium | Medium
D-TEC – “DSL Technology for Exascale Computing”
[Figure: D-TEC project overview. Labeled components: DSL 1..N specification; Rosebud DSL compiler generator; parser generator; rewrite system; grammar system; ROSE-based DSL compiler; semantic analysis; DSL 1..N programs; DOE apps; migration process; ROSE front-end; compiler analysis & transformations; recording & mapping; sketch-based transformations; levels of ROSE AST; refinement/transformations; refinement/lowering; manual refinement; parameterized abstract machine model; machine learning & formal methods; resilience; performance tools; scalable data structures; X10/SEEC runtime; vendor compiler; exec.]
http://www.dtec-xstack.org
Minitermite: term rewriting leveraging ROSE
• Minitermite connects ROSE with Stratego/XT and other term-based tools
• Rewrite C++ and Fortran
• Retains column/line/preprocessing info
• Released with ROSE already!
• Work in progress to bring semantic-patch-like functionality
Example (http://compose-hpc.sourceforge.net): ROSE+Stratego were used to transform parts of NWChem (2.9M LOC, Fortran + Global Arrays) to add instrumentation and increase parallelism. The transformation improves performance by up to 4×. [PNNL]
[Figure: Minitermite workflow. Source code (C, C++, Fortran) → ROSE frontend → ROSE Abstract Syntax Tree (AST) → Minitermite (src2term --stratego) → term representation → Stratego/XT with rewrite rules → transformed term representation → Minitermite (term2src --stratego) → transformed ROSE AST → ROSE unparser → transformed source code (C, C++, Fortran).]
Summary
Program transformation
• Indispensable for each stage of software life-cycle
• Code generation, analysis, optimization, migration, reverse-engineering, etc.
Difficulties
• Theory: accurate eligibility and profitability analysis, correctness of transformation
• Practice: parsing, sheer size, mixed & complex languages, multiple configurations, diverse requirements, etc.
Solution
• Re-usable, common transformation infrastructure combining multiple techniques
— Parsing, analysis, direct AST modification, rewrite system, etc.
• Customized tools built in collaboration between compiler experts and end users
Thank You!
Questions?
Many other tools
• http://www.txl.ca/ : The TXL Programming Language. Rule-based source-to-source transformation; traversals are part of rewrite rules.
• http://www.semdesigns.com : DMS Software Reengineering Toolkit. Commercial product.
• http://www.meta-environment.org/ : ASF+SDF Meta-Environment for interactive program analysis and transformation. It combines SDF (Syntax Definition Formalism), ASF (Algebraic Specification Formalism) and other technologies.
• http://www.rascal-mpl.org/ : Rascal, a meta-programming language that combines source code analysis and manipulation
• http://www.eclipse.org/aspectj/ : AspectJ, an aspect-oriented programming extension to Java
Taxonomy of program transformation*
1. Translation (across languages)
• 1.1 Synthesis
• 1.2 Reverse engineering
• 1.3 Migration
• 1.4 Analysis
2. Rephrasing (within one language)
[Figure: the translation categories drawn between High-Level Language X, High-Level Language Y, Low-Level Language Z, and an aspect language; rephrasing is drawn within each language.]
*Source: http://www.program-transformation.org/
Generic transformation steps
1. Parse: input program → Intermediate Representation (IR)/Abstract Syntax Tree (AST)
2. Analysis*
3. Transformation*
4. Unparse (pretty printer): IR/AST → output program
*Interleaved and/or repeated
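The four steps can be walked through end to end with Python's standard `ast` module (Python 3.9+ for `ast.unparse`). This is a sketch on a made-up input: the function `area`, the old name `tmp`, and the new name `result` are all hypothetical. The analysis step is a deliberately simple eligibility check: the rename is applied only if the new name is not already used.

```python
import ast

src = "def area(w, h):\n    tmp = w * h\n    return tmp\n"

# 1. Parse: source text -> AST
tree = ast.parse(src)

# 2. Analysis: collect every name in use to decide whether the rename is safe.
class CollectNames(ast.NodeVisitor):
    def __init__(self):
        self.names = set()

    def visit_Name(self, node):
        self.names.add(node.id)

    def visit_arg(self, node):
        self.names.add(node.arg)

collector = CollectNames()
collector.visit(tree)
new_name = "result"
eligible = new_name not in collector.names

# 3. Transformation: rename 'tmp' wherever it is read or written.
class Rename(ast.NodeTransformer):
    def visit_Name(self, node):
        if node.id == "tmp":
            node.id = new_name
        return node

if eligible:
    tree = Rename().visit(tree)

# 4. Unparse: AST -> source text
out = ast.unparse(tree)
print(out)
```

Python's `ast` drops comments and original formatting when unparsing, so it behaves like a plain pretty printer in step 4.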
Variants of direct IR (intermediate representation) modification
Input: languages
• depend on the compiler frontends (parsers)
Levels of intermediate representation (IR):
• High level (close to source code, complex, many idioms)
• Low level (close to binary instructions, normalized, minimal)
Driver: data flow analysis in compiler
• Programs: data (values) + control (control flow graph, call graph)
• Harder to implement source-to-source (complex IR)
• Examples: reaching definitions, live variables, constant propagation, partial redundancy, loop optimizations, …
APIs to manipulate IR (and symbol tables)
• Traversal, creation, copy, removal, insertion, ...
Output: depends on the level of IR
• Source-to-source compilers can output human-readable and compilable output with comments/preprocessing info, ...
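As one concrete instance of the data flow analyses listed on this slide, here is a minimal live-variable analysis over a hand-written control flow graph. The CFG encodes the hypothetical program `s = 0; i = 0; while (i < n) { s += i; i++; } return s;`. The block names and the `use`/`def` sets are written out by hand for illustration; a real compiler would derive them from its IR.

```python
# Per basic block: variables read before being written ('use'),
# variables written ('def'), and successor blocks ('succ').
cfg = {
    "entry": {"use": set(),      "def": {"s", "i"}, "succ": ["cond"]},
    "cond":  {"use": {"i", "n"}, "def": set(),      "succ": ["body", "exit"]},
    "body":  {"use": {"s", "i"}, "def": {"s", "i"}, "succ": ["cond"]},
    "exit":  {"use": {"s"},      "def": set(),      "succ": []},
}

def liveness(cfg):
    """Backward data flow, iterated to a fixed point:
       live_out[b] = union of live_in over the successors of b
       live_in[b]  = use[b] | (live_out[b] - def[b])
    """
    live_in = {b: set() for b in cfg}
    live_out = {b: set() for b in cfg}
    changed = True
    while changed:
        changed = False
        for b, info in cfg.items():
            out = set().union(*(live_in[s] for s in info["succ"]))
            inn = info["use"] | (out - info["def"])
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out

live_in, live_out = liveness(cfg)
for block in cfg:
    print(block, sorted(live_in[block]), sorted(live_out[block]))
```

The result says that only `n` is live on entry, so `s` and `i` carry no value into the function. A transformation could use this kind of fact to justify removing dead assignments or reusing storage.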