Optimizing Compilers CISC 673 Spring 2011 Dynamic Compilation

University of Delaware • Computer & Information Sciences Department

Optimizing Compilers, CISC 673

Spring 2011: Dynamic Compilation

John Cavazos, University of Delaware

High Level View of JVM

JVM Interpreter

• Reads a bytecode from a method
• “Interprets” the bytecode
  • Decodes the opcode and operands
  • Based on the opcode, jumps to some C code
  • Passes the operands
• Continues reading bytecodes from the method until a call, a return, or an exception
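The read, decode, and dispatch loop described on this slide can be sketched as a toy stack-machine interpreter. This is only an illustration: the opcodes (PUSH, ADD, MUL, RETURN) and the `ToyInterpreter` class are hypothetical and far simpler than real JVM bytecode, and a production interpreter dispatches to C or assembler handlers rather than to a Java `switch`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stack-machine interpreter (illustrative only; not real JVM bytecode).
final class ToyInterpreter {
    // Hypothetical opcodes, chosen for this sketch.
    static final int PUSH = 0, ADD = 1, MUL = 2, RETURN = 3;

    // Interprets one method body and returns the value on top of the stack.
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            int opcode = code[pc++];              // read and decode the opcode
            switch (opcode) {                     // jump to the handler for this opcode
                case PUSH:
                    stack.push(code[pc++]);       // decode the operand and pass it on
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                case RETURN:                      // stop interpreting at a return
                    return stack.pop();
                default:                          // stop interpreting with an exception
                    throw new IllegalStateException("bad opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // Encodes (2 + 3) * 4.
        int[] code = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, RETURN};
        System.out.println(run(code));            // prints 20
    }
}
```

Every bytecode pays for the decode and the dispatch, which is the source of the slowdown relative to native code.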

Interpretation

• Popular approach for high-level languages
  • E.g., Python, APL, SNOBOL, BCPL, Perl, MATLAB
• Useful for memory-challenged environments
• Low startup time and space overhead, but much slower than native code execution
• MMI (Mixed Mode Interpreter) [Suganuma ’01]
  • Fast interpreter implemented in assembler

Dynamic Compilation Techniques

• Baseline compiler
  • Translates bytecodes one by one to machine code
• Quick compilation
  • Reduced set of optimizations for fast compilation

Dynamic Compilation Techniques

• Full compilation
  • Full optimizations only for selected hot methods
• Classic just-in-time compilation
  • Compile methods to native code on first invocation
  • E.g., ParcPlace Smalltalk-80, Self-91
  • Initial high (time and space) overhead for each compilation
  • Precludes use of sophisticated optimizations (e.g., SSA)
  • Responsible for many of today’s myths

Interpretation vs JIT

[Figure: two bar charts comparing Interpreter and Compiler, each bar split into Initial Overhead and Execution. First panel: “Execution: 20 time units” (y-axis 0 to 120). Second panel: “Execution: 2000 time units” (y-axis 0 to 2500).]

Selective Optimization

• Hypothesis: most execution is spent in a small percentage of methods (90/10 rule)
• Idea: use two execution strategies
  1. Interpreter or non-optimizing compiler
  2. Full-fledged optimizing compiler
• Strategy:
  • Use option 1 for initial execution of all methods
  • Profile to find the “hot” subset of methods
  • Use option 2 on this subset

Selective Optimization

[Figure: two bar charts comparing Interpreter, Compiler, and Selective, each bar split into Initial Overhead and Execution. First panel: “Execution: 20 time units” (y-axis 0 to 120). Second panel: “Execution: 2000 time units” (y-axis 0 to 2500).]

Selective opt: compiles 10%-20% of methods, representing 90-99% of execution time

Designing a Selective Optimizer

• AKA: Adaptive Optimization System
• What is the system architecture?
• What are the profiling mechanisms and policies for driving recompilation?
• How effective are these systems?

Basic Structure of a Dynamic Compiler

[Diagram: Program → optimization phases → Machine code]

• Structural: inlining, unrolling, loop perm
• Scalar: cse, constants, expressions
• Memory: scalar repl, ptrs
• Reg. Alloc
• Scheduling, peephole

Still needs good core compiler - but more

Basic Structure of a Dynamic Compiler

[Diagram labels: Program; Interpreter or Simple Translation; Executing Program; Instrumented code; Raw Profile Data; Profile Processor; Processed Profile; Controller; History (prior decisions, compile time); Compilation decisions; Compiler subsystem (Optimizations)]

Method Profiling

• Counters
• Call Stack Sampling
• Combinations

Method Profiling: Counters

• Insert method-specific counter on method entry and loop back edges
• Counts how often a method is called and approximates how much time is spent in a method
• Very popular approach: Self, HotSpot
• Issues: overhead for incrementing counter can be significant
  • Not present in optimized code

Method Profiling: Counters

foo ( … ) {
  fooCounter++;                    // one increment per invocation of foo
  if (fooCounter > Threshold) {
    recompile( … );                // foo is hot: request recompilation
  }
  . . .
}
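A runnable version of the counter check might look like the sketch below. The `CounterProfiler` class, its method names, and the threshold value are hypothetical, and marking a method as optimized stands in for the real `recompile( … )` call. The sketch also assumes that optimized code no longer carries counters, as the previous slide notes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of counter-based method profiling (names and threshold are hypothetical).
final class CounterProfiler {
    private final int threshold;
    private final Map<String, Integer> counters = new HashMap<>();
    private final Set<String> optimized = new HashSet<>();

    CounterProfiler(int threshold) {
        this.threshold = threshold;
    }

    // Called on method entry and on loop back edges of unoptimized code.
    // Returns true when this event triggers recompilation.
    boolean tick(String method) {
        if (optimized.contains(method)) {
            return false;                        // optimized code carries no counters
        }
        int count = counters.merge(method, 1, Integer::sum);
        if (count > threshold) {
            optimized.add(method);               // stand-in for recompile( … )
            return true;
        }
        return false;
    }

    boolean isOptimized(String method) {
        return optimized.contains(method);
    }

    public static void main(String[] args) {
        CounterProfiler profiler = new CounterProfiler(3);
        for (int i = 1; i <= 5; i++) {
            // Only the fourth call exceeds the threshold and triggers recompilation.
            System.out.println("call " + i + ": " + profiler.tick("foo"));
        }
    }
}
```

The map lookup here is much heavier than the single increment a compiler would emit inline; it only keeps the sketch self-contained.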

Method Profiling: Call Stack Sampling

• Periodically record which method(s) are on the call stack
• Approximates amount of time spent in each method
• Can be compiled into the code
  • Jikes RVM, JRockit
• or use hardware sampling
• Issues: timer-based sampling is not deterministic

Method Profiling: Call Stack Sampling

[Figure: call stack contents at successive points in time (ABC, AB, A, AB, ABC, ABC, ...), with one point marked as a sample]
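As a rough illustration of how samples turn into a profile, the sketch below counts how often each method is on top of the stack across a list of samples. The class and method names are hypothetical, the samples are passed in as data rather than captured by a timer, and it assumes each stack is listed outermost method first (so A calls B calls C).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: turning call stack samples into a per-method profile (hypothetical names).
final class StackSampleProfile {
    // Each sample lists the methods on the call stack, outermost first.
    static Map<String, Integer> countTopOfStack(List<List<String>> samples) {
        Map<String, Integer> hits = new TreeMap<>();
        for (List<String> stack : samples) {
            if (stack.isEmpty()) {
                continue;
            }
            String top = stack.get(stack.size() - 1);   // method executing when sampled
            hits.merge(top, 1, Integer::sum);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<List<String>> samples = List.of(
            List.of("A", "B", "C"), List.of("A", "B"), List.of("A"),
            List.of("A", "B"), List.of("A", "B", "C"), List.of("A", "B", "C"));
        System.out.println(countTopOfStack(samples));   // prints {A=1, B=2, C=3}
    }
}
```

With six samples, C accounts for half of them, so it is estimated to take about half of the execution time.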

Method Profiling: Mixed Combinations

• Use counters initially and sampling later on
• IBM DK for Java

foo ( … ) {
  fooCounter++;
  if (fooCounter > Threshold) {
    recompile( … );
  }
  . . .
}

[Figure: call stack sample (ABC)]

Recompilation Policies

Problem: given optimization candidates, which should be optimized?

• Counters: optimize methods that surpass a threshold
  • Simple, but hard to tune; doesn’t consider context
• Sampling: optimize the method on top of the call stack
  • Addresses context issue

Recompilation Policies

Problem: given optimization candidates, which should be optimized?

• Call Stack Sampling: optimize all methods that are sampled
  • Simple to implement
• Use cost/benefit model
  • Seemingly complicated, but easy to engineer
  • Maintenance free
  • Naturally supports multiple optimization levels

Jikes RVM: Recompilation Policy – Cost/Benefit Model

• Define
  • cur, current opt level for method m
  • Exe(j), expected future execution time at level j
  • Comp(j), compilation cost at opt level j
• Choose j > cur that minimizes Exe(j) + Comp(j)
• If Exe(j) + Comp(j) < Exe(cur), recompile at level j
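The decision rule can be written out as a small function. This is a sketch under assumptions, not the Jikes RVM code: the `CostBenefit` class is hypothetical, Exe and Comp are supplied as plain arrays indexed by opt level, and the numbers in `main` are invented for illustration.

```java
// Sketch of the cost/benefit recompilation rule (hypothetical class and inputs).
final class CostBenefit {
    // exe[j]:  expected future execution time at opt level j.
    // comp[j]: compilation cost at opt level j.
    // Returns the level to recompile at, or cur if no higher level pays off.
    static int chooseLevel(int cur, double[] exe, double[] comp) {
        int best = cur;
        double bestCost = exe[cur];              // staying put costs Exe(cur), with no compile cost
        for (int j = cur + 1; j < exe.length; j++) {
            double cost = exe[j] + comp[j];      // Exe(j) + Comp(j)
            if (cost < bestCost) {
                best = j;
                bestCost = cost;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] comp = {0, 10, 45};             // invented compile costs
        // Long-running method: level 1 costs 50 + 10 = 60, level 2 costs 30 + 45 = 75.
        System.out.println(chooseLevel(0, new double[] {100, 50, 30}, comp));  // prints 1
        // Short-running method: no level beats Exe(cur) = 10.
        System.out.println(chooseLevel(0, new double[] {10, 5, 3}, comp));     // prints 0
    }
}
```

Because the comparison runs over every level above cur, the same rule handles any number of optimization levels.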

Jikes RVM: Recompilation Policy – Cost/Benefit Model

• Assumptions
  • Sample data determines how long a method has executed
  • Method will execute as much in the future as it has in the past
  • Compilation cost and speedup are offline averages
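One plausible way to turn these assumptions into an estimate of Exe(j) is sketched below. The `ExeEstimator` class, the sample interval, and the speedup table are all hypothetical; the sketch only shows how sample counts and offline speedup averages could combine, and should not be read as the actual Jikes RVM formulas or constants.

```java
// Sketch: estimating Exe(j) from samples under the stated assumptions (hypothetical values).
final class ExeEstimator {
    // Invented offline averages: speedup of each opt level relative to level 0.
    static final double[] SPEEDUP = {1.0, 2.0, 4.0};

    // Assumption 1: the number of samples determines how long the method has run.
    static double pastTime(int samples, double sampleIntervalMs) {
        return samples * sampleIntervalMs;
    }

    // Assumption 2: the method will run as long in the future as it has in the past.
    // Assumption 3: moving from level cur to level j scales that time by the offline speedups.
    static double exe(int samples, double sampleIntervalMs, int cur, int j) {
        double futureAtCur = pastTime(samples, sampleIntervalMs);
        return futureAtCur * SPEEDUP[cur] / SPEEDUP[j];
    }

    public static void main(String[] args) {
        // Ten samples at a 10 ms interval for a method currently at level 0.
        for (int j = 0; j < SPEEDUP.length; j++) {
            System.out.println("Exe(" + j + ") = " + exe(10, 10.0, 0, j) + " ms");
        }
    }
}
```

Since every input is either a sample count or an offline constant, the estimate costs almost nothing to compute at run time.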

Optimization Levels

Optimization Level | Optimizations Controlled
Opt Level O0       | Branch Opts Low; Constant Prop / Local CSE; Reorder Code
Opt Level O1       | Copy Prop / Tail Recursion; Static Splitting / Branch Opt Med; Simple Opts Low
Opt Level O2       | While into Untils / Loop Unroll; Branch Opt High / Redundant BR; Simple Opts Med / Load Elim; Expression Fold / Coalesce; Global Copy Prop / Global CSE; SSA

Short Running Programs

No FDO, Mar’04, AIX/PPC

Steady State

No FDO, Mar’04, AIX/PPC

Profiling for What to Do

Myth: Sophisticated profiling is too expensive to perform online

Reality: Well-known technology can collect sophisticated profiles with sampling and minimal overhead

Suggested Reading: Dynamic Compilation

Adaptive optimization in the Jalapeño JVM, M. Arnold, S. Fink, D. Grove, M. Hind, and P. Sweeney, Proceedings of the 2000 ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages & Applications (OOPSLA '00), pages 47-65, Oct. 2000.

Spare Slides

Method Profiling: Timer Based

class Thread
  scheduler (...) {
    ...
    flag = 1;
  }

void handler(...) {
  // sample stack, perform GC, swap threads, etc.
  ....
  flag = 0;
}

foo ( … ) {
  // on method entry, exit, & all loop backedges
  if (flag) {
    handler( … );
  }
  . . .
}

[Figure: call stack (ABC), with “if (flag) handler();” checks]

• Useful for more than profiling
• Jikes RVM
  • Schedule garbage collection
  • Thread scheduling policies, etc.
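The flag-polling scheme can be sketched in runnable form as follows. The `YieldpointSampler` class and its method names are hypothetical, and `timerTick()` stands in for the scheduler's timer interrupt so that the example stays deterministic; a real VM would set the flag asynchronously.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of timer-based sampling via a polled flag (hypothetical names).
final class YieldpointSampler {
    private volatile boolean flag;               // set by the timer, cleared by the handler
    private final Map<String, Integer> samples = new HashMap<>();

    // Stand-in for the scheduler's timer interrupt.
    void timerTick() {
        flag = true;
    }

    // The check compiled into method entry, exit, and loop back edges.
    void yieldpoint(String method) {
        if (flag) {
            handler(method);
        }
    }

    private void handler(String method) {
        samples.merge(method, 1, Integer::sum);  // record a sample; a VM could also GC or switch threads here
        flag = false;
    }

    int samplesFor(String method) {
        return samples.getOrDefault(method, 0);
    }

    public static void main(String[] args) {
        YieldpointSampler sampler = new YieldpointSampler();
        sampler.yieldpoint("foo");               // flag clear: no sample
        sampler.timerTick();
        sampler.yieldpoint("foo");               // flag set: sample taken, flag cleared
        sampler.yieldpoint("bar");               // flag clear again: no sample
        System.out.println(sampler.samplesFor("foo") + " " + sampler.samplesFor("bar"));  // prints 1 0
    }
}
```

The common case costs one load and one branch per check, which is why the same mechanism can also carry garbage collection and thread scheduling requests.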

Arnold-Ryder [PLDI 01]: Full Duplication Profiling

[Figure: Full-Duplication Framework. Labels: Checking Code; Duplicated Code; Method Entry; Checks; Entry; Backedges; Check Placement]

Generate two copies of a method
• Execute “fast path” most of the time
• Execute “slow path” with detailed profiling occasionally
• Adapted by J9 due to proven accuracy and low overhead
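The idea of two copies of a method selected by a cheap check can be sketched as follows. This is a simplified illustration, not the Arnold-Ryder implementation: the `FullDuplicationSketch` class is hypothetical, the check is a simple countdown placed only at method entry (the slide also lists back edges), and the "detailed profile" is just a loop iteration count.

```java
// Sketch of full-duplication profiling with a countdown check (hypothetical names).
final class FullDuplicationSketch {
    private final int sampleInterval;
    private int countdown;
    int profiledRuns;                            // executions that took the instrumented copy
    long iterationCount;                         // example of detailed profile data

    FullDuplicationSketch(int sampleInterval) {
        this.sampleInterval = sampleInterval;
        this.countdown = sampleInterval;
    }

    // Check at method entry: decides which copy of the body runs.
    int sum(int[] data) {
        if (--countdown == 0) {
            countdown = sampleInterval;
            return sumInstrumented(data);        // slow path, taken occasionally
        }
        return sumFast(data);                    // fast path, taken most of the time
    }

    // Fast copy: no instrumentation at all.
    private int sumFast(int[] data) {
        int total = 0;
        for (int x : data) {
            total += x;
        }
        return total;
    }

    // Duplicated copy: same computation plus detailed profiling.
    private int sumInstrumented(int[] data) {
        profiledRuns++;
        int total = 0;
        for (int x : data) {
            iterationCount++;
            total += x;
        }
        return total;
    }

    public static void main(String[] args) {
        FullDuplicationSketch sketch = new FullDuplicationSketch(4);
        for (int i = 0; i < 8; i++) {
            sketch.sum(new int[] {1, 2, 3});
        }
        // Calls 4 and 8 take the instrumented copy.
        System.out.println(sketch.profiledRuns + " profiled runs, " + sketch.iterationCount + " iterations");
    }
}
```

The sample interval trades profile detail against overhead: a larger interval keeps execution in the fast copy for longer.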
