Spring 2014 Jim Hogg - UW - CSE - P501 X1-1

CSE P501 – Compiler Construction

Inlining

Devirtualization

long res;
void foo(long x) { res = 2 * x; }
void bar() { foo(5); }

After inlining:

long res;
void bar() { res = 2 * 5; }

After constant folding:

long res;
void bar() { res = 10; }

Inlining

Benefits

Removes overhead of function-call
  No marshalling of arguments
  No unmarshalling of return value
  Better instruction-cache (I-cache) locality

Bonus: expands opportunities for further optimization
  CSE, constant-prop, DCE, ...
  Poor man’s interprocedural optimization

Costs

Code size
  Typically expands overall program size
  Can hurt I-cache

Compilation time
  Large methods take longer to compile - hurts "through-put"
  Eg: optimizing, instruction-selection, register-allocation

Language / Runtime Aspects

What is the cost of a function call?
  C: cheap
  Java: moderate (virtual dispatch)
  Python: expensive

Are targets resolved at compile time or run time?
  C: compile time
  Java, Python: runtime

Is the whole program available for analysis?
  "separate compilation"

Is profile information available?
  Eg: if "f" is rarely called, don't inline it

When to Inline?

Jikes RVM (with Hazelwood/Grove adaptations):

Call Instruction Sequence (CIS) = # of instructions to make call

Tiny (size < 2x call size): Always inline
Small (2-5x): Inline subject to space constraints
Medium (5-25x): Inline if hot (subject to space constraints)
Large: Never inline
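The size classes above can be sketched as a small decision function. This is only an illustration of the thresholds on this slide, not Jikes RVM code: the class name, method names, and the handling of the exact boundaries (5x and 25x are treated as inclusive here) are assumptions.

```java
// Hypothetical sketch of the size-based inlining heuristic; names are
// illustrative and do not correspond to Jikes RVM APIs.
final class InlineHeuristic {
    enum Decision { ALWAYS, IF_SPACE_ALLOWS, IF_HOT_AND_SPACE_ALLOWS, NEVER }

    // calleeSize and callSize are both counted in instructions;
    // callSize is the Call Instruction Sequence (CIS).
    static Decision classify(int calleeSize, int callSize) {
        if (calleeSize < 2 * callSize)   return Decision.ALWAYS;                  // tiny
        if (calleeSize <= 5 * callSize)  return Decision.IF_SPACE_ALLOWS;         // small (boundary assumed)
        if (calleeSize <= 25 * callSize) return Decision.IF_HOT_AND_SPACE_ALLOWS; // medium (boundary assumed)
        return Decision.NEVER;                                                    // large
    }

    // hot comes from profile info; spaceLeft models the "space constraints".
    static boolean shouldInline(int calleeSize, int callSize, boolean hot, boolean spaceLeft) {
        switch (classify(calleeSize, callSize)) {
            case ALWAYS:                  return true;
            case IF_SPACE_ALLOWS:         return spaceLeft;
            case IF_HOT_AND_SPACE_ALLOWS: return hot && spaceLeft;
            default:                      return false;
        }
    }
}
```

For example, with a 4-instruction call sequence, a 50-instruction callee falls in the medium class and is inlined only if the call site is hot and space remains.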

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"

Gathering Profile Info

Counter-based: Instrument edges in flowgraph
  Entry + loop back edges
  Enough edges (enough to get good results without excessive overhead)
  Expensive - always removed in optimized code
  Depends critically on the "training sets"

Call-stack sampling
  Periodically walk stack
  Interrupt-based or instrumentation-based
  May gather info on what called what (callsite info)

OO encourages lots of small methods

  getters, setters, ...

Inlining is a requirement for performance
  High call overhead wrt total execution
  Limited scope for compiler optimizations without it

For Java, C#, etc., if you’re going to do anything, do this!

But ... virtual methods are a challenge

Devirtualization

Virtual Methods

In general, we cannot determine the target until runtime

Some languages (eg, Java) allow dynamic class loading: all subclasses of A may not be visible until runtime

class A {
  int foo() { return 0; }
  int bar() { return 1; }
}

class B extends A {
  int foo() { return 2; }
}

void baz(A x) {
  ... = x.foo();
  ... = x.bar();
}

x.foo may return 0 or 2 (depending on x's runtime type - A or B)

x.bar will return 1 (unless we dynamically loaded C derived from B which over-rides method bar)

Virtual tables

Object layout in a JVM for object of class B:

Virtual method dispatch

x is the receiver object

if x has runtime type B, t2 will refer to B::foo

... = x.foo();
  t1 = ldvtable x
  t2 = ldvirtfunaddr t1, A::foo
  t3 = call [t2] (x)

... = x.bar();
  t4 = ldvtable x
  t5 = ldvirtfunaddr t4, A::bar
  t6 = call [t5] (x)
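The three-step dispatch sequence (load vtable, load function address from a slot, indirect call) can be mimicked in plain Java by modelling the vtable as an explicit array. This is a rough sketch only: a real JVM does this in machine code, and the names VTableDemo, Obj, Method, FOO and BAR are made up for illustration.

```java
// Hypothetical model of vtable dispatch; a JVM does this in generated code.
final class VTableDemo {
    interface Method { int invoke(Obj receiver); }

    static final int FOO = 0, BAR = 1;   // slot indices, fixed per method

    // A's vtable: slot 0 = A::foo, slot 1 = A::bar
    static final Method[] VTABLE_A = { r -> 0, r -> 1 };
    // B overrides foo, inherits bar from A
    static final Method[] VTABLE_B = { r -> 2, VTABLE_A[BAR] };

    static final class Obj {
        final Method[] vtable;           // every object points to its class's vtable
        Obj(Method[] vtable) { this.vtable = vtable; }
    }

    static int callVirtual(Obj x, int slot) {
        Method[] t1 = x.vtable;          // t1 = ldvtable x
        Method t2 = t1[slot];            // t2 = ldvirtfunaddr t1, slot
        return t2.invoke(x);             // t3 = call [t2] (x)
    }
}
```

Calling callVirtual with slot FOO on an object built with VTABLE_B reaches B's foo, while slot BAR reaches the inherited A::bar, matching the baz example.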

Devirtualization

Goal: change virtual calls to static calls at compile-time

Benefits:
  enables inlining
  lowers call overhead
  better I-cache performance
  better indirect-branch prediction

Often optimistic:
  Make guess at compile time
  Test guess at run time
  Fall back to virtual call if necessary

Guarded Devirtualization

Guess receiver type is B (based on profile or other information)

Call to B::foo is statically known - can be inlined

But guard inhibits optimization

t1 = ldvtable x
t7 = getvtable B
if t1 == t7
  t3 = call B::foo(x)
else
  t2 = ldvirtfunaddr t1, A::foo
  t3 = call [t2] (x)
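At source level, a class-test guard looks roughly like the sketch below. It is hand-written to show what a compiler might generate, with the body of B's foo inlined on the fast path; the wrapper class GuardedCall and the method callFoo are hypothetical names.

```java
// Hypothetical source-level picture of guarded devirtualization.
final class GuardedCall {
    static class A { int foo() { return 0; } int bar() { return 1; } }
    static class B extends A { @Override int foo() { return 2; } }

    static int callFoo(A x) {
        if (x.getClass() == B.class) {   // guard: exact class test (t1 == t7)
            return 2;                    // body of B.foo, inlined by hand
        } else {
            return x.foo();              // fallback: ordinary virtual call
        }
    }
}
```

Note that the class test is exact: a receiver whose class is a subclass of B fails the guard and takes the virtual call, even if it does not override foo.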

Guarded by Method Test

Guess that method is B::foo; the method address is loaded outside the guard

More robust, but more overhead

Harder to optimize redundant guards

t1 = ldvtable x
t2 = ldvirtfunaddr t1, A::foo
t7 = getfunaddr B::foo
if t2 == t7
  t3 = call B::foo(x)
else
  t3 = call [t2] (x)

How to guess receiver?

Profile information
  Record call site targets and/or frequently executed methods
  "monomorphic" versus "polymorphic"

Class hierarchy analysis
  Walk class hierarchy at compile time

Type analysis
  Intra/inter procedural data flow analysis
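As one way to picture the profile-based option, the sketch below counts receiver classes seen at a single call site and reports whether the site looks monomorphic. It is an assumption-laden illustration: CallSiteProfile and its methods are invented names, and the threshold passed to dominantType is assumed to be greater than 0.5 so that at most one class can qualify.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-call-site receiver-type profile.
final class CallSiteProfile {
    private final Map<Class<?>, Integer> counts = new HashMap<>();
    private int total = 0;

    // Called by instrumentation each time the call site executes.
    void record(Object receiver) {
        counts.merge(receiver.getClass(), 1, Integer::sum);
        total++;
    }

    // Monomorphic: exactly one receiver class has been observed so far.
    boolean isMonomorphic() { return counts.size() == 1; }

    // Returns the class that accounts for at least `threshold` of all calls,
    // or null if there is none (assumes threshold > 0.5).
    Class<?> dominantType(double threshold) {
        for (Map.Entry<Class<?>, Integer> e : counts.entrySet()) {
            if (e.getValue() >= threshold * total) return e.getKey();
        }
        return null;
    }
}
```

A compiler could use the dominant class as the guess in a guarded devirtualization, and skip the optimization when no class dominates.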

Class Hierarchy Analysis

Walk class hierarchy at compile-time

If only one implementation of a method (ie, in the base class), devirtualize to that target

Not guaranteed in the presence of runtime class loading
  Still need runtime test / fallback
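A minimal sketch of the hierarchy walk, assuming the compiler keeps a table of loaded classes, their superclasses, and the methods each declares. Classes are identified by name strings to keep the example small; ClassHierarchyAnalysis, addClass and uniqueTarget are hypothetical names, not part of any real compiler.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical class-hierarchy table; classes are identified by name.
final class ClassHierarchyAnalysis {
    private final Map<String, String> parent = new HashMap<>();
    private final Map<String, Set<String>> declared = new HashMap<>();

    // Register a loaded class; superName is null for a root class.
    void addClass(String name, String superName, String... methods) {
        parent.put(name, superName);
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    // Class whose implementation of `method` a receiver of class `cls` would run.
    private String resolve(String cls, String method) {
        for (String c = cls; c != null; c = parent.get(c)) {
            if (declared.getOrDefault(c, Collections.emptySet()).contains(method)) return c;
        }
        return null;
    }

    private boolean isSubclassOf(String cls, String ancestor) {
        for (String c = cls; c != null; c = parent.get(c)) {
            if (c.equals(ancestor)) return true;
        }
        return false;
    }

    // If every known subclass of staticType resolves `method` to the same
    // implementation, return the implementing class; otherwise return null.
    String uniqueTarget(String staticType, String method) {
        Set<String> targets = new HashSet<>();
        for (String cls : parent.keySet()) {
            if (isSubclassOf(cls, staticType)) {
                String impl = resolve(cls, method);
                if (impl != null) targets.add(impl);
            }
        }
        return targets.size() == 1 ? targets.iterator().next() : null;
    }
}
```

With only A and B registered, bar has the single target A::bar and can be devirtualized, while foo cannot. Registering a later class C that extends B and overrides bar makes the earlier answer for bar stale, which is why a runtime test or fallback is still needed.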

Flow-Sensitive Type Analysis

Perform a forward dataflow analysis propagating type info

At each callsite compute possible set of types

Use type info of receiver to narrow targets.

A a1 = new B();
a1.foo();

if (a2 instanceof C) a2.bar();

Alternatives to Guarding

Guards impose overhead
  run-time test on every call, merge points impede optimization

Often “know” only one target is invoked
  call site is monomorphic

Alternative: compile without guards
  recover as assumption is violated (eg, at class load, recompile)
  cheaper runtime test vs more costly recovery

Recompilation Approach

Optimistically assume current class hierarchy will never change wrt a call

Devirtualize and/or inline call sites without guard

On violating class load, recompile caller method
  Recompiled code installed before new class
  New invocations will call de-optimized code
  What about current invocations?

Nice match with JIT compiling

Preexistence analysis

Idea: if receiver object pre-existed the caller method invocation, then callsite is only affected by a class load in future invocations

If new class C is loaded during execution of baz, x cannot have type C:

void baz(A x) {
  ...
  // C loaded here
  x.bar();
}

x is bound to object on entry; cannot refer to a C

Code-patching

Pre-generate fallback virtual call out-of-line

On an invalidating class load, overwrite direct-call with a jump to the fallback code

Must be thread-safe!
  On x86, single write within a cache line is atomic
  Self-modifying code (also used for JIT "jump thunks")

No recompilation necessary

Patching

  t3 = 2        // inlined B::foo; on class-load, stomp with "jump fallback"
next:
  ...

fallback:
  t2 = ldvirtfunaddr t1, A::foo
  t3 = call [t2] (x)
  goto next

B::foo() { return 2; }
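The patching idea can be approximated in Java by treating the call site as a single reference that is swapped atomically from the inlined fast path to the out-of-line fallback. This is only an analogy under stated assumptions: a real JIT overwrites machine code in place, and PatchableCallSite, FAST, FALLBACK and onInvalidatingClassLoad are invented names.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.ToIntFunction;

// Hypothetical model of a patchable call site for x.foo().
final class PatchableCallSite {
    static class A { int foo() { return 0; } }
    static class B extends A { @Override int foo() { return 2; } }

    // Fast path: B.foo inlined without a guard (valid while B.foo is the only target).
    private static final ToIntFunction<A> FAST = x -> 2;
    // Pre-generated fallback: ordinary virtual call.
    private static final ToIntFunction<A> FALLBACK = x -> x.foo();

    // Stands in for the patchable instruction; one atomic write flips it.
    private final AtomicReference<ToIntFunction<A>> target = new AtomicReference<>(FAST);

    int call(A x) { return target.get().applyAsInt(x); }

    // Invoked by the class loader hook when a class that invalidates the assumption is loaded.
    void onInvalidatingClassLoad() { target.set(FALLBACK); }
}
```

After the swap every invocation goes through the virtual call, so no recompilation is needed; the cost is that the site never returns to the fast path in this sketch.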
