Fernando Magno Quintão Pereira
PROGRAMMING LANGUAGES LABORATORY
Universidade Federal de Minas Gerais - Department of Computer Science

PROGRAM ANALYSIS AND OPTIMIZATION – DCC888

JUST-IN-TIME COMPILATION

The material in these slides has been taken mostly from two papers: "Dynamic Elimination of Overflow Tests in a Trace Compiler" and "Just-in-Time Value Specialization".



A Toy Example

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void) {
  unsigned char* program;
  int (*fnptr)(void);
  int a;
  /* Ask for memory that is readable, writable and executable. */
  program = mmap(NULL, 1000, PROT_EXEC | PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (program == MAP_FAILED) {
    perror("mmap");
    return EXIT_FAILURE;
  }
  program[0] = 0xB8;  /* x86 opcode: mov eax, imm32 */
  program[1] = 0x34;  /* imm32, least significant byte first */
  program[2] = 0x12;
  program[3] = 0;
  program[4] = 0;
  program[5] = 0xC3;  /* x86 opcode: ret */
  fnptr = (int (*)(void)) program;
  a = fnptr();
  printf("Result = %X\n", a);
  return 0;
}

1)  What is the program above doing?

2)  What is this API all about?

3)  What does this program have to do with a just-in-time compiler?

Just-in-Time Compilers

•  A JIT compiler translates a program into binary code while this program is being executed.
•  We can compile a function as soon as it is necessary.
– This is Google's V8 approach.
•  Or we can first interpret the function, and after we realize that this function is hot, we compile it into binary.
– This is the approach of Mozilla's IonMonkey.

1)  When/where/why are just-in-time compilers usually used?

2)  Can a JIT compiled program run faster than a statically compiled program?

3)  Which famous JIT compilers do we know?

There are many JIT compilers around

•  Java Hotspot is one of the most efficient JIT compilers in use today. It was released in 1999, and has been in use since then.
•  V8 is the JavaScript JIT compiler used by Google Chrome.
•  IonMonkey is the JavaScript JIT compiler used by Mozilla Firefox.
•  LuaJIT (http://luajit.org/) is a trace-based just-in-time compiler that generates code for the Lua programming language.
•  The .NET framework JITs CIL code.
•  For Python we have PyPy, an alternative implementation to the CPython interpreter.

Can JITs compete with static compilers?

http://benchmarksgame.alioth.debian.org/

How can an interpreted language run faster than a compiled language?

Tradeoffs

•  There are many tradeoffs involved in the JIT compilation of a program.
•  The time to compile is part of the total execution time of the program.
•  We may be willing to run simpler and faster optimizations (or no optimization at all) to diminish the compilation overhead.
•  And we may try to look at runtime information to produce better code.
– Profiling is a big player here.
•  The same code may be compiled many times!

Why would we compile the same code many times?

Example: Mozilla's IonMonkey

IonMonkey is one of the JIT compilers used by the Firefox browser to execute JavaScript programs.

This compiler is tightly integrated with SpiderMonkey, the JavaScript interpreter.

SpiderMonkey invokes IonMonkey to JIT compile a function either if it is often called, or if it has a loop that executes for a long time.

[Figure: the path from JavaScript source code to native code, with stages labeled Parser, Interpreter, Compiler, Optimizer and Code Generator. The code produced at each stage is listed below.]

JavaScript source code:

function inc(x) {
  return x + 1;
}

Bytecodes:

00000: getarg 0
00003: one
00004: add
00005: return
00006: stop

MIR:

resumepoint 4 2 6
parameter -1 Value
resumepoint 4 2 6
parameter 0 Value
constant undef Undefined
start
unbox parameter4 Int32
resumepoint 4 2 6
checkoverrecursed
constant 0x1 Int32
add unbox10 const14 Int32
return add16

LIR:

label ()
parameter ([x:1 (arg:0)])
parameter ([x:2 (arg:8)])
start ()
unbox ([i:3]) (v2:r)
checkoverrecursed()t=([i:4])
osipoint ()
addi ([i:5 (!)]) (v3:r), (c)
box ([x:6]) (v5:r)
return () (v6:rcx)

Native code:

mov 0x28(%rsp),%r10
shr $0x2f,%r10
cmp $0x1fff1,%r10d
je 0x1b
jmpq 0x6e
mov 0x28(%rsp),%rax
mov %eax,%ecx
mov $0x7f70b72c,%r11
mov (%r11),%rdx
cmp %rdx,%rsp
jbe 0x78
add $0x1,%ecx
jo 0xab
mov $0xfff88000,%rax
retq

Why do we have so many different intermediate representations here?

When to Invoke the JIT Compiler?

•  Compilation has a cost.
– Functions that execute only once, for a few iterations, should be interpreted.
•  Compiled code runs faster.
– Functions that are called often, or that loop for a long time, should be compiled.
•  And we may have different optimization levels…

How to decide when to compile a piece of code?

As an example, SpiderMonkey uses three execution modes: the first is interpretation; then we have the baseline compiler, which does not optimize the code. Finally IonMonkey kicks in, and produces highly optimized code.

The Compilation Threshold

Many execution environments associate counters with branches. Once a counter reaches a given threshold, that code is compiled. But defining which threshold to use is very difficult: for example, how do we minimize the area under the curve in the figure? JITs are crowded with magic numbers.

[Figure: a curve over Time, with axes labeled "Number of instructions processed (either natively or via interpretation)" and "Number of instructions compiled". The curve passes through three regions: Interpreted Code, Unoptimized Native Code and Optimized Native Code. The Baseline Compiler and the Optimizing Compiler mark the transitions between regions.]
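The counter mechanism described above can be sketched in a few lines. This is only an illustration: the class `TieredFunction`, the helper `tiers_after` and the two threshold values are hypothetical names and numbers, not taken from any real engine.

```python
# Hypothetical thresholds: real engines tune such magic numbers empirically.
BASELINE_THRESHOLD = 10
OPTIMIZE_THRESHOLD = 1000


class TieredFunction:
    """Tracks how hot a function is and which execution tier it runs in."""

    def __init__(self, name):
        self.name = name
        self.counter = 0           # bumped on every call or loop back-edge
        self.tier = "interpreter"

    def tick(self):
        """Record one execution event and promote the function if it is hot."""
        self.counter += 1
        if self.tier == "interpreter" and self.counter >= BASELINE_THRESHOLD:
            self.tier = "baseline"     # cheap compilation, no optimization
        elif self.tier == "baseline" and self.counter >= OPTIMIZE_THRESHOLD:
            self.tier = "optimized"    # expensive compilation, fast code
        return self.tier


def tiers_after(events):
    """Return the tier a fresh function reaches after `events` executions."""
    function = TieredFunction("f")
    tier = function.tier
    for _ in range(events):
        tier = function.tick()
    return tier
```

For instance, `tiers_after(9)` is still `"interpreter"`, while `tiers_after(10)` is `"baseline"`. Choosing the two constants well is exactly the difficulty that the slide points out.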

The Million-Dollar Question

•  When to invoke the JIT compiler?

1)  Can you come up with a strategy to invoke the JIT compiler that optimizes for speed?

2)  Do you have to execute the program a bit before calling the JIT?

3)  How much information do you need to make a good guess?

4)  What is the price you pay for making a wrong prediction?

5)  Which programs are easy to predict?

6)  Do the easy programs reflect the needs of the users?


SPECULATION

Speculation

•  A key trick used by JIT compilers is speculation.
•  We may assume that a given property is true, and then we produce code that capitalizes on that speculation.
•  There are many different kinds of speculation, and they are always a gamble:
– Let's assume that the type of a variable is an integer,
    •  but if we have an integer overflow…
– Let's assume that the properties of the object are fixed,
    •  but if we add or remove something from the object…
– Let's assume that the target of a call is always the same,
    •  but if we point the function reference to another closure…

Inline Caching

•  One of the earliest, and most effective, types of specialization was inline caching, an optimization developed for the Smalltalk programming language♧.
•  Smalltalk is a dynamically typed programming language.
•  In a nutshell, objects are represented as hash tables.
•  This is a very flexible programming model: we can add or remove properties of objects at will.
•  Languages such as Python and Ruby also implement objects in this way.
•  Today, inline caching is the key idea behind JITs' high performance when running JavaScript programs.

♧: Efficient implementation of the Smalltalk-80 system, POPL (1984)

Virtual Tables

In Java, C++, C#, and other programming languages that are compiled statically, objects contain pointers to virtual tables. Any method call can be resolved after two pointer dereferences. The first dereference finds the virtual table of the object. The second finds the target method, given a known offset.

class Animal {
  public void eat() { System.out.println(this + " is eating"); }
  public String toString() { return "Animal"; }
}

class Mammal extends Animal {
  public void suckMilk() { System.out.println(this + " is sucking"); }
  public String toString() { return "Mammal"; }
  public void eat() { System.out.println(this + " is eating like a mammal"); }
}

class Dog extends Mammal {
  public void bark() { System.out.println(this + " is barking"); }
  public String toString() { return "Dog"; }
  public void eat() { System.out.println(this + ", is eating like a dog"); }
}

1)  In order for this trick to work, we need to impose some restrictions on virtual tables. Which ones?

2)  What are the virtual tables of: Animal a = new Animal(); Animal m = new Mammal(); Mammal d = new Dog()?

Virtual Tables

[Figure: the three virtual tables. Animal a points to the table of Animal, Animal m to the table of Mammal, and Mammal d to the table of Dog.]

Animal:  toString, eat
Mammal:  toString, eat, suckMilk
Dog:     toString, eat, suckMilk, bark

How to locate the target of d.eat()?

Virtual Call

d.eat()

First, we need to know the table d is pointing to. This requires one pointer dereference.

Second, we need to know the offset of the method eat inside the table. This offset is always the same for any class that inherits from Animal, so we can jump blindly.

[Figure: Mammal d points to the table of Dog (toString, eat, suckMilk, bark). The eat entry of each table (Animal, Mammal, Dog) points to that class's own implementation, listed below.]

Dog:
public void eat() {
  System.out.println("Eats like a dog");
}

Mammal:
public void eat() {
  System.out.println("Eats like a mammal");
}

Animal:
public void eat() {
  System.out.println("Eats like an animal");
}

Can we have virtual tables in duck typed languages?
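The two-dereference dispatch of statically compiled languages can be imitated with plain lists. The sketch below is only a model: `Obj`, `call_virtual`, the slot constants and the table names are hypothetical, and the method bodies return strings instead of printing.

```python
# Slot numbers are assigned once per method name and never change in
# subclasses: `eat` lives in slot 1 of every table.
TO_STRING, EAT, SUCK_MILK, BARK = 0, 1, 2, 3


def to_string(name):
    """Build a toString method that returns a fixed class name."""
    return lambda self: name


def animal_eat(self):
    return "Eats like an animal"


def mammal_eat(self):
    return "Eats like a mammal"


def dog_eat(self):
    return "Eats like a dog"


def suck_milk(self):
    return "is sucking"


def bark(self):
    return "is barking"


# A subclass table keeps the layout of its parent, overrides some slots and
# appends its new methods at the end.
ANIMAL_VTABLE = [to_string("Animal"), animal_eat]
MAMMAL_VTABLE = [to_string("Mammal"), mammal_eat, suck_milk]
DOG_VTABLE = [to_string("Dog"), dog_eat, suck_milk, bark]


class Obj:
    """An object is just a pointer to the virtual table of its class."""

    def __init__(self, vtable):
        self.vtable = vtable


def call_virtual(obj, slot):
    table = obj.vtable      # dereference 1: object -> virtual table
    method = table[slot]    # dereference 2: fixed offset -> target method
    return method(obj)
```

With `d = Obj(DOG_VTABLE)`, the call `call_virtual(d, EAT)` reaches `dog_eat` without ever searching for the name `eat`. That only works because the offset of `eat` is known before the program runs.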

Objects in Python

INT_BITS = 32

def getIndex(element):
    index = element // INT_BITS
    offset = element % INT_BITS
    bit = 1 << offset
    return (index, bit)

class Set:
    def __init__(self, capacity):
        self.capacity = capacity
        self.vector = [0] * (1 + capacity // INT_BITS)
    def add(self, element):
        (index, bit) = getIndex(element)
        self.vector[index] |= bit

class ErrorSet(Set):
    def checkIndex(self, element):
        if (element > self.capacity):
            raise IndexError(str(element) + " is out of range.")
    def add(self, element):
        self.checkIndex(element)
        Set.add(self, element)
        print(element, "successfully added.")

1)  What is the program above doing?

2)  What is the complexity to locate the target of a method call in Python?

3)  Why can't calls in Python be implemented as efficiently as calls in Java?

Using Python Objects

def fill(set, b, e, s):
    for i in range(b, e, s):
        set.add(i)

s0 = Set(15)
fill(s0, 10, 20, 3)

s1 = ErrorSet(15)
fill(s1, 10, 14, 3)

class X:
    def __init__(self):
        self.a = 0

fill(X(), 1, 10, 3)
>>> AttributeError: 'X' object has no attribute 'add'

1)  What does the function fill do?

2)  Why did the third call of fill fail?

3)  What are the requirements that fill expects on its parameters?

Duck Typing

def fill(set, b, e, s):
    for i in range(b, e, s):
        set.add(i)

class Num:
    def __init__(self, num):
        self.n = num
    def add(self, num):
        self.n += num
    def __str__(self):
        return str(self.n)

n = Num(3)
print(n)
fill(n, 1, 10, 1)
...

Do we get an error here?

Duck Typing

n = Num(3)
print(n)             # 3
fill(n, 1, 10, 1)
print(n)             # 48

The program works just fine. The only requirement that fill expects on its first argument is that it has a method add that takes two parameters. Any object that has this method, and can receive an integer on the second argument, will work with fill. This is called duck typing: if it quacks like a duck, swims like a duck, eats like a duck, then it is a duck!

The Price of Flexibility

•  Objects, in these dynamically typed languages, are for the most part implemented as hash tables.
– That is cool: we can add or remove methods without much hard work.
– And notice how much code we can reuse:

def fill(set, b, e, s):
    for i in range(b, e, s):
        set.add(i)

•  But method calls are pretty expensive.

[Figure: a variable d pointing to an object represented as a hash table, with entries __init__(self, cap), add(self, elem), del(self, elem) and contains(self, elem).]

How can we make these calls cheaper?
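To see where the cost comes from, the lookup can be written out by hand. The function `lookup` below, and the classes `Base` and `Derived`, are hypothetical; the search order is a simplification of what a real Python implementation does, kept only to count how many hash tables are probed.

```python
def lookup(obj, name):
    """Find `name` the slow way and report how many tables were probed."""
    # The hash table of the instance itself comes first ...
    probes = 1
    if name in vars(obj):
        return vars(obj)[name], probes
    # ... then the table of its class and of every ancestor, in order.
    for klass in type(obj).__mro__:
        probes += 1
        if name in vars(klass):
            return vars(klass)[name], probes
    raise AttributeError(name)


class Base:
    def add(self, element):
        return element


class Derived(Base):
    pass


d = Derived()
d.tag = "x"
```

Here `lookup(d, "tag")` needs one probe, but `lookup(d, "add")` needs three (the instance, `Derived`, then `Base`), and each probe hashes a string. A virtual call pays two loads instead.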

Monomorphic Inline Cache

class Num:
    def __init__(self, n):
        self.n = n
    def add(self, num):
        self.n += num
    def __str__(self):
        return str(self.n)

def fill(set, b, e, s):
    for i in range(b, e, s):
        set.add(i)

n = Num(3)
print(n)             # 3
fill(n, 1, 10, 1)
print(n)             # 48

The first time we generate code for a call, we can check the target of that call. We know this target, because we are generating code at runtime!

fill(set, b, e, s):
    for i in range(b, e, s):
        if isinstance(set, Num):
            set.n += i
        else:
            add = lookup(set, "add")
            add(set, i)

Could you optimize this code even further using classic compiler transformations?
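A runnable approximation of a monomorphic inline cache is sketched below. The class `CallSite` is hypothetical: it stands for one call site in generated code, and it remembers the last receiver class together with the method that was found for it. The `fill` function here takes the call site as an extra parameter, which the original does not.

```python
class CallSite:
    """One call site `receiver.<name>(arg)` with a monomorphic inline cache."""

    def __init__(self, name):
        self.name = name
        self.cached_class = None
        self.cached_target = None
        self.hits = 0
        self.misses = 0

    def call(self, receiver, arg):
        klass = type(receiver)
        if klass is self.cached_class:          # guard: same class as before?
            self.hits += 1
            return self.cached_target(receiver, arg)
        # Miss: perform the expensive lookup and overwrite the cache.
        self.misses += 1
        target = getattr(klass, self.name)
        self.cached_class, self.cached_target = klass, target
        return target(receiver, arg)


class Num:
    def __init__(self, n):
        self.n = n

    def add(self, num):
        self.n += num


def fill(site, receiver, b, e, s):
    for i in range(b, e, s):
        site.call(receiver, i)
```

Running `fill(CallSite("add"), Num(3), 1, 10, 1)` performs nine calls: the first one misses and fills the cache, the remaining eight only pay for the class comparison.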

Inlining on the Method

•  We can also speculate on the method name, instead of doing it on the calling site:

fill(set, b, e, s):
    for i in range(b, e, s):
        __f_add(set, i)

__f_add(o, e):
    if isinstance(o, Num):
        o.n += e
    else:
        f = lookup(o, "add")
        f(o, e)

1)  Is there any advantage to this approach, when compared to inlining at the call site?

2)  Is there any disadvantage?

3)  Which one is likely to change more often?

For comparison, inlining at the call site:

fill(set, b, e, s):
    for i in range(b, e, s):
        if isinstance(set, Num):
            set.n += i
        else:
            add = lookup(set, "add")
            add(set, i)

Polymorphic Calls

•  If the target of a call changes during the execution of the program, then we have a polymorphic call.
•  A monomorphic inline cache would have to be invalidated, and we would fall back into the expensive lookup.

>>> l = [Num(1), Set(1), Num(2), Set(2), Num(3), Set(3)]
>>> for o in l:
...     o.add(2)
...

Is there anything we could do to optimize this case?

Polymorphic Calls

fill(set, b, e, s):
    for i in range(b, e, s):
        __f_add(set, i)

__f_add(o, e):
    if isinstance(o, Num):
        o.n += e
    elif isinstance(o, Set):
        (index, bit) = getIndex(e)
        o.vector[index] |= bit
    else:
        f = lookup(o, "add")
        f(o, e)

Would it not be better just to have the code below?

for i in range(b, e, s):
    lookup(set, "add")
    …
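A polymorphic inline cache generalizes the chain of `isinstance` tests above into a small table of guards. The sketch below assumes a hypothetical bound `MAX_ENTRIES` and two hypothetical receiver classes, `Num` and `Tally`; it is not the data structure of any particular engine.

```python
MAX_ENTRIES = 4  # hypothetical bound on the number of cached receiver classes


class PolymorphicCallSite:
    """A call site that remembers several (class, target) pairs."""

    def __init__(self, name):
        self.name = name
        self.entries = []      # checked in order, like a chain of type guards
        self.misses = 0

    def call(self, receiver, *args):
        klass = type(receiver)
        for cached_class, target in self.entries:
            if klass is cached_class:
                return target(receiver, *args)
        # No guard matched: take the slow path and extend the cache.
        self.misses += 1
        target = getattr(klass, self.name)
        if len(self.entries) < MAX_ENTRIES:
            self.entries.append((klass, target))
        return target(receiver, *args)


class Num:
    def __init__(self, n):
        self.n = n

    def add(self, num):
        self.n += num


class Tally:
    """Hypothetical second receiver type with its own `add`."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)
```

Calling the site on an alternating list of `Num` and `Tally` objects misses only twice, once per class. The cost of a hit grows with the position of the matching guard, which is consistent with hits becoming more expensive as the number of types grows.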

The Speculative Nature of Inline Caching

•  Python (as well as JavaScript, Ruby, Lua and other very dynamic languages) allows the user to add or remove methods from an object.
•  If such changes in the layout of an object happen, then the representation of that object must be recompiled. In this case, we need to update the inline cache.

from Set import INT_BITS, getIndex, Set

def errorAdd(self, element):
    if (element > self.capacity):
        raise IndexError(str(element) + " is out of range.")
    else:
        (index, bit) = getIndex(element)
        self.vector[index] |= bit
        print(element, "added successfully!")

Set.add = errorAdd
s = Set(60)
s.add(59)
s.remove(59)
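One simple way to keep a cache honest when methods are replaced is to stamp every cache entry with a global version number. The sketch below is an assumption about how this could be done, not a description of a real engine: `patch_method`, `_epoch`, `CallSite`, `Counter` and `checked_add` are all hypothetical names, and methods must be replaced through `patch_method` for the invalidation to happen.

```python
_epoch = 0  # hypothetical global counter, bumped whenever a class changes


def patch_method(klass, name, function):
    """Replace a method and invalidate every inline cache at once."""
    global _epoch
    setattr(klass, name, function)
    _epoch += 1


class CallSite:
    """Monomorphic inline cache whose guard also checks the epoch."""

    def __init__(self, name):
        self.name = name
        self.cached_class = None
        self.cached_target = None
        self.cached_epoch = -1

    def call(self, receiver, *args):
        klass = type(receiver)
        if klass is self.cached_class and self.cached_epoch == _epoch:
            return self.cached_target(receiver, *args)
        # Stale or empty cache: look the method up again and re-stamp it.
        target = getattr(klass, self.name)
        self.cached_class, self.cached_target = klass, target
        self.cached_epoch = _epoch
        return target(receiver, *args)


class Counter:
    def add(self, element):
        return "plain"


def checked_add(self, element):
    return "checked"
```

After `patch_method(Counter, "add", checked_add)`, the next call through the same `CallSite` fails the epoch test and picks up the new method. A single global epoch is coarse, since it flushes caches that were not affected; finer-grained schemes trade memory for fewer spurious misses.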

The Benefits of the Inline Cache

•  Monomorphic inline cache hit:
– 10 instructions
•  Polymorphic inline cache hit:
– 35 instructions if there are 10 types
– 60 instructions if there are 20 types
•  Inline cache miss: 1,000 – 4,000 instructions.

These numbers have been obtained by Ahn et al. for JavaScript, in the Chrome V8 compiler♤.

♤: Improving JavaScript Performance by Deconstructing the Type System, PLDI (2014)

Which factors could justify these numbers?


PARAMETER SPECIALIZATION

Functions are often called with the same arguments

•  Costa et al.♧ have instrumented the Mozilla Firefox browser, and have used it to navigate through the 100 most visited webpages according to the Alexa index♤.
•  They have also performed these tests on three popular benchmark suites:
– Sunspider 1.0, V8 6.0 and Kraken 1.1

♤: http://www.alexa.com
♧: Just-in-Time Value Specialization (CGO 2013)

Empirical finding: most of the JavaScript functions are always called with the same arguments. This observation has been made over a corpus of 26,000 functions.

Many Functions are Called only Once!

[Figure: histogram of how often each function is called.]

Almost 50% of the functions have been called only once during a mock user session.

Most Functions are Called with the Same Arguments

[Figure: histogram of how often each function is called with the same set of arguments.]

Almost 60% of the functions are called with only one set of arguments throughout the entire web session.

A Similar Pattern in the Benchmarks

[Figure: six histograms in two rows of three panels. Each row has one panel for Sunspider 1.0, one for V8 version 6 and one for Kraken 1.1.]

The three upper histograms show how often each function is called.

The three lower histograms show how often each function is called with the same set of arguments.

So, if it is like this…

•  Let's assume that the function is always called with the same arguments. We generate the best code specific to those arguments. If the function is called with different arguments, then we re-compile it, this time using a more generic approach.

function sum(N) {
  if (typeof N != 'number')
    return 0;
  var s = 0;
  for (var i = 0; i < N; i++) {
    if (N % 3 == 0)
      s += i*N;
    else
      s += i;
  }
  return s;
}

If we assume that N = 59, then we can produce this specialized code:

function sum() {
  var s = 0;
  for (var i = 0; i < 59; i++) {
    s += i;
  }
  return s;
}
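The specialize-then-fall-back policy can be mimicked in Python with closures standing in for generated code. Everything below is a sketch under that assumption: `sum_generic` is a port of the generic function, while `specialize_sum` and `SpecializingCache` are hypothetical helpers that play the role of the JIT and of the guard on the argument.

```python
def sum_generic(n):
    """Port of the generic function: works for any argument."""
    if not isinstance(n, int):
        return 0
    s = 0
    for i in range(n):
        if n % 3 == 0:
            s += i * n
        else:
            s += i
    return s


def specialize_sum(n):
    """Build code that is only valid for this particular integer n."""
    # The type test and the test `n % 3 == 0` are resolved once, here.
    if n % 3 == 0:
        def specialized():
            s = 0
            for i in range(n):
                s += i * n
            return s
    else:
        def specialized():
            s = 0
            for i in range(n):
                s += i
            return s
    return specialized


class SpecializingCache:
    """Speculates on the first integer argument; gives up after one failure."""

    def __init__(self):
        self.seen_arg = None
        self.code = None
        self.deoptimized = False

    def call(self, n):
        if self.deoptimized or not isinstance(n, int):
            return sum_generic(n)
        if self.code is None:                   # first call: speculate on n
            self.seen_arg, self.code = n, specialize_sum(n)
        if type(n) is type(self.seen_arg) and n == self.seen_arg:   # guard
            return self.code()
        self.deoptimized = True                 # wrong guess: go generic
        return sum_generic(n)
```

With `cache = SpecializingCache()`, repeated calls to `cache.call(59)` run the specialized loop. The first call with another value, such as `cache.call(6)`, fails the guard and every later call goes through `sum_generic`.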

A Running Example – Linear Similarity Search

function closest(v, q, n) {
  if (n == 0) {
    throw "Error";
  } else {
    var i = 0;
    var d = 2147483647;
    while (i < n) {
      var nd = Math.abs(v[i] - q);
      if (nd <= d)
        d = nd;
      i++;
    }
    return d;
  }
}

Given a vector v of size n, holding integers, find the shortest difference d from any element in v to a parameter q.

We must iterate through v, looking for the smallest difference |v[i] – q|. This is an O(n) algorithm.

What could we do if we wanted to generate code that is specific to a given tuple (v, q, n)?

The Control Flow Graph of the Example

[Figure: the control flow graph of closest. Its basic blocks are listed below, one block per line.]

Function entry point: v = param[0]; q = param[1]; n = param[2]; if (n == 0) goto L1
On stack replacement: v = param[0]; q = param[1]; n = param[2]; i3 = stack[0]; d4 = stack[0]
L1: throw "Error"
L2: i0 = 0; d0 = 2147483647
L3: i1 = φ(i0, i2, i3); d1 = φ(d0, d3, d4); if (i1 < n) goto L5
L4: return d1
L5: t0 = 4 * i; t1 = v[t0]; inbounds(t1, n) goto L8
L7: nd = abs(t1, q); if (nd > d1) goto L9
L6: d2 = nd
L9: d3 = φ(d1, d2); i2 = i1 + 1; goto L3
L8: throw BoundsErr

The CFG of a function JIT-compiled during interpretation has two entry points. Why?


This CFG has two entries. The first marks the ordinary starting point of the function. The second exists if we compiled this binary while it was being interpreted. This entry is the point where the interpreter was when the JIT was invoked.

Identifying the Parameters

Because we have two entry points, there are two places from where we can read arguments.

To find the values of these arguments, we just ask the interpreter. We can ask this question by inspecting the interpreter's stack at the very moment when the JIT compiler was invoked.

[Figure: the interpreter's stack and heap. On the stack: v = load[0] (a reference into the heap), q = 42, n = 100, i = 40, d = 2147483647.]

Function entry point: v = load[0]; q = 42; n = 100; if (n == 0) goto L1
On stack replacement: v = load[0]; q = 42; n = 100; i3 = 40; d4 = 2147483647
L1: throw "Error"
L2: i0 = 0; d0 = 2147483647
L3: i1 = φ(i0, i2, i3); d1 = φ(d0, d3, d4); if (i1 < n) goto L5
L4: return d1
L5: t0 = 4 * i; t1 = v[t0]; inbounds(t1, n) goto L8
L7: nd = abs(t1, q); if (nd > d1) goto L9
L6: d2 = nd
L9: d3 = φ(d1, d2); i2 = i1 + 1; goto L3
L8: throw BoundsErr

Dead Code Elimination (Dead Code Avoidance?)

If we assume that the function will not be called again, we do not need to generate the function's entry block.

We can also remove any other blocks that are not reachable from the "On-Stack-Replacement" block.

Function entry point: v = load[0]; q = 42; n = 100; if (n == 0) goto L1
On stack replacement: v = load[0]; q = 42; n = 100; i3 = 40; d4 = 2147483647
L1: throw "Error"
L2: i0 = 0; d0 = 2147483647
L3: i1 = φ(i0, i2, i3); d1 = φ(d0, d3, d4); if (i1 < n) goto L5
L4: return d1
L5: t0 = 4 * i; t1 = v[t0]; inbounds(t1, n) goto L8
L7: nd = abs(t1, q); if (nd > d1) goto L9
L6: d2 = nd
L9: d3 = φ(d1, d2); i2 = i1 + 1; goto L3
L8: throw BoundsErr

Array Bounds Checking Elimination

In JavaScript, every array access is guarded against out-of-bounds accesses. We can eliminate some of these guards.

Let's just check the pattern i = start; i < n; i++. This pattern is easy to spot, and it already catches many induction variables. Therefore, if we have a[i], and n ≤ a.length, then we can eliminate the guard.

Before:

On stack replacement: v = load[0]; q = 42; n = 100; i3 = 40; d4 = 2147483647
L3: i1 = φ(i2, i3); d1 = φ(d3, d4); if (i1 < n) goto L5
L4: return d3
L5: t0 = 4 * i; t1 = v[t0]; inbounds(t1, n) goto L8
L7: nd = abs(t1, q); if (nd > d1) goto L9
L6: d2 = nd
L9: d3 = φ(d1, d2); i2 = i1 + 1; goto L3
L8: throw BoundsErr

After:

On stack replacement: v = load[0]; q = 42; n = 100; i3 = 40; d4 = 2147483647
L3: i1 = φ(i2, i3); d1 = φ(d3, d4); if (i1 < n) goto L7
L4: return d3
L7: t0 = 4 * i; t1 = v[t0]; nd = abs(t1, q); if (nd > d1) goto L9
L6: d2 = nd
L9: d3 = φ(d1, d2); i2 = i1 + 1; goto L3

Loop Inversion

[Figure: the control flow graph of the running example, entered through the on-stack-replacement block, with the loop test at the top of the loop in block L3.]

Loop inversion consists in changing a while loop into a do-while:

while (cond) { ... }

if (cond) { do { … } while (cond); }

1)  Why do we need the surrounding conditional?

2)  Which information do we need to eliminate this conditional?

Loop Inversion

while (i < n) {
  var nd = Math.abs(v[i] - q);
  if (nd <= d)
    d = nd;
  i++;
}

if (i < n) {
  do {
    var nd = Math.abs(v[i] - q);
    if (nd <= d)
      d = nd;
    i++;
  } while (i < n);
}

The standard loop inversion transformation requires us to surround the new repeat loop with a conditional. However, we can evaluate this conditional if we know the values of the variables that are being tested, in this case, i and n. Hence, we can avoid the cost of creating this code.

Otherwise, if we generate code for the conditional, then a subsequent sparse conditional constant propagation phase might still be able to remove the surrounding 'if'.

Constant Propagation + Dead Code Elimination

We close our suite of optimizations with constant propagation, followed by a new dead-code-elimination phase.

This is possibly the optimization that gives us the largest profit.

It is often the case that we can eliminate some conditional tests too, as we can evaluate the conditions at code generation time.

Before:

On stack replacement: v = load[0]; q = 42; n = 100; i3 = 40; d4 = 2147483647
L3: i1 = φ(i2, i3); d1 = φ(d3, d4); if (i1 < n) goto L7
L4: return d3
L7: t0 = 4 * i; t1 = v[t0]; nd = abs(t1, q); if (nd > d1) goto L9
L6: d2 = nd
L9: d3 = φ(d1, d2); i2 = i1 + 1; goto L3

After:

On stack replacement: v = load[0]; q = 42; n = 100; i3 = 40; d4 = 2147483647
L3: i1 = φ(i2, i3); d1 = φ(d3, d4)
L7: t0 = 4 * i; t1 = v[t0]; nd = abs(t1, q); if (nd > d1) goto L9
L6: d2 = nd
L9: d3 = φ(d1, d2); i2 = i1 + 1; if (i2 >= n) goto L4
L4: return d3

The Specialized Code

•  The specialized code is much shorter than the original code.
•  This code size reduction also improves compilation time. In particular, it speeds up register allocation.

Specialized code:

On stack replacement: v = load[0]; i3 = 40; d4 = 2147483647
L3: i1 = φ(i2, i3); d1 = φ(d3, d4)
L7: t0 = 4 * i; t1 = v[t0]; nd = abs(t1, 42); if (nd > d1) goto L9
L6: d2 = nd
L9: d3 = φ(d1, d2); i2 = i1 + 1; if (i2 >= 100) goto L4
L4: return d3

Original code:

Function entry point: v = param[0]; q = param[1]; n = param[2]; if (n == 0) goto L1
On stack replacement: v = param[0]; q = param[1]; n = param[2]; i3 = stack[0]; d4 = stack[0]
L1: throw "Error"
L2: i0 = 0; d0 = 2147483647
L3: i1 = φ(i0, i2, i3); d1 = φ(d0, d3, d4); if (i1 < n) goto L5
L4: return d1
L5: t0 = 4 * i; t1 = v[t0]; inbounds(t1, n) goto L8
L7: nd = abs(t1, q); if (nd > d1) goto L9
L6: d2 = nd
L9: d3 = φ(d1, d2); i2 = i1 + 1; goto L3
L8: throw BoundsErr

Are there other optimizations that could be applied in this context of runtime specialization?

Function Call Inlining

function inc(x) {
  return x + 1;
}

function map(s, b, n, f) {
  var i = b;
  while (i < n) {
    s[i] = f(s[i]);
    i++;
  }
  return s;
}

print (map(new Array(1, 2, 3, 4, 5), 2, 5, inc));

1)  Which call could we inline in this example?

2)  What would the resulting code look like?

Function Call Inlining

After inlining the call to inc, the function map becomes:

function map(s, b, n, f) {
  var i = b;
  while (i < n) {
    s[i] = s[i] + 1;
    i++;
  }
  return s;
}

We can also perform loop unrolling in this example. What would the resulting code be?

Loop Unrolling

function map(s, b, n, f) {
  var i = b;
  while (i < n) {
    s[i] = s[i] + 1;
    i++;
  }
  return s;
}
print (map(new Array(1, 2, 3, 4, 5), 2, 5, inc));

function map(s, b, n, f) {
  s[2] = s[2] + 1;
  s[3] = s[3] + 1;
  s[4] = s[4] + 1;
  return s;
}

Loop unrolling, in this case, is very effective, as we know the initial and final limits of the loop. In this case, we do not need to insert prologue and epilogue code to preserve the semantics of the program.
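Because the limits are known when the code is generated, the unrolled body can be produced mechanically. The Python sketch below imitates this by building source text and compiling it with `exec`. The helper names `unrolled_map_source` and `compile_unrolled` are hypothetical, and a real JIT would emit machine code rather than source text.

```python
def unrolled_map_source(b, n):
    """Return the source of a `map` specialized for the known limits b and n."""
    lines = ["def map_unrolled(s):"]
    for i in range(b, n):
        # One straight-line statement per iteration: no counter, no test.
        lines.append(f"    s[{i}] = s[{i}] + 1")
    lines.append("    return s")
    return "\n".join(lines)


def compile_unrolled(b, n):
    """Compile the specialized source and return the resulting function."""
    namespace = {}
    exec(unrolled_map_source(b, n), namespace)
    return namespace["map_unrolled"]


map_2_5 = compile_unrolled(2, 5)
```

Here `map_2_5([1, 2, 3, 4, 5])` updates positions 2, 3 and 4 only. The generated function is valid solely for `b = 2` and `n = 5`, so a caller would still need a guard on those two arguments before using it.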


TRACE COMPILATION

What is a JIT trace compiler?

•  A trace-based JIT compiler translates only the most executed paths in the program's control flow to machine code.
•  A trace is a linear sequence of code that represents a hot path in the program.
•  Two or more traces can be combined into a tree.
•  Execution alternates between traces and interpreter.

What are the advantages and disadvantages of trace compilation over traditional method compilation?

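The anatomy of a trace compiler

•  TraceMonkey is the trace-based JIT compiler used in the Mozilla Firefox browser.

[Figure: the compilation pipeline. Data: file.js, AST, Bytecodes, Trace engine, LIR, x86. Components: jsparser, jsemitter, jsinterpreter, JIT. The components are grouped under SpiderMonkey and nanojit.]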

From source to bytecodes

function foo(n) {
  var sum = 0;
  for (i = 0; i < n; i++) {
    sum += i;
  }
  return sum;
}

[Figure: jsparser produces the AST, and jsemitter produces the bytecodes below.]

00: getname n
02: setlocal 0
06: zero
07: setlocal 1
11: zero
12: setlocal 2
16: goto 35 (19)
19: trace
20: getlocal 1
23: getlocal 2
26: add
27: setlocal 1
31: localinc 2
35: getlocal 2
38: getlocal 0
41: lt
42: ifne 19 (-23)
45: getlocal 1
48: return
49: stop

Can you see the correspondence between source code and bytecodes?

The trace engine kicks in

•  TraceMonkey interprets the bytecodes.
•  Once a loop is found, it may decide to ask Nanojit to transform it into machine code (e.g., x86, ARM).
– Nanojit reads LIR and produces x86.
•  Hence, TraceMonkey must convert this trace of bytecodes into LIR.

From bytecodes to LIR

Nanojit LIR:

L: load "i" %r1
   load "sum" %r2
   add %r1 %r2 %r1
   %p0 = ovf()
   bra %p0 Exit1
   store %r1 "sum"
   inc %r2
   store %r2 "i"
   %p0 = ovf()
   bra %p0 Exit2
   load "i" %r0
   load "n" %r1
   lt %p0 %r0 %r1
   bne %p0 L

Why do we have overflow tests?

•  Many scripting languages represent numbers as floating-point values.
– Arithmetic operations are not very efficient.
•  The compiler sometimes is able to infer that these numbers can be used as integers.
– But floating-point numbers can hold larger values than integers.
– This is another example of speculative optimization.
– Thus, every arithmetic operation that might cause an overflow must be guarded by a test. If the test fails, then the runtime engine must change the number's type back to floating-point.
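The guard can be modeled in Python, where integers never overflow, by checking the result against the 32-bit range explicitly. The functions `add_int32` and `speculative_add` are hypothetical names. The float fallback stands for the side exit that changes the number's type back to floating-point.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def add_int32(a, b):
    """Add two 32-bit integers; report whether the result still fits."""
    result = a + b
    overflowed = not (INT32_MIN <= result <= INT32_MAX)
    return result, overflowed


def speculative_add(a, b):
    """Integer fast path with an overflow test; floating-point slow path."""
    result, overflowed = add_int32(a, b)
    if overflowed:
        # Side exit: the speculation "this number is an integer" failed.
        return float(a) + float(b)
    return result
```

`speculative_add(1, 2)` stays on the integer path and returns the integer `3`, while `speculative_add(INT32_MAX, 1)` leaves it and returns a float. The test runs on every addition, even those that can never overflow.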

From LIR to assembly

The overflow tests are also translated into machine code.

x86 Assembly:

L: movl -32(%ebp), %eax
   movl %eax, -20(%ebp)
   movl -28(%ebp), %eax
   movl %eax, -16(%ebp)
   movl -20(%ebp), %edx
   leal -16(%ebp), %eax
   addl %edx, (%eax)
   call _ovf
   testl %eax, %eax
   jne Exit1
   movl -16(%ebp), %eax
   movl %eax, -28(%ebp)
   leal -20(%ebp), %eax
   incl (%eax)
   call _ovf
   testl %eax, %eax
   jne Exit2
   movl -24(%ebp), %eax
   movl %eax, -12(%ebp)
   movl -20(%ebp), %eax
   cmpl -12(%ebp), %eax
   jl L

Can you come up with an optimization to eliminate some of the overflow checks?

How to eliminate the redundant tests

•  We use range analysis:
– Find the range of integer values that a variable might hold during the execution of the trace.

function foo(n) {
  var sum = 0;
  for (i = 0; i < n; i++) {
    sum += i;
  }
  return sum;
}

Example: if we know that n is 10, and i is always less than n, then we will never have an overflow if we add 1 to i.

Cheating at runtime

•  A static analysis must be very conservative: if we do not know for sure the value of n, then we must assume that it may be anywhere in [-∞, +∞].
•  However, we are not a static analysis!
•  We are compiling at runtime!
•  To know the value of n, just ask the interpreter.

How the algorithm works

•  Create a constraint graph.
– While the trace is translated to LIR.
•  Propagate range intervals.
– Before sending the LIR to Nanojit.
– Using infinite precision arithmetic.
•  Eliminate tests whenever it is safe to do so.
– We tell Nanojit that code for some overflow tests should not be produced.

The constraint graph

•  We have four categories of vertices:
– Name: Nx
– Arithmetic: inc, dec, add, sub, mul
– Relational: lt, leq, gt, geq, eq
– Assignment

Building the constraint graph

•  We start building the constraint graph once TraceMonkey starts recording a trace.
•  TraceMonkey starts at the branch instruction, which is the first instruction visited in the loop.
– Although it is at the end of the trace.

19: trace
20: getlocal sum
23: getlocal i
26: add
27: setlocal sum
31: localinc i
35: getlocal n
38: getlocal i
41: lt
42: ifne 19 (-23)

In terms of code generation, can you recognize the pattern of bytecodes created for the test if n < i goto L?

We check the interpreter's stack to find out that n is 10.

Constraint graph so far: n0 = [10, 10]

i started holding 0, but at this point, its value is already 1.

Constraint graph so far: n0 = [10, 10]; i0 = [1, 1]

Why do we not initialize i with 0 in our constraint graph?

Comparisons are great! We can learn new information about variables. Now we know that i is less than 10, so we rename it to i1. We also rename n.

Constraint graph so far: n0 = [10, 10]; i0 = [1, 1]; the node lt takes n0 and i0 and defines n1 and i1

We are renaming after conditionals. Which program representation are we creating dynamically?

The current value of sum on the interpreter's stack is 1.

Constraint graph so far: n0 = [10, 10]; i0 = [1, 1]; lt defines n1 and i1; sum0 = [1, 1]

For the sum, we get the last version of variable i, which is i1.

Constraint graph so far: n0 = [10, 10]; i0 = [1, 1]; lt defines n1 and i1; sum0 = [1, 1]; the node add takes sum0 and i1 and defines sum1

Why do we pair up sum0 with i1, and not with i0?

The last version of variable i is still i1, so we use it as a parameter of the increment.

Constraint graph so far: n0 = [10, 10]; i0 = [1, 1]; lt defines n1 and i1; sum0 = [1, 1]; add takes sum0 and i1 and defines sum1; the node inc takes i1 and defines i2

Some variables, like n, have not been updated inside the trace. We know that they are constants.

The comparisons allow us to infer bounds on the variables that are updated in the trace.

Other variables, like sum, have been updated inside the trace.

The next phase of our algorithm is the propagation of range intervals.

We start by assigning conservative bounds, i.e., [-∞, +∞], to the ranges of variables updated inside the trace.

Constraint graph: n0 = [10, 10]; i0 = [-∞, +∞]; sum0 = [-∞, +∞]; n1, i1, sum1 and i2 have no range yet

We propagate ranges through arithmetic and relational nodes in topological order.

No need to sort the graph topologically. Follow the order in which nodes are created.

Constraint graph: n0 = [10, 10]; i0 = [-∞, +∞]; sum0 = [-∞, +∞]; n1 = [10, 10]; i1 = [-∞, 9]

[Figure: the trace from bytecode 19 to bytecode 42, annotated with the order (1, 2, 3) in which the nodes are created.]

Why does the order of creation of nodes give us the topological ordering of the constraint graph?

We are doing abstract interpretation. We need an abstract semantics for every operation in our constraint graph. As an example, we know that [-∞, +∞] + [-∞, 9] = [-∞, +∞].

Constraint graph: n0 = [10, 10]; i0 = [-∞, +∞]; sum0 = [-∞, +∞]; n1 = [10, 10]; i1 = [-∞, 9]; sum1 = [-∞, +∞]
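The abstract semantics of the nodes in this example fits in two small functions over pairs (lower, upper). The names `add_ranges` and `less_than` are hypothetical, and the sketch covers only the operations that appear in this trace.

```python
import math

INF = math.inf


def add_ranges(a, b):
    """Abstract addition: [l1, u1] + [l2, u2] = [l1 + l2, u1 + u2]."""
    return (a[0] + b[0], a[1] + b[1])


def less_than(x, y):
    """Range of the integer x on the path where the test x < y succeeds."""
    return (x[0], min(x[1], y[1] - 1))


# The propagation on the example trace, in the order of node creation.
n0 = (10, 10)
i0 = (-INF, INF)
sum0 = (-INF, INF)
i1 = less_than(i0, n0)          # lt node
sum1 = add_ranges(sum0, i1)     # add node
i2 = add_ranges(i1, (1, 1))     # inc node: add the constant range [1, 1]
```

The three results reproduce the ranges of the constraint graph: `i1` is [-∞, 9], `sum1` is [-∞, +∞] and `i2` is [-∞, 10].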

After range propagation we can check which overflow tests are really necessary.

Constraint graph: n0 = [10, 10]; i0 = [-∞, +∞]; sum0 = [-∞, +∞]; n1 = [10, 10]; i1 = [-∞, 9]; sum1 = [-∞, +∞]; i2 = [-∞, 10]

1)  How many overflow checks do we have in this constraint graph?

2)  Is there any overflow check that is redundant?

After range propagation we can check which overflow tests are really necessary.

Constraint graph: n0 = [10, 10]; i0 = [-∞, +∞]; sum0 = [-∞, +∞]; n1 = [10, 10]; i1 = [-∞, 9]; sum1 = [-∞, +∞]; i2 = [-∞, 10]

This increment operation will never cause an overflow, for it receives at most the integer 9.

The add operation might cause an overflow, for one of the inputs (sum) might be very large.
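The final decision can be phrased as a predicate on the input ranges of an addition. The sketch below makes one assumption that the slides do not state explicitly: a value that is currently stored as an integer already fits in 32 bits, so an infinite bound is first clipped to the int32 limits. The names `clamp_to_int32` and `needs_overflow_test` are hypothetical.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def clamp_to_int32(r):
    """Clip a range to the values that a 32-bit integer can actually hold."""
    return (max(r[0], INT32_MIN), min(r[1], INT32_MAX))


def needs_overflow_test(left, right):
    """True if left + right may leave the int32 range for some inputs."""
    l, r = clamp_to_int32(left), clamp_to_int32(right)
    low, high = l[0] + r[0], l[1] + r[1]
    return low < INT32_MIN or high > INT32_MAX


i1 = (-math.inf, 9)
sum0 = (-math.inf, math.inf)
inc_needs_test = needs_overflow_test(i1, (1, 1))   # the inc node
add_needs_test = needs_overflow_test(sum0, i1)     # the add node
```

Under this assumption `inc_needs_test` is `False` and `add_needs_test` is `True`, matching the two observations above: the test after the increment can be dropped, the one after the add must stay.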


A Bit of History

•  A comprehensive survey on just-in-time compilation is given by John Aycock.
•  Class Inference was proposed by Chambers and Ungar.
•  The idea of Parameter Specialization was proposed by Costa et al.
•  The overflow elimination technique was taken from a paper by Sol et al.

•  Chambers, C., and Ungar, D., "Customization: optimizing compiler technology for SELF, a dynamically-typed object-oriented programming language", SIGPLAN, p 146-160 (1989)
•  Aycock, J., "A Brief History of Just-In-Time", ACM Computing Surveys, p 97-113 (2003)
•  Sousa, R., Guillon, C., Pereira, F. and Bigonha, M., "Dynamic Elimination of Overflow Tests in a Trace Compiler", CC, p 2-21 (2011)
•  Costa, I., Oliveira, P., Santos, H., and Pereira, F., "Just-in-Time Value Specialization", CGO (2013)