53
Rethinking Compilers in the Rise of Machine Learning and AI Computer Science, North Carolina State University Xipeng Shen

Rethinking Compilers in the Rise of Machine Learning and AI · NIPS’2012 SIAM’2010 ICML’2003 ICDE’15 IJCNN’11 VisionInterface’10 SSDM’10. 11 • Conceptual connections

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Rethinking Compilers in the Rise of Machine Learning and AI

ComputerScience,NorthCarolinaStateUniversity

XipengShen

2

ThejourneyofasnowflakeBorn of a raindrop, laced by the breeze, a snowflake forms. Spinning through space, dancing through trees; “I’m not water anymore; I’m a beautiful snowflake” Proudly, it announces its new identity, while falling into a lake, melt. “Welcome home, child”, gently says the water.

TraditionalOptimizingCompilers

3

Derivedprogrampropertiesdef-use;dependence;controlflow;…

ACcompilerLow-levelcodetransformationslooptransformationscommonsubexpressioneliminationvectorization/parallelization…

understandsCconstructs;hardcodedbasicknowledge(e.g.linearalgebraicidentities);hardcodedoptimizationheuristics

ACprogram.

ThesisofthisTalk

• High-levelsemanticiskeyforunleashingthehiddenpowerofcompilers

• AIiskeyforunleashingthehiddenpowerofhigh-levelsemantic-drivencompilations

4

High-LevelSemantic

• Beyondthesemanticoftheprimitiveconstructsinageneralpurposelanguage

• Importancehasbeenwellperceived• NumerousDomainSpecificLanguages(DSL)

• Potentialforgeneralizingcompilertechniques• Generalizedstrengthreduction• Generalizedloopredundancyremoval

5

GeneralizedStrengthReduction

6

[PLDI’17,ICDM’17,ICDE’17,VLDB’15,ICML’15]

YufeiDing LinNing HuiGuan

(w/MadanMusuvathi&ToddMytkowicz

GuoyangChen RandallPittman

7

Strength Reduction

b/2à b>>1Traditional: only instruction level.

8

a

d b

TriangularInequality:a-b≤d≤a+b

Reduce expensive distance calculations to bounds

calculations

9

9≤ d≤11

10 1

C1

C2X

K-Means

5

C’2

Distances: O(K) V.S. O(N * K)

KMeans

10

KNN

ICML’2015

NIPS’2012

SIAM’2010

ICML’2003

ICDE’15

IJCNN’11

VisionInterface’10

SSDM’10

11

• Conceptual connections between TI & strength reduction• Extend TI theory

– Angle triangle inequality (ATI) – Its relations with traditional TI based on edge (ETI)

• Create Generalized Strength Reduction as a compiler technique– For distance & dot product– Offer systematic deployment of TI– Enable automatic algorithm optimizations

• Speedups: tens to hundreds of times

Our Generalization

ATI

• Vector dot product

12

= cos

+ - ≧≧

Holds in spaces of any (>1) dimensions!ATI always gives tighter distance bounds than TI does!

(detailedproofinPLDI’17paper).

AngleTri.Ineq.(ATI)

13

• Example: Restricted Boltzmann Machine (RBM).

Usage of ATI

5

VisibleLayer:v

HiddenLayer:h

h1 h2 h3 hm

v1 vnv2

W(1,m)

W(2,m) W(n,m)

14

Usage of ATI

5

VisibleLayer:v

HiddenLayer:h

h1 h2 h3 hm

v1 vnv2

W(1,m)

W(2,m) W(n,m)

• visible -> hidden – Conditional probability.

sigmoid function.

− Set.

r:arandomnumber[0,1].

15

• When and how bounds can be used as a replacement?

If , then = 1,

else if , then = 0.

Usage of ATI

Inbothcases,boundscanhelpavoidcomputing

16

Complexities of Applying ETI + ATI• ATI V.S. ETI – Tightness is not the only factor relevant to the benefits of TI-

based strength reduction. • Many variants of applying ETI and ATI. – Tradeoff between the cost and quality of bounds. – Multi-granularity bounds. – Dynamic adaption.

Solution:GuidedTIAdaptationtochoosethebestTIoptimization

onthefly

(detailsinPLDI’17&VLDB’15).

• Over default implementation .

17

Overall Speedup (No quality loss)

> 93%, up to 99%

84% 91%, 94%

18

Baseline:ClassicK-means

(16GB,8-coreIntelIvyBridge)Speedu

p(X)

TOP Yinyang K-Means

CodelinkinICML’15paper.

ICML’15

OverManualTIOptK-Means(K=1024)

ExampleII:GeneralizedLoopRedundancyRemoval

19

[OOPSLA’17]

YufeiDing

LoopRedundancy

20

for (i=0; i<N; i++){ a[i] = b/c; }

Motivating Example

21

w = w0; while (d > 0.01){ d = 0; for (i = 0; i < M; i++){ d += a[i] + b[i] * w; } w = w - 0.001 * d; }

reduction over loop i

Any Redundant Computations?

Stay the same across for loop, but change across while loop

Stay the same across while loop, but vary across for loop.

Elusivetoexistingtechniques.

22

•  An equivalent form of the example.

w = w0; while (d > 0.01){ A = ∑i a[i]; B = ∑i b[i]; d = A + B * w; w = w - 0.001 * d }

w = w0; while (d > 0.01){ d = 0; for (i = 0; i < M; i++){ d += a[i] + b[i] * w; } w = w - 0.001 * d; }

Motivating Example

O(M*K) O(M+K)

//Kiterations

KeyObservation

23

Natureofsuchhiddenredundancies:Theirdiscoveryandremovalrequirelarge-scopedanalysisandcomputationreorderingatbothexpressionandlooplevels.

GeneralizedLoopRedundancyElimination(GLORE)

24

LERNotation ASetofNovelAlgorithmsOperandfoldingAlternativeformgenerationClosure-basedalgorithmMinimum-unionalgorithmLoopencapsulation……

Enabler TransformerLoopOptimizedLoop

FormulaeFormulae

25

Optimizationthrough GLOREalgorithms

ReasonsfortheAlgorithmDesigns

• Varietyofloopredundancies• Complexmathoperations(e.g.,div,sin,mod)• Loopindexdependence• Manypossibleordersofcomputations• Interplaysbetweenreorderingofexpressionsandthatofloops

26

Ingeneral,findingthebestorderisNP-complete.[Chi-chung+:1997]

EntireWorkFlow

27

f f’ G

g g’ H

loop encapsu.

cat-3 opt(min union alg. &

closure-based alg.)

loop decaps.

cat-4 opt(incremental repres.)

h I

cat-1 & cat-2 opt. (reuse list & groups)

operand abstraction f*

operand concat. h’

Forg preprocess

Falt(operand folding; alt form gen)

upper case: a set of formulae; lower case: a formula; : transformation; : for each member.

forcomplexexpressions

toeaseanalysis

forindexdependence tofindbestorder

prepareforothercategories

findredundancy

Findincrementalcompu.

SeeOOPLSA’17fordetails.

Evaluation:21Benchmarks

28

Real-world problems Batch gradient descent (BGD)) a nonlinear partial differential

equation (PDE). Stencil benchmarks

Dibligbilharm, Diso3X3, Imorph, Noise1, Iyoki, Inevatia, and Dresid.

Programs from prior work LANL POP, mgrid:resid, mgrid:psinv, mgrid:interp, and

mgrid:rpj3. Pluto benchmark suite

ssymm, fuse, priv2, fmri, and ccsd.

Categories3&4(RedundantLoops)

29

ReducedComputationalComplexity

Nopriormethodsworkonthem.

1.15-1.8Xspeedupsonotherprograms.

Reflection

30

Rightabstractionsplussuitablealgorithmsmaygoalongwayinleveraginghigh-levelsemanticsand

henceadvancingcompilertechnology.

Cantheideabegeneralizedtoabroaderrangeofhigh-levelsemantics?

ThesisofthisTalk

• High-levelsemanticiskeyforunleashingthehiddenpowerofcompilers

• AIiskeyforunleashingthehiddenpowerofhigh-levelsemantic-drivencompilations

31

Whatotherhigh-levelsemantics?

• Alot…• EachlibraryAPIdefinessomehigh-levelsemantics

• Howarecurrentcompilerstreatingthem?• Blackboxes• Barriersforprogramanalysisandoptimizations

32

33

objectAScalaSparkJob{defmain(args:Array[String]){valconf=newSparkConf()conf.setAppName("MyFirstSparkScalaApplication")valctx=newSparkContext(conf)valfile="file:///home/Crimes_-Aug-2015.csv";vallogData=ctx.textFile(file,2)valnumLines=logData.filter(line=>true).count()valvectors=logData.map(…)valkMeansModel=KMeans.train(vectors,2,20)

}}

AbstractionWall.

LotsofPriorEfforts

• Interproceduralanalysis• Limitedbyanalyzability

• Specification-basedsolutions• Leveraginghigh-levelsemanticconveyedbymanualspecifications

• TelescopingLanguages(Kennedyetal.),Broadway(Guyer&Lin),Speckle(Vandevoordeetal.),…

34

TelescopingLanguages

• OnprogramsinSlanguages• 10—143Xspeedups• Ononeprogram,23000Xusingstructureofamatrix

35

[IEEE2005]

“Ifeachdomainlibraryeffectivelydefinesanewlanguage,thentherewillbeanenormousnumberofspecializedhigh-levellanguagesthatwilleachrequireanoptimizingcompiler.”

36

—Kennedyetal.[IEEE2005]

TelescopingLanguages

37

Ageneratorofdomain-specificoptimizers.

LibrarySpecifications

Transformationspecifications

Bottleneckforadoption.

(AdaptedfromKennedy+:IEEE’2005)

KeyBarrier:Knowledge• Theoptimizergeneratorneedstoknow• KnowledgeontheAPIs

• Functional,behavioral,performanceattributesofeachAPI• Relations(e.g.,algebraicequivalence)ofsequencesofAPIs

• Knowledgeonthetransformations• Precondition,effects,andcost

• Difficulties• Difficulttowriteormaintain• Hardtobecorrectorcomplete

38

ExistingEfforts• Manualefforts• Automaticderivationsofcodespecifications• R.Bodiketal.,T.Xieetal.,others• Code,documentations,dynamicobservations.• Statemachines,Graphs,NLP,ML,etc.• Focusedoninvariantsorcodecontract

• Muchmoreneeded• (Algebraic)relationsbetweenAPIs,equivalenceofsequences,context-sensitiveperformanceproperties,datadependences,codetransformations,…

39

OtherOpenChallenges

• Howtobesttransformthelibraryandprogram• Explosionofthesetofabstracttypesandspecializedversions

• Bestwaysandordertoapplytransformations

40

RevisitinLensofModernAI&ML• Automaticdiscoverhigh-levelsemantics• knowledgediscoveryfrommulti-sources• DocumentationsofAPIs&comments• Code,dynamicobservations

• Automaticdiscover&applyhigh-leveltransformations• Learningfromexperience• Planning,reasoning,searching

41

Telescoping Languages

42

Softwarebuildingblockswithhigh-levelsemanticunderstandabletocompilers.

Aprograminscriptinglanguage

High-leveluserintention.

TelGen AnefficientimplementationuponthelibraryHW

Automatic Programming

43

Softwarebuildingblockswithhigh-levelsemanticunderstandabletoautocoder.

High-leveluserintention,intuitivelyexpressed.

High-leveluserintention.

Autocoder Anefficientimplementationofanefficientalgorithm.HW

HolyGrailofOptimizingCompilers

• Afullfreedomtoexplore• Algorithm• Implementation

• Notboundedby• Hardcodedalgorithmopaquetocompilers• Aliasesandothercodecomplexities• Abstractionwalls

44

BenefitstoUsers

• Programmingproductivity• Lesshumaninvolvements:Lesserrorprone• Reliability• Security• Maintainability

• Analyzability-PreservingCompositions

45

Along-timegoal

46

“Automaticprogramminghasbeenagoalofcomputerscienceandartificialintelligencesincethefirstprogrammercamefacetofacewiththedifficultiesofprogramming.”

—from“AutomaticProgramming:MythsandProspects”byCharlesRichandRichardC.Watersin1988

47

Myth1:End-user-orientedautomaticprogrammingsystemsdonotneeddomainknowledge.

“End-user-oriented automatic programming systems must be domain experts. There is no point at which someone who knows nothing about programming communicates directly with someone who knows nothing about the application domain.”

48

Myth2:End-user-oriented,generalpurpose,fullyautomaticprogrammingispossible.

“…such an automatic programming system would have to be expert in every application domain.Unfortunately, artificial intelligence is nowhere near supporting this superhuman level of performance.”

AretheremorehopegiventhenewprogressofAIandML?Canthosedomainknowledgebeautomaticallylearned?

ThreeKeyQuestionstoAnswer

• Whatdoesusersee?• Whatdoesautocoderknow?• Howdoesautocoderwork?

49

AllshouldberevisitedinthelensofmodernAIandML.

AnExample

50

“Naturallanguagesareanattractivechoiceforcommunicationbetweenendusersandanautomaticprogrammingsystem.…Unfortunately,enablingmachinestoconverseinnaturallanguageiswaybeyondthecurrentcapabilitiesofartificialintelligence.”

—from“AutomaticProgramming:MythsandProspects”byCharlesRichandRichardC.Watersin1988

51

AI AutomaticProgramming

[MITAIMemo]v39:“TheNewCompiler”v349,v353,v379,v443,v453,…Lambdapapers. Compiler

Technology

52

Traditionalcompiler/Low-levelmethod

Analysisandoptimizationsbasedonprimitivelanguageconstructs.

High-levelsemantic-drivencompilation

Analysisandoptimizationsatalargerscopeandahigherlevel.

Generatorsofhigh-levelsemanticbasedcompilers/Autocoder

Scalableapproachtoleveraginghigh-levelsemantics.

Abstraction & Generalization

AI, ML, etc.

Summary

FinalMessage

• Beingopen-minded,embracingnewdirections• Trynewperspectivesoncompilerconstructions• Trytopursuesystematicapproachestohigh-levelsemantic-drivencompilations• RatherthanbeingsatisfiedwithanotherDSL

53