Rethinking Compilers in the Rise of Machine Learning and AI
Computer Science, North Carolina State University
Xipeng Shen
The journey of a snowflake
Born of a raindrop, laced by the breeze, a snowflake forms. Spinning through space, dancing through trees; “I’m not water anymore; I’m a beautiful snowflake.” Proudly, it announces its new identity, while falling into a lake, where it melts. “Welcome home, child”, gently says the water.
Traditional Optimizing Compilers
A C program.
A C compiler: understands C constructs; hardcoded basic knowledge (e.g., linear algebraic identities); hardcoded optimization heuristics
Derived program properties: def-use; dependence; control flow; …
Low-level code transformations: loop transformations; common subexpression elimination; vectorization/parallelization; …
Thesis of this Talk
• High-level semantics is key for unleashing the hidden power of compilers
• AI is key for unleashing the hidden power of high-level semantics-driven compilation
High-Level Semantics
• Beyond the semantics of the primitive constructs in a general-purpose language
• Importance has been well perceived
  • Numerous Domain-Specific Languages (DSLs)
• Potential for generalizing compiler techniques
  • Generalized strength reduction
  • Generalized loop redundancy removal
Generalized Strength Reduction
[PLDI’17, ICDM’17, ICDE’17, VLDB’15, ICML’15]
Yufei Ding, Lin Ning, Hui Guan, Guoyang Chen, Randall Pittman
(w/ Madan Musuvathi & Todd Mytkowicz)
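Reduce expensive distance calculations to bounds calculations
[Figure: K-Means example with a point X and centers C1, C2, and C’2. Labeled distances of 10 and 1 give the bound 9 ≤ d ≤ 11; another labeled distance is 5.]
Distances: O(K) vs. O(N * K)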
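The bound in the figure can be sketched in a few lines. This is a minimal illustration rather than the talk's implementation: the helper names `ti_bounds` and `can_skip` are hypothetical, and it assumes the figure's 10 is an old point-to-center distance, 1 is how far that center moved, and 5 is the distance to the current best center.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def ti_bounds(d_x_c, drift):
    """Triangle-inequality bounds on d(x, c') from d(x, c) and the drift d(c, c')."""
    return abs(d_x_c - drift), d_x_c + drift

def can_skip(best_so_far, d_x_c_old, drift):
    """True when the lower bound alone shows the moved center cannot be closer."""
    lower, _ = ti_bounds(d_x_c_old, drift)
    return lower >= best_so_far

# Assumed reading of the figure: old distance 10, center drift 1.
lower, upper = ti_bounds(10.0, 1.0)

# Assumed: the point already has a center at distance 5, so the exact
# distance to the moved center never needs to be computed.
skip = can_skip(5.0, 10.0, 1.0)
```

Only the cheap bound arithmetic runs for the skipped center; the exact `dist` call is needed only when the bounds are inconclusive.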
Our Generalization
• Conceptual connections between the triangle inequality (TI) & strength reduction
• Extend TI theory
  – Angle triangle inequality (ATI)
  – Its relations with traditional TI based on edge (ETI)
• Create Generalized Strength Reduction as a compiler technique
  – For distance & dot product
  – Offer systematic deployment of TI
  – Enable automatic algorithm optimizations
• Speedups: tens to hundreds of times
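Angle Triangle Inequality (ATI)
• Vector dot product: $\vec{a} \cdot \vec{b} = |\vec{a}|\,|\vec{b}| \cos\theta_{ab}$
• ATI: $\theta_{ac} + \theta_{cb} \ge \theta_{ab} \ge |\theta_{ac} - \theta_{cb}|$
Holds in spaces of any (>1) dimensions! ATI always gives tighter distance bounds than TI does! (Detailed proof in PLDI’17 paper.)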
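As a rough sketch of how angle bounds turn into dot-product bounds: if the angles from two vectors a and b to a shared reference vector c are known, the angle between a and b lies between their difference and their sum, and since cosine is decreasing on [0, π] this brackets a · b. The function names below are hypothetical, and the code is an illustration of the inequality, not the procedure from the PLDI’17 paper.

```python
import math
import random

def norm(a):
    return math.sqrt(sum(x * x for x in a))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def angle(a, b):
    """Angle in [0, pi] between two non-zero vectors."""
    c = dot(a, b) / (norm(a) * norm(b))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding error

def ati_dot_bounds(norm_a, norm_b, theta_ac, theta_cb):
    """(lower, upper) bounds on a.b from the angles to a shared reference c."""
    lo_angle = abs(theta_ac - theta_cb)
    hi_angle = min(theta_ac + theta_cb, math.pi)  # an angle never exceeds pi
    scale = norm_a * norm_b
    # cos is decreasing on [0, pi]: the largest angle gives the lower bound.
    return scale * math.cos(hi_angle), scale * math.cos(lo_angle)

# Usage on random 4-dimensional vectors (seeded for repeatability).
random.seed(0)
a, b, c = ([random.uniform(-1, 1) for _ in range(4)] for _ in range(3))
lower, upper = ati_dot_bounds(norm(a), norm(b), angle(a, c), angle(c, b))
exact = dot(a, b)
```

In a real deployment the norms and the angles to c would be cached and reused across many dot products, which is where the savings would come from.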
Usage of ATI
• Example: Restricted Boltzmann Machine (RBM).
[Figure: an RBM with a visible layer v (units v1, v2, …, vn), a hidden layer h (units h1, h2, h3, …, hm), and connection weights such as W(1,m), W(2,m), W(n,m).]
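• visible -> hidden
  – Conditional probability: $P(h_j = 1 \mid v) = \sigma\big(b_j + \sum_i v_i W(i,j)\big)$, where $\sigma$ is the sigmoid function and $b_j$ is the bias of hidden unit j.
  − Set $h_j = 1$ if $r < P(h_j = 1 \mid v)$, else $h_j = 0$, where r is a random number in [0, 1].
• When and how can bounds be used as a replacement?
  If r is below a lower bound of $P(h_j = 1 \mid v)$, then $h_j = 1$,
  else if r is at or above an upper bound of $P(h_j = 1 \mid v)$, then $h_j = 0$.
In both cases, bounds can help avoid computing the dot product.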
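The sampling shortcut can be sketched as follows. This is a minimal illustration under stated assumptions: `sample_hidden` and its parameters are hypothetical names, the bounds on the pre-sigmoid activation are assumed to come from some cheap source such as ATI, and `exact_activation` stands for the expensive dot product plus bias.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(r, act_lower, act_upper, exact_activation):
    """Sample one hidden unit, calling exact_activation() only if needed.

    act_lower and act_upper bound the pre-sigmoid activation. Because the
    sigmoid is monotonically increasing, they also bound the probability.
    """
    if r < sigmoid(act_lower):
        return 1  # r is below even the smallest possible probability
    if r >= sigmoid(act_upper):
        return 0  # r is at or above the largest possible probability
    # Bounds are inconclusive: fall back to the exact computation.
    return 1 if r < sigmoid(exact_activation()) else 0

# Usage: count how often the expensive path is taken.
calls = []

def expensive():
    calls.append(1)
    return 0.3  # assumed exact activation for this toy unit

h_low = sample_hidden(0.1, 0.0, 1.0, expensive)   # decided by the lower bound
h_high = sample_hidden(0.9, 0.0, 1.0, expensive)  # decided by the upper bound
h_mid = sample_hidden(0.6, 0.0, 1.0, expensive)   # needs the exact value
```

Tighter bounds shrink the inconclusive interval, so fewer samples fall through to the exact computation.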
Complexities of Applying ETI + ATI
• ATI vs. ETI
  – Tightness is not the only factor relevant to the benefits of TI-based strength reduction.
• Many variants of applying ETI and ATI.
  – Tradeoff between the cost and quality of bounds.
  – Multi-granularity bounds.
  – Dynamic adaptation.
Solution: Guided TI Adaptation to choose the best TI optimization on the fly (details in PLDI’17 & VLDB’15).
Over Manual TI Opt: K-Means (K=1024)
[Figure: speedup (X) of TOP and Yinyang K-Means (ICML’15). Baseline: classic K-means (16GB, 8-core Intel Ivy Bridge).]
Code link in ICML’15 paper.
Motivating Example
    w = w0;
    while (d > 0.01) {
      d = 0;
      for (i = 0; i < M; i++) {   // reduction over loop i
        d += a[i] + b[i] * w;
      }
      w = w - 0.001 * d;
    }
Any Redundant Computations?
• w: stays the same across the for loop, but changes across the while loop.
• a[i], b[i]: stay the same across the while loop, but vary across the for loop.
Elusive to existing techniques.
Motivating Example
• An equivalent form of the example.
Original form, O(M*K):
    w = w0;
    while (d > 0.01) {            // K iterations
      d = 0;
      for (i = 0; i < M; i++) {
        d += a[i] + b[i] * w;
      }
      w = w - 0.001 * d;
    }
Equivalent form, O(M+K):
    w = w0;
    A = ∑i a[i];                  // computed once, outside the while loop
    B = ∑i b[i];
    while (d > 0.01) {            // K iterations
      d = A + B * w;
      w = w - 0.001 * d;
    }
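The equivalence of the two forms can be checked with a small sketch. The function names, the sample data, the initial value of d, and the iteration cap are assumptions added for the illustration, and Python is used only for convenience. The two forms agree up to floating-point rounding, since the reordering changes the order of additions.

```python
import math

def naive(a, b, w0, max_iters=10_000):
    """Original form: the reduction over i is redone in every while iteration."""
    w, d, iters = w0, math.inf, 0  # d starts at infinity so the loop is entered
    while d > 0.01 and iters < max_iters:
        d = 0.0
        for ai, bi in zip(a, b):
            d += ai + bi * w
        w = w - 0.001 * d
        iters += 1
    return w, iters

def hoisted(a, b, w0, max_iters=10_000):
    """Equivalent form: the two loop-invariant sums are computed once."""
    A, B = sum(a), sum(b)
    w, d, iters = w0, math.inf, 0
    while d > 0.01 and iters < max_iters:
        d = A + B * w
        w = w - 0.001 * d
        iters += 1
    return w, iters

# Assumed sample data, chosen so that d shrinks steadily toward zero.
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
w_naive, k_naive = naive(a, b, 1.0)
w_fast, k_fast = hoisted(a, b, 1.0)
```

With M array elements and K while iterations, `naive` performs on the order of M*K additions while `hoisted` performs on the order of M+K.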
Key Observation
Nature of such hidden redundancies: Their discovery and removal require large-scoped analysis and computation reordering at both expression and loop levels.
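Generalized Loop Redundancy Elimination (GLORE)
[Figure: workflow from Loop to Optimized Loop through an Enabler (LER notation) and a Transformer (a set of novel algorithms), with formulae as the intermediate form.]
A set of novel algorithms: operand folding; alternative form generation; closure-based algorithm; minimum-union algorithm; loop encapsulation; …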
Reasons for the Algorithm Designs
• Variety of loop redundancies
• Complex math operations (e.g., div, sin, mod)
• Loop index dependence
• Many possible orders of computations
• Interplays between reordering of expressions and that of loops
In general, finding the best order is NP-complete. [Chi-chung+:1997]
Entire Work Flow
[Figure: the GLORE workflow over formulae f, f’, f*, g, g’, h, h’ and formula sets Forg, Falt, G, H, I.
Stages: preprocess (operand folding; alt form gen); operand abstraction; operand concat.; loop encapsu.; loop decaps.; cat-1 & cat-2 opt. (reuse list & groups); cat-3 opt (min union alg. & closure-based alg.); cat-4 opt (incremental repres.).
Annotated purposes: for complex expressions; to ease analysis; for index dependence; to find best order; prepare for other categories; find redundancy; find incremental compu.
Legend: upper case: a set of formulae; lower case: a formula; one arrow style: transformation; another arrow style: for each member.]
See OOPSLA’17 for details.
Evaluation: 21 Benchmarks
• Real-world problems: batch gradient descent (BGD); a nonlinear partial differential equation (PDE).
• Stencil benchmarks: Dibligbilharm, Diso3X3, Imorph, Noise1, Iyoki, Inevatia, and Dresid.
• Programs from prior work: LANL POP, mgrid:resid, mgrid:psinv, mgrid:interp, and mgrid:rpj3.
• Pluto benchmark suite: ssymm, fuse, priv2, fmri, and ccsd.
Categories 3 & 4 (Redundant Loops)
Reduced Computational Complexity
No prior methods work on them.
1.15-1.8X speedups on other programs.
Reflection
Right abstractions plus suitable algorithms may go a long way in leveraging high-level semantics and hence advancing compiler technology.
Can the idea be generalized to a broader range of high-level semantics?
Thesis of this Talk
• High-level semantics is key for unleashing the hidden power of compilers
• AI is key for unleashing the hidden power of high-level semantics-driven compilation
What other high-level semantics?
• A lot…
  • Each library API defines some high-level semantics
• How are current compilers treating them?
  • Black boxes
  • Barriers for program analysis and optimizations
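import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans

object AScalaSparkJob {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    conf.setAppName("MyFirstSparkScalaApplication")
    val ctx = new SparkContext(conf)
    val file = "file:///home/Crimes_-Aug-2015.csv";
    val logData = ctx.textFile(file, 2)
    val numLines = logData.filter(line => true).count()
    val vectors = logData.map(…)
    // Library call: a black box to the compiler
    val kMeansModel = KMeans.train(vectors, 2, 20)
  }
}
Abstraction Wall.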
Lots of Prior Efforts
• Interprocedural analysis
  • Limited by analyzability
• Specification-based solutions
  • Leveraging high-level semantics conveyed by manual specifications
  • Telescoping Languages (Kennedy et al.), Broadway (Guyer & Lin), Speckle (Vandevoorde et al.), …
Telescoping Languages [IEEE 2005]
• On programs in S languages
• 10-143X speedups
• On one program, 23000X using structure of a matrix
“If each domain library effectively defines a new language, then there will be an enormous number of specialized high-level languages that will each require an optimizing compiler.”
(Kennedy et al. [IEEE 2005])
Telescoping Languages
A generator of domain-specific optimizers.
Library specifications
Transformation specifications
Bottleneck for adoption.
(Adapted from Kennedy+: IEEE’2005)
Key Barrier: Knowledge
• The optimizer generator needs to know
  • Knowledge on the APIs
    • Functional, behavioral, performance attributes of each API
    • Relations (e.g., algebraic equivalence) of sequences of APIs
  • Knowledge on the transformations
    • Precondition, effects, and cost
• Difficulties
  • Difficult to write or maintain
  • Hard to be correct or complete
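To make the kind of knowledge listed above concrete, here is a toy sketch of what one transformation specification might record: a precondition, an effect, and a cost estimate. Everything in it is hypothetical (the `Transformation` class, the `fuse_maps` rule, and the tuple encoding of API calls); it is not the specification format of Telescoping Languages or any other system.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Call = Tuple[str, str]  # (API name, argument): a toy stand-in for a library call

@dataclass
class Transformation:
    name: str
    precondition: Callable[[Call, Call], bool]  # when the rewrite is legal
    effect: Callable[[Call, Call], List[Call]]  # what replaces the matched pair
    cost_delta: int                             # estimated cost change (negative = cheaper)

def apply_once(calls: List[Call], rule: Transformation) -> List[Call]:
    """Rewrite the first adjacent pair of calls that satisfies the precondition."""
    for i in range(len(calls) - 1):
        x, y = calls[i], calls[i + 1]
        if rule.precondition(x, y):
            return calls[:i] + rule.effect(x, y) + calls[i + 2:]
    return calls

# Hypothetical rule: two consecutive element-wise maps can be fused into one.
fuse_maps = Transformation(
    name="fuse-maps",
    precondition=lambda x, y: x[0] == "map" and y[0] == "map",
    effect=lambda x, y: [("map", f"{y[1]} . {x[1]}")],
    cost_delta=-1,
)

program = [("read", "file"), ("map", "f"), ("map", "g"), ("count", "")]
optimized = apply_once(program, fuse_maps)
```

Even this toy rule shows why such specifications are hard to write and maintain by hand: the precondition must capture every case where fusion is legal, and the cost estimate depends on context.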
Existing Efforts
• Manual efforts
• Automatic derivations of code specifications
  • R. Bodik et al., T. Xie et al., others
  • Code, documentation, dynamic observations.
  • State machines, graphs, NLP, ML, etc.
  • Focused on invariants or code contracts
• Much more needed
  • (Algebraic) relations between APIs, equivalence of sequences, context-sensitive performance properties, data dependences, code transformations, …
Other Open Challenges
• How to best transform the library and program
  • Explosion of the set of abstract types and specialized versions
• Best ways and order to apply transformations
Revisit in the Lens of Modern AI & ML
• Automatically discover high-level semantics
  • Knowledge discovery from multiple sources
  • Documentation of APIs & comments
  • Code, dynamic observations
• Automatically discover & apply high-level transformations
  • Learning from experience
  • Planning, reasoning, searching
Telescoping Languages
Software building blocks with high-level semantics understandable to compilers.
A program in a scripting language
High-level user intention.
TelGen
An efficient implementation upon the library
HW
Automatic Programming
Software building blocks with high-level semantics understandable to autocoder.
High-level user intention, intuitively expressed.
High-level user intention.
Autocoder
An efficient implementation of an efficient algorithm.
HW
Holy Grail of Optimizing Compilers
• Full freedom to explore
  • Algorithm
  • Implementation
• Not bounded by
  • Hardcoded algorithms opaque to compilers
  • Aliases and other code complexities
  • Abstraction walls
Benefits to Users
• Programming productivity
• Less human involvement: less error-prone
  • Reliability
  • Security
  • Maintainability
• Analyzability-Preserving Compositions
A long-time goal
“Automatic programming has been a goal of computer science and artificial intelligence since the first programmer came face to face with the difficulties of programming.”
From “Automatic Programming: Myths and Prospects” by Charles Rich and Richard C. Waters, 1988
Myth 1: End-user-oriented automatic programming systems do not need domain knowledge.
“End-user-oriented automatic programming systems must be domain experts. There is no point at which someone who knows nothing about programming communicates directly with someone who knows nothing about the application domain.”
Myth 2: End-user-oriented, general-purpose, fully automatic programming is possible.
“…such an automatic programming system would have to be expert in every application domain. Unfortunately, artificial intelligence is nowhere near supporting this superhuman level of performance.”
Is there more hope given the new progress of AI and ML? Can that domain knowledge be automatically learned?
Three Key Questions to Answer
• What does the user see?
• What does the autocoder know?
• How does the autocoder work?
All should be revisited in the lens of modern AI and ML.
An Example
“Natural languages are an attractive choice for communication between end users and an automatic programming system. … Unfortunately, enabling machines to converse in natural language is way beyond the current capabilities of artificial intelligence.”
From “Automatic Programming: Myths and Prospects” by Charles Rich and Richard C. Waters, 1988
AI
Automatic Programming
Compiler Technology
[MIT AI Memo] v39: “The New Compiler”; v349, v353, v379, v443, v453, … Lambda papers.
Summary
• Traditional compiler / low-level method: analysis and optimizations based on primitive language constructs.
• High-level semantics-driven compilation: analysis and optimizations at a larger scope and a higher level.
• Generators of high-level semantics-based compilers / Autocoder: scalable approach to leveraging high-level semantics.
Abstraction & Generalization
AI, ML, etc.