Code generation under Control
Rencontres sur la compilation / Saint Hippolyte
Henri-Pierre Charles
CEA, LaSTRE laboratory / Grenoble
12 December 2011
Introduction Presentation
Henri-Pierre Charles, two-line CV:
2010 onward: CEA/DRT/DACLE/LIST/LaSTRE, CRI PILSI context at Gières
1993-2010: assistant professor at Université de Versailles Saint-Quentin-en-Yvelines, PRiSM laboratory, IUT de Vélizy
Keywords:
Architecture, HPC, compiler backend, parallelism (ILP, multimedia, caches)
6809, 68000, i860, TriMedia, Itanium, Power, CELL, ARM, MEPHISTO, other
GCC, LLVM, FFTW, H264, Spiral, ATLAS, MESA3D, other
3D image reconstruction, Z-buffer, video compression, FFTW, QCD
Henri-Pierre Charles Code generation under Control 10 / 10011
Introduction CEA / CRI PILSI
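CEA: Commissariat à l'Énergie Atomique et aux Énergies Alternatives (French Alternative Energies and Atomic Energy Commission)
DAM: Direction des Applications Militaires (Military Applications Division)
DEN: Direction de l'Énergie Nucléaire (Nuclear Energy Division)
DRT: Direction de la Recherche Technologique (Technological Research Division)
DSM: Direction des Sciences de la Matière (Physical Sciences Division)
DSV: Direction des Sciences du Vivant (Life Sciences Division)
LIST: Laboratoire d'Intégration des Systèmes et des Technologies (Systems and Technology Integration Laboratory), Saclay
LETI: Laboratoire d'Électronique et de Technologie de l'Information (Electronics and Information Technology Laboratory), Grenoble
LITEN: Laboratoire d'Innovation pour les Technologies des Énergies Nouvelles et les nanomatériaux (Innovation Laboratory for New Energy Technologies and Nanomaterials)
LaSTRE: Laboratoire Systèmes Temps Réel (Real-Time Systems Laboratory), Saclay / Gières
LIALP: Laboratoire Infrastructure et Atelier Logiciel pour Puces (Chip Software Infrastructure and Tools Laboratory), Gières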
Introduction Presentation LaSTRE
Laboratoire Systèmes Temps Réel (Real-Time Systems Laboratory), head: Vincent DAVID
OASIS: multi-scale time-triggered architecture (the system is measured at its own rhythm), temporal consistency of exchanged data
PharOS: the same concepts, specialized for the automotive context: embedded systems, multiprocessors
MPPA: high-productivity parallel programming model for embedded HPC (MPPA project): ΣC
Low-level code optimization: dynamic code generation, low-level optimization, multimedia applications
Technologies from high-level sources down to bare-metal machines
Motivation Context Objective?
Be at home as fast as possible
With safety
Constraints: speed limits, “real” speed limits, gas consumption, engine temperature
Motivation Context Classical Compilation Chain
[Figure: classical compilation chain. Programmer: Idea → Algorithm → Source code → Compiler → Intermediate code → Assembly code → Assembler → Binary code; System: Loader → Runnable code, executed on the User's Data]
Compilation objectives:
Translate source code into a semantically equivalent binary
Assume “successive refinement”
Optimize for efficiency / parallelism: reduce the cycle count
Performance defects are now “bugs” (not only in real-time systems)
“Performance counter in the loop”
Motivation Context Semantic Bottleneck
Motivation Context A question about a program
What are the speed variations for this program?
for (int i = 0; i < N; ++i) {
    dest[i] = 0;
    for (int j = 0; j < N; ++j)
        dest[i] += src[j] * m[i][j];
}
It depends on: compiler, data size, target processor, instruction set, available parallelism, data type, memory location, operating system, ...
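A self-contained C version of the kernel above, wrapped in a small timing harness, makes the question measurable. This is a sketch under stated assumptions: the names `matvec` and `ns_per_element`, the flat row-major layout of `m`, the `float` data type and the use of `clock_gettime` are choices made for illustration, not part of the original slide.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Same kernel as on the slide; m is assumed to be a flat, row-major
   n*n array and n is only known at run time. */
static void matvec(size_t n, float *dest, const float *src, const float *m)
{
    for (size_t i = 0; i < n; ++i) {
        float acc = 0.0f;               /* accumulate in a local variable */
        for (size_t j = 0; j < n; ++j)
            acc += src[j] * m[i * n + j];
        dest[i] = acc;
    }
}

/* Average time per matrix element, in ns, over `reps` calls.
   Returns -1.0 if allocation fails. */
static double ns_per_element(size_t n, int reps)
{
    assert(n > 0 && reps > 0);
    float *m = malloc(n * n * sizeof *m);
    float *src = malloc(n * sizeof *src);
    float *dest = malloc(n * sizeof *dest);
    if (!m || !src || !dest) {
        free(m); free(src); free(dest);
        return -1.0;
    }
    for (size_t k = 0; k < n * n; ++k) m[k] = 1.0f;
    for (size_t k = 0; k < n; ++k) src[k] = 1.0f;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; ++r)
        matvec(n, dest, src, m);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile float sink = dest[n - 1];  /* keep the result observable */
    (void)sink;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    free(m); free(src); free(dest);
    return ns / ((double)reps * (double)n * (double)n);
}
```

Calling `ns_per_element` from `main` for, say, n = 10, 100 and 1000 (with enough repetitions for stable timings) shows how the per-element cost changes with the data size on a given machine; the actual numbers depend on the compiler, its flags and the processor, which is the point of the slide.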
Motivation Context Data Size Matters
Loop size (value of N):
10^1: multimedia kernel: full loop unrolling, instruction scheduling, memory cache access, ...
10^2 / 10^3: scientific code: loop unrolling, loop conversion, data prefetching
10^6: multimedia stream: multithreading
10^10 and more: high-level parallelism: MPI / grid / cloud, ...
N is generally a parameter known only at run time, so profiling and iterative compilation do not help.
Compilation strategies are complex and application-domain specific.
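Because N is usually known only at run time, the simplest way to apply a size-specific strategy without a run-time code generator is to dispatch on N to variants prepared in advance. The sketch below is hypothetical: the function names and the single special case (a fully unrolled kernel for N = 4, standing in for the "multimedia kernel" regime) are assumptions; a compilette would instead generate the specialized variant for the actual N.

```c
#include <assert.h>
#include <stddef.h>

/* Generic kernel: dest = m * src, m row-major, any n. */
static void matvec_generic(size_t n, float *dest, const float *src, const float *m)
{
    for (size_t i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (size_t j = 0; j < n; ++j)
            acc += src[j] * m[i * n + j];
        dest[i] = acc;
    }
}

/* Fully unrolled kernel for n == 4: no loop control,
   all source operands can stay in registers. */
static void matvec_4(float *dest, const float *src, const float *m)
{
    const float s0 = src[0], s1 = src[1], s2 = src[2], s3 = src[3];
    dest[0] = s0 * m[0]  + s1 * m[1]  + s2 * m[2]  + s3 * m[3];
    dest[1] = s0 * m[4]  + s1 * m[5]  + s2 * m[6]  + s3 * m[7];
    dest[2] = s0 * m[8]  + s1 * m[9]  + s2 * m[10] + s3 * m[11];
    dest[3] = s0 * m[12] + s1 * m[13] + s2 * m[14] + s3 * m[15];
}

/* Choose the strategy from the data size, known only at run time. */
static void matvec_dispatch(size_t n, float *dest, const float *src, const float *m)
{
    assert(n > 0);
    if (n == 4)
        matvec_4(dest, src, m);
    else
        matvec_generic(n, dest, src, m);
}
```

The limitation is visible in the code: every specialized variant must exist before the program runs, so only a few sizes can be covered. This is the gap that run-time generation aims to close.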
Architecture Architecture GENEPY
CEA-LETI architecture
Architecture Mephisto operator
“No instruction set” (microprogram)
Architecture Power consumption
Dynamic compilation Compilette at work
[Figure: the classical compilation chain (Idea, Algorithm, Source code, Compiler, Intermediate code, Assembly code, Assembler, Binary code, Loader, Runnable code, with Programmer, System, User and Data), extended with a compilette: an algorithmic optimizer that takes a parameter and performs code generation]
Data driven (size, alignment, values)
Energy driven (ISA selection, vectorization)
Speed driven (ISA selection, vectorization quality)
Network-topology driven
User driven (experimentation)
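The "data driven (values)" case can be illustrated with a very small run-time code generator: once a constant k is known at run time, emit machine code for a function computing x * k, with k encoded directly as an immediate operand. This is a minimal sketch, not deGoal and not its API: the names are hypothetical, it hand-encodes two x86-64 instructions, and it assumes a System V x86-64 host (argument in `edi`, result in `eax`) that allows executable mappings. On any other host it simply reports failure.

```c
#define _DEFAULT_SOURCE            /* MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef int32_t (*mul_fn)(int32_t);

/* A generated function plus the executable page that holds it. */
typedef struct {
    void  *mem;
    size_t size;
    mul_fn fn;
} compiled_mul;

/* Compilette-style sketch: generate "return x * k" for a run-time k.
   Returns 0 on success, -1 if no code could be generated. */
static int compile_mul_by_const(compiled_mul *out, int32_t k)
{
    assert(out != NULL);
#if defined(__x86_64__) && !defined(_WIN32)
    unsigned char code[7];
    code[0] = 0x69;               /* imul r32, r/m32, imm32            */
    code[1] = 0xC7;               /* ModRM: eax <- edi * imm32         */
    memcpy(&code[2], &k, 4);      /* imm32, little-endian on x86-64    */
    code[6] = 0xC3;               /* ret: the result is already in eax */

    const size_t size = 4096;
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, sizeof code);
    /* Write first, then switch the page to read + execute. */
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return -1;
    }
    out->mem = mem;
    out->size = size;
    out->fn = (mul_fn)mem;        /* object-to-function pointer cast (POSIX-style) */
    return 0;
#else
    (void)out;
    (void)k;
    return -1;                    /* no encoder for this host */
#endif
}

static void release_mul(compiled_mul *c)
{
    munmap(c->mem, c->size);
}
```

The generated function contains no load of k and no loop: the value has become part of the instruction stream, which is what specializing on data values means. On x86-64 no explicit instruction-cache flush is needed; targets such as ARM would also need one, for example with GCC's `__builtin___clear_cache`.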
Dynamic compilation deGoal: a tool for dynamic code generation
deGoal: a tool for compilette generation
Generate a code generator
Virtual portable instruction set (register based, with data types)
Optimization at compile time and at run time
Code generation faster than any compiler's code generator
No intermediate representation
Algorithmic level
Bottom-up approach
Targets: ARM, GENEPY, XP70 V3/V4, GPU, K1, ...
Memory footprint: a few kB
General context: telecommunication algorithms (3GPP LTE)
Dynamic compilation FP7 H4H
FP7 H4H: High Performance for Heterogeneous Architecture
GPU JIT for Scilab
Generate the NVIDIA PTX assembly language dynamically
Embed the code generator in Scilab
Optimized data movement
Linear algebra context
Dynamic generation driven by data size
Dynamic compilation FP7 Touchmore
FP7 Touchmore: dynamic code generation
Dynamic code generation for MPSoC
GENEPY tile (Mephisto DSP + MIPS)
Generate code for MIPS or Mephisto
Multimedia applications (MP3 / MP4)
Dynamic code generation driven by code performance
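Generation "driven by code performance" implies measuring candidate code while the application runs, the "performance counter in the loop" idea. A minimal, hypothetical sketch of that feedback loop: time each candidate on the real input with `clock_gettime` and keep the fastest. The two hand-written dot-product variants below only stand in for generated code; all names and the candidate set are assumptions.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef float (*dot_fn)(const float *a, const float *b, size_t n);

/* Candidate 1: straightforward loop. */
static float dot_simple(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}

/* Candidate 2: four independent accumulators (more ILP,
   possibly different floating-point rounding). */
static float dot_unroll4(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i)            /* remainder */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}

/* Wall-clock time of `reps` calls, in nanoseconds. */
static double elapsed_ns(dot_fn f, const float *a, const float *b, size_t n, int reps)
{
    struct timespec t0, t1;
    volatile float sink = 0.0f;   /* keeps the calls from being optimized away */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; ++r)
        sink += f(a, b, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    return (double)(t1.tv_sec - t0.tv_sec) * 1e9
         + (double)(t1.tv_nsec - t0.tv_nsec);
}

/* Measure every candidate on the actual data and return the fastest. */
static dot_fn select_fastest(const float *a, const float *b, size_t n, int reps)
{
    static const dot_fn candidates[] = { dot_simple, dot_unroll4 };
    const size_t count = sizeof candidates / sizeof candidates[0];
    assert(reps > 0);
    dot_fn best = candidates[0];
    double best_ns = elapsed_ns(best, a, b, n, reps);
    for (size_t c = 1; c < count; ++c) {
        double t = elapsed_ns(candidates[c], a, b, n, reps);
        if (t < best_ns) {
            best_ns = t;
            best = candidates[c];
        }
    }
    return best;
}
```

The selection runs on the data actually being processed, so it accounts for the size, alignment and cache state at that moment. This is information that profiling on a different input cannot capture.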
Dynamic compilation Smecy
FP7 Smecy
Target: P2012 MPSoC / XP70 processor
Matrix × matrix dynamic generation
“Perfect hash” dynamic generator
Dynamic generation driven by performance and power consumption
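The slide mentions a "perfect hash" dynamic generator without details. To illustrate the idea only, and not the Smecy implementation, the sketch below searches at run time, once the key set is known, for an odd multiplier `a` and a table size 2^bits such that `(k * a) >> (32 - bits)` sends every key to a distinct slot. The multiplicative hash form, the xorshift32 candidate generator, the search limits and all names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BITS 10u            /* largest table: 1024 slots (assumption) */
#define TRIES_PER_SIZE 10000    /* multipliers tried per table size (assumption) */

/* Multiplicative hash: the top `bits` bits of the 32-bit product k * a. */
static uint32_t mhash(uint32_t k, uint32_t a, unsigned bits)
{
    return (uint32_t)(k * a) >> (32u - bits);
}

/* Search for (a, bits) making mhash collision-free on keys[0..n-1].
   Returns true and fills *a_out / *bits_out on success. */
static bool find_perfect_hash(const uint32_t *keys, size_t n,
                              uint32_t *a_out, unsigned *bits_out)
{
    assert(n <= (1u << MAX_BITS));
    unsigned bits = 1;
    while ((1u << bits) < n)             /* at least one slot per key */
        ++bits;

    uint32_t state = 0x9E3779B9u;        /* xorshift32 seed, must be non-zero */
    for (; bits <= MAX_BITS; ++bits) {
        bool used[1u << MAX_BITS];
        for (int t = 0; t < TRIES_PER_SIZE; ++t) {
            state ^= state << 13;        /* xorshift32 step */
            state ^= state >> 17;
            state ^= state << 5;
            uint32_t a = state | 1u;     /* odd multiplier */

            memset(used, 0, sizeof(bool) * (1u << bits));
            size_t i = 0;
            for (; i < n; ++i) {
                uint32_t h = mhash(keys[i], a, bits);
                if (used[h])
                    break;               /* collision: try the next multiplier */
                used[h] = true;
            }
            if (i == n) {                /* no collision: perfect hash found */
                *a_out = a;
                *bits_out = bits;
                return true;
            }
        }
    }
    return false;
}
```

Smaller tables are tried first, so the result favors compactness. The two constants found could then be baked into generated lookup code as immediates, which turns a run-time search into straight-line code.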
Dynamic compilation Related work
JIT compilation (Java, LLVM, CUDA): intermediate representation, heavyweight code generators (code footprint and time)
Python, Perl, PHP: too high level, glue languages
FFTW, Spiral: code generators, dynamic configuration
ATLAS: compile-time tuning
VVM / CCG / HPBCG: previous versions
Dynamic compilation Conclusion
Dynamic code generation is THE challenge (JIT, JavaScript, emulation, multicore simulation, ...)
Lots of work to do: power characterization
MPSoC and HPC systems share some problems: multiple cores, power consumption control, ...
The parameters controlling code generation are numerous and hard to manage
Register for DCE 2012: Workshop on “Dynamic Compilation Everywhere” (held during HiPEAC 2012)