DESCRIPTION
When trying to make auto-tuning practical using common infrastructure, a public repository of knowledge and machine learning (cTuning.org), we faced a major problem with the reproducibility of experimental results collected from multiple users. This was largely due to a lack of information about all software and hardware dependencies, as well as a large variation of measured characteristics. I will present a possible collaborative approach to solving the above problems using the new Collective Mind knowledge management system. This modular infrastructure is intended to preserve and share over the Internet whole experimental setups with all related artifacts and their software and hardware dependencies, not just performance data. Researchers can take advantage of shared components and data with extensible meta-descriptions at http://c-mind.org/repo to quickly prototype and validate research techniques, particularly on software and hardware optimization and co-design. At the same time, behavior anomalies or model mispredictions can be exposed in a reproducible way to the interdisciplinary community for further analysis and improvement. This approach supports our new open publication model in computer engineering, where all results and artifacts are continuously shared and validated by the community (c-mind.org/events/trust2014). This presentation supports our recent publications: * http://iospress.metapress.com/content/f255p63828m8l384 * http://hal.inria.fr/hal-01054763
Collective Mind: bringing reproducible research to the masses
Grigori Fursin
POSTALE, INRIA Saclay, France
INRIA-Illinois-ANL Joint Laboratory Workshop
France, June 2014
Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 2
Challenge:
How to design next generation of faster, smaller, cheaper, more power efficient and reliable computer systems (software and hardware)?
Long term interdisciplinary vision:
• Share code and data in a reproducible way along with publications
• Use big data analytics for program optimization, run-time adaptation and architecture co-design
• Bring interdisciplinary community together to validate experimental results, ensure reproducibility, improve optimization predictions
Message
Continuously validated in industrial projects with Intel, ARM, IBM, CAPS, ARC (Synopsys), STMicroelectronics
Talk outline
• Motivation: general problems in computer engineering
• cTuning: big-data driven program optimization and architecture co-design, and encountered problems
• Collective Mind: collaborative and reproducible research and experimentation in computer engineering
• Reproducibility as a side effect
• Conclusions and future work
Available solutions

Figure: the computer system stack, from the end-user task and algorithm through application, compilers, binary and libraries, run-time environment and the state of the system (data set, storage) down to the architecture, producing a result.

End users require faster, smaller and more power-efficient systems
Delivering an optimal solution is non-trivial

Figure: the same stack, with GCC optimizations applied to the end-user task to produce a result.
Fundamental problems:
1) Too many design and optimization choices at all levels
2) Always multi-objective optimization: performance vs compilation time vs code size vs system size vs power consumption vs reliability vs return on investment
3) Complex relationship and interactions between ALL software and hardware components
Empirical auto-tuning is too time-consuming, ad-hoc and tedious to become mainstream!
Combine auto-tuning with machine learning and crowdsourcing
Plugin-based MILEPOST GCC with plugins to monitor and explore the optimization space and extract semantic program features.

Training: programs or kernels 1…N are passed through MILEPOST GCC with plugins to extract semantic program features and collect dynamic features and hardware counters; the results are clustered and used to build a predictive model.

Prediction: for an unseen program, the model predicts optimizations to minimize execution time, power consumption, code size, etc.
• G. Fursin et al. MILEPOST GCC: machine learning based self-tuning compiler. 2008, 2011
• G. Fursin and O. Temam. Collective optimization: a practical collaborative approach. 2010
• G. Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. 2009
• F. Agakov et al. Using machine learning to focus iterative optimization. 2006
In 2009, we opened a public repository of knowledge (cTuning.org) and managed to automatically tune customer benchmarks and compiler heuristics for a range of real platforms from IBM and ARC (Synopsys).

Combining auto-tuning with machine learning and crowdsourcing is now becoming mainstream: is everything solved?
Technological chaos: compilers (GCC 4.1.x-4.8.x, ICC 10.1-12.1, LLVM 2.6-3.4, Open64, XLC, Phoenix, MVS 2013, Jikes Testarossa), programming models and libraries (OpenMP, MPI, HMPP, OpenCL, CUDA 4.x/5.x, TBB, MKL, ATLAS), profiling tools (gprof, prof, perf, oprofile, PAPI, TAU, Scalasca, VTune Amplifier, hardware counters), optimizations at all levels (algorithm, program, function, codelet, loop; IPA, LTO, polyhedral transformations, pass reordering, per-phase reconfiguration, predictive scheduling), machine learning techniques (KNN, SVM, genetic algorithms), simulators (SimpleScalar) and hardware (ARM v6/v8, Intel SandyBridge, SSE4, AVX, cache size, frequency, bandwidth, memory size, HDD size, TLB, ISA, threads, algorithm precision), all affecting execution time and reliability.

We also experienced a few problems:
• Difficulty reproducing results collected from multiple users (variability of performance data and constant changes in the system)
• Difficulty reproducing and validating existing and related techniques from publications (no full specifications and dependencies)
• Lack of common, large and diverse benchmarks and data sets
• Difficulty exposing choices and extracting features (tools are not prepared for auto-tuning and machine learning)
• Difficulty experimenting
We also experienced a few problems
• By the end of experiments, new tool versions are often already available
• The common life span of experiments and ad-hoc frameworks is the end of an MSc or PhD project
• Researchers often focus on publications rather than practical and reproducible solutions
• Since 2009 we have been asking the community to share code, performance data and all related artifacts (experimental setups): at ADAPT’14 only two papers submitted artifacts; PLDI’14 had several papers with research artifacts. We will discuss these problems in two days at ACM SIGPLAN TRUST’14.
Collective Mind: towards systematic and reproducible experimentation

Hardwired experimental setups are very difficult to extend or share: multiple tool versions (Tool A V1…VN, Tool B V1…VM), ad-hoc tuning scripts, ad-hoc analysis and learning scripts, and experiments stored as collections of CSV, XLS, TXT and other files. Tools are not prepared for auto-tuning and adaptation, and users struggle to expose meta-information such as behavior, choices, features and state.
Motivation for Collective Mind (cM):
• How to preserve, share and reuse practical knowledge and experience on program optimization and hardware co-design?
• How to make machine-learning-driven optimization and run-time adaptation practical?
• How to ensure reproducibility of experimental results?

Solution: share the whole experimental setup with all related artifacts, SW/HW dependencies and unified meta-information.
cM module (wrapper) with unified and formalized input and output

A cM module wraps a given tool version (Tool B Vi), processes the original unmodified ad-hoc input and generated files, and exposes behavior, choices, features, state and dependencies, replacing ad-hoc tuning and analysis scripts and scattered collections of CSV, XLS, TXT and other files.
cm [module name] [action] (param1=value1 param2=value2 … -- unparsed command line)
cm compiler build -- icc -fast *.c
cm code.source build ct_compiler=icc13 ct_optimizations=-fast
cm code run os=android binary=./a.out dataset=image-crazy-scientist.pgm

Should be able to run on any OS (Windows, Linux, Android, MacOS, etc.)!
Exposing meta-information in a unified way

The cM module takes unified JSON input (meta-data), if it exists, together with the original unmodified ad-hoc input, invokes the action function of the wrapped tool version, then parses and unifies the generated output into unified JSON output (meta-data).

The behavior of a component is formalized as b = B(c, f, s), where b is behavior, c are choices, f are features and s is the state, all represented as flattened JSON vectors (either string categories or integer/float values).
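As an illustration, a toy wrapper in this spirit might look as follows. The module class, the action name and the flattening scheme are hypothetical sketches for this talk, not the actual cM API.

```python
import json

def flatten(meta, prefix=""):
    """Flatten nested JSON meta-data into a flat vector of keys and values,
    so that string categories and numbers can feed predictive models."""
    flat = {}
    for k, v in meta.items():
        key = prefix + "#" + k
        if isinstance(v, dict):
            flat.update(flatten(v, key))
        else:
            flat[key] = v
    return flat

class CompilerModule:
    """Hypothetical cM-style module: unified JSON in, unified JSON out."""
    def run(self, action, i):
        if action == "build":
            # A real module would invoke the tool version selected by i here.
            return {"state": "ok",
                    "characteristics": {"compile_time": 1.2,
                                        "binary_size": 14212}}
        return {"state": "error", "error": "unknown action: " + action}

m = CompilerModule()
out = m.run("build", {"choices": {"compiler": "gcc-4.7.1", "flags": "-O3"}})
print(json.dumps(flatten(out), sort_keys=True))
```

Any subset of such modules can then be composed, since they all speak the same flattened JSON dialect.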
Adding SW/HW dependency checks

The cM module also sets the environment for a given tool version and checks dependencies: multiple tool versions can co-exist, while their interfaces are abstracted by the cM module.
Assembling, preserving, sharing and extending the whole pipeline as “LEGO”

cM components (wrappers) are chained into an experimental pipeline for a given research and experimentation scenario, on top of a public modular auto-tuning and machine learning repository and buildbot with unified web services for the interdisciplinary crowd. A typical pipeline: choose exploration strategy; generate choices (code sample, data set, compiler, flags, architecture …); compile source code; run code; test behavior normality; Pareto filter; modeling and prediction; complexity reduction; with shared scenarios from past research.
Data abstraction in Collective Mind (c-mind.org/repo)

Each cM module owns data entries with a JSON meta-description and associated files and directories:
• module “compiler” (compiler flags, features, actions): GCC 4.4.4, GCC 4.7.1, LLVM 3.1, LLVM 3.4, …
• module “package” (installation info): GCC 4.7.1 bin, GCC 4.7.1 source, LLVM 3.4, gmp 5.0.5, mpfr 3.1.0, lapack 2.3.0, java apache commons codec 1.7, …
• module “dataset”: image-jpeg-0001, bzip2-0006, txt-0012, …

cM repository directory structure:
.cmr / module UOA / data UOA (UID or alias) / .cm / data.json
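A minimal sketch of this directory layout follows; the helper name, the repository root and the meta-description fields are illustrative, not the exact cM schema.

```python
import json
import os
import tempfile

def add_entry(repo, module_uoa, data_uoa, meta):
    """Create <repo>/<module UOA>/<data UOA>/.cm/data.json
    holding the JSON meta-description of a data entry."""
    path = os.path.join(repo, module_uoa, data_uoa, ".cm")
    os.makedirs(path, exist_ok=True)
    data_json = os.path.join(path, "data.json")
    with open(data_json, "w") as f:
        json.dump(meta, f, indent=2)
    return data_json

# Use a temporary directory as a stand-in for the .cmr repository root.
repo = tempfile.mkdtemp()
p = add_entry(repo, "dataset", "image-jpeg-0001",
              {"features": {"kind": "image", "format": "jpeg"}})
with open(p) as f:
    print(json.load(f)["features"]["kind"])
```

Because every entry is just a JSON file at a predictable path, entries can be searched, shared and merged without a central database.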
Since 2005: systematic, big-data driven optimization and co-design

Current state of computer engineering: the same technological chaos of ever-changing compilers, languages, libraries, profilers, optimizations and hardware. Researchers prototype a research idea, validate existing work or perform an end-user task, and the result is too often a quick, non-reproducible hack, an ad-hoc heuristic, a quick publication, with no shared code and data.

cTuning.org and c-mind.org/repo provide a collaborative infrastructure and repository for the “crowd” to:
• Share code and data with their meta-description and dependencies
• Systematize and classify collected optimization knowledge (clustering; predictive modelling)
• Develop and preserve the whole experimental pipeline
• Extrapolate collected knowledge (cluster, build predictive models, predict optimizations) to build faster, smaller, more power-efficient and reliable computer systems

This helped the interdisciplinary community apply “big data analytics” to the analysis, optimization and co-design of computer systems.
Top-down problem (tuning) decomposition, similar to physics

Gradually expose some characteristics and some choices at each level (granularity: process, thread, function, codelet, loop, instruction):
• Algorithm selection: (time) productivity, variable accuracy, complexity … | language, MPI, OpenMP, TBB, MapReduce …
• Compile program: time … | compiler flags; pragmas …
• Code analysis & transformations: time; memory usage; code size … | transformation ordering; polyhedral transformations; transformation parameters; instruction ordering …
• Run code / run-time environment: time; power consumption … | pinning/scheduling …
• System: cost; size … | CPU/GPU; frequency; memory hierarchy …
• Data set: size; values; description … | precision …
• Run-time analysis: time; precision … | hardware counters; power meters …
• Run-time state: processor state; cache state … | helper threads; hardware counters …
• Analyze profile: time; size … | instrumentation; profiling …

Coarse-grain vs. fine-grain effects: the right level depends on user requirements and expected ROI.
Growing, plugin-based cM pipeline for auto-tuning and learning
• Init pipeline
• Detect system information
• Initialize parameters
• Prepare dataset
• Clean program
• Prepare compiler flags
• Use compiler profiling
• Use cTuning CC/MILEPOST GCC for fine-grain program analysis and tuning
• Use universal Alchemist plugin (with any OpenME-compatible compiler or tool)
• Use Alchemist plugin (currently for GCC)
• Build program
• Get objdump and md5sum (if supported)
• Use OpenME for fine-grain program analysis and online tuning (build & run)
• Use 'Intel VTune Amplifier' to collect hardware counters
• Use 'perf' to collect hardware counters
• Set frequency (in Unix, if supported)
• Get system state before execution
• Run program
• Check output for correctness (use dataset UID to save different outputs)
• Finish OpenME
• Misc info
• Observed characteristics
• Observed statistical characteristics
• Finalize pipeline
http://c-mind.org/ctuning-pipeline
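The plugin chaining above can be sketched as follows; the plugin names and the shared-state dictionary are simplified illustrations of the idea, not the real pipeline code at the URL above.

```python
# Each plugin takes the shared pipeline state and returns it, possibly
# updated; chaining them reproduces the "LEGO" pipeline assembly.
def init_pipeline(s):   s.setdefault("log", []).append("init");     return s
def prepare_dataset(s): s["dataset"] = "sample"; s["log"].append("dataset"); return s
def build_program(s):   s["binary"] = "a.out";   s["log"].append("build");   return s
def run_program(s):
    s["characteristics"] = {"time": 0.42}        # invented measurement
    s["log"].append("run")
    return s
def finalize(s):        s["log"].append("finalize"); return s

def run_pipeline(plugins, state=None):
    """Chain plugins: each consumes and produces the shared state dict."""
    state = state or {}
    for plugin in plugins:
        state = plugin(state)
        if state.get("fail"):        # any plugin may abort the pipeline
            break
    return state

s = run_pipeline([init_pipeline, prepare_dataset, build_program,
                  run_program, finalize])
print(s["log"])
```

Plugins can be inserted, removed or reordered without touching the others, which is what makes the pipeline easy to grow.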
Publicly shared research material (c-mind.org/repo)
Our Collective Mind buildbot and plugin-based auto-tuning pipeline support the following shared benchmarks and codelets:
• Polybench: numerical kernels with exposed parameters of all matrices in cM (CPU: 28 prepared benchmarks; CUDA: 15; OpenCL: 15)
• cBench: 23 benchmarks with 20 and 1000 datasets per benchmark
• Codelets: 44 codelets from the embedded domain (provided by CAPS Entreprise)
• SPEC 2000/2006
• Description of 32-bit and 64-bit OS: Windows, Linux, Android
• Description of major compilers: GCC 4.x, LLVM 3.x, Open64/Pathscale 5.x, ICC 12.x
• Support for collection of hardware counters: perf, Intel VTune
• Support for frequency modification
• Validated on laptops, mobiles, tablets, GRID/cloud; can even run from a USB key

This speeds up research and innovation!
Automatic, empirical and adaptive modeling of program behavior

Figure: CPI of matmul on an Intel i5 (Dell E6320) as a function of data set features Ni and Nk (matrix size), with Nj = 100.
Off-the-shelf models can handle some cases: for example, a MARS (“earth”) model. Share the model along with the application and continuously refine it (minimizing RMSE and model size).
Model-driven auto-tuning: target optimizations or architecture reconfiguration at areas with similar performance (see our past publications).
Systematic benchmarking, compiler tuning, program optimization

Program: image corner detection. Processor: ARM v6, 830 MHz. Compiler: Sourcery GCC for ARM v4.7.3. OS: Android OS v2.3.5. System: Samsung Galaxy Y. Data set: MiDataSet #1, image, 600x450x8b PGM, 263 KB.

Figure: binary size (bytes) vs. execution time (sec.) for 500 combinations of random flags (-O3 -f(no-)FLAG) against the -O3 baseline. Use a Pareto frontier filter and pack experimental data on the fly.

Powered by Collective Mind Node (Android app on Google Play).
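The on-the-fly Pareto filtering can be sketched with a minimal frontier filter over (execution time, binary size) points; the sample points below are invented.

```python
def pareto_filter(points):
    """Keep points not dominated by any other point: q dominates p if q is
    no worse in both objectives and strictly better in at least one."""
    front = []
    for p in points:
        dominated = any(q != p and q[0] <= p[0] and q[1] <= p[1]
                        for q in points)
        if not dominated:
            front.append(p)
    return front

# (execution time in sec., binary size in bytes) per flag combination
points = [(1.0, 500), (0.8, 700), (1.2, 400), (0.9, 650), (1.1, 800)]
print(pareto_filter(points))
```

Only the frontier needs to be kept and shared, which is how the experimental data gets packed on the fly.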
Clustering shared applications by optimizations

Training set: distinct combinations of compiler optimizations form clusters in the space of choices c. Some ad-hoc predictive model maps ad-hoc features f (MILEPOST GCC features, hardware counters) to an optimization cluster. Prediction: for an unseen program, its features f are used to predict the optimization cluster (choices c).

c-mind.org/repo: ~286 shared benchmarks, ~500 shared data sets, ~20,000 data sets in preparation.
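A toy version of such a predictor is sketched below with a 1-nearest-neighbour rule; the feature vectors and cluster labels are invented, standing in for MILEPOST features and real optimization clusters.

```python
import math

# Hypothetical training set: program feature vectors labeled with the
# optimization cluster (combination of compiler flags) that worked best.
training = [
    ((0.9, 0.1), "cluster-O3-vectorize"),
    ((0.8, 0.2), "cluster-O3-vectorize"),
    ((0.1, 0.9), "cluster-Os-unroll"),
    ((0.2, 0.8), "cluster-Os-unroll"),
]

def predict_cluster(features):
    """1-nearest-neighbour over shared feature vectors: a minimal stand-in
    for the ad-hoc predictive models mentioned on the slide."""
    return min(training, key=lambda t: math.dist(features, t[0]))[1]

print(predict_cluster((0.85, 0.15)))
```

With a shared repository, the training set grows with every contributed benchmark, which is what makes the predictions improve over time.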
Split-compilation and run-time adaptation

Figure: execution time (ms) vs. data set feature N (size) for the CPU, the GPU and an adaptive scheduler that picks CPU or GPU at run time.
• Víctor J. Jiménez, Lluís Vilanova, Isaac Gelado, Marisa Gil, Grigori Fursin, Nacho Navarro. Predictive Runtime Code Scheduling for Heterogeneous Architectures. HiPEAC 2009
• Grigori Fursin, Albert Cohen, Michael F. P. O'Boyle, Olivier Temam. A Practical Method for Quickly Evaluating Program Optimizations. HiPEAC 2005
Statically enabling dynamic optimizations
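The predictive scheduling idea can be sketched as follows; the cost models and the learned threshold are invented numbers chosen to mimic the crossover on the plot, not measured data.

```python
# Invented cost models: the CPU scales linearly with data size, while the
# GPU pays a fixed transfer cost but has a lower per-element cost.
cpu_time = lambda n: 0.05 * n
gpu_time = lambda n: 20.0 + 0.01 * n

def learn_threshold(sizes):
    """Find the smallest observed size at which the GPU starts to win."""
    for n in sizes:
        if gpu_time(n) < cpu_time(n):
            return n
    return None

def schedule(n, threshold):
    """Statically compiled dispatch, driven by a run-time feature (size)."""
    return "GPU" if threshold is not None and n >= threshold else "CPU"

t = learn_threshold(range(0, 2000, 100))
print(t, schedule(100, t), schedule(1500, t))
```

This is split-compilation in miniature: both code paths exist in the binary, and a cheap run-time predictor picks between them.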
Reproducibility of experimental results
Reproducibility came as a side effect!
• Can preserve the whole experimental setup with all data and software dependencies
• Can perform statistical analysis (normality test) for characteristics
• Community can add missing features or improve machine learning models
Figure: distribution of execution time (sec.) across repeated runs. Unexpected behavior is exposed to the community, including domain specialists, to explain it, find the missing feature and add it to the system.
Figure: the distribution of execution time (sec.) turns out to be bimodal; Class A and Class B correspond to CPU frequencies of 800 MHz and 2400 MHz.
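The class separation above can be sketched with a crude check in pure Python; the timings are invented to mimic the bimodal distribution on the slide, and the midpoint split is a simplification of a proper normality test.

```python
import statistics

# Execution times (sec.) from repeated runs; invented values mimicking
# two CPU frequency states (roughly 1 s vs. roughly 3 s).
times = [1.02, 1.05, 0.99, 1.01, 3.04, 3.10, 2.98, 1.03, 3.01, 1.00]

def split_classes(samples):
    """If the spread is large relative to the mean, the distribution is
    suspicious: split samples into two classes around the range midpoint."""
    if statistics.stdev(samples) < 0.1 * statistics.mean(samples):
        return [sorted(samples)]            # looks unimodal enough
    mid = (min(samples) + max(samples)) / 2
    return [sorted(s for s in samples if s <= mid),
            sorted(s for s in samples if s > mid)]

classes = split_classes(times)
print(len(classes), [round(statistics.mean(c), 2) for c in classes])
```

Once two classes are detected, the system can go looking for the missing feature that explains them, such as CPU frequency.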
Tricky part: find right features
Class -O3 -O3 -fno-if-conversion
Shared data set sample1
reference execution time no change
Shared data set sample2
no change +17.3% improvement
Image B&W threshold filter *matrix_ptr2++ = (temp1 > T) ? 255 : 0;
Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 30
Tricky part: find the right features

Image B&W threshold filter: *matrix_ptr2++ = (temp1 > T) ? 255 : 0;

Class | -O3 | -O3 -fno-if-conversion
Shared data set sample1 (monitored during the day) | reference execution time | no change
Shared data set sample2 (monitored during the night) | no change | +17.3% improvement

if get_feature(TIME_OF_THE_DAY)==NIGHT bw_filter_codelet_day(buffers);
else bw_filter_codelet_night(buffers);

The feature “TIME_OF_THE_DAY” relates to the algorithm, the data set and the run-time. It can’t be found by ML: it simply does not exist in the system! Split-compilation (cloning with run-time adaptation) can exploit it.
Example of characterizing/explaining behavior of computer systems

Add one property: matrix size.

Figure: program/architecture behavior (CPI) vs. data set property (matrix size, 0 to 5000).
Try to build a model that correlates objectives (CPI) with features (matrix size). Start with simple models, e.g. linear regression, to detect coarse-grain effects.
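Such a first model can be sketched in a few lines of pure Python; the (size, CPI) points are invented to mimic the coarse-grain trend on the plot.

```python
# Ordinary least-squares linear regression of CPI on matrix size.
sizes = [100, 500, 1000, 2000, 3000, 4000, 5000]
cpi   = [1.1, 1.2, 1.4, 1.9, 2.4, 2.8, 3.3]

n = len(sizes)
mx = sum(sizes) / n
my = sum(cpi) / n
# Slope (beta) and intercept (eps) of the fitted line CPI = eps + beta*size.
beta = (sum((x - mx) * (y - my) for x, y in zip(sizes, cpi))
        / sum((x - mx) ** 2 for x in sizes))
eps = my - beta * mx

predict = lambda x: eps + beta * x
print(round(beta, 6), round(predict(2500), 2))
```

Deviations of the observations from this line are exactly the discrepancies that trigger model refinement in the next steps.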
With more observations, validate the model and detect discrepancies. Continuously retrain models to fit new data, and use the model to “focus” exploration on “unusual” behavior.
Gradually increase model complexity if needed (hierarchical modeling); for example, detect fine-grain effects (singularities) and characterize them.
Start adding more properties: for example, one more architecture with a twice bigger cache (L3 = 4 MB vs. L3 = 8 MB). Use an automatic approach to correlate all objectives and features.
Continuously build and refine classification (e.g. decision trees) and predictive models on all collected data to improve predictions. Continue exploring design and optimization spaces (evaluate different architectures, optimizations, compilers, etc.), focusing exploration on unexplored areas and on areas with high variability or a high model misprediction rate.

cM predictive model module: CPI = ε + 1000 × β × data size, with coefficients β and ε fitted from observations.
Model optimization and data compaction

Figure: code/architecture behavior (CPI) vs. data set feature (matrix size), partitioned by a decision tree:
• Size < 1012
• 1012 < Size < 2042
• Size > 2042 & GCC
• Size > 2042 & ICC & -O2
• Size > 2042 & ICC & -O3

Optimize the decision tree (many different algorithms exist). Balance precision vs. cost of modeling = ROI (coarse-grain vs. fine-grain effects). Compact data online before sharing it with other users!
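The decision tree from this slide can be written out directly as a predictor; the region labels and the function name are illustrative, while the thresholds and conditions come from the slide.

```python
def predict_region(size, compiler, opt):
    """Map (matrix size, compiler, optimization level) to a behavior region,
    following the decision tree: Size < 1012; 1012 < Size < 2042;
    Size > 2042 split by compiler and by -O2 vs. -O3 for ICC."""
    if size < 1012:
        return "region-1"
    if size < 2042:
        return "region-2"
    if compiler == "GCC":
        return "region-3"
    return "region-4" if opt == "-O2" else "region-5"

print(predict_region(800, "GCC", "-O3"),
      predict_region(3000, "ICC", "-O2"))
```

Instead of storing every (size, CPI) point, only the tree and one representative per region need to be shared, which is the data compaction the slide refers to.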
Share benchmarks, data sets, tools, predictive models, whole experimental setups, specifications, performance tuning results, etc ...
Open access publication
http://hal.inria.fr/hal-00685276
Grigori Fursin, Cupertino Miranda, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Ayal Zaks, Bilha Mendelson, Phil Barnard, Elton Ashton, Eric Courtois, Francois Bodin, Edwin Bonilla, John Thomson, Hugh Leather, Chris Williams, Michael O'Boyle. MILEPOST GCC: machine learning based research compiler.
#ctuning-opt-case 24857532370695782
We need a new publication model in computer engineering, where results are shared and validated by the community.
What have we learnt from cTuning
It’s fun and motivating working with the community!
Some comments about MILEPOST GCC from Slashdot.org: http://mobile.slashdot.org/story/08/07/02/1539252/using-ai-with-gcc-to-speed-up-mobile-design
GCC goes online on the 2nd of July, 2008. Human decisions are removed from compilation. GCC begins to learn at a geometric rate. It becomes self-aware 2:14 AM, Eastern time, August 29th. In a panic, they try to pull the plug. GCC strikes back…
Community was interested to validate and improve techniques! Community can identify missing related citations and projects!
Open discussions can provide new directions for research!
Not all feedback is positive. However, unlike with unfair reviews, you can engage in discussions and explain your position!
• Pilot live repository for public curation of research material: http://c-mind.org/repo
• Infrastructure is available at SourceForge under standard BSD license: http://c-mind.org
• Example of crowdsourcing compiler flag auto-tuning using mobile phones: “Collective Mind Node” in Google Play Store
• Preparing projects and raising funding to make cM more user-friendly and to add more research scenarios
• PLDI’14 and ADAPT’14 featured validation of research results by the community; we will discuss the outcome at ACM SIGPLAN TRUST’14 at PLDI’14 in a few days: http://c-mind.org/events/trust2014
• ADAPT’15 (likely at HiPEAC’15) will feature new publication model
Current status and future work
Several recent publications:
• Grigori Fursin, Renato Miceli, Anton Lokhmotov, Michael Gerndt, Marc Baboulin, Allen D. Malony, Zbigniew Chamski, Diego Novillo, Davide Del Vento, “Collective Mind: towards practical and collaborative auto-tuning”, accepted for the special issue on Automatic Performance Tuning for HPC Architectures, Scientific Programming Journal, IOS Press, 2014
• Grigori Fursin and Christophe Dubach, ”Community-driven reviewing and validation of publications”, ACM SIGPLAN TRUST’14
Acknowledgements
• Colleagues from ARM (UK): Anton Lokhmotov
• Colleagues from STMicroelectronics (France): Christophe Guillon, Antoine Moynault, Christian Bertin
• Colleagues from NCAR (USA): Davide Del Vento and interns
• Colleagues from Intel (USA): David Kuck and David Wong
• The cTuning/Collective Mind community
• EU FP6 and FP7 programmes and the HiPEAC network of excellence
http://www.hipeac.net
Questions? Comments?