Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Acknowledgments
Autotuning and Adaptivity Approach for Energy Efficient HPC Systems: The ANTAREX Toolbox
Cristina Silvano, Politecnico di Milano
http://www.antarex-project.eu/
Energy-efficient exascalesupercomputers • DARPA’s target of 20MW• 4x energy efficiency from
around 13 to 50 GFLOPS/W• Summit (June 2018) 1st-Top500
5th-Green500 122 PetaFLOPS13.8 GFLOPS/W
ANTAREX: The context
Programmability and easy application tuning• Develop HPC applications is a
complex tasks, which requires• several specialized languages• tools for performance tuning
• Incompatible with the current trend to open HPC infrastructures to a wide range of users Next generation supercomputers need to be
coupled with a radically new software stackcapable of exploiting the benefits offered by
heterogeneity to meet programmability, scalability and energy efficiency required by the Exascale era.
Energy-efficient exascalesupercomputers • DARPA’s target of 20MW• 8x energy efficiency from
around 6 to 50 GFlops/W• Sunway TaihuLight (Nov. 2017)
1st-Top500 20th-Green500
ANTAREX: The context
Programmability and easy application tuning• Develop HPC applications is a
complex tasks, which requires• several specialized languages• tools for performance tuning
• Incompatible with the current trend to open HPC infrastructures to a wider range of users Next generation supercomputers need to be
coupled with a radically new software stackcapable of exploiting the benefits offered by
heterogeneity to meet programmability, scalability and energy efficiency required by the Exascale era.
ANTAREX wants to provide a breakthrough approach • to express by a Domain Specific Language the
application self‐adaptivity and extra‐functional requirements
• to runtime manage, monitor and autotune applicationsfor green HPC systems up to the Exascale level
ANTAREX wants to provide a breakthrough approach • to express by a Domain Specific Language the
application self‐adaptivity and extra‐functional requirements
• to runtime manage, monitor and autotune applicationsfor green HPC systems up to the Exascale level
ANTAREX Tool flowC/C++ w/ OpenMP, MPI, OpenCL
functional description
ANTAREX Tool flowC/C++ w/ OpenMP, MPI, OpenCL
functional description
mARGOt
LARA/Clava
ExaMon
libVersioningCompiler
LARA DSL for Extra-Functional Specifications
DSL WEAVER
Functional Description Extra‐Functional Requirements E.g. Adaptivity features, Tuning Knobs,
Power/Performance constraints
DSL LARAC/C++ w/ OpenMP,
MPI, OpenCL
• Enable separation of concerns: functional and extra-functional descriptions are decoupled.
• Useful to express strategies forinstrumentation and profiling
• Support for explore compileroptimization space, accordingto code and target architectures
• Enable more advanced controlthan using pragmas/directives/ switches (e.g. considering application parameters and mapping control)
• Enable separation of concerns: functional and extra-functional descriptions are decoupled.
• Useful to express strategies forinstrumentation and profiling
• Support for explore compileroptimization space, accordingto code and target architectures
• Enable more advanced controlthan using pragmas/directives/ switches (e.g. considering application parameters and mapping control)
LARA Design Benefits from Aspect Orientation Reusable Strategies
Custom Targetability
Design Exploration
LARA-based Tool Flow
Application (C, Java, MATLAB)
Aspects / Strategies (LARA)
Library of Aspects / Strategies
Compiler Toolset
Code Output Analysis Output
LARA Instrumentation Example
LARA Instrumentation Example
Used to manage precision tuning, application and compiler autotuning, code transformations, code
versioning, memoization, application monitoring, etc.
Used to manage precision tuning, application and compiler autotuning, code transformations, code
versioning, memoization, application monitoring, etc.
Loop Unrolling: Aspects and Strategies
LARA: “recipes” for compiler optimizations
aspectdef LoopUnroll
select loop end
apply
if($loop.num_iterations <= 32) {
$loop.exec Unroll(0);
} else {
$loop.exec Unroll(2);
}
end
condition
$loop.is_innermost &&
$loop.type=="for"
end
end
Advices (actions)
Program elements
Condition
Apply dynamic choices from algorithm parameters to compiler and mapping optimizationsForms of runtime adaptivity include:
• Modifications of application parameters (attributes)• Selection among different algorithms for solving the same problem• Different compiler optimizations for the same algorithm• Runtime strategies for partitioning and for mapping computations
targeting hardware accelerators• Runtime management of system resources
LARA: Runtime Adaptivity Dimensions
Extending LARA with native support for runtime adaptivity strategies
ANTAREX Runtime Framework
CoolingNode1 Node2 NodeNNodei
Rack1 RackK
J1J2
J3JN
Job SchedulerAllocator
J1 JN
M M M M
AM
P P P P
COff
C
AA AM
COn
A
C
P
COff
COn
MAM
Autotuner
Collector
Power Manager
Offline Compiler
Online Compiler
Hardware Monitor
Application Monitor
Compile Time
Deploy Time
Run Time
COn
M
C DB
mARGOt: ANTAREX Application AutotunermARGOt enables self-optimization at application-level by:
– SW Knobs tunable at runtime (such as parallelism and code versioning)
– Application knobs to tune the approximation by trading off accuracy vs throughput: output just needs to be “good enough”
– Feed-back loop to make the application behaviour self-aware
ExaMon: Monitoring System & Power Management
DB
CassandraKairosDBCassandraKairosDB
ANTAREX Power Management
Power capping strategies to select the best performance point for each core given a power constraint.
AutoTuning Geometric Docking for HPC-based Drug Discovery
Personalized Medicine will enable to “treat the right patient with the right drug at the right dose at the right time." [FDA]
Use Case 1: HPC Accelerated Drug Discovery System
Need of HPC in Drug Discovery: HPC Molecular Simulations
Molecular Docking is a method to calculate the preferred position and shape of one molecule to a second when bound to each other• Geometric Docking
• Shape Complementarity: geometric matching search to find out compatible pairs and most suitable poses
• Pharmacophoric Docking• Molecular Simulation: exploration of a large energy
landscape determined by chemical and physical interactions
Molecular Docking
Developing energy and resource efficient algorithmsUsing self‐functionalities to adapt and scale‐out LiGen application
LiGen: Exascale‐ready HPC application Tangible Chemical Space: 300 Bio
Peptide library: 120 Mio
Commercial libraries: 6 Mio
SafeInMan library: 9 K
Exascale HPC Virtual ScreeningExascale HPC Virtual Screening
LiGen in ANTAREX
Large experimental campaign1Bio ligands database currently running on 250K CPUs MARCONI supercomputer at CINECA
AutoTuning Time-Dependent Routing Algorithm in HPC based
Navigation System
Smart City Challenge exploiting synergies betweenclient‐side and server‐side: • Many drivers – many routing requests to HPC system• Multiple traffic status data sources• Continuous update of traffic flow calculation
Use Case 2: Self-adaptive Navigation System
10
HPCSygic Company develops world`s most popular navigation application & provides professional
navigation software for business solutions
Smart City Challenges • Navigate hundred thousands of drivers/cars
within city area• Avoid traffic jams• Intelligent routing – Balance routes for a city
global optimum • Accurate calculation of traffic flow – by
exploiting big data• Minimize data transfer• To serve all drivers’ requests with global best
under variable conditions• Large scale graph problems (e.g.
Betweenness centrality) with data updates• Around 12M vertex and 30M edges for ¼ Europe
Intelligent Navigation for Smart Cities
What is the time dependent routing part of a navigation system?• It is the module responsible for determining the expected travel time• Given the client-server navigation infrastructure, it is on the server side to
have accurate travel time evaluations with updated traffic information• It is implemented by using MonteCarlo Simulation (MCS) over the
probabilistic speed profile on each path segment
Probabilistic Time Dependent Routing
Travel‐Time Probability Distribution
#Samples
MCSMCSTime + Path + Speed Profiles
Large experimental campaign currently running on IT4I supercomputer
http://www.antarex-project.eu/