Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. 2016-2672 C
Preparing Sandia's Application Portfolio for the Future Using Kokkos
Christian Trott, Daniel Sunderland, Carter Edwards, [email protected]
Center for Computing Research, Sandia National Laboratories, NM
SAND2017-2110C
New Programming Models
§ HPC is at a Crossroads
  § Diversifying Hardware Architectures
  § More parallelism necessitates paradigm shift from MPI-only
§ Need for New Programming Models
  § Performance Portability: OpenMP 4.5, OpenACC, Kokkos, RAJA, SYCL, C++20?, …
  § Resilience and Load Balancing: Legion, HPX, UPC++, ...
§ Vendor decoupling drives external development
My (slightly changed) Goal for the Talk: Describe what it took to get Kokkos accepted by legacy applications
Kokkos: Performance, Portability and Productivity
[Figure: LAMMPS, Sierra, Albany and Trilinos layered on Kokkos, above node architectures with DDR and HBM memory]
https://github.com/kokkos
Performance Portability through Abstraction

Kokkos

Parallel Execution
Execution Spaces (“Where”)
- N-Level
- Support Heterogeneous Execution
Execution Patterns (“How”)
- parallel_for/reduce/scan, task spawn
- Enable nesting
Execution Policies
- Range, Team, Task-Dag
- Dynamic / Static Scheduling
- Support non-persistent scratch-pads

Data Structures
Memory Spaces (“Where”)
- Multiple-Levels
- Logical Space (think UVM vs explicit)
Memory Layouts (“How”)
- Architecture dependent index-maps
- Also needed for subviews
Memory Traits
- Access Intent: Stream, Random, …
- Access Behavior: Atomic
- Enables special load paths: e.g. texture

Separation of Concerns for Future Systems…
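The "architecture dependent index-maps" item can be made concrete with a small sketch. The block below is not the Kokkos API: it is a hypothetical stand-in written with the standard library only. The tag names echo Kokkos's LayoutLeft and LayoutRight, while Array2D, fill, index, extent0/extent1 and raw are names assumed for illustration. It shows one kernel written against a(i, j) while the layout tag decides where each element lands in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout tags for this sketch; each maps (i, j) to a flat offset.
struct LayoutRight {  // row-major: the rightmost index is contiguous
  static std::size_t index(std::size_t i, std::size_t j,
                           std::size_t /*n0*/, std::size_t n1) {
    return i * n1 + j;
  }
};

struct LayoutLeft {  // column-major: the leftmost index is contiguous
  static std::size_t index(std::size_t i, std::size_t j,
                           std::size_t n0, std::size_t /*n1*/) {
    return i + j * n0;
  }
};

// Hypothetical 2D array whose storage order is chosen at compile time.
template <class Layout>
class Array2D {
 public:
  Array2D(std::size_t n0, std::size_t n1)
      : n0_(n0), n1_(n1), data_(n0 * n1, 0.0) {}

  double& operator()(std::size_t i, std::size_t j) {
    assert(i < n0_ && j < n1_);
    return data_[Layout::index(i, j, n0_, n1_)];
  }

  std::size_t extent0() const { return n0_; }
  std::size_t extent1() const { return n1_; }
  const double* raw() const { return data_.data(); }

 private:
  std::size_t n0_, n1_;
  std::vector<double> data_;
};

// The kernel is written once against a(i, j) and never sees the layout.
template <class Layout>
void fill(Array2D<Layout>& a) {
  for (std::size_t i = 0; i < a.extent0(); ++i)
    for (std::size_t j = 0; j < a.extent1(); ++j)
      a(i, j) = 10.0 * static_cast<double>(i) + static_cast<double>(j);
}
```

Calling fill on an Array2D<LayoutRight> and on an Array2D<LayoutLeft> of the same extents gives identical a(i, j) values but a different element order in raw(). That is the separation the slide describes: the loop body stays fixed and the index map is swapped per architecture.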
Timeline
Initial Kokkos: Linear Algebra for Trilinos
Restart of Kokkos: Scope now Programming Model
Mantevo MiniApps: Compare Kokkos to other Models
LAMMPS: Demonstrate Legacy App Transition
Trilinos: Move Tpetra over to use Kokkos Views
Multiple Apps start exploring (Albany, Uintah, …)
Sandia Multiday Tutorial (~80 attendees)
Sandia Decision to prefer Kokkos over other models
Github Release of Kokkos 2.0
Kokkos-Kernels and Kokkos-Tools Release
DOE Exascale Computing Project starts
Years on the timeline axis: 2008, 2011, 2012, 2013, 2014, 2015, 2016, 2017
Initial Demonstrations
§ Demonstrate Feasibility of Performance Portability
  § Development of a number of MiniApps from different science domains
§ Demonstrate Low Performance Loss versus Native Models
  § MiniApps are implemented in various programming models
§ DOE TriLab Collaboration
  § Show Kokkos works for other labs' apps
  § Note this is historical data: improvements were found, RAJA implemented similar optimizations, etc.
[Figure: LULESH Figure of Merit Results (Problem 60). Y axis: FOM (Z/s), 0 to 14000; higher is better. Platforms: HSW 1x16, HSW 1x32, P8 1x40 XL, KNC 1x224, ARM64 1x8, NV K40. Results by Dennis Dinge, Christian Trott and Si Hammond.]
Training the User-Base
§ Typical Legacy Application Developer
  § Science Background
  § Mostly Serial Coding (MPI apps usually have a communication layer few people touch)
  § Little hardware background, little parallel programming experience
§ Not sufficient to teach Programming Model Syntax
  § Need training in parallel programming techniques
  § Teach fundamental hardware knowledge (how do CPU, MIC and GPU differ, and what does it mean for my code)
  § Need training in performance profiling
§ Regular Kokkos Tutorials
  § ~200 slides, 9 hands-on exercises to teach parallel programming techniques, performance considerations and Kokkos
  § Held at GTC and SC; also at request of institutions
  § Now dedicated ECP Kokkos support project: develop online support community
Keeping Applications Happy
§ Never underestimate developers' ability to find new corner cases!!
  § Having a Programming Model deployed in MiniApps or a single big app is very different from having half a dozen multi-million line code customers.
  § 430 Issues in 22 months
  § ~25% are small enhancements
  § ~20% bigger feature requests
  § ~25% are bugs: often corner cases
[Figure: Issues since 2015. Y axis: Issues, 0 to 500. Categories: Other, Question, Compiler Issue, Bug, Feature Request, Enhancements.]
§ Example: Subviews
  § Initially data type needed to match, including compile time dimensions
  § Allow compile/runtime conversion
  § Allow Layout conversion if possible
  § Automatically find best layout
  § Add subview patterns
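Why subviews and layouts interact can be seen in a minimal sketch. Again this is not the Kokkos subview API: StridedView1D, ColMajorMatrix, row and column are hypothetical names, and the block assumes a column-major parent array. It shows that a column of such an array is contiguous (stride 1), while a row is a strided slice, so the type of a subview has to be able to carry a different layout than its parent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning 1D view: base pointer, length and element stride.
struct StridedView1D {
  double* base;
  std::size_t extent;
  std::size_t stride;

  double& operator()(std::size_t k) const {
    assert(k < extent);
    return base[k * stride];
  }
};

// Hypothetical column-major (n0 x n1) matrix owning a flat buffer.
struct ColMajorMatrix {
  std::size_t n0, n1;
  std::vector<double> data;

  ColMajorMatrix(std::size_t rows, std::size_t cols)
      : n0(rows), n1(cols), data(rows * cols, 0.0) {}

  double& operator()(std::size_t i, std::size_t j) { return data[i + j * n0]; }
};

// Column j is contiguous in a column-major parent: stride 1.
StridedView1D column(ColMajorMatrix& m, std::size_t j) {
  assert(j < m.n1);
  return StridedView1D{m.data.data() + j * m.n0, m.n0, 1};
}

// Row i touches every n0-th element: the subview needs a strided layout.
StridedView1D row(ColMajorMatrix& m, std::size_t i) {
  assert(i < m.n0);
  return StridedView1D{m.data.data() + i, m.n1, m.n0};
}
```

Both helpers alias the parent's storage, so writing through row(m, 1)(0) changes m(1, 0). The "automatically find best layout" item on the slide corresponds to choosing the stride-1 form whenever the slice allows it and falling back to a strided form otherwise.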
Testing and Software Quality
§ Programming Models are invasive
  § Reach many code locations: all parallelizable loops
  § Some take over low level data structures
  § Potentially costly to back out again
§ Performance Portability implies multiplatform
  § Much greater variety of compilers and architectures
  § Programming model needs to support union of customer needs
§ Testing on SNL Testbeds
  § Intel Haswell, KNL; IBM Power; Cavium ARM; NVIDIA Kepler, Pascal
  § 15 compilers (GCC, Intel, Clang, IBM, PGI)
  § >200 configurations every night
§ SEMS: Support Common Software Stack across SNL
  § Application teams don’t have the resources for multiple software stacks
  § Deliver tested compiler/TPL combinations across diverse machines
Building an EcoSystem
[Figure: ecosystem diagram with the following components]
Algorithms (Random, Sort)
Containers (Map, CrsGraph, Mem Pool)
Kokkos (Parallel Execution, Data Allocation, Data Transfer)
Kokkos – Kernels (Sparse/Dense BLAS, Graph Kernels, Tensor Kernels)
Kokkos – Tools (Kokkos aware Profiling and Debugging Tools)
Trilinos (Linear Solvers, Load Balancing, Discretization, Distributed Linear Algebra)
Kokkos – Support Community (Application Support, Developer Training)
Applications, MiniApps
std::thread, OpenMP, CUDA, ROCm
Necessary Resources
§ Long term development:
  § ~6 years effort so far
  § only now seriously working on major applications
§ Now more Resources for Support/Tools than core Model R&D
  § ~2 FTE on core Kokkos development
  § ~1.5 FTE application support
  § ~2 FTE on Tools and Kokkos Kernels
§ Diverse hardware resources for testing and development
  § Equivalent of 2-3 nodes for dedicated testing
  § ~5 different architecture testbeds for development
  § Beta access to all major HPC compilers
§ Intensive Collaboration with Vendors
  § Working on Compiler Bugs, Compiler improvements and new backends
Further Material
§ https://github.com/kokkos Kokkos Github Organization
  § Kokkos: Core library, Containers, Algorithms
  § Kokkos-Kernels: Sparse and Dense BLAS, Graph, Tensor (under development)
  § Kokkos-Tools: Profiling and Debugging
  § Kokkos-MiniApps: MiniApp repository and links
  § Kokkos-Tutorials: Extensive Tutorials with Hands-On Exercises
§ https://cs.sandia.gov Publications (search for ’Kokkos’)
  § Many Presentations on Kokkos and its use in libraries and apps
§ www.gputechconf.com/gtcnew/on-demand-gtc.php
  § Search for Kokkos: recorded talks on Kokkos and some usage
http://www.github.com/kokkos