Research on the Path to Exascale at NERSC
Alice E. Koniges
Sustainable Research Pathways Workshop, December 8, 2015
• National Energy Research Scientific Computing Center
  – Established 1974, first unclassified supercomputer center
  – Original mission: to enable computational science as a complement to magnetically controlled plasma experiments
• Today’s mission: Accelerate scientific discovery at the DOE Office of Science through high performance computing and extreme data analysis
• A national user facility
NERSC
Trajectory of an energetic ion in a Field Reversed Configuration (FRC) magnetic field. Magnetic separatrix denoted by green surface. Spheres are colored by azimuthal velocity. Image courtesy of Charlson Kim, U. of Washington; NERSC repos m487, mp21, m1552
• Diverse workload:
  – 4,500 users, 700+ projects
  – 700 codes; 100s of users daily
• Allocations controlled primarily by DOE
  – 80% DOE Annual Production awards (ERCAP):
    • From 10K to ~10M hours
    • Proposal-based; DOE chooses
  – 10% DOE ASCR Leadership Computing Challenge
  – 10% NERSC reserve
    • NISE, NESAP
NERSC: Mission Science Computing for the DOE Office of Science
Collision between two shells of matter ejected in two supernova eruptions, showing a slice through a corner of the event. Colors represent gas density (red is highest, dark blue is lowest). Image courtesy of Ke-Jung Chen, School of Physics and Astronomy, Univ. Minnesota. Repo m1400
DOE View of Workload
ASCR: Advanced Scientific Computing Research
BER: Biological & Environmental Research
BES: Basic Energy Sciences
FES: Fusion Energy Sciences
HEP: High Energy Physics
NP: Nuclear Physics
NERSC Allocations by DOE Office: ASCR 6%, BER 18%, BES 31%, FES 20%, HEP 13%, NP 12%
NERSC Roadmap
(peak Teraflop/s by year, 2006-2020)
• Franklin (N5) + QC: 36 TF sustained, 352 TF peak
• Hopper (N6): 1.25 PF peak
• Edison (N7): 2.4 PF peak
• Cori (N8): 10-30x Hopper
• N9: 200-500 PF peak
• N10: ~1 EF peak
• CRT Facility
The Big Picture: KNL is Coming to NERSC
• The next large NERSC production system, "Cori", will be the Intel Xeon Phi KNL (Knights Landing) architecture
  – Energy-efficient manycore system
  – >9,300 single-socket nodes, multiple NUMA domains
  – Self-hosted, not an accelerator
  – 72 cores/node, 4 hardware threads per core; total of 288 threads per node
  – AVX-512, larger vector length of 512 bits (8 double-precision elements)
  – On-package high-bandwidth memory (HBM)
  – Burst Buffer
Edison / Cori Quick Comparison

Edison (Ivy Bridge), NERSC Cray XC30 system:
• 12 cores per CPU
• 24 logical cores per CPU
• 2.4-3.2 GHz
• Vector length of 256 bits, 4 double-precision ops per cycle (+ multiply/add)
• 2.5 GB of memory per core
• ~100 GB/s memory bandwidth

Cori (Knights Landing):
• 72 physical cores per CPU
• 288 logical cores per CPU
• Much slower clock rate (GHz)
• Vector length of 512 bits, 8 double-precision ops per cycle (+ multiply/add)
• <0.3 GB of fast memory per core
• <2 GB of slow memory per core
• Fast memory has ~5x DDR4 bandwidth
• Burst Buffer for fast I/O
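One way to place bandwidth-critical data in the fast on-package memory is the memkind library's hbwmalloc interface; the sketch below is not from the slides and assumes memkind is installed, but the calls shown (hbw_check_available, hbw_malloc, hbw_free) are the library's documented entry points.

#include <hbwmalloc.h>   // memkind's high-bandwidth-memory (MCDRAM) allocator
#include <cstdio>
#include <cstdlib>

int main() {
    const std::size_t n = 1 << 24;                    // one bandwidth-critical array
    const bool have_hbm = (hbw_check_available() == 0);

    // Place the hot array in fast memory when available, otherwise fall back to DDR.
    double* a = have_hbm
        ? static_cast<double*>(hbw_malloc(n * sizeof(double)))
        : static_cast<double*>(std::malloc(n * sizeof(double)));
    if (!a) return 1;

    for (std::size_t i = 0; i < n; ++i) a[i] = 1.0;   // streaming traffic is what benefits from the ~5x DDR bandwidth
    std::printf("first element: %f\n", a[0]);

    if (have_hbm) hbw_free(a); else std::free(a);
    return 0;
}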
NESAP codes by DOE office:
• Advanced Scientific Computing Research: BoxLib AMR Framework, Chombo-Crunch
• High Energy Physics: WARP & IMPACT, MILC, HACC
• Nuclear Physics: MFDn, Chroma, DWF/HISQ
• Basic Energy Sciences: Quantum ESPRESSO, BerkeleyGW, PARSEC, NWChem, EMGeo
• Biological and Environmental Research: Gromacs, Meraculous, MPAS-O, ACME, CESM
• Fusion Energy Sciences: M3D, XGC1
NESAP: NERSC Exascale Science Application Program
Jack Deslippe, Rebecca Hartman-Baker
Programming Considerations: Running on Cori
• Applications are very likely to run on KNL with simple porting, but high performance is harder to achieve.
• Applications need to explore more on-node parallelism with thread scaling and vectorization, and also to utilize the HBM and Burst Buffer options.
• Many applications will not fit into the memory of a KNL node using pure MPI across all HW cores and threads, because of the memory overhead of each MPI task.
• Hybrid MPI/OpenMP is the recommended programming model, to achieve scaling capability and code portability (a minimal sketch follows below).
• Current NERSC systems (Edison/Hopper and Babbage) can help prepare codes for Cori.
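To make the recommended model concrete, here is a minimal hybrid MPI/OpenMP sketch in C++ (illustrative only, not from the slides): a few MPI ranks per node, each using OpenMP threads for on-node parallelism, so the per-task MPI memory overhead is paid once per rank rather than once per core.

#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char** argv) {
    int provided = 0;
    // Request thread support so OpenMP threads can coexist with MPI calls made by the master thread.
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank = 0, nranks = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local_sum = 0.0;
    // On-node parallelism: threads share the rank's memory instead of adding more MPI tasks.
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < 1000000; ++i)
        local_sum += 1.0 / (1.0 + i);

    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("ranks=%d threads/rank=%d sum=%f\n", nranks, omp_get_max_threads(), global_sum);
    MPI_Finalize();
    return 0;
}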
Lots of cores with in-package memory
Source: Avinash Sodani, Hot Chips 2015 KNL talk

Connecting tiles
Source: Avinash Sodani, Hot Chips 2015 KNL talk
To run effectively on Cori users will have to:
• Manage domain parallelism
  – independent program units; explicit
• Increase node parallelism
  – independent execution units within the program; generally explicit
• Exploit data parallelism
  – same operation on multiple elements
• Improve data locality
  – cache blocking; use on-package memory (a combined sketch of the on-node levels follows after the figure below)
[Figure: domain parallelism across MPI ranks owning x, y, z subdomains; node parallelism from threads within each rank; data parallelism illustrated by a vectorizable loop:]

   DO I = 1, N
      R(I) = B(I) + A(I)
   ENDDO
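A small illustrative C++ sketch (not from the slides) of the on-node levels: OpenMP threads supply node parallelism, the inner simd loop exposes data parallelism to the 512-bit vector units, and tiling over BS x BS blocks is the cache blocking that improves data locality. The kernel and tile size are arbitrary choices for illustration.

#include <omp.h>
#include <algorithm>
#include <vector>
#include <cstdio>

// Cache-blocked, threaded, vectorized transpose: B = A^T for an n x n matrix.
void transpose_blocked(const std::vector<double>& A, std::vector<double>& B, int n) {
    const int BS = 64;  // tile size chosen so a pair of tiles stays cache-resident
    #pragma omp parallel for collapse(2) schedule(static)  // node parallelism: OpenMP threads over tiles
    for (int ii = 0; ii < n; ii += BS) {
        for (int jj = 0; jj < n; jj += BS) {
            for (int i = ii; i < std::min(ii + BS, n); ++i) {
                #pragma omp simd                            // data parallelism: vector units
                for (int j = jj; j < std::min(jj + BS, n); ++j) {
                    B[j * n + i] = A[i * n + j];            // data locality: work stays inside the tile
                }
            }
        }
    }
}

int main() {
    const int n = 2048;
    std::vector<double> A(n * n, 1.0), B(n * n, 0.0);
    transpose_blocked(A, B, n);
    std::printf("B[0] = %f\n", B[0]);
    return 0;
}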
CASE STUDY: XGC1 PIC Fusion Code
• Particle-in-cell code used to study turbulent transport in magnetic confinement fusion plasmas.
• Uses a fixed unstructured grid. Hybrid MPI/OpenMP for both spatial grid and particle data (plus PGI CUDA Fortran, OpenACC).
• Excellent overall MPI scalability.
• Internal profiling timer borrowed from CESM.
• Uses PETSc Poisson solver (separate NESAP effort).
• 60k+ lines of Fortran 90 code.
• For each time step (see the schematic sketch below):
  – Deposit charges on the grid
  – Solve an elliptic equation to obtain the electromagnetic potential
  – Push particles to follow trajectories using forces computed from the background potential (~50-70% of time)
  – Account for collision and boundary effects on the velocity grid
• Most time is spent in Particle Push and Charge Deposition.
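A schematic of that per-step structure, for orientation only; XGC1 itself is Fortran 90, and the kernel names and C++ stubs below are hypothetical stand-ins for the real routines.

#include <vector>
#include <cstdio>

struct Particle { double x[3]; double v[3]; double weight; };
using Grid = std::vector<double>;

// Stub kernels; each comment states what the corresponding XGC1 phase does.
void deposit_charge(const std::vector<Particle>&, Grid&) { /* scatter particle charge onto the grid */ }
void solve_potential(const Grid&, Grid&) { /* elliptic (Poisson) solve; PETSc in XGC1 */ }
void push_particles(std::vector<Particle>&, const Grid&, double) { /* trajectory integration; ~50-70% of runtime */ }
void collisions_and_boundaries(std::vector<Particle>&) { /* collision operator on velocity grid, boundary effects */ }

int main() {
    std::vector<Particle> particles(1000);
    Grid rho(4096), phi(4096);
    const double dt = 1.0e-7;

    for (int step = 0; step < 10; ++step) {
        deposit_charge(particles, rho);          // charge deposition (hot spot)
        solve_potential(rho, phi);               // field solve
        push_particles(particles, phi, dt);      // particle push (dominant hot spot)
        collisions_and_boundaries(particles);
    }
    std::printf("completed 10 steps\n");
    return 0;
}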
Unstructured triangular mesh grid due to complicated edge geometry
Sample matrix of communication volume
Programming Portability
• Currently XGC1 runs on many platforms
• Part of the NESAP and ORNL CAAR programs
• Applied for the ANL Theta program
• Previously used PGI CUDA Fortran for accelerators
• Exploring OpenMP 4.0 target directives and OpenACC
• Has #ifdef _OPENACC and #ifdef _OPENMP guards in the code (see the sketch below)
• Hope to have as few compiler-dependent directives as possible
• Nested OpenMP is used
• Needs thread-safe PSPLIB and PETSc libraries
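A minimal sketch of guarding directives with the standard feature-test macros (_OPENACC is defined by OpenACC compilers, _OPENMP by OpenMP compilers). XGC1 does this in Fortran; the C++ analogue below is illustrative only.

#include <cstdio>

void scale(double* a, int n, double s) {
#if defined(_OPENACC)
    // OpenACC build: offload the loop to an accelerator
    #pragma acc parallel loop copy(a[0:n])
    for (int i = 0; i < n; ++i) a[i] *= s;
#elif defined(_OPENMP)
    // OpenMP build: thread the loop on the host (e.g. KNL)
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) a[i] *= s;
#else
    // Serial fallback keeps the code compilable everywhere
    for (int i = 0; i < n; ++i) a[i] *= s;
#endif
}

int main() {
    double a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    scale(a, 8, 2.0);
    std::printf("a[7] = %f\n", a[7]);
    return 0;
}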
New algorithms should work in concert with new exascale operating systems: ParalleX Execution Model
• Lightweight multi-threading
  – Divides work into smaller tasks
  – Increases concurrency
• Message-driven computation
  – Move work to data
  – Keeps work local, stops blocking
• Constraint-based synchronization
  – Declarative criteria for work
  – Event driven
  – Eliminates global barriers
• Data-directed execution
  – Merger of flow control & data structure
• Shared name space
  – Global address space
  – Simplifies random gathers
Thomas Sterling et al., IU and XPRESS
HPX and Related Application Development
– Explore an app development alternative to "traditional MPI+X".
– Question: can a qualitatively different approach (ParalleX-based):
  • Exploit untapped and new parallelism?
  • Improve expressibility?
  • Improve productivity?
  • Get us to exascale and beyond?
– Broad sampling of app domains & algorithms:
  • Plasma physics, many-body & particle-in-cell (PIC)
  • Nuclear engineering & finite volume / eigensolvers
  • Shock physics & finite element / explicit time integration
  • Computational mechanics & implicit sparse solvers
– Full team effort involving app designers, the XPRESS team, HPX and ParalleX developers, and compiler and tools developers
Application Perspective
• Use of futures (see the sketch below):
  – Exploit previously inaccessible, fine-grain dynamic parallelism.
  – A natural framework for expressing data-driven parallelism.
• Better than MPI:
  – Go beyond a functional mimic of MPI.
  – Really use the Active Global Address Space (AGAS).
  – Take advantage of fine-grained parallelism using a generalized concept of threads.
• Overarching application team goal:
  – Demonstrate that ParalleX-based approaches work.
  – Superior to MPI+X in one or more metrics:
    • Performance: extracting latent parallelism.
    • Portability: performance obtained from the system's underlying runtime.
    • Productivity: easier to write, understand, maintain.
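A minimal sketch of the futures style in HPX (illustrative, not from the slides; hpx::async, hpx::future, and hpx::when_all are the library's standard-conforming API, but header paths vary across HPX releases, and this assumes the hpx_main convention in which main() runs as an HPX thread).

#include <hpx/hpx_main.hpp>
#include <hpx/include/lcos.hpp>
#include <tuple>
#include <cstdio>

double flux(double left, double right) { return 0.5 * (left + right); }

int main() {
    // Two independent quanta of work run as lightweight HPX tasks.
    hpx::future<double> fa = hpx::async([] { return 1.0; });
    hpx::future<double> fb = hpx::async([] { return 2.0; });

    // The continuation fires only when both inputs are ready: synchronization
    // is expressed as a data dependence rather than a global barrier.
    hpx::future<double> fc =
        hpx::when_all(fa, fb).then([](auto both) {
            auto ready = both.get();   // tuple of ready futures
            return flux(std::get<0>(ready).get(), std::get<1>(ready).get());
        });

    std::printf("flux = %f\n", fc.get());
    return 0;
}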
The Ideal HPX Model from an Application Perspective
• Functionalize: figure out what a quantum of work is
• Determine data dependencies
• Create a dataflow structure (see the sketch below)
• Feed it into the data task manager
• Applications
  – Sweet spot between grain sizes for the various task granularities.
  – Strong scaling improves as you add extra levels of refinement, the opposite of what you see with MPI, giving more usable work to the simulation.
  – Lots of new results at SC15.
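As an illustration of that workflow (an assumption here is HPX's dataflow facility, spelled hpx::lcos::local::dataflow in older releases), the sketch below functionalizes two quanta of work and lets the runtime invoke the dependent step as soon as both inputs are ready.

#include <hpx/hpx_main.hpp>
#include <hpx/include/lcos.hpp>
#include <utility>
#include <cstdio>

double step_a() { return 3.0; }   // one functionalized quantum of work
double step_b() { return 4.0; }   // another, independent of step_a

// The dependent step receives its inputs as (ready) futures.
double combine(hpx::future<double> a, hpx::future<double> b) {
    return a.get() * b.get();
}

int main() {
    auto fa = hpx::async(step_a);
    auto fb = hpx::async(step_b);

    // dataflow defers combine until both futures are ready; the task manager
    // schedules it, so no explicit synchronization is written by the application.
    hpx::future<double> fc = hpx::dataflow(&combine, std::move(fa), std::move(fb));

    std::printf("result = %f\n", fc.get());
    return 0;
}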
NERSC is a great place for research on applications and programming models
• Access to the latest hardware
• Application experts in a variety of domains
• NESAP program with post-docs and staff
• Participates in OpenMP and MPI language standards
• Cross-DOE projects on next-generation runtimes and programming models