© 2017 Arm Limited
Debugging and Profiling your HPC Applications
Srinath Vadlamani, Field Application [email protected]
Aug. 8, ATPESC_2017
About this talk
Techniques not tools
• Learn ways to debug and profile your code
Use tools to apply techniques
• Debugging with Allinea DDT
• Benchmarking with Allinea Performance Reports
• Profiling with Allinea MAP
• Go to www.allinea.com/trials
Tools are available on the ATPESC machines
Motivation
HPC systems are finite
• Limited lifetime to achieve most science possible
• Sharing a precious resource means your limited allocation needs to be used well
Your time is finite
• PhD to submit
• Project to complete
• Paper to write
• Career to develop
Doing good things with HPC means creating better software, faster
• Being smart about what you’re doing
• Using the tools that help you apply smart techniques
Real-world example
• Bioinformatics: Discover Assembly, 3x speedup, EC2
• Deep Learning: Torch + DeepMind, 5.3x speedup, Intel Xeon Phi (KNL)
• Fluid Dynamics: HemeLB blood flow, 16.8x capability boost, 50k core crash fixed
Debugging in practice…
• Run
• Crash
• Hypothesis
• Insert print statements
• Compile
Optimization in Practice
• Insert timers
• Run code
• Analyze performance result
• Change code
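The "insert timers" step of this loop can be sketched as below. This is a minimal illustration, not part of the talk: the `timer` helper, the `report` function, and the `compute` kernel are hypothetical names, and the sketch assumes a single-process Python program where wall-clock time per labelled region is good enough.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds and call counts per labelled region.
totals = defaultdict(float)
counts = defaultdict(int)


@contextmanager
def timer(label):
    """Time the enclosed block and add the result to the running totals."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start
        counts[label] += 1


def report():
    """Return (label, seconds, calls) tuples, slowest region first."""
    rows = [(label, totals[label], counts[label]) for label in totals]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def compute(n):
    # Hypothetical stand-in for the real numerical kernel.
    return sum(i * i for i in range(n))


for step in range(3):
    with timer("compute"):
        compute(10_000)
    with timer("output"):
        _ = str(step)  # stand-in for writing results

for label, seconds, calls in report():
    print(f"{label:10s} {seconds:.6f} s over {calls} calls")
```

Hand-placed timers like these only measure the regions you thought to label, which is the limitation the manual loop on this slide illustrates.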
About those techniques…
“No-one cares how quickly you can compute the wrong answer”
• Old saying of HPC performance experts
Let’s start with debugging then…
Some types of bug
• Bohrbug: Steady, dependable bug
• Heisenbug: Vanishes when you try to debug (observe)
• Mandelbug: Complexity and obscurity of the cause is so great that it appears chaotic
• Schroedinbug: First occurs after someone reads the source file and deduces that the code should have never worked, after which the program ceases to work until fixed
Debugging
The art of transforming a broken program to a working one:
Debugging requires thought – and discipline:
• Track the problem
• Reproduce
• Automate – (and simplify) the test case
• Find origins – where could the “infection” be from?
• Focus – examine the origins
• Isolate – narrow down the origins
• Correct – fix and verify the test case is successful
Suggested Reading:
• Andreas Zeller, “Why Programs Fail”, 2nd Edition, 2009
What you will read:
• Crowdsources like stackoverflow
Popular techniques
Automation
• Test cases
• Bisection via version control
Observation
• Print statements
• Debuggers
Inspiration
• Explaining the source code to a duck
Magic
• Static analysis
• Memory debugging
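Bisection via version control is a binary search over the revision history for the first revision where a test fails. The sketch below shows the idea under stated assumptions: `first_bad` and `is_bad` are hypothetical names, the first revision is known good, the last is known bad, and once the bug appears it stays in every later revision. Tools such as `git bisect` automate this search against a real repository.

```python
def first_bad(revisions, is_bad):
    """Return the index of the first revision for which is_bad(rev) is True.

    Assumes revisions[0] is good, revisions[-1] is bad, and that every
    revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the bug was introduced at or before mid
        else:
            lo = mid  # the bug was introduced after mid
    return hi


# Hypothetical history of 100 revisions where the bug arrives in "r37".
revisions = [f"r{i}" for i in range(100)]
tested = []


def is_bad(rev):
    # Stand-in for "check out rev, build it, run the automated test case".
    tested.append(rev)
    return int(rev[1:]) >= 37


culprit = revisions[first_bad(revisions, is_bad)]
print(culprit, "found after", len(tested), "test runs")
```

The search needs roughly log2(N) builds and test runs for N revisions, which is why it depends on having an automated test case first.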
Solving Software Defects
Who had a rogue behavior?
• Merges stacks from processes and threads
Where did it happen?
• Leaps to source
How did it happen?
• Diagnostic messages
• Some faults evident instantly from source
Why did it happen?
• Unique “Smart Highlighting”
• Sparklines comparing data across processes
Workflow:
• Run with Allinea tools
• Identify a problem
• Gather info: Who, Where, How, Why
• Fix
Favorite Allinea DDT Features for Scale
• Parallel stack view
• Automated data comparison: sparklines
• Parallel array searching
• Step, play, and breakpoints
• Offline debugging
6 steps to help improve performance
• Get a realistic test case
• Profile your code
• Look for the significant
• What is the nature of the problem?
• Apply brain to solve
• Bottle it
Logging like an experiment is useful.
Bottling it…
• Lock in performance once you have won it
– Save your nightly performance
– Tie your performance results to your continuous integration server
• Lock in the bug fixes
– Save the test cases
– Tie the test cases to your continuous integration server
• Regression tests do keep you from regressing!
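One way to "bottle" a performance win is a check that compares a fresh timing against a saved baseline and fails the build when it slips. The sketch below is an illustration only: `measure`, `check_against_baseline`, the 10% tolerance, and the in-memory `baseline` dictionary are all assumptions, and a real setup would load and save the baseline from a file kept with the nightly results.

```python
import statistics
import time


def measure(fn, repeats=5):
    """Return the median wall-clock time of fn() over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def check_against_baseline(name, seconds, baseline, tolerance=0.10):
    """Return True if `seconds` is within `tolerance` of the saved baseline.

    A test seen for the first time is recorded and passes.
    """
    if name not in baseline:
        baseline[name] = seconds
        return True
    return seconds <= baseline[name] * (1.0 + tolerance)


def kernel():
    # Hypothetical stand-in for the code whose performance was won.
    return sum(i * i for i in range(50_000))


baseline = {}  # in practice, loaded from a file saved by the nightly run
elapsed = measure(kernel)
ok = check_against_baseline("kernel", elapsed, baseline)
print("kernel:", f"{elapsed:.6f} s,", "ok" if ok else "REGRESSION")
```

A continuous integration job can then treat a `False` result like any other failing test. Timings on shared machines are noisy, so the tolerance and the number of repeats need tuning per system.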
How The Tools Fit…
Forge
• Performance Reports: Measure
• DDT: Debug
• MAP: Profile and Optimize
Demand for software efficiency
• Pull for MAP to develop performance fix
• Leads to DDT to understand and fix
Debug, optimize, edit, commit, build, repeat…
• Demand for performance optimization
• Demand for debugging
Demand for developer efficiency
• Leads to MAP to optimize performance
• Version Control
• Continuous Integration
• Open Interfaces (e.g. JSON APIs)
How to help scientific developers best?
You can teach a man to fish
But first he must realize he is hungry
Image © Kanani CC-BY
Communicate the benefits of optimization
Show, don’t tell…
… this is your code on “-O0”, i.e. no optimizations
Show performance they understand
“Vectorization, how does it work?”
Communicating at the right level
[Figure: out-of-order vs. pipelined; time per retired instruction]
Explaining performance at the right level
Caption + simple, actionable advice
Compiler advice is your friend.
Vectorization, MPI, I/O, memory, energy…
Accelerator support…
Application Development Workflow
• Profiling
• Optimization
• Execution
• Debugging
• Coding
Hello Allinea Forge!
• Observe and debug your code step by step
• Flick to Allinea DDT
• Common interface and settings files
• Increasing memory usage? Memory leak!
• Workload imbalance? Possible partitioner bug!
• Allinea MAP to find performance bottleneck
HPC means being productive on remote machines
• Linux
• OS/X
• Windows
• Multiple hop SSH
• RSA + Cryptocard
• Uses server license
MAP in a nutshell
• Small data files
• <5% slowdown
• No instrumentation
• No recompilation
Above all…
Aimed at any performance problem that matters
• MAP focuses on time
Does not prejudge the problem
• Doesn’t assume it’s MPI messages, threads or I/O
If there’s a problem...
• MAP shows you it, next to your code
Scaling issue – 512 processes
Simple fix… reduce periodicity of output
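The fix named on this slide amounts to writing output every N steps instead of every step. The sketch below is a hypothetical illustration, not the application profiled in the talk: `run`, `output_every`, and the `write` callback are assumed names, and the loop body is a placeholder for the real computation.

```python
def run(steps, output_every, write):
    """Advance a toy simulation, calling write() only every `output_every` steps.

    The final step is always written so the end state is never lost.
    Returns the number of writes performed.
    """
    state = 0.0
    writes = 0
    for step in range(steps):
        state += 1.0  # stand-in for the real computation
        if step % output_every == 0 or step == steps - 1:
            write(step, state)
            writes += 1
    return writes


records = []
every_step = run(100, 1, lambda step, state: None)
every_tenth = run(100, 10, lambda step, state: records.append((step, state)))
print(every_step, "writes reduced to", every_tenth)
```

When every process writes at every step, output cost grows with both step count and process count, so raising the interval is often the cheapest change to try first.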
Deeper insight into CPU usage
Runtime of application still unusually slow
Allinea MAP identifies vectorization close to zero
Why? Time to switch to a debugger!
While still connected to the server we switch to the debugger
It’s already configured to reproduce the profiling run
Today’s Status on Scalability
Debugging and profiling
• Active users at 100,000+ cores debugging
• 50,000 cores was largest profiling tried to date (and was Very Successful)
• … and active users with just 1 process too
Deployed on
• NERSC Cori, ORNL’s Titan, NCSA Blue Waters, ANL Mira etc.
• Hundreds of much smaller systems – academic, research, oil and gas, genomics, etc.
Tools help the full range of programmer ambition
• Very small slowdown with either tool (<5%)
Five great things to try with Allinea DDT
• The scalable print alternative
• Stop on variable change
• Static analysis warnings on code errors
• Detect read/write beyond array bounds
• Detect stale memory allocations
Six Great Things to Try with Allinea MAP
• Find the peak memory use
• Fix an MPI imbalance
• Remove I/O bottleneck
• Make sure OpenMP regions make sense
• Improve memory access
• Restructure for vectorization
Getting started on Theta
Install local client on your laptop
• www.allinea.com/products/forge/downloads
– Linux – installs full set of tools
– Windows, Mac – just a remote client to the remote system
• Run the installation and software
• “Connect to remote host”
• Hostname:
• Remote installation directory: /soft/debuggers/forge-7.0.6-2017-08-07/
• Click Test
Congratulations, you are now ready to debug on Theta.
Hands on Session
Use Allinea DDT on your favorite system to debug your code – or example codes
Use Allinea MAP or Performance Reports on Cooley to see your code performance
Use Allinea DDT and Allinea MAP together to improve our test code
• Download examples from www.allinea.com - Trials menu, Resources – “trial guide”
Thank you for your attention!
Contact:
Download a trial for ATPESC (or later)
• http://www.allinea.com/trials