RedLine Performance Methodology (RPM): Getting Maximum Performance Out of HPC Systems


RedLine Performance Solutions is a world-class provider of high-performance computing (HPC) solutions. Our promise: to ensure objectively engineered, top-quality solutions at every phase of the HPC lifecycle, minimizing labor, time, and costs. Our proprietary RedLine Performance Methodology (RPM) – developed over two decades of working with HPC systems and applications and updated regularly with lessons from each new engagement – delivers unique benefits that consistently maximize customer success.

Written By:

Chris Young

Keith Ball

Andrew Qualkenbush

Mark Potts

Carolyn Pasti

Don Avart


HPC system performance matters – a lot.

That’s why we at RedLine Performance Solutions (RedLine) have dedicated our business to ensuring our customers’ HPC systems perform at their very highest level.

HPC workloads are vital to citizens, governments, and organizations throughout the world. Many, if not most, of the items we take for granted today were discovered, designed, or in some way made possible by HPC systems. It’s not a stretch to say that lives depend on the performance of HPC systems.

One example of this is biological research to improve medicines, cure diseases, and grow better crops. Research into new materials and new energy sources is also highly important to the world as a whole.

At RedLine, HPC performance is our only business. We have developed our RedLine Performance Methodology (RPM) to ensure consistently optimal HPC performance for our customers. Perhaps the best way to explain the RPM and how we apply it is to discuss how we work with the US National Weather Service.


Delivering RedLine Performance

For the past 19 years, RedLine has supported the National Weather Service at the National Centers for Environmental Prediction (NCEP). Our team manages the operational supercomputers that provide forecast guidance products used daily by meteorologists to predict the weather.

NCEP also issues severe weather warnings, along with flight guidance to the FAA and the military. The models used by the National Hurricane Center are also run on NCEP’s supercomputers.

The performance of NCEP’s HPC systems is absolutely critical to its mission, and to the country as well, for three primary reasons:

• Criticality: Lives literally depend on accurate and timely weather forecasts. Giving people advance notice of impending weather emergencies is only possible if the systems have enough data and compute power to run the models as quickly and accurately as possible.

• Cost: The US government spends large amounts of money to procure and run its supercomputers. It’s important that it gets the maximum value for its purchase. The same principle applies to non-government users. For example, if a network promises 100 Gb/sec performance, then it should provide that performance in practice – and it’s our mission to help our customers get the advertised performance out of their HPC systems.

• Consistency: NCEP runs multiple weather models 24 hours a day, seven days a week. Production is essentially one long workflow, with data ingested, large models run, and forecast products delivered. The output of one model becomes the input of another in order to arrive at short-term, mid-term, and long-range forecasts. Each of these models uses a massive amount of data that is ingested from the network, staged on the storage network, and funneled through the servers for processing. Even a slight performance problem can have a significant ripple effect, causing forecasts to be delayed.

Because our customers’ workloads are of utmost importance, RedLine closely monitors system and application performance every step of the way. Potential issues are detected early and corrective action is taken before the issues become problems. This is the only way to ensure mission-critical systems consistently deliver the required performance.

The keys to sustaining consistent high performance are having extensive, detailed knowledge of system performance, knowing how to monitor performance, and having business rules in place to deal with performance variations.

Not every organization has the same stringent performance requirements as NCEP; however, all organizations need to have some level of understanding of their performance requirements and characteristics. Understanding performance enables IT managers to more accurately predict workload completion, recognize the need for more resources, and respond to growth in demand. This also allows them to eliminate guesswork when slowdowns occur or when they are perceived by end users.


Performance, The RedLine Way

We find that organizations often have performance expectations that are not quantifiable. Note that here we state ‘expectations’ rather than ‘requirements’, because requirements are quantifiable.

Expectations are usually based on end user experience. When end users complain an application is slow, IT staff are expected to react and solve the perceived problem. However, in a well-tuned and instrumented environment, IT professionals often already know when a problem exists and are working to isolate it.

If the environment is not well tuned and instrumented, service professionals have a more difficult time determining whether the performance issue is truly valid, and if it is, determining the root cause of the problem.

Our approach to systems management is based on the premise that organizations need well-defined performance metrics. Understanding system performance requires an understanding of the system from beginning to end.

In general terms, data comes into a system, is stored and manipulated, and output is produced. The system components (network, CPU, storage) work together and have numerous interdependencies. We believe that having a thorough understanding of how each system component performs individually, as well as how they perform together, is fundamental to proactive systems management.


Keys to Systems Management & Maximizing Performance

Whether managing a small cluster of eight systems or a mega-cluster with thousands of nodes, systems management is critical to ensuring maximum performance, predictability, and rapid, effective troubleshooting. Our systems management approach heavily relies on the following five methods:

System Performance Tuning: This is an incredibly broad topic that borders on an art form. It is an iterative process that builds on prior knowledge and constantly evolves.

Great care must be exercised when tuning, as changes in individual systems almost always have downstream and upstream ramifications.

A system’s current stage in its lifecycle strongly influences the best approach to adopt for performance tuning. Ideally, performance tuning is initiated in the pre-production phase, but performance tuning can and should be considered throughout all phases of the systems lifecycle. It is imperative, however, that solid systems management practices are in place if a system is farther along in the lifecycle than the pre-production phase.

Baselining: The importance of baseline performance metrics cannot be overstated. Without a measured baseline, performance expectations are based on speculation. Baselining a system includes running simple benchmarks, such as network throughput tests and disk IOPS tests, and measuring end-to-end application performance.

After the baseline has been established, baseline tests are re-executed prior to deploying new hardware or software, as well as before and after an upgrade or patch.


Execute the baseline tests to ensure the system is in a healthy state prior to applying the upgrade or patch, and again afterward to ensure the change had no negative or unexpected effects.

Baseline tests should be executed whenever there is an opportunity to validate performance in a measured fashion. Baseline tests are also run when performing problem determination. If a bottleneck in the I/O subsystem is suspected, execute the I/O baseline tests to validate the I/O subsystem.

A good baseline test has well-defined performance requirements and a repeatable method to test for the defined performance metric or metrics. This could be an individual component test, such as the network performance of a WAN link, or an IOPS test against an individual disk drive, RAID array, or file system. A good baseline test could also be an end-to-end system test that simulates specific functionality and, in the best case, normal operations. It’s vital that detailed performance metrics are collected during baseline testing. It is also important that these performance metrics are captured and stored for future reference.
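As a rough illustration of a repeatable test whose metrics are stored for future reference, the sketch below times a workload several times and appends a summary record to a file. It is only a sketch under stated assumptions: the names run_baseline, save_baseline, and sample_workload are hypothetical, and sample_workload is a stand-in for a real component test such as a network throughput or IOPS run.

```python
import json
import statistics
import time


def run_baseline(test_fn, repeats=5):
    """Time test_fn several times and summarize the samples in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        test_fn()
        samples.append(time.perf_counter() - start)
    return {
        "repeats": repeats,
        "min_s": min(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }


def save_baseline(path, name, summary):
    """Append one timestamped baseline record to path as a JSON line."""
    record = {"test": name, "recorded_at": time.time(), **summary}
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def sample_workload():
    # Hypothetical stand-in for a real component or end-to-end test.
    return sum(i * i for i in range(100_000))
```

Calling save_baseline("baselines.jsonl", "sample", run_baseline(sample_workload)) before and after a change would leave two comparable records in the same file. Repeating the run and keeping the median is one simple way to reduce noise; a real baseline would likely record more context, such as node name and software versions.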

Systems Monitoring: Effective systems monitoring provides service professionals insight into problems before they are reported by users. It’s the difference between relying on an over-temperature light in your car vs. having a temperature gauge showing the temperature as it increases plus the alarm light when the temperature exceeds the healthy threshold.

Many systems monitoring tools are designed to alert system administrators when systems have failed. A dashboard light goes red to inform you a server has crashed. That’s definitely valuable information, but having a warning that the server was having a problem before it crashes is much more valuable.

We help our customers find the right tools that will alert them to potential problems before they result in system or application crashes.


Sometimes systems fail suddenly, making it difficult (when there is no solid systems monitoring) to identify the root cause and thus render a quick fix. That’s one of the reasons we help customers install extensive instrumentation that helps them quickly and efficiently identify and isolate problems.

Performance Monitoring: Performance monitoring provides insights into system health that basic system monitoring just can’t provide.

Strangely enough, for an industry dealing with the most complex computing systems in the world, there’s very little published and/or discussed around HPC system performance monitoring and performance analysis. HPC performance analysis requires understanding at a deep level where and how system components are interacting and, most importantly, where bottlenecks exist.

Collecting both high-level and low-level performance data in a time series format is the first step. There is no shortage of available tools. Since HPC systems are typically very overhead-conscious, use lightweight, low-impact tools for system and application performance data collection.

There is significant value in being able to graphically represent time series performance data. Tools like Grafana enable administrators to rapidly plot time series data collected by various metrics frameworks (e.g., StatsD, collectd, collectl, tcollector) through a web interface. A lesser-known tool, Performance Co-Pilot, has excellent capabilities in performance data collection, as well as the capability to intuitively display time series performance data. These tools allow for the visual comparison of known good baseline performance data against real-time or near real-time performance data.

The key to performance monitoring is capturing the differences between your known baseline performance and today’s performance. If today’s performance is subpar, it could be a slow CPU, failing memory, or a drive that wasn’t correctly installed. With solid historical performance data, effective data visualization capabilities, and baselines and benchmarks, administrators are able to identify (and fix) problems much more quickly. Making the investment in performance monitoring will have long-lasting benefits that far surpass the cost of implementation.
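The comparison between baseline and current performance can be automated in a simple way. The following is a minimal sketch, not a description of any particular monitoring tool: the function name find_regressions, the metric names, the sample values, and the 10 percent tolerance are all assumptions chosen for illustration, and every metric is assumed to be one where higher is better.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return metrics that dropped below baseline by more than tolerance.

    Both arguments map a metric name to a value where higher is better
    (for example, throughput). The result maps each regressed metric to
    its relative change, so -0.18 means 18 percent below baseline.
    """
    regressions = {}
    for name, base_value in baseline.items():
        if name not in current or base_value == 0:
            continue  # nothing to compare against
        change = (current[name] - base_value) / base_value
        if change < -tolerance:
            regressions[name] = round(change, 3)
    return regressions


# Illustrative values only, not measurements from a real system.
baseline = {"network_mb_s": 1200.0, "disk_iops": 50000.0}
today = {"network_mb_s": 1180.0, "disk_iops": 41000.0}
print(find_regressions(baseline, today))  # {'disk_iops': -0.18}
```

Here the network metric is within tolerance (about 1.7 percent below baseline) while the disk metric is flagged. The right tolerance depends on how noisy each measurement is, which is itself something the stored baseline history can reveal.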

Change Management: As certain as death and taxes, change is inevitable and it will break your system. Change is an essential part of IT operations and HPC is no exception. Although there is no magic bullet that makes change risk free, good change management will reduce the mean time to repair (MTTR) when change introduces problems into your HPC system.

So what makes a change management process work? The most important aspect is the people. The best change management policies, procedures, and tools are useless unless people adhere to them. System administrators, and particularly HPC system administrators, are typically very smart folks. Unfortunately, some view adherence to a change management process as an unnecessary burden, particularly when it comes to making small and seemingly “obvious” changes to fix a problem.

What is often not fully considered are the upstream or downstream ramifications of change. Even the most experienced system administrator will have a hard time knowing all of the effects changes can have on a complex HPC system.

When failures start, it’s usually not the person who made the change who gets that 3:00 AM phone call. If you’re the one who gets the call, the first thing to ask is “What changed?” Knowing what changed, when it changed, why it changed, and how to back out of the change are some of the most valuable benefits system administrators derive from change management.

Running baselines and benchmarks before and after a change is critical. Changes will often have an impact that is not obvious. How many times have you received calls from end users saying, “The system seems slower today”?

Running your baselines and benchmarks prior to a change confirms your system is healthy. Running those same baselines and benchmarks after a change allows you to assess the impact of that change. In addition, if your pre-change baselines and/or benchmarks fail, strongly consider cancelling your change and remediating your system before applying the change.

The bottom line when it comes to change management? There should be zero tolerance for unauthorized change.

Enforcing a policy of zero unauthorized changes is the only way to know change management is being strictly followed. Dealing with change-related problems comes with the territory for an HPC system administrator, but dealing with unauthorized changes is unacceptable. The problem determination process requires system administrators to make logical deductions based on available information. When unauthorized/undocumented changes are made, MTTR suffers.

Detecting unauthorized changes to system images is manageable. Enterprise configuration management tools like Puppet and Chef can be configured to detect and, if appropriate, overwrite unauthorized changes for both stateless and stateful nodes.

These tools have been successfully deployed and utilized in HPC environments. Tools such as Tripwire or OSSEC, known mostly for intrusion detection, provide automated detection and reporting of changes. Even with the best tools, IT staff must be trusted to adhere to change management policies and practices or find a new line of work.
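The general idea behind file-level change detection can be shown in a few lines. The sketch below is not how Puppet, Chef, Tripwire, or OSSEC are implemented or configured; it is a simplified illustration, with hypothetical function names snapshot and diff_snapshots, that records a SHA-256 digest for each file in a directory tree and compares a later snapshot against the approved one.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file under root (by relative path) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(approved, current):
    """Report files added, removed, or modified since the approved snapshot."""
    return {
        "added": sorted(set(current) - set(approved)),
        "removed": sorted(set(approved) - set(current)),
        "modified": sorted(
            name
            for name in approved.keys() & current.keys()
            if approved[name] != current[name]
        ),
    }
```

Taking a snapshot when a change is approved and comparing it on a schedule gives a direct answer to "What changed?". This sketch reads whole files into memory and ignores permissions and ownership, which production tools typically also track.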

Adhering to the disciplines above will pay off in extraordinary ways. Downtime will be radically decreased, performance will improve, and, perhaps more importantly, performance predictability will increase.


Achieving Maximum Application Performance

So far, we’ve been discussing how to optimize, monitor, and manage hardware systems. But that’s only half of the performance equation. The other half is ensuring your code is optimized for maximum performance and throughput.

Code Optimization: Like system performance tuning, optimizing code is almost an art form. The goal is to achieve a balance in which the system provides the best throughput and overall performance. This means eliminating bottlenecks or mitigating them by having other functions occur simultaneously, thus reducing the impact of the bottleneck.

Having an intimate knowledge of your unique application’s execution is critical. Understanding if your application is typically compute, memory, communications, or I/O bound is a good start, but you need to go deeper in order to wring the highest throughput from your system.

Profiling tools are highly useful and can help you see exactly where your functions or subroutines bog down. Profiling refers to the process of observing a program while it’s executing, and collecting the time required (and overall time spent) for each function or subroutine in a program. This data may then be sorted or ranked to determine where the most computing cycles are focused. Such frequently used program elements make the best candidates for optimization, because improving their performance will provide the greatest return on the time and effort invested.
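To make the collect-then-rank idea concrete, here is a small serial profiling sketch using Python’s standard cProfile and pstats modules. The workload functions (cheap_step, expensive_step, workload) and the helper profile_ranked are hypothetical, and compiled HPC codes would normally be profiled with tools suited to their language and platform; the ranking step is the same in spirit.

```python
import cProfile
import pstats


def cheap_step(n):
    return sum(range(n))


def expensive_step(n):
    return sum(i * i for i in range(n))


def workload():
    cheap_step(1_000)
    expensive_step(200_000)


def profile_ranked(fn):
    """Profile fn and return (function name, cumulative seconds), slowest first."""
    profiler = cProfile.Profile()
    profiler.enable()
    fn()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) to (cc, nc, tottime, cumtime, callers).
    ranked = sorted(stats.stats.items(), key=lambda item: item[1][3], reverse=True)
    return [(func[2], timings[3]) for func, timings in ranked]


for name, seconds in profile_ranked(workload)[:5]:
    print(f"{name:40s} {seconds:.6f}")
```

In this toy case expensive_step ranks above cheap_step, which marks it as the better optimization target. For interactive use, pstats.Stats(profiler).sort_stats("cumulative").print_stats(5) prints a similar ranking directly.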

Profiling can be conducted in either serial or parallel fashion. Serial profiling produces a set of timings for each function or subroutine in a program. Amdahl’s Law limits the degree of acceleration that can be achieved by optimizing any single serial function or subroutine. But Amdahl’s Law also extends to parallel programs (or functions and subroutines).

For example, code that is 90 percent parallel and 10 percent serial can be accelerated by at most 10x through parallelization, no matter how many processors are used, because the serial 10 percent still takes the same amount of time.
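The 10x ceiling follows from the usual statement of Amdahl’s Law, speedup = 1 / (s + p / N), where p is the parallel fraction, s = 1 - p is the serial fraction, and N is the number of workers. The short sketch below evaluates that formula for the 90/10 example; the function name amdahl_speedup is chosen here for illustration.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup under Amdahl's Law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


for workers in (2, 8, 64, 1024):
    print(workers, round(amdahl_speedup(0.90, workers), 2))
# 2 1.82
# 8 4.71
# 64 8.77
# 1024 9.91
```

As N grows, p / N approaches zero and the speedup approaches 1 / s, which is 1 / 0.10 = 10 for this example. The formula ignores communication and synchronization overhead, so measured speedups are normally lower.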


When parallelization is put to work, parallel profiling also results in significantly more data to be analyzed. That’s because timing must be measured for each subroutine or function as it runs on each processor used (for MPI-based parallelization) or execution thread employed (for OpenMP-based parallelization).

Furthermore, when a program is parallelized, it becomes necessary to measure and rank communication latency as part of the profiling process. This introduces additional factors to consider when looking for bottlenecks, and will also require time and effort to analyze and address.

Once you have some optimization targets, there are a variety of parallelization and optimization techniques you can attempt to make the code run more efficiently. Here are a few examples:

• If your code is compute bound, you could port some of the code to a GPU or FPGA accelerator. Another option is to try using a different method to solve key parts of your algorithm, like replacing an explicit method with an implicit method, or using a fast multipole method.

• If your applications are communications bound, you can work to reduce all-to-all and synchronous collective communications. You can also replace synchronous communications with asynchronous communications to hide communication latency. Or you can rework your algorithm to remove any MPI_Barrier calls.

• Memory bound applications can benefit from redesigning your algorithm to break up global arrays and use MPI-based processes to retrieve data that is stored on other processors.

Optimization is always a trade-off. The more time and effort you expend in adjusting and tuning, the better your processing outcomes should be. If you provide clear guidance to developers on the best options for overcoming various types of constraints – such as memory bound, communications bound, or I/O bound – you can help them boost performance by quite a bit right out of the gate. But if you spend too much time preparing, or too little time processing, you’ll start affecting productivity in other ways. Strike the right balance and you’ll see productivity improve. Keep up your optimizations, and those improvements should continue.


Workflow Management: An often overlooked but critical aspect of overall HPC performance and efficiency is workflow. Workflow management becomes increasingly important as your jobs become more numerous and complex. In most HPC environments, there are many dependencies that must be taken into account. We’re talking about situations in which the output of one job is the input for other jobs and the entire application run needs to happen in the right order every time.

Good workflow management will track each phase of operations and ensure the processes happen in the right sequence. It takes into account all workload dependencies and enforces policies. It works with the scheduler to monitor each phase of the job and it won’t allow the process to continue if the sub-processes haven’t successfully completed.

Your workflow management tool should give you a wide range of options on how it should proceed if a job terminates unsuccessfully or fails to terminate at all. For instance, it could be programmed to attempt to run the failed job two more times before giving up on it and alerting operators to the failure.
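The dependency ordering and retry policy described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a model of any specific workflow manager or scheduler: run_workflow and its parameters are hypothetical, each job is assumed to be a callable that returns True on success, and the sketch does not handle jobs that hang or raise exceptions. It uses graphlib.TopologicalSorter from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_workflow(jobs, dependencies, max_attempts=3):
    """Run jobs in dependency order, retrying each failed job.

    jobs: dict mapping job name to a callable that returns True on success.
    dependencies: dict mapping job name to the set of jobs it depends on.
    Stops at the first job that fails max_attempts times, so that nothing
    downstream of it runs.
    """
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        for _attempt in range(max_attempts):
            if jobs[name]():
                completed.append(name)
                break
        else:
            # All attempts failed: report it so operators can be alerted.
            return {"completed": completed, "failed": name}
    return {"completed": completed, "failed": None}
```

With max_attempts=3, a job that fails is run two more times before the workflow gives up, matching the example policy above. A production tool would also need timeouts for jobs that never terminate, which this sketch leaves out.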

At RedLine, we’ve worked with a large number of workflow managers, both commercial and open source, and can work with you to find the right tool for your specific situation.


Demand More From Your HPC Systems

Stated plainly, we believe customers should demand more from their HPC systems.

Supercomputers are a scarce and precious resource; moreover, they’re very expensive to acquire and operate. This also applies to clusters used in business and manufacturing organizations.

Customers want to get maximum performance, stability, and reliability out of their systems, but they often don’t know where to turn for the help they need. Today’s HPC systems are typically built using hardware from multiple manufacturers and a mix of open-source and commercially licensed software. RedLine’s view of systems and system performance can often shortcut the troubleshooting process when dealing with disparate hardware vendors, ISVs, and software companies. RedLine’s vendor-agnostic approach to systems integration, coupled with RPM, mitigates support concerns and allows customers to select “Best of Breed” solutions.

At RedLine, we take a holistic view of your environment as a tightly coupled architecture. This includes the hardware and software that comprise the cluster, as well as the applications that run on the cluster. We don’t specialize in just tuning codes, or network operations, or CPU/memory throughput – we do it all.

As we’ve discussed, we have developed a wide variety of operationally proven methods, practices, and tools to monitor, manage, and maximize performance for IT infrastructures ranging from massive supercomputers to small clusters. We incorporate all of these into our RedLine Performance Methodology (RPM).

Our RPM ensures the customer’s environment is fully instrumented and managed, so administrators can identify performance aberrations before they cause problems for end users. We developed and refined this repeatable methodology over the course of more than 19 years of successfully supporting customer HPC systems and applications. RedLine constantly evolves the RPM with the lessons learned from each engagement. The benefits of leveraging our RPM include consistent top performance and minimized labor, risk, time, and cost for our customers.

Every day we work with our customers, along with their hardware and software suppliers, to tweak and tune every imaginable aspect of the system in order to achieve peak end-to-end performance. It’s not called high performance computing for nothing, right?

If you’d like to hear more about what we do and how we might be able to help your organization, just reach out to us for more information.


2275 Research Blvd., Suite 500, Rockville, MD 20850

[email protected]

www.redlineperf.com