Enabling Performance Engineering in Hesse and Rhineland-Palatinate

Manuel Baumgartner, Christian Bischof, André Brinkmann, Alexandru Calotoiu, Nicolas Gauger, Matthias Kretz, Volker Lindenstruth, Max Sagebaum, Dörte Carla Sternel, Felix Wolf



Page 1: Enabling Performance Engineering in Hesse and Rhineland-Palatinate

Manuel Baumgartner, Christian Bischof, André Brinkmann, Alexandru Calotoiu, Nicolas Gauger, Matthias Kretz, Volker Lindenstruth, Max Sagebaum, Dörte Carla Sternel, Felix Wolf

Enabling Performance Engineering in Hesse and Rhineland-Palatinate

Page 2

10/9/18 | Department of Computer Science | Laboratory for Parallel Programming | Alexandru Calotoiu | 2

Enabling performance engineering

Hesse

Rhineland-Palatinate

Page 3


Enabling performance engineering

Objective: Expand and deepen HPC support in areas where existing scientific expertise coincides with critical user needs.

Approach: HKHLR + AHRP + Users

Page 4


Enabling performance engineering

Hesse

Rhineland-Palatinate

Principal Investigators

Christian Bischof (coordinator) – Technische Universität Darmstadt
André Brinkmann – Johannes Gutenberg-Universität Mainz
Nicolas Gauger – Technische Universität Kaiserslautern
Volker Lindenstruth – Goethe-Universität Frankfurt am Main
Dörte Carla Sternel – Hessisches Kompetenzzentrum für Hochleistungsrechnen
Felix Wolf – Technische Universität Darmstadt

Page 5


Enabling performance engineering

Scalability

Page 6


HPC support structures

EPE Activities
• 1st EPE workshop, Mainz, May 2017
• 8th HiPerCH workshop, Marburg, September 2017
• Expert workshop on scalability analysis, Darmstadt, November 2017
• User meeting on scalability analysis, Darmstadt, February 2018
• TotalView tutorial, Kaiserslautern, March 2018
• Minisymposium at WCCM 18, July 2018
• FEPA workshop, Erlangen, July 2018
• 10th HiPerCH workshop, Darmstadt, September 2018

Courses• Introduction to Bash• Introduction to Mogon

User Engagement• HPC-Cafe• Bachelor & Master Theses

Page 7


Algorithmic reproducibility

What affects the predictability of weather?
• Sensitivity of numerical models
• Collaboration with TRR 165 “Waves to Weather”

Status
• Applied algorithmic differentiation to cloud scheme
  • Warm cloud scheme of COSMO
• Identified coefficients and parameters with large derivatives

Page 8


Algorithmic stability & performance

Help measure stability of algorithms
• Provide tools to analyze the Lipschitz constants / condition numbers of each code part

Status
• CFD suite TRACE – replacement of hand-made FFT implementation with a library
  • Stability proven
  • Greatly improved maintainability
  • Speedup of 51× (according to DLR)
• Ongoing: exchange of hand-made linear solver in industrial mold-filling simulation

Page 9


Multi- and manycore performance

C++ extension for explicit data parallelism
• Allows numerical algorithm developers to exploit hardware parallelism in a portable way and with minimal effort
• Vc library provides portable, zero-overhead C++ types for explicitly data-parallel programming

[Diagram: just as a programming language’s fundamental types abstract the computer’s scalar registers & instructions, SIMD types abstract its packed registers & instructions.]

Page 10


Multi- and manycore performance (2)

Status: application
• Collaboration with Prof. Rezzolla @ Goethe-Universität Frankfurt
• Relativistic hydrodynamics simulation, encompassing turbulence, accretion, and neutron star collision
• Coupled with the AMReX framework
• 3× CPU speedup with the Vc library
• GPU port in progress

Status: standardization
• ISO/IEC TS 19570:2018, containing SIMD types, awaiting publication
• Independent implementation exists in libc++ (Clang)
• Contributed implementation to libstdc++ (GCC)

[Figure: 2.5D blocking; Elias R. Most (developer of the CPU code)]

Page 11


Help developers identify and resolve scalability limitations in their codes using Extra-P

Scalability

http://www.scalasca.org/software/extra-p/download.html
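Extra-P fits measurements taken at several scales to human-readable performance models. To the best of our knowledge of the tool, candidate models are drawn from the performance model normal form (PMNF), a sum of power and logarithm terms in the number of processes p:

```latex
f(p) = \sum_{k=1}^{n} c_k \cdot p^{i_k} \cdot \log_2^{j_k}(p)
```

The fitted model appearing on the later scalability-proof slides, 3·10⁻⁴ p² + c, is a single power term plus a constant, i.e. one instance of this form.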

Page 12


Scalability (2)

UG4 @ GCSC, Frankfurt
• Grid-based solution of PDEs
• Effects of problem size and #processes on performance

LLL algorithm @ SC, Darmstadt
• Lattice-based cryptographic algorithm
• Higher complexity desirable!
• Empirical complexity lower than expected

OpenFOAM @ MMA, Darmstadt
• Open-source CFD package
• Many different solvers
• Derived hardware requirements for icoFoam

FASTEST @ SC, TU Darmstadt
• Flows in complex 3D configurations
• Modeled strong scaling behavior
• Reproducibility of performance

[Excerpt from Andreas Vogel et al.:]

Fig. 3. Computing grids for the skin problem showing corneocytes (green) and lipid channels (red). Left: geometry ratios. Right: 3d grid for 10 layers of corneocytes.

For each run, the benchmark data is written out in a certain format that enables the researcher to deduct the desired information. This data can be parsed by automatic pre- and post-processing scripts that draw information and store it more densely for manual interpretation.

The steps carried out by JUBE are shown in Fig. 2. Preparation, compilation, execution, and analysis steps might exist multiple times and JUBE will perform the aforementioned steps in sequence. It is important to note that JUBE is able to easily create combinatorial runs of multiple parameters. For example, in a scaling experiment, one can simply specify multiple numbers of processes, and/or different solver setups and/or physical parameters, and JUBE will create one experiment for each possible combination, submit all of them to the resource manager, collect all results, and display them together.

5 Results

Using the tools from Sec. 3 and Sec. 4, we analyze the UG4 code in three studies: In the first two tests, we focus on modeling drug diffusion through the human skin. First, we analyze the code behavior under weak scaling, then we vary the diffusivity of the skin cells over several orders of magnitude. In the third study, we compare two different types of solvers, again under weak scaling: the geometric multigrid solver and the unpreconditioned conjugate gradient (CG) method.

5.1 Drug diffusion through the human skin

We consider a model for the permeability of the human skin. The outermost part of the epidermis (stratum corneum) consists of flattened, dead cells (corneocytes) that are surrounded by an inter-cellular lipid. The stratum corneum is the natural barrier that protects underlying tissue, but still allows for the throughput of certain concentrations (e.g., drugs, medicine). The latter process can be modeled by a diffusion process in which the diffusion coefficient within the corneocytes differs from the one in the lipid. Different geometric representations of the stratum corneum have been used to compute the diffusional throughput [17].

In the following two studies, we use a brick-and-mortar model (Fig. 3). Assuming diffusion-driven transport in the two subdomains s ∈ {cor, lip} (corneocyte, lipid), the governing equation is given by

∂ₜ c_s(t, x) = ∇ · (D_s ∇c_s(t, x)).

The diffusion coefficient D_s is assumed to be constant within each subdomain s ∈ {cor, lip}, but may differ between subdomains. For the scalability analysis, we compute the steady state of the concentration distribution.

As solver, we employ a geometric multigrid method, accelerated by an outer conjugate gradient method. The multigrid uses a damped Jacobi smoother, two (resp. three) smoothing steps in 2d (resp. 3d), a V-cycle, and an LU base solver. The iterations are completed once an absolute residuum reduction of 10⁻¹⁰ is achieved. The main difficulty of this problem is the bad aspect ratio of the computational domain (0.1 µm vs. 30 µm for the lipid channels). This is resolved by three (resp. five) steps of anisotropic refinement to enhance those ratios. Base solvers are applied at a level where ratios are satisfactory.

Page 13


Service: automated scalability proof for compute time applications

Required for access to large-scale cluster

[Plot: time in s (0–21) vs. number of processes (2⁹–2¹³); fitted model 3·10⁻⁴ p² + c]

Scalability proof → Access to HPC resources

Page 14


Service: automated scalability proof for compute time applications (2)

Status• Prototype available for Lichtenberg cluster users at TU Darmstadt

• Integration with Workflow Manager JUBE (FZJ) in progress

[Workflow: Application + Scripting → Scalability proof (fitted model 3·10⁻⁴ p² + c; time in s vs. processes 2⁹–2¹³)]

Page 15


Summary

• New performance engineering services offer:
  • Speedup
  • Productivity improvements
  • Increased maintainability
  • Easier ways of preparing compute-time grant proposals
• Teaching activities bring knowledge to users
• Individual user support allows complex application tuning

Page 16


Project publications

[1] Michael Burger, Christian Bischof, Alexandru Calotoiu, Felix Wolf, Thomas Wunderer, Johannes Buchmann: Exploring the Performance Envelope of the LLL Algorithm. In CSE 2018 – 21st IEEE International Conference on Computational Science and Engineering, Romania, IEEE Computer Society, October 2018 (to appear).

[2] Alexandru Calotoiu, Alexander Graf, Torsten Hoefler, Daniel Lorenz, Sebastian Rinke, Felix Wolf: Lightweight Requirements for Exascale Co-design. In Proc. of the 2018 IEEE International Conference on Cluster Computing (CLUSTER), Belfast, UK.

[3] Aamer Shah, Matthias S. Müller, Felix Wolf: Estimating the Impact of External Interference on Application Performance. In Proc. of the 24th Euro-Par Conference, Turin, Italy, volume 11014 of Lecture Notes in Computer Science, pages 46–58, Springer, August 2018.

[4] Alexander Hück, Christian Bischof, Max Sagebaum, Nicolas R. Gauger, Benjamin Jurgelucks, Eric Larour, Gilberto Perez: A usability case study of algorithmic differentiation tools on the ISSM ice sheet model. Optimization Methods and Software, 33(4–6), 844–867, 2018.

[5] Manuel Baumgartner, Peter Spichtinger: Towards a bulk approach to local interactions of hydrometeors. Atmos. Chem. Phys., 18, 2525–2546, 2018.

[6] Kashif Ilyas, Alexandru Calotoiu, Felix Wolf: Off-Road Performance Modeling – How to Deal with Segmented Data. In Proc. of the 23rd Euro-Par Conference, Santiago de Compostela, Spain.

[7] Patrick Reisert, Alexandru Calotoiu, Sergei Shudler, Felix Wolf: Following the Blind Seer – Creating Better Performance Models Using Less Information. In Proc. of the 23rd Euro-Par Conference, Santiago de Compostela, Spain.

[8] Sergei Shudler, Alexandru Calotoiu, Torsten Hoefler, Felix Wolf: Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications. In Proc. of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Austin, TX, USA, 2017.

[9] Manuel Baumgartner, Peter Spichtinger: Local Interactions by Diffusion between Mixed-Phase Hydrometeors: Insights from Model Simulations. Mathematics of Climate and Weather Forecasting, 3(1), 64–89, 2017.