Seminar Operating Systems, Summer Term 2018

Scalability of Molecular Dynamics Simulations on Heterogeneous Hardware

Laura Morgenstern


Summer Term 2018 ∙ Laura Morgenstern ∙ osg.informatik.tu-chemnitz.de

1 Motivation

● Materials science

● Biochemistry

● Biomedicine

● Virology

● Fluid Dynamics


2 Fast Multipole Method

● Goal: simulation of particle systems

● N-body problem: computation of all pairwise interactions between N particles

● Classic Coulomb solver: compute the Coulomb force F for each particle → O(N²)

● Fast Multipole Method (FMM): reduce the number of interactions via particle grouping → O(N)
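The O(N²) cost of the classic solver comes from visiting every particle pair once. The following is a minimal sketch of such a direct summation, not the seminar's reference implementation. The function name `coulomb_forces`, the list-based data layout and the default `k=1.0` (reduced units) are assumptions made for illustration.

```python
import math


def coulomb_forces(positions, charges, k=1.0):
    """Direct-summation Coulomb solver (hypothetical helper).

    positions: list of (x, y, z) tuples, charges: list of floats.
    k is the Coulomb constant; 1.0 means reduced units (an assumption).
    Returns one force vector [Fx, Fy, Fz] per particle.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    # Visit each unordered pair (i, j) once: N * (N - 1) / 2 interactions.
    for i in range(n):
        for j in range(i + 1, n):
            # Vector pointing from particle j to particle i.
            d = [positions[i][a] - positions[j][a] for a in range(3)]
            r2 = sum(c * c for c in d)
            r = math.sqrt(r2)
            # F_i = k * q_i * q_j * d / r^3 (repulsive for like charges).
            s = k * charges[i] * charges[j] / (r2 * r)
            for a in range(3):
                forces[i][a] += s * d[a]
                forces[j][a] -= s * d[a]  # Newton's third law
    return forces
```

Doubling the particle count roughly quadruples the number of pair visits in this sketch, which is the growth the FMM avoids by grouping distant particles.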


2.1 Hierarchical Space Decomposition


2.2 Workflow

2.2.1 Particle to Multipole (P2M)


2.2.2 Multipole to Multipole (M2M)


2.2.3 Multipole to Local (M2L)


2.2.4 Local to Local (L2L)


2.2.5 Local to Particle (L2P)


2.2.6 Particle to Particle (P2P)


3 Heterogeneous Hardware

● Multi-core processors, e.g. Intel Xeon Skylake

● Graphics processing units, e.g. NVIDIA Tesla V100

● Accelerators, e.g. Intel Knights Landing Xeon Phi


4 Research Goal

● Problem size is fixed by the physical size of the molecules → goal is strong scaling

● Execution time below 1 ms per simulation step ... no matter how many particles to simulate :-)

● Computational effort per compute node is vanishingly low → latency-critical FMM

● Heterogeneous hardware causes differences in latency, throughput and memory hierarchy

● Goal: strong scaling via adaptation of the FMM to these differences → task graph scheduling
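Strong scaling keeps the problem size fixed and asks how the run time shrinks as processing elements are added. The sketch below shows the usual speedup and efficiency definitions together with Amdahl's law as an upper bound. The function names are hypothetical, and any timings passed in are placeholders rather than measurements from this work.

```python
def speedup(t_serial, t_parallel):
    """Strong-scaling speedup S(p) = T(1) / T(p) for a fixed problem size."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, p):
    """Efficiency E(p) = S(p) / p; 1.0 corresponds to ideal strong scaling."""
    return speedup(t_serial, t_parallel) / p


def amdahl_speedup(serial_fraction, p):
    """Upper bound on the speedup if a fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)
```

With a serial fraction of 0.1, for example, `amdahl_speedup(0.1, 10)` stays well below 10. This illustrates why a fixed problem size leaves little work per node and why latencies start to dominate.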


5 Roadmap

● Survey models of parallel machines

● Understand hardware architectures

● Develop SW for heterogeneous HW

● Define performance portability

● Survey scheduling algorithms

Seminar goals


6 Topics

6.1 Models of Parallel Computing Systems (PS)

● Parallel computing systems rule our daily life – laptops, mobile phones, cloud services, …

● Models of parallel computing systems support the design and evaluation of parallel algorithms

● Your task is a literature review of models of parallel computing systems for performance-critical applications

● Distinguish the terms machine model, architectural model, computational model and programming model

● Describe PRAM, LogP and a self-chosen model in detail

● Find use cases in current research

● Literature to start with [1], [2], [3]
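As an illustration of how such a model supports algorithm evaluation, the following sketch estimates communication time with the LogP parameters L (latency), o (per-message overhead) and g (gap between consecutive messages). The function names are hypothetical. The stream estimate assumes the sender injects one message every max(g, o) time units, which is a common reading of the model rather than a statement taken from these slides.

```python
def logp_message_time(L, o):
    """End-to-end time of one small message:
    send overhead + network latency + receive overhead."""
    return o + L + o


def logp_stream_time(k, L, o, g):
    """Estimated time until the last of k back-to-back messages
    from one sender has been received by one receiver."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Assumption: consecutive sends are spaced by the larger of gap and overhead.
    interval = max(g, o)
    return (k - 1) * interval + logp_message_time(L, o)
```

For instance, with L = 6, o = 2 and g = 4 (arbitrary time units), a single message takes 10 units and a stream of three messages takes 18 units under these assumptions.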


6.2 Metrics for Performance Portability (HS/FS)

● Performance portability is the holy grail of high performance computing

● However, its definition varies in the literature and is often given only qualitatively

● Your task is a literature review of metrics for performance portability

● Find use cases in current research

● Literature to start with [4]
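One quantitative definition is the metric proposed in [4], which, as commonly summarized, is the harmonic mean of an application's performance efficiency over a set of platforms and is zero if any platform in the set is unsupported. The sketch below follows that summary. The function name and the convention of passing `None` for an unsupported platform are assumptions for illustration.

```python
def performance_portability(efficiencies):
    """Harmonic mean of per-platform efficiencies (values in (0, 1]).

    efficiencies: one entry per platform in the chosen platform set;
    None (or a non-positive value) marks a platform the application
    does not run on, which makes the metric 0.
    """
    if not efficiencies:
        raise ValueError("the platform set must not be empty")
    if any(e is None or e <= 0.0 for e in efficiencies):
        return 0.0
    return len(efficiencies) / sum(1.0 / e for e in efficiencies)
```

Because the harmonic mean is dominated by the smallest value, one poorly performing platform pulls the score down more than an arithmetic mean would.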


6.3 Developing a Coulomb Solver on GPUs with CUDA/SYCL (HS/FS)

● Computation of long-range interactions, e.g. Coulomb forces, is the most costly part of MD

● FMM "contains" a classic Coulomb solver (P2P)

● Your task is the implementation of a classic Coulomb solver that computes the Coulomb force acting on each particle in a given particle system

● Use C++ in combination with CUDA/SYCL for your implementation

● Assess performance, programmability and portability of your code

● Literature to start with [5] (CUDA), [6] (SYCL), [4]


6.4 Scheduling in Operations Research (PS)

● Scheduling is the assignment of jobs to resources at particular points in time

● Used for process scheduling in operating systems, building production flows in operations research and scheduling tasks in real-time systems

● Your task is a literature review of scheduling algorithms and heuristics developed in the area of operations research

● Focus on runtime-reducing scheduling approaches

● Classify scheduling problems – to which class would a task-based FMM belong?

● Literature to start with [7]
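To make "runtime-reducing heuristic" concrete, here is a sketch of the longest-processing-time-first (LPT) list scheduling heuristic for independent jobs on identical machines. It is chosen purely as an illustration and is not taken from [7]. It also ignores precedence constraints, which a task-based FMM would have. The function name `lpt_schedule` is hypothetical.

```python
import heapq


def lpt_schedule(job_times, num_machines):
    """Assign independent jobs to identical machines, longest job first.

    Returns (assignment, makespan), where assignment maps a machine index
    to the list of job indices placed on it.
    """
    # Min-heap of (current load, machine index).
    loads = [(0.0, m) for m in range(num_machines)]
    heapq.heapify(loads)
    assignment = {m: [] for m in range(num_machines)}
    # Consider jobs in order of decreasing processing time.
    for job, t in sorted(enumerate(job_times), key=lambda item: -item[1]):
        load, m = heapq.heappop(loads)  # least-loaded machine so far
        assignment[m].append(job)
        heapq.heappush(loads, (load + t, m))
    makespan = max(load for load, _ in loads)
    return assignment, makespan
```

For the job times `[3, 3, 2, 2, 2]` on two machines this heuristic yields a makespan of 7, while placing both long jobs on one machine achieves 6. The example shows that LPT is a heuristic and not an exact method.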


6.5 Automated Hardware Characterization for CPUs/GPUs (HS/FS)

● MD requires strong scaling

● Strong scaling requires detailed knowledge of the hardware

● Your task is the implementation of a portable application that determines the hardware characteristics of an arbitrary CPU/GPU

● Research current CPU/GPU architectures

● Which characteristics are interesting for the design of latency-critical parallel applications?

● How to determine these characteristics?

● How to ensure portability?

● Literature to start with [5] (GPU), [6] (GPU), [9] (GPU), [10] (CPU)
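As a very small starting point for the host side only, the sketch below collects a few characteristics that the Python standard library exposes portably and estimates the cost of reading the timer. The function names and the selection of characteristics are assumptions. Cache sizes, memory latencies and GPU properties would require micro-benchmarks or vendor APIs that are not shown here.

```python
import os
import platform
import time


def characterize_host():
    """Collect basic host characteristics using only the standard library."""
    clock = time.get_clock_info("perf_counter")
    return {
        "system": platform.system(),
        "machine": platform.machine(),
        # os.cpu_count() may return None if the count cannot be determined.
        "logical_cpus": os.cpu_count() or 1,
        "timer_resolution_s": clock.resolution,
    }


def measure_timer_overhead(samples=10000):
    """Smallest observed difference between two back-to-back timer reads."""
    best = float("inf")
    for _ in range(samples):
        t0 = time.perf_counter()
        t1 = time.perf_counter()
        best = min(best, t1 - t0)
    return best
```

Knowing the timer overhead matters for latency-critical code, because intervals close to that overhead cannot be measured reliably.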


6.6 Task-based Scheduling on GPUs with Kokkos (HS/FS)

● Diverse task scheduling runtime systems and libraries promise performance, portability and programmability in the same breath

● They do not necessarily match the requirements of latency-critical applications

● Nevertheless, we should watch out for transferable concepts

● Your task is a literature review of task-based scheduling approaches for dynamic workloads on GPUs

● Focus on concepts such as persistent threads, dynamic parallelism, queues, ...

● Are these concepts useful for a task-based FMM?

● Literature to start with [11]
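The queue and persistent-thread ideas can be pictured with a CPU-side analogy: long-lived workers repeatedly pull tasks from a shared queue, and a task may enqueue follow-up tasks. The sketch below is only that analogy in Python threads. It is not GPU code and not the Kokkos API, and the names `run_tasks` and `make_task` are hypothetical.

```python
import queue
import threading


def run_tasks(initial_tasks, num_workers=4):
    """Run tasks on long-lived workers; each task returns (value, child_tasks)."""
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            task = tasks.get()
            if task is None:  # sentinel: shut this worker down
                tasks.task_done()
                return
            value, children = task()
            with lock:
                results.append(value)
            # Enqueue children before marking the parent as done, so that
            # tasks.join() cannot return while work is still being created.
            for child in children:
                tasks.put(child)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in initial_tasks:
        tasks.put(task)
    tasks.join()  # wait until the dynamic workload has drained
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return results


def make_task(depth):
    """Hypothetical task that spawns two children until depth 0 is reached."""
    def task():
        children = [make_task(depth - 1), make_task(depth - 1)] if depth > 0 else []
        return depth, children
    return task
```

Calling `run_tasks([make_task(3)])` processes a binary tree of 15 tasks whose shape is only discovered at run time, which is the kind of dynamic workload the bullet points refer to.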


7 Next Steps

1. Choose a topic (https://doodle.com/poll/t7zv8qrkde2mtfr9) by 19th April 2018

2. Send me an email with your name, student number, field of study, seminar type and topic by 19th April 2018

3. Individual status meeting between 21st May 2018 and 25th May 2018

4. Individual presentation check between 2nd July 2018 and 6th July 2018

5. Give an awesome presentation

6. Hand in your preliminary seminar paper for proofreading by 17th September 2018

7. Revise your seminar paper

8. Hand in your final seminar paper by 30th September 2018


8 Literature

[1] T. Rauber and G. Rünger. Parallel Programming for Multicore and Cluster Systems. Springer Berlin Heidelberg, 2010.
[2] S. Fortune and J. Wyllie. Parallelism in Random Access Machines. In Proceedings of the Tenth Annual ACM Symposium on Theory of Computing, STOC ’78, pages 114–118, New York, NY, USA, 1978. ACM.
[3] D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. v. Eicken. LogP: Towards a Realistic Model of Parallel Computation. SIGPLAN Not., 28(7):1–12, July 1993.
[4] S. J. Pennycook, J. D. Sewall, and V. W. Lee. A Metric for Performance Portability. Proceedings of the 7th International Workshop in Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, 2016.
[5] J. Nickolls, I. Buck, M. Garland, and K. Skadron. Scalable Parallel Programming with CUDA. Queue, 6(2):40–53, March 2008.


[6] R. Keryell, R. Reyes, and L. Howes. Khronos SYCL for OpenCL: A Tutorial. In Proceedings of the 3rd International Workshop on OpenCL, IWOCL ’15, pages 24:1–24:1, New York, NY, USA, 2015. ACM.
[7] A. Jones, L. C. Rabelo, and A. T. Sharawi. Survey of Job Shop Scheduling Techniques. Wiley Encyclopedia of Electrical and Electronics Engineering, 1999.
[8] W. Stallings. Operating Systems: Internals and Design Principles. Pearson Education International, Upper Saddle River, NJ, 6th international edition, 2009.
[9] Nvidia. Nvidia Tesla V100 GPU Architecture: The World’s Most Advanced Data Center GPU. Technical report, Nvidia, 2017.
[10] C. Lameter. NUMA (Non-Uniform Memory Access): An Overview. Queue, 2013.
[11] H. Carter Edwards, Christian R. Trott, and Daniel Sunderland. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns. Journal of Parallel and Distributed Computing, 74(12):3202–3216, 2014. Domain-Specific Languages and High-Level Frameworks for High-Performance Computing.


[12] M. Steinberger, M. Kenzel, P. Boechat, B. Kerbl, M. Dokter, and D. Schmalstieg. Whippletree: Task-based Scheduling of Dynamic Workloads on the GPU.
[13] L. Morgenstern. A NUMA-Aware Task-Based Load-Balancing Scheme for the Fast Multipole Method. Master’s thesis, Chemnitz University of Technology, 2017.
[14] I. Kabadshow. Periodic Boundary Conditions and the Error-Controlled Fast Multipole Method. PhD thesis, Forschungszentrum Jülich, 2012.
[15] L. Greengard and V. Rokhlin. A fast algorithm for particle simulations. J. Comput. Phys., 73(2):325–348, December 1987.


9 Image Sources

● https://www.extremetech.com/wp-content/uploads/2016/04/KNL-Feature-640x354.jpg
● Th. Ullmann - MPI BPC Göttingen
● http://knoxblogs.com/atomiccity/2008/06/17/the_beauty_of_materials_scienc/
● http://scienews.com/research/3088-dna-of-someone-who-died-in-1827-restored-without-his-remains.html
● http://tropeninstitut.de/krankheiten-a-z/zika-virus
● https://www.nvidia.de/data-center/tesla-v100/
● https://hothardware.com/news/nvidia-dgx-1-supercomputer-shreds-geekbench-with-8-tesla-v100-gpus-960-tflops-of-compute
● http://www.loadthegame.com/2014/11/05/intel-skylake-s-chipset-specs-leaked/
● https://www.hardwareluxx.de/index.php/news/hardware/prozessoren/39576-hpc-prozessor-xeon-phi-knights-landing-final-vorgestellt.html
● https://www.slideshare.net/chesteve/ieee-hpsr-2017-keynote-softwarized-dataplanes-and-the-p3-tradeoffs-programmability-performance-portabiilty