Parallel Computing Through MPI Technologies

Author: Nyameko Lisa

Supervisors: Prof. Elena Zemlyanaya, Prof. Alexandr P. Sapozhnikov and Tatiana F. Sapozhnikov

Outline – Parallel Computing through MPI Technologies

Introduction
Overview of MPI
General Implementation
Examples
Application to Physics Problems
Concluding Remarks

Introduction – Need for Parallelism

There are more stars in the sky than there are grains of sand on all the beaches of the world

Introduction – Need for Parallelism

It requires approximately 204 billion atoms to encode the human genome sequence

A vast number of problems from a wide range of fields have significant computational requirements

Introduction – Aim of Parallelism

Attempt to divide a single problem into multiple parts

Distribute the segments of said problem amongst various processes or nodes

Provide a platform layer to manage data exchange between multiple processes that solve a common problem simultaneously

Introduction – Serial Computation

Problem divided into discrete, serial sequence of instructions

Each executed individually, on a single CPU

Introduction – Parallel Computation

Same problem distributed amongst several processes (program and allocated data)

Introduction – Implementation

Main goal is to save time and hence money
– Furthermore, can solve larger problems that would deplete the resources of a single machine
– Overcome intrinsic limitations of serial computation
– Distributed systems provide redundancy, concurrency and access to non-local resources, e.g. SETI, Facebook, etc.

Three methodologies for the implementation of parallelism:
– Physical architecture
– Framework
– Algorithm

In practice it will almost always be a combination of the above. The greatest hurdle is managing the distribution of information and data exchange, i.e. overhead.

Introduction – Top 500

Japan’s K Computer (Kei = 10 quadrillion)
Currently the fastest supercomputer cluster in the world
8.162 petaflops (~8 x 10^15 calculations per second)

Overview – What is MPI?

Message Passing Interface
One of many frameworks and technologies for implementing parallelization
A library of subroutines (FORTRAN), classes (C/C++) and Python bindings that mediate communication (via messages) between single-threaded processes executing independently and in parallel

Overview – What is needed?

Common user accounts with the same password
Administrator / root privileges for all accounts
Common directory structure and paths
MPICH2 installed on all machines
MPICH2 implements a combination of the MPI-1 and MPI-2 standards
CH – the Chameleon portability layer – provides backward compatibility with existing MPI frameworks

Overview – What is needed?

MPICC & MPIF77 – provide the options and special libraries needed to compile and link MPI programs
MPIEXEC – initializes parallel jobs and spawns copies of the executable across all of the processes
Each process executes its own copy of the code
By convention, the root process (rank 0) is chosen to serve as the master process
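For example (file names and the process count here are purely illustrative), a C program might be compiled with "mpicc hello.c -o hello" (mpif77 for FORTRAN, and typically mpicxx for C++ builds) and then launched across four processes with "mpiexec -n 4 ./hello".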

General Implementation – Hello World – C++
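The code shown on the original slide is not reproduced in this text; a minimal sketch of an MPI Hello World in C++, using the C bindings shipped with MPICH2, might look like the following:

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);                      // start the MPI environment

    int numProcs = 0, myProc = 0;
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);    // total number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &myProc);      // rank of this process

    std::printf("Hello World from process %d of %d\n", myProc, numProcs);

    MPI_Finalize();                              // shut down MPI
    return 0;
}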

General Implementation – Hello World – FORTRAN

General Implementation – Hello World – Output

Example - Broadcast Routine

Point-to-point (send & recv) and collective (bcast) routines are provided by the MPI library

Source node mediates distribution of data to/from all other nodes
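For instance, the built-in collective broadcast is a single call (a fragment; the buffer name, count and root rank are illustrative):

double data[100];           // only the root's copy holds meaningful values before the call
int root = 0;               // the source node
MPI_Bcast(data, 100, MPI_DOUBLE, root, MPI_COMM_WORLD);   // afterwards every process has the same 100 values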

Example – Broadcast Routine – Linear Case

Apart from the root and last nodes, each node receives from the previous node and sends to the next node

Use point-to-point library routines to build custom collective routine

MPI_RECV(myProc - 1)

MPI_SEND(myProc + 1)
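A sketch of this linear chain in C++ (a fragment assumed to run between MPI_Init and MPI_Finalize; the value being broadcast is illustrative):

int numProcs, myProc;
MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
MPI_Comm_rank(MPI_COMM_WORLD, &myProc);

double value = 0.0;
if (myProc == 0) value = 3.14;                       // root holds the original data

if (myProc > 0)                                      // all but the root receive from the previous rank
    MPI_Recv(&value, 1, MPI_DOUBLE, myProc - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
if (myProc < numProcs - 1)                           // all but the last rank forward to the next rank
    MPI_Send(&value, 1, MPI_DOUBLE, myProc + 1, 0, MPI_COMM_WORLD);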

Example – Broadcast Routine – Binary Tree

Each parent node sends message to two child nodes

MPI_SEND(2 * myProc)
MPI_SEND(2 * myProc + 1)
IF( MOD(myProc, 2) == 0 ) MPI_RECV( myProc/2 ) ELSE MPI_RECV( (myProc-1)/2 )
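A sketch of the tree broadcast in C++ (a fragment assumed to run between MPI_Init and MPI_Finalize; it uses the common 0-indexed variant with children 2*myProc+1 and 2*myProc+2 and parent (myProc-1)/2, so that rank 0 can serve as the root, which differs slightly from the 1-indexed formulas above):

int numProcs, myProc;
MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
MPI_Comm_rank(MPI_COMM_WORLD, &myProc);

double value = 0.0;
if (myProc == 0) value = 3.14;                       // root of the tree holds the data

if (myProc > 0) {                                    // receive from the parent node
    int parent = (myProc - 1) / 2;
    MPI_Recv(&value, 1, MPI_DOUBLE, parent, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
int left = 2 * myProc + 1, right = 2 * myProc + 2;   // forward to the two child nodes, if they exist
if (left  < numProcs) MPI_Send(&value, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD);
if (right < numProcs) MPI_Send(&value, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD);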

Example – Broadcast Routine – Output

Applications to Physics Problems

Quadrature – discretize the interval [a,b] into N steps and divide them amongst the processes:
– FOR loop from 1 + myProc to N, in increments of numProcs (see the sketch after this list)
– E.g. with N = 10 and numProcs = 3:
  Process 0: iterations 1, 4, 7, 10
  Process 1: iterations 2, 5, 8
  Process 2: iterations 3, 6, 9

Finite difference problems – similarly divide the mesh/grid amongst the processes

Many applications, limited only by our ingenuity
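A sketch of this cyclic loop distribution in C++ for a simple midpoint-rule quadrature (a fragment assumed to run between MPI_Init and MPI_Finalize, reusing the headers from the Hello World sketch; the integrand f and the interval [a,b] are illustrative, since the slides do not prescribe a particular rule):

double f(double x) { return x * x; }                     // illustrative integrand

// ... inside main, after MPI_Init:
int numProcs, myProc;
MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
MPI_Comm_rank(MPI_COMM_WORLD, &myProc);

const int    N = 10;                                     // number of steps
const double a = 0.0, b = 1.0, h = (b - a) / N;          // step width

double localSum = 0.0;
for (int i = 1 + myProc; i <= N; i += numProcs)          // each process takes every numProcs-th step
    localSum += f(a + (i - 0.5) * h) * h;                // midpoint rule on step i

double total = 0.0;
MPI_Reduce(&localSum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);   // root gathers the partial sums
if (myProc == 0) std::printf("Integral ~ %f\n", total);

With N = 10 and numProcs = 3, process 0 handles four iterations and the others three, matching the distribution listed above.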

Closing Remarks

In the 1970s, Intel co-founder Gordon Moore correctly predicted that the "number of transistors that can be inexpensively placed on an integrated circuit doubles approximately every 2 years"

10-Core Xeon E7 processor family chips are currently commercially available

MPI is easy to implement and well suited to many independent operations that can be executed simultaneously

The only limitations are the overhead incurred by inter-process communication, our ingenuity and the strictly sequential segments of a program

Acknowledgements and Thanks

NRF and South African Department of Science and Technology

JINR, University Center
Dr. Jacobs and Prof. Lekala
Prof. Elena Zemlyanaya, Prof. Alexandr P. Sapozhnikov and Tatiana F. Sapozhnikov
Last but not least, my fellow colleagues