Classification: Open

The Numerical Objects Report Series
Report 1997:16

Numerical Solution of PDEs on Parallel Computers Utilizing Sequential Simulators

Are Magnus Bruaset    Xing Cai    Hans Petter Langtangen    Aslak Tveito

September 28, 1997

This document is classified as Open. All information herein is the property of Numerical Objects AS and should be treated in accordance with the stated classification level. For documents that are not publicly available, explicit permission for redistribution must be collected in writing.

Numerical Objects Report 1997:16

Title: Numerical Solution of PDEs on Parallel Computers Utilizing Sequential Simulators
Written by: Are Magnus Bruaset, Xing Cai, Hans Petter Langtangen, Aslak Tveito
Approved by: Are Magnus Bruaset
Date: September 28, 1997

Numerical Objects, Diffpack and Siscat and other names of Numerical Objects products referenced herein are trademarks or registered trademarks of Numerical Objects AS, P.O. Box 124, Blindern, N-0314 Oslo, Norway. Other product and company names mentioned herein may be the trademarks of their respective owners.

Contact information
Phone: +47 22 06 73 00
Fax: +47 22 06 73 50
Email: [email protected]
Web: http://www.nobjects.com

Copyright © Numerical Objects AS, Oslo, Norway
September 28, 1997

Contents

1  Introduction
2  Domain decomposition and parallelization
3  A simulator-parallel model in Diffpack
4  A generic programming framework
5  Some numerical experiments
   5.1  The Poisson equation
   5.2  The linear elasticity problem
   5.3  The two-phase porous media flow problem


This report should be referenced as shown in the following BibTeX entry:

@techreport{NO1997:16,
  author      = "Are Magnus Bruaset and Xing Cai and Hans Petter Langtangen and Aslak Tveito",
  title       = "Numerical Solution of PDEs on Parallel Computers Utilizing Sequential Simulators",
  institution = "Numerical Objects AS, Oslo, Norway",
  type        = "The Numerical Objects Report Series",
  number      = "\#1997:16",
  year        = "September 28, 1997",
}


Numerical Solution of PDEs on Parallel Computers Utilizing Sequential Simulators

Are Magnus Bruaset    Xing Cai    Hans Petter Langtangen    Aslak Tveito


Abstract

We propose a strategy, based on domain decomposition methods, for parallelizing existing sequential simulators for solving partial differential equations. Using an object-oriented programming framework, high-level parallelizations can be done in an efficient and systematic way. Concrete case studies, including numerical experiments, are provided to further illustrate this parallelization strategy.

Chapter 1
Introduction

The purpose of this paper is to address the following problem: How can existing sequential simulators be utilized in developing parallel simulators for the numerical solution of partial differential equations (PDEs)? We discuss this issue in the light of object-oriented (OO) programming techniques, which can substantially increase the efficiency of implementing parallel PDE software. Our parallelization approach is to use domain decomposition (DD) as an overall numerical strategy. More precisely, DD is applied at the level of subdomain simulators, instead of at the level of linear algebra. This gives rise to a simulator-parallel programming model that allows easy migration of sequential simulators to multiprocessor platforms. We propose a generic programming framework in which the simulator-parallel model can be realized in a flexible and efficient way. The computational efficiency of the resulting parallel simulator depends strongly on the efficiency of the original sequential simulators, and can be enhanced by the numerical efficiency of the underlying DD structure. This parallelization strategy enables a high-level parallelization of existing OO codes. We show that the strategy is flexible and efficient, and we illustrate the parallelization process by developing some concrete parallel simulators in Diffpack [DP].

Chapter 2
Domain decomposition and parallelization

Roughly speaking, DD algorithms (see e.g. [SBG96]) search for the solution of the original large problem by iteratively solving many smaller problems over subdomains. DD methods are very efficient numerical techniques for solving large linear systems, even on sequential computers. Such methods are particularly attractive for parallel computing if the subproblems can be solved concurrently.

In this paper we concentrate on a particular DD method, the (overlapping) additive Schwarz method (see [SBG96]). In this method the subdomains form an overlapping covering of the original domain. The method can be formulated as an iterative process, where in each iteration we solve updated boundary value problems over the subdomains. The work on each subdomain in each iteration consists mainly of solving the PDE(s) restricted to the subdomain, using values from its neighboring subdomains, computed in the previous iteration, as Dirichlet boundary conditions. The subproblems can be solved in parallel because neighboring subproblems are only coupled through previously computed values on the non-physical inner boundaries in the overlapping regions. The convergence of the above iterative process depends on the amount of overlap. It can be shown for many elliptic PDE problems that the additive Schwarz method has optimal convergence behavior, for a fixed level of overlap, when an additional coarse grid correction is applied in each iteration (see e.g. [CM94] for the details).
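To make the algorithmic structure concrete, the following small, self-contained C++ program (our own illustration, not taken from the report or from Diffpack) applies the overlapping additive Schwarz idea to the 1D Poisson problem -u''(x) = f(x) on (0,1) with homogeneous Dirichlet conditions, using two overlapping subdomains and a direct tridiagonal solve on each. The grid size, the overlap and all names are arbitrary choices. In a parallel setting the two local solves would run on different processors, and the exchange of interface values would be a message passing step. No coarse grid correction is included in this sketch.

    // Minimal sketch of the overlapping additive Schwarz method for
    //   -u''(x) = f(x) on (0,1),  u(0) = u(1) = 0,
    // with two overlapping subdomains. Each subdomain solve uses Dirichlet
    // values taken from the neighboring subdomain's previous iterate.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    const double PI = 3.14159265358979323846;

    // Right-hand side and exact solution used for verification.
    static double rhs(double x)   { return PI * PI * std::sin(PI * x); }
    static double exact(double x) { return std::sin(PI * x); }

    // Solve -u'' = rhs on the global nodes lo..hi with Dirichlet values uLo, uHi
    // at nodes lo and hi (mesh size h); returns the values at nodes lo+1..hi-1.
    static std::vector<double> localSolve(int lo, int hi, double h,
                                          double uLo, double uHi) {
      const int n = hi - lo - 1;                 // number of interior unknowns
      std::vector<double> diag(n, 2.0), d(n);
      for (int i = 0; i < n; ++i) d[i] = h * h * rhs((lo + 1 + i) * h);
      d[0] += uLo;
      d[n - 1] += uHi;
      for (int i = 1; i < n; ++i) {              // Thomas algorithm: elimination
        const double w = -1.0 / diag[i - 1];
        diag[i] -= w * (-1.0);
        d[i]    -= w * d[i - 1];
      }
      std::vector<double> u(n);
      u[n - 1] = d[n - 1] / diag[n - 1];
      for (int i = n - 2; i >= 0; --i)           // back substitution
        u[i] = (d[i] + u[i + 1]) / diag[i];
      return u;
    }

    int main() {
      const int N = 100;                         // global grid: nodes 0..N
      const double h = 1.0 / N;
      const int a = 35, b = 65;                  // subdomain 1 = [0,b], subdomain 2 = [a,1]

      double gamma1 = 0.0;                       // Dirichlet value for subdomain 1 at node b
      double gamma2 = 0.0;                       // Dirichlet value for subdomain 2 at node a

      for (int it = 1; it <= 50; ++it) {
        // Both solves use interface data from the previous iteration, so they
        // could run concurrently (the "additive" property of the method).
        std::vector<double> u1 = localSolve(0, b, h, 0.0, gamma1);
        std::vector<double> u2 = localSolve(a, N, h, gamma2, 0.0);

        // Exchange of interface values (message passing in a parallel setting).
        const double newGamma1 = u2[b - a - 1];  // subdomain 2's value at node b
        const double newGamma2 = u1[a - 1];      // subdomain 1's value at node a
        const double change = std::fabs(newGamma1 - gamma1)
                            + std::fabs(newGamma2 - gamma2);
        gamma1 = newGamma1;
        gamma2 = newGamma2;
        std::printf("iteration %2d  interface change %.2e\n", it, change);
        if (change < 1e-10) break;
      }

      // Compare with the exact solution at the two interface nodes.
      std::printf("u(x_a): computed %.6f  exact %.6f\n", gamma2, exact(a * h));
      std::printf("u(x_b): computed %.6f  exact %.6f\n", gamma1, exact(b * h));
      return 0;
    }

The convergence rate of the interface values depends on the size of the overlap (here the nodes between a and b), in line with the discussion above.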

Chapter 3
A simulator-parallel model in Diffpack

We hereby propose a simulator-parallel model for parallelizing existing sequential PDE simulators. The programming model uses DD at the level of subdomain simulators and assigns to each processor of a parallel computer a sequential simulator, which is readily extended from existing sequential simulator(s). The global administration and communication associated with the parallel computation can easily be realized in an OO framework. We have tested this parallelization approach within Diffpack [DP], which is an OO environment for scientific computing, with particular emphasis on the numerical solution of PDEs.

The C++ Diffpack libraries contain user-friendly objects for I/O, GUIs, arrays, linear systems and solvers, grids, scalar and vector fields, visualization and grid generation interfaces, etc. The flexible and modular design of Diffpack allows easy incorporation of new numerical methods into its framework, whose content grows constantly. In Diffpack, a finite element (FE) based PDE solver is typically realized as a C++ class having a grid, FE fields for the unknowns, the integrands in the weak formulation of the PDE, a linear system toolbox and some standard functions for prescribing essential boundary conditions, together with some interface functions for data input and output. The parallelization approach offers flexibility in the sense that different types of grids, linear system solvers, preconditioners, convergence monitors, etc. are allowed for different subproblems. A new parallel simulator can thus be derived from reliable sequential simulators in a flexible and efficient way.
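The following outline is our own simplified illustration of what such a sequential simulator class typically offers; the class and member names are hypothetical and do not reproduce the actual Diffpack design.

    // Hypothetical outline of a sequential FE-based simulator class of the kind
    // described above. Names are illustrative only, not the real Diffpack classes.
    class PoissonSimulator {
    public:
      virtual ~PoissonSimulator() {}
      void scan();            // read the grid, input data and solver parameters
      void solveProblem();    // assemble the FE equations and solve the linear system
      void saveResults();     // output fields for visualization and post-processing

    protected:
      virtual void integrands();      // integrands in the weak formulation of the PDE
      virtual void setEssentialBC();  // prescribe essential (Dirichlet) boundary conditions

      // Data members would typically include a grid object, FE fields for the
      // unknowns, and a linear system toolbox (types omitted in this sketch).
    };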

Diffpack was originally designed without paying particular attention to parallel computing. A large number of flexible, efficient, extensible, reliable, and optimized sequential PDE solver classes have been developed during recent years. The parallelization strategy presented in this paper thus extends Diffpack by offering means of adapting these sequential solvers for concurrent computers, hopefully without significant loss of computational efficiency compared to a special-purpose application code implemented particularly for parallel computing. Normally, such special-purpose parallel codes employ distributed arrays, grids and so on. That is, the conceptual model contains abstractions representing global quantities. In the simulator-parallel model proposed above, we only work with local PDE problems. This avoids the need for data distribution in the traditional sense. We are only concerned with a standard, sequential PDE solver on each processor and some glue for communicating boundary values for the local problems, in addition to an overall numerical iteration. OO programming is a key ingredient that makes this migration to parallel computers fast and reliable. The advantages of such an approach are obvious and significant: (1) optimized and reliable existing sequential solvers can be re-used for each subproblem, (2) message passing statements are kept to a minimum and can be hidden from the application programmer in generic objects, (3) the extra glue for communication and iteration is just a short piece of code at the same abstraction level as the theoretical description of the parallel DD algorithm. We believe that the suggested parallel extension of Diffpack will make it much easier for application developers to utilize parallel computing environments. Nevertheless, the fundamental question is how efficient our general high-level approach is. This will be addressed in the numerical experiments.

Chapter 4
A generic programming framework

To increase flexibility and portability, the simulator-parallel model is realized in a generic programming framework consisting of three main parts (see Figure 4.1): 1. SubdomainSimulator; 2. Communicator; 3. Administrator. Each of the three parts of the programming framework is implemented as a C++ class hierarchy, where different subclasses specialize in different types of PDEs and specific numerical methods. In this way a programmer can quickly adapt his existing sequential simulator(s) to the framework. The subsequent discussion requires some knowledge of OO programming techniques.

The base class SubdomainSimulator is a generic interface class offering an abstract representation of a sequential simulator. First, the class contains virtual functions that are to be overridden in a derived class to make the connection between SubdomainSimulator and an existing sequential simulator. Second, the class contains functions for accessing the subdomain local data needed by the communication between neighboring subdomains. We have introduced subclasses of SubdomainSimulator to generalize different specific simulators, e.g. SubdomainFEMSolver for simulators solving a scalar/vector elliptic PDE discretized by FE methods and SubdomainFDMSolver for simulators using finite difference discretizations.

With the data access functions of SubdomainSimulator at its disposal, the base class Communicator is designed to generalize the communication between neighboring subdomains. The primary task of the class is to determine the communication pattern, i.e., to find on which processors the neighboring subdomain simulators reside, which parts of the local data should be sent to each neighbor, etc. A hierarchy of different communicators is also built to handle different situations. One example is CommunicatorFEMSP, which specializes in communication between subdomain simulators using the FE discretization.

[Figure 4.1: A generic framework for the simulator-parallel programming model. Each processor (0, 1, ..., n) hosts an Administrator, a SubdomainSimulator and a Communicator, and the processors are connected through the communication network.]

More importantly, the concrete message passing model, which is MPI in the current implementation, is hidden from the user, such that a new message passing model may easily be inserted without changing the other parts of the framework.

Finally, Administrator performs the numerics underlying the iterative DD process. An object of Administrator exists on each subdomain, and these objects coordinate with each other. Typically, Administrator has under its control a SubdomainSimulator and a Communicator, so that member functions of SubdomainSimulator are invoked for computation and member functions of Communicator are invoked for communication. A hierarchy of administrator classes is built to realize different solution processes for different model problems, among them the class PdeFemAdmSP for the case of a scalar/vector elliptic PDE discretized by FE methods.
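The skeleton below indicates how the three parts can fit together. It is a simplified illustration with hypothetical member functions and signatures, not the actual class interfaces of the framework; only the division of responsibilities reflects the description above.

    // Simplified skeleton (hypothetical names and signatures) showing how an
    // Administrator drives the DD iteration through a SubdomainSimulator and a
    // Communicator.
    #include <vector>

    class SubdomainSimulator {
    public:
      virtual ~SubdomainSimulator() {}
      virtual void initialize() = 0;              // bind to the existing sequential simulator
      virtual void solveLocalProblem() = 0;       // solve the PDE restricted to this subdomain
      // Access to the local data needed for communication with the neighbors:
      virtual std::vector<double> boundaryData() const = 0;
      virtual void setBoundaryData(const std::vector<double>& values) = 0;
    };

    class Communicator {
    public:
      virtual ~Communicator() {}
      virtual void setupPattern(SubdomainSimulator& sim) = 0;  // who are my neighbors, what to send
      virtual void exchange(SubdomainSimulator& sim) = 0;      // message passing (e.g. MPI) hidden here
    };

    class Administrator {
    public:
      Administrator(SubdomainSimulator& s, Communicator& c) : sim(s), comm(c) {}

      // The overall iteration, at the same abstraction level as the algorithm in Chapter 2.
      void solve(int maxIterations) {
        sim.initialize();
        comm.setupPattern(sim);
        for (int it = 0; it < maxIterations && !converged(); ++it) {
          sim.solveLocalProblem();   // local computation on this processor
          comm.exchange(sim);        // update inner-boundary Dirichlet values
        }
      }

    private:
      bool converged() { return false; }  // placeholder; a real test monitors the global residual
      SubdomainSimulator& sim;
      Communicator& comm;
    };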

Chapter 5
Some numerical experiments

In this section, we apply the proposed parallelization strategy to develop parallel simulators for solving the Poisson equation, the linear elasticity problem and the two-phase porous media flow problem, respectively. The implementation of the parallel simulators consists essentially of extending the existing Diffpack sequential simulators to fit into the programming framework mentioned above. Since we apply the FE discretization for elliptic PDEs, the situation for the first two test problems is straightforward, because only a single elliptic PDE is involved in each case, even though it is a scalar equation in one case and a vector equation in the other. The classes CommunicatorFEMSP and PdeFemAdmSP can be used directly. The only necessary implementation is thus to derive a new simulator class as a subclass of both the existing sequential simulator and the class SubdomainFEMSolver. Using a subclass for gluing the sequential solver and the parallel computing environment together avoids any modification of the original sequential solver class. The parallelization is done within an hour. The situation for the third test problem is slightly more demanding, because it involves a system of PDEs consisting of an elliptic PDE and a hyperbolic PDE. The two PDEs are solved with different numerical methods, which means additional work for implementing a suitable administrator as a subclass of PdeFemAdmSP. However, programming in the proposed framework enables a quick implementation, which is done in a couple of days.

In the following, we give for each test problem the mathematical model and CPU consumption measurements associated with different numbers of processors in use. The measurements are obtained on an SGI Cray Origin 2000 parallel computer with R10000 processors. The resulting parallel simulators demonstrate not only nice scalability, but also, in some simulations, the intrinsic numerical efficiency of the underlying DD algorithm.
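A sketch of this gluing technique is shown below. The sequential class PoissonSimulator refers to the hypothetical outline in Chapter 3, and the member function names attributed to SubdomainFEMSolver are illustrative guesses; only the multiple-inheritance pattern itself reflects the approach described above.

    // Sketch (hypothetical names) of the gluing subclass: the existing sequential
    // simulator is combined, unchanged, with the framework interface class.
    class ParallelPoissonSimulator : public PoissonSimulator,     // existing sequential solver
                                     public SubdomainFEMSolver {  // framework interface class
    public:
      // Override the framework's virtual functions by forwarding to functionality
      // that already exists in the sequential simulator:
      virtual void initialize()        { PoissonSimulator::scan(); }
      virtual void solveLocalProblem() { PoissonSimulator::solveProblem(); }
      // The data access functions needed by CommunicatorFEMSP would likewise be
      // implemented here in terms of the sequential simulator's grid and fields.
    };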

5.1 The Poisson equation

Our first test problem is the 2D Poisson equation on the unit square Ω, with a right-hand side and Dirichlet boundary conditions such that u(x, y) = x e^y is the solution of the problem.

We denote the number of subdomains by M and introduce an underlying uniform 481 × 481 global grid covering Ω. (The grid is only used as the basis for partitioning for different choices of M.) With the number of processors P equal to M, we study the CPU consumption of the DD solution process for different choices of P. Suitable coarse grid corrections, together with a fixed level of overlap, are used to achieve convergence to a prescribed accuracy in the same number of DD iterations, independently of P. The largest coarse grid problem is considerably smaller than any subproblem, so the CPU time spent on the coarse grid correction is negligible. To obtain a reference CPU consumption associated with a single processor, we make a sequential implementation of the same iterative DD process and measure the CPU consumption with M = 4. In this test problem we are able to use a fast direct solver for the subproblems. The important property is that its computational work is linearly proportional to the number of unknowns. From the numerical experiments, we observe that the communication overhead is very small in comparison with the solution of the subproblems. These two factors together account for the nice speedup in Table 5.1.

Table 5.1: CPU consumption (in seconds) of the iterative DD solution process for different M. The fixed total number of unknowns is 481 × 481.

   P    M     CPU    speedup    subdomain grid
   1    4    44.41     --         257 × 257
   4    4    11.37     3.91       257 × 257
   6    6     7.92     5.61       257 × 181
   8    8     5.84     7.60       257 × 135
  12   12     3.92    11.33       257 ×  91
  16   16     3.06    14.51       257 ×  69

Even though DD methods achieve convergence independently of the problem size, it might be more efficient to use a standard Krylov subspace method for the current test problem when only a small number of processors is available. So the purpose of the current test problem is mainly to verify that our implementation of the generic programming framework keeps the communication overhead at a negligible level. In the following case of solving a linear elasticity problem, we will see that the parallel DD approach shows its numerical efficiency even for a discrete problem of medium size.
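For reference, the speedup values reported in Table 5.1 (and in the tables below) are the ratio of the single-processor CPU time to the CPU time on P processors, e.g.

    S(P) = T(1)/T(P),    S(4) = 44.41/11.37 ≈ 3.91.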

5.2 The linear elasticity problem

The 2D deformation of an elastic continuum subject to plane strain can be modelled by the following partial differential equation,

    −μΔU − (λ + μ)∇(∇·U) = f,

where the displacement field U = (u1, u2) is the primary unknown and f is a given vector function. Here λ and μ are constants. The 2D domain Ω is a quarter of a hollow disk, see Figure 5.1. On the boundary the stress vector is prescribed, except on Γ1, where u1 = 0, and on Γ2, where u2 = 0.

[Figure 5.1: The solution domain for the linear elasticity problem; the domain Ω is a quarter of a hollow disk with boundary parts Γ1 and Γ2.]

We use a structured 241 × 241 curvilinear global grid as the basis for partitioning Ω into M overlapping subdomain grids. Again we have P = M. In this test problem we apply an inexact solver for the subproblems. More precisely, a conjugate gradient method preconditioned with incomplete LU factorization (ILU) is used to solve the subproblems. Convergence of a subproblem is considered reached when the residual of the local equation system, in the discrete L2-norm, is reduced by a factor of 10^2. For the global convergence of the DD method we require that the global residual, in the discrete L2-norm, is reduced by a factor of 10^4. In Table 5.2, we list the CPU consumption and the number of iterations I used in connection with different M. The fixed number of degrees of freedom is 116,162. To demonstrate the superior efficiency of the parallel DD approach, we also list the CPU consumption and number of iterations used by a sequential conjugate gradient method, preconditioned with ILU, for the same global convergence requirement and problem size. Note that the superlinear speedup is due to the fact that the preconditioned conjugate gradient method works more efficiently for smaller subproblems.

Table 5.2: CPU consumption and number of iterations used by the parallel DD approach. The measurements for M = 1 are associated with a sequential conjugate gradient method preconditioned with ILU.

   M     CPU     speedup     I    subdomain grid
   1   110.34      --       204     241 × 241
   2    40.97     2.69        7     129 × 241
   4    23.20     4.76       11     129 × 129
   6    12.43     8.88        8      93 × 129
   8     7.95    13.88        7      69 × 129
  12     4.95    22.29        7      47 × 129
  16     3.64    30.31        7      35 × 129

5.3 The two-phase porous media flow problem

We consider a simple model of two-phase (oil and water) porous media flow in oil reservoir simulation,

    s_t + v·∇f(s) = 0       in Ω × (0, T],    (5.1)
    −∇·(λ(s)∇p) = q         in Ω × (0, T].    (5.2)

In this system of PDEs, s and p are the primary unknowns and v = −λ(s)∇p. We carry out simulations for the time interval 0 < t ≤ 0.4 seconds. At each discrete time level, the two PDEs are solved in sequence. The hyperbolic equation (5.1), referred to as the saturation equation, is solved by an explicit finite difference scheme, whereas the elliptic equation (5.2), referred to as the pressure equation, is solved by a FE method. The 2D spatial domain Ω is the unit square with impermeable boundaries. An injection well (modelled by a delta function) is located at the origin and a production well at (1, 1). Initially, s = 0 except at the injection well, where s = 1.

A uniform 241 × 241 global grid is used to cover Ω. For all the discrete time levels, the convergence requirement for the DD method when solving (5.2) is that the discrete L2-norm of the global residual becomes smaller than 10^-8. We have used a conjugate gradient method preconditioned with ILU as the inexact subdomain solver, where local convergence is considered reached when the local residual in the discrete L2-norm is reduced by a factor of 10^3. The resulting CPU consumption is summarized in Table 5.3.
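The sketch below (our illustration, with hypothetical function names) summarizes the work performed at one discrete time level by the administrator subclass mentioned in the introduction to this chapter:

    // Sketch of the solution process at one discrete time level in the two-phase
    // flow simulator: the elliptic pressure equation (5.2) is solved by the
    // parallel DD method, after which the hyperbolic saturation equation (5.1)
    // is advanced by an explicit scheme. All names are hypothetical.
    void TwoPhaseAdministrator::advanceOneTimeStep(double dt) {
      // 1. Pressure equation: additive Schwarz iterations over the subdomains,
      //    with ILU-preconditioned CG as the inexact subdomain solver.
      solvePressureByDD();

      // 2. Velocity field: recover v = -lambda(s) grad p from the new pressure.
      computeVelocity();

      // 3. Saturation equation: explicit finite difference update of s, which
      //    needs only an exchange of boundary values with the neighbors and no
      //    further iteration.
      updateSaturationExplicitly(dt);
    }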

Table 5.3: Simulations of two-phase flow by the parallel DD approach. The columns list the total CPU consumption for (1) the whole simulation, (2) the pressure equation and (3) the saturation equation, accumulated over 744 discrete time levels.

   M    total CPU    subdomain grid    CPU pres. eq.    CPU sat. eq.
   2    12409.01       129 × 241         11807.60          241.41
   4     5343.50       129 × 129          5202.79          140.71
   6     3201.89       129 ×  91          3101.38          100.09
   8     2827.77       129 ×  69          2770.05           77.72
  12     1881.33        91 ×  69          1826.82           54.51
  16     1388.75        69 ×  69          1346.36           42.39

Acknowledgments

This work has received support from The Research Council of Norway (Programme for Supercomputing) through a grant of computing time. The authors also thank Dr. Klas Samuelsson for numerous discussions on various subjects.

Bibliography

[CM94]  T. F. Chan and T. P. Mathew. Domain decomposition algorithms. Acta Numerica, pages 61-143, 1994.

[DP]    Diffpack Home Page. http://www.nobjects.com/Diffpack.

[SBG96] B. F. Smith, P. E. Bjørstad and W. D. Gropp. Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge University Press, 1996.
