
NATIONAL ACADEMY OF SCIENCES OF UKRAINE
V.M. GLUSHKOV INSTITUTE OF CYBERNETICS

NUMERICAL SOFTWARE
FOR INTELLIGENT MIMD-COMPUTER
INPARCOM

KIEV
Naukova Dumka

2007

The monograph is devoted to the use of the intelligent program tool Inpartool and the library of intelligent programs Inparlib for investigating and solving the basic classes of problems of computational mathematics on the intelligent parallel computer Inparcom. The mathematical and engineering characteristics of Inparcom are described. The design principles of Inpartool and Inparlib, their components, and the technology of their use for each class of problems are considered in detail. The book contains many programs, examples and problems needed to organize the efficient solving of computational mathematics problems on the intelligent computer Inparcom.

The book is intended for those who solve scientific and engineering problems with approximately given initial data on computers, as well as for postgraduate students and senior undergraduate students in the relevant specialties.

© A.N. Himich, I.N. Molchanov, V.I. Mova, O.L. Perevozchikova, V.A. Stryuchenko, A.V. Popov, T.V. Chistyakova, M.F. Yakovlev, T.A. Gerasimova, V.S. Zubatenko, A.V. Gromovsky, A.N. Nesterenko, V.V. Polyanko, O.V. Rudich, R.A. Yushchenko, A.A. Nikolaychuk, A.S. Gorodetsky, Ya.E. Slobodyan, Yu.D. Geraymovich, 2007


PREFACE

For most scientific and engineering problems simulated on computers, an intermediate or final stage consists in solving problems of computational mathematics with approximately given initial data. The basic problems of computational mathematics include investigating and solving linear algebraic systems, evaluating eigenvalues and eigenvectors of matrices, solving systems of non-linear equations, and numerically integrating initial-value problems for systems of ordinary differential equations.

A characteristic feature of mathematical models is that the error in the specification of the initial data must be taken into account along with the mathematical equations describing the models, and the reliability of the obtained results must ultimately be guaranteed.

The problem of the reliability of computer solutions of mathematical problems has two natural aspects: the reliability of the mathematical models describing the application problem, and the reliability of the computer solution.

Another, no less important, aspect of the practical implementation of numerical simulation methods is the creation of end-user software: intelligent programs that provide both communication with the computer in terms of the subject area language and automation of all stages of solving the problem on the computer (algorithmization, programming, and solving problems with approximate initial data together with an analysis of the reliability of the obtained computer solutions).

A concept of intelligent computers intended for investigating and solving scientific and engineering problems, whose architecture and system software support intelligent software, has been developed at the V.M. Glushkov Institute of Cybernetics of the National Academy of Sciences of Ukraine. This concept is implemented within the framework of the Inparcom project, carried out jointly with the State Scientific Production Enterprise “Electronmash”.

Intelligent MIMD-computers Inparcom are knowledge-based computers intended for investigating and solving scientific and engineering problems with approximately given initial data (see http://www.inparcom.com). Inparcom makes it possible to: formulate a problem on the computer in terms of the subject area language; automatically investigate the characteristics of the computer model of a problem with approximately given initial data; determine, in accordance with the revealed characteristics and taking into account the mathematical and engineering characteristics of the computer, the number of processors required for solving the problem, and construct both an algorithm and a computational scheme; form a topology of the MIMD-computer’s processors that solves the problem with the least possible consumption of machine time; synthesize a program of parallel computations in accordance with this configuration; solve the problem; estimate both the reliability of the obtained computer solution (the proximity between the computer and mathematical solutions) and the inherited error in the mathematical solution of the problem; and visualize the results of solving the problem in terms of the subject area language.

The intelligent numerical software for Inparcom includes: Inpartool, an intelligent program tool intended for the automatic investigating and solving of the basic problems of computational mathematics, including the solving of problems over the Internet; and Inparlib, a library of intelligent programs intended for investigating and solving problems by means of an algorithm chosen by the user (copyright certificate No. 17213 of the State Department of Intellectual Property, dated 11.07.2006). The concept of the intelligent software has been tested on the 16-processor intelligent workstation Inparcom-16.

Inparcom-16 includes: a host system consisting of two host computers (each: Xeon 3.2 GHz, 64-bit machine word, 8 Gbyte of main memory and 72 Gbyte of disk memory); a processing unit consisting of 16 computing nodes (each: Xeon 3.2 GHz, 64-bit machine word, 2 Gbyte of main memory and 36 Gbyte of disk memory); and a communication environment consisting of two switching networks (based on Gigabit Ethernet and Infiniband) and a hypercube network (based on Gigabit Ethernet).

An objective of the monograph is to present a methodology for the automatic investigating and solving of scientific and engineering problems on Inparcom. The technology of investigating and solving scientific and engineering problems is demonstrated on the intelligent workstation Inparcom-16.

The first two chapters deal with the basic concepts of intelligent computers and intelligent software, as well as with a methodology for creating intelligent software based on the concept of knowledge.

Chapters 3-6 contain user guides for the Inpartool intelligent software intended for solving classes of problems of computational mathematics. The fundamental capabilities of Inpartool and the technology of its use for each class of problems are described; many examples demonstrating the innovative characteristics of the program tool are given, and the facilities of the dialog interface are reviewed.

The library of intelligent programs, Inparlib, is presented in Chapter 7. The principles of its creation are considered; its capabilities in solving four classes of problems of computational mathematics are outlined; descriptions of the parameters of every C function are given, together with program fragments that form the initial data.

Results of testing the intelligent software on problems of the strength analysis of building structures (jointly with PC LIRA, http://www.lira.com) are given in Chapter 8.


Chapter 1. THE INTELLIGENT COMPUTER INPARCOM

To solve a scientific or engineering problem, one needs to formulate an application problem, construct a physical model, and describe it by means of mathematical formulas to obtain a mathematical model. By means of discretization (for example, the finite element or finite difference method) this model is “arithmetized”, i.e. reduced to a discrete model. Information about the discrete model is entered into the computer, which finally yields the computer model of the problem. If the problem is being solved for the first time then, as a rule, its mathematical and discrete models are investigated in order to reveal its characteristic features and to choose a solution method properly.

Based on the solution method, a solution algorithm is constructed, taking into account the characteristics of the discrete model of the problem. A computational scheme implementing the solution algorithm is constructed, taking into account the mathematical and engineering characteristics of both the computer and its system software (programming language, computer, operating system, etc.). The computational scheme is then written as a program in some programming language, and compiling it yields the problem’s solution algorithm in machine code. Within this technological scheme the computer solves the problem by means of this program, using information about the problem’s computer model and its characteristics. Solving the problem yields a computer solution whose reliability must be estimated by the user on his own.

Such a traditional scheme for investigating and solving scientific and engineering problems on a computer may encounter a number of hidden obstacles that lead to computer solutions making no physical sense.

One of the main characteristic features here is that the computer model, which is what is ultimately investigated, is approximate with respect to the original problem (because of inherited error in the initial data, discretization error, errors arising when numerical data are entered into the computer, etc.).


The next characteristic feature is that the characteristics of the problem’s computer model may differ from those of the mathematical problem. Note that the error in the specification of the initial data should be consistent with the characteristics of the mathematical problem. Without such consistency even an exact analytical solution makes no physical sense. It is therefore necessary to analyze the reliability of the obtained computer solutions [31].

One more important characteristic of numerical modeling is the use of computer algorithms for investigating and solving problems. A computer algorithm is oriented towards computer mathematics and takes into account the architecture of the computer, its mathematical and engineering characteristics, and its system software.

Finally, the key problem of numerical modeling is the reliability of the obtained computer solutions. The importance of this problem is confirmed by more than 20 years of experience of the NAFEMS working groups (National Agency for Finite Element Methods and Standards, UK) in 30 countries worldwide, whose main objective is to ensure the reliability and safety of engineering calculations based on the finite element method and related technologies.

Another, no less important, aspect of the practical realization of numerical modeling methods is the creation of end-user software: intelligent program tools that provide both communication between the end user and the computer, in terms of the subject area language, while solving scientific and engineering problems, and automation of the process of obtaining solutions with approximately given initial data together with a reliability analysis of the obtained solutions.

At present, parallel-architecture supercomputers [13] have become one of the main computing facilities for the numerical modeling of complicated processes in various subject areas. The set of problems listed above has been extended by problems related to: the design of algorithms that take into account both the architecture and the engineering characteristics of these computers; the choice of the required number of processors; the distribution of data between processors; the synchronization of both computations and exchanges; and so on.

The intelligent computer [27] is a tool for overcoming the difficulties and “hidden obstacles” listed above.

The intelligent multiprocessor computer Inparcom, intended for investigating and solving scientific and engineering problems, is a computer whose structure, architecture and operating environment support intelligent software [26, 29, 30, 33]. By intelligent software for solving scientific and engineering problems we mean a set of programs that makes it possible to: formulate a problem on the computer in terms of the subject area language; automatically investigate the characteristics of the computer model of a problem with approximately given initial data; determine the number of processors required for solving the problem, according to the revealed characteristics and taking into account the mathematical and engineering characteristics of the computer, and construct both the solution algorithm and the computational scheme; form a topology of the MIMD-computer’s processors that solves the problem with the least possible consumption of machine time; synthesize a program of parallel computations in accordance with this configuration; solve the problem; estimate the reliability of the obtained computer solution (its proximity to the mathematical one) and derive an estimate of the inherited error in the solution of the mathematical problem; and visualize the results of solving the problem in terms of the subject area language.

A “hidden parallelism” principle is implemented in Inparcom. “Hidden parallelism” [25, 37] is a mode of user work on the intelligent computer in which problems are solved in the same manner as on a single-processor computer. All operations related to parallelization are carried out automatically by the computer.

The “hidden parallelism” involves:
- determination of both the optimal number of processors and the computer configuration that is efficient for solving the problem;
- creation of efficient algorithms (speedup Sp sufficiently close to the number of processors p; efficiency coefficient Ep sufficiently close to one) as well as synthesis of a parallel program for the chosen configuration;
- distribution of the initial information between processors;
- parallelization not only of arithmetic and logical operations but also of data exchanges between processors;
- uniform loading of the processors used in solving the problem;
- synchronization of exchanges between processors;
- minimization of exchanges between processors.
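The requirement that Sp be close to p and Ep close to one can be checked from measured run times. The following Python sketch assumes the usual definitions Sp = T1/Tp and Ep = Sp/p, which the text does not spell out; the function names and the timings are hypothetical and are not measurements from Inparcom-16.

```python
def speedup(t_serial, t_parallel):
    """Speedup S_p = T_1 / T_p (assumed standard definition)."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Efficiency E_p = S_p / p (assumed standard definition)."""
    if n_procs < 1:
        raise ValueError("number of processors must be at least 1")
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical run times in seconds, chosen only for illustration.
t1 = 120.0
timings = {2: 62.0, 4: 32.0, 8: 17.5, 16: 10.0}

for p, tp in timings.items():
    s = speedup(t1, tp)
    e = efficiency(t1, tp, p)
    print(f"p={p:2d}  S_p={s:5.2f}  E_p={e:4.2f}")
```

With such a table, an algorithm would be regarded as efficient on p processors when E_p stays close to one; the falling efficiency at larger p in the made-up data reflects growing exchange costs.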

The Inparcom software has three levels:
1) the operating environment supporting the intelligent software;
2) intelligent numerical software intended for investigating and solving problems of computational mathematics with approximately given initial data;
3) intelligent application software for classes of applications, for example, for investigating and solving problems of the strength analysis of structures.

Based on the concept of intelligent computers, the 16-processor workstation Inparcom-16 has been created; it can be manufactured with the number of processors required by the customer.

The Inparcom-16 workstation includes:
- a host system consisting of two host computers (each Xeon 3.2 GHz, 64-bit machine word, 8 Gbyte of main memory, 72 Gbyte of disk memory);
- a processing unit consisting of 16 computational nodes (each Xeon 3.2 GHz, 64-bit machine word, 2 Gbyte of main memory, 36 Gbyte of disk memory);
- a communication network consisting of Gigabit Ethernet, Infiniband and a hypercube.

The host system performs the following: control over the use of the multiprocessor computing resource; system-wide monitoring; communication with the users’ terminal networks; visualization of the results of solving the problem; and execution of that part of the computations and data processing which cannot be parallelized or parallelizes poorly. A host system with peripherals can be connected to either a local or a global network.

The processing unit is a uniform scalable structure consisting of a set of high-performance processors (for the workstation, 16 processors with their own main and disk memory) connected by an inter-processor communication network.

The operating environment of the intelligent computer is based on GNU/Linux solutions. However, the user can choose one of three versions of the installable operating system: Linux, Windows, or Linux+Windows. In the latter version the host’s operating environment provides an on-demand automatic switch between Linux and Windows by rebooting the nodes. The installed version of Linux, based on Scientific Linux 4.2 (dated 22.11.2005), is optimized for Inparcom’s hardware. The installed version of Windows is XP SP2. The operating environment of the parallel computer is based on MPI. Under Linux, both MVAPICH (optimized for Infiniband) and LAM MPI are installed; under Windows, MPICH. Another widespread message-passing system, PVM (Parallel Virtual Machine), is installed to support the maximum number of third-party applications.

The GCC compiler included in Linux supports the programming languages C, C++, Fortran and others. The operating environment includes the Apache web server supporting applications written in PHP, the MySQL database management system, standard mathematical libraries (including ScaLAPACK), tests (Linpack, Scali) and a distributed file system.

Intelligent parallel computers possess high technical and economic efficiency (a reduced “solution time to cost” ratio), which is achieved by the system integration of advanced microelectronic elements, functional modules, and mass-produced devices and assemblies that use standardized inter-module interfaces and basic program tools.


The most important problems related to the organization of parallel computations and to programming are addressed in intelligent parallel computers:
- revealing the natural parallelism of the classes of problems being solved and using automatic mechanisms for exploiting it;
- determining the minimum amount of the intelligent computer’s processor resource needed to balance the loading of the nodes of the processing unit optimally while solving the problem;
- substantially improving the conditions for solving classes of problems by varying the architecture and communication topologies of the computing nodes of the intelligent computer, in order to save both time and the amount of main and disk memory used.

The scalability of intelligent computers makes it possible to slow down their obsolescence and to reduce the cost of continually replacing old equipment.

The operating environment provides:
- job queuing and running of the parallelized program on the chosen computing nodes;
- monitoring of both the intelligent computer and the executing jobs;
- saving and visualization of protocols of parallel computations;
- running of an application (executable program code) on the host computer;
- work via a local network and/or the Internet (remote access);
- design of parallel programs;
- management of the parts of the distributed file system accessible to users.

The intelligent software includes: Inpartool, the intelligent software for the automatic investigating and solving of the basic problems of computational mathematics; and Inparlib, a library of intelligent programs intended for investigating and solving a problem chosen by the user (a set of programs implementing complete parts of computational algorithms).

It is most reasonable to use intelligent computers in:
- engineering (machine building, construction, power systems, etc.) and scientific (geophysics, physics, chemistry, biology, pharmacology, etc.) calculations;
- virtual design;
- modeling of various physical objects;
- creation of simulators for training personnel to operate up-to-date machinery, including real-time simulators;
- banking and financial activities.

General advantages of intelligent computers include:
- freeing users from the complexities of investigating the problem and creating algorithms and programs, which reduces the time required to formulate and solve problems by a factor of at least 100;
- computer statement of the user’s problem with approximately given initial data in terms of the subject area language;
- obtaining the computer solution of the problem together with a reliability estimate and certain of its characteristics;
- reducing the time required for the computer investigating and solving of problems in comparison with solving the same problems on a MIMD-computer with the same number of processors and the same element base but with a traditional parallel architecture.


Chapter 2. INTELLIGENT SOFTWARE INPARTOOL FOR THE INVESTIGATING AND SOLVING OF PROBLEMS OF THE COMPUTATIONAL MATHEMATICS

2.1. Conception of Inpartool

Most problems in engineering and science that are simulated on computers have, as an intermediate or final stage, the solving of problems of computational mathematics with approximately given initial data. The basic problems of computational mathematics include: solving linear algebraic systems; finding eigenvalues and eigenvectors of matrices; solving non-linear algebraic systems; and numerically integrating initial-value problems for systems of ordinary differential equations.

It is well known that the efficient solving of mathematical problems with approximately given initial data requires the following investigations:
- to establish the existence of a classical or generalized solution;
- to find out whether a unique classical or generalized solution can be determined;
- to determine the stability of the solution;
- to find the region within which the mathematical solution makes physical sense;
- to estimate the error in the mathematical solution caused by the error in the initial data.

It should be emphasized that, because of the error in the initial data, the mathematical problem is to be considered as a problem with a priori unknown characteristics. The machine model of the problem that is ultimately implemented on the computer is always approximate with respect to the mathematical problem, because of the error that arises when numerical information about the problem is entered into the computer. This error is caused, in particular, by the following:

- the continuum of real numbers is approximated in the computer by a finite set of simple fractions (even the input of numerical data causes rounding errors);
- the phenomenon of “machine zero” gives rise to a number of difficulties in implementing computational algorithms (any up-to-date computer has a least positive number that it can represent; all numbers smaller in modulus than this number are replaced by zero);
- computer arithmetic operations differ from their mathematical counterparts: the associative and distributive laws do not hold on any up-to-date computer, while the commutative laws for floating-point operations hold only if rounding is performed correctly.
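The three sources of error listed above can be observed directly. The short Python sketch below assumes IEEE 754 double-precision arithmetic (which standard Python floats use on common platforms); the variable names are illustrative only.

```python
from decimal import Decimal

# 1. Decimal-to-binary conversion: 0.1 has no exact binary representation,
#    so the stored number differs from the number that was typed in.
stored = Decimal(0.1)                      # exact value of the stored double
conversion_error = abs(stored - Decimal("0.1"))

# 2. "Machine zero": halving the least positive representable double
#    (a subnormal number) gives exactly zero.
least_positive = 5e-324
underflowed = least_positive / 2

# 3. Associativity of addition fails ...
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# ... while commutativity holds under correct rounding.
commutes = (0.1 + 0.2) == (0.2 + 0.1)

print("conversion error of 0.1:", conversion_error)
print("least positive / 2     :", underflowed)
print("(a+b)+c == a+(b+c)     :", left == right)
print("a+b == b+a             :", commutes)
```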

So, it is necessary to carry out a computer investigation of the mathematical characteristics of the computer models of problems, namely:
- to establish the existence and uniqueness of the solution of the problem’s computer model;
- to investigate the stability of the solution with respect to the errors of decimal-to-binary conversion of numbers;
- to determine the characteristic features of the computer model of the problem in order to choose an efficient algorithm for solving it;
- to estimate the inherited error in the mathematical solution;
- to estimate the computational error in the obtained solution, i.e. the proximity between the obtained and exact solutions of the computer problem.
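For a linear system Ax = b, the last two items can be illustrated with textbook estimates. The sketch below is not Inpartool’s algorithm: it assumes the classical perturbation bound cond(A)(eps_A + eps_b) / (1 - cond(A) eps_A) for the relative inherited error, valid when cond(A) eps_A < 1, and a residual-based bound for the computational error. All function names and the sample matrix are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[piv][k] == 0.0:
            raise ValueError("matrix is singular to working precision")
        m[k], m[piv] = m[piv], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def norm_inf(a):
    """Infinity norm of a matrix (maximum absolute row sum)."""
    return max(sum(abs(v) for v in row) for row in a)


def inverse(a):
    """Inverse assembled column by column from solves with unit vectors."""
    n = len(a)
    cols = [solve(a, [float(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def cond_inf(a):
    """Condition number of a in the infinity norm."""
    return norm_inf(a) * norm_inf(inverse(a))


def inherited_error_bound(cond, eps_a, eps_b):
    """Relative inherited error bound; None if the data error is so large
    that the matrix may be singular within the accuracy of its entries."""
    if cond * eps_a >= 1.0:
        return None
    return cond * (eps_a + eps_b) / (1.0 - cond * eps_a)


def computational_error_bound(a, b, x):
    """Residual-based bound ||x - x_exact|| <= ||inv(a)|| * ||b - a x||."""
    r = [bi - sum(aij * xj for aij, xj in zip(row, x))
         for row, bi in zip(a, b)]
    return norm_inf(inverse(a)) * max(abs(v) for v in r)


# Hypothetical, nearly singular test system with exact solution (1, 1).
a = [[1.0, 2.0], [1.0, 2.0001]]
b = [3.0, 3.0001]
x = solve(a, b)
kappa = cond_inf(a)
print("solution:", x, " condition number:", kappa)
for eps in (1e-8, 1e-6, 1e-4):
    print("data error", eps, "-> bound", inherited_error_bound(kappa, eps, eps))
print("computational error bound:", computational_error_bound(a, b, x))
```

For the largest data error the function returns None: with that accuracy of the initial data the system cannot be distinguished from a singular one, which is the situation in which the text says a computed solution makes no physical sense.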

To solve problems on MIMD-computers, the user has to carry out the following additional work:
- to determine both the optimum number of processors and the topology of inter-processor communication for efficient solving of the problem;
- to provide uniform loading of the processors used in solving the problem;
- to provide synchronization of data exchanges between processors;
- to minimize the communication losses caused by the necessity of inter-processor data exchange.
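As an illustration of uniform loading, the following Python sketch splits the rows of a matrix into contiguous blocks whose sizes differ by at most one. It is a generic block distribution with hypothetical function names, not the distribution rule actually used on Inparcom.

```python
def block_ranges(n_rows, n_procs):
    """Split rows 0..n_rows-1 into n_procs contiguous half-open ranges
    whose lengths differ by at most one."""
    if n_rows < 1 or n_procs < 1:
        raise ValueError("n_rows and n_procs must be positive")
    base, extra = divmod(n_rows, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)   # first `extra` ranks get one more row
        ranges.append((start, start + size))
        start += size
    return ranges


def imbalance(ranges):
    """Largest block size divided by the mean block size (1.0 is ideal)."""
    sizes = [hi - lo for lo, hi in ranges]
    return max(sizes) / (sum(sizes) / len(sizes))


parts = block_ranges(10, 4)
print(parts, imbalance(parts))
```

Contiguous blocks keep each processor’s data in one piece, which tends to reduce the number of exchanges; other schemes (for example cyclic ones) trade this for better balance in algorithms whose work per row is uneven.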


Such work requires from users skills in parallel programming, knowledge of the mathematical and engineering features of the MIMD-computer, and the study of a great many operating instructions for packages and libraries of programs implementing parallel algorithms.

The difficulties that arise in solving problems of computational mathematics on MIMD-computers can be overcome by means of the intelligent program tool Inpartool, intended for solving the fundamental problems of computational mathematics.

Inpartool consists of separate components for investigating and solving problems from the following classes:
- linear algebraic systems;
- algebraic eigenvalue problem;
- non-linear equations and systems;
- ordinary differential equations and systems.

At the conceptual level, Inpartool implements the end user’s model and is a set of program and engineering tools for investigating and solving the user’s problems in the field of numerical methods.

For linear algebraic systems, Inpartool solves problems with matrices of various structure together with reliability estimates for the solution, inverts matrices, evaluates singular values and matrix ranks, estimates matrix condition numbers, etc.

For the algebraic eigenvalue problem (standard and generalized), Inpartool solves both partial and full eigenvalue problems with matrices of various structure (general, band or sparse). Inpartool makes it possible to evaluate condition numbers of individual eigenvalues and condition numbers of eigenvectors, and to estimate the computational and overall errors in the solutions.

For systems of non-linear and transcendental equations, Inpartool evaluates: the local condition number of the function f(x); the local condition number of the vector function F(x); termination criteria for the iterative processes; and the accuracy of the solution, taking into account the approximate nature of the initial data.
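How a termination criterion can reflect the approximate nature of the initial data is sketched below for a single equation f(x) = 0. The sketch makes two assumptions that are not taken from the text: the values of f are known only to within an absolute uncertainty delta, so iterating beyond |f(x)| <= delta is pointless, and the attainable accuracy of the root is estimated as delta / |f'(x)|. The function name and the example equation are hypothetical.

```python
def newton(f, df, x0, delta, max_iter=50):
    """Newton iteration stopped at the level of the data uncertainty.

    Returns (x, attainable_accuracy, iterations), where
    attainable_accuracy = delta / |f'(x)| is a first-order estimate of how
    well the root is determined when f is known only to within delta.
    """
    x = x0
    for iteration in range(max_iter + 1):
        fx = f(x)
        slope = df(x)
        if slope == 0.0:
            raise ZeroDivisionError("derivative vanished; Newton step undefined")
        if abs(fx) <= delta:
            return x, delta / abs(slope), iteration
        x -= fx / slope
    raise RuntimeError("no convergence within max_iter steps")


# Hypothetical example: x^2 - 2 = 0 with data uncertainty 1e-6.
root, accuracy, steps = newton(lambda x: x * x - 2.0,
                               lambda x: 2.0 * x,
                               x0=1.0, delta=1e-6)
print(root, accuracy, steps)
```

The quantity 1 / |f'(x)| plays the role of a local condition number of the root with respect to perturbations of f: the flatter the function near the root, the less accurately the root is determined by inexact data.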


For investigating and solving initial-value problems for systems of ordinary differential equations, Inpartool makes it possible to integrate both ordinary (non-stiff) and stiff systems of equations with accuracy of various orders, including any accuracy specified a priori. At the user’s request, Inpartool can investigate the stiffness of a system of ordinary differential equations, evaluate its Lipschitz constant, and determine the accuracy of the solution, taking into account the approximate nature of the initial data.
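For a linear system y' = Ay both quantities have simple closed forms, which the following Python sketch uses for a 2x2 matrix. It assumes a common textbook measure of stiffness, the ratio of the largest to the smallest |Re λ| over the eigenvalues of A, and takes the infinity norm of A as a Lipschitz constant of the right-hand side; whether Inpartool uses these particular measures is not stated in the text, and the function names and the matrix are hypothetical.

```python
import cmath


def eig2(a):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    (a11, a12), (a21, a22) = a
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0


def stiffness_ratio(a):
    """max |Re lambda| / min |Re lambda| (infinite if some Re lambda is 0)."""
    re = [abs(lam.real) for lam in eig2(a)]
    if min(re) == 0.0:
        return float("inf")
    return max(re) / min(re)


def lipschitz_inf(a):
    """Lipschitz constant of y -> A y in the infinity norm: ||A||_inf."""
    return max(sum(abs(v) for v in row) for row in a)


# Hypothetical coupled system with eigenvalues -1 and -1000.
a = [[-500.5, 499.5],
     [499.5, -500.5]]
print("eigenvalues      :", eig2(a))
print("stiffness ratio  :", stiffness_ratio(a))
print("Lipschitz const. :", lipschitz_inf(a))
```

A large ratio signals that an explicit method would be forced to take steps dictated by the fastest-decaying component, which is the usual reason for switching to a method designed for stiff systems.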

At the functional level, Inpartool is software that makes it possible to: formulate a problem with approximately given initial data on the computer in terms of the subject area language; automatically investigate the mathematical characteristics of the problem’s computer model; construct a solution algorithm according to the revealed characteristics of the problem; automatically determine the optimum number of processors and form an efficient topology of the MIMD-computer; distribute the initial data between processors; synthesize a parallel program for solving the problem, taking into account the mathematical and engineering characteristics of the computer; solve the problem together with reliability estimates of the solution; and explain and visualize the obtained results in terms of the subject area language.

Inpartool implements the concept of knowledge [45]. Its design is based on a synthesis of fundamental achievements in the fields of modular programming, knowledge bases and databases; it relies on data processing methods currently being developed: the representation, storage and acquisition of new knowledge, etc.

The subject area for each class of problems involves a wide spectrum of problems, methods, algorithms and computational schemes that take into account the approximate nature of the initial data. Special computer methods for investigating the mathematical characteristics of their computer models are implemented, together with algorithms for analyzing the obtained computer results. The modular programming principle [42] made it possible to systematize and unify knowledge about the subject areas and to design uniform special methods for storing, searching, extracting and processing data. This made it possible to determine an optimum set of procedures and functions by means of which all the problems can be solved. Procedural knowledge is represented by functional modules describing logically complete segments of computer algorithms for investigating and solving problems, as well as the semantics of these algorithms. Each module contains knowledge about its use, its input and output parameters, the rules for distributing the initial data between processors, the allowable computer topology, the required computing resources, etc.
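The kind of knowledge attached to a functional module can be pictured as a small record. The Python sketch below is purely illustrative: the class, its fields, the module names and the selection rule are hypothetical and do not reproduce Inpartool’s actual knowledge base format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleDescriptor:
    """Knowledge about one functional module (all fields are hypothetical)."""
    name: str
    purpose: str            # what the module is used for
    inputs: tuple           # names of input parameters
    outputs: tuple          # names of output parameters
    distribution: str       # rule for distributing initial data
    topologies: tuple       # allowable computer topologies
    min_procs: int = 1      # required computing resources
    max_procs: int = 16

    def admits(self, topology, n_procs):
        """True if the module can run on the given topology and processor count."""
        return (topology in self.topologies
                and self.min_procs <= n_procs <= self.max_procs)


def select(modules, topology, n_procs):
    """Names of the modules usable in the given configuration."""
    return [m.name for m in modules if m.admits(topology, n_procs)]


# Hypothetical catalogue with two modules.
catalog = [
    ModuleDescriptor("dense_lu", "LU factorization of a general matrix",
                     ("A", "b"), ("x",), "blocks of rows",
                     ("ring", "hypercube"), 1, 16),
    ModuleDescriptor("band_cholesky", "Cholesky factorization of a band matrix",
                     ("A", "b"), ("x",), "blocks of rows",
                     ("ring",), 2, 8),
]
print(select(catalog, "ring", 4))
```

Keeping such descriptions next to the code is what would let a scheduler pick modules and a configuration without the user's involvement.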

2.2. Composition and architecture of Inpartool

The client-server architecture of Inpartool is shown schematically in Fig. 2.1. The client part consists only of the dialog system, while the server part includes the systems that provide the user’s access to Inpartool over the Internet as well as the systems by means of which problems with approximately given initial data are investigated and solved on the parallel computer.

The library of program modules makes it possible to construct the required algorithm automatically and to synthesize a program for solving the problem from separate functional modules, on the basis of the revealed characteristics of the problem and with efficient use of the MIMD-computer’s computing resources. Communication between modules is established through both data and control.

The scheduling and control system is closely associated with the formal description of the subject area, the knowledge base and the dialog system. The principal purpose of scheduling the computations is to find an optimum way of solving the problem.

The principles of automatic investigating and solving of problems on a computer, with automatic analysis of the reliability of the results, impose the following requirements on the scheduling and control system:
- analysis of the user’s initial data and their transformation into primary knowledge about the problem;
- the ability to store and process knowledge from the subject area during the scheduling of computations;
- organization of various ways of using knowledge about both the problem and the subject area for investigating and solving the problem;
- construction of algorithms and synthesis of programs for investigating and solving problems;
- output and saving of the results of investigating and solving the problem for their subsequent explanation and visualization.

Fig. 2.1. Client-server architecture of Inpartool. Blocks shown: Dialog system; Library of program modules; Scheduling and control system; Explanatory system; Reference system; Identification system; Access control system; Input; Output; Remote control system; Knowledge base.

In order to solve a problem on the MIMD-computer, the system should carry out the following control functions:
- construction of the MIMD-computer’s virtual topology;
- determination of the number of processors providing efficient solving of the problem;
- distribution of the initial data between processors.
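Constructing a virtual topology amounts to deciding which processors exchange data directly. As a generic illustration (not Inpartool’s routine), the Python sketch below builds the neighbor lists of a hypercube, one of the networks listed for Inparcom-16: in a d-dimensional hypercube, two nodes are neighbors when their binary numbers differ in exactly one bit. The function names are hypothetical.

```python
def hypercube_dim(n_procs):
    """Dimension d such that 2**d == n_procs; n_procs must be a power of two."""
    if n_procs < 1 or n_procs & (n_procs - 1):
        raise ValueError("a hypercube needs a power-of-two number of processors")
    return n_procs.bit_length() - 1


def hypercube_neighbors(rank, dim):
    """Ranks directly connected to `rank`: flip each of the `dim` bits in turn."""
    if not 0 <= rank < (1 << dim):
        raise ValueError("rank outside the hypercube")
    return [rank ^ (1 << k) for k in range(dim)]


dim = hypercube_dim(16)
for rank in (0, 5, 15):
    print(rank, hypercube_neighbors(rank, dim))
```

With 16 processors every node has four neighbors and any two nodes are at most four exchanges apart, which is why hypercubes are attractive when minimizing exchanges between processors.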


The explanatory system answers questions such as: how was the solution obtained? Why was this way of investigating the problem’s characteristics chosen? It either yields the obtained solution together with reliability estimates or explains the reasons for refusing to produce a solution. The user can control the level of detail of the explanations.

To this end, various explanation scenarios, protocols of the computational process at various levels of detail, and graphic examples for the user have been developed.

The reference system allows the user to get the information necessary for solving his problem by means of Inpartool: functional capabilities, order of work, input of the initial data, a glossary of the subject area terms used, etc.

Interaction with the user is implemented by means of dialog tools, namely:
- formulating the problem and entering the initial data;
- communication during the computations;
- visualization of the obtained results;
- access to the explanatory block;
- obtaining reference information and help at each stage of work.

Dialog scenarios are developed taking into account a model of the subject area as well as the various purposes and levels of preparedness of users of Inpartool. In doing so, the following requirements are satisfied: communication in terms of the subject area language, convenient forms of information input/output, and paper-free documentation.

The intelligent interface enables the end user only to formulate theoriginal problem, while a sequence of operations required for theobtaining of the problem’s solution is automatically determined by thesoftware itself by means of including a set of operations carried out byuser into the sequence being generated. Forms of communicationbeing used are the following: menu, answer/question, screen forms.


The order of Inpartool's communication with the user is established by the main menu. Its structure and basic items are natural and habitual for the user since they are inherent in many dialog systems. Various menu selection schemes make it possible to determine a problem, indicate the input source (display, data archive) and output destination (disk, printer), run the problem, look over the glossary of the subject area, etc. In addition, the following operations are provided: browsing, correction, copying and saving of the input and output data and their use in the current and subsequent work sessions.

During the input of initial data the user either fills out window forms with the help of prompts and instructions or answers Inpartool's questions.

The problem can be solved either automatically, when the investigating and solving of the problem are carried out without the user's involvement, or interactively, when the user may participate in all or some stages of investigating and solving the problem.

2.3. Program implementation of Inpartool

The program implementation of the intelligent software uses a client-server architecture and is based both on one of the remote access interfaces, RMI (Remote Method Invocation) [5], and on a two-level protection system:

access by password, user name and electronic certificate;

subdivision of functions into several abstraction levels, for example, access not to files or executable processes but to software settings, problems, input and output data.

For all types of input data (matrices, right-hand sides, functions, etc.) special functional objects are introduced, the use of which enhances the software's reliability by restricting direct access to the file system and other resources of the server.

The program platform is based on the Java virtual machine, which most completely meets the requirements for security, functionality and the services provided:


independence both of the hardware platform and of the operating system (various versions of Java for Linux and Windows for various processor architectures are available);

built-in network facilities for providing remote access to the application;

built-in facilities for integrating the software into an Internet browser, or the possibility to run the application from the Internet browser;

availability of user interface services most completely corresponding to the standard elements of the Windows and Linux operating systems;

built-in facilities for protecting data communications;

simplicity of deployment and scalability.

To simplify the interface component of the software, the user's functions of the remote access system are integrated into the dialog system. The rest of the system, namely the knowledge base, the user identification system and the access control system, belongs to the server component. To control access to the software, additional software is employed which manages the accounting records and the user's access rights (fig. 2.2).

All parts of Inpartool are implemented as separate components, which simplifies the software's adaptation to the user's computer environment. Thus, for different MPI environments, for remote and local access, as well as for different operating systems, identical dialog interfaces are offered to the user.


FIG. 2.2

At the software's level all necessary components are attached while the excess components are unloaded, which results in the economy of resources. The remote access functions impose additional requirements not only on the client's but also on the server's software: the width of the communication channel (providing both real-time monitoring and control requires a bandwidth of 8–16 Kb/sec) and a maximum data communication delay of the order of 50–100 msec. Special optimization technologies are employed to minimize data communication: rh-optimization [1], routing optimization of data communications, and delayed execution of commands. These made it possible to employ a communication channel with a capacity of 4–12 Kb/sec and to lower data communication delays to 100–200 msec.

Inpartool's starting window is shown in fig. 2.3.

FIG. 2.3

The dialog system is developed taking into account the demand for remote access. It is based on the cross-platform graphic library Swing. Communication with the user is carried out in English or German; one of them is to be chosen in the menu (fig. 2.4).

Blocks shown in fig. 2.2: Inpartool; host-system; registrar of Inpartool's; access management of user (accounts_manager).


FIG. 2.4

The remote access system operates with special objects (fig. 2.5) rather than with real files and processes.


FIG. 2.5

Work in the network requires open access to ports 1771, 1881, 1991 and 2000–2100. When starting, the server registers in the RMI system the names of the basic objects that will manipulate the data packages being transferred; the sizes of these objects are determined on the basis of the rh-optimization. Having established the start-up connections with the server, the client undergoes authorization, after which it can work with the software.

To minimize delays, an algorithm of delayed execution of instructions, with sorting or merging of instructions based on the repeated actions of the user, is used along with a special algorithm controlling the volumes of packages and their routing.

A scheme of one version of the instructions transfer is depicted in fig. 2.5,a; after optimization this algorithm takes the form depicted in fig. 2.5,b. In so doing two, rather than four, instructions are performed, each with a delay of 50–100 msec, therefore the execution time is halved.
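The effect of merging instructions can be illustrated with a small timing model. The sketch below is only an illustration under stated assumptions: the function name `transfer_time_ms` is hypothetical, every transfer is assumed to cost exactly one communication delay, and the grouping of the four instructions into two transfers is an assumed reading of the figure rather than Inpartool's actual algorithm.

```python
def transfer_time_ms(batches, delay_ms):
    """Total time when every batch of instructions costs one communication delay."""
    return len(batches) * delay_ms


# Version (a): four instructions are sent one by one.
sequential = [["K1"], ["K2"], ["K3"], ["K4"]]
# Version (b): assumed grouping after optimization, two transfers in total.
merged = [["K1"], ["K2", "K3", "K4"]]

for delay in (50, 100):  # delay range quoted in the text, in msec
    before = transfer_time_ms(sequential, delay)
    after = transfer_time_ms(merged, delay)
    print(f"delay {delay} msec: {before} msec -> {after} msec")
```

Under this model the total time depends only on the number of transfers, so going from four transfers to two halves it for any fixed delay.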

Labels shown in fig. 2.5: /FileOBJ; /FileSysObserver; /File; /Exec; /RSession; access to files; scheduling system; control system.


FIG. 2.5

The graph representation of the formal models of the subject areas is implemented at the program level by means of a special language based on Java [5]. This language is used for creating both the scheduling and control system and the knowledge base. To provide the remote access functions, the server processes rmiregistry and rmi_server must be run on Inparcom's host-system. The client application can be run either from the command line or from a browser window when connected to the corresponding node and accessing a special Internet site. Access from the web interface requires an electronic certificate file which allows the Java code to run on the remote computer. This enhances the protection of users from unsigned and unauthorized program components which may be loaded.

Labels shown in fig. 2.5, a and b: instructions K1, K2, K3, K4.


Chapter 3. INVESTIGATING AND SOLVING OF LINEAR ALGEBRAIC SYSTEMS

3.1. Functional potentialities of Inpartool on investigating and solving of linear algebraic systems

Linear algebraic systems (LAS) arise in data processing problems, where linear differential problems are discretized by the finite difference or finite element method; in the solving of linear problems by the least squares method; in calculating electric circuits and complicated hydraulic systems; in some models of economic problems, and so on.

Consider what kinds of problems can be formulated. In a number of cases it is required to solve a LAS with a non-singular square matrix of order n with one vector of free terms (one right-hand side), or the same system with p right-hand sides. In some problems the necessity arises to evaluate the matrix inverse to a given non-singular matrix of order n. There exist problems where, for a given m × n matrix A and a vector b consisting of m components, it is necessary to evaluate a column vector x consisting of n components such that the Euclidean norm $\|b - Ax\|$ is the least. Such a vector x is called a solution obtained by the least squares method, or a generalized solution to the system Ax = b (a possibly inconsistent system). If the rank of the matrix of the given system is less than n, then there exists an infinite set of vectors x being solutions obtained by the least squares method (generalized solutions to the LAS). Sometimes it is required to find among such solutions the solution x which possesses the least Euclidean norm $\|x\|$. This vector is always unique and is referred to as the normal generalized solution.

As a rule, the solving of an application problem starts from the creation of acceptable physical and mathematical models. Various hypotheses are used for the construction of these models. If these hypotheses are valid (the error in a hypothesis is absent or sufficiently small), the physical model correctly reflects the regularities inherent in the application problem. The physical model can be described by mathematical formulas, for example, by some LAS.

Systems of the form (3.1)

with accurate initial data are very seldom used in describing physical models. The most typical initial data specification has the form

Ax = b (3.2)

with indication of the error in the initial data:

. (3.3)

Thus, a physical model is described by an entire class of equations. As a formal solution to the problem (3.1)–(3.3) one can take any vector which turns equation (3.3) into identity. Note that in the case of a rectangular (m ≠ n) or singular (det A = 0) matrix of the accurate system (3.1), the approximate system (3.2) obtained in the computer may turn out to be inconsistent for any accuracy of the initial data specification.

An error in the solution x caused by inaccurate specification of the initial data is said to be the inherited error. Its value depends both on the initial data error and on the characteristics of the matrix.

A solution to the system of equations (3.2) obtained by some numerical method on a computer is called a machine solution of the problem. Because of the error in the decimal-to-binary conversion of the initial data, the method error and the error in the computer implementation of the algorithm, the obtained machine solution of the problem may differ from the mathematical one.

Thus, in the solving of LAS describing application problems it is necessary to define the concept of the solution to be sought, construct an algorithm for finding this solution, estimate the computer implementation error in the course of solving the problem (i.e. estimate the proximity between the machine and mathematical solutions), and estimate the inherited error in the solution [15,16,18,28].

For the class «Linear algebraic systems» Inpartool provides the solving of the following problems:

investigation and solving of LAS together with reliability estimates for the obtained results;

inversion/pseudo-inversion of a matrix together with reliability estimates for the obtained results;

evaluation of an estimate for the matrix condition number;

evaluation of the determinant of the matrix;

evaluation of the singular values of the matrix;

evaluation of the matrix rank;

evaluation of a fundamental system of solutions to a homogeneous system.

These problems are solved for the following types of matrices:

dense nonsingular;

dense symmetric positive definite;

band symmetric positive semi-definite;

band symmetric positive definite;

square singular of arbitrary rank;

rectangular.

The problems under consideration are covered by a small set of solution algorithms, but their various modifications take into account all problems and types of matrices. An important requirement is imposed on the set of algorithms intended for solving problems with approximately given initial data: to be in accordance with the mathematical and engineering characteristic features of the computer. During the development of algorithms and programs, questions were investigated related to the dependence of the problem's solving time and the reliability of results on the following: arrangement of computations, architecture and topology of the computer, system software, translator, styles of programming and so on [19, 20, 21, 32, 43, 53].

For the investigating and solving of LAS with dense and band nonsingular matrices, various modifications of the Gauss algorithm are used. Various modifications of the Cholesky algorithm are employed for the investigating and solving of LAS with dense and band symmetric positive definite matrices. The least squares method based on the SVD decomposition of the matrix is employed for the solving of LAS with square singular and rectangular matrices of arbitrary rank [7].
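The correspondence between matrix type and algorithm family described above can be written down as a small lookup. This is a hedged sketch: the function `choose_method` and its parameters are hypothetical names introduced for illustration, and the real selection in Inpartool also relies on characteristics computed during the investigation of the problem.

```python
def choose_method(rows, cols, symmetric_positive_definite=False, singular=False):
    """Return the algorithm family the text associates with a matrix type."""
    if rows != cols or singular:
        # rectangular or square singular matrices of arbitrary rank
        return "least squares (SVD)"
    if symmetric_positive_definite:
        # dense and band symmetric positive definite matrices
        return "Cholesky"
    # dense and band nonsingular matrices
    return "Gauss"


print(choose_method(1000, 1000, symmetric_positive_definite=True))
print(choose_method(500, 300))
```

The same mapping applies to dense and band storage; only the modification of the algorithm differs.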

At the stage of computer solving of LAS with nonsingular matrices and approximately given initial data, Inpartool automatically carries out the following:

investigation of the singularity of the matrix within the limits of machine accuracy and within the limits of the initial data error;

investigation of the conditioning of the matrix;

solving of the problem by an algorithm corresponding to the revealed characteristics of the problem;

estimating of the inherited error in the mathematical solution;

estimating of the proximity between the machine and mathematical solutions.

It is common knowledge that the basic criterion for determining the above-mentioned characteristics of LAS is the condition number $H = \|A\|\,\|A^{-1}\|$. If H is large, the system's matrix A is called ill-conditioned or singular within the range of the initial data error. However, the practical evaluation of H in the computer involves the evaluation of the inverse matrix $A^{-1}$, which requires more calculations.

To economize the amount of computations and minimize losses in accuracy, one should evaluate an estimate (cond A) instead of evaluating the condition number of the matrix and, in so doing, make use of the decomposition of the original matrix A by one of the direct methods.

The evaluation of the matrix condition number is implemented by the scheme:

$A = LU$,

$U^{T} w = e$, $L^{T} y = w$, $Lv = y$, $Uz = v$,

$\operatorname{cond} A = \|A\|\,\dfrac{\|z\|}{\|y\|}$,

where , .
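An estimate of this kind can be sketched in a few lines. The code below is a minimal illustration, not Inpartool's routine: it assumes the classical form of the estimator (solve $A^{T}y = e$ and $Az = y$ through the triangular factors and take $\|A\|\,\|z\|/\|y\|$), uses the 1-norm, takes e as the vector of ones, and factorizes without pivoting; all function names are hypothetical.

```python
def lu_decompose(a):
    """Doolittle LU factorization without pivoting (assumes nonzero pivots)."""
    n = len(a)
    lower = [[float(i == j) for j in range(n)] for i in range(n)]
    upper = [list(map(float, row)) for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lower[i][k] = upper[i][k] / upper[k][k]
            for j in range(k, n):
                upper[i][j] -= lower[i][k] * upper[k][j]
    return lower, upper


def solve_lower(t, b):
    """Forward substitution for a lower triangular matrix t."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(t[i][j] * x[j] for j in range(i))) / t[i][i]
    return x


def solve_upper(t, b):
    """Back substitution for an upper triangular matrix t."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(t[i][j] * x[j] for j in range(i + 1, n))) / t[i][i]
    return x


def transpose(m):
    return [list(row) for row in zip(*m)]


def cond_estimate(a):
    """Estimate cond A as ||A|| * ||z|| / ||y|| in the 1-norm."""
    n = len(a)
    lower, upper = lu_decompose(a)
    e = [1.0] * n                          # assumed choice of the vector e
    w = solve_lower(transpose(upper), e)   # U^T w = e
    y = solve_upper(transpose(lower), w)   # L^T y = w
    v = solve_lower(lower, y)              # L v = y
    z = solve_upper(upper, v)              # U z = v
    norm_a = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))
    return norm_a * sum(map(abs, z)) / sum(map(abs, y))


print(cond_estimate([[4.0, 1.0], [1.0, 3.0]]))
```

Since $z = A^{-1}y$, the ratio $\|z\|/\|y\|$ never exceeds $\|A^{-1}\|$, so with these assumptions the value is a lower bound for the true condition number; production estimators choose e adaptively to make the bound tighter.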


If the value cond A in the computer satisfies the condition

1.0 + 1.0/cond A = 1.0, (3.4)

the matrix is considered to be singular within the limits of machine accuracy. In this case a stable projection of the solution can be found.

If the matrix cannot be classified as singular according to (3.4), but $\varepsilon_A \operatorname{cond} A \ge 1$, then the LAS with approximately given initial data entered into the computer is singular within the limits of accuracy of the specification of the matrix elements, therefore the reliability of the computed solution cannot be guaranteed. Here $\varepsilon_A$ is the maximal relative error in the matrix elements. If the user considers the initial data to be given accurately, then $\varepsilon_A$ is assigned the value macheps, the least floating-point number such that the condition

1 + macheps > 1

holds in the computer [41].
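The two singularity tests are easy to try out numerically. The following sketch assumes IEEE double precision; the function names are hypothetical, and the second test uses the product of the relative error in the matrix elements and cond A compared with 1, with zero error replaced by macheps as described in the text.

```python
import sys


def machine_epsilon():
    """Halve eps until adding the next half no longer changes 1.0."""
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps


def singular_within_machine_accuracy(cond):
    """Test (3.4): 1.0 + 1.0/cond A == 1.0 in floating-point arithmetic."""
    return 1.0 + 1.0 / cond == 1.0


def singular_within_data_error(eps_a, cond):
    """Test eps_A * cond A >= 1; accurate data (eps_A = 0) is treated as macheps."""
    if eps_a == 0.0:
        eps_a = machine_epsilon()
    return eps_a * cond >= 1.0


print(machine_epsilon() == sys.float_info.epsilon)
print(singular_within_machine_accuracy(1e17))
print(singular_within_data_error(1e-3, 2.0111e4))
```

With a condition number of about 2·10^4, data accurate to three decimal digits already fails the second test, while accurately given data passes it comfortably.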

If the system of equations possesses either a rectangular m × n matrix or a square singular matrix, Inpartool guarantees the obtaining of the generalized solution of the system, i.e. determines a vector x minimizing $\|b - Ax\|$ over the entire space $R^n$. A system can possess a set of such solutions. Then the normal solution can be evaluated, i.e. the vector possessing the minimal norm $\|x\|$.

In this case the spectral condition number of the matrix is evaluated by means of the singular value decomposition of the matrix:

$\operatorname{cond} A = \sigma_{\max} / \sigma_{\min}$,

where $\sigma_{\max}$ is the largest singular value and $\sigma_{\min}$ is the least non-zero singular value.

During the computational process an analysis of the reliability of the obtained results is carried out, which includes estimating the proximity between the machine and mathematical solutions as well as estimating the inherited error.

The upper bound for the relative inherited error in the solution is determined by the formula:

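$\dfrac{\|\bar{x} - x\|}{\|x\|} \le \dfrac{\operatorname{cond} A\,(\varepsilon_A + \varepsilon_b)}{1 - \varepsilon_A \operatorname{cond} A}$,

where $\bar{x}$ is the exact solution of the system with accurately given initial data; x is the exact solution of the system with approximately given initial data; $\varepsilon_A$, $\varepsilon_b$ are the maximal relative errors in the elements of the matrix and of the right-hand sides, respectively.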

An estimate of the computational error in the solution characterizes the proximity between the machine and mathematical solutions. For its evaluation one step of the iterative refinement procedure is employed.

Let us briefly outline the iterative refinement algorithm for the solving of a system with a non-singular matrix.

Let x be a solution to the system

Ax = b

evaluated by some direct method. The iterative refinement is implemented by the scheme

$x^{(0)} = x$,

$r^{(s)} = b - A x^{(s)}$,

$A\,\Delta x^{(s)} = r^{(s)}$,

$x^{(s+1)} = x^{(s)} + \Delta x^{(s)}$, $s = 0, 1, 2, \dots$

During the evaluation of $\Delta x^{(s)}$ the matrix decomposition already obtained by one of the algorithms is used, therefore the iterative refinement procedure does not require much extra time. The evaluation of the residual $r^{(s)}$ should be carried out with increased machine word length.
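The refinement loop can be sketched as follows. This is an illustration under assumptions, not the Inpartool implementation: the factorization is a plain LU without pivoting, the function names are hypothetical, and the "increased machine word length" for the residual is emulated with exact rational arithmetic from Python's `fractions` module.

```python
from fractions import Fraction


def lu_decompose(a):
    """Doolittle LU factorization without pivoting (assumes nonzero pivots)."""
    n = len(a)
    lower = [[float(i == j) for j in range(n)] for i in range(n)]
    upper = [list(map(float, row)) for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lower[i][k] = upper[i][k] / upper[k][k]
            for j in range(k, n):
                upper[i][j] -= lower[i][k] * upper[k][j]
    return lower, upper


def lu_solve(lower, upper, b):
    """Solve L U x = b by forward and back substitution."""
    n = len(b)
    v = [0.0] * n
    for i in range(n):
        v[i] = b[i] - sum(lower[i][j] * v[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        s = v[i] - sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / upper[i][i]
    return x


def residual(a, b, x):
    """r = b - A x computed exactly in rationals, rounded to float once."""
    n = len(x)
    return [
        float(Fraction(b[i]) - sum(Fraction(a[i][j]) * Fraction(x[j]) for j in range(n)))
        for i in range(len(b))
    ]


def refine(a, b, steps=1):
    """Direct solution followed by the given number of refinement steps."""
    lower, upper = lu_decompose(a)        # the factorization is computed once
    x = lu_solve(lower, upper, b)         # x(0)
    for _ in range(steps):
        r = residual(a, b, x)             # r(s) = b - A x(s)
        dx = lu_solve(lower, upper, r)    # A dx(s) = r(s), reusing L and U
        x = [xi + di for xi, di in zip(x, dx)]
    return x


print(refine([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

Each step costs only two triangular solves and one residual evaluation, which is why the refinement adds little to the cost of the factorization itself.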


Within Inpartool an estimate for the solution's computational error is determined as follows:

,

where x2 is an approximation to the exact solution obtained by one step of the iterative refinement.

Let us outline the fundamental conceptual theses of the technological scheme for the solving of LAS by Inpartool:

possibility of solving problems with approximately given initial data;

formulating of problems in terms of the subject area language;

forms of initial data input suitable for the user;

automation of the following processes: computer investigation of the mathematical characteristics of the problem's computer model, choice of the algorithm and synthesis of the program for solving the problem;

solving of the problem together with reliability estimates of the obtained computer solutions;

obtaining not only a solution to the problem but also a protocol describing the solving of the problem together with an analysis of its revealed characteristics and of the reliability of the obtained results;

implementation of the “hidden parallelism” principle.

The implementation of the “hidden parallelism” principle involves the following: parallelization of the algorithms for investigating and solving problems; a choice of the optimum number of processors for efficient solving of the problem; creation of the computer topology and distribution of the initial data between processors according to the requirements of the algorithms; arrangement of data exchanges between processors.

When solving LAS by Inpartool, the user takes part only in the formulation of the problem, while the remaining stages of solving the problem are performed automatically.


3.2. Technology for investigating and solving of linear algebraic systems

3.2.1. Applying to Inpartool for the solving of LAS. The general form of the main window «Linear algebraic systems», consisting of the main menu and two panels, is shown in fig. 3.1. The left (passive) panel reflects the sequence of work stages and sub-stages which have already been performed, are being performed or will be performed.

Inpartool solves LAS for the following matrices:

dense nonsingular;

dense symmetric positive definite;

band symmetric positive semi-definite;

band symmetric positive definite;

square singular of arbitrary rank;

rectangular.

LAS can be solved both for one and for many right-hand sides.

To solve the problem the user should carry out the following successive stages of work in the right-hand (active) panel:

formulate a problem;

input the problem's initial data;

start the problem;

obtain results.


FIG. 3.1

To formulate a problem the user should click on the arrow located to the right of the title «Problem». A submenu will appear containing a list of the problems from this class which can be solved by means of Inpartool.

Now the dialog window has the form shown in fig. 3.2. From the proposed list the user should select the problem to be solved, for example, «The solving of LAS».


FIG. 3.2

3.2.2. Specification of initial data for the solving of LAS. Initial data for the solving of LAS are given by the parameters of the problem (the number of rows and columns in the matrix, the number of diagonals for band matrices, and the number of right-hand sides), the elements of the matrix and of the right-hand sides, and their maximum relative errors. The data can be input from a binary file and/or their values can be entered directly into the corresponding data fields. This input can also be implemented by a program or by formulas (fig. 3.3).


FIG. 3.3

During data input from a file the user can make use of the convenient formats provided by Inpartool (fig. 3.4).

FIG. 3.4

Among them are the following data formats:

the file contains in binary form the matrix elements and the elements of the right-hand sides as floating-point numbers, the sequence of which is determined by the form of the matrix (the file mask is *.tam), each number occupying 8 bytes;

the file contains in binary form only the matrix elements as floating-point numbers, the sequence of which is determined by the form of the matrix (the file mask is *.tam), each number occupying 8 bytes;

the file contains in binary form only the elements of the right-hand side as floating-point numbers, the sequence of which is determined by the form of the matrix (the file mask is *.tam), each number occupying 8 bytes;

the file contains in binary form the number of rows and columns for dense matrices or the order of the matrix and the number of diagonals for band matrices, the number of right-hand sides, the matrix elements and the elements of the right-hand sides (the file mask is *.dat), each number occupying 8 bytes;

the file contains in binary form all information about the problem: form and structure of the matrix, the number of rows and columns for dense matrices or the order of the matrix and the number of diagonals for band matrices, the number of right-hand sides, the matrix elements and the elements of the right-hand sides (the file mask is *.edat).
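A file of the first kind (a plain sequence of 8-byte floating-point numbers) can be produced and read back with the standard `struct` module. This is a sketch under assumptions: the text does not state the byte order, so little-endian is assumed here, and the helper names `write_tam` and `read_tam` as well as the file name `example.tam` are hypothetical.

```python
import os
import struct
import tempfile

BYTE_ORDER = "<"  # assumption: little-endian; the source does not specify it


def write_tam(path, values):
    """Write the numbers as consecutive 8-byte floating-point values."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"{BYTE_ORDER}{len(values)}d", *values))


def read_tam(path):
    """Read back all 8-byte floating-point values stored in the file."""
    with open(path, "rb") as f:
        raw = f.read()
    count = len(raw) // 8
    return list(struct.unpack(f"{BYTE_ORDER}{count}d", raw[: count * 8]))


# A 2 x 2 matrix written row by row, followed by one right-hand side.
data = [4.0, 1.0, 1.0, 3.0, 1.0, 2.0]
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.tam")
    write_tam(path, data)
    size = os.path.getsize(path)
    values = read_tam(path)

print(size, values)
```

Because the file carries no header, the order, the matrix form and the number of right-hand sides must be supplied separately in the dialog windows.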

Table 3.1 gives the order in which information about the problem being solved is to be written to the file, as well as the values of parameters to be used when creating an initial data file with the mask *.edat.

TABLE 3.1

| Contents of file | Type | Bytes |
|---|---|---|
| Format version (= 1) | Integer | 4 |
| Matrix structure and type code (= 17) | Integer | 4 |
| Code of the order in which matrix elements and elements of right-hand sides are written: 0 – by lower diagonals; 1 – by rows; 2 – by columns | Integer | 4 |
| The number of rows in the matrix (for a band matrix – order of the matrix) | Integer | 4 |
| The number of columns (for a band symmetric matrix – half-width of the band excluding the main diagonal) | Integer | 4 |
| The number of right-hand sides in LAS | Integer | 4 |
| Relative error in matrix elements | Floating-point number | 8 |
| Relative error in elements of right-hand sides | Floating-point number | 8 |
| Elements both of the matrix and of the right-hand sides (in the form of a sequence of numbers) | Floating-point number | 8×lA, 8×lb |

To encode the type of matrix the following formula is used:

16 i1 + i2,

where i1 is the matrix structure code: i1 = 0 for a dense matrix and i1 = 1 for a band matrix; i2 is the matrix type code: i2 = 0 for a general matrix and i2 = 1 for a symmetric matrix.
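The header fields of an *.edat file can be packed in the order of Table 3.1 as shown below. This is a sketch under assumptions: little-endian byte order and the absence of padding between fields are not stated in the text, and the names `matrix_type_code` and `pack_edat_header` are hypothetical.

```python
import struct


def matrix_type_code(band, symmetric):
    """Code 16*i1 + i2: i1 = 1 for band, i2 = 1 for symmetric matrices."""
    return 16 * int(band) + int(symmetric)


def pack_edat_header(band, symmetric, order_code, rows, cols, n_rhs,
                     eps_a=0.0, eps_b=0.0, byte_order="<"):
    """Pack six 4-byte integers and two 8-byte floats in the order of Table 3.1."""
    return struct.pack(
        f"{byte_order}6i2d",
        1,                                  # format version
        matrix_type_code(band, symmetric),  # structure and type code
        order_code,                         # 0 diagonals, 1 rows, 2 columns
        rows,                               # rows (order for a band matrix)
        cols,                               # columns (half-width for band symmetric)
        n_rhs,                              # number of right-hand sides
        eps_a,                              # relative error in matrix elements
        eps_b,                              # relative error in right-hand sides
    )


header = pack_edat_header(band=True, symmetric=True, order_code=0,
                          rows=1000, cols=2, n_rhs=1)
print(matrix_type_code(True, True), len(header))
```

For a band symmetric matrix the code evaluates to 16·1 + 1 = 17, the value quoted in the table; the header is followed by the matrix and right-hand side elements as 8-byte numbers.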

The order in which the elements of dense (general and symmetric) and band matrices are entered is given below.

For the dense matrix A = (aij) of order n, its elements are entered in the following order:

a11 a12 … a1n a21 a22 … a2n … an1 an2 … ann.

Elements of the band symmetric matrix are entered in the following order:

a11 a22 a21 a33 a32 a31 a44 a43 a42 a55 a54 a53 …
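The two orderings can be generated programmatically, which is convenient when preparing a data file. The sketch below is an illustration with hypothetical function names; the half-width of 2 used in the example is an assumption inferred from the sequence listed in the text.

```python
def dense_order(n):
    """Row-by-row order of a dense matrix: a11 a12 ... a1n a21 ... ann."""
    return [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]


def band_symmetric_order(n, half_width):
    """For each row, start at the diagonal and move left within the band."""
    order = []
    for i in range(1, n + 1):
        for j in range(i, max(1, i - half_width) - 1, -1):
            order.append((i, j))
    return order


def as_names(pairs):
    return " ".join(f"a{i}{j}" for i, j in pairs)


print(as_names(dense_order(2)))
print(as_names(band_symmetric_order(5, 2)))
```

Only the lower part of the band is stored, so a symmetric band matrix of order n and half-width m needs at most n·(m + 1) numbers.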

Consider the case where the elements both of a symmetric matrix and of one right-hand side for a LAS of order 1000 are written in succession in binary form in the file A41_1000.tam. In the window «Matrix specification» the user should indicate “File” for data input and enter the file name (fig. 3.5). In the window «Matrix form» the user should indicate the type and structure of the matrix by selecting «dense, symmetric» (fig. 3.6).


FIG. 3.5


FIG. 3.6

The order of the matrix and the maximum relative error in the matrix elements can be entered by means of the keyboard (fig. 3.7).


FIG. 3.7

If the user considers the initial data to be given accurately, he enters a value of the maximum relative error equal to 0.0 (0.0 by default). Values already entered are colored green, while data still to be entered are red.

Information about the right-hand sides (fig. 3.8) is entered in the same manner. In this example the elements of the right-hand side vector are written in the file which already contains the matrix elements, which is why the user should choose the item «File, from matrix file» in the window «Right-hand sides specification». The parameters and the maximum relative error in the right-hand sides (0.0 by default) are also entered by means of the keyboard.


FIG. 3.8

Input both of the problem's parameters and of the maximum relative errors in elements is completed by pressing the <Enter> key. The information is then entered and the next data input window opens.

Elements of the matrices of LAS can be edited by pressing the corresponding button. In the dialog window which appears (fig. 3.9), the location of the element marked by the user for editing is shown schematically in the upper left corner of the right panel. A red slider can be moved in order to mark the matrix segment containing the element to be edited. The table to the right represents the matrix segment where the editing is taking place. At the bottom of the panel one can see the numbers of the row and column at whose crossing the element to be edited is located; the element itself is shown in a separate cell. After editing, the corrected matrix is either automatically updated in the old file or saved in another file (when pressing «Save as»). After pressing «Close» the work with Inpartool can be continued.

FIG. 3.9

3.2.3. The solving of LAS. Inpartool offers two ways of solving LAS: automatic and interactive. To run the problem one should choose the way of solving it in the window which appears after successful input of the data (fig. 3.8).

In the automatic mode the problem is first investigated. On the basis both of the investigation of the initial data and of the characteristics of the LAS revealed by the computer, and according to the engineering and mathematical potentialities of Inparcom-16, an algorithm for solving the problem is chosen, an efficient topology with the number of processors optimal for this problem is constructed, the initial data are distributed between processors according to the chosen algorithm, the problem is solved and the reliability of the obtained results is analyzed.

In the interactive mode some characteristics of the problem known to the user can be indicated, for example, positive definiteness or singularity of the matrix (fig. 3.10).

FIG. 3.10

Inpartool will construct an algorithm and a solution program taking into account the information about the problem's characteristics obtained from the user and will distribute the matrix elements between processors. If the user was mistaken in determining the problem's characteristics, Inpartool will inform him about this and propose to continue the investigating and solving of the problem taking into account the characteristics revealed by the computer. The problem will be solved together with an analysis of the reliability of the obtained results.

During the process of solving the problem a window will appear showing the progress of the task (fig. 3.11); it will be closed after the solving of the problem is completed.

FIG. 3.11


One may vary the ways of solving the same problem. For example, in order to choose the interactive way after the automatic solving of the problem, one should click the arrow in the line entitled «The solving of problem: Automatically» (fig. 3.12) and then choose «The solving of problem: Interactively» in the menu which appears.

To solve another problem (with different initial data) from the class of problems under consideration, one should click the arrow in the line entitled «Problem» (fig. 3.8), choose a problem from the list in the submenu which appears (fig. 3.2) and then perform in succession all stages of work covering the input of the initial data and the solving of the problem.

3.2.4. Results of solving LAS. On completion of the computational process, brief information about the problem which was solved appears in the upper part of the right-hand panel. Besides, a popup dialog window «Processing of results» (fig. 3.12) appears where the user can obtain the results of solving the problem.


FIG. 3.12

Results of solving LAS by Inpartool include:

solution of LAS;

protocol describing the process of investigating and solving LAS.

On completion of the computational process the solution of the problem is automatically saved in a binary file with the standard name result.out. To look over the obtained solution of the problem (in tabular form), or to save or print it in text form, one should press the «Browsing of solution» button. The solution can also be saved in binary form for its further use in solving on Inparcom (by Inpartool, Inparlib or some other software) those problems for which the solving of LAS was an intermediate stage. To do this it suffices to press «Saving of solution vectors» and indicate the file name.

The protocol describing the process of investigating and solving LAS is presented in text form. It includes the following: parameters of the problem, the method employed for the investigation of the problem in order to choose an efficient algorithm and construct a program for solving the problem, several control components of the solution, an estimate for the inherited error in the solution, an estimate for the computational error in the solution, an estimate for the matrix condition number and some other characteristics of the problem, the problem's execution time and the number of processors used.

To look over the protocol the user should press the «Browsing of solutions protocol» button. The protocol can be printed, saved in a text file or deleted. If the current protocol has not been deleted, all protocols of problems solved after this problem during this work session will be appended to the already existing protocol.

3.2.5. Inpartool's diagnostics during the solving of LAS. During the process of formulating the problem, inputting the initial data and solving LAS, the user can get:

reference information;

help-type messages;

problem's run-time diagnostics.

Having clicked with the right mouse button on the title «Linear algebraic systems» (fig. 3.13), the user can become familiar with the functional potentialities of Inpartool concerning the solving of problems belonging to this class, as well as with the order of work. In the same manner one can get appropriate short information (of the Help type) at any stage of work by clicking the right mouse button on any menu item, title, inscription or other control element of interest.

FIG. 3.13

Information about the linear algebra terminology being used can be obtained by choosing the «Glossary» menu item in the submenu «Help» of the main menu (fig. 3.1). Having chosen a term of interest from the list on the left-hand panel of the window which appears, the user can get its explanation (fig. 3.14).


FIG. 3.14

As noted above, after the completion of automatic or interactive solving of the problem by means of Inpartool, the user gets (in the protocol) information about the process of solving the problem, the revealed characteristics of the problem, and the reliability of the obtained results or the reasons for which the problem was not solved.

In the case of interactive solving of the problem the user can get run-time information about the process of solving the problem and make a decision as to its further continuation. For example, if the user was wrong in determining such characteristics of the matrix as positive definiteness or singularity then, having investigated the problem, Inpartool will deliver an appropriate message and give the user an opportunity either to continue or to interrupt the process of computations (fig. 3.15).


FIG. 3.15

Besides, when working with Inpartool the user receives, when necessary, various prompts, warnings and error messages.

3.3. Examples of solving linear algebraic systems by means of Inpartool

Let us illustrate the computational potentialities of Inpartool for the solving of LAS on the following problems.

Problem 1

Investigate and solve the LAS Ax = b by means of Inpartool, where A = (aij), i, j = 1, …, n, aij = n + 1 + max (i, j),

, .

Exact solution of the system is the following:.

The problem was solved on 4 processors. The matrix elements as well as the elements of the right-hand sides were input from the binary file A16_1k.tam. In the dialog windows the user should indicate the type and structure of the matrix: dense symmetric; the order of the matrix: 1000; the maximum relative errors in elements: zero.

Protocol of solving Problem 1 in automatic mode

P R O B L E M: The solving of the linear algebraic system with a symmetric positive definite matrix

D a t a :
  - matrix dimension = 1000
  - number of the right-hand sides of the systems = 1
  - maximum relative error in the matrix elements = 0.00000e+00
  - maximum relative error in elements of the right-hand sides = 0.00000e+00

P r o c e s s  o f  i n v e s t i g a t i n g  a n d  s o l v i n g

M e t h o d:
  - Cholesky decomposition

R E S U L T S: SOLUTION IS OBTAINED IN FILE result.out

E s t i m a t e s :
  - inherited error in the solution : 8.93106e-12
  - computational error in the solution: 8.8817841970012602e-16

P r o p e r t i e s :
  - estimate of condition number of the matrix 2.01110e+04

Solution (last 10 components)

0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -1.00000 2.00000

Proc Number: 4

As one can see in the protocol, the matrix of the system is symmetric and positive definite. For such a system a program was constructed for investigating and solving the problem, together with reliability estimates for the solution obtained by the Cholesky algorithm. The computed solution (the protocol shows its last 10 components) is of high accuracy, which agrees with the given error estimates. Note the following peculiarity of the estimate for the inherited error: the initial data of the problem are exact (the maximum relative errors in their elements are equal to zero), while the estimate for the inherited error in the protocol is non-zero. This is explained by the fact that all real numbers input to a computer undergo changes related to their machine representation. The accuracy of the number representation is characterized by the machine epsilon macheps, i.e. the spacing between 1 and the next larger floating-point number. Therefore, if the user sets the maximum relative errors in the elements to zero, these values are replaced by macheps (macheps = 2.220446049250313e-016).
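The value of macheps quoted above can be checked on any machine with IEEE 754 double precision. The following is a minimal Python sketch (not part of Inpartool) that reads the value from the standard library and illustrates why a relative error below macheps cannot be represented next to 1.

```python
import sys

# Machine epsilon for IEEE 754 double precision, as reported by Python.
macheps = sys.float_info.epsilon

# macheps is the gap between 1.0 and the next representable double, so
# adding it to 1.0 changes the result, while adding half of it does not
# (1 + macheps/2 is a tie that rounds back to 1.0).
changes_one = (1.0 + macheps) > 1.0
half_is_lost = (1.0 + macheps / 2) == 1.0

print(f"macheps = {macheps!r}")
print(changes_one, half_is_lost)
```

This is why a data error specified as zero is effectively treated as macheps: the stored matrix elements already carry a relative representation error of that order.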

Problem 2

Investigate and solve LAS Ax = b by means of Inpartool, where A = (aij), i, j = 1, …, n, n = 3w + 1, w = 1, 2, …; aii = n – i, aij = n + 1 – max (i, j),

,

, b i= n – i, if i 2; bi= n + 1 – i, if и i > 2.

Exact solution of the system is the following:.

Matrix elements as well as elements of the right-hand side should be input from the binary file A41_1000.tam. In the dialog windows the user should indicate the type and structure of the matrix: dense symmetric; the order of the matrix: 1000; the maximum relative errors in the elements of the system: zero.

Protocol of solving the Problem 2 in automatic mode

P R O B L E M: solving of the linear algebraic system with a symmetric positive definite matrix


P r o c e s s o f i n v e s t i g a t i n g a n d s o l v i n g

M e t h o d: - Cholesky decomposition

R E S U L T S: !!! THE MATRIX IS NOT POSITIVE DEFINED !!! Number of processors: 4

M e t h o d: - Gauss elimination with partial pivoting

R E S U L T S: !!! THE MATRIX IS MACHINE-SINGULAR !!!

Number of processors: 4

M e t h o d: - singular value decomposition of a general matrix

R E S U L T S:

SOLUTION WAS CALCULATED

first 4 components of solution (vector 1) are:


-3.7747582837255322e-010 1.0000000000000031e+000 3.8857805861880479e-010 3.6489927986770073e-010

The vector(s) of solution are successfully stored in the file result.out

Error estimations: 4.99145e-08

P r o p e r t i e s: - estimation of conditional number: 7.49316e+07 - matrix rank: 999


As one can see in the protocol, since the matrix of the system is symmetric, Inpartool chose the Cholesky algorithm (the most economical algorithm for such matrices) as a trial algorithm for investigating the problem. However, in the course of this investigation the matrix turned out not to be positive definite, and for its further investigation Inpartool chose the Gauss algorithm. During the investigation of the LAS by the Gauss algorithm the matrix turned out to be machine-singular. For such a LAS a generalized solution can be constructed. In the automatic mode a program was synthesized for finding the generalized solution of the LAS based on the SVD decomposition of the matrix, the topology required for this algorithm was created from the available processors, the data arrays were redistributed between the processors, and the problem was solved together with reliability estimates for the solution. However, if the user knows a priori that the matrix of the system is singular, he can solve the problem interactively. In that case Inpartool synthesizes a program for finding a generalized solution of the LAS by the SVD decomposition from the very beginning and solves the problem together with reliability estimates for the solution.
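The sequence of trial algorithms described above (Cholesky, then Gauss elimination, then SVD) can be sketched as a simple fallback chain. The Python code below is a serial illustration under stated assumptions and is not the Inpartool implementation: the function names, the singularity tolerance `tol` and the small test matrix are all hypothetical, and the final SVD stage is only reported, not carried out.

```python
import math

def cholesky_solve(a, b):
    """Solve a x = b via Cholesky; raise ValueError if a is not positive definite."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            l[i][j] = (a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))) / l[j][j]
    # Forward substitution L y = b, then back substitution L^T x = y.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x

def gauss_solve(a, b, tol=1e-12):
    """Gauss elimination with partial pivoting; raise ValueError on a tiny pivot."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    scale = max(abs(v) for row in a for v in row)
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if abs(m[p][k]) <= tol * scale:              # hypothetical singularity test
            raise ValueError("matrix is machine-singular")
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def investigate_and_solve(a, b):
    """Try the cheapest method first; return (method name, solution or None)."""
    for name, solver in (("Cholesky", cholesky_solve), ("Gauss", gauss_solve)):
        try:
            return name, solver(a, b)
        except ValueError:
            continue
    return "SVD required", None

# A symmetric singular matrix: both trial methods are rejected.
print(investigate_and_solve([[1.0, 1.0], [1.0, 1.0]], [1.0, 1.0]))
```

The ordering reflects cost: each stage is tried only after the cheaper one has revealed that the matrix lacks the property it needs.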

Protocol of solving Problem 2 in interactive mode

P R O B L E M: solving of the linear algebraic system with a general matrix



M e t h o d: - singular-value decomposition of a general matrix

R E S U L T S: SOLUTION WAS CALCULATED


-3.7747582837255322e-010 1.0000000000000031e+000 3.8857805861880479e-010 3.6489927986770073e-010





If an absolute error is introduced into the last element of the matrix (the value 1.e–8 is set instead of zero) and the maximum relative error is set to 1.e–14, then, as the protocol below shows, both the solution of the problem and its error estimate change.

Protocol of solving Problem 2 with a changed value of the maximum relative error


P R O B L E M: solving of the linear algebraic system with a general matrix

D a t a : - matrix dimension = 1000 - number of the right-hand sides of systems = 1 - maximum relative error in the matrix elements = 1.00000e-14 - maximum relative error in elements of the right-hand sides = 0.00000e+00


M e t h o d: - singular-value decomposition of a general matrix

R E S U L T S:SOLUTION WAS CALCULATED


4.023313522338e-007 1.0000000000018e+000 -6.407499313354e-007 -5.2154064178466e-007





Problem 3

Investigate and solve LAS Ax = b by means of Inpartool, where

A = (aij), i, j = 1, …, n, aij = 2, j ≥ i; aij = 1, j < i;


, .

Exact solution of the system is:

.

The problem was solved on 40 processors. Matrix elements as well as elements of the right-hand side should be input in binary form from the file A2_2k.tam. In the dialog windows the user should indicate the type and structure of the matrix: dense general; the order of the matrix: 2000; the maximum relative errors in the elements of the system: zero.

Protocol of solving the Problem 3 in automatic mode

P R O B L E M: solving of the linear algebraic system with a general matrix

D a t a : - number of matrix's rows = 2000 - number of matrix's columns = 2000 - number of the right-hand sides of the systems = 1 - maximum relative error in the matrix elements = 0.00000e+00 - maximum relative error in elements of the right-hand sides = 0.00000e+00


M e t h o d: Gauss elimination with partial pivoting

R E S U L T S :

SOLUTION IS OBTAINED IN FILE result.out


E s t i m a t e s: - inherited error in the solution: 2.96064e-12 - computational error in the solution: 0.44630e-16

P r o p e r t i e s: - estimate of condition number of the matrix .66678e+03

Solution for right part number 1

first 5 components:

1.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00

last 5 components:

0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00 -5.0000000000000000e-01

Proc number: 40

As one can see in the protocol, the matrix of the system is non-singular. For such a system Inpartool synthesizes a program for investigating and solving it by the Gauss method. The estimate of the condition number is small, therefore the computed solution (its first 5 and last 5 components are presented in the protocol) is of high accuracy, which is confirmed by the delivered error estimates.

The time required for investigating and solving a LAS with a band symmetric matrix (the matrix order is 136198, the band half-width is 1811) by the Cholesky method is shown in fig. 3.16.


FIG. 3.16

The time required for investigating and solving a LAS with a dense symmetric matrix of order 10000 by the LDLT-decomposition variant of the Cholesky method is shown in fig. 3.17.


FIG. 3.17

The time required for investigating and solving a LAS with a dense general matrix of order 10000 by the Gauss method is shown in fig. 3.18.

FIG. 3.18

The diagrams presented above characterize the accelerations gained when solving LAS on different numbers of processors by various methods and illustrate the dependence of the run times on the number of cluster processors being used. For each problem Inpartool automatically establishes a computer topology and the number of processors that is optimal for the efficient solving of the problem.
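Acceleration (speedup) on p processors is commonly measured as S_p = T_1 / T_p, and efficiency as E_p = S_p / p. The sketch below shows the computation in Python; the timings are hypothetical placeholders and are not read from fig. 3.16 to 3.18.

```python
def speedup_and_efficiency(times):
    """times maps processor count p -> run time T_p and must contain p = 1.

    Returns {p: (speedup, efficiency)} with speedup = T_1 / T_p and
    efficiency = speedup / p.
    """
    t1 = times[1]
    return {p: (t1 / t, t1 / t / p) for p, t in sorted(times.items())}

# Hypothetical timings in seconds, NOT taken from the figures.
timings = {1: 120.0, 4: 36.0, 8: 20.0, 16: 12.5}
for p, (s, e) in speedup_and_efficiency(timings).items():
    print(f"p = {p:2d}: speedup = {s:5.2f}, efficiency = {e:4.2f}")
```

Efficiency falling well below 1 as p grows is the usual sign that communication costs dominate, which is why a problem-dependent optimal number of processors exists.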


Chapter 4. INVESTIGATING AND SOLVING OF EIGENVALUE PROBLEMS

4.1. Functional potentialities of Inpartool on investigating and solving of eigenvalue problems

Eigenvalue problems arise in determining the frequencies and forms of eigen-oscillations of conservative dynamic systems, in investigating oscillations and stability of objects of mechanical, physical and chemical origin, in factor analysis, and as independent mathematical problems.

The algebraic eigenvalue problem (AEVP) consists in finding such numbers λ for which there exist non-zero solutions of the LAS

Ax = λBx, (4.1)

where A and B are square matrices of order n. The numbers λ are called eigenvalues of the problem (4.1), while the vectors x are called eigenvectors of this problem. If B is the identity matrix of order n, the problem (4.1) is referred to as a standard eigenvalue problem; otherwise, as a generalized problem. In the case of the standard problem the numbers λ and vectors x are also called eigenvalues and eigenvectors of the matrix A.

The problem of finding all eigenvalues and their corresponding eigenvectors is called a full eigenvalue problem. The problem of finding several eigenvalues and their corresponding eigenvectors, or of finding only eigenvalues, is referred to as a partial eigenvalue problem.

The eigenvalues of a real symmetric matrix or of a complex Hermitian matrix are real numbers, which can therefore be ordered, for example in increasing order, and numbered accordingly. The eigenvectors of a real symmetric matrix are real, while the eigenvectors of a complex Hermitian matrix are, in general, complex-valued.

When solving application problems, AEVPs with accurate initialdata

. (4.2)

arise very seldom.

The approximate nature of the initial data of the problem (4.1) is caused by the following factors: inaccuracies of measurements performed when stating the applied problem; accepted simplifications and assumptions; errors in the discretization of the continuous mathematical model; the use of approximate formulas when forming the initial data, etc.

The most typical specification of the problem (4.1) and of the error in the initial data is the following:

. (4.3)

The investigation of the characteristics of AEVPs with approximately given initial data may include the spectral decomposition of the matrix, construction of invariant subspaces (for example, eigen- or root-subspaces) and of canonical forms (for example, the Jordan form), determination of the conditioning of eigenvalues and eigenvectors, investigation of perturbations in the solutions depending on errors in the initial data, and reliability estimates of the obtained machine solutions, i.e. solutions of the problem (4.1) obtained on a computer together with the initial data errors (4.3).

Proximity between the elements of the matrices A, B and , respectively, does not always ensure proximity between the eigenvalues of the problem. When investigating the standard eigenvalue problem the following cases should be distinguished [51]:

perturbation in a simple eigenvalue of a matrix possessing linear elementary divisors;

perturbation in a multiple eigenvalue of a matrix possessing linear elementary divisors;

perturbation in a simple eigenvalue of a matrix possessing one or more non-linear elementary divisors;

perturbation in a multiple eigenvalue corresponding to a non-linear elementary divisor of the full matrix;

perturbation in a multiple eigenvalue λi when more than one elementary divisor with the factor (λi – λ) is available and at least one of them is non-linear.

Similar cases may also arise for the generalized eigenvalue problem. When evaluating eigenvectors, the problem of estimating the reliability of the obtained solutions arises as well.

On the basis of the foregoing one can see that an entire class of eigenvalue problems (4.1), (4.3) arises in describing physical models. The proximity between the solutions of problems (4.2) and (4.1), (4.3) is determined, on the one hand, by the characteristics of the problem's matrices and, on the other hand, by the errors in the initial data specification.

Thus, the computer implementation of methods for finding the eigenvalues and eigenvectors of the problem (4.1), (4.3) introduces an error determined by the characteristics of the problem's matrix (matrices), by the methods for solving AEVPs and by the peculiarities of the computations. Therefore, at the stage of the computer solving of the problem, the following investigations (which should take into account the above-mentioned cases of perturbations in eigenvalues) are carried out:

to reveal the existence and uniqueness of the solution of the machine problem;

to investigate its stability within the level of errors in the initial data (εA and εB);

to choose an algorithm according to the revealed characteristics;

to estimate the inherited and computational errors in the obtained solution, i.e. to estimate the proximity between the obtained and exact solutions of the machine problem.

Hence, investigating the reliability of the obtained computer solutions to matrix eigenvalue problems includes: revealing and investigating the characteristics of problems (4.2) and (4.1), (4.3) as well as the characteristics of the corresponding machine problem; estimating the inherited error in the mathematical solution; estimating the computational error in the obtained machine solution; and estimating the overall error in the solution.

Proceeding from the analysis of practical problems, Inpartool includes the solving of AEVPs with the following real symmetric matrices: dense, tri-diagonal, band positive definite. For these types of matrices the following AEVPs with approximately given initial data are considered:

investigate and solve the full standard AEVP Ax = λx with a tri-diagonal symmetric matrix A;

investigate and solve the full standard AEVP Ax = λx with a dense symmetric matrix A;

investigate and solve the partial (finding of several minimum eigenvalues and their corresponding eigenvectors) standard AEVP Ax = λx with a band symmetric positive definite matrix A;

investigate and solve the partial generalized AEVP Ax = λBx with band symmetric positive definite matrices A and B.

As in the case of LAS, these problems can be solved by an optimal number of algorithms whose modifications cover all the problems and matrix types under consideration. Thus, the QL-algorithm [52] is employed for evaluating all eigenvalues of both tri-diagonal and dense symmetric matrices. The subspace iteration method [2] is used for solving both standard and generalized AEVPs with band symmetric matrices.

Besides, algorithms involved in investigating and solving LAS are also used for solving AEVPs. For example, algorithms of the Cholesky method are employed both in solving problems by the subspace iteration method and in the reliability analysis of the obtained solution to the partial AEVP with band symmetric matrices.

A sequence of orthogonal reflection transformations (the Householder method) is used for reducing a dense symmetric matrix to a tri-diagonal symmetric matrix [52].

If A is a real symmetric matrix, the evaluation of its eigenvalues is always stable, and the proximity between problems (4.2) and (4.1), (4.3) is determined only by the initial data error. However, the error in the evaluation of eigenvectors also depends on the proximity of the eigenvalues. The following error estimate for a simple eigenvalue and its corresponding eigenvector is well known [28]:

.

If approximates the multiple eigenvalue (i = p, p+1, …, q) of

the matrix and (i ≠ p, p+1, …, q), while x is an

eigenvector of the problem (4.1) corresponding to , then there existsvector ( eigenvectors of problem (4.2) for

which [17].

An overall error in the solution of the algebraic eigenvalueproblem (4.1), (4.3) can be estimated as follows:

, , (4.4)

where and vector are approximate eigenvalue and itscorresponding eigenvector of the problem (4.1), respectively, while

is the residual vector.

The following estimate holds for the relative error in the eigenvalues of the generalized problem (4.1) with symmetric positive definite matrices:

.
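A residual-based estimate of the kind referred to in (4.4) can be illustrated by a standard fact for the symmetric standard problem: for any number μ and non-zero vector x, some eigenvalue λ of A satisfies |λ – μ| ≤ ||Ax – μx|| / ||x|| in the Euclidean norm. The Python sketch below checks this on a small tri-diagonal matrix. It is an illustration only: the test matrix, the perturbation and the function names are assumptions, and the exact form of the estimates used by Inpartool is not reproduced here.

```python
import math

def matvec_tridiag(diag, off, x):
    """Multiply a symmetric tri-diagonal matrix (diagonal, off-diagonal) by x."""
    n = len(x)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(n - 1):
        y[i] += off[i] * x[i + 1]
        y[i + 1] += off[i] * x[i]
    return y

def residual_bound(diag, off, lam, x):
    """Return ||A x - lam x||_2 / ||x||_2 for an approximate eigenpair."""
    ax = matvec_tridiag(diag, off, x)
    r = [ax[i] - lam * x[i] for i in range(len(x))]
    return math.sqrt(sum(v * v for v in r)) / math.sqrt(sum(v * v for v in x))

# Assumed test matrix: tridiag(-1, 2, -1) of order 5, whose eigenvalues
# are known in closed form: 2 - 2 cos(k pi / (n + 1)).
n = 5
diag, off = [2.0] * n, [-1.0] * (n - 1)
exact = [2.0 - 2.0 * math.cos(k * math.pi / (n + 1)) for k in range(1, n + 1)]

# Deliberately perturbed approximation to the first eigenvector.
x_approx = [math.sin((i + 1) * math.pi / (n + 1)) + 0.01 * (-1) ** i for i in range(n)]
ax = matvec_tridiag(diag, off, x_approx)
lam_approx = sum(a * b for a, b in zip(ax, x_approx)) / sum(v * v for v in x_approx)

bound = residual_bound(diag, off, lam_approx, x_approx)
distance = min(abs(lam - lam_approx) for lam in exact)
print(distance <= bound)
```

The bound is computable a posteriori from the approximate pair alone, which is what makes residual-type estimates practical for reliability analysis.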

When solving a partial AEVP by the subspace iteration method, one or several of the minimum eigenvalues being evaluated, together with their corresponding eigenvectors, may be lost (not evaluated). This phenomenon occurs when such an eigenvector is orthogonal to the initial subspace being iterated. For a posteriori diagnostics of this phenomenon the Sturm sequence property is used. To this end the LDLT-decomposition of the matrix A – μB is carried out, where the shift μ should exceed the largest of the evaluated eigenvalues; then the number of eigenvalues less than μ is equal to the number of negative elements of the diagonal matrix D.
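The counting property described above can be sketched for the simplest case. The Python code below assumes B is the identity and A is symmetric tri-diagonal, so that the LDLT pivots follow from a one-line recurrence; the band and generalized cases handled by Inpartool need a full band factorization, which is not shown. The function name and test matrix are illustrative.

```python
import math

def count_eigenvalues_below(diag, off, mu):
    """Count eigenvalues of a symmetric tri-diagonal matrix that are < mu.

    Uses the pivots d_i of the LDL^T factorization of (A - mu*I); the
    number of negative pivots equals the number of eigenvalues below mu.
    Assumes no pivot is exactly zero (shift mu slightly if one is).
    """
    count = 0
    d = diag[0] - mu
    if d < 0:
        count += 1
    for i in range(1, len(diag)):
        d = (diag[i] - mu) - off[i - 1] ** 2 / d
        if d < 0:
            count += 1
    return count

# Assumed test matrix: tridiag(-1, 2, -1) of order 10 with known spectrum.
n = 10
diag, off = [2.0] * n, [-1.0] * (n - 1)
exact = [2.0 - 2.0 * math.cos(k * math.pi / (n + 1)) for k in range(1, n + 1)]
mu = 0.9
print(count_eigenvalues_below(diag, off, mu), sum(lam < mu for lam in exact))
```

If an iterative solver has returned fewer eigenvalues below μ than this count reports, at least one eigenvalue was missed and the iteration should be repeated with a different initial subspace.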


The distinguishing features of Inpartool are the following:

investigation of the characteristics of the AEVP;

automatic choice of the algorithm and its parameters according to the revealed characteristics of the AEVP;

automatic choice of the topology (the number of processors) of the parallel computer according to the chosen algorithm and its parameters;

solving of AEVPs with approximately given initial data;

investigation of the reliability of the AEVP solutions;

possibility of working with the software without preliminary familiarization with it and without studying instructions.

At the level of concepts, Inpartool implements the fundamental principles of the information computing technology for solving problems. This technology involves: formulation of the problem in terms of the subject area language; investigation of the characteristics of the problem being solved and automatic choice of an algorithm depending on the revealed characteristics; synthesis of the solution program taking into account the mathematical and engineering characteristics of the computer; solving of the problem and analysis of the reliability of the obtained results; dialog support and reference information for the process of formulating, investigating and solving the problem.

4.2. Technology of investigating and solving algebraic eigenvalue problems

4.2.1. Applying of Inpartool for the solving of AEVP. The main window «Algebraic eigenvalue problem» consists of the main menu and two panels (fig. 4.1).


FIG. 4.1

The left (passive) panel reflects the sequence of work stages and sub-stages that have already been performed, are being performed, and will be performed.

To solve the problem the user should carry out the following successive stages of work in the right-hand (active) panel:

formulate the problem;

input the initial data of the problem;

start the problem;

obtain the results.

Inpartool solves the following AEVPs:

full AEVP with tri-diagonal symmetric matrix;

full AEVP with dense symmetric matrix;

partial AEVP (evaluation of several minimum eigenvalues and their corresponding eigenvectors), standard or generalized, with band symmetric matrices.

To formulate a problem the user should choose the required item from one of the two pull-down lists of problems of the given class that can be solved by Inpartool (fig. 4.2, 4.3).

FIG. 4.2


FIG. 4.3

4.2.2. Specification of initial data for the solving of AEVP. The initial data for the solving of an AEVP are the parameters of the problem being solved, namely: the order of the matrix (matrices), the number of diagonals for band matrices, the numbers of the first and last eigenvalues to be evaluated for the partial problem, the maximum relative error in the matrix elements, and the elements of the matrix (matrices) themselves.

FIG. 4.4


The initial data (fig. 4.4) can be input from a binary file (input from file) and/or entered directly by the user into the corresponding data fields (input from the keyboard). In addition, the matrix elements can be formed by a program (functions in C) written by the user.

The following data structures (formats) are supported by Inpartool for data input from a file (files):

the file contains only the matrix elements as floating-point numbers, whose sequence is determined by the form of the matrix (the file mask is *.tam), each number occupying 8 bytes;

the file contains, in binary form, the parameters and elements of the matrix (the file mask is *.edat); Table 4.1 gives the structure of such a file for the case of a band symmetric matrix;

the file contains the parameters and elements of two matrices (the file mask is *.eedat) required for solving the generalized AEVP; Table 4.2 gives the structure of such a file for the case of band symmetric matrices.
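The simplest of these formats, *.tam, is described as a bare sequence of 8-byte floating-point numbers. A minimal Python sketch of writing and reading such a sequence follows; the function names are hypothetical, and little-endian byte order is an assumption, since the text does not state the byte order.

```python
import io
import struct

def write_tam(stream, values):
    """Write numbers as consecutive 8-byte doubles (little-endian assumed)."""
    stream.write(struct.pack(f"<{len(values)}d", *values))

def read_tam(stream):
    """Read all 8-byte doubles from a stream.

    Raises ValueError if the size is not a multiple of 8 bytes, which
    would indicate a truncated or foreign file.
    """
    raw = stream.read()
    if len(raw) % 8 != 0:
        raise ValueError("file size is not a multiple of 8 bytes")
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))

# Round trip through an in-memory buffer instead of a real file.
buffer = io.BytesIO()
write_tam(buffer, [2.0, -1.0, 2.0, -1.0, 2.0])
buffer.seek(0)
print(read_tam(buffer))
```

Because the file carries no header, the matrix order, type and structure must be supplied separately, as done in the dialog windows.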


TABLE 4.1

Contents of the file | Type | Bytes
Format version (= 1) | Integer | 4
Matrix structure and type code (= 17) | Integer | 4
Code of the order in which matrix elements are written | Integer | 4
Order of the matrix | Integer | 4
Band width of the matrix | Integer | 4
Relative error in matrix elements | Double precision floating-point number | 8
Matrix elements (diagonal and sub-diagonal) in the form of a sequence of numbers | Double precision floating-point number | 8×lA

To encode the matrix structure and type the following formula may be used:

16 i1 + i2,

where i1 = 0 for a dense matrix, i1 = 1 for a band matrix, i1 = 3 for a diagonal matrix; i2 = 0 for a general matrix, i2 = 1 for a symmetric matrix. The code of the identity matrix is equal to 49.
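The encoding formula can be expressed as a pair of small helper functions. This Python sketch uses only the i1 and i2 values listed in the text; the function and dictionary names are illustrative.

```python
# Values of i1 and i2 exactly as listed in the text.
STRUCTURE = {"dense": 0, "band": 1, "diagonal": 3}
KIND = {"general": 0, "symmetric": 1}

def matrix_code(structure, kind):
    """Return 16*i1 + i2 for the given structure and type names."""
    return 16 * STRUCTURE[structure] + KIND[kind]

def decode_matrix_code(code):
    """Split a code back into the pair (i1, i2)."""
    return divmod(code, 16)

print(matrix_code("band", "symmetric"))      # band symmetric, as in Table 4.1
print(matrix_code("diagonal", "symmetric"))  # same value as the identity-matrix code
print(decode_matrix_code(17))
```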

The band width of the matrix is equal to the number of sub- and super-diagonals plus 1, i.e. in the case of a symmetric matrix it is equal to 2m + 1, where m is the band half-width. The diagonal and sub-diagonal elements of the matrix are written in succession row by row, each row beginning with the diagonal element. For example, for a band symmetric matrix of order 9 with band half-width equal to 2 (the band width is equal to 5) the elements of the matrix should be stored in the following order:

a11, a22, a21, a33, a32, a31, …, a99, a98, a97.


In this case the code of the order in which the elements of the matrix are written is equal to 1, while the number of matrix elements stored in the file is lA = 24.
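The storage order and the element count lA = 24 for this example can be reproduced programmatically. The Python sketch below also packs the fields in the order listed in Table 4.1. The function names, the callback `element(i, j)` and the little-endian byte order are assumptions; only the field order and sizes come from the table.

```python
import struct

def band_storage_order(n, m):
    """Return 1-based (row, column) pairs: each row from the diagonal leftwards."""
    return [(i, j) for i in range(1, n + 1)
            for j in range(i, max(i - m, 1) - 1, -1)]

def pack_edat(n, m, rel_error, element):
    """Pack a band symmetric matrix following the field order of Table 4.1.

    element(i, j) returns a_ij for 1-based indices. Little-endian byte
    order is an assumption; the text does not specify it.
    """
    order = band_storage_order(n, m)
    # Format version, structure/type code, order code, matrix order, band width.
    header = struct.pack("<5i", 1, 17, 1, n, 2 * m + 1)
    body = struct.pack(f"<d{len(order)}d", rel_error,
                       *(element(i, j) for i, j in order))
    return header + body

order = band_storage_order(9, 2)
print(order[:6], "...", order[-3:])
print(len(order))
data = pack_edat(9, 2, 0.0, lambda i, j: 2.0 if i == j else -1.0)
print(len(data))   # 5*4 header bytes + 8 + 8*lA
```

Rows 1 and 2 are shorter than the rest because fewer sub-diagonal elements exist there, which is why lA is 24 rather than 9 × 3 = 27.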

TABLE 4.2

Contents of the file | Type | Bytes
Format version (= 2) | Integer | 4
Matrix structure and type code for the first (left-hand) matrix (= 17) | Integer | 4
Code of the order in which elements of the first matrix are written (= 1) | Integer | 4
Matrix structure and type code for the second (right-hand) matrix | Integer | 4
Code of the order in which elements of the second matrix are written (= 1) | Integer | 4
Order of the problem's matrices | Integer | 4
Band width of the first matrix | Integer | 4
Band width of the second matrix | Integer | 4
Relative error in elements of the first matrix | Double precision floating-point number | 8
Relative error in elements of the second matrix | Double precision floating-point number | 8
Elements of the first matrix | Double precision floating-point number | 8×lA
Elements of the second matrix | Double precision floating-point number | 8×lB

The elements of a tri-diagonal symmetric matrix are stored in the file in the following order: first the elements of the main diagonal in succession, beginning from a11, and then the elements of the first sub-diagonal.

The elements of a dense symmetric matrix are stored by rows in succession.

As an example, consider the case where the standard full eigenvalue problem with a dense symmetric matrix of order n = 1000 is solved. The elements of the matrix are written in succession by rows to the file Amax1000.tam. The user should choose the file input of the matrix and, having indicated the data format, enter the file name Amax1000.tam. After this, in the lists that appear he should choose the matrix type (dense) and its structure (symmetric), while the matrix order (1000) and the maximum relative error in its elements are to be entered into data fields (fig. 4.5). By default, the matrix elements are considered to be given exactly and the value of the maximum relative error is equal to 0.

During the input of numerical parameters the corresponding data fields are colored red if their values must be specified (i.e. such parameters are not assigned any values by default). After the values are input the data fields become green. The input of each numerical parameter should be finished by pressing the <Enter> button. The value is then accepted and the focus passes to the next data input field or control element. Matrix elements can be edited in the manner described in Chapter 3.

For the program specification of matrix elements one should enter the text of a function written in C that forms the diagonal and sub-diagonal elements of one row of the matrix (the row number is an input parameter of this function). The user may prepare the text in advance in a file or enter it directly and save it in a new file. The dialog for selecting a previously created file containing the text of the function is similar to the dialog for selecting the data file. For direct entering of the text or its correction Inpartool employs a text editor.


FIG. 4.5

4.2.3. The solving of AEVP. The solving of an algebraic eigenvalue problem in Inpartool can be carried out either automatically, without the user's involvement, or interactively, in a dialog with the user.

To solve the problem one should, after the successful input of the initial data, press the appropriate button («Automatically» or «Interactively»), which appears under the title «The solving of problem» (fig. 4.6).


FIG. 4.6

In the case of automatic solving Inpartool first of all investigates the problem. On the basis of the investigation of the initial data and the characteristics of the AEVP revealed by Inpartool, and according to the engineering and mathematical potentialities of Inparcom-16, an algorithm for solving the problem is chosen, a processor topology is constructed, the initial data are distributed between the processors according to the chosen algorithm, the problem is solved, and the reliability of the obtained results is investigated.

During the interactive solving of the problem the user can specify the number of processors on which the problem is to be solved as well as other parameters of the problem if they can be modified, for example, a block size. Besides, when solving an AEVP with band symmetric matrix (matrices) the user can choose a solution algorithm for matrices with a narrow band or for matrices with a wider band. Then it is necessary to press «Further».


While the problem is being solved, a window showing the progress of the task is displayed; it closes when the solving is completed. After the computations are completed, short information about the solved problem appears in the upper part of the right-hand panel, with the title «Processing of results» located below it.

4.2.4. Results of solving AEVP. The results of solving an AEVP by Inpartool are the following:

evaluated eigenvalues;

evaluated eigenvectors;

error estimates for eigenvalues;

error estimates for eigenvectors;

other information on the reliability of the obtained solutions.

Numerical results (eigenpairs and their estimates) are saved in a binary file with the standard name result.out. A part of the numerical results, as well as characteristic information about the reliability of the obtained results, is placed in text form in the protocol describing the process of investigating and solving the AEVP.

To look over and process the obtained results one should press a button located below the title «Processing of results» (fig. 4.7).

To look over and process the results saved in the file result.out, one should press the «Browsing of results» button. In the window that appears the results of solving the problem are presented in the form of a table (eigenvalues, estimates, eigenvectors). These results or a part of them can be printed or saved in a binary file with a unique name for further use. If it is only necessary to save the results, it suffices to press the «Saving of solution» button.

The protocol describing the process of investigating and solving the AEVP contains, in addition to some results of solving the problem, a description of the problem, the name of the method (algorithm) used for solving it, the number of processors used, and the execution time. To look over the protocol the user should press the «Browsing of protocol» button. By pressing an appropriate button in the browsing window the user can print the protocol, save it in a text file or delete it. If the current protocol was not deleted, the protocol of the next problem solved during this work session will be appended to the existing protocol.

FIG. 4.7

4.2.5. Inpartool's diagnostics during the solving of AEVP. While formulating the problem, inputting the initial data and solving the AEVP the user can get:

reference information;

Help-type messages;

run-time diagnostics of the problem.

Having clicked the right-hand mouse button on the title «Algebraic eigenvalue problem» (fig. 4.1), the user can become familiar with the functional potentialities of Inpartool for solving problems of this class as well as with the order of work with Inpartool. In the same manner, short Help-type information can be obtained at any stage of work by clicking the right-hand mouse button on any menu item, title, inscription or other control element of interest.

In the «Help» item of the main menu the user can get information about the functional potentialities of Inpartool or about the linear algebra terminology being used (fig. 4.8).

FIG. 4.8

Having chosen the «Glossary» item in the «Help» submenu and then a term of interest from the list on the left-hand panel of the window that appears, the user gets its explanation (fig. 4.9).

Besides, Inpartool issues various prompts, warnings and error messages to the user.

FIG. 4.9

4.3. Examples of solving algebraic eigenvalue problems by means of Inpartool

Let us illustrate the computational potentialities of Inpartool for the solving of AEVP on the following problems.

Problem 1

Solve the full standard AEVP Ax = λx, where A is a tri-diagonal symmetric matrix:

Exact solution of the Problem 1 (i,k = 1, 2, …, n) has a form


, .

Problem 2

Solve the full standard AEVP Ax = λx with a dense symmetric matrix:

Exact solution of the Problem 2 (i,k = 1, 2, …, n) has a form

, .

Problem 3

Evaluate the 8 minimum eigenvalues and their corresponding eigenvectors of the generalized algebraic eigenvalue problem (4.1) with band symmetric positive definite matrices A and B. The matrices A and B are obtained by discretizing, by the finite element method, the eigenvalue problem for the Laplace operator in a rectangle one side of which is fixed. In this case the matrices are block tri-diagonal:

, ,

where


, ,

,

, , ,

Nx, Ny are the numbers of partitions of the rectangular region in the horizontal and vertical directions, respectively. The order of the square blocks A1, A2, B1 equals Nx+1, and the number of such blocks in the matrices A and B is Ny × Ny. Thus, the order of the matrices A and B is n = (Nx+1)Ny, while the band half-width of these matrices is equal to m = Nx+2 (the band width is 2m+1 = 2Nx+5).

Exact solution of the Problem 3 has a form

,

where , , 0 ≤ k ≤ Nx, 1 ≤ l ≤ Ny.


Problem 1 (with n = 1000) was solved on 16 processors by the QL-algorithm. The elements of the matrix are given exactly and written in the file A286_1000.tam. In the solution program the relative error in the matrix elements is set to the machine epsilon macheps ≈ 2.22·10^(–16). The protocol describing the solving of the problem by means of Inpartool in the automatic mode is given below.

exact eigenvalues
 9.849886676738e-06 3.939944968634e-05 8.864839796918e-05 1.575962464286e-04 2.462423159362e-04
 3.545857333380e-04 4.826254314638e-04 6.303601491373e-04 7.977884311878e-04 9.849086284661e-04

Time mp_esytri 7.794500e-01

Order of matrix = 1000 Number of processor = 16 Matrix elements error = 2.220446e-16

FIRST 10 EIGENVALUES 9.849886674095e-06 3.939944968403e-05 8.864839796700e-05 1.575962464263e-04 2.462423159338e-04 3.545857333358e-044.826254314614e-04 6.303601491352e-04 7.977884311857e-04 9.849086284639e-04

ESTIMATION OF FIRST EIGENVALUES 2.392412e-13 2.392412e-13 2.392413e-132.392413e-13 2.392413e-13 2.392413e-13 2.392413e-132.392414e-13 2.392414e-13 2.392415e-13 ESTIMATION OF FIRST EIGENVECTORS 8.096270e-09 8.096270e-09 4.857794e-093.469887e-09 2.698837e-09 2.208176e-09 1.868494e-091.619398e-09 1.428919e-09 1.278544e-09
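The protocol for Problem 1 can be cross-checked with a few lines of Python. The sketch below rests on an assumption that is not stated in the surviving text: that the tri-diagonal matrix is the second-difference matrix tridiag(-1, 2, -1), whose eigenvalues are 4·sin²(kπ/(2(n+1))). Under this assumption the closed formula reproduces the "exact eigenvalues" printed in the protocol. Independently of the assumption, the sketch also compares the actual deviations of the computed eigenvalues with the error estimates issued by the program. All function and variable names are illustrative.

```python
import math


def second_difference_eigenvalues(n, count):
    """Smallest `count` eigenvalues of the n x n matrix tridiag(-1, 2, -1).

    Assumption: this is the matrix of Problem 1 (its formula is not
    available in the text; the choice is inferred from the printed values).
    """
    return [4.0 * math.sin(k * math.pi / (2.0 * (n + 1))) ** 2
            for k in range(1, count + 1)]


# Values copied from the protocol (eigenvalues no. 1, 3 and 10).
PROTOCOL_EXACT = {1: 9.849886676738e-06, 3: 8.864839796918e-05, 10: 9.849086284661e-04}
PROTOCOL_COMPUTED = {1: 9.849886674095e-06, 3: 8.864839796700e-05, 10: 9.849086284639e-04}
PROTOCOL_ESTIMATE = {1: 2.392412e-13, 3: 2.392413e-13, 10: 2.392415e-13}


def check_protocol(n=1000):
    """Return (k, formula_matches, actual_error, within_estimate) per eigenvalue."""
    lam = second_difference_eigenvalues(n, 10)
    report = []
    for k in (1, 3, 10):
        formula_matches = math.isclose(lam[k - 1], PROTOCOL_EXACT[k], rel_tol=1e-7)
        actual_error = abs(PROTOCOL_COMPUTED[k] - PROTOCOL_EXACT[k])
        within_estimate = actual_error <= PROTOCOL_ESTIMATE[k]
        report.append((k, formula_matches, actual_error, within_estimate))
    return report


if __name__ == "__main__":
    for row in check_protocol():
        print(row)
```

For the three eigenvalues checked, the actual deviations are of the order of 10⁻¹⁵, i.e. well inside the estimates of about 2,4·10⁻¹³ reported by Inpartool.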

Problem 2 (with n = 1000) was solved on 16 processors. The matrix elements are specified exactly and are stored in the file Amax1000.tam. In the solution program the relative error in the specification of the matrix elements is set to the machine epsilon masheps ≈ 2,22·10⁻¹⁶. During the solving of the problem the initial matrix is reduced to a tri-diagonal symmetric matrix by a sequence of two-sided Householder transformations, and the full AEVP for this matrix is solved by the QL-algorithm. A protocol describing the solving of the problem by means of Inpartool in the automatic mode is given below.

Calculation of all eigenvalues and eigenvectors

Time 4.270442e+00

Order of matrix = 1000
Number of row in block = 20
Number of processor = 16
Matrix elements error = 2.220446e-16

FIRST 10 EIGENVALUES
-7.49999384e-01 -7.49997535e-01 -7.49994454e-01 -7.49990140e-01 -7.49984594e-01
-7.49977814e-01 -7.49969802e-01 -7.49960557e-01 -7.49950078e-01 -7.49938366e-01
LAST 10 EIGENVALUES
1.12287869e+03 1.40285538e+03 1.80215054e+03 2.39961659e+03 3.35189425e+03
5.00760334e+03 8.27847355e+03 1.62266882e+04 4.50757634e+04 4.05689204e+05

ERROR ESTIMATION OF FIRST EIGENVALUES
9.09616314e-11 9.09616314e-11 9.09616314e-11 9.09616314e-11 9.09616314e-11
9.09616314e-11 9.09616314e-11 9.09616314e-11 9.09616314e-11 9.09616314e-11
ERROR ESTIMATION OF LAST EIGENVALUES
9.12111271e-11 9.12732944e-11 9.13619558e-11 9.14946199e-11 9.17060680e-11
9.20737093e-11 9.27999883e-11 9.45648465e-11 1.00970628e-10 1.81042897e-10

ERROR ESTIMATION OF FIRST EIGENVECTORS
4.92025911e-05 4.92025911e-05 2.95211576e-05 2.10861262e-05 1.63998857e-05
1.34176502e-05 1.13529491e-05 9.83876786e-06 8.68081069e-06 7.76658121e-06
ERROR ESTIMATION OF LAST EIGENVECTORS
4.47413616e-13 3.26003185e-13 2.28808073e-13 1.53137771e-13 9.63018164e-14
5.56098349e-14 2.83716511e-14 1.18976212e-14 3.49996065e-15 5.02041457e-16

Problem 3 with Nx = 319, Ny = 50, i.e. n = 16 000 and band half-width m = 321 (the total band width is equal to 643), was solved on 16 processors by the method of subspace iterations. The initial data of the problem are stored in the file Eig16000.eedat. A protocol describing the solving of the problem in the automatic mode is given below.

P R O B L E M : Solving of Partial Generalized Eigenvalue Problem for Band Symmetric Matrices

INPUT PARAMETERS:
  order of matrices = 16000
  bandwise of matrix A = 643
  bandwise of matrix B = 643
  maximal relative errors:
    of matrix A elements = 0.000e+00
    of matrix B elements = 0.000e+00

  number of minimal eigenvalues to calculate = 8

Exact eigenvalues
2.467604042554091e+00 1.233728821340764e+01 2.222305254921369e+01 3.209273672006724e+01
4.194729797545172e+01 6.170274648211132e+01 6.181196631645549e+01 7.168165048730904e+01
9.130050516997107e+01 1.012916602493531e+02 1.110559536766307e+02 1.213906838857423e+02
2.011944685514055e+02 3.015383945680863e+02 1.312603680565959e+02 2.110641527222591e+02

P r o c e s s  o f  r e s e a r c h  a n d  s o l u t i o n  of the problem

M e t h o d : Subspace Iterations
  matrix blocksize = 10
  number of processors = 16

pr#15: (sbpldlt) returns 0 time=2.23644e+00
pr# 0: conv= 0 Nit=16 time=1.67143e+00
pr# 5: calc. of errors est. time=2.78056e+00

problem solving: total time = 3.43789e+01

R e s u l t s :
SOLUTION WAS CALCULATED by 16 iterations (mit=32)
All calculated eigenvalues are minimal

Eigenvalues (calculated)     Estimates of Errors
2.467604042574461e+00        4.493e-15
1.233728821342248e+01        2.096e-12
2.222305254923304e+01        2.674e-15
3.209273672008100e+01        3.727e-12
4.194729797546218e+01        5.368e-10
6.170274648216704e+01        4.187e-07
6.181196631647476e+01        6.440e-09
7.168165048952910e+01        2.728e-06

#result=0
pr# 8: (sbpldlt-2) returns 0 time=2.61476e+00

The influence of errors in the initial data can be illustrated by the results of solving the following full standard AEVP.

Problem 2 was solved for n = 2000 with three different values of the error in the specification of the matrix elements: εA = 0, εA = 10⁻¹⁰ and εA = 10⁻⁶ (recall that when the matrix elements are specified exactly, εA = 0 is replaced in the program by εA ≈ 2,22∙10⁻¹⁶). The results for the five smallest and the five largest eigenvalues are given in Table 4.3 (overall errors in the computed eigenvalues) and in Table 4.4 (overall errors in the computed eigenvectors).

The large error estimates for the eigenvectors corresponding to the smallest eigenvalues are caused by the pathological closeness of these eigenvalues (see estimate (4.4)).

TABLE 4.3

   i      λi               Δλi with εA = 0    Δλi with εA = 10⁻¹⁰    Δλi with εA = 10⁻⁶
   1      –0.749999846     3.632e–10          2.0036e–07             2.0000004e–03
   2      –0.749999383     3.632e–10          2.0036e–07             2.0000004e–03
   3      –0.749998613     3.632e–10          2.0036e–07             2.0000004e–03
   4      –0.749997534     3.632e–10          2.0036e–07             2.0000004e–03
   5      –0.749996147     3.632e–10          2.0036e–07             2.0000004e–03
   1996   20 023.1526      3.677e–10          2.0037e–07             2.0000004e–03
   1997   33 100.0958      3.706e–10          2.0037e–07             2.0000004e–03
   1998   64 877.0677      3.776e–10          2.0038e–07             2.0000004e–03
   1999   180 215.707      4.032e–10          2.0040e–07             2.0000004e–03
   2000   1 621 948.69     7.234e–10          2.0072e–07             2.0000007e–03
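The entries of Table 4.3 show a simple regularity that can be verified numerically: each error estimate for εA > 0 is, to the printed precision, the estimate for εA = 0 plus a term proportional to εA. The factor 2000 used below is read off from the table (it coincides with the matrix order n); it is an observation about the printed data, not a formula taken from the text, and the names in the sketch are illustrative.

```python
# Rows of Table 4.3: (i, estimate for eps=0, estimate for eps=1e-10, estimate for eps=1e-6)
TABLE_4_3 = [
    (1, 3.632e-10, 2.0036e-07, 2.0000004e-03),
    (2, 3.632e-10, 2.0036e-07, 2.0000004e-03),
    (3, 3.632e-10, 2.0036e-07, 2.0000004e-03),
    (4, 3.632e-10, 2.0036e-07, 2.0000004e-03),
    (5, 3.632e-10, 2.0036e-07, 2.0000004e-03),
    (1996, 3.677e-10, 2.0037e-07, 2.0000004e-03),
    (1997, 3.706e-10, 2.0037e-07, 2.0000004e-03),
    (1998, 3.776e-10, 2.0038e-07, 2.0000004e-03),
    (1999, 4.032e-10, 2.0040e-07, 2.0000004e-03),
    (2000, 7.234e-10, 2.0072e-07, 2.0000007e-03),
]

ASSUMED_FACTOR = 2000.0  # inferred from the table itself (assumption)


def predicted_estimate(base, eps_a, factor=ASSUMED_FACTOR):
    """Additive model: rounding-error part plus a part proportional to eps_a."""
    return base + factor * eps_a


def max_relative_mismatch(table=TABLE_4_3):
    """Largest relative difference between the model and the printed values."""
    worst = 0.0
    for _, base, est_1e10, est_1e6 in table:
        for eps_a, printed in ((1e-10, est_1e10), (1e-6, est_1e6)):
            model = predicted_estimate(base, eps_a)
            worst = max(worst, abs(model - printed) / printed)
    return worst


if __name__ == "__main__":
    print("max relative mismatch:", max_relative_mismatch())
```

The mismatch stays at the level of the rounding of the printed figures, which suggests that for εA = 10⁻⁶ the estimate is dominated entirely by the error in the initial data.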

TABLE 4.4

   i      Δxi with εA = 0    Δxi with εA = 10⁻¹⁰    Δxi with εA = 10⁻⁶
   1      7.8550e–04         4.3330e–01             4.9786e–01
   2      7.8550e–04         4.3330e–01             4.9786e–01
   3      4.7130e–04         2.5998e–01             4.9786e–01
   4      3.3664e–04         1.8570e–01             4.9786e–01
   5      2.6183e–04         1.4443e–01             4.9786e–01
   1996   5.5543e–14         3.0269e–11             3.0214e–07
   1997   2.8338e–14         1.5322e–11             1.5294e–07
   1998   1.1884e–14         6.3057e–12             6.2939e–08
   1999   3.4961e–15         1.7375e–12             1.7340e–08
   2000   5.0173e–16         1.3922e–13             1.3872e–09

The accelerations gained when solving AEVPs on different numbers of processors, relative to the run time on one processor, are shown in the diagrams below. The results obtained for Problem 2 with n = 2 000 are shown in fig. 4.10.

FIG. 4.10

The results obtained when solving the partial AEVP of the form (4.1) with a band symmetric matrix A (band half-width mA = 335) and a diagonal matrix B (without and with the investigation of the solution's reliability) are shown in fig. 4.11, the matrix order being n = 300 000.

FIG. 4.11

The accelerations gained at the different stages of the solving process are shown in fig. 4.12: the LDLᵀ-decomposition of the matrix, the iterative process, and the investigation of the reliability of the computed solution.

FIG. 4.12

The results presented above reflect the basic capabilities for investigating and solving AEVPs: investigation of mathematical characteristics, account of the approximate nature of the initial data, reliability analysis of the solution, and automatic parallelization of the process of investigating and solving the problem.

Chapter 5. INVESTIGATING AND SOLVING OF SYSTEMS OF NON-LINEAR EQUATIONS

5.1. Functional potentialities of Inpartool on investigating and solving of systems of non-linear equations

Systems of non-linear equations (SNE) often occur in the solving of applied problems. They may either be independent problems describing physical processes or arise at an intermediate stage of solving more complicated mathematical problems. Requirements of resource and energy saving make it necessary to model processes and phenomena mathematically with high accuracy and reliability. This, in turn, leads to problems of high order, whose solving can be sped up only by means of parallel computations.

SNE are mainly solved by iterative methods based (to some extent) on Newton's method. A number of these methods require the evaluation of the Jacobi matrix at each iteration and the subsequent solving of a LAS.

Parallelizing both the evaluation of the Jacobi matrix and the solving of the LAS considerably speeds up the process of finding solutions of SNE. Iterative methods of another type involve the iterative evaluation of either the Jacobi matrix or its inverse. In these cases the parallelization of computations also considerably reduces the time required for solving SNE.

The basic information about an SNE is contained in its vector-function. By using sufficiently small increments of the arguments of the vector-function one can evaluate the system's Jacobi matrix rather accurately. The real accuracy of the obtained solution (i.e. the reliability of the solution) may be estimated in the neighborhood of the solution by the norm of the matrix inverse to the Jacobi matrix. All necessary evaluations can and should be parallelized, which considerably reduces the problem's execution time.
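The evaluation of the Jacobi matrix from small increments can be sketched as follows. This is a minimal serial illustration with forward differences, not the Inpartool routine; the step size h, the scaling rule and the test system are assumptions chosen for the example. Each column is obtained from one additional evaluation of the vector-function and does not depend on the other columns, which is what makes the computation easy to distribute between processors.

```python
def forward_difference_jacobian(f, x, h=1e-7):
    """Approximate the Jacobi matrix of f at x, column by column.

    Column j is (f(x + step * e_j) - f(x)) / step; the columns are mutually
    independent, so they could be computed on different processors.
    """
    fx = f(x)
    rows, cols = len(fx), len(x)
    jac = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        step = h * max(1.0, abs(x[j]))  # scale the increment to the argument
        shifted = list(x)
        shifted[j] += step
        f_shifted = f(shifted)
        for i in range(rows):
            jac[i][j] = (f_shifted[i] - fx[i]) / step
    return jac


def example_system(x):
    """Hypothetical test system: x0^2 + x1^2 = 4, x0 * x1 = 1."""
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0]


if __name__ == "__main__":
    # The exact Jacobi matrix at (1, 2) is [[2, 4], [2, 1]].
    for row in forward_difference_jacobian(example_system, [1.0, 2.0]):
        print(row)
```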

Let the system of n non-linear equations

$F(x) = 0$ (5.1)

be given, where $x = (x_1, \dots, x_n)^T$ and $F(x) = (f_1(x), \dots, f_n(x))^T$ are vectors containing the solution to be sought and the vector-function, respectively.

If $J(x)$ is the Jacobi matrix of the system (5.1) (or some approximation to it), the iterative process which implements Newton's method with the given initial approximation $x^{(0)}$ can be written in the form

$J(x^{(k)})\, w^{(k)} = -F(x^{(k)})$, (5.2)

where $w^{(k)}$ is the correction, $k = 0, 1, \dots$ is the iteration number, and

$x^{(k+1)} = x^{(k)} + w^{(k)}$. (5.3)

To solve such problems, the following information is to be given in addition to the initial approximation: a region in which the solution is sought and the required accuracy of the approximations to the system's solution.

As one can see from formulas (5.2), (5.3), a LAS of the form (5.2) has to be solved at each iteration, which requires evaluating the vector-function and the Jacobi matrix.
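The structure of one Newton iteration (evaluate the vector-function and the Jacobi matrix, solve a LAS for the correction, update the approximation) can be sketched in a few lines. This is a serial textbook illustration under stated assumptions, not Inpartool's parallel implementation: the Jacobi matrix is supplied analytically, the LAS is solved by Gaussian elimination with partial pivoting, and the stopping rule (largest component of the correction not exceeding eps) is one possible choice.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def newton(f, jacobian, x0, eps=1e-10, max_iter=100):
    """Newton's method: solve J(x) w = -F(x), then set x := x + w."""
    x = list(x0)
    for k in range(max_iter):
        residual = f(x)
        correction = solve_linear_system(jacobian(x), [-v for v in residual])
        x = [xi + wi for xi, wi in zip(x, correction)]
        if max(abs(w) for w in correction) <= eps:
            return x, k + 1
    raise RuntimeError("no convergence within max_iter iterations")


def example_system(x):
    """Hypothetical test system: x0^2 + x1^2 = 4, x0 * x1 = 1."""
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0]


def example_jacobian(x):
    return [[2.0 * x[0], 2.0 * x[1]], [x[1], x[0]]]


if __name__ == "__main__":
    solution, iterations = newton(example_system, example_jacobian, [2.0, 0.5])
    print(solution, iterations)
```

The defaults eps = 1e-10 and max_iter = 100 mirror the values eps and it used in the examples of Section 5.3.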

When realistic processes are modeled on a computer by means of systems of non-linear equations, one often has to deal with approximately given initial data. The approximate nature of the data may be caused by:

1) errors in the system's coefficients, since, being results of various measurements, they cannot be exact;

2) errors in the specification of the functions; these arise because the non-linear equations are often only approximations to the realistic non-linear equations, and because the realistic equations are frequently replaced by simpler ones in order to save arithmetic operations at each evaluation of the functions;

3) the use of a numerical method for solving the problem and the rounding of numbers during computations;

4) the derivation of the system of equations by discretization of problems of various types with respect to the spatial variables.

Because of this, the approximate nature of the initial data should be taken into account when estimating the accuracy of solutions.

If instead of accurate system (5.1) the approximate system, (5.7)

is to be solved for which the following inequality

(5.8)

holds, where v is any vector and is an error in the vector-function's specification (i.e. SNE), then in case of satisfying the inequality

in the iterative process of the form (5.2) the

accuracy of the obtained solution is estimated by formula

, (5.9)

where is accuracy of the solution to the problem's computer model specified by user.

On the basis of the foregoing, the intelligent software Inpartool has been created for investigating and numerically solving SNE with approximately given initial data in the specified region and within the required accuracy. Inpartool provides:

the solving of SNE in the specified region and within the required accuracy;
investigation of the characteristics of the system;
a choice of classes of methods and solution programs (both automatically and by the user);
dialog tools for the input of information;
control over the information being input;
recommendations for making a decision in case of interruption;
reliability of the obtained solution.

The software operates with knowledge obtained while investigating problems and, on its basis, decides on the ways and methods of evaluating the solution within the given accuracy.

Mathematical facilities of the intelligent software include:

mathematical methods for the computer investigation of characteristic features of SNE;
algorithms for the solving of SNE;
tools for evaluating the solution and its reliability.

The initial data are either read from a previously prepared file or entered from the keyboard and then displayed on the screen.

Inpartool has the following distinctive characteristics:

investigation of the characteristics of SNE;
possibility of the automatic choice of a class of methods;
guarantee of the solution's reliability;
possibility of working with the software without preliminary familiarization with it and without studying instructions.

At the conceptual level Inpartool implements the basic principles of the information computing technology for solving problems, which involves: formulation of the problem in terms of the subject area language; investigation of the characteristics of the problem being solved and automatic choice of the algorithm depending on the revealed characteristics; synthesis of the solution program taking into account the mathematical and engineering characteristics of the computer; the solving of the problem and reliability analysis of the obtained results; as well as dialog support and reference information for the processes of formulating and solving the problem.

To solve SNE by means of Inpartool the following input data are required:

order of the system of non-linear equations;
maximum number of iterations to be performed;
accuracy of the obtained solution;
the initial data specification error;
vector of initial approximations;
arrays determining the boundaries of the region.

In addition, to solve the SNE by means of Inpartool the user should enter a program for the evaluation of the vector-function. It can be read from a previously prepared file or entered from the keyboard.

Inpartool provides an automatic mode of investigating and solving problems. In this mode the problem is investigated on the computer without the user's involvement; a suitable algorithm is chosen on the basis of the revealed characteristics of the SNE, taking into account the engineering and mathematical characteristics of Inparcom-16; a processor topology is constructed; the initial data are distributed between processors in the order required by the algorithm; the problem is solved; and the reliability of the obtained results is estimated. By default, Inpartool constructs a topology from the optimum number of processors. However, the user can choose the required number of processors himself. All information about the process of solving the problem and the obtained results is accumulated in the protocol.

Investigating and solving problems in the dialog mode is also possible. In this case the characteristics of the SNE are investigated first. If the Jacobi matrix is symmetric, the SNE is solved by Powell's method [44]. If the matrix is non-symmetric, one can choose either a globally convergent method (Burdakov's method [4]) or locally convergent methods.
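The choice described above can be summarized as a small decision rule. The sketch below is only an illustration of that rule: the method names are taken from the text, while the symmetry test, its tolerance and the function names are assumptions and do not describe how Inpartool actually inspects the system.

```python
def is_symmetric(matrix, tol=1e-8):
    """Check symmetry of a square matrix up to a relative tolerance (assumed)."""
    n = len(matrix)
    scale = max(1.0, max(abs(v) for row in matrix for v in row))
    return all(abs(matrix[i][j] - matrix[j][i]) <= tol * scale
               for i in range(n) for j in range(i + 1, n))


def candidate_methods(jacobian, global_convergence=True):
    """Return the methods offered for a system with the given Jacobi matrix."""
    if is_symmetric(jacobian):
        return ["Powell"]
    if global_convergence:
        return ["Burdakov"]
    return ["Newton", "Dennis-More", "Broyden"]


if __name__ == "__main__":
    print(candidate_methods([[2.0, 1.0], [1.0, 3.0]]))
    print(candidate_methods([[2.0, 1.0], [0.0, 3.0]]))
    print(candidate_methods([[2.0, 1.0], [0.0, 3.0]], global_convergence=False))
```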

If the user has chosen the class of locally convergent methods, he can choose one of the following methods:

Newton's method,
Dennis-More's method [6],
Broyden's method [3].

Output data for the solving of SNE are the following:

problem's execution time;
order of the system of non-linear equations;
the number of iterations performed during the evaluation of the solution;
accuracy of the obtained solution;
accuracy of the obtained solution taking into account the accuracy of the initial data specification;
array containing the norms of the vector-function over the sequence of iterations;
vector of the solution.

All the data listed above are written into the file of results.

5.2. Technology of investigating and solving systems of non-linear equations

5.2.1. Applying Inpartool for the solving of SNE. Inpartool solves the following problems: finding the roots of one equation and the roots of SNE with symmetric and non-symmetric Jacobi matrices.

The main window «Systems of non-linear equations» consists of the main menu and two panels. The left (passive) panel reflects the sequence of work stages and sub-stages: those already performed, those being performed and those still to be performed. The view of the right-hand panel of the window is shown in fig. 5.1.

To find the roots of one equation, the following work stages are to be carried out in succession in special windows:

input of the problem's initial data;
input of the right-hand side;
start of the problem;
obtaining of the results of solving the problem.

To solve SNE, the user should carry out the following successive work stages in special windows:

input of the problem's initial data;
input of the right-hand side;
a choice of the class of SNE;
a choice of the solution method (in case of interactive solving of the problem);
start of the problem;
obtaining of the results of solving the problem.

FIG. 5.1

5.2.2. Specification of initial data for the solving of SNE. The following input data are required for finding the roots of one equation by means of Inpartool:

left-hand end of the interval;
right-hand end of the interval;
required accuracy of the roots' evaluation;
accuracy of the initial data specification;
an upper estimate for the maximum of the derivative's modulus on the interval.

The solving of SNE by means of Inpartool requires the following input data:

order of the system of non-linear equations;
the maximum number of iterations to be performed;
accuracy of the obtained solution;
error in the function's specification;
vector of initial approximations;
arrays determining the boundaries of the region.

Numerical data required for finding the roots of one equation can be entered from the keyboard, while the function describing the non-linear equation can be either entered from the keyboard or read from a previously prepared file (fig. 5.2).

FIG. 5.2

The initial data (numerical data and function) for SNE can be entered either from a file or by means of the keyboard (fig. 5.3).

When inputting data from a file (fig. 5.4), the data are read from the binary file data_nel in the order indicated above.

When inputting data by means of the keyboard, the above-indicated data are entered in a special dialog data input window (fig. 5.5). The input of every element ends by pressing the <Enter> button, whereupon the information is entered and the cursor passes to the next data input field.

FIG. 5.3

FIG. 5.4


FIG. 5.5

All information about the current problem can be saved in a single file through the «File» item of the main menu. To do this one should select the «Save as…» menu item. This information can be used in subsequent work sessions.

Previously entered data can be edited by pressing the «Edit» button (fig. 5.6).

FIG. 5.6

In addition, to solve SNE a vector-function has to be entered. It can also be entered either from a file or by means of the keyboard (fig. 5.7).

FIG. 5.7

If the user has chosen the input of functions from a file, a window opens containing a list of the vector-functions (fig. 5.8) which were used during all work sessions. It remains only to choose the required vector-function (for example, f.c).

FIG. 5.8

The function's input file contains a program written in C (fig. 5.9). If it is necessary to modify the program, one should press the «Edit» button and then save the modifications by pressing the «Save» button.

When inputting functions by means of the keyboard, one should write a program in C for the evaluation of the right-hand sides in the window that appears and save it in a file by pressing the «Save» button.

FIG. 5.9

5.2.3. The solving of SNE. Inpartool offers two modes of solving SNE: automatic and interactive. To run the problem, the user should choose one of these modes by pressing the button of the appropriate mode in the window which appears after the successful input of data.

In the automatic mode Inpartool constructs a computer topology efficient for solving the problem, distributes the initial data between processors in the order required by the algorithm, solves the problem and estimates the reliability of the obtained results. By default, Inpartool constructs a topology from the number of processors which is optimum for solving the given problem. However, the user can choose the number of processors himself (fig. 5.10).

FIG. 5.10

In the interactive mode, before the solving of the problem begins the user should indicate the type of the Jacobi matrix (symmetric or non-symmetric) (fig. 5.11).

If the matrix is symmetric, the user is offered to solve the problem by Powell's method. If the Jacobi matrix is non-symmetric, the user should first choose a class of methods (fig. 5.12). If the class «with global convergence» has been chosen, the user is offered to solve the problem by Burdakov's method.

If the class «with local convergence» has been chosen, the user is offered to solve the problem by one of the following methods: Newton's, Broyden's or Dennis-More's (fig. 5.13).

FIG. 5.11

FIG. 5.12


FIG. 5.13

The methods differ only in the way the approximation to the Jacobi matrix is constructed. At each iteration of Newton's method the Jacobi matrix is constructed by means of difference approximations. The other two methods construct the Jacobi matrix by this technique only once and then improve it by an iterative update at each iteration.
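The idea of building the Jacobi matrix once by differences and then improving it at every iteration can be illustrated with the classical Broyden rank-one update B := B + ((y − B s) sᵀ)/(sᵀ s), where s is the last correction and y is the change in the vector-function. The sketch below is a serial textbook version written under assumptions (difference step, stopping rule, test system); it is not the code used in Inpartool, and it does not cover the Dennis-More variant.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a x = b."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def difference_jacobian(f, x, h=1e-7):
    """Forward-difference approximation of the Jacobi matrix (built once)."""
    fx, n = f(x), len(x)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        step = h * max(1.0, abs(x[j]))
        shifted = list(x)
        shifted[j] += step
        f_shifted = f(shifted)
        for i in range(n):
            jac[i][j] = (f_shifted[i] - fx[i]) / step
    return jac


def broyden(f, x0, eps=1e-10, max_iter=100):
    """Broyden's method: one difference Jacobian, then rank-one updates."""
    x, n = list(x0), len(x0)
    fx = f(x)
    b = difference_jacobian(f, x)
    for k in range(max_iter):
        s = solve(b, [-v for v in fx])            # correction from B s = -F(x)
        x_new = [xi + si for xi, si in zip(x, s)]
        if max(abs(v) for v in s) <= eps:
            return x_new, k + 1
        f_new = f(x_new)
        y = [fn - fo for fn, fo in zip(f_new, fx)]
        bs = [sum(b[i][j] * s[j] for j in range(n)) for i in range(n)]
        ss = sum(v * v for v in s)
        for i in range(n):                        # B += (y - B s) s^T / (s^T s)
            for j in range(n):
                b[i][j] += (y[i] - bs[i]) * s[j] / ss
        x, fx = x_new, f_new
    raise RuntimeError("no convergence within max_iter iterations")


def example_system(x):
    """Hypothetical test system: x0^2 + x1^2 = 4, x0 * x1 = 1."""
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0]


if __name__ == "__main__":
    print(broyden(example_system, [2.0, 0.5]))
```

Compared with Newton's method, each iteration costs one evaluation of the vector-function instead of n + 1, at the price of a somewhat larger number of iterations.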

5.2.4. Results of solving SNE. The results of solving SNE by Inpartool are the following:

solution of the SNE;
protocol describing the process of investigating and solving the SNE.

After completion of the computational process, information about the problem which was solved appears in the upper part of the right-hand panel. In addition, the pop-up window «Processing of results» appears (fig. 5.14).

The solution of the problem is saved in binary form. It can be browsed, saved or printed. The protocol describing the process of investigating and solving the SNE is presented in text form; it contains the parameters of the problem, the methods used, several control components of the solution, the problem's run time and the number of processors used. The protocol can be saved and printed.

FIG. 5.14

5.2.5. Inpartool's diagnostics during the solving of SNE. When solving SNE Inpartool provides:

reference information;
Help-type messages at every stage of the user's work;
problem's run-time diagnostics.

In the «Help» item of the main menu the user can get not only information about the functional potentialities of Inpartool but also information about the terminology related to systems of non-linear equations (fig. 5.15).

Having chosen the «Glossary» item in the «Help» submenu and then a term of interest from the list of terms, the user can get its explanation (fig. 5.16).

By clicking the right-hand mouse button on the title «Systems of non-linear equations» the user can familiarize himself with Inpartool's functional potentialities for solving problems of the given class as well as with the order of work with Inpartool. In the same manner the user can get appropriate short information at any stage of work by clicking the right-hand mouse button on any menu item or title of interest in the dialog window (fig. 5.17).

FIG. 5.15

FIG. 5.16

FIG. 5.17


5.3. Examples of solving systems of non-linear equations by means of Inpartool

Let us illustrate the use of Inpartool for solving SNE with the following two problems.

Problem 1

Solve the SNE

,

starting from the initial approximation in the region

,

with the number of iterations limited to it = 100, the given accuracy eps = 1,0×10⁻¹⁰ and the error in the function's specification del = 1,0×10⁻¹⁰.

The exact solution of the problem is , .

Problem 2

Solve the SNE

starting from the vector of initial approximations all components of which are equal to 1, in the region

, with the number of iterations limited to it = 100,

the given accuracy eps = 1,0×10⁻¹⁰ and the error in the specification of the vector-function del = 1,0×10⁻¹⁰.

Problem 1 for n = 100 was solved on 4 processors by various methods: Burdakov's, Dennis-More's, Newton's, Broyden's and Powell's. The following solution was obtained (only the first, eleventh and twenty-first components of the solution vector are presented):

Solution
1.0100000000e+00 1.1100000000e+00 1.2100000000e+00

Problem 2 for n = 100 was solved on 4 processors by various methods: Burdakov's, Dennis-More's, Newton's, Broyden's and Powell's. The following solution was obtained (only the first, eleventh and twenty-first components of the solution vector are presented):

Solution
-5.7076119297e-01 -7.0710677535e-01 -7.0710678119e-01

To verify that acceleration is obtained on Inparcom, Problem 1 was solved with n = 2000 by Dennis-More's method. The following run times (in seconds) were obtained:

on one processor time = 83.20;
on four processors time = 6.46.

Thus, the obtained acceleration coefficient is equal to 83,2 / 6,46 = 12,88, while the coefficient of efficiency is equal to 0,8.
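The two coefficients can be recomputed from the quoted run times with the usual definitions, assumed here: acceleration S = T1 / Tp and efficiency E = S / p, where p is the number of processors. The sketch evaluates E for p = 4 and p = 16; note that the quoted value 0,8 is reproduced with p = 16, whereas p = 4 gives about 3,2. Function names are illustrative.

```python
def acceleration(t_one, t_parallel):
    """Speed-up relative to the run time on one processor."""
    return t_one / t_parallel


def efficiency(t_one, t_parallel, processors):
    """Speed-up divided by the number of processors used."""
    return acceleration(t_one, t_parallel) / processors


if __name__ == "__main__":
    t1, tp = 83.20, 6.46  # run times in seconds quoted in the text
    print("acceleration:", round(acceleration(t1, tp), 2))
    for p in (4, 16):
        print("efficiency for p =", p, ":", round(efficiency(t1, tp, p), 2))
```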

The influence of the approximate nature of the initial data can be illustrated by the results presented below.

Problem 2 was solved by Burdakov's method with n = 100, accuracy of the solution's evaluation eps = 1,0×10⁻¹⁰ and error in the vector-function's specification del = 1,0×10⁻¹⁰. The following results have been obtained (only the first, eleventh and twenty-first components of the solution vector are presented):

Accuracy of the obtained solution del = 1.0100000000e-08
Solution
-5.7076119297e-01 -7.0710677535e-01 -7.0710678119e-01

If, with all other values unchanged, the error in the vector-function's specification is equal to del = 1,0×10⁻⁵ (i.e. 1,0×10⁻⁵ is added to every component of the vector-function), then Burdakov's method yields the following results:

Accuracy of the obtained solution del = 1.0000001000e-03
Solution
-5.7076443018e-01 -7.0711031088e-01 -7.0711031671e-01

To illustrate the acceleration obtained when solving SNE, the following solution methods have been chosen: Newton's and Dennis-More's.
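The two accuracy values printed in these protocols fit a simple additive pattern, eps + C·del with C = 100. The constant C is inferred here from the printed numbers only and is an assumption; the text does not state it. The sketch checks this pattern and verifies that the shift of the solution components caused by the perturbation stays within the reported accuracy.

```python
import math

EPS = 1.0e-10          # required accuracy of the solution (from the text)
ASSUMED_FACTOR = 100.0  # inferred from the two protocols, not stated in the text


def combined_accuracy(eps, delta, factor=ASSUMED_FACTOR):
    """Accuracy estimate of the form eps + factor * delta (assumed pattern)."""
    return eps + factor * delta


# Solution components 1, 11 and 21 for del = 1e-10 and del = 1e-5 (from the protocols).
SOLUTION_SMALL_DEL = [-5.7076119297e-01, -7.0710677535e-01, -7.0710678119e-01]
SOLUTION_LARGE_DEL = [-5.7076443018e-01, -7.0711031088e-01, -7.0711031671e-01]


def max_shift():
    """Largest change of a printed component between the two runs."""
    return max(abs(a - b) for a, b in zip(SOLUTION_SMALL_DEL, SOLUTION_LARGE_DEL))


if __name__ == "__main__":
    print(combined_accuracy(EPS, 1.0e-10))  # compare with 1.0100000000e-08
    print(combined_accuracy(EPS, 1.0e-5))   # compare with 1.0000001000e-03
    print(max_shift())                      # actual shift of the components
```

The actual shift of the printed components is a few units of 10⁻⁶, i.e. well below the reported accuracy of about 10⁻³, so the estimate is safe but conservative for this problem.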

Diagrams illustrating the acceleration obtained when solving Problem 1 with n = 3000 on various numbers of processors by the above-mentioned methods are shown in figs. 5.18 and 5.19: the diagram for Newton's method is given in fig. 5.18, and the diagram for the Dennis-More method in fig. 5.19.

FIG. 5.18


FIG. 5.19

These figures show that, for SNE of the above-indicated order, acceleration is obtained as the number of processors increases. However, the acceleration in the solving of SNE depends considerably both on the order of the systems being solved and on the number of arithmetic operations required for the evaluation of the vector-function.

FIG. 5.20

The diagram describing the solving of Problem 1 for n = 1000 by Dennis-More's method is shown in fig. 5.20. One can see that the optimum number of processors for a system of this order is four, and a further increase in the number of processors does not accelerate the solving of the problem.

Chapter 6. INVESTIGATING AND SOLVING OF SYSTEMS OF ORDINARY DIFFERENTIAL EQUATIONS

6.1. Functional potentialities of Inpartool on investigating and solving systems of ordinary differential equations

Ordinary differential equations and systems of equations with initial conditions may arise at an intermediate stage of solving more complicated mathematical problems, for example, when the finite element method is applied to discretize, with respect to the spatial variables only, initial boundary-value problems for systems of partial differential equations. Initial-value problems for systems of ordinary differential equations (SODE) also arise in the description of movements, processes and phenomena varying in time. Problems related to the movement of guided objects very often require SODE to be solved faster than real time; moreover, the solving of these problems involves multi-variant calculations. Such problems can be solved efficiently on parallel computers, in particular on computers with MIMD-architecture.

When deriving ordinary differential equations one has to abstract from some characteristics of properties and processes that are regarded as secondary. Such abstraction may result in either an unstable or a stable mathematical model of a physically stable process or phenomenon. As a rule, the next stage is the construction of a numerical solution of the mathematical model. Before constructing the numerical solution one should determine whether the mathematical model possesses an asymptotically stable or, at least, a Lyapunov stable solution. Having made sure, on the basis of some investigations, that the mathematical model possesses a stable solution, one turns to an algorithm for constructing the numerical solution such that, provided the numerical method is stable, the required relative or absolute accuracy is attained at any point of integration in the minimal run time.

The solving of the above-mentioned problems can to a great extent be entrusted to the computer. All investigations can then be carried out in parallel with the construction of the numerical solution, the computed values being used both for the construction of the solution and for the investigation of the problem's characteristics.

Initial-value problems for n-th order systems of ordinary differential equations on the interval [t0, T] will be considered in the form

$\dfrac{dv}{dt} = \varphi(t, v)$, (6.1)

$v(t_0) = v_0$, (6.2)

where v is an n-dimensional vector and φ(t, v) is an n-dimensional vector-function, i.e.

v = (v1, v2,...,vn)T,

φ(t, v) = (φ1(t, v1,v2,...,vn), φ2(t, v1,v2,...,vn),..., φn(t, v1,v2,...,vn))T, t ∈ [t0, T].

Sufficient conditions for the existence and uniqueness of a solution to the problem (6.1), (6.2) are the following:

continuity of the components of the right-hand side in the rectangle D = {t0 − a ≤ t ≤ t0 + a, v0 − b ≤ v ≤ v0 + b};

fulfilment of the Lipschitz condition for all functions φj (j = 1, 2, 3, … is the number of the vector's component) with respect to all arguments

, j = 1, 2, … , n,

in the rectangle D, where Lj are the Lipschitz constants.

Under these conditions there exists a unique solution of the problem (6.1), (6.2) on the interval t0 − N ≤ t ≤ t0 + N, where

.

Mj = max | φj (t, u) | in D,

In what follows we deal with SODE possessing asymptotically stable solutions. In addition, without loss of generality and to simplify the formulas, we consider the initial-value problem in the form

$\dfrac{dv}{dt} = \varphi(v)$, (6.3)

$v(t_0) = v_0$, (6.4)

since the problem (6.1), (6.2) can be reduced to the problem (6.3), (6.4) by introducing an additional differential equation for the time with right-hand side equal to 1 and an additional initial condition equal to t0.
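The reduction to an autonomous system described above can be sketched as follows: the time is appended to the state vector as one more unknown whose derivative is 1 and whose initial value is t0. The helper names and the explicit Euler integrator used for the demonstration are assumptions made for this example only.

```python
def make_autonomous(rhs):
    """Turn rhs(t, v) into g(w) acting on the extended state w = [v..., t]."""
    def autonomous_rhs(w):
        *v, t = w
        return list(rhs(t, v)) + [1.0]  # extra equation: dt/dt = 1
    return autonomous_rhs


def explicit_euler(g, w0, h, steps):
    """Integrate dw/dt = g(w) with a fixed step (demonstration only)."""
    w = list(w0)
    for _ in range(steps):
        dw = g(w)
        w = [wi + h * di for wi, di in zip(w, dw)]
    return w


def example_rhs(t, v):
    """Hypothetical non-autonomous problem dv/dt = t, v(0) = 0."""
    return [t]


if __name__ == "__main__":
    t0, v0 = 0.0, [0.0]
    g = make_autonomous(example_rhs)
    # Extended initial condition: the original one plus t(t0) = t0.
    w = explicit_euler(g, v0 + [t0], h=0.001, steps=1000)
    print("v(1) ~", w[0], " t =", w[1])
```

The last component of the extended state reproduces the time itself, so the right-hand side no longer depends on t explicitly.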

When real processes are modeled on a computer by means of SODE, a number of difficulties arise; in particular, one has to deal with problems possessing approximate initial data.

The approximate nature of the initial data may be caused by the following factors:

1) errors in the specification of the initial data, since the initial conditions are results of various measurements and are therefore inaccurate;

2) errors in the specification of the right-hand side; these errors arise because the right-hand side is only an approximation to the right-hand side of the real differential equation; in particular, this takes place when the problem is formulated with neglect of facts and phenomena which are either unimportant from the user’s viewpoint or have little influence on the development of the process being described; besides, the right-hand side is often approximated by simpler functions in order to reduce the number of arithmetic operations at each integration step;

3) in some cases the solution is evaluated from an equivalent equation that is not resolved explicitly with respect to the solution being sought; then in the process of integration an approximate solution of the implicit equation is used instead of the exact one;


4) application of a numerical (discrete) integration method and rounding of numbers during computations;

5) discretization of dynamical problems of various forms by special variables.

The above-mentioned factors considerably affect the accuracy estimate for the obtained solution.

If instead of the exact problem (6.3), (6.4) the problem with approximately given initial data

, (6.5)

(6.6)

is solved with, (6.7)

for arbitrary functions w(t), then on obtaining a solution to the problem (6.5), (6.6) by any numerical method within the accuracy of , i.e. on attaining the inequality , where

is the numerical solution to the problem (6.5), (6.6) at the point ,

the error in the solution to the problem (6.3), (6.4) is estimated by the formula

within the accuracy of values of higher order of smallness.

On the basis of the foregoing, the intelligent software Inpartool has been created for the investigation and numerical solution of initial-value problems for SODE with approximately given initial data on a given interval and within the required accuracy.

SODE can be common (non-stiff) or stiff. A system is considered to be stiff if the following condition holds:

,


where are the eigenvalues of the Jacobian matrix, is the order of the system and is a constant depending on the performance of the computer used for solving the problem.

As is known, for most problems of the type (6.1), (6.2) or (6.5), (6.6) it is impossible to find an analytic solution, and that is why numerical integration methods are employed. Numerical methods are based on replacing the differential problem by a difference one. The well-known discretizations of the system (6.1) lead to different classes of methods (one-step and multi-step, explicit and implicit) whose difference schemes possess different orders of accuracy. The difficulty lies in attaining the required accuracy of the solution while the conditions of computational stability are fulfilled.

Inpartool provides:

the solving of initial-value problems on the given interval and within the required accuracy;
investigation of the problem’s characteristics (common or stiff);
a choice of the class of methods and solution programs (both automatically and interactively);
control over the integration step size;
dialog tools for the input of information;
control of the input data;
issuing of recommendations as to making decisions in case of interruption;
reliability of the obtained solutions.

The software works with the knowledge obtained during the investigation of problems and on this basis decides on the ways and methods for evaluating the solution within the given accuracy.

Mathematical facilities of the intelligent software include:

mathematical methods for the computer investigation of characteristics of systems;
algorithms for the solving of systems;
means for controlling the integration step size based on accuracy and stability;
tools for evaluating the solution and analyzing its error.


The initial data are either read from a previously prepared file or entered by means of the keyboard and then displayed on the screen.

Both the solution and the estimate for the local Lipschitz constant at the output points are written to a file.

Inpartool exhibits the following distinctive characteristics:

investigation of characteristics of SODE;
possibility of automatic choice of the class of methods;
control over the integration step size based on requirements of accuracy and stability;
guarantee of the solution’s reliability;
work with the software without preliminary familiarization with it and without studying instructions.

At the conceptual level Inpartool implements the following principles of the information computing technology for solving problems: formulation of the problem in terms of the subject area language; investigation of characteristics of the problem being solved and automatic choice of the algorithm depending on the revealed characteristics; synthesis of the solution program taking into account the mathematical and engineering characteristics of the computer; the solving of the problem and reliability analysis of the obtained results; dialog support and reference information for the processes of formulating and solving the problem.

Inpartool provides for automatic investigation and solving of problems: the characteristics of SODE are investigated by the computer without the user’s involvement, and on the basis of the revealed characteristics, taking into account the engineering and mathematical capabilities of Inparcom-16, Inpartool determines whether the problem is common or stiff, chooses a suitable algorithm, constructs a processor topology, distributes the initial data between processors in the order required by the algorithm, solves the problem and estimates the reliability of the obtained results. By default, Inpartool constructs a topology from the optimum number of processors. However, the user can choose the required number of processors on his own. All information about the process of solving the problem and the results is accumulated in the protocol.

Investigation and solving of problems in the interactive mode is also provided. In this case the characteristics of SODE are investigated first, and the user is informed whether the system is stiff or not. If the system is common (non-stiff), the user can choose a class of methods attaining either global or local accuracy, as well as a method for solving the problem from the proposed list.

If the class of methods attaining global accuracy has been chosen, the user can solve the problem by the explicit 1-st order Runge-Kutta-type method. If the class of methods attaining local accuracy has been chosen, the user can choose the solution method from the following list:

Adams’ methods of order up to 12 [23];
4-th order Runge-Kutta methods;
5(6)-th order Runge-Kutta methods;
Euler-Cauchy method.

If the system is stiff, the user is offered a choice of solution method from the following list:

Gear’s methods of order up to 5 [8];
Rosenbrock method [46].
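To make the one-step methods in the lists above more concrete, here is a minimal sketch of one step of the classical 4-th order Runge-Kutta scheme for the autonomous form u' = f(u). It is a textbook formulation written for illustration, not the variant implemented in Inpartool; the type `rhs_fn`, the function names and the test equation u' = -u are assumptions.

```c
#include <stddef.h>

/* Hypothetical autonomous right-hand side: du = f(u), system of order n. */
typedef void (*rhs_fn)(const double *u, double *du, size_t n);

/* Illustrative test equation (an assumption): u' = -u. */
static void decay_rhs(const double *u, double *du, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        du[i] = -u[i];
}

/* One classical RK4 step of size h, performed in place on u.
   work must provide room for 5 * n doubles (k1..k4 and a stage vector). */
static void rk4_step(rhs_fn f, double *u, size_t n, double h, double *work)
{
    double *k1 = work, *k2 = work + n, *k3 = work + 2 * n;
    double *k4 = work + 3 * n, *tmp = work + 4 * n;
    size_t i;

    f(u, k1, n);
    for (i = 0; i < n; ++i) tmp[i] = u[i] + 0.5 * h * k1[i];
    f(tmp, k2, n);
    for (i = 0; i < n; ++i) tmp[i] = u[i] + 0.5 * h * k2[i];
    f(tmp, k3, n);
    for (i = 0; i < n; ++i) tmp[i] = u[i] + h * k3[i];
    f(tmp, k4, n);
    for (i = 0; i < n; ++i)
        u[i] += h / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
}
```

The sketch uses a fixed step h; step size control based on accuracy and stability, which the text attributes to Inpartool, is not shown.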

6.2. Technology of investigating and solving systems of ordinary differential equations

6.2.1. Application of Inpartool to the solving of SODE. Inpartool solves common (non-stiff) and stiff systems of ordinary differential equations.

The main window «Systems of ordinary differential equations» is depicted in fig. 6.1.

To solve the problem a user should perform:

input of the problem’s initial data;
input of the right-hand side;
a choice of either the interactive or the automatic way of solving the problem;
a choice of the class and methods for solving the problem (in case of interactive solving);
run of the problem;
obtaining of the results of solving the problem.

FIG. 6.1

6.2.2. Specification of initial data for the solving of SODE. The following input data are required for solving SODE by means of Inpartool:

order of SODE;
the number of output points;
starting point of the integration interval (for the first call of the function) or a point attained during the integration;
final point of the integration interval;
the required accuracy of the solution;
error in the specification of the initial conditions;
error in the specification of the right-hand sides;
accuracy of the obtained solution taking into account the approximate nature of the initial data;
array containing either the initial values of the solution’s components (for the first call of the function) or the values of the solution at the point attained in the process of integration;
array containing the output points of the solution.

The input data for solving SODE can be either read from a file or entered by means of the keyboard (fig. 6.2).

FIG. 6.2

When the data are input from a file (fig. 6.3), they are read from the binary file data_dif in the order indicated above.


FIG. 6.3

When the data are input by means of the keyboard, they are entered in a special dialog data input window (fig. 6.4). The input of every element ends with pressing the <Enter> key, which enters the information and moves the cursor to the next data input field.


FIG. 6.4

The already entered data can be edited by pressing the «Edit» button. The current values of the initial data can be saved in a file. To do this a user should choose «Save as…» in the «File» item of the main menu. This information may be used in subsequent work sessions.

Besides, to solve SODE by means of Inpartool the user should input a vector-function for the evaluation of the right-hand sides of the system. It can also be entered either from a file or by means of the keyboard (fig. 6.5).


FIG. 6.5

When functions are input by means of the keyboard, the user should write a program in C for the evaluation of the right-hand sides in the window that appears and then save it in a file with the indicated name by pressing the «Save» button. When the user has chosen input of functions from a file, a window opens containing a list of functions (fig. 6.6). It remains only to choose the required function (for example, diffun.c).

The file contains a program written in C (fig. 6.7). If necessary, the user can modify the program and save all changes by pressing the «Save» button.
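A user-written routine for the right-hand sides might look like the sketch below. The function name, the argument list and the test system (independent pairs of oscillator equations) are assumptions made purely for illustration; the interface actually expected by Inpartool is the one shown in its own dialog window and may differ.

```c
/* Hypothetical right-hand-side routine for the autonomous form u' = f(u).
   n is the order of the system, u the current solution, f the output. */
void rhs_example(int n, const double *u, double *f)
{
    /* Pairs of equations u[i]' = u[i+1], u[i+1]' = -u[i]. */
    for (int i = 0; i + 1 < n; i += 2) {
        f[i]     =  u[i + 1];
        f[i + 1] = -u[i];
    }
    if (n % 2 != 0)
        f[n - 1] = 0.0;  /* an unpaired last component is kept constant */
}
```

Since the cost of evaluating the right-hand sides dominates the run time for large systems, it pays to keep such a routine free of repeated work inside the loop.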


FIG. 6.6

FIG. 6.7


6.2.3. The solving of SODE in Inpartool. Inpartool offers two modes of solving SODE: automatic and interactive. To run the problem a user should choose one of these modes by pressing the button of the appropriate mode in the window which appears after the successful input of data.

In the automatic mode Inpartool investigates the problem and, on the basis of the revealed characteristics of SODE and according to the engineering and mathematical capabilities of Inparcom-16, determines whether the system is common (non-stiff) or stiff; an appropriate algorithm for solving the problem is chosen; an efficient processor topology is constructed; the initial data are distributed between processors in the order required by the algorithm; the problem is solved and the reliability of the obtained results is estimated. By default Inpartool constructs a topology from the number of processors which is optimum for the given problem. However, the user can choose the number of processors on his own (fig. 6.8). All information about the process of solving the problem is accumulated in the protocol.

FIG. 6.8


In the interactive mode the characteristics of SODE are investigated first, and the user is informed whether the system is stiff or common (non-stiff). If the system is common (non-stiff), the user can choose a class of methods, either «with attaining of global accuracy» or «with attaining of local accuracy», and then choose a solution method from the proposed list (fig. 6.9, 6.10).

FIG. 6.9

If the class of solution methods «with attaining of global accuracy» is chosen, the user can solve the problem by the explicit 1-st order Runge-Kutta-type method. If the class of solution methods «with attaining of local accuracy» is chosen, the user can choose a method from the following list:

Adams’ methods of order up to 12;
4-th order Runge-Kutta methods;
5(6)-th order Runge-Kutta methods;
Euler-Cauchy method.

If the system is stiff, the user is offered a choice of solution method from the following list:

Gear’s methods of order up to 5;
Rosenbrock method.

FIG. 6.10

In the interactive mode the number of processors can also be chosen. The process of solving the problem is started by pressing the «Further…» button.

As in the automatic mode, the problem is solved and the reliability of the obtained results is estimated.

6.2.4. Results of solving SODE. Results of solving SODE by Inpartool are the following:

solution of SODE;
protocol describing the process of investigating and solving SODE.

After completion of the computational process, information about the solved problem appears in the upper part of the right-hand panel. Besides, the pop-up window «Processing of results» appears (fig. 6.11).

The solution of the problem is saved in binary form. It can be browsed, saved or printed.

The protocol describing the process of investigating and solving SODE is presented in text form. It contains: parameters of the problem, methods used for investigating the problem in order to choose an efficient algorithm and construct the solution program, several control components of the solution, the problem’s run time, and the number of processors used. The protocol can be browsed by pressing the «Show the result» button, saved or printed.

6.2.5. Inpartool’s diagnostics during the solving of SODE. When solving SODE Inpartool provides:

referential information;
Help-type messages at every stage of the user’s work;
problem’s run-time diagnostics.

In the «Help» item of the main menu a user can get not only information about the functional capabilities of Inpartool but also information about the terminology used in connection with systems of ordinary differential equations (fig. 6.11).


FIG. 6.11

Having chosen the «Glossary» item in the «Help» submenu and then a term of interest from the list of terms, the user can get its explanation (fig. 6.12, 6.13).


FIG. 6.12

FIG. 6.13

By clicking the right-hand mouse button on the title «Systems of ordinary differential equations» a user can familiarize himself with Inpartool’s functional capabilities for solving problems of the given class as well as with the order of work with Inpartool.

In the same manner a user can get brief context information at any stage of work by clicking the right-hand mouse button on any menu item or title of interest in the dialog window (fig. 6.14).

FIG. 6.14


6.3. Examples of solving systems of ordinary differential equations by means of Inpartool

Let us illustrate the use of Inpartool for the solving of SODE by the following two problems.

Problem 1

Solve SODE

, i = 0, 1, 2, …, n − 1

under the initial conditions , i = 0, 1, 2, …, n − 1 on the interval [0, T].

The problem is common (non-stiff). The required accuracy of the solution is eps = $1.0 \times 10^{-6}$, the error in the specification of the initial conditions is delta1 = $1.0 \times 10^{-10}$, and the error in the specification of the vector-function is delta2 = $1.0 \times 10^{-10}$.

The exact solution of the problem is , i = 0, 1, 2, …, n − 1.

Problem 2

Solve SODE

under the initial conditions $u_i(0) = 0.0$, $u_{i+1}(0) = 1.0$, $u_{i+2}(0) = 1.0$ on the interval [0, 50], i = 0, 3, 6, …, 3k, k = 0, 1, 2, …, n = 3k + 3.

The problem is stiff. The required accuracy of the solution is eps = $1.0 \times 10^{-6}$, the error in the specification of the initial conditions is delta1 = $1.0 \times 10^{-10}$, and the error in the specification of the vector-function is delta2 = $1.0 \times 10^{-10}$.

The exact solution of the problem is

i = 0, 3, 6, …, 3k, k = 0, 1, 2, …, n = 3k + 3.

Problem 1 was solved on 16 processors with n = 4000, T = 0.4 by various methods: Gear’s methods of order up to 5, the Rosenbrock method, the 4-th order Runge-Kutta method, Adams’ methods of order up to 12, and the explicit 1-st order method with attaining of the local accuracy.

The following solution was obtained (only its first component is given):

sol[0] = 1.400000e+00.

Problem 2 was solved on 16 processors with n = 600, T = 50.0 by Gear’s methods of order up to 5.

The following solution was obtained (only the first 4 components are given):

Solution = -1.888397e-06 5.964471e-01 1.403653e+00 -1.888397e-06.

The acceleration was verified on Problem 1 with n = 4000 and T = $4.0 \times 10^{-4}$, solved by Adams’ methods of order up to 12. The following run-times were obtained:

for one processor time = 4.924150e-01;

for four processors time = 1.443830e-01.

Thus, the obtained acceleration coefficient is equal to 0.49/0.14 = 3.5, while the coefficient of efficiency is equal to 0.87.

The influence of the approximate nature of the initial data can be illustrated by the following results.

Problem 1 was solved with n = 4000 and T = 0.4; the required accuracy of the solution is eps = $1.0 \times 10^{-6}$, the error in the specification of the initial conditions is delta1 = $1.0 \times 10^{-10}$, and the error in the specification of the vector-function is delta2 = $1.0 \times 10^{-10}$. Adams’ methods of order up to 12, the 4-th order Runge-Kutta method and the Euler-Cauchy method all yielded the following solution (only the first 10 components are given):

Solution = 1.400000e+000 1.400000e+000 1.400000e+000 1.400000e+000 1.400000e+000 1.400000e+000 1.400000e+000 1.400000e+000 1.400000e+000 1.400000e+000.

At this point the solution is obtained with accuracy delta = 1.000000e-006.

If, with all other values unchanged, the error in the specification of the initial conditions is delta1 = $1.0 \times 10^{-3}$ and the error in the specification of the vector-function is delta2 = $6.0 \times 10^{-3}$ (i.e. the values of the initial conditions have been altered, and $6.0 \times 10^{-3}$ has been added to each component of the vector-function), then the employment of, for example, Adams’ methods yields the following results:

Solution = 1.400059e+000 1.400059e+000 1.400059e+000 1.400059e+000 1.400059e+000 1.400059e+000 1.400059e+000 1.400059e+000 1.400059e+000 1.400059e+000.

At this point the solution is obtained with accuracy delta = 3.401000e-003.

To illustrate the acceleration obtained in solving SODE, Problem 1 was chosen together with the methods most popular in practice for such problems: the 4-th order Runge-Kutta method, Adams’ methods of order up to 12 and Gear’s methods of order up to 5.

Diagrams illustrating the acceleration obtained in solving Problem 1 with n = 10 000 and T = 0.4 on different numbers of processors are shown in fig. 6.15 for the 4-th order Runge-Kutta method, in fig. 6.16 for Adams’ methods of order up to 12 and in fig. 6.17 for Gear’s methods of order up to 5.


FIG. 6.15


FIG. 6.16

FIG. 6.17

The diagram in fig. 6.18 illustrates the acceleration gained in solving Problem 1 with n = 5 000 by the Euler-Cauchy method.


FIG. 6.18

From the diagrams presented here one can see the nearly linear acceleration obtained in solving SODE of the indicated orders as the number of processors increases. However, the acceleration depends significantly on the order of the system being solved or, more precisely, on the number of arithmetic operations required for the evaluation of the right-hand sides. The experiments have shown that an optimum number of processors should be determined for each particular problem.
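The acceleration and efficiency coefficients used in this section follow from the measured run-times by the usual definitions: acceleration is the one-processor time divided by the p-processor time, and efficiency is the acceleration divided by p. A minimal sketch (the function names are illustrative and not part of Inpartool):

```c
/* Acceleration coefficient (speedup): run time on one processor
   divided by run time on p processors. */
static double speedup(double t1, double tp)
{
    return t1 / tp;
}

/* Coefficient of efficiency: speedup divided by the number of processors. */
static double efficiency(double t1, double tp, int p)
{
    return speedup(t1, tp) / p;
}
```

With the unrounded run-times quoted for Problem 1, `speedup(4.924150e-01, 1.443830e-01)` is approximately 3.41 and `efficiency(4.924150e-01, 1.443830e-01, 4)` is approximately 0.85; the values 3.5 and 0.87 in the text result from rounding the times to two digits before dividing.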


Chapter 7. LIBRARY OF INTELLIGENT PROGRAMS INPARLIB

7.1. Purpose and composition of the library

Intelligent programs included in the library [53] are intended for investigating and solving the basic problems of computational mathematics:

linear algebraic systems;
algebraic eigenvalue problem;
non-linear equations and systems;
systems of ordinary differential equations.

Programs included in the library implement:

statement of problems with approximately given initial data;
investigation of characteristics of the problem's computer model;
verification of agreement between the characteristics of the problem's computer model revealed by the computer and the chosen solution algorithm;
construction of the topology of Inparcom's processors;
the obtaining of the solution together with a reliability estimate which includes both an estimate for the inherited error caused by the initial data error and an estimate for the computational error.

Program modules implementing finished parts of the investigation and solution algorithms are written in C and intended for MIMD-architecture computers with the parallel programming environment MPI.

As to linear algebraic systems (LAS), the program modules included in Inparlib make it possible to investigate and solve problems with matrices of various structure together with reliability estimates for the solution, invert matrices, evaluate singular values and matrix ranks, and estimate matrix condition numbers.

As to algebraic eigenvalue problems (standard and generalized), Inparlib's programs solve both the full and the partial eigenvalue problem with matrices of various structure (dense, band or sparse). By means of programs from Inparlib it is possible to evaluate condition numbers for individual eigenvalues and for eigenvectors, as well as estimates for the overall error in the solutions.

As to non-linear equations and systems, Inparlib's programs make it possible to investigate and solve systems of non-linear algebraic and transcendental equations; to determine the local condition number of the function f(x) and of the vector-function F(x); and to apply termination criteria for iterative processes that guarantee both a solution within the given accuracy and an estimate of the solution's error taking into account the approximate nature of the initial data.

As to systems of ordinary differential equations with initial conditions, Inparlib contains programs that make it possible to investigate and solve these systems and to integrate both common and stiff systems of equations with accuracy of various orders as well as within any accuracy specified in advance. A user can investigate the stiffness of SODE and evaluate both the Lipschitz constant and the accuracy of the obtained solution taking into account the approximate nature of the initial data.

Functional programs from Inparlib provide: statement of problems with approximately given initial data, investigation of mathematical characteristics of the problem's machine model, verification of agreement between the revealed characteristics and the application area of the chosen solution algorithm, as well as the obtaining of the solution together with a reliability estimate or a refusal to solve the problem (with indication of reasons).

From the end user's point of view, programs included in the library are reusable components for solving application problems in which problems of computational mathematics constitute either an intermediate or a final stage.

For the investigating and solving of LAS Inparlib contains the following functions:

PLGESAD – function for investigating and solving LAS with a dense non-singular matrix and approximately given initial data by the Gauss method with partial column pivoting. The program yields a solution to LAS together with estimates for the inherited and computational errors;

PLPPSAD – function for investigating and solving LAS with a symmetric positive definite matrix and approximately given initial data by the Cholesky method. The program yields a solution to LAS together with estimates for the inherited and computational errors;

Slae_bsp_bp – function for investigating and solving LAS with a band symmetric positive definite matrix by the Cholesky method implementing the $LDL^T$-factorization of the matrix. The program yields a classic solution to LAS together with its reliability estimate;

Slae_bss_bp – function for investigating and solving LAS with band symmetric positive semi-definite matrices by the three-stage regularization method. The program yields a pseudo-solution to LAS that approximates the normal solution within the given accuracy;

Slae_svd_p – function for investigating and solving LAS with rectangular or square singular general matrices by means of the singular-value decomposition of the system's matrix. The program yields a generalized solution to LAS together with its reliability estimate.

For the investigating and solving of AEVP Inparlib contains the following functions:

mp_esytri – function for investigating and solving the full AEVP for a symmetric tri-diagonal matrix with approximately given elements distributed between processors;

mp_esyqai_bl – function for investigating and solving the full AEVP for a dense symmetric matrix with approximately given elements distributed between processors;

Evp_bs_bp – function for investigating and solving the partial standard or generalized AEVP for a band symmetric positive definite matrix by the method of subspace iterations. The program evaluates several of the smallest eigenvalues and the corresponding eigenvectors and estimates the reliability of the obtained solutions.

For the investigating and solving of SNE Inparlib contains the following functions:

zeroin – function for finding roots of a non-linear equation by the bisection method with a specified error in the initial data, as well as for evaluating the error in the solution (if any);

bur – function for solving SNE with approximately given initial data by the Burdakov method. The method possesses global convergence and retains the quadratic rate of convergence in the neighborhood of the solution;

kn – function for solving SNE with approximately given initial data by Dennis and More's method. It implements a quasi-Newton method which approximates the inverse Jacobian matrix during the iterative process. The method possesses a super-linear rate of convergence;

nut – function for solving SNE with approximately given initial data by the Newton method. The method retains the quadratic rate of convergence in the neighborhood of the solution;

fib – function for solving SNE with approximately given initial data by the first Broyden's method. The method possesses global convergence and retains the quadratic rate of convergence in the neighborhood of the solution;

paul – function for solving SNE with a symmetric Jacobian matrix and approximately given initial data by the Powell method. The method possesses global convergence and retains the quadratic rate of convergence in the neighborhood of the solution.

For the investigating and solving of SODE Inparlib contains the following functions:

ek_dri – function for solving initial-value problems for SODE with approximately given initial data by the 1-st order explicit Euler-Cauchy method;

rk4_dri – function for solving initial-value problems for SODE with approximately given initial data by the 4-th order explicit Runge-Kutta method;

rk6_dri – function for solving initial-value problems for SODE with approximately given initial data by the 5(6)-th order explicit Runge-Kutta method;

adams_dri – function for solving initial-value problems for SODE with approximately given initial data by Adams' methods of order up to 12;

rk1_dri – function for solving initial-value problems for SODE with approximately given initial data by the 1-st order Runge-Kutta-type method;

gear_dri – function for solving initial-value problems for SODE with approximately given initial data by Gear's methods of order up to 5;

ros_dri – function for solving initial-value problems for SODE with approximately given initial data by the 4-th order Rosenbrock method.

7.2. Description of functions from the Inparlib library

7.2.1. Investigating and solving of linear algebraic systems

Function PLGESAD. Function PLGESAD is intended for investigating and solving LAS of the form AX=B with a dense non-singular matrix A and several right-hand sides B with approximately given initial data by the Gauss method with partial pivoting on a distributed-memory MIMD-computer within the parallel programming environment MPI [24, 28, 32, 38, 52, 53].

Prior to calling the function PLGESAD, the communicator (communications environment) consisting of the chosen number of processors is to be defined, the initial data are to be distributed between processors, and the required amount of memory for the arrays used in the program is to be allocated.

Declaration of function PLGESAD

#include <math.h>
#include "mpi.h"
#include "plgesad.h"

int PLGESAD(int info, int n, int r, int q, int s, int size, int myid,
            double ea, double eb, double* cond, double* e, double* e1,
            double* A, double* B, int* P, double* R, double* S,
            double* ABX, int* PP, int* P1, int* P2,
            double* R1, double* R2, double* A1, double* B1);

n – order of the original (full) matrix A;
r – the number of right-hand sides;
q – the number of matrix rows in the current processor;
s – the number of rows in a block; the matrix is distributed cyclically by such blocks, the rows of a block being stored one after another. For efficient solving of a problem with n <= 4000 one should set s = 4 or 5; for n > 4000 one should set s = 8 or 10;
size – the number of processors being used;
myid – identifier of the processor;
ea – maximum relative error in the specification of elements of the matrix A;
eb – maximum relative error in the specification of elements of the right-hand sides B;
cond – estimate for the condition number of the matrix A;
e – estimate for the inherited error;
e1 – estimate for the computational error;
A – array of dimension (n*q):
on entry: contains the local part of the distributed matrix A in the current processor;
on exit: contains the local part of the LU-decomposition;
B – array of dimension (r*q):
on entry: contains the local part of the distributed matrix of right-hand sides in the current processor;
on exit: contains the local matrix of the obtained solution in the current processor;
P – auxiliary array of dimension q which on exit contains information about the numbers of the matrix rows interchanged in the process of factorization. The length of the array must not be less than the number of rows in the local sub-matrix;
R – auxiliary array of dimension n for sending out the leading row;
S – auxiliary array of dimension n for the interchange of rows between the leading processor and the processor that owns the maximal element;


ABX – auxiliary array of dimension n*r. With more than one processor it is used for gathering the matrix B during the interchange of rows of right-hand sides;
PP – array of dimension n. With more than one processor it is used for gathering the matrix P during the interchange of rows of right-hand sides;
P1 – auxiliary vector of dimension n for the gathering of right-hand sides;
P2 – auxiliary vector of dimension n for the sending out of right-hand sides;
R1 – auxiliary array of dimension n;
R2 – auxiliary array of dimension n;
A1 – the local part of the original distributed matrix A (a copy for the evaluation of estimates). Array of dimension (n*q);
B1 – the local part of the distributed matrix of right-hand sides B (a copy for the evaluation of estimates) in the current processor. Array of dimension (r*q).
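The parameter q depends on n, s, size and myid. The sketch below computes it with the same arithmetic as the example program for PLGESAD given later in this section; the wrapper function `local_rows` itself is hypothetical and not part of Inparlib.

```c
/* Number of matrix rows stored on processor myid when an n-row matrix is
   distributed cyclically in blocks of s rows over size processors. */
static int local_rows(int n, int s, int size, int myid)
{
    int b = n / s;          /* number of complete blocks */
    int q = b / size;       /* complete blocks every processor receives */
    int bb = b - q * size;  /* leftover complete blocks */

    if (myid < bb) q++;          /* first bb processors get one more block */
    q *= s;                      /* convert blocks to rows */
    if (myid == bb) q += n % s;  /* rows of the incomplete trailing block */
    return q;
}
```

For example, with n = 100, s = 4 and 16 processors, processors 0 to 8 hold 8 rows each and processors 9 to 15 hold 4 rows each, 100 rows in total.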

Returned value

Function PLGESAD returns the completion code with the following meanings:

0 – normal completion;
–101 – the system's matrix is machine-singular and the classic solution does not exist;
–105 – the solution was obtained but its reliability cannot be guaranteed, since the matrix may turn out to be singular within the accuracy of the specification of its elements; in this case it is recommended to specify the initial data more accurately;
–2 – insufficient memory for solving the problem;
–3 – parameter input error; it is necessary to check the correctness of the initial data specification.

If the completion code differs from zero, the function sends a message to stdout describing the reason for the failure.
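A calling program may want to branch on the completion code itself. The helper below is a sketch: the numeric codes are those listed above, while the enum names, the function names and the message texts are hypothetical and not part of Inparlib.

```c
/* Completion codes of PLGESAD as listed in the text (names are assumptions). */
enum {
    PLG_OK         = 0,
    PLG_NO_MEMORY  = -2,
    PLG_BAD_INPUT  = -3,
    PLG_SINGULAR   = -101,
    PLG_UNRELIABLE = -105
};

/* Short description of a completion code. */
static const char *plgesad_message(int info)
{
    switch (info) {
    case PLG_OK:         return "normal completion";
    case PLG_NO_MEMORY:  return "insufficient memory";
    case PLG_BAD_INPUT:  return "parameter input error";
    case PLG_SINGULAR:   return "matrix is machine-singular";
    case PLG_UNRELIABLE: return "solution obtained, reliability not guaranteed";
    default:             return "unknown completion code";
    }
}

/* Nonzero when a solution is available: codes 0 and -105. */
static int plgesad_has_solution(int info)
{
    return info == PLG_OK || info == PLG_UNRELIABLE;
}
```

Treating –105 as a case with a usable solution mirrors the example program, which outputs the result for both 0 and –105.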

Example of using function PLGESAD


A segment of a program using the function PLGESAD for the solving of LAS is given below. The initial data (the matrix and the right-hand side) are formed by accompanying programs.

    #include <stdlib.h>
    #include "mpi.h"

    #define MAX(a,b) ((a)>(b)?(a):(b))

    void BuildMatrPart(int ind, int n, int r, int q,
                       int s, int size, int myid, double *A);

    int main(int argc, char *argv[])
    {
      int myid;        // number of processor
      int size;        // the number of processors
      int q;           // the number of matrix rows in each processor
      int s;           // block size (the number of rows in block)
      int n, r;        // order of the matrix, the number of right-hand sides
      int info;        // completion code
      int b, bb;       // temporary variables
      double e, e1;    // errors estimates
      double ea, eb;   // maximal errors in the specification of the matrix
                       // and right-hand sides elements
      double cond;     // condition number estimate
      double *A, *B;   // local parts of the matrix and of the right-hand sides
      double *A1;      // array (a copy of A for the evaluating of errors)
      double *B1;      // array (a copy of B for the evaluating of errors)
      double *ABX, *R, *R1, *R2, *S;   // working arrays
      int *P, *P1, *P2, *PP;           // working arrays

      // initialization of MPI
      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);

      // initial data specification
      // (E is the error bound of the initial data, defined outside this segment)
      info = 0; n = 100; r = 1; ea = E; eb = E; s = 1;

      // evaluation of the number of rows in each processor
      b = n/s; q = b/size; bb = b - q*size;
      if (myid < bb) q++;
      q *= s;
      if (myid == bb) q += n%s;

      // allocation of memory
      if (size > 1) {
        S = (double*)malloc(2*size*sizeof(double));
        P1 = (int*)malloc(size*sizeof(int));
        P2 = (int*)malloc(size*sizeof(int));
        if (myid == 0) ABX = (double*)malloc(n*r*sizeof(double));
        else ABX = (double*)malloc(sizeof(double));
      } else {
        P1 = (int*)malloc(sizeof(int));
        P2 = (int*)malloc(sizeof(int));
        S = (double*)malloc(sizeof(double));
        ABX = (double*)malloc(sizeof(double));
      }
      A = (double*)malloc(q*n*sizeof(double));
      A1 = (double*)malloc(q*n*sizeof(double));
      B1 = (double*)malloc(q*r*sizeof(double));
      B = (double*)malloc(q*r*sizeof(double));
      P = (int*)malloc(q*sizeof(int));
      R = (double*)malloc((n+1)*sizeof(double));
      R1 = (double*)malloc((n+1)*sizeof(double));
      R2 = (double*)malloc(n*sizeof(double));
      PP = (int*)malloc(n*sizeof(int));

      // verification of the correctness of memory allocation
      if ((ABX==NULL)||(A==NULL)||(B==NULL)||(A1==NULL)||(B1==NULL)||
          (P==NULL)||(R==NULL)||(R1==NULL)||(R2==NULL)||(S==NULL)||
          (PP==NULL)||(P1==NULL)||(P2==NULL)) info = -2;
      if (size > 1) {
        P1[0] = info;
        MPI_Allreduce(P1, P2, 1, MPI_INT, MPI_MIN, MPI_COMM_WORLD);
        info = P2[0];
      }

      // construction of basic matrix and right-hand sides
      if (info == 0) {
        BuildMatrPart(16, n, n, q, s, size, myid, A);
        BuildMatrPart(100, n, r, q, s, size, myid, B);
      }
      if (info == 0) {
        MPI_Barrier(MPI_COMM_WORLD);
        // the solving of problem
        info = PLGESAD(info, n, r, q, s, size, myid, ea, eb, &cond,
                       &e, &e1, A, B, P, R, S, ABX, PP, P1, P2, R1, R2, A1, B1);
        MPI_Barrier(MPI_COMM_WORLD);
      }

      // if successful – output of result
      if ((info == 0) || (info == -105)) {
        // the gathering of the solution matrix in one processor
        if (size > 1) GatherFromAllD(size, myid, q, s, n, r, B, ABX, P1, P2);
      }

      // shutdown of MPI
      MPI_Finalize();
      return info;
    }

    // Function for the construction of matrices
    void BuildMatrPart(int ind, int n, int r, int q,
                       int s, int size, int myid, double *A)
    {
      int k, kk, kid, j;
      for (k = 0; k < n; k++) {
        kid = (k/s)%size;
        kk = ((k/(s*size))*s + k%s)*r;
        if ((myid == kid) && (kk < q*r))
          for (j = 0; j < r; j++)
            // a choice of formula for the construction of matrices
            switch (ind) {
              case 16:
                A[kk+j] = n - MAX(k,j);
                break;
              default:
                if (ind >= 100) {
                  if (k == j+ind-100) A[kk+j] = 1;
                  else A[kk+j] = 0.;
                }
                break;
            }
      }
    }

Function PLPPSAD. Function PLPPSAD is intended for investigating and solving LAS of the form AX=B with a symmetric positive definite matrix A and several right-hand sides B with approximately given initial data by LDL^T-decomposition on the distributed memory MIMD-computer within the parallel programming environment MPI [24, 28, 32, 38, 52].

Prior to the call of the function PLPPSAD, the communicator (communications environment) consisting of the chosen number of processors is to be determined, the initial data are to be distributed between processors, and the required volume of memory for the arrays used in the program is to be allocated.

Declaration of function PLPPSAD

    #include <math.h>
    #include "mpi.h"
    #include "plppsad.h"

    int PLPPSAD(int n, int rowcount, int myid, int procnum,
                double** a, double** aa, double** b, double** bb, int xn,
                double* p, double* work2, double** work3,
                double ea, double eb, int* info, double* conda,
                double* e, double* e1, int LINE);

n – order of the original matrix A (full);

rowcount – the number of matrix rows in the current processor;

myid – identifier of the processor;

procnum – the number of processors being used;

a – array of dimension (rowcount*n+1), where:

    rowcount = ((n / LINE) / procnum)*LINE;
    if(myid < (n - rowcount*procnum)/LINE)
      rowcount += LINE;
    else {
      if(myid == (n - rowcount*procnum)/LINE)
        rowcount += (n - rowcount*procnum)%LINE;
    }

on entry: contains a local part of the distributed matrix A in the current processor; on exit: contains a local part of the LDL^T decomposition in the current processor;

aa – a local part of the original distributed matrix A in the current processor (a copy for the evaluation of the solution's error estimates). Array of dimension (rowcount*n+1);

b – array of dimension (rowcount*xn+1), where rowcount is evaluated in the same way as for the array a; on entry: contains a local part of the distributed matrix of right-hand sides B in the current processor; on exit: contains a local matrix of the obtained solution in the current processor;

bb – a local part of the distributed matrix of right-hand sides B (a copy for the evaluation of estimates for solutions) in the current processor. Array of dimension (rowcount*xn+1);

xn – the number of right-hand sides;

p – auxiliary array of dimension n;

work2 – auxiliary array of dimension (n*LINE);

work3 – auxiliary array of dimension LINE, where the value of LINE is specified in PLPPSAD.h; by default it is 10;

ea – maximum relative error in the specification of elements of the matrix A;

eb – maximum relative error in the specification of elements of the right-hand sides B;

info – the completion code of the problem;

conda – estimate for the condition number of the matrix A;

e – estimate for the inherited error;

e1 – estimate for the computational error;

LINE – the number of rows in the matrix block by which the matrix is cyclically distributed, the rows being located one by one.

Returned value

Function PLPPSAD returns a value of the completion code with the following meanings:

0 – normal completion;

–101 – the system's matrix is machine-singular;

–102 – the system's matrix is non-positive definite;

–105 – the solution was obtained, but its reliability cannot be guaranteed, since the matrix may turn out to be singular within the accuracy of the specification of its elements;

–106 – insufficient memory for solving the problem;

–107 – parameter input error.
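The rowcount formula quoted in the description of the arrays a and b can be wrapped in a small function. The sketch below only restates that formula; the function names plppsad_rowcount and plppsad_total_rows are hypothetical and do not belong to the library.

```c
#include <assert.h>

/* Hypothetical helper: number of matrix rows held by processor myid,
   following the rowcount formula from the parameter description. */
int plppsad_rowcount(int n, int procnum, int myid, int LINE)
{
    int rowcount, rest;

    assert(n >= 0 && procnum > 0 && LINE > 0);
    assert(myid >= 0 && myid < procnum);

    rowcount = ((n / LINE) / procnum) * LINE;  /* full blocks every processor gets */
    rest = n - rowcount * procnum;             /* rows still to be distributed */

    if (myid < rest / LINE)
        rowcount += LINE;                      /* one more full block */
    else if (myid == rest / LINE)
        rowcount += rest % LINE;               /* the last, incomplete block */
    return rowcount;
}

/* Hypothetical check: the local row counts of all processors add up to n. */
int plppsad_total_rows(int n, int procnum, int LINE)
{
    int id, total = 0;
    for (id = 0; id < procnum; id++)
        total += plppsad_rowcount(n, procnum, id, LINE);
    return total;
}
```

For instance, with n = 35, LINE = 10 and two processors, processor 0 holds 20 rows and processor 1 holds 15 rows.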


Example of using function PLPPSAD

A segment of a program using the function PLPPSAD for solving a LAS is given below. The initial data (the matrix and the right-hand side) are formed by accompanying programs.

    #include <math.h>
    #include <stdlib.h>
    #include "mpi.h"
    #include "plppsad.h"

    int main(int argc, char* argv[])
    {
      int LINE;
      double MACH_EPS = 2.220446049250313e-016;
      char** in;
      int n, xn, myid, procnum, info = 0, res = 0;
      // the number of blocks in the processor:
      // the number of "rows" and "columns" of blocks
      int columnBlockCount, rowBlockCount;
      int columnCount, rowCount;
      double* p;
      double* memorya;   // memory for the matrix
      double* memoryb;   // memory for right-hand sides
      double **a, **aa;
      double **b, **bb;
      int iindex, jindex, kindex, rest, rest1, result;
      double conda = 0.;
      double e, e1;
      int P_DIM = 1, Q_DIM = 1, BLOCK_DIM = 4;
      double ea = 0., eb = 0.;
      double *work2, **work3;

      LINE = 10;
      n = 4;    // dimension of the matrix
      xn = 1;   // the number of right-hand sides
      if(fabs(ea) <= MACH_EPS) ea = MACH_EPS;
      if(fabs(eb) <= MACH_EPS) eb = MACH_EPS;

      // initialization of MPI, the obtaining of the processor's number
      // and the number of processes
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      MPI_Comm_size(MPI_COMM_WORLD, &procnum);

      // evaluation of the number of blocks in the current processor
      columnBlockCount = n/(P_DIM*BLOCK_DIM);
      columnCount = columnBlockCount * BLOCK_DIM;
      rest = n%(P_DIM*BLOCK_DIM*columnBlockCount);
      if(rest&&((myid%P_DIM)*BLOCK_DIM < rest)) {
        if(rest / (BLOCK_DIM*(myid+1))) {
          columnBlockCount++;
          columnCount += BLOCK_DIM;
        } else {
          rest = rest % (BLOCK_DIM*(myid+1));
          if(rest) { columnCount += rest; columnBlockCount++; }
        }
      }
      rowBlockCount = (n/Q_DIM)/BLOCK_DIM;
      rowCount = rowBlockCount * BLOCK_DIM;
      rest = n%(Q_DIM*BLOCK_DIM*rowBlockCount);
      if(rest&&((myid/P_DIM)*BLOCK_DIM < rest)) {
        if(rest / (BLOCK_DIM*(myid+1))) {
          rowBlockCount++;
          rowCount += BLOCK_DIM;
        } else {
          rest = rest % (BLOCK_DIM*(myid+1));
          if(rest) { rowCount += rest; rowBlockCount++; }
        }
      }

      // allocation of memory for matrices
      memorya = (double*)malloc(2 * (rowCount*n + 1) * sizeof(double));
      a = (double**)malloc(rowCount * sizeof(double*));
      aa = (double**)malloc(rowCount * sizeof(double*));
      memoryb = (double*)malloc((rowCount * xn + rowCount) * sizeof(double));
      b = (double**)malloc(rowCount * sizeof(double*));
      bb = (double**)malloc(rowCount * sizeof(double*));
      p = (double*)malloc(n * sizeof(double));
      work2 = (double*)malloc(n * LINE * sizeof(double));
      work3 = (double**)malloc(LINE * sizeof(double*));

      // data input (generation)
      memorya[0] = 0.13;  memorya[1] = 0.18;  memorya[2] = 0.19;  memorya[3] = 0.18;
      memorya[4] = 0.18;  memorya[5] = 0.22;  memorya[6] = 0.26;  memorya[7] = 0.27;
      memorya[8] = 0.19;  memorya[9] = 0.25;  memorya[10] = 0.2;  memorya[11] = 0.6;
      memorya[12] = 0.8;  memorya[13] = 0.7;  memorya[14] = 0.6;  memorya[15] = 0.25;
      memoryb[0] = 0.1;   memoryb[1] = 0.1;   memoryb[2] = 0.9;   memoryb[3] = 0.7;
      for(iindex = 0; iindex < rowCount; iindex++) {
        a[iindex] = memorya + iindex*n;
        aa[iindex] = memorya + rowCount*n + 1 + iindex*n;
      }
      for(iindex = 0; iindex < rowCount; iindex++) {
        b[iindex] = memoryb + iindex*xn;
        bb[iindex] = memoryb + rowCount*xn + iindex;
        bb[iindex][0] = 0.;
      }
      MPI_Barrier(MPI_COMM_WORLD);

      // the solving of problem
      result = PLPPSAD(n, rowCount, myid, procnum, a, aa, b, bb, xn, p,
                       work2, work3, ea, eb, &info, &conda, &e, &e1, LINE);
      MPI_Barrier(MPI_COMM_WORLD);
      MPI_Finalize();   // shutdown of MPI
      return result;
    }

Function Slae_bsp_bp. Function Slae_bsp_bp is intended for investigating and solving large LAS with band symmetric positive definite matrices with approximately given initial data on the distributed memory MIMD-computer within the parallel programming environment MPI. The system is solved by the parallel block-cyclic Cholesky algorithm [43].

Prior to the call of the function Slae_bsp_bp, the communicator (communications environment) consisting of the chosen number of processors is to be determined, the initial data are to be distributed between processors, and the required volume of memory for the arrays used in the program is to be allocated.

Declaration of function Slae_bsp_bp

    #include <math.h>
    #include <lira.h>
    #include <lin_alg.h>
    #include "mpi.h"

    int Slae_bsp_bp(int n, int m, int q, double* eres,
                    double*** pT, int crT, int szT,
                    MPI_Comm com, int s, int r1, int prb)

n – order of LAS;

m – band half-width of the matrix of LAS (the band width is 2m+1);

q – the number of right-hand sides in LAS;

eres – on entry: eres[0] – relative error in the specification of the matrix elements; eres[1] – relative error in the specification of elements of the right-hand sides; on exit: eres[2] – the obtained estimate for the overall error in solutions; eres[3] – estimate for the matrix condition number; eres[4] – the matrix norm;

pT – pointer to the array of pointers, where:

pT[0] – two-dimensional array of matrix elements stored in the current processor according to the one-dimensional block-cyclic scheme; the elements of each row are stored in the inverse order, beginning from the diagonal element;

pT[1] – two-dimensional array of elements of right-hand sides of LAS stored in the current processor according to the one-dimensional block-cyclic scheme;

pT[2][0] – on entry: contains a pointer to a one-dimensional array;

pT[2] – on exit: a pointer to the two-dimensional array of elements of the solution matrix of LAS obtained in the current processor;

pT[3] – on exit: a pointer to the two-dimensional array of elements of the factorized matrix A, stored in the current processor;

crT – the number of rows in array pT[2], not less than 2*s*((n-1)/(p*s)+1)+(m+s+1), where p is the number of processors;

szT – size of the array pT[2][0], not less than the value 'szm';

com – identifier of communicator;

s – the number of rows in the block by which the matrix is broken;

r1 – mathematical number of the first matrix row stored in the current processor;

prb – index of the problem: it is equal to 1 if the matrix factorization is to be performed and the solution is to be evaluated; it is equal to 0 if the program is called with an already factorized matrix.
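The one-dimensional block-cyclic scheme mentioned for pT[0] and pT[1] can be illustrated by the index mapping below. It is a sketch under an assumption: the mapping repeats the expressions (k/s)%size and (k/(s*size))*s+k%s used in the function BuildMatrPart given earlier, and it is assumed here, not confirmed by the text, that Slae_bsp_bp relies on the same layout. All function names are hypothetical; the last function only evaluates the lower bound for crT quoted above.

```c
#include <assert.h>

/* Owner of global row k (0-based) when blocks of s consecutive rows
   are dealt out cyclically to p processors. */
int bc_owner(int k, int s, int p)
{
    assert(k >= 0 && s > 0 && p > 0);
    return (k / s) % p;
}

/* Position of global row k (0-based) in the local storage of its owner. */
int bc_local_index(int k, int s, int p)
{
    assert(k >= 0 && s > 0 && p > 0);
    return (k / (s * p)) * s + k % s;
}

/* Number of rows of an n-row matrix stored in processor cpr. */
int bc_local_rows(int n, int s, int p, int cpr)
{
    int k, count = 0;
    assert(n >= 0 && s > 0 && p > 0 && cpr >= 0 && cpr < p);
    for (k = 0; k < n; k++)
        if (bc_owner(k, s, p) == cpr)
            count++;
    return count;
}

/* Lower bound for crT from the parameter list:
   2*s*((n-1)/(p*s)+1)+(m+s+1). */
int bsp_min_crT(int n, int m, int s, int p)
{
    assert(n > 0 && m >= 0 && s > 0 && p > 0);
    return 2 * s * ((n - 1) / (p * s) + 1) + (m + s + 1);
}
```

With s = 2 and p = 3, rows 0, 1, 6, 7 (0-based) fall to processor 0, which agrees with the first row number r1 = cpr*s+1 = 1 used in the example program for this function.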

Returned value

Function Slae_bsp_bp returns a value of the completion code with the following meanings:

0 – normal completion;

100 – matrix of LAS turned out to be sign-undetermined;

120 – matrix of LAS turned out to be singular within the limits of machine accuracy;

125 – the reliability of the machine solution of LAS cannot be guaranteed;

100 – the initial data specification error;

110 – insufficient size of the working array of pointers and/or of the work buffer;

300 – inter-processor data exchange error.


Example of using function Slae_bsp_bp

A segment of a program using the function Slae_bsp_bp for solving a LAS is given below. The initial data (the matrix and the right-hand side) are formed by accompanying programs.

    // Function for the forming of LAS with band symmetric matrix
    // distributed between processors;
    // call of functions Slae_bsp_bp or Slae_bss_bp
    // for the solving of problem and gathering of the solution vector
    int pre_sbp(
      MPI_Comm com,  // identifier of communicator
      int prb_id,    // index of the problem (sign-determinacy of matrices)
      int s,         // the number of rows in blocks by which
                     // the matrices of problem are broken
      int* nn,       // pointer to the value of the order of LAS
      int* mm,       // pointer to the value of the band width
      int* qq,       // pointer to the value of the number of
                     // right-hand sides in LAS
      int trg_id,    // the storing index of results
      double* cond,  // pointer to the value of estimate for
                     // the matrix condition number of LAS
      double* ess,   // pointer to the value of estimate for
                     // error in the obtained solution
      double* alpha  // pointer to the value of the initial
                     // shift of the matrix spectrum
                     // (with prb_id=1)
    )
    {
      int Mbs_alloc(), Mrd_alloc(), Mrc_gather_e(), in_row_L();
      int pari[8], n, m, q, szA, szb, szbf, sz1, sz2, p, cpr, r1, npp, mpp,
          sp, sps, i, j, j1, k, err, it=0, *ktz;
      double **ppp[4], *dar, *tar, eres[5];
      FILE *fout;

      MPI_Comm_rank( com, &cpr );
      MPI_Comm_size( com, &p );
      pari[0] = pari[1] = pari[2] = 0;
      n = pari[3] = pari[6] = *nn;
      m = pari[4] = *mm;
      q = pari[5] = *qq;
      eres[0] = eres[1] = 0.0;
      sp = p*s; sps = sp-s; r1 = cpr*s+1;
      npp = ((n-1)/sp)*s+mmin((n-1)%sp+1,s);
      mpp = (m/sp)*s+mmin(m%sp+1,s);

      // allocation of memory for (two-dimensional) array of pointers
      sz2 = 3*npp+m+s+mmax(2*s,2*sps)+q+2;
      ppp[0] = (double**)malloc( sizeof(double*)*sz2 );
      if( ppp[0]==NULL ) { return( -12 ); }

      // allocation of memory for one-dimensional array of matrix elements,
      // elements of right-hand sides and results
      for( i=sps, j=0; i<m; i=k ) { j += m-i; if( (k=i+1)%s==0 ) k+=sps; }
      for( i=2*sps, j1=0; i<m; i=k ) { j1 += m-i; if( (k=i+1)%s==0 ) k+=sps; }
      szA = npp*(m+1)-j; szb = npp*q;
      sz1 = 2*((szA+szb)+j-j1)+10*(m+1)
            +mmax((2*s+2)*(m+1),szb+mmax(2*s,mpp+sp+sps)*q);
      ppp[0][0] = dar = (double*)malloc( sizeof(double)*sz1 );
      if( dar==NULL ) { return( -11 ); }

      // distribution of the arrays
      ppp[1] = ppp[0]+npp+1;
      ppp[2] = ppp[1]+q;
      szbf = sz1-(szA+szb);
      ppp[1][0] = ppp[0][0]+szA;
      ppp[2][0] = tar = ppp[1][0]+szb;

      // forming of matrix distributed between processors
      err = Mbs_alloc( in_row_L, n, m, s, r1, ppp[0], eres, p );
      if( err!=0 ) { free( dar ); free( ppp[0] ); return( err ); }

      // forming of matrix of right-hand sides
      // distributed between processors
      err = Mrd_alloc( in_row_L, q, n, s, r1, npp, ppp[1], tar, szbf, p );
      if( err!=0 ) { free( dar ); free( ppp[0] ); return( err ); }

      // the solving of LAS
      if( prb_id!=1 ) {
        err = Slae_bsp_bp( n, m, q, eres, ppp, sz2-(npp+q), szbf, com, s, r1, 1 );
        if( err/10==12 && prb_id==0 ) { prb_id = 1; *alpha = 0.0; }
      } // end if( prb_id!=1 )

      if( prb_id==1 ) {
        err = Slae_bss_bp( n, m, q, eres, ppp, sz2-(npp+q), szbf, com, s, r1, alpha );
      } // end if( prb_id==1 )
      if( err!=0 ) { return( err ); } // end if( err!=0 )

      // output of results to protocol and file
      if( cpr==0 ) {
        if( trg_id==0 || (fout=fopen(FONM,"wb"))==NULL ) it = 0;
        else {
          it = 1;
          fwrite( pari+5, sizeof(int), 2, fout );
          fwrite( eres+2, sizeof(double), 2, fout );
        }
      }

      // gathering of solution vectors in the 0-th processor
      err = Mrc_gather_e( ppp[2][0], ppp[1], tar+npp*q, q, n, s, r1, com, fout, it );
      if( cpr==0 ) { if( it ) fclose( fout ); }
      return( err );
    }

    // Function for the distributing between processors of rows of blocks
    // of the band symmetric matrix
    int Mbs_alloc ( int in_row(), int n, int m, int s, int ri,
                    double** BS, double* eps, int p )
    {
      int i, ip, mp1, sps, rv;
      rv = in_row( '0', n, m, 0, BS[0] );
      eps[0] = BS[0][0]; eps[1] = BS[0][1];
      sps = s*p-s; mp1 = m+1;
      for( i=ri-1, ip=0; i<n; ip++ ) {
        rv = in_row( 'A', n, m, i, BS[ip] );
        i++;
        BS[ip+1] = BS[ip]+mmin(mp1,i);
        if( i%s==0 ) i += sps;
      }
      return( 0 );
    }

    // Function for the distributing between processors of rows of blocks
    // of the rectangular matrix
    int Mrd_alloc ( int in_row(), int nr, int nc, int s, int ri, int lrow,
                    double** RD, double* buf, int szb, int p )
    {
      int i, ip, k, sps, rv;
      for( k=1; k<nr; k++ ) RD[k] = RD[0]+k*lrow;
      sps = s*p-s;
      for( i=ri-1, ip=0; i<nc; ip++ ) {
        rv = in_row( 'F', nc, nr, i, buf );
        i++;
        if( i%s==0 ) i += sps;
        for( k=0; k<nr; k++ ) RD[k][ip] = buf[k];
      }
      return( 0 );
    }

    // Function for the forming of the i-th row of the finite-element
    // matrices (Laplace operator, Neumann problem)
    int in_row_L ( int cs, int n, int m, int i, double* R )
    {
      static int Nx, Ny;
      static double a1,a2,a4,a8, b1,b2,b4,b8,b0, f1,f2, g1,g2;
      switch( cs ) {
        case '0': {
          double hx,hy, c0,c1,c2, rl=1.0, rw=1.0;
          Nx = m-2; hx = rl/Nx;
          Ny = n/(m-1); hy = rw/Ny;
          c0 = hx*hy/36.0; c1 = hx/(6.0*hy); c2 = hy/(6.0*hx);
          a8 = c1+c2; a1 =-a8; a2 = 2*a8; a4 = 2*a2; a8 = 2*a4;
          b1 = c0; b2 = 2*b1; b4 = 2*b2; b8 = 2*b4; b0 = 2*b8;
          f1 = c1-2*c2; f2 = 2*f1;
          g1 = c2-2*c1; g2 = 2*g1;
          R[0] = R[1] = 0.0;
          break;
        }
        case 'A': {
          // forming of the i-th row of the finite-element matrix
          // for the Laplace operator
          int j, im, mi;
          double *Ai;

          im = i%(Nx+1); mi = mmin(m,i);
          Ai = R;
          for( j=1; j<=m; j++ ) Ai[j] = 0.0;
          if( i==0 ) Ai[0] = a2;
          else if( i==Nx ) { Ai[0] = a2; Ai[1] = f1; }
          else if( i<Nx ) { Ai[0] = a4; Ai[1] = f1; }
          else if( im==0 ) { Ai[0] = a4; Ai[m-2] = a1; Ai[m-1] = g1; }
          else if( im==Nx ) { Ai[0] = a4; Ai[1] = f2; Ai[m-1] = g1; Ai[m] = a1; }
          else { Ai[0] = a8; Ai[1] = f2; Ai[m-1] = g2; Ai[m-2] = Ai[m] = a1; }
          if( i>n-m ) {
            if( im==Nx ) { Ai[0] = a2; Ai[1] = f1; Ai[m-1] = g1; Ai[m] = a1; }
            else if( im==0 ) { Ai[0] = a2; Ai[m-2] = a1; Ai[m-1] = g1; }
            else { Ai[0] = a4; Ai[1] = f1; Ai[m-1] = g2; Ai[m-2] = Ai[m] = a1; }
          }
          break;
        }
        case 'F': {
          int j, im;
          im = i%(Nx+1);
          for( j=0; j<m; j++ ) R[j] = 0.0;
          if( i==0 ) R[0] = 4*a2+f1+4*g1+a1;
          else if( i==1 ) R[0] = a4+4*f1+4*a1+g2;
          else if( i==2 ) R[0] = f1+a1;
          else if( i==Nx-1 ) R[0] = -(a1+f1);
          else if( i==Nx ) R[0] = -(a2+g1);
          else if( i>Nx && i<n-Nx-1 ) {
            if( im==0 ) R[0] = 8*g1+2*a1+4*a4+f2;
            else if( im==1 ) R[0] = 8*a1+2*g2+a8+4*f2;
            else if( im==2 ) R[0] = 2*a1+f2;
            else if( im==Nx-1 ) R[0] = -(2*a1+f2);
            else if( im==Nx ) R[0] = -(a4+2*g1);
          }
          else if( i>n-(Nx+2) ) {
            if( im==0 ) R[0] = 4*g1+a1+4*a2+f1;
            else if( im==1 ) R[0] = 4*a1+g2+a4+4*f1;
            else if( im==2 ) R[0] = a1+f1;
            else if( im==Nx-1 ) R[0] = -(a1+f1);
            else if( im==Nx ) R[0] = -(a2+g1);
          }
          break;
        }
      }
      return( 0 );
    }

Function Slae_bss_bp. Function Slae_bss_bp is intended for investigating and solving LAS with band symmetric positive semi-definite matrices with approximately given initial data on the distributed memory MIMD-computer within the parallel programming environment MPI. A generalized solution of LAS is evaluated by the three-stage regularization method [43] and the parallel block-cyclic Cholesky algorithm [29].

Prior to the call of the function Slae_bss_bp, the communicator (communications environment) consisting of the chosen number of processors is to be determined, the initial data are to be distributed between processors, and the required volume of memory for the arrays used in the program is to be allocated.

Declaration of function Slae_bss_bp

    #include <math.h>
    #include "mpi.h"
    #include <lira.h>
    #include <lin_alg.h>

    int Slae_bss_bp(int n, int m, int q, double* eres,
                    double*** pT, int crT, int szT,
                    MPI_Comm com, int s, int r1, double* alpha)

n – order of LAS;

m – band half-width of the matrix of LAS (the band width is 2m+1);

q – the number of right-hand sides in LAS;

eres – on entry: eres[0] – relative error in the specification of the matrix elements; eres[1] – relative error in the specification of elements of the right-hand sides; on exit: eres[2] – the obtained estimate for the overall error in solutions; eres[3] – estimate for the matrix condition number; eres[4] – the matrix norm;

pT – pointer to the array of pointers, where:

pT[0] – two-dimensional array of matrix elements stored in the current processor according to the one-dimensional block-cyclic scheme; the elements of each row are stored in the inverse order, beginning from the diagonal element;

pT[1] – two-dimensional array of elements of right-hand sides of LAS stored in the current processor according to the one-dimensional block-cyclic scheme;

pT[2][0] – on entry: contains a pointer to a one-dimensional array;

pT[2] – on exit: contains a pointer to the two-dimensional array of elements of the solution matrix of LAS obtained in the current processor;

pT[3] – on exit: contains a pointer to the two-dimensional array of elements of the factorized matrix A, stored in the current processor;

crT – the number of rows in array pT[2], not less than 2*s*((n-1)/(p*s)+1)+(m+s+1), where p is the number of processors;

szT – size of the array pT[2][0], not less than the value 'szm';

com – identifier of communicator;

s – the number of rows in the block by which the matrix is broken;

r1 – mathematical number of the first matrix row stored in the current processor;

alpha – on entry: the initial shift (if it is equal to 0, the shift is chosen automatically); on exit: contains the value of the shift involved in the evaluation of the solution to LAS.
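The row storage described for pT[0] (elements of each row in the inverse order, beginning from the diagonal element) can be sketched as follows. The dense row-major input and the function names band_pack_row and band_get are assumptions made for illustration; the row length min(m, i)+1 agrees with the pointer step mmin(m+1, i) used in Mbs_alloc in the example for Slae_bsp_bp.

```c
#include <assert.h>

/* Copy row i (0-based) of a dense symmetric n-by-n matrix, stored row
   by row, into the packed band form:
   row[0] = A(i,i), row[1] = A(i,i-1), ..., row[k] = A(i,i-k).
   Returns the number of stored elements, min(m, i) + 1. */
int band_pack_row(const double *dense, int n, int m, int i, double *row)
{
    int k, len;
    assert(dense != 0 && row != 0);
    assert(n > 0 && m >= 0 && i >= 0 && i < n);
    len = (i < m ? i : m) + 1;
    for (k = 0; k < len; k++)
        row[k] = dense[i * n + (i - k)];
    return len;
}

/* Read A(i,j), j <= i, back from the packed row i;
   elements outside the band are zero. */
double band_get(const double *row, int m, int i, int j)
{
    assert(row != 0 && m >= 0 && j >= 0 && j <= i);
    if (i - j > m)
        return 0.0;
    return row[i - j];
}
```

Only the lower triangle within the band is kept, so an element A(i,j) with j > i has to be read as A(j,i) from the packed row j.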

Returned value

Function Slae_bss_bp returns a value of the completion code with the following meanings:

0 – normal completion;

101 – matrix of the problem turned out to be sign-undetermined;

121, 122 – matrix of the problem turned out to be sign-undetermined;

125 – the reliability of the machine solution of LAS cannot be guaranteed;

–100 – the initial data specification error;

–110 – insufficient size of the working array of pointers and/or of the work buffer;

–300 – inter-processor data exchange error.


Example of using function Slae_bss_bp

Function Slae_bss_bp is used in the same manner as the function Slae_bsp_bp.

Function Slae_svd_p. Function Slae_svd_p is intended for investigating and solving large LAS with rectangular or square singular general matrices with approximately given initial data on the distributed memory MIMD-computer within the parallel programming environment MPI. The row-cyclic algorithm is used for the singular-value decomposition of the matrix and the subsequent evaluation of the generalized solution [36].

Prior to the call of the function Slae_svd_p, the communicator (communications environment) consisting of the chosen number of processors is to be determined, the initial data are to be distributed between processors, and the required volume of memory for the arrays used in the program is to be allocated.

Declaration of function Slae_svd_p

    #include <math.h>
    #include "mpi.h"
    #include <lin_alg.h>

    int Slae_svd_p(int nr, int nc, int nrs, int ss, int ri,
                   double** A, double eA,
                   double** C, double eC, int sar,
                   double* ar, double* ep, int* rank,
                   MPI_Comm com)

nr – the number of rows in the matrix;

nc – the number of columns in the matrix;

nrs – the number of right-hand sides (= 0, if only the singular-value decomposition of the matrix is required);

ss – index of the storage scheme, which is equal to 0 for the row scheme or to 1 for the column scheme;

ri – mathematical number of the first matrix row / column stored in this processor;

A – pointer to the array of pointers which on entry points to arrays of elements of matrix rows stored in the current processor, and on exit points to arrays of elements of rows of the right-hand sides matrix stored in the current processor. Matrix elements should be distributed between processors by the row-cyclic scheme, according to which the processor with logical number i contains elements of rows with numbers j, j+p, j+2p, …, while the processor with number i+1 contains rows with numbers j+1, j+1+p, j+1+2p, …, where p is the number of processors employed by the program. For each processor this distribution is determined by the number of the first row whose elements are distributed to this processor (for example, for the processor with number i, this is the number j). In each processor the matrix elements are stored in a two-dimensional array;

eA – maximum relative error in the specification of elements of the matrix;

C – pointer to the array of pointers which on entry points to arrays of elements of rows of the right-hand sides matrix stored in the current processor, while on exit it points to arrays of elements of rows both of the solutions and estimates matrix stored in the current processor, or of the matrix of the left-hand singular vectors;

eC – maximum relative error in the specification of elements of the right-hand sides;

sar – size of the working buffer ar; it is required that sar >= ((nrs+p-1)/p)*nr+4*min{nr,nc}+2;

ar – pointer to the working buffer;

ep – pointer to the element whose value on entry determines the machine zero for the QR-algorithm; on exit the value of the element is equal to the machine zero of the system's matrix;

rank – pointer to the array consisting of two elements which on exit contains the values of the evaluated and efficient ranks;

com – identifier of communicator.

Returned value

Function Slae_svd_p returns a value of the completion code with the following meanings:

0 – normal completion;

121 – all singular values do not exceed the value of machine zero of the system's matrix;

100 – the initial data specification error;

500 – inter-processor data exchange error.
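The size requirement on the working buffer and the row-cyclic distribution from the parameter description can be checked with a few lines of code. This is a sketch only: the function names svd_min_sar and svd_local_rows are hypothetical, and the first function merely evaluates the inequality sar >= ((nrs+p-1)/p)*nr+4*min{nr,nc}+2 with integer division.

```c
#include <assert.h>

/* Smallest sar satisfying the requirement from the parameter list. */
int svd_min_sar(int nr, int nc, int nrs, int p)
{
    int mn;
    assert(nr > 0 && nc > 0 && nrs >= 0 && p > 0);
    mn = nr < nc ? nr : nc;
    return ((nrs + p - 1) / p) * nr + 4 * mn + 2;
}

/* Number of rows ri, ri+p, ri+2p, ... (1-based) that do not exceed nr,
   i.e. the rows stored in the processor whose first row is ri. */
int svd_local_rows(int nr, int p, int ri)
{
    assert(nr >= 0 && p > 0 && ri >= 1 && ri <= p);
    return nr >= ri ? (nr - ri) / p + 1 : 0;
}
```

For a matrix with 10 rows on 3 processors, the processor with ri = 1 stores rows 1, 4, 7, 10, while the processors with ri = 2 and ri = 3 store three rows each.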


Example of using function Slae_svd_p

A segment of a program using the function Slae_svd_p for solving a LAS is given below. The initial data (the matrix and the right-hand side) are formed by accompanying programs.

    // Function for the forming of the problem using
    // singular-value matrix decomposition
    int pre_SVD ( MPI_Comm comm, int mc, int* nrs, int* nr, int* nc,
                  double* eps, double** sv, int* rank )
    {
      int HiP_svd(), inma_svd(), Mrg_cast_e(), Mrc_gather_e();
      int cpr, p, ri, m, n, q, ss, mp, np, qp, sr, sar, sar1, sar2, res,
          par[4], it, trg_id=1;
      double **A, **B, **tar2, *tar1, epPr[2], t0;
      FILE *f1in, *f2in, *fout;

      // obtaining of the processor's number and the number of processors
      MPI_Comm_rank( comm, &cpr );
      MPI_Comm_size( comm, &p );
      m = *nr; n = *nc; q = *nrs;
      epPr[0] = epPr[1] = 0.0;
      epPr[0] = mmax(MEPS,epPr[0]);
      epPr[1] = mmax(MEPS,epPr[1]);
      ss = (m<n?1:0);

      // allocation and distribution of memory
      res = HiP_svd( p, cpr, m, n, q, &ri, &tar1, &sar1, &tar2, &sar2 );
      if( res ) return( res );
      *sv = tar1; sr = mmin(m,n);
      sar = tar2[0]-(tar1+sr);
      mp = (m+p-1)/p; np = (n+p-1)/p; qp = (q+p-1)/p;
      B = tar2+qp;
      A = B+(q>0?mmax(mp,np+1):mmin(mp,np));

      // forming and distributing of matrix
      // of the problem between processors
      res = inma_svd ( mc, &m, &n, ri, A, p, comm );
      if( res ) return( res );

      // forming and distributing of matrix
      // of right-hand sides between processors
      if( q>0 ) {
        res = inma_svd ( (mc>1?(mc==4?5:-1):0), &m, &q, ri, B, p, comm );
        if( res ) return( res );
      }

      // the solving of problem by means of matrix
      // singular-value decomposition
      res = Slae_svd_p( m, n, q, ss, ri, A, epPr[0], tar2, epPr[1],
                        sar, tar1, eps, rank, comm );
      if( res ) return( res );

      if( q>0 ) {
        int j;
        // gathering of solution vectors in the 0-th processor
        res = Mrc_gather_e( B[0], A, A[0]+np*q, q, n, 1, ri, comm, fout, it );
      }
      return( res );
    }

    // function for allocation and distribution of memory
    int HiP_svd ( int p, int cpr, int m, int n, int q, int* ri,
                  double** a1, int* sa1, double*** t, int* st )

    {
      int i, mp,np,qp, mip,mrp,nip, mn,nm, buf, r_mem, p_mem;
      double **dad, *oad;

      // indication of scheme of matrix elements distribution
      // between processors
      *ri = cpr+1;

      // allocation and distribution of memory
      mp = (m+p-1)/p; np = (n+p-1)/p; qp = (q+p-1)/p;
      mip = mmax(mp,np); mrp = mmax(mp,np+1); nip = mmin(mp,np);
      mn = mmax(m,n); nm = mmin(m,n);
      *st = p_mem = (q>0?mrp+qp:nip)+mip+2;
      *t = dad = (double**)malloc( p_mem*sizeof(double*) );
      if( *t==NULL ) return( -4 );
      buf = mmax(5*nm+q+q,np*q)+2*nm+q;
      *sa1 = r_mem = buf+nm*mip+(q>0?mrp*q+nm*qp:nm*nip);
      *a1 = oad = (double*)malloc( r_mem*sizeof(double) );
      if( *a1==NULL ) return( -2 );
      dad[0] = oad+buf;
      if( q>0 ) {
        for( i=1; i<=qp; i++ ) dad[i] = dad[0]+i*nm;
        for( i=1, dad+=qp; i<=mrp; i++ ) dad[i] = dad[0]+i*q;
        for( i=1, dad+=mrp; i<=mip; i++ ) dad[i] = dad[0]+i*nm;
      }
      else
        for( i=1; i<=(mp+np); i++ ) dad[i] = dad[0]+i*nm;
      return( 0 );
    }

    // function for the forming of original matrix
    int inma_svd ( int mc, int* nr, int* nc, int ri,
                   double** A, int p, MPI_Comm comm )
    {
      int mtrx();
      int m,n, r1, i,ip;
      m = *nr; n = *nc; r1 = ri-1;
      if( mc==2 && m<n ) { n = *nr; m = *nc; }
      for( ip=i=0; i<m; i++ )
        if( i%p==(ri-1) ) {
          n = mtrx( n, i, A[ip], mc );
          ip++;
          if( n<0 ) return( n );
        }
      if( n!=mmin(*nc,*nr) ) *nc = *nr = m = n;
      return( 0 );
    }

    // function for the forming of rows (columns)
    // of different matrices

    int mtrx ( int n, int i, double* A_row, int matr_case )
    {
      int j;
      switch( matr_case ) {
        case 4:
          // square matrix of order n
          // whose diagonal elements = n-i,
          // while the off-diagonal elements = n+1-max{i,j};
          // n%3=1
          if( n%3!=1 ) n = ((n-1)/3)*3+1;
          for( j=0; j<n; j++ )
            A_row[j] = n-(i==j?i+1:mmax(i,j));
          return( n );
        case 5:
          // rectangular matrix of right-hand sides
          // of dimension N*n, with N%3=1
          // (for the matrix "4")
          if( n==1 ) A_row[0] = (i<2?N-(i+1):N-i);
          else {
            for( j=0; j<n; j++ ) {
              int k,c;
              for( k=0, c=N/n, A_row[j]=0.0; k<N; k++ )
                A_row[j] += (n-(i==j?i+1.0:mmax(i,j)))
                            *(j>0?1.0+k-c*j:1.0);
            }
          }
          return( n );
        default:
          return( -50 );
      }
    }

7.2.2. Investigating and solving of the algebraic eigenvalue problem

Function mp_esytri. Function mp_esytri is intended for investigating and solving the full AEVP for a tridiagonal symmetric matrix on the distributed memory MIMD-computer within the parallel programming environment MPI. The problem is solved by the QL-algorithm with parallel forming of the eigenvectors' matrix [38].

Prior to the call of the function mp_esytri, the communicator (communications environment) consisting of the chosen number of processors is to be determined, the initial data are to be distributed between processors, and the required volume of memory for the arrays used in the program is to be allocated.

Declaration of function mp_esytri

    #include "mpi.h"
    #include <math.h>
    #include <stdlib.h>

    int mp_esytri(int n, double *d, double *e,
                  double ea, double *a, double *w,
                  double *estw, double *estz,
                  int id, int np, MPI_Comm com )

n – order of the matrix;
d – pointer to an array of dimension n containing on entry the elements of the principal diagonal of the matrix; every processor must hold all of them;
e – pointer to an array of dimension n containing on entry the sub-diagonal elements of the matrix, starting from the second entry of the array; every processor must hold all of them;
ea – maximum relative error in the specification of the matrix elements; if the elements are specified exactly, it should be set to zero;
a – pointer to an array containing on exit the matrix whose columns are the evaluated eigenvectors; the matrix is distributed among the processors so that processor id holds the full rows with numbers id+K*np+1, where K = 0, 1, 2, ...;
w – pointer to an array of dimension n containing on exit the evaluated eigenvalues; the whole array is held by every processor;
estw – pointer to an array of dimension n containing on exit estimates of the absolute errors in the evaluated eigenvalues; the whole array is held by every processor;
estz – pointer to an array of dimension n containing on exit estimates of the absolute errors in the evaluated eigenvectors; the whole array is held by every processor;
id – logical number of the processor (= 0, 1, ..., np–1);
np – the number of processors (np > 1);
com – identifier of the communicator.
Returned value
Function mp_esytri returns a completion code:
0 – normal completion;
k – error in the allocation of memory for working arrays or in data exchange operations (k is an integer);
m – the number of the eigenvalue whose evaluation requires more than 30 iterations; the subsequent eigenvalues and eigenvectors are not evaluated.
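The row distribution of the array a can be restated as a small sketch. The helper names local_row_count and global_row_number are hypothetical and are not part of Inparlib; they only encode the rule "processor id holds rows id+K*np+1" given above.

```c
#include <assert.h>

/* Hypothetical helpers, not part of Inparlib: they restate the cyclic rule
   "processor id holds rows id+K*np+1, K = 0, 1, 2, ...". */

/* Number of matrix rows (out of n) stored in processor id. */
int local_row_count(int n, int np, int id)
{
    assert(n > 0 && np > 0 && id >= 0 && id < np);
    return n / np + ((id < n % np) ? 1 : 0);
}

/* 1-based number of the matrix row stored as local row k (k = 0, 1, ...). */
int global_row_number(int np, int id, int k)
{
    assert(np > 0 && id >= 0 && id < np && k >= 0);
    return id + k * np + 1;
}
```

For n = 10 and np = 4, for instance, processor 1 holds three rows, with numbers 2, 6 and 10.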


Example of using function mp_esytri
A program fragment that uses function mp_esytri to solve the full AEVP for a symmetric tridiagonal matrix is given below. The matrix is formed by the accompanying functions.
#include "mpi.h"
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
/* Example of evaluating all eigenvalues and eigenvectors
   of the symmetric tridiagonal matrix
     2 1 0 0 ... 0 0
     1 2 1 0 ... 0 0
     0 1 2 1 ... 0 0
     ...
     0 0 0 0 ... 2 1
     0 0 0 0 ... 1 2 */
int mp_thrwz(), mp_esytri(), in_A286();

int main(int argc, char **argv)
{
    int N, err = 0;
    /* initialization of MPI */
    MPI_Init(&argc, &argv);
    /* initial data specification */
    N = 1000;
    err = mp_thrwz(N, in_A286);
    /* shutdown of MPI */
    MPI_Finalize();
    return 0;
}

int mp_thrwz(int n, int ff())
{
    int np, id, ku, ku0, kk, err = 0, i, j;
    double ea, *de, *a, *w, *estw, *estz;
    double macheps = 2.220446049250313e-16;
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    ea = macheps;
    ku0 = n / np; j = n % np;
    ku = (id < j) ? (ku0 + 1) : ku0;   /* the number of rows in this processor */
    kk = ku;                           /* the total number of rows in block */
    if (j != 0 && id >= j) kk = ku + 1;
    /* allocation of memory */
    de = (double *)malloc(2 * n * sizeof(double));
    a = (double *)malloc(n * kk * sizeof(double));
    w = (double *)malloc(n * sizeof(double));
    estw = (double *)malloc(n * sizeof(double));
    estz = (double *)malloc(n * sizeof(double));
    /* construction of matrix */
    err = ff(n, de, id, np);
    /* a receives the locally stored rows of the identity matrix */
    for (i = 0; i < ku * n; i++) a[i] = 0.0e0;
    for (j = 0, i = 0; j < ku; i += (n + np), j++) a[i + id] = 1.0e0;
    err = mp_esytri(n, de, &de[n], ea, a, w, estw, estz, id, np, MPI_COMM_WORLD);
    return err;
} /* end MP_THRWZ */

/* construction of tridiagonal matrix in all processors */
int in_A286(int n, double *de, int id, int np)
{
    int i;
    double d = 2.0e0, e = 1.0e0;
    for (i = 0; i < n; i++) {
        de[i] = d;        /* principal diagonal */
        de[n + i] = e;    /* sub-diagonal, used from the second entry */
    }
    de[n] = 0;            /* the first entry of the sub-diagonal array */
    return 0;
}
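The storage convention for d and e can be checked with a short serial sketch that evaluates the residual of a candidate eigenpair. The function tridiag_residual is a hypothetical helper, not part of Inparlib, and it assumes that the eigenvector has already been gathered into one full array z, whereas mp_esytri returns the eigenvector matrix distributed by rows.

```c
#include <assert.h>

/* Hypothetical helper, not part of Inparlib.
   d[0..n-1] - principal diagonal; e[1..n-1] - sub-diagonal (e[0] is unused),
   i.e. the layout described for mp_esytri.
   Returns max_i |(T*z - lambda*z)_i| for the symmetric tridiagonal matrix T. */
double tridiag_residual(int n, const double *d, const double *e,
                        double lambda, const double *z)
{
    double worst = 0.0;
    int i;
    assert(n > 0);
    for (i = 0; i < n; i++) {
        double r = (d[i] - lambda) * z[i];
        if (i > 0)     r += e[i] * z[i - 1];      /* sub-diagonal element   */
        if (i < n - 1) r += e[i + 1] * z[i + 1];  /* super-diagonal element */
        if (r < 0.0) r = -r;
        if (r > worst) worst = r;
    }
    return worst;
}
```

For the 2-by-2 matrix with diagonal 2 and off-diagonal 1, the pairs (3, (1, 1)) and (1, (1, -1)) give a zero residual.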

Function mp_esyqai_bl. Function mp_esyqai_bl is intended for investigating and solving the full AEVP for a dense symmetric matrix on a distributed-memory MIMD computer within the parallel programming environment MPI. The problem is solved by the parallel block-cyclic Householder algorithm and the QL algorithm with parallel formation of the eigenvector matrix [38].

Before mp_esyqai_bl is called, the communicator (communication environment) consisting of the chosen number of processors must be defined, the initial data must be distributed among the processors, and the memory required for the arrays used in the program must be allocated.

Declaration of function mp_esyqai_bl
#include "mpi.h"
#include <math.h>
#include <stdlib.h>
int mp_esyqai_bl(int n, int s, double ea, double* a, double* w, double* estw,
                 double* estx, double **work, int id, int np, MPI_Comm com);

n – order of the matrix;
s – the number of rows in the blocks into which the matrix is partitioned;
ea – maximum relative error in the specification of the matrix elements;
a – pointer to a two-dimensional array of dimension (n*ku); on entry it contains, in processor id, a part of the original matrix A, i.e. the full rows with numbers id+K*np+1, where K = 0, 1, 2, ...; on exit it contains, in each processor, the block of the eigenvector matrix (the eigenvectors are its columns) distributed by the same rule;
w – pointer to an array of dimension n containing on exit the evaluated eigenvalues;
estw – pointer to an array of dimension n containing on exit estimates of the absolute errors in the evaluated eigenvalues;
estx – pointer to an array of dimension n containing on exit estimates of the absolute errors in the evaluated eigenvectors;
work – working array of pointers of dimension not less than 2*n+3;
id – logical number of the processor (0, 1, ..., np–1);
np – the number of processors (np > 1);
com – identifier of the communicator.
Returned value
Function mp_esyqai_bl returns a completion code:
0 – normal completion;
k – error in the allocation of memory for working arrays or in data exchange operations (k is an integer);
m – the number of the eigenvalue whose evaluation requires more than 30 iterations; the subsequent eigenvalues and eigenvectors are not evaluated.

Example of using function mp_esyqai_bl
A program fragment that uses function mp_esyqai_bl to solve the full AEVP for a dense symmetric matrix is given below. The initial data (the matrix) are formed by the accompanying functions.
#include "mpi.h"
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
/* Example of evaluating all eigenvalues and eigenvectors
   of the symmetric matrix
     n-1 n-2 n-3 ... 2 1
     n-2 n-2 n-3 ... 2 1
     n-3 n-3 n-4 ... 2 1
     ...
     2   2   2   ... 2 1
     1   1   1   ... 1 0
*/
int mp_erswz_block(), mp_esyqai_bl(), mp_haus_bl();
int mp_tql2(), dotxy(), in_ram_sym(), mp_outz();
int in_AmaxU(), prt1_rez();

int main(int argc, char **argv)
{
    int N, s, err = 0;
    /* initialization of MPI */
    MPI_Init(&argc, &argv);
    /* initial data specification */
    N = 1000; s = 20;
    err = mp_erswz_block(N, s, in_AmaxU, prt1_rez);
    /* shutdown of MPI */
    MPI_Finalize();
    return 0;
}

int mp_erswz_block(int n, int s, int in_A(), int out_rez())
{
    int np, id, ku, ku0, kk, err = 0, i, j, jj;
    double ea, **a, *pa, *w, *estw, *estz, **work;
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    ea = 0;
    ku0 = n / np; j = n % np;
    /* the number of matrix rows in processor */
    ku = (id < j) ? (ku0 + 1) : ku0;
    /* the total number of matrix rows in processor */
    kk = ku;
    if (j != 0 && id >= j) kk = ku + 1;
    /* allocation of memory */
    j = sizeof(double); jj = sizeof(double *);
    a = (double **)malloc(kk * jj);
    a[0] = pa = (double *)malloc(n * kk * j);
    for (i = 1; i < kk; i++) a[i] = a[0] + i * n;
    w = (double *)malloc(n * j);
    estw = (double *)malloc(n * j);
    estz = (double *)malloc(n * j);
    /* determination of working array */
    work = (double **)malloc((2 * n + 3) * sizeof(double *));
    work[0] = (double *)malloc((2 * n * s + 5 * n) * sizeof(double));
    /* construction of matrix */
    err = in_A(n, pa, w, ku, id, np);
    /* evaluation of eigenvalues and eigenvectors */
    err = mp_esyqai_bl(n, s, ea, a, w, estw, estz, work, id, np, MPI_COMM_WORLD);
    return err;
}

/* Function for the matrix construction */
int in_AmaxU(int n, double *a, double *w, int ku, int id, int np)
{
    int i, j, ni;
    for (i = 0; i < ku * n; i++) a[i] = 0.0e0;
    ni = 0;   /* the beginning of the i-th row in the processor id */
    for (i = 0; i < n; i++) {
        for (j = i; j < n; j++) w[j] = n - j;
        w[i] -= 1.0e0;
        if (id == i % np) {
            for (j = i; j < n; j++) a[ni + j] = w[j];
            ni += n;
        }
    } /* i */
    return 0;
}
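The example stores the local rows in one contiguous block (pa) and addresses them through an array of row pointers (a). A minimal sketch of that allocation pattern, with error checks added, might look as follows; alloc_rows and free_rows are hypothetical names, not Inparlib functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates rows*cols doubles (zero-initialized) in one
   contiguous block plus an array of row pointers into it, as the example
   does for a and pa. Returns NULL on failure. */
double **alloc_rows(int rows, int cols)
{
    double **a;
    int i;
    assert(rows > 0 && cols > 0);
    a = (double **)malloc((size_t)rows * sizeof(double *));
    if (a == NULL) return NULL;
    a[0] = (double *)calloc((size_t)rows * (size_t)cols, sizeof(double));
    if (a[0] == NULL) { free(a); return NULL; }
    for (i = 1; i < rows; i++) a[i] = a[0] + (size_t)i * cols;
    return a;
}

/* Releases both the data block and the array of row pointers. */
void free_rows(double **a)
{
    if (a != NULL) { free(a[0]); free(a); }
}
```

With this layout a[i][j] and a[0][i*cols+j] refer to the same element, so the same storage can be handed over either as row pointers or as a flat array.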

Function Evp_bs_bp. Function Evp_bs_bp is intended for investigating and solving the partial standard or generalized AEVP for band symmetric positive definite matrices on a distributed-memory MIMD computer within the parallel programming environment MPI. The problem is solved by the subspace iteration method [35] and the parallel block-cyclic Cholesky algorithm [43].

Before Evp_bs_bp is called, the communicator (communication environment) consisting of the chosen number of processors must be defined, the initial data must be distributed among the processors, and the memory required for the arrays used in the program must be allocated.

Declaration of function Evp_bs_bp
#include <math.h>
#include <lin_alg.h>
#include "mpi.h"
int Evp_bs_bp(int* c, int* nitm, int n, int mA, int mB, int q, int* qi,
              double* eres, double*** pT, int crT, int szT, MPI_Comm com,
              int bls, int r1m);

c – on entry: the number of required smallest eigenvalues; on exit: the number of evaluated smallest eigenvalues;
nitm – on entry: the upper bound for the number of iterations; on exit: the number of iterations performed;
n – order of the matrices of the problem;
mA – the half-bandwidth of the matrix A (the bandwidth is 2*mA+1);
mB – the half-bandwidth of the matrix B (the bandwidth is 2*mB+1);
q – the number of vectors being iterated (the dimension of the subspace);
qi – the number of specified initial iterated vectors;
eres – on entry:
eres[0] – relative error in the specification of the elements of the matrix A;
eres[1] – relative error in the specification of the elements of the matrix B;
eres[2] – the required accuracy of the solutions;
pT – pointer to an array of pointers, where:
pT[0] – array of pointers to arrays (a two-dimensional array) containing the results of solving the AEVP;
pT[1] – two-dimensional array of the elements of the matrix A stored in the current processor according to the one-dimensional block-cyclic scheme;
pT[2] – two-dimensional array of the elements of the matrix B stored in the current processor according to the one-dimensional block-cyclic scheme;
pT[3] – pointer to the working array of pointers; pT[3][0] contains a pointer to the one-dimensional working array;
crT – the number of rows in the array pT[2], not less than 2*s*((n-1)/(p*s)+1)+(m+s+1);
szT – the size of the array pT[3][0], not less than 'szm';
com – identifier of the communicator;
bls – the number of rows in the blocks into which the matrix is partitioned;
r1m – mathematical number of the first matrix row stored in the current processor.
Returned value
Function Evp_bs_bp returns a completion code:
0 – normal completion;
< 0 – error either in the allocation of memory for working arrays or during data exchange operations;
cin-cout – with the given maximum number of iterations, fewer eigenvalues than required were evaluated that satisfy either the termination criteria of the iterative process or the reliability criteria;
>5(*c) – one of the matrices turned out not to be positive definite.


Example of using function Evp_bs_bp
Function Evp_bs_bp is used in the same manner as Slae_bsp_bp.
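The one-dimensional block-cyclic scheme mentioned for pT[1] and pT[2] can be illustrated by a short sketch. It assumes the usual convention, in which consecutive blocks of bls rows are dealt to the processors in turn starting from processor 0 and rows are numbered from 0; the exact convention of Evp_bs_bp is not stated in the text, and both helper names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helpers; rows are numbered from 0 here. Assumed convention:
   block b = row / bls is stored in processor b % np. */
int block_cyclic_owner(int row, int bls, int np)
{
    assert(row >= 0 && bls > 0 && np > 0);
    return (row / bls) % np;
}

/* Index of the row inside the storage of the processor that owns it. */
int block_cyclic_local_row(int row, int bls, int np)
{
    assert(row >= 0 && bls > 0 && np > 0);
    return (row / (bls * np)) * bls + row % bls;
}
```

With bls = 2 and np = 3, rows 0 and 1 go to processor 0, rows 2 and 3 to processor 1, rows 4 and 5 to processor 2, and rows 6 and 7 return to processor 0 as its local rows 2 and 3.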

7.2.3. Investigating and solving of systems of non-linear equations

Function zeroin. Function zeroin is intended for finding the roots of a non-linear equation with approximately given initial data by the bisection method on a distributed-memory MIMD computer within the parallel programming environment MPI.

Function zeroin employs the function f_zer, which must be written by the user.

Before zeroin is called, the communicator (communication environment) consisting of the chosen number of processors must be defined, the initial data must be distributed among the processors, and the memory required for the arrays used in the program must be allocated.

Declaration of function zeroin
#include <math.h>
#include "mpi.h"
#include "nu.h"
int zeroin(MPI_Comm com, double a, double b, double *tol, double geps,
           double *x, double m, int *n);

com – identifier of the communicator;
a – left-hand boundary of the given interval;
b – right-hand boundary of the given interval;
tol – on entry: the required accuracy of the roots of the equation; on exit: the accuracy of the evaluated roots;
geps – accuracy of the initial data specification;
x – pointer to an n-dimensional array of the evaluated roots;
m – upper bound for the maximum modulus of the derivative on the given interval;
n – on entry: the maximum number of roots being sought; on exit: the number of evaluated roots.
Returned value
Function zeroin returns a completion code with the following meanings:
0 – normal completion;
1 – no roots were found on the given interval.


Example of using function zeroin
To solve the non-linear equation

$x^3 - 6x^2 + 11x - 6 = 0$

on the interval [0.1, 4.9], the user is to write the following function:
double f_zer(double x)
{
    return x*x*x - 6*x*x + 11*x - 6;
}
A program fragment that uses function zeroin to solve this non-linear equation is given below:
#include <stdlib.h>
#include "mpi.h"
#include "nu.h"
int main(int argc, char *argv[])
{
    int info = 0, id;
    double *roots;
    /* initial data specification */
    int n = 100;
    double a = 0.1, b = 4.9;
    double tol = 1.e-10;
    double geps = 1.e-10;
    double m = 60.0;
    /* initialization of MPI */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    /* allocation of memory */
    roots = (double *)malloc(n * sizeof(double));
    /* finding of roots of non-linear equation */
    info = zeroin(MPI_COMM_WORLD, a, b, &tol, geps, roots, m, &n);
    /* shutdown of MPI */
    MPI_Finalize();
    return 0;
}
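For orientation, a serial bisection step can be sketched as follows. This is not the parallel algorithm of zeroin: the function bisect is a hypothetical helper that refines a single root inside a bracket whose endpoints give function values of opposite sign, and the brackets used with it are chosen by hand.

```c
#include <assert.h>

/* The test function from the example above. */
double f_zer(double x)
{
    return x * x * x - 6 * x * x + 11 * x - 6;
}

/* Serial bisection on [a, b]; f(a) and f(b) must have opposite signs.
   Hypothetical helper, not the parallel algorithm of zeroin. */
double bisect(double (*f)(double), double a, double b, double tol)
{
    double fa = f(a);
    assert(a < b && tol > 0.0);
    assert(fa * f(b) <= 0.0);
    while (b - a > tol) {
        double c = a + 0.5 * (b - a);
        double fc = f(c);
        if ((fa <= 0.0) == (fc <= 0.0)) {  /* same sign: root lies in [c, b] */
            a = c;
            fa = fc;
        } else {                            /* sign change: root lies in [a, c] */
            b = c;
        }
    }
    return a + 0.5 * (b - a);
}
```

The cubic above factors as (x - 1)(x - 2)(x - 3), so the brackets [0.1, 1.5], [1.5, 2.5] and [2.5, 4.9] each isolate one of its three roots on the interval [0.1, 4.9].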

Function bur. Function bur is intended for solving an SNE with approximately given initial data on a distributed-memory MIMD computer within the parallel programming environment MPI. The parallel algorithm is described in [10, 39, 40].

The evaluation of both the vector-function and the rows of the Jacobian matrix is distributed almost uniformly among the processors in use. Parallelization is carried out automatically.

Function bur employs the function f, which must be written by the user.

Before bur is called, the communicator (communication environment) consisting of the chosen number of processors must be defined, the initial data must be distributed among the processors, and the memory required for the arrays used in the program must be allocated.

Declaration of function bur
#include <math.h>
#include "mpi.h"
#include "nu.h"
int bur(MPI_Comm com, int n, int *it, double *a, double *b, double *eps,
        double *del, double *x, double *xn, double *neww);

com – identifier of the communicator;
n – order of the system of non-linear equations;
it – on entry: the maximum number of iterations; on exit: the number of iterations performed while evaluating the solution;
a, b – pointers to arrays of dimension n defining the boundaries of the region;
eps – accuracy of the obtained solution;
del – on entry: the error in the initial data specification; on exit: the accuracy of the obtained solution, taking into account the error in the initial data specification;
x – pointer to a vector of dimension n which on entry contains the vector of initial approximations and on exit contains the solution vector;
xn – pointer to a working array of dimension 5n+(3+2n)×([n/np]+1)+1, where [n/np] is the integer part of n/np and np is the number of processors in use;
neww – pointer to an array of dimension it containing the norms of the vector-function at successive iterations.
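The dimension of the working array xn can be computed by a small helper, so that the allocation does not fall below the documented bound when n is divisible by np. The name bur_work_size is hypothetical; the formula is the one given for xn above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the documented minimum dimension of xn for bur,
   5n + (3 + 2n)*([n/np] + 1) + 1, where [n/np] is the integer part. */
size_t bur_work_size(int n, int np)
{
    size_t rows;
    assert(n > 0 && np > 0);
    rows = (size_t)(n / np) + 1;
    return 5 * (size_t)n + (3 + 2 * (size_t)n) * rows + 1;
}
```

A call such as xn = (double *)malloc(bur_work_size(n, np) * sizeof(double)) then reserves 5779 elements for n = 100 and np = 4.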

Returned value
Function bur returns a completion code with the following meanings:
0 – normal completion;
1 – the iterative process cannot be continued;
3 – the solution has not been obtained in it iterations;
4 – the parallelepiped restrictions are violated (the iterate leaves the region);
5 – the Jacobian matrix is singular;
7 – the point lies on the boundary of the region;
8 – overflow during the evaluation of the norm;
-2 – insufficient size of the working array of pointers and/or the working buffer;
-3 – error in the inter-processor data exchange operations;
-4 – error in the initial data specification.


Example of using function bur
To solve the SNE

$f_i(x) \equiv \sum_{j=1}^{n} x_j - \frac{3n+1}{2} + 2x_i^2 - 2\left(1 + \frac{2i}{n} + \frac{i^2}{n^2}\right) = 0$, i = 1, 2, ..., n,

in the region [-1000, 1000] the user is to write the function
void f(int n, int l, int m, double *x, double *y, double *fv)
{
    double ss, sss;
    int i, j;
    /* components l, ..., m-1 are written to y[0], ..., y[m-l-1] */
    for (i = l; i < m; i++) {
        ss = 0.e0;
        sss = (double)(i + 1) / n;
        for (j = 0; j < n; j++) ss += x[j];
        y[i - l] = ss - 0.5e0 * (3.e0 * n + 1.e0) + 2.e0 * x[i] * x[i]
                   - 2.e0 * (1.e0 + 2.e0 * sss + sss * sss);
    }
    return;
}
A program fragment that uses function bur to solve this system of non-linear equations is given below:
int main(int argc, char **argv)
{
    int i, info = 0, it, n, nn, id, np;
    double eps, del, *a, *b, *neww, *x, *xn;
    /* initialization */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    /* initial data specification */
    it = 100;
    n = 100;
    /* allocation of memory */
    nn = n / np + 1;   /* [n/np]+1, as required for the dimension of xn */
    a = (double *)malloc(n * sizeof(double));
    b = (double *)malloc(n * sizeof(double));
    x = (double *)malloc(n * sizeof(double));
    xn = (double *)malloc((5 * n + 3 * nn + 2 * nn * n + 1) * sizeof(double));
    neww = (double *)malloc(it * sizeof(double));
    /* initial data specification */
    for (i = 0; i < n; i++) { x[i] = -1.e0; a[i] = -1000.e0; b[i] = 1000.e0; }
    /* the solving of system of non-linear equations by Burdakov method */
    info = bur(MPI_COMM_WORLD, n, &it, a, b, &eps, &del, x, xn, neww);
    /* shutdown of MPI */
    MPI_Finalize();
    return 0;
}
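A returned vector can be checked serially by evaluating the vector-function once more. In the sketch below, f repeats the user function of the example (with the loop closed), and max_residual is a hypothetical helper. Substitution shows that x_i = 1 + i/n, i = 1, ..., n, satisfies the system as coded in f, so it serves as a test vector here; whether an iteration started from x_i = -1 converges to this particular solution is not claimed.

```c
#include <assert.h>
#include <stddef.h>

/* Serial copy of the user function from the example: components l..m-1 of
   the vector-function are written to y[0..m-l-1]. */
void f(int n, int l, int m, double *x, double *y, double *fv)
{
    double ss, sss;
    int i, j;
    (void)fv;  /* not used in this example */
    for (i = l; i < m; i++) {
        ss = 0.e0;
        sss = (double)(i + 1) / n;
        for (j = 0; j < n; j++) ss += x[j];
        y[i - l] = ss - 0.5e0 * (3.e0 * n + 1.e0) + 2.e0 * x[i] * x[i]
                   - 2.e0 * (1.e0 + 2.e0 * sss + sss * sss);
    }
}

/* Hypothetical helper: largest |f_i(x)| over all n components;
   y is a work array of dimension n. */
double max_residual(int n, double *x, double *y)
{
    double worst = 0.0;
    int i;
    assert(n > 0);
    f(n, 0, n, x, y, NULL);
    for (i = 0; i < n; i++) {
        double r = (y[i] < 0.0) ? -y[i] : y[i];
        if (r > worst) worst = r;
    }
    return worst;
}
```

In a parallel run each processor would call f only for its own range l..m-1; the serial call with l = 0 and m = n is used here purely for checking.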

Function kn. Function kn is intended for solving an SNE with approximately given initial data on a distributed-memory MIMD computer within the parallel programming environment MPI. The parallel algorithm is described in [10, 39, 40].

Function kn employs the function f, which must be written by the user. An example of this function is given in the description of function bur.

Before kn is called, the communicator (communication environment) consisting of the chosen number of processors must be defined, the initial data must be distributed among the processors, and the memory required for the arrays used in the program must be allocated.

Declaration of function kn
#include <math.h>
#include "mpi.h"
#include "nu.h"
int kn(MPI_Comm com, int n, int *it, double *a, double *b, double *eps,
       double *del, double *x, double *xn, double *neww);









Returned value
Function kn returns a completion code with the following meanings:
0 – normal completion;
3 – the solution has not been obtained in it iterations;
5 – the Jacobian matrix is singular;
7 – the point lies on the boundary of the region;
8 – overflow during the evaluation of the norm;
-2 – insufficient size of either working array of pointers



Example of using function kn
A program fragment that uses function kn to solve the above-mentioned SNE is given below:
int main(int argc, char **argv)
{
    int i, info = 0, it, n, nn, id, np;
    double eps, del, *a, *b, *neww, *x, *xn;
    /* initialization */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    /* initial data specification */
    it = 100; n = 100;
    /* allocation of memory */
    nn = n / np; if (n % np != 0) nn = nn + 1;
    a = (double *)malloc(n * sizeof(double));
    b = (double *)malloc(n * sizeof(double));
    x = (double *)malloc(n * sizeof(double));
    xn = (double *)malloc((7 * n + 4 * nn + 2 * nn * n) * sizeof(double));
    neww = (double *)malloc(it * sizeof(double));
    /* initial data specification */
    for (i = 0; i < n; i++) { x[i] = -1.e0; a[i] = -1000.e0; b[i] = 1000.e0; }
    /* the solving of system of non-linear equations by Dennis-More method */
    info = kn(MPI_COMM_WORLD, n, &it, a, b, &eps, &del, x, xn, neww);
    /* shutdown of MPI */
    MPI_Finalize();
    return 0;
}

Function nut. Function nut is intended for solving an SNE with approximately given initial data on a distributed-memory MIMD computer within the parallel programming environment MPI. The parallel algorithm is described in [10, 39, 40].

Function nut employs the function f, which must be written by the user. An example of this function is given in the description of function bur.

Before nut is called, the communicator (communication environment) consisting of the chosen number of processors must be defined, the initial data must be distributed among the processors, and the memory required for the arrays used in the program must be allocated.

Declaration of function nut
#include <math.h>
#include "mpi.h"
#include "nu.h"
int nut(MPI_Comm com, int n, int *it, double *a, double *b, double *eps,
        double *del, double *x, double *xn, double *neww);




eps – accuracy of the obtained solution;
del – on entry: the error in the initial data specification; on exit: the accuracy of the obtained solution, taking into account the error in the initial data specification;




Returned value
Function nut returns a completion code with the following meanings:

3 – the solution has not been obtained in it iterations;
5 – the Jacobian matrix is singular;
7 – the point lies on the boundary of the region;
8 – overflow during the evaluation of the norm;
-2 – insufficient size of either working array of pointers



Example of using function nut
A program fragment that uses function nut to solve the above-mentioned SNE is given below:
int main(int argc, char **argv)
{
    int i, info = 0, it, n, nn, id, np;
    double eps, del, *a, *b, *neww, *x, *xn;
    /* initialization */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    /* initial data specification */
    it = 100;
    n = 100;
    /* allocation of memory */
    nn = n / np; if (n % np != 0) nn = nn + 1;
    a = (double *)malloc(n * sizeof(double));
    b = (double *)malloc(n * sizeof(double));
    x = (double *)malloc(n * sizeof(double));
    xn = (double *)malloc((5 * n + 3 * nn + 2 * nn * n + 1) * sizeof(double));
    neww = (double *)malloc(it * sizeof(double));
    /* initial data specification */
    for (i = 0; i < n; i++) { x[i] = -1.e0; a[i] = -1000.e0; b[i] = 1000.e0; }
    /* the solving of system of non-linear equations by Newton method */
    info = nut(MPI_COMM_WORLD, n, &it, a, b, &eps, &del, x, xn, neww);
    /* shutdown of MPI */
    MPI_Finalize();
    return 0;
}

Function fiben. Function fiben is intended for solving an SNE with approximately given initial data by the first Broyden method on a distributed-memory MIMD computer within the parallel programming environment MPI. The parallel algorithm is described in [10, 39, 40].

Function fiben employs the function f, which must be written by the user. An example of this function is given in the description of function bur.

Before fiben is called, the communicator (communication environment) consisting of the chosen number of processors must be defined, the initial data must be distributed among the processors, and the memory required for the arrays used in the program must be allocated.

Declaration of function fiben
#include <math.h>
#include "mpi.h"
#include "nu.h"
int fiben(MPI_Comm com, int n, int *it, double *a,


com – identifier of the communicator;
n – order of the system of non-linear equations;
it – on entry: the maximum number of iterations; on exit: the number of iterations performed while evaluating the solution;
a, b – pointers to arrays of dimension n defining the boundaries of the region;
eps – accuracy of the obtained solution;
del – on entry: the error in the initial data specification; on exit: the accuracy of the obtained solution, taking into account the error in the initial data specification;
x – pointer to a vector of dimension n which on entry contains the vector of initial approximations and on exit contains the solution vector;



Returned value
Function fiben returns a completion code with the following meanings:
0 – normal completion;
3 – the solution has not been obtained in it iterations;
4 – the parallelepiped restrictions are violated (the iterate leaves the region);
5 – the Jacobian matrix is singular;
7 – the point lies on the boundary of the region;
-2 – insufficient size of either working array of pointers



Example of using function fiben
A program fragment that uses function fiben to solve the above-mentioned SNE is given below:
int main(int argc, char **argv)
{
    int i, info = 0, it, n, nn, id, np;
    double eps, del, *a, *b, *neww, *x, *xn;
    /* initialization */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    /* initial data specification */
    it = 100;
    n = 100;
    /* allocation of memory */
    nn = n / np; if (n % np != 0) nn = nn + 1;
    a = (double *)malloc(n * sizeof(double));
    b = (double *)malloc(n * sizeof(double));
    x = (double *)malloc(n * sizeof(double));
    xn = (double *)malloc((6 * n + 3 * nn + 2 * nn * n + 1) * sizeof(double));
    neww = (double *)malloc(it * sizeof(double));
    /* initial data specification */
    for (i = 0; i < n; i++) { x[i] = -1.e0; a[i] = -1000.e0; b[i] = 1000.e0; }
    /* the solving of system of non-linear equations by Broyden method */
    info = fiben(MPI_COMM_WORLD, n, &it, a, b, &eps, &del, x, xn, neww);
    /* shutdown of MPI */
    MPI_Finalize();
    return 0;
}

Function paul. Function paul is intended for solving an SNE with approximately given initial data by Powell's symmetric version of the Broyden method on a distributed-memory MIMD computer within the parallel programming environment MPI. The parallel algorithm is described in [10, 39, 40].

Function paul employs the function f, which must be written by the user. An example of this function is given in the description of function bur.

Before paul is called, the communicator (communication environment) consisting of the chosen number of processors must be defined, the initial data must be distributed among the processors, and the memory required for the arrays used in the program must be allocated.

Declaration of function paul
#include <math.h>
#include "mpi.h"
#include "nu.h"
int paul(MPI_Comm com, int n, int *it, double *a,





x – pointer to a vector of dimension n which on entry contains the vector of initial approximations and on exit contains the solution vector;



Returned value
Function paul returns a completion code with the following meanings:

3 – the solution has not been obtained in it iterations;
4 – the parallelepiped restrictions are violated (the iterate leaves the region);
5 – the Jacobian matrix is singular;
8 – overflow during the evaluation of the norm;
-2 – insufficient size of either working array of pointers



Example of using function paul
A program fragment that uses function paul to solve the above-mentioned SNE is given below:
int main(int argc, char **argv)
{
    int i, info = 0, it, n, nn, id, np;
    double eps, del, *a, *b, *neww, *x, *xn;
    /* initialization */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    /* initial data specification */
    it = 100; n = 100;
    /* allocation of memory */
    nn = n / np; if (n % np != 0) nn = nn + 1;
    a = (double *)malloc(n * sizeof(double));
    b = (double *)malloc(n * sizeof(double));
    x = (double *)malloc(n * sizeof(double));
    xn = (double *)malloc((6 * n + 3 * nn + 2 * nn * n + 1) * sizeof(double));
    neww = (double *)malloc(it * sizeof(double));
    /* initial data specification */
    for (i = 0; i < n; i++) { x[i] = -1.e0; a[i] = -1000.e0; b[i] = 1000.e0; }
    /* the solving of system of non-linear equations by Powell method */
    info = paul(MPI_COMM_WORLD, n, &it, a, b, &eps, &del, x, xn, neww);
    /* shutdown of MPI */
    MPI_Finalize();
    return 0;
}

7.2.4. Investigating and solving of systems of ordinary differential equations

Function ek_dri. Function ek_dri is intended for solving an SODE with approximately given initial data by the first-order Euler-Cauchy method on a distributed-memory MIMD computer within the parallel programming environment MPI. The parallel algorithm is described in [10, 11, 22, 32].

While solving the SODE, function ek_dri employs the function diffun, which must be written by the user.

The evaluation of the vector-function is distributed almost uniformly among the processors in use. Parallelization is carried out automatically.

Before ek_dri is called, the communicator (communication environment) consisting of the chosen number of processors must be defined, the initial data must be distributed among the processors, and the memory required for the arrays used in the program must be allocated.

Declaration of function ek_dri
#include <math.h>
#include "mpi.h"
#include "ode1.h"
int ek_dri(MPI_Comm com, int id, int np, int n, int mt, double *t0,
           double tfin, double eps, double delta1, double delta2,
           double *lip, double *delta, double *y0, double *tout,
           void (*diffun)());

com – identifier of the communicator;
id – the number of the processor;
np – the number of processors;
n – order of the SODE;
mt – the number of output points;
t0 – initial point of the integration interval;
tfin – final point of the integration interval;
eps – the required accuracy of the solution;
delta1 – error in the specification of the initial values of the solution vector;
delta2 – error in the specification of the right-hand sides;
lip – the Lipschitz constant;
delta – accuracy of the obtained solution, taking into account the errors in the initial data specification;
y0 – pointer to an array of dimension n which on entry contains the initial values of the solution vector and on exit contains the values of the solution vector at the output point;
tout – pointer to an array of dimension mt containing the output points of the solution;
diffun – pointer to the function for the evaluation of the right-hand sides.

Returned value
Function ek_dri returns a completion code with the following meanings:
1 – normal completion;
2 – the requested accuracy is too high and cannot be attained;
3 – the integration step obtained is too small;
4 – error in the initial data specification;
5 – memory allocation error.

If the completion code differs from one, the function sends a message describing the reason for the failure to stdout.

Example of using function ek_dri
To solve the SODE

$\frac{dy_i}{dt} = -y_i - \sum_{k=0}^{n-1} y_k + (1+t)\,n + t + 2$, i = 0, 1, ..., n-1,

on the interval [0.0, 0.4] the user is to write the function
void diffun(int n, int l, int m, double t, double *y, double *f)
{
    int i, k;
    double s, s1;
    /* components l, ..., m-1 of the right-hand side are written to f[0], ..., f[m-l-1] */
    for (i = l; i < m; i++) {
        s = 0.e0;
        for (k = 0; k < n; k++) s -= y[k];
        s1 = (1.0 + t) * n;
        s += s1 + t + 2.0;
        f[i - l] = s - y[i];
    }
}

A program fragment that uses function ek_dri to solve this SODE is given below:
int main(int argc, char **argv)
{
    int i, n, mt, id, np, iflag;
    double t0, tfin, eps, delta1, delta2, lip, delta;
    double *y0, *tout;
    /* initialization of MPI */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    /* initial data specification */
    n = 4000; mt = 4; t0 = 0.0; tfin = 0.4; eps = 0.000001;
    delta1 = 1e-15; delta2 = 1e-15;
    /* allocation of memory */
    y0 = (double *)malloc(n * sizeof(double));
    tout = (double *)malloc(mt * sizeof(double));
    /* initial data specification */
    tout[0] = 0.1; tout[1] = 0.2; tout[2] = 0.3; tout[3] = 0.4;
    for (i = 0; i < n; i++) y0[i] = 1.0;
    /* the solving of SODE by EULER-CAUCHY method */
    iflag = ek_dri(MPI_COMM_WORLD, id, np, n, mt, &t0, tfin, eps,
                   delta1, delta2, &lip, &delta, y0, tout, diffun);
    /* shutdown of MPI */
    MPI_Finalize();
    return 0;
}
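The example system is convenient for checking results: with the initial values y_i(0) = 1 used above, substitution shows that y_i(t) = 1 + t satisfies every equation. The sketch below integrates the system serially by the explicit Euler method with a fixed step. It is only an illustration under these assumptions: rhs and euler_integrate are hypothetical names, and the step-size control and error estimation of ek_dri are not reproduced.

```c
#include <assert.h>

/* Right-hand side of the example system for all n components:
   f_i = -y_i - sum_k y_k + (1 + t)*n + t + 2. */
void rhs(int n, double t, const double *y, double *f)
{
    double s = 0.0;
    int i;
    for (i = 0; i < n; i++) s += y[i];
    for (i = 0; i < n; i++) f[i] = -y[i] - s + (1.0 + t) * n + t + 2.0;
}

/* Hypothetical serial driver: explicit Euler with a fixed step h.
   Performs "steps" steps starting from t0; y is overwritten with the
   solution, f is a work array of dimension n. Returns the final time. */
double euler_integrate(int n, double t0, double h, int steps,
                       double *y, double *f)
{
    double t = t0;
    int i, k;
    assert(n > 0 && h > 0.0 && steps >= 0);
    for (k = 0; k < steps; k++) {
        rhs(n, t, y, f);
        for (i = 0; i < n; i++) y[i] += h * f[i];
        t = t0 + (k + 1) * h;
    }
    return t;
}
```

For n = 4, h = 0.01 and 40 steps the computed values agree with 1 + t = 1.4 at t = 0.4 up to rounding, because the exact solution is linear in t. For large n the system becomes stiff for a fixed-step explicit method (the step must stay well below 2/(n+1)), which is one reason a library routine controls the step itself.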

Function rk4_dri. Function rk4_dri is intended for solving an SODE with approximately given initial data by an explicit fourth-order Runge-Kutta method on a distributed-memory MIMD computer within the parallel programming environment MPI. The parallel algorithm is described in [10, 11, 22, 32].
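As a reminder of what one step of an explicit fourth-order Runge-Kutta method looks like, the classical scheme for a scalar equation is sketched below. The text does not say which fourth-order scheme rk4_dri uses, so the classical coefficients are an assumption; rk4_step and decay are hypothetical names.

```c
#include <assert.h>

/* One step of the classical fourth-order Runge-Kutta method for a scalar
   equation y' = f(t, y). Hypothetical helper; rk4_dri may use a different
   fourth-order scheme and works on systems distributed over processors. */
double rk4_step(double (*f)(double, double), double t, double y, double h)
{
    double k1, k2, k3, k4;
    assert(h > 0.0);
    k1 = f(t, y);
    k2 = f(t + 0.5 * h, y + 0.5 * h * k1);
    k3 = f(t + 0.5 * h, y + 0.5 * h * k2);
    k4 = f(t + h, y + h * k3);
    return y + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0;
}

/* Test equation y' = -y with the exact solution y(t) = exp(-t). */
double decay(double t, double y)
{
    (void)t;
    return -y;
}
```

One step of length 0.1 from y(0) = 1 gives 0.9048375, while the exact value exp(-0.1) is 0.904837418 to nine digits.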

While solving the SODE, function rk4_dri employs the function diffun, which must be written by the user. An example of this function is given in the description of function ek_dri.

Before rk4_dri is called, the communicator (communication environment) consisting of the chosen number of processors must be defined, the initial data must be distributed among the processors, and the memory required for the arrays used in the program must be allocated.

Declaration of function rk4_dri
#include <math.h>
#include "mpi.h"
#include "ode1.h"
int rk4_dri(MPI_Comm com, int id, int np, int n, int mt, double *t0,
            double tfin, double eps, double delta1, double delta2,
            double *lip, double *delta, double *y0, double *tout,
            void (*diffun)());







Returned value
Function rk4_dri returns a value of the completion code with




Example of using function rk4_dri
A program fragment that uses function rk4_dri to solve the above-mentioned system of ordinary differential equations is given below:
int main(int argc, char **argv)
{
    int i, n, mt, id, np, ne, iflag;
    double t0, tfin, eps, delta1, delta2, lip, delta;
    double *y0, *tout;
    FILE *fi;

    /* initial data specification */
    n = 4000; mt = 4; t0 = 0.0; tfin = 0.4; eps = 0.000001;
    delta1 = 1e-15; delta2 = 1e-15;

    for (i = 0; i < n; i++) y0[i] = 1.0;

    /* the solving of SODE by 4-th order RUNGE-KUTTA method */
    iflag = rk4_dri(MPI_COMM_WORLD, id, np, n, mt, &t0, tfin, eps,
                    delta1, delta2, &lip, &delta, y0, tout, diffun);

    /* shutdown of MPI */
    MPI_Finalize();
}

Function rk6_dri. Function rk6_dri is intended for solving an SODE with approximately given initial data by an explicit 5(6)-th order Runge-Kutta method on a distributed-memory MIMD computer within the parallel programming environment MPI. The parallel algorithm is described in [10, 11, 22, 32].

While solving the SODE, function rk6_dri employs the function diffun, which must be written by the user. An example of this function is given in the description of function ek_dri.

Before rk6_dri is called, the communicator (communication environment) consisting of the chosen number of processors must be defined, the initial data must be distributed among the processors, and the memory required for the arrays used in the program must be allocated.

Declaration of function rk6_dri
#include <math.h>
#include "mpi.h"
#include "ode1.h"
int rk6_dri(MPI_Comm com, int id, int np, int n, int mt, double *t0,
            double tfin, double eps, double delta1, double delta2,
            double *lip, double *delta, double *y0, double *tout,
            void (*diffun)());

com – identifier of communicator;id – the number of processor;np – the number of processors;n – order of SODE;mt – the number of output points;

202

t0 – initial point of the integration interval;tfin – final point of the integration interval;eps – the required accuracy in the solution;delta1 – error in the specification of initial values of the solution


errors in the initial data specification;y0 – pointer to array of dimension n which on entry:contains








Example of using function rk6_dri

A segment of a program using the function rk6_dri for solving the above-mentioned system of ordinary differential equations is given below:

    int main(int argc, char **argv)
    {
        int i, n, mt, id, np, ne, iflag;
        double t0, tfin, eps, delta1, delta2, lip, delta;
        double *y0, *tout;
        FILE *fi

        // initial data specification
        n = 4000; mt = 4; t0 = 0.0; tfin = 0.4; eps = 0.000001;
        delta1 = 1e-15; delta2 = 1e-15;

        for (i = 0; i < n; i++) y0[i] = 1.0;

        // the solving of SODE by 6-th order RUNGE-KUTTA method
        iflag = rk6_dri(MPI_COMM_WORLD, id, np, n, mt, &t0, tfin, eps,
                        delta1, delta2, &lip, &delta, y0, tout, diffun);

        MPI_Finalize();
    }

Function adams_dri. Function adams_dri is intended for solving SODE with approximately given initial data by the explicit 12-th order Adams method on a distributed memory MIMD-computer within the parallel programming environment MPI. The parallel algorithm is described in [10, 11, 22, 32].

In the process of solving the SODE, function adams_dri employs the function diffun, which is to be written by the user. An example of this function's specification is given in the description of function ek_dri.


Prior to the call of the function adams_dri, the communicator (communications environment) consisting of the chosen number of processors is to be determined, the initial data are to be distributed between processors, and the required volume of memory for arrays used in the program is to be allocated.

Declaration of function adams_dri

    #include <math.h>
    #include "mpi.h"
    #include "ode1.h"
    int adams_dri(MPI_Comm com, int id, int np, int n, int mt,
                  double t0, double tfin, double eps,
                  double delta1, double delta2, double *lip, double *delta,
                  double *y0, double **w1, double *tout, void (*diffun)());





w1 – pointer to two-dimensional working array;
tout – pointer to the array of dimension mt containing output points of the solution;
diffun – pointer to the function for the evaluation of right-hand sides.

Returned value

Function adams_dri returns a value of the completion code


1 – successful completion of integration;
–1 – integration was interrupted because the error test failed even after decreasing the integration step size;
–2 – integration was interrupted because too high accuracy was requested;
–3 – integration was interrupted because the corrector's convergence has not been attained;
–4 – illegal specification of input parameters;
–5 – memory allocation error.
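A calling program will usually want to report the completion code in readable form. The sketch below maps the codes listed above to messages; the helper names adams_dri_status_message and adams_dri_succeeded are hypothetical and are not part of Inparlib.

```c
/* Hypothetical helper: text for a completion code of adams_dri,
   following the list of codes given in the function description. */
const char *adams_dri_status_message(int code)
{
    switch (code) {
    case 1:  return "successful completion of integration";
    case -1: return "error test failed even after decreasing the step size";
    case -2: return "too high accuracy was requested";
    case -3: return "corrector convergence has not been attained";
    case -4: return "illegal specification of input parameters";
    case -5: return "memory allocation error";
    default: return "unknown completion code";
    }
}

/* Non-zero only for the code that signals successful integration. */
int adams_dri_succeeded(int code)
{
    return code == 1;
}
```

A typical use would be to print adams_dri_status_message(iflag) on the processor with id equal to 0 when adams_dri_succeeded(iflag) is zero.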


Example of using function adams_dri

A segment of a program using the function adams_dri for solving the above-mentioned system of ordinary differential equations is given below:

    int main(int argc, char **argv)
    {
        int i, n, mt, id, np, ne, iflag;
        double t0, tfin, eps, delta1, delta2, lip, delta;
        double *y0, *tout;
        double **w1;
        FILE *fi

        // initial data specification
        n = 4000; mt = 4; t0 = 0.0; tfin = 0.4; eps = 0.000001;
        delta1 = 1e-15; delta2 = 1e-15;

        tout = (double *)malloc(mt * sizeof(double));
        w1 = (double **)malloc(13 * sizeof(double *));

        // initial data specification
        tout[0] = 0.1; tout[1] = 0.2; tout[2] = 0.3; tout[3] = 0.4;

        for (i = 0; i < n; i++) y0[i] = 1.0;

        // the solving of SODE by 12-th order ADAMS methods
        iflag = adams_dri(MPI_COMM_WORLD, id, np, n, mt, t0, tfin, eps,
                          delta1, delta2, &lip, &delta, y0, w1, tout, diffun);

        // shutdown of MPI
        MPI_Finalize();
    }
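The segment above allocates only the 13 row pointers of the two-dimensional working array; the storage for the rows themselves is not shown. A minimal sketch of one common way to allocate such an array is given below. The helpers alloc_matrix and free_matrix are hypothetical, and the row length that adams_dri actually requires is not stated in this section, so the parameter cols is an assumption to be taken from the library documentation.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates a rows-by-cols array of doubles as one
   contiguous zero-initialized block plus a table of row pointers.
   Returns NULL if the sizes are zero or if memory cannot be allocated. */
double **alloc_matrix(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0) return NULL;
    double **m = malloc(rows * sizeof *m);
    if (m == NULL) return NULL;
    m[0] = calloc(rows * cols, sizeof **m);
    if (m[0] == NULL) { free(m); return NULL; }
    for (size_t i = 1; i < rows; i++)
        m[i] = m[0] + i * cols;  /* row i starts i*cols doubles into the block */
    return m;
}

/* Releases an array obtained from alloc_matrix; NULL is accepted. */
void free_matrix(double **m)
{
    if (m != NULL) { free(m[0]); free(m); }
}
```

Keeping all rows in one block means that a single pair of calls releases the array and that a failed allocation can be reported through the memory allocation completion code.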

Function rk1_dri. Function rk1_dri is intended for solving SODE with approximately given initial data by the explicit 1-st order Runge-Kutta-type method on a distributed memory MIMD-computer within the parallel programming environment MPI. The parallel algorithm is described in [10, 11, 22, 32].

In the process of solving the SODE, function rk1_dri employs the function diffun, which is to be written by the user. An example of this function's specification is given in the description of function ek_dri.


Prior to the call of the function rk1_dri, the communicator (communications environment) consisting of the chosen number of processors is to be determined, the initial data are to be distributed between processors, and the required volume of memory for arrays used in the program is to be allocated.

Declaration of function rk1_dri

    #include <math.h>
    #include "mpi.h"
    #include "ode1.h"
    int rk1_dri(MPI_Comm com, int id, int np, int n, int mt,
                double *t0, double tfin, double eps,
                double delta1, double delta2, double *lip,
                double *delta, double *y0, double *tout, void (*diffun)());









1 – successful completion of integration;
2 – too small a step of integration has been obtained, while the required accuracy has not been attained;
3 – initial data specification error;
4 – too high accuracy is requested which cannot be attained;
5 – memory allocation error.


Example of using function rk1_dri

A segment of a program using the function rk1_dri for solving the above-mentioned system of ordinary differential equations is given below:

    int main(int argc, char **argv)
    {
        int i, n, mt, id, np, ne, iflag;
        double t0, tfin, eps, delta1, delta2, lip, delta;
        double *y0, *tout;
        FILE *fi

        // initial data specification
        n = 4000; mt = 4; t0 = 0.0; tfin = 0.4; eps = 0.000001;
        delta1 = 1e-15; delta2 = 1e-15;

        for (i = 0; i < n; i++) y0[i] = 1.0;

        // the solving of SODE by 1-st order explicit
        // RUNGE-KUTTA-type method
        iflag = rk1_dri(MPI_COMM_WORLD, id, np, n, mt, &t0, tfin, eps,
                        delta1, delta2, &lip, &delta, y0, tout, diffun);

        MPI_Finalize();
    }
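For orientation, the simplest explicit first-order method of Runge-Kutta type is the classical Euler scheme y ← y + h f(t, y). The serial sketch below shows it with a fixed step. It is not the parallel algorithm of rk1_dri, which also controls the step and the accuracy, and whether rk1_dri uses exactly this scheme is not stated here. The callback type rhs_fn is an assumed signature chosen for this sketch only; the real signature of diffun is given in the description of ek_dri.

```c
#include <stddef.h>

/* Assumed right-hand-side signature, used in this sketch only. */
typedef void (*rhs_fn)(double t, const double *y, double *dydt, size_t n);

/* One explicit Euler step: y <- y + h * f(t, y). work has length n. */
void euler_step(rhs_fn f, double t, double h,
                double *y, double *work, size_t n)
{
    f(t, y, work, n);
    for (size_t i = 0; i < n; i++)
        y[i] += h * work[i];
}

/* Integrates from t0 to tfin with `steps` equal steps (steps > 0). */
void euler_integrate(rhs_fn f, double t0, double tfin, size_t steps,
                     double *y, double *work, size_t n)
{
    double h = (tfin - t0) / (double)steps;
    for (size_t k = 0; k < steps; k++)
        euler_step(f, t0 + (double)k * h, h, y, work, n);
}

/* Example right-hand side: the decoupled system y_i' = -y_i. */
void decay_rhs(double t, const double *y, double *dydt, size_t n)
{
    (void)t;  /* the system is autonomous */
    for (size_t i = 0; i < n; i++)
        dydt[i] = -y[i];
}
```

With y(0) = 1, four steps of size 0.1 give 0.9 to the fourth power, about 0.6561, against the exact value exp(-0.4), about 0.6703. This first-order error is what an accuracy-controlled routine has to reduce by refining the step.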

Function gear_dri. Function gear_dri is intended for solving SODE with approximately given initial data by the explicit 5-th order Gear method on a distributed memory MIMD-computer within the parallel programming environment MPI. The parallel algorithm is described in [10, 11, 22, 32].

In the process of solving the SODE, function gear_dri employs the function diffun, which is to be written by the user. An example of this function's specification is given in the description of function ek_dri.

Prior to the call of the function gear_dri, the communicator (communications environment) consisting of the chosen number of processors is to be determined, the initial data are to be distributed between processors, and the required volume of memory for arrays used in the program is to be allocated.

Declaration of function gear_dri

    #include <math.h>
    #include "mpi.h"
    #include "ode1.h"
    int gear_dri(MPI_Comm com, int id, int np, int n, int mt,
                 double t0, double tfin, double eps,
                 double delta1, double delta2, double *lip, double *delta,
                 double *y0, double **w1, double *tout, void (*diffun)());





w1 – pointer to two-dimensional working array;


Returned value

Function gear_dri returns a value of the completion code with


1 – successful completion of integration;
–1 – integration was interrupted because the error test failed even after decreasing the integration step size;
–2 – integration was interrupted because too high accuracy was requested;
–3 – integration was interrupted because the corrector's convergence has not been attained;
–4 – illegal specification of input parameters;
–5 – memory allocation error.


Example of using function gear_dri

A segment of a program using the function gear_dri for solving the above-mentioned system of ordinary differential equations is given below:

    int main(int argc, char **argv)
    {
        int i, n, mt, id, np, ne, iflag;
        double t0, tfin, eps, delta1, delta2, lip, delta;
        double *y0, *tout;
        double **w1;
        FILE *fi

        // initial data specification
        n = 4000; mt = 4; t0 = 0.0; tfin = 0.4; eps = 0.000001;
        delta1 = 1e-15; delta2 = 1e-15;

        // allocation of memory
        y0 = (double *)malloc(n * sizeof(double));
        tout = (double *)malloc(mt * sizeof(double));
        w1 = (double **)malloc(13 * sizeof(double *));

        for (i = 0; i < n; i++) y0[i] = 1.0;

        // the solving of SODE by 5-th order GEAR methods
        iflag = gear_dri(MPI_COMM_WORLD, id, np, n, mt, t0, tfin, eps,
                         delta1, delta2, &lip, &delta, y0, w1, tout, diffun);

        // shutdown of MPI
        MPI_Finalize();
    }

Function ros_dri. Function ros_dri is intended for solving SODE with approximately given initial data by the explicit 4-th order Rosenbrock method on a distributed memory MIMD-computer within the parallel programming environment MPI. The parallel algorithm is described in [10, 11, 22, 32].

In the process of solving the SODE, function ros_dri employs the function diffun, which is to be written by the user. An example of this function's specification is given in the description of function ek_dri.

Prior to the call of the function ros_dri, the communicator (communications environment) consisting of the chosen number of processors is to be determined, the initial data are to be distributed between processors, and the required volume of memory for arrays used in the program is to be allocated.

Declaration of function ros_dri

    #include <math.h>
    #include "mpi.h"
    #include "ode1.h"
    int ros_dri(MPI_Comm com, int id, int np, int n, int mt,
                double *t0, double tfin, double eps,
                double delta1, double delta2, double *lip, double *delta,
                double *y0, double **a, double *tout, void (*diffun)());





a – pointer to two-dimensional working array;
tout – pointer to the array of dimension mt containing output points of the solution;
diffun – pointer to the function for the evaluation of right-hand sides.

Returned value

Function ros_dri returns a value of the completion code with


1 – successful completion of integration;
2 – too small a step of integration is obtained, while the required accuracy has not been attained;
3 – initial data specification error;
4 – too high accuracy is requested which cannot be attained;
5 – memory allocation error.


Example of using function ros_dri

A segment of a program using the function ros_dri for solving the above-mentioned system of ordinary differential equations is given below:

    int main(int argc, char **argv)
    {
        int i, n, mt, id, np, ne, ne1, mmm, iflag;
        double t0, tfin, eps, delta1, delta2, lip, delta;
        double *y0, *tout;
        double **a;
        FILE *fi

        // initialization of MPI
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        MPI_Comm_size(MPI_COMM_WORLD, &np);  // np is needed below

        // initial data specification
        n = 4000; mt = 4; t0 = 0.0; tfin = 0.4; eps = 0.000001;
        delta1 = 1e-15; delta2 = 1e-15;
        ne1 = n / np;
        if (n % np == 0) { ne = ne1; mmm = ne; } else mmm = ne1 + 1;

        tout = (double *)malloc(mt * sizeof(double));
        a = (double **)malloc(mmm * sizeof(double *));

        for (i = 0; i < n; i++) y0[i] = 1.0;

        // the solving of SODE by 4-th order ROSENBROCK method
        iflag = ros_dri(MPI_COMM_WORLD, id, np, n, mt, &t0, tfin, eps,
                        delta1, delta2, &lip, &delta, y0, a, tout, diffun);

        // shutdown of MPI
        MPI_Finalize();
    }


Chapter 8. EMPLOYING OF THE INTELLIGENT SOFTWARE FOR THE SOLVING OF APPLICATION PROBLEMS

8.1. Branches in which problems modeling the behavior of complicated structures and constructions arise and are stated

8.1.1. Branches in which problems arise. Problems of calculating the strength of structures and constructions arise in many branches of the national economy, including:

construction;
aircraft building;
shipbuilding;
missile building;
motor building;
other branches of machine building.

Increasing requirements for the quality of design solutions and the use of new construction materials create the need to solve entirely new problems as well as to carry out calculations of unique structures. There is a growing demand for new methods and approaches to constructing and investigating correct computer models which adequately reflect the real behavior of structures.

The program complex (PC) LIRA can be used for solving these problems [9, 12, 14, 47]. This PC can also be employed for strength calculations in other branches by adjusting its interface to the appropriate subject area.

8.1.2. Mathematical statements of problems. Problems of calculating the strength of structures by means of the principle of possible displacements can be mathematically formulated as the following variational problems [49].

It is required to find a vector-function $u \in U_0$ which, for any vector-function $v \in U_0$ (any possible displacement), satisfies the integral identity:


for the static problem

$a(u,v) = l(f,v)$; (8.1)

for the dynamic problem

$a(u,v) + b(\ddot{u},v) + c(\dot{u},v) = l(f,v)$, (8.2)

$u(t_0) = u^{(0)}$, $\dot{u}(t_0) = u^{(1)}$; (8.3)

for the problem on eigen-oscillations

$a(u,v) = \lambda\, b(u,v)$, (8.4)

where $U_0$ is the infinite-dimensional functional space of possible displacements; the symmetric bilinear forms $a(u,v)$, $b(\ddot{u},v)$, $c(\dot{u},v)$ are proportional to the potential deformation energy, the kinetic energy and the braking energy, respectively, while the linear form $l(f,v)$ is proportional to the action of the applied (external) forces under loading; $\dot{u}$ is the first derivative of the vector-function $u$ with respect to time, and $\ddot{u}$ is the second derivative.

Here we consider only the linear problem, on the assumption that solving a non-linear problem can be reduced to solving a sequence of linear problems.

8.2. Discretization of problems by the finite element method

8.2.1. General technique of problem discretization. PC LIRA is theoretically based on the finite element method (FEM) [48] implemented in the displacement form. The choice of this version is explained by the following: the simplicity of its algorithmization and physical interpretation; the availability of common methods for constructing both stiffness matrices and loading vectors for various types of finite elements; and the possibility of taking into account arbitrary boundary conditions and the complex geometry of the construction being calculated.

To obtain a discrete problem by FEM, the domain occupied by the structure is broken into finite elements, and the nodes and their degrees of freedom (displacements and rotation angles of nodes) are specified. The degrees of freedom correspond to basis (coordinate, approximating) vector-functions $\varphi_i$, which are different from zero only on the stars of elements containing the node corresponding to the given degree of freedom. In addition, the following relations are valid for the degrees of freedom and the basis functions:

$L_j(\varphi_i) = \delta_{ij}$, (8.5)

where $\delta_{ij}$ is the Kronecker symbol, while the result of the operation $L_j(\varphi_i)$ is the value of the component of the vector-function $\varphi_i$ for the degree of freedom $L_j$.
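Relation (8.5) can be checked on the simplest example, which is an illustration chosen here and not an element of the PC LIRA library: piecewise-linear "hat" functions on a uniform one-dimensional grid with nodes x_j = j·h, where the degree of freedom L_j is taken to be the value of a function at the node x_j. The function names below are hypothetical.

```c
/* Piecewise-linear "hat" basis function phi_i on the uniform grid
   x_j = j*h (h > 0): equal to 1 at node i, 0 at all other nodes,
   and different from zero only on the two elements adjacent to node i. */
double hat_basis(int i, double h, double x)
{
    double d = (x - i * h) / h;  /* signed distance from node i in units of h */
    if (d < 0.0) d = -d;
    return d < 1.0 ? 1.0 - d : 0.0;
}

/* Degree of freedom L_j applied to phi_i: the value of phi_i at node j. */
double nodal_value(int j, int i, double h)
{
    return hat_basis(i, h, j * h);
}
```

Here nodal_value(j, i, h) equals 1 for j = i and 0 otherwise, which is exactly the Kronecker symbol in (8.5).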

Approximate solutions of the corresponding problems are sought in the finite-dimensional subspace $U_0^h$ of the space $U_0$.

Vector-functions from the subspace $U_0^h$ are piecewise-polynomial and can be represented as a linear combination of basis vector-functions satisfying the principal (kinematic) conditions:

$u^h = \sum_{i=1}^{N} x_i \varphi_i$, (8.6)

where $\varphi_i$ ($i = 1, 2, \ldots, N$) are the above-mentioned piecewise-polynomial basis functions of $U_0^h$. Then the resolving discrete problems have the form [34]:

linear algebraic system (LAS)

$Ax = b$ (8.7)

for the static problem (8.1);

initial-value problem

$B\ddot{x}(t) + C\dot{x}(t) + Ax(t) = b(t)$, $x(t_0) = x^{(0)}$, $\dot{x}(t_0) = x^{(1)}$ (8.8)

for the dynamic problem (8.2), (8.3);

algebraic eigenvalue problem

$Ax = \lambda^h Bx$ (8.9)

for the problem on eigen-oscillations (8.4).

The resolving problems (8.7)–(8.9) possess the following distinctive features:


1) high order of the matrices of the resolving discrete problems, ranging from 100 000 to tens of millions;

2) the matrices of the discrete problems (stiffness matrix A, mass matrix B, and damping matrix C) are:

symmetric;
positive definite or positive semi-definite;
sparse (band, profile, block, etc.);

3) elements of the matrices and vectors of the resolving problems are evaluated with errors caused by the initial data errors, discretization errors, and errors arising when these elements are evaluated on a computer.

A matrix is said to be sparse if the number of its non-zero elements is considerably less than the total number of its elements $n^2$, where $n$ is the order of the matrix.
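For band matrices, two characteristics are quoted for the examples later in this chapter: the band half-width and the density (filling) of the band. The sketch below computes both for a small symmetric matrix stored densely in row-major order. It assumes that the half-width is the largest distance |i - j| of a non-zero element from the diagonal and that the density is the share of non-zero elements among the positions of the lower half of the band, diagonal included; the exact definitions used by PC LIRA may differ, and the function names are hypothetical.

```c
#include <stddef.h>

/* Half-width of the band: the largest i - j over non-zero elements a[i][j]
   of the lower triangle of a symmetric n-by-n matrix (row-major storage). */
size_t half_bandwidth(const double *a, size_t n)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < i; j++)
            if (a[i * n + j] != 0.0 && i - j > m)
                m = i - j;
    return m;
}

/* Density of the band: non-zero elements divided by the number of
   positions in the lower half of the band, the diagonal included. */
double band_density(const double *a, size_t n)
{
    size_t m = half_bandwidth(a, n);
    size_t slots = 0, nonzeros = 0;
    for (size_t i = 0; i < n; i++) {
        size_t jmin = i > m ? i - m : 0;
        for (size_t j = jmin; j <= i; j++) {
            slots++;
            if (a[i * n + j] != 0.0) nonzeros++;
        }
    }
    return slots ? (double)nonzeros / (double)slots : 0.0;
}
```

A single non-zero element far from the diagonal widens the band and lowers its density, which is why the ordering of unknowns matters for band and profile storage schemes.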

An important advantage of FEM is that the elements of the matrices and right-hand side vectors of problems (8.7)–(8.9) are obtained by summing up the corresponding elements of the stiffness (mass, damping) matrices and loading vectors constructed for the individual finite elements.

This property of the matrices and vectors of discrete FEM problems makes it possible to parallelize efficiently the process of forming these matrices and vectors. For example, the domain can be broken into approximately equal sub-domains (according to the number of processors being used), and the matrices and vectors corresponding to the finite elements contained in these sub-domains are formed independently of each other. Then, by using inter-processor exchanges, the global matrices and vectors of the corresponding discrete problem (8.7)–(8.9) are formed. The global matrices and vectors can either be written to a file (files) on an external storage medium or be distributed between processors according to the requirements of the method to be used for solving the discrete problem.
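The summation of element matrices into the global matrix can be sketched on the simplest example chosen here for illustration: a chain of two-node bar elements with one degree of freedom per node, assembled serially into a dense matrix. The function names are hypothetical; PC LIRA and Inparlib use sparse band or profile storage and a parallel assembly.

```c
#include <stddef.h>

/* Adds the 2x2 stiffness matrix k*[[1,-1],[-1,1]] of a bar element that
   connects nodes n0 and n1 to the global dense matrix K (nn-by-nn,
   row-major, one degree of freedom per node). */
void add_bar_element(double *K, size_t nn, size_t n0, size_t n1, double k)
{
    const size_t dof[2] = { n0, n1 };
    const double ke[2][2] = { { k, -k }, { -k, k } };
    for (int r = 0; r < 2; r++)
        for (int c = 0; c < 2; c++)
            K[dof[r] * nn + dof[c]] += ke[r][c];
}

/* Assembles a chain of nn-1 bar elements; element e joins nodes e and e+1
   and has stiffness k[e]. */
void assemble_chain(double *K, size_t nn, const double *k)
{
    for (size_t i = 0; i < nn * nn; i++)
        K[i] = 0.0;
    for (size_t e = 0; e + 1 < nn; e++)
        add_bar_element(K, nn, e, e + 1, k[e]);
}
```

Each call of add_bar_element touches only the rows and columns of its own nodes, so elements of different sub-domains can be processed independently, and only the contributions to shared nodes have to be exchanged.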

8.2.2. Convergence of solutions of discrete problems. Convergence conditions and error estimates for FEM in displacements are known [12, 34, 48]. The convergence conditions are the following: linear independence and completeness of the system of basis functions, as well as their consistency (conformity) or conditions compensating for non-consistency. Linear independence follows from equalities (8.5). Consistency means that all basis functions are possible displacements, i.e. belong to the space $U_0$.

Easily verified conditions are known which make it possible to establish either the completeness of the basis functions and their consistency or the fulfilment of conditions compensating for the non-consistency. These conditions have the form of equalities to be satisfied by the basis functions on every finite element.

This theoretical foundation makes it possible not only to investigate the correctness of using well-known finite elements but also to develop principles for constructing new consistent and non-consistent elements and to obtain error estimates for them.

The finite element library of the program complex contains the following elements modeling the work of structures of various types:

bar elements;
quadrangular and triangular elements of the plane problem, plates and shells;
spatial elements: tetrahedron, parallelepiped and trihedral prism.

In addition, the library contains various special elements modeling a connection of finite stiffness, elastic pliability between nodes, and elements specified by a numerical stiffness matrix.

All finite elements included in the library are theoretically grounded, and error estimates are obtained for the solution of the discrete problem both in displacements and in stresses. Errors in the solution of the discrete problem are estimated by values proportional to $h^k$, where $h$ is the maximum size of the finite elements and $k$ is the order of convergence.

Only those finite elements whose convergence is theoretically proved are included in the library of finite elements. Knowledge of the values of $k$ for various finite elements makes it possible not only to gain assurance in the results of solving a problem but also to estimate the strained and stressed state as a whole.

Thus, for rectangular plate elements the error in displacements is $O(h^4)$, while the error in stresses is $O(h^2)$. For the rest of the finite elements in use the error in displacements is estimated by $O(h^2)$, and the error in stresses by $O(h)$.
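The practical meaning of these orders can be shown by a small calculation. If the error behaves as C·h^k, then halving the maximum element size h divides the error by 2^k. The hypothetical helper below applies this rule; it assumes that the constant C does not change under refinement, which holds only asymptotically.

```c
/* Estimated error after the element size h has been halved `halvings`
   times, for an error that behaves as C * h^k with a fixed constant C. */
double error_after_refinement(double err, unsigned k, unsigned halvings)
{
    for (unsigned s = 0; s < halvings; s++)
        for (unsigned p = 0; p < k; p++)
            err /= 2.0;  /* one halving of h divides the error by 2^k */
    return err;
}
```

For rectangular plate elements one halving of h thus reduces the displacement error estimate 16 times and the stress error estimate 4 times, while for the other elements the factors are 4 and 2, respectively.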

The following possibilities are also theoretically grounded: the representation of curvilinear bars by rectilinear elements and the representation of arbitrary shells by triangular and rectangular (for cylindrical shells) elements of the plane shell [12]. In this case the errors both in displacements and in stresses are estimated by values proportional to $h$.

For bar systems (except for elastically based bars) the finite element method yields exact solutions within the framework of the plane sections hypothesis.

8.3. The solving of discrete problems

8.3.1. The solving of static problem. Once the given structure is presented in the form of a finite element scheme, the problem of determining the displacements of nodes is reduced to solving LAS (8.7), where A is a symmetric positive definite matrix of order N, b is the matrix of right-hand sides (loadings) of size N×q (q is the number of loadings), and x is the N×q matrix of displacements being sought.

In most cases the matrix A of system (8.7) is sparse. Therefore, to decrease the required volume of main and external memory as well as the run time, the unknowns of system (8.7) can be re-ordered within the program complex in order to minimize the matrix profile. Several re-ordering methods are used, namely the reverse Cuthill-McKee algorithm, the “tree factor” algorithm and the parallel sections algorithm [12]. The user can choose the re-ordering method. By default, the reverse Cuthill-McKee algorithm is employed since it requires the minimum amount of main memory. It is impossible to recommend a particular re-ordering method, since the performance of each algorithm depends considerably on the structure of the particular matrix A.

The solving of LAS (8.7), either original or re-ordered, is carried out on Inparcom by means of programs from the Inparlib library. Programs for investigating and solving LAS with either band symmetric or profile symmetric matrices are used, depending on the filling of the matrix band [43].


During the investigating and solving of system (8.7), the triangular decomposition of the matrix A is carried out first of all. If in the process of this decomposition the matrix A turns out to be singular, then connections ensuring geometric invariability are imposed automatically. In this case the user obtains information about the numbers of nodes and the numbers of degrees of freedom on which the connections are imposed, and it is recommended to analyze the calculation scheme attentively and to find the source of the geometric variability of the structure.
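How a triangular decomposition reveals singularity can be seen from a minimal dense sketch. It uses the LDL^T form of the decomposition, which is an assumption made here to avoid square roots (the examples of this chapter refer to the Cholesky method), and it reports the index of the degree of freedom at which a non-positive pivot appears. The function name and the tolerance parameter are hypothetical; the Inparlib programs work with band and profile storage in parallel.

```c
#include <stddef.h>

/* In-place LDL^T decomposition of a symmetric n-by-n matrix stored densely
   in row-major order; only the lower triangle is referenced.
   On success returns -1: the strict lower triangle then holds L (its unit
   diagonal is implied) and the diagonal holds D.
   If a pivot d_j <= tol is met, returns j and stops: the matrix is
   singular (or not positive definite) to within the tolerance. */
long ldlt_factor(double *a, size_t n, double tol)
{
    for (size_t j = 0; j < n; j++) {
        double d = a[j * n + j];
        for (size_t k = 0; k < j; k++)
            d -= a[j * n + k] * a[j * n + k] * a[k * n + k];
        if (d <= tol)
            return (long)j;  /* degree of freedom with a vanishing pivot */
        a[j * n + j] = d;
        for (size_t i = j + 1; i < n; i++) {
            double s = a[i * n + j];
            for (size_t k = 0; k < j; k++)
                s -= a[i * n + k] * a[j * n + k] * a[k * n + k];
            a[i * n + j] = s / d;
        }
    }
    return -1;
}
```

For the stiffness matrix of a single unsupported bar element, with rows (1, -1) and (-1, 1), the second pivot is zero: the structure can move as a rigid body, and a connection has to be imposed on that degree of freedom.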

Control over the solving of system (8.7) is an additional service, based on the investigation of the characteristics of the LAS carried out by the computer. These investigations yield estimates of the condition number of the system's matrix as well as of the inherited and computational errors in the solution. If these errors are large, the user should analyze the values of the node displacements attentively and make sure that the obtained solution is acceptable from the engineering point of view.
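The role of the condition number in the estimate of the inherited error can be illustrated by the classical perturbation bound for linear systems. This is a standard result and not necessarily the exact estimate implemented in Inparlib: if the relative errors of the matrix and of the right-hand side are eps_A and eps_b and cond(A)·eps_A < 1, then the relative error of the solution does not exceed cond(A)·(eps_A + eps_b) / (1 - cond(A)·eps_A). The function name below is hypothetical.

```c
/* Classical bound on the relative inherited error of the solution of
   Ax = b for relative data errors rel_err_A and rel_err_b.
   Returns -1.0 when cond * rel_err_A >= 1, i.e. when the bound is not
   applicable and the reliability of the solution cannot be guaranteed. */
double inherited_error_bound(double cond, double rel_err_A, double rel_err_b)
{
    double q = cond * rel_err_A;
    if (q >= 1.0)
        return -1.0;
    return cond * (rel_err_A + rel_err_b) / (1.0 - q);
}
```

For example, with a condition number of 100 and data known to three significant digits (relative errors of 0.001), the bound is about 0.22, so only the leading digit of the displacements can be trusted.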

The static calculation of the stressed and strained state yields the following results: displacements of the nodes of the scheme and efforts (stresses) in sections of elements.

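8.3.2. The solving of dynamic problem. A method combining the finite element method with decomposition by forms of eigen-oscillations is used for solving the dynamic problem (8.2), (8.3).

In the semi-discrete FEM the approximate solution is sought in the form (8.6), where the coefficients $x_i$ are functions of time $t$. As a result we obtain the initial-value problem for the second-order system of ordinary differential equations (8.8), where $x(t)$, $x^{(0)}$, $x^{(1)}$ are vectors with elements $x_i(t)$, $x_i^{(0)}$, $x_i^{(1)}$.

System (8.8) is solved by decomposition by forms of eigen-oscillations. If $(\lambda_k^h, z_k^h)$, $k = 1, 2, \ldots, N$, is a solution of the algebraic eigenvalue problem (8.9), then by assuming in (8.8) that $x(t) = \sum_{k=1}^{N} y_k(t) z_k^h$, instead of (8.8) we get (under the assumption of the B-orthogonality of the vectors $z_k^h$, normalized so that $(z_k^h)^T B z_k^h = 1$, and under certain assumptions as to the damping matrix $C$) a system which breaks down into independent equations with respect to $y_k(t)$:

$\ddot{y}_k(t) + 2\xi_k\omega_k\dot{y}_k(t) + \omega_k^2 y_k(t) = P_k(t)$, (8.10)

where

$\omega_k^2 = \lambda_k^h$, $0 < \xi_k < 1$, $P_k(t) = (z_k^h)^T b(t)$,

$y_k(t_0) = (z_k^h)^T B x^{(0)}$, $\dot{y}_k(t_0) = (z_k^h)^T B x^{(1)}$,

$k = 1, 2, \ldots, N$.

Solutions of problems (8.10) have the form:

$y_k(t) = e^{-\xi_k\omega_k(t-t_0)}\left[y_k(t_0)\cos\bar{\omega}_k(t-t_0) + \frac{\dot{y}_k(t_0) + \xi_k\omega_k y_k(t_0)}{\bar{\omega}_k}\sin\bar{\omega}_k(t-t_0)\right] + \frac{1}{\bar{\omega}_k}\int_{t_0}^{t} P_k(\tau)\, e^{-\xi_k\omega_k(t-\tau)}\sin\bar{\omega}_k(t-\tau)\, d\tau$,

where $\bar{\omega}_k = \omega_k\sqrt{1-\xi_k^2}$. Analysis of these solutions shows that a considerable contribution to the solution $x(t)$ of problem (8.8) is made only by approximately 10 components corresponding to the minimum eigenvalues. Because of this, the corresponding eigenvalue problem (8.9) can be solved by the subspace iteration method [2, 35].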

On Inparcom-16 the problem (8.9) is solved by means of a program from the Inparlib library intended for investigating and solving the partial generalized eigenvalue problem with either band symmetric or profile symmetric matrices, depending on the filling of the matrix band.

Results of the dynamic calculation are the following: periods, frequencies and forms of eigen-oscillations for each tone, as well as inertial forces and the corresponding displacements of nodes and efforts (stresses) in elements.

Vectors of inertial forces Sk(t) are evaluated by formula

.

The values

are employed in calculations.

Expressions for the values $S_{k,0}$ under some loadings (when the accurate evaluation of $y_k(t)$ is feasible) are given below:

1. Under the wind loading Sk,0 = wн k, where wн is normativevalue of the wind loading, k is the dynamic quality coefficientdepending both on k, k and wind’s speed.

2. Under seismic loading $S_{k,0} = A\beta_k$, where $A$ is the relative value of the acceleration and $\beta_k$ is the dynamic quality coefficient depending on k and k.

3. Under impulse and impact loadings , where k

depends on ti, k, (ti is the impulse action time, ),

takes into account periodicity of loading's action and depends on the

fact whether the oscillations ( , n the number of repetitions)

have been established or not ,


m0, 0 are the mass and speed of impacting body, respectively, while is a coefficient of form recovering for bodies involved in the impact.

4. Under harmonic loading the inertial

forces S1 and S2 (summarized over all forms) are evaluated thatcorrespond to cosinusoidal (realistic) and sinusoidal (imaginary)components. Then we have

.

In the remaining cases the solutions $y_k(t)$ are evaluated numerically. In particular, when the seismic loading is calculated from an accelerogram, the vector $P_j = P(t_j)$ is specified at every moment of time $t_j$. Then in (8.10) we have $P_{k,j} = P_k(t_j)$. Equations (8.10) are then solved by the finite difference method using the Newmark scheme. As a result, we get the values both of the displacements $y_{k,j} = y_k(t_j)$ and of the inertial forces $S_{k,j} = S_k(t_j)$ involved in the evaluation of

.

On Inparcom-16 the problem (8.10) is solved by the 4-th order Runge-Kutta method [38] intended for solving second-order systems of ordinary differential equations, by means of programs from the Inparlib library.
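The Newmark scheme mentioned above can be sketched for a single equation. The sketch assumes that each independent equation has the standard form of a damped oscillator, y'' + 2·ξ·ω·y' + ω²·y = P(t), and it uses the average-acceleration variant of the scheme (β = 1/4, γ = 1/2); the variant used in PC LIRA is not specified here, and all names below are hypothetical.

```c
/* State of one oscillator: displacement, velocity and acceleration. */
typedef struct {
    double y, v, a;
} newmark_state;

/* Initial state; the acceleration follows from the equation at t0:
   a0 = P(t0) - 2*xi*omega*v0 - omega^2*y0. */
newmark_state newmark_init(double y0, double v0,
                           double omega, double xi, double p0)
{
    newmark_state s = { y0, v0, p0 - 2.0 * xi * omega * v0 - omega * omega * y0 };
    return s;
}

/* One step of size h of the average-acceleration Newmark scheme for
   y'' + 2*xi*omega*y' + omega^2*y = P(t); p_next is P at the new time. */
void newmark_step(newmark_state *s, double omega, double xi,
                  double h, double p_next)
{
    const double c = 2.0 * xi * omega;  /* damping coefficient */
    const double k = omega * omega;     /* stiffness coefficient */
    const double keff = k + 2.0 * c / h + 4.0 / (h * h);
    const double peff = p_next
        + 4.0 * s->y / (h * h) + 4.0 * s->v / h + s->a
        + c * (2.0 * s->y / h + s->v);
    const double y1 = peff / keff;
    const double a1 = 4.0 * (y1 - s->y) / (h * h) - 4.0 * s->v / h - s->a;
    const double v1 = s->v + 0.5 * h * (s->a + a1);
    s->y = y1;
    s->v = v1;
    s->a = a1;
}
```

When the loading is given by a table of values P_{k,j}, as in the calculation by accelerogram, newmark_step is called once per time moment t_j with p_next equal to the next tabulated value. For an undamped oscillator without loading this variant of the scheme preserves the energy y'² + ω²·y², which is a convenient check of an implementation.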

8.3.3. The solving of discrete problems on Inparcom. As noted above, problems (8.7), (8.9) and (8.10) can be solved on Inparcom by means of programs from the Inparlib library. Two cases are to be distinguished:

the initial data of the discrete problem have been formed by PC LIRA on a personal computer (workstation);

the initial data of the discrete problem are formed on Inparcom by a parallel program.

In the first case the initial data of the discrete problem are written to files in the data format accepted by PC LIRA. Afterwards, by means of a special interface, a parallel program for solving the corresponding discrete problem is executed on Inparcom-16. This program:

chooses both the algorithm and the parameters required for solving the problem by analyzing the problem's parameters;

reads the initial data from files and distributes them between processors according to the parallel algorithm chosen for solving the problem;

investigates and solves the problem by calling the appropriate functional modules from the Inparlib library;

saves the results of investigating and solving the problem in files for their further use by the post-processor of the program complex.

If the initial data of the discrete problem are formed on Inparcom-16, then prior to forming the data the parameters of the discrete problem are analyzed by the parallel program. Then the problem's data are formed and distributed between processors according to the parallel algorithm chosen for solving the discrete problem; the data are either formed in the processor itself or sent to it after their formation. Afterwards the problem is investigated and solved by means of suitable functional modules from the Inparlib library, and the results are stored in files for their further use by the post-processor of PC LIRA.

8.4. Examples of solving problems on the calculation ofbuilding structures on Inparcom

Consider several examples of using functional modules from the intelligent programs library Inparlib for solving problems of the form (8.7) and (8.9) whose data have been formed by PC LIRA.

Example 1. Static calculation of the stressed and strained state of an industrial building.

The general view of the structure and the finite-element grid in use are shown in fig. 8.1.

FIG. 8.1

The structure is broken into 13 876 finite elements. The finite element grid formed by this breaking includes 7 563 nodes.

The discretization resulted in the resolving problem (8.7), a system of linear algebraic equations of order 44 436. Prior to the ordering, the matrix of this system had the band structure shown in fig. 8.2; the density (filling) of this structure was almost 21%, and the band half-width was equal to 4 476.

After the ordering (optimization of the matrix structure) the density of the ordered structure was almost 2%, though the band half-width became equal to 37 850. The structure of the ordered matrix is shown in fig. 8.2.


FIG. 8.2

The calculated distribution of stresses for one of the cases of theloading is shown in fig. 8.3.

FIG. 8.3


The resolving problem of the form (8.7) is solved on Inparcom by means of functional modules from the Inparlib library implementing parallel algorithms of the Cholesky method:

band algorithm for the original structure of the stiffness matrix;
profile algorithm for the ordered structure of the matrix of LAS.

Both versions of the problem were solved for three cases of loading on 16 processors of the workstation Inparcom-16. For the original structure, the time required for investigating and solving the problem amounted to 1.449 min (including 1.354 min for the stiffness matrix decomposition and solving), while for the ordered structure these times amounted to 0.117 min and 0.109 min, respectively.

Example 2. Static calculation of the stressed and strained state of the building's foundation.

The general view of the structure and the finite-element grid in use are shown in fig. 8.4.

FIG. 8.4

The structure is broken into 97 412 finite elements. The finite element grid resulting from this breaking consists of 94 346 nodes.

The discretization resulted in the resolving problem (8.7), a system of linear algebraic equations of order 283 031. The original structure of this system's matrix is shown in fig. 8.5. After optimization of the matrix structure the density of the ordered structure was 7% with the band half-width equal to 19 530. The structure of the ordered matrix is shown in fig. 8.5.

FIG. 8.5

The resolving problem of the form (8.7) is solved on Inparcom by means of functional modules from the Inparlib library implementing parallel algorithms of the Cholesky method for the ordered structure of the stiffness matrix. The time required for investigating and solving the problem amounted to 1.290 min (including 1.222 min for the stiffness matrix decomposition and solving).

Example 3. Determination of frequencies and forms of eigen-oscillations of a cantilever (a test problem).

The structure is broken into 81 000 finite elements (rectangular parallelepipeds), while the finite-element grid formed as a result of the breaking includes 100 100 nodes.

The discretization resulted in the resolving problem (8.9), a generalized eigenvalue problem with a diagonal mass matrix B. The order of the problem is equal to 300 000. The band half-width of the stiffness matrix A is equal to 336, while the density of its band structure is 100%.

The partial eigenvalue problem of the form (8.9) was solved on Inparcom by means of functional modules from the Inparlib library implementing the parallel algorithm of the subspace iteration method for band matrices. Eleven minimum eigenvalues and the eigenvectors corresponding to them were evaluated. The problem was solved on 16 processors of Inparcom-16. The time required for investigating and solving the problem amounted to 1.006 min, including 0.179 min for the stiffness matrix decomposition and 0.592 min for determining the frequencies and forms (iterative process).

In addition to these examples, some test resolving problems of the form (8.7) formed by PC LIRA were solved on Inparcom-16. The times required for solving these problems on 16 processors of Inparcom-16, as well as the approximate times required for solving them by PC LIRA, are given in Table 8.1.


TABLE 8.1

No.  Order      Band half-width  Density,  Solution time, min    Ratio of        Time for investigating  Algorithm
                (maximum)        %         PC LIRA  Inparcom-16  solution times  the problem on
                                                                                 Inparcom-16, min
  1     43 950            1 804        49      3,4        0,131           26,03                   0,006  p
  2     44 436            4 476        21      2,6        1,354            1,92                   0,095  b
  3     44 436           37 580         2      0,9        0,109            8,26                   0,008  b
  4    283 031           19 530         7     16,4        1,222           23,42                   0,068  b
  5    300 000              335       100        3        0,188           15,92                   0,040  b
  6    666 000            1 004        85       27        1,989           13,58                   0,157  b
  7  1 000 332            1 004        85       41        3,001           13,66                   0,238  b
  8  1 332 000            1 004        85       54        3,942           13,70                   0,313  b
  9  1 332 000            1 004        85       54        2,802           19,27                   0,418  p
 10  1 200 000            1 265       100      102        4,776           21,36                   0,337  b

In the last column of Table 8.1, entitled “Algorithm”, the letter “b” denotes the band algorithm and the letter “p” denotes the profile algorithm. Density means the density of filling of the matrix band in the LAS. The table also gives the times required on Inparcom-16 both for investigating the characteristics of the LAS and for evaluating the estimates of the solution’s reliability (PC LIRA does not carry out such investigations). The problems listed in the table were solved by means of PC LIRA on personal computers with the following characteristics:

CPU GenuineIntel 2,39 GHz, 522988 Kb RAM – for problems with serial numbers 1–3;

CPU GenuineIntel 3 GHz, 2095148 Kb RAM – for the problem with serial number 4;

CPU GenuineIntel 3,21 GHz, 1047788 Kb RAM – for problems with serial numbers 5–10.

It should be noted that the investigations carried out by the functional modules of Inparlib increase the total time required to solve a problem, as a rule, by no more than 10%. However, such investigations make it possible to estimate the reliability of the obtained solution and, most importantly, to correct the statement of the problem when the reliability of its solution cannot be guaranteed. This applies especially to evaluating the condition number of the matrix of the LAS and comparing it with the errors in the problem’s initial data when the estimate of the inherited error is evaluated.
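The 10% figure can be checked against Table 8.1 with a few lines of arithmetic. The Python sketch below takes the Inparcom-16 solution time and the investigation time of each row and expresses the second as a percentage of the first. It assumes that the investigation time is spent in addition to the solution time. That reading is an assumption of this sketch, although it agrees with the problem of band half-width 19 530, for which 1,222 min plus 0,068 min gives the 1,290 min total quoted earlier. The variable and function names are illustrative.

```python
# (Inparcom-16 solution time, investigation time) in minutes, rows 1-10 of Table 8.1.
rows = [
    (0.131, 0.006),
    (1.354, 0.095),
    (0.109, 0.008),
    (1.222, 0.068),
    (0.188, 0.040),
    (1.989, 0.157),
    (3.001, 0.238),
    (3.942, 0.313),
    (2.802, 0.418),
    (4.776, 0.337),
]


def investigation_overhead_percent(solve_min, investigate_min):
    """Investigation time as a percentage of the solution time."""
    return 100.0 * investigate_min / solve_min


overheads = [investigation_overhead_percent(s, t) for s, t in rows]
within_10 = sum(1 for p in overheads if p <= 10.0)

if __name__ == "__main__":
    for number, percent in enumerate(overheads, start=1):
        print(f"problem {number}: {percent:.1f}%")
    print(f"{within_10} of {len(rows)} problems stay within 10%")
```

Under this reading, eight of the ten problems stay within 10%; problems 5 and 9 exceed it, which is consistent with the qualifier "as a rule".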


REFERENCES

1. Alishov, N.I.: Theoretical foundations and practical problems inthe optimization of time for the delivery of information resourcesin distributed systems, Computer tools, networks and systems. –2004. – N 3. – P. 87–97 (in Russian).

2. Bathe, K.; Wilson, E.: Numerical methods in finite elementanalysis, Prentice Hall, New Jersey, 1976.

3. Broyden, C.G.: A class of methods for solving nonlinearsimultaneous equations, Math. Comput, 19, N92 (1965),P.577-593.

4. Burdakov, O.P.: Some globally convergent modifications of theNewton's method for solving systems of non-linear equations,Dokl. AN SSSR, 254, №3 (1980), P. 521-523 (in Russian).

5. Deitel, H.M., Deitel, P.J.: Advanced Java 2 Platform: How toprogram. Prentice Hall, October – 2003. – P. 30,- 31.

6. Dennis, J.E.; More, J.J.: A characterization of super-linearconvergence and its application to quazi-Newton methods, Math.Comput., 28, N126 (1974), P. 549-560.

7. Forsythe, G.E.; Malcom, M. M.; Moler, C.B.: Computer Methodsfor Mathematical Computation, Prentice Hall, Englewood Cliffs,New Jersey, 1977.

8. Gear, C.W.: Numerical initial value problem in ordinarydif-ferential equations, Prentice-Hall, New Jersey, 1971.

9. Genzersky, Yu.V., Slobodyan, Ya.Ye., Titok, V.P. et al.:LIRA 9.4. Examples of calculation and design. – Kiev: NIIASS,2006. – 124 p. (in Russian).

10. Gerasimova, T.A., Zubatenko, V.S., Molchanov, I.N. et al.:Library of intelligent parallel programs for the investigating andsolving of problems of the computational mathematics withapproximately given initial data: Copyright registration certificate№ 17213 as of 11.07.2006 р / State department on the intelligentproperty (in Russian).

11. Gerasimova, T.A., Nesterenko, A.N., Khimich, А.N.,Yakovlev, M.F.: Investigation of some algorithms for the solving

235

of SODE and non-linear systems on MIMD-computers, Artificialintellect, N 4, 2006,P. 138–147 (in Russian).

12. Gorodetsky A.S., Yevzerov I.D.: Computer models ofconst-ructions. – Kiev: FACT, 2005. – 344 p. (in Russian).

13. http: // www.top500-org14. http: // www.lira.com.ua15. Khimich, А.N.: Estimates for perturbations to solutions of the

least-squares problem, Cybernetics and system analysis. – 1996. –N 3. – P. 142 – 145 (in Russian).

16. Khimich, А.N.: Estimates for overall error in solutions of linearalgebraic systems with arbitrary rank matrices. // Computermathematics. – 2002. – N 2. – P. 41–49 (in Russian).

17. Khimich, А.N.: Estimates for overall error in symmetriceigen-value problem, Technology and methods for the solving ofsome application problems, Kiev: V.M.Glushkov Institute ofcyberne-tics, 1991. – P 85–88 (in Russian).

18. Khimich, А.N А.Н., Voitsekhovsky, S.А., Brusnikin, V.N.: On thereliability of linear mathematical models with approximately giveninitial data // Mathematical machines and systems. – 2004. – N 3.– P. 54–62 (in Russian).

19. Khimich, А.N., Popov А.V., Chystyakova Т.V. et al.: Investigationof block-cyclik algorithms on the family of clusters SKIT //Problems of programming. –2006. – N 2-3. – P. 177–183 (inRussian).

20. Khimich, А.N., Yakovlev, M.F.: On the solving of systems withnon-full rank matrices // Computer mathematics. – 2003. – N 1. –P. 1–15 (in Russian).

21. Khimich, А.N, Yakovlev, M.F.: The solving of LAS withsymmetric positive semi-definite matrices // Computermathematics. Theory of computations. 2001. P. 392–396 (inRussian).

22. Khimich, А.N., Yakovlev, M.F., Gerasimova, T.A.: Somequestions related to the solving of systems of ordinary differentialequations on MIMD-computers // Cybernetics and system analysis.

236

– 2007. – N 2.– P. 175–182 (in Russian).23. Lambert, J.D.: Computational methods in ordinary differential

equations. Wiley, London, 1973.24. Mikhalevich, V.S., Bic, N.А., Brusnikin, V.N. et al.: Numerical

methods for multi-processor computational complex ES / Ed.:Molchanov, I.N., Мoscow: prof. N.Ye. Zhukovsky VVIA, 1986. –401 p. (in Russian).

25. Molchanov, I.N.: Introduction to algorithms of parallelcomputations, Naukova Dumka, Kiev, 1990. – p. 127 (in Russian)

26. Molchanov, I.N.: Intelligent computers – an efficient tool for theinvestigating and solving of scientific and engineering problems,Cybernetics and system analysis. – 2004. –N 1. – P. 174–179 (inRussian).

27. Molchanov, I.N. Machine mathematics – problems and pros-pects,Cybernetics and system analysis. – 2004. –N 6. – P. 65–72 (inRussian).

28. Molchanov, I.N. Machine methods for the solving of applicationproblems. Algebra, approximation of functions, Naukova Dumka,Kiev, 1987. – 285 p. (in Russian).

29. Molchanov, I.N. Problems and prospects of the development ofapplications software. // Supervising systems and machines. 1988. – N 1. – P. 56–61 (in Russian).

30. Molchanov, I.N., Galba Ye.F., Popov А.V. et al.: Intelligentinterface for investigating and solving problems of thecomputational mathematics with approximately given initial dataon MIMD-computer // Problems of programming. – 2000. – N 1-2.– P. 102–112 (in Russian).

31. Molchanov, I.N., Galba Ye.F., Popov А.V. et al.: Problems in thecreation of the intelligent numerical software //Artificial intellect.–2003.–N 3. –P. 276–284 (in Russian).

32. Molchanov, I.N., Gerasimova, T.A., Nesterenko, A.N. et al.: Onthe efficient implementation of computational algorithms onMIMD-computers // Artificial intellect. 2005. – N 3. –P. 175–181 (in Russian).

237

33. Molchanov, I.N. Mova, V.I, Stryuchenko, V.A. Intelligentcom-puters for investigating and solving of scientific andengineering problems– a new direction in the development of thecompu-tational machinery // Communications. – 2005. – N 7. –P. 45, 46 (in Russian).

34. Molchanov, I.N.; Nikolenko, L.D.: Foundations of the finiteelements methods, Naukova Dumka, Kiev, 1989 (in Russian).

35. Molchanov, I.N., Popov А.V., Khimich, А.N.: Algorithm for thesolving of partial eigenvalue problem for large profile matrices //Cybernetics and system analysis. – 1992. – N 2. – P. 141–147 (inRussian).

36. Molchanov, I.N., Popov А.V., Khimich, А.N.: Parallel algorithmfor the singular decomposition of matrices //Theory of optimaldecisions. – Kiev: V.M.Glushkov Institute of cybernetics, 2001. –P. 80–83 (in Russian).

37. Molchanov, I.N., Khimich, А.N., Popov А.V. et al.: Efficientimplementation of process of solving problems of thecomputational mathematics on MIMD-computers // Problems inoptimization of computations (ПОО-XXXII): Proc. of theInternational conference. – Kiev: V.M. Glushkov Institute ofcybernetics, 2005. – P. 155 (in Russian).

38. Moltschanow I., Chimitsch A. und andere. Intelligente Umgebungzur Untersuchung und Lösung wissenschaftlichtechnischerAufgaben auf Parallelrechnern (ISPAR), 01 IR 64113. – Kiew,1998, 192 p.

39. Nesterenko, A.N., Khimich, А.N., Yakovlev, M.F.: Someproblems related to the solving of systems of non-linear equationson multi-processor distributed-memory computing systems,Bulletin of computer and information technologies, Мoscow,2006. – N 10. – P. 54–56 (in Russian).

40. Nesterenko, A.N., Khimich, А.N., Yakovlev, M.F.: On the solvingof systems of non-linear equations on MIMD-computers,Intelligent and и multi-processor systems: – Proc. of theInternational conference. – Taganrog: ТRТU, 2005, vol.1. –P. 130–133 (in Russian).

238

41. Оrtega J.: Introduction to parallel and vector solution of linearsystems, Penum Press, N.Y., N.7, 1973. – 368 p.

42. Parasyuk, I.N., Sergiyenko, I.V. Packages of programs for the dataanalysis: development technology, Moscow: Finances andstatistics, 1988, 159 p. (in Russian).

43. Popov А.V., Khimich, А.N.: Parallel algorithms for the solving oflinear algebraic with band symmetric matrix, Computermathe-matics, 2005. – N 2. – P. 52–59 (in Russian).

44. Powell, M.J.D.: A new algorithm for unconstrained optimization,Nonlinear Programming (Eds. Rosen J.B. et al.), Acad. Press, NewYork, 1970.

45. Representation of knowledge in human-machine and robotsys-tems: in 3 vol. – Мoscow: VINITI, Comp. Center Аcad. Scis.USSR, 1984. – Vol А. – 216 с.; Vol. В. – 236 с.; Vol. С. – 378 p.(in Russian).

46. Rosenbrock, H.H.: Some general implicit processes for thenumerical solution of differential equations, Comput. J. 5, N4(1963), P. 329-330 (in Russian).

47. Strelets-Streletsky, Ye.B., Gensersky, Yu.V., Marchenko, D.V.etal.: LIRA 9.2. Users guide. Foundations. Manual, Ed. byА.S. Gorodetsky, Kiev: FAKT, 2005, 146 p. (in Russian).

48. Strang G., Fix G.I.: An analysis of the finite element method.Englewood Cliffs: Prentice-Hall, 1973. – 349 p.

49. Timoshenko S.P., Gudier J. Elasticity theory, Мoscow: Nauka,1975, 575 p. (in Russian).

50. Voyevodin V.V., Voyevodin Vl.V. Parallel computations, – St.Peterbourg, 2002, 608 p. (in Russian).

51. Wilkinson, J.H.: The algebraic eigenvalue problem, ClarendonPress, Oxford, 1965.

52. Wilkinson, J.H.; Reinsch, C.: Handbook for automaticcomputation, Linear Algebra, Springer Verlag, Berlin, 1971.

53. Zubatenko, V.S., Maistrenko, A.S., Molchanov, I.N., et al.:Investigation of some parallel algorithms for the solving ofproblems in linear algebra on MIMD-computers, Artificialintellect, 2006. – N3. – P. 129–138 (in Russian).

239

PREFACE

Chapter 1. THE INTELLIGENT COMPUTER INPARCOM

Chapter 2. INTELLIGENT SOFTWARE INPARTOOL FOR THE INVESTIGATING AND SOLVING OF PROBLEMS OF THE COMPUTATIONAL MATHEMATICS

2.1. Conception of Inpartool
2.2. Composition and architecture of Inpartool
2.3. Program implementation of Inpartool

Chapter 3. INVESTIGATING AND SOLVING OF LINEAR ALGEBRAIC SYSTEMS

3.1. Functional potentialities of Inpartool on investigating and solving of linear algebraic systems
3.2. Technology for investigating and solving of linear algebraic systems
3.3. Examples of solving linear algebraic systems by means of Inpartool

Chapter 4. INVESTIGATING AND SOLVING OF EIGENVALUE PROBLEMS

4.1. Functional potentialities of Inpartool on investigating and solving of eigenvalue problems
4.2. Technology of investigating and solving algebraic eigenvalue problems
4.3. Examples of solving algebraic eigenvalue problems by means of Inpartool

Chapter 5. INVESTIGATING AND SOLVING OF SYSTEMS OF NON-LINEAR EQUATIONS

5.1. Functional potentialities of Inpartool on investigating and solving of systems of non-linear equations
5.2. Technology of investigating and solving systems of non-linear equations
5.3. Examples of solving systems of non-linear equations by means of Inpartool

Chapter 6. INVESTIGATING AND SOLVING OF SYSTEMS OF ORDINARY DIFFERENTIAL EQUATIONS

6.1. Functional potentialities of Inpartool on investigating and solving systems of ordinary differential equations
6.2. Technology of investigating and solving systems of ordinary differential equations
6.3. Examples of solving systems of ordinary differential equations by means of Inpartool

Chapter 7. LIBRARY OF INTELLIGENT PROGRAMS INPARLIB

7.1. Purpose and composition of the library
7.2. Description of functions from the Inparlib library

Chapter 8. EMPLOYING OF THE INTELLIGENT SOFTWARE FOR THE SOLVING OF APPLICATION PROBLEMS

8.1. Branches in which problems modeling the behavior of complicated structures and constructions arise and are stated
8.2. Discretization of problems by finite elements method
8.3. The solving of discrete problems
8.4. Examples of solving problems on the calculation of building structures on Inparcom

REFERENCES
