
Proceedings of the 2009 International Conference on

Computational and Mathematical Methods in Science and Engineering

Gijón (Asturias), Spain June 30, July 1-3, 2009

Editor: J. Vigo-Aguiar

Associate Editors:

Pedro Alonso (Spain), Shinnosuke Oharu (Japan) Ezio Venturino (Italy) and Bruce A. Wade (USA)


ISBN 978-84-612-9727-6 Register Number: 09/9851

© Copyright 2009 CMMSE. Printed on acid-free paper.


Volume III


Contents:

Volume I

Preface..................................................................................................... 5

Improvements on binary coding using parallel computing
    Abascal P., García D. and Jiménez J. .......... 24
Crossover Operators for Permutations. Equivalence between PBX and OBX
    Aguado F., Molinelli J., Pérez G., Vidal V. .......... 35
Numerical integration schemes for the discretization of BIEs related to wave propagation problems
    Aimi A., Diligenti M., Guardasoni C. .......... 45
Mimicking spatial effects in predator-prey models with group defense
    Ajraldi V. and Venturino E. .......... 57
Recent advances in the parallel iterative solution of large-scale sparse linear systems
    Aliaga J. I., Bollhöfer M., Martín A. F., Quintana-Ortí E. S. .......... 68
Scattered Multivariate Interpolation by a Class of Spline Functions
    Allasia G. .......... 73
Two interpolation formulas on irregularly distributed data
    Allasia G., Bracco C. .......... 80
Neville Elimination and Multi-core Systems: OpenMP vs MPI
    Alonso P., Cortina R., Martínez-Zaldívar F. J., Ranilla J. .......... 85


Growth Factors of Pivoting Strategies Associated to Neville Elimination
    Alonso P., Delgado J., Gallego R. and Peña J.M. .......... 93
Reduced order models of industrial fluid-thermal problems
    Alonso D., Lorente L.S., Velázquez A. and Vega J.M. .......... 98
On symmetry groups of quasicrystals
    Artamonov Viacheslav A. and Sánchez Sergio .......... 109
Control of the particle number in particle simulations
    Assous Franck .......... 119
Fast simulation of one-layer shallow water systems using CUDA architectures
    Asunción M., Mantas J.M. and Castro M. .......... 127
A residual-based a posteriori error estimator for an augmented mixed method in elasticity
    Barrios T. P., Behrens E.M. and González M. .......... 139
An algorithm for Bang-Bang control of fixed-head hydroplants
    Bayón L., Grau J.M., Ruiz M.M. and Suárez P.M. .......... 149
Seeking the identities of a ternary quaternion algebra
    Beites P. D., Nicolás A. P. and Pozhidaev A. P. .......... 160
Computer-aided clinker analysis
    Bilbao-Castro J.R. et al. .......... 166
Fractional calculus and Levy flights: modelling spatial epidemic spreading
    Boto J.P. and Stollenwerk N. .......... 177
Detection of faults and gradient faults from scattered data with noise
    Bozzini M. and Rossini M. .......... 189
Growth of Individuals in Randomly Fluctuating Environments
    Braumann C.A., Filipe P.A., Carlos C. and Roquete C. .......... 201


Virtual detectors, transformation and application to pattern recognition
    Buslaev A. and Yashina M. .......... 213
A numerical code for fast interpolation and cubature at the Padua points
    Caliari M., De Marchi S., Sommariva A. and Vianello M. .......... 218
Improving Ab-initio Protein Structure Prediction by Parallel Multi-Objective Evolutionary Optimization
    Calvo J.C., Ortega J. and Anguita M. .......... 229
Numerical Approximation of Elliptic Control Problems with Finitely Many Pointwise Constraints
    Casas E. and Mateos M. .......... 241
An Algorithm for Classification of 3-dimensional complex Leibniz algebras
    Casas J.M., Insua M., Ladra M. and Ladra S. .......... 249
A Spherical Interpolation Algorithm Using Zonal Basis Functions
    Cavoretto R. and De Rossi A. .......... 258
Complete triangular structures and Lie algebras
    Ceballos M., Nuñez J. and Tenorio A. F. .......... 270
Reconstruction of the discontinuities in the parameters of quasi-linear elliptic problems
    Cimrák I. .......... 282
Computing minimal keys using lattice theory
    Cordero P., Mora A., Enciso M., Pérez de Guzmán I. .......... 288
Incorporating a Four-Dimensional Filter Line Search Method into an Interior Point Framework
    Costa MFP. and Fernández EGMP. .......... 300
Harvesting in an ecoepidemiological model with species-barrier crossing
    Costamagna A. and Venturino E. .......... 311
Determining all indecomposable codes over some Hopf algebras
    Cuadra J., García J.M. and López-Ramos J.A. .......... 331


Comparing the behaviour of basic linear algebra routines on multicore platforms
    Cuenca J., García P., Jiménez D. and Quesada M. .......... 341
On Unequally Smooth Bivariate Quadratic Spline Spaces
    Dagnino C., Lamberti P. and Remogna S. .......... 350


Contents:

Volume II

Geometric greedy and greedy points for RBF interpolation
    De Marchi S. .......... 381
On the Consistency Restoring in SPH
    Di Blasi G., Francomano E., Tortorici A. and Toscazo E. .......... 393
Classification based on L-fuzzy sets
    Diaz S., Martinetti D., Montes I. and Montes S. .......... 405
An Improved Feature Reduction Approach based on Redundancy Techniques for Medical Diagnosis
    Díaz I., Montañés E., Combarro E.F. and Espuña-Pons M. .......... 416
The Bloodhound: a chirp-tracking algorithm for chirps separation
    Dugnol B., Fernández C., Galiano G. and Velasco J. .......... 427
Quadratic Discrete Dynamics Associated to Quadratic Maps in the Plane
    Durán R., Hernández L. and Muñoz-Masqué J. .......... 437
Optimal Control of Chemical Birth and Growth Processes in a Deterministic Model
    Escobedo R. and Fernández L. .......... 449
Fish Swarm Intelligent Algorithm for Bound Constrained Global Optimization
    Fernandes E. M. G. P. .......... 461
Numerical analysis of a contact problem including bone remodeling
    Fernández J. and Martínez R. .......... 473


Evaluation of two Parallel High Performance Libraries applied to the solution of two-group Neutron Diffusion Equation
    Flores O., Vidal V., Drummond L.A. and Verdú G. .......... 485
Parallel Simulation of SystemC Model for Power Line Communication Network
    Galiano V. et al. .......... 497
Some properties of the polynomial vector fields having a polynomial first integral
    García B., Giacomini H., Pérez del Río J. .......... 507
A numerical method appropriate for solving first-order IVPs with two fixed points
    García-Rubio R., Ramos H. and Vigo-Aguiar J. .......... 512
Potential fluid flow computations involving free boundaries with topological changes
    Garzon M., Gray L. and Sethian J. .......... 521
Probabilistic algorithm to find a normal basis in special finite fields
    Gashkov S.B., Gashkov I.B. .......... 532
Decision Procedure for Modal Logic K
    Golinska-Pilarek J., Muñoz-Velasco E., Mora A. .......... 537
Modeling cognitive systems with Category Theory: Towards rigor in cognitive sciences
    Gomez J. and Sanz R. .......... 549
Evolution of a Nested-Parallel Programming System
    Gonzalez-Escribano A. and Llanos D. .......... 554
Solution of Symmetric Toeplitz Linear Systems in GPUs
    Graciá L., Alonso P. and Vidal A. .......... 560
J-MADeM, a market-based model for complex decision problems
    Grimaldo F., Lozano M., Barber F. .......... 572
Semi-Coupled Derivative scheme for Turbulent Flows
    Hokpunna A. .......... 584


An ETD-Crank-Nicolson scheme applied to Finance
    Janssen B. .......... 589
Equilibria and simulations in a metapopulation model for the spread of infectious diseases
    Juher D., Ripoll J., Saldaña J. .......... 600
On the L-fuzzy generalization of Chu correspondences
    Krídlo O. and Ojeda-Aciego M. .......... 608
Filter Diagonalization with Prolates: Discrete Signal Data
    Levitina T. and Brandas E. J. .......... 618
HP-FASS: A Hybrid Parallel Fast Acoustic Scattering Solver
    López J.A. et al. .......... 622
A note on the first and second order theories of equilibrium figures of celestial bodies
    López-Ortí, Forner M. and Barreda M. .......... 633
Application of HOSVD to aerodynamics. The problem of shock waves
    Lorente L., Alonso D., Vega J. M. and Velázquez A. .......... 638
An Integrodifference model for the study of long distance dispersal coupled to Allee effect in a biological invasion
    Lou S. and Castro W. Jr. .......... 649
Solving Out-of-Core Linear Systems on Desktop Computers
    Marqués M., Quintana-Ortí G., Quintana-Ortí E. and van de Geijn R. .......... 660
ADITHE: An approach to optimise iterative computation on heterogeneous multiprocessors
    Martínez J.A., Garzón E.M., Plaza A., García I. .......... 665
A pipelined parallel OSIC algorithm based on the square root Kalman Filter for heterogeneous networks of processors
    Martínez-Zaldívar F.J., Vidal-Maciá A.M. and Giménez D. .......... 677


Parallel Nonlinear Conjugate Gradient Algorithms on Multicore Architectures
    Migallón H., Migallón V. and Penades J. .......... 689
A Hybrid Approach for Learning with Imbalanced Classes using Evolutionary Algorithms
    Milaré C.R., Batista G. and Carvalho A. .......... 701
On the Continuity of Incompatibility Measures on Fuzzy Sets
    Montilla W., Castiñeira E. and Cubillo S. .......... 711


Contents:

Volume III

RePMLK: A Relational Prover for Modal Logic K
    Mora A., Muñoz-Velasco E. and Pilarek J.G. .......... 743
Numerical methods for a singular boundary value problem with application to a heat conduction model in the human head
    Morgado L. and Lima P. .......... 755
Symbolic computation of the exponential matrix of a linear system
    Navarro J.F. and Pérez A. .......... 765
Attitude determination from GPS signals
    Nicolás A. P. et al. .......... 774
Flow analysis around structures in slow fluids and its applications to environmental fluid phenomena
    Oharu S., Matsuura Y. and Arima T. .......... 781
Exploiting Performance on Parallel T-Coffee Progressive Alignment by Balanced Guide Tree
    Orobitg M., Cores F. and Guirado F. .......... 793
FPGA Cluster Accelerated Boolean Synthesis
    Pedraza C. et al. .......... 806
Petri Nets as discrete dynamical systems
    Pelayo F., Pelayo M., Valverde J.C. and Garcia-Guirao J.A. .......... 817
A Lower Bound for the Oriented-Tree Network Design Problem based on Information Theory Concepts
    Pérez-Bellido A. et al. .......... 821


Evaluating Sparse Matrix-Vector Product on the FinisTerrae Supercomputer
    Pichel J.C. et al. .......... 831
Optimal Extended Optical Flow and Statistical Constraint
    Picq M., Pousin J. and Clarysse P. .......... 843
Optimization of a Hyperspectral Image Processing Chain Using Heterogeneous and GPU-Based Parallel Computing Architectures
    Plaza A., Plaza J., Sánchez S. and Paz A. .......... 854
Interior point methods for protein image alignment
    Potra F. .......... 866
Error Analysis on the implementation of explicit Falkner methods for y'' = f(x,y)
    Ramos H. and Lorenzo C. .......... 874
An approximate solution to an initial boundary valued problem to the Rakib-Sivashinsky equation
    Rebelo P. .......... 884
Shared memory programming models for evolutionary algorithms
    Redondo J.L., García I. and Ortigosa P.M. .......... 893
A Reinvestment Mechanism for Incentive Collaborators and Discouraging Free Riding in Peer-to-Peer Computing
    Rius J., Cores F. and Solsona F. .......... 905
Dimensionality Reduction and Parallel Computing for Malware Detection
    Rodriguez A. et al. .......... 917
The complexity space of partial functions: A connection between Complexity Analysis and Denotational Semantics
    Romaguera S., Schellekens M.P. and Valero O. .......... 922
Spectral centralities of complex networks vs. local estimators
    Romance M. .......... 933


Computational Methods for Finite Semifields
    Rúa I., Combarro E. and Ranilla J. .......... 937
First Programming, then Algebra
    Rubio J. .......... 945
On the Approximation of Controlled Singular Stochastic Processes
    Rus G., Stockbridge R. and Wade B. .......... 954
High-Performance Monte Carlo Radiosity on the GPU using CUDA
    Sanjurjo J. R., Amor M., Bóo M., Doallo R. and Casares J. .......... 965
Web Services based scheduling in OpenCF
    Santos A., Almeida F. and Blanco V. .......... 977
An Algorithm for Generating Sequences of Random Tuples on Special Simple Polytopes
    Shmerling E. .......... 989
Computational aspects in the investigation of chaotic multi-strain dengue models
    Stollenwerk N., Aguiar M. and Kooi B. W. .......... 995
Perfect secrecy in Semi-Linear Key Distribution Scheme
    Strunkov S. and Sánchez S. .......... 1003
Numerical Approximation of Forward-Backward Differential Equations by a Finite Element Method
    Teodoro M.F., Lima P.M., Ford N.J. and Lumb P.M. .......... 1010
Trends in the formation of aggregates and crystals from M@Si_16 clusters. A study from first principle calculations
    Torres M.B. et al. .......... 1020
A statistical characterization of differences and similarities of aggregation functions
    Troyano L. and Rodríguez-Muñiz L. J. .......... 1030
Decoding of signals from MIMO communication systems using Direct Search methods
    Trujillo R., Videl A.M., García V. .......... 1042


Unification of Analysis with Mathematica
    Ufuktepe U. and Kapcak S. .......... 1053
Application of the generalized finite difference method to solve advection-diffusion equation
    Ureña F., Benito J.J. and Gavete L. .......... 1062
Analytic likelihood function for data analysis in the starting phase of an influenza outbreak
    Van Noort S., Stollenwerk N. and Stone L. .......... 1072
Accelerating sparse matrix vector product with GPUs
    Vázquez F., Garzón E.M., Martínez J.A., Fernández J.J. .......... 1081


Contents:

Volume IV

A view of Professor Giampietro Allasia
    Venturino E. .......... 1114
Enhancing Workload Balancing in Distributed Crowd Simulations through the Partitioning Method
    Vigueras G., Lozano M. and Orduña J.M. .......... 1117
Energy decay rates for solutions of the waves equation with damping term and acoustic boundary conditions
    Yeoul Park J. and Gab Ha T. .......... 1129
Finite Element Solution of the Stationary Schrödinger Equation Using Standard Computational Tools
    Young T.D., Romero E. and Roman J.E. .......... 1140
Analysis of the detectability time in fault detection schemes for continuous-time systems
    Zufiria P.J. .......... 1151
Machine learning techniques applied to the construction of a new geomechanical quality index
    Araujo M., J. M. Matías, J. M. Rivas T. and Tabeada J. .......... 1163
Computational Methods for Immision Analysis of Urban Atmospheric Pollution
    Arroyo A., Corchado E. and Tricio V. .......... 1169


Optimal Selection of Air Pollution Control Equipments for a Network of Refinery Stacks
    Azizi N. et al. .......... 1177
Numerical Study of Depth-Averaged 90º Channel Junctions
    Baghlani A. .......... 1189
Mathematical modelling for musical representation and computation
    Castaneda E. .......... 1200
On Plotting Positions on Normal Q-Q Plots. R Script
    Castillo-Gutiérrez S. and Lozano-Aguilera E. .......... 1210
A summarization of SOM ensembles algorithm to boost the performance of a forecasting CBR system applied to forest fires
    Corchado E., Mata A. and Baruque B. .......... 1215
An Adaptative Mathematical Model for Pattern Classification in Microarray Data Analysis
    De Paz J.F., Rodríguez S., Bajo J. and Corchado J.M. .......... 1223
A new clustering algorithm applying a hierarchical method neural network
    De Paz J.F., Rodríguez S., Bajo J. and Corchado J.M. .......... 1235
A Cyclic scheduling algorithm for re-entrant manufacturing systems
    Fattahi P. .......... 1247
Comparative analysis inversion using wavelet expansions
    Fernandez Z. .......... 1257
Learning Computational Fluid Dynamics using Active Strategies: The Matlab Experience solving the Navier-Stokes Equations
    Fernández-Oro J.M. et al. .......... 1268
Design of a simple and powerful Particle Swarm optimizer
    García E., Fernández J.L. .......... 1280


Numerical simulation of the welding process in duplex stainless steel plates by FEM
    García-Nieto P.J. et al. .......... 1291
Operational Experience with CMS Tier-2 Sites
    González I. .......... 1298
On the Dynamics of a Viscoelastic String
    González-Santos G., Vargas-Jarillo C. .......... 1310
Statistics: Analysis, interpretation and presentation of data using new teaching resources
    Hernández F. et al. .......... 1320
Transmission coefficient in heterostructures
    Hdez.-Fuentevilla C., Lejarreta J. D. .......... 1326
Estimating of practical CO2 Emission Factors for Oil and Gas Plants
    Kahforoushan D. et al. .......... 1335
Nematodynamic Theory of Liquid Crystalline Polymers
    Leonov A. I. .......... 1349
3D Simulation of Charge Collection and MNU in Highly-Scaled SRAM Design
    Lin L., Yuanfu Z. and Suge Y. .......... 1357
Parallel Meshfree Computation for Parabolic Equations on Graphics Hardware
    Nakata S. .......... 1365
Determining vine leaf water stress by functional data analysis
    Ordóñez C. et al. .......... 1376
On the Minimum-Energy Problem for Positive Discrete Time Linear Systems
    Rumchev V. and Chotijah S. .......... 1381


Soft Computing for detecting thermal insulation failures in buildings
    Sedano J. et al. .......... 1392
Microphysical thermal modelling of porous media: Application to cosmic bodies
    Skorov Y., Keller H. U., Blum J. and Gundlach B. .......... 1403
Thermochemical Fragment Energy Method for Quantum Mechanical Calculations on Biomolecules
    Suárez E., Díaz N. and Suárez D. .......... 1407
Robust techniques for regression models with minimal assumptions. An empirical study
    Van der Westhuizen M., Hattingh G. and Kruger H. .......... 1417
A Computational Study on the Stability-Aromaticity Correlation of Triply N-Confused Porphyrins
    Yeguas V., Cárdenas-Jirón G., Menéndez N. and López R. .......... 1428

Abstracts

X-ray crystallography: from 'phase problem' to 'moduli problem'
    Borge J. .......... 1440
Some relationships between global measures on a network and the respective measures on the dual and the bipartite associated networks
    Criado R., Flores J., García del Amo A. and Romance M. .......... 1441
Exponential Fitting Implicit Runge-Kutta Methods
    De Bustos-Muñoz M.T. and Fernández A. .......... 1444
Running Error for the Evaluation of Rational Bézier Surfaces through a Corner Cutting Algorithm
    Delgado J. and Peña J.M. .......... 1445


An approach to sustainable petascale computing
    Drummond L.A. .......... 1449
Efficient methods for numerical simulations in electrocardiology
    Gerardo-Giorda L. et al. .......... 1450
Mesh generation for ship hull geometries for Optimization
    Hopfensitz M., Matutat J.C. and Urban K. .......... 1452
Modelling and Optimization of Ship Hull Geometries
    Hopfensitz M., Matutat J.C. and Urban K. .......... 1453
Computation of High Frequency Waves in Heterogeneous Media
    Jin S. .......... 1454
Mathematical modelling of genetic effects in the transmission of pneumococcal carriage and infection
    Lamb K., Greenhalgh D. and Robertson C. .......... 1455
Quantum Calculations for Gold Clusters, Complexes and Nanostructures
    Liu X., Hamilton I., Krawczyk R. and Schwerdtfeger P. .......... 1456
Monotone methods for neutral functional differential equations
    Obaya R. .......... 1459
Broken spin-symmetry HF and DFT approaches. A comparative analysis for nanocarbons
    Sheka E. .......... 1460


Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

RePMLK: A Relational Prover for Modal Logic K

Mora, A.1, Munoz-Velasco, E.1 and Golinska-Pilarek, J.2

1 Department of Applied Mathematics, University of Malaga. Spain

2 Institute of Philosophy, Warsaw University.

National Institute of Telecommunications. Poland.

emails: [email protected], [email protected],[email protected]

Abstract

We introduce an automatic theorem prover, called RePMLK, for a proof system in the style of dual tableaux for the relational logic associated with modal logic K. It is the first implementation of a specific relational prover for a standard modal logic. The main contribution of this paper is the implementation of new rules, called (k1) and (k2), which substitute the classical relational rules for composition and negation of composition in order to guarantee not only that every proof tree is finite but also to improve the efficiency of the prover. Moreover, this work would be the basis for successive extensions of modal logic K, such as T, B and S4.

Key words: Relational Logic, Modal Logic, Dual Tableau Methods, Implementation of theorem provers.

1 Introduction

Relational proof systems in the style of Rasiowa-Sikorski, called dual tableaux, are powerful tools for performing the four major reasoning tasks: verification of validity, verification of entailment, model checking, and verification of satisfaction. The system of the basic relational logic provides the common relational core of all dual tableaux. Therefore, for each particular theory we need only to expand the basic relational logic with specific relational constants and/or operators satisfying the appropriate axioms, and then we design specific rules corresponding to given properties of a logic and adjoin them to the core set of rules. Dual tableau systems have been constructed for many non-classical logics [4, 5, 8, 10–12, 15, 18–21].

In this paper, we introduce an automatic theorem prover, called RePMLK, for a relational proof system in the style of dual tableaux for the relational logic associated with standard modal logic K, given in [16]. It is the first implementation of a relational theorem prover for modal logics, and it is the basis of successive extensions of modal logic K, such as T, B, and S4. The main contribution of this work is the implementation of two new rules for composition and complement of composition, called (k1) and (k2), which not only guarantee that every proof tree is finite but also improve the efficiency of the prover.

There are some implementations of relational provers. For example, an implementation of the proof system for the classical relational logic is described in [7]. In [9], an implementation of translation procedures from non-classical logics to relational logic is presented. Moreover, in [6, 14] there are implementations of relational logics for order of magnitude reasoning.

Focusing our attention on modal logics, there are many implementations in the literature that use other methods, such as tableaux, sequent calculus, etc.

Classical tableau provers have been presented in papers such as [3, 17]. In [3] the authors present the tableau-based theorem prover 3TAP (version 4.0), which includes methods for handling redundant axiom sets, utilization of pragmatic information contained in axioms to rearrange the search space, and a graphical user interface for controlling 3TAP and visualizing its output. In [2], a Prolog implementation called leanK is presented for modal deduction. This prover employs a tableau procedure that exploits Prolog's built-in clause indexing scheme and backtracking mechanisms instead of relying on elaborate heuristics. Such provers are much easier to understand and adapt. In this line, RePMLK has been developed in Prolog and takes advantage of the powerful capabilities of this language: fast prototyping, modularity, and easy extensibility to other modal logics.

The paper is organized as follows. In Section 2, we present the relational proof system for modal logic K. In Section 3, we present the implementation of our prover RePMLK, and finally, some conclusions and prospects of future work are presented in Section 4.

2 Relational proof system for modal logic K

In this section, we sketch the construction of a relational proof system in the style of dual tableaux for the relational logic associated with standard modal logic K, presented in [16]. To begin with, we present the well-known modal logic K [1].

The language of K consists of the symbols from the following pairwise disjoint sets:

• V - a countably infinite set of propositional variables;

• ¬, ∨, ∧ - the set of the classical propositional operations of negation (¬), disjunction (∨), and conjunction (∧);

• 〈R〉 - the set consisting of the modal propositional operation, called the possibility operation.

As usual, the modal operation [R] of necessity is defined by [R] =def ¬〈R〉¬.


The set of K-formulas is the smallest set including the set of propositional variables and closed with respect to all the propositional operations.

A K-model is a structure M = (U, R, m), where U is a non-empty set (of states), R is a binary relation on U, and m is a meaning function such that m(p) ⊆ U, for every propositional variable p ∈ V. The relation R is referred to as the accessibility relation.

The satisfaction relation is defined inductively as usual, where ϕ and ψ are K-formulas. We emphasize here the meaning of the modal formulas:

M, s |= 〈R〉ϕ iff there exists s′ ∈ U such that (s, s′) ∈ R and M, s′ |= ϕ.

A K-formula ϕ is said to be true in a K-model M = (U, R, m), denoted M |= ϕ, whenever for every s ∈ U, M, s |= ϕ, and it is K-valid whenever it is true in all K-models.

Now, we define the relational logic RLK, appropriate for expressing formulas of the modal logic K. The vocabulary of the language of the relational logic RLK consists of the symbols from the following pairwise disjoint sets:

• OV = {z0, z1, . . .} - a countably infinite set of object variables;

• RV = {S1, S2, . . .} - a countably infinite set of relational variables;

• R - the set consisting of the relational constant R representing the accessibility relation from K-models;

• −,∪,∩, ; - the set of relational operations.

The set of relational terms, RT, is the smallest set which includes RV and satisfies:

If P,Q ∈ RT, then −P,P ∪Q,P ∩Q, (R;P ) ∈ RT.

RLK-formulas are of the form ziTzj, where zi, zj are object variables and T is any relational term.

An RLK-model is a structure M = (U, R, m), where U is a non-empty set, R is a binary relation on U, and m is a meaning function satisfying:

• m(S) = X × U , where X ⊆ U , for every relational variable S;

• m(R) = R, i.e., R is the interpretation of the relational constant R;

• m extends to all the compound relational terms as follows:

m(−P ) = (U × U) −m(P );

m(P ∪Q) = m(P ) ∪m(Q); m(P ∩Q) = m(P ) ∩m(Q);

m(R;P ) = {(x, y) ∈ U × U : ∃z ∈ U ((x, z) ∈ R ∧ (z, y) ∈ m(P ))}.


Let M = (U, R, m) be an RLK-model. A valuation in M is any function v : OV → U. An RLK-formula ziTzj is satisfied in an RLK-model M by a valuation v, denoted M, v |= ziTzj, whenever (v(zi), v(zj)) ∈ m(T ). A formula is true in M whenever it is satisfied by all the valuations in M, and it is RLK-valid whenever it is true in all RLK-models.

The translation of K-formulas into relational terms starts with a one-to-one assignment of relational variables to the propositional variables. Let τ′ be such an assignment. Then the translation τ of formulas is defined inductively as follows:

τ(p) = τ ′(p), for any propositional variable p ∈ V;

τ(¬ϕ) = −τ(ϕ); τ(ϕ ∨ ψ) = τ(ϕ) ∪ τ(ψ); τ(ϕ ∧ ψ) = τ(ϕ) ∩ τ(ψ);

τ(〈R〉ϕ) = (R; τ(ϕ)).
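To illustrate how this translation can be mechanized, the following is a minimal Prolog sketch that maps K-formulas to the relational term syntax used by RePMLK in Section 3 (uni, inter, opp and comp(r, ·)). The modal-side constructors neg/1, or/2, and/2, dia/1, box/1 and the predicate name translate/2 are our own naming assumptions for this sketch; they are not taken from the prover. Propositional variables are simply mapped to themselves, playing the role of the relational variables assigned by τ′.

% translate(+ModalFormula, -RelationalTerm): a sketch of the translation tau.
translate(neg(F), opp(T)) :- !, translate(F, T).
translate(or(F, G), uni(T1, T2)) :- !, translate(F, T1), translate(G, T2).
translate(and(F, G), inter(T1, T2)) :- !, translate(F, T1), translate(G, T2).
translate(dia(F), comp(r, T)) :- !, translate(F, T).            % tau(<R>phi) = (R ; tau(phi))
translate(box(F), opp(comp(r, opp(T)))) :- !, translate(F, T).  % [R]phi defined as not <R> not phi
translate(P, P) :- atom(P).                                     % propositional variable

Under these assumptions, the query translate(or(dia(neg(and(p, q))), and(box(p), box(q))), T) should return exactly the relational term shown in Example 1 of Section 3.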

The following result ensures the preservation of validity via the translation of formulas of modal logic into relational terms.

Theorem 1 For every K-formula ϕ, ϕ is K-valid iff z1τ(ϕ)z0 is RLK-valid.

We finish this section by presenting the relational proof system for our logic [13]. Relational proof systems are determined by the axiomatic sets of formulas and rules which apply to finite sets of relational formulas. The axiomatic sets take the place of axioms. The rules are intended to reflect properties of relational operations and constants. There are two groups of rules: decomposition rules and specific rules. Given a formula, the decomposition rules of the system enable us to transform it into simpler formulas, while the specific rules enable us to replace a formula by some other formulas. The rules have the following general form:

(∗)  X ∪ Ψ              (∗∗)  X ∪ Ψ
     ------                   -----------------
     X ∪ Φ                    X ∪ Φ1 | X ∪ Φ2

where X, Ψ, Φ, Φ1, Φ2 are finite non-empty sets of formulas such that X ∩ Ψ = ∅. A rule of the form (∗∗) is a branching rule. In a rule, the set above the line is referred to as its premise and the set(s) below the line is (are) its conclusion(s). A rule of the form (∗) (resp. (∗∗)) is applicable to a finite set Y if and only if Y = X ∪ Ψ and Φ ⊈ Y (resp. Φ1 ⊈ Y or Φ2 ⊈ Y), that is, an application of a rule must introduce a new formula. A new variable is a variable which appears in the conclusion of a rule but does not appear in its premise.

Decomposition rules of the RLK-dual tableau are (∪), (∩), (−∪), (−∩), (−), (k1), and (k2), of the following forms:

For every k ≥ 1 and for all relational terms P and Q,

(∪)  X ∪ {zk(P ∪ Q)z0}             (∩)  X ∪ {zk(P ∩ Q)z0}
     ---------------------              ---------------------------
     X ∪ {zkPz0, zkQz0}                 X ∪ {zkPz0} | X ∪ {zkQz0}


(−∪)  X ∪ {zk−(P ∪ Q)z0}                  (−∩)  X ∪ {zk−(P ∩ Q)z0}
      -----------------------------             ------------------------
      X ∪ {zk−Pz0} | X ∪ {zk−Qz0}               X ∪ {zk−Pz0, zk−Qz0}

(−)   X ∪ {zk−−Pz0}
      ---------------
      X ∪ {zkPz0}

For all k, l, m ≥ 1 and for all relational terms Pi, Qj, 1 ≤ i ≤ l, 1 ≤ j ≤ m,

(k1)  X ∪ {zk−(R;Q1)z0, . . . , zk−(R;Qm)z0}
      -----------------------------------------------
      X ∪ {zk1−Q1z0, . . . , zk1+(m−1)−Qmz0}

(k2)  X ∪ {zk(R;Pi)z0}i∈{1,...,l} ∪ {zk−(R;Q1)z0, . . . , zk−(R;Qm)z0}
      --------------------------------------------------------------------------------------------
      X ∪ {zk1Piz0, . . . , zk1+(m−1)Piz0}i∈{1,...,l} ∪ {zk1−Q1z0, . . . , zk1+(m−1)−Qmz0}

provided that:

• zk(R;T)z0 ∉ X and zk−(R;T′)z0 ∉ X for all terms T and T′;

• zk < zk1 and k1 is the smallest index such that zk1 is a new variable.

The specific rule of RLK-dual tableau, (right), is of the form:

For all k ≥ 1, j ≥ 0 and for every relational variable S,

(right)  X ∪ {zkSzj}
         --------------------
         X ∪ {zkSzl, zkSzj}

provided that:

• l is the smallest index such that zl occurs in X ∪ {zkSzj};

• l ≠ j and zkSzl ∉ X.

A finite set of RLK-formulas is said to be an RLK-axiomatic set whenever it is a superset of {zkPzj, zk−Pzj}, for some object variables zk, zj and for some relational term P.

Let z1Tz0 be an RLK-formula. An RLK-proof tree of z1Tz0 is a tree with the following properties:

• the formula z1Tz0 is at the root of this tree;

• each node except the root is obtained by an application of a rule to its predecessor node; the rules are applied with the following ordering: (−), (∪), (−∩), (∩), (−∪), (right), (k1), and (k2);

• a node does not have successors whenever its set of formulas is an RLK-axiomatic set or none of the rules is applicable to its set of formulas.

A branch of an RLK-proof tree is closed whenever it contains a node with an RLK-axiomatic set of formulas. An RLK-proof tree is closed if and only if all of its branches are closed. An RLK-formula z1Tz0 is RLK-provable whenever there is a closed RLK-proof tree of it, which is then referred to as its RLK-proof.

The following result ensures the equivalence between validity of a K-formula and provability in our relational system.


Theorem 2 (Relational Soundness and Completeness of K) For every K-formula ϕ, the following conditions are equivalent:

1. ϕ is K-valid;

2. z1τ(ϕ)z0 is RLK-provable.

3 The implementation of RePMLK

In the previous section, we presented a new proof system based on relational dual tableaux for modal logic K. The decision procedure developed in that theoretical framework has improved the rules and the engine of the prover, and the result is a new ATP, called RePMLK. In this section, we summarize how RePMLK works at three levels: representation of the formulas, rules of the new proof system, and significant enhancements in the engine of the prover. From now on, we work with the relational translation of modal formulas as explained in the previous section. We represent the formula z1Tz0 as the Prolog fact rel([1], T, z1, z0). The node [1] denotes the root of the proof tree, which is developed by the Prolog tool by applying the rules of RLK.

Example 1 The formula z1 (R;−(p ∩ q)) ∪ (−(R;−p) ∩ −(R;−q)) z0 of RLK¹ is translated into the following fact in Prolog:

rel([1],uni(comp(r,opp(inter(p, q))), inter(opp(comp(r,opp(p))),

opp(comp(r,opp(q))))), Z1, Z0)

Prolog knows the leaf in which it must apply any rule, because the predicate

leaves([[1,...,1], ..., [1,...,k]])

stores the leaves that the tool must close. Prolog will try to satisfy the relations in the leaf nodes. If the tool can close all the leaves in the tree, then the formula is valid.

As said above, rules of RLK have the following general form:

(∗)  X ∪ Ψ              (∗∗)  X ∪ Ψ
     ------                   -----------------
     X ∪ Φ                    X ∪ Φ1 | X ∪ Φ2

Now, we explain how our prover works when (∗∗) is applicable to a set of formulas Y = X ∪ Ψ; the case of (∗) is similar. If Y appears in the leaf [i1, i2, . . . , ik], the system divides this leaf into two new leaves, labeled [i1, i2, . . . , ik, 1] and [i1, i2, . . . , ik, 2], by copying X ∪ Φ1 to the node [i1, i2, . . . , ik, 1] and X ∪ Φ2 to the node [i1, i2, . . . , ik, 2] (see Figure 1). In [14], an explanation of the common rules for modal logics (union, intersection, etc.) is available. We have translated the rules for RLK to clauses in Prolog, and below we outline the implementation of the powerful new rules (k1) and (k2).

1 This formula is the relational translation of the theorem of K: [R](p ∧ q) → ([R]p ∧ [R]q).


[Figure 1: Division of a leaf of the tree. Before the rule application, the leaf [i1, i2, . . . , ik] holds the set X; afterwards it has two successor leaves, [i1, i2, . . . , ik, 1] holding X \ Φ ∪ Φ1 and [i1, i2, . . . , ik, 2] holding X \ Φ ∪ Φ2.]
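As an illustration of the splitting step depicted in Figure 1, a minimal sketch could look as follows. Here split_leaf/4 and copy_rels/2 are hypothetical helper names introduced only for this sketch (the prover's real bookkeeping is not listed here), while leaves/1, rel/4 and add_list_of_relations/2 are the predicates described in the text.

% split_leaf(+Leaf, +Phi, +Phi1, +Phi2): sketch of the step of Figure 1.
% Phi is the decomposed rel/4 formula; Phi1 and Phi2 are the lists of new
% formulas for the two children leaves.
split_leaf(Leaf, Phi, Phi1, Phi2) :-
    retract(Phi),                          % X \ Phi remains stored under Leaf
    append(Leaf, [1], Leaf1),
    append(Leaf, [2], Leaf2),
    copy_rels(Leaf, Leaf1),                % copy X \ Phi to both children
    copy_rels(Leaf, Leaf2),
    add_list_of_relations(Leaf1, Phi1),    % add Phi1 to [..., 1]
    add_list_of_relations(Leaf2, Phi2),    % add Phi2 to [..., 2]
    retract(leaves(Ls)),
    select(Leaf, Ls, Rest),                % the parent leaf is no longer open
    assertz(leaves([Leaf1, Leaf2 | Rest])).

% copy_rels(+From, +To): copy every stored formula of leaf From to leaf To.
copy_rels(From, To) :-
    forall(rel(From, T, Z1, Z2), assertz(rel(To, T, Z1, Z2))).

The (k1) clause of RePMLK is shown next: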

k1(IdLeaf):-

rel(IdLeaf,opp(comp(r,Q)),Zk,Z0),

\+rel(IdLeaf,comp(r,_),Zk,Z0),
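% Side condition of (k1) (Section 2): the leaf must contain no formula
% of the form zk(R;T)z0 for these Zk, Z0.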

...

allOppCompRels(rel(IdLeaf,opp(comp(r,Q)),Zk,Z0), ListRels),

...

\+rule_used(IdLeaf,k1,ListRels),!,

add_relsK1(ListRels,ListRelsDeduced),

write_rule(’k1 ’, ListRels, ListRelsDeduced),

add_list_of_relations(IdLeaf,ListRelsDeduced).

These rules are related to the composition and complemented composition relations. When Prolog tries to apply the (k1) rule in the leaf selected by the engine (depth-first search), it checks whether some formula in the leaf IdLeaf matches rel(IdLeaf, opp(comp(r,Q)), Zk, Z0) and, in the same leaf IdLeaf, no formula matches rel(IdLeaf, comp(r,_), Zk, Z0). Then Prolog collects, using the allOppCompRels predicate, the list of formulas ListRels matching the pattern rel(IdLeaf, opp(comp(r,Qi)), Zk, Z0) for the instantiated variables Zk, Z0. If this rule has not been applied previously in IdLeaf for these variables (rule_used), then the formulas rel(IdLeaf, opp(Qi), Zki, Z0) (where the Zki are new variables) are deduced and stored in ListRelsDeduced using the add_relsK1 predicate. Finally, the ListRelsDeduced formulas are added to the node IdLeaf with the predicate add_list_of_relations. Now, we focus our attention on the rule (k2):

k2(Leaf):-

rel(Leaf,opp(comp(r,Q)),Zk,Z0),

rel(Leaf,comp(r,P),Zk,Z0),
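% (k2) requires, for the same Zk, Z0, both a composition formula comp(r,P)
% and a complemented composition formula opp(comp(r,Q)) in the current leaf.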

...

allOppCompRels(rel(Leaf,opp(comp(r,Q)),Zk,Z0), ListOppcomRels),

allCompRels(rel(Leaf,comp(r,P),Zk,Z0), ListComRels),

append(ListOppcomRels,ListComRels,ListRels),

...

\+rule_used(Leaf,k2,ListRels),

...

add_relsK2(ListOppcomRels, ListComRels,ListRelsDeduced),

write_rule(’k2 ’, ListRels, ListRelsDeduced),

add_list_of_relations(Leaf,ListRelsDeduced).


When Prolog matches some formula in the leaf IdLeaf with rel(IdLeaf, opp(comp(r,Q)), Zk, Z0) and some formula with rel(IdLeaf, comp(r,_), Zk, Z0), the (k2) decomposition rule can be applied. Then Prolog collects a list of formulas ListOppcomRels matching the pattern rel(IdLeaf, opp(comp(r,Qi)), Zk, Z0), using the allOppCompRels predicate, and a list of formulas ListComRels matching the pattern rel(Leaf, comp(r,Pj), Zk, Z0), using the allCompRels predicate. Finally, if this rule has not been applied in the node IdLeaf, then the formulas rel(IdLeaf, Pj, Zki, Z0) and rel(IdLeaf, opp(Qi), Zki, Z0) (ListRelsDeduced) are deduced, where the Zki are new variables, using the add_relsK2 predicate. In the same way as for the (k1) rule, the ListRelsDeduced formulas are added to the node IdLeaf.
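Before turning to the engine, the following is a rough sketch of what the add_relsK1 step could look like. The clause structure and the fresh-variable generator new_object_variable/1 (producing constants a1, a2, . . . like the one that appears in the trace of Example 2) are our own reconstruction for illustration, not the prover's actual code.

% add_relsK1(+ListRels, -ListRelsDeduced): for every formula
% rel(Leaf, opp(comp(r,Q)), Zk, Z0), deduce rel(Leaf, opp(Q), Zki, Z0),
% where Zki is a fresh object variable.
add_relsK1([], []).
add_relsK1([rel(Leaf, opp(comp(r, Q)), _Zk, Z0) | Rest],
           [rel(Leaf, opp(Q), Zki, Z0) | RestDeduced]) :-
    new_object_variable(Zki),
    add_relsK1(Rest, RestDeduced).

:- dynamic var_counter/1.
var_counter(0).

% new_object_variable(-Z): generate a fresh constant a1, a2, ...
new_object_variable(Z) :-
    retract(var_counter(N)),
    N1 is N + 1,
    assertz(var_counter(N1)),
    number_codes(N1, Codes),
    atom_codes(Suffix, Codes),
    atom_concat(a, Suffix, Z).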

Now, we show the engine of RePMLK. The main predicate in the inference engine is run_engine, which examines the first leaf of the tree that has to be checked and tries to apply the rules to the relations that appear in this leaf.

run_engine:-

leaves(L),

\+is_list_null(L),

...

engine,!.

engine:-

apply_rules.

apply_rules:-

first_leaf([FirstLeaf]),

maplist(apply_rules_in_leaf,[FirstLeaf]),!,

...

run_engine.

apply_rules_in_leaf(FirstLeaf):-
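% Rules are tried in the ordering fixed in Section 2 for proof trees:
% (−), (∪), (−∩), (∩), (−∪), (right), (k1), (k2).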

not2(FirstLeaf)-> axiomatic_set;

uni(FirstLeaf)-> axiomatic_set;

notinter(FirstLeaf)-> axiomatic_set;

inter(FirstLeaf)-> axiomatic_set;

notuni(FirstLeaf)-> axiomatic_set;

right(FirstLeaf)-> axiomatic_set;

k1(FirstLeaf)-> axiomatic_set;

k2(FirstLeaf)-> axiomatic_set,!.

apply_rules_in_leaf(_):-

leaves([]),

write(’ OK. There are no Leaves in the proof tree. ’), nl,

write(’ VALID. ’), nl, !.

While the tree has open leaves, run_engine is recursively called. If all leaves are closed, then the system informs the user that the proof is finished, and it is possible to trace (via the used_rules predicate) which rules have been used in the proof process. The engine of RePMLK uses the pattern-matching mechanism of Prolog to detect whether any leaf of the tree contains an axiomatic set; it then deletes the corresponding leaf and informs the user.


axiomatic_set:-

rel(NumLeaf,equal,Zk,Zk),

nl,

remove_leaf(NumLeaf,[rel(NumLeaf,equal,Zk,Zk)]),!.

.....
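The clause listed above handles formulas of the form zk equal zk; the remaining clauses are elided ('.....'). A plausible sketch of the clause that detects axiomatic sets of the form {zkPzj, zk−Pzj}, which are exactly the pairs reported in the trace of Example 2 below, could be the following; it is our reconstruction, not the prover's actual code:

% Hypothetical clause: a leaf is closed as soon as it contains both
% zk P zj and zk -P zj for some relational term P.
axiomatic_set :-
    rel(NumLeaf, opp(P), Zk, Zj),
    rel(NumLeaf, P, Zk, Zj),
    nl,
    remove_leaf(NumLeaf, [rel(NumLeaf, opp(P), Zk, Zj), rel(NumLeaf, P, Zk, Zj)]), !.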

Example 2 In this example, we execute RePMLK to prove the following formula².

rel([1], uni(opp(inter(opp(comp(r, opp(p))), opp(comp(r, opp(q))))),

opp(comp(r, opp(inter(p, q))))), x, y).

This example is proved by RePMLK with the Prolog query:

?- run('reomtheorK2.pl', 'logaxiomk2.txt').

The following report is returned in the logaxiomk2.txt file:

[rel([1], uni(opp(inter(opp(comp(r, opp(p))), opp(comp(r, opp(q))))),

opp(comp(r, opp(inter(p, q))))), x, y)]

_________________________________________________________________________ Union Rule

[rel([1], opp(inter(opp(comp(r, opp(p))), opp(comp(r, opp(q))))), x, y),

rel([1], opp(comp(r, opp(inter(p, q)))), x, y)]

[rel([1], opp(inter(opp(comp(r, opp(p))),

opp(comp(r, opp(q))))), x, y)]

__________________________________________________ Opposite Intersection Rule

[rel([1], opp(opp(comp(r, opp(p)))), x, y),

rel([1], opp(opp(comp(r, opp(q)))), x, y)]

[rel([1], opp(opp(comp(r, opp(p)))), x, y)]

____________________________________________ 2 Not Rule

[rel([1], comp(r, opp(p)), x, y)]

[rel([1], opp(opp(comp(r, opp(q)))), x, y)]

____________________________________________ 2 Not Rule

[rel([1], comp(r, opp(q)), x, y)]

[rel([1], opp(comp(r, opp(inter(p, q)))), x, y),

rel([1], comp(r, opp(p)), x, y),

rel([1], comp(r, opp(q)), x, y)]

___________________________________________________ k2 Rule

[rel([1], opp(opp(inter(p, q))), a1, y),

rel([1], opp(p), a1, y), rel([1], opp(q), a1, y)]

[rel([1], opp(opp(inter(p, q))), a1, y)]

2 This formula is the relational translation of the theorem of K: ([R]p ∧ [R]q) → [R](p ∧ q).


___________________________________________ 2 Not Rule

[rel([1], inter(p, q), a1, y)]

[rel([1], inter(p, q), a1, y)]

__________________________________________________ Intersection Rule

rel([1, 1], p, a1, y) | rel([1, 2], q, a1, y)

---------->

Found axiomatic set. Leaf: [1, 2]

- Axiomatic set: [rel([1, 2], opp(q), a1, y), rel([1, 2], q, a1, y)]

- Deleted relations in Leaf: [1, 2]

---------->

Found axiomatic set. Leaf: [1, 1]

- Axiomatic set: [rel([1, 1], opp(p), a1, y), rel([1, 1], p, a1, y)]

- Deleted relations in Leaf: [1, 1]

-------->

Variables used: [a1, x, y]

OK. There are no Leaves in the proof tree.

VALID.

used_rules([1], inter, [rel(inter(p, q), a1, y)]).

used_rules([1], not2, [rel(opp(opp(inter(p, q))), a1, y)]).

used_rules([1], k2, [rel(opp(comp(r, opp(inter(p, q)))), x, y), ...

used_rules([1], not2, [rel(opp(opp(comp(r, opp(q)))), x, y)]).

used_rules([1], not2, [rel(opp(opp(comp(r, opp(p)))), x, y)]).

used_rules([1], notinter, [rel(opp(inter(opp(comp(r, opp(p))), ...

used_rules([1], union, [rel(uni(opp(inter(opp(comp(r, opp(p))), ...

Notice that after the application of the intersection rule, the engine of RePMLK detects two axiomatic sets and the leaves of the tree are closed. Then the formula is valid and a trace of the rules applied is returned.

4 Conclusions and Future Work

We have presented a Prolog implementation, called RePMLK, of a relational dual tableau for modal logic K. The key step of this work is the implementation of the new rules (k1) and (k2), which guarantee that every proof tree is finite and improve the efficiency of our prover.

RePMLK makes use of Prolog backtracking, the matching mechanism for free variables, and logic programming techniques in order to obtain a simple and modular prover. We remark that the results are promising: all the axioms and examples executed with RePMLK are proved in a few steps, and it works efficiently with the new rules (k1) and (k2).

We are working on a comparison of our implementation with other provers for modal logic K using generators of formulae such as the LWB benchmark formulae for the propositional modal logic K (http://www.lwb.unibe.ch/). Moreover, we are studying the complexity of our system, the extension of this prover to other modal logics such as T, B, and S4, and the improvement of its efficiency. Last, but not least, we are preparing a more general and user-friendly interface which could be generic for this type of logics.

Acknowledgements

This work is partially supported by the Spanish research project TIN2006-15455-C03-01, and the second author is also partially supported by project P6-FQM-02049. The first author of the paper is partially supported by the Polish Ministry of Science and Higher Education grant N N206 399134.


Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009, 30 June, 1–3 July 2009.

Numerical methods for a singular boundary value problem with application to a heat conduction model in the human head

L. Morgado1 and P. Lima2

1 Cemat/Department of Mathematics, University of Tras-os-Montes e Alto Douro, Vila Real, Portugal

2 Cemat/Centro de Matematica e Aplicacoes, Instituto Superior Tecnico, Lisboa, Portugal

emails: [email protected], [email protected]

Abstract

A class of singular boundary value problems modeling the heat conduction in the human head is studied. Suitable singular Cauchy problems are considered in order to determine one-parameter families of solutions in the neighborhood of the singularities. These families are then used to construct stable shooting algorithms for the solution of the considered problems. A finite difference method is also introduced and, taking into account the behavior of the solution in the neighborhood of the singular points, a variable substitution is proposed to improve its convergence order. Numerical results are presented and discussed.

Key words: Singular boundary value problems, one-parameter families of solutions, shooting method, degenerate laplacian, finite difference

MSC 2000: 65L05

1 Introduction

In this paper we consider the following boundary value problem (BVP)

\left(|y'(x)|^{m-2}y'(x)\right)' + \frac{N-1}{x}\,|y'(x)|^{m-2}y'(x) = f(y), \quad 0 < x < 1, \qquad (1)

y'(0) = 0, \qquad (2)

a\,y(1) + b\,y'(1) = c, \qquad (3)

where we assume that m > 1, N \ge 1, a > 0, b \ge 0 and c \ge 0. The source function f has the form

f(y) = -\alpha e^{-\beta y}, \quad \alpha > 0, \ \beta > 0. \qquad (4)


The motivation for studying problem (1)-(4) comes from a mathematical model for the distribution of heat sources in the human head. This model was originally described by U. Flesh [3] and B. Gray [4]. In [1], N. Anderson and A. Arthurs have analysed the same problem, which they describe in the form of the following BVP

y''(x) + \frac{2}{x}y'(x) + \frac{q(y)}{K} = 0, \quad 0 < x < R; \qquad (5)

y(0) \text{ finite}, \qquad -K y'(R) = \beta\,(y(R) - \theta_a); \qquad (6)

here q is the heat conduction rate per unit volume, y is the absolute temperature, x is the radial distance from the center, K is the (average) thermal conductivity inside the head, β is a heat exchange coefficient and θ_a is the ambient temperature. The source function q has the form

q(y) = \alpha e^{-My/\alpha}, \qquad (7)

where M and α are positive constants. In [1], the authors have used complementary variational principles to obtain approximate solutions of this problem. In [2], R.C. Duggan and A.M. Goodman have considered the same problem and computed lower and upper bounding functions (which in the literature are also known as lower and upper solutions). In this way, they have obtained more accurate numerical results, whose accuracy is verified by the closeness of the computed bounding functions. In [8] and [9] the authors have used a finite difference scheme to obtain the solution of a wider class of problems: instead of the linear differential operator Ly = y''(x) + \frac{2}{x}y'(x) they have considered \frac{(p(x)y'(x))'}{p(x)}, where p(x) = x^{b_0}, with b_0 \ge 1. This operator reduces to Ly when b_0 = 2. They have also introduced the boundary condition y'(0) = 0, which in this case is equivalent to the condition that y is bounded at the origin. Problem (1)-(4), considered in the present paper, is also a generalization of (5), (6), where we replace the linear differential operator Ly by

x^{1-N}\left(x^{N-1}|y'(x)|^{m-2}y'(x)\right)',

which represents the radial part of the N-dimensional m-laplacian. As can easily be seen, this operator reduces to Ly when m = 2 and N = 3. The physical meaning of introducing the m-laplacian in heat conduction problems is that we replace the usual Fourier law, according to which the modulus of the heat flux is proportional to the modulus of the temperature gradient, by a generalized Fourier law, which states that the modulus of the heat flux is proportional to a certain power (m − 1) of the modulus of the temperature gradient. This generalized Fourier law is usually applied to describe heat conduction in non-homogeneous media and therefore it seems reasonable to use it when modeling the human head.

Problem (1)-(4) is singular with respect to the independent variable at x = 0, due to the division by zero in the second term on the left-hand side of (1), but also with respect to the dependent variable whenever m > 2, due to the boundary condition (2).

Our main concern will be the study of the behavior of the solution in the neighborhood of this singular point. That is what we will do in the next section, where we determine one-parameter families of solutions of the singular Cauchy problem (1),


(2). Based on that behavior, in section 3 we will introduce a stable shooting algorithm, using the same approach as we did in [6] and [7]; the parameter of the family of solutions is varied in order to satisfy condition (3). In section 4 we apply a finite difference scheme to solve the problem; in order to improve its convergence order, a variable substitution is introduced, which takes into account the asymptotic behavior of the solution.

2 Behavior of the solution in the neighborhood of the singularity x = 0

Consider the following singular initial value problem

\left(|y'(x)|^{m-2}y'(x)\right)' + \frac{N-1}{x}\,|y'(x)|^{m-2}y'(x) = f(y), \quad x > 0, \qquad (8)

y(0) = y_0, \qquad \lim_{x \to 0^+} x\,y'(x) = 0, \qquad (9)

where f(y) is defined by (4). Let us look for a solution of this problem in the form

y(x) = y_0 - Cx^{k}(1 + o(1)), \qquad (10)

y'(x) = -Ckx^{k-1}(1 + o(1)),

y''(x) = Ck(k-1)x^{k-2}(1 + o(1)), \quad x \to 0^+,

where C is a positive constant and k > 1. If we substitute (10) in (8) we obtain

k = \frac{m}{m-1}, \qquad k - 1 = \frac{1}{m-1} > 0, \qquad C = \frac{1}{k}\left(\frac{\alpha e^{-\beta y_0}}{N}\right)^{k-1}. \qquad (11)

In order to improve representation (10) we perform the variable substitution

y(x) = y_0 - Cx^{k}(1 + g(x)),

obtaining the Cauchy problem in the new unknown g:

\frac{m-1}{kN}\left[k(k-1)(1+g) + 2kxg' + x^{2}g''\right]\left[1 + g + \frac{x}{k}g'\right]^{m-2} + \frac{N-1}{N}\left[1 + g + \frac{x}{k}g'\right]^{m-1} = e^{C\beta x^{k}(1+g)}, \qquad (12)

g(0) = 0, \qquad \lim_{x \to 0^+} x\,g'(x) = 0. \qquad (13)

Let us seek a particular solution of problem (12), (13) in the form

g_p(x, y_0) = \sum_{l=0,\,j=0,\,l+j \ge 1}^{+\infty} g_{l,j}(y_0)\, x^{\,l + j\frac{m}{m-1}}, \qquad 0 \le x \le \delta(y_0), \quad \delta(y_0) \ge 0. \qquad (14)

The coefficients g_{l,j}, depending on y_0, may be determined by formally inserting (14) in (12), resulting, for l = 0 and j = 1, in

g_{0,1} = \frac{CN\beta}{2(m - N + mN)}. \qquad (15)


We will now prove that the particular solution (14) is, in fact, the only solution of problem (12), (13).

Performing the variable substitutions z_1 = g, z_2 = xg', the initial problem (12), (13) can be rewritten as

x z' = A z + F(x, z) + H(x), \qquad z(0) = 0,

where

z = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & 1 \\ -\frac{mN}{(m-1)^2} & -\frac{m+N}{m-1} \end{pmatrix}, \qquad F(x,z) = \begin{pmatrix} 0 \\ f(x, z_1, z_2) \end{pmatrix}, \qquad H(x) = \begin{pmatrix} 0 \\ H(x) \end{pmatrix},

f(x, z_1, z_2) = \frac{kN}{m-1}\, e^{C\beta x^{k}}\left[\left(1 + z_1 + \tfrac{1}{k}z_2\right)^{2-m} e^{C\beta x^{k} z_1} - 1\right] \quad \text{and} \quad H(x) = \frac{kN}{m-1}\left(e^{C\beta x^{k}} - 1\right).

Since the eigenvalues of the matrix A are \lambda_1 = -\frac{m}{m-1} < 0 and \lambda_2 = -\frac{N}{m-1} < 0, theorem 5 of [5] states that problem (12), (13) has a unique solution. Therefore it has no other solution than g_p.

Returning to the initial variable we easily obtain the following result.

Theorem 1 For each y_0 > 0, problem (8), (9) has, in the neighborhood of x = 0, a unique holomorphic solution that can be represented by

y(x, y_0) = y_0 - Cx^{k}\left(1 + g_{0,1}x^{k} + o(x^{k})\right),

where k, C and g_{0,1} are given by (11) and (15), respectively.

3 A shooting algorithm

As we have done in previous works for different classes of boundary value problems (see, for example, [6] and [7]), in this section we implement a shooting algorithm based on the behavior of the solution in the neighborhood of the singular point x = 0.

In order to do that, we consider the following regular initial value problem

\left(|y'(x)|^{m-2}y'(x)\right)' + \frac{N-1}{x}\,|y'(x)|^{m-2}y'(x) = f(y),

y(\delta) = y_0 - C\delta^{k}\left(1 + g_{0,1}\delta^{k}\right),

y'(\delta) = \left[\frac{d}{dx}\left(y_0 - Cx^{k}\left(1 + g_{0,1}x^{k}\right)\right)\right]_{x=\delta},

for a certain value of y_0 > 0 and δ small. This problem can be solved by any standard numerical method (in our case we have used the NDSolve package of Mathematica [10]). Starting with an initial value for y_0 (see Remark 2 below), this value is adjusted by an iterative process, in order to make the solution of the initial value problem satisfy the boundary condition (3).


Remark 2 The performance of the shooting method depends strongly on the choice of the initial value for y_0. In our case, such an initial value can be obtained in a straightforward way. Consider an approximation of the solution of the Cauchy problem (8), (9), whose form is given by Theorem 1 (retaining only the first two terms of the series):

\bar{y}(x, y_0) = y_0 - Cx^{k}\left(1 + g_{0,1}x^{k}\right). \qquad (16)

Replacing y by \bar{y} in the boundary condition (3), we obtain an equation that can be solved with respect to y_0. The value of y_0 obtained as the root of this equation has proved to be a good initial guess for the shooting method. (In all the cases considered in the present paper it gives an approximation of the exact value with 2-3 correct digits.)
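As an illustration of the shooting procedure and of Remark 2, the following Python sketch reproduces the same steps with SciPy instead of the Mathematica NDSolve package used by the authors; the value of δ, the tolerances and the starting guess 0.5 are our own illustrative choices, not the authors'.

# Illustrative sketch (not the authors' Mathematica code): shooting for
# problem (1)-(3) using the series data of Theorem 1 near x = 0.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import newton

m, N, alpha, beta = 2.0, 3.0, 1.0, 1.0      # test case parameters
a, b, c = 1.0, 1.0, 0.0                     # boundary condition (3)
delta = 1e-4                                # small starting point (our choice)
k = m / (m - 1.0)

def C_of(y0):                               # constant C of (11)
    return (1.0 / k) * (alpha * np.exp(-beta * y0) / N) ** (k - 1.0)

def g01_of(y0):                             # coefficient g_{0,1} of (15)
    return C_of(y0) * N * beta / (2.0 * (m - N + m * N))

def ybar(y0, x):                            # two-term approximation (16)
    return y0 - C_of(y0) * x**k * (1.0 + g01_of(y0) * x**k)

def dybar(y0, x):                           # derivative of (16)
    C, g01 = C_of(y0), g01_of(y0)
    return -C * k * x**(k - 1.0) - 2.0 * k * C * g01 * x**(2.0 * k - 1.0)

def rhs(x, z):                              # z = (y, w), w = |y'|^{m-2} y'
    y, w = z
    dy = np.sign(w) * np.abs(w) ** (1.0 / (m - 1.0))
    return [dy, -alpha * np.exp(-beta * y) - (N - 1.0) / x * w]

def residual(y0):                           # a y(1) + b y'(1) - c
    yd, dyd = ybar(y0, delta), dybar(y0, delta)
    w0 = np.sign(dyd) * np.abs(dyd) ** (m - 1.0)
    sol = solve_ivp(rhs, (delta, 1.0), [yd, w0], rtol=1e-10, atol=1e-12)
    y1, w1 = sol.y[:, -1]
    return a * y1 + b * np.sign(w1) * np.abs(w1) ** (1.0 / (m - 1.0)) - c

guess = newton(lambda y0: a * ybar(y0, 1.0) + b * dybar(y0, 1.0) - c, 0.5)
y0_star = newton(residual, guess)           # Remark 2 guess, then shooting
print(guess, y0_star)                       # for m = 2: about 0.366, then about 0.3675

For m = 2 the two root-finding stages give first the 2-3 digit guess of Remark 2 and then a value of y_0 close to the 0.36751685 reported in Table 1.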

In order to compare our results with the ones obtained by other authors, in our numerical experiments we have considered two test cases: 1) boundary condition (3) with a = 1, b = 1, c = 0; 2) boundary condition (3) with a = 0.1, b = 1, c = 0.

In table 1 and figure 1a) we display some numerical results for test case 1, with different values of m. The corresponding results for test case 2 are displayed in table 2 and figure 1b).

In all the cases, we have considered N = 3, α = β = 1.

Comparing with the values presented in [9] (in the case m = 2), we verify that we have 6-7 coincident digits in all the results.

x      y(x), m = 1.5    y(x), m = 2      y(x), m = 3
0.0    0.11966751       0.36751685       0.72268138
0.2    0.11943427       0.36289409       0.69859859
0.4    0.11779993       0.34894843       0.65405874
0.6    0.11335039       0.32544353       0.59538138
0.8    0.10462835       0.29197111       0.52437832
1.0    0.09008105       0.24792773       0.44175312

Table 1: Numerical solution of problem (1)-(4) when N = 3, a = b = 1, c = 0 and α = β = 1, obtained with the shooting algorithm.

4 A finite difference scheme

Finite difference schemes are often used to obtain approximate solutions of boundary value problems, but it is known that the convergence order of such methods may decrease in the presence of singularities.

In order to discretize (1)-(4), we introduce in the interval [0, 1] a uniform grid of stepsize h = 1/n, defined by the gridpoints x_i = ih, i = 0, \dots, n. At every point x_i, i = 1, \dots, n-1, approximations for the first and second derivative of the solution are


x      y(x), m = 1.5    y(x), m = 2      y(x), m = 3
0.0    0.46221338       1.14703907       2.17105340
0.2    0.46209582       1.14492055       2.15940415
0.4    0.46127247       1.13854877       2.13798668
0.6    0.45903426       1.12787477       2.11002292
0.8    0.45466122       1.11281554       2.07656772
1.0    0.44740965       1.09325196       2.03816311

Table 2: Numerical solution of problem (1)-(4) when N = 3, a = 0.1, b = 1, c = 0 and α = β = 1, obtained with the shooting algorithm.

Figure 1: Approximate solutions of problem (1)-(4) with N = 3, b = 1, c = 0, α = β = 1 and a) a = 1, b) a = 0.1.

given by the first and second central difference formulas

y'(x_i) \simeq \frac{y(x_{i+1}) - y(x_{i-1})}{2h} = y'_i, \qquad y''(x_i) \simeq \frac{y(x_{i+1}) - 2y(x_i) + y(x_{i-1})}{h^2} = y''_i,

respectively. The derivatives at the endpoints x = 0 and x = 1 are approximated by the second order formulae

y'(0) = \frac{1}{2h}\left(-3y(0) + 4y(h) - y(2h)\right) + O(h^2), \qquad y'(1) = \frac{1}{2h}\left(3y(1) - 4y(1-h) + y(1-2h)\right) + O(h^2).

In this way we obtain a discretized problem which is solved by the Newton method. As an initial approximation for the iterative process we have used the function \bar{y}, defined by (16), where y_0 is determined as described in Remark 2.

Some numerical results obtained by this finite difference method, with h = 1/1000, are presented in table 3, for test case 1, and in table 4, for test case 2.
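A compact sketch of this discretization, for the particular case m = 2 (where |y'|^{m-2}y' = y'), is given below in Python. It is not the authors' code: the nonlinear system is handed to scipy.optimize.fsolve rather than to a hand-written Newton iteration, and the grid size n is kept small for illustration (the paper uses h = 1/1000).

# Illustrative sketch of the finite difference scheme for the case m = 2.
import numpy as np
from scipy.optimize import fsolve

N, alpha, beta = 3.0, 1.0, 1.0
a, b, c = 1.0, 1.0, 0.0
n = 100
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)

def residual(y):
    F = np.empty_like(y)
    # y'(0) = 0, second order one-sided formula
    F[0] = (-3.0 * y[0] + 4.0 * y[1] - y[2]) / (2.0 * h)
    # interior nodes: y'' + (N-1)/x y' + alpha e^{-beta y} = 0  (m = 2)
    xi = x[1:-1]
    F[1:-1] = ((y[2:] - 2.0 * y[1:-1] + y[:-2]) / h**2
               + (N - 1.0) / xi * (y[2:] - y[:-2]) / (2.0 * h)
               + alpha * np.exp(-beta * y[1:-1]))
    # a y(1) + b y'(1) = c, one-sided formula at x = 1
    F[-1] = a * y[-1] + b * (3.0 * y[-1] - 4.0 * y[-2] + y[-3]) / (2.0 * h) - c
    return F

y0_guess = 0.37                              # from Remark 2 (m = 2)
C = 0.5 * alpha * np.exp(-beta * y0_guess) / N   # C of (11) with k = 2
y_fd = fsolve(residual, y0_guess - C * x**2)     # initial iterate from (16)
print(y_fd[0])                               # close to 0.36751218 (Table 5, h = 1/100)

For n = 100 the computed y_fd[0] should be close to the value 0.36751218 reported in Table 5 for h = 1/100.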


x      y(x), m = 1.5    y(x), m = 2      y(x), m = 3
0.0    0.11966736       0.36751677       0.72268090
0.2    0.11943410       0.36289402       0.69859866
0.4    0.11779978       0.34894837       0.65405878
0.6    0.11335026       0.32544348       0.59538141
0.8    0.10462823       0.29197106       0.52437835
1.0    0.09008093       0.24792768       0.44175314

Table 3: Numerical solution of problem (1)-(4) with N = 3, a = b = 1, c = 0 and α = β = 1, obtained with the finite difference scheme.

x      y(x), m = 1.5    y(x), m = 2      y(x), m = 3
0.0    0.46221308       1.14703897       2.17105331
0.2    0.46209552       1.14492045       2.15940433
0.4    0.46127218       1.13854870       2.13798684
0.6    0.45903397       1.12787471       2.11002307
0.8    0.45466093       1.11281547       2.07656787
1.0    0.44740937       1.09325190       2.03816325

Table 4: Numerical solution of problem (1)-(4) with N = 3, a = 0.1, b = 1, c = 0 and α = β = 1, obtained with the finite difference scheme.

In all the considered cases the number of iterations of the Newton method was never greater than four. We see that these numerical results are in good agreement with the ones obtained by the shooting method; they are also consistent with the results presented in [9]. In order to estimate the convergence order of the finite difference method at x = 0, we have carried out several experiments with different values of the step size h (see table 5) and used the formula

c_{y_0} = -\log_2 \frac{|y_0^{h_3} - y_0^{h_2}|}{|y_0^{h_2} - y_0^{h_1}|}, \qquad (17)

where y_0^{h_i} is the approximate value of y_0 obtained with stepsize h_i (the stepsizes satisfy the relation h_i = h_{i-1}/2, i = 1, 2, 3, 4). The results for test case 1 are presented in table 6.
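The estimate (17) is a one-line computation; for example, applied to the m = 3 column of Table 5 (step sizes 1/100, 1/200, 1/400) it reproduces the value 1.67 of Table 6.

# Convergence order estimate (17), applied to the m = 3 column of Table 5.
import math

def order_estimate(y_h1, y_h2, y_h3):            # h2 = h1/2, h3 = h2/2
    return -math.log2(abs(y_h3 - y_h2) / abs(y_h2 - y_h1))

print(order_estimate(0.72266092, 0.72267490, 0.72267928))   # about 1.67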

The results of table 6 show that the convergence order estimates are very close to 2 when m < 2 and decrease when m > 2. This is not surprising if we take into account the behavior of the solution in the neighborhood of x = 0. When m = 2, according to Theorem 1, the solution behaves as the function y_0 - Cx^2 as x approaches zero. When m < 2, we have k = \frac{m}{m-1} > 2 and therefore the solution in the neighborhood of the origin behaves as y_0 - Cx^{\gamma}, γ > 2, so its second derivative vanishes at the origin.

The same cannot be said whenever m > 2. As can easily be seen, in this case the solution behaves, near the origin, as y_0 - Cx^{\gamma}, γ < 2, that is, the second derivative tends to infinity as x tends to 0. In order to overcome this problem, for m > 2 we


         h = 1/100    h = 1/200    h = 1/400    h = 1/800
m = 1.5  0.11965587   0.11966455   0.1196674    0.11966729
m = 2    0.36751218   0.36751565   0.36751652   0.36751674
m = 3    0.72266092   0.72267490   0.72267928   0.72268070
m = 4    0.95008029   0.95015003   0.95017705   0.95018761
m = 5    1.10780678   1.10793122   1.10798279   1.10800429
m = 6    1.22384813   1.22400230   1.22406844   1.22409701

Table 5: Approximate values of y_0 when N = 3, a = b = 1, c = 0 and α = β = 1 for different values of h.

             m = 1.5   m = 2   m = 3   m = 4   m = 5   m = 6
h_1 = 1/200  1.99      1.98    1.62    1.36    1.26    1.21
h_1 = 1/100  1.99      2.00    1.67    1.37    1.27    1.22

Table 6: Estimates of the convergence order at x = 0 when N = 3, a = b = 1, c = 0 and α = β = 1, for different values of m.

introduce the variable substitution

t = x^{k/2}, \qquad k = \frac{m}{m-1}. \qquad (18)

The solution in the new variable t always behaves as y_0 - Ct^2 as t approaches zero (for any value of m). Therefore, the variable substitution (18) enables us to recover the second order of convergence. Some numerical results obtained by the finite difference method, using this variable substitution, are presented in table 7. The corresponding estimates of the convergence order are given in table 8. These results confirm the above arguments.

         h = 1/100    h = 1/200    h = 1/400    h = 1/800
m = 1.5  0.11966680   0.11966730   0.11966743   0.11966746
m = 2    0.36751218   0.36751565   0.36751652   0.36751674
m = 3    0.72266938   0.72267838   0.72268064   0.72268121
m = 4    0.95017870   0.95019052   0.95019349   0.95019423
m = 5    1.10800242   1.10801543   1.10801870   1.10801951
m = 6    1.22410108   1.22411444   1.22411779   1.22411863

Table 7: Approximate values of y_0 when N = 3, a = b = 1, c = 0 and α = β = 1 for different step sizes of the finite difference scheme with variable substitution.

5 Conclusions

In this paper, for a class of singular boundary value problems arising in the modeling of heat conduction in the human head, numerical methods were implemented,


             m = 1.5   m = 2   m = 3   m = 4   m = 5   m = 6
h_1 = 1/200  2.11      1.98    1.99    2.00    2.01    2.00
h_1 = 1/100  1.94      2.00    1.99    1.99    1.99    2.00

Table 8: Estimates of the convergence order at x = 0 when N = 3, a = b = 1, c = 0 and α = β = 1, of the finite difference scheme with variable substitution.

based on the asymptotic behavior of the solution in the neighborhood of the singular point x = 0: a shooting algorithm and a finite difference scheme, whose convergence order is increased by a simple variable substitution.

We remark that in the case m = 2 (which was also considered by the authors of [8] and [9]), our results suggest that second order convergence can be achieved even with a classical finite difference scheme (in spite of the singularity at x = 0). For the case m > 2, a classical finite difference scheme would not provide second order convergence, due to the behavior of the solution in the neighborhood of the origin. However, our numerical experiments suggest that second order convergence can be obtained even in this case, by introducing a variable substitution which makes the solution smooth near the origin.

In the future we intend to provide a detailed numerical analysis and, in particular, a theoretical justification for the convergence order of the considered method, when applied to this singular BVP. We are also planning to use extrapolation methods to accelerate the convergence.

Acknowledgements

One of the authors, L. Morgado, acknowledges financial support from FCT, Fundacao para a Ciencia e Tecnologia, through grant SFRH/BPD/46530/2008.

References

[1] N. Anderson and A.M. Arthurs, Complementary extremum principles for a nonlinear model of heat conduction in the human head, Bull. Math. Biol. 43 (1981), 341–346.

[2] R.C. Duggan and A.M. Goodman, Pointwise bounds for a nonlinear heat conduction model of the human head, Bull. Math. Biol. 48 (1986), 229–236.

[3] U. Flesh, The distribution of heat sources in the human head: A theoretical consideration, J. Theor. Biol. 54 (1975), 285–287.

[4] B.F. Gray, The distribution of heat sources in the human head: A theoretical consideration, J. Theor. Biol. 82 (1980), 473–476.

[5] N.B. Konyukhova, Singular Cauchy problems for systems of ordinary differential equations, USSR Comput. Math. Math. Phys. 23 (1983), 72–82.


[6] P.M. Lima and L. Morgado, Analytical-numerical investigation of a singular boundary value problem for a generalized Emden-Fowler equation, J. Comput. Appl. Math., in press.

[7] P.M. Lima and L. Morgado, Numerical methods for singular boundary value problems involving the p-laplacian, accepted for publication in Proceedings of the International Conference "Boundary Value Problems 2008", held in Santiago de Compostela, September 2008.

[8] R.K. Pandey, A finite difference method for a class of singular two point boundary value problems arising in physiology, Intern. J. Computer Math. 65 (1997), 131–140.

[9] R.K. Pandey and A.K. Singh, On the convergence of a finite difference method for a class of singular two point boundary value problems arising in physiology, J. Comput. Appl. Math. 166 (2004), 553–564.

[10] S. Wolfram, The Mathematica Book, Cambridge University Press, 1996.


Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009, 30 June, 1–3 July 2009.

Symbolic computation of the exponential matrix of a linear system

Juan F. Navarro1 and Antonio Perez1

1 Departamento de Matematica Aplicada, Universidad de Alicante

emails: [email protected], [email protected]

Abstract

The aim of this paper is to introduce a technique for the computation of the principal matrix of the companion matrix associated to an ordinary differential equation with constant coefficients, based on symbolic computation, via an algebraic processor which works with matrices of quasi-polynomials.

Key words: Perturbation methods, Symbolic manipulation

MSC 2000: 41A60, 68W30

1 Introduction

Perturbation theories for differential equations containing a small parameter ε are quite old. The small perturbation theory originated by Sir Isaac Newton has been highly developed by many others, and an extension of this theory to the asymptotic expansion, consisting of a power series expansion in the small parameter, was devised by Poincare (1892). The main point is that, for most differential equations, it is not possible to obtain an exact solution. In cases where the equations contain a small parameter, we can consider it as a perturbation parameter to obtain an asymptotic expansion of the solution. In practice, the work involved in the application of this approach to compute the solution to a differential equation cannot be performed by hand, and algebraic processors prove to be a very useful tool.

As explained in Henrard (1989), the first symbolic processors were developed to work with Poisson series, that is, multivariate Fourier series whose coefficients are multivariate Laurent series. These processors were applied to problems in non-linear mechanics or non-linear differential equations, in the field of Celestial Mechanics. One of the first applications of these processors was concerned with the theory of the Moon, but also planetary theories, the theory of the rotation of the Earth and artificial satellite theories.

In order to achieve better accuracies in the applications of analytical theories, high orders of the approximate solution must be computed, making necessary a continuous


maintenance and revision of the existing symbolic manipulation systems, as well as the development of new packages adapted to the peculiarities of the problem to be treated. Recently, Navarro (2008a, 2008b) developed a symbolic processor to compute periodic solutions in equations of the type

\ddot{x} + a_1\dot{x} + a_0 x = u(t) + \varepsilon f(x, \dot{x}), \qquad x(t_0) = x_0, \quad \dot{x}(t_0) = \dot{x}_0, \qquad (1)

where a_0, a_1, t_0, x_0, \dot{x}_0 \in \mathbb{R}, u(t) is a quasi-polynomial,

u(t) = \sum_{\nu \ge 0} t^{n_\nu} e^{\alpha_\nu t}\left(\lambda_\nu \cos(\omega_\nu t) + \mu_\nu \sin(\omega_\nu t)\right), \qquad (2)

where n_\nu \in \mathbb{N}, \alpha_\nu, \omega_\nu, \lambda_\nu and \mu_\nu \in \mathbb{R}, and f(x, \dot{x}) admits the expansion

f(x, \dot{x}) = \sum_{\kappa=0}^{M} \sum_{0 \le \nu \le \kappa} f_{\nu,\kappa-\nu}\, x^{\nu} \dot{x}^{\kappa-\nu}, \qquad f_{\nu,\kappa-\nu} \in \mathbb{R}. \qquad (3)

To that end, the idea is to expand both the solution and the modified frequency with respect to the small parameter, which allows one to eliminate the secular terms which appear in the recursive scheme. The elimination of secular terms was performed through a manipulation system which works with modified quasi-polynomials, that is, quasi-polynomials containing undetermined constants:

u(t) = \sum_{\nu \ge 0} \tau_1^{\sigma_1} \times \cdots \times \tau_Q^{\sigma_Q}\, t^{n_\nu} e^{\alpha_\nu t}\left(\lambda_\nu \cos(\omega_\nu t) + \mu_\nu \sin(\omega_\nu t)\right), \qquad (4)

where n_\nu \in \mathbb{N}, \alpha_\nu, \omega_\nu, \lambda_\nu, \mu_\nu \in \mathbb{R}, \sigma_\nu \in \mathbb{Z}, and \tau_1, \dots, \tau_Q are real constants with unknown value.

In Navarro (2009), an algebraic processor has been developed to handle m \times n matrices whose elements lie in the set of quasi-polynomials. This algebraic processor will provide a tool to solve a perturbed differential equation of the class

x^{(n)} + a_{n-1}x^{(n-1)} + \cdots + a_1\dot{x} + a_0 x = u(t) + \varepsilon f(x, \dot{x}, \dots, x^{(n-1)}),

with initial conditions

x(t_0) = x_{10}, \quad \dot{x}(t_0) = x_{20}, \quad \dots, \quad x^{(n-1)}(t_0) = x_{n0},

where ε is a small real parameter, a_0, a_1, \dots, a_{n-1} \in \mathbb{R}, u(t) is a quasi-polynomial, and f is such that

f(x, \dot{x}, \dots, x^{(n-1)}) = \sum_{0 \le \nu_1, \dots, \nu_n \le M} f_{\nu_1,\dots,\nu_n}\, x^{\nu_1} \cdots (x^{(n-1)})^{\nu_n},

with M \in \mathbb{N}, \nu_1, \dots, \nu_n \in \mathbb{N} and f_{\nu_1,\dots,\nu_n} \in \mathbb{R}. To that end, our goal is to construct first the solution to the homogeneous problem.


2 Setting the problem

Many economic, biological and physical processes are modeled by nth order linear differential equations with constant coefficients of the form

x^{(n)} + a_{n-1}x^{(n-1)} + \cdots + a_1\dot{x} + a_0 x = 0, \qquad (5)

with initial conditions

x(0) = x_{10}, \quad \dot{x}(0) = x_{20}, \quad \dots, \quad x^{(n-1)}(0) = x_{n0},

with x_{10}, \dots, x_{n0}, a_0, a_1, \dots, a_{n-1} \in \mathbb{R}. In Navarro (2009), an algebraic processor was developed to handle m \times n matrices whose elements lie in the set of quasi-polynomials. This algebraic processor implements the basic algebra on such a class of matrices: addition, subtraction, product by a scalar, multiplication, derivation and integration with respect to t.

The goal of this work is to construct a package for computing symbolically the solution to (5) based on the mentioned algebraic processor. To that purpose, we work with the linear system associated to that equation instead of working directly with equation (5). This is to avoid the computation of the roots of the characteristic equation

\lambda^{n} + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 = 0.

This package can be useful both for research and educational purposes in the field of differential equations or dynamical systems, and it can be taken as the kernel for constructing the solution to the complete differential equation

x^{(n)} + a_{n-1}x^{(n-1)} + \cdots + a_1\dot{x} + a_0 x = \varepsilon f(x, \dots, x^{(n-1)}),

by means of the application of perturbation methods based on the expansion of the solution as a power series in the small parameter ε.

3 A method for computing the exponential matrix

With the aid of the substitutions

x_1 = x, \quad x_2 = \dot{x}, \quad \dots, \quad x_n = x^{(n-1)},

equation (5) is transformed into the system of differential equations given by

\dot{X}(t) = A X(t), \qquad X(0) = X_0, \qquad (6)

where A is the companion matrix,

A = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-1} \end{pmatrix},


and

X(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{pmatrix}, \qquad X_0 = \begin{pmatrix} x_{10} \\ x_{20} \\ \vdots \\ x_{n0} \end{pmatrix}.

As stated in Moler (2003), there are many methods for computing the principal matrix \Phi(t) = e^{At} of the system (6). Let us give here a brief description of the method implemented in the symbolic computation package. The method consists of splitting A into B + C, with

B = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}, \qquad C = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-1} \end{pmatrix},

and then using the approximation

e^{A} \simeq \left(e^{B/m} e^{C/m}\right)^{m},

with m \in \mathbb{N}. This approach to obtain e^{A} is of potential interest when the exponentials of the matrices B and C can be efficiently computed. In our case,

e^{B/m} = \begin{pmatrix} g_0(1,m) & g_1(1,m) & g_2(1,m) & \cdots & g_{n-1}(1,m) \\ 0 & g_0(1,m) & g_1(1,m) & \cdots & g_{n-2}(1,m) \\ 0 & 0 & g_0(1,m) & \cdots & g_{n-3}(1,m) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & g_0(1,m) \end{pmatrix},

and

e^{C/m} = I - \frac{1}{a_{n-1}}\left(e^{-a_{n-1}/m} - 1\right) C = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{a_0}{a_{n-1}}(e^{-a_{n-1}/m} - 1) & \frac{a_1}{a_{n-1}}(e^{-a_{n-1}/m} - 1) & \frac{a_2}{a_{n-1}}(e^{-a_{n-1}/m} - 1) & \cdots & e^{-a_{n-1}/m} \end{pmatrix},

where, for any n \in \mathbb{N},

g_n(t,m) = \frac{t^{n}}{n!\, m^{n}}. \qquad (7)

As established in Moler (2003), for a general splitting A = B + C, the value of m can be determined from the inequality

\left\| e^{A} - \left(e^{B/m} e^{C/m}\right)^{m} \right\| \le \frac{1}{2m}\,\|[B,C]\|\, e^{\|B\| + \|C\|}.


Thus, A being the companion matrix, we get that

\left\| e^{A} - \left(e^{B/m} e^{C/m}\right)^{m} \right\| \le \frac{1}{m}\, e^{1 + \|a\|}\, \|a\|,

where a = (a_0, a_1, \dots, a_{n-1}). Here,

\|x\| = \sqrt{x_1^{2} + \cdots + x_n^{2}},

for any x = (x_1, \dots, x_n) \in \mathbb{R}^{n}, and \|A\| = \max_{\|x\|=1} \|Ax\|.
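Although the package of this paper works symbolically, the splitting can be checked numerically in a few lines. The NumPy/SciPy sketch below (the coefficients a_i and the value of m are arbitrary illustration data, not taken from the paper) builds the companion matrix, the closed forms of e^{B/m} and e^{C/m}, and compares (e^{B/m}e^{C/m})^m with scipy.linalg.expm.

# Numerical (non-symbolic) check of e^A ~ (e^{B/m} e^{C/m})^m for a companion
# matrix; the coefficients a_i and the value of m are arbitrary test data.
import numpy as np
from scipy.linalg import expm
from math import factorial

a = np.array([0.4, 0.3, 0.2, 1.0])          # a_0, ..., a_{n-1}
n = len(a)
A = np.zeros((n, n)); A[:-1, 1:] = np.eye(n - 1); A[-1, :] = -a
m = 64

def g(j, t):                                 # g_j(t, m) = t^j / (j! m^j), cf. (7)
    return t**j / (factorial(j) * m**j)

eB = sum(g(j, 1.0) * np.eye(n, k=j) for j in range(n))          # e^{B/m}
C = np.zeros((n, n)); C[-1, :] = -a
eC = np.eye(n) - (np.exp(-a[-1] / m) - 1.0) / a[-1] * C         # e^{C/m}

approx = np.linalg.matrix_power(eB @ eC, m)
print(np.linalg.norm(approx - expm(A)))      # splitting error, decreases like 1/m

The measured error decreases roughly like 1/m, in agreement with the bound above.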

4 On the symbolic solution

We can also apply this method to compute the matrix e^{At} instead of e^{A}. If we do so, we obtain a principal matrix of (6) whose elements lie in the set of quasi-polynomials, and the symbolic processor turns out to be suitable for working with those matrices. The approach is to compute

e^{At} \simeq \left(e^{Bt/m} e^{Ct/m}\right)^{m},

taking into account that

e^{Bt/m} = \begin{pmatrix} g_0(t,m) & g_1(t,m) & g_2(t,m) & \cdots & g_{n-1}(t,m) \\ 0 & g_0(t,m) & g_1(t,m) & \cdots & g_{n-2}(t,m) \\ 0 & 0 & g_0(t,m) & \cdots & g_{n-3}(t,m) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & g_0(t,m) \end{pmatrix}, \qquad (8)

and

e^{Ct/m} = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{a_0}{a_{n-1}}(e^{-a_{n-1}t/m} - 1) & \frac{a_1}{a_{n-1}}(e^{-a_{n-1}t/m} - 1) & \frac{a_2}{a_{n-1}}(e^{-a_{n-1}t/m} - 1) & \cdots & e^{-a_{n-1}t/m} \end{pmatrix}, \qquad (9)

where g_n(t,m) is given by equation (7). Thus, e^{At} is a matrix of quasi-polynomials that can be completely computed via the designed symbolic processor. The procedure for the computation of the exponential matrix is substantially simplified by using the following equation, which avoids the symbolic multiplication of e^{Bt/m} and e^{Ct/m}:

e^{Bt/m} e^{Ct/m} = e^{Bt/m} + H(t,m),

where

H(t,m) = \frac{1}{a_{n-1}}\, f(t,m) \begin{pmatrix} g_{n-1}(t,m) \\ \vdots \\ g_0(t,m) \end{pmatrix} \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix},


and f(t,m) = e^{-a_{n-1}t/m} - 1. This relation can be obtained directly from the multiplication of the matrices e^{Bt/m} and e^{Ct/m} as given in equations (8) and (9).

Thus, the exponential matrix e^{At} will be symbolically calculated through the equation

e^{At} \simeq \left(e^{Bt/m} e^{Ct/m}\right)^{m} = \left(e^{Bt/m} + H(t,m)\right)^{m}.
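The identity e^{Bt/m}e^{Ct/m} = e^{Bt/m} + H(t,m) can also be spot-checked numerically; the short NumPy sketch below does so for one illustrative choice of t, m and coefficients (all values are made up for the test, this is not part of the symbolic processor).

# Spot-check of e^{Bt/m} e^{Ct/m} = e^{Bt/m} + H(t,m) for one illustrative
# choice of t, m and coefficients.
import numpy as np
from math import factorial

a = np.array([0.4, 0.3, 0.2, 1.0]); n = len(a)
t, m = 0.7, 5

def g(j):
    return t**j / (factorial(j) * m**j)

eB = sum(g(j) * np.eye(n, k=j) for j in range(n))
C = np.zeros((n, n)); C[-1, :] = -a
eC = np.eye(n) - (np.exp(-a[-1] * t / m) - 1.0) / a[-1] * C

f = np.exp(-a[-1] * t / m) - 1.0
col = np.array([g(n - 1 - i) for i in range(n)]).reshape(-1, 1)   # (g_{n-1}, ..., g_0)^T
H = f / a[-1] * col @ a.reshape(1, -1)
print(np.allclose(eB @ eC, eB + H))          # True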

In the following, we will derive some expressions which simplify the way the matrix \left(e^{Bt/m} + H(t,m)\right)^{m} is computed.

Lemma 1 For any k \ge 1,

\left(e^{Bt/m}\right)^{k} = \begin{pmatrix} G_{0,k}(t,m) & G_{1,k}(t,m) & \cdots & G_{n-1,k}(t,m) \\ 0 & G_{0,k}(t,m) & \cdots & G_{n-2,k}(t,m) \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & G_{0,k}(t,m) \end{pmatrix}, \qquad (10)

where G_{\nu,k}(t,m) = k^{\nu} g_{\nu}(t,m).

Proof. To prove this property, we will proceed by induction over k \in \mathbb{N}. The statement holds for k = 1:

e^{Bt/m} = \begin{pmatrix} g_0(t,m) & g_1(t,m) & g_2(t,m) & \cdots & g_{n-1}(t,m) \\ 0 & g_0(t,m) & g_1(t,m) & \cdots & g_{n-2}(t,m) \\ 0 & 0 & g_0(t,m) & \cdots & g_{n-3}(t,m) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & g_0(t,m) \end{pmatrix}.

Now, we will show that if the expression holds for k, then it also holds for k + 1:

\left(e^{Bt/m}\right)^{k+1} = e^{Bt/m}\left(e^{Bt/m}\right)^{k} = \begin{pmatrix} g_0(t,m) & g_1(t,m) & \cdots & g_{n-1}(t,m) \\ 0 & g_0(t,m) & \cdots & g_{n-2}(t,m) \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & g_0(t,m) \end{pmatrix} \times \begin{pmatrix} k^{0}g_0(t,m) & k^{1}g_1(t,m) & \cdots & k^{n-1}g_{n-1}(t,m) \\ 0 & k^{0}g_0(t,m) & \cdots & k^{n-2}g_{n-2}(t,m) \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & k^{0}g_0(t,m) \end{pmatrix} = \left(h_{ij}\right)_{i,j=0}^{n-1}.

Due to the triangular character of the matrices in the equation above, the following expressions are satisfied:


1. h_{ij} = 0 if i > j.

2. h_{ij} = 1 = k^{0}g_0(t,m) if i = j.

3. h_{ij} = \sum_{\nu=0}^{j-i} k^{\nu} g_{\nu}(t,m)\, g_{j-i-\nu}(t,m) if i < j.

Let us write \ell = j - i to ease the writing of the formulae. The last formula can be transformed as follows:

h_{ij} = \sum_{\nu=0}^{\ell} k^{\nu} g_{\nu}(t,m)\, g_{\ell-\nu}(t,m) = \sum_{\nu=0}^{\ell} k^{\nu} \frac{t^{\nu}}{\nu!\, m^{\nu}} \cdot \frac{t^{\ell-\nu}}{(\ell-\nu)!\, m^{\ell-\nu}} = \sum_{\nu=0}^{\ell} \frac{k^{\nu} t^{\ell}}{\nu! (\ell-\nu)!\, m^{\ell}} = \frac{t^{\ell}}{\ell!\, m^{\ell}} \sum_{\nu=0}^{\ell} \frac{k^{\nu}\, \ell!}{\nu! (\ell-\nu)!} = \frac{t^{\ell}}{\ell!\, m^{\ell}} \sum_{\nu=0}^{\ell} k^{\nu} \binom{\ell}{\nu} = (k+1)^{\ell} g_{\ell}(t,m).

Thus, if i < j, h_{ij} is given by

h_{ij} = (k+1)^{j-i} g_{j-i}(t,m).

That is, the matrix \left(e^{Bt/m}\right)^{k} can be calculated directly through formula (10).
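Lemma 1 can likewise be verified numerically for concrete sizes; in the following sketch (illustrative n, t, m and k, unrelated to the paper's examples) the k-th power of e^{Bt/m} is compared entrywise with k^{j-i} g_{j-i}(t,m).

# Numerical check of Lemma 1 for illustrative n, t, m, k.
import numpy as np
from math import factorial

n, t, m, k = 5, 1.3, 4, 3

def g(j):
    return t**j / (factorial(j) * m**j)

eB = sum(g(j) * np.eye(n, k=j) for j in range(n))
lhs = np.linalg.matrix_power(eB, k)
rhs = sum(k**j * g(j) * np.eye(n, k=j) for j in range(n))
print(np.allclose(lhs, rhs))                 # True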

Lemma 2 For any integer p \ge 1,

H^{p}(t,m) = \left(\frac{f(t,m)}{a_{n-1}(n-1)!\, m^{n-1}}\right)^{p} \left(S_{n-1}(t)\right)^{p-1} \begin{pmatrix} t^{n-1} \\ (n-1)t^{n-2}m \\ (n-1)(n-2)t^{n-3}m^{2} \\ \vdots \\ (n-1)!\, t^{0} m^{n-1} \end{pmatrix} \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix},

where

S_{n-1}(t) = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix} \begin{pmatrix} t^{n-1} \\ (n-1)t^{n-2}m \\ (n-1)(n-2)t^{n-3}m^{2} \\ \vdots \\ (n-1)!\, t^{0} m^{n-1} \end{pmatrix}.

Proof. To prove this property we will proceed by induction over p \in \mathbb{N}. The statement holds for p = 1. Now, we will show that if the expression holds for p, then it also holds


for p + 1:

H^{p+1}(t,m) = H(t,m)\, H^{p}(t,m) = \frac{1}{a_{n-1}} f(t,m) \begin{pmatrix} g_{n-1}(t,m) \\ g_{n-2}(t,m) \\ \vdots \\ g_0(t,m) \end{pmatrix} \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix} \left(\frac{f(t,m)}{a_{n-1}(n-1)!\, m^{n-1}}\right)^{p} \left(S_{n-1}(t)\right)^{p-1} \begin{pmatrix} t^{n-1} \\ (n-1)t^{n-2}m \\ (n-1)(n-2)t^{n-3}m^{2} \\ \vdots \\ (n-1)!\, t^{0} m^{n-1} \end{pmatrix} \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix},

and now, by simply rearranging terms, we get

H^{p+1}(t,m) = \left(\frac{f(t,m)}{a_{n-1}(n-1)!\, m^{n-1}}\right)^{p+1} \left(S_{n-1}(t)\right)^{p} \begin{pmatrix} t^{n-1} \\ (n-1)t^{n-2}m \\ (n-1)(n-2)t^{n-3}m^{2} \\ \vdots \\ (n-1)!\, t^{0} m^{n-1} \end{pmatrix} \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}.

4.1 Computation of \left(e^{Bt/m}\right)^{k} H^{p}(t,m) and H^{p}(t,m)\left(e^{Bt/m}\right)^{k}

On the one hand, Lemmas 1 and 2 allow us to arrange the product \left(e^{Bt/m}\right)^{k} H^{p}(t,m) as follows:

\left(e^{Bt/m}\right)^{k} H^{p}(t,m) = \left(\frac{f(t,m)}{a_{n-1}(n-1)!\, m^{n-1}}\right)^{p} \left(S_{n-1}(t)\right)^{p-1} \begin{pmatrix} (k+1)^{n-1} t^{n-1} \\ (k+1)^{n-2}(n-1)t^{n-2}m \\ (k+1)^{n-3}(n-1)(n-2)t^{n-3}m^{2} \\ \vdots \\ (k+1)^{0}(n-1)!\, t^{0} m^{n-1} \end{pmatrix} \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}.


On the other hand, the product H^{p}(t,m)\left(e^{Bt/m}\right)^{k} can be written as

H^{p}(t,m)\left(e^{Bt/m}\right)^{k} = \left(\frac{f(t,m)}{a_{n-1}(n-1)!\, m^{n-1}}\right)^{p} \left(S_{n-1}(t)\right)^{p-1} \begin{pmatrix} t^{n-1} \\ (n-1)t^{n-2}m \\ (n-1)(n-2)t^{n-3}m^{2} \\ \vdots \\ (n-1)!\, t^{0} m^{n-1} \end{pmatrix} A(t,m),

where

A(t,m) = \begin{pmatrix} \sum_{\nu=0}^{0} a_{\nu} k^{0-\nu} g_{0-\nu} & \sum_{\nu=0}^{1} a_{\nu} k^{1-\nu} g_{1-\nu} & \cdots & \sum_{\nu=0}^{n-1} a_{\nu} k^{n-1-\nu} g_{n-1-\nu} \end{pmatrix},

where, for the sake of simplicity, we have omitted the dependence on m and t of g_{\nu}.

References

[1] A. Abad, A. Elipe, J. Palacian, J. F. San-Juan, ATESA: A symbolic processor for artificial satellite theory, Mathematics and Computers in Simulation 45 (1998), 497–510.

[2] M. Lara, A. Elipe, M. Palacios, Automatic programming of recurrent power series, Mathematics and Computers in Simulation 49 (1999), 351–362.

[3] J. Henrard, A survey of Poisson series processors, Celestial Mech. Dynam. Astron. 45 (1989), 245–253.

[4] C. Moler and C. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, SIAM Review 45 (2003), 1.

[5] J. F. Navarro, On the implementation of the Poincare–Lindstedt technique, Applied Mathematics and Computation 195 (2008a), 183–189.

[6] J. F. Navarro, Computation of periodic solutions in perturbed second order ODEs, Applied Mathematics and Computation 202 (2008b), 171–177.

[7] J. F. Navarro, On the symbolic computation of the solution to a differential equation, to appear in Advances and Applications in Mathematical Sciences (2009).


Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

Attitude determination from GPS signals

Alejandro P. Nicolas1, Michael Gonzalez-Harbour1, Laureano Gonzalez-Vega1, Raul Arnau Prieto2, Daniel Bolado Guerra2, Maria Campo-Cossio Gutierrez2 and Alberto Puras Trueba2

1 Departamento de Matematicas, Estadistica y Computacion, Universidad de Cantabria

2 Fundacion Centro Tecnologico de Componentes

emails: [email protected], [email protected],[email protected], rarnau.ctcomponentes.com,[email protected], [email protected],

[email protected]

Abstract

In this work we study a method for attitude determination using phase double differences. We use the available information for reducing the ambiguity search space and so obtain an instantaneous estimate of the attitude. Some experimental results are presented. A discussion about the effect of experimental errors in the measured data is included in the last section.

Key words: GPS, attitude determination, ambiguity resolution

1 Introduction

The problem of using GPS signals for attitude determination has been discussed by many authors, and several kinds of methods have been developed for dealing with it. In this work, we will study one of them. It was introduced by Hodgart, in [1] and [2], and tries to obtain an instantaneous estimation of the attitude. Other methods use redundant measurements for improving accuracy.

We start with the phase data from the GPS receivers. The substantial problem is an integer ambiguity in the receiver carrier phase measurement. A GPS receiver can only measure a fraction of a carrier phase cycle (−π to π in radians, or −λ/2 to λ/2 in range), so the number of whole cycles is unknown.

The general technique for ambiguity search consists of minimizing the error between measurement and estimate. In particular, this method uses the double phase differences for removing the offset error in the measurements and for reducing the ambiguity search


space. Moreover, the use of the available information, such as the length of the baselines, will be useful for reducing the ambiguity search (in general, a three-dimensional search) to a two-dimensional search. We will see that this process can cause a loss of accuracy.

This paper is organized as follows. In section 2 we introduce the mathematical aspects of the method developed by Hodgart. In particular, we will see how the reduction of the ambiguity search space is made. The next section presents two experimental results. The first one shows a situation in which the ambiguities are correctly solved. Unfortunately, real data contain errors and the method can fail, as the second example shows. Finally, we point out the existence of other methods for ambiguity resolution which exploit the redundancy present in the measurements. These methods are more accurate, but computationally expensive.

2 Mathematical approach

The method used for ambiguity resolution is based on the ideas exposed in [1] and [2]. We use information about the base lines in order to perform only a two-dimensional search of the ambiguities.

Let \{s_i : i = 1, \dots, 4\} be the set of unit vectors from the master antenna to the GPS satellites. Let us suppose that the set \{s_1 - s_2, s_2 - s_3, s_3 - s_4\} is linearly independent, and so a basis of \mathbb{R}^3. Using the Gram–Schmidt orthogonalization method we can construct a new set \{v_j : j = 1, \dots, 3\} of orthogonal vectors as follows:

v_1 = s_1 - s_2, \qquad v_2 = (s_2 - s_3) - a_{21}v_1, \qquad v_3 = (s_3 - s_4) - a_{32}v_2 - a_{31}v_1, \qquad (1)

where a_{21}, a_{31} and a_{32} are the following constants:

a_{21} = (s_2 - s_3)\cdot v_1 / (v_1 \cdot v_1), \qquad a_{31} = (s_3 - s_4)\cdot v_1 / (v_1 \cdot v_1), \qquad a_{32} = (s_3 - s_4)\cdot v_2 / (v_2 \cdot v_2).

In the following, we will write aii = vi · vi, i = 1, . . . , 3.
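The construction (1) is ordinary Gram–Schmidt applied to the satellite direction differences. The following NumPy sketch mirrors it; the unit vectors s1, ..., s4 are made-up example data, not a real GPS constellation.

# Sketch of the orthogonalization (1); s1..s4 are invented example directions.
import numpy as np

def unit(v):
    return np.asarray(v, dtype=float) / np.linalg.norm(v)

s1, s2, s3, s4 = (unit(p) for p in ([0.1, 0.2, 0.9], [0.6, 0.1, 0.8],
                                    [0.3, 0.7, 0.6], [0.8, 0.5, 0.3]))

v1 = s1 - s2
a21 = np.dot(s2 - s3, v1) / np.dot(v1, v1)
v2 = (s2 - s3) - a21 * v1
a31 = np.dot(s3 - s4, v1) / np.dot(v1, v1)
a32 = np.dot(s3 - s4, v2) / np.dot(v2, v2)
v3 = (s3 - s4) - a32 * v2 - a31 * v1
a11, a22, a33 = np.dot(v1, v1), np.dot(v2, v2), np.dot(v3, v3)
print(np.dot(v1, v2), np.dot(v1, v3), np.dot(v2, v3))   # all close to zero

The quantities a11, a22, a33 and the coefficients a21, a31, a32 computed here are the ones used throughout this section.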

As we have seen, the full phase measurement from the i'th satellite along the j'th base line b_j can be written as the scalar product s_i \cdot b_j, which can be expanded as

s_i \cdot b_j = r_{ij} + n_{ij}, \qquad -1/2 < r_{ij} < 1/2, \qquad (2)

with n_{ij} an integer. The value r_{ij} is a modulo measurement of a single (normalized) phase difference.

Since the argument is the same for all base lines, we fix one of them, for example the j'th one. For simplicity, we will write only single subscripts. With this notation, the modulo measurements from the four satellites for the fixed base line can be represented as

r = (r_1, r_2, r_3, r_4),


and the double modulo differences are formed from these values according to the scheme \Delta r_i = r_i - r_{i+1}, i = 1, \dots, 3.

On the other hand, the base line vector b_j can be expressed in the orthogonal basis \{v_1, v_2, v_3\} as

b_j = \sum_{i=1}^{3} \mu_i v_i,

with \mu_i = b_j \cdot v_i / a_{ii} for i = 1, \dots, 3. From (1) and (2) we can write each scalar product \lambda_i = b_j \cdot v_i, i = 1, \dots, 3, as a function of the double differences \Delta r_i, i = 1, \dots, 3, in the following way:

\lambda_1 = b_j \cdot v_1 = \Delta r_1 + \Delta n_1,

\lambda_2 = b_j \cdot v_2 = \Delta r_2 - a_{21}\Delta r_1 + \Delta n_2 - a_{21}\Delta n_1, \qquad (3)

\lambda_3 = b_j \cdot v_3 = \Delta r_3 - a_{32}\Delta r_2 - a_{30}\Delta r_1 + \Delta n_3 - a_{32}\Delta n_2 - a_{30}\Delta n_1,

where the coefficient a_{30} = a_{31} - a_{32}a_{21}. Now, we introduce the following parameters, which do not depend on the values of \Delta n_i, i = 1, \dots, 3:

\bar{\lambda}_1 = \Delta r_1, \qquad \bar{\lambda}_2 = \Delta r_2 - a_{21}\Delta r_1, \qquad \bar{\lambda}_3 = \Delta r_3 - a_{32}\Delta r_2 - a_{30}\Delta r_1. \qquad (4)

At this point, we use the available information about the base line to reduce the dimension of the search space for the differences \Delta n_i. So, we consider all possible values for \Delta n_1 and \Delta n_2 and then we estimate the parameters

\lambda_1 = \bar{\lambda}_1 + \Delta n_1, \qquad \lambda_2 = \bar{\lambda}_2 + \Delta n_2 - a_{21}\Delta n_1, \qquad (5)

which are independent of \Delta n_3. Since the length of the base line is l, we can predict the parameter \lambda_3 as follows:

\lambda_3^{2} = a_{33}\left(l^{2} - \lambda_1^{2}/a_{11} - \lambda_2^{2}/a_{22}\right). \qquad (6)

From this value, two possible values can be found for the integer \Delta n_3:

\Delta n_3^{+} = \mathrm{round}\left(+|\lambda_3| - \bar{\lambda}_3 + a_{32}\Delta n_2 + a_{30}\Delta n_1\right), \qquad \Delta n_3^{-} = \mathrm{round}\left(-|\lambda_3| - \bar{\lambda}_3 + a_{32}\Delta n_2 + a_{30}\Delta n_1\right), \qquad (7)

where round means the nearest integer. For each value, we calculate the corrected \lambda_3 as

\lambda_3 = \bar{\lambda}_3 + \Delta n_3 - a_{32}\Delta n_2 - a_{30}\Delta n_1,

and finally the estimated value for the base line:


b = \frac{\lambda_1}{a_{11}}v_1 + \frac{\lambda_2}{a_{22}}v_2 + \frac{\lambda_3}{a_{33}}v_3. \qquad (8)

For each estimated b, we associate the value

\varepsilon = \lambda_1^{2}/a_{11} + \lambda_2^{2}/a_{22} + \lambda_3^{2}/a_{33} - l^{2}, \qquad (9)

which measures the goodness of fit against the base line. Notice that \varepsilon is always a non-negative number by equation (6).
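Equations (4)-(9) amount to a double loop over the integer candidates Δn1, Δn2 with Δn3 recovered from the length constraint. The Python sketch below follows those formulas; v1, v2, v3 and a21, a31, a32 are as produced by the Gram–Schmidt sketch above, dr holds the three measured double modulo differences, l is the known base line length, and the search range nmax and the ranking by |ε| are our own illustrative choices.

# Sketch of the two-dimensional ambiguity search (4)-(9).
import numpy as np

def search_baseline(v1, v2, v3, a21, a31, a32, dr, l, nmax=6):
    a11, a22, a33 = np.dot(v1, v1), np.dot(v2, v2), np.dot(v3, v3)
    a30 = a31 - a32 * a21
    lb1 = dr[0]                                        # parameters (4)
    lb2 = dr[1] - a21 * dr[0]
    lb3 = dr[2] - a32 * dr[1] - a30 * dr[0]
    candidates = []
    for dn1 in range(-nmax, nmax + 1):
        for dn2 in range(-nmax, nmax + 1):
            lam1 = lb1 + dn1                           # estimates (5)
            lam2 = lb2 + dn2 - a21 * dn1
            lam3_sq = a33 * (l**2 - lam1**2 / a11 - lam2**2 / a22)   # (6)
            if lam3_sq < 0.0:
                continue                               # incompatible with length l
            for sign in (1.0, -1.0):
                dn3 = round(sign * np.sqrt(lam3_sq) - lb3
                            + a32 * dn2 + a30 * dn1)   # (7)
                lam3 = lb3 + dn3 - a32 * dn2 - a30 * dn1       # corrected lambda_3
                bb = lam1 / a11 * v1 + lam2 / a22 * v2 + lam3 / a33 * v3   # (8)
                eps = lam1**2 / a11 + lam2**2 / a22 + lam3**2 / a33 - l**2  # (9)
                candidates.append((abs(eps), (dn1, dn2, dn3), bb))
    return sorted(candidates, key=lambda cand: cand[0])        # best fit first

The N best entries of the returned list play the role of the per-base-line candidate lists used in the tetrahedron construction below.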

We repeat this process for all base lines. Suppose that we have a total of N choices for each one. Then we will obtain three lists of N possible values, one for each base line, ordered by decreasing ε. The next step consists of finding a combination of three estimated base lines, one from each list, which leads to the best fit with respect to the real data.

We will denote each base line as l_j, j = 1, \dots, 3. We know these coordinates, which are expressed in the body frame system of reference. The virtual base line difference \Delta l_{ij} is defined as \Delta l_{ij} = l_i - l_j. This value will be compared with the estimated virtual base line difference \Delta b_{ij}^{k_1 k_2} = b_i^{k_1} - b_j^{k_2}, where b_i^{k_1} denotes the k_1-th estimate of the i'th base line, that is, the k_1 position in the i'th list. This comparison is made for all possible base line differences \Delta l_{ij}, 1 \le i < j \le 3, and all possible combinations \Delta b_{ij}^{k_1 k_2}, 1 \le i < j \le 3, 1 \le k_1, k_2 \le N. Goodness of fit is measured by the parameter

\varepsilon_{tot} = \|b_1^{k_1}\|^2 + \|b_2^{k_2}\|^2 + \|b_3^{k_3}\|^2 + \|\Delta b_{12}^{k_1 k_2}\|^2 + \|\Delta b_{13}^{k_1 k_3}\|^2 + \|\Delta b_{23}^{k_2 k_3}\|^2 - \left(\|l_1\|^2 + \|l_2\|^2 + \|l_3\|^2 + \|\Delta l_{12}\|^2 + \|\Delta l_{13}\|^2 + \|\Delta l_{23}\|^2\right). \qquad (10)

This last process can be seen as the construction of the most likely tetrahedron from the available data. Notice that we are working with norms, so equation (10) is valid whether or not the vectors b, \Delta b, l and \Delta l are expressed in the same reference system.

The attitude is now found by minimizing the following function (see [5]):

L(C) = \frac{1}{2}\sum_{i} a_i\, |b_i - C r_i|, \qquad (11)

where the sum is taken over all baselines, the b_i are the baselines measured in the body frame system and the r_i are the corresponding unit vectors in a reference frame. The weights a_i are non-negative and are chosen to be the inverse variances \sigma_i^{-1}. The matrix C is orthogonal and gives the attitude of the system.

The solution of this problem can be calculated using several methods. We use Davenport's q-method [3], which relates the solution of (11) to the eigenvector with largest eigenvalue of a matrix K derived from the values of b_i and r_i.
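The minimization (11) is the classical Wahba problem. The sketch below does not implement Davenport's q-method used in the text; instead it uses the equivalent SVD solution of the same least-squares problem, which is shorter to write with NumPy, and checks it on synthetic data generated from a known rotation (the vectors and weights are invented test data).

# Sketch of the attitude step via the SVD solution of the Wahba problem (11).
import numpy as np

def wahba_svd(b_list, r_list, weights):
    """Orthogonal C minimizing (1/2) sum_i a_i |b_i - C r_i|^2."""
    B = sum(w * np.outer(b, r) for w, b, r in zip(weights, b_list, r_list))
    U, _, Vt = np.linalg.svd(B)
    M = np.diag([1.0, 1.0, np.linalg.det(U) * np.linalg.det(Vt)])
    return U @ M @ Vt

rng = np.random.default_rng(0)
C_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(C_true) < 0:                 # make it a proper rotation
    C_true[:, 0] = -C_true[:, 0]
r_list = [np.array(v) for v in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.7])]
b_list = [C_true @ r for r in r_list]
print(np.allclose(wahba_svd(b_list, r_list, [1.0, 1.0, 1.0]), C_true))   # True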

3 Some experimental results

This method has been tested with several data sets using phase measurements collected for three baselines lying in the same plane. The attitude determination is made individually


for each epoch (1 second interval). Two of the baselines have a length of 0.958 m, while the other one has 1.355 m. So, there are 11 possible wavelength integers for the shorter baselines and 8 for the longer one. The double phase differences give, respectively, a total of 21 and 29 integer choices. We only need to compute 1281 possible solutions, whereas 42911 trials should be computed if we had performed a three-dimensional search of the ambiguities. For each epoch, we select the four satellites with maximum elevation.

Figure 1: Experiment 1 (Roll: 0, Pitch: 0, Yaw: 170)

In figures 1 and 2 we present some experimental results. Figure 1 shows that, after an initial period, the solution with minimum ε_{tot} is the correct one. So, the ambiguities are correctly solved and the attitude is properly calculated. However (see figure 2), due to the experimental errors, the function ε_{tot} can provide wrong solutions. In any case, the correct solution can be found among the 30 solutions with minimum ε_{tot}. Unfortunately, this method does not provide an effective way of finding it. Notice that the method can select the same wrong solution during a long period of time, which makes this kind of behaviour more difficult to detect.

Since there is a lot of redundant information, it is likely that other approaches, which exploit this redundancy, will be more efficient in finding the correct solution. The LAMBDA method [4], which performs integer least squares, can calculate the integer ambiguities with more accuracy, and can then be used to find the initial ones. Also, the use of some external devices can provide useful information for selecting the correct solution among those with minimum ε_{tot}.


Figure 2: Experiment 2 (Roll: 0, Pitch: 0, Yaw: 100)

4 Conclusions

In this work we have presented a method for determining attitude starting from double differenced phase data. We have seen that this kind of method, which solves the integer ambiguities at each individual epoch, can present serious errors when working with real data. The use of methods which exploit the redundancy of the measurements, such as the LAMBDA method, is a good alternative for accuracy. However, they are computationally more expensive. This problem can be avoided by using both types of methods simultaneously, since with good accuracy in the initial measurements and with the knowledge of some external information, it can be possible to choose the correct solution among the most likely solutions provided by the simpler method.

References

[1] S. Purivigraipong, M. S. Hodgart, M. J. Unwin, S. Kuntanapreeda, An approach to resolve integer ambiguity of GPS carrier phase difference for spacecraft attitude determination, in: TENCON 2005, IEEE Region 10, pp. 1–6, 21–24 Nov. 2005 (2005).

[2] M. S. Hodgart, S. Purivigraipong, New approach to resolving instantaneous integer ambiguity resolution for spacecraft attitude determination using GPS signals, in: Position Location and Navigation Symposium, IEEE 2000, 13–16 March 2000, IEEE Computer Society (2000), 132–139.

[3] G. M. Lerner, Three-Axis Attitude Determination, in: Spacecraft Attitude Determination and Control, ed. by James R. Wertz, Dordrecht, Holland, D. Reidel (1978).

[4] P. De Jonge, C.C.J.M. Tiberius, The LAMBDA method for integer ambiguity estimation: implementation aspects, Technical report LGR Series No 12, Delft


Geodetic Computing Centre, Delft University of Technology, The Netherlands (1996).

[5] F. L. Markley, Optimal Attitude Matrix for Two Vector Measurements, Journal of Guidance, Control and Dynamics 31 (3) (2008), 765–768.


Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009, 30 June, 1–3 July 2009.

Flow analysis around structures in slow fluids and its applications to environmental fluid phenomena

S. Oharu1, Y. Matsuura2 and T. Arima3

1 Department of Mathematics, Chuo University, Tokyo, Japan

2 Department of Maritime Safety Technology, Japan Coast Guard Academy

3 Wako Research Center, Honda R&D Co., Ltd. Saitama, Japan

emails: [email protected], [email protected],[email protected]

Abstract

The importance and application of numerical flow analysis in environmental science and technology are outlined. Fluid phenomena in the ocean, rivers, the atmosphere and the ground are investigated by means of numerical methods and, in turn, proposals are made for the control and restoration of, and counterplans against, the so-called environmental disrupters which destroy the natural environment as well as ecological systems in nature. All such environmental disrupters diffuse in and are transported by environmental fluids. Those disrupters sometimes react with other chemicals to generate more poisonous materials. Environmental fluid dynamics is effective for the evaluation, prediction and restoration of environmental damage. In this paper a mathematical model of environmental fluid is presented and results of numerical simulations based on the model are exhibited.

Key words: Environmental fluid, computational fluid dynamics, numerical simulation, environmental restoration technology, three dimensional visualization.

MSC 2000: 39A12, 62P12, 65M06, 65M12, 76D05

1 Introduction

Fluid dynamical technologies are now important in the field of environmental science and technology. Evaluation of environmental fluid flow using numerical methods is particularly useful for understanding complex fluid motion and makes it possible to control flow fields from the point of view of environmental restoration. Environmental fluid problems may be classified into three types of applications. The first application is concerned with the ultimate use of exergy; this is the most important subject for existing engines that burn fossil fuel. Secondly, new energy sources such as wind


and wave power generation should be extensively studied. Thirdly, it is indispensable to develop not only efficient and harmless energy sources but also technologies to restore the environment that has already been polluted by exhaust gases from the combustion of fossil fuel. It is also important to develop effective methods for protecting environmental fluids against pollutants. In this paper the field of study concerned with the evaluation, control and prediction of transport phenomena arising in a variety of environmental problems is called environmental fluid dynamics. Obviously, environmental fluid dynamics is one of the key theories for inventing efficient technologies for the preservation and restoration of the natural environment. Here we focus our attention on a dynamical analysis of diffusion and convective processes of pollutants in environmental fluids. A mathematical model of environmental fluids is presented and results of simulations are exhibited. It is then expected that new environmental restoration technology will be developed.

2 A mathematical model of environmental fluids

As a mathematical model describing the motion of environmental fluids, we employ the compressible Navier-Stokes system. It is known (see [9]), however, that the application of numerical methods for the compressible Navier-Stokes system to low-speed flows does not necessarily provide satisfactory convergence. This means that numerical simulations become inefficient and the associated computational results turn out to be inaccurate. These numerical difficulties are caused by the fact that the characteristic velocities in the compressible Navier-Stokes system are the convective and sound speeds, so that their ratio becomes large and the so-called stiffness of the system may occur due to the disparity of its eigenvalues. One efficient countermeasure for these numerical difficulties is to employ the so-called Boussinesq approximation under the assumption that the ratio of the change in density to that of temperature is small. In fact, under the Boussinesq approximation, numerical methods developed for incompressible fluids can be directly applied. On the other hand, in the case that the ratio of the change in density to that of temperature is considerably large, the low Mach number approximation proposed in [5] may be used.

In this paper we apply the Boussinesq approximation to the Navier-Stokes system and formulate the following system of equations (1)-(3) as our mathematical model for describing the motion of environmental fluids:

∇ · v = 0    (1)

ρ [v_t + (v · ∇)v] = −∇p + µ∆v − ρβ(T − T0)g    (2)

ρCp [T_t + (v · ∇)T] = ∇ · (κ∇T) + Sc    (3)

Here the parameters v, ρ, p, µ, β, g, T and T0 represent the velocity vector, density, pressure, viscosity coefficient, rate of volume expansion, the acceleration of gravity, temperature and its reference temperature, respectively. Also, the coefficient κ means the thermal conductivity and Sc stands for the sum of heat sources in the fluid.


Our main objective here is to obtain numerical data describing the flow field around bodies in an environmental fluid. For this purpose we impose Dirichlet boundary conditions for v and T and homogeneous Neumann boundary conditions for p on the inflow boundary. On the outflow boundary we impose homogeneous Neumann conditions for v, T and Dirichlet boundary conditions for p. On the surface of each body standing in the fluid, we impose the no-slip condition for v, homogeneous Neumann conditions for T and an inhomogeneous Neumann boundary condition for p, which is obtained from equation (2). In this paper a new numerical scheme is proposed in which a fully implicit Euler scheme for the numerical velocity is introduced.

3 Numerical Model

Discretizing (1)-(3) in time by means of the implicit Euler method, we obtain the following system of nonlinear equations:

∇ · v^{n+1} = 0    (4)

(v^{n+1} − v^n)/∆t = −(v^{n+1} · ∇)v^{n+1} − (1/ρ)∇p^{n+1} + (µ/ρ)∆v^{n+1} − β(T^{n+1} − T0)g    (5)

(T^{n+1} − T^n)/∆t = −(v^{n+1} · ∇)T^{n+1} + (1/(ρCp))∇ · (κ∇T^{n+1}) + Sc/(ρCp)    (6)

Substituting Equation (5) into Equation (4), Poisson's equation for the pressure is derived:

∆p^{n+1} = −ρ [∇ · ((v^{n+1} · ∇)v^{n+1}) − (∇ · v^n)/∆t] − ρβ∇ · (T^{n+1}g).    (7)

In what follows, we regard Equations (4)-(6) as the governing equations for the motion of numerical fluids. We here investigate the structure and behavior of numerical solutions of this basic model.
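Poisson's equation (7) is the elliptic problem that is solved at every iteration of the scheme in Section 3.1 (Step 2) by successive over-relaxation. As a point of reference, the following is a minimal Python sketch of a plain SOR solver for a model problem ∆p = f on a uniform 2D grid with homogeneous Dirichlet boundary values; the grid size, the relaxation factor omega and the function name sor_poisson are illustrative assumptions, and the residual cutting acceleration of [7] used in the actual computations is not included.

import numpy as np

def sor_poisson(f, h, omega=1.7, tol=1e-6, max_iter=10000):
    """Solve Laplacian(p) = f on a uniform 2D grid with spacing h by SOR.

    Minimal illustration of the kind of solver used for the pressure
    equation (7)/(8); boundary values of p are kept fixed (Dirichlet).
    """
    p = np.zeros_like(f)
    ny, nx = f.shape
    for _ in range(max_iter):
        max_delta = 0.0
        for j in range(1, ny - 1):            # Gauss-Seidel sweep with over-relaxation
            for i in range(1, nx - 1):
                p_new = 0.25 * (p[j, i - 1] + p[j, i + 1]
                                + p[j - 1, i] + p[j + 1, i]
                                - h * h * f[j, i])
                delta = omega * (p_new - p[j, i])
                p[j, i] += delta
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:                   # stop when the update size is small
            break
    return p

# Example: recover p = sin(pi x) sin(pi y) from its Laplacian on the unit square.
n = 33
x = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(x, x)
f = -2.0 * np.pi**2 * np.sin(np.pi * X) * np.sin(np.pi * Y)
p = sor_poisson(f, h=x[1] - x[0])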

3.1 Implicit iterative scheme

We here present a new method such that, given the velocity field v^n, pressure field p^n and temperature field T^n at the n-th time step, the velocity field v^{n+1}, pressure field p^{n+1} and temperature field T^{n+1} at the (n+1)-th step are obtained through an iteration procedure. Our mathematical model of the numerical fluid, as expressed by Equations (5), (6) and (7), is fully implicit in time, and this implicit form guarantees numerical stability and robustness. We adopt a procedure of constructing iterative numerical solutions that is not only much more economical but also retains most of the stability and accuracy properties of the fully implicit scheme.

The iteration procedure employed in the present study is summarized as follows. In the following, the superscript n refers to values that are known from the previous time step, the superscript k refers to the iteration cycle between the solutions at time steps n and n+1, and the superscript 0 is associated with the initial guess for the first iteration step k = 0.


Step 1: Choose initial guesses for the values v^{n+1}, p^{n+1} and T^{n+1} at the next time step. The simplest choice is to use the solutions at the current time step:

v^0 = v^n,  p^0 = p^n,  T^0 = T^n.

Step 2: Poisson's equation for the pressure (7) is solved by applying the successive over-relaxation (SOR) method, in which the residual cutting method (RCM) developed in [7] is used to speed up the convergence, to get the pressure at the current iteration step, say k:

∆p^k = −ρ [∇ · ((v^k · ∇)v^k) − (∇ · v^n)/∆t] − ρβ∇ · (T^k g)    (8)

Step 3: The following equation of delta-form for δv^k (= v^{k+1} − v^k) is solved:

[1 + ∆t (v^n · ∇ − (µ/ρ)∆)] δv^k = rhs^k_m,    (9)

rhs^k_m = −(v^k − v^n) + ∆t [−(v^k · ∇)v^k − (1/ρ)∇p^k + (µ/ρ)∆v^k − β(T^k − T0)g].    (10)

Step 4: Compute the velocity at the next iteration step k+1 by

v^{k+1} = v^k + δv^k.

Step 5: The following equation of delta-form for δT^k (= T^{k+1} − T^k) is solved:

[1 + ∆t (v^{k+1} · ∇ − (κ/(ρCp))∆)] δT^k = rhs^k_T,    (11)

rhs^k_T = −(T^k − T^n) + ∆t [−(v^{k+1} · ∇)T^k + (κ/(ρCp))∆T^k + Sc/(ρCp)].    (12)

Step 6: Compute the temperature at the next iteration step k+1 by

T^{k+1} = T^k + δT^k.

Step 7: Check the convergence of the Newton iteration for the delta-form equations for the velocity and the temperature as follows:

Σ_Ω |v^{k+1} − v^k| < ε_v,    Σ_Ω |T^{k+1} − T^k| < ε_T,

where Σ_Ω denotes the summation over the whole computational domain, and ε_v and ε_T are small prescribed values that serve as admissible error bounds (radii of convergence) for the respective inequalities. This completes one cycle of the iterative process. If more iterations are required, the process is continued from Step 2. In practice, experience suggests that only 2 or 3 iterations are enough to obtain the desired approximate numerical solutions. We find in this scheme that if |v^{k+1} − v^k| → 0 and |T^{k+1} − T^k| → 0, then v^k = v^{k+1} = v^{n+1}, T^k = T^{k+1} = T^{n+1} and p^k = p^{n+1}, because Equations (9) and (11) converge to Equations (5) and (6), respectively, for δv^k = 0 and δT^k = 0; the pressure equation (8) then converges to Equation (7).
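The skeleton of Steps 1-7 can be illustrated compactly. The Python sketch below applies the same delta-form iteration to a small linear model problem dv/dt = Av, in which the convective, viscous and buoyancy terms of (5) are collapsed into a single matrix A; it only mirrors the structure of the k-loop, the delta-form solve of type (9) and the convergence test of Step 7, and is not the flow solver itself. The matrix A, the step size and the function name step_implicit_delta are assumptions made for the illustration.

import numpy as np

def step_implicit_delta(v_n, A, dt, eps=1e-10, max_sweeps=3):
    """One implicit Euler step of dv/dt = A v via a delta-form iteration.

    Mirrors the structure of Steps 1-7: start from the previous solution,
    repeatedly solve a delta-form system of type (9), and stop when the
    update falls below the admissible error bound (Step 7).
    """
    identity = np.eye(len(v_n))
    v_k = v_n.copy()                                     # Step 1: initial guess v^0 = v^n
    for _ in range(max_sweeps):
        rhs = -(v_k - v_n) + dt * (A @ v_k)              # analogue of rhs in (10)
        delta = np.linalg.solve(identity - dt * A, rhs)  # delta-form solve, cf. (9)
        v_k = v_k + delta                                # Step 4: update
        if np.sum(np.abs(delta)) < eps:                  # Step 7: convergence test
            break
    return v_k                                           # accepted as v^{n+1}

# Usage: march a damped oscillator forward in time.
A = np.array([[0.0, 1.0], [-1.0, -0.1]])
v = np.array([1.0, 0.0])
dt = 0.05
for _ in range(200):
    v = step_implicit_delta(v, A, dt)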


3.2 Spatial discretization with TVD property

Concerning the discretization in space, we employ a second-order accurate central difference scheme for all terms except the convective terms. For the discretization of the convective terms on the right-hand sides of Equations (9) and (11), we apply a third-order accurate upwind scheme proposed in [3]. A more favorable way of discretizing the advection terms would be the use of TVD schemes. To obtain TVD schemes of higher accuracy, the method of Monotone Upstream-centered Schemes for Conservation Laws (MUSCL) [4] can be used. For simplicity, we consider the following time-dependent Cauchy problem in one space dimension:

∂ϕ/∂t + v ∂ϕ/∂x = 0,  −∞ < x < ∞,  t ≥ 0,  ϕ(x, 0) = ϕ_0(x).    (13)

Here ϕ : R × R → R denotes a velocity component. The solution of this equation has the TVD property because ϕ is constant along the curves dx/dt = v, known as the characteristics. Therefore, it is possible to construct a TVD scheme starting from Equation (13). Thus, we consider the following equation, similar to Equation (13):

∂ϕ/∂t + ∂(vϕ)/∂x − ϕ ∂v/∂x = 0.    (14)

Since the second term in the above equation has the form of the derivative of a flux, we incorporate a discretization with the TVD property through the MUSCL approach [4]. We discretize the x-t plane by choosing a mesh width ∆x and a time step ∆t, and define the discrete mesh points (x_i, t_n) by

x_i = i∆x, i = ..., −1, 0, 1, 2, ...,    t_n = n∆t, n = 0, 1, 2, ...

For simplicity we take a uniform mesh, with ∆x and ∆t constant. The finite difference method provides approximations u^n_i ∈ R to the solution u(x_i, t_n) at the discrete grid points. Here we discretize Equation (14) as follows:

(ϕ^{n+1}_i − ϕ^n_i)/∆t = −(1/∆x) (f_{i+1/2} − f_{i−1/2})    (15)

Here f_{i±1/2} denotes the numerical flux function on the cell interface x_i ± ∆x/2. It can be evaluated as the sum of a discretization of the last term of Equation (14) and the discretized flux of the second term obtained by the MUSCL method with the minmod limiter function [1]. Since the last term can be discretized as

ϕ ∂v/∂x  ⇒  [(a_{i+1/2} − a_{i−1/2})/∆x] ϕ,

with a = v^n, the total numerical flux can be evaluated as follows:

f_{i+1/2} = −a_{i+1/2} ϕ_i + f^{(upw)}_{i+1/2}
          + a^+_{i+1/2} · (1/4) [(1 + κ) Φ^{+C}_{i+1/2} + (1 − κ) Φ^{+U}_{i+1/2}]
          − a^−_{i+1/2} · (1/4) [(1 + κ) Φ^{−C}_{i+1/2} + (1 − κ) Φ^{−U}_{i+1/2}]    (16)


The first term of Equation (16) corresponds to the last term of Equation (14). The second term of Equation (16), f^{(upw)}_{i+1/2}, corresponds to the first-order accurate upwind difference of the second term of Equation (14), and the other terms are corrections that make the scheme higher-order accurate. These can be written as follows:

f^{(upw)}_{i+1/2} = a^+_{i+1/2} ϕ_i + a^−_{i+1/2} ϕ_{i+1}.

Here

a = v^n,  a^± = v^± = (1/2)(v^n ± |v^n|),

and Φ is defined as follows:

Φ^{+C}_{i+1/2} = minmod[ϕ_{i+1} − ϕ_i, β(ϕ_i − ϕ_{i−1})]
Φ^{+U}_{i+1/2} = minmod[ϕ_i − ϕ_{i−1}, β(ϕ_{i+1} − ϕ_i)]
Φ^{−C}_{i+1/2} = minmod[ϕ_{i+1} − ϕ_i, β(ϕ_{i+2} − ϕ_{i+1})]
Φ^{−U}_{i+1/2} = minmod[ϕ_{i+2} − ϕ_{i+1}, β(ϕ_{i+1} − ϕ_i)],

minmod(x, y) = (1/2) [sgn(x) + sgn(y)] min(|x|, |y|).

The parameter β is called a compression parameter in [1] and must satisfy β ≥ 1; its upper bound is determined by the TVD condition. The parameter κ controls the discretization accuracy, e.g., κ = −1 gives second-order accuracy and κ = 1/3 third-order accuracy, with −1 ≤ κ ≤ 1. The numerical flux f_{i−1/2} is obtained by replacing the subscript i+1/2 by i−1/2. In this replacement of subscripts, note that the first term with the replaced subscript is not −a_{i−1/2} ϕ_{i−1} but −a_{i−1/2} ϕ_i. Using a = a^+ + a^−, Equation (16) can be rewritten as follows:

f_{i+1/2} = a^−_{i+1/2} (ϕ_{i+1} − ϕ_i)
          + a^+_{i+1/2} · (1/4) [(1 + κ) Φ^{+C}_{i+1/2} + (1 − κ) Φ^{+U}_{i+1/2}]
          − a^−_{i+1/2} · (1/4) [(1 + κ) Φ^{−C}_{i+1/2} + (1 − κ) Φ^{−U}_{i+1/2}]    (17)

When this scheme is written as

u^{n+1}_i = u^n_i − C_{i−1/2}(u_i − u_{i−1}) + D_{i+1/2}(u_{i+1} − u_i),

the conditions for this scheme to be Total Variation Diminishing (TVD) are:

C_{i+1/2} ≥ 0,  D_{i+1/2} ≥ 0,  C_{i+1/2} + D_{i+1/2} ≤ 1.

From the conditions C_{i+1/2} ≥ 0 and D_{i+1/2} ≥ 0, we obtain

(1 ≤) β ≤ (3 − κ)/(1 − κ).


From the condition C_{i+1/2} + D_{i+1/2} ≤ 1, we obtain

∆t ≤ ∆x / [ |a_{i+1/2}| + (1/4)(a^+_{i+3/2} − a^+_{i−1/2})(β(1 + κ) + 1 − κ) ].

Under these conditions, the discretization (15) becomes a TVD scheme [2]. When the advection speed is constant (a = const), the condition becomes

∆t ≤ [4/(5 − κ + β(1 + κ))] · ∆x/|a|.

This scheme is third-order accurate for the values κ = 1/3 and β = 4. We can directly incorporate this formula into the convective terms on the right-hand sides of Equations (10) and (12). If this formula is applied to the convective term on the right-hand side of Equation (10), the same formula may also have to be incorporated into the first term of Equation (8) for consistency with Equation (10). Thus, it is possible to employ the TVD discretization in our iterative implicit scheme.
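As an illustration of the limited κ-scheme described above, the following Python sketch advects a square pulse with the constant-coefficient equation (13) on a periodic domain. For a constant positive speed a the scheme is written here in an equivalent conservative upwind form, so the bookkeeping of the −ϕ ∂v/∂x term in the variable-velocity flux (16)/(17) drops out; the minmod limiter, the parameters κ = 1/3 and β = 4, and the time-step bound 0.4 ∆x/|a| follow the text, while the function names, the periodic boundary treatment and the test profile are illustrative assumptions.

import numpy as np

def minmod(x, y):
    """minmod(x, y) = 0.5*(sgn(x)+sgn(y))*min(|x|, |y|), applied elementwise."""
    return 0.5 * (np.sign(x) + np.sign(y)) * np.minimum(np.abs(x), np.abs(y))

def muscl_advect(phi, a, dx, dt, kappa=1.0/3.0, beta=4.0):
    """One TVD-MUSCL update of  phi_t + a phi_x = 0  for constant a > 0,
    with periodic boundaries, using the minmod limiter and the compression
    parameter beta of Section 3.2."""
    dphi = np.roll(phi, -1) - phi                    # phi[i+1] - phi[i]
    dphi_m = np.roll(dphi, 1)                        # phi[i]   - phi[i-1]
    # Limited higher-order correction at interface i+1/2 (upwind side, a > 0).
    corr = 0.25 * ((1.0 + kappa) * minmod(dphi, beta * dphi_m)
                   + (1.0 - kappa) * minmod(dphi_m, beta * dphi))
    flux = a * (phi + corr)                          # first-order upwind flux + correction
    return phi - dt / dx * (flux - np.roll(flux, 1))

# Usage: advect a square pulse once around the periodic unit interval.
n, a = 200, 1.0
x = np.linspace(0.0, 1.0, n, endpoint=False)
dx = x[1] - x[0]
phi = np.where((x > 0.3) & (x < 0.5), 1.0, 0.0)
dt = 0.4 * dx / a                                    # TVD bound for kappa = 1/3, beta = 4
for _ in range(int(round(1.0 / (a * dt)))):
    phi = muscl_advect(phi, a, dx, dt)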

4 Settings of Numerical Simulation

We here discuss a flow analysis around cylinders with bottom ends standing in an environmental fluid. In the numerical simulations we have performed, flow analysis was made for a typical fluid flow. Our setting may be outlined as follows. We consider a parallelepiped region R in R^3 and assume that one side is the inflow boundary and the opposite side is the outflow boundary. We then insert two circular cylinders with radius 1R and length 20R, both of which have bottom ends in the region R, in such a way that they are arranged in a row at an interval of 10R and perpendicular to the top side of R, as illustrated in Figure 2. For convenience, we call the cylinder facing the inflow boundary the front cylinder and the cylinder facing the outflow boundary the rear cylinder. In this setting we performed numerical simulations and made a detailed analysis around the two cylinders parallel to each other. One typical application of this analysis is the simulation of the behavior of red tide plankton in a neighborhood of the farming part of an oyster raft floating on the ocean, as illustrated in Figure 1(a). Another application is the study of the effect and influence of convective diffusion phenomena of automobile exhaust fumes on roadside trees, as shown in Figure 1(b). The numerical conditions are set in the following way: the Reynolds number (= ρ|v|2R/µ) based on the main flow velocity is assumed to be Re = 2500, and the temperature distribution is assumed to follow a linear profile such that T = 300 K on the top of the front cylinder and T = 290 K below the bottom end.

5 Generation of Grid Point Systems

In order to obtain numerical solutions, we first define a grid point system fitted to the surfaces of the bodies and compute solutions of the discretized equations consistent with (1)-(3) on the grid points. Here an effective method for generating grid point systems which fit the surfaces of bodies in the fluid is proposed, and it is demonstrated that a simulation code can be developed on the grid point system. We generate grid point systems not only on the region of the fluid but also in the inside of the bodies, so that those grid point systems are connected continuously on the boundaries of the bodies. Therefore the final grid point system covers a simply connected domain on which simulations are performed. A concrete procedure for generating such a grid point system may be outlined as follows. Firstly, on each horizontal plane crossing the cylinders, we divide the computational domain into the inside and outside of the circles which are cross sections of the cylinders and then subdivide the outside of the circles into a finite number of simply connected subdomains. Secondly, we generate initial grid point systems on each subdomain by algebraic methods and then construct a spatially smooth grid point system by solving the associated elliptic system numerically.

Figure 1: Structures in the ocean and atmosphere. (a) Oyster raft; (b) Street trees.

Let ξ = ξ(x, y) and η = η(x, y) denote the mapping from the physical space to the computational space. The basis of this method is that, following Thompson et al. [8], the mapping functions are required to satisfy the system of Poisson equations

ξ_xx + ξ_yy = P,    η_xx + η_yy = Q    (18)

Figure 2: Settings for numerical simulation. (a) Settings of the computational domain; (b) 3D view of the computational domain.


Actual computation is to be done in the rectangular transformed field, where the curvilinear coordinates ξ, η are the independent variables, with the Cartesian coordinates x, y as dependent variables. The following relations are useful in transforming equations between computational space and physical space:

ξ_x = y_η/J,  ξ_y = −x_η/J,  η_x = −y_ξ/J,  η_y = x_ξ/J    (19)

where J = x_ξ y_η − x_η y_ξ.

Applying Equation (19) to Equation (18) yields the transformed Poisson equations

α x_ξξ − 2β x_ξη + γ x_ηη = −J^2 (P x_ξ + Q x_η)
α y_ξξ − 2β y_ξη + γ y_ηη = −J^2 (P y_ξ + Q y_η),

where α = x_η^2 + y_η^2,  β = x_ξ x_η + y_ξ y_η,  γ = x_ξ^2 + y_ξ^2.

Using appropriate inhomogeneous terms P, Q in an iterative way based on an idea of Steger and Sorenson [6], we may generate a grid point system in such a way that the spatial distribution of grid points is concentrated near the boundaries of the bodies and the lines connecting adjacent grid points are orthogonal to the boundaries.
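A minimal Python sketch of this elliptic smoothing step is given below for the simplest case P = Q = 0 (i.e., Winslow smoothing without the Steger-Sorenson boundary clustering): an algebraic initial grid between a quarter circle and a concentric outer arc is relaxed with the transformed equations above. The test domain, grid sizes and the function name winslow_smooth are illustrative assumptions and do not reproduce the grids of Figure 3.

import numpy as np

def winslow_smooth(x, y, sweeps=200):
    """Smooth an initial algebraic grid by iterating the transformed Poisson
    equations with P = Q = 0; boundary points are held fixed.
    Central differences in (xi, eta) with unit computational spacing."""
    for _ in range(sweeps):
        x_xi  = 0.5 * (x[1:-1, 2:] - x[1:-1, :-2])
        y_xi  = 0.5 * (y[1:-1, 2:] - y[1:-1, :-2])
        x_eta = 0.5 * (x[2:, 1:-1] - x[:-2, 1:-1])
        y_eta = 0.5 * (y[2:, 1:-1] - y[:-2, 1:-1])
        alpha = x_eta**2 + y_eta**2
        beta  = x_xi * x_eta + y_xi * y_eta
        gamma = x_xi**2 + y_xi**2
        x_cross = 0.25 * (x[2:, 2:] - x[2:, :-2] - x[:-2, 2:] + x[:-2, :-2])
        y_cross = 0.25 * (y[2:, 2:] - y[2:, :-2] - y[:-2, 2:] + y[:-2, :-2])
        denom = 2.0 * (alpha + gamma)
        x[1:-1, 1:-1] = (alpha * (x[1:-1, 2:] + x[1:-1, :-2])
                         + gamma * (x[2:, 1:-1] + x[:-2, 1:-1])
                         - 2.0 * beta * x_cross) / denom
        y[1:-1, 1:-1] = (alpha * (y[1:-1, 2:] + y[1:-1, :-2])
                         + gamma * (y[2:, 1:-1] + y[:-2, 1:-1])
                         - 2.0 * beta * y_cross) / denom
    return x, y

# Usage: algebraic initial grid between a quarter circle of radius 1 (the body)
# and a concentric quarter circle of radius 3, then elliptic smoothing.
n_xi, n_eta = 33, 17
theta = np.linspace(0.0, 0.5 * np.pi, n_xi)                  # along the body (xi)
s = np.linspace(0.0, 1.0, n_eta)[:, None]                    # body -> outer arc (eta)
x0 = (1.0 - s) * np.cos(theta) + s * 3.0 * np.cos(theta)     # straight-line blend
y0 = (1.0 - s) * np.sin(theta) + s * 3.0 * np.sin(theta)
x, y = winslow_smooth(x0.copy(), y0.copy())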

Finally, we stack up the grid points constructed on each 2D cross section in the direction of the axes of the cylinders and generate the desired 3D grid point system in the computational domain. The grid point system used for our numerical simulations is depicted in Figure 3. Figure 3(a) shows a general view of the whole grid point system, obtained by plotting the grid points on the boundaries of the computational domain. Figure 3(b) gives a closeup of the grid points generated around the cylinders. The number of grid points in the direction of the flow (the x-axis) is 239, that in the direction of the axes of the cylinders (the z-axis) is 119 and that in the direction perpendicular to the x- and z-axes (the y-axis) is 91, so the total number of grid points is 2,588,131. In order to shorten the computation time, we use a parallel computer of the distributed memory type. In the parallel computing, the whole computational domain is divided into subdomains as shown in Figure 3(c) and the computation on each subdomain is allocated to one CPU. In the case of parallel computing of the distributed memory type, the allocated boundary data are stored in separate memories. Hence we perform the parallel computing through data communication using a Message Passing Interface library.

6 Results of Numerical Simulations

Computation is started with uniform initial data, and qualitative features are investigated by analyzing the numerical results of the simulation at a time step at which the flow field is well developed and has reached a quasi-stationary state. Figure 4 depicts the velocity vector field and contours of the pressure on the cross section containing the axes of the two cylinders. In the velocity vector field, upward flows along the back of the front cylinder are observed. These upward flows are formed when the horizontal uniform flow runs around the bottom end; they are remarkable in a neighborhood of the bottom end and even reach the top part of the cylinder. Similar upward flows are also observed behind the rear cylinder. These flows are formed in such a way that they seem to roll the bottom part up and go up towards the top part. Moreover, such upward flows are observed in a wide range behind the rear cylinder since there are no obstacles there. In the figure of the pressure contours it is observed that a vertical sequence of separate cell-like regions of negative pressure is formed. This is due to the presence of nonstationary vortices of Kármán type.

Figure 3: Grid point system. (a) 3D general view; (b) Grid points around the circular cylinders; (c) Grid decomposition for parallel computation.

On the other hand, a vertical sequence of regions of positive pressure is observed in front of the rear cylinder. This phenomenon suggests that the nonstationary vortices generated by the front cylinder interact with the stagnation regions existing in front of the rear cylinder and deteriorate the stagnation pressure. Figure 5 depicts the iso-surfaces of pressure and vorticity and illustrates the 3D structure of the pressure field and the vorticity distribution. It is observed in the pressure field that there is regular variation from the top part of the cylinders down to the downstream region due to the formation of vortex pairs of Kármán type, and that in the bottom parts of the cylinders the variation of pressure is restricted to the rear sides. Concerning the vorticity distribution, regular variation due to the formation of Kármán-type vortex pairs is observed in the top part of the rear cylinder in the same way as in the pressure field. The vorticity distribution is concentrated in the back of the middle part of the rear cylinder, although the vorticity distribution in the bottom part is comparatively diffusive. This suggests that the flows running around the bottom parts of the cylinders form longitudinal vortices. It is then inferred that these longitudinal vortices drive the upward flows behind the two cylinders. In Figure 6 the streamlines and trajectories of particles in the fluid are depicted. The trajectories of particles are drawn in the following way: we release the particles from the back of each cylinder and trace the trajectories forward and backward in time until the particles reach the boundaries of the computational domain or those of the bodies in the fluid. It is seen from Figure 6(a) that the upward flows behind the cylinders are rolling up towards the top. Furthermore, the motion of longitudinal vortices around the bottom sides can be observed, as inferred from the iso-surfaces of vorticity. Figure 6(b) is obtained by arranging particles on the same trajectories as in Figure 6(a) at regular time intervals. From this it is seen that the particle distribution represents how long a particle stays in the flow, and that particles are concentrated in the back of the front cylinder. These results of numerical simulations may have applications to various environmental problems. The farming part of an oyster raft as illustrated in Figure 1(a) may be regarded as a regular arrangement of cylinders with bottom ends. Our results suggest that red tide plankton would stay in the upward vortices rolling up along the back of the cylinders provided that the oyster raft is moved or water currents flow against the raft. Another application is that street trees as illustrated in Figure 1(b) may be regarded as a sequence of cylinders with top ends, and that automobile exhaust fumes may be caught in the back of each tree, which purifies those poisonous gases. It is then expected that new environmental restoration technology will be developed by applying the results of numerical simulations of environmental fluids.

Figure 4: Computed results on the x-z plane across the circular cylinders. (a) Velocity vectors; (b) Pressure contours.

Figure 5: Computed iso-surfaces. (a) Pressure; (b) Absolute vorticity.


Figure 6: Upward flow motion observed behind the two circular cylinders. (a) Streamlines; (b) Particle trajectories.

References

[1] S. R. Chakravarthy and S. Osher, A new class of high accuracy TVD schemes for hyperbolic conservation laws, AIAA Paper 85-0363 (1985).

[2] A. Harten, On a class of high resolution total-variation-stable finite-difference schemes, SIAM Journal on Numerical Analysis 21 (1) (1984) 1-12.

[3] T. Kawamura and K. Kuwahara, Computation of high Reynolds number flow around a circular cylinder with surface roughness, AIAA Paper 84-0340 (1984).

[4] B. van Leer, Towards the ultimate conservative difference scheme. IV. A new approach to numerical convection, J. Comput. Phys. 23 (1977) 276-299.

[5] H. N. Najm, P. S. Wyckoff and O. M. Knio, A semi-implicit numerical scheme for reacting flow, J. Comput. Phys. 143 (1998) 381-402.

[6] J. L. Steger and R. L. Sorenson, Automatic mesh-point clustering near a boundary in grid generation with elliptic partial differential equations, J. Comput. Phys. 33 (1979) 405-410.

[7] A. Tamura, K. Kikuchi and T. Takahashi, Residual cutting method for elliptic boundary value problems: application to Poisson's equation, J. Comput. Phys. 137 (1997) 247-264.

[8] J. Thompson, Z. U. A. Warsi and C. W. Mastin, Numerical Grid Generation: Foundations and Applications, McGraw-Hill/Appleton & Lange, 1985.

[9] E. Turkel, Preconditioned methods for solving the incompressible and low speed compressible equations, J. Comput. Phys. 72 (2) (1987) 277-298.


Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

Exploiting Performance on Parallel T-Coffee Progressive

Alignment by Balanced Guide Tree

Miquel Orobitg1, Fernando Cores1, Fernando Guirado1 and Albert

Montanyola1

1 Department of Computer Science and Industrial Engineering, University of Lleida

emails: orobitg, fcores, f.guirado, [email protected]

Abstract

Multiple Sequence Alignment (MSA) constitutes an extremely powerful tool for important biological applications such as phylogenetic analysis, identification of conserved motifs and domains, and structure prediction. In spite of the improvements in speed and accuracy introduced by MSA programs, the computational requirements for large-scale alignments require high performance computing and parallel applications. In this paper we present an improvement to a parallel implementation of T-Coffee, a widely used MSA package. Our approach resolves the bottleneck of the progressive alignment stage of MSA. This is achieved by increasing the degree of parallelism, by balancing the guide tree that drives the progressive alignment process. The experimental results show improvements in execution time of over 68% while maintaining the biological quality.

Key words: T-Coffee, Multiple Sequence Alignment, Parallelism, Balanced Guide Tree

1 Introduction

From the biological point of view, all living organisms are related by evolution. Thus, it is possible to find characteristic motifs and conserved regions in protein families and use them to determine evolutionary linkage. Currently the study of similarity in multiple protein sequences is an important challenge and is performed by Multiple Sequence Alignment (MSA) processes. MSA constitutes an extremely powerful tool for such important biological applications as phylogenetic analysis, identification of conserved motifs and domains, and structure prediction.

The main idea behind MSA is to put protein residues in the same column according to some selected criterion. This criterion can be structural, evolutionary, functional or sequence similarity. The first three criteria are based on biological meaning, but the fourth is not. Thus it is possible that the best-matching alignment does not exhibit


the best biological meaning. Thus, MSAs are computationally difficult to produce, and most formulations of the problem lead to NP-hard optimization problems [17].

In order to obtain the best alignment, the sequence similarity is typically defined as the sum of substitution matrix scores for each aligned pair of residues, minus some penalties for the gaps that can be added between them. This approach is generalized to the multiple sequence case by seeking an alignment that maximizes the sum of similarities for all pairs of sequences (the sum-of-pairs, or SP, score). Because of their relatively limited requirements in time and computing power compared to global optimization methods, heuristic methods are widely used in multiple sequence alignment programs [3, 11].

There are many MSA heuristics, but the most widely used are based on progressive alignment [4] and iterative methods [16]. Progressive alignment is based on the successive construction of pair-wise alignments. It starts with the two most closely related sequences and then adds in sequences in order of increasing distance. The order in which sequences are added is determined by a guide tree previously obtained from an initial pair-wise alignment. Progressive methods are very dependent on the initial alignment and thus more likely to perform well for closely related sequences, which may be seen as a limitation of the method (over 100 sequences). On the other hand, progressive methods provide a good compromise between the time spent and the quality of the resulting alignment. Two of the most representative progressive algorithms are ClustalW [14] and T-Coffee [10].

Iterative methods attempt to overcome the heavy dependence on the accuracy of the initial pair-wise alignments exhibited by progressive alignment. Iterative techniques optimize an objective function based on a selected alignment scoring method by producing an initial global alignment and then realigning sequence subsets. The realigned subsets are themselves aligned to produce the next iteration of the MSA. Although iterative methods generally give more accurate alignments than progressive methods, their major disadvantage is their high execution time. Relevant iterative algorithms are MUSCLE [2] and DiAlign [8].

In spite of the improvements in speed and accuracy introduced by these programs, the computational requirements for large-scale alignments (thousands of sequences) clearly exceed workstation performance. Therefore, there are parallel implementations based on the main heuristics, such as ClustalW-MPI [6], Parallel-TCoffee [18] or DialignP [13]. All of these improve on their original algorithms but exhibit scalability problems when the number of sequences increases, as they are constrained by the data dependencies that guide the alignment process.

This work presents a new algorithm, applied to the progressive alignment heuristics, in order to exploit the degree of parallelism in the final alignment process. This goal is achieved by building a guide tree that increases the number of available parallel alignments. In order to evaluate the results, the algorithm presented was implemented in Parallel-TCoffee, and the results show that our proposal decreases the execution time and maintains the accuracy of the alignment.

The remaining sections are organized as follows. In Section II, the T-Coffee algorithm and the implementation of Parallel-TCoffee are overviewed. Section III is devoted to the presentation of our proposed algorithm. In Section IV, experimentation is performed to evaluate the effectiveness of the proposed algorithm. Finally, Section V outlines the main conclusions and future work.

2 Multiple Sequence Alignment

2.1 T-Coffee

T-Coffee (TC) is a multiple sequence alignment method that combines the consistency-based scoring function COFFEE [9] with the progressive alignment algorithm. TC provides an improvement in accuracy compared to most methods based on a progressive strategy, in which errors made in the initial alignments cannot be rectified later as the rest of the sequences are added in. In contrast, TC introduces a library generated using a mixture of global and local pair-wise alignments in order to reduce the greediness and increase the accuracy. However, the introduction of these improvements has penalized TC in speed compared to the most commonly used alternatives. TC is divided into three main stages:

• Primary Library. The primary library contains a set of pair-wise alignments between all of the sequences to be aligned. By default, it is generated by combining the ten top-scoring non-intersecting local alignments constructed with the Lalign program and all the global pair-wise alignments obtained with the ClustalW method. In the library, each alignment is represented as a list of pair-wise residue matches, and a sequence identity weight is assigned to each pair of aligned residues in order to reflect the correctness of a constraint. This stage is the most time and memory consuming, limiting its applicability to no more than 100 sequences on a typical workstation.

• Extended Library. The extended library allows TC to reduce the errors made in the initial alignments. The extension of the library is a re-weighting process where the new weights, for a given pair of sequences, also depend on information from the other sequences in the set.

• Progressive Alignment strategy. To obtain the multiple sequence alignment, first of all, pair-wise alignments are made to produce a distance matrix between all the sequences. The distance matrix is a matrix of similarity values between the pairs of sequences used to generate a guide tree by the neighbor-joining method. The guide tree is a phylogenetic tree, whose order is followed by the progressive alignment strategy to obtain the multiple sequence alignment.

Progressive alignment consists of aligning the first two closest sequences using dynamic programming. This alignment uses the weights in the extended library above to align the residues in the two sequences. Then, the next two closest sequences are aligned, or a sequence is added to the existing alignment of the first two sequences, depending on the tree order. This process is repeated until all the sequences are aligned.


2.2 Parallel-TCoffee

Parallel-TCoffee (PTC) [18] is a parallel version of TC that makes it possible to produce alignments of many hundreds of sequences, which is far beyond the capability of the sequential version. PTC is implemented on version 3.79 of TC and supports most of the options provided by it.

The implementation of PTC uses a distributed master-worker architecture and a message-passing paradigm employing one-sided communication primitives [7]. Basically, PTC parallelizes the library generation and the progressive alignment, which are the two main and most difficult stages of TC, as well as the distance matrix computation.

• Distance Matrix Computation. In TC, the progressive alignment strategy is guided by a neighbor-joining tree. This is generated using some measure of sequence similarity expressed with a distance matrix. The computation of the distance matrix requires C(n, 2) = n(n-1)/2 sequence comparisons, and each comparison is a totally independent task. This is why PTC parallelizes it with a master-worker paradigm implementing a Guided Self Scheduling (GSS) method [12] to distribute the computations (tasks) between workers. Each worker computes its part of the distance matrix, calculates the time required and returns the results to the master.

• Library Generation. The library generation consists of three phases:

1. Generation of all pair-wise constraints: The method implemented by PTC to parallelize this part is similar to the distance matrix computation. PTC uses a modified GSS, where half of the total number of pair-wise alignments is distributed proportionally based on worker efficiency, and the other part is distributed using GSS. Finally, each worker stores the list of the corresponding constraints in its local memory instead of returning the results to the master. The efficiency of the workers is known from the time it took each processor to compute its part of the distance matrix.

2. Deletion, association and re-weighting of duplicated pair-wise constraints: Each host merges its duplicate constraints locally using the original TC method, and then PTC implements a parallel sort to group and merge all the repeated constraints that are found by different workers.

3. Transformation of the library into a three-dimensional look-up table: The library is turned into a three-dimensional look-up table, where the rows are indexed by sequences and the columns are indexed by residues. PTC implements the table using one-sided remote memory access (RMA) mechanisms. Each worker creates a read-only RMA window that exposes its part of the table, and all the workers share two indexing vectors to retrieve the address of any entry in the table. Finally, each worker also implements a cache, managed by a Least Recently Used policy, to store the most frequent requests to remote memory.

• Progressive Alignment. The computations of the progressive alignment stage follow a tree order, and its parallelization can be reduced to a Directed Acyclic Graph (DAG) scheduling problem. This is why this stage is the most difficult to parallelize.

PTC implements a strategy similar to the HLFET (Highest Level First with Estimated Times) algorithm [5]. It launches the graph nodes that have no precedence dependencies and allow the earliest start time, until all graph nodes (alignments) are computed.
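To make the scheduling idea concrete, the following Python sketch simulates an HLFET-style list scheduler on a guide tree with estimated task costs: a task becomes ready when both of its children are finished, and ready tasks are started in order of decreasing static level. The tree, the unit costs and the function name simulate_hlfet are hypothetical illustrations (the example tree is not the one of Figure 3), not PTC's actual implementation.

import heapq

def simulate_hlfet(children, costs, n_workers):
    """Simulate HLFET-style scheduling of progressive-alignment tasks.

    children maps an internal tree node to its two children; a child that is
    not itself a key of `children` is a leaf sequence and is always ready.
    costs maps each internal node to an estimated alignment time.
    Returns the simulated makespan on n_workers identical workers.
    """
    parent = {c: node for node, kids in children.items() for c in kids}

    def level(node):
        # Static level: cost of the node plus the costs on the path to the root.
        lv = costs[node]
        while node in parent:
            node = parent[node]
            lv += costs[node]
        return lv

    done = {c for kids in children.values() for c in kids if c not in children}
    ready = [(-level(n), n) for n, kids in children.items()
             if all(c in done for c in kids)]
    heapq.heapify(ready)                          # highest level first
    running = []                                  # (finish_time, node)
    clock = 0.0
    while ready or running:
        while ready and len(running) < n_workers:
            _, node = heapq.heappop(ready)        # start the highest-level ready task
            heapq.heappush(running, (clock + costs[node], node))
        clock, node = heapq.heappop(running)      # advance to the next task completion
        done.add(node)
        p = parent.get(node)
        if p is not None and all(c in done for c in children[p]):
            heapq.heappush(ready, (-level(p), p))
    return clock

# Usage: a small hypothetical 12-leaf tree (leaves 0..11) with unit task costs.
children = {'A': (0, 1), 'B': (2, 3), 'C': (4, 5), 'D': ('A', 6), 'E': ('D', 7),
            'F': ('E', 'B'), 'G': ('F', 8), 'H': ('G', 9), 'I': ('H', 'C'),
            'J': ('I', 10), 'K': ('J', 11)}
costs = {n: 1.0 for n in children}
print(simulate_hlfet(children, costs, n_workers=4))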

3 BP-TCoffee: Balanced Parallel-TCoffee

3.1 Problem analysis

To improve Parallel-TCoffee (PTC), we analyzed its scalability by running tests with varying numbers of processors. In this study, the data set PF00231 from the Pfam database was used [1]. It is made up of 554 sequences with a maximum length of 331 amino acids. The experiment was run on 16 to 120 processors.

Figure 1a shows that PTC generally improves the execution time as the number of processors increases. It can be observed in Figure 1b that the only stage that improves is the library generation, while the progressive alignment stage does not scale with the number of processors, which limits the overall scalability.

Figure 1: Parallel-TCoffee performance analysis. (a) Total execution time (seconds) versus number of processors; (b) Execution time of each stage (Initialization, Distance Matrix, Constraint List, Extended Library, Progressive Alignment) versus number of processors.

Given these results, it can be deduced that the optimization must be focused on the progressive alignment. This stage is driven by the guide tree, because it determines the order of the partial alignments. For this reason the guide tree is the key element that defines the dependences among parallel tasks. The guide tree generation method consists of three steps:

1. Search for the nearest pair of sequences: TC compares the similarity of each distance matrix column with the other columns and selects the two most similar columns.

2. Group this pair of sequences: The two closest sequences are grouped and linked by the same tree node in the guide tree. Each tree node represents the alignment between the sequences of its child nodes.


3. Replace the pair by the joining similarity: One of the grouped columns is deleted and the other column is filled with the new, recalculated similarity values. This recalculated column represents the joining similarity of the group of aligned sequences.

Figure 2 shows an example of these three steps. In step 1, the method applies a metric function that determines columns 4 and 5 as the two closest sequences. Then, in step 2, these two sequences are joined under a tree node. Finally, in step 3, row and column 5 are deleted so that this sequence is not taken into account in subsequent iterations, and row and column 4 are filled with a new value that represents the joining similarity of sequences 4 and 5. This method is repeated for (#sequences - 3) iterations, and the last three columns are linked directly to the root node.
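As an illustration of these three steps, the following Python sketch builds a guide tree from a symmetric similarity matrix. It takes the "most similar pair" to be the largest off-diagonal entry and recomputes the joined column as the average of the two merged columns; these two choices, and the nested-tuple tree representation, are simplifying assumptions rather than TC's exact rules.

import numpy as np

def build_guide_tree(sim):
    """Build a guide tree from a symmetric similarity matrix, following the
    three steps above: pick the most similar pair of columns, link them under
    a new tree node, and replace the pair by a joined column.
    Nodes are returned as nested tuples; leaves are integer sequence ids."""
    sim = sim.astype(float).copy()
    active = list(range(len(sim)))              # indices of live rows/columns
    node = {i: i for i in active}               # current subtree rooted at each column
    while len(active) > 3:
        best, pair = -np.inf, None
        for ai in range(len(active)):           # step 1: nearest pair of sequences
            for aj in range(ai + 1, len(active)):
                i, j = active[ai], active[aj]
                if sim[i, j] > best:
                    best, pair = sim[i, j], (i, j)
        i, j = pair
        node[i] = (node[i], node[j])            # step 2: group the pair under a node
        sim[i, :] = 0.5 * (sim[i, :] + sim[j, :])   # step 3: joined similarity (assumed rule)
        sim[:, i] = sim[i, :]
        active.remove(j)                        # delete the second row/column
    # The last three columns are linked directly to the root.
    return tuple(node[i] for i in active)

# Usage with a random symmetric similarity matrix for 8 sequences.
rng = np.random.default_rng(0)
m = rng.random((8, 8))
sim = (m + m.T) / 2.0
np.fill_diagonal(sim, 0.0)
print(build_guide_tree(sim))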

Figure 3 shows the guide tree generated by PTC. In the tree, the PT-nodes (internal nodes) define the progressive alignment tasks. The leaf nodes are the different sequences to align (12 in this example), and the tree represents the order in which the progressive partial alignments can be performed.

From the point of view of parallelism, only PT-nodes with all dependencies resolved (all children are leaves) can be executed as independent tasks. In the example, there are three initial tasks, the grey PT-nodes, which can be executed in parallel. This value defines the maximum number of tasks that can be launched in parallel. Another important parameter for modelling the parallelization of progressive alignment is the critical path (the longest path through the guide tree to obtain the final alignment). The critical path defines the number of sequential iterations that the alignment algorithm has to perform. The more sequential iterations, the lower the parallelism, the lower the performance and the higher the execution times. In the example, the maximum degree of parallelism is three parallel tasks and the length of the critical path is 7 iterations.
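Both quantities can be computed directly from a tree. The short Python sketch below does so for guide trees represented as nested tuples (leaves are sequence identifiers); the representation and the small example trees are illustrative assumptions, not PTC data structures.

def critical_path(tree):
    """Number of partial alignments on the longest dependency chain,
    i.e. the depth of the guide tree counted in internal nodes.
    Leaves are any non-tuple objects; internal nodes are tuples."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(critical_path(child) for child in tree)

def max_parallel_degree(tree):
    """Number of internal nodes whose children are all leaves, i.e. the
    alignments that can be launched in parallel at the first iteration."""
    if not isinstance(tree, tuple):
        return 0
    if all(not isinstance(child, tuple) for child in tree):
        return 1
    return sum(max_parallel_degree(child) for child in tree)

# Usage on small hypothetical trees with 8 leaves (0..7):
unbalanced = (((((((0, 1), 2), 3), 4), 5), 6), 7)
balanced = (((0, 1), (2, 3)), ((4, 5), (6, 7)))
print(critical_path(unbalanced), max_parallel_degree(unbalanced))   # 7 1
print(critical_path(balanced), max_parallel_degree(balanced))       # 3 4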

The guide tree generated by the PTC method is unbalanced. We have seen that most trees generated with PTC are unbalanced because only the similarity between sequences is taken into account to generate them. The problem of working with unbalanced trees during the progressive alignment stage is that there are too many precedence relations between the tree nodes, and this generates longer critical paths. Unbalanced trees also affect the degree of parallelism: the more unbalanced a tree is, the fewer tasks can be launched in parallel and, in consequence, the lower the degree of parallelism. In other words, with a low degree of parallelism, full performance is not obtained with the HLFET parallel strategy and many of the processors are not used.

Figure 2: PTC Guide Tree generation example

Figure 3: Guide tree generated with TC heuristic

3.2 BGT: Balancing Guide Tree

To solve the problems presented in Section 3.1, our proposal is to modify the tree generation method to take into account not only the similarity between sequences but also balancing features. The aim is to generate better balanced guide trees than the ones generated with the original TC heuristic, without losing alignment accuracy. Our main goal is to reduce the number of precedence relations, reducing the critical path and increasing the degree of parallelism.

A new heuristic called BGT, derived from the original, was implemented to maintain the similarity properties and to balance the guide tree. BGT tries to join the maximum number of pairs of sequences and locate them at the base of the tree in order to reduce the number of tree levels and thus reduce the critical path. In order to maintain the accuracy of the T-Coffee algorithm, our balancing heuristic is only applied if two sequences are sufficiently similar. The main changes with respect to the original heuristic are:

1. Calculate the similarity threshold: This is the average of the similarity values above the diagonal of the distance matrix (Figure 4, step 1). It is used to decide which method to use to group a pair of sequences: the BGT heuristic or the original PTC heuristic.

2. Search for and group the nearest pair of sequences: As in the original method, the nearest pair of sequences is grouped. The main difference is that if the intersection value between the pair of sequences in the distance matrix satisfies the constraint imposed by the similarity threshold, this pair of sequences is grouped and their respective columns and rows in the distance matrix are deleted (Figure 4, step 2). If not, they are grouped using the original PTC heuristic. This stage is repeated for (#sequences - 3) iterations, and the last three columns are grouped directly with the root node.


Figure 4: BGT Guide Tree generation example

3. Replace the pair by the joining similarity: At each iteration the joining similarity values of these columns are calculated as in the original method. The similarity values are replaced when one of these conditions is met: 1) all the rows and columns have been deleted, or 2) the intersection value between the pair of sequences fails to satisfy the constraint imposed by the similarity threshold.
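A minimal Python sketch of this balancing heuristic, reusing the nested-tuple representation of the earlier sketches, is given below. The threshold test, the column-averaging recombination and the final pairwise combination of the removed groups into the upper part of the tree are simplifying assumptions that stand in for the AllBalancing/LeafBalancing/RootBalancing variants described next; they are not the exact BGT implementation.

import numpy as np

def build_bgt_tree(sim):
    """Sketch of the BGT heuristic: pairs of sequences that are similar enough
    (above the average similarity of the matrix) are grouped and *both* of
    their columns are removed, pushing cherries towards the base of the tree;
    otherwise the pair is merged as in the original heuristic."""
    sim = sim.astype(float).copy()
    n = len(sim)
    threshold = np.mean(sim[np.triu_indices(n, k=1)])   # step 1: similarity threshold
    active = list(range(n))
    node = {i: i for i in active}
    finished = []                                        # cherries placed at the base
    while len(active) > 3:
        best, (i, j) = max(((sim[a, b], (a, b))
                            for x, a in enumerate(active)
                            for b in active[x + 1:]), key=lambda t: t[0])
        if best >= threshold:                            # step 2: balanced grouping
            finished.append((node[i], node[j]))
            active.remove(i)
            active.remove(j)
        else:                                            # fall back to PTC grouping
            node[i] = (node[i], node[j])                 # step 3: joined similarity
            sim[i, :] = 0.5 * (sim[i, :] + sim[j, :])
            sim[:, i] = sim[i, :]
            active.remove(j)
    roots = [node[i] for i in active] + finished
    while len(roots) > 1:                                # combine remaining subtrees
        roots = [tuple(roots[k:k + 2]) for k in range(0, len(roots), 2)]
        roots = [r[0] if len(r) == 1 else r for r in roots]
    return roots[0]

# Usage with a random symmetric similarity matrix for 10 sequences.
rng = np.random.default_rng(1)
m = rng.random((10, 10))
sim = (m + m.T) / 2.0
np.fill_diagonal(sim, 0.0)
tree = build_bgt_tree(sim)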

Finally, three different approaches were defined, allowing us to prioritize which parts of the tree to balance more or less, in which parts to maintain the similarity features, and in which parts to use one heuristic or the other. These approaches are:

• AllBalancing: If this option is selected, the BGT heuristic is used to generate the whole tree, in order to obtain a balanced tree that takes similarity features into account.

• LeafBalancing: If this option is selected, BGT is only used to generate a percentage n of the sequences of the tree, and the rest of the tree is generated with the original PTC heuristic. With this approach, the base of the tree is balanced taking similarity features into account, and the tree nodes near the root are unbalanced.

• RootBalancing: If this option is selected, BGT is only used to generate a percentage n of the sequences of the tree, and the rest of the tree is generated using an approach that only considers balancing features. With this approach, the base of the tree is balanced taking similarity features into account, and the tree nodes near the root are perfectly balanced.

Figure 5 shows the same tree as Figure 3, but in this case generated with the AllBalancing BGT. This tree is more balanced than the previous one. Comparing it with the tree in Figure 3, it can be noticed that the critical path is shorter and the degree of parallelism is greater. The critical path length is reduced from 7 to 4 iterations (a reduction of about 43%), and the degree of parallelism is increased from 3 to 5 tasks, 66% greater.

Figure 5: Guide tree generated with BGT heuristic

A tree balancing study, shown in Table 1, was done using the sequence sets PF00074, PF00200, PF00231, PF00349, PF01057 and PF08443 from the Pfam database. In this study, the critical path and the degree of parallelism of each tree are compared between the PTC guide tree generation heuristic, AllBalancing BGT (BGT-AB), LeafBalancing BGT with a percentage of 25 (BGT-LB25) and RootBalancing BGT with a percentage of 25 (BGT-RB25).

                    Critical Path length                Maximum Parallel Degree
SEQUENCES     PTC  BGT-AB  BGT-LB25  BGT-RB25     PTC  BGT-AB  BGT-LB25  BGT-RB25
PF00074        28      11        26         9     107     172       129       174
PF00200        29      17        28        10     173     295       218       297
PF00231        26      18        26        10     164     270       191       277
PF00349        23      19        23         9     143     252       183       257
PF01057        94      14        93         9      71     187       122       191
PF08443        68      29        62        10     215     365       278       374

Table 1: Critical Path and Parallel Degree study for the Pfam database

The results in Table 1 show that BGT-RB25 generates the guide trees with the shortest critical path and the greatest degree of parallelism. On average this approach reduces the critical path by 71.92% and increases the degree of parallelism by 91.57% compared with PTC. The extreme case appears with the data set PF01057, where BGT-RB25 reduces the critical path by 90.42% and increases the degree of parallelism by 196.01% with respect to PTC. In conclusion, the guide trees generated with the BP-TCoffee approaches are more balanced than the guide trees obtained with Parallel-TCoffee.


4 Experimentation

The two main goals of the experimentation are to confirm the improvement in execution time due to balanced guide trees and to validate that our approaches maintain the level of biological accuracy of the original T-Coffee (TC). The first study is presented in Section 4.1 and is based on a performance comparison between Parallel-TCoffee (PTC) and Balanced Parallel-TCoffee (BP-TCoffee). The biological quality of the alignments is analyzed in Section 4.2 and is based on the BAliBASE benchmark [15].

4.1 BP-TCoffee Performance

In this section, the performance and scalability of BP-TCoffee are analyzed and compared with PTC. The experiment was run on 8 to 100 processors of our cluster, where each node is an HP ProLiant DL145 G1 with two AMD Opteron processors (1.6 GHz/1 MB) and 1 GB of RAM.

The experiment was executed several times and the average value was calculated in order to test the performance improvements of BP-TCoffee and its approaches, BGT-AB, BGT-LB25 and BGT-RB25, in comparison with PTC. Figure 6a shows the comparison of total execution time using only the PF01057 data set. It can be seen that BGT-AB and BGT-RB25 improve on the PTC and BGT-LB25 execution times. With respect to PTC, the biggest improvement is obtained with the highest numbers of processors. With 100 processors, BGT-AB reduces the execution time of PTC by 68%. This means that the higher degree of parallelism of our approaches can take advantage of large parallel architectures.

Figure 6: PTC and BP-TCoffee comparison of total execution time (seconds) versus number of processors for PTC, BGT-AB, BGT-LB25 and BGT-RB25. (a) PF01057 data set; (b) Pfam database average.

It is remarkable that although BGT-RB25 achieves the shortest critical path and the highest degree of parallelism, it does not obtain the best execution time. This is because the resulting parallel tasks are computationally more complex, so the benefits of parallelism are reduced.

Figure 6b shows the average execution time over all the data sets used in the experiment. It shows similar behavior to Figure 6a. The total execution time using BGT-AB and BGT-RB25 is lower than the PTC and BGT-LB25 times, because the progressive alignment stage has been reduced. This can be seen in Figures 7a and 7b, which show the BGT-AB and PTC execution times for each stage. The results demonstrate that the BGT-AB and BGT-RB25 improvements come from the higher degree of parallelism obtained with the balanced tree.

Figure 7: Comparison of stage execution times on the PF01057 data set versus number of processors (Initialization, Distance Matrix, Constraint List, Extended Library, Progressive Alignment). (a) BGT-AB; (b) PTC.

4.2 BP-TCoffee Accuracy

This experiment evaluates the quality of the alignments generated with BGT and its approaches, and compares this quality with the latest version of TC and with PTC. To obtain this measure, the multiple alignment benchmark BAliBASE v3.0 was used [15].

BAliBASE is a multiple alignment benchmark that provides high quality, manually refined reference alignments based on 3D structural superpositions, used to identify the strong and weak points of alignment programs. The alignments provided by BAliBASE are organized into reference sets that are designed to represent real multiple alignment problems. The BAliBASE benchmark compares the reference alignment with the user alignment and calculates two different scores: the sum-of-pairs (SP) and the total column score (TCS). The SP score is used to determine the extent to which the program succeeds in aligning some of the sequences in an alignment; this score increases with the number of sequences correctly aligned. In contrast, the TCS is a binary column score that tests the ability of the program to align all the sequences correctly.
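For orientation, the following simplified Python sketch computes the two scores for toy alignments given as lists of gapped strings, following the usual definitions of sum-of-pairs and total-column scoring. BAliBASE's own bali_score tool additionally restricts the evaluation to annotated core regions and normalizes differently, so this is an approximation for illustration only.

def residue_pairs(alignment):
    """Return the set of aligned residue pairs and the list of columns, where
    each residue is identified by (sequence index, residue index)."""
    counters = [0] * len(alignment)
    pairs = set()
    columns = []
    for col in zip(*alignment):
        cells = []
        for seq_id, ch in enumerate(col):
            if ch != '-':
                cells.append((seq_id, counters[seq_id]))
                counters[seq_id] += 1
        columns.append(frozenset(cells))
        for a in range(len(cells)):
            for b in range(a + 1, len(cells)):
                pairs.add((cells[a], cells[b]))
    return pairs, columns

def sp_and_tc(reference, test):
    """Simplified SP and TC scores: fraction of reference residue pairs
    recovered by the test alignment, and fraction of gap-free reference
    columns reproduced exactly."""
    ref_pairs, ref_cols = residue_pairs(reference)
    test_pairs, test_cols = residue_pairs(test)
    sp = len(ref_pairs & test_pairs) / len(ref_pairs)
    full = [c for c in ref_cols if len(c) == len(reference)]   # gap-free columns
    tc = sum(c in test_cols for c in full) / len(full)
    return sp, tc

# Usage on a toy 3-sequence example.
reference = ["AC-GT", "ACAGT", "AC-GT"]
test      = ["ACG-T", "ACAGT", "AC-GT"]
print(sp_and_tc(reference, test))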

Table 2 shows the results of running the BAliBASE benchmark to obtain the strong and weak points of the alignments generated by T-Coffee, PTC, BGT-AB, BGT-LB25 and BGT-RB25. The average SP and TCS scores of each reference set for all methods, and the overall SP and TCS averages over all reference sets, are calculated. To compare the alignment quality of the BGT approaches and PTC with the latest version of T-Coffee, all experiments were executed with this latest version using the guide trees obtained by BGT and PTC.

The table shows that the quality of the alignments generated with the BGT approaches is maintained in comparison with PTC and the latest version of TC. The accuracy differences between TC and BGT-AB lie in the interval [−0.002, +0.008], and similar results are obtained when comparing with PTC. We conclude that our heuristic does not diminish the accuracy.


Reference       Score   TC      PTC     BGT-AB  BGT-LB25  BGT-RB25
Ref1            SP      0.764   0.765   0.764   0.765     0.764
                TCS     0.579   0.583   0.581   0.584     0.581
Ref2            SP      0.878   0.878   0.876   0.878     0.874
                TCS     0.363   0.369   0.366   0.370     0.365
Ref3            SP      0.785   0.787   0.784   0.789     0.782
                TCS     0.392   0.406   0.400   0.406     0.380
Ref4            SP      0.802   0.802   0.806   0.806     0.803
                TCS     0.420   0.419   0.430   0.430     0.426
Ref5            SP      0.787   0.788   0.790   0.788     0.787
                TCS     0.423   0.419   0.429   0.421     0.424
Ref6            SP      0.806   0.811   0.809   0.811     0.807
                TCS     0.419   0.424   0.417   0.418     0.424
Ref7            SP      0.805   0.810   0.811   0.809     0.797
                TCS     0.359   0.364   0.365   0.364     0.354
Ref8            SP      0.701   0.701   0.701   0.700     0.700
                TCS     0.179   0.181   0.180   0.181     0.176
Ref9            SP      0.741   0.743   0.741   0.745     0.741
                TCS     0.484   0.487   0.484   0.487     0.486
Total           SP      0.785   0.787   0.787   0.788     0.784
                TCS     0.402   0.406   0.406   0.407     0.402

Table 2: BAliBASE accuracy analysis

5 Conclusions

Parallel approaches to Multiple Sequence Alignment (MSA) are constrained by the bottleneck that the progressive alignment step represents. This bottleneck is generated by the strong dependencies between the different iterations of this algorithm (the critical path) and by the poor degree of parallelism derived from the progressive alignment guide tree.

We have proposed and evaluated a new Balanced Guide Tree (BGT) heuristic for the construction of the guide tree that achieves a more balanced tree, resulting in a significant reduction of the critical path and an important increase in the number of tasks that can be executed in parallel. As a result, Balanced Parallel-TCoffee (BP-TCoffee) can take advantage of large high-performance computing infrastructures to reduce the execution time of MSA applications. The experimental results obtained on a cluster of 100 processors show that BP-TCoffee reduces the execution time of the previous Parallel-TCoffee by 68%. Moreover, this performance is achieved while maintaining the biological accuracy of the resulting alignment.

For future work, we are studying new parallel algorithms for MSA, capable of using thousands of processors efficiently. We are also analyzing how to reduce accuracy degradation when large numbers of sequences are aligned.

Acknowledgements

This work was supported by MEC-Spain under contract TIN2008-05913 and Consolider CSD2007-00050.

References

[1] A. Bateman, L. Coin, R. Durbin, R.D. Finn, V. Hollich, S. Griffiths-Jones, A. Khanna, M. Marshall, S. Moxon, E.L.L. Sonnhammer, D.J. Studholme, C. Yeats, S.R. Eddy, The Pfam protein families database, Nucl. Acids Res. 32 (2004) 138–141.

[2] R.C. Edgar, MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Res. 32(5) (2004) 1792–1797.

[3] R.C. Edgar, S. Batzoglou, Multiple sequence alignment, Curr. Opin. Struct. Biol. 16(3) (2006) 368–373.

[4] D.F. Feng, R.F. Doolittle, Progressive sequence alignment as a prerequisite to correct phylogenetic trees, J. Mol. Evol. 25(4) (1987) 351–360.

[5] Y.K. Kwok, I. Ahmad, Benchmarking and comparison of the task graph scheduling algorithms, J. Par. Dist. Comp. 59(3) (1999) 381–422.

[6] K.-B. Li, ClustalW-MPI: ClustalW analysis using distributed and parallel computing, Bioinformatics 19 (2003) 1585–1586.

[7] MPI-2: Message-Passing Interface Standard version 2. http://www.open-mpi.org/

[8] B. Morgenstern, DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment, Bioinformatics 15(3) (1999) 211–218.

[9] C. Notredame, L. Holm, D.G. Higgins, COFFEE: an objective function for multiple sequence alignments, Bioinformatics 14(5) (1998) 407–422.

[10] C. Notredame, D.G. Higgins, J. Heringa, T-Coffee: A Novel Method for Fast, Accurate Multiple Sequence Alignment, J. Mol. Biol. 302(1) (2000) 205–217.

[11] C. Notredame, Recent evolutions of multiple sequence alignment algorithms, PLoS Comput. Biol. 3(8) (2007) 1405–1408.

[12] C. Polychronopoulos, D. Kuck, Guided self-scheduling: A practical scheduling scheme for parallel supercomputers, IEEE Trans. on Computers 36(12) (1987) 1425–1439.

[13] M. Schmollinger, K. Nieselt, M. Kaufmann, B. Morgenstern, DIALIGN P: fast pairwise and multiple sequence alignment using parallel processors, BMC Bioinformatics 5(1):128 (2004).

[14] J.D. Thompson, D.G. Higgins, T.J. Gibson, ClustalW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Res. 22(22) (1994) 4673–4680.

[15] J.D. Thompson, P. Koehl, R. Ripp, O. Poch, BAliBASE 3.0: Latest Developments of the Multiple Sequence Alignment Benchmark, Proteins 61(1) (2005) 127–136.

[16] I. Wallace, O. O'Sullivan, D.G. Higgins, Evaluation of iterative alignment algorithms for multiple alignment, Bioinformatics 21(8) (2005) 1408–1414.

[17] L. Wang, T. Jiang, On the complexity of multiple sequence alignment, J. Comput. Biol. 1 (1994) 337–348.

[18] J. Zola, X. Yang, S. Rospondek, S. Aluru, Parallel T-Coffee: A Parallel Multiple Sequence Aligner, In Proc. of ISCA PDCS-2007 (2007) 248–253.


Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

FPGA Cluster Accelerated Boolean Synthesis

Cesar Pedraza1, Javier Castillo1, Pablo Huerta1, Javier Cano1, Jose L. Bosque2 and Jose I. Martínez1

1 Departamento de ATC y CCIA, Universidad Rey Juan Carlos

2 Departamento de Eletronica y Computadores, Universidad de Cantabria

emails: [email protected], [email protected],[email protected], [email protected],

[email protected], [email protected]

Abstract

One important issue in the evolvable hardware area is speeding up combinational circuit synthesis. This can mainly be achieved by using genetic programming techniques combined with implementations on parallel systems. This paper presents a parallel genetic program for combinational circuit synthesis implemented specifically for an FPGA cluster. This implementation accelerates the computation of the fitness function by a factor of 5000. The experiments test the algorithm and the cluster architecture performance in comparison with the Altamira HPC cluster. Results show that the FPGA cluster architecture delivers high performance for combinational circuit synthesis with more than eight variables. Being able to tune some parameters can greatly help the algorithms to find new optimized implementations (whether by area or by speed).

Key words: Cluster, FPGA, Boolean Synthesis, Hardware accelerated

1 Introduction

Evolvable hardware (EH) is a piece of hardware that can change its internal structure in order to adapt itself to new conditions or to a special environment [8],[7]. Two important components of this kind of system are an evolvable algorithm and a reconfigurable system [17]. The evolvable algorithm is responsible for establishing the way the hardware should change, and it is often based on bio-inspired methods such as genetic algorithms and genetic programming, among others [5],[2]. The reconfigurable system is an architecture that changes its internal hardware by changing the configuration bitstream of the FPGA based on the stream obtained with the evolvable algorithm. EH can be divided into two categories depending on the way it is implemented: extrinsic EH [10] or intrinsic EH [19]. An extrinsic EH system uses software for the evolution process and only the best candidate is implemented in hardware; its implementation is straightforward but has large processing times depending on the complexity of the evolvable algorithm. In intrinsic EH systems, on the other hand, every candidate generated during the evolution step is evaluated in hardware, obtaining better processing times but increasing the complexity of the design process and consuming more hardware resources. EH is mainly used for adaptive systems, data compression, control systems and digital circuit design [17]. One important issue in digital circuit design is the synthesis of combinational circuits, where the goal is to get the best possible digital hardware implementation of a function or truth table for a programmable device or ASIC [15],[12],[3],[9]. This paper presents an intrinsic EH system implemented on an FPGA cluster for solving the combinational on-chip synthesis problem using parallel genetic programming as the evolvable algorithm. Special emphasis is put on showing the performance of the architecture and on adapting the algorithm. Section 2 describes the combinational on-chip synthesis problem. Section 3 connects concepts of parallel genetic programming with digital hardware synthesis. Section 4 discusses parallel genetic programming and Section 5 presents the FPGA cluster architecture developed. Finally, Sections 6 to 8 describe the experiments, the results, and the conclusions and future work.

2 Combinational synthesis onchip

Combinational synthesis is a design-flow process that optimizes and reduces the logic gate nets of a circuit in order to minimize cost and chip area and to increase performance. There are many techniques available for combinational synthesis, such as Karnaugh maps, the Quine-McCluskey algorithm, the Reed-Muller algorithm and also other heuristic methods. In general terms, these algorithms have disadvantages such as exponential growth, lack of restrictions management, and multiple solutions for the same problem. On the other hand, genetic algorithms and genetic programming produce new structures for the implementation of combinational blocks that cannot be obtained with the traditional methods mentioned earlier [3],[9],[16], and restrictions such as delay, area, etc. can be added to the algorithm to obtain good results for each problem. When implemented on-chip, the main problem of combinational synthesis is the great computational overhead of evolving the circuit in the design space [18], especially in embedded systems. There are some intrinsic implementations [19] of combinational synthesis using genetic algorithms (GA) and genetic programming (GP), but with a very limited number of variables [5] and mainly oriented to obtaining a few basic structures. There are also efforts to improve the management of restrictions and to increase the number of variables with GA [7] or GP [3], mainly suggesting new hardware architectures and representations of the chromosome.


Figure 1: Evolutionary system

3 Genetic programming and the boolean synthesis

Figure 1 shows the main components of an EH system. First, there is a target function that determines the hardware requirements. The fitness function calculation unit and the genetic unit are responsible for running the evolutionary process that finds a configuration fulfilling the requirements. This configuration is later transferred to the hardware reconfiguration unit, which elaborates the new bitstream for the reconfigurable area.

In the evolutionary process, the hardware is represented with a chromosome and managed with the Darwinian concept of natural selection [12]. The chromosomes mutate and cross with other chromosomes, creating a population of individuals to evolve. As in nature, a population of individuals is generated, the fitness function evaluates which ones are suitable for accomplishing the target function requirements and, later, a selection process excludes some members whilst the rest mutate and are crossed, creating the next generation. This process is repeated until a population that satisfies the requirements and restrictions of the target function is obtained. In hardware terms these restrictions usually are 1) obtaining the appropriate input-output behavior and 2) the minimum number of logic gates. Other restrictions that can be added are propagation delays, the types of logic gates available, etc. For hardware synthesis, some modifications to the evolvable algorithm have to be made, using a variation of the simple genetic algorithm (SGA) known as genetic programming (GP). GP [12],[16] is a technique based on genetic algorithms that does not distinguish between the search space and the representation space of the chromosome, and is able to modify the chromosome length and to create new mutation and crossover operators. The programs being evolved can be represented linearly or by trees (the tree option is the most interesting one for combinational synthesis).

3.1 Chromosome representation

Finding the most appropriate codification for a hardware structure is one of the most important issues in the design process of an EH system. The codification is how a logic circuit is represented using a bit array so that it can be managed in the evolution process [4].


Figure 2: Cell structure and its representation inside the chromosome.

There are many conditions that should be fulfilled by a good genetic representation [20]: 1) it must be able to represent all the different solutions of the problem, 2) the crossover and mutation operators should not generate unreal individuals, and 3) it must cover all the solution space so that the search is really random. There are many different ways of representing combinational hardware for a genetic algorithm. Both Aguirre [6] and Koza [12] use a 2-D tree-based representation for evolving the hardware using genetic programming. Higuchi [8] proposes a PLD-like structure for evolving logic circuits, whilst Miller [14] suggests a Cartesian representation using a list of integer numbers that are directly mapped to graphs. The 2-D tree representation is appropriate for implementing parallel systems because it allows splitting the chromosomes to balance the computational load [20]. For simplicity, a basic tree structure that allows representing a single boolean function, or up to three functions with four input variables coded in binary, has been selected. Figure 2 shows the cell structure and the way it is coded inside the chromosome.

It is worth mentioning that the chromosome length has to be variable because the length of the solution to the synthesis problem is unknown. Alander [1] argues, with empirical studies, that a population sized between l and 2l is enough for getting the solution to the majority of evolvable problems, where l is the size of the chromosome. This means that if the chromosome length is variable, the size of the population is probably also going to be variable.

3.2 Fitness function.

Finding the appropriate fitness function is another difficult problem in designing evolvable systems, because it is responsible for quantifying whether a chromosome or individual meets the requirements or not. Three parameters have been established for the fitness function: 1) the number of coincidences of the individual X with the target functions Y for all the possible combinations at the output, 2) the number of logic gates inside the individual, and 3) the number of logic circuit levels, which determines the maximum propagation delay of the circuit.


\[
\mathrm{fitness} = \omega_1 \left[\sum_{j=1}^{m}\sum_{i=1}^{n} Y(j,i) - X(j,i)\right] + \omega_2\, P(x) + \omega_3\, l(x) \tag{1}
\]

Equation (1) shows the fitness function for the evolvable system. The constants ω1, ω2 and ω3 are used to establish the weights of each of the parameters that determine the fitness function. The P(x) function is used to calculate the number of logic gates of a chromosome, taking into account that some of the introns, or segments of the genotype string, have no associated function and do not contribute to the result of the logic circuit they represent. The function l(x) is used to determine the number of levels of the circuit, or in other words the number of gates that the critical path crosses. The constant m is the number of outputs of the circuit and n the number of possible input combinations.
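A rough software sketch of Equation (1) follows; it is not the FCU hardware, and the evaluate, count_gates and count_levels callables as well as the default weights are assumptions made only for this illustration.

```python
def fitness(chromosome, target_outputs, evaluate, count_gates, count_levels,
            w1=1.0, w2=0.1, w3=0.1):
    """target_outputs[j][i]: expected value of output j for input combination i
    (m outputs, n combinations); evaluate(chromosome, j, i): value produced by
    the candidate circuit. Returns the weighted sum of Equation (1)."""
    mismatch = 0
    for j, outputs in enumerate(target_outputs):
        for i, y in enumerate(outputs):
            mismatch += abs(y - evaluate(chromosome, j, i))   # output coincidence term
    return (w1 * mismatch
            + w2 * count_gates(chromosome)     # P(x): gate count, introns excluded
            + w3 * count_levels(chromosome))   # l(x): logic depth (critical path)
```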

3.3 Genetic operators.

Selection operator

This operator is responsible for identifying the best individuals of the population, taking into account exploitation and exploration [20]. The former allows the individuals with better fitness to survive and reproduce more often; the latter means searching in more areas, making it possible to find better results. The Boltzmann selection method is used to control the equilibrium between exploitation and exploration during the execution of the genetic algorithm by using a temperature factor (T), in a similar way to the simulated annealing method [11]. Equation (2) describes the expected value of an individual as a function of T and the fitness value for a specific iteration or time stamp t.

\[
E_{i,t} = \frac{e^{f(i,t)/T}}{\left\langle e^{f(i,t)/T}\right\rangle_{t}} \tag{2}
\]

In this way, an individual is chosen a number of times that matches the integer part of the expected value calculated in equation (2).
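A minimal sketch of this selection rule, assuming that the fractional part of the expected value is turned into one extra copy with the corresponding probability (a common convention not spelled out in the paper):

```python
import math
import random

def boltzmann_select(population, fitnesses, temperature):
    """Expected copies per individual follow Equation (2): exp(f/T) divided by
    the population average of exp(f/T); copies are then drawn from that value."""
    weights = [math.exp(f / temperature) for f in fitnesses]
    mean_w = sum(weights) / len(weights)
    selected = []
    for individual, w in zip(population, weights):
        expected = w / mean_w
        copies = int(expected) + (random.random() < expected - int(expected))
        selected.extend([individual] * copies)
    return selected
```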

Mutation

This operator modifies the chromosome randomly in order to increase the search space. It can change: 1) an operator or a variable, and 2) a segment of the chromosome. Both are applied randomly and with a certain probability. A mutation probability that varies during the execution of the algorithm (evolvable mutation) [13] is more effective for evolvable systems.

Crossing

This operator combines two selected individuals to obtain two additional individuals to add to the population. A crossover scheme with one or two crossing points, randomly selected, has been implemented because it is more efficient for evolvable systems [15].


4 Parallel genetic programming

One of the advantages of evolvable algorithms is their intrinsic parallelism; therefore performance can be increased by using more microprocessors, memories and communication systems for data exchange. A parallel genetic program (PGP) can be parallelized in two ways without drastically disturbing its structure:

1. Parallelizing the fitness function computation, and

2. Executing many evolvable algorithms at a time in different processors, which isequivalent to let many populations evolve in parallel.

The performance of the parallel algorithm can be optimized [20] through three main factors:

1. The topology of the communication infrastructure (ring, completely meshed, master-slave),

2. The ratio of data exchange (the number of the best individuals to exchange increases the probability of finding a better solution).

3. The migration frequency, because how frequently the individuals are exchanged between the nodes of a parallel system can be a factor that helps to find better solutions in less time. (A minimal sketch of the second, island-style scheme with periodic migration is given right after this list.)
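The sketch below illustrates the second scheme (one population evolving per node) with periodic migration of the best individuals over a ring topology; mpi4py, the ring layout and the evolve_one_generation / best_individuals hooks are assumptions made for this example, whereas the SMILE implementation described later coordinates migration through a master node.

```python
from mpi4py import MPI

def run_island(population, evolve_one_generation, best_individuals,
               generations=1000, migration_interval=50, migrants=2):
    """Evolve one population (a list) per MPI process; every migration_interval
    generations, send the best individuals to the next node in the ring and
    replace the tail of the local population with the ones received."""
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    for generation in range(generations):
        population = evolve_one_generation(population)
        if size > 1 and generation % migration_interval == 0:
            outgoing = best_individuals(population, migrants)
            incoming = comm.sendrecv(outgoing, dest=(rank + 1) % size,
                                     source=(rank - 1) % size)
            if incoming:
                population[-len(incoming):] = incoming
    return population
```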

5 Cluster architecture

In previous work [22], the authors implemented an ad-hoc FPGA cluster called SMILE to accelerate highly demanding computational algorithms. For this new proposal the SMILE cluster has been updated to use the new Xilinx ML507 boards instead of the old Virtex-II Pro ones. This new board includes an XC5VFX70T FPGA with two PowerPC 440 microprocessors. It also has 256 MB of DDR2 memory, SystemACE and Gigabit Ethernet connectivity. The custom hardware can be connected to the PowerPC processor through the PLB (Processor Local Bus). The PowerPC processor executes a Linux 2.6 operating system with all the tools and drivers needed to set up and manage the system.

5.1 Evolutionary Algorithm Acceleration.

The implemented evolutionary algorithm is very demanding from a computational point of view, and therefore calls for hardware acceleration on the FPGA fabric. A profiling of the algorithm determined that the most time-consuming parts were the fitness function calculation and the generation of new individuals (25% and 35% of the execution time, respectively). Therefore, these two have been specifically accelerated with a coprocessor connected to the PowerPC processor. The fitness function is divided into three parts, as shown in equation (1): the minterm coincidence calculation, the number of gates


Figure 3: FCU block diagram.

and the number of logic levels (critical path). The FCU (Fitness Calculation Unit) is the element which calculates the three parameters using the objective function, the chromosome and the number of variables as inputs. This coprocessor is connected to the PowerPC 440 processor through the PLB bus using a custom interface. The interface allows register-based and DMA communication to transfer the objective function and the chromosome in an efficient way. Figure 3 shows the FCU structure. Once the chromosome has been read, each of its basic cells is converted to its LUT (Look-Up Table) equivalent through a ROM-based translation. The next block computes the minterm value (the number of hits of that individual) using the information from the objective function and a counter as inputs. The calculation of the number of gates and the number of logic levels is done in the corresponding blocks and then sent to the fitness calculation block in order to calculate the final fitness value, which is sent back to the PowerPC processor. Finally, in order to accelerate the generation of new individuals, a Mersenne-Twister-based pseudo-random number generator was added, accessible through the registers of the PLB interface, accelerating the generation of new individuals and the mutation process.

6 Experiments

To determine the system performance, a three-bit comparator has been implemented in two different versions. The first one parallelizes the fitness function evaluation; the second one executes different algorithms in parallel on 16 SMILE nodes. Figure 4a shows the first implementation, where the algorithm is executed in one node and the fitness evaluation is distributed among the nodes. The Evolutionary Engine is implemented in the master node. This node is responsible for generating the population and for the mutation and crossover operations.

In the second implementation the EE is executed in all the cluster nodes, as shown in figure 4b. The master node sends the objective function and receives the partial solutions. With this information the master node migrates the selected individuals to find better solutions. Several tests have been carried out with different populations and different migration rates. The main goals are to calculate the cluster speed-up and to try different numbers of variables (8 and 12) in order to compare


Figure 4: (a) Case 1: fitness function distributed on the cluster nodes; (b) Case 2: algorithm running in parallel on all the nodes.

with traditional implementations.

7 Results

7.1 FCU unit performance.

Figure 5 shows the evaluation time on SMILE for different populations with 15-logic-gate chromosomes and different numbers of variables. A significant growth in the evaluation time can be seen for executions with more than 12 variables, due to the exponential growth of the number of input combinations. These times have been compared with an HPC cluster of the University of Cantabria, called Altamira. This cluster is made up of 18 eServer BladeCenters, with 256 JS20 nodes (512 processors), linked together using a Myrinet network with 1 Gbps of bandwidth. Each of the nodes features the following configuration: 2 IBM PowerPC 970TX 2.2 GHz processors and 4 GB of main memory. The operating system is 64-bit SuSE Linux 9 Enterprise, with a 2.6.5 kernel, using the IBM General Parallel File System (GPFS). Figure 5 presents the response times of the parallel implementation on both architectures. It is worth mentioning the increase of the evaluation time for 8-bit problems on a PC when compared with 4-bit problems and with the cluster node evaluation time.

The FPGA resources used by the FCU are 448 slices out of 13,696, and the maximum operating frequency is 112 MHz, which suits the 100 MHz PLB system clock.

7.2 Speed-up and cluster performance.

The speed-up was computed for SMILE with respect to the Altamira HPC cluster, for 16 nodes with 4, 8 and 12 variables. As shown in figure 6, the SMILE cluster performance is about 20 times higher than that of the Altamira HPC cluster for 4-variable problems, 400 times higher for 8-variable problems and 5000 times higher for 12-variable problems. A slight decrease in performance is observed when the population is increased, because the processing times of the functions implemented in software become significant compared with the


Figure 5: Fitness calculation time for 4 and 8 variables on (a) an Altamira HPC cluster node and (b) the SMILE cluster.

Altamira cluster. Other experiments demonstrated that the speed-up for case 1 is almost 10 times lower than for case 2, due to the processing times of communications, crossover, individual generation, and other processes not parallelized in this case.

Tests with 4- and 6-bit functions were carried out varying the constants ω2 and ω3 of the fitness function, to obtain synthesis results optimized either for the number of gates or for the number of gate levels. Table 1 shows the results of these different efforts for the 2-, 3- and 4-bit comparators.

Effort          2-bit comparator     3-bit comparator     4-bit comparator
                Gates   Max level    Gates   Max level    Gates   Max level
Gates           10      3            20      4            97      4
Critical path   15      2            78      3            220     3

Table 1: Synthesis results for 2, 3 and 4-bit comparator (4, 6 and 8 bit problems).

8 Conclusions

This paper has presented an alternative for combinational synthesis based on parallel genetic programming. An evolutionary algorithm was implemented on a Virtex-5 FPGA cluster and its performance was compared with the Altamira HPC cluster. In order to accelerate the process, a coprocessor was implemented to calculate the fitness function and to generate random numbers, improving the performance when compared with a PC implementation for problems with more than 6 bits. The success of the FCU was due to the use of equivalent LUTs for each combination of functions in the basic cell. LUT implementations on conventional memory increase performance compared with other, slower approaches (PLD-oriented implementations or modifying the FPGA configuration memory by dynamic partial reconfiguration). The tests have proven that the algorithm is more effective for 4-bit or 8-bit problems (fewer variables mean processing times


Figure 6: Speed-up and execution time for 16 nodes with 4, 8 and 12 variables.

similar to communication times, therefore decreasing the performance). There have been some convergence problems for 12-bit functions because the search space is too large. This could be addressed in future work with improvements in terms of multiple FCUs inside an FPGA, more nodes, and other hardware-accelerated genetic operators. The LUT-oriented solutions of the algorithm can be directly mapped onto the Virtex-5 through partial reconfiguration techniques via the ICAP port, allowing the cluster to change itself. At present we are working on an evolvable cluster capable of accelerating some processes at any time; in the near future, results on the testing of fault-tolerant algorithms on SMILE will be shown.

9 Acknowledgments

This work has been partially funded by the Spanish Ministry of Education and Science (grants TIN2007-68023-C02-01 and Consolider CSD2007-00050), as well as by the HiPEAC European Network of Excellence.

References

[1] J. Alander, On optimal population size of genetic algorithms, CompEuro'92, Computer Systems and Software Engineering, Jan 1992.

[2] S. Chang, H. Hou, and Y. Su, Automated synthesis of passive filter circuits including parasitic effects by genetic programming, Microelectronics Journal, Jan 2006.

[3] S. Cheang, K. Lee, and K. Leung, Applying genetic parallel programming to synthesize combinational logic circuits, IEEE Transactions on Evolutionary Computation, 11(4):503–520, Aug 2007.

[4] F. Rothlauf, Representations for genetic and evolutionary algorithms, Springer, Jan 2006.

[5] D. Goldberg and J. Holland, Genetic algorithms and machine learning, Machine Learning, Jan 1988.

[6] A. Hernandez-Aguirre, C. Coello, and B. Buckles, A genetic programming approach to logic function synthesis by means of multiplexers, Evolvable Hardware, Jan 1999.

[7] T. Higuchi and B. Manderick, Hardware realizations of evolutionary algorithms, Evolutionary Computation 1, Jan 2000.

[8] T. Higuchi, T. Niwa, T. Tanaka, H. Iba, and H. de Garis, Evolving hardware with genetic learning: a first step towards building a Darwin machine, Proceedings of the second international conference on From . . . , Jan 1993.

[9] L. Jozwiak, N. Ederveen, and A. Postula, Solving synthesis problems with genetic algorithms, Euromicro Conference, Jan 1998.

[10] T. Kalganova, An extrinsic function-level evolvable hardware approach, Lecture Notes in Computer Science, Jan 2000.

[11] S. Kirkpatrick, C. Gelatt, and M. Vecchi, Optimization by simulated annealing, Science, Jan 1983.

[12] J. Koza, F. Bennett, D. Andre, and M. Keane, Genetic programming III: Darwinian invention and problem solving [book review], Evolutionary Computation, Jan 1999.

[13] R. Krohling, Y. Zhou, and A. Tyrrell, Evolving FPGA-based robot controllers using an evolutionary algorithm, 1st International Conference on Artificial Immune Systems, Jan 2002.

[14] J. Miller and S. Harding, Cartesian genetic programming, GECCO '08: Proceedings of the 2008 GECCO conference companion on Genetic and evolutionary computation, pages 2701–2726, Jan 2008.

[15] J. Miller and P. Thomson, Aspects of digital evolution: Evolvability and architecture, Lecture Notes in Computer Science, Jan 1998.

[16] N. Nedjah and A. Macedo, Genetic systems programming, Springer, Jan 2006.

[17] L. Sekanina, Evolvable components: from theory to hardware implementations, Natural Computing Series, Jan 2000.

[18] A. Stoica, Evolvable hardware for autonomous systems, CEC Tutorial, Jan 2004.

[19] A. Thompson, An evolved circuit, intrinsic in silicon, entwined with physics, Lecture Notes in Computer Science, Jan 1997.

[20] Q. Yu, C. Chen, and C. Pan, Parallel genetic algorithms on programmable graphics hardware, Lecture Notes in Computer Science, Jan 2006.


Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009. 30 June, 1–3 July 2009.

Petri Nets as Discrete Dynamical Systems

Pelayo, Maria L.1, Pelayo, Fernando L.1, Guirao, Juan L. G.2 and Valverde, Jose C.1

1 Department of Computing Systems, University of Castilla - La Mancha

2 Department of Applied Mathematics, Polytechnic University of Cartagena

emails: [email protected], [email protected], [email protected],[email protected]

Abstract

This paper presents the very first step in the line of applying Discrete Dynamical Systems (DDSs) theory to the analysis of Concurrent Computing Systems. Petri Nets (PNs) are properly encoded into DDSs, thereby defining the corresponding phase space and the associated continuous map.

Key words: Formal Computer Science, Petri Net, Applied Mathematics, Dis-crete Dynamical System

1 Introduction

Formal models are used to describe and analyse the behaviour of computer systems. Among these models we have process algebras, event structures, Markov chains, Petri nets and some others. Software designers work happily with process algebras, since they have a syntax very similar to programming languages, but they are not able, in general, to capture true concurrency, and even formal verification is somewhat harder than it is in other formalisms such as Petri nets.

Petri nets were first conceived by Carl Petri [1]. They predate traditional process algebra in their ability to model concurrent systems.

Petri nets are widely used for the modelling and analysis of concurrent systems, because of their graphical nature and the solid mathematical foundations supporting them. Furthermore, one of the main advantages of Petri nets is that they easily capture true concurrency, i.e., they are able to model the simultaneous execution of actions in the system.

It is also well known that many scientists and technicians try to find out the future and the past states of a process whose present state they are observing. The fact is that the future and past states of many biological, ecological, physical or even computer processes can be predicted if their present states and the laws governing their evolution are known, provided these laws do not change in time.

A dynamical system is the mathematical formalization of a deterministic process, created to deal with this kind of challenge. Thus, in this formalization we have to include the elementary parts cited before, i.e., the set of all possible states and the evolution law in time.

2 Petri Nets into Discrete Dynamical Systems

A Petri net (PN) is a graph which has two kinds of nodes: places and transitions. Places are usually related to conditions or states, whereas transitions are associated with events or actions, which cause the changes of state in a system. The arcs in the net represent the conditions that must be fulfilled for executing an action (firing a transition), and the new conditions or states obtained after firing that transition. The behaviour of a PN can easily be represented by means of a linear equation, the so-called state equation, which is the most formal tool for the analysis of PNs.

Petri net: An Ordinary Petri Net (OPN) is a triple N = (P, T, F) consisting of two sets P and T, and a relation F defined over P ∪ T, such that:

1. P ∩ T = ∅

2. F ⊆ (P × T ) ∪ (T × P )

3. dom(F ) ∪ cod(F ) = P ∪ T

P is the set of places, T is the set of transitions and F is the flow relation. F relates places and transitions by the arcs connecting them. Petri nets can be represented by graphs with two kinds of nodes (places and transitions). Places are represented by circles and transitions by rectangles.

Let X be the set X = P ∪ T; then, for every x ∈ X, two sets are defined:

• •x = {y ∈ X | (y, x) ∈ F} (precondition set of x),

• x• = {y ∈ X | (x, y) ∈ F} (postcondition set of x).

Example: Let N = (P, T, F) be an ordinary Petri net, where:

P = {p1, p2, p3}, T = {t1, t2}, F = {(p1, t1), (p2, t1), (t1, p3), (p3, t2)}.

This Petri net is graphically represented in fig. 1. The state of a system described by a PN is captured by means of the so-called markings, which are defined as follows.

Markings of ordinary Petri nets: Let N = (P, T, F) be an ordinary Petri net. A function M : P → IN is a marking of N. Thus, (P, T, F, M) is a marked ordinary Petri net (MOPN).



Figure 1: Example of Petri net

Markings of Petri nets are graphically represented by including in each place as many dots as tokens.

Example: The Petri net of fig. 1 can be marked as shown in fig. 2:

M(p1) = 1, M(p2) = 1, M(p3) = 0


Figure 2: Example of marked Petri net

This Marked Ordinary Petri Net will be codified in our particular DDS as (1,1,0).

Given a MOPN (P, T, F, M) with P = {p1, . . . , pn}, a marking M of it which has tokens in m places pi1, . . . , pim, with m ≤ n, will be codified by a binary n-tuple containing 1's in the positions i1, . . . , im, while the remaining n − m positions contain 0's.

The semantics of a Petri net is defined by the following firing rule, which establishes when a transition can be fired, and by the marking obtained after the firing.

Firing rule: Let N = (P, T, F, M) be a marked ordinary Petri net. A transition t ∈ T is enabled at marking M, denoted by M[t⟩, if for every place p ∈ P such that (p, t) ∈ F we have M(p) > 0.

An enabled transition t can be fired, thus producing a new marking M′: M′(p) = M(p) − Wf(p, t) + Wf(t, p) for all p ∈ P, where Wf(x) = 1 if x ∈ F and Wf(x) = 0 when x ∉ F, for all x ∈ (T × P) ∪ (P × T). This is denoted by M[t⟩M′.

We would like to note that, since a place can belong to the precondition sets of two different transitions, a token in it could potentially enable two transitions, and after firing one or the other two different markings can be reached. This fact has led us to consider as phase space not the set of binary n-tuples, but the set of all its subsets, in order to properly capture this case.

In the example of fig. 2, the firing of t1 generates the marking M′ given by: M′(p1) = 0, M′(p2) = 0, M′(p3) = 1.
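A small sketch of this firing rule, assuming the net is given by its arc set F and a marking by a dictionary from places to token counts (a representation chosen only for this example):

```python
def enabled(F, marking, t):
    """t is enabled iff every input place of t holds at least one token."""
    return all(marking[p] > 0 for (p, x) in F if x == t)

def fire(F, marking, t):
    """M'(p) = M(p) - Wf(p, t) + Wf(t, p) for every place p."""
    return {p: tokens - ((p, t) in F) + ((t, p) in F) for p, tokens in marking.items()}

# The net of fig. 1 with the marking of fig. 2: firing t1 yields (0, 0, 1).
F = {("p1", "t1"), ("p2", "t1"), ("t1", "p3"), ("p3", "t2")}
M = {"p1": 1, "p2": 1, "p3": 0}
assert enabled(F, M, "t1") and fire(F, M, "t1") == {"p1": 0, "p2": 0, "p3": 1}
```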

These definitions can be extended in order to consider the evolution obtained by executing an arbitrary number of transitions simultaneously.

Concurrent activation of transitions: Let N = (P, T, F, M) be a marked ordinary Petri net and let R ⊆ T be a set of transitions of N. We say that all transitions in R are enabled at marking M, denoted by M[R⟩, if and only if (iff) M(p) ≥ Σ_{t∈R} Wf(p, t) for all p ∈ P, where Wf(p, t) is defined as in the previous definition.

Moreover, we say that a multiset of transitions R is enabled at marking M iff M(p) ≥ Σ_{t∈T} Wf(p, t) · R(t) for all p ∈ P.

The firing of a multiset of transitions R at the marking M generates a new marking M′, defined by:

\[
M'(p) = M(p) - \sum_{t \in T} \bigl(W_f(p,t) - W_f(t,p)\bigr)\cdot R(t).
\]

This evolution of the PN in a single step is denoted by M[R⟩M′. This is the way in which a PN evolves, and it is assumed to be the best in terms of fidelity to the real behaviour of concurrent computing systems.

The DDS which encodes the MOPN N = (P, T, F, M) is the triple (X, T, Φ), where:

• the phase space X is the set of all the subsets of {0, 1}^n, with n the number of places of the MOPN;

• T is the monoid IN ∪ {0};

• Φ : T × X → X is the evolution operator, which satisfies:

1. Φ(0, x) = x for all x ∈ X, i.e., Φ_0 = id_X;

2. Φ(1, x) = y, with x, y ∈ X, where:
   – x = {x_1, . . . , x_k}, where each x_i ∈ {0, 1}^n encodes a marking of the MOPN N;
   – y = ∪_{i=1}^{k} y_i;
   – y_i = ∪_{j=1}^{t} y_{ij}, i.e. the union of all (t) markings reachable from x_i, defined by x_i[R_i⟩y_{ij}, with R_i the set of transitions of the net enabled at marking x_i;

3. Φ(t, Φ(s, x)) = Φ(t + s, x) for all t, s ∈ T and all x ∈ X.

(A sketch of one step of Φ is given below.)
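The sketch below shows one step of the operator on the binary encoding described above; for simplicity it fires single enabled transitions rather than arbitrary multisets, and keeps a marking unchanged when no transition is enabled (both are assumptions made only for this illustration).

```python
def phi_step(F, places, transitions, x):
    """x: a frozenset of binary tuples (one bit per place); returns Phi(1, x),
    the union of the markings reachable from each encoded marking of x."""
    def enabled(md, t):
        return all(md[p] > 0 for (p, y) in F if y == t)
    def fire(md, t):
        return {p: md[p] - ((p, t) in F) + ((t, p) in F) for p in md}
    successors = set()
    for m in x:
        md = dict(zip(places, m))
        reached = [fire(md, t) for t in transitions if enabled(md, t)]
        if not reached:                       # dead marking: kept as-is (assumption)
            successors.add(m)
        for d in reached:
            successors.add(tuple(1 if d[p] > 0 else 0 for p in places))
    return frozenset(successors)

# Example (with F as in the previous sketch): from the marking (1, 1, 0) only t1
# can fire, so phi_step(F, ["p1", "p2", "p3"], ["t1", "t2"], frozenset({(1, 1, 0)}))
# returns frozenset({(0, 0, 1)}).
```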

3 Conclusion and Future Work

At this point we are able to apply all the well-founded theory on DDSs to the analysis of concurrent systems. Our current work is the definition of the best metric on the phase space in order to provide the most and the best results.

Acknowledgements

This work has been partially supported by projects MTM2008-03679/MTM, 00684-FI-04, PAI06-0114, PAC06-0008-6995, PAC08-0173-4838 & CGL07-66440-C04-03

References

[1] C. A. Petri, Communications with Automata, Technical Report RADC-TR-65-377, New York University (1966).


Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009. 30 June, 1–3 July 2009.

A Lower Bound for the Oriented-Tree Network Design Problem based on Information Theory Concepts

Angel M. Perez-Bellido1, Sancho Salcedo-Sanz1, Emilio G. Ortiz-García1, Antonio Portilla-Figueras1 and Maurizio Naldi2

1 Department of Signal Theory and Communications, Universidad de Alcala
2 Department of Informatics, Systems and Production, Università di Roma 2

emails: [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract

In this paper we propose and demonstrate a lower bound for the Oriented-Tree Network Design Problem (OTNDP). This problem consists of, given a set of leaves for a tree and their associated probabilities, finding the tree with the minimum average number of bifurcations. The OTNDP has applications in systems design, mainly telecommunications, and it is usually solved by means of meta-heuristic approaches such as evolutionary algorithms. The bound proposed in this paper for the OTNDP is based on Information Theory concepts, and it is a tight bound, as we show in the experimental evaluation of the paper.

Key words: Tree encoding, Oriented-Tree Network Problem, Information The-ory, Lower Bound.

1 Introduction

Many combinatorial optimization problems related to networks consist of obtaining optimal trees in terms of an objective function and a set of constraints [1]-[5]. Problems in which the optimal form of a tree must be obtained have deserved the attention of researchers in the last few years [3],[6]. In this case, constraints on the maximum number of nodes or leaves in the tree are usually considered.

This paper deals with a specific tree-network design problem, the Oriented-Tree Network Design Problem (OTNDP). This problem starts from a set of leaves, each with an associated demand probability, and looks for the oriented-tree network with the minimum average number of bifurcations. The OTNDP has direct application in the design of communication networks [7], hydric networks [8], the structure of organizations [4], systems design such as call centers, etc.


A number of algorithms and heuristics can be found in the literature for tree design and tree encoding [9]-[11]. Specifically, intensive research has been carried out in the field of evolutionary computation applied to the resolution of tree design problems [10],[11]. For some applications, the use of evolutionary approaches has drawbacks, mainly due to the large amount of time these algorithms usually take. Sometimes the use of a lower bound is really useful for comparison purposes, or for a quick a-priori evaluation of the best possible result of a problem (which provides an idea of the approximate cost of a network deployment, for example).

In this paper we present the calculation of a lower bound for the OTNDP, based on concepts borrowed from Information Theory. We will show that this problem admits an easy and fast-to-compute lower bound depending only on the entropy of the probabilities of the tree leaves.

The structure of the rest of the paper is as follows: the next section states the OTNDP; Section 3 is the body of the paper, in which we introduce the calculation of the lower bound for the OTNDP; Section 4 presents an experimental evaluation of the bound; finally, Section 5 closes the paper with some concluding remarks.

2 OTNDP problem definition

This section contains the definition of the OTNDP. Before carrying out this definition, we provide the terminology and definitions of the main parameters needed in the rest of the section.

- N : Set of nodes in the tree T .

- L: Set of leaves in the tree T , with cardinal M .

- r: Root node of tree T .

- pi: Probability of leaf i.

- qi: Sum of the probabilities of all the leaves in the subtree rooted at node i.

- Ri: Route between node i and root node r, with cardinal di.

- A: Adjacency matrix for tree T .

- Si: Set of all nodes which hang from node i.

- hi: Number of bifurcations in each node i.

- bi: Number of bifurcations along a route Ri.

- b(T ): Average number of bifurcations of the tree T .

For each node i ∈ N , the route between node i and root node r (Ri), is defined as:


\[
R_i = \{\, n_{i,j} \in N : A(n_{i,j}, n_{i,j-1}) = A(n_{i,j}, n_{i,j+1}) = 1,\ n_{i,1} = r,\ n_{i,d_i} = i \,\}. \tag{1}
\]

The number of bifurcations in each node i is:

\[
h_i = \begin{cases} \sum_{j\in N} A(i,j) & \text{if } i = r \\ \left(\sum_{j\in N} A(i,j)\right) - 1 & \text{otherwise,} \end{cases} \tag{2}
\]

where we consider that A(i, i) = 0 for all i ∈ N. Note that h_i = 0 for all i ∈ L. The number of bifurcations in R_i (b_i) can be calculated as:

\[
b_i = \sum_{j \in R_i} h_j. \tag{3}
\]

The sum of the probabilities of all the leaves in the subtree rooted at node i is defined as:

\[
q_i = \sum_{j \in L \,:\, i \in R_j} p_j, \tag{4}
\]

note that in any node i ∈ N , the following condition is fulfilled:

\[
q_i = \sum_{j \in S_i} q_j. \tag{5}
\]

The average number of bifurcations in tree T is defined as:

\[
b(T) = \sum_{i \in L} p_i b_i = \sum_{i \in N} h_i q_i. \tag{6}
\]

With all these definitions, the problem of obtaining the tree T with the minimum average number of bifurcations can be stated as: given \(\{p_i\}_{i\in L}\), find the optimal tree \(T^*\) such that
\[
T^* = \arg\min_{T} b(T). \tag{7}
\]

The problem we tackle in this paper is to find a good lower bound for b(T). As an example, consider the tree given in Figure 1. In order to calculate the average number of bifurcations, we have to calculate the number of bifurcations for each leaf in the tree; for example, for the leaf with probability 0.25 the number of bifurcations is 5 (3 at the root node and 2 more at the intermediate node). The vector b would then be, for this tree, (5, 8, 8, 8, 4, 3), which provides an average number of bifurcations b(T) = Σ_{i∈L} p_i b_i = 6.2.
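A one-line helper mirroring Equation (6) for a given assignment of per-leaf bifurcation counts (the figure's full probability vector is not reproduced here, so no concrete values are hard-coded):

```python
def average_bifurcations(probabilities, bifurcations):
    """b(T) = sum_i p_i * b_i over the tree leaves; with the probabilities of
    Figure 1 and b = (5, 8, 8, 8, 4, 3) this evaluates to 6.2."""
    return sum(p * b for p, b in zip(probabilities, bifurcations))
```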


Figure 1: Example of the calculation of the average bifurcations for a given tree.

3 A lower bound for b(T ) based on Information Theory

This section contains the calculation of the lower bound for the OTNDP. Before demonstrating the main theorem, which contains the OTNDP lower bound, we state and prove three previous lemmas needed for the final result.

Lemma 1 Given a k-ary regular rooted tree T whose leaves have probabilities \(p_i\), \(i \in L\), in such a way that \(\sum_{i\in L} p_i = 1\), its mean number of bifurcations fulfils \(b(T) \ge k H_k(\{p_i\}_{i\in L})\), where \(H_k\) stands for the base-k entropy function.

Proof From the noiseless coding theorem [12] it is well known that in a rooted tree with a maximum of k links starting from every node, the average depth at which its leaves are located fulfils
\[
d(T) = \sum_{i\in L} p_i d_i \ge -\sum_{i\in L} p_i \log_k(p_i) = H_k(\{p_i\}_{i\in L}), \tag{8}
\]
where \(d_i\), \(i\in L\), stands for the depth of the i-th tree leaf. Since exactly k links hang from every node in a k-ary regular tree (except from the leaves, obviously), the number of bifurcations \(b_i\) associated to the i-th leaf can be expressed as \(b_i = k d_i\). Then, from Equation (8) the mean number of bifurcations must satisfy
\[
b(T) = \sum_{i=1}^{M} p_i b_i = \sum_{i=1}^{M} p_i k d_i \ge k H_k(\{p_i\}_{i\in L}). \tag{9}
\]

Lemma 2 Given a k-ary regular rooted tree T whose leaves have probabilities \(p_i\), \(i \in L\), in such a way that \(\sum_{i\in L} p_i = 1\), its mean number of bifurcations fulfils \(b(T) \ge 3 H_3(\{p_i\}_{i\in L})\).


Proof Let k′ be the natural number for which \(k H_k(\{p_i\}_{i\in L})\) achieves its minimum value. From Lemma 1,
\[
k' = \arg\min_{k\in\mathbb{N}} k H_k(\{p_i\}_{i\in L}) \;\Rightarrow\; b(T) \ge k H_k(\{p_i\}_{i\in L}) \ge k' H_{k'}(\{p_i\}_{i\in L}) \quad \forall k\in\mathbb{N}. \tag{10}
\]
Decomposing the base-k entropy function, it can be shown how \(k H_k(\{p_i\}_{i\in L})\) depends on k:
\[
k H_k(\{p_i\}_{i\in L}) = -k\sum_{i\in L} p_i \log_k(p_i) = -\frac{k}{\ln(k)}\sum_{i\in L} p_i \ln(p_i), \tag{11}
\]
where it can be seen how this dependence arises through the multiplicative factor \(k/\ln(k)\). Thus, the minimum value of \(k H_k(\{p_i\}_{i\in L})\) can be obtained from the minimum of this factor. To this end we can take the first two derivatives of this factor with respect to k, namely
\[
\frac{d}{dk}\,\frac{k}{\ln(k)} = \frac{1}{\ln(k)}\left(1 - \frac{1}{\ln(k)}\right), \tag{12}
\]
\[
\frac{d^2}{dk^2}\,\frac{k}{\ln(k)} = \frac{1}{k\ln^2(k)}\left(\frac{2}{\ln(k)} - 1\right). \tag{13}
\]

According to these equations, there is just a single point that makes the first derivative become zero (k = e), at which the factor shows convex behavior because of the 1/e value of its second derivative. Since e is located between 2 and 3, the desired natural number must be one of them. Evaluating the multiplicative factor at both of them, it can be proved that 3H₃ < 2H₂, so we can conclude that k′ = 3.
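A quick numerical check of this argument, evaluating the factor k/ln(k) at the candidate integers:

```python
import math

# k / ln(k) for k = 2, 3, 4: the minimum over the integers is at k = 3,
# since 3/ln(3) ≈ 2.731 is smaller than 2/ln(2) = 4/ln(4) ≈ 2.885.
for k in (2, 3, 4):
    print(k, k / math.log(k))
```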

Lemma 3 Given a regular rooted tree T whose leaves have probabilities \(p_i\), \(i\in L\), in such a way that \(0 < \alpha = \sum_{i\in L} p_i \le 1\), its mean number of bifurcations fulfils \(b(T) \ge -3\sum_{i\in L} p_i \log_3(p_i/\alpha)\).

Proof Let T′ be a regular rooted tree with a series of leaves' probabilities \(p'_i\), \(i\in L\), which verify \(\sum_{i\in L} p'_i = 1\). From Lemma 2,
\[
b(T') \ge 3 H_3(\{p'_i\}_{i\in L}). \tag{14}
\]
Let T be the tree which arises when the probabilities of T′ are scaled in the following way: \(p_i = \alpha p'_i\), \(i\in L\), for some \(0 < \alpha \le 1\). Its average number of bifurcations satisfies
\[
\begin{aligned}
b(T) &= \sum_{i\in L} p_i b_i = \alpha\sum_{i\in L} p'_i b_i = \alpha\, b(T') \ge \alpha\, 3 H_3(\{p'_i\}_{i\in L}) & (15)\\
&= -3\sum_{i\in L} \alpha p'_i \log_3(p'_i) = -3\sum_{i\in L} p_i \log_3\!\Bigl(\frac{p_i}{\alpha}\Bigr). & (16)
\end{aligned}
\]


In the next theorem we will make use of some new notation symbols, defined here:

- \(N_i\): Set of all nodes placed at the i-th depth level.

- \(X^L, X^I\): Partition of a node set X between those nodes which act as tree leaves and thus belong to L (\(X^L\)) and those which do not (\(X^I\)).

- \(B_i\): Mean number of bifurcations located from the i-th tree level. From Equation (6), \(B_i = b(T) - \sum_{j<i}\sum_{k\in N_j} h_k q_k\).

Theorem 1 Given a rooted tree T whose leaves have probabilities \(p_i\), \(i\in L\), in such a way that \(\sum_{i\in L} p_i = 1\), its mean number of bifurcations fulfils \(b(T) \ge B^* = 3 H_3(\{p_i\}_{i\in L})\).

Proof Let n be some non-leaf node belonging to T. From Equation (5), \(h_n q_n = \sum_{i\in S_n} h_n q_i\). This expression corresponds to the average number of bifurcations of a one-level (and consequently regular) tree which is rooted at n and whose leaves' probabilities are \(q_i\), \(i\in S_n\). Thus, from Lemma 3,
\[
h_n q_n \ge -3\sum_{i\in S_n} q_i \log_3\!\Bigl(\frac{q_i}{\sum_{j\in S_n} q_j}\Bigr). \tag{17}
\]

According to the meaning of \(B_i\), and taking into account that \(N^I_0 = \{r\}\), we can express b(T) in the following way:
\[
\begin{aligned}
b(T) &= B_1 + \sum_{i\in N^I_0} h_i q_i = B_1 + h_r q_r & (18)\\
&\ge B_1 - 3\sum_{i\in S_r} q_i \log_3\!\Bigl(\frac{q_i}{\sum_{k\in S_r} q_k}\Bigr) & (19)\\
&= B_1 - 3\sum_{i\in N^I_1} q_i \log_3(q_i) - 3\sum_{i\in N^L_1} p_i \log_3(p_i), & (20)
\end{aligned}
\]

where (19) follows from (17). Given that each node in the first level (equivalently, belonging to \(N_1\)) must hang from the tree root r, we can split the set \(S_r\) into \(N^I_1\) and \(N^L_1\). Note that \(p_i = q_i\) for all \(i\in L\), and for the root node \(\sum_{k\in S_r} q_k = \sum_{i\in L} p_i = 1\). In the case of \(B_1\), it can be developed as


\[
\begin{aligned}
B_1 &= B_2 + \sum_{j\in N^I_1} h_j q_j & (21)\\
&\ge B_2 - 3\sum_{j\in N^I_1}\sum_{i\in S_j} q_i \log_3\!\Bigl(\frac{q_i}{\sum_{k\in S_j} q_k}\Bigr) & (22)\\
&= B_2 + \sum_{j\in N^I_1}\Bigl[-3\sum_{i\in S_j} q_i \log_3(q_i) + 3\sum_{i\in S_j} q_i \log_3\Bigl(\sum_{k\in S_j} q_k\Bigr)\Bigr] & (23)\\
&= B_2 + \sum_{j\in N^I_1}\Bigl[-3\sum_{i\in S_j} q_i \log_3(q_i) + 3 q_j \log_3(q_j)\Bigr] & (24)\\
&= B_2 - 3\sum_{i\in N_2} q_i \log_3(q_i) + 3\sum_{i\in N^I_1} q_i \log_3(q_i), & (25)
\end{aligned}
\]
where (24) follows from (5). In the last equality it must be noted that \(\bigcup_{j\in N_1} S_j = N_2\), since every node in the second level must hang from some node in the first one. Combining (20) and (25) we obtain

\[
\begin{aligned}
b(T) &\ge B_2 - 3\sum_{i\in N_2} q_i \log_3(q_i) + 3\sum_{i\in N^I_1} q_i \log_3(q_i) - 3\sum_{i\in N^I_1} q_i \log_3(q_i) - 3\sum_{i\in N^L_1} p_i \log_3(p_i) & (26)\\
&= B_2 - 3\sum_{i\in N^I_2} q_i \log_3(q_i) - 3\sum_{i\in N^L_2} p_i \log_3(p_i) - 3\sum_{i\in N^L_1} p_i \log_3(p_i), & (27)
\end{aligned}
\]
where \(N_2\) has been split into \(N^I_2\) and \(N^L_2\). The last equation shows the trend followed by consecutive expressions of b(T). Repeating this procedure k − 2 more times, a new lower bound for b(T) is given by

\[
b(T) \ge B_k - 3\sum_{i\in N^I_k} q_i \log_3(q_i) - 3\sum_{j=1}^{k}\sum_{i\in N^L_j} p_i \log_3(p_i). \tag{28}
\]

This reasoning can be iterated up to the deepest level of the tree, which we will denote by D. Since there is no level lower than D, every node placed at this level must be a tree leaf. Therefore, the set of non-leaf nodes at this level is empty (\(N^I_D = \emptyset\)), which makes every sum over it become zero. Moreover, there cannot be any bifurcation at this level (\(B_D = 0\)), since \(h_i = 0\) for all \(i \in N_D\). Following these facts, the next bound must be satisfied:


b(T) ≥ B_D − 3 ∑_{i∈N^I_D} q_i log_3(q_i) − 3 ∑_{j=1}^{D} ∑_{i∈N^L_j} p_i log_3(p_i)    (29)
     = −3 ∑_{j=1}^{D} ∑_{i∈N^L_j} q_i log_3(q_i) = −3 ∑_{i∈L} p_i log_3(p_i)    (30)
     = 3 H_3({p_i}_{i∈L}),    (31)

where (30) follows from the fact that ⋃_{j=1}^{D} N^L_j = L. Finally, the lower bound we are looking for is:

B* = 3 H_3({p_i}_{i∈L}).    (32)

4 Experimental evaluation of the bound

In order to experimentally evaluate the proposed lower bound for the OTNDP we have prepared a large set of experiments, consisting of OTNDP instances of different sizes (from 10 to 300 leaves). The probabilities of the leaves have been randomly generated in each instance, ensuring that ∑_{i∈L} p_i = 1. For each OTNDP instance the proposed lower bound has been calculated. In order to assess the tightness of this bound, each instance has also been solved using two different evolutionary algorithms that provide good results for this problem, one using the Prüfer encoding [9] and the other the Dandelion encoding [10] (30 runs on each OTNDP instance). Figure 2 shows the results obtained for the mean and the best fitness of the 30 runs. The tightness of the proposed lower bound can be appreciated in these figures: note that even for large OTNDP instances the results of the evolutionary algorithms are quite close to the lower bound, which indicates the quality of the latter.
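As a minimal illustration (not the authors' experimentation code), the following C sketch generates a normalized random leaf-probability vector and evaluates the bound B* = 3 H_3({p_i}) of (32); the function names are ours.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Ternary entropy H_3 of a probability vector p[0..n-1]. */
static double entropy3(const double *p, int n) {
    double h = 0.0;
    for (int i = 0; i < n; i++)
        if (p[i] > 0.0)
            h -= p[i] * (log(p[i]) / log(3.0));
    return h;
}

int main(void) {
    int n = 50;                       /* number of leaves of the instance   */
    double p[50], sum = 0.0;

    for (int i = 0; i < n; i++) {     /* random leaf probabilities ...      */
        p[i] = (double)rand() / RAND_MAX;
        sum += p[i];
    }
    for (int i = 0; i < n; i++)       /* ... normalized so that sum p_i = 1 */
        p[i] /= sum;

    printf("lower bound B* = %f bifurcations\n", 3.0 * entropy3(p, n));
    return 0;
}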

5 Conclusions

In this paper we have presented a lower bound for the Oriented-Tree Network Design Problem (OTNDP). This bound is based on Information Theory concepts, and provides a tight reference for the best possible value of a given OTNDP instance. In the paper we have presented the bound by means of three lemmas and one main theorem, for which we have provided the corresponding proofs. The proposed bound has been experimentally evaluated on OTNDP instances of different sizes, and its tightness has been demonstrated by comparison with the results obtained on those instances by two different evolutionary algorithms.


Figure 2: Results of the proposed lower bound on OTNDP instances of different sizes (number of bifurcations versus number of leaves; curves for Dandelion, Prüfer and the lower bound). (a) Results for the mean fitness (over 30 experiments); (b) results for the best fitness.

Acknowledgement

This work has been partially supported by Comunidad de Madrid, Universidad de Alcalá and Ministerio de Educación, through Projects CCG08-UAH/AMB-3993 and TEC2006/07010. E. G. Ortiz-García is supported by Universidad de Alcalá, under the University F.P.I. grants program. A. M. Perez-Bellido is supported by a doctoral fellowship from the European Social Fund and Junta de Comunidades de Castilla-La Mancha, within the framework of the Operating Programme ESF 2007-2013.


References

[1] A. Fatemi, K. Zamanifar and N. NematBakhsh, "A new genetic approach to construct near-optimal binary search trees," Applied Mathematics and Computation, vol. 190, no. 2, pp. 1514-1525, 2007.

[2] S. A. Choudum and R. Indhumathi, "On embedding subclasses of height-balanced trees in hypercubes," Information Sciences, vol. 179, no. 9, pp. 1333-1347, 2009.

[3] I. Averbakh and O. Berman, "Algorithms for path medi-centers of a tree," Computers & Operations Research, vol. 26, pp. 1395-1409, 1999.

[4] J. J. Bartholdi III, D. Eisenstein and Y. F. Lim, "Bucket brigades on in-tree assembly networks," European Journal of Operational Research, vol. 168, pp. 870-879, 2006.

[5] T. Oncan, "Design of capacitated minimum spanning tree with uncertain cost and demand parameters," Information Sciences, vol. 177, no. 20, pp. 4354-4367, 2007.

[6] K. Sawada and R. Wilson, "Models of adding relations to an organization structure of a complete K-ary tree," European Journal of Operational Research, vol. 174, pp. 1491-1500, 2006.

[7] S. Shioda, K. Ohtsuka and T. Sato, "An efficient network-wide broadcasting based on hop-limited shortest-path trees," Computer Networks, vol. 52, no. 17, pp. 3284-3295, 2008.

[8] J. M. Latorre, S. Cerisola and A. Ramos, "Clustering algorithms for scenario tree generation: application to natural hydro inflows," European Journal of Operational Research, vol. 181, pp. 1339-1353, 2007.

[9] S. Piccioto, How to encode a tree, Ph.D. dissertation, Univ. California, San Diego, 1999.

[10] T. Paulden and D. K. Smith, "From the Dandelion Code to the Rainbow Code: a class of bijective spanning tree representations with linear complexity and bounded locality," IEEE Trans. Evol. Comput., vol. 10, no. 2, pp. 108-123, 2006.

[11] S. Soak, D. W. Corne and B. Ahn, "The edge-window-decoder representation for tree-based problems," IEEE Trans. Evol. Comput., vol. 10, no. 2, pp. 124-144, 2006.

[12] R. B. Ash, Information Theory, Dover Publications, New York, 1965.


Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009
30 June, 1–3 July 2009.

Evaluating Sparse Matrix-Vector Product on the FinisTerrae Supercomputer

Juan C. Pichel1, Juan A. Lorenzo2, Dora B. Heras2 and Jose C. Cabaleiro2

1 Galicia Supercomputing Center (CESGA), Spain

2 Electronics and Computer Science Dpt., Univ. of Santiago de Compostela, Spain

emails: [email protected], [email protected], [email protected],[email protected]

Abstract

In this paper the sparse matrix-vector product (SpMV) is evaluated on the FinisTerrae supercomputer. FinisTerrae is an SMP-NUMA system with more than 2500 processors. Several topics are studied. We have estimated the influence of data and thread allocation on the SpMV performance. Due to the indirect and irregular memory access patterns of SpMV, we have also studied the influence of the memory hierarchy on the performance. Additionally, different parallelization strategies and compiler options were considered.

According to the behavior observed in the study, a set of optimizations specially tuned for FinisTerrae were successfully applied to SpMV. Noticeable improvements are obtained in comparison with the SpMV naïve implementation.

Key words: sparse matrix, NUMA, data locality, performance

1 Introduction

FinisTerrae [1] is an SMP-NUMA system with more than 2500 processors (rank 427 in the TOP500 November 2008 list [2]) installed at Galicia Supercomputing Center (Spain) (see Figure 1). The most demanded applications in this system are those related to the simulation of fluid mechanics, semiconductor devices, etc. In these simulations the solution of large sparse linear equation systems is required, which are often solved using iterative methods. The main kernel of the iterative methods is the sparse matrix-vector multiplication (SpMV). This kernel is notorious for sustaining low fractions of peak processor performance. Therefore, understanding the behavior of SpMV on the FinisTerrae supercomputer is very important in order to achieve high performance.

In this paper the sparse matrix-vector product is evaluated on the FinisTerrae

supercomputer. Several topics are studied. We have estimated the influence of data


and thread allocation on the SpMV performance. A comparison between the operating system scheduler and an explicit data-thread allocation is provided. Due to the indirect and irregular memory access patterns of SpMV, we have also studied the influence of the memory hierarchy on the performance. With this purpose we have evaluated a locality optimization technique specially tuned for the SpMV on multicore processors. Additionally, different parallelization strategies and compiler options were considered. Finally, another important issue in high performance computing is considered: the power efficiency.

Figure 1: FinisTerrae supercomputer at CESGA.

In a related work, a framework for automatic detection and application of the best mapping among threads and cores in parallel applications on multicore systems is presented [3]. Williams et al. [4] propose several optimization techniques for the sparse matrix-vector multiplication which are evaluated on different multicore platforms. The authors examine a wide variety of techniques including, among others, the influence of process and memory affinity. In the same way, Goumas et al. [5] revisited the performance issues of the SpMV on modern microarchitectures. The authors, based on the experiments performed, extract several conclusions that can serve as guidelines for a subsequent optimization process of the kernel.

2 FinisTerrae supercomputer architecture

The FinisTerrae supercomputer consists of 142 HP Integrity rx7640 nodes [6], with 8 1.6 GHz Dual-Core Intel Itanium2 (Montvale) processors and 128 GB of memory per node. Additionally, there is an HP Integrity Superdome with 64 1.6 GHz Dual-Core Intel Itanium2 (Montvale) processors and 1 TB of main memory (not considered in this work). The interconnect network is an Infiniband 4x DDR network, capable of up to 20 Gbps (16 Gbps effective bandwidth). It also has a high performance storage system, the HP SFS, made up of 20 HP Proliant DL380 G5 servers and 72 SFS20 disk arrays. The SFS storage system is accessed through the Infiniband network. The operating system used in all the computing nodes is SuSE Linux Enterprise Server 10.

Figure 2(a) displays a scheme of a FinisTerrae node. It shows how processors are arranged in an SMP-cell NUMA configuration. A cell has two buses at 8.5 GB/s (6.8 GB/s sustained), each connecting two sockets (that is, four cores) to a 64 GB memory


Figure 2: Block diagram of an rx7640 node (on the left; two cells, each with its Cell Controller and memory module, holding cores 0-7 and 8-15) and Dual-Core Intel Itanium2 (Montvale) processor (on the right).

for (i = 0; i < N; i++) {
    reg = 0;
    for (j = PTR[i]; j < PTR[i+1]; j++)
        reg = reg + DA[j] * X[INDEX[j]];
    Y[i] = reg;
}

Figure 3: Basic CSR-based sparse matrix-vector product implementation.

module through an sx2000 chipset (Cell Controller). The Cell Controller maintains a cache-coherent memory system using a directory-based protocol and connects both cells through a 34.6 GB/s crossbar (27.3 GB/s sustained). It yields a theoretical processor-cell controller peak bandwidth of 17 GB/s and a cell controller-memory peak bandwidth of 17.2 GB/s (four buses at 4.3 GB/s).

Focusing on the processors, each Itanium2 processor comprises two 64-bit cores and three cache levels per core (see Figure 2(b)). This architecture has 128 General Purpose Registers and 128 FP registers. L1I and L1D (write-through) are both 4-way set-associative, 64-byte line-sized caches. L2I and L2D (write-back) are 8-way set-associative, 128-byte line-sized caches. Finally, there is a unified 12-way L3 cache per core, with a line size of 128 bytes and write-back policy. Note that floating-point operations bypass the L1. Each Itanium2 core can perform 4 FP operations per cycle. Therefore, the peak performance per core is 6.4 GFlop/s. HyperThreading is disabled in the FinisTerrae.

3 Experimental conditions

In this work the sparse matrix-vector product operation is studied. This kernel is notorious for sustaining low fractions of peak processor performance due to its indirect and irregular memory access patterns. Let us consider the operation y = a×x, where x and y are dense vectors, and a is an N × M sparse matrix. The most common data structure used to store a sparse matrix for SpMV computations is the Compressed-Sparse-Row format (CSR) [7]. da, index and ptr are the three vectors (data, column indices


Matrix        N       NZ        Matrix       N       NZ
av41092       41092   1683902   nmos3        18588   386594
e40r0100      17281   553562    pct20stif    52329   2698463
exdata_1      6001    2269501   psmigr_1     3140    543162
garon2        13535   390607    rajat15      32761   443573
gyro_k        17361   1021159   sme3Da       12504   874887
mixtank_new   29957   1995041   syn12000a    12000   1436806
msc10848      10848   1229778   tsyl201      20685   2454957
nd3k          9000    3279690

Table 1: Matrix benchmark suite.

and row pointer) that characterize this format. Figure 3 shows an implementation of SpMV for CSR storage. This implementation enumerates the stored elements of a by streaming both index and da with unit stride, and loads and stores each element of y only once. However, x is accessed indirectly, and unless we can inspect index at run-time, it is difficult or impossible to reuse the elements of x explicitly. Note that the locality properties of the accesses to x depend directly on the sparsity pattern of the considered matrix.
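To make the CSR layout concrete, the following sketch (our illustrative example, not taken from the paper) shows the da, index and ptr arrays for a small 4×4 matrix and calls the kernel of Figure 3 wrapped in a function.

#include <stdio.h>

/* CSR sparse matrix-vector product, y = a*x, as in Figure 3. */
static void spmv_csr(int N, const int *PTR, const int *INDEX,
                     const double *DA, const double *X, double *Y) {
    for (int i = 0; i < N; i++) {
        double reg = 0;
        for (int j = PTR[i]; j < PTR[i+1]; j++)
            reg += DA[j] * X[INDEX[j]];
        Y[i] = reg;
    }
}

int main(void) {
    /* 4x4 matrix: row 0 -> (0,0)=4,(0,2)=1; row 1 -> (1,1)=3;
       row 2 -> (2,0)=2,(2,3)=5; row 3 -> (3,3)=6               */
    int    PTR[]   = {0, 2, 3, 5, 6};        /* row pointers    */
    int    INDEX[] = {0, 2, 1, 0, 3, 3};     /* column indices  */
    double DA[]    = {4, 1, 3, 2, 5, 6};     /* nonzero values  */
    double X[]     = {1, 1, 1, 1}, Y[4];

    spmv_csr(4, PTR, INDEX, DA, X, Y);
    for (int i = 0; i < 4; i++) printf("Y[%d] = %g\n", i, Y[i]);
    return 0;
}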

All the codes in this work were written in C and compiled with Intel's 10.0 Linux C compiler (icc). OpenMP directives were used to parallelize the irregular code of Figure 3. All the results shown in the next section were obtained using the compiler optimization flag -O2 (with the exception of Section 4.5). The parallel code uses a block distribution of the sparse matrix rather than a cyclic one due to the better performance achieved (see Section 4.1). Tests have been performed on an rx7640 node of the FinisTerrae supercomputer.

As matrix test set we have selected fifteen square sparse matrices from different real problems that represent a variety of nonzero patterns. These matrices are from the University of Florida Sparse Matrix Collection (UFL) [8]. Table 1 summarizes some features of the matrices. N is the number of rows/columns (all the considered matrices are square), and NZ is the number of nonzero elements.

4 Performance evaluation

In this section the results of the performance evaluation of the SpMV on the FinisTerrae are shown. Several topics are studied. The influence of data and thread allocation on a NUMA architecture such as the rx7640 nodes is discussed in Sections 4.2 and 4.3. Moreover, the use of the memory hierarchy by SpMV is analyzed (Section 4.4). With this purpose we have evaluated a locality optimization technique specially tuned for the SpMV on multicore processors [9]. Additionally, different parallelization strategies and compiler options were considered (Sections 4.1 and 4.5). Finally, a summary of the main results obtained is given.

4.1 Loop distribution

Next, we will evaluate the performance when different parallelization strategies are applied to the code of Figure 3. In particular, block and cyclic distributions of loop i are considered, as sketched below.
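As a minimal OpenMP sketch (ours, not the authors' exact code), the two distributions of loop i can be expressed with the schedule clause: schedule(static) assigns one contiguous block of rows per thread, while schedule(static, 1) deals rows out cyclically.

#include <omp.h>

/* Parallel CSR SpMV; 'block' selects a block or a cyclic distribution
   of the rows (loop i) among the OpenMP threads.                      */
void spmv_parallel(int N, const int *PTR, const int *INDEX,
                   const double *DA, const double *X, double *Y,
                   int block) {
    if (block) {
        /* block distribution: contiguous chunks of rows per thread */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N; i++) {
            double reg = 0;
            for (int j = PTR[i]; j < PTR[i+1]; j++)
                reg += DA[j] * X[INDEX[j]];
            Y[i] = reg;
        }
    } else {
        /* cyclic distribution: rows dealt out one by one, round-robin */
        #pragma omp parallel for schedule(static, 1)
        for (int i = 0; i < N; i++) {
            double reg = 0;
            for (int j = PTR[i]; j < PTR[i+1]; j++)
                reg += DA[j] * X[INDEX[j]];
            Y[i] = reg;
        }
    }
}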


Figure 4: SpMV performance (GFLOPS/s) using different loop distributions: 2 and 4 threads (top), and 8 and 16 threads (bottom); cyclic and block configurations for each thread count.

Figure 4 shows the SpMV performance of both strategies using different numbers of threads. According to the results obtained, we conclude that the block distribution outperforms the cyclic one. Only for matrix exdata_1 is this behavior not observed. The explanation is the poor load balance shown by the block distribution for this particular matrix. We must highlight that in most of the cases the block distribution achieves higher performance than the cyclic strategy using fewer threads. This happens for all the matrices with the exception of exdata_1, nd3k and psmigr_1.

The better results of the block distribution are due to the fact that threads work with disjoint parts of the sparse matrix when the code is parallelized using this strategy. This distribution fits better than the cyclic one on the Itanium2 architecture, where the cache hierarchy is not shared among cores (see Figure 2(b)). Therefore, a block distribution is preferred over the cyclic one. For this reason, the results in this section were always obtained using a block distribution of the SpMV code.

4.2 Memory affinity

As we have explained in Section 2, processors on an rx7640 node are arranged in a two SMP-cell NUMA configuration. Each cell has a 64 GB memory module. Therefore, data can be allocated on a local memory (threads and data are in the same cell) or on a remote memory (threads and data are in different cells). In addition to these two modes, data can be allocated in a different way: using the interleave policy. When interleaved memory is used, 50% of the addresses are to memory on the same cell as the requesting processor, and the other 50% of the addresses are to memory on the other cell (round-robin). The main goal of using interleaved memory is to decrease the average access time when accessing data simultaneously from processors belonging to


Figure 5: Influence of the data allocation on an rx7640 node (GFLOPS/s for local, interleave and remote allocation).

different cells. Memory latencies provided by the manufacturer are [6]: ~185 ns (local memory) and ~249 ns (interleaved memory). Latencies to remote memory are not available. In order to guarantee the correct allocation of the data (memory affinity) we have used the libnuma library. Likewise, libnuma was used to map threads to cores (processor affinity).
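The following C sketch illustrates, under our own simplifying assumptions rather than the authors' actual harness, how the three allocation modes and the thread pinning described above can be requested with libnuma and the Linux affinity interface.

#define _GNU_SOURCE
#include <numa.h>       /* link with -lnuma */
#include <sched.h>
#include <stdlib.h>

/* Allocate 'bytes' on cell 'node', interleaved, or let the OS decide. */
void *alloc_mode(size_t bytes, int mode, int node) {
    if (mode == 0) return numa_alloc_onnode(bytes, node); /* local/remote */
    if (mode == 1) return numa_alloc_interleaved(bytes);  /* interleave   */
    return malloc(bytes);                                 /* OS default   */
}

/* Pin the calling thread to a given core, e.g. cores 8 and 12 of cell 1. */
void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    sched_setaffinity(0, sizeof(set), &set);
}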

Next, the influence of the data allocation on the SpMV performance is studied. As an example, we show in Figure 5 the behavior of the SpMV code when using two threads mapped to cores 8 and 12 (see Figure 2(a)). Data is allocated in the memory module of the cell of cores 8 and 12 (local), in the memory of the other cell (remote), and using the interleave policy. As expected, an important degradation in the performance is observed when data is allocated on a remote cell with respect to local memory accesses. In particular, an average decrease of 20.7% is obtained, ranging from 3% in the best case (matrix garon2) to 32% (matrix exdata_1). Results point out that the worst behavior is obtained for big matrices (see Table 1 for details). When using interleaved memory, this degradation is on average about 10%. Note that depending on the data distribution for a particular matrix, the interleave policy offers performance close to local accesses (matrix e40r0100) or to remote accesses (matrices nmos3 and rajat15). The overall behavior analyzed in this example is also observed when a different number of threads is considered.

Therefore, we conclude that data allocation has a great influence on the performance of SpMV on FinisTerrae. When threads are mapped to cores in the same cell, it is better to allocate the data in the memory module of the same cell. However, if cores in both cells must be used, the allocation of the data in the interleaved memory makes sense because accesses to remote memory are very costly.

4.3 Processor affinity

In this section different aspects related to the influence of thread allocation on the SpMV performance are studied. First, we have focused on evaluating the influence of mapping threads to the same processor (socket). Figure 6(a) shows the performance achieved using two threads for several mapping configurations: same socket (for example, cores


Figure 6: Influence of the thread allocation on an rx7640 node (GFLOPS/s): using two threads (on the left; same socket, different socket on the same bus, different socket and bus) and four threads (on the right; two sockets on the same bus (8,9,10,11), two sockets on different buses (8,9,12,13), four sockets (8,10,12,14)).

8 and 9), different socket and same bus (cores 8 and 10), and different socket and bus (cores 8 and 12). Note that data is allocated in the same cell where threads are mapped. Results point out that the influence of mapping the two threads to the same or to different sockets sharing the bus is very low. In particular, the average difference in the performance is only about 0.2%. However, some improvements (up to 5%) are observed when mapping threads to cores that do not share the bus. This is particularly true in the case of big matrices (for example, matrices mixtank_new and tsyl201). Therefore, the contention of the bus to memory seems to have an impact on the SpMV performance.

In order to confirm the behavior observed previously, we have tested the SpMV performance using four threads mapped to: two sockets sharing the bus (cores 8,9,10,11), two sockets on different buses (cores 8,9,12,13) and four sockets (cores 8,10,12,14). Performance results obtained using these configurations are shown in Figure 6(b). Several conclusions can be drawn.

First, the same trend is observed with respect to the results using two threads (Figure 6(a)). That is, better performance is achieved when threads are mapped to sockets that do not share the bus (configuration 8,9,12,13). In this case, improvements are on average about 8% (higher than 30% for some matrices such as nd3k, pct20stif and tsyl201). Therefore, the influence of the bus becomes more significant as the number of threads increases. Note that the average improvement is about 2% when using two threads that do not share the bus (see configuration "different socket and bus" in Figure 6(a)).

Second, the impact of the bus for small matrices is minimal, in such a way that the three considered mappings obtain similar results (for example, matrices nmos3 and rajat15). And finally, there are no significant differences between using four sockets (8,10,12,14) and two sockets on different buses (8,9,12,13). An explanation for this behavior is that in both cases there are two threads mapped to cores that share the bus (see Figure 2(a)).

As we have shown above, data and thread allocation have a great impact on the performance of SpMV. Next, we will evaluate the behavior of the operating system


Figure 7: Effect of NUMA optimizations on an rx7640 node. For each matrix of the benchmark suite, the SpMV performance (GFLOPS/s) obtained with the OS scheduler is compared with the NUMA-optimized allocation using 2, 4 and 8 threads.

scheduler. With this purpose we have made a comparison between the performance achieved by the OS scheduler and that achieved taking into account the particularities of the FinisTerrae architecture (explicit data and thread allocation). The results are displayed in Figure 7, where only results for the best mapping configurations are shown (labeled as "NUMA optimization"). The best configurations using two and four threads were analyzed before. For eight threads, we have mapped all the threads to different cores of the same cell. Results show that our NUMA optimizations achieve a better performance in most of the cases. For example, improvements up to 40% are observed when using two threads (matrices msc10848 and tsyl201). Note that NUMA optimization effects are more visible in the case of big matrices (for example, matrices mixtank_new, nd3k or tsyl201). In summary, the average performance improvement when NUMA issues are considered is 15.6%, 14.1% and 4.7% for two, four and eight threads respectively.

4.4 Data reordering

SpMV is a kernel characterized by irregular accesses, and it tends to have low spatial and temporal locality. SpMV performance will depend on both the system architecture and the sparse matrix considered (as we have commented in Section 3). In this section we study the influence of the memory hierarchy on the SpMV performance. With this purpose we have evaluated a locality optimization technique specially tuned for the SpMV on multicore processors [9]. This way, we can compare and analyze the SpMV performance when locality issues are considered. This technique consists of reorganizing the data guided by a locality model instead of restructuring the code or changing the sparse matrix storage format. The goal is to increase the grouping of nonzero elements in the sparse matrix pattern that characterizes the irregular accesses and, as a consequence, to increase the locality in the execution of the code.
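The reordering technique of [9] chooses the permutation with a locality model; the sketch below only illustrates the mechanical part relied upon here, namely how a given row permutation perm (perm[new_row] = old_row, an assumption of ours) is applied to a CSR matrix to produce the reordered operand.

#include <stdlib.h>

/* Build the row-reordered CSR matrix (PTR2, INDEX2, DA2) from (PTR, INDEX, DA)
   given a permutation perm such that new row i is old row perm[i].
   Only rows are permuted here; a symmetric reordering would also relabel
   the column indices with the inverse permutation.                          */
void reorder_rows_csr(int N, const int *PTR, const int *INDEX, const double *DA,
                      const int *perm,
                      int *PTR2, int *INDEX2, double *DA2) {
    int nnz = 0;
    PTR2[0] = 0;
    for (int i = 0; i < N; i++) {
        int old = perm[i];
        for (int j = PTR[old]; j < PTR[old+1]; j++) {
            INDEX2[nnz] = INDEX[j];
            DA2[nnz]    = DA[j];
            nnz++;
        }
        PTR2[i+1] = nnz;
    }
}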


                          Performance (GFlops/s)
                 2 Th            4 Th            8 Th            16 Th
              Orig.  Reord.   Orig.  Reord.   Orig.  Reord.   Orig.  Reord.
av41092       0.54   0.51     0.85   0.78     2.52   0.83     4.29   2.06
e40r0100      1.14   1.16     2.53   2.57     5.02   5.09     9.46   9.52
exdata_1      0.35   0.46     0.24   1.24     0.63   5.06     1.22   11.40
garon2        1.20   1.21     2.28   2.39     3.73   4.68     6.88   8.98
gyro_k        0.70   0.82     2.41   2.43     5.12   5.32     9.92   9.96
mixtank_new   0.57   0.58     1.03   1.49     2.29   4.84     6.68   10.38
msc10848      0.45   0.63     2.00   2.52     5.00   6.18     10.03  11.73
nd3k          0.52   0.52     0.77   0.88     2.42   3.12     9.39   10.25
nmos3         1.09   1.11     2.18   2.19     4.22   4.30     8.14   8.15
pct20stif     0.49   0.51     0.92   1.01     3.73   3.73     9.33   9.60
psmigr_1      1.48   1.25     2.98   2.66     5.06   4.16     7.83   7.51
rajat15       0.85   0.85     1.60   1.70     2.98   3.31     5.67   6.21
sme3Da        0.90   0.95     2.61   2.76     4.79   5.61     7.40   10.53
syn12000a     0.53   0.56     2.13   2.20     5.88   5.88     11.93  11.95
tsyl201       0.44   0.48     1.05   1.06     4.56   4.75     11.20  11.84

Table 2: Influence of the locality improvement technique.

Table 2 shows the results of applying the reordering technique using different numbers of threads. Note that no NUMA optimizations (explicit data and thread allocation) were additionally applied. According to the results, we can conclude that locality (as we expect) has a great influence on the SpMV performance. Significant improvements are achieved when applying the locality optimization technique. For example, the performance of the reordered exdata_1 matrix using eight threads is 8x higher than that obtained by the original one, increasing from 0.63 GFlops/s to 5.06 GFlops/s (see Table 2). Note that as the number of threads increases, the performance improvements due to the locality optimization become more important. This way, the average improvement is 1.1%, 9.0%, 15.1% and 17.2% for two, four, eight and sixteen threads respectively. Only for matrices av41092 and psmigr_1 does the reordering technique not increase the SpMV performance.

Therefore, taking advantage of the Itanium2 caches on FinisTerrae is critical, because accesses to memory (local, remote or interleaved) incur a high penalty in comparison with other systems [10].

4.5 Compiler options

For all the results shown in this section, as Section 3 points out, the SpMV code is compiled using the optimization flag -O2. Now, we will evaluate the performance of the code using more aggressive compiler optimizations such as -O3, and compiler optimizations for the specific CPU (in this case, flag -mtune=itanium2-p9000 for Dual-Core Itanium2 processors) [11]. In particular, using -O3 enables the -O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations.

Figure 8 shows the SpMV performance (for 4 and 8 threads) comparing the code compiled using -O2 with respect to the code compiled using the best optimization options found. Note that if the -O2 code obtains the best results, the performance obtained using other compilation options is not displayed. According to the results, several


Figure 8: Influence of the compiler optimizations on the SpMV performance (GFLOPS/s), comparing -O2 with the best compiler optimization options found, using 4 and 8 threads.

                          Performance (GFlops/s)
                 2 Th            4 Th            8 Th            16 Th
              Orig.  Opt.     Orig.  Opt.     Orig.  Opt.     Orig.  Opt.
av41092       0.54   0.72     0.85   1.12     2.52   2.53     4.29   -
e40r0100      1.14   1.23     2.53   2.55     5.02   5.04     9.46   9.52
exdata_1      0.35   1.15     0.24   2.09     0.63   8.61     1.22   17.98
garon2        1.20   1.22     2.28   2.40     3.73   4.69     6.88   8.98
gyro_k        0.70   0.92     2.41   2.57     5.12   5.40     9.92   9.95
mixtank_new   0.57   0.64     1.03   1.58     2.29   5.17     6.68   10.62
msc10848      0.45   1.19     2.00   3.35     5.00   7.11     10.03  13.38
nd3k          0.52   1.26     0.77   1.51     2.42   4.66     9.39   14.92
nmos3         1.09   1.10     2.18   2.21     4.22   4.30     8.14   8.15
pct20stif     0.49   0.65     0.92   1.20     3.73   4.18     9.33   9.60
psmigr_1      1.48   1.80     2.98   3.42     5.06   5.68     7.83   10.07
rajat15       0.85   0.86     1.60   1.70     2.98   3.33     5.67   6.21
sme3Da        0.90   1.06     2.61   2.80     4.79   5.62     7.40   10.67
syn12000a     0.53   1.24     2.13   3.30     5.88   7.06     11.93  13.56
tsyl201       0.44   1.17     1.05   1.85     4.56   6.98     11.20  14.01

Table 3: Summary of the SpMV performance results.

observations can be made.

There are six matrices for which the -O2 flag is always the best option (matrices e40r0100, garon2, gyro_k, nmos3, pct20stif and rajat15). In the other cases, using different compiler optimization options increases the performance noticeably. Specifically, the improvements are mainly due to -O3, because the compiler optimizations for the specific CPU have a small influence, improving the performance only about 1% on average.

Moreover, unlike the influence of the locality optimization technique detailed in Section 4.4, the impact of the compiler options on the performance decreases with the number of running threads. For instance, the performance of the optimized code for matrix exdata_1 is 3x higher with respect to the -O2 case when using four threads, while, for example, the best performance improvement using eight threads is 1.8x for matrices mixtank_new and tsyl201.

4.6 Summary

Finally, a summary of the results discussed in previous sections is presented. Table 3 shows the SpMV performance without optimizations and when the optimizations


detailed previously are applied together (that is, data and thread allocation, locality improvement and compiler optimization options). First, we must highlight that for all the considered matrices some improvement was achieved (at least for a particular number of threads). The average SpMV performance when the optimizations are applied is 1.1 GFlops/s, 2.2 GFlops/s, 5.4 GFlops/s and 10.7 GFlops/s using two, four, eight and sixteen threads respectively. This implies an average improvement of 44.2%, 31.5%, 38.6% and 35.1% with respect to the non-optimized case. As an example, the optimizations for matrix exdata_1 when using sixteen threads multiply the performance achieved without optimizations by 14.7 (from 1.2 GFlops/s to 17.9 GFlops/s). The average performance with optimizations amounts to about 12% of the peak performance of the Itanium2 Montvale cores (that is, 6.4 GFlops/s per core).

Moreover, results show an average speedup (in comparison with the use of two threads without optimizations) of 3.4x, 9x and 18.4x for four, eight and sixteen threads respectively. Matrix exdata_1 is again the most noticeable case, with a speedup of up to 51x when using sixteen threads.

Another important aspect in high performance computing is the power efficiency. In order to evaluate it for our system we have calculated the performance-Watts ratio. The average performance of an rx7640 node (that is, using sixteen threads) is 10.7 GFlops/s. Additionally, a Montvale Itanium2 processor consumes 104 W according to the manufacturer specifications. Therefore, the ratio is 12.9 (dividing the 10.7 GFlops/s by the eight processors of the node at 104 W each, i.e. about 12.9 MFlops/s per Watt). If we compare this value with other systems when using SpMV as benchmark, our system is near the top of the ranking [4]. Montvale processors outperform AMD Dual-Core Opteron, Intel Quad-Core Clovertown and Sun Niagara2 processors, and are only beaten by the IBM Cell. Note that the sparse matrices used as testbed in the other systems are different from ours.

5 Conclusions

FinisTerrae is an SMP-NUMA system with more than 2500 processors installed at Galicia Supercomputing Center (Spain). In this paper the sparse matrix-vector product (SpMV) is evaluated on this system. Several topics are studied.

We have estimated the influence of data and thread allocation on the SpMV performance. With respect to the data allocation, we conclude that when threads are mapped to cores in the same cell, it is better to allocate the data in the memory module of the same cell. However, if cores in both cells must be used, the allocation of the data in the interleaved memory makes sense because accesses to remote memory are very costly. With respect to the thread allocation, results point out that the contention of the bus to memory has a great impact on the SpMV performance. A comparison between the OS scheduler and an explicit data-thread allocation is also provided, demonstrating that explicit data and thread allocation outperforms the scheduler.

Due to the indirect and irregular memory access patterns of SpMV, we have also studied the influence of the memory hierarchy on the performance. With this purpose we have evaluated a locality optimization technique specially tuned for the SpMV on multicore processors. We conclude that locality has a great influence on the SpMV


performance. Therefore, taking advantage of the Itanium2 caches on FinisTerrae is critical, because accesses to memory incur a high penalty in comparison with other systems.

Additionally, different parallelization strategies and compiler options were considered. Finally, another important issue in high performance computing, the power efficiency, is considered. Our system is near the top of the ranking in comparison with other processors.

According to the behavior observed in this study, a set of optimizations specially tuned for FinisTerrae were successfully applied to SpMV. Noticeable improvements are obtained with respect to the SpMV naïve implementation.

References

[1] Galicia Supercomputing Center (CESGA). http://www.cesga.es.

[2] The TOP500 Supercomputing Sites. http://www.top500.org/.

[3] T. Klug, M. Ott, J. Weidendorfer, and C. Trinitis. autopin - Automated optimization of thread-to-core pinning on multicore systems. Trans. on HiPEAC, 3(4), 2008.

[4] S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel. Optimization of sparse matrix-vector multiply on emerging multicore platforms. In Proc. of Supercomputing (SC), 2007.

[5] G. Goumas, K. Kourtis, N. Anastopoulos, V. Karakasis, and N. Koziris. Understanding the performance of sparse matrix-vector multiplication. In Proc. of the Euromicro Conf. on Parallel, Distributed and Network-Based Processing, pages 283-292, 2008.

[6] Hewlett-Packard Company. HP Integrity rx7640 Server Quick Specs.

[7] Y. Saad. Iterative methods for sparse linear systems. SIAM, 2003.

[8] T. Davis. University of Florida Sparse Matrix Collection. NA Digest, 97(23), June 1997. http://www.cise.ufl.edu/research/sparse/matrices.

[9] J. C. Pichel, D. E. Singh, and J. Carretero. Reordering algorithms for increasing locality on multicore processors. In 10th IEEE Int. Conf. on High Performance Computing and Communications, pages 123-130, 2008.

[10] S. R. Alam, R. F. Barret, M. R. Fahey, J. A. Kuehn, O. E. Bronson, R. T. Mills, P. C. Roth, J. S. Vetter, and P. H. Worley. An evaluation of the Oak Ridge National Laboratory Cray XT3. Int. Journal of High Performance Computing Applications, 22(1):52-80, 2008.

[11] Intel. Intel C++ Compiler User and Reference Guides.


Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009
30 June, 1–3 July 2009.

Optimal Extended Optical Flow and Statistical Constraint

Martine Picq1, Jerome Pousin1 and Patrick Clarysse2

1 Université de Lyon, CNRS, INSERM, INSA-Lyon, ICJ UMR5208, F-69100 Villeurbanne Cedex

2 INSA-LYON CREATIS-LRMN UMR 5008,

emails: [email protected], [email protected],[email protected]

Abstract

This work is motivated by the necessity to improve heart image tracking. This technique is related to the ability to generate an apparent continuous motion, observable through intensity variation, from one starting image to an ending one, both of which are supposed segmented. Given two images ρ0 and ρ1, we calculate an evolution process ρ(t, ·) which transports ρ0 to ρ1 by using the optimal extended optical flow. Such a strategy has been found well suited to heart image tracking, provided the motion is controlled by a statistical model. In this paper we use viability theory to give sufficient conditions for handling the optimal extended optical flow subject to a point-wise statistical constraint using Parzen's approximation. The strategy is implemented in a 1D case and numerical results are presented showing the efficiency of the proposed strategy.

1 Introduction

Modern medical imaging modalities can provide a great amount of information to study the human anatomy and physiological functions in both space and time. In cardiac Magnetic Resonance Imaging (MRI), for example, several cross-section images, or slices, can be acquired to map the heart in 3D and at a collection of discrete time samples over the cardiac cycle. From these partial observations, and in order to study the heart's dynamics, the challenge is to somehow extrapolate from these input data throughout the cardiac cycle [7], [13].

Image interpolation in the time domain deals with the metamorphosis of one image into another. It is a technique widely used in motion pictures. Given a pair of images, continuous image registration aims to find a sequence of intermediate images, such that the first image in the sequence is equal to the first given image (starting image) and the last image is equal to the second given image (ending image). The challenge is to find a precise process to interpolate the image intensity functions between the two images.


There have been many algorithms proposed for image warping. Some of the most popular approaches are mesh warping [16] or field warping [3]. Roughly speaking, they consist in minimizing some kind of statistical dissimilarity. The idea of using optical flow, based on the brightness invariance hypothesis, has been widely used. Giving an exhaustive bibliography is out of the scope of this article; let us mention for example [5] [12]. In these studies, smoothing terms or comparison terms, based on statistics or not, are added to the functional to be minimized. In many works involving optical flow and statistics, the optical flow equation (1) is considered as a constraint to be satisfied in which the velocity v is unknown. To specify the velocity v, a statistical dissimilarity, evaluated with the images, has to be minimized.

This work proposes an OF method integrating a statistical constraint expressed locally: an approach combining optimal extended optical flow, i.e. the conservation of the "mass of gray level" equation, where the optimal velocity v of the optical flow equation is determined by minimizing the kinetic energy for transporting the starting image towards the ending image. A medical image dataset in one dimension will be considered at the end of Section 4 for evaluating the method.

In this paper, we propose a rigorous methodology to combine the EOF constraint with statistical constraints on the transportation process. This methodology is applied to the estimation of the heart's dynamics from cardiac imaging data. Statistical information on the heart's dynamics, extracted from exemplar data, is used to constrain the warping between two boundary conditions, the initial and target images, from a dynamic image sequence.

The paper is organized as follows. Section 2 is dedicated to the extended optical flow model. Section 3 deals with the statistical constraint built with a Parzen density approximation. In Section 4, a necessary viability condition is proposed for the extended optical flow submitted to statistical constraints. Section 5 is devoted to numerical examples. To circumvent technical difficulties, since for space dimensions greater than one the optimal mass transport transformation is considerably more complex, only the one-dimensional space will be considered in the numerical experiments, and thus images are reduced to curves.

2 Optimal Extended Optical Flow method

2.1 The O.E.O.F. principle

If we denote by ρ the intensity function, and by v the velocity of the apparent motion of the brightness pattern, an image sequence is considered via the gray-value map ρ : Q = (0, 1) × Ω → IR, where Ω ⊂ IR^d, the support of the images, for d = 1, 2, 3, is a bounded Lipschitz domain. If the points of the image move according to the velocity field v : Q → IR^d, then the gray values ρ(t, X(t, x)) are constant along the motion trajectories X(t, x). This is expressed by the standard optical flow equation:

∂tρ(t, X(t, x)) + v · ∇xρ(t,X(t, x)) = 0. (1)

The assumption that the pixel intensity does not change during the movement is in some cases too restrictive. A weakened assumption, sometimes called the extended optical flow,


replaces intensity preservation by mass preservation, which reads:

∂tρ + v · ∇xρ + div (v)ρ = 0. (2)

The previous equations lead to an ill-posed problem for the unknown (ρ, v). Variational formulations or relaxed minimizing problems for computing (ρ, v) jointly have been proposed first by [5] and then by many other authors. Here our concern is somewhat different. Finding (ρ, v) simultaneously is possible by solving the optimal mass transport problem (see Problem (3)). Let ρ0 and ρ1 denote the cardiac images at two times arbitrarily fixed to zero and one; the mathematical problem reads: find ρ, the gray level function defined on Q with values in [0, 1], verifying

∂_t ρ(t, x) + div(v(t, x) ρ(t, x)) = 0 in (0, 1) × Ω;   ρ(0, x) = ρ0(x);   ρ(1, x) = ρ1(x).    (3)

The velocity function v is determined such that it minimizes the functional:

inf_{ρ,v} ∫_0^1 ∫_Ω ρ(t, x) ‖v(t, x)‖^2 dx dt.    (4)

Thus we get an image sequence through the gray-value map ρ.

In the next paragraph, we show the way in which the velocity v is computed in the 1-D case. For general properties of optimal transportation, the reader is referred to the book of C. Villani [15]. Let us consider a positive continuous function (ρ0, Ω0) and a continuous function (ρ1, Ω1) bounded from above and from below by positive constants. Here Ω0 and Ω1 are bounded intervals. Assume that

∫_{Ω0} ρ0(x) dx = ∫_{Ω1} ρ1(y) dy.

Consider M : Ω0 → Ω1 a measure-preserving map:

∫_{Ω0} f(M(x)) ρ0(x) dx = ∫_{Ω1} f(y) ρ1(y) dy   ∀f ∈ C^0(IR; IR);    (5)

with the requirement that:

M = arginf_u ∫_{Ω0} |u(x) − x|^2 ρ0(x) dx.    (6)

Since we deal with intervals and with continuous functions, M is an increasing diffeomorphism. A mapping M defined as above is a redistribution of mass between ρ0 and ρ1 and verifies:

ρ0 = M′ (ρ1 ∘ M).    (7)

The density ρ1 is the image measure of ρ0 by M. The time interpolation density ρ(t, x) is computed according to the following result due to Benamou and Brenier [6]:


Theorem 2.1 The Kantorovich-Wasserstein distance between ρ0 and ρ1 verifies:

d^2(ρ0, ρ1) = inf_{ρ,v} ∫_Ω ∫_0^1 ρ(t, x) |v(t, x)|^2 dt dx,    (8)

where (ρ, v) are a solution to:

0 ≤ ρ;   ∂_t ρ + ∂_x(vρ) = 0 in (0, 1) × Ω;   ρ(0, ·) = ρ0,  ρ(1, ·) = ρ1 in Ω.    (9)

Moreover, the infimum is reached for the couple (ρ, v_M), with M defined by (7) and X(t, x) = (1 − t)x + tM(x), which for all continuous functions f verifies:

∫_Ω ∫_0^1 f(t, x) ρ(t, x) dt dx = ∫_Ω ∫_0^1 f(t, X(t, x)) ρ0(x) dt dx.    (10)

This yields v_M(t, x) = M(X^{-1}(t, x)) − X^{-1}(t, x).

If the functions ρ0, ρ1 are less regular, we have to deal with a weak (integral) formulation instead of the differential equation (7) (see (19) in [6]).

2.2 Numerical Approximation of the 1-D Optimal Extended Optical Flow

Set Ω = (0, 1). We introduce a subdivision {x_j}_{j=0}^{J} of the interval Ω where, J being fixed, for 0 ≤ j ≤ J, x_j = jΔx with Δx = 1/J. The following implicit Euler scheme is used to approximate the solution to Problem (7):

M^{j+1} = M^j + Δx ρ0(x_{j+1}) / ρ1(M^{j+1})   for 0 ≤ j ≤ J − 1;   M^0 = 0.    (11)

Then Problem (9) is approximated with an upwind finite difference scheme according to the velocity v_M.
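A minimal C sketch of scheme (11), under our own assumptions (the implicit equation at each step is solved by a few fixed-point iterations, and rho0, rho1 stand for the two profiles sampled on the grid), could look as follows.

/* Implicit Euler scheme (11): builds the transport map M on the grid
   x_j = j*dx, assuming rho1 is bounded away from zero.               */
void build_map(int J, double dx,
               double (*rho0)(double), double (*rho1)(double), double *M) {
    M[0] = 0.0;
    for (int j = 0; j < J; j++) {
        double m = M[j];                    /* initial guess              */
        for (int it = 0; it < 50; it++)     /* fixed-point iterations for */
            m = M[j] + dx * rho0((j + 1) * dx) / rho1(m); /* the implicit step */
        M[j + 1] = m;
    }
}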

3 Statistical constraint

To track the heart motion from two images, it is possible to directly apply the method described in the previous paragraph. However, such a strategy may lead to unsatisfactory results because some parts of the heart shape may change differently. The continuity constraint on the overall motion of the heart's structure is too general. From a reference data set of some healthy volunteers, a statistical dynamic model of the heart can be estimated, see [2], [4]. For instance, a statistical model is built by using the 2D cardiac synthetic image generator ASSESS [9]. This technique is thus related to the ability to generate an apparent continuous motion, observable through intensity variation, from one starting image to an ending one, both supposed segmented. Given two images ρ0 and ρ1, we calculate an evolution process ρ(t, ·) which transports ρ0 to ρ1 by using the optimal extended optical flow. The viability theory is used to give sufficient conditions for handling the optimal extended optical flow subject to a point-wise statistical constraint using Parzen's approximation.


3.1 Statistical data modeling

We deal with statistics of P discretized curves at the collection points x_j, 0 ≤ j ≤ J, and at the time points t_n, 0 ≤ n ≤ N, denoted by S^{nj}_p. Let us introduce the following hypothesis:

H1 For each fixed parameter (t_n, x_j), {S_p(t_n, x_j) = S^{nj}_p}_{p=1}^{P} are considered as realizations of real random variables S^{nj}_1, S^{nj}_2, ..., S^{nj}_P valued in (0, 1), independent and having the same probability law, which admits a uniformly continuous density g^{nj} with respect to the Lebesgue measure.

Let us approximate g^{nj}(·), the density of the probability law, with a Gaussian Parzen kernel:

g^{nj}(r, s) = (1 / (P r √(2π))) ∑_{p=1}^{P} e^{−(s − S^{nj}_p)^2 / (2 r^2)}   ∀s ∈ IR;    (12)

r = P^{−η}, where η can be chosen such that the approximated probability density function g^{nj}(r, ·) converges, when P goes to infinity, towards the probability density function g^{nj}, with an asymptotic estimate of the mean squared error E_P[(g^{nj}(r, s) − g^{nj}(s))^2] ≤ C P^{−4/5} ([14]).

When (12) is used as an estimator, an important point is to specify the value of r, the size of the window. To determine the optimal value of r, it would be necessary to know g^{nj}, which is unknown. Thus, in the literature, replacing it by a Gaussian kernel whose standard deviation σ is computed from the statistical data, the value r = σ (4/(3P))^{1/5} is proposed.
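As an illustration only (assuming the samples S_p at a fixed (t_n, x_j) are stored in an array S), the Parzen estimate (12) and the bandwidth rule above can be coded as follows.

#include <math.h>

/* Gaussian Parzen estimate (12) of the density at point s,
   from P samples S[0..P-1] with window size r.              */
double parzen(const double *S, int P, double r, double s) {
    double g = 0.0;
    for (int p = 0; p < P; p++) {
        double d = s - S[p];
        g += exp(-d * d / (2.0 * r * r));
    }
    return g / (P * r * sqrt(2.0 * M_PI));
}

/* Rule-of-thumb window size r = sigma * (4/(3P))^(1/5). */
double parzen_window(const double *S, int P) {
    double mean = 0.0, var = 0.0;
    for (int p = 0; p < P; p++) mean += S[p];
    mean /= P;
    for (int p = 0; p < P; p++) var += (S[p] - mean) * (S[p] - mean);
    var /= (P - 1);                      /* empirical variance */
    return sqrt(var) * pow(4.0 / (3.0 * P), 0.2);
}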

3.2 Statistical Constraint

By means of spline interpolation in time and space, from the family of points {S_p(t_n, x_j)}_{p=1}^{P}, we define a C^1 family of applications {S_p(t, x)}_{p=1}^{P} defined on Q, valued in (0, 1), which gives rise to a C^1 density function g(t, x, s) according to (12). The constraint expresses that the solution ρ of the Optimal Extended Optical Flow (9) is a realization close to the expected value of the random variable S, the approximated density g of which has been computed with a Parzen kernel from the statistical data. Let σ be an upper bound of the empirical standard deviation of the statistical data {S^{nj}_p}_{p=1}^{P}, and let S̄ be the expected value of S. We require that the probability of the following event verifies:

P(|ρ(t, x) − S̄(t, x)| ≤ kσ) ≥ 1, for any (t, x) ∈ Q,    (13)

where k is a positive number. From Tchebychev's inequality, we know that

P(|S(t, x) − S̄(t, x)| ≤ kσ) ≥ 1 − 1/k^2,   ∀(t, x) ∈ Q.    (14)

Since the intersection of the events involved in (13) and (14) is included in the event involved in (15), combining (13) and (14) leads to, ∀(t, x) ∈ Q,

P(|ρ(t, x) − S(t, x)| ≤ 2kσ) ≥ 1 − 1/k^2.    (15)


In order to satisfy the necessary condition (15), we introduce a function h defined by: ∀(t, x) ∈ Q,

h(t, x, u) = ∫_{ρ0(x)−2kσ}^{ρ0(x)+2kσ} g(0, x, s) ds − ∫_{u−2kσ}^{u+2kσ} g(t, x, s) ds   ∀u ∈ IR.    (16)

Finally the following constraint is imposed on ρ, the solution to the O.E.O.F. equation:

h(t, x, ρ(t, x)) ≤ 0 for any (t, x) ∈ Q.    (17)
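Since g is a sum of Gaussian kernels, the two integrals in (16) can be evaluated in closed form with the error function; the sketch below (our illustration, with S_t[p] and S_0[p] standing for the interpolated curves S_p(t, x) and S_p(0, x) at the point considered) computes h(t, x, u) that way.

#include <math.h>

/* Integral over [a,b] of one normalized Gaussian kernel centered at c. */
static double gauss_mass(double a, double b, double c, double r) {
    return 0.5 * (erf((b - c) / (r * sqrt(2.0))) -
                  erf((a - c) / (r * sqrt(2.0))));
}

/* Constraint function h(t,x,u) of (16), with P Parzen samples at time t
   (S_t) and at time 0 (S_0), window r, half-width w = 2*k*sigma.        */
double constraint_h(const double *S_t, const double *S_0, int P,
                    double r, double w, double rho0_x, double u) {
    double h0 = 0.0, ht = 0.0;
    for (int p = 0; p < P; p++) {
        h0 += gauss_mass(rho0_x - w, rho0_x + w, S_0[p], r);
        ht += gauss_mass(u - w, u + w, S_t[p], r);
    }
    return (h0 - ht) / P;   /* the 1/P factor of (12) is common to both terms */
}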

4 Statistical viability and O.E.O.F.

The theory of viability allows us to handle point-wise non-convex constraints. More precisely, this theory gives point-wise conditions which ensure that, for (t, x) ∈ Q = [0, 1] × Ω, the graph of the O.E.O.F. solution (t, x, ρ(t, x)) evolves in a given domain of constraint D which need not be convex. When these conditions are not satisfied, we compute a relaxed solution of the transport equation whose graph stays in the domain D. Then, to simultaneously use the O.E.O.F. from ρ0 to ρ1 and to take account of the statistical model S^{nj}_p, we apply these viability results to the O.E.O.F. equation with D = h^{-1}(]−∞, 0]), where h is defined in (16). This allows us to take into account the statistical model S^{nj}_p. When the O.E.O.F. solution is not compatible with the statistical model S^{nj}_p, the relaxed solution of the O.E.O.F. depends on the residues of the O.E.O.F. for the P realizations (t, x) ∈ Q ↦ S_p(t, x) defined in Section 3.2.

4.1 Viability of the Transport Equation

Basically, the following results have been given by Nagumo (1942) and widely expanded by Aubin [1], [8], who gave an exhaustive bibliography. The theory of viability needs the notion of contingent cones, largely developed in [11]. For numerical simulations, we need to compute contingent cones. However, the calculus of contingent cones remains somewhat difficult to handle except for convex subsets. So, although the constraint in viability theory need not be convex, computation is feasible when it is the pre-image of a convex subset. We limit ourselves to choosing a constraint which is the pre-image of a convex subset through a regular application h, which can be vector-valued. Here, under restrictive hypotheses adapted to the studied cases, we use results proved in [10] applied to the application h and the point-wise constraint:

h(t, x, u) ∈ D where D =]−∞, 0] (18)

Let Ω be a bounded open Lipschitz subset of IR^d, Q = ]0, 1[ × Ω, and Q_1 an open subset of IR^{d+1} which contains Q. We consider an application h in C^1(Q_1 × IR, IR^m) with m < d + 2, D a convex closed subset of IR^m and the point-wise constraint:

which contains Q. We consider an application h in C1(Q1 × IR, IRm) with m < d + 2, D aconvex closed subset of IRm and the point-wise constraint :

h(t, x, u) ∈ D or (t, x, u) ∈ h−1(D) (19)

Let ρ0 and ρ1 be continuous on Ω and bounded from above by one. Suppose ρ1 is Lipschitz and bounded below by a positive constant; then (t, x) ↦ v(t, x) is C^1. We introduce the time-space transport operator

T ρ = β(t, x) · ∇_{t,x} ρ,

with the C^1 velocity β(t, x) = (1, v(t, x)), associated with an initial condition at time t = 0.


Theorem 4.1 (Viability for regular Transport, [10] Chap. II) Assume that f is bounded and continuous on Q_1 × IR, and Lipschitz with respect to u. If ρ0 satisfies the constraint

∀x ∈ Ω, (0, x, ρ0(x)) ∈ h^{-1}(D),

then the classical solution ρ ∈ C^1(Q, IR) to the transport equation

T ρ(t, x) = f(t, x, ρ(t, x)) in Q;   ρ(0, x) = ρ0(x),    (20)

satisfies the constraint (19), provided h is of maximal rank on Q_1 × IR (i.e. Dh is surjective) and there exists ε positive such that, for all z = (t, x, u) ∈ U with z ∉ h^{-1}(D), we have:

∇d_D · J_h(z) F(z) + ε ≤ 0,    (21)

where F(z) = (1, v(t, x), f(z))^T, J_h is the Jacobian matrix of h and d_D is the distance to D.

In the 1-D situation, condition (21) reduces to

∇h(z) · F (z) + ε ≤ 0. (22)

Remark that the O.E.O.F. solution is a solution to the transport equation (20) with f(t, x, ρ) = −∇_x v(t, x) ρ, truncated at a sufficiently large level not reached by f for ρ ∈ [0, 1]. Assuming the hypotheses of Theorem 4.1 are satisfied, the O.E.O.F. solution then satisfies the constraint h(t, x, ρ(t, x)) ∈ D.

4.2 Computation of a Relaxed Solution

When condition (21) is not satisfied, we define a relaxed transport equation which has a solution ρn satisfying the constraint (17). Set z = (t, x, u) ∈ U, fix n ∈ IN and ε ∈ IR^{+*}, and define the functions ϕn, G and fn by:

ϕn(z) = [1 − n (h(z))^-]_+ ,
G(z) = [∇h(z) · F(z) + ε sgn(h(z))]_+ / ∂_3 h(z) ,
fn(z) = f(z) − ϕn(z) G(z).   (23)

Let us mention that, when the constraint is not satisfied, our strategy is not to change the velocity (and thus the flow) but to change the right-hand side of the transport equation so that condition (22) is satisfied, which leads to the above definition of G.

Lemma 4.2 (Relaxed viability for the O.E.O.F.). Assume the hypotheses of Theorem 4.1 are satisfied and, in particular, that ∂_3 h does not vanish on Q1 × IR. Let n and ε be fixed and define fn by fn(z) = f(z) − ϕn(z) G(z). Then the solution ρn to

>ρn = fn(ρn) in Q;   ρn(0, ·) = ρ0 in Ω,   (24)

satisfies the constraint h(t, x, ρn(t, x)) ∈ D.


Since the solution of Problem (24) is characterized as a fixed point, let us give the algorithm we propose to compute it.

• Assuming ρ_n^k is known, compute ρ_n^{k+1} as the solution to

>ρ_n^{k+1} = fn(·, ·, ρ_n^k) for (t, x) ∈ Q;   ρ_n^{k+1}(0, x) = ρ0(x) for x ∈ Ω.   (25)

• If ‖ρ_n^{k+1} − ρ_n^k‖_{L^2(Q)} ≤ precis, stop; else set k = k + 1 and repeat.
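A minimal Python sketch of this fixed-point loop is given below; it is an illustration under assumptions, not the authors' MATLAB implementation. The callable transport_solve is a hypothetical placeholder for any solver of the linear transport problem (25), and f_n stands for the relaxed right-hand side (23) evaluated on the space-time grid.

```python
import numpy as np

def relaxed_oeof_fixed_point(rho0, f_n, transport_solve, dt, dx,
                             tol=1e-6, max_iter=100):
    """Fixed-point iteration (25) for the relaxed transport problem.

    rho0            : 1-D array, initial profile on the spatial grid
    f_n(rho)        : returns the relaxed right-hand side (23) on the (t, x)
                      grid for a frozen iterate rho (2-D array, time x space)
    transport_solve : hypothetical placeholder for a solver of
                      >rho = rhs in Q, rho(0, .) = rho0, returning a 2-D array
    """
    nt = 1 + int(round(1.0 / dt))
    rho_k = np.tile(rho0, (nt, 1))                 # crude initial guess
    for _ in range(max_iter):
        rho_next = transport_solve(f_n(rho_k), rho0)
        # discrete L2(Q) norm of the difference between successive iterates
        err = np.sqrt(np.sum((rho_next - rho_k) ** 2) * dt * dx)
        if err <= tol:
            break
        rho_k = rho_next
    return rho_k
```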

For completeness, let us end this section with the expression of ∇h · F:

e_p(0, x, ρ0) = exp(−(ρ0 + 2kσ − S_p(0, x))² / 2r²) − exp(−(ρ0 − 2kσ − S_p(0, x))² / 2r²),
e_p(t, x, u) = exp(−(u + 2kσ − S_p(t, x))² / 2r²) − exp(−(u − 2kσ − S_p(t, x))² / 2r²),

∇h(t, x, u) · F(t, x, u) = (1 / (√(2π) r P)) Σ_{p=1}^{P} { e_p(t, x, u) [ >S_p(t, x) − f(t, x, u) ] − e_p(0, x, ρ0) [ >S_p(0, x) ] }.   (26)
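A direct transcription of (26) into code can clarify how the residues of the P realizations enter the relaxed equation. The sketch below is illustrative only: the callables S, S_transport and f, as well as the default value of r (not specified in this excerpt), are assumptions; k and σ take the values quoted in Section 5.

```python
import numpy as np

def grad_h_dot_F(t, x, u, rho0_x, S, S_transport, f,
                 k=1.15, sigma=0.1051, r=0.1):
    """Evaluate (26) at a single point (t, x, u).

    S[p](t, x)           : p-th learned realization S_p (assumed callable)
    S_transport[p](t, x) : >S_p, the transport operator applied to S_p (assumed)
    f(t, x, u)           : right-hand side of the transport equation
    k, sigma             : values quoted in Section 5; r is a placeholder."""
    def e_p(tt, xx, uu, Sp):
        a = np.exp(-(uu + 2 * k * sigma - Sp(tt, xx)) ** 2 / (2 * r ** 2))
        b = np.exp(-(uu - 2 * k * sigma - Sp(tt, xx)) ** 2 / (2 * r ** 2))
        return a - b

    P = len(S)
    total = 0.0
    for Sp, TSp in zip(S, S_transport):
        total += e_p(t, x, u, Sp) * (TSp(t, x) - f(t, x, u)) \
               - e_p(0.0, x, rho0_x, Sp) * TSp(0.0, x)
    return total / (np.sqrt(2 * np.pi) * r * P)
```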

5 Numerical simulations

The method is illustrated with 1-D profiles (Figure 2) extracted from the end-diastolic and end-systolic time points of cardiac Magnetic Resonance images (Figure 1).

Figure 1: End of diastole (a), end of systole (b).

Figure 2: The two profiles (c), plotted against the x axis: ρ(0, x), the end-of-diastole profile, and ρ(ts, x), the end-of-systole profile.


Figure 3: O.E.O.F solution: initial and target profiles plotted against the x axis (data cursor at x = 19, y = 57, z = 0.1968).

The objective is to transport the initial end-diastolic profile to the target end-systolic profile. The direct application of the O.E.O.F method is illustrated in Figure 3.

One can observe the exact warping between the two profiles. However, in every application, and notably in the cardiac imaging application, one knows that the warping process based on the transport equation may not be realistic. Therefore, the idea is to incorporate a priori knowledge from specific examples to constrain the warping. The statistical constraint is learned from a collection of examples generated with an analytical left-ventricle cardiac motion model. The model was originally designed to generate synthetic cardiac MR images in the context of the evaluation of motion estimates. A collection of 100 examples was then generated by assigning a probability law to two of the model's parameters, namely the internal and external radii of the left ventricle. The average model computed from the 100 samples is given in Figure 4 (left); the solution obtained with the proposed statistically constrained method is given in Figure 4 (right).

The perturbation of the right-hand side of the equation is shown in Figure 5.

We have a family of 100 curves, sampled at the points {x_j}_{j=1}^{100} at each time step {t_n}_{n=1}^{100}. Computing the mean of these data provides the mean dynamical model; its standard deviation is σ = 0.1051. The function h is computed with the value k = 1.15. The algorithm of Section 4.2 is applied, and the O.E.O.F solution subject to the statistical point-wise constraint is obtained after 20 nonlinear iterations. The computing time on a standard laptop using MATLAB is about 7 minutes. It should be mentioned that no effort has been made to optimize the computational cost.
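For illustration, the mean dynamical model and a pooled standard deviation can be obtained from the learning set as sketched below; the array layout and the pooling rule are assumptions, since the text does not detail how σ is estimated.

```python
import numpy as np

def mean_model_and_sigma(samples):
    """samples: assumed array of shape (100, 100, 100) = (realization, time, space).

    Returns the mean dynamical model S_m(t_n, x_j) and one plausible pooled
    estimate of the standard deviation across the 100 realizations."""
    S_mean = samples.mean(axis=0)
    sigma = samples.std(axis=0).mean()
    return S_mean, sigma
```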


Figure 4: Mean model from statistics, x = 19, y = 57, z = 0.84 (left: Sm plotted over the x and t axes); constrained O.E.O.F solution, x = 19, y = 57, z = 0.29 (right: profile plotted against the x axis).

Figure 5: Active constraint (perturbation of the right-hand side plotted over the x and t axes against the ρ axis).


5.1 Conclusions

To summarize, in this work we have presented a mathematical method for fusing statistical information with a motion described by a PDE model in order to handle an image tracking problem. The efficiency of the method has been tested on a 1-D problem derived from 2-D medical images, so as to reduce the complexity of the computation of the optimal velocity. Nevertheless, the method can be extended to 2-D or 3-D problems. Furthermore, it makes it easy to locate discrepancies between the analyzed images and the dynamical statistical model.

References

[1] J.-P. Aubin, Viability Theory, Birkhäuser, Bâle, 1991.
[2] B. Delhay, J. Lötjönen, P. Clarysse, T. Katila, I. E. Magnin, "A dynamic 3-D cardiac surface model from MR images", in Computers in Cardiology, IEEE, 2005.
[3] T. Beier and S. Neely, "Feature-based image metamorphosis", in SIGGRAPH, volume 26, pp. 35–42, 1992.
[4] B. Delhay, Estimation spatio-temporelle de mouvement et suivi de structures déformables. Application à l'imagerie dynamique du cœur et du thorax, PhD thesis, INSA de Lyon, Lyon, France, 2006.
[5] G. Aubert, R. Deriche, P. Kornprobst, "Computing optical flow via variational techniques", SIAM J. Appl. Math., 80, January 1999, pp. 156–182.
[6] J.-D. Benamou, Y. Brenier, "A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem", Numerische Mathematik, 84, 2000, pp. 375–393.
[7] M. Lynch, O. Ghita and P. F. Whelan, "Segmentation of the left ventricle of the heart in 3D+t MRI data using an optimised non-rigid temporal model", IEEE TMI, 2008, pp. 195–203.
[8] O. Cârjă, M. Necula and I. Vrabie, Viability, Invariance and Applications, volume 207 of Mathematics Studies, North-Holland, 2007.
[9] P. Clarysse, C. Basset, L. Khouas, P. Croisille, D. Friboulet, C. Odet and I. E. Magnin, "2D spatial and temporal displacement field fitting from cardiac MR tagging", Medical Image Analysis, (3), 2000, pp. 253–268.
[10] M. Picq, Résolution de l'équation du transport sous contraintes, PhD thesis, INSA de Lyon, Lyon, France, March 2007.
[11] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, Springer Verlag, Berlin, 2005.
[12] S. Keeling and W. Ring, "Medical image registration and interpolation by optical flow with maximal rigidity", Jour. Math. Imaging and Vision, 2005, pp. 47–65.
[13] J. Schaerer, P. Clarysse and J. Pousin, "A new perturbation approach for image segmentation tracking", in 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, ISBI'2008, Paris, France, 2008.
[14] A. Tsybakov, Introduction à l'estimation non paramétrique, Mathématiques et Applications, Springer Verlag, 2003.
[15] C. Villani, Topics in Optimal Transportation, volume 58 of Graduate Studies in Mathematics, Amer. Math. Soc., Providence, 2003.
[16] G. Wolberg, "Recent advances in image morphing", in Int. Conf. Comput. Graph., pp. 64–71, IEEE, 1996.


Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

Optimization of a Hyperspectral Image Processing Chain

Using Heterogeneous and GPU-Based Parallel Computing

Architectures

Antonio J. Plaza1, Javier Plaza1, Sergio Sanchez1 and Abel Paz1

1 Department of Technology of Computers and Communications, University of

Extremadura, Avda. de la Universidad s/n, E-10071 Caceres, Spain

emails: [email protected], [email protected], [email protected], [email protected]

Abstract

Hyperspectral imaging is a new technique in remote sensing that generates hundreds of images, at different wavelength channels, for the same area on the surface of the Earth. In recent years, several efforts have been directed towards the incorporation of high-performance computing systems and architectures into remote sensing missions. With the aim of providing an overview of current and new trends in the design of parallel and distributed systems for remote sensing applications, this paper presents two solutions for efficient implementation of a hyperspectral image processing chain based on mixed pixel analysis. The first solution is intended for efficient exploitation of hyperspectral data after being transmitted to Earth, and is tested on a heterogeneous network of workstations at University of Maryland. The second solution is intended for on-board, real-time exploitation of hyperspectral data, and is tested on an NVidia graphics processing unit (GPU). Combined, the two discussed approaches integrate a system with different levels of priority in processing of the hyperspectral data, which can be tuned depending on the specific requirements of the application scenario. The proposed implementations are evaluated using hyperspectral data collected by the Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) operated by NASA/JPL.

Key words: hyperspectral imaging, high performance computing, heterogeneous

parallel computing, commodity graphics hardware, mixed pixel analysis.

1 Introduction

Hyperspectral imaging is concerned with the measurement, analysis, and interpretation of spectra acquired from a given scene (or specific object) at a short, medium or long distance by an airborne or satellite sensor [1]. For instance, NASA is continuously gathering hyperspectral data with Earth-observing sensors such as JPL's Airborne Visible-Infrared Imaging Spectrometer (AVIRIS) [2], able to record the visible and near-infrared spectrum (wavelength region from 0.4 to 2.5 micrometers) of the reflected light of an area 2 to 12 kilometers wide and several kilometers long using 224 spectral bands. The resulting hyperspectral data cube is a stack of images (see Fig. 1) in which each pixel (vector) has an associated spectral signature or fingerprint that uniquely characterizes the underlying objects, and the resulting data volume typically comprises several GBs per flight.

Figure 1: The concept of hyperspectral imaging.

The wealth of spatial and spectral information provided by hyperspectral instruments has opened ground-breaking perspectives in many application domains, including environmental modeling and assessment, target detection for military and defense/security purposes, urban planning and management studies, risk/hazard prevention and response including wild land fire tracking, biological threat detection, and monitoring of oil spills and other types of chemical contamination [3]. Most of the above-cited applications require analysis algorithms able to provide a response in (near) real-time, which is an ambitious goal since the price paid for the rich information available from hyperspectral sensors is the enormous amounts of data that they generate.

The utilization of high performance computing (HPC) systems has become more and more widespread in hyperspectral imaging applications [4]. Although most parallel techniques and systems for image information processing employed by NASA and other institutions during the last decade have chiefly been homogeneous in nature [5], a recent trend in the design of HPC systems for data-intensive problems is to utilize highly heterogeneous computing resources [6]. As shown in previous work [7, 8], networks of heterogeneous computing resources can realize a very high level of aggregate performance in hyperspectral imaging applications. Although remote sensing data


processing algorithms map nicely to heterogeneous networks of computers, these systems are generally expensive and difficult to adapt to on-board data processing scenarios, in which low-weight and low-power integrated components are essential to reduce mission payload and obtain analysis results in real-time, i.e., at the same time as the data is collected by the sensor. In this regard, an exciting new development in the field of commodity computing is the emergence of commodity graphic processing units (GPUs), which can bridge the gap towards on-board processing of remotely sensed data [9].

The main purpose of this paper is to describe the two components (ground and on-board) of an integrated parallel system for efficient processing of remotely sensed hyperspectral data. These two solutions provide different levels of priority in remote sensing data processing, with the ground segment being mainly oriented to information extraction from data sets already transmitted to Earth, while the on-board system can be applied for real-time data processing in time-critical applications. As a case study, we focus on efficient implementation of a standard hyperspectral image processing chain, available in Kodak's Research Systems ENVI software¹, which is one of the most widely used remote sensing software packages. The remainder of the paper is organized as follows. Section 2 describes the hyperspectral processing chain used as a representative case study in this work. Section 3 develops parallel implementations of the hyperspectral processing chain. Section 4 provides an experimental comparison of the proposed implementations using several architectures, including a heterogeneous network of workstations at University of Maryland and an NVidia GeForce 8800 GTX GPU. Section 5 concludes the paper with some remarks and hints at future research lines.

2 Hyperspectral image processing chain

2.1 Problem formulation

Let us assume that a hyperspectral scene with N bands is denoted by F, in which a pixel of the scene is represented by a vector f_i = [f_{i1}, f_{i2}, · · · , f_{iN}] ∈ ℜ^N, where ℜ denotes the set of real numbers in which the pixel's spectral response f_{ik} at sensor channels k = 1, . . . , N is included. Under the linear mixture model assumption, each pixel vector in the original scene can be modeled using the following expression [10]:

f_i = Σ_{e=1}^{E} e_e · a_{e_e} + n,   (1)

where e_e denotes the spectral response of a pure spectral signature (an endmember [11] in hyperspectral imaging terminology), a_{e_e} is a scalar value designating the fractional abundance of the endmember e_e, E is the total number of endmembers, and n is a noise vector. The use of spectral endmembers allows one to deal with the problem of mixed pixels, which arise when the spatial resolution of the sensor is not high enough to separate different materials. For instance, it is likely that the pixel labeled as 'vegetation' in Fig. 1 actually comprises a mixture of vegetation and soil. In this case,

¹ ITT Visual Information Solutions, ENVI User's Guide. Online: http://www.ittvis.com.


the measured spectrum can be decomposed into a linear combination of pure spectral endmembers of soil and vegetation, weighted by abundance fractions that indicate the proportion of each endmember in the mixed pixel. The solution of the linear spectral mixture problem described in (1) relies on the correct determination of a set {e_e}_{e=1}^{E} of endmembers and their corresponding abundance fractions {a_{e_e}}_{e=1}^{E} at each pixel f_i.
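A tiny numerical illustration of the linear mixture model (1) may help fix ideas: a pixel is synthesized as an abundance-weighted sum of endmember signatures plus noise. All values below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
N, E = 224, 3                               # bands, endmembers (illustrative)
endmembers = rng.random((E, N))             # rows: e_1, ..., e_E
abundances = np.array([0.6, 0.3, 0.1])      # fractional abundances, sum to one
noise = 0.01 * rng.standard_normal(N)
pixel = abundances @ endmembers + noise     # f_i as in equation (1)
```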

2.2 Spectral unmixing methodology

A standard approach to decompose mixed pixels in hyperspectral images is linear spectral unmixing, which comprises the following stages. Firstly, a set of spectral endmembers are extracted from the input data set. For this purpose, we have considered a standard algorithm in the literature: the pixel purity index (PPI) algorithm [12]. Then, the fractional abundances of such endmembers in each pixel of the scene are obtained using an inversion process [10]. The inputs to the hyperspectral processing chain considered in this work are a hyperspectral image cube F with N spectral bands and T pixel vectors; the number of endmembers to be extracted, E; a maximum number of projections, K; a cut-off threshold value, vc, used to select as final endmembers only those pixels that have been selected as extreme pixels at least vc times throughout the process; and a threshold angle, va, used to discard redundant endmembers. The chain can be summarized by the following steps:

1. Skewer generation. Produce a set of K randomly generated unit vectors, denoted by {skewer_j}_{j=1}^{K}.

2. Extreme projections. For each skewer_j, all sample pixel vectors f_i in the original data set F are projected onto skewer_j via products of |f_i · skewer_j| to find sample vectors at its extreme (maximum and minimum) projections, forming an extrema set for skewer_j which is denoted by S_extrema(skewer_j).

3. Calculation of pixel purity scores. Define an indicator function of a set S, denoted by I_S(f_i), to denote membership of an element f_i in that particular set, i.e. I_S(f_i) = 1 if f_i ∈ S. Using the indicator function above, calculate the number of times that a given pixel has been selected as extreme using the following equation:

N_times(f_i) = Σ_{j=1}^{K} I_{S_extrema(skewer_j)}(f_i)   (2)

4. Endmember selection. Find the pixels with a value of N_times(f_i) above vc and form a unique set of E endmembers {e_e}_{e=1}^{E} by calculating the spectral angle distance (SAD) for all possible endmember pairs and discarding those which result in an angle value below va. SAD is invariant to multiplicative scalings that may arise due to differences in illumination and sensor observation angle [10]. The SAD between endmember e_i and endmember e_j is defined as follows:

SAD(e_i, e_j) = cos^{-1} ( (e_i · e_j) / (‖e_i‖ · ‖e_j‖) )   (3)


5. Spectral unmixing. For each sample pixel vector f_i in F, a set of abundance fractions {a_{e_1}, a_{e_2}, · · · , a_{e_E}} is obtained using the set of endmembers {e_e}_{e=1}^{E}, so that each f_i can be expressed as a linear combination of endmembers, f_i = e_1 · a_{e_1} + e_2 · a_{e_2} + · · · + e_E · a_{e_E}, thus solving the mixture problem.
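The chain of steps 1–5 can be summarized by the following simplified serial NumPy sketch (an illustration only, not the parallel implementations described in Section 3); the unmixing step uses the unconstrained least-squares inversion (M^T M)^{-1} M^T mentioned in Section 3.2.

```python
import numpy as np

def ppi_unmixing_chain(F, E, K, v_c, v_a, seed=0):
    """Simplified serial sketch of the processing chain (steps 1-5).

    F   : (T, N) array of pixel vectors
    E   : number of endmembers to retain
    K   : number of random skewers
    v_c : purity-count threshold, v_a : SAD threshold (radians)"""
    rng = np.random.default_rng(seed)
    T, N = F.shape

    # 1. Skewer generation: K random unit vectors.
    skewers = rng.standard_normal((K, N))
    skewers /= np.linalg.norm(skewers, axis=1, keepdims=True)

    # 2-3. Extreme projections and pixel purity scores.
    proj = np.abs(F @ skewers.T)                 # (T, K) projection magnitudes
    n_times = np.zeros(T, dtype=int)
    for j in range(K):
        n_times[np.argmax(proj[:, j])] += 1
        n_times[np.argmin(proj[:, j])] += 1

    # 4. Endmember selection: keep pure pixels, drop spectrally redundant ones.
    def sad(a, b):
        c = np.clip(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0)
        return np.arccos(c)

    candidates = F[n_times > v_c]
    endmembers = []
    for cand in candidates:
        if all(sad(cand, e) > v_a for e in endmembers):
            endmembers.append(cand)
        if len(endmembers) == E:
            break
    M = np.array(endmembers).T                   # (N, E) endmember matrix

    # 5. Spectral unmixing by unconstrained least squares: (M^T M)^-1 M^T f_i.
    abundances = np.linalg.pinv(M) @ F.T         # (E, T)
    return M, abundances.T
```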

3 Parallel implementations

3.1 Heterogeneous parallel implementation

Let us assume that a heterogeneous network can be modeled as a complete graph where each node models a computing resource p_i weighted by its relative cycle-time w_i [6]. Each edge in the graph models a communication link weighted by its relative capacity. In the following, we assume for simplicity that all communication links have the same capacity, thus relating our study to a partially heterogeneous network. With the above assumptions in mind, processor p_i should accomplish a share of α_i × W of the total workload, denoted by W, to be performed by a certain algorithm, with α_i ≥ 0 for 1 ≤ i ≤ P and Σ_{i=1}^{P} α_i = 1. With the above definitions in mind, the heterogeneous parallel implementation can be summarized as follows:

1. Obtain necessary information about the heterogeneous system, including the number of available processors P, each processor's identification number {p_i}_{i=1}^{P}, and the processor cycle-times {w_i}_{i=1}^{P}.

2. Let V denote the total volume of data in the original hyperspectral image F. Processor p_i will be assigned a certain share α_i × V of the input volume, where α_i ≥ 0 for 1 ≤ i ≤ P and Σ_{i=1}^{P} α_i = 1. In order to obtain the value of α_i for processor p_i, calculate α_i = (1/w_i) / Σ_{j=1}^{P} (1/w_j).

3. Once the set {α_i}_{i=1}^{P} has been obtained, a further objective is to produce P spatial-domain partitions of the input hyperspectral data set. To do so, we first obtain a partitioning of the hyperspectral data set so that the number of entire pixel vectors allocated to each processor p_i is proportional to its associated value of α_i. Then, we refine the initial partitioning taking into account the local memory associated to each processor [8].

4. Skewer generation. Generate K random unit vectors {skewer_j}_{j=1}^{K} in parallel, and broadcast the entire set of skewers to all the workers.

5. Extreme projections. For each skewer_j, project all the sample pixel vectors at each local partition p onto skewer_j to find sample vectors at its extreme projections, and form an extrema set for skewer_j which is denoted by S^{(p)}_extrema(skewer_j). Now calculate the number of times each pixel vector f_i^{(p)} in the local partition is selected as extreme using the following expression:


N^{(p)}_times(f_i^{(p)}) = Σ_{j=1}^{K} I_{S^{(p)}_extrema(skewer_j)}(f_i^{(p)})   (4)

6. Candidate selection. Each worker now sends the number of times each pixel vector in the local partition has been selected as extreme to the master, which forms a final matrix of pixel purity indices N_times by combining all the individual matrices N^{(p)}_times provided by the workers.

7. Endmember selection. The master selects those pixels with N_times(f_i) > vc and forms a unique set {e_e}_{e=1}^{E} by calculating the SAD for all possible pixel vector pairs and discarding those pixels which result in angle values below va.

8. Spectral unmixing. For each sample pixel vector f_i in F, obtain (in embarrassingly parallel fashion) the set of abundance fractions specified by {a_{e_1}, a_{e_2}, · · · , a_{e_E}}, so that each f_i can now be expressed as a linear combination of the set of endmembers {e_e}_{e=1}^{E}, weighted by their corresponding fractional abundances.
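As a small illustration of the workload-balancing rule in steps 2–3, the shares α_i and the resulting pixel counts can be computed as sketched below (assumed NumPy code, with cycle-times taken from Table 1 purely as example inputs).

```python
import numpy as np

def workload_shares(cycle_times):
    """alpha_i proportional to processor speed 1/w_i, summing to one."""
    speeds = 1.0 / np.asarray(cycle_times)
    return speeds / speeds.sum()

def partition_counts(T, alphas):
    """Split T pixel vectors into integer counts proportional to alphas."""
    counts = np.floor(alphas * T).astype(int)
    counts[:T - counts.sum()] += 1            # distribute the remainder
    return counts

alphas = workload_shares([0.0058, 0.0102, 0.0026, 0.0072])
print(partition_counts(1939 * 677, alphas))
```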

3.2 GPU-based parallel implementation

The first issue that needs to be addressed is how to map a hyperspectral image onto the memory of the GPU. Since the size of hyperspectral images usually exceeds the capacity of such memory, we split them into multiple spatial-domain partitions, so that each partition incorporates all the spectral information on a localized spatial region and is composed of spatially adjacent pixel vectors. In order to accommodate the partitions onto the GPU memory, each partition is further subdivided into 4-band tiles (called spatial-domain tiles), which are arranged in different areas of a 2-D texture. Such partitioning allows us to map four consecutive spectral bands onto the RGBA color channels of a texture (memory) element. Apart from the tiles, we also allocate additional memory to hold other information, such as the skewers and intermediate dot products, norms, and SAD-based distances.

Figure 2 shows a flowchart describing our GPU-based implementation. The data partitioning stage performs the spatial-domain decomposition of the original hyperspectral image F. In the stream uploading stage, the spatial-domain partitions are uploaded as a set of tiles onto the GPU memory. The skewer generation stage provides K skewers, using NVidia's parallel implementation of the Mersenne twister pseudo-random number generator on the GPU [13]. The remaining stages comprise the following kernels:

Figure 2: Flowchart of the proposed stream-based GPU implementation.

• Extreme projections. The tiles are input streams to this stage, which obtains all the dot products necessary to compute the required projections. Since streams are actually tiles, the implementation of this stage is based on a multi-pass kernel that implements an element-wise multiply-and-add operation, thus producing four partial inner products stored in the RGBA channels of a texture element.

• Candidate selection. This kernel uses as inputs the projection values generated in the previous stage, and produces a stream for each pixel f_i containing the relative coordinates of the pixels with maximum and minimum distance after the projection onto each skewer. A complementary kernel is then used to identify those pixels which have been selected at least vc times during the process.

• Endmember selection. For each endmember candidate, this kernel computes the cumulative SAD with all the other candidates. It is based on a single-pass kernel that computes the SAD between two pixel vectors using the dot products and norms produced by the previous stage. A complementary kernel is then used to discard those candidates with cumulative SAD scores below va, thus producing a final set of spectrally unique endmembers {e_e}_{e=1}^{E}.

• Spectral unmixing. Finally, this kernel uses as inputs the final endmembers selected in the previous stage and produces the endmember fractional abundances for each pixel f_i, thus solving the mixture problem. To achieve this, the kernel multiplies each f_i by (M^T M)^{-1} M^T, where M = [e_1, . . . , e_E] and the superscript "T" denotes matrix transposition.
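The band-to-RGBA packing idea can be illustrated with a short NumPy sketch (shapes are assumptions; the actual implementation stores the tiles in 2-D textures on the GPU).

```python
import numpy as np

def pack_rgba_tiles(cube):
    """cube: (rows, cols, N) hyperspectral partition, N divisible by 4.
    Groups of four consecutive bands are mapped to the four RGBA channels."""
    rows, cols, N = cube.shape
    assert N % 4 == 0
    # (N//4) tiles of shape (rows, cols, 4): bands 4k..4k+3 -> channels 0..3
    return cube.reshape(rows, cols, N // 4, 4).transpose(2, 0, 1, 3)

tiles = pack_rgba_tiles(np.zeros((64, 64, 224), dtype=np.float32))
print(tiles.shape)        # (56, 64, 64, 4)
```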

4 Experimental results

4.1 Parallel computing systems

Two parallel computing systems have been used in experiments. The first one is a heterogeneous network consisting of 16 different workstations, which has been used to evaluate the proposed heterogeneous implementation. Table 1 shows the properties of


Table 1: Specifications of heterogeneous computing nodes.

Processor number    Architecture overview    Cycle-time (seconds/Mflop)    Memory (MB)    Cache (KB)
p1                  Intel Pentium 4          0.0058                        2048           1024
p2, p5, p8          Intel Xeon               0.0102                        1024           512
p3                  AMD Athlon               0.0026                        7748           512
p4, p6, p7, p9      Intel Xeon               0.0072                        1024           1024
p10                 UltraSparc-5             0.0451                        512            2048
p11 - p16           AMD Athlon               0.0131                        2048           1024

the 16 heterogeneous workstations, which are interconnected using the same homogeneous communication network with capacity c = 26.64 milliseconds. Although this is a simple architecture, it is also a quite typical and realistic one.

The GPU-based experiments were performed on a 2006-model HP xw8400 workstation based on a dual Quad-Core Intel Xeon E5345 processor running at 2.33 GHz with a 1,333 MHz bus speed and 3 GB of RAM. The computer was equipped with an NVidia GeForce 8800 GTX with 16 multiprocessors, each composed of 8 SIMD processors operating at 1350 MHz. Each multiprocessor has 8192 registers, a 16 KB parallel data cache of fast shared memory, and access to 768 MB of global memory. The processing chain was implemented using NVidia's Compute Unified Device Architecture (CUDA).

4.2 Hyperspectral data

A well-known hyperspectral data set collected over the Cuprite mining district in Nevada was used in experiments to evaluate the algorithms in the context of a real mineral mapping application. The data set² consists of 1939 × 677 pixels and 224 bands in the wavelength range 0.4–2.5 µm (574 MB in size). The Cuprite site has been extensively mapped by the U.S. Geological Survey (USGS), and there is extensive ground-truth information available, including a library of mineral signatures collected on the field³. Fig. 3(a) shows the spectral band at 587 nm wavelength of the AVIRIS scene. The spectra of USGS ground minerals are also displayed in Figs. 3(b-c).

4.3 Performance evaluation

Table 2 shows the SAD-based spectral similarity scores obtained after comparing the ten USGS library spectra with the corresponding endmembers extracted by the original chain in Kodak's Research Systems ENVI software, version 4.5, and the two parallel implementations of the processing chain (heterogeneous and GPU). It is important to emphasize that smaller SAD values indicate higher spectral similarity. As shown by Table 2, the spectral similarity scores with regard to the reference USGS signatures were very satisfactory. In all cases, we empirically set the parameter vc (threshold value) to the mean of the N_times scores obtained after K = 10^4 iterations, while we set va = 0.01 (threshold angle to remove redundant endmembers) according to previous work [11].

² Available online from http://aviris.jpl.nasa.gov/html/aviris.freedata.html
³ Available online from http://speclab.cr.usgs.gov/spectral-lib.html


Figure 3: (a) A portion of the AVIRIS scene over Cuprite mining district. (b-c) Ground-truth mineral spectra provided by USGS.

Table 3 shows the performance gain of the heterogeneous implementation with respect to the sequential version of the processing chain as the number of processors was increased. Here, we assumed that processor p3 (the fastest) was always the master and varied the number of slaves. The construction of speedup plots in heterogeneous environments is not straightforward, mainly because the workers do not have the same relative speed, and therefore the order in which they are added to plot the speedup curve needs to be further analyzed. We have tested three different ordering strategies:

1. Strategy #1. First, we used an ordering strategy in which new processors were added according to their numbering in Table 1, i.e., the first case study tested (labeled as "2 CPUs" in Table 3) consisted of using p3 as the master and p0 as the slave; the second case (labeled as "3 CPUs") consisted of using p3 as the master and p0, p1 as slaves, and so on, until a final case (labeled as "16 CPUs") was tested, based on using p3 as the master and all remaining 15 processors as slaves.

2. Strategy #2. Second, we used an ordering strategy based on the relative speed of the processors in Table 1, i.e., the first case study tested (labeled as "2 CPUs" in Table 3) consisted of using processor p3 as the master and processor p10 (i.e., the one with lowest relative speed) as the slave, and so on.


Table 2: SAD spectral similarity scores between the endmembers extracted by different implementations of the hyperspectral processing chain and USGS reference signatures.

USGS mineral       ENVI     Heterogeneous    GPU-based
Alunite            0.084    0.084            0.084
Buddingtonite      0.071    0.075            0.071
Calcite            0.099    0.099            0.091
Chlorite           0.065    0.065            0.065
Jarosite           0.091    0.091            0.093
Kaolinite          0.136    0.136            0.145
Montmorillonite    0.106    0.112            0.106
Muscovite          0.092    0.092            0.092
Nontronite         0.099    0.102            0.102
Pyrophillite       0.094    0.094            0.097

Table 3: Speedups achieved by the proposed parallel heterogeneous implementation on the heterogeneous network.

# CPUs    Strategy #1    Strategy #2    Strategy #3
2         1.93           1.90           1.87
3         2.91           2.92           2.88
4         3.88           3.89           3.67
5         4.83           4.89           4.72
6         5.84           5.81           5.74
7         6.75           6.83           6.55
8         7.63           7.76           7.61
9         8.81           8.74           7.65
10        9.57           9.68           9.53
11        10.62          10.65          10.44
12        11.43          11.55          11.41
13        12.25          12.42          12.36
14        13.16          13.32          13.29
15        14.22          14.25          14.22
16        15.19          15.22          15.16

3. Strategy #3. Finally, we also used a random ordering strategy, i.e., the first case study tested (labeled as "2 CPUs" in Table 3) consisted of using processor p3 as the master and a different processor, selected randomly among the remaining processors, as the slave, and so on until all remaining processors were exhausted.

As shown by Table 3, the three ordering strategies tested provided an almost linear performance increase (regardless of the relative speed of the nodes). Although the proposed implementation scales well in heterogeneous environments, there are several restrictions on incorporating this type of platform for on-board processing in remote sensing missions. To address this issue, we compare the performance of a fully optimized CPU implementation and the GPU implementation by measuring the execution time as a function of the image size. Table 4 shows the execution times measured for different image sizes by the CPU and GPU-based implementations, respectively, where the largest image size in the table (574 MB) corresponds to the full hyperspectral scene (1939 × 677 pixels and 224 bands) whereas the others correspond to cropped portions of the same image. The C function clock() was used for timing the CPU implementation and the CUDA timer was used for the GPU implementation. The time


Table 4: Processing time (seconds) for the CPU and GPU-based implementations.

Size (MB)    Processing time (CPU)    Processing time (GPU)
68           81.53                    2.88
136          162.75                   5.93
205          244.22                   8.25
273          325.21                   10.90
410          489.69                   16.24
574          685.45                   22.62

measurement was started right after the hyperspectral image file was read into the CPU memory and stopped right after the results of the processing chain were obtained and stored in the CPU memory. From Table 4, it can be seen that the full AVIRIS data cube was processed in 22.62 seconds. This response is not strictly in real-time, since the cross-track line scan time in AVIRIS, a push-broom instrument [2], is quite fast (8.3 msec); this introduces the need to process the full image cube (1939 lines, i.e. 1939 × 8.3 msec ≈ 16.1 seconds of acquisition) in about 16 seconds to achieve real-time performance. Although the proposed implementation can still be optimized, Table 4 indicates that the complexity of the implementation scales linearly with the problem size, i.e. doubling the image size doubles the execution time. The speedups achieved by the GPU implementation over the CPU one remained close to 30 for all considered image sizes, which doubles the best speedup result reported for the heterogeneous network in Table 3 at much lower cost (1 GPU vs 16 CPUs) and, most importantly, with fewer restrictions in terms of power consumption and size, which are very important when defining mission payload in remote sensing missions.

5 Conclusions and future research lines

Remote sensing missions require efficient processing systems able to cope with the extremely high dimensionality of the collected data without compromising mission payload. In this paper, two innovative parallel implementations of a remotely sensed hyperspectral image processing chain for mixed pixel characterization have been evaluated from the viewpoint of both algorithm accuracy and parallel performance. The considered solutions include a heterogeneity-aware parallel implementation, developed for effective information extraction from data sets already transmitted to Earth, and a GPU-based implementation for on-board processing. Although both solutions are complementary, the GPU-based one is more appealing for data analysis in time-critical missions. Future work will comprise optimizing the GPU-based implementation and pursuing implementations on other systems, such as NASA's Discover supercomputer.

Acknowledgements

This research has been developed in the framework of the network "High Performance Computing on Heterogeneous Parallel Architectures" (CAPAP-H), supported by the Spanish Ministry of Science and Innovation (TIN2007-29664-E). Funding from the HYPERCOMP/EODIX project (AYA2008-05965-C04-02) is also gratefully acknowledged.


References

[1] A. F. H. Goetz, G. Vane, J. E. Solomon and B. N. Rock, Imaging spectrometry for Earth remote sensing, Science (1985) 1147–1153.

[2] R. O. Green et al., Imaging spectroscopy and the airborne visible/infrared imaging spectrometer (AVIRIS), Remote Sensing of Environment (1998) 227–248.

[3] D. A. Landgrebe, Signal theory methods in multispectral remote sensing, Wiley, Hoboken, NJ, 2003.

[4] A. Plaza and C.-I Chang, High performance computing in remote sensing, CRC Press, Boca Raton, FL, 2007.

[5] A. Plaza and C.-I Chang, Clusters versus FPGA for parallel processing of hyperspectral imagery, International Journal of High Performance Computing Applications (2008) 366–385.

[6] A. Lastovetsky, Parallel computing on heterogeneous networks, Wiley, Hoboken, NJ, 2003.

[7] A. Plaza, J. Plaza and D. Valencia, Impact of platform heterogeneity on the design of parallel algorithms for morphological processing of high-dimensional image data, Journal of Supercomputing (2007) 81–107.

[8] A. Plaza, Parallel techniques for information extraction from hyperspectral imagery using heterogeneous networks of workstations, Journal of Parallel and Distributed Computing (2008) 93–111.

[9] J. Setoain, M. Prieto, C. Tenllado and F. Tirado, GPU for parallel on-board hyperspectral image processing, International Journal of High Performance Computing Applications (2008) 424–437.

[10] C.-I Chang, Hyperspectral imaging: techniques for spectral detection and classification, Kluwer, New York, 2003.

[11] A. Plaza, P. Martinez, R. Perez and J. Plaza, A quantitative and comparative analysis of endmember extraction algorithms from hyperspectral data, IEEE Transactions on Geoscience and Remote Sensing (2004) 650–663.

[12] J. W. Boardman, Automating spectral unmixing of AVIRIS data using convex geometry concepts, Proceedings of the Fourth Annual NASA/JPL Airborne Earth Science Workshop (1993) 11–14.

[13] M. Matsumoto and T. Nishimura, Mersenne twister: a 623-dimensionally equidistributed uniform pseudorandom number generator, ACM Transactions on Modeling and Computer Simulation (1998) 3–30.


Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009, 30 June, 1–3 July 2009.

Interior point methods for protein image alignment

Florian A. Potra1

1 Department of Mathematics and Statistics, University of Maryland, Baltimore County

emails: [email protected]

Abstract

One of the core technologies for obtaining protein mixtures is provided by two-dimensional polyacrylamide gel electrophoresis. In order to analyze variations in the protein gels obtained from different groups that account for biological variations, we must first eliminate distortions to properly align images. The image alignment is recognized as a major bottleneck in proteomics. We formulate the image alignment problem as a large scale quadratic programming problem, which can be solved in polynomial time by interior point methods. Numerical results illustrate the effectiveness of this approach.

Key words: proteomics, electrophoresis, image alignment, quadratic programming, interior point method

1 Introduction

Proteomics is the large-scale analysis of complex protein mixtures focusing on the qualitative and quantitative variations of protein expression levels. Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) is a core technology for separating complex protein mixtures. The main goal of protein separation methods is to detect differentially expressed proteins across treatment groups. However, a major bottleneck toward that goal is the misalignment of gels due to warping, and thus the confounding of biological variation with non-biologically relevant distortions.

Replicate gels are generated from blood, tissue or serum from subjects belonging to different treatment groups, and comparisons are made between the groups via a study of protein abundance on scanned images of the gels. Due to the complexity of the 2D-PAGE procedure, gel-to-gel experimental variation is substantial. Even under strictly controlled laboratory conditions, aliquots from the same sample can yield quite different protein maps. A spot that appears at one location on a given image may appear at a different location on another image. Gel images may be distorted from global shifts or local image warping. Evaluating a set of high-resolution 2-D gels by manual comparison


is often impossible. It is indeed very laborious to detect the emergence of a few new spots or the disappearance of a single spot among several hundreds of spots of different sizes and intensities when local distortions are present in each gel.

Alignment of a family of gel images can be done by means of geometrical transformations applied to the coordinate domain of the images [7, 8]. This technique is also known as image warping [3, 6, 15], or image registration [4, 5, 16, 17]. The transformed images should have no (or very little) geometrical variations, so that only statistical and biological variations are observed. Most image warping methods rely on some preassigned control points or landmarks, which are a relatively small group of spots present in all the gels being compared.

Traditionally, alignment of families of 2-D gels was done by choosing one gel of the family as "target", and by constructing appropriate geometric transformations to map the other gels ("source" gels) into the target gel, such that the landmarks of the source gels are aligned with the landmarks of the target gel. Since for each source gel the corresponding transformation is determined by using only the information contained in the landmarks present in the source gel and the target gel, this (pair-wise) alignment approach can be implemented at a relatively low computational cost. However, the choice of a target gel, which is done by a human operator, may introduce a bias in the alignment process. Moreover, the pair-wise alignment does not use any global information about the whole family of gels. In case the chosen target is an outlier or contains severe local distortions, the resulting alignment is very poor, or the algorithm may fail altogether.

In [13], we have proposed a new approach for aligning families of 2D-gels where the information contained in the landmarks of the whole family is used to create an ideal gel, and to determine transformations that optimally align all the gels of the family to this ideal gel. The ideal landmarks, along with the coefficients defining the transformations, are obtained as the solution of a large-scale quadratic programming problem, which can be efficiently solved with modern interior-point algorithms [14]. The transformations used in [11] are piecewise affine and bilinear transformations based on a sequence of hierarchical grids. A similar hierarchical approach has been used in [15, 17] for pair-wise alignment. In [12] this approach is applied in conjunction with a family of tensor product cubic splines. With proper constraints the transformations are globally C², in the sense that the transformations themselves, the first derivatives and the second derivatives are continuous.

The main computational cost in implementing the image alignment algorithms from [11, 12, 13] consists in solving a large scale quadratic programming problem. The numerical results reported in those papers were obtained by using the quadratic programming solver from the commercial package MOSEK. While this is a very efficient general quadratic programming solver, it does not take full advantage of the particular structure of the gel alignment problem. The largest family of gels that we have managed to align using this method consisted of 123 gels from the Human Leukemia Data Set. In order to be able to align much larger families, as required by large scale proteomics studies, we have to develop an interior point method that takes advantage of the problem structure. In this paper we will show that a variant of the


corrector-predictor interior point algorithm recently proposed and analyzed in [10] can be used to this effect. In what follows we will summarize some results from [11, 12, 13] and [10] using standard mathematical notation, with the exception that we will use [x; s] to denote the column vector obtained by concatenating the column vectors x and s. For example, [1; [2; 3]; [a; b; c]] = [1; 2; 3; a; b; c] = (1, 2, 3, a, b, c)^T. A similar notation is used by some programming environments like MATLAB, or in [1].

2 Optimal gel alignment via quadratic programming

Given a collection of M gel images I^{(1)}, I^{(2)}, . . . , I^{(M)}, with N preassigned landmarks on each image, we denote by L_{il} = [x_{il}; y_{il}] the l-th landmark on I^{(i)}, l = 1, . . . , N. All the gel images in our applications are rectangles. We assume in general that I^{(i)} is included in the rectangle Ω^{(i)}. We construct a set of ideal landmarks L_l = [x_l; y_l], l = 1, . . . , N, and an ideal gel image I, along with a set of geometric transformations T^{(i)} : Ω^{(i)} → IR^2, by solving an optimization problem of the form:

min_{T^{(i)}, L_l}  ϕ   (1)

subject to   ‖T^{(i)}(L_{il}) − L_l‖_∞ ≤ ε,   i = 1, . . . , M,  l = 1, . . . , N,
             ‖L_l − (1/M) Σ_{i=1}^{M} L_{il}‖_∞ ≤ δ,   l = 1, . . . , N,
             E(T^{(i)}) = 0,   i = 1, . . . , M,

where the objective function ϕ is constructed in such a way as to minimize the curvature of the transformations and their distance to the identity transformation. The first constraint in (1) is used to ensure that the transformed landmarks T^{(i)}(L_{il}) are close enough (within ε pixels) to the ideal landmark L_l, and the second constraint is used to set the ideal landmark L_l close (within δ pixels) to the center of the corresponding group of landmarks L_{il}, i = 1, . . . , M. The equality constraints E(T^{(i)}) = 0 are linear in the unknowns defining the transformation T^{(i)}, and they enforce the smoothness of this transformation. For example, in [12] we have considered T^{(i)} to be a piecewise polynomial transformation based on dividing the rectangle Ω^{(i)} into p² sub-rectangles Ω^{(i)}_{jk}, j, k = 1, . . . , p. On each sub-rectangle, T^{(i)}(x, y) is cubic in both x and y. By using Hermite bases we obtained the equality constraints E(T^{(i)}) = 0 as a system of 8(p² − 1) linear equations in 8(p + 1)² unknowns, which ensure that T^{(i)} is twice continuously differentiable.

Let us assume that each transformation T^{(i)} is written as a linear combination of some basic transformations, so that T^{(i)} is completely defined by a vector of unknowns ξ_i ∈ IR^q. We denote by u ∈ IR^{Mq+2N} the vector obtained by concatenating the vectors ξ_1, . . . , ξ_M, L_1, . . . , L_N. Let us also assume that the objective function ϕ is quadratic in u, an assumption that is satisfied by the objective functions used in [13, 11, 12]. Then the optimization problem (1) can be written as a convex quadratically constrained quadratic programming problem of the form

min_u  (1/2) u^T Q_0 u + c_0^T u + d_0   (2)


s.t.   A u = b,
       (1/2) u^T Q_i u + c_i^T u + d_i ≤ 0,   i = 1, 2, . . . , ν,

where Q_0, Q_1, . . . , Q_ν are positive definite matrices.
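To make the structure of (2) concrete, a toy instance can be assembled and handed to an off-the-shelf convex solver such as CVXPY; this only illustrates the problem class, not the interior point method developed below, and all dimensions and data are invented.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m, nu = 20, 5, 3                        # toy sizes, not from the paper

def random_spd(k):
    B = rng.standard_normal((k, k))
    return B @ B.T + k * np.eye(k)         # positive definite matrix

Q0, c0, d0 = random_spd(n), rng.standard_normal(n), 0.0
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
Qs = [random_spd(n) for _ in range(nu)]
cs = [rng.standard_normal(n) for _ in range(nu)]
ds = [-1e3] * nu                           # loose, so the toy problem is feasible

u = cp.Variable(n)
objective = cp.Minimize(0.5 * cp.quad_form(u, Q0) + c0 @ u + d0)
constraints = [A @ u == b] + [
    0.5 * cp.quad_form(u, Q) + c @ u + d <= 0 for Q, c, d in zip(Qs, cs, ds)
]
print(cp.Problem(objective, constraints).solve())
```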

3 Complementarity problems over second-order cones

Using the methods from [1] we can write the optimization problem (2) as a linear complementarity problem over a cone K ⊂ IR^n which is a Cartesian product of second-order (Lorentz) cones. If ∘ denotes the product in the associated Jordan algebra J, then (2) can be written as

x ∘ s = 0,   P x + Q s + R y = a,   x ⪰ 0,  s ⪰ 0.   (3)

The notation x ⪰ 0 means that x ∈ K; in this case we say that x is positive semidefinite. If x ∈ K0, where K0 denotes the interior of K, then we write x ≻ 0 and we say that x is positive definite. The set of feasible points of the linear complementarity problem (3) is denoted by

F = { z = [x; s; y] ∈ K × K × IR^m : P x + Q s + R y = a }.   (4)

It’s relative interior F0 = F⋂K0 ×K0 × IRm is called the set of strictly feasible points

(or interior points). It turns out that if F0 is nonempty, then for any positive numberτ > 0, the nonlinear system of equations

x s = τePx + Qs + Ry = a

(5)

where e is the identity element of the Jordan algebra J, has a unique solution z(τ) = [x(τ); s(τ); y(τ)] with x(τ) ≻ 0 and s(τ) ≻ 0. The set C of all such solutions of (5) with x ≻ 0 and s ≻ 0 is called the central path of the linear complementarity problem (3). It is easily seen that if z = [x; s; y] ∈ K × K × IR^m and x ∘ s = τ e, then τ = µ = µ(z), where µ(z) = (x^T s)/n is the normalized duality gap of z. Feasible interior point methods use a strictly feasible starting point z^0 ∈ F0 and produce a sequence of points z^k ∈ F0, belonging to a certain neighborhood of the central path C, with lim_{k→∞} µ(z^k) = 0. While feasible interior point methods have very attractive theoretical properties, in practice it may be difficult to find a strictly feasible starting point. The interior point method to be presented in the next section uses a starting point z^0 = [x^0; s^0; y^0] ∈ K0 × K0 × IR^m that is infeasible in the sense that the residual r^0 = a − P x^0 − Q s^0 − R y^0 is different from zero. Let us denote τ_0 = µ(z^0) and q^0 = τ_0^{-1} r^0. It can be proved that for any positive number τ > 0, the nonlinear system of equations

x ∘ s = τ e,   P x + Q s + R y = a − τ q^0,   (6)


has a unique solution z(τ) = [x(τ); s(τ); y(τ)] ∈ K0 × K0 × IR^m. Using the terminology introduced in [2], we call the set of all such solutions the infeasible central path pinned on q^0, and we denote it by C_{q^0}. Obviously, if the starting point is feasible then q^0 = 0 and C_{q^0} reduces to the classical feasible central path C. In the infeasible case, for any x^0 ∈ K0, y^0 ∈ IR^m, and τ_0 > 0, we can take s^0 = τ_0 (x^0)^{-1} e, so that the starting point [τ_0; x^0; s^0; y^0] belongs to C_{q^0}. In the feasible case it is extremely difficult to find a starting point belonging to C. The infeasible interior point method to be presented in the next section generates a sequence [τ_k; z^k] satisfying [P Q R] z^k = a − τ_k q^0 and belonging to the following neighborhood of the infeasible central path:

V(α) = { [τ; z] ∈ (0, τ_0] × K0 × K0 × IR^m : σ(z) ⊂ [ατ, (1/α)τ] },   (7)

where α ∈ (0, 1) is a given parameter, and σ(z) denotes the set of all eigenvalues of Q_{x^{1/2}} s, with Q_{x^{1/2}} being the quadratic representation of x^{1/2} (see [1] and [9, Proposition 1]). We note that the above neighborhood was considered in [2] for linear complementarity problems over the classical cone IR^n_+. If we introduce the proximity measure

δ(τ, z) = max{ 1 − λ_min(z)/τ , 1 − τ/λ_max(z) },   (8)

then the neighborhood V(α) can be written as

V(α) = { [τ; z] ∈ (0, τ_0] × K0 × K0 × IR^m : δ(τ, z) ≤ 1 − α }.
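For the special case K = IR^n_+, where the eigenvalues σ(z) reduce to the componentwise products x_i s_i, the proximity measure (8) and the membership test for V(α) can be sketched as follows (illustrative Python, not from the paper).

```python
import numpy as np

def proximity(tau, x, s):
    """delta(tau, z) of (8) for K = R^n_+, where sigma(z) = {x_i * s_i}."""
    lam = x * s
    return max(1.0 - lam.min() / tau, 1.0 - tau / lam.max())

def in_neighborhood(tau, x, s, alpha, tau0):
    """Membership test for V(alpha) as rewritten after (8)."""
    return 0.0 < tau <= tau0 and np.all(x > 0) and np.all(s > 0) \
        and proximity(tau, x, s) <= 1.0 - alpha
```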

At each iteration we choose a scaling point p ∈ K0 belonging to the commutative class of scalings for z,

C(z) = { p ∈ K0 : Q_p x and Q_{p^{-1}} s operator commute },   (9)

and we consider the scaled quantities

x̃ = Q_p x,   s̃ = Q_{p^{-1}} s,   P̃ = P Q_{p^{-1}},   Q̃ = Q Q_p.   (10)

Since x ∘ s = τ e if and only if x̃ ∘ s̃ = τ e, the linear complementarity problem can be written in the form

x̃ ∘ s̃ = τ e,   P̃ x̃ + Q̃ s̃ + R y = a.   (11)

4 An infeasible corrector-predictor method

At each step of our algorithm we are given a point [τ; z] = [τ; x; s; y] ∈ V(α) that satisfies P x + Q s + R y − a = τ q^0, and for each θ ∈ [0, 1] we define

τ(θ) = (1 − (1 − γ)θ) τ,   z(θ) = z + θ ∆z,   (12)

where ∆z = [∆x; ∆s; ∆y] is obtained by first solving the linear system

s̃ ∘ ∆x̃ + ∆s̃ ∘ x̃ = γ τ e − x̃ ∘ s̃,
P̃ ∆x̃ + Q̃ ∆s̃ + R ∆ỹ = (1 − γ)(a − P x − Q s − R y),   (13)


and then setting

[∆x; ∆s; ∆y] = [Q_{p^{-1}} ∆x̃; Q_p ∆s̃; ∆ỹ].   (14)
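For the same special case K = IR^n_+ with the trivial scaling p = e, the system (13) becomes a standard square Newton system; the sketch below assumes the block shapes P, Q ∈ IR^{(n+m)×n} and R ∈ IR^{(n+m)×m} and is only an illustration.

```python
import numpy as np

def newton_direction(x, s, y, P, Q, R, a, gamma, tau):
    """Solve (13) for K = R^n_+ with the trivial scaling p = e.

    Assumed shapes: P, Q are (n+m, n), R is (n+m, m), a is (n+m,)."""
    n, m = x.size, y.size
    top = np.hstack([np.diag(s), np.diag(x), np.zeros((n, m))])
    bottom = np.hstack([P, Q, R])
    rhs = np.concatenate([
        gamma * tau * np.ones(n) - x * s,                 # centering part
        (1.0 - gamma) * (a - P @ x - Q @ s - R @ y),      # residual part
    ])
    d = np.linalg.solve(np.vstack([top, bottom]), rhs)
    return d[:n], d[n:2 * n], d[2 * n:]                   # dx, ds, dy
```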

In a corrector step we choose γ ∈ [γ_min, γ_max], where 0 < γ_min < γ_max < 1 are given parameters, while in a predictor step we take γ = 0.

The corrector. The main purpose of the corrector step is to increase proximity to the central path. Therefore, the steplength of the corrector is obtained as

θ_c = argmin { δ(θ) : θ ∈ [0, 1] }.   (15)

As a result of the corrector step we obtain the point

[τ̄; z̄] = [τ̄; x̄; s̄; ȳ] := [τ(θ_c); z(θ_c)].   (16)

We clearly have [τ̄; z̄] ∈ V(α_c) for some α_c > α. While the parameter α is fixed during the algorithm, the positive quantity α_c varies from iteration to iteration. However, we can prove that there is a constant α_c* > α such that α_c > α_c* at all iterations.

The predictor. The predictor is obtained by taking [τ; z] = [τ̄; z̄], where [τ̄; z̄] is the result of the corrector step, and γ = 0 in (12)–(13). The aim of the predictor step is to decrease the complementarity gap as much as possible while keeping the iterate in V(α). This is accomplished by defining the predictor steplength as

θ_p = max { θ̄ : [τ(θ); z(θ)] ∈ V(α) for all θ ∈ [0, θ̄] }.   (17)

With the above line search the predictor step computes the point

[τ_+; z_+] = [τ_+; x_+; s_+; y_+] := [τ(θ_p); z(θ_p)].   (18)

By construction we have [τ_+; z_+] ∈ V(α), so that a new corrector step can be applied. Summing up, we can formulate the following iterative procedure:

Algorithm 1 (Corrector-Predictor Infeasible Interior Point Method)
Given real parameters 0 < α < 1, 0 < γ_min < γ_max < 1, and a vector [τ_0; z^0] = [τ_0; x^0; s^0; y^0] ∈ V(α):

Set k ← 0;
repeat
  (corrector step)
  Set [τ; z] ← [τ_k; z^k];
  Choose p ∈ C(z) and γ ∈ [γ_min, γ_max];
  Compute the direction ∆z from (13)–(14);
  Compute the corrector steplength θ_c from (15);
  Compute [τ̄; z̄] from (16);
  Set [τ̄_k; z̄^k] ← [τ̄; z̄];
  (predictor step)
  Set [τ; z] ← [τ̄_k; z̄^k], γ = 0;
  Choose p ∈ C(z);
  Compute the direction ∆z from (13)–(14);
  Compute the predictor steplength θ_p from (17);
  Compute [τ_+; z_+] from (18);
  Set [τ_{k+1}; z^{k+1}] ← [τ_+; z_+], k ← k + 1;
continue

5 Numerical results

We have applied the image alignment method described above, within the same settingas in [12], but by using Algorithm 1 described in the present paper for solving theoptimization problem. The quality of the alignment was comparable to that achievedin [12]. For the largest problem, consisting of 123 gels from the Human Leukemia DataSet, Algorithm 1 required fewer iterations than the algorithm used in [12].

Acknowledgements

This work has been partially supported by the National Institute of Health under GrantR01GM075298-01, and by the National Science Foundation under Grant 0728878.

References

[1] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Program.,95(1, Ser. B):3–51, 2003. ISMP 2000, Part 3 (Atlanta, GA).

[2] J. F. Bonnans and F. A. Potra. On the convergence of the iteration sequence ofinfeasible path following algorithms for linear complementarity problems. Mathe-matics of Operations Research, 22(2):378–407, 1997.

[3] C. A. Glasbey and K. V. Mardia. A review of image warping methods. Journal ofApplied Statistics, 25:155–171, 1998.

[4] A. Goshtasby. Piecewise linear mapping functions for image registration. PatternRecognition, 19:459–466, 1986.

[5] A. Goshtasby. Piecewise cubic mapping functions for image registration. PatternRecognition, 20:525–533, 1987.

[6] J. S. Gustafsson, A. Blomberg, and M. Rudemo. Warping two-dimensional elec-trophoresis gel images to correct for geometric distortions of the spot pattern.Electrophoresis, 23:1731–1744, 2002.

[7] K. Kaczmarek, B. Walczak, S. de Jong, and B.G.M. Vandeginste. Comparisonof image-transformation methods used in matching 2d gel electrophoresis images.Acta Chromatographica, 13:7–21, 2003.

@CMMSE Page 872 of 1461 ISBN: 978-84-612-9727-6

Page 152: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Interior point methods for protein image alignment

[8] K. Kaczmarek, B. Walczak, S. de Jong, and B.G.M. Vandeginste. Matching2d gelelectrophoresis images. Journal of Chemical Information and Computer Sciences,43:978–986, 2003.

[9] R. D. C. Monteiro and T. Tsuchiya. Polynomial convergence of primal-dual algo-rithms for the second-order cone program based on the MZ-family of directions.Math. Program., 88(1, Ser. A):61–83, 2000.

[10] F. A. Potra. Corrector-predictor methods for monontone linear complementarityproblems in a wide neighborhood of the central path. Math. Program., 111(1-2,Ser. B):243–272, 2008.

[11] F. A. Potra and X. Liu. Aligning families of 2D-gels by a combined hierarchi-cal forward-inverse transformation approach. Journal of Computational Biology,13(7):1384–1395, 2006.

[12] F. A. Potra and X. Liu. Protein image alignment via tensor product cubic splines.Optimization Methods and Software, 22(1):155–168, 2007.

[13] F. A. Potra, X. Liu, F. Seillier-Moiseiwitsch, A. Roy, Y. Hang, M. R. Marten, andB. Raman. Protein image alignment via piecewise affine transformations. Journalof Computational Biology, 13(3):614–630, 2006.

[14] F. A. Potra and S. J. Wright. Interior-point methods. Journal of Computationaland Applied Mathematics, 124:281–302, 2000.

[15] J. Salmi, T. Aittokallio, J. Westerholm, M. Griese, A. Rosengren, T. A. Numan,R. Lahesmaa, and O. Nevalainen. Hierarchical grid transformation for image warp-ing in the analysis of two-dimensional electrophoresis gels. Proteomics, 2:1504–1515, 2002.

[16] Zeev Smilansky. Automatic registration for images of two-dimensional protein gels.Electrophoresis, 22:1616–1626, 2001.

[17] S. Veeser, M. J. Dunn, and G.-Z. Yang. Multiresolution image registration fortwo-dimensional gel electrophoresis. Proteomics, 1:856–870, 2001.

@CMMSE Page 873 of 1461 ISBN: 978-84-612-9727-6

Page 153: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Proceedings of the International Conferenceon Computational and Mathematical Methodsin Science and Engineering, CMMSE 200930 June, 1–3 July 2009.

Error Analysis on the implementation of explicit Falknermethods for y ′′ = f(x, y)

Higinio Ramos1 and Cesareo Lorenzo1

1 Department of Applied Mathematics, University of Salamanca

emails: [email protected], [email protected]

Abstract

The unusual implementation of Falkner method appeared in [6] increased theorder of the method as was shown by means of a numerical example. On this paperwe made an analysis of the propagation of the errors in both the usual and theunusual implementations, thus justifying the numerical results obtained. By that,some error bounds for the global errors on the solution and on the derivative areprovided. A numerical example confirms that the bounds are realistic.

Key words: error analysis, Falkner methods, second-order initial-value problemsMSC 2000: 65L70, 65L06

1 Introduccion

Second-order differential equations deserve special consideration because they appearfrequently in applied sciences. Examples of that are the mass movement under theaction of a force, problems of orbital dynamics, or in general, any problem involvingNewton’s law.

There is a vast literature addressing the numerical solution of the so-called specialsecond-order initial value problem (I.V.P.)

y ′′(x) = f(x, y(x)), y(x0) = y0, y ′(x0) = y ′0 , (1)

(see for example the classical books by Henrici [4], Collatz [1], Lambert [5], Shampineand Gordon [8] or Hairer et al. [3]).

Although it is possible to integrate a second-order I.V.P. by reducing it to a first-order system and applying one of the methods available for such systems, it seems morenatural to provide numerical methods in order to integrate the problem directly. TheStormer-Cowell methods is a well-known class of schemes of this type.

The advantage of this procedure lies in the fact that they are able to exploit specialinformation about ODES, and this results in an increase in efficiency. For instance, it

@CMMSE Page 874 of 1461 ISBN: 978-84-612-9727-6

Page 154: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Error Analysis on explicit Falkner methods

is well-known that Runge-Kutta-Nystrom methods for (1) involve a real improvementas compared to standard Runge-Kutta methods for a given number of stages ([3], p.285), although the computational cost remains high because of the number of functionevaluations. On the other hand, a linear k-step method for first-order ODEs becomesa 2k-step method for (1), ([3], p. 461), increasing the computational work.

When the right hand side of the differential equation includes the first derivativewe have the I.V.P.

y ′′(x) = f(x, y(x), y′(x)), y(x0) = y0, y ′(x0) = y ′0 . (2)

One of the methods for numerically solving this problem is due to Falkner [2], and canbe written in the form

yn+1 = yn + h y′n + h2k−1∑

j=0

βj 5j fn , (3)

y′n+1 = y′n + hk−1∑

j=0

γj 5j fn , (4)

where h is the stepsize, yn and y′n are approximations to the values of the solution andits derivative at xn = x0 + nh , fn = f(xn, yn, y′n) and 5jfn is the standard notationfor the backward differences. The coefficients βj and γj can be obtained using thegenerating functions

Gβ(t) =∞∑

j=0

βj tj =t + (1− t) Log(1− t)(1− t)Log2(1− t)

,

Gγ(t) =∞∑

j=0

γj tj =−t

(1− t)Log(1− t).

Of course, analogously, there exist similar implicit formulas (see [1]) that read

yn+1 = yn + h y′n + h2k∑

j=0

β∗j 5j fn+1 , (5)

y′n+1 = y′n + hk∑

j=0

γ∗j 5j fn+1 , (6)

with generating functions for the coefficients given by

Gβ∗(t) =∞∑

j=0

β∗j tj =t + (1− t) Log(1− t)

Log2(1− t),

Gγ∗(t) =∞∑

j=0

γ∗j tj =−t

Log(1− t).

@CMMSE Page 875 of 1461 ISBN: 978-84-612-9727-6

Page 155: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Higinio Ramos, Cesareo Lorenzo

Note that the formulas in (4) and (6) are respectively the Adams-Bashforth and Adams-Moulton schemes for the problem (y′)′ = f(x, y, y′), which are used to follow the valuesof the derivative. All the above formulas are of multistep type, specifically k-stepformulas, and so k initial values must be provided in order to proceed with the methods.On [6] an implementation of the explicit Falkner method using the formulas in (3) and(6) for solving the problem in (1) was considered, and a numerical example showingthe improvement obtained with this approach was presented. On this paper a rigorousanalysis of the errors involved on the procedure is made, which justifies the resultsobtained.

The paper is organized as follows. In Section 2 we reproduced the explicit Falknerschemes in [6]. In Section 3 the formulas for the local truncation errors are provided,which made possible the analysis of the global errors. Section 4 is devoted to corroboratethe obtained bounds on the global errors of the solution with both implementationsconsidering a numerical example.

2 Implementation of explicit Falkner methods

The usual implementation of the explicit Falkner method on each step for solving theproblem in (2) or in (1) is

1. Evaluate yn+1 using the formula in (3)

2. Evaluate y′n+1 using the formula in (4)

3. Evaluate fn+1 = f(xn+1, yn+1)

The unusual implementation of Falkner method on each step for solving the prob-lem in (1) appeared in [6] reads

1. Evaluate yn+1 using the formula in (3)

2. Evaluate fn+1 = f(xn+1, yn+1)

3. Evaluate y′n+1 using the formula in (6)

which can be accomplished due to the absence of the derivative on the function f .Thus, having obtained the value yn+1 it is straightforward to obtain fn+1 to be usedin the formula in (6). Note that in this way the ”implicit formula” in (6) is no furtherimplicit, resulting on an explicit formulation of the method.

3 Error analysis

3.1 Local truncation errors

Consider the formula resulting after approximating the function f on the exact formula

y(xn+1) = y(xn) + hy′(xn) +∫ xn+1

xn

f(x, y(x), y′(x))(xn+1 − x) dx (7)

@CMMSE Page 876 of 1461 ISBN: 978-84-612-9727-6

Page 156: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Error Analysis on explicit Falkner methods

by the interpolating polynomial on the grid points xn−(k−1), . . . , xn, and also considerthe formula in (3). Using the localization hypothesis that y(xj) = yj , j = n − (k −1), . . . , n, and following a similar procedure as for the Adams formulas [5] we obtainthat the local truncation error for the method in (3) is given by

LFE [y(xn);h] = hk+2y(k+2)(ξ)βk, (8)

where ξ is an internal point of the smallest interval containing xn−(k−1), . . . , xn.Similarly, for the formula in (4) the local truncation error reads

LAB[y′(xn);h] = hk+1y(k+2)(ψ)γk, (9)

where ψ as before refers to an internal point.And for the formula in (5) the local truncation error is given by

LFI [y(xn);h] = hk+3y(k+3)(ξ)β∗k+1, (10)

and similarly for the formula in (6) the local truncation error is

LAM [y′(xn);h] = hk+2y(k+3)(ψ)γ∗k+1 (11)

where we denote by ξ or ψ the internal points, or if they have to be different, by ξj orψj , as in the following section.

3.2 Global errors

3.2.1 Usual implementation

Assuming that we have to integrate the problem (1) on the interval [x0, xN ] and that weknow in advance the starting values needed to apply the numerical scheme, we proceedanalyzing the global error on each successive application of the method along the gridpoints on the integration interval.

Assuming the localization hypothesis and using the formulas for the local truncationerrors in (8) and (9) for the first step (n = 1) we have

y(x1)− y1 = y(x1)−y0 + h y′0 + h2

k−1∑

j=0

βj 5j f0

= hk+2y(k+2)(ξ1)βk , (12)

y′(x1)− y′1 = hk+1y(k+2)(ψ1)γk , (13)

where for convenience in the sequel the ξj and ψj denote appropriate internal points.For the next step (n = 2) we have in view of the formula for the local truncation

error in (8) that

y(x2) = y(x1) + hy′(x1) + h2k−1∑

j=0

βj 5j f1

+hk+2y(k+2)(ξ2)βk , (14)

@CMMSE Page 877 of 1461 ISBN: 978-84-612-9727-6

Page 157: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Higinio Ramos, Cesareo Lorenzo

where the bar over the f means that the function is evaluated on the true values (notethat in (12) it was not necessary to consider the bar over f due to the localizationhypothesis). On the other hand, the method in (3) results in this case in

y2 = y1 + hy′1 + h2k−1∑

j=0

βj 5j f1o, . (15)

Substracting the above formulas in (14) and (15), and using that

f1 − f1 = f(x1, y(x1))− f(x1, y1) =∂f

∂y(ξ)(y(x1)− y1) (16)

and the formulas in (12) and (13), it results after some calculus that

y(x2)− y2 = hk+2[y(k+2)(ξ1) + y(k+2)(ξ2)

]βk +

[hk+2y(k+2)(ψ1)γk

]

+O(hk+4)

Analogously, from the local truncation error in (9) it follows that

y′(x2) = y′(x1) + h

k−1∑

j=0

γj 5j f1 + hk+1y(k+2)(ψ2)γk , (17)

and from the method in (4) we obtain that

y′2 = y′1 + hy′1 + h

k−1∑

j=0

γj 5j f1 (18)

Now substracting the above formulas in (17) and (18), and using the formulas in (13)and (16), after some calculus results in

y′(x2)− y′2 = hk+1[y(k+2)(ψ1) + y(k+2)(ψ2)

]γk +O(hk+3) .

Repeating the procedure along the nodes on the integration interval we obtain thatthe global error at the final point xN is given by

y(xN )− yN = hk+2N∑

j=1

y(k+2)(ξj)βk + hk+2N−1∑

j=1

(N − j)y(k+2)(ψj)γk

+O(hk+4) (19)

and for the derivative

y′(xN )− y′N = hk+1n∑

j=1

y(k+2)(ψj)γk +O(hk+3) .

@CMMSE Page 878 of 1461 ISBN: 978-84-612-9727-6

Page 158: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Error Analysis on explicit Falkner methods

Assuming that y(k+2)(x) is continuous and using the mean value theorem the aboveformulas may be rewritten as follows. For the solution we can write

y(xN )− yN = hk+1y(k+2)(ξ)(xN − x0)βk

+12

hky(k+2)(ψ)(xN − x0)(xN − x1)γk (20)

+O(hk+4)

and for the derivative we have

y′(xN )− y′N = hk(xN − x0)y(k+2)(ψ)γk +O(hk+3) . (21)

3.2.2 Unusual implementation

The unusual implementation consists in changing the formula (4) by (6) for evaluatingthe derivative. This means that for n = 1 the formula (12) remains the same while theformula (13) must be changed by

y′(x1)− y′1 = hk+2y(k+3)(ψ1)γ∗k+1 (22)

Proceeding as before we obtain that for n = 2 it is

y(x2)− y2 = hk+2[y(k+2)(ξ1) + y(k+2)(ξ2)

]βk +

[hk+3y(k+3)(ψ1)

]γ∗k+1

+O(hk+4)

and from the method in (6) we obtain for the derivative that the global error in thiscase is

y′(x2)− y′2 = hk+2[y(k+3)(ψ1) + y(k+3)(ψ2)

]γ∗k+1 +O(hk+3) .

Repeating the procedure along the integration nodes, at the final point we obtainthat

y(xN )− yN = hk+2N∑

j=1

y(k+2)(ξj) βk + hk+3N−1∑

j=1

(N − j) y(k+3)(ψj) γ∗k+1

+O(hk+4) (23)

and for the derivative

y′(xN )− y′N = hk+2n∑

j=1

y(k+3)(ψj)γ∗k+1 +O(hk+3) .

@CMMSE Page 879 of 1461 ISBN: 978-84-612-9727-6

Page 159: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Higinio Ramos, Cesareo Lorenzo

Assuming that y(k+3)(x) is continuous and using the mean value theorem the aboveformulas may be written as

y(xN )− yN = hk+1y(k+2)(ξ)(xN − x0)βk

+12

hk+1y(k+3)(ψ)(xN − x0)(xN − x1)γ∗k+1 (24)

+O(hk+4)

for the solution, and similarly, for the derivative

y′(xN )− y′N = hk+1(xN − x0)y(k+3)(ψ)γ∗k+1 +O(hk+3) . (25)

4 Numerical verification

Consider the IVP given by

y′′(x) = −y(x) + sin(x) , y(0) = 1 , y′(0) = 0 (26)

whose exact solution is

y(x) =12(sin(x) + (2− x) cos(x)) .

The problem has been integrated on [0, 20π] using the above implementations withk = 6, named FEAB6 and FEAM6 respectively, using the MATHEMATICA program.We have consider also the Runge-Kutta-Nystrom method or order 6 in [7], namedRKN6, and the classical Stormer method of order 6, named ST6. In this last case theformula in (4) was used to follow the derivative. The number of steps was fixed for allmethods in N = 1000. Figure 1 shows the propagation of the absolute global errorsfor the solution using different schemes, from up to down: RKN6, FEAB6, ST6 andFEAM6. We can see that the best results correspond to the unusual implementationof the Falkner method. For the absolute global errors on the derivatives similar resultswere obtained, as shown in Figure 2.

As we know the exact solution we have used the formula in (19) for the usualimplementation, taking the ξj as the nodal points, ξj = xj , to obtain estimates forthe absolute global error at the final point. On Table 1 appear these estimations, theabsolute global errors at the final given by

GE(xN ) = |y(xN )− yN | ,

the absolute errors on these estimations, and the number of steps.For the unusual implementation similar considerations concerning the formula in

(23) lead to the data in Table 2. We observe that as the number of steps is increasingmore precise estimations are obtained in both cases.

@CMMSE Page 880 of 1461 ISBN: 978-84-612-9727-6

Page 160: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Error Analysis on explicit Falkner methods

10 20 30 40 50 60

10-10

10-8

10-6

10-4

0.01

1

Figure 1: Absolute global errors in logarithmic scale for the solution y(x) of problem(26) using RKN6, FEAB6, ST6 and FEAM6 (from up to down)

5 Conclusions

Considering the expressions of the global errors in (20) and (24), after eliminating theresidual terms indicated by O(hk+4) we observe that for the usual formulation of theFalkner method the appearance of the derivative contributes to the global error witha term of order O(hk), while for the unusual formulation such contribution is of orderO(hk+1). This results on global errors of orders O(hk) and O(hk+1) which justifies thebetter performance of the unusual implementation of the Falkner method.

N steps Estimate GE(xN ) GE(xN ) Err. Estimation2000 4.0417× 10−6 1.3792× 10−7 3.9037× 10−6

3000 3.5534× 10−7 1.2162× 10−8 3.4317× 10−7

4000 6.3244× 10−8 2.1131× 10−9 6.1130× 10−8

5000 1.6569× 10−8 6.1769× 10−10 1.5951× 10−8

6000 5.5525× 10−9 6.3090× 10−11 5.4894× 10−9

7000 2.2010× 10−9 9.1419× 10−11 2.1095× 10−9

8000 9.8825× 10−10 2.8422× 10−10 7.0403× 10−10

Table 1: Estimations of the absolute global error at xN for problem (26) with the usualimplementation.

@CMMSE Page 881 of 1461 ISBN: 978-84-612-9727-6

Page 161: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Higinio Ramos, Cesareo Lorenzo

10 20 30 40 50 60

10-9

10-7

10-5

0.001

0.1

Figure 2: Absolute global errors in logarithmic scale for the derivative y′(x) of problem(26) using RKN6, FEAB6, ST6 and FEAM6 (from up to down)

N steps Estimate GE(xN ) GE(xN ) Err. Estimation1000 7.8684× 10−7 1.1111× 10−7 6.7572× 10−7

2000 6.1705× 10−9 9.1773× 10−10 5.2527× 10−9

3000 3.6195× 10−10 5.4019× 10−11 3.0793× 10−10

4000 4.8333× 10−11 8.0362× 10−12 4.0297× 10−11

5000 1.0132× 10−11 5.6843× 10−14 1.0075× 10−11

6000 2.8299× 10−12 2.2346× 10−12 5.9526× 10−13

Table 2: Estimations of the absolute global error at xN for problem (26) with the unusualimplementation.

Acknowledgements

The authors wish to thank JCYL project SA050A08 and MICYT project MTM2008/05489for financial support.

References

[1] L. Collatz, “The Numerical treatment of Differential Equations”, (1966), Springer,Berlin.

[2] V. M. Falkner, “A method of numerical solution of differential equations”, Phil.Mag. S. 7, 21, 621-640, (1936).

@CMMSE Page 882 of 1461 ISBN: 978-84-612-9727-6

Page 162: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Error Analysis on explicit Falkner methods

[3] E. Hairer, S. P. Norsett and G. Wanner, “Solving Ordinary Differential EquationsI ”, (1987), Springer, Berlin.

[4] P. Henrici, “Discrete variable Methods in Ordinary Differential Equations”, (1962),John Wiley, New York.

[5] J. D. Lambert, “Computational Methods in Ordinary Differential Equations”,(1973), John Wiley & sons, London.

[6] P. Martın, D.J. Lopez and A. Garcıa, Implementation of Falkner method for prob-lems of the form y′′ = f(x, y). App. Math. and Comput., 109, (2000) 183–187

[7] B. Paternoster and A. Stecca, “Runge-Kutta-Nystrom Integrator”. Avaliable athttp://library.wolfram.com/infocenter/MathSource/532.

[8] L. F. Shampine and M. K. Gordon, “Computer solution of Ordinary DifferentialEquations. The initial Value Problem”, (1975), Freeman, San Francisco, CA.

@CMMSE Page 883 of 1461 ISBN: 978-84-612-9727-6

Page 163: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

An approximate solution to an initial boundary valued

problem to the Rakib-Sivashinsky equation

Paulo Rebelo1

1 Departamento de Matematica, Universidade da Beira Interior, 6201 − 001 Covilha,

Portugal

emails: [email protected]

Abstract

Of concern is the construction of an approximate solution to an initial bound-ary valued problem to the Rakib-Sivashinsky equation. The Fourier Method iscombined with the Adomian Decomposition Method in order to provide the ap-proximate solution. The variables are separated by the Fourier Method and thenonlinear system is solved by the Adomian Decompositiom Method. One exampleof application is presented.

Key words: Rakib-Sivashinsky equation, Fourier Method, Adomian polynomials,

Initial boundary valued problem.

1 Introduction

The Rakib-Sivashinsky equation

ut = ε2uxx − 1

2u2

x + u − u, (1)

governs the weak thermal limit of an upward flame interface in a channel. In (1), ε2

is a physical parameter corresponding to the Markstein length, u = u (t, x) defines an

instantaneous flame profile in dimensionless variables and u = 1L

∫ L

0 u (t, x) dx denotesthe space average. This equation is complemented with initial condition

u (t = 0, x) = u0 (x) , (2)

and with the Neumann (adiabatic) boundary conditions

ux (t, 0) = ux (t, L) = 0. (3)

The aim of this paper is to provide an approximate solution to the problem con-sidered in [12]:

@CMMSE Page 884 of 1461 ISBN: 978-84-612-9727-6

Page 164: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

ut = ε2uxx − 1

2u2

x + u − 1

L

∫ L

0u (t, x) dx, x ∈ ]0, L[ , t ∈ (0, T ]

u (0, x) = u0 (x) , x ∈ ]0, L[

ux = 0, x ∈ 0, L , t > 0

. (4)

We suppose that the exact solution of problem (4) is of the form:

u (t, x) = u0 (t) +

∞∑

k=1

uk (t) cos

(

kπx

L

)

, (5)

where the uk (t) are the solutions of a nonlinear system of ordinary differential equa-tions. We should note that the solution in the form (5) satisfies the boundary conditionsin (4).

The variables are separated by the Fourier Method and then, the nonlinear systemof differential equations is solved by the Adomian Decomposition Method. The initialconditions for the system are the projection of the initial condition u0 (x) in a suitablefunction space. More details will be given later. In order to obtain a good approx-imation to the solution we must have a good approximation to the initial condition,u0 (x).

For an application of this procedure to higher order non linear partial differentialequations we refer the reader to [15].

2 The Adomian Decomposition Method

In [4], G. Adomian developed a decomposition method for solving nonlinear (stochastic)differential equations using special polynomials An, usually called Adomian polynomi-als. The An’s are generated for each nonlinearity.

One of the main advantages of the Adomian’s polynomials is that they dependonly on the known function u0 (x). Another great advantage of this method is that thealgorithm is of simple implementation.

But usually the solutions provided by the standard Adomian Decomposition Methoddo not satisfy the boundary conditions. In [3] and [7] the authors present an im-provement that allows the standard Adomian Decomposition Method to solve initialboundary valued problems for partial differential equations.

The convergence of the Adomian Decomposition series has been investigated byseveral authors. In [11] and [14], the authors showed that the method does not convergein general, in particular, when the method is applied to linear operator equations.The theoretical analysis of convergence and speed of convergence of the decompositionmethod was considered in [13], [9], [10], [1] and [2].

Let us now consider the differential equation

@CMMSE Page 885 of 1461 ISBN: 978-84-612-9727-6

Page 165: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Lu = Ru + Φu, (6)

where L (linear) and R are differential operators and Φ is a nonlinear operator.

The Adomian polynomials decompose a given function u (t, x) in a series

u (t, x) =

∞∑

n=0

un (t, x) (7)

and for a nonlinear operator Φ, we have the following decomposition

Φ (u (t, x)) =

∞∑

n=0

An, (8)

where the An, usually called the Adomian’s Polynomials, are given by the recurrenceformula

An =1

n!

dn

dλn

[

Φ

(

∞∑

n=0

λnun

)]

λ=0, n ≥ 0. (9)

The Adomian polynomials can be constructed as follows:

A0 = Φ (u0)

A1 = u1Φ (u0)

A2 = u2Φ′ (u0) +

1

2u2

1Φ′′ (u0)

A3 = u3Φ′ (u0) + u1u2Φ

′′ (u0) +1

3!u3

1Φ′′′ (u0)

...

Algorithms for formulating Adomian polynomials where investigated in [5] and [16].

We also suppose the existence of the inverse operator L−1. Thus, applying L−1 to(6), we obtain the recurrence relation,

um+1 = L−1Rum + L−1Φum, u0 = u0 (x) , m ∈ N, (10)

that provides a reliable approach to the solution of the problem.

Now, we briefly describe how to apply the Adomian Decomposition Method to sys-tems of ordinary differential equations. Let us consider a system of ordinary differentialequations in the form

Lu1 = f1 (t, u1, u2, . . . , un)Lu2 = f2 (t, u1, u2, . . . , un)

...Lun = fn (t, u1, u2, . . . , un)

, (11)

@CMMSE Page 886 of 1461 ISBN: 978-84-612-9727-6

Page 166: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

with initial conditions ui (0), for 1 ≤ i ≤ n, where Lu = u ≡ dudt

with inverse L−1 (·) =∫ t

0 (·) dt.Applying the inverse operator L−1 to (11), we obtain the following canonical form

u1 = u1,0 + L−1t [f1 (t, u1, u2, . . . , un)]

u2 = u2,0 + L−1t [f2 (t, u1, u2, . . . , un)]

...

un = un,0 + L−1t [fn (t, u1, u2, . . . , un)]

, (12)

where ui,0 = ui (0) for 1 ≤ i ≤ n.Thus, by the Adomian Decomposition Method, each component of the solution of

(11) can be expressed as a series of the form

uj =

∞∑

i=0

fi,j (13)

and the integrands on the right side of (12), using (9), are expressed as

fi (t, u1, u2, . . . , un) =∞∑

j=0

Ai,j (fi,0, fi,1, fi,2, . . . , fi,j) , 1 ≤ i ≤ n, (14)

where the Ai,j are the Adomian Polynomials corresponding to the nonlinear part fi.We should note that in order to solve system (11), we obtain a system of Volterra

integral equations of the second kind, (12).In order to accelerate the convergence of the method when applied to nonlinear

systems of Volterra integral equations of second kind, we will proceed as in [8]. Forconsiderations related to the convergence of the Adomian Decomposition Method whenapplied to nonlinear systems of Volterra integral equations of second kind, we refer thereader to [6], where the problem of convergence is studied.

This paper is divided as follows: in the next section we describe the Adomian De-composition method; in section 4 we provide the necessary steps to obtain the approxi-mate solution to problem 4; in the last section, we present two examples of application.

3 The approximate solution

We now consider the problem

ut = ε2uxx − 1

2u2

x + u − 1

L

∫ L

0u (t, x) dx, x ∈ ]0, L[ , t ∈ (0, T ]

u (0, x) = u0 (x) , x ∈ ]0, L[

ux = 0, x ∈ 0, L , t > 0

. (15)

We consider that the approximate solution is of the form

@CMMSE Page 887 of 1461 ISBN: 978-84-612-9727-6

Page 167: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

u (t, x) ≈ u0 (t) +n∑

k=1

uk (t) cos

(

kπx

L

)

. (16)

Thus, using (16) in (15), multiplying by cos(

iπxL

)

for 0 ≤ i ≤ n, we obtain thenonlinear system of ordinary differential equations

u0 (t) = −1

4

L

)2n∑

k=1

k2u2k (t) (17)

ui (t) =

(

1 −(

εiπ

L

)2)

ui (t)

− 1

L

L

)2n∑

j,k=1

jkuj (t)uk (t)

∫ L

0sin

(

jπx

L

)

sin

(

kπx

L

)

cos

(

iπx

L

)

dx,(18)

for 1 ≤ i ≤ n.The initial conditions for this system are given by

u0 (0) =1

L

∫ L

0u0 (x) dx

ui (0) =2

L

∫ L

0u0 (x) cos

(

iπx

L

)

dx

(19)

for 1 ≤ i ≤ n.Integrating in order to the variable t, we obtain

u0 (t) = u0 (0) − 1

4

L

)2n∑

k=1

k2

∫ t

0u2

k (t) dt (20)

ui (t) = ui (0) +

(

1 −(

εiπ

L

)2)

∫ t

0ui (t) dt

− 1

L

L

)2n∑

j,k=1

jk

[∫ L

0sin

(

jπx

L

)

sin

(

kπx

L

)

cos

(

iπx

L

)

dx

]

×∫ t

0uj (t)uk (t) dt,

(21)

for 1 ≤ i ≤ n.Thus, we have the following recurrence scheme

u0,m+1 (t) = −1

4

L

)2n∑

k=1

k2

∫ t

0u2

k,m (t) dt (22)

@CMMSE Page 888 of 1461 ISBN: 978-84-612-9727-6

Page 168: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

ui,m+1 (t) =

[

1 −(

εiπ

L

)2]

∫ t

0ui,m (t) dt

− 1

L

L

)2n∑

j,k=1

jk

[∫ L

0sin

(

jπx

L

)

sin

(

kπx

L

)

cos

(

iπx

L

)

dx

]

×∫ t

0uj,m (t)uk,m (t) dt, (23)

for 1 ≤ i ≤ n.The initial conditions for system (22)-(23) are given by:

u0,0 (0) =1

L

∫ L

0u0 (x) dx

ui,0 (0) =2

L

∫ L

0u0 (x) cos

(

iπx

L

)

dx

(24)

for 1 ≤ i ≤ n.All the nonlinearities in (17)-(18) (and consequently in the following systems) are

of the form Φ (u, υ) = uυ and Ψ (u) = u2.Using (9) we have for Φ (u, υ)

A0 = u0υ0

A1 = u1υ0 + u0υ1

A2 = u2υ0 + u1υ1 + u0υ2

A3 = u3υ0 + u2υ1 + u1υ2 + u0υ3

A4 = u4υ0 + u3υ1 + u2υ2 + u1υ3 + u0υ4

...

and for Ψ (u), it is only necessary to consider that u = υ in Φ (·, ·). Thus we have

B0 = u20

B1 = 2u1u0

B2 = 2u2u0 + u21

B3 = 2u3u0 + 2u1u2

B4 = u22 + 2u1υ3 + 2u0υ4

...

In the next section we present one example of application.

3.1 Numerical Simulation

All the calculations and graphics presented in this section where obtained using thesymbolic software Mathematica.

@CMMSE Page 889 of 1461 ISBN: 978-84-612-9727-6

Page 169: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

In this section we present one example of application. We consider that L = π,n = 5, ε2 = 10−2 and 0 ≤ m ≤ mmax = 4. The approximate solution satisfies bothinitial and boundary conditions.

Let us consider the problem

ut = ε2uxx − 1

2u2

x + u − 1

L

∫ π

0u (t, x) dx, x ∈ ]0, π[ , t ∈ (0, T ]

u (0, x) = 1+cos(x)100 , x ∈ ]0, π[

ux = 0, x ∈ 0, π , t > 0

. (25)

Thus, using Mathematica to obtain an approximate solution to the system (22)-(23)we obtain the following approximate solution:

u0,0 (t) = 0.01000000000

u0,1 (t) = −0.000025t

u0,2 (t) = −0.000025t2

u0,3 (t) = −0.0000166667t3

u0,4 (t) = −8.333333083330211 × 10−6t4

u0,5 (t) = −3.3333331999931272 × 10−6t5

u1,0 (t) = 0.01000000000

u1,1 (t) = ×0.009999999900t

u1,2 (t) = ×0.004999874900t2

u1,3 (t) = ×0.001666416617t3

u1,4 (t) = ×0.0004164062339t4

u1,5 (t) = ×0.00008314583074t5

u2,0 (t) = 0.00002500000000

u2,1 (t) = 0.00003749999925t

u2,2 (t) = 0.00002916583225t2

u2,3 (t) = 0.00001562291585t3

u2,4 (t) = 6.45562459 × 10−6t4

u2,5 (t) = 6.45562459 × 10−6t5

u3,0 (t) = 0

u3,1 (t) = 0

u3,2 (t) = 1.250000000 × 10−7t2

u3,3 (t) = 2.499999929 × 10−7t3

u3,4 (t) = 2.604096219 × 10−7t4

u3,5 (t) = 1.874788927 × 10−7t5

u4,0 (t) = 0

u4,1 (t) = 0

u4,2 (t) = 0

u4,3 (t) = 8.33333333 × 10−10t3

u4,4 (t) = 2.083333259 × 10−9t4

u4,5 (t) = 2.70826649 × 10−9t5

u5,0 (t) = 0

u5,1 (t) = 0

u5,2 (t) = 0

u5,3 (t) = 0

u5,4 (t) = 6.51041667 × 10−12t4

u5,5 (t) = 1.953124918 × 10−11t5

The surface in figure 1 is a graphical representation of the approximate solution,u (t, x).

@CMMSE Page 890 of 1461 ISBN: 978-84-612-9727-6

Page 170: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

0

1

2

3 0

2

4

6

8

10

-5-2.5

02.55

0

1

2

3

Figure 1: The surface shows the approximate solution u (t, x), for 0 ≤ x ≤ π, 0 ≤ t ≤ 10.

4 Conclusions

In this paper we have used the Fourier Method and the Adomian DecompositionMethod to present an approximate solution to an initial boundary boundary prob-lem to the one-dimensional Rakib-Sivashinsky equation. The procedure is simple toimplement and only with a few terms provides a reliable approximate solution. Italso avoids the difficulties and massive computational work compared to other existingtechniques.

Acknowledgements

This work was partially supported by Centro de Matematica da Universidade da BeiraInterior, FCT/POCI2010/FEDER (MPPC project of CMUBI).

References

[1] Abbaoui, K., and Cherrauault, Y. Convergence of adomian’s method ap-plied to differential equations. Comput. Math. Appl. 28 (1996), 103–109.

[2] Abbaoui, K., and Cherrauault, Y. New ideas for proving convergence ofdecomposition methods. Comput. Math. Appl. 28 (1996), 103–108.

[3] Adomian, G., and Rach, R. A new algorithm for matching boundary condi-tions in decomposition solutions. Appl. Math. Comput. 58, 1 (1993), 61–68.

@CMMSE Page 891 of 1461 ISBN: 978-84-612-9727-6

Page 171: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

[4] Adomian, G. Nonlinear Stochastic Systems and Application to Physics. Kluwer,1989.

[5] Adomian, G. Solving Frontier Problems of Physics: The Decomposition Method.

Kluwer, 1994.

[6] Babolian, E., and Biazar, J. Solution of a system of nonlinear volterraintegral equations of the second kind. Far East J. Math. Sci. 2, 6 (2004), 935–945.

[7] Benneouala, T., and Cherruault, Y. New methods for applying the ado-mian method to partial differential equations with boundary conditions. Kyber-

netes 34, 7-8 (2005), 924–933.

[8] Biazar, J., Babolian, E., and Islam, R. Solution of the system of ordinarydifferential equations by adomian decomposition method. Applied Mathematics

and Computation 147 (2004), 713–719.

[9] Cherrauault, Y. Convergence of adomian’s method. Kybernetes 18 (1989),31–38.

[10] Cherrauault, Y., and Adomian, G. Decomposition methods: a new proofof convergence. Math. Comput. Model. 18 (1993), 103–106.

[11] Golberg, M. A note on the decomposition method for operator equations. Appl.

Math. Comput. 106, 2-3 (1999), 215–220.

[12] Guidi, L. F., and Marchetti, D. H. A comparison analysis of Sivashinsky’stype evolution equations describing flame propagation in channels. Phys. Lett.,

A 308, 2-3 (2003), 162–172.

[13] Hosseini, M., and Nasabzadeh, H. On the convergence of Adomian decom-position method. Appl. Math. Comput. 182, 1 (2006), 536–543.

[14] Nelson, P. Adomian’s method of decomposition: Critical review and examplesof failure. J. Comput. Appl. Math. 23, 3 (1988), 389–393.

[15] Rebelo, P. On the approximate solution to an initial boundary valued problemfor the Cahn-Hilliard equation. Commun Nonlinear Sci Numer Simulat (2009),doi:10.1016/j.cnsns.2009.03.030

[16] Wazwaz, A. A new algorithm for calculating adomian polynomials for nonlinearoperators. Appl. Math. Comput. 111, 1 (2001), 33–51.

@CMMSE Page 892 of 1461 ISBN: 978-84-612-9727-6

Page 172: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

Shared memory programming models for evolutionary

algorithms∗

J.L. Redondo1, I. Garcıa1 and P.M. Ortigosa1

1 Dpt. Computer Architecture and Electronics, University of Almerıa

emails: [email protected], [email protected], [email protected]

Abstract

In this work, two parallel techniques based on shared memory programming arepresented. These models are specially suitable to be applied over evolutionary algo-rithms. To study their performance, the algorithm UEGO (Universal Evolutionary

Global Optimizer) has been chosen.

Key words: Evolutionary algorithm, shared memory programming, computa-

tional experiment, UEGO.

1 Introduction

The objective of global optimization is to find the best (global) solution of optimizationproblems, in the presence of multiple local and global optimal solutions. Formally,global optimization seeks the global solution of a constrained optimization model. Thefield of global optimization studies these problems which have one or several globaloptima.

Nowadays, it is really difficult to propose a classification of the search algorithms,since they blend different techniques in order to obtain the global optimum. A possibleclassification of the global optimization methods can be consulted in [25]. Neverthe-less, we venture to give one classification, where the global optimization methods aredivided into exact and heuristic, depending on whether or not they can guarantee theconvergence to the optimal solution.

Exact algorithms are guaranteed to find, for a finite size instance of a problem,an optimal solution. However, for NP-hard problems, no polynomial time algorithmexists. Therefore, for many practical purposes, exact methods are very expensive from

∗† This work has been funded by grants from the Spanish Ministry of Science and Innovation(TIN2008-01117) and Junta de Andalucıa (P06-TIC-01426, P08-TIC-3518), in part financed by theEuropean Regional Development Fund (ERDF).

@CMMSE Page 893 of 1461 ISBN: 978-84-612-9727-6

Page 173: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Shared memory programming models for evolutionary algorithms

a computational point of view, and heuristic methods have received more and moreattention in the last few years.

Heuristics can provide useful and practical solutions for a wide range of problemsand application domains. The power of heuristics comes from its ability to treat withcomplex problems when no or little knowledge of the search space is given. Theyare particularly well suited to deal with a wide range of computationally intractableoptimization and decision-making applications.

Usually, heuristics imitate successful strategies found in nature. For example, evo-lutionary techniques copy the principles applied to populations to develop superiorqualities over generations.

Evolutionary Computation (EC) is a modern search technique which uses computa-tional models of processes of evolution and selection [10, 11]. Concepts and mechanismsof Darwinian evolution and natural selection are encoded in evolutionary algorithms(EAs) and used to solve problems in many fields of engineering and science [3].

Nevertheless, those problems are hard and large, and need to explore the searchspace deeply to obtain good solutions. This translates directly into larger computa-tional times and larger resource requirements (memory, processors...). In such situation,a parallel machine together with a parallel model is required. Literature contains manyexamples of successful parallel implementations. Nevertheless, as nowadays new multi-core systems are expected to become common as personal computers, to develop modelsbased on this paradigm could be a good choice.

In shared memory programming, the whole memory is directly accessible to all theprocesses with an intent to provide communication among themselves. Depending oncontext, programs may run on the same physical processor or on separate ones. Al-though there exist several ways to deal with parallelism in a shared memory model, thestandardized library Pthreads (or POXIS threads) have been considered. It providesa unified set of C library routines with the main aim to make multithreaded imple-mentations portable. This library includes functions for thread management (creation,scheduling and destruction) and synchronization (mutexes, synchronization variables,semaphores and read-write locks), and it is available mainly for several variants of theUNIX operating system.

In this work, two parallel techniques based on shared memory programming forevolutionary algorithms have been designed. The first parallel algorithms has beencalled GM (Global Memory). In this case, threads access to shared memory in mutualexclusion. In the second one, named SSM (Structured Shared Memory), each thread isin charge of processing the assigned computational burden, which is stored in a localdata structure.

To illustrate the performance of such models, the evolutionary algorithm UEGO(Universal Evolutionary Global Optimizer) has been used. This algorithm has provedits ability at finding the global optimal solutions when solving multimodal optimizationproblems and also test functions described in literature (See references [8, 14, 15, 17,19, 22] and papers there in). Moreover, several parallel implementations of UEGOwere designed for distributed memory architectures with successful results. They werebased on message-passing mechanisms [16, 18, 20, 21]. The first technique will be called

@CMMSE Page 894 of 1461 ISBN: 978-84-612-9727-6

Page 174: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

J.L. Redondo, I. Garcıa, P.M. Ortigosa

Algorithm 1: Evolutionary Algorithms(EA)

1 Generate an initial population2 Evaluate the population3 WHILE termination conditions do not met DO4 Recombine the population to obtain a new offspring5 Mutate the offspring to obtain a new population6 Evaluate the new population7 Select individuals from the new population to be considered in the next

generation

GM UEGO and the second SSM UEGO.

The rest of the paper is organized as follows: In Section 2 the main ideas ofthe evolutionary algorithms are briefly described and some particularities of UEGOare presented. It is in Section 3 where the parallel implementations are presented. InSection 4 computational experiments to study the performance of the parallel algorithmare carried out. The paper ends with some conclusions in Section 5.

2 The evolutionary algorithms

Algorithm 1 describes the basic structure of an evolutionary algorithm (EA). Initially,a population of randomly generated individuals (or individuals obtained from othersources such as constructive heuristics) is created. The fitness is used to determine therelative merit of each individual. Once the initial population is obtained, an iterativeprocess is carried out. At each iteration, a new offspring is generated using a recombi-nation operator, usually a two-parent or multi-parent crossover [2, 4]. Other techniquesuse population statistics for generating offspring [13, 24]. In order to introduce somenoise in the search process for avoiding the convergence to local optima, a mutationoperator is applied to the offspring. In some applications, small random changes areused as a mutation mechanism. But, in other ones, it proved to be quite beneficial touse local search algorithms to increase the fitness of individuals. These EA methods areoften called Memetic Algorithms. While the use of a population ensures an explorationof the search space, the use of local search techniques helps to quickly identify “good”areas in the search space. Nevertheless, a premature convergence towards sub-optimalsolutions can happen when applying local search. In order to avoid this drawbackniching methods are commonly used [5, 9, 12]. Finally, the individuals compete, eitheronly among themselves or also against their parents, to belong to the population at thenext iteration. This is done by a selection scheme.

2.1 The algorithm UEGO

UEGO is an evolutionary algorithm which is able both to solve multimodal optimizationproblems and to discover the structure of these optima as well as the global optimum.In a multimodal domain, each peak can be thought of as a niche.

@CMMSE Page 895 of 1461 ISBN: 978-84-612-9727-6

Page 175: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Shared memory programming models for evolutionary algorithms

The concept of niche is renamed species in UEGO. A species in UEGO can bethought of as a window on the whole search space. This window is defined by its centerand a radius. The center is a solution, and the radius indicates its attraction area,which covers a region of the search space and hence, multiple solutions. The radiusof the species is neither constant along the execution of UEGO nor the same for eachspecies, but it decreases as the index level (or cycles or generations) increases. Theparameter L indicates the maximum number of levels (iterations) in the algorithm. Ateach level i the function radius Ri does not change. In addition to the radius valueRi, each level has two maxima on the number of function evaluations (f.e.), namelynewi (maximum f.e. allowed when creating new species) and ni (maximum f.e. allowedwhen optimizing species).

During the optimization process, a list of species is kept by uego. This concept,species-list, would be equivalent to the term population in an evolutionary algorithm.UEGO is in fact a method for managing this species-list (i.e. creating, deleting andoptimizing species). The maximum length of the species list is given by the inputparameter max spec num (maximum population size).

In UEGO every species is intended to occupy a local maximizer of the fitnessfunction, without knowing the total number of local maximizers in the fitness landscape.This means that when the algorithm starts it does not know how many species therewill be at the end. For this purpose, UEGO uses a non-overlapping set of specieswhich defines sub-domains of the search space. As the algorithm progresses, the searchprocess can be directed towards smaller regions by creating new sets of non-overlappingspecies defining smaller sub-domains. This process is a kind of cooling method similarto simulated annealing. A particular species is not a fixed part of the search domain;it can move through the space as the search proceeds.

A global description of UEGO is given in Algorithm 2. At the beginning, a sin-gle species (the root species) exists, and as the algorithm evolves and applies geneticoperators, new species can be created (Create specie procedure). At every generation,UEGO performs a local optimizer operation on each species (Optimize species proce-dure). Therefore, UEGO belongs to the class of Memetic algorithms too. It is impor-tant to highlight that UEGO, unlike other evolutionary algorithm, realizes two selectionprocedures during the optimization process. The first one is carried out after the newoffspring is generated. This consist of the Fuse species and Shorten species list proce-dures. The second one takes place after the optimization procedure, and only considersthe Fuse species procedure.

The reader is referred to [19] for a more detailed description of the UEGO algorithm.

3 Parallel EAs on shared memory

Evolutionary algorithms are methods that work on a set of solutions (i.e. a population).Literature contains many examples of successful parallel implementations, dependingon how population can be divided [1]. In this study, it is considered that the populationcan be separated into several isolated subpopulations, which can evolve to the local or

@CMMSE Page 896 of 1461 ISBN: 978-84-612-9727-6

Page 176: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

J.L. Redondo, I. Garcıa, P.M. Ortigosa

Algorithm 2: Algorithm UEGO

1 Init species list2 Optimize species(n1)3 FOR i = 2 to L

4 Determine Ri, newi, ni

5 Create specie(newi)6 Fuse species(Ri)7 Shorten species list(max spec num)8 Optimize species(ni)9 Fuse species(Ri)

global optima without participation of the remaining ones. Therefore, there existsan intrinsic parallelism which consists of dividing the population into the availableprocessing elements. It is important to highlight that, in UEGO, a subpopulation iscompounded by a single species.

The parallel algorithms developed in this study consider a single population (specieslist) which is stored in shared memory. A main thread (processing element) carries outglobal decisions and has unidirectional control over one or more secondary threads.The parallelism comes from the concurrent execution of both creation and optimiza-

tion procedures (Lines 5 and 8 of Algorithm 2). Notice that, in parallel evolutionaryalgorithms, the selection (Fuse species and Shorten species list procedures in UEGO)can become a synchronization point, since in most of the cases, it is necessary to havethe whole population to proceed as the sequential version. Of course, partial selec-tions could be carried out concurrently, although finally a global one may be necessarywhenever it is required the parallel algorithm behaves as the sequential one.

In this work, two parallel strategies of the Evolutionary Algorithm UEGO areanalyzed (GM UEGO and SSM UEGO), depending on how processing elements inter-change information. In the first one (GM UEGO), the exchange is carried out throughglobal variables, while in the second one (SSM UEGO), local data structures are used.

3.1 The algorithm GM UEGO

In this parallel model, the population (species list) is stored in shared memory and eachsecondary thread reads the fraction of the population the main thread indicates, appliesthe Creation species or Optimize species procedures, and writes back the evaluationresults. Threads access to data in mutual-exclusion.

The main thread will create as many secondary threads as number of species in thespecies list, although they will be executed by sets of (at the most) MaxTh threads.MaxTh is an input parameter representing the number of process units used to solvethe problem. Once a thread has finished its task, it will be destroyed. This means thatthe total number of created threads varies depending of the stage of the algorithm.

To be able either to create or to optimize species, threads have to know the currentiteration (level) of the algorithm and the length of the species list, since the number of

@CMMSE Page 897 of 1461 ISBN: 978-84-612-9727-6

Page 177: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Shared memory programming models for evolutionary algorithms

function evaluations allowed during the creation and optimization procedures dependson these two values. In this parallel version, two global variables have been consideredto keep these two important values, i.e. the level of the algorithm and the length ofthe list. Therefore, all the threads can access to their values everywhere.

In the parallel Create species procedure, the main thread assigns a single speciesto every thread (the corresponding memory address is given). This species will evolvetrying to improve its fitness function value. If it happens, the region memory corre-sponding to the species will be updated with the improved values. Notice that everythread access to different addresses of shared memory to update their species and,therefore, mutual exclusion is not required here. It is important to highlight that, asconsequence of the applications of the creation operators, new species can be generated.These new species must be inserted into the species list and therefore, the correspond-ing list length must be increases accordingly. In this parallel version, each secondarythreads is in charge of updating the species list and its length. To maintain coherencein the data mutual exclusions are considered.

In the parallel Optimize species procedure, as in the previous one, the main threadassigns the memory address of a single species to every secondary thread. Those threadswill execute a local optimization procedure (in our case SASS [23]). As consequence,the species may be improved. In such a case, the thread will update the memoryaddress with the new value.

3.2 The algorithm SSM UEGO

In this parallel method, the species list is stored in shared memory. Nevertheless,threads do not access directly to species and neither to other variables through mutual-exclusion, but each secondary thread handles its own Local Data Structure (LDS).Different input and output LDSs are considered. The input data structure includes:the identity of the thread (tid), the memory address of a sublist of species (specList),the current iteration (level) of the algorithm and the length of the species list. Theoutput data structure contains: the memory address of a new sublist of species.

At every level, when creation and optimization procedures are executed, the mainthread creates MaxTh secondary threads, with their corresponding input LDS as soleargument. These threads will be run simultaneously at each process unit. The value ofthe parameter MaxTh coincides with the number of available units of process. Whena thread finishes its task, it will return the output resulting structure, which will beavailable to the main thread.

In the parallel Create species process, every secondary thread executes the creationprocedure to its corresponding input sublist. As consequence, the species of this sublistcan be modified. Every thread takes charge of updating the corresponding addressof memory with the new values. Again, mutual exclusion is not required. After thecreation procedure, a list of new species can be also obtained. It is important tohighlight that this creation mechanism considers a partial fusion over this new list.This reduces the computational load of the subsequent Fuse species procedure (see line6 of Algorithm 2). Finally, the thread will update its output data structure with the

@CMMSE Page 898 of 1461 ISBN: 978-84-612-9727-6

Page 178: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

J.L. Redondo, I. Garcıa, P.M. Ortigosa

In the parallel Optimize species procedure, as before, the main thread divides the species of the species list among the MaxTh threads. Each thread executes a local optimization procedure (SASS [23]) on every species of its sublist and updates the sublist with the new values. Again, mutual exclusion is not necessary.

After the parallel Create species and Optimize species procedures, the main thread updates the species list taking into account the resulting values obtained by each secondary thread.

4 Performance evaluation

The main goal of this study is to evaluate the parallel algorithms GM UEGO and SSM UEGO in comparison with the sequential algorithm UEGO. To determine whether the parallel algorithms are efficient from a computational point of view, numerical values of efficiency have been recorded. The efficiency of a parallel version, which estimates how well the processors are utilized in solving the problem, is computed as:

Eff(P) = T(1) / (P · T(P)),

where T(i) is the CPU time employed by the algorithm when i processing elements are used (i = 1, 2, ..., P).

It is important to highlight that, due to the stochastic nature of the algorithms, all the experiments have been executed 5 times and average values have been considered when computing the efficiency.
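A minimal C++ helper showing how such efficiency values can be obtained from the averaged runtimes (purely illustrative; the names are ours):

#include <vector>
#include <numeric>

// Average of the 5 measured runtimes for one configuration.
double average(const std::vector<double>& runs) {
    return std::accumulate(runs.begin(), runs.end(), 0.0) / runs.size();
}

// Eff(P) = T(1) / (P * T(P)), using averaged times.
double efficiency(const std::vector<double>& serialRuns,   // T(1) samples
                  const std::vector<double>& parallelRuns, // T(P) samples
                  int P) {
    return average(serialRuns) / (P * average(parallelRuns));
}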

Performance evaluation has been carried out using a set of tailored test problems. This set of problems and the setup parameters of UEGO are described in Subsection 4.1, and the computational results obtained from our experiments are provided in Subsection 4.2.

4.1 Test problems and the environment

A comprehensive computational study has been carried out on a set of newly defined functions, following the ideas discussed in [6], namely that for scientific tests it is more convenient to use functions that differ only in controllable features. This allows analyzing the effect of a single, isolated feature of the test problem, e.g. the number of local optima. Well-known benchmark functions are not used for testing, since this prevents developing methods that perform well only on special kinds of problems. Furthermore, it is more important to characterize the ideal problem for an optimization method than to show that it outperforms other methods on as many benchmark problems as possible.

In the experiments discussed here, the effects of dimensionality and of the number of local optima are examined. To this end, several test functions are used, which are characterized in Table 1.

The construction of these functions is detailed in [7]. It starts with a user-given list of local optimum sites (o) and the corresponding function values (fo).

All the function values have to be positive. In the first step, we define a bell shape for every site to create the local optima. The height of a bell is given by the function value fo of its site o, and its radius r is the distance from o to the closest site. The height of the bell at a distance x from o is fo·g(x), where:

g(x) = 1 − 2x²/r²      if x < r/2
     = 2(x − r)²/r²    if r/2 ≤ x < r
     = 0                otherwise

The objective function is the sum of these bells. In the case of our test functions, the coordinates of the maximum sites and their values were randomly taken from [0, 1] using a uniform distribution.
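The construction above can be sketched in C++ as follows. This is a hedged illustration under our own assumptions (the Site type and helper names are not from [7]); it only shows the bell shape g and the sum-of-bells objective.

#include <cmath>
#include <vector>

struct Site {
    std::vector<double> o;   // coordinates of the local optimum site
    double fo;               // (positive) function value at the site
    double r;                // distance from o to the closest other site
};

// Bell shape as defined above.
double g(double x, double r) {
    if (x < r / 2.0) return 1.0 - 2.0 * x * x / (r * r);
    if (x < r)       return 2.0 * (x - r) * (x - r) / (r * r);
    return 0.0;
}

double distance(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Objective: sum of the bells fo * g(||x - o||) over all sites.
double objective(const std::vector<double>& x, const std::vector<Site>& sites) {
    double f = 0.0;
    for (const Site& s : sites) f += s.fo * g(distance(x, s.o), s.r);
    return f;
}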

The algorithms were coded in C++ and run on an SGI Altix 300 machine with 16 Intel Itanium2 1.6 GHz processors and 64 GB of RAM, with cache-coherent Non-Uniform Memory Access (ccNUMA), running a Linux operating system with a 2.6.5 kernel. The POSIX Thread NPTL library (version 2.3.5) was used to create threads.

The input parameters of UEGO were set to N = 2 · 10^8, M = 750, L = 10 and RL = 0.02 for all the instances and algorithms. Due to the stochastic nature of the algorithms, all the experiments were executed 5 times and average values were considered. It is important to highlight that the confidence intervals obtained for these average values are relatively narrow, which reveals the robustness of the algorithms' solutions.

4.2 Computational results

It is important to emphasize that an effectiveness analysis showed that both parallel algorithms are able to provide the global optimal solution for the complete set of problems.

Table 1 summarizes the efficiency obtained by the parallel algorithms when they are executed with a maximum of 2, 4, 8 and 16 processing elements over sixteen problems with different numbers of optima (50, 100, 150, 200) and dimensions of the search space (2, 5, 10, 15).

Experimental results show that both parallel algorithms are able to obtain high values of efficiency independently of the number of processing elements and of the computational load of the problem (see Table 1). In fact, both parallel algorithms are able to obtain efficiencies close or equal to the ideal case; see for instance the case with dim = 5 and 50 optima executed with 2 threads, where SSM UEGO obtains an efficiency value of 1.00 and GM UEGO gets 0.93. These high values of efficiency can be obtained because the computational costs associated with the execution of the selection procedures (Fuse species and Shorten species list) are negligible even though they are executed sequentially.

In general, the efficiency of the parallel implementations decreases when the number of processing elements increases. This may be because the computational load is not large enough to be divided among many processing elements. Moreover, the synchronization points imposed by the selection procedures (see Lines 6, 7 and 9 of Algorithm 2) may cause processes to stay idle longer when load imbalance exists, which reduces the efficiency of the parallel versions.

Table 1: Efficiency estimation for all the problems

                                Number of optima
               50            100           150           200
dim  MaxTh   GM    SSM     GM    SSM     GM    SSM     GM    SSM
 2     2    0.88   0.93   0.88   0.91   0.88   0.96   0.86   0.92
       4    0.77   0.85   0.76   0.84   0.77   0.90   0.76   0.88
       8    0.68   0.75   0.70   0.75   0.66   0.81   0.66   0.80
      16    0.52   0.60   0.55   0.63   0.53   0.68   0.52   0.68
 5     2    0.93   1.00   0.91   0.94   0.93   0.91   0.89   0.92
       4    0.81   0.93   0.87   0.94   0.85   0.90   0.85   0.87
       8    0.77   0.91   0.79   0.90   0.78   0.86   0.77   0.84
      16    0.65   0.83   0.69   0.84   0.69   0.82   0.68   0.80
10     2    0.93   0.98   0.88   0.90   0.89   0.88   0.89   0.91
       4    0.86   0.95   0.84   0.84   0.85   0.85   0.88   0.88
       8    0.78   0.89   0.76   0.84   0.77   0.84   0.80   0.80
      16    0.71   0.84   0.66   0.76   0.70   0.80   0.69   0.70
15     2    0.89   0.97   0.80   0.81   0.80   0.81   0.80   0.83
       4    0.84   0.95   0.74   0.75   0.70   0.73   0.70   0.70
       8    0.76   0.80   0.68   0.70   0.68   0.70   0.62   0.66
      16    0.60   0.80   0.60   0.62   0.61   0.63   0.58   0.61

(GM = GM UEGO, SSM = SSM UEGO)

Notice also that efficiency decreases when the complexity of the problem to solve is very high (i.e. with the number of optima and the dimension). This may be because the effects of load imbalance increase when the computational cost associated with a function evaluation rises.

As can be observed, the algorithm SSM UEGO outperforms the efficiencies obtained by GM UEGO. In the algorithm GM UEGO, processing elements interchange information using global variables; to maintain coherence, processes access that memory through mutual exclusion. Those mutual exclusions may become bottlenecks when several processing elements want to update the same memory region. On the contrary, SSM UEGO uses local variables for the exchange of information. This makes the memory management more efficient, which translates into higher efficiencies.

5 Conclusion and future work

In this paper, two parallelizations of the algorithm UEGO have been presented: GM UEGO and SSM UEGO. The parallel versions have been designed to be executed on shared memory architectures. Since UEGO is an evolutionary algorithm and shares many features with many of these methods, the conclusions drawn from this study can be applied to this wide set of algorithms.

Computational experiments have shown that both algorithms behave well, since efficiencies are high independently of the computational load of the problem and the number of processing elements; moreover, they obtain ideal efficiencies in some cases. However, SSM UEGO obtains better results than GM UEGO. The reason is that SSM UEGO does not use mutual exclusion, which reduces the number of synchronization points and the waiting times of the processing elements.

In the near future, other parallel strategies also based on shared-memory programming will be developed, evaluated and compared. In addition, some behaviours of the current parallel versions will be studied in more depth.

References

[1] Alba, E. Parallel Metaheuristics: A New Class of Algorithms. Wiley-Interscience, 2005.

[2] Bersini, H., and Seront, G. In search of a good evolution-optimization crossover. In Proceedings of PPSN-II, Second International Conference on Parallel Problem Solving from Nature (1992), R. Manner and B. Manderick, Eds., Elsevier, Amsterdam, The Netherlands, pp. 479–488.

[3] Darwin, C. The Origin of Species by Means of Natural Selection. Mentor Reprint, 1958, 1859.

[4] Eiben, A., Raue, P., and Ruttkay, Z. Genetic algorithms with multiparent recombination. In Parallel Problem Solving from Nature - PPSN III (Berlin, 1994), Y. Davidor, H.-P. Schwefel, and R. Manner, Eds., vol. 866 of Lecture Notes in Computer Science, Springer, pp. 78–87.

[5] Goldberg, D., and Richardson, J. Genetic algorithms with sharing for multimodal function optimization. In Genetic Algorithms and their Applications, J. J. Grefenstette, Ed., Lawrence Erlbaum Associates, Hillsdale, NJ, 1987, pp. 177–183.

[6] Hooker, J. N. Testing heuristics: We have it all wrong. Journal of Heuristics 1, 1 (1995), 33–42.

[7] Jelasity, M., Ortigosa, P., and García, I. UEGO, an abstract clustering technique for multimodal global optimization. Journal of Heuristics 7, 3 (2001), 215–233.

[8] Redondo, J. L. Solving competitive location problems via memetic algorithms. High performance computing approaches. PhD thesis, Universidad de Almería, 2008.

[9] Jong, K. D. An analysis of the behavior of a class of genetic adaptive systems. PhD thesis, University of Michigan, Ann Arbor, MI, 1975.

[10] Kicinger, R., Arciszewski, T., and Jong, K. D. Evolutionary computation and structural design: A survey of the state of the art. Computers and Structures 83, 23-24 (2005), 1943–1978.

[11] Luke, S. Issues in Scaling Genetic Programming: Breeding Strategies, Tree Generation, and Code Bloat. PhD thesis, Department of Computer Science, University of Maryland, 2000.

[12] Mahfoud, S. Niching methods for genetic algorithms. PhD thesis, University of Illinois at Urbana-Champaign, Urbana, IL, 1995.

[13] Mühlenbein, H., and Voigt, H.-M. Gene pool recombination in genetic algorithms. In Proceedings of the Metaheuristics Conference (1995), I. Osman and J. Kelly, Eds., Kluwer Academic Publishers, pp. 53–62.

[14] Ortigosa, P., Redondo, J., García, I., and Fernández, J. A population global optimization algorithm to solve the image alignment problem in electron crystallography. Journal of Global Optimization 37, 4 (2007), 527–539.

[15] Redondo, J., Fernández, J., García, I., and Ortigosa, P. Heuristics for the facility location and design (1|1)-centroid problem on the plane. Computational Optimization and Applications (2007). To appear, DOI: 10.1007/s10589-008-9170-0.

[16] Redondo, J., Fernández, J., García, I., and Ortigosa, P. Parallel algorithms for continuous competitive location problems. Optimization Methods & Software 23, 5 (2008), 779–791.

[17] Redondo, J., Fernández, J., García, I., and Ortigosa, P. A robust and efficient global optimization algorithm for planar competitive location problems. Annals of Operations Research 167 (2009), 87–106.

[18] Redondo, J., Fernández, J., García, I., and Ortigosa, P. Parallel algorithms for multifacilities continuous competitive location problems. Journal of Global Optimization, in press (2009).

[19] Redondo, J., Fernández, J., García, I., and Ortigosa, P. Solving the multiple competitive location and design problem on the plane. Evolutionary Computation 17, 1 (2009), 21–53.

[20] Redondo, J., García, I., Ortigosa, P., Pelegrín, B., and Fernández, P. Parallelization of an algorithm for finding facility locations for an entering firm under delivered pricing. In Proceedings of Parallel Computing 2005 (PARCO 2005) (2005), pp. 269–276.

[21] Redondo, J., García, I., Ortigosa, P., Pelegrín, B., and Fernández, P. Parallelization of an algorithm for finding facility locations for an entering firm under delivered pricing. In Parallel Computing: Current and Future Issues of High-End Computing, G. Joubert, W. Nagel, F. Peters, O. Plata, P. Tirado, and E. Zapata, Eds., vol. 33 of NIC Series, John von Neumann Institute for Computing, 2006, pp. 269–276.

[22] Redondo, J., Ortigosa, P., García, I., and Fernández, J. Image registration in electron microscopy. A stochastic optimization approach. Lecture Notes in Computer Science, Proceedings of the International Conference on Image Analysis and Recognition, ICIAR 2004 3212, II (2004), 141–149.

[23] Solis, F., and Wets, R. Minimization by random search techniques. Mathematics of Operations Research 6, 1 (1981), 19–30.

[24] Syswerda, G. Simulated crossover in genetic algorithms. In Proceedings of the Second Workshop on Foundations of Genetic Algorithms (1993), L. Whitley, Ed., Morgan Kaufmann Publishers, pp. 239–255.

[25] Törn, A., and Žilinskas, A. Global Optimization. Springer-Verlag New York, Inc., New York, NY, USA, 1989.

Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

A Reinvestment Mechanism for Incentive Collaborators

and Discouraging Free Riding in Peer-to-Peer Computing

Josep Rius1, Fernando Cores1 and Francesc Solsona1

1 Computer Science and Industrial Engineering, Universitat de Lleida

emails: jrius, fcores, [email protected]

Abstract

Peer-to-Peer (P2P) computing, the harnessing of idle compute cycles through the Internet, offers new research challenges in the domain of distributed computing. This paper presents a global P2P credit-based incentive scheme that provides job scheduling for CoDiP2P, a P2P computing framework developed by our group, by encouraging peers to collaborate, and that detects and penalizes free-riding efficiently. The main contribution is a re-investing mechanism that increases peer participation significantly. The good performance and functionality of our proposal is shown by comparing it, by means of simulation, with the new incentives emerging in the literature.

Key words: Distributed Computing, Incentives, P2P Computing, Mutable Resource Sharing, Free-Riders.

1 Introduction

P2P computation [1, 2, 3, 4] represents an emergent low-cost alternative to supercomputers and cluster systems, providing access to distributed computational resources in a scalable and fault-tolerant way. P2P architectures take advantage of the under-utilization of personal computers, integrating them into a platform based on the sharing of computational resources between geographically distributed equals.

Peer-to-Peer (P2P) architectures are characterized by self-organization, a distributed nature, a resource-sharing architecture and high dynamism. P2P systems are able to provide an inexpensive, highly scalable, available, fault-tolerant and robust platform for sharing resources.

The participation of their forming nodes (or peers) is voluntary. Although cooperation is the key to the success of a peer-to-peer system, it is difficult to cultivate without an effective incentive mechanism. In fact, many P2P systems lack such a mechanism and consequently suffer from free riding [8]. Free-riders consume resources donated by other peers but do not donate any of their own.

Experience with peer-to-peer systems shows that, in the absence of incentives to donate, a large proportion of the peers only consume the resources of the system [7]. Free-riding is a concern because it decreases the utility of the shared resources in the system, potentially to the point of system collapse.

There are many architectures for sharing resources in peer-to-peer networks, but most of these are only used to share immutable resources (i.e. files). Incentive mechanisms are applied on the fly, allowing quick and direct results in the system. These policies require neither global information management nor a long-term characterization of peer behavior.

Some of these incentive mechanisms are based on increasing the QoS of the collaborators; BitTorrent [12] is an example of such an incentive. Credit is another incentive mechanism. For example, eMule [13] implements a local credit-like incentive. This mechanism tries to achieve local fairness, but global coordination is not its goal. With local accounts, all information about a peer is derived directly from the peer itself; even if the receipts are signed by the transaction partner, fraud is easily possible through malicious collaboration.

A credit system is implemented in the eMule system [13]. The goal of the credit system is to reward users contributing to the network by reducing their waiting time in the upload queue. The approach is cheat-proof in the sense that peers have no reason to tamper with the credit file. However, anecdotal evidence [18] suggests that the approach does not consistently provide a clear performance advantage to users who contribute resources to the network.

BitTorrent's [12] tit-for-tat mechanism makes the peers upload evenly to all links as much as they can. This approach encourages peers to collaborate actively in the system, and it was shown to avoid free-riding. The incentive mechanism of the original BitTorrent was proven to be very inefficient because the links were chosen in a round-robin fashion and thus free-riders were given repeated benefit. Most of the incentive approaches, such as BitTorrent's tit-for-tat mechanism or eMule's credit system, are immutable-oriented and thus not easily applicable to decentralized computing systems, such as mutable-sharing P2P platforms.

However, there are very few P2P systems for sharing mutable resources (e.g., CPU or memory). The most interesting works devoted to sharing mutable resources are CompuP2P [6] and OurGrid [7]. These systems have the same properties as the immutable ones: they are distributed, inexpensive, highly scalable, fault-tolerant and robust.

CompuP2P [6] is an architecture for sharing computing resources in peer-to-peer networks. It creates dynamic markets of network-accessible mutable resources in a completely distributed, scalable and fault-tolerant manner. CompuP2P uses ideas from game theory [5] and microeconomics to devise incentive-based schemes. Above the Chord overlay network, different lightweight control mechanisms have been built to create markets dynamically. The search for appropriate markets, and the trading and pricing of resources, is done in a completely distributed manner without requiring any trusted and/or centralized authority to oversee the transactions. The market-based mechanism proposed encourages peers to participate actively in the purpose of the system.

However, this mechanism ignores free-riders.

OurGrid [7] uses an autonomous peer-to-peer reputation scheme, called the Network of Favors, as an incentive for peers. OurGrid prioritizes the peers with high reputation, thus motivating sharing. The main advantage of OurGrid is that there is no need for storing global reputation: the reputation of neighbors is maintained in each peer and a communication overhead is avoided. This scheme is focused on solving free-riding, although it limits job scheduling. The behavior of peers cannot be determined in advance, and there is no objective information about those peers that have never collaborated with the server peer. Scheduling success depends on the collaboration of the peers after the mapping of tasks: if the target peers cooperate with each other, their respective reputations will increase, so good performance will be obtained; however, if the target peers do not cooperate, job performance is not guaranteed.

The present work is based on the CoDiP2P platform, developed by our research group. CoDiP2P is a decentralized mechanism for distributing computation and resource management that takes the system heterogeneity into account by using the P2P paradigm. The design of CoDiP2P is focused on hiding system complexity from programmers and users. The hierarchical topology used for managing and maintaining the system-growing capacity favors the good scalability of the system. Fault tolerance has also been considered, by providing self-organization of peers and avoiding centralized components. The system is able to manage its resources efficiently, independently of their heterogeneity, geographical dispersion and volatility.

In this paper, we present a new peer incentive mechanism for CoDiP2P with investment capabilities, designed to avoid free riding efficiently. Our architecture implements a P2P credit-based incentive scheme.

The CoDiP2P scheduling process is based on the proposed incentive mechanism. It manages updated information from the system and can thus take resource mutability into account. Taking this into account, the method always chooses the best assignment of multitasking jobs to peers.

CoDiP2P's incentive mechanism implements a non-negative credit function (to prevent ID-changing cheating) with a historic term used to differentiate between newcomers and old collaborative peers. Re-investment of the credits obtained by the local owners of the areas, called here "managers", increases system throughput enormously, this being the main contribution of this work.

Results are compared with OurGrid and CompuP2P, a basic credit-market mechanism without investing, achieving a better throughput thanks to the re-investing, and discouraging free-riding through the credit-like mechanism.

The remainder of the paper is organized as follows. Section 2 introduces the CoDiP2P platform. Our proposal is explained and discussed in Section 3, while in Section 4 we present our simulation results to support our assumptions in the analysis. Finally, we conclude in Section 5.

2 CoDiP2P Framework

We are currently deploying a peer-to-peer architecture called CoDiP2P [9, 10] for sharing computing resources. The peers that make up the system have two roles, as workers and managers, and are grouped together into areas. The architecture of CoDiP2P is a tree of areas. Fig. 1 shows the CoDiP2P architecture; in this example, the tree has three levels and is made up of five areas. The main CoDiP2P components are the following:

• Area Ai: a logical space made up of a set of workers controlled by a manager. Each CoDiP2P area has a limited capacity of peers (Size(Ai)), which depends mainly on its network properties, bandwidth and latency.

• Manager Mi: one of the two roles that a peer can acquire. Its main goal is to manage the peers in its area and to schedule jobs over the workers. Each area Ai has a set of peers named Replicated Managers (RMi), which have the special role of replacing Mi when it fails. Each RMi maintains a copy of the same information kept by Mi.

• Worker Wi: a peer with the role of executing the tasks scheduled by its manager. It can also submit tasks for execution to its manager.

Figure 1: CoDiP2P, a tree-like architecture.

The properties of CoDiP2P are the following: scalability, to ensure that the system supports the massive entry of peers; distributed management, to harness the computing resources offered by the nodes making up the system; fault tolerance, to cope with the high probability of a peer failure and to replace the failed peer, ensuring the stability, robustness and performance of the global system; self-organization, whereby each peer can dynamically become a manager or a worker according to the needs of the system; and heterogeneous resource management, to manage the scheduling and load balancing of tasks among the peers efficiently, independently of their heterogeneity, geographical dispersion and volatility.

The main features of CoDiP2P involved in resource management are the following (see Fig. 2): insertion of new peers in a balanced way; scheduling of multitasking jobs; maintenance, which keeps the system updated every T seconds; and peer output, also used to re-balance the tree when a peer leaves the system.

The insertion and output cost is θ(log_{Size(Ai)} N), where Size(Ai) is the area capacity and N is the number of peers in the system. The cost of the maintenance algorithm is only slightly worse, θ(Size(Ai)). These low costs justify the choice of a tree-like architecture. For more information see [9].

Figure 2: CoDiP2P Functionalities.

3 Reinvestment Incentive mechanism

This section defines the incentive mechanism implemented in CoDiP2P to reward cooperating peers while discouraging free-riders. The CoDiP2P incentive mechanism is based on credits, and the main goal of the proposed credit policy is to maximize the system throughput.

When a peer wants to submit a job to the system, the number of credits acquired by that peer determines the launching success of the job. Newly arrived peers start without any credit assignment in order to avoid identity-changing attacks; using this strategy, CoDiP2P avoids prioritizing malicious identity changers over collaborative peers who have consumed more resources than they have contributed. The system does not allow peers to have negative credit. This decision is based on the experimental study about Internet auctioning [14]. A problem arises when the system cannot distinguish between malicious identity-changing free-riders, who never donate resources, and collaborative peers, who have spent all their credits submitting jobs to the system. This additional problem is overcome by maintaining, in each associated manager, the number of collaborations made by each peer.

Every system node i has an associated cost (Vi), which represents the relative value of its computational resources at a given time. This value is updated periodically by the maintenance algorithm every T seconds. The Vi values are used in the job assignment procedure, carried out in the reinvestment incentive algorithm (Algorithm 1) and located in the manager of each area.

In the scheduling process, the manager assigns each task that makes up the job to different workers in its area. The method used to do so is the reverse Vickrey auction strategy: the manager chooses the worker (Workerl) with the lowest cost (Vl), and the selected peer is rewarded with the second lowest offered cost (V2l) minus its own, that is, the lowest one (Vl).

Formally, the number of credits rewarded to a worker in the assignment of a task (Profit Wl) is:

Profit Wl = V2l − Vl

The incentive mechanism, inspired by the reverse Vickrey auction, avoids the possibility of a peer cheating by offering a value lower than its real cost: if a peer tries to be selected by dropping its corresponding Vi and the second lowest cost is above it, being selected will cause a negative profit. Thus cheating is discouraged.

In the scheduling, each manager always chooses the best option, which also maximizes its profit (in credits), defined as the difference between the highest cost, Vmax, and the lowest cost (that of the selected node), Vsel. Thus, the selected peer maximizes the manager profit Profit M:

Profit M = Vmax − Vsel

The total amount of credits (Creditssub) paid for launching the job by the submitter peer Peersub is the sum of the manager profit Profit Mj for each task assigned to its area plus the sum of the profits of the mapped workers (Profit Wj):

Creditssub = Σ_{j=1}^{J} (Profit Mj + Profit Wj),

where J is the number of tasks making up the job.

Algorithm 1 shows the behavior of the incentive scheme in CoDiP2P. It starts when a manager M receives a request to launch a job (Job) from a peer (Peersub). The submitter Peersub could be a worker, or a manager that has received a job from a peer in its area but cannot take it on; when areas are full, managers can send job tasks to other managers located in other areas.

Next, the manager M checks whether there are enough available workers in the system and whether Peersub has enough credits to execute the job (see lines 2-4). If not, M rejects the job (line 5). Then, if necessary, it delivers part of the job to its own manager, located one level above it (lines 6-10). Finally, M maps the remaining tasks to its area using the reverse Vickrey auction strategy (lines 11-19).

Due to the volatility of the resources, resources that are not assigned are wasted. Taking this into account, if the system is underused, a job launch request will be accepted even if the launching peer does not have enough credits (see line 4). Consequently, some peers receive work and credits. These additionally earned credits increase the possibility of new jobs being accepted by the manager, who will also earn more credits through doing the job assignment. This mechanism increases the execution of jobs in the absence of activity and encourages the usage of the system.

Over time, managers can become large credit hoarders. In our incentive scheme, managers can reinvest their credits into the system in order to motivate the workers to submit jobs to the platform. In this study, the manager distributes its credits uniformly between the peers in its area. This concept is represented in the algorithm by the δ factor (see lines 16-17): δ is the portion of the manager's payoff in credits distributed to each worker. We strongly believe that more efficient credit distribution strategies can be found; however, this first and simple approximation already improves system performance significantly, as is shown in Section 4 below. Future work will investigate new distribution mechanisms.

Algorithm 1 Reinvestment incentive algorithm

Require: (Job): Input parameters
 1. M receives the Job features from the Peersub
 2. NTasks := Number of tasks making up the Job
 3. ALLFreeWork := All free Workers of the System
 4. if ALLFreeWork < NTasks || (Peersub has not enough credits && System is not idle) then
 5.   M rejects Job
 6. else
 7.   FreeWork := All free Workers of the M area
 8.   if FreeWork < NTasks then
 9.     M sends the Job with (NTasks − FreeWork) tasks to its manager
10.     NTasks := NTasks − FreeWork
11.   Vmax := Maximum peer cost value in the M area
12.   for all NTasks of Job do
13.     V2l := Second lowest peer cost value in the M area
14.     Workerl := Free Worker with the lowest Vl in the M area
15.     Profit Wl := V2l − Vl
16.     Profit M := Profit M + (Vmax − Vl) ∗ δ
17.     Creditssub := Profit Wl + (Vmax − Vl) ∗ δ
18.     M sends Taskj to Workerl
19.   end for
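A compact C++ sketch of the per-task selection step of Algorithm 1 (lines 11-18) is given below. It is our own illustration, not the CoDiP2P implementation: Worker, Assignment and assign_task are assumed names, and it presumes at least one free worker (the checks of lines 2-5 have already passed).

#include <vector>

struct Worker     { double V; bool free; double credits; };
struct Assignment { int worker; double profitW; double profitM; };

// Reverse Vickrey assignment of one task inside an area:
// select the cheapest free worker, pay it V2l - Vl, and credit the manager
// with a delta fraction of (Vmax - Vl), which is charged to the submitter.
Assignment assign_task(std::vector<Worker>& area, double Vmax, double delta) {
    int best = -1, second = -1;
    for (int i = 0; i < (int)area.size(); ++i) {
        if (!area[i].free) continue;
        if (best == -1 || area[i].V < area[best].V) { second = best; best = i; }
        else if (second == -1 || area[i].V < area[second].V) { second = i; }
    }
    double Vl  = area[best].V;
    double V2l = (second != -1) ? area[second].V : Vl;   // degenerate single-worker case
    area[best].free = false;                             // task mapped to this worker
    area[best].credits += V2l - Vl;                      // Profit_Wl = V2l - Vl (line 15)
    double profitM = (Vmax - Vl) * delta;                // manager share (line 16)
    return {best, V2l - Vl, profitM};                    // both charged to the submitter
}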

4 Experimentation

In this section, we evaluate the performance of the proposed system through simulation tests. The SimJava [11] framework allows us to simulate a peer-to-peer network with a market-based system where peers try to sell their resources.

SimJava is a process-based discrete event simulation API for Java. A SimJava simulation is a collection of entities, each running in its own thread, connected together by ports and communicating with each other by sending and receiving event objects. There is also a central system class that controls all the threads, advances the simulation time and delivers the events.

4.1 Simulator behavior

In order to evaluate the proposed incentive mechanism, we have implemented a simulator with the following features:

1. System creation: The first step is to create all the system peers and establish the connections between them.

2. Simulation process:

• During the simulation, each peer continuously tries to submit jobs to the system with a probability p.

• Once a peer decides to submit a job, it specifies the number of tasks making up the job.

• On receiving a new job, the manager performs the following checks:

(a) Are there enough workers in the system to execute the job? If not, the manager will reject the job.

(b) Does the launching peer have enough credits to afford the total cost of the job? If so, the manager will start scheduling the job; otherwise, it will reject the job or start scheduling depending on the next question.

(c) Is the system idle? If so, the manager will accept the job, charging fewer credits.

• If the manager accepts the job, the incentive mechanism described in the sections above is started.

• Finally, the manager is responsible for subtracting the credits from the launching peer and eventually reinvesting a part of its earnings evenly among the other peers.

3. Generating statistics: Finally, the simulator shuts down all the entities (peers) of the system and closes all the connections between them. It also generates all the traces and the statistics of the simulation process.

4.2 Launch probability

In order to validate the simulator, a first test is presented in Fig. 3. It shows how many jobs the system cannot accept depending on the launching probability of each peer and the maximum number of tasks per job. In this test, the only criterion is that a manager rejects a job when there are not enough free workers to assign all the tasks simultaneously.

Obviously, if we increase the peer submission probability, the managers receive more requests at the same time, so they are forced to refuse more jobs. Similarly, if we increase the maximum number of tasks per job, the manager exhausts all its workers sooner. The consistency of these results confirms the good behavior of the simulator.

In conclusion, we have to assume a low launch probability (between 4 and 7 percent) to avoid system saturation.

4.3 Credit reinvestment

Figure 4 shows the impact of the credit return policy on the system throughput, understood as the number of executed tasks. In this experiment, the percentage of credits returned by the manager to the system ranges from 0 to 100%.

Figure 3: Number of rejected jobs depending on the launch probability.

As can be noticed, the system throughput increases drastically when manager credits are reinvested. For reinvestment percentages below 70%, the number of executed tasks increases linearly from 15,000 to 40,000. The best improvement is obtained when the reinvestment percentage is above 70%; in this case, the system throughput is multiplied by 4, allowing up to 200,000 tasks. However, with percentages over 88% no further improvement can be noticed.

These results indicate a clear conclusion: although it is a good policy to encourage the collaboration of the manager with the system, it should be limited. A too generous bonus may cause an unbalanced credit distribution and system starvation. We propose to keep the manager incentives, but to reduce them by 88% in order to redistribute these credits among the remaining incentive policies (i.e. the remaining peers).

Figure 4: Number of tasks executed in the system depending on the manager reinvestment.

4.4 Avoiding Free-Riders

Nowadays, free-riders represent one of the most challenging problems for cooperative P2P systems. Incentive policies are responsible for addressing this problem by discouraging free-riding behavior.

In the simulation, free-riders do not collaborate with the system at all: they launch tasks for their own execution but do not execute foreign ones. We assume that a normal peer can become a free-rider with probability σ if the system allows it, and that a free-rider can become a collaborating peer with probability ρ if it is properly encouraged. This behavior is modeled with the state diagram represented in Figure 5.

Figure 5: State diagram of peers behavior.
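The two-state behaviour model of Figure 5 can be sketched in a few lines of C++ (a toy illustration with our own names; the simulator's actual transition rules also depend on the incentive feedback):

#include <random>

enum class PeerState { Collaborator, FreeRider };

// One simulated transition: a collaborator defects with probability sigma,
// a free-rider returns to collaborating with probability rho.
PeerState step(PeerState s, double sigma, double rho, std::mt19937& rng) {
    std::bernoulli_distribution becomeFreeRider(sigma), becomeCollaborator(rho);
    if (s == PeerState::Collaborator && becomeFreeRider(rng))
        return PeerState::FreeRider;
    if (s == PeerState::FreeRider && becomeCollaborator(rng))
        return PeerState::Collaborator;
    return s;
}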

Figure 6 shows the evolution of free-riders over the simulation time when the incentive mechanism is applied. We assume that initially 20% of the peers are free-riders in Figures 6a-6b and 60% in Figures 6c-6d; the probabilities of a free-rider becoming a collaborative peer (ρ) are 20% and 50%, respectively. These factors are also analyzed for different system loads (launch probabilities).

(a) 5 Reject and 20% free-riders. (b) 1 Reject and 20% free-riders.

(c) 5 Reject and 60% free-riders. (d) 1 Reject and 60% free-riders.

Figure 6: Free-rider Evolution with 5% launch probability of collaborative peers.

The results show that our incentive mechanism reduces the number of free-riders: the percentage of free-riders decreases as a consequence of applying the incentive policies. It can also be seen that free-riders disappear in proportion to the system load. Under system saturation (launch probability 100%), the incentive policies could even be used to penalize free-riders.

If the system is underused, such policies have no effect. The effect of the incentive policies increases with the system load because they encourage the collaboration of free-riders.

Figure 6b plots the free-rider evolution when the probability of a free-rider changing its behavior is only 50%. The important impact of this parameter becomes clear if the results are compared with the ones in Figure 6a: in Figure 6b the free-riders disappear quickly and do not relapse.

Similar results are obtained when the initial percentage of free-riders is 60% (see Figures 6c and 6d).

5 Conclusions and Future Work

CoDiP2P is a decentralized mechanism for distributing computation and resource management that takes the system heterogeneity into account by using the P2P paradigm. The hierarchical topology used for managing and maintaining the system-growing capacity favors good scalability, fault tolerance and self-organization of the system, independently of its geographical dispersion and volatility.

Our architecture implements a global P2P credit-based incentive scheme for the CoDiP2P system. This allows the efficient scheduling of multitasking distributed applications. The scheduler manages updated information from the system, and so it can take resource mutability into account.

CoDiP2P's incentive mechanism implements a non-negative credit function (to prevent ID-changing cheating) with a historic term used to differentiate between newcomers and old collaborative peers. Re-investment of the credits obtained by the local owners of the areas, named managers here, increases peer cooperation enormously, this being the main contribution of this work. The results are compared with CompuP2P, a basic credit-market mechanism without investing, and we show that the gain of our proposal is about 88%.

Future work will study the stability of the system. We are interested in finding the Nash equilibrium of the overall managers' profit, depending on the area properties, defined basically by bandwidth and latency. New reinvestment policies must be developed in order to increase the system performance even further.

There are basically three kinds of users: active, passive and free-riders. We are interested in modeling the behavior of these users and implementing them in the simulator, thus obtaining the system performance according to the users' behavior.

Acknowledgements

This work was supported by the MEC-Spain under contract TIN2008-05913, the CUR of DIUE of GENCAT and the European Social Fund.

References

[1] Avaki, http://www.sybase.com (2006).

[2] SETI@home, http://setiathome.ssl.berkeley.edu (2006).

[3] J. Frankel and T. Pepper, The Gnutella Protocol Specification v0.6, http://rfc-gnutella.sourceforge.net (2003).

[4] J. Ernst-Desmulier, J. Bourgeois, F. Spies and J. Verbeke, Adding New Features In a Peer-to-Peer Distributed Computing Framework, Proc. of the 13th EuroMicro Conf. on Parallel, Distributed and Network-Based Processing (2005).

[5] M. J. Osborne, An Introduction to Game Theory, Oxford University Press, New York (2002).

[6] Rohit Gupta and Arun K. Somani, CompuP2P: An architecture for sharing of computing resources in peer-to-peer networks with selfish nodes, Online Proceedings of the Second Workshop on the Economics of Peer-to-Peer Systems, Harvard University (2004).

[7] N. Andrade, F. Brasileiro, W. Cirne, M. Mowbray, Discouraging Free Riding in a Peer-to-Peer CPU-Sharing Grid, HPDC '04: Proc. of the 13th IEEE Intl. Symposium on High Performance Distributed Computing (2004) 129–137.

[8] E. Adar and B. A. Huberman, Free riding on Gnutella, First Monday (2000).

[9] D. Castella, I. Barri, J. Rius, F. Gine, F. Solsona and F. Guirado, CoDiP2P: A Peer-to-Peer Architecture for Sharing Computing Resources, DCAI 2008: International Symposium on Distributed Computing and Artificial Intelligence (2008) 293–303.

[10] D. Castella, J. Rius, I. Barri, F. Gine and F. Solsona, A New Reliable Proposal to Manage Dynamic Resources in a Computing P2P System, PDP 2009: 17th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (2009).

[11] W. Kreutzer, J. Hopkins and M. van Mierlo, SimJava - A Framework for Modeling Queueing Networks in Java, Proceedings of the 1997 Winter Simulation Conference (1997).

[12] S. Jun, M. Ahamad, Incentives in BitTorrent induce free riding, SIGCOMM: Special Interest Group on Data Communication (2005).

[13] Y. Kulbak and D. Bickson, The eMule Protocol Specification, Distributed Algorithms, Networking and Secure Systems, 2005.

[14] T. Yamagishi, M. Matsuda, Improving the Lemons Market with a Reputation System: An Experimental Study of Internet Auctioning, http://joi.ito.com/archives/papers/Yamagishi_ASQ1.pdf, 2002.

Proceedings of the International Conference
on Computational and Mathematical Methods
in Science and Engineering, CMMSE 2009
30 June, 1–3 July 2009.

Dimensionality Reduction and Parallel Computing for Malware Detection

Alberto Rodríguez1, Pablo Martínez1, José Ranilla1, Elías F. Combarro1, Elena Montañés1, Raquel Cortina1, Pedro Alonso2 and Irene Díaz1

1 Artificial Intelligence Center, University of Oviedo
2 Department of Mathematics, University of Oviedo

emails: [email protected]

Abstract

Malware poses one of the most serious threats to computer security nowadays. Traditionally, detection of malicious software is carried out by the use of a database which includes signatures of already known instances of malware. This method is not valid for detecting unknown viruses and is, most of the time, unfeasible in practice.

In this paper, we present an approach that tries to overcome these difficulties with the use of Machine Learning, Parallel Computing and Dimensionality Reduction techniques.

Key words: parallel computing, machine learning, malware detection, dimensionality reduction

1 Introduction

Malware, or malicious software, is one of the main threats to computer security. It is estimated [1] that, even among those PCs running an up-to-date anti-virus solution, 23% are infected. This problem becomes more serious every day due to the growing computer communication infrastructures and the increasing sophistication of malware.

The usual approach to malware detection relies on the construction of a signature database of (all) known malware instances [2]. Any file in the user's computer that matches one of these signatures is identified as a virus and managed accordingly.

This method presents several important drawbacks. First of all, for a computer virus to be detected it must have been previously identified by a human expert, and a signature must have been generated and provided to the user; with 20,000 new viruses detected each day [3], this clearly becomes unfeasible. Moreover, this approach is unable to detect new malware and can miss the identification of mutating specimens.

Figure 1: Malware detection in recent years (malware detected from 2003 to 2007: 25,454; 36,069; 80,288; 206,345; 2,012,931).

Finally, the World Wide Web allows faster-than-ever malware dissemination, and computer security can no longer rely on signature lists that are obsolete just minutes after their release.

Thus, a paradigm shift in malware detection and antivirus systems has been carried out by the leading security solution providers in the last few years [3]. In this paper we present an approach based on Machine Learning that aims to detect malicious programs beforehand. Due to the high dimensionality of the data involved, reduction techniques and parallel computing are needed in order to deal with the high volumes of files to be processed.

The rest of the paper is organized as follows: Section 2 presents an overview of the malware trends in recent years. In Section 3 we explain how Machine Learning, together with Parallel Computing and Dimensionality Reduction techniques, can be used to detect previously unknown malware instances. Finally, in Section 4 we present some conclusions.

2 Malware Growth and Trends

Recent years have seen a dramatic increase in the amount of malware detected in the wild. It has been estimated that in the year 2007 as much malware was produced as in the previous 20 years altogether [4]. For instance, in the last year PandaLabs [5] has received more malware than in the previous 16 years combined. Figure 1 shows the trend of malware growth (see [5]).

Also, financial motivations seem to be the driving force behind this astronomical growth, which makes malware creators seek stealthier methods of infection.

Figure 2: Rootkit detection in recent years (malware using rootkits from 2004 to 2007: 23; 127; 428; 1,253).

This is reflected in the shift in the type of malicious software detected in recent years: Figures 2 and 3 show the increase in the number of rootkits detected in the wild and the growth of cybercrime activities.

With over 20,000 new malware specimens detected every day, traditional approaches based on signature detection are no longer feasible for use in commercial antivirus solutions.

3 Parallel Computing, Machine Learning and Dimensionality Reduction for Malware Detection

To tackle the malware growth trends presented above, a paradigm shift is in order. In recent years, heuristic approaches capable of proactive malware detection have been adopted in the industry. The use of Machine Learning techniques [6] is one of the most popular methods [3]. The usual procedure consists of several steps:

• A set of attributes is defined in order to characterize the different files

• Train and test sets are obtained from already classified programs

• These sets are represented by means of the selected attributes

• A classification model is obtained using a Machine Learning algorithm on the train set

• The model is validated on the test set

Figure 3: Increase in Banking Trojans detection (cybercrime growth, Banking Trojans only, from 2003 to 2007: 34; 754; 5,096; 19,128; 88,384).

Usually, these steps are repeated a number of times until an acceptable proportion of false positive detections is obtained [3]. Machine Learning algorithms that have been adopted for malware detection include Neural Networks [7], Support Vector Machines [8], C4.5 [9] and several others.

The number of attributes and the volume of files involved in the classification tend to be extremely high and, consequently, Dimensionality Reduction and Parallel Computing are applied to make the model induction feasible in practice. One of the most common practices [3] is the use of selection measures such as Information Gain [10] to rank and select the most relevant features.
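As a hedged illustration of that selection step (our own C++ sketch, assuming binary features such as the presence of a byte n-gram or API call, not the authors' code), the Information Gain of one feature with respect to the malware/benign label can be computed as follows:

#include <cmath>
#include <vector>

// Binary entropy, with H(0) = H(1) = 0.
static double H(double p) {
    if (p <= 0.0 || p >= 1.0) return 0.0;
    return -p * std::log2(p) - (1.0 - p) * std::log2(1.0 - p);
}

// feature[i][j] = 1 if file i exhibits feature j; label[i] = 1 if file i is malware.
// Returns IG(label; feature j) = H(label) - H(label | feature j).
double infoGain(const std::vector<std::vector<int>>& feature,
                const std::vector<int>& label, int j) {
    int n = (int)label.size(), n1 = 0, pos = 0, pos1 = 0;
    for (int i = 0; i < n; ++i) {
        n1  += feature[i][j];
        pos += label[i];
        if (feature[i][j]) pos1 += label[i];
    }
    int n0 = n - n1, pos0 = pos - pos1;
    double Hclass = H((double)pos / n);
    double Hcond  = (n1 ? (double)n1 / n * H((double)pos1 / n1) : 0.0)
                  + (n0 ? (double)n0 / n * H((double)pos0 / n0) : 0.0);
    return Hclass - Hcond;      // higher values indicate more relevant features
}

Ranking all features by this score and keeping only the top-scoring ones is the dimensionality-reduction step; the per-feature computations are independent, which is what makes the step easy to parallelize.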

4 Conclusions

In this paper we have presented the problem of malware detection in an ever-growing scenario. With the current trends, traditional approaches such as signature-based detection are no longer feasible.

We have explained how Machine Learning approaches can be used in order to alleviate this problem. When used together with Dimensionality Reduction techniques and Parallel Computing, it is possible to obtain fast and robust solutions which provide an acceptable proportion of detections with low false positive rates.

Acknowledgements

This work has been partially supported by MEC TIN2007-61273 and by projects FUO-EM-305-08 and FUO-EM-177-08.

References

[1] Panda Security, Malware Infections in Protected Systems, http://pandalabs.pandasecurity.com

[2] P. Morley, Processing virus collections, In Proceedings of the 2001 Virus Bulletin Conference (VB2001), pages 129-134, Virus Bulletin.

[3] Etor Llona, Zaira Vicente, Técnicas en Machine Learning para la clasificación de código ejecutable malicioso, Proyecto [email protected]

[4] F-Secure Corporation, F-Secure Reports Amount of Malware Grew by 100% during 2007, http://www.f-secure.com

[5] Panda Security, Malware Radar White Paper, http://www.malwareradar.com

[6] T. Mitchell, Machine Learning, McGraw Hill, 1997.

[7] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge, 1996.

[8] C. Cortes, V. Vapnik, Support-Vector Networks, Machine Learning, 20, 1995.

[9] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.

[10] I. Díaz, J. Ranilla, E. Montañés, J. Fernández, E.F. Combarro, Improving performance of text categorization by combining filtering and support vector machines, JASIST 55(7): 579-592 (2004).

Proceedings of the International Conference
on Computational and Mathematical Methods
in Science and Engineering, CMMSE 2009
30 June, 1–3 July 2009.

The complexity space of partial functions: A connection between Complexity Analysis and Denotational Semantics

S. Romaguera1, M.P. Schellekens2 and O. Valero3

1 Instituto Universitario de Matemática Pura y Aplicada, Universidad Politécnica de Valencia, 46071 Valencia, Spain

2 Center of Efficiency-Oriented Languages, Department of Computer Science, University College Cork, Western Road, Cork, Ireland

3 Departamento de Ciencias Matemáticas e Informática, Universidad de las Islas Baleares, 07122 Palma de Mallorca, Spain

emails: [email protected], [email protected], [email protected]

Abstract

The study of dual complexity spaces, introduced by S. Romaguera and M. Schellekens [Topology Appl. 98 (1999), 311-322], constitutes a part of the interdisciplinary research on Computer Science and Topology. The relevance of this theory is given by the fact that it allows one to apply fixed point techniques of Denotational Semantics to Complexity Analysis. Motivated by this fact, and with the intention of obtaining a mixed framework valid for both disciplines, a new complexity space formed by partial functions was recently introduced and studied by S. Romaguera and O. Valero [Int. J. Comput. Math. 85 (2008), 631-640]. An application of the complexity space of partial functions to model certain processes that arise, in a natural way, in symbolic computation was given in the aforementioned reference. In this paper we go more deeply into the relationship between semantics and the complexity analysis of programs. We construct an extension of the complexity space of partial functions in order to give a mathematical model for the validation of recursive definitions of programs. As an application of this new approach, the correctness of the denotational specification of the factorial function is shown.

Key words: ordered cone, extended quasi-metric, fixed point, recursive specification, Complexity Analysis, Denotational Semantics.

1 Introduction

The theory of complexity spaces was introduced in [24] as a topological foundation for the complexity analysis of programs and algorithms.


The basis for this theory is the notion of “complexity distance”, which is a generalized metric that intuitively measures the relative progress made in lowering the complexity when a program is replaced by another one. The main aim of the developed topological theory is to obtain a unified structure that allows one to apply the techniques of Denotational Semantics to the analysis of the complexity of algorithms. In order to achieve this objective, the notion of “complexity domain” was introduced in [25]. This generalized concept consists of an ordered structure, which satisfies the same axioms as an ordered cone except for the existence of a neutral element, equipped with a quasi-metric.

Later on, a new complexity structure was introduced and studied, the so-called dual complexity space ([20, 21]). This is a quasi-metric space actually admitting the structure of an ordered cone in the sense of [6]. Furthermore, dual complexity spaces still allow one to carry out the complexity analysis of algorithms and programs. These two facts motivate the use of dual complexity spaces instead of the original ones.

We recall that the interest in dual complexity spaces has increased in recent years, and they have been studied in depth ([22], [7], [8], [9], [16], [15], [18], [14], [19]).

On the other hand, in Computer Science it is very common to define procedures or functions as subprograms that call themselves. When a programmer designs a procedure using recursion, one must consider whether the mathematical specification for the procedure provides a semantically meaningless recursive definition, especially if the meaning is expressed in terms of the function to be defined.

The analysis of the consistency of recursive definitions of functions is based on the theory of recurrence equations. The semantical meaning of a recursive denotational definition can be seen as a solution of a recurrence equation. Consequently, fixed point theory turns out to be central to obtaining “consistent” specifications for procedures or functions. This is achieved using the principle of fixpoint induction ([4]), which provides the mathematical specification (a mapping, total or partial, defined recursively) as a fixed point that is, at the same time, the limit of a sequence of partial mappings (also defined recursively).

Motivated by the fact that partial functions have proven to be very useful in Denotational Semantics, in that they provide a basis for a mathematical model for high-level programming languages, a new (dual) complexity space was constructed in [23] using the notion of a partial function. This new complexity structure is also an ordered cone and supplies a suitable tool for the application of typical Denotational Semantics techniques in the context of Symbolic Computation ([23]).

In this paper we show that the complexity space of partial functions is a useful framework in which to apply the principle of fixed point induction to program verification. The remainder of this paper is organized as follows. Section 2 is devoted to introducing some mathematical preliminaries. A detailed description of complexity spaces, including the complexity space of partial functions, is given in Section 3. In Section 4 we present an extension of the complexity space of partial functions. Furthermore, we show that this new approach, contrary to the old one, is suitable for the semantic analysis of programs. In fact, it is useful to prove mathematically whether a function defined recursively is consistent. As an example, we give an alternative proof of the well-known fact that the factorial semantic specification is meaningful.


2 Preliminaries

Throughout this paper the letters R+, N and ω will denote the set of nonnegative real numbers, the set of natural numbers and the set of nonnegative integer numbers, respectively.

Our main references for quasi-metric spaces are [5] and [13]. Following the modern terminology, a quasi-metric on a set X is a nonnegative real-valued function d on X × X such that for all x, y, z ∈ X: (i) d(x, y) = d(y, x) = 0 ⟺ x = y; (ii) d(x, z) ≤ d(x, y) + d(y, z).

We will also consider extended quasi-metrics. They satisfy the above axioms, except that we allow d(x, y) = +∞.

An extended quasi-metric space is a pair (X, d) such that X is a (nonempty) set and d is an extended quasi-metric on X.

Each extended quasi-metric d on a set X induces a T0 topology T(d) on X which has as a base the family of open d-balls {B_d(x, r) : x ∈ X, r > 0}, where B_d(x, r) = {y ∈ X : d(x, y) < r} for all x ∈ X and r > 0.

Given an extended quasi-metric d on X, the function d^s defined on X × X by d^s(x, y) = max{d(x, y), d(y, x)} is an extended metric on X.

An extended quasi-metric d on a set X is said to be bicomplete if the extended metric d^s is complete on X.

According to [6], a cone on R+ (a semilinear space in [19]) is a triple (X, +, ·) such that (X, +) is an Abelian monoid, and · is a function from R+ × X to X such that for all x, y ∈ X and r, s ∈ R+: (i) r · (s · x) = (rs) · x; (ii) r · (x + y) = (r · x) + (r · y); (iii) (r + s) · x = (r · x) + (s · x); (iv) 1 · x = x; (v) 0 · x = 0.

A cone (X, +, ·) is called cancellative if for all x, y, z ∈ X, x + z = y + z implies that x = y.

Similarly to [19], an extended quasi-metric d on a cone (X, +, ·) is said to be subinvariant (respectively, invariant) if for each x, y, z ∈ X and r > 0, d(x + z, y + z) ≤ d(x, y) (respectively, d(x + z, y + z) = d(x, y)) and d(r · x, r · y) = r d(x, y), where we assume that r · (+∞) = +∞ for all r > 0.

We briefly introduce a few notions of order theory (see [4] for a fuller treatment). An order on a nonempty set X is a reflexive, transitive and antisymmetric binary relation ≤ on X. An ordered set is a pair (X, ≤) such that ≤ is an order on X.

If the least element of an ordered set exists, we will say that the ordered set is pointed.

A well-known example of an ordered set, which will play a central role in this work, is the set of partial functions on ω, which is formally defined by

[ω → R+] = {f : dom f → R+ with dom f ≠ ∅ and dom f ⊆ ω}.

Obviously this set becomes an ordered set when it is ordered by extension ⊑ (for a detailed discussion we refer the reader to [4]), i.e.

f ⊑ g  ⟺  dom f ⊆ dom g and f(n) = g(n) for all n ∈ dom f.
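
To make the extension order concrete, here is a minimal Python sketch (ours, purely illustrative: partial functions are modelled as finite dictionaries, and the helper name extends is not from the paper) of the relation ⊑ just defined.

# Illustrative sketch only: a partial function on omega is modelled as a dict {n: value}.
def extends(f, g):
    """Return True iff f is extended by g: dom f is contained in dom g and f(n) = g(n) on dom f."""
    return all(n in g and g[n] == v for n, v in f.items())

f = {0: 1.0, 1: 2.0}            # dom f = {0, 1}
g = {0: 1.0, 1: 2.0, 2: 5.0}    # dom g = {0, 1, 2}, and g agrees with f on dom f

assert extends(f, g) and not extends(g, f)   # the order is not symmetric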


Let (X, ≤) and (Y, ⪯) be two ordered sets. A mapping ϕ : X → Y is said to be monotone if ϕ(x) ⪯ ϕ(y) whenever x ≤ y.

Following [6], an ordered cone is a pair (X, ≤) where X is a cone and ≤ is an order on X which is compatible with the cone structure, i.e. x + y ≤ v + w and r · x ≤ r · v whenever x, y, v, w ∈ X with x ≤ v, y ≤ w and r ∈ R+. Ordered cones have proved to be useful in semantics for programming languages (see [27]).

In the sequel if A is a nonempty set, we will denote by |A| its cardinality.

3 The complexity space of partial functions

In 1995, M. Schellekens introduced the theory of complexity (quasi-metric) spaces as a part of the development of a topological foundation for the complexity analysis of programs and algorithms ([24]). The applicability of this theory to the complexity analysis of Divide & Conquer algorithms was illustrated by Schellekens in the same reference. In particular, he gave a new proof, based on fixed point arguments, of the well-known fact that the mergesort algorithm has optimal asymptotic average running time.

Let us recall that the complexity space is the pair (C, d_C), where

C = {f : ω → (0, +∞] : ∑_{n=0}^{+∞} 2^{-n} (1/f(n)) < +∞},

and d_C is the quasi-metric on C defined by

d_C(f, g) = ∑_{n=0}^{+∞} 2^{-n} [(1/g(n) − 1/f(n)) ∨ 0].

According to [24], given two functions f, g ∈ C, the numerical value d_C(f, g) (the complexity distance from f to g) can be interpreted as the relative progress made in lowering the complexity by replacing any program P with complexity function f by any program Q with complexity function g. Therefore, if f ≠ g, the condition d_C(f, g) = 0 can be read as saying that f is “more efficient” than g on all inputs.

Later on, S. Romaguera and M. Schellekens ([20, 21]) introduced the so-called dual complexity space and they studied several quasi-metric properties of the original complexity space, which are interesting from a computational point of view, via the analysis of this new complexity (quasi-metric) space. Furthermore, and contrarily to the original space, the dual complexity space can be endowed with a cancellative cone structure equipped with pointwise addition and pointwise scalar multiplication. This fact gives one more motivation for the use of this new approach instead of the original one, because cones provide a suitable framework for an efficiency analysis of a wide class of algorithms (see [22], [9], [8], [7]).

The dual complexity space is the pair (C∗, d_{C∗}), where

C∗ = {f : ω → R+ : ∑_{n=0}^{+∞} 2^{-n} f(n) < +∞},


and d_{C∗} is the quasi-metric on C∗ defined by

d_{C∗}(f, g) = ∑_{n=0}^{+∞} 2^{-n} [(g(n) − f(n)) ∨ 0].
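
As a quick numerical illustration (our own sketch; the function names and the truncation of the series are ours, not the authors'), the dual complexity distance between two standard complexity functions can be evaluated as follows.

# Sketch: truncated evaluation of d_{C*}(f, g) = sum_{n>=0} 2^{-n} * max(g(n) - f(n), 0).
def d_dual(f, g, N=60):
    """Truncated dual complexity distance; the neglected tail is controlled by the tail of sum 2^{-n} g(n)."""
    return sum(2.0 ** (-n) * max(g(n) - f(n), 0.0) for n in range(N))

lin  = lambda n: float(n)        # the complexity function n   (belongs to C*)
quad = lambda n: float(n * n)    # the complexity function n^2 (belongs to C*)

# Since lin(n) <= quad(n) for every n, d_dual(quad, lin) = 0 ("lin improves on quad"),
# while d_dual(lin, quad) > 0: the asymmetry characteristic of a quasi-metric.
print(d_dual(lin, quad), d_dual(quad, lin))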

It is clear that the computational intuition behind the complexity distance between two functions in C can be recovered in the following way: the numerical value d_{C∗}(f, g), for any f, g ∈ C∗, can be interpreted as a relative measure of the progress made in lowering the complexity by replacing any program Q with complexity function g by any program P with complexity function f, whenever the complexity measure is assumed to be the running time of computing. Hence d_{C∗}(f, g) = 0 means that g is more “efficient” than f on all inputs. However, as happens for the distance d_C, when d_{C∗}(f, g) ≠ 0 we cannot establish which of the two complexity functions, f or g, is more efficient. In order to avoid this disadvantage, a slight modification in the definition of the complexity distance d_{C∗} was introduced, and thus a new complexity (extended quasi-metric) distance e_{C∗} was constructed and studied in [19]. Now, the distance e_{C∗} is a useful tool for the quantitative complexity analysis of algorithms for the specific complexity measure of running time of computing. This new approach was applied in [19] to the complexity analysis of Divide and Conquer algorithms, in the spirit of Schellekens, and to modeling certain processes that arise, in a natural way, in Symbolic Computation.

In particular, this new (dual) complexity space consists of the pair (C∗, e_{C∗}), where e_{C∗} is the extended quasi-metric on C∗ given by

e_{C∗}(f, g) = ∑_{n=0}^{+∞} 2^{-n} (g(n) − f(n))  if f(n) ≤ g(n) for all n ∈ ω,  and  e_{C∗}(f, g) = +∞ otherwise.

Recall that e_{C∗} has nice properties such as, for instance, invariance, Hausdorffness and bicompleteness (for a deeper study see [19]).

Recently, and motivated by the usefulness of partial functions in Denotational Semantics and the relationship between Denotational Semantics and Complexity Analysis (see [24, 25]), Romaguera and Valero have extended the dual complexity space (C∗, e_{C∗}) to a more general one, the so-called complexity space of partial functions (C∗_→, e_{C∗_→}), which is introduced in [23] as follows. Let [(ω ⇀ R+)] be the set of partial functions f ∈ [ω → R+] such that dom f = {0, 1, ..., n} for some n ∈ ω, or dom f = ω, and let ≤_→ be the order on [(ω ⇀ R+)] given by

f ≤_→ g  ⟺  dom g ⊆ dom f and f(n) ≤ g(n) for all n ∈ dom g.

Then we define

C∗_→ = {f ∈ [(ω ⇀ R+)] : ∑_{n ∈ dom f} 2^{-n} f(n) < +∞}

and

e_{C∗_→}(f, g) = ∑_{n ∈ dom g} 2^{-n} (g(n) − f(n))  if f ≤_→ g,  and  e_{C∗_→}(f, g) = +∞ otherwise.


Note that if f ∈ C∗ then the unordered sum ∑_{n ∈ dom f} 2^{-n} f(n) exists and its sum is equal to ∑_{n=0}^{∞} 2^{-n} f(n) (see, for instance, Problem G (g) in [12]). Therefore C∗ ⊊ C∗_→.

On the other hand, the set C∗_→ becomes a noncancellative ordered cone (Proposition 2 of [23]) endowed with the operations ⊕ and ⊙ defined for all f, g ∈ C∗_→ as follows:

(f ⊕ g)(n) = f(n) + g(n) for all n ∈ dom(f ⊕ g),   (r ⊙ f)(n) = r f(n) for all n ∈ dom(r ⊙ f),

where dom(f ⊕ g) = dom f ∩ dom g and dom(r ⊙ f) = dom f. Of course, if f, g ∈ C∗ and r ∈ R+ then the operations f ⊕ g and r ⊙ f coincide with the pointwise addition and scalar multiplication, respectively. It was proven in Proposition 3 of [23] that e_{C∗_→} is a bicomplete subinvariant extended quasi-metric on C∗_→. The complexity space (C∗_→, e_{C∗_→}) constitutes, as in the case of (C∗, e_{C∗}), a suitable framework to measure distances between symbolic representations of real numbers and their approximations, as was shown in [23].

4 Recursion in Denotational Semantics for programming languages: An extension of (C∗_→, e_{C∗_→})

Motivated, in part, by the work of E. A. Emerson and C. S. Jutla ([3]) about tree automata and modal logic, a general class of complexity spaces has been introduced and studied in [8, 9] to obtain an appropriate framework for an efficient complexity analysis of algorithms with exponential running time of computing. By an exponential time algorithm we mean an algorithm whose running time is in the class O(2^{P(n)}), where for each n ∈ ω, P(n) is a polynomial such that P(0) ≥ 0 and P(n) > 0 for all n ∈ N. It is obvious that if, in addition, P(n) ≥ n for all n ∈ N, and we associate the complexity of an algorithm of this type with a function f_P given by f_P(n) = 2^{P(n)} for all n ∈ ω, then f_P ∉ C∗. For this reason, for fixed polynomials P(n) as before, the complexity structure presented in [9] consists of a pair (C∗_{P(n)}, d_{C∗_{P(n)}}) such that

C∗_{P(n)} = {f : ω → R+ : ∑_{n=0}^{+∞} 2^{-P(n)} f(n) < +∞},

and d_{C∗_{P(n)}} is the extended quasi-metric given by

d_{C∗_{P(n)}}(f, g) = ∑_{n=0}^{+∞} 2^{-P(n)} [(g(n) − f(n)) ∨ 0].

Now it is clear that f_P ∈ C∗_{Q(n)}, where Q(n) = P(n) + n for all n ∈ ω. With the aim of going more deeply into the combination of the techniques of Denotational Semantics and Complexity Analysis, we construct, in this direction, a new complexity space which extends the old one (C∗_→, e_{C∗_→}).


In order to motivate this new construction, let us show that the complexity space (C∗_→, e_{C∗_→}) cannot be used, in general, as a mathematical model for the validation of recursive definitions of programs. Indeed, consider the easy but representative example of a function which is given by a recursive specification, the factorial fact.

To implement an algorithm that computes the factorial of a nonnegative integer number, the following recursive denotational specification is needed (see, for instance, [10]):

fact(k) = 1 if k = 0,   and   fact(k) = k fact(k − 1) if k ≥ 1.

The preceding denotational specification has the drawback that the meaning of the symbol fact, which is given by the right hand side, is expressed again in terms of fact. So the symbol fact cannot be replaced by its meaning, because the meaning also contains the symbol. Furthermore, it is obvious that the entire factorial function is not computable in a finite number of steps although, given k ∈ ω, it is clear that the value k! can be computed in a finite number of steps.

The usual method used to avoid this handicap is to consider a nonrecursive functional φ defined on the set of partial mappings as follows:

φf(k) = 1 if k = 0,   and   φf(k) = k f(k − 1) if k ≥ 1 and k − 1 ∈ dom f,

and then to show that fact is a fixed point of φ. Our purpose here is to prove that such a denotational specification is meaningful, using our complexity structure as the support space of φ and applying fixed point induction. However, it is evident that the function fact (the solution of the recursive equation) is not in C∗_→, because ∑_{n=0}^{+∞} 2^{-n} n! = +∞. To achieve our aim we propose, similarly to [9], the following generalization of the complexity space (C∗_→, e_{C∗_→}).
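
Before introducing the generalized space, the fixed-point mechanism itself can be illustrated computationally. In the Python sketch below (ours; the dictionary representation and the name phi_functional are purely illustrative), partial mappings are finite dictionaries and iterating the nonrecursive functional φ from the one-point partial map with domain {0} produces exactly the finite approximations of the factorial used in the worked example below.

import math

# Sketch: the nonrecursive functional phi acting on partial maps (dicts n -> value).
def phi_functional(f):
    """phi(f)(k) = 1 if k = 0, and k * f(k-1) whenever k - 1 lies in dom f."""
    g = {0: 1}
    for k in sorted(f):
        g[k + 1] = (k + 1) * f[k]
    return g

f = {0: 1}                  # the partial map with domain {0} and value 1
for _ in range(6):          # each application of phi enlarges the domain by one point
    f = phi_functional(f)

assert all(f[n] == math.factorial(n) for n in f)   # the iterates agree with n! on their domains
print(f)    # {0: 1, 1: 1, 2: 2, 3: 6, 4: 24, 5: 120, 6: 720}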

Given fixed polynomials P(n), with P(0) ≥ 0 and P(n) > 0 for all n ∈ N, set

C∗_{→,P(n)} = {f ∈ [(ω ⇀ R+)] : ∑_{n ∈ dom f} 2^{-P(n)} f(n) < +∞}.

Note that the partial order ≤_→ remains valid on C∗_{→,P(n)}. Define the nonnegative real valued function e_{C∗_{→,P(n)}} on C∗_{→,P(n)} × C∗_{→,P(n)} as

e_{C∗_{→,P(n)}}(f, g) = ∑_{n ∈ dom g} 2^{-P(n)} (g(n) − f(n))  if f ≤_→ g,  and  e_{C∗_{→,P(n)}}(f, g) = +∞ otherwise.

Obviously C∗_{P(n)} ⊊ C∗_{→,P(n)}. Moreover, C∗_→ ⊆ C∗_{→,P(n)} and e_{C∗_{→,P(n)}}|_{C∗_→} = e_{C∗_→}, whenever P(n) ≥ n for all n ∈ ω. Denote by 0_{C∗_{→,P(n)}} the function that vanishes at every n ∈ ω.

Under these conditions, it is a simple matter to prove the next results.

Proposition 2. The pair (C∗_{→,P(n)}, ≤_→) is a pointed ordered (noncancellative) cone with bottom element 0_{C∗_{→,P(n)}}.


Proposition 3. The function e_{C∗_{→,P(n)}} is a bicomplete subinvariant extended quasi-metric on C∗_{→,P(n)}.

Decreasing sequences of complexity functions play a central role in applications of complexity spaces to Computer Science. In fact, such sequences have allowed one to discuss the complexity (running time of computing) of the sorting program mergesort ([24]) and certain wide classes of Probabilistic Divide and Conquer algorithms ([17]). Moreover, several advantages, in measuring real numbers, have been exhibited when sequences of computational representations of real numbers have been identified with decreasing sequences of (partial or total) complexity functions ([19, 18, 23]).

Following [23], we will say that a sequence (f_k)_{k∈N} in C∗_{→,P(n)} is decreasing if f_{k+1} ≤_→ f_k for all k ∈ N. In this case we will denote by u↓f_k the element of [(ω ⇀ R+)] such that dom u↓f_k = ⋃_{k∈N} dom f_k and

(u↓f_k)(n) = inf{f_k(n) : n ∈ dom f_k}.

The following result is useful to prove our main theorem (Theorem 2 below).

Proposition 4. Let (f_k)_{k∈N} be a decreasing sequence in C∗_{→,P(n)} such that u↓f_k ∈ C∗_{P(n)}. If lim_{k→∞} e_{C∗_{→,P(n)}}(u↓f_k, f_k) = 0, then u↓f_k is the unique e_{C∗_{→,P(n)}}-limit point of (f_k)_{k∈N}.

Fixed point theory provides an efficient tool in Computer Science. In particular, many applications of this theory to denotational models of programming languages are obtained by means of order-theoretic notions (see, for instance, [4, 10, 26]). However, several applications of the Banach fixed point theorem to the complexity analysis of programs and algorithms and to metric semantics for programming languages have been given in [24, 1, 2, 11]. In this last case such applications are founded only on metric requirements. Next we present a fixed point theorem in the realm of extended quasi-metric spaces which also involves order notions.

According to [23] (compare [24]), we say that a monotone mapping φ : C∗_{→,P(n)} → C∗_{→,P(n)} is an improver with respect to f ∈ C∗_{→,P(n)} if φ(f) ≤_→ f. Note that if φ is an improver with respect to f ∈ C∗_{→,P(n)}, then (φ^k(f))_{k∈N} is a decreasing sequence in C∗_{→,P(n)}.

Theorem 2. Let φ be a continuous monotone mapping from the complexity space (C∗_{→,P(n)}, e_{C∗_{→,P(n)}}) into itself. If φ is an improver with respect to some f_0 ∈ C∗_{→,P(n)} such that u↓φ^k f_0 ∈ C∗_{P(n)} and lim_{k→∞} e_{C∗_{→,P(n)}}(u↓φ^k f_0, φ^k f_0) = 0, then u↓φ^k f_0 is a fixed point of φ.


In the rest of the section we show, by means of Theorem 2, that our new complexity approach is suitable to prove mathematically whether a function defined recursively is consistent, as we announced before. In particular we give an alternative proof of the fact that the factorial semantic specification is meaningful, and we do this by means of the principle of fixed point induction, showing that the factorial function (the total complexity mapping) can be considered as the limit of a sequence of approximations (complexity partial mappings) which can be computed in a finite number of steps.

From now on we consider the polynomial P(n) given by P(n) = n^2 for all n ∈ ω. Denote by 1_{C∗_{→,n^2}} the element of C∗_{→,n^2} such that dom 1_{C∗_{→,n^2}} = {0} and 1_{C∗_{→,n^2}}(0) = 1. Consider the functional φ : C∗_{→,n^2} → C∗_{→,n^2} defined by

φf(k) = 1 if k = 0,   and   φf(k) = k f(k − 1) if k ≥ 1 and k − 1 ∈ dom f.

It is clear that φ is monotone and that it is an improver with respect to 1_{C∗_{→,n^2}}.

Next we prove that φ is continuous. Indeed, let (f_k)_{k∈N} be a sequence in C∗_{→,n^2} and let f ∈ C∗_{→,n^2} be such that lim_{k→∞} e_{C∗_{→,n^2}}(f, f_k) = 0. Then f ≤_→ f_k eventually. By monotonicity of φ we obtain φf ≤_→ φf_k eventually. Moreover,

e_{C∗_{→,n^2}}(φf, φf_k) ≤ e_{C∗_{→,n^2}}(f, f_k)

eventually. So lim_{k→∞} e_{C∗_{→,n^2}}(φf, φf_k) = 0 and, thus, φ is continuous.

Note that φ^k 1_{C∗_{→,n^2}}(n) = n! for all n ∈ dom φ^k 1_{C∗_{→,n^2}}.

On the other hand, we have that dom u↓φ^k 1_{C∗_{→,n^2}} = ω and u↓φ^k 1_{C∗_{→,n^2}}(n) = n! for all n ∈ ω, since lim_{k→∞} |dom φ^k 1_{C∗_{→,n^2}}| = +∞ and u↓φ^k 1_{C∗_{→,n^2}}(n) = φ^n 1_{C∗_{→,n^2}}(n) = n! for all n ∈ ω. Moreover,

∑_{n=0}^{+∞} 2^{-n^2} (u↓φ^k 1_{C∗_{→,n^2}})(n) = ∑_{n=0}^{+∞} 2^{-n^2} n! < +∞.

So u↓φ^k 1_{C∗_{→,n^2}} ∈ C∗_{n^2}. Since

e_{C∗_{→,n^2}}(u↓φ^k 1_{C∗_{→,n^2}}, φ^k 1_{C∗_{→,n^2}}) = 0

for all k ∈ ω, we have, by Theorem 2, that u↓φ^k 1_{C∗_{→,n^2}} is a fixed point of φ. So we have obtained the factorial (the meaning of the recursive denotational definition) as the fixed point u↓φ^k 1_{C∗_{→,n^2}}, which is the limit of the partial mappings (φ^k 1_{C∗_{→,n^2}})_{k∈N} that allow us to obtain each computation of the factorial in a finite number of steps.
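
The analytic ingredient of the argument is just the convergence of ∑ 2^{-n^2} n!, i.e. that the factorial belongs to C∗_{n^2}; a quick numerical check of the partial sums (ours, not part of the paper) makes this plain.

import math

# Sketch: partial sums of sum_{n>=0} 2^{-n^2} n!; the ratio of consecutive terms is
# (n + 1) / 2^{2n + 1}, so the terms decay extremely fast.
partial = 0.0
for n in range(12):
    partial += 2.0 ** (-n * n) * math.factorial(n)
print(partial)   # approximately 1.63709; terms beyond n = 8 change the sum by less than 1e-12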

5 Acknowledgements

The first and the third authors acknowledge the support of the Spanish Ministry of Education and Science, and FEDER, grant MTM2006-14925-C02-01. The second author acknowledges the support of the Science Foundation Ireland, SFI Principal Investigator Grant 07/IN.1/I977.


References

[1] J.W. de Bakker and E.P. de Vink, Control Flow Semantics, The MIT Press, Massachusetts, 1996.

[2] J.W. de Bakker and E.P. de Vink, Denotational models for programming languages: applications of Banach's fixed point theorem, Topology Appl. 85 (1998), 35–52.

[3] E.A. Emerson and C.S. Jutla, The complexity of tree automata and logic of programs, SIAM J. Comput. 29 (1999), 132–158.

[4] B.A. Davey and H.A. Priestley, Introduction to Lattices and Order, Cambridge University Press, New York, 1990.

[5] P. Fletcher and W.F. Lindgren, Quasi-Uniform Spaces, Marcel Dekker, New York, 1982.

[6] B. Fuchssteiner and W. Lusky, Convex Cones, North Holland, Amsterdam, 1981.

[7] L.M. García-Raffi, S. Romaguera and E.A. Sánchez-Pérez, Sequence spaces and asymmetric norms in the theory of computational complexity, Math. Comput. Model. 36 (2002), 1–11.

[8] L.M. García-Raffi, S. Romaguera and E.A. Sánchez-Pérez, The supremum asymmetric norm on sequence algebras: A general framework to measure complexity distances, Electronic Notes Theoret. Comput. Sci. 74 (2003), 39–50.

[9] L.M. García-Raffi, S. Romaguera, E.A. Sánchez-Pérez and O. Valero, Normed semialgebras: a mathematical model for the complexity analysis of programs and algorithms, in: Proceedings of the 7th World Multiconference on Systemics, Cybernetics and Informatics Vol. II, 2003, pp. 55-58.

[10] C.A. Gunter and D.S. Scott, Semantic domains, in: Handbook of Theoretical Computer Science, Jan van Leeuwen (ed.), Elsevier Science Publishers, vol. B: Formal Models and Semantics, 633-674, 1990.

[11] J.I. Hartog, J.W. de Bakker and E.P. de Vink, Metric semantics and full abstractness for action refinement and probabilistic choice, Electronic Notes Theoret. Comput. Sci. 40 (2001), 72–99.

[12] J.L. Kelley, General Topology, Springer-Verlag, New York, 1955.

[13] H.P.A. Künzi, Nonsymmetric distances and their associated topologies: About the origins of basic ideas in the area of asymmetric topology, in: Handbook of the History of General Topology, C.E. Aull and R. Lowen (eds.), Kluwer Acad. Publ. vol. 3, 853-968, 2001.


[14] M.O. O'Keeffe, S. Romaguera and M. Schellekens, Norm-weightable Riesz spaces and the dual complexity space, Electronic Notes Theoret. Comput. Sci. 74 (2003), 105–121.

[15] J. Rodríguez-López, A new approach to epiconvergence and some applications, Southeast Asian Bulletin of Mathematics 28 (2004), 685–701.

[16] J. Rodríguez-López, S. Romaguera and O. Valero, Asymptotic complexity of algorithms via the nonsymmetric Hausdorff distance, Computing Letters 2 (2006), 155–161.

[17] J. Rodríguez-López, S. Romaguera and O. Valero, Denotational semantics for programming languages, balanced quasi-metrics and fixed points, Int. J. Comput. Math. 85 (2008), 623–630.

[18] S. Romaguera, E.A. Sánchez-Pérez and O. Valero, The complexity space of a valued linearly ordered set, Electronic Notes Theoret. Comput. Sci. 74 (2003), 158–171.

[19] S. Romaguera, E.A. Sánchez-Pérez and O. Valero, Computing complexity distances between algorithms, Kybernetika 39 (2003), 569–582.

[20] S. Romaguera and M. Schellekens, Quasi-metric properties of complexity spaces, Topology Appl. 98 (1999), 311–322.

[21] S. Romaguera and M. Schellekens, The quasi-metric of complexity convergence, Quaestiones Math. 23 (2000), 359–374.

[22] S. Romaguera and M. Schellekens, Duality and quasi-normability for complexity spaces, Appl. Gen. Topology 3 (2002), 91–112.

[23] S. Romaguera and O. Valero, On the structure of the space of complexity partial functions, Int. J. Comput. Math. 85 (2008), 631–640.

[24] M. Schellekens, The Smyth completion: a common foundation for denotational semantics and complexity analysis, Electronic Notes Theoret. Comput. Sci. 1 (1995), 535–556.

[25] M. Schellekens, Complexity domains, a relative view of complexity, manuscript.

[26] R.D. Tennent, The denotational semantics of programming languages, Comm. ACM 19 (1976), 437–453.

[27] R. Tix, K. Keimel and G. Plotkin, Semantic domains for combining probability and non-determinism, Electronic Notes Theoret. Comput. Sci. 129 (2005), 1–104.


Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009, 30 June, 1–3 July 2009.

Spectral centralities of complex networks vs. local estimators

Miguel Romance1

1 Department of Applied Mathematics, University Rey Juan Carlos, Madrid (Spain)

emails: [email protected]

Abstract

We will analyze several centrality measures by giving a general framework that includes the Bonacich centrality, the PageRank centrality or the in-degree vector, among others. We will obtain some local scale estimators for such global measures by giving some geometrical characterizations and some deviation results that help to quantify the error of approximating a spectral centrality by a local estimator.

Key words: Complex Networks, Centrality measures, spectral measures.

1 Introduction and Motivation

In the last years there has been intense research activity on complex network analysis by the scientific community, since it has been shown that many systems in technology, society and nature can be modeled by graphs with peculiar local properties (see, for example, [1, 2]). In particular, many networks are characterized by heterogeneous characteristics of their nodes, such as the number of neighbors or other features. Such heterogeneity is responsible for many remarkable features of real networks, such as resilience to random failures, the behavior of the network in an epidemic spreading or the synchronization phenomena of dynamical systems in complex networks (see, for example, [2, 6]). In order to measure such heterogeneity, network researchers coming from many different scientific fields have introduced a large number of centrality indices that measure the varying importance of the vertices in a network according to several criteria. These indices have proved of great value in the analysis and understanding of the roles played by the different nodes in the structure and dynamics of the network.

Perhaps the simplest centrality measure is the (in-)degree vector. If G = (X, E) is a (directed or undirected) network with n nodes X = {1, . . . , n} and m links E = {ℓ_1, . . . , ℓ_m}, the (in-)degree of a node i is the number of links incident on i and, roughly speaking, it is in some sense a measure of the popularity of node i.


There are many other centrality indices that are more sophisticated, such as the closeness, which is the mean geodesic (i.e., shortest path) distance between a node and all other nodes reachable from it, or the betweenness centrality, but in this note we will consider a class of centralities based on spectral properties of the network: the spectral centralities.

P. Bonacich [3] introduced the notion of eigenvector centrality, which relates the centrality of each node i of a network G = (X, E) (directed or not) with the i-th component of an eigenvector corresponding to the dominant eigenvalue of the transpose of the adjacency matrix of G, which we will denote by A. The heuristic behind this concept is that if we want to define a centrality function c : X → R, then c(i) is the sum of the centralities of all the neighbors of i that point to it (i.e. the vertices j such that the link (j, i) exists); that is, if we consider a directed network G = (X, E) of n nodes where X = {1, 2, . . . , n}, then for all 1 ≤ i ≤ n

c(i) = (1/λ) ∑_{j : (j,i) ∈ E} c(j) = (1/λ) ∑_{j=1}^{n} a_{ji} c(j),

where (a_{ij}) = A is the adjacency matrix of G. The role of the factor 1/λ in the last formula is to guarantee that the equality has non-trivial sense. It is easy to check that if we denote c = (c(1), . . . , c(n)) ∈ R^n, the last expression ensures that A^t c = λ c, and therefore the centrality of a node i is the i-th component of an eigenvector of A^t (the transpose of the adjacency matrix of G = (X, E)). P. Bonacich [3], [4] introduced the eigenvector centrality following these ideas, simply by taking λ to be the spectral radius of A (which is the same as the spectral radius of A^t) in order to guarantee that all the components of the eigenvector c are positive (as a consequence of the classical Perron-Frobenius theorem), and therefore c = (c(1), . . . , c(n)) ∈ R^n is an eigenvector associated to the spectral radius ρ(A) which belongs to the positive cone R^n_+ = {(x_1, . . . , x_n) : x_i > 0 ∀ 1 ≤ i ≤ n}.

Another centrality index that we will consider is the PageRank centrality [5], which is the prestige measure that is at the heart of Google's ranker of web pages. The main idea of this index is to simulate the behavior of a user browsing the Web. Most of the time, the user visits pages just by surfing, i.e. by clicking on hyperlinks of the page he is on; otherwise, the user will jump to another page by typing its URL on the browser, or going to a bookmark, or whatever. On a complex network, such a process can be modelled by a random walk with eventual jumps towards randomly selected nodes. Hence, the PageRank centrality is the stationary state of the Markov chain given by the n × n transition matrix M defined as

M_{ij} = q/n + (1 − q) a_{ji}/gr_{out}(j),

where q ∈ [0, 1] is a damping factor (that models the probability that the user jumps to another random page) and gr_{out}(j) is the outgoing degree of node j.
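
The two centralities just described are both dominant-eigenvector computations, so they can be approximated by plain power iteration. The sketch below (our own toy network and helper names, shown only to fix ideas) computes the Bonacich vector from A^t and the PageRank vector from the matrix M above.

import numpy as np

# Toy directed network; A[i, j] = 1 iff the link (i, j) exists.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = A.shape[0]

def power_iteration(M, iters=1000):
    """Dominant eigenvector of M, normalized to sum 1 (plain power method)."""
    v = np.ones(n) / n
    for _ in range(iters):
        v = M @ v
        v /= v.sum()
    return v

# Bonacich centrality: Perron-Frobenius eigenvector of A^t.
bonacich = power_iteration(A.T)

# PageRank centrality: stationary state of M_ij = q/n + (1 - q) a_ji / grout(j).
q = 0.15
grout = A.sum(axis=1)                 # outgoing degrees (all nonzero in this toy network)
M = q / n + (1 - q) * A.T / grout     # entry (i, j) is q/n + (1 - q) a_ji / grout(j)
pagerank = power_iteration(M)

print(bonacich, pagerank)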

We will see that both the Bonacich centrality and the PageRank centrality, as well as other centrality indices (including the degree vector), are particular cases of a more general framework. In addition to this we will also give some local estimators of such indices that give computable and efficient approximations.


There are several numerical tests in the literature which illustrate that there should be some correlation between the Bonacich centrality and the degree vector of the network, but we prove some analytical relationships not only for that centrality, but for the general spectral centrality model, which also give some estimators for any spectral centrality.

2 Mathematical model and main results

We will present a general framework that includes several centrality indices, such as the Bonacich centrality, the PageRank centrality or the in-degree vector, among others. The main idea is given in the following concept.

Definition 2.1 Let G = (X, E) be a directed network such that X = {1, . . . , n} and let φ : X × X → [0, +∞). A centrality measure vector of G associated to the weight function φ is a vector 0 ≠ C_φ = (C_φ(1), . . . , C_φ(n)) ∈ R^n with C_φ(j) ≥ 0 for all 1 ≤ j ≤ n, such that there exists Λ ≠ 0 verifying

C_φ(i) = (1/Λ) ∑_{j=1}^{n} φ(i, j) C_φ(j),

for all i ∈ X.

By choosing the function φ properly we get several well known centrality measures and also many other new centrality indices that can be useful in real applications, since they give much more information about the role of each node of the network in the properties of the network.

We will prove that for every φ : X × X → [0, +∞), we can always find a centrality index C_φ associated to φ and a constructive method for obtaining such a vector. Moreover, if φ(i, j) > 0, then we can show that such an index is unique, simply by using some results from the Perron-Frobenius theory for positive matrices.

Once we have introduced the model and stated the existence (and uniqueness in some cases) of such spectral centralities, we consider the problem of giving local estimators of such centralities that allow us to avoid the computational complexity of calculating a spectral centrality. The results presented are in two different directions. On the one hand we will give some geometrical characterizations of such local estimators, by proving results as the following:

Proposition 2.2 Let G = (X, E) be a directed network such that X = {1, . . . , n}, φ : X × X → [0, +∞) as in Definition 2.1 and C_φ ∈ R^n a centrality measure of G associated to φ such that for every i ∈ X

C_φ(i) = (1/Λ) ∑_{j=1}^{n} φ(i, j) C_φ(j).

If M_φ denotes the n × n matrix (φ(i, j))_{ij}, then C_φ is orthogonal to the vectors w_k for all k ∈ N, where


(i) w_1 = (1, . . . , 1) − (1/Λ) ( ∑_{j=1}^{n} φ(i, j) )_{i=1}^{n} ∈ R^n.

(ii) w_k = (1/Λ^{k−1}) ( ∑_{j=1}^{n} (M_φ^k)_{ij} )_{i=1}^{n} − ( ∑_{j=1}^{n} φ(i, j) )_{i=1}^{n} ∈ R^n, for all k ≥ 2.

On the other hand, we can give some concentration results that provide bounds for the difference between a spectral centrality (a global measure) and its local estimator (a local measure whose computation is much easier). A particular case of this kind of result is the following theorem for the Bonacich centrality, which can be estimated by the in-degree vector. This result gives the analytical answer to several numerical tests in the literature that illustrate that there is a strong correlation between the Bonacich centrality and the degree vector of the network.

Theorem 2.3 If G = (X, E) is a directed network such that X = {1, . . . , n} and 0 ≤ c = (c_1, . . . , c_n) ∈ R^n is the Bonacich centrality vector of G, then

‖ c − (1/|E|) gr_{in}(G) ‖_1 ≤ |E| max{ |c_j − c_k| : 1 ≤ j ≠ k ≤ n }.

Acknowledgements

This work has been partially supported by the grant from the Spanish MEC, Project Ref. MTM2006-10053.

References

[1] R. Albert and A.L. Barabási, Statistical mechanics of complex networks, Rev. Mod. Phys. 74 (2002) 47–97.

[2] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.-U. Hwang, Complex networks: Structure and dynamics, Physics Reports 424 (2006) 175–308.

[3] P. Bonacich, Factoring and weighing approaches to status scores and clique identification, J. Math. Soc. 2 (1972) 113.

[4] P. Bonacich and P. Lloyd, Eigenvector-like measures of centrality for asymmetric relations, Soc. Netw. 23 (2001) 191.

[5] S. Brin and L. Page, The anatomy of a large-scale hypertextual Web search engine, Comput. Netw. 30 (1998) 107–117.

[6] M.E.J. Newman, The structure and function of complex networks, SIAM Review 45 (2003) 167–256.


Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

Computational Methods for Finite Semifields

Ignacio F. Rúa1, Elías F. Combarro2 and J. Ranilla2

1 Departamento de Matemáticas, Universidad de Oviedo

2 Artificial Intelligence Center, University of Oviedo

emails: [email protected], [email protected], [email protected]

Abstract

Finite semifields (finite not necessarily associative division rings) have traditionally been considered in the context of finite geometries (they coordinatize projective semifield planes). Recent applications to coding theory, combinatorics and graph theory have broadened the potential interest in these rings.

We show recent progress on the study of these objects with the help of computational tools. In particular, we state results on the classification and primitivity of these semifields obtained with the help of implementations (in the C language) of different algorithms designed to deal with these rings.

Key words: Finite semifields

MSC 2000: 17D99, 17-04, 17-08

1 Introduction

A finite semifield (or finite division ring) D is a finite nonassociative ring with identity such that the set D∗ = D \ {0} is closed under the product, i.e., it is a loop [18, 8]. Finite semifields have been traditionally considered in the context of finite geometries since they coordinatize projective semifield planes [12]. Recent applications to coding theory [6, 15, 11], combinatorics and graph theory [19] have broadened the potential interest in these rings.

Because of their diversity, obtaining general theoretical algebraic results seems to be a rather difficult (and challenging) task. On the other hand, because of their finiteness, computational methods can be naturally considered in the study of these objects. So, the classification of finite semifields of a given order is a rather natural problem in which to use computations. For instance, computers were used in the classification of semifields of orders 16 [17] and 32 [23, 18]. These results date back 40 years, when computers were being incorporated into scientific research. In [17], a complete account of the study of finite semifields in the first decades of the 20th century can be found.


In the last few years there has been renewed interest in the study of semifields with the help of computational methods. So, in [16] a quest for the study of semifields of order at most 256 is launched: These computer-assisted results used very weak computers by modern standards; it is surprising that there has not yet been an enumeration of all semifields of order at most 256, since the resulting data might be useful for finding new general constructions. This led U. Dempwolff to describe in [9] all finite semifields of order 81 (independently, this classification was also achieved by the first two authors [7]), and to the classification of semifields of order 64, which has been recently obtained by the three authors [21]. On the other hand, motivated by completely different reasons, in [20, 13] the primitivity of semifields of orders 32, 64 and 81 was considered.

In this paper we present the computational methods we have implemented to deal with different problems in semifields, and the recent results we have obtained with these implementations. In particular, we will show the state of the art on the classification and primitivity problems.

2 Finite Semifields

A finite nonassociative ring D is called a presemifield if the set of nonzero elements D∗ is closed under the product. If D has an identity element, then it is called a (finite) semifield. If D is a finite semifield, then D∗ is a multiplicative loop. That is, there exists an element e ∈ D∗ (the identity of D) such that ex = xe = x, for all x ∈ D and, for all a, b ∈ D∗, the equation ax = b (resp. xa = b) has a unique solution.

Apart from finite fields (which are obviously finite semifields), proper finite semifields were first considered by L.E. Dickson [10] and were deeply studied by A.A. Albert [1, 2, 3, 4]. The term finite semifield was introduced in 1965 by D.E. Knuth [18]. These rings play an important role in the study of certain projective planes, called semifield planes [12, 18].

The characteristic of a finite presemifield D is a prime number p, and D is a finite-dimensional algebra over the Galois field GF(q) (q = p^c) of dimension d, for some c, d ∈ N, so that |D| = q^d. If D is a finite semifield, then GF(q) can be chosen to be its associative-commutative center Z(D). Other relevant subsets of a finite semifield are the left, right, and middle nuclei (N_l, N_r, N_m), and the nucleus N [8].

Isomorphism of presemifields is defined as usual for algebras, and the classification of finite semifields up to isomorphism can be naturally considered. Because of the connections to finite geometries, we must also consider the following notion. If D1, D2 are two presemifields over the same prime field GF(p), then an isotopy between D1 and D2 is a triple (F, G, H) of bijective linear maps D1 → D2 over GF(p) such that

H(ab) = F(a) G(b),   ∀ a, b ∈ D1.

Clearly, any isomorphism between two presemifields is an isotopy, but the converse is not necessarily true. Any presemifield is isotopic to a finite semifield [18, Theorem 4.5.4]. From a presemifield D, a projective plane P(D) can be constructed. We refer to [12, 18] for the details.


Theorem 6 in [3] shows that isotopy of finite semifields is the algebraic translation of the isomorphism between the corresponding projective planes. Two finite semifields D1, D2 are isotopic if, and only if, the projective planes P(D1), P(D2) are isomorphic.

The set of isotopies from a finite semifield D to itself is a group under composition, called the autotopism group, and denoted At(D). This group acts on the fundamental triangle of the plane P(D), that is, it leaves invariant each of the three lines L_x = {(1, x, 0) | x ∈ D} ∪ {(0, 1, 0)}, L_y = {(1, 0, y) | y ∈ D} ∪ {(0, 0, 1)}, L_∞ = {(0, 1, z) | z ∈ D} ∪ {(0, 0, 1)}.

Given a finite semifield D, it is possible to construct the set D of all its isotopic but not necessarily isomorphic finite semifields. It is a subset of the set of principal isotopes of D [14, 18]. A principal isotope of D is a finite semifield D_{(y,z)} (where y, z ∈ D∗) such that (D_{(y,z)}, +) = (D, +) and multiplication is given by the rule

a · b = R_z^{-1}(a) L_y^{-1}(b),   ∀ a, b ∈ D,

where R_z, L_y : D → D are the maps R_z(a) = az, L_y(a) = ya, for all a ∈ D. Moreover, there is a relation between the order of At(D) and the orders of the automorphism groups of the elements in D [18, Theorem 3.3.4]. If D is a finite semifield, and D is the set of all nonisomorphic semifields isotopic to D, then

(|D| − 1)^2 = |At(D)| ∑_{E ∈ D} 1/|Aut(E)|,

where Aut(E) denotes the automorphism group of a finite semifield E. The sum on the right-hand side will be called the Semifield/Automorphism (S/A) sum [18, Theorem 3.3.4], and it provides the number of nonisomorphic semifields generating the same plane, and the order of their automorphism groups.

If B = [x_1, . . . , x_d] is a GF(q)-basis of a presemifield D, then there exists a unique set of constants A_{D,B} = {A_{i_1 i_2 i_3}}_{i_1,i_2,i_3=1}^{d} ⊆ GF(q) such that

x_{i_1} x_{i_2} = ∑_{i_3=1}^{d} A_{i_1 i_2 i_3} x_{i_3}   ∀ i_1, i_2 ∈ {1, . . . , d}.

This set of constants is known as the cubical array or 3-cube corresponding to D with respect to the basis B, and it completely determines the multiplication in D.

A remarkable fact is that any permutation of the indexes of a 3-cube preserves the absence of zero divisors. Namely, if D is a presemifield, and σ ∈ S3 (the symmetric group on the set {1, 2, 3}), then the set

A^σ_{D,B} = {A_{i_{σ(1)} i_{σ(2)} i_{σ(3)}}}_{i_1,i_2,i_3=1}^{d} ⊆ GF(q)

is the 3-cube of a GF(q)-algebra D^σ_B without zero divisors [18, Theorem 4.3.1]. Notice that, in general, different choices of bases B, B′ may lead to nonisomorphic presemifields D^σ_B, D^σ_{B′}. However, these presemifields are always isotopic [18, Theorems 4.4.2 and 4.2.3].
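
This S3-action is easy to experiment with on a computer. The sketch below (a toy example of ours: the 3-cube of GF(4) regarded as a 2-dimensional algebra over GF(2), with basis {1, t} and t^2 = t + 1) permutes the indexes of the cube in all six ways and verifies by brute force that every permuted cube still defines an algebra without zero divisors, as Theorem 4.3.1 of [18] predicts.

from itertools import permutations, product

d, p = 2, 2
# A[i1][i2][i3] = coefficient of x_{i3} in the product x_{i1} * x_{i2}  (GF(4) over GF(2))
A = [[[1, 0], [0, 1]],
     [[0, 1], [1, 1]]]

def multiply(cube, x, y):
    """Coordinates of x * y in the algebra defined by the 3-cube, over GF(p)."""
    return [sum(cube[i][j][k] * x[i] * y[j] for i in range(d) for j in range(d)) % p
            for k in range(d)]

def permuted_cube(cube, sigma):
    """The cube A^sigma whose entry at (i1, i2, i3) is A at (i_{sigma(1)}, i_{sigma(2)}, i_{sigma(3)})."""
    new = [[[0] * d for _ in range(d)] for _ in range(d)]
    for idx in product(range(d), repeat=3):
        new[idx[0]][idx[1]][idx[2]] = cube[idx[sigma[0]]][idx[sigma[1]]][idx[sigma[2]]]
    return new

def no_zero_divisors(cube):
    nonzero = [v for v in product(range(p), repeat=d) if any(v)]
    return all(any(multiply(cube, x, y)) for x in nonzero for y in nonzero)

assert all(no_zero_divisors(permuted_cube(A, s)) for s in permutations(range(3)))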


By [18, Theorem 5.2.1], up to six projective planes can be constructed from a given finite semifield D using the transformations of the group S3. Actually, S3 acts on the set of semifield planes of a given order. So, the classification of finite semifields can be reduced to the classification of the corresponding projective planes up to the action of the group S3. The different surveys on the topic ([18, 8, 16]) contain a description of known families of semifields together with the classification for small orders (32 or less).

With the help of 3-cubes, the construction of finite semifields of a given order can be rephrased as a matrix problem [9], [13, Proposition 3].

Proposition 1 There exists a finite semifield D of dimension d over its center Z(D) ⊇ GF(q) if, and only if, there exists a set of d matrices {A_1, . . . , A_d} ⊆ GL(d, q) (the set of invertible matrices of size d over the Galois field GF(q)) such that:

1. A_1 is the identity matrix;

2. ∑_{i=1}^{d} λ_i A_i ∈ GL(d, q), for all non-zero tuples (λ_1, . . . , λ_d) ∈ GF(q)^d, that is, (λ_1, . . . , λ_d) ≠ 0.

3. The first column of the matrix A_i is the column vector e_i, with a 1 in the i-th position and 0 everywhere else.

In such a case, the set {B_{ijk}}_{i,j,k=1}^{d}, where B_{ijk} = (A_i)_{kj}, is the 3-cube corresponding to D with respect to the standard basis of GF(q)^d. In [9], the set B_Σ and its linear span Σ are called standard basis and semifield spread set (SSS), respectively.

For a concrete representation of the semifield one can identify the semifield with GF(q)^d, and the multiplication with x ∗ y = ∑_{i=1}^{d} x_i A_i y, i.e., A_i is the matrix of left multiplication by the element e_i, where {e_1, . . . , e_d} is the canonical basis of GF(q)^d. So, the elements of the standard basis are coordinate matrices of the linear maps L_{e_i} : D → D, L_{e_i}(y) = e_i ∗ y.
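
As a minimal illustration of Proposition 1 (our own toy data: GF(4) viewed as a semifield of dimension 2 over GF(2), with A_1 the identity and A_2 the matrix of left multiplication by t), the sketch below checks the three conditions, testing invertibility over GF(2) by Gaussian elimination.

from itertools import product

q, d = 2, 2
A1 = [[1, 0], [0, 1]]        # identity
A2 = [[0, 1], [1, 1]]        # left multiplication by t in the basis {1, t}, t^2 = t + 1
basis = [A1, A2]

def invertible_mod_q(M):
    """Invertibility over GF(q), q prime, by Gaussian elimination (requires Python 3.8+ for pow(x, -1, q))."""
    M = [row[:] for row in M]
    for col in range(d):
        pivot = next((r for r in range(col, d) if M[r][col] % q), None)
        if pivot is None:
            return False
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, q)
        for r in range(d):
            if r != col and M[r][col] % q:
                factor = (M[r][col] * inv) % q
                M[r] = [(M[r][c] - factor * M[col][c]) % q for c in range(d)]
    return True

# Condition 1: A1 is the identity.  Condition 3: the first column of A_i is e_i.
assert A1 == [[int(i == j) for j in range(d)] for i in range(d)]
assert all(all(basis[i][r][0] == int(r == i) for r in range(d)) for i in range(d))

# Condition 2: every nonzero GF(q)-linear combination of A_1, ..., A_d is invertible.
for lam in product(range(q), repeat=d):
    if any(lam):
        S = [[sum(lam[i] * basis[i][r][c] for i in range(d)) % q for c in range(d)]
             for r in range(d)]
        assert invertible_mod_q(S)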

3 Methods, implementation and results

The result stated in Proposition 1 is fundamental for representing finite semifields in a computer. So, we represent any finite semifield by one of its standard bases, i.e., by an ordered set of matrices of fixed size over a finite field. This was also the way semifields were represented in [23] or, more recently, in [13, 9].

It is clear that a finite semifield can be represented by a large number of different standard bases. Namely, any choice of basis of the corresponding vector space (such that the first element is the identity) induces one of those standard bases. So, in order to reduce the number of different representations we deal only with cyclic representations.

Definition 1 A finite semifield D, of dimension d over its center Z(D), is called

• Left primitive, if there exists a ∈ D (a left primitive element) such that:

D∗ = {e, a, a^(2, a^(3, . . .}

where a^(2 = aa, a^(3 = a a^(2, . . .


• Left cyclic, if there exists a ∈ D (a left cyclic element) such that:

{e, a, a^(2, . . . , a^(d−1}

is a Z(D)-basis of D.
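
To make the definition concrete, the sketch below (our own example: the finite field GF(8) = GF(2)[t]/(t^3 + t + 1), represented by its 3-cube with respect to the basis {1, t, t^2}) iterates the left powers a^(k = a · a^(k−1 and confirms that the element t is left primitive.

d, p = 3, 2
# cube[i][j] = coordinates of e_i * e_j in the basis {1, t, t^2}, with t^3 = t + 1.
cube = [
    [(1, 0, 0), (0, 1, 0), (0, 0, 1)],
    [(0, 1, 0), (0, 0, 1), (1, 1, 0)],
    [(0, 0, 1), (1, 1, 0), (0, 1, 1)],
]

def mul(x, y):
    """Product of coordinate vectors in the algebra given by the 3-cube, over GF(p)."""
    return tuple(sum(cube[i][j][k] * x[i] * y[j] for i in range(d) for j in range(d)) % p
                 for k in range(d))

a = (0, 1, 0)                       # the element t
left_powers = {(1, 0, 0), a}        # the identity e and a itself
power = a
while True:
    power = mul(a, power)           # a^(k = a * a^(k-1  (left powers)
    if power in left_powers:
        break
    left_powers.add(power)

print(len(left_powers) == p ** d - 1)   # True: t generates all of D* by left powers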

Any left primitive semifield is always left cyclic [13, Corollary 1]. The first examples of left primitive semifields are associative finite semifields, i.e., finite fields, since the multiplicative group of a finite field is a cyclic group. It was conjectured in [24] that any finite semifield is always left (or right) primitive, but the two examples of 32 and 64 elements mentioned in [20, 13] show that this conjecture is not true. However, even these few nonprimitive semifields are cyclic. So, empirical evidence suggests that any finite semifield is left cyclic. Therefore, our restriction to the use of cyclic representations, i.e., representations in which e_i = L^i_{e_2}(e_1), seems to be irrelevant. Notice that in these representations, e_1 is the identity of the semifield, and e_2 is a left cyclic element.

With this choice of representations, computations are reduced significantly. Another reduction in computation times comes from the use of adequate implementations in the C language. This language proved to be especially suitable for the manipulation of semifields in [13], and this was later confirmed by the experiments which led to the classification of semifields of order 81. These rings were classified in [9] in a few days on a PC, using implementations in the software GAP [22]. Independently, this classification was achieved in a few minutes using our implementations in C [7]. Of course, GAP provides powerful routines to deal with different problems, such as the computation of the structure of automorphism groups. However, C provides astonishingly fast procedures. In those places where speed is not relevant, we have used the routines provided by the software Magma [5]. Here are examples of some of the different routines we have implemented:

• centro computes the center and nuclei of a semifield.

• listaisotopos produces the list of principal isotopes of a semifield.

• permuta generates semifields by permuting the indexes of the 3-cube of a semifield.

• clasifica classifies up to isomorphism, isotopy, and S3-action a list of semifields.

• autotopism computes the autotopism group of a semifield.

• automorphism computes the automorphism group of a semifield.

Different combinations of these procedures produce new results. For instance, combining listaisotopos, clasifica and automorphism, it is possible to compute the S/A sum of any semifield.

The first result obtained with these implementations was the classification of semifields of order 81 [7]. This independent classification confirms the correctness of the classification presented in [9], in the same way that the classification in [18] did with the one shown in [23]. The classification led to 12 S3-classes, four of them previously unknown.


A remarkable step was achieved when the implementations mentioned above were combined with the algorithm introduced in [13], and some ideas from [9], in order to produce a complete classification of semifields of order 64. This classification is significantly more difficult than that of 81 elements, since one year of computation time was needed to produce it (actually, the computation was done in one month of parallel computation) [21]. The relevance of this classification lay not only in the combination of ideas which led to the reduction of computing times (actually, there was a 99% reduction in the number of cases to be explored) but also in the large number of semifields found. Namely, 80 S3-classes were found, but only 13 were previously known.

More classification results have been obtained with our techniques. Let us briefly present our current progress.

• Semifields of order 4^4, with center GF(4): 28 S3-classes.

• Semifields of order 5^4: 42 S3-classes.

• Commutative semifields of order 3^5: 7 S3-planes.

Let us also notice that computational complexity increases exponentially with the dimension of the semifield, so semifields of dimension 4 are significantly easier to classify than those of dimension 5 or more. We summarize the known results on the classification of semifields in the following table. Results obtained by our methods are included in boldface.

Number of S3-classes    d = 3    d = 4    d = 5              d = 6
q = 2                   1        3        3                  80
q = 3                   2        12       7 (commutative)    ?
q = 4                   2        28       ?                  ?
q = 5                   3        42       ?                  ?

Table 1: Number of finite semifields of dimension d over GF(q)

None of these classes (except one of order 32 and another one of order 64) is related to a nonprimitive semifield. The search for such rare semifields is another problem we have paid attention to. Currently, we can state that no nonprimitive commutative semifield of order 128 exists. This is the last step in the quest for these semifields, which we collect in the following table.

Order                               8    16    32    64    81    128
Number of nonprimitive semifields   0    0     1     1     0     0 (commutative)

Table 2: Number of nonprimitive semifields


Acknowledgements

This work has been partially supported by MEC MTM-2007-67884-C04-01, IB08-147, MEC TIN-2007-61273 and MEC TIN-2007-29664-E.

References

[1] A.A. Albert, On nonassociative division algebras, Transactions of the American Mathematical Society 72 (1952), 296-309.

[2] A.A. Albert, Finite noncommutative division algebras, Proceedings of the American Mathematical Society 9 (1958), 928-932.

[3] A.A. Albert, Finite division algebras and finite planes, Proceedings of Symposia in Applied Mathematics 10 (1960), 53-70.

[4] A.A. Albert, Generalized twisted fields, Pacific Journal of Mathematics 11 (1961), 1-8.

[5] W. Bosma, J. Cannon, C. Playoust, The Magma algebra system. I. The user language, J. Symbolic Comput. 24 (3-4) (1997), 235-265.

[6] A.R. Calderbank, P.J. Cameron, W.M. Kantor, J.J. Seidel, Z4-Kerdock codes, orthogonal spreads, and extremal Euclidean line-sets, Proc. London Math. Soc. 75 (1997), 436–480.

[7] Elías F. Combarro, I.F. Rúa, New Semifield Planes of order 81, (2008) (unpublished).

[8] M. Cordero, G.P. Wene, A survey of finite semifields, Discrete Mathematics 208/209 (1999), 125-137.

[9] U. Dempwolff, Semifield Planes of Order 81, J. of Geometry 89 (2008), 1-16.

[10] L.E. Dickson, Linear algebras in which division is always uniquely possible, Transactions of the American Mathematical Society 7 (1906), 370-390.

[11] S. González, C. Martínez, I.F. Rúa, Symplectic Spread based Generalized Kerdock Codes, Designs, Codes and Cryptography 42 (2) (2007), 213–226.

[12] M. Hall (Jr.), The theory of groups, Macmillan, (1959).

[13] I.R. Hentzel, I.F. Rúa, Primitivity of Finite Semifields with 64 and 81 elements, International Journal of Algebra and Computation 17 (7) (2007), 1411-1429.

[14] D.R. Hughes, F.C. Piper, Projective planes, Graduate Texts in Mathematics, Vol. 6, Springer-Verlag, New York-Berlin, (1973).


[15] W. M. Kantor, M. E. Williams, Symplectic semifield planes and Z4-linear codes, Transactions of the American Mathematical Society 356 (2004), 895-938.

[16] W. M. Kantor, Finite semifields, Finite Geometries, Groups, and Computation (Proc. of Conf. at Pingree Park, CO, Sept. 2005), de Gruyter, Berlin-New York (2006).

[17] E. Kleinfeld, A history of finite semifields, Lecture Notes in Pure and Appl. Math. 82, Dekker, New York, 1983.

[18] D. E. Knuth, Finite semifields and projective planes, Journal of Algebra 2 (1965), 182-217.

[19] J. P. May, D. Saunders, Z. Wan, Efficient Matrix Rank Computation with Applications to the Study of Strongly Regular Graphs, Proceedings of ISSAC 2007, 277-284, ACM, New York, 2007.

[20] I. F. Rua, Primitive and non primitive finite semifields, Communications in Algebra 32 (2) (2004), 793-803.

[21] I. F. Rua, Elías F. Combarro, J. Ranilla, Classification of Semifields of Order 64, J. of Algebra, (2009) (to appear, doi:10.1016/j.jalgebra.2009.02.020).

[22] The GAP Group, GAP – Groups, Algorithms, and Programming, Version 4.4.12, (2008) (http://www.gap-system.org).

[23] R. J. Walker, Determination of division algebras with 32 elements, Proceedings in Symposia of Applied Mathematics 75 (1962), 83-85.

[24] G. P. Wene, On the multiplicative structure of finite division rings, Aequationes Mathematicae 41 (1991), 222-233.


Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009, 30 June, 1–3 July 2009.

First Programming, then Algebra

Julio Rubio1

1 Departamento de Matemáticas y Computación, Universidad de La Rioja

emails: [email protected]

Abstract

This note reports on a small experiment for certifying the correctness of query optimizations. In an unexpected way, we found that natural properties of searching programs led to a new axiomatization for groups. Then, applying elementary group theory to permutation groups, we obtained new properties of programs dealing with data structures. Thus, this experience seems to indicate that programming is conceptually prior to algebra. But, obviously, logic is surrounding and underlying our whole approach.

Key words: mathematical models, software verification

1 Starting the story

Some time ago, we undertook a project to certify the correctness of query optimization algorithms in relational databases (see [1], for instance) by means of the ACL2 automated theorem prover [2]. Since ACL2 is a system based on Common Lisp, relations are naturally represented as lists of lists: a relation is a non-empty list, the first element being the header of the relation (a list of distinguishable attributes, i.e. simply a list without duplicates; no information on datatypes is considered in a first step); the rest of the elements are the tuples of the relation: simply lists of the same length as the header.

The first relational algebra [1] operation considered is projection: from a relation and a subset of the attributes of its header, a new relation is obtained by projecting in each tuple the data corresponding to the selected attributes. To program this operation two steps are needed:

1. To find the indexes of the attributes in the header.

2. To extract the data corresponding to these indexes in each tuple.

The first step can be based on the following function:


(defun locate (x list)
  (if (endp list)
      0
      (if (equal x (first list))
          0
          (+ 1 (locate x (rest list))))))

[Here endp is the predicate checking if its argument is an empty list, first extracts the first element of a list, and rest returns the list without its first element; indexes of list elements in Lisp range from 0 to n − 1, where n is the length of the list.]

The previous function is extended naturally to lists:

(defun maplocate (list1 list2)
  (if (endp list1)
      (list)
      (cons (locate (first list1) list2)
            (maplocate (rest list1) list2))))

[Here cons is the primitive function taking as arguments an element and a list, and constructing a new list with the element as its first element and the second argument as its rest.]

For the second step, the basic operation is nth, which, from an index and a list, extracts the corresponding element. This is again naturally extended to lists:

(defun mapnth (indexesList list)
  (if (endp indexesList)
      (list)
      (cons (nth (first indexesList) list)
            (mapnth (rest indexesList) list))))

From these two tools, maplocate and mapnth, defining the projection operator is straightforward. The first result to be proved in ACL2 for each operator is that it is well defined. In the case of projection, the statement is as follows:

(defthm projection-is-well-defined
  (implies (and (all-different? attributesList)
                (subset? attributesList (first relation))
                (relationp relation))
           (relationp (projection attributesList relation))))

where two predicates all-different? and subset? (checking, respectively, that a list contains no duplicates, and that all the elements of a first list are elements of a second one) have been used. The proof of this first statement poses no problem in ACL2.
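For illustration only, the following Python sketch (ours, not part of the paper's ACL2 development) mirrors how projection can be assembled from the two tools; the function names simply echo the ACL2 ones.

# Illustrative Python transcription (ours) of the searching tools and of projection.
def locate(x, lst):
    # Index of the first occurrence of x in lst; returns len(lst) if x is absent,
    # mirroring the recursive ACL2 definition above.
    for i, y in enumerate(lst):
        if y == x:
            return i
    return len(lst)

def maplocate(lst1, lst2):
    return [locate(x, lst2) for x in lst1]

def mapnth(indexes, lst):
    return [lst[i] for i in indexes]

def projection(attributes, relation):
    # relation = [header, tuple1, tuple2, ...]; keep only the selected attributes.
    idx = maplocate(attributes, relation[0])
    return [mapnth(idx, row) for row in relation]

relation = [["name", "dept", "salary"],
            ["ada", "cs", 100],
            ["bob", "math", 90]]
print(projection(["salary", "name"], relation))
# [['salary', 'name'], [100, 'ada'], [90, 'bob']]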

The first optimization property is the following one:

(defthm certification-first-optimization
  (implies (and (all-different? attributesList1)
                (all-different? attributesList2)
                (subset? attributesList1 attributesList2)
                (subset? attributesList2 (first relation))
                (relationp relation))
           (equal (projection attributesList1
                              (projection attributesList2 relation))
                  (projection attributesList1 relation))))

Proving it in ACL2 requires some effort, related evidently to the properties linking maplocate and mapnth, and therefore based on those linking locate and nth. Or, in other words, it depends on the properties of sequential searching in linear structures.

2 Properties of searching

A first property related to the correctness of searching is the following one:

(defthm nth-locate
  (implies (member x list)
           (equal (nth (locate x list) list)
                  x)))

which can be extended naturally to lists:

(defthm mapnth-maplocate
  (implies (and (true-listp list1)
                (subset? list1 list2))
           (equal (mapnth (maplocate list1 list2) list2)
                  list1)))

[Where the technical condition on true-listp uses this predicate to ensure that the last tail of a list is really the empty list; this is an ACL2 function, but not a Common Lisp primitive; see [2] for details.]

This property can be slightly generalized in the following way:

(defthm mapnth-maplocate-maplocate
  (implies (and (subset? list1 list2)
                (subset? list2 list3))
           (equal (mapnth (maplocate list1 list2)
                          (maplocate list2 list3))
                  (maplocate list1 list3))))

If locate is applied to a list without duplicates, there is a second correctness property of locate:

(defthm locate-nth
  (implies (and (natp i) (< i (length list))
                (all-different? list))
           (equal (locate (nth i list) list)
                  i)))

[Where natp is a predicate checking if its argument is a non-negative integer.]

Again, this admits a simple extension to lists:

(defthm maplocate-mapnth
  (implies (and (true-listp indexesList)
                (all-different? list)
                (locate-listp indexesList (length list)))
           (equal (maplocate (mapnth indexesList list) list)
                  indexesList)))

[Where locate-listp is a predicate checking that its first argument is a list whose elements are suitable indexes for list.]

Finally, let us present a new theorem which is the essential one to prove the certification of the first optimization, because it captures the property of the composition of two projections:

(defthm mapnth-is-transitive
  (implies (subset? list1 list2)
           (equal (mapnth (maplocate list1 list2)
                          (mapnth (maplocate list2 list3) list))
                  (mapnth (maplocate list1 list3) list))))

Let us stress that all the properties in this section (even the last one, which could seem a bit intricate) are proved directly by ACL2, with the help of some small auxiliary lemmas, but without needing to define new ad hoc induction schemes. Thus we can consider them natural properties of maplocate and mapnth, since they follow from the recursive definition of both functions in such a way that ACL2 can automatically find the right induction schemes.

3 A new axiomatization for groups

A relationship which always underlies relational database theory, but which is never implemented in Database Management Systems (due to its huge complexity cost), is that of equality between two relations. If we try to implement it in ACL2, we must in particular check when two headers are equal. This is defined by the natural equality as sets:

(defun equal-as-sets? (list1 list2)
  (and (subset? list1 list2)
       (subset? list2 list1)))

When we apply maplocate to a pair of equal headers, we obtain a very particular kind of index list. Since ACL2 is not only a proving environment but also an executing one, we can show the result of such a call:


> (maplocate '(d c b a) '(c a d b))
(2 0 3 1)

The resulting list is a permutation (represented here as a list of length n, containing the numbers from 0 to n − 1). This is a first clue indicating that group theory could be somehow related to our database problem.

An important permutation is the identity, computed by means of a function we call id-permutation:

> (id-permutation 4)
(0 1 2 3)

In order to prove in ACL2 the reflexive property of the equality between relations, an important property is the following one:

(defthm maplocate-id-permutation
  (implies (all-different? list)
           (equal (maplocate list list)
                  (id-permutation (length list)))))

Now we are ready to collect all the properties on maplocate. But, first, let us remark that all the premises of the maplocate theorems in the previous section and in this one are satisfied by permutations (in particular, two permutations are always equal as sets, and can have no duplicate element). Let us denote by G the set of permutations (of a fixed length n), let us call 1 the identity permutation, and let us denote mapnth by ∗ : G × G → G and maplocate by ◦ : G × G → G. Then the following list of "axioms"

• Bx0. g ◦ g = 1

• Bx1. (g1 ◦ g2) ∗ (g2 ◦ g3) = g1 ◦ g3

• Bx2. (g1 ◦ g2) ∗ g2 = g1

• Bx3. (g1 ◦ g2) ∗ ((g2 ◦ g3) ∗ g4) = (g1 ◦ g3) ∗ g4

corresponds, respectively, to the theorems we called previously maplocate-id-permutation, mapnth-maplocate-maplocate, mapnth-maplocate and mapnth-is-transitive.
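To make the correspondence concrete, here is a small Python check (ours, not part of the paper) of Bx0–Bx3 on randomly sampled permutations, with mapnth playing the role of ∗ and maplocate the role of ◦; as above, a permutation of length n is a list containing the numbers 0, . . . , n − 1.

# Numerical sanity check (ours) of Bx0-Bx3 on permutations,
# with mapnth as * and maplocate as the second operation.
import random

def mapnth(indexes, lst):
    return [lst[i] for i in indexes]

def maplocate(lst1, lst2):
    return [lst2.index(x) for x in lst1]

def id_permutation(n):
    return list(range(n))

n = 6
for _ in range(100):
    g1, g2, g3, g4 = (random.sample(range(n), n) for _ in range(4))
    assert maplocate(g1, g1) == id_permutation(n)                              # Bx0
    assert mapnth(maplocate(g1, g2), maplocate(g2, g3)) == maplocate(g1, g3)   # Bx1
    assert mapnth(maplocate(g1, g2), g2) == g1                                 # Bx2
    assert mapnth(maplocate(g1, g2), mapnth(maplocate(g2, g3), g4)) \
           == mapnth(maplocate(g1, g3), g4)                                    # Bx3
print("Bx0-Bx3 hold on all sampled permutations")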

Let us now call ( )^{-1} : G → G the function defined by g^{-1} := 1 ◦ g. Then it is very easy to prove the following theorem.

Theorem 1 Let G be a set with a distinguished element 1 ∈ G and with two binary internal operations ∗ : G × G → G and ◦ : G × G → G satisfying properties Bx0, Bx1, Bx2 and Bx3. Then [G, ∗, ( )^{-1}, 1] defines a group. In addition: g1 ◦ g2 = g1 ∗ g2^{-1} and g1 ∗ g2 = g1 ◦ (1 ◦ g2).


Let us remark that, a posteriori, this gives a new axiomatization for groups with only one operation (namely, (a, b) ↦ a ∗ b^{-1}) and four axioms. However, the interest of this discussion is not the theorem itself (it is no more than a beginner's exercise) or the axiomatization found (as it is redundant in some sense, and many other variants are possible), but the surprising fact that it has appeared in the context of the properties of searching: the natural properties (at least natural in the ACL2 sense) of maplocate are just the properties needed to prove that permutations form a group. This experience seems to indicate that programming is conceptually prior to algebra (at least if logic is present, here in the form of ACL2).

And, even more important from a practical point of view, this modest observation allows us to obtain, automatically, a good number of new properties of our programs, properties which are instrumental in proving the correctness of database algorithms.

4 Discovery of new program properties

Probably the most striking fact is the definition of maplocate in terms of mapnth, and vice versa, when dealing with permutations, as explained in the "in addition" part of Theorem 1 in the previous section; in other words, application can be expressed in terms of searching, and vice versa.

Another direct consequence of our treatment is the associative property for mapnth:

(defthm mapnth-is-associative
  (implies (and (permutationp g1 n)
                (permutationp g2 n)
                (permutationp g3 n))
           (equal (mapnth g1 (mapnth g2 g3))
                  (mapnth (mapnth g1 g2) g3))))

whose proof is, in principle, far from obvious.

Furthermore, every result from group theory now has a reading with respect to maplocate and mapnth, and thus a possible application to our query optimization problem. As a final example, the following property

(defthm inverse-maplocate
  (implies (and (permutationp g1 n)
                (permutationp g2 n))
           (equal (inv-permutation (maplocate g1 g2))
                  (maplocate g2 g1))))

where

(defun inv-permutation (permutation)
  (maplocate (id-permutation (length permutation))
             permutation))


is of direct application to the proof of the symmetric property of the equality between relations.

The important point in our approach is not the fact that these results can be proved in ACL2, but that they follow without proving effort and, in particular, without looking (automatically or "by hand") for induction schemes; the proofs follow from simple rewriting based on the group axioms. This is a great improvement from a practical (and even from a conceptual) point of view.

5 ACL2 technical issues

The technical ACL2 tool used to deal with axiomatic structures is that of encapsulates. An encapsulate has a list of function signatures and a number of properties on the encapsulated functions. In addition, ACL2 demands giving a witness for the set of functions satisfying the properties. This ensures that the encapsulate has at least one model, and avoids introducing inconsistencies in the ACL2 logic.

Here, a part of an encapsulate for group theory is shown:

(encapsulate
 (((dmp *) => *)   ; domain definition
  ((unt) => *)     ; unit
  ((inv *) => *)   ; inverse
  ((prd * *) => *) ; product
  )

 (local (defun dmp (x) (equal x nil)))
 (local (defun unt () nil))
 (local (defun inv (x) (declare (ignore x)) nil))
 (local (defun prd (x y) (declare (ignore x y)) nil))

 (defthm dmp-unt
   (dmp (unt)))

 (defthm dmp-inv
   (implies (dmp x) (dmp (inv x))))

 ; ... lines skipped

 (defthm Ax3
   (implies (and (dmp a) (dmp b) (dmp c))
            (equal (prd a (prd b c))
                   (prd (prd a b) c))))
 )

Remark that, in addition to the unit (here presented as a 0-ary operation), the inverse and the product, we also include a predicate determining the domain of the group. After the signatures, some local definitions are written, defining a witness for the encapsulate (in the example, it is the trivial group over a singleton whose unique element is nil). Then the properties start: first, axioms related to domain invariance (the unit is an element of the group, the inverse is an internal operation, and so on), and finally the usual axioms for a group.

From this encapsulate, some new theorems can be deduced by using only the properties abstracted inside the encapsulate (the group axioms, in our case study). Then ACL2 is capable of exporting these theorems to any set of functions satisfying both the signatures (in a loose sense, as we will see later) and the encapsulated axioms. To this aim, ACL2 provides the user with a mechanism of functional instantiation.

Thus, in the next statement we indicate to ACL2 how to instantiate the different functions appearing in a group encapsulate (here, the name tns has been chosen to denote the binary operation).

(defthm mapnth-is-associative
  (implies (and (permutationp g1 n)
                (permutationp g2 n)
                (permutationp g3 n))
           (equal (mapnth g1 (mapnth g2 g3))
                  (mapnth (mapnth g1 g2) g3)))
  :hints (("Goal"
           :use (:functional-instance D-Ax3
                  (trick (lambda () (natp n)))
                  (dmp (lambda (per) (permutationp per n)))
                  (unt (lambda () (id-permutation n)))
                  (prd (lambda (per1 per2) (mapnth per1 per2)))
                  (tns (lambda (per1 per2) (maplocate per1 per2)))))))

Several points are worth noting in this code fragment. First, let us note that permutations are organized as an indexed family of groups: for each natural number n, there is a permutation group. This can be syntactically observed in our predicate permutationp, which takes a natural number as second argument. In general, working with indexed structures in mechanized mathematics is a hard problem. Nevertheless, this problem can be smoothed in ACL2 because functions can be instantiated by means of the so-called pseudo-lambda expressions (see [2]). In particular, free variables can occur in the body of a pseudo-lambda expression, as n does in (lambda (per) (permutationp per n)) (this possibility is explicitly forbidden in actual ACL2 functions [2]). This allows us to deal comfortably with the index of each permutation group.

Another interesting question is that the axioms must have the same pattern as the statements they are replaced by. As an example, the unit for permutations is an actual permutation only if its argument is a natural number. Formally, the result reads:

(defthm id-permutation-is-permutation
  (implies (natp n)
           (permutationp (id-permutation n) n)))


Thus, if we try to instantiate the encapsulate introduced at the beginning of this section, ACL2 cannot verify the proof obligation (dmp (unt)), since the (natp n) premise has been lost. Therefore, we must specify the axioms for a group in a conditional manner as, for instance:

(defthm dmp-unt-bis
  (implies (trick)
           (dmp (unt))))

Then trick is instantiated by (lambda () (natp n)) in our case, while for the witness we can just define locally (defun trick () t), where t is the boolean Lisp constant representing the true value.

All these technical issues may be considered somewhat inelegant, but they are effective and allow the modeler to achieve his objectives.

6 Ending the story

The objective of this short note is simply to report on a case study showing that, in the field of program verification, mathematical structures (namely, algebraic ones) seem to appear as a consequence of the natural properties of algorithms, instead of being an external tool to be applied to the programs. In this sense, we can declare that programming comes before algebra (at least if enough logic is in the background; the computational logic is present in our case in the form of the automated theorem prover ACL2). Whether this experimental observation is only an isolated fact or whether it can be extended to other contexts is a subject for further investigation.

Acknowledgements

Partially supported by Ministerio de Educación y Ciencia, Spain, project MTM2006-06513.

References

[1] R. Elmasri, S. Navathe, Fundamentals of Database Systems, Addison Wesley, 2007.

[2] M. Kaufmann, P. Manolios, J. Moore, Computer-Aided Reasoning: An Approach, Addison Wesley, 2004.


Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

On the Approximation of Controlled Singular Stochastic

Processes

George A. Rus1, Richard H. Stockbridge1 and Bruce A. Wade1

1 Department of Mathematical Sciences, University of Wisconsin-Milwaukee

emails: [email protected], [email protected], [email protected]

Abstract

Singular stochastic control processes can model a wide variety of problems; however, they are not easily solvable. Therefore, it is imperative to develop efficient numerical methods for these problems. We reformulate each stochastic control problem as an infinite-dimensional linear programming (LP) problem, and obtain feasible solutions for the LP problem through a least-squares finite element method. An optimal solution for the problem is selected using a constrained optimization technique.

Key words: Stochastic Control, Least Squares Method

1 Introduction

In this article we give a brief summary of some numerical analytic techniques to approximate solutions of stochastic control problems. Consider a process X which satisfies the stochastic differential equation

dX(t) = µ(X(t), u(t)) dt + σ(X(t), u(t)) dW(t) + m(X(t−)) dξ(t)    (1)

in which X denotes the state of the process, u is a control process, W is a standard Brownian motion and ξ is a non-decreasing process whose set of times of increase is typically singular with respect to Lebesgue measure. For example, ξ could be a counting process that increases when X reaches some boundary, with m providing a jump of the process away from the boundary, or ξ might be a local time process, with m pushing X just enough so as to reflect it back into a certain domain.

In this setting, a decision maker must select a control process u from a collection of admissible processes in such a way as to minimize some cost criterion or maximize some reward criterion. This paper examines the expected long-term average cost

lim sup_{t→∞} t^{-1} E[ ∫_0^t c0(X(s), u(s)) ds + ∫_0^t c1(X(s−), u(s−)) dξ(s) ].    (2)


Singular stochastic control processes have been used to model a wide variety of problems. Some of the applications include, but are not limited to, portfolio optimization, sequential investment, dynamic control of queueing networks, and optimal rotation problems. The survey paper by Shreve [11] examines a number of additional applications. The long-term average criterion is of particular importance in queueing theory, where the stationary distribution is often used to determine the costs associated with the control. For example, Harrison and Wein [3] discuss a queueing network problem where they consider two single-server stations and two types of customers, with the goal of finding a dynamic sequencing policy at the first station that minimizes the long-run average expected number of customers in the system. Dai and Harrison [2] examine the steady-state analysis of a reflected Brownian motion and apply their results to a queueing problem. The paper by Shen, Chen, Dai and Dai [10] revisits the uncontrolled problem of [2] and employs a finite element approach to estimate the stationary distribution. Their results are extended in this paper to the controlled setting. Use of finite elements has also been applied to a wear model by Kaczmarek [4], though convergence results were not obtained.

The key to the analysis of the long-term average cost is the equivalence between the stochastic control problem displayed above and an infinite-dimensional linear program (LP) over a space of pairs of measures. Such equivalence results for processes without singular behavior were proven by Stockbridge [12], Bhatt and Borkar [1] and Kurtz and Stockbridge [6]. The extension to singular processes involves existence results in Kurtz and Stockbridge [7] and is given in Kurtz and Stockbridge [8].

This paper employs finite elements to approximate the stationary densities corresponding to the optimal measures for the equivalent LP. It begins with a slightly more general stochastic formulation of the problem and the derivation of the equivalent LP. It then introduces finite-dimensional approximations and establishes convergence of the values of the approximating LPs to the value of the infinite-dimensional LP. Solutions of the approximating LPs are then obtained using finite elements to estimate the densities of the optimal measures. The paper concludes with an example for which the exact solution can be obtained in closed form and an examination of the level of accuracy and convergence rate.

2 Linear Programming Formulation and Least Squares

The stochastic control problem can be equivalently reformulated as the infinite-dimensional linear program ([7, Theorem 2.1]):

    Minimize     ∫ c0 dµ0 + ∫ c1 dµ1
    Subject to   ∫ Af dµ0 + ∫ Bf dµ1 = 0,   ∀f ∈ D∞,
                 µ0 ∈ P(E × U),  µ1 ∈ M(E × U),                    (3)


in which D∞ denotes a countable set of test functions that is dense in an appropriate space. We refer to (3) as LP∞. For each n, let Dn = {f1, . . . , fn}.

Define the feasible sets

    M∞ = { (µ0, µ1) ∈ P(F) × M(F) : ∫ Af dµ0 + ∫ Bf dµ1 = 0, ∀f ∈ D∞ },

    Mn = { (µ0, µ1) ∈ P(F) × M(F) : ∫ Af dµ0 + ∫ Bf dµ1 = 0, ∀f ∈ Dn },

and form the linear programming problem having finitely many constraints

    LPn:  Minimize     ∫ c0 dµ0 + ∫ c1 dµ1
          Subject to   (µ0, µ1) ∈ Mn.

Observe that, by construction, we have M∞ ⊂ Mn+1 ⊂ Mn for each n ≥ 1. To simplify notation, for each n ∈ {1, 2, 3, . . .} ∪ {∞} denote by Mn the value of the objective function

    Mn(µ0, µ1) = ∫ c0 dµ0 + ∫ c1 dµ1,

and by M∗_n the infimum associated with an optimal solution.

Theorem 2.1 Let LPn and LP∞ be defined as above and let M∗_n and M∗_∞ be the optimal values for LPn and LP∞, respectively. Then M∗_n → M∗_∞ as n → ∞. Moreover, if for each n ≥ 1 (µ0^(n), µ1^(n)) is an optimal solution for LPn with the mass of µ1^(n) bounded, then there exists an optimal solution (µ0^(∞), µ1^(∞)) for LP∞ such that µ0^(n) ⇒ µ0^(∞) and µ1^(n) ⇒ µ1^(∞).

For each feasible pair of measures (µ0, µ1), decompose them into their regular conditional probability distributions ηi on U given the state x and the marginal distributions µi^E of x:

    µi(dx × du) = ηi(x, du) µi(dx × U) = ηi(x, du) µi^E(dx),   i = 0, 1.

Assume there exists a reference measure ν defined on the space E such that µ0^E << ν and µ1^E << ν. Let p and m be the corresponding densities:

    µ0^E(dx) = p(x) ν(dx)   and   µ1^E(dx) = m(x) ν(dx).

Define Af(x) and Bf(x) by

    Af(x) = ∫_U Af(x, u) η0(x, du)   and   Bf(x) = ∫_U Bf(x, u) η1(x, du).

With this notation, the constraints now take the form

    ∫_E Af(x) p(x) ν(dx) + ∫_E Bf(x) m(x) ν(dx) = 0,   ∀f ∈ D∞,
    ∫_E p(x) ν(dx) = 1.                                            (4)


We rewrite (4) in a more compact form. Let Tf(·) = (Af(·), Bf(·)), ρ(dx) = (ν(dx), ν(dx)) and q(x) = (p(x), m(x)). Let f and g be L2(E) functions and define the inner product ⟨f, g⟩ by ⟨f, g⟩ ≡ ∫_E f0 g0 ν(dx) + ∫_E f1 g1 ν(dx). Then the adjoint form of (4) is

    ⟨Tf, q⟩ = 0,   ∀f ∈ D∞,                                        (5)

and hence Tf ⊥ q for all f ∈ D∞, or equivalently, q ∈ H⊥, where H⊥ denotes the orthogonal complement of H, the closure of the linear span of {Tf : f ∈ D∞}. Define e1 by e1 = (1, 0). Then

⟨q, e1⟩ = ∫ (q · e1) ρ(dx) = ∫_E p(x) ν(dx) = 1 ≠ 0

and so e1 ∉ H. Let φ denote the orthogonal projection of e1 onto H:

φ = arg min_{h∈H} ‖e1 − h‖².

Define ψ ≡ e1 − φ ≠ 0 and note that ψ ∈ H⊥. Also, define z by

    z = ⟨e1, ψ⟩ = ⟨ψ + φ, ψ⟩ = ⟨ψ, ψ⟩ + ⟨φ, ψ⟩ = ⟨ψ, ψ⟩ > 0.

Proposition 2.2 Suppose that q is in L2(E, ρ) and define ψ as above. Then (1/z)ψ satisfies (5) and ∫_E ((1/z)ψ · e1) dρ = 1. Assuming ψ does not change sign, then q = (1/z)ψ a.e. with respect to ρ.

Our goal is to determine q, which amounts to finding φ, the orthogonal projection of e1 onto H. The linear space H is infinite-dimensional, and trying to find a solution of the least squares problem in it is generally impossible. One common approach is to approximate the space H by a finite-dimensional space Hn and obtain an approximate solution using Hn. We can choose a finite-dimensional approximation Dn of the space and obtain a convergent approximation of solutions for the LP. We defined the space H in terms of the space of functions D∞, and so a natural choice for the approximating space Hn is

    Hn = {Tf : f ∈ Dn}.

Therefore, for each n, we will find φn, the orthogonal projection of e1 onto the space Hn. The following proposition ensures that, as n → ∞, the approximate solution φn converges to the infinite-dimensional solution φ.

Proposition 2.3 Suppose that there exists a sequence of finite-dimensional subspaces Hn of H such that Hn ↗ H as n → ∞. Let

    φ = arg min_{h∈H} ‖e1 − h‖²   and   φn = arg min_{h∈Hn} ‖e1 − h‖².

Then ‖φ − φn‖² → 0 as n → ∞. Furthermore, letting ψ ≡ e1 − φ and ψn ≡ e1 − φn, then ψn → ψ in L2(E, ρ) as n → ∞.


Observe that

    φn = Σ_{j=1}^{n} αj Tfj                                        (6)

for some scalar coefficients αj, j = 1, 2, . . . , n. Then e1 − φn ∈ Hn⊥ and, for each i ∈ {1, 2, . . . , n}, Tfi ∈ Hn. Therefore ⟨Tfi, e1 − φn⟩ = 0 and hence

    0 = ⟨Tfi, e1 − φn⟩ = ⟨Tfi, e1⟩ − ⟨Tfi, φn⟩,   ∀i.

Thus, using the definition of φn, we obtain

    ⟨Tfi, e1⟩ = ⟨Tfi, φn⟩ = ⟨Tfi, Σ_j αj Tfj⟩.

Using again the additivity of the inner product, and rewriting the right-hand side, we obtain

    Σ_j αj ⟨Tfi, Tfj⟩ = ⟨Afi, 1⟩,   ∀i.
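As an illustration of how the resulting n × n linear system can be assembled and solved numerically, here is a short Python sketch under our own assumptions (a simple quadrature rule for ν and a hypothetical list basis of pairs of callables representing Tfj = (Afj, Bfj)); it is not the authors' implementation.

# Sketch (ours): assemble and solve  sum_j alpha_j <Tf_i, Tf_j> = <Tf_i, e1>.
import numpy as np

def solve_projection(basis, grid, weights):
    # basis: list of (Af, Bf) pairs of callables on E (hypothetical representation of Tf_j)
    # grid, weights: quadrature nodes and weights approximating nu(dx)
    TA = np.array([[Af(x) for x in grid] for Af, _ in basis])   # values of Af_j on the grid
    TB = np.array([[Bf(x) for x in grid] for _, Bf in basis])   # values of Bf_j on the grid
    # <Tf_i, Tf_j> = int Af_i Af_j dnu + int Bf_i Bf_j dnu
    G = (TA * weights) @ TA.T + (TB * weights) @ TB.T
    # <Tf_i, e1> = int Af_i * 1 dnu, since e1 = (1, 0)
    b = (TA * weights).sum(axis=1)
    alpha = np.linalg.solve(G, b)    # coefficients of phi_n = sum_j alpha_j Tf_j
    return alpha

From φn one then recovers ψn = e1 − φn and, after the normalization of Proposition 2.2, the approximate densities.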

3 An example

Consider a one-dimensional process that is a controlled diffusion in the interior of the interval [0, 1], is reflected at 0, and jumps from 1 to 0.

Figure 1: A one-dimensional example

When X(t) ∈ (0, 1), the process X satisfies the stochastic differential equation

dX(t) = u(t) dt + σ dW(t),    (7)

in which W is a standard Brownian motion, and the control process u has the hard constraint u(t) ∈ [−1, 1] for all t ≥ 0. Let N1(t) denote the number of jumps from 1 to 0 in the interval [0, t]. The objective is to minimize

lim sup_{t→∞} (1/t) E[ ∫_0^t X²(s) ds + c N1(t) ];    (8)


that is, the objective of the controller is to select the drift rate u in such a manner as to minimize the long-term average second moment of the process, subject to a cost for the process jumping back to the origin. The diffusion generator is Af(x, u) = u f′(x) + (σ²/2) f″(x).

The process X has two different singular behaviors: an instantaneous jump from x = 1 to x = 0 when the process hits 1, and reflection at 0. For clarity of exposition, we separate out these two behaviors by assigning each its own generator. The jump behavior of X is specified through the jump operator

B1f(x) = f(0) − f(x) (9)

which indicates that the process jumps to 0 from x. We adopt the approach of modeling the reflection at 0 by defining

B0f(x) = f ′(x). (10)

The general formulation of the dynamics of the stochastic process is specified as a solution (X, Λ, Γ0, Γ1) of the singular controlled martingale problem for (A, B0, B1); that is, there exists a filtration {Ft} such that (X, Λ, Γ0^t, Γ1^t) is {Ft}-progressively measurable,

(a) Γ0 is a random measure on [0, 1] × R+ such that

    Γ0({0} × [0, t]) = Γ0([0, 1] × [0, t]),   ∀t ≥ 0;

(b) Γ1 is a random measure on [0, 1] × R+ satisfying

    Γ1({1} × [0, t]) = Γ1([0, 1] × [0, t]),   ∀t ≥ 0; and

(c) for each f ∈ C²[0, 1],

    f(X(t)) − f(X(0)) − ∫_0^t ∫_{[−1,1]} Af(X(s), u) Λs(du) ds
        − ∫_{[0,1]×[0,t]} B0f(x) Γ0(dx × ds)                       (11)
        − ∫_{[0,1]×[0,t]} B1f(x) Γ1(dx × ds)

    is a mean 0, {Ft}-martingale.

In this formulation, condition (a) implies that Γ0 captures the local time of X at 0, and (b) enforces that the jump behavior occurs only at the endpoint 1. Notice that when f satisfies the restrictions f′(0) = 0 and f(0) = f(1), the singular parts of the martingale problem drop out, resulting in a solely absolutely continuous controlled martingale problem.

Let V = (0, 1) and partition its closure [0, 1] into n segments; this will be the mesh for our basis functions. We require each basis function to be C2 in the interior of each segment and C1 globally. Hermite cubic polynomial functions (below) satisfy these requirements.


Figure 2: Rescaled Hermite cubic polynomial basis functions with n = 2
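For reference, a small Python sketch (ours) of the standard cubic Hermite shape functions on a uniform mesh of [0, 1]; pairing a value and a derivative degree of freedom at each node yields the globally C1, piecewise C2 basis described above. The precise rescaling used in Figure 2 may differ.

# Standard cubic Hermite shape functions and a C1 piecewise-cubic evaluation (our sketch).
import numpy as np

def hermite_shapes(t):
    # Reference-element shape functions, t in [0, 1].
    h00 = 2*t**3 - 3*t**2 + 1      # value at the left node
    h10 = t**3 - 2*t**2 + t        # derivative at the left node
    h01 = -2*t**3 + 3*t**2         # value at the right node
    h11 = t**3 - t**2              # derivative at the right node
    return h00, h10, h01, h11

def eval_hermite(x, nodes, values, slopes):
    # Evaluate the C1 interpolant with the given nodal values and slopes.
    k = min(int(np.searchsorted(nodes, x, side="right")) - 1, len(nodes) - 2)
    k = max(k, 0)
    h = nodes[k + 1] - nodes[k]
    t = (x - nodes[k]) / h
    h00, h10, h01, h11 = hermite_shapes(t)
    return (values[k]*h00 + slopes[k]*h*h10
            + values[k + 1]*h01 + slopes[k + 1]*h*h11)

nodes = np.linspace(0.0, 1.0, 3)   # n = 2 elements, as in Figure 2
print(eval_hermite(0.3, nodes, [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]))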

Summarizing the development up to this point, assuming there are densities, we have:

    Minimize     ∫_[0,1] x² p(x) dx + c m1

    Subject to   ∫_[0,1] Af(x) p(x) dx + ∫_[0,1] f′(x) ν0(dx) + ∫_[0,1] [f(0) − f(x)] ν1(dx) = 0,   ∀f ∈ Bn,

                 ∫_[0,1] p(x) dx = 1,   p ≥ 0,   ν0, ν1 ∈ M([0, 1]),                 (12)

with ν0(dx) = m0 δ0(dx) and ν1(dx) = m1 δ1(dx).

The optimization problem seeks a control for which the value of the objective function is minimized. We assume for the moment that the control u is fixed, not necessarily constant on the space E, and compute the density p and the point masses m0 and m1 associated with this fixed control. Then, once the density is obtained, we evaluate the objective function. For each choice of the control u, we can repeat the process and obtain the density and the value of the objective function. Therefore, we optimize over choices of u and select a control that gives the optimal value.

Figure 5 displays the optimal control when n = 20 elements are used. Observe that the approximate optimal control is constant on each element and so it is defined for each x ∈ (0, 1). Also observe that u switches from −1 to 1 between x = 0.75 and x = 0.80.

We assume the control u can take values from a finite collection of values Um, defined below, and we assume that the control is constant on each element. The difference between this method and a brute force method is that, instead of going through all possible combinations of controls, we use a random number generator to select the value of the control.


Figure 3: Function p_a(x) with σ² = 2 and a = 0.75

Figure 4: Function F_σ(a) with σ² = 2

n      value              error
6      1.54296 × 10^-1    3.096 × 10^-4
8      1.53988 × 10^-1    1.615 × 10^-6
10     1.54296 × 10^-1    3.096 × 10^-4
12     1.53988 × 10^-1    1.615 × 10^-6
14     1.54144 × 10^-1    1.583 × 10^-4
16     1.53987 × 10^-1    6.570 × 10^-7
18     1.54081 × 10^-1    9.499 × 10^-5
20     1.53987 × 10^-1    3.950 × 10^-7
22     1.54049 × 10^-1    6.278 × 10^-6
24     1.53986 × 10^-1    3.000 × 10^-7
26     1.54031 × 10^-1    4.426 × 10^-6

Table 1: Accuracy of long-term average cost.

To be more specific, let n be the number of elements and denote by ui the value of the control variable on the i-th element, for each i ∈ {1, 2, . . . , n}. We start with an initial guess, say ui = 0 for each i, and compute the long-term average cost. We use a random number generator to obtain new values for the control constants ui, compute the value of the objective function, and compare it with the previously obtained value. If the new value is smaller, we record it and choose a different set of constants ui. We continue in this fashion until an optimal solution is obtained. Even though the algorithm eventually converges, it is not a very big improvement over the brute force method.

Figure 5: Approximate optimal control

We let the method randomly choose values for the control u from the interval [−1, 1]; however, in order to simplify calculations and accelerate convergence to the solution, we limit the method to a fixed number of choices. For all the numerical computations, we use the set of controls Um = {−1, −0.9, −0.8, . . . , 0.8, 0.9, 1} for each value ui, 1 ≤ i ≤ n.
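A minimal sketch (ours) of the randomized search just described; evaluate_cost is a hypothetical stand-in for the LSFEM computation of the long-term average cost associated with a fixed piecewise-constant control.

# Sketch (ours) of the randomized search over piecewise-constant controls.
import random

def random_search(n, evaluate_cost, iterations=10000):
    # evaluate_cost(u): hypothetical placeholder returning the long-term average
    # cost for the control u (one value of Um per element).
    Um = [round(-1 + 0.1*k, 1) for k in range(21)]    # {-1, -0.9, ..., 0.9, 1}
    best_u = [0.0] * n                                # initial guess u_i = 0
    best_cost = evaluate_cost(best_u)
    for _ in range(iterations):
        candidate = [random.choice(Um) for _ in range(n)]
        cost = evaluate_cost(candidate)
        if cost < best_cost:                          # keep only improving controls
            best_u, best_cost = candidate, cost
    return best_u, best_cost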

Table 2 displays the long-term average cost, error, and CPU times for the randomization method. Once again, the errors are computed using the exact solution. Observe the computation times for the random method versus the ones from the brute force method presented in Table 1.

n      value              error             CPU time (s)
10     1.54296 × 10^-1    3.096 × 10^-4     63
15     1.54033 × 10^-1    4.677 × 10^-5     387
20     1.53988 × 10^-1    3.942 × 10^-7     2448
25     1.53997 × 10^-1    1.073 × 10^-5     12144
30     1.54019 × 10^-1    3.268 × 10^-4     57234
35     1.53996 × 10^-1    9.989 × 10^-6     217784
40     1.53986 × 10^-1    2.237 × 10^-7     543587

Table 2: Accuracy of long-term average cost using a random optimization method.

To emphasize the faster times for the random method, we want to point out the times associated with n = 30. The brute force method produced the results in 17 × 10^6 seconds, whereas the random method took 12144 seconds, which is about 1400 times faster. Also, remember that when we computed the long-term average costs using the brute force method, we only used two possible values for the control, −1 and 1, compared to 21 possible values for the random method.

The results obtained by the random optimization method seem to be very positive, but we would like to compare them with previous results. Together with P. Kaczmarek, S.T. Kent, R.H. Stockbridge, and B.A. Wade [5] we analyzed the same problem and numerically approximated the solution. In that paper we compared several methods: we used finite differences to approximate the differential operators, a finite element method to approximate the stationary density, as well as a dynamic programming approach.


As mentioned in the introduction, the finite element method gave better results than the other approaches. We present a comparison of the errors for the finite element method and the LSFEM introduced in this article. In Table 3 we denote by FEM the results obtained from [5] and by LSFEM the results obtained with the method employed in this paper.

n      LSFEM value        FEM value          error LSFEM       error FEM
10     1.5430 × 10^-1     1.5417 × 10^-1     3.096 × 10^-4     1.884 × 10^-4
20     1.5399 × 10^-1     1.5411 × 10^-1     3.942 × 10^-7     1.236 × 10^-4
40     1.5399 × 10^-1     1.6221 × 10^-1     2.237 × 10^-7     8.223 × 10^-3
60     1.5399 × 10^-1     1.5400 × 10^-1     2.146 × 10^-7     1.146 × 10^-5
80     1.5399 × 10^-1     1.5399 × 10^-1     2.131 × 10^-7     5.901 × 10^-6
100    1.5399 × 10^-1     1.5399 × 10^-1     2.129 × 10^-7     3.469 × 10^-6

Table 3: Comparison between FEM and LSFEM.

Acknowledgements

This research has been supported in part by the U.S. National Security Agency under Grant Agreement Number H98230-09-1-0002. The United States Government is authorized to reproduce and distribute reprints notwithstanding any copyright notation herein.

References

[1] A.G. Bhatt and V.S. Borkar, Occupation measures for controlled Markov processes: Characterization and optimality, Ann. Probab. 24 (1996), 1531-1562.

[2] J.G. Dai and J.M. Harrison, Steady-state analysis of RBM in a rectangle: numerical methods and a queueing application, Annals of Applied Probability 1 (1991), 16-35.

[3] J.M. Harrison and L.M. Wein, Scheduling Networks of Queues: Heavy Traffic Analysis of a Simple Open Network, Queueing Systems 5 (1989), 265-280.

[4] P. Kaczmarek, Numerical Analysis of a Long-term Average Stochastic Control Problem with Regime-Switching, Diplomarbeit in Wirtschaftsmathematik, Universität Ulm, 2007.

[5] P. Kaczmarek, S.T. Kent, G.A. Rus, R.H. Stockbridge and B.A. Wade, Numerical Solution of a Long-term Average Control Problem for Singular Stochastic Processes, Mathematical Methods of Operations Research 66 (2007), 451-473.

[6] T.G. Kurtz and R.H. Stockbridge, Existence of Markov Controls and Characterization of Optimal Markov Controls, SIAM Journal on Control and Optimization 36 (1998), 609-653.

[7] T.G. Kurtz and R.H. Stockbridge, Stationary Solutions and Forward Equations for Controlled and Singular Martingale Problems, Electronic Journal of Probability 6 (2001), Paper No. 14, 1-52.

[8] T.G. Kurtz and R.H. Stockbridge, Linear Programming Formulations of Singular Stochastic Control Problems, Working Paper, 2009.

[9] B. Øksendal, An Introduction to Stochastic Differential Equations, 6th ed., Springer-Verlag, New York, 2003.


[10] X. Shen, H. Chen, J.G. Dai and W. Dai, The Finite Element Method for Computing the Stationary Distribution of an SRBM in a Hypercube with Applications to Finite Buffer Queueing Networks, Queueing Syst. 42 (2002), 33-62.

[11] S.E. Shreve, An Introduction to Singular Stochastic Control, in: Stochastic differential systems, stochastic control theory and applications (Minneapolis, Minn., 1986), IMA Vol. Math. Appl. 10, 513-528, Springer, New York-Berlin, 1988.

[12] R.H. Stockbridge, Time-Average Control of Martingale Problems: Existence of a Stationary Solution, Annals of Probability 18 (1990), 190-205.


Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

High-Performance Monte Carlo Radiosity

on the GPU using CUDA

J. R. Sanjurjo1, M. Amor1, M. Boo2, R. Doallo1 and J. Casares1

1 Dept. of Electronics and Systems, Univ. of A Coruna, Spain

2 Dept. of Electronics and Computer Engineering, Univ. of Santiago de Compostela,

Spain

emails: [email protected], [email protected], [email protected],[email protected], --

Abstract

Global illumination provides realistic image synthesis, but its high computational requirements limit its use in practice. In this paper we present an implementation of the Monte Carlo radiosity algorithm on the GPU. Our proposal is based on the partition of the scene into sub-scenes to be processed in parallel to exploit the graphics card structure. The convex partition method employed permits the exploitation of data locality and the optimization of the ray shooting procedure due to the minimization of the number of objects to be tested in the intersection calculation. We have used the CUDA programming environment on an NVIDIA GeForce 9800 GTX. The results are good in terms of execution times, increasing the flexibility of previous solutions and showing that the GPU can outperform the CPU even for non-regular algorithms.

Key words: radiosity, Monte Carlo method, GPU, ray shooting

1 Introduction

Global illumination simulation [5] is employed for realistic image synthesis with applications in product design, architecture and interior design, computer games and computer animation, among others. In general, global illumination methods try to solve the rendering equation [7] that describes the light propagation in closed scenes. Specifically, the radiosity method [5] simplifies the rendering equation by considering only ideal diffuse surfaces. This method is particularly suited for applications such as lighting design or walkthroughs of architectural models.

One possible approach to the radiosity computation is the Monte Carlo algorithm. Monte Carlo methods are stochastic and randomly sample the space associated with the problem to arrive at approximate solutions. Monte Carlo methods are used to solve a variety of complex problems arising in numerous areas such as computational physics, finance, economics and many others. Monte Carlo Radiosity (MCR) methods [5] can handle much larger and more complex scenes. However, the computational requirements of the algorithm are high, and so execution time optimizations are desirable.

In the MCR algorithm a large number of independent rays are chosen stochastically by a weighted sampling of the objects according to their power after previous iterations. Once a ray is shot, the procedure continues by searching for the first object intersected by the ray. This procedure, called ray shooting, is the main computational core of the algorithm. This is due to the large number of objects in the scene and, in consequence, the large number of ray-object intersections to be computed.

There is a lot of recent work on GPU-based global illumination, such as [3, 4, 8, 11]. In this paper we present a new approach to implementing the MCR algorithm on a GPU. GPUs offer a platform for high-performance computing not only for graphics but also for non-graphics processing (GPGPU, General-Purpose computation on GPU). As examples, many applications can be cited: video encoding and decoding, signal processing, physics simulation, computational finance or computational biology. The GPU is capable of executing a very high number of threads in parallel, which makes it a good candidate to implement any algorithm based on a ray shooting strategy. This is due to the fact that ray shooting is extremely parallel at the thread level, since every ray cast in the procedure can be computed independently. Moreover, a ray never requires information from other rays.

The recent interest in GPGPU has stimulated improvements in the programmability of GPUs and the appearance of different programming languages. Conventional shader graphics languages such as NVIDIA Cg, Microsoft High Level Shader Language (HLSL), or the OpenGL Shading Language (GLSL) are optimized for graphics and are difficult to use for GPGPU. Several new proposals have appeared to promote the utilization of the GPU for general purpose applications, such as ATI Brook+ [2] and NVIDIA CUDA [10]. CUDA (Compute Unified Device Architecture) is an extension to the C programming language that permits GPU programming without the need of mapping the programs to a graphics API. The GPU is treated as a coprocessor that executes data-parallel kernel codes. This allows a more general and flexible programming model, facilitating the programmability of the new generations of GPUs.

Although the utilization of new programming environments facilitates the programmability of GPUs, different challenges have to be solved to optimize the results of a direct implementation. In particular, a straightforward implementation of the MCR algorithm using CUDA does not produce the expected performance. In our proposal, the scene is partitioned into sub-scenes according to the convex partition procedure presented in [1]. The sub-scenes are distributed among cores to speed up the computation by partitioning the load among them with a careful management of the memory hierarchy. The results show that the partitioning strategy together with the data management employed lead to good results in terms of execution time.

The rest of the paper is organized as follows: in Section 2 we briefly describe the MCR method. In Section 3 we present our proposal for the MCR implementation on the GPU using CUDA. Experimental results are analyzed in Section 4, while Section 5 presents the main conclusions of the work.

2 The Monte Carlo Radiosity

The radiosity method [5] solves the global illumination problem considering only ideal diffuse surfaces. The resulting discrete radiosity equation is:

    Bi = Ei + ρi Σ_{j=1}^{Ns} Bj Fij                               (1)

where Bi is the radiosity of surface Si, Ei is the emittance, ρi the diffuse reflectivity and Ns is the number of surfaces of the scene. The summation represents the contribution of the other surfaces of the scene, and Fij is the form factor between surfaces Si and Sj. Fij is an adimensional constant that depends only on the geometry of the scene and represents the proportion of the radiosity leaving Si that is received by Sj. There is one radiosity equation per surface and one form factor for each pair of surfaces.

MCR [5] provides reliable form factor computation while explicit form factor storage is avoided. The basic idea is to compute the geometric relation between two surfaces Si and Sj by casting a set of Nri rays from surface Si. The number of rays that finally intersect Sj permits the estimation of the form factor between Si and Sj.

In this work we focus on a stochastic relaxation method [9] that uses the Jacobi iteration to solve the set of radiosity equations. This method works with radiant power values Pi = Bi · Ai rather than with radiosity values Bi (Ai being the area of surface Si). The radiosity equation can easily be expressed in terms of power:

    Pi^{k+1} = Pei + ρi Σ_{j=1}^{Ns} Pj^k · Fji                    (2)

where Pei is the self-emitted power of Si and Pi^0 = Pei. At each iteration the unshot power is propagated rather than the total power. An approximation of the total power is then obtained as the sum of the increments ∆P^k computed in each iteration:

    ∆Pi^{k+1} = ρi Σ_{j=1}^{Ns} ∆Pj^k · Fji,      Pi^k = Σ_{l=0}^{k} ∆Pi^l          (3)

where ∆Pi^0 = Pei. The sums can be estimated stochastically by randomly picking terms from the sum according to some probability. One possible approach consists of casting a set of rays from each surface Si in directions following a random cosine distribution. Consequently, the destination surface Sj is randomly selected as the closest surface intersected by each ray.

The structure of the incremental stochastic Jacobi iterative radiosity algorithm is outlined in Figure 1.

1   while ( Convergence() )
2   {
3       Number_samples(Nr);
4
5       for ( Si = scene→Surface_initial; Si; Si = Si→next )
6       {
7           Number_samples(Nri);
8
9           for ( l = 1; l <= Nri; l++ ) {
10              Random_Point(xl);
11              Random_Direction(Dl);
12              Sj = Surface_nearest(xl, Dl);
13              ∆Pj^{k+1} = ∆Pj^{k+1} + ρj · Power_ray;
14          }
15      }
16      for ( Si = scene→Surface_initial; Si; Si = Si→next )
17          Pi = Pi + ∆Pi^{k+1};
18
19  }

Figure 1: Incremental Stochastic Jacobi Iterative algorithm

While not converged, the process is iteratively repeated (line 1). Convergence is checked according to a certain threshold value on the energy distributed in the previous iteration. In line 3 the number of samples Nr is calculated; this number is proportional to the amount of power ∆P_T^k to be propagated in each iteration, and all rays carry the same amount of power, Power_ray (see line 13). Then, the number of rays Nri to be shot from each surface Si is computed (line 7). To shoot a ray l from a surface Si, a point xl on the surface is selected and the ray is shot in a cosine-distributed direction Dl at xl (lines 10 and 11). For each shot ray, the closest surface Sj intersected by the ray is determined (line 12). After this, ∆Pj^{k+1} is updated (line 13). Finally, for each surface its total power Pi is incremented (line 17).
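To make lines 10–13 concrete, here is a small Python sketch (ours, not the authors' code) of shooting a single cosine-distributed ray from a surface and tallying the power received by the nearest intersected surface; the surface objects and the intersect routine are hypothetical placeholders for the scene geometry.

# Sketch (ours) of one ray of the shooting step (lines 10-13 of Figure 1).
import math, random

def cosine_distributed_direction(frame):
    # Sample a direction with cosine-weighted density about the surface normal;
    # frame = (tangent_u, tangent_v, normal), each a 3-tuple forming an orthonormal basis.
    t_u, t_v, n = frame
    r1, r2 = random.random(), random.random()
    phi, r = 2.0 * math.pi * r1, math.sqrt(r2)
    x, y, z = r * math.cos(phi), r * math.sin(phi), math.sqrt(1.0 - r2)
    return tuple(x*a + y*b + z*c for a, b, c in zip(t_u, t_v, n))

def shoot_ray(Si, surfaces, intersect, delta_P, power_ray):
    # intersect(origin, direction, S) is assumed to return the hit distance, or inf if missed.
    xl = Si.random_point()                        # line 10
    Dl = cosine_distributed_direction(Si.frame)   # line 11
    candidates = [S for S in surfaces if S is not Si]
    Sj = min(candidates, key=lambda S: intersect(xl, Dl, S), default=None)  # line 12
    if Sj is not None and intersect(xl, Dl, Sj) != float("inf"):
        delta_P[Sj.index] += Sj.rho * power_ray   # line 13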

Therefore, a large number of independent rays are chosen stochastically by a weighted sampling of the objects according to their power after previous iterations. Once a ray is shot, the procedure continues by searching for the first object intersected by this ray. This ray shooting procedure is the main computational core of the algorithm. This is due to the large number of surfaces in the scene and, in consequence, the number of rays involved.

3 GPU implementation using CUDA

In this section we present our MCR implementation on a GPU using CUDA. A straightforward CUDA implementation of the MCR algorithm does not exploit the hardware capabilities of the GPU. Our MCR algorithm allows the optimization of the GPU implementation through a partitioning strategy. Specifically, our proposal is based on the subdivision of the scene into sub-scenes to exploit all the potential computational resources available: multiple computational cores and local access to a common memory.


Figure 2: Structure of the GPU: Nm streaming multiprocessors, each with Nc cores and a shared memory, all connected to a common global memory

Figure 2 depicts a scheme of the structure of a GPU. The system consists of Nm Streaming Multiprocessors (SM). Each SM has Nc cores, each of which can execute a thread. Each SM has a small private memory space with low access latency, called shared memory. In addition, all SMs have access to the same global memory, which provides a very high bandwidth when the accesses are contiguous; otherwise the achievable bandwidth is a fraction of the maximum. There are also two additional memories, omitted in the figure, that are optimized for specific memory usage: the constant and the texture memory.

The cores execute kernels, which are programs invoked by the CPU, acting as host. A kernel creates many parallel threads organized in a hierarchy: a grid of independent thread blocks. A thread block is a set of concurrent threads that can cooperate among themselves by efficiently sharing data through the shared memory and by synchronizing their execution to coordinate memory accesses. In general, a high number of threads is required to occupy the cores, and the architecture allows efficient data sharing and synchronization among threads of the same thread block.
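As a minimal, self-contained illustration of this execution model (our own sketch, not part of the implementation described in this paper; the kernel name stage_and_sum and the array names are hypothetical), the following CUDA kernel lets the threads of a block stage a tile of data in shared memory, synchronize, and then reuse it:

// Minimal sketch of the CUDA execution model described above: each block
// stages a tile of global memory into shared memory, synchronizes, and then
// every thread of the block reuses the staged data. Hypothetical names.
__global__ void stage_and_sum(const float* data, float* out, int n)
{
    __shared__ float tile[256];                    // per-block shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    tile[threadIdx.x] = (i < n) ? data[i] : 0.0f;  // one global load per thread
    __syncthreads();                               // coordinate the whole block

    float s = 0.0f;                                // every thread reuses the tile
    for (int k = 0; k < blockDim.x; ++k)
        s += tile[k];
    if (i < n) out[i] = s;
}
// Host side (assumed pointers): stage_and_sum<<<num_blocks, 256>>>(d_data, d_out, n);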

To achieve high performance on this architecture, a proper balance in the usage of the shared resources is critical. A basic strategy is to improve data reuse and to reduce global memory accesses. In this case one thread loads a datum and then synchronizes so that other threads in the same block can use it. We present a method to improve data reuse based on the exploitation of data locality among threads of the same block. In order to increase the reutilization of the data, a blocking transformation strategy based on the partitioning of the scene is performed.

The original code is changed to work on sub-scenes of a smaller size, so that the data being accessed can fit into the texture cache of each SM. Therefore, the whole scene is divided into Np uniform, disjoint sub-scenes with a convex structure. The version we have employed makes a geometric uniform partition of the input scene: it splits the scene domain into a set of disjoint sub-scenes of the same shape and size. This kind of partitioning is computed quickly and straightforwardly and, as will be shown later,


 1  while ( Convergence() )
 2
 3      Number_samples(Nr);
 4
 5      for ( Si = scene→Surface_initial; Si; Si = Si→next )
 6          Number_samples(Nri);
 7
 8      for ( t = 0; t < Nt; t++ )
 9
10          KERNEL1 (Nb1, Ntb1)
11              Random_Point(xl);
12              Random_Direction(Dl);
13              Tj = Surface_nearest(xl, Dl);
14              H[j]++;
15
16
17      KERNEL2 (Nb2, Ntb2)
18          Pi = Pi + H[i] · ρ_i · Power_ray;

Figure 3: CUDA implementation of the MCR algorithm

also provides high data locality. This technique exploits data locality, reducing the number of accesses to the global memory.
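A minimal sketch of such a uniform geometric partition follows (our own illustration; assigning each surface to the grid cell containing its centroid, and the names subscene_index, bbox_min and bbox_max, are assumptions rather than the paper's actual partitioning code):

// Minimal sketch (assumed helpers): assign each surface to one of the
// Np = nx*ny*nz uniform, disjoint, convex sub-scenes by its centroid.
#include <algorithm>

struct Vec3 { float x, y, z; };

int subscene_index(const Vec3& centroid, const Vec3& bbox_min,
                   const Vec3& bbox_max, int nx, int ny, int nz)
{
    auto cell = [](float v, float lo, float hi, int n) {
        int c = static_cast<int>((v - lo) / (hi - lo) * n);
        return std::min(std::max(c, 0), n - 1);     // clamp surfaces on the border
    };
    int ix = cell(centroid.x, bbox_min.x, bbox_max.x, nx);
    int iy = cell(centroid.y, bbox_min.y, bbox_max.y, ny);
    int iz = cell(centroid.z, bbox_min.z, bbox_max.z, nz);
    return (iz * ny + iy) * nx + ix;                // linear sub-scene id in [0, Np)
}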

The structure of our CUDA code is shown in Figure 3. A CUDA program consists of code segments that are implemented on the CPU and other code segments implemented as kernel functions on the GPU. Specifically, our implementation is based on two kernels: KERNEL 1 and KERNEL 2. The process is iteratively repeated while there is no convergence (line 1). In line 3 the number of samples Nr is calculated. Then the number of rays Nri to be shot from each surface Si is computed (line 6). KERNEL 1 is executed in Nt stages to optimize the utilization of the memories (line 8). This kernel (lines 10–15) executes the ray shooting procedure, that is, the calculation of the nearest object for each ray shot. Finally, the total power Pi of each surface is incremented in KERNEL 2 (lines 17–18). A detailed explanation of the algorithm is presented in the following.

The key idea behind our implementation in CUDA is obtaining a high number of independent threads to keep the cores occupied, together with an optimal exploitation of the memory hierarchy. These objectives were achieved through three techniques: an efficient thread scheduling, an independent multithread algorithm, and the acceleration of the ray intersection test. In the following these three proposals are summarized.

The first key of our proposal is an efficient thread scheduling associated with the scene subdivision strategy. Our implementation on the GPU follows a fine-grain approach. The objective is the utilization of a high number of threads to keep the cores busy. Specifically, in our MCR proposal each thread processes a given ray and all the rays of a surface are processed in the same block. This permits the sharing of common information among rays. This information is stored in the shared memory of the SM


with the objective of improving the memory access pattern and simplifying the computations associated with the ray shooting procedure.

In the invocation of the kernel that performs the ray shooting procedure, called KERNEL 1 (see Figure 3), not all surfaces shoot at the same time, because this degrades the utilization of the memories. Specifically, the texture memories, which have a cache working set of 8 KB per multiprocessor, are used to store the information of the surfaces that will be shot. Therefore, KERNEL 1 is invoked in Nt consecutive stages (see line 8 of Figure 3), where Nt has the following value:

N_t = \max_{k=1,\dots,N_p} \frac{N_{s_k}}{N_{ps}}        (4)

where N_{ps} is the number of surfaces per sub-scene that are grouped in each KERNEL 1 invocation and N_{s_k} is the number of surfaces in sub-scene k. We have verified that good results are obtained when the number of blocks per stage t is computed through:

N_{b1} = \frac{1}{N_{tb1}} \sum_{k=1}^{N_p} \; \sum_{i=1+N_{ps}\cdot t}^{N_{ps}\cdot (t+1)} N_{r_{ki}}        (5)

where N_{r_{ki}} is the number of rays of surface S_i in sub-scene k, and N_{tb1} is the number of threads per block in KERNEL 1. In current GPUs the number of threads per block takes a value between 32 and 512, and usually 192 or 256 is used. This number was selected as a compromise between the attempt to allocate more threads per block, to achieve an efficient time slicing, and the number of registers available per thread.
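For concreteness, the following host-side fragment sketches one plausible way of evaluating the stage count of Eq. (4) and the per-stage block count of Eq. (5). It is our own illustration under assumed bookkeeping structures (Nsk and Nr are hypothetical), it assumes that stage t handles the surfaces with indices N_{ps}·t, …, N_{ps}·(t+1)−1 within each sub-scene, and it rounds both quotients up so that every surface and every ray is covered:

// Minimal sketch (assumed data layout): Nsk[k]   = surfaces in sub-scene k,
//                                       Nr[k][i] = rays to shoot from surface i of sub-scene k.
#include <algorithm>
#include <vector>

int stage_count(const std::vector<int>& Nsk, int Nps)             // Eq. (4)
{
    int Nt = 0;
    for (int nsk : Nsk)
        Nt = std::max(Nt, (nsk + Nps - 1) / Nps);                  // ceiling of Nsk / Nps
    return Nt;
}

int blocks_for_stage(const std::vector<std::vector<int>>& Nr,      // Eq. (5)
                     int Nps, int Ntb1, int t)
{
    long long rays = 0;
    for (const auto& sub : Nr)                                     // sum over sub-scenes k
        for (int i = Nps * t;
             i < std::min(Nps * (t + 1), static_cast<int>(sub.size())); ++i)
            rays += sub[i];                                        // rays of the surfaces of stage t
    return static_cast<int>((rays + Ntb1 - 1) / Ntb1);             // one thread per ray
}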

The second key of our proposal is the independent multithread strategy we have followed. The processing of each ray requires read-only memory accesses to process the scene geometry. But together with these read operations there is an associated write operation to store the power increment of the nearest object, ∆P_j^{k+1} (see line 13 of Figure 1). Thus, multiple threads could try to increment the same power at the same time. As an example, Figure 4 shows two rays that detect the same nearest object. This results in a conflict due to the multiple write memory accesses.

To obtain an efficient independent multithread algorithm, we have designed a simple strategy based on the utilization of the atomicAdd() function, an efficient atomic read-modify-write primitive supported by CUDA. This way, each intersected surface Sj increments the element j of an auxiliary vector H[j] (see Figure 3) using the atomicAdd() function. At the end of each iteration of the algorithm, the power of each surface is finally updated with a second kernel, KERNEL 2, instead of computing ∆P_j^{k+1} as indicated in line 13 of Figure 1.
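A minimal sketch of this conflict-free accumulation follows (our own illustration, not the authors' actual kernels; the names hit_surface, H, count_hits and add_power are hypothetical, and the nearest-surface search is assumed to have been performed already):

// Minimal sketch (hypothetical names): each thread handles one ray and
// accumulates hits per surface with atomicAdd, avoiding write conflicts
// when several rays reach the same nearest object.
__global__ void count_hits(const int* hit_surface,    // nearest surface id per ray
                           unsigned int* H,           // per-surface hit counters
                           int num_rays)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < num_rays)
        atomicAdd(&H[hit_surface[r]], 1u);             // atomic read-modify-write
}

__global__ void add_power(float* P, const unsigned int* H,
                          const float* rho, float power_ray, int num_surfaces)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < num_surfaces)
        P[i] += H[i] * rho[i] * power_ray;             // as in line 18 of Figure 3
}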

The third key of our implementation is the utilization of acceleration structures to reduce the number of ray–object intersections calculated. There is a large number of spatial structures that reduce the number of ray–object intersection tests to be performed per ray. The technique most widely employed in CPU implementations is the Kd-tree structure [6], due to its simplicity (both in concept and implementation), its versatility, and the fact that it is a powerful sorting and classification structure. We have analyzed the inclusion of this acceleration structure per sub-scene to test its benefits.


Figure 4: Multiple rays with the same nearest object (rays r1 and r2 find the same nearest surface among T1, T2, T3)

Due to storage limitations, the resulting Kd-tree was mostly stored in the global memory. The results of our analysis indicate inefficiencies in the utilization of coalesced accesses to global memory with this structure.

We have obtained better results when the scene partitioning inherent to our proposal is employed as the only guide to accelerate the intersection test. Our partitioning strategy has two benefits: first, the optimization of the global memory access pattern and, second, the reduction of the number of intersection tests to be performed. In detail, the first main advantage of employing the scene partition as an acceleration technique is that coalesced accesses to global memory are permitted: close references to memory are grouped into a single memory access, with the consequent reduction of the number of accesses. Additionally, the utilization of a sub-scene structure permits the simplification of the ray shooting procedure, which searches for the closest object intersected by a given ray. This is due to the convex partitioning strategy employed. Our partitioning technique ensures that, if the source and the intersected surface both belong to the same sub-scene, the intersection test can be performed locally without checking the other sub-scenes. This reduces the number of surfaces to be tested and minimizes the number of memory accesses.

4 Experimental Results

We have tested our MCR implementation on a system with an Intel Core2 2.4 GHz, 2 GB RAM and a GeForce 9800 GTX. Three different input scenes have been used for our tests: Classroom (see Figure 5.a), Livingroom (see Figure 5.b) and Studio (see Figure 5.c). The Classroom scene presents a regular distribution of objects. The Livingroom scene has an irregular distribution of objects with non-illuminated areas (a corridor without lights). The Studio scene has an irregular distribution of illuminated objects.

First, we analyze the performance of our proposal according to the number of subdivisions Np when the number of surfaces per sub-scene grouped in each KERNEL 1 invocation is Nps = 1. Figure 6 shows the execution times for the different scenes and different Np values. As can be observed, the partitioning strategy permits the reduction of the timing requirements. The benefits are larger when the Np value is increased. This large reduction in the execution times is mainly due to the reduction


Figure 5: Test scenes: (a) Classroom (b) Livingroom (c) Studio

Figure 6: Execution times (secs) for different numbers of subdivisions Np, from 1 to 2048, for the Classroom, Livingroom and Studio scenes (with Nps = 1)

of the number of global memory accesses and the efficient utilization of thread blocks. This optimization is based on the exploitation of data locality and is directly associated with the partitioning strategy employed.

Let us compare these results with the reference application running on the CPU. We have employed as a reference an optimized sequential code including the Kd-tree acceleration technique. As an example, the execution time for the Livingroom is 194 seconds with this reference application, while for the GPU partitioned version with Np = 256 this time is lowered to 16.2 seconds. Therefore, a high speedup of 12.0 is obtained for this example.

However, for larger Np values this trend changes. The best results in terms of execution times are obtained with Np = 256. This can be better appreciated in Figure 7, where only high Np values are included. This figure depicts two execution times per scene: the time for the iterative procedure on the GPU and the total time, which also includes the time associated with the partitioning procedure, executed on the CPU. As can be observed in this figure, for larger Np values the GPU time decreases while the total time increases. In consequence, the larger timing requirements of the application are


Figure 7: Zoom of Figure 6 with 128 ≤ Np ≤ 2048; total and GPU-only execution times (secs) for the Classroom, Livingroom and Studio scenes

Figure 8: Execution times (secs) as a function of Nps, from 1 to 1024, for the Classroom, Livingroom and Studio scenes (with Np = 256)

associated with the partitioning procedure (executed on the CPU). Therefore, a proposal for future work is the improvement of the partitioning procedure to reduce its timing requirements.

Figure 8 shows the influence of the Nps value on the execution times. Taking into account the analysis of the results indicated above, this figure was obtained for Np = 256. We have verified that this Np value is also a good selection when Nps ≠ 1. As can be observed in the figure, the best performance is obtained with Nps = 8 for Classroom and Nps = 16 for Livingroom and Studio. These values are large enough to assure a high number of threads for keeping the cores busy, but at the same time small enough to optimize the utilization of the texture memory used as cache.

Finally, we have analyzed in detail the efficiency of our GPU implementation with respect to the conventional CPU implementation. Table 1 shows the execution times for the three scenes under test (Figure 5) for a CPU implementation and for three different versions of our GPU implementation. Speedups are indicated between parentheses. The first column of results corresponds to the CPU reference implementation. This is an optimized sequential version of the algorithm presented in Section 2 with a Kd-tree as acceleration technique.


Scenes        CPU      GPU            GPU without      GPU with
                                      subdivision      Kd-tree
Classroom      62.8     9.6  (6.6)     38.1 (1.7)       18.7 (3.4)
Livingroom    194.0    16.2 (12.0)     42.7 (4.6)       31.1 (6.2)
Studio         68.0    11.6  (5.8)     49.8 (1.4)       19.8 (3.4)

Table 1: Comparison of the CPU implementation with the different GPU implementations (execution times in seconds; speedups in parentheses)

The second column of results corresponds to our GPU proposal as described in Section 3. We display the lowest times obtained; these correspond to Np = 256 and Nps = 8 for Classroom and Nps = 16 for Livingroom and Studio. The third column of results corresponds to our GPU proposal with Np = 1 and Nps = 256, that is, without scene subdivision. We include these data to emphasize the benefits of the utilization of the uniform subdivision. The last column of results, GPU with Kd-tree, is the GPU implementation using a Kd-tree for acceleration purposes, with the optimum Np and Nps values. As can be observed, very good results in terms of speedup are achieved in all the tests for the GPU implementations. As was previously indicated, the best results were obtained when the scene partitioning is employed as the only technique to accelerate the ray–object intersection test. The speedup in this case is at least 5.8 and up to 12.0, very good results taking into account the nature of the scenes employed.

5 Conclusions

In this paper we have presented a new method for the implementation of the Monte Carlo Radiosity algorithm on the GPU. Our proposal is based on the subdivision of the whole scene into sub-scenes as a strategy to increase the parallelism, to optimize the utilization of the available processors and to increase data locality, reducing the number of accesses to the global memory.

Our proposal allows the efficient exploitation of the available hardware resources and memory hierarchy. This is achieved through three main techniques: an efficient thread scheduling, an independent multithread algorithm and the utilization of a convenient technique for accelerating the ray–object intersection test. Very good results were obtained in all the tests performed, with speedups always larger than 5.8. These results indicate that the GPU can also be an alternative platform for non-regular algorithms.

Acknowledgements

This work has been partially supported by the Ministry of Science and Innovation of Spain under contract TIN 2007-67537-C03, by the Xunta de Galicia under contract 08TIC001206PR and by the High Performance Computing Galician Thematic Network (G-HPC).


References

[1] M. Amor, J. R. Sanjurjo, E. J. Padron and R. Doallo, Progressive Radiosity Method on Clusters Using a New Clipping Algorithm, Int. Journal of High Performance Computing and Networking (IJHPCN) 1 (2004) 55–63.

[2] ATI: ATI Stream Computing, 2009.

[3] E. Cheslack-Postava, R. Wang, O. Akerlund and F. Pellacini, Fast, Realistic Lighting and Material Design using Nonlinear Cut Approximation, ACM Transactions on Graphics (TOG) 27(5) (2008).

[4] C. Dachsbacher, M. Stamminger, G. Drettakis and F. Durand, Implicit Visibility and Antiradiance for Interactive Global Illumination, ACM Transactions on Graphics (TOG) 26(3) (2007).

[5] P. Dutre, P. Bekaert and K. Bala, Advanced Global Illumination, AK Peters Limited, 2nd Edition, 2006.

[6] V. Havran, Heuristic Ray Shooting Algorithms, PhD thesis, Czech Technical University in Prague, 2001.

[7] J. T. Kajiya, The Rendering Equation, SIGGRAPH Comput. Graph. 20(4) (1986) 143–150.

[8] S. Laine, H. Saransaari, J. Kontkanen, J. Lehtinen and T. Aila, Incremental Instant Radiosity for Real-Time Indirect Illumination, In Proc. Eurographics Symposium on Rendering (2007) 277–286.

[9] L. Neumann, W. Purgathofer, R. F. Tobler, A. Neumann, P. Elias, M. Feda and X. Pueyo, The Stochastic Ray Method for Radiosity, In Proc. of the Rendering Techniques (1995) 206–218.

[10] NVIDIA: NVIDIA CUDA Compute Unified Device Architecture. Programming Guide, 2008.

[11] T. Ritschel, T. Grosch, M. H. Kim, H.-P. Seidel, C. Dachsbacher and J. Kautz, Imperfect Shadow Maps for Efficient Computation of Indirect Illumination, ACM Transactions on Graphics (TOG) 27(5) (2008).


Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

Web Services based scheduling in OpenCF

A. Santos1, F. Almeida1 and V. Blanco1

1 Dpto. Estadística, I.O. y Computación, Universidad de La Laguna, Spain

emails: [email protected], [email protected], [email protected]

Abstract

In large scale, heterogeneous and dynamic systems, the efficient execution of parallel computations can require mapping tasks to processors whose performance is both irregular (because of heterogeneity) and time-varying (because of dynamicity). The effective use of such platforms requires new approaches to resource scheduling and application mapping capable of dealing with the fault-tolerance needs and the heterogeneous and dynamic nature of such systems. Web Service standards provide an increased level of manageability, extensibility and interoperability between loosely coupled services. The adoption of these technologies at HPC sites for monitoring and scheduling will improve the efficient use of the computational resources. Web Services provide the ability to decompose HPC resources and functionality into a set of discoverable and loosely coupled services that are capable of interacting in heterogeneous environments. At the same time, they can address many of the interoperability issues encountered in large scale systems. The OpenCF computational framework has extensively adopted WS technology for its implementation; system performance monitoring and computational resource description are offered to users as WS. We propose to decouple into different WS the tasks performed by a scheduler. Since these services can be used by third party client applications, their composition leads to a distributed, WS-based meta-scheduler platform with a wide range of applicability.

1 Introduction

Grid and Cloud Computing environments present the following characteristics [6]: multiple administration domains, heterogeneity, scalability, and dynamicity or adaptability. Large scale systems add the difficulty of managing a huge amount of resources. All these aspects completely determine the way scheduling and execution on Grids and Clouds are to be done. The effective use of such platforms requires new approaches to resource scheduling and application mapping capable of dealing with the heterogeneous and dynamic nature of such systems. Grid scheduling or superscheduling [8] has been defined in the literature as the process of scheduling resources over multiple administrative


Figure 1: OpenCF providing services for HPC system description, monitoring and scheduling

domains based upon a defined policy in terms of job requirements, system throughput, application performance, budget constraints, deadlines, etc. In general, this process includes the following phases: resource discovery and selection; and job preparation, submission, monitoring, migration and termination [17].

Web Service standards provide an increased level of manageability, extensibility and interoperability between loosely coupled services. The adoption of these technologies in the context of Grid and Cloud Computing has improved the efficient use of the computational resources. Projects like Globus [11] have generated WS-based technology to manage computational resources: system monitoring or job management and scheduling, among others.

We can find in the literature meta-scheduling projects, some of them using Web Services technology, like GridWay [13], a loosely-coupled meta-scheduler for interfacing simultaneously with different Grid resource management services; Condor-G [9], which provides user tools with fault tolerance capabilities to submit jobs to a Globus based Grid; Nimrod/G [1], designed specifically for Parameter Sweep Applications (PSA), optimizing user-supplied parameters like deadline or budget; the GridLab Resource Management System (GRMS) [18], which is a meta-scheduler component to deploy resource management systems for large scale infrastructures; the STAR Grid scheduler [20], developed specifically for the STAR experiment; the Community Scheduler Framework (CSF) [7], an implementation of an OGSA-based meta-scheduler; and finally the Enabling Grids for E-sciencE (EGEE) Resource Broker [14], which handles job submission and accounting.


Figure 2: The OpenCF architecture.

Most meta-schedulers have a centralized view of the scheduling process. They obtain information about the state of the computational systems, for example through WS monitoring, and analyze the requirements of the job to be executed to decide where to send it, using a job scheduling policy like first-come-first-served (FCFS), least work first (LWF) or backfill, for example. This requires support for Web Services with the ability to access and manipulate state, i.e., data values that persist across, and evolve as a result of, Web Service interactions. The Globus Alliance and IBM have proposed a standard to achieve these goals: WSRF [2].

The Open Computational Framework (OpenCF) [15, 16, 3] is intended to ease the access to high performance computing resources for those users willing to use them. Our main focus is to diminish the significant barrier (with regard to technology and learning) users face when trying to access High Performance Computing Systems (HPCS).

The OpenCF computational framework has extensively adopted WS technology for its implementation. System performance monitoring and computational resource description are offered to users as WS. We propose to decouple into different WS the tasks performed by a scheduler. Since these services can be used by third party client applications, their composition leads to a distributed, WS-based meta-scheduler platform with a wide range of applicability. System monitoring, load characterization, HPC site description or scheduling policy tasks can be implemented as standard WS.

In this work we present the design considerations related to scheduling issues in the OpenCF project, as well as the architectural modifications taken into account to achieve this goal. Section 2 summarizes the OpenCF infrastructure. Performance monitoring issues and resource description are covered in Section 3. We discuss in Section 4 the enhancements to support distributed scheduling in OpenCF, and finally we conclude the work with some remarks and comments.

2 The OpenCF architecture

For OpenCF, we identified the following core services to be implemented in the portal: secure identification and authorization for users and inter-communication services, information services for accessing descriptions of available host computers, applications


Figure 3: Job status under the Pelican project.

and users, job submission and monitoring through the queue system, file transfer, and facilities for user and resource management.

Figure 2 depicts the OpenCF software architecture. In keeping with a modular design, two modules make up the OpenCF package, namely, the server (side) module and the client (side) module. Modules can be independently extended or even replaced to provide new functions without altering the other system components. The client and the server implement the three lower-level layers of a WS stack: Service Description, XML Messaging, and Service Transport. The fourth level, Service Discovery, has not been implemented for security reasons. Thus, system managers still control the client services accessing the clusters via traditional authentication techniques.

The client provides an interface for the end user and translates the requests into queries for the servers. The server receives queries from authenticated clients and transforms them into jobs for the queue manager. These modules, in turn, are also modularized. Access Control, Query Processor and Collector components can be found on both the server and client sides. The client also holds a database to better manage the information generated by the system. The server includes elements for the scripting and launching of jobs under the queue system.

The following subsections describe the functionalities and the technology used in each of the aforementioned modules.

2.1 The Client

The client is the interface between the end user and the system. Users are registered in the system through a form. Some of the information requested is used for security purposes, while the rest is needed for job management. This information is stored in the client's database. Next is a listing of the client module submodules.

• The client DataBase stores information on users, servers, jobs, input/output files,


Listing 1: Describing a Problem.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="job.xsl"?>
<job xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNameSpaceSchemaLocation="job.xsd">
  <name>Hello World</name>
  <service_name>hello</service_name>
  <binary>bin/hello</binary>
  <description>
    Simple Hello World application
  </description>
  <argument type="integer">
    <name>greetings_num</name>
    <sdesc>Number of greetings</sdesc>
    <ldesc>
      The number of greetings to output.
    </ldesc>
  </argument>
  <argument type="string">
    <name>name</name>
    <sdesc>Person to greeting</sdesc>
    <ldesc>
      The name of the person to greet.
    </ldesc>
  </argument>
</job>

etc. It has been implemented as a relational MySQL database and it is accessed through PHP scripts.

• The client Query Processor consists of a web interface through which the user can access lists of available applications. Each entry in the lists shows a brief description of the routine. Tasks are grouped according to the servers supporting them. When the execution of a routine is requested, the target platform is implicitly selected. An XHTML form for inputting parameters is dynamically generated according to the job description.

• The client Collector manages the job's server-generated output. The service notifies the user via e-mail when the job is finished. The state of the jobs submitted for execution can also be checked through a web interface (see Fig. 3), and the results are stored in disk files.

2.2 The Server

The server manages all job-related issues, making them available through the service and controlling their state and execution. When Apache catches a new query from the client, it allocates a new independent execution thread to create a new instance of the server module.

• The Query Processor module consists of a set of PHP scripts that is responsible for distributing the job among the different components. Queries addressed to


the computational system are dispatched to the Queue Manager Interface, and the rest of the queries are served by the Query Processor. The web service is also generated and served by this module. The service description document (WSDL) is automatically updated by the NuSOAP PHP class. This class also handles the SOAP messaging encapsulation of the packets.

• The Queue Manager Interface handles the interaction with the HPCS Queue System. The server needs to know how a job will be executed and how to query the status of a job being executed on the server supporting it. To do so, two methods of the PHP class OpenCFJob have to be overwritten. These methods enable the job to be executed (under the queue system) and the job status to be checked. In addition, an XML description of each available routine is needed to specify the job. As an example, Listing 1 shows the XML description for a simple "Hello World" code. The tag <name> holds a representative identifier for the routine and the tag <binary> is the path to the executable file. Then, the problem description and the routine arguments, along with their data types, are introduced. Once the user submits a job request, the server executes the associated binary code with the arguments supplied by the user. Thus, new services can be easily incorporated into the service just by adding the XML description file with the pre-compiled code to the server.

• The Scripts Generator produces the scripts needed for the job execution under the various queue systems. The Scripts Generator is composed of a set of templates plus a processing engine to create the script. A different template is needed for each of the queue managers supported. The template is instantiated into a functional script by the processing engine through the substitution of a fixed set of fields. These fields are obtained from the input data arguments for the job, from the XML job description document, and from the user data stored in the client DataBase. Currently the Scripts Generator uses the template library Smarty [19], a lightweight PHP library found in any UNIX/Linux system.

• The Launcher is the interface between OpenCF and the operating system; it forks the process to be executed, returns its identification and unlocks the thread handling the client query. The implementation is Perl-based, to be independent from the architecture. In future versions of the OpenCF accounting system, this module will be responsible for collecting and reporting on individual and group use of system resources.

• The Collector is the client interface which delivers the output data produced by an executed job. Once a job has finished, the queue system automatically sends an e-mail to the user and moves the output data files to a temporary directory until they are downloaded by the client Collector. The server Collector, implemented as a PHP script, cleans up the files from the temporary directories on the server once they have been safely downloaded by the client.


3 Performance monitoring as a Web Service

OpenCF provides end users with an integrated vision of servers (HPCS) and grouped services (sequential and parallel routines) through a client interface. Up to now, the decision of selecting the server where a service will be executed has been left under the responsibility of the end user. We focus on the necessity of an automatic selection of the resources (servers) to be allocated when a service is required. To achieve this target, we propose the use of WS technology to provide performance information as well as a description of the servers of the HPC site. Through service composition, the scheduler can decide which HPC resource is appropriate for the user task/job requested.

3.1 Server description

We have introduced services providing full information about the servers: number of nodes, node descriptions, memory, network, architecture, queue systems, etc. Since this information is provided as a service, it is fully integrated in the system and can be invoked by other services. As far as we know, no effort has been made to develop a standard specification for clusters or parallel machines. Using the JSDL [5] description language as a starting point, we propose an XML specification that allows for a full description of the machines of an HPC site. This XML description can be automatically generated from the system when a service is requested in OpenCF.

3.2 Resource capabilities

We also incorporate a set of tools that can be used to analyze at run time the resources of the servers, the load of the nodes, the capacity of the network, etc. The information gathered by tools like Ganglia [10], HawkEye [12] or NWS [21] can be offered as WS in the OpenCF framework and used by users for performance evaluation purposes or by other tools (a scheduler, for example). So far we have only made preliminary developments with Ganglia and simple performance tests.

4 Web Services based scheduling

The OpenCF web client module has been updated so that clients may choose between a service view grouped by servers or a unified one. According to the value of a configuration variable, the client may show each server and the services provided on it (Figure 4) or show the whole set of services leaving aside the server provider (Figure 5). Services provided by more than one server are shown as one, and the scheduling assignment policy implemented on the client decides the server on which to launch the service.

The underlying idea in our scheduling approach is that each server provides a service supplying an index, the normalized computational power (ncp), that gathers, as a whole, the information about the computational capacity and load of a given server. The server with the highest ncp is the best candidate to run a service. When launching a service, the scheduling policy at the client is reduced to choosing the server with the best


Figure 4: OpenCF job submission interface with automatic scheduling activated

ncp, or a more relaxed policy that selects a server whose ncp exceeds a given threshold. The lack of a standard metric providing such an absolute index is the main difficulty of the approach. Factors like CPU power, storage capacity, communication network, load of the system, waiting time in queues, and some others should be considered when computing ncp. A second difficulty arises from the fact that the information about the features and conditions of each server is not shared among the servers, and many of these facts are relative to a given context, i.e., a processor knows its CPU power but it does not know whether it is the best machine in the whole system.

Assuming that the two difficulties pointed out above can be solved, ncp can be computed in a distributed manner and, once ncp is known for each server, the decision to be taken by the client is quite simple. Thus, the service implementing ncp carries the main responsibility in the scheduling process. In what follows we describe a proposal to compute ncp.

Let S = { s_i : s_i is a server providing a service to the system }, where s_i may represent a sequential or a parallel (homogeneous or heterogeneous) system. ncp can be defined as the function:

    ncp : S → [0, 10]
          s_i ↦ ncp(s_i)

We assume the following semantics for ncp: the value 0 means worst and 10 means best, i.e., ncp equals 10 for the best server of the system, one that has the best architecture, is completely free of load and is ready to launch a service, and ncp = 0 when a server is not usable because it is fully overloaded or is not ready to launch a service.


Figure 5: OpenCF job submission interface without scheduling support

The semantics of ncp should account for the fact that a server s_i with the highest capacity and a 50% load has the same ncp as a server s_j that is completely free of load but has half the capacity of s_i. At the same time, by the same reasoning, a sequential server could be the best option to run a parallel code if the parallel server is overloaded or if its capacity is surpassed by that of the sequential server.

We abstract the capacity of a server into two variables P and C: P for the computational power and C for the computational load. The computational power gives information about the computational capacity of the architecture (CPU power, storage capacity, communication network, and any other factor to be considered); in its simplest, naive form, P can be computed using the clock frequency of the processors or according to some benchmark. The computational load is a measure of the amount of resources of the system being used (load of the system, waiting time in queues, etc.); the uptime system call can be used as a first approach. Obviously, to be consistent, all servers should use the same approach to compute P and C. Some of the parameters influencing P and C can be statically computed just once, when the server joins the system, while others depend on the moment a service is requested and must be dynamically computed.

To achieve the expected normalization of ncp we introduce fictitious values Pmax and Cmax, known by all servers, that are provided by OpenCF at installation time. Pmax represents the computational power of a server with the highest computational capacity and Cmax is a measure of the maximum load. On each server s, P and C are computed locally and then we define ncp as:

@CMMSE Page 985 of 1461 ISBN: 978-84-612-9727-6

Page 265: Proceedings of the 2009 International Conference on ...cmmse.usal.es/cmmse2020/sites/default/files/volumes/volumen3opt.pdfA Spherical Interpolation Algorithm Using Zonal Basis Functions

Web Services based scheduling in OpenCF

ncp(s(P,C)) = \frac{\frac{P}{P_{max}}\cdot 10 + \left(10 - \frac{C}{C_{max}}\cdot 10\right)}{2}
            = \frac{10\left(\frac{P}{P_{max}} + \left(1 - \frac{C}{C_{max}}\right)\right)}{2}
            = 5\left(\frac{P}{P_{max}} + 1 - \frac{C}{C_{max}}\right)

where

0 \le \frac{P}{P_{max}} \le 1, \quad 0 \le \frac{C}{C_{max}} \le 1 \quad \text{and} \quad 0 \le ncp(s(P,C)) \le 10.

The semantics of P and C are given in terms of the server. For the case of P:

• In a sequential server the main factors to be considered are the CPU power and the memory capacity.

• In a parallel server with a set of homogeneous processors, the CPU power must consider the total sum of the capacities of all the processors. If the server introduces heterogeneity due to differences in the CPU power of the individual processors, the total CPU power must still consider the sum of the capacities of all the processors but, as usual in heterogeneous systems, expressed in terms of the best processor of the server [4]. The communication network capacity is another factor to be considered when computing P.

• In any case, a value of P = 0 means the worst server, and a value of P = Pmax means the best server.

For the case of C:

• In systems with a non-exclusive execution mode, this value represents the load of the system, where a value of 0 means that there is no other process under execution in the system, and a value of Cmax means that the system is so overloaded that the execution of a service on this server would be useless.

• In systems that allow exclusive mode execution (using, for example, a queue system), the parameter C represents the expected delay until the execution of the service starts, counted from the moment it is requested. A value of C = 0 means that the execution of the service will start immediately, and a value of C = Cmax means that the execution will start after a waiting time greater than a given threshold (2 weeks, 1 month, etc.), depending on the service.

Observe that, when a sequential service is requested, in parallel servers P and C provide the computational capacity and load of the individual processor in the best conditions.

It should also be noticed that our current approach considers neither variations of the load during the execution nor process migration.
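The following fragment is a minimal sketch of the proposed index and of the client-side selection rule (our own illustration; the Server structure and the way P and C are measured are assumptions, not part of OpenCF's actual code):

// Minimal sketch (assumed data): each server reports its locally measured
// computational power P and load C; ncp is the normalized index defined above,
// and the client simply picks the server with the highest ncp.
#include <cstddef>
#include <vector>

struct Server { double P; double C; };           // measured locally on each server

double ncp(const Server& s, double Pmax, double Cmax)
{
    return 5.0 * (s.P / Pmax + 1.0 - s.C / Cmax); // in [0, 10] when 0 <= P <= Pmax, 0 <= C <= Cmax
}

std::size_t best_server(const std::vector<Server>& servers,
                        double Pmax, double Cmax)
{
    std::size_t best = 0;
    for (std::size_t i = 1; i < servers.size(); ++i)
        if (ncp(servers[i], Pmax, Cmax) > ncp(servers[best], Pmax, Cmax))
            best = i;
    return best;                                  // index of the candidate server
}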


5 Conclusion

Web Services-based technologies have emerged as a technological alternative for computational web portals. Facilitating access to distributed resources through web interfaces while simultaneously ensuring security is one of the main goals of most of the currently existing tools and frameworks. OpenCF, the Open Source Computational Framework that we have developed, shares these objectives and adds others, like enforced portability, genericity, modularity and compatibility with a wide range of High Performance Computing Systems. OpenCF provides services for performance monitoring, HPC resource description and job submission. We propose to decouple into different WS the tasks performed by a scheduler. Since these services can be used by third party client applications, their composition leads to a distributed, WS-based meta-scheduler platform with a wide range of applicability. The underlying idea in our scheduling approach is that each server provides a service supplying an index, the normalized computational power (ncp), that gathers, as a whole, the information about the computational capacity and load of a given server. The server with the highest ncp is the best candidate to run a service.

Acknowledgements

This work has been partially supported by the EC (FEDER) and by the Spanish MICINN (Plan Nacional de I+D+I, TIN2008-06570-C04-03) and the Spanish Network CAPAP-H "High Performance Computing on Heterogeneous Parallel Architectures" (ACOO769926).

References

[1] D. Abramson, R. Buyya, and J. Giddy. A computational economy for grid computing and its implementation in the Nimrod-G resource broker. Future Generation Comp. Syst., 18(8):1061–1074, 2002.

[2] Globus Alliance and IBM. The WS-Resource Framework. http://www.globus.org/wsrf/.

[3] F. Almeida, V. Blanco, C. Delgado, F. de Sande, and A. Santos. IDEWEP: Web service for astronomical parallel image deconvolution. JNCA, 32:293–313, Jan. 2009.

[4] F. Almeida, D. Gonzalez, and L. M. Moreno. The master-slave paradigm on heterogeneous systems: A dynamic programming approach for the optimal mapping. Journal of Systems Architecture, 52(2):105–116, 2006.

[5] A. Anjomshoaa, M. Drescher, D. Fellows, A. Ly, S. McGough, D. Pulsipher, and A. Savva. Job Submission Description Language. http://www.gridforum.org/documents/GFD.56.pdf, Nov. 2005.

[6] M. Baker, R. Buyya, and D. Laforenza. Grids and grid technologies for wide-area distributed computing. Softw., Pract. Exper., 32(15):1437–1466, 2002.

[7] Platform Computing. Open source metascheduling for virtual organizations with the Community Scheduler Framework (CSF). Technical report, Platform Computing, 2003.

[8] I. Foster and C. Kesselman, editors. The Grid 2: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 2nd edition, 2003. ISBN: 1558609334.

[9] J. Frey, T. Tannenbaum, M. Livny, I. T. Foster, and S. Tuecke. Condor-G: A computation management agent for multi-institutional grids. Cluster Computing, 5(3):237–246, 2002.

[10] Ganglia monitoring system. http://ganglia.sourceforge.net/.

[11] Globus Toolkit: Open source software toolkit for building Grid systems. http://www.globus.org.

[12] Hawkeye: A monitoring and management tool for distributed systems. http://www.cs.wisc.edu/condor/hawkeye/.

[13] E. Huedo, R. S. Montero, and I. M. Llorente. A modular meta-scheduling architecture for interfacing with pre-WS and WS Grid resource management services. Future Generation Computer Systems, 23(2):252–261, 2007.

[14] E. Laure. EGEE middleware architecture and planning (release 2). Technical report, European Grid for e-Science, 2005.

[15] OpenCF project webpage. http://opencf.pcg.ull.es/.

[16] A. Santos, F. Almeida, and V. Blanco. Lightweight web services for high performance computing. In European Conference on Software Architecture ECSA 2007, number 4758 in Lecture Notes in Computer Science, Madrid, Spain, Sept. 2007. Springer-Verlag, Berlin, Heidelberg.

[17] J. Schopf. Ten actions when superscheduling. Technical report, Scheduling Working Group, Northwestern University, July 2001.

[18] E. Seidel, G. Allen, A. Merzky, and J. Nabrzyski. GridLab: a grid application toolkit and testbed. Future Generation Comp. Syst., 18(8):1143–1153, 2002.

[19] Smarty template engine. http://smarty.php.net/.

[20] STAR grid scheduler. http://www.star.bnl.gov/public/comp/Grid/scheduler/, 2003.

[21] R. Wolski, N. T. Spring, and J. Hayes. The network weather service: a distributed resource performance forecasting service for metacomputing. Future Generation Comp. Syst., 15(5-6):757–768, 1999.


Proceedings of the International Conference
on Computational and Mathematical Methods
in Science and Engineering, CMMSE 2009
30 June, 1–3 July 2009.

An Algorithm for Generating Sequences of Random Tuples on Special Simple Polytopes

Efraim Shmerling1

1 Department of Computer Science and Mathematics, Ariel University Center of Samaria, Israel

emails: [email protected] , [email protected]

Abstract

A triangulation algorithm for generating random tuples on simple polytopes is modified and adapted to generate random tuples on special simple polytopes. The presented method is based on splitting a special simple polytope into equal volume groups of simplices, which makes the generation of random tuples more efficient. The advantages and the scope of the presented algorithm are discussed.

Key words: Algorithm, Polytope, Simulation
MSC 2000: 65C10

1 Introduction

It is widely accepted that for dimensions higher than 4 simulation is the preferable option for the numerical calculation of multiple integrals. If the integration domain is a simple polytope defined by a system of linear inequalities in the variables x_1, ..., x_n, the simulation method requires the generation of a sequence of random tuples uniformly distributed in the polytope. If the polytope is a simplex, the generation of random tuples is very simple and fast (see [6]). One only has to sample uniformly n random numbers in (0,1) and sort them in ascending order, thus obtaining a vector \vec{U} = (u_1, ..., u_n) with 0 < u_1 < u_2 < ... < u_n < 1, and set u_0 = 0, u_{n+1} = 1. The generated random tuple is then given by \sum_{i=0}^{n} V_i \cdot (u_{i+1} - u_i), where V_i, i = 0, ..., n, are the vertices of the simplex. For arbitrary simple polytopes the problem of generating random tuples is much more complicated. Several methods have been suggested: the grid method (see [1]), the sweep-plane method (see [5]), and the triangulation method, based on the decomposition of a polytope into simplices, sampling one of them with probability proportional to its volume and, finally, generating a random tuple on that simplex (see [1], [2]). Detailed algorithms and software implementing the grid and sweep-plane methods have been developed and utilized in numerical integration. Triangulation, however, has attracted


much less attention, due to the fact that splitting a convex polytope into simplices is not a simple task, and neither is sampling a simplex; therefore, according to common opinion, algorithms based on this approach cannot be effective. A new approach to integration via simulation over a simple polytope was presented in [3]. It is based on splitting a simple polytope into special simple (SS) polytopes and thus reduces the problem to integration over these polytopes. The main advantage of the new approach is that it requires the generation of random tuples on SS polytopes, which is much easier than the generation of random tuples on arbitrary simple polytopes. A special simple polytope of dimension n is a polytope defined by a system of n inequalities of the form

    l_n(x_1, ..., x_{n-1}) \le x_n \le r_n(x_1, ..., x_{n-1})
    l_{n-1}(x_1, ..., x_{n-2}) \le x_{n-1} \le r_{n-1}(x_1, ..., x_{n-2})
    ...
    l_1 \le x_1 \le r_1,                                        (1)

where

    l_k(x_1, ..., x_{k-1}) = a_0 + \sum_{i=1}^{k-1} a_{ki} x_i,
    r_k(x_1, ..., x_{k-1}) = b_0 + \sum_{i=1}^{k-1} b_{ki} x_i,   2 \le k \le n,        (2)

and l_1, r_1, a_{ki}, b_{ki} are constants.

In this article we present an algorithm for generating tuples uniformly distributed in an SS polytope, based on creating equal volume groups of simplices. In order to generate a tuple we randomly choose one of the groups utilizing an integer random number generator, then we use a uniform random number generator to choose a simplex belonging to the group, and finally we generate a tuple on the chosen simplex. The idea of creating equal volume groups of simplices is based on the fact that sampling one of these groups requires only the generation of a random integer, and sampling one of the simplices from the group is easy since each group consists of a few elements.

The presented algorithm for generating random tuples in an SS polytope consists of three parts, which are presented as separate algorithms in the following sections. Algorithm 1 enables one to split an SS polytope into simplices, Algorithm 2 enables one to form equal volume groups of simplices utilizing the simplices obtained via Algorithm 1, and Algorithm 3 enables one to generate random tuples utilizing the groups created via Algorithm 2.
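As a minimal sketch of the simplex-sampling step recalled above (our own illustration; the vertex layout V[0..n], each vertex stored as a vector of n coordinates, is an assumption), the following function sorts n uniform numbers and forms the corresponding convex combination of the n+1 vertices:

// Minimal sketch: generate a point uniformly distributed in an n-dimensional
// simplex with vertices V[0..n], following the method recalled in the
// Introduction (see [6]).
#include <algorithm>
#include <random>
#include <vector>

std::vector<double> sample_simplex(const std::vector<std::vector<double>>& V,
                                   std::mt19937& rng)
{
    const std::size_t n = V.size() - 1;              // simplex has n+1 vertices
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    std::vector<double> u(n + 2);
    u[0] = 0.0; u[n + 1] = 1.0;
    for (std::size_t i = 1; i <= n; ++i) u[i] = unif(rng);
    std::sort(u.begin() + 1, u.begin() + n + 1);     // 0 < u_1 < ... < u_n < 1

    std::vector<double> x(n, 0.0);                   // x = sum_i V_i (u_{i+1} - u_i)
    for (std::size_t i = 0; i <= n; ++i)
        for (std::size_t d = 0; d < n; ++d)
            x[d] += V[i][d] * (u[i + 1] - u[i]);
    return x;
}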

2 Description of Algorithm 1

Algorithm 1 is a recursive algorithm for splitting an SS polytope of dimension n into M simplices (M \le 2^{n-2} \cdot n!), S_k, k = 1, ..., M. It can be viewed as a modification and adaptation of the complete barycentric subdivision algorithm (see [4]) to generation on SS polytopes. This modification utilizes the fact that each face of an n-dimensional SS polytope is an (n-1)-dimensional SS polytope in the hyperplane to which the face belongs.

Step 1. Calculate the coordinates of the vertex Ver(n) = (1/2)(LeftVer + RightVer), where LeftVer is the vertex of (1) such that for any k, 1 \le k \le n, the k-th coordinate of LeftVer is less than or equal to the k-th coordinate of any point belonging to (1). RightVer


can be defined analogously.

Step 2. (Execute for n \ge 3.)

• Transform each face of polytope (1) which belongs to a hyperplane defined by the equality

      x_k = l_k(x_1, ..., x_{k-1})        (3)

  or by the equality

      x_k = r_k(x_1, ..., x_{k-1})        (4)

  for some k, 1 \le k \le n, into an (n-1)-dimensional SS polytope in the (n-1)-dimensional Euclidean space of coordinates x_1, ..., x_{k-1}, x_{k+1}, ..., x_n, by excluding x_k from the system (1) utilizing (3) or (4).

• Split each of the obtained polytopes into 2^{n-3} \cdot (n-1)! simplices of dimension (n-1), by executing Algorithm 1 (recursive call).

• Transform each of the obtained (n-1)-dimensional simplices into an n-dimensional simplex, one of whose vertices is Ver(n) and whose other vertices are the vertices of the (n-1)-dimensional simplex transformed to n-dimensional vertices utilizing (3) or (4).

Step 3. Exit condition. If n = 2, split the 2-dimensional SS polytope into 2 triangles.

3 Description of Algorithm 2.

Algorithm 2 is designed for splitting an SS polytope of volume Vpol into N groups of simplices of equal group volume. It utilizes the simplices created via Algorithm 1 (in the following we call them input simplices). In the group creation process some of the input simplices are split into several parts, which are also simplices. The groups consist of those input simplices which are not split and of the parts of the input simplices which are split. We use the variables GC, SC, NPC and FV described below. At any time during the execution of Algorithm 2 a certain group is being created (the value taken by the variable GC is the index of this group) by including in it a certain input simplex or one of its parts (the index of this simplex is the current value of the variable SC). If this simplex has been split into several parts, the number of parts is the current value of the variable NPC; if the simplex S_{SC} is not split, the current value of NPC is 1. The current value of FV is the difference between Vpol/N and the sum of the volumes of the simplices already included in the currently filled group. S'_{SC} denotes S_{SC} if NPC equals 1 and, if NPC is greater than 1 (which means that S_{SC} has been split into several parts), S'_{SC} denotes the part of S_{SC} that has not been included in any group.


Step 1 (initialization):

• Set group counter GC = 1.
• Set input simplices counter SC = 1.
• Set parts counter NPC = 1.
• Set free volume FV = Vpol/N.

Step 2 (executed repeatedly while GC ≤ N).

If V(S'_SC) < FV:

• Include S'_SC in group GC.
• Set FV = FV − V(S'_SC).
• Increment SC by 1.
• Set NPC = 1.

If V(S'_SC) > FV:

• Split S'_SC into 2 parts.
• Include the part with volume FV in group GC.
• Increment GC by 1.
• Increment NPC by 1.
• Set FV = Vpol/N.

If V(S'_SC) = FV:

• Include S'_SC in group GC.
• Increment GC by 1.
• Increment SC by 1.
• Set NPC = 1.
• Set FV = Vpol/N.

Step 3 (executed iff GC = N + 1):

• Terminate.

Any simplex included in some group has two indices: the first is the group number and the second is the index of the simplex as a group member. S(i, j) denotes the j-th simplex in the i-th group. For any simplex S(i, j), i = 1, ..., N, j = 1, ..., k_i, where k_i is the number of simplices in group i, the value of p(i, j) = V(S(i, j))/(Vpol/N) is calculated immediately after its inclusion in the group and kept in computer memory together with the vertices of the simplex.
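The greedy bookkeeping of Algorithm 2 can be sketched as follows; only the volumes and the fractions p(i, j) are tracked here, the geometric splitting of a simplex into two parts being left abstract, and all identifiers (fill_groups, vol, ...) are ours rather than the paper's.

```c
#include <stdio.h>

/* Sketch of the grouping step of Algorithm 2 (volumes only, geometry omitted).
   vol[M] are the volumes of the input simplices produced by Algorithm 1.     */
void fill_groups(const double *vol, int M, int N, double Vpol)
{
    double target = Vpol / N;    /* common volume of every group              */
    double FV = target;          /* free volume left in the current group     */
    double rest = vol[0];        /* volume of S'_SC, the unassigned part      */
    double eps = 1e-12 * Vpol;   /* tolerance for the "equal volume" branch   */
    int GC = 1, SC = 0, NPC = 1;

    while (GC <= N && SC < M) {
        if (rest < FV - eps) {               /* V(S'_SC) < FV                 */
            printf("group %d: piece of simplex %d, p = %g\n", GC, SC + 1, rest / target);
            FV -= rest;
            SC++; NPC = 1;
            if (SC < M) rest = vol[SC];
        } else if (rest > FV + eps) {        /* V(S'_SC) > FV: split it       */
            printf("group %d: piece of simplex %d, p = %g\n", GC, SC + 1, FV / target);
            rest -= FV;                      /* the remainder stays with SC   */
            GC++; NPC++;
            FV = target;
        } else {                             /* V(S'_SC) = FV (up to eps)     */
            printf("group %d: piece of simplex %d, p = %g\n", GC, SC + 1, rest / target);
            GC++; SC++; NPC = 1;
            FV = target;
            if (SC < M) rest = vol[SC];
        }
    }
}

int main(void)
{
    double vol[] = {0.3, 0.5, 0.2};   /* hypothetical input-simplex volumes   */
    fill_groups(vol, 3, 4, 1.0);      /* split into 4 equal-volume groups     */
    return 0;
}
```

For each group the printed fractions p(i, j) sum to 1, which is exactly the property used in Step 3 of Algorithm 3 below.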


4 Description of Algorithm 3.

Algorithm 3 is designed for generating random tuples utilizing the groups of simplices created via Algorithm 2 and consists of four steps.

Step 1. Generate a group index i* by an integer-valued random number generator.
Step 2. Generate a random number rn* belonging to the interval (0, 1) by a standard uniform random number generator.
Step 3. Find j*, the minimal integer for which the inequality Σ_{j=1}^{j*} p(i*, j) ≥ rn* holds.
Step 4. Generate a random tuple on the simplex S(i*, j*).
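In code, the four steps read roughly as follows; Step 4 is realized here by one standard recipe for uniform sampling on a simplex (barycentric weights proportional to independent exponential variables), which the paper does not prescribe, and the data layout and identifiers are illustrative.

```c
#include <stdlib.h>
#include <math.h>

/* One random tuple via Algorithm 3.  The groups from Algorithm 2 are assumed
   stored as: group i has ki[i] simplices, each with fraction p[i][j] and its
   n+1 vertices in verts[i][j] (row-major, (n+1) x n coordinates).            */
void random_tuple(int Ngroups, const int *ki, double **p, double ***verts,
                  int n, double *out /* length n */)
{
    /* Step 1: group index i* from an integer-valued generator. */
    int i = rand() % Ngroups;

    /* Step 2: uniform random number rn* in (0,1). */
    double rn = (rand() + 1.0) / (RAND_MAX + 2.0);

    /* Step 3: minimal j* with p(i*,1) + ... + p(i*,j*) >= rn*. */
    int j = 0;
    double acc = p[i][0];
    while (acc < rn && j < ki[i] - 1) acc += p[i][++j];

    /* Step 4: uniform point on simplex S(i*,j*): barycentric weights w_k
       proportional to independent Exp(1) variables (one standard recipe).   */
    double w[64], s = 0.0;                    /* assumes n + 1 <= 64          */
    for (int k = 0; k <= n; k++) {
        double u = (rand() + 1.0) / (RAND_MAX + 2.0);
        w[k] = -log(u);
        s += w[k];
    }
    for (int d = 0; d < n; d++) out[d] = 0.0;
    for (int k = 0; k <= n; k++)
        for (int d = 0; d < n; d++)
            out[d] += (w[k] / s) * verts[i][j][k * n + d];
}
```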

5 Concluding Remarks.

1) The only universal random vector generation method well suited for the generation of random tuples on SS polytopes is the conditional distribution method described in [2]. However, for dimensions higher than 4, implementation of this method for SS polytopes requires solving polynomial equations via a numerical method, which is time-consuming.

2) The main disadvantage of the presented method relative to the conditional distribution method is the need for setup: the groups of simplices have to be created before starting to generate random tuples. However, if the number of generated tuples is large enough, the runtime for the setup is negligible compared to the runtime for the generation itself, and experimental calculations showed that generating random tuples via the presented algorithm is much more efficient than generation via the conditional distribution method.

3) The presented algorithm can be viewed as a modification of the triangulation algorithm and its adaptation to the generation of random tuples on SS polytopes. The presented modified triangulation algorithm appears to be the most effective for dimensions higher than 4 when the number of tuples that need to be generated is very large.

References

[1] L. Devroye, Non-Uniform Random Variate Generation, Springer-Verlag, New York, 1986.
[2] W. Hormann, J. Leydold and G. Derflinger, Automatic Nonuniform Random Variate Generation, Springer, 2004.
[3] M. Korenblit, E. Shmerling, Algorithm and Software for Integration over a Convex Polyhedron, LNCS 4151 (2006) 273–283.
[4] C.W. Lee, Subdivisions and triangulations of polytopes, in Handbook of Discrete and Computational Geometry, CRC Press LLC, 1997.
[5] J. Leydold, W. Hormann, A Sweep-Plane Algorithm for Generating Random Tuples in Simple Polytopes, J. Math. Comp. 67 (1998) 1617–1635.
[6] R.Y. Rubinstein, D.P. Kroese, Simulation and the Monte Carlo Method, Wiley-Interscience, 2007.

Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

Computational aspects in the investigation

of chaotic multi-strain dengue models

Nico Stollenwerk1, Maíra Aguiar1 and Bob W. Kooi2

1 Centro de Matemática e Aplicações Fundamentais, Universidade de Lisboa, Portugal

2 Faculty of Earth and Life Sciences, Department of Theoretical Biology, Vrije

Universiteit Amsterdam, The Netherlands

emails: [email protected], [email protected], [email protected]

Abstract

We investigate a multi-strain model for dengue fever epidemiology, including temporary cross-immunity. This model has been demonstrated to show deterministic chaos in wide parameter regions. One of the tools for such investigations is the Lyapunov spectrum. However, such spectra are computationally very demanding. Here we show a way to speed up the computation of Lyapunov spectra by a factor of more than ten by parallelizing previously used C programs. Such fast computations of Lyapunov spectra will be especially useful in future investigations of seasonally forced versions of the present models, for which other tools of bifurcation analysis like AUTO soon become uninformative due to the complexity of the problem.

Key words: deterministic chaos, dengue fever epidemiology, Lyapunov spectra,

computer parallelization

1 Introduction

Recently we have investigated a multi-strain model for dengue fever epidemiology, including temporary cross-immunity, which shows deterministic chaos in wide parameter regions [1, 2]. In particular, coexisting attractors and isolas have been found in an interplay between bifurcation analysis by continuation, as provided by computer programs like AUTO, and Lyapunov spectra [3, 4]. However, these methods are computationally intensive, and especially the Lyapunov spectra which we previously published cost a lot of computer time.

Here we show a way to speed up the computation of such Lyapunov spectra by a factor of more than ten by parallelizing previously used C programs. We use open source parallelization software on an eight-processor machine as a first test. As a result we can now calculate the Lyapunov spectra of the autonomous dynamical system describing the dengue epidemiology more accurately.


Such fast computations of Lyapunov spectra will also be of use in future investigations of seasonally forced versions of the present models, for which other tools of bifurcation analysis like AUTO [14, 15] soon become uninformative due to the complexity of the problem. We will show just a first example of application and leave the more detailed analysis to future research. Already now it becomes clear that deterministically chaotic attractors, in the autonomous as well as the seasonally forced system, are of major importance for the understanding of the fluctuations observed in dengue epidemiology, more than previous work has indicated [6, 8, 10]. For a more recent account see also [13].

2 Epidemiological multi-strain model

Multi-strain dengue models are essentially SIR systems (susceptible, infected, recovered), with indexing for infection and recovery with the different strains, see e.g. [6], and include the effect of antibody-dependent enhancement (ADE), which is well described in dengue fever, see e.g. [5].

However, the dramatic consequences of temporary cross immunity for the dynamical behaviour of such systems, up to a rich bifurcation behaviour including deterministically chaotic attractors in wide parameter regions, have only recently been discovered [1, 2, 3, 4], although temporary cross-immunity was occasionally included in models but not properly analyzed in terms of dynamics [8, 10].

The complete system of ordinary differential equations for a two-strain epidemiological system allowing for differences in primary versus secondary infection and temporary cross immunity [2] is given by

\[
\begin{aligned}
\frac{d}{dt}S &= -\frac{\beta}{N}\,S\,(I_1+\phi I_{21}) - \frac{\beta}{N}\,S\,(I_2+\phi I_{12}) + \mu(N-S) \\
\frac{d}{dt}I_1 &= \frac{\beta}{N}\,S\,(I_1+\phi I_{21}) - (\gamma+\mu)\,I_1 \\
\frac{d}{dt}I_2 &= \frac{\beta}{N}\,S\,(I_2+\phi I_{12}) - (\gamma+\mu)\,I_2 \\
\frac{d}{dt}R_1 &= \gamma I_1 - (\alpha+\mu)\,R_1 \\
\frac{d}{dt}R_2 &= \gamma I_2 - (\alpha+\mu)\,R_2 \\
\frac{d}{dt}S_1 &= -\frac{\beta}{N}\,S_1\,(I_2+\phi I_{12}) + \alpha R_1 - \mu S_1 \\
\frac{d}{dt}S_2 &= -\frac{\beta}{N}\,S_2\,(I_1+\phi I_{21}) + \alpha R_2 - \mu S_2 \\
\frac{d}{dt}I_{12} &= \frac{\beta}{N}\,S_1\,(I_2+\phi I_{12}) - (\gamma+\mu)\,I_{12} \\
\frac{d}{dt}I_{21} &= \frac{\beta}{N}\,S_2\,(I_1+\phi I_{21}) - (\gamma+\mu)\,I_{21} \\
\frac{d}{dt}R &= \gamma\,(I_{12}+I_{21}) - \mu R .
\end{aligned}
\qquad (1)
\]

For two different strains, 1 and 2, we label the SIR classes for the hosts that have seen the individual strains. Susceptibles to both strains (S) get infected with strain 1 (I1) or strain 2 (I2), with infection rate β. They recover from infection with strain 1 (becoming temporarily cross-immune, R1) or from strain 2 (becoming R2), with recovery rate γ, and so on. With rate α, the R1 and R2 enter again the susceptible classes (S1 being immune against strain 1 but susceptible to strain 2, respectively S2), where the index represents the first infection strain. Now, S1 can be reinfected with strain 2 (becoming I12), meeting I2 with infection rate β or meeting I12 with infection rate φβ, secondary infected hosts contributing differently to the force of infection than primary infected ones, and so on.

We include the demography of the host population, denoting the birth and death rate by μ. For constant population size N we have for the hosts immune to all strains R = N − (S + I1 + I2 + R1 + R2 + S1 + S2 + I12 + I21), and therefore we only need to consider the first 9 equations of Eq. (1), giving 9 Lyapunov exponents. In our numerical studies we take the population size equal to N = 100, so that the numbers of susceptibles, infected etc. are given in percentages. As fixed parameter values we take μ = (1/65) year^{-1}, γ = 52 year^{-1}, β = 2γ and α = 2 year^{-1}. The parameter φ is varied.
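For reference, a direct C transcription of the right-hand side of the first 9 equations of (1), with the state ordered as (S, I1, I2, R1, R2, S1, S2, I12, I21); the function name is ours.

```c
/* Right-hand side f(x) of the 9-dimensional dengue system (1).
   x = (S, I1, I2, R1, R2, S1, S2, I12, I21); dx receives dx/dt.             */
void dengue_rhs(const double *x, double *dx,
                double beta, double gamma, double mu,
                double alpha, double phi, double N)
{
    double S  = x[0], I1  = x[1], I2  = x[2], R1 = x[3], R2 = x[4];
    double S1 = x[5], S2  = x[6], I12 = x[7], I21 = x[8];
    double f1 = beta / N * (I1 + phi * I21);   /* force of infection, strain 1 */
    double f2 = beta / N * (I2 + phi * I12);   /* force of infection, strain 2 */

    dx[0] = -f1 * S  - f2 * S  + mu * (N - S);    /* dS/dt   */
    dx[1] =  f1 * S  - (gamma + mu) * I1;         /* dI1/dt  */
    dx[2] =  f2 * S  - (gamma + mu) * I2;         /* dI2/dt  */
    dx[3] =  gamma * I1 - (alpha + mu) * R1;      /* dR1/dt  */
    dx[4] =  gamma * I2 - (alpha + mu) * R2;      /* dR2/dt  */
    dx[5] = -f2 * S1 + alpha * R1 - mu * S1;      /* dS1/dt  */
    dx[6] = -f1 * S2 + alpha * R2 - mu * S2;      /* dS2/dt  */
    dx[7] =  f2 * S1 - (gamma + mu) * I12;        /* dI12/dt */
    dx[8] =  f1 * S2 - (gamma + mu) * I21;        /* dI21/dt */
}
```

With the values quoted above (N = 100, μ = 1/65 year^{-1}, γ = 52 year^{-1}, β = 2γ, α = 2 year^{-1}), this is the right-hand side whose Jacobian enters the Lyapunov computation of the next section.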

A rich bifurcation structure has been observed both for φ much larger than one [6, 7] and, recently, also for φ smaller than one and for moderate values of φ larger than one [1, 2, 3, 4]. We concentrate on the parameter region with φ smaller than one, as a biologically relevant region; see especially [2] for more details on the biological aspects. In this parameter region we found fixed point, limit cycle and chaotic attractors.

3 Lyapunov exponents

We quantify the attractor structure, fixed point, limit cycle or chaotic attractor etc.,by calculating Lyapunov exponents. A negative largest Lyapunov exponent indicates astable fixed point as attractor, a zero largest Lyapunov exponent indicates a stable limitcycle and a positive largest Lyapunov exponent indicates a chaotic attractor [16, 17].

As a shorthand notation for the system (1), let the dynamics for the state

\[
x := (S, I_1, I_2, \dots, R) \qquad (2)
\]

be f(x), hence

\[
\frac{d}{dt}x = f(x) \qquad (3)
\]

which explicitly gives the dynamics as written down above. Then we analyse the stability in all 9 directions of the state space of this ODE system by calculating deviations Δx along a numerically integrated solution of Eq. (3) in the attractor, with attractor trajectory x*(t), hence

\[
\frac{d}{dt}\Delta x = \left.\frac{df}{dx}\right|_{x^*(t)} \cdot \Delta x . \qquad (4)
\]

Here, any attractor is denoted by x*(t), be it a fixed point, a periodic orbit or a chaotic attractor. In this ODE system the linearized dynamics is given by the Jacobian matrix df/dx of the ODE system (3) evaluated at the trajectory points x*(t), written as df/dx|_{x*(t)}.


The Lyapunov exponents are then the logarithms of the eigenvalues of the integrated Eq. (4) in the limit of large integration times. Apart from very simple iterated maps, no analytic expressions for the Lyapunov exponents of chaotic systems can be given. For the calculation of the iterated Jacobian matrix and its eigenvalues we use the QR decomposition algorithm [11, 12]. With the matrix

\[
A(x^*(t)) := \mathbb{1} + \Delta t \left.\frac{df}{dx}\right|_{x^*(t)} = Q(x^*(t)) \cdot R(x^*(t)) ,
\]

where \mathbb{1} is the unit (9 × 9)-matrix, we have

\[
\Delta x(t_0 + (n+1)\Delta t) = A_n \cdot A_{n-1} \cdots A_0 \cdot \Delta x(t_0)
= Q_n \cdot R_n \cdot R_{n-1} \cdots R_0 \cdot \Delta x(t_0) \qquad (5)
\]

for A_n = A(x(t_0 + n Δt)). From R_n · R_{n−1} ⋯ R_0 = ∏_{ν=0}^{n} R_ν, with the diagonal elements r_{ii}(ν) of the upper triangular matrices R_ν, the Lyapunov exponents are given for large t = n Δt by

\[
\lambda_i(t) = \frac{1}{n\,\Delta t}\,\ln\left(\prod_{\nu=0}^{n} |r_{ii}(\nu)|\right) . \qquad (6)
\]

Plots of λ_i as a function of time t = n Δt are given in [2]. For small integration times the Lyapunov exponents change a lot along the attractor, but soon settle towards their final size, still showing small oscillations. For long integration times these oscillations also disappear, giving reliable values for the infinite time limit λ_i = lim_{t→∞} λ_i(t).

Fig. 1 a) shows the largest four Lyapunov exponents as a function of φ, as previously described [2]. We observe that for small φ, up to 0.1, all four Lyapunov exponents are negative, indicating the stable fixed point solution. Then follows a region up to φ = 0.5 where the largest Lyapunov exponent is zero, characteristic of stable limit cycles. Above φ = 0.5 a positive Lyapunov exponent, clearly separated from the second largest Lyapunov exponent being zero, indicates deterministically chaotic attractors. In the chaotic window between φ = 0.5 and φ = 1 also periodic windows appear, giving a zero largest Lyapunov exponent. These findings are in good agreement with the numerical bifurcation diagram, and we will further investigate this bifurcation structure in the next section.

4 High resolution Lyapunov spectra

The calculation of the Lyapunov spectrum as shown in [2] needed around seven hours on an average desktop/laptop. Fig. 1 a) was already calculated with a parallelized C program and was achieved in less than half an hour on an eight-processor parallel computer. The parallelization of the Lyapunov spectrum calculation then gives the opportunity of using longer integration times, as shown in Fig. 1 b).

For the parallelization we used the previously designed C program, which is relatively time consuming due to the QR decomposition on top of the integration along the trajectories for each φ value. The parallelization was done with an open source package on Ubuntu Linux called Open MPI, where MPI stands for "message passing interface".


Figure 1: a) Lyapunov spectrum as previously calculated. b) Lyapunov spectrum with higher time resolution on a parallel computer shows the zero Lyapunov exponent captured better than in a). (Both panels plot λ_i against φ.)

Surprisingly little additional C code had to be included into the already existing software for the Lyapunov spectrum calculation. Instead of the gcc compiler one now uses the mpicc compiler with the same arguments as before. The compiled program is started with mpirun, specifying the number of parallel processes, as in mpirun -np 101 paralleltestprogram. The number of parallel processes can be larger than the number of physically present processors on the computer. As an additional header file we specify #include <mpi.h>.

Then there are six basic commands for the actual parallelization, such as the initialization with MPI_Init(&argc, &argv); and the ending with MPI_Finalize();.

After the initialization the program has to know the number of processes via MPI_Comm_size(MPI_COMM_WORLD, &numprocs); and the identification rank of each process via MPI_Comm_rank(MPI_COMM_WORLD, &myid);; rank zero is often used as the master process.

Now we can start the actual calculation by running the parameter φ-loop in the rank zero process, giving to each subprocess with rank 1 to, say, 100 its respective φ-value via MPI_Send(&phi, 1, MPI_DOUBLE, iphi, TAG, MPI_COMM_WORLD);. Each subprocess receives its φ-value via MPI_Recv(&phi, 1, MPI_DOUBLE, 0, TAG, MPI_COMM_WORLD, &stat); and, after calculating the Lyapunov spectrum in an ordinary C subroutine Lyapunovspektrum();, which is the old sequential program, delivers back the result of the 9 Lyapunov exponents via MPI_Send(lyap, MAX, MPI_DOUBLE, 0, TAG, MPI_COMM_WORLD);.

The master process with rank zero then receives back the result via MPI_Recv(lyap, MAX, MPI_DOUBLE, iphi, TAG, MPI_COMM_WORLD, &stat); and finally prints the result to a file for graphical representation, as shown in Fig. 1.

A few parameters used in the six parallelization commands MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Send, MPI_Recv and MPI_Finalize have to be declared, in our case as usual int numprocs; and int myid;, as well as the MPI-specific declaration MPI_Status stat;, and finally in the definition of the main C routine int main(int argc, char *argv[]).

Figure 2: Lyapunov spectrum for the seasonally forced system (λ_i against φ). The values of the largest Lyapunov exponent are much larger in most parameter regions than in the autonomous system without seasonal forcing. Also, more than one Lyapunov exponent is larger than zero in wide parameter regions.

Since the parallel computer runs Linux, no tools other than gnuplot are needed for the graphical representation, exactly as in the sequential program. As a quality test, in Fig. 1 b) with the higher resolution (besides the immense decrease in computation time) we observe that the zero-value Lyapunov exponent for limit cycles, tori and chaotic attractors is indeed numerically very close to zero, better than in a).
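Putting the six commands above together, a minimal master/worker sketch of the parallel φ-loop could look as follows; Lyapunovspektrum() stands for the existing sequential routine (its actual interface may differ), and the number of φ-values, error handling and output formatting are simplified.

```c
#include <stdio.h>
#include <mpi.h>

#define MAX  9      /* number of Lyapunov exponents                      */
#define TAG  1
#define NPHI 100    /* number of phi values, one per subprocess          */

void Lyapunovspektrum(double phi, double *lyap);  /* old sequential code (interface assumed) */

int main(int argc, char *argv[])
{
    int numprocs, myid;
    double phi, lyap[MAX];
    MPI_Status stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (numprocs < NPHI + 1) {                /* expects e.g. mpirun -np 101 */
        if (myid == 0) fprintf(stderr, "need %d processes\n", NPHI + 1);
        MPI_Finalize();
        return 1;
    }

    if (myid == 0) {                          /* master: distribute phi, collect results */
        for (int iphi = 1; iphi <= NPHI; iphi++) {
            phi = (double)iphi / NPHI;
            MPI_Send(&phi, 1, MPI_DOUBLE, iphi, TAG, MPI_COMM_WORLD);
        }
        for (int iphi = 1; iphi <= NPHI; iphi++) {
            MPI_Recv(lyap, MAX, MPI_DOUBLE, iphi, TAG, MPI_COMM_WORLD, &stat);
            printf("%g", (double)iphi / NPHI);
            for (int i = 0; i < MAX; i++) printf(" %g", lyap[i]);
            printf("\n");                     /* one line per phi, for gnuplot */
        }
    } else if (myid <= NPHI) {                /* worker: one phi value per rank */
        MPI_Recv(&phi, 1, MPI_DOUBLE, 0, TAG, MPI_COMM_WORLD, &stat);
        Lyapunovspektrum(phi, lyap);          /* sequential Lyapunov spectrum   */
        MPI_Send(lyap, MAX, MPI_DOUBLE, 0, TAG, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```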

5 First Lyapunov spectrum for seasonal forcing

With the presented tool we can now start to efficiently investigate the seasonally forced multi-strain epidemiological system. For the seasonal forcing to be inserted into the system (1) we use

\[
\beta = \beta(t) = \beta_0 \cdot \bigl(1 + \eta \cdot \cos(\omega\,(t + \varphi))\bigr) \qquad (7)
\]

with ω = 2π/T and T = 1 year, as time series data for dengue, e.g. from Thailand, suggest [9, 10].
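In code, the forcing (7) amounts to replacing the constant β in the right-hand side by a one-line function of t; with β0 = 2γ, η = 0.1 and T = 1 year as used here, an illustrative version reads:

```c
#include <math.h>

/* Seasonally forced infection rate, Eq. (7):
   beta(t) = beta0 * (1 + eta * cos(omega * (t + phase))), with omega = 2*pi/T. */
double beta_of_t(double t, double beta0, double eta, double phase)
{
    const double pi = 3.14159265358979323846;
    const double T = 1.0;                 /* period: 1 year */
    const double omega = 2.0 * pi / T;
    return beta0 * (1.0 + eta * cos(omega * (t + phase)));
}
```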

A first test with seasonality of strength η = 0.1 reveals that the deterministic chaos does not disappear in eventual mode locking but is even enhanced; see the larger positive Lyapunov exponents for the seasonally forced system in Fig. 2 as opposed to the autonomous case in Fig. 1. A first analysis in AUTO shows, for the present parameter values in the seasonally forced system, a torus bifurcation at φ = 0.1145293, hence near the previously observed Hopf bifurcation in the autonomous system at φ = 0.1132861 [2]. Bifurcation software like AUTO cannot easily see beyond the torus bifurcation, which happens quite early in the relevant φ-region. All the more important are fast calculable Lyapunov spectra as demonstrated here. Further, more detailed analyses of bifurcation structures will be done in order to see which parameter combinations most likely capture the dynamical behaviour of empirically observed time series of dengue fever.

Acknowledgements

This work has been supported by the European Union under the EPIWORK project and by FCT, Portugal. We thank Luis Sanchez, Lisbon, for scientific support.

References

[1] M. Aguiar and N. Stollenwerk, A new chaotic attractor in a basic multi-strain epidemiological model with temporary cross-immunity, arXiv:0704.3174v1 [nlin.CD] (2007) (accessible electronically at http://arxive.org).
[2] M. Aguiar, B.W. Kooi and N. Stollenwerk, Epidemiology of dengue fever: A model with temporary cross-immunity and possible secondary infection shows bifurcations and chaotic behaviour in wide parameter regions, Math. Model. Nat. Phenom. 3 (2008) 48–70.
[3] M. Aguiar, N. Stollenwerk and B.W. Kooi, Torus bifurcations, isolas and chaotic attractors in a simple dengue model with ADE and temporary cross immunity, in Proceedings of the 8th Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2008, ISBN 978-84-612-1982-7 (2008).
[4] M. Aguiar, N. Stollenwerk and B.W. Kooi, Torus bifurcations, isolas and chaotic attractors in a simple dengue fever model with ADE and temporary cross immunity, accepted for publication in Intern. Journal of Computer Mathematics (2009).
[5] S.B. Halstead, Neutralization and antibody-dependent enhancement of dengue viruses, Advances in Virus Research 60 (2003) 421–67.
[6] N. Ferguson, R. Anderson and S. Gupta, The effect of antibody-dependent enhancement on the transmission dynamics and persistence of multiple-strain pathogens, Proc. Natl. Acad. Sci. USA 96 (1999) 790–94.
[7] L. Billings, B.I. Schwartz, B.L. Shaw, M. McCrary, D.S. Burke and T.A.D. Cummings, Instabilities in multiserotype disease models with antibody-dependent enhancement, Journal of Theoretical Biology 246 (2007) 18–27.
[8] H.J. Wearing and P. Rohani, Ecological and immunological determinants of dengue epidemics, Proc. Natl. Acad. Sci. USA 103 (2006) 11802–11807.
[9] B. Cazelles, M. Chavez, A.J. McMichael and S. Hales, Nonstationary influence of El Niño on the synchronous dengue epidemics in Thailand, PLoS Medicine 4 (2005) 313–318.
[10] Y. Nagao and K. Koelle, Decreases in dengue transmission may act to increase the incidence of dengue hemorrhagic fever, Proc. Natl. Acad. Sci. USA 105 (2008) 2238–2243.
[11] J.P. Eckmann, S. Oliffson-Kamphorst, D. Ruelle and S. Ciliberto, Liapunov exponents from time series, Phys. Rev. A 34 (1986) 4971–9.
[12] U. Parlitz, Identification of true and spurious Lyapunov exponents from time series, Int. J. Bif. Chaos 2 (1992) 155–165.
[13] M. Recker, K.B. Blyuss, C.P. Simmons, T.T. Hien, B. Wills, J. Farrar and S. Gupta, Immunological serotype interactions and their effect on the epidemiological pattern of dengue, Proc. R. Soc. B, published online doi:10.1098/rspb.2009.0331 (2009).
[14] E.J. Doedel, R.C. Paffenroth, A.R. Champneys, T.F. Fairgrieve, Y.A. Kuznetsov, B. Sandstede, B. Oldeman, X.J. Wang and C. Zhang, AUTO 07P – Continuation and bifurcation software for ordinary differential equations, Technical Report, Concordia University, Montreal, Canada (2007) (accessible electronically at http://indy.cs.concordia.ca/auto/).
[15] Y.A. Kuznetsov, Elements of Applied Bifurcation Theory, Applied Mathematical Sciences 112, Springer-Verlag, 3rd edition, New York, 2004.
[16] D. Ruelle, Chaotic Evolution and Strange Attractors, Cambridge University Press, Cambridge, 1989.
[17] E. Ott, Chaos in Dynamical Systems, Cambridge University Press, Cambridge, 2002.

Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009, 30 June, 1–3 July 2009.

Perfect secrecy in Semi-Linear Key Distribution Scheme

Sergey P. Strunkov1 and Sergio Sanchez2

1 Scientific Research Institute for System Analysis of the Russian Academy of Sciences (SRISA), Moscow, Russia

2 Dept. of Applied Mathematics, University Rey Juan Carlos, Madrid, Spain

emails: [email protected], [email protected]

Abstract

In this paper we study a certain extension of an encryption scheme of Blom type [3] and investigate its theoretical security.

Key words: Key distribution, Blom type encryption scheme, key distribution schemes, compromise

MSC 2000: 94A60,11T71,14G50

1 Introduction

In various multiuser systems, the need for generating a secret common to a certain subset of the users arises occasionally. Such a key can be used, for example, in establishing a secure private key cryptosystem among the members of the subset.

It is sometimes required that user pairs in a network share secret information to be used for mutual identification or as a key in a cipher system.

One of the most widely known cryptosystems of this type for a subset of two users is the public key cryptosystem of Diffie-Hellman [2], in which the common key is an element of a cyclic group of high order. The Diffie-Hellman cryptosystem is a non-communicating key distribution scheme in the sense that for the calculation of the common key the users do not have to exchange any additional information. In the Diffie-Hellman system the common key of two users is not accessible to the enemy because its calculation requires solving a difficult mathematical problem; for example, this difficult problem can be the discrete logarithm problem in the multiplicative group of a finite field or in the additive group of points of an elliptic curve over a finite field.

The system that we study in this paper is also of non-communicating type. Contrary to the Diffie-Hellman cryptosystem, the impossibility of accessing the common key of a subset of users in this system is absolute; in other words, it is a perfectly secure secret system. More exactly, if a coalition of corrupt users is not too large, then the information about the common key of a subset of users available to the enemy, in the case that the enemy knows the secret keys of all corrupt users, coincides with the a priori information about this key. In other words, the a posteriori information for the enemy, given the key information of some coalition of corrupt users, coincides with the a priori information about this key, so the key is perfectly secure. In this way the cryptographic security of a Diffie-Hellman system is relative, while the cryptographic security of the system analyzed here is absolute.

On the other hand, if the network is large it becomes impractical or even impossible to store all keys securely at the users. A natural solution is then to supply each user with a relatively small amount of secret data from which he can derive all his keys. However, as all keys will be generated from a small amount of data, dependencies between keys will exist. Therefore, by cooperation, users in the system might be able to decrease their uncertainty about keys they should not have access to.

Blom [3, 4] was the first to consider non-communicating schemes for conferences of size 2 and coalitions of size b.

In our paper a semi-linear key system [1] is analyzed, a special case of which is Blom's construction [3, 4]. At present at least one essential generalization of Blom's construction is known [5, 6]. We present in our paper another generalization of Blom's results, which is also different from the generalization of the papers [5, 6].

We want to state that although the basic result in Blom's papers is true, we cannot consider his justification complete. The results presented in our paper are expected to fill this gap.

2 Definition of Key Distribution Scheme

In this section we present a formal definition of a key distribution scheme. We start with:

1. the set of users Q, |Q| = m, Q = {a_1, ..., a_m};

2. the set of secret keys K generated by a certain trusted center. Each key in the analyzed key distribution system is an element of the linear span L(K) of the keys over a finite field F_q. Moreover, it can be understood that each element of K is a linearly independent element of a certain space F_q^N with a large value of N, so K ⊂ F_q^N;

3. a public algorithm A that allows each user a_i to calculate the common secret key with the user a_j;

4. it is supposed that in the system there exists an unknown coalition of corrupt users T_k = {a_1, ..., a_k} that tries to calculate the common keys of all the other users T = {a_1, ..., a_t};

5. with the coalition T_k we associate the subset of compromised secret keys K(T_k) = ∪_{a ∈ T_k} K_a and its linear span L(K(T_k));

6. it is said that the key distribution system is stable against a coalition T_k = {a_1, ..., a_k} if, for any T = {a_1, ..., a_t} such that T ∩ T_k = ∅, the mutual keys verify

d_{ij} ∉ L(K(T_k)).

3 The results

In the paper [3] examples are shown (for small values of the parameter N, the number of users) of cryptosystems in which each user i in the network has a public key X_i = (x_{i1}, ..., x_{iN}) and a secret key Y_i = (y_{i1}, ..., y_{iN}). The mutual key d_{ij} of users i and j is calculated in the following way:

\[
d_{ij} = \sum_{k=1}^{N} x_{ik}\, y_{jk} \qquad (1)
\]

where the keys X_i and Y_i are chosen so that the usual agreement conditions are verified:

\[
d_{ij} = d_{ji} \quad \text{for all } i \text{ and } j. \qquad (2)
\]

In this paper a cryptosystem of more general type than in the paper [3] is presented, in which the mutual keys verify the agreement condition (2), and we carry out an analysis of its stability with respect to compromises. Here by a compromise threshold we understand the number of secret keys |K(T_k)| that is necessary to determine in a unique way all common keys of all users in the network.

Let 1, 2, ..., m be the numbers (indices) of all users in the network. Each user has a public key X_i = (x_{i1}, ..., x_{in}) and a secret key Y_i = (y_{i1}, ..., y_{iN}), which are vectors with n and N components, respectively. The users i and j generate a common key

\[
d_{ij} = d_{ji} = \Phi(Y_i, X_j) = \Phi(Y_j, X_i) \qquad (3)
\]

where Φ(Y, X) is a function of the arguments x_{i1}, ..., x_{in}, y_{i1}, ..., y_{iN} that is a polynomial of first order in the variables y_{i1}, ..., y_{iN}.

These cryptosystems can be implemented by introducing into the key distribution center, as the public key of a user i, the set x_{i1}, ..., x_{in}. It is possible to construct reliable cryptosystems with relatively small values of n, even with n = 1. This diminishes the storage volume of the distribution center. This volume is defined by the product of n and m, and therefore it does not depend on N, which is the only parameter related to the compromise threshold.

Let us begin the construction of these functions Φ(Y, X) and the analysis of the cryptosystems based on them. We consider all the numeric values as belonging to an arbitrary field F_q, and on this field F_q we do not impose any special restriction. In practical applications of such cryptosystems one habitually takes finite fields F_q of sufficiently large order (q ≫ n, N).

Let (X) = (z_{ij}) be a matrix of order m × N in which

\[
z_{ij} =
\begin{cases}
x_{ij}, & 1 \le j \le n \\
f_{ij}(x_{i1}, \dots, x_{in}), & n+1 \le j \le N
\end{cases}
\qquad (4)
\]

where f_{ij}(x_{i1}, ..., x_{in}) is a function with values in F_q. Or, in extended form,

\[
(X) =
\begin{pmatrix}
x_{11} & \dots & x_{1n} & f_{1(n+1)}(x_{11}, \dots, x_{1n}) & \dots & f_{1N}(x_{11}, \dots, x_{1n}) \\
x_{21} & \dots & x_{2n} & f_{2(n+1)}(x_{21}, \dots, x_{2n}) & \dots & f_{2N}(x_{21}, \dots, x_{2n}) \\
\vdots &       & \vdots & \vdots                            &       & \vdots \\
x_{m1} & \dots & x_{mn} & f_{m(n+1)}(x_{m1}, \dots, x_{mn}) & \dots & f_{mN}(x_{m1}, \dots, x_{mn})
\end{pmatrix}
\qquad (5)
\]

The functions f_{ij}(x_{i1}, ..., x_{in}) in each row of the matrix are chosen in such a way that in the matrix (X) any system of N or fewer different rows is linearly independent. In this case we will say that the matrix (X) is of general type. In particular, when n = 1 we can take f_{ik} = x_{i1}^{k-n+1} if the field F_q is sufficiently large (q a big number). Let S be an arbitrary symmetric non-singular matrix of dimension N × N. We could take S to be the unit matrix, S = E_N. In this case (X) S (X)^t is a symmetric matrix of dimension m × m. Let us take an arbitrary matrix A of dimension m × m such that the matrix D = A (X) S (X)^t is symmetric. Matrices of this type exist: we could take A to be an arbitrary symmetric matrix permutable with the matrix (X) S (X)^t, in particular the value of a random polynomial in the matrix (X) S (X)^t. Let (Y) = A (X); then the expression (Y) S (X)^t is a symmetric matrix of dimension m × m, and therefore

\[
d_{ij} = \bigl((Y)\,S\,(X)^t\bigr)_{ij} = \Phi(Y_i, X_j) = \Phi(Y_j, X_i) = d_{ji} \qquad (6)
\]

where Y_i = (y_{i1}, ..., y_{iN}) is the i-th row of the matrix (Y) = A (X). So the condition (2) is verified, which is a basic requirement for any cryptosystem of this type. The system built in this way we will call a Blom type cryptosystem with the parameters m, n, N, S, (X), A.

Let us move now to the estimation of the stability of the cryptosystem with respect to compromises when S = E_N. Let us suppose that the number of users m is not smaller than N. Let us show that the cryptosystem built above, when S = E_N, is stable with respect to N − 1 compromises; in other words, the knowledge of a number of secret keys smaller than N does not allow one to recover in a unique way the matrix of mutual keys. As the matrix S = E_N is non-singular, it is easy to show that the rank of the matrix (X) S (X)^t in this case is also equal to N. In this case all sub-matrices (X)_N S (X)_N^t of dimension N × N are non-singular, where (X)_N is an arbitrary sub-matrix formed with N rows of the matrix (X). Moreover, from the equality

\[
(Y) = A\,(X) = D_N \bigl(S\,(X)_N^t\bigr)^{-1} \qquad (7)
\]

where D_N = (Y)\,S\,(X)_N^t is a sub-matrix of the matrix D of dimension m × N, it is deduced that the matrix D univocally determines the matrix (Y). In other words, the knowledge of the matrix of mutual keys is equivalent to knowing the matrix (Y) of all secret keys. Here (X)_N is the non-singular sub-matrix of the matrix (X) of dimension N × N.

In the following theorem we will prove that the system built in this way is stable with respect to N − 1 compromises, and therefore the knowledge of a number of secret keys smaller than N does not allow the recovery in a unique way of the matrix of mutual keys of the users.

Theorem 1. For any k (k < N) given users there exist different cryptosystems of Blom type with given parameters m, n, S = E_N, (X) in which each one of these users has the same secret keys and each one of the other users has different ones.

Proof. We can suppose that these users have the numbers 1, 2, ..., k. Let A_0 be a certain matrix that realizes the cryptosystem with the given parameters m, n, N, (X); in other words, a matrix such that the matrix D = A_0 (X) S (X)^t of mutual keys is symmetric. Let B be a certain matrix of rank N − k and dimension N × N which is permutable with the matrix S, and such that in the matrix (X) B all the elements in the first k rows are equal to zero. Such a matrix B exists. Indeed, let (X)_k be the sub-matrix of the matrix (X) formed by its first k rows, and let C be an arbitrary matrix of maximal rank and dimension N × N such that (X)_k C = 0. We also require that some N − k columns of the matrix C be linearly independent and the remaining k columns be zero columns. Evidently the rank of the matrix C is N − k. So B = C C^t is the matrix that is looked for. Notice that the matrix (X) B S (X)^t is symmetric because

\[
\bigl((X)\,B\,S\,(X)^t\bigr)^t = (X)\,S^t B^t\,(X)^t = (X)\,S\,B\,(X)^t = (X)\,B\,S\,(X)^t \qquad (8)
\]

and all the elements in the first k rows of this matrix are zeroes, since they are elements of the first k rows of the matrix (X)B. But it is important for us that (X_r)B ≠ 0 for any row (X_r) whose number r is bigger than k. This is deduced from the fact that (X) by construction is a matrix of general type and therefore the row (X_r) cannot be expressed linearly by means of the first k rows of the matrix (X). Moreover, the rank of B is also N − k. Let us now take some matrix A_1 of dimension m × m that verifies

\[
A_1\,(X) = (X)\,B . \qquad (9)
\]

The matrix A_1 also exists. To be convinced of it, it is enough to analyze the set of equations

\[
(A_1)_i\,(X) = \bigl((X)B\bigr)_i \qquad (10)
\]

obtained from the relationship (9) for the coefficients of an arbitrary row (A_1)_i of the matrix A_1. In this expression ((X)B)_i means the i-th row of the matrix (X)B. The rank of the matrix of the system (10) is N, equal to the number of equations; therefore for each row i of the matrix A_1 the system of equations (10) is solvable, and as a consequence the existence of the matrix A_1 follows. Let A be the matrix of the form A = A_0 + A_1. Then the matrix A (X) S (X)^t is symmetric since it is represented as a sum of two symmetric matrices:

\[
A\,(X)\,S\,(X)^t = A_0\,(X)\,S\,(X)^t + A_1\,(X)\,S\,(X)^t = A_0\,(X)\,S\,(X)^t + (X)\,B\,S\,(X)^t . \qquad (11)
\]

The secret keys of the first k users in the coding systems with the matrix A and with the matrix A_0 coincide, since

\[
A\,(X) = A_0\,(X) + A_1\,(X) = A_0\,(X) + (X)\,B . \qquad (12)
\]

Indeed, we have shown above that the first k rows of the matrix (X)B are null, while the rows with numbers bigger than k are not; therefore the first k rows of the matrices of secret keys A(X) and A_0(X) coincide, while the remaining rows differ. This completes the proof.

A cryptosystem of similar form can be designed in a somewhat different way. Let (X) and S be the matrices as above and A an arbitrary symmetric matrix of dimension N × N permutable with the matrix S. In particular, if S = E_N then A can be any symmetric matrix of dimension N × N. As the system of secret keys we take the matrix (Y) = (X) A, and the mutual key of the users i and j we define by means of the element (i, j) of the matrix of mutual keys D = (X) A S (X)^t. It is easy to show that the matrix D is symmetric, and then the relationship (2) is verified in this case. The proof of stability of this cryptosystem against N − 1 compromises is carried out in a similar way as for the cryptosystem described above.
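As a small numerical illustration of this second construction with S = E_N, the sketch below builds Vandermonde-type public keys (the n = 1 case of a matrix of general type), draws a random symmetric N × N matrix A, stores the rows of (Y) = (X)A as secret keys, and checks that the mutual keys d_ij = Y_i · X_j mod q satisfy d_ij = d_ji; the field size q = 101 and all names are ours.

```c
#include <stdio.h>
#include <stdlib.h>

#define Q  101   /* small prime field F_q, for illustration only */
#define M  5     /* number of users                              */
#define NN 3     /* key length N                                 */

int main(void)
{
    long X[M][NN], A[NN][NN], Y[M][NN];

    /* Public keys: rows of a Vandermonde-type matrix of general type (n = 1). */
    for (int i = 0; i < M; i++) {
        long xi = i + 2, p = 1;                /* distinct nonzero field elements */
        for (int j = 0; j < NN; j++) { X[i][j] = p % Q; p = (p * xi) % Q; }
    }
    /* Random symmetric matrix A (the trusted center's secret). */
    for (int i = 0; i < NN; i++)
        for (int j = 0; j <= i; j++)
            A[i][j] = A[j][i] = rand() % Q;
    /* Secret keys: (Y) = (X) A mod q, one row per user. */
    for (int i = 0; i < M; i++)
        for (int j = 0; j < NN; j++) {
            long s = 0;
            for (int k = 0; k < NN; k++) s += X[i][k] * A[k][j];
            Y[i][j] = s % Q;
        }
    /* Mutual keys d_ij = Y_i . X_j mod q; check the agreement d_ij = d_ji. */
    for (int i = 0; i < M; i++)
        for (int j = 0; j < M; j++) {
            long dij = 0, dji = 0;
            for (int k = 0; k < NN; k++) { dij += Y[i][k] * X[j][k]; dji += Y[j][k] * X[i][k]; }
            if (dij % Q != dji % Q) { printf("asymmetry at (%d,%d)!\n", i, j); return 1; }
        }
    printf("all mutual keys agree: d_ij = d_ji\n");
    return 0;
}
```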

Acknowledgements

This work has been partially supported by the Russian Foundation for Basic Research, grant 09-01-00287-a.

References

[1] Strunkov S., Podufalov N., Sanchez S., On polylinear cryptosystems (in Russian), Bezopasnost Informatsionnykh Tekhnologiy 1(49) (2006) 86–88.
[2] Diffie W., Hellman M.E., New directions in cryptography, IEEE Transactions on Information Theory 22(6) (1976) 644–654.
[3] R. Blom, Non-public key distribution, in Advances in Cryptology: Proceedings of Crypto 82, pp. 231–236.
[4] R. Blom, On an optimal class of symmetric key generation systems, Eurocrypt '84 (1984) 335–338.
[5] Blundo C., De Santis A., Herzberg A., Kutten S., Vaccaro U., Yung M., Perfectly-secure key distribution for dynamic conferences, Advances in Cryptology - CRYPTO '92 (1992) 471–486.
[6] Blundo C., Mattos L.A.F., Stinson D.R., Trade-offs between communication and storage in unconditionally secure schemes for broadcast encryption and interactive key distribution, Advances in Cryptology - CRYPTO '96 (1996) 387–400.
[7] Beimel A., Chor B., Communication in key distribution schemes, IEEE Trans. on Inform. Theory 42(1) (1996) 19–28.
[8] N.D. Podufalov, S.P. Strunkov, On Difference Sets in Finite Groups, Proceedings, 12th International Conference FPSAC '00, Moscow, June 2000, pp. 28–29.

Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

Numerical Approximation of Forward-Backward

Differential Equations by a Finite Element Method

M.F. Teodoro1,2, P.M. Lima2, N.J. Ford3 and P.M. Lumb3

1 Departamento de Matemática, EST, Instituto Politécnico de Setúbal, Portugal

2 CEMAT, Instituto Superior Técnico, UTL, Lisboa, Portugal

3 Department of Mathematics, University of Chester, CH1 4BJ, Chester, UK

emails: [email protected], [email protected], [email protected],[email protected]

Abstract

This paper is concerned with the approximate solution of a linear first-order functional differential equation which involves forward and backward deviating arguments. We search for a solution x, defined for t ∈ [−1, k] (k ∈ ℕ), which takes given values on the intervals [−1, 0] and (k − 1, k]. Continuing the work started in [13], [14] and [8], we introduce and analyse a new computational algorithm based on the finite element method for the solution of this problem, which is applicable both in the case of constant and variable coefficients. Numerical results are presented and compared with the results obtained by other methods.

Key words: Mixed-type functional differential equation, method of steps, collocation method, least squares, finite element, splines

MSC 2000: 34K06; 34K10; 34K28; 65Q05

1 Introduction

This paper is devoted to the numerical solution of mixed-type functional differential equations (MTFDE). We consider equations of the type

\[
x'(t) = \alpha(t)\,x(t) + \beta(t)\,x(t-1) + \gamma(t)\,x(t+1), \qquad (1)
\]

where x is the unknown function and α, β and γ are known functions. The interest in this type of equation is motivated by its applications in optimal control [11], economic dynamics [12], nerve conduction [2] and travelling waves in a spatial lattice [1]. Important contributions to their analysis have appeared in the literature in the two last decades of the past century. The ill-posedness of MTFDEs has been discussed by Rustichini in


1989 [11], where he considered linear autonomous equations. The same author extended his results to nonlinear equations [12]. J. Mallet-Paret applied the Fredholm theory to obtain new results for this class of equation [9] and introduced the idea of factorization of their solutions [10]. Independently, the authors of [6] have obtained results about factorization for the linear non-autonomous case. On the other hand, Krisztin [7] has analysed the roots of the characteristic equation of linear systems of MTFDE, which has led him to important results on the qualitative behaviour of their solutions. In particular, he has shown that a MTFDE may have a nonoscillatory solution in spite of the non-existence of real roots of its characteristic equation (unlike the case of delay-differential equations). Based on the existing insights on the qualitative behaviour of MTFDE, the authors of [3] and [4] have recently developed a new approach to the analysis of these equations in the autonomous case. More precisely, they have analysed MTFDE as boundary value problems, that is, for a MTFDE of the form (1), but with constant coefficients α, β and γ, they have considered the problem of finding a differentiable solution on a certain real interval [−1, k], given its values on the intervals [−1, 0] and (k − 1, k]. They have concluded that in general the specification of such boundary functions is not sufficient to ensure that a solution can be found. For the case where such a solution exists they have introduced a numerical algorithm to compute it. This approach was further developed in [13], where new numerical methods were proposed for the solution of such boundary value problems. In [14] and [8] these methods were extended to the non-autonomous case (when α, β and γ are smooth functions of t). In [5], we have decomposed the solution into a growing and a decaying component, following the ideas of J. Mallet-Paret and Verduyn-Lunel [10]. By approximating each of the components separately, we partially overcome the ill-conditioning of the problem, which enables us to improve the accuracy of the numerical results and to obtain numerical solutions on much larger intervals.

In [8] a theoretical basis was given to the computational methods, by relating them to the existing analytical results about MTFDE and to classical results of numerical analysis.

In the present paper we give an overview of the numerical methods described in [8], [13], [14] and introduce a new algorithm, based on the finite element method. A comparative analysis of all these algorithms is provided.

2 Preliminaries

2.1 Method of Steps

The purpose is to compute a particular solution of equation (1) which satisfies

\[
x(t) =
\begin{cases}
\varphi_1(t), & \text{if } t \in [-1, 0], \\
f(t), & \text{if } t \in (k-1, k],
\end{cases}
\qquad (2)
\]

where ϕ_1 and f are smooth real-valued functions, defined on [−1, 0] and (k − 1, k], respectively (1 < k ∈ ℕ). In order to analyze and solve this boundary value problem


(BVP) we consider an initial value problem (IVP), with the conditions:

x(t) = ϕ(t), t ∈ [−1, 1], (3)

where the function ϕ is defined by

\[
\varphi(t) =
\begin{cases}
\varphi_1(t), & \text{if } t \in [-1, 0], \\
\varphi_2(t), & \text{if } t \in (0, 1].
\end{cases}
\qquad (4)
\]

This reformulation provides the basis for both analytical and numerical construction of solutions using ideas based on Bellman's method of steps for solving delay differential equations. One solves the equation over successive intervals of length unity. Assuming that γ(t) ≠ 0, ∀t ≥ 0, equation (1) can be rewritten in the form

\[
x(t+1) = a(t)\,x'(t) + b(t)\,x(t-1) + c(t)\,x(t), \qquad (5)
\]

where

\[
a(t) = \frac{1}{\gamma(t)}, \qquad b(t) = -\frac{\beta(t)}{\gamma(t)}, \qquad c(t) = -\frac{\alpha(t)}{\gamma(t)}. \qquad (6)
\]

Using formula (5) repeatedly, we can construct a smooth solution of equation (1) on any interval [1, k], starting from its definition on [−1, 1] by formula (4). Continuing this process, we can extend the solution to any interval, provided that the initial function ϕ and the variable coefficients are smooth enough functions. The next theorem formulates this result in more precise terms.

Theorem 2.1 Let x be the solution of problem (1), (4), where α(t), β(t), γ(t) ∈ C^{2L}([−1, 2L+1]), γ(t) ≠ 0 for t ∈ [−1, 2L+1], ϕ_1(t) ∈ C^{2L}([−1, 0]), ϕ_2(t) ∈ C^{2L}([0, 1]) (for some L ∈ ℕ). Then there exist functions δ_{i,l}, ε_{i,l}, \bar{δ}_{i,l}, \bar{ε}_{i,l} ∈ C([−1, 2L+1]), l = 1, ..., L, i = 0, 1, ..., 2l, such that the following formulae are valid:

\[
\begin{aligned}
x(t) &= \sum_{i=0}^{2l-1} \delta_{i,l}(t)\,\varphi_1^{(i)}(t-2l) + \sum_{i=0}^{2l-1} \epsilon_{i,l}(t)\,\varphi_2^{(i)}(t-2l+1), && t \in [2l-1, 2l]; \\
x(t) &= \sum_{i=0}^{2l} \bar\epsilon_{i,l}(t)\,\varphi_2^{(i)}(t-2l) + \sum_{i=0}^{2l-1} \bar\delta_{i,l}(t)\,\varphi_1^{(i)}(t-2l-1), && t \in [2l, 2l+1];
\end{aligned}
\qquad l = 1, 2, \dots \qquad (7)
\]

A detailed proof can be found in [8]. For the autonomous case, similar results wereobtained in [3].
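For t ∈ [0, 1] every quantity on the right-hand side of (5) is known from ϕ, so the first extension step of the method of steps can be tabulated directly. The sketch below does this, assuming callables for ϕ_1, ϕ_2, ϕ_2' and for the coefficients α, β, γ; identifiers are ours, and further intervals would require the higher derivatives produced recursively as in (7).

```c
/* One step of the method of steps: tabulate x(s) for s = 1 + ih in (1,2],
   using x(t+1) = a(t)x'(t) + b(t)x(t-1) + c(t)x(t) with t = s - 1 in (0,1]. */
typedef double (*fn)(double);

void extend_one_interval(fn alpha, fn beta, fn gamma,
                         fn phi1, fn phi2, fn dphi2,
                         int N, double *xout /* xout[i] ~ x(1 + (i+1)/N) */)
{
    double h = 1.0 / N;
    for (int i = 0; i < N; i++) {
        double t = (i + 1) * h;                 /* t in (0,1]               */
        double a =  1.0      / gamma(t);        /* coefficients (6)         */
        double b = -beta(t)  / gamma(t);
        double c = -alpha(t) / gamma(t);
        xout[i] = a * dphi2(t) + b * phi1(t - 1.0) + c * phi2(t);
    }
}
```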

2.2 Existence and Uniqueness of Solution

Results about existence and uniqueness of solution are found in [10], when analysing mixed-type functional differential equations on a real interval [−τ, τ]. Actually, this BVP can be reformulated (by a simple shift of the independent variable) as a BVP on an interval [−τ − 1, τ + 1], with boundary conditions given on [−τ − 1, −τ] and [τ, τ + 1]. In [8] the results presented in [10] are applied to the analysis of the considered BVP.

To present the necessary conditions for the existence of at least one continuously differentiable solution to a given BVP of the form (1), (2), we begin by considering the ODE approach, introduced in two recent papers [13], [14].


On the interval [−1, 1] the solution of (1), (2) can be written in the form

x(t) = x0(t) + u(t), t ∈ [−1, 1] (8)

where x_0 is an initial approximation of the solution and u is a correction. More precisely, we require that x_0 satisfies the following conditions:

• x_0(t) = ϕ_1(t), ∀t ∈ [−1, 0];

• x_0 is k times continuously differentiable on [−1, 0], x_0(0) = ϕ_1(0) and x_0^{(j)}(0) = ϕ_1^{(j)}(0), j = 1, ..., k;

• x_0 is k − 1 times continuously differentiable on (0, 1], x_0(1) = x(1) and x_0^{(j)}(1) = x^{(j)}(1), j = 1, ..., k − 1 (where x is the required solution of the problem (1), (2)).

Once x_0 is defined, our problem is reduced to the computation of the correction u. First of all note that u(t) ≡ 0, ∀t ∈ [−1, 0] (otherwise x does not satisfy the first boundary condition). Therefore, if we define u on [0, 1[, we can extend it to the whole interval [−1, k] using the method of steps (as described in the previous subsection). Let us denote by u^{[−1,k]} the extension of u(t) to the interval [−1, k]. Then u^{[−1,k]} is defined piece by piece in the following way:

\[
u^{[-1,k]}(t) =
\begin{cases}
0, & \text{if } t \in [-1, 0]; \\
u(t), & \text{if } t \in (0, 1]; \\
u^{[l-1,l]}(t), & \text{if } t \in (l-1, l], \quad l = 2, 3, \dots, k.
\end{cases}
\qquad (9)
\]

We shall now obtain an expression for u^{[l−1,l]}(t). Using (1) repeatedly, and taking into account that u(t) ≡ 0 for t ∈ [−1, 0], we can express u^{[l−1,l]}(t) in terms of u(t) and its derivatives, on any interval (l − 1, l].

In particular, on the interval (k − 1, k], we obtain

\[
u^{[k-1,k]}(t) = L_{k-1}u(t-k+1) := c_{k-1,k}(t)\,u^{(k-1)}(t-k+1) + \dots + c_{0,k}(t)\,u(t-k+1), \qquad (10)
\]

t ∈ (k − 1, k]. Here L_{k−1} denotes a linear differential operator of order k − 1. The c_{ik} are coefficients that can be computed recursively, just as the δ_{ik} and ε_{ik} coefficients in the right-hand side of (7).

Notice that x_0 can also be extended to the interval [−1, k], using the same method. Now, since x must satisfy the second boundary condition in (2), we conclude that

\[
L_{k-1}u(t) = f(t+k-1) - x_0^{[-1,k]}(t+k-1), \qquad t \in [0, 1]. \qquad (11)
\]

Imposing regularity conditions at t = 0 and t = 1, the following boundary conditions must be satisfied:

\[
u(0) = u'(0) = \dots = u^{(k)}(0) = 0; \qquad u(1) = u'(1) = \dots = u^{(k-1)}(1) = 0. \qquad (12)
\]

The number of boundary conditions in (12), 2k + 1, is higher than the order of the considered ODE (11). Therefore, there may not exist a solution of (11) which satisfies all the conditions (12). This is not surprising, since the existence of a solution to the original boundary value problem (1), (2) is also not guaranteed (as discussed in [3]).

When solving the problem (11), (12) we must keep only k − 1 conditions and ignore the remaining ones. The corresponding boundary value problem considering k − 1 boundary conditions is called the reduced BVP.

Finally, as follows from [10] and [8], if a continuously differentiable solution x exists and b(t)c(t) > 0, ∀t ∈ [0, k − 1], then this solution is unique in the space W^{1,p}(0, k).

3 Numerical Methods

3.1 Outline of the methods

We now give an outline of three numerical methods based on the ODE approach,described in the previous section.

According to these methods, we search for an approximate solution of (1) on [−1, 1]in the form

xN (t) = x0(t) +d−1∑

j=0

Cjxj(t), t ∈ [−1, 1] (13)

where x0 is an initial approximation of the solution ; xj0≤j≤d−1 is a basis in thespace of functions where the correction to the initial approximation is sought; d isthe dimension of this space. The algorithms for computing xN consist of three steps:1. constructing the initial approximation; 2. defining a set of basis functions ; 3.computing the Cj coefficients.

1. Constructing the initial approximation.

From formula (5) it follows that if a solution of equation (1) belongs to C^n((l−1, l]) (for certain l ≥ 1, n ≥ 1), then it also belongs to C^{n−1}((l, l+1]). Therefore, since we want x_N to be at least continuous on [−1, k] (for a certain k ≥ 2), we require that x_0 belongs to C^k((−1, 1]). With this in mind, we define x_0 on [−1, 1] in the following way:

\[
x_0(t) =
\begin{cases}
\varphi_1(t), & t \in [-1, 0]; \\
P_{2k}(t) = a_0 + a_1 t + \dots + a_{2k} t^{2k}, & t \in [0, 1].
\end{cases}
\qquad (14)
\]

Since x_0 must be k times continuously differentiable on [−1, 0] and P_{2k} must satisfy regularity conditions at t = 0 and at t = 1, we obtain a linear system of 2k + 1 equations with 2k + 1 unknowns a_0, a_1, ..., a_{2k}. This system has a nonsingular matrix for any k ≥ 2.

Further, x_0 is extended from [−1, 1] to [−1, k] using the recurrence formulae (7). Let us denote by x_0^{[−1,k]} this extension.

2. Definition of a basis set.

With the purpose of computing a correction to the initial approximation on [0, k], we first consider this correction on [0, 1]. Let us define a grid of stepsize h on this interval: h = 1/N (where N ∈ ℕ, N ≥ k + 1) and t_i = ih, i = 0, ..., N. The correction x_N(t) − x_0(t) on [0, 1] will be sought as a k-th degree spline, S_k(t), defined on this grid, which satisfies S_k(0) = S_k(1) = 0. As usual, we will use as basis functions x_j(t) the so-called B-splines of degree k. From the definition it follows that the basis functions have the following properties: x_j ∈ C^{k−1}[0, 1]; x_j(t) is different from zero only in (t_j, t_{j+k+1}); and x_j(t) is a polynomial of degree k on each interval [t_i, t_{i+1}], i = 0, ..., N − 1. Note that we have N − k functions x_j with these properties; therefore, we set d = N − k.

Next the basis functions are extended to the interval [0, k] using the formulae (7), where ϕ_1 is replaced by 0 and ϕ_2 is replaced by x_j. Let us denote the extended basis functions by x_j^{[0,k]}. Each time we extend a basis function to the next interval, the smoothness of the splines decreases by 1 unit (though the polynomial degree remains constant). Therefore, on the interval [k − 1, k] the basis functions are continuous but not continuously differentiable functions. On the whole interval [0, k], the approximate solution is given by

\[
x_N^{[0,k]}(t) = x_0^{[-1,k]}(t) + \sum_{j=0}^{N-k-1} C_j\, x_j^{[0,k]}(t), \qquad t \in [0, k]. \qquad (15)
\]

3. Computation of the Coefficients

Finally, we compute the coefficients C_j, j = 0, . . . , N − k − 1, of the expansion (15) from the condition that x_N approximates f on the interval (k − 1, k]. Three alternative methods were used for this purpose. The collocation and least squares methods are described in [13], [14] and [8]. Here we will describe the finite element method. As in the previous two methods, the coefficients C_j, j = 0, . . . , N − k − 1, are computed by solving a linear system with an (N − k) × (N − k) band matrix. In this last case, the coefficients C_j are obtained from the following orthogonality conditions:

\int_{k-1}^{k} \Big( f(t) − x_0^{[−1,k]}(t) − \sum_{i=0}^{N-k-1} C_i x_i^{[0,k]}(t) \Big) x_j^{[0,k]}(t − k + 1) \, dt = 0,   j = 0, . . . , N − k − 1.

The form of the basis functions leads us to the solution of a system of N − k linear equations with a band matrix A and independent term B, whose generic elements are, respectively,

a_{i+1,j+1} = \int_{k-1}^{k} x_i^{[0,k]}(t) \, x_j^{[0,k]}(t − k + 1) \, dt,   if i, j = 0, . . . , N − k − 1, |i − j| ≤ k,
a_{i+1,j+1} = 0,   if i, j = 0, . . . , N − k − 1, |i − j| > k;

and

b_{j+1} = \int_{k-1}^{k} \big( f(t) − x_0^{[−1,k]}(t) \big) x_j^{[0,k]}(t − k + 1) \, dt,   j = 0, . . . , N − k − 1.


           interval [0,1]       interval [1,2]       interval [2,3]       interval [3,4]
   N       ε          p         ε          p         ε          p         ε          p

  k = 5
   32      2.428e-11  4.60      1.907e-10  4.68      5.979e-9   4.35      2.602e-7   3.36
   64      8.660e-13  4.81      6.638e-12  4.85      2.749e-10  4.44      2.495e-8   3.38
  128      2.899e-14  4.90      2.194e-13  4.92      1.240e-11  4.47      2.311e-9   3.43

  k = 4
   32      1.136e-9   4.08      3.152e-8   3.88      4.786e-7   3.36
   64      6.846e-11  4.05      2.038e-9   3.96      4.456e-8   3.43
  128      4.194e-12  4.03      1.294e-10  3.98      4.044e-9   3.46

  k = 3
   32      8.387e-8   3.87      1.146e-6   3.56
   64      5.468e-9   3.94      9.772e-8   3.55
  128      3.489e-10  3.97      8.436e-9   3.53

Table 1: Numerical results for Example 4.1 with m = 2, by the finite element method. ε = ‖x − x_N‖_2 on each unit interval [l − 1, l], l = 1, . . . , k − 1, k = 3, 4, 5.

Once again, we can remark that the obtained system of linear equations is the same as the one we obtain if we apply the finite element method to the solution of equation (11).
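To make the assembly concrete, the following sketch (illustrative only, not the authors' implementation) builds and solves the band system above. It assumes that f, the extension x_0^{[−1,k]} and the extended basis functions x_j^{[0,k]} obtained from (7) are already available as Python callables f, x0_ext and basis_ext[j]; these names are placeholders.

    # Hedged sketch of step 3 (finite element variant). The callables f, x0_ext and
    # basis_ext[j] are assumed to be given; the band structure is not exploited here.
    import numpy as np
    from scipy.integrate import quad

    def fem_coefficients(f, x0_ext, basis_ext, k):
        d = len(basis_ext)                 # d = N - k unknowns C_0, ..., C_{d-1}
        A = np.zeros((d, d))
        b = np.zeros(d)
        for j in range(d):
            b[j], _ = quad(lambda t: (f(t) - x0_ext(t)) * basis_ext[j](t - k + 1),
                           k - 1, k)
            for i in range(d):
                if abs(i - j) <= k:        # entries with non-overlapping supports vanish
                    A[i, j], _ = quad(lambda t: basis_ext[i](t) * basis_ext[j](t - k + 1),
                                      k - 1, k)
        # the orthogonality conditions read sum_i a_{i+1,j+1} C_i = b_{j+1}, i.e. A^T C = b
        return np.linalg.solve(A.T, b)

In practice one would exploit the band structure (e.g. with a banded solver), but a dense solve keeps the sketch short.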

4 Numerical Results and Conclusions

We present an example of a BVP for an MTFDE (Example 4.1), also used in [8] to test some of the described numerical algorithms. The error norm on each interval [l − 1, l] is the discrete analog of the 2-norm:

‖x_N^{[l,l+1]} − x‖_2 = \Big( h \sum_{i=0}^{N} \big( x_N^{[l,l+1]}(t_{lN+i}) − x(t_{lN+i}) \big)^2 \Big)^{1/2}.

The estimate of the convergence order is obtained, as usual, by

p = \log_2 \big( ‖x − x_N‖_2 / ‖x − x_{2N}‖_2 \big).

Example 4.1 The following example of an autonomous equation was first considered:

x′(t) = (m − 0.5 e^{−m} − 0.5 e^{m}) x(t) + 0.5 x(t − 1) + 0.5 x(t + 1),   (16)

with the boundary conditions ϕ_1(t) = e^{mt}, t ∈ [−1, 0]; f(t) = e^{mt}, t ∈ (k − 1, k], with m ∈ R, m ≠ 0. The exact solution is x(t) = e^{mt}.
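As a quick, purely illustrative sanity check, one can verify numerically that x(t) = e^{mt} indeed satisfies (16):

    # Numerical check (illustrative) that x(t) = exp(m t) satisfies equation (16).
    import numpy as np

    m = 2.0
    t = np.linspace(0.0, 3.0, 7)
    x = lambda s: np.exp(m * s)
    lhs = m * x(t)                                   # exact derivative of exp(m t)
    rhs = (m - 0.5*np.exp(-m) - 0.5*np.exp(m)) * x(t) + 0.5*x(t - 1) + 0.5*x(t + 1)
    print(np.max(np.abs(lhs - rhs)))                 # of the order of round-off error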


                   k = 3                  k = 4                  k = 5
   N               ε          p           ε          p           ε          p

  Finite element
   32              8.1354e-7  3.5663      2.7689e-7  3.3631      1.3016e-7  3.3588
   64              6.9206e-8  3.5553      2.5751e-8  3.4266      1.2473e-8  3.3833
  128              5.9704e-9  3.5350      2.3362e-9  3.4624      1.1555e-9  3.4323

  Least squares
   32              2.8214e-6  3.0032      1.2661e-6  2.9944      4.9214e-7  2.9697
   64              3.5073e-7  3.0080      1.5681e-7  3.0132      6.0840e-8  3.0160
  128              4.3671e-8  3.0056      1.9457e-8  3.0107      7.6719e-9  2.9874

  Collocation
   32              2.1429e-5  2.1185      6.6952e-6  2.1866      1.9745e-6  2.2457
   64              5.1223e-6  2.0647      1.5517e-6  2.1093      4.4122e-7  2.1619
  128              1.2515e-6  2.0331      3.7301e-7  2.0565      1.0392e-7  2.0860

Table 2: Numerical results for Example 4.1 using different methods. ε = ‖x − x_N‖_2/√(k − 1) on the interval [0, k − 1], k = 3, 4, 5, m = 2.

Table 1 contains numerical results for Example 4.1, including error norms and estimates of the convergence order. These values correspond to the finite element method and were computed separately on each interval [l, l + 1], in the case m = 2, for different values of k. In the case k = 3 (resp. k = 4) the estimate of the convergence order is close to 4 in the first interval (resp. first two intervals) and decreases to approximately 3.5 in the interval [k − 2, k − 1]. In the case k = 5, the same estimate is higher than 4 in the first three intervals and less than 4 in [3, 4]. This fall of the estimate is probably connected with the fact that the convergence order in [0, 1] is not the same for the solution and for its higher derivatives. As shown in [8], the convergence order of the solution in the subsequent intervals depends on the convergence order of its derivatives in the first one. In Table 2, the global error norm and the convergence order of the numerical solution are displayed. Here the error norms were computed globally on [0, k − 1], for the three methods: finite element, least squares and collocation. As expected, the highest convergence order was obtained for the finite element method, p ≈ 3.5, and the lowest one corresponds to the collocation method, p ≈ 2. For the least squares method, the estimated global order of convergence is close to 3. The finite element method seems to be the most efficient one, providing a high (≈ 3.5) convergence order and giving results with absolute error less than 6 × 10^{−9} for h = 1/128, when k = 3 and m = 2. In the future, we intend to carry out a more detailed numerical analysis of the presented methods, in particular the least squares and the finite element algorithms.


Acknowledgements

M.F. Teodoro acknowledges support from FCT, grant SFRH/BD/37528/2007. All the authors acknowledge support from the Treaty of Windsor Programme (project B-15/08).

References

[1] K. A. Abell, C. E. Elmer, A. R. Humphries, E. S. Vleck, Computation of mixed type functional differential boundary value problems, SIADS 4, 3 (2005), 755-781.
[2] H. Chi, J. Bell and B. Hassard, Numerical solution of a nonlinear advance-delay-differential equation from nerve conduction theory, J. Math. Biol. 24 (1986), 583-601.
[3] N. J. Ford, P. M. Lumb, Mixed-type functional differential equations: a numerical approach, J. Comp. Appl. Math., available electronically (DOI: 10.1016/j.cam.2008.04.016).
[4] N. J. Ford, P. M. Lumb, Mixed-type functional differential equations: a numerical approach (extended version), Tech. Report 2007:1, Department of Mathematics, University of Chester, 2007.
[5] N. J. Ford, P. M. Lumb, P. M. Lima and M. F. Teodoro, The Numerical Solution of Forward-Backward Differential Equations: Decomposition and Related Issues, submitted.
[6] J. Harterich, B. Sandstede, A. Scheel, Exponential Dichotomies for Linear Non-autonomous Functional Differential Equations of Mixed Type, Indiana University Mathematics Journal 51, 5 (2002), 94-101.
[7] T. Krisztin, Nonoscillation for Functional Differential Equations of Mixed Type, J. Math. Analysis and Applications 245 (2000), 326-345.
[8] P. M. Lima, M. F. Teodoro, N. J. Ford and P. M. Lumb, Analytical and Numerical Investigation of Mixed Type Functional Differential Equations, submitted.
[9] J. Mallet-Paret, The Fredholm alternative for functional differential equations of mixed type, J. Dyn. Diff. Eq. 11 (1999), 1-47.
[10] J. Mallet-Paret and S. M. Verduyn Lunel, Mixed-type functional differential equations, holomorphic factorization and applications, Proc. of Equadiff 2003, Inter. Conf. on Diff. Equations, Hasselt 2003, World Scientific, Singapore (2005), 73-89.
[11] A. Rustichini, Functional differential equations of mixed type: the linear autonomous case, J. Dyn. Diff. Eq. 1, 2 (1989), 121-143.


[12] A. Rustichini, Hopf bifurcation for functional differential equations of mixed type, J. Dyn. Diff. Eq. 1 (1989), 145-177.
[13] M. F. Teodoro, P. M. Lima, N. J. Ford, P. M. Lumb, New approach to the numerical solution of forward-backward equations, Front. Math. China 4, 1 (2009), 155-168.
[14] M. F. Teodoro, N. Ford, P. M. Lima, P. Lumb, Numerical modelling of a functional differential equation with deviating arguments using a collocation method, Inter. Conference on Numerical Analysis and Applied Mathematics, Kos 2008, AIP Proc., vol. 1048 (2008), pp. 553-557.


Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009, 30 June, 1–3 July 2009.

Trends in the formation of aggregates and crystals from M@Si16 clusters. A study from first principle calculations

M. B. Torres1, E. M. Fernandez2, G. Lopez Laurrabaquio3 and L. C. Balbas4

1 Departamento de Matematicas y Computacion, Universidad de Burgos, 09006 Burgos, Spain

2 Instituto de Ciencia de Materiales de Madrid, Consejo Superior de Investigaciones Científicas (CSIC), 28049 Madrid, Spain

3 Instituto Nacional de Investigaciones Nucleares, ININ, 52750 Mexico D.F., Mexico

4 Departamento de Física Teórica, Universidad de Valladolid, E-47011 Valladolid, Spain

emails: [email protected], [email protected], , [email protected]

Abstract

We have shown recently that the ground state and low-lying energy isomers of the endohedral M@Si16 clusters (M = Sc−, Ti, V+) have a nearly spherical cage-like symmetry with a closed-shell electronic structure, which makes them exceptionally stable entities. This is manifested, among other properties, by a large Homo-Lumo gap of about 2 eV, which suggests the possibility of using these clusters as basic units (superatoms) to construct optoelectronic materials. As a first step in that direction, we have studied in this work, by means of first principles calculations, the trends in the formation of [Ti@Si16]n, [Sc@Si16K]n, and [V@Si16F]n aggregates as their size increases, going from linear to planar to three-dimensional arrangements. The more favorable configurations for n ≥ 2 are those formed from the fullerene-like D4d isomer of M@Si16, instead of the ground state Frank-Kasper Td structure of the isolated M@Si16 unit, joined by Si-Si bonds between the Si atoms of the square facets. In all cases the Homo-Lumo gap of the most favorable structure decreases with the size n. Trends for the binding energy, dipole moment, and other electronic properties are also discussed. Several crystal structures constructed from these superatoms, supermolecules, and aggregates have been tested and preliminary results are briefly commented on.



1 Introduction

The interest in the study of small atomic clusters is growing at present, motivated by their potential use as building blocks for new functional materials and devices at the nanoscale. To achieve this goal it is of paramount importance to investigate how the system geometry depends on the interparticle coupling and how it affects the physical properties of the systems. Chemically stable building blocks which interact weakly among themselves and with other clusters of the same material should have a closed electron configuration with a large energy gap between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO). The other important factor determining the cluster stability is the atomic geometry. Thus, the cooperative effects between electronic and geometrical factors can provide a guiding principle for designing stable building-block clusters. In a previous work [1] we determined from first-principles calculations the geometrical and electronic structure of the low-lying energy isomers of M@Sin clusters (M = Sc−, Ti, V+) in the range n = 14-18, see Figure 1. In that work we obtained good agreement with the experimental results obtained by Nakajima and coworkers [2, 3, 4, 5] for the endohedral character and extra stability of M@Si16 clusters, as well as for their electron affinity and Homo-Lumo gap. For the ground state geometry of these clusters we obtained a distorted Frank-Kasper Td structure, in agreement with a previous calculation for Ti@Si16 by Kumar and coworkers [6].

Furthermore, we provided in ref. [1] an interpretation of the electronic structure and orbital projected density of states (PDOS) of M@Si16 clusters in the context of the spherical shell model perturbed by the crystalline field of the underlying ionic geometry. That fact has been confirmed recently by the angular dependent photoelectron spectroscopy experiments of von Issendorff and coworkers [7]. The model rests on the following assumptions:

1. For an empty spherical cage, the O+(3) states have predominantly zero radial nodes: 1s, 1p, 1d, 1f, 1g, 1h, . . .

2. l-selection rule: only those M orbitals transforming in the same irrep of the point group of the cluster can be mixed in a given bonding state.

3. The covalent bonding in M@Si16 results from the hybridization of the empty-cage states and the valence states of the endohedral atom having equal angular momentum l.

Thus, the V@Si16+ cluster has 68 valence electrons (64 from Si and 4 from V+) in the single-particle spherical shells 1s, 1p, 1d, 1f, 2s, 1g, 2p, 2d, which suffer different splittings depending on the ionic geometry. For the ground state Td symmetry, with Homo-Lumo gap 2.25 eV, the splitting is s(a1), p(t2), d(t2+e), f(a2+t1+t2), g(a1+e+t1+t2), h(e+t1+2t2), i(a1+a2+e+t1+2t2), . . ., which can easily be recognized in our calculated PDOS for that isomer [1]. The XAS experiments of ref. [7] show agreement with our PDOS for that Td FK isomer (16-I of Figure 1), but not for that of the D4d f-like geometry of V@Si16+, which is 0.5 eV above the ground state and has a smaller Homo-Lumo gap of 1.50 eV.

The construction of new optoelectronic materials by assembling molecules of the


[Figure 1 omitted: structures of the low-lying isomers of M@Si16 (C3v/FK*, Td/FK, C5v/Penta, D4d/F-like), labelled 16-I to 16-IV and annotated, for M = Sc−, Ti, V+, with the total energy difference, the Homo-Lumo gap and the isomer number.]

Figure 1: Geometry of the several low-lying energy isomers of M@Si16 clusters. Below each structure is given, for the three types of atom impurity, the total energy difference (eV) with respect to the lowest energy configuration, the Homo-Lumo gap (eV), and the ordinal number of the isomer. Notice that the Homo-Lumo gap of the 16-IV isomer is considerably smaller than for the others.

type V@Si16X with X = halogen atom, or Sc@Si16Y with Y = alkali atom, was suggested by Nakajima and coworkers [4], assuming that the large Homo-Lumo gap of the ionic superatom can be maintained in the range of ∼ 2 eV when the ionic supermolecule is formed and a solid phase is eventually grown from them. For the Ti@Si16 superatom, a calculation by Pacheco et al. [8] has shown the possible survival at room conditions of a meta-stable hcp solid formed from the FK ground state.

Before exploring the stability of meta-stable bulk phases of materials composed of M@Si16Z super-atoms, we study in this paper the more favorable structures of the aggregates [Ti@Si16]n, [Sc@Si16K]n, and [V@Si16F]n formed from cage-like M@Si16 clusters. In section 2 we describe our computational approach. In sections 3-5 we present and discuss the results for aggregates of Ti, Sc−, and V+ doped Si16 clusters, respectively. In section 6 we present a few preliminary results for extended 1D, 2D, and 3D systems formed from these superatoms, supermolecules, and aggregates. In section 7 we summarize our results.


2 Computational Procedure

We have used the density functional theory [9] (DFT) code Siesta [10] within the generalized gradient approximation as parameterized by Perdew, Burke and Ernzerhof [11] for the exchange-correlation effects. Details about the pseudopotentials and basis sets are the same as in our previous work [1]. Specifically, we used norm-conserving scalar relativistic pseudopotentials [12] in their fully nonlocal form [13], generated from the atomic valence configuration 3s^2 3p^2 for Si (with core radii 1.9 a.u. for s and p orbitals), and the semi-core valence configuration 4s^2 3p^6 3d^n for Sc (n=1), Ti (n=2), and V (n=3) (all of them with core radii, in a.u., 2.57, 1.08, and 1.37 for s, p, and d orbitals, respectively). For K (F) we used the configuration 4s^1 (2s^2 2p^5) with core radius 3.64 (1.39) a.u. for all s, p, and d valence orbitals. In the present calculations we used a double-ζ basis s, p (for Si) and s, p, d (for M), with single polarization d (for Si) and p (for M), having maximum cutoff radii, in a.u., 7.47 (Si), 8.85 (Sc), 8.45 (Ti), and 8.08 (V). The basis sets and pseudopotentials of the M atoms were used and tested before in Refs. [14, 15]. The matrix elements of the self-consistent potential are evaluated by integration on a uniform grid with a double-ζ plus polarization (DZP) basis. The grid fineness is controlled by the energy cutoff of the plane waves that can be represented in it without aliasing (120 Ry in this work).

The equilibrium geometries result from an unconstrained conjugate-gradient structural relaxation using the DFT forces. We tried out several initial structures for each cluster (typically more than twenty), relaxing until the force on each atom was smaller than 0.010 eV/Å. A 4x4x4 Monkhorst-Pack grid was used for the bulk calculations of the SC, BCC, FCC, NaCl, CsCl, and hcp structures.

3 [Ti@Si16]n aggregates

In Figure 2 are represented the minimum energy [Ti@Si16]n aggregates with n ≤ 4, formed from the D4d f-like Ti@Si16 superatom, with one- and two-dimensional arrangements. The binding energy (eV) (with respect to the fully separated units), Homo-Lumo gap (eV), and dipole moment (Debye) for different sizes (n) and arrangements (chain and planar) are given in the inset. The dimer (n=2) is formed preferably by means of 4 Si-Si bonds between Si atoms of type 1, that is, Si atoms belonging to the square basis of the D4d superatom. Notice that the units in the dimer are specular images of each other. This dimer has a total energy ∼ 1 eV deeper than a compact equilibrium configuration of Ti2@Si32 which was fully optimized starting from an initial icosahedral Ih geometry of Si32. This fact gives us confidence in the reliability of the aggregation of Ti@Si16 superatoms to form larger complexes.

Several low-lying energy isomers of the dimer in Figure 2 were found. One of them is formed by twisting by 90 degrees one of the two Ti@Si16 D4d units around the molecular axis of the dimer. It is possible to grow linear chains from n=2 to n=3,4 aggregates by adding a specular unit or dimer, respectively, as represented in Figure 2. Another possibility, among many others, is to add 90-degree twisted units, or to alternate specular with twisted dimer units. The planar configurations for n=3 and 4 have smaller binding energy per unit than the linear ones. These 2D aggregates are formed by Si-Si bonds between the Si atoms of the square basis in the D4d isomer of Ti@Si16. Comparing the binding energy per Si-Si bond of the 1D and 2D n=3-4 aggregates, we see that the planar configuration is favorable. For n=5 the five Ti dopant atoms form a non-regular planar pentagon, and Si-Si bonds are formed between type-2 Si atoms. The regular pentagon is a low-lying energy isomer.

The Homo-Lumo gap in both 1D and 2D structures decreases as the size increases. Thus the transition to the metallic state is reached for very few basic units. The study of the PDOS of these aggregates is in progress. Interestingly, one can see that the planar n=3 and 4 aggregates can be taken as units to form infinite surfaces with honeycomb and simple square structures, respectively. By growing these n=3-4 units in the direction normal to the plane, other types of wires resembling nanotubes can be formed. We will study these systems in the near future.

Figure 2: One- (left) and two- (right) dimensional arrangements of [Ti@Si16]n aggregates formed from the D4d f-like Ti@Si16 superatom. The binding between superatoms occurs by means of Si-Si bonds between Si atoms of type 1, that is, Si atoms belonging to the square basis of the cluster, except for the planar n=5 aggregate, where Si-Si bonds involving type 2 Si atoms are also involved. The binding energy (eV), Homo-Lumo gap (eV), and dipole moment (Debye) for different sizes (n) and arrangements (chain and planar) are given in the inset.


4 [Sc@Si16K]n aggregates

In Figure 3 are represented the [Sc@Si16-K]n aggregates of supermolecular Sc@Si16-K units formed from the 16-III Sc@Si16− cluster bonded to a K atom. The Sc@Si16K supermolecule is formed by capping the K atom on the pentagonal basis of an isomer of Sc@Si16− with only 30 meV higher energy than the Frank-Kasper ground state. The [Sc@Si16K]n aggregates (n = 1-3) formed from that supermolecule prefer non-linear compact configurations, with the highest coordination of the K atoms. Other competitive equilibrium aggregates with linear configuration for n ≥ 2 are formed from molecular units composed of the C3v and Td isomers. The formation of aggregates from the fullerene-like D4d isomer of Sc@Si16− plus the K atom is in progress, and preliminary results show an interesting trend to be reported elsewhere.

Figure 3: [Sc@Si16-K]n aggregates of supermolecular Sc@Si16-K units formed from the 16-III C5v Sc@Si16− cluster bonded to a K atom. The binding between supermolecules is mediated by the K atom. For n=3 two configurations are given, linear (right up) and planar (right down). The binding energy (eV), Homo-Lumo gap (eV), dipole moment (Debye), and Sc-Sc distance (Å) for n ≤ 3 are given in the inset.

Calculations using the D4d f-like isomer for the anionic component of Sc@Si16K are in progress. The possibility of an endohedral ScK molecule inside Si16 cage configurations has also been considered.


5 [V@Si16F]n aggregates

The molecule F2 dissociates on all isomers and positions of V@Si16+ in agreement with experiments [4]. The highest dissociation energy is 8.25 eV.

In Figure 4 are given the [V@Si16-F]n aggregates of supermolecular V@Si16-F units formed from the D4d f-like V@Si16− cluster bonded to an F atom. The binding between superatoms occurs by means of Si-Si bonds between Si atoms of type 1, that is, Si atoms belonging to the square basis of the cluster, as well as Si-Si bonds between type 2 Si atoms. The binding energy (eV), Homo-Lumo gap (eV), dipole moment (Debye), and magnetic moment (µB) (only for n=4-5) are also given.

The most stable V@Si16F supermolecule is formed by bonding the F atom to a Si atom of a square basis in the fullerene-like D4d isomer of V@Si16−. The [V@Si16F]n aggregates (n = 1-5) formed from that supermolecule prefer planar configurations up to n=4, with the smallest coordination of the F atoms. Contrary to the Ti@Si16 aggregates, the ground state for n=2 is bonded with the two units inverted with respect to the center of the two Si-Si bonds. Interesting chains and nano-tubes can be formed from the n=4 aggregate. The low-lying isomer of the n=3 aggregate with only Si-Si bonds between type-1 Si atoms can be grown along the aggregate axis to obtain a 1D infinite wire.

6 Crystals from M@Si16-X supermolecular units

We have calculated several crystal phases having the Ti@Si16 superatom as basic unit. For the BCC case, we obtained that the FK isomer, with Td symmetry, reaches a meta-stable minimum with ∼ 1.5 eV cohesive energy at ∼ 8.5 Å unit-unit distance, whereas the f-like D4d isomer leads to a deeper minimum with ∼ 5.5 eV cohesive energy at ∼ 8.0 Å unit-unit distance. The orientation of the cluster in the cell has a controllable effect.

Similar calculations have been performed for FCC, BCC, and SC crystals using the Sc@Si16K supermolecule with the FK isomer of the Sc@Si16 component as the basic unit. Other bulk structures with the NaCl and CsCl structures were tested. The most stable structure is, however, NaCl with the f-like isomer.

In the case of the f-like V@Si16F supermolecule, the largest cohesive energy is obtained for the NaCl-FCC crystal. The orientation of the super-atom in the cell plays an important role. The initial geometry of the super-atom suffers strong deformations when approaching the minimum of the energy-volume plot. The PDOS for this meta-stable structure shows an interesting metallic character with a tendency to ferromagnetism in the distribution of the d-electrons of vanadium.
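As an aside, the position and depth of such minima are typically extracted from the computed energy curves with a short post-processing fit; the sketch below is purely schematic (not the authors' workflow), and the sampled distances and energies are illustrative placeholders, not values from this work.

    # Schematic post-processing (illustrative): locate the minimum of a cohesive-energy
    # curve E(d) sampled at a few unit-unit distances d, via a local quadratic fit.
    import numpy as np

    d = np.array([7.5, 8.0, 8.5, 9.0, 9.5])        # unit-unit distances (Angstrom), placeholder
    E = np.array([-0.8, -2.1, -2.6, -2.2, -1.5])   # E(crystal) - E(isolated unit) per unit (eV), placeholder

    i = int(np.argmin(E))
    c2, c1, c0 = np.polyfit(d[i-1:i+2], E[i-1:i+2], 2)  # parabola through the three lowest points
    d_eq = -c1 / (2.0 * c2)                             # equilibrium unit-unit distance
    E_coh = -(c0 + c1 * d_eq + c2 * d_eq**2)            # cohesive energy per unit
    print(d_eq, E_coh)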

7 Summary and Outlook

The doped M@Si16 clusters with M = Sc−, Ti, and V+, having 68 valence electrons, are more stable (magic) than those with neighboring sizes (n=15,17) because they adopt a nearly spherical geometry allowing hybridization of the d orbital of the endohedral


Figure 4: [V@Si16-F]n aggregates of supermolecular V@Si16-F units formed from the D4d f-like V@Si16− cluster bonded to an F atom. The binding between superatoms occurs by means of Si-Si bonds between Si atoms of type 1, that is, Si atoms belonging to the square basis of the cluster, as well as Si-Si bonds between type 2 Si atoms. The binding energy (eV), Homo-Lumo gap (eV), dipole moment (Debye), and magnetic moment (µB) (only for n=4-5) are given beside.

M and the d orbital of the spherical Si cage. These calculated features are in agreement with experiments [4, 7].

For [Ti@Si16]n we have obtained equilibrium structures for one-dimensional (1D) linear chains, two-dimensional networks having triangular (2D-T) and square (2D-S) motifs, and three-dimensional (3D) arrangements. Some of these arrangements can be seen as units to form infinite 1D, 2D, and 3D periodic systems. We have studied the relative stability of these [Ti@Si16]n arrangements in order to determine the minimum size n to evolve from 1D to 2D to 3D systems.

The Sc@Si16K supermolecule is formed by capping the K atom on the pentagonal basis of an isomer of Sc@Si16− with only 30 meV higher energy than the Frank-Kasper ground state. The [Sc@Si16K]n aggregates (n = 1-3) formed from that supermolecule prefer non-linear compact configurations, with the highest coordination of the K atoms. Other competitive equilibrium aggregates with linear configuration for n ≥ 2 are formed from molecular units composed of the fullerene-like D4d isomer of Sc@Si16− plus the K atom.

The most stable V@Si16F supermolecule is formed by bonding the F atom to a Si atom of a square basis in the fullerene-like D4d isomer of V@Si16−. The [V@Si16F]n aggregates (n = 1-5) formed from that supermolecule prefer planar configurations up to n=4, with the smallest coordination of the F atoms. Interesting chains and nano-tubes can be formed from the n=4 aggregate.

Ti@Si16 and V@Si16F with f-like units form meta-stable bcc and NaCl crystals, respectively, whereas Sc@Si16K prefers an fcc (NaCl) structure formed with the f-like supermolecule. The orientation of the super-atom in the cell plays an important role. The PDOS of the fcc-NaCl structure of V@Si16F at the Fermi energy is very high, indicating an unstable phase. The Homo-Lumo gap of the finite aggregates decreases when the size increases, and the 3D bulk phases are generally metallic. Thus, bulk materials assembled from these aggregates are not of interest for optoelectronic devices. However, the study of the stability and magnetic properties of 1D and 2D infinite systems formed from the M@Si16Z superatoms and supermolecules considered in this work is in progress, and promises a wealth of unexpected properties.

Acknowledgements

We wish to acknowledge the support of the Spanish Ministry of Science (Grant FIS2008-02490/FIS) and Junta de Castilla y Leon (Grant GR120).

References

[1] M. B. Torres, E. M. Fernandez, and L. C. Balbas, Theoretical study of isoelectronic SinM clusters (M = Sc−, Ti, V+; n = 14-18), Phys. Rev. B 75 (2007) 205425-1-12.
[2] M. Ohara, K. Koyasu, N. Nakajima, and K. Kaya, Geometric and electronic structures of metal (M)-doped silicon clusters (M = Ti, Hf, Mo and W), Chem. Phys. Lett. 371 (2003) 490.
[3] K. Koyasu, M. Akutsu, M. Mitsui, and N. Nakajima, Selective Formation of MSi16 (M = Sc, Ti, and V), J. Am. Chem. Soc. 127 (2005) 4998.
[4] K. Koyasu, J. Atobe, M. Akutsu, M. Mitsui, and N. Nakajima, Electronic and Geometric Stabilities of Clusters with Transition Metal Encapsulated by Silicon, J. Phys. Chem. A 111 (2007) 42.
[5] S. Furuse, K. Koyasu, J. Atobe, and N. Nakajima, Experimental and theoretical characterization of MSi16−, MGe16−, MSn16−, and MPb16− (M = Ti, Zr, and Hf): The role of cage aromaticity, J. Chem. Phys. 129 (2008) 064311.
[6] V. Kumar and Y. Kawazoe, Metal-Encapsulated Fullerenelike and Cubic Caged Clusters of Silicon, Phys. Rev. Lett. 87 (2001) 045503.

[7] Ph. Klar, K. Hirsch, A. Langenberg, F. Lofink, R. Richter, J. Rittmann, M. Vogel, V. Zamudio-Bayer, T. Moller, Bernd v. Issendorff, and T. Lau, Local Electronic Structure of Doped Silicon Clusters: Direct Evidence for the Magic Nature of Endohedrally Doped VSi16+, BESSY report: Jahresbericht 2007-2-70416.

[8] C. L. Reis, J. L. Martins, and J. M. Pacheco, Stability analysis of a bulk material built from silicon cage clusters: A first principles approach, Phys. Rev. B 76 (2007) 233406.
[9] W. Kohn and L. J. Sham, One-Particle Properties of an Inhomogeneous Interacting Electron Gas, Phys. Rev. 145 (1965) 561.
[10] J. M. Soler, E. Artacho, J. D. Gale, A. García, J. Junquera, P. Ordejón and D. Sánchez-Portal, The SIESTA method for ab initio order-N materials simulation, J. Phys.: Condens. Matter 14 (2002) 2745.
[11] J. P. Perdew, K. Burke, and M. Ernzerhof, Generalized Gradient Approximation Made Simple, Phys. Rev. Lett. 77 (1996) 3865.
[12] N. Troullier and J. L. Martins, Efficient pseudopotentials for plane-wave calculations, Phys. Rev. B 43 (1991) 1993.
[13] L. Kleinman and D. M. Bylander, Efficacious Form for Model Pseudopotentials, Phys. Rev. Lett. 48 (1982) 1425.
[14] E. M. Fernandez, M. B. Torres and L. C. Balbas, Trends in the bonding of the first-row transition metal compounds: V(001) surface, TM-oxide and nitride molecules, and AunTi (2 ≤ n ≤ 7) clusters, Int. J. Quantum Chem. 99 (2004) 39.
[15] M. B. Torres, E. M. Fernandez and L. C. Balbas, Theoretical study of structural, electronic, and magnetic properties of AunM+ clusters (M = Sc, Ti, V, Cr, Mn, Fe, Au; n ≤ 9), Phys. Rev. B 71 (2005) 155412.


Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009, 30 June, 1–3 July 2009.

A statistical characterization of differences and similarities of aggregation functions

Luigi Troiano1 and Luis J. Rodríguez-Muñiz2

1 Dept. of Engineering - University of Sannio, RCOST, Viale Traiano – 82100 Benevento, Italy

2 Dept. of Statistics and O.R. - University of Oviedo, E.U.I.T. Industrial, Campus de Gijón s/n – 33271 Gijón, Spain

emails: [email protected], [email protected]

Abstract

As information fusion is becoming relevant to many quantitative fields of science and economics, aggregation functions have been deeply studied in order to investigate their analytical properties. More recently, researchers have been turning to the practical application of aggregation functions. Alongside the problem of fitting an appropriate function to empirical data, there is the need to study the behavior of the models. In this paper we focus on the statistical study of the output values as a way to compare and choose among different alternatives by simulation.

Key words: Aggregation Function, Information Fusion, Statistics.

1 Introduction

In many applications, there is a need for aggregating several input values into a single output value. The aggregation is often related to quantitative information. This task becomes central to several problems in fields such as Physics, Computer Science, Engineering, Social Sciences, Economics and Finance [1]. In recent years, a considerable research effort has been devoted to characterizing the framework of aggregation functions.

With respect to this last meaning, an aggregation function is a mapping F : R^n → R. Generally input and output values belong to some ratio scale. Therefore, without any loss of generality we can rescale all values to the unit interval I. So an aggregation function can be defined as F : I^n → I.

In particular, researchers paid attention to conjunctive functions (e.g. t-norms), disjunctive functions (e.g. t-conorms), and compensatory functions (e.g. means, OWA, uninorms) in order to characterize them by analytical properties. More recently, attention has been paid to how to fit models to empirical data [2, 3]. In this context it becomes relevant to study the distribution of output values and how it is related to the input values.

For instance, conjunctive functions concentrate values in the lower part of I, whilst disjunctive functions concentrate values in the upper part. Compensatory functions concentrate values in the central part. Therefore, looking at the output distributions we can verify whether two aggregation functions have a different behavior or follow a similar pattern from a statistical point of view.

Despite the evidence that statistics provide robust and meaningful tools for characterizing aggregation functions, this approach has never been deeply investigated. Many questions arise when studying the output distribution from a statistical point of view. In this paper we will focus on pairwise comparing aggregation functions on the basis of their output distributions in order to answer questions such as: Are there statistical differences between conjunctive (disjunctive) and compensatory functions? Does the number of data points to aggregate have any effect on the output distributions? Does the number of variables to aggregate emphasize differences between functions? Does the input distribution change the behavior of the outputs? Answering these questions provides insights useful to practitioners in choosing between alternatives with substantial statistical differences.

The remainder of this paper is organized as follows: in Section 2 we provide some preliminaries in order to get a common basis for discussion; in Section 3 we state the theoretical output distributions for some of the functions defined previously; in Section 4 we develop the experimental study by simulating several cases and comparing them in order to answer some of the questions stated above; in Section 5 we outline conclusions and future directions.

2 Preliminaries

Given the scale I and the order n, all possible aggregation functions are functions developed within the hypercube I^{n+1}. In this context, an n-ary aggregation function is a mapping F_{(n)} : I^n → I. The number of arguments may not be known in advance. Therefore an aggregation function is in general a mapping F : ⋃_{n∈N} I^n → I, where F|_{I^n} = F_{(n)} ∀n ∈ N, so that a general aggregation function can be regarded as a family F = (F_{(n)})_{n∈N}. For the sake of simplicity, in the following sections we will mostly refer to binary functions. Moreover, since the interval I plays the role of scale, we can focus on I ≡ [0, 1] without any loss of generality. Thus, we will consider the class of mappings F_{(n)} : [0, 1]^n → [0, 1] and in particular F_{(2)} : [0, 1] × [0, 1] → [0, 1].

There are some basic properties able to characterize a generic function as an aggregation function [4].

Boundary conditions. For any aggregation function there are two boundary conditions to meet, namely

F(n)(0, . . . , 0) = 0 and F(n)(1, . . . , 1) = 1 (1)

Both the conditions preserve the domain bounds, so that the minimal inputs lead to the minimal output value, whilst the maximal inputs lead to the maximal output value. This property becomes a natural assumption within the context of multi-criteria decision making (i.e. mcdm) problems. Indeed, we expect that the aggregation of completely unsatisfactory (negative, false) criteria is itself completely unsatisfactory (negative, false). Similarly, we expect that the aggregation of fully satisfactory (positive, true) criteria is itself fully satisfactory (positive, true).

Monotonicity. The monotonicity of aggregation functions is generally considered non-decreasing with respect to each variable (marginal monotonicity), so that

F_{(n)}(x_1, . . . , x_n) ≥ F_{(n)}(x'_1, . . . , x'_n)   ∀ x_i ≥ x'_i, i = 1..n   (2)

The non-decreasingness reflects the assumption in mcdm scoring problems that better input values should lead to an overall better aggregated value. In other words, better scores cannot provide a worse overall score.

Identity when unary. This simple property states that

F_{(1)}(x) = x   ∀x ∈ I   (3)

that is, the aggregated result should not differ from the input value in the trivial case of one argument.

Besides these fundamental properties, there are other useful properties. In particular, the notion of strength provides a partial ordering of aggregation functions.

Strength. This property is related to the comparison of two aggregation functions F_{(n)} and F'_{(n)}: F_{(n)} is stronger than F'_{(n)} (written F_{(n)} ≽ F'_{(n)}) iff

F_{(n)}(x_1, . . . , x_n) ≥ F'_{(n)}(x_1, . . . , x_n)   ∀ x_i ∈ I, i = 1..n   (4)

Duality. For each aggregation function F, there exists a dual function F̄ defined as

F̄(x_1, . . . , x_n) = 1 − F(1 − x_1, . . . , 1 − x_n)   (5)

When met, this property allows one to evaluate the overall behavior of an aggregation function, so that it is assumed to provide higher or lower aggregated values in comparison to another function. Moreover, aggregation functions can be partially ordered by strength, so that this property provides a criterion for classifying three main categories of aggregation functions, namely the t-norms, t-conorms and averages, discussed below.

The class of aggregation functions is very large. There are many other properties able to characterize some groups of aggregation functions. A more detailed overview of aggregation functions and their properties can be found in reference [1]. Instead, we will focus on noticeable examples of aggregation functions, in particular examples belonging to the classes of t-norms, t-conorms and compensatory functions.


2.1 Triangular norms

A triangular norm (i.e. t-norm) is an aggregation function T : [0, 1]^2 → [0, 1] that is symmetric, associative, with neutral element 1. The dual function S : [0, 1]^2 → [0, 1], which is also symmetric and associative, but with neutral element 0, is called a triangular conorm (i.e. t-conorm). It is possible to prove that t-norms have absorbent element 0, while t-conorms have absorbent element 1.

To the class of t-norms belong many functions commonly used in applicative domains. Some examples are reported in Table 1.

Minimum                      T_M(x_1, x_2) = min(x_1, x_2)
Product                      T_P(x_1, x_2) = x_1 · x_2
Lukasiewicz                  T_L(x_1, x_2) = max(x_1 + x_2 − 1, 0)
Drastic                      T_D(x_1, x_2) = x_1 if x_2 = 1; x_2 if x_1 = 1; 0 otherwise
Yager (p > 0)                T_Y(x_1, x_2) = max(1 − [(1 − x_1)^p + (1 − x_2)^p]^{1/p}, 0)
Schweizer and Sklar (q > 0)  T_S(x_1, x_2) = 1 − [(1 − x_1)^q + (1 − x_2)^q − (1 − x_1)^q (1 − x_2)^q]^{1/q}
Frank (s > 0, s ≠ 1)         T_F(x_1, x_2) = log_s( 1 + (s^{x_1} − 1)(s^{x_2} − 1) / (s − 1) )

Table 1: Some examples of t-norms

Although the set of t-norms is not totally ordered (at least it has not been proven yet), it is possible to prove that any generic t-norm T is constrained, more precisely

T_D ≼ T ≼ T_M   (6)

Thus, t-norms are conjunctive. However, it holds

T_D ≼ T_L ≼ T_P ≼ T_M   (7)

For each t-norm there exists a dual counterpart in the set of t-conorms. For instance, Table 2 reports the (standard) dual t-conorms of Table 1.

In the case of t-conorms, it is possible to verify

S_M ≼ S ≼ S_D   (8)

and

S_M ≼ S_P ≼ S_L ≼ S_D   (9)

Therefore t-conorms are disjunctive.


Maximum                      S_M(x_1, x_2) = max(x_1, x_2)
Probabilistic Sum            S_P(x_1, x_2) = x_1 + x_2 − x_1 x_2
Lukasiewicz                  S_L(x_1, x_2) = min(x_1 + x_2, 1)
Drastic                      S_D(x_1, x_2) = x_1 if x_2 = 0; x_2 if x_1 = 0; 1 otherwise
Yager (p > 0)                S_Y(x_1, x_2) = min([x_1^p + x_2^p]^{1/p}, 1)
Schweizer and Sklar (q > 0)  S_S(x_1, x_2) = [x_1^q + x_2^q − x_1^q x_2^q]^{1/q}
Frank (s > 0, s ≠ 1)         S_F(x_1, x_2) = 1 − log_s( 1 + (s^{1−x_1} − 1)(s^{1−x_2} − 1) / (s − 1) )

Table 2: Some examples of t-conorms
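For readers who wish to experiment, the entries of Tables 1 and 2 translate directly into code; the sketch below (illustrative only) implements a few t-norms and obtains the corresponding t-conorms through the duality (5).

    # Illustrative implementations of some t-norms of Table 1; the duality (5)
    # yields the corresponding t-conorms of Table 2.
    import numpy as np

    def t_minimum(x1, x2):      return np.minimum(x1, x2)
    def t_product(x1, x2):      return x1 * x2
    def t_lukasiewicz(x1, x2):  return np.maximum(x1 + x2 - 1.0, 0.0)
    def t_yager(x1, x2, p=2.0): return np.maximum(1.0 - ((1 - x1)**p + (1 - x2)**p)**(1.0/p), 0.0)

    def dual(T):
        # S(x1, x2) = 1 - T(1 - x1, 1 - x2), as in (5)
        return lambda x1, x2: 1.0 - T(1.0 - x1, 1.0 - x2)

    s_probabilistic_sum = dual(t_product)     # equals x1 + x2 - x1*x2
    print(s_probabilistic_sum(0.3, 0.4))      # approx. 0.58
    print(t_lukasiewicz(0.7, 0.6))            # approx. 0.3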

2.2 Compensatory functions

An aggregation function M is said to be compensatory if it satisfies the compensatory property, that is,

min ≼ M ≼ max   (10)

Examples of compensatory aggregation are presented in Table 3. The class of compensatory functions is not limited to the means: this is the case of the exponential compensatory functions. Other functions can be built as compositions of t-norms and t-conorms; for instance, the convex-linear compensatory functions are defined as

L^{T,S}_γ(x_1, . . . , x_n) = (1 − γ) T(x_1, . . . , x_n) + γ S(x_1, . . . , x_n)   (11)

In general, aggregation functions are not associative; this is the case for all the functions presented above.

3 Output distributions

We can relate the cumulative distribution function of the output values (i.e. cdf) to the structure of levels, as

Pr(F(x_1, ..., x_n) ≤ z) = 1 − µ_F(z)   (12)

where µ_F(z) is the strict α-cut measure at level z. Indeed the strict α-cut of F collects all points x ∈ I^n such that F(x) > z, as depicted in Fig. 1.

Instead, the probability density function (pdf) is defined as

Pr(F(x_1, ..., x_n) = z) = − (d/dz) µ_F(z)   (13)


Arithmetic Mean                       AM(x_1, . . . , x_n) = (1/n) ∑_{i=1}^{n} x_i
Geometric Mean                        GM(x_1, . . . , x_n) = ( ∏_{i=1}^{n} x_i )^{1/n}
Quadratic Mean                        QM(x_1, . . . , x_n) = ( (1/n) ∑_{i=1}^{n} x_i^2 )^{1/2}
Harmonic Mean                         HM(x_1, . . . , x_n) = ( (1/n) ∑_{i=1}^{n} 1/x_i )^{−1}
Quasi-arithmetic Mean (α ∈ R)         QAM(x_1, . . . , x_n) = ( (1/n) ∑_{i=1}^{n} x_i^α )^{1/α}
Exponential Compensation (γ ∈ [0,1])  Z_γ(x_1, . . . , x_n) = ( ∏_{i=1}^{n} x_i )^{1−γ} · ( 1 − ∏_{i=1}^{n} (1 − x_i) )^{γ}

Table 3: Some examples of compensatory functions
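The entries of Table 3 are equally direct to code; the following is a small illustrative sketch (not part of the original paper), acting on a 1-D array of values in [0, 1].

    # Illustrative implementations of some compensatory functions of Table 3.
    import numpy as np

    def arithmetic_mean(x):          return np.mean(x)
    def geometric_mean(x):           return np.prod(x) ** (1.0 / len(x))
    def quadratic_mean(x):           return np.sqrt(np.mean(x**2))
    def harmonic_mean(x):            return 1.0 / np.mean(1.0 / x)        # requires x > 0
    def exp_compensation(x, g=0.5):  return np.prod(x)**(1 - g) * (1 - np.prod(1 - x))**g

    x = np.array([0.2, 0.5, 0.9])
    print(arithmetic_mean(x), geometric_mean(x), quadratic_mean(x))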

Minimum. Alpha-cuts are squares whose sides become smaller as the level increases, and whose measure is µ_F(z) = (1 − z)^2, so CDF_{T_M}(z) = 2z − z^2 and PDF_{T_M}(z) = 2(1 − z).

Product. In this case, the area of the α-cut at level z is µ_F(z) = ∫_z^1 (1 − z/x) dx = 1 + z(log z − 1). Thus, the product cdf and pdf are CDF_{T_P}(z) = z(1 − log z) and PDF_{T_P}(z) = − log z.

Lukasiewicz t-norm. The alpha-cut measure at z = 0 is 1/2. Indeed, µ_F(z) = (1/2)(1 − z)^2, so CDF_{T_L}(z) = 1 − (1/2)(1 − z)^2 and PDF_{T_L}(z) = (1/2) δ(z) + (1 − z).

Drastic t-norm. The drastic t-norm is 0 almost everywhere, except on the sides x = 1 and y = 1. Therefore: CDF_{T_D}(z) = 1(z) and PDF_{T_D}(z) = δ(z).

The cdf and pdf of some of the functions above are plotted in Fig. 2. Levels of dual functions are symmetric, as shown in Fig. 3 for the product and the probabilistic sum. The symmetry between dual functions suggests a general rule for computing the cdf and pdf, as stated below.

Proposition 3.1 Let F̄(·) be the dual function of F(·); then the following equalities hold:

CDF_{F̄}(z) = 1 − CDF_F(1 − z)   and   PDF_{F̄}(z) = PDF_F(1 − z)

Proof. Since F̄(x) = 1 − F(1 − x), the α-cuts of F̄ are obtained as depicted in Fig. 3. Thus,

µ_{F̄}(z) = 1 − µ_F(1 − z)

and this concludes the proof.


[figure omitted]

Figure 1: Relationships between function levels and output cdf

[figure omitted; panels: Lukasiewicz, Product, Minimum]

Figure 2: The cdf (top) and pdf (bottom) of noticeable t-norms

Similarly, we can obtain the distribution functions of some averages. With respect to two input variables, we can determine the output distribution of the arithmetic mean and the quadratic mean (see Fig. 4).

Arithmetic Mean.

CDF_{AM}(z) = 2z^2,  0 ≤ z < 1/2;   CDF_{AM}(z) = 1 − 2(1 − z)^2,  1/2 ≤ z ≤ 1   (14)

PDF_{AM}(z) = 4z,  0 ≤ z < 1/2;   PDF_{AM}(z) = 4(1 − z),  1/2 ≤ z ≤ 1   (15)


[figure omitted; panels: Product, Probabilistic Sum]

Figure 3: Comparison of dual functions product and probabilistic sum

Quadratic Mean.

CDF_{QM}(z) = (π/2) z^2,  0 ≤ z < 1/√2;   CDF_{QM}(z) = √(2z^2 − 1) + z^2 (π/2 − 2 arctan √(2z^2 − 1)),  1/√2 ≤ z ≤ 1   (16)

PDF_{QM}(z) = π z,  0 ≤ z < 1/√2;   PDF_{QM}(z) = 2z (π/2 − 2 arctan √(2z^2 − 1)),  1/√2 ≤ z ≤ 1   (17)

[figure omitted; panels: Arithmetic Mean, Quadratic Mean]

Figure 4: The cdf (top) and pdf (bottom) of arithmetic and quadratic mean.

The measure µ_F(z) involves the computation of multidimensional integrals, which can sometimes be difficult or impossible to compute analytically. In this case, the cdf can be computed numerically, by approximation or Monte Carlo simulation.
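A minimal Monte Carlo sketch of this numerical route (illustrative only), checked here against the analytic cdf of the product t-norm derived above:

    # Monte Carlo estimate of the output cdf of an aggregation function with uniform
    # inputs, compared with the analytic CDF_TP(z) = z(1 - log z) of the product t-norm.
    import numpy as np

    rng = np.random.default_rng(0)

    def empirical_cdf(F, n_args=2, n_samples=200_000):
        x = rng.uniform(size=(n_samples, n_args))
        out = F(x)
        return lambda z: np.mean(out <= z)

    cdf_prod = empirical_cdf(lambda x: np.prod(x, axis=1))
    for z in (0.1, 0.5, 0.9):
        print(z, cdf_prod(z), z * (1.0 - np.log(z)))   # estimate vs analytic value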

4 Differences and similarities in the aggregation operators outputs

In this section we analyze the impact that changes in the distribution of the input values and in the number of arguments produce on the distribution of the aggregation function. We also pay attention to the effect of increasing the sample size, that is, the number of points we are aggregating, since we develop the analysis by using simulation. This is a very important question since the original distribution over the inputs has obvious effects on the values of the aggregation function: for instance, the result for a conjunctive function with uniform inputs will differ from the result using an input distribution that concentrates values in the lower part of I. Also the number of inputs is an important source of variability, since the distribution of the aggregation function changes when considering more arguments. On the other hand, the sample size is a key point, since the larger the sample size, the better our information about the behavior of the outputs and, thus, the better our position to see the differences between outputs. An additional analysis, not following the same procedure but closely related to this one, is the fitting of the empirical distribution of the outputs to a given theoretical distribution.

Therefore, our interest in this section is to establish a procedure that helps the practitioner in deciding which aggregation function to choose. The answer that we give is based on the statistical significance of the differences between two (or more) aggregation functions. Thus, when we are not able to detect significant differences between the outputs of two aggregation functions, we can make our choice knowing that it has little relevance for the data.

It should be underlined, from the beginning of this procedure, that we are focusing on the experimental results of the outputs and not on the theoretical basis of the inputs. This is very important since we are not stating that two different aggregation functions are equal in any sense, but we are empirically checking that, despite the original differences in the distribution of the inputs, the outputs do not show statistically significant differences. Thus, from a statistical point of view, we could consider the two aggregation functions to be indifferent regarding the distribution of the data. Moreover, increasing the sample size will always make it easier to differentiate the effect of different aggregation operators.

In order to clarify these ideas we have followed a standard procedure, which we show below and then illustrate with some examples.

4.1 Procedure

Procedure: Comparing aggregation output distributions.

1. Fix the number k of aggregation functions to compare.
2. Fix the sample size.
3. Fix the number n of arguments (inputs).
4. Fix the probability distributions over the input values.
5. Generate a sample of input values x_{lj}, for j = 1, . . . , k and l = 1, . . . , n.
6. Choose the aggregation functions to evaluate: F_1, . . . , F_k.
7. Obtain the output values Y_1, . . . , Y_k by making Y_j = F_j(x_{1j}, . . . , x_{nj}), for j = 1, . . . , k.
8. Pose the question: 'Can the values of Y_1, . . . , Y_k be considered to have the same distribution?'. If k = 2, perform a Mann-Whitney-Wilcoxon test over the outputs Y_1, Y_2 and obtain a p-value. If k > 2, perform a Kruskal-Wallis test over the outputs Y_1, . . . , Y_k and obtain a p-value.
9. If the p-value is greater than or equal to the prefixed significance level, then the conclusion is: 'The outputs can be considered to have the same distribution'. Otherwise, the conclusion is: 'The outputs are significantly different'.
10. End.
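The procedure maps directly onto standard statistical libraries; the sketch below (illustrative, with assumed names) covers steps 5-9 for uniform inputs using SciPy's rank-based tests.

    # Illustrative implementation of steps 5-9 for k aggregation functions applied
    # row-wise to n uniform inputs, using the Mann-Whitney-Wilcoxon and Kruskal-Wallis tests.
    import numpy as np
    from scipy.stats import mannwhitneyu, kruskal

    rng = np.random.default_rng(1)

    def compare_outputs(functions, n_args=2, sample_size=100, alpha=0.05):
        x = rng.uniform(size=(sample_size, n_args))        # steps 4-5: uniform inputs
        outputs = [F(x) for F in functions]                # step 7: output samples Y_j
        if len(functions) == 2:                            # step 8: choose the test
            _, p = mannwhitneyu(outputs[0], outputs[1], alternative='two-sided')
        else:
            _, p = kruskal(*outputs)
        return p, p >= alpha                               # step 9: same distribution?

    # Example: arithmetic mean vs quadratic mean (as in case E below).
    p, same = compare_outputs([lambda x: x.mean(axis=1),
                               lambda x: np.sqrt((x**2).mean(axis=1))])
    print(p, same)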

We have to remark that, when performing a statistical test, the conclusion is either that we have found evidence to reject our hypothesis (the null hypothesis) or that we have not found evidence to reject it. Hence, a substantial nuance is that we are not properly saying that the outputs are similar, but that they can be considered to be similar.

It must be emphasized that we are not saying that different aggregation operators applied to the same inputs produce similar results. We are not referring to fixed individuals, because in that case the difference is immediate. On the contrary, our conclusion is statistical in the following sense: we are concluding that, when applying different aggregation operators to a population of inputs with a certain distribution, our results can be considered to be statistically similar, that is, in general, most of our outputs will have a similar behavior.

We also observe that the sample size is prefixed at the beginning of the procedure. As we briefly pointed out before, the sample size is crucial in determining the differences, because the greater the sample size, the more powerful the test is in detecting differences.

A final important remark concerns the test we are using. In case we have a Gaussian distribution over the data Y_1, . . . , Y_k we could use other types of tests, like T-tests/ANOVA for instance, but it is important to note that we are more focused on the distribution rather than on equality of means, which is the hypothesis tested by T-tests/ANOVA.

4.2 Examples

Obviously, the number of possible cross-combinations of input distributions / aggregation functions / number of arguments / sample sizes is impossible to handle. Thus, we show some very illustrative examples. In all of them we have performed several simulations (up to 10000 iterations each) and then we have analyzed the overall results. Other parameters in the model are: number of input arguments, sample size (number of items to be aggregated) and input distributions. Instead of providing the exhaustive results of all simulations, we prefer to explain the general results.

Example 1. Consider two t-norms. By symmetry this analysis can be translated to the case of t-conorms, due to the dual property described in Section 2. It is very important to take the dominance relationships into account, because the Mann-Whitney-Wilcoxon test uses ranks, so it will produce significant differences when comparing dominated outputs. We provide examples of the comparison of two dominated t-norms. In case A we test minimum versus product. In case B we test minimum versus Łukasiewicz. We also provide examples in which we compare two t-norms without a dominance relationship. In case C we test product versus Yager. In this case, we see how, for certain values of the parameter p of the Yager t-norm, the results of the aggregation can be considered to follow the same pattern. All cases are constructed from uniform inputs.
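For reference, the t-norms compared in cases A-C can be written as two-argument Python functions; a short sketch (the function names and the default Yager parameter p are ours):

def t_minimum(a, b):            # minimum t-norm
    return min(a, b)

def t_product(a, b):            # product t-norm
    return a * b

def t_lukasiewicz(a, b):        # Lukasiewicz t-norm
    return max(a + b - 1.0, 0.0)

def t_yager(a, b, p=2.0):       # Yager t-norm with parameter p
    return max(1.0 - ((1.0 - a) ** p + (1.0 - b) ** p) ** (1.0 / p), 0.0)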

Example 2. Consider two averages. By using uniform inputs we observe how analytically different aggregation operators (see Section 2) produce either clearly different outputs (case D: geometric mean versus arithmetic mean) or not so different outputs (case E: quadratic mean versus arithmetic mean).

Example 3. In this case we vary the sample size. As expected, the greater the sample size, the easier it is to differentiate the outputs of different aggregation functions. We perform several trials increasing and decreasing the sample size used in Examples 1 and 2 and we check how the different/similar relationships vary. In particular, we mention the conclusions for cases C, D and E. In case C, we observe that, for very close values of the parameter p of the Yager t-norm, differences do not increase when increasing the sample size. In cases D and E, we find that the smaller the sample size, the less important the choice of the average by the practitioner; that is, if the practitioner is aggregating a small quantity of points, the choice of the average is not so relevant.

Example 4. Now we study how changes in the number of arguments can affect the results. By increasing the number of arguments we see how certain aggregation functions change their behavior with respect to that analyzed in cases C and E. In case C, we note that increasing the number of arguments also increases very quickly the differences between the product and Yager t-norms. But when we compare two Yager t-norms with close values of the parameter p, we find that the differences decrease slightly when the number of arguments increases. In case E, the differences also increase when increasing the number of arguments, but much more slowly, because of the compensation properties of averages.

Example 5. Finally, we analyze what happens when we use input distributions other than the uniform one. We use Beta distributions to move the concentration of the values to different areas of the interval I. We observe that the relationships obtained in Examples 1 and 2 can vary depending on which values are weighted more by the aggregation operator and on where the values are more concentrated. Actually, in case A the differences between the minimum and product t-norms become greater when the Beta distribution is unimodal and concentrates values in one subinterval of I (for instance, when using B(1,10), B(2,2), or B(10,1)). The reason is that this concentration of input values weakens the compensation that uniformity produced among extreme values. On the other hand, if the Beta distribution is bimodal (as B(.5,.5)) we observe that this compensation becomes greater and, thus, it is much more difficult to appreciate differences between minimum and product. This also occurs when the values are concentrated in opposite ways, that is, when the inputs of one argument accumulate in the opposite subinterval to those of the second argument (B(10,1) and B(1,10), for instance). A similar behavior occurs in case C. This inclines us to think that the reason for this behavior is not dominance between t-norms (it holds in case A but not in case C) but the reinforcement property of t-norms. Thus, when more weight is put on some subinterval of I, the reinforcement property becomes more important for differentiating t-norms. In case E the behavior is just the opposite of cases A and C. Obviously, since we are now averaging, the compensation produced by the operator becomes less effective when the values are concentrated in some subinterval of I and, then, it is much easier to differentiate the outputs. That is why the behavior is complementary to the one shown by t-norms. In summary, the reinforcement/compensatory properties accentuate the differences between output distributions when the inputs concentrate in some regions of the unit square.

5 Conclusion and future works

In this paper we have performed a statistical analysis of the similarities and differences within three main groups of aggregation functions: conjunctive, disjunctive and compensatory functions. We have obtained several conclusions, described in the examples, that in some cases allow the practitioner to decide which aggregation function to use. In other cases, the theoretical output distributions lead us to clearly differentiate between the simulated outputs.

Our future work is focused on analyzing the behavior of the outputs under different probability distributions and on studying the role that the compensatory, conjunctive and disjunctive properties play in this behavior. Moreover, we are also planning to provide a procedure to check the goodness-of-fit of a given output to a known distribution. Finally, another open problem is to study the real effect of different aggregation functions on the same set of input data. We plan to develop all this in a forthcoming paper.

Acknowledgements

The second author is supported by a grant from the Spanish Ministry of Science and Innovation, MTM2008-01519.



Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

Decoding of signals from MIMO communication systems

using Direct Search methods

Rafael A. Trujillo1, Antonio M. Vidal2 and Víctor M. García2

1 Departamento de Técnicas de Programación, Universidad de las Ciencias Informáticas, La Habana, CUBA

2 Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Valencia, SPAIN

emails: [email protected], [email protected], [email protected]

Abstract

Direct Search is a set of derivative-free optimization techniques. Their properties make them appropriate for optimization problems where derivative-based methods cannot be applied or do not offer good results. In this work we present the application of Direct Search methods to a discrete optimization problem: the decoding of signals coming from Multiple Input Multiple Output wireless communication systems. The results obtained are compared with those obtained by other methods designed specifically for this problem.

1 Introduction

Discrete optimization problems have a very different nature compared with continuous optimization problems. The latter rely on the derivatives of the goal function, which are used to build preferred search directions. However, when the solution domain is a discrete, multidimensional space (say, Z^n), derivatives are no longer available (or can be meaningless). Then, discrete problems become combinatorial ones and the difficulty of solving them increases; the computational complexity of these problems is very often exponential.

The Direct Search methods are a class of derivative-free methods that are appropriate for the solution of discrete optimization problems. These methods were first proposed and applied in the 60s [6, 10]. They are robust, locally convergent and can be easily programmed; however, they are slow compared with methods using derivatives. An important advantage is that they are easily parallelizable; this fact has increased their popularity [2, 15].


Most of the works in this field apply these methods to continuous optimization problems, although they can be applied as well to discrete optimization, or even to the optimization of functions of a non-numeric nature.

In this work we have focused our interest on the solution, through Direct Search, of the discrete least squares problem

min_{s ∈ A^m} ‖x − Hs‖²    (1)

where x ∈ R^{n×1}, H ∈ R^{n×m} and A is a set of integer values, so that A^m is a subset of Z^m.

Usually this problem is described in terms of lattices. If the elements of A are equally spaced, the set A^m forms a rectangular lattice like the one shown in Figure 1(a). When the matrix H multiplies the elements of A^m the lattice is deformed, and might look like the one in Figure 1(b). Then it is said that the new lattice has been generated by the matrix H.

Problem (1) is equivalent to the problem of finding the closest point of the lattice generated by the matrix H to a given point x. This problem is known as the CVP (Closest Vector Problem), and is known to be NP-complete [9]. This is why an exhaustive search through the whole lattice is not viable.

Figure 1: (a) Rectangular lattice; (b) Skewed lattice

This problem appears in the field of wireless communications, specifically in the decoding of signals in MIMO (Multiple Input - Multiple Output) systems [4, 14]. These systems are composed of M transmitting antennas and N receiving antennas. Through this system a signal s = [s1, s2, . . . , sM]^T ∈ C^M is sent, where the real and imaginary parts of each component belong to a discrete and finite set A, and a signal x ∈ C^N is received. This signal is a linear combination of the transmitted signal s, perturbed with additive white Gaussian noise (AWGN) v ∈ C^N with variance σ²:

x = Hs + v    (2)

Here H is a general complex matrix with N rows and M columns (usually H is called the "channel matrix"). The discrete set A is finite (|A| = L), and is called the constellation or symbol alphabet.


For computational reasons, the complex model (2) is usually converted to a real model, where the vector s has length m = 2M, and the vectors x and v (of length n = 2N) are defined as:

s = [Re(s)^T  Im(s)^T]^T,   x = [Re(x)^T  Im(x)^T]^T,   v = [Re(v)^T  Im(v)^T]^T,

and the matrix H of dimensions n × m is defined as:

H = [  Re(H)   Im(H) ]
    [ −Im(H)   Re(H) ]

Thus, the real model equivalent to (2) is given by:

x = Hs + v    (3)

In this setting, the solution to the problem is called the Maximum Likelihood (ML) solution.
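A minimal NumPy sketch of this complex-to-real conversion (the function name is ours; it simply stacks real and imaginary parts following the definitions above):

import numpy as np

def complex_to_real_model(H, x, s=None):
    # Real channel matrix built block-wise as defined above (2N x 2M)
    Hr = np.block([[H.real, H.imag],
                   [-H.imag, H.real]])
    xr = np.concatenate([x.real, x.imag])     # real received vector, length 2N
    sr = None if s is None else np.concatenate([s.real, s.imag])
    return Hr, xr, sr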

This problem has attracted plenty of attention in recent years, and many methods have been proposed for its efficient solution. The so-called heuristic methods are based on the solution of the continuous problem (or of a modified problem), plus the quantization (rounding to the closest value in the lattice) of the obtained (continuous) solution. The most popular methods in this category are the ZF (Zero-Forcing) method and the MMSE (Minimum Mean Square Error) method [5]. These methods require a QR decomposition to solve the least squares problem, hence their complexity is cubic. These methods do not guarantee to find the ML solution; instead, they give an approximate solution. Other methods, such as the Sphere-Decoding methods [3, 12], do obtain the ML solution but at the expense of a higher computational cost.

In this paper we propose the Direct Search methods as a new way to solve problem (1). We will show later that these methods offer a performance at least comparable to that offered by other known methods. For this work, several Direct Search methods were designed and tested, but we present only those that gave the best results.

In the next section we describe the Direct Search algorithms designed; then, the experimental results are presented, comparing the methods in terms of the accuracy of the solution and of the computational work required (number of evaluations of the objective function). Finally, the conclusions of the work are offered.

2 Direct Search Methods

We will present two different Direct Search methods; both belong to the class of Generating Set Search, as defined by Kolda et al. in [8]. These methods start from an initial point x0, an initial step length ∆0 and a set of directions spanning R^n, D0 = {d_i}_{i=1,...,p}, d_i ∈ R^n, such that every vector in R^n can be written as a nonnegative linear combination of the directions in Dk.


The driving idea of the GSS methods is to find a descent direction among those of Dk. To do that, in each iteration k the objective function f is evaluated along the directions in Dk. At that stage, the current point is xk and the step length is ∆k; therefore, f(xk + ∆k d_i) is computed, for i = 1, ..., p, until a direction d_i is found such that f(xk + ∆k d_i) < f(xk). If no such direction is found, the step length is decreased and the function is evaluated again along the directions.

When an acceptable pair (∆k, d_i) is found, the new point is updated: x_{k+1} = xk + ∆k d_i, a new step length ∆_{k+1} is computed and the set of directions is possibly modified or updated. This procedure is repeated until convergence (that is, until the step length is small enough).

This general algorithmic framework can be implemented in many ways. Among the versions that we have implemented, we have chosen the two described below, since they gave the best results.
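A compact Python sketch of this generating-set-search loop, ours rather than the authors' implementation, with the objective f, the direction set and the contraction factor passed in:

import numpy as np

def gss_minimize(f, x0, directions, step0, step_tol, theta=0.5):
    # Poll the directions in order; move on the first descent point,
    # contract the step when no direction decreases f.
    x, step = np.asarray(x0, dtype=float), float(step0)
    fx = f(x)
    while step >= step_tol:
        for d in directions:
            trial = x + step * np.asarray(d, dtype=float)
            ft = f(trial)
            if ft < fx:                 # descent direction found
                x, fx = trial, ft
                break
        else:                           # no descent: contract the step length
            step *= theta
    return x, fx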

2.1 Method GSS1

In our first version, the direction set was chosen as

Dk = {±e_i} ∪ {(1, 1, · · · , 1), (−1, −1, · · · , −1)},    (4)

that is, the set of the coordinate axes plus the vectors (1, 1, · · · , 1) and (−1, −1, · · · , −1), which very often accelerate the search when the initial point is far from the optimum.

Another characteristic is the strategy proposed in [7] of doubling the step length when the same descent direction is chosen in two consecutive iterations.

A distinct characteristic of our method (not proposed before, as far as we know) is an appropriate rotation of the set of directions. The evaluations are carried out in the order in which the directions are located in Dk. Therefore, if the descent directions are located in the first positions of the set, the algorithm should find the optimum faster. To accomplish that, we propose the following: if in iteration k the first descent direction is d_i, this means that d_1, ..., d_{i−1} are not descent directions and, most likely, will not be descent directions in the next iterations. Therefore, in our algorithm, these directions are displaced to the end of the set Dk; that is, the new set would be (d_i, d_{i+1}, ..., d_n, d_1, ..., d_{i−1}).
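In code, this rotation amounts to moving the failed directions to the back of the list; a one-function sketch (ours):

def rotate_directions(D, i):
    # D[i] was the first descent direction found: poll it first next time,
    # and send the failed directions D[0], ..., D[i-1] to the end of the set.
    return D[i:] + D[:i]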

2.2 Method GSS2

This method follows a strategy similar to that described by Hooke and Jeeves in [6]. Here the initial direction set is D0 = {±e_i} and, unlike in GSS1, all the directions in Dk are explored. Then, collecting all the descent directions d^(k)_{i,1}, d^(k)_{i,2}, ..., d^(k)_{i,j}, a new descent direction is built:

d = d^(k)_{i,1} + d^(k)_{i,2} + ... + d^(k)_{i,j},    (5)

which will probably cause a greater descent. If this is not true, the algorithm will choose among d^(k)_{i,1}, d^(k)_{i,2}, ..., d^(k)_{i,j} the direction with the largest descent.


Clearly, this algorithm should need fewer iterations to converge than GSS1, but the cost of each iteration is larger, since the function is evaluated along all the directions.

2.3 Parameter Configuration for the Direct Search

The application of the Direct Search methods requires a previous adaptation to the problem. The objective function will be the function f : A^m → R such that f(s) = ‖x − Hs‖², s ∈ A^m, where A is the constellation.

Next, we summarize the configuration of the following parameters:

• Initial point s^(0): initially it is generated as a random lattice point, although it might also be generated using some heuristic method.

• Initial step length ∆0: to guarantee that the initial value of ∆0 is enough to explore the whole lattice, ∆0 must be initialized with the value L · l, where L is the number of symbols in the constellation and l is the distance between two symbols of A.

• Step length tolerance ∆tol: any real number smaller than l, since the minimum step length that the Direct Search methods would use is exactly l.

• Step expansion φ: we have chosen not to expand the step, so that the expansion factor is φ = 1.0.

• Step contraction θ: we have chosen to contract the step length by a factor of θ = 0.5 when an iteration fails. This factor is chosen because, since ∆0 = L · l and L generally is a power of 2, multiplying the step length by 0.5 keeps it a multiple of l by a factor that is always a power of 2.

• Set of search directions D: the set of coordinate directions can be used; since the step length will always be a multiple of l, the search will be performed only among lattice points.
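Collected as code, the configuration above might look as follows (a sketch; L is the constellation size and l the spacing between symbols):

def direct_search_parameters(L, l):
    return {
        'step0':    L * l,     # initial step length, enough to cover the lattice
        'step_tol': l / 2.0,   # any value smaller than l stops the search
        'phi':      1.0,       # no step expansion
        'theta':    0.5,       # halve the step when an iteration fails
    }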

3 Experimental Results

The Direct Search methods GSS1 and GSS2 were chosen to be applied to the problem of decoding signals in MIMO systems. To carry out the trials, L-PAM constellations with L = 8, 16, 32 and 64 were selected. We chose problems where n = m = 4, i.e., problems arising from MIMO systems with 2 transmitting antennas and 2 receiving antennas. The target matrices were first obtained by generating 2 × 2 complex matrices and then transforming them into the real model as explained above.

A Gaussian zero-mean white noise with variance given by

σ² = m(L² − 1) / (12ρ)    (6)

was added to all transmitted signals, where ρ is the signal-to-noise ratio (SNR); SNR values of 5, 10, 15, 20 and 25 were considered. For each case (n, m, L, ρ) ten matrices were generated; for each matrix ten signals were decoded; then, the average of the minimum values reached during the search and the average execution time were calculated.

[Figure 2 omitted: four panels, (a) 8-PAM, (b) 16-PAM, (c) 32-PAM and (d) 64-PAM, each plotting the Euclidean distance ‖x − Hs‖ against the SNR in dB for GSS1, GSS2, MMSE and ASD.]

Figure 2: Minima obtained in problems of dimension 4 × 4. Initial search point chosen at random.
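A sketch (ours) of how one such test case can be generated directly in the real model: a random 2 × 2 complex channel converted as above, symbols drawn from an assumed L-PAM alphabet, and AWGN with the variance of equation (6):

import numpy as np

def generate_real_mimo_case(M=2, N=2, L=8, snr=10.0, seed=0):
    rng = np.random.default_rng(seed)
    alphabet = np.arange(-(L - 1), L, 2)            # assumed L-PAM levels, spacing 2
    Hc = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
    H = np.block([[Hc.real, Hc.imag], [-Hc.imag, Hc.real]])
    m = 2 * M
    s = rng.choice(alphabet, size=m)                # stacked real/imaginary parts
    sigma2 = m * (L**2 - 1) / (12.0 * snr)          # equation (6)
    x = H @ s + np.sqrt(sigma2) * rng.standard_normal(2 * N)
    return H, s, x, alphabet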

The Direct Search methods were initially compared against an ML method and a heuristic method. The ML method chosen was the Automatic Sphere-Decoding (ASD) published in [13], and the heuristic method considered was the MMSE method published in [5]. Figure 2 shows the values of the Euclidean distance ‖x − Hs‖, where s is the vector of decoded symbols.

It can be seen that as the size of the constellation increases (and hence the problem becomes more complex), the solution offered by the MMSE method moves farther away from the ML solution (which is the global optimum), while in all cases the solutions obtained by the Direct Search methods are kept at a constant and quite close distance to the ML solution. Thus, in the cases L = 32 and L = 64 the Direct Search solutions clearly improve those offered by the heuristic method.

  L    Method   ρ = 5      ρ = 10    ρ = 15    ρ = 20    ρ = 25
  8    GSS1       46.45      46.91     48.25     47.66     49.13
       GSS2       60.32      59.01     61.63     61.47     63.03
       ASD        93.72      51.72     48.52     38.12     35.16
  16   GSS1       71.22      70.59     70.30     69.68     71.55
       GSS2       94.13      89.03     90.48     91.66     90.70
       ASD       698.28     173.00     93.16     95.88     87.88
  32   GSS1       93.55      93.49     99.42     95.94     99.50
       GSS2      123.69     122.52    122.01    121.96    124.17
       ASD      8406.12    1859.24    238.44    235.24    188.20
  64   GSS1      126.54     124.51    123.39    124.93    122.02
       GSS2      155.31     161.63    158.40    162.57    159.95
       ASD     13908.2     8967.4    2121.32    502.66    374.76

Table 1: Average number of explored lattice points

Table 1 shows the average number of lattice points explored by each method to find the closest point. Here the Direct Search methods are compared with the ASD method. This comparison is important because it provides a measure of how often each method performs the operation ‖x − Hs‖.

Obviously, as the complexity of the problem increases (higher values of L and lower values of ρ) all the methods explore more points of the lattice. In the case of the Direct Search methods, the value of the SNR has little influence, because in this first test a random starting point was taken. One can see that, as the problems become more complex, the Direct Search methods explore far fewer points than the ASD method. For L = 32, 64 there have been cases in which the use of the ASD method has implied the exploration of up to 100,000 or even 500,000 points.

Comparing the two Direct Search methods, it can be observed that in all cases the GSS1 method consumed less time than the GSS2 method, although the GSS2 method obtained better accuracy in almost all cases.

In these early experiments a randomly generated point was always selected as the starting point for the search. Thus, Direct Search always presents a disadvantage with regard to the MMSE method, as it works without knowledge of the variance of the noise that was added to the signal. That is why we also performed several tests using the solution obtained by another heuristic method, Zero-Forcing (ZF), as the starting point. In turn, a search radius was added to the ASD method to decrease the average number of nodes explored. We have taken as search radius ‖x − Hs_MMSE‖, where s_MMSE is the solution computed by the MMSE method. Figure 3 and Table 2 show the results of these experiments, this time with L = 16, 32, 64, 128.

As expected, in this last case the solutions of the Direct Search methods are much closer to the optimum, because now the search starts from a point which is the solution given by a heuristic method. Moreover, in the vast majority of these cases the Direct Search methods reach the ML solution. Table 2 shows that the Direct Search methods now explore fewer points of the lattice. The ASD method also considerably reduces the search field, due to the incorporation of the search radius taken from the MMSE solution. However, the behaviour of the Direct Search methods seems to be better for problems with a lot of noise, especially if the alphabet of symbols is large.

[Figure 3 omitted: four panels, (a) 16-PAM, (b) 32-PAM, (c) 64-PAM and (d) 128-PAM, each plotting the Euclidean distance ‖x − Hs‖ against the SNR in dB for GSS1, GSS2, MMSE and ASD-MMSE.]

Figure 3: Minima obtained in problems of dimension 4 × 4. Initial search point chosen with the ZF method.

  L     Method       ρ = 5       ρ = 10     ρ = 15    ρ = 20    ρ = 25
  16    GSS1           38.48       36.62      36.34     36.13     35.15
        GSS2           43.54       40.69      40.03     39.76     37.45
        ASD-MMSE       53.22       12.56      15.83     10.11      8.24
  32    GSS1           51.38       49.33      46.84     45.53     45.27
        GSS2           57.22       54.41      50.35     48.50     48.02
        ASD-MMSE      195.96       57.33      17.81     28.62     10.06
  64    GSS1           63.44       59.91      57.24     57.48     57.40
        GSS2           70.28       64.90      61.78     61.44     60.88
        ASD-MMSE     1798.84      560.48      44.03     17.28     13.55
  128   GSS1           75.35       73.02      70.28     68.18     67.89
        GSS2           86.45       78.48      75.82     71.25     71.45
        ASD-MMSE    19917.48     4009.26     374.31     28.62     21.10

Table 2: Average number of explored lattice points

4 Conclusions

In this work two Direct Search methods have been designed and implemented. Several strategies for accelerating convergence have been incorporated. First, in the GSS1 method a strategy of rotation of the directions has been proposed. This complements the strategy proposed in [7] of doubling the step length when a direction has been successful in decreasing the value of the objective function in more than one consecutive iteration. Second, in the GSS2 method a new, simple rule to build new directions has been proposed, obtained by combining the coordinate directions that achieved some reduction.

The proposed methods were applied to the optimization problem of estimating the transmitted symbols in MIMO wireless systems. These methods were configured for this problem and they were compared with two of the best methods for it, the heuristic MMSE method and the exact ASD method. The comparisons were carried out by examining the quality of the decoded signals and the cost of the search. It has been checked that, taking random initial search points, the solutions obtained by the Direct Search methods are as good as, or better than, the solutions obtained by the heuristic method, especially when the dimension of the problem increases. We also carried out many experiments starting the search from a point obtained through heuristic methods. Then, the solution obtained was the ML solution in a high percentage of cases.

From the point of view of the execution time, in problems where the noise was relatively important, the Direct Search methods explored on average fewer points than the ASD method. In some cases, the difference was substantial. The solution obtained was in most cases the ML solution. We can conclude that the Direct Search methods are both effective and efficient methods for decoding signals in MIMO systems. The performance of the GSS1 method, which had the shortest execution time in all cases, is particularly remarkable.

Acknowledgements

This work was financially supported by the Spanish Ministerio de Ciencia e Innovación (Project TIN2008-06570-C04-02), by the Universidad Politécnica de Valencia through project 20080009 and by the Generalitat Valenciana through Project 20080811.

References

[1] G. Berman, Lattice Approximations to the Minima of Functions of Several Variables, J. ACM, 16 (1969), pp. 286–294.

[2] J. E. Dennis, Jr. and V. Torczon, Direct search methods on parallel machines, SIAM Journal on Optimization, 1 (1991), pp. 448–474.

[3] U. Fincke and M. Pohst, Improved methods for calculating vectors of short length in a lattice, including a complexity analysis, Mathematics of Computation, 44(170) (1985), pp. 463–471.

[4] G. J. Foschini and M. J. Gans, On limits of wireless communications in a fading environment when using multiple antennas, Wireless Personal Communications, 6 (1998), pp. 311–335.

[5] B. Hassibi, An efficient square-root algorithm for BLAST, ICASSP '00: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2 (2000), pp. II737–II740.

[6] R. Hooke and T. A. Jeeves, Direct Search solution of numerical and statistical problems, Journal of the Association for Computing Machinery, (1961), pp. 212–229.

[7] P. D. Hough, T. G. Kolda, and V. J. Torczon, Asynchronous parallel pattern search for nonlinear optimization, SIAM J. Sci. Comput., 23 (2001), pp. 134–156.

[8] T. G. Kolda, R. M. Lewis, and V. Torczon, Optimization by Direct Search: New Perspectives on Some Classical and Modern Methods, SIAM Review, 3 (2003), pp. 385–442.

[9] D. Micciancio and S. Goldwasser, Complexity of Lattice Problems, Kluwer Academic Publishers, Norwell, MA, USA, 2002.

[10] J. A. Nelder and R. Mead, A simplex method for function minimization, The Computer Journal, 7 (1965), pp. 308–313.

[11] E. Polak, Computational Methods in Optimization: A Unified Approach, Academic Press, New York, 1971.

[12] C.-P. Schnorr and M. Euchner, Lattice basis reduction: Improved practical algorithms and solving subset sum problems, Math. Programming, 66 (1994), pp. 181–191.

[13] K. Su, Efficient Maximum-Likelihood detection for communication over Multiple Input Multiple Output channels, tech. report, Department of Engineering, University of Cambridge, 2005.

[14] I. E. Telatar, Capacity of multi-antenna Gaussian channels, Europ. Trans. Telecommun., (1999), pp. 585–595.

[15] V. Torczon, Multi-Directional Search: A Direct Search Algorithm for Parallel Machines, tech. report, Department of Computational and Applied Mathematics, Rice University, Houston, TX, 1990.

[16] V. Torczon, On the convergence of pattern search algorithms, SIAM Journal on Optimization, 7 (1997), pp. 1–25.


Proceedings of the International Conference
on Computational and Mathematical Methods
in Science and Engineering, CMMSE 2009
30 June, 1–3 July 2009.

Unification of Analysis with Mathematica

Unal Ufuktepe1 and Sinan Kapcak1

1 Department of Mathematics, Izmir University of Economics, Izmir, TURKEY

emails: [email protected], [email protected]

Abstract

In 1990, in his PhD thesis, Hilger defined the Time Scale Calculus, which is the unification of discrete and continuous analysis. In 2005 Yantir and Ufuktepe showed the delta derivative with Mathematica [4]. In this study we give many computations of Time Scale Calculus with Mathematica, such as the definition of any time scale and the graphs of functions on different time scales. We also improve and extend the Time Scale package.

Key words: Time Scale, Mathematica
MSC 2000: AMS codes (optional)

1 Introduction

The theory of time scales springs from the doctoral thesis of Hilger. Time Scale calculus unifies and generalizes various mathematical concepts from the theories of discrete and continuous analysis. A succinct survey on time scales can be found in [1]. Mathematica is a revolutionary software system for engineering, science, economics, and math education and research, delivering unprecedented workflow, coherence, reliability, and innovation. Mathematica is an excellent program not only for computation, but also for modeling, simulation, visualization, development, documentation, and deployment. Since time scales are a new concept for science, there are only a few works on computer algebra systems: previous works of the first author and the Time Scales Toolbox for MatLab project by Brian Ballard and Bumni Otegbade (Baylor University). We developed Mathematica code to obtain the graph of a time scale, the graph of a function defined on a time scale, the area of the region bounded by a function and the x-axis, and the tangent line of a function at a given point, and to prove some equalities on time scales. In the first section we give some basic concepts of time scales, and in the second section we give further examples with Mathematica.


2 Time Scales with Mathematica

Definition 2.1 A time scale, T, is an arbitrary nonempty closed subset of the real numbers. Thus, R itself, Z, N, a union of closed intervals such as [0, 1] ∪ [2, 3], or closed intervals plus some single points such as [0, 1] ∪ {4, 6, 8} can be given as examples of time scales, whereas Q, C, and any subset of the real numbers that is not closed or is not a union of closed intervals or single points are not time scales [3, 2].

Definition 2.2 Let T be a time scale. We define the forward jump operator σ : T → T by

σ(t) := inf{s ∈ T : s > t}    (1)

and the backward jump operator ρ : T → T by

ρ(t) := sup{s ∈ T : s < t}.    (2)

If σ(t) > t, t is said to be right-scattered, and if ρ(t) < t, t is left-scattered. Points that are both right-scattered and left-scattered are called isolated. Also, if σ(t) = t, then t is called right-dense, and if ρ(t) = t, then t is called left-dense. Points that are both right-dense and left-dense are called dense points. As a special case, if t = max T then σ(t) = t, and if t = min T then ρ(t) = t.

Definition 2.3 The function µ : T → [0,∞) defined by

µ(t) := σ(t) − t    (3)

is called the graininess function.

The graininess function plays a central role in the analysis on time scales. In the general case many formulae contain terms with the factor µ(t).
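These operators are easy to compute once a time scale is stored as a union of closed intervals and isolated points; a small Python sketch, independent of the authors' Mathematica package (the representation and names are ours):

def sigma(ts, t):
    # Forward jump: inf of the points of the time scale above t.
    # ts is a list of closed intervals (a, b); an isolated point is (a, a).
    candidates = [a for a, b in ts if a > t] + [t for a, b in ts if a <= t <= b and t < b]
    return min(candidates) if candidates else t     # sigma(max T) = max T

def rho(ts, t):
    # Backward jump: sup of the points of the time scale below t.
    candidates = [b for a, b in ts if b < t] + [t for a, b in ts if a <= t <= b and t > a]
    return max(candidates) if candidates else t     # rho(min T) = min T

def mu(ts, t):
    # Graininess function mu(t) = sigma(t) - t.
    return sigma(ts, t) - t

# The time scale T1 = [-1, 0] U [1, 2] U {1/2} of the example below:
T1 = [(-1.0, 0.0), (1.0, 2.0), (0.5, 0.5)]
print(sigma(T1, 0.0), rho(T1, 0.5), mu(T1, 0.0))    # 0.5, 0.0, 0.5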

As a first example we represent the time scale T1 consisting of two closed intervals and one isolated point:

T1 = [−1, 0] ∪ [1, 2] ∪ {1/2}

In our package, we describe a time scale as a collection of three lists: a list of right-dense left-scattered points, a list of left-dense right-scattered points and a list of isolated points, namely, for our example, {−1, 1}, {0, 2} and {1/2}, respectively. Note that the number of elements in the first list must be equal to the number of elements in the second.

In[1]:= T1={{-1,1},{0,2},{1/2}};


We can check whether T1 is a time scale or not and get its graph on the real axis:

In[2]:= QTimeScale[T1]
Out[2]:= True
In[3]:= DrawTimeScale[T1]
Out[3]:= See Figure 1

Figure 1: Time Scale T1

We define the forward and backward jump operators and the graininess function in our package by Sigma[Time Scale, Point], Rho[Time Scale, Point] and TSGraininess[Time Scale, Point], respectively:

In[4]:= Sigma[T1,-1]
Out[4]:= -1
In[5]:= Sigma[T1,1/2]
Out[5]:= 1
In[6]:= Sigma[T1,Sqrt[2]]
Out[6]:= √2
In[7]:= Sigma[T1,2]
Out[7]:= 2
In[8]:= Rho[T1,1/2]
Out[8]:= 0
In[9]:= Rho[T1,0]
Out[9]:= 0
In[10]:= TSGraininess[T1,0]
Out[10]:= 1/2
In[11]:= TSGraininess[T1,Sqrt[3]]
Out[11]:= 0

To plot a given function f : T → R, we provide TSPlot[Time Scale, Function, Variable]. As a first example let us use the function f : T1 → R, f(x) = x² on T1. For the second, we will try the function f : T1 → R, f(x) = σ(x):


In[12]:= TSPlot[T1,x^2,x]
Out[12]= See Figure 2
In[12]:= TSPlot[T1,Sigma[T1,x],x]
Out[12]= See Figure 3

Figure 2: Graph of x² on time scale T1

Definition 2.4 A function f : T → R is said to be rd-continuous if it is continuous at all right-dense points and the left-sided limit exists at left-dense points.

Definition 2.5 A function f : T → R is said to be ld-continuous if it is continuous at all left-dense points and the right-sided limit exists at right-dense points.

Now we consider a function f : T → R and define the so-called delta or Hilger derivative of f at a point t ∈ Tk. Tk is a new set derived from T: if T has a left-scattered maximum m, then Tk = T − {m}; otherwise, Tk = T.

Definition 2.6 Assume f : T → R is a function and let t ∈ Tk. Then we define the delta derivative f∆(t) to be the number (provided it exists) with the property that, given any ε > 0, there is a neighborhood U of t (i.e., U = (t − δ, t + δ) ∩ T for some δ > 0) such that

|(f(σ(t)) − f(s)) − f∆(t)(σ(t) − s)| ≤ ε|σ(t) − s|  for all s ∈ U.

We call f∆(t) the delta derivative of f at t. Moreover, we say that f is ∆-differentiable on Tk provided f∆(t) exists for all t ∈ Tk. The function f∆ : Tk → R is then called the delta derivative of f on Tk.

Theorem 2.7 Assume f : T → R is a function and let t ∈ Tk. Then we have the following:

Figure 3: Graph of σ(x) on time scale T1

(i) If f is differentiable at t, then f is continuous at t.

(ii) If f is continuous at t and t is right-scattered, then f is differentiable at t with

f∆(t) = (f(σ(t)) − f(t)) / µ(t).

(iii) If t is right-dense, then f is differentiable at t if and only if the limit

lim_{s→t} (f(t) − f(s)) / (t − s)

exists as a finite number; in this case

f∆(t) = lim_{s→t} (f(t) − f(s)) / (t − s).

(iv) If f is differentiable at t, then f(σ(t)) = f(t) + µ(t)f∆(t).

We define the delta derivative of any function on a time scale at a given point by DeltaDerivative[Time Scale, Function, Variable, Point]:

In[13]:= DeltaDerivative[T1,x^2,x,0]
Out[13]:= 1/2
In[13]:= DeltaDerivative[T1,x^2,x,1]
Out[13]:= 2


Note that, since the function f : T1 → R, f(x) = σ(x) is not continuous at the point x = 0 (see Figure 3), the delta derivative at that point does not exist:

In[14]:= DeltaDerivative[T1,Sigma[T1,x],x,0]
Out[14]:= Delta derivative at the point 0 doesn't exist.

We also plot the tangent line at a given point as follows:

In[14]:= TSTangentLine[T1,x^2,x,0]
Out[14]= See Figure 4

Figure 4: Tangent Line for y = x² at the point 0 on T1

A function F : T → R is called a ∆-antiderivative of f : T → R provided F∆(t) = f(t) holds for all t ∈ Tκ. Then the Cauchy ∆-integral from a to t of f is defined by

∫_a^t f(s) ∆s = F(t) − F(a)  for all t ∈ T.

Note that in the case T = R, we have

f∆(t) = f′(t),   ∫_a^b f(t) ∆t = ∫_a^b f(t) dt,

and in the case T = Z, we have

f∆(t) = f(t + 1) − f(t),   ∫_a^b f(t) ∆t = Σ_{k=a}^{b−1} f(k),

where a, b ∈ T with a ≤ b.

Now we present the command DeltaIntegral[Time Scale, Function, Variable, Interval], which gives the definite integral of a given function on a given interval:

In[15]:= DeltaIntegral[T1,x^2,x,-1,1]
Out[15]:= 11/24

In our package we also display the corresponding area of the definite integral as follows:

In[15]:= TSAreaPlot[T1,x^2,x,-1,1]

Figure 5: Delta-integral of the function y = x² on the interval [−1, 1], on T1

3 Further Examples

In this section, we give some further examples of time scales and some results obtained using our package. Let us define the time scales T2 and T3 as follows:

In[]:= T2={Table[2 k,{k,0,5}],Table[2 k+1,{k,0,5}],{}}
Out[]:= {{0,2,4,6,8,10},{1,3,5,7,9,11},{}}
In[]:= T3={{},{},Table[n,{n,10}]}
Out[]:= {{},{},{1,2,3,4,5,6,7,8,9,10}}

By using the Mathematica command Function, we get the graphs of both T2 and T3:


In[3]:= Function[DrawTimeScale[#]]/@{T2,T3}
Out[3]:= See Figure 6

Figure 6: Time Scales T2 and T3

Consider the equation σ(ρ(x)) = ρ(σ(x)). Does it hold for every time scale? By visualizing, we can show that it does not have to.

Take the time scale T2 and plot the functions:

In[]:= TSPlot[T2,Sigma[T2,Rho[T2,x]],x]
Out[]:= See Figure 7

Figure 7: f(x) = σ(ρ(x)) on T2

In[]:= TSPlot[T2,Rho[T2,Sigma[T2,x]],x]
Out[]:= See Figure 8

4 Conclusion

As the reader can see, TSPlot and TSAreaPlot give very good results for understanding time scale calculus. Regarding our future work, we believe there is still potential in rotating the areas around a given axis, and we also plan to try to solve dynamic equations (the unification of difference and differential equations), which are not described here.


Figure 8: f(x) = ρ(σ(x)) on T2

References

[1] R. Agarwal, M. Bohner, D. O'Regan, A. Peterson, Dynamic Equations on Time Scales: A Survey, J. Comput. Appl. Math. 141 (2002) 1-26.

[2] M. Bohner and A. Peterson, Dynamic Equations on Time Scales: An Introduction with Applications, Birkhäuser, Boston, 2001, Chapter 1.

[3] S. Hilger, "Differential and difference calculus — unified", Nonlinear Analysis, Theory, Methods and Applications, Vol. 30 (1997), pp. 2683-2694.

[4] A. Yantir, U. Ufuktepe, Mathematica Applications on Time Scales for Calculus, Lecture Notes in Computer Science 3482 (2005), pp. 529-537.


Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

Application of the generalized finite difference method to

solve advection-diffusion equation

Francisco Ureña Prieto1, Juan José Benito Muñoz2 and Luis Gavete Corvinos3

1 Departamento de Matemática Aplicada, Universidad de Castilla-La Mancha

2 Departamento de Construcción y Fabricación, Universidad Nacional de Educación a Distancia

3 Departamento de Matemáticas Aplicadas a los Recursos Naturales, Universidad Politécnica de Madrid

emails: [email protected], [email protected], [email protected]

Abstract

The study of the advection-diffusion equation continues to be an active field of research. The subject has important applications to fluid dynamics as well as to many other branches of science and engineering. This paper shows the application of the generalized finite difference method to the solution of the advection-diffusion equation by an explicit method. The convergence of the method has been studied and the truncation error over irregular grids is given. Different examples have been solved using the explicit finite difference formulae and the stability criterion.

Key words: generalized finite differences, advection-diffusion equation, irregular

grids, explicit method.

MSC 2000: 65M06, 65M12, 74S20, 80M20

1 Introduction

An evolution of the method of finite differences has been the development of the generalized finite difference (GFD) method, which can be applied to irregular grids or clouds of points. Benito, Ureña and Gavete have made interesting contributions to the development of this method ([1, 3, 4, 6, 7]). The paper [5] shows the application of the GFD method to solving parabolic and hyperbolic equations.


2 Explicit Generalized Differences Schemes

We consider here an advective-diffusive equation for the unknown function U(x, t):

∂U(x, t)/∂t + cx ∂U(x, t)/∂x + cy ∂U(x, t)/∂y + cz ∂U(x, t)/∂z = α (∂²U(x, t)/∂x² + ∂²U(x, t)/∂y² + ∂²U(x, t)/∂z²),   t > 0, x ∈ Ω    (1)

with the initial condition

U(x, 0) = f(x)    (2)

and the boundary condition

a U(x0, t) + b ∂U(x0, t)/∂n = g(t)  in Γ    (3)

where f(x) and g(t) are two known functions, a, b are constants, Γ is the boundary of Ω, α > 0 is the diffusion coefficient and cx > 0, cy > 0, cz > 0 are the constant velocities.

The intention is to obtain explicit linear expressions for the approximation of the partial derivatives at the points of the domain. First of all, an irregular grid or cloud of points is generated in the domain Ω ∪ Γ. On defining a central node together with a set of nodes surrounding it, the star refers to this group of nodes established in relation to the central node. Each node in the domain has an associated star assigned to it. If U0 is the value of the function at the central node of the star, (x0, y0, z0), and Uj are the function values at the rest of the nodes, with j = 1, · · · , N, then, according to the Taylor series expansion,

Uj = U0 + hj ∂U0/∂x + kj ∂U0/∂y + lj ∂U0/∂z + (1/2)(hj ∂U0/∂x + kj ∂U0/∂y + lj ∂U0/∂z)² + · · ·    (4)

where (x0, y0, z0) are the spatial coordinates of the central node, (xj, yj, zj) are the spatial coordinates of the j-th node in the star, hj = xj − x0, kj = yj − y0, lj = zj − z0. If in equation (4) the terms of order higher than two are ignored, a second-order approximation of the function Uj is obtained; this approximation is denoted uj. It is then possible to define the function

B(u) = Σ_{j=1}^{N} [(u0 − uj + hj ∂u0/∂x + kj ∂u0/∂y + lj ∂u0/∂z + (1/2)(hj ∂u0/∂x + kj ∂u0/∂y + lj ∂u0/∂z)²) w(hj, kj, lj)]²    (5)

where w(hj, kj, lj) is the so-called weight function. If the function (5) is minimized with respect to the partial derivatives, the following linear equation system is obtained:

A Du = b    (6)


On solving system (6), the explicit difference formulae are obtained. Including in equation (1) the explicit expressions obtained for the partial derivatives,

∂u0/∂t = (u0^{n+1} − u0^n) / Δt    (7)

∂u0/∂x = −λ0 u0^n + Σ_{j=1}^{N} λj uj^n    (8)

∂u0/∂y = −µ0 u0^n + Σ_{j=1}^{N} µj uj^n    (9)

∂u0/∂z = −η0 u0^n + Σ_{j=1}^{N} ηj uj^n    (10)

∂²u0/∂x² + ∂²u0/∂y² + ∂²u0/∂z² = −m0 u0^n + Σ_{j=1}^{N} mj uj^n    (11)

the following expression is obtained:

u0^{n+1} = u0^n − Δt [cx(−λ0 u0^n + Σ_{j=1}^{N} λj uj^n) + cy(−µ0 u0^n + Σ_{j=1}^{N} µj uj^n) + cz(−η0 u0^n + Σ_{j=1}^{N} ηj uj^n)] + α Δt [−m0 u0^n + Σ_{j=1}^{N} mj uj^n]    (12)

with

λ0 = Σ_{j=1}^{N} λj,   µ0 = Σ_{j=1}^{N} µj,   η0 = Σ_{j=1}^{N} ηj,   m0 = Σ_{j=1}^{N} mj.    (13)

Expression (12) relates the value of the function at the central node of the star at time n + 1 to the values of the function at the nodes of the star at time n, multiplied by specific coefficients.
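Once the coefficients λj, µj, ηj and mj of each star have been obtained from system (6), one time step of (12) is a simple loop over the stars; a Python sketch under that assumption (the data layout and names are ours):

import numpy as np

def gfd_explicit_step(u, stars, lam, mu, eta, m, cx, cy, cz, alpha, dt):
    # u        : array of nodal values at time n
    # stars[i] : list of the N node indices of the star of interior node i
    # lam, mu, eta, m : dictionaries mapping i to the coefficient arrays of its star;
    #                   the central coefficients are recovered via (13) as sums.
    u_new = u.copy()
    for i, nodes in stars.items():
        lam0, mu0, eta0, m0 = lam[i].sum(), mu[i].sum(), eta[i].sum(), m[i].sum()
        ux  = -lam0 * u[i] + lam[i] @ u[nodes]     # equation (8)
        uy  = -mu0  * u[i] + mu[i]  @ u[nodes]     # equation (9)
        uz  = -eta0 * u[i] + eta[i] @ u[nodes]     # equation (10)
        lap = -m0   * u[i] + m[i]   @ u[nodes]     # equation (11)
        u_new[i] = u[i] - dt * (cx * ux + cy * uy + cz * uz) + alpha * dt * lap
    return u_new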

3 Convergence

According to Lax's equivalence theorem, if the consistency condition is satisfied, stability is the necessary and sufficient condition for convergence. In this section we study firstly the truncation error of the convection-diffusion equation, and secondly consistency and stability.


3.1 Truncation error

We can split the total truncation error of the convection-diffusion equation previously defined in (1) into two parts, the first one corresponding to the time derivative (TEt), and the second one corresponding to the space derivatives (TEx). As is well known, the truncation error for the first-order time derivative is given as follows:

∂U(x0, t)/∂t = [U(x0, t + Δt) − U(x0, t)] / Δt − (Δt/2) ∂²U(x0, t1)/∂t² + Θ((Δt)²),   t < t1 < t + Δt    (14)

TEt = −(Δt/2) ∂²U(x0, t1)/∂t² + Θ((Δt)²),   t < t1 < t + Δt    (15)

In order to obtain the truncation error for the space derivatives, a Taylor series expansion including higher order derivatives is used, and then the higher order function B*(u) is obtained:

B*(u) = Σ_{j=1}^{N} [(u0 − uj + hj ∂u0/∂x + kj ∂u0/∂y + lj ∂u0/∂z + (1/2)(hj ∂u0/∂x + kj ∂u0/∂y + lj ∂u0/∂z)² + (1/6)(hj ∂u0/∂x + kj ∂u0/∂y + lj ∂u0/∂z)³ + (1/24)(hj ∂u0/∂x + kj ∂u0/∂y + lj ∂u0/∂z)⁴) w(hj, kj, lj)]²    (16)

If the function (16) is minimized with respect to the partial derivatives down to the second order, the following linear equation system is obtained:

A Du = ( Σ_{j=1}^{N} Ξ hj,  Σ_{j=1}^{N} Ξ kj,  Σ_{j=1}^{N} Ξ lj,  Σ_{j=1}^{N} Ξ hj²/2,  Σ_{j=1}^{N} Ξ kj²/2,  Σ_{j=1}^{N} Ξ lj²/2,  Σ_{j=1}^{N} Ξ hj kj,  Σ_{j=1}^{N} Ξ hj lj,  Σ_{j=1}^{N} Ξ kj lj )^T    (17)

where

Ξ = [u0 − uj + hj ∂u0/∂x + kj ∂u0/∂y + lj ∂u0/∂z + (1/2)(hj ∂u0/∂x + kj ∂u0/∂y + lj ∂u0/∂z)² + (1/6)(hj ∂u0/∂x + kj ∂u0/∂y + lj ∂u0/∂z)³ + (1/24)(hj ∂u0/∂x + kj ∂u0/∂y + lj ∂u0/∂z)⁴] w(hj, kj, lj)²    (18)

with N ≥ 9, and then

TEx = C A⁻¹ × ( Σ_{j=1}^{N} Υ hj,  Σ_{j=1}^{N} Υ kj,  Σ_{j=1}^{N} Υ lj,  Σ_{j=1}^{N} Υ hj²/2,  Σ_{j=1}^{N} Υ kj²/2,  Σ_{j=1}^{N} Υ lj²/2,  Σ_{j=1}^{N} Υ hj kj,  Σ_{j=1}^{N} Υ hj lj,  Σ_{j=1}^{N} Υ kj lj )^T    (19)

where

C = ( −cx  −cy  −cz  α  α  α  0  0  0 )^T    (20)


and

Υ = −[ (1/3!)(hj ∂u0/∂x + kj ∂u0/∂y + lj ∂u0/∂z)³ + (1/4!)(hj ∂u0/∂x + kj ∂u0/∂y + lj ∂u0/∂z)⁴ ] w(hj, kj, lj)²    (21)

and, operating,

TEx = − Σ_{j=1}^{N} γj [ (∂³f0/∂x³) hj³/(6(hj² + kj² + lj²)) + (∂³f0/∂x²∂y) hj²kj/(2(hj² + kj² + lj²)) + (∂³f0/∂x∂y²) hjkj²/(2(hj² + kj² + lj²)) + (∂³f0/∂y³) kj³/(6(hj² + kj² + lj²)) + (∂³f0/∂z³) lj³/(6(hj² + kj² + lj²)) + (∂³f0/∂x²∂z) hj²lj/(2(hj² + kj² + lj²)) + (∂³f0/∂y²∂z) kj²lj/(2(hj² + kj² + lj²)) + (∂³f0/∂x∂z²) hjlj²/(2(hj² + kj² + lj²)) + (∂³f0/∂y∂z²) kjlj²/(2(hj² + kj² + lj²)) + (∂³f0/∂x∂y∂z) hjkjlj/(hj² + kj² + lj²) ] − Σ_{j=1}^{N} γj [ (∂⁴f0/∂x⁴) hj⁴/(24(hj² + kj² + lj²)) + (∂⁴f0/∂x³∂y) hj³kj/(6(hj² + kj² + lj²)) + (∂⁴f0/∂x²∂y²) hj²kj²/(4(hj² + kj² + lj²)) + (∂⁴f0/∂x∂y³) hjkj³/(6(hj² + kj² + lj²)) + (∂⁴f0/∂y⁴) kj⁴/(24(hj² + kj² + lj²)) + (∂⁴f0/∂z⁴) lj⁴/(24(hj² + kj² + lj²)) + (∂⁴f0/∂x³∂z) hj³lj/(6(hj² + kj² + lj²)) + · · · ] + Φ(hj³, kj³)    (22)

where γj ∈ R. Expression (22) is the truncation error for the spatial derivatives. The total truncation error (TTE) for the advection-diffusion equation is then given by

TTE = TEt + TEx (23)

where TEt and TEx are given by (15) and (22), respectively.

3.2 Consistency

By considering bounded derivatives in (23),

lim_{(Δt, hj, kj, lj) → (0, 0, 0, 0)} TTE = 0.    (24)

Hence the truncation error given in (24) shows the consistency of the approximation of the convection-diffusion equation with constant coefficients. In the following paragraphs the stability is studied using the von Neumann criterion.

3.3 Stability criteria

For the difference schemes, the von Neumann condition is sufficient as well as necessary for stability [2]: "Boundary conditions are neglected by the von Neumann method, which applies in theory only to pure initial value problems with periodic initial data. It does however provide necessary conditions for stability of constant coefficient problems regardless of the type of boundary condition."

For the stability analysis a harmonic decomposition of the approximate solution is made at the grid points at a given time level. Then, following the von Neumann idea for stability analysis, the finite difference approximation at the central node at time n may be expressed as

u0^n = ξ^n e^{i ν^T x0}    (25)

and the finite difference approximation at the other nodes of the star as

uj^n = ξ^n e^{i ν^T xj}    (26)

where ν is the column vector of the wave numbers, x0 is the vector of coordinates of the central node of the star and xj is the vector of coordinates of the other nodes of the star, with

xj = x0 + hj (27)

where hj = (hj, kj, lj) are the relative coordinates between the nodes of the star. On the other hand, ξ is called the amplification factor and is in general a complex constant. If this amplification factor has a modulus greater than unity (‖ξ‖ > 1), the method is unstable. Substituting (25) and (26) into (12), we obtain

ξ^{n+1} e^{i ν^T x0} = ξ^n e^{i ν^T x0} − Δt [cx(−λ0 ξ^n e^{i ν^T x0} + Σ_{j=1}^{N} λj ξ^n e^{i ν^T xj}) + cy(−µ0 ξ^n e^{i ν^T x0} + Σ_{j=1}^{N} µj ξ^n e^{i ν^T xj}) + cz(−η0 ξ^n e^{i ν^T x0} + Σ_{j=1}^{N} ηj ξ^n e^{i ν^T xj})] + α Δt [−m0 ξ^n e^{i ν^T x0} + Σ_{j=1}^{N} mj ξ^n e^{i ν^T xj}]    (28)

Using (27) and cancelling ξ^n e^{i ν^T x0} leads to

ξ = 1 − Δt [cx(−λ0 + Σ_{j=1}^{N} λj e^{i ν^T hj}) + cy(−µ0 + Σ_{j=1}^{N} µj e^{i ν^T hj}) + cz(−η0 + Σ_{j=1}^{N} ηj e^{i ν^T hj})] + α Δt [−m0 + Σ_{j=1}^{N} mj e^{i ν^T hj}]    (29)


Substituting (13) into (29) we obtain

ξ = 1 − Δt [Σ_{j=1}^{N} (−cx λj − cy µj − cz ηj + α mj)(1 − e^{i ν^T hj})]
  = 1 − Δt [Σ_{j=1}^{N} (−cx λj − cy µj − cz ηj + α mj)(1 − cos ν^T hj)] + i Δt [Σ_{j=1}^{N} (−cx λj − cy µj − cz ηj + α mj) sin ν^T hj]    (30)

Then we can write the stability conditions as

\[
-1 < 1 - \Delta t\Bigl[\sum_{j=1}^{N}(-c_x\lambda_j - c_y\mu_j - c_z\eta_j + \alpha m_j)\,(1 - \cos\nu^T h_j)\Bigr] < 1
\;\Longleftrightarrow\;
0 < \Delta t < \frac{1}{|\alpha m_0 - c_x\lambda_0 - c_y\mu_0 - c_z\eta_0|} \qquad (31)
\]
\[
\Bigl|\Delta t\Bigl[\sum_{j=1}^{N}(-c_x\lambda_j - c_y\mu_j - c_z\eta_j + \alpha m_j)\,\sin\nu^T h_j\Bigr]\Bigr| < 1
\;\Longleftrightarrow\;
0 < \Delta t < \frac{1}{|\alpha m_0 - c_x\lambda_0 - c_y\mu_0 - c_z\eta_0|} \qquad (32)
\]

and the modulus of the amplification factor is

\[
\|\xi\| \le 1 \;\Longleftrightarrow\;
\Bigl(\Delta t\Bigl[\sum_{j=1}^{N}(-c_x\lambda_j - c_y\mu_j - c_z\eta_j + \alpha m_j)\,\sin\nu^T h_j\Bigr]\Bigr)^2
\le \Delta t\Bigl[\sum_{j=1}^{N}(-c_x\lambda_j - c_y\mu_j - c_z\eta_j + \alpha m_j)\,(1-\cos\nu^T h_j)\Bigr]
\times\Bigl(2 - \Delta t\Bigl[\sum_{j=1}^{N}(-c_x\lambda_j - c_y\mu_j - c_z\eta_j + \alpha m_j)\,(1-\cos\nu^T h_j)\Bigr]\Bigr)
\]
\[
\Bigl(\Delta t\Bigl[\sum_{j=1}^{N}(-c_x\lambda_j - c_y\mu_j - c_z\eta_j + \alpha m_j)\,\sin\nu^T h_j\Bigr]\Bigr)^2
\le 4\sum_{j=1}^{N}(-c_x\lambda_j - c_y\mu_j - c_z\eta_j + \alpha m_j)
\times\Bigl(1 - \Delta t\Bigl[\sum_{j=1}^{N}(-c_x\lambda_j - c_y\mu_j - c_z\eta_j + \alpha m_j)\Bigr]\Bigr) \qquad (33)
\]

with (32) and
\[
m_j \gg \lambda_j,\ \mu_j,\ \eta_j \qquad (34)
\]


\[
\Delta t\Bigl(\Bigl[\sum_{j=1}^{N}(-c_x\lambda_j - c_y\mu_j - c_z\eta_j + \alpha m_j)\,\sin\nu^T h_j\Bigr]\Bigr)^2 \le \alpha m_0 \qquad (35)
\]
\[
\Delta t\Bigl(c_x^2\sum_{j=1}^{N}\lambda_j^2 + c_y^2\sum_{j=1}^{N}\mu_j^2 + c_z^2\sum_{j=1}^{N}\eta_j^2\Bigr) \le \alpha m_0 \qquad (36)
\]

Then the conditions for stability of the convection-diffusion equation are

\[
\Delta t \le \frac{1}{|\alpha m_0 - c_x\lambda_0 - c_y\mu_0 - c_z\eta_0|}\,; \qquad
\frac{\Delta t\,\bigl(c_x^2\sum_{j=1}^{N}\lambda_j^2 + c_y^2\sum_{j=1}^{N}\mu_j^2 + c_z^2\sum_{j=1}^{N}\eta_j^2\bigr)}{\alpha\,|m_0|} \le 1 \qquad (37)
\]
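To make the bounds in (37) concrete, a minimal C sketch follows. It assumes the star coefficients $\lambda_j$, $\mu_j$, $\eta_j$, $m_j$ (with index 0 holding $\lambda_0$, $\mu_0$, $\eta_0$, $m_0$) have already been obtained from the GFDM star equation; the function name and argument layout are illustrative only.

#include <math.h>
#include <stddef.h>

/* Hedged sketch: largest time step allowed by both conditions of Eq. (37).
 * lambda[], mu[], eta[], m[] hold the star coefficients for j = 0..N,
 * where index 0 stores lambda_0, mu_0, eta_0, m_0. */
double max_stable_dt(const double *lambda, const double *mu,
                     const double *eta, const double *m, size_t N,
                     double cx, double cy, double cz, double alpha)
{
    /* First condition: dt <= 1 / |alpha*m0 - cx*lambda0 - cy*mu0 - cz*eta0| */
    double denom = fabs(alpha * m[0] - cx * lambda[0] - cy * mu[0] - cz * eta[0]);
    double dt1 = 1.0 / denom;

    /* Second condition:
     * dt * (cx^2*sum(lambda_j^2) + cy^2*sum(mu_j^2) + cz^2*sum(eta_j^2))
     *    / (alpha*|m0|) <= 1 */
    double sl = 0.0, sm = 0.0, se = 0.0;
    for (size_t j = 1; j <= N; ++j) {
        sl += lambda[j] * lambda[j];
        sm += mu[j] * mu[j];
        se += eta[j] * eta[j];
    }
    double dt2 = alpha * fabs(m[0]) / (cx * cx * sl + cy * cy * sm + cz * cz * se);

    return dt1 < dt2 ? dt1 : dt2;   /* both bounds must hold */
}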

4 Numerical Test

In order to illustrate the application of the explicit generalized finite difference (GFDM) schemes developed previously, a problem with a known exact solution is required, so that the approximate results obtained can be compared with the exact solution. The weighting function used has been

\[
w(h_j, k_j, l_j) = \frac{1}{\bigl(\sqrt{h_j^2 + k_j^2 + l_j^2}\,\bigr)^{3}} \qquad (38)
\]

The global error is evaluated for each time increment, in the last time step considered, using the formula

\[
\text{Global error} = \frac{\sqrt{\dfrac{\sum_{j=1}^{NT}\bigl(sol(j)-exac(j)\bigr)^2}{NT}}}{|exac_{max}|}\times 100 \qquad (39)
\]

and the maximum local error is evaluated, in the last time step calculated, using the formula

\[
\text{Maximum local error} = \max_j \bigl|sol(j) - exac(j)\bigr| \qquad (40)
\]

where $sol(j)$ is the GFDM solution at node $j$, $exac(j)$ is the exact value of the solution at node $j$, $exac_{max}$ is the maximum of the exact values over the cloud of nodes considered, and $NT$ is the total number of nodes of the domain. Let us consider the equation

\[
\frac{\partial U(x,t)}{\partial t} + \frac{\partial U(x,t)}{\partial x} + \frac{\partial U(x,t)}{\partial y} + \frac{\partial U(x,t)}{\partial z}
= 0.1\left(\frac{\partial^2 U(x,t)}{\partial x^2} + \frac{\partial^2 U(x,t)}{\partial y^2} + \frac{\partial^2 U(x,t)}{\partial z^2}\right),
\qquad t > 0,\; 0 < x, y, z < 1 \qquad (41)
\]

with given initial condition and Dirichlet boundary conditions. The exact solution is

U(x, y, z, t) = exp(−2.7t + x + y + z) (42)
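As a hedged illustration of how the error measures (39)-(40) can be evaluated against the exact solution (42), consider the following C sketch; the arrays sol[], x[], y[], z[] holding the GFDM solution and the node coordinates are assumed to be provided by the solver (names chosen here for illustration), and the square root in the global error follows the reconstruction of (39) given above.

#include <math.h>
#include <stddef.h>

/* Exact solution (42) of the test problem (41). */
static double exact(double x, double y, double z, double t)
{
    return exp(-2.7 * t + x + y + z);
}

/* Hedged sketch of Eqs. (39)-(40): sol[j] is the GFDM solution at node j
 * in the last time step, (x[j], y[j], z[j]) are the node coordinates and
 * NT is the total number of nodes. */
void error_measures(const double *sol, const double *x, const double *y,
                    const double *z, size_t NT, double t,
                    double *global_error, double *max_local_error)
{
    double sum_sq = 0.0, exac_max = 0.0, max_loc = 0.0;
    for (size_t j = 0; j < NT; ++j) {
        double e = exact(x[j], y[j], z[j], t);
        double d = fabs(sol[j] - e);
        sum_sq += d * d;
        if (fabs(e) > exac_max) exac_max = fabs(e);
        if (d > max_loc)        max_loc  = d;
    }
    *global_error    = sqrt(sum_sq / (double)NT) / exac_max * 100.0;  /* Eq. (39) */
    *max_local_error = max_loc;                                       /* Eq. (40) */
}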


In this problem we consider regular grids with different $h$. The influence on the global error of using different grids of nodes with different $h$ and different values of the time increment $\Delta t$ is shown in Figure 1. We also consider irregular grids (729 nodes).

Figure 1: Global error versus $h$; global error versus $\Delta t$

The influence of the time increment $\Delta t$ on the global error, and on the maximum local error, is shown in Figure 2.

Figure 2: Global error versus $\Delta t$; maximum local error versus $\Delta t$

5 Conclusions

The use of the generalized finite difference method with irregular clouds of points is an interesting way of solving partial differential equations. The extension of the generalized finite difference method to the explicit solution of the advection-diffusion equation has been developed. The truncation error of the advection-diffusion equation for irregular grids of points has been obtained, and the von Neumann stability criterion has been expressed as a function of the coefficients of the star equation for irregular clouds of nodes. As shown in the numerical results, a decrease in the value of the time step, always below the stability limits, leads to a decrease of the global error.

Acknowledgements

The authors acknowledge the support of the Ministerio de Ciencia e Innovación of Spain, project CGL2008-01757/CLI.

References

[1] J. J. Benito, F. Ureña and L. Gavete, Leading-Edge Applied Mathematical Modelling Research (Chapter 7), Nova Science Publishers, New York, 2008.

[2] A. R. Mitchell and D. F. Griffiths, The Finite Difference Method in Partial Differential Equations, John Wiley & Sons, New York, 1980.

[3] J. J. Benito, F. Ureña and L. Gavete, Influence of several factors in the generalized finite difference method, Applied Mathematical Modelling 25 (2001) 1039–1053.

[4] J. J. Benito, F. Ureña, L. Gavete and R. Alvarez, An h-adaptive method in the generalized finite differences, Computer Methods in Applied Mechanics and Engineering 192 (2003) 735–759.

[5] J. J. Benito, F. Ureña and L. Gavete, Solving parabolic and hyperbolic equations by the Generalized Finite Difference Method, Journal of Computational and Applied Mathematics 209(2) (2007) 208–233.

[6] J. J. Benito, F. Ureña, L. Gavete and B. Alonso, A posteriori error estimator and indicator in Generalized Finite Differences. Application to improve the approximated solution of elliptic PDEs, International Journal of Computer Mathematics 85 (2008) 359–370.

[7] J. J. Benito, F. Ureña, L. Gavete and B. Alonso, Application of the Generalized Finite Difference Method to improve the approximated solution of PDEs, Computer Modelling in Engineering & Sciences 38 (2009) 39–58.


Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

Analytic likelihood function for data analysis in

the starting phase of an influenza outbreak

Sander van Noort1, Nico Stollenwerk2 and Lewi Stone3

1 Theoretical Epidemiology Group, Gulbenkian Institute of Science, Oeiras, Portugal

2 Centro de Matematica e Aplicacoes Fundamentais, Universidade de Lisboa, Portugal

3 Biology Department, Tel Aviv University, Israel

emails: [email protected], [email protected], [email protected]

Abstract

Influenza is a disease which frequently captures media attention. Seasonal influenza epidemics are still the best data source for investigations of the spreading of the disease, even when preparing for pandemic influenza of newly mutated strains. We investigate data from an internet based surveillance system, the InfluenzaNet. Our previous work concentrated on analysing the noise floor before the seasonal outbreak starts and the burn-out of susceptibles at the height and final phase of the epidemic. Both scenarios could be modelled analytically, and likelihood functions could be derived rigorously. In the present study we model the start of the epidemic outbreak, and again we can derive the likelihood function analytically. Our approximation captures the initial exponential growth phase of the epidemic. Especially the interplay of the noise floor and the onset of the epidemic season is important for understanding when the outbreak happens and when normal fluctuations in the noise floor might still give erroneous alerts.

Key words: stochastic dynamics, master equation, generating function, partial differential equation, maximum likelihood, InfluenzaNet

1 Introduction

Recent outbreaks of influenza have alerted the public to the possible pandemic spread of a new influenza virus mutation. Seasonal influenza, too, is driven to a large extent by newly appearing slight mutations of strains already circulating in humans. Hence, seasonal influenza epidemics are still the best data source for investigations of the spreading of the disease, even when preparing for pandemic influenza of newly mutated strains.

Here we investigate data from an internet based surveillance system, the InfluenzaNet. Our previous work [1] concentrated on analysing the noise floor before the seasonal outbreak starts, as a Poisson process, and then the burn-out of susceptibles at the height and final phase of the epidemic. Both scenarios could be modelled analytically, and likelihood functions could be derived rigorously.

In the present study we model the epidemic outbreak of influenza as a susceptible-infected-recovered (SIR) process. From the master equation, which is traditionally widely used in physics and chemistry [2, 3], we calculate the partial differential equation (PDE) in two variables for the generating function in order to find solutions for the master equation [2, 4, 5, 9]. However, this PDE is analytically untractable. Therefore, the start of the epidemic outbreak is approximated by the assumption of abundantly many susceptible individuals during the onset phase. This assumption significantly simplifies the PDE to a form in only one variable, similar to the ones treated earlier [1]. It can be shown rigorously that the waning immunity transition from recovered R back to susceptibles S does not play any role during the onset of the epidemic, as defined by our assumption of abundance of susceptibles.

Though the PDE is still much more complicated than in the simple test cases treated in [1], we can solve the PDE and obtain the generating function analytically in closed form. From the generating function the likelihood for future parameter analysis can be calculated, as well as the dynamics of the mean value, demonstrating clearly that with our approximation of abundantly many susceptibles we describe the exponential growth phase of the epidemic onset. Especially the interplay of the noise floor and the onset of the epidemic season is important for understanding when the outbreak happens and when normal fluctuations in the noise floor might still give erroneous alerts. The analysis is applicable to influenza time series originating from InfluenzaNet, an internet based surveillance system which has operated successfully for a number of years in the Netherlands, Belgium, Portugal and Italy, and is in the process of starting to operate in further European countries like Germany, France and the United Kingdom [6, 7, 8]. Such a system, in parallel to the classical national medical surveillance systems, could serve as an early warning system in future pandemics, since information can be spread faster than in traditional notification systems.

2 Stochastic epidemic dynamics and generating function

One of the basic epidemic processes is the susceptible-infected-recovered (SIR) epidemic, in which susceptible individuals S become infected on contact with already infected I with infection rate β, and infected individuals recover with rate γ into the R class. Eventually, the recovered and immune R can become susceptible again with rate α. For the onset phase of an epidemic this waning susceptibility does not play any important role, as we will show below. For recurrent outbreaks of seasonal influenza it becomes important though [10]. The SIR epidemic is given by the reaction scheme

\[
S + I \xrightarrow{\ \beta\ } I + I, \qquad
I \xrightarrow{\ \gamma\ } R, \qquad
R \xrightarrow{\ \alpha\ } S \qquad (1)
\]


giving the following master equation (a stochastic Markov process in continuous time) for fixed population size N, hence N = S + I + R. Since R = N − S − I, we only need to consider the probability of S and I, with R following from these. The master equation is given by

\[
\frac{d}{dt}\,p(S,I,t) = \frac{\beta}{N}(I-1)(S+1)\,p(S+1,I-1,t) + \gamma(I+1)\,p(S,I+1,t)
+ \alpha(R+1)\,p(S-1,I,t)
- \left(\frac{\beta}{N}\,S\,I + \gamma I + \alpha R\right) p(S,I,t) \qquad (2)
\]

with (R + 1) = N − (S − 1) − I. In order to solve the master equation we can use generating functions or characteristic functions, obtaining an eventually easier solvable partial differential equation (PDE). In the following we apply the generating function to the master equation given above.

2.1 Generating function

The generating function for the master equation in two variables S and I is defined as

\[
\langle x^I y^S\rangle := \sum_{I=0}^{N}\sum_{S=0}^{N} x^I y^S\, p(S,I,t) =: f(x,y,t) \qquad (3)
\]

and it generates the moments of the stochastic process, once it is determined from that process. It is

\[
\frac{\partial f(x,y,t)}{\partial x} = \sum_{I=0}^{N}\sum_{S=0}^{N} I\,x^{I-1} y^{S}\, p(S,I,t) \qquad (4)
\]

and
\[
\frac{\partial f(x,y,t)}{\partial y} = \sum_{I=0}^{N}\sum_{S=0}^{N} x^{I}\, S\,y^{S-1}\, p(S,I,t) \qquad (5)
\]

and
\[
\frac{\partial^2 f(x,y,t)}{\partial x\,\partial y} = \sum_{I=0}^{N}\sum_{S=0}^{N} I\,x^{I-1}\, S\,y^{S-1}\, p(S,I,t) \qquad (6)
\]

etc. and from this we obtain the moments by evaluating at point (x, y) = (1, 1),

\[
\left.\frac{\partial f(x,y,t)}{\partial x}\right|_{x=1,\,y=1} = \sum_{I=0}^{N}\sum_{S=0}^{N} I\, p(S,I,t) = \langle I\rangle \qquad (7)
\]

respectively

\[
\left.\frac{\partial f(x,y,t)}{\partial y}\right|_{x=1,\,y=1} = \sum_{I=0}^{N}\sum_{S=0}^{N} S\, p(S,I,t) = \langle S\rangle \qquad (8)
\]


and for correlations e.g.

\[
\left.\frac{\partial^2 f(x,y,t)}{\partial x\,\partial y}\right|_{x=1,\,y=1} = \sum_{I=0}^{N}\sum_{S=0}^{N} S\,I\, p(S,I,t) = \langle SI\rangle\,. \qquad (9)
\]

Inserting the generating function into the stochastic master equation gives a partial differential equation (PDE) to be solved. From the generating function the probability can be obtained by a back transformation via Taylor's expansion of f(x, y, t) with respect to x and y. The dynamics of the generating function is given by

\[
\partial_t f(x,y,t) = \sum_{I=0}^{N}\sum_{S=0}^{N} x^{I} y^{S}\,\frac{d}{dt}\,p(S,I,t) \qquad (10)
\]

and inserting the master equation, Eq. (2), into Eq. (10) and using Eqs. (4), (5) and (6) gives the following PDE

\[
\frac{\partial f}{\partial t} = \frac{\beta}{N}\,x\,(x-y)\,\frac{\partial^2 f}{\partial x\,\partial y}
+ \gamma(1-x)\,\frac{\partial f}{\partial x}
+ \alpha(1-y)\left(N f - x\frac{\partial f}{\partial x} - y\frac{\partial f}{\partial y}\right) \qquad (11)
\]

with the initial condition $p(S, I, t_0) = \delta_{S,S_0}\cdot\delta_{I,I_0}$, giving the initial condition for the generating function $f(x, y, t_0) = x^{I_0}\cdot y^{S_0}$.

2.2 Dynamics of the mean value

The dynamics of the mean values can be obtained via the generating function and its PDE as

\[
\frac{d}{dt}\langle S\rangle = \frac{d}{dt}\left(\left.\frac{\partial f(x,y,t)}{\partial y}\right|_{x=1,\,y=1}\right)
= \left.\left(\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial t}\right)\right)\right|_{x=1,\,y=1} \qquad (12)
\]

and inserting the original PDE, Eq. (11), leads after some calculation to

\[
\frac{d}{dt}\langle S\rangle = -\frac{\beta}{N}\langle SI\rangle + \alpha\bigl(N - \langle S\rangle - \langle I\rangle\bigr) \qquad (13)
\]

with ⟨R⟩ = N − ⟨S⟩ − ⟨I⟩, as one would expect from direct calculations with the master equation. The well known closed ordinary differential equations of the SIR system are found from here by inserting the mean field approximation, i.e. by neglecting higher moments, ⟨SI⟩ − ⟨S⟩⟨I⟩ ≈ 0. Hence we obtain

\[
\frac{d}{dt}\langle S\rangle = -\frac{\beta}{N}\langle I\rangle\langle S\rangle + \alpha\langle R\rangle \qquad (14)
\]

respectively for 〈I〉

\[
\frac{d}{dt}\langle I\rangle = \frac{d}{dt}\left(\left.\frac{\partial f(x,y,t)}{\partial x}\right|_{x=1,\,y=1}\right)
= \frac{\beta}{N}\langle SI\rangle - \gamma\langle I\rangle \qquad (15)
\]

and in mean field approximation

\[
\frac{d}{dt}\langle I\rangle = \frac{\beta}{N}\langle I\rangle\langle S\rangle - \gamma\langle I\rangle\,. \qquad (16)
\]

The mean field ODEs for the SIR epidemic, Eqs. (14) and (16), are the starting point of all deterministic modelling.
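As a minimal illustration of how the mean field ODEs (14) and (16) are used in practice, the following C sketch integrates them with an explicit Euler step; the parameter values and step size are placeholders chosen for illustration, not values taken from the paper.

#include <stdio.h>

/* Hedged sketch: explicit Euler integration of the mean field SIR
 * ODEs (14) and (16), with <R> = N - <S> - <I>.  All numbers below
 * are illustrative placeholders. */
int main(void)
{
    double N = 1.0e6, beta = 0.3, gamma = 0.1, alpha = 0.01;
    double S = N - 10.0, I = 10.0, dt = 0.1;

    for (int step = 0; step < 1000; ++step) {
        double R  = N - S - I;
        double dS = -beta / N * I * S + alpha * R;   /* Eq. (14) */
        double dI =  beta / N * I * S - gamma * I;   /* Eq. (16) */
        S += dt * dS;
        I += dt * dI;
    }
    printf("S = %g, I = %g, R = %g\n", S, I, N - S - I);
    return 0;
}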


2.3 Application to the onset of the epidemic in a linearized model

For the initial phase of the SIR epidemic we now approximate the master equation by the assumption of an abundant number of susceptible individuals, hence the reaction scheme has to be altered in its first component to

\[
S^{*} + I \xrightarrow{\ \beta\ } I + I \qquad (17)
\]

with $S^{*}$ a constant for the number of susceptibles, hence $S - 1 \approx S^{*}$ etc. The master equation for this process is given by

\[
\frac{d}{dt}\,p(I,t) = \frac{\beta}{N}S^{*}(I-1)\,p(I-1,t) + \gamma(I+1)\,p(I+1,t)
- \left(\frac{\beta}{N}S^{*} I + \gamma I\right)p(I,t) \qquad (18)
\]

and with the definition of the constant $\beta := \frac{\beta}{N} S^{*}$ we obtain

\[
\frac{d}{dt}\,p(I,t) = \beta(I-1)\,p(I-1,t) + \gamma(I+1)\,p(I+1,t) - \bigl(\beta I + \gamma I\bigr)\,p(I,t)\,. \qquad (19)
\]

Due to the autocatalytic first reaction, which creates two infected from one infected, the force of infection increases, such that we expect an exponential increase of the number of infected, which is characteristic for the initial phase of an epidemic. We also observe that the waning immunity transition proportional to α vanishes from the master equation due to our approximation.

Since the S-dependence in the master equation drops out, the generating function is now simply defined as

\[
\langle x^I\rangle := \sum_{I=0}^{N} x^{I}\, p(I,t) =: f(x,t) \qquad (20)
\]

and it generates the moments of the stochastic process, once it is determined from that process. It is

\[
\frac{\partial f(x,t)}{\partial x} = \sum_{I=0}^{N} I\,x^{I-1}\, p(I,t) \qquad (21)
\]

with $(\partial f/\partial x)|_{x=1} = \langle I\rangle$. Inserting the generating function into the stochastic master equation gives a partial differential equation (PDE) to be solved. From the generating function the probability can be obtained by a back transformation via Taylor's expansion of f(x, t) with respect to x, see Eq. (3),

\[
p(I,t) = \frac{1}{I!}\left.\frac{\partial^{I} f(x,t)}{\partial x^{I}}\right|_{x=0}\,. \qquad (22)
\]

The dynamics for the generating function is given by

\[
\partial_t f(x,t) = \sum_{I=0}^{N} x^{I}\,\frac{d}{dt}\,p(I,t) \qquad (23)
\]


and inserting the master equation, Eq. (19), into Eq. (23) gives, after some calculation, the following PDE

\[
\frac{\partial f}{\partial t} = (1-x)(\gamma - \beta x)\,\frac{\partial f}{\partial x} \qquad (24)
\]

with the initial condition $p(I, t_0) = \delta_{I,I_0}$, giving the initial condition for the generating function $f(x, t_0) = x^{I_0}$. With these initial conditions, once a solution of Eq. (24) is found, we obtain via the back transformation Eq. (22) the solution $p(I, t \mid I_0, t_0)$ to be used for the likelihood function.

This PDE can be solved by the separation ansatz $z(x, t) = u(x)\cdot v(t)$, including the initial condition through another function $\Phi(z)$ as

f(x, t) := Φ(z) = Φ(u(x) · v(t)) . (25)

Inserting this ansatz into the PDE gives

\[
\partial_t f(x,t) = \frac{d\Phi}{dz}\cdot\frac{\partial z}{\partial t} = \frac{d\Phi}{dz}\cdot u(x)\,\frac{\partial v}{\partial t}
= (1-x)(\gamma-\beta x)\,\frac{\partial f}{\partial x}
= (1-x)(\gamma-\beta x)\,\frac{d\Phi}{dz}\,\frac{\partial z}{\partial x}
= (1-x)(\gamma-\beta x)\,\frac{d\Phi}{dz}\cdot\frac{\partial u}{\partial x}\,v(t) \qquad (26)
\]

separating the PDE into two ODEs, one for $v(t)$ with $dv/dt = v(t)$ and one for $u(x)$ with $du/dx = u(x)/\bigl((1-x)(\gamma-\beta x)\bigr)$, with an arbitrary function $\Phi(z)$. After integration we find the special solutions

\[
v(t) = e^{t}\,, \qquad u(x) = \left(\frac{x-1}{x-\frac{\gamma}{\beta}}\right)^{\frac{1}{\beta-\gamma}} \qquad (27)
\]

and as solution for the separation ansatz

\[
z(x,t) = \left(\frac{x-1}{x-\frac{\gamma}{\beta}}\right)^{\frac{1}{\beta-\gamma}} e^{t}\,. \qquad (28)
\]

To determine the function $\Phi(z)$ from the initial condition $f(x, t_0) = x^{I_0}$ we insert, for time $t_0$,

\[
f(x,t_0) = x^{I_0} = \Phi(z) = \Phi\!\left(\left(\frac{x-1}{x-\frac{\gamma}{\beta}}\right)^{\frac{1}{\beta-\gamma}} e^{t_0}\right) \qquad (29)
\]

and from $z = \left(\frac{x-1}{x-\gamma/\beta}\right)^{\frac{1}{\beta-\gamma}} e^{t_0}$ we determine $x(z)$ as

\[
x = \frac{\frac{\gamma}{\beta}\bigl(z\,e^{-t_0}\bigr)^{\beta-\gamma} - 1}{\bigl(z\,e^{-t_0}\bigr)^{\beta-\gamma} - 1} \qquad (30)
\]


and finally insert this into Φ(z) giving

\[
\Phi(z) = \Phi\!\left(\left(\frac{x-1}{x-\frac{\gamma}{\beta}}\right)^{\frac{1}{\beta-\gamma}} e^{t_0}\right)
= \left(\frac{\frac{\gamma}{\beta}\bigl(z\,e^{-t_0}\bigr)^{\beta-\gamma} - 1}{\bigl(z\,e^{-t_0}\bigr)^{\beta-\gamma} - 1}\right)^{I_0}\,. \qquad (31)
\]

Up to now we have considered Φ(z) only at the initial time t0. Now we take z for all times t to obtain the general solution for the generating function (in the special case of β = 0), f(x, t) = Φ(z(x, t)), with z(x, t) from Eq. (28). We finally obtain the general solution for all times

\[
f(x,t) = \left(\frac{\frac{\gamma}{\beta}(x-1)\,e^{(\beta-\gamma)(t-t_0)} - \bigl(x-\frac{\gamma}{\beta}\bigr)}{(x-1)\,e^{(\beta-\gamma)(t-t_0)} - \bigl(x-\frac{\gamma}{\beta}\bigr)}\right)^{I_0} \qquad (32)
\]

and from this we can obtain, via Eq. (22), the probability p(I, t), respectively the transition probability p(I, t | I0, t0) needed for the likelihood function [1], because we used the initial conditions as described above.

From Eq. (32) we can also calculate the solution for the mean value as

\[
\langle I(t)\rangle = \left.\frac{\partial f(x,t)}{\partial x}\right|_{x=1} = I_0\, e^{(\beta-\gamma)(t-t_0)} \qquad (33)
\]

giving an exponential time dependence. This demonstrates that our approximation of abundantly many susceptibles describes the exponential growth phase of the SIR epidemic.

2.4 Constructing the likelihood function

For observed data points in a time series $(I_0, I_1, \ldots, I_n)$ at times $(t_0, t_1, \ldots, t_n)$ we have the joint probability of the data points under the model assumption

\[
p(I_n, t_n, I_{n-1}, t_{n-1}, \ldots, I_1, t_1, I_0, t_0)
= \prod_{\nu=0}^{n-1} p(I_{\nu+1}, t_{\nu+1} \mid I_\nu, t_\nu)\cdot p(I_0, t_0)
= \prod_{\nu=0}^{n-1} \frac{1}{I_{\nu+1}!}\left.\frac{\partial^{I_{\nu+1}} f(x, t_{\nu+1})}{\partial x^{I_{\nu+1}}}\right|_{x=0}\cdot p(I_0, t_0) \qquad (34)
\]

and inserting the solution of the stochastic process, Eq. (32), we obtain the likelihood function for the model parameters β and γ,

\[
L(\beta,\gamma) = \prod_{\nu=0}^{n-1}\frac{1}{I_{\nu+1}!}\,
\left.\frac{\partial^{I_{\nu+1}}}{\partial x^{I_{\nu+1}}}
\left(\frac{\frac{\gamma}{\beta}(x-1)\,e^{(\beta-\gamma)(t_{\nu+1}-t_\nu)} - \bigl(x-\frac{\gamma}{\beta}\bigr)}{(x-1)\,e^{(\beta-\gamma)(t_{\nu+1}-t_\nu)} - \bigl(x-\frac{\gamma}{\beta}\bigr)}\right)^{I_\nu}\right|_{x=0} \qquad (35)
\]

which can be maximised to obtain the most likely parameter values given the data. These parameter values can then be used to fit the actual data.


To maximize the likelihood function we take its logarithm, ℓ(β, γ) := ln L(β, γ), and set the partial derivatives with respect to β and γ to zero. Hence we have

\[
\frac{\partial \ell}{\partial \beta} = \sum_{\nu=0}^{n-1}\frac{\partial}{\partial \beta}
\ln\left[\left.\frac{\partial^{I_{\nu+1}}}{\partial x^{I_{\nu+1}}}
\left(\frac{\frac{\gamma}{\beta}(x-1)\,e^{(\beta-\gamma)(t_{\nu+1}-t_\nu)} - \bigl(x-\frac{\gamma}{\beta}\bigr)}{(x-1)\,e^{(\beta-\gamma)(t_{\nu+1}-t_\nu)} - \bigl(x-\frac{\gamma}{\beta}\bigr)}\right)^{I_\nu}\right|_{x=0}\right]
=: F(\beta,\gamma) \qquad (36)
\]
and

\[
\frac{\partial \ell}{\partial \gamma} = \sum_{\nu=0}^{n-1}\frac{\partial}{\partial \gamma}
\ln\left[\left.\frac{\partial^{I_{\nu+1}}}{\partial x^{I_{\nu+1}}}
\left(\frac{\frac{\gamma}{\beta}(x-1)\,e^{(\beta-\gamma)(t_{\nu+1}-t_\nu)} - \bigl(x-\frac{\gamma}{\beta}\bigr)}{(x-1)\,e^{(\beta-\gamma)(t_{\nu+1}-t_\nu)} - \bigl(x-\frac{\gamma}{\beta}\bigr)}\right)^{I_\nu}\right|_{x=0}\right]
=: G(\beta,\gamma) \qquad (37)
\]
with the maximum of the log-likelihood given by F(β, γ) = 0 and G(β, γ) = 0 simultaneously. Since already in simpler models [1] we have to apply Newton's method in two dimensions to obtain the simultaneous estimates of β and γ, the derivative operator $\partial^{I_{\nu+1}}/\partial x^{I_{\nu+1}}$ can also be calculated numerically or by symbolic computer algebra programs. Numerically, we have for the I-th derivative the scheme

\[
p(I,t) = \frac{1}{I!}\left.\frac{\partial^{I} f(x,t)}{\partial x^{I}}\right|_{x=x_0=0}
= \lim_{h\to 0}\left(\frac{1}{I!}\sum_{k=0}^{I}(-1)^{k}\binom{I}{k}\frac{1}{h^{I}}\, f(x_0 - k\cdot h)\right) \qquad (38)
\]

which is easy to evaluate in our application because only the first few derivatives of the generating function f at the point x0 = 0 are needed in the likelihood function.
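The following C sketch, not part of the original derivation, combines Eqs. (32), (34)-(35) and (38): the closed-form generating function is differentiated I times at x = 0 with the backward-difference scheme of (38), using a small but finite h instead of the limit, which yields the transition probability and, summed over consecutive observations, the log-likelihood. The step size h, the function names and the use of logarithms are implementation choices made here, and the scheme is only sensible for small I.

#include <math.h>
#include <stddef.h>

/* Closed-form generating function f(x, t0 + dt) of Eq. (32), with I0
 * infected at the initial time; beta here denotes the effective rate
 * beta := (beta/N) S* defined in the text. */
static double genfun(double x, double dt, double I0,
                     double beta, double gamma)
{
    double e   = exp((beta - gamma) * dt);
    double num = gamma / beta * (x - 1.0) * e - (x - gamma / beta);
    double den = (x - 1.0) * e - (x - gamma / beta);
    return pow(num / den, I0);
}

/* p(I, t0+dt | I0, t0): I-th backward difference of f at x = 0,
 * divided by I!, following Eq. (38) with a finite h. */
static double p_transition(unsigned I, double dt, double I0,
                           double beta, double gamma)
{
    const double h = 1e-3;
    double sum = 0.0, binom = 1.0;          /* binom = C(I, k) */
    for (unsigned k = 0; k <= I; ++k) {
        double sign = (k % 2 == 0) ? 1.0 : -1.0;
        sum  += sign * binom * genfun(-(double)k * h, dt, I0, beta, gamma);
        binom = binom * (double)(I - k) / (double)(k + 1);
    }
    double fact = 1.0;
    for (unsigned k = 2; k <= I; ++k) fact *= (double)k;
    return sum / (pow(h, (double)I) * fact);
}

/* Log-likelihood corresponding to Eq. (35): data I[0..n] at times t[0..n]. */
double log_likelihood(const unsigned *I, const double *t, size_t n,
                      double beta, double gamma)
{
    double ll = 0.0;
    for (size_t nu = 0; nu < n; ++nu)
        ll += log(p_transition(I[nu + 1], t[nu + 1] - t[nu],
                               (double)I[nu], beta, gamma));
    return ll;
}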

Further approximations, like the Poisson approximation [11] with constant transition rates, e.g. $\lambda := \beta S^{*} I_0$, in the master equation for $p(I, t \mid I_0, t_0)$, lead again to expressions as described earlier [1], and become less and less accurate for longer integration times $\Delta t = t_{\nu+1} - t_\nu$, where our more accurate scheme still holds well. Again, numerical tests have to show the validity of such further approximations in application to the present data. These methods come closer and closer to a complete numerical estimation of likelihood functions [12].

3 Summary

For the onset of an SIR epidemic process, which describes e.g. seasonal influenza outbreaks, we have given the master equation, calculated the characteristic function, and from this we calculated the likelihood function. The approximation of abundantly many susceptibles captures the exponential onset of the epidemic well, as we see from the time dependent solution of the mean value of infected derived from the generating function. Though the analytic expressions are much more complicated than in the previously treated examples [1], which describe the noise floor of influenza and the second part of the epidemic, the expressions obtained here can be treated in future numerical work similarly to the previously investigated cases.


Acknowledgements

This work has been supported by the European Union under the EPIWORK grant. We thank Luis Sanchez and Maíra Aguiar, both Lisbon, and Gabriela Gomes, Oeiras, for scientific support, and Haggai Katriel, Tel Aviv, Frank Hilker, Bath, and Friedhelm Drepper, Jülich, for valuable discussions on some aspects of the presented work.

References

[1] S. van Noort and N. Stollenwerk, From dynamical processes to likelihood functions: an epidemiological application to influenza, Proceedings of the 8th Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2008, ISBN 978-84-612-1982-7 (2008).

[2] N. G. van Kampen, Stochastic Processes in Physics and Chemistry, North-Holland, Amsterdam, 1992.

[3] T. Tome and M. J. de Oliveira, Dinamica estocastica e irreversibilidade, Editora da Universidade de Sao Paulo, Sao Paulo, 2001.

[4] J. Honerkamp, Stochastic Dynamical Systems: Concepts, Numerical Methods and Data Analysis, VCH Publishers, Heidelberg, New York, 1993.

[5] A. T. Bharucha-Reid, Elements of the Theory of Markov Processes and their Applications, McGraw-Hill, New York, 1960.

[6] InfluenzaNet, http://www.influenzanet.com/.

[7] R. L. Marquet, A. I. Bartelds, S. van Noort, C. E. Koppeschaar, J. Paget, F. G. Schellevis and J. van der Zee, Internet-based monitoring of influenza-like illness (ILI) in the general population of the Netherlands during the 2003-2004 influenza season, BMC Public Health 6 (2006) 242.

[8] S. van Noort, J. Lourenco, H. Rebelo de Andrade, M. Muehlen and G. M. G. Gomes, Gripenet: An internet-based system to monitor influenza-like illness uniformly across Europe, Eurosurveillance Monthly 12 (2007) 7–8.

[9] A. Renyi, Wahrscheinlichkeitsrechnung, VEB Deutscher Verlag der Wissenschaften, Berlin, 1962.

[10] R. Casagrandi, L. Bolzoni, S. A. Levin and V. Andreasen, The SIRC model and influenza A, Mathematical Biosciences 200 (2006) 152–169.

[11] L. Gustafsson and M. Sternad, Bringing consistency to simulation of population models - Poisson simulation as a bridge between micro and macro simulation, Mathematical Biosciences 209 (2007) 361–385.

[12] N. Stollenwerk and K. M. Briggs, Master equation solution of a plant disease model, Physics Letters A 274 (2000) 84–91.


Proceedings of the International Conference

on Computational and Mathematical Methods

in Science and Engineering, CMMSE 2009

30 June, 1–3 July 2009.

Accelerating sparse matrix vector product with GPUs

Francisco Vazquez1, Ester M. Garzon1, Jose A. Martínez1 and Jose J. Fernandez1

1 Dept. Computer Architecture and Electronics, University of Almeria, Spain

emails: [email protected], [email protected], [email protected],[email protected]

Abstract

The sparse matrix vector product (SpMV) is a paramount operation in engineering and scientific computing and, hence, has long been a subject of intense research. The irregular computations involved in SpMV make its optimization challenging. Therefore, enormous effort has been devoted to devising data formats to store the sparse matrix with the ultimate aim of maximizing performance. Graphics Processing Units (GPUs) have recently emerged as platforms that yield outstanding acceleration factors. SpMV implementations for NVIDIA GPUs have already appeared on the scene. This work proposes and evaluates a new implementation of SpMV for GPUs based on a new matrix storage format, called ELLPACK-R, and compares it against a variety of formats proposed elsewhere. The most important qualities of this new format are that (1) no preprocessing of the sparse matrix is required, and (2) the resulting SpMV algorithm is very regular. The comparative evaluation of this new SpMV approach has been carried out on a representative set of test matrices. The results show that the SpMV approach based on ELLPACK-R turns out to be superior to the strategies used so far. Moreover, a comparison with standard state-of-the-art superscalar processors reveals that significant speedup factors are achieved with GPUs.

Key words: GPU, High Performance Computing, Sparse Matrix Vector Product

1 Introduction

The Matrix-Vector product (MV) is a key operation for a wide variety of scientific applications, such as image processing, simulation, control engineering and so on [1]. The relevance of this kind of operation in computational sciences is reflected in the constant effort devoted to optimising the computation of MV for the processors of the time, ranging from the early computers of the seventies to the latest multi-core architectures [2, 3, 4, 5]. In that sense, the fact that MV is a Level 2 routine in BLAS (Basic Linear Algebra Subroutines) is remarkable, because the BLAS library has constantly been improved and optimized as computer architectures have evolved [6, 7, 8]. For many applications based on MV the matrix is large and sparse, i.e. the dimensions of the matrix are large (≥ 10^5) and the percentage of non-zero components is very low (≤ 1–2%). Sparse matrices are involved in linear systems, eigensystems and partial differential equations from a wide spectrum of scientific and engineering disciplines. For these problems the optimization of the sparse matrix vector product (SpMV) is a challenge because of the irregular computation of large sparse operations. This irregularity arises from the fact that data access locality is not maintained and that the fine grained parallelism of loops is not exploited [9]. Therefore, additional effort must be spent to accelerate the computation of SpMV. This effort is focused on the design of appropriate data formats to store the sparse matrices, since the performance of SpMV is directly related to the format used.

Currently, Graphics Processing Units (GPUs) offer massive parallelism for scientific computations. The use of GPUs for general purpose applications has increased exceptionally in the last few years thanks to the availability of Application Programming Interfaces (APIs), such as the Compute Unified Device Architecture (CUDA) [14] and OpenCL [10], that greatly facilitate the development of applications targeted at GPUs. Specifically, dense algebra operations are accelerated by GPU computing, and the CUBLAS library [8] is now publicly available to easily obtain high performance for these operations on NVIDIA GPUs. Recently, several implementations of SpMV have also been developed with CUDA and evaluated on NVIDIA GPUs [11, 12, 13]. The aim of this work is to design and analyse GPU computing approaches for SpMV. This work covers a variety of formats to store the sparse matrix in order to explore the best possible use of the GPU for a variety of algorithmic parameters. A new storage format is proposed which proves to outperform the most common and efficient formats used for SpMV so far.

Next, Section 2 summarises the aspects related to GPU programming and computing. Then, Section 3 reviews the different formats to compress sparse matrices and the corresponding codes to compute SpMV, given that the selection of an appropriate format is the key to optimising SpMV on GPUs. Section 4 introduces a new format suitable for the computation of SpMV on GPUs. In Section 5 the performance measured on an NVIDIA GeForce GTX 295 with a wide set of representative sparse matrices belonging to diverse applications is presented. The results clearly show that the new storage format presented here, ELLPACK-R, achieves the best performance for most of the test matrices. Finally, Section 6 summarises the main conclusions.

Figure 1: Different access times and sizes of GPU memories

2 Computational keys to exploit GPUs

Compute Unified Device Architecture (CUDA) provides a set of extensions to standard ANSI C for programming NVIDIA GPUs. It supports heterogeneous computation, where applications use both the CPU and the GPU. Serial portions of applications are run on the CPU, and parallel portions are accelerated on the GPU. The portions executed in parallel by the GPU are called kernels [14]. GPUs have hundreds of cores that can collectively run thousands of computing threads. Each core, called a Scalar Processor (SP), belongs to a set of multiprocessor units called Streaming Multiprocessors (SMs) that compose the device. The number of SMs ranges from eight (NVIDIA Tesla C870) to thirty in modern GPUs (NVIDIA GeForce GTX 295). The SPs in an SM share resources such as registers and memory. The on-chip shared memory allows the parallel tasks running on these cores to share data without sending it over the system memory bus [14].

To develop codes for GPUs with CUDA, the programmer has to take into account several architectural characteristics, such as the topology of the multiprocessors and the management of the memory hierarchy. The GPU architecture allows the host to issue a succession of kernel invocations to the device. Each kernel is executed as a batch of threads organized as a grid of thread blocks, and the execution of every thread block is assigned to an SM. Moreover, every block is composed of several groups of 32 threads called warps. All threads belonging to a warp execute the same program over different data. The size of every thread block is defined by the programmer. The maximum instruction throughput is achieved when all threads of the same warp execute the same instruction sequence, given that any flow control instruction can cause the threads of the same warp to diverge, that is, to follow different execution paths. If this occurs, the different execution paths have to be serialized, increasing the total number of instructions executed for this warp [14].

Another key to taking advantage of GPUs is the memory management. There are several kinds of memory available on GPUs, with different access times and sizes, that constitute a memory hierarchy, as illustrated in Figure 1. The effective bandwidth can vary by an order of magnitude depending on the access pattern for each type of memory. There is a parallel memory interface between the global memory and every SM of the GPU. The access to global memory can be performed in parallel by all threads of a half-warp (16 threads), and it is accelerated only for specific coalesced memory access patterns [14]. Hence the ordering of the data accesses chosen in an algorithm may have a significant performance effect during GPU memory operations.

From the programmer's point of view, the GPU is considered as a set of SIMD (Single Instruction stream, Multiple Data streams) multiprocessors with shared memory, and the SPMD (Single Program Multiple Data) programming model is offered by CUDA. Moreover, in order to optimise GPU performance the programmer has to pursue two main goals: (1) to balance the computation of the sets of threads, and (2) to optimise the data accesses through the memory hierarchy. Specifically, to optimise SpMV on GPUs both goals have to be taken into account when devising appropriate formats to store the sparse matrix, since the parallel computation and the memory accesses are tightly related to the storage format of the sparse matrix.

3 Formats to compress sparse matrices

3.1 Coordinate storage

The coordinate storage scheme (COO) to compress a sparse matrix is a direct transformation from the dense format. Let Nz be the total number of non-zero entries of the matrix. A typical implementation of COO uses three one-dimensional arrays of size Nz. One array, of floating-point numbers (hereafter referred to as floats), contains the non-zero entries. The other two arrays, of integer numbers, contain the corresponding row and column indices of each non-zero entry. The performance of SpMV may be penalised by COO because it does not implicitly include information about the ordering of the coordinates.
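A sequential C sketch of SpMV with the COO arrays just described may look as follows; the array names are illustrative, and this plain loop is not the GPU kernel evaluated later.

#include <stddef.h>

/* Hedged sketch: y = A*v with A stored in COO format.
 * val[k], row[k], col[k] describe the k-th non-zero entry, 0 <= k < Nz. */
void spmv_coo(size_t Nz, size_t N, const float *val,
              const int *row, const int *col,
              const float *v, float *y)
{
    for (size_t i = 0; i < N; ++i)
        y[i] = 0.0f;
    for (size_t k = 0; k < Nz; ++k)
        y[row[k]] += val[k] * v[col[k]];
}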

3.2 Compressed Row Storage (CRS) and some variants

Compressed Row Storage (CRS) is the most widespread format to store sparse matrices on superscalar processors. Figure 2 (left) illustrates the CRS details. Let N and Nz be the number of rows of the matrix and the total number of non-zero entries of the matrix, respectively; the data structure consists of the following arrays: (1) A[ ], an array of floats of dimension Nz, which stores the entries; (2) j[ ], an array of integers of dimension Nz, which stores their column indices; and (3) start[ ], an array of integers of dimension N, which stores the pointers to the beginning of every row in A[ ] and j[ ].

The code to compute SpMV based on CRS can be seen in Figure 2 (left). There are several drawbacks that hamper the optimization of the performance of this code on superscalar architectures. First, the locality of the accesses to the vector v[ ] is not maintained, due to the indirect addressing. Second, the fine grained parallelism is not exploited because the number of iterations of the inner loop is small and variable [9].
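Since the code of Figure 2 (left) is not reproduced here, the following C sketch shows the usual CRS-based SpMV loop the text refers to; note that start[ ] is taken here with N + 1 entries (start[N] = Nz), a common convention that differs slightly from the N-entry description above.

#include <stddef.h>

/* Hedged sketch of SpMV (u = A*v) with CRS storage.  A[] holds the Nz
 * entries, j[] their column indices, and start[] the row pointers. */
void spmv_crs(size_t N, const float *A, const int *j,
              const int *start, const float *v, float *u)
{
    for (size_t i = 0; i < N; ++i) {
        float s = 0.0f;
        for (int k = start[i]; k < start[i + 1]; ++k)
            s += A[k] * v[j[k]];   /* indirect access to v[]: poor locality */
        u[i] = s;                  /* short, variable-length inner loop */
    }
}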

The Incremental Compressed Row Storage (ICRS) format [1] is a variant of CRS where the location of the non-zero elements is encoded as a one-dimensional index. The underlying motivation is the following: if the entries within a row are ordered by increasing column index, the one-dimensional indices form a monotonically increasing sequence. ICRS consists of two arrays: (1) A[ ], an array of floats of dimension Nz, which stores the entries; and (2) inc[ ], an array of integers of dimension Nz, which stores the increments of the indices of the vector v[ ]. More details about ICRS can be found in [1]. In our experience with the set of matrices considered, the performance obtained by ICRS is similar to that of CRS.

We have designed and evaluated a new format, called CRSN (CRS with Negative marks), which is a variant of CRS. The components of CRSN are illustrated in Figure 2 (center). CRSN only requires two arrays of dimension Nz, which are equivalent to the arrays A[ ] and j[ ] of the original CRS. However, the beginning of every row is marked with a negative column index in j[ ], and the corresponding code to compute SpMV consists of a single loop that contains a conditional branch. In our experience, the performance obtained with CRSN is slightly better than with CRS and ICRS on the superscalar cores included in current processors, such as the Intel Core 2 Duo, Intel Xeon Quad Core Clovertown and AMD Opteron Quad Core; the best performance is obtained on the Intel Core 2 Duo E8400. In Section 5, CRSN on one core of an Intel Core 2 Duo E8400 is therefore taken as a reference, with the purpose of comparing the performance of the SpMV computation on the two architectures, GPU and superscalar core.
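The exact CRSN encoding is not spelled out in the text, so the following sketch assumes one plausible convention: the column index of the first entry of each row is stored as −(col + 1), and a negative value signals that a new row begins. It is intended only to illustrate the single-loop, branch-based structure described above.

#include <stddef.h>

/* Hedged CRSN sketch: one loop over the Nz entries; a negative column
 * index (stored here as -(col+1)) marks the first entry of a new row.
 * The encoding is an assumption made for illustration; empty rows
 * would need extra handling. */
void spmv_crsn(size_t Nz, const float *A, const int *j,
               const float *v, float *u)
{
    long row = -1;                       /* current output row */
    for (size_t k = 0; k < Nz; ++k) {
        int col = j[k];
        if (col < 0) {                   /* conditional branch: new row */
            ++row;
            u[row] = 0.0f;
            col = -col - 1;              /* recover the real column index */
        }
        u[row] += A[k] * v[col];
    }
}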

3.3 ELLPACK

ELLPACK or ITPACK [15] was introduced as a format to compress a sparse matrix with the purpose of solving large sparse linear systems with the ITPACKV subroutines on vector computers. This format stores the sparse matrix in two arrays, one of floats, to save the entries, and one of integers, to save the column of every entry. Both arrays are of dimension at least N × MaxEntriesbyRows, where N is the number of rows and MaxEntriesbyRows is the maximum number of non-zeros per row in the matrix, the maximum being taken over all rows. Note that the size of all rows in the compressed arrays A[ ] and j[ ] is the same, because every row is padded with zeros, as seen in Figure 2 (right). Therefore, ELLPACK can be considered as an approach to fit a sparse matrix into a regular data structure similar to a dense matrix, and consequently this format is appropriate for computing operations with sparse matrices on vector architectures. However, if the percentage of zeros in the ELLPACK data structure is high and the location of the entries in different rows is very irregular, the performance decreases.


Figure 2: Different storage formats for sparse matrices and the corresponding codes to compute SpMV

4 ELLPACK-R, a format to optimize SpMV on GPUs

ELLPACK-R is a variant of the ELLPACK format. ELLPACK-R consists of two arrays, A[ ] (float) and j[ ] (integer), of dimension N × MaxEntriesbyRows; moreover, an additional integer array called rl[ ] of dimension N (i.e. the number of rows) is included with the purpose of storing the actual length of every row, regardless of the number of zero elements padded. An important point is the fact that the arrays store their elements in column-major order. As seen in Figure 3, these data structures take advantage of the following points (a kernel sketch is given after the list below):

(1) Coalesced global memory access, thanks to the column-major ordering used to store the matrix elements in the data structures. The thread identified by index x accesses the elements of row x: A[x + i ∗ N] with 0 ≤ i < rl[x], where i is the column index within the compressed structure and rl[x] is the total number of non-zeros in row x. Consequently, two threads x and x + 1 access consecutive memory addresses, thereby fulfilling the conditions of coalesced global memory access.

(2) Non-synchronized execution between different blocks of threads. Every block of threads can complete its computation without synchronization with other blocks, because every thread computes one element of the vector u (i.e. the result of the SpMV operation), and there are no data dependences in the computation of different elements of u.

Figure 3: ELLPACK-R format and kernel to compute SpMV on GPUs

(3) Reduction of the waiting time, or unbalance, between the threads of one warp. Figure 4 shows the histogram of a tiny example matrix, where a hypothetical small warp of eight threads is considered with the goal of illustrating the advantage of ELLPACK-R. The computational load of every warp of eight threads is different and is proportional to the longest row in the corresponding subset of rows of the matrix. Bearing in mind the kernel of SpMV with ELLPACK-R, the dark area is proportional to the runtime of every thread, and the grey area is proportional to the waiting time of every thread. Therefore, only the warps related to rows of very different length are penalised with longer waiting times, as can be seen in Figure 4.

(4) Homogeneous computing within the threads of a warp. The threads belonging to one warp do not diverge when executing the kernel to compute SpMV. The code does not include flow-control instructions that cause serialization within warps, since every thread executes the same loop, only with a different number of iterations. Every thread stops as soon as its loop finishes, and the other threads in the warp continue with the execution (see Figure 4). Furthermore, coalesced memory access is possible. This characteristic has a significant impact on the performance.
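Since Figure 3 is not reproduced here, the following CUDA-C sketch follows the description above (column-major A[ ] and j[ ] with leading dimension N, per-row lengths in rl[ ]); the kernel name and launch configuration are illustrative and should not be taken as the authors' exact code.

/* Hedged CUDA-C sketch of an ELLPACK-R SpMV kernel as described above.
 * A[] and j[] are stored in column-major order (leading dimension N),
 * rl[x] is the number of non-zeros of row x.  Thread x computes u[x]. */
__global__ void spmv_ellpack_r(int N, const float *A, const int *j,
                               const int *rl, const float *v, float *u)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x < N) {
        float s = 0.0f;
        int len = rl[x];
        for (int i = 0; i < len; ++i) {      /* padded zeros are never visited */
            int col = j[x + i * N];          /* coalesced, column-major access */
            s += A[x + i * N] * v[col];
        }
        u[x] = s;
    }
}

/* Illustrative launch: one thread per row, block size chosen arbitrarily.
 * spmv_ellpack_r<<<(N + 255) / 256, 256>>>(N, d_A, d_j, d_rl, d_v, d_u);  */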

Recently, different proposals of kernels to compute SpMV have been described and analysed [11, 12, 13]. The kernels related to the format called HYB (which stands for hybrid), proposed by [11], seem to yield the best performance on GPUs so far. This format combines the ELLPACK and COO formats for different sets of rows. However, it requires a preprocessing step consisting of reordering the rows in order to obtain better performance. This preprocessing of the matrix is a drawback that may produce a significant penalization, due to the calls/returns to/from kernels, especially for matrices where large sets of rows have to be divided and reordered. Another kernel, called CRS(vector), has also been evaluated in [11]. This kernel computes every output vector element with the collaboration of the 32 threads of a warp: every warp computes the floating-point products related to the entries of one row, followed by a parallel reduction in shared memory in order to obtain the final value of the output vector element.

Figure 4: Histogram of a simple example with a tiny sparse matrix, assuming a hypothetical small warp of eight threads. The dark area is related to the runtimes of every thread belonging to every warp, and the grey area is related to the waiting times of the same thread.

5 Evaluation

A comparative analysis of the performance of different kernels to compute SpMV on NVIDIA GPUs has been carried out in this work. The following formats to store the matrix have been evaluated: CRS, CRS(vector), ELLPACK, ELLPACK-R and HYB. This analysis is based on the run times measured on a GeForce GTX 295 with a set of test sparse matrices from different disciplines of science and engineering. Table 1 summarizes the test matrices used in this work and their most important characteristics, such as the dimensions, the number of non-zero entries, etc. Most of the matrices considered belong to collections of the Matrix Market repository [16]. All matrices are real, of dimensions N × N. Although some of them are symmetric, they have all been considered as general to compute SpMV.

All kernels have been evaluated using the texture memory. This memory is bound to the global memory and plays the role of a cache level within the memory hierarchy, and its use improves the performance [14].

Table 1: Set of test matrices

Matrix      N        Entries      Type   Application area
qh1484      1484     6.110        Gen.   Power systems simulations
dw2048      2048     10.114       Gen.   Electrical engineering
rbs480a     480      17.087       Gen.   Robotic control
gemat12     4929     33.111       Gen.   Power flow modeling
dw8192      8192     41.746       Gen.   Electrical engineering
mhd3200a    3200     68.026       Gen.   Plasma physics
bcsstk24    3562     81.736       Sym.   Structural engineering
e20r4000    4241     131.556      Gen.   Fluid dynamics
mac_econ    206500   1.273.389    Gen.   Economics
cop20k_A    121192   1.362.087    Sym.   FEM/Accelerator
qcd5_4      49152    1.916.928    Gen.   QCD
cant        62451    2.034.917    Sym.   FEM/Cantilever
mc2depi     525825   2.100.225    Gen.   Epidemiology
pdb1HYS     36417    2.190.591    Sym.   Biocomputation
rma10       46835    2.374.001    Gen.   FEM/Harbor
consph      83334    3.046.907    Sym.   FEM/Spheres
wbp128      16384    3.933.095    Gen.   Tomographic reconstruction
shipsec1    140874   3.977.139    Sym.   FEM/Ship
dense2      2000     4.000.000    Gen.   Dense
pwtk        217918   5.926.171    Gen.   Fluid dynamics
wbp256      65536    31.413.932   Gen.   Tomographic reconstruction

Figure 5 shows the performance (GFLOPs) of the SpMV kernels based on the formats that have been evaluated: CRS, CRS(vector), ELLPACK, ELLPACK-R and HYB. The results shown in that figure allow us to highlight the following major points: (1) the performance obtained by most formats increases with the number of non-zero entries in the matrix; (2) in general, the CRS format yields the poorest performance because the pattern of memory access is not coalesced; (3) the CRS(vector) format achieves better performance than CRS in most cases (even for matrices with a high number of non-zero entries per row, or nearly dense ones), despite the fact that a coalesced matrix data access is not possible with this format either; (4) in general, ELLPACK outperforms both CRS-based formats, although its computation is penalised for some particular matrices, mainly due to the divergence of the warps when the matrix histogram includes rows of very uneven length; (5) the performance obtained by HYB is, in general, higher than that of the three previous formats, but its results are remarkably poorer for the smaller matrices, due to the penalty introduced by the preprocessing step; (6) finally, ELLPACK-R clearly achieves the best performance for most of the matrices considered in this work.

Figure 6 plots the average performance obtained with the five formats evaluated. As seen, the best average performance is obtained by ELLPACK-R, followed by HYB and ELLPACK, and the worst average performance is obtained by CRS and CRS(vector). Therefore, these results confirm that ELLPACK-R is superior to the sparse matrix storage formats used thus far. The algorithm for computing SpMV using ELLPACK-R neither includes flow-control instructions that serialise the execution of a warp of 32 threads, nor complex preprocessing steps to reorder the matrix rows; moreover, it allows coalesced matrix data access. In conclusion, the simplicity of the SpMV computation based on ELLPACK-R allows full exploitation of the GPU architecture and its computing power.


Figure 5: Performance of SpMV based on different formats on the GPU GeForce GTX 295 with the set of test matrices, using the texture cache memory

Figure 6: Average performance of SpMV on the GPU GeForce GTX 295 over the set of test matrices

The key to the success of GPUs in high performance computing is the outstanding speedup factors they offer in comparison with standard computers or even clusters of workstations. In order to estimate the net gain provided by GPUs in the SpMV computation, we have implemented SpMV on a computer based on a state-of-the-art superscalar core, an Intel Core 2 Duo E8400, and evaluated the computing times for the set of test matrices in Table 1. For the superscalar implementation we chose the CRSN format, as it provided the best performance on this platform (results not shown here). For the GPU GeForce GTX 295 we used the ELLPACK-R format, which is the best for the GPU according to the results presented above. Figure 7 shows the speedup factors obtained for the SpMV operation on the GPU against the superscalar core for all the test matrices considered in this work. The speedup ranges from a modest 5× factor to an exceptional 80× factor. The plot shows that the speedup depends on the matrix pattern, although in general it increases with the number of non-zero entries. In view of these results, we can conclude that the GPU turns out to be an excellent accelerator of SpMV.

Figure 7: Speed-up of SpMV on the GPU GeForce GTX 295 for the set of test matrices in Table 1, taking as a reference the runtimes of SpMV on an Intel Core 2 Duo E8400. The storage format that provided the best performance on each platform was used: ELLPACK-R for the GPU and CRSN for the superscalar core.

6 Conclusions

In this paper a new approach to compute the sparse matrix vector product on GPUs, ELLPACK-R, has been proposed and evaluated. The simplicity of the SpMV implementation based on ELLPACK-R makes it well suited for GPU computing. The comparative evaluation with other proposals, based on an extensive study with a wide set of test matrices, has shown that the average performance achieved by ELLPACK-R is the best. Therefore, ELLPACK-R has proven to be superior to the other approaches used thus far. Moreover, the fact that this approach for SpMV does not require any preprocessing step makes it especially attractive for integration into the sparse matrix libraries currently available. A comparison of the GPU implementation of SpMV based on ELLPACK-R on a GeForce GTX 295 has revealed that acceleration factors of up to 80× can be achieved in comparison to state-of-the-art superscalar processors. Therefore, GPU computing is expected to play an important role in computational science to accelerate SpMV, especially when dealing with problems where huge sparse matrices are involved.


Acknowledgements

This work has been funded by grants from the Spanish Ministry of Science and Innovation (TIN2008-01117) and Junta de Andalucia (P06-TIC-01426, P08-TIC-3518), in part financed by the European Regional Development Fund (ERDF). Moreover, it has been developed in the framework of the network "High Performance Computing on Heterogeneous Parallel Architectures" (CAPAP-H), supported by the Spanish Ministry of Science and Innovation (TIN2007-29664-E).

References

[1] R. H. Bisseling, Parallel Scientific Computation, Oxford University Press, 2004.

[2] A. T. Ogielski and W. Aiello, Sparse matrix computations on parallel processor arrays, SIAM Journal on Scientific Computing 14 (1993) 519–530.

[3] S. Toledo, Improving the memory-system performance of sparse-matrix vector multiplication, IBM Journal of Research and Development 41(6) (1997) 711–725.

[4] J. Mellor-Crummey and J. Garvin, Optimizing sparse matrix-vector product computations using unroll and jam, International Journal of High Performance Computing Applications 18 (2004) 225–236.

[5] S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick and J. Demmel, Optimization of sparse matrix-vector multiplication on emerging multicore platforms, Parallel Computing 35(3) (2009) 178–194.

[6] C. Lawson, R. Hanson, D. Kincaid and F. Krogh, Basic Linear Algebra Subprograms for Fortran usage, ACM Transactions on Mathematical Software 5 (1979) 308–325.

[7] J. Baldeschwieler, R. Blumofe and E. Brewer, ATLAS: An infrastructure for global computing, in Proceedings of the Seventh ACM SIGOPS European Workshop on System Support for Worldwide Applications (1996).

[8] NVIDIA, CUDA CUBLAS Library, PG-00000-002 V2.1, September 2008. http://developer.download.nvidia.com/compute/cuda/2_1/toolkit/docs/CUBLAS_Library_2.1.pdf

[9] J. Kurzak, W. Alvaro and J. Dongarra, Optimizing matrix multiplication for a short-vector SIMD architecture - CELL processor, Parallel Computing 35(3) (2009) 138–150.

[10] Khronos Group, OpenCL - The open standard for parallel programming of heterogeneous systems. http://www.khronos.org/developers/library/overview/opencl_overview.pdf

[11] N. Bell and M. Garland, Efficient sparse matrix-vector multiplication on CUDA, NVIDIA Technical Report NVR-2008-004, December 2008.

[12] L. Buatois, G. Caumon and B. Levy, Concurrent number cruncher - A GPU implementation of a general sparse linear solver, International Journal of Parallel, Emergent and Distributed Systems, to appear.

[13] M. M. Baskaran and R. Bordawekar, Optimizing sparse matrix-vector multiplication on GPUs, IBM Research Report RC24704, April 2009.

[14] NVIDIA, CUDA Programming Guide, Version 1.1, April 2009. http://developer.download.nvidia.com/compute/cuda/docs/CUDA_Architecture_Overview.pdf

[15] D. R. Kincaid, T. C. Oppe and D. M. Young, ITPACKV 2D User's Guide, CNA-232, 1989. http://rene.ma.utexas.edu/CNA/ITPACK/manuals/userv2d/

[16] The Matrix Market, http://math.nist.gov/MatrixMarket
