Asynchronous evolutionary shape optimization based on high-quality surrogates: Application to an air-conditioning duct

ORIGINAL ARTICLE

Asynchronous evolutionary shape optimizationbased on high-quality surrogates: applicationto an air-conditioning duct

Balaji Raghavan • Piotr Breitkopf

Received: 21 December 2010 / Accepted: 22 March 2012

� Springer-Verlag London Limited 2012

Abstract Multi-processor HPC tools have become com-

monplace in industry and research today. Evolutionary

algorithms may be elegantly parallelized by broadcasting a

whole population of designs to an array of processors in a

computing cluster or grid. However, issues arise due to

synchronization barriers: subsequent iterations have to wait

for the successful execution of all jobs of the previous

generation. When other users load a cluster or a grid,

individual tasks may be delayed and some of them may

never complete, slowing down and eventually blocking the

optimization process. In this paper, we extend the recent

‘‘Futures’’ concept permitting the algorithm to circumvent

such situations. The idea is to set the default values to the

cost function values calculated using a high-quality surro-

gate model, progressively improving when ‘‘exact’’

numerical results are received. While waiting for the exact

result, the algorithm continues using the approximation and

when the data finally arrives, the surrogate model is

updated. At convergence, the final result is not only an

optimized set of designs, but also a surrogate model that is

precise within the neighborhood of the optimal solution.

We illustrate this approach with the cluster optimization of

an A/C duct of a passenger car, using a refined CFD legacy

software model along with an adaptive meta-model based

on Proper Orthogonal Decomposition (POD) and diffuse

approximation.

Keywords Parallel computing � Genetic algorithms �Ask&Tell � Futures � Fluid mechanics

1 Introduction

Stochastic search Genetic Algorithms (GAs) [1, 2] attempt

to find the optimal solution by manipulating a population of

candidate solutions, evaluating the fitness of the popula-

tion, and selecting the best solutions to reproduce and form

the next generation. As the generations proceed, the indi-

viduals with the highest fitness levels dominate the popu-

lation, potentially resulting in an increase in the quality.

These algorithms are attractive for multi-modal and multi-

objective optimization problems [3, 4] since rather than a

single optimum, they report a set of solutions correspond-

ing to local minima or the Pareto points. Since GAs do not

use gradient information, these are easily implemented

when a non-intrusive optimization strategy is desired. As

the overall cost of an optimization may be approximated by

the cost of the function evaluation multiplied by the

number of function calls, there are two major shortcomings

of GA’s when applied to structural optimization problems:

• The cost of fitness of an individual when calculated

using a high-fidelity numerical model with several

thousands of degrees of freedom (finite elements, finite

volumes, etc.);

• The number of function evaluations resulting both from

the size of the population and from the number of

generations required to converge.

In the present work, the first issue is addressed by

building a dedicated, low-cost and adaptive counterpart

(surrogate) of the numerical model. The second issue is

addressed by proposing an asynchronous parallel version of

B. Raghavan � P. Breitkopf (&)

Laboratoire Roberval, UTC-CNRS (UMR 7337), Labex MS2T,

Universite de Technologie de Compiegne, Compiegne, France

e-mail: [email protected]

B. Raghavan

e-mail: [email protected]

123

Engineering with Computers

DOI 10.1007/s00366-012-0263-0

GA that uses both high-fidelity simulation when available

and otherwise continues with a surrogate model. As the

optimization advances, the refined solutions accumulate

and are used to progressively refine the surrogate model. At

convergence, the final result is an optimized set of designs

and a finely tuned surrogate model that is precise within the

neighborhood of the optimal solutions.

Surrogate functions and reduced order models have been

around for a while and have been used mostly in the field of

control systems to reduce the order of the overall transfer

function [5]. They can approximate the objective functions

in a fraction of the computational time of the high-fidelity

model [6]. Lim et al. [7] provided a generalization of

surrogate-assisted evolutionary frameworks for computa-

tionally expensive problems. Surrogate-assisted evolu-

tionary optimization typically proceeds in cycles consisting

of: collection and analysis of a number of designs, fitting a

surrogate to the designs, optimization based on the surro-

gate and exact analysis at the final solution. Most recently,

Quiepo et al. [8] addressed a Gaussian process surrogate-

based optimization and looked for a statistically rigorous

procedure that would allow the user to determine the

number of surrogate-based optimization (SBO) cycles

needed for a given problem, while Viana et al. [9] inves-

tigated the cross-validation error of a set of surrogates and

tried to obtain the most accurate surrogate. The ParEGO

[10] extension of the Efficient Global Optimization (EGO)

approach [11] uses a design-of-experiments inspired

initialization procedure and learns a Gaussian process

model of the search landscape, which is updated after every

function evaluation.

We are interested here in high-quality surrogates

obtained using the method of Proper Orthogonal Decom-

position (POD) [12]. Filomeno Coelho et al. [13] used a

model reduction approach using kriging and POD to opti-

mize the intake port for a car engine. Filomeno Coelho

et al. [14] then proposed a bi-level model reduction strat-

egy to reduce the interdependence between the fluid and

structural models and hence the computation time for the

case of a 2D wing demonstrator, and then extended the

approach to the optimization of a 3D flexible wing, while

Xiao et al. [15] proposed a constrained POD version for

the optimization of a car engine intake port. All these

approaches used a variant of a GA operating directly on the

a priori constructed global surrogate functions, without

making calls to the higher order numerical model. While

this approach is extremely fast, one could run into trouble

since the engineer relies completely on the surrogate

function for his final result(s).

A distributed/parallel GAs implementation is possible,

since the fitness of an individual can be evaluated

independently of the other individuals, meaning that the

computation and search strategy is independent of the

computing sequence within a single generation. In the

master–slave scheme, communication is necessary only

when the slaves receive their individuals and then when

they return the function. Bethke [16] performed the first

study on parallel/distributed implementations of GA and

compared their parallel performance with gradient-based

optimizers. He also identified the bottlenecks that limit the

efficiency of a gradient-based approach. Greffensette [17]

proposed master–slave paradigms and the multiple popu-

lations approach with a migration scheme. The next sub-

class of parallel GAs is multi-deme GAs with multiple

populations [18]. Most parallel GA implementations take

one or more serial GAs and run each separately on separate

nodes, exchanging individuals at predetermined times, and

perform exact/numerical model-based evaluations. This,

however, can be a bottleneck on a busy cluster since the

function evaluations may be delayed leading to deadlocks.

The need for asynchronous parallelism appears partic-

ularly when the high-fidelity model runs on a remote

computing cluster that is being used for hundreds of other

projects and computational purposes, such as in a large

research laboratory or technical organization. This means

that scheduled jobs will be handled depending on the

overall priority, load on the cluster, and available resour-

ces. On a computing grid the possibility of node failures

and/or delayed/dropped computational jobs exists and

needs thus to be factored in when planning projects.

Tsutsui and Wu et al. [19, 20] investigates asynchronous

evolutionary algorithms on a multi-core cluster.

Clearly, a ‘‘mixed’’ approach that allows for asynchro-

nous computation using both numerical as well as surro-

gate calls allows the design engineer much more control

and flexibility given the precision versus computational

time and computational power, with the cluster load being

a factor outside the user’s control. The literature surveyed

shows little research on asynchronous surrogate-assisted

evolutionary algorithms. Regis et al. [21] used a parallel

stochastic radial-basis function algorithm for the surrogate

and an asynchronous global pattern search on up to eight

processors, and Asouti [22] used Artificial Neural Network

(ANN)-based meta-models to improve the quality of the

surrogate compared to existing low-precision meta-models.

In this paper, we propose an ‘‘Ask&Future Tell’’ parallel

paradigm for competitive use of high-fidelity and reduced

models specifically designed for cluster/grid environments

where an asynchronous approach is managed by a time-out

scheme. We give a ‘‘non-intrusive’’ algorithm for mas-

sively parallel asynchronous surrogate-based evolutionary

optimization tolerant to an existing cluster load with lim-

ited control over the recovery of the numerical results

performed on the slave nodes. We illustrate the proposed

approach in the problem of multi-objective optimization of

a vehicle air-conditioning duct.


123

The paper is organized into four sections. In Sect. 2 we

describe the proposed algorithm. In Sect. 3 the test case,

i.e., air-conditioning duct geometry and CFD model is

treated along with the CPOD-based meta-model used as a

surrogate with discussion of the optimization results.

Finally we discuss the perspectives and limitations of

proposed approach.

2 Paradigms for parallel evolutionary optimization

Consider a simple GA (Fig. 1). Each iteration involves two

distinct phases: evaluation of the population fitness values

and the recombination of individuals to produce the new

population.

The population evaluation in serial mode is shown in

Fig. 2, when the optimization algorithm and all function

evaluations are performed in an iterative manner on a

single node.

The simulator block is an embarrassingly parallel task

and may be performed on a remote cluster with a master

node dedicated to the optimization (GA operations) shown

in Fig. 3, typically using a Single Program, Multiple Data

(SPMD) block.

This process takes advantage of the natural scalability of

a GA and its result is strictly identical to that of the

sequential version because of the independence of the order

of function evaluations. However, the maximal theoretical

speedup is limited by the size of the population. This

means that when there are more processors available than

individuals to be evaluated, some of the processors will

stay idle. Moreover, when the remote cluster is under

heavy use by competing users for multiple projects,

affecting the evaluation time for individual processors of

the cluster due to jobs being delayed (or even dropped) and

as we need all the evaluations for the whole population in

order to continue the GA, the overall execution time is

dictated by the most charged and thus the slowest node.

This phenomenon is represented using the synchronization

barrier shown by a horizontal dotted line in Fig. 3.

2.1 ‘‘Ask&Tell’’ synchronous paradigm

Communication with the optimizer may be formally

explained using the Ask&Tell programming pattern [23]

applied to the optimizer/numerical model interface. In

simplest terms, ‘‘Ask&Tell’’ is a communication concept

that replaces the function call construct in an object-ori-

ented parallel environment. Since the numerical analysis is

performed/queued on a remote cluster, the master node that

needs the function evaluation issues an ‘‘Ask’’ message to

probe the result of the analysis, and the remote cluster uses

a ‘‘Tell’’ message to inform the master of the result, when

available (Fig. 4).

The Ask&Tell paradigm based on simple synchronous

messages is not sufficient to leverage the synchronization

barrier by itself. We need to introduce a paradigm for time-

out messages as well, in order to permit the algorithm to

Selection

Recombination

Mutation

Dominance

Initial population

simulator block

optimizer block

Updated population

Fig. 1 Basic GA simulator and optimizer blocks

Fig. 2 Sequential simulator block

Fig. 3 Parallel SPMD implementation of simulator block

Fig. 4 The Ask&Tell paradigm


123

continue by replacing missing function values by making

some reasonable hypotheses.

2.2 ‘‘Ask&Future Tell’’ asynchronous paradigm

The concept of ‘‘Futures’’ or wait-by-necessity [24] permits

an algorithm to advance with missing data ‘‘marked for

future updating’’. We propose here to extend the original

idea by giving a physical meaning to the ‘‘Futures’’ para-

digm. The extended concept may be stated as ‘‘use

approximate response and mark for future updating’’. The

surrogate model is performed by default on all individuals,

prior to sending them to the exact model for ‘‘precise’’

evaluation. At each new generation two cases arise:

• (k B N) individuals complete within the specified time-

out: the exact function values fy1. . .ykg are ‘‘told’’ to

the optimizer and the errors of the initial estimates are

used to update the surrogate model;

• N-k C 0 computations are timed-out: the ‘‘Futures’’ tag

is set for the processes continuing in background, while

the surrogate values fy�kþ1. . .y�Ng are ‘‘told’’ to the

optimizer.

The synchronization barrier in Fig. 3 is thus replaced by

a result time-out (user-specified) in Fig. 5. Once the

numerical result is sent from the remote cluster, the sur-

rogate is updated based on the error criterion. Figure 3

shows the ‘‘Ask&Future Tell’’ concept in an asynchronous

surrogate-assisted GA.

The key parameter is the time-out value which has to be

finely tuned during the process. For the initial generation,

the time-out is typically set at around 1.1 times the ideal

time taken for a single evaluation, i.e., on a dedicated node.

Obviously for an ideal case using a dedicated remote

cluster ‘‘zero-load’’ (unrealistic) all evaluations will com-

plete before the time-out allowing the optimization to

proceed with the exact values instead of a surrogate. The

time-out value decreases progressively during the optimi-

zation process as allowing the surrogate quality improve on

top of accumulated ‘‘exact’’ values.

3 Performance optimization of an air-conditioning duct

We demonstrate the proposed approach to optimizing the

geometry of an A/C duct (Fig. 6) of a passenger car [25] in

order to maximize its performance characterized by the

permeability (related to head loss over the duct length),

uniformity of flow at the exit and the total duct volume.

We set up a CFD grid with 39,000 grid points and

17,250 hexahedral cells (Fig. 7). The physical domain is

split into 23 different blocks for the purpose of meshing.

Blocks 13–22 (Fig. 7) in the curved portion have a higher

meshing density near the sharper portions of the curves.

The cell locations are constant in fixed blocks 1 and 23,

while cell locations in blocks 2–22 change with each

choice of design parameters, albeit the same number of

cells, arrangement, connectivity and mesh density.

Since the Reynolds number for the situation is typically

low, we assume incompressible 2D laminar airflow. We

use OpenFoam [26] CFD model to solve the Navier–Stokes

Fig. 5 ‘‘Ask&Future Tell’’

Fig. 6 A/C duct shape with inlet and outlet sections

Fig. 7 CFD meshing blocks and cells


123

equation in 2D. Boundary conditions prescribe null flow

speed along the walls and atmospheric pressure at the duct

outlet. The CFD analysis is run for 500 iterations to ensure

convergence, yielding pressure and velocity fields at the

midpoint of each hexahedral cell. The permeability and

exit flow uniformity are directly evaluated from the pres-

sure/velocity fields, while the duct volume is easily cal-

culated from the cell coordinates.

3.1 Optimization problem statement

The shape optimization problem with three objective

functions: pressure drop F1, output flow uniformity F2

expressed by the standard deviation of velocities and the

duct volume F3, may be stated as follows

Find Xopt ¼ Argmax F1ð �XÞ; F2ð �XÞ; F3ð �XÞ�X ¼ fX1. . .X13g; �LB� �X� �UB

ð1Þ

F1 = 1.0/(PA - PB)

F2 ¼ 1:0=X

B

ð �Vik k � �Vmeank kÞ2=nB

F3 = 1/(duct volume), where A = inlet section, B = out-

let section �UB and �LB are the geometric bounds on the

design variables ð �XÞ. The pressure and velocity fields P; �V

are calculated using CFD.

3.2 Shape parameterization

The inlet and outlet portions of the air-conditioning duct

have fixed geometries, while the middle portion allows for

modification of the shape and thus performance of the duct.

The 2D section (Fig. 8) is completely described by the

relative positions of points P1–P11. Positions of P1–P4 and

P9–P11 are assumed fixed and P5–P8 are obtained by the

geometric constructions.

Parameters X1–X5 allow us to locate points P5–P8,

while the additional parameters a1–a4 and b1–b4 define

Bezier curves passing through these points tracing out the

curved portion of the duct. The 13 design variables are thus:

X1, X2, X3, X4, X5 and X6 = a1, X7 = b1, X8 = a2, X9 = b2,

X10 = a3, X11 = b3, X12 = a4, X13 = b4. Upper and lower

bounds on these 13 parameters are set in order to retain the

laminar flow hypothesis and guarantee a realistic shape.

3.3 Bi-level surrogate model

Physics-based meta-models take advantage of the infor-

mation contained in the pressure and velocity fields of

high-fidelity model. The model reduction is two-stage POD

with customized coefficients conserving previously chosen

cumulative variables after truncation.

3.3.1 First level: constrained POD

Proper orthogonal decomposition works on a vector basis

obtained from a rectangular correlation matrix based on a

series of snapshots, and allows capturing the physics with a

limited set of scalar coefficients and is several orders of

magnitude faster than a CFD simulation. The initial set of

snapshots is composed of velocity/pressure fields of the

first generation. As the GA progresses, we add successive

high-fidelity velocity/pressure fields to the snapshots

matrix S. We obtain three snapshot matrices for P, Ux and

Uy, each composed of M snapshot vectors of length N

equal to the number of sampling points (shown for P, other

two field variables are similar)

S ¼P1

1 � �P1 P21 � �P1 . . . PM

1 � �P1

: : : :: : : :

P1N � �PN P2

N � �PN . . . PMN � �PN

2

664

3

775: ð2Þ

It must be noted that the region in space occupied by the

structural geometry changes depending on the design

variables. We interpolate the field values at the grid points

over a fixed reference grid using the finite elements [27] or

Delaunay Triangulation [28].

We first calculate the covariance matrices CP for each

field variable (only P shown here)

CP ¼ SST; ð3Þ

allowing us to express the three field vectors in terms of the

eigenvectors UP ¼ ½/P1 ;/

P2 . . ./P

M� of Cp

Pi ¼ �PþXM

j¼1

aij/Pj : ð4Þ

We next truncate the basis to the m most active modes

~P ¼ �PþPm

1 ai/Pi (m � M) with a relative projection

errorFig. 8 2D section and geometric construction of curved portion of

the duct


123

eðmÞ ¼ 1�Pm

i¼1 kiPMi¼1 ki

: ð5Þ

However, additional constraints are needed for the

truncated POD approximation to conserve cumulative

quantities (such as total fluid flow) over the approximated

field ~Pi

WT ~Pi ¼ �ciP; ð6Þ

where WT stores interpolation and integration coefficients

over the reference grid and �ciP is the value integrated over

the original snapshot. We have shown [15] that for a given

basis truncated to m vectors /;m, the coefficients a (b and cfor the other two variables) are obtained from

/T;mM/;m /T

;mW

WT/;m 0

� �aðkÞ

k

� �¼ /T

;mMðPðkÞ � �PÞcðkÞ �WT �P

� �: ð7Þ

This permits us to reconstruct the snapshots preserving

global quantities of interest (objective functions). In order

to get fields for arbitrary values of design variables, we use

an approximation scheme.

3.3.2 Second level: diffuse approximation

of POD coefficients

The next step is to express the coefficients, a, b and c as

functions of the design variables �X ¼ X1. . .X13 using dif-

fuse approximation method [29], chosen for its adaptivity

and ability to capture local effects

að �XÞ � bTð �XÞað �XÞ ð8Þ

bTð �XÞ ¼ 1 X1 X2 . . . X21 X1X2 . . .

� �; ð9Þ

where að �XÞ are chosen to minimize the functional Jx(a)

JxðaÞ ¼1

2

XM

i¼1

wið �Xi; �XÞ bTð �XiÞa� Pi

� 2; ð10Þ

where wið �Xi; �XÞ is the weighting function for the MLS

approximation, e.g., radial

wið �Xi; �XÞ ¼ exp �0:25 �Xi � �Xk k2 �

: ð11Þ

The two steps—POD and diffuse approximation

constitute a bi-level surrogate model for the airflow

within the duct permitting us to approximate the air-duct

pressure and velocity fields by

~Pð �XÞ ¼ �PþXm

1

aið �XÞ/Pi : ð12Þ

3.3.3 Adaptive learning strategy

The learning strategy to refine and improve the surrogate

generates new snapshots around the Pareto (dominant)

solutions and removes duplicates as well as weak points far

from the Pareto to increase the approximation precision in

the local neighborhood of the dominant solutions (first k

snapshots in the snapshot matrix are replaced by more

dominant point closer to the Pareto front)

S ¼ S1 S2 . . . SM� �

! Sk . . . SN SNþ1 . . . SMþk�1� �

:

If we observe distinct groups indicating a split on the

population (two different local optima) then the design

space can also be split giving different bases /Pi ;/

Uxi ;/Uy

i

for the different local optima, based on adaptive POD by

Ryckelynck [30].

3.4 Master–slave implementation on a cluster

Our implementation of the proposed model queues the

CFD function calls on a 160-processor remote cluster with

close to 40 competing users working on unrelated projects.

Since there are three objectives, we used a MOGA [4]

approach with single point crossover, mutation and niching

to locate dominant solutions using the asynchronous pop-

ulation evaluation (Sect. 2). We implement niching by

removing ‘‘clones’’ of the most dominant individuals and

replacing them by freshly sampled points in the neigh-

borhood of the dominant solutions.

Figure 9 shows the asynchronous implementation in

practice with the CFD results obtained using OpenFOAM

on the loaded cluster. MATLAB/Scilab is run on the group

A (master nodes) processors while group B processors

(remote slaves) receive the CFD analysis requests (to be run

using OpenFoam). The processors are split into two groups.

• The master nodes launch CFD jobs each generation;

create snapshots, calculate surrogates and refine data-

base, have ‘‘Ask’’ access to CFD values, write surrogate

values to separate database, and perform GA operations.

• The slave nodes receive CFD requests for individuals,

post-process CFD results to evaluate objective func-

tions, ‘‘Tell’’ access to CFD values database, NO access

to surrogate database.

Thus master nodes submit jobs to slaves through the

scheduler so slaves can eventually ‘‘write’’ output to the

database. Masters continue the optimization with the sur-

rogate values without waiting for the CFD results. Con-

vergence is achieved when we have obtained a satisfactory

number of non-dominated and validated solution designs.

3.5 Numerical results and discussion

While testing the presented approach on a cluster, the

important features were,


123

• Non-intrusive nature of the protocol allowing the

optimization to continue in a stable manner indepen-

dent of the CAD/CFD despite system crashes due to

CAD failure, server load and outages;

• Quality of optimal solution (s) obtained as verified

using the exact CFD calculation;

• Observed utilization of computing resources allotted

with no idle time for any of the processors.

The mixed approach ensured zero idle time for the

group A processors beyond waiting for a sufficient initial

size of the database to use the meta-model, ensuring a

completely asynchronous execution. The accuracy of the

CPOD surrogate improved quickly with the database size.

The population diversity had to be increased (using niching

and additional sampling) during any group B server out-

ages/overloads in order to prevent premature convergence

with a less accurate estimate. With this, the optimization

process could continue and be halted at any time depending

on the need and performed well in the face of failure due to

the way databases are written as external files rather than

internal variables.

Figure 10 shows the evolution of the design points

plotted on the 3D objective function space (permeability

Fig. 9 Cluster implementation of asynchronous GA

Fig. 10 Evolution of the population on objective functions space


123

vs. flow uniformity vs. duct volume) as the multi-objective

genetic algorithm proceeds from the first to future

generations.

Figure 11 shows the individual objectives plotted

against each other. It bears mentioning that the first two

objectives (permeability and flow uniformity at the duct

exit) are not mutually opposing. As expected, there is a

gradual improvement in the population based on the

objective functions, even though the graph may change as

the ‘‘Futures’’ protocol updates the surrogate values with

the CFD-calculated values.

Figure 12 shows the pressure distribution and the flow

field (streamlines) in one of the air-conditioning duct

geometries obtained after 40 generations (point A in

Fig. 11). As can be seen, the pressure distribution tapers off

over the length of the duct eventually releasing into

atmospheric pressure (zero gauge pressure), and the pres-

sure drop off is considerably sharp, and directly relates to

performance.

The velocity distribution shows the reason for the choice

of the first objective function, i.e., uniformity of flow

velocity at the exit of the duct related to noise control. We

note that circulation can be seen next to the curved surfaces

of the duct, and since these individuals still involve a trade

off between duct volume and permeability/flow uniformity,

the velocity fields show some circulation in two distinct

regions.

3.6 Performance and parallel efficiency

In the general approach covered by this article, it is

impossible to obtain a quantitative estimate of parallel

efficiency since the actual computation time depends on the

existing load on the remote cluster. In our particular

example, computation for the full CFD model required 90 s

on a dedicated remote node; while the CPOD-based sur-

rogate functions on a master node (identical to remote

nodes) required less than 5 s. That said, the actual course of

the optimization will greatly depend on the exact state/load

of the remote cluster which is constantly evolving with the

number of users submitting jobs and thus making a broad

statement about the actual efficiency as has been seen in

previous related works is not possible.

We have made some comparisons over an hour elapse

time of MOGA for three different remote server states:

‘‘zero-load’’ (dedicated server); ‘‘normal-load’’ (*30

competing users) and ‘‘full-load’’ (remote server unable to

process CFD requests). The program was run with 100 %

CFD (no surrogate-assistance) and a mixed approach, i.e.,

CFD and surrogate with time-out, and the results are

compared in Fig. 13.

The reference (green) curve is obtained for zero-load

and constant 100-s time-out value. The proposed approach

reduces here to the synchronous ‘‘Ask&Tell’’ GA

Fig. 11 Generation 20: F1 and F2 non-competing; F3 competes with

F1 and F2; the point A on the Pareto set is chosen for further

inspection (Fig. 12)

Fig. 12 Pressure distribution

and flow streamlines for a

Pareto point A


123

implementation for a dedicated remote cluster since all

exact evaluations complete within roughly 90 s.

The red curve shows the degradation of performance of

the synchronous ‘‘Ask&Tell’’ GA implementation (without

surrogates) on a charged cluster. Only two generations are

performed within the 1st hour as it is enough for a single

individual taking 40 min to block the progression.

The blue curve illustrates the proposed asynchronous

‘‘Ask&Future Tell’’ approach. We notice that the iterations

are performed at the beginning within the same 100 s time-

out, which decreases giving more frequent update than the

reference curve. Due to the poor initial quality of the initial

surrogate, the convergence is slower at the beginning than

that of the synchronous counterpart and improves over the

time. The comparison is hindered by the fact that the

asynchronous version is not deterministic as the order of

operations changes between generations. It is, however,

easily observed that within 1-hour elapse time the proposed

algorithm gives objective function values close to the ref-

erence and largely outperforms the synchronous imple-

mentation in this ‘‘real life’’ case.

4 Conclusions

In this paper, we presented and tested a unified non-intru-

sive approach to solving multi-objective optimization

problems on a busy grid/cluster, avoiding deadlocks and

node failures. This is possible by an introduction of a new

‘‘Ask&Future Tell’’ parallel paradigm combining asyn-

chronous approach with high-fidelity simulations and cus-

tom-built adaptive bi-level surrogate meta-model. This

allows the high-fidelity simulations to proceed at their own

pace on a remote cluster and update the simulation results

to enrich the database of stored results once available to

ensure and improve surrogate quality, while the optimiza-

tion algorithm proceeds with approximate values. The

asynchronous approach presented will conceivably work

with any surrogate-based meta-model that produces quality

estimates, and on any type of cluster/grid.

For an efficient use, several parameters need closer

investigation such as the time-out evolution, which con-

ditions the quality of the learning process. In the current

work jobs complete largely in FIFO (First In, First Out)

order. A second level time-out may be necessary for

deleting largely delayed jobs corresponding to design

points far from the current Pareto set, the ideal learning

strategy being clearly LIFO (Last In, First Out). Further

work is needed to extend the ‘‘Ask&Future Tell’’ approach

to gradient-based optimization. Several issues need to be

addressed such as scalability, approximation of gradients

and local optima.

Acknowledgments This work has been supported by the French

National Research Agency (ANR), through the COSINUS program

(project OMD2 no. ANR-08-COSI-007). The authors acknowledge

the Projet Pluri-Formations PILCAM2 at the Universite de Tech-

nologie de Compiegne for providing HPC resources that have con-

tributed to the research results reported within this paper (URL:

http://pilcam2.wikispaces.com.) as well as Maryan Sidorkiewicsz,

Direction de la Recherche, Renault, France and Mr. V. Picheny, Ecole

des Mines, France for contributing the CFD model used in this work.

References

1. Holland JH (1975) Adaptation in natural and artificial systems.

University of Michigan Press, Ann Arbor

2. Vose MD (1999) The simple genetic algorithm: foundations and

theory. MIT Press, Cambridge

3. Konak A, Coit DW, Smith AE (2006) Multi-objective optimi-

zation using genetic algorithms: a tutorial. Reliab Eng Syst Saf

91:992–1007

Fig. 13 Elapse-time convergence of synchronous SPMD GA on dedicated cluster compared with ‘‘real life’’ and with proposed ‘‘Ask&Future

Tell’’ approach performance on loaded cluster


123

http://pilcam2.wikispaces.com

4. Deb K (2001) Multi-objective optimization using genetic algo-

rithms. Wiley, Chichester

5. Willcox K, Peraire J (2002) Balanced model reduction via the proper

orthogonal decomposition. AIAA Journal 40(11):2323–2330

6. Gorissen D, Couckuyt I, Laermans E, Dhaene T (1985) Multi-

objective global surrogate modeling, dealing with the 5-percent

problem. Eng Comput 26(1):81–98

7. Lim D, Jin YC, Ong YS, Sendhoff B (2010) Generalizing sur-

rogate-assisted evolutionary computation. IEEE Trans Evol

Comput 14(3):329–355

8. Quiepo NV, Verde A, Pintos S, Haftka RT (2009) Assessing the

value of another cycle in Gaussian process surrogate-based

optimization. Int J Struc Multidisc Optim 39(5):459–475

9. Viana FAC, Haftka RT, Steffen V (2009) Multiple surrogates:

how cross-validation errors can help us to obtain the best pre-

dictor. Int J Struc Multidisc Optim 39(4):439–457

10. Knowles J (2006) ParEGO: a hybrid algorithm with on-line

landscape approximation for expensive multi objective optimi-

zation problems. IEEE Trans Evol Comput 10(1):50–66

11. Jones D, Schonlau M, Welch W (1998) Efficient global optimiza-

tion of expensive black-box functions. J Glob Optim 13:455–492

12. Berkooz G, Holmes P, Lumley JL (1993) The proper orthogonal

decomposition in the analysis of turbulent flows. Annu Rev Fluid

Mech 25:539–575

13. Filomeno Coelho R, Breitkopf P, Knopf-Lenoir C (2008) Model

reduction for multidisciplinary optimization—application to a 2d

wing. Int J Struc Multidisc Optim 37(1):29–48

14. Filomeno Coelho R, Breitkopf P, Knopf-Lenoir C (2009) Bi-level

model reduction for coupled problems. Int J Struc Multidisc

Optim 39(4):401–418

15. Xiao M, Breitkopf P, Coelho RF, Knopf-Lenoir C, Sidorkiewicsz

M, Villon P (2009) Model reduction by CPOD and Kriging. Int J

Struc Multidisc Optim 41(4):555–574

16. Bethke AD (1976) Comparison of genetic algorithms and gradi-

ent-based optimizers on parallel processors: efficiency of use of

processing capacity, Tech rep no 197. University of Michigan,

Ann Arbor

17. Greffensette JJ (1981) Parallel adaptive algorithms for function

optimization: parallel subcomponent interaction in a multilocus

model, Tech Rep No CS-81-19. Vanderbilt University, Nashville

18. Cantu-Paz E (1997) A survey of parallel genetic algorithms Ill-

GAL report 97003. The University of Illinois, Chicago

19. Tsutsui S (2010) Parallelization of an evolutionary algorithm on a

platform with multi-core processors. Artificial evolution, vol

5975. Lecture notes in computer science. Springer, Heidelberg,

pp 61–73

20. Wu H, Xu CL, Zou XF (2009) An efficient asynchronous parallel

evolutionary algorithm based on message passing model for

solving complex nonlinear constrained optimization. In: pro-

ceedings of the 8th international symposium on operations

research and its applications, ZhangJiaJie, China

21. Regis RG, Shoemaker CA (2009) Parallel stochastic global

optimization using radial basis functions. INFORMS J Comput

21(3):411–426

22. Asouti VG, Kampolis IC, Giannakoglou KC (2009) A grid-

enabled asynchronous meta model-assisted evolutionary algo-

rithm for aerodynamic optimization. Genet Program Evolvable

Mach 10(4):373–389

23. LeRiche R, Collette Y, Hansen N, Pujol G, Salazar D (2010) On

object-oriented programming of optimizers: examples in Scilab.

In: P. Breitkopf, R. Filomeno Coehlo (eds) Multidisciplinary

design optimization in computational mechanics (chapter 14)

Wiley/ISTE, Ney York, June 2010, pp 499–538

24. Caromel D, Henrio L (2004) A theory of distributed objects.

Springer, Berlin

25. http://omd2.scilab.org/ (2009) OMD2-project home-page, Accessed

Feb 22 2011

26. http://www.openfoam.com OpenFoam: the open-source CFD

toolbox, Accessed Aug 17 2010

27. Breitkopf P (1998) An algorithm for construction of iso-valued

surfaces for finite elements. Eng Comput 14(2):146–149

28. Rypl D, Krysl P (1997) Triangulation of 3D surfaces. Eng

Comput 13(2):87–98

29. Breitkopf P, Rassineux A, Touzot G, Villon P (2000) Explicit

form and efficient computation of MLS shape functions and their

derivatives. Int J Numer Meth Eng 48:451–456

30. Ryckelynck D (2005) A priori hyper eduction method: an adap-

tive approach. J Comput Phys 202(1):346–366


123

http://omd2.scilab.org/

http://www.openfoam.com

Documents

Asynchronous evolutionary shape optimization based on high-quality surrogates: Application to an air-conditioning duct