Upload
insa-rennes
View
0
Download
0
Embed Size (px)
Citation preview
ORIGINAL ARTICLE
Asynchronous evolutionary shape optimizationbased on high-quality surrogates: applicationto an air-conditioning duct
Balaji Raghavan • Piotr Breitkopf
Received: 21 December 2010 / Accepted: 22 March 2012
� Springer-Verlag London Limited 2012
Abstract Multi-processor HPC tools have become com-
monplace in industry and research today. Evolutionary
algorithms may be elegantly parallelized by broadcasting a
whole population of designs to an array of processors in a
computing cluster or grid. However, issues arise due to
synchronization barriers: subsequent iterations have to wait
for the successful execution of all jobs of the previous
generation. When other users load a cluster or a grid,
individual tasks may be delayed and some of them may
never complete, slowing down and eventually blocking the
optimization process. In this paper, we extend the recent
‘‘Futures’’ concept permitting the algorithm to circumvent
such situations. The idea is to set the default values to the
cost function values calculated using a high-quality surro-
gate model, progressively improving when ‘‘exact’’
numerical results are received. While waiting for the exact
result, the algorithm continues using the approximation and
when the data finally arrives, the surrogate model is
updated. At convergence, the final result is not only an
optimized set of designs, but also a surrogate model that is
precise within the neighborhood of the optimal solution.
We illustrate this approach with the cluster optimization of
an A/C duct of a passenger car, using a refined CFD legacy
software model along with an adaptive meta-model based
on Proper Orthogonal Decomposition (POD) and diffuse
approximation.
Keywords Parallel computing � Genetic algorithms �Ask&Tell � Futures � Fluid mechanics
1 Introduction
Stochastic search Genetic Algorithms (GAs) [1, 2] attempt
to find the optimal solution by manipulating a population of
candidate solutions, evaluating the fitness of the popula-
tion, and selecting the best solutions to reproduce and form
the next generation. As the generations proceed, the indi-
viduals with the highest fitness levels dominate the popu-
lation, potentially resulting in an increase in the quality.
These algorithms are attractive for multi-modal and multi-
objective optimization problems [3, 4] since rather than a
single optimum, they report a set of solutions correspond-
ing to local minima or the Pareto points. Since GAs do not
use gradient information, these are easily implemented
when a non-intrusive optimization strategy is desired. As
the overall cost of an optimization may be approximated by
the cost of the function evaluation multiplied by the
number of function calls, there are two major shortcomings
of GA’s when applied to structural optimization problems:
• The cost of fitness of an individual when calculated
using a high-fidelity numerical model with several
thousands of degrees of freedom (finite elements, finite
volumes, etc.);
• The number of function evaluations resulting both from
the size of the population and from the number of
generations required to converge.
In the present work, the first issue is addressed by
building a dedicated, low-cost and adaptive counterpart
(surrogate) of the numerical model. The second issue is
addressed by proposing an asynchronous parallel version of
B. Raghavan � P. Breitkopf (&)
Laboratoire Roberval, UTC-CNRS (UMR 7337), Labex MS2T,
Universite de Technologie de Compiegne, Compiegne, France
e-mail: [email protected]
B. Raghavan
e-mail: [email protected]
123
Engineering with Computers
DOI 10.1007/s00366-012-0263-0
GA that uses both high-fidelity simulation when available
and otherwise continues with a surrogate model. As the
optimization advances, the refined solutions accumulate
and are used to progressively refine the surrogate model. At
convergence, the final result is an optimized set of designs
and a finely tuned surrogate model that is precise within the
neighborhood of the optimal solutions.
Surrogate functions and reduced order models have been
around for a while and have been used mostly in the field of
control systems to reduce the order of the overall transfer
function [5]. They can approximate the objective functions
in a fraction of the computational time of the high-fidelity
model [6]. Lim et al. [7] provided a generalization of
surrogate-assisted evolutionary frameworks for computa-
tionally expensive problems. Surrogate-assisted evolu-
tionary optimization typically proceeds in cycles consisting
of: collection and analysis of a number of designs, fitting a
surrogate to the designs, optimization based on the surro-
gate and exact analysis at the final solution. Most recently,
Quiepo et al. [8] addressed a Gaussian process surrogate-
based optimization and looked for a statistically rigorous
procedure that would allow the user to determine the
number of surrogate-based optimization (SBO) cycles
needed for a given problem, while Viana et al. [9] inves-
tigated the cross-validation error of a set of surrogates and
tried to obtain the most accurate surrogate. The ParEGO
[10] extension of the Efficient Global Optimization (EGO)
approach [11] uses a design-of-experiments inspired
initialization procedure and learns a Gaussian process
model of the search landscape, which is updated after every
function evaluation.
We are interested here in high-quality surrogates
obtained using the method of Proper Orthogonal Decom-
position (POD) [12]. Filomeno Coelho et al. [13] used a
model reduction approach using kriging and POD to opti-
mize the intake port for a car engine. Filomeno Coelho
et al. [14] then proposed a bi-level model reduction strat-
egy to reduce the interdependence between the fluid and
structural models and hence the computation time for the
case of a 2D wing demonstrator, and then extended the
approach to the optimization of a 3D flexible wing, while
Xiao et al. [15] proposed a constrained POD version for
the optimization of a car engine intake port. All these
approaches used a variant of a GA operating directly on the
a priori constructed global surrogate functions, without
making calls to the higher order numerical model. While
this approach is extremely fast, one could run into trouble
since the engineer relies completely on the surrogate
function for his final result(s).
A distributed/parallel GAs implementation is possible,
since the fitness of an individual can be evaluated
independently of the other individuals, meaning that the
computation and search strategy is independent of the
computing sequence within a single generation. In the
master–slave scheme, communication is necessary only
when the slaves receive their individuals and then when
they return the function. Bethke [16] performed the first
study on parallel/distributed implementations of GA and
compared their parallel performance with gradient-based
optimizers. He also identified the bottlenecks that limit the
efficiency of a gradient-based approach. Greffensette [17]
proposed master–slave paradigms and the multiple popu-
lations approach with a migration scheme. The next sub-
class of parallel GAs is multi-deme GAs with multiple
populations [18]. Most parallel GA implementations take
one or more serial GAs and run each separately on separate
nodes, exchanging individuals at predetermined times, and
perform exact/numerical model-based evaluations. This,
however, can be a bottleneck on a busy cluster since the
function evaluations may be delayed leading to deadlocks.
The need for asynchronous parallelism appears partic-
ularly when the high-fidelity model runs on a remote
computing cluster that is being used for hundreds of other
projects and computational purposes, such as in a large
research laboratory or technical organization. This means
that scheduled jobs will be handled depending on the
overall priority, load on the cluster, and available resour-
ces. On a computing grid the possibility of node failures
and/or delayed/dropped computational jobs exists and
needs thus to be factored in when planning projects.
Tsutsui and Wu et al. [19, 20] investigates asynchronous
evolutionary algorithms on a multi-core cluster.
Clearly, a ‘‘mixed’’ approach that allows for asynchro-
nous computation using both numerical as well as surro-
gate calls allows the design engineer much more control
and flexibility given the precision versus computational
time and computational power, with the cluster load being
a factor outside the user’s control. The literature surveyed
shows little research on asynchronous surrogate-assisted
evolutionary algorithms. Regis et al. [21] used a parallel
stochastic radial-basis function algorithm for the surrogate
and an asynchronous global pattern search on up to eight
processors, and Asouti [22] used Artificial Neural Network
(ANN)-based meta-models to improve the quality of the
surrogate compared to existing low-precision meta-models.
In this paper, we propose an ‘‘Ask&Future Tell’’ parallel
paradigm for competitive use of high-fidelity and reduced
models specifically designed for cluster/grid environments
where an asynchronous approach is managed by a time-out
scheme. We give a ‘‘non-intrusive’’ algorithm for mas-
sively parallel asynchronous surrogate-based evolutionary
optimization tolerant to an existing cluster load with lim-
ited control over the recovery of the numerical results
performed on the slave nodes. We illustrate the proposed
approach in the problem of multi-objective optimization of
a vehicle air-conditioning duct.
Engineering with Computers
123
The paper is organized into four sections. In Sect. 2 we
describe the proposed algorithm. In Sect. 3 the test case,
i.e., air-conditioning duct geometry and CFD model is
treated along with the CPOD-based meta-model used as a
surrogate with discussion of the optimization results.
Finally we discuss the perspectives and limitations of
proposed approach.
2 Paradigms for parallel evolutionary optimization
Consider a simple GA (Fig. 1). Each iteration involves two
distinct phases: evaluation of the population fitness values
and the recombination of individuals to produce the new
population.
The population evaluation in serial mode is shown in
Fig. 2, when the optimization algorithm and all function
evaluations are performed in an iterative manner on a
single node.
The simulator block is an embarrassingly parallel task
and may be performed on a remote cluster with a master
node dedicated to the optimization (GA operations) shown
in Fig. 3, typically using a Single Program, Multiple Data
(SPMD) block.
This process takes advantage of the natural scalability of
a GA and its result is strictly identical to that of the
sequential version because of the independence of the order
of function evaluations. However, the maximal theoretical
speedup is limited by the size of the population. This
means that when there are more processors available than
individuals to be evaluated, some of the processors will
stay idle. Moreover, when the remote cluster is under
heavy use by competing users for multiple projects,
affecting the evaluation time for individual processors of
the cluster due to jobs being delayed (or even dropped) and
as we need all the evaluations for the whole population in
order to continue the GA, the overall execution time is
dictated by the most charged and thus the slowest node.
This phenomenon is represented using the synchronization
barrier shown by a horizontal dotted line in Fig. 3.
2.1 ‘‘Ask&Tell’’ synchronous paradigm
Communication with the optimizer may be formally
explained using the Ask&Tell programming pattern [23]
applied to the optimizer/numerical model interface. In
simplest terms, ‘‘Ask&Tell’’ is a communication concept
that replaces the function call construct in an object-ori-
ented parallel environment. Since the numerical analysis is
performed/queued on a remote cluster, the master node that
needs the function evaluation issues an ‘‘Ask’’ message to
probe the result of the analysis, and the remote cluster uses
a ‘‘Tell’’ message to inform the master of the result, when
available (Fig. 4).
The Ask&Tell paradigm based on simple synchronous
messages is not sufficient to leverage the synchronization
barrier by itself. We need to introduce a paradigm for time-
out messages as well, in order to permit the algorithm to
Selection
Recombination
Mutation
Dominance
Initial population
simulator block
optimizer block
Updated population
Fig. 1 Basic GA simulator and optimizer blocks
Fig. 2 Sequential simulator block
Fig. 3 Parallel SPMD implementation of simulator block
Fig. 4 The Ask&Tell paradigm
Engineering with Computers
123
continue by replacing missing function values by making
some reasonable hypotheses.
2.2 ‘‘Ask&Future Tell’’ asynchronous paradigm
The concept of ‘‘Futures’’ or wait-by-necessity [24] permits
an algorithm to advance with missing data ‘‘marked for
future updating’’. We propose here to extend the original
idea by giving a physical meaning to the ‘‘Futures’’ para-
digm. The extended concept may be stated as ‘‘use
approximate response and mark for future updating’’. The
surrogate model is performed by default on all individuals,
prior to sending them to the exact model for ‘‘precise’’
evaluation. At each new generation two cases arise:
• (k B N) individuals complete within the specified time-
out: the exact function values fy1. . .ykg are ‘‘told’’ to
the optimizer and the errors of the initial estimates are
used to update the surrogate model;
• N-k C 0 computations are timed-out: the ‘‘Futures’’ tag
is set for the processes continuing in background, while
the surrogate values fy�kþ1. . .y�Ng are ‘‘told’’ to the
optimizer.
The synchronization barrier in Fig. 3 is thus replaced by
a result time-out (user-specified) in Fig. 5. Once the
numerical result is sent from the remote cluster, the sur-
rogate is updated based on the error criterion. Figure 3
shows the ‘‘Ask&Future Tell’’ concept in an asynchronous
surrogate-assisted GA.
The key parameter is the time-out value which has to be
finely tuned during the process. For the initial generation,
the time-out is typically set at around 1.1 times the ideal
time taken for a single evaluation, i.e., on a dedicated node.
Obviously for an ideal case using a dedicated remote
cluster ‘‘zero-load’’ (unrealistic) all evaluations will com-
plete before the time-out allowing the optimization to
proceed with the exact values instead of a surrogate. The
time-out value decreases progressively during the optimi-
zation process as allowing the surrogate quality improve on
top of accumulated ‘‘exact’’ values.
3 Performance optimization of an air-conditioning duct
We demonstrate the proposed approach to optimizing the
geometry of an A/C duct (Fig. 6) of a passenger car [25] in
order to maximize its performance characterized by the
permeability (related to head loss over the duct length),
uniformity of flow at the exit and the total duct volume.
We set up a CFD grid with 39,000 grid points and
17,250 hexahedral cells (Fig. 7). The physical domain is
split into 23 different blocks for the purpose of meshing.
Blocks 13–22 (Fig. 7) in the curved portion have a higher
meshing density near the sharper portions of the curves.
The cell locations are constant in fixed blocks 1 and 23,
while cell locations in blocks 2–22 change with each
choice of design parameters, albeit the same number of
cells, arrangement, connectivity and mesh density.
Since the Reynolds number for the situation is typically
low, we assume incompressible 2D laminar airflow. We
use OpenFoam [26] CFD model to solve the Navier–Stokes
Fig. 5 ‘‘Ask&Future Tell’’
Fig. 6 A/C duct shape with inlet and outlet sections
Fig. 7 CFD meshing blocks and cells
Engineering with Computers
123
equation in 2D. Boundary conditions prescribe null flow
speed along the walls and atmospheric pressure at the duct
outlet. The CFD analysis is run for 500 iterations to ensure
convergence, yielding pressure and velocity fields at the
midpoint of each hexahedral cell. The permeability and
exit flow uniformity are directly evaluated from the pres-
sure/velocity fields, while the duct volume is easily cal-
culated from the cell coordinates.
3.1 Optimization problem statement
The shape optimization problem with three objective
functions: pressure drop F1, output flow uniformity F2
expressed by the standard deviation of velocities and the
duct volume F3, may be stated as follows
Find Xopt ¼ Argmax F1ð �XÞ; F2ð �XÞ; F3ð �XÞ�X ¼ fX1. . .X13g; �LB� �X� �UB
ð1Þ
F1 = 1.0/(PA - PB)
F2 ¼ 1:0=X
B
ð �Vik k � �Vmeank kÞ2=nB
F3 = 1/(duct volume), where A = inlet section, B = out-
let section �UB and �LB are the geometric bounds on the
design variables ð �XÞ. The pressure and velocity fields P; �V
are calculated using CFD.
3.2 Shape parameterization
The inlet and outlet portions of the air-conditioning duct
have fixed geometries, while the middle portion allows for
modification of the shape and thus performance of the duct.
The 2D section (Fig. 8) is completely described by the
relative positions of points P1–P11. Positions of P1–P4 and
P9–P11 are assumed fixed and P5–P8 are obtained by the
geometric constructions.
Parameters X1–X5 allow us to locate points P5–P8,
while the additional parameters a1–a4 and b1–b4 define
Bezier curves passing through these points tracing out the
curved portion of the duct. The 13 design variables are thus:
X1, X2, X3, X4, X5 and X6 = a1, X7 = b1, X8 = a2, X9 = b2,
X10 = a3, X11 = b3, X12 = a4, X13 = b4. Upper and lower
bounds on these 13 parameters are set in order to retain the
laminar flow hypothesis and guarantee a realistic shape.
3.3 Bi-level surrogate model
Physics-based meta-models take advantage of the infor-
mation contained in the pressure and velocity fields of
high-fidelity model. The model reduction is two-stage POD
with customized coefficients conserving previously chosen
cumulative variables after truncation.
3.3.1 First level: constrained POD
Proper orthogonal decomposition works on a vector basis
obtained from a rectangular correlation matrix based on a
series of snapshots, and allows capturing the physics with a
limited set of scalar coefficients and is several orders of
magnitude faster than a CFD simulation. The initial set of
snapshots is composed of velocity/pressure fields of the
first generation. As the GA progresses, we add successive
high-fidelity velocity/pressure fields to the snapshots
matrix S. We obtain three snapshot matrices for P, Ux and
Uy, each composed of M snapshot vectors of length N
equal to the number of sampling points (shown for P, other
two field variables are similar)
S ¼P1
1 � �P1 P21 � �P1 . . . PM
1 � �P1
: : : :: : : :
P1N � �PN P2
N � �PN . . . PMN � �PN
2
664
3
775: ð2Þ
It must be noted that the region in space occupied by the
structural geometry changes depending on the design
variables. We interpolate the field values at the grid points
over a fixed reference grid using the finite elements [27] or
Delaunay Triangulation [28].
We first calculate the covariance matrices CP for each
field variable (only P shown here)
CP ¼ SST; ð3Þ
allowing us to express the three field vectors in terms of the
eigenvectors UP ¼ ½/P1 ;/
P2 . . ./P
M� of Cp
Pi ¼ �PþXM
j¼1
aij/Pj : ð4Þ
We next truncate the basis to the m most active modes
~P ¼ �PþPm
1 ai/Pi (m � M) with a relative projection
errorFig. 8 2D section and geometric construction of curved portion of
the duct
Engineering with Computers
123
eðmÞ ¼ 1�Pm
i¼1 kiPMi¼1 ki
: ð5Þ
However, additional constraints are needed for the
truncated POD approximation to conserve cumulative
quantities (such as total fluid flow) over the approximated
field ~Pi
WT ~Pi ¼ �ciP; ð6Þ
where WT stores interpolation and integration coefficients
over the reference grid and �ciP is the value integrated over
the original snapshot. We have shown [15] that for a given
basis truncated to m vectors /;m, the coefficients a (b and cfor the other two variables) are obtained from
/T;mM/;m /T
;mW
WT/;m 0
� �aðkÞ
k
� �¼ /T
;mMðPðkÞ � �PÞcðkÞ �WT �P
� �: ð7Þ
This permits us to reconstruct the snapshots preserving
global quantities of interest (objective functions). In order
to get fields for arbitrary values of design variables, we use
an approximation scheme.
3.3.2 Second level: diffuse approximation
of POD coefficients
The next step is to express the coefficients, a, b and c as
functions of the design variables �X ¼ X1. . .X13 using dif-
fuse approximation method [29], chosen for its adaptivity
and ability to capture local effects
að �XÞ � bTð �XÞað �XÞ ð8Þ
bTð �XÞ ¼ 1 X1 X2 . . . X21 X1X2 . . .
� �; ð9Þ
where að �XÞ are chosen to minimize the functional Jx(a)
JxðaÞ ¼1
2
XM
i¼1
wið �Xi; �XÞ bTð �XiÞa� Pi
� 2; ð10Þ
where wið �Xi; �XÞ is the weighting function for the MLS
approximation, e.g., radial
wið �Xi; �XÞ ¼ exp �0:25 �Xi � �Xk k2 �
: ð11Þ
The two steps—POD and diffuse approximation
constitute a bi-level surrogate model for the airflow
within the duct permitting us to approximate the air-duct
pressure and velocity fields by
~Pð �XÞ ¼ �PþXm
1
aið �XÞ/Pi : ð12Þ
3.3.3 Adaptive learning strategy
The learning strategy to refine and improve the surrogate
generates new snapshots around the Pareto (dominant)
solutions and removes duplicates as well as weak points far
from the Pareto to increase the approximation precision in
the local neighborhood of the dominant solutions (first k
snapshots in the snapshot matrix are replaced by more
dominant point closer to the Pareto front)
S ¼ S1 S2 . . . SM� �
! Sk . . . SN SNþ1 . . . SMþk�1� �
:
If we observe distinct groups indicating a split on the
population (two different local optima) then the design
space can also be split giving different bases /Pi ;/
Uxi ;/Uy
i
for the different local optima, based on adaptive POD by
Ryckelynck [30].
3.4 Master–slave implementation on a cluster
Our implementation of the proposed model queues the
CFD function calls on a 160-processor remote cluster with
close to 40 competing users working on unrelated projects.
Since there are three objectives, we used a MOGA [4]
approach with single point crossover, mutation and niching
to locate dominant solutions using the asynchronous pop-
ulation evaluation (Sect. 2). We implement niching by
removing ‘‘clones’’ of the most dominant individuals and
replacing them by freshly sampled points in the neigh-
borhood of the dominant solutions.
Figure 9 shows the asynchronous implementation in
practice with the CFD results obtained using OpenFOAM
on the loaded cluster. MATLAB/Scilab is run on the group
A (master nodes) processors while group B processors
(remote slaves) receive the CFD analysis requests (to be run
using OpenFoam). The processors are split into two groups.
• The master nodes launch CFD jobs each generation;
create snapshots, calculate surrogates and refine data-
base, have ‘‘Ask’’ access to CFD values, write surrogate
values to separate database, and perform GA operations.
• The slave nodes receive CFD requests for individuals,
post-process CFD results to evaluate objective func-
tions, ‘‘Tell’’ access to CFD values database, NO access
to surrogate database.
Thus master nodes submit jobs to slaves through the
scheduler so slaves can eventually ‘‘write’’ output to the
database. Masters continue the optimization with the sur-
rogate values without waiting for the CFD results. Con-
vergence is achieved when we have obtained a satisfactory
number of non-dominated and validated solution designs.
3.5 Numerical results and discussion
While testing the presented approach on a cluster, the
important features were,
Engineering with Computers
123
• Non-intrusive nature of the protocol allowing the
optimization to continue in a stable manner indepen-
dent of the CAD/CFD despite system crashes due to
CAD failure, server load and outages;
• Quality of optimal solution (s) obtained as verified
using the exact CFD calculation;
• Observed utilization of computing resources allotted
with no idle time for any of the processors.
The mixed approach ensured zero idle time for the
group A processors beyond waiting for a sufficient initial
size of the database to use the meta-model, ensuring a
completely asynchronous execution. The accuracy of the
CPOD surrogate improved quickly with the database size.
The population diversity had to be increased (using niching
and additional sampling) during any group B server out-
ages/overloads in order to prevent premature convergence
with a less accurate estimate. With this, the optimization
process could continue and be halted at any time depending
on the need and performed well in the face of failure due to
the way databases are written as external files rather than
internal variables.
Figure 10 shows the evolution of the design points
plotted on the 3D objective function space (permeability
Fig. 9 Cluster implementation of asynchronous GA
Fig. 10 Evolution of the population on objective functions space
Engineering with Computers
123
vs. flow uniformity vs. duct volume) as the multi-objective
genetic algorithm proceeds from the first to future
generations.
Figure 11 shows the individual objectives plotted
against each other. It bears mentioning that the first two
objectives (permeability and flow uniformity at the duct
exit) are not mutually opposing. As expected, there is a
gradual improvement in the population based on the
objective functions, even though the graph may change as
the ‘‘Futures’’ protocol updates the surrogate values with
the CFD-calculated values.
Figure 12 shows the pressure distribution and the flow
field (streamlines) in one of the air-conditioning duct
geometries obtained after 40 generations (point A in
Fig. 11). As can be seen, the pressure distribution tapers off
over the length of the duct eventually releasing into
atmospheric pressure (zero gauge pressure), and the pres-
sure drop off is considerably sharp, and directly relates to
performance.
The velocity distribution shows the reason for the choice
of the first objective function, i.e., uniformity of flow
velocity at the exit of the duct related to noise control. We
note that circulation can be seen next to the curved surfaces
of the duct, and since these individuals still involve a trade
off between duct volume and permeability/flow uniformity,
the velocity fields show some circulation in two distinct
regions.
3.6 Performance and parallel efficiency
In the general approach covered by this article, it is
impossible to obtain a quantitative estimate of parallel
efficiency since the actual computation time depends on the
existing load on the remote cluster. In our particular
example, computation for the full CFD model required 90 s
on a dedicated remote node; while the CPOD-based sur-
rogate functions on a master node (identical to remote
nodes) required less than 5 s. That said, the actual course of
the optimization will greatly depend on the exact state/load
of the remote cluster which is constantly evolving with the
number of users submitting jobs and thus making a broad
statement about the actual efficiency as has been seen in
previous related works is not possible.
We have made some comparisons over an hour elapse
time of MOGA for three different remote server states:
‘‘zero-load’’ (dedicated server); ‘‘normal-load’’ (*30
competing users) and ‘‘full-load’’ (remote server unable to
process CFD requests). The program was run with 100 %
CFD (no surrogate-assistance) and a mixed approach, i.e.,
CFD and surrogate with time-out, and the results are
compared in Fig. 13.
The reference (green) curve is obtained for zero-load
and constant 100-s time-out value. The proposed approach
reduces here to the synchronous ‘‘Ask&Tell’’ GA
Fig. 11 Generation 20: F1 and F2 non-competing; F3 competes with
F1 and F2; the point A on the Pareto set is chosen for further
inspection (Fig. 12)
Fig. 12 Pressure distribution
and flow streamlines for a
Pareto point A
Engineering with Computers
123
implementation for a dedicated remote cluster since all
exact evaluations complete within roughly 90 s.
The red curve shows the degradation of performance of
the synchronous ‘‘Ask&Tell’’ GA implementation (without
surrogates) on a charged cluster. Only two generations are
performed within the 1st hour as it is enough for a single
individual taking 40 min to block the progression.
The blue curve illustrates the proposed asynchronous
‘‘Ask&Future Tell’’ approach. We notice that the iterations
are performed at the beginning within the same 100 s time-
out, which decreases giving more frequent update than the
reference curve. Due to the poor initial quality of the initial
surrogate, the convergence is slower at the beginning than
that of the synchronous counterpart and improves over the
time. The comparison is hindered by the fact that the
asynchronous version is not deterministic as the order of
operations changes between generations. It is, however,
easily observed that within 1-hour elapse time the proposed
algorithm gives objective function values close to the ref-
erence and largely outperforms the synchronous imple-
mentation in this ‘‘real life’’ case.
4 Conclusions
In this paper, we presented and tested a unified non-intru-
sive approach to solving multi-objective optimization
problems on a busy grid/cluster, avoiding deadlocks and
node failures. This is possible by an introduction of a new
‘‘Ask&Future Tell’’ parallel paradigm combining asyn-
chronous approach with high-fidelity simulations and cus-
tom-built adaptive bi-level surrogate meta-model. This
allows the high-fidelity simulations to proceed at their own
pace on a remote cluster and update the simulation results
to enrich the database of stored results once available to
ensure and improve surrogate quality, while the optimiza-
tion algorithm proceeds with approximate values. The
asynchronous approach presented will conceivably work
with any surrogate-based meta-model that produces quality
estimates, and on any type of cluster/grid.
For an efficient use, several parameters need closer
investigation such as the time-out evolution, which con-
ditions the quality of the learning process. In the current
work jobs complete largely in FIFO (First In, First Out)
order. A second level time-out may be necessary for
deleting largely delayed jobs corresponding to design
points far from the current Pareto set, the ideal learning
strategy being clearly LIFO (Last In, First Out). Further
work is needed to extend the ‘‘Ask&Future Tell’’ approach
to gradient-based optimization. Several issues need to be
addressed such as scalability, approximation of gradients
and local optima.
Acknowledgments This work has been supported by the French
National Research Agency (ANR), through the COSINUS program
(project OMD2 no. ANR-08-COSI-007). The authors acknowledge
the Projet Pluri-Formations PILCAM2 at the Universite de Tech-
nologie de Compiegne for providing HPC resources that have con-
tributed to the research results reported within this paper (URL:
http://pilcam2.wikispaces.com.) as well as Maryan Sidorkiewicsz,
Direction de la Recherche, Renault, France and Mr. V. Picheny, Ecole
des Mines, France for contributing the CFD model used in this work.
References
1. Holland JH (1975) Adaptation in natural and artificial systems.
University of Michigan Press, Ann Arbor
2. Vose MD (1999) The simple genetic algorithm: foundations and
theory. MIT Press, Cambridge
3. Konak A, Coit DW, Smith AE (2006) Multi-objective optimi-
zation using genetic algorithms: a tutorial. Reliab Eng Syst Saf
91:992–1007
Fig. 13 Elapse-time convergence of synchronous SPMD GA on dedicated cluster compared with ‘‘real life’’ and with proposed ‘‘Ask&Future
Tell’’ approach performance on loaded cluster
Engineering with Computers
123
4. Deb K (2001) Multi-objective optimization using genetic algo-
rithms. Wiley, Chichester
5. Willcox K, Peraire J (2002) Balanced model reduction via the proper
orthogonal decomposition. AIAA Journal 40(11):2323–2330
6. Gorissen D, Couckuyt I, Laermans E, Dhaene T (1985) Multi-
objective global surrogate modeling, dealing with the 5-percent
problem. Eng Comput 26(1):81–98
7. Lim D, Jin YC, Ong YS, Sendhoff B (2010) Generalizing sur-
rogate-assisted evolutionary computation. IEEE Trans Evol
Comput 14(3):329–355
8. Quiepo NV, Verde A, Pintos S, Haftka RT (2009) Assessing the
value of another cycle in Gaussian process surrogate-based
optimization. Int J Struc Multidisc Optim 39(5):459–475
9. Viana FAC, Haftka RT, Steffen V (2009) Multiple surrogates:
how cross-validation errors can help us to obtain the best pre-
dictor. Int J Struc Multidisc Optim 39(4):439–457
10. Knowles J (2006) ParEGO: a hybrid algorithm with on-line
landscape approximation for expensive multi objective optimi-
zation problems. IEEE Trans Evol Comput 10(1):50–66
11. Jones D, Schonlau M, Welch W (1998) Efficient global optimiza-
tion of expensive black-box functions. J Glob Optim 13:455–492
12. Berkooz G, Holmes P, Lumley JL (1993) The proper orthogonal
decomposition in the analysis of turbulent flows. Annu Rev Fluid
Mech 25:539–575
13. Filomeno Coelho R, Breitkopf P, Knopf-Lenoir C (2008) Model
reduction for multidisciplinary optimization—application to a 2d
wing. Int J Struc Multidisc Optim 37(1):29–48
14. Filomeno Coelho R, Breitkopf P, Knopf-Lenoir C (2009) Bi-level
model reduction for coupled problems. Int J Struc Multidisc
Optim 39(4):401–418
15. Xiao M, Breitkopf P, Coelho RF, Knopf-Lenoir C, Sidorkiewicsz
M, Villon P (2009) Model reduction by CPOD and Kriging. Int J
Struc Multidisc Optim 41(4):555–574
16. Bethke AD (1976) Comparison of genetic algorithms and gradi-
ent-based optimizers on parallel processors: efficiency of use of
processing capacity, Tech rep no 197. University of Michigan,
Ann Arbor
17. Greffensette JJ (1981) Parallel adaptive algorithms for function
optimization: parallel subcomponent interaction in a multilocus
model, Tech Rep No CS-81-19. Vanderbilt University, Nashville
18. Cantu-Paz E (1997) A survey of parallel genetic algorithms Ill-
GAL report 97003. The University of Illinois, Chicago
19. Tsutsui S (2010) Parallelization of an evolutionary algorithm on a
platform with multi-core processors. Artificial evolution, vol
5975. Lecture notes in computer science. Springer, Heidelberg,
pp 61–73
20. Wu H, Xu CL, Zou XF (2009) An efficient asynchronous parallel
evolutionary algorithm based on message passing model for
solving complex nonlinear constrained optimization. In: pro-
ceedings of the 8th international symposium on operations
research and its applications, ZhangJiaJie, China
21. Regis RG, Shoemaker CA (2009) Parallel stochastic global
optimization using radial basis functions. INFORMS J Comput
21(3):411–426
22. Asouti VG, Kampolis IC, Giannakoglou KC (2009) A grid-
enabled asynchronous meta model-assisted evolutionary algo-
rithm for aerodynamic optimization. Genet Program Evolvable
Mach 10(4):373–389
23. LeRiche R, Collette Y, Hansen N, Pujol G, Salazar D (2010) On
object-oriented programming of optimizers: examples in Scilab.
In: P. Breitkopf, R. Filomeno Coehlo (eds) Multidisciplinary
design optimization in computational mechanics (chapter 14)
Wiley/ISTE, Ney York, June 2010, pp 499–538
24. Caromel D, Henrio L (2004) A theory of distributed objects.
Springer, Berlin
25. http://omd2.scilab.org/ (2009) OMD2-project home-page, Accessed
Feb 22 2011
26. http://www.openfoam.com OpenFoam: the open-source CFD
toolbox, Accessed Aug 17 2010
27. Breitkopf P (1998) An algorithm for construction of iso-valued
surfaces for finite elements. Eng Comput 14(2):146–149
28. Rypl D, Krysl P (1997) Triangulation of 3D surfaces. Eng
Comput 13(2):87–98
29. Breitkopf P, Rassineux A, Touzot G, Villon P (2000) Explicit
form and efficient computation of MLS shape functions and their
derivatives. Int J Numer Meth Eng 48:451–456
30. Ryckelynck D (2005) A priori hyper eduction method: an adap-
tive approach. J Comput Phys 202(1):346–366
Engineering with Computers
123