Multi-ﬁdelity Modeling and Optimization of Biogas Plants

Multi-fidelity Modeling and Optimizationof Biogas Plants

Martin Zae↵erer⇤, Daniel Gaida, Thomas Bartz-BeielsteinFaculty for Computer and Engineering Sciences, TH Koln (University of Applied Sciences), Steinmullerallee 1, 51643 Gummersbach, Germany

Abstract

An essential task for operation and planning of biogas plants is the optimization of substrate feed mixtures. Optimizingthe monetary gain requires the determination of the exact amounts of maize, manure, grass silage, and other substrates.For this purpose, accurate simulation models are mandatory, because the underlying biochemical processes are veryslow. The simulation models may be time-consuming to evaluate, hence we show how to use surrogate-model-basedapproaches to optimize biogas plants e�ciently. In detail, a Kriging surrogate is employed. To improve model qualityof this surrogate, we integrate cheaply available data into the optimization process. To this end, multi-fidelity modelingmethods like Co-Kriging are applied. Furthermore, a two-layered modeling approach is used to avoid deteriorationof model quality due to discontinuities in the search space. At the same time, the cheaply available data is shown tobe very useful for initialization of the employed optimization algorithms. Overall, we show how biogas plants can bee�ciently modeled using data-driven methods, avoiding discontinuities as well as including cheaply available data.The application of the derived surrogate models to an optimization process is only partly successful. Given the samebudget of function evaluations, the multi-fidelity approach outperforms the alternatives. However, due to considerablecomputational requirements, this advantage may not translate into a success with regards to overall computation time.

Keywords: Biogas Plant, Simulation, Optimization, Surrogate Models, Multi-fidelity, Co-Kriging

1. Introduction

Optimizing the operation of biogas plants is and willbe one of the main challenges in the field of anaerobicdigestion (AD) in the near future. Due to a steady de-crease in funding and increasing substrate costs only op-timal operating biogas plants will be economically ad-vantageous.

The operation of biogas plants is very sensitive to themixture of the used substrates. Hence, optimizing themixture is an important task to run or plan such plantse�ciently. Due to the very slow processes involved,optimizing the plants in real-time would consume toomuch time. Models like the Anaerobic Digestion ModelNo. 1 (ADM1) allow to compute a good prediction ofbiogas plant’s process variables, based on the used sub-

⇤Corresponding author. Phone: +49 2261 8196 6327Email addresses: [email protected] (Martin

Zae↵erer), [email protected] (Daniel Gaida),[email protected] (ThomasBartz-Beielstein)

strates [7]. Thus, ADM1 can be used as a substitute inthe optimization process instead of a real plant.

While such models are much cheaper to evaluate thantheir real-world counter-part, they do take some time toevaluate. Hence, methods that use the smallest amountof evaluations possible are of interest. This situationmotivated the central question that will be tackled in thisstudy:(Q-1) How can the precision of simulation models be

improved without increasing the number of evalu-ations?

Surrogate modeling techniques are therefore apromising choice. Besides the expensive informationderived from ADM1, additional performance informa-tion is available. A rough performance estimate can bedetermined based on the biogas potential of the usedsubstrates and their associated costs. This additionalknowledge can be integrated into the optimization pro-cess, by bolstering the quality of the chosen surrogate-modeling technique. This approach of integrating dif-ferent levels of granularity or cost has previously beencalled multi-fidelity optimization [19]. It is worth in-

Accepted Manuscript submitted to Applied Soft Computing DOI of the final article: http://dx.doi.org/10.1016/j.asoc.2016.05.047 ©2016. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/

vestigating whether these approaches are applicable toreal-world settings. This can be formulated as the sec-ond question to be analyzed in this study:(Q-2) What are the benefits and limitations of multi-

fidelity modeling approaches?In this paper, several multi-fidelity modeling ap-

proaches are compared, and the best are tested for theirperformance in an optimization process.

Section 2 gives an overview of relevant previouswork. The specific problem to be solved is introduced inSection 3. In Section 4, methods that were used in thisstudy are described. Section 5 presents experiments, inwhich various multi-fidelity approaches are tested fortheir modeling quality, whereas Section 6 tests the bestof these for their success in solving the actual optimiza-tion problem. A concluding summary of findings aswell as an outlook on future research is given in Sec-tion 7.

2. Former Research

2.1. Biogas Plant SimulationIslam et al. [28] analyze the impact of di↵erent fac-

tors on production of biogas in di↵erent biogas plantsof Bangladesh. The data was collected from 18 poultryfarms. Their analysis is based on collected data fromsurvey, Internet, and other sources. To obtain further in-sight in the behavior of biogas plants, simulation modelssuch as the ADM1 can be used. ADM1 is very popularand the nowadays most complex mathematical modelused to simulate the anaerobic digestion process (for areview see [6]). In several publications it is utilized todynamically model full-scale agricultural and industrialbiogas plants [8, 33, 47]. ADM1 is a structured modelincorporating disintegration and hydrolysis, acidogen-esis, acetogenesis, and methanogenenesis steps. TheADM1 is implemented as a sti↵ di↵erential equationsystem in a MATLAB R� toolbox for biogas plant model-ing, optimization and control published by Gaida et al.[23]. In this toolbox, a model of a full-scale agriculturalbiogas plant is developed that is used in the empiricalpart of this publication. The simulation model of thebiogas plant includes the ADM1 and furthermore mod-els of electrical and thermal energy sinks and sourcesas well as models for performance and stability crite-ria. Typical criteria include cost versus benefit (withrespect to the Renewable Energy Sources Act (EEG2009) in Germany [9]), stability of substrate degrada-tion processes and operating constraints such as upperand lower pH limits, maximum VFA/TA [52] value,maximum total solids content in the digester, and mini-mum methane concentration of the biogas.

2.2. Biogas Substrate Feed OptimizationBiogas plant substrate feed mixtures have previously

been optimized with a Genetic Algorithm and ParticleSwarm Optimization by Wolf et al. [56]. More recentlyZiegenhirt et al. [60] used state of the art evolutionstrategies like Covariance Matrix Adaption EvolutionStrategy (CMAES) [27, 26] or Di↵erential Evolution(DE) [54] to reduce the number of needed simulations.They also used the Sequential Parameter OptimizationToolbox (SPOT) [5] to tune the employed algorithms.In our work, we directly use SPOT on the substrate feedoptimization problem. That is, we support the optimiza-tion procedure with surrogate-models.

Both previous studies used a biogas plant modelbased on the MATLAB R� Simulink R� Toolbox SIMBA,developed by ifak system GmbH1. The herein presentedresearch on the other hand is based on the MATLAB R�

Toolbox for Biogas Plant Simulation [23]. In contrastto earlier works by Wolf et al. [56] and Ziegenhirt etal. [60] our approach is not limited to the ADM1. Asimple estimate of a substrate mixtures quality is de-rived from the biogas potential of each ingredient.

2.3. Surrogate Modeling in OptimizationEspecially when the evaluation of target functions is

expensive, it is a well established approach to exploitsurrogate models of the target function to save expen-sive function evaluations.

A methodical framework for surrogate model basedoptimization of noisy and deterministic problems is Se-quential Parameter Optimization (SPO) introduced byBartz-Beielstein et al. [5]. SPO has been developed forsolving expensive algorithm tuning problems but canbe directly employed for solving real world engineeringproblems as well.

One of the most often used surrogate-models is Krig-ing, which is an especially promising model for continu-ous, smooth problem landscapes. Besides its predictionperformance, it is often employed because it providesan estimator of the local certainty of the model, whichcan be used to calculate the Expected Improvement (EI)of a new sample over the best known sample. Joneset al. [32] introduced this concept to balance exploita-tion and exploration in expensive optimization, termingit E�cient Global Optimization (EGO).

Other models include Artificial Neural Networks(ANN) or Support Vector Regression (SVR) [14]. Non-continuous problem landscapes, or problems which arenot that expensive, may be tackled with approaches like

1www.ifak-system.com

2

Random Forest (RF) [11] or Multivariate Adaptive Re-gression Splines (MARS) [20].

A comprehensive overview of surrogate model as-sisted optimization was provided by Jin [30], focusingon single objective problems.

Extensions of the above concepts to multi-objectiveproblems are available (e.g., multi objective EGO [35,48, 16] and SPO [58, 59]). Since multi-objective prob-lems are not in the focus of this paper, we refer to theoverview by Knowles and Nakayama [36] for furtherinformation.

2.4. Multi-fidelity

Multi-fidelity optimization [19] deals with problemswhere the target function can be evaluated at di↵erentlevels of fidelity. That is, the actual target function rep-resents the highest level of fidelity, yielding the mostaccurate but also most expensive fitness estimate. Atthe same time, one or several cheaper, less accurate esti-mates can represent the lower fidelity levels. The actual,expensive target function will be referred to as the finefunction, whereas the cheaper and less accurate func-tion will be referred to as coarse function, respectively.Note, that in this study, multi-fidelity will usually re-fer to the case where in fact at least three levels of fi-delity exist: fine function, coarse function and surrogatemodel. Only the first two are inherent to the problem,the third is learned based on collected data.

Such situations often arise, especially in engineer-ing problems. There, the evaluation of the actual prob-lem may be an expensive real-world evaluation mea-surement, or a time consuming Computational FluidDynamics (CFD) simulation. In these cases, a simpli-fied physics-based model may yield an inexpensive butless accurate quality estimate. For some models, fi-delity may even be scalable. For instance, simplifiedmeshes with less density can be employed with CFD,or if available pre-converged simulation results may beharnessed.

2.4.1. Multi-fidelity ModelingTo exploit information from di↵erent fidelity levels

in surrogate modeling, several methods exist, includingCo-Kriging. Forrester et al. [19] show how this can beapplied to engineering problems. Co-Kriging exploitscorrelation between coarse and fine function to generatea better surrogate model of the fine function.

More simple, yet often used, is the idea of using somekind of scaling to correct the lower fidelity model. Thiscan be understood as introducing a scaling-model whichcorrects the error of the lower fidelity model. The most

typical approaches are to use multiplicative or additivescaling (or a combination of both). Haftka introduced amultiplicative scaling to correct a global approximationwith that of a local one, by using a scaling factor basedon the ratio of both approximations and a first orderTaylor series expansion [25]. A similar approach thatuses additive scaling was first proposed by Lewis andNash [39]. The original first-order corrections of thesetypes have also been extended to second order [24, 15].

Another approach is Space Mapping (SM) [4]. Inessence, SM tries to find a transformation, that mapsfrom the parameter space of a fine function into thespace of a coarse function or vice versa. This trans-formation can be used to map a solution found with acoarse function into the fine function space. This alsohas the advantage that coarse and fine function spaceneed not have the same dimensionality.

While many applications are mostly concerned withdeterministic simulations (e.g., using CFD) multi-fidelity techniques have also been applied to stochasticproblems. Kuya et al. [38] combine data from simula-tions (deterministic, low-fidelity) and experiments (non-deterministic, high-fidelity). They studied the influenceof experimental design strategies (mainly on the low-fidelity level) on the accuracy of a Co-Kriging model.To deal with systematic error, they also propose to useblocking and randomization to avoid respective biasfrom the experimental procedure.

2.4.2. Multi-fidelity OptimizationOnce a certain multi-fidelity modeling technique is

chosen, it has to be decided how to implement the re-spective models into an optimization framework. Oneapproach is to use model management techniques toswitch between di↵erent levels of fidelity (which mayor may not include surrogate models) [42]. One va-riety of these techniques are trust-region strategies.Here, a lower fidelity model is used to approximatethe higher fidelity model during the one-dimensionalsearch along the direction of the gradient, inside a re-gion of trust (with respect to accuracy of the lowerfidelity model) [1, 2]. A rather new approach is toswitch between di↵erent fidelity methods, as proposedby Mehmani et al. [42]. They initialize with the low-est fidelity model and subsequently make decisions forswitching to higher fidelity models. The decision isbased on a non-deterministic metric that takes into ac-count the uncertainty of a model as well as the ob-served improvement in fitness. When the uncertainty isstronger than the improvement, a change of the fidelitylevel is made. Another model selection technique fora stochastic case is introduced by Mullins and Mahade-

3

van [44]. They take into account the predictive accuracyof a model as well as the required computational e↵ort(which may have to be approximated) to make a selec-tion.

Essentially, even the earlier introduced surrogatemodeling techniques (Section 2.3) can be understoodas a case of multi-fidelity optimization. They representa special case where the lower fidelity model is usu-ally a data-driven model. In that sense, some methods(e.g., the trust region strategies) are also classified assurrogate modeling techniques. Surrogate-model basedsearches, where the design (as well as the model) is se-quentially updated are also sometimes called sequen-tial sampling (e.g., the earlier introduced EGO algo-rithm) [29].

2.4.3. Areas of ApplicationMulti-fidelity optimization has been applied in sev-

eral areas. In the following we give a brief overviewof some of these areas. One important and very fre-quent area of application for multi-fidelity optimiza-tion is aerodynamic optimization. For example, severalof the earlier cited works deal with applications fromthis field (cf. [2, 24, 19, 38, 42]). Another applica-tion of multi-fidelity optimization is laser peening, asreported by Singh and Grandhi [53]. They make useof application-specific models and surrogate models ofvarying fidelity which they employ in di↵erent phasesof a particle swarm optimization algorithm. Koziel etal. [37] used Co-Kriging to design electromagnetic an-tennas [37]. Electro-magnetic design problems are alsoa very typical application of space mapping, which orig-inated from this area [4].

3. Problem Description

In this paper we deal with a problem where two fi-delity levels are available. The optimization objectiveas well as its two fidelity levels are described in thissection.

3.1. The Optimization Problem

Consider a biogas plant fed with k 2 N substrates.Its m 2 N dimensional system state is symbolized byz : R+0 ! Z and its substrate feed by x 2 X, withZ ✓ Rm and X ✓ Rk denoting the state and input space,respectively.

The objective is to minimize a one-dimensional ob-jective function f : Z ⇥ X ! R, which depends onthe state z and the substrate feed x of the biogas plant,

approximately modeled by a set of nonlinear di↵eren-tial equations z

(t) = g

(z

(t), x), called the biogas plantmodel g : Z ⇥ X ! Rm. The optimization problem issolved by choosing the optimal substrate feed x. Withthe initial state z0 2 Z it can be formulated as:

maxx2X

f (z

(t), x)

s.t. z

(t) = g

(z

(t), x) , z

(0) = z0,

x � xlo, x xup.

(1)

Here, xlo, xup are the lower and upper bounds for thesubstrate feed x.

3.2. The Objective Function

The objective function f represents the daily financialgain of the plant, defined as follows:

f :=revelect. (z

(t), x)+revtherm. (

z

(t), x)�costenerg. (

z

(t), x)�costsubs. (

x

)

(2)

The objective values are given in Euros per day. The de-cision space is spanned by the amount of each substratein the mixture which is fed into the plant. In detail. thefour terms are:

• revenue from selling electrical energy produced incombined heat and power plants (revelect.)

• revenue from selling thermal energy produced incombined heat and power plants (revtherm.)

• cost of energy used in plant operation, e.g., stirringthe digester content, substrate transportation, heat-ing the digesters (costenerg.)

• cost of substrates (costsubs.).

The revenue is defined by the profit obtained selling theproduced electrical and thermal energy, which, in Ger-many, is determined by the Renewable Energy SourcesAct - EEG. These values are obtained at steady state op-eration of the biogas plant. The di↵erent terms are moreor less dependent on the specific substrate mixture, andof course dependent on plant-specific parameters whichcan be assumed to be constant, e.g., size of fermentersor outside temperature.

4

3.3. The Fine Function

The fine objective function (denoted f f ) is based onthe ADM1 simulation of a biogas plant. The mod-eled biogas plant contains two digesters and producesan electrical power of 500 kW. As mentioned above,the used implementation is the MATLAB R� toolbox de-veloped by Gaida et al. [23].

This toolbox is able to yield information about all rel-evant process variables, as well as calculates the mon-etary gain for a given setting. Depending on the exactsetup, the model will take at least 30 seconds, with anaverage of about 1 minute to compute the daily mone-tary gain for a certain substrate mixture in equilibriumstate. Note, that simulation time may also depend onthe quality of the solution, and the number of substratesused, see Fig. B.13 in the appendix. Simulations mayfail, or are stopped if they do not yield a result after 10minutes. These cases have to be dealt with during opti-mization, as discussed in Section 4.3.

The volumetric flowrate of each available substratein the in-feed mixture is varied during the optimiza-tion process. That means, the dimension of the deci-sion space depends on the number of available substratetypes. The term ”available” can refer to physical avail-ability of a substrate at the plant, or the availability ofcalibration data for that substrate. Only substrates withknown parameters can be represented by the ADM1.

In this study, two cases will be tested.

• Two-dimensional case: It is assumed that only thesubstrates maize and pig manure are available. Theresulting optimization problem is that of findingthe best mixture of both. The low dimensionalityallows for visual analysis thus providing an intu-itive understanding of the problem.

• Five-dimensional case: Here, three additionalsubstrates are available, namely cow manure, grassand corn-cob-mix. This is a realistic scenario formany agricultural biogas plants.

The exact limits of the optimized parameters are sum-marized in Table 1.

3.4. The Coarse Function

The coarse, more simple objective function is mostlybased on the biomethane potential of each substrate,hence called biomethane potential (BMP) model. TheBMP can be calculated for each substrate using theBuswell equation [12].

Table 1: Lower and upper boundaries for the optimized parameters,that is the amount of substrates in the mixture.

Substrates xlo [ m3

d ] xup [ m3

d ]

Maize 5.00 40.00Pig Manure 5.00 60.00Grass Silage 0.00 20.00Corn-Cob-Mix 0.00 10.00Cow Manure 0.00 10.00

Thus, the BMP model estimates that the amount ofproduced gas rises linearly with the amount of each sub-strate feed into the plant. The estimate of produced en-ergy is limited by the maximum load of the block heatand power plant. One evaluation takes two hundred mi-croseconds or less. Hence, ten thousands of coarse func-tion evaluations could be made during one evaluation ofthe fine objective function.

The BMP model is able to yield basic informationlike amount of methane gas produced or daily monetarygain. However, it can not yield the complete set of pro-cess variables that are available with the ADM1 and isless accurate.

3.5. Advantages of the Coarse FunctionIn the case of this application, the optimization pro-

cess can profit in two di↵erent ways from the data avail-able in form of the coarse function. First, the low fi-delity models optimum can be used to enhance the ini-tial experimental design created by SPOT, or used asa starting guess for non-set-based approaches. Sec-ond, the surrogate model of the global landscape canbe enhanced by the low-fidelity model, e.g., using Co-Kriging or similar methods.

3.6. Model Implementation, Complexity and Time-Constraints

The Co-Kriging approach taken here is independentof the actual implementation of the fine and crude ob-jective function. Thus, an even more detailed simulationmodel could be used for the fine objective function. Forthe sake of this study, we investigate an instance of theADM1 with medium complexity, which is (relatively)fast to evaluate. This allows us to study a wider range ofmethods and perform a more in-depth analysis. This canbe considered as a benchmark for more complex cases.Various di↵erent implementations of the ADM1 do existand are being developed. A recent implementation [17]contains many more variables, compared to the standardADM1 that we use. Most models assume that the di-gester is perfectly mixed, but there are also approaches

5

that separate the digester into di↵erent zones, each ofthem modeled by one ADM1 instance [46]. But, theADM1 can also be implemented as a CFD model [21].Thus, depending on the model complexity one simula-tion can take several seconds but also several minutes oreven hours.

Furthermore, the benchmark in this study does onlytake the optimization of the equilibrium state into ac-count. In practice, substrate parameters as well asboundary constraints are not constant. Thus, the optimalsubstrate feed is only optimal for a short time (assumingthat parameters and constraints can be approximated aspiecewise constant). Therefore, the optimization prob-lem has to be solved repeatedly, e.g., once per hour oronce per day. Each time, the optimization problem issolved taking into account the current state of the biogasplant as well as current parameters and constraints. Thefaster this heavily time-constraint optimization problemcan be solved, the smaller is the lag between the presentmoment and the moment the optimization problem wasstarted. Such a real-time optimization scheme was de-veloped in [22] and it is planned to integrate the meth-ods proposed in this paper into the mentioned work.

4. Methods

4.1. Modeling

4.1.1. Co-KrigingKriging is a method for interpolation and regression

based on Gaussian process modeling. The following no-tation is adopted from Forrester et al. [18]. Given a setof n solutions X = {x(i)}i=1...n in a k-dimensional con-tinuous search space with observations y = {y(i)}i=1...n,Kriging is a method to find an expression for a pre-dicted value at an unknown point by interpreting theobserved responses y as if they are realizations of astochastic process. The following set of random vec-tors Y = {Y(x(i))}i=1...n is used to define this stochasticprocess. The correlation of the random variables Y(·) ismodeled as follows [18]:

corhY(x(i)),Y(x(l))

i= exp

0BBBBBB@�

kX

j=1

✓ j|x(i)j � x(l)

j |p j

1CCCCCCA . (3)

The matrix that collects correlations of all pairs {(i, l)} iscalled the correlation matrix . It is used in the Krigingpredictor

y(x) = µ + T �1(y � 1µ), (4)

where y(x) is the predicted function value of a new sam-ple x, µ is the maximum likelihood estimate of the mean

and is the vector of correlations between training sam-ples X and the new sample x. The width parameter✓ =

⇣✓1, . . . , ✓ j, . . . , ✓k

⌘Tdetermines how far the influ-

ence of each sample point x spreads. The parameter p jis usually fixed at p j = 2, and defines the shape of thecorrelation function.

As an extension of Kriging, Co-Kriging may in-clude information of a coarse function into the model.To that end, Co-Kriging exploits correlation betweenthe di↵erent fidelity levels. According to Forrester etal. [19], Co-Kriging can be understood to regress thecoarse function while coinciding with the fine func-tion. We now have two vectors with n f samples fromthe fine function and nc samples from the coarse func-tion, i.e., X f = {x( j)

f } j=1...n f and Xc = {x(i)c }i=1...nc in a k-

dimensional continuous search space with observationsyc f = {y(i)

c , y( j)f }i=1...nc, j=1...n f . Accordingly, the stochastic

process can now be defined by the set of random vectorsYc f = {Yc(x(i)

c ),Yf (x( j)f )}i=1...nc, j=1...n f . Then, we obtain

the covariance matrix C = �2

c c(Xc,Xc) ⇢�2c c(Xc,X f )

⇢�2c c(X f ,Xc) ⇢2�2

c c(X f ,X f ) + �2d d(X f ,X f )

!,

(5)where we have the same correlation function, but withtwo sets of model parameters for d and c respec-tively. An additional parameter ⇢ is introduced as aconstant scaling factor. While c does represent thecorrelation structure in the coarse function, d capturesthe di↵erence between the Gaussian process represent-ing the cheap function (scaled by ⇢) and the unknownGaussian process representing the fine function. TheCo-Kriging predictor for the fine function is

y f (x) = µ + cT C�1(y � 1µ), (6)

where c is the vector of covariances between the knownsolutions (fine and coarse) Xc f =

⇣X fXc

⌘and the solution

to be predicted x and 1 denotes a vector of ones. Werefer to Forrester et al. [19] for further information. Themodel we use in this work is a re-implementation in Rbased on MATLAB R� code of Forrester et al. [18].

Two experimental designs are evaluated for Co-Kriging, one is a large experimental design which cov-ers the design space of the coarse function. Hence, allpoints in this design are evaluated with the coarse func-tion. The second design is the smaller set of pointsevaluated on the fine function. It is nested in thelarger design, that means, each point evaluated on thefine function is also evaluated on the coarse function.Both designs should respect some criterion of space-fillingness. In this work, we create Latin Hypercube De-signs (LHDs) [40] that maximize the minimum distance

6

Start Create larger initial design (for coarse function)

Evaluate designs, with coarse and fine functions respectively

Build Kriging model Mc of the coarse function using

the generated data

Create smaller initial design (for fine function)

Build Co-Kriging model Mf of the fine function using

the generated data and Mc

Estimate optimum of Mf

Evaluate optimum with fine function and coarse function

Fine function evaluation budget

depleted?

No

Report results EndYes

Figure 1: A flow chart to visualize the algorithmic procedure of opti-mization with a Co-Kriging surrogate model.

between the samples. That means, the search space isdivided into subsections with a regular grid. Samplepoints are placed inside grid cells, such that only onesample is in each row and column of the grid. Inside agrid segment, sample points are placed at random, us-ing a uniform distribution. This procedure is repeatedh times, hence generating h LHDs. The minimum pair-wise distance of each LHD is calculated, and the LHDwith the largest distance is chosen. This helps to avoidthe o↵ chance of placing sample points too closely, incase they are in neighboring grid cells.

A flow chart of model based optimization with Co-Kriging is depicted in Fig. 1.

4.1.2. Alternative multi-fidelity ModelsSeveral simplified alternatives to Co-Kriging can be

used to integrate information from both coarse and finefunction into the modeling process. The followingmethods are all compared to Co-Kriging in a prelimi-nary investigation of model quality.

Di↵. Model A very intuitive idea is to assume that thecoarse function is able to model the general struc-ture correctly. The remaining error can then sim-ply be corrected by modeling the di↵erence be-tween coarse and fine function ( fc, f f ). The resid-uals of the coarse function are used as training datafor a data-driven model. That is, the surrogatemodel cM is built with design X and observations

f f (X) = y f , fc(X) = yc and

cMdi f f = fM⇣X, y f � yc

⌘. (7)

Here, M(.) is function that builds the respectivesurrogate model. A new prediction is then alwaysbased on the result of the model, as well as the re-sult of the coarse function:

y f = f f (x) = fdi f f (x) + fc(x), (8)

where y f is the predicted fine function value andfdi f f represents the prediction of the surrogatemodel cMdi f f . This model will be referred to asthe Di↵erence Model (Di↵. Model). It is essen-tially the same as the additive scaling approachmentioned in Sec. 2.4.1.

Ratio Model The Ratio Model works in a very similarway. Instead of di↵erences, i.e., residuals, the ratiobetween fine and coarse function is modeled.

cMratio = fM X,

y f

yc

!(9)

The prediction of the Ratio Model cMratio is givenas

y f = f f (x) = fratio(x) fc(x). (10)

It is essentially the same as the multiplicative scal-ing approach mentioned in Sec. 2.4.1.

Input Model The input model takes a slightly di↵erentapproach. The response of the coarse target func-tion is used as an additional input parameter of themodel, e.g., Kriging.

cMinput = fM({X, yc}, y f ), (11)

with prediction:

y f = f f (x) = finput({x, fc(x)}). (12)

These three simple approaches can all be applied toarbitrary models, e.g., Neural Networks or Support Vec-tor Machines. They require the coarse function duringprediction. Of course, if the coarse function itself issomewhat costly, although cheaper than the fine func-tion, it can again be replaced by a separate surrogatemodel.

7

4.1.3. Two-layer ModelingOne problem in surrogate modeling of biogas plants

is that the modeled landscape is not continuous, as illus-trated in Fig. 2. In this example, the actual gain functionhas a saltus at x = 30%. In fact, the optimum is oftenin the vicinity of a saltus in decision space, which canalso be seen in Fig. 3. This behavior is caused by theso called manure bonus, which is a fixed bonus paid tobiogas producers. This bonus is paid, if more than 30%of the substrate contains specific manures [9].

Models like Kriging are best suited for continuouslandscapes. To some extent, they are able to deal withdiscontinuities, at least globally. Still, a Kriging modelwill always deteriorate in regions close to the disconti-nuities. This is especially problematic due to the factthat the optimum may often be close to the 30 percentbonus limit. The model quality would therefore be de-teriorated in the vicinity of the optimum. To avoid thisproblem, two approaches are eligible for this applica-tion.

1. While the modeled landscape is that of the actualmonetary gain, the simulation does provide addi-tional information. This could be exploited bymodeling the exact amount of produced gas (e.g.,with Co-Kriging), and calculating the monetarygain on-the-fly during prediction. The amount ofproduced gas would be continuous over the wholedesign space, thus yielding a reasonable surrogatemodel. The drawback is, that at least two modelswould need to be trained: the first, which modelsthe amount of produced methane gas, and the sec-ond, which models further results from the ADM1simulation which a↵ect the gain of the plant. Also,the on-the-fly calculation of the monetary gainwould be added on top of the e↵ort of the predic-tion during the surrogate-optimization process.

2. The alternative is, to create two Kriging modelsfor the monetary gain, one approximating mone-tary gain without manure bonus and one with themanure bonus. During surrogate-optimization theoptimizer will switch between the two models, de-pending on whether the 30 percent bonus limit isreached. The two layers are illustrated in Fig. 2.

The former approach would be more time consuming,since it needs to calculate the monetary gain for eachpredicted sample. Also, each model would have to pre-dict each sample, because both values are needed for themonetary gain calculation. The latter approach wouldonly require to take the 30 percent bonus limit into ac-count, to switch between models, without any further

Manure [%]

Gain [€/d]

actual gaintheor. gain with manure bonustheor. gain without manure bonus

20 30 40

800

700

600

Figure 2: Illustration of the two di↵erent modeling layers in the Two-layer surrogate model. Here, only the percentage of manure for a fixedamount of other substrates is assumed to vary. The discontinuity in thecurve arises at exactly 30 percent manure.

calculations. The drawback would be the loss of infor-mation, since the former approach is able to give an es-timate of the produced amount of gas to the interesteduser. In this study, it was decided to take the less infor-mative but more e�cient approach two. We refer to thisas the Two-layer approach, due to the two di↵erent mod-els of the monetary gain. The di↵erence in model qual-ity will be investigated in a preliminary study in Sec. 5.

Essentially, this approach is comparable to the com-bined meta-modeling of discrete (logic) and continuousfunctions proposed by Meckesheimer et al. [41]. Ourapproach would be best represented by the case wherethe meta-modeling only encompasses the continuous re-sponses, whereas the logic/decision function does notneed to be approximated due to its low evaluation cost.

Please note, that the earlier described bump in the de-cision space is not a constant o↵set. The manure bonuswhich causes this discontinuity a↵ects the revenue fromsold gas, thus having a multiplicative influence on a sin-gle part of the objective value calculation. The impactof gas pricing on the overall gain varies significantly inthe given search space.

4.2. Error Measure

Two error measures will be used in the model qual-ity experiments. The Mean Squared Error (MSE) ofthe vector of n observations y = (y1, . . . , yi, . . . , yn)T

and the vector of corresponding predictions y =(y1, . . . , yi, . . . , yn)T

MSE(y, y) =1n

nX

i=1

(yi � yi)2. (13)

8

The second error measure is the Scaled MSE (SMSE)as introduced by Keijzer [34]. Keijzer [34] defines theSMSE as follows:

SMSE(y, y) = MSE(y, 1a + by) =

(13)=

1n

nX

i=1

(yi � (a + byi))2

where b =cov(y, y)

var(y)and a = ¯y � by

(14)

Here, ¯y and y indicate the respective mean values. TheSMSE can be understood to evaluate di↵erences be-tween two vectors after scaling them to a commonrange. That allows to ignore errors that are irrelevantto the optimization procedure. A simple example wouldbe a prediction y that di↵ers from the observations yonly by a constant o↵set. While this prediction wouldreceive a comparatively large MSE value, the locationof the optimum would be perfectly accurate. SMSE isconsidered as the adequate error measure in this case.Section 5 will further motivate this choice for the biogasapplication, showing preliminary results where SMSEis clearly the more reasonable indicator.

4.3. Handling Evaluation FailuresSimulations of the expensive, fine target function may

fail and lead to unreasonable or missing results. Thesefailures are consistent, i.e., a repeated evaluation of afailed configuration will fail again. Such points can notalways be ignored. Ignoring them could lead to a sit-uation were the optimization process would repeatedlysuggest an instable configuration. This would possiblyprevent the optimization progress. Instead, only failedsimulations in the initial design are removed.

Failed simulations in the optimization process itselfare replaced by the imputation method suggested byForrester et al. [18]. That means, instead of the miss-ing simulation result y(x) + s(x) is imputed. Here, y(x)is the prediction of the (Co-)Kriging model and s(x) isthe uncertainty estimate, which is used as a penalty. Ina sense, we impute some upper confidence bound, asestimated by the Kriging model.

Naturally, such a method is only possible where avariance estimate of the model is available. All non-model-supported approaches use a fixed penalty value.That means, they receive a constant value that representsbad quality whenever the simulation fails.

4.4. Optimization AlgorithmsThree di↵erent optimization approaches are em-

ployed in this study, either to optimize the fine/coarse

function directly, or to optimize the corresponding sur-rogate models.

• Downhill Simplex Method (Simplex) is a clas-sic, derivative free, local optimization methoddeveloped by Nelder and Mead [45]. For theexperiments in this paper a bound constrainedSimplex [10] implementation from the NLopt li-brary [31] is used, interfaced by the R-packagenloptr [57].

• Di↵erential Evolution (DE) [49, 50] is a state-of-the-art derivative free optimization method basedon the principles of evolution. Due to beingpopulation-based and due to its stochastic nature ithas the capability (although no guarantee) to leavelocal optima. The R-implementation in the pack-age DEoptim [3, 43] was used in the experiments.

• Latin Hypercube Sampling (LHS) is a very sim-ple optimization strategy, where a space fillingLatin Hypercube Design (LHD) of experiment isevaluated in the decision space of the optimizationproblem. The best found solution in this design isthe estimated optimum. As with all LHD instancesin this study, the design creation uses h = 100 re-peats for the selection of a design with largest min-imum pairwise distance.

5. Preliminary Study: Model Quality

A first study was performed to analyze how well thedi↵erent surrogate-models represent the problem land-scape, i.e., testing for modeling error. To get a simpleand understandable example, we restrict this Pre-Studyto the two dimensional case. Only the substrates pigmanure and maize are assumed to be available.

Three questions are of interest:

1. What error measure should be used?

2. Does the Two-layer modeling approach improvemodel quality?

3. Which (multi-fidelity-) modeling method, e.g..,Kriging, Co-Kriging, di↵ or ratio model, worksbest?

Please note, that approximately five percent of ran-domly chosen substrate-mixtures may yield simulationfailures due to numerical instability. In this preliminarystudy, such points are ignored, i.e., manually removed.A more complicated imputation as described in Sec. 4.3is only employed during the later optimization experi-ments in Sec. 6.

9

5.1. Experimental SetupWe perform four sets of experiments, where each

represents a di↵erent size of the experimental design.In each set, two design types are created and evalu-ated. Type one is a smaller LHD (size 5, 10, 15 and20 points) evaluated with the fine function f f . Type twois a very large LHD (always 100 points) evaluated withthe coarse function fc. The derived information is usedto build surrogate models of the fine function, i.e., f f .In both cases, h = 100 LHDs are created, and the onewith the largest minimum pairwise distance chosen.

To evaluate performance, we will look at the earlierintroduced error measures (MSE, SMSE). The consid-ered surrogate models are standard Kriging, Co-Krigingand a Neural Network approach (Quantile RegressionNeural Network QRNN [13]). QRNN and Kriging arealso tested with the earlier introduced simple multi-fidelity approaches, that is, Input, Ratio and Di↵erencemodeling (see Section 4.1.2). QRNN is introduced as analternative approach to determine whether certain obser-vations are actually linked to the Multi-fidelity model-ing approach, or rather to the employed model type.

It has to be noted that Co-Kriging is the most time-consuming method. Since the model building takes lessthan a second in any case, this is not significant in com-parison to the cost of evaluation f f . However, in thelater optimization experiments runtime deserves moreattention, since the time-consumption will sum up overall sequential optimization steps. Higher search spacedimension will also play an important role.

For each combination of error measure, design sizeand chosen surrogate model, 20 repeats are performed.The modeling error is estimated based on data from alarger Latin Hypercube Design, consisting of 400 de-sign points. This data set is not available during modeltraining. This validation is only performed for the pur-pose of this research study and is not proposed to be per-formed under real-world time-restrictions. This split-sample approach with a large test-set allows to easilydetect di↵erences that are only present in small regionsof the search space (e.g., the discontinuity in the searchspace or the vicinity of the optimum). This is the mainreason why other popular model-independent error eval-uation methods like cross-validation or predictive errormeasures [51, 55, 42] are not used here. These may beused for model selection where a large test set is notavailable, as they may not require additional functionevaluations.

5.2. Results and DiscussionAs a first result, Fig. 3 shows filled contour plots rep-

resenting the ADM1 ( f f ) and BMP ( fc) target functions,

−1000

−500

0

500

1000

5 10 15 20 25 30 35 40

10

20

30

40

50

60BMP

maize [ m3 d ]

pig−manure [ m

3d ]

0

500

1000

1500

5 10 15 20 25 30 35 40

10

20

30

40

50

60ADM1

maize [ m3 d ]

pig−manure [ m

3d ]

Figure 3: Contour plots of ADM1 and BMP model, based on 400samples from a LHD. The two contour plots are both interpolated withTwo-layer Kriging, using data from 400 samples generated with LHS.The depicted values are monetary daily gain in Euro, thus larger val-ues are better.

respectively.Here, both are interpolated with Two-layer Kriging,

using data from 400 samples created with LHS. In thegiven region of interest, both show similar behavior.While the BMP has only a slightly di↵erent shape, astrong o↵set can be observed. Still, the optima of bothfunctions are not to far from each other. They are alsoclose to the discontinuity, hence the need for the Two-layer approach.

To show the influence of the five di↵erent substratesin the 5D optimization case, a sensitivity analysis is per-formed. One simple indicator of the sensitivity to a cer-tain variable is the parameter ✓ j ( with j = 1, ..., k, seeEq. (3)), which specifies the width of the correlationfunction. This parameter can be interpreted to deter-

10

− 1.0 − 0.5 0.0 0.5 1.0

500

1000

1500

2000

Normalized Region of Interest

Obj

ectiv

e Va

lue

●

● MaizeP ig ManureGrass SilageCorn− Cob MixCow Manure

Figure 4: Sensitivity of the objective function value (gain in euro) tochanges in each parameter (substrate) over the normalized parameterrange. Values at �1 are the lower bound and 1 the upper bound of therespective parameter. When one parameter is varied, all other param-eters are fixed to the best solution observed. The optimal substratevalues are represented by the respective symbols close to the x-axis.

mine how active the modeled landscape is in the respec-tive dimension. Table 2 shows estimated ✓ j values foreach substrate.

Table 2: Estimated ✓ j values for a model based on 400 evaluations ofthe fine function (ADM1). Larger values indicate stronger activity ofthe respective variable.

Maize Pig Ma-nure

GrassSilage

Corn-CobMix

CowManure

3.85 1.72 1.58 2.29 0.02

It can be observed that Maize has the strongest impacton the objective value, whereas Cow Manure seems tohave the least impact. As a further indication of the pa-rameters sensitivity, Fig. 4 presents the sensitivity alongeach dimension (biogas substrate). Here, all but one pa-rameters are fixed at the value of the best known solu-tion. The single remaining parameter is varied, and therespective function values are depicted in the plot. Theplot verifies the results from the Table 2. For example,cow manure has little influence, as the objective valueline is nearly horizontal in the plot. Maize and corn-cob mix on the other hand may account for very largechanges, if they are varied.

The di↵erent scale of objective values visible in the

MSE: KrigingMSE: Coarse Funct ion

SMSE: KrigingSMSE: Coarse Funct ion

0.1 0.2 0.3 0.4

Figure 5: Depicted are SMSE and MSE of a Kriging model of thefine function as well as the BMP model (coarse function) evaluatedby two di↵erent error measures. The Kriging model was built basedon a LHD of size 5. The process of creating the LHD design wasrepeated 20 times. Smaller values are better.

Naive KrigingTwo− layer Kriging

Coarse Funct ion

0.02 0.04 0.06 0.08 0.10

D esign size: 5

SMSE

● ●●

●


Coarse Funct ion

0.00 0.02 0.04 0.06 0.08 0.10

D esign size: 10

SMSE

●● ●

●


Coarse Funct ion

0.00 0.01 0.02 0.03 0.04 0.05

D esign size: 15

SMSE

● ● ●Naive KrigingTwo− layer Kriging

Coarse Funct ion

0.005 0.010 0.015 0.020

D esign size: 20

Figure 6: Again, all plots are based on LHD designs, repeated 20times. Design sizes are in each header. The two-layer approach mod-els values with and without manure bonus separately. Smaller valuesare better.

contour plots indicates that SMSE should be preferredover MSE. An unscaled error measure would not be afair comparison, as the optimum of the coarse functionis very close to the one of the fine one. Figure 5 showshow this choice a↵ects the estimation of quality, com-paring MSE and SMSE of the coarse function to a Krig-ing model of the fine function.

SMSE is better suited to evaluate the usefulness of themodel for optimization purposes. As shown in Fig. 3,bad MSE values are caused by the saltus, although thelocation of the optimum is very well approximated.

Figure 6 shows how SMSE results vary depending onwhether or not the Two-layer approach is used.

As expected, the model profits from using the Two-layer approach, as it avoids the discontinuity introducedby the manure bonus. The exception is the smallest de-sign size of just five points. Here, no advantage is visi-ble. Two reasons can be given. Firstly, the very sparsedesign of five points will yield such a poor model thatthe discontinuity becomes irrelevant. Secondly, it be-comes unlikely that any of the five points is in the small

11

●

Coarse Fun.Kriging

Co−KrigingDiff. Kriging

Input Kriging

0.00 0.02 0.04 0.06 0.08 0.10

D esign size: 5

SMSE

●

●●

●● ●●

● ●

Coarse Fun.Kriging


Input Kriging

0.00 0.01 0.02 0.03 0.04

D esign size: 10

SMSE

●

●● ●

●

●●

Coarse Fun.Kriging


Input Kriging

0.000 0.005 0.010 0.015 0.020 0.025

D esign size: 15

SMSE

●●● ●

● ●●

●●

Coarse Fun.Kriging


Input Kriging

0.00 0.01 0.02 0.03 0.04 0.05

D esign size: 20

Figure 7: These plots show how di↵erent multi-fidelity methods,based on Kriging, perform in comparison to the coarse function andthe single-fidelity Kriging model. Di↵. Kriging models the di↵er-ences between coarse and fine function. Input Kriging uses the coarsefunction values as an additional input variable. Smaller values for theSMSE are better.

area of the region of interest, where manure bonus ispaid.

Figure 7 visualizes SMSE results of the di↵erentmulti-fidelity, i.e., Two-layer, modeling approaches.

In all cases, Co-Kriging outperforms the standardKriging model. Results for the Ratio model are notshown. Their results were worst, and including themin the figure would have made them hard to read. TheDi↵erence- and Input-based models seem to work forthe smallest design size, but are later outperformed evenby standard Kriging. This is despite the fact that theDi↵erence-based model is the best for the smallest de-sign size.

One peculiar observation in this context is, that theDi↵erence-based model does not seem to profit fromlarger design sizes, whereas all other methods do. Thiscan be seen more clearly in Fig. 8. The reason for thisbehavior is currently unclear and is subject of furtherresearch.

As a final comparison in this preliminary study, Fig. 9shows how choosing a di↵erent model type would a↵ectresults.

For the smallest design size, Input QRNN outper-forms Input Kriging, whereas the standard models show

●● ●●

●

● ●●

Coarse Fun.Diff. Kriging (n= 5)

Diff. Kriging (n= 10)Diff. Kriging (n= 15)Diff. Kriging (n= 20)

0.01 0.02 0.03 0.04 0.05

Figure 8: This plot uses the same data as Fig. 7 but for the Di↵erence-based multi-fidelity approach only. Smaller values for the SMSE arebetter.

●●

●

KrigingQRNN

Diff. KrigingDiff. QRNN

Input KrigingInput QRNN

0.00 0.02 0.04 0.06 0.08 0.10

D esign size: 5

SMSE

●

● ● ●

●● ●●

●

● ●

●

KrigingQRNN



0.00 0.02 0.04 0.06 0.08 0.10

D esign size: 10

SMSE

●

● ●

●

●

●●

●●

KrigingQRNN



0.00 0.02 0.04 0.06 0.08 0.10

D esign size: 15

SMSE

●●

● ●●

● ● ●

●●

● ● ●

KrigingQRNN



0.00 0.01 0.02 0.03 0.04 0.05

D esign size: 20

Figure 9: This plot compares how choosing a di↵erent modeltype (QRNN instead of single-fidelity Kriging) would a↵ect results.Smaller values for the SMSE are better.

no striking di↵erence. For all larger design sizes, stan-dard Kriging clearly works best. The previous observa-tion that the Di↵erence-based approach does not profitfrom larger design sizes can be confirmed for the QRNNmodel, too.

6. Main Study: Optimization Performance

6.1. Experimental Setup

To run the optimization experiments, the R-packageSPOT is used. Choosing parameters for the model-based optimization process in SPOT is a hard optimiza-tion problem. Tuning a model-based optimization algo-rithm requires large computational e↵ort. In case of anexpensive optimization problem this e↵ort becomes too

12

large to warrant the potential benefit. Therefore, we re-strict the choice of parameters to one fixed set for all ex-periments. Since the model-based approaches all shareseveral of these parameters, it can at least be expectedthat the bias due to lack of tuning is not exceedinglylarge.

In the following list, k is the number of optimizedsubstrates, that is, the dimension of the optimizationproblem. The list contains the important parameterswhich a↵ect the optimization performance.

• Choice of surrogate model: The di↵erent modelsare Kriging and Co-Kriging. The implementationsare taken from the R-package SPOT and are basedon the MATLAB R� code by Forrester et al. [18].

• Surrogate optimizer: While the objective land-scape in the Pre-Study looked rather well-naturedand uni-modal, there is no guarantee that this holdsfor any other scenario. Adding further substrates,or changing the costs of certain substrates, mayeasily introduce local optima. Furthermore, thereis no guarantee that a surrogate model of the uni-modal landscape is unimodal. Due to model-ing errors, additional optima might be introduced.Therefore, it was decided to use DE to optimizethe surrogate landscape. DE is well suited to solvemulti-modal optimization problems. The used im-plementation is the DEoptim Package in R.

• Surrogate optimizer population size: Populationsize of the surrogate-optimizer (DEoptim) is cho-sen to be 10k, which is the minimal suggestedvalue according to the DEoptim package documen-tation.

• Surrogate optimizer budget: 500k evaluations ofthe surrogate model are allowed for each sequentialstep.

• Number of Function evaluations (fine): Thenumber of evaluations of the fine function (ADM1)is limited to 5k.

• Design creation: Experimental designs are cre-ated with LHS.

• Initial Design size (fine): From the 5k budget, 3kevaluations are used for the initial design.

• Design Size (coarse): 50k points are evaluated onthe coarse function only. In addition, the coarsefunction is evaluated at every point where the finefunction is evaluated, leading to an overall number

of 55k evaluations of the coarse function at the endof an optimization run.

• Infill criterion: The chosen infill criterion is themean predicted by the Kriging models. To makebest use of the very small budget, expected im-provement (EI) is not used, thus the optimizationmay focus on exploitation only. This results ina greedy strategy, i.e., exploitation dominates ex-ploration of the search space. The benefit of thischoice was validated in pre-experimental studies.

Based on the results from the preliminary study, itwas chosen to test only Co-Kriging as the best per-forming multi-fidelity method. To get an estimate ofthe performance improvement this yields, Co-Kriging iscompared against the basic Kriging approach. Both aretested once with and once without a fixed point in theinitial design. That point is the optimum of the coarsemodel, which is cheap to determine.

Furthermore, two other methods were chosen as base-lines in the comparison. Due to the strict budget limi-tations, DE was not considered here. Instead, Simplexand LHS are compared to the model-based approaches.The former is a good local optimizer, while the latter ismore globally oriented. LHS can be viewed as an ap-proach that has to be beaten by any reasonable searchstrategy, because it simply selects the best out of a setof randomly generated solutions. Simplex may be un-beatable by the competing approaches if the target func-tion is of a su�ciently simple structure. Compared tothe surrogate-supported approaches, LHS and Simplexhave little overhead. To make use of the coarse function,LHS and Simplex use its optimum as a starting point.

As Simplex and LHS do not employ models, failuresof the fine function evaluations will be compensated byusing a constant penalty value of 5,000 e. Otherwisethe imputation method described in Sec. 4.3 is used.

6.2. Analysis

Table 3 summarizes the parameters of the best so-lutions found in any experiment. In both 2D and 5Doptimization, the optimal amount of manure is cho-sen to barely reach the required level for the manurebonus. When two substrates are optimized, the amountof maize is thus the largest part of the mixture, yield-ing a daily gain of roughly 1,770 e per day. Optimiz-ing five substrates at the same time yields a better re-sult of 1,987 e. Here, maize is mostly replaced withgrass silage which is not that expensive and thereforehas a better cost-benefit ratio. Nevertheless, one of thedisadvantages of using a high amount of grass silage is

13

Table 3: Overall optimal solutions found.

Parameter 2D 5D

Gain [e/d] 1,770 1,987

Maize [m3/d] 22.85 5.22Pig Manure [m3/d] 11.85 11.48

Grass Silage [m3/d] 0 18.98Corn-Cob-Mix [m3/d] 0 0.01

Cow Manure [m3/d] 0 0.08

Manure Bonus yes yes

Ammonia Digester [mg/l] 163.4 216.6Ammonia Post-digester [mg/l] 291.0 464.2

● ●

●

● ●●

KrigingCo−Kriging

Simplex

1000 1200 1400 1600 1800

W it h ou t fixed st a r t p oin t

Gain of opt imal set t ing in euro per day

●

●●

KrigingCo−Kriging

SimplexLHS

1720 1730 1740 1750 1760 1770

O p t im u m of B M P m od el as st a r t p oin t


Figure 10: Final results of 2D optimization. Larger values are better.Results are based on 20 repeats of each optimization run. The up-per plot shows results where the initial design (or starting guess) wascreated in a random fashion. For the lower plot, the initial guess (orat least one point in the initial design) was set to the optimum of thecoarse objective function.

its high nitrogen content. In the digester the nitrogenis released and becomes ammonium and ammonia. Asammonia is toxic for the bacteria a too high concentra-tion in the digester must be avoided. While this e↵ectis respected by the ADM1, plant operators will still pre-fer mixtures resulting in lower ammonia concentrations.Hence, future studies may be conducted that considerammonia as a second objective or as a constraint. Acoarse model to calculate ammonia concentration wouldbe to use the extended Buswell equation.

The performance of the di↵erent model-based andmodel-free approaches is compared for the 2D-case inFig. 10. It can be seen that Co-Kriging works much bet-ter in comparison to the standard Kriging model. Afterthe specified budget is exhausted, both outperform Sim-plex in the case where no start point is derived from the

coarse function optimum. In this case, Simplex shows avery large variance caused by the random start point.

When the optimum of the coarse function is used toinitialize the di↵erent methods, Simplex becomes com-pletely deterministic. All methods perform better thanLHS. The median of the plain Kriging approach is ap-proximately the same as the Simplex performance. Co-Kriging is better than both.

Figures B.14 and B.15 in the appendix show the sameresults for the 2D case, but include results after each sin-gle evaluation. That way, the convergence progress canbe observed. Note, that in this case, the LHS approachevaluates the promising candidate solution determinedwith the BMP model last. Similarly, the model basedapproaches evaluate the same candidate solution alwaysafter the non-deterministic candidates in the initial de-sign. This is the reason for the respective lack of vari-ance after ten (LHS with BMP) and six (Kriging/Co-Kriging with BMP) evaluations. Kriging requires about9 and Co-Kriging 8 solutions to be competitive with theSimplex (all with BMP start point). While choosinga start point with BMP is essential for quicker conver-gence with the basic Kriging model, Co-Kriging seemsto be less reliant on that. This can intuitively be at-tributed to the fact that Co-Kriging already has otherways of exploiting the BMP model information.

At a first glance, the situation does seem to be verysimilar for the larger search space of five parameters.The results of the 5D-case in Fig. 11 seem to show thesame behavior as the 2D runs. Again, more detailedfigures B.17 and B.16 in the appendix compare objec-tive function values against number of evaluations. Forthis case, the advantage of choosing a start point withthe BMP model seems to be more even more advanta-geous. All methods with random start points are notcompetitive. In contrast to the 2D-case the Co-Krigingbased optimization now also profits more clearly fromthe start point determined by BMP. Compared to the ba-sic Kriging model, the Co-Kriging model requires twoexpensive simulations less to receive similar results tothe deterministic Simplex approach.

It is important to note, that Fig. 10 and Fig. 11 showresults after an equal number of function evaluations,i.e., after the budget is expended. This does not take run-time into consideration. In case of the 2D-optimizationthere is nearly no di↵erence in runtime between the dif-ferent model-based approaches. This observation cannot be made for the 5D case. Here, due to the muchlarger coarse design sizes, the model building becomesexpensive. This can be seen in more detail from Tab. 4,which compares the net time used by the optimizationalgorithm runs in the 2D and 5D case. In fact, build-

14

KrigingCo−Kriging

Simplex

800 1000 1200 1400 1600 1800 2000



●

KrigingCo−Kriging

SimplexLHS

1840 1860 1880 1900 1920 1940 1960 1980



Figure 11: Boxplot of final results for the 5D optimization with equalnumber of fine objective function evaluations. Larger values arebetter. Results are based on 20 repeats of each optimization run. Theupper plot shows results where the initial design (or starting guess)was created in a random fashion. For the lower plot, the initial guess(or at least one point in the initial design) was set to the optimum ofthe coarse objective function.

KrigingCo−Kriging

Simplex

800 1000 1200 1400 1600 1800 2000



●

KrigingCo−Kriging

SimplexLHS

1840 1860 1880 1900 1920 1940 1960 1980



Figure 12: Boxplot of final results for the 5D optimization with equalruntime. Larger values are better. Results are based on 20 repeats ofeach optimization run. The upper plot shows results where the initialdesign (or starting guess) were created in a random fashion. For thelower plot, the initial guess (or at least one point in the initial design)was set to the optimum of the coarse objective-function.

ing and optimizing the Co-Kriging model now takes asimilar e↵ort as single evaluation of the fine function(ADM1).

Hence, Fig. 12 shows a comparison which is basedon results after an equal runtime of roughly 25 min-utes. While this does not a↵ect the model-free optimiza-tion algorithms, Co-Kriging is not significantly di↵erentfrom the standard Kriging model any more.

The main reason for the di↵erence in time-consumption is the large computational e↵ort spend onbuilding a model with data of 250 (initial design: 50k,Section 6.1) evaluations of the coarse function, or more.

This is not done with Kriging, which still remains com-paratively cheap. Co-Kriging however su↵ers from astrong increase in time consumption. Careful tuning ofparameters (e.g., design sizes) may avoid this issue, butwould be extremely expensive and thus undesirable.

Finally, Table 4 summarizes the attained best objec-tive function values (gain) and required gross and nettimes taken by the optimization algorithms as well asthe number of failed simulations. Co-Kriging providesthe best median performance for the 2D and 5D case.In the 2D case, it is also the fastest with respect to grosstime, while it clearly has the worst net time. This contra-diction can be explained by the fact that good substratesettings will usually require less simulation time2.

In some cases, a single simulation of an inappropriatesubstrate mixture may take up to 10 minutes. Clearly,avoiding these cases seems to be worth the invested nettime, at least in case of the 2D runs. At the same time,it has to be considered that the values of the Simplexruns with deterministic start point are based on a singlerun. Hence, they may even be outliers. This issue isstressed by the fact, that the deterministic Simplex runused much more gross time than the one with randomstart point (2D case). This single, deterministic Sim-plex run had one failed simulation (aborted after 600seconds), which accounts for the di↵erence to the non-deterministic runs. These do also have such outliers,but the depicted median value is not that much a↵ectedby those. On the other hand, the deterministic Simplexrun in the 5D case did not encounter a failed simula-tion, hence yielding a gross time of barely more than 25minutes.

The net time required is larger for the model basedapproaches, as expected. As stated earlier, the net timerequired by the Co-Kriging model is larger, but di↵er-ences in case of the 2D problem are negligible. Still,the detrimental explosion of net time required by theCo-Kriging based 5D optimization runs becomes evi-dent from the data in Table 4.

Overall, the results show both the advantages as wellas the disadvantages of using multi-fidelity information.Essentially, the use of the coarse function is shown tobe useful for deriving a reasonable start-point for theoptimization. Also, integrating the coarse function bythe means of Co-Kriging does increase performance, ifmeasured on a per-evaluation basis. Despite this suc-cess, evaluating the results based on actual runtime re-

2The variance in simulation time is mostly due to the substratemixture. Repetition of a simulation usually yields very similar simu-lation times. Simulation time and function values (gain) are negativelycorrelated, although only weakly.

15

Table 4: Results of all optimization approaches. Depicted are medianvalues attained after 25 fine function evaluations, in 20 optimizationexperiments for each method. Only the F column is not a median, butthe sum of all simulation failures over all experiments. In the methodcolumn, BMP means a deterministic start point (optimum of the BMPmodel) is used, whereas ran means a random start point. k is thedimensionality of the problem. The objective value Gain is in Euro perday. GT is the gross time taken by the optimization run (duration ofcomplete run) in seconds. NT is the net time used by the optimizationalgorithm (gross time minus the time taken by all required ADM1simulations) in seconds. Bold numbers are the respective best medianfor the 2D and 5D case. Note that Simplex (BMP) is deterministic,hence all values are based on a single optimization run.

method k Gain GT NT FSimplex (ran) 2 1554 898.24 0.37 1Simplex (BMP) 2 1751 1451.61 0.41 0LHS (BMP) 2 1553 975.49 0.32 1Kriging (ran) 2 1702 909.30 10.68 2Kriging (BMP) 2 1755 895.94 10.66 0Co-Krig. (ran) 2 1757 861.50 15.20 2Co-Krig. (BMP) 2 1760 805.76 15.08 0Simplex (ran) 5 1760 2779.87 0.92 7Simplex (BMP) 5 1958 1594.88 0.98 0LHS (BMP) 5 1620 3156.73 0.88 0Kriging (ran) 5 1808 2763.89 53.16 2Kriging (BMP) 5 1963 2615.60 53.51 4Co-Krig. (ran) 5 1917 2980.53 293.60 4Co-Krig. (BMP) 5 1968 2877.09 288.97 1

veals an important issue. Co-Kriging becomes veryslow, especially for higher dimensions and number of(coarse or fine) observations. Since the presented appli-cation of biogas substrate feed optimization is not ex-tremely expensive, model and target function runtimemay have the same order of magnitude. Hence, the re-sulting performance gain due to the Co-Kriging modelis nearly negligible. In essence, such a trade-o↵ be-tween model and objective function runtimes can notbe avoided with multi-fidelity approaches. Clearly, thewhole purpose of using multi-fidelity models is to in-crease performance by use of additional information. Inturn, this will always lead to increased complexity of thederived model. Whether or not such a model is usefulwill necessarily depend on the runtime of the objectivefunction.

7. Summary and Outlook

The experiments showed that the employed ap-proaches can be successful in building cheaper yet quiteaccurate models of the concerned biogas plant opti-mization problem. Therefore, Question (Q-1) can be

a�rmed. In detail, Co-Kriging based on cheap eval-uations from a basic biomethane potential model wasshown to improve the model quality, compared to a stan-dard Kriging model based on evaluations of an accu-rate ADM1 based simulation model only. Furthermore,model quality could be improved by using a Two-layerapproach, thus avoiding discontinuities in the searchedlandscape.

Besides these successes in improving surrogatemodel quality, their application in an optimization pro-cess proved to be more di�cult. In detail, time con-sumption of Co-Kriging proved to be too large in caseof a 5-Dimensional optimization problem. On the otherhand, Co-Kriging was still successful in the case of a 2-Dimensional problem formulation. The core issue hereis the trade-o↵ between computational e↵ort of the ob-jective function (ADM1) and the surrogate-model op-timization procedure. Should the optimization be sub-ject to tighter time-constraints or a more expensive ob-jective function, Co-Kriging based optimization may beprofitable even in the higher-dimensional case. This de-pends on the kind of control problem to be solved inpractice.

Apart from that, the integration of cheaply avail-able data was shown to be very profitable. That is, alltested methods benefited from using a coarse represen-tation of the target function to generate a promising ini-tial solution. Using this initialization method, a sim-ple Downhill Simplex method was shown to be as ef-ficient as the more complex model based approaches.This comparison does however require to be investi-gated in more detail. Since Downhill Simplex, witha fixed start point, is completely deterministic, whilethe model-based approaches are not, the comparisondoes not provide any information on statistical signif-icance. Without a fixed starting point, the surrogate-model-based approaches proved to be much more e�-cient than the Simplex method. Summarizing, detailedanswers to Question (Q-2) were presented in this study.

For future research, it is therefore commendable touse test cases with larger variety. The herein presentedresearch is based on a simulation for a specific com-bination of plant parameters, substrate costs, substrateparameters and environmental parameters. For otherplants, the initialization with the cheaper BMP modelmay not work that well, or the searched landscape canbecome more complex.

As mentioned previously, the larger amount of am-monia in the sludge may make the obtained optimum forthe 5D optimization undesirable to plant operators. Theamount of ammonia can therefore be included as a con-straint or even as a secondary objective. In both cases,

16

it may again be modeled with data-driven approacheslike Kriging, to cheapen the evaluation of the constraintor secondary objective, respectively. In the same way,there are other details of additional parameters whichmay require to be included. In previous studies [56, 60]these were included in a weighted sum of objectives.Since the weights are hard to set, a multi-objective ap-proach may make more sense. Of course, the compu-tational e↵ort of a surrogate-model-based approach willincrease, since a separate model has to be built for eachobjective. At the same time, classical approaches likethe Simplex method are not applicable anymore.

Acknowledgments. This work has been kindly sup-ported by the Federal Ministry of Education andResearch (BMBF) under the grants MCIOP (FKZ17N0311) and CIMO (FKZ 17002X11).

Appendix A. List of Symbols

costenerg. cost of energy used in plant operation (e.g.,stirring of digester content, substrate transporta-tion, heating of digester)

costsubs. cost of substrates

C covariance matrix used in the Co-Kriging model

c vector of covariances of a new sample with the ex-isting samples known to the Co-Kriging model

f objective function

f f fine (accurate, expensive) simulation of the objec-tive function

fc coarse (inaccurate, cheap) simulation of the objec-tive function

h Parameter of the Latin Hypercube Sampling ap-proach. Number of times Latin Hypercube designsare randomly created, before choosing the one withthe least pair-wise distance of samples.

k dimensionality of the decision space, i.e., the num-ber of biogas substrates

n number of observations / number of data points

m dimensionality of the system state space vector

cM fit of a surrogate model

fM function used to learn/build a surrogate model

vector of correlations of a new sample with the ex-isting samples known to the Kriging model

correlation matrix used in the Kriging model

revelect. revenue from selling electrical energy pro-duced in combined heat and power plants

revtherm. revenue from selling thermal energy pro-duced in combined heat and power plants

⇢ scaling parameter used in the Co-Kriging model

t time step

✓ parameter vector of the Gaussian correlation func-tion

p parameter vector of the Gaussian correlation func-tion

x input vector of length k, i.e., the optimized sub-strate feed values

xlo lower boundary of the decision variables (sub-strate feed values)

xup upper boundary of the decision variables (sub-strate feed values)

y observed objective function value

y vector of observed objective function values oflength n

z state space vector of length nz, i.e., state of the bio-gas plant

z0 initial state of the biogas plant system

Z system state space

X input parameter space

Appendix B. Additional Figures and Tables

●●●●● ●●● ●● ●● ●● ● ●●● ●● ●● ●● ●● ●●●● ●● ● ●● ●●● ●● ●● ●● ● ●● ●●● ●●●● ●●●●●● ●● ●●●

●● ●● ●● ●●● ● ●●● ●● ●● ●●●●●● ●●●●●●●●● ● ●●● ●●● ● ●●●●● ● ●●● ●●●● ●●● ●●●● ●●●● ● ●● ● ●● ●●● ● ●●● ●●● ●●● ● ●● ●●● ● ●● ●●● ● ●●● ●● ●●● ● ●● ●●●● ●●● ●● ● ● ●●●●● ●●● ●● ●● ●● ●● ●●● ●●●● ●●●● ●● ●●●●● ● ●●●●●● ●●● ●● ●●●● ●●

2D5D

0 100 200 300 400 500 600Eva

luat

ion

Tim

e in

sec.

Figure B.13: Boxplots depicting the evaluation time of the ADM1simulation (fine function) in seconds, split by the dimensionality ofthe problem (i.e., number of substrates).

17

● ●●

●

●●

●●

●●

●

●●

●

●

●

● ●●

●

●

1

3

6

9

11

1

3

6

9

11

1

3

6

9

11

1

3

6

9

11

1

3

6

9

11

1

3

6

9

11

Simplex (ran)

LHS (B

MP

)K

riging (BM

P)

Kriging (ran)

Co−

Krig. (B

MP

)C

o−K

rig. (ran)

500 1000 1500Gain

Eva

luat

ions

Figure B.14: Boxplot of 2D optimization performance after each eval-uation performed. Larger values are better. Results are based on 20repeats of each optimization run. The red line presents the perfor-mance of the simplex run with deterministic start point.

● ●

●

●

●

● ●●

●

●

7

8

9

10

7

8

9

10

7

8

9

10

Kriging (B

MP

)C

o−K

rig. (BM

P)

Co−

Krig. (ran)

1600 1650 1700 1750Gain

Eva

luat

ions

Figure B.15: Boxplot of 2D optimization performance after each eval-uation performed, focusing on best performin methods after the initial6 evaluations. Larger values are better. Results are based on 20 re-peats of each optimization run. The red line presents the performanceof the simplex run with deterministic start point.

●

●●

● ●●

● ●●

●

●

●

●●

● ●●

●

16

18

20

22

24

16

18

20

22

24

16

18

20

22

24

Kriging (B

MP

)C

o−K

rig. (BM

P)

Co−

Krig. (ran)

1250 1500 1750 2000Gain

Eva

luat

ions

Figure B.16: Boxplot of 5D optimization performance after each eval-uation performed, focusing on best performing methods after the ini-tial 15 evaluations. Larger values are better. Results are based on 20repeats of each optimization run. The red line presents the perfor-mance of the simplex run with deterministic start point.

18

● ●

●●

●●●●

●●●●●●●●●●

●●●●

● ●●

●

●●●●●●●●

●●

●

●

●●●●

● ●●

●●●●●

●●

● ●●●

0

10

20

30

0

10

20

30

0

10

20

30

0

10

20

30

0

10

20

30

0

10

20

30

Simplex (ran)

LHS (B

MP

)K

riging (BM

P)

Kriging (ran)

Co−

Krig. (B

MP

)C

o−K

rig. (ran)

− 1000 0 1000 2000Gain

Eva

luat

ions

Figure B.17: Boxplot of 5D optimization performance after each eval-uation performed. Larger values are better. Results are based on 20repeats of each optimization run. The red line presents the perfor-mance of the simplex run with deterministic start point.

19

References

[1] Alexandrov, N. M., Dennis Jr, J. E., Lewis, R. M., Torczon, V.,1998. A trust-region framework for managing the use of approx-imation models in optimization. Structural Optimization 15 (1),16–23.

[2] Alexandrov, N. M., Lewis, R. M., Gumbert, C. R., Green, L. L.,Newman, P. A., 1999. Optimization with variable-fidelity mod-els applied to wing design. Tech. rep.

[3] Ardia, D., Mullen, K. M., Peterson, B. G., Ulrich, J., 2013. DE-optim: Di↵erential Evolution in R. Version 2.2-2.

[4] Bandler, J. W., Biernacki, R. M., Chen, S. H., Grobelny, P.,Hemmers, R. H., et al., 1994. Space mapping technique for elec-tromagnetic optimization. Microwave Theory and Techniques,IEEE Transactions on 42 (12), 2536–2544.

[5] Bartz-Beielstein, T., Parsopoulos, K. E., Vrahatis, M. N., 2004.Design and analysis of optimization algorithms using computa-tional statistics. Applied Numerical Analysis and ComputationalMathematics (ANACM) 1 (2), 413–433.

[6] Batstone, D., Keller, J., Steyer, J., Aug. 2006. A review ofADM1 extensions, applications, and analysis: 2002-2005. Wa-ter Sci Technol 54 (4), 1.

[7] Batstone, D. J., Keller, J., Angelidaki, I., Kalyuzhnyi, S. V.,Pavlostathis, S. G., Rozzi, A., Sanders, W., Siegrist, H., Vavilin,V. A., 2002. Anaerobic Digestion Model No. 1, Scientific andtechnical report no.13. Tech. rep., IWA Task Group for Mathe-matic Modelling of Anaerobic Digestion Processes, London.

[8] Biernacki, P., Steinigeweg, S., Borchert, A., Uhlenhut, F.,Brehm, A., 2013. Application of Anaerobic Digestion ModelNo. 1 for describing an existing biogas power plant. Biomassand Bioenergy.

[9] BMU, 2009. Gesetz fur den Vorrang Erneuerbarer Energien(Erneuerbare-Energien-Gesetz - EEG).

[10] Box, M. J., Apr 1965. A new method of constrained optimiza-tion and a comparison with other methods. The Computer Jour-nal 8 (1), 42–52.

[11] Breiman, L., 2001. Random forests. Machine Learning 45 (1), 5–32.

[12] Buswell, A. M., Mueller, H. F., Mar. 1952. Mechanism ofMethane Fermentation. Industrial & Engineering Chemistry44 (3), 550–552.

[13] Cannon, A. J., 2011. Quantile regression neural networks: Im-plementation in r and application to precipitation downscaling.Computers & Geosciences 37 (9), 1277–1284.

[14] Drucker, H., Burges, C. J. C., Kaufman, L., Smola, A. J., Vap-nik, V., 1996. Support vector regression machines. In: Mozer,M., Jordan, M. I., Petsche, T. (Eds.), NIPS. MIT Press, pp. 155–161.

[15] Eldred, M., Giunta, A., Collis, S., Alexandrov, N., Lewis,R., 2004. Second-order corrections for surrogate-based op-timization with model hierarchies. In: Proceedings of the10th AIAA/ISSMO Multidisciplinary Analysis and Optimiza-tion Conference, Albany, NY,, Aug. pp. 2013–2014.

[16] Emmerich, M., Deutz, A., Klinkenberg, J., 2011. Hypervolume-based expected improvement: Monotonicity properties and ex-act computation. In: Evolutionary Computation (CEC), 2011IEEE Congress on. IEEE, pp. 2147–2154.

[17] Flores-Alsina, X., Kazadi Mbamba, C., Solon, K., Vrecko, D.,Tait, S., Batstone, D., Jeppsson, U., Gernaey, K., 2015. A plant-wide aqueous phase chemistry module describing pH variationsand ion speciation/pairing in wastewater treatment process mod-els. Water Research 85, 255–265.

[18] Forrester, A., Sobester, A., Keane, A., 2008. Engineering De-sign via Surrogate Modelling. Wiley.

[19] Forrester, A. I., Sobester, A., Keane, A. J., Dec 2007. Multi-

fidelity optimization via surrogate modelling. Proceedings of theRoyal Society A: Mathematical, Physical and Engineering Sci-ences 463 (2088), 3251–3269.

[20] Friedman, J. H., 1991. Multivariate adaptive regression splines.Ann. Stat. 19 (1), 1–141.

[21] Gaden, D. L. F., 2013. Modelling Anaerobic Digesters in ThreeDimensions: Integration of Biochemistry with ComputationalFluid Dynamics. Doctor, The University of Manitoba.

[22] Gaida, D., 2014. Dynamic Real-Time Substrate Feed Optimiza-tion of Anaerobic Co-Digestion Plants. Doctoral thesis, LeidenUniversity.

[23] Gaida, D., Wolf, C., Bongards, M., Back, T., April 2011. Mat-lab toolbox for biogas plant modelling and optimization. In:Progress in Biogas II - Biogas production from agriculturalbiomass and organic residues. Vol. 2. Stuttgart, pp. 67–70.

[24] Gano, S. E., Perez, V. M., Renaud, J. E., Batill, S. M., Sanders,B., 2004. Multilevel variable fidelity optimization of a morphingunmanned aerial vehicle. In: AIAA/ASME/ASCE/AHS/ASCStructures, Structural Dynamics & Materials Conference, PalmSprings, CA.

[25] Haftka, R. T., 1991. Combining global and local approxima-tions. AIAA journal 29 (9), 1523–1525.

[26] Hansen, N., 2006. The CMA evolution strategy: a comparingreview. In: Lozano, J., Larranaga, P., Inza, I., Bengoetxea, E.(Eds.), Towards a new evolutionary computation. Advances onestimation of distribution algorithms. Springer, pp. 75–102.

[27] Hansen, N., Ostermeier, A., 1996. Adapting arbitrary normalmutation distributions in evolution strategies: the covariancematrix adaptation. In: Evolutionary Computation, 1996., Pro-ceedings of IEEE International Conference on. pp. 312–317.

[28] Islam, M. S., Islam, A., Islam, M., May 2014. Variation ofbiogas production with di↵erent factors in poultry farms ofbangladesh. In: 3rd International Conference on the Develop-ments in Renewable Energy Technology (ICDRET). pp. 1–6.

[29] Jin, R., Chen, W., Sudjianto, A., 2002. On sequential samplingfor global metamodeling in engineering design. In: ASME 2002International Design Engineering Technical Conferences andComputers and Information in Engineering Conference. Ameri-can Society of Mechanical Engineers, pp. 539–548.

[30] Jin, Y., 2005. A comprehensive survey of fitness approximationin evolutionary computation. Soft Computing 9 (1), 3–12.

[31] Johnson, S. G., 2010. The nlopt nonlinear-optimization package.http://ab-initio.mit.edu/nlopt, accessed: 2013-11-20.

[32] Jones, D., Schonlau, M., Welch, W., 1998. E�cient global op-timization of expensive black-box functions. Journal of GlobalOptimization 13, 455–492.

[33] Juznic-Zonta, Z., Kocijan, J., Flotats, X., Vrecko, D., Nov. 2012.Multi-criteria analyses of wastewater treatment bio-processesunder an uncertainty and a multiplicity of steady states. Waterresearch 46 (18), 6121–31.

[34] Keijzer, M., 2004. Scaled symbolic regression. Genetic Pro-gramming and Evolvable Machines 5 (3), 259–269.

[35] Knowles, J., January 2006. Parego: A hybrid algorithm with on-line landscape approximation for expensive multiobjective op-timization problems. IEEE Transactions on Evolutionary Com-putation 10 (1), 50–66.

[36] Knowles, J. D., Nakayama, H., 2008. Meta-modeling inmultiobjective optimization. In: Multiobjective Optimization.Springer, pp. 245–284.

[37] Koziel, S., Ogurtsov, S., Couckuyt, I., Dhaene, T., November2012. Cost-e�cient electromagnetic-simulation-driven antennadesign using co-kriging. Microwaves, Antennas Propagation,IET 6 (14), 1521–1528.

[38] Kuya, Y., Takeda, K., Zhang, X., J. Forrester, A. I., 2011. Mul-tifidelity surrogate modeling of experimental and computational

20

aerodynamic data sets. AIAA journal 49 (2), 289–298.[39] Lewis, R. M., Nash, S. G., 2000. A multigrid approach to the op-

timization of systems governed by di↵erential equations. AIAApaper 4890, 2000.

[40] McKay, M. D., Beckman, R. J., Conover, W. J., 1979. A com-parison of three methods for selecting values of input variablesin the analysis of output from a computer code. Technometrics21 (2), 239–245.

[41] Meckesheimer, M., Barton, R., Simpson, T., Limayen, F., Yan-nou, B., 2001. Metamodeling of combined discrete/continuousresponses. AIAA Journal 39 (10), 1950–1959, cited By 0.

[42] Mehmani, A., Chowdhury, S., Messac, A., 2015. Predictivequantification of surrogate model fidelity based on modal varia-tions with sample density. Structural and Multidisciplinary Op-timization 52 (2), 353–373.

[43] Mullen, K., Ardia, D., Gil, D., Windover, D., Cline, J., 2011.DEoptim: An R package for global optimization by di↵erentialevolution. Journal of Statistical Software 40 (6), 1–26.

[44] Mullins, J., Mahadevan, S., 2014. Variable-fidelity model selec-tion for stochastic simulation. Reliability Engineering & SystemSafety 131, 40 – 52.

[45] Nelder, J. A., Mead, R., Jan 1965. A simplex method for func-tion minimization. The Computer Journal 7 (4), 308–313.

[46] Ogurek, M., Alex, J., 2013. Uberlegungen zur Modellierung mitdem ADM1. In: ifak Workshop: Steuerung, Regelung und Sim-ulation von Biogasanlagen. ifak e.V., Leipzig.

[47] Page, D. I., Hickey, K. L., Narula, R., Main, A. L., Grimberg,S. J., Jan. 2008. Modeling anaerobic digestion of dairy manureusing the IWA Anaerobic Digestion Model no. 1 (ADM1). Wa-ter science and technology : a journal of the International Asso-ciation on Water Pollution Research 58 (3), 689–95.

[48] Ponweiser, W., Wagner, T., Biermann, D., Vincze, M., 2008.Multiobjective optimization on a limited budget of evaluationsusing model-assisted s-metric selection. In: PPSN. pp. 784–794.

[49] Price, K., 1996. Di↵erential evolution: A fast and simple nu-merical optimizer. In: Proceedings NAFIPS’96. pp. 524–525.

[50] Price, K. V., Storn, R. M., Lampinen, J. A., January 2006. Dif-ferential Evolution - A Practical Approach to Global Optimiza-tion. Natural Computing. Springer-Verlag, iSBN 540209506.

[51] Queipo, N. V., Haftka, R. T., Shyy, W., Goel, T., Vaidyanathan,R., Tucker, P. K., 2005. Surrogate-based analysis and optimiza-tion. Progress in Aerospace Sciences 41 (1), 1 – 28.

[52] Schoen, M. A., Sperl, D., Gadermaier, M., Goberna, M., Franke-Whittle, I., Insam, H., Ablinger, J., Wett, B., Dec. 2009. Pop-ulation dynamics at digester overload conditions. Bioresourcetechnology 100 (23), 5648–55.

[53] Singh, G., Grandhi, R. V., 2010. Mixed-variable optimizationstrategy employing multifidelity simulation and surrogate mod-els. AIAA journal 48 (1), 215–223.

[54] Storn, R., Price, K., 1997. Di↵erential evolution - a simpleand e�cient heuristic for global optimization over continuousspaces. Journal of Global Optimization 11 (4), 341–359.

[55] Viana, F. A. C., Haftka, R. T., Ste↵en, V., 2009. Multiple surro-gates: how cross-validation errors can help us to obtain the bestpredictor. Structural and Multidisciplinary Optimization 39 (4),439–457.

[56] Wolf, C., McLoone, S., Bongards, M., 2008. Biogas plant op-timization using genetic algorithms and particle swarm opti-mization. In: IET Irish Signals and Systems Conference (ISSC2008). Institution of Electrical Engineers, pp. 244–249.

[57] Ypma, J., 2013. nloptr vers-0.9.6: R interface to nlopt. http://cran.r-project.org/package=nloptr, accessed: 2013-11-20.

[58] Zae↵erer, M., Bartz-Beielstein, T., Friese, M., Naujoks, B.,Flasch, O., July 2012. Multi-criteria optimization for hard prob-

lems under limited budgets. In: Soule, T., et al. (Eds.), GECCOCompanion ’12: Proceedings of the fourteenth internationalconference on Genetic and evolutionary computation confer-ence companion. ACM, Philadelphia, Pennsylvania, USA, pp.1451–1452.

[59] Zae↵erer, M., Bartz-Beielstein, T., Naujoks, B., Wagner, T., Em-merich, M., 2013. A case study on multi-criteria optimization ofan event detection software under limited budgets. In: Evolu-tionary Multi-Criterion Optimization. Springer, pp. 756–770.

[60] Ziegenhirt, J., Bartz-Beielstein, T., Flasch, O., Konen, W.,Zae↵erer, M., Jul 2010. Optimization of biogas productionwith computational intelligence a comparative study. In: IEEECongress on Evolutionary Computation. Institute of Electricaland Electronics Engineers, pp. 1–8.

21

Documents

Multi-ﬁdelity Modeling and Optimization of Biogas Plants