
NLP++ Optimization Toolbox - Theory Manual

GmbH
Fuerther Str. 212

D - 90429 Nuernberg

January 22, 2013


Contents

1. General information about NLP++
   1.1. General NLP++ Optimization Problem
   1.2. Program Components

I. Constrained Nonlinear Optimization

2. SqpWrapper - Sequential Quadratic Programming
   2.1. Introduction
   2.2. Sequential Quadratic Programming Methods
   2.3. Performance Evaluation
      2.3.1. Testing Distributed Function Calls
      2.3.2. Testing Gradient Approximations by Difference Formulae under Random Noise
      2.3.3. Comparing Monotone versus Non-Monotone Line Search
   2.4. Program Documentation
   2.5. Example - using parallel line search in SqpWrapper
   2.6. Conclusions

3. NlpqlbWrapper - Solving Problems With Very Many Constraints
   3.1. Introduction
   3.2. An Active-Set Sequential Quadratic Programming Method
   3.3. Numerical Tests
   3.4. Program Documentation
   3.5. Conclusions

4. NlpqlgWrapper - Heuristic Global Optimization
   4.1. Introduction
   4.2. The Algorithm
   4.3. Program Documentation
   4.4. Summary

5. ScpWrapper - Sequential Convex Programming
   5.1. Introduction
   5.2. The MMA approximation
   5.3. Predictor-corrector interior point method
      5.3.1. Predictor step
      5.3.2. Corrector step
   5.4. Convergence results
   5.5. Program Documentation

6. CobylaWrapper - Gradient-free Constrained Optimization by Linear Approximations
   6.1. Algorithm
   6.2. Program Documentation

7. MisqpWrapper - Sequential Quadratic Programming with Trust Region Stabilization
   7.1. Introduction
   7.2. The Trust Region SQP Method
   7.3. Program Documentation

8. QlWrapper - Quadratic optimization with linear constraints
   8.1. Introduction
   8.2. Program Documentation
   8.3. Example

9. IpoptWrapper - Interface to an open-source large scale optimization algorithm
   9.1. Program Documentation

10. Example - Solving a constrained nonlinear optimization problem using active-set-strategy

II. Multicriteria Optimization

11. Theory - Multiple Objective Optimization
   11.1. Introduction
   11.2. Pareto Optimality

12. The Method of Weighted Objectives
   12.1. The Algorithm
   12.2. Program Documentation

13. The Hierarchical Optimization Method
   13.1. The Algorithm
   13.2. Program Documentation

14. The Trade-Off Method
   14.1. The Algorithm
   14.2. Program Documentation

15. Method of Distance Functions
   15.1. The Algorithm
   15.2. Program Documentation

16. Global Criterion Method and the Min-Max Optimum
   16.1. The Algorithm
   16.2. Program Documentation: MooGlobalCriterion
   16.3. Program Documentation: MooMinMaxOpt

17. Weighted Tchebycheff Method
   17.1. The Algorithm
   17.2. Program Documentation

18. STEP Method
   18.1. The Algorithm
   18.2. Program Documentation

19. Example

III. Mixed Integer Optimization

20. MidacoWrapper
   20.1. Introduction
   20.2. Approaches for non-convex Mixed Integer Nonlinear Programs
   20.3. The Ant Colony Optimization framework
   20.4. Penalty method
      20.4.1. Robust oracle penalty method
      20.4.2. Oracle update rule
   20.5. Program Documentation

21. MipOptimizer
   21.1. Definition of Optimization Problems addressed
   21.2. General Notation for Algorithm
   21.3. MIPSQP: Algorithm for Solving Nonlinear Discrete Optimization Problems
   21.4. Program Documentation

22. MisqpWrapper - A Mixed Integer SQP Extension
   22.1. Introduction
   22.2. The Mixed-Integer Trust Region SQP Method
   22.3. Performance Evaluation
   22.4. Program Documentation

23. Example

IV. Constrained Data Fitting

24. NlpmmxWrapper - Constrained Min-Max Optimization
   24.1. Introduction
   24.2. The Transformed Optimization Problem
   24.3. Program Documentation

25. NlpinfWrapper - Constrained Data Fitting in the L∞-Norm
   25.1. Introduction
   25.2. The Transformed Optimization Problem
   25.3. Program Documentation

26. NlpL1Wrapper - Constrained Data Fitting in the L1-Norm
   26.1. Introduction
   26.2. The Transformed Optimization Problem
   26.3. Program Documentation

27. LeastSquaresWrapper - Constrained Data Fitting in the L2-Norm
   27.1. Introduction
   27.2. Least Squares Methods
   27.3. The SQP-Gauss-Newton Method
   27.4. Constrained Least Squares Problems
   27.5. Program Documentation

28. Example

V. Other Useful Tools in NLP++

29. Class Variable Catalogue
   29.1. Defining allowed values for a catalogued variable
   29.2. Class methods

30. Class ApproxGrad
   30.1. Usage
   30.2. Example

31. Class OptProblem and the DefaultLoop()-function

VI. Using NLP++ together with Diffpack

32. Proceeding from a calculation to an optimization program
   32.1. General Strategy
   32.2. Example

Bibliography

1. General information about NLP++

The NLP++ Optimization Toolbox is a comprehensive C++ class library providing optimization routines for a large variety of nonlinear, constrained optimization problems.

1.1. General NLP++ Optimization Problem

The general NLP++ optimization problem is of the following form:

$$
\begin{array}{ll}
\text{Minimize} & F_0(x), \ldots, F_l(x), \quad l \in \mathbb{N} \\
\text{s.t.} & G_j(x) = 0, \quad j = 0, \ldots, M_E - 1, \\
& G_j(x) \ge 0, \quad j = M_E, \ldots, M - 1, \\
& L_i \le x_i \le U_i, \quad i = 0, \ldots, N_r + N_i - 1, \\
& x_i \in \mathbb{R}, \quad i = 0, \ldots, N_r - 1, \\
& x_{N_r + i} \in \mathbb{Z}, \quad i = 0, \ldots, N_i - 1, \\
& x_{N_r + N_i + i} \in C_i, \quad i = 0, \ldots, N_c - 1,
\end{array}
$$

with

l     Number of objective functions
M_E   Number of equality constraints
M     Number of constraints (equalities and inequalities)
L_i   Lower bound for the i'th variable
U_i   Upper bound for the i'th variable
N_r   Number of real variables
N_i   Number of integer variables
N_c   Number of catalogued variables
C_i   Catalogue (finite subset of ℝ containing the allowed values) for the i'th catalogued variable

The formulation of the optimization problems in the following chapters might differ. However, when an optimization problem is implemented for NLP++, it has to be consistent with the formulation above.
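For illustration (a constructed instance, not one of the manual's examples), a problem with one real and one integer variable, no catalogued variables, one equality and one inequality constraint, i.e. N_r = 1, N_i = 1, N_c = 0, M_E = 1, M = 2, reads:

$$
\begin{array}{ll}
\text{Minimize} & F_0(x) = (x_0 - 2)^2 + (x_1 - 1)^2 \\
\text{s.t.} & G_0(x) = x_0 + x_1 - 3 = 0, \\
& G_1(x) = x_0\, x_1 - 1 \ge 0, \\
& 0 \le x_0 \le 10, \quad 0 \le x_1 \le 10, \quad x_0 \in \mathbb{R}, \quad x_1 \in \mathbb{Z}.
\end{array}
$$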


1.2. Program Components

NLP++ consists of a kernel and five additional toolboxes:

• NLP++ Kernel

– SqpWrapper for sequential quadratic optimization

– ScpWrapper for large scale nonlinear optimization using the optimization method sequential convex programming (SCP) or, optionally, the method of moving asymptotes (MMA)

– CobylaWrapper, a gradient-free optimization routine

– MisqpWrapper for nonlinear optimization applying a sequential quadratic programming (SQP) method with trust region stabilization

– QlWrapper, for quadratic optimization with linear constraints

– IpoptWrapper, an interface to the interior-point algorithm Ipopt for large scale optimization. Note: Only the interface is provided. Ipopt itself must be obtained separately.

– ApproxGrad, a class for approximating gradients of objective and/or constraint functions

• Heuristic Global Optimization

– NlpqlgWrapper, a heuristic procedure for solving global optimization problems with expensive function evaluations, based on the SQP method

• Sqp with very many constraints

– NlpqlbWrapper for sequential quadratic optimization problems with very many constraints

• Constrained Data Fitting

– LeastSquaresWrapper for data fitting problems in the L2-norm (least squares optimization)

– NlpL1Wrapper for data fitting problems in the L1-norm

– NlpinfWrapper for data fitting problems in the L∞-norm (maximum-norm)

– NlpmmxWrapper for min-max-optimization

• Mixed Integer Optimization

– MidacoWrapper, a stochastic Gauss approximation algorithm

– MipOptimizer, an adapted sequential quadratic programming algorithm

– MisqpWrapper, a modified trust region mixed-integer SQP method that addresses nonlinear optimization problems with real, binary, integer and/or catalogued variables

• Multiple Objective Optimization

– MooTradeOff using the Trade-Off-Method


– MooMinMaxOpt using the Min-Max-Method

– MooHierarchOptMethod using a hierarchical optimization method

– MooGlobalCriterion using a global criterion

– MooDistFunc using the method of distance functions

– MooWeightedObjectives using the method of weighted objectives

– MooWeightedTchebycheff using the weighted Tchebycheff-Method

– MooStepMethod, an interactive optimization method


Part I.

Constrained Nonlinear Optimization


2. SqpWrapper - Sequential Quadratic Programming

The Fortran subroutine NLPQLP by Schittkowski solves smooth nonlinear programming problems and is an extension of the code NLPQL. This version is specifically tuned to run under distributed systems, controlled by an input parameter l. In case of computational errors, as for example caused by inaccurate function or gradient evaluations, a non-monotone line search is activated. Numerical results are included, which show that in case of noisy function values a drastic improvement of the performance is achieved compared to the version with monotone line search. The usage of the code is documented and illustrated by an example (chapter 10). Sections 2.1, 2.2, 2.3 and 2.6 are taken from Schittkowski [138].

2.1. Introduction

We consider the general optimization problem to minimize an objective function f under nonlinear equality and inequality constraints,

$$
x \in \mathbb{R}^n: \quad
\begin{array}{ll}
\min & f(x) \\
& g_j(x) = 0, \quad j = 1, \ldots, m_e, \\
& g_j(x) \ge 0, \quad j = m_e + 1, \ldots, m, \\
& x_l \le x \le x_u,
\end{array}
\tag{2.1}
$$

where x is an n-dimensional parameter vector. It is assumed that all problem functions f(x) and g_j(x), j = 1, ..., m, are continuously differentiable on the whole ℝ^n.

Sequential quadratic programming is the standard general purpose method to solve smooth nonlinear optimization problems, at least under the following assumptions:

• The problem is not too large.

• Functions and gradients can be evaluated with sufficiently high precision.

• The problem is smooth and well-scaled.

The code NLPQL of Schittkowski [123] is a Fortran implementation of a sequential quadratic programming (SQP) algorithm. The design of the numerical algorithm is founded on extensive comparative numerical tests of Schittkowski [116, 121, 118], Schittkowski et al. [145], Hock and Schittkowski [69], and on further theoretical investigations published in [117, 119, 120, 122]. The algorithm is extended to solve also nonlinear least squares problems efficiently, see [125] or [128], and to handle problems with very many constraints, see [126]. To conduct the numerical tests, a random test problem generator is developed for a major comparative study, see [116]. Two collections with more than 300 academic and real-life test problems are published in Hock and Schittkowski [69] and in Schittkowski [124]. The test examples are part of the Cute test

7

Page 14: NLP++ Optimization Toolbox+TheoryManual.pdf · 2013-01-22 · kowski, and van de Braak [17]. The general availability of parallel computers and in particular of distributed computing

2. SqpWrapper - Sequential Quadratic Programming

problem collection of Bongartz et al. [13]. About 80 test problems based on a Finite Element formulation are collected for a comparative evaluation in Schittkowski et al. [145]. A set of 1,000 least squares test problems, solved by an extension of the code NLPQL to retain typical features of a Gauss-Newton algorithm, is described in [128].

Moreover, there exist hundreds of commercial and academic applications of NLPQL, for example:

1. mechanical structural optimization, see Schittkowski, Zillober, Zotemantel [145] and Kneppe, Krammer, Winkler [77],

2. data fitting and optimal control of transdermal pharmaceutical systems, see Boderke, Schittkowski, Wolf [10] or Blatt, Schittkowski [7],

3. computation of optimal feed rates for tubular reactors, see Birk, Liepelt, Schittkowski, and Vogel [5],

4. food drying in a convection oven, see Frias, Oliveira, and Schittkowski [49],

5. optimal design of horn radiators for satellite communication, see Hartwanger, Schittkowski, and Wolf [66],

6. receptor-ligand binding studies, see Schittkowski [127],

7. optimal design of surface acoustic wave filters for signal processing, see Bunner, Schittkowski, and van de Braak [17].

The general availability of parallel computers and in particular of distributed computing in networks motivates a careful redesign of NLPQL to allow simultaneous function evaluations. The resulting extensions are implemented and the code is called NLPQLP. Another input parameter l is introduced for the number of parallel machines, that is the number of function calls to be executed simultaneously. In case of l = 1, NLPQLP is identical to NLPQL. Otherwise, the line search procedure is modified to allow parallel function calls, which can also be applied for approximating gradients by difference formulae. The mathematical background is outlined, in particular the modification of the line search algorithm to retain convergence under parallel systems. It must be emphasized that distributed computation of function values is only simulated throughout this chapter. It is up to the user to adapt the code to a particular parallel environment.

However, SQP methods are quite sensitive to round-off or any other errors in function and especially gradient values. If objective or constraint functions cannot be computed within machine accuracy, or if the accuracy by which gradients are approximated is above the termination tolerance, the code could break down, typically with the error message Status=4. In this situation, the line search cannot be terminated within a given number of iterations and the algorithm is stopped.

The new version 2.0 makes use of a non-monotone line search in the error situation described above. The idea is to replace the reference value of the line search termination check, ψ_{r_k}(x_k, v_k), by

$$
\max\{\psi_{r_j}(x_j, v_j) : j = k - p, \ldots, k\},
$$

where ψ_r(x, v) is a merit function and p a given parameter. The general idea is not new and is, for example, described in Dai [27], where a general convergence proof for the unconstrained case


is presented. The general idea goes back to Grippo, Lampariello, and Lucidi [57], and was extended to constrained optimization and trust region methods in a series of subsequent papers, see Bonnans et al. [14], Deng et al. [29], Grippo et al. [58, 59], Ke and Han [74], Ke et al. [75], Lucidi et al. [87], Panier and Tits [101], Raydan [113], and Toint [157, 158]. However, there is a basic difference in the methodology: our goal is to allow monotone line searches as long as they terminate successfully, and to apply a non-monotone one only in a special error situation.

In Section 2.2 we outline the general mathematical structure of an SQP algorithm, the non-monotone line search, and the modifications to run the code under distributed systems. Section 2.3 contains some numerical results obtained for a set of 306 standard test problems of the collections published in Hock and Schittkowski [69] and in Schittkowski [124]. They show the sensitivity of the new version with respect to the number of parallel machines and the influence of different gradient approximations under uncertainty. Moreover, we test the non-monotone line search versus the monotone one, and generate noisy test problems by adding random errors to function values and by inaccurate gradient approximations. This situation appears frequently in practical environments, where complex simulation codes prevent accurate responses and where gradients can only be computed by a difference formula.

The usage of the code is documented in Section 2.4, and Section 2.5 contains an illustrative example.

2.2. Sequential Quadratic Programming Methods

Sequential quadratic programming or SQP methods belong to the most powerful nonlinear programming algorithms we know today for solving differentiable nonlinear programming problems of the form (2.1). The theoretical background is described e.g. in Stoer [151] in form of a review, or in Spellucci [150] in form of an extensive textbook. From the more practical point of view, SQP methods are also introduced in the books of Papalambros, Wilde [102] and Edgar, Himmelblau [37]. Their excellent numerical performance is tested and compared with other methods in Schittkowski [116], and for many years they have belonged to the most frequently used algorithms for solving practical optimization problems.

To facilitate the notation of this section, we assume that upper and lower bounds x_u and x_l are not handled separately, i.e., we consider the somewhat simpler formulation

$$
x \in \mathbb{R}^n: \quad
\begin{array}{ll}
\min & f(x) \\
& g_j(x) = 0, \quad j = 1, \ldots, m_e, \\
& g_j(x) \ge 0, \quad j = m_e + 1, \ldots, m.
\end{array}
\tag{2.2}
$$

It is assumed that all problem functions f(x) and g_j(x), j = 1, ..., m, are continuously differentiable on ℝ^n.

The basic idea is to formulate and solve a quadratic programming subproblem in each iteration, which is obtained by linearizing the constraints and approximating the Lagrangian function

$$
L(x, u) := f(x) - \sum_{j=1}^{m} u_j\, g_j(x)
\tag{2.3}
$$

quadratically, where x ∈ ℝ^n is the primal variable and u = (u_1, ..., u_m)^T ∈ ℝ^m the multiplier vector.


To formulate the quadratic programming subproblem, we proceed from given iterates x_k ∈ ℝ^n, an approximation of the solution, v_k ∈ ℝ^m, an approximation of the multipliers, and B_k ∈ ℝ^{n×n}, an approximation of the Hessian of the Lagrangian function. Then one has to solve the quadratic programming problem

$$
d \in \mathbb{R}^n: \quad
\begin{array}{ll}
\min & \frac{1}{2}\, d^T B_k d + \nabla f(x_k)^T d \\
& \nabla g_j(x_k)^T d + g_j(x_k) = 0, \quad j = 1, \ldots, m_e, \\
& \nabla g_j(x_k)^T d + g_j(x_k) \ge 0, \quad j = m_e + 1, \ldots, m.
\end{array}
\tag{2.4}
$$

Let d_k be the optimal solution and u_k the corresponding multiplier of this subproblem. A new iterate is obtained by

$$
\begin{pmatrix} x_{k+1} \\ v_{k+1} \end{pmatrix}
:=
\begin{pmatrix} x_k \\ v_k \end{pmatrix}
+ \alpha_k
\begin{pmatrix} d_k \\ u_k - v_k \end{pmatrix}
\tag{2.5}
$$

where α_k ∈ (0, 1] is a suitable steplength parameter.

Although we are able to guarantee that the matrix B_k is positive definite, it is possible that (2.4) is not solvable due to inconsistent constraints. One possible remedy is to introduce an additional variable δ ∈ ℝ, leading to a modified quadratic programming problem, see Schittkowski [123] for details.

The steplength parameter α_k is required in (2.5) to enforce global convergence of the SQP method, i.e., the approximation of a point satisfying the necessary Karush-Kuhn-Tucker optimality conditions when starting from arbitrary initial values, typically a user-provided x_0 ∈ ℝ^n and v_0 = 0, B_0 = I. α_k should satisfy at least a sufficient decrease condition of a merit function φ_r(α) given by

$$
\phi_r(\alpha) := \psi_r\!\left(
\begin{pmatrix} x \\ v \end{pmatrix}
+ \alpha
\begin{pmatrix} d \\ u - v \end{pmatrix}
\right)
\tag{2.6}
$$

with a suitable penalty function ψ_r(x, v). Implemented is the augmented Lagrangian function

$$
\psi_r(x, v) := f(x) - \sum_{j \in J}\left(v_j\, g_j(x) - \tfrac{1}{2}\, r_j\, g_j(x)^2\right) - \tfrac{1}{2} \sum_{j \in K} v_j^2 / r_j,
\tag{2.7}
$$

with J := {1, ..., m_e} ∪ {j : m_e < j ≤ m, g_j(x) ≤ v_j/r_j} and K := {1, ..., m} \ J, cf. Schittkowski [120]. The objective function is penalized as soon as an iterate leaves the feasible domain. The corresponding penalty parameters r_j, j = 1, ..., m, which control the degree of constraint violation, must be chosen carefully to guarantee a descent direction of the merit function, see Schittkowski [120] or Wolfe [166] in a more general setting, i.e., to get

$$
\phi'_{r_k}(0) = \nabla \psi_{r_k}(x_k, v_k)^T
\begin{pmatrix} d_k \\ u_k - v_k \end{pmatrix}
< 0.
\tag{2.8}
$$

Finally, one has to approximate the Hessian matrix of the Lagrangian function in a suitable way. To avoid the calculation of second derivatives and to obtain a final superlinear convergence rate, the standard approach is to update B_k by the BFGS quasi-Newton formula, cf. Powell [104] or Stoer [151].
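For reference, the BFGS update mentioned here has the standard textbook form (stated for convenience, not reproduced from this manual): with s_k := x_{k+1} − x_k and q_k := ∇_x L(x_{k+1}, u_k) − ∇_x L(x_k, u_k),

$$
B_{k+1} := B_k + \frac{q_k q_k^T}{q_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k},
$$

where, in practice, q_k is modified (cf. Powell [104]) to keep q_k^T s_k sufficiently positive, so that B_{k+1} remains positive definite.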

The implementation of a line search algorithm is a critical issue when implementing a nonlinear programming algorithm, and has significant effect on the overall efficiency of the resulting code. On the one hand we need a line search to stabilize the algorithm; on the other hand it is not


desirable to waste too many function calls. Moreover, the behavior of the merit function becomes irregular in case of constrained optimization because of very steep slopes at the border caused by large penalty terms. The implementation is also more complex than shown above if linear constraints and bounds on the variables are to be satisfied during the line search.

Usually, the steplength parameter α_k is chosen to satisfy the Armijo [2] condition

$$
\phi_r(\sigma \beta^i) \le \phi_r(0) + \sigma \beta^i \mu\, \phi'_r(0),
\tag{2.9}
$$

see for example Ortega and Rheinboldt [100]. The constants are from the ranges 0 < µ < 0.5, 0 < β < 1, and 0 < σ ≤ 1. We start with i = 0 and increase i until (2.9) is satisfied for the first time, say at i_k. Then the desired steplength is α_k = σβ^{i_k}.

Fortunately, SQP methods are quite robust and accept the steplength one in the neighborhood of a solution. Typically, the test parameter µ for the Armijo-type sufficient descent property (2.9) is very small. Nevertheless, the choice of the reduction parameter β must be adapted to the actual slope of the merit function. If β is too small, the line search terminates very fast, but the resulting stepsizes are usually too small, leading to a higher number of outer iterations. On the other hand, a value close to one requires too many function calls during the line search. Thus, we need some kind of compromise, which is obtained by first applying a polynomial interpolation, typically a quadratic one, and using (2.9) only as a stopping criterion. Since φ_r(0), φ'_r(0), and φ_r(α_i) are given, α_i being the current iterate of the line search procedure, we easily get the minimizer of the quadratic interpolation. We then accept the maximum of this value and the Armijo parameter as the next iterate, as shown by the subsequent code fragment implemented in NLPQLP.

Algorithm 1

Let β, µ with 0 < β < 1, 0 < µ < 0.5 be given.

Start: α_0 := 1

For i = 0, 1, 2, ... do:

1) If φ_r(α_i) < φ_r(0) + µ α_i φ'_r(0), then stop.

2) Compute
$$\bar{\alpha}_i := \frac{0.5\, \alpha_i^2\, \phi'_r(0)}{\alpha_i\, \phi'_r(0) - \phi_r(\alpha_i) + \phi_r(0)}\,.$$

3) Let α_{i+1} := max(β α_i, ᾱ_i).

Corresponding convergence results are found in Schittkowski [120]. ᾱ_i is the minimizer of the quadratic interpolation, and we use the Armijo descent property for checking termination. Step 3 is required to avoid irregular values, since the minimizer of the quadratic interpolation could be outside of the feasible domain (0, 1]. The search algorithm is implemented in NLPQLP together with additional safeguards, for example to prevent violation of bounds. Algorithm 1 assumes that φ_r(1) is known before calling the procedure, i.e., that the corresponding function values are given. We have to stop the algorithm if sufficient descent is not observed after a certain number of iterations, say 10. If the tested stepsize falls below machine precision or the accuracy by which model function values are computed, the merit function cannot decrease further.
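A minimal C++ sketch of Algorithm 1 with hypothetical names (not part of the NLP++ API), assuming the merit function φ_r along the search direction and its derivative at zero are available:

```cpp
#include <algorithm>
#include <functional>

// Sketch of Algorithm 1: Armijo-type line search with quadratic interpolation.
// phi(alpha) evaluates the merit function along the search direction;
// phi0 = phi(0), dphi0 = phi'(0) < 0. Names are illustrative only.
double lineSearchAlg1(const std::function<double(double)>& phi,
                      double phi0, double dphi0,
                      double beta = 0.1, double mu = 1e-4,
                      int maxIter = 10)
{
    double alpha = 1.0;                      // start with the full SQP step
    for (int i = 0; i < maxIter; ++i) {
        double phiAlpha = phi(alpha);
        // 1) Armijo-type sufficient decrease test, used as stopping criterion
        if (phiAlpha < phi0 + mu * alpha * dphi0)
            return alpha;
        // 2) minimizer of the quadratic interpolating phi(0), phi'(0), phi(alpha)
        double alphaBar = 0.5 * alpha * alpha * dphi0
                        / (alpha * dphi0 - phiAlpha + phi0);
        // 3) safeguard against irregular values outside (0, 1]
        alpha = std::max(beta * alpha, alphaBar);
    }
    return alpha;   // no sufficient descent observed; caller handles the error
}
```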

To outline the new approach, let us assume that functions can be computed simultaneously on l different machines. Then l test values α_i = β^{i−1} with β = ε^{1/(l−1)} are selected, i = 1, ..., l, where ε is a guess for the machine precision. Next we require l parallel function calls to get the


corresponding model function values. The first α_i satisfying a sufficient descent property (2.9), say for i = i_k, is accepted as the new steplength to set the subsequent iterate by α_k := α_{i_k}. One has to be sure that existing convergence results of the SQP algorithm are not violated.

The proposed parallel line search will work efficiently if the number of parallel machines l is sufficiently large, and works as follows, where we omit the iteration index k.

Algorithm 2

Let β, µ with 0 < β < 1, 0 < µ < 0.5 be given.

Start: For α_i = β^i compute φ_r(α_i), i = 0, ..., l − 1.

For i = 0, 1, 2, ... do:

If φ_r(α_i) < φ_r(0) + µ α_i φ'_r(0), then stop.

To precalculate l candidates in parallel at log-distributed points between a small tolerance α = τ and α = 1, 0 < τ ≪ 1, we propose β = τ^{1/(l−1)}.
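A minimal C++ sketch of Algorithm 2 with hypothetical names (again not part of the NLP++ API), assuming l ≥ 2; in a real distributed setting the l merit values would be evaluated simultaneously, whereas the loop below only simulates the parallel calls:

```cpp
#include <cmath>
#include <functional>
#include <vector>

// Sketch of Algorithm 2: parallel line search on l precomputed trial steps.
// Trial steplengths alpha_i = beta^i with beta = tau^(1/(l-1)) are
// log-distributed between alpha_0 = 1 and alpha_{l-1} = tau.
double lineSearchAlg2(const std::function<double(double)>& phi,
                      double phi0, double dphi0,
                      int l, double tau = 1e-12, double mu = 1e-4)
{
    const double beta = std::pow(tau, 1.0 / (l - 1));   // requires l >= 2
    std::vector<double> alpha(l), phiVal(l);
    for (int i = 0; i < l; ++i) {
        alpha[i]  = std::pow(beta, i);   // log-distributed trial points
        phiVal[i] = phi(alpha[i]);       // candidate for a parallel function call
    }
    // accept the first (i.e., largest) trial step with sufficient descent (2.9)
    for (int i = 0; i < l; ++i)
        if (phiVal[i] < phi0 + mu * alpha[i] * dphi0)
            return alpha[i];
    return alpha[l - 1];                 // error case: no sufficient descent
}
```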

The paradigm of parallelism is SPMD, i.e., Single Program Multiple Data. In a typical situation, we suppose that there is a complex application code providing simulation data, for example by an expensive Finite Element calculation in mechanical structural optimization. It is supposed that various instances of the simulation code providing function values are executable on a series of different machines, so-called slaves, controlled by a master program that executes NLPQLP. By a message passing system, for example PVM, see Geist et al. [51], only very few data need to be transferred from the master to the slaves. Typically, only a set of design parameters of length n must be passed. On return, the master accepts new model responses for objective function and constraints, at most m + 1 double precision numbers. All massive numerical calculations and model data, for example the stiffness matrix of a Finite Element model in a mechanical engineering application, remain on the slave processors of the distributed system.

In both situations, i.e., the serial or parallel version, it is still possible that Algorithm 1 or Algorithm 2 breaks down because of too many iterations. In this case, we proceed from a descent direction of the merit function, but φ'_r(0) is extremely small. To avoid interruption of the whole iteration process, the idea is to repeat the line search with another stopping criterion. Instead of testing (2.9), we accept a stepsize α_k as soon as the inequality

$$
\phi_{r_k}(\alpha_k) \le \max_{k-p(k) \le j \le k} \phi_{r_j}(0) + \alpha_k\, \mu\, \phi'_{r_k}(0)
\tag{2.10}
$$

is satisfied, where p(k) is a predetermined parameter with p(k) = min{k, p}, p a given tolerance. Thus, we allow an increase of the reference value φ_{r_k}(0) in a certain error situation, i.e., an increase of the merit function value. To implement the non-monotone line search, we need a queue consisting of merit function values at previous iterates. In case of k = 0, the reference value is adapted by a factor greater than 1, i.e., φ_{r_k}(0) is replaced by t·φ_{r_k}(0), t > 1. The basic idea, to store reference function values and to replace the sufficient descent property by a sufficient 'ascent' property in max-form, is for example described in Dai [27], where a general convergence proof for the unconstrained case is presented. The general idea goes back to Grippo, Lampariello, and Lucidi [57], and was extended to constrained optimization and trust region methods in a series of subsequent papers, see Bonnans et al. [14], Deng et al. [29], Grippo et al. [58, 59], Ke and Han [74], Ke et al. [75], Lucidi et al. [87], Panier and Tits [101], Raydan [113],


and Toint [157, 158]. However, there is a difference in the methodology: our goal is to allow monotone line searches as long as they terminate successfully, and to apply a non-monotone one only in an error situation.
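As an illustration of the bookkeeping behind (2.10), a minimal C++ sketch with hypothetical names (in NLP++ the queue length corresponds to the "StackSizeLS" parameter described in Section 2.4):

```cpp
#include <algorithm>
#include <deque>

// Sketch of the non-monotone acceptance test (2.10). A queue keeps the merit
// function values of up to p previous iterates; the reference value is their
// maximum instead of the current value alone. Illustrative only.
class NonMonotoneCheck {
public:
    explicit NonMonotoneCheck(std::size_t p) : p_(p) {}

    // record phi_{r_k}(0) at the start of iteration k
    void push(double phi0) {
        history_.push_back(phi0);
        if (history_.size() > p_) history_.pop_front();
    }

    // test (2.10); push() must be called at least once before accept()
    bool accept(double phiAlpha, double alpha, double dphi0,
                double mu = 1e-4) const {
        double ref = *std::max_element(history_.begin(), history_.end());
        return phiAlpha <= ref + alpha * mu * dphi0;
    }

private:
    std::size_t p_;
    std::deque<double> history_;
};
```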

2.3. Performance Evaluation

Our numerical tests use the 306 academic and real-life test problems published in Hock and Schittkowski [69] and in Schittkowski [124]. Part of them are also available in the Cute library, see Bongartz et al. [13], and their usage is described in Schittkowski [129]. The distribution of the dimension parameter n, the number of variables, is shown in Figure 2.1. We see, for example, that about 270 of the 306 test problems have no more than 10 variables. In a similar way, the distribution of the number of constraints is shown in Figure 2.2.

Figure 2.1.: Distribution of number of variables.

Since analytical derivatives are not available for all problems, we approximate them numerically. The test examples are provided with exact solutions, either known from analytical precalculations by hand or from the best numerical data found so far. The Fortran codes are compiled by the Intel Visual Fortran Compiler, Version 8.0, under Windows XP, and executed on a Pentium IV processor with 2.8 GHz.

First we need a criterion to decide whether the result of a test run is considered a successful return or not. Let ε > 0 be a tolerance for defining the relative accuracy, x_k the final iterate of a test run, and x⋆ the supposed exact solution known from the test problem collection. Then we call the output a successful return, if the relative error in the objective function is less than ε and if the maximum constraint violation is less than ε², i.e., if

$$
f(x_k) - f(x^\star) < \varepsilon\,|f(x^\star)|, \quad \text{if } f(x^\star) \neq 0,
$$

or

$$
f(x_k) < \varepsilon, \quad \text{if } f(x^\star) = 0,
$$

and

$$
r(x_k) = \|g(x_k)^-\|_\infty < \varepsilon^2,
$$

where ‖·‖_∞ denotes the maximum norm, g_j(x_k)^- = min(0, g_j(x_k)) for j > m_e, and g_j(x_k)^- = g_j(x_k) otherwise.

Figure 2.2.: Distribution of number of constraints.

We take into account that a code returns a solution with a better function value than the known one, subject to the error tolerance of the allowed constraint violation. However, there is still the possibility that an algorithm terminates at a local solution different from the known one. Thus, we also call a test run a successful one, if in addition to the above decision the internal termination conditions are satisfied subject to a reasonably small tolerance (IFAIL=0), and if

$$
f(x_k) - f(x^\star) \ge \varepsilon\,|f(x^\star)|, \quad \text{if } f(x^\star) \neq 0,
$$

or

$$
f(x_k) \ge \varepsilon, \quad \text{if } f(x^\star) = 0,
$$

and

$$
r(x_k) < \varepsilon^2.
$$

For our numerical tests, we use ε = 0.01, i.e., we require a final accuracy of one percent. Gradients are approximated by the following three difference formulae:

1. Forward differences:

$$
\frac{\partial}{\partial x_i} f(x) \approx \frac{1}{\eta_i}\Big(f(x + \eta_i e_i) - f(x)\Big)
\tag{2.11}
$$

2. Two-sided differences:

$$
\frac{\partial}{\partial x_i} f(x) \approx \frac{1}{2\eta_i}\Big(f(x + \eta_i e_i) - f(x - \eta_i e_i)\Big)
\tag{2.12}
$$

3. Fourth-order formula:

$$
\frac{\partial}{\partial x_i} f(x) \approx \frac{1}{4!\,\eta_i}\Big(2 f(x - 2\eta_i e_i) - 16 f(x - \eta_i e_i) + 16 f(x + \eta_i e_i) - 2 f(x + 2\eta_i e_i)\Big)
\tag{2.13}
$$

Here η_i = η · max(10^{-5}, |x_i|) and e_i is the i-th unit vector, i = 1, ..., n. The tolerance η depends on the difference formula and is set to η = η_m^{1/2} for forward differences, η = η_m^{1/3} for two-sided differences, and η = (η_m/72)^{1/4} for fourth-order formulae. η_m is a guess for the accuracy by which function values are computed, i.e., either machine accuracy in case of analytical formulae or an estimate of the noise level in function computations. In a similar way, derivatives of constraints are computed.
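A minimal C++ sketch of formula (2.11) with the perturbation rule above, using hypothetical names (NLP++ itself ships the class ApproxGrad for this task, see chapter 30):

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <vector>

// Forward-difference approximation (2.11) with eta_i = eta * max(1e-5, |x_i|)
// and eta = sqrt(eta_m), where eta_m estimates the accuracy of the function
// values (machine accuracy or noise level). Illustrative only.
std::vector<double> forwardDiffGradient(
    const std::function<double(const std::vector<double>&)>& f,
    std::vector<double> x,
    double etaM = 1e-16)                 // guess for the function accuracy
{
    const double eta = std::sqrt(etaM);  // eta = eta_m^(1/2) for forward differences
    const double fx  = f(x);
    std::vector<double> grad(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double etaI = eta * std::max(1e-5, std::fabs(x[i]));
        const double xi   = x[i];
        x[i] = xi + etaI;                // perturb the i-th coordinate
        grad[i] = (f(x) - fx) / etaI;    // formula (2.11)
        x[i] = xi;                       // restore
    }
    return grad;
}
```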

The Fortran implementation of the SQP method introduced in the previous section is called NLPQLP. The code represents the most recent version of NLPQL, which is frequently used in academic and commercial institutions. NLPQLP is prepared to run also under distributed systems, but behaves in exactly the same way as the serial version if the number of simulated processors is set to one. Functions and gradients must be provided by reverse communication, and the quadratic programming subproblems are solved by the primal-dual method of Goldfarb and Idnani [56] based on numerically stable orthogonal decompositions. NLPQLP is executed with termination accuracy ACC = 10^{-8} and a maximum number of iterations MAXIT = 500.

In the subsequent tables, we use the notation:

nsucc : number of successful test runs (according to the above definition)
nerr  : number of runs with error messages of NLPQLP (IFAIL > 0)
nfunc : average number of function evaluations
ngrad : average number of gradient evaluations or iterations
nequ  : average number of equivalent function calls (function calls counted also for gradient approximations)
time  : total execution time for all test runs in seconds

To get nfunc, we count each single function call, also in the case of several simulated processors, l > 1. However, function evaluations needed for gradient approximations are not counted. Their average number is nfunc for forward differences, 2 × nfunc for two-sided differences, and 4 × nfunc for fourth-order formulae. One gradient computation corresponds to one iteration of the SQP method.

2.3.1. Testing Distributed Function Calls

First we investigate the question how parallel line searches influence the overall performance. Table 2.1 shows the number of successful test runs and the average number of iterations or gradient evaluations, nit, for an increasing number of simulated parallel calls of model functions, denoted by l. The forward difference formula (2.11) is used for gradient approximations, and non-monotone line search is applied with a queue size of p = 30. Calculation time is 1 sec for a series of 306 test runs.


l    nsucc   nit        l    nsucc   nit
1    306     21         8    299     27
3    236     142        9    302     25
4    265     110        10   299     24
5    295     63         15   302     23
6    299     37         20   301     21
7    301     30         50   301     20

Table 2.1.: Performance Results for Parallel Line Search

l = 1 corresponds to the sequential case, when Algorithm 1 is applied to the line search, consisting of a quadratic interpolation combined with an Armijo-type bisection strategy and a non-monotone stopping criterion.

In all other cases, l > 1 simultaneous function evaluations are made according to Algorithm 2. To get a reliable and robust line search, we need at least 5 parallel processors. No significant improvements are observed if we have more than 10 parallel function evaluations.

The most promising possibility to exploit a parallel system architecture occurs when gradients cannot be calculated analytically, but have to be approximated numerically, for example by forward differences, two-sided differences, or even higher order methods. Then we need at least n additional function calls, where n is the number of optimization variables, or a suitable multiple of n.

2.3.2. Testing Gradient Approximations by Difference Formulae under Random Noise

For our numerical tests, we apply the three different difference formulae mentioned before, see (2.11), (2.12), and (2.13). To test the stability of these formulae, we add some randomly generated noise to each function value. Non-monotone line search is applied with a queue size of p = 30, and the serial line search calculation by Algorithm 1 is required.

Tables 2.2 to 2.4 show the corresponding results for the different procedures under consideration, and for increasing random perturbations (ε_err). More precisely, if ρ denotes a uniformly distributed random number between 0 and 1, we replace f(x_k) by f(x_k)(1 + ε_err(2ρ − 1)) for each iterate x_k. In the same way, restriction functions are perturbed. The tolerance for approximating gradients, η_m, is set to the machine accuracy in case of ε_err = 0, and to the random noise level otherwise.

The results are surprising and depend heavily on the new non-monotone line search strategy. There are no significant differences in the number of test problems solved between the three different formulae, despite the increasing theoretical approximation orders. Moreover, we are able to solve about 80 % of the test examples in case of extremely noisy function values with at most two correct digits. If we take the number of equivalent function calls into account, we conclude that forward differences are more efficient than higher order formulae.


ε_err    nsucc   nfunc   ngrad   nequ
0        306     33      21      328
10^-12   306     91      23      401
10^-10   304     58      24      398
10^-8    299     84      26      452
10^-6    298     117     30      637
10^-4    275     162     31      568
10^-2    229     220     31      564

Table 2.2.: Test Results for Forward Differences

ε_err    nsucc   nfunc   ngrad   nequ
0        306     34      20      614
10^-12   302     42      23      654
10^-10   297     52      22      663
10^-8    300     53      22      669
10^-6    294     83      23      714
10^-4    293     121     25      776
10^-2    252     166     25      722

Table 2.3.: Test Results for Two-sided Differences

ε_err    nsucc   nfunc   ngrad   nequ
0        306     32      20      1,185
10^-12   302     33      21      1,202
10^-10   298     44      23      1,249
10^-8    296     45      22      1,278
10^-6    298     66      24      1,421
10^-4    294     97      24      1,478
10^-2    249     153     27      1,272

Table 2.4.: Test Results for Fourth-order Formula


             η = ε^(1/2)              η = 10^(-7)
ε        nsucc-nm   nsucc-m      nsucc-nm   nsucc-m
0        306        304          306        304
10^-12   306        302          305        303
10^-10   304        299          299        286
10^-8    299        287          268        196
10^-6    298        255          176        58
10^-4    275        202          112        22
10^-2    229        103          113        17

Table 2.5.: Monotone versus Non-Monotone Line Search

2.3.3. Comparing Monotone versus Non-Monotone Line Search

To investigate the situation in more detail, we proceed from forward differences and summarize the number of successful returns for the non-monotone and the monotone line search, nsucc-nm and nsucc-m. The results are listed in Table 2.5. They clearly indicate the advantage of non-monotone line searches over the monotone ones. Robustness and stability of the SQP method are significantly increased, especially in case of large noise in function evaluations.

A further series of test runs concerns the situation where a user fixes the tolerance η for gradient approximations to η = 10^{-7} despite possibly inaccurate function evaluations. This is an unlikely worst-case scenario and should only happen in a situation where a black-box derivative calculation is used and where a user is not aware of the accuracy by which derivatives are approximated. Whereas nearly all test runs break down with error messages for the monotone line search and large random perturbations, the non-monotone line search is still able to terminate in at least 30 % of all NLPQLP calls, see Table 2.5.

2.4. Program Documentation

Besides the implementation described here, NLP++ provides the possibility to derive your problem from a given base class and thus use an automated loop over StartOptimizer and the function/gradient evaluation; for this possibility see chapter 31. A compact end-to-end sketch of the usage steps below is given at the end of this section.

Usage without DefaultLoop

1. Generate an SqpWrapper object employing SqpWrapper()

2. a) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

b) Put the number of available systems to the object:

Put( const char * pcParam , const int * piNumParSys ) ;

or

Put( const char * pcParam , const int iNumParSys ) ;


where pcParam = "NumParallelSys".

c) Put the values for other parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;

or
Put( const char * pcParam , const int iValue ) ;

where pcParam is one of the following parameters:

– "NumParallelSys" Put number of parallel systems.Redefining the value for the number of parallel systems is only allowed whenoptimization has not yet been started.

– "MaxNumIter" Put maximum number of iterations.

– "MaxNumIterLS" Put maximum number of function calls during line search.This value is only needed when there is only one parallel system. In case ofmore parallel systems the value is ignored.

– "OutputLevel" Put desired output level.

– "FortranOutputLevel" If > 0, the output of the fortran code is written toan additional output file. This output might give more detailed informationabout errors.0: no additional output (default)1: only final convergence analysis2: one line of intermediate results is printed in each iteration.3: more detailed information is printed in each iteration step, e.g. variable,constraint and multiplier values. 4: In addition to 3, merit function andsteplength values are displayed during the line search. NOTE: Constraint andmultiplier values are not displayed if there are more than 1000 constraints.

– "OutputUnit" Put output unit for the Fortran output.

– "OutUnit" Same as ”OutputUnit”

– "StackSizeLS" Stack size for storing merit function values at previous iter-ations for non-monotone line search (e.g. 10). In case of "StackSizeLS" ==0, monotone line search is performed. "StackSizeLS" should not be greaterthan 50.

– "Mode" Parameter for warm starts:

Mode = 0 : Normal execution.

Mode = 1 : The user wants to provide an initial guess for the multipliers in LagrMultiplier and for the Hessian of the Lagrangian function in HessLagr or in LoTriLdlHessLagr and DiagLdlHessLagr.

Mode = 2 : Initial scaling (Oren-Luenberger) after first step.

Mode > 2 : Initial and repeated scaling after Mode steps.

• Put( const char * pcParam , const double * pdValue ) ;

or
Put( const char * pcParam , const double dValue ) ;

where pcParam is one of the following parameters:


– "TermAcc" Put desired final accuracy.

– "AccQP" Put desired accuracy for QP solver.

– "MinStepLength" Put Minimum steplength in case of "NumParallelSys">1. Recommended is any value in the order of the accuracy by which func-tions are computed. The value is needed to compute a steplength reduc-tion factor by MinStepLength1/NumParallelSys−1. If MinStepLength ≤ 0, thenMinStepLength = TermAcc is used.

– "RestartParam" Parameter for initializing a restart in case of an uphillsearch direction by setting the BFGS-update matrix to RestartParam*I,where I denotes the identity matrix. The number of restarts is bounded byMaxNumIterLS. No restart is performed if RestartParam is less than 1. Mustbe greater than 0 (e.g. 100).

– "HessLagr" Put the Hessian of the Lagrangian as full matrix. Only if LQL= true.Note: The matrix must have one additional column and one additional rowand must be stored columnwise.

– "LoTriLdlHessLagr" Put the lower triangular matrix L of the LDLT de-composition of the Hessian of the Lagrangian at the initial iterate. (Withoutthe diagonal elements, which are 1). Only if LQL = false.Note: The matrix must have one additional column and one additional rowand must be stored columnwise.

– "DiagLdlHessLagr" Put the diagonal matrix D of the LDLT decompositionof the Hessian of the Lagrangian at the initial iterate. Only if LQL = false.

– "LagrMultiplier" Put array of multipliers with respect to the initial iter-ate. The first M locations must contain the multipliers of the M nonlinearconstraints, the subsequent N locations the multipliers of the lower bounds,and the final N locations the multipliers of the upper bounds. At an opti-mal solution, all multipliers with respect to inequality constraints should benonnegative.

• Put( const char * pcParam , const char * pcValue ) ;

or
Put( const char * pcParam , const char cValue ) ;

where pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;

or
Put( const char * pcParam , const bool bValue ) ;

where pcParam is one of the following parameters:

– "LQL" If LQL is set true, the quadratic programming subproblem is solvedwith a full positive definite quasi-Newton matrix. Otherwise,a Cholesky de-composition is performed and updated, so that the subproblem matrix con-tains only a triangular factor.

– "OpenNewOutputFile" Determines whether a new output file must be createdby the optimizer. If false, the output is appended to an existing file. This


is the case if the optimizer is used as a subproblem solver (e.g. in a multiple objective optimizer). Usually, this parameter does not have to be changed.

d) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

e) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values and provide initial guess.

a) Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1.

or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must point to arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set.

If no bounds are set, these will be -1E30 and +1E30 by default.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess )

where pdInitialGuess must be a double array of length equal to the number of variables, or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1.

Before setting an initial guess, the number of design variables must be set.

c) If you want to provide an initial guess for the Hessian of the Lagrangian and the Lagrange multipliers, use Put( const char * pcParam , const double * pdValue ) to set the values. Use Put( const char * pcParam , const int * piValue ) to set parameter "Mode" = 1.

4. Start Sqp-Routine: StartOptimizer()

5. Check status of the object using GetStatus( int & iStatus) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.


• if iStatus equals EvalFuncId(): SqpWrapper needs new function values → 6b Providing new function values. After passing these values to the SqpWrapper object, go to 4.

• if iStatus equals EvalGradId(): SqpWrapper needs new values for gradients → 6c Providing new gradient values. After passing these values to the SqpWrapper object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const
Returns the number of constraints.

GetNumDv( int & iNumDesignVar ) const
Returns the number of design variables.

Get( const char * pcParam, int & iNumParallelSys )
Returns the number of parallel systems when pcParam = "NumParallelSys".

ii. GetDesignVarVec( const double * & pdPointer, const int i ) const
Returns a pointer to the i'th design variable vector (default: i = 0).

GetDesignVar( const int iVarIdx , double & pdPointer, const int iVectorIdx ) const
Returns the value of the iVarIdx'th design variable of the iVectorIdx'th design variable vector.

b) Providing new function values
Function values for the objective as well as for the constraints have to be calculated for all optimization vectors provided by SqpWrapper.
Note: When the status is equal to EvalFuncId() for the first time, the functions need to be evaluated only for the 0'th parallel system.
For access to the design variable vectors see 6(a)ii. After calculating the values, these must be passed to the SqpWrapper object using:

• PutConstrValVec( const double * const pdConstrVals,

const int iParSysIdx ),for iParSysIdx=0,...,NumberOfParallelSystems - 1

where pdConstrVals is pointing on an array containing the values of each con-straint at the iParSysIdx’th design variable vector provided by SqpWrapper.

• PutObjVal( const double dObjVal, const int iObjFunIdx,

const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1

with iObjFunIdx = 0 and dObjVal defining the value of the objective function at the iParSysIdx'th design variable vector provided by SqpWrapper.

Alternatively you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue,

const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1

and iConstrIdx = 0,...,NumberOfConstraints - 1


c) Providing new gradient values
Gradients must be calculated for the objective and the active constraints at the current design variable vector.

Get( const char * pcParam, int * & piActPtr )

with pcParam = "ActConstr" sets piActPtr to an array of length equal to the number of constraints, whose entries are 0 for an inactive constraint and 1 for an active constraint. Note: piActPtr must be NULL on input.

For access to the design variable vector see 6(a)ii with iParSysIdx = 0.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx,

const double dDerivativeValue )

for iVariableIdx = 0,...,NumberOfVariables - 1

and iConstrIdx = 0,...,NumberOfConstraints - 1

where dDerivativeValue is defined as the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th variable.

Alternatively, you can use

PutGradConstr( const int iConstrIdx, const double * pdGradient ),

for iConstrIdx = 0,...,NumberOfConstraints - 1

where pdGradient is a pointer to the gradient of the iConstrIdx'th constraint.

For the gradient of the objective you can use

• PutDerivObj( const int iVariableIdx,

const double dDerivativeValue,

const int iFunIdx ),
with iFunIdx = 0 (default), for iVariableIdx = 0,...,NumberOfVariables - 1

where dDerivativeValue is defined as the derivative of the objective with respect to the iVariableIdx'th variable at the current design variable vector.

• PutGradObj( const double * pdGradient, const int iFunIdx ),
where iFunIdx = 0 (default) and pdGradient is a pointer to the gradient of the objective at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const

Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const

Returns the value of the iConstrIdx’th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx , const int iVariableIdx,

double & dDerivativeValue ) const

Returns the value of the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th design variable at the last solution vector.


• GetDerivObj( const int iVariableIdx, double & dDerivativeValue,

const int iFunIdx ) const

with iFunIdx = 0 (default) returns the value of the derivative of the objective with respect to the iVariableIdx'th design variable at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const

with iFunIdx = 0 (default) returns the value of the objective function at the last solution vector.

• GetDesignVar( const int iVariableIdx,

double & dValue,

const int iParSysIdx ) const

with iParSysIdx = 0 (default) returns the value of the iVariableIdx'th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer,

const int iParSysIdx) const

with iParSysIdx = 0 (default) returns a pointer to the last solution vector.
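The reverse-communication cycle of steps 4 to 6 can be driven by a small loop. The following sketch is illustrative only: it assumes a fully initialized SqpWrapper object and two hypothetical user routines, MyFuncEval and MyGradEval, that pass function and gradient values to the object as described in 6b and 6c.

#include "SqpWrapper.h"

// Hypothetical user routines: they evaluate the problem functions and
// gradients and pass the results to the wrapper (see 6b and 6c above).
void MyFuncEval( SqpWrapper * pSqp );
void MyGradEval( SqpWrapper * pSqp );

// Illustrative driver for the reverse-communication loop of steps 4 to 6.
void RunOptimization( SqpWrapper * pSqp )
{
    int iStatus = 0;

    for ( ;; )
    {
        pSqp->StartOptimizer();        // step 4
        pSqp->GetStatus( iStatus );    // step 5

        if ( iStatus == pSqp->EvalFuncId() )
            MyFuncEval( pSqp );        // step 6b: put new function values
        else if ( iStatus == pSqp->EvalGradId() )
            MyGradEval( pSqp );        // step 6c: put new gradient values
        else
            break;                     // SolvedId() or an error code > 0
    }
}

After the loop terminates, the results can be retrieved with the output methods of step 7, or the error situation can be analyzed as discussed below.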

Some of the termination reasons depend on the accuracy used for approximating gradients. If we assume that all functions and gradients are computed within machine precision and that the implementation is correct, there remain only the following possibilities that could cause an error message:

1. The termination parameter TermAcc is too small, so that the numerical algorithm plays around with round-off errors without being able to improve the solution. In particular, the Hessian approximation of the Lagrangian function becomes unstable in this case. A straightforward remedy is to restart the optimization cycle with a larger stopping tolerance.

2. The constraints are contradictory, i.e., the set of feasible solutions is empty. There is no way to find out whether a general nonlinear and non-convex set possesses a feasible point or not. Thus, the nonlinear programming algorithm will proceed until running into one of the mentioned error situations. In this case, the correctness of the model must be checked very carefully.

3. The constraints are feasible, but some of them are degenerate, for example because some of the constraints are redundant. One should know that SQP algorithms assume the satisfaction of the so-called constraint qualification, i.e., that the gradients of the active constraints are linearly independent at each iterate and in a neighborhood of an optimal solution. In this situation, it is recommended to check the formulation of the model constraints.

However, some of the error situations can also occur if, because of wrong or inaccurate gradients, the quadratic programming subproblem does not yield a descent direction for the underlying merit function. In this case, one should try to improve the accuracy of the function evaluations, scale the model functions in a proper way, or start the algorithm from other initial values.

Since Version 2.1, NLPQLP returns the best iterate obtained. In case of successful termination (Status = 0), this is always the last iterate; in case of a non-successful return (Status > 0), it may be a better previous iterate. Success is measured by objective function value and constraint violation. Note that the output of constraint and multiplier values is suppressed for more than 10,000 constraints.

2.5. Example - using parallel line search in SqpWrapper

An example for the use of the active set strategy can be found in chapter 10. As an example for the use of parallel line search we solve the following problem:

$$\begin{array}{ll}
\mbox{Minimize} & F(x) = -x_0 \, x_1 , \\
\mbox{s.t.} & G_0(x) = x_0^2 + x_1^2 \ge 0 , \\
 & G_1(x) = 1 - x_0^2 - x_1^2 \ge 0 , \\
 & 0 \le x_i \le 10^6 , \quad i = 0, 1 .
\end{array}$$

We implement the problem using DefaultLoop():

The file SqpExample.h contains the class definition.

#ifndef SQP_EXAMPLE_H_INCLUDED
#define SQP_EXAMPLE_H_INCLUDED

#include "OptProblem.h"

class SqpExample : public OptProblem
{
public:

    SqpExample();

    int FuncEval( bool bGradApprox = false );

    int GradEval();

    int SolveOptProblem();
};

#endif

The file SqpExample.cpp contains the implementation:

#include "SqpExample.h"
#include <iostream>
#include <cmath>
#include <cstdlib>
#include "SqpWrapper.h"

using std::cout;
using std::endl;
using std::cin;

SqpExample::SqpExample()
{
    cout << endl << "---- this is Example 10 ----" << endl << endl;
    cout << endl << "--- solved by SqpWrapper ---" << endl << endl;

    m_pOptimizer = new SqpWrapper();
}

int SqpExample::FuncEval( bool /*bGradApprox*/ )
{
    const double * dX;
    int iNumParSys;
    double * pdFuncVals;
    int iNumConstr;

    m_pOptimizer->GetNumConstr( iNumConstr );

    pdFuncVals = new double[ iNumConstr + 1 ];

    m_pOptimizer->Get( "NumParallelSys", iNumParSys );

    // The for-loop simulates a distributed system where
    // the functions can be evaluated at all
    // design variable vectors simultaneously.
    for ( int i = 0; i < iNumParSys; i++ )
    {
        m_pOptimizer->GetDesignVarVec( dX, i );

        // 0th constraint
        pdFuncVals[0] = dX[0] * dX[0] + dX[1] * dX[1];

        // 1st constraint
        pdFuncVals[1] = 1.0 - dX[0] * dX[0] - dX[1] * dX[1];

        // objective
        pdFuncVals[2] = - dX[0] * dX[1];

        m_pOptimizer->PutObjVal( pdFuncVals[ iNumConstr ], 0, i );
        m_pOptimizer->PutConstrValVec( pdFuncVals, i );
    }

    delete [] pdFuncVals;

    return EXIT_SUCCESS;
}

int SqpExample::GradEval()
{
    const double * dX;

    m_pOptimizer->GetDesignVarVec( dX );

    m_pOptimizer->PutDerivObj( 0, - dX[1] );
    m_pOptimizer->PutDerivObj( 1, - dX[0] );

    m_pOptimizer->PutDerivConstr( 0, 0, 2.0 * dX[0] );
    m_pOptimizer->PutDerivConstr( 0, 1, 2.0 * dX[1] );

    m_pOptimizer->PutDerivConstr( 1, 0, ( -2.0 ) * dX[0] );
    m_pOptimizer->PutDerivConstr( 1, 1, ( -2.0 ) * dX[1] );

    return EXIT_SUCCESS;
}

int SqpExample::SolveOptProblem()
{
    int iError = 0;

    // Suppose we have a distributed system which can evaluate
    // functions at 10 design variable vectors simultaneously.
    m_pOptimizer->Put( "NumParallelSys", 10 );

    // The following parameters do not have to be set,
    // there are default values given in the optimizer.
    // However, for fine tuning it may be necessary to set them.
    int iMaxNumIter = 500;
    int iMaxNumIterLS = 30;
    double dTermAcc = 1.0E-6;

    // Parameters can be set using the address of the value ...
    m_pOptimizer->Put( "MaxNumIter", & iMaxNumIter );
    // ... a variable containing the value ...
    m_pOptimizer->Put( "MaxNumIterLS", iMaxNumIterLS );
    m_pOptimizer->Put( "TermAcc", dTermAcc );
    // ... or the value itself.
    m_pOptimizer->Put( "OutputLevel", 2 );

    // We define the optimization problem:
    m_pOptimizer->PutNumDv( 2 );
    m_pOptimizer->PutNumIneqConstr( 2 );
    m_pOptimizer->PutNumEqConstr( 0 );

    double LB[2] = { 0.0, 0.0 };
    double UB[2] = { 10E6, 10E6 };
    double IG[2] = { 0.8, 0.05 };

    m_pOptimizer->PutUpperBound( UB );
    m_pOptimizer->PutLowerBound( LB );
    m_pOptimizer->PutInitialGuess( IG );

    // Start the optimization
    iError = m_pOptimizer->DefaultLoop( this );

    // if an error occurred, report it ...
    if ( iError != EXIT_SUCCESS )
    {
        cout << "Error " << iError << "\n";
        return iError;
    }

    // ... else report the result:
    const double * dX;

    m_pOptimizer->GetDesignVarVec( dX );

    int iNumDesignVar;

    m_pOptimizer->GetNumDv( iNumDesignVar );

    for ( int iDvIdx = 0; iDvIdx < iNumDesignVar; iDvIdx++ )
    {
        cout << "dX[" << iDvIdx << "] = " << dX[iDvIdx] << endl;
    }

    return EXIT_SUCCESS;
}
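A minimal driver for this example might look as follows. The sketch is not part of the manual's listings; it merely assumes that SolveOptProblem() performs the complete DefaultLoop() run shown above.

#include "SqpExample.h"

int main()
{
    // Construct the example problem and run the optimization.
    SqpExample problem;
    return problem.SolveOptProblem();
}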


2.6. Conclusions

We present a modification of an SQP algorithm designed for execution in a parallel computing environment (SPMD), where a non-monotone line search is applied in error situations. Numerical results indicate stability and robustness for a set of 306 standard test problems. It is shown that not more than 6 parallel function evaluations per iteration are required for conducting the line search. A significant performance improvement is achieved by the non-monotone line search, especially in case of noisy function values and numerical differentiation. There are no differences in the number of test problems solved between forward differences, two-sided differences, and the fourth-order formula, not even in case of severe random perturbations. With the new non-monotone line search, we are able to solve about 80 % of the test examples in case of extremely noisy function values with at most two correct digits and forward differences for derivative calculations.


3. NlpqlbWrapper - Solving Problems With Very Many Constraints

The Fortran subroutine NLPQLB by Schittkowski solves smooth nonlinear programming problems with a large number of constraints, but a moderate number of variables. The underlying algorithm applies an active set method proceeding from a given bound mw for the maximum number of expected active constraints. A quadratic programming subproblem is generated with mw linear constraints, the so-called working set, which are internally exchanged from one iterate to the next. Only for active constraints, i.e., a certain subset of the working set, must new gradient values be computed. The line search takes the active constraints into account. In case of computational errors, as for example caused by inaccurate function or gradient evaluations, a non-monotone line search is activated. Numerical results are included for some academic test problems, which show that nonlinear programs with up to 200,000,000 nonlinear constraints can be solved efficiently. The amazing observation is that despite a large number of nearly dependent active constraints, the underlying SQP code converges very fast. The usage of the code is documented and illustrated by an example (chapter 10). Sections 3.1, 3.2, 3.3 and 3.5 are taken from Schittkowski [137].

3.1. Introduction

We consider the general optimization problem to minimize an objective function under nonlinear equality and inequality constraints,

$$x \in \mathbb{R}^n : \quad \begin{array}{ll}
\min f(x) & \\
g_j(x) = 0 , & j = 1, \ldots, m_e , \\
g_j(x) \ge 0 , & j = m_e + 1, \ldots, m , \\
x_l \le x \le x_u , &
\end{array} \qquad (3.1)$$

where x is an n-dimensional parameter vector. It is assumed that all problem functions f(x) and g_j(x), j = 1, ..., m, are continuously differentiable on the whole R^n. To simplify the notation, we omit the upper and lower bounds and get a problem of the form

$$x \in \mathbb{R}^n : \quad \begin{array}{ll}
\min f(x) & \\
g_j(x) = 0 , & j = 1, \ldots, m_e , \\
g_j(x) \ge 0 , & j = m_e + 1, \ldots, m .
\end{array} \qquad (3.2)$$

We assume now that the nonlinear programming problem possesses a very large number of nonlinear inequality constraints on the one hand, but a much lower number of variables. A typical situation is the discretization of an infinite number of constraints, as indicated by the following case studies.


1. Semi-infinite optimization: Constraints must be satisfied for all y ∈ Y, where y is an additional variable and Y ⊂ R^r,

$$x \in \mathbb{R}^n : \quad \min f(x) \quad \mbox{s.t.} \quad g(x, y) \ge 0 \ \mbox{ for all } y \in Y . \qquad (3.3)$$

Here we assume for simplicity that there is only one scalar restriction of inequality type. If we discretize the set Y, we get a standard nonlinear programming problem, but with a large number of constraints depending on the desired accuracy,

$$x \in \mathbb{R}^n : \quad \min f(x) \quad \mbox{s.t.} \quad g(x, y_j) \ge 0 , \ j = 1, \ldots, m , \qquad (3.4)$$

where y_j is a discretization of Y, e.g., $y_j = (j-1)/(m-1)$ for j = 1, ..., m in case of Y = [0, 1]; see the evaluation sketch after this list of case studies.

2. Min-max optimization: We minimize the maximum of a function f depending now on two variables x ∈ R^n and y ∈ R^r,

$$\min_{x \in X} \ \max_{y \in Y} \ f(x, y) \qquad (3.5)$$

with suitable subsets X ⊂ R^n and Y ⊂ R^r, respectively. (3.5) is easily transformed into an equivalent standard nonlinear programming problem of the form

$$x \in X , \ t \in \mathbb{R} : \quad \min t \quad \mbox{s.t.} \quad f(x, y) \le t \ \mbox{ for all } y \in Y . \qquad (3.6)$$

Again, we get a semi-infinite optimization problem provided that Y is an infinite set.

3. L∞-approximation: The situation is similar to min-max optimization, but we want to minimize the maximum of the absolute values of a given set of functions,

$$\min_{x \in \mathbb{R}^n} \ \max_{i = 1, \ldots, r} \ |f_i(x)| . \qquad (3.7)$$

Typically, the problem describes the approximation of a nonlinear function by a simpler one, i.e., f_i(x) = f(t_i) − p(t_i, x). In this case, f(t) is a given function depending on a variable t ∈ R and p(t, x) is a member of a class of approximating functions, e.g., a polynomial in t with coefficients x ∈ R^n. The problem is non-differentiable, but can be transformed into a smooth one assuming that all functions f_i(x) are smooth,

$$x \in \mathbb{R}^n , \ t \in \mathbb{R} : \quad \begin{array}{ll}
\min t & \\
f_i(x) \le t , & i = 1, \ldots, r , \\
f_i(x) \ge -t , & i = 1, \ldots, r .
\end{array} \qquad (3.8)$$

4. Optimal control: The goal is to determine a control function u(t) depending on a time variable t, which has to minimize a cost criterion subject to a state equation in the form of a system of differential equations and additional restrictions on the state and control variables that are to be satisfied for all time values under consideration. If the control problem is discretized in a proper way, we get a nonlinear programming problem with a large number of constraints.


5. Mechanical structural optimization: Suppose we want to minimize the weight of a mechanical structure. Typical constraints in this case are bounds for allowable stresses or displacements, among very many other possible types of restrictions. After modeling the structure by a finite element technique, we get a standard nonlinear programming problem. If certain restrictions are to be taken into account for each element of the structure, we get optimization problems with a large number of constraints. Moreover, a design engineer may wish to define different load cases, where part of the constraints is duplicated subject to some other data. In Kneppe [76] it is reported that the optimization of the frame of an airplane fuselage led to an optimization problem with 187 variables and 92,829 nonlinear constraints.
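As announced in the first case study, the following minimal sketch evaluates the discretized constraints of (3.4) for Y = [0, 1]. The scalar constraint function g is a hypothetical placeholder, not one of the test problems discussed later.

#include <vector>

// Hypothetical scalar semi-infinite constraint g(x, y) >= 0 for all y in Y.
static double g( const std::vector<double> & x, double y )
{
    return x[0] + x[1] * y - y * y;    // illustrative placeholder only
}

// Evaluate the m discretized constraints g(x, y_j) of (3.4)
// on the equidistant grid y_j = (j - 1)/(m - 1) of Y = [0, 1].
std::vector<double> DiscretizedConstraints( const std::vector<double> & x, int m )
{
    std::vector<double> vals( m );
    for ( int j = 1; j <= m; ++j )
    {
        double y = double( j - 1 ) / double( m - 1 );
        vals[ j - 1 ] = g( x, y );
    }
    return vals;
}

Each grid point contributes one nonlinear constraint, so the finer the grid, the larger the resulting nonlinear program.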

The examples mentioned above motivate the necessity to develop special methods for problems with very many restrictions. The total number of constraints is so large that either the linearized constraints cannot be stored in memory or they slow down the solution process unnecessarily. Although we can expect that most of the constraints are redundant, we cannot predict a priori which constraints are the important ones, i.e., probably active at the optimal solution, and which are not.

Sequential quadratic programming methods construct a sequence of quadratic programming subproblems by approximating the Lagrangian function

$$L(x, u) := f(x) - \sum_{j=1}^{m} u_j g_j(x) \qquad (3.9)$$

quadratically and by linearizing the constraints. The resulting quadratic programming subproblem

$$d \in \mathbb{R}^n : \quad \begin{array}{ll}
\min \frac{1}{2} d^T B_k d + \nabla f(x_k)^T d & \\
\nabla g_j(x_k)^T d + g_j(x_k) = 0 , & j = 1, \ldots, m_e , \\
\nabla g_j(x_k)^T d + g_j(x_k) \ge 0 , & j = m_e + 1, \ldots, m
\end{array} \qquad (3.10)$$

can be solved by any available black-box algorithm, at least in principle. Here, x_k denotes a current iterate and B_k an estimate of the Hessian of the Lagrangian function (3.9) updated by the BFGS quasi-Newton method. However, if m is large, the Jacobian might become too big to be stored in the computer memory.

The basic idea is to proceed from a user-provided value mw with

$$n \le m_w \le m ,$$

by which we estimate the maximum number of expected active constraints. Only quadratic programming subproblems with mw linear constraints are created, which require less storage and allow a faster numerical solution. Thus, one has to develop a strategy to decide which constraint indices are added to a working set of size mw,

$$W := \{ j_1, \ldots, j_{m_w} \} \subset \{ 1, \ldots, m \} ,$$

and which ones have to leave the working set. It is recommended to keep as many constraints as possible in the working set, i.e., to choose mw as large as possible, since non-active constraints could also have an important influence on the computation of the search direction.


It is, however, possible that too many constraints are violated at a starting point, even if it is known that the optimal solution possesses only very few active constraints. To avoid an unnecessary blow-up of the working set, it is also possible to extend the given optimization problem by an additional artificial variable x_{n+1} which, if chosen sufficiently large at start, decreases the number of active constraints. (3.1) or (3.2), respectively, is then replaced by

$$x \in \mathbb{R}^{n+1} : \quad \begin{array}{ll}
\min f(x) + \rho x_{n+1} & \\
g_j(x) = 0 , & j = 1, \ldots, m_e , \\
g_j(x) + x_{n+1} \ge 0 , & j = m_e + 1, \ldots, m , \\
x_{n+1} \ge 0 . &
\end{array} \qquad (3.11)$$

However, this transformation does not make sense in cases where the original problem is already transformed in a similar way, as for example the min-max problem (3.6). Also, the choice of the penalty parameter ρ and the starting value for x_{n+1} is crucial. A too rapid decrease of x_{n+1} to zero must be prevented to avoid too many active constraints, which is difficult to achieve in general. But if adapted to a specific situation, the transformation works very well and can be extremely helpful.

There is another motivation for considering active sets. Since we want to solve problems with a large number of constraints, many of them are probably redundant. But in any case, we have to require the evaluation of gradients for all of them in the working set. Thus, an additional active set strategy is proposed with the aim to reduce the number of gradient evaluations and to calculate gradients at a new iterate only for a certain subset of estimated active constraints. The underlying SQP algorithm is described in Schittkowski [120], and the presented active set approach for solving problems with a large number of constraints in Schittkowski [126].

Active set strategies are widely discussed in the nonlinear programming literature and have been implemented in most of the available codes. A computational study for linear constraints was conducted as early as the 70's, see Lenard [80], and Google finds 267,000 hits for active set strategy nonlinear programming. It is beyond the scope of this chapter to give a review. Some of these strategies are quite complex; a typical example is the one included in the KNITRO package for large scale optimization, see Byrd, Gould, Nocedal, and Waltz [19], based on linear programming and equality constrained subproblems.

From the technical point of view, NLPQLB is implemented in the form of a Fortran subroutine, where function and gradient values are passed through reverse communication, see chapter 2 or Schittkowski [132]. NLPQLB calls the SQP code NLPQLP, see again chapter 2 or [132], with exactly mw constraints, where the constraints of the working set are changed from one iteration to the next.

The modified SQP algorithm is described in Section 3.2 in detail. Although some heuristics are included which prevent a rigorous convergence analysis, at least the most important sufficient decrease property is available, which shows that the algorithm is well-defined. Some numerical test results based on a few academic examples are found in Section 3.3, where the number of nonlinear constraints is very large, i.e., up to 200,000,000. More details of the software implementation and the usage of the code NLPQLB are presented in Section 3.4. Chapter 10 contains a simple example to become familiar with the software.


3.2. An Active-Set Sequential Quadratic Programming Method

Since we want to modify the sequential quadratic programming algorithm presented in Schittkowski [120, 123], we use the notation introduced there and omit details which are not essential for understanding the basic idea. A typical dense SQP code takes all constraints into account and requires real working arrays of length O(n² + nm), see chapter 2 or Schittkowski [132]. Here n is the number of variables and m the number of constraints of (3.1) without bounds. In particular, we need mn double precision real numbers to store the gradients of the constraint functions for the quadratic programming subproblem.

We assume now that n is of reasonable size, say below 100, but that m is very large compared to n, say 1,000,000 or even more. Then either the available memory is insufficient to store the total gradient matrix of size nm, or the large set of linear constraints in the subproblem slows down the quadratic programming solver. It is furthermore assumed that there are no sparsity patterns in the Jacobian matrix which could be exploited. Thus, we replace m by mw, where mw is a user-provided number depending on the available memory and the expected number of active constraints, and which satisfies

$$n \le m_w \le m .$$

It is supposed that a double precision array of size n·mw can be addressed in memory. Moreover, it has to be guaranteed that the active-set algorithm is identical with a standard SQP method if m = mw.

We want to formulate smaller quadratic programming subproblems with mw linear constraints. If x_k denotes an iterate of the algorithm, v_k the corresponding multiplier estimate, and B_k a positive definite estimate of the Hessian of the Lagrangian function (3.9), we solve quadratic programs of the form

$$d \in \mathbb{R}^n , \ \delta \in \mathbb{R} : \quad \begin{array}{ll}
\min \frac{1}{2} d^T B_k d + \nabla f(x_k)^T d + \frac{1}{2} \sigma_k^2 \delta^2 , & \\
\nabla g_j(x_k)^T d + (1 - \delta) g_j(x_k) \left\{ \begin{array}{c} = \\ \ge \end{array} \right\} 0 , & j \in J_k^\star , \\
\nabla g_j(x_{j(k)})^T d + g_j(x_k) \ge 0 , & j \in \bar{K}_k^\star .
\end{array} \qquad (3.12)$$

An additional variable δ is introduced to prevent infeasible linear constraints. To get a descent direction in this case, it may happen that another internal loop must be entered with an increasing penalty term for the additional variable, see Schittkowski [120] for details. Note that the matrix B_k is positive definite, so that the solution of (3.12) is always unique.

The index set $J_k^\star$ is called the set of active constraints and is defined by

$$J_k^\star := \{ 1, \ldots, m_e \} \cup \{ j : m_e < j \le m , \ g_j(x_k) < \varepsilon \ \mbox{ or } \ v_j^{(k)} > 0 \} . \qquad (3.13)$$

It is assumed that $|J_k^\star| \le m_w$, i.e., that all active constraints are part of the working set

$$W_k := J_k^\star \cup \bar{K}_k^\star \qquad (3.14)$$

with mw elements. The working set contains the active constraints plus a certain subset of the non-active ones, $\bar{K}_k^\star \subset K_k^\star$, defined by

$$K_k^\star := \{ 1, \ldots, m \} \setminus J_k^\star . \qquad (3.15)$$


Here $v_k = (v_1^{(k)}, \ldots, v_m^{(k)})^T$ is the current multiplier estimate and ε a user-provided error tolerance. The indices j(k) in (3.12) denote previously computed gradients of constraints. Their definition will become clear when investigating the algorithm in more detail. The idea is to recalculate only gradients of active constraints and to fill the remaining rows of the constraint matrix with previously computed ones.
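In code, the active-set test (3.13) amounts to a simple scan over constraint values and multiplier estimates. The following sketch is illustrative only and not taken from NLPQLB; it assumes that the first numEq entries belong to the equality constraints.

#include <vector>

// Illustrative computation of the active set J_k* from (3.13):
// j is active if it is an equality constraint, nearly violated
// (g[j] < eps), or carries a positive multiplier estimate (v[j] > 0).
std::vector<int> ActiveSet( const std::vector<double> & g,
                            const std::vector<double> & v,
                            int numEq, double eps )
{
    std::vector<int> J;
    for ( int j = 0; j < (int) g.size(); ++j )
        if ( j < numEq || g[ j ] < eps || v[ j ] > 0.0 )
            J.push_back( j );
    return J;
}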

We have to assume that there are not more than mw active constraints throughout the algorithm. But we do not support the idea of including some kind of automated phase I procedure to project an iterate back to the feasible region whenever this assumption is violated. Instead, there are safeguards in the line search algorithm to prevent this situation. If, for example at a starting point, more than mw constraints are active, it is preferred to stop the algorithm and to leave it to the user either to change the starting point or to establish an outer constraint restoration procedure depending on the problem structure.

After solving the quadratic programming subproblem (3.12), we get a search direction d_k and a corresponding multiplier vector u_k. The new iterate is obtained by

$$x_{k+1} := x_k + \alpha_k d_k , \qquad v_{k+1} := v_k + \alpha_k (u_k - v_k) \qquad (3.16)$$

for approximating the optimal solution $x^\star \in \mathbb{R}^n$ of (3.2) and the corresponding optimal multiplier vector $u^\star \in \mathbb{R}^m$. The steplength parameter $\alpha_k$ is the result of an additional line search sub-algorithm, by which we want to achieve a sufficient decrease of an augmented Lagrangian merit function

$$\psi_r(x, v) := f(x) - \sum_{j \in J(x,v)} \left( v_j g_j(x) - \frac{1}{2} r_j g_j(x)^2 \right) - \frac{1}{2} \sum_{j \in K(x,v)} v_j^2 / r_j . \qquad (3.17)$$

The index sets J(x, v) and K(x, v) are defined by

$$\begin{array}{l}
J(x, v) := \{ 1, \ldots, m_e \} \cup \{ j : m_e < j \le m , \ g_j(x) \le v_j / r_j \} , \\
K(x, v) := \{ 1, \ldots, m \} \setminus J(x, v) ,
\end{array} \qquad (3.18)$$

see Schittkowski [120]. The corresponding penalty parameters $r_k := (r_1^k, \ldots, r_m^k)^T$, which control the degree of constraint violation, must be chosen carefully to guarantee a sufficient descent direction of the merit function,

$$\phi_{r_k}(\alpha_k) \le \phi_{r_k}(0) + \alpha_k \mu \, \phi'_{r_k}(0) , \qquad (3.19)$$

see Schittkowski [120], Ortega and Rheinboldt [100], or Wolfe [166] in a more general setting, where

$$\phi_{r_k}(\alpha) := \psi_{r_k}\!\left( \begin{pmatrix} x_k \\ v_k \end{pmatrix} + \alpha \begin{pmatrix} d_k \\ u_k - v_k \end{pmatrix} \right) \qquad (3.20)$$

and

$$\phi'_{r_k}(0) = \nabla \psi_{r_k}(x_k, v_k)^T \begin{pmatrix} d_k \\ u_k - v_k \end{pmatrix} . \qquad (3.21)$$

An additional requirement is that at each intermediate step of the line search procedure at most mw constraints are active. If this condition is violated, the steplength is further reduced until it is satisfied. From the definition of our index sets, we have

$$J_k^\star \supset J_k := J(x_k, v_k) . \qquad (3.22)$$


The starting point x_0 is crucial from the viewpoint of numerical efficiency and must be predetermined by the user. It has to satisfy the assumption that not more than mw constraints are active, i.e., that $J_0 \subset W_0$. The remaining indices of $W_0$ are to be set in a suitable way and must not overlap with the active ones. Also $W_0$ must be provided by the user, to give the possibility to exploit pre-existing knowhow about the position of the optimal solution and its active constraints.

For all other parameters, suitable default values can be provided, e.g., $v_0 = 0$ for the initial multiplier guess, $B_0 = I$ for the initial estimate of the Hessian matrix of the Lagrangian function of (3.1), and $r_0 = (1, \ldots, 1)^T$ for the initial penalty parameters. In general it is assumed that $v_j^0 \ge 0$ for j = m_e + 1, ..., m, that $B_0$ is positive definite, and that all coefficients of the vector $r_0$ are positive.

The basic idea of the algorithm can be described in the following way: We determine a working set $W_k$ and perform one step of a standard SQP algorithm with respect to the nonlinear programming problem with mw nonlinear constraints. Then the working set is updated and the whole procedure is repeated.

One particular advantage is that the numerical convergence conditions for the reduced problem are applicable for the original one as well, since all constraints not in the working set $W_k$ are inactive, i.e., satisfy $g_j(x_k) > \varepsilon$ for $j \in \{ 1, \ldots, m \} \setminus W_k$.

The line search procedure described in Schittkowski [120] can be used to determine a steplength parameter $\alpha_k$; it combines an Armijo-type steplength reduction with a quadratic interpolation of $\phi_k(\alpha)$. The proposed approach guarantees theoretical convergence results, is very easy to implement, and works satisfactorily in practice. But in our case we want to achieve the additional requirement that all intermediate iterates $\alpha_{k,i}$ or $x_k + \alpha_{k,i-1} d_k$, respectively, do not possess more than mw violated constraints. By introducing an additional loop reducing the steplength by a constant factor, it is always possible to guarantee this condition. An artificial penalty term consisting of the violated constraints is added to the objective function. This modification of the line search procedure prevents iterates of the modified SQP method that violate too many constraints.

BFGS updates are a standard technique in nonlinear programming and yield excellent convergence results both from the theoretical and the numerical point of view. The modification to guarantee positive definite matrices $B_k$ was proposed by Powell [105]. The update is performed with respect to the corrections $x_{k+1} - x_k$ and $\nabla_x L(x_{k+1}) - \nabla_x L(x_k)$.

Since a new restriction is included in the working set $W_{k+1}$ only if it belongs to $J_{k+1}^\star$, we always get new and current gradients in the quadratic programming subproblem (3.12). But gradients can also be reevaluated for any larger set, e.g., $W_{k+1}$. In this case we can expect an even better performance of the algorithm.

The proposed modification of the standard SQP technique is straightforward and easy to analyze. We want to stress that its practical performance depends mainly on the heuristics used to determine the working set $W_k$. A first idea could be to take those constraints out of the working set which have the largest function values. However, the numerical size of a constraint depends on its internal scaling. In other words, we cannot conclude from a large restriction function value that the constraint is probably inactive.

To get a decision on constraints in the working set $W_k$ that is as independent of the scaling of the functions as possible, we propose the following rules (an illustrative sketch of the first rule follows the list):

• Among the constraints feasible at $x_k$ and $x_{k+1}$, keep those in the working set that were violated during the line search. If there are too many of them according to some given constant, select constraints for which

$$\frac{g_j(x_{k+1}) - \varepsilon}{g_j(x_{k+1}) - g_j(x_k + \alpha_{k,i-1} d_k)}$$

is minimal. The decision whether a constraint is feasible or not is made with respect to the given tolerance ε.

• In addition, keep the restriction in the working set for which $g_j(x_k + d_k)$ is minimal.

• Take those feasible constraints out of the working set which are the oldest ones with respect to their successive number of iterations in the working set.
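As announced above, the first rule can be sketched in code. The helper below is illustrative only and not taken from NLPQLB; gNew is assumed to hold $g_j(x_{k+1})$ and gTrial to hold $g_j(x_k + \alpha_{k,i-1} d_k)$ for each constraint.

#include <vector>
#include <limits>

// Illustrative selection by the scale-independent ratio of the first rule:
// among the candidate indices, return the one minimizing
// (g_j(x_{k+1}) - eps) / (g_j(x_{k+1}) - g_j(x_k + alpha_{k,i-1} d_k)).
int SelectByRatio( const std::vector<int> & candidates,
                   const std::vector<double> & gNew,
                   const std::vector<double> & gTrial,
                   double eps )
{
    int jBest = -1;
    double best = std::numeric_limits<double>::max();
    for ( int j : candidates )
    {
        double denom = gNew[ j ] - gTrial[ j ];
        if ( denom == 0.0 )
            continue;                      // ratio undefined, skip
        double ratio = ( gNew[ j ] - eps ) / denom;
        if ( ratio < best )
        {
            best  = ratio;
            jBest = j;
        }
    }
    return jBest;
}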

Under the assumptions mentioned so far, we can prove that

$$\phi'_{r_k}(0) = \nabla \psi_{r_k}(x_k, v_k)^T \begin{pmatrix} d_k \\ u_k - v_k \end{pmatrix} < -\frac{1}{4} \gamma \| d_k \|^2 \qquad (3.23)$$

for all k and a positive constant γ, see Schittkowski [126].

It is possible that we get a descent direction of the merit function, but that $\phi'_r(0)$ is extremely small. To avoid an interruption of the whole iteration process, the idea is to repeat the line search with another stopping criterion. Instead of testing (3.23), we accept a stepsize $\alpha_k$ as soon as the inequality

$$\phi_{r_k}(\alpha_k) \le \max_{k - p(k) \le j \le k} \phi_{r_j}(0) + \alpha_k \mu \, \phi'_{r_k}(0) \qquad (3.24)$$

is satisfied, where p(k) is a predetermined parameter with p(k) = min{k, p}, p a given tolerance. Thus, we allow an increase of the reference value $\phi_{r_k}(0)$ in certain error situations, i.e., an increase of the merit function value. In case of k = 0, the reference value is adapted by a factor greater than 1, i.e., $\phi_{r_{j_k}}(0)$ is replaced by $t \, \phi_{r_{j_k}}(0)$, t > 1. The basic idea is to store reference function values and to replace the sufficient descent property by a sufficient 'ascent' property in max-form; see Dai and Schittkowski [28] for details and a convergence proof.
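As a minimal sketch under these definitions, the acceptance test (3.24) can be written as follows. The container and names are illustrative, not the actual NLPQLP/NLPQLB implementation; phiHistory is assumed to hold the reference values $\phi_{r_j}(0)$ of the last p(k) + 1 iterations.

#include <algorithm>
#include <deque>

// Non-monotone acceptance test (3.24): accept the steplength alpha if the
// merit function value phiAlpha = phi_rk(alpha) lies below the maximum of
// the stored reference values plus the scaled directional derivative.
bool AcceptStep( const std::deque<double> & phiHistory, // phi_rj(0) values
                 double phiAlpha, double alpha,
                 double mu, double phiPrime0 )          // phi'_rk(0) < 0
{
    double phiRef = *std::max_element( phiHistory.begin(), phiHistory.end() );
    return phiAlpha <= phiRef + alpha * mu * phiPrime0;
}

With a history of length one, the test reduces to the monotone Armijo-type condition (3.19).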

3.3. Numerical Tests

The modified SQP algorithm is implemented in the form of a Fortran subroutine with the name NLPQLB. As pointed out in the previous section, an iteration consists of one step of a standard SQP method, in our case of the code NLPQLP (see chapter 2 or [132]), with mw constraints. Basically, only the definition of the working set and some rearrangements of index sets must be performed. Then NLPQLP is called to perform one iteration proceeding from the iterates x_k, v_k, B_k, r_k and J_k*.

The algorithm as described in the previous section requires an estimate of the maximum size of the working set, mw, a starting point $x_0 \in \mathbb{R}^n$, and an initial working set $W_0$ with $J_0^\star \subset W_0$, see (3.13) for a definition of the active set $J_k^\star$. A straightforward idea is to sort the constraints according to their function values at x_0, and to take the first mw constraints in increasing order. However, one would have to assume that all constraints are equally scaled, a very reasonable assumption in case of scalar semi-infinite problems of the form (3.3).


Figure 3.1.: Function Plot for P1

Otherwise, an alternative, much simpler proposal could be to include in the initial working set all constraints for which $g_j(x) \le \varepsilon$, and to fill the remaining positions with indices for which $g_j(x) > \varepsilon$.

Any other initialization of the working set depending on available information about the expected active constraints may be applied.

Some numerical experiments are reported to show that the resulting algorithm works as expected. The examples are small academic test problems taken from the literature, but somewhat modified, in particular to get problems with a varying number of constraints. They have been used before to get the results published in Schittkowski [126], and are now solved with up to 200,000,000 instead of at most 10,000 constraints. Gradients are evaluated analytically.

P1: The nonlinear semi-infinite test problem is taken from Tanaka, Fukushima, and Ibaraki [156],

$$x_1, x_2, x_3 \in \mathbb{R} : \quad \begin{array}{l}
\min x_1^2 + x_2^2 + x_3^2 \\
-x_1 - x_2 \exp(x_3 y) - \exp(2y) + 2 \exp(4y) \ge 0 \ \mbox{ for all } y \in [0, 1]
\end{array} \qquad (3.25)$$

with starting point $x_0 = (1, -1, 2)^T$. After a discretization of the interval [0, 1] with $m = 4 \cdot 10^7$ equidistant points, we get a nonlinear program with 40,000,000 constraints. Figure 3.1 shows the curve plots over y for the starting point and the optimal solution $x^\star = (-0.21331259, -1.3614504, 1.8535473)^T$. Although only one constraint is active at the optimal solution $x^\star$, the starting point $x_0$ violates about 50 % of all constraints. Thus, the initial working set $W_0$ must be sufficiently large and we choose $m_w = 2 \cdot 10^7$. For the same reason, we restrict the discretization of the interval [0, 1] and the total number of constraints to $m = 4 \cdot 10^7$.

P1F: This is the same nonlinear semi-infinite test problem as before. It is to be shown that the simple feasibility modification (3.11), introducing an additional variable $x_4$ and a penalty term with $\rho = 10^4$ in the modified objective function, reduces the number of intermediate active constraints significantly. The test problem is now given in the form

$$x_1, \ldots, x_4 \in \mathbb{R} : \quad \begin{array}{l}
\min x_1^2 + x_2^2 + x_3^2 + \rho x_4 \\
-x_1 - x_2 \exp(x_3 y) - \exp(2y) + 2 \exp(4y) \ge -x_4 \ \mbox{ for all } y \in [0, 1] , \\
x_4 \ge 0 .
\end{array} \qquad (3.26)$$

The equidistant discretization is performed with $m = 2 \cdot 10^8$ points, and the initial iterate is $x_0 = (1, -1, 2, 100)^T$. No constraint is active at the starting point and we are able to choose a much smaller working set of size $m_w = 2 \cdot 10^3$.

Figure 3.2.: Function Plot for P3

P3: The nonlinear semi-infinite test problem is very similar to P1, see Tanaka, Fukushima, and Ibaraki [156],

$$x_1, x_2, x_3 \in \mathbb{R} : \quad \begin{array}{l}
\min \exp x_1 + \exp x_2 + \exp x_3 \\
x_1 + x_2 y + x_3 y^2 - \frac{1}{1 + y^2} \ge 0 \ \mbox{ for all } y \in [0, 1]
\end{array} \qquad (3.27)$$

with starting point $x_0 = (1, 0.5, 0)^T$. An equidistant discretization of the interval [0, 1] with $m = 2 \cdot 10^8$ points is chosen, and the size of the working set is $m_w = 5 \cdot 10^5$. Figure 3.2 shows the curve plots over y for the starting point and the optimal solution $x^\star = (1.0066047, -0.12687988, -0.37972483)^T$. One constraint is active at the starting point and two at the optimal solution. However, we have to expect a larger number of nearly active constraints. Since the active set at the starting point differs significantly from the active set at the optimal solution, we have to choose a relatively large working set.

P4: This is a nonlinear semi-infinite test problem with two free variables from the interval [0, 1] × [0, 1], see Tanaka, Fukushima, and Ibaraki [156],

$$x_1, x_2, x_3 \in \mathbb{R} : \quad \begin{array}{l}
\min x_1^2 + x_2^2 + x_3^2 \\
-x_1 (y_1 + y_2^2 + 1) - x_2 y_2 (y_1 - y_2) - x_3 y_2 (y_1 + y_2 + 1) \ge 1 \\
\mbox{for all } y_1 \in [0, 1] \ \mbox{and} \ y_2 \in [0, 1]
\end{array} \qquad (3.28)$$

with starting values $x_0 = (-2, -1, 0)^T$. Again, we use 200,000,000 uniformly distributed points of the interval [0, 1] × [0, 1] for the discretization. Since only one constraint is active, for $y_1 = y_2 = 0$, the size of the working set can be as low as $m_w = 200$. Figure 3.3 shows the curvature plot over $y_1$ and $y_2$ for the optimal solution $x^\star = (-1, 0, 0)^T$.

Figure 3.3.: Function Plot for P4

TP332: The problem is a modification of the test problem TP332 of Schittkowski [124] to get a larger number of constraints,

$$x_1, x_2 \in \mathbb{R} : \quad \begin{array}{l}
\min \sum_{i=1}^{m} \left( (\log t_i + x_2 \sin t_i + x_1 \cos t_i)^2 + (\log t_i + x_2 \cos t_i - x_1 \sin t_i)^2 \right) \\
\arctan\!\left( \frac{1/t_i - x_1}{\log t_i + x_2} \right) \le \frac{\pi}{60} , \quad i = 1, \ldots, m
\end{array} \qquad (3.29)$$

with starting values $x_0 = (0.75, 0.75)^T$, where $t_i := \pi (1/3 + (i-1)/180)$. We proceed from $m = 2 \cdot 10^8$ constraints, and the size of the working set is $m_w = 100$. Only one constraint is active at the starting point and at the optimal solution $x^\star = (0.94990963, 0.049757167)^T$.

TP374: The problem given in Schittkowski [124] is extended in a straightforward way to allow more than 35 constraints,

$$x \in \mathbb{R}^{10} : \quad \begin{array}{ll}
\min x_{10} & \\
z(t_i) - (1 - x_{10})^2 \ge 0 , & i = 1, \ldots, r , \\
-z(t_i) + (1 + x_{10})^2 \ge 0 , & i = r+1, \ldots, 2r , \\
-z(t_i) + x_{10}^2 \ge 0 , & i = 2r+1, \ldots, 3.5r ,
\end{array} \qquad (3.30)$$

where

$$z(t) := \left( \sum_{k=1}^{9} x_k \cos(kt) \right)^2 + \left( \sum_{k=1}^{9} x_k \sin(kt) \right)^2$$

and

$$\begin{array}{ll}
t_i = \pi (i - 1) \cdot 0.025 , & i = 1, \ldots, r , \\
t_i = \pi (i - 1 - r) \cdot 0.025 , & i = r+1, \ldots, 2r , \\
t_i = \pi (1.2 + (i - 1 - 2r) \cdot 0.2) \cdot 0.25 , & i = 2r+1, \ldots, 3.5r .
\end{array}$$

The starting solution is $x_0 = (0.1, 0.1, \ldots, 0.1, 1)^T$. By choosing $r = 10^8 / 3.5$, we get 100,000,000 nonlinear constraints. Because of a large number of intermediate active constraints, we have to restrict the total number of constraints and let $m_w = 2 \cdot 10^6$. The optimal solution is

$$x^\star = (0.25559274, 0.25291560, 0.29120000, 0.26826036, 0.18929502, 0.082428568, -0.014405861, -0.070673846, -0.18878711, 0.29170025)^T .$$

U3: The goal is to approximate the exponential function by a rational one, i.e., to minimize the maximum norm of r functions, see Luksan [88],

$$\min_{x \in \mathbb{R}^5} \ \max_{i=1,\ldots,r} \ \left| \frac{x_1 + x_2 t_i}{1 + x_3 t_i + x_4 t_i^2 + x_5 t_i^3} - \exp(t_i) \right| , \qquad (3.31)$$

where

$$t_i := 2 \, \frac{i - 1}{r - 1} - 1$$

for i = 1, ..., r. The starting point is $x_0 = (0.5, 0, 0, 0, 0)^T$. The problem is transformed into a smooth nonlinear program of the form (3.8) with m := 2r constraints and n + 1 variables. The starting point for the additional variable is set to t = 20, and the number of constraints is $r = 5 \cdot 10^7$ or $m = 10^8$, respectively, where we expect a large number of intermediate active constraints. Thus, we allow up to $m_w = 5 \cdot 10^5$ constraints in the working set. Figure 3.4 shows the curvature plot of the residual function f(x, t) as defined by (3.31) over t for the optimal solution

$$x^\star = (0.99987768, 0.25365090, -0.74654084, 0.24513247, -0.037465273)^T .$$

Figure 3.4.: Function Plot for U3

L5: The problem is similar to the previous one. Again, the maximum of absolute values of a set of r differentiable functions is to be minimized, see Luksan [88], but now with additional linear equality and inequality constraints,

$$x \in \mathbb{R}^7 : \quad \begin{array}{l}
\min \ \max_{i=1,\ldots,r} \left| 1 + 2 \sum_{j=1}^{7} \cos(2 \pi x_j \sin \theta_i) \right| \\
-x_4 + x_6 = 1 , \\
x_7 = 3.5 , \\
-x_1 + x_2 \ge 0.4 , \\
-x_2 + x_3 \ge 0.4 , \\
-x_3 + x_4 \ge 0.4 , \\
-x_4 + x_5 \ge 0.4 , \\
-x_5 + x_6 \ge 0.4 , \\
-x_6 + x_7 \ge 0.4 ,
\end{array} \qquad (3.32)$$

where

$$\theta_i := \frac{\pi}{180} (8.5 + 0.5 i)$$

for i = 1, ..., r. The starting point is $x_0 = (0.5, 1, 1.5, 2, 2.5, 3, 3.5)^T$. The problem is transformed into a smooth nonlinear program (3.8) with m := 2r + 8 constraints and n + 1 variables. The starting point for the additional variable is set to t = 1. We set $r = 10^8 - 4$, leading to $m = 2 \cdot 10^8$ constraints. The number of constraints in the working set is $m_w = 4 \cdot 10^4$. Figure 3.5 shows the curvature plot of the residual function f(x, t) as defined by (3.32) over θ for the optimal solution

$$x^\star = (0.34626538, 0.75414273, 1.2127664, 1.7308550, 2.1702278, 2.7308550, 3.5)^T .$$

We observe a larger number of non-connected active constraints at the optimal solution.

Figure 3.5.: Function Plot for L5

E5: Again, we minimize the maximum norm of r functions, see Hald and Madsen [64],

$$\min_{x \in \mathbb{R}^4} \ \max_{i=1,\ldots,r} \ (x_1 + x_2 t_i - \exp t_i)^2 + (x_3 + x_4 \sin t_i - \cos t_i)^2 , \qquad (3.33)$$

where

$$t_i := \frac{4i}{r}$$

for i = 1, ..., r. The starting point is $x_0 = (25, 5, -5, -1)^T$. The problem is transformed into a smooth nonlinear program of the form (3.8) with m := r constraints and n + 1 variables. The starting point for the additional variable is set to t = 1,000. For $r = 2 \cdot 10^8$ we get $m = 2 \cdot 10^8$ constraints, and the number of constraints in the working set is $m_w = 5 \cdot 10^4$. Figure 3.6 shows the curvature plot of the residual function f(x, t) as defined by (3.33) over t for the optimal solution

$$x^\star = (-10.112611, 13.376871, -0.45892449, -0.17124647)^T .$$

Figure 3.6.: Function Plot for E5

The Fortran codes were compiled by the Intel Visual Fortran Compiler, Version 10.1, under Windows XP64, and executed on a dual-core AMD Opteron 265 processor with 1.81 GHz and 4 GB RAM. The working arrays of the routine calling NLPQLB are dynamically allocated. Quadratic programming subproblems are solved by the primal-dual method of Goldfarb and Idnani [56] based on numerically stable orthogonal decompositions, see Schittkowski [130]. NLPQLB is executed with termination accuracy ε = 10^-8.

Some test problem data characterizing the structure of the problems are summarized in Table 3.1. Numerical experiments are reported in Table 3.2, where we use the following notation:


name    n   m            mw          f*
P1      3   40,000,000   20,000,000  5.33469
P1F     4   200,000,000  2,000       5.33469
P3      3   200,000,000  500,000     4.30118
P4      3   200,000,000  200         1.00000
TP332   2   200,000,000  100         398.587
TP374   10  100,000,000  2,000,000   0.434946
U3      6   100,000,000  500,000     0.00012399
L5      8   200,000,000  40,000      0.0952475
E5      5   200,000,000  50,000      125.619

Table 3.1.: Test Examples

name    - identification of the test example
n       - number of variables in standard form (3.1)
m       - total number of constraints in standard form (3.1)
mw      - number of constraints in the working set
|Jmax|  - maximum number of active constraints
nf      - number of simultaneous function computations, i.e., of the objective function and all constraints at a given iterate
ntotg   - total number of gradient computations, i.e., of all individual constraint gradient evaluations
nits    - number of iterations or simultaneous gradient computations, i.e., of gradients of the objective function and all constraints at a given iterate
time    - calculation time in seconds
f*      - final objective function value

name   |Jmax|     nf   ntotg       nits  tcalc
P1     5,579,011  20   27,597,939  13    256
P1F    801        23   6,728       15    942
P3     129,237    12   1,558,343   10    210
P4     1          4    203         4     62
TP332  1          12   110         11    1,229
TP374  584,004    150  20,238,000  89    7,470
U3     357,153    140  4,832,326   49    899
L5     39,976     80   207,960     24    3,005
E5     41,674     30   209,519     21    1,244

Table 3.2.: Numerical Results

It is obvious that the efficiency of an active set strategy strongly depends on how close the active set at the starting point is to that of the optimal solution. If dramatic changes of active constraints are expected, as in case of P1, i.e., if intermediate iterates with a large number of violated constraints are generated, the success of the algorithm is marginal. On the other hand, practical optimization problems often have special structures from which good starting points can be predetermined. Examples P1F and especially P4 and TP332 show a dramatic reduction of derivative calculations, whose cost becomes negligible compared to the number of function calls.

Since the constraints are nonlinear and non-convex, we have to compute all m constraint function values at each iteration to check feasibility and to predict the new active set. The total number of individual constraint function evaluations is therefore nf · m.

Calculation times are excessive and depend mainly on data transfer operations from and to the standard swap file of Windows and on the available memory in core, which is 4 GB in our case. To give an example, test problem TP332 requires 110 sec for $m = 2 \cdot 10^7$ constraints and only 8 sec for $m = 2 \cdot 10^6$ constraints.

Note that the code NLPQLB requires additional working space in the order of 2m double precision real numbers plus mw · (n + 1) double precision numbers for the partial derivatives of the constraints in the working set. Thus, running a test problem with $m = 2 \cdot 10^8$ constraints requires at least 600,000,000 double precision numbers and in addition at least 400,000,000 logical values.

It is amazing that numerical instabilities due to degeneracy are prevented. The huge number of constraints indicates that the derivatives are extremely close to each other, making the optimization problem unstable. The constraint qualification, i.e., the linear independence of the active constraint gradients, is more or less violated. We benefit from the fact that derivatives are given analytically.

3.4. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and thus to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.


Usage without DefaultLoop

1. Generate an NlpqlbWrapper object employing NlpqlbWrapper()

2. a) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

b) Put a guess for the number of active constraints to the object:
Put( const char * pcParam , const int * piNumConstrSubProb ) ;

or
Put( const char * pcParam , const int iNumConstrSubProb ) ;

where pcParam = "NumConstrSubProb".

c) Put the values for other parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;

or
Put( const char * pcParam , const int iValue ) ;

where pcParam is one of the following parameters:

– "MaxNumIter" Put maximum number of iterations.

– "MaxNumIterLS" Put maximum number of function calls during line search.

– "StackSizeLS" Stack size for storing merit function values at previous iter-ations for non-monotone line search (e.g. 10). In case of StackSizeLS == 0,monotone line search is performed. StackSizeLS should not be greater than50.

– "OutputLevel" Put desired output level.

– "FortranOutputLevel" If > 0, the output of the fortran code is written toan additional output file. This output might give more detailed informationabout errors.0: no additional output (default)1: only final convergence analysis2: one line of intermediate results is printed in each iteration.3: more detailed information is printed in each iteration step, e.g. variable,constraint and multiplier values.4: In addition to 3, merit function and steplength values are displayed duringthe line search.

– "OutputUnit" Put output unit for the Fortran output.

– "OutUnit" Same as ”OutputUnit”

• Put( const char * pcParam , const double * pdValue ) ;

or
Put( const char * pcParam , const double dValue ) ;

where pcParam is one of the following parameters:

– "TermAcc" Put desired final accuracy.

– "AccQP" Put desired accuracy for QP solver.


– "ToleranceLS" Put the relative bound for increase of merit function value, ifline search is not successful during the very first step. Must be non-negative(e.g. 0.1).

• Put( const char * pcParam , const char * pcValue ) ;

or
Put( const char * pcParam , const char cValue ) ;

where pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;

or
Put( const char * pcParam , const bool bValue ) ;

where pcParam is one of the following parameters:

– "LQL" If LQL is set true, the quadratic programming subproblem is solvedwith a full positive definite quasi-Newton matrix. Otherwise,a Cholesky de-composition is performed and updated, so that the subproblem matrix con-tains only an upper triangular factor.

– "OpenNewOutputFile" Determines whether a new output file has to be cre-ated. If false, the output is appended to an existing file. Usually, thisparameter does not have to be changed.

d) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

e) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values and provide an initial guess.

a) Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1, or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must point to arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set. If no bounds are set, they will be -1E30 and +1E30 by default.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess )

where pdInitialGuess must be a double array of length equal to the number of variables, or


PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.

4. Start Nlpqlb-Routine: StartOptimizer()

5. Check status of the object using GetStatus( int & iStatus) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): NlpqlbWrapper needs new function values → 6b Providing new function values. After passing these values to the NlpqlbWrapper object, go to 4.

• if iStatus equals EvalGradId(): NlpqlbWrapper needs new values for gradients → 6c Providing new gradient values. After passing these values to the NlpqlbWrapper object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const

Returns the number of constraints.
GetNumDv( int & iNumDesignVar ) const

Returns the number of design variables.

ii. GetDesignVarVec( const double * & pdPointer, const int i ) const

with i=0 (default) returns a pointer to the design variable vector.
GetDesignVar( const int iVarIdx , double & dValue,

const int iVectorIdx ) const

with iVectorIdx = 0 (default) returns the value of the iVarIdx’th design variable of the design variable vector.

b) Providing new function values
Function values for the objective as well as for constraints have to be calculated for the design variable vector provided by Nlpqlb. For access to the design variable vector see 6(a)ii. After calculating the values, these must be passed to the NlpqlbWrapper object using:

• PutConstrValVec( const double * const pdConstrVals,

const int iParSysIdx ), with iParSysIdx = 0 (default), where pdConstrVals points to an array containing the values of each constraint at the design variable vector provided by NlpqlbWrapper.

• PutObjVal( const double dObjVal, const int iObjFunIdx,

const int iParSysIdx ), with iParSysIdx = 0 (default), iObjFunIdx = 0 (default) and dObjVal defining the value of the objective function at the design variable vector provided by NlpqlbWrapper.


Alternatively you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue,

const int iParSysIdx ), with iParSysIdx = 0 (default), for iConstrIdx = 0,...,NumberOfConstraints - 1.

c) Providing new gradient values
Gradients must be calculated for the objective and the active constraints at the current design variable vector.

Get( const char * pcParam, int * & piActPtr )

with pcParam = "ActConstr" sets piActPtr on an array of length equal to the num-ber of constraints whose entries are 0 for an inactive constraint and 1 for an activeconstraint. Note: piActPtr must be NULL on input.

For access to the design variable vector see 6(a)ii.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx,

const double dDerivativeValue )

for iVariableIdx = 0,...,NumberOfVariables - 1

and iConstrIdx = 0,...,NumberOfConstraints - 1

where dDerivativeValue is defined as the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th variable.

Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1,

where pdGradient is a pointer to the gradient of the iConstrIdx’th constraint.

For the gradient of the objective you can use

• PutDerivObj( const int iVariableIdx, const double dDerivativeValue,

const int iFunIdx ), with iFunIdx = 0 (default), for iVariableIdx = 0,...,NumberOfVariables - 1,

where dDerivativeValue is defined as the derivative of the objective with respect to the iVariableIdx’th variable at the current design variable vector.

• PutGradObj( const double * pdGradient, const int iFunIdx ), where iFunIdx = 0 (default) and pdGradient is a pointer to the gradient of the objective at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const

Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const

Returns the value of the iConstrIdx’th constraint function at the last solution vector.


• GetDerivConstr( const int iConstrIdx , const int iVariableIdx,

double & dDerivativeValue ) const

Returns the value of the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th design variable at the last solution vector.

• GetDerivObj( const int iVariableIdx, double & dDerivativeValue,

const int iFunIdx ) const

with iFunIdx = 0 (default) returns the value of the derivative of the objective with respect to the iVariableIdx’th design variable at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const

with iFunIdx = 0 (default) returns the value of the objective function at the last solution vector.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx

) const

with iParSysIdx = 0 (default) returns the value of the iVariableIdx’th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx) const

with iParSysIdx = 0 (default) returns a pointer to the last solution vector.
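The interplay of steps 4 to 7 is a reverse communication loop. The following sketch shows one way to arrange it; the header name and the user routines EvalFunctions and EvalGradients are assumptions, while all solver methods are the ones documented above.

    #include <vector>
    #include "NlpqlbWrapper.h"   // header name is an assumption

    // Assumed user routines: evaluate objective/constraints and gradients at pdX.
    void EvalFunctions( const double * pdX, double & dObj, double * pdConstr ) ;
    void EvalGradients( NlpqlbWrapper & oSolver, const double * pdX ) ;

    void RunOptimization( NlpqlbWrapper & oSolver, int iNumConstr )
    {
        std::vector<double> adConstr( iNumConstr ) ;
        for ( ; ; )
        {
            oSolver.StartOptimizer() ;                   // step 4
            int iStatus ;
            oSolver.GetStatus( iStatus ) ;               // step 5
            if ( iStatus == oSolver.SolvedId() || iStatus > 0 )
                break ;                                  // solved, or error occurred

            const double * pdX = NULL ;
            oSolver.GetDesignVarVec( pdX ) ;             // current iterate, cf. 6(a)ii

            if ( iStatus == oSolver.EvalFuncId() )       // step 6b
            {
                double dObj ;
                EvalFunctions( pdX, dObj, &adConstr[ 0 ] ) ;
                oSolver.PutObjVal( dObj ) ;
                oSolver.PutConstrValVec( &adConstr[ 0 ] ) ;
            }
            else if ( iStatus == oSolver.EvalGradId() )  // step 6c
            {
                EvalGradients( oSolver, pdX ) ;          // calls PutGradObj etc.
            }
        }
    }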

Some of the termination reasons depend on the accuracy used for approximating gradients. If we assume that all functions and gradients are computed within machine precision and that the implementation is correct, there remain only the following possibilities that could cause an error message:

1. The termination parameter TermAcc is too small, so that the numerical algorithm plays around with round-off errors without being able to improve the solution. Especially the Hessian approximation of the Lagrangian function becomes unstable in this case. A straightforward remedy is to restart the optimization cycle with a larger stopping tolerance.

2. The constraints are contradictory, i.e., the set of feasible solutions is empty. There is no way to find out whether a general nonlinear and non-convex set possesses a feasible point or not. Thus, the nonlinear programming algorithm will proceed until running into one of the mentioned error situations. In this case, the correctness of the model must be very carefully checked.

3. Constraints are feasible, but some of them are degenerate, for example if some of the constraints are redundant. One should know that SQP algorithms assume the satisfaction of the so-called constraint qualification, i.e., that gradients of active constraints are linearly independent at each iterate and in a neighborhood of an optimal solution. In this situation, it is recommended to check the formulation of the model constraints.

However, some of the error situations also occur if, because of wrong or inaccurate gradients, the quadratic programming subproblem does not yield a descent direction for the underlying merit function. In this case, one should try to improve the accuracy of function evaluations, scale the model functions in a proper way, or start the algorithm from other initial values.


NLPQLB returns the best iterate obtained. In case of successful termination (Status = 0), this is always the last iterate; in case of a non-successful return (Status > 0), it may be a better previous iterate. Success is measured by objective function value and constraint violation. Note that the output of constraint and multiplier values is suppressed for more than 10,000 constraints.

3.5. Conclusions

We present a modification of an SQP algorithm to solve optimization problems with a very large number of constraints, m, relative to the number of variables, n. The idea is to proceed from a user-provided guess, mw, for the maximum number of violated constraints, and to solve quadratic programming subproblems with mw linear constraints instead of m constraints.

Some numerical experiments with simple academic test problems show that it is possible to solve problems with up to m = 2 · 10^8 nonlinear constraints, which would not be solvable otherwise by a standard SQP algorithm. Sparsity of the Jacobian of the constraints is not assumed.

The performance depends significantly on the position of the starting point, i.e., on the question how close the initial active set is to the active set at the final solution. If, in the worst case, all constraints are violated at an intermediate iterate, the proposed active set strategy is useless, as probably any other would be as well. In practical applications, however, optimization problems are often routinely solved and some information about the choice of good starting points is available. If there are no or very few changes of the active set and if only a few constraints are active, the achievements are significant.

It is remarkable that it is possible at all to solve problems with this huge number of constraints on a standard PC. The test examples have a quite simple structure, in most cases arising from a semi-infinite optimization problem and an equidistant discretization. It must be expected that gradients of neighboring constraints coincide up to seven digits. These optimization problems are highly unstable in the sense that the linear independence constraint qualification is more or less violated at all iterates.


4. NlpqlgWrapper - Heuristic Global Optimization

Usually, global optimization codes with guaranteed convergence require a large number of function evaluations. On the other hand, there are efficient optimization methods which exploit gradient information, but only the approximation of a local minimizer can be expected. If, however, the underlying application model is expensive, if there are additional constraints, especially equality constraints, and if the existence of different local solutions is expected, then heuristic rules for successive switches from one local minimizer to another are often the only applicable approach. For this specific situation, we present some simple ideas for cutting off a local minimizer and restarting a new local optimization run. However, some safeguards are needed to stabilize the algorithm, since very little is usually known about the distribution of local minima. This chapter introduces an approach where the nonlinear programs generated can be solved by any available black-box software. For our implementation, a sequential quadratic programming code (NLPQLP) is chosen for local optimization. The usage of the code is outlined and we present some numerical results based on a set of test examples found in the literature. In chapter 10 an example implementation can be found. Sections 4.1, 4.2 and 4.4 are taken from Schittkowski [143].

4.1. Introduction

We consider the general optimization problem to minimize an objective function f under nonlinear equality and inequality constraints,

min f(x), x ∈ Rn,
s.t. gj(x) = 0 , j = 1, . . . ,me,
     gj(x) ≥ 0 , j = me + 1, . . . ,m,        (4.1)
     xl ≤ x ≤ xu ,

where x is an n-dimensional parameter vector. It is assumed that all problem functions f(x) and gj(x), j = 1, . . ., m, are continuously differentiable on the whole Rn. Apart from this, we do not impose any further restrictions on the mathematical structure.

Let P denote the feasible domain,

P := { x ∈ Rn : gj(x) = 0, j = 1, . . . ,me, gj(x) ≥ 0, j = me + 1, . . . ,m, xl ≤ x ≤ xu } .

Our special interest is to find a global optimizer, i.e., a feasible point x∗ ∈ P with f(x∗) ≤ f(x) for all x ∈ P. Without further assumptions, it is not possible to know in advance how many local solutions exist, whether the number of local solutions is finite, or whether the global minimizer is unique.


Global optimization algorithms have been investigated extensively in the literature, see for example the books of Pinter [103], Torn and Zilinskas [159], Horst and Pardalos [71] and the references therein. Main solution strategies are partition techniques, stochastic algorithms, or approximation techniques, among many other methods. Despite significant progress and improvements in developing new computer codes, there remains the disadvantage that the number of function evaluations is often large and unacceptable for realistic time-consuming simulations. Especially nonlinear equality constraints are often not appropriate and must be handled in the form of penalty terms, by which direct and random search algorithms can be deteriorated drastically.

One of the main drawbacks of global optimization is the lack of numerically computable and generally applicable stopping criteria. Thus, global optimization is inherently a difficult problem and requires more or less an exhaustive search over the whole feasible domain to guarantee convergence.

As soon as function evaluations become extremely expensive, preventing the application of any of the methods mentioned above, there are only two alternatives. Either the mathematical model allows specific analysis to restrict the domain of interest to a region where the global minimizer is expected, or one tries to improve local solutions until a reasonable, not necessarily global solution is found. A typical technique is the so-called tunnelling method, see Levy and Montalvo [82], where the objective function in (4.1) is replaced by

f(x) := ( f(x) − f(xloc) ) / ‖x − xloc‖^ρ

where xloc ∈ P denotes a local minimizer and ρ is a penalty parameter with the intention to push the next local minimizer away from the known one, xloc. A similar idea to move to another local minimizer is proposed by Ge and Qin [50], also called the function filling method, where the objective function is inverted and an exponential penalty factor is added to prevent approximation of a known local solution.

Our heuristic proposal follows a similar idea, the successive improvement of local minima. But the algorithm is different in the following sense. Additional constraints are attached to the original problem formulation (4.1). First, there is a constraint that the next local solution must have an objective function value less than the best known feasible function value minus a relative improvement of ε1 > 0. For each known local solution xloc, a ball with radius ε2 is formulated around xloc to prevent a subsequent approximation of xloc. The procedure is repeated until the solution method breaks down with an error message, from which we conclude that the feasible domain is empty. To further prevent a return to a known local solution, an RBF kernel function of the form

ρ exp(−µ ‖x − xloc‖²)

is added to the objective function for each local solution.

However, an appropriate choice of the tolerances ε1, ε2, ρ, and µ depends on the distribution of the local solutions, scaling properties, and the curvature of the objective function. Moreover, the nonlinear programs generated this way become more and more non-convex. Thus, one has to add an additional regularization to stabilize the algorithm, and appropriate starting points for each subproblem.

In Section 4.2, we outline the heuristic procedure in more detail. Regularization and numerical aspects are discussed. The usage of the Fortran subroutine is documented in Section 4.3 and Chapter 10 contains an illustrative example.


4.2. The Algorithm

We proceed from the constrained nonlinear program (4.1) without any information about the number of local minima. Note that the feasible set P is compact and the objective function f(x) is continuous on P, i.e., we know that a global solution exists.

Let x∗1, . . ., x∗k be a series of known local minima, i.e., a series of feasible points satisfying the necessary KKT optimality conditions of (4.1). For computing a new iterate x∗k+1, the following expanded nonlinear program is formulated and solved:

min f(x) + Σ_{i=1}^{k} ρi exp(−µi ‖x − x∗i‖²), x ∈ Rn,
s.t. gj(x) = 0 , j = 1, . . . ,me,
     gj(x) ≥ 0 , j = me + 1, . . . ,m,
     f(x) ≤ f∗k − ε1 |f∗k| ,        (4.2)
     ‖x − x∗i‖² ≥ ε2 , i = 1, . . . , k,
     xl ≤ x ≤ xu .

Here, f∗k is the best known objective function value, i.e.,

f∗k := min { f(x∗i) : i = 1, . . . , k } ,        (4.3)

and ‖·‖ denotes the Euclidean norm. Once a local solution of (4.2) is found, the objective function value is cut away and a ball around the minimizer prevents an approximation of any of the previous iterates. In addition, a radial basis function (RBF) is added to the objective function to push subsequent local minimizers away from the known ones. ε1 > 0 and ε2 > 0 are suitable tolerances for defining the artificial constraints, as are ρi > 0 and µi > 0 for defining the RBF kernel.

If the applied solution method for the nonlinear program (4.2) terminates with an error message at a non-feasible iterate, we conclude that P is empty and stop. Otherwise, we obtain at least a feasible point x∗k+1 with an objective function value better than all known ones, and k is replaced by k + 1 to solve (4.2) again.

Obviously, we get a series of feasible iterates with decreasing objective function values. However, the approach has a couple of severe drawbacks:

1. The choice of the tolerances ε1 and ε2 is critical for the performance and it is extremely difficult to find appropriate values in advance. Too small values prevent the algorithm from moving away from the neighborhood of a local minimizer of the original program (4.1) towards another local minimizer, and too large values could cut off too many local minimizers, even the global one.

2. The choice of the RBF kernel parameters ρi and µi seems to be less crucial, but they must nevertheless be carefully adapted to the structure of the model functions.

3. Even if the initial feasible set P is convex, the additional constraints are non-convex and the feasible domains of (4.2) become more and more irregular. It is even possible that an initially connected set P becomes non-connected.


Figure 4.1.: Local and Global Minimizer

4. The algorithm stops as soon as the constraints of (4.2) become inconsistent. But infeasibility cannot be checked by any mathematical criterion. The only possibility is to run an optimization algorithm until an error message occurs at an infeasible iterate. But there is no guarantee in this case that the feasible region is in fact non-empty.

5. The local solutions of (4.2) do not coincide with local solutions of (4.1) if some of the artificial constraints become active.

6. It is difficult to find appropriate starting values for solving (4.2). Keeping the original one provided by the user could force the iterates to get stuck at a previous local solution until an internal error occurs.

The situation is illustrated in Figure 4.1. Once a local solution x∗1 is obtained, an interval with radius ε2 around x∗1 and a cut of the objective function subject to a relative bound ε1 try to push subsequent iterates away from x∗1. The dotted line shows the objective function to be minimized, including the RBF term. The new feasible domain is shrinking, as shown by the gray area. If, however, the applied descent algorithm is started close to x∗1, it tries to follow the slope and to reach x∗1. If ε1 and ε2 are too small and if there are additional numerical instabilities, for example badly scaled functions or inaccurate gradient approximations, it is possible that the code runs into an error situation despite its theoretical convergence properties.

To overcome at least some of these disadvantages, (4.2) is regularized in the following sense. For each of the artificial inequality constraints, a slack variable is introduced. Thus, we are sure that the feasible domain of the new subproblem is always non-empty. However, we get only a perturbed solution of (4.2) in case of non-zero slack variables. To reduce their size and influence as much as possible, a penalty term is added to the objective function, and we get the problem


min f(x) + Σ_{i=1}^{k} ρi exp(−µi ‖x − x∗i‖²) + γk (y + e^T z), x ∈ Rn, y ∈ R, z ∈ Rk,
s.t. gj(x) = 0 , j = 1, . . . ,me,
     gj(x) ≥ 0 , j = me + 1, . . . ,m,
     f(x) ≤ f∗k − ε1 |f∗k| + y ,
     ‖x − x∗i‖² ≥ ε2 − ei^T z , i = 1, . . . , k,        (4.4)
     xl ≤ x ≤ xu ,
     0 ≤ y ≤ β1 ,
     0 ≤ z ≤ β2 .

Here ei ∈ Rk denotes the i-th unit vector, i = 1, . . ., k, γk is a penalty parameter, β1 and β2 are upper bounds for the slack variables y and z, and e = (1, . . . , 1)^T. By letting β1 = 0 or β2 = 0, the corresponding slack variables are suppressed completely.

There remains the question how to find a suitable starting point for (4.4) without forcing a user to supply too much information about the optimization problem and without trying to find some kind of pattern or decomposition of the feasible domain, where problem functions must be evaluated. Basically, the user should have full control how to proceed, and new starting values could be computed randomly. Another possibility is to choose

x0_k = 1/(k + 1) · ( x0 + Σ_{i=1}^{k} x∗i )        (4.5)

where x0 ∈ Rn is the initial starting point provided by the user.

Another tolerance ε3 is introduced to adapt the penalty parameter γk and to force the artificial variables y and z to become as small as possible. Usually we set ε3 = ε1. The complete procedure is summarized below as Algorithm 3.

The algorithm is a heuristic one without guaranteed convergence. However, there are many situations preventing the application of a more rigorous method, for example based on a partition technique, stochastic optimization, approximations, or a direct search method. The main disadvantage of these algorithms is the large number of function evaluations, often not acceptable for realistic time-consuming simulation programs. Especially nonlinear equality constraints are often not appropriate and must be handled in the form of penalty terms, by which direct and random search algorithms can be deteriorated drastically. To summarize, the approach seems to be applicable under the subsequent conditions:

• The evaluation of the model functions f(x) and gj(x), j = 1, . . ., m is expensive.

• There are highly nonlinear restrictions, especially equality constraints.

• Model functions are continuously differentiable and the numerical noise in function and gradient evaluations is negligible.

• The number of local minima of (4.1) is not too large.

• There is some empirical, model-based knowledge about the expected relative locations of local minima and the curvature of the objective function.


Algorithm 3 Global Optimum of (4.1)

Start: Select a starting point x0 ∈ Rn and some tolerances ρ > 0, µ > 0, ε1 > 0, ε2 > 0, ε3 > 0, β1 ≥ 0, β2 ≥ 0, γ > 0, and δ > 1. Moreover, let kmax be an upper bound for the number of iterations.

Initialization: Solve (4.1) by a locally convergent optimization algorithm to get a local minimizer x∗1. Let γ1 := γ, ρ1 := ρ, and µ1 := µ.

Iteration Cycle: For k = 1, 2, . . . compute x∗k+1 from x∗1, . . ., x∗k as follows:

1. Determine f∗k by (4.3).

2. Formulate the expanded nonlinear program (4.4) with slack variables y ∈ R and z ∈ Rk.

3. Solve (4.4) by an available locally convergent algorithm for smooth, constrained nonlinear programming, starting from x0_k ∈ Rn, for example given by (4.5), y = 0, and z = 0. Let xk+1, yk+1, and zk+1 be the optimal solution.

4. If yk+1 + e^T zk+1 > ε3, let γk+1 = δγk, ρk+1 = δρk, and µk+1 = δµk. Otherwise, let γk+1 = γk, ρk+1 = ρk, and µk+1 = µk.

5. Let k := k + 1. If the maximum number of iterations is reached, i.e., if k = kmax, then stop.

6. Repeat the iteration if the local optimization code reports that all internal convergence criteria are satisfied.

7. Stop otherwise. The last successful return is supposed to be the global minimizer.
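Step 3 allows the averaged restart point (4.5). A minimal sketch of this averaging, with hypothetical names:

    #include <cstddef>
    #include <vector>

    // Compute x0_k = (x0 + x*_1 + ... + x*_k) / (k + 1), cf. (4.5).
    // adX0 is the user-provided starting point, aadLocMin holds the
    // known local minimizers x*_1, ..., x*_k.
    std::vector<double> RestartPoint( const std::vector<double> & adX0,
        const std::vector< std::vector<double> > & aadLocMin )
    {
        const std::size_t n = adX0.size() ;
        const std::size_t k = aadLocMin.size() ;
        std::vector<double> adStart( adX0 ) ;        // start with x0

        for ( std::size_t i = 0 ; i < k ; ++i )      // add all known minimizers
            for ( std::size_t j = 0 ; j < n ; ++j )
                adStart[ j ] += aadLocMin[ i ][ j ] ;

        for ( std::size_t j = 0 ; j < n ; ++j )      // divide by k + 1
            adStart[ j ] /= double( k + 1 ) ;

        return adStart ;
    }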


4.3. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and thus to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate an NlpqlgWrapper object employing NlpqlgWrapper()

2. a) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

b) Put a guess for the number of expected local minimizers to the object:
Put( const char * pcParam , const int * piNumConstrSubProb ) ;

or Put( const char * pcParam , const int iNumConstrSubProb ) ;

where pcParam = "MaxNumLocMin".

c) Put the values for other parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;

or Put( const char * pcParam , const int iValue ) ;

where pcParam is one of the following parameters:

– "MaxNumIter" Put maximum number of iterations.

– "MaxNumIterLS" Put maximum number of function calls during line searh.

– "OutputLevel" Put desired output level.

– "FortranOutputLevel" If > 0, the output of the fortran code is written toan additional output file. This output might give more detailed informationabout errors.0: No additional output (default)1: Only a final summary about local solutions.2: Final convergence analysis for each subproblem.3: One line of intermediate results is printed in each iteration.4: More detailed information is printed in each iteration step, e.g. variable,constraint and multiplier values.5: In addition to 4, merit function and steplength values are displayed duringthe line search.

– "OutputUnit" Put output unit for the Fortran output.

– "OutUnit" Same as ”OutputUnit”

• Put( const char * pcParam , const double * pdValue ) ;

or Put( const char * pcParam , const double dValue ) ;

where pcParam is one of the following parameters:

– "TermAcc" Put desired final accuracy.


– "AccQP" Put desired accuracy for QP solver.

• Put( const char * pcParam , const char * pcValue ) ;

or Put( const char * pcParam , const char cValue ) ;

where pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;

or Put( const char * pcParam , const bool bValue ) ;

where pcParam is one of the following parameters:

– "OpenNewOutputFile" Determines whether a new output file has to be opened.If false, the output is appended to an existing file. Usually, this parameterdoes not have to be changed.

d) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

e) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values and provide initial guess.

a) Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1, or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must point to arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set. If no bounds are set, they default to -1E30 and +1E30.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess )

where pdInitialGuess must be a double array of length equal to the number of variables, or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1. Before setting an initial guess, the number of design variables must be set.

4. Start Nlpqlg-Routine: StartOptimizer()


5. Check status of the object using GetStatus( int & iStatus) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): NlpqlgWrapper needs new function values → 6b Providing new function values. After passing these values to the NlpqlgWrapper object, go to 4.

• if iStatus equals EvalGradId(): NlpqlgWrapper needs new values for gradients → 6c Providing new gradient values. After passing these values to the NlpqlgWrapper object, go to 4.

• if iStatus == -3: Nlpqlg needs a new initial guess. In the DefaultLoop() this is automatically done using random numbers.
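If DefaultLoop() is not used, the case iStatus == -3 must be handled by the calling program, for instance with a random point between the bounds. A sketch, where the header name and the bound arrays pdXl and pdXu are assumptions:

    #include <cstdlib>
    #include "NlpqlgWrapper.h"   // header name is an assumption

    // Provide a random initial guess when iStatus == -3.
    void PutRandomGuess( NlpqlgWrapper & oSolver, const double * pdXl,
                         const double * pdXu, int iNumDv )
    {
        for ( int i = 0 ; i < iNumDv ; ++i )
        {
            double dRand = double( std::rand() ) / double( RAND_MAX ) ;  // in [0,1]
            oSolver.PutInitialGuess( i, pdXl[ i ] + dRand * ( pdXu[ i ] - pdXl[ i ] ) ) ;
        }
    }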

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const

Returns the number of constraints.
GetNumDv( int & iNumDesignVar ) const

Returns the number of design variables.

ii. GetDesignVarVec( const double * & pdPointer, const int i ) const

with i=0 (default) returns a pointer to the design variable vector.
GetDesignVar( const int iVarIdx , double & dValue,

const int iVectorIdx ) const

with iVectorIdx = 0 (default) returns the value of the iVarIdx’th design variable of the design variable vector.

b) Providing new function values
Function values for the objective as well as for constraints have to be calculated for the design variable vector provided by Nlpqlg. For access to the design variable vector see 6(a)ii. After calculating the values, these must be passed to the NlpqlgWrapper object using:

• PutConstrValVec( const double * const pdConstrVals,

const int iParSysIdx ), with iParSysIdx = 0 (default), where pdConstrVals points to an array containing the values of each constraint at the design variable vector provided by NlpqlgWrapper.

• PutObjVal( const double dObjVal, const int iObjFunIdx,

const int iParSysIdx ), with iParSysIdx = 0 (default), iObjFunIdx = 0 (default) and dObjVal defining the value of the objective function at the design variable vector provided by NlpqlgWrapper.

Alternatively you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue,


const int iParSysIdx ), with iParSysIdx = 0 (default), for iConstrIdx = 0,...,NumberOfConstraints - 1.

c) Providing new gradient values
Gradients must be calculated for the objective and the active constraints at the current design variable vector.

Get( const char * pcParam, int * & piActPtr )

with pcParam = "ActConstr" sets piActPtr on an array of length equal to the num-ber of constraints whose entries are 0 for an inactive constraint and 1 for an activeconstraint. Note: piActPtr must be NULL on input.

For access to the design variable vector see 6(a)ii.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx,

const double dDerivativeValue )

for iVariableIdx = 0,...,NumberOfVariables - 1

and iConstrIdx = 0,...,NumberOfConstraints - 1

where dDerivativeValue is defined as the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th variable.

Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1,

where pdGradient is a pointer to the gradient of the iConstrIdx’th constraint.

For the gradient of the objective you can use

• PutDerivObj( const int iVariableIdx, const double dDerivativeValue,

const int iFunIdx ), with iFunIdx = 0 (default), for iVariableIdx = 0,...,NumberOfVariables - 1,

where dDerivativeValue is defined as the derivative of the objective with respect to the iVariableIdx’th variable at the current design variable vector.

• PutGradObj( const double * pdGradient, const int iFunIdx ), where iFunIdx = 0 (default) and pdGradient is a pointer to the gradient of the objective at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const

Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const

Returns the value of the iConstrIdx’th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx , const int iVariableIdx,

double & dDerivativeValue ) const


Returns the value of the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th design variable at the last solution vector.

• GetDerivObj( const int iVariableIdx, double & dDerivativeValue,

const int iFunIdx ) const

with iFunIdx = 0 (default) returns the value of the derivative of the objective with respect to the iVariableIdx’th design variable at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const

with iFunIdx = 0 (default) returns the value of the objective function at the last solution vector.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx

) const

with iParSysIdx = 0 (default) returns the value of the iVariableIdx’th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx) const

with iParSysIdx = 0 (default) returns a pointer to the last solution vector.

4.4. Summary

We present a heuristic approach for the stepwise approximation of the global solution of a constrained nonlinear programming problem. In each step, additional variables and constraints are added to the original ones to cut off known local solutions. The idea is implemented and the resulting code NLPQLG is able to solve a large number of test problems found in the literature. However, the algorithm is quite sensitive with respect to the input tolerances, which must be chosen very carefully. But under certain circumstances, for example very time-consuming function evaluations or highly nonlinear constraints, the proposed idea is often the only way to improve a known local solution.


5. ScpWrapper - Sequential Convex Programming

For solving optimization problems with very many variables, the sequential convex programming algorithm SCPIP by Zillober can be applied. The code is frequently used in industrial applications, especially in mechanical engineering.

5.1. Introduction

The method of moving asymptotes (MMA) was introduced in 1987 by K. Svanberg [153]. In 1993 the lack of global convergence was remedied in the method SCP (sequential convex programming; cf. [176]) by adding a line-search procedure. Both methods have been proven to be efficient tools in the context of structural optimization (cf. [145]), since displacement-dependent constraints (e.g. stresses) are approximated very well. However, problems outside this context could also be solved efficiently. In 1995, weak convergence results were proven in [154] without a line-search. In [176] the methods were extended to a general mathematical programming framework. Box-constraints for the variables are kept in the model because they can be handled separately and they appear in many real-world applications.

Thus, we consider the following general nonlinear programming problem:

min f(x), x ∈ Rn,
s.t. hj(x) = 0, j = 1, ...,meq,
     hj(x) ≤ 0, j = meq + 1, ...,M,        (P1)
     x̲i ≤ xi ≤ x̄i, i = 1, ..., n.

The functions f and hj (j = 1, ...,M) have to be defined only on X := { x | x̲i ≤ xi ≤ x̄i, i = 1, ..., n } and are assumed to be at least twice continuously differentiable in the interior of X. The feasible region is assumed to be non-empty.

The objective function of the problem will be approximated by a uniformly convex function, inequality constraints by convex functions, equality constraints by linear functions, and the box-constraints will be allowed to shrink. Then, (P1) will be replaced by a sequence of subproblems using these approximations.

In the classical approaches, the subproblems have been solved applying a dual method (cf. [153]). This approach needs the repeated solution of dense linear systems of dimension M × M, which can be excellent for the case of large n and small M. But for a larger number of constraints this approach is disadvantageous since the linear system matrices are dense, independent of the structure of the original problem.

In [175] the predictor-corrector interior point method to solve the subproblems has been introduced. The interior point method has several advantages. Firstly, it is also possible to reduce to M × M systems, i.e. the advantage of the dual approach is covered. Secondly, it is


possible to reduce to n × n systems, which is desirable for a small number of variables and a large number of constraints. This is the case in many sizing problems of structural optimization. As a third possibility, reduction to certain (n + M) × (n + M) systems is possible, which can be of interest for certain sparsity patterns. It is important to notice that sparsity in the original problem can be exploited by the interior point method. This is not the case for the dual approach.

SCPIP has been described in a structural optimization framework in [177]. In [178] its ability to solve very large-scale problems arising in structural optimization and the solution of elliptic control problems has been shown.

The outline of this chapter is as follows. In Section 5.2 the approximation scheme is introduced and the methods MMA and SCP are formulated. The interior point method to solve the subproblems is described in Section 5.3. Section 5.4 contains convergence results. In Section 5.5 the interface of ScpWrapper is introduced and in chapter 10 a sample calling program is described.

Sections 5.2-5.4 are a summary of some of the papers mentioned above, containing the underlying methods and theory.

5.2. The MMA approximation

As for most methods of nonlinear programming, the method proposed here replaces the problem (P1) by a sequence of easier-to-solve subproblems. For this purpose, the model functions of (P1), i.e. f and hj (j = 1, ...,M), are approximated by functions f^k and h^k_j (j = 1, ...,M) as follows. The motivation for this approach comes from the fact that displacement-dependent constraints in the context of structural optimization are approximated very well by this scheme, but it is generally applicable. More detailed explanations can be found in [153] and the references cited therein.

The objective function f is replaced by

f^k(x) := f(x^k)
          + Σ_{i ∈ I^+_{0,k}} [ ∂f(x^k)/∂x_i · ( (U^k_i − x^k_i)² / (U^k_i − x_i) − (U^k_i − x^k_i) ) + τ^k_i · (x_i − x^k_i)² / (U^k_i − x_i) ]
          − Σ_{i ∈ I^-_{0,k}} [ ∂f(x^k)/∂x_i · ( (x^k_i − L^k_i)² / (x_i − L^k_i) − (x^k_i − L^k_i) ) − τ^k_i · (x_i − x^k_i)² / (x_i − L^k_i) ]        (5.1)

An inequality constraint hj (j = meq + 1, ...,M) is replaced by

h^k_j(x) := h_j(x^k)
            + Σ_{i ∈ I^+_{j,k}} ∂h_j(x^k)/∂x_i · ( (U^k_i − x^k_i)² / (U^k_i − x_i) − (U^k_i − x^k_i) )
            − Σ_{i ∈ I^-_{j,k}} ∂h_j(x^k)/∂x_i · ( (x^k_i − L^k_i)² / (x_i − L^k_i) − (x^k_i − L^k_i) )        (5.2)

I^+_{j,k} (I^-_{j,k}, resp.) is the set of indices of all components with non-negative (negative) first partial derivatives at the expansion point x^k,

I^+_{j,k} := { i | ∂h_j(x^k)/∂x_i ≥ 0 } ,   I^-_{j,k} := { i | ∂h_j(x^k)/∂x_i < 0 } ,

where h_0 := f.


f^k and h^k_j (j = meq + 1, ...,M) are defined on

D^k := { x | L^k_i < x_i < U^k_i , i = 1, ..., n } .

L^k_i and U^k_i are parameters to be chosen with L^k_i < x^k_i < U^k_i; the τ^k_i are positive parameters, i = 1, ..., n.

Equality constraints h_j (j = 1, ...,meq) are approximated by

h^k_j(x) := h_j(x^k) + Σ_{i=1}^{n} ∂h_j(x^k)/∂x_i · (x_i − x^k_i) .        (5.3)

For the equality constraints no further restrictions on D^k apply.

This means equality constraints are linearized in the usual sense; inequality constraints and the objective function are linearized with respect to the transformed variables 1/(U^k_i − x_i) and 1/(x_i − L^k_i). In the objective function an additional term is added. These so defined functions have the following properties:

f^k and h^k_j are first-order approximations of f and h_j, resp., i.e.

f^k(x^k) = f(x^k) ,   h^k_j(x^k) = h_j(x^k) ,
∇f^k(x^k) = ∇f(x^k) ,   ∇h^k_j(x^k) = ∇h_j(x^k) ,

for all j = 1, ...,M. Additionally:

h^k_j (j = meq + 1, ...,M) are convex, and can be strictly convex.
f^k is strictly convex.        (5.4)
f^k and h^k_j (j = 1, ...,M) are separable.

One particular subproblem of (P1) then looks as follows:

min f^k(x), x ∈ Rn,
s.t. h^k_j(x) = 0, j = 1, ...,meq,
     h^k_j(x) ≤ 0, j = meq + 1, ...,M,        (P^k_sub)
     x̲'_i ≤ x_i ≤ x̄'_i, i = 1, ..., n,

where x̲'_i := max{ x̲_i , x^k_i − ω(x^k_i − L^k_i) } and x̄'_i := min{ x̄_i , x^k_i + ω(U^k_i − x^k_i) }, with ω ∈ ]0; 1[ fixed. I.e., x̲_i ≤ x̲'_i ≤ x^k_i ≤ x̄'_i ≤ x̄_i, where x^k denotes the expansion point. For further use we define X' := { x : x̲'_i ≤ x_i ≤ x̄'_i , i = 1, ..., n }.
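To make the separable structure concrete, the following sketch evaluates the approximation h^k_j(x) of (5.2) for one inequality constraint. It is only an illustration with our own naming; the gradient at the expansion point and the asymptotes are passed in as plain arrays.

    #include <cstddef>
    #include <vector>

    // Evaluate the MMA approximation h^k_j(x) of one inequality constraint,
    // cf. (5.2). dHjXk is h_j(x^k), adGradXk the gradient at the expansion
    // point x^k, adL/adU the asymptotes L^k_i and U^k_i.
    double MmaApprox( double dHjXk,
        const std::vector<double> & adGradXk,
        const std::vector<double> & adXk,   // expansion point x^k
        const std::vector<double> & adX,    // evaluation point x
        const std::vector<double> & adL,    // lower asymptotes L^k_i
        const std::vector<double> & adU )   // upper asymptotes U^k_i
    {
        double dVal = dHjXk ;
        for ( std::size_t i = 0 ; i < adX.size() ; ++i )
        {
            if ( adGradXk[ i ] >= 0.0 )     // i in I^+_{j,k}: expand in 1/(U - x)
            {
                double dUx = adU[ i ] - adXk[ i ] ;
                dVal += adGradXk[ i ] * ( dUx * dUx / ( adU[ i ] - adX[ i ] ) - dUx ) ;
            }
            else                            // i in I^-_{j,k}: expand in 1/(x - L)
            {
                double dXl = adXk[ i ] - adL[ i ] ;
                dVal -= adGradXk[ i ] * ( dXl * dXl / ( adX[ i ] - adL[ i ] ) - dXl ) ;
            }
        }
        return dVal ;
    }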

We will now explain how the asymptotes can be chosen.

Definition 5.2.1 A sequence of asymptotes is called feasible, if


1. L^k_i ≤ x^k_i − ξ and U^k_i ≥ x^k_i + ξ, where ξ is a positive constant, for all i = 1, ..., n and k ≥ 0.

2. L^k_i ≥ Lmin and U^k_i ≤ Umax for all k ≥ 0, with Lmin and Umax finite.

The first point of this definition prevents the curvature of the approximations from going to infinity. The second part prevents the approximations from getting too close to linearity. This is necessary for the approximation of the objective; it is not necessary for the constraints. Therefore, one could use different asymptotes for the objective and the constraints, but for simplicity of presentation we use the same asymptotes for both the objective and the constraints.

Remark 5.2.1 Individual asymptotes for each function that has to be approximated were introduced by Fleury [47]. But with very large problems in mind we do not proceed this way, because we would have to compute and store 2n(M − meq + 1) values. Therefore, it only seems to be reasonable to use different asymptotes for the objective on the one hand and the set of inequality constraints on the other hand.

For the rest of this chapter we assume a feasible sequence of asymptotes. We will now formulate algorithm MMA.

Algorithm 4 MMA (method of moving asymptotes)

Step 0: Choose x^0 ∈ X, y^0_j ≥ 0, j = 1, ...,M; compute f(x^0), ∇f(x^0), h_j(x^0), ∇h_j(x^0), j = 1, ...,M; let k := 0.

Step 1: Compute L^k_i and U^k_i (i = 1, ..., n) by some scheme; define f^k(x), h^k_j(x), j = 1, ...,M (cf. (5.1), (5.2) and (5.3)).

Step 2: Solve (P^k_sub)(x^k); let (x^{k+1}, y^{k+1}) be the solution, where y^{k+1} denotes the corresponding vector of Lagrange multipliers.

Step 3: If x^{k+1} = x^k stop; (x^k, y^k) is the solution.

Step 4: Compute f(x^{k+1}), ∇f(x^{k+1}), h_j(x^{k+1}), ∇h_j(x^{k+1}), j = 1, ...,M; let k := k + 1; goto Step 1.

Taking the solution of (P^k_sub) as the new iteration point does not lead to a globally convergent algorithm. One would have to introduce strong assumptions on the model functions that are far away from reality. In order to globalize the convergence of algorithms, one possibility is to introduce so-called merit functions and perform a line-search with respect to such a function. The background is that stationary points of a problem like (P1) are linked to stationary points of the unrestricted merit function. The decrease in the merit function is a measure for the progress in the iteration. For our purposes we chose the augmented Lagrangian merit function, which will be introduced below.


To simplify the notation we rewrite (P1) incorporating the box-constraints into the generalinequality constraints:

min f(x), x ∈ Rn,s.t. hj(x) = 0, j = 1, ...,meq, (P2)

hj(x) ≤ 0, j = meq + 1, ...,m,

where m = M + 2n.

Now we introduce the augmented Lagrangian function.

Definition 5.2.2 The augmented Lagrangian function Φ_ρ : R^{n+m} → R associated to (P2) is, for a fixed set of parameters ρ_j > 0 (j = 1, ...,m), defined by:

Φ_ρ(x, y) = f(x) + Σ_{j=1}^{meq} ( y_j h_j(x) + (ρ_j/2) h_j(x)² )
            + Σ_{j=meq+1}^{m} { y_j h_j(x) + (ρ_j/2) h_j(x)² , if −y_j/ρ_j ≤ h_j(x) ;  −y_j²/(2ρ_j) , otherwise } .

The motivation to choose the augmented Lagrangian merit function results from the following two well-known statements:

a) A point (x∗, y∗) is stationary for Φ_ρ if and only if (x∗, y∗) is stationary for (P2).

b) Under some regularity conditions there is a ρ̄ > 0 ∈ R^m such that x∗ is a local minimizer of Φ_ρ(x) := Φ_ρ(x, y∗) for all ρ ≥ ρ̄.
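As an illustration, Definition 5.2.2 can be transcribed directly into code. The following sketch assumes that f(x) and the constraint values h_j(x) have already been evaluated; all names are ours.

    #include <cstddef>
    #include <vector>

    // Evaluate the augmented Lagrangian Phi_rho(x, y) of Definition 5.2.2.
    // adH holds h_1(x), ..., h_m(x); the first iNumEq entries belong to
    // equality constraints.
    double AugmentedLagrangian( double dF,              // f(x)
        const std::vector<double> & adH,                // h_j(x)
        const std::vector<double> & adY,                // multipliers y_j
        const std::vector<double> & adRho,              // penalties rho_j
        std::size_t iNumEq )                            // m_eq
    {
        double dPhi = dF ;
        for ( std::size_t j = 0 ; j < adH.size() ; ++j )
        {
            // Equality constraints always get the quadratic penalty term;
            // inequalities only on the branch -y_j/rho_j <= h_j(x).
            if ( j < iNumEq || -adY[ j ] / adRho[ j ] <= adH[ j ] )
                dPhi += adY[ j ] * adH[ j ] + 0.5 * adRho[ j ] * adH[ j ] * adH[ j ] ;
            else
                dPhi -= adY[ j ] * adY[ j ] / ( 2.0 * adRho[ j ] ) ;
        }
        return dPhi ;
    }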

For usage in the algorithm below we define:

η^k_i := { ( ∂f(x^k)/∂x_i + τ^k_i ) · (2U^k_i − z^k_i − x^k_i) / (U^k_i − z^k_i)² , if ∂f(x^k)/∂x_i ≥ 0 ;
           ( τ^k_i − ∂f(x^k)/∂x_i ) · (−2L^k_i + z^k_i + x^k_i) / (z^k_i − L^k_i)² , if ∂f(x^k)/∂x_i < 0 }

and

η^k := min_{i=1,...,n} η^k_i .        (5.5)

Another important ingredient for the SCP-algorithm is the procedure to modify the penalty parameter vector in case of violation of the descent property (cf. step 6 of Algorithm 6), which will be introduced now. The motivation for the subsequently used factors can be found in [176].

Moreover, we define:

J := { j | 1 ≤ j ≤ meq } ∪ { j | meq + 1 ≤ j ≤ m ; −y_j/ρ_j ≤ h_j(x) } ,
K := { j | 1 ≤ j ≤ m ; j ∉ J } .


Algorithm 5 Update of penalty parameters

Let κ1 > 1 (const.; e.g. 2), κ2 > κ1 (const.; e.g. 10).

j ∈ J: if ( h_j(x^k) > 0 and ∇h_j(x^k)^T (z^k − x^k) ≠ 0 ) or ( h_j(x^k) < 0 and ∇h_j(x^k)^T (z^k − x^k) > 0 ):

ρ_j := min{ κ2 ρ_j ; max{ κ1 ρ_j ; | 2(v_j − y_j) / h_j(x^k) | } } ,

else: ρ_j := κ1 ρ_j .

j ∈ K: if v_j − y_j < 0:

ρ_j := min{ κ2 ρ_j ; max{ κ1 ρ_j ; | y_j (v_j − y_j) / (4 m η^k (δ^k)²) | } } ,

else: ρ_j := κ1 ρ_j .

κ1 prevents a penalty parameter from converging slowly to an upper bound. κ2 prevents it from going too fast to large values. Notice that it is not necessary to compute the lowest possible penalty parameter; it is only necessary (and reasonable from a numerical point of view) to exceed a certain (unknown) value.
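For illustration, the branch of Algorithm 5 for an index j ∈ J reads in code as follows; the branch for j ∈ K is analogous, with the quotient |y_j(v_j − y_j)/(4mη^k(δ^k)²)|. The function name and the default constants are ours.

    #include <algorithm>
    #include <cmath>

    // One penalty update of Algorithm 5 for an index j in J.
    // dHj = h_j(x^k), dDirDeriv = grad h_j(x^k)^T (z^k - x^k),
    // dVj and dYj are the new and old multipliers.
    double UpdateRhoJ( double dRhoJ, double dHj, double dDirDeriv,
        double dVj, double dYj,
        double dKappa1 = 2.0, double dKappa2 = 10.0 )
    {
        bool bIncrease = ( dHj > 0.0 && dDirDeriv != 0.0 )
                      || ( dHj < 0.0 && dDirDeriv > 0.0 ) ;
        if ( bIncrease )
        {
            double dQuot = std::fabs( 2.0 * ( dVj - dYj ) / dHj ) ;
            return std::min( dKappa2 * dRhoJ, std::max( dKappa1 * dRhoJ, dQuot ) ) ;
        }
        return dKappa1 * dRhoJ ;   // default: moderate increase by kappa1
    }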


We will now formulate the SCP-algorithm.

Algorithm 6 SCP (sequential convex programming)

Step 0: Choose x^0 ∈ X, y^0_j ∈ R, j = 1, ...,meq, y^0_j ≥ 0, j = meq + 1, ...,m, 0 < c < 1 (e.g. 0.001), 0 < ψ < 1 (e.g. 0.5), ρ_j > 0 (e.g. 1), j = 1, ...,m; compute f(x^0), ∇f(x^0), h_j(x^0), ∇h_j(x^0), j = 1, ...,M; let k := 0.

Step 1: Compute L^k_i and U^k_i (i = 1, ..., n) by some scheme; define f^k(x), h^k_j(x), j = 1, ...,M (cf. (5.1), (5.2) and (5.3)).

Step 2: Solve (P^k_sub)(x^k); let (z^k, v^k) be the solution, where v^k denotes the corresponding vector of Lagrange multipliers.

Step 3: If z^k = x^k stop; (x^k, y^k) is the solution.

Step 4: Let p^k := (z^k − x^k, v^k − y^k), δ^k := ‖z^k − x^k‖, and η^k as defined in (5.5).

Step 5: Compute Φ_ρ(x^k, y^k), ∇Φ_ρ(x^k, y^k), and ∇Φ_ρ(x^k, y^k)^T p^k.

Step 6: If ∇Φ_ρ(x^k, y^k)^T p^k > −η^k (δ^k)² / 2, update the penalty parameters ρ_j (j = 1, ...,m) according to Algorithm 5 and goto Step 5; otherwise, let σ := 1.

Step 7: Compute f(x^k + σ(z^k − x^k)), h_j(x^k + σ(z^k − x^k)), j = 1, ...,M, and Φ_ρ( (x^k, y^k) + σ p^k ).

Step 8 (Armijo condition): If Φ_ρ(x^k, y^k) − Φ_ρ( (x^k, y^k) + σ p^k ) < −c σ ∇Φ_ρ(x^k, y^k)^T p^k, let σ := σ · ψ and goto Step 7; else let σ_k := σ.

Step 9: Let (x^{k+1}, y^{k+1}) := (x^k, y^k) + σ_k p^k, k := k + 1.

Step 10: Compute ∇f(x^k), ∇h_j(x^k), j = 1, ...,M; goto Step 1.
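Steps 7 and 8 of Algorithm 6 form a standard Armijo backtracking loop. A compact sketch, where oPhi is assumed to evaluate the merit function along the direction p^k; the name and defaults are ours.

    #include <functional>

    // Armijo backtracking of steps 7 and 8 of Algorithm 6. oPhi( dSigma ) is
    // assumed to evaluate Phi_rho( (x^k, y^k) + dSigma * p^k ); dPhi0 is
    // Phi_rho( x^k, y^k ) and dDirDeriv is the directional derivative
    // grad Phi_rho^T p^k, which is negative after step 6. In practice a
    // lower bound on dSigma should be enforced in addition.
    double ArmijoStep( const std::function< double( double ) > & oPhi,
                       double dPhi0, double dDirDeriv,
                       double dC = 0.001, double dPsi = 0.5 )
    {
        double dSigma = 1.0 ;                       // step 6: sigma := 1
        while ( dPhi0 - oPhi( dSigma ) < -dC * dSigma * dDirDeriv )
            dSigma *= dPsi ;                        // step 8: reduce the step
        return dSigma ;                             // sigma_k of step 8
    }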

5.3. Predictor-corrector interior point method

It is beyond the scope of this chapter to go into the convergence theory of interior point methods for convex programming. This work has been done mainly by Nesterov and Nemirovskii ([97]) for classes of interior point methods. One of the first practical methods for nonlinear convex programming appeared in [72]. Svanberg ([155]) developed, in parallel to [174], a primal-dual interior point method for the solution of the MMA subproblems and modified MMA subproblems.


We chose the predictor-corrector interior point method for the solution of our convex programs. It has been proven to be one of the most efficient interior point methods for linear programming (see [89] and [93] for the development of the method and [89] for a numerical evaluation). The basic structure of our problem does not differ very much from a linear program.

We proceed from problem (P^k_sub) and modify it in order to prepare it for the application of the predictor-corrector method. For that purpose we add nonnegative slack variables wherever inequalities appear:

min f^k(x), x ∈ Rn,
s.t. h^k_j(x) = 0, j = 1, ...,meq,
     h^k_j(x) + c_{j−meq} = 0, j = meq + 1, ...,M,
     −c_j + r_j = 0, j = 1, ...,M − meq,        (5.6)
     x̲'_i − x_i + s_i = 0, i = 1, ..., n,
     x_i − x̄'_i + t_i = 0, i = 1, ..., n,
     r, s, t ≥ 0.

The slacks s and t are introduced because the variable x is formally free and allowed to violate its bounds. Slack r is not absolutely necessary, but with its usage we allow c, and in consequence the dual of the original subproblem constraints, to become 0. If we do not use slack r, we need c and the dual variable y to be positive. The computational effort is affected only slightly by this modification.

The positivity is only demanded for the newly introduced variables r, s and t, variables that do not appear outside the subproblems, i.e. in the main loop.

For the next step we add barrier terms corresponding to the slack variables and build the Lagrangian of the so modified problem:

L_µ(x, y_eq, y_ie, c, r, s, t, d_r, d_s, d_t) = f^k(x) − µ Σ_{j=1}^{M−meq} ln r_j − µ Σ_{i=1}^{n} ln s_i − µ Σ_{i=1}^{n} ln t_i
    + y_eq^T h^k_eq(x) + y_ie^T ( h^k_ie(x) + c )
    + d_r^T (−c + r) + d_s^T (x̲' − x + s) + d_t^T (x − x̄' + t) ,

where h^k_eq := (h^k_1, ..., h^k_meq)^T, h^k_ie := (h^k_{meq+1}, ..., h^k_M)^T and y_eq, y_ie, d_r, d_s, d_t are the dual variable vectors to the corresponding constraints. The combination of y_eq and y_ie corresponds to the vector y of the last section, i.e. y = (y_eq, y_ie)^T. µ is a positive homotopy parameter. Formally, in this formulation all variables are free; the barrier formulation, however, needs r, s and t to be positive.

The necessary optimality condition ∇L_µ = 0 then reads:


∇x :    ∇f^k(x) + J_eq y_eq + J_ie y_ie − d_s + d_t = 0,
∇y_eq : h^k_eq(x) = 0,
∇y_ie : h^k_ie(x) + c = 0,
∇c :    y_ie − d_r = 0,
∇r :    D_r R e − µe = 0,
∇s :    D_s S e − µe = 0,        (S0)
∇t :    D_t T e − µe = 0,
∇d_r :  −c + r = 0,
∇d_s :  x̲' − x + s = 0,
∇d_t :  x − x̄' + t = 0,

where R = diag(r_1, ..., r_{M−meq}), and S, T, D_r, D_s, D_t are defined analogously; e = (1, 1, ..., 1)^T in the appropriate dimension,

J_eq = ( ∂h^k_j(x)/∂x_i )_{i=1,...,n; j=1,...,meq} ∈ R^{n,meq}   and   J_ie = ( ∂h^k_j(x)/∂x_i )_{i=1,...,n; j=meq+1,...,M} ∈ R^{n,M−meq} ,

i.e. the columns of J_eq and J_ie are the gradients of the approximated equality and inequality constraints, respectively.

The linear system to compute a Newton step for the solution of this system reads as follows. With the unknowns ordered as (∆x, ∆y_eq, ∆y_ie, ∆c, ∆r, ∆s, ∆t, ∆d_r, ∆d_s, ∆d_t) and the rows ordered as in (S0), the nonzero blocks are

∇xxL ∆x + J_eq ∆y_eq + J_ie ∆y_ie − ∆d_s + ∆d_t
J_eq^T ∆x
J_ie^T ∆x + ∆c
∆y_ie − ∆d_r
D_r ∆r + R ∆d_r
D_s ∆s + S ∆d_s        = −∇L_µ ,        (S1)
D_t ∆t + T ∆d_t
−∆c + ∆r
−∆x + ∆s
∆x + ∆t


where ∇xxL = ∇²f^k(x) + d/dx (J_eq y_eq) + d/dx (J_ie y_ie). We will now discuss this term in a little more detail; it is crucial for the effectiveness of the interior point method. For the equalities the matrix J_eq is constant and thus d/dx (J_eq y_eq) = 0. For the inequalities the term does not vanish:

J_ie y_ie = ( Σ_{j=meq+1}^{M} ∂h^k_j(x)/∂x_1 · (y_ie)_{j−meq} , . . . , Σ_{j=meq+1}^{M} ∂h^k_j(x)/∂x_n · (y_ie)_{j−meq} )^T

⟹  ∂(J_ie y_ie)_k / ∂x_i = ∂/∂x_i ( Σ_{j=meq+1}^{M} ∂h^k_j(x)/∂x_k · (y_ie)_{j−meq} ) = Σ_{j=meq+1}^{M} ∂²h^k_j(x)/(∂x_k ∂x_i) · (y_ie)_{j−meq} .

Since the functions h^k_j (j = meq + 1, ...,M) are separable, we have ∂²h^k_j(x)/(∂x_k ∂x_i) = 0 if k ≠ i, i.e. these Hessians are diagonal. Thus, ∇xxL is completely determined, since

d/dx (J_ie y_ie) = diag( Σ_{j=meq+1}^{M} ∂²h^k_j(x)/∂x_k² · (y_ie)_{j−meq} )_{k=1,...,n}

and f^k is separable, too. From a computational point of view, ∇xxL is consequently not an n × n matrix but an n-vector, a central point when considering problems with a large number of variables. Remember that each component of ∇xxL is strictly positive due to the convexity properties of the functions involved.

For the purpose of computational efficiency, let us have a look at the nonzero structure of the matrix J_ie. For one particular component of the gradient of a constraint we have (cf. (5.2)):

∂h^k_j(x)/∂x_i = { ∂h_j(x^k)/∂x_i · (U^k_i − x^k_i)² / (U^k_i − x_i)² , if ∂h_j(x^k)/∂x_i ≥ 0 ;
                   ∂h_j(x^k)/∂x_i · (x^k_i − L^k_i)² / (x_i − L^k_i)² , if ∂h_j(x^k)/∂x_i < 0 } .        (5.7)

Since the second factor is in both cases always strictly positive, the nonzero structure of J_ie at an arbitrary point is identical to the nonzero structure of the original constraint gradients at the current main iteration point x^k. The same is true for the matrix J_eq. That means, sparsity in the original problem is preserved for the subproblems.
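One entry of J_ie according to (5.7) can be computed as follows; a small sketch with our own names:

    // Gradient component of the MMA approximation, cf. (5.7).
    // dGradXk = dh_j(x^k)/dx_i, dXk and dX are the i-th components of the
    // expansion and evaluation points, dL and dU the asymptotes L^k_i, U^k_i.
    double MmaGradComponent( double dGradXk, double dXk, double dX,
                             double dL, double dU )
    {
        if ( dGradXk >= 0.0 )
        {
            double dQ = ( dU - dXk ) / ( dU - dX ) ;
            return dGradXk * dQ * dQ ;    // strictly positive scaling factor
        }
        double dQ = ( dXk - dL ) / ( dX - dL ) ;
        return dGradXk * dQ * dQ ;        // sign of dh_j(x^k)/dx_i is kept
    }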

We will now briefly state the principles of the predictor-corrector interior point algorithm, which has been chosen for the solution of (P^k_sub).


Algorithm 7 Predictor-corrector algorithm

Step 0: Choose x^0 ∈ X', c^0 ≥ 0, r^0, s^0, t^0, d_r^0, d_s^0, d_t^0 > 0, y_ie^0 ≥ 0, y_eq^0, and ε (solution tolerance). Let k := 0.

Step 1: Let µ^0 := max{ r^0_j (d_r^0)_j , s^0_i (d_s^0)_i , t^0_i (d_t^0)_i : j = 1, ...,M − meq, i = 1, ..., n }.

Step 2: Solve (S1) for the predictor step (cf. Section 5.3.1).

Step 3: Let µ^{k+1} = µ^{k+1}(µ^k, ∆x, ..., ∆d_t) (see below).

Step 4: Solve (S1) for the corrector step (cf. Section 5.3.2) using the results of the predictor step and µ^{k+1}.

Step 5: Compute independent primal and dual stepsizes. Let κ := 10 · ε_mach, where ε_mach is the machine precision. Set

δ_ip,primal := min over j = 1, ...,M − meq and i = 1, ..., n of
    (r^k_j − κ)/(−∆r_j) for all j with r^k_j + ∆r_j < κ ,
    (s^k_i − κ)/(−∆s_i) for all i with s^k_i + ∆s_i < κ ,
    (t^k_i − κ)/(−∆t_i) for all i with t^k_i + ∆t_i < κ ;

δ_ip,dual := min over j = 1, ...,M − meq and i = 1, ..., n of
    ((d_r^k)_j − κ)/(−(∆d_r)_j) for all j with (d_r^k)_j + (∆d_r)_j < κ ,
    ((d_s^k)_i − κ)/(−(∆d_s)_i) for all i with (d_s^k)_i + (∆d_s)_i < κ ,
    ((d_t^k)_i − κ)/(−(∆d_t)_i) for all i with (d_t^k)_i + (∆d_t)_i < κ .

Then set δ_ip,primal := 1 if the minimum above is not defined, else δ_ip,primal := min{ 1 , 0.99995 · δ_ip,primal }, and analogously δ_ip,dual := 1 if its minimum is not defined, else δ_ip,dual := min{ 1 , 0.99995 · δ_ip,dual }. Finally set

δ_primal := min over i = 1, ..., n of { δ_ip,primal ; (x^k_i − x̲'_i)/(−∆x_i) for all i with x^k_i + ∆x_i < x̲'_i ; (x^k_i − x̄'_i)/(−∆x_i) for all i with x^k_i + ∆x_i > x̄'_i } ;

δ_dual := min over j = 1, ...,M − meq of { δ_ip,dual ; (y_ie^k)_j/(−(∆y_ie)_j) for all j with (y_ie^k)_j + (∆y_ie)_j < 0 } .

Step 6: Update:

x^{k+1} := x^k + δ_primal ∆x, y_eq^{k+1} := y_eq^k + δ_dual ∆y_eq, y_ie^{k+1} := y_ie^k + δ_dual ∆y_ie,
c^{k+1} := c^k + δ_primal ∆c, r^{k+1} := r^k + δ_primal ∆r, s^{k+1} := s^k + δ_primal ∆s,
t^{k+1} := t^k + δ_primal ∆t, d_r^{k+1} := d_r^k + δ_dual ∆d_r, d_s^{k+1} := d_s^k + δ_dual ∆d_s, d_t^{k+1} := d_t^k + δ_dual ∆d_t.

Let k := k + 1. If ‖∇L_0‖ < ε stop, else goto Step 2.


Remark 5.3.1

• For the initialization in Step 0 one can take the values of the previous main (MMA/SCP) iteration. One possibly has to take care of the positivity condition.

• The factor 0.99995 prevents the components of the critical vectors from getting too close to 0.

• $\delta_{ip,primal}$ and $\delta_{ip,dual}$ are the stepsizes that ensure the interior point condition. $\delta_{primal}$ additionally takes the definition of the approximations into account, where we have the stronger demand $x \in X'$. $\delta_{dual}$ ensures that $y_{ie} \geq 0$ throughout, to guarantee the positivity of the Hessian of the Lagrangian. (A small code sketch of this stepsize rule follows below.)
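The interior point stepsize computation of Step 5 can be sketched as follows; the helper name and vector layout are illustrative assumptions, not part of SCPIP:

    #include <vector>
    #include <cfloat>
    #include <algorithm>

    // Sketch of the stepsize rule of Step 5 for one pair of vectors:
    // v is a strictly positive quantity (e.g. r, s or t), dv the
    // corresponding block of the Newton step.
    double FractionToBoundary(const std::vector<double>& v,
                              const std::vector<double>& dv)
    {
        const double kappa = 10.0 * DBL_EPSILON;   // kappa = 10 * eps_mach
        double delta = DBL_MAX;                    // marks "not defined"
        for (std::size_t i = 0; i < v.size(); ++i)
            if (v[i] + dv[i] < kappa)              // step would leave interior
                delta = std::min(delta, (v[i] - kappa) / (-dv[i]));
        if (delta == DBL_MAX)
            return 1.0;                            // no blocking component
        return std::min(1.0, 0.99995 * delta);     // damping factor of Step 5
    }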

We will now consider step 2 of the algorithm above, i.e., we will analyze how the linear system (S1) can be solved for the predictor step in practice.

Update of the homotopy parameter

The homotopy parameter $\mu$ of step 3 of the predictor-corrector interior point algorithm (Alg. 7) is updated as:

\[
\mu_{k+1} := \frac{1}{n + n + M - m_{eq}} \Big[ (s^k + \delta'_{primal} \cdot \Delta s)^T (d^k_s + \delta'_{dual} \cdot \Delta d_s) + (t^k + \delta'_{primal} \cdot \Delta t)^T (d^k_t + \delta'_{dual} \cdot \Delta d_t) + (r^k + \delta'_{primal} \cdot \Delta r)^T (d^k_r + \delta'_{dual} \cdot \Delta d_r) \Big],
\]

where $\Delta s, \Delta t, \Delta r, \Delta d_s, \Delta d_t$ and $\Delta d_r$ are the results of the predictor step, and $\delta'_{primal}$ and $\delta'_{dual}$ are intermediate stepsizes computed in the same way as in step 5 of Algorithm 7; a small code sketch follows.
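In code, the update can be sketched as follows (an illustrative helper, assuming the predictor quantities have already been damped by the intermediate stepsizes, e.g. sNew[i] = s[i] + delta'_primal * ds[i]):

    #include <vector>
    #include <numeric>

    // Sketch of the homotopy update of Step 3: average of the duality
    // products over all n + n + (M - m_eq) complementarity pairs.
    double UpdateMu(const std::vector<double>& sNew, const std::vector<double>& dsNew,
                    const std::vector<double>& tNew, const std::vector<double>& dtNew,
                    const std::vector<double>& rNew, const std::vector<double>& drNew)
    {
        double total =
            std::inner_product(sNew.begin(), sNew.end(), dsNew.begin(), 0.0) +
            std::inner_product(tNew.begin(), tNew.end(), dtNew.begin(), 0.0) +
            std::inner_product(rNew.begin(), rNew.end(), drNew.begin(), 0.0);
        return total / double(sNew.size() + tNew.size() + rNew.size());
    }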

This update rule is well established in linear programming, cf. [167] or [160], and works very well in our context, too.

5.3.1. Predictor step

Indefinite linear system

The predictor step is only used to estimate a proper value for the homotopy parameter $\mu$ and to estimate the nonlinearity terms for the corrector step. It corresponds to an affine scaling step.

For the predictor step we solve the linear system (S1), but without the terms containing $\mu$ in the right-hand side.

For this purpose, we will now eliminate some variables of the linear system in order to get one that can be handled more easily. Notice that all matrices in this section that are inverted are positive diagonal matrices and thus are invertible.

Choose equation 5 of (S1): $D_r \Delta r + R \Delta d_r = -D_r R e$;

\[
\Longrightarrow\; \Delta d_r = -d_r - R^{-1} D_r \Delta r. \tag{5.8}
\]


Analogously we get for the equations 6 and 7 of (S1):

\[
\Delta d_s = -d_s - S^{-1} D_s \Delta s, \tag{5.9}
\]
\[
\Delta d_t = -d_t - T^{-1} D_t \Delta t. \tag{5.10}
\]

Substitute in equation 1 of (S1): $\nabla_{xx}L \Delta x + J_{eq} \Delta y_{eq} + J_{ie} \Delta y_{ie} - \Delta d_s + \Delta d_t = -\nabla f^k(x) - J_{eq} y_{eq} - J_{ie} y_{ie} + d_s - d_t$;

\[
\Longrightarrow\; \nabla_{xx}L \Delta x + J_{eq} \Delta y_{eq} + J_{ie} \Delta y_{ie} + S^{-1} D_s \Delta s - T^{-1} D_t \Delta t = -\nabla f^k(x) - J_{eq} y_{eq} - J_{ie} y_{ie}. \tag{5.11}
\]

Substitute in equation 4 of (S1):

\[
\Delta y_{ie} + R^{-1} D_r \Delta r = -y_{ie}. \tag{5.12}
\]

Further eliminations in the equations 8, 9 and 10 of (S1) yield:

\[
\Delta r = \Delta c + c - r, \tag{5.13}
\]
\[
\Delta s = \Delta x + x - x' - s, \tag{5.14}
\]
\[
\Delta t = -\Delta x - x + x' - t. \tag{5.15}
\]

Substitution in (5.11) and (5.12) yields:

\[
(\nabla_{xx}L + S^{-1} D_s + T^{-1} D_t) \Delta x + J_{eq} \Delta y_{eq} + J_{ie} \Delta y_{ie} = -\nabla f^k(x) - J_{eq} y_{eq} - J_{ie} y_{ie} + d_s - d_t - S^{-1} D_s (x - x') + T^{-1} D_t (x' - x) =: \gamma_1, \tag{5.16}
\]
\[
\Delta y_{ie} + R^{-1} D_r \Delta c = -y_{ie} + d_r - R^{-1} D_r c. \tag{5.17}
\]

The last equation is used to eliminate $\Delta c$:

\[
\Delta c = D_r^{-1} R (-y_{ie} + d_r - R^{-1} D_r c - \Delta y_{ie}) = D_r^{-1} R (-y_{ie} + d_r - \Delta y_{ie}) - c. \tag{5.18}
\]

This has to be inserted in equation 3 of (S1):

\[
J_{ie}^T \Delta x - D_r^{-1} R \Delta y_{ie} = -h^k_{ie}(x) - D_r^{-1} R (-y_{ie} + d_r) =: \gamma_2. \tag{5.19}
\]

Putting it all together we get the following linear system:

\[
\begin{pmatrix}
\nabla_{xx}L + S^{-1} D_s + T^{-1} D_t & J_{eq} & J_{ie}\\
J_{eq}^T & & \\
J_{ie}^T & & -D_r^{-1} R
\end{pmatrix}
\begin{pmatrix}
\Delta x\\ \Delta y_{eq}\\ \Delta y_{ie}
\end{pmatrix}
=
\begin{pmatrix}
\gamma_1\\ -h^k_{eq}(x)\\ \gamma_2
\end{pmatrix}.
\tag{S2}
\]

This system is indefinite and has full rank provided that $J_{eq}$ and $J_{ie}$ are of full rank (remember $\nabla_{xx}L > 0$). Its dimension is $(n+M) \times (n+M)$. The upper left and lower right parts are diagonal. Thus, the matrix can be considered as sparse. Additional sparsity of the Jacobians improves the situation. It is now possible to solve this system with a sparse indefinite linear system solver.

SCPIP does not support this approach. We will proceed with two more possibilities and will show which approach is favorable in which situation.


Positive definite system of dimension M × M

First of all we define
\[
\Theta := \nabla_{xx}L + S^{-1} D_s + T^{-1} D_t.
\]

We proceed from (S2) by eliminating $\Delta x$ from equation 1:

\[
\Delta x = \Theta^{-1} (\gamma_1 - J_{eq} \Delta y_{eq} - J_{ie} \Delta y_{ie}). \tag{5.20}
\]

Inserting in the second equation yields:

\[
J_{eq}^T \Theta^{-1} J_{eq} \Delta y_{eq} + J_{eq}^T \Theta^{-1} J_{ie} \Delta y_{ie} = h^k_{eq}(x) + J_{eq}^T \Theta^{-1} \gamma_1. \tag{5.21}
\]

Inserting in the third equation yields:

\[
J_{ie}^T \Theta^{-1} J_{eq} \Delta y_{eq} + (J_{ie}^T \Theta^{-1} J_{ie} + D_r^{-1} R) \Delta y_{ie} = J_{ie}^T \Theta^{-1} \gamma_1 - \gamma_2. \tag{5.22}
\]

Together we have

\[
\begin{pmatrix}
J_{eq}^T \Theta^{-1} J_{eq} & J_{eq}^T \Theta^{-1} J_{ie}\\
J_{ie}^T \Theta^{-1} J_{eq} & J_{ie}^T \Theta^{-1} J_{ie} + D_r^{-1} R
\end{pmatrix}
\begin{pmatrix}
\Delta y_{eq}\\ \Delta y_{ie}
\end{pmatrix}
=
\begin{pmatrix}
h^k_{eq}(x) + J_{eq}^T \Theta^{-1} \gamma_1\\
J_{ie}^T \Theta^{-1} \gamma_1 - \gamma_2
\end{pmatrix}.
\tag{S3}
\]

This system is positive definite. Its dimension is $M \times M$. If $J_{eq}$ and $J_{ie}$ are sparse, then we can hope that the same is true for the matrix in (S3). Unfortunately, this cannot be ensured. In the context of linear programming this case was extensively examined. If one of the Jacobians has at least one dense column, then the matrix is dense, too. On the other hand, there are techniques to overcome this situation by splitting dense columns. It is beyond the scope of this paper to outline these techniques. The reader is referred to the book of Wright [167] and the references cited therein. Notice that it is not worthwhile to think about sparse positive definite systems if the Jacobians are dense overall, since then, of course, the matrix in (S3) is also dense.

In SCPIP there are three possibilities to solve the linear system (S3). Parameter "SubProblemStrat" has to be set to 1; a short configuration sketch follows the list below.

• A dense Cholesky solver can be chosen. This is favorable in case of a small or medium number of constraints (M). In this case, parameter "LinEqSolver" has to be set to 1.

• A sparse Cholesky solver can also be chosen. This should be the best approach (within these three) for large M and sparse Jacobians $J_{ie}$ and $J_{eq}$, if it does not lead to a dense matrix in (S3). "LinEqSolver" has to be set to 2.

• The third possibility is a conjugate gradient (cg) solver, i.e., an iterative solution method. This is favorable in case of large M and dense $J_{ie}$ and $J_{eq}$, or a large M and sparse Jacobians if this leads to a dense matrix in (S3). "LinEqSolver" has to be set to 3.
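As a small configuration sketch (assuming an ScpWrapper object and the Put interface documented in Section 5.5), selecting approach (S3) with a sparse Cholesky solver reads:

    ScpWrapper scp;
    scp.Put( "SubProblemStrat", 1 );  // subproblems solved via the M x M system (S3)
    scp.Put( "LinEqSolver", 2 );      // sparse Cholesky factorization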

Positive definite system of dimension n × n

For this approach we have to assume the absence of equality constraints. That means, we proceed from equation (S2) by deleting row 2 and column 2, and then eliminating $\Delta y_{ie}$ instead of $\Delta x$:

\[
\Delta y_{ie} = R^{-1} D_r (J_{ie}^T \Delta x - \gamma_2). \tag{5.23}
\]


Inserting in the first equation yields:

\[
(\Theta + J_{ie} R^{-1} D_r J_{ie}^T) \Delta x = \gamma_1 + J_{ie} R^{-1} D_r \gamma_2. \tag{S4}
\]

As in the last subsection, this system is positive definite. Its dimension is $n \times n$. The remarks about sparsity are also valid here, replacing columns by rows. The impact of dense rows or columns is examined in more detail in the book of Vanderbei [160].

In SCPIP there are three possibilities to solve the linear system (S4). Parameter "SubProblemStrat" has to be set to 2. This is analogous to the last subsection.

• A dense Cholesky solver can be chosen. This is favorable in case of a small or medium number of variables (n). In this case, parameter "LinEqSolver" has to be set to 1.

• A sparse Cholesky solver can be chosen. This should be the best approach (within these three) for large n and a sparse Jacobian $J_{ie}$, if it does not lead to a dense matrix in (S4). (LinEqSolver = 2)

• The third possibility is a cg solver. This is favorable in case of large n and a dense $J_{ie}$, or a large n and a sparse $J_{ie}$ if it leads to a dense matrix in (S4). (LinEqSolver = 3)

Comparison of the three approaches

We will now summarize the properties of the three approaches of this section and compare them with each other as well as with the classical dual approach. The time that is spent for the repeated solution of the linear systems is crucial for the computing time of the solution of a subproblem and thus also important for the overall algorithm (together with the computational effort for the function and gradient evaluations). Notice that the method used to solve the subproblems theoretically has no influence on the number of main iterations (if we solve the subproblems up to exact optimality).

For the comparison we have to mention that approach (S4) does not work for equality constraints and that the dual approach has never been worked out for this situation. So the comparison is only fair for problems without equality constraints.

For the dual approach, finally a concave maximization problem with simple nonnegativity constraints has to be solved. But this problem is only once continuously differentiable. The second derivative has jumps and therefore exists only almost everywhere. These jumps are not only theoretically important, they are also numerically relevant (cf. [46]). Thus, we compare with a BFGS based maximization algorithm. The term sparsity in Table 5.1 refers to the case where the Jacobians are sparse.

Table 5.1.: Comparison of linear systems

              | dual     | (S2)              | (S3)     | (S4)
    Dimension | M × M    | (n+M) × (n+M)     | M × M    | n × n
    Definite? | positive | indefinite        | positive | positive
    Sparsity  | dense    | sparse            | ?        | ?


Let us summarize these properties: In the experience of the author, the dual approach is no longer relevant. Its advantage, namely that in the case of few constraints the computational effort reduces to the solution of M × M systems, is also offered by approach (S3), and the interior point approaches are much more stable. In case of large n and small M approach (S3) is favorable, and, vice versa, approach (S4) if n is small and M is large. The choice is not so easy if n and M are in the same range; then approaches (S3) and (S4) are comparable, and it depends on the sparsity of $J_{eq}$ and $J_{ie}$. In cases where a sparse Jacobian causes dense positive definite systems, approach (S2) is also a practical alternative.

Which specific linear system solver should be chosen is a difficult question. For small and medium dimensions the dense Cholesky solver is usually the best choice. The sparse Cholesky option is the best approach for large dimensions and sparse matrices. In case of large dense matrices, or if the Cholesky decomposition needs too much storage, the iterative solver is the best possibility. At which point it is best to switch from one possibility to the other is hard to predict; it is a matter of further research and numerical experience.

5.3.2. Corrector step

For the corrector step we take the equations (S0), replace the vector $(x, \dots, d_t)$ by the new point $(x + \Delta x, \dots, d_t + \Delta d_t)$ and put all zero order and second order terms in the unknowns into the right-hand side. The right-hand side of the system (S1) then changes as follows (w.r.t. the predictor step): the $\mu$-terms are added, as they are part of the original gradient of the Lagrangian, and the nonlinear terms mentioned above are estimated by the results of the predictor step. The matrix of (S1) remains unchanged. This makes the approach attractive, because the matrix has to be factorized only once using a direct solver.

The following terms have to be added to the right-hand side of (S1):

\[
\begin{aligned}
\text{Component 1:}\quad & -\left( \sum_{j=m_{eq}+1}^{M} \frac{\partial^2 h^k_j}{\partial x_k^2}\, \underline{\Delta x_k} (\underline{\Delta y_{ie}})_{j-m_{eq}} \right)_{k=1,\dots,n},\\
\text{Component 5:}\quad & +\mu e - \underline{\Delta r}\, \underline{\Delta d_r},\\
\text{Component 6:}\quad & +\mu e - \underline{\Delta s}\, \underline{\Delta d_s},\\
\text{Component 7:}\quad & +\mu e - \underline{\Delta t}\, \underline{\Delta d_t},
\end{aligned}
\]

where the underlined terms stand for the results of the predictor step.

The same eliminations that have been done for the predictor system are applied to the corrector system. Firstly, equations 5, 6 and 7 of (S1):

\[
\Delta d_r = -d_r + R^{-1}(\mu e - D_r \Delta r - \Delta r \Delta d_r), \tag{5.24}
\]
\[
\Delta d_s = -d_s + S^{-1}(\mu e - D_s \Delta s - \Delta s \Delta d_s), \tag{5.25}
\]
\[
\Delta d_t = -d_t + T^{-1}(\mu e - D_t \Delta t - \Delta t \Delta d_t). \tag{5.26}
\]

Substitute in equation 1 of (S1):

\[
\begin{aligned}
&\nabla_{xx}L \Delta x + J_{eq} \Delta y_{eq} + J_{ie} \Delta y_{ie} + S^{-1} D_s \Delta s - T^{-1} D_t \Delta t =\\
&\quad -\nabla f^k(x) - J_{eq} y_{eq} - J_{ie} y_{ie} + S^{-1}(\mu e - \Delta s \Delta d_s) - T^{-1}(\mu e - \Delta t \Delta d_t) - \left( \sum_{j=m_{eq}+1}^{M} \frac{\partial^2 h^k_j}{\partial x_k^2} \Delta x_k (\Delta y_{ie})_{j-m_{eq}} \right)_{k=1,\dots,n}.
\end{aligned}
\tag{5.27}
\]


Substitute in equation 4 of (S1):

\[
\Delta y_{ie} + R^{-1} D_r \Delta r = -y_{ie} + R^{-1}(\mu e - \Delta r \Delta d_r). \tag{5.28}
\]

Further eliminations in the equations 8, 9 and 10 of (S1) yield (as for the predictor step):

\[
\Delta r = \Delta c + c - r, \tag{5.29}
\]
\[
\Delta s = \Delta x + x - x' - s, \tag{5.30}
\]
\[
\Delta t = -\Delta x - x + x' - t. \tag{5.31}
\]

Substitution in equations (5.27) and (5.28) yields:

\[
\begin{aligned}
&(\nabla_{xx}L + S^{-1} D_s + T^{-1} D_t) \Delta x + J_{eq} \Delta y_{eq} + J_{ie} \Delta y_{ie} =\\
&\quad -\nabla f^k(x) - J_{eq} y_{eq} - J_{ie} y_{ie} + d_s - d_t - S^{-1} D_s (x - x') + T^{-1} D_t (x' - x)\\
&\quad + S^{-1}(\mu e - \Delta s \Delta d_s) - T^{-1}(\mu e - \Delta t \Delta d_t) - \left( \sum_{j=m_{eq}+1}^{M} \frac{\partial^2 h^k_j}{\partial x_k^2} \Delta x_k (\Delta y_{ie})_{j-m_{eq}} \right)_{k=1,\dots,n} =: \gamma_3,
\end{aligned}
\tag{5.32}
\]

\[
\Delta y_{ie} + R^{-1} D_r \Delta c = -y_{ie} + d_r - R^{-1} D_r c + R^{-1}(\mu e - \Delta r \Delta d_r). \tag{5.33}
\]

We use this equation in order to eliminate $\Delta c$:

\[
\Delta c = D_r^{-1} R \left( -y_{ie} + d_r - R^{-1} D_r c + R^{-1}(\mu e - \Delta r \Delta d_r) - \Delta y_{ie} \right) = D_r^{-1} R \left( -y_{ie} + d_r + R^{-1}(\mu e - \Delta r \Delta d_r) - \Delta y_{ie} \right) - c. \tag{5.34}
\]

Inserting in equation 3 of (S1):

\[
J_{ie}^T \Delta x - D_r^{-1} R \Delta y_{ie} = -h^k_{ie}(x) - D_r^{-1} R \left( -y_{ie} + d_r + R^{-1}(\mu e - \Delta r \Delta d_r) \right) =: \gamma_4.
\]

Putting it all together we get the following linear system:

\[
\begin{pmatrix}
\nabla_{xx}L + S^{-1} D_s + T^{-1} D_t & J_{eq} & J_{ie}\\
J_{eq}^T & & \\
J_{ie}^T & & -D_r^{-1} R
\end{pmatrix}
\begin{pmatrix}
\Delta x\\ \Delta y_{eq}\\ \Delta y_{ie}
\end{pmatrix}
=
\begin{pmatrix}
\gamma_3\\ -h^k_{eq}(x)\\ \gamma_4
\end{pmatrix}.
\tag{S5}
\]

In comparison with the corresponding predictor system (S2), the matrix remains unchanged; only the right-hand side has changed. This is important to justify an iteration that requires two linear system solutions: a direct linear solver decomposes the matrix in the predictor step and reuses this decomposition in the corrector step, so that only an additional backward substitution is necessary. This is also true for the two other approaches, which resulted in the equations (S3) and (S4) for the predictor step. We will not formulate the corresponding corrector equations; they are obtained by replacing $\gamma_1$ by $\gamma_3$ and $\gamma_2$ by $\gamma_4$.

The advantage of using a decomposed matrix twice does not apply to the cg method. Presently, the cg solver is also applied twice, but in a future release it is planned to use the classical primal-dual interior point approach, which needs only one linear system solution per iteration but usually a few more iterations than the predictor-corrector method.


5.4. Convergence results

This section provides a short summary of the most important convergence results. Details can be found in [176].

Theorem 5.4.1 Let the sequence $(x^k, y^k)_{k=0,1,2,\dots}$ be produced by the SCP-algorithm, let all subproblems be solvable, and let the gradients of active constraints at the optimal points of $(P^k_{sub})$ be linearly independent, as well as those of $(P^*_{sub})$, where $(P^*_{sub})$ is the subproblem defined at any possible accumulation point $x^*$ of $(x^k)_{k=0,1,2,\dots}$. Then all points $x^k$ are in $X$ and all $y^k$ are in a compact set $U$, i.e., there is a $y_{max}$ with $|y^k_j|, |v^k_j| \leq y_{max}$ for all $k \geq 0$ and $j = 1, \dots, m$.

Theorem 5.4.2 Let the assumptions of Theorem 5.4.1 be valid. Let $p^k$, $\eta^k$ and $\delta^k$ be defined as in Algorithm 6 and let the choice of asymptotes be feasible.

a) Then there are penalty parameters $\rho^k_j > 0$ ($j = 1, \dots, m$) such that $p^k$ is a direction of descent for the augmented Lagrangian function $\Phi_\rho$ for all $\rho \geq \rho^k$, that means

\[
\nabla \Phi_\rho(x^k, y^k)^T p^k \leq -\frac{\eta^k (\delta^k)^2}{2} \quad \text{for all } \rho \geq \rho^k.
\]

b) For each fixed $\delta > 0$ there is a finite $\rho_\delta$ such that for all $(x^k, y^k)$ with $\delta^k \geq \delta$ we have

\[
\nabla \Phi_\rho(x^k, y^k)^T p^k \leq -\frac{\eta^k (\delta^k)^2}{2} \leq -\frac{\eta \delta^2}{2} \quad \text{for all } \rho \geq \rho_\delta.
\]

The first part of this theorem shows that we can always find penalty parameters such that the computed search direction is a direction of descent for the augmented Lagrangian. The second part guarantees that the penalty parameters are uniformly bounded, provided that we are not too close to a stationary point.

This suffices to prove a weak convergence theorem, i.e., the existence of an accumulation point, and that at least one accumulation point is stationary. To be able to prove a strong convergence theorem, i.e., in addition to the weak theorem, that each accumulation point is stationary, we have to conclude that the penalty parameters are uniformly bounded in the neighborhood of a stationary point, too. For this purpose we need some additional but reasonable assumptions.

Recall that

\[
J := \{ j \mid 1 \leq j \leq m_{eq} \} \cup \left\{ j \;\middle|\; m_{eq} + 1 \leq j \leq m,\; -\frac{y_j}{\rho_j} \leq h_j(x) \right\},
\]
\[
K := \{ j \mid 1 \leq j \leq m,\; j \notin J \}.
\]

Theorem 5.4.3 Let the assumptions of Theorem 5.4.1 be valid and assume a feasible choice of asymptotes. For $\delta^k \neq 0$ we define $\alpha^k := \frac{\|y^k - v^k\|^2}{(\delta^k)^2}$. Let $(x^k, y^k)$ be determined by Algorithm 6 and fulfill the following two conditions:

a) $j \in J$ if and only if $h^k_j(z^k) = 0$ ($j = 1, \dots, m$) (that means, subproblems and original problem identify the same set of active constraints),

b) there is an $\alpha \in \mathbb{R}$ independent of $k$, such that $\alpha^k \leq \alpha < \infty$.

Then there is a $\delta' > 0$, such that

\[
\nabla \Phi_{\rho'}(x^k, y^k)^T p^k \leq -\frac{\eta (\delta^k)^2}{2} \quad \text{for all } (x^k, y^k) \text{ with } \delta^k \leq \delta',
\]

where $\rho'$ is a vector of penalty parameters with $\min_{j=1,\dots,m} \rho'_j \geq \eta$.

Theorem 5.4.3 would suffice to prove strong convergence, but especially assumption a) does not seem to be necessarily fulfilled if we are far away from a stationary point. Therefore, we distinguish between Theorem 5.4.2 and Theorem 5.4.3. Notice that we do not use any information about $\delta^k$ in the proof of Theorem 5.4.3.

An immediate consequence of the Theorems 5.4.2 and 5.4.3 is:

Corollary 5.4.1 Let the sequence $(x^k, y^k)_{k=0,1,\dots}$ be produced by Algorithm 6 and let the assumptions of Theorems 5.4.2 and 5.4.3 be valid. Then the sequence of penalty parameter vectors is bounded, i.e., there are $\bar{\rho} < \infty$ and $\bar{k} < \infty$ such that $\rho^k = \bar{\rho}$ for all $k \geq \bar{k} \geq 0$.

With the results of this section it is now possible to formulate the main convergence theorem.

Theorem 5.4.4 Let the sequence $(x^k, y^k)_{k=0,1,\dots}$ be produced by Algorithm 6 and let the assumptions of Theorems 5.4.2 and 5.4.3 be valid. Then the sequence has at least one accumulation point, and each accumulation point is stationary.

5.5. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and therefore be able to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate an ScpWrapper object employing ScpWrapper()

2. a) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

b) Put the values for the parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;
where pcParam is one of the following parameters:

– "MaxNumIter" Put maximum number of iterations.

– "OptMethod" Set desired optimization method:
OptMethod == 1: method of moving asymptotes (default)
OptMethod == 2: sequential convex programming

– "MaxNumIterLS" Put maximum number of function calls in the line search.

– "SubProblemStrat" Set strategy for the subproblem.
SubProblemStrat == 0: automatic choice by the program
SubProblemStrat == 1: approach (S3) of section 5.3.
SubProblemStrat == 2: approach (S4) of section 5.3.

– "AsympStrat" Set strategy for the choice of asymptotes.
"AsympStrat" == 1: as Svanberg uses it nowadays (default).
"AsympStrat" == 2: strategy due to the diploma thesis of Schenk, see also [173].
"AsympStrat" == 3: thought to approximate the 'choice' of the conlin method (only for positive lower bounds).
"AsympStrat" == 4: according to the first idea of Svanberg [153] (only for positive lower bounds).

– "LinEqSolver" Choose a solver for linear equation systems.
LinEqSolver == 1: dense Cholesky solver
LinEqSolver == 2: sparse Cholesky solver
LinEqSolver == 3: cg solver

– "ActConstr" Sets constraints active or inactive. piValue must be a pointer on an array of length equal to the number of inequality constraints, containing 1 for an active, 0 for an inactive constraint.

– "LevelConvCheck" Puts the level of the convergence check.
LevelConvCheck == 0: Default
LevelConvCheck == 1: a relaxed convergence check is applied. Then the program is terminated if
∗ the current iterate is feasible, and
∗ the relative change of the last two succeeding iteration points is less than ConvCheckParam3, and
∗ the absolute change of the two last objective function values is less than ConvCheckParam2, and
∗ the relative change of the two last objective function values is less than ConvCheckParam1.

– "MemIncrement" Set the stepsize for increasing the constraint gradient arrays.

– "MemInit" Set the size of the constraint gradient arrays at first allocation.

– "OutputLevel" Put desired output level.

– "FortranOutputLevel" If > 0, the output of the fortran code is written to an additional output file. This output might give more detailed information about errors.
0: no additional output (default)
1: only final convergence analysis
2: one line of intermediate results
3: more detailed and intermediate results

– "OutputUnit" Put output unit for the Fortran Output.

– "OutUnit" Same as ”OutputUnit”

• Put( const char * pcParam , const double * pdValue ) ;

orPut( const char * pcParam , const double dValue ) ;

where pcParam is one of the following parameters:

– "TermAcc" Put desired final accuracy.

– "Infinity" Put the number which shall represent infinity.

– "ActiveThreshold"Put threshold for activity constraints.

∗ ActiveThreshold > 0: All inequality constraints with values ¡ ActiveThresholdare going into the subproblem, i.e. are considered as potentially active.Other constraints are not considered in the submodel.If no active-set-strategy is desired, it is recommended to let ActiveThresholdbe a large number, e.g. Infinity, which is the default.

∗ ActiveThreshold < 0: Same as for ActiveThreshold > 0, but addition-ally all those constraints are considered active, which have been active inthe last iteration and are assigned with a nonzero lagrangian multiplier.

– "ConvCheckParam" Put values for relaxed convergence check. pdValue mustpoint to a double array of length three.For the meaning of the parameters see description of parameter LevelConvCheck.

– "ConvCheckParam1" Put one value for relaxed convergence check. For themeaning of the parameter see description ofparameter LevelConvCheck.

– "ConvCheckParam2" Put one value for relaxed convergence check. For themeaning of the parameter see description ofparameter LevelConvCheck.

– "ConvCheckParam3" Put one value for relaxed convergence check. For themeaning of the parameter see description ofparameter LevelConvCheck.

• Put( const char * pcParam , const char * pcValue ) ;

orPut( const char * pcParam , const char cValue ) ;

where pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;

or

85

Page 92: NLP++ Optimization Toolbox+TheoryManual.pdf · 2013-01-22 · kowski, and van de Braak [17]. The general availability of parallel computers and in particular of distributed computing

5. ScpWrapper - Sequential Convex Programming

Put( const char * pcParam , const bool bValue ) ;

where pcParam is one of the following parameters:

– "JacobianSparse" Determines whether the Jacobian of the constraints shallbe treated as a sparse matrix.

– "OpenNewOutputFile" Determines whether a new output file has to be opened.If false, the output is appended to an existing file. Usually, this parameterdoes not have to be changed.

c) Put the number of inequality constraint functions to the object:
PutNumIneqConstr( int iNumIneqConstr ) ;

d) Put the number of equality constraint functions to the object:
PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values and provide initial guess.

a) Set upper and lower bounds using
PutLowerBound( const int iVariableIndex, const double dLowerBound );
PutUpperBound( const int iVariableIndex, const double dUpperBound );
for iVariableIndex = 0,...,NumberOfVariables - 1, or
PutUpperBound( const double * pdXu );
PutLowerBound( const double * pdXl );
where pdXu and pdXl must be pointing on arrays of length equal to the number of design variables.
Before setting bounds, the number of design variables must be set. If no bounds are set, these will be -1E30 and +1E30 by default.

b) Provide an initial guess for the optimization vector by
PutInitialGuess( const double * pdInitialGuess )
where pdInitialGuess must be a double array of length equal to the number of variables, or
PutInitialGuess( const int iVariableIndex, const double dInitialGuess )
for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.

4. Start Scp-Routine: StartOptimizer()

5. Check the status of the object using GetStatus( int & iStatus ) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): ScpWrapper needs new function values → 6b Providing new function values. After passing these values to the ScpWrapper object go to 4.

• if iStatus equals EvalGradId(): ScpWrapper needs new values for gradients → 6c Providing new gradient values. After passing these values to the ScpWrapper object go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const
Returns the number of constraints.
GetNumDv( int & iNumDesignVar ) const
Returns the number of design variables.

ii. GetDesignVarVec( const double * & pdPointer, const int i ) const
with i = 0 (default) returns a pointer to the design variable vector.
GetDesignVar( const int iVarIdx , double & pdPointer, const int iVectorIdx ) const
with iVectorIdx = 0 (default) returns the value of the iVarIdx'th design variable of the design variable vector.

b) Providing new function values
Function values for the objective as well as for the constraints have to be calculated for the design variable vector provided by Scp. For access to the design variable vector see 6(a)ii. After calculating the values, these must be passed to the ScpWrapper object using:

• PutConstrValVec( const double * const pdConstrVals, const int iParSysIdx ),
with iParSysIdx = 0 (default), where pdConstrVals is pointing on an array containing the values of each constraint at the design variable vector provided by ScpWrapper.

• PutObjVal( const double dObjVal, const int iObjFunIdx, const int iParSysIdx ),
with iParSysIdx = 0 (default), iObjFunIdx = 0 (default) and dObjVal defining the value of the objective function at the design variable vector provided by ScpWrapper.

Alternatively you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue, const int iParSysIdx ),
with iParSysIdx = 0 (default), for iConstrIdx = 0,...,NumberOfConstraints - 1

c) Providing new gradient values
Gradients must be calculated for the objective and the active constraints at the current design variable vector.

Get( const char * pcParam, int * & piActPtr )
with pcParam = "ActConstr" sets piActPtr on an array of length equal to the number of constraints, whose entries are 0 for an inactive constraint and 1 for an active constraint. Note: piActPtr must be NULL on input.

For access to the design variable vector see 6(a)ii.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx, const double dDerivativeValue )
for iVariableIdx = 0,...,NumberOfVariables - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1
where dDerivativeValue is defined as the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th variable.

Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1
where pdGradient is a pointer on the gradient of the iConstrIdx'th constraint.

For the gradient of the objective you can use

• PutDerivObj( const int iVariableIdx, const double dDerivativeValue, const int iFunIdx ),
with iFunIdx = 0 (default), for iVariableIdx = 0,...,NumberOfVariables - 1
where dDerivativeValue is defined as the derivative of the objective with respect to the iVariableIdx'th variable at the current design variable vector.

• PutGradObj( const double * pdGradient, const int iFunIdx ),
where iFunIdx = 0 (default) and pdGradient is a pointer on the gradient of the objective at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const
Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const
Returns the value of the iConstrIdx'th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx , const int iVariableIdx, double & dDerivativeValue ) const
Returns the value of the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th design variable at the last solution vector.

• GetDerivObj( const int iVariableIdx, double & dDerivativeValue, const int iFunIdx ) const
with iFunIdx = 0 (default) returns the value of the derivative of the objective with respect to the iVariableIdx'th design variable at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const
with iFunIdx = 0 (default) returns the value of the objective function at the last solution vector.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns the value of the iVariableIdx'th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns a pointer to the last solution vector.
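The steps above can be combined into a minimal driver program. The following sketch is illustrative only: the header name is an assumption, and a simple two-variable problem (min x0² + x1² subject to x0 + x1 − 1 ≥ 0, with the sign convention of the inequality assumed) stands in for a real application:

    #include <iostream>
    #include "ScpWrapper.h"   // header name assumed

    int main()
    {
        ScpWrapper scp;
        scp.PutNumDv( 2 );                 // two design variables
        scp.PutNumIneqConstr( 1 );         // one inequality constraint
        scp.PutNumEqConstr( 0 );
        scp.Put( "TermAcc", 1.0e-8 );      // desired final accuracy

        double dX0[2] = { 0.0, 0.0 };
        scp.PutInitialGuess( dX0 );

        int iStatus = 0;
        for (;;)
        {
            scp.StartOptimizer();          // step 4
            scp.GetStatus( iStatus );      // step 5

            if ( iStatus == scp.SolvedId() || iStatus > 0 )
                break;                     // solved, or an error occurred

            const double * pdX = NULL;
            scp.GetDesignVarVec( pdX );    // current design variable vector

            if ( iStatus == scp.EvalFuncId() )        // step 6b
            {
                scp.PutObjVal( pdX[0]*pdX[0] + pdX[1]*pdX[1] );
                scp.PutConstrVal( 0, pdX[0] + pdX[1] - 1.0 );
            }
            else if ( iStatus == scp.EvalGradId() )   // step 6c
            {
                double dGradObj[2]    = { 2.0*pdX[0], 2.0*pdX[1] };
                double dGradConstr[2] = { 1.0, 1.0 };
                scp.PutGradObj( dGradObj );
                scp.PutGradConstr( 0, dGradConstr );
            }
        }
        std::cout << "final status: " << iStatus << std::endl;
        return 0;
    }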


6. CobylaWrapper - Gradient-free Constrained Optimization by Linear Approximations

6.1. Algorithm

The algorithm by Powell employs linear approximations of the objective and constraint functions, the approximations being formed by linear interpolation at N + 1 points in the space of the variables. The interpolation points are regarded as vertices of a simplex. The parameter ρ controls the size of the simplex and is reduced automatically from ρBegin to ρEnd.

For each ρ the subroutine tries to achieve a good vector of variables for the current size, and then ρ is reduced until the value ρEnd is reached. Therefore ρBegin and ρEnd should be set to a reasonable initial change to the variables and the required accuracy in the variables, respectively, but this accuracy should be viewed as a subject for experimentation because it is not guaranteed.

The algorithm has an advantage over many of its competitors, however, which is that it treats each constraint individually when calculating a change to the variables, instead of lumping the constraints together into a single penalty function.

The name of the subroutine is derived from the phrase Constrained Optimization BY Linear Approximations.

For more information on the algorithm see Powell [108] and [109].

6.2. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and therefore be able to use an automated loop over StartOptimizer and the function evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate a CobylaWrapper object employing CobylaWrapper()

2. a) Put the number of design variables to the object:
PutNumDv( int iNumDv ) ;

b) Put the values for the parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;
where pcParam is one of the following parameters:

– "MaxNumIter" Put maximum number of iterations.

– "OutputLevel" Put desired output level.


– "FortranOutputLevel" If >0, the output of the fortran code is written toan additional output file. This output might give more detailed informationabout errors.0: the additional output will be suppressed. (default)1: an additional file will contain a summary at the end of the optimization.2: Each new value of RHO and SIGMA is printed. Further, the vector ofvariables and some function information are given when RHO is reduced.3: Each new value of RHO and SIGMA is printed. Further, the vector ofvariables and some function information are given when each new value ofthe objective function is computed.Here SIGMA is a penalty parameter, it being assumed that a change to thevariable vector X is an improvement if it reduces the merit function F (X) +SIGMA∗MAX(0.0,−C1(X),−C2(X), ...,−CM(X)), where C1, C2, ..., CMdenote the constraint functions that should become nonnegative eventually,at least to the precision of Rhoend = TermAcc. In the printed output thedisplayed term that is multiplied by SIGMA is called MAXCV, which standsfor ’MAXimum Constraint Violation’.

– "OutputUnit" Put output unit for the Fortran output.

– "OutUnit" Same as ”OutputUnit”

• Put( const char * pcParam , const double * pdValue ) ;

orPut( const char * pcParam , const double dValue ) ;

where pcParam is one of the following parameters:

– "RhoBegin" Put desired value for ρBegin. Should be the size of a reasonableinitial change in the design variables.

– "RhoEnd" Put desired value for ρEnd. Should be the desired accuracy (with-out guarantee).

– "TermAcc" Same as "RhoEnd".

• Put( const char * pcParam , const char * pcValue ) ;

orPut( const char * pcParam , const char cValue ) ;

where pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;

orPut( const char * pcParam , const bool bValue ) ;

where pcParam is one of the following parameters:

– "OpenNewOutputFile" Determines whether a new output file has to be opened.If false, the output is appended to an existing file. Usually, this parameterdoes not have to be changed.

c) Put the number of inequality constraint functions to the object:
PutNumIneqConstr( int iNumIneqConstr ) ;

d) Put the number of equality constraint functions to the object:
PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values and provide initial guess.

a) Set upper and lower bounds using
PutLowerBound( const int iVariableIndex, const double dLowerBound );
PutUpperBound( const int iVariableIndex, const double dUpperBound );
for iVariableIndex = 0,...,NumberOfVariables - 1, or
PutUpperBound( const double * pdXu );
PutLowerBound( const double * pdXl );
where pdXu and pdXl must be pointing on arrays of length equal to the number of design variables.
Before setting bounds, the number of design variables must be set. If no bounds are set, these will be -1E30 and +1E30 by default.

b) Provide an initial guess for the optimization vector by
PutInitialGuess( const double * pdInitialGuess )
where pdInitialGuess must be a double array of length equal to the number of variables, or
PutInitialGuess( const int iVariableIndex, const double dInitialGuess )
for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.

4. Start Cobyla-Routine: StartOptimizer()

5. Check the status of the object using GetStatus( int & iStatus ) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): CobylaWrapper needs new function values → 6b Providing new function values. After passing these values to the CobylaWrapper object go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const
Returns the number of constraints.
GetNumDv( int & iNumDesignVar ) const
Returns the number of design variables.

ii. GetDesignVarVec( const double * & pdPointer, const int i ) const
with i = 0 (default) returns a pointer to the design variable vector.
GetDesignVar( const int iVarIdx , double & pdPointer, const int iVectorIdx ) const
with iVectorIdx = 0 (default) returns the value of the iVarIdx'th design variable of the design variable vector.

b) Providing new function values
Function values for the objective as well as for the constraints have to be calculated for the design variable vector provided by Cobyla. For access to the design variable vector see 6(a)ii. After calculating the values, these must be passed to the CobylaWrapper object using:

• PutConstrValVec( const double * const pdConstrVals, const int iParSysIdx ),
with iParSysIdx = 0 (default), where pdConstrVals is pointing on an array containing the values of each constraint at the design variable vector provided by CobylaWrapper.

• PutObjVal( const double dObjVal, const int iObjFunIdx, const int iParSysIdx ),
with iParSysIdx = 0 (default), iObjFunIdx = 0 (default) and dObjVal defining the value of the objective function at the design variable vector provided by CobylaWrapper.

Alternatively you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue, const int iParSysIdx ),
with iParSysIdx = 0 (default), for iConstrIdx = 0,...,NumberOfConstraints - 1

7. Output

• GetConstrValVec( const double * & pdPointer ) const
Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const
Returns the value of the iConstrIdx'th constraint function at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const
with iFunIdx = 0 (default) returns the value of the objective function at the last solution vector.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns the value of the iVariableIdx'th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns a pointer to the last solution vector.
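A minimal reverse communication loop for CobylaWrapper looks similar to the other wrappers, except that only function values are ever requested. The header name and the small example problem below are illustrative assumptions:

    #include "CobylaWrapper.h"   // header name assumed

    void RunCobyla()
    {
        CobylaWrapper cob;
        cob.PutNumDv( 2 );
        cob.PutNumIneqConstr( 1 );
        cob.PutNumEqConstr( 0 );
        cob.Put( "RhoBegin", 0.5 );    // reasonable initial change of the variables
        cob.Put( "RhoEnd", 1.0e-6 );   // required accuracy (not guaranteed)

        double dX0[2] = { 1.0, 1.0 };
        cob.PutInitialGuess( dX0 );

        int iStatus = 0;
        do
        {
            cob.StartOptimizer();
            cob.GetStatus( iStatus );
            if ( iStatus == cob.EvalFuncId() )   // gradients are never requested
            {
                const double * pdX = NULL;
                cob.GetDesignVarVec( pdX );
                cob.PutObjVal( pdX[0]*pdX[0] + pdX[1]*pdX[1] );
                cob.PutConstrVal( 0, pdX[0] + pdX[1] - 1.0 );
            }
        } while ( iStatus == cob.EvalFuncId() );
    }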


7. MisqpWrapper - Sequential Quadratic Programming with Trust Region Stabilization

MisqpWrapper is a C++ wrapper for the Fortran subroutine MISQP. A sequential quadratic programming method for nonlinear optimization problems is implemented that applies trust region techniques instead of a line search approach to achieve fast local convergence. MISQP allows continuous, binary, integer, and catalogued variables, which can only take values within a given discrete set. Thus, MISQP also solves mixed-integer nonlinear programming problems, see Section 22 for more details. Parts of the following sections are taken from Exler et al. [40].

7.1. Introduction

The implemented trust region algorithm was proposed by Yuan [171] and addresses the general optimization problem to minimize an objective function f under nonlinear equality and inequality constraints,

\[
\begin{array}{ll}
\min\limits_{x \in \mathbb{R}^n} & f(x)\\
\text{s.t.} & g_j(x) = 0, \quad j = 1, \dots, m_e,\\
& g_j(x) \geq 0, \quad j = m_e + 1, \dots, m,\\
& x_l \leq x \leq x_u,
\end{array}
\tag{7.1}
\]

where x denotes the vector of continuous variables that is restricted by box constraints, i.e., lower bounds $x_l$ and upper bounds $x_u$. It is assumed that the problem functions $f(x)$ and $g_j(x)$, $j = 1, \dots, m$, are twice continuously differentiable with respect to all $x \in \mathbb{R}^n$, where n denotes the number of design variables.

Trust region methods were invented many years ago, first for unconstrained optimization and especially for least squares optimization, see for example Powell [107] or More [95]. Extensions were developed for non-smooth optimization, see Fletcher [43], and for constrained optimization, see Celis [21], Powell and Yuan [110], Byrd et al. [20], Toint [158] and many others. A comprehensive review on trust region methods is given by Conn, Gould, and Toint [26].

On the other hand, sequential quadratic programming or SQP methods belong to the most frequently used algorithms to solve practical optimization problems. The theoretical background is described e.g. in Stoer [151], Fletcher [44], or Gill et al. [54].

The basic idea of trust region methods is to compute a new trial step $d_k$ by a second order model or a close approximation, where k denotes the k-th iteration. The stepsize is restricted by a trust region radius $\Delta_k$, i.e., $\|d_k\| \leq \Delta_k$ with $\|.\|$ being an arbitrary norm. Subsequently, a ratio $r_k$ of the actual and the predicted improvement subject to a certain merit function is computed. Depending on the calculated ratio, the trial step is either accepted and a move to a new iterate is performed, or the trial step is rejected and the iterate remains unchanged.


Moreover, the trust region radius is either enlarged or decreased depending on the deviation of $r_k$ from the ideal value $r_k = 1$. If sufficiently close to a solution, the artificial bound $\Delta_k$ should not become active, so that the new trial step proposed by the second order model can always be accepted. For more details see Exler and Schittkowski [41] or Exler et al. [39].

7.2. The Trust Region SQP Method

To simplify the notation, we assume that upper and lower bounds $x_u$ and $x_l$ are not handled separately, i.e., we consider the formulation

\[
\begin{array}{ll}
\min\limits_{x \in \mathbb{R}^n} & f(x)\\
\text{s.t.} & g_j(x) = 0, \quad j = 1, \dots, m_e,\\
& g_j(x) \geq 0, \quad j = m_e + 1, \dots, m.
\end{array}
\tag{7.2}
\]

The Lagrangian function of (7.2) is

\[
L(x, u) := f(x) - \sum_{j=1}^{m} u_j g_j(x), \tag{7.3}
\]

where $x \in \mathbb{R}^n$ is the primal variable and $u = (u_1, \dots, u_m)^T \in \mathbb{R}^m$ the multiplier vector, and the exact $L_\infty$ penalty function is given by

\[
P_\sigma(x) := f(x) + \sigma \|g(x)^-\|_\infty. \tag{7.4}
\]

Here we combine all constraints in one vector, $g(x) = (g_1(x), \dots, g_m(x))^T$, and the minus sign defines the constraint violation

\[
g_j(x)^- :=
\begin{cases}
g_j(x), & \text{if } j \leq m_e,\\
\min(0, g_j(x)), & \text{otherwise},
\end{cases}
\tag{7.5}
\]

$j = 1, \dots, m$. Let the penalty parameter of the exact $L_\infty$ penalty function (7.4) be denoted by $\sigma > 0$. As $\sigma$ has to be sufficiently large, the value is adapted by the algorithm. (7.4) is frequently applied to enforce convergence, see for example Fletcher [43], Burke [18], or Yuan [171].
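As a small worked sketch of (7.4) and (7.5) (an illustrative helper, assuming the first me entries of g hold the equality constraints):

    #include <vector>
    #include <cmath>
    #include <algorithm>

    // Sketch of the exact L-infinity penalty function (7.4) using the
    // constraint violation (7.5).
    double PenaltyLInf(double f, const std::vector<double>& g,
                       std::size_t me, double sigma)
    {
        double dViol = 0.0;                                   // ||g(x)^-||_inf
        for (std::size_t j = 0; j < g.size(); ++j)
        {
            double dGjMinus = (j < me) ? g[j] : std::min(0.0, g[j]);
            dViol = std::max(dViol, std::fabs(dGjMinus));
        }
        return f + sigma * dViol;
    }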

To obtain an SQP or sequential quadratic programming method, we compute at iteration k a trial step $d_k$ towards the next iterate as a solution of the subproblem

\[
\min_{d \in \mathbb{R}^n} \; \phi_k(d) := \tfrac{1}{2} d^T B_k d + \nabla f(x_k)^T d + \sigma_k \left\| \left( g(x_k) + \nabla g(x_k)^T d \right)^- \right\|_\infty
\quad \text{s.t.} \quad \|d\|_\infty \leq \Delta_k.
\tag{7.6}
\]

The constraints are linearized and are treated as a penalty term subject to the maximum norm. The matrix $B_k$ is an approximation to the Hessian of the Lagrangian function (7.3). A particular advantage is that we do not need further safeguards in case of inconsistent linear systems. Note that (7.6) is equivalent to the quadratic program

\[
\begin{array}{ll}
\min\limits_{d \in \mathbb{R}^n,\, \mu \in \mathbb{R}} & \tfrac{1}{2} d^T B_k d + \nabla f(x_k)^T d + \sigma_k \mu\\
\text{s.t.} & -g_j(x_k) - \nabla g_j(x_k)^T d + \mu \geq 0, \quad j = 1, \dots, m_e,\\
& g_j(x_k) + \nabla g_j(x_k)^T d + \mu \geq 0, \quad j = 1, \dots, m,\\
& \|d\|_\infty \leq \Delta_k, \quad \mu \geq 0,
\end{array}
\tag{7.7}
\]


which can be solved by any black-box quadratic programming solver. The optimal solution is denoted by $d_k$, and $u_k$ is the corresponding multiplier. The penalty parameter $\sigma_k$ and the trust region radius $\Delta_k$ are iteratively adapted.

A major role in a trust region method plays the decision whether a trial step $d_k$ is accepted or not. Rejecting some trial steps is required to ensure the convergence of the method. In case a trial step is accepted, we move to the new point and set $x_{k+1} := x_k + d_k$; otherwise the point for the next iteration remains unchanged, i.e., $x_{k+1} := x_k$. The idea is to check the quotient of the actual and the predicted improvement of the merit function by

\[
r_k := \frac{P_{\sigma_k}(x_k) - P_{\sigma_k}(x_k + d_k)}{\phi_k(0) - \phi_k(d_k)}, \tag{7.8}
\]

where $P_{\sigma_k}(x)$ denotes the penalty function (7.4) used to measure the actual change, and $\phi_k(d)$ denotes the objective function of subproblem (7.6). The proposed method accepts all trial steps for which the ratio $r_k$ is greater than zero. Otherwise, the step is rejected.

In addition, a new trust region radius $\Delta_{k+1}$ for the next iteration $k + 1$ has to be determined. If $r_k$ is close to one or even greater than one, then the trust region radius $\Delta_k$ is enlarged; if the ratio $r_k$ is very small, $\Delta_k$ is decreased. If $r_k$ remains in the intermediate range, $\Delta_k$ is not changed at all. More formally, we use the same constants proposed by Yuan [171], and set

\[
\Delta_{k+1} :=
\begin{cases}
\max(2\Delta_k,\, 4\|d_k\|_\infty), & \text{if } r_k > 0.9,\\
\Delta_k, & \text{if } 0.1 \leq r_k \leq 0.9,\\
\min(\Delta_k/4,\, \|d_k\|_\infty/2), & \text{if } r_k < 0.1.
\end{cases}
\tag{7.9}
\]

A basic trust region SQP algorithm can be stated as follows:

Algorithm 8 Trust Region SQP

Step 0: Let $x_0 \in \mathbb{R}^n$, $\Delta_0 > 0$, $B_0 \in \mathbb{R}^{n \times n}$ symmetric, $\sigma_0 > 0$, and let $k := 1$.
Step 1: Solve subproblem (7.6) or (7.7) to get $d_k$.
Step 2: Increase $\sigma_{k+1}$ if necessary.
Step 3: Calculate the ratio $r_k$ according to (7.8).
Step 4: Update the trust region radius $\Delta_{k+1}$ according to (7.9).
Step 5: If $r_k \leq 0$, then let $x_{k+1} := x_k$, $B_{k+1} := B_k$. Set $k := k + 1$ and goto Step 1.
Step 6: If $r_k > 0$, then let $x_{k+1} := x_k + d_k$, update $B_{k+1}$. Set $k := k + 1$ and goto Step 1.

For more details we refer to the original algorithm by Yuan [171].

Because of the non-differentiable merit function, however, superlinear convergence cannot be guaranteed, and it is even possible that the algorithm only converges linearly. To avoid this situation, Fletcher [43] introduced a second order correction (SOC), for which superlinear convergence can be shown, see Yuan [170]. Thus, the classical trust region method as outlined before is combined with an additional correction which can be interpreted as a feasible direction step. We define a new approximation at $x_k + d_k$ if the basic step obtained from solving (7.6) is not as good as expected, and obtain the modified subproblem

\[
\min_{d \in \mathbb{R}^n} \; \tfrac{1}{2} (d + d_k)^T B_k (d + d_k) + \nabla f(x_k)^T (d + d_k) + \sigma_k \left\| \left( g(x_k + d_k) + \nabla g(x_k)^T d \right)^- \right\|_\infty
\quad \text{s.t.} \quad \|d + d_k\|_\infty \leq \Delta_k.
\tag{7.10}
\]


Let the solution be $\hat{d}_k$. Note that subproblem (7.10) can be transformed into a differentiable quadratic problem in the same way as subproblem (7.6). The quotient $r_k$ is replaced by

\[
\hat{r}_k := \frac{P_{\sigma_k}(x_k) - P_{\sigma_k}(x_k + d_k + \hat{d}_k)}{\phi_k(0) - \phi_k(d_k)}, \tag{7.11}
\]

that is, $r_k := \hat{r}_k$. The newly determined ratio is then used to decide whether the step with second order correction is accepted or not. The extension of the aforementioned algorithm is straightforward.

For an algorithm that applies second order correction steps, the superlinear convergence rate can be proved, see Fletcher [43] and Yuan [170].

7.3. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and therefore be able to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate an MisqpWrapper object employing MisqpWrapper()

2. a) Put the number of design variables to the object:
PutNumDv( int iNumDv ) ;

b) Put the values for other parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;
where pcParam is one of the following parameters:

– "MaxNumIter" Put maximum number of iterations. (default: 500)

– "OutputLevel" Put desired output level. (default: 2)

– "FortranOutputLevel" If > 0, the output of the fortran code is written to an additional output file. This output might give more detailed information about errors.
0: no additional output (default)
1: only final convergence analysis
2: one line of intermediate results is printed in each iteration.
3: more detailed information is printed in each iteration step, e.g. variable, constraint and multiplier values.

– "OutputUnit" Put output unit for the Fortran output.

– "OutUnit" Same as "OutputUnit".

– "InternalScaling" Put flag for internal scaling. Function values can beinternally scaled.0: No scaling.

98

Page 105: NLP++ Optimization Toolbox+TheoryManual.pdf · 2013-01-22 · kowski, and van de Braak [17]. The general availability of parallel computers and in particular of distributed computing

7.3. Program Documentation

1: Subject to their absolute values, if greater than 1.0 or"InternalScalingBound", respectively. (default)

– "NonmonotoneConst" Put maximum number of successive iterations whichare considered for the non-monotone trust region algorithm (ignored if alldesign variables are continuous). The value must be less than 100. (default:10)

– "MeritFunction" Put merit function strategy:0: Augmented Lagrangian (only if all variables are continuous),1: L∞ penalty function,2: Combined version of 0 and 1,Default value is 2.

– "MaxNumUnsuccSteps" Put maximum number of feasible steps without im-provement of objective function subject to the tolerance

√TermAcc. (default:

10)

• Put( const char * pcParam , const double * pdValue ) ;
or
Put( const char * pcParam , const double dValue ) ;
where pcParam is one of the following parameters:

– "TermAcc" Put desired final accuracy. (default: 1.0E-6)

– "TermQPAcc" Put desired accuracy for the QP solver. The value is needed for the QP solver to perform several tests, for example whether optimality conditions are satisfied or whether a number is considered as zero or not. If "TermQPAcc" is less than or equal to zero, then the machine precision is computed and subsequently multiplied by 1.0E+4. (default: 1.0E-10)

– "FacPenaltyIncrease" Put factor for increasing a penalty parameter, must be greater than one. (default: 10.0)

– "FacDeltaReduction" Put factor for decreasing the internal descent parameter, see "ScalingDeltaInit". Must be less than one. (default: 0.1)

– "PenaltyParamInit" Put initial penalty parameter greater than zero. (default: 1000.0)

– "ScalingDeltaInit" Put initial scaling parameter, smaller than one. (default: 0.05)

– "TrustRegionContInit" Put initial continuous trust region radius. (default: 50.0)

– "InternalScalingBound" Put scaling bound in case of "InternalScaling" > 0, i.e., function values are scaled only if their absolute initial values are greater than the bound. (default: 1.0)

• Put( const char * pcParam , const char * pcValue ) ;
or
Put( const char * pcParam , const char cValue ) ;
where pcParam = "OutputFileName" to set the name of the output file.


• Put( const char * pcParam , const bool * pbValue ) ;
or
Put( const char * pcParam , const bool bValue ) ;
where pcParam is one of the following parameters:

– "OpenNewOutputFile" Determines whether a new output file must be created by the optimizer. If false, the output is appended to an existing file. This is the case if the optimizer is used as a subproblem solver (e.g. in a multiple objective optimizer). Usually, this parameter does not have to be changed.

c) Put the number of inequality constraint functions to the object:
PutNumIneqConstr( int iNumIneqConstr ) ;

d) Put the number of equality constraint functions to the object:
PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values for design variables and provide initial guess.

a) Set upper and lower bounds usingPutLowerBound( const int iVariableIndex, const double dLowerBound )

PutUpperBound( const int iVariableIndex, const double dUpperBound )

for iVariableIndex = 0,...,NumberOfVariables -1 orPutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must be pointing on arrays of length equal to the number ofdesign variables.

Before setting bounds, the number of design variables must be set.If no bounds are set, these will be -1E30 and +1E30 by default.

b) Provide an initial guess for the optimization vector byPutInitialGuess( const double * pdInitialGuess ),where pdInitialGuess must be a double array of length equal to the number ofvariables.orPutInitialGuess( const int iVariableIndex, const double dInitialGuess ),for iVariableIndex = 0,...,NumberOfVariables - 1.

Before setting an initial guess, the number of design variables must be set.

4. Start Misqp-Routine: StartOptimizer()

5. Check the status of the object using GetStatus( int & iStatus ) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): MisqpWrapper needs new function values → 6b Providing new function values. After passing these values to the MisqpWrapper object, go to 4.


• if iStatus equals EvalGradId(): MisqpWrapper needs new values for gradients → 6c Providing new gradient values. After passing these values to the MisqpWrapper object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const
Returns the number of constraints.
GetNumDv( int & iNumDesignVar ) const
Returns the number of design variables.
Get( const char * pcParam, int & iNumParallelSys )
Returns the number of parallel systems when pcParam = "NumParallelSys".

ii. GetDesignVarVec( const double * & pdPointer, const int i ) const
Returns a pointer to the i'th design variable vector (default: i=0).
GetDesignVar( const int iVarIdx, double & pdPointer, const int iVectorIdx ) const
Returns the value of the iVarIdx'th design variable of the iVectorIdx'th design variable vector.

b) Providing new function values. For access to the design variable vectors see 6(a)ii. After calculating the values, these must be passed to the MisqpWrapper object using:

• PutConstrValVec( const double * const pdConstrVals, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1,
where pdConstrVals points to an array containing the values of each constraint at the iParSysIdx'th design variable vector provided by MisqpWrapper.

• PutObjVal( const double dObjVal, const int iObjFunIdx, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1,
with iObjFunIdx = 0 and dObjVal defining the value of the objective function at the iParSysIdx'th design variable vector provided by MisqpWrapper.

Alternatively, you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1 and iConstrIdx = 0,...,NumberOfConstraints - 1.

c) Providing new gradient values. Gradients must be calculated for the objective and the constraints at the current design variable vector. Partial derivatives have to be supplied for all variables. For access to the design variable vector see 6(a)ii with iParSysIdx = 0.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx, const double dDerivativeValue )


for iVariableIdx = 0,...,NumberOfVariables - 1 and iConstrIdx = 0,...,NumberOfConstraints - 1,
where dDerivativeValue is defined as the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th variable.

Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1,
where pdGradient is a pointer to the gradient of the iConstrIdx'th constraint.

For the gradient of the objective you can use

• PutDerivObj( const int iVariableIdx, const double dDerivativeValue, const int iFunIdx ),
with iFunIdx = 0 (default), for iVariableIdx = 0,...,NumberOfVariables - 1,
where dDerivativeValue is defined as the derivative of the objective with respect to the iVariableIdx'th variable at the current design variable vector.

• PutGradObj( const double * pdGradient, const int iFunIdx ),
where iFunIdx = 0 (default) and pdGradient is a pointer to the gradient of the objective at the current design variable vector.

A minimal sketch of a driver loop combining steps 4 to 6 follows below.
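The following sketch assumes that FuncEval() and GradEval() are user-written functions which evaluate the problem functions at the design variable vector(s) obtained via GetDesignVarVec() and pass the results to the object with the Put methods described above; it also assumes the status id helpers SolvedId(), EvalFuncId() and EvalGradId() are accessible on the optimizer object, as the comparisons in step 5 suggest:

// a minimal sketch of the reverse communication loop of steps 4-6
int iStatus;

m_pOptimizer->StartOptimizer();                    // step 4
m_pOptimizer->GetStatus( iStatus );                // step 5

while ( iStatus == m_pOptimizer->EvalFuncId() ||
        iStatus == m_pOptimizer->EvalGradId() )
{
    if ( iStatus == m_pOptimizer->EvalFuncId() )
    {
        FuncEval();                                // step 6b: new function values
    }
    else
    {
        GradEval();                                // step 6c: new gradient values
    }
    m_pOptimizer->StartOptimizer();                // back to step 4
    m_pOptimizer->GetStatus( iStatus );
}

if ( iStatus > 0 )
{
    // an error occurred during the solution process
}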

7. Output

• GetConstrValVec( const double * & pdPointer ) const
Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const
Returns the value of the iConstrIdx'th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx, const int iVariableIdx, double & dDerivativeValue ) const
Returns the value of the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th design variable at the last solution vector.

• GetDerivObj( const int iVariableIdx, double & dDerivativeValue, const int iFunIdx ) const
with iFunIdx = 0 (default) returns the value of the derivative of the objective with respect to the iVariableIdx'th design variable at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const
with iFunIdx = 0 (default) returns the value of the objective function at the last solution vector.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns the value of the iVariableIdx'th design variable in the last solution vector.


• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns a pointer to the last solution vector.


8. QlWrapper - Quadratic optimization with linear constraints

QlWrapper is a C++ wrapper for the Fortran subroutine QL which solves strictly convex quadratic programming problems subject to linear equality and inequality constraints by the primal-dual method of Goldfarb and Idnani. An available Cholesky decomposition of the objective function matrix can be provided by the user. Bounds are handled separately. The code is designed for solving small-scale quadratic programs in a numerically stable way. Its usage is outlined and an illustrative example is presented. Section 8.1 is taken from Schittkowski [139].

8.1. Introduction

The code solves the strictly convex quadratic program

min  1/2 x^T C x + d^T x
x ∈ R^n :  a_j^T x + b_j = 0 ,  j = 1, ..., m_e ,        (8.1)
           a_j^T x + b_j ≥ 0 ,  j = m_e + 1, ..., m ,
           x_l ≤ x ≤ x_u

with an n by n positive definite matrix C, an n-dimensional vector d, an m by n matrix A = (a_1, ..., a_m)^T, and an m-vector b. Lower and upper bounds for the variables, x_l and x_u, respectively, are handled separately.

The quadratic program (8.1) is solved by the primal-dual method of Goldfarb and Idnani [56]. Initially, a Cholesky decomposition of C is computed in the form of an upper triangular matrix R such that C = R^T R. If available, a user can provide a known triangular factor. In case of numerical instabilities, e.g. round-off errors, or a semi-definite matrix C, a certain multiple of the unit matrix is added to C to get a positive definite matrix for which a Cholesky decomposition can be obtained.

Successively, violated constraints are added to an active set until a solution is obtained. In each step, the minimizer of the objective function subject to the new active set is computed. If an iterate satisfies all linear constraints and bounds, the optimal solution is obtained and the algorithm terminates. If necessary, a constraint can be dropped from the working set if it is no longer considered an active one.

The corresponding matrix manipulations to compute the successive minimizers subject to active constraints are performed in a numerically stable way by orthogonal Givens rotations. Since objective function values are strictly increasing from one iteration to the next and since the number of possible active sets is finite, the algorithm terminates after finitely many steps.
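For reference, the optimality conditions underlying this procedure can be written down explicitly (a standard Karush-Kuhn-Tucker statement for (8.1), not taken from the QL documentation, with the separately handled bounds omitted): since (8.1) is strictly convex, x* solves it if and only if there exist multipliers u* = (u_1*, ..., u_m*)^T such that

C x* + d − Σ_{j=1}^{m} u_j* a_j = 0 ,
a_j^T x* + b_j = 0 for j = 1, ..., m_e ,   a_j^T x* + b_j ≥ 0 for j = m_e + 1, ..., m ,
u_j* ≥ 0 and u_j* ( a_j^T x* + b_j ) = 0 for j = m_e + 1, ..., m ,

with u_j* unrestricted in sign for j = 1, ..., m_e. Loosely speaking, the dual method maintains all of these conditions on the working set except primal feasibility, which is attained only at termination.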

A particular advantage of a dual method is that phase I of a primal algorithm, i.e., the computation of an initial feasible point satisfying all linear constraints and bounds, can be avoided. A generalization of the method introduced in this section is published by Boland [11] for the case that C is positive semi-definite.

The implementation of the code goes back to Powell [106], and the algorithmic details are found in the reference. Besides a few internal changes concerning numerical details, the main extensions of QL compared to ZQPCVX are the separate handling of upper and lower bounds and the optional provision of the Cholesky factor of C.

As part of the sequential quadratic programming code NLPQL for constrained nonlinear programming, QL is frequently used in practice. Even many of the standard test problems of the collections of Hock and Schittkowski [68] and Schittkowski [124] are badly scaled, ill-conditioned, or even degenerate. The reliability of an SQP solver depends mainly on the numerical efficiency and stability of the code solving the quadratic programming subproblem. As verified by the comparative study of Schittkowski [132], NLPQL and QL successfully solve all 306 test problems under consideration. In addition, all 1,000 test examples of the interactive data fitting system EASY-FIT, see Schittkowski [128], are solved by the code DFNLP. This subroutine is an extension of NLPQL and also depends on the stable solution of quadratic programming subproblems, see Schittkowski [125].

8.2. Program Documentation

Usage

1. Generate a QlWrapper object employing QlWrapper()

2. a) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

b) Put the values for parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;

or

Put( const char * pcParam , const int iValue ) ;

• Put( const char * pcParam , const double * pdValue ) ;

or

Put( const char * pcParam , const double dValue ) ;

• Put( const char * pcParam , const char * pcValue ) ;

or

Put( const char * pcParam , const char cValue ) ;

• Put( const char * pcParam , const bool * pbValue ) ;

or

Put( const char * pcParam , const bool bValue ) ;

c) Put the number of inequality constraints to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;


d) Put the number of equality constraints to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound )

PutUpperBound( const int iVariableIndex, const double dUpperBound )

for iVariableIndex = 0,...,NumberOfVariables - 1

or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must point to arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set. If no bounds are set, these will be -1E30 and +1E30 by default.

4. Set matrices and vectors for the objective and constraint functions using

Put( const char * pcParam , const double * pdValue ) ;

where pcParam is one of "ObjMat", "ObjVec", "ConstrMat", and "ConstrVec", and pdValue is a pointer to the corresponding array (the assumed storage layout is sketched below).
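Judging from the example in Section 8.3, "ObjMat" is passed as a dense array of length NumDv*NumDv with the entry of row i and column j stored at index j + NumDv*i; since the matrix used there is symmetric, this row-major ordering is an assumption that should be verified. A sketch for two variables:

// assumed dense storage: entry (i,j) at index j + n*i (here n = 2)
double C[ 4 ] = { 2.0, 0.0,    // row 0
                  0.0, 2.0 };  // row 1
m_pOptimizer->Put( "ObjMat", C );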

5. Start Ql-Subroutine: StartOptimizer()

6. Check the status of the object using GetStatus( int & iStatus ) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

7. Useful methods:

a) GetNumConstr( int & iNumConstr ) const
Returns the number of constraints.
GetNumDv( int & iNumDesignVar ) const
Returns the number of design variables.

b) GetDesignVarVec( const double * & pdPointer, const int i ) const
Returns a pointer to the i'th design variable vector (default: i=0).
GetDesignVar( const int iVarIdx, double & pdPointer, const int iVectorIdx ) const
Returns the value of the iVarIdx'th design variable of the iVectorIdx'th design variable vector (default: iVectorIdx=0).


8. Output

a) GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx ) const
with iParSysIdx = 0 returns the value of the iVariableIdx'th design variable in the last solution vector.

b) GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const
with iParSysIdx = 0 returns a pointer to the last solution vector.

8.3. Example

Consider the following problem:

Minimize  1/2 Σ_{i=0}^{4} x_i^2 − 21.98 x_0 − 1.26 x_1 + 61.39 x_2 + 5.3 x_3 + 101.3 x_4
s.t.  −7.56 x_0 + 0.5 x_4 + 39.1 ≥ 0 ,
      −100 ≤ x_i ≤ 100 ,  i = 0, ..., 4 ,
      x_i ∈ R ,  i = 0, ..., 4 .

This problem can be written as

Minimize  1/2 x^T C x + d^T x ,
s.t.  a_0^T x + b_0 ≥ 0 ,
      −100 ≤ x_i ≤ 100 ,  i = 0, ..., 4 ,
      x_i ∈ R ,  i = 0, ..., 4 ,

with C the 5 by 5 identity matrix, d = ( −21.98, −1.26, 61.39, 5.3, 101.3 )^T, a_0 = ( −7.56, 0, 0, 0, 0.5 )^T, and b_0 = 39.1.

Now it can be solved by QlWrapper. The file QlExample.h contains the class definition:

#ifndef QLEXAMPLE_H_INCLUDED
#define QLEXAMPLE_H_INCLUDED

#include "OptProblem.h"

class QlExample : public OptProblem
{
public:

    QlExample();

    int FuncEval( bool bGradApprox = false );
    int GradEval();
    int SolveOptProblem();
};

#endif

The file QlExample.cpp contains the implementation:

#include <cstdlib>
#include <iostream>
#include <cmath>
#include "QlExample.h"
#include "QlWrapper.h"

using std::cout;
using std::endl;
using std::cin;

QlExample::QlExample()
{
    cout << endl << "--- this is the Ql Example ---" << endl;
    cout << endl << "---- solved by QlWrapper -----" << endl << endl;
    m_pOptimizer = new QlWrapper;
}

int QlExample::FuncEval( bool bGradApprox )
{
    // not needed
    return EXIT_FAILURE;
}

int QlExample::GradEval()
{
    // not needed
    return EXIT_FAILURE;
}

int QlExample::SolveOptProblem()
{
    int iError;

    iError = 0;

    m_pOptimizer->Put( "TermAcc", 1.0E-6 );
    m_pOptimizer->Put( "OutputLevel", 2 );
    m_pOptimizer->PutNumDv( 5 );
    m_pOptimizer->PutNumIneqConstr( 1 );
    m_pOptimizer->PutNumEqConstr( 0 );

    double * MatVec = new double[ 25 ];

    // objective matrix C: the 5 x 5 identity matrix
    for ( int iRowIdx = 0; iRowIdx < 5; iRowIdx++ )
    {
        for ( int iColIdx = 0; iColIdx < 5; iColIdx++ )
        {
            if ( iColIdx == iRowIdx )
            {
                MatVec[ iColIdx + 5 * iRowIdx ] = 1;
            }
            else
            {
                MatVec[ iColIdx + 5 * iRowIdx ] = 0;
            }
        }
    }
    m_pOptimizer->Put( "ObjMat", MatVec );

    // objective vector d
    MatVec[ 0 ] = -21.98;
    MatVec[ 1 ] = -1.26;
    MatVec[ 2 ] = 61.39;
    MatVec[ 3 ] = 5.3;
    MatVec[ 4 ] = 101.3;
    m_pOptimizer->Put( "ObjVec", MatVec );

    // constraint vector a_0
    MatVec[ 0 ] = -7.56;
    MatVec[ 1 ] = 0;
    MatVec[ 2 ] = 0;
    MatVec[ 3 ] = 0;
    MatVec[ 4 ] = 0.5;
    m_pOptimizer->Put( "ConstrMat", MatVec );

    // constant b_0
    MatVec[ 0 ] = 39.1;
    m_pOptimizer->Put( "ConstrVec", MatVec );

    delete [] MatVec;

    for ( int i = 0; i < 5; i++ )
    {
        m_pOptimizer->PutUpperBound( i, 100 );
        m_pOptimizer->PutLowerBound( i, -100.0 );
    }

    iError = m_pOptimizer->StartOptimizer();

    if ( iError != EXIT_SUCCESS )
    {
        cout << "Error " << iError << "\n";
        return iError;
    }

    const double * dX;
    int iNumDesignVar;
    double dObjVal;

    m_pOptimizer->GetNumDv( iNumDesignVar );

    m_pOptimizer->GetDesignVarVec( dX );

    for ( int iDvIdx = 0; iDvIdx < iNumDesignVar; iDvIdx++ )
    {
        cout << "dX[" << iDvIdx << "] = " << dX[ iDvIdx ] << endl;
    }

    m_pOptimizer->GetObjVal( dObjVal );
    cout << "Value of objective: " << dObjVal << endl;

    return EXIT_SUCCESS;
}
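The example class might then be driven by a small main program along the following lines (a sketch; the build setup is not part of this manual):

#include "QlExample.h"

int main()
{
    QlExample problem;                  // the constructor allocates the QlWrapper
    return problem.SolveOptProblem();   // sets up and solves the quadratic program
}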


9. IpoptWrapper - Interface to an open-source large scale optimization algorithm

IpoptWrapper provides the possibility to use Ipopt through the Nlp++ interface. Details on the algorithm of Ipopt can be found in [163, 162, 161, 98, 115]. Ipopt is not contained in Nlp++; it must be obtained separately. After successful installation of Ipopt, Nlp++ must be adjusted according to the instructions in NlpConfig.h.

9.1. Program Documentation

IpoptWrapper can only be used if you derive your problem from class OptProblem and use DefaultLoop to start the optimizer. A direct call to StartOptimizer, as allowed by most of the other optimizers, is not possible here.

Usage with DefaultLoop

1. Define a class (e.g. MyProblem) which is derived from OptProblem and defines your optimization problem.

2. This class must implement the functions FuncEval(), GradEval(), and SolveOptProblem(). If you want to approximate the gradients of the objective/constraint functions, you can implement GradEval() as a call to GradApprox(), which is already implemented in the base class OptProblem. In this case, however, the function FuncEval() must be implemented in a way that makes sure that during the approximation of gradients the function values are put to the ApproxGrad object in the correct order (first constraints, then objective(s)).

3. Either the class constructor or the SolveOptProblem() function must allocate memory for m_pOptimizer, e.g. using new IpoptWrapper.

4. Via m_pOptimizer, SolveOptProblem() then must define the problem using the Put methods:

a) Put the number of design variables to the object:
PutNumDv( int iNumDv ) ;

b) Put the values for the parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;
or Put( const char * pcParam , const int iValue ) ;

• Put( const char * pcParam , const double * pdValue ) ;
or Put( const char * pcParam , const double dValue ) ;


• Put( const char * pcParam , const char * pcValue ) ;
or Put( const char * pcParam , const char cValue ) ;

• Put( const char * pcParam , const bool * pbValue ) ;
or Put( const char * pcParam , const bool bValue ) ;

c) Put the number of inequality constraint functions to the object:
PutNumIneqConstr( int iNumIneqConstr ) ;

d) Put the number of equality constraint functions to the object:
PutNumEqConstr( int iNumEqConstr ) ;

e) Set boundary values and provide an initial guess.

i. Set upper and lower bounds using

• PutLowerBound( const int iVariableIndex, const double dLowerBound );

• PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1, or

• PutUpperBound( const double * pdXu );

• PutLowerBound( const double * pdXl );

where pdXu and pdXl must point to arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set. If no bounds are set, these will be -1E30 and +1E30 by default.

ii. Provide an initial guess for the optimization vector by
PutInitialGuess( const double * pdInitialGuess ),
where pdInitialGuess must be a double array of length equal to the number of variables, or
PutInitialGuess( const int iVariableIndex, const double dInitialGuess ),
for iVariableIndex = 0,...,NumberOfVariables - 1.

Before setting an initial guess, the number of design variables must be set.

f) The SolveOptProblem() function can now call m_pOptimizer->DefaultLoop( this ), using the current MyProblem object as an argument.

5. Generate a MyProblem object.

6. Call MyProblem::SolveOptProblem().

A minimal sketch of such a class follows below.
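The problem functions in this sketch are placeholders, and the exact signature of GradApprox() should be checked against OptProblem.h; the sketch assumes it can simply be forwarded from GradEval():

#include <cstdlib>
#include "OptProblem.h"
#include "IpoptWrapper.h"

class MyProblem : public OptProblem
{
public:
    MyProblem() { m_pOptimizer = new IpoptWrapper; }      // step 3

    int FuncEval( bool bGradApprox = false )
    {
        const double * dX;
        m_pOptimizer->GetDesignVarVec( dX );
        // first the constraints, then the objective (required during
        // gradient approximation, see step 2)
        m_pOptimizer->PutConstrVal( 0, dX[ 0 ] + dX[ 1 ] - 1.0 );
        m_pOptimizer->PutObjVal( dX[ 0 ] * dX[ 0 ] + dX[ 1 ] * dX[ 1 ] );
        return EXIT_SUCCESS;
    }

    int GradEval()
    {
        return GradApprox();                              // step 2, signature assumed
    }

    int SolveOptProblem()                                 // step 4
    {
        m_pOptimizer->PutNumDv( 2 );                      // step 4a
        m_pOptimizer->PutNumIneqConstr( 1 );              // step 4c
        m_pOptimizer->PutNumEqConstr( 0 );                // step 4d

        double LB[ 2 ] = { -10.0, -10.0 };                // step 4e
        double UB[ 2 ] = {  10.0,  10.0 };
        double IG[ 2 ] = {   2.0,   2.0 };
        m_pOptimizer->PutLowerBound( LB );
        m_pOptimizer->PutUpperBound( UB );
        m_pOptimizer->PutInitialGuess( IG );

        return m_pOptimizer->DefaultLoop( this );         // step 4f
    }
};

// steps 5 and 6:
// MyProblem problem;
// problem.SolveOptProblem();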


10. Example - Solving a constrained nonlinear optimization problem using an active-set strategy

As an illustrative example we solve the following problem using DefaultLoop():

Minimize  F(x) = 25/12 x_0 + 77/60 x_1 + 19/20 x_2 + 319/420 x_3 ,
s.t.  G_0(x) = x_0 + 1/2 x_1 + 1/3 x_2 + 1/4 x_3 − 25/12 ≥ 0 ,
      G_1(x) = 1/2 x_0 + 1/3 x_1 + 1/4 x_2 + 1/5 x_3 − 77/60 ≥ 0 ,
      G_2(x) = 1/3 x_0 + 1/4 x_1 + 1/5 x_2 + 1/6 x_3 − 19/20 ≥ 0 ,
      G_3(x) = 1/4 x_0 + 1/5 x_1 + 1/6 x_2 + 1/7 x_3 − 319/420 ≥ 0 ,
      0 ≤ x_i ≤ 10^6 ,  for all i = 0, ..., 3 .

The file Example14.h contains the class definition.

#ifndef EXAMPLE14_H_INCLUDED
#define EXAMPLE14_H_INCLUDED

#include "OptProblem.h"

class Example14 : public OptProblem
{
public:

    Example14();

    int FuncEval( bool bGradApprox = false );
    int GradEval();
    int SolveOptProblem();
};

#endif

The file Example14.cpp contains the implementation:

#include <cstdlib>
#include <iostream>
#include <cmath>
#include "Example14.h"
#include "SqpWrapper.h"
#include "NlpqlbWrapper.h"
#include "NlpqlgWrapper.h"
#include "ScpWrapper.h"
#include "CobylaWrapper.h"

using std::cout;
using std::endl;
using std::cin;

Example14::Example14()
{
    char s;

    cout << endl << "---- this is Example 14 ----" << endl << endl;

    // Choose optimizer:
    do
    {
        cout << endl << "Which Optimizer do you want to use? \n\n";
        cout << " [q] SqpWrapper \n";
        cout << " [c] ScpWrapper \n";
        cout << " [b] NlpqlbWrapper \n";
        cout << " [g] NlpqlgWrapper \n";
        cout << " [y] CobylaWrapper \n";
        cin >> s;

        if ( s == 'q' )
        {
            m_pOptimizer = new SqpWrapper();
        }
        else if ( s == 'c' )
        {
            m_pOptimizer = new ScpWrapper();
        }
        else if ( s == 'b' )
        {
            m_pOptimizer = new NlpqlbWrapper();
        }
        else if ( s == 'g' )
        {
            m_pOptimizer = new NlpqlgWrapper;
        }
        else if ( s == 'y' )
        {
            m_pOptimizer = new CobylaWrapper;
        }
        else
        {
            cout << "illegal input!" << endl;
        }
    }
    while ( s != 'c' &&
            s != 'q' &&
            s != 'g' &&
            s != 'y' &&
            s != 'b' );
}

int Example14::FuncEval( bool /* parameter not needed */ )
{
    const double * dX;
    double dValue;
    int iError;

    iError = 0;

    m_pOptimizer->GetDesignVarVec( dX );

    // 0th constraint
    dValue = dX[ 0 ] + dX[ 1 ] / 2.0 + dX[ 2 ] / 3.0 + dX[ 3 ] / 4.0 - 25.0 / 12.0;
    m_pOptimizer->PutConstrVal( 0, dValue );

    // 1st constraint
    dValue = dX[ 0 ] / 2.0 + dX[ 1 ] / 3.0 + dX[ 2 ] / 4.0 + dX[ 3 ] / 5.0 - 77.0 / 60.0;
    m_pOptimizer->PutConstrVal( 1, dValue );

    // 2nd constraint
    dValue = dX[ 0 ] / 3.0 + dX[ 1 ] / 4.0 + dX[ 2 ] / 5.0 + dX[ 3 ] / 6.0 - 19.0 / 20.0;
    m_pOptimizer->PutConstrVal( 2, dValue );

    // 3rd constraint
    dValue = dX[ 0 ] / 4.0 + dX[ 1 ] / 5.0 + dX[ 2 ] / 6.0 + dX[ 3 ] / 7.0
           - 319.0 / 420.0;
    m_pOptimizer->PutConstrVal( 3, dValue );

    // objective
    dValue = 25.0 / 12.0 * dX[ 0 ] + 77.0 / 60.0 * dX[ 1 ]
           + 19.0 / 20.0 * dX[ 2 ] + 319.0 / 420.0 * dX[ 3 ];
    m_pOptimizer->PutObjVal( dValue );

    return EXIT_SUCCESS;
}

int Example14::GradEval()
{
    /* If we were optimizing using CobylaWrapper (only), this
       function could be implemented as a return command only.
       As CobylaWrapper is a gradient-free algorithm, this function
       would never be called. All other optimizers in this part
       of the manual offer an active-set strategy which is used
       in the following implementation. */

    const double * dX;
    int * piActive;

    piActive = NULL;

    // Get active constraints
    m_pOptimizer->Get( "ActConstr", piActive );

    m_pOptimizer->GetDesignVarVec( dX );

    m_pOptimizer->PutDerivObj( 0, 25.0 / 12.0 );
    m_pOptimizer->PutDerivObj( 1, 77.0 / 60.0 );
    m_pOptimizer->PutDerivObj( 2, 19.0 / 20.0 );
    m_pOptimizer->PutDerivObj( 3, 319.0 / 420.0 );

    if ( piActive[ 0 ] == 1 )
    {
        m_pOptimizer->PutDerivConstr( 0, 0, 1.0 );
        m_pOptimizer->PutDerivConstr( 0, 1, 1.0 / 2.0 );
        m_pOptimizer->PutDerivConstr( 0, 2, 1.0 / 3.0 );
        m_pOptimizer->PutDerivConstr( 0, 3, 1.0 / 4.0 );
    }
    if ( piActive[ 1 ] == 1 )
    {
        m_pOptimizer->PutDerivConstr( 1, 0, 1.0 / 2.0 );
        m_pOptimizer->PutDerivConstr( 1, 1, 1.0 / 3.0 );
        m_pOptimizer->PutDerivConstr( 1, 2, 1.0 / 4.0 );
        m_pOptimizer->PutDerivConstr( 1, 3, 1.0 / 5.0 );
    }
    if ( piActive[ 2 ] == 1 )
    {
        m_pOptimizer->PutDerivConstr( 2, 0, 1.0 / 3.0 );
        m_pOptimizer->PutDerivConstr( 2, 1, 1.0 / 4.0 );
        m_pOptimizer->PutDerivConstr( 2, 2, 1.0 / 5.0 );
        m_pOptimizer->PutDerivConstr( 2, 3, 1.0 / 6.0 );
    }
    if ( piActive[ 3 ] == 1 )
    {
        m_pOptimizer->PutDerivConstr( 3, 0, 1.0 / 4.0 );
        m_pOptimizer->PutDerivConstr( 3, 1, 1.0 / 5.0 );
        m_pOptimizer->PutDerivConstr( 3, 2, 1.0 / 6.0 );
        m_pOptimizer->PutDerivConstr( 3, 3, 1.0 / 7.0 );
    }

    return EXIT_SUCCESS;
}

int Example14::SolveOptProblem()
{
    int iError;

    iError = 0;

    // If SqpWrapper has been chosen as optimizer,
    // the number of parallel systems is set.
    // (The implementation of FuncEval allows only one system.)
    // If any other optimizer has been chosen,
    // this instruction will not do anything.
    m_pOptimizer->Put( "NumParallelSys", 1 );

    // If NlpqlgWrapper has been chosen as optimizer,
    // the number of expected local minimizers is set.
    // If any other optimizer has been chosen,
    // this instruction will not do anything.
    m_pOptimizer->Put( "MaxNumLocMin", 2 );

    // If NlpqlbWrapper has been chosen as optimizer,
    // the number of constraints to be taken into the subproblem is set.
    // If any other optimizer has been chosen,
    // this instruction will not do anything.
    m_pOptimizer->Put( "NumConstrSubProb", 2 );

    // Define the optimization problem
    m_pOptimizer->PutNumDv( 4 );
    m_pOptimizer->PutNumIneqConstr( 4 );
    m_pOptimizer->PutNumEqConstr( 0 );

    double LB[ 4 ] = { 0.0, 0.0, 0.0, 0.0 };
    double UB[ 4 ] = { 10E6, 10E6, 10E6, 10E6 };
    double IG[ 4 ] = { 0.0, 0.0, 0.0, 0.0 };

    m_pOptimizer->PutUpperBound( UB );
    m_pOptimizer->PutLowerBound( LB );
    m_pOptimizer->PutInitialGuess( IG );

    // Start the optimization
    iError = m_pOptimizer->DefaultLoop( this );

    // if an error occurred, report it ...
    if ( iError != EXIT_SUCCESS )
    {
        cout << "Error " << iError << "\n";
        return iError;
    }

    // ... else report the results:
    const double * dX;

    m_pOptimizer->GetDesignVarVec( dX );

    int iNumDesignVar;

    m_pOptimizer->GetNumDv( iNumDesignVar );

    for ( int iDvIdx = 0; iDvIdx < iNumDesignVar; iDvIdx++ )
    {
        cout << "dX[" << iDvIdx << "] = " << dX[ iDvIdx ] << endl;
    }
    return EXIT_SUCCESS;
}


Part II.

Multicriteria Optimization


11. Theory - Multiple Objective Optimization

11.1. Introduction

The multiple objective optimization problem is stated as follows:

Definition 11.1.1 (MOO)

min_{x ∈ R^n}  f(x) = ( f_1(x), ..., f_k(x) )^T
s.t.:  g(x) ≥ 0 ,   g : R^n → R ,
       h(x) = 0 ,   h : R^n → R ,                    (MOO)
       x_l ≤ x ≤ x_u ,   x_l, x_u ∈ R^n .

In other words, we wish to determine, from among the set of all points which satisfy the inequality g(x) ≥ 0 and the equality constraint h(x) = 0, that particular x* ∈ R^n with x_l ≤ x* ≤ x_u which yields the optimum values of all objective functions f_1, ..., f_k. Here 'optimize' does not simply mean to find the minimum or maximum of the objective function, as for a single criterion optimization problem. It means to find a 'good' solution considering all the objective functions simultaneously. Of course, first we need to know how to designate a particular solution as 'good' or 'bad'. This is the major question which arises while solving any multicriteria optimization problem, and it will be discussed here.

11.2. Pareto Optimality

Let us start taking a closer look at the multiple objective problem (MOO). Let

X := { x ∈ R^n : g(x) ≥ 0, h(x) = 0, x_l ≤ x ≤ x_u }

denote the set of all feasible solutions. Then problem (MOO) can be simplified to

min_{x ∈ X} f(x) .

If there exists x* ∈ X such that x* is optimal for all objective functions, that is,

f_i(x*) ≤ f_i(x)  for all x ∈ X and all i = 1, ..., k ,

then x* is certainly a desirable solution. Unfortunately, this is a utopian situation which rarely exists, since it is unlikely that all f_i(x) attain their minimum values at a common point x*. Thus we are almost always faced with the question: what solution should we adopt; that is, how should an 'optimal' solution be defined? First consider the so-called ideal solution. In order to define this solution we have to find the separately attainable minima for all the objective functions.


Assuming there is one, let x^{i*} be the solution of the scalar optimization problem

min_{x ∈ X} f_i(x) = f_i^* ,  for i = 1, ..., k ;

then f_i^* is called the individual minimum of the scalar problem i, the vector f^* = ( f_1^*, ..., f_k^* ) is called the ideal for a multiple objective optimization problem, and the points in X which determine this vector form the ideal solution.

It is usually not true that

x^{i*} = x^{j*}  for all i, j = 1, ..., k

holds, which would be great, since we would have solved the multiple objective optimization problem just by considering a sequence of scalar ones. Thus we need to define a new form of optimality, which leads us to the concept of Pareto optimality. It was introduced by V. Pareto in 1896 and is still the most important part of multicriteria analysis:

Definition 11.2.1 (Pareto Optimality)
A point x* ∈ X is said to be Pareto optimal for problem (MOO) if there is no other vector x ∈ X such that

f_i(x) ≤ f_i(x*)  for all i = 1, ..., k

and

f_i(x) < f_i(x*)  for at least one i ∈ {1, ..., k} .

This definition is based on the intuitive conviction that the point x* ∈ X is chosen as the optimal one if no criterion can be improved without worsening at least one other criterion. Unfortunately, the Pareto optimum almost always gives not a single solution but a set of solutions, called the functional-efficient boundary (Pareto set) of (MOO).

Example 11.2.1 (Functional-efficient boundary)
Consider the problem of minimizing the two objectives

f_1(x_1, x_2) = x_1 + x_2^2 ,   f_2(x_1, x_2) = x_1^2 + x_2

subject to

12 − x_1 − x_2 ≥ 0 ,
−x_1^2 + 10 x_1 − x_2^2 + 16 x_2 − 80 ≥ 0 .

Then the functional-efficient boundary of the problem can be plotted graphically, see Figure 11.1.

It could seem from Chapter 11 that the solution of a multiple objective problem is all about finding Pareto optimal points in the design space and deciding, based on a selection, which point to choose as the optimal one. There are different ways to evaluate Pareto optimal points. One class of methods consists in transforming the multiple objective problem into one (or a sequence of) scalar optimization problem(s), which will be the content of the following chapters. Each method described here requires some information about the preferences in the criteria. Since the engineer may meet different decision-making problems, and since different information about preferences is available, it is difficult to indicate which method can be recommended to solve a problem.


Figure 11.1.: Functional-efficient Boundary


12. The Method of Weighted Objectives

12.1. The Algorithm

The weighting objectives method has received the most attention, and particular models within this method have been widely applied. The basis of this method consists in adding all the objective functions together using different weighting coefficients for each. We transform our MOO into a scalar problem by creating one function of the form

f(x) = Σ_{i=1}^{k} ω_i f_i(x) ,     (12.1)

where ω_i ≥ 0 with Σ_{i=1}^{k} ω_i = 1 are the weighting coefficients, which reflect the preference of the individual objective functions. Since the numerical values of the different objectives may vary a lot, we should scale the formulation to

f(x) = Σ_{i=1}^{k} ω_i f_i(x) c_i .     (12.2)

The best results are obtained if c_i = 1/f_i^*, where f_i^* is the optimal (ideal) value of the scalar optimization problem with objective function f_i. It is important to note that in case of a convex MOO problem, any solution of problem (12.2) with any weight selection as above will lead to a point x ∈ X which lies on the curve of the functional-efficient points.

12.2. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and therefore to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate an MooWeightedObjectives object employing MooWeightedObjectives()

2. Generate a scalar optimizer (e.g. SqpWrapper) object and attach it to the multiple objective optimizer using AttachScalarOptimizer()

a) Put the number of objective functions to the object:
Put( const char * pcParam , const int * piNumObjFuns ) ;
or Put( const char * pcParam , const int iNumObjFuns ) ;
where pcParam = "NumObjFuns".


b) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

c) Put the values for other parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;
or Put( const char * pcParam , const int iValue ) ;
where pcParam is one of the parameters of the scalar optimizer.

• Put( const char * pcParam , const double * pdValue ) ;
or Put( const char * pcParam , const double dValue ) ;
where pcParam is one of the parameters of the scalar optimizer or

– "Weights" Put the weights for the weighted sum. pdValue must be a pointer to a double array of length equal to the number of objective functions containing the weights.

• Put( const char * pcParam , const char * pcValue ) ;
or Put( const char * pcParam , const char cValue ) ;
with pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;
or Put( const char * pcParam , const bool bValue ) ;
where pcParam is one of the parameters of the scalar optimizer or:

– "ComputeIdeals" If ComputeIdeals == true (default), the ideals will be computed before the multiple objective optimization starts and will be used for scaling the objectives.

d) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

e) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values and provide an initial guess.

a) Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1, or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );


where pdXu and pdXl must point to arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set. If no bounds are set, these will be -1E30 and +1E30 by default.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess ),

where pdInitialGuess must be a double array of length equal to the number of variables, or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.

4. Start MooWeightedObjectives: StartOptimizer()

5. Check the status of the object using GetStatus( int & iStatus ) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): MooWeightedObjectives needs new function values → 6b Providing new function values. After passing these values to the MooWeightedObjectives object, go to 4.

• if iStatus equals EvalGradId(): MooWeightedObjectives needs new values for gradients → 6c Providing new gradient values. After passing these values to the MooWeightedObjectives object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const
Returns the number of constraints.
GetNumDv( int & iNumDesignVar ) const
Returns the number of design variables.
Get( const char * pcParam, int & iNumObjFuns )
with pcParam == "NumObjFuns" returns the number of objective functions.
Get( const char * pcParam, int & iNumParallelSys )
Returns the number of parallel systems when pcParam == "NumParallelSys". If the scalar optimizer is not SqpWrapper, this number will always be 1.

ii. GetDesignVarVec( const double * & pdPointer, const int iVectorIdx ) const
Returns a pointer to the iVectorIdx'th design variable vector.
GetDesignVar( const int iVarIdx, double & pdPointer, const int iVectorIdx ) const
Returns the value of the iVarIdx'th design variable of the iVectorIdx'th design variable vector.

iii. IsObjFunInSubProb( const int iFunIdx ) const
Returns true if the iFunIdx'th objective function (or its gradient) is in the current subproblem and thus has to be evaluated. Returns false if the function (or its gradient) is not in the current subproblem and thus does not have to be evaluated.

b) Providing new function values. Function values for the objectives as well as for the constraints have to be calculated for the design variable vectors provided by MooWeightedObjectives. Note: when the status equals EvalGradId() for the first time, the functions need to be evaluated only for the 0'th parallel system (this is only relevant if SqpWrapper is used as the scalar optimizer). For access to the design variable vectors see 6(a)ii. After calculating the values, these must be passed to the MooWeightedObjectives object using:

• PutConstrValVec( const double * const pdConstrVals, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1,
where pdConstrVals points to an array containing the values of each constraint at the iParSysIdx'th design variable vector provided by MooWeightedObjectives.

• PutObjVal( const double dObjVal, const int iObjFunIdx, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1 and iObjFunIdx = 0,...,NumberOfObjectiveFunctions - 1,
where dObjVal defines the value of the iObjFunIdx'th objective function at the iParSysIdx'th design variable vector provided by MooWeightedObjectives.

Alternatively, you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1 and iConstrIdx = 0,...,NumberOfConstraints - 1.

c) Providing new gradient values. Gradients must be calculated for all objectives which are used in the current subproblem (see 6(a)iii) and for the constraints at the current design variable vector. For access to the design variable vector see 6(a)ii.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx, const double dDerivativeValue )
for iVariableIdx = 0,...,NumberOfVariables - 1 and iConstrIdx = 0,...,NumberOfConstraints - 1,
where dDerivativeValue is defined as the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th variable.


Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1,
where pdGradient is a pointer to the gradient of the iConstrIdx'th constraint.

For the gradients of the objectives you can use

• PutDerivObj( const int iVariableIdx, const double dDerivativeValue, const int iFunIdx ),
for iVariableIdx = 0,...,NumberOfVariables - 1,
where dDerivativeValue is defined as the derivative of the iFunIdx'th objective with respect to the iVariableIdx'th variable at the current design variable vector.

• PutGradObj( const double * pdGradient, const int iFunIdx ),
where pdGradient is a pointer to the gradient of the iFunIdx'th objective at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const
Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const
Returns the value of the iConstrIdx'th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx, const int iVariableIdx, double & dDerivativeValue ) const
Returns the value of the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th design variable at the last solution vector.

• GetDerivObj( const int iVariableIdx, double & dDerivativeValue, const int iFunIdx ) const
Returns the value of the derivative of the iFunIdx'th objective with respect to the iVariableIdx'th design variable at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const
Returns the value of the iFunIdx'th objective function at the last solution vector.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns the value of the iVariableIdx'th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns a pointer to the last solution vector.

The setup phase of steps 1 to 4 is sketched below.
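The following is a minimal sketch for two objectives; the header names and the exact AttachScalarOptimizer() signature are assumptions, and the function/gradient loop of steps 5 and 6 is omitted:

#include "MooWeightedObjectives.h"
#include "SqpWrapper.h"

int main()
{
    MooWeightedObjectives * pMoo = new MooWeightedObjectives();   // step 1
    SqpWrapper * pScalar = new SqpWrapper();                      // step 2
    pMoo->AttachScalarOptimizer( pScalar );                       // signature assumed

    pMoo->Put( "NumObjFuns", 2 );                                 // step 2a
    pMoo->PutNumDv( 2 );                                          // step 2b

    double dWeights[ 2 ] = { 0.7, 0.3 };                          // step 2c
    pMoo->Put( "Weights", dWeights );

    pMoo->PutNumIneqConstr( 1 );                                  // step 2d
    pMoo->PutNumEqConstr( 0 );                                    // step 2e

    double LB[ 2 ] = { 0.0, 0.0 };                                // step 3a
    double UB[ 2 ] = { 10.0, 10.0 };
    double IG[ 2 ] = { 1.0, 1.0 };
    pMoo->PutLowerBound( LB );
    pMoo->PutUpperBound( UB );
    pMoo->PutInitialGuess( IG );                                  // step 3b

    pMoo->StartOptimizer();                                       // step 4
    // steps 5 and 6 (status checks and function/gradient loop) omitted
    return 0;
}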


13. The Hierarchical Optimization Method

13.1. The Algorithm

The hierarchical optimization method has been suggested by Walz [164] and considers the situation where the objectives can be ordered in terms of their importance. Without loss of generality, let the vector of objectives

( f_1(x), ..., f_k(x) )^T

be ordered with respect to their importance, i.e. let f_1(x) be the most and f_k(x) the least important objective function, as specified by the user. We now minimize each objective separately, adding in each step a new constraint which limits the assumed increase or decrease of the previously considered functions.

Algorithm 9 Hierarchical Method

(1) Find the minimum of the first objective, i.e. find

x^(1) = ( x_1^(1), ..., x_n^(1) )

such that

f_1( x^(1) ) = min_{x ∈ X} f_1(x) .     (13.1)

Repeat step (2) for i = 2, ..., k.

(2) Find the minimum of the i'th objective, i.e. find

x^(i) = ( x_1^(i), ..., x_n^(i) )

such that

f_i( x^(i) ) = min_{x ∈ X} f_i(x)     (13.2)

with the additional constraints

f_{j-1}(x) ≤ ( 1 + ε_{j-1} / 100 ) f_{j-1}( x^(j-1) ) ,  for j = 2, ..., i .     (13.3)

Note that the solutions of the k scalar problems are all functional-efficient, but only the last (k'th) one leads to optimality. The assumption that in each step the parameter ε_i equals zero leads to the variation suggested by Ben-Tal [4].
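For instance, with ε_1 = 5 and a positive optimal value f_1( x^(1) ), the second subproblem minimizes f_2 over X subject to the extra constraint

f_1(x) ≤ 1.05 · f_1( x^(1) ) ,

i.e. the most important objective may deteriorate by at most 5% relative to its individual minimum.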


13.2. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and therefore to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate an MooHierarchOptMethod object employing MooHierarchOptMethod()

2. Generate a scalar optimizer (e.g. SqpWrapper) object and attach it to the multiple objective optimizer using AttachScalarOptimizer()

a) Put the number of objective functions to the object:
Put( const char * pcParam , const int * piNumObjFuns ) ;
or Put( const char * pcParam , const int iNumObjFuns ) ;
where pcParam = "NumObjFuns".

b) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

c) Put the values for other parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;
or Put( const char * pcParam , const int iValue ) ;
where pcParam is one of the parameters of the scalar optimizer.

• Put( const char * pcParam , const double * pdValue ) ;
or Put( const char * pcParam , const double dValue ) ;
where pcParam is one of the parameters of the scalar optimizer or

– "Epsilon" Parameter for the additional constraints (13.3). pdValue must be a pointer to a double array of length equal to the number of objective functions containing the epsilon-vector.

• Put( const char * pcParam , const char * pcValue ) ;
or Put( const char * pcParam , const char cValue ) ;
with pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;
or Put( const char * pcParam , const bool bValue ) ;
where pcParam is one of the parameters of the scalar optimizer.

d) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

e) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;


3. Set boundary values and provide an initial guess.

a) Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1, or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must point to arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set. If no bounds are set, these will be -1E30 and +1E30 by default.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess ),

where pdInitialGuess must be a double array of length equal to the number of variables, or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.

4. Start MooHierarchOptMethod: StartOptimizer()

5. Check the status of the object using GetStatus( int & iStatus ) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): MooHierarchOptMethod needs new function values → 6b Providing new function values. After passing these values to the MooHierarchOptMethod object, go to 4.

• if iStatus equals EvalGradId(): MooHierarchOptMethod needs new values for gradients → 6c Providing new gradient values. After passing these values to the MooHierarchOptMethod object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const
Returns the number of constraints.
GetNumDv( int & iNumDesignVar ) const
Returns the number of design variables.
Get( const char * pcParam, int & iNumObjFuns )
with pcParam == "NumObjFuns" returns the number of objective functions.
Get( const char * pcParam, int & iNumParallelSys )
Returns the number of parallel systems when pcParam == "NumParallelSys". If the scalar optimizer is not SqpWrapper, this number will always be 1.

ii. GetDesignVarVec( const double * & pdPointer, const int iVectorIdx ) const
Returns a pointer to the iVectorIdx'th design variable vector.
GetDesignVar( const int iVarIdx, double & pdPointer, const int iVectorIdx ) const
Returns the value of the iVarIdx'th design variable of the iVectorIdx'th design variable vector.

iii. IsObjFunInSubProb( const int iFunIdx ) const
Returns true if the iFunIdx'th objective function (or its gradient) is in the current subproblem and thus has to be evaluated. Returns false if the function (or its gradient) is not in the current subproblem and thus does not have to be evaluated.

b) Providing new function values. Function values for the objectives as well as for the constraints have to be calculated for the design variable vectors provided by MooHierarchOptMethod. Note: when the status equals EvalGradId() for the first time, the functions need to be evaluated only for the 0'th parallel system (this is only relevant if SqpWrapper is used as the scalar optimizer). For access to the design variable vectors see 6(a)ii. After calculating the values, these must be passed to the MooHierarchOptMethod object using:

• PutConstrValVec( const double * const pdConstrVals, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1,
where pdConstrVals points to an array containing the values of each constraint at the iParSysIdx'th design variable vector provided by MooHierarchOptMethod.

• PutObjVal( const double dObjVal, const int iObjFunIdx, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1 and iObjFunIdx = 0,...,NumberOfObjectiveFunctions - 1,
where dObjVal defines the value of the iObjFunIdx'th objective function at the iParSysIdx'th design variable vector provided by MooHierarchOptMethod.

Alternatively, you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1 and iConstrIdx = 0,...,NumberOfConstraints - 1.


c) Providing new gradient values. Gradients must be calculated for all objectives which are used in the current subproblem (see 6(a)iii) and for the constraints at the current design variable vector. For access to the design variable vector see 6(a)ii.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx, const double dDerivativeValue )
for iVariableIdx = 0,...,NumberOfVariables - 1 and iConstrIdx = 0,...,NumberOfConstraints - 1,
where dDerivativeValue is defined as the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th variable.

Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1,
where pdGradient is a pointer to the gradient of the iConstrIdx'th constraint.

For the gradients of the objectives you can use

• PutDerivObj( const int iVariableIdx, const double dDerivativeValue, const int iFunIdx ),
for iVariableIdx = 0,...,NumberOfVariables - 1,
where dDerivativeValue is defined as the derivative of the iFunIdx'th objective with respect to the iVariableIdx'th variable at the current design variable vector.

• PutGradObj( const double * pdGradient, const int iFunIdx ),
where pdGradient is a pointer to the gradient of the iFunIdx'th objective at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const

Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const

Returns the value of the iConstrIdx’th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx , const int iVariableIdx,

double & dDerivativeValue ) const

Returns the value of the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th design variable at the last solution vector.

• GetDerivObj( const int iVariableIdx, double & dDerivativeValue,

const int iFunIdx ) const

returns the value of the derivative of the iFunIdx’th objective with respect to the iVariableIdx’th design variable at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const

returns the value of the iFunIdx’th objective function at the last solution vector.


• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx

) const

with iParSysIdx = 0 (default) returns the value of the iVariableIdx’th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const

with iParSysIdx = 0 (default) returns a pointer to the last solution vector.
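For instance, once GetStatus() reports SolvedId(), the results might be read back as in the following hypothetical fragment, where oMoo denotes the configured MooHierarchOptMethod object and iNumObjFuns the number of objective functions:

const double * pdSol = 0;
oMoo.GetDesignVarVec( pdSol );              // last solution vector (iParSysIdx = 0)
for ( int iFun = 0; iFun < iNumObjFuns; ++iFun )
{
    double dVal;
    oMoo.GetObjVal( dVal, iFun );           // iFun’th objective at the solution
}
const double * pdConstr = 0;
oMoo.GetConstrValVec( pdConstr );           // all constraint values at the solution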


14. The Trade-Off Method

14.1. The Algorithm

This method is known as constraint-oriented transformation. Here only one single objective is minimized; the other objectives are considered to be constraints.

Algorithm 10 Trade-Off

For given l ∈ {1, ..., k} do:

(1) Find the minimum of the l’th objective function, i.e. find x^* such that

f_l(x^*) = \min_{x \in X} f_l(x)

subject to additional constraints of the form

f_i(x) \leq y_i, \quad \forall\, i = 1, \ldots, k,\; i \neq l,

where the y_i are assumed values of the objective functions we wish not to exceed.

(2) Repeat (1) for different values of y_i. The information derived from a well chosen set of y_i can be useful in making the decision. The search is stopped when the user finds a satisfactory solution.

It may be necessary to repeat Algorithm 10 for different indices l ∈ {1, ..., k}. In order to get a reasonable choice of y_i, it is useful to minimize each objective function separately, i.e. let f_i^*, i = 1, ..., k, be the optimal values of the scalar problems, and then take account of the following constraints:

f_i(x) \leq f_i^* + \Delta f_i, \quad \forall\, i = 1, \ldots, k,\; i \neq l,

where the \Delta f_i are given values of function increments. A detailed theory of the trade-off method and its variants can be found in [63], [99] and [96].
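As a small illustration of this choice of bounds (a hypothetical fragment with made-up numbers; adIdeal holds separately computed minima f_i^*, adDelta the chosen increments \Delta f_i):

double adIdeal[3] = { 0.7, 1.3, 2.1 };      // illustrative minima f_i^*
double adDelta[3] = { 0.1, 0.2, 0.3 };      // illustrative increments Delta f_i
double adBounds[3];
for ( int i = 0; i < 3; ++i )
    adBounds[i] = adIdeal[i] + adDelta[i];  // y_i = f_i^* + Delta f_i
// adBounds can then be passed via Put( "Bounds", adBounds ), see 14.2 below.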

14.2. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and thus to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31. A minimal end-to-end sketch of the loop described below is given at the end of this section.

Usage without DefaultLoop

1. Generate an MooTradeOff object employing MooTradeOff()


2. Generate a scalar optimizer (e.g. SqpWrapper) object and attach it to the multiple objective optimizer using AttachScalarOptimizer()

a) Put the number of objective functions to the object:
Put( const char * pcParam , const int * piNumObjFuns ) ;
or
Put( const char * pcParam , const int iNumObjFuns ) ;

where pcParam = "NumObjFuns".

b) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

c) Put the values for the parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;

where pcParam is one of the parameters of the scalar optimizer or:

– "IdxOptFun" Put the index of the objective used as the objective in the scalar problem. The default is 0.

• Put( const char * pcParam , const double * pdValue ) ;
or
Put( const char * pcParam , const double dValue ) ;

where pcParam is one of the parameters of the scalar optimizer or:

– "Bounds" Put the bounds for the objectives used as constraints. pdValue must be a pointer to a double array whose length equals the number of objective functions. If no bounds are set, the default is to compute the ideals and add BoundFactor times their absolute values (default: BoundFactor = 0.5). (This means that a bound has to be put for the objective used in the scalar problem, too; that bound will be ignored in the scalar problem.)

– "BoundFactor" Put the factor used to compute the bounds from the ideals. Only relevant when the parameter Bounds is not set.

• Put( const char * pcParam , const char * pcValue ) ;
or
Put( const char * pcParam , const char cValue ) ;

with pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;
or
Put( const char * pcParam , const bool bValue ) ;

where pcParam is one of the parameters of the scalar optimizer or:

– "ComputeIdeals" Determine whether the ideals shall be computed before the multiple objective optimization starts. Note: If no bounds are set, the ideals will always be computed, regardless of this parameter.

d) Put the number of inequality constraint functions to the object:


PutNumIneqConstr( int iNumIneqConstr ) ;

e) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values and provide initial guess.

a) Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1,
or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must point to arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set. If no bounds are set, they default to -1E30 and +1E30.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess )

where pdInitialGuess must be a double array of length equal to the number of variables, or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.

4. Start MooTradeOff: StartOptimizer()

5. Check status of the object using GetStatus( int & iStatus) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): MooTradeOff needs new function values → 6b Providing new function values. After passing these values to the MooTradeOff object, go to 4.

• if iStatus equals EvalGradId(): MooTradeOff needs new values for gradients → 6c Providing new gradient values. After passing these values to the MooTradeOff object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:


i. GetNumConstr( int & iNumConstr ) const

Returns the number of constraints.

GetNumDv( int & iNumDesignVar ) const

Returns the number of design variables.

Get( const char * pcParam, int & iNumObjFuns )

with pcParam == "NumObjFuns" returns the number of objective functions.

Get( const char * pcParam, int & iNumParallelSys )

returns the number of parallel systems when pcParam == "NumParallelSys". If the scalar optimizer is not SqpWrapper, this number will always be 1.

ii. GetDesignVarVec( const double * & pdPointer,

const int iVectorIdx ) const

returns a pointer to the iVectorIdx’th design variable vector.

GetDesignVar( const int iVarIdx , double & pdPointer,
const int iVectorIdx ) const

returns the value of the iVarIdx’th design variable of the iVectorIdx’th design variable vector.

iii. IsObjFunInSubProb( const int iFunIdx ) const

Returns true if the iFunIdx’th objective function (or its gradient) is in the current subproblem and thus has to be evaluated. Returns false if the function (or its gradient) is not in the current subproblem and thus does not have to be evaluated.

b) Providing new function values
Function values for the objectives as well as for the constraints have to be calculated for the design variable vectors provided by MooTradeOff. Note: When Status equals EvalGradId() for the first time, the functions need to be evaluated only for the 0’th parallel system (this is only relevant if SqpWrapper is used as the scalar optimizer). For access to the design variable vectors see 6(a)ii. After calculating the values, these must be passed to the MooTradeOff object using:

• PutConstrValVec( const double * const pdConstrVals,
const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1

where pdConstrVals points to an array containing the values of each constraint at the iParSysIdx’th design variable vector provided by MooTradeOff.

• PutObjVal( const double dObjVal, const int iObjFunIdx,
const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
and iObjFunIdx = 0,...,NumberOfObjectiveFunctions - 1

where dObjVal defines the value of the iObjFunIdx’th objective function at the iParSysIdx’th design variable vector provided by MooTradeOff.

Alternatively you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue,


const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1

c) Providing new gradient values
Gradients must be calculated for all objectives which are used in the current subproblem (see 6(a)iii) and for the constraints at the current design variable vector. For access to the design variable vector see 6(a)ii.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx,
const double dDerivativeValue )
for iVariableIdx = 0,...,NumberOfVariables - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1

where dDerivativeValue is defined as the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th variable.

Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1

where pdGradient is a pointer to the gradient of the iConstrIdx’th constraint.

For the gradients of the objectives you can use

• PutDerivObj( const int iVariableIdx, const double dDerivativeValue,
const int iFunIdx ),
for iVariableIdx = 0,...,NumberOfVariables - 1

where dDerivativeValue is defined as the derivative of the iFunIdx’th objective with respect to the iVariableIdx’th variable at the current design variable vector.

• PutGradObj( const double * pdGradient, const int iFunIdx ),
where pdGradient is a pointer to the gradient of the iFunIdx’th objective at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const

Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const

Returns the value of the iConstrIdx’th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx , const int iVariableIdx,

double & dDerivativeValue ) const

Returns the value of the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th design variable at the last solution vector.

• GetDerivObj( const int iVariableIdx, double & dDerivativeValue,

const int iFunIdx ) const


returns the value of the derivative of the iFunIdx’th objective with respect to the iVariableIdx’th design variable at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const

returns the value of the iFunIdx’th objective function at the last solution vector.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx

) const

with iParSysIdx = 0 (default) returns the value of the iVariableIdx’th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const

with iParSysIdx = 0 (default) returns a pointer to the last solution vector.
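To make the loop above concrete, the following is a minimal driver sketch for an unconstrained demo problem with two objectives and two design variables. It assumes the Nlp++ headers for MooTradeOff and SqpWrapper are included; the demo functions, the argument form of AttachScalarOptimizer and the use of EvalFuncId()/EvalGradId() as member calls are illustrative assumptions, not guaranteed signatures:

static double EvalObj( int iFun, const double * pdX )       // demo objectives
{
    return iFun == 0 ? pdX[0] * pdX[0] + pdX[1] * pdX[1]
                     : ( pdX[0] - 1.0 ) * ( pdX[0] - 1.0 ) + pdX[1] * pdX[1];
}

static double DerivObj( int iFun, int iVar, const double * pdX )  // their gradients
{
    if ( iFun == 0 ) return 2.0 * pdX[iVar];
    return iVar == 0 ? 2.0 * ( pdX[0] - 1.0 ) : 2.0 * pdX[1];
}

int main()
{
    MooTradeOff oMoo;                              // step 1
    SqpWrapper  oSqp;
    oMoo.AttachScalarOptimizer( &oSqp );           // step 2; argument form assumed

    oMoo.Put( "NumObjFuns", 2 );
    oMoo.PutNumDv( 2 );
    oMoo.PutNumIneqConstr( 0 );                    // no constraints in this demo
    oMoo.PutNumEqConstr( 0 );

    double adLo[2] = { -5.0, -5.0 }, adUp[2] = { 5.0, 5.0 }, adX0[2] = { 0.5, 0.5 };
    oMoo.PutLowerBound( adLo );                    // step 3
    oMoo.PutUpperBound( adUp );
    oMoo.PutInitialGuess( adX0 );

    int iStatus;
    do
    {
        oMoo.StartOptimizer();                     // step 4
        oMoo.GetStatus( iStatus );                 // step 5
        if ( iStatus == oMoo.EvalFuncId() )        // step 6b: function values
        {
            int iNumPar = 1;
            oMoo.Get( "NumParallelSys", iNumPar );
            for ( int iSys = 0; iSys < iNumPar; ++iSys )
            {
                const double * pdX = 0;
                oMoo.GetDesignVarVec( pdX, iSys );
                for ( int iFun = 0; iFun < 2; ++iFun )
                    oMoo.PutObjVal( EvalObj( iFun, pdX ), iFun, iSys );
            }
        }
        else if ( iStatus == oMoo.EvalGradId() )   // step 6c: gradient values
        {
            const double * pdX = 0;
            oMoo.GetDesignVarVec( pdX, 0 );
            for ( int iFun = 0; iFun < 2; ++iFun )
                if ( oMoo.IsObjFunInSubProb( iFun ) )
                    for ( int iVar = 0; iVar < 2; ++iVar )
                        oMoo.PutDerivObj( iVar, DerivObj( iFun, iVar, pdX ), iFun );
        }
    } while ( iStatus == oMoo.EvalFuncId() || iStatus == oMoo.EvalGradId() );

    const double * pdSol = 0;                      // step 7: retrieve the result
    oMoo.GetDesignVarVec( pdSol );
    return 0;
}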


15. Method of Distance Functions

15.1. The Algorithm

This method also leads to a scalarization of the problem (MOO). Assuming we have bounds y_i, i = 1, ..., k, usually provided by the user, the goal is to bring the objective functions as close as possible to the y_i-values, i.e. to minimize

f_p(x) = \left( \sum_{i=1}^{k} |f_i(x) - y_i|^p \right)^{1/p}, \quad 1 \leq p \leq \infty,

where p is the order of the method:

p = 1: \quad f_1(x) = \sum_{i=1}^{k} |f_i(x) - y_i| \quad (goal programming)

p = 2: \quad f_2(x) = \left( \sum_{i=1}^{k} |f_i(x) - y_i|^2 \right)^{1/2} \quad (Euclidean norm)
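As a small illustration (not part of the Nlp++ API), the scalarized value f_p can be computed from given objective values and bounds as follows; all names are hypothetical:

#include <cmath>
#include <algorithm>

// Distance-function value f_p for objective values adF[0..k-1] and bounds adY[0..k-1].
// Following the Order convention of 15.2, dOrder <= 0 requests p = infinity.
double DistFuncVal( const double * adF, const double * adY, int k, double dOrder )
{
    if ( dOrder <= 0.0 )                            // p = infinity: max_i |f_i - y_i|
    {
        double dMax = 0.0;
        for ( int i = 0; i < k; ++i )
            dMax = std::max( dMax, std::fabs( adF[i] - adY[i] ) );
        return dMax;
    }
    double dSum = 0.0;
    for ( int i = 0; i < k; ++i )
        dSum += std::pow( std::fabs( adF[i] - adY[i] ), dOrder );
    return std::pow( dSum, 1.0 / dOrder );
}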

15.2. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and thus to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate an MooDistFunc object employing MooDistFunc()

2. Generate a scalar optimizer (e.g. SqpWrapper) object and attach it to the multiple objective optimizer using AttachScalarOptimizer()

a) Put the number of objective functions to the object:
Put( const char * pcParam , const int * piNumObjFuns ) ;
or
Put( const char * pcParam , const int iNumObjFuns ) ;

where pcParam = "NumObjFuns".

b) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

c) Put the values for the parameters you want to adjust:


• Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;

where pcParam is one of the parameters of the scalar optimizer or:

– "Order" Put the order p of the method. Set Order = 0 for p = ∞ (default).

• Put( const char * pcParam , const double * pdValue ) ;
or
Put( const char * pcParam , const double dValue ) ;

where pcParam is one of the parameters of the scalar optimizer or:

– "Bounds" Put the bounds for the objectives. If no bounds are set, the default is to compute the ideals and add BoundFactor times their absolute values (default: BoundFactor = 0.5).

– "BoundFactor" Put the factor used to compute the bounds from the ideals. Only relevant when the parameter Bounds is not set.

• Put( const char * pcParam , const char * pcValue ) ;
or
Put( const char * pcParam , const char cValue ) ;

with pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;
or
Put( const char * pcParam , const bool bValue ) ;

where pcParam is one of the parameters of the scalar optimizer or:

– "ComputeIdeals" Determine whether the ideals shall be computed before the multiple objective optimization starts. Note: If no bounds are set, the ideals will always be computed, regardless of this parameter.

d) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

e) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values and provide initial guess.

a) Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1,
or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );


where pdXu and pdXl must point to arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set. If no bounds are set, they default to -1E30 and +1E30.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess )

where pdInitialGuess must be a double array of length equal to the number of variables, or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.

4. Start MooDistFunc: StartOptimizer()

5. Check status of the object using GetStatus( int & iStatus) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): MooDistFunc needs new function values → 6b Providing new function values. After passing these values to the MooDistFunc object, go to 4.

• if iStatus equals EvalGradId(): MooDistFunc needs new values for gradients → 6c Providing new gradient values. After passing these values to the MooDistFunc object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const

Returns the number of constraints.

GetNumDv( int & iNumDesignVar ) const

Returns the number of design variables.

Get( const char * pcParam, int & iNumObjFuns )

with pcParam == "NumObjFuns" returns the number of objective functions.

Get( const char * pcParam, int & iNumParallelSys )

returns the number of parallel systems when pcParam == "NumParallelSys". If the scalar optimizer is not SqpWrapper, this number will always be 1.

ii. GetDesignVarVec( const double * & pdPointer,

const int iVectorIdx ) const

returns a pointer to the iVectorIdx’th design variable vector.

GetDesignVar( const int iVarIdx , double & pdPointer,

const int iVectorIdx ) const


returns the value of the iVarIdx’th design variable of the iVectorIdx’th design variable vector.

iii. IsObjFunInSubProb( const int iFunIdx ) const

Returns true if the iFunIdx’th objective function (or its gradient) is in the current subproblem and thus has to be evaluated. Returns false if the function (or its gradient) is not in the current subproblem and thus does not have to be evaluated.

b) Providing new function values
Function values for the objectives as well as for the constraints have to be calculated for the design variable vectors provided by MooDistFunc. Note: When Status equals EvalGradId() for the first time, the functions need to be evaluated only for the 0’th parallel system (this is only relevant if SqpWrapper is used as the scalar optimizer). For access to the design variable vectors see 6(a)ii. After calculating the values, these must be passed to the MooDistFunc object using:

• PutConstrValVec( const double * const pdConstrVals,
const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1

where pdConstrVals points to an array containing the values of each constraint at the iParSysIdx’th design variable vector provided by MooDistFunc.

• PutObjVal( const double dObjVal, const int iObjFunIdx,
const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
and iObjFunIdx = 0,...,NumberOfObjectiveFunctions - 1

where dObjVal defines the value of the iObjFunIdx’th objective function at the iParSysIdx’th design variable vector provided by MooDistFunc.

Alternatively you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue,
const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1

c) Providing new gradient values
Gradients must be calculated for all objectives which are used in the current subproblem (see 6(a)iii) and for the constraints at the current design variable vector. For access to the design variable vector see 6(a)ii.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx,
const double dDerivativeValue )
for iVariableIdx = 0,...,NumberOfVariables - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1

where dDerivativeValue is defined as the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th variable.


Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1

where pdGradient is a pointer to the gradient of the iConstrIdx’th constraint.

For the gradients of the objectives you can use

• PutDerivObj( const int iVariableIdx, const double dDerivativeValue,
const int iFunIdx ),
for iVariableIdx = 0,...,NumberOfVariables - 1

where dDerivativeValue is defined as the derivative of the iFunIdx’th objective with respect to the iVariableIdx’th variable at the current design variable vector.

• PutGradObj( const double * pdGradient, const int iFunIdx ),
where pdGradient is a pointer to the gradient of the iFunIdx’th objective at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const

Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const

Returns the value of the iConstrIdx’th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx , const int iVariableIdx,

double & dDerivativeValue ) const

Returns the value of the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th design variable at the last solution vector.

• GetDerivObj( const int iVariableIdx, double & dDerivativeValue,

const int iFunIdx ) const

returns the value of the derivative of the iFunIdx’th objective with respect to the iVariableIdx’th design variable at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const

returns the value of the iFunIdx’th objective function at the last solution vector.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx

) const

with iParSysIdx = 0 (default) returns the value of the iVariableIdx’th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const

with iParSysIdx = 0 (default) returns a pointer to the last solution vector.
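For example, a second-order (Euclidean) run with user-supplied bounds could be configured as in this hypothetical fragment, where oMoo is the MooDistFunc object from step 1 and three objectives are assumed:

double adBounds[3] = { 1.0, 2.0, 0.5 };     // illustrative target values y_i
oMoo.Put( "Order", 2 );                     // p = 2, Euclidean norm
oMoo.Put( "Bounds", adBounds );             // omit to use ideals + BoundFactor
// Alternatively, e.g. oMoo.Put( "BoundFactor", 0.25 ); and let the ideals be computed.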


16. Global Criterion Method and the Min-Max Optimum

16.1. The Algorithm

Here we need the optimal values of the single objectives f_i^*, i = 1, ..., k, and the scalar substitute of (MOO) is

\min_{x \in X} \sum_{i=1}^{k} \left( \frac{f_i(x) - f_i^*}{f_i^*} \right)^p, \quad 1 \leq p \leq \infty,

in the case of the global criterion method and

\min_{x \in X} \max_{i=1,\ldots,k} \omega_i \, \frac{f_i(x) - f_i^*}{f_i^*}, \quad f_i^* \geq 0,

in case of the min-max formulation.

The solution of the former leads to a point which lies in the middle of the functional-efficient boundary, whereas the latter minimizes the relative deviation of the functions from their ideal solutions. Note that the min-max formulation is not continuously differentiable with respect to the design parameter and thus not applicable to standard optimization methods, unless we formulate an equivalent optimization problem which is continuously differentiable, i.e.

\min_{x \in X,\, z} z

with the additional constraints

z - \omega_i \, \frac{f_i(x) - f_i^*}{f_i^*} \geq 0, \quad i = 1, \ldots, k.

One can show that this problem is equivalent to the min-max problem. However, there is one more optimization parameter z and there are k additional constraints.
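The equivalence can be seen as follows: for fixed x, the smallest z satisfying all k constraints is

z(x) = \max_{i=1,\ldots,k} \omega_i \, \frac{f_i(x) - f_i^*}{f_i^*},

so minimizing z over (x, z) is the same as minimizing z(x) over x ∈ X, which is exactly the min-max problem.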

16.2. Program Documentation: MooGlobalCriterion

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and thus to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate an MooGlobalCriterion object employing MooGlobalCriterion()


2. Generate a scalar optimizer (e.g. SqpWrapper) object and attach it to the multiple objective optimizer using AttachScalarOptimizer()

a) Put the number of objective functions to the object:
Put( const char * pcParam , const int * piNumObjFuns ) ;
or
Put( const char * pcParam , const int iNumObjFuns ) ;

where pcParam = "NumObjFuns".

b) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

c) Put the values for the parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;

where pcParam is one of the parameters of the scalar optimizer or:

– "Order" Put the order p of the method. The default is 1.

• Put( const char * pcParam , const double * pdValue ) ;
or
Put( const char * pcParam , const double dValue ) ;

where pcParam is one of the parameters of the scalar optimizer.

• Put( const char * pcParam , const char * pcValue ) ;
or
Put( const char * pcParam , const char cValue ) ;

with pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;
or
Put( const char * pcParam , const bool bValue ) ;

where pcParam is one of the parameters of the scalar optimizer.

d) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

e) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values and provide initial guess.

a) Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1,
or

PutUpperBound( const double * pdXu );


PutLowerBound( const double * pdXl );

where pdXu and pdXl must point to arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set. If no bounds are set, they default to -1E30 and +1E30.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess )

where pdInitialGuess must be a double array of length equal to the number of variables, or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.

4. Start MooGlobalCriterion: StartOptimizer()

5. Check status of the object using GetStatus( int & iStatus) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): MooGlobalCriterion needs new function values → 6b Providing new function values. After passing these values to the MooGlobalCriterion object, go to 4.

• if iStatus equals EvalGradId(): MooGlobalCriterion needs new values for gradients → 6c Providing new gradient values. After passing these values to the MooGlobalCriterion object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const

Returns the number of constraints.

GetNumDv( int & iNumDesignVar ) const

Returns the number of design variables.

Get( const char * pcParam, int & iNumObjFuns )

with pcParam == "NumObjFuns" returns the number of objective functions.

Get( const char * pcParam, int & iNumParallelSys )

returns the number of parallel systems when pcParam == "NumParallelSys". If the scalar optimizer is not SqpWrapper, this number will always be 1.

ii. GetDesignVarVec( const double * & pdPointer,

const int iVectorIdx ) const


returns a pointer to the iVectorIdx’th design variable vector.

GetDesignVar( const int iVarIdx , double & pdPointer,
const int iVectorIdx ) const

returns the value of the iVarIdx’th design variable of the iVectorIdx’th design variable vector.

iii. IsObjFunInSubProb( const int iFunIdx ) const

Returns true if the iFunIdx’th objective function (or its gradient) is in the current subproblem and thus has to be evaluated. Returns false if the function (or its gradient) is not in the current subproblem and thus does not have to be evaluated.

b) Providing new function values
Function values for the objectives as well as for the constraints have to be calculated for the design variable vectors provided by MooGlobalCriterion. Note: When Status equals EvalGradId() for the first time, the functions need to be evaluated only for the 0’th parallel system (this is only relevant if SqpWrapper is used as the scalar optimizer). For access to the design variable vectors see 6(a)ii. After calculating the values, these must be passed to the MooGlobalCriterion object using:

• PutConstrValVec( const double * const pdConstrVals,
const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1

where pdConstrVals points to an array containing the values of each constraint at the iParSysIdx’th design variable vector provided by MooGlobalCriterion.

• PutObjVal( const double dObjVal, const int iObjFunIdx,
const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
and iObjFunIdx = 0,...,NumberOfObjectiveFunctions - 1

where dObjVal defines the value of the iObjFunIdx’th objective function at the iParSysIdx’th design variable vector provided by MooGlobalCriterion.

Alternatively you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue,
const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1

c) Providing new gradient values
Gradients must be calculated for all objectives which are used in the current subproblem (see 6(a)iii) and for the constraints at the current design variable vector. For access to the design variable vector see 6(a)ii.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx,

const double dDerivativeValue )

for iVariableIdx = 0,...,NumberOfVariables - 1

and iConstrIdx = 0,...,NumberOfConstraints - 1


where dDerivativeValue is defined as the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th variable.

Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1

where pdGradient is a pointer to the gradient of the iConstrIdx’th constraint.

For the gradients of the objectives you can use

• PutDerivObj( const int iVariableIdx, const double dDerivativeValue,
const int iFunIdx ),
for iVariableIdx = 0,...,NumberOfVariables - 1

where dDerivativeValue is defined as the derivative of the iFunIdx’th objective with respect to the iVariableIdx’th variable at the current design variable vector.

• PutGradObj( const double * pdGradient, const int iFunIdx ),
where pdGradient is a pointer to the gradient of the iFunIdx’th objective at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const

Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const

Returns the value of the iConstrIdx’th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx , const int iVariableIdx,

double & dDerivativeValue ) const

Returns the value of the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th design variable at the last solution vector.

• GetDerivObj( const int iVariableIdx, double & dDerivativeValue,

const int iFunIdx ) const

returns the value of the derivative of the iFunIdx’th objective with respect to the iVariableIdx’th design variable at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const

returns the value of the iFunIdx’th objective function at the last solution vector.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx

) const

with iParSysIdx = 0 (default) returns the value of the iVariableIdx’th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const

with iParSysIdx = 0 (default) returns a pointer to the last solution vector.
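Where objective evaluations are expensive, IsObjFunInSubProb() can be used to skip objectives that the current subproblem does not need (e.g. while the ideals are computed one objective at a time). A hypothetical fragment for the gradient step, where oMoo, iNumObjFuns, iNumDv, pdX and MyObjDeriv stand for the user’s own context:

for ( int iFun = 0; iFun < iNumObjFuns; ++iFun )
{
    if ( !oMoo.IsObjFunInSubProb( iFun ) )
        continue;                               // not part of this subproblem
    for ( int iVar = 0; iVar < iNumDv; ++iVar )
        oMoo.PutDerivObj( iVar, MyObjDeriv( iFun, iVar, pdX ), iFun );
}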


16.3. Program Documentation: MooMinMaxOpt

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and thus to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate an MooMinMaxOpt object employing MooMinMaxOpt()

2. Generate a scalar optimizer (e.g. SqpWrapper) object and attach it to the multiple objective optimizer using AttachScalarOptimizer()

a) Put the number of objective functions to the object:
Put( const char * pcParam , const int * piNumObjFuns ) ;
or
Put( const char * pcParam , const int iNumObjFuns ) ;

where pcParam = "NumObjFuns".

b) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

c) Put the values for the parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;

where pcParam is one of the parameters of the scalar optimizer.

• Put( const char * pcParam , const double * pdValue ) ;
or
Put( const char * pcParam , const double dValue ) ;

where pcParam is one of the parameters of the scalar optimizer or:

– "Weights" Put the weights for the weighted maximum. pdValue must be a pointer to a double array, of length equal to the number of objective functions, containing the weights.

• Put( const char * pcParam , const char * pcValue ) ;
or
Put( const char * pcParam , const char cValue ) ;

with pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;
or
Put( const char * pcParam , const bool bValue ) ;

where pcParam is one of the parameters of the scalar optimizer.

d) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

e) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;


3. Set boundary values and provide initial guess.

a) Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1,
or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must point to arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set. If no bounds are set, they default to -1E30 and +1E30.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess )

where pdInitialGuess must be a double array of length equal to the number of variables, or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.

4. Start MooMinMaxOpt: StartOptimizer()

5. Check status of the object using GetStatus( int & iStatus) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): MooMinMaxOpt needs new function values → 6b Providing new function values. After passing these values to the MooMinMaxOpt object, go to 4.

• if iStatus equals EvalGradId(): MooMinMaxOpt needs new values for gradients → 6c Providing new gradient values. After passing these values to the MooMinMaxOpt object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const

Returns the number of constraints.

GetNumDv( int & iNumDesignVar ) const

Returns the number of design variables.


Get( const char * pcParam, int & iNumObjFuns )

with pcParam == "NumObjFuns" returns the number of objective functions.

Get( const char * pcParam, int & iNumParallelSys )

returns the number of parallel systems when pcParam == "NumParallelSys". If the scalar optimizer is not SqpWrapper, this number will always be 1.

ii. GetDesignVarVec( const double * & pdPointer,

const int iVectorIdx ) const

returns a pointer to the iVectorIdx’th design variable vector.

GetDesignVar( const int iVarIdx , double & pdPointer,
const int iVectorIdx ) const

returns the value of the iVarIdx’th design variable of the iVectorIdx’th design variable vector.

iii. IsObjFunInSubProb( const int iFunIdx ) const

Returns true if the iFunIdx’th objective function (or its gradient) is in the current subproblem and thus has to be evaluated. Returns false if the function (or its gradient) is not in the current subproblem and thus does not have to be evaluated.

b) Providing new function values
Function values for the objectives as well as for the constraints have to be calculated for the design variable vectors provided by MooMinMaxOpt. Note: When Status equals EvalGradId() for the first time, the functions need to be evaluated only for the 0’th parallel system (this is only relevant if SqpWrapper is used as the scalar optimizer). For access to the design variable vectors see 6(a)ii. After calculating the values, these must be passed to the MooMinMaxOpt object using:

• PutConstrValVec( const double * const pdConstrVals,
const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1

where pdConstrVals points to an array containing the values of each constraint at the iParSysIdx’th design variable vector provided by MooMinMaxOpt.

• PutObjVal( const double dObjVal, const int iObjFunIdx,
const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
and iObjFunIdx = 0,...,NumberOfObjectiveFunctions - 1

where dObjVal defines the value of the iObjFunIdx’th objective function at the iParSysIdx’th design variable vector provided by MooMinMaxOpt.

Alternatively you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue,
const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1


c) Providing new gradient values
Gradients must be calculated for all objectives which are used in the current subproblem (see 6(a)iii) and for the constraints at the current design variable vector. For access to the design variable vector see 6(a)ii.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx,
const double dDerivativeValue )
for iVariableIdx = 0,...,NumberOfVariables - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1

where dDerivativeValue is defined as the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th variable.

Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1

where pdGradient is a pointer to the gradient of the iConstrIdx’th constraint.

For the gradients of the objectives you can use

• PutDerivObj( const int iVariableIdx, const double dDerivativeValue,
const int iFunIdx ),
for iVariableIdx = 0,...,NumberOfVariables - 1

where dDerivativeValue is defined as the derivative of the iFunIdx’th objective with respect to the iVariableIdx’th variable at the current design variable vector.

• PutGradObj( const double * pdGradient, const int iFunIdx ),
where pdGradient is a pointer to the gradient of the iFunIdx’th objective at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const

Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const

Returns the value of the iConstrIdx’th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx , const int iVariableIdx,

double & dDerivativeValue ) const

Returns the value of the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th design variable at the last solution vector.

• GetDerivObj( const int iVariableIdx, double & dDerivativeValue,

const int iFunIdx ) const

returns the value of the derivative of the iFunIdx’th objective with respect to the iVariableIdx’th design variable at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const

returns the value of the iFunIdx’th objective function at the last solution vector.


• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx

) const

with iParSysIdx = 0 (default) returns the value of the iVariableIdx’th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const

with iParSysIdx = 0 (default) returns a pointer to the last solution vector.
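For example, the weights ω_i of the weighted maximum could be set as in this hypothetical fragment, where oMoo is the MooMinMaxOpt object from step 1 and three objectives are assumed:

double adWeights[3] = { 1.0, 2.0, 1.0 };    // emphasize the relative deviation of objective 1
oMoo.Put( "Weights", adWeights );           // pointer variant of Put, see step 2c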


17. Weighted Tchebycheff Method

17.1. The Algorithm

The weighted Tchebycheff method solves the min-max problem

\min_{x \in X} \max_{i=1,\ldots,k} \omega_i \, |f_i(x) - f_i^*|

with the ideal objective vector f^* = (f_1^*, ..., f_k^*)^T and appropriate weights \omega_i, i = 1, ..., k. A particularly good choice would be \omega_i = 1/f_i^*. This problem is nondifferentiable. However, it can be solved in a differentiable form as long as the objective and the constraint functions are differentiable and f^* is known globally. In this case the | · |-operator can be removed and the following differentiable equivalent problem is solved:

\min_{x \in X,\, z} z

with the additional constraints

z - \omega_i (f_i(x) - f_i^*) \geq 0, \quad i = 1, \ldots, k.

17.2. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and thus to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate an MooWeightedTchebycheff object employing MooWeightedTchebycheff()

2. Generate a scalar optimizer (e.g. SqpWrapper) object and attach it to the multiple objective optimizer using AttachScalarOptimizer()

a) Put the number of objective functions to the object:
Put( const char * pcParam , const int * piNumObjFuns ) ;
or
Put( const char * pcParam , const int iNumObjFuns ) ;

where pcParam = "NumObjFuns".

b) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

c) Put the values for the parameters you want to adjust:


• Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;

where pcParam is one of the parameters of the scalar optimizer.

• Put( const char * pcParam , const double * pdValue ) ;
or
Put( const char * pcParam , const double dValue ) ;

where pcParam is one of the parameters of the scalar optimizer or:

– "Weights" Put the weights for the weighted maximum. pdValue must be a pointer to a double array, of length equal to the number of objective functions, containing the weights. If no weights are put, the inverses of the ideals will be taken as weights.

• Put( const char * pcParam , const char * pcValue ) ;
or
Put( const char * pcParam , const char cValue ) ;

with pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;
or
Put( const char * pcParam , const bool bValue ) ;

where pcParam is one of the parameters of the scalar optimizer.

d) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

e) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values and provide initial guess.

a) Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1,
or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must point to arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set. If no bounds are set, they default to -1E30 and +1E30.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess )


where pdInitialGuess must be a double array of length equal to the number of variables, or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.

4. Start MooWeightedTchebycheff: StartOptimizer()

5. Check status of the object using GetStatus( int & iStatus) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): MooWeightedTchebycheff needs new function values → 6b Providing new function values. After passing these values to the MooWeightedTchebycheff object, go to 4.

• if iStatus equals EvalGradId(): MooWeightedTchebycheff needs new values for gradients → 6c Providing new gradient values. After passing these values to the MooWeightedTchebycheff object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const

Returns the number of constraints.

GetNumDv( int & iNumDesignVar ) const

Returns the number of design variables.

Get( const char * pcParam, int & iNumObjFuns )

with pcParam == "NumObjFuns" returns the number of objective functions.

Get( const char * pcParam, int & iNumParallelSys )

returns the number of parallel systems when pcParam == "NumParallelSys". If the scalar optimizer is not SqpWrapper, this number will always be 1.

ii. GetDesignVarVec( const double * & pdPointer,

const int iVectorIdx ) const

returns a pointer to the iVectorIdx’th design variable vector.

GetDesignVar( const int iVarIdx , double & pdPointer,
const int iVectorIdx ) const

returns the value of the iVarIdx’th design variable of the iVectorIdx’th design variable vector.

iii. IsObjFunInSubProb( const int iFunIdx ) const

Returns true if the iFunIdx’th objective function (or its gradient) is in the current subproblem and thus has to be evaluated. Returns false if the function (or its gradient) is not in the current subproblem and thus does not have to be evaluated.


b) Providing new function values
Function values for the objectives as well as for the constraints have to be calculated for the design variable vectors provided by MooWeightedTchebycheff. Note: When Status equals EvalGradId() for the first time, the functions need to be evaluated only for the 0’th parallel system (this is only relevant if SqpWrapper is used as the scalar optimizer). For access to the design variable vectors see 6(a)ii. After calculating the values, these must be passed to the MooWeightedTchebycheff object using:

• PutConstrValVec( const double * const pdConstrVals,
const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1

where pdConstrVals points to an array containing the values of each constraint at the iParSysIdx’th design variable vector provided by MooWeightedTchebycheff.

• PutObjVal( const double dObjVal, const int iObjFunIdx,
const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
and iObjFunIdx = 0,...,NumberOfObjectiveFunctions - 1

where dObjVal defines the value of the iObjFunIdx’th objective function at the iParSysIdx’th design variable vector provided by MooWeightedTchebycheff.

Alternatively you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue,
const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1

c) Providing new gradient values
Gradients must be calculated for all objectives which are used in the current subproblem (see 6(a)iii) and for the constraints at the current design variable vector. For access to the design variable vector see 6(a)ii.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx,
const double dDerivativeValue )
for iVariableIdx = 0,...,NumberOfVariables - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1

where dDerivativeValue is defined as the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th variable.

Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1

where pdGradient is a pointer to the gradient of the iConstrIdx’th constraint.

For the gradients of the objectives you can use

• PutDerivObj( const int iVariableIdx, const double dDerivativeValue,


const int iFunIdx ),
for iVariableIdx = 0,...,NumberOfVariables - 1

where dDerivativeValue is defined as the derivative of the iFunIdx’th objective with respect to the iVariableIdx’th variable at the current design variable vector.

• PutGradObj( const double * pdGradient, const int iFunIdx ),
where pdGradient is a pointer to the gradient of the iFunIdx’th objective at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const

Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const

Returns the value of the iConstrIdx’th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx , const int iVariableIdx,

double & dDerivativeValue ) const

Returns the value of the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th design variable at the last solution vector.

• GetDerivObj( const int iVariableIdx, double & dDerivativeValue,

const int iFunIdx ) const

returns the value of the derivative of the iFunIdx’th objective with respect to the iVariableIdx’th design variable at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const

returns the value of the iFunIdx’th objective function at the last solution vector.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx

) const

with iParSysIdx = 0 (default) returns the value of the iVariableIdx’th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const

with iParSysIdx = 0 (default) returns a pointer to the last solution vector.
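When SqpWrapper with parallel line search is attached, function values are requested for several trial points at once; the evaluation step then loops over the parallel systems, as in this hypothetical fragment (MyObjFun and MyConstrVals stand for the user’s own evaluation routines):

int iNumPar = 1;
oMoo.Get( "NumParallelSys", iNumPar );      // 1 unless SqpWrapper evaluates in parallel
for ( int iSys = 0; iSys < iNumPar; ++iSys )
{
    const double * pdX = 0;
    oMoo.GetDesignVarVec( pdX, iSys );      // iSys’th trial design variable vector
    for ( int iFun = 0; iFun < iNumObjFuns; ++iFun )
        oMoo.PutObjVal( MyObjFun( iFun, pdX ), iFun, iSys );
    oMoo.PutConstrValVec( MyConstrVals( pdX ), iSys );
}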


18. STEP Method

18.1. The Algorithm

The STEP method contains elements similar to the Tchebycheff method described in Chapter 17, but is based on a different idea. STEP is a so-called interactive method for MOO problems. It can be characterized as aspiring to find satisfactory solutions.

It is assumed in STEP that at a certain Pareto optimal objective vector the decision maker can indicate both objective functions that have acceptable values and those whose values are too high. The latter can be said to be unacceptable. The decision maker is now assumed to allow the values of some acceptable objective functions to increase so that the unacceptable functions can attain lower values. In other words, the decision maker must give up a little in the value(s) of some objective functions f_i, i ∈ I_a, in order to improve the values of some other objective functions f_i, i ∈ I_ua, such that I_a ∪ I_ua = {1, ..., k}.

STEP uses the weighted Tchebycheff problem (see Chapter 17) to generate new trial solutions. The ideal objective vector f^* is used as a reference point in the calculations. Information concerning the ranges of the Pareto optimal set is needed to determine the weighting vector for the metric. The idea is to make the scales of all the objective functions similar with the help of the weighting coefficients. The so-called nadir objective vector, which is defined as

f_i^{**} = f_i^* + ∆f_i ,   i = 1, ..., k

where f_i^* is a component of the ideal objective vector and ∆f_i > 0 is a relatively small but computationally significant scalar for objective i, is approximated from the payoff table. The weighting vector is calculated by the formula

ω_i = e_i / ∑_{j=1}^{k} e_j ,   i = 1, ..., k

where for every i = 1, ..., k

e_i = (1/f_i^*) · (f_i^{**} − f_i^*) / f_i^{**}

The weight is larger for those objective functions that are far from their ideal objective vector component.
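To make the computation above concrete, the following small C++ sketch evaluates the nadir approximation and the weighting vector exactly as defined; the helper name StepWeights and its interface are hypothetical, and the formula assumes f_i^* ≠ 0 and f_i^{**} ≠ 0:

#include <vector>

// Hypothetical helper: computes the STEP weights from a given ideal
// vector fIdeal and the offsets deltaF ("DeltaF"), following the
// formulas above. Assumes f_i* != 0 and f_i** != 0.
std::vector<double> StepWeights( const std::vector<double> & fIdeal,
                                 const std::vector<double> & deltaF )
{
    const int k = (int) fIdeal.size() ;
    std::vector<double> e( k ), w( k ) ;
    double dSum = 0.0 ;

    for ( int i = 0 ; i < k ; i++ )
    {
        // nadir approximation: f_i** = f_i* + delta f_i
        double fNadir = fIdeal[ i ] + deltaF[ i ] ;
        // e_i = (1/f_i*) * (f_i** - f_i*) / f_i**
        e[ i ] = ( fNadir - fIdeal[ i ] ) / ( fNadir * fIdeal[ i ] ) ;
        dSum += e[ i ] ;
    }
    for ( int i = 0 ; i < k ; i++ )
    {
        // omega_i = e_i / sum_j e_j
        w[ i ] = e[ i ] / dSum ;
    }
    return w ;
}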


Algorithm 11 STEP

(1) Calculate the ideal and the nadir objective vectors and the weighting constants as given above. Set h = 1. Solve the weighted Tchebycheff problem with the calculated weights:

min_{x∈X} max_{i=1,...,k} ( ω_i |f_i(x) − f_i^*| )        (18.1)

Denote the solution by x^h ∈ X and the corresponding objective vector by f(x^h).

(2) Ask the decision maker to classify the objective functions at f(x^h) into satisfactory (I_a) and unsatisfactory (I_ua) ones. If the latter class is empty, go to step (4). Otherwise, ask the decision maker to specify relaxed upper bounds ε_i for the satisfactory objective functions.

(3) Solve the following weighted Tchebycheff problem

min_{x∈X} max_{i=1,...,k} ( ω_i |f_i(x) − f_i^*| )
s.t.: f_i(x) ≤ ε_i ,  ∀ i ∈ I_a        (18.2)

where the additional upper bounds for the acceptable objectives are taken into account. Denote the solution by x^{h+1} ∈ X and the corresponding objective vector by f(x^{h+1}), set h = h + 1 and go to step (2).

(4) Stop. The final solution is x^h ∈ X.

In the first step the distance between the ideal objective vector and the feasible region is minimized by the weighted Tchebycheff metric (18.1). The solution obtained is presented to the decision maker. Then the decision maker is asked to specify those objective function(s) whose value(s) he/she is willing to relax (i.e. weaken) in order to decrease the values of some other objective functions. The decision maker must also specify the amounts of acceptable relaxation.

The feasible region is restricted according to the information given by the decision maker, and the weights of the relaxed objective functions are set equal to zero, that is, ω_i = 0, i ∈ I_a. Then a new distance minimization problem (18.2) is solved. The new constraint set allows the relaxed (acceptable) objective function values to increase up to the specified level. The procedure continues until the decision maker no longer changes any component of the current objective vector.

18.2. Program Documentation

Besides the implementation described here, NLP++ provides the possibility to derive your problem from a given base class and thereby use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate an MooStepMethod object employing MooStepMethod()

2. Generate a scalar optimizer (e.g. SqpWrapper) object and attach it to the multiple objective optimizer using AttachScalarOptimizer()

a) Put the number of objective functions to the object:
Put( const char * pcParam , const int * piNumObjFuns ) ;


or
Put( const char * pcParam , const int iNumObjFuns ) ;

where pcParam = "NumObjFuns".

b) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

c) Put the values for the parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;

or
Put( const char * pcParam , const int iValue ) ;

where pcParam is one of the parameters of the scalar optimizer.

• Put( const char * pcParam , const double * pdValue ) ;

or
Put( const char * pcParam , const double dValue ) ;

where pcParam is one of the parameters of the scalar optimizer or:

– "DeltaF" Put the distances for computing the nadir objective vector fromthe ideal vector.

• Put( const char * pcParam , const char * pcValue ) ;

or
Put( const char * pcParam , const char cValue ) ;

with pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;

or
Put( const char * pcParam , const bool bValue ) ;

where pcParam is one of the parameters of the scalar optimizer.

d) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

e) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values and provide initial guess.

a) Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1, or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must point to arrays of length equal to the number of design variables.


Before setting bounds, the number of design variables must be set. If no bounds are set, they are -1E30 and +1E30 by default.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess )

where pdInitialGuess must be a double array of length equal to the number of variables, or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1. Before setting an initial guess, the number of design variables must be set.

4. Start MooStepMethod: StartOptimizer()

5. Check the status of the object using GetStatus( int & iStatus ) const. (A compact sketch of the resulting loop over steps 4 to 6 is given at the end of this section.)

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): MooStepMethod needs new function values → 6b Providing new function values. After passing these values to the MooStepMethod object, go to 4.

• if iStatus equals EvalGradId(): MooStepMethod needs new values for gradients → 6c Providing new gradient values. After passing these values to the MooStepMethod object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const
Returns the number of constraints.

GetNumDv( int & iNumDesignVar ) const
Returns the number of design variables.

Get( const char * pcParam, int & iNumObjFuns )
with pcParam == "NumObjFuns" returns the number of objective functions.

Get( const char * pcParam, int & iNumParallelSys )
Returns the number of parallel systems when pcParam == "NumParallelSys". If the scalar optimizer is not SqpWrapper, this number will always be 1.

ii. GetDesignVarVec( const double * & pdPointer, const int iVectorIdx ) const
Returns a pointer to the iVectorIdx’th design variable vector.

GetDesignVar( const int iVarIdx, double & dValue, const int iVectorIdx ) const
Returns the value of the iVarIdx’th design variable of the iVectorIdx’th design variable vector.


iii. IsObjFunInSubProb( const int iFunIdx ) const
Returns true if the iFunIdx’th objective function (or its gradient) is in the current subproblem and thus has to be evaluated. Returns false if the function (or its gradient) is not in the current subproblem and thus does not have to be evaluated.

b) Providing new function values
Function values for the objectives as well as for the constraints have to be calculated for the design variable vectors provided by MooStepMethod. Note: When the status equals EvalGradId() for the first time, the functions need to be evaluated only for the 0’th parallel system (this is only relevant if SqpWrapper is used as scalar optimizer). For access to the design variable vectors see 6(a)ii. After calculating the values, these must be passed to the MooStepMethod object using:

• PutConstrValVec( const double * const pdConstrVals, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
where pdConstrVals points to an array containing the values of each constraint at the iParSysIdx’th design variable vector provided by MooStepMethod.

• PutObjVal( const double dObjVal, const int iObjFunIdx, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1 and iObjFunIdx = 0,...,NumberOfObjectiveFunctions - 1
where dObjVal defines the value of the iObjFunIdx’th objective function at the iParSysIdx’th design variable vector provided by MooStepMethod.

Alternatively you can employ

PutConstrVal( const int iConstrIdx, const double dConstrValue, const int iParSysIdx ),

for iParSysIdx = 0,...,NumberOfParallelSystems - 1 and iConstrIdx = 0,...,NumberOfConstraints - 1

c) Providing new gradient values
Gradients must be calculated for all objectives which are used in the current subproblem (see 6(a)iii) and for the constraints at the current design variable vector. For access to the design variable vector see 6(a)ii.

For the gradients of the constraints you can use

PutDerivConstr( const int iConstrIdx, const int iVariableIdx, const double dDerivativeValue )

for iVariableIdx = 0,...,NumberOfVariables - 1 and iConstrIdx = 0,...,NumberOfConstraints - 1

where dDerivativeValue is defined as the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th variable.

Alternatively, you can use

PutGradConstr( const int iConstrIdx, const double * pdGradient ),

for iConstrIdx = 0,...,NumberOfConstraints - 1


where pdGradient is a pointer to the gradient of the iConstrIdx’th constraint.

For the gradients of the objectives you can use

• PutDerivObj( const int iVariableIdx, const double dDerivativeValue, const int iFunIdx ),
for iVariableIdx = 0,...,NumberOfVariables - 1
where dDerivativeValue is defined as the derivative of the iFunIdx’th objective with respect to the iVariableIdx’th variable at the current design variable vector.

• PutGradObj( const double * pdGradient, const int iFunIdx ),
where pdGradient is a pointer to the gradient of the iFunIdx’th objective at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const
Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const
Returns the value of the iConstrIdx’th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx, const int iVariableIdx, double & dDerivativeValue ) const
Returns the value of the derivative of the iConstrIdx’th constraint with respect to the iVariableIdx’th design variable at the last solution vector.

• GetDerivObj( const int iVariableIdx, double & dDerivativeValue, const int iFunIdx ) const
Returns the value of the derivative of the iFunIdx’th objective with respect to the iVariableIdx’th design variable at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const
Returns the value of the iFunIdx’th objective function at the last solution vector.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx ) const
With iParSysIdx = 0 (default) returns the value of the iVariableIdx’th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const
With iParSysIdx = 0 (default) returns a pointer to the last solution vector.
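The following fragment sketches how steps 4 to 6 interlock when DefaultLoop() is not used. It is only an illustration of the control flow described above; that the status codes SolvedId(), EvalFuncId() and EvalGradId() are member functions of the optimizer object is assumed here from the notation, and the evaluation bodies are left as comments:

MooStepMethod * pMoo = new MooStepMethod ;
pMoo->AttachScalarOptimizer( new SqpWrapper(), true ) ;

// ... problem setup as in steps 2 and 3 ...

int iStatus ;
do
{
    pMoo->StartOptimizer() ;        // step 4
    pMoo->GetStatus( iStatus ) ;    // step 5

    if ( iStatus == pMoo->EvalFuncId() )
    {
        // step 6b: evaluate objectives and constraints at the design
        // variable vectors obtained via GetDesignVarVec and pass them
        // back with PutObjVal / PutConstrValVec
    }
    else if ( iStatus == pMoo->EvalGradId() )
    {
        // step 6c: evaluate the gradients of the subproblem objectives
        // (see IsObjFunInSubProb) and of the constraints and pass them
        // back with PutDerivObj / PutDerivConstr
    }
}
while ( iStatus == pMoo->EvalFuncId() || iStatus == pMoo->EvalGradId() ) ;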


19. Example

As an illustrative example we solve the following problem using DefaultLoop():

Minimize  F_0(x) = x_0 + x_1^2 + x_2 ,
          F_1(x) = x_0^2 + x_1 + x_2 ,
          F_2(x) = x_0 + x_1 + x_2^2 ,
s.t.      G_0(x) = 12 − x_0 − x_1 − x_2 ≥ 0 ,
          G_1(x) = −x_0^2 + 10 x_0 − x_1^2 + 16 x_1 − x_2^2 + 22 x_2 − 80 ≥ 0 ,
          −10^5 ≤ x_i ≤ 10^5 ,  ∀ i = 0, ..., 2

The file Example38.h contains the class definition.

#ifndef EXAMPLE38_H_INCLUDED
#define EXAMPLE38_H_INCLUDED

#include "OptProblem.h"

class Example38 : public OptProblem
{
public:

    Example38();

    int FuncEval( bool bGradApprox = false );
    int GradEval();
    int SolveOptProblem();
};

#endif

The file Example38.cpp contains the implementation:

#include "Example38.h"
#include <iostream>
#include <iomanip>
#include <cmath>
#include "ScpWrapper.h"
#include "SqpWrapper.h"
#include "NlpqlbWrapper.h"
#include "NlpqlgWrapper.h"
#include "CobylaWrapper.h"
#include "MooWeightedObjectives.h"
#include "MooHierarchOptMethod.h"
#include "MooTradeOff.h"
#include "MooDistFunc.h"
#include "MooMinMaxOpt.h"
#include "MooGlobalCriterion.h"
#include "MooWeightedTchebycheff.h"
#include "MooStepMethod.h"

using std::cout;
using std::endl;
using std::cin;
using std::setprecision;

Example38::Example38()
{
    char s;
    cout << endl << "--- this is Example 38 ---" << endl << endl;

    // Choose multiple objective optimizer:
    do
    {
        cout << endl << "Which Optimizer do you want to use? \n\n";
        cout << " [o] MooWeightedObjectives \n";
        cout << " [h] MooHierarchOptMethod \n";
        cout << " [t] MooTradeOff \n";
        cout << " [d] MooDistFunc \n";
        cout << " [p] MooMinMaxOpt \n";
        cout << " [a] MooGlobalCriterion \n";
        cout << " [w] MooWeightedTchebycheff \n";
        cout << " [e] MooStepMethod \n";

        cin >> s;

        if ( s == 'o' )
        {
            m_pOptimizer = new MooWeightedObjectives;
        }
        else if ( s == 't' )
        {
            m_pOptimizer = new MooTradeOff;
        }
        else if ( s == 'd' )
        {
            m_pOptimizer = new MooDistFunc;
        }
        else if ( s == 'h' )
        {
            m_pOptimizer = new MooHierarchOptMethod;
        }
        else if ( s == 'p' )
        {
            m_pOptimizer = new MooMinMaxOpt;
        }
        else if ( s == 'a' )
        {
            m_pOptimizer = new MooGlobalCriterion;
        }
        else if ( s == 'w' )
        {
            m_pOptimizer = new MooWeightedTchebycheff;
        }
        else if ( s == 'e' )
        {
            m_pOptimizer = new MooStepMethod;
        }
        else
        {
            cout << "illegal input!" << endl;
        }
    }
    while ( s != 'o' && s != 't' && s != 'd' && s != 'h' &&
            s != 'p' && s != 'a' && s != 'w' && s != 'e' );

    // Attach a scalar optimizer using AttachScalarOptimizer.
    // The second argument tells the multiple objective optimizer
    // to delete the scalar optimizer after use.
    do
    {
        cout << endl << "Choose a scalar optimizer: \n\n";
        cout << " [q] SqpWrapper \n";
        cout << " [c] ScpWrapper \n";
        cout << " [b] NlpqlbWrapper \n";
        cout << " [g] NlpqlgWrapper \n";
        cout << " [y] CobylaWrapper \n";

        cin >> s;

        if ( s == 'y' )
        {
            m_pOptimizer->AttachScalarOptimizer( new CobylaWrapper(), true );
        }
        else if ( s == 'q' )
        {
            m_pOptimizer->AttachScalarOptimizer( new SqpWrapper(), true );
        }
        else if ( s == 'c' )
        {
            m_pOptimizer->AttachScalarOptimizer( new ScpWrapper(), true );
        }
        else if ( s == 'b' )
        {
            m_pOptimizer->AttachScalarOptimizer( new NlpqlbWrapper(), true );
        }
        else if ( s == 'g' )
        {
            m_pOptimizer->AttachScalarOptimizer( new NlpqlgWrapper(), true );
        }
        else
        {
            cout << "illegal input!" << endl;
        }
    }
    while ( s != 'c' && s != 'q' && s != 'g' && s != 'b' && s != 'y' );
}

int Example38::FuncEval( bool bGradApprox )
{
    const double * dX;
    int            iNumParSys;
    double       * pdFuncVals;
    int            iNumConstr;
    int            iNumEqConstr;
    int            iNumIneqConstr;
    int            iNumActConstr;
    int          * piActive;
    int            iCounter;
    int            iNumObjFuns;
    int            iError;
    bool           bActiveSetStrat;

    piActive      = NULL;
    iNumActConstr = 0;
    iCounter      = 0;

    m_pOptimizer->Get( "NumObjFuns", iNumObjFuns );
    m_pOptimizer->GetNumConstr( iNumConstr );
    m_pOptimizer->GetNumEqConstr( iNumEqConstr );
    m_pOptimizer->GetNumIneqConstr( iNumIneqConstr );

    iError = m_pOptimizer->Get( "ActConstr", piActive );
    // If the optimizer does not support an active set strategy,
    // all inequalities are considered active.
    if ( iError != EXIT_SUCCESS )
    {
        bActiveSetStrat = false;
        piActive = new int[ iNumIneqConstr ];
        for ( int iConstrIdx = 0; iConstrIdx < iNumIneqConstr; iConstrIdx++ )
        {
            piActive[ iConstrIdx ] = 1;
        }
    }
    else
    {
        bActiveSetStrat = true;
    }

    for ( int iConstrIdx = 0; iConstrIdx < iNumIneqConstr; iConstrIdx++ )
    {
        if ( piActive[ iConstrIdx ] != 0 )
        {
            iNumActConstr++;
        }
    }
    // When used for gradient approximation, FuncEval does not
    // evaluate inactive inequality constraints.
    if ( bGradApprox == true )
    {
        pdFuncVals = new double[ iNumEqConstr + iNumObjFuns + iNumActConstr ];
    }
    else
    {
        pdFuncVals = new double[ iNumConstr + iNumObjFuns ];
    }

    // Gradients have to be evaluated only at
    // one design variable vector.
    if ( bGradApprox == true )
    {
        iNumParSys = 1;
    }
    // Functions may have to be evaluated at more than one
    // design variable vector when the scalar optimizer
    // is SqpWrapper.
    else
    {
        m_pOptimizer->Get( "NumParallelSys", iNumParSys );
    }

    // A for-loop simulates the parallel evaluation in
    // iNumParSys parallel systems.
    for ( int iSysIdx = 0; iSysIdx < iNumParSys; iSysIdx++ )
    {
        if ( bGradApprox == false )
        {
            m_pOptimizer->GetDesignVarVec( dX, iSysIdx );
        }
        else
        {
            m_pApprox->GetDesignVarVec( dX );
        }

        // Evaluate the inequality constraints (when approximating
        // gradients only the active ones)

        iCounter = iNumEqConstr;

        // 0th constraint
        if ( bGradApprox == false || piActive[ 0 ] != 0 )
        {
            pdFuncVals[ iCounter ] = 12.0 - dX[ 0 ] - dX[ 1 ] - dX[ 2 ];
            iCounter++;
        }
        // 1st constraint
        if ( bGradApprox == false || piActive[ 1 ] != 0 )
        {
            pdFuncVals[ iCounter ] = - dX[ 0 ] * dX[ 0 ] + 10.0 * dX[ 0 ]
                                     - dX[ 1 ] * dX[ 1 ] + 16.0 * dX[ 1 ]
                                     - dX[ 2 ] * dX[ 2 ] + 22.0 * dX[ 2 ] - 80.0;
            iCounter++;
        }

        // Evaluate the objectives

        // 0th objective
        pdFuncVals[ iCounter ] = dX[ 0 ] + dX[ 1 ] * dX[ 1 ] + dX[ 2 ];
        // 1st objective
        pdFuncVals[ iCounter + 1 ] = dX[ 0 ] * dX[ 0 ] + dX[ 1 ] + dX[ 2 ];
        // 2nd objective
        pdFuncVals[ iCounter + 2 ] = dX[ 0 ] + dX[ 1 ] + dX[ 2 ] * dX[ 2 ];

        // return the values

        if ( bGradApprox == false )
        {
            for ( int iFunIdx = iCounter; iFunIdx < iCounter + iNumObjFuns;
                  iFunIdx++ )
            {
                m_pOptimizer->PutObjVal( pdFuncVals[ iFunIdx ],
                                         iFunIdx - iCounter, iSysIdx );
            }
            m_pOptimizer->PutConstrValVec( pdFuncVals, iSysIdx );
        }
        else
        {
            m_pApprox->PutFuncVals( pdFuncVals );
        }
    }

    delete [] pdFuncVals;
    pdFuncVals = NULL;

    // If there is no active set strategy, piActive has been
    // allocated by this function and thus must be deleted now.
    if ( bActiveSetStrat == false )
    {
        delete [] piActive;
        piActive = NULL;
    }

    return EXIT_SUCCESS;
}

int Example38::GradEval()
{
    /* If we were allowing only CobylaWrapper as a scalar optimizer, this
       function could be implemented as a return command only.
       As CobylaWrapper is a gradient free algorithm this function
       would never be called. All other optimizers in this part
       of the manual offer an active set strategy which is used
       in the following implementation. */
    const double * dX;

    m_pOptimizer->GetDesignVarVec( dX );

    m_pOptimizer->PutDerivObj( 0, 1.0 );
    m_pOptimizer->PutDerivObj( 1, 2.0 * dX[ 1 ] );
    m_pOptimizer->PutDerivObj( 2, 1.0 );

    m_pOptimizer->PutDerivObj( 0, 2.0 * dX[ 0 ], 1 );
    m_pOptimizer->PutDerivObj( 1, 1.0, 1 );
    m_pOptimizer->PutDerivObj( 2, 1.0, 1 );

    m_pOptimizer->PutDerivObj( 0, 1.0, 2 );
    m_pOptimizer->PutDerivObj( 1, 1.0, 2 );
    m_pOptimizer->PutDerivObj( 2, 2.0 * dX[ 2 ], 2 );

    m_pOptimizer->PutDerivConstr( 0, 0, -1.0 );
    m_pOptimizer->PutDerivConstr( 0, 1, -1.0 );
    m_pOptimizer->PutDerivConstr( 0, 2, -1.0 );

    m_pOptimizer->PutDerivConstr( 1, 0, -2.0 * dX[ 0 ] + 10.0 );
    m_pOptimizer->PutDerivConstr( 1, 1, -2.0 * dX[ 1 ] + 16.0 );
    m_pOptimizer->PutDerivConstr( 1, 2, -2.0 * dX[ 2 ] + 22.0 );

    return EXIT_SUCCESS;
}

int Example38::SolveOptProblem()
{
    int iError;

    iError = 0;

    // Parameters can be set using a variable, an address
    // or the value itself:

    int    iMaxNumIter   = 500;
    int    iMaxNumIterLS = 1000;
    double dTermAcc      = 1.0E-6;

    m_pOptimizer->Put( "NumObjFuns", 3 );
    m_pOptimizer->Put( "Order", 2 );
    m_pOptimizer->Put( "ComputeIdeals", true );
    m_pOptimizer->Put( "RhoBegin", 5 );
    m_pOptimizer->Put( "MaxNumLocMin", 10 );
    m_pOptimizer->Put( "MaxNumIter", iMaxNumIter );
    m_pOptimizer->Put( "MaxNumIterLS", & iMaxNumIterLS );
    m_pOptimizer->Put( "TermAcc", & dTermAcc );
    m_pOptimizer->Put( "OutputLevel", 3 );

    // Define the optimization problem:

    m_pOptimizer->PutNumDv( 3 );
    m_pOptimizer->PutNumIneqConstr( 2 );
    m_pOptimizer->PutNumEqConstr( 0 );

    double LB[ 3 ] = { -1e5, -1e5, -1e5 };
    double UB[ 3 ] = {  1e5,  1e5,  1e5 };
    double IG[ 3 ] = {  0.5,  8.0,  2.0 };

    m_pOptimizer->PutUpperBound( UB );
    m_pOptimizer->PutLowerBound( LB );
    m_pOptimizer->PutInitialGuess( IG );

    // Start the optimization
    iError = m_pOptimizer->DefaultLoop( this );

    // if there was an error, report it ...
    if ( iError != EXIT_SUCCESS )
    {
        cout << "Error " << iError << "\n";
        return iError;
    }

    // ... else report the results:

    const double * dX;
    double       * F;

    int iNumDesignVar;
    int iNumConstr;
    int iNumObjFuns;
    int iDvIdx;
    int iFunIdx;

    m_pOptimizer->GetNumDv( iNumDesignVar );
    m_pOptimizer->GetNumConstr( iNumConstr );
    m_pOptimizer->Get( "NumObjFuns", iNumObjFuns );

    F = new double[ iNumObjFuns ];

    m_pOptimizer->GetDesignVarVec( dX );

    for ( iFunIdx = 0; iFunIdx < iNumObjFuns; iFunIdx++ )
    {
        m_pOptimizer->GetObjVal( F[ iFunIdx ], iFunIdx );
    }

    for ( iDvIdx = 0; iDvIdx < iNumDesignVar; iDvIdx++ )
    {
        cout << "dX[" << iDvIdx << "] = " << dX[ iDvIdx ] << endl;
    }

    for ( iFunIdx = 0; iFunIdx < iNumObjFuns; iFunIdx++ )
    {
        cout << "f" << iFunIdx << "[X] = " << setprecision( 20 )
             << F[ iFunIdx ] << endl;
    }

    delete [] F;
    return EXIT_SUCCESS;
}


Part III.

Mixed Integer Optimization


20. MidacoWrapper

MidacoWrapper is a C++ wrapper for MIDACO, a stochastic Gauss approximation algorithm by Schluter for mixed integer nonlinear optimization problems. MidacoWrapper allows continuous, integer and catalogued variables, which can only take values within a given discrete set. Furthermore, the algorithm supports parallel function evaluation at a given number of points.

The algorithm succeeded in finding the best known solutions to three global trajectory optimization problems proposed by the ESA (European Space Agency), which can be found at

http://www.esa.int/gsp/ACT/inf/op/globopt/edvdvdedjds.htm,

http://www.esa.int/gsp/ACT/inf/op/globopt/evevejsa.htm

and

http://www.esa.int/gsp/ACT/inf/op/globopt/MessengerFull.html.

Sections 20.1, 20.2, 20.3 and 20.4 are taken from Schluter et al. [146], where a description of the extended ACO algorithm is given. MIDACO controls several of the ACO algorithms described in the following and contains further heuristics.

More information on MIDACO can be found online at www.midaco-solver.com.

20.1. Introduction

The first optimization algorithms inspired by the foraging behavior of ants were introduced by Marco Dorigo in his PhD thesis (Dorigo [32]). Later, these algorithms were formalized as the Ant Colony Optimization (ACO) metaheuristic (Dorigo and Di Caro [33]). Originally the ACO metaheuristic was considered only for combinatorial optimization problems (e.g. the Travelling Salesman Problem, Stuetzle and Dorigo [152]). In Bonabeau et al. [12] a general overview of ACO and its applications to some scientific and engineering problems is given. An introduction to ACO together with recent trends is given in Blum [8]. Comprehensive information on ACO can be found in Dorigo and Stuetzle [34].

Several extensions of the ACO (Ant Colony Optimization) metaheuristic to continuous search domains can be found in the literature, among them Socha and Dorigo [149], Yu et al. [169], Dreo and Siarry [35] or Kong and Tian [78]. Other applications of ACO frameworks to real-world problems, arising from engineering design applications, can be found in Jayaraman et al. [73], Rajesh et al. [111], Chunfeng and Xin [24] or Zhang et al. [172]. In contrast, extensions to mixed integer search domains are very rare in the literature. In Socha [148] a general extension to continuous and mixed integer domains is discussed.

Although a detailed explanation of an ACO algorithm design for continuous problems together with numerical results is given in this reference, the application to mixed integer domains is only mentioned theoretically. The proposed approach is a combination of a conventional ACO algorithm, which is based on the concept of a pheromone table for discrete domains, with an extension for continuous domains based on the use of pheromone controlled probability density functions. In Serban and Sandou [147] a mixed integer programming method based on the ACO metaheuristic is introduced especially for the unit commitment problem. Again this reference couples a conventional pheromone table based ACO algorithm for discrete variables with an extension for continuous variables.

This paper introduces a conceptually new extension of the ACO metaheuristic for mixed integer search domains. In contrast to a pheromone table, a pheromone guided discretised probability density function will be used for the discrete search domain. This approach allows an intuitive handling of integer variables besides continuous ones within the same algorithmic framework. Whilst the above mentioned extensions of ACO to mixed integer domains can be seen as based on a discrete ACO algorithm with an extension for continuous domains, our approach works the other way round. Based on a continuous ACO methodology we extend the algorithm to discrete variables. This is done by a heuristic defining a lower bound for the standard deviation of the discretized Gaussian probability function, which is assumed here as probability density function.

To apply the ACO metaheuristic to general MINLPs, we had to consider not only mixed integer search domains, but also a good handling of arbitrary constraints. Here we propose a new penalty strategy that reinforces this approach and fits very well into the extended ACO metaheuristic. Our method is based on a user-given oracle information, an estimated preferred objective function value, which is used as the crucial parameter of the penalty function. A detailed investigation of this approach is in preparation and the results seem to be promising. In particular the method seems to be quite robust against 'badly' selected oracles. Furthermore it can be shown analytically that a sufficiently large or a sufficiently low oracle can be used as default parameter.

Our implementation, named ACOmi, is based on the ACO metaheuristic for continuous domains presented by Socha and Dorigo [149] and enlarged by the two novel heuristics mentioned above. As this implementation is a sophisticated one, aiming at the practical use in real-world applications, several other heuristics (which will be briefly described in Section 5) are included, and a mixed integer sequential quadratic programming algorithm is embedded as a local solver in the metaheuristic framework. In addition a set of academic benchmark test problems and a complex MINLP application, a thermal insulation system described in Abramson [1], is solved with this implementation to fortify its practical relevance for real-world applications.

20.2. Approaches for non-convex Mixed Integer Nonlinear Programs

MINLPs are the most general type of single-objective optimization problems. Containing both continuous and integer decision variables, and without any limitation on the complexity of either the objective function or the constraints, these problems can be a real challenge. The presence of nonlinearities in the objective and constraint functions might imply non-convexity of MINLP problems, i.e. the potential existence of multiple local solutions.


Before giving an overview of the existing methodologies for such problems, the mathematical formulation of a MINLP is given in (20.1).

Minimize     f(x, y)            (x ∈ R^{n_con}, y ∈ N^{n_int}, n_con, n_int ∈ N)

subject to:  g_i(x, y) = 0 ,    i = 1, ..., m_eq
             g_i(x, y) ≥ 0 ,    i = m_eq + 1, ..., m
             x_l ≤ x ≤ x_u      (x_l, x_u ∈ R^{n_con})
             y_l ≤ y ≤ y_u      (y_l, y_u ∈ N^{n_int})        (20.1)

In this formulation f(x, y) is the objective function, which has to be minimized, depending on x, the vector of n_con continuous decision variables, and y, the vector of n_int integer decision variables. The functions g_1, ..., g_{m_eq} represent the equality constraints and the functions g_{m_eq+1}, ..., g_m the inequality constraints. The vectors x_l, x_u and y_l, y_u are the lower and upper bounds for the decision variables x and y; these are also called box constraints.

In principle two types of approaches are possible to solve this kind of problem: deterministic and stochastic methods. So-called metaheuristics (Glover and Kochenberger [55]) often belong to the latter. ACO can be classified as a stochastic metaheuristic. Modern algorithms, like the one discussed in this paper, often combine both methodologies in a hybrid manner, consisting of a stochastic framework with deterministic strategies embedded. For example Egea et al. [38], Chelouah and Siarry [23] or Chelouah and Siarry [22] also follow this hybrid framework approach.

Among the deterministic approaches for MINLPs, Branch and Bound techniques, Outer Approximation, Generalized Benders Decomposition or extended Cutting Plane methods are the most common ones. A comprehensive review of these can be found in Grossmann [60]. The big advantage of deterministic approaches is that many of them can guarantee global optimality. On the other hand this guarantee comes with a disadvantage, the possibility of a tremendous computation time depending on the problem structure. In addition, most implementations of these methods require a user given formulation of the mathematical MINLP in an explicit way. In this case the implementation is called a white box solver. In principle any implementation can alternatively gain the required information via approximation by function evaluations from a black box formulation, but this highly increases the computational effort.

In contrast to the white box approach, black box solvers do not require any knowledge of the mathematical formulation of the optimization problem. Of course a mathematical formulation is always essential to implement and tackle an optimization problem, but black box solvers do not assimilate this formulation. This property makes them very flexible with regard to programming languages and problem types, which is very much appreciated by practitioners. All metaheuristics can be seen as black box solvers regarding their fundamental concept. In this paper an extension of the ACO metaheuristic will be introduced to apply this method to general MINLPs. Besides academic benchmark MINLP test problems, one complex engineering application will be considered and optimized with our ACO implementation.


20.3. The Ant Colony Optimization framework

To find food, biological ants start to explore the area around their nest randomly at first. If an ant succeeds in finding a food source, it will return to the nest, laying down a chemical pheromone trail marking its path. This trail will attract other ants to follow it in the hope of finding food again. Over time the pheromones will start to evaporate and therefore reduce the attraction of the path, so only paths that are updated frequently with new pheromones remain attractive. Short paths from the nest to a food source imply short marching times for the ants, so those paths are updated with pheromones more often than long ones. Consequently more and more ants will be attracted by the shorter paths over time. As a final result, a very short path will be discovered by the ant colony.

The basic idea of ACO algorithms is to mimic this biological behavior with artificial ants 'walking' on a graph, which represents a mathematical problem (e.g. the Traveling Salesman Problem). An optimal path in terms of length or some other cost resource is requested in those problems, which belong to the field of combinatorial optimization. By using a parametrized probabilistic model, called the pheromone table, the artificial ants choose a path through a completely connected graph G(C,L), where C is the set of vertices and L is the set of connections. The set C of vertices represents the solution components, which every ant chooses incrementally to create a path. The pheromone values within the pheromone table are used by the ants to make these probabilistic decisions. By updating the pheromone values according to information gained on the search domain, this algorithmic procedure leads to very good and hopefully globally optimal solutions, like the biological counterpart.

Algorithm 12 ACO metaheuristic

while stopping criteria not met do
    pheromone based solution construction
    pheromone update
    daemon actions

end while

The pseudo-code in Algorithm 12 illustrates the fundamental working procedure of the ACO metaheuristic. The stopping criteria and daemon actions are a choice of the algorithm designer. Commonly used stopping criteria are for example a maximal number of constructed solutions or a maximal time budget. Daemon actions might be any activity that cannot be performed by single ants. Local search activities and additional pheromone manipulations are examples of such daemon actions, see Blum [9].

In contrast to the original ACO metaheuristic developed for combinatorial optimization problems, the ACO framework considered in this paper is mainly based on the extended ACO for continuous domains proposed by Socha and Dorigo [149]. The biological visualization of ants choosing their way through a graph-like search domain no longer holds for these problems, as they belong to a completely different class. However, the extension of the original ACO metaheuristic to continuous domains is possible without any major conceptual change, see Socha [148]. In this methodology, ACO works by the incremental construction of solutions regarding a probabilistic choice according to a probability density function (PDF), instead of a pheromone table as in the original ACO. In principle any function P(x) ≥ 0 for all x with the property

∫_{−∞}^{∞} P(x) dx = 1        (20.2)

can act as a PDF. Among the most popular functions to be used as a PDF is the Gaussian function. This function has some clear advantages like an easy implementation (e.g. Box and Müller [16]) and a correspondingly fast sampling time of random numbers. On the other hand, a single Gaussian function is only able to focus on one mean and is therefore not able to describe situations where two or more disjoint areas of the search domain are promising. To overcome this disadvantage while still keeping the benefits of a Gaussian function, a PDF G^i(x) consisting of a weighted sum of several one-dimensional Gaussian functions g_l^i(x) is considered for every dimension i of the original search domain:

G^i(x) = ∑_{l=1}^{k} w_l^i · g_l^i(x) = ∑_{l=1}^{k} w_l^i · 1/(σ_l^i √(2π)) · e^( −(x − µ_l^i)² / (2 (σ_l^i)²) )        (20.3)

This function is characterized by the triplets (w_l^i, σ_l^i, µ_l^i) that are given for every dimension i of the search domain and by the number k of Gaussian kernels used within G^i(x). Within this triplet, w represents the weights of the individual Gaussian functions in the PDF, σ represents the standard deviations, and µ represents the means of the corresponding Gaussian functions. The indices i and l refer, respectively, to the i-th dimension of the decision vector of the MINLP problem and to the l-th kernel number of the individual Gaussian function within the PDF.

As the above triplets fully characterize the PDF and therefore guide the sampled solution candidates throughout the search domain, they are called pheromones in the ACO sense and constitute the biological background of the ACO metaheuristic presented here. Besides the incremental construction of the solution candidates according to the PDF, the update of the pheromones plays a major role in the ACO metaheuristic.

An obviously good choice for updating the pheromones is the use of information which has been gained throughout the search process so far. This can be done by using a solution archive SA in which the most promising solutions found so far are saved. In case of k kernels this can be done by choosing an archive size of k. Thus the SA contains k n-dimensional solution vectors s_l and the corresponding k objective function values (see Socha [148]).

As the focus here is on constrained MINLPs, the solution archive SA also contains the corresponding violation of the constraints and the penalty function value for every solution s_l. In particular, the attraction of a solution s_l saved in the archive is measured by the penalty function value instead of the objective function value. Details on the measurement of the violation and on the penalty function will be described in Section 20.4.

We now explain the update process for the pheromones, which is directly connected to the update process of the solution archive SA. The weights w (which indicate the importance of an ant and therefore rank them) are calculated in linear proportion according to the parameter k:

w_l^i = (k − l + 1) / ∑_{j=1}^{k} j        (20.4)

With this fixed distribution of the weights, a linear order of priority within the solution archive SA is established. Solutions s_l with a low index l are preferred. Hence, s_1 is the current best solution found and therefore most important, while s_k is the solution of lowest interest saved in SA. Updating the solution archive will then directly imply a pheromone update based on the best solutions found so far. Every time a new solution (ant) is created and evaluated within a generation, its attraction (penalty function value) is compared to the attraction of the best solutions saved in SA so far, starting with the very best solution s_1 and ending with the last one, s_k, in the archive. In case the new solution has a better attraction than the j-th one saved in the archive, the new solution will be saved at the j-th position in SA, while all solutions formerly taking the j-th till (k−1)-th position drop down one index in the archive, and the solution formerly taking the last, k-th, position is discarded altogether. Of course, if the new solution has a worse attraction than s_k, the solution archive remains unaffected. As is explained in detail in the following, the solutions saved in SA fully imply the deviations and means used for the PDF and therefore imply the major part of the pheromone triplet (w_l^i, σ_l^i, µ_l^i). In this way, updating the solution archive with better solutions automatically leads to a positive pheromone update. Note that a negative pheromone update (evaporation) is indirectly performed as well by dropping the last solution s_k of SA every time a new solution is introduced into the solution archive. Explicit pheromone evaporation rules are known and can be found for example in Socha and Dorigo [149], but were not considered here due to the implicit negative update and the simplicity of the framework.

Standard deviations σ are calculated by exploiting the variety of solutions saved in SA. For every dimension i the maximal and minimal distance between the single solution components s^i of the k solutions saved in SA is calculated. Then the distance between these two values, divided by the number of generations, defines the standard deviation for every dimension i:

σ_l^i = ( dis_max(i) − dis_min(i) ) / #generation

dis_max(i) = max{ |s_g^i − s_h^i| : g, h ∈ {1, ..., k}, g ≠ h }
dis_min(i) = min{ |s_g^i − s_h^i| : g, h ∈ {1, ..., k}, g ≠ h }        (20.5)

For all k single Gaussian functions within the PDF this deviation is then used for the corresponding dimension i. The means µ are given directly by the single components of the solutions saved in SA:

µ_l^i = s_l^i        (20.6)

The incremental construction of a new ant works the following way: First a mean µ_l^i is chosen for every dimension i. This choice is made with respect to the weights w_l^i. According to the weights defined in (20.4) and the identity of µ_l^i and s_l^i defined in (20.6), the mean µ_1^i has the highest probability of being chosen, while µ_k^i has the lowest. Second, a random number is generated by sampling around the selected mean µ_l^i using the deviation σ_l^i defined by (20.5). Proceeding like this through all dimensions i = 1, ..., n then creates the new ant, which can be evaluated regarding its objective function value and constraint violations in a next step.


So far, this algorithmic framework does not differ from the one proposed by Socha [148], except that here explicit rules for all pheromone parameters (w_l^i, σ_l^i, µ_l^i) are given in (20.4), (20.5) and (20.6), while Socha proposes those only for the deviations and means. Note that the rules for the deviations and means shown here were taken from Socha. The novel extension which enables the algorithm to handle mixed integer search domains modifies the deviations σ_l^i used for the integer variables and is described now in detail.

To handle integer variables besides the continuous ones as described above, a discretization of the continuous random numbers (sampled by the PDF) is necessary. The clear advantage of this approach is the straightforward integration into the same framework described above. A disadvantage is the missing flexibility in cases where for an integer dimension i all k solution components s_{1,...,k}^i in SA share the same value. In this case the corresponding deviations σ_{1,...,k}^i are zero according to the formulation in (20.5). As a consequence, no further progress in these components is possible, as the PDF would only sample the exact mean without any deviation.

Introducing a lower bound for the deviation of integer variables helps to overcome this disadvantage and enables the ACO metaheuristic to handle integer and continuous variables simultaneously in the same framework without any major extension. For a dimension i that corresponds to an integer variable, the deviations σ_l^i are calculated by:

σ_l^i = max{ ( dis_max(i) − dis_min(i) ) / #generation ,  1 / #generation ,  (1 − 1/√(n_int)) / 2 }        (20.7)

With this definition the deviations of the corresponding Gaussian functions for integer variables will never fall below the fixed lower bound (1 − 1/√(n_int))/2, determined by the number of integer variables n_int considered in the MINLP problem formulation. For MINLPs with a large number of integers, this lower bound converges toward 0.5 and therefore ensures a deviation that hopefully keeps the algorithm flexible enough to find its way through the (large) mixed integer search domain. A small number of integer variables leads to a smaller lower bound, whilst for only one integer variable this bound is actually zero. In case of a small number of integer variables in the MINLP it is reasonable to think that at some point of the search progress the optimal combination of integers is found and therefore no further searching with a wide deviation is necessary for the integer variables. But even in case of only one integer variable, with a corresponding absolute lower bound of zero, the middle term 1/#generation in (20.7) ensures a not too fast convergence of the deviation. Therefore, the calculation of the standard deviation σ for integer variables by (20.7) seems to be a reasonable choice and is confirmed by the numerical results.
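To illustrate how (20.4) to (20.7) interact, the following self-contained C++ sketch performs the described sampling step for one new ant. It is not MIDACO's actual implementation; the archive layout, the function name SampleAnt and the rounding rule used for the discretization are assumptions made for illustration only:

#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// Illustrative sketch of the sampling step described above (not the
// MIDACO source). sa[l][i] is the i-th component of the l-th archive
// solution s_l, ordered from best (l = 0) to worst (l = k-1).
std::vector<double> SampleAnt( const std::vector< std::vector<double> > & sa,
                               const std::vector<bool> & bIsInt,
                               int nInt, int generation, std::mt19937 & rng )
{
    const int k = (int) sa.size() ;      // number of kernels = archive size
    const int n = (int) sa[ 0 ].size() ; // dimension of the search space

    // weights w_l = (k - l + 1) / sum_{j=1}^{k} j, cf. (20.4); l is 0-based here
    std::vector<double> w( k ) ;
    for ( int l = 0 ; l < k ; l++ )
    {
        w[ l ] = 2.0 * ( k - l ) / ( (double) k * ( k + 1 ) ) ;
    }
    std::discrete_distribution<int> kernel( w.begin(), w.end() ) ;

    std::vector<double> ant( n ) ;
    for ( int i = 0 ; i < n ; i++ )
    {
        // archive spread, cf. (20.5)
        double dMax = 0.0, dMin = 1.0e30 ;
        for ( int g = 0 ; g < k ; g++ )
            for ( int h = g + 1 ; h < k ; h++ )
            {
                double d = std::fabs( sa[ g ][ i ] - sa[ h ][ i ] ) ;
                dMax = std::max( dMax, d ) ;
                dMin = std::min( dMin, d ) ;
            }
        double sigma = ( dMax - dMin ) / (double) generation ;

        // lower bound for integer dimensions, cf. (20.7)
        if ( bIsInt[ i ] )
        {
            sigma = std::max( sigma, std::max( 1.0 / (double) generation,
                    ( 1.0 - 1.0 / std::sqrt( (double) nInt ) ) / 2.0 ) ) ;
        }
        // numerical guard for degenerate archives (not part of (20.5))
        sigma = std::max( sigma, 1.0e-12 ) ;

        // choose a mean mu_l = s_l^i by the weights, cf. (20.6),
        // and sample around it
        std::normal_distribution<double> gauss( sa[ kernel( rng ) ][ i ], sigma ) ;
        double x = gauss( rng ) ;

        // discretize integer dimensions by rounding to the nearest integer
        ant[ i ] = bIsInt[ i ] ? std::floor( x + 0.5 ) : x ;
    }
    return ant ;
}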

20.4. Penalty method

Penalty methods are well known strategies to handle constraints in optimization problems. By replacing the original objective function by a penalty function, which is a weighted sum (or product) of the original objective function and the constraint violations, the constrained problem can be transformed into an unconstrained one. The penalty function therefore acts as a new objective function, which might also be called a cost function. A wide range of modifications of this method is known, and comprehensive reviews can be found in Coello [25] or Yeniay [168]. In general, simple penalty methods (e.g. death or static, see Yeniay [168]) do not require a lot of problem specific parameters to be selected, which makes their use and implementation very easy and popular. This advantage of simple penalty methods comes with the drawback that for especially challenging problems they are often not capable of gaining sufficient performance. Sophisticated penalty methods (e.g. adaptive or annealing, see Yeniay [168]) are more powerful and adjustable to a specific problem due to a larger number of parameters. However, this greater potential implies an additional optimization task for a specific problem: the sufficiently good selection of a large set of parameters. A high potential penalty method that requires none or only a few parameters to be selected is therefore an interesting approach.

20.4.1. Robust oracle penalty method

We present a general penalty method based on only one parameter, named Ω, which is best selected equal to or just slightly greater than the optimal (feasible) objective function value for a given problem. As this value is unknown a priori for most real-world problems, the user has to guess this parameter Ω at first. After an optimization test run, the quality of the chosen parameter can be directly compared to the actually reached solution value. Due to this predictive nature of the parameter it is called an oracle. Despite this illustrative and clear meaning of the oracle parameter, it is important that the method performs satisfactorily even for bad or unreasonable choices of Ω, in order to apply it to real-world problems. Indeed the method is constructed this way, and numerical results fortify its robustness. A detailed investigation of the development, properties and robustness of the method is currently in preparation and would go beyond the scope here. Therefore only the essential mathematical formulation together with a brief description, a graphical illustration and a general update rule for the oracle parameter will be given here.

The oracle method works with a norm function over all violations of the m constraints of an optimization problem (20.1); this function is called the residual. Commonly used norm functions are the l1, l2 or l∞ norm; here we assume the l1 norm as residual (see (20.8)). To simplify the notation of the robust oracle method, we denote by z := (x, y) the vector of all decision variables without explicit reference to x and y, the vectors of continuous and integer variables in the MINLP formulation (20.1). Such a vector z is called an iterate in the following due to the generality of the method. In case of ACO, z represents an ant.

res(z) = ∑_{i=1}^{m_eq} |g_i(z)| − ∑_{i=m_eq+1}^{m} min{ 0, g_i(z) }        (20.8)

where g_{1,...,m_eq} denote the equality constraints and g_{m_eq+1,...,m} the inequality constraints. The penalty function p(z) is then calculated by:

p(z) =
  α · |f(z) − Ω| + (1 − α) · res(z) − β ,   if f(z) > Ω or res(z) > 0
  −|f(z) − Ω| ,                             if f(z) ≤ Ω and res(z) = 0
        (20.9)

where α is given by:


α =
  ( |f(z) − Ω| · (6√3 − 2)/(6√3) − res(z) ) / ( |f(z) − Ω| − res(z) ) ,   if f(z) > Ω and res(z) < |f(z) − Ω|/3
  1 − 1 / ( 2 · √( |f(z) − Ω| / res(z) ) ) ,                              if f(z) > Ω and |f(z) − Ω|/3 ≤ res(z) ≤ |f(z) − Ω|
  (1/2) · √( |f(z) − Ω| / res(z) ) ,                                      if f(z) > Ω and res(z) > |f(z) − Ω|
  0 ,                                                                     if f(z) ≤ Ω
        (20.10)

and β by:

β =
  [ |f(z) − Ω| · (6√3 − 2)/(6√3) ] / [ 1 + 1/√(#generation) ] · ( 1 − 3 · res(z) / |f(z) − Ω| ) ,   if f(z) > Ω and res(z) < |f(z) − Ω|/3
  0 ,   else
        (20.11)
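Since the case distinctions in (20.9) to (20.11) are easy to get wrong, the following C++ sketch transcribes them directly. It assumes that f(z) and res(z) have already been evaluated and that #generation ≥ 1; it is meant as an illustration, not as the MIDACO source:

#include <cmath>

// Transcription of the oracle penalty (20.9)-(20.11). f and res are the
// objective value f(z) and residual res(z) of an iterate z, Omega is the
// oracle and generation the current generation counter (>= 1).
double OraclePenalty( double f, double res, double Omega, int generation )
{
    const double d = std::fabs( f - Omega ) ;          // |f(z) - Omega|
    const double c = ( 6.0 * std::sqrt( 3.0 ) - 2.0 )
                     / ( 6.0 * std::sqrt( 3.0 ) ) ;    // constant of (20.10)/(20.11)

    if ( f <= Omega && res == 0.0 )
    {
        return -d ;                // second case of (20.9)
    }

    double alpha, beta = 0.0 ;

    if ( f <= Omega )              // infeasible, but f(z) <= Omega
    {
        alpha = 0.0 ;              // (20.9) then reduces to p(z) = res(z)
    }
    else if ( res < d / 3.0 )      // special case, first lines of (20.10)/(20.11)
    {
        alpha = ( d * c - res ) / ( d - res ) ;
        beta  = d * c / ( 1.0 + 1.0 / std::sqrt( (double) generation ) )
                * ( 1.0 - 3.0 * res / d ) ;
    }
    else if ( res <= d )           // weight shifted toward the objective, alpha >= 1/2
    {
        alpha = 1.0 - 1.0 / ( 2.0 * std::sqrt( d / res ) ) ;
    }
    else                           // res > d: weight shifted toward the residual, alpha < 1/2
    {
        alpha = 0.5 * std::sqrt( d / res ) ;
    }

    return alpha * d + ( 1.0 - alpha ) * res - beta ;  // first case of (20.9)
}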

Before a brief motivation of the single components of the penalty method is given, a graphical illustration of the penalty function values p(z) for different objective and residual function values, a given oracle parameter Ω and a fixed number of generations is given below. Figure 20.1 shows the three dimensional shape of the penalty function for the first and the hundredth generation with an oracle parameter equal to zero, for residual function values in the range from zero to ten and for objective function values in the range from minus ten to ten. It is important to note that the shape of the penalty function is not affected at all by the oracle parameter. An oracle parameter lower or greater than zero results only in a shift to the left or right, respectively, on the axis representing the objective function value.


Figure 20.1.: Three dimensional shape of the robust oracle penalty method for Ω = 0 and #generation = 1 (left) and #generation = 100 (right).

Now a brief motivation of the single components of the penalty method is given. The penalty function p(z) defined in (20.9) is split into two cases. The appearance of the penalty function in case f(z) > Ω and res(z) > 0 is similar to common ones, where a parameter α balances the weight between the objective function and the residual, and an additional β term acts as a bias. In case f(z) ≤ Ω and res(z) = 0 the penalty function p(z) is defined as the negative distance between the objective function value f(z) and the oracle parameter Ω. In this case, the resulting penalty values will be zero or negative. This case corresponds to the front left sides (res(z) = 0 and f(z) ≤ Ω) of the 3D shapes shown in Figure 20.1, which are formed as a vertical triangle. In case f(z) ≤ Ω and res(z) > 0, both the α and the β term are zero. According to (20.9) this implies that the penalty value p(z) is equal to the residual value res(z). This case corresponds to the left sides (f(z) ≤ Ω) of the two 3D shapes shown in Figure 20.1, which are formed as a beveled plane.

The two middle terms $1 - \frac{1}{2\sqrt{|f(z)-\Omega| / \mathrm{res}(z)}}$ and $\frac{1}{2}\sqrt{|f(z)-\Omega| / \mathrm{res}(z)}$ in the definition of α are the major components of the oracle penalty method. In case f(z) > Ω they are active in α, depending on the residual value res(z). If res(z) > |f(z) − Ω| the latter one is active and results in a value of α < 1/2, which increases the weight on the residual in the penalty function p(z). Otherwise, if res(z) ≤ |f(z) − Ω|, the first one is active and results in a value of α ≥ 1/2, which increases the weight on the objective function according to (20.9). Therefore these two components enable the penalty method to balance the search process towards either the objective function value or the residual. This balancing finds its representation in the nonlinear shapes of the upper right sides (f(z) > Ω) of the two 3D shapes shown in Figure 20.1.

A special case occurs if f(z) > Ω and res(z) < |f(z)−Ω|/3; in this case the very first term in the definition of α and the β term are active. This is done because otherwise the second term in the definition of α would lead to a worse penalization of iterates with res(z) < |f(z)−Ω|/3 than of those with res(z) = |f(z)−Ω|/3 (a detailed explanation goes beyond the scope here). As this is seen as a negative effect regarding the robustness of the method, because feasible solutions with res(z) = 0 < |f(z)−Ω|/3 would be penalized worse than infeasible ones, the very first term of α is activated in this case. The very first term in α implies an equal penalization p(z) of all iterates sharing the same objective function value f(z) > Ω and a residual between zero and |f(z)−Ω|/3, not taking the β term in definition (20.9) into account. The β term goes one step further and biases the penalization of iterates with a residual smaller than |f(z)−Ω|/3 to be better than that of iterates with res(z) = |f(z)−Ω|/3. Moreover, this bias increases as the algorithm proceeds, since the number of generations enters the β term. The front right sides (f(z) > Ω) of the two 3D shapes shown in Figure 20.1 correspond to this special case; they are formed as a triangle. In the left subfigure of Figure 20.1, which corresponds to a generation number of 1, the biasing effect is not as strong as in the right subfigure, which corresponds to a generation number of 100. Note that this dynamic bias, caused by the β term in definition (20.9), is the only difference between the two 3D shapes shown in Figure 20.1.

20.4.2. Oracle update rule

For most real-world problems the global optimal (feasible) objective function value is unknown a priori. As the oracle parameter Ω is best selected equal to or just slightly greater than the optimal objective function value, finding a sufficiently good oracle and solving the original problem to the global optimum go hand in hand. This self-tuning effect of the method is easy to exploit if several optimization test runs are performed. Each time an optimization test run using a specific oracle has finished, the obtained solution objective and residual function values directly deliver information about an appropriate oracle update.

Numerical tests show that the oracle method is quite robust against overestimated oracles, whereas underestimated oracles tend to deliver feasible points that are close to, but not exactly at, the global optimal objective function value. Based on these results the following update rule was deduced; it is intended to be applied if for a given problem absolutely no information about a possible optimal objective function value is available. The oracle for the very first run should be selected sufficiently low (in Table 20.1 designated as −∞, e.g. −10^12); in this case the robust oracle method follows a dynamic penalty strategy. The hope is to find a feasible point which is already close to the global optimum. If the first run succeeds in finding a feasible solution, the oracle is updated with this solution, and from then on only feasible solutions with lower objective values are used to further update the oracle. In case the first run does not deliver any feasible point, the oracle should be selected sufficiently large (in Table 20.1 designated as ∞, e.g. 10^12). With this parameter the method will at first completely focus on finding a feasible solution (see the flat shape in Figure 20.1 for f(z) < Ω and res(z) > 0) and will then switch to a death penalty strategy (see the spike shape in Figure 20.1 for f(z) < Ω and res(z) = 0) as soon as a feasible solution is found.


Table 20.1.: Update rule for the oracle throughout several optimization test runs

$\Omega^i$ — oracle used for the i-th optimization test run
$f^i$ — objective function value obtained by the i-th test run
$res^i$ — residual function value corresponding to $f^i$

$$\Omega^1 = -\infty \text{ (sufficiently low)} \qquad \text{Initialization with dynamic strategy}$$

$$\Omega^2 = \begin{cases} f^1 & \text{, if } res^1 = 0\\ \infty \text{ (suff. high)} & \text{, else} \end{cases} \qquad \begin{array}{l}\text{Update with solution if } f^1 \text{ is feasible,}\\ \text{static \& death strategy if } f^1 \text{ is infeasible}\end{array}$$

$$\Omega^i = \begin{cases} f^{i-1} & \text{, if } res^{i-1} = 0 \text{ and } f^{i-1} < \Omega^{i-1}\\ \Omega^{i-1} & \text{, else} \end{cases} \qquad \begin{array}{l}\text{Update } \Omega^i \text{ only with feasible } (res^{i-1} = 0) \text{ and}\\ \text{better } (f^{i-1} < \Omega^{i-1} = f^{i-2} \vee \infty) \text{ solutions}\end{array}$$
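A compact sketch of this update loop might look as follows. Here runOnce is a hypothetical stand-in for one complete optimization run with a given oracle, returning the best objective value and residual of that run, and ±1.0e12 serve as the finite stand-ins for ∓∞ suggested above.

```cpp
#include <functional>
#include <limits>

struct RunResult { double f; double res; };   // (f^i, res^i) of one run

// Apply the oracle update rule of Table 20.1 over several independent runs.
double solveWithOracleUpdates(const std::function<RunResult(double)>& runOnce,
                              int numRuns)
{
    double omega = -1.0e12;        // Omega^1 = -infinity: dynamic strategy
    double bestF = std::numeric_limits<double>::infinity();

    for (int i = 1; i <= numRuns; ++i) {
        RunResult r = runOnce(omega);
        if (i == 1) {
            // Omega^2: feasible -> use f^1; infeasible -> static & death strategy
            if (r.res == 0.0) { omega = r.f; bestF = r.f; }
            else              { omega = 1.0e12; }
        } else if (r.res == 0.0 && r.f < omega) {
            // Omega^i: update only with feasible and better solutions
            omega = r.f;
            bestF = r.f;
        }
    }
    return bestF;
}
```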

20.5. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and thus to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate a MidacoWrapper object employing MidacoWrapper()

2. a) Put the number of parallel systems, i.e. the number of points where you want to evaluate the functions simultaneously:
Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;
where pcParam is "NumParallelSys"

b) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

c) Put the number of integer design variables to the object:
Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;
where pcParam is "NumIntDv"

d) Put the number of catalogued design variables to the object:
Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;
where pcParam is "NumCatDv"

e) Put the allowed values for the catalogued design variables:
Variablecatalogue * m_pVariableCatalogue is a pointer to an array of length equal to the number of catalogued variables. Via this pointer each variable can be initialized as explained in 29.1

f) Put the values for the parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;

or
Put( const char * pcParam , const int iValue ) ;

where pcParam is one of the following parameters:

– "MaxNumIter" Put maximum number of iterations.

– "OutputLevel" Put desired output level:1: only summary2: as 1 and intermediate function values after every ”PrintEval” evaluations3: as 2 and current solution vector after every ”PrintEval” evaluations

– "Stop" Communication flag to stop MIDACO by the user. If MIDACO iscalled with ”Stop” = 1, MIDACO returns the best found solution with cor-responding function values. As long as MIDACO should proceed its search,”Stop” must be equal to 0. Usually this parameter does not have to bechanged by the user; Nlp++ takes care of this parameter automatically.

– "QStart" This value indicates the quality of the starting point submitted bythe user at the first call of MIDACO. It must be a (discrete) value ≥ 0 where”QStart” = 0 is the default setting assuming that the starting point is justsome random point without specific quality. A higher value indicates a higherquality of the solution. If ”QStart” ≥ 1 is selected, MIDACO will concen-trate its search around the startingpoint by sampling its initial populationin the area (UpperBound(i)-LowerBound(i))/”QStart” in all dimensions ’i’respectively. Note that activating ”QStart” will *NOT* shrink the searchspace defined by the lower/upper bounds, but only concentrate the initialpopulation around the startingpoint. The ”QStart” option is very useful torefine a previously calculated solution and/or to increase the accuracy of theconstraint violation.

– "PrintEval" This value indicates the frequency of the intermediate output.After every ”PrintEval” evaluations, the current solution is printed to theoutput file.

• Put( const char * pcParam , const double * pdValue ) ;

or
Put( const char * pcParam , const double dValue ) ;

where pcParam is one of the following parameters:

– "TermAcc" Accuracy for the constraint violation. An iterate is assumed fea-sible, if the L-infinity norm over the constraint values is lower or equal to”TermAcc”. Hint: For a first optimization run the ”TermAcc” accuracy

197

Page 204: NLP++ Optimization Toolbox+TheoryManual.pdf · 2013-01-22 · kowski, and van de Braak [17]. The general availability of parallel computers and in particular of distributed computing

20. MidacoWrapper

should be selected not too small. A value of 0.01 to 0.0001 is recommended.If a higher accuracy is demanded, a refinement of the solution regarding theconstraint violations is possible by using the parameter ”QStart”.

– "Seed" This value indicates the random seed used within MIDACO’s internalpseudo-random number generator. It must be a (discrete) value ≥ 0 given indouble precision. For example ”Seed” = 0.0, 1.0, 2.0,... Note that MIDACOruns are 100% reproducable for the same random seed used.

– "QStart" This value indicates the quality of the starting point submitted bythe user at the first call of MIDACO. It must be a (discrete) value ≥ 0 givenin double precision, where ”QStart” = 0.0 is the default setting assuming thatthe starting point is just some random point without specific quality. A highervalue indicates a higher quality of the solution. If ”QStart” ≥ 1.0 is selected,MIDACO will concentrate its search around the startingpoint by samplingits initial population in the area (UpperBound(i)-LowerBound(i))/”QStart”in all dimensions ’i’ respectively. Note that activating ”QStart” will *NOT*shrink the search space defined by the lower/upper bounds, but only concen-trate the initial population around the startingpoint. The ”QStart” option isvery useful to refine a previously calculated solution and/or to increase theaccuracy of the constraint violation.

– "AutoStop" This value enables an automatic stopping criterion within MI-DACO. It must be a (discrete) value ≥ 0 given in double precision. Forexample ”AutoStop” = 1.0, 2.0, 3.0,... If ”AutoStop” is ≥ 1.0 MIDACOwill stop and return its current best solution after ”AutoStop” major itera-tions without significant improvement of the current solution. Hence, a smallvalue for ”AutoStop” will lead to a shorter runtime of MIDACO with a lowerchance of reaching the global optimal solution. A large value of ”AutoStop”will lead to a longer runtime of MIDACO but increases the chances of reach-ing the global optimum. Note that running MIDACO with ”AutoStop” = 0.0with a maximum number of iterations will always provide the highest chanceof global optimality.

– "Omega" This value affects only constrained problems. If ”Omega”=0.0 issubmitted MIDACO will use its inbuild oracle strategy. If ”Omega” is notequal to 0.0, MIDACO will use ”Omega” as initial oracle for its oracle penaltyfunction. In case the user wants to submit the specific ”Omega” = 0.0, a closevalue like 1.0e-16 can be used as dummy. Please review the TheoryManualto receive more information on the oracle penalty method.

– "Ants" This value fixes the number of iterates (ants) used within a gen-eration. If ”Ants”=0.0 is submitted, MIDACO will handle the number ofiterates dynamically by itself. The use of ”Ants” and ”Kernel” is useful foreither very expensive problems or higher dimensional problems (more than100 variables).

– "Kernel" This value fixes the kernel size for each generation. If ”Kernel”=0.0is submitted, MIDACO will handle the kernel sizes dynamically by itself. IF”Kernel” is not equal 0.0, it must be at least 2.0.


– "Character" MIDACO includes three different parameter settings especiallytuned for NLP/MINLP/IP problems. If ”Character” = 0.0D MIDACO willselect its parameter set according to the balance of discrete and continuouscariables. If the user wishes to enable a specific set, he can do so by ”Char-acter”:”Character” = 1.0 enables the internal IP parameter set”Character” = 2.0 enables the internal NLP parameter set”Character” = 3.0 enables the internal MINLP parameter set

• Put( const char * pcParam , const char * pcValue ) ;

orPut( const char * pcParam , const char cValue ) ;

where pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;

orPut( const char * pcParam , const bool bValue ) ;

where pcParam is one of the following parameters:

– "OpenNewOutputFile" Determines whether a new output file has to be opened.If false, the output is appended to an existing file. Usually, this parameterdoes not have to be changed.

g) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

h) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values for continuous and integer variables and provide initial guess.

a) Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumContVariables + NumIntVariables - 1.
or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must point to arrays of length equal to the number of continuous and integer design variables.

Before setting bounds, the number of design variables must be set.
If no bounds are set, these will be -1E30 and +1E30 by default.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess )


where pdInitialGuess must be a double array of length equal to the number of variables or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.

4. Start MidacoWrapper: StartOptimizer()

5. Check status of the object using GetStatus( int & iStatus) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): MidacoWrapper needs new function values → 6b Providing new function values. After passing these values to the MidacoWrapper object go to 4.

6. Providing new function values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const
Returns the number of constraints.
GetNumDv( int & iNumDesignVar ) const
Returns the number of design variables.

ii. GetDesignVarVec( const double * & pdPointer, const int i ) const
with i = 0 (default) returns a pointer to the design variable vector.
GetDesignVar( const int iVarIdx , double & pdPointer, const int iVectorIdx ) const
with iVectorIdx = 0 (default) returns the value of the iVarIdx'th design variable of the design variable vector.

b) Providing new function values
Function values for the objective as well as for the constraints have to be calculated for all optimization vectors provided by Midaco.
Note: When the status is equal EvalFuncId() for the first time, the functions need to be evaluated only for the 0'th parallel system.
For access to the design variable vectors see 6(a)ii. After calculating the values, these must be passed to the MidacoWrapper object using:

• PutConstrValVec( const double * const pdConstrVals, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
where pdConstrVals points to an array containing the values of each constraint at the iParSysIdx'th design variable vector provided by Midaco.

• PutObjVal( const double dObjVal, const int iObjFunIdx, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1


with iObjFunIdx = 0 and dObjVal defining the value of the objective function at the iParSysIdx'th design variable vector provided by Midaco.

Alternatively you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1

7. Output

• GetConstrValVec( const double * & pdPointer ) const
Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const
Returns the value of the iConstrIdx'th constraint function at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const
with iFunIdx = 0 (default) returns the value of the objective function at the last solution vector.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns the value of the iVariableIdx'th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns a pointer to the last solution vector.
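Putting these steps together, a minimal driver loop might look like the following sketch. The two problem functions and all parameter values are illustrative assumptions; only the MidacoWrapper methods documented above are used, the header name in the comment is assumed, and error handling is reduced to the status check of step 5.

```cpp
#include <iostream>
// #include "MidacoWrapper.h"   // header name assumed

// Hypothetical example problem: two variables (the second one integer),
// minimize f(x) = x0^2 + x1 subject to g(x) = x0 + x1 - 1 >= 0.
static double myObj(const double* x)    { return x[0] * x[0] + x[1]; }
static double myConstr(const double* x) { return x[0] + x[1] - 1.0; }

int main()
{
    MidacoWrapper Optimizer;                      // step 1

    Optimizer.Put("NumParallelSys", 1);           // step 2a
    Optimizer.PutNumDv(2);                        // step 2b
    Optimizer.Put("NumIntDv", 1);                 // step 2c
    Optimizer.Put("MaxNumIter", 10000);           // step 2f
    Optimizer.Put("TermAcc", 0.001);
    Optimizer.PutNumIneqConstr(1);                // step 2g
    Optimizer.PutNumEqConstr(0);                  // step 2h

    for (int i = 0; i < 2; ++i) {                 // step 3
        Optimizer.PutLowerBound(i, 0.0);
        Optimizer.PutUpperBound(i, 10.0);
        Optimizer.PutInitialGuess(i, 1.0);
    }

    int iStatus = 0;
    do {
        Optimizer.StartOptimizer();               // step 4
        Optimizer.GetStatus(iStatus);             // step 5
        if (iStatus == Optimizer.EvalFuncId()) {  // step 6: new function values
            const double* pdX = 0;
            Optimizer.GetDesignVarVec(pdX, 0);
            Optimizer.PutObjVal(myObj(pdX), 0, 0);
            Optimizer.PutConstrVal(0, myConstr(pdX), 0);
        }
    } while (iStatus == Optimizer.EvalFuncId());

    double dObj = 0.0;                            // step 7
    Optimizer.GetObjVal(dObj, 0);
    std::cout << "status = " << iStatus << ", f = " << dObj << std::endl;
    return 0;
}
```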


21. MipOptimizer

21.1. Definition of Optimization Problems addressed

Our purpose is to solve nonlinear mixed discrete optimization problems as given in the following mathematical definition:

$$\begin{array}{ll}
\text{Minimize} & f(x, y^R, y^{NR})\\
\text{s.t.} & g_j(x, y^R, y^{NR}) = 0, \quad j = 1, \dots, m_e\\
& g_j(x, y^R, y^{NR}) \ge 0, \quad j = m_e + 1, \dots, m\\
& l_i \le x_i \le u_i, \quad i = 1, \dots, n_C\\
& y^R_i \in S^R_i \subset \mathbb{R}, \quad i = 1, \dots, n_R\\
& y^{NR}_i \in S^{NR}_i \subset \mathbb{R}, \quad i = 1, \dots, n_{NR}
\end{array} \qquad \text{(MIP)}$$

with:

f — objective function, $f : \mathbb{R}^n \to \mathbb{R}$, $n = n_{NR} + n_R + n_C$
$g_j$ — constraint functions, $g_j : \mathbb{R}^n \to \mathbb{R}$, $j = 1, \dots, m$
$y^{NR}$ — vector of nonrelaxable discrete design variables
$y^R$ — vector of relaxable discrete parameters
x — vector of continuous optimization parameters
$l, u \in \mathbb{R}^n$ — vectors containing lower and upper bounds for the continuous design variables

The difference between relaxable and nonrelaxable design variables is that function values $f(x, y^R, y^{NR})$ and $g_j(x, y^R, y^{NR})$ can be calculated at any point $y^R$ with $y^R_i \in [\min\{y \in S^R_i\}, \max\{y \in S^R_i\}]$ for all $i = 1, \dots, n_R$, whereas for nonrelaxable design variables these can only be evaluated at $y^{NR} \in S^{NR} = S^{NR}_1 \times \cdots \times S^{NR}_{n_{NR}}$.

21.2. General Notation for Algorithm

The underlying algorithm MIPSQP solves discrete optimization problems by solving several continuous nonlinear optimization problems (NLPs). These NLPs are derived from the original problem (MIP) by fixing all nonrelaxable parameters to provided values and considering the relaxable design variables as continuous ones. This NLP can be stated as follows.

$$\begin{array}{ll}
\text{Minimize} & f(x, y^R, y^{NR})\\
\text{s.t.} & g_j(x, y^R, y^{NR}) = 0, \quad j = 1, \dots, m_e\\
& g_j(x, y^R, y^{NR}) \ge 0, \quad j = m_e + 1, \dots, m\\
& l_i \le x_i \le u_i, \quad i = 1, \dots, n_C\\
& l^R_i \le y^R_i \le u^R_i, \quad i = 1, \dots, n_R
\end{array} \qquad \text{(P($y^{NR}$))}$$

with $l^R_i = \min\{y \in S^R_i\}$, $u^R_i = \max\{y \in S^R_i\}$, $i = 1, \dots, n_R$.

The second type of nonlinear optimization problem to be solved is generated from (MIP) by fixing all discrete parameters to given values. This NLP can be stated as follows.

$$\begin{array}{ll}
\text{Minimize} & f(x, y^R, y^{NR})\\
\text{s.t.} & g_j(x, y^R, y^{NR}) = 0, \quad j = 1, \dots, m_e\\
& g_j(x, y^R, y^{NR}) \ge 0, \quad j = m_e + 1, \dots, m\\
& l_i \le x_i \le u_i, \quad i = 1, \dots, n_C
\end{array} \qquad \text{(P($y^{NR}$, $y^R$))}$$

By $F(y^{NR})$ and $F(y^R, y^{NR})$ we denote the optimal objective function value of (P($y^{NR}$)) and (P($y^{NR}$, $y^R$)), respectively. If an optimization problem is infeasible, we set $F(y^{NR}) = +\infty$ or $F(y^R, y^{NR}) = +\infty$, respectively.

The termination criterion of the algorithm is described in the following:
Let $(x_{opt}, y^R_{opt}, y^{NR}_{opt}) \in \mathbb{R}^{n_C} \times \mathbb{R}^{n_R} \times \mathbb{R}^{n_{NR}}$ be feasible for (MIP). Then we call $(x_{opt}, y^R_{opt}, y^{NR}_{opt})$ a local optimal point, if

$$F(y^R_{opt}, y^{NR}_{opt}) \le F(y^R, y^{NR}) \quad \forall (y^R, y^{NR}) \in Y$$

where

$$Y := \bigcup_{i=1}^{n_{NR}} \left\{ (y^R_{opt}, y^{NR}) : \begin{array}{l} y^{NR}_j = (y^{NR}_{opt})_j \;\; \forall j \in \{1,\dots,n_{NR}\} \setminus \{i\}, \quad y^{NR}_i = (y^{NR}_{opt})_i + d,\\[0.5ex] d \in \Big\{ -\min\limits_{y \in S^{NR}_i,\; (y^{NR}_{opt})_i - y > 0} \big( (y^{NR}_{opt})_i - y \big),\;\; \min\limits_{y \in S^{NR}_i,\; y - (y^{NR}_{opt})_i > 0} \big( y - (y^{NR}_{opt})_i \big) \Big\} \end{array} \right\}$$

$$\cup\; \bigcup_{i=1}^{n_R} \left\{ (y^R, y^{NR}_{opt}) : \begin{array}{l} y^R_j = (y^R_{opt})_j \;\; \forall j \in \{1,\dots,n_R\} \setminus \{i\}, \quad y^R_i = (y^R_{opt})_i + d,\\[0.5ex] d \in \Big\{ -\min\limits_{y \in S^R_i,\; (y^R_{opt})_i - y > 0} \big( (y^R_{opt})_i - y \big),\;\; \min\limits_{y \in S^R_i,\; y - (y^R_{opt})_i > 0} \big( y - (y^R_{opt})_i \big) \Big\} \end{array} \right\}$$

In other words, if we cannot obtain an improved objective function value by varying a single component of the discrete design vector to its left or right neighbor, we consider the design vector as locally optimal.
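To make the neighbor set Y concrete, the following sketch enumerates all left and right neighbors of a discrete design vector over given value catalogues. The function name and the std::vector representation are illustrative assumptions, not part of the Nlp++ interface; a point is locally optimal in the above sense if solving the corresponding NLP for each of these neighbors yields no improved objective value.

```cpp
#include <vector>
#include <algorithm>

// Enumerate the neighbor set of a discrete design vector y, where
// catalog[i] is the sorted set S_i of allowed values for component i.
std::vector<std::vector<double>> neighborSet(
    const std::vector<double>& y,
    const std::vector<std::vector<double>>& catalog)
{
    std::vector<std::vector<double>> neighbors;
    for (std::size_t i = 0; i < y.size(); ++i) {
        const std::vector<double>& S = catalog[i];
        // position of y[i] in its sorted catalogue S_i
        std::size_t pos = std::lower_bound(S.begin(), S.end(), y[i]) - S.begin();
        if (pos > 0) {                    // left neighbor exists
            std::vector<double> yl = y;
            yl[i] = S[pos - 1];
            neighbors.push_back(yl);
        }
        if (pos + 1 < S.size()) {         // right neighbor exists
            std::vector<double> yr = y;
            yr[i] = S[pos + 1];
            neighbors.push_back(yr);
        }
    }
    return neighbors;
}
```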

For solving nonlinear optimization problems we employ the well-known SQP (Sequential Quadratic Programming) approach.


21.3. MIPSQP: Algorithm for Solving Nonlinear Discrete Optimization Problems

To obtain a solution of (MIP) we split the design variables into nonrelaxable, relaxable and continuous design variables and apply different strategies for the different types of design variable sets.

Relaxable Design Variables: These design variables are first considered as continuous ones. The optimal solution of the continuous relaxation (P($y^{NR}$)) is used to generate different feasible candidate vectors for the relaxable design variables. Here we assume that the optimal solution of (MIP) is in the neighborhood of the solution of (P($y^{NR}$)). The optimality for these design variables is tested by MP-Search. If no descent direction is found, the algorithm stops and our optimality criterion is fulfilled. If none of the generated candidate vectors is feasible, we continue with MP-Search.

Nonrelaxable Design Variables: To retrieve a descent direction for nonrelaxable design variables, we approximate the minimum value function $F(y^{NR}) : \mathbb{R}^{n_{NR}} \to \mathbb{R}$ quadratically: let $H \approx \nabla^2 F(y^{NR})$ and $G \approx \nabla F(y^{NR})$; then we solve the following quadratic optimization problem:

$$\begin{array}{ll}
\text{Minimize} & \frac{1}{2} d^T H d + G^T d\\
\text{s.t.} & l^{NR} - y^{NR} \le d \le u^{NR} - y^{NR}
\end{array}$$

where $(x, y^R)$ is the optimal solution of (P($y^{NR}$)) and $l^{NR}, u^{NR} \in \mathbb{R}^{n_{NR}}$, with $l^{NR}_i = \min\{y \in S^{NR}_i\}$ and $u^{NR}_i = \max\{y \in S^{NR}_i\}$, $i = 1, \dots, n_{NR}$, are the lower and upper bound vectors for the nonrelaxable design variables. The optimal solution $\bar{d}$ of the quadratic problem is taken as the new search direction for the nonrelaxable design variables, i.e., let $\bar{y}^{NR}$ be the vector resulting from rounding each component of $y^{NR} + \bar{d}$ to the next feasible point. Then we solve (P($\bar{y}^{NR}$)) and search a feasible point for the relaxable design variables $\bar{y}^R$. If this point satisfies $F(\bar{y}^R, \bar{y}^{NR}) < F(y^R, y^{NR})$ we have a new iterate with an improved objective value and restart the QP-search. If the QP-search fails, we perform the MP-Search algorithm.

MIPSQP - Algorithm

(I) Generating First Iterate

The generation of the first iterate is very important for solving the problem. We do it by fixing all discrete design variables to user-provided initial values:

Solve (P($y^R_0$, $y^{NR}_0$)), where $y^R_0$, $y^{NR}_0$ are the user-provided initial values for the relaxable and nonrelaxable design variables. Let $\bar{x}$ be the solution of (P($y^R_0$, $y^{NR}_0$)); then the first iterate is given by $(\bar{x}, y^R_0, y^{NR}_0)$ with objective function value $F(y^R_0, y^{NR}_0)$. If (P($y^R_0$, $y^{NR}_0$)) is not feasible, we try to retrieve a feasible point for the nonrelaxable design variables by searching a descent direction of the following objective function:

$$F_P(y^{NR}) = F_R(y^{NR}) + \varphi(x, y^R, y^{NR})$$


where

$$\varphi(x, y^R, y^{NR}) = \sum_{i=1}^{m} \omega_i\, g_i(x, y^R, y^{NR})$$

and

$$F_R(y^{NR}_0) = \min_{y^R,\, x} \left\{ f(x, y^R, y^{NR}) : g(x, y^R, y^{NR}) \le c,\; c \ge 0 \right\}$$

To retrieve a descent direction we approximate $\nabla F_R(y^{NR}_0)$ by finite central differences and search in the opposite direction. The first steplength is calculated by the following rule:

• $A = \sum_{i=1}^{n_{NR}} \left| \nabla F_R(y^{NR}_0)_i \right|$

• $l = \nabla F_R(y^{NR}_0) / A$

• $\tau = 1$

• (1) If $\tau \ge 1.5$, no feasible solution vector is found.

• for $i = 1, \dots, n_{NR}$ do:

– if $l_i < 0$ set $d_i = l_i \cdot \tau \cdot \big( (y^{NR}_0)_i - \min\{y \in S^{NR}_i\} \big)$

– else if $l_i > 0$ set $d_i = l_i \cdot \tau \cdot \big( \max\{y \in S^{NR}_i\} - (y^{NR}_0)_i \big)$

• Let $\bar{y}^{NR}$ be the vector resulting from rounding each component of $y^{NR}_0 + d$ to the next value in $S^{NR}_i$.

• If $\bar{y}^{NR}$ is feasible for (MIP), the first iterate is generated either by rounding the optimal relaxed values for $F(\bar{y}^{NR})$ or by applying more concise methods.

• Else, if $\bar{y}^{NR}$ is not feasible for (MIP), we set

– $\tau = 2 - \tau$, if $\tau > 1$

– $\tau = 2 - \tau - 0.1$, else.

• Goto (1)

(II) Generating Search Directions for Non-Relaxable Design Variables using QP-Search method

Let $(x_k, y^R_k, y^{NR}_k)$ be the current iterate. For k = 0 this is equal to the solution vector generated by a method described above.

If the set of nonrelaxable design variables is empty, we proceed with testing optimality for the relaxable design variables. Otherwise we approximate $F(y^{NR}_k) : \mathbb{R}^{n_{NR}} \to \mathbb{R}$ quadratically, i.e., let $H \approx \nabla^2 F(y^{NR}_k)$ and $G \approx \nabla F(y^{NR}_k)$. Then we solve the quadratic optimization problem

$$d \in \mathbb{R}^{n_{NR}}: \quad \min \; \tfrac{1}{2} d^T H d + G^T d \quad \text{s.t.} \quad l^{NR} - y^{NR}_k \le d \le u^{NR} - y^{NR}_k$$

where $l^{NR}, u^{NR} \in \mathbb{R}^{n_{NR}}$ with $l^{NR}_i = \min\{y \in S^{NR}_i\}$ and $u^{NR}_i = \max\{y \in S^{NR}_i\}$, $i = 1, \dots, n_{NR}$, denote the lower and upper bound vectors for the nonrelaxable design variables. The optimal solution $\bar{d}$ of this quadratic problem is the new search direction for the nonrelaxable parameters.


I.e., let $y^{NR}_{k+1}$ be the vector resulting from rounding each component of $y^{NR}_k + \bar{d}$ to the next feasible point. Then we solve (P($y^{NR}_{k+1}$)) and search the best feasible point for the relaxable design variables $y^R_{k+1}$. If the new point satisfies $F(y^R_{k+1}, y^{NR}_{k+1}) < F(y^R_k, y^{NR}_k)$ we have a new iterate with improved objective function value, set k = k + 1 and restart the QP-Search. If the QP-Search fails, we perform the MP-Search algorithm.

(III) Generating Search Directions for Discrete Design Variables using MP-Search method

Let $(x_k, y^R_k, y^{NR}_k)$ be the best solution for (MIP) found so far and $d = 0_{n_R + n_{NR}}$. Then we vary each component $y_i$ of the design vector $y = (y^R_k, y^{NR}_k)$ to its left neighbor $\bar{y}_i$, where $\bar{y}_i \in S^R_i$ if $i = 1, \dots, n_R$ and $\bar{y}_i \in S^{NR}_{i-n_R}$ if $i = n_R + 1, \dots, n_R + n_{NR}$, i.e., $\tilde{y} = y - e_i y_i + e_i \bar{y}_i$, and solve

$$x \in \mathbb{R}^{n_C}: \quad \min \; f(x, \tilde{y}) \quad \text{s.t.} \quad g(x, \tilde{y}) \ge 0, \quad l_i \le x_i \le u_i, \; i = 1, \dots, n_C$$

If the optimal solution $\bar{x}$ satisfies $f(\bar{x}, \tilde{y}) \le F(y^R_k, y^{NR}_k)$, set

• $(x_{k+1}, y^R_{k+1}, y^{NR}_{k+1}) = (\bar{x}, \tilde{y})$

• $k = k + 1$

• $i = i + 1$

• $d_i = -1$

Else set

• $\tilde{y} = \tilde{y} + e_i y_i - e_i \bar{y}_i$

• $i = i + 1$

and restart until $i > n_R + n_{NR}$.

After completing the search for the left direction, each component $y_i$ of the design vector $y = (y^R_k, y^{NR}_k)$ is varied to its right neighbor $\bar{y}_i$, where $\bar{y}_i \in S^R_i$ if $i = 1, \dots, n_R$ and $\bar{y}_i \in S^{NR}_{i-n_R}$ if $i = n_R + 1, \dots, n_R + n_{NR}$, i.e., $\tilde{y} = y - e_i y_i + e_i \bar{y}_i$.
If $d_i = -1$ set $i = i + 1$ and try the next component. If $i > n_R + n_{NR}$ determine the steplength. Else solve

$$x \in \mathbb{R}^{n_C}: \quad \min \; f(x, \tilde{y}) \quad \text{s.t.} \quad g(x, \tilde{y}) \ge 0, \quad l_i \le x_i \le u_i, \; i = 1, \dots, n_C$$

If the optimal solution $\bar{x}$ satisfies $f(\bar{x}, \tilde{y}) \le F(y^R_k, y^{NR}_k)$, set

• $(x_{k+1}, y^R_{k+1}, y^{NR}_{k+1}) = (\bar{x}, \tilde{y})$

• $k = k + 1$

• $i = i + 1$

• $d_i = +1$


Else set

• $\tilde{y} = \tilde{y} + e_i y_i - e_i \bar{y}_i$

• $i = i + 1$

and try the next component until $i > n_R + n_{NR}$.
If $d = 0_{n_R + n_{NR}}$, optimality has been proved; otherwise we try larger stepsizes, i.e., we define a new discrete design vector using the following rule:

Determination of Steplength

Set

• $\bar{y}_i = \max\{y \in S^R_i\}$, $i = 1, \dots, n_R$

• $\bar{y}_{i+n_R} = \max\{y \in S^{NR}_i\}$, $i = 1, \dots, n_{NR}$

• $\underline{y}_i = \min\{y \in S^R_i\}$, $i = 1, \dots, n_R$

• $\underline{y}_{i+n_R} = \min\{y \in S^{NR}_i\}$, $i = 1, \dots, n_{NR}$

for all $i = 1, \dots, n_R$ do

• If $d_i = -1$ set $(y^R)_i = \underline{y}_i$

• If $d_i = +1$ set $(y^R)_i = \bar{y}_i$

• If $d_i = 0$ set $(y^R)_i = (y^R_k)_i$

for all $i = 1, \dots, n_{NR}$ do

• If $d_{i+n_R} = -1$ set $(y^{NR})_i = \underline{y}_{i+n_R}$

• If $d_{i+n_R} = +1$ set $(y^{NR})_i = \bar{y}_{i+n_R}$

• If $d_{i+n_R} = 0$ set $(y^{NR})_i = (y^{NR}_k)_i$

(*) Solve (P($y^R$, $y^{NR}$)).
If the solution $\bar{x}$ satisfies $f(\bar{x}, y^R, y^{NR}) < f(x_k, y^R_k, y^{NR}_k)$, we have determined an improved iterate $(x_{k+1}, y^R_{k+1}, y^{NR}_{k+1}) = (\bar{x}, y^R, y^{NR})$ and restart MP-Search or QP-Search.
If (P($y^R$, $y^{NR}$)) is not feasible or the solution $\bar{x}$ satisfies $f(\bar{x}, y^R, y^{NR}) \ge f(x_k, y^R_k, y^{NR}_k)$, we set:

• $y^R_i = y \in S^R_i$ with $\left| y - \frac{y^R_i}{2} \right| = \min\limits_{\hat{y} \in S^R_i} \left| \hat{y} - \frac{y^R_i}{2} \right|$

• $y^{NR}_i = y \in S^{NR}_i$ with $\left| y - \frac{y^{NR}_i}{2} \right| = \min\limits_{\hat{y} \in S^{NR}_i} \left| \hat{y} - \frac{y^{NR}_i}{2} \right|$

If $y^R = y^R_k$ and $y^{NR} = y^{NR}_k$, restart MP-Search or QP-Search to retrieve a new search direction. Else go to (*).


21.4. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and thus to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate a MipOptimizer object employing MipOptimizer()

2. Generate a continuous optimizer (SqpWrapper or ScpWrapper) object and attach it to the MipOptimizer using AttachScalarOptimizer()

a) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

b) Put the number of integer design variables to the object:
Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;
where pcParam is "NumIntDv"

c) Put the number of catalogued design variables to the object:
Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;
where pcParam is "NumCatDv"

d) Put the allowed values for the catalogued design variables:
Variablecatalogue * m_pVariableCatalogue is a pointer to an array of length equal to the number of catalogued variables. Via this pointer each variable can be initialized as explained in 29.1

e) Put the values for the parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;

or
Put( const char * pcParam , const int iValue ) ;
where pcParam is one of the parameters of the continuous optimizer or:

– "MaxNumIter" Put maximum number of iterations for the continuous subproblems.

– "OutputLevel" Put the desired output level.
0: no output
1: only final convergence analysis
2: show results of subproblems (default)
>2: add output of the subproblem solvers with OutputLevel(subprob) = OutputLevel − 2

– "FortranOutputLevel" If > 0, the output of the Fortran code is written to an additional output file. This output might give more detailed information about errors.
0: no additional output (default)

– "OutputUnit" Put output unit for the Fortran output.

– "OutUnit" Same as "OutputUnit"

• Put( const char * pcParam , const double * pdValue ) ;
or
Put( const char * pcParam , const double dValue ) ;
where pcParam is one of the parameters of the continuous optimizer.

• Put( const char * pcParam , const char * pcValue ) ;
or
Put( const char * pcParam , const char cValue ) ;
where pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;
or
Put( const char * pcParam , const bool bValue ) ;
where pcParam is one of the parameters of the continuous optimizer.

f) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

g) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values for continuous and integer variables and provide initial guess.

a) Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumContVariables + NumIntVariables - 1.
or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must point to arrays of length equal to the number of continuous and integer design variables.

Before setting bounds, the number of design variables must be set.
If no bounds are set, these will be -1E30 and +1E30 by default.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess )

where pdInitialGuess must be a double array of length equal to the number of variables or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.

4. Start MipOptimizer: StartOptimizer()

5. Check status of the object using GetStatus( int & iStatus) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalGradId(): MipOptimizer needs new values for gradients → 6c Providing new gradient values. After passing these values to the MipOptimizer object go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const
Returns the number of constraints.
GetNumDv( int & iNumDesignVar ) const
Returns the number of design variables.
Get( const char * pcParam, int & iNumParallelSys )
Returns the number of parallel systems when pcParam = "NumParallelSys".

ii. GetDesignVarVec( const double * & pdPointer, const int i ) const
Returns a pointer to the i'th design variable vector (default: i = 0).
GetDesignVar( const int iVarIdx , double & pdPointer, const int iVectorIdx ) const
Returns the value of the iVarIdx'th design variable of the iVectorIdx'th design variable vector.

b) Providing new function values
Function values for the objective as well as for the constraints have to be calculated for all optimization vectors provided by MipOptimizer.
Note: When the status is equal EvalFuncId() for the first time, the functions need to be evaluated only for the 0'th parallel system.
For access to the design variable vectors see 6(a)ii. After calculating the values, these must be passed to the MipOptimizer object using:

• PutConstrValVec( const double * const pdConstrVals, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
where pdConstrVals points to an array containing the values of each constraint at the iParSysIdx'th design variable vector provided by MipOptimizer.

• PutObjVal( const double dObjVal, const int iObjFunIdx, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
with iObjFunIdx = 0 and dObjVal defining the value of the objective function at the iParSysIdx'th design variable vector provided by MipOptimizer.


Alternatively you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1

c) Providing new gradient values
Gradients must be calculated for the objective and the active constraints at the current design variable vector.

Get( const char * pcParam, int * & piActPtr )

with pcParam = "ActConstr" sets piActPtr on an array of length equal to the num-ber of constraints whose entries are 0 for an inactive constraint and 1 for an activeconstraint. Note: piActPtr must be NULL on input.

For access to the design variable vector see 6(a)ii with iParSysIdx = 0.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx, const double dDerivativeValue )
for iVariableIdx = 0,...,NumberOfVariables - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1
where dDerivativeValue is defined as the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th variable.

Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1
where pdGradient is a pointer to the gradient of the iConstrIdx'th constraint.

For the gradient of the objective you can use

• PutDerivObj( const int iVariableIdx, const double dDerivativeValue, const int iFunIdx ),
with iFunIdx = 0 (default)
for iVariableIdx = 0,...,NumberOfVariables - 1
where dDerivativeValue is defined as the derivative of the objective with respect to the iVariableIdx'th variable at the current design variable vector.

• PutGradObj( const double * pdGradient, const int iFunIdx ),
where iFunIdx = 0 (default) and pdGradient is a pointer to the gradient of the objective at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const
Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const
Returns the value of the iConstrIdx'th constraint function at the last solution vector.


• GetDerivConstr( const int iConstrIdx , const int iVariableIdx, double & dDerivativeValue ) const
Returns the value of the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th design variable at the last solution vector.

• GetDerivObj( const int iVariableIdx, double & dDerivativeValue, const int iFunIdx ) const
with iFunIdx = 0 (default) returns the value of the derivative of the objective with respect to the iVariableIdx'th design variable at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const
with iFunIdx = 0 (default) returns the value of the objective function at the last solution vector.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns the value of the iVariableIdx'th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns a pointer to the last solution vector.
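As an illustration of the reverse-communication loop including gradients, a minimal driver for MipOptimizer might look like the sketch below. The problem, its derivatives, and all parameter values are illustrative assumptions; the AttachScalarOptimizer() argument and the header names are assumed, and only the methods documented above are used.

```cpp
#include <iostream>
// #include "MipOptimizer.h"    // header names assumed
// #include "SqpWrapper.h"

// Hypothetical problem: minimize f(x) = (x0 - 1)^2 + x1^2 with integer x1,
// subject to g(x) = x0 + x1 - 1 >= 0.
int main()
{
    MipOptimizer Optimizer;                        // step 1
    SqpWrapper ContOpt;                            // continuous subproblem solver
    Optimizer.AttachScalarOptimizer(&ContOpt);     // step 2 (argument assumed)

    Optimizer.PutNumDv(2);                         // steps 2a/2b
    Optimizer.Put("NumIntDv", 1);
    Optimizer.PutNumIneqConstr(1);                 // steps 2f/2g
    Optimizer.PutNumEqConstr(0);

    for (int i = 0; i < 2; ++i) {                  // step 3
        Optimizer.PutLowerBound(i, -5.0);
        Optimizer.PutUpperBound(i, 5.0);
        Optimizer.PutInitialGuess(i, 0.0);
    }

    int iStatus = 0;
    for (;;) {
        Optimizer.StartOptimizer();                // step 4
        Optimizer.GetStatus(iStatus);              // step 5
        const double* x = 0;
        if (iStatus == Optimizer.EvalFuncId()) {   // step 6b: function values
            Optimizer.GetDesignVarVec(x, 0);
            Optimizer.PutObjVal((x[0] - 1.0) * (x[0] - 1.0) + x[1] * x[1], 0, 0);
            Optimizer.PutConstrVal(0, x[0] + x[1] - 1.0, 0);
        } else if (iStatus == Optimizer.EvalGradId()) { // step 6c: gradients
            Optimizer.GetDesignVarVec(x, 0);
            Optimizer.PutDerivObj(0, 2.0 * (x[0] - 1.0));
            Optimizer.PutDerivObj(1, 2.0 * x[1]);
            Optimizer.PutDerivConstr(0, 0, 1.0);   // dg/dx0
            Optimizer.PutDerivConstr(0, 1, 1.0);   // dg/dx1
        } else {
            break;                                 // solved or error
        }
    }

    double dObj = 0.0;                             // step 7
    Optimizer.GetObjVal(dObj, 0);
    std::cout << "status = " << iStatus << ", f = " << dObj << std::endl;
    return 0;
}
```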

Some of the termination reasons depend on the accuracy used for approximating gradients. If we assume that all functions and gradients are computed within machine precision and that the implementation is correct, there remain only the following possibilities that could cause an error message:

1. The termination parameter TermAcc is too small, so that the numerical algorithm plays around with round-off errors without being able to improve the solution. Especially the Hessian approximation of the Lagrangian function becomes unstable in this case. A straightforward remedy is to restart the optimization cycle with a larger stopping tolerance.

2. The constraints are contradictory, i.e., the set of feasible solutions is empty. There is no way to find out whether a general nonlinear and non-convex set possesses a feasible point or not. Thus, the nonlinear programming algorithm will proceed until running into any of the mentioned error situations. In this case, the correctness of the model must be checked very carefully.

3. Constraints are feasible, but some of them are degenerate, for example if some of the constraints are redundant. One should know that SQP algorithms assume the satisfaction of the so-called constraint qualification, i.e., that gradients of active constraints are linearly independent at each iterate and in a neighborhood of an optimal solution. In this situation, it is recommended to check the formulation of the model constraints.

However, some of the error situations also occur if, because of wrong or inaccurate gradients, the quadratic programming subproblem does not yield a descent direction for the underlying merit function. In this case, one should try to improve the accuracy of the function evaluations, scale the model functions in a proper way, or start the algorithm from other initial values.


22. MisqpWrapper - A Mixed Integer SQP Extension

MisqpWrapper is a C++ wrapper for the Fortran subroutine MISQP. MISQP is an implementation of a sequential quadratic programming method that addresses nonlinear optimization problems. The algorithm applies trust region techniques instead of the line search approach. MISQP allows continuous, binary, integer and catalogued variables, which can only take values within a given discrete set. Thus, MISQP also solves mixed-integer nonlinear programming problems. To simplify notation, only the mixed-integer problem formulation is considered in the subsequent description of the method. Moreover, all types of variables except the continuous ones are denoted as integer variables from now on. All integer variables are assumed to be non-relaxable; therefore, the variables are not evaluated at fractional values. Partial derivatives with respect to integer variables are approximated internally. Alternatively, these partial derivatives can be provided by the user. Numerical results are included which show the behavior of the code with respect to different parameter settings. The sections 22.1, 22.2 and 22.3 are taken from Exler et al. [40].

22.1. Introduction

We consider the general optimization problem to minimize an objective function f under nonlinear equality and inequality constraints,

$$\begin{array}{lll}
\min\limits_{x \in \mathbb{R}^{n_c},\; y \in \mathbb{Z}^{n_i}} & f(x, y) &\\
\text{s.t.} & g_j(x, y) = 0, & j = 1, \dots, m_e\\
& g_j(x, y) \ge 0, & j = m_e + 1, \dots, m\\
& x_l \le x \le x_u &\\
& y_l \le y \le y_u &
\end{array} \qquad (22.1)$$

where x and y denote the vectors of the continuous and integer variables, respectively. The vector y of length $n_i$ contains binary, integer and catalogued variables, in this order. Note that the binary variables can only take the values 0 and 1. It is assumed that the problem functions f(x, y) and $g_j(x, y)$, j = 1, ..., m, are continuously differentiable subject to all $x \in \mathbb{R}^{n_c}$, where $n_c$ denotes the number of continuous variables.

Trust region methods were invented many years ago, first for unconstrained optimization, especially for least squares optimization, see for example Powell [107] or More [95]. Extensions were developed for non-smooth optimization, see Fletcher [43], and for constrained optimization, see Celis [21], Powell and Yuan [110], Byrd et al. [20], Toint [158] and many others. A comprehensive review on trust region methods is given by Conn, Gould, and Toint [26].

On the other hand, sequential quadratic programming or SQP methods belong to the most frequently used algorithms to solve practical optimization problems. The theoretical background is described e.g. in Stoer [151], Fletcher [44], or Gill et al. [54].

However, the situation becomes much more complex if additional integer variables must be taken into account. Numerous algorithms have been proposed in the past, see for example Floudas [48] or Grossmann and Kravanja [61] for review papers. Typically, these approaches require convex model functions and continuous relaxations of integer variables. By a continuous relaxation, we understand that integer variables can be treated as continuous variables, i.e., function values can also be computed between successive integer points. There are branch-and-bound methods where a series of relaxed nonlinear programs obtained by restricting the variable range of the relaxed integer variables must be solved, see Gupta and Ravindran [62] or Borchers and Mitchell [15]. When applying an SQP algorithm for solving a subproblem, it is possible to apply early branching, see also Leyffer [83]. Pattern search algorithms are available to search the integer space, see Audet and Dennis [3]. After replacing the integrality condition by continuous nonlinear constraints, it is possible to solve the resulting highly nonconvex program by a global optimization algorithm, see e.g. Li and Chou [84]. Outer approximation methods are investigated by Duran and Grossmann [36] and Fletcher and Leyffer [45], where a sequence of alternating mixed-integer linear and nonlinear programs must be solved. Alternatively, it is also possible to apply cutting planes similar to linear programming, see Westerlund and Porn [165].

But fundamental assumptions are often violated in practice. Many real-life mixed-integer problems are not relaxable, and model functions are highly nonlinear and nonconvex. Moreover, some approaches require detection of infeasibility of nonlinear programs, a highly unattractive feature from the computational point of view. We assume now that integer variables are not relaxable, that there are relatively large ranges for integer values, and that the integer variables possess some kind of physical meaning, i.e., are smooth in a certain sense. It is supposed that a slight alteration of an integer value, say by one, changes the model functions only slightly, at least much less than a more drastic change. Typically, relaxable programs satisfy this requirement. In contrast to them, there are categorical variables which are introduced to enumerate certain situations and where any change leads to a completely different category and thus to a completely different response.

A practical situation is considered by Bunner and Schittkowski [17], where typical integer variables are the number of fingers and the number of layers of an electrical filter. By increasing or decreasing the number of fingers by one, we expect only a small alteration in the total response, the transmission energy. The more drastic the variation of the integer variable, the more drastically the model function values are supposed to change.

Thus, we propose an alternative idea in Section 22.2, where we try to approximate the Lagrangian subject to the continuous and integer variables by a quadratic function based on a quasi-Newton update formula. Instead of a line search, as is often applied in the continuous case, we use trust regions to stabilize the algorithm and to enforce convergence, following the continuous trust region method of Yuan [171] with second order corrections. The specific form of the quadratic programming subproblem avoids difficulties with inconsistent linearized constraints and leads to a convex mixed-integer quadratic programming problem, which can be solved by any available algorithm, for example, a branch-and-cut method. Algorithmic details and numerical results are reported in Exler et al. [39].

22.2. The Mixed-Integer Trust Region SQP Method

Basically, integer variables lead to extremely difficult optimization problems. Even if we assume that the integer variables are relaxable and that the resulting continuous problem is strictly convex, it is possible that the integer solution is not unique. There is no numerically applicable criterion to decide whether we are close to an optimal solution, nor can we retrieve any information about the position of the optimal integer solution from the continuous solution of the corresponding relaxed problem.

The basic idea of a trust region method is to compute a new iterate by a second order model or a close approximation, see Exler and Schittkowski [41] or Exler et al. [39] for more details. The stepsize is restricted by a trust region radius $\Delta_k$, where k denotes the k-th iteration step. Subsequently, a ratio $r_k$ of the actual and the predicted improvement subject to a certain merit function is computed. The trust region radius is either enlarged or decreased depending on the deviation of $r_k$ from the ideal value $r_k = 1$. If sufficiently close to a solution, the artificial bound $\Delta_k$ should not become active, so that the new trial step proposed by the second order model can always be accepted.

For the continuous case, the superlinear convergence rate can be proved, see Fletcher [43], Burke [18], or Yuan [170]. The individual steps depend on a large number of constants, which are carefully selected based on numerical experience.

The goal is to apply a trust region SQP algorithm and to adapt it to the mixed-integer case with as few alterations as possible. Due to the integer variables, the quadratic programming subproblems that have to be solved are mixed-integer quadratic programming problems and can be solved by any available algorithm. Since the generated subproblems are always convex, we apply the branch-and-cut code MIQL of Lehmann et al. [79]. The mixed-integer quadratic programming problems to be solved are of the form

$$\begin{array}{ll}
\min\limits_{d \in \mathbb{R}^{n_c} \times \mathbb{Z}^{n_i},\; \mu \in \mathbb{R}} & \frac{1}{2} d^T B_k d + \nabla f(x_k, y_k)^T d + \sigma_k \mu\\[1ex]
\text{s.t.} & g_j(x_k, y_k) + \nabla g_j(x_k, y_k)^T d \ge -\mu, \quad j = 1, \dots, m\\
& -g_j(x_k, y_k) - \nabla g_j(x_k, y_k)^T d \ge -\mu, \quad j = 1, \dots, m_e\\
& \max(x_{l,j} - x^{(k)}_j, -\Delta^c_k) \le d^c_j \le \min(x_{u,j} - x^{(k)}_j, \Delta^c_k), \quad j = 1, \dots, n_c\\
& \max(y_{l,j} - y^{(k)}_j, -\Delta^i_k) \le d^i_j \le \min(y_{u,j} - y^{(k)}_j, \Delta^i_k), \quad j = 1, \dots, n_i\\
& \mu \ge 0
\end{array} \qquad (22.2)$$

where $d := (d^c, d^i)^T$ contains a continuous step $d^c$ and an integer step $d^i$. The solution is denoted by $d_k$ and $\mu_k$.
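The box constraints of (22.2) combine the original variable bounds with the trust region radius. A minimal sketch of this per-variable combination, with illustrative names only, is:

```cpp
#include <algorithm>
#include <vector>

// Assemble the box constraints of subproblem (22.2) for the step d,
// i.e. max(xl_j - x_j, -Delta) <= d_j <= min(xu_j - x_j, Delta).
void stepBounds(const std::vector<double>& x,
                const std::vector<double>& xl,
                const std::vector<double>& xu,
                double delta,
                std::vector<double>& dl, std::vector<double>& du)
{
    dl.resize(x.size());
    du.resize(x.size());
    for (std::size_t j = 0; j < x.size(); ++j) {
        dl[j] = std::max(xl[j] - x[j], -delta);   // trust region cut from below
        du[j] = std::min(xu[j] - x[j],  delta);   // trust region cut from above
    }
}
```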

Since we do not assume that (22.1) is relaxable, i.e., that f and $g_1, \dots, g_m$ can be evaluated at any fractional parts of the integer variables, we approximate the first derivatives of f(x, y) by the difference formula

$$d_{y_j} f(x, y) \;\approx\; \frac{1}{2} \Big( f(x, y_1, \dots, y_j + 1, \dots, y_{n_i}) - f(x, y_1, \dots, y_j - 1, \dots, y_{n_i}) \Big) \qquad (22.3)$$

for $j = 1, \dots, n_i$, at neighboring grid points, see Exler et al. [39] for the detailed procedure. If either $y_j + 1$ or $y_j - 1$ violates a bound, we apply a non-symmetric difference formula. Similarly, $d_{y_j} g_j(x, y)$ denotes a difference formula for the first derivatives of $g_j(x, y)$ computed at neighboring grid points.
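A minimal sketch of this approximation for a single integer variable is given below; the continuous variables are assumed fixed inside f, and all names are illustrative rather than part of the MISQP interface.

```cpp
#include <vector>
#include <functional>

// Integer central difference (22.3) for variable j, falling back to a
// one-sided formula when y_j + 1 or y_j - 1 would violate a bound.
double intDiff(const std::function<double(const std::vector<int>&)>& f,
               std::vector<int> y, std::size_t j, int yl, int yu)
{
    if (y[j] + 1 <= yu && y[j] - 1 >= yl) {   // central difference (22.3)
        y[j] += 1; double fp = f(y);
        y[j] -= 2; double fm = f(y);
        return 0.5 * (fp - fm);
    }
    double f0 = f(y);
    if (y[j] + 1 <= yu) {                     // forward difference at lower bound
        y[j] += 1; return f(y) - f0;
    }
    y[j] -= 1;                                // backward difference at upper bound
    return f0 - f(y);
}
```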

The adaption of the continuous trust region radius must be modified, since a trust region radius smaller than one does not allow any further change of integer variables. Therefore, two different radii are defined, $\Delta^c_k$ for the continuous and $\Delta^i_k$ for the integer variables. We prevent $\Delta^i_k$ from falling below one by $\Delta^i_{k+1} := \max(1, 2\Delta^i_k)$. Note, however, that the algorithm allows a decrease of $\Delta^i_k$ below one. In this situation, integer variables are fixed and a new step is made only subject to the continuous variables. As soon as a new iterate is accepted, we set $\Delta^i_k$ to one in order to be able to continue optimization over the whole range of variables.

22.3. Performance Evaluation

We are interested in comparing several parameter settings to analyze their influence on the performance. We select 142 test problems out of 186 problems published in Schittkowski [144], which have not more than 20 integer or binary variables and not more than 400 continuous variables, to prevent extreme outliers. All test examples are provided with the best objective function value $f^\star$ we know, either obtained from analytical solutions, literature, or extensive numerical testing, see Schittkowski [144] for details.

Note again that our main paradigm is to proceed from non-relaxable integer variables and descent directions, which can be considered as very crude numerical approximations of partial derivatives by a difference formula. To be as close as possible to complex practical engineering applications, derivatives with respect to continuous variables are approximated by a two-sided difference formula, unless stated otherwise. Integer derivatives are replaced by descent directions, see Exler et al. [39] for the detailed procedure. For binary variables or for variables at a bound, a forward or backward difference formula is applied, respectively.

First we need a criterion to decide whether the result of a test run can be considered as a successful return or not. Let $\varepsilon_t > 0$ be a tolerance defining the relative accuracy, and $(x_k, y_k)$ the final iterate of a test run. If $f^\star = 0$, as in some of the academic test instances, we add the value one to the objective function. We call $(x_k, y_k)$ a successful solution if the relative error in the objective function is less than $\varepsilon_t$ and if the maximum constraint violation is less than $\varepsilon_t^2$, i.e., if

$$f(x_k, y_k) - f^\star < \varepsilon_t |f^\star| \qquad (22.4)$$

and $\|g(x_k, y_k)^-\|_\infty < \varepsilon_t^2$, where $g(x_k, y_k)^-$ represents the vector of constraint violations. We take into account that a code may return a solution with a better function value than the best known one, subject to the error tolerance of the allowed constraint violation. The tolerance for measuring the constraint violation is smaller than for the error in the objective function, since we apply a relative measure in the latter case.

Optimization algorithms for continuous optimization have the advantage that in each step,local optimality criteria, the so-called KKT-conditions, can be checked. The algorithm is stoppedas soon as an iterate is sufficiently feasible and satisfies these stationary conditions subject to auser-provided tolerance.

In mixed-integer optimization, however, we do not know how to distinguish between a localor global solution nor do we have any local criterion to find out whether we are sufficiently closeto a solution or not. Thus, our algorithm stops if constraints are satisfied subject to a tolerance,and if no further progress is possible towards neighboring grid points as measured by a meritfunction. Thus, we distinguish between feasible, but non-global solutions, successful solutionsas defined above, and false terminations. We call the return of a test run, say xk and yk, anacceptable solution, if the internal termination conditions are satisfied subject to a reasonablysmall tolerance ε = 10−6 and if instead of (22.4)

f(x_k, y_k) − f* ≥ ε_t |f*|        (22.5)


holds. For our numerical tests, we use ε_t = 0.01. Note that our approach is based on a local search method, as in the case of continuous optimization, and that a global search is never started.
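For illustration, the decision rules (22.4) and (22.5) condense into a few lines of code. The following is a minimal sketch with illustrative names that are not part of the NLP++ API:

#include <cmath>

// Minimal sketch of the success test (22.4)/(22.5); illustrative only,
// not part of the NLP++ API. dObj is f(x_k, y_k), dBestObj the best
// known value f*, and dMaxViol the maximum constraint violation
// ||g(x_k, y_k)^-||_inf of the final iterate.
enum RunResult { SUCCESSFUL, ACCEPTABLE, FAILED };

RunResult ClassifyRun( double dObj, double dBestObj, double dMaxViol,
                       bool bTerminatedRegularly, double dEpsT = 0.01 )
{
    if ( dBestObj == 0.0 )            // shift by one if f* = 0, as above
    {
        dObj     += 1.0;
        dBestObj += 1.0;
    }
    if ( dMaxViol >= dEpsT * dEpsT )  // tighter tolerance eps_t^2 for feasibility
    {
        return FAILED;
    }
    if ( dObj - dBestObj < dEpsT * std::fabs( dBestObj ) )
    {
        return SUCCESSFUL;            // relative objective error below eps_t (22.4)
    }
    // feasible and regularly terminated, but non-global solution (22.5)
    return bTerminatedRegularly ? ACCEPTABLE : FAILED;
}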

There is a basic difficulty when evaluating statistical comparative scores by mean values for a series of test problems and different computer codes. It might happen that the less reliable codes do not solve the more complex test problems successfully, while the more advanced ones solve them with additional numerical effort, say calculation time or number of function calls. A direct evaluation of mean values over successful test runs would thus penalize the more reliable algorithms.

A more realistic possibility is to compute mean values of the criteria we are interested in and to compare the codes pairwise over the sets of test examples which are successfully solved by both codes. We then get a reciprocal n_code × n_code matrix, where n_code is the number of codes under consideration. The largest eigenvalue of this matrix is positive, and we compute its normalized eigenvector, from which we retrieve priority scores. In a final step, we normalize the eigenvector so that the smallest coefficient gets the value one. The idea is known under the name priority theory, see Saaty [114], and has been used by Schittkowski [116] for comparing 27 optimization codes; see also Exler et al. [39] for another comparative study of mixed-integer codes.
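A minimal sketch of this evaluation scheme is shown below; it assumes the reciprocal pairwise-comparison matrix has already been assembled from the pairwise mean values (illustrative code, not part of NLP++):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Approximate the eigenvector of the largest eigenvalue of the reciprocal
// matrix A (A[i][j] = 1 / A[j][i]) by power iteration and scale it so that
// the smallest coefficient becomes one, as described above.
std::vector<double> PriorityScores(
    const std::vector< std::vector<double> > & A, int iNumSweeps = 100 )
{
    const std::size_t n = A.size();
    std::vector<double> v( n, 1.0 ), w( n, 0.0 );

    for ( int sweep = 0; sweep < iNumSweeps; sweep++ )
    {
        for ( std::size_t i = 0; i < n; i++ )   // w = A * v
        {
            w[i] = 0.0;
            for ( std::size_t j = 0; j < n; j++ )
                w[i] += A[i][j] * v[j];
        }
        double dMax = 0.0;                      // normalize to avoid overflow
        for ( std::size_t i = 0; i < n; i++ )
            dMax = std::max( dMax, std::fabs( w[i] ) );
        for ( std::size_t i = 0; i < n; i++ )
            v[i] = w[i] / dMax;
    }

    double dMin = v[0];                         // smallest coefficient -> 1
    for ( std::size_t i = 1; i < n; i++ )
        dMin = std::min( dMin, v[i] );
    for ( std::size_t i = 0; i < n; i++ )
        v[i] /= dMin;
    return v;
}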

We use the following criteria to compare the robustness and efficiency of the code versions:

nsucc - percentage of successful and acceptable test runs according to the above definitions,

nacc - percentage of acceptable, i.e., of non-global feasible solutions, see the above definition,

nerr - percentage of errors, i.e., of terminations with iStatus > 0,

pfunc - relative priority of equivalent function calls, including function calls used for gradient approximations,

nfunc - average number of equivalent function calls, including function calls used for computing a descent direction or gradient approximations, evaluated over all successful test runs, where one function call consists of one evaluation of the objective function and all constraint functions at a given x and y,

pgrad - relative priority of the number of iterations,

ngrad - average number of iterations or gradient evaluations subject to the continuous variables, respectively, evaluated over all successful test runs,

ptime - relative priority of execution times,

ntime - average execution time in seconds, evaluated over all successful test runs.

The following parameter settings are defined to compare several versions of MISQP against the standard options:

Version  Option                  Value  Description
1        -                       -      partial derivatives subject to continuous variables
                                        computed by two-sided differences
2        -                       -      external function scaling
3        "InternalScaling"       0      internal function scaling suppressed
4        "ModifyMatrix"          0      internal scaling of matrix B_k suppressed
5        "UserProvDerivatives"   1      partial derivatives subject to discrete variables
                                        computed externally by forward differences
6        "MaxNumRestarts"        0      restarts suppressed
7        "MeritFunction"         1      L∞ merit function
8        -                       -      partial derivatives subject to continuous variables
                                        computed by forward differences
9        "NonmonotoneConst"      0      monotone trust region updates


code  nsucc(%)  nacc(%)  nerr(%)  pfunc  nfunc  pgrad  ngrad  ptime  ntime
1     85.2      12.7     4.2      2.0    1,047  1.7    24     1.3    0.2
2     85.9      11.9     2.1      1.5    950    1.5    22     1.8    0.2
3     82.4      10.5     7.0      1.5    566    1.5    18     1.3    0.1
4     62.7      35.2     2.1      1.7    801    1.5    15     1.0    0.3
5     82.4      14.1     3.5      2.0    1,222  2.3    31     1.0    0.3
6     61.3      35.2     3.5      1.0    541    1.0    12     1.0    0.1
7     84.5      14.8     0.7      2.0    1,191  1.9    26     1.4    0.3
8     83.1      14.8     2.1      1.1    547    1.6    24     1.5    0.2
9     78.2      19.7     2.8      2.1    553    1.4    16     1.4    0.2

Table 22.1.: Performance Results for Different Parameter Settings of MisqpWrapper

We apply default parameter settings and tolerances unless changed as indicated in the table, with a termination tolerance of 10⁻⁶ for MISQP and of 10⁻⁹ for MIQL, the mixed-integer quadratic programming code, see Lehmann et al. [79]. The maximum number of iterations is 500, and the maximum number of branch-and-bound nodes for solving a mixed-integer quadratic programming subproblem is 1,000.
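For illustration, each test version corresponds to a handful of Put calls on top of these defaults; a minimal sketch for version 3, using the parameter names documented in Section 22.4 (oMisqp is an illustrative MisqpWrapper instance):

MisqpWrapper oMisqp ;

oMisqp.Put( "TermAcc" , 1.0E-6 ) ;      // MISQP termination tolerance
oMisqp.Put( "MaxNumIter" , 500 ) ;      // maximum number of iterations
oMisqp.Put( "MaxNodesBB" , 1000 ) ;     // branch-and-bound node limit
oMisqp.Put( "InternalScaling" , 0 ) ;   // version 3: suppress function scaling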

To understand the fifth version, one has to take into account that although we proceed from non-relaxable model functions, our test problems nevertheless consist of analytical functions and are relaxable. In this case, we want to check whether analytical partial derivatives subject to integer variables, if they exist, lead to better numerical results.

Table 22.1 shows the numerical results obtained for the different parameter settings of the mixed-integer trust region method MISQP. A couple of test runs were terminated by an error message, in most cases because the maximum number of iterations was reached or a feasible point could not be found. In the first case, it is nevertheless possible that an optimal solution is obtained.

The most important conclusions are:

• External function scaling adapted to the specific problem might help to solve more test problems successfully. However, the computational costs are higher.

• Function scaling is important and highly recommended. Without scaling, the code becomes less reliable.

• Without proper scaling and adaption of the quasi-Newton matrix, the algorithm might terminate too early.

• Analytical partial derivatives subject to the integer variables do not improve robustness or efficiency, and the number of iterations increases.

• Internal restarts are extremely important despite a higher number of function evaluations and longer calculation times. Otherwise, the code might terminate too early.


code  nsucc(%)  nacc(%)  nerr(%)  nfunc  ngrad  ntime
1     85.2      12.7     4.2      1,047  24     0.2
10    79.6      16.2     4.2      924    24     0.01

Table 22.2.: Performance Results of Mixed-Integer versus Continuously Relaxed Test Problems

• A change of the merit function does not influence the performance significantly.

• Less accurate partial derivatives subject to the continuous variables decrease the number of function evaluations significantly, but are a bit less reliable.

• Monotone trust region updates decrease the number of function calls, but fewer problems are successfully solved.

Finally, it might be interesting to compare the performance results obtained for mixed-integer problems on the one hand and for the corresponding continuous relaxations on the other. The test problems are relaxable, and MISQP can be applied for solving continuous optimization problems in an efficient way. The code is then a typical SQP implementation with trust region and a second-order stabilization.

Table 22.2 contains the numerical results, where code number 10 stands for the relaxed version of MISQP. Also in this case, we apply a two-sided difference formula for partial derivatives to avoid side effects due to inaccurate gradients.

The relaxed version of MISQP computes better optimal solutions for 106 test runs, as is to be expected. Very surprisingly, the numerical solution of the mixed-integer test problems requires about the same number of function evaluations as the solution of the relaxed counterparts. Because of the large number of branch-and-bound steps needed for solving the mixed-integer quadratic subproblems, computation times for solving mixed-integer programs are much higher. But for the practical applications with time-consuming function calls we have in mind, these additional efforts are tolerable and in many situations negligible compared to the large amount of internal computations of a complex simulation code.

There is still another important conclusion. About 96 % of the relaxed and of the mixed-integer test problems are successfully solved by MISQP, i.e., the code terminates at a feasible iterate where the optimality criteria are satisfied subject to a tolerance of 10⁻⁶. Thus, we should not expect that MISQP will be able to get significantly higher scores for nsucc for the much more complex mixed-integer version of our test problem suite. Another conclusion is that about 15 % of the test problems are non-convex.

22.4. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and thus to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate an MisqpWrapper object employing MisqpWrapper()


2. a) Put the number of design variables to the object:
PutNumDv( int iNumDv ) ;

b) Put the number of binary design variables to the object:
Put( const char * pcParam , const int * piNumBinDv ) ;
or
Put( const char * pcParam , const int iNumBinDv ) ;
where pcParam = "NumBinDv".

c) Put the number of integer design variables to the object:
Put( const char * pcParam , const int * piNumIntDv ) ;
or
Put( const char * pcParam , const int iNumIntDv ) ;
where pcParam = "NumIntDv".

d) Put the number of catalogued design variables to the object:
Put( const char * pcParam , const int * piNumCatDv ) ;
or
Put( const char * pcParam , const int iNumCatDv ) ;
where pcParam = "NumCatDv".

e) Put the allowed values for the catalogued design variables:
Variablecatalogue * m_pVariableCatalogue is a pointer to an array of length equal to the number of catalogued variables. Via this pointer each variable can be initialized as explained in section 29.1.

f) By default, the partial derivatives with respect to discrete variables are approximated internally. If you want to provide the partial derivatives directly, you can change the specific flag for each discrete variable (see also the setup sketch after this list):
PutUserDerivativeFlag( const int iDiscreteDvIdx , const int iFlag ) ;
where iDiscreteDvIdx is the index of the discrete variable you want to modify. The first discrete variable is accessed by 0 and the last one by NumAllDiscreteVariable - 1, respectively.
Possible values of iFlag:
0: internal approximation of partial derivatives (default)
1: user-provided derivatives

g) Put the values for other parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;
where pcParam is one of the following parameters:

– "MaxNumIter" Put maximum number of iterations. (default: 500)

– "OutputLevel" Put desired output level. (default: 2)

– "FortranOutputLevel" If > 0, the output of the fortran code is written toan additional output file. This output might give more detailed informationabout errors.0: no additional output (default)1: only final convergence analysis


2: one line of intermediate results is printed in each iteration.
3: more detailed information is printed in each iteration step, e.g. variable, constraint, and multiplier values.

– "OutputUnit" Put output unit for the Fortran output.

– "OutUnit" Same as ”OutputUnit”.

– "MaxNumCuts" Put maximum number of cuts generated in subproblems. (de-fault: 50)

– "MaxNodesBB" Put maximum number of branch and bound nodes used bythe subproblem solver. (default: 1000)

– "InternalScaling" Put flag for internal scaling. Function values can beinternally scaled.0: No scaling.1: Subject to their absolute values, if greater than 1.0 or"InternalScalingBound", respectively. (default)

– "NonmonotoneConst" Put maximum number of successive iterations whichare considered for the non-monotone trust region algorithm (ignored if alldesign variables are continuous). The value must be less than 100. (default:10)

– "MaxNumRestarts" Maximum number of successive restarts without improv-ing solution. Setting might lead to better results, but increases the numberof function evaluations. The value must be less than 15. (default: 2)

– "MeritFunction" Put merit function strategy:0: Augmented Lagrangian (only if all variables are continuous),1: L∞ penalty function,2: Combined version of 0 and 1,Default value is 0, if no discrete variables defined, and 2 otherwise.

– "MaxNumUnsuccSteps" Put maximum number of feasible steps without im-provement of objective function subject to the tolerance

√TermAcc. (default:

10)

– "BranchingRule" Put branching rule for subproblem solver:1: maximum fractional branching, (default)2: minimum fractional branching.

– "NodeSelectRule" Put node selection strategy of subproblem solver:1: best of all (large search trees).2: best of two (warmstarts, less memory for search tree).3: depth first (good warmstarts, less memory, many QPs solved). (default)

– "WarmStart" Put maximum number of successive warmstarts for subproblemsolver to avoid numerical instabilities. (default: 100)

– "ImprovedBounds" Put flag for calculating improved bounds in case of Best-of-All selection strategy in subproblem solver:0: no (default)1: yes


– "DepthFirstDirection" Put direction for Depth-First according to value ofLagrange function in subproblem solver:0: no (default)1: yes

– "CholeskyMode" Put Cholesky decomposition mode in subproblem solver:0: calculate Cholesky decomposition once and reuse it.1: calculate new Cholesky decomposition whenever warmstarts are not acti-vated. (default)

– "ScaleDv" Activate internal scaling of continuous variables:0: no1: yes (default)

– "ModifiyMatrix" Activate modification of Hessian approximation in orderto get more accurate search directions. Calculation time is increased in caseof a large number of integer variables.0: no1: yes (default)

– "TransformMIQP" Activate transformation of MIQP to positive orthant:0: no1: yes (default)

• Put( const char * pcParam , const double * pdValue ) ;
or
Put( const char * pcParam , const double dValue ) ;
where pcParam is one of the following parameters:

– "TermAcc" Put desired final accuracy. (default: 1.0E-6)

– "TermQPAcc" Put desired accuracy for QP solver. The value is needed forthe QP solver to perform several tests, for example whether optimality con-ditions are satisfied or whether a number is considered as zero or not. If"TermQPAcc" is less or equal to zero, then the machine precision is computedand subsequently multiplied by 1.0E+4. (default: 1.0E-10)

– "FacPenaltyIncrease" Put factor for increasing a penalty parameter, mustbe greater than one. (default: 10.0)

– "FacDeltaReduction" Put factor for decreasing the internal descent para-meter, see "ScalingDeltaInit". Must be less than one. (default: 0.1)

– "PenaltyParamInit" Put initial penalty parameter greater than zero. (de-fault: 1000.0)

– "ScalingDeltaInit" Put initial scaling parameter, smaller than one. (de-fault: 0.05)

– "TrustRegionContInit" Put initial continuous trust region radius. (default:50.0)

– "TrustRegionIntInit" Put initial integer trust region radius, must be greaterthan or equal to one. (default: 50.0)


– "InternalScalingBound" Put scaling bound in case of "InternalScaling"> 0, i.e., function values are scaled only if their absolute initial values aregreater than the bound. (default: 1.0)

• Put( const char * pcParam , const char * pcValue ) ;
or
Put( const char * pcParam , const char cValue ) ;
where pcParam = "OutputFileName" to set the name of the output file.

• Put( const char * pcParam , const bool * pbValue ) ;
or
Put( const char * pcParam , const bool bValue ) ;
where pcParam is one of the following parameters:

– "OpenNewOutputFile" Determines whether a new output file must be createdby the optimizer. If false, the output is appended to an existing file. Thisis the case, if the optimizer is used as a subproblem solver (e.g. in a multipleobjective optimizer). Usually, this parameter does not have to be changed.

h) Put the number of inequality constraint functions to the object:
PutNumIneqConstr( int iNumIneqConstr ) ;

i) Put the number of equality constraint functions to the object:
PutNumEqConstr( int iNumEqConstr ) ;
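The following minimal sketch summarizes step 2 for an illustrative problem with two continuous, one binary, two integer, and one catalogued design variable and two inequality constraints; oMisqp is a hypothetical MisqpWrapper object:

MisqpWrapper oMisqp ;

oMisqp.PutNumDv( 6 ) ;                    // step 2a: total number of variables
oMisqp.Put( "NumBinDv" , 1 ) ;            // step 2b: one binary variable
oMisqp.Put( "NumIntDv" , 2 ) ;            // step 2c: two integer variables
oMisqp.Put( "NumCatDv" , 1 ) ;            // step 2d: one catalogued variable
oMisqp.PutUserDerivativeFlag( 0 , 1 ) ;   // step 2f: user-provided derivatives
                                          // for the first discrete variable
oMisqp.PutNumIneqConstr( 2 ) ;            // step 2h
oMisqp.PutNumEqConstr( 0 ) ;              // step 2i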

3. Set boundary values for continuous and integer variables and provide an initial guess. Note that boundary values for binary variables are set to 0 and 1 automatically.

a) Set upper and lower bounds using
PutLowerBound( const int iVariableIndex, const double dLowerBound )
PutUpperBound( const int iVariableIndex, const double dUpperBound )
for iVariableIndex = 0,...,NumberOfContinuousVariables + NumberOfBinaryVariables + NumberOfIntegerVariables - 1,
or
PutUpperBound( const double * pdXu );
PutLowerBound( const double * pdXl );
where pdXu and pdXl must be pointing on arrays of length equal to the number of continuous and integer (without catalogued) design variables.
Before setting bounds, the number of design variables must be set. If no bounds are set, these will be -1E30 and +1E30 by default.

b) Provide an initial guess for the optimization vector by
PutInitialGuess( const double * pdInitialGuess ),
where pdInitialGuess must be a double array of length equal to the number of variables, or
PutInitialGuess( const int iVariableIndex, const double dInitialGuess ),
for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.


4. Start Misqp-Routine: StartOptimizer()

5. Check status of the object using GetStatus( int & iStatus) const

• if iStatus equals SolvedId():
The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0:
An error occurred during the solution process.

• if iStatus equals EvalFuncId():
MisqpWrapper needs new function values → 6b Providing new function values.
After passing these values to the MisqpWrapper object, go to 4.

• if iStatus equals EvalGradId():
MisqpWrapper needs new values for gradients → 6c Providing new gradient values.
After passing these values to the MisqpWrapper object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const
Returns the number of constraints.
GetNumDv( int & iNumDesignVar ) const
Returns the number of design variables.
Get( const char * pcParam, int & iNumParallelSys )
Returns the number of parallel systems when pcParam = "NumParallelSys".

ii. GetDesignVarVec( const double * & pdPointer, const int i ) const
Returns a pointer to the i'th design variable vector (default: i = 0).
GetDesignVar( const int iVarIdx , double & dValue, const int iVectorIdx ) const
Returns the value of the iVarIdx'th design variable of the iVectorIdx'th design variable vector.

b) Providing new function values.
For access to the design variable vectors see 6(a)ii. After calculating the values, these must be passed to the MisqpWrapper object using:
• PutConstrValVec( const double * const pdConstrVals, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1,
where pdConstrVals is pointing on an array containing the values of each constraint at the iParSysIdx'th design variable vector provided by MisqpWrapper.
• PutObjVal( const double dObjVal, const int iObjFunIdx, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1,
with iObjFunIdx = 0 and dObjVal defining the value of the objective function at the iParSysIdx'th design variable vector provided by MisqpWrapper.
Alternatively you can employ
PutConstrVal( const int iConstrIdx, const double dConstrValue, const int iParSysIdx ),
for iParSysIdx = 0,...,NumberOfParallelSystems - 1 and iConstrIdx = 0,...,NumberOfConstraints - 1.

c) Providing new gradient values.
Gradients must be calculated for the objective and the constraints at the current design variable vector.
Partial derivatives have to be supplied for all continuous variables.
Partial derivatives with respect to the discrete variables have to be supplied only for variables where iFlag was set to 1 by PutUserDerivativeFlag(.), see 2f above.
For access to the design variable vector see 6(a)ii with iParSysIdx = 0.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx, const double dDerivativeValue )
for iVariableIdx = 0,...,NumberOfVariables - 1 and iConstrIdx = 0,...,NumberOfConstraints - 1,
where dDerivativeValue is defined as the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th variable.

Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1,
where pdGradient is a pointer on the gradient of the iConstrIdx'th constraint.

For the gradient of the objective you can use
• PutDerivObj( const int iVariableIdx, const double dDerivativeValue, const int iFunIdx ),
with iFunIdx = 0 (default), for iVariableIdx = 0,...,NumberOfVariables - 1,
where dDerivativeValue is defined as the derivative of the objective with respect to the iVariableIdx'th variable at the current design variable vector.
• PutGradObj( const double * pdGradient, const int iFunIdx ),
where iFunIdx = 0 (default) and pdGradient is a pointer on the gradient of the objective at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const
Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const

Returns the value of the iConstrIdx’th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx , const int iVariableIdx, double & dDerivativeValue ) const
Returns the value of the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th design variable at the last solution vector.


• GetDerivObj( const int iVariableIdx, double & dDerivativeValue, const int iFunIdx ) const
with iFunIdx = 0 (default) returns the value of the derivative of the objective with respect to the iVariableIdx'th design variable at the last solution vector.

• GetObjVal( double & dObjVal, const int iFunIdx ) const
with iFunIdx = 0 (default) returns the value of the objective function at the last solution vector.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns the value of the iVariableIdx'th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns a pointer to the last solution vector.


23. Example

We solve the following problem using DefaultLoop():

Minimize x0 + x1 + x2 + x3 + x4 + x5

S.T.

0.0 ≤ −x2 − x4 + 0.6

0.0 ≤ x2 + x4 − 0.5

−1 ≤ x0 ≤ 1 (real)

−1 ≤ x1 ≤ 1 (real)

−1 ≤ x2 ≤ 1 (integer, relaxable)

−1 ≤ x3 ≤ 2 (integer, relaxable)

0.5 ≤ x4 ≤ 5.4 (∈ {0.5, 3.5, 5.4}, not relaxable)

2 ≤ x5 ≤ 10 (integer, not relaxable)

Initial guess:

x0 = 1

x1 = 1

x2 = 1

x3 = 2

x4 = 5.4

x5 = 10

Optimal solution:

x0 = −1.0

x1 = −1.0

x2 = 0

x3 = −1

x4 = 0.5

x5 = 2

fopt = −0.5

The file Example41.h contains the class definition.


#ifndef Example41_H_INCLUDED
#define Example41_H_INCLUDED

#include "OptProblem.h"

class Example41 : public OptProblem
{
public:

    Example41();

    int FuncEval( bool bGradApprox = false );
    int GradEval();
    int SolveOptProblem();
};

#endif

The file Example41.cpp contains the implementation:

#include "Example41.h"
#include <iostream>
#include <cmath>
#include <cstdlib>
#include "MidacoWrapper.h"
#include "MisqpWrapper.h"
#include "MipOptimizer.h"
#include "ScpWrapper.h"
#include "SqpWrapper.h"

using std::cout;
using std::endl;
using std::cin;

Example41::Example41()
{
    char c;
    cout << endl << "--- this is Example 41 ---" << endl << endl;

    // Choose optimizer:
    do
    {
        cout << "Choose an optimizer:\n\n";
        cout << " [1] MipOptimizer\n";
        cout << " [2] MidacoWrapper\n";
        cout << " [3] MisqpWrapper\n";

        cin >> c;
    }
    while ( c != '1' && c != '2' && c != '3' );

    if ( c == '2' )
    {
        m_pOptimizer = new MidacoWrapper();
    }
    else if ( c == '3' )
    {
        m_pOptimizer = new MisqpWrapper();
    }
    else
    {
        m_pOptimizer = new MipOptimizer();

        // Choose continuous optimizer and attach it using
        // AttachScalarOptimizer. The second argument tells
        // MipOptimizer to delete the continuous optimizer
        // after use.
        // Currently only ScpWrapper and SqpWrapper can be used.
        do
        {
            cout << endl << "Choose a scalar optimizer: \n\n";
            cout << " [q] SqpWrapper \n";
            cout << " [c] ScpWrapper \n";

            cin >> c;

            if ( c == 'q' )
            {
                m_pOptimizer->AttachScalarOptimizer( new SqpWrapper(), true );
            }
            else if ( c == 'c' )
            {
                m_pOptimizer->AttachScalarOptimizer( new ScpWrapper(), true );
            }
            else
            {
                cout << "illegal input!" << endl;
            }
        }
        while ( c != 'c' && c != 'q' );
    }
}

int Example41::FuncEval( bool bGradApprox )
{
    const double * dX;
    double * pdFuncVals;
    int iNumConstr;
    int iNumObjFuns;

    m_pOptimizer->GetNumConstr( iNumConstr );
    m_pOptimizer->Get( "NumObjFuns", iNumObjFuns );

    // Get a pointer to the current design variable vector.
    m_pOptimizer->GetDesignVarVec( dX );

    pdFuncVals = new double[ iNumConstr + iNumObjFuns ];

    // 0th constraint
    pdFuncVals[0] = -dX[2] - dX[4] + 0.6;
    // 1st constraint
    pdFuncVals[1] = dX[2] + dX[4] - 0.5;
    // objective
    pdFuncVals[2] = dX[0] + dX[1] + dX[2] + dX[3] + dX[4] + dX[5];

    m_pOptimizer->PutObjVal( pdFuncVals[ iNumConstr ] );
    m_pOptimizer->PutConstrValVec( pdFuncVals );

    delete [] pdFuncVals;

    return EXIT_SUCCESS;
}

int Example41::GradEval()
{
    const double * dX;
    int iNumDv;

    m_pOptimizer->GetNumDv( iNumDv );
    m_pOptimizer->GetDesignVarVec( dX );

    for ( int iDvIdx = 0; iDvIdx < iNumDv; iDvIdx++ )
    {
        m_pOptimizer->PutDerivObj( iDvIdx, 1 );
        if ( ( iDvIdx == 2 ) || ( iDvIdx == 4 ) )
        {
            m_pOptimizer->PutDerivConstr( 0, iDvIdx, -1 );
            m_pOptimizer->PutDerivConstr( 1, iDvIdx, 1 );
        }
        else
        {
            m_pOptimizer->PutDerivConstr( 0, iDvIdx, 0 );
            m_pOptimizer->PutDerivConstr( 1, iDvIdx, 0 );
        }
    }
    return EXIT_SUCCESS;
}

int Example41::SolveOptProblem()
{
    int iError;

    iError = 0;

    // MidacoWrapper needs a relatively high value for
    // this parameter compared to MipOptimizer.
    m_pOptimizer->Put( "MaxNumIter", 100000 );

    // Define the optimization problem:

    m_pOptimizer->PutNumDv( 6 );
    m_pOptimizer->Put( "NumIntDv", 2 );
    m_pOptimizer->Put( "NumCatDv", 2 );

    m_pOptimizer->PutNumIneqConstr( 2 );
    m_pOptimizer->PutNumEqConstr( 0 );

    m_pOptimizer->PutUpperBound( 0, 1 );
    m_pOptimizer->PutLowerBound( 0, -1 );
    m_pOptimizer->PutInitialGuess( 0, 1 );
    m_pOptimizer->PutUpperBound( 1, 1 );
    m_pOptimizer->PutLowerBound( 1, -1 );
    m_pOptimizer->PutInitialGuess( 1, 1 );
    m_pOptimizer->PutUpperBound( 2, 1 );
    m_pOptimizer->PutLowerBound( 2, -1 );
    m_pOptimizer->PutInitialGuess( 2, 1 );
    m_pOptimizer->PutUpperBound( 3, 2 );
    m_pOptimizer->PutLowerBound( 3, -1 );
    m_pOptimizer->PutInitialGuess( 3, 2 );

    // For the 0'th catalogued variable all allowed values
    // are put separately
    m_pOptimizer->m_pVariableCatalogue[0].PutNumAllowedValues( 3 );
    m_pOptimizer->m_pVariableCatalogue[0].PutAllowedValue( 0, 0.5 );
    m_pOptimizer->m_pVariableCatalogue[0].PutAllowedValue( 1, 3.5 );
    m_pOptimizer->m_pVariableCatalogue[0].PutAllowedValue( 2, 5.4 );

    // As the allowed values of the 1'st catalogued variable are an
    // equidistant partition of an interval, the values can be put
    // by specifying the bounds and step size.
    m_pOptimizer->m_pVariableCatalogue[1].PutAllowedValues( 2.0, 1.0, 10.0 );

    m_pOptimizer->PutInitialGuess( 4, 5.4 );
    m_pOptimizer->PutInitialGuess( 5, 10 );

    // Start optimization
    iError = m_pOptimizer->DefaultLoop( this );

    // if there was an error, report it...
    if ( iError != EXIT_SUCCESS )
    {
        cout << "Error " << iError << "\n";
        return iError;
    }

    // ... else report the results:

    const double * dX;
    int iNumDesignVar;
    double dObjval;

    m_pOptimizer->GetNumDv( iNumDesignVar );

    m_pOptimizer->GetDesignVarVec( dX );

    for ( int iDvIdx = 0; iDvIdx < iNumDesignVar; iDvIdx++ )
    {
        cout << "dX[" << iDvIdx << "] = " << dX[iDvIdx] << endl;
    }

    m_pOptimizer->GetObjVal( dObjval );

    cout << "Objective function value: " << dObjval << "\n";

    return EXIT_SUCCESS;
}


Part IV.

Constrained Data Fitting


24. NlpmmxWrapper - Constrained Min-Max Optimization

The Fortran subroutine NLPMMX by Schittkowski solves constrained min-max nonlinear programming problems, where the maximum of finitely many nonlinear function values is to be minimized. It is assumed that all functions are continuously differentiable. By introducing one additional variable and nonlinear inequality constraints, the problem is transformed into a general smooth nonlinear program, which is subsequently solved by the sequential quadratic programming (SQP) code NLPQLP. The sections 24.1 and 24.2 are taken from Schittkowski [142].

24.1. Introduction

Min-max optimization problems consist of minimizing the maximum of finitely many given functions:

x ∈ R^n :   min  max { f_i(x) : i = 1, ..., l }
            g_j(x) = 0 ,   j = 1, ..., m_e ,
            g_j(x) ≥ 0 ,   j = m_e + 1, ..., m ,
            x_l ≤ x ≤ x_u .                          (24.1)

It is assumed that f_1, ..., f_l and g_1, ..., g_m are continuously differentiable functions.

In this chapter, we consider the question how an existing nonlinear programming code can be used to solve constrained min-max problems in an efficient and robust way after a suitable transformation. In a very similar way, also L∞, L1, and least squares problems can be solved efficiently by an SQP code, see chapters 25, 26, and 27 and Schittkowski [128, 131, 134, 141, 140].

The transformation of a min-max problem into a special nonlinear program is described in Section 24.2. Section 24.3 contains a documentation of the code. An example implementation can be found in chapter 28.

24.2. The Transformed Optimization Problem

We consider the constrained nonlinear min-max problem (24.1) and introduce one additional variable, z, and l additional nonlinear inequality constraints of the form

z − f_i(x) ≥ 0 ,        (24.2)


i = 1, ..., l. The following equivalent problem is to be solved by an SQP method:

(x, z) ∈ R^{n+1} :   min  z
                     g_j(x) = 0 ,   j = 1, ..., m_e ,
                     g_j(x) ≥ 0 ,   j = m_e + 1, ..., m ,
                     z − f_i(x) ≥ 0 ,   i = 1, ..., l ,
                     x_l ≤ x ≤ x_u .                          (24.3)

In this case, the quadratic programming subproblem which has to be solved in each step of an SQP method has the form

(d, e) ∈ R^{n+1} :   min  (1/2) (d^T, e) B_k (d^T, e)^T + e
                     ∇g_j(x_k)^T d + g_j(x_k) = 0 ,   j = 1, ..., m_e ,
                     ∇g_j(x_k)^T d + g_j(x_k) ≥ 0 ,   j = m_e + 1, ..., m ,
                     e − ∇f_i(x_k)^T d + z_k − f_i(x_k) ≥ 0 ,   i = 1, ..., l ,
                     x_l − x_k ≤ d ≤ x_u − x_k .                          (24.4)

B_k ∈ R^{(n+1)×(n+1)} is a quasi-Newton update matrix of the Lagrangian function of (24.3). A new iterate is then obtained from

x_{k+1} = x_k + α_k d_k ,   z_{k+1} = z_k + α_k e_k ,

where d_k ∈ R^n and e_k ∈ R are a solution of (24.4) and α_k is a steplength parameter obtained from forcing a sufficient descent of a merit function. The proposed transformation (24.3) is independent of the SQP method used, so that available codes can be used in the form of a black box. For the maximum norm, two-sided bound constraints can be added, see chapter 25 or [140].

24.3. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and thus to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate an NlpmmxWrapper object employing NlpmmxWrapper()

2. a) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;


b) Put the number of supporting points to the object:
Put( const char * pcParam , const int * piNumSuppPts ) ;
or
Put( const char * pcParam , const int iNumSuppPts ) ;
where pcParam = "NumSuppPts".

c) Put a guess for the approximate size of the residual, i.e., a low positive real number if the residual is supposed to be small, and a large one in the order of 1 if the residual is supposed to be large:
Put( const char * pcParam , const double * pdApproxSize ) ;
or
Put( const char * pcParam , const double dApproxSize ) ;
where pcParam = "ApproxSize".

d) Put the values for other parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;
where pcParam is one of the following parameters:

– "MaxNumIter" Put maximum number of iterations.

– "MaxNumIterLS" Put maximum number of function calls during line search.

– "StackSizeLS" Stack size for storing merit function values at previous iter-ations for non-monotone line search (e.g. 10). In case of StackSizeLS == 0,monotone line search is performed. StackSizeLS should not be greater than50.

– "OutputLevel" Put desired output level.

– "FortranOutputLevel" If >0, the output of the fortran code is written toan additional output file. This output might give more detailed informationabout errors.0: no additional output (default)

– "OutputUnit" Put output unit for the Fortran output.

– "OutUnit" Same as ”OutputUnit”

• Put( const char * pcParam , const double * pdValue ) ;
or
Put( const char * pcParam , const double dValue ) ;
where pcParam is one of the following parameters:

– "TermAcc" Put desired final accuracy.

– "AccQP" Put desired accuracy for QP solver.

– "RestartParam" Parameter for initializing a restart in case of an uphillsearch direction by setting the BFGS-update matrix to RestartParam*I,where I denotes the identity matrix. The number of restarts is bounded byMaxNumIterLS. No restart is performed if RestartParam is less than 1. Mustbe greater than 0 (e.g. 100).


– "SuppPtCoords" Put coordinates of supporting points (t0,y0,t1,y1,...).

– "SuppPtTimes" Put times of supporting points.

– "SuppPtVals" Put values of supporting points.

– "Weights" Put weights.

• Put( const char * pcParam , const char * pcValue ) ;
or
Put( const char * pcParam , const char cValue ) ;
where pcParam = "OutputFileName" to set the name of the output file.

e) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

f) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values and provide an initial guess.

a) Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1,
or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must be pointing on arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set. If no bounds are set, these will be -1E30 and +1E30 by default.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess )

where pdInitialGuess must be a double array of length equal to the number of variables, or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.

4. Start Nlpmmx-Routine: StartOptimizer()

5. Check status of the object using GetStatus( int & iStatus) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.


• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): NlpmmxWrapper needs new function values → 6b Providing new function values.
After passing these values to the NlpmmxWrapper object, go to 4.

• if iStatus equals EvalGradId():
NlpmmxWrapper needs new values for gradients → 6c Providing new gradient values.
After passing these values to the NlpmmxWrapper object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const
Returns the number of constraints.
GetNumDv( int & iNumDesignVar ) const
Returns the number of design variables.
Get( const char * pcParam, int & iNumSuppPts ),
where pcParam = "NumSuppPts", returns the number of supporting points.
Get( const char * pcParam, double * & pdSuppPtTimes ),
where pcParam = "SuppPtTimes", returns a pointer to an array of t_i, i = 1, ..., l, i.e. the time coordinates of the supporting points.
Get( const char * pcParam, double * & pdSuppPtVals ),
where pcParam = "SuppPtVals", returns a pointer to an array of y_i, i = 1, ..., l, i.e. the values at the supporting points.

ii. GetDesignVarVec( const double * & pdPointer, const int i ) const
with i = 0 (default) returns a pointer to the design variable vector.
GetDesignVar( const int iVarIdx , double & dValue, const int iVectorIdx ) const
with iVectorIdx = 0 (default) returns the value of the iVarIdx'th design variable of the design variable vector.

b) Providing new function values.
Function values of the model function h as well as of the constraints have to be calculated at the current design variable vector for each supporting point.
For access to the design variable vector see 6(a)ii. After calculating the values, these must be passed to the NlpmmxWrapper object using:
• PutConstrValVec( const double * const pdConstrVals, const int iParSysIdx ),
with iParSysIdx = 0 (default),
where pdConstrVals is pointing on an array containing the values of each constraint at the design variable vector provided by NlpmmxWrapper.
• PutModelVal( const double dModelVal, const int iSuppPtIdx ),
where iSuppPtIdx is the index of the supporting point (beginning with 0) and dModelVal defines the value of the model function at the current design variable vector.
Alternatively you can employ


• PutConstrVal( const int iConstrIdx, const double dConstrValue, const int iParSysIdx ),
with iParSysIdx = 0 (default), for iConstrIdx = 0,...,NumberOfConstraints - 1.
• PutModelValVec( const double * const pdModelVal ),
where pdModelVal is a double array of length equal to the number of supporting points containing the corresponding model function values.

c) Providing new gradient values.
Gradients must be calculated for the model function and the constraints at the current design variable vector. For access to the design variable vector see 6(a)ii.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx, const double dDerivativeValue )
for iVariableIdx = 0,...,NumberOfVariables - 1 and iConstrIdx = 0,...,NumberOfConstraints - 1,
where dDerivativeValue is defined as the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th variable.

Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1,
where pdGradient is a pointer on the gradient of the iConstrIdx'th constraint.

For the gradient of the model function you can use
• PutDerivModel( const int iVariableIdx, const double dDerivVal, const int iSuppPtIdx ),
for iSuppPtIdx = 0,...,NumSuppPts - 1 and iVariableIdx = 0,...,NumberOfVariables - 1,
where dDerivVal is defined as the derivative of the model function with respect to the iVariableIdx'th variable at the current design variable vector.
• PutGradModel( const double * pdGradient, const int iSuppPtIdx ),
for iSuppPtIdx = 0,...,NumSuppPts - 1, where pdGradient is a pointer on the gradient of the model function at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const
Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const

Returns the value of the iConstrIdx’th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx , const int iVariableIdx, double & dDerivativeValue ) const
Returns the value of the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th design variable at the last solution vector.


• GetDerivModel( const int iVariableIdx, double & dDerivativeValue, const int iSuppPtIdx ) const
returns the value of the derivative of the model function with respect to the iVariableIdx'th design variable at the iSuppPtIdx'th supporting point.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns the value of the iVariableIdx'th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const
with iParSysIdx = 0 (default) returns a pointer to the last solution vector.

• Get( const char * pcParam, double & dResidual ),
with pcParam = "Residual", returns the residual.


25. NlpinfWrapper - Constrained Data Fitting in the L∞-Norm

The Fortran subroutine NLPINF by Schittkowski solves constrained min-max or L∞ nonlinear programming problems, where the maximum of absolute nonlinear function values is to be minimized. It is assumed that all functions are continuously differentiable. By introducing one additional variable and nonlinear inequality constraints, the problem is transformed into a general smooth nonlinear program, which is subsequently solved by the sequential quadratic programming (SQP) code NLPQLP. An important application is data fitting, where the distance of experimental data from a model function evaluated at given experimental times is minimized in the L∞ or maximum norm, respectively. The usage of the code is documented, and an illustrative example is presented. The sections 25.1 and 25.2 are taken from Schittkowski [140].

25.1. Introduction

Min-max optimization problems arise in many practical situations, for example in approximation or when fitting a model function to given data in the L∞-norm. In this particular case, a mathematical model is available in the form of one or several equations, and the goal is to estimate some unknown parameters of the model. Available experimental data are exploited to minimize the distance of the model function, in most cases evaluated at certain time values, from the measured data at the same time values. An extensive discussion of data fitting, especially in the case of dynamical systems, is given by Schittkowski [128].

The mathematical problem we want to solve is given in the form

x ∈ R^n :   min  max { |f_i(x)| : i = 1, ..., l }
            g_j(x) = 0 ,   j = 1, ..., m_e ,
            g_j(x) ≥ 0 ,   j = m_e + 1, ..., m ,
            x_l ≤ x ≤ x_u .                          (25.1)

It is assumed that f_1, ..., f_l and g_1, ..., g_m are continuously differentiable functions.

Some test examples for L∞-approximation are studied in Schittkowski [136, 133], where the number of data points is extremely large, i.e., up to 60,000,000. An active set strategy is applied to reduce the size of the Jacobian matrix of the quadratic programming subproblem, see Schittkowski [126] for details.

In this chapter, we consider the question how an existing nonlinear programming code can be used to solve constrained min-max problems in an efficient and robust way after a suitable transformation. In a very similar way, also L1 and least squares problems can be solved efficiently by an SQP code, see chapters 26, 27 or Schittkowski [128, 131, 134, 141].


The transformation of an L∞ problem into a special nonlinear program is described in Section 25.2. Section 25.3 contains a documentation of the code. An example implementation can be found in chapter 28.

25.2. The Transformed Optimization Problem

We consider the constrained nonlinear min-max or L∞ problem (25.1), and introduce an additional variable z and 2l additional nonlinear inequality constraints of the form

f_i(x) + z ≥ 0 ,
−f_i(x) + z ≥ 0 ,        (25.2)

i = 1, ..., l. The following equivalent problem is to be solved by an SQP method:

(x, z) ∈ R^{n+1} :   min  z
                     g_j(x) = 0 ,   j = 1, ..., m_e ,
                     g_j(x) ≥ 0 ,   j = m_e + 1, ..., m ,
                     f_i(x) + z ≥ 0 ,   i = 1, ..., l ,
                     −f_i(x) + z ≥ 0 ,   i = 1, ..., l ,
                     x_l ≤ x ≤ x_u ,
                     z ≥ 0 .                          (25.3)

In this case, the quadratic programming subproblem which has to be solved in each step of an SQP method has the form

(d, e) ∈ R^{n+1} :   min  (1/2) (d^T, e) B_k (d^T, e)^T + e
                     ∇g_j(x_k)^T d + g_j(x_k) = 0 ,   j = 1, ..., m_e ,
                     ∇g_j(x_k)^T d + g_j(x_k) ≥ 0 ,   j = m_e + 1, ..., m ,
                     ∇f_i(x_k)^T d + e + f_i(x_k) + z_k ≥ 0 ,   i = 1, ..., l ,
                     −∇f_i(x_k)^T d + e − f_i(x_k) + z_k ≥ 0 ,   i = 1, ..., l ,
                     x_l − x_k ≤ d ≤ x_u − x_k ,
                     e ≥ 0 .                          (25.4)

B_k ∈ R^{(n+1)×(n+1)} is a quasi-Newton update matrix of the Lagrangian function of (25.3). A new iterate is then obtained from

x_{k+1} = x_k + α_k d_k ,   z_{k+1} = z_k + α_k e_k ,

where d_k ∈ R^n and e_k ∈ R are a solution of (25.4) and α_k is a steplength parameter obtained from forcing a sufficient descent of a merit function.


The proposed transformation (25.3) is independent of the SQP method used, so that available codes can be used in the form of a black box. However, an active set strategy is recommended to reduce the number of constraints if l becomes large, e.g., the code NLPQLB (see chapter 3 or [136]).

A final remark concerns the theoretical convergence of the algorithm. Since the original problem is transformed into a general nonlinear programming problem, we can apply all convergence results known for SQP methods. If an augmented Lagrangian function is preferred for the merit function, a global convergence theorem is found in Schittkowski [120], see also [126] for convergence of the active set strategy. The theorem states that when starting from an arbitrary initial value, a Karush-Kuhn-Tucker point is approximated, i.e., a point satisfying the necessary optimality conditions. If, on the other hand, an iterate is sufficiently close to an optimal solution and if the steplength is 1, then the convergence speed of the algorithm is superlinear, see Powell [105] for example. This remark explains the fast final convergence rate one observes in practice.

25.3. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and thus to use an automated loop over StartOptimizer and the function/gradient evaluation. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate an NlpinfWrapper object employing NlpinfWrapper()

2. a) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

b) Put the number of supporting points to the object:
Put( const char * pcParam , const int * piNumSuppPts ) ;
or
Put( const char * pcParam , const int iNumSuppPts ) ;
where pcParam = "NumSuppPts".

c) Put a guess for the approximate size of the residual, i.e., a low positive real number if the residual is supposed to be small, and a large one in the order of 1 if the residual is supposed to be large:
Put( const char * pcParam , const double * pdApproxSize ) ;
or
Put( const char * pcParam , const double dApproxSize ) ;
where pcParam = "ApproxSize".

d) Put the values for other parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;
or
Put( const char * pcParam , const int iValue ) ;
where pcParam is one of the following parameters:


– "MaxNumIter" Put maximum number of iterations.

– "MaxNumIterLS" Put maximum number of function calls during line search.

– "StackSizeLS" Stack size for storing merit function values at previous iter-ations for non-monotone line search (e.g. 10). In case of StackSizeLS == 0,monotone line search is performed. StackSizeLS should not be greater than50.

– "OutputLevel" Put desired output level.

– "FortranOutputLevel" If >0, the output of the fortran code is written toan additional output file. This output might give more detailed informationabout errors.0: no additional output (default)

– "OutputUnit" Put output unit for the Fortran output.

– "OutUnit" Same as ”OutputUnit”

• Put( const char * pcParam , const double * pdValue ) ;
or
Put( const char * pcParam , const double dValue ) ;
where pcParam is one of the following parameters:

– "TermAcc" Put desired final accuracy.

– "AccQP" Put desired accuracy for QP solver.

– "RestartParam" Parameter for initializing a restart in case of an uphillsearch direction by setting the BFGS-update matrix to RestartParam*I,where I denotes the identity matrix. The number of restarts is bounded byMaxNumIterLS. No restart is performed if RestartParam is less than 1. Mustbe greater than 0 (e.g. 100).

– "SuppPtCoords" Put coordinates of supporting points (t0,y0,t1,y1,...).

– "SuppPtTimes" Put times of supporting points.

– "SuppPtVals" Put values of supporting points.

– "Weights" Put weights.

• Put( const char * pcParam , const char * pcValue ) ;
or
Put( const char * pcParam , const char cValue ) ;
where pcParam = "OutputFileName" to set the name of the output file.

e) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

f) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values and provide an initial guess.

a) Set upper and lower bounds using


PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1,
or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must be pointing on arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set. If no bounds are set, these will be -1E30 and +1E30 by default.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess )

where pdInitialGuess must be a double array of length equal to the number of variables, or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1.Before setting an initial guess, the number of design variables must be set.

4. Start Nlpinf-Routine: StartOptimizer()

5. Check status of the object using GetStatus( int & iStatus) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): NlpinfWrapper needs new function values → 6b Providing new function values. After passing these values to the NlpinfWrapper object, go to 4.

• if iStatus equals EvalGradId(): NlpinfWrapper needs new values for gradients → 6c Providing new gradient values. After passing these values to the NlpinfWrapper object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const

Returns the number of constraints.
GetNumDv( int & iNumDesignVar ) const
Returns the number of design variables.
Get( const char * pcParam, int & iNumSuppPts )
where pcParam = "NumSuppPts" returns the number of supporting points.
Get( const char * pcParam, double * & pdSuppPtTimes ),
where pcParam = "SuppPtTimes" returns a pointer to an array of ti, i = 1, ..., l, i.e. the time coordinates of the supporting points.
Get( const char * pcParam, double * & pdSuppPtVals ),
where pcParam = "SuppPtVals" returns a pointer to an array of yi, i = 1, ..., l, i.e. the values at the supporting points.

ii. GetDesignVarVec( const double * & pdPointer, const int i ) const
with i = 0 (default) returns a pointer to the design variable vector.
GetDesignVar( const int iVarIdx , double & dValue, const int iVectorIdx ) const
with iVectorIdx = 0 (default) returns the value of the iVarIdx'th design variable of the design variable vector.

b) Providing new function values
Function values of the model function h as well as of the constraints have to be calculated for the variable vector at each supporting point. For access to the design variable vector see 6(a)ii. After calculating the values, these must be passed to the NlpinfWrapper object using:

• PutConstrValVec( const double * const pdConstrVals,

const int iParSysIdx ),
with iParSysIdx = 0 (default), where pdConstrVals points to an array containing the values of each constraint at the design variable vector provided by NlpinfWrapper.

• PutModelVal( const double dModelVal, const int iSuppPtIdx ),
where iSuppPtIdx is the index of the supporting point (beginning with 0) and dModelVal defines the value of the model function at the current design variable vector.

Alternatively you can employ

• PutConstrVal( const int iConstrIdx, const double dConstrValue,
const int iParSysIdx ),
with iParSysIdx = 0 (default), for iConstrIdx = 0,...,NumberOfConstraints - 1.

• PutModelValVec( const double * const pdModelVal ),
where pdModelVal is a double array of length equal to the number of supporting points, containing the corresponding model function values.

c) Providing new gradient values
Gradients must be calculated for the model function and the constraints at the current design variable vector. For access to the design variable vector see 6(a)ii.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx,
const double dDerivativeValue )
for iVariableIdx = 0,...,NumberOfVariables - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1,
where dDerivativeValue is defined as the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th variable.


Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1,
where pdGradient is a pointer to the gradient of the iConstrIdx'th constraint.

For the gradient of the model function you can use

• PutDerivModel( const int iVariableIdx, const double dDerivVal,
const int iSuppPtIdx ),
for iSuppPtIdx = 0,...,NumSuppPts - 1
and iVariableIdx = 0,...,NumberOfVariables - 1,
where dDerivVal is defined as the derivative of the model function with respect to the iVariableIdx'th variable at the current design variable vector.

• PutGradModel( const double * pdGradient, const int iSuppPtIdx ),
for iSuppPtIdx = 0,...,NumSuppPts - 1, where pdGradient is a pointer to the gradient of the model function at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const

Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const

Returns the value of the iConstrIdx'th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx , const int iVariableIdx,
double & dDerivativeValue ) const

Returns the value of the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th design variable at the last solution vector.

• GetDerivModel( const int iVariableIdx, double & dDerivativeValue,
const int iSuppPtIdx ) const

Returns the value of the derivative of the model function with respect to the iVariableIdx'th design variable at the iSuppPtIdx'th supporting point.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx ) const

with iParSysIdx = 0 (default) returns the value of the iVariableIdx'th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const

with iParSysIdx = 0 (default) returns a pointer to the last solution vector.

• Get( const char * pcParam, double & dResidual )

with pcParam = "Residual" returns the residual.


26. NlpL1Wrapper - Constrained Data Fitting in the L1-Norm

The Fortran subroutine NLPL1 by Schittkowski solves constrained nonlinear programming problems where the sum of absolute nonlinear function values is to be minimized. It is assumed that all functions are continuously differentiable. By introducing additional variables and nonlinear inequality constraints, the problem is transformed into a general smooth nonlinear program subsequently solved by the sequential quadratic programming (SQP) code NLPQLP. The usage of the code is documented, and an illustrative example is presented. The sections 26.1 and 26.2 are taken from Schittkowski [141].

26.1. Introduction

L1 optimization problems arise in many practical situations, for example in approximation or when fitting a model function to given data in the L1-norm. In this particular case, a mathematical model is available in form of one or several equations, and the goal is to estimate some unknown parameters of the model. Available experimental data are exploited to minimize the distance of the model function, in most cases evaluated at certain time values, from data measured at the same time values. An extensive discussion of data fitting especially in case of dynamical systems is given by Schittkowski [128]. The mathematical problem we want to solve is given in the form

\[
x \in \mathbb{R}^n :\quad
\begin{array}{l}
\min \;\; \sum_{i=1}^{l} |f_i(x)| \\[4pt]
g_j(x) = 0 ,\quad j = 1, \ldots, m_e , \\
g_j(x) \geq 0 ,\quad j = m_e + 1, \ldots, m , \\
x_l \leq x \leq x_u .
\end{array}
\tag{26.1}
\]

It is assumed that f1, ..., fl and g1, ..., gm are continuously differentiable functions.

In this paper, we consider the question of how an existing nonlinear programming code can be used to solve constrained L1 optimization problems in an efficient and robust way after a suitable transformation. In a very similar way, also L∞, min-max, and least squares problems can be solved efficiently by an SQP code, see chapters 25, 24, 27 or Schittkowski [128, 131, 134, 142].

26.2. The Transformed Optimization Problem

We consider the constrained nonlinear L1 problem (26.1), and introduce 2l additional variables z_1, ..., z_{2l} and 2l additional nonlinear inequality constraints of the form

\[
f_i(x) + z_i \geq 0 , \qquad
-f_i(x) + z_{l+i} \geq 0 ,
\tag{26.2}
\]


i = 1, ..., l. The following equivalent problem is to be solved by an SQP method,

\[
(x, z) \in \mathbb{R}^{n+2l} :\quad
\begin{array}{l}
\min \;\; \sum_{i=1}^{2l} z_i \\[4pt]
g_j(x) = 0 ,\quad j = 1, \ldots, m_e , \\
g_j(x) \geq 0 ,\quad j = m_e + 1, \ldots, m , \\
f_i(x) + z_i \geq 0 ,\quad i = 1, \ldots, l , \\
-f_i(x) + z_{l+i} \geq 0 ,\quad i = 1, \ldots, l , \\
x_l \leq x \leq x_u , \\
z \geq 0 .
\end{array}
\tag{26.3}
\]

At a solution of (26.3), the optimal slack values are z_i = max(0, -f_i(x)) and z_{l+i} = max(0, f_i(x)), so that z_i + z_{l+i} = |f_i(x)|; this is why (26.3) is equivalent to (26.1).

In this case, the quadratic programming subproblem which has to be solved in each step of an SQP method has the form

\[
(d, e) \in \mathbb{R}^{n+2l} :\quad
\begin{array}{l}
\min \;\; \tfrac{1}{2} (d^T, e^T) \, B_k \begin{pmatrix} d \\ e \end{pmatrix} + \sum_{i=1}^{2l} e_i \\[4pt]
\nabla g_j(x_k)^T d + g_j(x_k) = 0 ,\quad j = 1, \ldots, m_e , \\
\nabla g_j(x_k)^T d + g_j(x_k) \geq 0 ,\quad j = m_e + 1, \ldots, m , \\
\nabla f_i(x_k)^T d + e_i + f_i(x_k) + z_i^k \geq 0 ,\quad i = 1, \ldots, l , \\
-\nabla f_i(x_k)^T d + e_{l+i} - f_i(x_k) + z_{l+i}^k \geq 0 ,\quad i = 1, \ldots, l , \\
x_l - x_k \leq d \leq x_u - x_k , \\
e \geq 0 .
\end{array}
\tag{26.4}
\]

B_k ∈ R^{(n+2l)×(n+2l)} is a quasi-Newton approximation of the Hessian of the Lagrangian function of (26.3). A new iterate is then obtained from

\[
x_{k+1} = x_k + \alpha_k d_k , \qquad z_{k+1} = z_k + \alpha_k e_k ,
\]

where d_k ∈ R^n and e_k ∈ R^{2l} form a solution of (26.4), and α_k is a steplength parameter obtained by forcing a sufficient descent of a merit function.

The proposed transformation (26.3) is independent of the SQP method used, so that available codes can be used in the form of a black box. Our implementation calls the code NLPQLP (see chapter 2 or [132]).
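To see the mechanics of (26.2) and (26.3) in the smallest possible case, consider the following illustration (added here for clarity; it is not part of [141]): let n = l = 1, drop all constraints and bounds, and take f_1(x) = x, i.e., minimize |x|. The transformed problem reads

\[
\min_{x,\, z_1,\, z_2} \; z_1 + z_2
\quad \text{s.t.} \quad
x + z_1 \geq 0 , \qquad -x + z_2 \geq 0 , \qquad z_1, z_2 \geq 0 .
\]

For any fixed x, the smallest feasible slacks are z_1 = max(0, -x) and z_2 = max(0, x), so the optimal objective value equals |x|, while the SQP code only ever sees smooth functions.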

26.3. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and thereby use an automated loop over StartOptimizer and the function/gradient evaluations. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate an NlpL1Wrapper object employing NlpL1Wrapper()


2. a) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

b) Put the number of supporting points to the object:
Put( const char * pcParam , const int * piNumSuppPts ) ;

or

Put( const char * pcParam , const int iNumSuppPts ) ;

where pcParam = "NumSuppPts".

c) Put a guess for the approximate size of the residual, i.e., a low positive real number if the residual is supposed to be small, and a large one in the order of 1 if the residual is supposed to be large:
Put( const char * pcParam , const double * pdApproxSize ) ;

or

Put( const char * pcParam , const double dApproxSize ) ;

where pcParam = "ApproxSize".

d) Put the values for other parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;

or

Put( const char * pcParam , const int iValue ) ;

where pcParam is one of the following parameters:

– "MaxNumIter" Put maximum number of iterations.

– "MaxNumIterLS" Put maximum number of function calls during line search.

– "StackSizeLS" Stack size for storing merit function values at previous iter-ations for non-monotone line search (e.g. 10). In case of StackSizeLS == 0,monotone line search is performed. StackSizeLS should not be greater than50.

– "OutputLevel" Put desired output level.

– "FortranOutputLevel" If >0, the output of the fortran code is written toan additional output file. This output might give more detailed informationabout errors.0: no additional output (default)

– "OutputUnit" Put output unit for the Fortran output.

– "OutUnit" Same as ”OutputUnit”

• Put( const char * pcParam , const double * pdValue ) ;

orPut( const char * pcParam , const double dValue ) ;

where pcParam is one of the following parameters:

– "TermAcc" Put desired final accuracy.

– "AccQP" Put desired accuracy for QP solver.

– "RestartParam" Parameter for initializing a restart in case of an uphillsearch direction by setting the BFGS-update matrix to RestartParam*I,where I denotes the identity matrix. The number of restarts is bounded by

255

Page 262: NLP++ Optimization Toolbox+TheoryManual.pdf · 2013-01-22 · kowski, and van de Braak [17]. The general availability of parallel computers and in particular of distributed computing

26. NlpL1Wrapper - Constrained Data Fitting in the L1-Norm

MaxNumIterLS. No restart is performed if RestartParam is less than 1. Mustbe greater than 0 (e.g. 100).

– "SuppPtCoords" Put coordinates of supporting points (t0,y0,t1,y1,...).

– "SuppPtTimes" Put times of supporting points.

– "SuppPtVals" Put values of supporting points.

– "Weights" Put weights.

• Put( const char * pcParam , const char * pcValue ) ;

orPut( const char * pcParam , const char cValue ) ;

where pcParam = "OutputFileName" to set the name of the output file.

e) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

f) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values and provide initial guess.

a) Set upper and lower bounds using

PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1.

or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must point to arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set. If no bounds are set, they default to -1E30 and +1E30.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess )

where pdInitialGuess must be a double array of length equal to the number of variables, or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.

4. Start NlpL1-Routine: StartOptimizer()

5. Check status of the object using GetStatus( int & iStatus) const


• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): NlpL1Wrapper needs new function values → 6b Providing new function values. After passing these values to the NlpL1Wrapper object, go to 4.

• if iStatus equals EvalGradId(): NlpL1Wrapper needs new values for gradients → 6c Providing new gradient values. After passing these values to the NlpL1Wrapper object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const

Returns the number of constraints.
GetNumDv( int & iNumDesignVar ) const
Returns the number of design variables.
Get( const char * pcParam, int & iNumSuppPts )
where pcParam = "NumSuppPts" returns the number of supporting points.
Get( const char * pcParam, double * & pdSuppPtTimes ),
where pcParam = "SuppPtTimes" returns a pointer to an array of ti, i = 1, ..., l, i.e. the time coordinates of the supporting points.
Get( const char * pcParam, double * & pdSuppPtVals ),
where pcParam = "SuppPtVals" returns a pointer to an array of yi, i = 1, ..., l, i.e. the values at the supporting points.

ii. GetDesignVarVec( const double * & pdPointer, const int i ) const
with i = 0 (default) returns a pointer to the design variable vector.
GetDesignVar( const int iVarIdx , double & dValue, const int iVectorIdx ) const
with iVectorIdx = 0 (default) returns the value of the iVarIdx'th design variable of the design variable vector.

b) Providing new function values
Function values of the model function h as well as of the constraints have to be calculated for the variable vector at each supporting point. For access to the design variable vector see 6(a)ii. After calculating the values, these must be passed to the NlpL1Wrapper object using:

• PutConstrValVec( const double * const pdConstrVals,
const int iParSysIdx ),
with iParSysIdx = 0 (default), where pdConstrVals points to an array containing the values of each constraint at the design variable vector provided by NlpL1Wrapper.

• PutModelVal( const double dModelVal, const int iSuppPtIdx ),
where iSuppPtIdx is the index of the supporting point (beginning with 0) and dModelVal defines the value of the model function at the current design variable vector.


Alternatively you can employ

• PutConstrVal( const int iConstrIdx, const double dConstrValue,
const int iParSysIdx ),
with iParSysIdx = 0 (default), for iConstrIdx = 0,...,NumberOfConstraints - 1.

• PutModelValVec( const double * const pdModelVal ),
where pdModelVal is a double array of length equal to the number of supporting points, containing the corresponding model function values.

c) Providing new gradient values
Gradients must be calculated for the model function and the constraints at the current design variable vector. For access to the design variable vector see 6(a)ii.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx,
const double dDerivativeValue )
for iVariableIdx = 0,...,NumberOfVariables - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1,
where dDerivativeValue is defined as the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th variable.

Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1,
where pdGradient is a pointer to the gradient of the iConstrIdx'th constraint.

For the gradient of the model function you can use

• PutDerivModel( const int iVariableIdx, const double dDerivVal,
const int iSuppPtIdx ),
for iSuppPtIdx = 0,...,NumSuppPts - 1
and iVariableIdx = 0,...,NumberOfVariables - 1,
where dDerivVal is defined as the derivative of the model function with respect to the iVariableIdx'th variable at the current design variable vector.

• PutGradModel( const double * pdGradient, const int iSuppPtIdx ),
for iSuppPtIdx = 0,...,NumSuppPts - 1, where pdGradient is a pointer to the gradient of the model function at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const

Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const

Returns the value of the iConstrIdx'th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx , const int iVariableIdx,
double & dDerivativeValue ) const

Returns the value of the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th design variable at the last solution vector.

• GetDerivModel( const int iVariableIdx, double & dDerivativeValue,
const int iSuppPtIdx ) const

Returns the value of the derivative of the model function with respect to the iVariableIdx'th design variable at the iSuppPtIdx'th supporting point.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx ) const

with iParSysIdx = 0 (default) returns the value of the iVariableIdx'th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const

with iParSysIdx = 0 (default) returns a pointer to the last solution vector.

• Get( const char * pcParam, double & dResidual )

with pcParam = "Residual" returns the residual.


27. LeastSquaresWrapper - Constrained Data Fitting in the L2-Norm

LeastSquaresWrapper is a C++ wrapper for the Fortran subroutines NLPLSX and NLPLSQ. The Fortran subroutines by Schittkowski solve constrained least squares nonlinear programming problems, where the sum of squared nonlinear functions is to be minimized. It is assumed that all functions are continuously differentiable.

The subroutine NLPLSX is applied when the problem has more than 200 data points. The problem is directly solved by the SQP routine NLPQLP.

If the problem has 200 or less data points, the subroutine NLPLSQ is applied. By introducing additional variables and nonlinear equality constraints, the problem is transformed into a general smooth nonlinear program subsequently solved by the sequential quadratic programming (SQP) code NLPQLP. It can be shown that typical features of special purpose algorithms are retained, i.e., a combination of a Gauss-Newton and a quasi-Newton search direction. The additionally introduced variables are eliminated in the quadratic programming subproblem, so that calculation time is not increased significantly. Some comparative numerical results are included, the usage of the code is documented, and an illustrative example is presented. The sections 27.1, 27.2, 27.3, 27.4 are taken from Schittkowski [134].

27.1. Introduction

Nonlinear least squares optimization is extremely important in many practical situations. Typical applications are maximum likelihood estimation, nonlinear regression, data fitting, system identification, or parameter estimation, respectively. In these cases, a mathematical model is available in form of one or several equations, and the goal is to estimate some unknown parameters of the model. Available experimental data are exploited to minimize the distance of the model function, in most cases evaluated at certain time values, from data measured at the same time values. An extensive discussion of data fitting especially in case of dynamical systems is given by Schittkowski [128].

The mathematical problem we want to solve is given in the form

\[
x \in \mathbb{R}^n :\quad
\begin{array}{l}
\min \;\; \tfrac{1}{2} \sum_{i=1}^{l} f_i(x)^2 \\[4pt]
g_j(x) = 0 ,\quad j = 1, \ldots, m_e , \\
g_j(x) \geq 0 ,\quad j = m_e + 1, \ldots, m , \\
x_l \leq x \leq x_u .
\end{array}
\tag{27.1}
\]

It is assumed that f1, ..., fl and g1, ..., gm are continuously differentiable.

Although many nonlinear least squares programs were developed in the past, see Hiebert [67] for an overview and numerical comparison, only very few programs were written for the nonlinearly constrained case, see, e.g., Holt and Fletcher [70], Lindstrom [86], Mahdavi-Amiri and Bartels [91], or Fernanda et al. [42]. However, the implementation of one of these or any similar special purpose code requires a large amount of theoretical and numerical analysis.

In this paper, we consider the question of how an existing nonlinear programming code can be used to solve constrained nonlinear least squares problems in an efficient and robust way. We will see that a simple transformation of the model under consideration and its subsequent solution by a sequential quadratic programming (SQP) algorithm retains typical features of special purpose methods, i.e., the combination of a Gauss-Newton search direction with a quasi-Newton correction. Numerical test results indicate that the proposed approach is as efficient as special purpose methods, although the required programming effort is negligible provided that an SQP code is available.

In a very similar way, also L1, L∞, and min-max problems can be solved efficiently by an SQP code after a suitable transformation, see chapters 26, 25, 24 or Schittkowski [128, 131, 135].

The following section describes some basic features of least squares optimization, especially some properties of Gauss-Newton and related methods. The transformation of a least squares problem into a special nonlinear program is described in Section 27.3. We will discuss how some basic features of special purpose algorithms are retained. The same ideas are extended to the constrained case in Section 27.4. Section 27.5 contains a documentation of the code. An example implementation can be found in chapter 28.

27.2. Least Squares Methods

First, we consider unconstrained least squares problems, where we omit constraints and bounds to simplify the notation,

\[
\min \;\; \tfrac{1}{2} \sum_{i=1}^{l} f_i(x)^2 , \qquad x \in \mathbb{R}^n .
\tag{27.2}
\]

These problems possess a long history in mathematical programming and are extremely important in practice, particularly in nonlinear data fitting or maximum likelihood estimation, see, e.g., Bjorck [6] or Dennis [30]. Thus, a large number of mathematical algorithms is available for solving (27.2).

To understand their basic features, we introduce the notation

\[
F(x) = (f_1(x), \ldots, f_l(x))^T
\]

and let

\[
f(x) = \tfrac{1}{2} \sum_{i=1}^{l} f_i(x)^2 .
\]

Then

\[
\nabla f(x) = \nabla F(x) F(x)
\tag{27.3}
\]

defines the gradient of the objective function with ∇F(x) = (∇f_1(x), ..., ∇f_l(x)). If we assume now that all functions f_1, ..., f_l are twice continuously differentiable, we get the Hessian matrix of f by

\[
\nabla^2 f(x) = \nabla F(x) \nabla F(x)^T + B(x) ,
\tag{27.4}
\]


where

\[
B(x) = \sum_{i=1}^{l} f_i(x) \nabla^2 f_i(x) .
\tag{27.5}
\]

Proceeding from a given iterate x_k, Newton's method can be applied to (27.2) to get a search direction d_k ∈ R^n by solving the linear system

\[
\nabla^2 f(x_k) d + \nabla f(x_k) = 0
\]

or, alternatively,

\[
\nabla F(x_k) \nabla F(x_k)^T d + B(x_k) d + \nabla F(x_k) F(x_k) = 0 .
\tag{27.6}
\]

Assume for a moment that

\[
F(x^\star) = (f_1(x^\star), \ldots, f_l(x^\star))^T = 0
\]

at an optimal solution x*. A possible situation is a perfect fit, where model function values coincide with experimental data. Because of B(x*) = 0, we neglect the matrix B(x_k) in (27.6), see also (27.5). Then (27.6) defines the so-called normal equations of the linear least squares problem

\[
\min \;\; \| \nabla F(x_k)^T d + F(x_k) \| , \qquad d \in \mathbb{R}^n .
\tag{27.7}
\]

A new iterate is obtained by x_{k+1} = x_k + α_k d_k, where d_k is a solution of (27.7) and where α_k denotes a suitable steplength parameter. It is obvious that a quadratic convergence rate is achieved when starting sufficiently close to an optimal solution. The above calculation of a search direction is known as the Gauss-Newton method and represents the traditional way to solve nonlinear least squares problems, see Bjorck [6] for more details. In general, the Gauss-Newton method possesses the attractive feature that it converges quadratically although we only provide first order information.

However, the assumptions of the convergence theorem of Gauss-Newton methods are very strong and cannot be satisfied in real situations. We have to expect difficulties in case of non-zero residuals, rank-deficient Jacobian matrices, non-continuous derivatives, and starting points far away from a solution. Further difficulties arise when trying to solve large residual problems, where F(x*)^T F(x*) is not sufficiently small, for example relative to ‖∇F(x*)‖. Numerous proposals have been made in the past to deal with this situation, and it is outside the scope of this section to give a review of all possible attempts developed in the last 30 years. Only a few remarks are presented to illustrate basic features of the main approaches; for further reviews see Gill, Murray and Wright [54], Ramsin and Wedin [112], or Dennis [31].

A very popular method is known under the name Levenberg-Marquardt algorithm, see Levenberg [81] and Marquardt [92]. The key idea is to replace the Hessian in (27.6) by a multiple of the identity matrix, say λ_k I, with a suitable positive factor λ_k. We get a uniquely solvable system of linear equations of the form

\[
\nabla F(x_k) \nabla F(x_k)^T d + \lambda_k d + \nabla F(x_k) F(x_k) = 0 .
\]

For the choice of λ_k and the relationship to so-called trust region methods, see More [94].

A more sophisticated idea is to replace B(x_k) in (27.6) by a quasi-Newton matrix B_k, see Dennis [30]. But some additional safeguards are necessary to deal with indefinite matrices ∇F(x_k)∇F(x_k)^T + B_k in order to get a descent direction. A modified algorithm is proposed by Gill and Murray [52], where B_k is either a second-order approximation of B(x_k) or a quasi-Newton matrix. In this case, a diagonal matrix is added to ∇F(x_k)∇F(x_k)^T + B_k to obtain a positive definite matrix. Lindstrom [85] proposes a combination of a Gauss-Newton and a Newton method by using a certain subspace minimization technique.

If, however, the residuals are too large, there is no possibility to exploit the special structure, and a general unconstrained minimization algorithm, for example a quasi-Newton method, can be applied as well.

27.3. The SQP-Gauss-Newton Method

Many efficient special purpose computer programs are available to solve unconstrained nonlinear least squares problems. On the other hand, there exists a very simple approach to combine the valuable properties of Gauss-Newton methods with those of SQP algorithms in a straightforward way with almost no additional effort. We proceed from an unconstrained least squares problem in the form

\[
\min \;\; \tfrac{1}{2} \sum_{i=1}^{l} f_i(x)^2 , \qquad x \in \mathbb{R}^n ,
\tag{27.8}
\]

see also (27.2). Since most nonlinear least squares problems are ill-conditioned, it is not recommended to solve (27.8) directly by a general nonlinear programming method. But we will see in this section that a simple transformation of the original problem and its subsequent solution by an SQP method retains typical features of a special purpose code and prevents the need to take care of negative eigenvalues of an approximated Hessian matrix, as in the case of alternative approaches. The corresponding computer program can be implemented in a few lines provided that an SQP algorithm is available.

The transformation, also described in Schittkowski [125, 128, 131], consists of introducing l additional variables z = (z_1, ..., z_l)^T and l additional nonlinear equality constraints of the form

\[
f_i(x) - z_i = 0 , \qquad i = 1, \ldots, l .
\tag{27.9}
\]

Then the equivalent transformed problem is

\[
(x, z) \in \mathbb{R}^{n+l} :\quad
\begin{array}{l}
\min \;\; \tfrac{1}{2} z^T z \\[4pt]
F(x) - z = 0 ,
\end{array}
\tag{27.10}
\]

with F(x) = (f_1(x), ..., f_l(x))^T. We now consider (27.10) as a general nonlinear programming problem of the form

\[
\bar{x} \in \mathbb{R}^{\bar{n}} :\quad
\begin{array}{l}
\min \;\; \bar{f}(\bar{x}) \\[4pt]
\bar{g}(\bar{x}) = 0
\end{array}
\tag{27.11}
\]

with n̄ = n + l, x̄ = (x, z), f̄(x, z) = 1/2 z^T z, ḡ(x, z) = F(x) − z, and apply an SQP algorithm, see Spellucci [150], Stoer [151], or Schittkowski [123, 128]. The quadratic programming subproblem is of the form

\[
\bar{d} \in \mathbb{R}^{\bar{n}} :\quad
\begin{array}{l}
\min \;\; \tfrac{1}{2} \bar{d}^T \bar{B}_k \bar{d} + \nabla \bar{f}(\bar{x}_k)^T \bar{d} \\[4pt]
\nabla \bar{g}(\bar{x}_k)^T \bar{d} + \bar{g}(\bar{x}_k) = 0 .
\end{array}
\tag{27.12}
\]


Here, x̄_k = (x_k, z_k) is a given iterate and

\[
\bar{B}_k =
\begin{pmatrix}
B_k & C_k \\
C_k^T & D_k
\end{pmatrix}
\tag{27.13}
\]

with B_k ∈ R^{n×n}, C_k ∈ R^{n×l}, and D_k ∈ R^{l×l}, a given approximation of the Hessian of the Lagrangian function L(x̄, u) defined by

\[
L(\bar{x}, u) = \bar{f}(\bar{x}) - u^T \bar{g}(\bar{x})
             = \tfrac{1}{2} z^T z - u^T (F(x) - z) .
\]

Since

\[
\nabla_{\bar{x}} L(\bar{x}, u) =
\begin{pmatrix}
-\nabla F(x) u \\
z + u
\end{pmatrix}
\]

and

\[
\nabla^2_{\bar{x}} L(\bar{x}, u) =
\begin{pmatrix}
B(x, u) & 0 \\
0 & I
\end{pmatrix}
\]

with

\[
B(x, u) = -\sum_{i=1}^{l} u_i \nabla^2 f_i(x) ,
\tag{27.14}
\]

it seems to be reasonable to proceed now from a quasi-Newton matrix given by

\[
\bar{B}_k =
\begin{pmatrix}
B_k & 0 \\
0 & I
\end{pmatrix} ,
\tag{27.15}
\]

where B_k ∈ R^{n×n} is a suitable positive definite approximation of B(x_k, u_k). Insertion of this B̄_k into (27.12) leads to the equivalent quadratic programming subproblem

\[
(d, e) \in \mathbb{R}^{n+l} :\quad
\begin{array}{l}
\min \;\; \tfrac{1}{2} d^T B_k d + \tfrac{1}{2} e^T e + z_k^T e \\[4pt]
\nabla F(x_k)^T d - e + F(x_k) - z_k = 0 ,
\end{array}
\tag{27.16}
\]

where we replaced d̄ by (d, e). Some simple calculations show that the solution of the above quadratic programming problem is identified by the linear system

\[
\nabla F(x_k) \nabla F(x_k)^T d + B_k d + \nabla F(x_k) F(x_k) = 0 .
\tag{27.17}
\]

This equation is identical to (27.6) if B_k = B(x_k), and we obtain a Newton direction for solving the unconstrained least squares problem (27.8).

Note that B(x) defined by (27.5) and B(x, u) defined by (27.14) coincide at an optimal solution of the least squares problem, since F(x_k) = −u_k. Based on the above considerations, an SQP method can be applied to solve (27.10) directly. The quasi-Newton matrices B_k are always positive definite, and consequently so is the matrix B̄_k defined by (27.13). Therefore, we avoid the numerical difficulties imposed by negative eigenvalues that are found in the usual approaches for solving least squares problems.


When starting the SQP method, one could proceed from a user-provided initial guess x_0 for the variables and define

\[
z_0 = F(x_0) , \qquad
\bar{B}_0 =
\begin{pmatrix}
\mu I & 0 \\
0 & I
\end{pmatrix} ,
\tag{27.18}
\]

guaranteeing a feasible starting point x̄_0. The choice of B̄_0 is of the form (27.15) and allows a user to provide some information on the estimated size of the residuals, if available. If it is known that the final norm F(x*)^T F(x*) is close to zero at the optimal solution x*, the user could choose a small μ in (27.18). At least in the first iterates, the search directions are then more similar to a Gauss-Newton direction. Otherwise, a user could define μ = 1 if a large residual is expected.
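In the wrappers of this manual, this residual-size guess is supplied through the "ApproxSize" parameter (see Section 27.5), which presumably plays the role of μ in (27.18). A minimal sketch:

#include "LeastSquaresWrapper.h"

// Sketch: supply the residual-size guess of (27.18) via "ApproxSize".
// A small value biases early search directions toward Gauss-Newton steps;
// a value in the order of 1 suits a large expected residual.
void SetResidualGuess( LeastSquaresWrapper & opt, bool bSmallResidualExpected )
{
    opt.Put( "ApproxSize", bSmallResidualExpected ? 0.01 : 1.0 );
}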

Under the assumption that B̄_k is decomposed in the form (27.15) and that B_k is updated by the BFGS formula, B̄_{k+1} is again decomposed in the form (27.15), see Schittkowski [125]. The decomposition (27.15) is rarely satisfied in practice, but seems to be reasonable, since the intention is to find an x* ∈ R^n with

\[
\nabla F(x^\star) F(x^\star) = 0 ,
\]

and ∇F(x_k)^T d_k + F(x_k) is a Taylor approximation of F(x_{k+1}). Note also that the usual way to derive Newton's method is to assume that the optimality condition is satisfied for a certain linearization of a given iterate x_k, and to use this linearized system for obtaining a new iterate.

Example 27.3.1 We consider the banana function

\[
f(x_1, x_2) = 100 (x_2 - x_1^2)^2 + (1 - x_1)^2 .
\]

When applying the nonlinear programming code NLPQLP of Schittkowski [123, 132], an implementation of a general purpose SQP method, we get the iterates of Table 27.1 when starting at x_0 = (−1.2, 1.0)^T. The objective function is scaled by 0.5 to adjust this factor in the least squares formulation (27.1). The last column contains an internal stopping condition based on the optimality criterion, in our unconstrained case equal to

\[
| \nabla f(x_k)^T B_k^{-1} \nabla f(x_k) |
\]

with a quasi-Newton matrix B_k. We observe a very fast final convergence speed, but a relatively large number of iterations.

The transformation discussed above leads to the equivalent constrained nonlinear programming problem

\[
x_1, x_2, z_1, z_2 :\quad
\begin{array}{l}
\min \;\; \tfrac{1}{2} (z_1^2 + z_2^2) \\[4pt]
10 (x_2 - x_1^2) - z_1 = 0 , \\
1 - x_1 - z_2 = 0 .
\end{array}
\]

NLPQLP computes the results of Table 27.2, where the column r(x_k) shows in addition the maximal constraint violation.


k      f(x_k)            s(x_k)
0      24.20             0.54 · 10^5
1      12.21             0.63 · 10^2
2      7.98              0.74 · 10^2
...
35     0.29 · 10^-3      0.47 · 10^-3
36     0.19 · 10^-4      0.39 · 10^-4
37     0.12 · 10^-5      0.25 · 10^-5
38     0.12 · 10^-7      0.24 · 10^-7
39     0.58 · 10^-12     0.11 · 10^-11
40     0.21 · 10^-15     0.42 · 10^-15

Table 27.1.: NLP Formulation of Banana Function

k      f(x_k)            r(x_k)            s(x_k)
0      12.10             0.0               0.24 · 10^2
1      0.96 · 10^-10     0.48 · 10^2       0.23 · 10^-4
2      0.81 · 10^-10     0.40 · 10^-10     0.16 · 10^-9
3      0.88 · 10^-21     0.14 · 10^-8      0.18 · 10^-20

Table 27.2.: Least Squares Formulation of Banana Function

27.4. Constrained Least Squares Problems

Now we consider constrained nonlinear least squares problems

\[
x \in \mathbb{R}^n :\quad
\begin{array}{l}
\min \;\; \tfrac{1}{2} \sum_{i=1}^{l} f_i(x)^2 \\[4pt]
g_j(x) = 0 ,\quad j = 1, \ldots, m_e , \\
g_j(x) \geq 0 ,\quad j = m_e + 1, \ldots, m , \\
x_l \leq x \leq x_u .
\end{array}
\tag{27.19}
\]

A combination of the SQP method with the Gauss-Newton method is proposed by Mahdavi-Amiri [90]. Lindstrom [86] developed a similar method based on an active set idea, leading to a sequence of equality constrained linear least squares problems. A least squares code for linearly constrained problems was published by Hanson and Krogh [65] that is based on a tensor model.

On the other hand, a couple of SQP codes are available for solving general smooth nonlinear programming problems, for example VF02AD (Powell [105]), NLPQLP (Schittkowski [123, 132]), NPSOL (Gill, Murray, Saunders, Wright [53]), or DONLP2 (Spellucci [150]). Since most nonlinear least squares problems are ill-conditioned, it is not recommended to solve (27.19) directly by a general nonlinear programming method, as shown in the previous section. The same transformation used before can be extended to solve constrained problems as well. The subsequent solution by an SQP method retains typical features of a special purpose code and is easily implemented.


As outlined in the previous section, we introduce l additional variables z = (z_1, ..., z_l)^T and l additional nonlinear equality constraints of the form

\[
f_i(x) - z_i = 0 ,
\]

i = 1, ..., l. The following transformed problem is to be solved by an SQP method,

\[
(x, z) \in \mathbb{R}^{n+l} :\quad
\begin{array}{l}
\min \;\; \tfrac{1}{2} z^T z \\[4pt]
f_i(x) - z_i = 0 ,\quad i = 1, \ldots, l , \\
g_j(x) = 0 ,\quad j = 1, \ldots, m_e , \\
g_j(x) \geq 0 ,\quad j = m_e + 1, \ldots, m , \\
x_l \leq x \leq x_u .
\end{array}
\tag{27.20}
\]

In this case, the quadratic programming subproblem has the form

\[
(d, e) \in \mathbb{R}^{n+l} :\quad
\begin{array}{l}
\min \;\; \tfrac{1}{2} (d^T, e^T) \, \bar{B}_k \begin{pmatrix} d \\ e \end{pmatrix} + z_k^T e \\[4pt]
\nabla f_i(x_k)^T d - e_i + f_i(x_k) - z_i^k = 0 ,\quad i = 1, \ldots, l , \\
\nabla g_j(x_k)^T d + g_j(x_k) = 0 ,\quad j = 1, \ldots, m_e , \\
\nabla g_j(x_k)^T d + g_j(x_k) \geq 0 ,\quad j = m_e + 1, \ldots, m , \\
x_l - x_k \leq d \leq x_u - x_k .
\end{array}
\tag{27.21}
\]

It is possible to simplify the problem by substituting

\[
e = \nabla F(x_k)^T d + F(x_k) - z_k ,
\]

so that the quadratic programming subproblem depends on only n variables and m linear constraints. This is an important observation from the numerical point of view, since the computational effort to solve (27.21) reduces from the order of (n + l)^3 to n^3, and the remaining computations in the outer SQP frame are of the order of (n + l)^2. Therefore, the computational work involved in the proposed least squares algorithm is comparable to the numerical effort required by special purpose methods, at least if the number l of observations is not too large.

When implementing the above proposal, one has to be aware that the quadratic programming subproblem is sometimes expanded by an additional variable δ, so that some safeguards are required. Except for this limitation, the proposed transformation (27.20) is independent of the variant of the SQP method used, so that available codes can be used in the form of a black box.

In principle, one could use the starting points proposed by (27.18). Numerical experience suggests, however, starting from z_0 = F(x_0) only if the constraints are satisfied at x_0,

\[
g_j(x_0) = 0 ,\quad j = 1, \ldots, m_e , \qquad
g_j(x_0) \geq 0 ,\quad j = m_e + 1, \ldots, m .
\]

In all other cases, it is proposed to proceed from z_0 = 0.


A final remark concerns the theoretical convergence of the algorithm. Since the original problem is transformed into a general nonlinear programming problem, we can apply all convergence results known for SQP methods. If an augmented Lagrangian function is preferred for the merit function, a global convergence theorem is found in Schittkowski [120]. The theorem states that when starting from an arbitrary initial value, a Karush-Kuhn-Tucker point is approximated, i.e., a point satisfying the necessary optimality conditions. If, on the other hand, an iterate is sufficiently close to an optimal solution and if the steplength is 1, then the convergence speed of the algorithm is superlinear, see Powell [104] for example. This remark explains the fast final convergence rate one observes in practice.

The assumptions are standard and are required by any special purpose algorithm in one form or another. But in our case, we do not need any regularity conditions for the functions f_1, ..., f_l, i.e., an assumption that the matrix ∇F(x_k) is of full rank, to adapt the mentioned convergence results to the least squares case. The reason is found in the special form of the quadratic programming subproblem (27.21), since the first l constraints are linearly independent and are also independent of the remaining restrictions.

27.5. Program Documentation

Besides the implementation described here, Nlp++ provides the possibility to derive your problem from a given base class and thereby use an automated loop over StartOptimizer and the function/gradient evaluations. For this possibility see chapter 31.

Usage without DefaultLoop

1. Generate a LeastSquaresWrapper object employing LeastSquaresWrapper()

2. a) Put the number of design variables to the object:

PutNumDv( int iNumDv ) ;

b) Put the number of supporting points to the object:
Put( const char * pcParam , const int * piNumSuppPts ) ;

or

Put( const char * pcParam , const int iNumSuppPts ) ;

where pcParam = "NumSuppPts".

c) Only necessary for NLPLSQ, i.e. for 200 or fewer supporting points: Put a guess for the approximate size of the residual, i.e., a low positive real number if the residual is supposed to be small, and a large one in the order of 1 if the residual is supposed to be large:
Put( const char * pcParam , const double * pdApproxSize ) ;

or

Put( const char * pcParam , const double dApproxSize ) ;

where pcParam = "ApproxSize".

d) Put the values for other parameters you want to adjust:

• Put( const char * pcParam , const int * piValue ) ;

or


Put( const char * pcParam , const int iValue ) ;

where pcParam is one of the following parameters:

– "MaxNumIter" Put maximum number of iterations.

– "MaxNumIterLS" Put maximum number of function calls during line search.

– "StackSizeLS" Stack size for storing merit function values at previous iter-ations for non-monotone line search (e.g. 10). In case of StackSizeLS == 0,monotone line search is performed. StackSizeLS should not be greater than50.

– "FortranOutputLevel" If >0, the output of the fortran code is written toan additional output file. This output might give more detailed informationabout errors.0: no additional output (default)

– "OutputUnit" Put output unit for the Fortran output.

– "OutUnit" Same as ”OutputUnit”

• Put( const char * pcParam , const double * pdValue ) ;

orPut( const char * pcParam , const double dValue ) ;

where pcParam is one of the following parameters:

– "TermAcc" Put desired final accuracy.

– "AccQP" Put desired accuracy for QP solver.

– "RestartParam" Parameter for initializing a restart in case of an uphillsearch direction by setting the BFGS-update matrix to RestartParam*I,where I denotes the identity matrix. The number of restarts is bounded byMaxNumIterLS. No restart is performed if RestartParam is less than 1. Mustbe greater than 0 (e.g. 100).

– "SuppPtCoords" Put coordinates of supporting points (t0,y0,t1,y1,...).

– "SuppPtTimes" Put times of supporting points.

– "SuppPtVals" Put values of supporting points.

– "Weights" Put weights.

• Put( const char * pcParam , const char * pcValue ) ;

orPut( const char * pcParam , const char cValue ) ;

where pcParam = "OutputFileName" to set the name of the output file.

e) Put the number of inequality constraint functions to the object:

PutNumIneqConstr( int iNumIneqConstr ) ;

f) Put the number of equality constraint functions to the object:

PutNumEqConstr( int iNumEqConstr ) ;

3. Set boundary values and provide initial guess.

a) Set upper and lower bounds using


PutLowerBound( const int iVariableIndex, const double dLowerBound );

PutUpperBound( const int iVariableIndex, const double dUpperBound );

for iVariableIndex = 0,...,NumberOfVariables - 1.

or

PutUpperBound( const double * pdXu );

PutLowerBound( const double * pdXl );

where pdXu and pdXl must point to arrays of length equal to the number of design variables.

Before setting bounds, the number of design variables must be set. If no bounds are set, they default to -1E30 and +1E30.

b) Provide an initial guess for the optimization vector by

PutInitialGuess( const double * pdInitialGuess )

where pdInitialGuess must be a double array of length equal to the number of variables, or

PutInitialGuess( const int iVariableIndex, const double dInitialGuess )

for iVariableIndex = 0,...,NumberOfVariables - 1.
Before setting an initial guess, the number of design variables must be set.

4. Start LeastSquaresWrapper-Routine: StartOptimizer()

5. Check status of the object using GetStatus( int & iStatus) const

• if iStatus equals SolvedId(): The final accuracy has been achieved, the problem is solved.

• if iStatus is > 0: An error occurred during the solution process.

• if iStatus equals EvalFuncId(): LeastSquaresWrapper needs new function values → 6b Providing new function values. After passing these values to the LeastSquaresWrapper object, go to 4.

• if iStatus equals EvalGradId(): LeastSquaresWrapper needs new values for gradients → 6c Providing new gradient values. After passing these values to the LeastSquaresWrapper object, go to 4.

6. Providing new function and gradient values:

a) Useful methods:

i. GetNumConstr( int & iNumConstr ) const

Returns the number of constraints.
GetNumDv( int & iNumDesignVar ) const
Returns the number of design variables.
Get( const char * pcParam, int & iNumSuppPts )
where pcParam = "NumSuppPts" returns the number of supporting points.
Get( const char * pcParam, double * & pdSuppPtTimes ),
where pcParam = "SuppPtTimes" returns a pointer to an array of ti, i = 1, ..., l, i.e. the time coordinates of the supporting points.
Get( const char * pcParam, double * & pdSuppPtVals ),
where pcParam = "SuppPtVals" returns a pointer to an array of yi, i = 1, ..., l, i.e. the values at the supporting points.

ii. GetDesignVarVec( const double * & pdPointer, const int i ) const
with i = 0 (default) returns a pointer to the design variable vector.
GetDesignVar( const int iVarIdx , double & dValue, const int iVectorIdx ) const
with iVectorIdx = 0 (default) returns the value of the iVarIdx'th design variable of the design variable vector.

b) Providing new function values
Function values of the model function h as well as of the constraints have to be calculated for the variable vector at each supporting point. For access to the design variable vector see 6(a)ii. After calculating the values, these must be passed to the LeastSquaresWrapper object using:

• PutConstrValVec( const double * const pdConstrVals,
const int iParSysIdx ),
with iParSysIdx = 0 (default), where pdConstrVals points to an array containing the values of each constraint at the design variable vector provided by LeastSquaresWrapper.

• PutModelVal( const double dModelVal, const int iSuppPtIdx ),
where iSuppPtIdx is the index of the supporting point (beginning with 0) and dModelVal defines the value of the model function at the current design variable vector.

Alternatively you can employ

• PutConstrVal( const int iConstrIdx, const double dConstrValue,
const int iParSysIdx ),
with iParSysIdx = 0 (default), for iConstrIdx = 0,...,NumberOfConstraints - 1.

• PutModelValVec( const double * const pdModelVal ),
where pdModelVal is a double array of length equal to the number of supporting points, containing the corresponding model function values.

c) Providing new gradient values
Gradients must be calculated for the model function and the constraints at the current design variable vector. For access to the design variable vector see 6(a)ii.

For the gradients of the constraints you can use
PutDerivConstr( const int iConstrIdx, const int iVariableIdx,
const double dDerivativeValue )
for iVariableIdx = 0,...,NumberOfVariables - 1
and iConstrIdx = 0,...,NumberOfConstraints - 1,
where dDerivativeValue is defined as the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th variable.

Alternatively, you can use
PutGradConstr( const int iConstrIdx, const double * pdGradient ),
for iConstrIdx = 0,...,NumberOfConstraints - 1,
where pdGradient is a pointer to the gradient of the iConstrIdx'th constraint.

For the gradient of the model function you can use

• PutDerivModel( const int iVariableIdx, const double dDerivVal,
const int iSuppPtIdx ),
for iSuppPtIdx = 0,...,NumSuppPts - 1
and iVariableIdx = 0,...,NumberOfVariables - 1,
where dDerivVal is defined as the derivative of the model function with respect to the iVariableIdx'th variable at the current design variable vector.

• PutGradModel( const double * pdGradient, const int iSuppPtIdx ),
for iSuppPtIdx = 0,...,NumSuppPts - 1, where pdGradient is a pointer to the gradient of the model function at the current design variable vector.

7. Output

• GetConstrValVec( const double * & pdPointer ) const

Returns a pointer to an array containing the values of all constraint functions at the last solution vector.

• GetConstrVal( const int iConstrIdx, double & dConstrVal ) const

Returns the value of the iConstrIdx'th constraint function at the last solution vector.

• GetDerivConstr( const int iConstrIdx , const int iVariableIdx,
double & dDerivativeValue ) const

Returns the value of the derivative of the iConstrIdx'th constraint with respect to the iVariableIdx'th design variable at the last solution vector.

• GetDerivModel( const int iVariableIdx, double & dDerivativeValue,
const int iSuppPtIdx ) const

Returns the value of the derivative of the model function with respect to the iVariableIdx'th design variable at the iSuppPtIdx'th supporting point.

• GetDesignVar( const int iVariableIdx, double & dValue, const int iParSysIdx ) const

with iParSysIdx = 0 (default) returns the value of the iVariableIdx'th design variable in the last solution vector.

• GetDesignVarVec( const double * & pdPointer, const int iParSysIdx ) const

with iParSysIdx = 0 (default) returns a pointer to the last solution vector.

• Get( const char * pcParam, double & dResidual )

with pcParam = "Residual" returns the residual.


28. Example

As an illustrative example we solve the following problem using DefaultLoop():

Minimize

\[
F_0(x) = \sum_{i=0}^{l-1} \left( h(x, t_i) - y_i \right)^2 ,
\]

where

\[
h(x, t) = \frac{(t^2 + x_1) \, x_0}{t^2 + x_2 t + x_3}
\]

and the l = 11 supporting points are given by

t = (0.0625, 0.0714, 0.0823, 0.1, 0.125, 0.167, 0.25, 0.5, 1, 2, 4)^T ,
y = (0.0246, 0.0235, 0.0323, 0.0342, 0.0456, 0.0627, 0.0844, 0.16, 0.1735, 0.1947, 0.1957)^T .

For the approximation of gradients we use the class ApproxGrad (see chapter 30).

The file Example44.h contains the class definition.

#ifndef Example44_H_INCLUDED
#define Example44_H_INCLUDED

#include "OptProblem.h"

class Example44 : public OptProblem
{
public:

    Example44();

    int FuncEval( bool bGradApprox = false );
    int GradEval();
    int SolveOptProblem();

protected:
    double F( const double x, const double * b ) const;
};

#endif

The file Example44.cpp contains the implementation:

#include "Example44.h"
#include <iostream>
#include <iomanip>
#include <cmath>
#include "NlpinfWrapper.h"
#include "NlpL1Wrapper.h"
#include "NlpmmxWrapper.h"
#include "LeastSquaresWrapper.h"

using std::cout;
using std::endl;
using std::cin;
using std::setprecision;

Example44::Example44()
{
    char s;
    cout << endl << "--- this is Example 44 ---" << endl << endl;
    do
    {
        // Choose algorithm:
        cout << endl << "Which algorithm do you want to use? \n\n";
        cout << " [i] NlpinfWrapper \n";
        cout << " [m] NlpmmxWrapper \n";
        cout << " [l] NlpL1Wrapper \n";
        cout << " [s] LeastSquaresWrapper \n";

        cin >> s;

        if ( s == 'm' )
        {
            m_pOptimizer = new NlpmmxWrapper();
        }
        else if ( s == 'l' )
        {
            m_pOptimizer = new NlpL1Wrapper();
        }
        else if ( s == 'i' )
        {
            m_pOptimizer = new NlpinfWrapper;
        }
        else if ( s == 's' )
        {
            m_pOptimizer = new LeastSquaresWrapper;
        }
        else
        {
            cout << "illegal input!" << endl;
        }
    }
    while ( s != 'l' && s != 'm' && s != 'i' && s != 's' );
}

int Example44::FuncEval( bool bGradApprox )
{
    const double * dX;
    double *       pdFuncVals;
    double *       pdSuppPtTimes;
    int            iNumSuppPts;
    int            iSuppPtIdx;
    int            iNumConstr;

    pdSuppPtTimes = NULL;

    m_pOptimizer->GetNumConstr( iNumConstr );
    m_pOptimizer->Get( "NumSuppPts", iNumSuppPts );
    m_pOptimizer->Get( "SuppPtTimes", pdSuppPtTimes );

    pdFuncVals = new double[ iNumConstr + iNumSuppPts ];

    // Get the current design variable vector
    // from m_pApprox when gradients are being approximated
    // and from m_pOptimizer in case of normal function evaluation.
    if ( bGradApprox == false )
    {
        m_pOptimizer->GetDesignVarVec( dX );
    }
    else
    {
        m_pApprox->GetDesignVarVec( dX );
    }

    // Evaluate the model function at all supporting points
    for ( iSuppPtIdx = 0; iSuppPtIdx < iNumSuppPts; iSuppPtIdx++ )
    {
        pdFuncVals[ iSuppPtIdx ] =
            ( pow( pdSuppPtTimes[ iSuppPtIdx ], 2 ) + dX[1] ) * dX[0]
            / ( pow( pdSuppPtTimes[ iSuppPtIdx ], 2 )
                + dX[2] * pdSuppPtTimes[ iSuppPtIdx ] + dX[3] );
    }

    // Put the function values
    // to m_pApprox when gradients are being approximated,
    // to m_pOptimizer in case of normal function evaluation.
    if ( bGradApprox == false )
    {
        for ( iSuppPtIdx = iNumConstr;
              iSuppPtIdx < iNumConstr + iNumSuppPts; iSuppPtIdx++ )
        {
            m_pOptimizer->PutModelVal( pdFuncVals[ iSuppPtIdx ], iSuppPtIdx );
        }

        m_pOptimizer->PutConstrValVec( pdFuncVals, iSuppPtIdx );
    }
    else
    {
        m_pApprox->PutFuncVals( pdFuncVals );
    }

    delete [] pdFuncVals;

    return EXIT_SUCCESS;
}

int Example44::GradEval()
{
    // Gradients are approximated.
    return GradApprox();
}

int Example44::SolveOptProblem()
{
    int iError;

    iError = 0;

    int    iMaxNumIter   = 500;
    int    iMaxNumIterLS = 50;
    double dTermAcc      = 1.0E-6;

    double dTimes[11] = { 0.0625, 0.0714, 0.0823, 0.1, 0.125, 0.167,
                          0.25, 0.5, 1, 2, 4 };
    double dVals[11]  = { 0.0246, 0.0235, 0.0323, 0.0342, 0.0456, 0.0627,
                          0.0844, 0.16, 0.1735, 0.1947, 0.1957 };

    // Set parameters:
    m_pOptimizer->Put( "MaxNumIter", &iMaxNumIter );
    m_pOptimizer->Put( "MaxNumIterLS", &iMaxNumIterLS );
    m_pOptimizer->Put( "TermAcc", &dTermAcc );
    m_pOptimizer->Put( "OutputLevel", 2 );

    // Define the fitting problem:
    m_pOptimizer->Put( "NumSuppPts", 11 );
    m_pOptimizer->Put( "SuppPtTimes", dTimes );
    m_pOptimizer->Put( "SuppPtVals", dVals );
    m_pOptimizer->Put( "ApproxSize", 0.01 );
    m_pOptimizer->PutNumDv( 4 );
    m_pOptimizer->PutNumIneqConstr( 0 );
    m_pOptimizer->PutNumEqConstr( 0 );

    double LB[4] = { -1e5, -1e5, -1e5, -1e5 };
    double UB[4] = {  1e5,  1e5,  1e5,  1e5 };
    double IG[4] = {  0.25, 0.39, 0.415, 0.39 };

    m_pOptimizer->PutUpperBound( UB );
    m_pOptimizer->PutLowerBound( LB );
    m_pOptimizer->PutInitialGuess( IG );

    // Start the optimization
    iError = m_pOptimizer->DefaultLoop( this );

    // If there was an error, report it ...
    if ( iError != EXIT_SUCCESS )
    {
        cout << "Error " << iError << "\n";
        return iError;
    }

    // ... else report the results:

    const double * dX;

    int    iNumDesignVar;
    int    iDvIdx;
    double dResidual;

    m_pOptimizer->GetNumDv( iNumDesignVar );

    m_pOptimizer->GetDesignVarVec( dX );

    for ( iDvIdx = 0; iDvIdx < iNumDesignVar; iDvIdx++ )
    {
        cout << "dX[" << iDvIdx << "] = " << dX[ iDvIdx ] << endl;
    }

    m_pOptimizer->Get( "Residual", dResidual );

    cout << "Residual: " << dResidual << endl;

    return EXIT_SUCCESS;
}
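A driver for this example is not part of the listing above; following steps 5 and 6 of the usage pattern of OptProblem (see chapter 31), a minimal sketch could look as follows:

#include "Example44.h"

// Sketch of a driver: the constructor asks interactively for the
// solver; DefaultLoop() is then run inside SolveOptProblem().
int main()
{
    Example44 problem;
    return problem.SolveOptProblem();
}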


Part V.

Other Useful Tools in NLP++


29. Class VariableCatalogue

This class defines a catalogued design variable. The catalogued variables of MipOptimizer and MidacoWrapper are stored in an array of that type. Before you start the optimization, you therefore access the class methods of VariableCatalogue via the class member MipOptimizer::m_pVariableCatalogue or MidacoWrapper::m_pVariableCatalogue, which is a VariableCatalogue* pointer to the array of catalogued variables.

29.1. Defining allowed values for a catalogued variable

There are two ways to do this:

• If the step size between the allowed values is constant, you can use PutAllowedValues( double dLowerBound, double dStepSize, double dUpperBound ).

• If the step size is not constant, you have to do it in two steps:

1. Put the total number of allowed values for the variable using PutNumAllowedValues( int iNumAllowedValues ).

2. Use either PutAllowedValues( const double * pdValues ) to put an array of all allowed values, or call PutAllowedValue( int iValIndex, double dValue ) repeatedly for each allowed value, i.e. for iValIndex = 0, ..., NumberOfAllowedValues-1.
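As an illustration, the following sketch defines the catalogues of two variables, one with a constant and one with a varying step size. It is a minimal sketch only: the header name and the optimizer setup around it are assumptions, while the catalogue methods are the ones listed above.

#include "MipOptimizer.h"   // assumed header name for NLP++'s MipOptimizer

// Sketch only: defining the catalogues of two design variables.
void DefineCatalogues()
{
    MipOptimizer *pMip = new MipOptimizer();   // assumed setup
    pMip->PutNumDv( 2 );                       // two catalogued design variables

    VariableCatalogue *pCat = pMip->m_pVariableCatalogue;

    // Variable 0: constant step size, allowed values 0.5, 1.0, ..., 5.0
    pCat[0].PutAllowedValues( 0.5, 0.5, 5.0 );

    // Variable 1: four irregularly spaced allowed values
    double dValues[4] = { 0.1, 0.25, 0.7, 2.0 };
    pCat[1].PutNumAllowedValues( 4 );
    pCat[1].PutAllowedValues( dValues );
}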

29.2. Class methods

See the HTML documentation.


30. Class ApproxGrad

30.1. Usage

1. Generate an ApproxGrad object using ApproxGrad().

2. Put the number of design variables: PutNumDv (const int iNumDv) ;

3. Put the number of functions: PutNumFuns (const int iNumFuns) ;

4. Put the perturbation factor: PutPerturbFactor (const double dPerturbFactor) ;

5. Put the design variable vector at which to compute the gradient(s): PutDesignVarVec (const double *dX) ;

6. Determine whether the Jacobian is expected to be sparse: PutJacobianSparse (const bool bJacobianSparse) ;

7. If you use input files for an external program to evaluate the functions, and these files carry only a limited number of digits for the values of the variables, it may happen that the perturbed and the unperturbed variables have the same value in the file. (For example, if a variable value of 1.2345678 is written with only four digits, a perturbation of order 10^-6 is invisible in the file.) To avoid this, you can put this parameter; ApproxGrad will then perturb the variables enough to make the difference visible in the file. PutMaxNumChar (const int iMaxNumChar) ;

8. Put the approximation method:

• 1: Forward differences

• 2: Central differences

• 3: Five-Point-Formula

PutApproxMethod( const int iMethod ) ;

9. Run Start() ;

10. Check the status: GetStatus (int & iStatus) ;

• If iStatus < 1: More function values are needed. Get the current design variable vector using GetDesignVarVec (const double *& dX) ; (with dX = NULL on input) and evaluate all functions at this vector. Store the values in an array and put them to the object using PutFuncVals (const double *pdFuncVals) ; where pdFuncVals points to a double array of length equal to the number of functions, containing the values. Then go to Step 9.


• If iStatus == 1: The gradients are approximated. They can be accessed using

– GetJacobian (double **& pdJacobian) ; This is possible only in non-sparse mode. pdJacobian must be NULL on input and will point to a two-dimensional array containing the Jacobian matrix with the gradients in its rows.

– GetGrad (const int iFunIdx, const double *& pdGrad) ; This is possible only in non-sparse mode. pdGrad must be NULL on input and will on return point to an array of length equal to the number of variables, containing the gradient of the iFunIdx'th function.

– GetDeriv( const int iFunIdx, const int iDvIdx, double & dDerivVal ) ; This function returns in dDerivVal the derivative of the iFunIdx'th function w.r.t. the iDvIdx'th variable.

• If iStatus > 1: An error occurred.
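The reverse communication scheme of steps 9 and 10 can be summarized in a short driver. The following is a minimal sketch, not part of NLP++ itself: the header name and the model functions in EvalMyFuncs() are assumptions, and method 2 (central differences, i.e. the approximation (f(x + h e_j) - f(x - h e_j)) / (2h) per variable j) is chosen arbitrarily.

#include "ApproxGrad.h"   // assumed header name for NLP++'s ApproxGrad

// Hypothetical model functions f0, f1 of two variables (placeholders).
static void EvalMyFuncs( const double *dX, double *pdVals )
{
    pdVals[0] = dX[0] * dX[0] + dX[1];
    pdVals[1] = dX[0] * dX[1] - 2.0;
}

void ApproximateJacobian()
{
    ApproxGrad approx;
    int iStatus = 0;

    approx.PutNumDv( 2 );                // step 2
    approx.PutNumFuns( 2 );              // step 3
    approx.PutPerturbFactor( 1.0E-6 );   // step 4
    double dX0[2] = { 1.0, 2.0 };
    approx.PutDesignVarVec( dX0 );       // step 5
    approx.PutApproxMethod( 2 );         // step 8: central differences

    approx.Start();                      // step 9
    approx.GetStatus( iStatus );         // step 10

    while ( iStatus < 1 )                // more function values are needed
    {
        const double *dX = NULL;
        double dFuncVals[2];

        approx.GetDesignVarVec( dX );    // perturbed point chosen by ApproxGrad
        EvalMyFuncs( dX, dFuncVals );
        approx.PutFuncVals( dFuncVals );

        approx.Start();
        approx.GetStatus( iStatus );
    }

    if ( iStatus == 1 )                  // gradients are available
    {
        double **ppdJac = NULL;
        approx.GetJacobian( ppdJac );    // gradients in the rows
        // ... use ppdJac[iFunIdx][iDvIdx] ...
    }
}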

30.2. Example

As an example we solve the following problem using DefaultLoop() and approximating thegradients using ApproxGrad:

Minimize F(x) = ln( |x_2| ) - x_1 ,

s.t. G_0(x) = x_1^2 + x_2^2 - 4 = 0 ,

G_1(x) = x_2 - 1 - x_0^2 ≥ 0 ,

-10^8 ≤ x_i ≤ 10^8 , for all i = 0, ..., 2 .

The file Example12.h contains the class definition.

#ifndef EXAMPLE12_H_INCLUDED
#define EXAMPLE12_H_INCLUDED

#include "OptProblem.h"

class Example12 : public OptProblem
{
public:

    Example12();

    int FuncEval( bool bGradApprox = false );
    int GradEval();
    int GradApprox( int iApproxMethod = 1 );   // default: forward differences
    int SolveOptProblem();

};
#endif

The file Example12.cpp contains the implementation:


#include "Example12.h"
#include <iostream>
#include <cmath>
#include "SqpWrapper.h"

using std::cout;
using std::endl;
using std::cin;

Example12::Example12()
{
    cout << endl << "--- this is Example 12 ---" << endl << endl;
    cout << endl << "--- Solved by SqpWrapper ---" << endl << endl;

    m_pOptimizer = new SqpWrapper();
}

int Example12::FuncEval( bool bGradApprox )
{
    const double * dX;
    int            iNumParSys;
    double *       pdFuncVals;
    int            iNumConstr;
    int            iNumEqConstr;
    int            iNumIneqConstr;
    int            iNumActConstr;
    int *          piActive;
    int            iCounter;
    int            iNumObjFuns;
    int            iError;
    bool           bActiveSetStrat;

    piActive = NULL;
    iNumActConstr = 0;
    iCounter = 0;

    m_pOptimizer->Get( "NumObjFuns", iNumObjFuns );
    m_pOptimizer->GetNumConstr( iNumConstr );
    m_pOptimizer->GetNumEqConstr( iNumEqConstr );
    m_pOptimizer->GetNumIneqConstr( iNumIneqConstr );

    iError = m_pOptimizer->Get( "ActConstr", piActive );

    // If the optimizer does not support an active set strategy,
    // all inequalities are considered active.
    if ( iError != EXIT_SUCCESS )
    {
        bActiveSetStrat = false;
        piActive = new int[ iNumIneqConstr ];
        for ( int iConstrIdx = 0; iConstrIdx < iNumIneqConstr;
              iConstrIdx++ )
        {
            piActive[ iConstrIdx ] = 1;
        }
    }
    else
    {
        bActiveSetStrat = true;
    }

    for ( int iConstrIdx = 0; iConstrIdx < iNumIneqConstr;
          iConstrIdx++ )
    {
        if ( piActive[ iConstrIdx ] != 0 )
        {
            iNumActConstr++;
        }
    }

    // When used for gradient approximation,
    // FuncEval does not evaluate inactive inequality constraints.
    if ( bGradApprox == true )
    {
        pdFuncVals = new double[ iNumEqConstr + iNumObjFuns
                                 + iNumActConstr ];
    }
    else
    {
        pdFuncVals = new double[ iNumConstr + iNumObjFuns ];
    }

    // Gradients have to be evaluated only at one design variable vector.
    if ( bGradApprox == true )
    {
        iNumParSys = 1;
    }
    // Functions may have to be evaluated at more than one design
    // variable vector when using SqpWrapper.
    else
    {
        m_pOptimizer->Get( "NumParallelSys", iNumParSys );
    }

    // A for-loop simulates the parallel evaluation in iNumParSys
    // parallel systems.
    for ( int iSysIdx = 0; iSysIdx < iNumParSys; iSysIdx++ )
    {
        // Get the design variable vector
        // from m_pApprox when gradients are being approximated,
        // from m_pOptimizer in case of normal function evaluation.
        if ( bGradApprox == false )
        {
            m_pOptimizer->GetDesignVarVec( dX, iSysIdx );
        }
        else
        {
            m_pApprox->GetDesignVarVec( dX );
        }

        // Evaluate the equality constraints

        // 0th constraint
        pdFuncVals[0] = dX[1] * dX[1] + dX[2] * dX[2] - 4.0;

        // Evaluate the inequality constraints
        // (when approximating gradients only the active ones)

        iCounter = iNumEqConstr;

        if ( bGradApprox == false || piActive[0] != 0 )
        {
            // 1st constraint
            pdFuncVals[ iCounter ] = dX[2] - 1.0 - dX[0] * dX[0];
            iCounter++;
        }

        // Evaluate the objective

        pdFuncVals[ iCounter ] = log( fabs( dX[2] ) ) - dX[1];

        // Put the values
        // to m_pApprox when gradients are being approximated,
        // to m_pOptimizer in case of normal function evaluation.
        if ( bGradApprox == false )
        {
            m_pOptimizer->PutObjVal( pdFuncVals[ iCounter ], 0, iSysIdx );
            m_pOptimizer->PutConstrValVec( pdFuncVals, iSysIdx );
        }
        else
        {
            m_pApprox->PutFuncVals( pdFuncVals );
        }
    }

    delete [] pdFuncVals;
    pdFuncVals = NULL;

    // If there is no active set strategy, piActive has been
    // allocated by this function and thus must be deleted here.
    if ( bActiveSetStrat == false )
    {
        delete [] piActive;
        piActive = NULL;
    }

    return EXIT_SUCCESS;
}

int Example12::GradEval()
{
    // As all the work is done in GradApprox(),
    // there is not much to do here ...
    return GradApprox();
}

int Example12::GradApprox( int iApproxMethod )
{
    // This function is already implemented in the base class
    // OptProblem and thus does not have to be implemented here.
    // However, in some cases it may be necessary to reimplement it.

    m_pApprox = new ApproxGrad;

    int            iNumDv;
    int            iNumConstr;
    int            iNumObjFuns;
    int            iNumSuppPts;
    const double * pdX;
    int            iStatus;
    double **      ppdJac = NULL;
    bool           bFitting;
    int *          piActive;
    int            iNumIneqConstr;
    int            iNumActConstr;
    int            iCounter;

    iNumActConstr = 0;
    piActive = NULL;

    // Return an error code if no valid approximation
    // method was chosen.
    if ( iApproxMethod < 1 || iApproxMethod > 3 )
    {
        return ERROR_APPROX_ILLEGAL_DATA;
    }

    // Check if a fitting algorithm is used as optimizer.
    // If yes, no active set strategy is used.
    bFitting = m_pOptimizer->IsFittingAlgorithm();

    if ( bFitting == false )
    {
        m_pOptimizer->GetNumDv( iNumDv );
        m_pApprox->PutNumDv( iNumDv );

        m_pOptimizer->Get( "NumObjFuns", iNumObjFuns );
        m_pOptimizer->GetNumConstr( iNumConstr );
        m_pOptimizer->GetNumIneqConstr( iNumIneqConstr );

        m_pOptimizer->Get( "ActConstr", piActive );

        for ( int iIneqConstrIdx = 0; iIneqConstrIdx < iNumIneqConstr;
              iIneqConstrIdx++ )
        {
            if ( piActive[ iIneqConstrIdx ] != 0 )
            {
                iNumActConstr++;
            }
        }

        m_pApprox->PutNumFuns( iNumActConstr + iNumConstr
                               - iNumIneqConstr + iNumObjFuns );

        m_pApprox->PutPerturbFactor( 1.0E-6 );

        m_pApprox->PutApproxMethod( iApproxMethod );

        m_pOptimizer->GetDesignVarVec( pdX );
        m_pApprox->PutDesignVarVec( pdX );

        m_pApprox->Start();
        m_pApprox->GetStatus( iStatus );

        while ( iStatus <= 0 )
        {
            // Call the gradient version of FuncEval
            FuncEval( true );

            // Do one step of the approximation
            m_pApprox->Start();

            m_pApprox->GetStatus( iStatus );
        }

        // Get the results from m_pApprox
        // and put them to the optimizer.
        m_pApprox->GetJacobian( ppdJac );

        for ( int iConstrIdx = 0;
              iConstrIdx < iNumConstr - iNumIneqConstr; iConstrIdx++ )
        {
            m_pOptimizer->PutGradConstr( iConstrIdx,
                                         ppdJac[ iConstrIdx ] );
        }

        iCounter = iNumConstr - iNumIneqConstr;

        for ( int iConstrIdx = iNumConstr - iNumIneqConstr;
              iConstrIdx < iNumConstr; iConstrIdx++ )
        {
            if ( piActive[ iConstrIdx - iNumConstr + iNumIneqConstr ] != 0 )
            {
                m_pOptimizer->PutGradConstr( iConstrIdx, ppdJac[ iCounter ] );
                iCounter++;
            }
        }
        for ( int iObjFunIdx = 0; iObjFunIdx < iNumObjFuns; iObjFunIdx++ )
        {
            m_pOptimizer->PutGradObj( ppdJac[ iCounter + iObjFunIdx ],
                                      iObjFunIdx );
        }
    }
    else
    {
        m_pOptimizer->GetNumDv( iNumDv );
        m_pApprox->PutNumDv( iNumDv );

        m_pOptimizer->Get( "NumSuppPts", iNumSuppPts );
        m_pOptimizer->GetNumConstr( iNumConstr );
        m_pApprox->PutNumFuns( iNumConstr + iNumSuppPts );

        m_pApprox->PutPerturbFactor( 1.0E-6 );

        m_pApprox->PutApproxMethod( iApproxMethod );

        m_pOptimizer->GetDesignVarVec( pdX );
        m_pApprox->PutDesignVarVec( pdX );

        m_pApprox->Start();
        m_pApprox->GetStatus( iStatus );

        while ( iStatus <= 0 )
        {
            // Call the gradient version of FuncEval
            FuncEval( true );

            m_pApprox->Start();

            m_pApprox->GetStatus( iStatus );
        }

        m_pApprox->GetJacobian( ppdJac );

        for ( int iConstrIdx = 0; iConstrIdx < iNumConstr;
              iConstrIdx++ )
        {
            m_pOptimizer->PutGradConstr( iConstrIdx,
                                         ppdJac[ iConstrIdx ] );
        }
        for ( int iSuppPtIdx = 0; iSuppPtIdx < iNumSuppPts;
              iSuppPtIdx++ )
        {
            m_pOptimizer->PutGradModel( ppdJac[ iNumConstr + iSuppPtIdx ],
                                        iSuppPtIdx );
        }
    }

    delete m_pApprox;
    m_pApprox = NULL;

    return EXIT_SUCCESS;
}

int Example12::SolveOptProblem()
{
    int iError;

    iError = 0;

    // Put parameters:

    m_pOptimizer->Put( "MaxNumIter", 500 );
    m_pOptimizer->Put( "MaxNumIterLS", 50 );
    m_pOptimizer->Put( "TermAcc", 1.0E-6 );
    m_pOptimizer->Put( "OutputLevel", 2 );

    // Define the optimization problem

    m_pOptimizer->PutNumDv( 3 );
    m_pOptimizer->PutNumIneqConstr( 1 );
    m_pOptimizer->PutNumEqConstr( 1 );

    double LB[3] = { -1.0E8, -1.0E8, -1.0E8 };
    double UB[3] = {  1.0E8,  1.0E8,  1.0E8 };
    double IG[3] = { -0.1, 1.0, 0.1 };

    m_pOptimizer->PutUpperBound( UB );
    m_pOptimizer->PutLowerBound( LB );
    m_pOptimizer->PutInitialGuess( IG );

    // Start the optimization
    iError = m_pOptimizer->DefaultLoop( this );

    // If there was an error, report it ...
    if ( iError != EXIT_SUCCESS )
    {
        cout << "Error " << iError << "\n";
        return iError;
    }

    // ... else report the results:

    const double * dX;

    m_pOptimizer->GetDesignVarVec( dX );

    int iNumDesignVar;

    m_pOptimizer->GetNumDv( iNumDesignVar );

    for ( int iDvIdx = 0; iDvIdx < iNumDesignVar; iDvIdx++ )
    {
        cout << "dX[" << iDvIdx << "] = " << dX[ iDvIdx ] << endl;
    }
    return EXIT_SUCCESS;
}


31. Class OptProblem and the DefaultLoop() function

OptProblem is a base class for optimization problems that can be solved using NLP++'s optimizers.

DefaultLoop() is a member function of OptProblem which performs an optimization loop until either an error occurs or the optimal solution is computed.

OptProblem provides the DefaultLoop() function as well as several functions which the optimizer needs to run DefaultLoop().

These functions are:

• FuncEval(), which evaluates the objective and constraint functions at the current iterate and puts the values to the optimizer.

• GradEval(), which evaluates the gradients of the objective and constraint functions and puts them to the OptimizerInterface object.

• GradApprox(), which approximates the gradients of all functions by using the ApproxGrad class.

If you do not want to use DefaultLoop(), there is no need to derive your optimization problem from OptProblem.

Usage:

1. Define a class (e.g. MyProblem) which is derived from OptProblem and defines your optimization problem.

2. This class must implement the functions FuncEval(), GradEval(), and SolveOptProblem(). If you want to approximate the gradients of the objective/constraint functions, you can implement GradEval() as a call to GradApprox(), which is already implemented in the base class OptProblem. In this case, however, the function FuncEval() must be implemented in a way that makes sure that during the approximation of gradients the function values are put to the ApproxGrad object in the correct order (first constraints, then objective(s)).

3. Either the class constructor or the SolveOptProblem() function must allocate memory for m_pOptimizer, e.g. using new SqpWrapper.

4. Via m_pOptimizer, the function SolveOptProblem() then must define the problem using the Put-methods (see the documentation of the optimizer). Instead of calling StartOptimizer(), the SolveOptProblem() function can now call DefaultLoop(this), using the current MyProblem object as an argument.


5. Generate a MyProblem object.

6. Call MyProblem::SolveOptProblem().
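The steps above are illustrated by the following skeleton. It is a sketch only: the class name MyProblem, the bounds, and the placeholder objective x_0^2 + x_1^2 are purely illustrative; the methods used are the ones documented in this manual.

#include <cstdlib>
#include "OptProblem.h"
#include "SqpWrapper.h"

// Sketch of the usage pattern; the objective is a placeholder.
class MyProblem : public OptProblem
{
public:
    MyProblem() { m_pOptimizer = new SqpWrapper(); }   // step 3

    int FuncEval( bool bGradApprox = false )           // step 2
    {
        const double *dX = NULL;
        if ( bGradApprox == false ) m_pOptimizer->GetDesignVarVec( dX );
        else                        m_pApprox->GetDesignVarVec( dX );

        double dObj = dX[0] * dX[0] + dX[1] * dX[1];

        if ( bGradApprox == false )
        {
            m_pOptimizer->PutObjVal( dObj );
        }
        else
        {
            // no constraints here, so only the objective is put
            double dVals[1] = { dObj };
            m_pApprox->PutFuncVals( dVals );
        }
        return EXIT_SUCCESS;
    }

    int GradEval() { return GradApprox(); }            // approximate gradients

    int SolveOptProblem()                              // step 4
    {
        m_pOptimizer->PutNumDv( 2 );
        m_pOptimizer->PutNumIneqConstr( 0 );
        m_pOptimizer->PutNumEqConstr( 0 );

        double LB[2] = { -10.0, -10.0 };
        double UB[2] = {  10.0,  10.0 };
        double IG[2] = {   1.0,   2.0 };
        m_pOptimizer->PutLowerBound( LB );
        m_pOptimizer->PutUpperBound( UB );
        m_pOptimizer->PutInitialGuess( IG );

        return m_pOptimizer->DefaultLoop( this );      // instead of StartOptimizer()
    }
};

int main()
{
    MyProblem problem;                 // step 5
    return problem.SolveOptProblem();  // step 6
}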


Part VI.

Using NLP++ together with Diffpack


32. Proceeding from a calculation to an optimization program

32.1. General Strategy

Proceeding from a Diffpack program which performs a calculation, the following steps are generally necessary to turn it into an optimization program:

• main() function: Call adm() and SolveOptProblem() instead of multipleLoop().

• Class definition and implementation:

– Add OptProblem as an additional base class.

– Add the functions SolveOptProblem() and FuncEval().

– It may be necessary to add some variables and menu entries (e.g. for the bounds).

– As the default implementation of GradEval() (which is a call to GradApprox()) needs several calls to the Diffpack simulator, it may pay off to use a more advanced method (e.g. a sensitivity analysis as done in DPSteadyEnergy1) to save time when computing the gradients.

– Suppress unnecessary output (e.g. the results of the Diffpack calculations).

• Compilation:

– Microsoft Visual C++ Project settings:

∗ It will probably be necessary to ignore the libcmt.lib library.

∗ Add $(NLP)\include as an include directory.

∗ Add $(NLP)\lib as a library directory.

∗ Link to nlp.lib in release mode and to nlpd.lib in debug mode.

– Linux: Adjust .cmake1 and .cmake2 so that NLP++'s header files and libNlp++.a can be found and used for compilation.

32.2. Example

A Diffpack program is given which computes the deflection of a beam of given length when a force is applied. Proceeding from that program, we minimize the cross-sectional area of the beam. The variables are the height and the width of the beam. However, the deflection is not allowed to exceed a bound (maxDeflection). Thus we solve the following optimization problem:


Minimize F(x) = width · height ,

s.t. G_0(x) = deflection - maxDeflection ≥ 0 .

Both deflection and maxDeflection are negative, so the constraint demands that the beam bends less (in magnitude) than allowed: with maxDeflection = -0.005, a computed deflection of -0.003 yields G_0(x) = 0.002 ≥ 0 (feasible), whereas -0.01 yields G_0(x) = -0.005 < 0 (infeasible).

All changes that were necessary to proceed from the pure calculation to the optimization are marked with NLP++ and ENDNLP++.

To execute this example, Diffpack has to be installed on your computer.

File DPBeamSolver.h:

#ifndef DPBeamSolver_h_IS_INCLUDED
#define DPBeamSolver_h_IS_INCLUDED

#include <FEM.h>
#include <DegFreeFE.h>
#include <LinEqAdmFE.h>

// NLP++: Added headers
#include "OptimizerInterfaceHeaders.h"
// ENDNLP++

// NLP++: added additional derivation from OptProblem
class DPBeamSolver : public OptProblem, public FEM
// ENDNLP++
{
public:
    Handle(GridFE)     grid;      // FE grid
    Handle(DegFreeFE)  dof;       // management of the degrees of freedom
                                  // of the system
    Handle(FieldsFE)   w;         // deflection of the beam
    Handle(LinEqAdmFE) lineq;     // linear system
    Vec(dpreal)        linsol;    // solution of the linear system

    dpreal epsilon;               // parameter of the bending beam
    dpreal width;                 // width of the beam
    dpreal height;                // height of the beam
    // NLP++: Added variables
    dpreal maxDeflection;         // maximum deflection
    dpreal minWidth;              // lower bound for the width
    dpreal maxWidth;              // upper bound for the width
    dpreal minHeight;             // lower bound for the height
    dpreal maxHeight;             // upper bound for the height
    // ENDNLP++
    dpreal length;                // length of the beam
    dpreal rho;                   // density of the beam
    dpreal I_y;                   // moment of inertia of area in the
                                  // direction of the y-axis
    dpreal E_modul;               // Young's modulus
    dpreal force_z;               // force applied to the beam
    dpreal q_force;               // transverse force on the beam

    virtual void adm( MenuSystem& menu );
    virtual void define( MenuSystem& menu, int level = MAIN );
    virtual void scan();
    virtual void solveProblem();

    // set the essential boundary conditions
    virtual void fillEssBC();
    // computation of element matrix and vector
    virtual void calcElmMatVec( int e, ElmMatVec& elmat,
                                FiniteElement& fe );
    // computation of the variational equation
    virtual void integrands( ElmMatVec& elmat,
                             const FiniteElement& fe );

    // NLP++: Added functions:
    int FuncEval( bool = false );
    int GradEval() { return GradApprox(); }
    int SolveOptProblem( void );
    // ENDNLP++

    DPBeamSolver();
    ~DPBeamSolver();
};
#endif

File main.cpp:

#ifdef WIN32
#include <LibsDP.h>
#endif

#include <DPBeamSolver.h>
// NLP++: added header file
#include "OptimizerInterfaceHeaders.h"
// ENDNLP++

int main( int argc, const char * argv[] )
{
    initDiffpack( argc, argv );

    global_menu.init( "Minimize area of a beam s.t. a maximum deflection",
                      "DPBeamSolver" );

    DPBeamSolver sim;

    // NLP++: instead of calling multipleLoop,
    // call adm and SolveOptProblem

    // old:

    // global_menu.multipleLoop( sim );

    // new:

    sim.adm( global_menu );
    sim.SolveOptProblem();

    // ENDNLP++

    return 0;
}

File DPBeamSolver.cpp:

#include <DPBeamSolver.h>
#include <PreproBox.h>
#include <ElmMatVec.h>
#include <FiniteElement.h>
#include <ErrorNorms.h>
#include <readOrMakeGrid.h>

DPBeamSolver::DPBeamSolver()
{
}

DPBeamSolver::~DPBeamSolver()
{
    lineq.detach();
}

void DPBeamSolver::adm( MenuSystem& menu )
{
    SimCase::attach( menu );
    define( menu );
    menu.prompt();
    scan();
}

void DPBeamSolver::define( MenuSystem& menu, int level )
{
    menu.addItem( level, "gridfile", "Read existing grid files",
                  "src/beam.grid" );

    // NLP++: add upper and lower bounds for the variables height and
    // width, and set the maximum deflection

    // old:

    // menu.addItem( level, "width", "beam width", "1.00" );
    // menu.addItem( level, "height", "beam height", "1.00" );

    // new:
    // initial value for the width
    menu.addItem( level, "width", "initial value for the beam width", "0.8" );
    // initial value for the height
    menu.addItem( level, "height", "initial value for the beam height", "0.8" );
    // maximum deflection (negative)
    menu.addItem( level, "maxDeflection", "maximum deflection", "-0.005" );
    // lower bound for the width
    menu.addItem( level, "minWidth", "minimum beam width", "0.00001" );
    // upper bound for the width
    menu.addItem( level, "maxWidth", "maximum beam width", "1.0" );
    // lower bound for the height
    menu.addItem( level, "minHeight", "minimum beam height", "0.00001" );
    // upper bound for the height
    menu.addItem( level, "maxHeight", "maximum beam height", "1.0" );

    // ENDNLP++

    menu.addItem( level, "length", "beam length", "10.00" );
    menu.addItem( level, "force_z", "force on the beam", "20000.0" );

    menu.addItem( level, "rho", "beam density", "2500" );
    menu.addItem( level, "E_modul", "parameter in PDE",
                  "30000000000.0" );

    // Submenu
    // parameters of the linear system
    LinEqAdmFE::defineStatic( menu, level + 1 );
}

void DPBeamSolver::scan()
{
    MenuSystem& menu = SimCase::getMenuSystem();

    // read the solution grid
    grid.rebind( new GridFE() );
    String gridfile;
    gridfile = menu.get( "gridfile" );
    readOrMakeGrid( *grid, gridfile );

    // read the menu entries

    // NLP++: get the values of the additional menu entries:

    width         = menu.get( "width" ).getReal();
    height        = menu.get( "height" ).getReal();
    maxDeflection = menu.get( "maxDeflection" ).getReal();
    minWidth      = menu.get( "minWidth" ).getReal();
    maxWidth      = menu.get( "maxWidth" ).getReal();
    minHeight     = menu.get( "minHeight" ).getReal();
    maxHeight     = menu.get( "maxHeight" ).getReal();

    // ENDNLP++

    length  = menu.get( "length" ).getReal();
    rho     = menu.get( "rho" ).getReal();
    E_modul = menu.get( "E_modul" ).getReal();
    force_z = menu.get( "force_z" ).getReal();

    // weight force
    // q_force = width * height * rho * (-9.81);
    // for the example with a pure point load
    q_force = 0.0;
    // moment of inertia of area
    I_y = width * pow( height, 3.0 ) / 12.0;
    // bending beam parameter
    epsilon = E_modul * I_y;

    // allocate the solution vector
    w.rebind( new FieldsFE( *grid, 2, "w" ) );
    // allocate the DegFreeFE object, 2 unknowns per node
    dof.rebind( new DegFreeFE( *grid, 2 ) );
    // attach the linear system and linear solvers
    lineq.rebind( new LinEqAdmFE() );
    // read the parameters of the linear system
    lineq->scan( menu );

    // total number of degrees of freedom
    int total_no_dofs = dof->getTotalNoDof();
    // allocate the solution vector of the linear system
    linsol.redim( total_no_dofs );
    lineq->attach( linsol );
    linsol.fill( 0.0 );
}

// set the Dirichlet boundary conditions
void DPBeamSolver::fillEssBC()
{
    int i;
    dof->initEssBC();
    int nno = grid->getNoNodes();   // number of grid nodes

    for ( i = 1; i <= nno; i++ )
    {
        // if boundary indicator 2 is set on a node ...
        if ( grid->boNode( i, 2 ) )
        {
            // ... the first degree of freedom at the node is set to zero
            dof->fillEssBC( i, 1, 0.0 );
        }
        if ( grid->boNode( i, 3 ) )
        {
            dof->fillEssBC( i, 2, 0.0 );
        }
    }
}

// assembly of the linear system of equations
void DPBeamSolver::calcElmMatVec( int e, ElmMatVec& elmat,
                                  FiniteElement& fe )
{
    fe.evalDerivatives( 2 );
    FEM::calcElmMatVec( e, elmat, fe );

    int s, nsides;
    int globNode;
    int noNodes;
    int locNodeNum;
    nsides  = fe.getNoSides();      // number of nodes of the element
    noNodes = grid->getNoNodes();

    for ( s = 1; s <= nsides; s++ )
    {
        // if boundary indicator 5 is set
        if ( fe.boSide( s, 5 ) )
        {
            // local node number
            locNodeNum = fe.getElmDef().getNodeOnSide( s, 1 );
            // global node number
            globNode = grid->loc2glob( e, locNodeNum );

            // boundary node
            if ( ( e == 1 && globNode == 1 ) ||
                 ( e == grid->getNoElms() && globNode == noNodes ) )
            {
                // add the force to the element vector
                elmat.b( ( locNodeNum - 1 ) * 2 + 1 ) += -force_z;
            }
            else
            {
                elmat.b( ( locNodeNum - 1 ) * 2 + 1 ) += ( -force_z / 2.0 );
            }
        }
    }
}

// computation of the element matrix and the element vector
void DPBeamSolver::integrands( ElmMatVec& elmat,
                               const FiniteElement& fe )
{
    int i, j;
    dpreal grad2N;
    // number of basis functions
    const int nbf = fe.getNoBasisFunc();
    // Jacobian determinant times integration weight
    const dpreal detJxW = fe.detJxW();

    for ( i = 1; i <= nbf; i++ )
    {
        for ( j = 1; j <= nbf; j++ )
        {
            grad2N = epsilon * fe.d2N( i, 1, 1 ) * fe.d2N( j, 1, 1 );
            elmat.A( i, j ) += grad2N * detJxW;
        }
        elmat.b( i ) += q_force * fe.N( i ) * detJxW;
    }
}

// NLP++: Add new function FuncEval
int DPBeamSolver::FuncEval( bool bGradApprox )
{
    if ( bGradApprox == false )
    {
        m_pOptimizer->GetDesignVar( 0, width );
        m_pOptimizer->GetDesignVar( 1, height );
    }
    else
    {
        const double * vec = NULL;
        m_pApprox->GetDesignVarVec( vec );
        width  = vec[0];
        height = vec[1];
    }

    // moment of inertia of area
    I_y = width * pow( height, 3.0 ) / 12.0;
    epsilon = E_modul * I_y;

    solveProblem();

    int nno = grid->getNoNodes();

    if ( bGradApprox == false )
    {
        m_pOptimizer->PutObjVal( width * height );
        m_pOptimizer->PutConstrVal( 0, linsol( nno ) - maxDeflection );
    }
    else
    {
        double FuncVals[2] = { linsol( nno ) - maxDeflection,
                               width * height };
        m_pApprox->PutFuncVals( FuncVals );
    }

    return EXIT_SUCCESS;
}
// ENDNLP++

// NLP++: Add new function SolveOptProblem
int DPBeamSolver::SolveOptProblem( void )
{
    int iError;

    iError = 0;

    m_pOptimizer = new ScpWrapper();

    int    iMaxNumIter   = 5000;
    int    iMaxNumIterLS = 20;
    double dTermAcc      = 1.0E-6;

    m_pOptimizer->Put( "NumParallelSys", 1 );
    m_pOptimizer->Put( "MaxNumIter", &iMaxNumIter );
    m_pOptimizer->Put( "MaxNumIterLS", &iMaxNumIterLS );
    m_pOptimizer->Put( "RhoBegin", 0.1 );
    m_pOptimizer->Put( "TermAcc", &dTermAcc );
    m_pOptimizer->Put( "OutputLevel", 2 );

    m_pOptimizer->PutNumDv( 2 );
    m_pOptimizer->PutNumIneqConstr( 1 );
    m_pOptimizer->PutNumEqConstr( 0 );

    double LB[2] = { minWidth, minHeight };
    double UB[2] = { maxWidth, maxHeight };
    double IG[2] = { width, height };

    m_pOptimizer->PutUpperBound( UB );
    m_pOptimizer->PutLowerBound( LB );
    m_pOptimizer->PutInitialGuess( IG );

    iError = m_pOptimizer->DefaultLoop( this );

    if ( iError != EXIT_SUCCESS )
    {
        cout << "Error " << iError << "\n";
        return iError;
    }

    const double * dX;

    m_pOptimizer->GetDesignVarVec( dX );
    int iNumDesignVar;
    m_pOptimizer->GetNumDv( iNumDesignVar );

    for ( int iDvIdx = 0; iDvIdx < iNumDesignVar; iDvIdx++ )
    {
        cout << "dX[" << iDvIdx << "] = " << dX[ iDvIdx ] << "\n";
    }

    return EXIT_SUCCESS;
}
// ENDNLP++

void DPBeamSolver::solveProblem()
{
    // set the essential boundary conditions
    fillEssBC();
    // assemble the linear system
    makeSystem( *dof, *lineq );
    // solve the linear system
    lineq->solve();

    // write the solution into w
    dof->vec2field( linsol, *w );

    // NLP++: Suppress output of the solution

    // int nno = grid->getNoNodes();
    // int i, j;
    // int k = 1;

    // Ptv(dpreal) x_value;    // current evaluation point
    // x_value.redim( nno );
    // x_value.fill( 0.0 );

    // Ptv(dpreal) x_coords;   // grid coordinates
    // x_coords.redim( nno );

    // for ( i = 1; i <= nno; i++ )
    // {
    //     // computation of the grid coordinates
    //     x_coords( i ) = grid->getCoor( i, 1 );
    // }
    //
    // for ( j = 1; j <= nno; j++ )
    // {
    //     x_value( 1 ) = x_coords( j );
    //     s_o << "evaluation point: " << x_value( 1 ) << "\n";
    //     s_o << "numerical solution: " << linsol( k ) << "\n";
    //     // deflection is stored in the odd components of w
    //     k = k + 2;
    // }

    // ENDNLP++
}


Bibliography

[1] M. A. Abramson. Mixed variable optimization of a load-bearing thermal insulation systemusing a filter pattern search algorithm. Optim. Eng., 5(2):157–177, 2004.

[2] L. Armijo. Minimization of functions having lipschitz continuous first partial derivatives.Pacific Journal of Mathematics, 16:1–3, 1966.

[3] C. Audet and J. Dennis. Pattern search algorithm for mixed variable programming. SIAMJournal on Optimization, 11:573–594, 2001.

[4] A. Ben-Tal. Characterization of Pareto and Lexicographic Optimal Solutions. Fandel, G.and Gal, T.(eds.): Multiple Criteria Decision Making Theory and Application. Springer,New York, 1980.

[5] J. Birk, M. Liepelt, K. Schittkowski, and F. Vogel. Computation of optimal feed rates andoperation intervals for tubular reactors. Journal of Process Control, 9:325–336, 1999.

[6] A. Bjorck. Least Squares Methods. Elsevier, Amsterdam, 1990.

[7] M. Blatt and K. Schittkowski. Optimal control of one-dimensional partial differentialequations applied to transdermal diffusion of substrates. Optimization Techniques andApplications, 1:81–93, 1998.

[8] C. Blum. Ant colony optimization: Introduction and recent trends. Physics of LifeReviews, 2(4):353–373, 2005.

[9] C. Blum. The metaheuristics network web site: Ant colony optimization.Http://www.metaheuristics.org/index.php?main=3&sub=31, 2008.

[10] P. Boderke, K. Schittkowski, M. Wolf, and H. Merkle. Modeling of diffusion and concurrentmetabolism in cutaneous tissue. Journal on Theoretical Biology, 204(3):393–407, 2000.

[11] N. Boland. A dual-active-set algorithm for positive semi-definite quadratic programming.Mathematical Programming, 78:1–27, 1997.

[12] F. Bonabeau, M. Dorigo, and G. Theraulaz. Inspiration for optimization from social insectbehaviour. Nature, 406:39–42, 2000.

[13] I. Bongartz, A. Conn, N. Gould, and P. Toint. Cute: Constrained and unconstrainedtesting environment. Transactions on Mathematical Software, 21(1):123–160, 1995.

[14] J. Bonnans, E. Panier, A. Tits, and J. Zhou. Avoiding the maratos effect by means of anonmonotone line search, ii: Inequality constrained problems – feasible iterates,. SIAMJournal on Numerical Analysis, 29:1187–1202, 1992.

313

Page 320: NLP++ Optimization Toolbox+TheoryManual.pdf · 2013-01-22 · kowski, and van de Braak [17]. The general availability of parallel computers and in particular of distributed computing

Bibliography

[15] B. Borchers and J. Mitchell. An improved branch and bound algorithm for mixed integernonlinear programming,. Computers and Operations Research, 21(4):359–367, 1194.

[16] G. E. P. Box and M. E. Mller. A note on the generation of random normal deviates. Ann.Math. Stat., 29(2):610–611, 1958.

[17] M. Bunner, K. Schittkowski, and G. van de Braak. Optimal design of electronic compo-nents by mixed-integer nonlinear programming. Optimization and Engineering, 5:271–294,2004.

[18] J. Burke. A robust trust region method for constrained nonlinear programming problems.SIAM Journal on Optimization, 2:325–347, 1992.

[19] R. Byrd, N. Gould, J. Nocedal, and R. Waltz. An active-set algorithm for nonlinear pro-gramming using linear programming and equality constrained subproblems. MathematicalProgramming B, 100:27–48, 2004.

[20] R. H. Byrd, R. B. Schnabel, and G. A. Schultz. A trust region algorithm for nonlinearlyconstrained optimization. SIAM Journal on Numerical Analysis, 24:1152–1170, 1987.

[21] M. Celis. A trust region strategy for nonlinear equality constrained optimization. PhDthesis, Department of Mathematics, Rice University, USA, 1983.

[22] R. Chelouah and P. Siarry. Genetic and nelder-mead algorithms hybridized for a moreaccurate global optimization of continuous multiminima functions. Eur. J. Oper. Res.,148(2):335–348, 2003.

[23] R. Chelouah and P. Siarry. A hybrid method combining continuous tabu search and nelder-mead simplex algorithms for the global optimization of multiminima functions. Eur. J.Oper. Res., 161(3):636–654, 2005.

[24] W. Chunfeng and Z. Xin. Ants foraging mechanism in the design of multiproduct batchchemical process. Ind. Eng. Chem. Res., 41(26):6678–6686, 2002.

[25] C. A. Coello. Theoretical and numerical constraint-handling techniques used with evo-lutionary algorithms: A survey of the state of the art. Comput. Method. Appl. M.,191(11):1245–1287, 2002.

[26] A. Conn, N. Gould, and P. Toint. Trust-Region Methods. MPS/SIAM Series on Optimiza-tion, 2000.

[27] Y. Dai. On the nonmonotone line search. Journal of Optimization Theory and Applications,112(2):315–330, 2002.

[28] Y. Dai and K. Schittkowski. A sequential quadratic programming algorithm with non-monotone line search. Pacific Journal of Optimization, 4:335–351, 2006.

[29] N. Deng, Y. Xiao, and Z. F.J. Nonmonotonic trust-region algorithm. Journal of Opti-mization Theory and Applications, 26:259–285, 1993.

314

Page 321: NLP++ Optimization Toolbox+TheoryManual.pdf · 2013-01-22 · kowski, and van de Braak [17]. The general availability of parallel computers and in particular of distributed computing

Bibliography

[30] J. Dennis. Some computational techniques for the nonlinear least squares problem. InG. Byrne and C. Hall, editors, Numerical Solution of Systems of Nonlinear AlgebraicEquations. Academic Press, London, New York, 1973.

[31] J. Dennis. Nonlinear least squares. In D. Jacobs, editor, The State of the Art in NumericalAnalysis. Academic Press, London, New York, 1977.

[32] M. Dorigo. Optimization, Learning and Natural Algorithms. PhD thesis, Politecnico diMilano (Italy), 1992.

[33] M. Dorigo, G. Di Caro, and L. M. Gambardella. Ant algorithms for discrete optimization.Artifical Life, 5(2):137–172, 1999.

[34] M. Dorigo and T. Stuetzle. Ant Colony Optimization. MIT Press, 2004.

[35] J. Dreo and P. Siarry. An ant colony algorithm aimed at dynamic continuous optimization.Appl. Math. Comput., 181(1):457–467, 2006.

[36] M. Duran and I. E. Grossmann. An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Math. Program., 36:307–339, 1986.

[37] T. Edgar and D. Himmelblau. Optimization of Chemical Processes. McGrawHill, 1988.

[38] J. A. Egea, M. Rodriguez-Fernandez, J. R. Banga, and R. Marti. Scatter search forchemical and bio-process optimization. J. Global. Optim., 37(3):481–503, 2007.

[39] O. Exler, T. Lehmann, and K. Schittkowski. A comparative study of SQP-type algo-rithms for nonlinear and nonconvex mixed-integer optimization. Mathematical Program-ming Computation, 70(4):383–412, 2012.

[40] O. Exler, T. Lehmann, and K. Schittkowski. Misqp: A fortran subroutine of a trust regionSQP algorithm for mixed-integer nonlinear programming - user’s guide. Technical report,Department of Mathematics, University of Bayreuth, 2012.

[41] O. Exler and K. Schittkowksi. A trust region SQP algorithm for mixed-integer nonlinearprogramming. Optimization Letters, 3(1):269–280, 2007.

[42] M. Fernanda, P. Costa, and M. Fernandes. A primal-dual interior-point algorithm fornonlinear least squares constrained problems. TOP, 13:1863–8279, 2005.

[43] R. Fletcher. Second order corrections for non-differentiable optimization. In G. A. Watson,editor, Numerical Analysis, volume 912 of Lecture notes in mathematics, pages 85–115.Springer, Berlin, 1982.

[44] R. Fletcher. Practical Methods of Optimization. John Wiley & Sons, Chichester, 2. ed.,reprinted in paperback. edition, 2000.

[45] R. Fletcher and S. Leyffer. Solving mixed integer nonlinear programs by outer approxi-mation. Mathematical Programming, 66(1-3):327–349, 1994.

[46] C. Fleury. CONLIN: an efficient dual optimizer based on convex approximation concepts.Structural Optimization, 1:81–89, 1989.

315

Page 322: NLP++ Optimization Toolbox+TheoryManual.pdf · 2013-01-22 · kowski, and van de Braak [17]. The general availability of parallel computers and in particular of distributed computing

Bibliography

[47] C. Fleury. First and second order convex approximation strategies in structural optimiza-tion. Structural Optimization, 1:3–10, 1989.

[48] C. A. Floudas. Nonlinear and Mixed-Integer Optimization: Fundamentals and Applica-tions. Topics in chemical engineering. Oxford University Press, New York, 1995.

[49] J. Frias, J. Oliveira, and K. Schittkowski. Modelling of maltodextrin de12 drying processin a convection oven. Applied Mathematical Modelling, 24:449–462, 2001.

[50] R. Ge and Y. Qin. A class of filled functions for finding global minimizers of a function ofseveral variables. Journal of Optimization Theory and Applications, 54:241–252, 1987.

[51] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam. PVM 3.0.A User’s Guide and Tutorial for Networked Parallel Computing. The MIT Press, 1995.

[52] P. Gill and W. Murray. Algorithms for the solution of the non-linear least-squares problem.CIAM Journal on Numerical Analysis, 15:977–992, 1978.

[53] P. Gill, W. Murray, M. Saunders, and M. Wright. User’s guide for sql/npsol: A fortranpackage for nonlinear programming. Technical Report SOL 83-12, Dept. of OperationsResearch, Standford University, California, 1983.

[54] P. Gill, W. Murray, and M. Wright. Practical Optimization. Academic Press, London,New York, Toronto, Sydney, San Francisco, 1981.

[55] F. W. Glover and G. A. Kochenberger. Handbook of Metaheuristics. Springer, New York,2003.

[56] D. Goldfarb and A. Idnani. A numerically stable method for solving strictly convexquadratic programs. Mathematical Programming, 27:1–33, 1983.

[57] L. Grippo, F. Lampariello, and S. Lucidi. A nonmonotone line search technique for new-tons’s method. SIAM Journal on Numerical Analysis, 23:707–716, 1986.

[58] L. Grippo, F. Lampariello, and S. Lucidi. A truncated newton method with nonmonotoneline search for unconstrained optimization. Journal of Optimization Theory and Applica-tions, 60:401–419, 1989.

[59] L. Grippo, F. Lampariello, and S. Lucidi. A class of nonmonotone stabilization methodsin unconstrained optimization. Numerische Mathematik, 59:779–805, 1991.

[60] I. E. Grossmann. Review of nonlinear mixed-integer and disjunctive programming tech-niques. Optim. Eng., 3(3):227–252, 2002.

[61] I. E. Grossmann and Z. Kravanja. Mixed-integer nonlinear programming: A survey ofalgorithms and applications. In L. T. Biegler, T. F. Coleman, A. R. Conn, and F. N. San-tosa, editors, Large-scale Optimization with Applications, volume 93 of The IMA volumesin mathematics and its applications. Springer, New York, 1997.

[62] O. K. Gupta and V. Ravindran. Branch and bound experiments in convex non- linearinteger programming. Manage. Sci., 31:1533–1546, 1985.

316

Page 323: NLP++ Optimization Toolbox+TheoryManual.pdf · 2013-01-22 · kowski, and van de Braak [17]. The general availability of parallel computers and in particular of distributed computing

Bibliography

[63] Y. Haimes and V. Chankong. Kuhn-tucker multipliers as trade-offs in multiobjectivedecision-making. Automatica, 15:59–72, 1979.

[64] J. Hald and K. Madsen. Combined lp and quasi-newton methods for minmax optimization.Mathematical Programming, 20:49–62, 1981.

[65] R. Hanson and F. Frogh. A quadratic-tensor model algorithm for nonlinear least-squaresproblems with linear constraints. ACM Transactions on Mathematical Software, 18(2):115–133, 1992.

[66] C. Hartwanger, K. Schittkowski, and H. Wolf. Computer aided optimal design of hornradiators for satellite communication. Engineering Optimization, 33:221–244, 2000.

[67] K. Hiebert. A comparison of nonlinear least squares software. Technical Report SAND79-0483, Sandia National Laboratories, Albuquerque, New Mexico, 1979.

[68] W. Hock and K. Schittkowski. Test Examples for Nonlinear Programming Codes, volume187 of Lecture Notes in Economics and Mathematical Systems. Springer, 1981.

[69] W. Hock and K. Schittkowski. A comparative performance evaluation of 27 nonlinearprogramming codes. Computing, 30:335–358, 1983.

[70] J. Holt and R. Fletcher. An algorithm for constrained non-linear least-squares. IMAJournal on Applied Mathematics, 23:449–463, 1979.

[71] R. Horst and P. Pardalos, editors. Handbook of Global Optimization. Kluwer AcademicPublishers, Dordrecht, 1995.

[72] F. Jarre and M. Saunders. A practical interior-point method for convex programming.SIAM J. on Optimization, 5:149–171, 1995.

[73] V. K. Jayaraman, B. D. Kulkarni, S. Karale, and P. Shelokar. Ant colony framework foroptimal design and scheduling of batch plants. Comput. Chem. Eng., 24(8):1901–1912,2000.

[74] X. Ke and J. Han. A nonmonotone trust region algorithm for equality constrained opti-mization. Science in China, 38A:683–695, 1995.

[75] X. Ke, G. Liu, and D. Xu. A nonmonotone trust-region algorithm for unconstrainedoptimization. Chinese Science Bulletin, 41:197–201, 1996.

[76] G. Kneppe. Mbb-lagrange: Structural optimization system for space and aircraft struc-tures. Technical Report S-PUB-0406, MBB Hubschrauber und Flugzeuge, Ottobrunn,Munich, 1990.

[77] G. Kneppe, J. Krammer, and E. Winkler. Structural optimization of large scale problemsusing mbb-lagrange. Technical Report MBB-S-PUB-305, Messerschmitt-Bolkow-Blohm,Munich, 1987.

[78] M. Kong and P. Tian. Application of aco in continuous domain. In L. Jiao, L. Wang,X. Gao, J. Liu, and F. Wu, editors, Advances in Natural Computation, pages 126–135.Springer, Berlin, Heidelberg, 2006.

317

Page 324: NLP++ Optimization Toolbox+TheoryManual.pdf · 2013-01-22 · kowski, and van de Braak [17]. The general availability of parallel computers and in particular of distributed computing

Bibliography

[79] T. Lehmann, K. Schittkowski, and T. Spickenreuther. Miql: A fortran subroutine for con-vex mixed-integer quadratic programming by branch-and-bound - user’s guide. Technicalreport, Department of Computer Science, University of Bayreuth, Bayreuth, 2009.

[80] M. Lenard. A computational study of active set strategies in nonlinear programming withlinear constraints. Mathematical Programming, 16:81–97, 1979.

[81] K. Levenberg. A method for the solution of certain problems in least squares. Quarterlyof Applied Mathematics, 2:164–168, 1944.

[82] A. Levy and A. Montalvo. The tunneling algorithm for the global minimization of func-tions. SIAM Journal on Scientific and Statistical Computing, 6:15–29, 1985.

[83] S. Leyffer. Integrating SQP and branch-and-bound for mixed integer nonlinear program-ming. Computational Optimization and Application, 18:295–309, 2001.

[84] H. L. Li and C. T. Chou. A global approach for nonlinear mixed discrete programming indesign optimization. Engineering Optimization, 22:109–122, 1994.

[85] P. Lindstrom. A stabilized gauß-newton algorithm for unconstrained least squares prob-lems. Technical Report UMINF-102.82, Institute of Information Processing, University ofUmea, Umea, Sweden, 1982.

[86] P. Lindstrom. A general purpose algorithm for nonlinear least squares problems with non-linear constraints. Technical Report UMINF-103.83, Institute of Information Processing,University of Umea, Umea, Sweden, 1983.

[87] S. Lucidi, F. Rochetich, and M. Roma. Curvilinear stabilization techniques for truncatednewton methods in large-scale unconstrained optimization. SIAM Journal on Optimiza-tion, 8:916–939, 1998.

[88] L. Luksan. An implementation of recursive quadratic programming variable metric meth-ods for linearly constrained nonlinear minmax approximations. Kybernetika, 21:22–40,1985.

[89] I. Lustig, R. Marsten, and D. Shanno. On implementing Mehrotra’s predictor-correctorinterior point method for linear programming. SIAM J. Optimization, 2(3):435–449, 1992.

[90] N. Mahdavi-Amiri. Generally constrained nonlinear least squares and generating nonlin-ear programming test problems: Algorithmic approach. PhD thesis, The John HopkinsUniversity, Baltimore, Maryland, USA, 1981.

[91] N. Mahdavi-Amiri and R. Bartels. Constrained nonlinear least squares: an exact penaltyapproach with projected structured quasi-newton updates. ACM Transactions on Mathe-matical Software (TOMS), 15:220–242, 1989.

[92] D. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. SIAMJournal on Applied Mathematics, 11:431–441, 1963.

[93] S. Mehrotra. On the implementation of a primal-dual interior point method. SIAM Journal on Optimization, 2(4):575–601, 1992.


[94] J. More. The Levenberg-Marquardt algorithm: implementation and theory. In G. Watson, editor, Numerical Analysis, volume 630 of Lecture Notes in Mathematics. Springer, Berlin, 1977.

[95] J. J. More. Recent developments in algorithms and software for trust region methods. In A. Bachem, M. Grotschel, and B. Korte, editors, Mathematical Programming: The State of the Art, pages 258–287. Springer, Heidelberg, Berlin, New York, 1983.

[96] K. Musselman and J. Talavage. A trade-off cut approach to multiple objective optimization. Operations Research, 28(6):1424–1435, 1980.

[97] Y. Nesterov and A. Nemirovskii. Interior Point Polynomial Methods in Convex Programming. SIAM publications, 1994.

[98] J. Nocedal, A. Wachter, and R. A. Waltz. Adaptive barrier strategies for nonlinear interior methods. Technical Report RC 23563, IBM T. J. Watson Research Center, March 2005; revised January 2006.

[99] K. Oppenheimer. A proxy approach to multi-attribute decision-making. Management Science, 24:675–689, 1978.

[100] J. Ortega and W. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York, San Francisco, London, 1970.

[101] E. Panier and A. Tits. Avoiding the Maratos effect by means of a nonmonotone line search, I: General constrained problems. SIAM Journal on Numerical Analysis, 28:1183–1195, 1991.

[102] P. Papalambros and D. Wilde. Principles of Optimal Design. Cambridge University Press, 1988.

[103] J. Pinter. Global Optimization in Action. Kluwer Academic Publishers, Dordrecht, 1996.

[104] M. Powell. The convergence of variable metric methods for nonlinearly constrained optimization calculations. In O. Mangasarian, R. Meyer, and S. Robinson, editors, Nonlinear Programming, volume 3, pages 27–63. Academic Press, 1978.

[105] M. Powell. A fast algorithm for nonlinearly constrained optimization calculations. In G. Watson, editor, Numerical Analysis, volume 630 of Lecture Notes in Mathematics. Springer, Berlin, 1978.

[106] M. Powell. ZQPCVX, a Fortran subroutine for convex quadratic programming. Technical Report DAMTP/1983/NA17, University of Cambridge, 1983.

[107] M. Powell. On the global convergence of trust region algorithms for unconstrained minimization. Mathematical Programming, 29:297–303, 1984.

[108] M. Powell. A direct search optimization method that models the objective and constraint functions by linear interpolation. In S. Gomez and J.-P. Hennart, editors, Advances in Optimization and Numerical Analysis, pages 51–67. Kluwer Academic, Dordrecht, 1994.


[109] M. Powell. A view of algorithms for optimization without derivatives. Technical Report DAMTP 2007/NA 03, University of Cambridge, Cambridge, 2007.

[110] M. J. D. Powell and Y. Yuan. A trust region algorithm for equality constrained optimization. Mathematical Programming, 49:189–211, 1991.

[111] J. Rajesh, K. Gupta, H. S. Kusumakar, V. K. Jayaraman, and B. D. Kulkarni. Dynamic optimization of chemical processes using ant colony framework. Comput. Chem., 25(6):583–595, 2001.

[112] H. Ramsin and P. Wedin. A comparison of some algorithms for the nonlinear least squares problem. Nordisk Tidskr. Informationsbehandling (BIT), 17:72–90, 1977.

[113] M. Raydan. The Barzilai and Borwein gradient method for the large-scale unconstrained minimization problem. SIAM Journal on Optimization, 7:26–33, 1997.

[114] T. L. Saaty. A scaling method for priorities in hierarchical structures. Journal of Mathematical Psychology, 15:234–281, 1977.

[115] O. Schenk, A. Wachter, and M. Hagemann. Combinatorial approaches to the solution of saddle-point problems in large-scale parallel interior-point optimization. Technical Report RC 23824, IBM T. J. Watson Research Center, December 2005.

[116] K. Schittkowski. Nonlinear Programming Codes, volume 183 of Lecture Notes in Economics and Mathematical Systems. Springer, 1980.

[117] K. Schittkowski. The nonlinear programming method of Wilson, Han and Powell. Part 1: Convergence analysis. Numerische Mathematik, 38:83–114, 1981.

[118] K. Schittkowski. The nonlinear programming method of Wilson, Han and Powell. Part 2: An efficient implementation with linear least squares subproblems. Numerische Mathematik, 38:115–127, 1981.

[119] K. Schittkowski. Nonlinear programming methods with linear least squares subproblems. In J. M. Mulvey, editor, Evaluating Mathematical Programming Techniques, volume 199 of Lecture Notes in Economics and Mathematical Systems. Springer, 1982.

[120] K. Schittkowski. On the convergence of a sequential quadratic programming method with an augmented Lagrangian search direction. Mathematische Operationsforschung und Statistik, Series Optimization, 14:197–216, 1983.

[121] K. Schittkowski. Theory, implementation and test of a nonlinear programming algorithm. In H. Eschenauer and N. Olhoff, editors, Optimization Methods in Structural Design, volume 199. Wissenschaftsverlag, 1983.

[122] K. Schittkowski. On the global convergence of nonlinear programming algorithms. ASME Journal of Mechanics, Transmissions, and Automation in Design, 107:454–458, 1985.

[123] K. Schittkowski. NLPQL: A Fortran subroutine solving constrained nonlinear programming problems. Annals of Operations Research, 5:458–500, 1985/86.


[124] K. Schittkowski. More Test Examples for Nonlinear Programming Codes, volume 282 of Lecture Notes in Economics and Mathematical Systems. Springer, 1987.

[125] K. Schittkowski. Solving nonlinear least squares problems by a general purpose SQP-method. In K.-H. Hoffmann, J.-B. Hiriart-Urruty, C. Lemarechal, and J. Zowe, editors, Trends in Mathematical Optimization, volume 84 of International Series of Numerical Mathematics, pages 295–309. Birkhauser, 1988.

[126] K. Schittkowski. Solving nonlinear programming problems with very many constraints. Optimization, 25:179–196, 1992.

[127] K. Schittkowski. Parameter estimation in systems of nonlinear equations. Numerische Mathematik, 68:129–142, 1994.

[128] K. Schittkowski. Numerical Data Fitting in Dynamical Systems. Kluwer Academic Publishers, Dordrecht, 2002.

[129] K. Schittkowski. Test problems for nonlinear programming - user's guide. Technical report, Department of Mathematics, University of Bayreuth, 2002.

[130] K. Schittkowski. QL: A Fortran code for convex quadratic programming - user's guide. Technical report, Department of Mathematics, University of Bayreuth, 2003.

[131] K. Schittkowski. DFNLP: A Fortran implementation of an SQP-Gauss-Newton algorithm - user's guide, version 2.0. Technical report, Department of Computer Science, University of Bayreuth, 2005.

[132] K. Schittkowski. NLPQLP: A Fortran implementation of a sequential quadratic programming algorithm with distributed and non-monotone line search - user's guide, version 2.2. Technical report, Department of Computer Science, University of Bayreuth, 2006.

[133] K. Schittkowski. An active set strategy for solving optimization problems with up to 60,000,000 nonlinear constraints. Technical report, Department of Computer Science, University of Bayreuth, 2007. Submitted for publication.

[134] K. Schittkowski. NLPLSQ: A Fortran implementation of an SQP-Gauss-Newton algorithm for least-squares optimization - user's guide. Technical report, Department of Computer Science, University of Bayreuth, 2007.

[135] K. Schittkowski. NLPMMX: A Fortran implementation of a sequential quadratic programming algorithm for solving constrained nonlinear min-max problems - user's guide, version 1.0. Technical report, Department of Computer Science, University of Bayreuth, 2007.

[136] K. Schittkowski. NLPQLB: A Fortran implementation of a sequential quadratic programming algorithm with active set strategy for solving optimization problems with a very large number of nonlinear constraints - user's guide, version 2.0. Technical report, Department of Computer Science, University of Bayreuth, 2007.

[137] K. Schittkowski. NLPQLB: A Fortran implementation of an SQP algorithm with active set strategy for solving optimization problems with a very large number of nonlinear constraints - user's guide, version 2.0. Technical report, Department of Computer Science, University of Bayreuth, 2007.


[138] K. Schittkowski. NLPQLP: A Fortran implementation of a sequential quadratic programming algorithm with distributed and non-monotone line search - user's guide, version 2.24. Technical report, Department of Computer Science, University of Bayreuth, 2007.

[139] K. Schittkowski. QL: A Fortran code for convex quadratic programming - user's guide. Technical report, Department of Mathematics, University of Bayreuth, 2007.

[140] K. Schittkowski. NLPINF: A Fortran implementation of an SQP algorithm for maximum-norm optimization problems - user's guide. Technical report, Department of Computer Science, University of Bayreuth, 2008.

[141] K. Schittkowski. NLPL1: A Fortran implementation of an SQP algorithm for minimizing sums of absolute function values - user's guide. Technical report, Department of Computer Science, University of Bayreuth, 2008.

[142] K. Schittkowski. NLPMMX: A Fortran implementation of an SQP-Gauss-Newton algorithm for min-max optimization - user's guide. Technical report, Department of Computer Science, University of Bayreuth, 2008.

[143] K. Schittkowski. NLPQLG: Heuristic global optimization - user's guide, version 2.1. Technical report, Department of Computer Science, University of Bayreuth, 2008.

[144] K. Schittkowski. A collection of 186 test problems for nonlinear mixed-integer programming in Fortran - user's guide. Technical report, Department of Computer Science, University of Bayreuth, Bayreuth, 2012.

[145] K. Schittkowski, C. Zillober, and R. Zotemantel. Numerical comparison of nonlinear programming algorithms for structural optimization. Structural Optimization, 7(1):1–28, 1994.

[146] M. Schluter, J. A. Egea, and J. R. Banga. Extended ant colony optimization for non-convex mixed integer nonlinear programming. Computers & Operations Research, 7, 2009.

[147] A. T. Serban and G. Sandou. Mixed ant colony optimization for the unit commitment problem. In B. Beliczynski, A. Dzielinski, M. Iwanowski, and B. Ribeiro, editors, Adaptive and Natural Computing Algorithms, pages 332–340. Springer, Berlin, Heidelberg, 2007.

[148] K. Socha. ACO for continuous and mixed-variable optimization. In M. Dorigo, M. Birattari, C. Blum, L. M. Gambardella, F. Mondada, and T. Stutzle, editors, Ant Colony Optimization and Swarm Intelligence, pages 25–36. Springer, Berlin, Heidelberg, 2004.

[149] K. Socha and M. Dorigo. Ant colony optimization for continuous domains. Eur. J. Oper. Res., 185:1155–1173, 2008.

[150] P. Spellucci. Numerische Verfahren der nichtlinearen Optimierung. Birkhauser, 1993.

[151] J. Stoer. Foundations of recursive quadratic programming methods for solving nonlinear programs. In K. Schittkowski, editor, Computational Mathematical Programming, volume 15 of NATO ASI Series, Series F: Computer and Systems Sciences. Springer, 1985.


[152] T. Stuetzle and M. Dorigo. ACO algorithms for the travelling salesman problem. In K. Miettinen, M. M. Maekelae, P. Neittaanmaeki, and J. Periaux, editors, Evolutionary Algorithms in Engineering and Computer Science, pages 163–183. John Wiley & Sons, Chichester (UK), 1999.

[153] K. Svanberg. The Method of Moving Asymptotes – a new method for structural optimization. International Journal for Numerical Methods in Engineering, 24:359–373, 1987.

[154] K. Svanberg. A globally convergent version of MMA without linesearch. In N. Olhoff and G. Rozvany, editors, Proceedings of the First World Congress of Structural and Multidisciplinary Optimization, pages 9–16. Pergamon, 1995.

[155] K. Svanberg. Two primal-dual interior-point methods for the MMA subproblems. Technical Report TRITA/MAT-98-OS12, Department of Mathematics, KTH, Stockholm, 1998.

[156] Y. Tanaka, M. Fukushima, and T. Ibaraki. A comparative study of several semi-infinite nonlinear programming algorithms. European Journal of Operational Research, 36:92–100, 1988.

[157] P. Toint. An assessment of nonmonotone line search techniques for unconstrained optimization. SIAM Journal on Scientific Computing, 17:725–739, 1996.

[158] P. Toint. A nonmonotone trust-region algorithm for nonlinear optimization subject to convex constraints. Mathematical Programming, 77:69–94, 1997.

[159] A. Torn and A. Zilinskas. Global Optimization, volume 350 of Lecture Notes in Computer Science. Springer, Heidelberg, 1989.

[160] R. Vanderbei. Linear Programming: Foundations and Extensions. Kluwer Academic Publishers, 1996.

[161] A. Wachter. An Interior Point Algorithm for Large-Scale Nonlinear Optimization with Applications in Process Engineering. PhD thesis, Carnegie Mellon University, 2002.

[162] A. Wachter and L. T. Biegler. Line search filter methods for nonlinear programming: Motivation and global convergence. SIAM Journal on Optimization, 16(1):1–31, 2005.

[163] A. Wachter and L. T. Biegler. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1):25–57, 2006.

[164] F. Walz. An engineering approach: Hierarchical optimization criteria. IEEE Trans. Automatic Control, 12:179, 1967.

[165] T. Westerlund and R. Porn. Solving pseudo-convex mixed integer optimization problems by cutting plane techniques. Optim. Eng., 3:253–280, 2002.

[166] P. Wolfe. Convergence conditions for ascent methods. SIAM Review, 11:226–235, 1969.

[167] S. Wright. Primal-Dual Interior-Point Methods. SIAM publications, 1997.


[168] O. Yeniay. Penalty function methods for constrained optimization with genetic algorithms. Mathematical and Computational Applications, 10:45–56, 2005.

[169] L. Yu, K. Liu, and K. Li. Ant colony optimization in continuous problem. Frontiers of Mechanical Engineering in China, 2(4):459–462, 2007.

[170] Y. Yuan. On the superlinear convergence of a trust region algorithm for nonsmooth optimization. Mathematical Programming, 31:269–285, 1985.

[171] Y. Yuan. On the convergence of a new trust region algorithm. Numerische Mathematik, 70:515–539, 1995.

[172] B. Zhang, D. Chen, and W. Zhao. Iterative ant-colony algorithm and its application to dynamic optimization of chemical process. Comput. Chem. Eng., 29(10):2078–2086, 2005.

[173] C. Zillober. A globally convergent version of the method of moving asymptotes. Structural Optimization, 6:166–174, 1993.

[174] C. Zillober. A practical interior point method for a nonlinear programming problem arising in sequential convex programming. Technical Report TR98-1, Informatik, Universitat Bayreuth, WWW: www.uni-bayreuth.de/departments/math/~czillober/papers/tr98-1.ps, 1998.

[175] C. Zillober. A combined convex approximation – interior point approach for large scale nonlinear programming. Optimization and Engineering, 2(1):51–73, 2001.

[176] C. Zillober. Global convergence of a nonlinear programming method using convex approximations. Numerical Algorithms, 27(3):256–289, 2001.

[177] C. Zillober. SCPIP — an efficient software tool for the solution of structural optimization problems. Structural and Multidisciplinary Optimization, 24(5):362–371, 2002.

[178] C. Zillober, K. Schittkowski, and K. Moritzen. Very large-scale optimization by sequential convex programming. Optimization Methods and Software, 19(1):103–120, 2004.
