Multiobjective Optimization

Frederico Gadelha Guimarães (fredericoguimaraes@ufmg.br)

Department of Electrical Engineering, Universidade Federal de Minas Gerais

Belo Horizonte, Brazil

Introduction

General formulation of optimization problems

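$$\min f(x), \quad f(x) \in \mathbb{R}, \quad x \in \Phi$$

$$\Phi = \begin{cases} g_i(x) \le 0, & i = 1,\dots,p \\ h_j(x) = 0, & j = 1,\dots,q \\ x \in X \end{cases}$$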

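In vector notation, with $g = (g_1, \dots, g_p)'$ and $h = (h_1, \dots, h_q)'$, the same problem reads:

$$\min f(x), \quad f(x) \in \mathbb{R}, \quad x \in \Phi$$

$$\Phi = \begin{cases} g(x) \le 0 \\ h(x) = 0 \\ x \in X \end{cases}$$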

Definitions

Objective function

● The objective function (cost function or optimization criterion) is a function to be optimized (minimized) by the optimization algorithm.

Global optimum (global minimum)

● A point $x^* \in X$ is a global optimum of $f : X \subset \mathbb{R}^n \to Y \subset \mathbb{R}$ if

$$f(x^*) \le f(x), \quad \forall x \in X,\ x \ne x^*$$


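Strong local minimum (strict local minimum)

● A strict local minimum $x^*$ of $f : X \subset \mathbb{R}^n \to Y \subset \mathbb{R}$ is defined in terms of its neighborhood $V_\epsilon(x^*)$:

$$f(x^*) < f(x), \quad \forall x \ne x^* \wedge x \in V_\epsilon(x^*)$$

Weak local minimum (non-strict local minimum)

● A weak local minimum only requires the non-strict inequality in the neighborhood:

$$f(x^*) \le f(x), \quad \forall x \ne x^* \wedge x \in V_\epsilon(x^*)$$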


Convex sets

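● A set $Q \subset \mathbb{R}^n$ is convex if

$$z = \lambda x_1 + (1 - \lambda) x_2 \in Q, \quad \forall x_1, x_2 \in Q \wedge 0 \le \lambda \le 1$$

● Examples

– the disk $A = \{(x_1, x_2) : x_1^2 + x_2^2 \le 4\} \subset \mathbb{R}^2$;

– the polyhedron $P = \{x : Ax \le b\}$.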


Convex functions

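● A function $f : X \subset \mathbb{R}^n \to Y \subset \mathbb{R}$, defined on a convex set $X$, is convex if

$$f[\lambda x_1 + (1 - \lambda) x_2] \le \lambda f(x_1) + (1 - \lambda) f(x_2), \quad \forall x_1, x_2 \in X \wedge 0 \le \lambda \le 1$$

● A function is strictly convex if the above inequality is strict for all $x_1 \ne x_2$ and $0 < \lambda < 1$.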


● Sub-level region of $f : X \subset \mathbb{R}^n \to Y \subset \mathbb{R}$:

$$R(f, \alpha) = \{x \in X : f(x) \le \alpha\}$$

● The sub-level region of a convex function is a convex set.

● Level surface (or level curve):

$$S(f, \alpha) = \{x \in X : f(x) = \alpha\}$$


Unimodal function

● A function is unimodal if its sub-level region is a connected set for all values of alpha.

Multimodal function

● A function is multimodal if the associated sub-level region is disconnected for some value of alpha.


Attraction basin

● Around local minima, there are regions in which the function behaves as if it were unimodal. Such regions are named attraction basins.

● The attraction basin $B(x^*)$ of a local minimum $x^*$ is defined by the largest connected sub-level region $R(f, \alpha)$ that contains $x^*$.

● A local search method converges to the local minimum $x^*$ of an attraction basin if it starts from $x_0 \in B(x^*)$.


Differentiable functions

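● A function is differentiable in the domain if there is a Gradient vector defined by

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1} \;\; \frac{\partial f}{\partial x_2} \;\; \cdots \;\; \frac{\partial f}{\partial x_n} \right)', \quad x \in X$$

● A function is differentiable to the second order in the domain if there is a Hessian matrix given by

$$H(x) = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{pmatrix}, \quad x \in X$$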


● Calculate the Gradient and the Hessian of the following function:

$$f(x) = 10(x_2 - x_1^2)^2 + (1 - x_1)$$
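A possible worked answer to the exercise, written as a short Python check: the analytic Gradient and Hessian of the function above, compared against central finite differences at an arbitrary test point. The helper names (`grad_f`, `hess_f`, `central_grad`) and the test point are illustrative choices, not part of the original material.

```python
def f(x1, x2):
    """Objective of the exercise: f(x) = 10 (x2 - x1^2)^2 + (1 - x1)."""
    return 10.0 * (x2 - x1**2) ** 2 + (1.0 - x1)


def grad_f(x1, x2):
    """Analytic Gradient (df/dx1, df/dx2)."""
    return [-40.0 * x1 * (x2 - x1**2) - 1.0,
            20.0 * (x2 - x1**2)]


def hess_f(x1, x2):
    """Analytic Hessian (symmetric 2x2 matrix)."""
    return [[120.0 * x1**2 - 40.0 * x2, -40.0 * x1],
            [-40.0 * x1, 20.0]]


def central_grad(func, x1, x2, h=1e-6):
    """Central finite-difference Gradient, used only to check grad_f."""
    return [(func(x1 + h, x2) - func(x1 - h, x2)) / (2.0 * h),
            (func(x1, x2 + h) - func(x1, x2 - h)) / (2.0 * h)]


# Compare analytic and numerical Gradients at an arbitrary point
x1, x2 = 0.7, -0.3
analytic = grad_f(x1, x2)
numeric = central_grad(f, x1, x2)
print(analytic, numeric)
```

The same finite-difference idea can be applied to `grad_f` to check `hess_f` column by column.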

Optimality conditions

Taylor series

● A sufficiently smooth (three times continuously differentiable) function can be approximated locally by its Taylor series expansion:

$$f(x) = f(x_0) + \nabla f(x_0)'(x - x_0) + \frac{1}{2}(x - x_0)' H(x_0)(x - x_0) + O(\|x - x_0\|^3)$$

Theorem (Necessary condition of 1st order)

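● The Gradient at a local minimum $x^*$ of a differentiable function (in the interior of its domain) is null. Writing $\nabla f^* = \nabla f(x^*)$:

$$f(x^*) \le f(x), \quad \forall x \in V_\epsilon(x^*)$$

$$f(x) \ge f(x^*)$$

$$f(x^*) + (\nabla f^*)'(x - x^*) + O(\|x - x^*\|^2) \ge f(x^*)$$

Choosing $x - x^* = -\alpha \nabla f^*$, with $\alpha > 0$ small enough that $x \in V_\epsilon(x^*)$:

$$(\nabla f^*)'(-\alpha \nabla f^*) + O(\|x - x^*\|^2) \ge 0$$

$$-\alpha \|\nabla f^*\|^2 + O(\|x - x^*\|^2) \ge 0$$

Dividing by $\alpha$ and noting that $\|x - x^*\| = \alpha \|\nabla f^*\|$, the remainder term vanishes in the limit:

$$\lim_{x \to x^*} \frac{O(\|x - x^*\|^2)}{\alpha} = 0 \;\Rightarrow\; -\|\nabla f^*\|^2 \ge 0 \;\Rightarrow\; \nabla f^* = 0$$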

Theorem (Necessary condition of 2nd order)

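● The Hessian matrix at a local minimum is positive semidefinite. Using $\nabla f^* = 0$ in the Taylor expansion:

$$f(x^*) \le f(x), \quad \forall x \in V_\epsilon(x^*)$$

$$f(x) \ge f(x^*)$$

$$f(x^*) + \frac{1}{2}(x - x^*)' H^* (x - x^*) + O(\|x - x^*\|^3) \ge f(x^*)$$

$$\frac{1}{2}(x - x^*)' H^* (x - x^*) + O(\|x - x^*\|^3) \ge 0$$

Dividing by $\|x - x^*\|^2$:

$$\frac{1}{2} \frac{(x - x^*)'}{\|x - x^*\|} H^* \frac{(x - x^*)}{\|x - x^*\|} + \frac{O(\|x - x^*\|^3)}{\|x - x^*\|^2} \ge 0$$

Letting $x \to x^*$ along an arbitrary direction $u = (x - x^*)/\|x - x^*\|$, the last term vanishes and

$$u' H^* u \ge 0, \quad \forall u$$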

● What about constrained problems?

● How can we determine the optimality conditions?

Consider the following constrained problem:

$$\min f(x), \quad \text{with } h(x) = 0$$

At the minimum, assuming $\nabla h(x^*) \ne 0$, the Gradients of the objective function and the constraint are parallel (why?):

$$\nabla f(x^*) = -\lambda^* \nabla h(x^*)$$

$$\nabla f(x^*) + \lambda^* \nabla h(x^*) = 0$$

$$\nabla [f(x^*) + \lambda^* h(x^*)] = 0$$

$$\nabla_x L(x^*, \lambda^*) = 0$$

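The solution of the constrained problem

$$\min f(x), \quad \text{with } h(x) = 0$$

is a critical point of the Lagrangean function

$$L(x, \lambda) = f(x) + \lambda h(x)$$

that is,

$$\nabla_x L(x^*, \lambda^*) = 0, \qquad \nabla_\lambda L(x^*, \lambda^*) = 0$$

where the second condition, $\nabla_\lambda L(x^*, \lambda^*) = h(x^*) = 0$, recovers the constraint.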
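A small worked example, chosen here for illustration: minimize $f(x) = x_1^2 + x_2^2$ subject to $h(x) = x_1 + x_2 - 1 = 0$.

$$\begin{aligned}
L(x, \lambda) &= x_1^2 + x_2^2 + \lambda (x_1 + x_2 - 1) \\
\nabla_x L = 0 &\;\Rightarrow\; 2x_1 + \lambda = 0, \quad 2x_2 + \lambda = 0 \\
\nabla_\lambda L = 0 &\;\Rightarrow\; x_1 + x_2 - 1 = 0 \\
&\;\Rightarrow\; x_1^* = x_2^* = \tfrac{1}{2}, \quad \lambda^* = -1
\end{aligned}$$

At the solution, $\nabla f(x^*) = (1, 1)'$ and $\nabla h(x^*) = (1, 1)'$, so the two Gradients are parallel, with $\nabla f(x^*) = -\lambda^* \nabla h(x^*)$.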

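Now consider the constrained problem with an inequality constraint:

$$\min f(x), \quad \text{with } g(x) \le 0$$

Introducing a slack variable $z$, the inequality is written as the equality

$$g(x) + z^2 = 0$$

and the Lagrangean function is given by:

$$L(x, z, \mu) = f(x) + \mu [g(x) + z^2]$$

At the solution:

$$\nabla_x L(x^*, z^*, \mu^*) = 0 \;\Rightarrow\; \nabla f(x^*) + \mu^* \nabla g(x^*) = 0$$

$$\nabla_z L(x^*, z^*, \mu^*) = 0 \;\Rightarrow\; 2\mu^* z^* = 0$$

$$\nabla_\mu L(x^*, z^*, \mu^*) = 0 \;\Rightarrow\; g(x^*) + z^{*2} = 0$$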

The second condition implies either $z^* = 0$ or $\mu^* = 0$.

● If $z^* = 0$, then $g(x^*) = 0$: the solution lies on the border of the feasible region and we say that g is an active constraint.

● If $\mu^* = 0$, then $z^*$ can be different from 0; in that case $g(x^*) = -z^{*2} < 0$, the constraint is strictly satisfied at the solution and we say that g is inactive.


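We can replace the second condition, $2\mu^* z^* = 0$, by an equivalent one, thus eliminating the need for the slack variable $z$. Since $z^{*2} = -g(x^*)$, multiplying $2\mu^* z^* = 0$ by $z^*/2$ gives $\mu^* z^{*2} = -\mu^* g(x^*) = 0$ (and, conversely, $\mu^* z^{*2} = 0$ implies $\mu^* z^* = 0$):

$$2\mu^* z^* = 0 \;\Rightarrow\; \mu^* g(x^*) = 0$$

The conditions then become:

$$\nabla f(x^*) + \mu^* \nabla g(x^*) = 0$$

$$\mu^* g(x^*) = 0$$

$$g(x^*) \le 0$$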

Karush-Kuhn-Tucker optimality conditions

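For the general problem

$$\min f(x), \quad x \in \Phi, \quad \Phi = \begin{cases} g_i(x) \le 0, & i = 1,\dots,p \\ h_j(x) = 0, & j = 1,\dots,q \\ x \in X \end{cases}$$

the following conditions hold at a local solution $x^*$, provided a constraint qualification (for example, linear independence of the Gradients of the active constraints) is satisfied:

$$\nabla f(x^*) + \sum_{i=1}^{p} \mu_i^* \nabla g_i(x^*) + \sum_{j=1}^{q} \lambda_j^* \nabla h_j(x^*) = 0$$

$$\mu_i^* g_i(x^*) = 0, \quad \mu_i^* \ge 0, \quad i = 1,\dots,p$$

$$g_i(x^*) \le 0, \quad h_j(x^*) = 0$$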
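The KKT conditions can be checked numerically at a candidate point. The sketch below uses a hypothetical example, $\min (x_1 - 2)^2 + (x_2 - 1)^2$ subject to $g(x) = x_1 + x_2 - 2 \le 0$, whose solution $x^* = (1.5, 0.5)$ with $\mu^* = 1$ can be worked out by hand (the unconstrained minimizer $(2, 1)$ is infeasible, so the constraint is active). The function `kkt_residuals` is an illustrative helper, not a library routine.

```python
def grad_f(x):
    # f(x) = (x1 - 2)^2 + (x2 - 1)^2
    return [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)]


def g(x):
    # single inequality constraint g(x) = x1 + x2 - 2 <= 0
    return x[0] + x[1] - 2.0


def grad_g(x):
    return [1.0, 1.0]


def kkt_residuals(x, mu):
    """Return the violation of each KKT condition (all zero at a KKT point)."""
    stationarity = [gf + mu * gg for gf, gg in zip(grad_f(x), grad_g(x))]
    return {
        "stationarity": max(abs(s) for s in stationarity),
        "complementarity": abs(mu * g(x)),
        "dual_feasibility": max(0.0, -mu),      # requires mu >= 0
        "primal_feasibility": max(0.0, g(x)),   # requires g(x) <= 0
    }


print(kkt_residuals([1.5, 0.5], 1.0))   # candidate solution: all residuals zero
print(kkt_residuals([2.0, 1.0], 0.0))   # unconstrained minimizer: infeasible
```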

Deterministic optimization methods

● Derivative methods

– Gradient method (steepest descent method);

– Newton method;

– Marquardt method;

– Quasi-Newton methods;

– Conjugate Gradients methods;

● Non-derivative methods

– Nelder-Mead Simplex;

– Hooke-Jeeves method (pattern search);


General structure of derivative methods (search direction based):

$$x_{k+1} \leftarrow x_k + \alpha_k d_k$$

● Methods vary in the way the step size $\alpha_k$ and the search direction $d_k$ are calculated.

Gradient method

● The Gradient method (or Cauchy method or Steepest Descent Method) is the simplest one among the derivative methods. Its search direction is the negative Gradient:

$$d_k = -\nabla f(x_k)$$

● It was developed by Cauchy in 1847.

A. L. Cauchy, Méthode générale pour la résolution des systèmes d’équations simultanées, Comptes Rendus de l’Academie des Sciences, Paris, Vol. 25, pp. 536–538, 1847.


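Algorithm: Gradient (Cauchy) method

Input: $x_0 \in X$, $f : X \to Y$

1. $k \leftarrow 0$
2. while stop criterion is not met do
3. Calculate $\nabla f(x_k)$
4. $d_k \leftarrow -\nabla f(x_k)$
5. $\alpha_k \leftarrow \arg\min_{\alpha} f(x_k + \alpha d_k)$
6. $x_{k+1} \leftarrow x_k + \alpha_k d_k$
7. $k \leftarrow k + 1$
8. end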
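A minimal Python sketch of the algorithm above. The slides leave the line search unspecified; here, as an assumption, $\alpha_k$ is found by golden-section search on the interval $[0, 1]$, and the stop criterion is a small Gradient norm. The ill-conditioned test function $f(x) = x_1^2 + 10x_2^2$ is a hypothetical example on which the method needs many iterations because of its zigzag path.

```python
import math


def golden_section(phi, a=0.0, b=1.0, tol=1e-8):
    """Minimize a unimodal one-dimensional function phi on [a, b]."""
    inv_gr = (math.sqrt(5.0) - 1.0) / 2.0   # 1 / golden ratio, about 0.618
    c = b - inv_gr * (b - a)
    d = a + inv_gr * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):      # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_gr * (b - a)
        else:                    # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_gr * (b - a)
    return 0.5 * (a + b)


def gradient_method(f, grad, x0, eps=1e-6, max_iter=10000, alpha_max=1.0):
    """Cauchy (steepest descent) method with a golden-section line search."""
    x = list(x0)
    k = 0
    while k < max_iter:
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) <= eps:   # stop: Gradient close to zero
            break
        d = [-gi for gi in g]                            # d_k = -grad f(x_k)

        def phi(alpha):                                  # f along the search direction
            return f([xi + alpha * di for xi, di in zip(x, d)])

        alpha_k = golden_section(phi, 0.0, alpha_max)
        x = [xi + alpha_k * di for xi, di in zip(x, d)]  # x_{k+1} = x_k + alpha_k d_k
        k += 1
    return x, k


def f(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2


def grad(x):
    return [2.0 * x[0], 20.0 * x[1]]


x_min, iterations = gradient_method(f, grad, [10.0, 1.0])
print(x_min, iterations)
```

Golden-section search only assumes that $f(x_k + \alpha d_k)$ is unimodal on the chosen interval; for this quadratic the exact step always lies between 1/20 and 1/2, so $[0, 1]$ is a safe bracket.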


● The algorithm generates a monotonic sequence $\{x_k, f(x_k)\}$, with $f(x_{k+1}) \le f(x_k)$, such that $\nabla f(x_k) \to 0$ when $k \to \infty$ (under mild conditions, e.g., $f$ bounded below with a Lipschitz-continuous Gradient);

● The step size $\alpha_k$ is a non-negative scalar that minimizes the function along the search direction from the current solution, i.e., it determines how far to move along the descent direction;

● In practice, the step size should be calculated with a one-dimensional minimization method (line search).


Numerical evaluation of the Gradient

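● Forward finite difference approximation:

$$\left. \frac{\partial f}{\partial x_i} \right|_x \approx \frac{f(x + \delta_i e_i) - f(x)}{\delta_i}, \quad i = 1,\dots,n$$

● Central finite difference approximation:

$$\left. \frac{\partial f}{\partial x_i} \right|_x \approx \frac{f(x + \delta_i e_i) - f(x - \delta_i e_i)}{2\delta_i}, \quad i = 1,\dots,n$$

where $e_i$ is the $i$-th unit vector and $\delta_i > 0$ is a small perturbation.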
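A short Python sketch of both approximations, using the same perturbation $\delta$ for every coordinate (an assumption; the formulas allow a different $\delta_i$ per variable). On the hypothetical test function $f(x) = e^{x_1} + x_1 x_2^2$ the forward formula has an error of order $\delta$, while the central formula, at the cost of $2n$ instead of $n + 1$ evaluations, has an error of order $\delta^2$.

```python
import math


def forward_diff_grad(f, x, delta=1e-6):
    """Forward differences: n + 1 evaluations of f, error O(delta)."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += delta                      # x + delta * e_i
        grad.append((f(xp) - fx) / delta)
    return grad


def central_diff_grad(f, x, delta=1e-6):
    """Central differences: 2n evaluations of f, error O(delta^2)."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += delta                      # x + delta * e_i
        xm[i] -= delta                      # x - delta * e_i
        grad.append((f(xp) - f(xm)) / (2.0 * delta))
    return grad


def f(x):
    # Test function f(x) = exp(x1) + x1 * x2^2
    return math.exp(x[0]) + x[0] * x[1] ** 2


x = [1.0, 2.0]
exact = [math.exp(1.0) + 4.0, 4.0]          # (exp(x1) + x2^2, 2 x1 x2) at x
fwd = forward_diff_grad(f, x, 1e-4)
cen = central_diff_grad(f, x, 1e-4)
print(fwd, cen, exact)
```

Choosing $\delta$ is a trade-off: a smaller $\delta$ reduces the truncation error but amplifies floating-point cancellation in the numerator.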


Possible stop criteria

● Gradient close to zero: $\|\nabla f(x_k)\| \le \epsilon$

● Stabilization of the variables: $\|x_k - x_{k-1}\| \le \epsilon$

● Stabilization of function values: $|f(x_k) - f(x_{k-1})| \le \epsilon$


Difficulties

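● Slow convergence, especially for ill-conditioned problems;

● Zigzag effect: with exact line searches, consecutive search directions are orthogonal;

● May get trapped at points where the function is not differentiable;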

One-dimensional minimization

Newton method

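● Let $f : X \to Y$, $f \in C^2$. From its Taylor series expansion around $x_k$:

$$f(x) = f(x_k) + \nabla f(x_k)'(x - x_k) + \frac{1}{2}(x - x_k)' H(x_k)(x - x_k) + O(\|x - x_k\|^3)$$

● Dropping the third-order term gives a quadratic model:

$$f(x) \approx f(x_k) + \nabla f(x_k)'(x - x_k) + \frac{1}{2}(x - x_k)' H(x_k)(x - x_k)$$

● Setting the Gradient of this model to zero at $x = x_{k+1}$:

$$\nabla f(x_k) + H(x_k)(x_{k+1} - x_k) = 0$$

$$x_{k+1} = x_k - H^{-1}(x_k) \nabla f(x_k)$$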


● If the objective function is quadratic with a positive definite Hessian, the Newton method gives the solution in one step;

● The inverse of the Hessian can be interpreted as a “correction” applied to the Gradient direction, considering the curvature of the function;

● In the general case, for non-quadratic functions, the step size should be computed:

$$x_{k+1} = x_k + \alpha_k d_k, \quad d_k = -H^{-1}(x_k) \nabla f(x_k)$$
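A minimal Python check of the one-step property, on a hypothetical quadratic $f(x) = 2x_1^2 + x_1 x_2 + x_2^2 - x_1 - 3x_2$ with positive definite Hessian. As is common practice, the sketch solves $H(x_k) d_k = -\nabla f(x_k)$ (here by Cramer's rule for the 2x2 case) instead of forming $H^{-1}$ explicitly.

```python
def solve_2x2(M, v):
    """Solve the 2x2 linear system M y = v by Cramer's rule."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [(v[0] * d - b * v[1]) / det, (a * v[1] - c * v[0]) / det]


def newton_step(grad, hess, x):
    """One Newton step: solve H(x_k) d_k = -grad f(x_k), then x_{k+1} = x_k + d_k."""
    g = grad(x)
    d = solve_2x2(hess(x), [-g[0], -g[1]])
    return [x[0] + d[0], x[1] + d[1]]


# Quadratic test function: f(x) = 2 x1^2 + x1 x2 + x2^2 - x1 - 3 x2
def grad(x):
    return [4.0 * x[0] + x[1] - 1.0, x[0] + 2.0 * x[1] - 3.0]


def hess(x):
    return [[4.0, 1.0], [1.0, 2.0]]


x_next = newton_step(grad, hess, [10.0, -7.0])   # one step from an arbitrary start
print(x_next, grad(x_next))   # x_next is (up to rounding) the minimizer (-1/7, 11/7)
```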


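Algorithm: Newton method

Input: $x_0 \in X$, $f : X \to Y$

1. $k \leftarrow 0$
2. while stop criterion is not met do
3. Calculate $\nabla f(x_k)$
4. Calculate $H(x_k)$
5. $d_k \leftarrow -H^{-1}(x_k) \nabla f(x_k)$
6. $\alpha_k \leftarrow \arg\min_{\alpha} f(x_k + \alpha d_k)$
7. $x_{k+1} \leftarrow x_k + \alpha_k d_k$
8. $k \leftarrow k + 1$
9. end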


● The method has quadratic convergence near a local minimum where the Hessian is positive definite;

● The method relies on two assumptions:

– The Hessian is nonsingular, i.e., its inverse exists;

– The Hessian is positive definite, which guarantees a minimizing (descent) direction;


Numerical evaluation of the Hessian

● Finite difference approximation:

$$\frac{\partial^2 f}{\partial x_i \partial x_j} \approx \frac{f(x + \delta_i e_i + \delta_j e_j) - f(x + \delta_i e_i) - f(x + \delta_j e_j) + f(x)}{\delta_i \delta_j}$$
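A Python sketch of the formula above, assuming the same $\delta$ for every coordinate. For $i = j$ the formula evaluates $f(x + 2\delta e_i) - 2f(x + \delta e_i) + f(x)$, a forward second difference. Only the upper triangle is computed, since the Hessian is symmetric; for clarity the helper re-evaluates $f(x + \delta e_i)$ instead of caching it, so it uses more evaluations than strictly necessary. The quadratic test function (exact Hessian $\begin{pmatrix} 2 & 3 \\ 3 & 4 \end{pmatrix}$) is a hypothetical example.

```python
def fd_hessian(f, x, delta=1e-4):
    """Forward-difference Hessian with delta_i = delta_j = delta for all i, j."""
    n = len(x)

    def f_shifted(*coords):
        # Evaluate f at x + delta * (sum of e_i over the given coordinate indices)
        y = list(x)
        for i in coords:
            y[i] += delta
        return f(y)

    fx = f(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):          # upper triangle only; H is symmetric
            H[i][j] = (f_shifted(i, j) - f_shifted(i) - f_shifted(j) + fx) / delta**2
            H[j][i] = H[i][j]
    return H


def f(x):
    # Quadratic test function with exact Hessian [[2, 3], [3, 4]]
    return x[0] ** 2 + 3.0 * x[0] * x[1] + 2.0 * x[1] ** 2


H = fd_hessian(f, [1.0, -2.0])
print(H)
```

The number of function evaluations grows as $O(n^2)$, which is one reason numerical Hessians become expensive for many variables.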


Difficulties

● Requires the computation of the inverse of the Hessian matrix;

● Numerical ill-conditioning of the Hessian matrix makes the computation of the inverse difficult in practice;

● Numerical derivatives: finite difference approximations introduce larger numerical errors and require many function evaluations;

Marquardt method

Motivation

● The Cauchy method reduces the function value faster when the design vector is away from the optimum;

● The Newton method, on the other hand, converges fast when close to the optimum point;

● The Marquardt method attempts to take advantage of both;

D. Marquardt, An algorithm for least squares estimation of nonlinear parameters, SIAM Journal of Applied Mathematics, Vol. 11, No. 2, pp. 431–441, 1963.


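● The Marquardt method modifies the diagonal of the Hessian:

$$\tilde H(x_k) = H(x_k) + \gamma_k I, \quad \gamma_k > 0$$

● When $\gamma_k$ is large ($\sim 10^4$), the diagonal dominates and the inverse is approximately

$$\tilde H^{-1}(x_k) = [H(x_k) + \gamma_k I]^{-1} \approx [\gamma_k I]^{-1} = \frac{1}{\gamma_k} I$$

so the search direction approaches a short step along the negative Gradient (Cauchy method), whereas for $\gamma_k \to 0$ it approaches the Newton direction.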


● The search direction of the Marquardt method is:

$$d_k = -[\tilde H(x_k)]^{-1} \nabla f(x_k)$$

$$d_k = -[H(x_k) + \gamma_k I]^{-1} \nabla f(x_k)$$


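Algorithm: Marquardt method

Input: $x_0 \in X$, $f : X \to Y$, $\gamma_0$

1. $k \leftarrow 0$
2. while stop criterion is not met do
3. Calculate $\nabla f(x_k)$
4. Calculate $H(x_k)$
5. $d_k \leftarrow -[H(x_k) + \gamma_k I]^{-1} \nabla f(x_k)$
6. $\alpha_k \leftarrow \arg\min_{\alpha} f(x_k + \alpha d_k)$
7. $x_{k+1} \leftarrow x_k + \alpha_k d_k$
8. If the function decreased, decrease $\gamma_k$; else increase $\gamma_k$
9. $k \leftarrow k + 1$
10. end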
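A minimal Python sketch of the Marquardt damping strategy for two-variable problems. Two simplifications are assumptions, not part of the slides: the line search is replaced by a unit step ($\alpha_k = 1$), and a step that does not decrease $f$ is rejected (with $\gamma$ increased) instead of accepted. The factor 10 used to decrease or increase $\gamma$ and the Rosenbrock test function (minimum at $(1, 1)$) are illustrative choices.

```python
def rosenbrock(x):
    x1, x2 = x
    return 100.0 * (x2 - x1**2) ** 2 + (1.0 - x1) ** 2


def rosenbrock_grad(x):
    x1, x2 = x
    return (-400.0 * x1 * (x2 - x1**2) - 2.0 * (1.0 - x1),
            200.0 * (x2 - x1**2))


def rosenbrock_hess(x):
    x1, x2 = x
    return ((1200.0 * x1**2 - 400.0 * x2 + 2.0, -400.0 * x1),
            (-400.0 * x1, 200.0))


def marquardt(f, grad, hess, x0, gamma=1e4, tol=1e-8, max_iter=1000):
    """Marquardt iteration for 2-D problems, using a unit step (alpha_k = 1)."""
    x = tuple(x0)
    fx = f(x)
    for _ in range(max_iter):
        g = grad(x)
        if (g[0] ** 2 + g[1] ** 2) ** 0.5 <= tol or gamma > 1e20:
            break
        (h11, h12), (_, h22) = hess(x)
        a, b, c = h11 + gamma, h12, h22 + gamma   # entries of H + gamma * I
        det = a * c - b * b
        if a <= 0.0 or det <= 0.0:                 # not positive definite: damp more
            gamma *= 10.0
            continue
        # Solve (H + gamma I) d = -g with Cramer's rule
        d = ((-g[0] * c + b * g[1]) / det, (-a * g[1] + b * g[0]) / det)
        x_new = (x[0] + d[0], x[1] + d[1])
        f_new = f(x_new)
        if f_new < fx:      # success: accept, behave more like the Newton method
            x, fx = x_new, f_new
            gamma /= 10.0
        else:               # failure: reject, behave more like the Gradient method
            gamma *= 10.0
    return x, fx


x_opt, f_opt = marquardt(rosenbrock, rosenbrock_grad, rosenbrock_hess, (-1.2, 1.0))
print(x_opt, f_opt)
```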


Difficulties

● Numerical derivatives: finite difference approximations introduce larger numerical errors and require many function evaluations;

Quasi-Newton methods

Motivation

● These methods approximate the inverse of the Hessian matrix, avoiding both the computation of the Hessian itself and the computation of its inverse;

● Avoid the numerical calculation of second derivatives;

● Aim to retain the fast convergence of the Newton method (superlinear in general, and termination in n steps on quadratic functions with exact line searches);

C. G. Broyden, Quasi-Newton methods and their application to function minimization, Mathematics of Computation, Vol. 21, p. 368, 1967.

Approximating the inverse of the Hessian

● It is possible to approximate the inverse of the Hessian iteratively by using the recursive formula:

$$B_{k+1} = B_k + c\, z z', \quad B_k \to H^{-1}(x_k)$$

● The update of this estimate is built in terms of the Gradient vectors and the points of previous iterations.

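An approximate inverse of the Hessian is to be computed. Using the Taylor series expansion of the Gradient around $x_0$:

$$\nabla f(x) \approx \nabla f(x_0) + H(x_0)(x - x_0)$$

and writing $A_k$ for the current approximation of $H(x_0)$:

$$\nabla f(x_{k+1}) = \nabla f(x_0) + A_k(x_{k+1} - x_0), \qquad \nabla f(x_k) = \nabla f(x_0) + A_k(x_k - x_0)$$

Subtracting the two equations:

$$A_k(x_{k+1} - x_k) = \nabla f(x_{k+1}) - \nabla f(x_k)$$

$$A_k d_k = g_k, \quad d_k = x_{k+1} - x_k, \quad g_k = \nabla f(x_{k+1}) - \nabla f(x_k)$$

$$d_k = [A_k]^{-1} g_k = B_k g_k, \quad B_k = [A_k]^{-1} \approx [H(x_0)]^{-1}$$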

Rank 1 update

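Requiring the updated estimate to satisfy this relation (the secant condition):

$$d_k = B_{k+1} g_k$$

$$d_k = (B_k + c\, z z') g_k = B_k g_k + c\, z (z' g_k)$$

$$c\, z = \frac{d_k - B_k g_k}{z' g_k}$$

which is satisfied by choosing

$$c = \frac{1}{z' g_k}, \quad z = d_k - B_k g_k$$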


● This leads to the update formula:

$$B_{k+1} = B_k + \frac{(d_k - B_k g_k)(d_k - B_k g_k)'}{(d_k - B_k g_k)' g_k}$$

● This formula is attributed to Broyden.
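A Python sketch of the rank 1 formula above. The skip rule for a near-zero denominator is a common safeguard added here as an assumption; it is not on the slides. On a quadratic with Hessian $A$, the Gradient difference is exactly $g_k = A d_k$, and after $n$ linearly independent steps (with no skipped update) the rank 1 update recovers $A^{-1}$; the 2x2 matrix used below is a hypothetical example.

```python
def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def rank1_update(B, d, g, skip_tol=1e-8):
    """Broyden rank 1 update of the inverse-Hessian estimate B.

    d = x_{k+1} - x_k and g = grad f(x_{k+1}) - grad f(x_k).
    """
    z = [d_i - bg_i for d_i, bg_i in zip(d, matvec(B, g))]   # z = d_k - B_k g_k
    denom = dot(z, g)                                        # z' g_k
    if abs(denom) <= skip_tol * (dot(z, z) * dot(g, g)) ** 0.5:
        return [row[:] for row in B]   # denominator ~ 0: skip the update (safeguard)
    c = 1.0 / denom
    n = len(z)
    return [[B[i][j] + c * z[i] * z[j] for j in range(n)] for i in range(n)]


# Quadratic with Hessian A: the Gradient difference is exactly g_k = A d_k
A = [[3.0, 1.0], [1.0, 2.0]]
B = [[1.0, 0.0], [0.0, 1.0]]            # B_0 = I
for d in ([1.0, 0.0], [0.0, 1.0]):      # two linearly independent steps
    B = rank1_update(B, d, matvec(A, d))
print(B)   # approximately A^{-1} = [[0.4, -0.2], [-0.2, 0.6]]
```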

Rank 2 update

● The rank 1 update formula preserves symmetry but not positive definiteness;

● Rank 2 update formulas were developed to guarantee both symmetry and positive definiteness (provided $d_k^T g_k > 0$) and are more robust in minimizing general nonlinear functions;

● A rank 2 update can be obtained as:

$$B_{k+1} = B_k + c_1 z_1 z_1' + c_2 z_2 z_2', \quad B_k \to H^{-1}(x_k)$$

● Following a similar procedure, rank 2 update formulas can be derived.

Davidon-Fletcher-Powell (DFP) formula

$$B_{k+1} = B_k + \frac{d_k d_k^T}{d_k^T g_k} - \frac{(B_k g_k)(B_k g_k)^T}{(B_k g_k)^T g_k}$$

$$d_k = x_{k+1} - x_k, \quad g_k = \nabla f(x_{k+1}) - \nabla f(x_k)$$

Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula

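$$B_{k+1} = B_k + \frac{d_k d_k^T}{d_k^T g_k}\left(1 + \frac{g_k^T B_k g_k}{d_k^T g_k}\right) - \frac{B_k g_k d_k^T}{d_k^T g_k} - \frac{d_k g_k^T B_k}{d_k^T g_k}$$

$$d_k = x_{k+1} - x_k, \quad g_k = \nabla f(x_{k+1}) - \nabla f(x_k)$$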

● Numerical experience indicates that the BFGS method is one of the most effective methods for unconstrained optimization and is less influenced by errors in finding the optimal step size than the DFP method;
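A Python sketch of the DFP and BFGS formulas above, written entry by entry with plain lists. Both updates assume $d_k^T g_k > 0$ (guaranteed, for instance, on a convex quadratic). The usage checks, on a hypothetical quadratic with Hessian $A$, that each updated matrix satisfies the secant condition $B_{k+1} g_k = d_k$ and stays symmetric.

```python
def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def dfp_update(B, d, g):
    """DFP update of the inverse-Hessian estimate B (assumes d' g > 0)."""
    Bg = matvec(B, g)
    dg, gBg = dot(d, g), dot(Bg, g)
    n = len(d)
    return [[B[i][j] + d[i] * d[j] / dg - Bg[i] * Bg[j] / gBg
             for j in range(n)] for i in range(n)]


def bfgs_update(B, d, g):
    """BFGS update of the inverse-Hessian estimate B (assumes d' g > 0)."""
    Bg = matvec(B, g)        # B_k g_k; since B_k is symmetric, g_k' B_k = (B_k g_k)'
    dg, gBg = dot(d, g), dot(Bg, g)
    n = len(d)
    return [[B[i][j]
             + (1.0 + gBg / dg) * d[i] * d[j] / dg
             - (Bg[i] * d[j] + d[i] * Bg[j]) / dg
             for j in range(n)] for i in range(n)]


A = [[3.0, 1.0], [1.0, 2.0]]        # Hessian of a quadratic test function
B0 = [[1.0, 0.0], [0.0, 1.0]]
d = [1.0, 0.5]                      # step x_{k+1} - x_k
g = matvec(A, d)                    # Gradient difference, here [3.5, 2.0]
for update in (dfp_update, bfgs_update):
    B1 = update(B0, d, g)
    print(update.__name__, B1, matvec(B1, g))   # B1 g_k reproduces d_k
```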

Conjugate gradient methods

● Presented first in 1908 by Schmidt, reinvented independently in 1948 and improved in the 1950s;

● It was initially developed for solving linear systems of equations and is still used for large sparse systems;

● In 1964, Fletcher and Reeves generalized the method to solve unconstrained nonlinear optimization problems.

R. Fletcher and C. M. Reeves, Function minimization by conjugate gradients, Computer Journal, Vol. 7, No. 2, pp. 149–154, 1964.

Conjugate directions

● Let A be a symmetric matrix. A set of n vectors (or directions) $d_1, \dots, d_n$ is said to be conjugate (A-conjugate) if

$$d_i^T A d_j = 0, \quad \forall i \ne j, \; i = 1,\dots,n, \; j = 1,\dots,n$$

● Orthogonal directions are a special case of conjugate directions ($A = I$).

Quadratically Convergent Method

● If a minimization method, using exact arithmetic, can find the minimum point in n steps while minimizing a quadratic function in n variables, the method is called a quadratically convergent method.

Theorem

● If a quadratic function

$$q(x) = \frac{1}{2} x^T A x + B^T x + C,$$

with A positive definite, is minimized sequentially, once along each direction of a set of n mutually conjugate directions, the minimum of the function will be found at or before the nth step irrespective of the starting point.

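Algorithm: Conjugate gradient method

Input: $x_0 \in X$, $f : X \to Y$

1. $k \leftarrow 0$; $r_0 \leftarrow -\nabla f(x_0)$; $d_0 \leftarrow r_0$
2. while stop criterion is not met do
3. $\alpha_k \leftarrow \arg\min_{\alpha} f(x_k + \alpha d_k)$
4. $x_{k+1} \leftarrow x_k + \alpha_k d_k$
5. Calculate $r_{k+1} \leftarrow -\nabla f(x_{k+1})$
6. Calculate $\beta_k$
7. Calculate new conjugate direction $d_{k+1} \leftarrow r_{k+1} + \beta_k d_k$
8. $k \leftarrow k + 1$
9. end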

● Two well-known formulas for $\beta_k$ are, with $r_k = -\nabla f(x_k)$:

– Fletcher-Reeves:

$$\beta_k^{FR} = \frac{r_{k+1}^T r_{k+1}}{r_k^T r_k}$$

– Polak-Ribière:

$$\beta_k^{PR} = \frac{r_{k+1}^T (r_{k+1} - r_k)}{r_k^T r_k}$$

● For quadratic functions, the method converges in at most n iterations (with exact line searches). For non-quadratic functions, the directions are no longer exactly conjugate.

● Since the method is based on the generation of n conjugate directions in an n-dimensional space, it should be restarted every n iterations in non-quadratic problems (e.g., by resetting $d_k \leftarrow r_k$);

● In general, quasi-Newton methods converge in fewer iterations, but require more computation and more memory per iteration (an $n \times n$ matrix $B_k$ must be stored and updated). Therefore, conjugate gradient methods are recommended for large-scale problems.
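A minimal Python sketch of the algorithm for the quadratic $q(x) = \frac{1}{2} x^T A x + B^T x + C$ with A symmetric positive definite, where the Gradient is $Ax + B$. For a quadratic the line search has the closed form $\alpha_k = r_k^T r_k / d_k^T A d_k$ and the residual can be updated as $r_{k+1} = r_k - \alpha_k A d_k$; the Fletcher-Reeves formula is used for $\beta_k$. The 3x3 test matrix is a hypothetical example; the method should reach the minimizer (where $Ax = -B$) in at most 3 iterations.

```python
def matvec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def conjugate_gradient(A, B, x0, tol=1e-12):
    """Minimize q(x) = 1/2 x'Ax + B'x + C for symmetric positive definite A."""
    n = len(x0)
    x = list(x0)
    r = [-(ax + b) for ax, b in zip(matvec(A, x), B)]   # r_0 = -grad q(x_0)
    d = list(r)                                         # d_0 = r_0
    iterations = 0
    for _ in range(n):                                  # at most n steps on a quadratic
        rr = dot(r, r)
        if rr ** 0.5 <= tol:
            break
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)                         # exact line search for q
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r_new = [ri - alpha * adi for ri, adi in zip(r, Ad)]   # r_{k+1} = r_k - alpha A d_k
        beta = dot(r_new, r_new) / rr                   # Fletcher-Reeves
        d = [rn + beta * di for rn, di in zip(r_new, d)]
        r = r_new
        iterations += 1
    return x, iterations


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
B = [-1.0, -2.0, -3.0]
x_star, iters = conjugate_gradient(A, B, [0.0, 0.0, 0.0])
print(x_star, iters)
```

For a general non-quadratic $f$, the closed-form $\alpha_k$ would be replaced by a line search and the residual by $-\nabla f(x_{k+1})$, as in the algorithm above.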

Nelder-Mead Simplex

● Derivative methods converge faster, but can only be used for problems characterized by differentiable functions;

● Moreover, in problems with many variables, the numerical errors introduced by numerical derivatives can become significant;


● Nelder-Mead Simplex was developed in 1965 for nonlinear optimization;

● The method works with n+1 points at every iteration, eliminating the “worst” point;

● The n+1 points produce a simplex which “moves” in the search space;

J. A. Nelder and R. Mead, A simplex method for function minimization, Computer Journal, Vol. 7, p. 308, 1965.


Convex hull

● The convex hull of a set A is given by the intersection of all convex sets that contain A.

Polytope

● The convex hull of a finite set of points is called a polytope.

Simplex

● If n+1 points $x_1, \dots, x_{n+1} \in \mathbb{R}^n$ are such that the n vectors $x_2 - x_1, \dots, x_{n+1} - x_1$ are linearly independent, then the convex hull of this set of points is a simplex.


Notation

● Index of the vertex with the best objective function value: $b \in \{1, \dots, n+1\}$, $x_b$

● Index of the vertex with the worst objective function value: $w \in \{1, \dots, n+1\}$, $x_w$

● Index of the vertex with the second worst objective function value: $s \in \{1, \dots, n+1\}$, $x_s$

● Centroid of the face opposite to the worst vertex:

$$\hat x = \frac{1}{n} \sum_{i=1,\, i \ne w}^{n+1} x_i$$


● Reflection: reflects the worst solution and moves the simplex towards the direction of improvement:

$$x_r = \hat x + \alpha(\hat x - x_w), \quad \alpha = 1$$

● Expansion: expands the simplex towards the direction of improvement:

$$x_e = \hat x + \gamma(\hat x - x_w), \quad \gamma = 2$$

● Outer contraction:

$$x_{oc} = \hat x + \beta(\hat x - x_w), \quad \beta = 0.5$$

● Inner contraction:

$$x_{ic} = \hat x - \beta(\hat x - x_w), \quad \beta = 0.5$$


Algorithm: Nelder-Mead Simplex method

Input: $x_0 \in X$, $f : X \to Y$

1. Build the initial simplex around $x_0$
2. while stop criterion is not met do
3. Perform reflection: $x_r = \hat x + \alpha(\hat x - x_w)$
4. if $f(x_r) < f(x_b)$ then
5. calculate $x_e = \hat x + \gamma(\hat x - x_w)$ and evaluate $f(x_e)$
6. if $f(x_e) < f(x_r)$ then accept $x_e$ else accept $x_r$
7. else if $f(x_r) < f(x_s)$ then accept $x_r$
8. else if $f(x_r) < f(x_w)$ then
9. calculate $x_{oc} = \hat x + \beta(\hat x - x_w)$
10. if $f(x_{oc}) \le f(x_w)$ then accept $x_{oc}$
11. else if $f(x_r) \ge f(x_w)$ then
12. calculate $x_{ic} = \hat x - \beta(\hat x - x_w)$
13. if $f(x_{ic}) \le f(x_w)$ then accept $x_{ic}$
14. else shrink simplex
15. end
16. end


● The stop criterion is based on the volume of the simplex;

● Convergence has been proven only recently, and only for restricted cases (e.g., strictly convex functions in low dimensions);

● Initialization: the initial simplex is built by orthogonal perturbations of the initial solution, one along each coordinate axis;

● The method is used by the MATLAB function fminsearch;

● Can be coupled with one-dimensional search methods.
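A compact Python sketch of the Nelder-Mead algorithm above, with the coefficients $\alpha = 1$, $\gamma = 2$, $\beta = 0.5$ from the slides. Several details are assumptions: the initial simplex perturbs $x_0$ by `step` along each coordinate axis, the simplex diameter is used as a simple proxy for its volume in the stop test, the shrink coefficient is 0.5, and a shrink is also applied when the outer contraction fails (the slides only state it for the inner contraction). The test function is a hypothetical quadratic with minimum at $(1, -2)$.

```python
import math


def nelder_mead(f, x0, step=0.5, tol=1e-8, max_iter=5000,
                alpha=1.0, gamma=2.0, beta=0.5, shrink=0.5):
    """Nelder-Mead simplex method (sketch)."""
    n = len(x0)
    # Initial simplex: x0 plus one perturbation along each coordinate axis
    simplex = [list(x0)]
    for i in range(n):
        v = list(x0)
        v[i] += step
        simplex.append(v)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Sort vertices: index 0 is best (b), n - 1 second worst (s), n worst (w)
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        # Stop when the simplex is small (diameter as a proxy for volume)
        if max(math.dist(v, simplex[0]) for v in simplex[1:]) <= tol:
            break
        xw = simplex[n]
        xhat = [sum(v[j] for v in simplex[:n]) / n for j in range(n)]  # centroid without x_w

        def along(t):
            # Point xhat + t * (xhat - xw)
            return [c + t * (c - w) for c, w in zip(xhat, xw)]

        xr = along(alpha)                        # reflection
        fr = f(xr)
        new = None
        if fr < values[0]:                       # better than the best: try expansion
            xe = along(gamma)
            fe = f(xe)
            new = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[n - 1]:                 # better than the second worst
            new = (xr, fr)
        else:                                    # outer or inner contraction
            xc = along(beta) if fr < values[n] else along(-beta)
            fc = f(xc)
            if fc <= values[n]:
                new = (xc, fc)
        if new is None:                          # shrink towards the best vertex
            best = simplex[0]
            simplex = [best] + [[b + shrink * (vj - b) for b, vj in zip(best, v)]
                                for v in simplex[1:]]
            values = [values[0]] + [f(v) for v in simplex[1:]]
        else:
            simplex[n], values[n] = new          # replace the worst vertex
    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


x_best, f_best = nelder_mead(lambda v: (v[0] - 1.0) ** 2 + 2.0 * (v[1] + 2.0) ** 2,
                             [0.0, 0.0])
print(x_best, f_best)
```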

Pattern search method

● Pattern search method tests pattern points around the current solution;

● It alternates search directions parallel to the coordinate axes and search directions of the kind $x_{k+1} - x_k$;

R. Hooke and T. A. Jeeves, Direct search solution of numerical and statistical problems, Journal of the ACM, Vol. 8, No. 2, pp. 212–229, 1961.

M.J.D. Powell, An efficient method for finding the minimum of a function of several variables without calculating derivatives, Computer Journal, Vol. 7, No. 4, pp. 303–307, 1964.

● Given the initial point: $x_0$, $y_0 = x_0$, $k = 0$

● The algorithm tests coordinate directions, making a move when the function improves and otherwise staying at the current point: $y_i \pm \lambda e_{i+1}$

● After searching all coordinates, we end up at $y_n$, which becomes $x_{k+1}$ if $f(y_n) < f(x_k)$

● Perform a search in the direction $x_{k+1} - x_k$:

$$y_0 = x_{k+1} + \alpha(x_{k+1} - x_k)$$

● Restart the search from this point. If the function does not decrease, reduce the step size $\lambda$.

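Algorithm: Hooke-Jeeves method

Input: $x_0 \in X$, $f : X \to Y$, $\lambda$, $\alpha$

1. $k \leftarrow 0$, $y_0 \leftarrow x_0$
2. while stop criterion is not met do
3. foreach $i = 0, \dots, n-1$ do
4. if $f(y_i \pm \lambda e_{i+1}) < f(y_i)$ then $y_{i+1} \leftarrow y_i \pm \lambda e_{i+1}$ else $y_{i+1} \leftarrow y_i$
5. end
6. if $f(y_n) < f(x_k)$ then $x_{k+1} \leftarrow y_n$; $y_0 \leftarrow x_{k+1} + \alpha(x_{k+1} - x_k)$
7. else $\lambda \leftarrow \lambda / 2$; $x_{k+1} \leftarrow x_k$; $y_0 \leftarrow x_k$
8. end
9. $k \leftarrow k + 1$
10. end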

● Pattern search method is very easy to code and computationally competitive with other methods.

● Possible modifications include: different step sizes for each variable, or coupling with one-dimensional search methods.
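A minimal Python sketch of the Hooke-Jeeves algorithm above, following its steps literally: exploratory moves of $\pm\lambda$ along each coordinate, a pattern move $y_0 = x_{k+1} + \alpha(x_{k+1} - x_k)$ on success, and halving of $\lambda$ on failure. The stop rule ($\lambda$ below `lam_min`), the default parameter values, and the separable quadratic test function (minimum at $(1, -0.5)$) are assumptions for illustration.

```python
def hooke_jeeves(f, x0, lam=0.5, alpha=1.0, lam_min=1e-8, max_iter=100000):
    """Hooke-Jeeves pattern search (sketch)."""
    n = len(x0)
    x = list(x0)        # x_k, current base point
    fx = f(x)
    y = list(x0)        # y_0, starting point of the exploratory search
    for _ in range(max_iter):
        if lam < lam_min:
            break
        # Exploratory moves along the coordinate directions e_1, ..., e_n
        fy = f(y)
        for i in range(n):
            for step in (lam, -lam):
                trial = list(y)
                trial[i] += step
                ft = f(trial)
                if ft < fy:            # move only if the function improves
                    y, fy = trial, ft
                    break
        if fy < fx:
            # Success: new base point, then pattern move along x_{k+1} - x_k
            x_new = y
            y = [xn + alpha * (xn - xo) for xn, xo in zip(x_new, x)]
            x, fx = x_new, fy
        else:
            # Failure: halve the step and restart from the current base point
            lam /= 2.0
            y = list(x)
    return x, fx


x_hj, f_hj = hooke_jeeves(lambda v: (v[0] - 1.0) ** 2 + 2.0 * (v[1] + 0.5) ** 2,
                          [0.0, 0.0])
print(x_hj, f_hj)
```

A common variant, not shown here, first repeats the exploratory search around the base point $x_k$ after a failed pattern move and only then reduces $\lambda$.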
