Optimization Theory Lecture 2

Engineering Optimization (EEE614), Lecture 2: Basic Principles

Dr. Shafayat Abrar
Signal Processing Research Group / Communications Research Group
Department of Electrical Engineering, COMSATS Institute of IT, Pakistan
Email: [email protected]

Spring 2013
Textbook: Practical Optimization: Algorithms and Engineering Applications, A. Antoniou and W.-S. Lu, Springer, 2007.


    Problem Definition of Nonlinear Optimization

    We focus our attention on the nonlinear optimization problem

$$\begin{aligned} \text{minimize} \quad & F = f(\mathbf{x}) \\ \text{subject to} \quad & \mathbf{x} \in R \end{aligned} \tag{1}$$

where $f(\mathbf{x})$ is a real-valued function, $R \subseteq E^n$ is the feasible region, and $E^n$ represents the n-dimensional Euclidean space.

Note that $\mathbf{x}$ can be expressed as a column vector with elements $x_1, x_2, \ldots, x_n$; the transpose of $\mathbf{x}$, namely $\mathbf{x}^T$, is a row vector

$$\mathbf{x}^T = [x_1 \;\; x_2 \;\; \cdots \;\; x_n]$$

So the variables $x_1, x_2, \ldots, x_n$ are the parameters that influence the cost $f(\mathbf{x})$. The optimization problem is to adjust the variables $x_1, x_2, \ldots, x_n$ in such a way as to minimize the scalar-valued quantity $F = f(\mathbf{x})$.

    Gradient Information

In many optimization methods, gradient information pertaining to the objective function is required for the evaluation (or computation) of minima.

This information consists of the first and second derivatives of $f(\mathbf{x})$ with respect to the $n$ variables.

If $f(\mathbf{x}) \in C^1$, that is, if $f(\mathbf{x})$ has continuous first-order partial derivatives, then the gradient of $f(\mathbf{x})$ is defined as

$$\mathbf{g}(\mathbf{x}) = \left[ \frac{\partial f}{\partial x_1} \;\; \frac{\partial f}{\partial x_2} \;\; \cdots \;\; \frac{\partial f}{\partial x_n} \right]^T := \nabla f(\mathbf{x}) \tag{2}$$

where $\nabla$ is the gradient operator, viz.

$$\nabla := \left[ \frac{\partial}{\partial x_1} \;\; \frac{\partial}{\partial x_2} \;\; \cdots \;\; \frac{\partial}{\partial x_n} \right]^T$$

In vector calculus, the gradient of a scalar field is a vector field that points in the direction of the greatest rate of increase of the scalar field, and whose magnitude is that rate of increase.

In simple terms, the variation in space of any quantity can be represented (e.g., graphically) by a slope. The gradient represents the steepness and direction of that slope.

Note: We can numerically compute the gradient of any function $f(x, y)$ with respect to $x$ and $y$ using gradient.m and plot it using quiver.m.

Consider the function $f(x, y) = x \exp(-x^2 - y^2)$.

[Figure: surface plot, contour plot, quiver plot, and quiver-contour plot of $f(x, y) = x \exp(-x^2 - y^2)$ over the grid $x \in [-2, 2]$, $y \in [-1, 1]$.]
    MATLAB Code:

[x,y] = meshgrid(-2:0.1:2, -1:0.1:1);
z = x .* exp(-x.^2 - y.^2);
[px,py] = gradient(z, 0.1, 0.1);    % numerical gradient on the grid
mi = min(min(z)); ma = max(max(z));
slices = mi:(ma-mi)/20:ma;          % contour levels

subplot(2,2,1); surf(x,y,z), axis image
title('Surface Plot','fontsize',14)
axis tight

subplot(2,2,2);
cs = contour(x,y,z,slices,'linewidth',2);
axis image; title('Contour Plot','fontsize',14)

subplot(2,2,3); qv = quiver(x,y,px,py);
set(qv,'linewidth',1); axis image
title('Quiver Plot','fontsize',14)

subplot(2,2,4); contour(x,y,z,slices,'linewidth',1);
hold on; qv = quiver(x,y,px,py);
set(qv,'linewidth',1); hold off, axis image
title('Quiver-Contour Plot','fontsize',14)

For the function $f(x, y) = x \exp(-x^2 - y^2)$:

[Figure: quiver-contour plot of $f(x, y) = x \exp(-x^2 - y^2)$.]

    Note

To minimize a function in an iterative or adaptive fashion, it is necessary to follow the direction opposite to the direction of the gradient.

    We will discuss this issue in detail in Chapter 5.
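As a small preview (not from the slides), the idea can be sketched in a few lines of MATLAB for the example function $f(x, y) = x \exp(-x^2 - y^2)$ introduced above; the step size, starting point, and iteration count are arbitrary illustrative choices.

% Minimal steepest-descent sketch for f(x,y) = x*exp(-x^2 - y^2).
% The step size mu, starting point, and iteration count are illustrative.
f  = @(p) p(1) .* exp(-p(1).^2 - p(2).^2);
g  = @(p) [ (1 - 2*p(1).^2) .* exp(-p(1).^2 - p(2).^2);   % df/dx
            -2*p(1).*p(2)   .* exp(-p(1).^2 - p(2).^2) ]; % df/dy
p  = [0.5; 0.5];     % initial estimate
mu = 0.2;            % fixed step size
for k = 1:100
    p = p - mu * g(p);          % step opposite to the gradient direction
end
p, f(p)    % approaches the minimizer near (-1/sqrt(2), 0)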

If $f(\mathbf{x}) \in C^2$, that is, if $f(\mathbf{x})$ has continuous second-order partial derivatives, then the Hessian of $f(\mathbf{x})$ is defined as

$$\mathbf{H}(\mathbf{x}) = \nabla \mathbf{g}^T = \nabla \nabla^T f(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix} \tag{3}$$

For a function $f(\mathbf{x}) \in C^2$ we have

$$\frac{\partial^2 f}{\partial x_j \partial x_i} = \frac{\partial^2 f}{\partial x_i \partial x_j}$$

since differentiation is a linear operation, and hence $\mathbf{H}(\mathbf{x})$ is an $n \times n$ square symmetric matrix.
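For the example function $f(x, y) = x \exp(-x^2 - y^2)$, the gradient and Hessian can be formed symbolically; a short sketch, assuming the Symbolic Math Toolbox is available:

% Symbolic gradient and Hessian of f(x,y) = x*exp(-x^2 - y^2)
% (requires the Symbolic Math Toolbox).
syms x y
f = x * exp(-x^2 - y^2);
g = gradient(f, [x y])      % 2x1 gradient vector
H = hessian(f, [x y])       % 2x2 symmetric Hessian matrix
% Evaluate both at the stationary point (x, y) = (-1/sqrt(2), 0):
g0 = double(subs(g, [x y], [-1/sqrt(2) 0]))   % = [0; 0]
H0 = double(subs(H, [x y], [-1/sqrt(2) 0]))   % diagonal with positive entries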

Hence, the Hessian matrix (or simply the Hessian) is a square matrix of second-order partial derivatives of a function. Mathematically, it describes the local curvature of a function of many variables.

The Hessian matrix was developed in the 19th century by the German mathematician Ludwig Otto Hesse and later named after him. Hesse himself had used the term "functional determinants".

There exist families of iterative optimization algorithms, such as the quasi-Newton methods, which use the (inverse of the) Hessian to accelerate their convergence.

Taylor Series: for a function of two variables

If $f(\mathbf{x})$ is a function of two variables $x_1$ and $x_2$ such that $f(\mathbf{x}) \in C^p$ where $p \to \infty$, that is, $f(\mathbf{x}) := f(x_1, x_2)$ has continuous partial derivatives of all orders, then the value of the function $f(\mathbf{x})$ at the point $[x_1 + \delta_1, x_2 + \delta_2]$ is given by the Taylor series as

$$\begin{aligned} f(x_1 + \delta_1, x_2 + \delta_2) ={} & f(x_1, x_2) + \delta_1 \frac{\partial f(x_1, x_2)}{\partial x_1} + \delta_2 \frac{\partial f(x_1, x_2)}{\partial x_2} \\ & + \frac{1}{2}\left[ \delta_1^2 \frac{\partial^2 f(x_1, x_2)}{\partial x_1^2} + 2\delta_1\delta_2 \frac{\partial^2 f(x_1, x_2)}{\partial x_1 \partial x_2} + \delta_2^2 \frac{\partial^2 f(x_1, x_2)}{\partial x_2^2} \right] + O(\|\boldsymbol{\delta}\|^3) \end{aligned} \tag{4}$$

where $O(\|\boldsymbol{\delta}\|^3)$ is the remainder and $\|\boldsymbol{\delta}\|$ is the Euclidean norm of $\boldsymbol{\delta} = [\delta_1, \delta_2]^T$, given by $\|\boldsymbol{\delta}\| = \sqrt{\boldsymbol{\delta}^T \boldsymbol{\delta}} = \sqrt{\delta_1^2 + \delta_2^2}$.

    Oh notations

The notation $\phi(x) = O(x)$ denotes that $\phi(x)$ approaches zero at least as fast as $x$ as $x$ approaches zero, that is, there exists a constant $K \ge 0$ such that

$$\left| \frac{\phi(x)}{x} \right| \le K \quad \text{as } x \to 0$$

The notation $\phi(x) = o(x)$ denotes that $\phi(x)$ approaches zero faster than $x$ as $x$ approaches zero, that is,

$$\frac{\phi(x)}{x} \to 0 \quad \text{as } x \to 0$$

For example, $\phi(x) = x^2$ is $o(x)$ (and hence also $O(x)$), whereas $\phi(x) = 2x$ is $O(x)$ but not $o(x)$.


Applying these notations to the Taylor series (4): since a remainder that is $O(\|\boldsymbol{\delta}\|^3)$ also goes to zero faster than $\|\boldsymbol{\delta}\|^2$, the expansion may equivalently be written as

$$\begin{aligned} f(x_1 + \delta_1, x_2 + \delta_2) ={} & f(x_1, x_2) + \delta_1 \frac{\partial f(x_1, x_2)}{\partial x_1} + \delta_2 \frac{\partial f(x_1, x_2)}{\partial x_2} \\ & + \frac{1}{2}\left[ \delta_1^2 \frac{\partial^2 f(x_1, x_2)}{\partial x_1^2} + 2\delta_1\delta_2 \frac{\partial^2 f(x_1, x_2)}{\partial x_1 \partial x_2} + \delta_2^2 \frac{\partial^2 f(x_1, x_2)}{\partial x_2^2} \right] + o(\|\boldsymbol{\delta}\|^2) \end{aligned}$$

Taylor Series: for a function of n variables

$$f(x_1 + \delta_1, x_2 + \delta_2, \ldots, x_n + \delta_n) = f(x_1, x_2, \ldots, x_n) + \sum_{i=1}^{n} \delta_i \frac{\partial f}{\partial x_i} + \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\delta_i \delta_j}{2} \frac{\partial^2 f}{\partial x_i \partial x_j} + o(\|\boldsymbol{\delta}\|^2)$$

Taylor Series in matrix form

$$f(\mathbf{x} + \boldsymbol{\delta}) = f(\mathbf{x}) + \mathbf{g}(\mathbf{x})^T \boldsymbol{\delta} + \frac{1}{2} \boldsymbol{\delta}^T \mathbf{H}(\mathbf{x}) \boldsymbol{\delta} + o(\|\boldsymbol{\delta}\|^2)$$

where $\mathbf{g}(\mathbf{x})$ is the gradient and $\mathbf{H}(\mathbf{x})$ is the Hessian at the point $\mathbf{x}$.

Later, we will discuss the use of Taylor's expansion in optimization.
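As a quick numerical check (not from the slides), the matrix form can be verified for the example function $f(x_1, x_2) = x_1 \exp(-x_1^2 - x_2^2)$; the evaluation point and perturbation $\boldsymbol{\delta}$ below are arbitrary illustrative choices, and the analytic gradient and Hessian are worked out by hand.

% Verify f(x+delta) ~ f(x) + g'*delta + 0.5*delta'*H*delta
% for f(x1,x2) = x1*exp(-x1^2 - x2^2) at an arbitrary point.
f = @(v) v(1) .* exp(-v(1).^2 - v(2).^2);
g = @(v) [ (1 - 2*v(1).^2) .* exp(-v(1).^2 - v(2).^2);
           -2*v(1).*v(2)   .* exp(-v(1).^2 - v(2).^2) ];
H = @(v) exp(-v(1).^2 - v(2).^2) * ...
    [ 4*v(1).^3 - 6*v(1),         -2*v(2) + 4*v(1).^2.*v(2);
      -2*v(2) + 4*v(1).^2.*v(2),  -2*v(1) + 4*v(1).*v(2).^2 ];
x     = [0.3; -0.4];
delta = [1e-2; -2e-2];
exact  = f(x + delta);
approx = f(x) + g(x)'*delta + 0.5*delta'*H(x)*delta;
abs(exact - approx)    % tiny: the neglected term is o(||delta||^2)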

    Types of Extrema

    The extrema of a function are its minima and maxima.

Points at which a function has minima (maxima) are said to be minimizers (maximizers).

Several types of minimizers (maximizers) can be distinguished, namely, local or global and weak or strong.

In the general optimization problem, we are in principle seeking the global minimum (or maximum) of f(x). In practice, an optimization problem may have two or more local minima.

Since optimization algorithms in general are iterative procedures which start with an initial estimate of the solution and converge to a single solution, one or more local minima may be missed. If the global minimum is missed, a suboptimal solution will be achieved, which may or may not be acceptable.

This problem can to some extent be overcome by performing the optimization several times using a different initial estimate for the solution in each case, in the hope that several distinct local minima will be located; a sketch of this multistart idea appears after this discussion.

If this approach is successful, the best minimizer, namely the one yielding the lowest value of the objective function, can be selected. Although such a solution could be acceptable from a practical point of view, usually there is no guarantee that the global minimum will be achieved.

Therefore, for the sake of convenience, the term "minimize f(x)" in the general optimization problem will be interpreted as "find a local minimum of f(x)".
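The multistart strategy described above can be sketched in a few lines of MATLAB; the choice of fminsearch as the local solver, the number of starts, and the sampling box are illustrative assumptions, not prescriptions from the lecture.

% Multistart sketch: run a local optimizer from several random initial
% estimates and keep the best local minimum found.
f = @(v) v(1) .* exp(-v(1).^2 - v(2).^2);    % example objective
bestF = Inf; bestX = [];
for k = 1:20                                 % 20 random starts (illustrative)
    x0 = [4*rand() - 2; 2*rand() - 1];       % start drawn from [-2,2] x [-1,1]
    [xk, fk] = fminsearch(f, x0);            % local minimization
    if fk < bestF
        bestF = fk; bestX = xk;
    end
end
bestX, bestF     % best local minimizer found; still no global guarantee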

Good News: In a specific class of problems where the function f(x) and the set R satisfy certain convexity properties, any local minimum of f(x) is also a global minimum of f(x). In this class of problems an optimal solution can be assured.

These problems will be examined in Section 2.7.

Definition 2.1: Weak local minimizer
A point $\mathbf{x}^* \in R$, where $R$ is the feasible region, is said to be a weak local minimizer of $f(\mathbf{x})$ if there exists a distance $\varepsilon > 0$ such that $f(\mathbf{x}) \ge f(\mathbf{x}^*)$ if $\mathbf{x} \in R$ and $\|\mathbf{x} - \mathbf{x}^*\| < \varepsilon$.

Definition 2.2: Weak global minimizer
A point $\mathbf{x}^* \in R$ is said to be a weak global minimizer of $f(\mathbf{x})$ if $f(\mathbf{x}) \ge f(\mathbf{x}^*)$ for all $\mathbf{x} \in R$.

Definition 2.3(a): Strong local minimizer
A point $\mathbf{x}^* \in R$ is said to be a strong local minimizer of $f(\mathbf{x})$ if there exists a distance $\varepsilon > 0$ such that $f(\mathbf{x}) > f(\mathbf{x}^*)$ if $\mathbf{x} \in R$ and $0 < \|\mathbf{x} - \mathbf{x}^*\| < \varepsilon$.

Definition 2.3(b): Strong global minimizer
A point $\mathbf{x}^* \in R$ is said to be a strong global minimizer of $f(\mathbf{x})$ if $f(\mathbf{x}) > f(\mathbf{x}^*)$ for all $\mathbf{x} \in R$ with $\mathbf{x} \ne \mathbf{x}^*$.

Necessary and Sufficient Conditions for Local Minima and Maxima

The gradient $\mathbf{g}(\mathbf{x})$ and the Hessian $\mathbf{H}(\mathbf{x})$ must satisfy certain conditions at a local minimizer $\mathbf{x}^*$.

Two sets of conditions will be discussed, as follows:
1. Conditions which are satisfied at a local minimizer $\mathbf{x}^*$. These are the necessary conditions.
2. Conditions which guarantee that $\mathbf{x}^*$ is a local minimizer. These are the sufficient conditions.

A concept that is used extensively in these theorems is the concept of a feasible direction.

Definition 2.4: Feasible Direction
Let $\boldsymbol{\delta} = \alpha \mathbf{d}$ be a change in $\mathbf{x}$, where $\alpha$ is a positive constant and $\mathbf{d}$ is a direction vector. If $R$ is the feasible region and a constant $\hat{\alpha} > 0$ exists such that $\mathbf{x} + \alpha\mathbf{d} \in R$ for all $\alpha$ in the range $0 \le \alpha \le \hat{\alpha}$, then $\mathbf{d}$ is said to be a feasible direction at the point $\mathbf{x}$.

In simple words, if a point $\mathbf{x}$ remains in $R$ after it is moved a finite distance in a direction $\mathbf{d}$, then $\mathbf{d}$ is a feasible direction vector at $\mathbf{x}$.

    Example 2.2

Suppose the feasible region is given by

$$R = \{\mathbf{x} : x_1 \ge 2, \; x_2 \ge 0\}$$

Which of the vectors $\mathbf{d}_1 = [2 \;\; 2]^T$, $\mathbf{d}_2 = [0 \;\; 2]^T$, $\mathbf{d}_3 = [2 \;\; 0]^T$ are feasible directions at the points $\mathbf{x}_1 = [4 \;\; 1]^T$, $\mathbf{x}_2 = [2 \;\; 3]^T$, $\mathbf{x}_3 = [1 \;\; 4]^T$?
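A direct way to probe such cases numerically (a sketch, not from the slides): since $R$ here is convex, $\mathbf{d}$ is a feasible direction at a point $\mathbf{x} \in R$ exactly when $\mathbf{x} + \alpha\mathbf{d}$ remains in $R$ for some small $\alpha > 0$.

% Sketch: numerical feasibility check for Example 2.2.
% R = {x : x(1) >= 2, x(2) >= 0}; points and directions follow the
% statement above.
inR   = @(x) (x(1) >= 2) && (x(2) >= 0);
D     = [ 2 0 2 ;            % d1, d2, d3 as columns
          2 2 0 ];
X     = [ 4 2 1 ;            % x1, x2, x3 as columns
          1 3 4 ];
alpha = 1e-6;                % small positive step
for i = 1:3
    for j = 1:3
        ok = inR(X(:,i)) && inR(X(:,i) + alpha*D(:,j));
        fprintf('d%d at x%d: feasible = %d\n', j, i, ok);
    end
end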

Theorem 2.1: First-Order Necessary Conditions for a Minimum
(a) If $f(\mathbf{x}) \in C^1$ and $\mathbf{x}^*$ is a local minimizer, then

$$\mathbf{g}(\mathbf{x}^*)^T \mathbf{d} \ge 0$$

for every feasible direction $\mathbf{d}$ at $\mathbf{x}^*$.
(b) If $\mathbf{x}^*$ is located in the interior of $R$, then

$$\mathbf{g}(\mathbf{x}^*) = \mathbf{0}$$
    Proof

    Proof is provided on white board. (Available in book)

    Definition 2.5

(a) Let $\mathbf{d}$ be an arbitrary direction vector at point $\mathbf{x}$. The quadratic form $\mathbf{d}^T \mathbf{H}(\mathbf{x}) \mathbf{d}$ is said to be positive definite, positive semidefinite, negative semidefinite, or negative definite if $\mathbf{d}^T \mathbf{H}(\mathbf{x}) \mathbf{d} > 0$, $\ge 0$, $\le 0$, $< 0$, respectively, for all $\mathbf{d} \ne \mathbf{0}$ at $\mathbf{x}$. If $\mathbf{d}^T \mathbf{H}(\mathbf{x}) \mathbf{d}$ can assume positive as well as negative values, it is said to be indefinite.

(b) If $\mathbf{d}^T \mathbf{H}(\mathbf{x}) \mathbf{d}$ is positive definite, positive semidefinite, etc., then matrix $\mathbf{H}(\mathbf{x})$ is said to be positive definite, positive semidefinite, etc.
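For a symmetric Hessian, definiteness can be checked from the signs of its eigenvalues; a small sketch (save as classify_definiteness.m; the function name is an arbitrary choice):

% Classify a symmetric matrix H by the signs of its eigenvalues.
% For symmetric H, d'*H*d > 0 for all d ~= 0 iff all eigenvalues are > 0,
% and similarly for the other cases.
function s = classify_definiteness(H)
    e = eig((H + H')/2);            % symmetrize to guard against round-off
    if all(e > 0)
        s = 'positive definite';
    elseif all(e >= 0)
        s = 'positive semidefinite';
    elseif all(e < 0)
        s = 'negative definite';
    elseif all(e <= 0)
        s = 'negative semidefinite';
    else
        s = 'indefinite';
    end
end

For instance, classify_definiteness([2 0; 0 1]) returns 'positive definite', while classify_definiteness([1 0; 0 -1]) returns 'indefinite'.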

Theorem 2.2: Second-Order Necessary Conditions for a Minimum
(a) If $f(\mathbf{x}) \in C^2$ and $\mathbf{x}^*$ is a local minimizer, then for every feasible direction $\mathbf{d}$ at $\mathbf{x}^*$:
1. $\mathbf{g}(\mathbf{x}^*)^T \mathbf{d} \ge 0$
2. if $\mathbf{g}(\mathbf{x}^*)^T \mathbf{d} = 0$, then $\mathbf{d}^T \mathbf{H}(\mathbf{x}^*) \mathbf{d} \ge 0$
(b) If $\mathbf{x}^*$ is a local minimizer in the interior of $R$, then:
1. $\mathbf{g}(\mathbf{x}^*) = \mathbf{0}$
2. $\mathbf{d}^T \mathbf{H}(\mathbf{x}^*) \mathbf{d} \ge 0$ for all $\mathbf{d} \ne \mathbf{0}$

    Proof

    Proof is provided on white board. (Available in book)

Theorem 2.4: Second-Order Sufficient Conditions for a Minimum
If $f(\mathbf{x}) \in C^2$ and $\mathbf{x}^*$ is located in the interior of $R$, then the conditions
1. $\mathbf{g}(\mathbf{x}^*) = \mathbf{0}$
2. $\mathbf{H}(\mathbf{x}^*)$ is positive definite
are sufficient for $\mathbf{x}^*$ to be a strong local minimizer.

    Proof

    Proof is provided on white board. (Available in book)
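As an illustrative check (not from the slides), both sufficient conditions hold for the earlier example function at the candidate point $\mathbf{x}^* = [-1/\sqrt{2} \;\; 0]^T$; the analytic gradient and Hessian used below are worked out by hand.

% Check Theorem 2.4 at x* = (-1/sqrt(2), 0) for f(x1,x2) = x1*exp(-x1^2 - x2^2).
g = @(v) [ (1 - 2*v(1).^2) .* exp(-v(1).^2 - v(2).^2);
           -2*v(1).*v(2)   .* exp(-v(1).^2 - v(2).^2) ];
H = @(v) exp(-v(1).^2 - v(2).^2) * ...
    [ 4*v(1).^3 - 6*v(1),         -2*v(2) + 4*v(1).^2.*v(2);
      -2*v(2) + 4*v(1).^2.*v(2),  -2*v(1) + 4*v(1).*v(2).^2 ];
xs = [-1/sqrt(2); 0];
norm(g(xs))      % ~ 0: first-order condition g(x*) = 0
eig(H(xs))       % both eigenvalues > 0: H(x*) is positive definite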

    Example 2.3

    Discussed on white board. (Available in book)

Example 2.4
Discussed on white board. (Available in book)

    In the next lecture, we will discuss the remaining parts of Chapter 2 including

    Classifications of Stationary Points

    Convex and Concave Functions

    Optimization of Convex Functions

    See you next time. Thank you for your attention.
