
B1 Optimization
4 Lectures, Trinity 2019 / Michaelmas 2020, 1 Examples Sheet
Prof V A Prisacariu

• Lecture 1: Local and global optima, unconstrained univariate and multivariate optimization, stationary points, steepest descent.

• Lecture 2: Newton and Newton like methods – Quasi-Newton, Gauss-Newton; the Nelder-Mead (amoeba) simplex algorithm.

• Lecture 3: Linear programming constrained optimization; the simplex algorithm, interior point methods; integer programming.

• Lecture 4: Convexity, robust cost functions, methods for non-convex functions – grid search, multiple coverings, branch and bound, simulated annealing, evolutionary optimization.

• Course based on the previous B1 Optimization course, by Prof A Zisserman.

Textbooks

Practical Optimization

Philip E. Gill, Walter Murray, and Margaret H. Wright

Covers unconstrained and constrained optimization. Very clear and comprehensive.

Background reading and web resources

Numerical Recipes in C (or C++) : The Art of Scientific Computing

William H. Press, Brian P. Flannery, Saul A. Teukolsky, William T. Vetterling

CUP 1992/2002. Good chapter on optimization. Available online as a PDF.


Convex Optimization

Stephen Boyd and Lieven Vandenberghe

CUP 2004. Available online at http://www.stanford.edu/~boyd/cvxbook/

Further reading, web resources, and the slides are available on Canvas and at:

http://www.robots.ox.ac.uk/~victor -> Teaching

Lecture 1

Topics covered in this lecture:

• Problem formulation;

• Local and global optima;

• Unconstrained univariate optimization;

• Unconstrained multivariate optimization for quadratic functions;

• Stationary points;

• Steepest descent.

Introduction

Optimization is used to find the best or optimal solution to a problem.

Steps involved in formulating an optimization problem:

• Conversion of the problem into a mathematical model that abstracts all the essential elements;

• Choosing a suitable optimization method for the problem;

• Obtaining the optimum solution.

Example: airfoil/wing aerodynamic shape optimization

Optimization problem:
• constraints: lift, area, chord;
• minimize drag;
• vary shape.

Aircraft Design via Numerical Optimization: Are We There Yet? Martins et al., 2017

Introduction: Problem specification

Suppose we have a cost function (or objective function):

f(𝐱): ℝ^N → ℝ

Our aim is to find the value of the parameters 𝐱 that minimizes the function:

𝐱* = argmin_𝐱 f(𝐱)

subject to the following constraints:
- equality: c_i(𝐱) = 0, i = 1, …, m_e;
- inequality: c_i(𝐱) ≥ 0, i = m_e + 1, …, m.

We will start by focussing on unconstrained problems.

Recall: One dimensional functions

A differentiable function has a stationary point where the derivative is zero:

df/dx = 0

The second derivative gives the type of stationary point.

f''(x) < 0: maximum

f''(x) = 0: inconclusive (possibly an inflection point)

f''(x) > 0: minimum

Unconstrained optimization

- Down-hill search (gradient descent) algorithms can find local minima;

- which of the minima is found depends on the starting point;

- such local minima often occur in real applications.

[Figure: a 1D cost function f(𝐱) for min_𝐱 f(𝐱), showing a local minimum and the global minimum.]

Example: template matching in 2D images

Input: Two point sets M = {M_i} and D = {D_j}.

Task: Determine the transformation T that minimizes the error between D and the transformed M.

Motivation: a robot picking up the "C" shape.

[Figure: data D, model M, and the transformation T mapping the model onto the data.]

Cost function (matches known)

2D points (x, y)^T, model points M_i, data points D_i:

f(θ, t_x, t_y) = Σ_i ||R(θ) M_i + t − D_i||²

Transformation parameters:
- rotation angle θ;
- translation t = (t_x, t_y)^T.
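The known-matches cost can be sketched directly. This is my own minimal implementation (the function name and the toy point sets are invented for illustration, not from the slides):

```python
import numpy as np

def cost_known(theta, t, M, D):
    """Sum over i of ||R(theta) M_i + t - D_i||^2, with correspondences known."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    residuals = (R @ M.T).T + t - D   # transform each model point, subtract its data point
    return np.sum(residuals ** 2)

# Toy example (hypothetical values): rotate a small model by 30 degrees and shift it.
M = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
theta_true, t_true = np.pi / 6, np.array([2.0, -1.0])
R_true = np.array([[np.cos(theta_true), -np.sin(theta_true)],
                   [np.sin(theta_true),  np.cos(theta_true)]])
D = (R_true @ M.T).T + t_true

print(cost_known(theta_true, t_true, M, D))  # ~0 at the true parameters
```

At the true parameters every residual vanishes, so the cost is zero; any other (θ, t) gives a strictly positive cost.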

Cost function (matches unknown)

2D points (x, y)^T, model points M_i, data points D_j:

f(θ, t_x, t_y) = Σ_j min_i ||R(θ) M_i + t − D_j||²

For each data point, find the closest model point.

Transformation parameters:
- rotation angle θ;
- translation t = (t_x, t_y)^T.

With correct correspondences the model is pulled directly onto the data; with closest-point correspondences, each data point D_j is initially paired with whichever model point M_i = (x_i, y_i)^T happens to be nearest. As the distances become smaller, the closest-point correspondences become the correct correspondences and the model is pulled onto the data.
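A sketch of the closest-point cost (function name and toy data are my own, not from the slides):

```python
import numpy as np

def cost_unknown(theta, t, M, D):
    """Sum over data points j of min_i ||R(theta) M_i + t - D_j||^2."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    TM = (R @ M.T).T + t                               # transformed model points
    d2 = ((D[:, None, :] - TM[None, :, :]) ** 2).sum(axis=2)  # all pairwise squared distances
    return d2.min(axis=1).sum()                        # closest model point per data point

M = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
D = M + np.array([2.0, -1.0])           # data = translated model, matches unknown
print(cost_unknown(0.0, np.array([2.0, -1.0]), M, D))  # ~0: every data point is matched
```

Note that this cost is never larger than the known-matches cost, since the minimum over model points can only reduce each term.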


Unconstrained univariate optimization

We will look at three basic methods to determine the minimum:
1. Gradient descent;
2. Polynomial interpolation;
3. Newton's method.
These introduce the ideas that will be applied in the multivariate case.

For the moment, assume we can start close to the global minimum.

[Figure: a function f(x) of one variable, min_x f(x).]

A typical 1D function

As an example, consider the function:

f(x) = 0.1 + 0.1x + x² / (0.1 + x²)

(assume we do not know the actual function expression from now on)

1. Gradient descent

Given a starting location x_0, examine df/dx and move in the downhill direction to generate a new estimate x_1 = x_0 + δx.

How to determine the step size δx?

δx = −α df/dx

Setting the learning rate α:

Slide credit: Bert De Brabandere

If the learning rate is too large, gradient descent may not converge (left). If it is too small, convergence will be slow (middle). A good learning rate leads to fast convergence (right).
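The effect of the learning rate can be seen numerically. This is my own sketch, assuming the example function from earlier is f(x) = 0.1 + 0.1x + x²/(0.1 + x²) (a reconstruction) and using made-up learning rates; the derivative is estimated by finite differences since the slides treat the expression as unknown:

```python
def f(x):
    return 0.1 + 0.1 * x + x ** 2 / (0.1 + x ** 2)

def dfdx(x, h=1e-5):
    # central-difference derivative: we pretend f's formula is unknown
    return (f(x + h) - f(x - h)) / (2 * h)

results = {}
for alpha in (0.001, 0.02):            # too-small vs reasonable learning rate
    x = 0.5                            # start close to the global minimum
    for _ in range(200):
        x = x - alpha * dfdx(x)        # delta_x = -alpha * df/dx
    results[alpha] = x

print(results)
```

With α = 0.02 the iterates settle near the minimizer (around x ≈ −0.005 for this function), while α = 0.001 has barely moved after the same number of steps.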

2. Polynomial interpolation (trust region method)

Approximate 𝑓(𝑥) with a simpler function which reasonably approximates the function in a neighbourhood around the current estimate 𝑥. This neighbourhood is the trust region.


- Bracket the minimum.

- Fit a quadratic or cubic polynomial which interpolates f(x) at some points in the interval.

- Jump to the (easily obtained) minimum for the polynomial.

- Throw away the worst point and repeat the process.

Quadratic interpolation using 3 points, 2 iterations

Other methods to interpolate a quadratic?

- e.g. two points and one gradient.
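A minimal sketch of successive parabolic interpolation with three points (the bracket values and function name are my own choices, and the example function is the earlier reconstruction):

```python
def f(x):
    return 0.1 + 0.1 * x + x ** 2 / (0.1 + x ** 2)

def parabola_vertex(a, b, c):
    """Minimum of the quadratic interpolating f at the three points a < b < c."""
    fa, fb, fc = f(a), f(b), f(c)
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den

pts = [-1.0, 0.2, 1.0]               # bracket: f(0.2) is below both endpoints
for _ in range(6):
    pts.append(parabola_vertex(*sorted(pts)))  # jump to the parabola's minimum
    pts.remove(max(pts, key=f))      # throw away the worst point and repeat

print(min(pts, key=f))               # approaches the minimizer near x = -0.005
```

Each iteration fits a parabola through the three current points, jumps to its vertex, and discards the point with the highest function value.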


3. Newton’s method

Fit a quadratic approximation to f(x) using both gradient and curvature information at x.

- Expand 𝑓(𝑥) locally using a Taylor series:

f(x + δx) = f(x) + δx f'(x) + (δx²/2) f''(x) + h.o.t.

- Find the δx (the variable here) such that x + δx is a stationary point of f:

d/d(δx) [f(x) + δx f'(x) + (δx²/2) f''(x)] = f'(x) + δx f''(x) = 0

- and rearranging:

δx = −f'(x) / f''(x)

- Update for 𝑥:

x_{n+1} = x_n − f'(x_n) / f''(x_n)
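The update above can be sketched on the example function. This is my own illustration: both derivatives are estimated by finite differences (an assumption; the lecture's plots presumably use exact derivatives), and the example function is the earlier reconstruction:

```python
def f(x):
    return 0.1 + 0.1 * x + x ** 2 / (0.1 + x ** 2)

def d1(x, h=1e-5):                     # first derivative, central difference
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(x, h=1e-4):                     # second derivative, central difference
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

x = -0.1                               # start close to the minimum
for _ in range(10):
    x = x - d1(x) / d2(x)              # Newton update x_{n+1} = x_n - f'(x_n)/f''(x_n)

print(x)                               # stationary point near x = -0.005
```

Started close enough, the iterates settle on the stationary point in a handful of steps, and f'' > 0 there confirms it is a minimum.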


Newton iterations and quadratic approximations

- avoids the need to bracket the root;

- quadratic convergence (decimal accuracy doubles at every iteration).

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

1.2

1.4

- global convergence of Newton’s method is poor;

- often fails if the starting point is too far from the minimum;

- in practice, it must be used with a globalization strategy which reduces the step length until a decrease in the function is assured.


Newton iterations and quadratic approximations

Stationary Points for Multidimensional functions

f(𝐱): ℝ^N → ℝ has a stationary point where the gradient vanishes:

∇f = (∂f/∂x_1, ∂f/∂x_2, …, ∂f/∂x_N)^T = 0

In 2D: ∇f = (∂f/∂x_1, ∂f/∂x_2)^T = 0

Extension to N dimensions

- How big can N be? Problem sizes can vary from a handful of parameters to millions.

- In the following we will first examine the properties of stationary points in N dimensions and then move onto optimization algorithms to find the stationary point (minimum).

- We will consider examples for N = 2, so that cost function surfaces can be visualized.

Taylor expansion in 2D

A function can be approximated locally by its Taylor series expansion about a point 𝐱_0:

f(𝐱_0 + 𝐱) ≈ f(𝐱_0) + [∂f/∂x, ∂f/∂y] (x, y)^T + (1/2) (x, y) [ ∂²f/∂x², ∂²f/∂x∂y ; ∂²f/∂x∂y, ∂²f/∂y² ] (x, y)^T + h.o.t.

- This is a generalization of the 1D Taylor series:

f(x + δx) = f(x) + δx f'(x) + (δx²/2) f''(x) + h.o.t.

- The expansion to second order is a quadratic function in 𝐱:

f(𝐱) = a + 𝐠^T 𝐱 + (1/2) 𝐱^T H 𝐱

Taylor expansion in ND

A function may be approximated locally by its Taylor expansion about a point 𝐱_0:

f(𝐱_0 + 𝐱) ≈ f(𝐱_0) + ∇f^T 𝐱 + (1/2) 𝐱^T H 𝐱 + h.o.t.

where the gradient ∇f(𝐱) of f(𝐱) is the vector

∇f(𝐱) = (∂f/∂x_1, …, ∂f/∂x_N)^T

and the Hessian H(𝐱) of f(𝐱) is the symmetric matrix

H(𝐱) = [ ∂²f/∂x_1², …, ∂²f/∂x_1∂x_N ; ⋮ ⋱ ⋮ ; ∂²f/∂x_1∂x_N, …, ∂²f/∂x_N² ]
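The expansion can be checked numerically. This is my own sketch: the test function, step sizes, and helper names are arbitrary choices, with gradient and Hessian estimated by central differences:

```python
import numpy as np

def f(x):
    # an arbitrary smooth function of 2 variables (my own choice)
    return np.sin(x[0]) * np.cos(x[1]) + 0.5 * x[0] ** 2

def gradient(x, h=1e-5):
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(x, h=1e-4):
    H = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            ei = np.zeros(2); ei[i] = h
            ej = np.zeros(2); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

x0 = np.array([0.3, -0.2])
dx = np.array([0.01, 0.02])
quad = f(x0) + gradient(x0) @ dx + 0.5 * dx @ hessian(x0) @ dx
print(abs(f(x0 + dx) - quad))   # third-order small: the quadratic model is accurate nearby
```

The discrepancy between f(𝐱_0 + 𝐱) and the second-order model is cubic in ||𝐱||, which is what "h.o.t." hides.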

Properties of Quadratic functions

Taylor expansion:

f(𝐱_0 + 𝐱) = f(𝐱_0) + 𝐠^T 𝐱 + (1/2) 𝐱^T H 𝐱

Expand about a stationary point 𝐱_0 = 𝐱* in direction 𝐩:

f(𝐱* + α𝐩) = f(𝐱*) + α 𝐠^T 𝐩 + (1/2) α² 𝐩^T H 𝐩 = f(𝐱*) + (1/2) α² 𝐩^T H 𝐩

since at the stationary point 𝐠 = ∇f(𝐱*) = 0.

At a stationary point the behavior is determined by H.

Properties of Quadratic functions

H is a symmetric matrix, so it has orthogonal eigenvectors:

H 𝐮_i = λ_i 𝐮_i,   with ||𝐮_i|| = 1

f(𝐱* + α𝐮_i) = f(𝐱*) + (1/2) α² 𝐮_i^T H 𝐮_i = f(𝐱*) + (1/2) α² λ_i

As |α| increases, f(𝐱* + α𝐮_i) increases, decreases, or is unchanged according to whether λ_i is positive, negative, or zero.

Examples of Quadratic functions

Case 1: both eigenvalues positive

f(𝐱) = a + 𝐠^T 𝐱 + (1/2) 𝐱^T H 𝐱

with a = 0, 𝐠 = (−50, −50)^T, H = [6, 4; 4, 6]


minimum

Examples of Quadratic functions

Case 2: eigenvalues have different signs

f(𝐱) = a + 𝐠^T 𝐱 + (1/2) 𝐱^T H 𝐱

with a = 0, 𝐠 = (−30, +20)^T, H = [6, 0; 0, −4]


saddle surface: extremum but not a minimum

Examples of Quadratic functions

Case 3: one eigenvalue zero.

f(𝐱) = a + 𝐠^T 𝐱 + (1/2) 𝐱^T H 𝐱

with a = 0, 𝐠 = (0, 0)^T, H = [6, 0; 0, 0]


parabolic cylinder

Hessian positive definite. Convex function. Minimum point.

Hessian negative definite. Concave function. Maximum point.

Hessian indefinite (mixed-sign eigenvalues). Surface curves up in some directions and down in others. Saddle point.
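The three cases can be distinguished programmatically from the Hessian's eigenvalues. A sketch (the function name and tolerance are my own choices), using the case matrices from above:

```python
import numpy as np

def classify(H, tol=1e-10):
    """Classify a stationary point from the eigenvalues of the Hessian."""
    lam = np.linalg.eigvalsh(H)          # H symmetric: real eigenvalues
    if np.all(lam > tol):
        return "minimum"                 # positive definite
    if np.all(lam < -tol):
        return "maximum"                 # negative definite
    if np.any(lam > tol) and np.any(lam < -tol):
        return "saddle"                  # indefinite
    return "degenerate"                  # at least one (near-)zero eigenvalue

print(classify(np.array([[6.0, 4.0], [4.0, 6.0]])))    # case 1: minimum
print(classify(np.array([[6.0, 0.0], [0.0, -4.0]])))   # case 2: saddle
print(classify(np.array([[6.0, 0.0], [0.0, 0.0]])))    # case 3: degenerate (parabolic cylinder)
```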

More in Lecture 4

Optimization in N dimensions – line search

- Reduce optimization in N dimensions to a series of (1D) line minimizations.

- Use methods developed in 1D (e.g. polynomial interpolation).


An Optimization Algorithm

Start at 𝐱( then repeat:

1. Compute a search direction 𝐩_k.

2. Compute a step length α_k, such that f(𝐱_k + α_k 𝐩_k) < f(𝐱_k).

3. Update 𝐱_{k+1} = 𝐱_k + α_k 𝐩_k.

4. Check for convergence (termination criteria), e.g. ∇f ≈ 0.

Reduce optimization in N dimensions to a series of (1D) line minimizations.

Steepest descent

Basic principle is to minimize the N-dimensional function by a series of 1D line-minimizations:

𝐱_{k+1} = 𝐱_k + α_k 𝐩_k

The steepest descent method chooses 𝐩_k to be parallel to the negative gradient:

𝐩_k = −∇f(𝐱_k)

The step size α_k is chosen to minimize f(𝐱_k + α_k 𝐩_k). For quadratic forms there is a closed-form solution:

α_k = (𝐩_k^T 𝐩_k) / (𝐩_k^T H 𝐩_k)

[exercise: minimize f(𝐱_k + α 𝐩_k) with respect to α]

Steepest descent example

a = 0, 𝐠 = (−50, −50)^T, H = [6, 4; 4, 6]

Steepest descent starting at 𝐱_0 = (0, 14)^T

After each line minimization, the new gradient is always orthogonal to the previous step direction:
- true for any line minimization;
- can be proven by examining the derivation for α_k.

Consequently, the iterations may zig-zag down the valley in a very inefficient manner.
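A sketch of steepest descent on the quadratic example above, using the closed-form step size (variable names are my own; the iteration count is arbitrary):

```python
import numpy as np

g = np.array([-50.0, -50.0])
H = np.array([[6.0, 4.0], [4.0, 6.0]])

def grad(x):
    return g + H @ x            # gradient of f(x) = g^T x + (1/2) x^T H x

x = np.array([0.0, 14.0])
directions = []
for _ in range(50):
    p = -grad(x)                # steepest descent direction
    if np.linalg.norm(p) < 1e-12:
        break
    alpha = (p @ p) / (p @ H @ p)   # exact line minimization for a quadratic
    directions.append(p)
    x = x + alpha * p

print(x)                              # tends to the minimizer x* = -H^{-1} g = (5, 5)
print(directions[1] @ directions[0])  # ~0: successive directions are orthogonal
```

The near-zero dot product between consecutive directions is the orthogonality property noted above, and it is exactly what produces the zig-zag path.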

Conjugate gradients

The method of conjugate gradients chooses successive descent directions 𝐩_k such that it is guaranteed to reach the minimum in a finite number of steps.

- Each 𝐩_k is chosen to be conjugate to all previous search directions with respect to the Hessian H:

𝐩_k^T H 𝐩_j = 0,   j = 0, …, k−1

- The resulting search directions are mutually linearly independent.

- Remarkably, 𝐩_k can be chosen using only knowledge of 𝐩_{k−1}, ∇f(𝐱_{k−1}) and ∇f(𝐱_k) (see Numerical Recipes), e.g.

𝐩_k = −∇f_k + [(∇f_k^T ∇f_k) / (∇f_{k−1}^T ∇f_{k−1})] 𝐩_{k−1}

Conjugate gradients

- An N-dimensional quadratic form can be minimized in at most N conjugate descent steps.

- In the figure, for 3 different starting points, the minimum is reached in exactly 2 steps.
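A sketch verifying this on the 2D quadratic example, using the standard conjugate-gradient recurrence (as described in Numerical Recipes; variable names are my own):

```python
import numpy as np

g = np.array([-50.0, -50.0])
H = np.array([[6.0, 4.0], [4.0, 6.0]])

x = np.array([0.0, 14.0])
r = -(g + H @ x)                 # residual = negative gradient of the quadratic
p = r.copy()
for _ in range(2):               # N = 2: at most 2 conjugate steps needed
    alpha = (r @ r) / (p @ H @ p)
    x = x + alpha * p
    r_new = r - alpha * (H @ p)
    beta = (r_new @ r_new) / (r @ r)   # Fletcher-Reeves style coefficient
    p = r_new + beta * p               # new direction, conjugate to the old one wrt H
    r = r_new

print(x)   # the minimizer (5, 5), up to rounding, after exactly N = 2 steps
```

Compare with the 50 iterations steepest descent needed on the same problem: conjugacy eliminates the zig-zag entirely for quadratic forms.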

What is next?

- Move from functions that are exactly quadratic to general functions that are represented locally by a quadratic.

- Newton's method (which uses 2nd derivatives) and Newton-like methods for general functions.
