Experts in numerical algorithms and HPC services
What's New in Mathematical Optimisation from NAG
Jan Fiala, Benjamin Marteau
Nonlinear programming: active set versus interior point methods
  Overview
  Sequential quadratic programming
  Interior point methods
  Illustration on a few examples
Mixed integer nonlinear optimisation
Semidefinite programming
  Sample applications in finance
Coming next
  Large-scale linear programming
  Derivative-free solver for calibration
Working with customers
Nonlinear optimisation
Problems of the form:
  min_{x ∈ R^n} f(x)
  subject to h_k(x) = 0, k = 1,...,m_e
             g_k(x) ≤ 0, k = 1,...,m_i
Two different approaches:
  Sequential quadratic programming: an active set method, based on Gill et al., Stanford University
  Interior point method: based on Wächter and Biegler, Carnegie Mellon University
Formalisation of the problem
Karush-Kuhn-Tucker (KKT) optimality conditions:
Stationarity condition
  ∇f(x) + ∑_{k=1}^{m_e} λ_k ∇h_k(x) + ∑_{k=1}^{m_i} µ_k ∇g_k(x) = 0
Primal feasibility condition
  h(x) = 0
  g(x) ≤ 0
Dual feasibility condition
  µ_k ≥ 0 for all k = 1,...,m_i
Complementarity condition
  µ_k g_k(x) = 0 for all k = 1,...,m_i
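The KKT conditions are easy to check numerically. A minimal sketch on a made-up toy problem, min x1² + x2² subject to x1 + x2 = 1, whose minimiser and multiplier are known in closed form (there are no inequality constraints, so dual feasibility and complementarity hold trivially):

```python
import numpy as np

# Toy problem (illustrative): min f(x) = x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 1 = 0
grad_f = lambda x: 2 * x                   # gradient of the objective
grad_h = lambda x: np.array([1.0, 1.0])    # gradient of the equality constraint
h = lambda x: x[0] + x[1] - 1.0

x_star = np.array([0.5, 0.5])              # known minimiser
lam = -1.0                                 # multiplier solving the stationarity condition

# Stationarity: grad f(x*) + lam * grad h(x*) should vanish
stationarity = grad_f(x_star) + lam * grad_h(x_star)
print(stationarity, h(x_star))
```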
Two approaches to tackle these equations
The complementarity condition is problematic due to its combinatorial nature.
Two distinct strategies:
  An SQP solver guesses which constraints are binding
  An IPM perturbs the equation
Sequential quadratic programming
Definition
An inequality constraint k is said to be active at x if it is binding (g_k(x) = 0).
SQP methods iteratively build the set of active constraints by solving quadratic programs:
Initialisation: choose a first estimate of the solution x_0, build a quadratic model of the objective around x_0, and take a first guess at the set of active constraints
Iteration k:
  Solve the quadratic program, warm-started by the active set estimate
  Update x_{k+1} and the set of active constraints
  Build a new quadratic model around x_{k+1}
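The active-set behaviour can be seen with any SQP-type solver; a sketch using SciPy's SLSQP on a small convex problem (both the problem and the solver choice are illustrative stand-ins, not NAG's e04vh):

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative problem: min (x1-1)^2 + (x2-2)^2
# s.t. x1 + x2 <= 2, x1 >= 0, x2 >= 0.
# At the solution the constraint x1 + x2 <= 2 is active (binding).
res = minimize(
    lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2,
    np.array([0.0, 0.0]),
    method="SLSQP",  # an SQP-type solver
    constraints=[{"type": "ineq", "fun": lambda x: 2.0 - x[0] - x[1]}],
    bounds=[(0.0, None), (0.0, None)],
)
print(res.x)  # the minimiser lies on the constraint boundary x1 + x2 = 2
```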
A few characteristics of SQP methods
  Perform lots of inexpensive iterations
  Work on the null space of the constraints
  The more active constraints there are, the cheaper the iterations are
As a consequence, SQP methods scale very well to large NLP problems with a high number of constraints.
Interior point methods
If one tries to solve the KKT system directly, the complementarity condition turns out to be problematic. An IPM iteration can therefore:
  Relax the complementarity condition (µ_k g_k(x) = ν with ν > 0)
  Perform one Newton iteration towards the solution of the relaxed KKT system
  Update the current solution estimate and the relaxation parameter ν
Interior point methods aim at finding a sequence of points converging to the solution that satisfy the constraints strictly.
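The relax-and-follow idea can be sketched on a one-dimensional toy problem (illustrative only, not NAG's e04st): minimise f(x) = x subject to x ≥ 1 via the log-barrier x − ν log(x − 1), driving ν → 0. The barrier's stationarity condition 1 − ν/(x − 1) = 0 is exactly the relaxed KKT system for this problem.

```python
# Sketch of an interior point iteration (illustrative): min x  s.t.  x >= 1.
x, nu = 2.0, 1.0            # strictly feasible start, initial relaxation
for _ in range(30):
    # one Newton step on the relaxed stationarity condition 1 - nu/(x-1) = 0
    grad = 1.0 - nu / (x - 1.0)
    hess = nu / (x - 1.0) ** 2
    step = grad / hess
    while x - step <= 1.0:  # damp the step to stay strictly feasible
        step *= 0.5
    x -= step
    nu *= 0.5               # shrink the relaxation parameter
print(x)                    # approaches the constrained minimiser x* = 1
```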
A few characteristics of Interior Point methods
  Perform a few expensive iterations
  In the absence of constraints, behave as a Newton method
As a consequence, Interior Point methods scale very well to large NLP problems with a small number of constraints.
Illustration on a few highly constrained problems
Problems were selected from the CUTEr test set.

Name        Vars    Constrs   e04vh (SQP), time (s)   e04st (IPM), time (s)
MINC44      1113     1033                      0.28                    7.60
READING8    2002     1000                      9.78                  251.12
NCVXQP6    10000     7500                      3.60                  613.38
MADSSCHJ     201      398                      0.34                    5.51
Illustration on a few weakly constrained problems
Problems were selected from the CUTEr test set.

Name        Vars    Constrs   e04vh (SQP), time (s)   e04st (IPM), time (s)
JIMACK      3549        0                   542.42                    8.12
OSORIO     10201      202                   303.00                    0.78
TABLE8      1271       72                     3.80                    0.04
OBSTCLBL   10000        1                    40.84                    0.50

The number of constraints is not the only factor...
Other characteristics

IPM (e04st) advantages:
  • Efficient on unconstrained or loosely constrained problems
  • Can exploit 2nd derivatives
  • Efficient also for quadratic problems
  • Better use of multi-core architectures
  • New and simpler interface
  • Infeasibility detection

SQP (e04vh) advantages:
  • Efficient on highly constrained problems
  • Can capitalize on a good initial point
  • Stays feasible with respect to the linear constraints throughout the optimization
  • Usually better results on pathological problems
  • Usually requires fewer function evaluations
  • Allows warm starting
Mixed integer nonlinear optimisation
Problems of the form:
  min_{x ∈ R^n, y ∈ Z^m} f(x, y)
  subject to l ≤ c(x, y) ≤ u
x: continuous variables
y: integer variables
SQP with branch-and-cut techniques
Ordinal variables
Does not require the model to be evaluated at fractional values of the integer variables
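The branching idea behind such solvers can be sketched on a one-variable toy problem (a plain branch-and-bound sketch, not h02da's branch-and-cut): solve the continuous relaxation, and if its solution is fractional, split the domain at the fractional value.

```python
import math

# Illustrative: min (y - 2.6)^2 over integer y in [-10, 10].
def solve_relaxed(lo, hi):
    """Continuous relaxation: clip the unconstrained minimiser to [lo, hi]."""
    y = min(max(2.6, lo), hi)
    return y, (y - 2.6) ** 2

def branch_and_bound(lo=-10, hi=10):
    best_y, best_val = None, math.inf
    stack = [(lo, hi)]
    while stack:
        lo, hi = stack.pop()
        if lo > hi:
            continue
        y, val = solve_relaxed(lo, hi)
        if val >= best_val:
            continue                             # prune: bound no better than incumbent
        if abs(y - round(y)) < 1e-9:
            best_y, best_val = round(y), val     # integral solution found
        else:                                    # branch on the fractional value
            stack += [(lo, math.floor(y)), (math.ceil(y), hi)]
    return best_y

print(branch_and_bound())   # -> 3, the integer closest to 2.6
```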
Some characteristics
It might be necessary to use integer variables in an optimization model, for example:
  Cardinality constraints
  Decision logic between variables (e.g. constraints only present if a certain variable is nonzero)
  Variables that can only take values inside a predefined set
  ...
Included in the NAG Library at Mark 25 as h02da. Based on Schittkowski et al., University of Bayreuth.
Semidefinite Programming (SDP)
Linear Programming (LP):
  well-known, well-researched
  convex (local → global)
  strong theoretical properties
  but only linear
Extensions:
  NLP: but some nice properties are lost (e.g., convexity, duality theory)
  SDP: retain the theory, change the geometry
    add a matrix inequality: a symmetric matrix is positive semidefinite (all eigenvalues are nonnegative)
    highly nonlinear
    notation: A(x) ⪰ 0
Semidefinite Programming (SDP) formulation
LP → SDP → BMI-SDP
  min_{x ∈ R^n} c^T x
  subject to l_B ≤ Bx ≤ u_B
             l_x ≤ x ≤ u_x
Semidefinite Programming (SDP) formulation
LP → SDP → BMI-SDP
  min_{x ∈ R^n} c^T x
  subject to l_B ≤ Bx ≤ u_B
             l_x ≤ x ≤ u_x
             A(x) = A_0 + ∑_{i=1}^n x_i A_i ⪰ 0
A_i: given symmetric matrices
A(x) is linear in x; LMI = linear matrix inequality
with a special choice of the A_i, A(x) can be a matrix variable X
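Whether an LMI A(x) ⪰ 0 holds at a given x follows directly from the eigenvalue characterisation. A small sketch (the matrices A_0, A_1, A_2 are made up for illustration):

```python
import numpy as np

# Illustrative LMI: A(x) = A0 + x1*A1 + x2*A2, PSD iff all eigenvalues >= 0.
A0 = np.eye(2)
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])
A2 = np.array([[1.0, 0.0], [0.0, -1.0]])

def lmi_holds(x):
    A = A0 + x[0] * A1 + x[1] * A2
    return bool(np.linalg.eigvalsh(A).min() >= 0.0)

print(lmi_holds([0.5, 0.0]), lmi_holds([2.0, 0.0]))   # True False
```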
Semidefinite Programming (SDP) formulation
LP → SDP → BMI-SDP
  min_{x ∈ R^n} c^T x + (1/2) x^T H x
  subject to l_B ≤ Bx ≤ u_B
             l_x ≤ x ≤ u_x
             A(x) = A_0 + ∑_{i=1}^n x_i A_i + ∑_{i,j=1}^n x_i x_j Q_ij ⪰ 0
further (quadratic) extension
BMI = bilinear matrix inequality
unique to NAG, included at Mark 26 as e04sv
in collaboration with Kočvara et al., University of Birmingham
Semidefinite Programming (SDP) Applications?
SDP = a special tool; it's there when you need it!
  very powerful concept
  matrix constraints might not appear naturally ⇒ reformulations, relaxations
  structural optimization, chemical engineering, combinatorial optimization, statistics, control and system theory, polynomial optimization, ...
The aim here is to spark interest. Warning: I am not a quant!
SDP Applications in Finance
Positive semidefinite requirement appears directly:
  construction of a correlation/covariance matrix
  nearest correlation matrix (with constraints)
  robust (worst-case) portfolio optimization
  calibration of the volatility structure for Libor market swaptions
Eigenvalue optimization (min/max eigenvalue or singular value, matrix condition number, nuclear norm as a heuristic for rank minimization, ...):
  risk management: limit the Γ of your portfolio
Relaxations:
  many relaxations of (NP-hard) combinatorial problems
  Asian option pricing bounds(?)
Reformulations: polynomial nonnegativity ↔ matrix inequality
  (e.g., interpolation by nonnegative splines)
  Lyapunov stability of ODEs... in finance?
Nearest Correlation Matrix (with Constraints)
  min_X ∑_{i,j=1}^n (X_ij − H_ij)^2
  subject to X_ii = 1, i = 1,...,n
             X ⪰ 0
correlation matrix = symmetric positive semidefinite matrix with unit diagonal
H: approximate correlation matrix
X: new (true) correlation matrix, closest to H in the Frobenius norm
Do not use SDP for the vanilla NCM problem due to the algorithmic complexity; the special solvers in Chapter G02 are preferable.
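The flavour of the problem can be seen with simple alternating projections (a sketch only: without the Dykstra-style correction used by Higham-type methods it returns a valid correlation matrix near H rather than the provably nearest one; the G02 solvers are the proper tool; the matrix H below is invented):

```python
import numpy as np

def corr_near(H, iters=200):
    """Alternate projections onto the PSD cone and the unit-diagonal set."""
    X = H.copy()
    for _ in range(iters):
        w, V = np.linalg.eigh((X + X.T) / 2.0)
        X = (V * np.maximum(w, 0.0)) @ V.T   # project onto the PSD cone
        np.fill_diagonal(X, 1.0)             # restore the unit diagonal
    return X

# An "approximate correlation matrix" H that is slightly indefinite:
H = np.array([[1.0, 0.9, 0.7],
              [0.9, 1.0, 0.3],
              [0.7, 0.3, 1.0]])
X = corr_near(H)
print(np.linalg.eigvalsh(X).min(), np.abs(X - H).max())
```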
Nearest Correlation Matrix (with Constraints)
  min_X ∑_{i,j=1}^n (X_ij − H_ij)^2
  subject to X_ii = 1, i = 1,...,n
             X ⪰ 0
Possible new constraints:
  fix elements: X_ij = H_ij for some i, j
  element-wise bounds: l_ij ≤ X_ij ≤ u_ij
  smallest eigenvalue constraint: X ⪰ λ_min I, where λ_min is given
  limit the condition number: λ_max I ⪰ X ⪰ λ_min I, λ_max ≤ κ λ_min, where κ is given and λ_min, λ_max are new variables
Nearest Correlation Matrix (with Constraints)
  min_X ∑_{i,j=1}^n (X_ij − H_ij)^2
  subject to X_ii = 1, i = 1,...,n
             X ⪰ 0
Possible different objectives:
  weight elements: ∑ W_ij (X_ij − H_ij)^2
  consider the portfolio VaR_α: −λ Z_α^2 w^T D X D w + ∑ (X_ij − H_ij)^2
  D: deviations (d_ii = σ_i), w: asset allocation, λ: weighting factor
Full control over the formulation!
Robust Portfolio Optimization
mean-variance analysis is often very sensitive to the data
are the nominal µ̂ (expected returns) and Σ̂ (covariance) correct?
robust EF = limit the sensitivity of the results by incorporating an uncertainty model on the parameters
choose the solution in the worst-case scenario (see Boyd '07)
  min (µ − r1 + λ)^T Σ^{-1} (µ − r1 + λ)
  subject to Fµ ≥ 0
             |µ_i − µ̂_i| ≤ α_1 |µ̂_i|, i = 1,...,n
             |1^T µ − 1^T µ̂| ≤ α_2 |1^T µ̂|
             |Σ_ij − Σ̂_ij| ≤ β_1 |Σ̂_ij|, i,j = 1,...,n
             ‖Σ − Σ̂‖_F ≤ β_2 ‖Σ̂‖_F
             Σ ⪰ 0
             λ ≥ 0
Calibration of the volatility structure
How to extract correlation information from market option prices?
Assume a LIBOR market model with covariance structure X and swap weights Ω = w w^T.
Under some assumptions, swaption prices are given by the Black-Scholes formula with volatility parameter σ = Tr(ΩX).
Task: calibrate X to observed swaption market prices:
  find X
  subject to Tr(ΩX) = σ
             X ⪰ 0
where the σ are the observed swaption implied vols
Calibration of the volatility structure (cont.)
The correlation X in the previous feasibility problem is not unique, so one can choose an objective:
  min or max the price of some other option: min/max Tr(ΩX)
  norm of X: min ‖X‖
  smoothness: min ‖∆X‖
  robustness via the Bid/Ask spread: max t s.t. σ_Bid + t ≤ Tr(ΩX) ≤ σ_Ask − t
  rank of X as a heuristic via the nuclear norm of X
Risk management: how to construct a positive-Γ portfolio?
Assume an existing portfolio Π of derivatives/exotics on underlyings S_i: Π = F(S_1, ..., S_n).
Π must be risk managed; the usual Delta hedging imposes ∂Π/∂S = 0.
But Delta hedging only works for very small movements in the underlyings; for larger moves one would like to keep a positive (or small) Γ, as
  dΠ = (∂Π/∂S)^T dS + (1/2) dS^T (∂²Π/∂S²) dS + ...
To construct positive Γ, buy x_i units of a vanilla option p_i on S_i and y_i of the underlying S_i:
  min_{x,y} ∑ x_i p_i(S_i) + y_i S_i
  subject to ∂²F/∂S² + diag(x_i ∂²p_i/∂S_i²) ⪰ 0
             ∂F/∂S_i + x_i ∂p_i/∂S_i + y_i = 0, i = 1,...,n
Coming next: a new LP solver
NAG = the Amazon of optimization
(a one-stop shop for all you need in optimization)
Constant evolution of the library:
  based on our roadmap
  customers' requests
  latest research & collaborations
  ... ongoing hard work
New LP solver:
  a new solver for large-scale LP problems
  based on an interior point method (IPM)
  filling a gap in the library
  significant speed-up
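The shape of the problem class the new solver targets, on a toy instance (sketched here with SciPy's linprog, which is unrelated to the NAG routine):

```python
from scipy.optimize import linprog

# Illustrative LP: min -x1 - 2*x2  s.t.  x1 + x2 <= 4, x1 <= 2, x >= 0.
res = linprog(c=[-1.0, -2.0],
              A_ub=[[1.0, 1.0], [1.0, 0.0]],
              b_ub=[4.0, 2.0],
              method="highs")   # default bounds are x >= 0
print(res.x, res.fun)           # optimum at (0, 4) with objective -8
```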
Coming next: DFO for calibration
Standard data-fitting (calibration) problem: given observed data [t_i, y_i] and a model f(·; x) depending on model parameters x.
Task: find x to fit the data as closely as possible, typically in the least-squares sense:
  min_x ∑ (y_i − f(t_i; x))^2
Additional requirements:
  small number of parameters (< 100)
  black-box model, no derivatives available
  possibly expensive and/or inaccurate function evaluations
  typically a reasonable starting point; a small improvement is sufficient
  ⇒ finite differences shouldn't be used!
New derivative-free optimization (DFO) solver exploiting the problem structure (the only one of its kind!)
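The setup can be illustrated with a generic derivative-free method on a synthetic calibration (SciPy's Nelder-Mead stands in for the solver described above, and the exponential model and data are invented for the example):

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data from the model f(t; x) = x0 * exp(x1 * t) with x = (2, -1.5).
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)

def sumsq(x):
    """Black-box least-squares objective: no derivatives used."""
    return np.sum((y - x[0] * np.exp(x[1] * t)) ** 2)

res = minimize(sumsq, x0=[1.0, -1.0], method="Nelder-Mead",
               options={"xatol": 1e-10, "fatol": 1e-12})
print(res.x)   # recovers approximately (2, -1.5)
```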
Working with customers
Sometimes an out-of-the-box solution is not sufficient!
  Is it possible to speed up the solver?
  Does the model fit the solver?
  Can a special problem structure be exploited?
NAG Mathematical Optimization Consultancy is ready to help:
  choice and tuning of the solver
  adjustments to the model
  bespoke solver development
Examples of optimisation projects
Energy & Commodities Trading Co.: The client's model was demonstrating unusual behaviour: a significant memory footprint and slow convergence. Analysis of the model showed that a more suitable, equivalent reformulation was available. When the model was adjusted, the solver performed as expected.
Financial Services Software Vendor: An extended site visit allowed us to discuss the client's problem in detail and helped to identify and fix a weak point that was causing convergence issues.
Financial Brokerage Co.: The client wanted a class of problems to be solved within a prescribed time limit. After an initial assessment of the problem, a possible solution was identified using recent research from Stanford University. A bespoke solution was delivered during a short consulting engagement. The new solver drastically improved the performance, so that even bigger problems could be considered by the client.