Application of Quasi-Newton Algorithms in Optimal Design

Sebastian Ueckert, Joakim Nyberg, Andrew C. Hooker

Pharmacometrics Research GroupDepartment of Pharmaceutical BiosciencesUppsala UniversitySweden

Outline1. Optimizing Designs2. Introduction: Quasi-Newton Methods (QNMs)3. Performance QNMs4. Advantages QNMs5. Laplace Approximation for Global Optimal Design6. Using QNMs in Laplace Approximation

Optimizing a Design

( ) ( , )j x f F x

* arg max ( )x X

e.g. D-Optimal Design ( ) ,j x F x

Parameters α Design variables x

DataEstimation

Optimization• Interval methods

True global optimizersHard to implementStill under development

• Stochastic methods – Simulated Annealing (SA), Ant colony optimization, Genetic Algorithm(GA) Easy to implement (SA)

Marketing effective (GA)SlowNo information about solutionHeuristic

Optimization• Derivative free methods

– Downhill Simplex MethodNo derivatives necessaryRobustSlowLocal

Gradient Based Methods

Gradient Based MethodsMathematically well understoodFast (if OFV calc not too expensive) Only localComplicated to implement

• Steepest Descent• Conjugate Gradient

1. Set xk=x0

2. Determine search direction

3. Do line search along p* to find minimal xk+1

4. Set xk=xk+1 and go to 2

Newton Method

212( ) : ( )T T

k k k k kf x p f p f p f p m p 2( ) 0k k km p f f p

* arg max ( )x R

* 2 1k kp f f

Algorithm:

1. Set xk=x0

2. Determine search direction

4. Set xk=xk+1 and go to 2

Newton Method

212( ) : ( )T T

k k k k kf x p f p f p f p m p 2( ) 0k k km p f f p

* arg max ( )x R

* 2 1k kp f f

Algorithm:

Calculate Hessian

1. Set xk=x0, Bk=I2. Determine search direction

4. Set xk=xk+1, Bk=Bk+Uk and go to 2

Quasi-Newton Methods

12( ) : ( )T T

k k k k k kf x p f p f p B f p m p

( ) 0k k km p f B p * 1

k kp B f

Algorithm:

Calculation of Hessian is computationally expensiveProblem:

Approach: Use approx. Hessian and build up during search

Quasi-Newton Methods• Different methods for different updating formulas

– Davidon–Fletcher–Powell (DFP)

– Broyden-Fletcher-Goldfarb-Shanno (BFGS) 1

TTk k k kk k

k k T Tk k k k k

B x B xy yB By x x B x

T T Tk k k k k k

k kT T T T Tk k k k k k

y x y x y yB I B I

y x y x y x

1k k ky f f 1k k ks x x

Constraints• Experiments usually come with practicality constraints e.g.:

– Administered dose has to be smaller than X mg– Sampling times can only be taken until 8 h after dosing

i i iu x b Box Constraints

BFGS-B 17

BFGS-B

1. Set xk=x0, Bk=I2. Determine search direction

3. Project search direction vector on feasible region4. Do line search along p* to find minimal xk+1 respecting bounds5. Set xk=xk+1, Bk=Bk+Uk and go to 2

Algorithm:

12( ) : ( )T T

k k k k k kf x p f p f p B f p m p

( ) 0k k km p f B p * 1

k kp B f

Comparison• Test Scenario

– Model:• PKPD (1 cmp oral absorption; IMAX drug effect)• All parameters (ka,CL,V,IC50, E0, IMAX) with log-normal IIV 30% CV• PK parameters fixed• Combined error

– Design:• 3 groups (40,30,30 subjects)• 1 PK and 1 PD sample per subject

• Approach:– Generate random initial values – Optimize with steepest descent and BFGS

Results

BFGS Steepest Descent

Steepest Descent

BFGS 20

OFV0 2 4 6 8

x 1010

Design Sensitivity• Approximate Hessian matrix can be used to assess sensitivity

of design (at no additional computational costs)– Diagonal of the inverse of the Hessian– Use approximate efficiency

( )( ) j xEff xj x

* * * *12( ) T Tj x a j a j a B a

* * *12

*( )T Tj a j a B a

Eff aj

Design Sensitivity - Visual

7 7.5 8 8.5 9 9.50.9

6 6.5 7 7.5 8 8.50.9

1 1.5 2 2.5 30.9

Group 2 PD Group 1 PK Group 1 PD

Design Sensitivity - Numerical

PK Sample PD SampleGroup 1

7.12 [0.35;13.9]8.38[5.28;11.38]

Group 2

1.26 [0;3.74]1.79[1.03;2.55]

Group 3

9.22 [-1.31E;+1.31E] 0[0;0.0025]

0 20 40 60 80 10080

Group 2 PD

0 20 40 60 80 10080

Group 1 PK Group 3 PK23

LAPLACE APPROXIMATION

Global Optimal Design• Integral has to be evaluated• FIM occurs in integrand• For example ED optimal design:

• Usually evaluated with Monte-Carlo integrationComputationally intensive or imprecise

( ) ( ) ( , )EDj x p F x d

Laplace Approximation

,( ) ( ) ( , ) k xEDj x p F x d e d

, : log ( ) ( , )k x p F x

arg min ,m k x

Laplace Approximation

1. Minimize

2. Calculate the Hessian

3. Evaluate

Algorithm:

, : log ( ) ( , )k x p F x

2 ,mk x

Laplace-BFGS Approximation

1. Minimize using BFGS algorithm

2. Evaluate

Algorithm:

, : log ( ) ( , )k x p F x

Laplace-BFGS – Random Effects

arg min ,m k x

( )g e

Problem:

Approach:

For variance parameter α ≥ 0

Perform optimization on log-domain

1. Minimize using BFGS algorithm

2. Rescale approximate Hessian

3. Evaluate

, : log ( ( )) ( ( ), )k x p g F g x

mk g xe

Algorithm:

1TB B g g

Comparison• Comparison of 4 algorithms:

1. Monte Carlo integration with random sampling (MC-RS)2. Monte Carlo integration with Latin hypercube sampling (MC-LHS)3. Laplace integral approximation (LAPLACE)4. Laplace integral approximation with BFGS Hessian (LAPLACE-BFGS)

• Testing MC methods with 50 and 500 random samples

Comparison• Test Scenario

– Model:• 1 cmp IV bolus• CL,V with log-normal IIV• Additive error

– Design:• 20 subjects• 2 samples per subject

– Parameter distribution:• Log-normal an all parameters (Fixed effect Var=0.05; Random Effect

Var=0.09)

Results - OFV

Method Mean OFV1021 [95% CI]MC-RS 100,000 3.24MC-RS 50 3.27[2.2-5.0]MC-RS 500 3.33[2.8-3.8]MC-LHS 50 3.24[2.2-4.6]MC-LHS 500 3.22[2.9-3.7]LAPLACE 2.95LAPLACE-BFGS 3.01

Mean OFV and non-parametric confidence intervals for different integration methods from 100 evaluations

Results - DesignMC-RS 50 MC-LHS 50 LAPLACE

LAPLACE-BFGSMC-RS 500 MC-LHS 500

Results – Runtimes

MC-LHS 50 MC-RS 50 LAPLACE-BFGS LAPLACE MC-LHS 500 MC-RS 500

0.35 0.37 0.460.63

3.533.67

Conclusions• Quasi-Newton methods constitute fast alternative for

continuous design variable optimization • Information about design sensitivity can be obtained with no

additional cost• Global Optimal Design:

– Monte-Carlo methods are easy and flexible but need high number of samples to give stable results

– Laplace approximation constitutes fast alternative for priors with continuous probability distribution function

– Laplace integral approximation with BFGS Hessian gave same sampling times with approx. 30% shorter runtimes

THANK YOU!

References1) C.G. Broyden, “The Convergence of a Class of Double-rank Minimization Algorithms 1. General

Considerations,” IMA J Appl Math, vol. 6, Mar. 1970, pp. 76-90. 2) R. Fletcher, “A new approach to variable metric algorithms,” The Computer Journal, vol. 13,

1970, p. 317. 3) D. Goldfarb, “A family of variable-metric methods derived by variational means,” Mathematics

of Computation, 1970, pp. 23–26. 4) D.F. Shanno, “Conditioning of quasi-Newton methods for function minimization,” Mathematics

of Computation, 1970, pp. 647–656. 5) R.H. Byrd, P. Lu, J. Nocedal, and C. Zhu, “A limited memory algorithm for bound constrained

optimization,” SIAM J. Sci. Comput., vol. 16, 1995, pp. 1190-1208. 6) M. Dodds, A. Hooker, and P. Vicini, “Robust Population Pharmacokinetic Experiment Design,”

Journal of Pharmacokinetics and Pharmacodynamics, vol. 32, Feb. 2005, pp. 33-64.

Application of Quasi-Newton Algorithms in Optimal Design

Documents

New Quasi-Newton Methods for Efficient Large-Scale …vishy/talks/LBFGS.pdf · New Quasi-Newton Methods for Efﬁcient Large-Scale Machine Learning S.V: ... New Quasi-Newton Methods

A Stochastic Quasi-Newton Method with Nesterov’s

Quasi-Newton Methods - CMU Statisticsryantibs/convexopt-F18/lectures/quasi... · 2018. 10. 29. · But quasi-Newton converges in less than 100 times the iterations 18. Implicit-form

Newton Based Optimal Power Flow

Optimización. Newton y Quasi Newton

Quasi Newton Methods

Quasi-Newton Methods in DFTsusi.theochem.tuwien.ac.at/reg_user/.../Marks... · Quasi-Newton Methods in DFT L. D. Marks ... Two are nominally the same, except – For a minimization

Investigation of quasi-Newton methods for unconstrained …people.math.sfu.ca/~elushi/project_833.pdf · 2004-11-10 · focus on four popular quasi-Newton methods, namely BFGS, DFP,

Stochastic Newton and quasi-Newton Methods for Large-Scale ...aspremon/Houches/talks/Goldfarb.pdf · Newton-like and quasi-Newton methods for convex stochastic optimization problems

A FULL-SPACE QUASI-LAGRANGE-NEWTON-KRYLOV …

Practical Quasi-Newton algorithms for singular nonlinear

Méthodes Quasi-Newton · Méthodes quasi-Newton : BFGS • H k vériﬁe l’équation sécante • H k n’est pas nécessairement symétrique • H k n’est pas nécessairement

Randomized Quasi-Newton Updates are Linearly Convergent ... · Randomized Quasi-Newton Updates are Linearly Convergent Matrix Inversion Algorithms Robert M. Gower and Peter Richt

A Quasi-Newton Accelerated Multirate Coupling Scheme for ... · A Quasi-Newton Accelerated Multirate Coupling Scheme for Partitioned Simulation Benjamin Rüth1, Benjamin Uekermann2,

Research Article Partitioned Quasi-Newton Approximation

Méthodes quasi-Newton : BFGS

Exact linesearch limited-memory quasi-Newton methods for

GUSTAF: A QUASI-NEWTON NONLINEAR ADI FORTRAN IV …

Quasi-Newton Methods, Motivation and Theory

Planar Quasi-Newton Algorithms for Unconstrained ...oren/pubs/quasi.pdf · Planar Quasi-Newton Algorithms for Unconstrained Saddlepoint Problems I ... are often treated as synonyms)