Multidimensional Optimization: The Simplex Methodcsg.sph.umich.edu/abecasis/class/2008/615.16.pdf · 2008-11-06 · Homework 5: Hashing zP i l ill i f diff h hiPractical illustration

Multidimensional Optimization:pThe Simplex Method

Lecture 16Biostatistics 615/815Biostatistics 615/815

Homework 5: HashingP i l ill i f diff h hiPractical illustration of different hashing strategies• For a given hash table size, double hashing isFor a given hash table size, double hashing is

much more efficient• Even greater performance savings are available

by switching to a larger hash tabley g g

Major differences:• F d bl h hi t diff t i• For double hashing, use two different primes ….• Different random number generators can affect

performance a bit …

Previous Topic: Previous Topic: One-Dimensional Optimization

Bracketingg

Golden SearchGolden Search

Q d ti A i tiQuadratic Approximation

Bracketing

Find 3 points such that • a < b < c• f(b) < f(a) and f(b) < f(c)

L t i i b d ll t i iLocate minimum by gradually trimming bracketing interval

Bracketing provides additional confidence in resultin result

The Golden Ratio

Bracketing Triplet

A B C

New Point

A B CX

0.38196 0.38196

Parabolic Interpolation

For well behaved functions, faster than Golden Search

Today:Today:Multidimensional Optimization

Illustrate the method of Nelder and Mead• Simplex Method• Ni k d "A b "• Nicknamed "Amoeba"

Simple and in practice quite robustSimple and, in practice, quite robust• Counter examples are known

Discuss other standard methods

C Utility Functions:C Utility Functions:Allocating Vectors

E ll i fEase allocation of vectors.Peppered through today's examples

double * alloc_vector(int cols){return (double *) malloc(sizeof(double) * cols);}}

void free_vector(double * vector, int cols){free(vector);}

C Utility Functions:C Utility Functions:Allocating Matrices

double ** alloc_matrix(int rows, int cols){ int i;double ** matrix = (double **) malloc(sizeof(double *) * rows);

for (i = 0; i < rows; i++)matrix[i] = alloc_vector(cols);

return matrix;}}

void free_matrix(double ** matrix, int rows, int cols){ i iint i;for (i = 0; i < rows; i++)

free_vector(matrix[i], cols);free(matrix);}}

The Simplex MethodCalculate likelihoods at simplex vertices• Geometric shape with k+1 corners• E g a triangle in k = 2 dimensionsE.g. a triangle in k = 2 dimensions

Simplex crawls p• Towards minimum• Away from maximum

Probably the most widely used optimization method

A Simplex in Two Dimensions

Evaluate function at verticesx1

Note:• The highest (worst) point• Th t hi h t i t

x0 • The next highest point• The lowest (best) point

Intuition:• Move away from high point,

x2

towards low point

C Code:C Code:Creating A Simplexdouble ** make_simplex(double * point, int dim)

{int i, j;double ** simplex = alloc matrix(dim + 1, dim);doub e s p e a oc_ at (d , d );

for (int i = 0; i < dim + 1; i++)for (int j = 0; j < dim; j++)

i l [i][j] i t[j]simplex[i][j] = point[j];

for (int i = 0; i < dim; i++)simplex[i][i] += 1.0;p

return simplex;}

C Code:C Code:Evaluating Function at Vertices

Thi f i i i lThis function is very simple• This is a good thing!• Making each function almost trivial makes debugging easy

void evaluate_simplex(double ** simplex, int dim,double * fx double (* func)(double * int))double * fx, double (* func)(double *, int))

{for (int i = 0; i < dim + 1; i++)

fx[i] = (*func)(simplex[i] dim);fx[i] = ( func)(simplex[i], dim);}

C Code:C Code:Finding Extremesvoid simplex_extremes(double *fx, int dim, int & ihi, int & ilo,

int & inhi){int i;

if (fx[0] > fx[1]){ ihi = 0; ilo = inhi = 1; }

else{ ihi = 1; ilo = inhi = 0; }{ ihi = 1; ilo = inhi = 0; }

for (i = 2; i < dim + 1; i++)if (fx[i] <= fx[ilo])

ilo = i;ilo = i;else if (fx[i] > fx[ihi])

{ inhi = ihi; ihi = i; }else if (fx[i] > fx[inhi])

inhi = i;inhi = i;}

Direction for Optimization

x1Line through worst pointand average of other points

x0mid

and average of other points

x2 Average of all points, excluding worst point

C Code:C Code:Direction for Optimizationvoid simplex_bearings(double ** simplex, int dim,

double * midpoint, double * line, int ihi){int i, j;for (j = 0; j < dim; j++)

midpoint[j] = 0.0;

for (i = 0; i < dim + 1; i++)if (i ! i i)if (i != ihi)

for (j = 0; j < dim; j++)midpoint[j] += simplex[i][j];

f (j 0 j di j )for (j = 0; j < dim; j++){midpoint[j] /= dim;line[j] = simplex[ihi][j] - midpoint[j];}}

}

Reflection

x1

x0mid x'

x2

This is the default new trial point

Reflection and Expansion

x1If reflection results in new minimum…

x0mid x' x''

Move further alongminimization direction

x2

Contraction (One Dimension)

x1Try a smaller step

x0mid x'x''

x2

If x' is still the worst point…

C Code: Updating The SimplexC Code: Updating The Simplexint update_simplex(double * point, int dim, double & fmax,

double * midpoint, double * line, double scale,double (* func)(double *, int))

{int i, update = 0; double * next = alloc_vector(dim), fx;

for (i = 0; i < dim; i++)for (i = 0; i < dim; i++)next[i] = midpoint[i] + scale * line[i];

fx = (*func)(next, dim);

if (fx < fmax) ( a ){for (i = 0; i < dim; i++)

point[i] = next[i];fmax = fx;update = 1;}

free_vector(next, dim);t d treturn update;

}

Contraction …

x1 "passing through the eye of a needle"

x0mid

If a simple contraction doesn't improve things, then try moving

all points towards the currentx2

all points towards the currentminimum

C Code:C Code:Contracting The Simplexvoid contract_simplex(double ** simplex, int dim,

double * fx, int ilo, double (*func)(double *, int))

{int i, j;

for (int i = 0; i < dim + 1; i++)if (i != ilo)

{{for (int j = 0; j < dim; j++)

simplex[i][j] = (simplex[ilo][j]+simplex[i][j])*0.5;fx[i] = (*func)(simplex[i], dim);}}

}

Summary:Summary:The Simplex Method

highOriginal Simplex

highlow

reflection contractionreflection contraction

reflection andmultiplecontraction

expansion

C Code:C Code:Minimization Routine (Part I)

D l l l i bl d llDeclares local variables and allocates memory

double amoeba(double *point, int dim,double (*func)(double *, int),double tol)

{int ihi, ilo, inhi, j;int ihi, ilo, inhi, j;double fmin;double * fx = alloc_vector(dim + 1);double * midpoint = alloc_vector(dim);d bl * li ll t (di )double * line = alloc_vector(dim);double ** simplex = make_simplex(point, dim);

evaluate simplex(simplex, dim, fx, func);_ p ( p , , , )

C Code:C Code:Minimization Route (Part II)while (true)

{simplex_extremes(fx, dim, ihi, ilo, inhi);simplex_bearings(simplex, dim, midpoint, line, ihi);

if (check_tol(fx[ihi], fx[ilo], tol)) break;

update_simplex(simplex[ihi], dim, fx[ihi],i i i 0 )midpoint, line, -1.0, func);

if (fx[ihi] < fx[ilo])update_simplex(simplex[ihi], dim, fx[ihi],

id i li 2 0 f )midpoint, line, -2.0, func);else if (fx[ihi] >= fx[inhi])

if (!update_simplex(simplex[ihi], dim, fx[ihi],midpoint, line, 0.5, func))

i l ( i l di f il f )contract_simplex(simplex, dim, fx, ilo, func);}

C Code:C Code:Minimization Routine (Part III)

S h l d fStore the result and free memory

for (j = 0; j < dim; j++)point[j] = simplex[ilo][j];point[j] = simplex[ilo][j];

fmin = fx[ilo];

free_vector(fx, dim);free_vector(midpoint, dim);free_vector(line, dim);free_matrix(simplex, dim + 1, dim);

return fmin;}

C Code:C Code:Checking Convergence##include <math.h>

#define ZEPS 1e-10

int check_tol(double fmax, double fmin, double ftol){double delta = fabs(fmax - fmin);d bl (f b (f ) + f b (f i )) * ft ldouble accuracy = (fabs(fmax) + fabs(fmin)) * ftol;

return (delta < (accuracy + ZEPS));}

amoeba()

A general purpose minimization routine• Works in multiple dimensions• U l f i l i• Uses only function evaluations• Does not require derivatives

Typical usage:• f (d bl * i t ) { }•my_func(double * x, int n) { … }•amoeba(point, dim, my_func, 1e-7);

Example ApplicationExample ApplicationOld Faithful Eruptions (n = 272)

Old Faithful Eruptions

ncy

1520

Freq

ue

510

1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

05

Duration (mins)

Fitting a Normal Distribution

Fit two parameters• Mean• V i• Variance

Requires ~165 likelihood evaluationsRequires 165 likelihood evaluations• Mean = 3.4878• Variance = 1.2979

• Maximum log-likelihood = -421.42

Nice fit, eh?

00.

35

Fitted DistributionOld Faithful Eruptions

0.20

0.25

0.3

ensi

ty

quen

cy

1520

0.10

0.15

De

Freq

510

1 2 3 4 5 6

0.05

Duration (mins)Duration (mins)

1 2 3 4 5 6

0

A Mixture of Two NormalsFiFit 5 parameters• Proportion in the first component• Two means• Two variances

Required about ~700 evaluationsRequired about 700 evaluations• First component contributes 0.34841 of mixture• Means are 2.0186 and 4.2734• Variances are 0 055517 and 0 19102Variances are 0.055517 and 0.19102


Two Components


50.

6

Fitted Distribution

quen

cy

1520

0.3

0.4

0.5

ensi

ty

Freq

510

0.1

0.2

De

Duration (mins)

1 2 3 4 5 6

0

1 2 3 4 5 6

0.0

Duration (mins)

A Mixture of Three NormalsFi 8Fit 8 parameters• Proportion in the first two components• Three means• Th i• Three variances

Required about ~1400 evaluations• Did not always converge!

One of the best solutionsOne of the best solutions …• Components contributing .339, 0.512 and 0.149• Component means are 2.002, 4.401 and 3.727• Variances are 0 0455 0 106 0 2959Variances are 0.0455, 0.106, 0.2959


Three Components


0.6

0.7

Fitted Distribution

quen

cy

1520

0.4

0.5

ensi

ty

Freq

510

0.1

0.2

0.3D

e

Duration (mins)

1 2 3 4 5 6

0

1 2 3 4 5 6

0.0

Duration (mins)

Tricky Minimization Questions

Fitting variables that are constrained• Proportions vary between 0 and 1• Variances must be positive

Selecting the number of components

Checking convergence

Improvements to amoeba()

Different scaling along each dimension• If parameters have different impact on the likelihood

Track total function evaluations• Avoid getting stuck if function does not cooperateAvoid getting stuck if function does not cooperate

Rotate simplexp• If the current simplex is leading to slow improvement

optim() Function in R

optim(point, function, method)

• Point – starting point for minimization• Function that accepts point as argumentp p g• Method can be

• "Nelder-Mead" for simplex method (default)• "BFGS", "CG" and other options use gradient

One parameter at a timeSimple but inefficient approach

Consider• Parameters θ = (θ1, θ2, …, θk)• Function f (θ)

M i i θ ith t t h θ i tMaximize θ with respect to each θi in turn• Cycle through parameters

The Inefficiency…

θ2

θ1

Steepest Descent

Consider• Parameters θ = (θ1, θ2, …, θk)

F i f( )• Function f(θ; x)

SScore vector

⎟⎟⎞

⎜⎜⎛

==fdfdfdS lnlnln

• Find maximum along θ + δS

⎟⎟⎠

⎜⎜⎝

==kddd

Sθθθ

...,,1

Still inefficient…

Consecutive steps are still perpendicular!

Multidimensional Minimization

Typically, sophisticated methods will…

Use derivatives • May be calculated numerically. How?

Select a direction for minimization, using:• W i ht d f i di ti• Weighted average of previous directions• Current gradient• Avoid right angle turnsAvoid right angle turns

Recommended Reading

Numerical Recipes in C (or C++, or Fortran)• Press, Teukolsky, Vetterling, Flannery• Ch t 10 4• Chapter 10.4

Clear description of Simplex MethodClear description of Simplex Method• Other sub-chapters illustrate more sophisticated methods

Online at• http://www.numerical-recipes.com/

Documents

Multidimensional Optimization: The Simplex Methodcsg.sph.umich.edu/abecasis/class/2008/615.16.pdf · 2008-11-06 · Homework 5: Hashing zP i l ill i f diff h hiPractical illustration