Multidimensional Optimization: The Simplex Method
Lecture 16
Biostatistics 615/815
Homework 5: Hashing
Practical illustration of different hashing strategies
• For a given hash table size, double hashing is much more efficient
• Even greater performance savings are available by switching to a larger hash table
Major differences:
• For double hashing, use two different primes …
• Different random number generators can affect performance a bit …
Previous Topic: One-Dimensional Optimization
Bracketing
Golden Search
Quadratic Approximation
Bracketing
Find 3 points such that
• a < b < c
• f(b) < f(a) and f(b) < f(c)
Locate minimum by gradually trimming bracketing interval
Bracketing provides additional confidence in result
The Golden Ratio
[Figure: bracketing triplet A, B, C, then the same triplet with a new point X between B and C; intervals labelled 0.38196]
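The golden search step pictured above can be sketched in a few lines. This is a minimal illustration, not the lecture's own routine: the names `golden_search`, `GOLDEN_FRACTION` and `example_parabola` are hypothetical, and the sketch assumes the caller already holds a valid bracketing triplet a < b < c.

```c
#include <assert.h>

#define GOLDEN_FRACTION 0.38196601125   /* (3 - sqrt(5)) / 2 */

/* Hypothetical test function with its minimum at x = 2 */
double example_parabola(double x)
{
    return (x - 2.0) * (x - 2.0);
}

/* Shrink the bracket (a, b, c) until it is narrower than tol.
   Assumes f(b) < f(a) and f(b) < f(c) on entry. */
double golden_search(double (*f)(double), double a, double b, double c, double tol)
{
    double fb = f(b);

    assert(a < b && b < c);

    while (c - a > tol) {
        /* Place the new point a fraction 0.38196 into the larger interval */
        int right = (c - b) > (b - a);
        double x = right ? b + GOLDEN_FRACTION * (c - b)
                         : b - GOLDEN_FRACTION * (b - a);
        double fx = f(x);

        if (fx < fb) {
            /* x is the new best point; the old b becomes an end point */
            if (right) a = b; else c = b;
            b = x;
            fb = fx;
        } else {
            /* b stays the best point; x becomes an end point */
            if (right) c = x; else a = x;
        }
    }
    return b;
}
```

For example, `golden_search(example_parabola, 0.0, 1.0, 5.0, 1e-6)` starts from a valid bracket, since f(1) is smaller than both f(0) and f(5).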
Parabolic Interpolation
For well behaved functions, faster than Golden Search
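As a reminder of what one interpolation step computes, the sketch below returns the vertex of the parabola through three bracketing points. The function name `parabolic_step` is hypothetical, and the formula is the standard three-point vertex formula rather than code taken from the lecture.

```c
#include <assert.h>

/* Abscissa of the vertex of the parabola through
   (a, fa), (b, fb) and (c, fc). With a valid bracket
   (fb below fa and fc) the vertex is a minimum. */
double parabolic_step(double a, double fa, double b, double fb,
                      double c, double fc)
{
    double p, q, denom;

    assert(a < b && b < c);

    p = (b - a) * (fb - fc);
    q = (b - c) * (fb - fa);
    denom = p - q;

    /* Collinear points define no parabola: stay at b */
    if (denom == 0.0)
        return b;

    return b - 0.5 * ((b - a) * p - (b - c) * q) / denom;
}
```

When the function is exactly quadratic, a single step lands on the minimum, which is why the method beats golden search on well behaved functions.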
Today: Multidimensional Optimization
Illustrate the method of Nelder and Mead
• Simplex Method
• Nicknamed "Amoeba"
Simple and, in practice, quite robust
• Counter examples are known
Discuss other standard methods
C Utility Functions: Allocating Vectors
Ease allocation of vectors. Peppered through today's examples.

#include <stdlib.h>

double * alloc_vector(int cols)
   {
   return (double *) malloc(sizeof(double) * cols);
   }

void free_vector(double * vector, int cols)
   {
   free(vector);
   }
C Utility Functions: Allocating Matrices

double ** alloc_matrix(int rows, int cols)
   {
   int i;
   double ** matrix = (double **) malloc(sizeof(double *) * rows);

   for (i = 0; i < rows; i++)
      matrix[i] = alloc_vector(cols);

   return matrix;
   }

void free_matrix(double ** matrix, int rows, int cols)
   {
   int i;
   for (i = 0; i < rows; i++)
      free_vector(matrix[i], cols);
   free(matrix);
   }
The Simplex Method
Calculate likelihoods at simplex vertices
• Geometric shape with k+1 corners
• E.g. a triangle in k = 2 dimensions
Simplex crawls
• Towards minimum
• Away from maximum
Probably the most widely used optimization method
A Simplex in Two Dimensions
Evaluate function at vertices
Note:
• The highest (worst) point
• The next highest point
• The lowest (best) point
Intuition:
• Move away from high point, towards low point
[Figure: triangle with vertices x0, x1, x2]
C Code: Creating A Simplex

double ** make_simplex(double * point, int dim)
   {
   int i, j;
   double ** simplex = alloc_matrix(dim + 1, dim);

   /* Start with every vertex at the initial point */
   for (i = 0; i < dim + 1; i++)
      for (j = 0; j < dim; j++)
         simplex[i][j] = point[j];

   /* Offset vertex i by one unit along dimension i;
      the last vertex stays at the initial point */
   for (i = 0; i < dim; i++)
      simplex[i][i] += 1.0;

   return simplex;
   }
C Code: Evaluating Function at Vertices
This function is very simple
• This is a good thing!
• Making each function almost trivial makes debugging easy

void evaluate_simplex(double ** simplex, int dim,
                      double * fx, double (* func)(double *, int))
   {
   for (int i = 0; i < dim + 1; i++)
      fx[i] = (*func)(simplex[i], dim);
   }
C Code: Finding Extremes

/* Note: the int & reference parameters require compiling as C++ */
void simplex_extremes(double *fx, int dim, int & ihi, int & ilo,
                      int & inhi)
   {
   int i;

   if (fx[0] > fx[1])
      { ihi = 0; ilo = inhi = 1; }
   else
      { ihi = 1; ilo = inhi = 0; }

   for (i = 2; i < dim + 1; i++)
      if (fx[i] <= fx[ilo])
         ilo = i;
      else if (fx[i] > fx[ihi])
         { inhi = ihi; ihi = i; }
      else if (fx[i] > fx[inhi])
         inhi = i;
   }
Direction for Optimization
Line through worst point and average of other points
• mid: average of all points, excluding worst point
[Figure: simplex x0, x1, x2 with the midpoint "mid" marked]
C Code: Direction for Optimization

void simplex_bearings(double ** simplex, int dim,
                      double * midpoint, double * line, int ihi)
   {
   int i, j;

   for (j = 0; j < dim; j++)
      midpoint[j] = 0.0;

   /* Sum all vertices except the worst one */
   for (i = 0; i < dim + 1; i++)
      if (i != ihi)
         for (j = 0; j < dim; j++)
            midpoint[j] += simplex[i][j];

   /* Average, then take the direction from midpoint to worst point */
   for (j = 0; j < dim; j++)
      {
      midpoint[j] /= dim;
      line[j] = simplex[ihi][j] - midpoint[j];
      }
   }
Reflection
This is the default new trial point
[Figure: simplex x0, x1, x2; the worst point x0 is reflected through mid to the trial point x']
Reflection and Expansion
If reflection results in new minimum…
Move further along minimization direction
[Figure: simplex x0, x1, x2 with mid, x' and x'' marked along the line from x0]
Contraction (One Dimension)
If x' is still the worst point…
Try a smaller step
[Figure: simplex x0, x1, x2 with mid, x' and x'' marked along the line from x0]
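Reflection, expansion and one-dimensional contraction are all steps along the same line through the worst point, which is how `update_simplex` is later called with scale values -1.0, -2.0 and 0.5. A compact restatement, where the symbols x_hi (worst vertex), m (midpoint of the remaining vertices) and s (scale) are notation introduced here for illustration:

```latex
\[
x_{\mathrm{trial}} = m + s\,(x_{hi} - m), \qquad
s =
\begin{cases}
-1 & \text{reflection} \\
-2 & \text{reflection and expansion} \\
\tfrac{1}{2} & \text{contraction (one dimension)}
\end{cases}
\]
```

A negative s places the trial point on the far side of the midpoint from the worst vertex; s = 1/2 keeps it on the same side, halfway in.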
C Code: Updating The Simplex

int update_simplex(double * point, int dim, double & fmax,
                   double * midpoint, double * line, double scale,
                   double (* func)(double *, int))
   {
   int i, update = 0;
   double * next = alloc_vector(dim), fx;

   /* Trial point along the line through the worst point */
   for (i = 0; i < dim; i++)
      next[i] = midpoint[i] + scale * line[i];

   fx = (*func)(next, dim);

   /* Accept the trial point only if it improves on the worst point */
   if (fx < fmax)
      {
      for (i = 0; i < dim; i++)
         point[i] = next[i];
      fmax = fx;
      update = 1;
      }

   free_vector(next, dim);
   return update;
   }
Contraction …
"passing through the eye of a needle"
If a simple contraction doesn't improve things, then try moving all points towards the current minimum
[Figure: simplex x0, x1, x2 with the midpoint "mid" marked]
C Code: Contracting The Simplex

void contract_simplex(double ** simplex, int dim,
                      double * fx, int ilo, double (*func)(double *, int))
   {
   int i, j;

   /* Move every vertex except the best one halfway towards the best one */
   for (i = 0; i < dim + 1; i++)
      if (i != ilo)
         {
         for (j = 0; j < dim; j++)
            simplex[i][j] = (simplex[ilo][j] + simplex[i][j]) * 0.5;
         fx[i] = (*func)(simplex[i], dim);
         }
   }
Summary: The Simplex Method
[Figure: the original simplex with its high and low points marked, and the four possible moves: reflection, reflection and expansion, contraction, multiple contraction]
C Code: Minimization Routine (Part I)
Declares local variables and allocates memory

double amoeba(double *point, int dim,
              double (*func)(double *, int),
              double tol)
   {
   int ihi, ilo, inhi, j;
   double fmin;
   double * fx = alloc_vector(dim + 1);
   double * midpoint = alloc_vector(dim);
   double * line = alloc_vector(dim);
   double ** simplex = make_simplex(point, dim);

   evaluate_simplex(simplex, dim, fx, func);
C Code: Minimization Routine (Part II)

   while (true)
      {
      simplex_extremes(fx, dim, ihi, ilo, inhi);
      simplex_bearings(simplex, dim, midpoint, line, ihi);

      if (check_tol(fx[ihi], fx[ilo], tol)) break;

      /* Reflection */
      update_simplex(simplex[ihi], dim, fx[ihi],
                     midpoint, line, -1.0, func);

      if (fx[ihi] < fx[ilo])
         /* New minimum: try expanding further */
         update_simplex(simplex[ihi], dim, fx[ihi],
                        midpoint, line, -2.0, func);
      else if (fx[ihi] >= fx[inhi])
         /* Still the worst point: contract, first along the line, then everywhere */
         if (!update_simplex(simplex[ihi], dim, fx[ihi],
                             midpoint, line, 0.5, func))
            contract_simplex(simplex, dim, fx, ilo, func);
      }
C Code: Minimization Routine (Part III)
Store the result and free memory

   for (j = 0; j < dim; j++)
      point[j] = simplex[ilo][j];
   fmin = fx[ilo];

   free_vector(fx, dim + 1);
   free_vector(midpoint, dim);
   free_vector(line, dim);
   free_matrix(simplex, dim + 1, dim);

   return fmin;
   }
C Code: Checking Convergence

#include <math.h>

#define ZEPS 1e-10

int check_tol(double fmax, double fmin, double ftol)
   {
   double delta = fabs(fmax - fmin);
   double accuracy = (fabs(fmax) + fabs(fmin)) * ftol;

   return (delta < (accuracy + ZEPS));
   }
amoeba()
A general purpose minimization routine
• Works in multiple dimensions
• Uses only function evaluations
• Does not require derivatives
Typical usage:
• double my_func(double * x, int n) { … }
• amoeba(point, dim, my_func, 1e-7);
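To make the calling convention concrete, here is one possible objective function with the signature that amoeba() expects. The Rosenbrock function is a common optimizer test case chosen here for illustration; it is not part of the lecture, and the name `rosenbrock` is an assumption.

```c
#include <assert.h>

/* Rosenbrock "banana" function in n >= 2 dimensions.
   Its minimum value, 0, is reached at x = (1, ..., 1). */
double rosenbrock(double * x, int n)
{
    double sum = 0.0;
    int i;

    assert(n >= 2);

    for (i = 0; i + 1 < n; i++) {
        double a = x[i + 1] - x[i] * x[i];
        double b = 1.0 - x[i];
        sum += 100.0 * a * a + b * b;
    }
    return sum;
}
```

With the routines from the slides compiled in, a trial run could take the form `amoeba(start, 2, rosenbrock, 1e-7)` with `start` holding, say, (-1.2, 1.0). On return, `start` is overwritten with the best point found.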
Example Application: Old Faithful Eruptions (n = 272)
[Figure: histogram "Old Faithful Eruptions"; x-axis: Duration (mins), y-axis: Frequency]
Fitting a Normal Distribution
Fit two parameters
• Mean
• Variance
Requires ~165 likelihood evaluations
• Mean = 3.4878
• Variance = 1.2979
• Maximum log-likelihood = -421.42
Nice fit, eh?
[Figure: histogram "Old Faithful Eruptions" (Frequency vs. Duration (mins)) beside "Fitted Distribution" (Density vs. Duration (mins))]
A Mixture of Two Normals
Fit 5 parameters
• Proportion in the first component
• Two means
• Two variances
Required ~700 evaluations
• First component contributes 0.34841 of mixture
• Means are 2.0186 and 4.2734
• Variances are 0.055517 and 0.19102
• Maximum log-likelihood = -276.36
Two Components
[Figure: histogram "Old Faithful Eruptions" (Frequency vs. Duration (mins)) beside "Fitted Distribution" (Density vs. Duration (mins))]
A Mixture of Three Normals
Fit 8 parameters
• Proportion in the first two components
• Three means
• Three variances
Required ~1400 evaluations
• Did not always converge!
One of the best solutions …
• Components contributing 0.339, 0.512 and 0.149
• Component means are 2.002, 4.401 and 3.727
• Variances are 0.0455, 0.106 and 0.2959
• Maximum log-likelihood = -267.89
Three Components
[Figure: histogram "Old Faithful Eruptions" (Frequency vs. Duration (mins)) beside "Fitted Distribution" (Density vs. Duration (mins))]
Tricky Minimization Questions
Fitting variables that are constrained
• Proportions vary between 0 and 1
• Variances must be positive
Selecting the number of components
Checking convergence
Improvements to amoeba()
Different scaling along each dimension
• If parameters have different impact on the likelihood
Track total function evaluations
• Avoid getting stuck if function does not cooperate
Rotate simplex
• If the current simplex is leading to slow improvement
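Tracking function evaluations does not require changing the objective itself: a thin wrapper with the same signature can count calls. The sketch below is one possible arrangement. All names in it are hypothetical, and the idea that the minimization loop would test `budget_exceeded()` and stop is an assumed modification, not something shown in the lecture code.

```c
#include <assert.h>

/* Function being wrapped, plus the running count and an assumed budget */
double (*counted_target)(double *, int) = 0;
long evaluation_count = 0;
long evaluation_limit = 10000;

/* Hypothetical objective used only to demonstrate the wrapper */
double example_sum_squares(double * x, int n)
{
    double sum = 0.0;
    int i;
    for (i = 0; i < n; i++)
        sum += x[i] * x[i];
    return sum;
}

/* Same signature as the objective, so it can be passed in its place */
double counted_func(double * x, int n)
{
    assert(counted_target != 0);
    evaluation_count++;
    return counted_target(x, n);
}

/* A minimization loop could check this and give up when it returns 1 */
int budget_exceeded(void)
{
    return evaluation_count > evaluation_limit;
}
```

To use it, set `counted_target` to the real objective, reset `evaluation_count` to zero, and pass `counted_func` to the minimizer.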
optim() Function in R
optim(point, function, method)
• Point – starting point for minimization
• Function that accepts point as argument
• Method can be
   • "Nelder-Mead" for simplex method (default)
   • "BFGS", "CG" and other options use gradient
One parameter at a time
Simple but inefficient approach
Consider
• Parameters θ = (θ1, θ2, …, θk)
• Function f(θ)
Maximize f(θ) with respect to each θi in turn
• Cycle through parameters
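The cycling scheme can be sketched as follows. To match amoeba(), the sketch minimizes (maximizing f amounts to minimizing -f). Each one-dimensional search is a golden-section search on a fixed window around the current value, which assumes the function has a single minimum inside that window. All names, the window width and the sweep count are assumptions made for illustration.

```c
#include <assert.h>

#define GOLDEN 0.6180339887498949   /* (sqrt(5) - 1) / 2 */

/* Hypothetical objective with its minimum at (2, -1) */
double example_quadratic(double * x, int n)
{
    return x[0] * x[0] + x[1] * x[1] + x[0] * x[1] - 3.0 * x[0];
}

/* Minimize func along coordinate i within [lo, hi], others held fixed */
void minimize_coordinate(double (*func)(double *, int), double * x, int n,
                         int i, double lo, double hi, double tol)
{
    double x1 = hi - GOLDEN * (hi - lo);
    double x2 = lo + GOLDEN * (hi - lo);
    double f1, f2;

    x[i] = x1; f1 = func(x, n);
    x[i] = x2; f2 = func(x, n);

    while (hi - lo > tol) {
        if (f1 < f2) {
            /* Minimum lies in [lo, x2]; the old x1 becomes the new x2 */
            hi = x2; x2 = x1; f2 = f1;
            x1 = hi - GOLDEN * (hi - lo);
            x[i] = x1; f1 = func(x, n);
        } else {
            /* Minimum lies in [x1, hi]; the old x2 becomes the new x1 */
            lo = x1; x1 = x2; f1 = f2;
            x2 = lo + GOLDEN * (hi - lo);
            x[i] = x2; f2 = func(x, n);
        }
    }
    x[i] = 0.5 * (lo + hi);
}

/* Cycle through the parameters, optimizing one at a time */
void coordinate_search(double (*func)(double *, int), double * x, int n,
                       double width, double tol, int sweeps)
{
    int s, i;

    assert(width > 0.0 && tol > 0.0);

    for (s = 0; s < sweeps; s++)
        for (i = 0; i < n; i++)
            minimize_coordinate(func, x, n, i,
                                x[i] - width, x[i] + width, tol);
}
```

Each sweep costs one full line search per parameter, and when parameters are correlated many sweeps are needed, which is the inefficiency discussed next.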
The Inefficiency…
[Figure: plot with axes θ1 and θ2]
Steepest Descent
Consider
• Parameters θ = (θ1, θ2, …, θk)
• Function f(θ; x)
Score vector
$$S = \frac{d \ln f}{d\theta} = \left( \frac{d \ln f}{d\theta_1}, \ldots, \frac{d \ln f}{d\theta_k} \right)$$
• Find maximum along θ + δS
Still inefficient…
Consecutive steps are still perpendicular!
Multidimensional Minimization
Typically, sophisticated methods will…
Use derivatives
• May be calculated numerically. How?
Select a direction for minimization, using:
• Weighted average of previous directions
• Current gradient
• Avoid right angle turns
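One standard answer to the "How?" above is a central finite difference: perturb one parameter at a time by a small step h in each direction and divide the change in the function by 2h. The sketch below is a minimal version. The names are hypothetical and the step size is left to the caller, since a good choice depends on the scale of the function.

```c
#include <assert.h>

/* Hypothetical objective: gradient is (2 * x[0], 3) */
double example_surface(double * x, int n)
{
    return x[0] * x[0] + 3.0 * x[1];
}

/* Central-difference estimate of the gradient of func at x.
   Costs 2 * n function evaluations; x is restored before returning. */
void numerical_gradient(double (*func)(double *, int), double * x, int n,
                        double h, double * gradient)
{
    int i;

    assert(h > 0.0);

    for (i = 0; i < n; i++) {
        double saved = x[i], f_plus, f_minus;

        x[i] = saved + h;
        f_plus = func(x, n);

        x[i] = saved - h;
        f_minus = func(x, n);

        x[i] = saved;
        gradient[i] = (f_plus - f_minus) / (2.0 * h);
    }
}
```

The cost of 2n evaluations per gradient is one reason derivative-free methods such as the simplex method remain attractive when function evaluations are expensive.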
Recommended Reading
Numerical Recipes in C (or C++, or Fortran)
• Press, Teukolsky, Vetterling, Flannery
• Chapter 10.4
Clear description of Simplex Method
• Other sub-chapters illustrate more sophisticated methods
Online at
• http://www.numerical-recipes.com/