
The Levenberg-Marquardt method for nonlinear least squares curve-fitting problems

Henri P. Gavin
Department of Civil and Environmental Engineering
Duke University
September 24, 2013

    Abstract

The Levenberg-Marquardt method is a standard technique used to solve nonlinear least squares problems. Least squares problems arise when fitting a parameterized function to a set of measured data points by minimizing the sum of the squares of the errors between the data points and the function. Nonlinear least squares problems arise when the function is not linear in the parameters. Nonlinear least squares methods involve an iterative improvement to parameter values in order to reduce the sum of the squares of the errors between the function and the measured data points. The Levenberg-Marquardt curve-fitting method is actually a combination of two minimization methods: the gradient descent method and the Gauss-Newton method. In the gradient descent method, the sum of the squared errors is reduced by updating the parameters in the direction of the greatest reduction of the least squares objective. In the Gauss-Newton method, the sum of the squared errors is reduced by assuming the least squares function is locally quadratic, and finding the minimum of the quadratic. The Levenberg-Marquardt method acts more like a gradient-descent method when the parameters are far from their optimal value, and acts more like the Gauss-Newton method when the parameters are close to their optimal value. This document describes these methods and illustrates the use of software to solve nonlinear least squares curve-fitting problems.

    1 Introduction

In fitting a function ŷ(t; p) of an independent variable t and a vector of n parameters p to a set of m data points (t_i, y_i), it is customary and convenient to minimize the sum of the weighted squares of the errors (or weighted residuals) between the measured data y(t_i) and the curve-fit function ŷ(t_i; p). This scalar-valued goodness-of-fit measure is called the chi-squared error criterion.

    \chi^2(p) = \sum_{i=1}^{m} \left[ \frac{y(t_i) - \hat{y}(t_i; p)}{w_i} \right]^2    (1)

              = (y - \hat{y}(p))^T W (y - \hat{y}(p))    (2)

              = y^T W y - 2 y^T W \hat{y} + \hat{y}^T W \hat{y}    (3)

The value w_i is a measure of the error in measurement y(t_i). The weighting matrix W is diagonal, with W_ii = 1/w_i^2. If the function ŷ is nonlinear in the model parameters p, then the minimization of χ² with respect to the parameters must be carried out iteratively. The goal of each iteration is to find a perturbation h to the parameters p that reduces χ².
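As a concrete illustration, equations (1)-(3) translate directly into a few lines of Matlab. This is a minimal sketch only; the model function y_hat_func and the vectors t, y_dat, w, and p are hypothetical stand-ins for a user's problem.

y_hat = y_hat_func(t,p);       % model curve evaluated at the parameters p
W     = diag( 1 ./ w.^2 );     % weighting matrix, W_ii = 1/w_i^2
r     = y_dat - y_hat;         % residuals between data and model
X2    = r' * W * r;            % chi-squared error criterion, equation (2)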


    2 The Gradient Descent Method

The steepest descent method is a general minimization method which updates parameter values in the direction opposite to the gradient of the objective function. It is recognized as a highly convergent algorithm for finding the minimum of simple objective functions [2,3]. For problems with thousands of parameters, gradient descent methods are sometimes the only viable method.

    The gradient of the chi-squared objective function with respect to the parameters is

    \frac{\partial}{\partial p} \chi^2 = 2 (y - \hat{y}(p))^T W \frac{\partial}{\partial p} (y - \hat{y}(p))    (4)

                                       = -2 (y - \hat{y}(p))^T W \left[ \frac{\partial \hat{y}(p)}{\partial p} \right]    (5)

                                       = -2 (y - \hat{y})^T W J    (6)

where the m × n Jacobian matrix [∂ŷ/∂p] represents the local sensitivity of the function ŷ to variation in the parameters p. For notational simplicity, J will be used for [∂ŷ/∂p]. The perturbation h that moves the parameters in the direction of steepest descent is given by

    h_{gd} = \alpha\, J^T W (y - \hat{y}),    (7)

where the positive scalar α determines the length of the step in the steepest-descent direction.
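As a sketch, a single steepest-descent iteration based on equation (7) could read as follows, assuming the Jacobian J, the weighting matrix W, the model y_hat_func, and a user-chosen step length alpha are already in the workspace:

r    = y_dat - y_hat_func(t,p);    % residual at the current parameters
h_gd = alpha * J' * W * r;         % steepest-descent perturbation, equation (7)
p    = p + h_gd;                   % update the parameters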

    3 The Gauss-Newton Method

The Gauss-Newton method is a method for minimizing a sum-of-squares objective function. It presumes that the objective function is approximately quadratic in the parameters near the optimal solution. For moderately-sized problems the Gauss-Newton method typically converges much faster than gradient-descent methods [4].

The function evaluated with perturbed model parameters may be locally approximated through a first-order Taylor series expansion:

    \hat{y}(p + h) \approx \hat{y}(p) + \left[ \frac{\partial \hat{y}}{\partial p} \right] h = \hat{y} + J h.    (8)

Substituting the approximation for the perturbed function, ŷ + Jh, for ŷ in equation (3),

    \chi^2(p + h) \approx y^T W y + \hat{y}^T W \hat{y} - 2 y^T W \hat{y} - 2 (y - \hat{y})^T W J h + h^T J^T W J h.    (9)

This shows that χ² is approximately quadratic in the perturbation h, and that the Hessian of the chi-squared fit criterion is approximately J^T W J.

The perturbation h that minimizes χ² is found by setting ∂χ²/∂h = 0:

    \frac{\partial}{\partial h} \chi^2(p + h) \approx -2 (y - \hat{y})^T W J + 2 h^T J^T W J,    (10)

and the resulting normal equations for the Gauss-Newton perturbation are

    \left[ J^T W J \right] h_{gn} = J^T W (y - \hat{y}).    (11)
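In Matlab, the Gauss-Newton perturbation of equation (11) is a single backslash solve. This is a minimal sketch, assuming J, W, t, p, y_dat, and the model y_hat_func are in hand:

r    = y_dat - y_hat_func(t,p);    % residual at the current parameters
h_gn = (J'*W*J) \ (J'*W*r);        % solve the normal equations (11)
p    = p + h_gn;                   % Gauss-Newton update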


    4 The Levenberg-Marquardt Method

The Levenberg-Marquardt algorithm adaptively varies the parameter updates between the gradient descent update and the Gauss-Newton update,

    \left[ J^T W J + \lambda I \right] h_{lm} = J^T W (y - \hat{y}),    (12)

where small values of the algorithmic parameter λ result in a Gauss-Newton update and large values of λ result in a gradient descent update. At a large distance from the function minimum, the steepest descent method is utilized to provide steady and convergent progress toward the solution. As the solution approaches the minimum, λ is adaptively decreased, the Levenberg-Marquardt method approaches the Gauss-Newton method, and the solution typically converges rapidly to the local minimum [2,3,4].

Marquardt's suggested update relationship [4],

    \left[ J^T W J + \lambda\, \mathrm{diag}(J^T W J) \right] h_{lm} = J^T W (y - \hat{y}),    (13)

is used in the Levenberg-Marquardt algorithm implemented in the Matlab function lm.m.

    4.1 Numerical Implementation

Variations of the Levenberg-Marquardt method have been published in papers and in code. This document borrows from some of these and offers the enhancement of a rank-1 Jacobian update. In iteration i, the step h is evaluated by comparing χ²(p) to χ²(p + h). The step is accepted if the metric ρ_i [5] is greater than a user-specified value, ε_4,

    \rho_i(h) = \frac{\chi^2(p) - \chi^2(p + h)}{h^T \left( \lambda_i h + J^T W (y - \hat{y}(p)) \right)}.

If in an iteration ρ_i(h) > ε_4, then p + h is sufficiently better than p, p is replaced by p + h, and λ is reduced by a factor. Otherwise λ is increased by a factor, and the algorithm proceeds to the next iteration.
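A single Levenberg-Marquardt iteration built on equation (12) and the acceptance metric ρ_i might look like the following sketch. The scalars lambda, epsilon_4, L_up, and L_dn, and the helper chi2 (which evaluates χ² for a trial parameter vector) are assumptions standing in for state maintained inside lm.m.

n    = length(p);                        % number of parameters
r    = y_dat - y_hat_func(t,p);          % residual at the current parameters
JtWJ = J'*W*J;
JtWr = J'*W*r;
h    = ( JtWJ + lambda*eye(n) ) \ JtWr;  % L-M perturbation, equation (12)
X2_try = chi2(p+h);                      % chi-squared at the trial point
rho  = ( X2 - X2_try ) / ( h'*(lambda*h + JtWr) );   % step-quality metric
if rho > epsilon_4                       % trial step is sufficiently better
    p  = p + h;
    X2 = X2_try;                         % accept the step ...
    lambda = max( lambda/L_dn, 1e-7 );   % ... and reduce lambda (toward Gauss-Newton)
else
    lambda = min( lambda*L_up, 1e7 );    % reject the step and increase lambda (toward gradient descent)
end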

4.1.1 Initialization and update of the L-M parameter, λ, and the parameters p

In lm.m, users may select one of three methods for initializing and updating λ and p. (A code sketch of the third of these update rules is given after the list.)

1. λ_0 = λ_o; λ_o is user-specified [4].

       \left[ J^T W J + \lambda_i\, \mathrm{diag}(J^T W J) \right] h = J^T W (y - \hat{y}(p))

   if ρ_i(h) > ε_4:  update p, p = p + h;  λ_{i+1} = max[ λ_i / L_down, 10^{-7} ];
   otherwise:  λ_{i+1} = min[ λ_i L_up, 10^{7} ];

2. λ_0 = λ_o max( diag[J^T W J] ); λ_o is user-specified.

       \left[ J^T W J + \lambda_i I \right] h = J^T W (y - \hat{y}(p)),

       \alpha = \frac{ \left( J^T W (y - \hat{y}(p)) \right)^T h }{ \left( \chi^2(p + h) - \chi^2(p) \right)/2 + 2 \left( J^T W (y - \hat{y}(p)) \right)^T h }\,;

   if ρ_i(αh) > ε_4:  update p, p = p + αh;  λ_{i+1} = max[ λ_i / (1 + α), 10^{-7} ];
   otherwise:  λ_{i+1} = λ_i + | χ²(p + αh) - χ²(p) | / (2α);


3. λ_0 = λ_o max( diag[J^T W J] ); λ_o is user-specified [5].

       \left[ J^T W J + \lambda_i I \right] h = J^T W (y - \hat{y}(p)),

   if ρ_i(h) > ε_4:  update p, p = p + h;  λ_{i+1} = λ_i max[ 1/3, 1 - (2ρ_i - 1)^3 ];  ν_i = 2;
   otherwise:  λ_{i+1} = λ_i ν_i;  ν_{i+1} = 2 ν_i;

For the examples in section 4.4, method 1 [4] with L_up ≈ 11 and L_down ≈ 9 has good convergence properties.
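As an illustration, Nielsen's update (method 3) reduces to a few lines of Matlab. This is a sketch only; rho is the step-quality metric ρ_i(h) of section 4.1, and nu is the additional state ν, assumed initialized to 2.

if rho > epsilon_4                                 % accept the step
    p = p + h;
    lambda = lambda * max( 1/3, 1-(2*rho-1)^3 );   % shrink lambda smoothly
    nu = 2;
else                                               % reject the step
    lambda = lambda * nu;                          % grow lambda ...
    nu = 2 * nu;                                   % ... at an increasing rate
end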

4.1.2 Computation and rank-1 update of the Jacobian, [∂ŷ/∂p]

In the first iteration, in every 2n iterations, and in iterations where χ²(p + h) > χ²(p), the Jacobian (J ∈ R^{m×n}) is numerically approximated using forward differences,

    J_{ij} = \frac{\partial \hat{y}_i}{\partial p_j} = \frac{\hat{y}(t_i; p + \delta p_j) - \hat{y}(t_i; p)}{\| \delta p_j \|},    (14)

or central differences (default),

    J_{ij} = \frac{\partial \hat{y}_i}{\partial p_j} = \frac{\hat{y}(t_i; p + \delta p_j) - \hat{y}(t_i; p - \delta p_j)}{2 \| \delta p_j \|},    (15)

where the j-th element of δp_j is the only non-zero element and is set to Δ_j (1 + |p_j|). In all other iterations, the Jacobian is updated using the Broyden rank-1 update formula,

    J = J + \frac{\left( \hat{y}(p + h) - \hat{y}(p) - J h \right) h^T}{h^T h}.    (16)

For problems with many parameters, a finite-differences Jacobian is computationally expensive. Convergence can be achieved with fewer function evaluations if the Jacobian is re-computed using finite differences only occasionally. The rank-1 Jacobian update of equation (16) requires no additional function evaluations.
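A sketch of both approximations follows. Delta (the fractional increment, e.g. 0.001), the model func and its extra argument c, and the saved quantities y_hat_old, y_hat_new, and h from the current step are assumptions for illustration.

% central-difference Jacobian, equation (15)
m = length(y_dat);  n = length(p);
J = zeros(m,n);
for j = 1:n
    dp    = zeros(n,1);
    dp(j) = Delta * ( 1 + abs(p(j)) );     % perturb only the j-th parameter
    J(:,j) = ( func(t,p+dp,c) - func(t,p-dp,c) ) / ( 2*dp(j) );
end

% Broyden rank-1 update, equation (16): no new function evaluations
J = J + ( (y_hat_new - y_hat_old - J*h) * h' ) / ( h'*h );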

    4.1.3 Convergence criteria

Convergence is achieved when one of the following three criteria is satisfied:

  • Convergence in the gradient: max |J^T W (y - ŷ)| < ε_1;

  • Convergence in parameters: max |h_i / p_i| < ε_2; or

  • Convergence in χ²: χ² / (m - n + 1) < ε_3.

Otherwise, iterations terminate when the iteration count exceeds a pre-specified limit.
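In code, the three tests can be combined into a single flag. This sketch assumes the residual r, the step h, the current X2, and the tolerances epsilon_1 through epsilon_3 of the opts vector are available:

converged = max(abs( J'*W*r )) < epsilon_1 || ...   % gradient criterion
            max(abs( h ./ p )) < epsilon_2 || ...   % parameter criterion
            X2/(m-n+1)         < epsilon_3;         % chi-squared criterion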


    4.2 Error Analysis

Once the optimal curve-fit parameters p_fit are determined, parameter statistics are computed for the converged solution using weight values w_i^2 equal to the mean square measurement error, σ̂_y²:

    w_i^2 = \hat{\sigma}_y^2 = \frac{1}{m - n + 1} \left( y - \hat{y}(p_{fit}) \right)^T \left( y - \hat{y}(p_{fit}) \right) \quad \forall\, i.    (17)

The parameter covariance matrix is then computed from

    V_p = \left[ J^T W J \right]^{-1},    (18)

and the asymptotic standard parameter errors are given by

    \sigma_p = \sqrt{ \mathrm{diag}\left( [J^T W J]^{-1} \right) }.    (19)

The asymptotic standard parameter error is a measure of how unexplained variability in the data propagates to variability in the parameters, and is essentially an error measure for the parameters. The standard error of the fit is given by

    \sigma_y = \sqrt{ \mathrm{diag}\left( J [J^T W J]^{-1} J^T \right) }.    (20)

The standard error of the fit indicates how variability in the parameters affects the variability in the curve-fit. The asymptotic standard prediction error reflects the standard error of the fit as well as the mean square measurement error:

    \sigma_{yp} = \sqrt{ \hat{\sigma}_y^2 + \mathrm{diag}\left( J [J^T W J]^{-1} J^T \right) }.    (21)
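These statistics follow from the converged Jacobian in a few lines. This is a sketch, assuming J and the fitted residuals r = y_dat - ŷ(p_fit) are available:

sigma2_y = (r'*r) / (m-n+1);                  % mean square measurement error, eq. (17)
W        = eye(m) / sigma2_y;                 % weights set to 1/sigma2_y
Vp       = inv(J'*W*J);                       % parameter covariance, eq. (18)
sigma_p  = sqrt( diag(Vp) );                  % standard parameter errors, eq. (19)
sigma_y  = sqrt( diag(J*Vp*J') );             % standard error of the fit, eq. (20)
sigma_yp = sqrt( sigma2_y + diag(J*Vp*J') );  % standard prediction error, eq. (21)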

4.3 Matlab code: lm.m

The Matlab function lm.m implements the Levenberg-Marquardt method for curve-fitting problems. The code and examples are available from:

http://www.duke.edu/~hpgavin/lm.m
http://www.duke.edu/~hpgavin/lm_examp.m
http://www.duke.edu/~hpgavin/lm_func.m
http://www.duke.edu/~hpgavin/lm_plots.m

function [p,X2,sigma_p,sigma_y,corr,R_sq,cvg_hst] = lm(func,p,t,y_dat,weight,dp,p_min,p_max,c,opts)
% [p,X2,sigma_p,sigma_y,corr,R_sq,cvg_hst] = lm(func,p,t,y_dat,weight,dp,p_min,p_max,c,opts)
%
% Levenberg-Marquardt curve-fitting: minimize sum of weighted squared residuals
% ----------  INPUT VARIABLES  ----------
% func   = function of n independent variables, t, and m parameters, p,
%          returning the simulated model: y_hat = func(t,p,c)
% p      = n-vector of initial guess of parameter values
% t      = m-vectors or matrix of independent variables (used as arg to func)
% y_dat  = m-vectors or matrix of data to be fit by func(t,p)
% weight = weighting vector for least squares fit ( weight >= 0 ) ...
%          inverse of the standard measurement errors
%          Default:  sqrt( d.o.f. / ( y_dat' * y_dat ) )
% dp     = fractional increment of p for numerical derivatives
%          dp(j) > 0 central differences calculated
%          dp(j) < 0 one-sided differences calculated
%          dp(j) = 0 holds p(j) fixed; i.e., the partials w.r.t. p(j) are zero
%          Default:  0.001
% p_min  = n-vector of lower bounds for parameter values
% p_max  = n-vector of upper bounds for parameter values
% c      = an optional vector of constants passed to func(t,p,c)
% opts   = vector of algorithmic parameters
%             parameter       defaults    meaning
% opts(1)  =  prnt                3       >1 intermediate results; >2 plots
% opts(2)  =  MaxIter         10*Npar     maximum number of iterations
% opts(3)  =  epsilon_1        1e-3       convergence tolerance for gradient
% opts(4)  =  epsilon_2        1e-3       convergence tolerance for parameters
% opts(5)  =  epsilon_3        1e-3       convergence tolerance for Chi-square
% opts(6)  =  epsilon_4        1e-2       determines acceptance of a L-M step
% opts(7)  =  lambda_0         1e-2       initial value of L-M parameter
% opts(8)  =  lambda_UP_fac     11        factor for increasing lambda
% opts(9)  =  lambda_DN_fac      9        factor for decreasing lambda
% opts(10) =  Update_Type        1        1: Levenberg-Marquardt lambda update
%                                         2: Quadratic update
%                                         3: Nielsen's lambda update equations
%
% ----------  OUTPUT VARIABLES  ----------
% p       = least-squares optimal estimate of the parameter values
% X2      = Chi-squared criteria
% sigma_p = asymptotic standard error of the parameters
% sigma_y = asymptotic standard error of the curve-fit
% corr    = correlation matrix of the parameters
% R_sq    = R-squared coefficient of multiple determination
% cvg_hst = convergence history

    The m-file to solve a least-squares curve-fit problem with lm.m can be as simple as:

my_data = load('my_data_file');  % load the data
t     = my_data(:,1);            % if the independent variable is in column 1
y_dat = my_data(:,2);            % if the dependent variable is in column 2

p_min  = [ -10  0.1   5  0.1 ];  % minimum expected parameter values
p_max  = [  10  5.0  15  0.5 ];  % maximum expected parameter values
p_init = [   3  2.0  10  0.2 ];  % initial guess for parameter values

[ p_fit, X2, sigma_p, sigma_y, corr, R_sq, cvg_hst ] = ...
    lm ( 'lm_func', p_init, t, y_dat, 1, -0.01, p_min, p_max )

    where the user-supplied function lm_func.m could be, for example,

function y_hat = lm_func(t,p,c)
y_hat = p(1) * t .* exp(-t/p(2)) .* cos( 2*pi*( p(3)*t - p(4) ) );


It is common and desirable to repeat the same experiment two or more times and to estimate a single set of curve-fit parameters from all the experiments. In such cases the data file may be arranged as follows:

% t-variable   y (1st experiment)   y (2nd experiment)   y (3rd experiment)
  0.50000        3.5986               3.60192              3.58293
  0.80000        8.1233               8.01231              8.16234
  0.90000       12.2342              12.29523             12.01823
     :              :                    :                    :
    etc.           etc.                 etc.                 etc.

If your data are arranged as above, you may prepare the data for lm.m using the following lines.

my_data = load('my_data_file');       % load the data
t_column  = 1;                        % column of the independent variable
y_columns = [ 2 3 4 ];                % columns of the measured dependent variables

y_dat = my_data(:,y_columns);         % the measured data
y_dat = y_dat(:);                     % a single column vector

t = my_data(:,t_column);              % the independent variable
t = t * ones(1,length(y_columns));    % a column of t for each column of y
t = t(:);                             % a single column vector

Note that the arguments t and y_dat to lm.m may be matrices as long as the dimensions of t match the dimensions of y_dat. The columns of t need not be identical. Results may be plotted with lm_plots.m:

function lm_plots ( n, t, y_dat, y_fit, sigma_y, cvg_hst )
% lm_plots(n,t,y_dat,y_fit,sigma_y,cvg_hst)
% Plot statistics of the results of a Levenberg-Marquardt least squares
% analysis with lm.m

y_dat = y_dat(:);
y_fit = y_fit(:);

figure(100); % plot convergence history of fit
clf
subplot(211)
plot( cvg_hst(:,1), cvg_hst(:,2:n+1), '-o', 'linewidth',4 );
ylabel('parameter values')

subplot(212)
semilogy( cvg_hst(:,1), [ cvg_hst(:,n+2) cvg_hst(:,n+3) ], '-o', 'linewidth',4 )
legend('\chi^2','\lambda');
xlabel('function calls'); ylabel('\chi^2 and \lambda')

figure(101); % plot data, fit, and confidence interval of fit
clf
subplot(211)
plot( t,y_dat,'o', t,y_fit,'-', ...
      t,y_fit+1.96*sigma_y,'-r', t,y_fit-1.96*sigma_y,'-r', 'linewidth',2 );
legend('y_{data}','y_{fit}','y_{fit}+1.96\sigma_y','y_{fit}-1.96\sigma_y');
ylabel('y(t)')

subplot(212)
semilogy( t,sigma_y,'-r', 'linewidth',4 );
xlabel('t'); ylabel('\sigma_y(t)')

figure(102); % plot histogram of residuals, are they Gaussian?
clf
hist( y_dat - y_fit )
title('histogram of residuals')
axis('tight'); xlabel('y_{data} - y_{fit}'); ylabel('count')


    4.4 Numerical Examples

The robustness of lm.m is tested in three numerical examples by curve-fitting simulated experimental measurements. Noisy experimental measurements y are simulated by adding random measurement noise to the curve-fit function evaluated with a set of true parameter values, ŷ(t; p_true). The random measurement noise is normally distributed with a mean of zero and a standard deviation of 0.20:

    y_i = \hat{y}(t_i; p_{true}) + N(0, 0.20).    (22)

The convergence of the parameters from an erroneous initial guess p_initial to values closer to p_true is then examined.

Each numerical example below has four parameters (n = 4) and one hundred measurements (m = 100). Each numerical example has a different curve-fit function ŷ(t; p), a different true parameter vector p_true, and a different vector of initial parameters p_initial.

For several values of p2 and p4, the χ² error criterion is calculated and plotted as a surface over the p2-p4 plane. The bowl-shaped nature of the objective function is clearly evident in each example. The objective function may not appear quadratic in the parameters, and the objective function may have multiple minima. The presence of measurement noise does not affect the smoothness of the objective function.

The gradient descent method endeavors to move parameter values in a down-hill direction to minimize χ²(p). This often requires small step sizes, but gradient descent is needed when the objective function is not quadratic. The Gauss-Newton method approximates the bowl shape as a quadratic and endeavors to move parameter values to the minimum in a small number of steps. This method works well when the parameters are close to their optimal values. The Levenberg-Marquardt method retains the best features of both the gradient-descent method and the Gauss-Newton method.

The evolution of the parameter values, the evolution of χ², and the evolution of λ from iteration to iteration are plotted for each example.

The simulated experimental data, the curve fit, and the 95-percent confidence interval of the fit are plotted; the standard error of the fit and a histogram of the fit errors are also plotted.

The initial parameter values p_initial, the true parameter values p_true, the fit parameter values p_fit, the standard error of the fit parameters σ_p, and the correlation matrix of the fit parameters are tabulated. The true parameter values lie within the confidence interval p_fit - 1.96σ_p < p_true < p_fit + 1.96σ_p with a confidence level of 95 percent.


    4.4.1 Example 1

    Consider fitting the following function to a set of measured data.

    \hat{y}(t; p) = p_1 \exp(-t/p_2) + p_3\, t \exp(-t/p_4)    (23)

    The m-function to be used with lm.m is simply:

function y_hat = lm_func(t,p,c)
y_hat = p(1)*exp(-t/p(2)) + p(3)*t.*exp(-t/p(4));
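A driver for this example, in the spirit of lm_examp.m, can simulate the noisy measurements of equation (22) with the p_true and p_initial values of Table 1 and then call lm.m. The time vector below is a hypothetical choice, and the call assumes lm.m supplies defaults for the remaining arguments.

t      = linspace(1,100,100)';       % m = 100 measurement points (hypothetical spacing)
p_true = [ 20.0 10.0 1.0 50.0 ]';    % true parameter values (Table 1)
p_init = [  5.0  2.0 0.2 10.0 ]';    % erroneous initial guess (Table 1)
y_dat  = lm_func(t,p_true,[]) + 0.20*randn(100,1);   % equation (22)

[ p_fit, X2, sigma_p, sigma_y, corr, R_sq, cvg_hst ] = ...
    lm ( 'lm_func', p_init, t, y_dat, 1, 0.001 );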

The true parameter values p_true, the initial parameter values p_initial, the resulting curve-fit parameter values p_fit, and the standard errors of the fit parameters σ_p are shown in Table 1. The R² fit criterion is 98 percent. The standard parameter errors are less than one percent of the parameter values, except for the standard error for p2, which is 1.5 percent of p2. The parameter correlation matrix is given in Table 2. Parameters p3 and p4 are the most correlated, at -96 percent. Parameters p1 and p4 are the least correlated, at -35 percent.

The bowl-shaped nature of the χ² objective function is shown in Figure 1(a). This shape is nearly quadratic and has a single minimum.

The convergence of the parameters and the evolution of χ² and λ are shown in Figure 1(b).

The data points, the curve fit, and the curve-fit confidence band are plotted in Figure 1(c). Note that the standard error of the fit is smaller near the center of the fit domain and larger at the edges of the domain.

A histogram of the difference between the data values and the curve-fit is shown in Figure 1(d). Ideally these curve-fit errors should be normally distributed.


Table 1. Parameter values and standard errors.

  p_initial   p_true    p_fit     σ_p     σ_p/p_fit (%)
     5.0       20.0     19.918    0.150       0.75
     2.0       10.0     10.159    0.152       1.50
     0.2        1.0      0.9958   0.005       0.57
    10.0       50.0     50.136    0.209       0.41

Table 2. Parameter correlation matrix.

         p1      p2      p3      p4
  p1    1.00   -0.74    0.40   -0.35
  p2   -0.74    1.00   -0.77    0.71
  p3    0.40   -0.77    1.00   -0.96
  p4   -0.35    0.71   -0.96    1.00

Figure 1. (a) The sum of the squared errors as a function of p2 and p4. (b) Top: the convergence of the parameters with each iteration; Bottom: values of χ² and λ at each iteration. (c) Top: data y, curve-fit ŷ(t; p_fit), curve-fit + error, and curve-fit - error; Bottom: standard error of the fit, σ_y(t). (d) Histogram of the errors between the data and the fit.


    4.4.2 Example 2

    Consider fitting the following function to a set of measured data.

    \hat{y}(t; p) = p_1 (t/\max(t)) + p_2 (t/\max(t))^2 + p_3 (t/\max(t))^3 + p_4 (t/\max(t))^4    (24)

This function is linear in the parameters and may be fit using methods of linear least squares. The m-function to be used with lm.m is simply:

function y_hat = lm_func(t,p,c)
mt = max(t);
y_hat = p(1)*(t/mt) + p(2)*(t/mt).^2 + p(3)*(t/mt).^3 + p(4)*(t/mt).^4;
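Since equation (24) is linear in p, the same parameters can also be recovered without iteration by ordinary linear least squares. This is a sketch for comparison, assuming t and y_dat as before:

s = t / max(t);                   % normalized independent variable
A = [ s  s.^2  s.^3  s.^4 ];      % design matrix: columns multiply p(1)..p(4)
p_lin = A \ y_dat;                % linear least-squares solution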

The true parameter values p_true, the initial parameter values p_initial, the resulting curve-fit parameter values p_fit, and the standard errors of the fit parameters σ_p are shown in Table 3. The R² fit criterion is 99.9 percent. In this example, the standard parameter errors are larger than in example 1. The standard error for p2 is 12 percent and the standard error for p3 is 17 percent. Note that a very high value of the R² coefficient of determination does not necessarily mean that parameter values have been found with great accuracy. The parameter correlation matrix is given in Table 4. These parameters are highly correlated with one another, meaning that a change in one parameter will almost certainly result in changes in the other parameters.

The bowl-shaped nature of the χ² objective function is shown in Figure 2(a). This shape is nearly quadratic and has a single minimum. The correlation of parameters p2 and p4, for example, is easily seen from this figure.

The convergence of the parameters and the evolution of χ² and λ are shown in Figure 2(b). The parameters converge monotonically to their final values.

The data points, the curve fit, and the curve-fit confidence band are plotted in Figure 2(c). Note that the standard error of the fit approaches zero at t = 0 and is largest at t = 100. This is because ŷ(0; p) = 0, regardless of the values in p.

A histogram of the difference between the data values and the curve-fit is shown in Figure 2(d). Ideally these curve-fit errors should be normally distributed, and they appear to be so in this example.


Table 3. Parameter values and standard errors.

  p_initial   p_true    p_fit     σ_p     σ_p/p_fit (%)
     4.0       20.0     19.934    0.506       2.54
    -5.0      -24.0    -22.959    2.733      11.90
     6.0       30.0     27.358    4.597      16.80
    10.0      -40.0    -38.237    2.420       6.33

Table 4. Parameter correlation matrix.

         p1      p2      p3      p4
  p1    1.00   -0.97    0.92   -0.87
  p2   -0.97    1.00   -0.99    0.95
  p3    0.92   -0.99    1.00   -0.99
  p4   -0.87    0.95   -0.99    1.00

Figure 2. (a) The sum of the squared errors as a function of p2 and p4. (b) Top: the convergence of the parameters with each iteration; Bottom: values of χ² and λ at each iteration. (c) Top: data y, curve-fit ŷ(t; p_fit), curve-fit + error, and curve-fit - error; Bottom: standard error of the fit, σ_y(t). (d) Histogram of the errors between the data and the fit.


    4.4.3 Example 3

    Consider fitting the following function to a set of measured data.

    \hat{y}(t; p) = p_1 \exp(-t/p_2) + p_3 \sin(t/p_4)    (25)

This function is nonlinear in the parameters p2 and p4 and cannot be fit using methods of linear least squares. The m-function to be used with lm.m is simply:

function y_hat = lm_func(t,p,c)
y_hat = p(1)*exp(-t/p(2)) + p(3)*sin(t/p(4));

The true parameter values p_true, the initial parameter values p_initial, the resulting curve-fit parameter values p_fit, and the standard errors of the fit parameters σ_p are shown in Table 5. The R² fit criterion is 99.8 percent. In this example, the standard parameter errors are all less than one percent. The parameter correlation matrix is given in Table 6. Parameter p4 is not correlated with the other parameters. Parameters p1 and p2 are the most correlated, at 73 percent.

The bowl-shaped nature of the χ² objective function is shown in Figure 3(a). This shape is clearly not quadratic and has multiple minima. In this example, the initial guess for parameter p4, the period of the oscillatory component, has to be within ten percent of the true value; otherwise the algorithm in lm.m will converge to a very small value of the amplitude of oscillation p3 and an erroneous value for p4. When such an occurrence arises, the standard errors σ_p of the fit parameters p3 and p4 are quite large and the histogram of curve-fit errors (Figure 3(d)) is not normally distributed.

The convergence of the parameters and the evolution of χ² and λ are shown in Figure 3(b). The parameters converge monotonically to their final values.

The data points, the curve fit, and the curve-fit confidence band are plotted in Figure 3(c).

A histogram of the difference between the data values and the curve-fit is shown in Figure 3(d). Ideally these curve-fit errors should be normally distributed, and they appear to be so in this example.


Table 5. Parameter values and standard errors.

  p_initial   p_true    p_fit     σ_p     σ_p/p_fit (%)
    10.0        6.0      5.987    0.032       0.53
    50.0       20.0     20.100    0.144       0.72
     6.0        1.0      0.978    0.010       0.98
     5.6        5.0      4.999    0.004       0.08

Table 6. Parameter correlation matrix.

         p1      p2      p3      p4
  p1    1.00   -0.74   -0.28   -0.02
  p2   -0.74    1.00    0.18    0.02
  p3   -0.28    0.18    1.00   -0.02
  p4   -0.02    0.02   -0.02    1.00

Figure 3. (a) The sum of the squared errors as a function of p2 and p4. (b) Top: the convergence of the parameters with each iteration; Bottom: values of χ² and λ at each iteration. (c) Top: data y, curve-fit ŷ(t; p_fit), curve-fit + error, and curve-fit - error; Bottom: standard error of the fit, σ_y(t). (d) Histogram of the errors between the data and the fit.


    4.5 Fitting in Multiple Dimensions

The code lm.m can carry out fitting in multiple dimensions. For example, the function

    z(x, y) = \left( p_1 x^{p_2} + (1 - p_1) y^{p_2} \right)^{1/p_2}

may be fit to data points z_i(x_i, y_i), (i = 1, ..., m), using lm.m with an m-file such as:

my_data = load('my_data_file');  % load the data
x_dat = my_data(:,1);            % if the independent variable x is in column 1
y_dat = my_data(:,2);            % if the independent variable y is in column 2
z_dat = my_data(:,3);            % if the dependent variable z is in column 3

p_min  = [ 0.1 0.1 ];            % minimum expected parameter values
p_max  = [ 0.9 2.0 ];            % maximum expected parameter values
p_init = [ 0.5 1.0 ];            % initial guess for parameter values

t = [ x_dat y_dat ];             % x and y are column vectors of independent variables

[ p_fit, Chi_sq, sigma_p, sigma_y, corr, R2, cvg_hst ] = ...
    lm ( 'lm_func2d', p_init, t, z_dat, weight, 0.01, p_min, p_max );

    with the m-function lm_func2d.m

function z_hat = lm_func2d(t,p)
% example function used for nonlinear least squares curve-fitting
% to demonstrate the Levenberg-Marquardt function, lm.m,
% in two fitting dimensions

x_dat = t(:,1);
y_dat = t(:,2);
z_hat = ( p(1)*x_dat.^p(2) + (1-p(1))*y_dat.^p(2) ).^(1/p(2));
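To exercise the two-dimensional fit without a data file, one can simulate z_dat on a grid. The grid of (x, y) points, the value of p_true, and the noise level in this sketch are hypothetical illustrations chosen to lie within the bounds above.

[xg,yg] = meshgrid( 0.1:0.1:1.0 );   % hypothetical grid of (x,y) points
x_dat = xg(:);  y_dat = yg(:);
p_true = [ 0.7 1.4 ];                % hypothetical true parameters
z_dat  = lm_func2d([x_dat y_dat],p_true) + 0.05*randn(size(x_dat));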


    5 Remarks

This text and lm.m were written in an attempt to understand and explain methods of nonlinear least squares for curve-fitting applications.

It is important to remember that the purpose of applying statistical methods for data analysis is primarily to estimate parameter statistics ... not the parameter values themselves. Reasonable parameter values can often be found using non-statistical minimization algorithms, such as random search methods, the Nelder-Mead simplex method, or simply gridding the parameter space and finding the best combination of parameter values.

Nonlinear least squares problems can have objective functions with multiple local minima. Fitting algorithms will converge to different local minima depending upon values of the initial guess, the measurement noise, and algorithmic parameters. It is perfectly appropriate and good to use the best available estimate of the desired parameters as the initial guess. In the absence of physical insight into a curve-fitting problem, a reasonable initial guess may be found by coarsely gridding the parameter space and finding the best combination of parameter values. There is no sense in forcing any statistical curve-fitting algorithm to work too hard by starting it with a poor initial guess. In most applications, parameters identified from neighboring initial guesses (±5%) should converge to similar parameter estimates (±0.1%). The fit statistics of these converged parameters should be the same, within two or three significant figures.

    References

[1] K. Levenberg. A Method for the Solution of Certain Non-Linear Problems in Least Squares. The Quarterly of Applied Mathematics, 2: 164-168, 1944.

[2] M.I.A. Lourakis. A brief description of the Levenberg-Marquardt algorithm implemented by levmar. Technical Report, Institute of Computer Science, Foundation for Research and Technology - Hellas, 2005.

[3] K. Madsen, H.B. Nielsen, and O. Tingleff. Methods for Nonlinear Least Squares Problems. Technical Report, Informatics and Mathematical Modeling, Technical University of Denmark, 2004.

[4] D.W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2): 431-441, 1963.

[5] H.B. Nielsen. Damping Parameter in Marquardt's Method. Technical Report IMM-REP-1999-05, Dept. of Mathematical Modeling, Technical University of Denmark, 1999.

[6] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C. Cambridge University Press, second edition, 1992.

[7] M.K. Transtrum, B.B. Machta, and J.P. Sethna. Why are nonlinear fits to data so challenging? Physical Review Letters, 104: 060201, 2010.

[8] M.K. Transtrum and J.P. Sethna. Improvements to the Levenberg-Marquardt algorithm for nonlinear least-squares minimization. Preprint submitted to Journal of Computational Physics, January 30, 2012.
