This webinar provides an overview of the tools that SciPy and NumPy provide for regression analysis, including linear and non-linear least squares, and takes a brief look at handling other error metrics. We also demonstrate simple GUI tools that can make some problems easier, and give a quick overview of the new scikits package statsmodels, whose API is maturing in a separate package but which should be incorporated into SciPy in the future.
Curve-fitting (regression) with Python
September 18, 2009
Enthought Consulting
Enthought Training Courses
Python Basics, NumPy, SciPy, Matplotlib, Traits, TraitsUI, Chaco…
Enthought Python Distribution (EPD)
http://www.enthought.com/products/epd.php
Data Model

    y = mx + b
    m = 4.316, b = 2.763

    y = a / (b + c e^(-dx))
    a = 7.06, b = 2.52, c = 26.14, d = -5.57
Curve Fitting or Regression?
Carl Gauss, Adrien-Marie Legendre, Francis Galton, R.A. Fisher
or (my preferred) ... Bayesian Inference
Laplace, Bayes, Richard T. Cox, Harold Jeffreys, Edwin T. Jaynes
    p(X|Y) = p(Y|X) p(X) / p(Y)
           = p(Y|X) p(X) / ∫ p(Y|X) p(X) dX

where Y is the data, X the unknowns, p(Y|X) the model, p(X) the prior, and p(X|Y) the inference (posterior).
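The rule above can be exercised numerically. The following sketch is not from the slides: the coin-flip data and the parameter grid are invented for illustration, and the integral in the denominator is approximated by a sum over the grid.

```python
import numpy as np

# Grid approximation of Bayes' rule for the bias theta of a coin
# (data and grid invented for illustration).
theta = np.linspace(0.0, 1.0, 1001)          # the unknowns X
prior = np.ones_like(theta)                  # flat prior p(X)
heads, tosses = 7, 10                        # the data Y
likelihood = theta**heads * (1 - theta)**(tosses - heads)   # p(Y|X)

# p(X|Y) = p(Y|X) p(X) / normalization
# (the grid sum stands in for the integral over X)
posterior = likelihood * prior
posterior /= posterior.sum()

map_estimate = theta[np.argmax(posterior)]   # mode of the posterior
```

With a flat prior the posterior mode coincides with the maximum-likelihood estimate, heads/tosses = 0.7.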
More pedagogy

Curve Fitting:
- understated statistical model
- just want the "best" fit to the data

Regression / Parameter Estimation / Bayesian Inference:
- the statistical model is more important
- post-estimation analysis of error and fit

Machine Learning
Pragmatic look at the methods
• Because the concept is really at the heart of science, many practical methods have been developed.
• SciPy contains the building blocks to implement basically any method.
• SciPy should get high-level interfaces to all the methods in common use.
Methods vary in...
• The model used:
  – parametric (specific model)
  – non-parametric (many unknowns)
• The way error is modeled:
  – few assumptions (e.g. zero-mean, homoscedastic)
  – full probabilistic model
• What "best fit" means (i.e. the distance between the predicted and the measured):
  – traditional least-squares
  – robust methods (e.g. absolute difference)
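The last distinction is easy to demonstrate. This is a minimal sketch, not from the slides: the line, noise, and outlier are made up, and `scipy.optimize.minimize` is used to fit under both the least-squares and absolute-difference criteria.

```python
import numpy as np
from scipy.optimize import minimize

# Line fit under two "distances": sum of squared errors vs. sum of
# absolute differences. Data and the outlier are invented.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(x.size)
y[-1] += 20.0                                  # one gross outlier

def sse(p):   # traditional least squares
    return np.sum((y - (p[0] * x + p[1]))**2)

def sad(p):   # robust: absolute difference
    return np.sum(np.abs(y - (p[0] * x + p[1])))

slope_l2 = minimize(sse, [1.0, 0.0]).x[0]
slope_l1 = minimize(sad, [1.0, 0.0], method="Nelder-Mead").x[0]
# slope_l2 is dragged upward by the outlier; slope_l1 stays near 2.
```

Nelder-Mead is used for the absolute-difference objective because it does not require the (non-smooth) objective to be differentiable.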
    y = f(x; θ)          (general parametric model)

    y = Σᵢ θᵢ φᵢ(x)      (model linear in the parameters θ)

    y = ŷ + ε            (measurement = model prediction + noise)
Parametric Least Squares
    y = [y₀, y₁, ..., y_{N−1}]
    x = [x₀, x₁, ..., x_{N−1}]
    θ = [θ₀, θ₁, ..., θ_{K−1}],   K < N

    y = f(x; θ) + ε

    θ̂ = argmin_θ J(y, x, θ)

    θ̂ = argmin_θ (y − f(x; θ))ᵀ W (y − f(x; θ))
Linear Least Squares
    y = H(x) θ + ε

    θ̂ = (H(x)ᵀ W H(x))⁻¹ H(x)ᵀ W y
Quadratic Example:

    yᵢ = a xᵢ² + b xᵢ + c

In matrix form, y = H(x) θ + ε with θ = [a, b, c]ᵀ and

    H(x) = [ x₀²        x₀        1
             x₁²        x₁        1
             ⋮          ⋮         ⋮
             x_{N−1}²   x_{N−1}   1 ]
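In current NumPy this closed form is usually reached through a least-squares solver rather than an explicit matrix inverse. A sketch of the quadratic example (the data are made up; `numpy.linalg.lstsq` solves min ‖y − Hθ‖², i.e. the slide's formula with W = I):

```python
import numpy as np

# Quadratic example as *linear* least squares: build H(x), then solve.
rng = np.random.default_rng(1)
x = np.linspace(-2.0, 2.0, 50)
y = 3.0 * x**2 - 1.0 * x + 0.5 + 0.1 * rng.standard_normal(x.size)

# Each row of H(x) is [x_i**2, x_i, 1], exactly as in the slide.
H = np.column_stack([x**2, x, np.ones_like(x)])

# Solves min ||y - H theta||^2 (equivalent to (H^T H)^-1 H^T y, W = I).
theta, *_ = np.linalg.lstsq(H, y, rcond=None)
a, b, c = theta
```

The recovered (a, b, c) should be close to the generating values (3, −1, 0.5) up to the injected noise.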
Non-linear least squares
    θ̂ = argmin_θ J(y, x, θ)

    θ̂ = argmin_θ (y − f(x; θ))ᵀ W (y − f(x; θ))
Logistic Example — an optimization problem!

    yᵢ = a / (b + c e^(−d xᵢ))
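For a model like this there is no closed form, so the fit is done iteratively. A sketch using `scipy.optimize.curve_fit` (which the deck introduces below) with invented parameter values; note that a, b, c in this model are only identifiable up to a common scale factor, so the check compares fitted and true curves rather than individual parameters.

```python
import numpy as np
from scipy.optimize import curve_fit

# The slide's logistic-style model; parameter values here are invented.
def model(x, a, b, c, d):
    return a / (b + c * np.exp(-d * x))

rng = np.random.default_rng(2)
x = np.linspace(0.0, 4.0, 60)
true = (7.0, 2.5, 26.0, 5.5)
y = model(x, *true) + 0.02 * rng.standard_normal(x.size)

# Non-linear least squares needs a starting guess.
p0 = [5.0, 1.0, 10.0, 3.0]
params, pcov = curve_fit(model, x, y, p0=p0, maxfev=10000)

fitted = model(x, *params)   # should track the true curve closely
```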
Tools in NumPy / SciPy
• polyfit (linear least squares)
• curve_fit (non-linear least squares)
• poly1d (polynomial object)
• numpy.random (random number generators)
• scipy.stats (distribution objects)
• scipy.optimize (unconstrained and constrained optimization)
Polynomials
• p = poly1d(<coefficient array>)
• p.roots (p.r) are the roots
• p.coefficients (p.c) are the coefficients
• p.order is the order
• p[n] is the coefficient of x**n
• p(val) evaluates the polynomial at val
• p.integ() integrates the polynomial
• p.deriv() differentiates the polynomial
• Basic numeric operations (+,-,/,*) work
• Acts like p.c when used as an array
• Fancy printing
>>> p = poly1d([1,-2,4])
>>> print p
   2
x - 2 x + 4

>>> g = p**3 + p*(3-2*p)
>>> print g
 6     5      4      3      2
x - 6 x + 22 x - 48 x + 75 x - 70 x + 44

>>> print g.deriv(m=2)
    4       3       2
30 x - 120 x + 264 x - 288 x + 150

>>> print p.integ(m=2,k=[2,1])
         4          3     2
0.08333 x - 0.3333 x + 2 x + 2 x + 1

>>> print p.roots
[ 1.+1.7321j  1.-1.7321j]

>>> print p.coeffs
[ 1 -2  4]
Statistics: scipy.stats — CONTINUOUS DISTRIBUTIONS

over 80 continuous distributions!

METHODS: rvs, cdf, ppf, sf, isf, stats, fit, entropy, nnlf, moment, freeze
Using stats objects
>>> from scipy.stats import norm
# Sample normal dist. 100 times.
>>> samp = norm.rvs(size=100)

>>> x = linspace(-5, 5, 100)
# Calculate probability density.
>>> pdf = norm.pdf(x)
# Calculate cumulative dist.
>>> cdf = norm.cdf(x)
# Calculate percent point function (inverse cdf; takes probabilities in [0, 1]).
>>> ppf = norm.ppf(linspace(0.01, 0.99, 100))
Setting location and scale

>>> from scipy.stats import norm
# Normal dist with mean=10 and std=2
>>> dist = norm(loc=10, scale=2)

>>> x = linspace(-5, 15, 100)
# Calculate probability density.
>>> pdf = dist.pdf(x)
# Calculate cumulative dist.
>>> cdf = dist.cdf(x)

# Get 100 random samples from dist.
>>> samp = dist.rvs(size=100)

# Estimate parameters from data
>>> mu, sigma = norm.fit(samp)
>>> print "%4.2f, %4.2f" % (mu, sigma)
10.07, 1.95

NORMAL DISTRIBUTION

.fit returns the best shape parameters plus (loc, scale) that explain the data
Fitting Polynomials (NumPy)
>>> from numpy import polyfit, poly1d
>>> from scipy.stats import norm
# Create clean data.
>>> x = linspace(0, 4.0, 100)
>>> y = 1.5 * exp(-0.2 * x) + 0.3
# Add a bit of noise.
>>> noise = 0.1 * norm.rvs(size=100)
>>> noisy_y = y + noise

# Fit noisy data with a linear model.
>>> linear_coef = polyfit(x, noisy_y, 1)
>>> linear_poly = poly1d(linear_coef)
>>> linear_y = linear_poly(x)

# Fit noisy data with a quadratic model.
>>> quad_coef = polyfit(x, noisy_y, 2)
>>> quad_poly = poly1d(quad_coef)
>>> quad_y = quad_poly(x)

POLYFIT(X, Y, DEGREE)
Optimization
scipy.optimize — Unconstrained Minimization and Root Finding

Unconstrained Optimization
• fmin (Nelder-Mead simplex)
• fmin_powell (Powell's method)
• fmin_bfgs (BFGS quasi-Newton method)
• fmin_ncg (Newton conjugate gradient)
• leastsq (Levenberg-Marquardt)
• anneal (simulated annealing global minimizer)
• brute (brute force global minimizer)
• brent (excellent 1-D minimizer)
• golden
• bracket

Constrained Optimization
• fmin_l_bfgs_b
• fmin_tnc (truncated Newton code)
• fmin_cobyla (constrained optimization by linear approximation)
• fminbound (interval-constrained 1-D minimizer)

Root Finding
• fsolve (using MINPACK)
• brentq
• brenth
• ridder
• newton
• bisect
• fixed_point (fixed point equation solver)
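A quick taste of two routines from the lists above, `fmin` and `brentq` (both still available in current SciPy). The Rosenbrock function and the cosine root are standard textbook illustrations, not examples from the slides.

```python
import numpy as np
from scipy.optimize import fmin, brentq

# Nelder-Mead simplex minimization of the Rosenbrock function,
# whose global minimum is at (1, 1).
def rosen(p):
    x, y = p
    return (1 - x)**2 + 100 * (y - x**2)**2

xmin = fmin(rosen, [0.0, 0.0], disp=False)   # approximately [1, 1]

# Bracketing root finder: cos(x) changes sign on [1, 2].
root = brentq(np.cos, 1.0, 2.0)              # approximately pi/2
```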
Optimization: Data Fitting
>>> from scipy.optimize import curve_fit
# Define the function to fit.
>>> def function(x, a, b, f, phi):
...     result = a * exp(-b * sin(f * x + phi))
...     return result

# Create a noisy data set.
>>> actual_params = [3, 2, 1, pi/4]
>>> x = linspace(0, 2*pi, 25)
>>> exact = function(x, *actual_params)
>>> noisy = exact + 0.3 * randn(len(x))

# Use curve_fit to estimate the function parameters from the noisy data.
>>> initial_guess = [1, 1, 1, 1]
>>> estimated_params, err_est = curve_fit(function, x, noisy, p0=initial_guess)
>>> estimated_params
array([3.1705, 1.9501, 1.0206, 0.7034])

# err_est is an estimate of the covariance matrix of the estimates
# (i.e. how good a fit it is)
NONLINEAR LEAST SQUARES CURVE FITTING
StatsModels
• Josef Perktold (Canada)
• Skipper Seabold (PhD student, American University, Washington, D.C.)
Economists
GUI example: astropysics (with TraitsUI)
Erik J. Tollerud (PhD student, UC Irvine, Center for Cosmology, Irvine, CA)
http://www.physics.uci.edu/~etolleru/
Scientific Python Classes
Sept 21-25: Austin
Oct 19-22: Silicon Valley
Nov 9-12: Chicago
Dec 7-11: Austin
http://www.enthought.com/training