Curve-fitting (regression) with Python September 18, 2009

Scientific Computing with Python Webinar 9/18/2009: Curve Fitting


DESCRIPTION

This webinar provides an overview of the tools that SciPy and NumPy provide for regression analysis, including linear and non-linear least squares, and a brief look at handling other error metrics. We will also demonstrate simple GUI tools that can make some problems easier, and give a quick overview of the new scikits package statsmodels, whose API is maturing in a separate package but which should be incorporated into SciPy in the future.


Page 1: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

Curve-fitting (regression) with Python

September 18, 2009

Page 2: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

Enthought Consulting

Page 3: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

Enthought Training Courses

Python Basics, NumPy, SciPy, Matplotlib, Traits, TraitsUI, Chaco…

Page 4: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

Enthought Python Distribution (EPD)
http://www.enthought.com/products/epd.php

Page 5: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

y = mx + b
m = 4.316, b = 2.763

y = a / (b + c·e^(-dx))
a = 7.06, b = 2.52, c = 26.14, d = -5.57

(plot legend: Data, Model)
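The logistic-style model above can be evaluated directly; a minimal sketch, assuming the fitted parameter values shown on the slide:

```python
import numpy as np

# Evaluate the slide's data model y = a / (b + c * exp(-d * x)).
# The parameter values used below are the fitted values from the slide.
def logistic_model(x, a, b, c, d):
    """Four-parameter logistic-style curve."""
    return a / (b + c * np.exp(-d * x))

x = np.linspace(0.0, 1.0, 5)
y = logistic_model(x, 7.06, 2.52, 26.14, -5.57)
```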

Page 6: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

Curve Fitting or Regression?

Carl Gauss

Adrien-Marie Legendre, Francis Galton

R.A. Fisher

Page 7: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

or (my preferred) ... Bayesian Inference

Laplace, Bayes

Richard T. Cox, Edwin T. Jaynes, Harold Jeffreys

p(X|Y) = p(Y|X) p(X) / p(Y)
       = p(Y|X) p(X) / ∫ p(Y|X) p(X) dX

where Y is the data, X the unknowns, p(Y|X) the model (likelihood), p(X) the prior, and p(X|Y) the inference (posterior).
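A minimal numeric sketch of this update, assuming an illustrative Gaussian likelihood and a flat prior on a grid (none of these choices come from the slides):

```python
import numpy as np

# Posterior = likelihood * prior, normalized so it integrates to one
# over the unknown X.  The likelihood and prior here are illustrative.
X = np.linspace(-5, 5, 1001)
dx = X[1] - X[0]
prior = np.ones_like(X) / 10.0              # flat prior p(X) on [-5, 5]
y_obs = 1.2                                 # one observed data point Y
likelihood = np.exp(-0.5 * (y_obs - X)**2) / np.sqrt(2 * np.pi)  # p(Y|X)

evidence = np.sum(likelihood * prior) * dx  # p(Y) = integral of p(Y|X) p(X) dX
posterior = likelihood * prior / evidence   # p(X|Y)
```

By construction the posterior integrates to one, and here it peaks at the observed value.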

Page 8: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

More pedagogy

Curve Fitting / Parameter Estimation: understated statistical model; just want the "best" fit to the data.

Regression / Bayesian Inference: the statistical model is more important; post-estimation analysis of error and fit.

Machine Learning

Page 9: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

Pragmatic look at the methods

• Because the concept is really at the heart of science, many practical methods have been developed.

• SciPy contains the building blocks to implement basically any method.

• SciPy should get high-level interfaces to all the methods in common use.

Page 10: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

Methods vary in...

• The model used:
  – parametric (specific model): y = f(x; θ)
  – non-parametric (many unknowns): y = Σᵢ θᵢ φᵢ(x)

• The way error is modeled: y = ŷ + ε
  – few assumptions (e.g. zero-mean, homoscedastic)
  – full probabilistic model

• What "best fit" means (i.e. what is the distance between the predicted and the measured values):
  – traditional least-squares
  – robust methods (e.g. absolute difference)
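The non-parametric form y = Σᵢ θᵢ φᵢ(x) becomes a linear least-squares problem once the basis functions φᵢ are fixed. A sketch with an illustrative Gaussian-bump basis (the basis choice, data, and widths are assumptions, not from the slides):

```python
import numpy as np

# Fit y = sum_i theta_i * phi_i(x) with Gaussian-bump basis functions.
rng = np.random.RandomState(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.randn(50)   # noisy samples of a curve

centers = np.linspace(0.0, 1.0, 8)                # bump locations
width = 0.1
Phi = np.exp(-((x[:, None] - centers[None, :])**2) / (2 * width**2))

# Least-squares solve for the coefficients theta in y ~ Phi @ theta.
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ theta
```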

Page 11: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

Parametric Least Squares

y = [y₀, y₁, ..., y_{N-1}]
x = [x₀, x₁, ..., x_{N-1}]
θ = [θ₀, θ₁, ..., θ_{K-1}],  K < N

y = f(x; θ) + ε

θ̂ = argmin_θ J(y, x, θ)
  = argmin_θ (y - f(x; θ))ᵀ W (y - f(x; θ))

Page 12: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

Linear Least Squares

y = H(x) θ + ε

θ̂ = (H(x)ᵀ W H(x))⁻¹ H(x)ᵀ W y

Quadratic Example:

yᵢ = a xᵢ² + b xᵢ + c

Here H(x) is the N×3 matrix whose i-th row is [xᵢ², xᵢ, 1], and θ = [a, b, c]ᵀ, so y = H(x) θ + ε.
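The closed-form estimate above can be written out directly with NumPy; a sketch for the quadratic example, with synthetic data and identity weights (both assumptions for illustration):

```python
import numpy as np

# theta_hat = (H(x)^T W H(x))^{-1} H(x)^T W y for the quadratic model
# y_i = a*x_i**2 + b*x_i + c.  Data and weights are illustrative.
rng = np.random.RandomState(1)
x = np.linspace(-2.0, 2.0, 40)
y = 3.0 * x**2 - 1.0 * x + 0.5 + 0.2 * rng.randn(40)

H = np.column_stack([x**2, x, np.ones_like(x)])   # rows [x_i^2, x_i, 1]
W = np.eye(len(x))                                # identity weights (ordinary LS)

# Solve the normal equations rather than forming an explicit inverse.
theta = np.linalg.solve(H.T @ W @ H, H.T @ W @ y)  # estimates of [a, b, c]
```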

Page 13: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

Non-linear least squares

θ̂ = argmin_θ J(y, x, θ)
  = argmin_θ (y - f(x; θ))ᵀ W (y - f(x; θ))

Logistic Example:

yᵢ = a / (b + c e^(-d xᵢ))

Optimization Problem!!
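This optimization problem can be handed to leastsq, the Levenberg-Marquardt routine listed on the tools slide. A sketch for the logistic example; the true parameters, noise level, and starting guess are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import leastsq

# Non-linear least squares for y_i = a / (b + c * exp(-d * x_i)).
def model(theta, x):
    a, b, c, d = theta
    return a / (b + c * np.exp(-d * x))

def residuals(theta, x, y):
    return y - model(theta, x)

rng = np.random.RandomState(2)
x = np.linspace(0.0, 5.0, 60)
true_theta = [7.0, 2.5, 26.0, 1.5]
y = model(true_theta, x) + 0.02 * rng.randn(60)

theta0 = [5.0, 2.0, 20.0, 1.0]                  # starting guess
theta_hat, ier = leastsq(residuals, theta0, args=(x, y))
```

Note that a, b, and c are only identified up to a common scale (multiplying all three by the same constant leaves the curve unchanged), so it is the fitted curve, not the individual parameter values, that is well determined.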

Page 14: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

Tools in NumPy / SciPy

• polyfit (linear least-squares)
• curve_fit (non-linear least-squares)
• poly1d (polynomial object)
• numpy.random (random number generators)
• scipy.stats (distribution objects)
• scipy.optimize (unconstrained and constrained optimization)

Page 15: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting


Polynomials

• p = poly1d(<coefficient array>)

• p.roots (p.r) are the roots

• p.coefficients (p.c) are the coefficients

• p.order is the order

• p[n] is the coefficient of xn

• p(val) evaluates the polynomial at val

• p.integ() integrates the polynomial

• p.deriv() differentiates the polynomial

• Basic numeric operations (+,-,/,*) work

• Acts like p.c when used as an array

• Fancy printing

>>> p = poly1d([1, -2, 4])
>>> print p
   2
1 x - 2 x + 4

>>> g = p**3 + p*(3 - 2*p)
>>> print g
   6     5      4      3      2
1 x - 6 x + 25 x - 51 x + 81 x - 58 x + 44

>>> print g.deriv(m=2)
    4       3       2
30 x - 120 x + 300 x - 306 x + 162

>>> print p.integ(m=2, k=[2, 1])
         4          3     2
0.08333 x - 0.3333 x + 2 x + 2 x + 1

>>> print p.roots
[ 1.+1.7321j  1.-1.7321j]

>>> print p.coeffs
[ 1 -2  4]

Page 16: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting


Statistics
scipy.stats — CONTINUOUS DISTRIBUTIONS

over 80 continuous distributions!

METHODS: pdf, cdf, rvs, ppf, stats, fit, sf, isf, entropy, nnlf, moment, freeze

Page 17: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting


Using stats objects

>>> from scipy.stats import norm

# Sample the normal distribution 100 times.
>>> samp = norm.rvs(size=100)

>>> x = linspace(-5, 5, 100)
# Calculate the probability density function.
>>> pdf = norm.pdf(x)
# Calculate the cumulative distribution function.
>>> cdf = norm.cdf(x)
# Calculate the percent point function (inverse CDF);
# its argument must lie in [0, 1].
>>> ppf = norm.ppf(linspace(0, 1, 100))

DISTRIBUTIONS

Page 18: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting


Setting location and Scale

>>> from scipy.stats import norm
# Normal dist with mean=10 and std=2.
>>> dist = norm(loc=10, scale=2)

>>> x = linspace(-5, 15, 100)
# Calculate the probability density function.
>>> pdf = dist.pdf(x)
# Calculate the cumulative distribution function.
>>> cdf = dist.cdf(x)

# Get 100 random samples from dist.
>>> samp = dist.rvs(size=100)

# Estimate parameters from the data.
>>> mu, sigma = norm.fit(samp)
>>> print "%4.2f, %4.2f" % (mu, sigma)
10.07, 1.95

NORMAL DISTRIBUTION

.fit returns the best shape parameters, plus (loc, scale), that explain the data

Page 19: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting


Fitting Polynomials (NumPy)

>>> from numpy import polyfit, poly1d
>>> from scipy.stats import norm

# Create clean data.
>>> x = linspace(0, 4.0, 100)
>>> y = 1.5 * exp(-0.2 * x) + 0.3

# Add a bit of noise.
>>> noise = 0.1 * norm.rvs(size=100)
>>> noisy_y = y + noise

# Fit the noisy data with a linear model.
>>> linear_coef = polyfit(x, noisy_y, 1)
>>> linear_poly = poly1d(linear_coef)
>>> linear_y = linear_poly(x)

# Fit the noisy data with a quadratic model.
>>> quad_coef = polyfit(x, noisy_y, 2)
>>> quad_poly = poly1d(quad_coef)
>>> quad_y = quad_poly(x)

POLYFIT(X, Y, DEGREE)

Page 20: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting


Optimization

scipy.optimize — Unconstrained Minimization and Root Finding

Unconstrained Optimization
• fmin (Nelder-Mead simplex)
• fmin_powell (Powell's method)
• fmin_bfgs (BFGS quasi-Newton method)
• fmin_ncg (Newton conjugate gradient)
• leastsq (Levenberg-Marquardt)
• anneal (simulated annealing global minimizer)
• brute (brute force global minimizer)
• brent (excellent 1-D minimizer)
• golden
• bracket

Constrained Optimization
• fmin_l_bfgs_b
• fmin_tnc (truncated Newton code)
• fmin_cobyla (constrained optimization by linear approximation)
• fminbound (interval constrained 1-D minimizer)

Root Finding
• fsolve (using MINPACK)
• brentq
• brenth
• ridder
• newton
• bisect
• fixed_point (fixed point equation solver)
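For example, fmin applies the Nelder-Mead simplex to any scalar objective; a minimal sketch using the Rosenbrock function as an illustrative objective:

```python
import numpy as np
from scipy.optimize import fmin

# Minimize the Rosenbrock function, whose minimum is at (1, 1),
# with the Nelder-Mead simplex method.
def rosen(p):
    x, y = p
    return (1.0 - x)**2 + 100.0 * (y - x**2)**2

xopt = fmin(rosen, x0=[-1.0, 1.0], xtol=1e-8, ftol=1e-8,
            maxiter=2000, maxfun=2000, disp=False)
```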

Page 21: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting


Optimization: Data Fitting

>>> from scipy.optimize import curve_fit

# Define the function to fit.
>>> def function(x, a, b, f, phi):
...     result = a * exp(-b * sin(f * x + phi))
...     return result

# Create a noisy data set.
>>> actual_params = [3, 2, 1, pi/4]
>>> x = linspace(0, 2*pi, 25)
>>> exact = function(x, *actual_params)
>>> noisy = exact + 0.3 * randn(len(x))

# Use curve_fit to estimate the function parameters from the noisy data.
>>> initial_guess = [1, 1, 1, 1]
>>> estimated_params, err_est = curve_fit(function, x, noisy, p0=initial_guess)
>>> estimated_params
array([3.1705, 1.9501, 1.0206, 0.7034])

# err_est is an estimate of the covariance matrix of the estimates
# (i.e. how good a fit it is).

NONLINEAR LEAST SQUARES CURVE FITTING
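The covariance matrix can be turned into one-sigma error bars on the parameters by taking the square root of its diagonal. A sketch with an illustrative exponential model and synthetic data (not the slide's example):

```python
import numpy as np
from scipy.optimize import curve_fit

# Fit a simple exponential and report per-parameter uncertainties
# from the returned covariance matrix.
def f(x, a, b):
    return a * np.exp(-b * x)

rng = np.random.RandomState(3)
x = np.linspace(0.0, 4.0, 50)
y = f(x, 2.5, 1.3) + 0.05 * rng.randn(50)

params, cov = curve_fit(f, x, y, p0=[1.0, 1.0])
stderr = np.sqrt(np.diag(cov))   # one-sigma uncertainty for each parameter
```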

Page 22: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

StatsModels

Josef Perktold
Canada

Skipper Seabold
PhD Student
American University
Washington, D.C.

Economists

Page 23: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Page 24: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

GUI example: astropysics (with TraitsUI)

Erik J. Tollerud

PhD StudentUC IrvineCenter for CosmologyIrvine, CA

http://www.physics.uci.edu/~etolleru/

Page 25: Scientific Computing with Python Webinar 9/18/2009:Curve Fitting

Scientific Python Classes

Sept 21-25   Austin
Oct 19-22    Silicon Valley
Nov 9-12     Chicago
Dec 7-11     Austin

http://www.enthought.com/training