3. Model Fitting
In this section we apply the statistical tools introduced in Section 2 to
explore:
o how to estimate model parameters
o how to test the goodness of fit of models.
We will consider:
3.2 The method of least squares
3.3 The principle of maximum likelihood
3.4 Least squares as maximum likelihood estimators
3.5 Chi-squared goodness of fit tests
3.6 More general hypothesis testing
3.7 Computational methods for minimising / maximising functions
But before we do, we first introduce an important pdf:
the bivariate normal distribution
3.1 The bivariate normal distribution
If x and y are jointly Gaussian random variables, their joint pdf is

p(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(x-\mu_x)^2}{\sigma_x^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} + \frac{(y-\mu_y)^2}{\sigma_y^2} \right] \right\}   (3.1)

where \rho is the correlation coefficient

\rho = \frac{\mathrm{cov}(x,y)}{\sigma_x\sigma_y}   (3.2)
[Figure: isoprobability contours of the bivariate normal pdf in the (x,y) plane, shown for several values of the correlation coefficient \rho.]
\rho > 0 : positive correlation – y tends to increase as x increases.
\rho < 0 : negative correlation – y tends to decrease as x increases.

The contours become narrower and steeper as |\rho| \to 1 , i.e. stronger (anti-)correlation between x and y. Given the value of x, the value of y is then tightly constrained.
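As a quick numerical illustration of this behaviour (a minimal sketch; the parameter values are arbitrary choices, not from the notes), we can sample from a bivariate normal and confirm that the sample correlation matches the \rho used to build the covariance matrix:

import numpy as np

# A minimal sketch: sample from a bivariate normal and check that the
# sample correlation matches the rho used to build the covariance matrix.
rng = np.random.default_rng(42)

mu = np.array([0.0, 0.0])                # means (mu_x, mu_y)
sigma_x, sigma_y, rho = 1.0, 2.0, 0.8
cov = np.array([[sigma_x**2,              rho * sigma_x * sigma_y],
                [rho * sigma_x * sigma_y, sigma_y**2             ]])

xy = rng.multivariate_normal(mu, cov, size=100_000)
print(np.corrcoef(xy[:, 0], xy[:, 1])[0, 1])   # ~0.8: y tends to increase with x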
3.2 The method of Least Squares
o ‘workhorse’ method for fitting lines and curves to data
in the physical sciences
o method often encountered (as a ‘black box’?) in
elementary courses
o useful demonstration of underlying statistical
principles
o simple illustration: fitting a straight line to (x,y) data

Model:  y_i = a + b x_i + \epsilon_i ,  where the errors \epsilon_i have mean zero and common variance \sigma^2. Minimising the sum of squares  S = \sum_{i=1}^{n} (y_i - a - b x_i)^2  gives the least squares estimators

\hat{b}_{LS} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}   (3.10)

\hat{a}_{LS} = \bar{y} - \hat{b}_{LS}\,\bar{x}   (3.11)

We can show that

E[\hat{a}_{LS}] = a   (3.12)

E[\hat{b}_{LS}] = b   (3.13)

i.e. the LS estimators are unbiased. Their variances are

\mathrm{var}(\hat{a}_{LS}) = \frac{\sigma^2 \sum_i x_i^2}{n \sum_i (x_i - \bar{x})^2}   (3.14)

\mathrm{var}(\hat{b}_{LS}) = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}   (3.15)

Choosing the x_i so that \sum_i x_i = 0 , we can make \hat{a}_{LS} and \hat{b}_{LS} independent, since their covariance is proportional to -\bar{x}.
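As a minimal sketch (the simulated data and true parameter values are my own, not from the notes), the estimators (3.10) and (3.11) can be computed directly and cross-checked against numpy.polyfit:

import numpy as np

# A minimal sketch: fit y = a + b*x by ordinary least squares using the
# closed-form estimators (3.10)-(3.11), on simulated data.
rng = np.random.default_rng(1)
a_true, b_true, sigma = 2.0, 0.5, 0.3

x = np.linspace(0.0, 10.0, 50)
y = a_true + b_true * x + rng.normal(0.0, sigma, size=x.size)

b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
a_hat = y.mean() - b_hat * x.mean()

print(a_hat, b_hat)          # close to (2.0, 0.5)
print(np.polyfit(x, y, 1))   # cross-check: returns [b_hat, a_hat]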
Weighted Linear Least Squares
(Common in astronomy, where each measurement y_i carries its own error \sigma_i.)

Define

S = \sum_{i=1}^{n} \left( \frac{y_i - a - b x_i}{\sigma_i} \right)^2   (3.16)

Again we find the least squares estimators of a and b by requiring \partial S/\partial a = 0 and \partial S/\partial b = 0. Solving, and writing

\Delta = \left(\sum_i \frac{1}{\sigma_i^2}\right)\left(\sum_i \frac{x_i^2}{\sigma_i^2}\right) - \left(\sum_i \frac{x_i}{\sigma_i^2}\right)^2

we find

\hat{a}_{WLS} = \frac{1}{\Delta}\left[ \left(\sum_i \frac{x_i^2}{\sigma_i^2}\right)\left(\sum_i \frac{y_i}{\sigma_i^2}\right) - \left(\sum_i \frac{x_i}{\sigma_i^2}\right)\left(\sum_i \frac{x_i y_i}{\sigma_i^2}\right) \right]   (3.17)

\hat{b}_{WLS} = \frac{1}{\Delta}\left[ \left(\sum_i \frac{1}{\sigma_i^2}\right)\left(\sum_i \frac{x_i y_i}{\sigma_i^2}\right) - \left(\sum_i \frac{x_i}{\sigma_i^2}\right)\left(\sum_i \frac{y_i}{\sigma_i^2}\right) \right]   (3.18)

Also

\mathrm{var}(\hat{a}_{WLS}) = \frac{1}{\Delta}\sum_i \frac{x_i^2}{\sigma_i^2}   (3.19)

\mathrm{var}(\hat{b}_{WLS}) = \frac{1}{\Delta}\sum_i \frac{1}{\sigma_i^2}   (3.20)

\mathrm{cov}(\hat{a}_{WLS}, \hat{b}_{WLS}) = -\frac{1}{\Delta}\sum_i \frac{x_i}{\sigma_i^2}   (3.21)
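A minimal sketch of (3.17)-(3.21) on simulated data (the per-point errors and true parameters are arbitrary choices):

import numpy as np

# A minimal sketch: weighted linear least squares via the closed-form
# estimators (3.17)-(3.21).
rng = np.random.default_rng(2)
a_true, b_true = 1.0, 3.0

x = np.linspace(0.0, 5.0, 30)
sig = rng.uniform(0.1, 0.5, size=x.size)   # per-point errors sigma_i
y = a_true + b_true * x + rng.normal(0.0, sig)

w = 1.0 / sig**2
S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
delta = S * Sxx - Sx**2

a_hat = (Sxx * Sy - Sx * Sxy) / delta
b_hat = (S * Sxy - Sx * Sy) / delta
var_a, var_b, cov_ab = Sxx / delta, S / delta, -Sx / delta
print(a_hat, b_hat, np.sqrt(var_a), np.sqrt(var_b), cov_ab)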
Extensions and Generalisations

o Errors on both variables?
Need to modify the merit function accordingly; for a straight line this becomes

\chi^2(a,b) = \sum_{i=1}^{n} \frac{(y_i - a - b x_i)^2}{\sigma_{y,i}^2 + b^2 \sigma_{x,i}^2}   (3.22)

This renders the equations non-linear in b ; there is no simple analytic solution!
(Not examinable, but see e.g. Numerical Recipes 15.3.)
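Since there is no analytic solution, the merit function (3.22) must be minimised numerically. A minimal sketch (simulated data; the use of scipy.optimize.minimize and the starting guess are my own choices):

import numpy as np
from scipy.optimize import minimize

# A minimal sketch: minimise the merit function (3.22) numerically,
# since it is non-linear in the slope b.
rng = np.random.default_rng(3)
a_true, b_true = 0.5, 2.0
sx, sy = 0.2, 0.4                     # errors on x and on y

x_true = np.linspace(0.0, 5.0, 40)
x = x_true + rng.normal(0.0, sx, size=x_true.size)
y = a_true + b_true * x_true + rng.normal(0.0, sy, size=x_true.size)

def chi2(params):
    a, b = params
    return np.sum((y - a - b * x)**2 / (sy**2 + b**2 * sx**2))

result = minimize(chi2, x0=[0.0, 1.0])   # crude starting guess
print(result.x)                          # ~ (0.5, 2.0)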
Extensions and Generalisations

o General linear models?
e.g. a polynomial  y(x) = a_1 + a_2 x + a_3 x^2 + \ldots

In general we can write the model as a linear combination of M basis functions X_k(x) (which may themselves be non-linear in x ; "linear" refers to the parameters):

y(x) = \sum_{k=1}^{M} a_k X_k(x)   (3.23)

and minimise

S = \sum_{i=1}^{N} \left( \frac{y_i - \sum_k a_k X_k(x_i)}{\sigma_i} \right)^2   (3.24)

We can formulate this as a matrix equation and solve for the parameters.
Matrix approach to Least Squares

Define the vector of model parameters and the vector of observations

\vec{a} = (a_1, \ldots, a_M)^T ,   \vec{y} = (y_1, \ldots, y_N)^T

and the design matrix of model basis functions

X = \begin{pmatrix} X_1(x_1) & \cdots & X_M(x_1) \\ \vdots & & \vdots \\ X_1(x_N) & \cdots & X_M(x_N) \end{pmatrix}

Model:

\vec{y} = X\vec{a} + \vec{\epsilon}   (3.25)

where \vec{\epsilon} = (\epsilon_1, \ldots, \epsilon_N)^T is the vector of errors, and we assume each \epsilon_i is drawn from some pdf with mean zero and variance \sigma_i^2.
Matrix approach to Least Squares: weighting by errors

Define the vector of weighted observations

\vec{b} = (y_1/\sigma_1, \ldots, y_N/\sigma_N)^T

and the weighted design matrix of model basis functions

A = \begin{pmatrix} X_1(x_1)/\sigma_1 & \cdots & X_M(x_1)/\sigma_1 \\ \vdots & & \vdots \\ X_1(x_N)/\sigma_N & \cdots & X_M(x_N)/\sigma_N \end{pmatrix}

Weighted model:

\vec{b} = A\vec{a} + \vec{e}   (3.26)

where \vec{e} = (\epsilon_1/\sigma_1, \ldots, \epsilon_N/\sigma_N)^T is the vector of weighted errors, and again we assume each \epsilon_i is drawn from some pdf with mean zero and variance \sigma_i^2.

We solve for the parameter vector \hat{a}_{LS} that minimises

S = \vec{e}^{\,T}\vec{e} = \sum_{i=1}^{N} e_i^2

This has solution

\hat{a}_{LS} = (A^T A)^{-1} A^T \vec{b}   (3.27)

and

\mathrm{cov}(\hat{a}_{LS}) = (A^T A)^{-1}   ( an M \times M matrix )   (3.28)
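A minimal sketch of (3.27) and (3.28) for a quadratic model on simulated data (the basis functions and parameter values are illustrative choices; in production code np.linalg.lstsq is preferable to forming the normal equations explicitly):

import numpy as np

# A minimal sketch: weighted matrix least squares for the model
# y = a1 + a2*x + a3*x^2, using (3.27) and (3.28).
rng = np.random.default_rng(4)
a_true = np.array([1.0, -2.0, 0.5])

x = np.linspace(-3.0, 3.0, 25)
sig = np.full(x.size, 0.2)
y = a_true[0] + a_true[1] * x + a_true[2] * x**2 + rng.normal(0.0, sig)

# Weighted design matrix: columns are the basis functions 1, x, x^2,
# each row divided by that point's error sigma_i.
A = np.column_stack([np.ones_like(x), x, x**2]) / sig[:, None]
b = y / sig

cov = np.linalg.inv(A.T @ A)    # (3.28): M x M covariance matrix
a_hat = cov @ (A.T @ b)         # (3.27): LS parameter estimates
print(a_hat, np.sqrt(np.diag(cov)))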
Extensions and Generalisations

o Non-linear models?
Suppose the model depends non-linearly on its parameters a_1, \ldots, a_M :

y_i = f(x_i; a_1, \ldots, a_M) + \epsilon_i

with each \epsilon_i drawn from a pdf with mean zero and variance \sigma_i^2. Then

S = \sum_{i=1}^{N} \left( \frac{y_i - f(x_i; a_1, \ldots, a_M)}{\sigma_i} \right)^2   (3.29)

But now there is no simple analytic method to minimise the sum of squares (e.g. no analytic solutions to \partial S/\partial a_i = 0 ); S must be minimised numerically.
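A minimal sketch of a non-linear fit (the exponential model, data, and use of scipy.optimize.curve_fit are illustrative assumptions, not from the notes):

import numpy as np
from scipy.optimize import curve_fit

# A minimal sketch: fit a model that is non-linear in its parameters.
rng = np.random.default_rng(5)

def model(x, amplitude, decay):
    return amplitude * np.exp(-decay * x)

x = np.linspace(0.0, 4.0, 40)
sig = np.full(x.size, 0.05)
y = model(x, 2.0, 1.3) + rng.normal(0.0, sig)

# curve_fit minimises the weighted sum of squares (3.29) iteratively.
popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0],
                       sigma=sig, absolute_sigma=True)
print(popt, np.sqrt(np.diag(pcov)))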
3.3 The principle of maximum likelihood
Frequentist approach:
A parameter \theta is a fixed (but unknown) constant.
From the actual data we can compute the likelihood,
L = probability of obtaining the observed data, given the value of the parameter.
Now define the likelihood function: the (infinite) family of curves generated by regarding L as a function of \theta , for the data fixed.

Principle of Maximum Likelihood
A good estimator of \theta maximises L , i.e.

\partial L/\partial\theta = 0   and   \partial^2 L/\partial\theta^2 < 0

We set the parameter equal to the value that makes the actual data sample we did observe – out of all the possible random samples we could have observed – the most likely.

Aside: the likelihood function has the same definition in Bayesian probability theory, but there is a subtle difference in meaning and interpretation – there is no need to invoke the idea of an (infinite) ensemble of different samples.
[Figure: a sequence of slides illustrating the principle. The same observed value x is plotted against the model pdf for trial parameter values \theta = 1, 2, 3 ; the likelihood is the height of the pdf at the observed x , and the maximum likelihood estimate \theta = \theta_{ML} is the value that maximises it.]
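A minimal sketch of this picture (a toy example of my own: one observation from a Gaussian N(\theta, 1), where the ML answer is of course \theta_{ML} = x):

import numpy as np

# A minimal sketch: for a single observed value x from N(theta, 1),
# evaluate the likelihood on a grid of theta and pick the maximum.
x_obs = 2.3

theta = np.linspace(0.0, 5.0, 1001)
L = np.exp(-0.5 * (x_obs - theta)**2) / np.sqrt(2.0 * np.pi)

theta_ml = theta[np.argmax(L)]
print(theta_ml)   # ~2.3, i.e. theta_ML = x_obs for this toy case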
3.4 Least squares as maximum likelihood estimators
To see the maximum likelihood method in action, let's consider again weighted least squares for the simple model  y_i = a + b x_i + \epsilon_i.

Let's assume the pdf of the errors is a Gaussian, so that (from Section 3.3) the likelihood is

L = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left( -\frac{\epsilon_i^2}{2\sigma_i^2} \right)   (3.30)

(note: L is a product of 1-D Gaussians because we are assuming the \epsilon_i are independent)

Substitute \epsilon_i = y_i - a - b x_i :

L = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left( -\frac{(y_i - a - b x_i)^2}{2\sigma_i^2} \right)   (3.31)

and the ML estimators of a and b satisfy \partial L/\partial a = 0 and \partial L/\partial b = 0.

But maximising L is equivalent to maximising \ln L. Here

\ln L = -\frac{n}{2}\ln(2\pi) - \sum_{i=1}^{n} \ln\sigma_i - \frac{1}{2}\sum_{i=1}^{n}\left( \frac{y_i - a - b x_i}{\sigma_i} \right)^2 = \mathrm{constant} - \frac{1}{2}S   (3.32)

where S is exactly the same sum of squares we defined in Section 3.3.

So in this case maximising L is exactly equivalent to minimising the sum of squares, i.e. for Gaussian, independent errors, ML and weighted LS estimators are identical.
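This equivalence is easy to verify numerically. A minimal sketch (simulated data; scipy.optimize.minimize is my own choice of minimiser):

import numpy as np
from scipy.optimize import minimize

# A minimal sketch: check that maximising ln L (3.32) gives the same (a, b)
# as minimising the weighted sum of squares S.
rng = np.random.default_rng(6)
x = np.linspace(0.0, 5.0, 30)
sig = np.full(x.size, 0.3)
y = 1.0 + 2.0 * x + rng.normal(0.0, sig)

def S(params):
    a, b = params
    return np.sum(((y - a - b * x) / sig)**2)

def neg_log_L(params):
    # ln L = constant - S/2, so -ln L differs from S/2 only by a constant.
    return 0.5 * S(params) + 0.5 * x.size * np.log(2 * np.pi) + np.sum(np.log(sig))

print(minimize(S, [0.0, 1.0]).x)
print(minimize(neg_log_L, [0.0, 1.0]).x)   # identical minimiser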
3.5 Chi-squared goodness of fit test
In the previous 3 sections we have discussed how to estimate
parameters of an underlying pdf model from sample data.
We now consider the related question:
How good is our pdf model in the first place?
We now illustrate the frequentist approach to this question using the
chi-squared goodness of fit test, for the (very common) case where
the model pdf is a Gaussian.
We take an example from Gregory (Chapter 7); the book focusses mainly on Bayesian probability, but is very good on the frequentist approach too.

Model: the radio emission from a galaxy is constant in time.
Assume the residuals are iid, drawn from N(0, \sigma).

Goodness-of-fit test: the basic idea (from Gregory, pg. 164) is to form the statistic

\chi^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{\sigma^2}   (3.33)

which, if the model is correct, follows a chi-squared distribution.
The \chi^2 pdf, for \nu degrees of freedom, is

p_\nu(\chi^2) = \frac{(\chi^2)^{\nu/2 - 1}\, e^{-\chi^2/2}}{2^{\nu/2}\,\Gamma(\nu/2)} ,   \chi^2 \geq 0   (3.34)

[Figure: the \chi^2 pdf plotted for several values of \nu.]

Here we have n = 15 data points, but \nu = 14 degrees of freedom, because the statistic involves the sample mean and not the true mean. We subtract one d.o.f. to account for this, and so compare the observed statistic with p_{14}(\chi^2).
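A minimal sketch verifying the pdf formula (3.34) against scipy's built-in chi-squared distribution:

import numpy as np
from scipy.stats import chi2
from scipy.special import gamma

# A minimal sketch: check the pdf formula (3.34) against scipy's chi2.
nu = 14
x = np.linspace(0.5, 40.0, 5)

manual = x**(nu / 2 - 1) * np.exp(-x / 2) / (2**(nu / 2) * gamma(nu / 2))
print(np.allclose(manual, chi2.pdf(x, df=nu)))   # True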
If the null hypothesis were true, how probable is it that we would measure as large, or larger, a value of \chi^2 ? This is an important quantity, referred to as the P-value:

P\mbox{-value} = 1 - \int_0^{\chi^2_{obs}} p_\nu(x)\,dx   (3.35)

What precisely does the P-value mean? If we obtain a very small P-value (e.g. a few percent?) we can interpret this as providing little support for the null hypothesis, which we may then choose to reject. (Ultimately this choice is subjective, but the P-value may provide objective ammunition for doing so.)

For our example, \chi^2_{obs} = 26.76 with \nu = 14 , giving

P\mbox{-value} \approx 0.02   (3.36)
"If the galaxy flux density really is constant, and we repeatedly obtained sets of 15 measurements under the same conditions, then only 2% of the \chi^2 values derived from these sets would be expected to be greater than our one actual measured value of 26.76."  (From Gregory, pg. 165)
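A minimal sketch reproducing this P-value with scipy:

from scipy.stats import chi2

# A minimal sketch: the P-value quoted from Gregory's example.
chi2_obs, nu = 26.76, 14
print(chi2.sf(chi2_obs, df=nu))   # survival function = 1 - CDF ~ 0.02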
3.6 More general hypothesis testing

Nevertheless, P-value based frequentist hypothesis testing remains very common in the literature:

Type of problem                  Test                               References
-------------------------------  ---------------------------------  --------------
Line and curve goodness-of-fit   \chi^2                             NR: 15.1-15.6
Difference of means              Student's t                        NR: 14.2
Ratio of variances               F test                             NR: 14.2
Sample CDF                       K-S test; rank sum tests           NR: 14.3, 14.6
Correlated variables?            Sample correlation coefficient     NR: 14.5, 14.6
Discrete RVs                     \chi^2 test / contingency table    NR: 14.4

See also the supplementary handout.
3.7 Minimising and Maximising Functions

Least squares and maximum likelihood involve, in practice, a lot of minimising and maximising of functions – i.e. solving equations of the form

\partial L/\partial\theta_i = 0   (3.37)

In general these equations may not have an analytic solution, especially if our pdf is a function of two or more parameters.

Some computational strategies for minimising/maximising functions (items 2 and 3 are expanded below):

1. Solve \partial(\ln L)/\partial\theta_i = 0 instead, which may be easier to solve.
2. Perform a grid search over \theta , evaluating L(\theta) at each point.
3. Use gradient ascent/descent for increased efficiency.
2. Grid search

1-D example: evaluate L(\theta) at regularly spaced grid points and take the grid point with the largest value as the estimated maximum of L(\theta).

There is no need to compute derivatives of the likelihood, but we need very fine grid spacing to obtain an accurate estimate of the maximum. This is computationally very costly, particularly if we need to search a multi-dimensional parameter space.
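A minimal sketch of a 1-D grid search (the toy likelihood – Gaussian data with unknown mean and unit variance – is my own choice):

import numpy as np

# A minimal sketch: 1-D grid search for the maximum of a log-likelihood.
rng = np.random.default_rng(7)
data = rng.normal(1.7, 1.0, size=100)

def log_L(theta):
    return -0.5 * np.sum((data - theta)**2)   # up to an additive constant

theta_grid = np.linspace(-5.0, 5.0, 2001)     # fine, regularly spaced grid
values = np.array([log_L(t) for t in theta_grid])
print(theta_grid[np.argmax(values)], data.mean())   # grid estimate vs exact ML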
3. Method of Steepest Ascent / Descent

Make jumps in the direction in which L(\theta) changes most rapidly, i.e. along the gradient. This is more efficient than a grid search, but we now need to estimate the derivatives of the likelihood, \partial L/\partial\theta_i. (See Numerical Recipes, Chapter 10.)
Recommended