
Page 1: Physics 114: Lecture 15 Probability Tests & Linear Fitting

Dale E. Gary
NJIT Physics Department
Mar 29, 2010

Page 2: Reminder of Previous Results


Last time we showed that rather than considering a single set of measurements, one can join multiple sets of measurements to refine both the estimated value and the precision of the mean.

The rule for finding the standard deviation of such a combination of sets of measurements, for the case of all statistically identical data sets (i.e. same errors σ), is

$$\sigma_\mu = \frac{\sigma}{\sqrt{N}}.$$

Likewise, the rule for combining data sets with different errors is

$$\frac{1}{\sigma_\mu^2} = \sum_i \frac{1}{\sigma_i^2}.$$

That led us to the concept of weighting, where perhaps the errors themselves are not known, but the relative weighting of the measurements is known. In that case, the rule for an individual set of data is

$$\bar{x} = \frac{\sum_i w_i x_i}{\sum_i w_i}, \qquad s^2 = \frac{N}{N-1}\,\frac{\sum_i w_i (x_i - \bar{x})^2}{\sum_i w_i},$$

and we then combine the N sets as usual, with σ_μ = s/√N.
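As a quick numerical illustration, here is a minimal sketch in Python with NumPy; the measurement values and relative weights are invented for this example, and the variance formula is the weighted sample variance stated above.

```python
import numpy as np

# Hypothetical measurements of one quantity and their relative weights
# (e.g., w_i proportional to 1/sigma_i^2 when only relative errors are known).
x = np.array([3.2, 3.5, 3.3, 3.6, 3.4])
w = np.array([1.0, 4.0, 2.0, 1.0, 2.0])

N = len(x)
xbar = np.sum(w * x) / np.sum(w)                             # weighted mean
s2 = (N / (N - 1)) * np.sum(w * (x - xbar)**2) / np.sum(w)   # weighted sample variance
sigma_mu = np.sqrt(s2 / N)                                   # precision of the mean, s/sqrt(N)

print(f"weighted mean = {xbar:.3f} +/- {sigma_mu:.3f}")
```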

Page 3: Probability Tests


We sometimes need to know more than just the mean and standard deviation (uncertainty) of a set of measurements. For many cases, we also want to assess how likely our result is to be “true.”

One way to do this is to relate the uncertainty to the Gaussian probability. For example, we have learned that approximately 68% of measurements in a Gaussian distribution fall within 1σ of the mean μ. In other words, 68% of our measurements should fall in the range (μ − σ) < x < (μ + σ). If we repeat our measurement many times to determine the mean more precisely (σ_μ = σ/√N), then again 68% of the repeated measurements should average in the range (μ − σ_μ) < x̄ < (μ + σ_μ).

A table of probability versus σ is given in Table C.2. In science, it is expected that errors are given in terms of ±1σ. Thus, stating a result as 3.4 ± 0.2 means that 68% of values fall between 3.2 and 3.6. In some disciplines, it is common instead to state 90% confidence intervals (±1.64σ), in which case the same measurement would be stated as 3.4 ± 0.33. To avoid confusion, one should say 3.4 ± 0.33 (90% confidence level).
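To make the confidence-level arithmetic concrete, here is a minimal sketch using SciPy's Gaussian distribution (scipy.stats.norm), reusing the 3.4 ± 0.2 example above:

```python
from scipy.stats import norm

mean, sigma = 3.4, 0.2   # the example result quoted above

# Fraction of a Gaussian within +/- 1 sigma of the mean (~68%)
p68 = norm.cdf(1) - norm.cdf(-1)

# Multiple of sigma enclosing 90% of the probability (~1.645):
# a two-sided 90% interval leaves 5% in each tail.
k90 = norm.ppf(0.95)

print(f"P(|x - mu| < 1 sigma) = {p68:.4f}")            # 0.6827
print(f"90% interval: {mean} +/- {k90 * sigma:.2f}")   # 3.4 +/- 0.33
```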

Page 4: Probability Tests, cont’d


A problem, however, occurs when we want to assign a probability estimate to measurements that are based on only a few samples. Although the samples are governed by the same parent mean μ and width σ, the sample width s is so poorly determined with only a few measurements that we should take that into account.

In such cases, a better estimate of probability is given by Student’s t distribution. Note that this has nothing to do with students. It was first described by an author who published under the name Student. In this distribution, the parameter t is the deviation in units of the sample standard deviation, t = (x − μ)/s.

It is a complicated function:

$$p_t(t,\nu) = \frac{1}{\sqrt{\nu\pi}}\,\frac{\Gamma[(\nu+1)/2]}{\Gamma(\nu/2)}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2},$$

where Γ is the gamma function (see Chapter 11), and ν is the number of degrees of freedom (N − 1 in this case).

This function (listed in Table C.8) differs from Table C.4 for small N, but is nearly identical for N > 30 or so.
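A short sketch (Python/SciPy, with an illustrative set of sample sizes) shows how the t distribution widens the 68.3% interval for small N and converges to the Gaussian value of 1 as N grows:

```python
from scipy.stats import t

# Half-width, in units of the sample standard deviation s, of the
# two-sided interval containing 68.3% of Student's t distribution.
for N in (3, 5, 10, 30, 100):
    nu = N - 1                        # degrees of freedom
    k = t.ppf(0.5 + 0.683 / 2, nu)    # upper limit of the central 68.3%
    print(f"N = {N:3d}: {k:.3f} s  (Gaussian limit: 1.000 sigma)")
```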

Page 5: Chi-Square Probability


I want to introduce a useful concept without proof, called the χ² (chi-square) test of goodness of fit. We will need it in the next lecture when we describe linear fits to data.

Consider our histograms from Lecture 14.

[Figure: sixteen histograms from Lecture 14, each with x running from 2 to 8 and bin counts from 0 to 20.]

Page 6: Chi-Square Probability


Here is a similar histogram from the text, showing the parent pdf (the solid Gaussian curve, NP_G(x)) and one histogram of 100 measurements of mean 5. Superimposed is the spread of values in each bin for multiple sets of 100 measurements.

Since the histogram is a frequency diagram, the value of each bin can only have integer values—hence, we expect a Poisson distribution with mean NP_G(x) and standard deviation √(NP_G(x)).

Page 7: Chi-Square Probability


The definition of χ² is

$$\chi^2 = \sum_i \frac{(y_i - y(x_i))^2}{\sigma_i^2},$$

where the y_i are the measurements (the bin heights in this case), y(x_i) is the expected value (the smooth Gaussian curve NP_G(x) in this case), and σ_i is the expected standard deviation of each y_i (√(NP_G(x)) in this case).

You can see that in each bin you expect the y_i not to stray more than about σ_i from y on average, so each bin should contribute about 1 to the sum. Thus, the sum should be about n, the number of bins. This is almost right. In fact, statistically the expectation value of χ² is not n, but the number of degrees of freedom ν = n − n_c, where n_c is the number of constraints. Often we use the reduced chi-square

$$\chi_\nu^2 = \chi^2/\nu \approx 1.$$
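As an illustrative sketch (Python/NumPy, with simulated data standing in for the text's histogram, and the fixed total count taken as the only constraint), one can compute χ² of a 100-measurement histogram against its known parent Gaussian, using σ_i² = NP_G(x_i) as above:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
data = rng.normal(5.0, 1.0, N)        # 100 measurements, parent mean 5, sigma 1

counts, edges = np.histogram(data, bins=12, range=(2, 8))
centers = 0.5 * (edges[:-1] + edges[1:])
dx = edges[1] - edges[0]

# Expected bin contents N*P_G(x) for the known parent distribution
expected = N * dx * np.exp(-0.5 * (centers - 5.0)**2) / np.sqrt(2 * np.pi)

chi2 = np.sum((counts - expected)**2 / expected)   # sigma_i^2 = N*P_G(x_i)
nu = len(counts) - 1    # assumed: one constraint, the fixed total number of counts
print(f"chi2 = {chi2:.1f} for nu = {nu}; reduced chi2 = {chi2 / nu:.2f}")
```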

Page 8: Meaning of the Chi-Square Test


Consider the plot below as some measurements given by the histogram, and the smooth Gaussian as a fit to the data. If we shift the smooth curve, it will obviously not fit the data as well. Then

$$\chi^2 = \sum_i \frac{(y_i - y(x_i))^2}{\sigma_i^2}$$

will be much larger than ν, because the deviations of each bin from the shifted smooth curve are larger than σ_i.

Likewise, if we change the width or the amplitude of the curve, either of these will also raise the value of χ².

The best fit of the curve, in fact, is the one that minimizes χ², which then should be close to ν. What is ν in this case? It takes three parameters to define the Gaussian, so ν = n − n_c = 6 − 3 = 3.
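The effect of shifting the curve can be demonstrated numerically. In this sketch (Python/NumPy, with simulated Poisson bin heights and an invented bin grid), χ² grows rapidly as the model mean is moved away from the true value:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(2, 8, 13)             # bin centers (illustrative)

def model(x, mu):
    """Gaussian curve N*P_G(x) with N = 100, sigma = 1, bin width 0.5 (assumed)."""
    return 100 * 0.5 * np.exp(-0.5 * (x - mu)**2) / np.sqrt(2 * np.pi)

y = rng.poisson(model(x, 5.0))                # "measured" bin heights
sigma2 = np.maximum(model(x, 5.0), 1.0)       # expected variances, floored at 1

for mu in (5.0, 5.25, 5.5, 6.0):              # shift the curve off the truth
    chi2 = np.sum((y - model(x, mu))**2 / sigma2)
    print(f"mu = {mu:4.2f}: chi2 = {chi2:6.1f}")
```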

Page 9: Chapter 6—Least Squares Fit to a Straight Line

[Figure: “Points and Fit” plot of noisy data points with a straight-line fit; x from 0 to 2.5, y from 2 to 9.]



There are many situations where we can measure one quantity (the dependent variable) with respect to another quantity (the independent variable). For instance, we might measure the position of a car vs. time, where the position is the dependent variable and time the independent variable. If the velocity is constant, we expect a straight line:

$$x(t) = x_0 + v_0 t.$$

Let us generically call the dependent variable y for this discussion, and the independent variable x. Then we can write such a linear relationship as

$$y(x) = a + bx,$$

where a and b are constants.

Here is a plot of points with noise, showing a linear relationship, and a straight line that goes through the points.
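For reference, a minimal sketch (Python/NumPy, with arbitrary constants) of how such noisy straight-line data can be simulated:

```python
import numpy as np

rng = np.random.default_rng(2)
a_true, b_true = 2.0, 2.0                    # assumed intercept and slope
x = np.linspace(0, 2.5, 10)
y = a_true + b_true * x + rng.normal(0.0, 0.4, x.size)  # y = a + b*x plus noise
```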

Page 10: Least Squares Fit to a Straight Line


Here are several plots with lines through the points. Which one do you think is the best fit?

It is surprisingly easy to see by eye which one fits best, but what does your brain do to determine this? It is minimizing χ²! Let’s go through the problem analytically.

[Figure: three “Points and Fit” panels showing the same data points with different candidate straight lines; x from 0 to 2.5, y from 2 to 10.]

Page 11: Minimizing Chi-Square


We start with a smooth line of the form

$$y(x) = a + bx,$$

which is the “curve” we want to fit to the data. The chi-square for this situation is

$$\chi^2 = \sum_i \left(\frac{y_i - y(x_i)}{\sigma_i}\right)^2 = \sum_i \frac{1}{\sigma_i^2}\,(y_i - a - bx_i)^2.$$

To minimize any function, you know that you should take the derivative and set it to zero. But take the derivative with respect to what? Obviously, we want to find the constants a and b that minimize χ², so we will form two equations:

$$\frac{\partial \chi^2}{\partial a} = \frac{\partial}{\partial a} \sum_i \frac{1}{\sigma_i^2}(y_i - a - bx_i)^2 = -2 \sum_i \frac{1}{\sigma_i^2}(y_i - a - bx_i) = 0,$$

$$\frac{\partial \chi^2}{\partial b} = \frac{\partial}{\partial b} \sum_i \frac{1}{\sigma_i^2}(y_i - a - bx_i)^2 = -2 \sum_i \frac{x_i}{\sigma_i^2}(y_i - a - bx_i) = 0.$$

Page 12: Minimizing Chi-Square


Now we can rearrange these two equations to obtain two equations in two unknowns (a and b):

$$\sum_i \frac{y_i}{\sigma_i^2} = a \sum_i \frac{1}{\sigma_i^2} + b \sum_i \frac{x_i}{\sigma_i^2},$$

$$\sum_i \frac{x_i y_i}{\sigma_i^2} = a \sum_i \frac{x_i}{\sigma_i^2} + b \sum_i \frac{x_i^2}{\sigma_i^2}.$$

You can solve this set of simultaneous equations any way you wish. One way is to use Cramer’s Rule of matrix theory, which says that the system

$$z_1 = a x_1 + b y_1, \quad z_2 = a x_2 + b y_2, \qquad \text{or} \qquad \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix},$$

has the solution

$$a = \frac{\begin{vmatrix} z_1 & y_1 \\ z_2 & y_2 \end{vmatrix}}{\begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}} \qquad \text{and} \qquad b = \frac{\begin{vmatrix} x_1 & z_1 \\ x_2 & z_2 \end{vmatrix}}{\begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}}.$$

Ratios of determinants.
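Here is a minimal sketch (Python/NumPy, with invented data points and errors) that builds the two normal equations and solves them by Cramer’s Rule exactly as above:

```python
import numpy as np

# Hypothetical data points with individual errors sigma_i
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([2.1, 3.2, 3.9, 5.1, 6.0, 7.2])
sig = np.array([0.3, 0.3, 0.4, 0.3, 0.5, 0.3])

w = 1.0 / sig**2
# Matrix form of the two normal equations:  M @ (a, b) = v
M = np.array([[np.sum(w),     np.sum(w * x)],
              [np.sum(w * x), np.sum(w * x**2)]])
v = np.array([np.sum(w * y), np.sum(w * x * y)])

Delta = np.linalg.det(M)
a = np.linalg.det(np.column_stack([v, M[:, 1]])) / Delta   # replace 1st column with v
b = np.linalg.det(np.column_stack([M[:, 0], v])) / Delta   # replace 2nd column with v
print(f"a = {a:.3f}, b = {b:.3f}")
```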

Page 13: Linear Regression


The solution, then, is

$$a = \frac{1}{\Delta} \begin{vmatrix} \sum \dfrac{y_i}{\sigma_i^2} & \sum \dfrac{x_i}{\sigma_i^2} \\[1ex] \sum \dfrac{x_i y_i}{\sigma_i^2} & \sum \dfrac{x_i^2}{\sigma_i^2} \end{vmatrix} = \frac{1}{\Delta}\left(\sum \frac{x_i^2}{\sigma_i^2} \sum \frac{y_i}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i y_i}{\sigma_i^2}\right),$$

$$b = \frac{1}{\Delta} \begin{vmatrix} \sum \dfrac{1}{\sigma_i^2} & \sum \dfrac{y_i}{\sigma_i^2} \\[1ex] \sum \dfrac{x_i}{\sigma_i^2} & \sum \dfrac{x_i y_i}{\sigma_i^2} \end{vmatrix} = \frac{1}{\Delta}\left(\sum \frac{1}{\sigma_i^2} \sum \frac{x_i y_i}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{y_i}{\sigma_i^2}\right),$$

where

$$\Delta = \begin{vmatrix} \sum \dfrac{1}{\sigma_i^2} & \sum \dfrac{x_i}{\sigma_i^2} \\[1ex] \sum \dfrac{x_i}{\sigma_i^2} & \sum \dfrac{x_i^2}{\sigma_i^2} \end{vmatrix} = \sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \left(\sum \frac{x_i}{\sigma_i^2}\right)^2.$$

Note that if the errors are all equal (i.e. σ_i = σ), then when you take the ratio of these determinants the errors cancel and we get the simpler expressions

$$a = \frac{1}{\Delta}\left(\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i\right), \qquad b = \frac{1}{\Delta}\left(N \sum x_i y_i - \sum x_i \sum y_i\right),$$

where

$$\Delta = N \sum x_i^2 - \left(\sum x_i\right)^2.$$
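As a closing sketch (Python/NumPy, reusing the same invented data as before and assuming equal errors), the simplified expressions can be checked against NumPy’s built-in least-squares fit:

```python
import numpy as np

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([2.1, 3.2, 3.9, 5.1, 6.0, 7.2])
N = len(x)

# Equal-error linear regression formulas from the slide above
Delta = N * np.sum(x**2) - np.sum(x)**2
a = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / Delta
b = (N * np.sum(x * y) - np.sum(x) * np.sum(y)) / Delta

# Cross-check: np.polyfit returns (slope, intercept) for degree 1
b_chk, a_chk = np.polyfit(x, y, 1)
print(f"a = {a:.4f} (polyfit {a_chk:.4f}), b = {b:.4f} (polyfit {b_chk:.4f})")
```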