CHAPTER 2 PROBABILITY DISTRIBUTIONS

Three probability distributions, the binomial distribution, the Poisson distribution, and the Gaussian distribution, play a fundamental role in the analysis of experimental data.

Of these, the Gaussian, or normal error, distribution is undoubtedly the most important in the statistical analysis of data.

Practically, it is useful because it seems to describe the distribution of random observations for many experiments, as well as describing the distributions obtained when we try to estimate the parameters of most other probability distributions.

The Poisson distribution is generally appropriate for counting experiments where the data represent the number of items or events observed per unit interval.

It is important in the study of random processes such as those associated with the radioactive decay of elementary particles or nuclear states, and is also applied to data that have been sorted into ranges to form a frequency table or a histogram.

The binomial distribution is generally applied to experiments in which the result is one of a small number of possible final states, such as the number of "heads" or "tails" in a series of coin tosses, or the number of particles scattered forward or backward relative to the direction of the incident particle in a particle physics experiment.

Because both the Poisson and the Gaussian distributions can be considered as limiting cases of the binomial distribution, we shall devote some attention to the derivation of the binomial distribution from basic considerations.

2.1 BINOMIAL DISTRIBUTION

Suppose we toss a coin in the air and let it land. There is a 50% probability that it will land heads up and a 50% probability that it will land tails up.

By this we mean that if we continue tossing a coin repeatedly, the fraction of times that it lands with heads up will asymptotically approach 1/2, indicating that there was a probability of 1/2 of doing so.

For any given toss, the probability cannot determine whether or not it will land heads up; it can only describe how we should expect a large number of tosses to be divided into two possibilities.

Suppose we toss two coins at a time. There are now four different possible permutations of the way in which they can land:

both heads up, both tails up, and two mixtures of heads and tails depending on which one is heads up.

Because each of these permutations is equally probable, the probability for any one of them is 1/4 or 25%. To find the probability for obtaining a particular mixture of heads and tails, without differentiating between the two kinds of mixtures, we must add the probabilities corresponding to each possible kind.

Thus, the total probability of finding one coin heads up and the other tails up is 1/2. Note that the sum of the probabilities for all possibilities (1/4 + 1/4 + 1/4 + 1/4) is always equal to 1 because something is bound to happen.

Let us extrapolate these ideas to the general case. Suppose we toss n coins into the air, where n is some integer. Alternatively, suppose that we toss one coin n times. What is the probability that exactly x of these coins will land heads up, without distinguishing which of the coins actually belongs to which group?

We can consider the probability P(x; n) to be a function of the number n of coins tossed and of the number x of coins that land heads up. For a given experiment in which n coins are tossed, this probability P(x; n) will vary as a function of x. Of course, x must be an integer for any physical experiment, but we can consider the probability to be smoothly varying with x as a continuous variable for mathematical purposes.

Permutations and Combinations

If n coins are tossed, there are $2^n$ different possible ways in which they can land. This follows from the fact that the first coin has two possible orientations, for each of these the second coin also has two such orientations, for each of these the third coin also has two, and so on.

Because each of these possibilities is equally probable, the probability for any one of these possibilities to occur at any toss of n coins is $1/2^n$.

How many of these possibilities will contribute to our observations of x coins with heads up?

Imagine two boxes, one labeled "heads" and divided into x slots, and the other labeled "tails."

We shall consider first the question of how many permutations of the coins result in the proper separation of x in one box and n - x in the other; then we shall consider the question of how many combinations of these permutations should be considered to be different from each other.

In order to enumerate the number of permutations Pm(n, x), let us pick up the coins one at a time from the collection of n coins and put x of them into the "heads" box. We have a choice of n coins for the first one we pick up. For our second selection we can choose from the remaining n - 1 coins. The range of choice is diminished until the last selection of the xth coin can be made from only n - x + 1 remaining coins.

The total number of choices for coins to fill the x slots in the "heads" box is the product of the numbers of individual choices:

$$Pm(n, x) = n(n-1)(n-2) \cdots (n-x+2)(n-x+1)$$

This expansion can be expressed more easily in terms of factorials:

$$Pm(n, x) = \frac{n!}{(n-x)!}$$

So far we have calculated the number of permutations Pm(n, x) that will yield x coins in the "heads" box and n - x coins in the "tails" box, with the provision that we have identified which coin was placed in the "heads" box first, which was placed in second, and so on.

That is, we have ordered the x coins in the "heads" box. In our computation of $2^n$ different possible permutations of the n coins, we are only interested in which coins landed heads up or heads down, not which landed first.

Therefore, we must consider contributions different only if there are different coins in the two boxes, not if the x coins within the "heads" box are permuted into different time orderings.

The number of different combinations C(n, x) of the permutations in the preceding enumeration results from combining the x! different ways in which x coins in the "heads" box can be permuted within the box. For every x! permutations, there will be only one new combination.

Thus, the number of different combinations C(n, x) is the number of permutations Pm(n, x) divided by the degeneracy factor x! of the permutations:

$$C(n, x) = \frac{Pm(n, x)}{x!} = \frac{n!}{x!\,(n-x)!} = \binom{n}{x}$$

This is the number of different possible combinations of n items taken x at a time, commonly referred to as the binomial coefficient $\binom{n}{x}$.
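As an illustration only, the counting argument can be checked by brute force for a small number of coins. The following Python sketch is not part of the original text: the function names `permutations_count` and `combinations_count` are hypothetical, and n = 4 is an arbitrary choice.

```python
from itertools import product
from math import comb, factorial


def permutations_count(n: int, x: int) -> int:
    """Ordered ways to fill x "heads" slots from n coins: n! / (n - x)!."""
    return factorial(n) // factorial(n - x)


def combinations_count(n: int, x: int) -> int:
    """Unordered selections: permutations divided by the degeneracy x!."""
    return permutations_count(n, x) // factorial(x)


n = 4  # arbitrary small example
# Every way n coins can land: 2**n sequences of 'H' and 'T'.
outcomes = list(product("HT", repeat=n))

for x in range(n + 1):
    # Count the sequences with exactly x heads by direct enumeration.
    brute_force = sum(1 for outcome in outcomes if outcome.count("H") == x)
    print(f"x={x}: Pm={permutations_count(n, x)}, "
          f"C={combinations_count(n, x)}, enumerated={brute_force}, "
          f"math.comb={comb(n, x)}")
```

The enumerated count and C(n, x) should agree for every x, and the counts over all x add up to the $2^n$ possible outcomes.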

Probability

The probability P(x; n) that we should observe x coins with heads up and n - x with tails up is the product of the number of different combinations C(n, x) that contribute to that set of observations multiplied by the probability for each of the combinations to occur, which we have found to be $(1/2)^n$.

Actually, we should separate the probability for each combination into two parts: one part is the probability $p^x = (1/2)^x$ for x coins to be heads up; the other part is the probability $q^{n-x} = (1 - 1/2)^{n-x} = (1/2)^{n-x}$ for the other n - x coins to be tails up. For symmetrical coins, the product of these two parts, $p^x q^{n-x} = (1/2)^n$, is the probability of the combination with x coins heads up and n - x coins tails up.

In the general case the probability p of success for each item is not equal in magnitude to the probability q = 1 - p for failure. For example, when tossing a die, the probability that a particular number will show is p = 1/6, while the probability of its not showing is q = 1 - 1/6 = 5/6, so that $p^x q^{n-x} = (1/6)^x \times (5/6)^{n-x}$.

With these definitions of p and q, the probability $P_B(x; n, p)$ for observing x of the n items to be in the state with probability p is given by the binomial distribution

$$P_B(x; n, p) = \binom{n}{x} p^x q^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} \tag{2.4}$$

The coefficients $P_B(x; n, p)$ are closely related to the binomial theorem for the expansion of a power of a sum. According to the binomial theorem,

$$(p + q)^n = \sum_{x=0}^{n} \left[ \binom{n}{x} p^x q^{n-x} \right] \tag{2.5}$$

The (j + 1)th term, corresponding to x = j, of this expansion is therefore equal to the probability $P_B(j; n, p)$.

We can use this result to show that the binomial distribution coefficients $P_B(x; n, p)$ are normalized to a sum of 1. The right-hand side of Equation (2.5) is the sum of probabilities over all possible values of x from 0 to n, and the left-hand side is just $1^n = 1$.
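The normalization argument can be checked numerically. The sketch below is an illustration rather than part of the original text: `binomial_pmf` is a hypothetical helper that evaluates the binomial probability directly, and n = 10, p = 1/6 are example values.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P_B(x; n, p) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


n, p = 10, 1.0 / 6.0  # example values only
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Summing over every allowed x (0 through n) should give (p + q)**n = 1.
total = sum(probabilities)
print(f"sum over x = 0..{n}: {total:.12f}")
```

Up to floating-point rounding the sum is 1 for any p between 0 and 1, as the binomial theorem requires.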

Mean and Standard Deviation

The mean of the binomial distribution is evaluated by combining the definition of $\mu$ in Equation (1.10) with the formula for the probability function of Equation (2.4):

$$\mu = \sum_{x=0}^{n} \left[ x\, \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} \right] = np \tag{2.6}$$

We interpret this to mean that if we perform an experiment with n items and observe the number x of successes, after a large number of repeated experiments the average $\bar{x}$ of the number of successes will approach a mean value $\mu$ given by the probability for success of each item p times the number of items n.

In the case of coin tossing where p = 1/2, we should expect on the average to observe half the coins land heads up, which seems eminently reasonable.

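The variance $\sigma^2$ of a binomial distribution is similarly evaluated by combining Equations (1.11) and (2.4):

$$\sigma^2 = \sum_{x=0}^{n} \left[ (x-\mu)^2\, \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} \right] = np(1-p) \tag{2.7}$$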

If the probability for a single success p is equal to the probability for failure, p = q = 1/2, then the distribution is symmetric about the mean $\mu$, and the median $\mu_{1/2}$ and the most probable value are both equal to the mean. In this case, the variance $\sigma^2$ is equal to half the mean: $\sigma^2 = \mu/2$. If p and q are not equal, the distribution is asymmetric with a smaller variance for the same n.
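The closed forms for the mean and variance can be compared with direct sums over the distribution. This is a minimal sketch under stated assumptions: `binomial_moments` is a hypothetical name, and the two values of p are chosen only to contrast the symmetric and asymmetric cases.

```python
from math import comb


def binomial_moments(n: int, p: float) -> tuple[float, float]:
    """Mean and variance of P_B(x; n, p) by direct summation over x."""
    pmf = [comb(n, x) * p**x * (1.0 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * weight for x, weight in enumerate(pmf))
    variance = sum((x - mean) ** 2 * weight for x, weight in enumerate(pmf))
    return mean, variance


n = 10
for p in (1.0 / 2.0, 1.0 / 6.0):
    mean, variance = binomial_moments(n, p)
    # Compare with the closed forms np and np(1 - p).
    print(f"p={p:.4f}: mean={mean:.4f} (np={n * p:.4f}), "
          f"variance={variance:.4f} (np(1-p)={n * p * (1 - p):.4f})")
```

For p = 1/2 the variance is half the mean; for p = 1/6 it is smaller, consistent with np(1 - p) being largest at p = 1/2.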

Example 2.1. Suppose we toss 10 coins into the air a total of 100 times. With each toss we observe the number of coins that land heads up and denote that number by $x_i$, where i is the number of the toss; i ranges from 1 to 100 and $x_i$ can be any integer from 0 to 10.

The probability function governing the distribution of the observed values of x is given by the binomial distribution $P_B(x; n, p)$ with n = 10 and p = 1/2.

This is the parent distribution and is not affected by the number N of repeated procedures in the experiment.

The parent distribution $P_B(x; 10, 1/2)$ is shown in Figure 2.1 as a smooth curve drawn through discrete points. The mean is given by Equation (2.6):

$$\mu = np = 10(1/2) = 5$$

The standard deviation is given by Equation (2.7):

$$\sigma = \sqrt{np(1-p)} = \sqrt{10(1/2)(1/2)} = \sqrt{2.5} \approx 1.58$$

The curve is symmetric about its peak at the mean, so that approximately 25% of the throws yield five heads and five tails, about 20% yield four heads and six tails, and the same fraction yields six heads and four tails.

The magnitudes of the points are such that the sum of the probabilities over all eleven points (x = 0 to 10) is equal to 1.
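The distinction between the parent distribution and a finite sample can be made concrete with a small simulation of this example. The sketch below is illustrative only: the seed value is arbitrary and chosen for repeatability, and the observed counts depend on it.

```python
import random
from collections import Counter
from math import comb

N_COINS = 10    # coins per toss (n in the text)
N_TOSSES = 100  # repeated tosses (N in the text)

rng = random.Random(12345)  # arbitrary seed, an assumption for repeatability

# For each toss, count how many of the 10 coins land heads up.
heads_per_toss = [
    sum(rng.random() < 0.5 for _ in range(N_COINS)) for _ in range(N_TOSSES)
]
observed = Counter(heads_per_toss)

for x in range(N_COINS + 1):
    # Expected count from the parent distribution P_B(x; 10, 1/2).
    expected = N_TOSSES * comb(N_COINS, x) * 0.5**N_COINS
    print(f"x = {x:2d}  observed = {observed[x]:3d}  expected = {expected:6.2f}")

sample_mean = sum(heads_per_toss) / N_TOSSES
print(f"sample mean = {sample_mean:.2f} (parent mean = 5)")
```

The observed counts scatter around the expected values; changing N_TOSSES changes the scatter but not the parent distribution itself.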

Example 2.2. Suppose we roll ten dice. What is the probability that x of these dice will land with the 1 up? If we throw one die, the probability of its landing with 1 up is p = 1/6. If we throw ten dice, the probability for x of them to land with 1 up is given by the binomial distribution $P_B(x; n, p)$ with n = 10 and p = 1/6:

$$P_B(x; 10, 1/6) = \frac{10!}{x!\,(10-x)!} \left(\frac{1}{6}\right)^x \left(\frac{5}{6}\right)^{10-x}$$

This distribution is illustrated in Figure 2.2 as a smooth curve drawn through discrete points. The mean and standard deviation are $\mu = 10/6 = 1.67$ and $\sigma = \sqrt{10(1/6)(5/6)} = 1.18$.

The distribution is not symmetric about the mean or about any other point. The most probable value is x = 1, but the peak of the smooth curve occurs for a slightly larger value of x.

Example 2.3. A particle physicist makes some preliminary measurements of the angular distribution of K mesons scattered from a liquid hydrogen target. She knows that there should be equal numbers of particles scattered forward and backward in the center-of-mass system of the particles. She measures 1000 interactions and finds that 472 scatter forward and 528 backward. What uncertainty should she quote in these numbers?

The uncertainty is given by the standard deviation from Equation (2.7),

$$\sigma = \sqrt{np(1-p)} = \sqrt{1000(1/2)(1/2)} = 15.8$$

Thus, she could quote

$$f_F = (472 \pm 15.8)/1000 = 0.472 \pm 0.016$$

for the fraction of particles scattered in the forward direction and

$$f_B = (528 \pm 15.8)/1000 = 0.528 \pm 0.016$$

for the fraction scattered backward. Note that the uncertainties in the numbers scattering forward and backward must be the same because losses from one group must be made up in the other.

If the experimenter did not know the a priori probabilities of scattering forward and backward, she would have to estimate p and q from her measurements; that is, p = 472/1000 = 0.472 and q = 528/1000 = 0.528. She would then calculate

$$\sigma = \sqrt{np(1-p)} = \sqrt{1000(0.472)(0.528)} = 15.8$$

For probability p near 50%, the standard deviation is relatively insensitive to uncertainties in the experimental determination of p.
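The arithmetic of this example is short enough to reproduce directly. The following sketch uses only the numbers quoted in the example; the variable names are illustrative.

```python
from math import sqrt

n = 1000
forward, backward = 472, 528

# Case 1: a priori knowledge that p = q = 1/2.
sigma_known = sqrt(n * 0.5 * 0.5)

# Case 2: p estimated from the measurement itself.
p_est = forward / n
sigma_est = sqrt(n * p_est * (1.0 - p_est))

print(f"sigma with p = 1/2:     {sigma_known:.1f}")
print(f"sigma with p estimated: {sigma_est:.1f}")
print(f"f_F = {forward / n:.3f} +/- {sigma_known / n:.3f}")
print(f"f_B = {backward / n:.3f} +/- {sigma_known / n:.3f}")
```

Both estimates round to 15.8, which illustrates how weakly the standard deviation depends on p near 50%.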

2.2 POISSON DISTRIBUTION

The Poisson distribution represents an approximation to the binomial distribution for the special case where the average number of successes is much smaller than the possible number; that is, when $\mu \ll n$ because $p \ll 1$. For such experiments the binomial distribution correctly describes the probability $P_B(x; n, p)$ of observing x events per time interval out of n possible events, each of which has a probability p of occurring, but the large number n of possible events makes exact evaluation from the binomial distribution impractical. Furthermore, neither the number n of possible events nor the probability p for each is usually known. What may be known instead is the average number of events $\mu$ expected in each time interval or its estimate $\bar{x}$.

The Poisson distribution provides an analytical form appropriate to such investigations that describes the probability distribution in terms of just the variable x and the parameter $\mu$.

Let us consider the binomial distribution in the limiting case of $p \ll 1$. We are interested in its behavior as n becomes infinitely large while the mean $\mu = np$ remains constant. Equation (2.4) for the probability function of the binomial distribution can be written as

$$P_B(x; n, p) = \frac{1}{x!}\, \frac{n!}{(n-x)!}\, p^x (1-p)^{-x} (1-p)^n \tag{2.8}$$

If we expand the second factor

$$\frac{n!}{(n-x)!} = n(n-1)(n-2) \cdots (n-x+2)(n-x+1)$$

we can consider it to be the product of x individual factors, each of which is very nearly equal to n because $x \ll n$ in the region of interest.

The second factor in Equation (2.8) thus asymptotically approaches $n^x$. The product of the second and third factors then becomes $(np)^x = \mu^x$. The fourth factor is approximately equal to 1 + px, which tends to 1 as p tends to 0.

The last factor can be rearranged by substituting $\mu/p$ for n and expanding the expression to show that it asymptotically approaches $e^{-\mu}$:

$$\lim_{p \to 0} (1-p)^n = \lim_{p \to 0} \left[ (1-p)^{1/p} \right]^{\mu} = \left( \frac{1}{e} \right)^{\mu} = e^{-\mu}$$

Combining these approximations, we find that the binomial distribution probability function $P_B(x; n, p)$ asymptotically approaches the Poisson distribution $P_P(x; \mu)$ as p approaches 0:

$$\lim_{p \to 0} P_B(x; n, p) = P_P(x; \mu) \equiv \frac{\mu^x}{x!}\, e^{-\mu}$$

Because this distribution is an approximation to the binomial distribution for $p \ll 1$, the distribution is asymmetric about its mean $\mu$ and will resemble that of Figure 2.2. Note that $P_P(x; \mu)$ does not become 0 for x = 0 and is not defined for negative values of x. This restriction is not troublesome for counting experiments because the number of counts per unit time interval can never be negative.
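The limiting behavior can be seen numerically by holding the mean np fixed while n grows. The sketch below is an illustration under assumed values: the mean 1.67 is borrowed from the dice example, the comparison is restricted to x = 0 through 10, and `max_abs_diff` is a hypothetical helper.

```python
from math import comb, exp, factorial


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


def poisson_pmf(x: int, mu: float) -> float:
    """Poisson probability mu^x e^(-mu) / x!."""
    return mu**x / factorial(x) * exp(-mu)


MU = 1.67  # fixed mean, np = MU for every n below (example value)


def max_abs_diff(n: int) -> float:
    """Largest |P_B - P_P| over x = 0..10 when p = MU / n."""
    p = MU / n
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, MU))
               for x in range(11))


for n in (10, 100, 1000, 10000):
    print(f"n = {n:6d}  p = {MU / n:.5f}  max difference = {max_abs_diff(n):.2e}")
```

The largest difference shrinks roughly in proportion to p, which is the sense in which the Poisson form approximates the binomial one for small p.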

Derivation

The Poisson distribution can also be derived for the case where the number of events observed is small compared to the total possible number of events. Assume that the average rate at which events of interest occur is constant over a given interval of time and that event occurrences are randomly distributed over that interval.

Then, the probability dP of observing no events in a time interval dt is given by

$$dP(0; t, \tau) = -P(0; t, \tau)\, \frac{dt}{\tau}$$

where $P(x; t, \tau)$ is the probability of observing x events in the time interval t, $\tau$ is a constant proportionality factor that is associated with the mean time between events, and the minus sign accounts for the fact that increasing the differential time interval dt decreases the probability proportionally.

Integrating this equation yields the probability of observing no events within a time t to be

$$P(0; t, \tau) = P_0\, e^{-t/\tau}$$

where $P_0$, the constant of integration, is equal to 1 because $P(0; t, \tau) = 1$ at t = 0.

The probability $P(x; t, \tau)$ for observing x events in the time interval t can be evaluated by integrating the differential probability

$$d^x P(x; t, \tau) = \frac{e^{-t/\tau}}{x!} \prod_{i=1}^{x} \frac{dt_i}{\tau}$$

which is the product of the probabilities of observing each event in a different interval $dt_i$ and the probability $e^{-t/\tau}$ of not observing any other events in the remaining time.

The factor of x! in the denominator compensates for the ordering implicit in the probabilities $dP_i(1; t, \tau)$, as discussed in the preceding section on permutations and combinations.

Thus, the probability of observing x events in the time interval t is obtained by integration

$$P(x; t, \tau) = \frac{e^{-t/\tau}}{x!} \left( \frac{t}{\tau} \right)^x$$

or

$$P_P(x; \mu) = \frac{\mu^x}{x!}\, e^{-\mu} \tag{2.16}$$

which is the expression for the Poisson distribution, where $\mu = t/\tau$ is the average number of events observed in the time interval t.

Equation (2.16) represents a normalized probability function; that is, the sum of the function evaluated at each of the allowed values of the variable x is unity:

$$\sum_{x=0}^{\infty} P_P(x; \mu) = \sum_{x=0}^{\infty} \frac{\mu^x}{x!}\, e^{-\mu} = e^{-\mu} \sum_{x=0}^{\infty} \frac{\mu^x}{x!} = e^{-\mu} e^{\mu} = 1$$

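Mean and Standard Deviation

The Poisson distribution, like the binomial distribution, is a discrete distribution. That is, it is defined only at integral values of the variable x, although the parameter $\mu$ is a positive, real number.

The mean of the Poisson distribution is actually the parameter $\mu$ that appears in the probability function $P_P(x; \mu)$ of Equation (2.16). To verify this, we can evaluate the expectation value of x:

$$\langle x \rangle = \sum_{x=0}^{\infty} \left( x\, \frac{\mu^x}{x!}\, e^{-\mu} \right) = \mu e^{-\mu} \sum_{x=1}^{\infty} \frac{\mu^{x-1}}{(x-1)!} = \mu e^{-\mu} \sum_{y=0}^{\infty} \frac{\mu^y}{y!} = \mu$$

To find the standard deviation $\sigma$, the expectation value of the square of the deviations can be evaluated:

$$\sigma^2 = \langle (x-\mu)^2 \rangle = \sum_{x=0}^{\infty} \left[ (x-\mu)^2\, \frac{\mu^x}{x!}\, e^{-\mu} \right] = \mu$$

Thus, the standard deviation $\sigma$ is equal to the square root of the mean $\mu$, and the Poisson distribution has only a single parameter, $\mu$.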

Computation of the Poisson distribution by Equation (2.16) can be limited by the factorial function in the denominator.

The problem can be avoided by using logarithms or by using the recursion relations

$$P_P(0; \mu) = e^{-\mu}, \qquad P_P(x; \mu) = \frac{\mu}{x}\, P_P(x-1; \mu)$$

This form has the disadvantage that, in order to calculate the function for particular values of x and $\mu$, the function must be calculated at all lower values of x as well.

However, if the function is to be summed from x = 0 to some upper limit to obtain the summed probability or to generate the distribution for a Monte Carlo calculation (Chapter 5), the function must be calculated at all lower values of x anyway.
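A recursive evaluation of this kind might look like the sketch below. It is an illustration, not the book's routine: `poisson_table` is a hypothetical name, and it starts from the x = 0 term $e^{-\mu}$ and multiplies by $\mu/x$ at each step, so no factorial or large power is ever formed.

```python
from math import exp


def poisson_table(mu: float, x_max: int) -> list[float]:
    """Poisson probabilities for x = 0..x_max, built by recursion.

    Starts from P(0) = exp(-mu) and applies P(x) = P(x - 1) * mu / x.
    """
    probabilities = [exp(-mu)]
    for x in range(1, x_max + 1):
        probabilities.append(probabilities[-1] * mu / x)
    return probabilities


table = poisson_table(11.48, 40)  # mean taken from the 15-s counting data
for x in (0, 5, 11, 23):
    print(f"P({x}; 11.48) = {table[x]:.6f}")

# The recursion stays finite where mu**x alone would overflow a float.
large = poisson_table(100.0, 400)
print(f"sum for mu = 100 over x = 0..400: {sum(large):.9f}")
```

Because each call returns every term up to x_max, the same list can be summed directly when a cumulative probability is needed.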

Example 2.4. As part of an experiment to determine the mean life of radioactive isotopes of silver, students detected background counts from cosmic rays. (See Example 8.1.) They recorded the number of counts in their detector for a series of 100 2-s intervals, and found that the mean number of counts was 1.69 per interval. From the mean they estimated the standard deviation to be $\sigma = \sqrt{1.69} = 1.30$, compared to s = 1.29 from a direct calculation with Equation (1.9).

The students then repeated the exercise, this time recording the number of counts in 15-s intervals for 60 intervals, obtaining a mean of 11.48 counts per interval, with standard deviations $\sigma = \sqrt{11.48} = 3.39$ and s = 3.39.

Histograms of the two sets of data are shown in Figures 2.3 and 2.4. The calculated mean in each case was used as an estimate of the mean of the parent distribution to calculate a Poisson distribution for each data set. The distributions are shown as continuous curves, although only the points at integral values of the abscissa are physically significant.

The asymmetry of the distribution in Figure 2.3 is obvious, as is the fact that the mean does not coincide with the most probable value of x at the peak of the curve.

The curve of Figure 2.4, on the other hand, is almost symmetric about its mean and the data are consistent with the curve.

As $\mu$ increases, the symmetry of the Poisson distribution increases and the distribution becomes indistinguishable from the Gaussian distribution.

Summed Probability

We may want to know the probability of obtaining a sample value of x between limits $x_1$ and $x_2$ from a Poisson distribution with mean $\mu$. This probability is obtained by summing the values of the function calculated at the integral values of x between the two integral limits $x_1$ and $x_2$,

$$S_P(x_1, x_2; \mu) = \sum_{x=x_1}^{x_2} P_P(x; \mu)$$

More likely, we may want to find the probability of recording n or more events in a given interval when the mean number of events is $\mu$. This is just the sum

$$S_P(n, \infty; \mu) = \sum_{x=n}^{\infty} P_P(x; \mu) = 1 - \sum_{x=0}^{n-1} P_P(x; \mu) = 1 - e^{-\mu} \sum_{x=0}^{n-1} \frac{\mu^x}{x!} \tag{2.22}$$

In Example 2.4, the mean number of counts recorded in a 15-s time interval was $\bar{x} = 11.48$. In one of the intervals, 23 counts were recorded.

From Equation (2.22), the probability of collecting 23 or more events in a single 15-s time interval is approximately 0.0018, and the probability of this occurring in any one of 60 15-s time intervals is just the complement of the joint probability that 23 or more counts not be observed in any of the 60 time intervals, or $p = 1 - (1 - 0.0018)^{60} = 0.10$, or about 10%.

For large values of $\mu$, the probability sum of Equation (2.22) may be approximated by an integral of the Gaussian function.
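The two numbers quoted for the 15-s counting data can be reproduced with a short calculation. The sketch below is illustrative: `poisson_tail` is a hypothetical helper that evaluates the probability of n or more events as one minus the sum of the lower terms, using the recursion between successive Poisson terms.

```python
from math import exp


def poisson_tail(n: int, mu: float) -> float:
    """Probability of n or more events: 1 - sum of P(x; mu) for x < n."""
    term = exp(-mu)  # P(0; mu)
    below = 0.0
    for x in range(n):
        below += term          # add P(x; mu)
        term *= mu / (x + 1)   # advance to P(x + 1; mu)
    return 1.0 - below


mu = 11.48        # mean counts per 15-s interval
n_intervals = 60  # number of 15-s intervals recorded

p_single = poisson_tail(23, mu)
# Complement of "fewer than 23 counts in every one of the 60 intervals".
p_any = 1.0 - (1.0 - p_single) ** n_intervals

print(f"P(23 or more in one interval)       = {p_single:.4f}")
print(f"P(23 or more in any of 60 intervals) = {p_any:.2f}")
```

A count that looks very unlikely in a single interval is therefore not surprising once all 60 intervals are taken into account.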

2.3 GAUSSIAN OR NORMAL ERROR DISTRIBUTION

The Gaussian distribution is an approximation to the binomial distribution for the special limiting case where the number of possible different observations n becomes infinitely large and the probability of success for each is finitely large, so that $np \gg 1$.

It is also, as we observed, the limiting case for the Poisson distribution as $\mu$ becomes large.

There are several derivations of the Gaussian distribution from first principles, none of them as convincing as the fact that the distribution is reasonable, that it has a fairly simple analytic form, and that it is accepted by convention and experimentation to be the most likely distribution for most experiments.

In addition, it has the satisfying characteristic that the most probable estimate of the mean $\mu$ from a random sample of observations x is the average of those observations $\bar{x}$.

Characteristics

The Gaussian probability density is defined as

$$P_G(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2 \right] \tag{2.23}$$

This is a continuous function describing the probability of obtaining the value x in a random observation from a parent distribution with parameters $\mu$ and $\sigma$, corresponding to the mean and standard deviation, respectively.

Because the distribution is continuous, we must define an interval in which the value of the observation x will fall.

The probability density function is properly defined such that the probability $dP_G(x; \mu, \sigma)$ that the value of a random observation will fall within an interval dx around x is given by

$$dP_G(x; \mu, \sigma) = P_G(x; \mu, \sigma)\, dx \tag{2.24}$$

considering dx to be an infinitesimal differential, and the probability density function to be normalized, so that

$$\int_{-\infty}^{\infty} dP_G(x; \mu, \sigma) = \int_{-\infty}^{\infty} P_G(x; \mu, \sigma)\, dx = 1$$

The width of the curve is determined by the value of $\sigma$, such that for $x = \mu \pm \sigma$, the height of the curve is reduced to $e^{-1/2}$ of its value at the peak:

$$P_G(\mu \pm \sigma; \mu, \sigma) = e^{-1/2}\, P_G(\mu; \mu, \sigma)$$

The shape of the Gaussian distribution is shown in Figure 2.5. The curve displays the characteristic bell shape and symmetry about the mean $\mu$.

We can characterize a distribution by its full-width at half maximum $\Gamma$, often referred to as the half-width, defined as the range of x between values at which the probability $P_G(x; \mu, \sigma)$ is half its maximum value:

$$P_G(\mu \pm \tfrac{1}{2}\Gamma; \mu, \sigma) = \tfrac{1}{2}\, P_G(\mu; \mu, \sigma)$$

With this definition, we can determine from Equation (2.23) that

$$\Gamma = 2.354\,\sigma \tag{2.28}$$

As illustrated in Figure 2.5, tangents drawn along a portion of steepest descent of the curve intersect the curve at the $e^{-1/2}$ points $x = \mu \pm \sigma$ and intersect the x axis at the points $x = \mu \pm 2\sigma$.
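The width relations can be verified numerically. Setting the Gaussian exponent equal to ln(1/2) gives a full width at half maximum of $2\sqrt{2 \ln 2}\,\sigma$, which is the origin of the factor quoted as 2.354. The sketch below is an illustration with assumed parameter values; `gaussian_pdf` is a hypothetical helper.

```python
from math import exp, log, pi, sqrt


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian probability density with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))


mu, sigma = 10.0, 1.0  # example parameters only

# Full width at half maximum from solving exp(-d**2 / (2 sigma**2)) = 1/2.
fwhm = 2.0 * sqrt(2.0 * log(2.0)) * sigma

peak = gaussian_pdf(mu, mu, sigma)
ratio_half = gaussian_pdf(mu + fwhm / 2.0, mu, sigma) / peak
ratio_sigma = gaussian_pdf(mu + sigma, mu, sigma) / peak

print(f"FWHM / sigma          = {fwhm / sigma:.4f}")
print(f"height at mu + FWHM/2 = {ratio_half:.4f} of peak")
print(f"height at mu + sigma  = {ratio_sigma:.4f} of peak")
print(f"peak height           = {peak:.4f}")
```

The computed factor is about 2.3548, in agreement with the value in the text to the precision quoted there.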

Standard Gaussian Distribution

It is generally convenient to use a standard form of the Gaussian equation obtained by defining the dimensionless variable $z = (x - \mu)/\sigma$, because with this change of variable, we can write

$$P_G(z)\, dz = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{z^2}{2} \right) dz \tag{2.29}$$

Thus, from a single computer routine or a table of values of $P_G(z)$, we can find the Gaussian probability function $P_G(x; \mu, \sigma)$ for all values of the parameters $\mu$ and $\sigma$ by changing the variable and scaling the function by $1/\sigma$ to preserve the normalization.

Mean and Standard Deviation

The parameters $\mu$ and $\sigma$ in Equation (2.23) for the Gaussian probability density distribution correspond to the mean and standard deviation of the function. This equivalence can be verified by calculating $\mu$ and $\sigma$ with Equations (1.13) and (1.14) as the expectation values for the Gaussian function of x and $(x - \mu)^2$, respectively.

For a finite data sample, which is expected to follow the Gaussian probability density distribution, the mean and standard deviation can be calculated directly from Equations (1.1) and (1.9). The resulting values of $\bar{x}$ and s will be estimates of the mean $\mu$ and standard deviation $\sigma$. Values of $\bar{x}$ and s, obtained in this way from the original 50 time measurements in Example 1.2, were used as estimates of $\mu$ and $\sigma$ in Equation (2.23) to calculate the solid Gaussian curve in Figure 1.2.

The curve was scaled to have the same area as the histogram. The curve represents our estimate of the parent distribution based on our measurements of the sample.

Integral Probability

We are often interested in knowing the probability that a measurement will deviate from the mean by a specified amount $\Delta x$ or greater. The answer can be determined by evaluating numerically the integral

$$A_G(\Delta x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \int_{\mu - \Delta x}^{\mu + \Delta x} \exp\left[ -\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2 \right] dx$$

which gives the probability that any random value of x will deviate from the mean by less than $\pm \Delta x$. Because the probability function $P_G(x; \mu, \sigma)$ is normalized to unity, the probability that a measurement will deviate from the mean by more than $\Delta x$ is just $1 - A_G(\Delta x; \mu, \sigma)$.

Of particular interest are the probabilities associated with deviations of $\sigma$, $2\sigma$, and so forth from the mean, corresponding to 1, 2, and so on standard deviations.

We may also be interested in the probable error $\sigma_{pe}$, defined to be the absolute value of the deviation $|x - \mu|$ such that the probability for the deviation $|x_i - \mu|$ of any random observation to be smaller is 1/2.

That is, half the observations of an experiment would be expected to fall within the boundaries denoted by $\mu \pm \sigma_{pe}$.

If we use the standard form of the Gaussian distribution of Equation (2.29), we can calculate the integrated probability $A_G(z)$ in terms of the dimensionless variable $z = (x - \mu)/\sigma$:

$$A_G(z) = \frac{1}{\sqrt{2\pi}} \int_{-z}^{z} e^{-z'^2/2}\, dz' \tag{2.31}$$

where $z = \Delta x/\sigma$ measures the deviation from the mean in units of the standard deviation $\sigma$.

The integral of Equation (2.31) cannot be evaluated analytically, so in order to obtain the probability $A_G(\Delta x; \mu, \sigma)$ it is necessary either to expand the Gaussian function in a Taylor's series and integrate the series term by term, or to integrate numerically.

With modern computers, numerical integration is fast and accurate, and reliable results can be obtained from a simple quadratic integration (Appendix A.3).
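A numerical integration of this kind can be sketched with Simpson's rule, which fits a quadratic through successive triples of points. Whether this matches the routine of Appendix A.3 is an assumption; the function names and the number of intervals are illustrative, and the standard library error function is used only as an independent check.

```python
from math import erf, exp, pi, sqrt


def standard_gaussian(z: float) -> float:
    """Standard Gaussian density exp(-z**2 / 2) / sqrt(2 pi)."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def simpson(f, a: float, b: float, intervals: int = 200) -> float:
    """Composite Simpson's rule on [a, b]; intervals is forced to be even."""
    if intervals % 2:
        intervals += 1
    h = (b - a) / intervals
    total = f(a) + f(b)
    for i in range(1, intervals):
        # Interior points alternate between weights 4 (odd) and 2 (even).
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def integral_probability(z: float) -> float:
    """Probability of falling within z standard deviations of the mean."""
    return simpson(standard_gaussian, -z, z)


for z in (1.0, 2.0, 3.0):
    numeric = integral_probability(z)
    reference = erf(z / sqrt(2.0))  # closed form via the error function
    print(f"z = {z:.0f}: Simpson = {numeric:.6f}, erf check = {reference:.6f}")
```

With a few hundred intervals the two columns agree to many decimal places, which supports the remark that a simple quadratic scheme is adequate here.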

Tables and Graphs

The Gaussian probability density function $P_G(z)$ and the integral probability $A_G(z)$ are tabulated and plotted in Tables C.1 and C.2, respectively. From the integral probability Table C.2, we note that the probabilities are about 68% and 95% that a given measurement will fall within 1 and 2 standard deviations of the mean, respectively.

Similarly, by considering the 50% probability limit we can see that the probable error is given by $\sigma_{pe} = 0.6745\,\sigma$.
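Without a table, the 50% limit can be located by a simple root search. The sketch below is one possible approach, not the book's: it uses bisection on the integral probability written in terms of the standard library error function, and the function names and tolerance are assumptions.

```python
from math import erf, sqrt


def integral_probability(z: float) -> float:
    """Probability that a Gaussian observation lies within z sigma of the mean."""
    return erf(z / sqrt(2.0))


def probable_error_in_sigma(tolerance: float = 1e-10) -> float:
    """Find z with integral_probability(z) = 1/2 by bisection on [0, 1]."""
    low, high = 0.0, 1.0  # the probability is 0 at z = 0 and about 0.68 at z = 1
    while high - low > tolerance:
        mid = 0.5 * (low + high)
        if integral_probability(mid) < 0.5:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


z_pe = probable_error_in_sigma()
print(f"probable error = {z_pe:.4f} sigma")
print(f"within 1 sigma: {integral_probability(1.0):.4f}")
print(f"within 2 sigma: {integral_probability(2.0):.4f}")
```

Bisection is used because the integral probability increases monotonically with z, so the bracket always contains the root.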

Comparison of Gaussian and Poisson Distributions

A comparison of the Poisson and Gaussian curves reveals the nature of the Poisson distribution. It is the appropriate distribution for describing experiments in which the possible values of the data are strictly bounded on one side but not on the other.

The Poisson curve of Figure 2.3 exhibits the typical Poisson shape. The Poisson curve of Figure 2.4 differs little from the corresponding Gaussian curve of Figure 2.5, indicating that for large values of the mean, the Gaussian distribution becomes an acceptable description of the Poisson distribution.

Because, in general, the Gaussian distribution is more convenient to calculate than the Poisson distribution, it is often the preferred choice.

However, one should remember that the Poisson distribution is only defined at 0 and positive integral values of the variable x, whereas the Gaussian function is defined at all values of x.

2.4 LORENTZIAN DISTRIBUTION

There are many other distributions that appear in scientific research. Some are phenomenological distributions, created to parameterize certain data distributions. Others are well grounded in theory. One such distribution in the latter category is the Lorentzian distribution, similar but unrelated to the binomial distribution.

The Lorentzian distribution is an appropriate distribution for describing data corresponding to resonant behavior, such as the variation with energy of the cross section of a nuclear or particle reaction or absorption of radiation in the Mössbauer effect.

The Lorentzian probability density function $P_L(x; \mu, \Gamma)$, also called the Cauchy distribution, is defined as

$$P_L(x; \mu, \Gamma) = \frac{1}{\pi}\, \frac{\Gamma/2}{(x-\mu)^2 + (\Gamma/2)^2} \tag{2.32}$$

This distribution is symmetric about its mean $\mu$ with a width characterized by its half-width $\Gamma$.

The most striking difference between it and the Gaussian distribution is that it does not diminish to 0 as rapidly; the behavior for large deviations is proportional to the inverse square of the deviation, rather than exponentially related to the square of the deviation.

As with the Gaussian distribution, the Lorentzian distribution function is a continuous function, and the probability of observing a value x must be related to the interval within which the observation may fall.

The probability $dP_L(x; \mu, \Gamma)$ for an observation to fall within an infinitesimal differential interval dx around x is given by the product of the probability density function $P_L(x; \mu, \Gamma)$ and the size of the interval dx:

$$dP_L(x; \mu, \Gamma) = P_L(x; \mu, \Gamma)\, dx$$

The normalization of the probability density function $P_L(x; \mu, \Gamma)$ is such that the integral of the probability over all possible values of x is unity:

$$\int_{-\infty}^{\infty} P_L(x; \mu, \Gamma)\, dx = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{1}{1 + z^2}\, dz = 1$$

where $z = (x - \mu)/(\Gamma/2)$.

Mean and Half-Width

The mean $\mu$ of the Lorentzian distribution is given as one of the parameters in Equation (2.32). It is obvious from the symmetry of the distribution that $\mu$ must be equal to the mean as well as to the median and to the most probable value.

The standard deviation is not defined for the Lorentzian distribution as a consequence of its slowly decreasing behavior for large deviations. If we attempt to evaluate the expectation value for the square of the deviations

$$\sigma^2 = \langle (x-\mu)^2 \rangle = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{(x-\mu)^2\, (\Gamma/2)}{(x-\mu)^2 + (\Gamma/2)^2}\, dx = \frac{\Gamma^2}{4\pi} \int_{-\infty}^{\infty} \frac{z^2}{1 + z^2}\, dz$$

we find that the integral is unbounded: the integral does not converge for large deviations.

Although it is possible to calculate a sample standard deviation by evaluating the average value of the square of the deviations from the sample mean, this calculation has no meaning and will not converge to a fixed value as the number of samples increases.
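The divergence can be made visible by cutting the integral off at a finite deviation and letting the cutoff grow. The sketch below is an illustration: it uses the elementary antiderivative of $z^2/(1+z^2)$, which is $z - \arctan z$, and both the function name `truncated_second_moment` and the half-width value are assumptions chosen for the example.

```python
from math import atan, pi


def truncated_second_moment(gamma: float, cutoff: float) -> float:
    """Second moment of a Lorentzian restricted to |x - mu| <= cutoff.

    With z = (x - mu) / (gamma / 2), the integral of z**2 / (1 + z**2)
    from -z_max to z_max equals 2 * (z_max - atan(z_max)).
    """
    z_max = cutoff / (gamma / 2.0)
    return gamma**2 / (4.0 * pi) * 2.0 * (z_max - atan(z_max))


gamma = 2.354  # example half-width
for cutoff in (1e1, 1e2, 1e3, 1e4, 1e5, 1e6):
    value = truncated_second_moment(gamma, cutoff)
    print(f"cutoff = {cutoff:9.0f}  truncated <(x - mu)^2> = {value:12.2f}")
```

The truncated value grows in proportion to the cutoff instead of settling toward a limit, which is the behavior described for the full integral.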

The width of the Lorentzian distribution is instead characterized by the full width at half maximum $\Gamma$, generally called the half-width.

This parameter is defined such that when $x = \mu \pm \Gamma/2$, the probability density function is equal to one-half its maximum value, or $P_L(\mu \pm \Gamma/2; \mu, \Gamma) = \tfrac{1}{2} P_L(\mu; \mu, \Gamma)$.

Thus, the half-width $\Gamma$ is the full width of the curve measured between the levels of half maximum probability.

We can verify that this identification of $\Gamma$ with the full-width at half maximum is correct by substituting $x = \mu \pm \Gamma/2$ into Equation (2.32).

The Lorentzian and Gaussian distributions are shown for comparison in Figure 2.6, for $\mu = 10$ and $\Gamma = 2.354$ (corresponding to $\sigma = 1$ for the Gaussian function). Both distributions are normalized to unit area according to their definitions in Equations (2.23) and (2.32). For both curves, the value of the maximum probability is inversely proportional to the half-width. This results in a peak value of $2/(\pi\Gamma) = 0.270$ for the Lorentzian distribution and a peak value of $1/(\sigma\sqrt{2\pi}) = 0.399$ for the Gaussian distribution.

Except for the normalization, the Lorentzian distribution is equivalent to the dispersion relation that is used, for example, in describing the cross section of a nuclear reaction for a Breit-Wigner resonance:

$$\sigma(E) \propto \frac{1}{(E - E_0)^2 + (\Gamma/2)^2}$$

where $E_0$ is the resonance energy.
