
The Poisson Distribution


Attributes of a Poisson Experiment

A Poisson experiment is a statistical experiment that has the following properties:

The experiment results in outcomes that can be classified as successes or failures.
The average number of successes (μ) that occurs in a specified region is known.
The probability that a success will occur is proportional to the size of the region.
The probability that a success will occur in an extremely small region is virtually zero.

Note that the specified region could take many forms. For instance, it could be a length, an area, a volume, a period of time, etc.

Notation

The following notation is helpful when we talk about the Poisson distribution.

e: A constant equal to approximately 2.71828. (Actually, e is the base of the natural logarithm system.)
μ: The mean number of successes that occur in a specified region.
x: The actual number of successes that occur in a specified region.
P(x; μ): The Poisson probability that exactly x successes occur in a Poisson experiment, when the mean number of successes is μ.

Poisson Distribution

A Poisson random variable is the number of successes that result from a Poisson experiment. The probability distribution of a Poisson random variable is called a Poisson distribution. Given the mean number of successes (μ) that occur in a specified region, we can compute the Poisson probability with the following formula.

Poisson Formula. Suppose we conduct a Poisson experiment in which the average number of successes within a given region is μ. Then the Poisson probability is

$$P(x; \mu) = \frac{e^{-\mu} \mu^{x}}{x!}$$

where x is the actual number of successes that result from the experiment, and e is approximately equal to 2.71828.
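The formula translates directly into code. Here is a minimal Python sketch using only the standard-library `math` module. The function name `poisson_pmf` is our own choice, not a library API. It is meant for moderate x, because converting `math.factorial(x)` to a float overflows once x exceeds about 170.

```python
import math


def poisson_pmf(x: int, mu: float) -> float:
    """Poisson probability P(x; mu) = e^(-mu) * mu^x / x!."""
    if x < 0:
        return 0.0  # a count of successes can never be negative
    return math.exp(-mu) * mu ** x / math.factorial(x)


# Example 1 below: mu = 2 homes sold per day on average, x = 3 homes tomorrow
print(round(poisson_pmf(3, 2.0), 3))  # 0.18
```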

Properties of the Poisson Distribution

The Poisson distribution has the following properties:

The mean of the distribution is equal to μ.
The variance is also equal to μ.
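As a numerical sanity check (not a proof), we can truncate the series for E(X) and E(X²) at a large cutoff and confirm that the mean and the variance both come out equal to μ. The cutoff `kmax = 100` is an arbitrary choice. It leaves a negligible tail for small and moderate μ.

```python
import math


def poisson_mean_var(mu, kmax=100):
    """Approximate E(X) and Var(X) for X ~ Poisson(mu) by summing k = 0..kmax."""
    probs = [math.exp(-mu) * mu ** k / math.factorial(k) for k in range(kmax + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    second_moment = sum(k * k * p for k, p in enumerate(probs))
    return mean, second_moment - mean ** 2  # Var(X) = E(X^2) - E(X)^2


mean, var = poisson_mean_var(2.0)
print(round(mean, 6), round(var, 6))  # 2.0 2.0
```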

Example 1

The average number of homes sold by the Acme Realty company is 2 homes per day. What is the probability that exactly 3 homes will be sold tomorrow?

Solution: This is a Poisson experiment in which we know the following:

μ = 2, since 2 homes are sold per day, on average.
x = 3, since we want to find the likelihood that 3 homes will be sold tomorrow.
e = 2.71828, since e is a constant equal to approximately 2.71828.

We plug these values into the Poisson formula as follows:

$$P(x; \mu) = \frac{e^{-\mu} \mu^{x}}{x!}$$
$$P(3; 2) = \frac{(2.71828^{-2})(2^{3})}{3!} = \frac{(0.13534)(8)}{6} = 0.180$$

Thus, the probability of selling 3 homes tomorrow is 0.180.

Cumulative Poisson Probability

A cumulative Poisson probability refers to the probability that the Poisson random variable is greater than some specified lower limit and less than some specified upper limit.

Example 2

Suppose the average number of lions seen on a 1-day safari is 5. What is the probability that tourists will see fewer than four lions on the next 1-day safari?

Solution: This is a Poisson experiment in which we know the following:

μ = 5, since 5 lions are seen per safari, on average.
x = 0, 1, 2, or 3, since we want to find the likelihood that tourists will see fewer than 4 lions; that is, we want the probability that they will see 0, 1, 2, or 3 lions.
e = 2.71828, since e is a constant equal to approximately 2.71828.

To solve this problem, we need to find the probability that tourists will see 0, 1, 2, or 3 lions. Thus, we need to calculate the sum of four probabilities: P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5). To compute this sum, we use the Poisson formula:

$$P(x \le 3; 5) = P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5)$$
$$P(x \le 3; 5) = \frac{e^{-5} 5^{0}}{0!} + \frac{e^{-5} 5^{1}}{1!} + \frac{e^{-5} 5^{2}}{2!} + \frac{e^{-5} 5^{3}}{3!}$$
$$P(x \le 3; 5) = \frac{(0.006738)(1)}{1} + \frac{(0.006738)(5)}{1} + \frac{(0.006738)(25)}{2} + \frac{(0.006738)(125)}{6}$$
$$P(x \le 3; 5) = 0.006738 + 0.03369 + 0.084225 + 0.140375 = 0.2650$$

Thus, the probability of seeing no more than 3 lions is 0.2650.
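A cumulative probability like this one is a running sum of Poisson terms. The sketch below builds each term from the previous one using the ratio P(k; μ)/P(k − 1; μ) = μ/k, which follows from the Poisson formula. This avoids computing large powers and factorials separately. The function name `poisson_cdf` is illustrative.

```python
import math


def poisson_cdf(x: int, mu: float) -> float:
    """P(X <= x) for X ~ Poisson(mu), summing P(0; mu) + ... + P(x; mu)."""
    if x < 0:
        return 0.0
    term = math.exp(-mu)  # P(0; mu)
    total = term
    for k in range(1, x + 1):
        term *= mu / k  # P(k; mu) = P(k - 1; mu) * mu / k
        total += term
    return total


# Example 2: "fewer than four lions" means X <= 3, with mu = 5
print(f"{poisson_cdf(3, 5.0):.4f}")  # 0.2650
```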

The Poisson Distribution

The Poisson distribution arises in a number of contexts as the distribution of a random number of points. Examples include the number of clicks of a Geiger counter in one second, the number of raisins in a box of raisin bran, and the number of blades of grass in a randomly chosen square inch of lawn. The formula for the probability of observing k of whatever is being counted, when the expected number is m, is

$$p(k) = \frac{m^{k} e^{-m}}{k!}$$

where e is the base of the natural logarithms and k! indicates the factorial function (use the $e^x$ key on a scientific calculator to calculate $e^{-m}$ and the $x!$ key to calculate $k!$). Theoretically any count from zero to infinity (including zero) is possible, but the probability of large counts falls off very rapidly.

In a lottery, the number of winners cannot have an exact Poisson distribution, for two reasons.

First, the number of winners cannot be more than the number of tickets sold, whereas the Poisson distribution gives nonzero probability to arbitrarily large numbers of winners.

Second, the players do not choose their lottery numbers completely at random. If you choose a popular number, you will have to share with many other winners if you win. Conversely, if you can find a number no one else likes and play that, you are guaranteed not to have to share the jackpot if you win. The Poisson distribution assumes every player chooses lottery numbers completely at random.

The first issue is not a serious problem; the Poisson distribution would be an extremely good approximation if it were not for the other issue. The second is more serious. Many players (about 70%) buy quick picks, which are completely random, but other players choose some number they think is lucky, and that is not random. If every player chose a quick pick, the Poisson distribution would be an almost perfect approximation. Since they don't, it is not quite right. However, we will assume the Poisson distribution is correct to keep things simple.

The unconditional distribution of the number of jackpot winners is the same as the conditional distribution of the number of other winners, given that you win. This follows from the assumption that all players choose their numbers completely at random, which the Poisson distribution requires. Under that assumption, whether or not you win doesn't change the probability of anyone else winning. Everyone has the same 1 in 146.1 million chance of winning, and their ticket choice had nothing to do with yours.

Our Expected Winnings

If we win and there are k other winners, then the jackpot gets split k + 1 ways, and the amount we win is $J/(k+1)$, where J is the size of the jackpot.

Our expected winnings are calculated just like any other expectation. Multiply the amount we win in each case, which is $J/(k+1)$, by the probability of that case, which is $m^{k} e^{-m}/k!$, and sum. The sum runs over k from zero to infinity, so summing this infinite series appears to require calculus. Fortunately, a trick lets us find the expectation without doing the infinite sum. The terms in the infinite sum are

$$a_k = \frac{J}{k+1} \cdot \frac{m^{k} e^{-m}}{k!} = \frac{J m^{k} e^{-m}}{(k+1)!}$$

Let W denote the sum of the $a_k$ as k runs from zero to infinity, which is the expectation we are trying to calculate. If we multiply each term by m we get

$$m a_k = \frac{J m^{k+1} e^{-m}}{(k+1)!} = J p(k+1)$$

where p(k) is the Poisson probability defined above. By the properties of probability, the probabilities p(k) must sum to one as k goes from zero to infinity. Because of the k + 1 above, the first term is J p(1). If we were to add an additional term J p(0), the series would sum to J (because the probabilities sum to one). Thus the series sums to

$$J [1 - p(0)] = m W$$

(Recall that we multiplied by m, so the sum is mW rather than W.) Solving for W gives

$$W = \frac{J [1 - p(0)]}{m} = \frac{J (1 - e^{-m})}{m}$$
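The closed form can be checked numerically by summing the series directly. In the sketch below, the jackpot J = 1 and the values of m are arbitrary illustrative inputs, not actual lottery figures. The series is truncated at `kmax` terms, which is ample for these values. The function names are our own.

```python
import math


def expected_share_series(J: float, m: float, kmax: int = 200) -> float:
    """Sum of J/(k+1) * p(k) for k = 0..kmax, with p(k) = m^k e^(-m) / k!."""
    total = 0.0
    p = math.exp(-m)  # p(0)
    for k in range(kmax + 1):
        total += J / (k + 1) * p
        p *= m / (k + 1)  # p(k+1) = p(k) * m / (k+1)
    return total


def expected_share_closed(J: float, m: float) -> float:
    """Closed form W = J (1 - e^(-m)) / m, valid for m > 0."""
    return J * (1 - math.exp(-m)) / m


for m in (0.1, 0.5, 2.0):
    print(m, expected_share_series(1.0, m), expected_share_closed(1.0, m))
```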

A discrete random variable X with a probability distribution function (p.d.f.) of the form

$$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$

is said to be a Poisson random variable with parameter λ. We write X ~ Po(λ).

Expectation and Variance

If X ~ Po(λ), then E(X) = λ and Var(X) = λ.

Sums of Poisson Random Variables

Suppose X and Y are independent Poisson random variables with parameters λ and μ respectively. Then X + Y has a Poisson distribution with parameter λ + μ. In other words:

If X ~ Po(λ) and Y ~ Po(μ), then X + Y ~ Po(λ + μ).
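This result can be checked numerically by convolution: P(X + Y = n) = Σ P(X = j) P(Y = n − j), summed over j = 0, ..., n. The parameters λ = 1.5 and μ = 2.5 below are arbitrary illustrative values, and the function names are our own.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


def pmf_of_sum(n: int, lam: float, mu: float) -> float:
    """P(X + Y = n) for independent X ~ Po(lam), Y ~ Po(mu), via convolution."""
    return sum(poisson_pmf(j, lam) * poisson_pmf(n - j, mu) for j in range(n + 1))


lam, mu = 1.5, 2.5
for n in range(6):
    # the two columns should agree: convolution vs. Po(lam + mu)
    print(n, round(pmf_of_sum(n, lam, mu), 6), round(poisson_pmf(n, lam + mu), 6))
```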

Random Events

The Poisson distribution is useful because many random events follow it. Suppose events occur independently of one another, one at a time, and at a constant average rate, with a mean of λ occurrences in a given time period. Then the number of occurrences within that time period follows a Poisson distribution with parameter λ. For example, the occurrence of earthquakes could be considered a random event. If there are, on average, 5 major earthquakes each year, then the number of major earthquakes in any given year will have a Poisson distribution with parameter 5.

Example

There are 50 misprints in a book which has 250 pages. Find the probability that page 100 has no misprints.

The average number of misprints on a page is 50/250 = 0.2. Therefore, if we let X be the random variable denoting the number of misprints on a page, X will follow a Poisson distribution with parameter λ = 0.2.

$$P(X = 0) = \frac{e^{-0.2} (0.2)^{0}}{0!} = 0.819 \text{ (3 s.f.)}$$

Binomial Approximation

The Poisson distribution can be used as an approximation to the binomial distribution. A binomial distribution with parameters n and p can be approximated by a Poisson distribution with parameter np. The approximation is good when n is large and p is small.
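The misprint example above can be viewed as a binomial situation. This requires assuming that each of the 50 misprints lands on any given page independently with probability 1/250, an assumption the text does not state. The sketch compares the exact binomial probabilities for page 100 with the Poisson approximation using parameter np = 0.2. `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


n, p = 50, 1 / 250  # assumed: each misprint is on page 100 with probability 1/250
for k in range(4):
    exact = binomial_pmf(k, n, p)
    approx = poisson_pmf(k, n * p)  # Poisson parameter np = 0.2
    print(k, f"{exact:.4f}", f"{approx:.4f}")
```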

The Probability Density Function

We have shown that the kth arrival time in the Poisson process has the gamma probability density function with shape parameter k and rate parameter r:

$$f^{*k}(t) = \frac{r^{k} t^{k-1}}{(k-1)!} e^{-r t}, \quad t \ge 0$$

Recall also that at least k arrivals come in the interval (0, t] if and only if the kth arrival occurs by time t:

$$\{N_t \ge k\} = \{T_k \le t\}$$

1. Use integration by parts to show that

$$\mathbb{P}(N_t \ge k) = \int_0^t f^{*k}(s)\,ds = 1 - \sum_{j=0}^{k-1} e^{-r t} \frac{(r t)^{j}}{j!}, \quad k \in \mathbb{N}_+$$

2. Use the result of Exercise 1 to show that the probability density function of the number of arrivals in the interval (0, t] is

$$\mathbb{P}(N_t = k) = e^{-r t} \frac{(r t)^{k}}{k!}, \quad k \in \mathbb{N}$$

The corresponding distribution is called the Poisson distribution with parameter rt; the distribution is named after Siméon Poisson.

3. In the Poisson experiment, vary r and t with the scroll bars and note the shape of the density function. Now with r = 2 and t = 3, run the experiment 1000 times with an update frequency of 10 and watch the apparent convergence of the relative frequency function to the density function.
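Exercises 1 and 2 can be checked numerically. The sketch below integrates the gamma density $f^{*k}$ over (0, t] with the composite Simpson's rule and compares the result with the closed-form sum from Exercise 1. Simpson's rule is a numerical method we chose for the check; it is not part of the exercise. The values r = 2 and t = 3 match Exercise 3. The helper names are our own.

```python
import math


def gamma_density(s: float, k: int, r: float) -> float:
    """f^{*k}(s) = r^k s^(k-1) e^(-r s) / (k-1)!, the density of the k-th arrival time."""
    return r ** k * s ** (k - 1) * math.exp(-r * s) / math.factorial(k - 1)


def at_least_k_closed(k: int, r: float, t: float) -> float:
    """P(N_t >= k) = 1 - sum_{j=0}^{k-1} e^(-r t) (r t)^j / j!  (Exercise 1)."""
    return 1 - sum(math.exp(-r * t) * (r * t) ** j / math.factorial(j) for j in range(k))


def at_least_k_simpson(k: int, r: float, t: float, n: int = 2000) -> float:
    """Integral of gamma_density over [0, t] by composite Simpson's rule (n even)."""
    h = t / n
    total = gamma_density(0.0, k, r) + gamma_density(t, k, r)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * gamma_density(i * h, k, r)
    return total * h / 3


r, t = 2.0, 3.0
for k in range(1, 6):
    # Exercise 2: P(N_t = k) = P(N_t >= k) - P(N_t >= k + 1)
    point = at_least_k_closed(k, r, t) - at_least_k_closed(k + 1, r, t)
    print(k, at_least_k_simpson(k, r, t), at_least_k_closed(k, r, t), point)
```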

The Poisson distribution is one of the most important in probability. In general, a discrete random variable N in an experiment is said to have the Poisson distribution with parameter c > 0 if it has the probability density function

$$g(k) = e^{-c} \frac{c^{k}}{k!}, \quad k \in \mathbb{N}$$

4. Show directly that g is a valid probability density function.

5. Show that
a. $g(n - 1) < g(n)$ if and only if $n < c$.
b. g at first increases and then decreases, and thus the distribution is unimodal.
c. If c is not an integer, there is a single mode at $\lfloor c \rfloor$. If c is an integer, there are two modes, at $c - 1$ and $c$.

6. Suppose that requests to a web server follow the Poisson model with rate r = 5 per minute. Find the probability that there will be at least 8 requests in a 2 minute period.

7. Defects in a certain type of wire follow the Poisson model with rate 1.5 per meter. Find the probability that there will be no more than 4 defects in a 2 meter piece of the wire.

Moments

Suppose that N has the Poisson distribution with parameter c. The following exercises give the mean, variance, and probability generating function of N.

8. Show that $\mathbb{E}(N) = c$.
9. Show that $\operatorname{var}(N) = c$.
10. Show that $\mathbb{E}(u^N) = e^{c(u - 1)}$ for $u \in \mathbb{R}$.

Returning to the Poisson process $\{N_t : t \ge 0\}$ with rate parameter r, it follows that $\mathbb{E}(N_t) = r t$ and $\operatorname{var}(N_t) = r t$ for $t \ge 0$. Once again, we see that r can be interpreted as the average arrival rate. In an interval of length t, we expect about rt arrivals.

11. In the Poisson experiment, vary r and t with the scroll bars and note the location and size of the mean/standard deviation bar. Now with r = 3 and t = 4, run the experiment 1000 times with an update frequency of 10 and watch the apparent convergence of the sample mean and standard deviation to the distribution mean and standard deviation, respectively.

12. Suppose that customers arrive at a service station according to the Poisson model, at a rate of r = 4. Find the mean and standard deviation of the number of customers in an 8 hour period.
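Two parts of this section are easy to check by computer. First, the sketch finds the modes in Exercise 5 by exact rational comparison of the unnormalized weights $c^k/k!$, using Python's `fractions.Fraction`. This way, ties at integer c are detected exactly rather than lost to rounding. Second, it evaluates the probabilities asked for in Exercises 6 and 7, with parameters c = rt = 5 × 2 = 10 and c = 1.5 × 2 = 3. The helper names are our own.

```python
import math
from fractions import Fraction


def poisson_cdf(n, c):
    """P(N <= n) for N ~ Poisson(c)."""
    return sum(math.exp(-c) * c ** k / math.factorial(k) for k in range(n + 1))


def modes(c):
    """Modes of Poisson(c) for c > 0, comparing c^k / k! exactly (e^{-c} cancels)."""
    c = Fraction(c)
    kmax = math.floor(c) + 2  # weights decrease once k > c, so this range suffices
    weights = [c ** k / math.factorial(k) for k in range(kmax + 1)]
    best = max(weights)
    return [k for k, w in enumerate(weights) if w == best]


print(modes(Fraction(7, 2)), modes(4))  # [3] [3, 4]
print(1 - poisson_cdf(7, 10.0))  # Exercise 6: at least 8 requests in 2 minutes
print(poisson_cdf(4, 3.0))       # Exercise 7: at most 4 defects in 2 meters
```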
Stationary, Independent Increments

Let us see what the basic regenerative assumption of the Poisson process means in terms of the counting variables $\{N_t : t \ge 0\}$.

13. Show that if s < t, then $N_t - N_s$ is the number of arrivals in the interval (s, t].

Recall that our basic assumption is that the process essentially starts over at time s and the behavior after time s is independent of the behavior before time s.

14. Argue that:
a. $N_t - N_s$ has the same distribution as $N_{t-s}$, namely Poisson with parameter $r(t - s)$.
b. $N_t - N_s$ and $N_s$ are independent.

15. Suppose that N and M are independent random variables, and that N has the Poisson distribution with parameter c and M has the Poisson distribution with parameter d. Show that N + M has the Poisson distribution with parameter c + d.
a. Give a probabilistic proof, based on the Poisson process.
b. Give an analytic proof using probability density functions.
c. Give an analytic proof using probability generating functions.

16. In the Poisson experiment, select r = 1 and t = 3. Run the experiment 1000 times, updating after each run. By computing the appropriate relative frequency functions, investigate empirically the independence of the random variables $N_1$ and $N_3 - N_1$.
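Exercise 16 can be imitated without the applet by simulating the process directly. As in the Poisson model, the sketch assumes that interarrival times are independent exponential variables with rate r, generated with `random.expovariate`. It tallies $N_1$ and $N_3 - N_1$ over many runs. It then reports the largest gap between the empirical joint frequencies and the product of the empirical marginals, which should be small if the increments are independent. The run count, seed, and helper names are arbitrary choices of ours.

```python
import random
from collections import Counter


def counts_at(times, r, rng):
    """Simulate one path of a rate-r Poisson process; return N_s for each s in times."""
    horizon = max(times)
    arrivals = []
    t = rng.expovariate(r)  # first arrival time
    while t <= horizon:
        arrivals.append(t)
        t += rng.expovariate(r)  # add an exponential interarrival time
    return [sum(1 for a in arrivals if a <= s) for s in times]


def independence_gap(r=1.0, runs=20000, seed=1):
    """Max |freq(N_1=i, N_3-N_1=j) - freq(N_1=i) * freq(N_3-N_1=j)|, plus mean of N_3-N_1."""
    rng = random.Random(seed)
    joint = Counter()
    for _ in range(runs):
        n1, n3 = counts_at([1.0, 3.0], r, rng)
        joint[(n1, n3 - n1)] += 1
    first, second = Counter(), Counter()
    for (i, j), count in joint.items():
        first[i] += count
        second[j] += count
    mean_increment = sum(j * c for j, c in second.items()) / runs
    gap = max(abs(joint[(i, j)] / runs - (first[i] / runs) * (second[j] / runs))
              for i in first for j in second)
    return gap, mean_increment


gap, mean_increment = independence_gap()
print(gap, mean_increment)  # gap near 0; mean of N_3 - N_1 near r(3 - 1) = 2
```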

Normal Approximation

Now note that for $k \in \mathbb{N}_+$,

$$N_k = N_1 + (N_2 - N_1) + \cdots + (N_k - N_{k-1})$$

The random variables in the sum on the right are independent, and each has the Poisson distribution with parameter r.

17. Use the central limit theorem to show that the distribution of the standardized variable below converges to the standard normal distribution as $k \to \infty$:

$$Z_k = \frac{N_k - k r}{\sqrt{k r}}$$

A bit more generally, the same result is true with the integer k replaced by the positive real number c. Thus, if N has the Poisson distribution with parameter c, and c is large, then the distribution of N is approximately normal with mean c and standard deviation $\sqrt{c}$. When using the normal approximation, we should remember to use the continuity correction, since the Poisson is a discrete distribution.

18. In the Poisson experiment, set r = 1 and t = 1. Increase r and t and note how the graph of the probability density function becomes more bell-shaped.

19. In the Poisson experiment, set r = 5 and t = 4. Run the experiment 1000 times with an update frequency of 100. Compute and compare the following:
a. $\mathbb{P}(15 \le N_4 \le 22)$.
b. The relative frequency of the event $\{15 \le N_4 \le 22\}$.
c. The normal approximation to $\mathbb{P}(15 \le N_4 \le 22)$.

20. Suppose that requests to a web server follow the Poisson model with rate r = 5. Compute the normal approximation to the probability that there will be at least 280 requests in a 1 hour period.
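The following sketch compares an exact Poisson probability with its continuity-corrected normal approximation. It uses Exercise 19 (c = rt = 5 × 4 = 20) and Exercise 20. For Exercise 20 we assume the rate r = 5 is per minute, so c = 5 × 60 = 300 for one hour. The standard normal CDF is computed from `math.erf`. The Poisson probabilities are evaluated in log space with `math.lgamma` to avoid overflow when c is large. The helper names are our own.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def poisson_pmf(k, c):
    """e^(-c) c^k / k!, computed in log space (requires c > 0)."""
    return math.exp(-c + k * math.log(c) - math.lgamma(k + 1))


def exact_between(a, b, c):
    """P(a <= N <= b) for N ~ Poisson(c)."""
    return sum(poisson_pmf(k, c) for k in range(a, b + 1))


def normal_between(a, b, c):
    """Normal approximation to P(a <= N <= b) with the continuity correction."""
    sd = math.sqrt(c)
    return normal_cdf((b + 0.5 - c) / sd) - normal_cdf((a - 0.5 - c) / sd)


# Exercise 19: N_4 with r = 5, t = 4, so c = 20
print(exact_between(15, 22, 20.0), normal_between(15, 22, 20.0))

# Exercise 20 (assuming r = 5 per minute): P(N >= 280) with c = 300
c = 300.0
print(1 - normal_cdf((279.5 - c) / math.sqrt(c)))
```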

Conditional Distributions

Consider again the Poisson model with arrival time sequence $(T_1, T_2, \ldots)$ and counting process $\{N_t : t \ge 0\}$.

21. Let t > 0. Show that the conditional distribution of $T_1$ given $N_t = 1$ is uniform on the interval (0, t). Interpret the result.

22. More generally, show that given $N_t = n$, the conditional distribution of $(T_1, \ldots, T_n)$ is the same as the distribution of the order statistics of a random sample of size n from the uniform distribution on the interval (0, t).

Note that the conditional distribution in the last exercise is independent of the rate r. This result means that, in a sense, the Poisson model gives the most random distribution of points in time.
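Exercise 21 can be explored by simulation. Generate many paths of the process on (0, t], keep only those with exactly one arrival, and look at where that arrival fell. If $T_1$ given $N_t = 1$ is uniform on (0, t), the fraction of kept paths with $T_1 \le s$ should be close to s/t. In this sketch the rate r = 0.4 and horizon t = 5 are arbitrary choices (the result should not depend on r). Paths are generated from exponential interarrival times, as in the Poisson model. The helper name is our own.

```python
import random


def first_arrival_given_one(r, t, runs, seed=2):
    """Collect T_1 from simulated paths that have exactly one arrival in (0, t]."""
    rng = random.Random(seed)
    kept = []
    for _ in range(runs):
        first = rng.expovariate(r)
        second = first + rng.expovariate(r)
        if first <= t < second:  # exactly one arrival in (0, t]
            kept.append(first)
    return kept


kept = first_arrival_given_one(r=0.4, t=5.0, runs=40000)
for s in (1.0, 3.0):
    # under the uniform law, P(T_1 <= s | N_t = 1) = s / t
    print(s, sum(1 for x in kept if x <= s) / len(kept), s / 5.0)
```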

23. Suppose that requests to a web server follow the Poisson model, and that 1 request comes in a five minute period. Find the probability that the request came during the first 3 minutes of the period.

24. In the Poisson experiment, set r = 1 and t = 2. Run the experiment 1000 times, updating after each run. Compute the appropriate relative frequency functions and investigate empirically the theoretical result in Exercise 23.

25. Suppose that 0 < s < t and that n is a positive integer. Show that the conditional distribution of $N_s$ given $N_t = n$ is binomial with trial parameter n and success parameter $p = s/t$. Note that the conditional distribution is independent of the rate r. Interpret the result.

26. Suppose that requests to a web server follow the Poisson model, and that 10 requests come during a 5 minute period. Find the probability that at least 4 requests came during the first 3 minutes of the period.
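The binomial result in Exercise 25 can be confirmed exactly from stationary, independent increments:

$\mathbb{P}(N_s = k \mid N_t = n) = \mathbb{P}(N_s = k)\,\mathbb{P}(N_t - N_s = n - k) / \mathbb{P}(N_t = n)$.

The sketch evaluates this ratio for a few arbitrary rates r and compares it with the binomial probabilities for p = s/t. It then applies the binomial form to Exercise 26 (n = 10, s = 3, t = 5). The helper names are our own.

```python
import math


def poisson_pmf(k, c):
    return math.exp(-c) * c ** k / math.factorial(k)


def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def conditional_pmf(k, n, s, t, r):
    """P(N_s = k | N_t = n), built from the Poisson laws of N_s and N_t - N_s."""
    joint = poisson_pmf(k, r * s) * poisson_pmf(n - k, r * (t - s))
    return joint / poisson_pmf(n, r * t)


n, s, t = 10, 3.0, 5.0
for r in (0.5, 2.0, 7.0):  # the conditional law should not depend on r
    gap = max(abs(conditional_pmf(k, n, s, t, r) - binomial_pmf(k, n, s / t))
              for k in range(n + 1))
    print(r, gap)

# Exercise 26: at least 4 of the 10 requests in the first 3 minutes
print(sum(binomial_pmf(k, n, s / t) for k in range(4, n + 1)))
```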

Estimating the Rate

In many practical situations, the rate r of the process is unknown and must be estimated based on observing the number of arrivals in an interval.

27. In the Poisson experiment, set r = 3 and t = 5. Run the experiment 100 times, updating after each run.
a. For each run, compute the estimate of r based on $N_t$.
b. Over the 100 runs, compute the average of the squares of the errors.
c. Compare the result in (b) with the variance in Exercise 28.

29. Suppose that requests to a web server follow the Poisson model with unknown rate r per minute. In a one hour period, the server receives 342 requests. Estimate r.
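A natural estimate of r from one observation is $N_t / t$. Since $\operatorname{var}(N_t) = rt$, its mean squared error is $r/t$, which is 0.6 for r = 3 and t = 5. The sketch below simulates that estimate, drawing Poisson variates with Knuth's multiplication method. This sampler is our choice and is adequate only for moderate rt. The sketch also computes the estimate for Exercise 29, taking t = 60 minutes. The helper names are our own.

```python
import math
import random


def sample_poisson(c, rng):
    """Knuth's method: count uniforms multiplied before the product drops to e^(-c)."""
    limit = math.exp(-c)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def rate_estimate_mse(r, t, runs, seed=0):
    """Average squared error of the estimate N_t / t over simulated runs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        estimate = sample_poisson(r * t, rng) / t
        total += (estimate - r) ** 2
    return total / runs


print(rate_estimate_mse(3.0, 5.0, runs=100))  # compare with r / t = 0.6
print(342 / 60)  # 5.7 (Exercise 29: estimated requests per minute)
```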