Week 6 Random Variables and Their Distributions, Part II

personal.psu.edu/acq/401/course.info/week6.pdf · 2018-02-15


Week 6 Objectives

This week we will cover additional models for probability distributions, and explore the use of R for understanding some properties of these distributions. In particular we will:

1. Introduce the geometric, the negative binomial, the Poisson and the normal distributions.

2. Explore further the nature of these distributions through properties of simulated random samples drawn from them with R.

3. Introduce the heavy-tailed Cauchy distribution and use it to demonstrate the effect of extreme outliers.

Finally, the Q-Q plot will be introduced as a graphical device for testing the normality assumption.


1 Common Distributions (Continued)

The Geometric and Negative Binomial Random Variables

The Poisson Random Variable and Process

The Normal Distribution

2 Random Numbers and Q-Q Plots (Lab 3)

Random Numbers in R

Q-Q Plots


The Geometric and Negative Binomial Random Variables


Consider a Bernoulli experiment that is repeated independently until the r-th success occurs. For example:

• flip a coin until 5 heads have occurred, or
• inspect items as they come off the assembly line until 10 defective items have been found.

If X is the number of repetitions it takes to observe the r-th success, we say X is a negative binomial r.v. If p is the probability of 1 (success) in a Bernoulli trial, we write

X ∼ NBin(r, p)

If r = 1, X is called the geometric r.v.


If X ∼ NBin(r, p), then

• Its pmf (dnbinom(x-r, r, p) in R) is

$$P(X = x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}, \quad x = r, r+1, \ldots$$

• Its cdf in R is pnbinom(x-r, r, p).

• Its expected value is

$$E(X) = \frac{r}{p}.$$

• Its variance is

$$\sigma_X^2 = \frac{r(1-p)}{p^2}.$$
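The shift by r in the R calls is easy to get wrong: R's nbinom functions count the failures before the r-th success, while X counts all trials. The following is a minimal sketch checking the pmf formula against dnbinom; the values r = 5, p = 0.1 and x = 20 are arbitrary choices made for illustration.

```r
# X ~ NBin(r, p) counts trials up to and including the r-th success.
r <- 5
p <- 0.1
x <- 20                                   # any integer with x >= r

# pmf from the formula: choose(x-1, r-1) * p^r * (1-p)^(x-r)
pmf_formula <- choose(x - 1, r - 1) * p^r * (1 - p)^(x - r)

# R parameterizes by the number of failures, which is x - r
pmf_r <- dnbinom(x - r, size = r, prob = p)

all.equal(pmf_formula, pmf_r)             # TRUE
```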


Figure: Some Negative Binomial PMFs.


Items are inspected as they come off the production line until the 5th defective is found. Items are defective with probability 0.1 independently of each other. If Y is the number of non-defective items found, find $\mu_Y$, $\sigma_Y^2$ and $P(Y \le 40)$.

Solution: Let X be the number of items inspected. Then, Y = X − 5. (Why?) Because X ∼ NBin(5, 0.1), we have

$$\mu_X = \frac{5}{0.1} = 50, \qquad \sigma_X^2 = \frac{5 \times 0.9}{0.1^2} = 450.$$

Thus, $\mu_Y = 50 - 5 = 45$ and $\sigma_Y^2 = 450$. (Why?) Finally,

$$P(Y \le 40) = P(X - 5 \le 40) = P(X \le 45) = 0.473,$$

where the answer is found in R by pnbinom(40, 5, 0.1).
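The numbers in this solution can be reproduced in R. This is a small sketch; the variable names are chosen here for illustration only.

```r
r <- 5        # stop at the 5th defective item
p <- 0.1      # probability that an item is defective

mu_X  <- r / p                  # expected number of items inspected: 50
var_X <- r * (1 - p) / p^2      # variance of X: 450
mu_Y  <- mu_X - r               # Y = X - 5 shifts the mean, not the variance: 45

# Y is the number of non-defectives (failures), which is what pnbinom counts
p_Y40 <- pnbinom(40, size = r, prob = p)
round(p_Y40, 3)                 # 0.473
```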


Average Run Length

In quality control, samples from the production line are taken at specified inspection times and a characteristic is measured for each product item in the sample. If the average falls below a certain threshold, the production process is interrupted. The threshold is chosen so that the probability that an inspection will lead to interruption is 0.05. The number of inspections between successive interruptions is called a run length. If the r.v. X denotes a run length, find $\mu_X$, which is referred to as average run length.

Hint: Run length is a geometric random variable.
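Following the hint, one way to check an answer is sketched below. It assumes the run length counts inspections up to and including the one that triggers the interruption, so that X is geometric with p = 0.05 and $\mu_X = 1/p$. The seed and sample size are arbitrary.

```r
p <- 0.05                        # P(an inspection leads to interruption)
arl <- 1 / p                     # mean of a geometric r.v. that counts trials: 20

# Simulation check. rgeom() counts failures before the first success,
# so add 1 to count inspections (trials).
set.seed(401)
run_lengths <- rgeom(100000, prob = p) + 1
mean(run_lengths)                # should be close to arl
```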


Two athletic teams, A and B, play a best-of-three series of games. Suppose team A is the stronger team and will win any game with probability 0.6, independently from other games. Find the probability that the stronger team will be the overall winner.

Solution: Let X be the number of games needed for team A to win twice. Then X ∼ NBin(2, 0.6). Team A will win the series if X = 2 or X = 3. Thus,

$$P(\text{Team A wins the series}) = P(X = 2) + P(X = 3) = \binom{1}{1} 0.6^2 (1-0.6)^{2-2} + \binom{2}{1} 0.6^2 (1-0.6)^{3-2} = 0.36 + 0.288 = 0.648$$

(Found also in R: pnbinom(1,2,0.6))
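As a quick check, the two terms of the sum and the pnbinom shortcut can be compared directly. A sketch, with variable names chosen here:

```r
p <- 0.6                                       # P(team A wins a game)

p_two_games   <- choose(1, 1) * p^2 * (1 - p)^0    # P(X = 2) = 0.36
p_three_games <- choose(2, 1) * p^2 * (1 - p)^1    # P(X = 3) = 0.288
p_series <- p_two_games + p_three_games            # 0.648

# Same probability: at most 1 loss (failure) before the 2nd win
p_series_r <- pnbinom(1, size = 2, prob = p)

all.equal(p_series, p_series_r)                # TRUE
```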


The Poisson Random Variable and Process


The Poisson Random Variable

A r.v. X with $S_X = \{0, 1, 2, \ldots\}$ is a Poisson r.v. with parameter λ, X ∼ Poisson(λ), if its pmf is

$$p(x) = P(X = x) = e^{-\lambda} \frac{\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots,$$

for some λ > 0.

• The pmf in R is dpois(x, lambda).
• $\sum_{x=0}^{\infty} p(x) = 1$ follows from $e^{\lambda} = \sum_{k=0}^{\infty} \lambda^k / k!$.
• Its cdf in R: ppois(x, lambda).
• $\mu_X = \lambda$, $\sigma_X^2 = \lambda$.
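These properties can be checked numerically with dpois. The sketch below uses λ = 4 (an arbitrary choice) and truncates the infinite sums at 200, where the remaining tail is negligible for this λ.

```r
lambda <- 4                        # arbitrary positive rate, for illustration
x <- 0:200                         # truncation point; the tail beyond it is negligible
p <- dpois(x, lambda)

total <- sum(p)                    # pmf sums to 1 (up to rounding)
mu    <- sum(x * p)                # mean, should equal lambda
sig2  <- sum((x - mu)^2 * p)       # variance, should also equal lambda

c(total = total, mean = mu, variance = sig2)
```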


Figure: The Poisson PMF for different values of λ (λ = 1, 4, 10; horizontal axis from 0 to 20, vertical axis P(X = k)).


The Poisson random variable X can be:

1. the number of fish caught by an angler in an afternoon,
2. the number of new potholes in a stretch of I-80 during the winter months,
3. the number of disabled vehicles abandoned on I-95 in a year,
4. the number of earthquakes (or other natural disasters) in a region of the United States in a month,
5. the number of wrongly dialed telephone numbers in a given city in an hour,
6. the number of freak accidents, such as falls in the shower, in a given time period,
7. the number of hits on a website in a day.


• In general, the Poisson distribution is used to model the probability for a number of occurrences of a certain event in a specified period of time (or distance, area or volume).
• The events must occur at random and at a constant rate.
• The occurrence of an event must not influence the timing of subsequent events (i.e. events occur independently).
• Its earliest use dealt with the number of alpha particles emitted from a radioactive source in a given period of time.
• Current applications include areas such as the insurance industry, the tourist industry, traffic engineering, demography, forestry and astronomy.


Illustrative use of Table A.2

Let X ∼ Poisson(5). Find: a) P(X ≤ 5), b) P(6 ≤ X ≤ 9), and c) P(X ≥ 10).

Solution. a) P(X ≤ 5) = F(5) = 0.616.

b) Write

$$P(6 \le X \le 9) = P(5 < X \le 9) = P(X \le 9) - P(X \le 5) = F(9) - F(5) = 0.968 - 0.616 = 0.352.$$

c) Write

$$P(X \ge 10) = 1 - P(X \le 9) = 1 - F(9) = 1 - 0.968 = 0.032.$$
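The same three probabilities can be obtained without the table by using ppois. A sketch, with variable names chosen here:

```r
lambda <- 5

p_a <- ppois(5, lambda)                         # a) P(X <= 5)
p_b <- ppois(9, lambda) - ppois(5, lambda)      # b) P(6 <= X <= 9) = F(9) - F(5)
p_c <- ppois(9, lambda, lower.tail = FALSE)     # c) P(X > 9) = P(X >= 10)

round(c(a = p_a, b = p_b, c = p_c), 3)          # 0.616 0.352 0.032
```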


Suppose the number of colds a person contracts in a year is a Poisson random variable. Persons taking Vitamin C supplements contract an average of 3 colds per year, and persons not taking the supplements contract an average of 5 colds per year.

a) Find the probability of no more than two colds for a person taking, and for a person not taking, Vitamin C supplements.

b) Suppose 70% of the population takes Vitamin C. Find the probability that a randomly selected person will have no more than two colds in a given year.

c) Suppose that a randomly selected person contracts no more than two colds in a given year. What is the probability that he/she takes Vitamin C supplements?


Solution: a) The R commands ppois(2, 3) and ppois(2, 5) return 0.423 and 0.125 for the probability of no more than two colds for a person taking, and for a person not taking, Vitamin C supplements, respectively.

b) By the Law of Total Probability the answer, found with the R command ppois(2, 3)*0.7 + ppois(2, 5)*0.3, is 0.334.

c) Use of Bayes' Theorem yields the answer 0.888. The corresponding R command is ppois(2, 3)*0.7/(ppois(2, 3)*0.7 + ppois(2, 5)*0.3).
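The three parts can be run as one short script, which makes the roles of the prior (0.7) and the two conditional probabilities explicit. The variable names below are introduced for illustration.

```r
p_given_c  <- ppois(2, 3)      # a) P(at most 2 colds | takes Vitamin C)
p_given_nc <- ppois(2, 5)      # a) P(at most 2 colds | does not take Vitamin C)
prior_c    <- 0.7              # P(takes Vitamin C)

# b) Law of Total Probability
p_total <- p_given_c * prior_c + p_given_nc * (1 - prior_c)

# c) Bayes' Theorem: P(takes Vitamin C | at most 2 colds)
posterior_c <- p_given_c * prior_c / p_total

round(c(p_given_c, p_given_nc, p_total, posterior_c), 3)   # 0.423 0.125 0.334 0.888
```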


Poisson Approximation to Binomial Probabilities

• If Y ∼ Bin(n, p), with n ≥ 100, p ≤ 0.01, and np ≤ 20, then

$$P(Y \ge k) \approx P(X \ge k), \quad k = 0, 1, 2, \ldots, n,$$

where X ∼ Poisson(λ = np).

• The enormous range of applications of the Poisson distribution is due to this proposition. Read the two paragraphs following the proof of Proposition 3.4-1 on page 138.


Suppose the monthly suicide rate in a certain county is 1 per 100,000 people. Give an approximation to the probability that in a city of 500,000 in this county there will be no more than six suicides in the next month.

Solution. Let Y be the # of suicides in that city next month. Then Y ∼ Bin(500,000, p = $10^{-5}$), and all conditions for the Poisson approximation to Binomial probabilities are met. Thus, if X ∼ Poisson(λ = np = 5),

$$P(Y \le 6) \approx P(X \le 6) = 0.762.$$
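R can evaluate the exact binomial probability as well, which gives a sense of how close the approximation is in this example. A minimal sketch:

```r
n <- 500000
p <- 1e-5

p_exact  <- pbinom(6, size = n, prob = p)    # exact binomial P(Y <= 6)
p_approx <- ppois(6, lambda = n * p)         # Poisson approximation with lambda = 5

round(p_approx, 3)                           # 0.762
abs(p_exact - p_approx)                      # very small for these n and p
```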


The Poisson process

• If the number of occurrences is recorded as it accumulates over time, set

X(t) = number of occurrences in the time interval [0, t].

• X(t), t ≥ 0, is called a Poisson process with rate α per unit time, if the following assumptions are satisfied.

1. The probability of exactly one occurrence in a short time period of length h is approximately αh.
2. The probability of more than one occurrence in a short time period is approximately 0.
3. The numbers of occurrences in nonoverlapping time intervals are mutually independent.


• The parameter α in the first assumption specifies the “rate” of the occurrences, i.e. the average number of occurrences per time unit.

• If X(t), t ≥ 0, is a Poisson(α) process, then:

For each fixed $t_0$, $X(t_0) \sim \text{Poisson}(\lambda = \alpha t_0)$. Thus,

$$P(X(t_0) = k) = e^{-\alpha t_0} \frac{(\alpha t_0)^k}{k!}, \quad k = 0, 1, 2, \ldots$$

If $t_1 < t_2$ are two positive numbers, then

$$X(t_2) - X(t_1) \sim \text{Poisson}(\alpha (t_2 - t_1)).$$


Continuous electrolytic inspection of a tin plate yields on average 0.2 imperfections per minute. Find:

a) The probability of one imperfection in three minutes.

b) The probability of at most one imperfection in 0.25 hours.

Solution. a) Here α = 0.2, t = 3, λ = αt = 0.6. Thus,

$$P(X(3) = 1) = F(1; 0.6) - F(0; 0.6) = 0.878 - 0.549 = 0.329.$$

b) Here α = 0.2, t = 15 (0.25 hours is 15 minutes), λ = αt = 3.0. Thus,

$$P(X(15) \le 1) = F(1; 3.0) = 0.199.$$
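In R the table lookups are replaced by dpois and ppois with λ = αt. A sketch, keeping time in minutes throughout:

```r
alpha <- 0.2                          # imperfections per minute

# a) exactly one imperfection in 3 minutes: lambda = 0.6
p_a <- dpois(1, lambda = alpha * 3)

# b) at most one imperfection in 0.25 hours = 15 minutes: lambda = 3
p_b <- ppois(1, lambda = alpha * 15)

round(c(a = p_a, b = p_b), 3)         # 0.329 0.199
```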


People enter a department store according to a Poisson process with rate α per hour. It is known that 30% of those entering the store will make a purchase of $50.00 or more. Find the probability mass function of the number of customers who will make purchases of $50.00 or more during the next hour.

Answer: Poisson(0.3α). (Proof omitted, but see Example 3.4-16, p. 143)
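The answer can be made plausible by simulation, although this is not a proof. The sketch below assumes a hypothetical rate α = 10 per hour (the example leaves α unspecified) and that customers decide independently. A Poisson count has equal mean and variance, so both should come out near 0.3α = 3.

```r
set.seed(401)                                  # arbitrary seed, for reproducibility
alpha <- 10                                    # hypothetical arrival rate per hour
n_sim <- 100000                                # number of simulated hours

arrivals   <- rpois(n_sim, alpha)              # customers entering in one hour
big_buyers <- rbinom(n_sim, size = arrivals, prob = 0.3)   # those spending $50 or more

c(mean = mean(big_buyers), variance = var(big_buyers))     # both near 0.3 * alpha
```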


The Normal Distribution


The Normal distribution is the most important distribution in probability and statistics.

X ∼ N(µ, σ²) if its pdf is

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty.$$

• The pdf of X ∼ N(µ, σ²) in R is dnorm(x, µ, σ).
• The cdf of X ∼ N(µ, σ²) does not have a closed-form expression. In R it is pnorm(x, µ, σ).
• The (1 − α)100th percentile of X ∼ N(µ, σ²) is found in R by qnorm(1-α, µ, σ).


The Standard Normal Distribution

• When µ = 0 and σ = 1, X is said to have the standard normal distribution and is denoted, universally, by Z. The pdf of Z is

$$\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty.$$

The cdf of Z is denoted by Φ. Thus

$$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \phi(x)\, dx.$$

Φ(z) has no closed-form expression, but is tabulated in Table A.3.


Historical Notes

• It was discovered by Abraham DeMoivre in 1733, for approximating binomial probabilities when n is large. He called it the exponential bell-shaped curve.

• DeMoivre was the first statistical consultant working out of "Slaughter's Coffee House", a betting shop in Long Acres, London.

• In 1803, Karl Friedrich Gauss used it for predicting the location of astronomical objects. Because of this it became known as the Gaussian distribution.

• By the late 19th century, statisticians had noted that most data sets would have approximately bell-shaped histograms. It came to be accepted that it was "normal" for any well-behaved data set to follow this curve. So the Gaussian curve became the normal curve.


Figure: One side of the 10 Mark bill


Basic Properties of the Normal Distribution

Proposition

If X ∼ N(µ, σ²), then

1. E(X) = µ.
2. Var(X) = σ².
3. For any real numbers a, b (with b ≠ 0),

$$Y = a + bX \sim N(a + b\mu, b^2\sigma^2).$$

For example, if X ∼ N(4, 9) then

$$Y = 5 + 2X \sim N(13, 36).$$


Corollary

1. If Z ∼ N(0, 1), then X = µ + σZ ∼ N(µ, σ²).

2. If X ∼ N(µ, σ²), then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.

3. If X ∼ N(µ, σ²), then $x_\alpha = \mu + \sigma z_\alpha$, where $x_\alpha$ and $z_\alpha$ denote the percentiles of X and Z.

The corollary implies that probabilities and percentiles of any normal random variable can be computed from corresponding probabilities and percentiles of Z.


The Standard Normal Table

• Table A.3 provides values of Φ(z), the standard normal cdf, for z-values ranging from 0 up to 3.09, in steps of 0.01.

The left-most column of the table provides the z-value up to the first decimal, while the top is used for the second decimal. Thus, the number 1.00 is identified by 1.0 in the left column and 0.00 in the top row, and the number 1.25 is identified by 1.2 in the left column and 0.05 in the top row.

• The value of the standard normal cdf at negative arguments is found from the formula

$$\Phi(-z) = 1 - \Phi(z),$$

which follows from the symmetry of the standard normal pdf.


Probabilities via the Standard Normal Table

• Table A.3 can be used to find probabilities related to any normal r.v.: If X ∼ N(µ, σ²), its cdf is

$$F_X(x) = \Phi\left(\frac{x - \mu}{\sigma}\right).$$

If X ∼ N(1.25, 0.46²), use Table A.3 to find P(1.00 ≤ X ≤ 1.75).

Solution. From $P(1.00 \le X \le 1.75) = F_X(1.75) - F_X(1.00)$ we have

$$P(1.00 \le X \le 1.75) = \Phi\left(\frac{1.75 - 1.25}{0.46}\right) - \Phi\left(\frac{1.00 - 1.25}{0.46}\right) = \Phi(1.09) - \Phi(-0.54) = 0.8621 - 0.2946 = 0.5675.$$
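For comparison, pnorm gives the probability without rounding the z-values to two decimals, so the result differs slightly from the table-based answer. A sketch, with variable names chosen here:

```r
mu    <- 1.25
sigma <- 0.46

# Direct computation, no rounding of the standardized values
p_direct <- pnorm(1.75, mu, sigma) - pnorm(1.00, mu, sigma)

# Mimic the table lookup: z-values rounded to two decimals
p_table <- pnorm(1.09) - pnorm(-0.54)

round(c(direct = p_direct, table = p_table), 4)
```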


The 68-95-99.7% Property

If X ∼ N(µ, σ2), then

P(µ− 1σ < X < µ+ 1σ) = P(−1 < Z < 1) = 0.6826,

P(µ− 2σ < X < µ+ 2σ) = P(−2 < Z < 2) = 0.9544,

P(µ− 3σ < X < µ+ 3σ) = P(−3 < Z < 3) = 0.9974.
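These three probabilities can be recomputed with pnorm. R returns values that differ from the table-based ones in the fourth decimal, because the table entries are rounded.

```r
k <- 1:3                               # number of standard deviations
p_within <- pnorm(k) - pnorm(-k)       # P(-k < Z < k) for k = 1, 2, 3

round(p_within, 4)                     # 0.6827 0.9545 0.9973
```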


Percentiles via the Standard Normal Table

• Recall Φ(z) does not have a closed-form expression.

• Table A.3 can be used for providing an approximate solution for $z_\alpha$ in the equation

$$\Phi(z_\alpha) = 1 - \alpha.$$

• To find $z_\alpha$, one first locates 1 − α in the body of Table A.3 and then reads $z_\alpha$ from the margins. If the exact value of 1 − α does not exist, then locate the values just smaller and just larger than 1 − α, and use the average of the corresponding z-values as an approximation to $z_\alpha$.

• If X ∼ N(µ, σ²), $x_\alpha = \mu + \sigma z_\alpha$.


Find $z_{0.05}$, the 95th percentile of Z ∼ N(0, 1).

Solution. 1 − α = 0.95 does not exist in the body of the table. The entry that is closest to, but larger than 0.95 (which is 0.9505), corresponds to 1.65. The entry that is closest to, but smaller than 0.95 (which is 0.9495), corresponds to 1.64. We approximate $z_{0.05}$ by averaging 1.64 and 1.65, i.e., $z_{0.05} \approx 1.645$.

The R command qnorm(0.95) yields 1.644854.


Let X denote the weight of a randomly chosen frozen yogurt cup. Suppose X ∼ N(8, 0.46²). Find the value c that separates the upper 5% of weight values from the lower 95%.

Solution. This is another way of asking for the 95th percentile, $x_{0.05}$, of X. Using the formula $x_\alpha = \mu + \sigma z_\alpha$, we have

$$x_{0.05} = 8 + 0.46\, z_{0.05} = 8 + (0.46)(1.645) = 8.76.$$
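The same percentile can be read off directly with qnorm, either for N(8, 0.46²) itself or through the standard normal percentile. A short sketch:

```r
c_direct <- qnorm(0.95, mean = 8, sd = 0.46)   # 95th percentile of N(8, 0.46^2)
c_via_z  <- 8 + 0.46 * qnorm(0.95)             # mu + sigma * z_0.05

round(c(c_direct, c_via_z), 2)                 # 8.76 8.76
```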


2 Random Numbers and Q-Q Plots (Lab 3)

Random Numbers in R


R Names of Common Distributions

Distribution         R name    additional arguments
binomial             binom     size, prob (n, p)
hypergeometric       hyper     m, N-m, n
geometric            geom      prob (p)
negative binomial    nbinom    size, prob (r, p)
Poisson              pois      λ
normal               norm      mean, sd
uniform              unif      min, max
exponential          exp       rate (λ)
gamma                gamma     shape, scale
Cauchy               cauchy    location, scale


Versions of the R Names with “d”, “p”, “q”, and “r”

A “d” in front of the R name gives the pmf or pdf; a “p” gives the cdf; a “q” gives percentiles; an “r” gives random numbers. For example:

• dbinom(x,n,p), for x = 0, 1, . . . , n, gives the Bin(n,p) pmf.
• rbinom(m,n,p), for m ≥ 1, gives m random numbers from the Bin(n,p) distribution.
• qnorm(t, µ, σ) gives the 100t-th percentile of the N(µ, σ^2) distribution. The default values for µ and σ are 0 and 1; thus, qnorm(0.9) gives the 90th percentile of N(0, 1).
• rcauchy(m) gives m random numbers from the Cauchy(0, 1) distribution (the default values for location and scale are 0 and 1).

The “x” in the “d” or “p” versions of the R names can also be a vector of values, and similarly for the “t” in the “q” versions. For example, dbinom(0:10, 10, 0.3) or qnorm(c(0.9, 0.95)).
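As an illustration of the four prefixes, the following sketch applies them to the binomial, normal and Cauchy families. The variable names and the seed are arbitrary choices made here, not part of the lab; only base R functions are used.

```r
# d: pmf of Bin(10, 0.3) at x = 0, 1, ..., 10 (vector input)
pmf   <- dbinom(0:10, 10, 0.3)
total <- sum(pmf)                     # a pmf sums to 1

# p: cdf, P(X <= 3), which equals the pmf summed over 0..3
cdf3          <- pbinom(3, 10, 0.3)
cdf3_from_pmf <- sum(pmf[1:4])

# q: 90th and 95th percentiles of N(0, 1); pnorm maps them back
q    <- qnorm(c(0.9, 0.95))
back <- pnorm(q)

# r: random numbers (seed is an arbitrary assumption, for reproducibility)
set.seed(401)
draws        <- rbinom(5, 10, 0.3)    # five numbers from Bin(10, 0.3)
cauchy_draws <- rcauchy(3)            # three numbers from Cauchy(0, 1)

print(list(total = total, cdf3 = cdf3, q = q, draws = draws))
```

The “p” and “q” versions are inverses of each other for continuous distributions, which is why pnorm(qnorm(t)) returns t.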


Use the command x=rbinom(1000, 4, 0.5) to generate a sample of size 1000 from the Bin(4, 0.5) distribution and store it in x.

• Guess what the command mean(x) might return, and then try the command to verify your guess.
• Guess what the command var(x) might return, and then try the command to verify your guess.
• The command table(x)/1000 will return the proportions of 0, . . . , 4 in your sample. Give the outcome and explain why you should have expected this outcome. (Hint: dbinom(0:4,4,0.5) gives the pmf of Bin(4, 0.5).)
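One possible way to run the exercise above in a single script is sketched below. The seed, the helper names and the use of factor() are choices made here for illustration; the comparison values n*p and n*p*(1-p) are the usual binomial mean and variance.

```r
set.seed(6)                               # arbitrary seed (assumption)
n <- 4
p <- 0.5
x <- rbinom(1000, n, p)

sample_mean <- mean(x)                    # compare with n*p = 2
sample_var  <- var(x)                     # compare with n*p*(1-p) = 1

# factor() keeps all of 0..4 in the table even if a value never occurs
props <- table(factor(x, levels = 0:n)) / 1000
pmf   <- dbinom(0:n, n, p)                # Bin(4, 0.5) pmf

print(round(c(mean = sample_mean, var = sample_var), 3))
print(rbind(sample = round(as.numeric(props), 3), pmf = pmf))
```

The sample proportions should be close to the pmf values because each proportion estimates the corresponding probability.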


• Use the commands qnorm(0.9, 10, 3); 10 + 3*qnorm(0.9). Could you have expected that they give the same answer, and why?
• Use the command x=rnorm(10000) to generate a sample of size 10000 from the N(0, 1) distribution and store it in x.

• Guess what the command quantile(x, 0.9) might return, and then try the command to verify your guess. (Hint: qnorm(0.9).)
• Guess what the command quantile(10 + 3*x, 0.9) might return, and then try the command to verify your guess.
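The relation x_α = µ + σ z_α behind these commands can be checked with a short script. This is a sketch with an arbitrary seed and illustrative variable names (a, b, q_std, q_scaled); quantile() is used with its default settings.

```r
a <- qnorm(0.9, 10, 3)                    # 90th percentile of N(10, 3^2)
b <- 10 + 3 * qnorm(0.9)                  # mu + sigma * z, same percentile

set.seed(6)                               # arbitrary seed (assumption)
x <- rnorm(10000)

q_std    <- quantile(x, 0.9)              # estimates qnorm(0.9)
q_scaled <- quantile(10 + 3 * x, 0.9)     # estimates qnorm(0.9, 10, 3)

print(c(a = a, b = b))
print(c(q_std, q_scaled))
```

Shifting and scaling the sample shifts and scales its percentiles in the same way, so the second sample quantile is 10 + 3 times the first.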


The 'heavy' tails of the Cauchy distribution

• Use the following commands to plot the pdfs of the N(0,1) and Cauchy distributions:

curve(dnorm, -4, 4, col="red"); curve(dcauchy, -4, 4, add=TRUE, col="green")

• Observe that there is much more area under the Cauchy pdf in the tails.
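The difference in tail area can also be quantified with the cdfs. The sketch below compares P(|X| > 4) under the two distributions; the cutoff 4 is chosen here only because it matches the plotting range, and the variable names are illustrative.

```r
# Two-sided tail probability P(|X| > 4), using symmetry about 0
tail_normal <- 2 * pnorm(-4)      # about 6.3e-05 for N(0, 1)
tail_cauchy <- 2 * pcauchy(-4)    # about 0.156 for Cauchy(0, 1)

print(c(normal = tail_normal, cauchy = tail_cauchy))
print(tail_cauchy / tail_normal)  # how many times heavier the Cauchy tails are
```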


Heavy Tails and Outliers

• Random samples from the Cauchy distribution often contain extreme outliers.

Try cs=rcauchy(1000); summary(cs)
For comparison with normal samples try ns=rnorm(1000); summary(ns)
Try the above commands with 10000 instead of 1000.

• Outliers affect the sample mean!

Repeat the last command five times. Note that the sample median is always very close to 0, which is the population median. Is the sample mean always close to 0?
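The repetition can be automated with replicate(). The sketch below assumes the repeated command is the Cauchy one with sample size 10000, and uses an arbitrary seed; res is an illustrative name.

```r
set.seed(6)                               # arbitrary seed (assumption)

# Each column holds the mean and median of one fresh Cauchy(0, 1) sample
res <- replicate(5, {
  cs <- rcauchy(10000)
  c(mean = mean(cs), median = median(cs))
})

print(round(res, 3))
```

Comparing the two rows shows how the median stays near 0 from sample to sample while the mean is at the mercy of whatever extreme values happen to be drawn.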


Q-Q Plots


Q-Q plots are used to provide visual verification (or contradiction) of an assumed distribution. Here we will concentrate on the assumption of normality.

Q-Q stands for Quantile-Quantile. It plots the sample percentiles versus those of the assumed distribution. If the assumption is correct, the plot should be roughly linear (since sample percentiles estimate the corresponding population percentiles).
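To make the construction concrete, the pairs behind a normal Q-Q plot can be computed by hand. This is a sketch, not the lab's own procedure: it assumes the plotting positions given by base R's ppoints(), which is what qqnorm uses, and the names sample_q, theory_q and qq are illustrative.

```r
set.seed(6)                           # arbitrary seed (assumption)
x <- rnorm(50)
n <- length(x)

sample_q <- sort(x)                   # sample percentiles (ordered data)
theory_q <- qnorm(ppoints(n))         # matching N(0, 1) percentiles

# qqnorm computes the same pairs; plot.it = FALSE returns them without drawing
qq <- qqnorm(x, plot.it = FALSE)

print(head(cbind(theory_q, sample_q)))
```

Calling plot(theory_q, sample_q) would draw the same picture that qqnorm(x) produces.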

With the data in the R object x, two versions of the command are:


R commands for the normal Q-Q plot

qqnorm(x); qqline(x, col=2)

qqnorm(x,datax=T); qqline(x, datax=T, col=2)

The first version has the sample percentiles on the y-axis, and the second puts them on the x-axis.

Q-Q plot for a normal sample:

x=rnorm(50); qqnorm(x); qqline(x, col=2)

Repeat the above command with x=rnorm(50) replaced by

a) x=rexp(50), b) x=runif(50), and c) x=rcauchy(50)
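The four cases can be drawn side by side for comparison. The sketch below is one way to do it; the seed, the 2 x 2 layout, and the use of the correlation between theoretical and sample quantiles as a rough numerical summary of linearity are all additions made here, not part of the lab.

```r
set.seed(6)                               # arbitrary seed (assumption)
samples <- list(
  normal      = rnorm(50),
  exponential = rexp(50),
  uniform     = runif(50),
  cauchy      = rcauchy(50)
)

old_par <- par(mfrow = c(2, 2))           # 2 x 2 grid of plots
r <- sapply(names(samples), function(nm) {
  x  <- samples[[nm]]
  qq <- qqnorm(x, main = nm)              # draws the plot and returns its points
  qqline(x, col = 2)
  cor(qq$x, qq$y)                         # near 1 when the plot is roughly linear
})
par(old_par)                              # restore the previous layout

print(round(r, 3))
```

The plots themselves remain the main diagnostic; the correlations only summarize how far each set of points departs from a straight line.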
