Week 6: Random Variables and Their Distributions, Part II
Week 6 Objectives
This week we will cover additional models for probability distributions, and explore the use of R for understanding some properties of these distributions. In particular we will:
1 Introduce the geometric, the negative binomial, the Poisson and the normal distributions.
2 Explore further the nature of these distributions through properties of simulated random samples drawn from them with R.
3 Introduce the heavy-tailed Cauchy distribution and use it to demonstrate the effect of extreme outliers.
Finally, the Q-Q plot will be introduced as a graphical device for testing the normality assumption.
1 Common Distributions (Continued)
The Geometric and Negative Binomial Random Variables
The Poisson Random Variable and Process
The Normal Distribution
2 Random Numbers and Q-Q Plots (Lab 3)
Random Numbers in R
Q-Q Plots
1 Common Distributions (Continued)
The Geometric and Negative Binomial Random Variables
Consider a Bernoulli experiment that is repeated independently until the r-th success occurs. For example:
• flip a coin until 5 heads appear, or
• inspect items as they come off the assembly line until 10 defective items have been found.
If X is the number of repetitions it takes to observe the r-th success, we say X is a negative binomial r.v. If p is the probability of 1 (success) in a Bernoulli trial, we write
X ∼ NBin(r, p).
If r = 1, X is called the geometric r.v.
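If X ∼ NBin(r, p), then
• its pmf (dnbinom(x-r, r, p) in R) is
$$P(X = x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}, \quad x = r, r+1, \ldots$$
• its cdf in R is pnbinom(x-r, r, p);
• its expected value is
$$E(X) = \frac{r}{p};$$
• its variance is
$$\sigma_X^2 = \frac{r(1-p)}{p^2}.$$
The shift x-r is needed because R's nbinom functions count the number of failures before the r-th success, which is X − r.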
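The pmf formula and the x-r shift can be checked numerically. This sketch uses illustrative values r = 3 and p = 0.4 (hypothetical, not from the slides), compares the formula with dnbinom(), and approximates the mean and variance by summing far enough into the right tail that the remainder is negligible.

```r
# Illustrative parameters (hypothetical, not from the slides)
r <- 3
p <- 0.4

# pmf of X ~ NBin(r, p) from the formula, on part of its support x = r, r+1, ...
x <- r:30
pmf_formula <- choose(x - 1, r - 1) * p^r * (1 - p)^(x - r)

# R's dnbinom() counts failures, so it is evaluated at x - r
pmf_r <- dnbinom(x - r, size = r, prob = p)
all.equal(pmf_formula, pmf_r)   # TRUE

# Mean and variance by (truncated) summation over the support
xx <- r:2000
w  <- dnbinom(xx - r, size = r, prob = p)
mu <- sum(xx * w)
v  <- sum((xx - mu)^2 * w)
c(mean = mu, r_over_p = r / p, var = v, formula = r * (1 - p) / p^2)
```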
Figure: Some Negative Binomial PMFs.
Items are inspected as they come off the production line until the 5th defective is found. Items are defective with probability 0.1, independently of each other. If Y is the number of non-defective items found, find $\mu_Y$, $\sigma_Y^2$ and P(Y ≤ 40).
Solution: Let X be the number of items inspected. Then Y = X − 5. (Why?) Because X ∼ NBin(5, 0.1), we have
$$\mu_X = \frac{5}{0.1} = 50, \qquad \sigma_X^2 = \frac{5 \times 0.9}{0.1^2} = 450.$$
Thus, $\mu_Y = 50 - 5 = 45$ and $\sigma_Y^2 = 450$. (Why?) Finally,
$$P(Y \le 40) = P(X - 5 \le 40) = P(X \le 45) = 0.473,$$
where the answer is found in R by pnbinom(40, 5, 0.1).
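A short R sketch that reproduces the answer, checks it against an equivalent binomial statement (X ≤ 45 exactly when at least 5 of the first 45 items are defective), and confirms $\mu_Y$ and $\sigma_Y^2$ by simulation. The seed and the number of replications are arbitrary choices.

```r
r <- 5
p <- 0.1

# Y = X - 5 is exactly the number of failures that R's nbinom counts
p_nbin <- pnbinom(40, size = r, prob = p)    # P(Y <= 40)

# Same event: at least 5 defectives among the first 45 items
p_bin <- 1 - pbinom(4, size = 45, prob = p)
round(c(p_nbin, p_bin), 3)

# Monte Carlo check of mu_Y = 45 and sigma_Y^2 = 450
set.seed(1)
y <- rnbinom(1e5, size = r, prob = p)
c(mean = mean(y), var = var(y))
```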
Average Run Length
In quality control, samples from the production line are taken at specified inspection times and a characteristic is measured for each product item in the sample. If the average falls below a certain threshold, the production process is interrupted. The threshold is chosen so that the probability that an inspection will lead to interruption is 0.05. The number of inspections between successive interruptions is called a run length. If the r.v. X denotes a run length, find $\mu_X$, which is referred to as the average run length.
Hint: Run length is a geometric random variable.
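One way to check your answer numerically. This sketch assumes, as the hint suggests, that X counts inspections up to and including the one that leads to an interruption, so X is geometric with p = 0.05; because R's geom functions count the failures X − 1, dgeom() is evaluated at x - 1.

```r
p <- 0.05                      # probability that an inspection causes an interruption
x <- 1:5000                    # run lengths; the tail beyond 5000 is negligible
pmf <- dgeom(x - 1, prob = p)  # P(X = x), shifted because R counts failures
arl <- sum(x * pmf)            # average run length E(X)
arl
```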
Two athletic teams, A and B, play a best-of-three series of games. Suppose team A is the stronger team and will win any game with probability 0.6, independently from other games. Find the probability that the stronger team will be the overall winner.
Solution: Let X be the number of games needed for team A to win twice. Then X ∼ NBin(2, 0.6). Team A will win the series if X = 2 or X = 3. Thus,
$$\begin{aligned} P(\text{Team A wins the series}) &= P(X = 2) + P(X = 3) \\ &= \binom{1}{1} 0.6^2 (1 - 0.6)^{2-2} + \binom{2}{1} 0.6^2 (1 - 0.6)^{3-2} \\ &= 0.36 + 0.288 = 0.648. \end{aligned}$$
(Found also in R: pnbinom(1, 2, 0.6))
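The calculation can be reproduced in R. The binomial route below is a standard argument not given on the slide: the series winner does not change if all three games are imagined to be played, so A wins the series exactly when it wins at least 2 of 3 games. The simulation size and seed are arbitrary.

```r
p <- 0.6

# Negative binomial route: A's 2nd win comes after 0 or 1 losses
p_nbin <- dnbinom(0, size = 2, prob = p) + dnbinom(1, size = 2, prob = p)
p_cdf  <- pnbinom(1, size = 2, prob = p)

# Binomial route: A wins at least 2 of 3 (hypothetically completed) games
p_bin <- 1 - pbinom(1, size = 3, prob = p)

# Simulation: each row is one series of three independent games (1 = A wins)
set.seed(2)
games <- matrix(rbinom(3 * 1e5, size = 1, prob = p), ncol = 3)
p_sim <- mean(rowSums(games) >= 2)

c(p_nbin, p_cdf, p_bin, p_sim)
```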
The Poisson Random Variable and Process
The Poisson Random Variable
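A r.v. X with $S_X = \{0, 1, 2, \ldots\}$ is a Poisson r.v. with parameter λ, X ∼ Poisson(λ), if its pmf is
$$p(x) = P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots,$$
for some λ > 0.
The pmf in R is dpois(x, lambda). That $\sum_{x=0}^{\infty} p(x) = 1$ follows from $e^{\lambda} = \sum_{k=0}^{\infty} \lambda^k / k!$.
Its cdf in R is ppois(x, lambda). Its mean and variance are $\mu_X = \lambda$ and $\sigma_X^2 = \lambda$.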
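The pmf formula, the normalization via the exponential series, and $\mu_X = \sigma_X^2 = \lambda$ can all be checked numerically. In this sketch λ = 4 is an illustrative value, and the support is truncated at 100, where the remaining probability is negligible.

```r
lambda <- 4                                     # hypothetical parameter value
x   <- 0:100                                    # truncated support
pmf <- exp(-lambda) * lambda^x / factorial(x)   # pmf from the formula
all.equal(pmf, dpois(x, lambda))                # TRUE
sum(pmf)                                        # 1, up to a negligible tail

mu <- sum(x * pmf)
v  <- sum((x - mu)^2 * pmf)
c(mean = mu, var = v)                           # both equal lambda
```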
Figure: The Poisson PMF, P(X = k), for λ = 1, λ = 4 and λ = 10.
The Poisson random variable X can be:
1 the number of fish caught by an angler in an afternoon,
2 the number of new potholes in a stretch of I80 during the winter months,
3 the number of disabled vehicles abandoned on I95 in a year,
4 the number of earthquakes (or other natural disasters) in a region of the United States in a month,
5 the number of wrongly dialed telephone numbers in a given city in an hour,
6 the number of freak accidents, such as falls in the shower, in a given time period,
7 the number of hits on a website in a day.
In general, the Poisson distribution is used to model the probability for a number of occurrences of a certain event in a specified period of time (or distance, area or volume). The events must occur at random and at a constant rate, and the occurrence of an event must not influence the timing of subsequent events (i.e. events occur independently). Its earliest use dealt with the number of alpha particles emitted from a radioactive source in a given period of time. Current applications include areas such as the insurance industry, the tourist industry, traffic engineering, demography, forestry and astronomy.
Illustrative use of Table A.2
Let X ∼ Poisson(5). Find: a) P(X ≤ 5), b) P(6 ≤ X ≤ 9), and c) P(X ≥ 10).
Solution. a) P(X ≤ 5) = F(5) = 0.616.
b) Write
$$P(6 \le X \le 9) = P(5 < X \le 9) = P(X \le 9) - P(X \le 5) = F(9) - F(5) = 0.968 - 0.616 = 0.352.$$
c) Write
$$P(X \ge 10) = 1 - P(X \le 9) = 1 - F(9) = 1 - 0.968 = 0.032.$$
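In R the table lookups become ppois() calls. In this sketch the helper name Fx is hypothetical (F is avoided because R uses it as shorthand for FALSE); the last two lines show equivalent direct computations.

```r
Fx <- function(x) ppois(x, lambda = 5)   # cdf of Poisson(5), as in Table A.2

pa <- Fx(5)            # a) P(X <= 5)
pb <- Fx(9) - Fx(5)    # b) P(6 <= X <= 9)
pc <- 1 - Fx(9)        # c) P(X >= 10)
round(c(pa, pb, pc), 3)

# Direct alternatives
sum(dpois(6:9, lambda = 5))                 # same as pb
ppois(9, lambda = 5, lower.tail = FALSE)    # same as pc
```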
Suppose the number of colds a person contracts in a year is a Poisson random variable. Persons taking Vitamin C supplements contract an average of 3 colds per year, and persons not taking the supplements contract an average of 5 colds per year.
a) Find the probability of no more than two colds for a person taking, and for a person not taking, Vitamin C supplements.
b) Suppose 70% of the population takes Vitamin C. Find the probability that a randomly selected person will have no more than two colds in a given year.
c) Suppose that a randomly selected person contracts no more than two colds in a given year. What is the probability that he/she takes Vitamin C supplements?
Solution: a) The R commands ppois(2, 3) and ppois(2, 5) return 0.423 and 0.125 for the probability of no more than two colds for a person taking, and for a person not taking, Vitamin C supplements, respectively.
b) By the Law of Total Probability the answer, found with the R command ppois(2, 3)*0.7 + ppois(2, 5)*0.3, is 0.334.
c) Use of Bayes' Theorem yields the answer 0.888. The corresponding R command is ppois(2, 3)*0.7/(ppois(2, 3)*0.7 + ppois(2, 5)*0.3)
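The same three answers written as a small R script, with named intermediate quantities so that the Law of Total Probability and Bayes' Theorem steps are visible. The variable names are only illustrative.

```r
p_vc    <- ppois(2, lambda = 3)   # P(at most 2 colds | takes vitamin C)
p_no_vc <- ppois(2, lambda = 5)   # P(at most 2 colds | does not)
prior   <- 0.7                    # proportion of the population taking vitamin C

# b) Law of Total Probability
p_total <- p_vc * prior + p_no_vc * (1 - prior)
# c) Bayes' Theorem
post_vc <- p_vc * prior / p_total

round(c(a_vc = p_vc, a_no_vc = p_no_vc, b = p_total, c = post_vc), 3)
```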
Poisson Approximation to Binomial Probabilities
• If Y ∼ Bin(n, p), with n ≥ 100, p ≤ 0.01, and np ≤ 20, then
$$P(Y \ge k) \approx P(X \ge k), \quad k = 0, 1, 2, \ldots, n,$$
where X ∼ Poisson(λ = np).
The enormous range of applications of the Poisson distribution is due to this proposition. Read the two paragraphs following the proof of Proposition 3.4-1 on page 138.
Suppose the monthly suicide rate in a certain county is 1 per 100,000 people. Give an approximation to the probability that in a city of 500,000 in this county there will be no more than six suicides in the next month.
Solution. Let Y be the number of suicides in that city next month. Then $Y \sim \text{Bin}(500{,}000,\ p = 10^{-5})$, and all conditions for the Poisson approximation to Binomial probabilities are met. Thus, if X ∼ Poisson(λ = np = 5),
$$P(Y \le 6) \approx P(X \le 6) = 0.762.$$
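Because R computes binomial probabilities exactly even for n = 500,000, the quality of the approximation can be inspected directly. This sketch compares the exact and approximate values of P(Y ≤ 6), then the largest gap between the two cdfs over k = 0, ..., 12 (a range chosen only for illustration).

```r
n <- 500000
p <- 1e-5

exact  <- pbinom(6, size = n, prob = p)    # P(Y <= 6), Y ~ Bin(n, p)
approx <- ppois(6, lambda = n * p)         # P(X <= 6), X ~ Poisson(5)
c(exact = exact, approx = approx)

# Largest cdf discrepancy over k = 0, ..., 12
k <- 0:12
max_diff <- max(abs(pbinom(k, n, p) - ppois(k, n * p)))
max_diff
```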
The Poisson process
• If the occurrences are recorded as they accumulate over time, set
X(t) = number of occurrences in the time interval [0, t].
• X(t), t ≥ 0, is called a Poisson process with rate α per unit time if the following assumptions are satisfied:
1 The probability of exactly one occurrence in a short time period of length h is approximately αh.
2 The probability of more than one occurrence in a short time period is approximately 0.
3 The numbers of occurrences in nonoverlapping time intervals are mutually independent.
• The parameter α in the first assumption specifies the “rate” of the occurrences, i.e. the average number of occurrences per time unit.
• If X(t), t ≥ 0, is a Poisson(α) process, then:
For each fixed $t_0$, $X(t_0) \sim \text{Poisson}(\lambda = \alpha t_0)$. Thus,
$$P(X(t_0) = k) = \frac{e^{-\alpha t_0} (\alpha t_0)^k}{k!}, \quad k = 0, 1, 2, \ldots$$
If $t_1 < t_2$ are two positive numbers, then
$$X(t_2) - X(t_1) \sim \text{Poisson}(\alpha (t_2 - t_1)).$$
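A simulation sketch of the first property. It relies on a standard fact not stated on these slides: the waiting times between events of a Poisson(α) process are independent Exponential(α) random variables. The rate α = 2, the time t0 = 3, and the helper name count_events are hypothetical choices for illustration.

```r
set.seed(3)
alpha <- 2    # hypothetical rate per unit time
t0    <- 3    # hypothetical time horizon

# Number of events in [0, t] when inter-arrival times are Exponential(rate)
count_events <- function(t, rate) {
  n <- 0
  s <- rexp(1, rate)          # time of the first event
  while (s <= t) {
    n <- n + 1
    s <- s + rexp(1, rate)    # time of the next event
  }
  n
}

counts <- replicate(20000, count_events(t0, alpha))
c(mean = mean(counts), var = var(counts), alpha_t0 = alpha * t0)

# Empirical proportions vs Poisson(alpha * t0) pmf for k = 0, ..., 10
rbind(empirical = tabulate(counts + 1, nbins = 11) / length(counts),
      poisson   = dpois(0:10, alpha * t0))
```

Both the sample mean and the sample variance of the counts should be close to αt0, as the Poisson(αt0) distribution requires.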
Continuous electrolytic inspection of a tin plate yields on average 0.2 imperfections per minute. Find:
a) The probability of one imperfection in three minutes.
b) The probability of at most one imperfection in 0.25 hours.
Solution. a) Here α = 0.2, t = 3, λ = αt = 0.6. Thus,
$$P(X(3) = 1) = F(1; 0.6) - F(0; 0.6) = 0.878 - 0.549 = 0.329.$$
b) Here α = 0.2 and t = 15 (since 0.25 hours = 15 minutes), so λ = αt = 3.0. Thus,
$$P(X(15) \le 1) = F(1; 3.0) = 0.199.$$
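The two answers can be obtained in R directly from X(t) ∼ Poisson(αt), with time measured in minutes as in the solution:

```r
alpha <- 0.2                  # imperfections per minute
pa <- dpois(1, alpha * 3)     # a) exactly one imperfection in 3 minutes
pb <- ppois(1, alpha * 15)    # b) at most one in 0.25 h = 15 minutes
round(c(pa, pb), 3)
```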
People enter a department store according to a Poisson process with rate α per hour. It is known that 30% of those entering the store will make a purchase of $50.00 or more. Find the probability mass function of the number of customers who will make purchases of $50.00 or more during the next hour.
Answer: Poisson(0.3α). (Proof omitted, but see Example 3.4-16, p. 143)
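The answer can be checked by simulation. This sketch assumes that each customer independently makes a purchase of $50.00 or more with probability 0.3, and uses a hypothetical rate α = 20 per hour; the count of large purchases is then compared with the Poisson(0.3α) pmf.

```r
set.seed(4)
alpha <- 20                                        # hypothetical arrival rate per hour
n_sim <- 1e5
entered <- rpois(n_sim, alpha)                     # customers entering in one hour
big <- rbinom(n_sim, size = entered, prob = 0.3)   # those spending $50.00 or more
c(mean = mean(big), var = var(big), rate = 0.3 * alpha)

# Empirical proportions vs Poisson(0.3 * alpha) pmf for k = 0, ..., 12
rbind(empirical = tabulate(big + 1, nbins = 13) / n_sim,
      poisson   = dpois(0:12, 0.3 * alpha))
```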
The Normal Distribution
The normal distribution is the most important distribution in probability and statistics. X ∼ N(µ, σ²) if its pdf is
$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty.$$
The pdf of X ∼ N(µ, σ²) in R is dnorm(x, µ, σ). Note that R takes the standard deviation σ, not the variance σ².
The cdf of X ∼ N(µ, σ²) does not have a closed form expression. In R it is pnorm(x, µ, σ).
The (1 − α)100th percentile of X ∼ N(µ, σ²) is found in R by qnorm(1-α, µ, σ).
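A quick numerical check that dnorm() matches the pdf formula and that pnorm() and qnorm() are inverse to each other. The parameters µ = 2 and σ = 1.5 are hypothetical; note again that the third argument is the standard deviation.

```r
mu <- 2; sigma <- 1.5                        # hypothetical parameters
x <- seq(-2, 6, by = 0.5)
f <- exp(-(x - mu)^2 / (2 * sigma^2)) / sqrt(2 * pi * sigma^2)   # pdf formula
all.equal(f, dnorm(x, mean = mu, sd = sigma))                     # TRUE

alpha <- 0.05
x_alpha <- qnorm(1 - alpha, mean = mu, sd = sigma)   # (1 - alpha)100th percentile
pnorm(x_alpha, mean = mu, sd = sigma)                # cdf at the percentile: 0.95
```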
The Standard Normal Distribution
• When µ = 0 and σ = 1, X is said to have the standard normal distribution and is denoted, universally, by Z. The pdf of Z is
$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty.$$
The cdf of Z is denoted by Φ. Thus
$$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \phi(x)\, dx.$$
Φ(z) has no closed form expression, but is tabulated in Table A.3.
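Since Φ has no closed form, it must be computed numerically. As a sketch, the integral defining Φ(z) can be evaluated with R's general-purpose integrate() and compared with pnorm(); the z-values are arbitrary.

```r
z <- c(-1.5, 0, 0.54, 1.09, 2)
# Integrate the standard normal pdf from -Inf to each z
by_integration <- sapply(z, function(b) integrate(dnorm, -Inf, b)$value)
cbind(z, by_integration, pnorm = pnorm(z))
```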
Historical Notes
It was discovered by Abraham DeMoivre in 1733, for approximating binomial probabilities when n is large. He called it the exponential bell-shaped curve.
DeMoivre was the first statistical consultant, working out of "Slaughter's Coffee House", a betting shop in Long Acres, London.
In 1803, Carl Friedrich Gauss used it for predicting the location of astronomical objects. Because of this it became known as the Gaussian distribution.
By the late 19th century, statisticians had noted that most data sets would have approximately bell-shaped histograms. It came to be accepted that it was "normal" for any well-behaved data set to follow this curve. So the Gaussian curve became the normal curve.
Figure: One side of the 10 Mark bill
Basic Properties of the Normal Distribution
Proposition
If X ∼ N(µ, σ²), then
1 E(X) = µ.
2 Var(X) = σ².
3 For any real numbers a, b with b ≠ 0,
$$Y = a + bX \sim N(a + b\mu,\ b^2\sigma^2).$$
For example, if X ∼ N(4, 9), then
$$Y = 5 + 2X \sim N(13, 36).$$
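The example can be checked by simulation. This sketch draws from N(4, 9) (so sd = 3 in R), applies Y = 5 + 2X, and compares the sample mean, variance and a few percentiles of Y with those of N(13, 36); the seed and sample size are arbitrary.

```r
set.seed(5)
x <- rnorm(1e5, mean = 4, sd = 3)    # X ~ N(4, 9); R takes sd = 3, not variance 9
y <- 5 + 2 * x                       # Y = a + bX with a = 5, b = 2
c(mean = mean(y), var = var(y))      # should be near 13 and 36

# Sample percentiles of Y vs. N(13, 36) percentiles (sd = 6)
rbind(sample = quantile(y, c(0.1, 0.5, 0.9)),
      theory = qnorm(c(0.1, 0.5, 0.9), mean = 13, sd = 6))
```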
Corollary
1 If Z ∼ N(0, 1), then X = µ + σZ ∼ N(µ, σ²).
2 If X ∼ N(µ, σ²), then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.
3 If X ∼ N(µ, σ²), then $x_\alpha = \mu + \sigma z_\alpha$, where $x_\alpha$ and $z_\alpha$ denote the (1 − α)100th percentiles of X and Z, respectively.
The corollary implies that probabilities and percentiles of any normal random variable can be computed from corresponding probabilities and percentiles of Z.
The Standard Normal Table
• Table A.3 provides values of Φ(z), the standard normal cdf, for z-values ranging from 0 up to 3.09, in steps of 0.01.
The left-most column of the table provides the z-value up to the first decimal, while the top row is used for the second decimal. Thus, the number 1.00 is identified by 1.0 in the left column and 0.00 in the top row, and the number 1.25 is identified by 1.2 in the left column and 0.05 in the top row.
• The value of the standard normal cdf at negative arguments is found from the formula
$$\Phi(-z) = 1 - \Phi(z),$$
which follows from the symmetry of the standard normal pdf.
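The symmetry formula and the table layout can be checked in R. This sketch confirms Φ(−z) = 1 − Φ(z) over the z-range of Table A.3 and reproduces the table entry in row 1.2, column 0.05.

```r
z <- seq(0, 3.09, by = 0.01)            # the z-values covered by Table A.3
max(abs(pnorm(-z) - (1 - pnorm(z))))    # only floating-point rounding error
round(pnorm(1.25), 4)                   # row 1.2, column 0.05 of the table
```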
Probabilities via the Standard Normal Table
• Table A.3 can be used to find probabilities related to any normal r.v.: if X ∼ N(µ, σ²), its cdf is
$$F_X(x) = \Phi\left(\frac{x - \mu}{\sigma}\right).$$
If X ∼ N(1.25, 0.46²), use Table A.3 to find P(1.00 ≤ X ≤ 1.75).
Solution. From $P(1.00 \le X \le 1.75) = F_X(1.75) - F_X(1.00)$ we have
$$\begin{aligned} P(1.00 \le X \le 1.75) &= \Phi\left(\frac{1.75 - 1.25}{0.46}\right) - \Phi\left(\frac{1.00 - 1.25}{0.46}\right) \\ &= \Phi(1.09) - \Phi(-0.54) = 0.8621 - 0.2946 = 0.5675. \end{aligned}$$
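In R the same probability is a difference of two pnorm() calls. This sketch also evaluates the version with z-values rounded to two decimals, as the table requires, so the small rounding difference from the exact value is visible.

```r
mu <- 1.25; sigma <- 0.46
exact <- pnorm(1.75, mean = mu, sd = sigma) - pnorm(1.00, mean = mu, sd = sigma)
table_version <- pnorm(1.09) - pnorm(-0.54)   # z-values rounded as in Table A.3
c(exact = exact, table_version = table_version)
```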
The 68-95-99.7% Property
If X ∼ N(µ, σ²), then
P(µ− 1σ < X < µ+ 1σ) = P(−1 < Z < 1) = 0.6826,
P(µ− 2σ < X < µ+ 2σ) = P(−2 < Z < 2) = 0.9544,
P(µ− 3σ < X < µ+ 3σ) = P(−3 < Z < 3) = 0.9974.
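The three probabilities can be computed with pnorm() and checked on simulated data. R's values round to 0.6827, 0.9545 and 0.9973; the slide's values come from the four-digit table, so differences in the last digit are expected. The parameters µ = 50 and σ = 10 are hypothetical.

```r
k <- 1:3
round(pnorm(k) - pnorm(-k), 4)     # P(-k < Z < k) for k = 1, 2, 3

set.seed(6)
mu <- 50; sigma <- 10              # hypothetical parameters
x  <- rnorm(1e5, mean = mu, sd = sigma)
props <- sapply(k, function(j) mean(abs(x - mu) < j * sigma))   # empirical proportions
props
```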
Percentiles via the Standard Normal Table
• Recall that Φ(z) does not have a closed form expression.
• Table A.3 can be used to provide an approximate solution for $z_\alpha$ in the equation
$$\Phi(z_\alpha) = 1 - \alpha.$$
• To find $z_\alpha$, first locate 1 − α in the body of Table A.3 and then read $z_\alpha$ from the margins. If the exact value of 1 − α does not appear, locate the values just smaller and just larger than 1 − α, and use the average of the corresponding z-values as an approximation to $z_\alpha$.
• If X ∼ N(µ, σ²), then $x_\alpha = \mu + \sigma z_\alpha$.
Find $z_{0.05}$, the 95th percentile of Z ∼ N(0, 1).
Solution. 1 − α = 0.95 does not appear in the body of the table. The entry that is closest to, but larger than, 0.95 (which is 0.9505) corresponds to 1.65. The entry that is closest to, but smaller than, 0.95 (which is 0.9495) corresponds to 1.64. We approximate $z_{0.05}$ by averaging 1.64 and 1.65, i.e., $z_{0.05} \approx 1.645$.
The R command qnorm(0.95) yields 1.644854.
Let X denote the weight of a randomly chosen frozen yogurt cup. Suppose X ∼ N(8, 0.46²). Find the value c that separates the upper 5% of weight values from the lower 95%.
Solution. This is another way of asking for the 95th percentile, $x_{0.05}$, of X. Using the formula $x_\alpha = \mu + \sigma z_\alpha$, we have
$$x_{0.05} = 8 + 0.46\, z_{0.05} = 8 + (0.46)(1.645) = 8.76.$$
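The same percentile in R, once through the formula with the table value 1.645 and once directly with qnorm(); both round to 8.76.

```r
x05_table <- 8 + 0.46 * 1.645                  # x_alpha = mu + sigma * z_alpha
x05_r     <- qnorm(0.95, mean = 8, sd = 0.46)  # direct computation in R
c(x05_table, x05_r)
```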
2 Random Numbers and Q-Q Plots (Lab 3)
Random Numbers in R
R Names of Common Distributions
| Distribution | R name | Additional arguments |
|---|---|---|
| binomial | binom | size, prob (n, p) |
| hypergeometric | hyper | m, N-m, n |
| geometric | geom | prob (p) |
| negative binomial | nbinom | size, prob (r, p) |
| Poisson | pois | lambda (λ) |
| normal | norm | mean, sd |
| uniform | unif | min, max |
| exponential | exp | rate (λ) |
| gamma | gamma | shape, scale |
| Cauchy | cauchy | location, scale |
Versions of the R Names with “d”, “p”, “q”, and “r”
A “d” in front of the R name gives the pmf or pdf; a “p” gives the cdf; a “q” gives percentiles; an “r” gives random numbers. For example:
• dbinom(x, n, p), for x = 0, 1, ..., n, gives the Bin(n, p) pmf.
• rbinom(m, n, p), for m ≥ 1, gives m random numbers from the Bin(n, p) distribution.
• qnorm(t, µ, σ) gives the 100t-th percentile of the N(µ, σ²) distribution. The default values for µ and σ are 0 and 1; thus, qnorm(0.9) gives the 90th percentile of N(0, 1).
• rcauchy(m) gives m numbers from the Cauchy(0, 1) distribution (the default values for location and scale are 0 and 1).
The “x” in the “d” or “p” versions of the R names can also be a vector of values. Similarly for the “t” in the “q” version of the names. For example, dbinom(0:10, 10, 0.3), or qnorm(c(0.9, 0.95)).
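A few lines that exercise all four prefixes and the vector arguments mentioned above, and show how the versions relate: summing the “d” values gives the “p” value, and “p” undoes “q”. The seed is arbitrary.

```r
pmf <- dbinom(0:10, size = 10, prob = 0.3)   # whole Bin(10, 0.3) pmf at once
sum(pmf)                                     # a pmf sums to 1

# "p" is the running sum of "d" for a discrete distribution
isTRUE(all.equal(pbinom(3, 10, 0.3), sum(dbinom(0:3, 10, 0.3))))

q <- qnorm(c(0.9, 0.95))   # 90th and 95th percentiles of N(0, 1)
pnorm(q)                   # "p" undoes "q": returns 0.90 and 0.95

set.seed(7)
rcauchy(5)                 # five random numbers from Cauchy(0, 1)
```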
Use the command x=rbinom(1000, 4, 0.5) to generate a sample of size 1000 from the Bin(4, 0.5) distribution and store it in x.
• Guess what the command mean(x) might return, and then try the command to verify your guess.
• Guess what the command var(x) might return, and then try the command to verify your guess.
• The command table(x)/1000 will return the proportions of 0, ..., 4 in your sample. Give the outcome and explain why you should have expected this outcome. (Hint: dbinom(0:4, 4, 0.5) gives the pmf of Bin(4, 0.5).)
• Use the commands qnorm(0.9, 10, 3); 10 + 3*qnorm(0.9). Could you have expected that they give the same answer, and why?
• Use the command x=rnorm(10000) to generate a sample of size 10000 from the N(0, 1) distribution and store it in x.
• Guess what the command quantile(x, 0.9) might return, and then try the command to verify your guess. (Hint: qnorm(0.9).)
• Guess what the command quantile(10 + 3*x, 0.9) might return, and then try the command to verify your guess.
The 'heavy' tails of the Cauchy distribution
• Use the following commands to plot the pdfs of the N(0, 1) and Cauchy(0, 1) distributions:
curve(dnorm, -4, 4, col = "red"); curve(dcauchy, -4, 4, add = TRUE, col = "green")
• Observe that there is much more area under the Cauchy pdf in the tails.
Heavy Tails and Outliers
• Random samples from the Cauchy distribution often contain extreme outliers.
Try cs=rcauchy(1000); summary(cs)
For comparison with normal samples, try ns=rnorm(1000); summary(ns)
Try the above commands with 10000 instead of 1000.
• Outliers affect the sample mean!
Repeat the Cauchy commands five times. Note that the sample median is always very close to 0, which is the population median. Is the sample mean always close to 0?
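The repetition can also be done in one go. This sketch (seed and layout are arbitrary choices) records the sample mean and median of a Cauchy sample and of a normal sample, each of size 1000, in five replicates, one replicate per column.

```r
set.seed(8)
sims <- replicate(5, {
  cs <- rcauchy(1000)   # Cauchy(0, 1) sample
  ns <- rnorm(1000)     # N(0, 1) sample
  c(cauchy_mean = mean(cs), cauchy_median = median(cs),
    normal_mean = mean(ns), normal_median = median(ns))
})
round(sims, 3)
```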
Q-Q Plots
Q-Q plots are used to provide visual verification (or contradiction) of an assumed distribution. Here we will concentrate on the assumption of normality. Q-Q stands for Quantile-Quantile. It plots the sample percentiles versus those of the assumed distribution. If the assumption is correct, the plot should be roughly linear (since sample percentiles estimate the corresponding population percentiles).
With the data in the R object x, two versions of the R commands for the normal Q-Q plot are:
qqnorm(x); qqline(x, col = 2)
qqnorm(x, datax = TRUE); qqline(x, datax = TRUE, col = 2)
The first version has the sample percentiles on the y-axis, andthe second puts them on the x-axis.
Q-Q plot for a normal sample: x=rnorm(50); qqnorm(x); qqline(x, col=2)
Repeat the above commands with x=rnorm(50) replaced by a) x=rexp(50), b) x=runif(50), and c) x=rcauchy(50).
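To see what qqnorm() actually plots, the sketch below builds a normal Q-Q plot by hand: the sorted sample (the sample percentiles) against qnorm(ppoints(n)) (theoretical N(0, 1) percentiles). That qqnorm() uses ppoints() for its plotting positions is taken from its R documentation; the all.equal() check confirms it for this sample. The correlation between the two coordinates is one common numerical summary of how straight the plot is.

```r
set.seed(9)
x <- rnorm(50)
n <- length(x)

theo <- qnorm(ppoints(n))   # theoretical N(0, 1) percentiles
samp <- sort(x)             # sample percentiles (order statistics)
plot(theo, samp, xlab = "Theoretical quantiles", ylab = "Sample quantiles")
abline(0, 1, col = 2)       # reference line y = x for an N(0, 1) sample

# qqnorm() returns the same coordinates without drawing when plot.it = FALSE
qq <- qqnorm(x, plot.it = FALSE)
all.equal(sort(qq$x), theo)

cor(theo, samp)             # close to 1 when the plot is nearly linear
```

Replacing rnorm(50) by rexp(50), runif(50) or rcauchy(50) in the second line gives a hand-made version of the lab exercise above.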