Math 163 - Spring 2018 M. Bremer

Random Variables

Definition: A random variable X is a real-valued function that maps a sample space S into the space of real numbers R.

X : S → R

As such, a random variable summarizes the outcome of an experiment in numerical form. For example, we may be interested in how many coin tosses resulted in heads rather than in the actual sequence of heads and tails from an experiment in which a coin is tossed many times.

There are two different kinds of random variables:

• Discrete random variables take on finitely many or countably infinitely many possible values.

• Continuous random variables take on uncountably infinitely many possible values, usually in an interval.

Definition: The probability mass function (PMF) of a discrete random variable is defined as the function

p(a) = P (X = a)

Since probability mass functions are probabilities, we must have 0 ≤ p(a) ≤ 1, and if the probability mass function is summed over all possible values of X we must have

∑_{all a} p(a) = 1

Probability mass functions are either specified in function form, e.g.,

P(X = a) = \binom{n}{a} p^a (1−p)^{n−a},   a = 0, 1, . . . , n

or in table form, e.g.,

a     p(a)
0     1/4
1     1/2
2     1/4
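As a quick numerical illustration, the function-form PMF above can be evaluated and summed in a few lines of Python (a sketch assuming scipy is installed; the values n = 4, p = 1/2 are arbitrary):

    from scipy.stats import binom

    n, p = 4, 0.5
    pmf = [binom.pmf(a, n, p) for a in range(n + 1)]
    print(pmf)        # [0.0625, 0.25, 0.375, 0.25, 0.0625]
    print(sum(pmf))   # 1.0, up to floating-point rounding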

Probability mass functions can also be graphed, most commonly as bar graphs with the possible values on the x-axis and bars whose heights represent the probabilities.

A handout with the probability mass functions and other properties of specific discrete random variables that you are expected to be familiar with is available on the course website.


Example: Coupon collector problem

Suppose there are N distinct types of coupons, and each time one obtains a coupon, it is, independently of the past, equally likely to be any one of the N types. One random variable of interest is T, the number of coupons one has to collect in order to obtain a complete set (with each type of coupon collected at least once). Find the probability mass function of T.
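Before deriving the PMF, it can be instructive to look at T empirically. The following Monte Carlo sketch in Python (the choice N = 10 and the function name are illustrative assumptions) estimates E[T]:

    import random

    def coupons_until_complete(N):
        # Draw coupons uniformly at random until all N types have been seen.
        seen, draws = set(), 0
        while len(seen) < N:
            seen.add(random.randrange(N))
            draws += 1
        return draws

    samples = [coupons_until_complete(10) for _ in range(100_000)]
    print(sum(samples) / len(samples))   # ≈ 10 * (1 + 1/2 + ... + 1/10) ≈ 29.3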

Definition: The function

F(a) = P(X ≤ a)

is called the cumulative distribution function (CDF) of the random variable X. Cumulative distribution functions are always non-decreasing and right-continuous, with

lim_{b→−∞} F(b) = 0,   lim_{b→∞} F(b) = 1


Continuous Random Variables

Definition: Suppose X is a continuous random variable. Then a function f(x) with

P(X ∈ B) = ∫_B f(x) dx

is called the probability density function (PDF) of X.

Probability density functions must satisfy the following properties:

• They must be non-negative everywhere: f(x) ≥ 0 for all x ∈ R.

• If integrated over all possible values of X (or the whole of R), we must obtain one:

∫_{−∞}^{∞} f(x) dx = P(X ∈ R) = 1

since the random variable X must take on some value.

Example: Suppose X is a continuous random variable with PDF

f(x) = { C(4x − 2x²),   0 < x < 2
       { 0,             otherwise

(a) Find the value of the constant C.

(b) Find P (X > 1).
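For checking the two answers, here is a short symbolic sketch in Python (assuming sympy is available; the symbol names are illustrative):

    import sympy as sp

    x, C = sp.symbols("x C", positive=True)
    pdf = C * (4*x - 2*x**2)

    # (a) Choose C so that the density integrates to one over (0, 2).
    C_val = sp.solve(sp.Eq(sp.integrate(pdf, (x, 0, 2)), 1), C)[0]
    print(C_val)   # 3/8

    # (b) P(X > 1) is the integral of the density from 1 to 2.
    print(sp.integrate(pdf.subs(C, C_val), (x, 1, 2)))   # 1/2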


Example: The cumulative distribution function of a random variable X is given by

F(x) = { 0,     x < 0
       { x/2,   0 ≤ x < 1
       { 2/3,   1 ≤ x < 2
       { 1,     2 ≤ x

Graph the CDF. Is X discrete or continuous? Find

P (X < 2)

P (X = 1)

P (X = 1/2)

P (1/2 ≤ X ≤ 1)
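For reference, a sketch of how such questions are read off the CDF above: F has jumps at 1 and 2 but also increases continuously on (0, 1), so X is neither purely discrete nor purely continuous. Then P(X < 2) = F(2−) = 2/3; P(X = 1) = F(1) − F(1−) = 2/3 − 1/2 = 1/6; P(X = 1/2) = 0, since F is continuous at 1/2; and P(1/2 ≤ X ≤ 1) = F(1) − F(1/2−) = 2/3 − 1/4 = 5/12.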

Example*: An insurance company determines that N, the number of claims received in a week, is a random variable with

P(N = n) = 1/2^{n+1},   where n ≥ 0

The company also determines that the number of claims received in a given week is independent of the number of claims received in any other week. Calculate the probability that exactly seven claims will be received during a given two-week period.
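A brute-force check in Python (a sketch; by independence, the two-week count is the convolution of two weekly counts):

    from fractions import Fraction

    def p(n):   # P(N = n) = 1 / 2^(n+1)
        return Fraction(1, 2 ** (n + 1))

    # P(N1 + N2 = 7), summing over the 8 ways to split 7 claims over two weeks
    print(sum(p(k) * p(7 - k) for k in range(8)))   # 1/64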


Expected Value and Variance

Historically, expected values are what gave rise to the study of probability. They can be understood as the "long-run average" outcome or as the center of the distribution of a random variable. Expected values are defined very similarly for discrete and continuous distributions.

Definition: For a discrete random variable X with probability mass function p(x), the expected value of X is defined as

E[X] = ∑_{x: p(x)>0} x p(x)

That is, the expected value is a weighted sum of all possible values of X, where the weights are the probabilities.

For a continuous random variable, the idea is very similar. But instead of taking a sum over the countably many possible values, the probability density function is integrated over all possible values.

Definition: Let X be a continuous random variable with probability density function f(x). Then the expected value of X is defined as

E[X] = ∫_{−∞}^{∞} x f(x) dx

Example: Consider a discrete random variable with probability mass function p(x). Find E[X] and indicate it in the graph of the PMF.

x     p(x)
1     1/4
2     1/2
8     1/4

[Graph: bar chart of the PMF, with the values 1 through 8 on the x-axis.]
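A one-line check in Python (the lists mirror the table above):

    xs = [1, 2, 8]
    ps = [0.25, 0.5, 0.25]
    print(sum(x * p for x, p in zip(xs, ps)))   # 3.25

Note that 3.25 lies between 1 and 8 but is not itself a possible value of X.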

Note: An expected value must always be within the range of possible values of a random variable.

Example*: Find E[X] if the density of the continuous random variable X is

f(x) = { |x|/10,   −2 ≤ x ≤ 4
       { 0,        otherwise
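A symbolic check of this computation in Python (a sketch assuming sympy is available):

    import sympy as sp

    x = sp.symbols("x", real=True)
    f = sp.Piecewise((-x / 10, x < 0), (x / 10, True))   # |x| / 10
    print(sp.integrate(x * f, (x, -2, 4)))               # 28/15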


Frequently, the expected value of a function of the random variable X is of interest, rather than the expected value of the random variable X itself.

Theorem: Let X be a discrete random variable and let g(x) be a real-valued function. Then

E[g(X)] = ∑_{x: p(x)>0} g(x) p(x)

Proof:

Example: Find E[aX + b] in terms of E[X] if a, b ∈ R are constants.
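A sketch of the computation, applying the theorem with g(x) = ax + b:

E[aX + b] = ∑_{x: p(x)>0} (ax + b) p(x) = a ∑ x p(x) + b ∑ p(x) = a E[X] + b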

Example*: A tour operator has a bus that can accommodate 20 tourists. The operator knows that tourists may not show up, so he sells 21 tickets. The probability that an individual tourist will not show up is 0.02, independent of all other tourists. Each ticket costs 50 and is non-refundable if a tourist fails to show up. If a tourist shows up and a seat is not available, the tour operator has to pay 100 (ticket cost + 50 penalty) to the tourist. Calculate the expected revenue of the tour operator.
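For checking the answer numerically: the only uncertainty is whether all 21 ticket holders show up, which is the only event that triggers the 100 payout. A short Python sketch:

    # Ticket sales are fixed at 21 * 50; the 100 payout occurs only if all
    # 21 tourists show up, each independently with probability 0.98.
    p_all_show = 0.98 ** 21
    print(21 * 50 - 100 * p_all_show)   # ≈ 984.6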


Similarly to the discrete case, we can also compute expected values of functions of continuous random variables. However, to prove the corresponding statement, the following result is helpful.

Lemma: For a nonnegative random variable Y (discrete or continuous),

E[Y] = ∫_0^{∞} P(Y > y) dy

Proof:

Theorem: If X is a continuous random variable with PDF f(x), then for any real-valued function g(x),

E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx

Proof:

Example: A stick of length 1 is split at a point U having density function

f(u) = { 1,   0 ≤ u ≤ 1
       { 0,   otherwise

Determine the expected length of the piece that contains the point p (0 ≤ p ≤ 1).
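A Monte Carlo sketch in Python (the value p = 0.25 and the function name are illustrative):

    import random

    def mean_piece_length(p, trials=100_000):
        # Estimate the expected length of the piece containing p.
        total = 0.0
        for _ in range(trials):
            u = random.random()                 # split point, Uniform(0, 1)
            total += (1 - u) if u < p else u    # length of the piece containing p
        return total / trials

    print(mean_piece_length(0.25))   # ≈ 0.6875 = 1/2 + p(1 - p)

The estimate agrees with the closed form 1/2 + p(1 − p) that the integral E[length] = ∫_0^p (1−u) du + ∫_p^1 u du produces.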


While the expected value of a random variable measures the "center" of the distribution of X, the variance measures the spread of the distribution.

Definition: If X is a random variable with mean µ, then the variance of X, denoted by Var(X), is defined by

Var(X) = E[(X − µ)²]

That is, the variance is the expected squared difference of X and its mean.

Note: This definition is valid for both discrete and continuous random variables.

Fact: Alternatively, the variance can be computed as

Var(X) = E[X²] − (E[X])²

Proof: (discrete case)

Example: Let X be a continuous random variable with density function

f(x) = { 1/(b−a),   a ≤ x ≤ b
       { 0,         otherwise

Find the variance of X.
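A symbolic check in Python (a sketch assuming sympy is available), using Var(X) = E[X²] − (E[X])²:

    import sympy as sp

    x, a, b = sp.symbols("x a b", real=True)
    f = 1 / (b - a)                               # uniform density on [a, b]

    EX  = sp.integrate(x * f, (x, a, b))          # (a + b)/2 after simplifying
    EX2 = sp.integrate(x**2 * f, (x, a, b))
    print(sp.simplify(EX2 - EX**2))               # (a - b)**2/12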


Example: Let X be a random variable (discrete or continuous) with variance σ². Find Var(aX + b), where a, b ∈ R are constants.
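A sketch of the computation, writing µ = E[X] and using E[aX + b] = aµ + b from before:

Var(aX + b) = E[(aX + b − (aµ + b))²] = E[a²(X − µ)²] = a² Var(X) = a²σ²

In particular, the additive constant b does not affect the variance.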

Example*: A recent study indicates that the annual cost of maintaining and repairing a car in a town in Ontario averages 200 with a variance of 260. A tax of 20% is introduced on all items associated with the maintenance and repair of cars (i.e., everything is made 20% more expensive). Calculate the variance of the annual cost of maintaining and repairing a car after the tax is introduced.
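(For checking your work: if X is the pre-tax cost, the taxed cost is Y = 1.2X, so by the previous example Var(Y) = 1.2² · 260 = 374.4.)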


Distribution of a Function of a Random Variable

Suppose you know the distribution of a random variable X. How do you find the distribution of some function Y = g(X) of X?

Example: Let X be a continuous random variable with CDF F_X(x). Find the CDF of Y = X².
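A sketch of the standard approach: for y ≥ 0,

F_Y(y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = F_X(√y) − F_X(−√y),

and F_Y(y) = 0 for y < 0. Differentiating (where F_X is differentiable) yields the density of Y.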

Theorem: Let X be a continuous random variable with probability density function f_X(x). Suppose that g(x) is a strictly monotonic (increasing or decreasing), differentiable (and thus continuous) function of x. Then the random variable Y defined by Y = g(X) has probability density function

f_Y(y) = { f_X(g⁻¹(y)) · |d/dy g⁻¹(y)|,   if y = g(x) for some x
         { 0,                             if y ≠ g(x) for all x

Proof:

Example: Let X be a continuous nonnegative random variable with density function f and let Y = Xⁿ. Find f_Y, the density function of Y.


Named Discrete Distributions

You should be familiar with the discrete and continuous distributions introduced in Math 161A. Being familiar with a distribution includes knowing its parameters, possible values, and the probability mass function or density, as well as the cumulative distribution function of the distribution. In addition, you should know (and be able to derive where appropriate) formulas for the mean and variance of each distribution. Recall that all distributions covered in 161A (and their key properties) are listed on the two handouts "Discrete Distributions" and "Continuous Distributions" available on the course web site. The same information is available inside the front and back covers of your textbook.

Each distribution is useful for modeling random variables in specific kinds of situations. We will briefly review the distributions and the situations for which they are intended.

Bernoulli: X ∼ Bernoulli(p). X models whether or not a single trial will result in a success.

p(1) = p,   p(0) = 1 − p

Here, p is the success probability.

Fact: E[X] = p,   Var(X) = p(1 − p)

Binomial: X ∼ Binomial(n, p). X models the number of successes in n independent trials, each of which will result in a success with probability p.

p(x) = \binom{n}{x} p^x (1−p)^{n−x},   x = 0, 1, . . . , n

Fact: E[X] = np,   Var(X) = np(1 − p).

Fact: The sum of n independent Bernoulli random variables with the same parameter p has a Binomial(n, p) distribution.
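The sum-of-Bernoullis fact can be checked empirically. A Python sketch (assuming numpy is installed; the parameters n = 10, p = 0.3 are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 10, 0.3

    # Sum n independent Bernoulli(p) draws, many times over; the sample
    # mean and variance should be close to np and np(1 - p).
    sums = rng.binomial(1, p, size=(100_000, n)).sum(axis=1)
    print(sums.mean(), sums.var())   # ≈ 3.0 and ≈ 2.1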

Example: In a U.S. presidential election, the candidate who gains the maximum number of votes in a state is awarded the total number of electoral college votes allocated to that state. The number of electoral college votes is roughly proportional to the population of that state. That is, a state with population n has roughly nc electoral votes. In which states does your vote have more average power in a close election? Here, average power is defined as the expected number of electoral votes that your vote will affect. Let's assume that the total population of the state you are in is odd: n = 2k + 1.


Hypergeometric: X ∼ Hypergeometric(n, m, N). X is the number of special objects in a sample of size n taken from a population of N objects (without replacement) of which m are special.

p(x) = \binom{m}{x} \binom{N−m}{n−x} / \binom{N}{n},   x = 0, 1, . . . , min{n, m}

Fact: E[X] = nm/N,   Var(X) = ((N−n)/(N−1)) · n (m/N) (1 − m/N).

Fact: If N is very large compared to n, then it makes essentially no difference whether you draw with or without replacement, and thus a Hypergeometric random variable with N ≫ n can be approximated by a Binomial random variable with p = m/N.
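A numerical illustration of this approximation (a Python sketch assuming scipy; the population sizes are arbitrary):

    from scipy.stats import binom, hypergeom

    N, m, n = 10_000, 1_000, 10       # N >> n, so p = m/N = 0.1
    for x in range(4):
        print(x, hypergeom.pmf(x, N, m, n), binom.pmf(x, n, m / N))
    # the two PMFs agree to about three decimal places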

Geometric: X ∼ Geometric(p). X is the number of independent trials that have to be performed until the first success is observed. Here, p is the probability of a success in each trial.

p(x) = (1−p)^{x−1} p,   x = 1, 2, . . .

Fact: E[X] = 1/p,   Var(X) = (1−p)/p².

Fact: The geometric distribution has the “lack-of-memory” property

P (X > s+ t|X > t) = P (X > s)

Example: Find a closed-form formula for the CDF of a Geometric(p) random variable.
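A sketch of one route: for an integer x ≥ 1, the event {X > x} means the first x trials were all failures, so P(X > x) = (1 − p)^x and therefore

F(x) = P(X ≤ x) = 1 − (1 − p)^{⌊x⌋} for x ≥ 1,

with F(x) = 0 for x < 1 (the floor handles non-integer arguments).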

Negative Binomial: X ∼ Negative Binomial(r, p). X is the number of independent trials that have to be performed until the r-th success is observed. Here, p is the probability of success in each trial.

p(x) = \binom{x−1}{r−1} p^r (1−p)^{x−r},   x = r, r + 1, . . .

Fact: E[X] = r/p,   Var(X) = r(1−p)/p².

Fact: The sum of r independent geometric random variables (with the same parameter p) has a negative binomial distribution with parameters r and p.


Example: The Banach match problem

A pipe-smoking mathematician carries two matchboxes, one in his right pocket and one in his left. Each time he needs a match, he is equally likely to choose either pocket. Initially, each box contained N matches. Consider the moment the mathematician first discovers a matchbox to be empty. At this time, what is the probability that there are exactly k matches in the other box (k = 0, 1, . . . , N)?
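A simulation sketch in Python (the value N = 50 and the names are illustrative); the estimate can be compared with the known closed form \binom{2N−k}{N} (1/2)^{2N−k}:

    import random

    def matches_left(N=50):
        # Matches in the other box when a box is first found empty.
        boxes = [N, N]
        while True:
            i = random.randrange(2)     # pick a pocket at random
            if boxes[i] == 0:           # box discovered empty
                return boxes[1 - i]
            boxes[i] -= 1

    counts = [matches_left() for _ in range(100_000)]
    print(sum(c == 10 for c in counts) / len(counts))   # ≈ P(k = 10)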

Poisson: X ∼ Poisson(λ). X is the number of times a "rare" event occurs (in a certain time or space interval).

p(x) = e^{−λ} λ^x / x!,   x = 0, 1, . . .

The Poisson distribution can be understood as an approximation of the binomial distribution in those cases where n is large and p is small enough so that np is moderately small.

Examples:

• The number of hurricanes in the central U.S. in a month.

• The number of misprints on a page of some document.

• The number of walnuts in a walnut cookie.

Fact: E[X] = λ,   Var(X) = λ.

Fact: The sum of independent Poisson random variables is Poisson. The parameters add.

Example: Poisson random variables can be derived from Binomial random variables. To count the number of rare events, imagine an interval (that can stand for time, or a page, or a volume of cookie dough) split up into n little subintervals.

[Diagram: an interval subdivided into n small subintervals.]

It is always possible to make the subintervals small enough (by making n large) such that there is at most one event in a subinterval. Suppose that the probability that a subinterval has an event in it is p.


We are interested in the number of times X the event occurs in the interval. Strictly speaking, X has a Binomial(n, p) distribution, but with a very large n and a small p (since the events are "rare"). What happens to the Binomial PMF as n → ∞? Let λ = np.
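A sketch of the limit computation, substituting p = λ/n:

P(X = x) = \binom{n}{x} (λ/n)^x (1 − λ/n)^{n−x}
         = [n(n−1)···(n−x+1) / n^x] · (λ^x / x!) · (1 − λ/n)^n · (1 − λ/n)^{−x}
         → 1 · (λ^x / x!) · e^{−λ} · 1   as n → ∞,

which is exactly the Poisson(λ) PMF.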

While working with specified distributions is quite straightforward (e.g., computing values of the PMF, CDF, expected values, or variances), it can sometimes be challenging for students to recognize which distribution to use in a specific situation.

Example: Skittles are small fruit candies that come in many different colors. About 10% of all Skittles are orange. Skittles are sold in randomly filled packages of 90 candies each. For each of the following random variables, state the distribution and find the values of all relevant parameters. (A sanity-check sketch follows the list.)

(a) X is the number of orange Skittles in one package.

(b) X is the number of packages you buy until you get one that has no orange Skittles.

(c) X is the number of orange Skittles you eat, if you eat ten from a full package that had 12 orange ones in it.

(d) Suppose you get "Skittle cravings" on average twice every day. X is the number of Skittle cravings you'll have within the next 36 hours.

(e) X is the number of Skittles you'll eat (randomly selected from a very large supply) until you eat your 10th orange Skittle.

(f) Suppose you eat ten randomly selected Skittles every day for a week. X is the number of days on which you eat no orange ones.
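One plausible set of identifications, with the non-obvious parameter values computed in a short Python sketch (independence is assumed wherever the problem implies it):

    # (a) Binomial(n = 90, p = 0.1)
    # (b) Geometric(p), with p = P(a package contains no orange Skittles)
    print(0.9 ** 90)   # ≈ 7.6e-05
    # (c) Hypergeometric(n = 10, m = 12, N = 90)
    # (d) Poisson(λ) with λ = 2 cravings/day * 1.5 days = 3
    # (e) Negative Binomial(r = 10, p = 0.1)
    # (f) Binomial(n = 7, p), with p = P(no orange among ten) = 0.9 ** 10
    print(0.9 ** 10)   # ≈ 0.349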


Named Continuous Distributions

Uniform: X ∼ Uniform(a, b). The random variable X is equally likely to assume any position in the interval [a, b].

f(x) = { 1/(b−a),   a ≤ x ≤ b
       { 0,         otherwise

Fact: E[X] = (a+b)/2,   Var(X) = (b−a)²/12.

Fact: The CDF of a continuous Uniform(a, b) random variable is

F(x) = { 0,             x < a
       { (x−a)/(b−a),   a ≤ x ≤ b
       { 1,             x > b

Exponential: X ∼ Exponential(λ). The exponential distribution is the continuous analog of the geometric distribution. It is frequently used to model waiting times and has a close relationship with the Poisson distribution.

f(x) = { λe^{−λx},   x ≥ 0
       { 0,          otherwise

Examples:

• X is the time until the next customer arrives at a bank.

• X is the time until a lightbulb burns out (lifetime).

• X is the mileage you get out of one tank of gas.

Fact: E[X] = 1/λ,   Var(X) = 1/λ².

Fact: Like the geometric distribution, the exponential distribution also has the memoryless property

P (X > s+ t|X > t) = P (X > s)

Example: Suppose the number of events that occur in a unit time interval has a Poisson distribution with mean λ. Find the distribution of the amount of time until the first event occurs.
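A sketch of the connection (assuming the count of events in an interval of length t is Poisson(λt)): let T be the time until the first event. Then

P(T > t) = P(no events in [0, t]) = e^{−λt},

so F_T(t) = 1 − e^{−λt} for t ≥ 0, which is exactly the Exponential(λ) CDF.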


Gamma Distribution: X ∼ Gamma(r, λ). X models the continuous waiting time until the r-th occurrence of an event.

f(x) = { λe^{−λx} (λx)^{r−1} / Γ(r),   x ≥ 0
       { 0,                            x < 0

Here, Γ(r) is the gamma function, which is defined as

Γ(r) = (r − 1)!

if r is an integer. But r does not necessarily have to be an integer in the above definition of the gamma distribution. For general r, the gamma function is defined as

Γ(r) = ∫_0^{∞} e^{−y} y^{r−1} dy

Fact: E[X] = r/λ,   Var(X) = r/λ².

Fact: The sum of r independent exponential random variables with the same parameter λ has a Gamma(r, λ) distribution.

Fact: A gamma random variable with λ = 1/2 and r = n/2 for some positive integer n is called a χ²_n (chi-squared) random variable with n degrees of freedom.

Normal: X ∼ Normal(µ, σ²). The normal distribution is also sometimes called the Gaussian distribution, after Carl Friedrich Gauss, who was the first to officially define this distribution in 1809. Its PDF has the characteristic "bell-curve" shape.

f(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)},   −∞ < x < ∞

Fact: E[X] = µ,   Var(X) = σ².

Example: Linear transformations of normal random variables are normal. That is, suppose X ∼ Normal(µ, σ²). Find the distribution of Y = aX + b (a, b ∈ R).
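A sketch using the transformation theorem from earlier, with g(x) = ax + b (take a ≠ 0): g⁻¹(y) = (y − b)/a, so

f_Y(y) = f_X((y − b)/a) · 1/|a| = (1/(√(2π) |a|σ)) e^{−(y − (aµ + b))²/(2a²σ²)},

which is the Normal(aµ + b, a²σ²) density.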


Fact: A normal distribution with mean µ = 0 and variance σ² = 1 is called a standard normal distribution.

Fact: The CDF of a standard normal random variable is denoted Φ(x) = P(X ≤ x), where X ∼ Normal(0, 1). There is no closed form for Φ(x). Instead, the values of Φ(x) are obtained from tables or through software.

Fact: The sum of independent normal random variables is normal. Let X ∼ Normal(µ₁, σ₁²) and Y ∼ Normal(µ₂, σ₂²) be independent. Then

X + Y ∼ Normal(µ₁ + µ₂, σ₁² + σ₂²)

Even before Carl Friedrich Gauss officially defined the normal distribution (in his paper about least squares and maximum likelihood methods), Abraham de Moivre and Pierre-Simon Laplace showed that a Binomial random variable with large n can be approximated well by a normal random variable with the same mean and variance as the binomial. They first proved this result only for p = 1/2 and, after Gauss' paper, extended it to the general case.

Theorem: The DeMoivre-Laplace Limit Theorem

Let S_n denote the number of successes in n independent trials, each resulting in a success with probability p. Then for any a < b,

P(a ≤ (S_n − np)/√(np(1−p)) ≤ b) → Φ(b) − Φ(a)   as n → ∞

We will consider the proof when we discuss the Central Limit Theorem (of which this is a special case) later in the semester.

Example: Consider X ∼ Binomial(n = 10, p = 0.3). Use your calculator to compute P(3 ≤ X ≤ 6). Also use the normal approximation to the Binomial (with continuity correction) to approximate this same probability. Use either your calculator or a normal table to look up the required normal CDF values.

[Figure: PMF of Binomial(10, 0.3), labeled Binom(x, n, p), with its normal approximation overlaid.]
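For comparison, both numbers can be produced in a few lines of Python (a sketch assuming scipy is installed):

    from scipy.stats import binom, norm

    n, p = 10, 0.3
    mu, sigma = n * p, (n * p * (1 - p)) ** 0.5

    exact = binom.cdf(6, n, p) - binom.cdf(2, n, p)
    # continuity correction: treat {3 <= X <= 6} as the interval [2.5, 6.5]
    approx = norm.cdf(6.5, mu, sigma) - norm.cdf(2.5, mu, sigma)
    print(exact, approx)   # ≈ 0.607 and ≈ 0.627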


Chi-Squared Distribution

The χ² (chi-squared) distribution appears in several common hypothesis tests (e.g., goodness of fit, likelihood ratio). It is related to both the gamma and the normal distributions.

Definition: A continuous random variable with density function

f(x) = { x^{n/2 − 1} e^{−x/2} / (2^{n/2} Γ(n/2)),   x > 0
       { 0,                                         otherwise

is said to have a χ²_n distribution (chi-squared with n degrees of freedom).

Fact: The χ²_n distribution is a special case of the gamma distribution with λ = 1/2 and r = n/2.

[Figure: χ²_n probability density functions for degrees of freedom n = 1, . . . , 6.]

Fact: The mean and variance of a χ²_n distribution are

E[X] = n,   Var(X) = 2n

(these follow from the gamma facts with r = n/2 and λ = 1/2).

Example: Let Z ∼ Normal(0, 1). Show that Z² has a χ²_1 distribution.
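A sketch via the CDF method: for x > 0,

F_{Z²}(x) = P(−√x ≤ Z ≤ √x) = 2Φ(√x) − 1.

Differentiating gives f(x) = x^{−1/2} e^{−x/2} / √(2π), and since Γ(1/2) = √π, this is the χ²_1 density above.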


Cauchy Distribution

Definition: A random variable is said to have a Cauchy distribution with parameter θ if it has probability density function

f(x) = (1/π) · 1/(1 + (x − θ)²),   −∞ < x < ∞

[Figure: Cauchy probability density functions for θ = −1, 0, 1.]

Definition: If X ∼ Cauchy(θ = 0), X is said to have a standard Cauchy distribution.

Fact: If X and Y are independent standard normal random variables, then their quotient X/Y has a standard Cauchy distribution.

Proof of this statement will follow once we have introduced joint probability density functions.

Fact: A standard Cauchy distribution is the same as Student's t-distribution with one degree of freedom. We will study t-distributions in a few weeks.

Example: Since the standard Cauchy distribution is symmetric, its median is zero. The mean of a standard Cauchy random variable is not defined.
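A sketch of why the mean fails to exist:

∫_{−∞}^{∞} |x| · (1/π) · 1/(1 + x²) dx = (2/π) ∫_0^{∞} x/(1 + x²) dx = (1/π) ln(1 + x²) |_0^{∞} = ∞,

so the defining integral for E[X] is not absolutely convergent.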
