Chapter 1 Probability and Distributions
1.1 Introduction
Boxiang Wang, The University of Iowa Chapter 1 STAT 4100 Fall 2018
Question: What is probability?
Answer:
I Probability is the chance for something to happen, so it is always a number in [0, 1].
I In addition, in this class, probability is also the name of the mathematical tool we are going to use.
How do we learn probability?
I Mathematics is the language we use to communicate with the universe. Its grammar consists of rules called axioms and theorems.
I When we learn probability, we learn those rules. So prepare for a bit of a headache at the beginning, similar to learning a second language.
Relationship between probability and statistics?
I Probability: from population to sample.
I Statistics: from sample to population.
Random experiment
I The experiment can be repeated under the same conditions.
I Each experiment terminates with an outcome.
I The outcome cannot be predicted with certainty prior to the performance of the experiment.
I The collection of all possible outcomes can be described prior to the performance of the experiment.
This collection is called the sample space, usually denoted by C.
Example (1.1.1)
1 Toss a coin once, sample space C = {T, H}.
2 Toss a coin twice, sample space?
Event
A subset of the sample space C, usually denoted by C, is called an event. If an outcome belongs to C, then we say that the event C has occurred.
Example (1.1.2)
1 Cast one red and one white die. The sample space consists of 36 ordered pairs,
C = {(1, 1), . . . , (1, 6), (2, 1), . . . , (2, 6), . . . , (6, 6)}.
2 Event: the sum of the numbers on the dice is 9,
C = {(3, 6), (4, 5), (5, 4), (6, 3)}.
Probability
Probability quantifies the notion of chance or likelihood of an event. Relative frequency is an empirical definition of probability:
I Suppose the experiment is repeated N times.
I Let kN denote the number of times the event C actually occurred.
I Then fN = kN/N is the relative frequency of the event C in the repeated experiments.
Probability: frequentist
I Relative frequency: fN = kN/N.
I Suppose that N increases.
I Suppose that p = lim_{N→∞} fN exists. Note p ∈ [0, 1].
I Then p is the probability of the event C.
Example (1.1.2, cont’d)
Sum of numbers on dice is 9:
C = {(3, 6), (4, 5), (5, 4), (6, 3)}
If each of the 36 outcomes is equally likely, then p = 4/36.
I We denote the probability of an event C by P(C).
I It is the long-run relative frequency of the event C in a very large number of independent replications of the experiment.
Subjective probability
Consider the event C = {Hawkeye wins the NCAA basketball championship in 2019}. Suppose that I offer you two lottery tickets, and you can choose between them.
I If you pick lottery ticket 1, then we spin a roulette wheel that has 100 slots and has been declared “fair” by the Nevada Gaming Commission. If the ball lands in a slot from 1 through 10, you get $100 and otherwise you get $0.
I If you pick lottery ticket 2, then you get $100 if C occurs and otherwise you get $0.
If you choose lottery ticket 1, then your subjective p ≤ 0.10, and if you choose lottery ticket 2, then your subjective p ≥ 0.10.
I The advantage of subjective probability is that it can be extended to experiments that cannot be repeated. (Think about betting in sports, investing your money, . . . .)
I Mathematically, the two concepts of probability are identical. In many situations we will not need to make the distinction.
Chapter 1 Probability and Distributions
1.2 Set Theory
Sets
I A set is a collection of objects. If an element x belongs to a set C, then we write x ∈ C.
I If each element of a set C1 is also an element of another set C2, then C1 is called a subset of C2, written as C1 ⊂ C2.
I If C1 ⊂ C2 and C2 ⊂ C1, then C1 = C2.
Example (1.2.1)
Define sets C1 = {x : 0 ≤ x ≤ 1} and C2 = {x : −1 ≤ x ≤ 2}. We have C1 ⊂ C2.
Example (1.2.2)
Define sets C1 = {(x, y) : 0 ≤ x = y ≤ 1} and C2 = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}. We have C1 ⊂ C2.
I If a set C has no elements, then C is called the null set, written as C = ∅.
Union
I The set of all elements that belong to at least one of the sets C1 and C2 is called the union of C1 and C2, written as C1 ∪ C2.
I For example, if C1 = {1, 2, 3}, C2 = {2, 3, 5}, then C1 ∪ C2 = {1, 2, 3, 5}.
I The set of elements that belong to at least one of the sets C1, . . . , Ck is C1 ∪ C2 ∪ · · · ∪ Ck, also written ∪_{i=1}^k Ci.
Example
I C ∪ ∅ = C.
I C ∪ C = C.
I If C1 ⊂ C2, then C1 ∪ C2 = C2.
I If Ci = {x : x ∈ [i − 1, i]}, i = 1, . . . , k, then ∪_{i=1}^k Ci = {x : x ∈ [0, k]}.
Example (1.2.7)
Ck = {x : 1/(k + 1) ≤ x ≤ 1}, k = 1, 2, . . . .
Then C1 ∪ C2 ∪ C3 ∪ · · · = (0, 1].
Intersection
I The set of all elements that belong to each of the sets C1 and C2 is called the intersection of C1 and C2, written as C1 ∩ C2.
I If C1 = {1, 2, 3}, C2 = {2, 3, 5}, then C1 ∩ C2 = {2, 3}.
I If C1 = [0, 1] and C2 = [−1, 0], then C1 ∩ C2 = {0}.
I If Ci = (0, 1/i), i = 1, . . . , k, then ∩_{i=1}^k Ci = (0, 1/k).
Example (1.2.11)
Let Ck = {x : 0 < x < 1/k}, k = 1, 2, . . . .
Then C1 ∩ C2 ∩ C3 ∩ · · · = ∅.
Example
Use Venn diagrams to depict the sets C1 ∪ C2, C1 ∩ C2, (C1 ∪ C2) ∩ C3 and (C1 ∩ C2) ∪ C3.
Space & Complement
I The set of all elements under consideration is called the space, written as C.
I Number of heads in tossing a coin ten times: the space is C = {0, 1, . . . , 10}.
I Let C be the sample space and C be its subset. The set that consists of all elements of C that are not elements of C is called the complement of C, written as C^c.
I Number of heads in tossing a coin ten times: if C = {0, 1, 2, 3, 4}, then C^c = {5, 6, 7, 8, 9, 10}.
1 C ∪ C^c = C.
2 C ∩ C^c = ∅.
3 (C^c)^c = C.
Important basic rules
1 DeMorgan’s laws: Let C denote the space and suppose C1, C2 ⊂ C. Then
A: (C1 ∩ C2)^c = C1^c ∪ C2^c.
B: (C1 ∪ C2)^c = C1^c ∩ C2^c.
2 Distributive laws: Let C denote the space and suppose C1, C2, C3 ⊂ C. Then
A: C1 ∪ (C2 ∩ C3) = (C1 ∪ C2) ∩ (C1 ∪ C3).
B: C1 ∩ (C2 ∪ C3) = (C1 ∩ C2) ∪ (C1 ∩ C3).
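Both sets of laws can be checked mechanically on any finite example; a quick sketch with Python sets (the particular space and sets below are arbitrary choices for illustration):

```python
# Check DeMorgan's laws and the distributive laws on a small finite space.
space = set(range(10))
C1, C2, C3 = {1, 2, 3}, {2, 3, 5}, {3, 5, 7}

def comp(A):
    """Complement of A within the space."""
    return space - A

# DeMorgan's laws
assert comp(C1 & C2) == comp(C1) | comp(C2)
assert comp(C1 | C2) == comp(C1) & comp(C2)
# Distributive laws
assert C1 | (C2 & C3) == (C1 | C2) & (C1 | C3)
assert C1 & (C2 | C3) == (C1 & C2) | (C1 & C3)
print("all identities hold")
```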
Set functions
I An ordinary function maps each point to a real number.
I A set function maps each set to a real number.
More specifically, let C be a space and C be its subset. A mapping Q that assigns a value Q(C) to the subset C (rather than to an element x) is called a set function.
Example (1.2.18)
Let C be a set in one-dimensional space and let Q(C) be equal to the number of points in C which correspond to positive integers. Then Q(C) is a function of the set C. Thus:
I if C = {x : 0 < x < 5}, then Q(C) = 4;
I if C = {−2, −1}, then Q(C) = 0;
I if C = {x : −∞ < x < 6}, then Q(C) = 5.
Example (1.2.23)
Let C be a set in one-dimensional space and let
Q(C) = ∫_C e^{−x} dx.
If C = {x : 0 ≤ x < ∞}, then
Q(C) = ∫_0^∞ e^{−x} dx = 1.
If C = {x : 1 < x ≤ 3}, then
Q(C) = ∫_1^3 e^{−x} dx = e^{−1} − e^{−3}.
Example (1.2.24)
Let C be a set in n-dimensional space and let
Q(C) = ∫ · · · ∫_C dx1 · · · dxn.
If C = {(x1, x2, . . . , xn) : 0 ≤ x1, x2, . . . , xn ≤ 1}, then
Q(C) = ∫_0^1 ∫_0^1 · · · ∫_0^1 dx1 dx2 · · · dxn = 1.
If C = {(x1, x2, . . . , xn) : 0 ≤ x1 ≤ x2 ≤ · · · ≤ xn ≤ 1}, then
Q(C) = ∫_0^1 ∫_0^{xn} · · · ∫_0^{x3} ∫_0^{x2} dx1 dx2 · · · dxn−1 dxn = 1/n!.
Chapter 1 Probability and Distributions
1.3 The Probability Set Function
σ-field
Definition (σ-field)
A collection of subsets of C that contains C itself and is closed under complementation and countable unions of its members is a σ-field. This collection of events is usually denoted by B.
I By DeMorgan’s laws, the collection is also closed under countable intersections.
Probability set function
Definition (Probability set function)
Let C be a sample space and B be a σ-field defined on C. Let P be a real-valued function defined on B. Then P is a probability set function if
1 P(C) ≥ 0 for all C ∈ B;
2 P(C) = 1 for the sample space C;
3 If Ci ∈ B (i = 1, 2, . . .) and Ci ∩ Cj = ∅ for all i ≠ j, then P(∪_{i=1}^∞ Ci) = Σ_{i=1}^∞ P(Ci).
I A collection of events whose members are pairwise disjoint is said to be mutually exclusive.
I The collection is further said to be exhaustive if the union of its events is the sample space.
I These three axioms imply many properties of the probability set function. Let us see a few examples.
Basic results on probability functions
Theorem (1.3.1)
P(C) = 1 − P(C^c).
Theorem (1.3.2)
P (φ) = 0.
Proof: Apply Theorem 1.3.1 with C = ∅; since ∅^c is the sample space, whose probability is 1, we get P(∅) = 1 − 1 = 0.
Theorem (1.3.3)
If C1 ⊂ C2, then P (C1) ≤ P (C2).
Proof: Let T = C1^c ∩ C2. Notice that C2 = C1 ∪ T and C1 ∩ T = ∅, so P(C2) = P(C1) + P(T) ≥ P(C1).
Theorem (1.3.4)
0 ≤ P (C) ≤ 1.
Proof: Since ∅ ⊂ C ⊂ C (the sample space), Theorem 1.3.3 gives 0 = P(∅) ≤ P(C) ≤ 1.
Theorem (1.3.5)
For two arbitrary events C1 and C2, it holds that
P (C1 ∪ C2) = P (C1) + P (C2)− P (C1 ∩ C2).
Inclusion Exclusion Formula
For three arbitrary events C1, C2 and C3, it holds that
P (C1 ∪ C2 ∪ C3) = P (C1) + P (C2) + P (C3)
− P (C1 ∩ C2)− P (C1 ∩ C3)− P (C2 ∩ C3)
+ P (C1 ∩ C2 ∩ C3).
I Boole’s inequality:
P (C1) + P (C2) + . . .+ P (Ck) ≥ P (C1 ∪ C2 ∪ . . . ∪ Ck).
I Bonferroni’s inequality:
P (C1 ∩ C2) ≥ P (C1) + P (C2)− 1.
If an experiment can result in any one of N different, equally likely outcomes, and if exactly n of those outcomes correspond to event C, then the probability of event C is
P(C) = n/N.
Example 1.3.2
An unbiased coin is to be tossed twice and the outcomes are recorded as an ordered pair. Then C = {(TT), (TH), (HT), (HH)}.
Write C1 = {the first toss results in a head}, C2 = {the second toss results in a tail}. Then, P(C1 ∪ C2) = ?
Method 1: C1 = {HH, HT}, C2 = {HT, TT}, so C1 ∪ C2 = {HH, HT, TT}. We see P(C1 ∪ C2) = 3/4.
Method 2: C1 = {HH, HT}, so P(C1) = 1/2; C2 = {HT, TT}, so P(C2) = 1/2. We have C1 ∩ C2 = {HT}, so P(C1 ∩ C2) = 1/4. Thus, by the inclusion-exclusion formula, P(C1 ∪ C2) = P(C1) + P(C2) − P(C1 ∩ C2) = 3/4.
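Both methods can be checked by enumerating the sample space (a small sketch using exact fractions):

```python
from fractions import Fraction
from itertools import product

# The 4 equally likely outcomes of two tosses, as ordered pairs.
space = list(product("HT", repeat=2))

def P(event):
    return Fraction(sum(1 for o in space if event(o)), len(space))

C1 = lambda o: o[0] == "H"   # first toss results in a head
C2 = lambda o: o[1] == "T"   # second toss results in a tail

p_direct = P(lambda o: C1(o) or C2(o))                  # Method 1
p_ie = P(C1) + P(C2) - P(lambda o: C1(o) and C2(o))     # Method 2
print(p_direct, p_ie)   # both 3/4
```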
Counting rules
I Suppose we have two experiments. The first experiment results in m outcomes while the second results in n outcomes. The composite experiment (the first experiment followed by the second) has mn outcomes. This is called the multiplication rule.
I Let A be a set with n elements. Suppose we are interested in k-tuples whose components are elements of A. Then, by the extended multiplication rule, there are n^k such k-tuples.
Permutation
Suppose k ≤ n and we are interested in k-tuples whose components are distinct elements of A. By the multiplication rule, there are n(n − 1) · · · (n − (k − 1)) such k-tuples with distinct elements.
Example
For the integers 1, 2, 3, 4, 5, the ordered subsets of size 2 are:
(1,2), (1,3), (1,4), (1,5), (2, 1), (2,3), (2,4), (2,5), (3, 1), (3, 2),
(3,4), (3,5), (4, 1), (4, 2), (4, 3), (4, 5), (5, 1), (5, 2), (5, 3), (5, 4).
The number of permutations of n things taken k at a time is
P^n_k = n(n − 1) · · · (n − k + 1) = n!/(n − k)!.
I Apply to the previous example: 5!/3! = 20.
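The count can be verified with the standard-library counting functions (a quick sketch; `math.perm` and `math.comb` require Python 3.8+):

```python
from math import comb, factorial, perm

# Ordered subsets of size 2 from {1,...,5}: P(5,2) = 5!/(5-2)! = 20.
assert perm(5, 2) == factorial(5) // factorial(3) == 20
# Unordered subsets of size 2: C(5,2) = 10; each is counted 2! times above.
assert comb(5, 2) == perm(5, 2) // factorial(2) == 10
print(perm(5, 2), comb(5, 2))   # 20 10
```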
Combination
I Suppose we have n objects, a1, . . . , an. How many subsets of size k, without regard for order, can we choose from these objects? For the previous example we have 10.
I In general, the number of combinations of n things taken k at a time is
C(n, k) = n!/(k!(n − k)!).
Example
Poker hand example: 52 cards, 5 cards in a hand.
1 Probability of any specific hand: 1/C(52, 5).
2 Probability that all cards are hearts: C(13, 5)/C(52, 5).
3 Probability of a flush (all cards same suit): C(4, 1) C(13, 5)/C(52, 5).
4 Probability of a full house (three kings and two queens): C(4, 3) C(4, 2)/C(52, 5).
Theorem
The number of distinct permutations of n objects of which n1 are of one kind, n2 of a second kind, . . . , nk of a kth kind is
n!/(n1! n2! · · · nk!).
Example
How many distinct permutations can be made from the word INTERNET?
I The 8 letters in the word “INTERNET” occur with counts
I: 1, N: 2, T: 2, E: 2, R: 1.
I Number of different permutations:
8!/(1! 2! 2! 2! 1!) = 5040.
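The formula can be cross-checked by brute force, since 8! orderings are cheap to enumerate (a sketch):

```python
from collections import Counter
from itertools import permutations
from math import factorial

word = "INTERNET"
n_distinct = factorial(len(word))
for count in Counter(word).values():   # I:1, N:2, T:2, E:2, R:1
    n_distinct //= factorial(count)

# Brute-force check: generate all 8! orderings and deduplicate.
assert n_distinct == len(set(permutations(word)))
print(n_distinct)   # 5040
```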
A different point of view
I If you put n distinct objects into k cells, with n1 in cell 1, n2 in cell 2, . . . , nk in cell k, where n = n1 + n2 + . . . + nk, then the number of ways is the same:
n!/(n1! n2! · · · nk!).
I Multiplication rule:
C(n; n1, n2, . . . , nk) = C(n, n1) C(n − n1, n2) · · · C(n − n1 − · · · − nk−1, nk) = n!/(n1! n2! · · · nk!).
Chapter 1 Probability and Distributions
1.4 Conditional Probability and Independence
Motivation of conditional probability
I An experiment involves three tosses of a coin. The sample space consists of 8 possible outcomes, all equally likely:
HHH, HHT, HTH, HTT, THH, THT, TTH, TTT.
I Define two events:
C1 = {first toss results in a head},
C2 = {at least two heads}.
What is the probability of event C2?
P(C2) = 4/8 = 1/2.
I Suppose we know event C1 occurs. Now what is the probability of event C2?
P(C2|C1) = 3/4
(the conditional probability of C2 given that C1 occurs).
Definition of conditional probability
Definition. For two events C1 and C2, with C1 satisfying P(C1) > 0, the conditional probability of C2 given C1 is
P(C2|C1) = P(C1 ∩ C2)/P(C1).
Properties of conditional probabilities:
1 P(C2|C1) ≥ 0;
2 P(C2 ∪ C3 ∪ · · · |C1) = P(C2|C1) + P(C3|C1) + · · · , provided that C2, C3, . . . are mutually disjoint;
3 P(C1|C1) = 1.
Example (1.4.2)
A bowl contains eight chips. Three of the chips are red and the remaining five are blue. Two chips are to be drawn successively, at random and without replacement. Compute the probability that the first draw results in a red chip (C1) and the second draw results in a blue chip (C2).
Solution:
P(C1 ∩ C2) = P(C1)P(C2|C1) = (3/8)(5/7) = 15/56.
The other factorization, P(C1 ∩ C2) = P(C2)P(C1|C2), is of no help here: both factors are hard to compute directly.
The multiplication rule can be extended to three or more events. For example, if P(C1 ∩ C2) > 0, hence P(C1) > 0, we have
P(C1 ∩ C2 ∩ C3) = P(C1)P(C2|C1)P(C3|C1 ∩ C2).
Example (1.4.4)
Four cards are drawn successively, at random and without replacement, from an ordinary deck of playing cards. Compute the probability of receiving a spade, a heart, a diamond, and a club, in that order.
Solution:
(13/52)(13/51)(13/50)(13/49).
What is the probability of receiving a spade, a heart, a diamond, and a club in any order?
Solution:
4! · (13/52)(13/51)(13/50)(13/49).
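A quick numerical evaluation of both answers with exact fractions (a sketch):

```python
from fractions import Fraction
from math import factorial

# Spade, heart, diamond, club in that exact order:
p_order = (Fraction(13, 52) * Fraction(13, 51)
           * Fraction(13, 50) * Fraction(13, 49))
# In any order: the 4! orderings are disjoint and equally likely.
p_any_order = factorial(4) * p_order
print(float(p_order), float(p_any_order))   # ≈ 0.0044, ≈ 0.1055
```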
Law of total probability
Suppose C1, . . . , Ck form a partition of the sample space C, i.e.
1 C1, . . . , Ck are mutually exclusive;
2 C1, . . . , Ck are exhaustive, i.e. P(C1 ∪ . . . ∪ Ck) = 1.
Then for any event C ∈ B,
P(C) = P(C|C1)P(C1) + P(C|C2)P(C2) + · · · + P(C|Ck)P(Ck).
Proof:
P(C) = P(C ∩ C) (intersecting with the sample space C changes nothing)
= P[(C1 ∪ C2 ∪ . . . ∪ Ck) ∩ C]
= P[(C1 ∩ C) ∪ . . . ∪ (Ck ∩ C)]
= P(C1 ∩ C) + · · · + P(Ck ∩ C)
= P(C|C1)P(C1) + · · · + P(C|Ck)P(Ck).
Bayes’s theorem
Suppose C1, . . . , Ck form a partition of the sample space C. Then for any event C ∈ B,
P(Cj|C) = P(C ∩ Cj)/P(C) = P(C|Cj)P(Cj) / Σ_{i=1}^k P(C|Ci)P(Ci).
Example
On Friday evening 10% of the drivers in Iowa City are drunk. The probability that a drunk driver will be involved in a traffic accident is 0.01% and the probability that a sober driver will be involved in a traffic accident is 0.002%. If you read in the morning paper that a particular individual was involved in a traffic accident, what is the probability that this individual was drunk?
Solution:
Let C = accident occurs; C1 = drunk; C2 = sober. Then
P(C1|C) = P(C1)P(C|C1) / [P(C1)P(C|C1) + P(C2)P(C|C2)]
= (0.10)(0.0001) / [(0.10)(0.0001) + (0.90)(0.00002)] = 0.357.
Four inspectors at a film factory are stamping the expiration date on each package of film at the end of the assembly line.
I John stamps 20% of the packages and fails to stamp 1 in 200;
I Tom stamps 60% of the packages and fails to stamp 1 in 100;
I Jeff stamps 15% of the packages and fails to stamp 1 in 90;
I Pat stamps 5% of the packages and fails to stamp 1 in 200.
If a customer complains that her package of film does not show the expiration date, what is the probability that it was inspected by John?
Solution:
I Let B1 be the event that John inspected the package. Similarly, denote the events that the package was inspected by Tom, Jeff, and Pat by B2, B3, B4, respectively. Then B1, B2, B3, B4 form a partition of the sample space. Let F be the event that the package failed to be stamped.
I P(B1) = 0.2, P(B2) = 0.6, P(B3) = 0.15, P(B4) = 0.05.
P(F|B1) = 1/200, P(F|B2) = 1/100, P(F|B3) = 1/90, P(F|B4) = 1/200.
I The probability that the package was inspected by John given the failure is
P(B1|F) = P(F|B1)P(B1) / [P(F|B1)P(B1) + P(F|B2)P(B2) + P(F|B3)P(B3) + P(F|B4)P(B4)]
= (0.2)(1/200) / [(0.2)(1/200) + (0.6)(1/100) + (0.15)(1/90) + (0.05)(1/200)]
= 0.112.
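The law of total probability and Bayes's theorem over a partition translate directly into a pair of dictionary computations (a sketch using the numbers from the slide):

```python
from fractions import Fraction

# Priors P(B_i) and failure rates P(F | B_i) from the slide.
prior = {"John": Fraction(20, 100), "Tom": Fraction(60, 100),
         "Jeff": Fraction(15, 100), "Pat": Fraction(5, 100)}
fail = {"John": Fraction(1, 200), "Tom": Fraction(1, 100),
        "Jeff": Fraction(1, 90), "Pat": Fraction(1, 200)}

p_F = sum(prior[i] * fail[i] for i in prior)              # total probability
posterior = {i: prior[i] * fail[i] / p_F for i in prior}  # Bayes's theorem
print(float(posterior["John"]))   # ≈ 0.112
```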
Suppose 5 out of 10000 employees at Lawrence Livermore National Laboratory are spies. According to experts, a polygraph test has sensitivity of 0.91 and specificity of 0.94. Mathematically,
Prevalence = P(S) = 0.0005,
Sensitivity = P(+|S) = 0.91,
Specificity = P(−|NS) = 0.94.
1 Find the probability of a false positive test.
Solution:
P(+|NS) = 1 − P(−|NS) = 1 − 0.94 = 0.06.
The complement rule can be used for conditional probabilities if what is being conditioned on is held fixed (i.e. NS was held fixed in the above computation). In other words, P(+|NS) ≠ 1 − P(+|S).
2 Find the probability of a false negative test.
Solution:
P(−|S) = 1 − P(+|S) = 1 − 0.91 = 0.09.
1 Suppose an employee is randomly selected. Find the probability that he/she tests positive for being a spy.
Solution: By the law of total probability,
P(+) = P(+|S)P(S) + P(+|NS)P(NS)
= (0.91)(0.0005) + (0.06)(0.9995) = 0.060425.
2 Given that a randomly selected employee tested positive, find the probability that he/she actually is a spy.
Solution: By Bayes’s theorem,
P(S|+) = P(+ ∩ S)/P(+)
= P(+|S)P(S) / [P(+|S)P(S) + P(+|NS)P(NS)]
= (0.91)(0.0005)/0.060425 = 0.00753.
Therefore, if an employee tests positive for being a spy, there is only a 0.753% chance that he/she actually is a spy!
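The whole calculation fits in three lines once prevalence, sensitivity, and specificity are named (a sketch with the slide's numbers):

```python
prevalence, sensitivity, specificity = 0.0005, 0.91, 0.94

# Law of total probability over the partition {spy, not spy}.
p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
# Bayes's theorem: posterior probability of being a spy given a positive test.
p_spy_given_pos = sensitivity * prevalence / p_pos
print(round(p_pos, 6), round(p_spy_given_pos, 5))   # 0.060425 0.00753
```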
Independence
Definition
Two events C1 and C2 with P(C1) > 0 and P(C2) > 0 are independent if and only if P(C1 ∩ C2) = P(C1)P(C2).
Remark:
I Two events C1, C2 are independent if and only if P(C1 ∩ C2) = P(C1)P(C2), which is equivalent to P(C1|C2) = P(C1) when P(C2) > 0, and also equivalent to P(C2|C1) = P(C2) when P(C1) > 0.
I Two events are independent if the occurrence (or non-occurrence) of one event does not influence the likelihood of occurrence of the other event.
I If the two events C1 and C2 are independent, then the following pairs of events are also independent: (i) C1 and C2^c; (ii) C1^c and C2; (iii) C1^c and C2^c.
Proof.
It suffices to show that
P(C1^c ∩ C2) = P(C1^c)P(C2) = (1 − P(C1))P(C2).
This follows because
P(C2) = P(C ∩ C2) (C denotes the space)
= P((C1 ∪ C1^c) ∩ C2)
= P[(C1 ∩ C2) ∪ (C1^c ∩ C2)] (distributive law)
= P(C1 ∩ C2) + P(C1^c ∩ C2) (axiom of probability)
= P(C1) · P(C2) + P(C1^c ∩ C2), (independence)
and rearranging gives P(C1^c ∩ C2) = (1 − P(C1))P(C2).
Mutual independence
Suppose now we have three events, C1, C2, and C3. We say that they are mutually independent if
P(C1 ∩ C2) = P(C1)P(C2),
P(C1 ∩ C3) = P(C1)P(C3),
P(C2 ∩ C3) = P(C2)P(C3),
and
P(C1 ∩ C2 ∩ C3) = P(C1)P(C2)P(C3).
More generally, we say the n events C1, C2, . . . , Cn are mutually independent if, for any collection of distinct integers d1, d2, . . . , dk from {1, 2, . . . , n}, it holds that
P(Cd1 ∩ Cd2 ∩ · · · ∩ Cdk) = P(Cd1)P(Cd2) · · · P(Cdk).
Example
Suppose a circuit board contains 3 modules. The probability that the first module works properly is 0.98, while the second and third modules work properly with probability 0.95 and 0.92, respectively. Modules are independent.
1 Find the probability that all 3 modules work properly.
P(W1 and W2 and W3) = P(W1)P(W2)P(W3) (independence)
= (0.98)(0.95)(0.92) = 0.8565.
2 Find the probability that one or more modules work.
P(one or more work) = 1 − P(none work) = 1 − P(W1^c)P(W2^c)P(W3^c) (independence)
= 1 − (1 − 0.98)(1 − 0.95)(1 − 0.92)
= 1 − 0.00008 = 0.99992.
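Both reliability calculations reduce to products over the module probabilities (a sketch):

```python
p_work = [0.98, 0.95, 0.92]   # modules work independently

p_all = 1.0
p_none = 1.0
for p in p_work:
    p_all *= p                # all three work
    p_none *= (1 - p)         # all three fail

p_at_least_one = 1 - p_none
print(round(p_all, 4), round(p_at_least_one, 5))   # 0.8565 0.99992
```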
Chapter 1 Probability and Distributions
1.5 Random Variables
Definition of random variable
Definition A random variable X is a map from the sample space C to the set of real numbers R. It assigns to each element c ∈ C one and only one value X(c) = x.
1 X induces a new sample space:
D = {x = X(c) : c ∈ C} ⊂ R.
Example 1: Coin flip: C = {T, H}, and we define X(T) = 0, X(H) = 1.
Example 2: C = {c1, . . . , cn}, the students at the University of Iowa. X(ci) = ci’s height and Y(ci) = ci’s weight.
2 X induces a probability set function on R:
PX(B) = P{c ∈ C : X(c) ∈ B} for B ⊂ R.
In particular, for D ⊂ D, we have P(X ∈ D) = P({c ∈ C : X(c) ∈ D}).
Example
Flip a coin twice. Let X = total number of heads. The sample space is C = {HH, HT, TH, TT} and all outcomes are equally likely, so
x : 0 1 2
P(X = x) : 1/4 2/4 1/4
Discrete Random Variables
Discrete random variables
Definition
A random variable X is discrete if it can assume a finite or countably infinite number of values (countably infinite: you can create a one-to-one correspondence with the positive integers).
Example
1 Let X be the number of broken eggs in a dozen randomly selected eggs. Possible values for X are x = 0, 1, . . . , 12, so X is discrete.
2 Let X be the number of accidents at an intersection in a year. Possible values for X are x = 0, 1, 2, . . ., so X is discrete.
3 Let X be the number of flips of a coin until you obtain a head. Possible values for X are x = 1, 2, . . ., so X is discrete.
Probability mass function
For discrete random variables, the probability mass function (pmf)
pX(di) = P(X = di), i = 1, 2, . . . ,
determines the probability set function PX.
The pmf pX(x) has the following properties:
1 0 ≤ pX(x) ≤ 1;
2 Σ_{x∈D} pX(x) = 1;
3 P[X ∈ A] = Σ_{x∈A} pX(x).
Definition
The support of a discrete random variable is {x : P(X = x) > 0}, sometimes denoted S.
Suppose we have a bowl with 1 chip labeled “1”, 2 chips labeled “2”, and 3 chips labeled “3”. Draw 2 chips without replacement. Let X = sum of the two draws.
1 Find the probability mass function (p.m.f.) of X.
x : 3 4 5 6
pX(x) = P(X = x) : 2/15 4/15 6/15 3/15
2 Find the probability that the sum of the two draws is 5 or more.
P(X ≥ 5) = P(X = 5) + P(X = 6) = 9/15.
3 Find the probability that the sum equals 6 given that the sum is 4 or more.
P(X = 6|X ≥ 4) = P(X = 6 ∩ X ≥ 4)/P(X ≥ 4) = P(X = 6)/P(X ≥ 4) = 3/13.
4 Find the probability that the sum is 5 or less given that the sum is more than 4.
P(X ≤ 5|X > 4) = P(X ≤ 5 ∩ X > 4)/P(X > 4) = P(X = 5)/P(X > 4) = 2/3.
Cumulative distribution function
Let X be a random variable. Then its cumulative distribution function (cdf) is defined by
FX(x) = PX((−∞, x]) = P(X ≤ x).
For a discrete random variable, this is a step function:
FX(x0) = Σ_{x≤x0} P(X = x). (cdf is a sum of the pmf)
For the previous example,
F(x) = 0 for x < 3; 2/15 for 3 ≤ x < 4; 6/15 for 4 ≤ x < 5; 12/15 for 5 ≤ x < 6; 1 for x ≥ 6,
given the pmf
x : 3 4 5 6
pX(x) = P(X = x) : 2/15 4/15 6/15 3/15
Continuous Random Variables
Continuous random variables
Definition
A random variable is a continuous random variable if its cumulative distribution function FX(x) is a continuous function for all x ∈ R.
The cumulative distribution function (cdf) defined for discrete random variables can be used to describe continuous random variables too.
Example
Let X denote a real number chosen at random between 0 and 1. Since any number can be chosen equally likely, it is reasonable to assume
PX[(a, b)] = b − a, for 0 < a < b < 1. (1)
Describe its cdf.
Probability density function
I The notion of probability mass function (pmf) defined for discrete random variables does NOT work here: FX(x) − FX(x−) = 0 for all x ∈ R, so P(X = x) = 0 for all x ∈ R.
I Instead, for the continuous case, suppose there is a non-negative function fX such that
FX(x) = ∫_{−∞}^x fX(t) dt, for all −∞ < x < ∞. (cdf is an integral of the pdf)
I We call fX the probability density function (pdf) of the random variable X. It is easy to see that the following relation usually holds:
fX(x) = F′X(x), for all −∞ < x < ∞.
A pdf fX(x) always has the properties:
1 f(x) ≥ 0 for all x ∈ R.
2 ∫_R f(x) dx = 1.
3 If A ⊂ R, then P(A) = ∫_A f(x) dx. In other words,
P(a < X < b) = ∫_a^b f(x) dx.
Definition
The support of a continuous random variable is {x : fX(x) > 0}, sometimes denoted S.
Let
f(x) = k(1 − x²) for −1 < x < 1, and 0 otherwise.
Find
1 the constant k such that f(x) is a pdf.
2 P (−0.5 < X < 0.5).
3 P (X ≤ 0.1).
4 P (X > 0.7|X > 0.1).
5 Evaluate F (x).
Solution:
1 We must have
1 = ∫_{−∞}^∞ f(x) dx = ∫_{−1}^1 k(1 − x²) dx = k ∫_{−1}^1 (1 − x²) dx.
Thus k · 4/3 = 1 and k = 3/4.
2 By the definition of the pdf, we have
P(−0.5 < X < 0.5) = ∫_{−0.5}^{0.5} (3/4)(1 − x²) dx = 11/16 = 0.6875.
3 P(X ≤ 0.1) = P(X < 0.1) = ∫_{−1}^{0.1} (3/4)(1 − x²) dx = 0.57475.
4 For random variables, the convention is to use a comma instead of ∩:
P(X > 0.7|X > 0.1) = P(X > 0.7, X > 0.1)/P(X > 0.1) = P(X > 0.7)/P(X > 0.1)
= [∫_{0.7}^1 (3/4)(1 − x²) dx] / [∫_{0.1}^1 (3/4)(1 − x²) dx] = 0.143.
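Since the density is a polynomial, its antiderivative gives the cdf in closed form, and all four answers follow from it (a sketch):

```python
def F(x):
    """cdf of f(x) = (3/4)(1 - x^2) on (-1, 1), via the antiderivative."""
    if x <= -1:
        return 0.0
    if x >= 1:
        return 1.0
    return 0.75 * (x - x ** 3 / 3) + 0.5

print(round(F(0.5) - F(-0.5), 4))              # 0.6875
print(round(F(0.1), 5))                        # 0.57475
print(round((1 - F(0.7)) / (1 - F(0.1)), 3))   # 0.143
```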
Percentile & quartile
Definition
Let X be a continuous-type random variable with pdf f(x) and cdf F(x). The (100p)th percentile is a number πp such that the area under f(x) to the left of πp is p. That is,
p = ∫_{−∞}^{πp} f(x) dx = F(πp).
The 50th percentile is called the median (m = π0.50). The 25th and 75th percentiles are called the first and third quartiles, respectively, denoted by q1 = π0.25 and q3 = π0.75. Of course, the median m = π0.50 = q2 is also called the second quartile.
Example
Let Y be a continuous random variable with probability density function given by
f(y) = 3y² for 0 ≤ y ≤ 1, and 0 otherwise.
Find the first, second and third quartiles.
Solution:
∫_0^{π0.25} 3y² dy = 1/4, so π0.25 = 1/∛4.
∫_0^{π0.5} 3y² dy = 1/2, so π0.5 = 1/∛2.
∫_0^{π0.75} 3y² dy = 3/4, so π0.75 = ∛3/∛4.
Basic Properties of CDF (for both discrete and continuous r.v.)
Theorem 1.5.2
Let X be a random variable with cdf FX . Then,
PX((a, b]) = P(a < X ≤ b) = FX(b) − FX(a).
Proof.
It holds that
{−∞ < X ≤ b} = {−∞ < X ≤ a} ∪ {a < X ≤ b}.
The theorem follows from the third axiom of probability since the two sets on the right-hand side are disjoint.
Example 1.5.7 & 1.5.8
Determine the constant c and the probability P(2 < X ≤ 4) in the following questions:
1 X has a pmf
pX(x) = cx for x = 1, 2, . . . , 10, and 0 otherwise.
2 X has a pdf
fX(x) = cx for 0 < x < 10, and 0 otherwise.
Example
Consider an urn which contains balls, each with one of the numbers 1, 2, 3, 4 on it. Suppose there are i balls with the number i, for i = 1, 2, 3, 4. Suppose one ball is drawn at random. Let X be the number on the ball.
(a) Determine the pmf of X;
(b) Compute P (X ≤ 3);
(c) Determine the cdf of X.
Theorem 1.3.6 Continuity of Probability
For an increasing sequence of events {Cn}, define its limit as lim_{n→∞} Cn = ∪_{n=1}^∞ Cn. It holds that
lim_{n→∞} P(Cn) = P(lim_{n→∞} Cn) = P(∪_{n=1}^∞ Cn).
Symmetrically, for a decreasing sequence of events {Cn}, define its limit as lim_{n→∞} Cn = ∩_{n=1}^∞ Cn. It holds that
lim_{n→∞} P(Cn) = P(lim_{n→∞} Cn) = P(∩_{n=1}^∞ Cn).
Theorem 1.5.1
Let X be a random variable with cdf F .
(a) If a < b, then F(a) ≤ F(b).
(b) lim_{x→−∞} F(x) = 0.
(c) lim_{x→∞} F(x) = 1.
(d) lim_{x↘x0} F(x) = F(x0).
Proof.
(a) If a < b, then {X ≤ a} ⊂ {X ≤ b}. The result then follows from the monotonicity of the probability.
(b) If {xn} is a decreasing sequence such that xn → −∞, then Cn = {X ≤ xn} is decreasing with ∩_{n=1}^∞ Cn = ∅. From the continuity of probability theorem,
lim_{n→∞} F(xn) = P(∩_{n=1}^∞ Cn) = P(∅) = 0.
Theorem 1.5.1 (cont’d)
Let X be a random variable with cdf F.
(c) lim_{x→∞} F(x) = 1.
(d) lim_{x↘x0} F(x) = F(x0) (right continuity).
Proof.
(c) If {xn} is an increasing sequence such that xn → ∞, then Cn = {X ≤ xn} is increasing with ∪_{n=1}^∞ Cn = {X < ∞}, the whole sample space. From the continuity of probability theorem,
lim_{n→∞} F(xn) = P(∪_{n=1}^∞ Cn) = 1.
(d) Let {xn} be any decreasing sequence of real numbers such that xn ↓ x0. Then the sequence of sets Cn = {X ≤ xn} is decreasing and ∩_{n=1}^∞ Cn = {X ≤ x0}. The continuity of probability theorem implies that
lim_{n→∞} F(xn) = P(∩_{n=1}^∞ Cn) = F(x0).
Theorem 1.5.3
For any random variable,
P (X = x) = FX(x)− FX(x−), ∀x ∈ R.
Proof.
For any x, we have {x} = ∩_{n=1}^∞ (x − 1/n, x]. By the continuity of the probability function,
P(X = x) = P[∩_{n=1}^∞ {x − 1/n < X ≤ x}]
= lim_{n→∞} P[x − 1/n < X ≤ x]
= lim_{n→∞} [FX(x) − FX(x − 1/n)]
= FX(x) − FX(x−).
Boxiang Wang, The University of Iowa Chapter 1 STAT 4100 Fall 2018
Summary
I A random variable X is a function from a sample space C into the real numbers R.
I Every random variable is associated with a cdf: F_X(x) = P_X(X ≤ x) for −∞ < x < ∞.
I F_X(x) is defined for all x, not just those in its domain D.
I Notation-wise, random variables will always be denoted by uppercase letters and the realized values of the variable will be denoted by the corresponding lowercase letters. Thus the random variable X can take the value x.
I We say a random variable X is discrete if F_X(x) is a step function: F(x_0) = Σ_{x ≤ x_0} P(X = x).
I We say a random variable X is continuous if F_X(x) is a continuous function: F(x_0) = ∫_{−∞}^{x_0} f(x) dx.
Chapter 1 Probability and Distributions
1.6 Discrete Random Variables
Review of discrete random variable

The space D = {x = X(c) : c ∈ C} is either finite or countable. (“D is countable” means that there is a one-to-one correspondence between D and the positive integers.)

The support of a discrete random variable X is {x : p_X(x) > 0}. (In English: the support of X consists of all points x such that p_X(x) > 0.)
Example: geometric distribution

Consider a sequence of independent flips of a coin, each resulting in a head with probability p and a tail with probability q = 1 − p. Let the random variable X be the number of tails before the first head appears. Determine the pmf of X and the probability that X is even.
Solution:

P(X = x) = q^x p, x = 0, 1, 2, . . .

Σ_{k=0}^∞ P(X = 2k) = p Σ_{k=0}^∞ (q²)^k = p/(1 − q²).
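A quick numerical check of the even-sum formula, using an illustrative value p = 1/3 (my choice, not from the slides):

```python
from fractions import Fraction

# Geometric pmf P(X = x) = q^x p, with X = number of tails before the first head.
p = Fraction(1, 3)
q = 1 - p

# Partial sum of P(X even) versus the closed form p / (1 - q^2).
partial = sum(q ** (2 * k) * p for k in range(200))
closed = p / (1 - q ** 2)
```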
Example 1.6.2: hypergeometric distribution

A lot consists of m good fuses and n defective fuses. Choose k, k ≤ min{m, n}, fuses at random from the lot. Let the random variable X be the number of defective fuses among the k. Determine the pmf of X.
Solution: choosing x of the n defective and k − x of the m good fuses,

p_X(x) = \binom{n}{x} \binom{m}{k−x} / \binom{m+n}{k} for x = 0, 1, . . . , k, and 0 otherwise.
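The pmf can be evaluated with `math.comb`; the lot sizes below are illustrative values of my own, and the pmf should sum to 1 by Vandermonde's identity:

```python
from math import comb

# Hypergeometric pmf: X = number of defective fuses among k drawn
# from a lot of m good and n defective fuses.
def hyper_pmf(x, m, n, k):
    return comb(n, x) * comb(m, k - x) / comb(m + n, k)

m, n, k = 7, 5, 4                              # illustrative lot sizes
pmf = [hyper_pmf(x, m, n, k) for x in range(k + 1)]
mean = sum(x * pmf[x] for x in range(k + 1))   # equals k * n / (m + n)
```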
Transformation of discrete random variables

Let X be a discrete random variable with pmf p_X(x) and support D_X = {x : p_X(x) > 0}. Suppose we are interested in Y = g(X). Then Y is also a random variable, and we want to determine its pmf.

Let D_Y = {y : y = g(x) for some x ∈ D_X}. We have

p_Y(y) = P(Y = y) = P(g(X) = y) = P(X ∈ g^{−1}(y)) = Σ_{x : g(x) = y} p_X(x),

where g^{−1}(y) = {x : g(x) = y}. If g is one-to-one, this reduces to p_Y(y) = p_X(g^{−1}(y)).
Example

Consider the geometric random variable X with pmf P(X = x) = q^x p, x = 0, 1, 2, . . ..

1 Let Y be the number of flips needed to obtain the first head. Then Y = X + 1, so Y = g(X) = X + 1 and X = g^{−1}(Y) = Y − 1. Determine the pmf of Y.
Solution:

p_Y(y) = p_X(g^{−1}(y)) = q^{y−1} p, y = 1, 2, 3, . . .

2 Let Y = (X − 2)². Determine the pmf of Y.
Solution:

p_Y(y) = p_X(2) if y = 0,
p_Y(y) = p_X(1) + p_X(3) if y = 1,
p_Y(y) = p_X(0) + p_X(4) if y = 4,
p_Y(y) = p_X(√y + 2) if y = 9, 16, 25, . . .
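The pmf of Y = (X − 2)² can also be obtained mechanically by pushing the mass of X through g, as in the general formula; a sketch with an illustrative p = 0.4:

```python
from collections import defaultdict

p, q = 0.4, 0.6
N = 500                       # truncation point; the geometric tail beyond N is negligible
pX = lambda x: q ** x * p     # geometric pmf on x = 0, 1, 2, ...

# Group the mass of X by the common value of g(x) = (x - 2)^2.
pY = defaultdict(float)
for x in range(N):
    pY[(x - 2) ** 2] += pX(x)
```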
Chapter 1 Probability and Distributions
1.7 Continuous Random Variables
Review of continuous random variable

I Recall that a random variable is continuous if its cdf F_X(x) is a continuous function for all x ∈ R. Hence, if the random variable X is continuous, then P(X = x) = F_X(x) − F_X(x−) = 0 for all x ∈ R. This means that there is no point of discrete mass.
I Recall that a nonnegative function f_X is a pdf of the random variable X if

∫_{−∞}^{x} f_X(t) dt = F_X(x), for all −∞ < x < ∞.

I The support of a continuous random variable X is {x : f_X(x) > 0} = D. (In English: the support of X consists of all points x such that f_X(x) > 0.)
Example 1.7.1

Suppose we select a point at random in the interior of a circle of radius 1. Let X be the distance of the selected point from the origin. Determine the cdf and pdf of X and the probability that the selected point falls in the ring with radii 1/4 and 1/2.
Solution:

P(X ≤ x) = x² for 0 ≤ x < 1, so

F_X(x) = 0 for x < 0; x² for 0 ≤ x < 1; 1 for x ≥ 1,

f_X(x) = 2x for 0 ≤ x < 1, and 0 otherwise,

P(1/4 < X ≤ 1/2) = ∫_{1/4}^{1/2} 2t dt = 3/16.
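A Monte Carlo check of P(1/4 < X ≤ 1/2) = 3/16, sampling uniformly in the disk by rejection from the enclosing square (a sketch, not from the slides):

```python
import random

random.seed(1)

hits = trials = 0
while trials < 200_000:
    u, v = random.uniform(-1, 1), random.uniform(-1, 1)
    r2 = u * u + v * v
    if r2 <= 1:                    # the point landed inside the unit disk
        trials += 1
        if 1 / 16 < r2 <= 1 / 4:   # equivalent to 1/4 < r <= 1/2
            hits += 1

estimate = hits / trials           # should be close to 3/16 = 0.1875
```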
Example 1.7.2 (Exponential Distribution)

Let the random variable X be the time in seconds between incoming telephone calls at a busy switchboard. Suppose that X has the pdf

f_X(x) = λe^{−λx} for x > 0, and 0 otherwise.

This is an exponential distribution with parameter (rate) λ > 0. Knowing that the parameter λ = 1/4, compute the probability that the time between successive phone calls exceeds 4 seconds.
Solution:

P(X > 4) = ∫_4^∞ (1/4) e^{−x/4} dx = e^{−1} ≈ 0.3679.
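The closed form can be cross-checked with a midpoint Riemann sum of the pdf (the cutoff 60 is an arbitrary large value of mine; the truncated tail is negligible):

```python
import math

lam = 1 / 4
closed_form = math.exp(-4 * lam)   # P(X > 4) = e^{-1}

# Midpoint Riemann sum of the pdf on (4, 60).
n, a, b = 100_000, 4.0, 60.0
h = (b - a) / n
riemann = sum(lam * math.exp(-lam * (a + (i + 0.5) * h)) * h for i in range(n))
```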
Transformation of Continuous R.V.
Sometimes we know the distribution of a continuous random variable X and are interested in a random variable Y which is a transformation (function) of X, say Y = g(X); we want to determine the distribution of Y.

Example 1.7.3. Let X be the random variable with the pdf

f_X(x) = 2x for 0 < x < 1, and 0 otherwise.

Determine the cdf and pdf of Y = X².
Solution: for 0 < y < 1,

F_Y(y) = P(Y ≤ y) = P(X² ≤ y) = P(X ≤ √y) = F_X(√y) = y,

f_Y(y) = 1 for 0 < y < 1, and 0 otherwise.
Theorem 1.7.1: Transformation of continuous r.v.

Assume that X is a continuous random variable with pdf f_X(x) and support S_X. Suppose Y = g(X), where g is one-to-one. Then

f_Y(y) = f_X(h(y)) |h′(y)|, where h = g^{−1}.

Proof: Suppose that g (and thus h) is monotone increasing. Then

F_Y(y) = P(Y ≤ y) = P[g(X) ≤ y] = P[X ≤ h(y)] = F_X[h(y)].

Therefore,

f_Y(y) = F_Y′(y) = F_X′[h(y)] · h′(y) = f_X[h(y)] h′(y) = f_X[h(y)] (dx/dy).

Note that since h is increasing, h′(y) is positive on S_Y (h′(y) = |h′(y)|).
Proof (cont’d)

Suppose that g (and therefore h) is monotone decreasing. Then

F_Y(y) = P(Y ≤ y) = P[g(X) ≤ y] = P[X ≥ h(y)] = 1 − F_X[h(y)].

Therefore,

f_Y(y) = F_Y′(y) = −F_X′[h(y)] · h′(y) = f_X[h(y)] (−h′(y)).

Note that since h is decreasing, h′(y) is negative on S_Y (−h′(y) = |h′(y)|).

We refer to dx/dy = J as the Jacobian of the transformation.
Example

Assume that f(x) = 1 when 0 < x < 1 and 0 otherwise. Therefore S_X = (0, 1).

1 Let Y = X². g(x) = x² is a one-to-one function on (0, 1), h(y) = y^{1/2}, and h′(y) = (1/2) y^{−1/2}. Therefore,

f_Y(y) = f_X(h(y)) |h′(y)| = 1 · (1/2) y^{−1/2} = (1/2) y^{−1/2},

when 0 < y < 1 (S_Y = (0, 1)).

2 Suppose now that Y = g(X) = − log(X). Then h(y) = exp(−y), and |h′(y)| = exp(−y). Therefore

f_Y(y) = f_X(h(y)) |h′(y)| = 1 · exp(−y) = exp(−y), y > 0.

You can verify that ∫_0^∞ f_Y(y) dy = 1.
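Part 2 says that if X ~ Uniform(0, 1), then Y = −log X is exponential with rate 1; a simulation can check the implied cdf value P(Y ≤ 1) = 1 − e^{−1}:

```python
import math
import random

random.seed(0)

n = 200_000
# 1 - random.random() lies in (0, 1], so the log is always defined.
ys = [-math.log(1 - random.random()) for _ in range(n)]

empirical = sum(y <= 1 for y in ys) / n
theoretical = 1 - math.exp(-1)
```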
Extended theorem: Transformation of continuous r.v.

Assume X is a continuous random variable with pdf f_X(x) and support S_X. Suppose Y = g(X) and the support has a partition A_1, . . . , A_k. Further, suppose there exist functions g_1(x), . . . , g_k(x) such that

I g(x) = g_i(x) for x ∈ A_i,
I g_i(x) is one-to-one and monotone on each A_i,
I with Y_i = {y : y = g_i(x) for some x ∈ A_i}, we have Y_1 = Y_2 = · · · = Y_k = Y.

Then

f_Y(y) = Σ_{i=1}^k f_X(g_i^{−1}(y)) |(d/dy) g_i^{−1}(y)| for y ∈ Y, and 0 otherwise.
Example

I Assume that f(x) = 1/2 when −1 < x < 1 and 0 otherwise. Therefore S_X = (−1, 1).
I Let Y = X² and g(x) = x². Then g_1(x) = x² is one-to-one on A_1 = (−1, 0) and g_2(x) = x² is one-to-one on A_2 = (0, 1). We see Y_i = {y : y = g_i(x), x ∈ A_i} = (0, 1).
I Then h_1(y) = g_1^{−1}(y) = −y^{1/2}, with h_1′(y) = −(1/2) y^{−1/2}, and h_2(y) = g_2^{−1}(y) = y^{1/2}, with h_2′(y) = (1/2) y^{−1/2}.
I Therefore,

f_Y(y) = f_X(h_1(y)) |h_1′(y)| + f_X(h_2(y)) |h_2′(y)|
= (1/2) · (1/2) y^{−1/2} + (1/2) · (1/2) y^{−1/2}
= (1/2) y^{−1/2},

when 0 < y < 1 (S_Y = (0, 1)).
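The two-branch result f_Y(y) = (1/2) y^{−1/2} implies the cdf F_Y(y) = √y on (0, 1), which a simulation can verify:

```python
import random

random.seed(2)

# X ~ Uniform(-1, 1), Y = X^2.
n = 200_000
ys = [random.uniform(-1, 1) ** 2 for _ in range(n)]

# Empirical cdf at a couple of points, to compare with F_Y(y) = sqrt(y).
empirical = {y0: sum(y <= y0 for y in ys) / n for y0 in (0.25, 0.81)}
```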
Mixture of Discrete and Continuous R.V.
Example 1.7.6

Study the cdf

F_X(x) = 0 for x < 0; (x + 1)/2 for 0 ≤ x < 1; 1 for x ≥ 1.

I What is P(−3 < X ≤ 1/2)?
I What is P(X = 0)?
I What is P(−3 < X ≤ 0)?
I What is P(−3 < X < 0)?

It always holds from the definition of the cdf that

P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a).
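Working the four bullets numerically from F and its left limits (the answers are computed here, not stated on the slide):

```python
# cdf of Example 1.7.6
def F(x):
    if x < 0:
        return 0.0
    if x < 1:
        return (x + 1) / 2
    return 1.0

def F_minus(x, eps=1e-9):      # numerical stand-in for the left limit F(x-)
    return F(x - eps)

p1 = F(0.5) - F(-3)            # P(-3 < X <= 1/2) = 3/4
p2 = F(0) - F_minus(0)         # P(X = 0) = 1/2, the jump at 0
p3 = F(0) - F(-3)              # P(-3 < X <= 0) = 1/2
p4 = p3 - p2                   # P(-3 < X < 0) = 0
```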
Example 1.7.7

Let X equal the size of a wind loss in millions of dollars, and suppose it has the cdf

F_X(x) = 0 for −∞ < x < 0; 1 − (10/(10 + x))³ for 0 ≤ x < ∞.

If losses beyond $10,000,000 are reported as 10, then the cdf of this censored distribution is

F_Y(y) = 0 for −∞ < y < 0; 1 − (10/(10 + y))³ for 0 ≤ y < 10; 1 for 10 ≤ y < ∞,

where Y = min(X, 10).
I This is an example of a mixed continuous and discrete random variable. The particular example chosen is known as censoring, because it creates the discrete part by lumping one end of the distribution into a single point.
I The continuous part of the random variable has the same pdf as the pdf of X on (0, 10), i.e.,

(3 · 10³/(y + 10)⁴) 1_{(0,10)}(y).

I The discrete part has the pmf

P(Y = 10) = 1 − F_X(10) = (10/20)³ = 1/8.
Chapter 1 Probability and Distributions
1.8 Expectation of a Random Variable
Definition of expectation

Let X be a random variable.
I If X is continuous with pdf f(x), then the expectation of X is

E(X) = ∫_{−∞}^{∞} x f(x) dx,

provided that ∫_{−∞}^{∞} |x| f(x) dx < ∞. We say the expectation does not exist if ∫_{−∞}^{∞} |x| f(x) dx = ∞.

I If X is discrete with pmf p(x), then the expectation of X is

E(X) = Σ_x x p(x),

provided that Σ_x |x| p(x) < ∞. We say the expectation does not exist if Σ_x |x| p(x) = ∞.

I Sometimes the expectation of X is called the mathematical expectation of X, the expected value of X, or the mean of X. We often denote E(X) by µ (µ = E(X)).
Example

Suppose we have a bowl with 1 chip labeled “1”, 2 chips labeled “2”, and 3 chips labeled “3”. Draw 2 chips without replacement. Let X = sum of the two draws. The pmf (derived before) is as follows:

x : 3, 4, 5, 6
p_X(x) = P(X = x) : 2/15, 4/15, 6/15, 3/15

We now find the expected value (mean) of the random variable X:

µ = Σ_{x=3}^{6} x p(x) = 3(2/15) + 4(4/15) + 5(6/15) + 6(3/15) = 14/3 ≈ 4.667.

If the random variable X were observed many times and the realizations were recorded, the expected value µ describes the mean of the observed realizations. If I repeatedly drew 2 chips from the bowl without replacement and recorded the sum of the two draws, the mean of the recorded sums would approach µ ≈ 4.667.
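A simulation makes the long-run interpretation concrete; each chip appears in 5 of the C(6, 2) = 15 equally likely pairs, so the exact mean is 70/15 = 14/3:

```python
import random
from itertools import combinations

random.seed(3)

bowl = [1, 2, 2, 3, 3, 3]

# Exact mean over all equally likely unordered pairs of positions.
pairs = list(combinations(range(6), 2))
exact_mean = sum(bowl[i] + bowl[j] for i, j in pairs) / len(pairs)

# Long-run average of repeatedly recorded sums.
n = 100_000
sim_mean = sum(sum(random.sample(bowl, 2)) for _ in range(n)) / n
```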
Exponential distribution

Let X be a random variable that follows the exponential distribution with rate λ, that is,

f_X(x) = λe^{−λx} for x > 0, and 0 otherwise.

Compute its expectation.
Solution: integrating by parts,

E(X) = ∫_0^∞ λx e^{−λx} dx = −∫_0^∞ x d(e^{−λx}) = −x e^{−λx} |_0^∞ + ∫_0^∞ e^{−λx} dx = 1/λ.
Two formulations of the exponential distribution

1 The parameter λ is defined as a rate:

f_X(x) = λe^{−λx} for x > 0, and 0 otherwise.

In this case, E(X) = 1/λ.

2 The parameter λ is defined as a scale:

f_X(x) = (1/λ) e^{−x/λ} for x > 0, and 0 otherwise.

In this case, E(X) = λ.

3 The first (rate) formulation is used in this course.
Expectation of a Function of X
Let X be a random variable and let Y = g(X) for some function g. We can calculate E(Y) = E[g(X)] in two different ways.

Suppose X is discrete and Σ_{x ∈ D_X} |g(x)| p_X(x) < ∞. Then
1 E[g(X)] = Σ_{x ∈ D_X} g(x) p_X(x).
2 Let p_Y(y) be the pmf of Y. Then E[g(X)] = E(Y) = Σ_{y ∈ D_Y} y p_Y(y).

Suppose X is continuous and ∫_{−∞}^{∞} |g(t)| f_X(t) dt < ∞. Then
1 E[g(X)] = ∫_{−∞}^{∞} g(x) f_X(x) dx.
2 Let Y = g(X) and let f_Y(y) be the pdf of Y. Then E[g(X)] = E(Y) = ∫_{−∞}^{∞} y f_Y(y) dy.
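A discrete check that both routes agree, reusing the chip-sum pmf from the earlier example with g(x) = (x − 4)² (my choice of g):

```python
from collections import defaultdict

pX = {3: 2/15, 4: 4/15, 5: 6/15, 6: 3/15}
g = lambda x: (x - 4) ** 2

# Route 1: sum g(x) p_X(x) over the support of X.
way1 = sum(g(x) * px for x, px in pX.items())

# Route 2: build the pmf of Y = g(X), then sum y p_Y(y).
pY = defaultdict(float)
for x, px in pX.items():
    pY[g(x)] += px
way2 = sum(y * py for y, py in pY.items())
```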
Assume X has the pdf f(x) = 1 for 0 < x < 1. Let Y = − log X. What is E(Y)?
Solution:

E(Y) = ∫_0^1 (− log x) dx = −[x log x − x] |_0^1 = 1.
For certain ore samples, the proportion X of impurities per sample is a random variable with density function given by

f(x) = (3/2)x² + x for 0 ≤ x ≤ 1, and 0 elsewhere.

The dollar value of each sample is Y = 5 − 0.5X. Find the means of X and Y.
Solution:

E(X) = ∫_0^1 x((3/2)x² + x) dx = 17/24 ≈ 0.708.

E(Y) = ∫_0^1 (5 − 0.5x)((3/2)x² + x) dx = 5 − 17/48 ≈ 4.646.
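Both integrals reduce to power-rule terms, and E(Y) also follows from linearity of expectation; an exact check with fractions:

```python
from fractions import Fraction

# E(X) = int_0^1 (3/2 x^3 + x^2) dx = 3/8 + 1/3 = 17/24
EX = Fraction(3, 8) + Fraction(1, 3)

# Linearity: E(Y) = E(5 - 0.5 X) = 5 - (1/2) E(X) = 5 - 17/48
EY = 5 - Fraction(1, 2) * EX
```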
Linearity of expectation

Theorem (1.8.2)

Let g_1(X) and g_2(X) be functions of a random variable X. Suppose the expectations of g_1(X) and g_2(X) exist. Then for any constants k_0, k_1, and k_2, the expectation of k_0 + k_1 g_1(X) + k_2 g_2(X) exists and is given by

E[k_0 + k_1 g_1(X) + k_2 g_2(X)] = k_0 + k_1 E[g_1(X)] + k_2 E[g_2(X)].

Corollary: If g(X) = a + bX, i.e., g is a linear function, then

E[g(X)] = a + b E(X) = g[E(X)].
Example

Assume X has the pdf f(x) = 1 for 0 < x < 1. Then

E(X + 3X²) = E(X) + 3E(X²) = 1/2 + 3 · (1/3) = 3/2.

Warning: If g is a nonlinear function, then in general E[g(X)] ≠ g[E(X)]. For the previous example,

E(X²) = 1/3 ≠ (E(X))² = (1/2)² = 1/4.

Which one is larger? Jensen’s inequality...
Example

Let X have the pdf f(x) = 3x², 0 < x < 1, zero elsewhere. Consider a random rectangle whose sides are X and (1 − X). Determine the expected value of the area of the rectangle.
Solution:

E[X(1 − X)] = ∫_0^1 3x² · x(1 − x) dx = 0.15.
Suppose X is a random variable such that E(X²) < ∞. Consider the function

h(b) = E[(X − b)²].

The value of b that minimizes h(b) is b = E(X): expanding, h(b) = E(X²) − 2b E(X) + b², so h′(b) = −2E(X) + 2b = 0 gives b = E(X).
Let X be a continuous random variable with cdf F(x). Determine the expectation of F(X).
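The exercise is left for class; for intuition, a simulation with one concrete choice of F (exponential with rate 1, my choice) suggests what the distribution-free answer should be:

```python
import math
import random

random.seed(4)

# X exponential with rate 1, so F(x) = 1 - e^{-x}; estimate E[F(X)].
n = 200_000
xs = [-math.log(1 - random.random()) for _ in range(n)]
estimate = sum(1 - math.exp(-x) for x in xs) / n   # F(X) is Uniform(0, 1) here
```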
Chapter 1 Probability and Distributions
1.9 Some Special Expectations
Mean and variance

Definition (Mean)
Let X be a random variable whose expectation exists. The mean value µ of X is defined to be µ = E(X).

Definition (Variance)
Let X be a random variable with finite mean µ. Then the variance of X is defined to be σ² = Var(X) = E[(X − µ)²]. The positive square root σ = √Var(X) is called the standard deviation of X.

Computation of Var(X):

Var(X) = E(X²) − (E(X))²,

which follows because E is a linear operator.
Coin-flipping example

Define X = 1 if the flip is a head and X = 0 if it is a tail. Assume P(X = 1) = p, so that P(X = 0) = 1 − p. Find the mean and variance of X.

Solution:

E(X) = 1 · p + 0 · (1 − p) = p,
E(X²) = 1² · p + 0² · (1 − p) = p,
σ² = p − p² = p(1 − p), σ = √(p(1 − p)).
Assume a > 0 and let

f_X(x) = 1/a for 0 < x < a, and 0 otherwise.

Find the mean and variance of X.
Solution:

µ = ∫_0^a x (1/a) dx = a/2,
E(X²) = ∫_0^a x² (1/a) dx = a²/3,
σ² = a²/3 − a²/4 = a²/12.
Example 1.9.2

If X has the pdf

f_X(x) = 1/x² for 1 < x < ∞, and 0 otherwise,

then the mean value of X does not exist, since

∫_1^∞ x · (1/x²) dx = ∫_1^∞ (1/x) dx = ∞.
Linear transformation

I Suppose that Y = a + bX. Then,
1 The mean of Y is E(Y) = µ_Y = a + b E(X) = a + b µ_X.
2 The variance of Y is

Var(Y) = E(Y − µ_Y)² = E[b²(X − µ_X)²] = b² σ_X².

In fact,

Var(Y) = Var(a + bX) = Var(bX) = b² Var(X).

3 The standard deviation of Y is |b| σ_X.

I Let the random variable X have mean µ and standard deviation σ. Show that

E[((X − µ)/σ)²] = 1.
Expectation of non-negative random variables

I Let X be a random variable of the discrete type with pmf p(x), x = 0, 1, 2, . . .. It holds that

E(X) = Σ_{x=0}^∞ [1 − F(x)].

I Let X be a continuous random variable with pdf f(x). Suppose that f(x) = 0 for x < 0. It holds that

E(X) = ∫_0^∞ [1 − F(x)] dx.
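A numerical check of the discrete tail-sum formula for a geometric pmf, where P(X > x) = q^{x+1} and E(X) = q/p:

```python
p, q = 0.3, 0.7
N = 400                                   # truncation; q^N is negligible

pmf = [q ** x * p for x in range(N)]
mean_direct = sum(x * pmf[x] for x in range(N))

# sum_{x >= 0} [1 - F(x)] = sum_{x >= 0} P(X > x) = sum_{x >= 0} q^{x+1}
tail_sum = sum(q ** (x + 1) for x in range(N))
```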
Proof for discrete case

Σ_{x=0}^∞ [1 − F(x)] = Σ_{x=0}^∞ [1 − P(X ≤ x)] = Σ_{x=0}^∞ P(X > x)
= Σ_{x=0}^∞ Σ_{t=x+1}^∞ P(X = t)
= Σ_{x=0}^∞ Σ_{t=0}^∞ 1_{t>x} P(X = t)
= Σ_{t=0}^∞ Σ_{x=0}^∞ 1_{x<t} P(X = t) (interchangeable due to absolute convergence)
= Σ_{t=0}^∞ t P(X = t) = E(X).
Proof for continuous case

In this 4000-level class, we only prove

E(X) = ∫_0^b [1 − F(x)] dx,

where the support of X is (0, b) and b < ∞, although the equality also holds for b = ∞.

Proof. Integrating by parts,

∫_0^b [1 − F(x)] dx = [x − x F(x)] |_0^b + ∫_0^b x f(x) dx = ∫_0^b x f(x) dx = E(X).
Moments

I The m-th raw moment of X is defined as E(X^m), if the expectation exists.
I The m-th central moment of X is defined as E[(X − µ)^m], if the expectation exists.

Example

1 Coin flipping: P(X = 1) = p,

E(X^m) = 1^m · p + 0^m · (1 − p) = p.

2 Assume a > 0 and let

f_X(x) = 1/a for 0 < x < a, and 0 otherwise.

E(X^m) = ∫_0^a x^m (1/a) dx = a^m/(m + 1).
Moment generating function

I Let X be a random variable. Assume there is a positive number h such that E[exp(tX)] exists for all t ∈ (−h, h).
I The moment generating function (mgf) of X is defined to be the function

M_X(t) = E(e^{tX}), −h < t < h.

Example

1 Assume P(X = 0) = 1 − p and P(X = 1) = p. Then

M_X(t) = E[e^{tX}] = p · e^t + (1 − p) · e^0 = p e^t + 1 − p, for −∞ < t < ∞.

2 For

f_X(x) = 1/a for 0 < x < a, and 0 otherwise,

M_X(t) = E[e^{tX}] = (1/a) ∫_0^a e^{tx} dx = (1/(at)) e^{tx} |_0^a = (e^{at} − 1)/(at)

when t ≠ 0, and M_X(t) = 1 when t = 0.
Properties of mgf

1 It is always true that M_X(0) = 1.
2 The moments of X can be found (or “generated”) from the successive derivatives of M_X(t):

M_X′(0) = E(X), M_X″(0) = E(X²), M_X^{(n)}(0) = E(X^n).

I We have M_X′(0) = E(X) = ∫_{−∞}^{∞} x f(x) dx = µ, since

M′(t) = (d/dt) ∫_{−∞}^{∞} e^{tx} f(x) dx = ∫_{−∞}^{∞} (d/dt) e^{tx} f(x) dx = ∫_{−∞}^{∞} x e^{tx} f(x) dx.

I We then see

M″(0) = E(X²) = ∫_{−∞}^{∞} x² f(x) dx = µ² + σ²,

so σ² = E(X²) − µ² = M″(0) − [M′(0)]².
The pdf of X is

f_X(x) = λe^{−λx} for x > 0, and 0 otherwise.

Find the mgf of X and use it to find the mean and the variance.
Solution:
The mgf is given as follows:

M(t) = E(e^{tX}) = ∫_0^∞ e^{tx} f(x) dx = λ/(λ − t), t < λ,

and

M′(t) = λ/(λ − t)², M″(t) = 2λ/(λ − t)³.

Thus

µ = M′(0) = 1/λ, σ² = M″(0) − µ² = 2/λ² − 1/λ² = 1/λ².
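A finite-difference sanity check of M′(0) = 1/λ and M″(0) = 2/λ², with an illustrative λ = 2:

```python
lam = 2.0
M = lambda t: lam / (lam - t)             # exponential mgf, valid for t < lam

h = 1e-4
d1 = (M(h) - M(-h)) / (2 * h)             # central difference for M'(0)
d2 = (M(h) - 2 * M(0) + M(-h)) / h ** 2   # central difference for M''(0)

mu = d1                                   # should be near 1/lam = 0.5
var = d2 - d1 ** 2                        # should be near 1/lam^2 = 0.25
```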
Theorem 1.9.1

Theorem. Let X and Y be random variables with moment generating functions M_X and M_Y, respectively, existing in open intervals about 0. If M_X(t) = M_Y(t) for all t in an interval containing t = 0, then X and Y have identical probability distributions.
Example

Suppose X is a random variable of the continuous type with mgf

M(t) = 1/(1 − 3t), t < 1/3.

Identify its distribution.

Solution:
Since M(t) = (1/3)/((1/3) − t), the random variable X follows the exponential distribution with rate λ = 1/3.
Characteristic function

I Distributions may not have an mgf.
I Can you show that the mgf of the Cauchy distribution does not exist?

f(x) = (1/π) · 1/(x² + 1), −∞ < x < ∞.

I Characteristic function (not in exams):

φ(t) = E(e^{itX}), for an arbitrary real value t.

I Every distribution has a unique characteristic function, and E(X) = −iφ′(0) and E(X²) = −φ″(0).
I Uniqueness of the characteristic function is due to the uniqueness of the Fourier transform. In fact, uniqueness of the mgf comes from the uniqueness of the Laplace transform.
Chapter 1 Probability and Distributions
1.10 Important Inequalities
Chebyshev’s inequality
Motivation

I The variance of a random variable tells us something about the variability of the observations about the mean.
I If a random variable has a small variance or standard deviation, we would expect most of the values to be grouped around the mean.
I For any random variable, the probability between any two values symmetric about the mean should be related to the standard deviation.
Chebyshev’s Theorem

Theorem (1.10.3)

Suppose that for a random variable X, E(X²) exists. Then for any constant k > 0,

P(µ − kσ < X < µ + kσ) ≥ 1 − 1/k².
Example

A random variable X has mean µ = 8, variance σ² = 9, and an unknown probability distribution. Find bounds for

1 P(−4 < X < 20),
2 P(|X − 8| ≥ 6).

Solution:

P(−4 < X < 20) = P(8 − 4 × 3 < X < 8 + 4 × 3) ≥ 15/16.

P(|X − 8| ≥ 6) = 1 − P(|X − 8| < 6) = 1 − P(−6 < X − 8 < 6) = 1 − P(8 − 2 × 3 < X < 8 + 2 × 3) ≤ 1/4.
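To see how conservative the k = 4 bound is, compare it with the exact probability under one assumed distribution, say Normal(8, 3) (an illustration of mine; the slide's X is unknown):

```python
import math

def normal_cdf(x, mu=8.0, sigma=3.0):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

bound = 1 - 1 / 4 ** 2                           # Chebyshev: P(-4 < X < 20) >= 15/16
exact_normal = normal_cdf(20) - normal_cdf(-4)   # exact value under normality
```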
Comments on Chebyshev’s Theorem

I Chebyshev’s theorem holds for any distribution of observations.
I For this reason, the results are usually weak. The value given by the theorem is a lower bound only.
I We know the probability of a random variable falling within two standard deviations of its mean will be no less than 3/4, but we never know how much more it might actually be, unless we can determine exact probabilities.
I Chebyshev’s theorem is thus called a distribution-free result. The results will be less conservative when specific distributions are known.
Example

Compute P(µ − 2σ < X < µ + 2σ), where X has the density function

f(x) = 6x(1 − x) for 0 < x < 1, and 0 elsewhere,

and compare with the result given in Chebyshev’s theorem.

Solution:
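The solution is worked in class; the computation itself is short, since µ = 1/2, σ² = 1/20, and the cdf is F(x) = 3x² − 2x³ on (0, 1):

```python
import math

mu, var = 0.5, 0.05          # from E(X) = 1/2 and E(X^2) = 3/10
sigma = math.sqrt(var)
a, b = mu - 2 * sigma, mu + 2 * sigma

F = lambda x: 3 * x ** 2 - 2 * x ** 3   # cdf of f(x) = 6x(1 - x) on (0, 1)
prob = F(b) - F(a)                      # exact P(mu - 2sigma < X < mu + 2sigma)
```

The exact probability is roughly 0.98, noticeably above Chebyshev's guarantee of 3/4.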
Jensen’s inequality
Definition 1.10.1

A function φ defined on an interval (a, b) is said to be a convex function if for all x and y in (a, b) and all 0 < γ < 1,

φ[γx + (1 − γ)y] ≤ γφ(x) + (1 − γ)φ(y).

We say φ is strictly convex if the above inequality is strict.

If φ is a (strictly) convex function, then −φ is a (strictly) concave function.
Theorem 1.10.4

Suppose φ is differentiable on (a, b). Then

(a) (first-order condition) φ is convex if and only if, for all x_1, x_2 ∈ (a, b),
φ(x_1) ≥ φ(x_2) + φ′(x_2)(x_1 − x_2);
(b) φ is strictly convex if and only if the inequality in (a) is strict whenever x_1 ≠ x_2.

If φ is twice differentiable on (a, b), then
(c) (second-order condition) φ is convex if and only if φ″(x) ≥ 0 for all x ∈ (a, b);
(d) φ is strictly convex if φ″(x) > 0 for all x ∈ (a, b).

The second-order condition is usually used to check convexity, provided that the second derivative exists.
Jensen’s inequality

Let φ be a convex function on an open interval I, and let X be a random variable whose support is contained in I. If µ = E(X) exists, then

φ[E(X)] ≤ E[φ(X)].

The inequality reverses if φ is a concave function.

Proof.
By the first-order condition,

φ(x) ≥ φ(µ) + φ′(µ)(x − µ).

Taking expectations of both sides leads to the result. The inequality is strict if φ is strictly convex and X is nondegenerate.

Example

We have µ² < E(X²) for nondegenerate X, as φ(t) = t² is strictly convex.
Example

Let X be a positive, nondegenerate random variable. Argue that

1 E(1/X) > 1/E(X).
2 E(√X) < √E(X).

Solution:

1 When x > 0, φ″(x) = 2x^{−3} > 0, so φ(x) = 1/x is strictly convex. The result follows from Jensen’s inequality.
2 When x > 0, φ″(x) = −(1/4) x^{−3/2} < 0, so φ(x) = √x is strictly concave. The result follows from Jensen’s inequality.
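A two-point numerical illustration of both strict inequalities (the values 1 and 4 are my choice):

```python
import math

# Positive, nondegenerate X taking values 1 and 4 with equal probability.
pmf = {1: 0.5, 4: 0.5}

EX = sum(x * p for x, p in pmf.items())                 # 2.5
E_inv = sum((1 / x) * p for x, p in pmf.items())        # E(1/X) = 0.625 > 1/2.5
E_sqrt = sum(math.sqrt(x) * p for x, p in pmf.items())  # E(sqrt(X)) = 1.5 < sqrt(2.5)
```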