L2 probability theory refresh

Probability Theory: Advanced Look

Statistical Methods in Finance

Lecture 2

Ta-Wei Huang

December 7, 2016


Table of Contents

Probability theory is the way we think about randomness. In financial markets, uncertainty and risk are everywhere, so we need probability theory to model market trends.

1 Sample Space and Event

2 Probability Measure

3 Random Variable

4 Distribution Function

5 Next Lecture




Sample Space and Event

Sample Space: Definition

Definition 2.1.1 (Sample Space)

For an experiment and a given index set A, all possible outcomes ω_α, α ∈ A, are called sample points. The set Ω = {ω_α : α ∈ A}, the collection of all possible outcomes (that is, the set of all sample points), is called the sample space of that experiment.

1 If A is countable, we say that Ω is a discrete sample space;

2 if A is uncountable, we say that Ω is a continuous sample space.



Sample Space: Examples

Example 2.1.2 (Sample Space)

1 If the experiment is tossing a coin, and the observation is the face of that coin, then the sample space is Ω = {H, T}, which is a discrete sample space.

2 If the experiment is a monetary policy action by the Fed, and the observation is the return on the S&P 500 Index one day after that action, then Ω = R, which is a continuous sample space.



Event

Definition 2.1.3 (Event)

An event E is any collection of possible outcomes of an experiment, that is, any subset of the sample space Ω.

An event is actually a statement about the result of the experiment. For example, the set R+ = (0, ∞) = {the S&P 500 has a positive return} is an event for Ω in case (2) of Example 2.1.2.



σ-algebra: Definition

Definition 2.1.4 (σ-algebra)

A system F of subsets of Ω is called a σ-algebra if

(a) Ω ∈ F,

(b) A^c ∈ F if A ∈ F,

(c) for A_1, A_2, · · · , A_n, · · · ∈ F, the union ⋃_{i=1}^∞ A_i ∈ F.

Why do we need the concept of a σ-algebra? We will assign a probability to each event E, which is a subset of the sample space Ω. The collection of events must therefore be closed under the set operations (complements and countable unions) that we use to combine events before computing probabilities.



σ-algebra: Example

Example 2.1.5 (σ-algebra)

1 Let Ω = {1, 2, 3}. Then

  1 F1 = {∅, {1}, {2, 3}, {1, 2, 3}} is a σ-algebra.

  2 F2 = {∅, {1}, {2}, {3}, {1, 2, 3}} is not a σ-algebra since for A = {1} ∈ F2, A^c = {2, 3} ∉ F2.

2 Let Ω = R. Then F = the collection of all subsets of R is a σ-algebra.

3 Let Ω = N. Then

  1 F1 = {∅, {1, 3, 5, 7, · · · }, {2, 4, 6, 8, · · · }, N} is a σ-algebra.

  2 F2 = {A ⊆ N : A is countable or A^c is countable} is a σ-algebra.

  3 F3 = {A ⊆ N : A is finite or A^c is finite} is not a σ-algebra: each singleton {2k} belongs to F3, but the countable union {2, 4, 6, 8, · · · } is neither finite nor has a finite complement, so it is not in F3.
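On a finite sample space, the three axioms of Definition 2.1.4 can be checked mechanically. The sketch below is only an illustration: the helper name is_sigma_algebra is hypothetical, and it reads the two families of item 1 as F1 = {∅, {1}, {2, 3}, Ω} and F2 = {∅, {1}, {2}, {3}, Ω}, which is an assumption about the intended sets.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the sigma-algebra axioms for a family of subsets of a finite omega."""
    omega = frozenset(omega)
    family = {frozenset(s) for s in family}
    if omega not in family:                              # (a) omega is in F
        return False
    if any(omega - a not in family for a in family):     # (b) closed under complement
        return False
    # (c) on a finite omega, countable unions reduce to pairwise unions
    return all(a | b in family for a, b in combinations(family, 2))

omega = {1, 2, 3}
F1 = [set(), {1}, {2, 3}, {1, 2, 3}]          # assumed reading of F1
F2 = [set(), {1}, {2}, {3}, {1, 2, 3}]        # assumed reading of F2
print(is_sigma_algebra(omega, F1))  # True
print(is_sigma_algebra(omega, F2))  # False
```

The check for (c) only works because Ω is finite; for Ω = N or Ω = R no such enumeration is possible.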


Probability Measure

Defining Probability Measure

Classically, we define the probability P(A) of an event A by P(A) = (# of A) / (# of Ω), but there are two main problems.

1 It requires a finite sample space, which is not true in some cases.

2 It requires symmetric outcomes, that is, ∀ω_i ∈ Ω, P({ω_i}) = 1 / (# of Ω).

Therefore, we need modern probability theory, built on the foundations laid by Andrey Nikolaevich Kolmogorov.


Probability Measure: Definition

Definition 2.1.6 (Measurable Space and Probability Measure)

Let Ω be a non-empty set and let F be a σ-algebra on Ω. Then (Ω, F) is called a measurable space. A probability measure is a real-valued function P : F → R such that

(1) P(Ω) = 1;

(2) P(E) ≥ 0 for any event E ∈ F;

(3) (Countable additivity) for any sequence of pairwise disjoint events E_1, E_2, · · · in F, P(⋃_{i=1}^∞ E_i) = ∑_{i=1}^∞ P(E_i).

The triple (Ω, F, P) is called a probability space.



Probability Measure: Remark

Remark on the Definition of Probability Measure

This axiomatic definition makes no attempt to tell what particular

function P to choose.

This definition regards the probability as a property of an event in the

σ-algebra F on the sample space Ω.

So, we’ll further discuss how to define probability measures on discrete and

continuous sample spaces, respectively.


Properties of Probability Measure 1

Proposition 2.1.7

Let (Ω, F, P) be a probability space. Then, for any event E ∈ F,

(1) P(∅) = 0; (2) P(E) ≤ 1;

(3) P(E^C) = 1 − P(E).

Proof. It is easier to prove (3) first. Since the sets E and E^C form a partition of the sample space Ω, we have P(E ∪ E^C) = P(Ω) = 1 by axiom (1). Also, E and E^C are disjoint, so by additivity, P(E ∪ E^C) = P(E) + P(E^C) = 1, and hence P(E^C) = 1 − P(E).

Taking E = Ω in (3) gives (1): P(∅) = 1 − P(Ω) = 0.

Since P(E^C) = 1 − P(E) ≥ 0 by axiom (2), (2) immediately holds.


Proposition 2.1.8

Let (Ω, F, P) be a probability space. Then, for any events A, B ∈ F,

(1) P(B ∩ A^C) = P(B) − P(A ∩ B);

(2) P(A ∪ B) = P(A) + P(B) − P(A ∩ B);

(3) (Monotonicity) If A ⊆ B, then P(A) ≤ P(B).

Proof. For any sets A and B, B = (B ∩ A^C) ∪ (B ∩ A), and the two sets on the right are disjoint. Then,

P(B) = P((B ∩ A^C) ∪ (B ∩ A)) = P(B ∩ A^C) + P(B ∩ A),

which is (1).

To establish (2), we use the identity A ∪ B = A ∪ (B ∩ A^C), a disjoint union (why?), together with (1).

To establish (3), note that A ⊆ B ⇒ A ∩ B = A, so (1) gives P(B) − P(A) = P(B ∩ A^C) ≥ 0.


Proposition 2.1.9 (Inclusion-exclusion Identity)

Let (Ω, F, P) be a probability space. For n events A_1, · · · , A_n ∈ F, define P_1, P_2, · · · , P_n by

P_1 = ∑_{i=1}^n P(A_i),

P_2 = ∑_{1≤i<j≤n} P(A_i ∩ A_j),

P_3 = ∑_{1≤i<j<k≤n} P(A_i ∩ A_j ∩ A_k), · · · ,

P_n = P(∩_{i=1}^n A_i).

Then the probability of the union of A_1, · · · , A_n is given by

P(A_1 ∪ A_2 ∪ · · · ∪ A_n) = ∑_{i=1}^n (−1)^{i+1} P_i.

Proof. By mathematical induction.
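The identity can be checked numerically on a small example. The sketch below assumes a uniform probability on a hypothetical sample space {1, ..., 12} and three arbitrarily chosen events; none of these choices come from the lecture.

```python
from fractions import Fraction
from itertools import combinations

omega = set(range(1, 13))                 # hypothetical sample space {1, ..., 12}
events = [
    {w for w in omega if w % 2 == 0},     # multiples of 2
    {w for w in omega if w % 3 == 0},     # multiples of 3
    {w for w in omega if w > 8},          # outcomes larger than 8
]

def prob(event):
    """Uniform probability on omega."""
    return Fraction(len(event), len(omega))

def inclusion_exclusion(events):
    """Sum over k of (-1)^(k+1) * P_k, where P_k sums P over all k-fold intersections."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        p_k = sum(prob(set.intersection(*combo)) for combo in combinations(events, k))
        total += (-1) ** (k + 1) * p_k
    return total

direct = prob(set.union(*events))
via_formula = inclusion_exclusion(events)
print(direct, via_formula)   # 3/4 3/4
```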




Example 2.1.10 (The Matching Problem)

Suppose that each of N men at a party throws his hat into the center of

the room. The hats are first mixed up, and then each man randomly

selects a hat. What is the probability that none of the men selects his own

hat?

Solution. We first calculate the complementary probability of at least one

man’s selecting his own hat. Let Ai be the event that the i-th man selects

his own hat, i = 1, 2, . . . , N .


Solution (Cont’d). Then, the probability that at least one of the men selects his own hat is given by

P(A_1 ∪ A_2 ∪ · · · ∪ A_N) = (−1)^2 ∑_{i=1}^N P(A_i) + (−1)^3 ∑_{1≤i_1<i_2≤N} P(A_{i_1} ∩ A_{i_2}) + · · · + (−1)^{N+1} P_N,

where P_N = P(∩_{i=1}^N A_i). The number of all possible outcomes of this experiment is N!. The number of outcomes in the event A_{i_1} ∩ · · · ∩ A_{i_n} is (N − n)!. So, the probability P(A_{i_1} ∩ · · · ∩ A_{i_n}) = (N − n)!/N!.


Solution (Cont’d). Now, since there are C(N, n) = N!/(n!(N − n)!) terms in the sum ∑_{1≤i_1<···<i_n≤N} P(A_{i_1} ∩ · · · ∩ A_{i_n}), we have

∑_{1≤i_1<···<i_n≤N} P(A_{i_1} ∩ · · · ∩ A_{i_n}) = C(N, n) · (N − n)!/N! = 1/n!.

Thus, we get the complementary probability

P(A_1 ∪ A_2 ∪ · · · ∪ A_N) = 1 − 1/2! + 1/3! − · · · + (−1)^{N+1} 1/N!.

Hence, the probability that none of the men selects his own hat is

1 − (1 − 1/2! + 1/3! − · · · + (−1)^{N+1} 1/N!).

Note that as N → ∞, this probability tends to e^{−1} ≈ 0.36788, not 1!
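The result of the matching problem can be compared with a brute-force count for small N. This is a minimal sketch; the function names are hypothetical, and the enumeration over all N! permutations is only feasible for small N.

```python
import math
from itertools import permutations

def p_no_match_formula(n):
    """1 - (1 - 1/2! + 1/3! - ... + (-1)^(n+1)/n!)."""
    at_least_one = sum((-1) ** (k + 1) / math.factorial(k) for k in range(1, n + 1))
    return 1 - at_least_one

def p_no_match_bruteforce(n):
    """Fraction of the n! hat assignments in which no man gets his own hat."""
    perms = list(permutations(range(n)))
    no_match = sum(all(p[i] != i for i in range(n)) for p in perms)
    return no_match / len(perms)

for n in (3, 5, 7):
    print(n, p_no_match_formula(n), p_no_match_bruteforce(n))
print(math.exp(-1))   # the limit as N grows
```

Already for N = 7 the two values agree with e^{−1} to about four decimal places, because the series converges very quickly.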


Proposition 2.1.11 (Boole’s Inequality)

Let (Ω, F, P) be a probability space. For countably many events A_1, · · · ∈ F and any n, P(∪_{i=1}^n A_i) ≤ ∑_{i=1}^n P(A_i).

Proof. Let A′_i be the events defined by A′_1 = A_1 and A′_i = A_i − ∪_{j=1}^{i−1} A_j, ∀i = 2, 3, . . .. Then (1) A′_i ⊂ A_i and (2) the A′_i’s are pairwise disjoint, since for i > k,

A′_i ∩ A′_k = (A_i − ∪_{j=1}^{i−1} A_j) ∩ (A_k − ∪_{j=1}^{k−1} A_j)

= (A_i ∩ (∪_{j=1}^{i−1} A_j)^C) ∩ (A_k ∩ (∪_{j=1}^{k−1} A_j)^C)

= (A_i ∩ (∩_{j=1}^{i−1} A_j^C)) ∩ (A_k ∩ (∩_{j=1}^{k−1} A_j^C)),

and A_i ∩ (∩_{j=1}^{i−1} A_j^C) is contained in A_k^C while A′_k ⊂ A_k, so the intersection is empty. Since ∪_i A′_i = ∪_i A_i, we have

P(∪_{i=1}^n A_i) = P(∪_{i=1}^n A′_i) = ∑_{i=1}^n P(A′_i) ≤ ∑_{i=1}^n P(A_i).


Corollary 2.1.12 (σ-subadditivity)

Let (Ω, F, P) be a probability space. Then for countably many events A_1, · · · ∈ F and an event A ∈ F with A ⊂ ∪_{i=1}^n A_i, P(A) ≤ ∑_{i=1}^n P(A_i).

Proof. By Proposition 2.1.8 (3) and Proposition 2.1.11.


Proposition 2.1.13 (Law of Total Probability)

Let (Ω, F, P) be a probability space, and let A_1, A_2, · · · ∈ F be a partition of Ω, that is, the A_i’s are pairwise disjoint and ∪_{i=1}^∞ A_i = Ω. Then, for any event B ∈ F, P(B) = ∑_{i=1}^∞ P(B ∩ A_i).

Proof. Since {A_i} is a partition of Ω, we have B = B ∩ Ω = B ∩ (∪_{i=1}^∞ A_i) = ∪_{i=1}^∞ (B ∩ A_i), and the sets B ∩ A_i are pairwise disjoint. Therefore, by countable additivity, we have

P(B) = P(∪_{i=1}^∞ (B ∩ A_i)) = ∑_{i=1}^∞ P(B ∩ A_i).


Define Probability Measure: Discrete Sample Space 1

Theorem 2.1.14 (Define Probability Measures in a Discrete Sample Space)

Let Ω = {ω_α : α ∈ N} be a discrete (countable) sample space and F be a σ-algebra on Ω. Define a function P : F → R by (1) P(Ω) = 1; (2) P({ω}) ≥ 0 for any ω ∈ Ω; (3) P(E) = ∑_{ω∈E} P({ω}). Then P is a probability measure.

Proof. Axiom (1) is true by the definition of P. Since P({ω}) ≥ 0 for any ω ∈ Ω, for any event E ∈ F, P(E) = ∑_{ω∈E} P({ω}) ≥ 0. ⇒ Axiom (2) holds.


Define Probability Measure: Discrete Sample Space 2

Proof (Cont’d). Let {E_n} be a sequence of pairwise disjoint events in F. Then,

P(⋃_{i=1}^∞ E_i) = ∑_{ω ∈ ⋃_{i=1}^∞ E_i} P({ω}) = ∑_{i=1}^∞ ∑_{ω∈E_i} P({ω}) = ∑_{i=1}^∞ P(E_i)

⇒ Axiom (3) holds.

Remark on Theorem 2.1.14

The triple (Ω, F, P) in the above theorem is called a discrete probability space.
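The construction in the theorem is easy to write down for a finite sample space. The sketch below assumes hypothetical point masses for a loaded four-sided die; only the rule P(E) = ∑_{ω∈E} P({ω}) comes from the theorem.

```python
from fractions import Fraction

# Hypothetical point masses P({w}) for a loaded four-sided die (they sum to 1).
point_mass = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}

def P(event):
    """P(E) = sum of P({w}) over w in E, as in condition (3) of the theorem."""
    return sum((point_mass[w] for w in event), Fraction(0))

omega = set(point_mass)
E1, E2 = {1}, {3, 4}                 # two disjoint events
print(P(omega))                      # 1
print(P(E1 | E2), P(E1) + P(E2))     # 3/4 3/4
```

Additivity over disjoint events holds here simply because the sum over E1 ∪ E2 splits into the sum over E1 plus the sum over E2.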


Define Probability Measure: Continuous Sample Space 1

It is much harder to define a probability measure on a continuous sample space. First, we need some basic knowledge of set theory.

Definition 2.1.15 (Convergence of Sets)

A sequence of sets A_1, A_2, · · · , A_n, · · · is said to be increasing to A if A_1 ⊂ A_2 ⊂ · · · ⊂ A_n ⊂ · · · and A = ∪_{n=1}^∞ A_n. We denote this by lim_{n→∞} A_n = A or A_n ↑ A.

A sequence of sets A_1, A_2, · · · , A_n, · · · is said to be decreasing to A if A_1 ⊃ A_2 ⊃ · · · ⊃ A_n ⊃ · · · and A = ∩_{n=1}^∞ A_n. We denote this by lim_{n→∞} A_n = A or A_n ↓ A.


[Figure: the idea of convergence of sets, for increasing and decreasing sequences.]


Next, we introduce a basic theorem about the continuity of a probability measure. There is a more general definition of the limit of a sequence of sets, but we skip it for now since we only want to see how to define a probability measure on a continuous sample space.

Theorem 2.1.16 (Above and Below Continuity of Probability)

Let (Ω, F, P) be a probability space. If A_1, A_2, · · · ∈ F is increasing/decreasing to a set A, then lim_{n→∞} P(A_n) = P(A).


Proof. We only prove the increasing case. Let B_1 = A_1 and B_n = A_n − A_{n−1} for n ≥ 2. Then the B_n’s are pairwise disjoint, ∪_{i=1}^n B_i = ∪_{i=1}^n A_i = A_n, and ∪_{n=1}^∞ B_n = A. Thus,

P(A) = P(∪_{n=1}^∞ B_n) = ∑_{n=1}^∞ P(B_n)

= lim_{n→∞} ∑_{i=1}^n P(B_i) = lim_{n→∞} P(∪_{i=1}^n B_i)

= lim_{n→∞} P(∪_{i=1}^n A_i) = lim_{n→∞} P(A_n).




Example 2.1.17

Let (Ω, F, P) be a probability space in which every interval (a, b] ⊂ Ω has probability P((a, b]) = (b − a)/(2π). Use Theorem 2.1.16 to find P([a, b]).

Solution. Let A_k = (a − 1/k, b]. Then A_k ↓ [a, b], and so we have

P([a, b]) = lim_{k→∞} P(A_k) = lim_{k→∞} (1/(2π)) (b − a + 1/k) = (b − a)/(2π) = P((a, b]).

Remark.

Note that P({a}) = P([a, b] − (a, b]) = P([a, b]) − P((a, b]) = 0. Also, since Q ∩ Ω is a countable union of such single points, the probability P(Q ∩ Ω) = 0. (Quite different from the discrete case!)


General Limit of Sets

Definition 2.1.18 (Limit of Sets)

Let A_1, A_2, · · · , A_n, · · · be a sequence of subsets of Ω.

The limit sup of A_n is defined as

lim sup A_n = ∩_{n=1}^∞ ∪_{i=n}^∞ A_i = {ω ∈ Ω : ω ∈ A_n for infinitely many n}.

The limit inf of A_n is defined as

lim inf A_n = ∪_{n=1}^∞ ∩_{i=n}^∞ A_i = {ω ∈ Ω : ω ∈ A_n for all but finitely many n}.

We say that lim_n A_n = A if A = lim sup A_n = lim inf A_n.


Continuity of Probability Measure 1

Theorem 2.1.19 (Continuity of Probability)

Let (Ω, F, P) be a probability space. If A_1, A_2, · · · ∈ F is a sequence of subsets of Ω with lim_n A_n = A, then lim_{n→∞} P(A_n) = P(A).

Proof. Let B_n = ∪_{i=n}^∞ A_i, a decreasing sequence. Thus,

P(lim sup A_n) = P(∩_{n=1}^∞ B_n) = lim_{n→∞} P(B_n)

by Theorem 2.1.16. Also, let C_n = ∩_{i=n}^∞ A_i, an increasing sequence. Then,

P(lim inf A_n) = P(∪_{n=1}^∞ C_n) = lim_{n→∞} P(C_n).


Continuity of Probability Measure 2

Proof (Cont’d).

Since lim_n A_n exists, we have lim sup A_n = lim inf A_n = A, and so

P(A) = lim_{n→∞} P(B_n) = lim_{n→∞} P(C_n).

Since the relationship C_n ⊆ A_n ⊆ B_n holds, we have

P(C_n) ≤ P(A_n) ≤ P(B_n).

By the pinching (squeeze) theorem, one gets

P(A) = lim_{n→∞} P(B_n) = lim_{n→∞} P(C_n) = lim_{n→∞} P(A_n).


Random Variable

Understanding the Sample Space

From Example 2.1.17, we know that defining a probability measure on a sample space is not as simple as one may think. Therefore, a further understanding of σ-algebras and probability is needed to help us define an appropriate probability measure. Here, we first discuss properties of σ-algebras. The discussion will be very helpful when constructing the framework of random variables.


Understanding the σ-algebra

Question

Given a sample space Ω and a collection C of subsets of Ω, does there exist a collection G of subsets of Ω such that (1) C ⊆ G and (2) G is a σ-algebra?

The answer is definitely yes. We may take G to be the collection of all subsets of Ω. However, this choice does not help us understand how to define a suitable probability measure on a complicated space. So our goal is to find the smallest σ-algebra that contains C.


Understanding the σ-algebra

Definition 2.2.1

Let Ω be a sample space and C a collection of subsets of Ω. If G is the smallest σ-algebra that contains C, we say that G is generated by C and denote it by G = σ(C).

Note. Here are two propositions that are easy to check. First, if C1 ⊆ C2, then σ(C1) ⊆ σ(C2). Second, σ(σ(C)) = σ(C) since σ(C) is a σ-algebra.
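When Ω is finite, σ(C) can be computed by repeatedly adding complements and unions until nothing new appears. The sketch below is an illustration under that finiteness assumption; the function name generate_sigma_algebra and the example collection are hypothetical.

```python
def generate_sigma_algebra(omega, collection):
    """Close a collection of subsets of a finite omega under complement and union."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(s) for s in collection}
    while True:
        new = {omega - a for a in family} | {a | b for a in family for b in family}
        if new <= family:          # nothing new was produced: the family is closed
            return family
        family |= new

omega = {1, 2, 3, 4}
sigma_C = generate_sigma_algebra(omega, [{1}, {1, 2}])
print(sorted(sorted(s) for s in sigma_C))
print(len(sigma_C))   # 8
```

Here the generated σ-algebra consists of all unions of the blocks {1}, {2} and {3, 4}, so it has 2^3 = 8 elements, and applying the function to sigma_C again returns the same family, in line with σ(σ(C)) = σ(C).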


Borel Set and Borel σ-algebra

Definition 2.2.2 (Borel Sets)

Let Ω be a topological space and C the collection of all open subsets of Ω. Then the σ-algebra B(Ω) ≡ σ(C) is called the Borel σ-algebra. Any element E ∈ B(Ω) is called a Borel set.

Now, let R be the sample space. Then every subset of R that you meet in everyday practice is a Borel set, and it is difficult, but possible, to explicitly construct a subset of R that is not a Borel set. This means that B ≡ B(R) is not equal to the collection of all subsets of R.


Lemma

Events in B are quite complicated, but by Theorem 2.2.4 we can construct B from a structure that is very easy to understand. Before proving it, we need a simple result, Lemma 2.2.3.

Lemma 2.2.3

Let Ω be a set. If F_α is a σ-algebra on Ω for each α in some non-empty index set I, then ∩_{α∈I} F_α is also a σ-algebra.

Proof. This follows immediately from the definition of a σ-algebra.


Structure of a Borel σ-algebra 1

Theorem 2.2.4

The Borel σ-algebra B ≡ B(R) is generated by each of the following collections of subsets of R:

(1) C1 = {(a, b) : a < b, a ∈ R, b ∈ R};

(2) C2 = {(a, b] : a < b, a ∈ R, b ∈ R};

(3) C3 = {[a, b) : a < b, a ∈ R, b ∈ R};

(4) C4 = {[a, b] : a < b, a ∈ R, b ∈ R}.

Proof. This is not our main topic, so we only prove (1) as an example.


Structure of a Borel σ-algebra 2

Proof (Cont’d).

Let G be the collection of all open sets in R. Then B = σ(G). Since every (a, b) is open, we have C1 ⊂ G, and so σ(C1) ⊂ σ(G).

Since (a, ∞) = ∪_{n=1}^∞ (a, a + n) and (−∞, b) = ∪_{n=1}^∞ (b − n, b) are countable unions, we have (a, ∞) and (−∞, b) in σ(C1).

To get the reverse inclusion, if U ⊂ R is open, it is a countable union of open intervals, and so U ∈ σ(C1). Thus G ⊂ σ(C1), which implies that σ(G) ⊂ σ(C1).


Measurable Function 1

There is still one thing we need to talk about, that is, the “measurability” of a function. This definition is the core concept behind random variables.

Definition 2.2.5 (Measurability)

Let (Ω, F) and (S, F′) be two measurable spaces. A function X : Ω → S is said to be an (F, F′)-measurable map from (Ω, F) to (S, F′) if

X^{−1}(B) = {ω : X(ω) ∈ B} ∈ F, ∀B ∈ F′.
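For finite spaces, measurability can be checked directly by computing every preimage. The sketch below uses hypothetical spaces and maps chosen only for illustration: X is constant on the blocks of F and is measurable, while Y separates two outcomes that F cannot tell apart and is not.

```python
def preimage(X, omega, B):
    """X^{-1}(B) = {w in omega : X(w) in B}."""
    return frozenset(w for w in omega if X(w) in B)

def is_measurable(X, omega, F, F_prime):
    """True if the preimage of every set in F_prime belongs to F."""
    F = {frozenset(a) for a in F}
    return all(preimage(X, omega, B) in F for B in F_prime)

omega = {1, 2, 3, 4}
F = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]     # a sigma-algebra on omega
F_prime = [set(), {0}, {1}, {0, 1}]           # all subsets of S = {0, 1}

def X(w):
    """Constant on {1, 2} and on {3, 4}."""
    return 0 if w <= 2 else 1

def Y(w):
    """Parity of w; it separates 1 from 2."""
    return w % 2

print(is_measurable(X, omega, F, F_prime))  # True
print(is_measurable(Y, omega, F, F_prime))  # False
```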


Measurable Function 2

The meaning of (F, F′)-measurability is that we can transform the original probability measure P defined on F into a new probability measure P_X on F′, and then form a new probability space (S, F′, P_X). This is the structure behind the concept of a random variable. Now, we give the formal definition of a random variable.


Random Variable

Definition 2.2.6 (Random Variable)

Let (Ω, F, P) be a probability space. We say that a real-valued function X : Ω → R is an (F, B)-measurable random variable if for any Borel set B ⊂ R, X^{−1}(B) = {ω : X(ω) ∈ B} ∈ F.

Remark.

This definition means that a random variable transforms the original probability space (Ω, F, P) into a new probability space (X(Ω), B, P_X). We have not yet discussed how to transform P into a new probability measure P_X w.r.t. X on B.


Example of Random Variables

Example 2.2.7 (Random Variables)

Let Ω = [0, 1] and F = B ∩ [0, 1] = {B ∩ [0, 1] : B ∈ B}. Is the real-valued function X_1(ω) = ω, ω ∈ Ω, a random variable? How about X_2(ω) = ω^2, ω ∈ Ω?

Solution. For any Borel set B ∈ B,

X_1^{−1}(B) = {ω ∈ Ω : X_1(ω) ∈ B} = {ω ∈ [0, 1] : ω ∈ B} = B ∩ [0, 1] ∈ F.

Hence, X_1(ω) = ω, ω ∈ Ω, is a random variable.

For any Borel set B ∈ B,

X_2^{−1}(B) = {ω ∈ Ω : X_2(ω) ∈ B} = {ω ∈ [0, 1] : ω^2 ∈ B}.

It is hard to check whether X_2^{−1}(B) ∈ F, especially since we need to check every Borel set B!


Alternative Definition of Random Variable 1

The definition of a random variable is too hard to check! We need an equivalent definition of random variables. We first introduce a lemma.

Lemma 2.2.8 (Transformation of σ-algebra)

Let (Ω, F) and (S, F′) be two measurable spaces and C ⊆ F′ be a collection of subsets of S. If X : Ω → S is an (F, F′)-measurable map, then σ(X^{−1}(C)) = X^{−1}(σ(C)), where X^{−1}(C) = {X^{−1}(B) : B ∈ C}.

Proof. Very hard to prove! Skip!


Theorem 2.2.9 (Equivalent Definition of a Random Variable)

Let (Ω, F, P) be a probability space and X : Ω → R a real-valued function. Then X is a random variable if and only if

X^{−1}((−∞, r]) = {ω : X(ω) ≤ r} ∈ F for any r ∈ R.

Proof.

If X is a random variable, since (−∞, r] is a Borel set (note that the intervals (−∞, r] generate the Borel σ-algebra on R), X^{−1}((−∞, r]) ∈ F must hold.

To prove the converse, let C = {(−∞, r] : r ∈ R}. Then X^{−1}(B) = X^{−1}(σ(C)) = σ(X^{−1}(C)). Since X^{−1}(C) ⊆ F and F is a σ-algebra, we have σ(X^{−1}(C)) = X^{−1}(B) ⊆ F, and so X is a random variable.


Corollary 2.2.10 (Equivalent Definition of a Random Variable)

Let (Ω, F, P) be a probability space and X : Ω → R a real-valued function. The following statements are equivalent.

(1) X is a random variable.

(2) X^{−1}((−∞, r]) = {ω : X(ω) ≤ r} ∈ F for any r ∈ R.

(3) X^{−1}((−∞, r)) = {ω : X(ω) < r} ∈ F for any r ∈ R.

(4) X^{−1}([r, ∞)) = {ω : X(ω) ≥ r} ∈ F for any r ∈ R.

(5) X^{−1}((r, ∞)) = {ω : X(ω) > r} ∈ F for any r ∈ R.

This is an easier criterion for checking whether a function is a random variable. For example, now we can answer Example 2.2.7.

If r < 0, X_2^{−1}((−∞, r]) = ∅ ∈ F. If 0 ≤ r ≤ 1, X_2^{−1}((−∞, r]) = [0, √r] ∈ F since √r ≤ 1. If r > 1, X_2^{−1}((−∞, r]) = [0, 1] = Ω ∈ F. Therefore, X_2 is a random variable.




Define the Probability Measure on the Borel Set 1

We still need to discuss one thing: how do we define a probability measure on B with respect to the random variable X? The following theorem gives us the hint.

Theorem 2.2.11

Let (Ω,F ,P) be a probability space and X : Ω→ R a random variable.

Then the function PX : B → [0, 1] defined by

PX(B) = P(X−1(B)), ∀B ∈ B is a probability measure on (R,B).

Proof. To prove this theorem, we need to check the axioms of probability

measures.


Proof (Cont’d).

(1) P_X(R) = P(X^{−1}(R)) = P(Ω) = 1.

(2) P_X(B) = P(X^{−1}(B)) ≥ 0 since P is a probability measure.

(3) For pairwise disjoint sets B_i’s in B, the preimages X^{−1}(B_i) are pairwise disjoint, so

P_X(∪_i B_i) = P(X^{−1}(∪_i B_i)) = P(∪_i X^{−1}(B_i)) = ∑_i P(X^{−1}(B_i)) = ∑_i P_X(B_i).

Therefore, P_X is a probability measure on (R, B). Note that P_X is called the probability distribution function of the random variable X.
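On a finite probability space, the measure P_X(B) = P(X^{−1}(B)) can be tabulated by adding up the probability of all outcomes that X sends to the same value. The sketch below assumes a hypothetical space of two fair coin tosses, with X counting heads; the helper name pushforward is also an assumption.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical probability space: two fair coin tosses.
omega = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in omega}

def X(w):
    """Random variable: number of heads in the outcome w."""
    return w.count("H")

def pushforward(P, X):
    """Point masses of P_X: P_X({x}) = P(X^{-1}({x}))."""
    P_X = defaultdict(Fraction)
    for w, p in P.items():
        P_X[X(w)] += p
    return dict(P_X)

P_X = pushforward(P, X)
print(P_X)
print(sum(P_X.values()))   # 1
```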


From here, we know how to define a probability measure with respect to a random variable. However, the definition is too complicated since it defines a probability measure on (R, B). As a result, in the next section we will introduce a more common, useful, and intuitive function to describe the probability w.r.t. X, that is, the (cumulative) distribution function F_X.


Distribution Function

Distribution Function: Definition

The distribution of a random variable X is usually described by giving its

distribution function, not the probability distribution function.

Definition 2.3.1 (Distribution Function)

Let (Ω, F, P) be a probability space and X : Ω → R a random variable with probability distribution function P_X. The distribution function of X, written F_X : R → [0, 1], is defined by

F_X(x) = P_X((−∞, x]) = P({ω ∈ Ω : X(ω) ≤ x}) = P(X ≤ x), ∀x ∈ R.



Distribution Function: Properties 1

The distribution function gives us a simplified framework when discussing

a quantitative random phenomenon.

Theorem 2.3.2 (Properties of Distribution Functions)

Any distribution function FX of a random variable X has the following

properties:

(1) F_X is nondecreasing;

(2) lim_{x→∞} F_X(x) = 1 and lim_{x→−∞} F_X(x) = 0;

(3) F_X is right continuous, that is, lim_{x→a+} F_X(x) = F_X(a);

(4) F_X(a−) := lim_{x→a−} F_X(x) = P(X < a);

(5) the probability of the event {X = a} is P(X = a) = F_X(a) − F_X(a−).



Distribution Function: Properties 2

Proof. To prove (1), assume that a ≤ b ∈ R. Note that {ω : X(ω) ≤ a} ⊆ {ω : X(ω) ≤ b}. Then, by monotonicity,

F_X(a) = P({ω : X(ω) ≤ a}) ≤ P({ω : X(ω) ≤ b}) = F_X(b).

For (2), if x → ∞, then {ω : X(ω) ≤ x} ↑ Ω, so lim_{x→∞} F_X(x) = lim_{x→∞} P({ω : X(ω) ≤ x}) = P(Ω) = 1. The fact that x → −∞ ⇒ {ω : X(ω) ≤ x} ↓ ∅ implies lim_{x→−∞} F_X(x) = 0.

To prove (3), we observe that if x → a+, then {ω : X(ω) ≤ x} ↓ {ω : X(ω) ≤ a}.

To prove (4), if x → a−, then {ω : X(ω) ≤ x} ↑ {ω : X(ω) < a}.

For (5), note that P(X = a) = P(X ≤ a) − P(X < a) and use the definition of F_X and (4).
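Properties (4) and (5) are easy to see for a discrete distribution, where the distribution function is a step function. The sketch below assumes a hypothetical random variable (the number of heads in two fair coin tosses) and computes F_X(a), F_X(a−) and the jump at each atom.

```python
from fractions import Fraction

# Hypothetical discrete distribution: number of heads in two fair coin tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    """F_X(x) = P(X <= x)."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

def F_left(a):
    """F_X(a-) = P(X < a), the left limit of F_X at a."""
    return sum((p for v, p in pmf.items() if v < a), Fraction(0))

for a in (0, 1, 2):
    # The jump F_X(a) - F_X(a-) equals P(X = a).
    print(a, F(a), F_left(a), F(a) - F_left(a))
```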



Lebesgue-Stieltjes measure 1

Definition 2.3.3 (Lebesgue-Stieltjes Measure)

Let Ω = R and C be the collection of intervals of the form (a, b]. If F is a function satisfying properties (1) and (3) of Theorem 2.3.2, then we call it a Stieltjes measure function.

Define a function l by l((a, b]) = F(b) − F(a). Then the function

m*(E) = inf { ∑_{i=1}^∞ l(A_i) : A_i ∈ C for each i and E ⊂ ∪_{i=1}^∞ A_i }

is called a Lebesgue-Stieltjes measure. If we define F(x) = x, then the measure m* is called the Lebesgue measure.



Lebesgue-Stieltjes measure 2

Note that the Lebesgue-Stieltjes measure has many important properties. First, every set in the Borel σ-algebra on R can be measured by the Lebesgue-Stieltjes measure. (Here, we do not define what the term “can be measured” means!) Second, if a and b are finite, then m*((a, b]) = l((a, b]). The most important special case is the Lebesgue measure, which plays a central role when defining expectation.



Conditions for Distribution Functions 1

Theorem 2.3.4 (Conditions for Distribution Functions)

If a function F satisfies (1), (2) and (3) of Theorem 2.3.2, then it is the distribution function of some random variable.

Proof.

Let Ω = (0, 1), let the σ-algebra be the Borel sets on (0, 1), and let P = the Lebesgue measure.

Define a real-valued function X : Ω → R by X(ω) = sup{x : F(x) < ω}. Then, for a fixed number x, if ω ≤ F(x), X(ω) ≤ x must hold by the definition of X, since every y with F(y) < ω ≤ F(x) satisfies y < x because F is nondecreasing.



Conditions for Distribution Functions 2

On the other hand, if ω > F(x), then since F is right continuous, there is an ε > 0 so that F(x + ε) < ω, and so X(ω) ≥ x + ε > x. As a result, we have {ω : X(ω) ≤ x} = {ω : ω ≤ F(x)}. Since P is the Lebesgue measure, we have the desired result since

P({ω ∈ Ω : X(ω) ≤ x}) = P({ω : ω ≤ F(x)}) = P((0, F(x)]) = F(x).
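The construction in this proof is the idea behind inverse transform sampling: draw ω uniformly from (0, 1) and return X(ω). The sketch below assumes an exponential distribution function with a hypothetical rate of 1.5, chosen because X(ω) = sup{x : F(x) < ω} then has the closed form −log(1 − ω)/rate; it compares the empirical distribution of the samples with F.

```python
import math
import random

def F(x, rate=1.5):
    """Exponential distribution function: F(x) = 1 - exp(-rate * x) for x >= 0."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def X(omega, rate=1.5):
    """sup{x : F(x) < omega}; for this continuous, strictly increasing F it is the inverse of F."""
    return -math.log(1.0 - omega) / rate

random.seed(0)                                   # fixed seed so the run is repeatable
samples = [X(random.random()) for _ in range(100_000)]

for x in (0.2, 0.5, 1.0, 2.0):
    empirical = sum(s <= x for s in samples) / len(samples)
    print(x, round(empirical, 3), round(F(x), 3))
```

The empirical proportions should be close to F(x), which is what P({ω : X(ω) ≤ x}) = F(x) predicts.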



Probability Mass Function

Definition 2.3.5 (Probability Mass Function)

Let (Ω, F, P) be a probability space and X : Ω → R a random variable. If the image of X, say X(Ω), is a countable set, we call X a discrete random variable. The probability mass function of X is defined by p_X(x) = P_X({x}) = P({ω : X(ω) = x}). In this case, the distribution function of X must have some jumps.



Probability Density Function

Definition 2.3.6 (Probability Density Function)

Let (Ω, F, P) be a probability space and X : Ω → R a random variable. If the distribution function F_X of X is continuous, we call X a continuous random variable. The function f_X(x) = d/dx F_X(x), where this derivative exists, is called the probability density function of X.



Probability Density Function

By the fundamental theorem of calculus, we have the relationship

F_X(x) = ∫_{−∞}^{x} f_X(t) dt and

P_X((a − ε, a + ε)) = F_X(a + ε) − F_X(a − ε) = ∫_{a−ε}^{a+ε} f_X(x) dx.

These properties mean that we can start with a function f and then define a distribution function F. In order to end up with a distribution function F_X, it is necessary and sufficient that f(x) ≥ 0 and ∫_{−∞}^{∞} f(x) dx = 1.
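Both conditions, and the relation between f and F, can be checked numerically for a concrete density. The sketch below assumes the standard normal density purely as an illustration and uses a simple trapezoidal rule; the truncation of the integral at −10 and the step count are arbitrary choices.

```python
import math

def f(x):
    """Standard normal density (chosen here only as an illustration)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate(g, lo, hi, n=20_000):
    """Composite trapezoidal rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * (0.5 * g(lo) + sum(g(lo + i * h) for i in range(1, n)) + 0.5 * g(hi))

def F(x, lower=-10.0):
    """F(x) approximated by the integral of f from `lower` to x (the tail below is negligible)."""
    return integrate(f, lower, x)

total = integrate(f, -10.0, 10.0)
a, eps = 0.0, 1.0
print(total)                                                   # close to 1
print(F(a + eps) - F(a - eps), integrate(f, a - eps, a + eps))  # both close to 0.6827
```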


Next Lecture

The Next Lecture

In the next lecture, we’ll review the idea of a random variable, and then introduce random vectors.

