
INSTITUTO POLITÉCNICO NACIONAL
CENTRO DE INVESTIGACION EN COMPUTACION

Probability, Random Processes and Inference

Dr. Ponciano Jorge Escamilla Ambrosio
[email protected]

http://www.cic.ipn.mx/~pescamilla/

Laboratorio de Ciberseguridad

CIC

Instructor

Dr. Ponciano Jorge Escamilla Ambrosio

[email protected]

http://www.cic.ipn.mx/~pescamilla/

Class meetings

Mondays and Wednesdays 12:00 – 14:00 hrs.

Classroom Aula A3

2

Probability, Random

Processes and Inference

CIC

Course web site:

http://www.cic.ipn.mx/~pescamilla/academy.html

Reader material and homework exercises, etc.

3

Course web site

CIC

The student will learn the fundamentals of

probability theory: probabilistic models, discrete

and continuous random variables, multiple

random variables and limit theorems as well as an

introduction to more advanced topics such as

random processes and statistical inference. At the

end of the course the student will be able to

develop and analyse probabilistic models in a

manner that combines intuitive understanding and

mathematical precision.

4

Course Objective

CIC

5

Course content

1. Probability

1.1. What is Probability?

1.1.1. Statistical Probability

1.1.2. Probability as a Measure of Uncertainty

1.2. Sample Space and Probability

1.2.1. Probabilistic Models

1.2.2. Conditional Probability

1.2.3. Total Probability Theorem and Bayes’ Rule

1.2.4. Independence

1.2.5. Counting

1.2.6. The probabilistic Method

CIC

6

Course content

1.3. Discrete Random Variables

1.3.1. Basic Concepts

1.3.2. Probability Mass Functions

1.3.3. Functions of Random Variables

1.3.4. Expectation and Variance

1.3.5. Joint PMFs of Multiple Random Variables

1.3.6. Conditioning

1.3.7. Independence

CIC

7

Course content

1.4. General Random Variables

1.4.1. Continuous Random Variables and PDFs

1.4.2. Cumulative Distribution Function

1.4.3. Normal Random Variables

1.4.4. Joint PDFs of Multiple Random Variables

1.4.5. Conditioning

1.4.6. The Continuous Bayes’ Rule

1.4.7. The Strong Law of Large Numbers

CIC

8

Course content

2. Introduction to Random Processes

2.1. Markov Chains

2.1.1. Discrete Time Markov Chains

2.1.2. Classification of States

2.1.3. Steady State Behavior

2.1.4. Absorption Probabilities and Expected Time to

Absorption

2.1.5. Continuous Time Markov Chains

2.1.6. Ergodic Theorem for Discrete Markov Chains

2.1.7. Markov Chain Monte Carlo Method

2.1.8. Queuing Theory

CIC

9

Course content

3. Statistics

3.2. Classical Statistical Inference

3.2.1. Classical Parameter Estimation

3.2.2. Linear Regression

3.2.3. Analysis of Variance and Regression

3.2.4. Binary Hypothesis Testing

3.2.5. Significance Testing

CIC

10

Course text books

Joseph Blitzstein and Jessica Hwang. Introduction to Probability, CRC Press, 2014. https://www.crcpress.com/Introduction-to-Probability/Blitzstein-Hwang/9781466575578

Dimitri P. Bertsekas and John N. Tsitsiklis. Introduction to probability, 2nd Edition, Athena Scientific, 2008. http://athenasc.com/probbook.html

CIC

11

Course text books

Géza Schay. Introduction to Probability with Statistical Applications, Birkhauser, Boston, 2007. http://link.springer.com/book/10.1007/978-0-8176-4591-5

William Feller. An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition, Wiley, 1968. http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471257087.html

CIC

Midterm exam 15%

Final exam 15%

Homework assignments 20%

One written departmental exam 50%

12

Grading

CIC

13

Course Schedule A-17
http://www.cic.ipn.mx/~pescamilla/academy.html

CIC

1. What is Probability?

1.1.1. Statistical Probability

1.1.2. Probability as a Measure of Uncertainty

14

Probability

CIC

15

What is Probability?

CIC

In everyday conversation, people try to use the concept of probability to discuss uncertain situations:

Luck, Coincidence, Randomness,

Uncertainty, Risk, Doubt, Fortune, Chance…

Used in a vague, casual way!

A first approach to define probability is in

terms of frequency of occurrence, as a

percentage of success

16

What is Probability?

CIC

For example, if we toss a coin, and observe

whether it lands head (H) or tail (T) up

What is the probability of either result?

Why?

17

What is Probability?

CIC

Example: Flip a coin twice

18

What is Probability?

P(A) = (# favorable outcomes) / (# possible outcomes)
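As a quick illustration (not part of the original slides), the following minimal Python sketch applies this favorable/possible rule to two tosses of a fair coin; the variable names are only illustrative.

```python
# Sketch: the "favorable / possible" rule for two tosses of a fair coin.
from fractions import Fraction
from itertools import product

sample_space = [''.join(toss) for toss in product('HT', repeat=2)]
# ['HH', 'HT', 'TH', 'TT'] -- four equally likely outcomes

at_least_one_head = [s for s in sample_space if 'H' in s]

p = Fraction(len(at_least_one_head), len(sample_space))
print(sample_space, p)   # ['HH', 'HT', 'TH', 'TT'] 3/4
```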

CIC

Definition 1 (Sample space and event).

The sample space S of an experiment is the

set of all possible outcomes of an experiment.

An event A is a subset of the sample space S,

and we say that A occurred if the actual

outcome is in A.

19

Sample space

CIC

Example: the experiment of tossing a coin twice

20

Sample space

CIC

“Probability is a logical framework for

quantifying uncertainty and randomness” [Blitzstein and Hwang, 2014]

“Probability theory is a branch of

mathematics that deals with repetitive events

whose occurrence or nonoccurrence is

subject to chance variation.” [Schay, 2007]

21

What is Probability?

CIC

Provides tools for understanding and

explaining variation, separating signal from

noise, and modeling complex phenomena.

(engineer definition)

22

What is Probability?

CIC

There are situations where the frequency

interpretation is not appropriate

Example: A scholar asserts that the Iliad and

the Odyssey were composed by the same

person, with probability 90%

It is based on the scholar’s subjective belief

23

What is Probability?

CIC

The theory of probability is useful in a broad

variety of contexts and applications:

Statistics, Physics, Biology, Computer

Science, Meteorology, Gambling, Finance,

Political Science, Medicine, Life.

Assignment 1a: Give an example of the

application of probability theory in each area

Assignment 1b: Read the math review: http://projects.iq.harvard.edu/files/stat110/files/math_review_handout.pdf

24

What is Probability?

CIC

25

Probabilistic Model

CIC

The sample space S, which is the set of all

possible outcomes of an experiment.

The probability law, which assigns to a set A of

possible outcomes (also called an event) a

nonnegative number P(A) (called the probability

of A) that encodes our knowledge or belief about

the collective “likelihood” of the elements of A.

The probability law must satisfy certain

properties.

26

Elements of a Probabilistic Model

CIC

The experiment will produce exactly one out of

several possible outcomes.

A subset of the sample space, that is, a collection

of possible outcomes, is called an event.

This means that any collection of possible outcomes, including the entire sample space S and its complement, the empty set ∅, may qualify as an event.

Strictly speaking, however, some sets have to be excluded. In particular, when dealing with probabilistic models involving an uncountably infinite sample space, there are certain unusual subsets with which one cannot associate meaningful probabilities.

27

Experiments and events

CIC

There is no restriction on what constitutes an

experiment.

The events to be considered can be described by

such statements as “a toss of a given coin results

in head,” “a card drawn at random from a regular

52 card deck is an Ace,” or “this book is green.”

Associated with each statement there is a set S of

possibilities, or possible outcomes.

28

Experiments and events

CIC

Examples of experiments and events:

Tossing a Coin. For a coin toss, S may be taken to consist of

two possible outcomes, which we may abbreviate as H and T

for head and tail. We say that H and T are the members,

elements or points of S, and write S = {H, T}.

Tossing two coins but ignoring one of them. In this case S = {HH, HT, TH, TT}. Here, for instance, the outcome

“the first coin shows H” is represented by the set {HH, HT},

that is, this statement is true if we obtain HH or HT and false

if we obtain TH or TT.

29

Experiments and events

CIC

Tossing a Coin Until an H is Obtained. If we toss a coin

until an H is obtained, we cannot say in advance how many

tosses will be required, and so the natural sample space is S =

{H, TH, TTH, TTTH, . . . }, an infinite set. We can use, of

course, many other sample spaces as well, for instance, we

may be interested only in whether we had to toss the coin

more than twice or not, in which case S = {1 or 2, more than

2} is adequate.

Selecting a Number from an Interval. Sometimes, we need

an uncountable set for a sample space. For instance, if the

experiment consists of choosing a random number between 0

and 1, we may use S = {x : 0 < x < 1}.

30

Experiments and events

CIC

Specifies the “likelihood” of any outcome, or of

any set of possible outcomes.

Assigns to every event A, a number P(A), called

the probability of A.

31

The probability law

CIC

Given a sample space S and a certain collection ℱ of its

subsets, called events, an assignment P of a number P(A) to

each event A in ℱ is called a probability measure, and P(A)

the probability of A, if P has the following properties:

1. P(A) ≥ 0 for every A,

2. P(S) = 1, and

3. P(A1 ∪ A2 ∪· · · ) = P(A1)+ P(A2) + ·· · for any finite or

countably infinite set of mutually exclusive events A1, A2, …

Then, the sample space S together with ℱ and P is called a

probability space.

32

Probability Space[Schay 2007]

CIC

33

Probability Axioms[Bertsekas and Tsitsiklis, 2008]

S

P(S) = 1.

CIC

Definition 1.6.1 (General definition of probability). A probability space consists of a sample space S and a probability function P which takes an event A ⊆ S as input and returns P(A), a real number between 0 and 1, as output. The function P must satisfy the following axioms:

1. P(∅) = 0, P(S) = 1.

2. If A1, A2, . . . are disjoint events, then:

P(A1 ∪ A2 ∪ · · ·) = P(A1) + P(A2) + · · ·

(Saying that these events are disjoint means that they are mutually exclusive: Ai ∩ Aj = ∅ for i ≠ j.)

34

Probability Space[Blitzstein and Hwang, 2015]

CIC

The Probability of the Empty Set Is 0. In any

probability space, P(∅) = 0.

Proof:

35

Properties of probabilities

1 = P(S) = P(S ∪ ∅) = P(S) + P(∅) = 1 + P(∅), hence P(∅) = 0.

CIC

The Probability of the Union of Two Events.

For any two events A and B,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof: A ∪ B is the disjoint union of A and B ∩ Ac, so P(A ∪ B) = P(A) + P(B ∩ Ac). Also, B is the disjoint union of A ∩ B and B ∩ Ac, so P(B ∩ Ac) = P(B) − P(A ∩ B). Substituting gives the result.

36

Properties of probabilities

CIC

Probability of Complements. For any event A,

P(Ac) = 1 − P(A)

Proof: Ac ∩ A = ∅ and Ac ∪ A = S by the definition

of Ac. Thus, by Axiom 3, P(S) = P(Ac ∪ A) = P(Ac)

+ P(A). Now, Axiom 2 says that P(S) = 1, and so,

comparing these two values of P(S), we obtain

P(Ac) + P(A) = 1.

37

Properties of probabilities

CIC

Probability of Subsets. If A ⊂ B,

then P(A) ≤ P(B).

Proof:

38

Properties of probabilities

If A ⊂ B, then we can write B as the

union of A and B ∩ Ac, where B ∩ Ac is

the part of B not also in A.

Since A and B ∩ Ac are disjoint, we can

apply the second axiom:

P(B) = P(A∪ (B ∩ Ac)) = P(A) + P(B ∩ Ac)

Probability is nonnegative, so P(B ∩ Ac) ≥ 0, proving that P(B) ≥ P(A).

CIC

Inclusion-exclusion. For any events A1, . . . , An:

P(A1 ∪ · · · ∪ An) = Σi P(Ai) − Σi<j P(Ai ∩ Aj) + Σi<j<k P(Ai ∩ Aj ∩ Ak) − · · · + (−1)^(n+1) P(A1 ∩ · · · ∩ An)
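The following Python sketch (an illustration added here, with arbitrarily chosen events on one die roll) checks the inclusion-exclusion formula against a direct computation of the union.

```python
# Sketch: inclusion-exclusion for three illustrative events on one die roll.
from fractions import Fraction
from itertools import combinations

S = set(range(1, 7))                       # one roll of a fair die
A = [{1, 2, 3}, {2, 4, 6}, {3, 4, 5}]      # illustrative events A1, A2, A3

def P(event):                              # uniform probability law
    return Fraction(len(event), len(S))

# Alternating sum over all non-empty sub-collections of events
rhs = Fraction(0)
for r in range(1, len(A) + 1):
    for c in combinations(A, r):
        rhs += (-1) ** (r + 1) * P(set.intersection(*c))

print(P(set.union(*A)), rhs)               # both equal 1 here, since A1∪A2∪A3 = S
```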

39

Properties of probabilities

CIC

Example:

40

Properties of probabilities

CIC

41

Properties of Probability Laws

CIC

42

Discrete Probability Law

CIC

In the special case where the probabilities P(s1), …, P(sn) are all the same, by necessity equal to 1/n in view of the normalization axiom, we obtain:

P(A) = (number of elements of A) / n

43

Discrete Uniform Probability Law

CIC

44

Discrete Uniform Probability Law

CIC

45

Discrete Uniform Probability Law

CIC

The calculation of probabilities often involves counting the number of outcomes in various events.

When the sample space S has a finite number of equally likely outcomes, the discrete uniform probability law applies and the probability of any event A is given by the first formula below.

When we want to calculate the probability of an event A consisting of a finite number of equally likely outcomes, each of which has an already known probability p, the probability of A is given by the second formula below.

46

Counting

P(A) = (number of elements of A) / (number of elements of S) = k / n

P(A) = p · (number of elements of A)

CIC

In how many ways you can dress today if you

find:

4 shirts

3 ties

2 jackets

in your closet?
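A small Python sketch of the answer (24 outfits); the item names are placeholders.

```python
# Sketch: the basic counting principle for 4 shirts, 3 ties and 2 jackets.
from itertools import product

shirts  = [f"shirt{i}"  for i in range(1, 5)]
ties    = [f"tie{i}"    for i in range(1, 4)]
jackets = [f"jacket{i}" for i in range(1, 3)]

outfits = list(product(shirts, ties, jackets))
print(len(outfits), 4 * 3 * 2)   # 24 24 -- enumeration agrees with n1*n2*n3
```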

47

Basic Counting Principle

CIC

Consider a process that consists of r stages. Suppose

that:

a) There are n1 possible results at the first stage.

b) For every possible result at the first stage, there are n2 possible results at the second stage.

c) More generally, for any sequence of possible results at the first i − 1 stages, there are ni possible results at the ith stage.

Then, the total number of possible results of the r-stage process is:

n1 · n2 · · · nr

48

The Multiplication Principle

CIC

49

The Multiplication Principle

CIC

Example 1. The number of telephone numbers. A

local telephone company number is a 7-digit

sequence, but the first digit has to be different

from 0 or 1. How many distinct telephone

numbers are there?

50

The Multiplication Principle

CIC

Example 2. The number of subsets of an n-

element set. Consider an n-element set

{s1, s2,…, sn}.

How many subsets does it have, including itself and the empty set?

For example, how many subsets does the set {1, 2, 3} have?

51

The Multiplication Principle

CIC

This is a sequential process where we take in turn

each of the n elements and decide whether to

include it in the desired subset or not.

Thus we have n steps, and in each step two choices,

namely yes or no to the question of whether the

element belongs to the desired subset. Therefore the number of subsets is:

2^n

What is this number for n = 1?
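A short Python sketch (illustrative, not from the slides) that lists the subsets of {1, 2, 3} and confirms the 2^n count.

```python
# Sketch: all subsets of {1, 2, 3}, built by choosing how many elements to keep.
from itertools import combinations

def subsets(elements):
    """All subsets of the given collection, from size 0 up to size n."""
    return [set(c) for r in range(len(elements) + 1)
                   for c in combinations(elements, r)]

s = subsets([1, 2, 3])
print(len(s), 2 ** 3)   # 8 8
print(s)                # [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]
```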

52

The Multiplication Principle

CIC

Example 3. Drawing three cards. What is the

number of ways three cards can be drawn one

after the other from a regular 52-card deck

without replacement?

What is this number if we replace each card

before the next one is drawn?

53

Number of subsets

With replacement: n1 = n2 = n3 = 52, so 52^3 = 140,608.

Without replacement: n1 = 52, n2 = 51, n3 = 50, so 52 · 51 · 50 = 132,600.

CIC

Permutations and combinations involve the selection of k objects out of a

collection of n objects.

If the order of selection matters, the selection is

called a permutation.

If the order of selection does not matter, the

selection is called a combination.

54

Permutation and Combination

CIC

k permutations

Assume there are n distinct objects, and let k be

some positive integer with k ≤ n.

We want to count the number of different ways

that we can pick k out of these n objects and

arrange them in a sequence, i.e., the number of distinct k-object sequences.

55

Permutation

CIC

In place 1 we can put n objects, which we can write as

n−1+1;

In place 2 we can put n−1 = n−2+1 objects; and so on.

Thus the kth factor will be n − k + 1, and so, for any two positive integers n and k ≤ n:

n(n − 1)(n − 2) · · · (n − k + 1) = Pn,k

In the special case where k = n:

n(n − 1)(n − 2) · · · 3 · 2 · 1 = n!

In this case the possible sequences are simply called permutations.

56

Permutation

CIC

From the definitions of n!, (n − k)! and Pn,k we

can obtain the following relation:

n! = [n(n − 1)(n − 2) · · · (n − k + 1)][(n − k)(n − k − 1) · · · 2 · 1]

= Pn,k · (n − k)!

and so:

Pn,k = n! / (n − k)!, with 0! = 1.
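For illustration, the following Python sketch compares the product formula, the factorial ratio and brute-force enumeration for a small case (n = 5, k = 3, chosen arbitrarily); math.perm assumes Python 3.8+.

```python
# Sketch: k-permutations of n objects, three equivalent computations.
from itertools import permutations
from math import factorial, perm   # math.perm requires Python 3.8+

n, k = 5, 3
by_product = 1
for i in range(k):
    by_product *= n - i            # n * (n-1) * ... * (n-k+1)

print(by_product,                            # 60
      factorial(n) // factorial(n - k),      # 60
      perm(n, k),                            # 60
      len(list(permutations(range(n), k))))  # 60, by brute force
```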

57

Permutation

CIC

Example 4. Six rolls of a die. Find the probability

that:

Six rolls of a (six sided) die all give different numbers

Assume all outcomes are equally likely

P(all six rolls give different numbers) = ?

58

Probability calculation

P(A) = (number of elements of A) / (number of elements of S) = k / n

P(A) = p · (number of elements of A)

p = probability of each equally likely outcome in A

CIC

Example 4. Six rolls of a die. Find the probability

that:

Six rolls of a (six sided) die all give different numbers

Assume all outcomes are equally likely

P(all six rolls give different numbers) = ?

59

Probability calculation

P(A) = (number of elements of A) / (number of elements of S) = P6,6 / 6^6 = 6! / 6^6

Equivalently, P(A) = p · (number of elements of A) = (1/6^6) · 6!

p = probability of each equally likely outcome in A
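A Python sketch of this computation (added for illustration); the Monte Carlo part is only a sanity check and the number of trials is arbitrary.

```python
# Sketch: Example 4 by the counting formula and by simulation.
import random
from math import factorial

exact = factorial(6) / 6 ** 6                  # P_{6,6} / 6^6
trials = 200_000
hits = sum(len(set(random.randrange(1, 7) for _ in range(6))) == 6
           for _ in range(trials))
print(exact, hits / trials)                    # both close to 0.0154
```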

CIC

Example 5. Dealing Three Cards. In how many

ways can three cards be dealt from a regular deck

of 52 cards?

60

Permutation

CIC

Example 5. Dealing Three Cards. In how many

ways can three cards be dealt from a regular deck

of 52 cards?

P52,3 = 52! / (52 − 3)! = 52 · 51 · 50 = 132,600.

61

Permutation

CIC

Example 6. Birthday problem. There are k people

in a room. Assume each person’s birthday is

equally likely to be any of the 365 days of the

year (we exclude February 29), and that people’s

birthdays are independent (we assume there are

no twins in the room). What is the probability

that two or more people in the group have the

same birthday?

62

Permutation

CIC

This amounts to sampling the 365 days of the year without replacement, so the number of ways to assign distinct birthdays to k people is:

365 · 364 · 363 · · · (365 − k + 1), for k ≤ 365.

Therefore the probability of no birthday match in a group of k people is:

P(no match) = 365 · 364 · · · (365 − k + 1) / 365^k

and the probability of at least one birthday match is 1 − P(no match).

63

Permutation

CIC

64

Permutation

Probability that in a room of k people, at least two were born on the

same day. This probability first exceeds 0.5 when k = 23.
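The curve described above can be reproduced with a short Python sketch (illustrative; it assumes 365 equally likely birthdays, as in the slide):

```python
# Sketch: probability of at least one shared birthday among k people.
def p_match(k, days=365):
    p_no_match = 1.0
    for i in range(k):
        p_no_match *= (days - i) / days   # 365/365 * 364/365 * ...
    return 1 - p_no_match

for k in (10, 22, 23, 50):
    print(k, round(p_match(k), 3))
# 23 is the first k for which the probability exceeds 0.5 (about 0.507)
```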

CIC

The number of possible unordered selections of k different

things out of n different ones is denoted by Cn,k, and each such

selection is called a combination of the given things.

If we select k things out of n without regard to order, then, this

can be done in Cn,k ways.

In each case we have k things which can be ordered k! ways.

Thus, by the multiplication principle, the number of ordered

selections is Cn,k · k!

On the other hand, this number is, by definition, Pn,k . Therefore

Cn,k · k! = Pn,k , and so:

65

Combinations

Cn,k = Pn,k / k! = n! / (k! (n − k)!)

CIC

The quantity on the right-hand side is usually abbreviated by the binomial coefficient symbol, read “n choose k”.

Thus, for any positive integer n and k = 1, 2, . . . , n:

Cn,k = n(n − 1)(n − 2) · · · (n − k + 1) / k! = n! / (k! (n − k)!)

66

Combinations

n ! = [n (n − 1)(n − 2) · · · (n − k + 1)][(n − k)(n − k − 1) · · · 2 · 1]

CIC

67

Combinations

CIC

Binomial coefficient “n choose k” and binomial probabilities

n ≥ 1 independent coin tosses with P(H) = p; P(k heads) = ?

Example: P(HTTTHH) = ?

P(a particular sequence) = ?

P(a particular k-head sequence) = ?
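The standard answers are P(a particular k-head sequence) = p^k (1 − p)^(n − k) and P(k heads) = Cn,k p^k (1 − p)^(n − k); the following Python sketch (illustrative values of n, k, p) checks the formula against enumeration.

```python
# Sketch: binomial probability checked against brute-force enumeration.
from itertools import product
from math import comb   # math.comb requires Python 3.8+

n, k, p = 6, 4, 0.3

formula = comb(n, k) * p ** k * (1 - p) ** (n - k)

# Each particular sequence with k heads has probability p^k (1-p)^(n-k);
# summing over all sequences with exactly k heads gives the same number.
brute = sum(p ** seq.count('H') * (1 - p) ** seq.count('T')
            for seq in map(''.join, product('HT', repeat=n))
            if seq.count('H') == k)
print(formula, brute)   # equal up to floating point
```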

68

Binomial probabilities

CIC

A combination can be seen as a partition of the set into two parts: one part contains k elements and the other contains the remaining n − k elements.

Given an n-element set and nonnegative integers

n1, n2, …, nr, whose sum is equal to n; consider

partitions of the set into r disjoint subsets, with

the ith subset containing exactly ni elements.

In how many ways can this be done?

69

Partitions

CIC

There are C(n, n1) ways of forming the first subset.

Having formed the first subset, there are n − n1 elements left. We need to choose n2 of them in order to form the second subset, and have C(n − n1, n2) choices, and so on.

Thus, using the Counting Principle, the total number of choices is:

C(n, n1) · C(n − n1, n2) · · · C(n − n1 − · · · − nr−1, nr)

70

Partitions

CIC

As several terms cancel, this simplifies to:

n! / (n1! n2! · · · nr!)

This is called the multinomial coefficient and is usually denoted by the symbol “n choose n1, n2, . . . , nr”.

71

Partitions

CIC

72

Partitions

CIC

Example 7. Each person gets an ace. There is a 52-

card deck, dealt (fairly) to four players. What is the

probability of each player getting an ace?

73

Partitions

CIC

Example 7. Each person gets an ace. There is a 52-

card deck, dealt (fairly) to four players. What is the

probability of each player getting an ace?

The size of the sample space is: 52! / (13! 13! 13! 13!)

Constructing an outcome with one ace for each person:

Number of different ways of distributing the 4 aces to the 4 players: 4!

Number of ways of distributing the remaining 48 cards, 12 to each player: 48! / (12! 12! 12! 12!)
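A Python sketch of this computation with exact integer arithmetic (added for illustration; the helper multinomial is not from the slides):

```python
# Sketch: Example 7 with the multinomial counts from the slide.
from fractions import Fraction
from math import factorial

def multinomial(n, parts):
    """n! / (parts[0]! * parts[1]! * ...), computed with exact integers."""
    out = factorial(n)
    for m in parts:
        out //= factorial(m)   # each division here is exact
    return out

sample_space = multinomial(52, [13, 13, 13, 13])              # 52! / (13!^4)
favorable = factorial(4) * multinomial(48, [12, 12, 12, 12])  # 4! * 48! / (12!^4)

p = Fraction(favorable, sample_space)
print(p, float(p))   # 2197/20825, about 0.1055
```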

74

Partitions

CIC

75

Summary of Counting Results

CIC

Conditional probability provides us with a way to

reason about the outcome of an experiment,

based on partial information.

Examples:

A) In an experiment involving two successive rolls of a die,

you are told that the sum of the two rolls is 9. How likely is it that the first roll was a 6?

B) In a word guessing game, the first letter of the word is a

“t”. What is the likelihood that the second letter is an “h”?

76

Conditional Probability

CIC

C) How likely is it that a person has a certain disease given that a medical test was negative?

D) A spot shows up on a radar screen. How likely

is it to correspond to an aircraft?

77

Conditional Probability

CIC

Given:

An experiment

A corresponding sample space

A probability law

We know that the outcome is within some given event

B.

Quantify the likelihood that the outcome also

belongs to some other given event A.

78

Conditional Probability

CIC

Construct a new probability law that takes into

account the available knowledge.

A probability law that for any event A, specifies the

conditional probability of A given B, P(A|B).

The conditional probabilities P(A|B) of

different events A should satisfy the

probability axioms.

79

Conditional Probability

CIC

Example:

Suppose that all six possible outcomes of a fair die

roll are equally likely.

If the outcome is even, then there are only three

possible outcomes: 2, 4 and 6.

What is the probability of the outcome being 6 given

that the outcome is even?

80

Conditional Probability

CIC

If all possible outcomes are equally likely:

P(A|B) = (number of elements of A ∩ B) / (number of elements of B)

Conditional probability definition:

P(A|B) = P(A ∩ B) / P(B), with P(B) > 0.

Out of the total probability of the elements of B, P(A|B) is the fraction that is assigned to possible outcomes that also belong to A.
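A minimal Python sketch (illustrative) applying both forms of the definition to the die example above (A = {6}, B = {2, 4, 6}):

```python
# Sketch: P(outcome is 6 | outcome is even) for one roll of a fair die.
from fractions import Fraction

S = set(range(1, 7))
A = {6}                    # outcome is 6
B = {2, 4, 6}              # outcome is even

p_counting = Fraction(len(A & B), len(B))                           # |A∩B| / |B|
p_ratio = Fraction(len(A & B), len(S)) / Fraction(len(B), len(S))   # P(A∩B) / P(B)
print(p_counting, p_ratio)   # 1/3 1/3
```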

81

Conditional Probability

CIC

Probability law of conditional probabilities

satisfy the three axioms:

1. P(A|B) ≥ 0 for every event A,

2. P(S|B) = 1,

3. P(A1 ∪ A2 ∪ ·· · |B) = P(A1|B)+ P(A2|B) + ·· · for

any finite or countably infinite number of mutually

exclusive events A1, A2, . . . .

82

Conditional Probability

CIC

Proofs:

1. In the definition of P(A|B) the numerator is

nonnegative by Axiom 1, and the denominator is

positive by assumption. Thus, the fraction is

nonnegative.

2. Taking A = S in the definition of P(A|B), we get:

P(S|B) = P(S ∩ B) / P(B) = P(B) / P(B) = 1

83

Conditional Probability

CIC

3. If A1, A2, . . . are mutually exclusive, then so are A1 ∩ B, A2 ∩ B, . . ., and

P(A1 ∪ A2 ∪ · · · |B) = P((A1 ∪ A2 ∪ · · ·) ∩ B) / P(B) = (P(A1 ∩ B) + P(A2 ∩ B) + · · ·) / P(B) = P(A1|B) + P(A2|B) + · · ·

84

Conditional Probability

CIC

85

Conditional Probability

Knowledge that event B has occurred implies that the outcome of the

experiment is in the set B. In computing P(A|B) we can therefore view the

experiment as now having the reduced sample space B. The event A occurs

in the reduced sample space if and only if the outcome ζ is in A ∩ B. The

equation simply renormalizes the probability of events that occur jointly

with B.

CIC

86

Conditional Probability

Suppose that we learn that B occurred. Upon obtaining this information, we get rid

of all the pebbles in Bc because they are incompatible with the knowledge that B

has occurred. Then P(A∩B) is the total mass of the pebbles remaining in A. Finally,

we renormalize, that is, divide all the masses by a constant so that the new total

mass of the remaining pebbles is 1. This is achieved by dividing by P(B), the total

mass of the pebbles in B. The updated mass of the outcomes corresponding to

event A is the conditional probability P(A|B) = P(A∩B)/P(B).

CIC

If we interpret probability as relative frequency, P(A|B) should be the relative frequency of the event A ∩ B in experiments where B occurred.

Suppose that the experiment is performed n times, and suppose that event B occurs nB times and that event A ∩ B occurs nA∩B times. The relative frequency of interest is then:

nA∩B / nB = (nA∩B / n) / (nB / n) ≈ P(A ∩ B) / P(B),

where we have implicitly assumed that P(B) > 0.
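For illustration, a Python sketch that estimates P(A|B) as nA∩B / nB by simulation, reusing the die example (A = outcome is 6, B = outcome is even); the number of trials is arbitrary.

```python
# Sketch: conditional probability as a conditional relative frequency.
import random

trials = 100_000
n_B = n_AB = 0
for _ in range(trials):
    roll = random.randint(1, 6)
    if roll % 2 == 0:          # B occurred
        n_B += 1
        if roll == 6:          # A ∩ B occurred
            n_AB += 1
print(n_AB / n_B)              # close to 1/3
```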

87

Conditional Probability

CIC

Example 1. Given the figure below, obtain

P(A|B)

88

Conditional Probability

CIC

Example 2. A ball is selected from an urn containing two

black balls, numbered 1 and 2, and two white balls, numbered

3 and 4. The number and color of the ball are noted, so the

sample space is {(1,b),(2,b), (3,w), (4,w)}. Assuming that the

four outcomes are equally likely, find P(A|B) and P(A|C),

where A, B, and C are the following events:

89

Conditional Probability

CIC

Example 3. From all families with three children,

we select one family at random. What is the

probability that the children are all boys, if we

know that a) the first one is a boy, and b) at least

one is a boy? (Assume that each child is a boy or

a girl with probability 1/2, independently of each

other.)
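A Python sketch (illustrative) that answers both parts by enumerating the 8 equally likely birth sequences:

```python
# Sketch: Example 3 by enumeration of the sample space of three children.
from fractions import Fraction
from itertools import product

families = list(product('BG', repeat=3))        # ('B','B','B'), ('B','B','G'), ...
all_boys = [f for f in families if set(f) == {'B'}]

first_is_boy   = [f for f in families if f[0] == 'B']
at_least_a_boy = [f for f in families if 'B' in f]

print(Fraction(len(all_boys), len(first_is_boy)))    # a) 1/4
print(Fraction(len(all_boys), len(at_least_a_boy)))  # b) 1/7
```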

90

Conditional Probability

CIC

Example 4. A card is drawn at random from a deck

of 52 cards. What is the probability that it is a King

or a 2, given that it is a face card (J, Q, K)?

91

Conditional Probability

CIC

If we multiply both sides of the definition of

P(A|B) by P(B) we obtain:

P(A ∩ B) = P(A|B) P(B)

Similarly, if we multiply both sides of the

definition of P(B|A) by P(A) we obtain:

P(B ∩ A) = P(B|A) P(A)

92

Total Probability Theorem and

Bayes’ Rule

CIC

Joint Probability of Two Events. For any events

A and B with positive probabilities:

P(A ∩ B) = P(B) P(A|B) = P(A) P(B|A)

Joint Probability of Three Events

P(A∩B∩C) = P(A) P(B|A) P(C|A∩B)

P(A1∩A2∩A3) = P(A1) P(A2|A1) P(A3|A1∩A2)

93

Total Probability Theorem and

Bayes’ Rule

CIC

Applying this rule repeatedly, we can generalise to the intersection of n events:

P(A1 ∩ A2 ∩ · · · ∩ An) = P(A1) P(A2|A1) P(A3|A1 ∩ A2) · · · P(An|A1 ∩ A2 ∩ · · · ∩ An−1)

94

Total Probability Theorem and

Bayes’ Rule

CIC

95

Total Probability Theorem and

Bayes’ Rule

CIC

96

Total Probability Theorem

Total Probability Theorem:

CIC

P(B) = P(A1) P(B|A1) + · · · + P(An) P(B|An)

where the events A1, . . . , An partition the sample space; equivalently, P(B) = P(A1 ∩ B) + · · · + P(An ∩ B).

97

Total Probability Theorem

The probability that B occurs is a

weighted average of its conditional

probability under each scenario,

where each scenario is weighted

according to its (unconditional)

probability.

CIC

98

Total Probability Theorem

CIC

Example 1. Radar detection. If an aircraft is present

in a certain area, a radar detects it and generates an

alarm signal with probability 0.99. If an aircraft is

not present, the radar generates a (false) alarm, with

probability 0.10. We assume that an aircraft is

present with probability 0.05.

What is the probability of no aircraft presence and

false alarm?

What is the probability of aircraft presence and no

detection?
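A Python sketch of these computations (illustrative variable names; the numbers are taken from the statement above):

```python
# Sketch: the radar example via the multiplication rule and total probability.
p_present = 0.05
p_alarm_given_present = 0.99
p_alarm_given_absent  = 0.10

p_absent_and_false_alarm = (1 - p_present) * p_alarm_given_absent   # 0.095
p_present_and_missed     = p_present * (1 - p_alarm_given_present)  # 0.0005

# Total probability theorem: P(alarm), weighted over the two scenarios
p_alarm = (p_present * p_alarm_given_present
           + (1 - p_present) * p_alarm_given_absent)                # 0.1445
print(p_absent_and_false_alarm, p_present_and_missed, p_alarm)
```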

99

Total Probability Theorem

CIC

100

Total Probability Theorem

Sequential

representation

in a tree diagram

CIC

101

Total Probability Theorem

Sequential

Representation in a tree diagram

CIC

Example 2. Picking Balls from Urns. Suppose we

have two urns, with the first one containing 2 white

and 6 black balls, and the second one containing 2

white and 2 black balls. We pick an urn at random,

and then pick a ball from the chosen urn at random.

What is the probability of picking a white ball?

102

Total Probability Theorem

CIC

103

Total Probability Theorem

Tree diagram

What is the probability of picking a black ball?

CIC

Dealing Three Cards. From a deck of 52 cards

three are drawn without replacement.

What is the probability of the event E of getting

two Aces and one King in any order?

Denote the relevant outcomes by A, K and O (for

“other”),

104

Total Probability Theorem

CIC

105

Total Probability Theorem

CIC

106

Total Probability Theorem

CIC

107

Bayes’ Rule

CIC

To verify Bayes’ rule, note that by the definition of conditional probability:

P(Ai|B) = P(Ai ∩ B) / P(B) = P(Ai) P(B|Ai) / P(B)

where P(B) follows from the total probability theorem:

P(B) = P(A1) P(B|A1) + · · · + P(An) P(B|An)

108

Bayes’ Rule

CIC

109

Bayes’ Rule

CIC

110

Bayes’ Rule

CIC

Example 1. Rare disease. A test for a rare disease is assumed

to be correct 95% of the time: if a person has the disease, the

test results are positive with probability 0.95, and if the person

does not have the disease, the results are negative with

probability 0.95. A random person drawn from a certain

population has probability 0.001 of having the disease. Given

that the person just tested positive, what is the probability of

having the disease?

A={“the person has the disease”}

B={“the test results are positive”}

P(A|B)=?
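A Python sketch of the Bayes’ rule computation (illustrative variable names; the numbers are from the statement above):

```python
# Sketch: Bayes' rule for the rare-disease example, with P(B) expanded by
# the total probability theorem.
p_disease = 0.001
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05

p_pos = (p_disease * p_pos_given_disease
         + (1 - p_disease) * p_pos_given_healthy)        # total probability
p_disease_given_pos = p_disease * p_pos_given_disease / p_pos

print(round(p_disease_given_pos, 4))   # ~0.0187: still small despite the positive test
```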

111

Bayes’ Rule

CIC

112

Bayes’ Rule

For a rare disease we need a much more accurate test. The probability of

a false positive result must be of a lower order of magnitude than the

fraction of people with the disease.

CIC

Example 2. Random coin. You have one fair

coin, and one biased coin which lands Heads with

probability 3/4. You pick one of the coins at

random and flip it three times. It lands Heads all

three times. Given this information, what is the

probability that the coin you picked is the fair

one?

113

Bayes’ Rule

CIC

114

Bayes’ Rule

Before flipping the coin, we thought we were equally likely to have picked

the fair coin as the biased coin: P(F) = P(Fc) = 1/2. Upon observing three

Heads, however, it becomes more likely that we’ve chosen the biased coin

than the fair coin, so P(F|A) is only about 0.23.

CIC

Independence of two events. Events A and B are

independent if

P(A ∩ B) = P(A) P(B)

If P(A) > 0 and P(B) > 0, then this is equivalent

to:

P(A|B) = P(A)

and also equivalent to:

P(B|A) = P(B)

115

Independence

CIC

Two events are independent if we can obtain the

probability of their intersection by multiplying

their individual probabilities. Alternatively, A

and B are independent if learning that B occurred

gives us no information that would change our

probabilities for A occurring (and vice versa).

Independence is a symmetric relation: if A is

independent of B, then B is independent of A.

116

Independence

CIC

Independence is completely different

from disjointness. If A and B are

disjoint, then P(A∩B) = 0, so disjoint

events can be independent only if P(A)

= 0 or P(B) = 0. Knowing that A occurs

tells us that B definitely did not occur,

so A clearly conveys information about

B, meaning the two events are not

independent (except if A or B already

has zero probability).

117

Independence

CIC

If A and B are independent, then A and Bc are

independent, Ac and B are independent, and Ac

and Bc are independent.

Proof. Let A and B be independent. Then

P(Bc|A) = 1 − P(B|A) = 1 − P(B) = P(Bc)

so A and Bc are independent. Swapping the roles of A and B, we

have that Ac and B are independent. Using the fact that A, B

independent implies A, Bc independent, with Ac playing the role

of A, we also have that Ac and Bc are independent.

118

Independence

CIC

Independence of three events. Events A, B, and C

are said to be independent if all of the following

equations hold:

P(A ∩ B) = P(A)P(B)

P(A ∩ C) = P(A)P(C)

P(B ∩ C) = P(B)P(C)

P(A ∩ B ∩ C) = P(A)P(B)P(C)

119

Independence

CIC

120

Independence

CIC

Independence of many events. For n events A1,A2, . . . ,

An to be independent, we require any pair to satisfy:

P(Ai ∩ Aj) = P(Ai)P(Aj) (for i ≠ j),

any triplet to satisfy:

P(Ai ∩ Aj ∩ Ak) = P(Ai)P(Aj)P(Ak) (for i, j, k distinct)

And similarly for all quadruplets, quintuplets, and so on.

For infinitely many events, we say that they are

independent if every finite subset of the events is

independent.

121

Independence

CIC

Given an event C, the events A and B are said to

be conditionally independent if:

P(A ∩ B|C) = P(A|C) P(B|C)

122

Conditional independence

CIC

The previous relation states that if C is known to

have occurred, the additional knowledge that B

also occurred does not change the probability of

A.

The independence of two events A and B with

respect to the unconditional probability law, does

not imply conditional independence, and vice

versa.

123

Conditional independence

CIC

Example 2. Reliability.

124

Independence

pi: probability that unit i is “up”

ui: event that the ith unit is up; u1, u2, …, un are independent

fi: event that the ith unit is down; the fi are independent

P(system is up) = ?
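The slide’s system diagram is not reproduced here, so the following Python sketch only illustrates two simple assumed configurations (series and parallel) of n independent units; the p_i values are arbitrary.

```python
# Sketch only: reliability of n independent units, each up with probability p_i,
# under two assumed configurations (the actual diagram is not shown above).
from math import prod   # math.prod requires Python 3.8+

p = [0.9, 0.8, 0.95]                       # illustrative p_i values

# Series: the system is up only if every unit is up.
p_series = prod(p)                         # 0.684

# Parallel: the system is up unless every unit is down.
p_parallel = 1 - prod(1 - pi for pi in p)  # 0.999
print(p_series, p_parallel)
```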