Review of probability calculus

June 11, 2017
Andreas Scheidegger

Eawag: Swiss Federal Institute of Aquatic Science and Technology

Random variables (RV)

“Mathematical machines that generate numbers”

Completely described by the cumulative probability distribution function (cdf) or the probability distribution/density function (pdf). Some properties can be described by measures such as mean, variance, mode, ...

Andreas Scheidegger Univariate Random Variables 1

Probability Distribution/Density Function (pdf)

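[Figure: probability distribution $P_A$ of a discrete RV with outputs $z_1, z_2, \ldots, z_n$, and probability density $f_B$ of a continuous RV along the $z$ axis.]

Discrete RV: Probability to obtain a certain output.

Continuous RV: Proportional to the probability to obtain an output close to a certain value.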


Cumulative Distribution Function (cdf)

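[Figure: cumulative distribution functions $F_A$ (discrete RV) and $F_B$ (continuous RV) along the $z$ axis, both ranging from 0 to 1.]

Discrete and continuous RV: Probability to obtain an output equal to or smaller than a certain value.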


cdf and pdf

Discrete RVs

Distribution function:

$$F_A(z) = P(A \le z)$$

Probability distribution:

$$P_A(z_i) \quad \text{for } z_i \in \Omega_A$$

Continuous RVs

Distribution function:

$$F_B(z) = P(B \le z)$$

Probability density:

$$f_B(z) = \frac{d}{dz} F_B(z)$$

$$P(B \in [z_1, z_2]) = \int_{z_1}^{z_2} f_B(z)\, dz$$

$$P(B \in [z, z + \Delta]) \approx \Delta \cdot f_B(z)$$
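The relations between cdf and pdf can be checked numerically. The following is a minimal sketch in R, assuming a standard normal distribution as the continuous RV B; the interval limits and the value of delta are arbitrary illustration values.

```r
# Standard normal used as an example of a continuous RV B (an assumption)
z1 <- -0.5
z2 <- 1.0

# P(B in [z1, z2]) from the cdf ...
p_cdf <- pnorm(z2) - pnorm(z1)

# ... and by integrating the pdf numerically
p_int <- integrate(dnorm, lower = z1, upper = z2)$value

# Small-interval approximation: P(B in [z, z + delta]) ~ delta * f_B(z)
z <- 0.3
delta <- 0.01
p_exact  <- pnorm(z + delta) - pnorm(z)
p_approx <- delta * dnorm(z)

print(c(p_cdf = p_cdf, p_int = p_int))
print(c(p_exact = p_exact, p_approx = p_approx))
```

The approximation becomes better as delta shrinks, which is why the density is said to be proportional to the probability of an output close to a given value.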


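Characteristics of Random Variables
Measures of Location

Expected value:

$$E[A] = \sum_{z \in \Omega_A} z\, P_A(z), \qquad E[B] = \int_{\Omega_B} z\, f_B(z)\, dz$$

Median:

$$\mathrm{Med}[Z] = Q_{0.5}[Z]: \quad P(Z \le \mathrm{Med}[Z]) = P(Z > \mathrm{Med}[Z]) = 0.5$$

Quantiles:

$$Q_p[Z]: \quad P(Z \le Q_p[Z]) = p \quad \text{and} \quad P(Z > Q_p[Z]) = 1 - p$$

Mode:

$$\mathrm{Mode}[A] = \arg\max_{z_i \in \Omega_A} P_A(z_i), \qquad \mathrm{Mode}[B] = \arg\max_{z \in \Omega_B} f_B(z)$$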
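For a discrete RV these measures reduce to a few vector operations. A small sketch in R, where the outcomes, the probabilities and the helper `quantile_A` are made up for illustration:

```r
# Hypothetical discrete RV A: outcomes and their probabilities (illustrative values)
z <- c(1, 2, 3, 4)
p <- c(0.1, 0.4, 0.3, 0.2)
stopifnot(abs(sum(p) - 1) < 1e-12)

expected <- sum(z * p)        # E[A] = sum of z * P_A(z)
mode_A   <- z[which.max(p)]   # outcome with the largest probability
cdf      <- cumsum(p)         # F_A evaluated at each outcome

# For a discrete RV the condition P(A <= Q_p) = p can usually not be met exactly;
# a common convention (assumed here) takes the smallest outcome with F_A(z) >= p.
quantile_A <- function(prob) z[which(cdf >= prob - 1e-12)[1]]
median_A <- quantile_A(0.5)

print(c(expected = expected, mode = mode_A, median = median_A))
```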


Characteristics of Random Variables
Measures of Location

Expected value of a function of a RV:

$$E[g(A)] = \sum_{z \in \Omega_A} g(z)\, P_A(z)$$

$$E[g(B)] = \int_{\Omega_B} g(z)\, f_B(z)\, dz$$

Attention! In general,

$$E[g(X)] \ne g(E[X])$$
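A quick numerical illustration of this warning, as a sketch in R. The choice X ~ N(1, 1) and g(x) = x^2 is an assumption made only for this example.

```r
# Example choice (assumption): X ~ N(1, 1) and g(x) = x^2
g <- function(x) x^2
mu <- 1
sigma <- 1

# E[g(X)] = integral of g(z) f_X(z) dz, evaluated numerically
E_gX <- integrate(function(z) g(z) * dnorm(z, mean = mu, sd = sigma),
                  lower = -Inf, upper = Inf)$value

# g(E[X]) simply applies g to the mean
g_EX <- g(mu)

print(c(E_gX = E_gX, g_EX = g_EX))
```

Here E[g(X)] equals the variance plus the squared mean, so it exceeds g(E[X]) by exactly the variance of X.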



Characteristics of Random Variables
Measures of Extension

Variance:

$$\mathrm{Var}[Z] = E\left[(Z - E[Z])^2\right]$$

Standard Deviation:

$$\mathrm{SD}[Z] = \sqrt{\mathrm{Var}[Z]}$$

Inter-Quantile Range:

$$\mathrm{QR}_p[Z] = Q_{(1+p)/2}[Z] - Q_{(1-p)/2}[Z]$$


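Characteristics of Random Variables

$$E[aZ + b] = a\, E[Z] + b$$

$$E[Z_1 \pm Z_2] = E[Z_1] \pm E[Z_2]$$

$$\mathrm{Var}[Z] = E[Z^2] - E[Z]^2$$

$$\mathrm{Var}[aZ + b] = a^2\, \mathrm{Var}[Z]$$

In general only if $Z_1$ and $Z_2$ are independent (or at least uncorrelated):

$$\mathrm{Var}[Z_1 \pm Z_2] = \mathrm{Var}[Z_1] + \mathrm{Var}[Z_2]$$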
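The variance identities follow from the linearity of the expected value. A short derivation, where $m = E[Z]$ and $m_i = E[Z_i]$ are shorthand introduced only for this sketch:

```latex
\begin{align*}
\mathrm{Var}[Z] &= E\left[(Z - m)^2\right] = E\left[Z^2 - 2mZ + m^2\right] \\
                &= E[Z^2] - 2m\,E[Z] + m^2 = E[Z^2] - E[Z]^2 \\[1ex]
\mathrm{Var}[Z_1 \pm Z_2] &= E\left[\bigl((Z_1 - m_1) \pm (Z_2 - m_2)\bigr)^2\right] \\
                &= \mathrm{Var}[Z_1] + \mathrm{Var}[Z_2] \pm 2\,E\left[(Z_1 - m_1)(Z_2 - m_2)\right]
\end{align*}
```

The last expectation is the covariance of $Z_1$ and $Z_2$. It vanishes for independent RVs, which is why the variances add for both the sum and the difference.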


Multivariate random variables


Joint distribution

Discrete RV:

$$P_{A,B}(a, b) = P_{A|B}(a|b) \cdot P_B(b) = P_{B|A}(b|a) \cdot P_A(a)$$

E.g.: $P_{A,B}(3, 1)$: Probability to obtain $a_i = 3$ and $b_i = 1$.

Continuous RV:

$$f_{A,B}(a, b) = f_{A|B}(a|b) \cdot f_B(b) = f_{B|A}(b|a) \cdot f_A(a)$$

E.g.: $f_{A,B}(3, 1)$: proportional to the probability to obtain a realization close to 3 and 1.



Conditional Distributions

Discrete RV:

$$P_{A|B}(a|b) = \frac{P_{A,B}(a, b)}{P_B(b)}$$

Continuous RV:

$$f_{A|B}(a|b) = \frac{f_{A,B}(a, b)}{f_B(b)}$$


Marginal distribution

Discrete random variables:

$$P_A(a) = \sum_{b \in \Omega_B} P_{A,B}(a, b)$$

Continuous random variables:

$$f_A(a) = \int_{\Omega_B} f_{A,B}(a, b)\, db$$
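For discrete RVs the joint distribution is just a table, and marginal and conditional distributions are obtained by summing and dividing. A minimal sketch in R with a made-up 2 x 2 joint distribution (all numbers and names are illustrative):

```r
# Hypothetical joint distribution P_{A,B}: rows are values of A, columns values of B
P_AB <- matrix(c(0.10, 0.20,
                 0.30, 0.40),
               nrow = 2, byrow = TRUE,
               dimnames = list(A = c("a1", "a2"), B = c("b1", "b2")))

# Marginals: sum the joint distribution over the other variable
P_A <- rowSums(P_AB)
P_B <- colSums(P_AB)

# Conditional P_{A|B}(a|b) = P_{A,B}(a, b) / P_B(b): divide each column by P_B(b)
P_A_given_B <- sweep(P_AB, 2, P_B, "/")

# A and B are independent only if the joint equals the product of the marginals
independent <- isTRUE(all.equal(P_AB, outer(P_A, P_B), check.attributes = FALSE))

print(P_A)
print(P_A_given_B)
print(independent)
```

Each column of `P_A_given_B` sums to one, because it is a probability distribution over A for a fixed value of B.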



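Independence

Definition:

$$F_{A,B}(a, b) = F_A(a) \cdot F_B(b)$$

For discrete RVs this is equivalent to

$$P_{A,B}(a, b) = P_A(a) \cdot P_B(b)$$

and for continuous RVs to

$$f_{A,B}(a, b) = f_A(a) \cdot f_B(b)$$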


Bayes’ Theorem [1]

Discrete random variables

Because

$$P_{A|B}(a|b)\, P_B(b) = P_{B|A}(b|a)\, P_A(a)$$

we can write

$$P_{A|B}(a|b) = \frac{P_{B|A}(b|a)\, P_A(a)}{P_B(b)} = \frac{P_{B|A}(b|a)\, P_A(a)}{\sum_{a' \in \Omega_A} P_{B|A}(b|a')\, P_A(a')}$$

[1] Bayes’ Theorem as we know it today was actually formulated by P. Laplace in 1774 and not by T. Bayes.
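The discrete form of the theorem translates directly into code. A sketch in R, assuming a made-up prior and conditional distribution; the function name `bayes` is hypothetical.

```r
# Hypothetical inputs: prior P_A and conditional P_{B|A} (each row sums to 1)
P_A <- c(a1 = 0.3, a2 = 0.7)
P_B_given_A <- matrix(c(0.9, 0.1,
                        0.2, 0.8),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(A = c("a1", "a2"), B = c("b1", "b2")))

# Posterior P_{A|B}(. | b) for an observed b
bayes <- function(b) {
  numerator <- P_B_given_A[, b] * P_A   # P_{B|A}(b|a) P_A(a) for every a
  numerator / sum(numerator)            # the denominator is P_B(b)
}

posterior <- bayes("b1")
print(posterior)
```

The denominator is the same sum over all values of A that appears in the formula, so the posterior sums to one by construction.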


Bayes’ Theorem
Continuous random variables

$$f_{A|B}(a|b) = \frac{f_{B|A}(b|a)\, f_A(a)}{f_B(b)} = \frac{f_{B|A}(b|a)\, f_A(a)}{\int f_{B|A}(b|a')\, f_A(a')\, da'}$$


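Characteristics of Random Variables
Dependencies

Variance-Covariance Matrix:

$$\mathrm{Var}[\mathbf{Z}] = E\left[(\mathbf{Z} - E[\mathbf{Z}])(\mathbf{Z} - E[\mathbf{Z}])^T\right]$$

Individual Covariances:

$$\mathrm{Cov}[Z_i, Z_j] = E\left[(Z_i - E[Z_i])(Z_j - E[Z_j])\right] = \mathrm{Var}[\mathbf{Z}]_{i,j}$$

Correlation Matrix:

$$\mathrm{Cor}[\mathbf{Z}]_{i,j} = \frac{\mathrm{Cov}[Z_i, Z_j]}{\sqrt{\mathrm{Var}[Z_i] \cdot \mathrm{Var}[Z_j]}}$$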


Correlation

Correlation measures only linear dependencies!

Figure: Several sets of (x, y) points, with the correlation coefficient of x and y for each set. Source: Wikipedia, http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
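That correlation captures only linear dependence can be seen with two deterministic relationships. A small sketch in R; the data are illustrative and not taken from the figure.

```r
# Illustrative data (assumption): x symmetric around zero, y a deterministic function of x
x <- seq(-1, 1, length.out = 201)
y_linear    <- 2 * x + 1   # perfectly linear dependence
y_quadratic <- x^2         # perfectly dependent, but not linearly

cor_linear    <- cor(x, y_linear)
cor_quadratic <- cor(x, y_quadratic)

print(c(linear = cor_linear, quadratic = cor_quadratic))
```

Both y variables are fully determined by x, yet only the linear one has a correlation of one; for the quadratic one the correlation is zero up to rounding error.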

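Short Notation

Function argument corresponds to RV

$$P_A(a),\ P_{B|A}(b|a) \quad\longleftrightarrow\quad P(a),\ P(b|a)$$

$$f_B(b),\ f_{A|B}(a|b) \quad\longleftrightarrow\quad f(b),\ f(a|b) \ \text{ or } \ p(b),\ p(a|b)$$

Example:

$$f_{X_1|X_2,X_3}(x_1|x_2, x_3) = \frac{f_{X_2|X_1}(x_2|x_1)\, f_{X_1|X_3}(x_1|x_3)}{f_{X_2}(x_2)}$$

$$p(x_1|x_2, x_3) = \frac{p(x_2|x_1)\, p(x_1|x_3)}{p(x_2)}$$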


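Directed Acyclic Graphs

Visualize independence structure of RV

[Figure: directed acyclic graph with nodes A, B, C, D and edges A → B, A → C, B → C, B → D; the nodes are annotated with p(A), p(B | A), p(C | A, B) and p(D | B).]

E.g. A and D are conditionally independent given B. Joint distribution:

$$p(A, B, C, D) = p(A)\, p(B \mid A)\, p(C \mid A, B)\, p(D \mid B)$$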
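The conditional independence of A and D given B can be read off the factorization. A short derivation sketch, using only the sum rule and the definition of a conditional distribution (for continuous C the sum becomes an integral):

```latex
\begin{align*}
p(A, B, D) &= \sum_{C} p(A)\, p(B \mid A)\, p(C \mid A, B)\, p(D \mid B)
            = p(A)\, p(B \mid A)\, p(D \mid B) \\[1ex]
p(A, D \mid B) &= \frac{p(A, B, D)}{p(B)}
            = \frac{p(A)\, p(B \mid A)}{p(B)}\; p(D \mid B)
            = p(A \mid B)\, p(D \mid B)
\end{align*}
```

The first line uses that $p(C \mid A, B)$ sums to one over C; the last step is Bayes' theorem applied to A and B.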



Normal distribution


Central Limit Theorem

Let $X_1, X_2, \ldots$ be independent and identically distributed RVs with mean $\mu$ and a finite variance $\sigma^2$. Further we define $S_n = X_1 + X_2 + \ldots + X_n$, which has mean $n\mu$ and variance $n\sigma^2$. Then the standardized RV

$$Z_n = \frac{S_n - n\mu}{\sqrt{n}\, \sigma}$$

converges in distribution to a standard normal RV for $n \to \infty$.
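The theorem can be explored by simulation. The following sketch in R assumes uniformly distributed summands; this is a choice made here for illustration and not necessarily the distribution behind the lecture's example.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Assumed example: X_i ~ Uniform(0, 1), so mu = 1/2 and sigma^2 = 1/12
n     <- 12
reps  <- 10000
mu    <- 0.5
sigma <- sqrt(1 / 12)

# Each row holds one realization of (X_1, ..., X_n); S_n is the row sum
x   <- matrix(runif(n * reps), nrow = reps, ncol = n)
S_n <- rowSums(x)

# Standardized sum Z_n = (S_n - n mu) / (sqrt(n) sigma)
Z_n <- (S_n - n * mu) / (sqrt(n) * sigma)

# Compare a few empirical quantiles with those of the standard normal
probs <- c(0.05, 0.5, 0.95)
print(rbind(empirical = quantile(Z_n, probs), normal = qnorm(probs)))
```

Changing `n` shows how quickly the distribution of the standardized sum approaches the normal shape.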


Central Limit Theorem Example

[Figure: twelve panels, n = 1 to n = 12, each showing a density over the range −2 to 2.]


Relationships of Univariate Distributions

Figure 1. Univariate distribution relationships.

From: Leemis, L. M. and McQueston, J. T. (2008) Univariate distribution relationships. The American Statistician, 62(1), 45–53. http://www.math.wm.edu/~leemis/2008amstat.pdf

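Multivariate Normal Distribution

Density of a multivariate Normal distribution of dimension $n$ with a mean vector $\boldsymbol{\mu}$ and a variance-covariance matrix $\boldsymbol{\Sigma}$:

$$\mathbf{Z} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

$$f_{N(\boldsymbol{\mu}, \boldsymbol{\Sigma})}(\mathbf{z}) = \frac{1}{(2\pi)^{n/2}}\, \frac{1}{|\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{z} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{z} - \boldsymbol{\mu})\right)$$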


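Multivariate Normal Distribution
Properties

All marginals are normally distributed:

$$\mathbf{Z} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \;\Rightarrow\; Z_i \sim N(\mu_i, \Sigma_{i,i})$$

Linear transformation:

$$\mathbf{Z} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \;\Rightarrow\; \mathbf{A}\mathbf{Z} + \mathbf{b} \sim N(\mathbf{A}\boldsymbol{\mu} + \mathbf{b},\ \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T)$$

Conditional distribution:

$$\mathbf{Z} = \begin{pmatrix} \mathbf{X} \\ \mathbf{Y} \end{pmatrix} \sim N\left( \begin{pmatrix} \boldsymbol{\mu}_X \\ \boldsymbol{\mu}_Y \end{pmatrix},\ \begin{bmatrix} \boldsymbol{\Sigma}_{X,X} & \boldsymbol{\Sigma}_{X,Y} \\ \boldsymbol{\Sigma}_{X,Y}^T & \boldsymbol{\Sigma}_{Y,Y} \end{bmatrix} \right)$$

$$\Rightarrow\; \mathbf{X} \mid (\mathbf{Y} = \mathbf{y}) \sim N\left( \boldsymbol{\mu}_X + \boldsymbol{\Sigma}_{X,Y} \boldsymbol{\Sigma}_{Y,Y}^{-1} (\mathbf{y} - \boldsymbol{\mu}_Y),\ \boldsymbol{\Sigma}_{X,X} - \boldsymbol{\Sigma}_{X,Y} \boldsymbol{\Sigma}_{Y,Y}^{-1} \boldsymbol{\Sigma}_{X,Y}^T \right)$$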
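The conditional distribution only needs a few matrix operations. A minimal sketch in R for a two-dimensional example; all numerical values are made up, and the blocks are stored as matrices so the same lines also work for higher dimensions.

```r
# Hypothetical 2-dimensional example: Z = (X, Y) with scalar X and Y
mu_X <- 1
mu_Y <- 2
Sigma_XX <- matrix(4)     # Var[X]
Sigma_XY <- matrix(1.2)   # Cov[X, Y]
Sigma_YY <- matrix(1)     # Var[Y]

y <- 3  # observed value of Y

# Conditional mean: mu_X + Sigma_XY Sigma_YY^-1 (y - mu_Y)
cond_mean <- mu_X + Sigma_XY %*% solve(Sigma_YY, y - mu_Y)

# Conditional covariance: Sigma_XX - Sigma_XY Sigma_YY^-1 Sigma_XY^T
cond_cov <- Sigma_XX - Sigma_XY %*% solve(Sigma_YY, t(Sigma_XY))

print(c(mean = as.numeric(cond_mean), variance = as.numeric(cond_cov)))
```

Using `solve(Sigma_YY, ...)` instead of an explicit inverse is a common numerical habit. Note that the conditional covariance does not depend on the observed value y.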





Further Generalization

one-dimensional → n-dimensional → what’s next?

Discrete random process

“Random vectors with an infinitely large number of elements”

(0.11, 10.78, -10.24, -3.90, 5.91, ...)
(-1.11, -4.06, -8.64, -0.92, -2.27, ...)
(0.76, -8.54, 0.81, 2.03, 12.9, ...)


Continuous random processes

“Random functions”


What is a Probability?
Interpretation of probabilities

1. The probability for “head” is 1/2.
2. The probability that it rains tomorrow is 30%.

Frequentist

1. The frequency with which “head” occurs if the random experiment is repeated.
2. “Rain tomorrow” is not a repeatable experiment.

Subjective

1. Somebody’s belief that a coin toss results in “head”, given his/her experience.
2. Somebody’s belief that it rains tomorrow, given his/her experience.

Other probability interpretations: → http://www.webcitation.org/6YupVo9zG


Summary

joint = conditional × marginal

$$f(a, b) = f(a|b)\, f(b) = f(b|a)\, f(a)$$

Marginals:

$$f(a) = \int f(a, b)\, db = \int f(a|b)\, f(b)\, db$$

More information in Appendix A.2 – A.5.
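The marginalization integral can be evaluated numerically when the conditional and the marginal density are known. A sketch in R, assuming a simple model chosen only because its marginal is known in closed form:

```r
# Assumed example model: B ~ N(0, 1) and A | B = b ~ N(b, 1)
f_B         <- function(b) dnorm(b, mean = 0, sd = 1)
f_A_given_B <- function(a, b) dnorm(a, mean = b, sd = 1)

# Marginal density f(a) = integral of f(a|b) f(b) db, evaluated numerically
f_A <- function(a) {
  integrate(function(b) f_A_given_B(a, b) * f_B(b),
            lower = -Inf, upper = Inf)$value
}

# For this particular model the marginal is known to be N(0, sqrt(2)),
# which gives a reference value to compare against
a <- 0.7
print(c(numerical = f_A(a), analytical = dnorm(a, mean = 0, sd = sqrt(2))))
```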


Common distributions


Implemented distributions in R

For all distributions four functions are implemented:

d__(x, ...)   pdf evaluated at x
p__(x, ...)   cdf evaluated at x
q__(p, ...)   p-th quantile
r__(n, ...)   sample n random numbers

beta              *beta         binomial            *binom
Cauchy            *cauchy       chi-squared         *chisq
exponential       *exp          F                   *f
gamma             *gamma        geometric           *geom
hypergeometric    *hyper        log-normal          *lnorm
multinomial       *multinom     negative binomial   *nbinom
normal            *norm         Poisson             *pois
Student’s t       *t            uniform             *unif
Weibull           *weibull

Note: for the multinomial distribution base R provides only dmultinom and rmultinom.
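As an illustration of the naming scheme, the four functions for the normal distribution (suffix `norm`) can be used as follows. This is a small sketch; the argument values are arbitrary.

```r
# The four functions for the normal distribution (suffix "norm")
dens  <- dnorm(0, mean = 0, sd = 1)       # pdf evaluated at x = 0
prob  <- pnorm(1.96, mean = 0, sd = 1)    # cdf evaluated at x = 1.96
quant <- qnorm(0.975, mean = 0, sd = 1)   # 0.975-quantile

set.seed(42)                              # only to make the sample reproducible
smpl  <- rnorm(5, mean = 0, sd = 1)       # 5 random numbers

# q__ inverts p__: the cdf evaluated at the p-th quantile returns p
round_trip <- pnorm(quant)

print(c(dens = dens, prob = prob, quant = quant, round_trip = round_trip))
```

The same pattern applies to the other suffixes in the table, with distribution-specific parameters in place of `mean` and `sd`.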


Normal Distribution
Density

$$Z \sim N(\mu, \sigma), \qquad f_{N(\mu,\sigma)}(z) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(z - \mu)^2}{2\sigma^2}\right)$$

[Figure: pdf f and cdf F of the normal distribution with mean = 0, plotted against z from −3 to 3 for sd = 0.1, 0.25, 0.5, 1, 2, 4.]


Normal Distribution
Properties

$$E[N(\mu, \sigma)] = \mathrm{Mode}[N(\mu, \sigma)] = \mathrm{Med}[N(\mu, \sigma)] = \mu$$

$$\mathrm{SD}[N(\mu, \sigma)] = \sigma$$

Central limit theorem: Let $X_1, X_2, \ldots$ be independent and identically distributed RVs with mean $\mu$ and a finite variance $\sigma^2$. Further we define $S_n = X_1 + X_2 + \ldots + X_n$, which has mean $n\mu$ and variance $n\sigma^2$. Then the standardized RV

$$Z_n = \frac{S_n - n\mu}{\sqrt{n}\, \sigma}$$

converges in distribution to a standard normal RV for $n \to \infty$.

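Lognormal Distribution

Definition:

$$Z = \exp(X), \quad X \sim N(m, s)$$

Density:

$$Z \sim LN(\mu, \sigma)$$

$$f_{LN(\mu,\sigma)}(z) = \begin{cases} \dfrac{1}{\sqrt{2\pi}}\, \dfrac{1}{s z} \exp\left(-\dfrac{1}{2}\, \dfrac{\left(\log\left(\frac{z}{\mu}\right) + \frac{s^2}{2}\right)^2}{s^2}\right) & \text{for } z > 0 \\[2ex] 0 & \text{for } z \le 0 \end{cases}$$

with

$$s = \sqrt{\log\left(1 + \frac{\sigma^2}{\mu^2}\right)}$$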


Lognormal Distribution

[Figure: pdf f and cdf F of the lognormal distribution with mean = 1, plotted against z from 0 to 3 for sd = 0.1, 0.25, 0.5, 1, 2, 4.]


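Lognormal Distribution
Properties

$$E[LN(\mu, \sigma)] = \mu$$

$$\mathrm{Mode}[LN(\mu, \sigma)] = \frac{\mu}{\left(1 + \frac{\sigma^2}{\mu^2}\right)^{3/2}}$$

$$\mathrm{Med}[LN(\mu, \sigma)] = \frac{\mu}{\sqrt{1 + \frac{\sigma^2}{\mu^2}}}$$

$$\mathrm{SD}[LN(\mu, \sigma)] = \sigma$$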


Lognormal Distribution
R implementation

Attention: The lognormal distribution in R is defined with m and s (the mean and standard deviation of X)!

The code below computes the arguments if mean µ and standard deviation σ are given:

    ## example values for 'mu' and 'sigma' (assumed here for illustration)
    mu <- 1
    sigma <- 0.5

    ## conversion, 'mu' and 'sigma' given
    meanlog <- log(mu) - 0.5 * log(1 + (sigma / mu)^2)
    sdlog <- sqrt(log(1 + sigma^2 / mu^2))

    ## generate 1000 random samples
    rlnorm(1000, meanlog = meanlog, sdlog = sdlog)
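The conversion can be checked by computing mean, standard deviation and median of the resulting distribution numerically. A sketch in R; the helper `lnorm_pars` is hypothetical and the parameter values are arbitrary.

```r
# Hypothetical helper (not part of R): convert mean/sd of Z to meanlog/sdlog
lnorm_pars <- function(mu, sigma) {
  sdlog   <- sqrt(log(1 + sigma^2 / mu^2))
  meanlog <- log(mu) - 0.5 * sdlog^2
  list(meanlog = meanlog, sdlog = sdlog)
}

mu <- 2       # example values (assumed)
sigma <- 0.5
pars <- lnorm_pars(mu, sigma)

# Mean and sd of Z by numerical integration of z f(z) and z^2 f(z)
f <- function(z) dlnorm(z, meanlog = pars$meanlog, sdlog = pars$sdlog)
mean_Z <- integrate(function(z) z * f(z), lower = 0, upper = Inf)$value
sd_Z   <- sqrt(integrate(function(z) z^2 * f(z), lower = 0, upper = Inf)$value - mean_Z^2)

# Median via the quantile function, to compare with mu / sqrt(1 + sigma^2 / mu^2)
med_Z <- qlnorm(0.5, meanlog = pars$meanlog, sdlog = pars$sdlog)

print(c(mean = mean_Z, sd = sd_Z, median = med_Z))
```

The recovered mean and standard deviation should match the given `mu` and `sigma` up to integration error.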


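χ² Distribution

Definition:

$$Z = \sum_{i=1}^{n} X_i^2, \quad X_i \sim N(0, 1) \ \text{independent}$$

Density:

$$Z \sim \chi^2_n, \qquad f_{\chi^2_n}(z) = \frac{z^{(n-2)/2} \exp(-z/2)}{2^{n/2}\, \Gamma(n/2)} \quad \text{for } z > 0$$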
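Both the density formula and the definition can be checked against R's built-in functions. A minimal sketch; the evaluation points, the degrees of freedom and the sample size are arbitrary choices.

```r
# Density formula from the text, valid for z > 0
chisq_density <- function(z, n) {
  z^((n - 2) / 2) * exp(-z / 2) / (2^(n / 2) * gamma(n / 2))
}

z <- c(0.5, 1, 2, 5, 10)
n <- 3

# Largest deviation from R's built-in density
max_diff <- max(abs(chisq_density(z, n) - dchisq(z, df = n)))

# Definition check by simulation: sum of n squared N(0, 1) samples
set.seed(123)
samples <- rowSums(matrix(rnorm(n * 20000)^2, ncol = n))

print(c(max_diff = max_diff, sample_mean = mean(samples)))
```

The sample mean should be close to n, the expected value of the distribution.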


χ² Distribution

[Figure: pdf f and cdf F of the χ² distribution, plotted against z from 0 to 14 for df = 1, 2, 3, 4, 5, 10.]


χ² Distribution
Properties

$$E[\chi^2_n] = n$$

$$\mathrm{Mode}[\chi^2_n] = n - 2 \quad \text{for } n \ge 2$$

$$\mathrm{SD}[\chi^2_n] = \sqrt{2n}$$


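F Distribution

Definition:

$$Z = \frac{X / n}{Y / m}, \quad X \sim \chi^2_n, \quad Y \sim \chi^2_m \ \text{(independent)}$$

Density:

$$Z \sim F_{n,m}, \qquad f_{F_{n,m}}(z) = \frac{\Gamma\left((n + m)/2\right)\, (n/m)^{n/2}\, z^{(n-2)/2}}{\Gamma(n/2)\, \Gamma(m/2)\, \left(1 + \frac{n}{m} z\right)^{(n+m)/2}} \quad \text{for } z > 0$$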


F Distribution

[Figure: pdf f and cdf F of the F distribution, plotted against z from 0 to 4 for (df1, df2) = (2, 10), (3, 10), (5, 10), (5, 100).]


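F Distribution
Properties

$$E[F_{n,m}] = \frac{m}{m - 2} \quad \text{for } m > 2$$

$$\mathrm{Mode}[F_{n,m}] = \frac{m (n - 2)}{n (m + 2)} \quad \text{for } n > 2$$

$$\mathrm{SD}[F_{n,m}] = \sqrt{\frac{2 m^2 (n + m - 2)}{n (m - 2)^2 (m - 4)}} \quad \text{for } m > 4$$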


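t Distribution

Definition:

$$Z = \frac{X}{\sqrt{Y / n}}, \quad X \sim N(0, 1), \quad Y \sim \chi^2_n$$

Density:

$$Z \sim t_n, \qquad f_{t_n}(z) = \frac{\Gamma\left((n + 1)/2\right)}{\sqrt{\pi n}\; \Gamma(n/2)\, \left(1 + z^2/n\right)^{(n+1)/2}}$$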


t Distribution

[Figure: pdf f and cdf F of the t distribution, plotted against z from −6 to 6 for df = 1, 2, 4, 10, 100.]


t Distribution
Properties

$$E[t_n] = \mathrm{Mode}[t_n] = 0 \quad \text{for } n > 1$$

$$\mathrm{SD}[t_n] = \sqrt{\frac{n}{n - 2}} \quad \text{for } n > 2$$


Uniform Distribution
Density

$$Z \sim U(z_{\min}, z_{\max}), \qquad f_{U(z_{\min}, z_{\max})}(z) = \frac{1}{z_{\max} - z_{\min}} \quad \text{for } z_{\min} \le z \le z_{\max}$$

[Figure: pdf f and cdf F of the uniform distribution with mean = 0, plotted against z from −3 to 3 for max = 1 and max = 2.]


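Uniform Distribution
Properties

$$E[U(z_{\min}, z_{\max})] = \frac{z_{\min} + z_{\max}}{2}$$

$$\mathrm{Med}[U(z_{\min}, z_{\max})] = \frac{z_{\min} + z_{\max}}{2}$$

$$\mathrm{SD}[U(z_{\min}, z_{\max})] = \frac{z_{\max} - z_{\min}}{2\sqrt{3}}$$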