Review of probability calculus June 11, 2017 Andreas Scheidegger Eawag: Swiss Federal Institute of Aquatic Science and Technology


Page 2: Review of probability calculus

Random variables (RV)

“Mathematical machines that generate numbers”

A random variable is completely described by its cumulative distribution function (cdf) or by its probability distribution/density function (pdf).

Some properties can be described by measures such as the mean, variance, mode, ...

Andreas Scheidegger Univariate Random Variables 1

Page 3: Review of probability calculus

Probability Distribution/Density Function (pdf)

[Figure: probability distribution $P_A$ of a discrete RV (values $z_1, z_2, \dots, z_n$) and probability density $f_B$ of a continuous RV (range $z_l$ to $z_r$).]

Discrete RV: probability to obtain a certain output.

Continuous RV: proportional to the probability to obtain an output close to a certain value.

Page 4: Review of probability calculus

Cumulative Distribution Function (cdf)

[Figure: cumulative distribution functions $F_A$ (discrete RV) and $F_B$ (continuous RV), each ranging from 0 to 1.]

Discrete and continuous RV: probability to obtain an output equal to or smaller than a certain value.

Page 5: Review of probability calculus

cdf and pdf

Discrete RVs

Distribution function:

$$F_A(z) = P(A \le z)$$

Probability distribution:

$$P_A(z_i) \quad \text{for } z_i \in \Omega_A$$

Continuous RVs

Distribution function:

$$F_B(z) = P(B \le z)$$

Probability density:

$$f_B(z) = \frac{d}{dz} F_B(z)$$

$$P(B \in [z_1, z_2]) = \int_{z_1}^{z_2} f_B(z)\, dz$$

$$P(B \in [z, z + \Delta]) \approx \Delta \cdot f_B(z)$$
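The relations between cdf and pdf for a continuous RV can be checked numerically in R. This is a minimal sketch, assuming a standard normal RV as the example; the distribution, the interval bounds `z1`, `z2` and the step `delta` are illustrative choices, not taken from the slides.

```r
# Assumed example: B ~ N(0, 1); interval and step size chosen for illustration
z1 <- -0.5
z2 <- 1.0

# P(B in [z1, z2]) from the cdf ...
p_cdf <- pnorm(z2) - pnorm(z1)

# ... and by integrating the pdf numerically
p_int <- integrate(dnorm, lower = z1, upper = z2)$value

# Small-interval approximation: P(B in [z, z + delta]) ~ delta * f_B(z)
delta    <- 1e-3
p_small  <- pnorm(z1 + delta) - pnorm(z1)
p_approx <- delta * dnorm(z1)

print(c(p_cdf = p_cdf, p_int = p_int))
print(c(p_small = p_small, p_approx = p_approx))
```

Both pairs of numbers should agree closely; the second pair agrees better the smaller `delta` is.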

Page 6: Review of probability calculus

Characteristics of Random Variables
Measures of Location

Expected value:

$$E[A] = \sum_{z \in \Omega_A} z\, P_A(z), \qquad E[B] = \int_{\Omega_B} z\, f_B(z)\, dz$$

Median:

$$\mathrm{Med}[Z] = Q_{0.5}[Z], \quad \text{i.e.} \quad P(Z \le \mathrm{Med}[Z]) = P(Z > \mathrm{Med}[Z])$$

Quantiles:

$$Q_p[Z]: \quad P(Z \le Q_p[Z]) = p \quad \text{and} \quad P(Z > Q_p[Z]) = 1 - p$$

Mode:

$$\mathrm{Mode}[A] = \arg\max_{z_i \in \Omega_A} P_A(z_i), \qquad \mathrm{Mode}[B] = \arg\max_{z \in \Omega_B} f_B(z)$$
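As an illustration, the measures of location can be computed directly from the probability distribution of a discrete RV. The sketch below assumes a binomial RV with 10 trials and success probability 0.3, which is an arbitrary example. Note that for a discrete RV the quantile condition can usually not be met exactly; R's `qbinom` returns the smallest value z with F(z) ≥ p.

```r
# Assumed example: A ~ Binomial(size = 10, prob = 0.3)
z <- 0:10                                # sample space Omega_A
P <- dbinom(z, size = 10, prob = 0.3)    # P_A(z)

expected <- sum(z * P)                            # E[A] = sum of z * P_A(z)
mode_A   <- z[which.max(P)]                       # arg max of P_A
median_A <- qbinom(0.5, size = 10, prob = 0.3)    # Q_0.5[A]
q90      <- qbinom(0.9, size = 10, prob = 0.3)    # Q_0.9[A]

print(c(expected = expected, mode = mode_A, median = median_A, q90 = q90))
```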

Page 7: Review of probability calculus

Characteristics of Random Variables
Measures of Location

Expected value of a function of a RV:

$$E[g(A)] = \sum_{z \in \Omega_A} g(z)\, P_A(z)$$

$$E[g(B)] = \int_{\Omega_B} g(z)\, f_B(z)\, dz$$

Attention! In general,

$$E[g(X)] \ne g(E[X])$$
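A small numerical example makes the warning concrete. The sketch assumes X uniformly distributed on [0, 1] and g(x) = x²; both are illustrative choices.

```r
# Assumed example: X ~ U(0, 1) and g(x) = x^2
g <- function(x) x^2

# E[g(X)] = integral of g(z) f_X(z) dz
E_gX <- integrate(function(z) g(z) * dunif(z), lower = 0, upper = 1)$value

# g(E[X]) with E[X] = integral of z f_X(z) dz
E_X  <- integrate(function(z) z * dunif(z), lower = 0, upper = 1)$value
g_EX <- g(E_X)

print(c(E_gX = E_gX, g_EX = g_EX))   # 1/3 versus 1/4
```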


Page 9: Review of probability calculus

Characteristics of Random Variables
Measures of Extension

Variance:

$$\mathrm{Var}[Z] = E\left[(Z - E[Z])^2\right]$$

Standard Deviation:

$$\mathrm{SD}[Z] = \sqrt{\mathrm{Var}[Z]}$$

Inter-Quantile Range:

$$\mathrm{QR}_p[Z] = Q_{(1+p)/2}[Z] - Q_{(1-p)/2}[Z]$$

Page 10: Review of probability calculus

Characteristics of Random Variables

$$E[aZ + b] = a\, E[Z] + b$$

$$E[Z_1 \pm Z_2] = E[Z_1] \pm E[Z_2]$$

$$\mathrm{Var}[Z] = E[Z^2] - E[Z]^2$$

$$\mathrm{Var}[aZ + b] = a^2\, \mathrm{Var}[Z]$$

If $Z_1$ and $Z_2$ are independent (being uncorrelated is sufficient):

$$\mathrm{Var}[Z_1 \pm Z_2] = \mathrm{Var}[Z_1] + \mathrm{Var}[Z_2]$$
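These rules can be verified exactly for a small discrete example. The sketch assumes two independent fair dice; the helper `E` and the constants `a`, `b` are hypothetical names introduced only for this illustration.

```r
# Assumed example: two independent fair dice Z1 and Z2
z <- 1:6
P <- rep(1 / 6, 6)

# Hypothetical helper: E[g(Z)] for one die
E <- function(g) sum(g(z) * P)

# Var[Z] = E[Z^2] - E[Z]^2
var_Z <- E(function(z) z^2) - E(identity)^2

# Var[aZ + b] = a^2 Var[Z]
a <- 3
b <- 2
var_lin <- E(function(z) (a * z + b)^2) - E(function(z) a * z + b)^2

# Var[Z1 - Z2] from the joint distribution P(z1, z2) = P(z1) P(z2)
joint    <- outer(P, P)
diff     <- outer(z, z, "-")
var_diff <- sum(diff^2 * joint) - sum(diff * joint)^2

print(c(var_Z = var_Z, var_lin = var_lin, var_diff = var_diff))
```

Note that the variance of the difference is the sum, not the difference, of the individual variances.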

Page 11: Review of probability calculus

Multivariate random variables


Page 12: Review of probability calculus

Joint distribution

Discrete RV:

$$P_{A,B}(a, b) = P_{A|B}(a|b) \cdot P_B(b) = P_{B|A}(b|a) \cdot P_A(a)$$

E.g. $P_{A,B}(3, 1)$: probability to obtain $a_i = 3$ and $b_i = 1$.

Continuous RV:

$$f_{A,B}(a, b) = f_{A|B}(a|b) \cdot f_B(b) = f_{B|A}(b|a) \cdot f_A(a)$$

E.g. $f_{A,B}(3, 1)$: proportional to the probability to obtain a realization close to 3 and 1.


Page 14: Review of probability calculus

Conditional Distributions

Discrete RV:

$$P_{A|B}(a|b) = \frac{P_{A,B}(a, b)}{P_B(b)}$$

Continuous RV:

$$f_{A|B}(a|b) = \frac{f_{A,B}(a, b)}{f_B(b)}$$

Page 15: Review of probability calculus

Marginal distribution

Discrete random variables:

$$P_A(a) = \sum_{b \in \Omega_B} P_{A,B}(a, b)$$

Continuous random variables:

$$f_A(a) = \int_{\Omega_B} f_{A,B}(a, b)\, db$$
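For discrete RVs, joint, marginal and conditional distributions are simple table operations. The sketch below uses an assumed joint table with made-up probabilities; rows index the values of A and columns the values of B.

```r
# Assumed joint distribution P_{A,B}(a, b): rows are values of A, columns values of B
P_AB <- matrix(c(0.10, 0.20,
                 0.30, 0.15,
                 0.05, 0.20),
               nrow = 3, byrow = TRUE,
               dimnames = list(A = c("1", "2", "3"), B = c("0", "1")))

P_A <- rowSums(P_AB)   # marginal of A: sum over b
P_B <- colSums(P_AB)   # marginal of B: sum over a

# Conditional P_{A|B}(a|b) = P_{A,B}(a, b) / P_B(b): divide each column by P_B(b)
P_A_given_B <- sweep(P_AB, 2, P_B, "/")

# joint = conditional x marginal recovers the original table
P_AB_check <- sweep(P_A_given_B, 2, P_B, "*")

print(P_A)
print(P_A_given_B)
```

Each column of `P_A_given_B` is itself a probability distribution over the values of A and therefore sums to one.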


Page 17: Review of probability calculus

Independence

Definition:

$$F_{A,B}(a, b) = F_A(a) \cdot F_B(b)$$

Discrete random variables:

$$P_{A,B}(a, b) = P_A(a) \cdot P_B(b)$$

Continuous random variables:

$$f_{A,B}(a, b) = f_A(a) \cdot f_B(b)$$
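For discrete RVs the definition suggests a direct check: a joint table describes independent RVs exactly when it equals the outer product of its marginals. The sketch below compares two assumed tables that share the same marginals; `is_independent` is a hypothetical helper written for this illustration.

```r
# Two assumed joint tables with identical marginals
P_A <- c(0.3, 0.7)
P_B <- c(0.4, 0.6)

P_indep <- outer(P_A, P_B)                  # constructed as P_A(a) * P_B(b)
P_dep   <- matrix(c(0.25, 0.05,
                    0.15, 0.55), nrow = 2, byrow = TRUE)

# Hypothetical helper: TRUE if the joint equals the product of its marginals
is_independent <- function(P, tol = 1e-12) {
  all(abs(P - outer(rowSums(P), colSums(P))) < tol)
}

print(is_independent(P_indep))
print(is_independent(P_dep))
```

The example also shows that marginals alone do not determine the joint distribution.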

Page 18: Review of probability calculus

Bayes’ Theorem [1]

Discrete random variables

Because

$$P_{A|B}(a|b)\, P_B(b) = P_{B|A}(b|a)\, P_A(a)$$

we can write

$$P_{A|B}(a|b) = \frac{P_{B|A}(b|a)\, P_A(a)}{P_B(b)} = \frac{P_{B|A}(b|a)\, P_A(a)}{\sum_{a' \in \Omega_A} P_{B|A}(b|a')\, P_A(a')}$$

[1] Bayes’ Theorem as we know it today was actually formulated by P. Laplace in 1774 and not by T. Bayes.
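A worked example of the discrete form, as a minimal sketch: A is an unknown state (sick or healthy) and B a test result. All probabilities are made-up numbers chosen for illustration.

```r
# Assumed example: A = true state, B = test result
P_A <- c(sick = 0.01, healthy = 0.99)             # prior P_A(a)
P_pos_given_A <- c(sick = 0.95, healthy = 0.05)   # P_{B|A}(positive | a)

# Denominator: P_B(positive) = sum over a' of P_{B|A}(positive | a') P_A(a')
P_pos <- sum(P_pos_given_A * P_A)

# Posterior P_{A|B}(a | positive)
P_A_given_pos <- P_pos_given_A * P_A / P_pos
print(P_A_given_pos)
```

Although the assumed test is quite accurate, the posterior probability of being sick stays well below one half because the prior probability is small.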

Page 19: Review of probability calculus

Bayes’ Theorem
Continuous random variables

$$f_{A|B}(a|b) = \frac{f_{B|A}(b|a)\, f_A(a)}{f_B(b)} = \frac{f_{B|A}(b|a)\, f_A(a)}{\int f_{B|A}(b|a')\, f_A(a')\, da'}$$

Page 20: Review of probability calculus

Characteristics of Random Variables
Dependencies

Variance-Covariance Matrix:

$$\mathrm{Var}[\mathbf{Z}] = E\left[(\mathbf{Z} - E[\mathbf{Z}])(\mathbf{Z} - E[\mathbf{Z}])^T\right]$$

Individual Covariances:

$$\mathrm{Cov}[Z_i, Z_j] = E\left[(Z_i - E[Z_i])(Z_j - E[Z_j])\right] = \mathrm{Var}[\mathbf{Z}]_{i,j}$$

Correlation Matrix:

$$\mathrm{Cor}[\mathbf{Z}]_{i,j} = \frac{\mathrm{Cov}[Z_i, Z_j]}{\sqrt{\mathrm{Var}[Z_i] \cdot \mathrm{Var}[Z_j]}}$$

Page 21: Review of probability calculus

Correlation

Correlation measures only linear dependencies!

Figure: Several sets of (x, y) points, with the correlation coefficient of x and y for each set. Source: Wikipedia.
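The point can be reproduced with two deterministic relationships. In this sketch (the data are an assumed toy example) y is a function of x in both cases, yet only the linear relationship shows up in the correlation coefficient.

```r
# Assumed example: y depends on x deterministically in both cases
x <- seq(-1, 1, length.out = 201)   # symmetric around 0

y_lin  <- 2 * x + 1                 # linear dependence
y_quad <- x^2                       # nonlinear dependence

cor_lin  <- cor(x, y_lin)
cor_quad <- cor(x, y_quad)

print(c(cor_lin = cor_lin, cor_quad = cor_quad))
```

A correlation close to zero therefore does not imply independence.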

Page 22: Review of probability calculus

Short Notation
Function argument corresponds to RV

$$P_A(a),\ P_{B|A}(b|a) \quad \longleftrightarrow \quad P(a),\ P(b|a)$$

$$f_B(b),\ f_{A|B}(a|b) \quad \longleftrightarrow \quad f(b),\ f(a|b) \ \text{ or } \ p(b),\ p(a|b)$$

Example:

$$f_{X_1|X_2,X_3}(x_1|x_2, x_3) = \frac{f_{X_2|X_1}(x_2|x_1)\, f_{X_1|X_3}(x_1|x_3)}{f_{X_2}(x_2)}$$

$$p(x_1|x_2, x_3) = \frac{p(x_2|x_1)\, p(x_1|x_3)}{p(x_2)}$$


Page 24: Review of probability calculus

Directed Acyclic Graphs
Visualize independence structure of RV

[Figure: directed acyclic graph with nodes A, B, C and D, annotated with p(A), p(B | A), p(C | A, B) and p(D | B).]

E.g. A and D are conditionally independent given B.

Joint distribution:

$$p(A, B, C, D) = p(A)\, p(B \mid A)\, p(C \mid A, B)\, p(D \mid B)$$
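The factorization can be made concrete with binary variables. The sketch below builds the joint distribution from assumed conditional probability tables (all numbers are made up) and then checks that p(D | A, B) does not depend on A, i.e. that A and D are conditionally independent given B.

```r
# Assumed conditional probability tables for binary variables (values 1 and 2)
p_A    <- c(0.6, 0.4)                                  # p(A)
p_B_A  <- matrix(c(0.7, 0.3,
                   0.2, 0.8), 2, byrow = TRUE)         # p(B | A): rows A, columns B
p_D_B  <- matrix(c(0.9, 0.1,
                   0.5, 0.5), 2, byrow = TRUE)         # p(D | B): rows B, columns D
p_C_AB <- array(c(0.1, 0.4, 0.6, 0.8,                  # p(C = 1 | A, B)
                  0.9, 0.6, 0.4, 0.2),                 # p(C = 2 | A, B)
                dim = c(2, 2, 2))                      # index order [A, B, C]

# Joint p(A, B, C, D) = p(A) p(B | A) p(C | A, B) p(D | B)
joint <- array(0, dim = c(2, 2, 2, 2))
for (a in 1:2) for (b in 1:2) for (cc in 1:2) for (d in 1:2) {
  joint[a, b, cc, d] <- p_A[a] * p_B_A[a, b] * p_C_AB[a, b, cc] * p_D_B[b, d]
}

# p(D | A, B): marginalize over C, then normalize over D for each (A, B)
p_ABD        <- apply(joint, c(1, 2, 4), sum)
p_D_given_AB <- sweep(p_ABD, c(1, 2), apply(p_ABD, c(1, 2), sum), "/")

print(p_D_given_AB[1, , ])   # p(D | A = 1, B)
print(p_D_given_AB[2, , ])   # p(D | A = 2, B)
```

Both printed tables should coincide with `p_D_B`, whatever value A takes.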


Page 26: Review of probability calculus

Normal distribution


Page 27: Review of probability calculus

Central Limit Theorem

Let $X_1, X_2, \dots$ be independent and identically distributed RVs with mean $\mu$ and finite variance $\sigma^2$. Further, we define $S_n = X_1 + X_2 + \dots + X_n$, which has mean $n\mu$ and variance $n\sigma^2$. Then the standardized RV

$$Z_n = \frac{S_n - n\mu}{\sqrt{n}\,\sigma}$$

converges in distribution to a standard normal RV for $n \to \infty$.
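The theorem can be explored by simulation. This is a minimal sketch, assuming uniformly distributed $X_i$ on [0, 1] (so that µ = 1/2 and σ² = 1/12); the choice of distribution, n and the number of replications are illustrative.

```r
set.seed(1)   # for reproducibility

# Assumed example: X_i ~ U(0, 1), so mu = 1/2 and sigma^2 = 1/12
n     <- 12
mu    <- 0.5
sigma <- sqrt(1 / 12)

# Simulate 10000 realizations of S_n and standardize them
S_n <- replicate(10000, sum(runif(n)))
Z_n <- (S_n - n * mu) / (sqrt(n) * sigma)

# Compare a few empirical quantiles of Z_n with the standard normal ones
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
print(rbind(empirical = quantile(Z_n, probs), normal = qnorm(probs)))
```

Repeating the experiment with smaller n, or with a more skewed distribution for the $X_i$, shows how quickly (or slowly) the approximation improves.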

Page 28: Review of probability calculus

Central Limit Theorem Example

[Figure: twelve panels for n = 1, 2, ..., 12, each showing a density on the range −2 to 2 (vertical axis: Density).]

Page 29: Review of probability calculus

Relationships of Univariate Distributions

Figure 1. Univariate distribution relationships.

From: Leemis, L. M. and McQueston, J. T. (2008) Univariate distribution relationships. The American Statistician, 62(1), 45–53.

Page 30: Review of probability calculus

Multivariate Normal Distribution

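Density of a multivariate normal distribution of dimension $n$ with mean vector $\boldsymbol{\mu}$ and variance-covariance matrix $\boldsymbol{\Sigma}$:

$$\mathbf{Z} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

$$f_{N(\boldsymbol{\mu}, \boldsymbol{\Sigma})}(\mathbf{z}) = \frac{1}{(2\pi)^{n/2}} \frac{1}{|\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{z} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{z} - \boldsymbol{\mu})\right)$$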
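The density formula translates almost literally into base R. The function `dmvnorm_sketch` below is a hypothetical helper written for this illustration, not a function from R or from the slides; the parameter values are assumed. As a plausibility check, a diagonal Σ must give the product of univariate normal densities.

```r
# Hypothetical helper implementing the multivariate normal density
dmvnorm_sketch <- function(z, mu, Sigma) {
  n    <- length(mu)
  dev  <- z - mu
  quad <- drop(t(dev) %*% solve(Sigma, dev))   # (z - mu)^T Sigma^{-1} (z - mu)
  exp(-0.5 * quad) / ((2 * pi)^(n / 2) * sqrt(det(Sigma)))
}

# With a diagonal Sigma the components are independent, so the joint density
# must equal the product of the univariate normal densities
mu    <- c(1, -2)
Sigma <- diag(c(4, 0.25))
z     <- c(0.5, -1.5)

f_joint   <- dmvnorm_sketch(z, mu, Sigma)
f_product <- prod(dnorm(z, mean = mu, sd = sqrt(diag(Sigma))))

print(c(f_joint = f_joint, f_product = f_product))
```

Using `solve(Sigma, dev)` avoids forming the inverse matrix explicitly.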

Page 31: Review of probability calculus

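Multivariate Normal Distribution
Properties

All marginals are normally distributed, with mean $\mu_i$ and variance $\Sigma_{i,i}$:

$$\mathbf{Z} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \ \Rightarrow \ Z_i \sim N(\mu_i, \Sigma_{i,i})$$

Linear transformation:

$$\mathbf{Z} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \ \Rightarrow \ \mathbf{A}\mathbf{Z} + \mathbf{b} \sim N(\mathbf{A}\boldsymbol{\mu} + \mathbf{b},\ \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T)$$

Conditional distribution:

$$\mathbf{Z} = \begin{pmatrix} \mathbf{X} \\ \mathbf{Y} \end{pmatrix} \sim N\left( \begin{pmatrix} \boldsymbol{\mu}_X \\ \boldsymbol{\mu}_Y \end{pmatrix}, \begin{bmatrix} \boldsymbol{\Sigma}_{X,X} & \boldsymbol{\Sigma}_{X,Y} \\ \boldsymbol{\Sigma}_{X,Y}^T & \boldsymbol{\Sigma}_{Y,Y} \end{bmatrix} \right)$$

$$\Rightarrow \ \mathbf{X} \mid (\mathbf{Y} = \mathbf{y}) \sim N\left( \boldsymbol{\mu}_X + \boldsymbol{\Sigma}_{X,Y} \boldsymbol{\Sigma}_{Y,Y}^{-1} (\mathbf{y} - \boldsymbol{\mu}_Y),\ \boldsymbol{\Sigma}_{X,X} - \boldsymbol{\Sigma}_{X,Y} \boldsymbol{\Sigma}_{Y,Y}^{-1} \boldsymbol{\Sigma}_{X,Y}^T \right)$$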
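The conditional distribution is easy to evaluate numerically. The sketch below assumes a bivariate example with scalar X and Y and made-up parameter values; the blocks are stored as matrices so that the same lines also work for vector-valued X and Y.

```r
# Assumed bivariate example: Z = (X, Y) with scalar X and Y
mu_X <- 1
mu_Y <- 2
S_XX <- matrix(4)      # Var[X]
S_XY <- matrix(1.2)    # Cov[X, Y]
S_YY <- matrix(1)      # Var[Y]

y <- 3   # observed value of Y

# Conditional mean and covariance of X given Y = y
cond_mean <- mu_X + S_XY %*% solve(S_YY, y - mu_Y)
cond_cov  <- S_XX - S_XY %*% solve(S_YY, t(S_XY))

print(c(cond_mean = drop(cond_mean), cond_cov = drop(cond_cov)))
```

Observing Y shifts the mean of X in the direction of the covariance and never increases its variance.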


Page 34: Review of probability calculus

Further Generalization

one-dimensional

n-dimensional

[Figure with labels A and B.]

What’s next?

Page 35: Review of probability calculus

Discrete random process

“Random vectors with an infinitely large number of elements”

(0.11, 10.78, -10.24, -3.90, 5.91, ...)
(-1.11, -4.06, -8.64, -0.92, -2.27, ...)
(0.76, -8.54, 0.81, 2.03, 12.9, ...)

Page 36: Review of probability calculus

Continuous random processes

“Random functions”

Page 37: Review of probability calculus

What is a Probability?
Interpretation of probabilities

1. The probability for “head” is 1/2.
2. The probability that it rains tomorrow is 30%.

Frequentist

1. The frequency with which “head” occurs if the random experiment is repeated.
2. “Rain tomorrow” is not a repeatable experiment.

Subjective

1. Somebody’s belief that a coin toss results in “head”, given his/her experience.
2. Somebody’s belief that it rains tomorrow, given his/her experience.

Other probability interpretations: → http://www.webcitation.org/6YupVo9zG


Page 40: Review of probability calculus

Summary

joint = conditional × marginal

$$f(a, b) = f(a|b)\, f(b) = f(b|a)\, f(a)$$

Marginals:

$$f(a) = \int f(a, b)\, db = \int f(a|b)\, f(b)\, db$$

More information in Appendix A.2 – A.5.

Page 41: Review of probability calculus

Common distributions


Page 42: Review of probability calculus

Implemented distributions in R

For all distributions four functions are implemented (for the multinomial only d and r are available):

d__(x, ...)   pdf evaluated at x
p__(x, ...)   cdf evaluated at x
q__(p, ...)   p-th quantile
r__(n, ...)   sample n random numbers

beta              *beta        binomial            *binom
Cauchy            *cauchy      chi-squared         *chisq
exponential       *exp         F                   *f
gamma             *gamma       geometric           *geom
hypergeometric    *hyper       log-normal          *lnorm
multinomial       *multinom    negative binomial   *nbinom
normal            *norm        Poisson             *pois
Student’s t       *t           uniform             *unif
Weibull           *weibull
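As a short usage sketch, the four functions for the normal distribution are `dnorm`, `pnorm`, `qnorm` and `rnorm`. The argument values below are arbitrary examples.

```r
set.seed(42)   # for reproducibility of the random sample

d <- dnorm(0, mean = 0, sd = 1)       # pdf at 0, equals 1 / sqrt(2 * pi)
p <- pnorm(1.96, mean = 0, sd = 1)    # cdf at 1.96
q <- qnorm(0.975, mean = 0, sd = 1)   # 97.5% quantile
x <- rnorm(5, mean = 0, sd = 1)       # five random numbers

# The quantile function q__ is the inverse of the cdf p__
p_of_q <- pnorm(q)

print(c(d = d, p = p, q = q, p_of_q = p_of_q))
print(x)
```

The same pattern applies to the other distributions in the table, e.g. `dunif`, `punif`, `qunif` and `runif` for the uniform distribution.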

Page 43: Review of probability calculus

Normal Distribution
Density

$$Z \sim N(\mu, \sigma), \qquad f_{N(\mu,\sigma)}(z) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(z - \mu)^2}{2\sigma^2}\right)$$

[Figure: pdf f(z) and cdf F(z) of normal distributions with mean = 0 and sd = 0.1, 0.25, 0.5, 1, 2, 4, for z from −3 to 3.]

Page 44: Review of probability calculus

Normal Distribution
Properties

$$E[N(\mu, \sigma)] = \mathrm{Mode}[N(\mu, \sigma)] = \mathrm{Med}[N(\mu, \sigma)] = \mu$$

$$\mathrm{SD}[N(\mu, \sigma)] = \sigma$$

Central limit theorem:

Let $X_1, X_2, \dots$ be independent and identically distributed RVs with mean $\mu$ and finite variance $\sigma^2$. Further, we define $S_n = X_1 + X_2 + \dots + X_n$, which has mean $n\mu$ and variance $n\sigma^2$. Then the standardized RV

$$Z_n = \frac{S_n - n\mu}{\sqrt{n}\,\sigma}$$

converges in distribution to a standard normal RV for $n \to \infty$.

Page 45: Review of probability calculus

Lognormal Distribution

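Definition:

$$Z = \exp(X), \qquad X \sim N(m, s)$$

Density:

$$Z \sim LN(\mu, \sigma)$$

$$f_{LN(\mu,\sigma)}(z) = \begin{cases} \dfrac{1}{\sqrt{2\pi}} \dfrac{1}{s z} \exp\left(-\dfrac{1}{2} \dfrac{\left(\log\left(\frac{z}{\mu}\right) + \frac{s^2}{2}\right)^2}{s^2}\right) & \text{for } z > 0 \\[2ex] 0 & \text{for } z \le 0 \end{cases}$$

with

$$s = \sqrt{\log\left(1 + \frac{\sigma^2}{\mu^2}\right)}$$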

Page 46: Review of probability calculus

Lognormal Distribution

[Figure: pdf f(z) and cdf F(z) of lognormal distributions with mean = 1 and sd = 0.1, 0.25, 0.5, 1, 2, 4, for z from 0 to 3.]

Page 47: Review of probability calculus

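Lognormal Distribution
Properties

$$E[LN(\mu, \sigma)] = \mu$$

$$\mathrm{Mode}[LN(\mu, \sigma)] = \frac{\mu}{\left(1 + \frac{\sigma^2}{\mu^2}\right)^{3/2}}$$

$$\mathrm{Med}[LN(\mu, \sigma)] = \frac{\mu}{\sqrt{1 + \frac{\sigma^2}{\mu^2}}}$$

$$\mathrm{SD}[LN(\mu, \sigma)] = \sigma$$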

Page 48: Review of probability calculus

Lognormal Distribution
R implementation

Attention: The lognormal distribution in R is defined with m and s (the mean and standard deviation of X)!

The code below computes the arguments if the mean µ and standard deviation σ are given:

    ## example values (chosen only for illustration)
    mu <- 1
    sigma <- 0.5

    ## conversion, 'mu' and 'sigma' given
    meanlog <- log(mu) - 0.5 * log(1 + (sigma / mu)^2)
    sdlog <- sqrt(log(1 + sigma^2 / mu^2))

    ## generate 1000 random samples
    rlnorm(1000, meanlog = meanlog, sdlog = sdlog)
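The conversion can be checked against the known moments of exp(X) for normal X. This sketch uses the standard closed-form expressions for the mean and variance of a lognormal RV in terms of `meanlog` and `sdlog`; the values of `mu` and `sigma` are assumed examples.

```r
# Assumed example values for the mean and standard deviation of Z
mu    <- 2
sigma <- 0.5

# Conversion to the parameters used by R
meanlog <- log(mu) - 0.5 * log(1 + (sigma / mu)^2)
sdlog   <- sqrt(log(1 + sigma^2 / mu^2))

# Mean and standard deviation of Z = exp(X), X ~ N(meanlog, sdlog)
mean_Z <- exp(meanlog + sdlog^2 / 2)
sd_Z   <- sqrt((exp(sdlog^2) - 1) * exp(2 * meanlog + sdlog^2))

# Median via R's quantile function, compared with mu / sqrt(1 + sigma^2 / mu^2)
med_Z       <- qlnorm(0.5, meanlog = meanlog, sdlog = sdlog)
med_formula <- mu / sqrt(1 + sigma^2 / mu^2)

print(c(mean_Z = mean_Z, sd_Z = sd_Z, med_Z = med_Z, med_formula = med_formula))
```

The recovered mean and standard deviation should equal `mu` and `sigma` up to rounding error.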

Page 49: Review of probability calculus

χ2 Distribution

Definition:

$$Z = \sum_{i=1}^{n} X_i^2, \qquad X_i \sim N(0, 1) \ \text{independent}$$

Density:

$$Z \sim \chi^2_n, \qquad f_{\chi^2_n}(z) = \frac{z^{(n-2)/2} \exp(-z/2)}{2^{n/2}\, \Gamma(n/2)}$$
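The definition can be illustrated by simulation: sums of squared standard normal draws should follow the cdf given by `pchisq`. This is a minimal sketch; the degrees of freedom, sample size and evaluation points are assumed.

```r
set.seed(123)   # for reproducibility

n <- 4                                    # degrees of freedom (assumed example)
X <- matrix(rnorm(20000 * n), ncol = n)   # each row holds n draws from N(0, 1)
Z <- rowSums(X^2)                         # Z = sum of the squared draws

# Empirical cdf of the simulated Z versus pchisq at a few points
z_grid <- c(1, 3, 6, 10)
print(rbind(simulated = ecdf(Z)(z_grid), pchisq = pchisq(z_grid, df = n)))
```

The sample mean of `Z` should also be close to n, the expected value of the distribution.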

Page 50: Review of probability calculus

χ2 Distribution

[Figure: pdf f(z) and cdf F(z) of χ² distributions with df = 1, 2, 3, 4, 5, 10, for z from 0 to 14.]

Page 51: Review of probability calculus

χ2 Distribution
Properties

$$E[\chi^2_n] = n$$

$$\mathrm{Mode}[\chi^2_n] = n - 2 \quad \text{for } n \ge 2$$

$$\mathrm{SD}[\chi^2_n] = \sqrt{2n}$$

Page 52: Review of probability calculus

F Distribution

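Definition:

$$Z = \frac{X/n}{Y/m}, \qquad X \sim \chi^2_n, \quad Y \sim \chi^2_m$$

Density:

$$Z \sim F_{n,m}, \qquad f_{F_{n,m}}(z) = \frac{\Gamma\left((n + m)/2\right) (n/m)^{n/2}\, z^{(n-2)/2}}{\Gamma(n/2)\, \Gamma(m/2)\, (1 + n z/m)^{(n+m)/2}}$$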

Page 53: Review of probability calculus

F Distribution

[Figure: pdf f(z) and cdf F(z) of F distributions with (df1, df2) = (2, 10), (3, 10), (5, 10), (5, 100), for z from 0 to 4.]

Page 54: Review of probability calculus

F Distribution
Properties

$$E[F_{n,m}] = \frac{m}{m - 2} \quad \text{for } m > 2$$

$$\mathrm{Mode}[F_{n,m}] = \frac{m (n - 2)}{n (m + 2)} \quad \text{for } n > 2$$

$$\mathrm{SD}[F_{n,m}] = \sqrt{\frac{2 m^2 (n + m - 2)}{n (m - 2)^2 (m - 4)}} \quad \text{for } m > 4$$

Page 55: Review of probability calculus

t Distribution

Definition:

$$Z = \frac{X}{\sqrt{Y/n}}, \qquad X \sim N(0, 1), \quad Y \sim \chi^2_n$$

Density:

$$Z \sim t_n, \qquad f_{t_n}(z) = \frac{\Gamma\left((n + 1)/2\right)}{\sqrt{\pi n}\; \Gamma(n/2)\, (1 + z^2/n)^{(n+1)/2}}$$

Page 56: Review of probability calculus

t Distribution

[Figure: pdf f(z) and cdf F(z) of t distributions with df = 1, 2, 4, 10, 100, for z from −6 to 6.]

Page 57: Review of probability calculus

t Distribution
Properties

$$E[t_n] = \mathrm{Mode}[t_n] = 0 \quad \text{for } n > 1$$

$$\mathrm{SD}[t_n] = \sqrt{\frac{n}{n - 2}} \quad \text{for } n > 2$$

Page 58: Review of probability calculus

Uniform Distribution
Density

$$Z \sim U(z_{\min}, z_{\max}), \qquad f_{U(z_{\min}, z_{\max})}(z) = \frac{1}{z_{\max} - z_{\min}} \quad \text{for } z_{\min} \le z \le z_{\max}, \ \text{and 0 otherwise}$$

[Figure: pdf f(z) and cdf F(z) of uniform distributions with mean = 0 and max = 1, 2, for z from −3 to 3.]

Page 59: Review of probability calculus

Uniform Distribution
Properties

$$E[U(z_{\min}, z_{\max})] = \frac{z_{\min} + z_{\max}}{2}$$

$$\mathrm{Med}[U(z_{\min}, z_{\max})] = \frac{z_{\min} + z_{\max}}{2}$$

$$\mathrm{SD}[U(z_{\min}, z_{\max})] = \frac{z_{\max} - z_{\min}}{2\sqrt{3}}$$