


Common Probability Distributions

Nathaniel E. Helwig

Associate Professor of Psychology and Statistics, University of Minnesota

August 28, 2020

Copyright © 2020 by Nathaniel E. Helwig



Table of Contents

1. Probability Distribution Background

2. Discrete Distributions

3. Continuous Distributions

4. Central Limit Theorem

5. Affine Transformations of Normal Variables


Probability Distribution Background

Distribution, Mass, and Density Functions

A random variable X has a cumulative distribution function (CDF) F(·), which is a function from the sample space S to the interval [0, 1].

• F(x) = P(X ≤ x) for any given x ∈ S
• 0 ≤ F(x) ≤ 1 for any x ∈ S, and F(a) ≤ F(b) for all a ≤ b

F(·) has an associated function f(·) that is referred to as a probability mass function (PMF) or probability density function (PDF).

• PMF (discrete): f(x) = P(X = x) for all x ∈ S
• PDF (continuous): ∫_a^b f(x) dx = F(b) − F(a) = P(a < X < b)


Page 5: Common Probability Distributions - Statistics

Probability Distribution Background

Parameters and Statistics

A parameter θ = t(F) refers to some function of a probability distribution that is used to characterize the distribution.

• µ = E(X) and σ² = E[(X − µ)²] are parameters

Given a sample of data x = (x1, . . . , xn)⊤, a statistic T = s(x) is some function of the sample of data.

• x̄ = (1/n) ∑_{i=1}^n xi and s² = (1/(n − 1)) ∑_{i=1}^n (xi − x̄)² are statistics

Parametric distributions have a finite number of parameters, which characterize the form of the CDF and PMF (or PDF).

• The parameters define a family of distributions

Discrete Distributions

Bernoulli Distribution

A Bernoulli trial refers to a simple experiment that has two possible outcomes. The two outcomes are x = 0 (failure) and x = 1 (success), and the probability of success is denoted by p = P(X = 1).

• One parameter: p ∈ [0, 1]

• Notation: X ∼ Bern(p) or X ∼ B(1, p)

The Bernoulli distribution has properties:

• PMF: f(x) = 1 − p if x = 0, and f(x) = p if x = 1
• CDF: F(x) = 0 if x < 0; F(x) = 1 − p if 0 ≤ x < 1; and F(x) = 1 if x ≥ 1

• Mean: E(X) = p

• Variance: Var(X) = p(1 − p)


Bernoulli Distribution Example

Suppose we flip a coin and record the outcome as 0 (tails) or 1 (heads).

This experiment is an example of a Bernoulli trial, and the randomvariable X (the outcome of a coin flip) follows a Bernoulli distribution.

Figure 1: Bernoulli PMF with p = 1/2 (left) and p = 3/4 (right).



Binomial Distribution

If X = ∑_{i=1}^n Zi where the Zi are independent and identically distributed Bernoulli trials with probability of success p, then the random variable X follows a binomial distribution.

• Two parameters: n ∈ {1, 2, 3, . . .} and p ∈ [0, 1]

• Notation: X ∼ B(n, p)

The binomial distribution has the properties:

• PMF: f(x) = (n choose x) p^x (1 − p)^(n−x), where (n choose x) = n!/[x!(n − x)!]
• CDF: F(x) = ∑_{i=0}^{⌊x⌋} (n choose i) p^i (1 − p)^(n−i)

• Mean: E(X) = np

• Variance: Var(X) = np(1− p)



Binomial Distribution Example

Suppose that we flip a coin n ≥ 1 independent times, and assume thateach flip Zi has probability of success (i.e., heads) p ∈ [0, 1].

If we define X to be the total number of observed heads, then X = ∑_{i=1}^n Zi follows a binomial distribution with parameters n and p.

Figure 2: Binomial PMF with n ∈ {5, 10} and p = 1/2.



Limiting Distribution of Binomial

As the number of independent trials n → ∞, we have that

(X − np) / √(np(1 − p)) → Z ∼ N(0, 1)

where N(0, 1) denotes a standard normal distribution (later defined).

In other words, the normal distribution is the limiting distribution of the binomial for large n.

Note that the binomial distribution (depicted on the previous slide) looks reasonably bell-shaped for n = 10.



Discrete Uniform Distribution

Suppose that a simple random experiment has possible outcomes x ∈ {a, a + 1, . . . , b − 1, b} where a ≤ b and m = 1 + b − a, and all m possible outcomes are equally likely, i.e., P(X = x) = 1/m.

• Two parameters: the two endpoints a and b (with a < b)

• Notation: X ∼ U{a, b}

The discrete uniform distribution has the properties:

• PMF: f(x) = 1/m

• CDF: F(x) = (1 + ⌊x⌋ − a)/m

• Mean: E(X) = (a+ b)/2

• Variance: Var(X) = [(b − a + 1)² − 1]/12



Discrete Uniform Distribution Example

Suppose we roll a fair die and let x ∈ {1, . . . , 6} denote the number of dots that are observed. The random variable X follows a discrete uniform distribution with a = 1 and b = 6.

Figure 3: Discrete uniform distribution PMF and CDF with a = 1 and b = 6.


Continuous Distributions

Normal Distribution

The normal (or Gaussian) distribution is the most well-known and commonly used probability distribution. The normal distribution is quite important because of the central limit theorem (later defined).

• Two parameters: the mean µ and the variance σ²

• Notation: X ∼ N(µ, σ²)

The standard normal distribution refers to a normal distribution where µ = 0 and σ² = 1, which is typically denoted by Z ∼ N(0, 1).

The normal distribution has the properties:

• PDF: f(x) = (1/(σ√(2π))) exp(−(1/2)((x − µ)/σ)²), where exp(x) = e^x
• CDF: F(x) = Φ((x − µ)/σ), where Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−z²/2} dz

• Mean: E(X) = µ

• Variance: Var(X) = σ²



Normal Distribution Visualizations

The CDF has an elongated “S” shape, which is called an ogive.

Figure 4: Normal distribution PDFs and CDFs with different µ and σ².



Chi-Square Distribution

If X = ∑_{i=1}^k Zi² where the Zi are independent standard normal random variables, then the random variable X follows a chi-square distribution with degrees of freedom k.

• One parameter: the degrees of freedom k

• Notation: X ∼ χ²_k or X ∼ χ²(k)

The chi-square distribution has the properties:

• PDF: f(x) = (1/(2^{k/2} Γ(k/2))) x^{k/2 − 1} e^{−x/2}, where Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt is the gamma function
• CDF: F(x) = (1/Γ(k/2)) γ(k/2, x/2), where γ(u, v) = ∫_0^v t^{u−1} e^{−t} dt is the lower incomplete gamma function

• Mean: E(X) = k

• Variance: Var(X) = 2k



Chi-Square Distribution Visualization

The χ²(k) distribution approaches a normal distribution as k → ∞.

Figure 5: Chi-square distribution PDFs and CDFs with different DoF (k = 1, . . . , 6).



F Distribution

A random variable X has an F distribution if the variable has the form

X = (U/m) / (V/n)

where U ∼ χ²(m) and V ∼ χ²(n) are independent chi-square variables.

• Two parameters: the two degrees of freedom parameters m and n
• Notation: X ∼ F_{m,n} or X ∼ F(m, n)

The F distribution has the properties:

• PDF: f(x) = √[(mx)^m n^n / (mx + n)^(m+n)] / [x B(m/2, n/2)], where B(u, v) = ∫_0^1 t^{u−1}(1 − t)^{v−1} dt is the beta function
• CDF: F(x) = I_{mx/(mx+n)}(m/2, n/2), where Ix(u, v) = B(x; u, v)/B(u, v) and B(x; u, v) = ∫_0^x t^{u−1}(1 − t)^{v−1} dt
• Mean: E(X) = n/(n − 2), assuming that n > 2
• Variance: Var(X) = 2n²(m + n − 2) / [m(n − 2)²(n − 4)], assuming that n > 4



F Distribution Visualization

Note that if X ∼ F(m, n), then X⁻¹ ∼ F(n, m)

Figure 6: F distribution PDFs and CDFs with different DoF.



Continuous Uniform Distribution

Suppose that a simple random experiment has an infinite number of possible outcomes x ∈ [a, b] where −∞ < a < b < ∞, and all possible outcomes are equally likely.

• Two parameters: the two endpoints a and b

• Notation: X ∼ U [a, b] or X ∼ U(a, b) (closed or open interval)

The continuous uniform distribution has the properties:

• PDF: f(x) = 1/(b − a) if x ∈ [a, b] (note that f(x) = 0 otherwise)
• CDF: F(x) = (x − a)/(b − a) if x ∈ [a, b] (note that F(x) = 0 if x < a and F(x) = 1 if x > b)

• Mean: E(X) = (a+ b)/2

• Variance: Var(X) = (b − a)²/12



Continuous Uniform Distribution Visualization

Figure 7: Continuous uniform PDF and CDF with a = 0 and b = 12.


Central Limit Theorem

Let x1, . . . , xn denote an independent and identically distributed (iid) sample of data from some probability distribution F with mean µ and variance σ² < ∞.

The central limit theorem reveals that, as the sample size n → ∞,

√n (x̄n − µ) → N(0, σ²) in distribution,

where x̄n = (1/n) ∑_{i=1}^n xi is the sample mean.

For a large enough n, the sample mean x̄n is approximately normally distributed with mean µ and variance σ²/n, i.e., x̄n ∼̇ N(µ, σ²/n).



Central Limit Theorem Visualization

Figure 8: Top: X ∼ B(1, 1/2). Bottom: X ∼ U[(2 − √12)/4, (2 + √12)/4]. Note that µ = 1/2 and σ² = 1/4 for both distributions. The histograms depict the approximate sampling distribution of x̄n and the lines denote the N(µ, σ²/n) density.


Affine Transformations of Normal Variables

Definition of Affine Transformation

Another reason that the normal distribution is so popular for applied research is the fact that normally distributed variables are convenient to work with. This is because affine transformations of normal variables are also normal variables.

Given a collection of variables X1, . . . , Xp, an affine transformation has the form Y = a + b1X1 + · · · + bpXp, where a is an offset (intercept) term and bj is the weight/coefficient that is applied to the j-th variable.

• Linear transformations are a special case with a = 0



Affine Transformations of Normals

If the random variable X is normally distributed, i.e., if X ∼ N(µ, σ²), then the random variable Y = a + bX is also normally distributed, i.e., Y ∼ N(µY, σY²), where µY = E(Y) = a + bµ and σY² = Var(Y) = b²σ².

If we define Y = a + ∑_{j=1}^p bjXj, then the mean and variance of Y are

µY = E(Y) = a + ∑_{j=1}^p bj µj

σY² = Var(Y) = ∑_{j=1}^p bj² σj² + 2 ∑_{j=2}^p ∑_{k=1}^{j−1} bj bk σjk

where σjk = E[(Xj − µj)(Xk − µk)]. If the collection of variables (X1, . . . , Xp) is multivariate normal, then Y ∼ N(µY, σY²).
