
Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Beta and Gamma Distributions

Prof. Nicholas Zabaras

School of Engineering

University of Warwick

Coventry CV4 7AL

United Kingdom

Email: [email protected]

URL: http://www.zabaras.com/

August 7, 2014


Contents

Beta distribution, Gamma function, normalization of the Beta distribution, Beta as a prior to Bernoulli, posterior and predictive distributions

A frequentist view of Bayesian learning, variance decomposition

Gamma distribution

Exponential distribution

Chi-squared distribution

Inverse Gamma distribution

The Pareto distribution

• Following closely Chris Bishop’s PRML book, Chapter 2

• Kevin Murphy’s Machine Learning: A Probabilistic Perspective, Chapter 2


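Beta Distribution

The Beta(a,b) distribution with $x \in [0,1]$ and $a, b > 0$ is defined as follows:

$$\mathrm{Beta}(x \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, x^{a-1}(1-x)^{b-1} = \frac{x^{a-1}(1-x)^{b-1}}{\mathrm{beta}(a,b)}$$

Here $\mathrm{beta}(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the normalizing factor.

The expected value, mode and variance of a Beta random variable x with (hyper-)parameters a and b are:

$$\mathbb{E}[x] = \frac{a}{a+b}, \qquad \mathrm{mode}[x] = \frac{a-1}{a+b-2}, \qquad \mathrm{var}[x] = \frac{ab}{(a+b)^2(a+b+1)}$$

The mode formula applies when a, b > 1.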
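As a quick numerical sanity check of the Beta mean, mode and variance formulas, the sketch below evaluates the density with Python's standard `math.gamma` and compares the closed forms against a crude midpoint-rule integration. The function names and the values a = 2, b = 3 are illustrative assumptions, not part of the course material (which points to Matlab/PMTK).

```python
import math


def beta_pdf(x, a, b):
    """Beta(x | a, b) for 0 < x < 1, normalized with gamma functions."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


def beta_mean(a, b):
    return a / (a + b)


def beta_mode(a, b):
    # Interior mode; this formula assumes a > 1 and b > 1.
    return (a - 1) / (a + b - 2)


def beta_var(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))


def midpoint_integral(f, n=20000):
    """Crude midpoint rule on (0, 1); adequate for smooth integrands only."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


a, b = 2.0, 3.0  # illustrative values
mean_num = midpoint_integral(lambda x: x * beta_pdf(x, a, b))
second_moment = midpoint_integral(lambda x: x * x * beta_pdf(x, a, b))
var_num = second_moment - mean_num ** 2

print(mean_num, beta_mean(a, b))   # both close to 0.4
print(var_num, beta_var(a, b))     # both close to 0.04
print(beta_mode(a, b))             # interior mode, 1/3 for these values
```

The midpoint rule is used only because it needs no external library; any quadrature routine would serve the same purpose.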


Beta Distribution

If a=b=1, we obtain a uniform distribution.

If a and b are both less than 1, we get a bimodal distribution with spikes at 0 and 1.

If a and b are both greater than 1, the distribution is unimodal.

[Figure: beta distributions on [0,1] for (a=0.1, b=0.1), (a=1.0, b=1.0), (a=2.0, b=3.0) and (a=8.0, b=4.0), annotated with the cases a=b=1, a,b<1 and a,b>1. Run betaPlotDemo from PMTK.]


Gamma Function

Recall the Beta density:

$$\mathrm{Beta}(x \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, x^{a-1}(1-x)^{b-1}$$

The gamma function extends the factorial to real numbers:

$$\Gamma(x) = \int_0^\infty u^{x-1} e^{-u}\, du$$

With integration by parts:

$$\Gamma(x+1) = x\,\Gamma(x)$$

For integer n:

$$\Gamma(n) = (n-1)!$$
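The three gamma-function properties (integral definition, recurrence Γ(x+1) = xΓ(x), and Γ(n) = (n−1)! for integers) can be checked numerically. This is a minimal sketch using Python's built-in `math.gamma`; the helper `gamma_by_quadrature` and the test point x = 2.5 are assumptions made for illustration.

```python
import math


def gamma_by_quadrature(x, upper=60.0, n=200000):
    """Midpoint-rule estimate of the integral of u^(x-1) e^(-u) over (0, upper).

    Assumes x >= 1 so the integrand is bounded at 0; the tail beyond
    `upper` is ignored because e^(-u) is negligible there.
    """
    h = upper / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += u ** (x - 1) * math.exp(-u)
    return h * total


x = 2.5  # illustrative non-integer argument
print(gamma_by_quadrature(x), math.gamma(x))   # integral definition
print(math.gamma(x + 1), x * math.gamma(x))    # recurrence Gamma(x+1) = x Gamma(x)
print([math.gamma(n) for n in range(1, 6)])    # 0!, 1!, 2!, 3!, 4!
```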


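Beta Distribution: Normalization

Showing that the Beta(a,b) distribution is normalized correctly is a bit tricky. We need to prove that:

$$\Gamma(a)\,\Gamma(b) = \Gamma(a+b) \int_0^1 \mu^{a-1}(1-\mu)^{b-1}\, d\mu$$

Indeed we follow the steps: (a) change the variable y to t = y + x; (b) change the order of integration in the shaded triangular region; and (c) change x to μ via x = tμ:

$$\begin{aligned}
\Gamma(a)\,\Gamma(b) &= \int_0^\infty e^{-x} x^{a-1}\, dx \int_0^\infty e^{-y} y^{b-1}\, dy
= \int_0^\infty x^{a-1} \left\{ \int_x^\infty e^{-t} (t-x)^{b-1}\, dt \right\} dx \\
&= \int_0^\infty e^{-t} \left\{ \int_0^t x^{a-1} (t-x)^{b-1}\, dx \right\} dt
= \int_0^\infty e^{-t}\, t^{a-1}\, t^{b-1}\, t\, dt \int_0^1 \mu^{a-1}(1-\mu)^{b-1}\, d\mu \\
&= \Gamma(a+b) \int_0^1 \mu^{a-1}(1-\mu)^{b-1}\, d\mu
\end{aligned}$$

[Figure: the shaded triangular integration region in the (x, t) plane, bounded by the line t = x.]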
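The normalization identity Γ(a)Γ(b) = Γ(a+b) ∫ μ^(a−1) (1−μ)^(b−1) dμ can also be checked numerically for a few parameter pairs. The sketch below is only a spot check under the assumption a, b ≥ 1 (so the integrand stays bounded); the helper name and the chosen pairs are illustrative.

```python
import math


def beta_integral(a, b, n=100000):
    """Midpoint-rule estimate of the integral of mu^(a-1) (1-mu)^(b-1) over (0, 1).

    Assumes a >= 1 and b >= 1 so the integrand has no endpoint singularity.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        mu = (i + 0.5) * h
        total += mu ** (a - 1) * (1 - mu) ** (b - 1)
    return h * total


for a, b in [(2.0, 3.0), (8.0, 4.0), (2.5, 1.5)]:  # illustrative parameter pairs
    lhs = math.gamma(a) * math.gamma(b)
    rhs = math.gamma(a + b) * beta_integral(a, b)
    print(a, b, lhs, rhs)  # the two columns should agree closely
```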


Beta Distribution

[Figure: four panels plotting the pdf against x on [0,1] for Beta(0.1,0.1), Beta(1,1), Beta(2,3) and Beta(8,4). See Matlab implementation.]


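Posterior Distribution

Assuming a Bernoulli likelihood and a Beta prior, with m the number of observations of x=1 among the N data points in D,

$$p(\mathcal{D} \mid \mu) \propto \mu^{m}(1-\mu)^{N-m}, \qquad \mathrm{Beta}(\mu \mid a, b) \propto \mu^{a-1}(1-\mu)^{b-1},$$

we derive the posterior as:

$$p(\mu \mid \mathcal{D}, a, b) \propto \mu^{m+a-1}(1-\mu)^{N-m+b-1}$$

This is also a Beta distribution:

$$p(\mu \mid \mathcal{D}, a, b) = \mathrm{Beta}(\mu \mid a_N, b_N), \qquad a_N = a + m, \quad b_N = b + N - m$$

a and b are the effective number of observations of x=1 and x=0, respectively, introduced by the prior (they don’t have to be integers).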
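The conjugate update amounts to adding counts to the prior parameters: a_N = a + m and b_N = b + N − m. A minimal sketch follows; the function name `beta_bernoulli_update` and the coin-flip data are made up for illustration.

```python
def beta_bernoulli_update(a, b, data):
    """Return (a_N, b_N) for a Beta(a, b) prior and a sequence of 0/1 outcomes."""
    m = sum(data)           # number of observations with x = 1
    n = len(data)           # total number of observations N
    return a + m, b + (n - m)


# One observed head with a Beta(2, 2) prior gives Beta(3, 2).
print(beta_bernoulli_update(2, 2, [1]))

# Updating one observation at a time matches the batch update.
flips = [1, 0, 1, 1, 0]     # made-up data for illustration
a_seq, b_seq = 2, 2
for x in flips:
    a_seq, b_seq = beta_bernoulli_update(a_seq, b_seq, [x])
print((a_seq, b_seq), beta_bernoulli_update(2, 2, flips))
```

Because only the counts enter the update, the order of the observations does not matter.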


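Posterior Mean and Variance

From the properties of the Beta distribution, we compute:

$$\mathbb{E}[\mu \mid \mathcal{D}] = \frac{a_N}{a_N + b_N} = \frac{a+m}{a+b+N}, \qquad \mathrm{var}[\mu \mid \mathcal{D}] = \frac{a_N b_N}{(a_N+b_N)^2 (a_N+b_N+1)}$$

The posterior mean always lies in between the prior mean and the MLE estimate:

$$\text{prior mean} = \frac{a}{a+b}, \qquad \text{posterior mean} = \frac{a+m}{a+b+N}, \qquad \text{MLE} = \frac{m}{N}$$

This can be shown easily by noticing that:

$$\frac{a+m}{a+b+N} = \lambda\, \frac{a}{a+b} + (1-\lambda)\, \frac{m}{N}, \qquad \lambda = \frac{a+b}{a+b+N}, \quad 0 \le \lambda \le 1$$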
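To make the "posterior mean lies between the prior mean and the MLE" statement concrete, the sketch below writes the posterior mean as a weighted average with weight λ = (a+b)/(a+b+N) on the prior mean. The numbers (a = b = 2, m = 7, N = 10) and the function name are illustrative assumptions.

```python
def posterior_mean_weights(a, b, m, n):
    """Return (lam, prior_mean, mle, posterior_mean) for the Beta-Bernoulli model."""
    lam = (a + b) / (a + b + n)        # weight on the prior mean
    prior_mean = a / (a + b)
    mle = m / n                        # requires n > 0
    posterior_mean = (a + m) / (a + b + n)
    return lam, prior_mean, mle, posterior_mean


lam, prior_mean, mle, post_mean = posterior_mean_weights(a=2, b=2, m=7, n=10)
blend = lam * prior_mean + (1 - lam) * mle
print(prior_mean, post_mean, mle)   # posterior mean sits between 0.5 and 0.7
print(abs(blend - post_mean))       # zero up to rounding
```

As N grows, λ shrinks toward 0 and the posterior mean moves toward the MLE.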


Posterior Distribution

For example, starting from a Beta(2,2) prior and observing a single head (N = m = 1), the posterior is Beta(3,2).

[Figure: three panels plotting the pdf against x on [0,1]: Prior Beta(2,2); Likelihood Function (N=m=1); Posterior: Beta(3,2).]


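Predictive Distribution

We can now compute the probability that the next coin flip is heads. With the posterior

$$p(\mu \mid \mathcal{D}, a, b) \propto \mu^{m+a-1}(1-\mu)^{N-m+b-1},$$

the predictive probability is

$$\begin{aligned}
p(x=1 \mid \mathcal{D}, a, b) &= \int_0^1 p(x=1 \mid \mu)\, p(\mu \mid \mathcal{D}, a, b)\, d\mu
= \int_0^1 \mu\, p(\mu \mid \mathcal{D}, a, b)\, d\mu \\
&= \int_0^1 \mu\, \mathrm{Beta}(\mu \mid a_N, b_N)\, d\mu
= \mathbb{E}[\mu \mid \mathcal{D}, a, b] = \frac{a_N}{a_N + b_N}
\end{aligned}$$

which is the posterior mean.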
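The predictive probability of heads is the posterior mean a_N/(a_N + b_N). One way to see the marginalization at work is to simulate it: draw μ from the posterior, then flip a coin with that bias. The sketch below does this with the standard-library `random.betavariate`; the function names, the seed, and the Beta(2,2) prior with one observed head are illustrative assumptions.

```python
import random


def predictive_prob_heads(a, b, m, n):
    """Closed form p(x=1 | D, a, b) = a_N / (a_N + b_N)."""
    a_n, b_n = a + m, b + (n - m)
    return a_n / (a_n + b_n)


def predictive_by_sampling(a, b, m, n, draws=200000, seed=0):
    """Monte Carlo version: draw mu from the posterior, then flip a coin with bias mu."""
    rng = random.Random(seed)
    a_n, b_n = a + m, b + (n - m)
    heads = 0
    for _ in range(draws):
        mu = rng.betavariate(a_n, b_n)
        if rng.random() < mu:
            heads += 1
    return heads / draws


exact = predictive_prob_heads(2, 2, 1, 1)     # Beta(3, 2) posterior, so 3/5
approx = predictive_by_sampling(2, 2, 1, 1)
print(exact, approx)                          # the estimate should be close to 0.6
```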


Properties of the Posterior Distribution

Consider the case of infinite data (N→∞):

$$a_N = a + m \to m, \qquad b_N = b + N - m \to N - m$$

and the posterior mean and variance become:

$$\mathbb{E}[\mu \mid \mathcal{D}] = \frac{a_N}{a_N + b_N} \to \frac{m}{m + N - m} = \frac{m}{N}$$

$$\mathrm{var}[\mu \mid \mathcal{D}] = \frac{a_N b_N}{(a_N+b_N)^2 (a_N+b_N+1)} \to \frac{m(N-m)}{(m+N-m)^2 (m+N-m+1)} = \frac{m(N-m)}{N^2 (N+1)} \to 0$$

For N→∞, the distribution as expected spikes around the MLE estimate with zero variance (i.e. the uncertainty decreases as N→∞). Is this a general property?
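The large-N behaviour can be tabulated directly: hold the observed fraction of heads fixed and let N grow. In the sketch below the prior Beta(2,2), the fraction 0.7 and the function name are assumptions chosen for illustration.

```python
def posterior_summary(a, b, m, n):
    """Posterior mean and variance of mu under Beta(a + m, b + n - m)."""
    a_n, b_n = a + m, b + (n - m)
    s = a_n + b_n
    return a_n / s, a_n * b_n / (s ** 2 * (s + 1))


a, b = 2.0, 2.0                       # illustrative prior
results = []
for n in [10, 100, 1000, 10000, 100000]:
    m = (7 * n) // 10                 # keep the observed fraction of heads at 0.7
    mean, var = posterior_summary(a, b, m, n)
    results.append((n, mean, var))
    print(n, mean, var)               # mean approaches 0.7, variance shrinks roughly like 1/N
```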


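A Frequentist View of Bayesian Learning

Consider inference of a parameter θ using data D. We expect that because the posterior p(θ|D) incorporates the information from the data D, it will imply less variability for θ than the prior p(θ).

We have the following identities:

$$\mathbb{E}[\theta] = \mathbb{E}_{\mathcal{D}}\big[\mathbb{E}[\theta \mid \mathcal{D}]\big]$$

$$\mathrm{var}[\theta] = \mathbb{E}_{\mathcal{D}}\big[\mathrm{var}[\theta \mid \mathcal{D}]\big] + \mathrm{var}_{\mathcal{D}}\big[\mathbb{E}[\theta \mid \mathcal{D}]\big]$$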


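A Frequentist View of Bayesian Learning

$$\mathbb{E}[\theta] = \mathbb{E}_{\mathcal{D}}\big[\mathbb{E}[\theta \mid \mathcal{D}]\big]$$

This means that on average over the realizations of the data D, the conditional expectation E[θ|D] is equal to E[θ].

$$\mathrm{var}[\theta] = \mathbb{E}_{\mathcal{D}}\big[\mathrm{var}[\theta \mid \mathcal{D}]\big] + \mathrm{var}_{\mathcal{D}}\big[\mathbb{E}[\theta \mid \mathcal{D}]\big]$$

Also, the posterior variance is on average smaller than the prior variance, by an amount $\mathrm{var}_{\mathcal{D}}\big[\mathbb{E}[\theta \mid \mathcal{D}]\big]$ that depends on the variation of the posterior means over the distribution of possible data.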


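Posterior Mean

Note the not-surprising result regarding the posterior mean:

$$\mathbb{E}[\theta \mid \mathcal{D}] = \int \theta\, p(\theta \mid \mathcal{D})\, d\theta$$

$$\mathbb{E}_{\mathcal{D}}\big[\mathbb{E}[\theta \mid \mathcal{D}]\big] = \iint \theta\, p(\theta \mid \mathcal{D})\, p(\mathcal{D})\, d\theta\, d\mathcal{D} = \iint \theta\, p(\theta, \mathcal{D})\, d\theta\, d\mathcal{D} = \int \theta\, p(\theta)\, d\theta$$

$$\underbrace{\mathbb{E}[\theta]}_{\text{prior mean}} = \underbrace{\mathbb{E}_{\mathcal{D}}\big[\mathbb{E}[\theta \mid \mathcal{D}]\big]}_{\text{posterior mean averaged over the data}}$$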


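Variance Decomposition Identity

If θ and D are two scalar random variables then we have:

$$\mathrm{var}[\theta] = \mathbb{E}_{\mathcal{D}}\big[\mathrm{var}[\theta \mid \mathcal{D}]\big] + \mathrm{var}_{\mathcal{D}}\big[\mathbb{E}[\theta \mid \mathcal{D}]\big]$$

Here is the proof:

$$\begin{aligned}
\mathrm{var}[\theta] &= \mathbb{E}[\theta^2] - \big(\mathbb{E}[\theta]\big)^2 \\
&= \mathbb{E}_{\mathcal{D}}\big[\mathbb{E}[\theta^2 \mid \mathcal{D}]\big] - \Big(\mathbb{E}_{\mathcal{D}}\big[\mathbb{E}[\theta \mid \mathcal{D}]\big]\Big)^2 \\
&= \mathbb{E}_{\mathcal{D}}\Big[\mathbb{E}[\theta^2 \mid \mathcal{D}] - \big(\mathbb{E}[\theta \mid \mathcal{D}]\big)^2\Big] + \mathbb{E}_{\mathcal{D}}\Big[\big(\mathbb{E}[\theta \mid \mathcal{D}]\big)^2\Big] - \Big(\mathbb{E}_{\mathcal{D}}\big[\mathbb{E}[\theta \mid \mathcal{D}]\big]\Big)^2 \\
&= \mathbb{E}_{\mathcal{D}}\big[\mathrm{var}[\theta \mid \mathcal{D}]\big] + \mathrm{var}_{\mathcal{D}}\big[\mathbb{E}[\theta \mid \mathcal{D}]\big]
\end{aligned}$$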


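Posterior Variability

We can derive a similar expression regarding the posterior variance:

$$\underbrace{\mathrm{var}[\theta]}_{\text{prior variance}} = \underbrace{\mathbb{E}_{\mathcal{D}}\big[\mathrm{var}[\theta \mid \mathcal{D}]\big]}_{\text{posterior variance averaged over all data}} + \mathrm{var}_{\mathcal{D}}\big[\mathbb{E}[\theta \mid \mathcal{D}]\big] \;\ge\; \mathbb{E}_{\mathcal{D}}\big[\mathrm{var}[\theta \mid \mathcal{D}]\big]$$

Thus on average (over the data), the variability in θ decreases. For a particular observed data set D, it is however possible that

$$\mathrm{var}[\theta \mid \mathcal{D}] > \mathrm{var}[\theta]$$

These results implicitly assume that the data follow the marginal distribution:

$$m(\mathcal{D}) = \int p(\mathcal{D} \mid \theta)\, p(\theta)\, d\theta$$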
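For the Beta-Bernoulli model both identities can be verified exactly, because the posterior depends on the data only through the count m of ones, and m follows a beta-binomial distribution under the marginal of the data. The sketch below is one way to set this up; the prior Beta(1,5), the sample size N = 10 and all function names are illustrative assumptions.

```python
import math


def log_beta_fn(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_mean(a, b):
    return a / (a + b)


def beta_var(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))


def marginal_pmf(m, n, a, b):
    """Probability of m ones in n Bernoulli trials after integrating out theta ~ Beta(a, b)."""
    return math.comb(n, m) * math.exp(log_beta_fn(a + m, b + n - m) - log_beta_fn(a, b))


def decompose(a, b, n):
    """Return (E_D[E[theta|D]], E_D[var[theta|D]], var_D[E[theta|D]])."""
    counts = range(n + 1)
    probs = [marginal_pmf(m, n, a, b) for m in counts]
    means = [beta_mean(a + m, b + n - m) for m in counts]
    variances = [beta_var(a + m, b + n - m) for m in counts]
    avg_mean = sum(p * mu for p, mu in zip(probs, means))
    avg_var = sum(p * v for p, v in zip(probs, variances))
    var_of_mean = sum(p * (mu - avg_mean) ** 2 for p, mu in zip(probs, means))
    return avg_mean, avg_var, var_of_mean


a, b, n = 1.0, 5.0, 10                          # illustrative prior and sample size
avg_mean, avg_var, var_of_mean = decompose(a, b, n)
print(avg_mean, beta_mean(a, b))                # prior mean recovered
print(avg_var + var_of_mean, beta_var(a, b))    # prior variance recovered
# A single data set can still increase the variance: one head under Beta(1, 5).
print(beta_var(a + 1, b), beta_var(a, b))
```

The last line illustrates the caveat about particular data sets: observing one head under a Beta(1,5) prior gives Beta(2,5), whose variance exceeds the prior variance.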


Gamma Distribution

The Gamma distribution is a two-parameter family of continuous distributions. It has a scale parameter θ>0 and a shape parameter k>0. If k is an integer then the distribution represents the sum of k independent exponentially distributed random variables, each of which has a mean of θ (which is equivalent to a rate parameter of θ⁻¹).

$$X \sim \mathrm{Gamma}(k, \theta): \qquad p(x) = \frac{x^{k-1} \exp(-x/\theta)}{\Gamma(k)\, \theta^{k}}, \qquad x \ge 0, \quad k, \theta > 0$$

More often, we use the rate parametrization:

$$\mathrm{Gamma}(x \mid a, b) = \frac{b^{a}}{\Gamma(a)}\, x^{a-1} \exp(-xb), \qquad x > 0, \quad a, b > 0$$
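The two parametrizations describe the same density when a = k and b = 1/θ, and for integer k a Gamma variable can be simulated as a sum of k exponentials with mean θ. The following sketch checks both statements; the function names, the seed and the values k = 3, θ = 2 are assumptions for illustration. Note that `random.expovariate` takes the rate 1/θ.

```python
import math
import random


def gamma_pdf_scale(x, k, theta):
    """Gamma density with shape k and scale theta."""
    return x ** (k - 1) * math.exp(-x / theta) / (math.gamma(k) * theta ** k)


def gamma_pdf_rate(x, a, b):
    """Gamma density with shape a and rate b."""
    return b ** a / math.gamma(a) * x ** (a - 1) * math.exp(-b * x)


k, theta = 3, 2.0                                   # illustrative values
print(gamma_pdf_scale(1.3, k, theta), gamma_pdf_rate(1.3, k, 1.0 / theta))

rng = random.Random(0)
# Sum of k independent exponentials, each with mean theta (rate 1 / theta).
samples = [sum(rng.expovariate(1.0 / theta) for _ in range(k)) for _ in range(100000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(sample_mean, k * theta)                       # near 6
print(sample_var, k * theta ** 2)                   # near 12
```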


Gamma Distribution - Rate Parametrization

It is frequently a model for waiting times.

It is more often parameterized in terms of a shape parameter a = k and an inverse scale parameter b = 1/θ, called a rate parameter:

$$p(x \mid a, b) = \frac{b^{a}}{\Gamma(a)}\, x^{a-1} e^{-bx}, \qquad x > 0, \quad a, b > 0, \qquad \Gamma(a) = \int_0^\infty u^{a-1} e^{-u}\, du$$

The mean, mode and variance with this parametrization are:

$$\mathbb{E}[x] = \frac{a}{b}, \qquad \mathrm{mode}[x] = \begin{cases} \dfrac{a-1}{b}, & \text{for } a \ge 1 \\ 0, & \text{otherwise} \end{cases}, \qquad \mathrm{var}[x] = \frac{a}{b^{2}}$$
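The closed-form mean and mode in the rate parametrization can be compared against a brute-force evaluation of the density on a grid. This is only a rough check; the grid spacing, the values a = 3, b = 2 and the function names are illustrative assumptions.

```python
import math


def gamma_pdf(x, a, b):
    """Gamma(x | a, b) with shape a and rate b, for x > 0."""
    return b ** a / math.gamma(a) * x ** (a - 1) * math.exp(-b * x)


def gamma_mean(a, b):
    return a / b


def gamma_mode(a, b):
    return (a - 1) / b if a >= 1 else 0.0


def gamma_var(a, b):
    return a / b ** 2


a, b = 3.0, 2.0                                  # illustrative values
h = 0.001
grid = [i * h for i in range(1, 20001)]          # x in (0, 20]
mode_grid = max(grid, key=lambda x: gamma_pdf(x, a, b))
mean_num = h * sum(x * gamma_pdf(x, a, b) for x in grid)
print(mode_grid, gamma_mode(a, b))               # both near 1.0
print(mean_num, gamma_mean(a, b))                # both near 1.5
print(gamma_var(a, b))                           # 0.75
```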


Gamma Distribution

Plots of

$$\mathrm{Gamma}(x \mid a, b) = \frac{b^{a}}{\Gamma(a)}\, x^{a-1} \exp(-xb), \qquad b = 1$$

As we increase the rate b, the horizontal scale 1/b shrinks, so the distribution squeezes leftwards and upwards.

[Figure: Gamma distributions for (a=1.0, b=1.0), (a=1.5, b=1.0) and (a=2.0, b=1.0). Run gammaPlotDemo from PMTK.]


Gamma Distribution

An empirical PDF of rainfall data fitted with a Gamma distribution.

[Figure: two panels showing the empirical PDF of the rainfall data with the fitted Gamma density, one labelled MoM (method of moments) and one labelled MLE (maximum likelihood). Run the MatLab function gammaRainfallDemo from PMTK.]


Exponential Distribution

This is defined as

$$\mathrm{Expon}(x \mid \lambda) = \mathrm{Gamma}(x \mid 1, \lambda) = \lambda \exp(-\lambda x), \qquad x \ge 0, \quad \lambda > 0$$

Here λ is the rate parameter.

This distribution describes the times between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate λ.
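The link to the Poisson process can be illustrated by simulation: generate exponential inter-arrival times with rate λ and count how many events fall in a long window; the count per unit time should be close to λ. The sketch below also confirms that the exponential density equals the Gamma density with shape 1. The rate λ = 3, the window length and the function names are illustrative assumptions.

```python
import math
import random


def expon_pdf(x, lam):
    """Exponential density with rate lam, for x >= 0."""
    return lam * math.exp(-lam * x)


def gamma_pdf(x, a, b):
    """Gamma density with shape a and rate b."""
    return b ** a / math.gamma(a) * x ** (a - 1) * math.exp(-b * x)


lam = 3.0                                        # illustrative rate
print(expon_pdf(0.4, lam), gamma_pdf(0.4, 1.0, lam))   # identical values

rng = random.Random(1)
window = 10000.0                                 # length of the observation window
t, count = 0.0, 0
while True:
    t += rng.expovariate(lam)                    # waiting time until the next event
    if t > window:
        break
    count += 1
rate_estimate = count / window
print(rate_estimate)                             # close to lam
```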


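Chi-Squared Distribution

This is defined as

$$\chi^{2}(x \mid \nu) = \mathrm{Gamma}\!\left(x \,\Big|\, \frac{\nu}{2}, \frac{1}{2}\right) = \frac{1}{2^{\nu/2}\, \Gamma(\nu/2)}\, x^{\frac{\nu}{2}-1} \exp\!\left(-\frac{x}{2}\right), \qquad x \ge 0$$

This is the distribution of the sum of squared Gaussian random variables.

More precisely, let $Z_i \sim \mathcal{N}(0,1)$ and $S = \sum_{i=1}^{\nu} Z_i^{2}$. Then $S \sim \chi^{2}_{\nu}$.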
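A short simulation makes the sum-of-squares characterization tangible: summing ν squared standard normal draws should give samples whose mean and variance match those of Gamma(ν/2, 1/2), namely ν and 2ν. The sketch below also checks the density identity at one point. The value ν = 4, the seed and the function names are illustrative assumptions.

```python
import math
import random


def gamma_pdf(x, a, b):
    """Gamma density with shape a and rate b."""
    return b ** a / math.gamma(a) * x ** (a - 1) * math.exp(-b * x)


def chi2_pdf(x, nu):
    """Chi-squared density written out directly, for x > 0."""
    half = nu / 2.0
    return x ** (half - 1) * math.exp(-x / 2.0) / (2.0 ** half * math.gamma(half))


nu = 4                                            # illustrative degrees of freedom
print(chi2_pdf(3.0, nu), gamma_pdf(3.0, nu / 2.0, 0.5))   # same density value

rng = random.Random(0)
draws = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu)) for _ in range(50000)]
mean_s = sum(draws) / len(draws)
var_s = sum((s - mean_s) ** 2 for s in draws) / len(draws)
print(mean_s, var_s)                              # near nu and 2 * nu
```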


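Inverse Gamma Distribution

This is defined as follows:

$$\text{If } X \sim \mathrm{Gamma}(X \mid a, b) \text{ then } X^{-1} \sim \mathrm{InvGamma}(X \mid a, b)$$

where:

$$\mathrm{InvGamma}(x \mid a, b) = \frac{b^{a}}{\Gamma(a)}\, x^{-(a+1)} \exp(-b/x), \qquad x > 0$$

a is the shape parameter and b the scale parameter.

It can be shown that:

$$\text{Mean} = \frac{b}{a-1} \ (\text{exists for } a > 1), \qquad \text{Mode} = \frac{b}{a+1}, \qquad \mathrm{var} = \frac{b^{2}}{(a-1)^{2}(a-2)} \ (\text{exists for } a > 2)$$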
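The relation between the Gamma and inverse Gamma densities follows from a change of variables y = 1/x, which multiplies the Gamma density evaluated at 1/y by 1/y². The sketch below checks this at one point and compares the sample mean of 1/X with b/(a−1). The values a = 5, b = 2, the seed and the function names are illustrative assumptions; note that `random.gammavariate` takes shape and scale, so the rate b is passed as 1/b.

```python
import math
import random


def gamma_pdf(x, a, b):
    """Gamma density with shape a and rate b."""
    return b ** a / math.gamma(a) * x ** (a - 1) * math.exp(-b * x)


def inv_gamma_pdf(x, a, b):
    """InvGamma(x | a, b) with shape a and scale b, for x > 0."""
    return b ** a / math.gamma(a) * x ** (-(a + 1)) * math.exp(-b / x)


a, b = 5.0, 2.0                                   # illustrative values (a > 2)
y = 0.7
# Change of variables y = 1/x: density of y is Gamma(1/y | a, b) * |dx/dy| = Gamma(1/y | a, b) / y^2.
transformed = gamma_pdf(1.0 / y, a, b) / y ** 2
print(transformed, inv_gamma_pdf(y, a, b))        # should agree

rng = random.Random(0)
# random.gammavariate takes shape and SCALE, so the rate b enters as 1 / b.
samples = [1.0 / rng.gammavariate(a, 1.0 / b) for _ in range(100000)]
mean_s = sum(samples) / len(samples)
print(mean_s, b / (a - 1))                        # both near 0.5
```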


The Pareto Distribution

Used to model the distribution of quantities that exhibit long tails (heavy tails):

$$\mathrm{Pareto}(x \mid k, m) = k\, m^{k}\, x^{-(k+1)}\, \mathbb{I}(x \ge m)$$

This density asserts that x must be greater than some constant m, but not too much greater; k controls what is “too much”.

As k → ∞, the distribution approaches δ(x − m).

On a log-log scale, the pdf forms a straight line, of the form log p(x) = a log x + c for some constants a and c (power law, Zipf’s law).
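The straight-line behaviour on a log-log scale follows from taking logs of the density: for x ≥ m, log p(x) = −(k+1) log x + log(k mᵏ), so the slope is a = −(k+1) and the intercept is c = log(k mᵏ). The sketch below recovers both from two evaluations of the density; the values k = 2, m = 1, the evaluation points and the function name are illustrative assumptions.

```python
import math


def pareto_pdf(x, k, m):
    """Pareto(x | k, m) = k m^k x^-(k+1) for x >= m, and 0 otherwise."""
    if x < m:
        return 0.0
    return k * m ** k * x ** (-(k + 1))


k, m = 2.0, 1.0                      # illustrative shape and scale
x1, x2 = 2.0, 50.0                   # any two points with x >= m
slope = (math.log(pareto_pdf(x2, k, m)) - math.log(pareto_pdf(x1, k, m))) / (
    math.log(x2) - math.log(x1)
)
intercept = math.log(pareto_pdf(x1, k, m)) - slope * math.log(x1)
print(slope, -(k + 1))                       # slope a = -(k + 1)
print(intercept, math.log(k * m ** k))       # intercept c = log(k m^k)
```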


The Pareto Distribution

Applications: Modeling the frequency of words vs their rank, distribution of wealth (k = Pareto index), etc.

$$\mathrm{Pareto}(x \mid k, m) = k\, m^{k}\, x^{-(k+1)}\, \mathbb{I}(x \ge m)$$

$$\text{Mean} = \frac{km}{k-1} \ (\text{if } k > 1), \qquad \text{Mode} = m, \qquad \mathrm{var} = \frac{m^{2} k}{(k-1)^{2}(k-2)} \ (\text{if } k > 2)$$

[Figure: Pareto distribution for (m=0.01, k=0.10), (m=0.00, k=0.50) and (m=1.00, k=1.00). ParetoPlot from PMTK.]
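The Pareto mean formula can be checked by sampling. The sketch below uses the standard-library `random.paretovariate`, which draws from a Pareto with minimum 1, and rescales by m. The values k = 5, m = 2 (chosen so that the mean and variance both exist), the seed and the function names are illustrative assumptions; sample estimates of heavy-tailed quantities converge slowly, so the comparison is only approximate.

```python
import random


def pareto_mean(k, m):
    return k * m / (k - 1)                          # requires k > 1


def pareto_var(k, m):
    return m ** 2 * k / ((k - 1) ** 2 * (k - 2))    # requires k > 2


k, m = 5.0, 2.0                                     # illustrative values
rng = random.Random(0)
# random.paretovariate(alpha) samples a Pareto with shape alpha and minimum 1;
# multiplying by m moves the minimum to m.
samples = [m * rng.paretovariate(k) for _ in range(200000)]
mean_s = sum(samples) / len(samples)
var_s = sum((s - mean_s) ** 2 for s in samples) / len(samples)
print(mean_s, pareto_mean(k, m))                    # both near 2.5
print(var_s, pareto_var(k, m))                      # roughly 0.42, noisier than the mean
print(min(samples) >= m)                            # the mode m is also the lower bound
```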