Conjugacy Print


  • 8/12/2019 Conjugacy Print

    1/28

    Conjugate Models

    Patrick Lam


    Outline

Conjugate Models
  What is Conjugacy?
  The Beta-Binomial Model

The Normal Model
  Normal Model with Unknown Mean, Known Variance
  Normal Model with Known Mean, Unknown Variance


    Conjugacy

Suppose we have a Bayesian model with a likelihood p(y | \theta) and a prior p(\theta).

If we multiply our likelihood and prior, we get our posterior p(\theta | y) up to a constant of proportionality.

If our posterior is a distribution of the same family as our prior, then we have conjugacy. We say that the prior is conjugate to the likelihood.

Conjugate models are great because we know the exact distribution of the posterior, so we can easily simulate from it or derive quantities of interest analytically.

    In practice, we rarely have conjugacy.


    Brief List of Conjugate Models

Likelihood                              Prior           Posterior
Binomial                                Beta            Beta
Negative Binomial                       Beta            Beta
Poisson                                 Gamma           Gamma
Geometric                               Beta            Beta
Exponential                             Gamma           Gamma
Normal (mean unknown)                   Normal          Normal
Normal (variance unknown)               Inverse Gamma   Inverse Gamma
Normal (mean and variance unknown)      Normal/Gamma    Normal/Gamma
Multinomial                             Dirichlet       Dirichlet


    A Binomial Example

Suppose we have a vector of data on voter turnout for a random sample of n voters in the 2004 US Presidential election.

We can model the voter turnout with a binomial model.

Y \sim \text{Binomial}(n, \pi)

Quantity of interest: \pi (voter turnout)

Assumptions:

Each voter's decision to vote follows the Bernoulli distribution.
Each voter has the same probability \pi of voting. (unrealistic)
Each voter's decision to vote is independent. (unrealistic)


    The Conjugate Beta Prior

We can use the beta distribution as a prior for \pi, since the beta distribution is conjugate to the binomial distribution.

p(\pi \mid y) \propto p(y \mid \pi)\, p(\pi)

= \text{Binomial}(n, \pi) \times \text{Beta}(\alpha, \beta)

= \binom{n}{y} \pi^{y} (1-\pi)^{(n-y)} \, \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \pi^{(\alpha-1)} (1-\pi)^{(\beta-1)}

\propto \pi^{y} (1-\pi)^{(n-y)} \pi^{(\alpha-1)} (1-\pi)^{(\beta-1)}

p(\pi \mid y) \propto \pi^{y+\alpha-1} (1-\pi)^{n-y+\beta-1}

The posterior distribution is simply a Beta(y + \alpha, n - y + \beta) distribution. Effectively, our prior is just adding \alpha - 1 successes and \beta - 1 failures to the dataset.
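The update rule is easy to check numerically. Below is a minimal Python sketch (the slides use R; this is just an illustration, and the counts and Beta(2, 2) prior are hypothetical):

```python
import random

def beta_binomial_posterior(y, n, alpha, beta):
    """Conjugate update: a Beta(alpha, beta) prior combined with y successes
    in n binomial trials yields a Beta(y + alpha, n - y + beta) posterior."""
    return y + alpha, n - y + beta

# Hypothetical counts: 285 successes in 500 trials, Beta(2, 2) prior.
a1, b1 = beta_binomial_posterior(285, 500, 2, 2)

post_mean = a1 / (a1 + b1)                     # analytic Beta mean, a/(a+b)
draws = [random.betavariate(a1, b1) for _ in range(20000)]
sim_mean = sum(draws) / len(draws)             # simulated mean, close to post_mean
```

Because the posterior is a known Beta distribution, the simulated mean and the analytic mean agree up to Monte Carlo error.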


    The Uninformative (Flat) Uniform Prior

Suppose we have no strong prior beliefs about the parameter. We can choose a prior that gives equal weight to all possible values of the parameter, essentially an uninformative or flat prior:

p(\pi) = \text{constant}

for all values of \pi.

For the binomial model, one example of a flat prior is the Beta(1, 1) prior:

p(\pi) = \frac{\Gamma(2)}{\Gamma(1)\Gamma(1)} \pi^{(1-1)} (1-\pi)^{(1-1)} = 1

which is the Uniform distribution over the [0, 1] interval.


Since we know that a Binomial likelihood and a Beta(1, 1) prior produce a Beta(y + 1, n - y + 1) posterior, we can simulate the posterior in R.

Suppose our turnout data had 500 voters, of which 285 voted.

> table(turnout)
turnout
  0   1
215 285

Setting our prior parameters at \alpha = 1 and \beta = 1,

> a <- 1
> b <- 1

we get the posterior

> posterior.unif.prior <- rbeta(10000, 285 + a, 215 + b)
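As a cross-check on that simulation, the same posterior can be summarized analytically. A Python sketch using the turnout counts above (names like `post_mean` are just illustrative):

```python
# Beta(1, 1) prior with y = 285 votes out of n = 500 -> Beta(286, 216) posterior.
a, b = 1, 1
y, n = 285, 500

a1, b1 = y + a, n - y + b                                  # posterior parameters
post_mean = a1 / (a1 + b1)                                 # Beta mean
post_var = a1 * b1 / ((a1 + b1) ** 2 * (a1 + b1 + 1))      # Beta variance
```

The simulated draws from `rbeta` should match these closed-form moments up to Monte Carlo error.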


    Normal Model with Unknown Mean, Known Variance

Suppose we wish to estimate a model where the likelihood of the data is normal with an unknown mean \mu and a known variance \sigma^2.

Our parameter of interest is \mu.

We can use a conjugate Normal prior on \mu, with mean \mu_0 and variance \tau_0^2.

p(\mu \mid y, \sigma^2) \propto p(y \mid \mu, \sigma^2)\, p(\mu)

\text{Normal}(\mu_1, \tau_1^2) = \text{Normal}(\mu, \sigma^2) \times \text{Normal}(\mu_0, \tau_0^2)


Let \theta represent our parameter of interest, in this case \mu.

p(\theta \mid y) \propto \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \theta)^2}{2\sigma^2}\right) \cdot \frac{1}{\sqrt{2\pi\tau_0^2}} \exp\left(-\frac{(\theta - \mu_0)^2}{2\tau_0^2}\right)

\propto \exp\left(-\sum_{i=1}^{n} \frac{(y_i - \theta)^2}{2\sigma^2} - \frac{(\theta - \mu_0)^2}{2\tau_0^2}\right)

= \exp\left[-\frac{1}{2}\left(\sum_{i=1}^{n} \frac{(y_i - \theta)^2}{\sigma^2} + \frac{(\theta - \mu_0)^2}{\tau_0^2}\right)\right]

= \exp\left[-\frac{1}{2\sigma^2\tau_0^2}\left(\tau_0^2 \sum_{i=1}^{n} (y_i - \theta)^2 + \sigma^2 (\theta - \mu_0)^2\right)\right]

= \exp\left[-\frac{1}{2\sigma^2\tau_0^2}\left(\tau_0^2 \sum_{i=1}^{n} (y_i^2 - 2 y_i \theta + \theta^2) + \sigma^2 (\theta^2 - 2\mu_0\theta + \mu_0^2)\right)\right]


We can multiply the 2 y_i \theta term in the summation by \frac{n}{n} in order to get the equations in terms of the sufficient statistic \bar{y}.

p(\theta \mid y) \propto \exp\left[-\frac{1}{2\sigma^2\tau_0^2}\left(\tau_0^2 \sum_{i=1}^{n} \left(y_i^2 - 2\theta \frac{n}{n} y_i + \theta^2\right) + \sigma^2 (\theta^2 - 2\mu_0\theta + \mu_0^2)\right)\right]

= \exp\left[-\frac{1}{2\sigma^2\tau_0^2}\left(\tau_0^2 \sum_{i=1}^{n} y_i^2 - 2\tau_0^2 n\bar{y}\,\theta + \tau_0^2 n\theta^2 + \sigma^2\theta^2 - 2\sigma^2\mu_0\theta + \sigma^2\mu_0^2\right)\right]

We can then factor the terms into several parts. Since \sigma^2\mu_0^2 and \tau_0^2 \sum_{i=1}^{n} y_i^2 do not contain \theta, we can represent them as some constant k, which we will drop into the normalizing constant.

p(\theta \mid y) \propto \exp\left[-\frac{1}{2\sigma^2\tau_0^2}\left(\theta^2 (\sigma^2 + \tau_0^2 n) - 2\theta (\sigma^2\mu_0 + \tau_0^2 n\bar{y}) + k\right)\right]

= \exp\left[-\frac{1}{2}\left(\frac{\theta^2 (\sigma^2 + \tau_0^2 n)}{\sigma^2\tau_0^2} - \frac{2\theta (\sigma^2\mu_0 + \tau_0^2 n\bar{y})}{\sigma^2\tau_0^2}\right) + k\right]

= \exp\left[-\frac{1}{2}\left(\theta^2 \left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right) - 2\theta \left(\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}\right)\right) + k\right]


Let's multiply by \frac{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}} in order to simplify the \theta^2 term.

p(\theta \mid y) \propto \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)\left(\theta^2 - 2\theta\, \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}} + k\right)\right]

Completing the square in \theta (and absorbing the leftover constant into k):

= \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)\left(\theta - \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}\right)^2\right]

Finally, we have something that looks like the density function of a Normal distribution!


p(\theta \mid y) \propto \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)\left(\theta - \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}\right)^2\right]

Posterior Mean: \mu_1 = \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}

Posterior Variance: \tau_1^2 = \left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)^{-1}

Posterior Precision: \frac{1}{\tau_1^2} = \frac{1}{\tau_0^2} + \frac{n}{\sigma^2}

The posterior precision is just the sum of the prior precision and the data precision.
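These three identities are a direct computation. A Python sketch (all the numeric values below — the N(65, 5) prior, ybar, n, and sigma^2 — are made up for illustration):

```python
def normal_known_var_posterior(mu0, tau_sq0, ybar, n, sigma_sq):
    """Posterior for a Normal mean with known variance sigma_sq,
    given a Normal(mu0, tau_sq0) prior and sample mean ybar of n points."""
    precision = 1 / tau_sq0 + n / sigma_sq          # posterior precision
    mu1 = (mu0 / tau_sq0 + n * ybar / sigma_sq) / precision
    tau_sq1 = 1 / precision                         # posterior variance
    return mu1, tau_sq1

# Hypothetical numbers: prior N(65, 5), ybar = 68.1, n = 100, sigma^2 = 16.
mu1, tau_sq1 = normal_known_var_posterior(65, 5, 68.1, 100, 16)
```

Note how the posterior precision `1 / tau_sq1` is, by construction, exactly the prior precision plus the data precision.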


We can also look more closely at how the prior mean \mu_0 and the posterior mean \mu_1 relate to each other.

\mu_1 = \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}} = \frac{\frac{\mu_0\sigma^2 + \tau_0^2 n\bar{y}}{\tau_0^2\sigma^2}}{\frac{\sigma^2 + n\tau_0^2}{\tau_0^2\sigma^2}} = \frac{\mu_0\sigma^2 + \tau_0^2 n\bar{y}}{\sigma^2 + n\tau_0^2} = \mu_0 \frac{\sigma^2}{\sigma^2 + n\tau_0^2} + \bar{y}\, \frac{\tau_0^2 n}{\sigma^2 + n\tau_0^2}

As n increases, the data mean dominates the prior mean.

As \tau_0^2 decreases (less prior variance, greater prior precision), our prior mean becomes more important.
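The weighted-average form makes both limiting claims easy to verify numerically. A Python sketch with made-up values of the prior mean, \tau_0^2, \bar{y}, and \sigma^2:

```python
def posterior_mean(mu0, tau_sq0, ybar, n, sigma_sq):
    """Posterior mean written as a weighted average of mu0 and ybar."""
    w_prior = sigma_sq / (sigma_sq + n * tau_sq0)     # weight on the prior mean
    w_data = n * tau_sq0 / (sigma_sq + n * tau_sq0)   # weight on the data mean
    # the two weights sum to 1 by construction
    return w_prior * mu0 + w_data * ybar, w_prior

# Hypothetical values: prior mean 60, tau0^2 = 4, ybar = 68, sigma^2 = 16.
m_small, w_small = posterior_mean(60, 4, 68, 10, 16)       # n = 10
m_large, w_large = posterior_mean(60, 4, 68, 10000, 16)    # n = 10000
```

With n = 10000 the posterior mean sits essentially on top of the data mean, while the prior weight has collapsed toward zero.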


    A Simple Example

Suppose we have some (fake) data on the heights (in inches) of a random sample of 100 individuals in the U.S. population.

> known.sigma.sq <- ...
> unknown.mean <- ...
> n <- 100
> heights <- ...
> mu0 <- ...
> tau.sq0 <- ...


Our posterior is a Normal distribution with mean

\mu_1 = \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}

and variance

\tau_1^2 = \left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)^{-1}

> post.mean <- (mu0/tau.sq0 + n*mean(heights)/known.sigma.sq) / (1/tau.sq0 + n/known.sigma.sq)
> post.mean
[1] 68.03969
> post.var <- 1/(1/tau.sq0 + n/known.sigma.sq)
> post.var
[1] 0.1592920


    Normal Model with Known Mean, Unknown Variance

Now suppose we wish to estimate a model where the likelihood of the data is normal with a known mean \mu and an unknown variance \sigma^2.

Now our parameter of interest is \sigma^2.

We can use a conjugate inverse gamma prior on \sigma^2, with shape parameter \alpha_0 and scale parameter \beta_0.

p(\sigma^2 \mid y, \mu) \propto p(y \mid \mu, \sigma^2)\, p(\sigma^2)

\text{Invgamma}(\alpha_1, \beta_1) = \text{Normal}(\mu, \sigma^2) \times \text{Invgamma}(\alpha_0, \beta_0)


Let \theta represent our parameter of interest, in this case \sigma^2.

p(\theta \mid y, \mu) \propto \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\theta}} \exp\left(-\frac{(y_i - \mu)^2}{2\theta}\right) \cdot \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)} \theta^{-(\alpha_0+1)} \exp\left(-\frac{\beta_0}{\theta}\right)

\propto \prod_{i=1}^{n} \theta^{-1/2} \exp\left(-\frac{(y_i - \mu)^2}{2\theta}\right) \cdot \theta^{-(\alpha_0+1)} \exp\left(-\frac{\beta_0}{\theta}\right)

= \theta^{-n/2} \exp\left(-\frac{\sum_{i=1}^{n} (y_i - \mu)^2}{2\theta}\right) \theta^{-(\alpha_0+1)} \exp\left(-\frac{\beta_0}{\theta}\right)

= \theta^{-(\alpha_0 + \frac{n}{2} + 1)} \exp\left(-\frac{\beta_0}{\theta} - \frac{\sum_{i=1}^{n} (y_i - \mu)^2}{2\theta}\right)

= \theta^{-(\alpha_0 + \frac{n}{2} + 1)} \exp\left[-\left(\frac{2\beta_0 + \sum_{i=1}^{n} (y_i - \mu)^2}{2\theta}\right)\right]

= \theta^{-(\alpha_0 + \frac{n}{2} + 1)} \exp\left[-\left(\frac{\beta_0 + \frac{\sum_{i=1}^{n} (y_i - \mu)^2}{2}}{\theta}\right)\right]

This looks like the density of an inverse gamma distribution!


p(\theta \mid y, \mu) \propto \theta^{-(\alpha_0 + \frac{n}{2} + 1)} \exp\left[-\left(\frac{\beta_0 + \frac{\sum_{i=1}^{n} (y_i - \mu)^2}{2}}{\theta}\right)\right]

\alpha_1 = \alpha_0 + \frac{n}{2}

\beta_1 = \beta_0 + \frac{\sum_{i=1}^{n} (y_i - \mu)^2}{2}

Our posterior is an \text{Invgamma}\left(\alpha_0 + \frac{n}{2},\; \beta_0 + \frac{\sum_{i=1}^{n} (y_i - \mu)^2}{2}\right) distribution.
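The posterior parameters are again a direct computation from the data. A Python sketch with a small hypothetical dataset and made-up prior values for \alpha_0 and \beta_0:

```python
def normal_known_mean_posterior(y, mu, alpha0, beta0):
    """Invgamma posterior for sigma^2 given a known mean mu
    and an Invgamma(alpha0, beta0) prior."""
    n = len(y)
    ss = sum((yi - mu) ** 2 for yi in y)   # sum of squared deviations from mu
    return alpha0 + n / 2, beta0 + ss / 2

# Hypothetical: five observations around a known mean of 10, Invgamma(3, 9) prior.
y = [9.0, 11.0, 10.5, 8.5, 12.0]
alpha1, beta1 = normal_known_mean_posterior(y, 10.0, 3.0, 9.0)
```

Here the squared deviations sum to 8.5, so the posterior is Invgamma(3 + 5/2, 9 + 8.5/2) = Invgamma(5.5, 13.25).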


    A Simple Example

Again suppose we have some (fake) data on the heights (in inches) of a random sample of 100 individuals in the U.S. population.

> known.mean <- ...
> unknown.sigma.sq <- ...
> n <- 100
> heights <- ...
> alpha0 <- ...
> beta0 <- ...


Our posterior is an inverse gamma distribution with shape \alpha_0 + \frac{n}{2} and scale \beta_0 + \frac{\sum_{i=1}^{n} (y_i - \mu)^2}{2}.

> alpha1 <- alpha0 + n/2
> beta1 <- beta0 + sum((heights - known.mean)^2)/2
> library(MCMCpack)
> posterior <- rinvgamma(10000, alpha1, beta1)
> post.mean <- mean(posterior)
> post.mean
[1] 12.88139
> post.var <- var(posterior)
> post.var
[1] 3.136047

Hmm . . . what if we increased our sample size?
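Before rerunning, the analytic moments of the inverse gamma already predict what should happen: an Invgamma(\alpha, \beta) has mean \beta/(\alpha - 1) and variance \beta^2/((\alpha - 1)^2 (\alpha - 2)), so as n grows (and with it \alpha_1 = \alpha_0 + n/2) the posterior variance collapses. A Python sketch with made-up sums of squares (roughly 16 per observation, mimicking a variance near 16):

```python
def invgamma_moments(alpha, beta):
    """Mean and variance of an Invgamma(alpha, beta) distribution (alpha > 2)."""
    mean = beta / (alpha - 1)
    var = beta ** 2 / ((alpha - 1) ** 2 * (alpha - 2))
    return mean, var

# Hypothetical Invgamma(1, 1) prior; same per-observation scatter, n = 100 vs 1000.
m100, v100 = invgamma_moments(1 + 100 / 2, 1 + 16 * 100 / 2)
m1000, v1000 = invgamma_moments(1 + 1000 / 2, 1 + 16 * 1000 / 2)
```

The posterior mean stays near 16 in both cases, while the posterior variance drops by roughly a factor of ten when n does, matching the pattern in the R output below.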


> n <- ...
> heights <- ...
> alpha1 <- alpha0 + n/2
> beta1 <- beta0 + sum((heights - known.mean)^2)/2
> posterior <- rinvgamma(10000, alpha1, beta1)
> post.mean <- mean(posterior)
> post.mean
[1] 15.92281
> post.var <- var(posterior)
> post.var
[1] 0.5058952