High Performance Computing for Science and Engineering II
Pantelis Vlachas, Computational Science and Engineering Lab, ETH Zürich

Probabilities, Bayes Rule, Markov Chain Monte Carlo


Page 1: Probabilities, Bayes Rule, Markov Chain Monte Carlo

High Performance Computing for Science and Engineering II Pantelis Vlachas

Computational Science and Engineering Lab ETH Zürich

Probabilities, Bayes Rule, Markov Chain Monte Carlo

Page 2: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Structure

Bayes Rule

Markov Chain Monte Carlo

Computing the posterior (coin toss)

Conjugate Priors

Page 3: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Structure

Bayes Rule

Markov Chain Monte Carlo

Computing the posterior (coin toss)

Conjugate Priors

Page 4: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Bayes Rule

P(A | B) = P(B | A) P(A) / P(B)

GENERAL FORM:

p(θ | D, M) = p(D | θ, M) p(θ | M) / p(D | M)

MODEL ABSORBED in θ:

p(θ | D) = p(D | θ) p(θ) / p(D)

• We assume a model M(θ) (usually omitted/self-explained and absorbed by θ)
• We look for a parametrisation θ of M that “explains” the data
• D is some observed data
• p(D | θ) is the likelihood of observing the data given that we have a model of the reality
• p(θ) / p(θ | M) is the prior
• p(D) / p(D | M) is the data evidence

DATA EVIDENCE (does not depend on θ):

p(D) = ∫ p(D | θ′) p(θ′) dθ′   OR   p(D | M) = ∫ p(D | θ′, M) p(θ′ | M) dθ′
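As a minimal sketch of Bayes' rule for a discrete parameter, consider two hypothetical candidate coins, θ ∈ {0.5, 0.8}, after observing a single head (the values and prior here are made up for illustration):

```python
# Two candidate values of theta = P(head), uniform prior over them
priors = {0.5: 0.5, 0.8: 0.5}              # p(theta)
likelihood = {th: th for th in priors}     # p(D = head | theta) = theta

# Data evidence: p(D) = sum_theta p(D | theta) p(theta)
evidence = sum(likelihood[th] * priors[th] for th in priors)

# Bayes rule: p(theta | D) = p(D | theta) p(theta) / p(D)
posterior = {th: likelihood[th] * priors[th] / evidence for th in priors}

print(posterior)  # the biased coin (theta = 0.8) is now more probable
```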

Page 5: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Bayes Rule

p(θ | D, M) = p(D | θ, M) p(θ | M) / p(D | M)

with:
• LIKELIHOOD ℒ = p(D | θ, M)
• PRIOR π(θ) = p(θ | M)
• POSTERIOR p(θ | D, M)
• DATA EVIDENCE p(D | M) (does not depend on θ)

Page 6: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Bayes Rule

Speagle, J. S. (2021). “A Conceptual Introduction to Markov Chain Monte Carlo Methods”, arXiv preprint arXiv:1909.12313.

THE POSTERIOR IS A COMPROMISE BETWEEN THE PRIOR AND THE DATA (LIKELIHOOD)

[Figure: prior p(θ) = π(θ), likelihood p(D | θ) = ℒ(D | θ), and the resulting posterior p(θ | D)]

Page 7: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Update Belief Based on Data

Computation of the posterior:

Experiment 1 — prior p0(θ), data x1:

p(θ | x1) = p(x1 | θ) p0(θ) / p(x1)

Today's posterior is the prior of tomorrow: p1(θ) ≜ p(θ | x1)

Experiment 2 — prior p1(θ), data x2:

p(θ | x2) = p(x2 | θ) p1(θ) / p(x2)

Repeating over the experiments yields an accurate estimate of p(θ | x1, …, xN).
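This update loop can be sketched on a discretised parameter grid (assumed setup, not from the slides: Bernoulli tosses xᵢ ∈ {0, 1} with unknown head probability θ; the grid size and data below are made up for illustration):

```python
import numpy as np

theta = np.linspace(0.0, 1.0, 1001)   # parameter grid for theta = P(head)
prior = np.ones_like(theta)
prior /= prior.sum()                  # p0(theta): uniform prior

data = [1, 0, 1, 1]                   # observed tosses (1 = head)
for x in data:
    like = theta if x == 1 else 1.0 - theta   # p(x | theta)
    post = like * prior
    post /= post.sum()                # normalise by the evidence p(x)
    prior = post                      # today's posterior is tomorrow's prior

print((theta * prior).sum())          # posterior mean, close to 2/3 here
```

With a uniform prior and 3 heads in 4 tosses this grid posterior approximates Beta(4, 2), whose mean is 2/3.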

Page 8: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Structure

Bayes Rule

Markov Chain Monte Carlo

Computing the posterior (coin toss)

Conjugate Priors

Page 9: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Conjugate Priors

p(θ | D) = p(D | θ) p(θ) / p(D)
         = p(D | θ) p(θ) / ∫ p(D | θ′) p(θ′) dθ′
         = p(D | θ) p(θ) / ∫ p(D, θ′) dθ′
         = (1/Z) p(D | θ) p(θ)

• Given some prior knowledge of the “data generating process” (model M, etc.), the form of the likelihood p(D | θ) is fixed and well-defined
• The choice (form) of the prior p(θ) affects both the numerator and the denominator and determines the form of the posterior p(θ | D)
• In applications, we need either to (1) have an analytic form of the posterior (resolve the normalisation Z), or (2) be able to sample from it
• For certain choices of the prior p(θ), the posterior has the same form (belongs to the same family, i.e. with different parameters):
  A. Then p(θ) is conjugate to the likelihood p(D | θ)
  B. The normal distribution is a conjugate prior to a normal likelihood
  C. Conjugate priors make the Bayesian update rule easy; otherwise numerical integration is needed

Page 10: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Conjugate Priors

Page 11: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Structure

Bayes Rule

Markov Chain Monte Carlo

Computing the posterior (coin toss)

Conjugate Priors

Page 12: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Coin Toss Experiment

• You are given a coin which is possibly counterfeit and you perform experiments by flipping it
• Repeated runs (sampling) from a Bernoulli distribution Bern(θ)
• Suppose the probability of a head toss is P(H) = θ (unknown)
• If you knew θ, what is the probability of NH head tosses in N trials (likelihood)?
  • NH head tosses, each with probability θ
  • N − NH tail tosses, each with probability (1 − θ)
  • C(N, NH) = N! / (NH! (N − NH)!): the number of permutations of the N total tosses that have NH head tosses

LIKELIHOOD (Binomial distribution):

p(NH) = C(N, NH) θ^NH (1 − θ)^(N−NH)
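The Binomial likelihood above can be evaluated directly; `math.comb` counts the permutations of the N tosses containing NH heads (the example numbers are illustrative):

```python
from math import comb

def binomial_likelihood(theta, N, NH):
    """p(NH | theta) = C(N, NH) * theta^NH * (1 - theta)^(N - NH)"""
    return comb(N, NH) * theta**NH * (1.0 - theta)**(N - NH)

# e.g. a fair coin: probability of exactly 5 heads in 10 tosses
print(binomial_likelihood(0.5, 10, 5))  # 0.24609375
```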

Page 13: Probabilities, Bayes Rule, Markov Chain Monte Carlo

How to select a prior?

• In this case: which prior is conjugate to the Binomial likelihood?
• Random variable on which the prior is defined: the parametrisation of the model, P(H) = θ
• Support? θ ∈ [0, 1]
• Initially we might assume that we do not know anything about the coin (uninformative prior)
• Uninformative prior: Uniform U[0, 1]
• This is a special case of the Beta distribution

Page 14: Probabilities, Bayes Rule, Markov Chain Monte Carlo

An Informative Prior

• Suppose that we do have information about the coin: we know that most probably it is a fair coin (why shouldn't it be?)
• We want to incorporate this information into the prior belief
• Selection of a prior belief p(θ) peaked around θ = P(H) = 0.5
• The Beta distribution with α = β is flexible enough to allow this!
• The magnitude of the shape parameters controls our confidence

Page 15: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Conjugate to Binomial Likelihood

• How to choose the prior? Support: P(H) = θ ∈ [0, 1]
• Prior for θ selected as the Beta distribution:

p(θ) ≜ Beta(θ; α, β) = θ^(α−1) (1 − θ)^(β−1) / B(α, β)

• The Beta function is defined as

B(x, y) = ∫₀¹ t^(x−1) (1 − t)^(y−1) dt

• The Beta distribution is a distribution over the parametrisation of another distribution: how likely the random variable q ≜ P(H) (a probability) takes a value in [0, 1]. It is parametrised by α, β (shape parameters).
• Assume that you conduct the experiment and get NH head tosses and NT = N − NH tail tosses.

LIKELIHOOD (Binomial distribution): p(N, NH | θ = x) = C(N, NH) x^NH (1 − x)^(N−NH)

PRIOR (Beta): p(θ = x) ≜ x^(α−1) (1 − x)^(β−1) / B(α, β)
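The Beta prior can be evaluated from the identity B(a, b) = Γ(a) Γ(b) / Γ(a + b) (a standard fact, not stated on the slide); the function names below are ours:

```python
from math import gamma

def beta_fn(a, b):
    """Beta function via B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def beta_pdf(x, a, b):
    """Beta density p(theta = x) = x^(a-1) (1-x)^(b-1) / B(a, b)."""
    return x**(a - 1) * (1.0 - x)**(b - 1) / beta_fn(a, b)

# alpha = beta = 1 recovers the uniform prior U[0, 1]
print(beta_pdf(0.3, 1, 1))  # 1.0
```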

Page 16: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Conjugate to Binomial Likelihood

POSTERIOR (Bayes rule):

p(θ = x | N, NH) = p(N, NH | θ = x) p(θ = x) / p(N, NH)

With the likelihood (Binomial) p(N, NH | θ = x) = C(N, NH) x^NH (1 − x)^(N−NH), the prior (Beta) p(θ = x) = x^(α−1) (1 − x)^(β−1) / B(α, β), and the Beta function B(x, y) = ∫₀¹ t^(x−1) (1 − t)^(y−1) dt:

p(θ = x | N, NH)
= p(N, NH | θ = x) p(θ = x) / ∫₀¹ p(N, NH | θ = y) p(θ = y) dy
= [ C(N, NH) x^NH (1 − x)^(N−NH) · x^(α−1) (1 − x)^(β−1) / B(α, β) ] / [ ∫₀¹ C(N, NH) y^NH (1 − y)^(N−NH) · y^(α−1) (1 − y)^(β−1) / B(α, β) dy ]
= x^(NH+α−1) (1 − x)^(N−NH+β−1) / ∫₀¹ y^(NH+α−1) (1 − y)^(N−NH+β−1) dy
= x^(NH+α−1) (1 − x)^(N−NH+β−1) / B(α + NH, β + N − NH)
= Beta(θ = x; α + NH, β + N − NH)

Page 17: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Conjugate to Binomial Likelihood - BETA

Posterior ∝ Prior × Likelihood

p(θ | N, NH) ∝ p(θ) × p(NH, N | θ)
p(θ | N, NH) ∝ Beta(α, β) × Binomial(N, NH)
p(θ | N, NH) = Beta(α + NH, β + N − NH)

The BETA distribution is a conjugate distribution to the Binomial likelihood!
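The conjugate update collapses to simple parameter arithmetic; a minimal sketch (the example prior and data are made up):

```python
def update_beta(alpha, beta, N, NH):
    """Beta(alpha, beta) prior + NH heads in N tosses
    -> Beta(alpha + NH, beta + N - NH) posterior."""
    return alpha + NH, beta + N - NH

# uniform prior Beta(1, 1), then 7 heads observed in 10 tosses
a, b = update_beta(1, 1, 10, 7)
print(a, b)           # 8 4
print(a / (a + b))    # posterior mean of Beta(8, 4), i.e. 2/3
```

No numerical integration is needed: the evidence is absorbed into the known normalisation B(α + NH, β + N − NH).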

Page 18: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Coding …

Page 19: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Structure

Bayes Rule

Markov Chain Monte Carlo

Computing the posterior (coin toss)

Conjugate Priors

Page 20: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Markov Chain Monte Carlo

p(θ | D, M) = ℒ(D | θ, M) π(θ | M) / p(D | M)

with LIKELIHOOD ℒ = p(D | θ, M), PRIOR π(θ) = p(θ | M), POSTERIOR p(θ | D, M), and DATA EVIDENCE Z = p(D | M) (does not depend on θ).

In practice:
• Conjugate priors exist only for simple/academic examples
• In MCMC we estimate/sample from the posterior without the normalization factor Z = p(D | M)
• Very important factor: SELECTION OF THE PRIOR (prior knowledge, selection of the distribution, range, many issues, “informative priors”)
• Numerical estimation of model parameters and their uncertainty
• Calculate high-dimensional integrals over complex surfaces
• e.g. a particle moving in a potential V(x): the probability of a location is p(x) ∝ exp(−V(x)), the normalisation constant is difficult to evaluate, and the goal is to calculate physical quantities (mean position, etc.) by integrating ∫ f(x) p(x) dx. How? Use the simulated values (Markov chain) for posterior analysis.
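The point of posterior analysis from simulated values: given samples xᵢ ~ p(x), the integral ∫ f(x) p(x) dx is estimated by the sample mean of f(xᵢ). As an assumed example (not from the slides), take V(x) = x²/2, so p(x) ∝ exp(−V(x)) is the standard normal, which we can sample directly here to illustrate the estimator:

```python
import random

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# f(x) = x: mean position;  f(x) = x^2: second moment
mean_pos = sum(samples) / len(samples)
mean_sq = sum(x * x for x in samples) / len(samples)

print(mean_pos, mean_sq)  # close to 0 and 1
```

When p(x) cannot be sampled directly, the samples come from a Markov chain instead, which is exactly what the next slides build.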

Page 21: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Markov Chain Monte Carlo

GIVEN:
• MODEL M(θ)
• LIKELIHOOD ℒ(D | θ)
• PRIOR p(θ) = π(θ)
• We can evaluate, up to the evidence (Bayes rule):

p(θ | D, M) ∝ ℒ(D | θ, M) π(θ | M)

The evidence does not matter!

• Initial guess θ⁰ = (θ₁⁰, θ₂⁰) drawn from the prior π: low likelihood, low posterior P⁰ = p(θ⁰ | D)
• The true answer lies somewhere in the high-probability region of p(θ | D)

METROPOLIS SAMPLING:
• Sample a candidate θ⋆ from a proposal distribution p(θᵏ⁺¹ | θᵏ) and evaluate P⋆ = p(θ⋆ | D)
• If P⋆ > Pᵏ, accept the jump: θᵏ⁺¹ = θ⋆
• If P⋆ ≤ Pᵏ, accept with probability P⋆ / Pᵏ
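The Metropolis steps above can be sketched as follows. Assumptions not fixed by the slide: a 1-D parameter, a Gaussian random-walk proposal, and a hypothetical unnormalised target `p_tilde` (here exp(−x²/2), the standard normal up to its evidence):

```python
import math
import random

def metropolis(p_tilde, theta0, n_steps, step=0.5, seed=0):
    """Metropolis sampling from an unnormalised density p_tilde."""
    rng = random.Random(seed)
    chain = [theta0]
    p_cur = p_tilde(theta0)
    for _ in range(n_steps):
        # sample a candidate from the random-walk proposal
        theta_star = chain[-1] + rng.gauss(0.0, step)
        p_star = p_tilde(theta_star)
        # if P* > Pk accept; if P* <= Pk accept with probability P*/Pk
        if p_star > p_cur or rng.random() < p_star / p_cur:
            chain.append(theta_star)
            p_cur = p_star
        else:
            chain.append(chain[-1])
    return chain

# target known only up to the evidence Z
chain = metropolis(lambda x: math.exp(-x * x / 2), theta0=3.0, n_steps=20_000)
burned = chain[2_000:]                # remove the initial burn-in steps
print(sum(burned) / len(burned))      # sample mean, close to 0
```

Note that only ratios P⋆/Pᵏ are ever evaluated, so the normalisation (evidence) indeed does not matter.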

Page 22: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Markov Chain Monte Carlo

(Same setup and Metropolis sampling steps as the previous slide.)

1. The endless jumps form a chain

Page 23: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Markov Chain Monte Carlo

(Same setup and Metropolis sampling steps as before.)

2. Initial burn-in steps should be removed

Page 24: Probabilities, Bayes Rule, Markov Chain Monte Carlo

Markov Chain Monte Carlo

(Same setup and Metropolis sampling steps as before.)

Discussed in more detail in the LECTURE:
• Elaborate explanation
• Form of the proposal distribution?
• More sophisticated algorithms?
• Convergence?