
Page 1: Stochastic approximate inference

Stochastic approximate inference

Kay H. Brodersen

Computational Neuroeconomics Group, Department of Economics, University of Zurich

Machine Learning and Pattern Recognition Group, Department of Computer Science, ETH Zurich

http://people.inf.ethz.ch/bkay/

Page 2: Stochastic approximate inference

When do we need approximate inference?

  • How to evaluate the posterior distribution of the model parameters? → we need to sample from an arbitrary distribution.
  • How to compute the evidence term? → we need to compute an expectation.
  • How to compute the expectation of the posterior? → we need to compute an expectation.
  • How to make a point prediction? → we need to compute an expectation.
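In standard Bayesian notation, writing $y$ for the data and $\theta$ for the model parameters, one standard way of writing these four quantities is:

$$\text{posterior:}\quad p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{p(y)}$$

$$\text{evidence:}\quad p(y) = \int p(y \mid \theta)\,p(\theta)\,d\theta$$

$$\text{posterior mean:}\quad \mathbb{E}[\theta \mid y] = \int \theta\,p(\theta \mid y)\,d\theta$$

$$\text{point prediction:}\quad \hat{y} = \int \tilde{y}\,p(\tilde{y} \mid y)\,d\tilde{y}, \qquad p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta)\,p(\theta \mid y)\,d\theta$$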

Page 3: Stochastic approximate inference

Which type of approximate inference?

Deterministic approximations through structural assumptions
  + computationally efficient
  + efficient representation
  + learning rules may give additional insight
  − application often requires mathematical derivations (hard work)
  − systematic error

Stochastic approximations through sampling
  + asymptotically exact
  + easily applicable general-purpose algorithms
  − computationally expensive
  − storage intensive

Page 4: Stochastic approximate inference

Themes in stochastic approximate inference

  • Sampling from a desired target distribution: we need to find a way of drawing random numbers $z^{(t)}$ from some target distribution $p(z)$.
  • Computing an expectation w.r.t. that target distribution: we can approximate the expectation of $f(z)$ using the sample mean (see the sketch below):

$$\mathbb{E}[f] = \int f(z)\,p(z)\,dz \approx \frac{1}{T}\sum_{t=1}^{T} f\bigl(z^{(t)}\bigr)$$

[Figure: sampling yields draws $z^{(1)}, \dots, z^{(T)}$ from $p(z)$, which are then summarized, e.g., into an estimate of $\mathbb{E}[f(z)]$.]
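A minimal MATLAB sketch of this second theme; the target $\mathcal{N}(0,1)$ and the function $f(z) = z^2$ are our own illustrative choices:

T = 10000;
z = randn(1, T);     % T independent samples from the (toy) target N(0,1)
Ef = mean(z.^2)      % sample-mean estimate of E[f] with f(z) = z^2;
                     % the exact value is 1, approached as T grows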

Page 5: Stochastic approximate inference

1. Transformation method

Page 6: Stochastic approximate inference

Idea: we can obtain samples from some distribution by first sampling from the uniform distribution and then transforming these samples.

Transformation method for sampling from $p(z)$

[Figure: a sample $u = 0.9$ drawn from the uniform density $p(u)$ on $[0, 1]$ is transformed into a sample $z = 1.15$ from the target density $p(z)$.]

Page 7: Stochastic approximate inference

Algorithm for sampling from $p(z)$
  • Draw a random number from the uniform distribution: $u^{(t)} \sim U(0, 1)$.
  • Transform $u^{(t)}$ by applying the inverse cumulative distribution function (cdf) of the desired target distribution: $z^{(t)} = F^{-1}\bigl(u^{(t)}\bigr)$.
  • Repeat both steps for $t = 1, \dots, T$.

Transformation method: algorithm

Page 8: Stochastic approximate inference

Example: sampling from the exponential distribution
  • The desired pdf is $p(z) = \lambda \exp(-\lambda z)$ for $z \ge 0$.
  • The corresponding cdf is $F(z) = 1 - \exp(-\lambda z)$.
  • The inverse cdf is $F^{-1}(u) = -\frac{1}{\lambda}\ln(1 - u)$.
  • Thus, $z = -\frac{1}{\lambda}\ln(1 - u)$ with $u \sim U(0, 1)$ is a sample from the exponential distribution.

Implementation in MATLAB

lambda = 1;                          % rate parameter (value not specified on the slide)
z = zeros(1, 10000);                 % preallocate the sample vector
for t = 1:10000
    z(t) = -1/lambda * log(1-rand);  % apply the inverse cdf to u ~ U(0,1)
end
hist(z)                              % empirical distribution of the samples
mean(z)                              % should be close to 1/lambda

Transformation method: example

Page 9: Stochastic approximate inference

Discussion
  + yields high-quality samples
  + easy to implement
  + computationally efficient
  − obtaining the inverse cdf can be difficult

Transformation method: summary

Page 10: Stochastic approximate inference

2. Rejection sampling and importance sampling

Page 11: Stochastic approximate inference

Idea: when the transformation method cannot be applied, we can resort to a more general method called rejection sampling. Here, we draw random numbers from a simpler proposal distribution $q(z)$ and keep only some of these samples.

The proposal distribution $q(z)$, scaled by a factor $k$, represents an envelope of the (possibly unnormalized) target $\tilde{p}(z)$, i.e., $k\,q(z) \ge \tilde{p}(z)$ for all $z$.

Rejection sampling

Page 12: Stochastic approximate inference

Algorithm for sampling from $p(z)$
  • Sample $z_0$ from the proposal distribution $q(z)$.
  • Sample $u_0$ from the uniform distribution $U\bigl(0, k\,q(z_0)\bigr)$.
  • If $u_0 \le \tilde{p}(z_0)$, then accept the sample: keep $z_0$.
  • Otherwise, discard $z_0$ and $u_0$.
  • Repeat until we have obtained $T$ accepted samples.

Rejection sampling: algorithm

Rejection sampling: algorithm
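A minimal MATLAB sketch of this procedure. The target and proposal are our own toy choices: unnormalized target $\tilde{p}(z) = \exp(-z^2/2)$, uniform proposal $q(z) = 1/10$ on $[-5, 5]$, and envelope constant $k = 10$, so that $k\,q(z) = 1 \ge \tilde{p}(z)$ everywhere:

ptilde = @(z) exp(-z.^2/2);   % toy unnormalized target (standard normal)
T = 10000; samples = zeros(1, T); n = 0;
while n < T
    z0 = -5 + 10*rand;        % z0 ~ q(z) = Uniform(-5, 5)
    u0 = rand;                % u0 ~ Uniform(0, k*q(z0)), and k*q(z0) = 1 here
    if u0 <= ptilde(z0)       % accept with probability ptilde(z0) / (k*q(z0))
        n = n + 1;
        samples(n) = z0;
    end                       % otherwise discard z0 and u0
end
hist(samples, 50)             % should resemble a standard normal density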

Page 13: Stochastic approximate inference

Idea: if our goal is to compute the expectation $\mathbb{E}[f] = \int f(z)\,p(z)\,dz$, we can outperform rejection sampling by bypassing the generation of random samples altogether.

Naïve approach
  • A naïve approach would be to approximate the expectation as follows. Rather than sampling from $p(z)$, we discretize $z$-space into a uniform grid of points $z^{(l)}$, $l = 1, \dots, L$, and evaluate:

$$\mathbb{E}[f] \approx \sum_{l=1}^{L} f\bigl(z^{(l)}\bigr)\,p\bigl(z^{(l)}\bigr)$$

  • There are two problems with this approach:
      – The number of terms in the summation grows exponentially with the dimensionality of $z$.
      – Only a small proportion of the samples will make a significant contribution to the sum.

Uniform sampling clearly is very inefficient.

Importance sampling

Page 14: Stochastic approximate inference

Addressing the two problems of the naïve approach above, given a proposal distribution $q(z)$, we can approximate the expectation as

$$\mathbb{E}[f] = \int f(z)\,p(z)\,dz = \int f(z)\,\frac{p(z)}{q(z)}\,q(z)\,dz \approx \frac{1}{L}\sum_{l=1}^{L} \frac{p\bigl(z^{(l)}\bigr)}{q\bigl(z^{(l)}\bigr)}\, f\bigl(z^{(l)}\bigr)$$

where the samples $z^{(l)}$ are drawn from $q(z)$.

The quantities $w_l = p\bigl(z^{(l)}\bigr) / q\bigl(z^{(l)}\bigr)$ are known as importance weights, and they correct the bias introduced by sampling from the wrong distribution.

Unlike in the case of rejection sampling, all of the generated samples are retained.

Importance sampling
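A minimal MATLAB sketch; the target $p(z) = \mathcal{N}(0, 1)$, the broader proposal $q(z) = \mathcal{N}(0, 2^2)$, and $f(z) = z^2$ are our own toy choices:

L  = 10000;
zl = 2*randn(1, L);                 % z^(l) ~ q, a toy proposal N(0, 4)
p  = exp(-zl.^2/2) / sqrt(2*pi);    % target density p at the samples
q  = exp(-zl.^2/8) / sqrt(8*pi);    % proposal density q at the samples
w  = p ./ q;                        % importance weights
Ef = mean(w .* zl.^2)               % weighted estimate of E[z^2]; exact value is 1

Note that all $L$ samples are used; none are discarded, in contrast to rejection sampling.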

Page 15: Stochastic approximate inference

3. Markov Chain Monte Carlo

Page 16: Stochastic approximate inference


Idea: we can sample from a large class of distributions and overcome the problems that previous methods face in high dimensions using a framework called Markov Chain Monte Carlo.

Markov Chain Monte Carlo (MCMC)

Page 17: Stochastic approximate inference

A first-order Markov chain is defined as a series of random variables $z^{(1)}, \dots, z^{(M)}$ such that the following conditional-independence property holds:

$$p\bigl(z^{(m+1)} \mid z^{(1)}, \dots, z^{(m)}\bigr) = p\bigl(z^{(m+1)} \mid z^{(m)}\bigr)$$

Thus, the graphical model of a Markov chain is a chain:

$$z^{(1)} \rightarrow z^{(2)} \rightarrow z^{(3)} \rightarrow \dots \rightarrow z^{(M)}$$

A Markov chain is specified in terms of
  • the initial probability distribution $p\bigl(z^{(1)}\bigr)$
  • the transition probabilities $p\bigl(z^{(m+1)} \mid z^{(m)}\bigr)$

Background on Markov chains

Page 18: Stochastic approximate inference

Background on Markov chains

[Figure: state diagram of a three-state Markov chain with transition probabilities of 0.5 between states, together with the resulting equilibrium distribution $p(z)$ over the states $z \in \{1, 2, 3\}$.]
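Convergence to an equilibrium distribution can be illustrated in a few lines of MATLAB. The three-state transition matrix below is our own illustrative choice (the slide's exact values are not recoverable); with it, any initial distribution converges to $p(z) = (0.25,\ 0.5,\ 0.25)$:

% Row i of Tm holds the transition probabilities p(z^(m+1) | z^(m) = i).
Tm = [0.50 0.50 0.00;
      0.25 0.50 0.25;
      0.00 0.50 0.50];
p = [1 0 0];             % initial distribution p(z^(1)), concentrated on state 1
for m = 1:50
    p = p * Tm;          % propagate the distribution one step along the chain
end
disp(p)                  % ~ [0.25 0.50 0.25], the equilibrium distribution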


Page 20: Stochastic approximate inference

The idea behind MCMC

[Figure: a Markov chain is constructed whose equilibrium distribution is the desired target distribution $p(z)$; the empirical distribution of the samples it generates approximates $p(z)$ and can be used to evaluate expectations of $f(z)$.]

Page 21: Stochastic approximate inference

Algorithm for sampling from $p(z)$
  • Initialize $z^{(1)}$ by drawing it somehow.
  • At cycle $\tau + 1$, draw a candidate sample $z^\ast$ from the proposal distribution $q\bigl(z \mid z^{(\tau)}\bigr)$. Importantly, $q$ needs to be symmetric, i.e., $q(z_A \mid z_B) = q(z_B \mid z_A)$.
  • Accept $z^\ast$ with probability

$$A\bigl(z^\ast, z^{(\tau)}\bigr) = \min\!\left(1,\; \frac{\tilde{p}(z^\ast)}{\tilde{p}\bigl(z^{(\tau)}\bigr)}\right),$$

    i.e., set $z^{(\tau+1)} = z^\ast$, and otherwise set $z^{(\tau+1)} = z^{(\tau)}$.

Notes
  • In contrast to rejection sampling, each cycle leads to a new sample, even when the candidate is discarded.
  • Note that the sequence $z^{(1)}, z^{(2)}, \dots$ is not a set of independent samples from $p(z)$ because successive samples are highly correlated.

Metropolis algorithm
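A minimal MATLAB sketch of the algorithm in a toy setting of our own: unnormalized target $\tilde{p}(z) = \exp(-z^2/2)$ and a symmetric Gaussian random-walk proposal:

ptilde = @(z) exp(-z.^2/2);                % toy unnormalized target
T = 10000; z = zeros(1, T);                % z(1) = 0: initialize somehow
for tau = 1:T-1
    zstar = z(tau) + 0.5*randn;            % candidate from symmetric q(z* | z)
    A = min(1, ptilde(zstar) / ptilde(z(tau)));  % acceptance probability
    if rand < A
        z(tau+1) = zstar;                  % accept the candidate
    else
        z(tau+1) = z(tau);                 % reject: the old state is repeated
    end
end
hist(z, 50)                                % correlated samples from N(0,1)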

Page 22: Stochastic approximate inference

Algorithm for sampling from $p(z)$
  • Initialize $z^{(1)}$ by drawing it somehow.
  • At cycle $\tau + 1$, draw a candidate sample $z^\ast$ from the proposal distribution $q\bigl(z \mid z^{(\tau)}\bigr)$. In contrast to the Metropolis algorithm (see previous slide), $q$ no longer needs to be symmetric.
  • Accept $z^\ast$ with probability

$$A\bigl(z^\ast, z^{(\tau)}\bigr) = \min\!\left(1,\; \frac{\tilde{p}(z^\ast)\,q\bigl(z^{(\tau)} \mid z^\ast\bigr)}{\tilde{p}\bigl(z^{(\tau)}\bigr)\,q\bigl(z^\ast \mid z^{(\tau)}\bigr)}\right),$$

    and otherwise set $z^{(\tau+1)} = z^{(\tau)}$.

Metropolis-Hastings algorithm
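A standard one-line argument (not specific to these slides) for why this acceptance rule targets $p(z)$: it enforces detailed balance, since

$$p(z)\,q(z^\ast \mid z)\,A(z^\ast, z) = \min\bigl(p(z)\,q(z^\ast \mid z),\; p(z^\ast)\,q(z \mid z^\ast)\bigr) = p(z^\ast)\,q(z \mid z^\ast)\,A(z, z^\ast),$$

where the middle expression is symmetric in $z$ and $z^\ast$; detailed balance in turn makes $p(z)$ an invariant distribution of the chain.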

Page 23: Stochastic approximate inference

Metropolis: accept or reject?
  • Increase in density, $\tilde{p}(z^\ast) \ge \tilde{p}\bigl(z^{(\tau)}\bigr)$: the acceptance probability is $\min\!\left(1,\, \tilde{p}(z^\ast)/\tilde{p}\bigl(z^{(\tau)}\bigr)\right) = 1$, so the candidate $z^\ast$ is always accepted.
  • Decrease in density, $\tilde{p}(z^\ast) < \tilde{p}\bigl(z^{(\tau)}\bigr)$: the candidate $z^\ast$ is accepted with probability $\tilde{p}(z^\ast)/\tilde{p}\bigl(z^{(\tau)}\bigr)$.

Page 24: Stochastic approximate inference

Idea: as an alternative to the Metropolis-Hastings algorithm, Gibbs sampling is less broadly applicable but does away with acceptance tests and can therefore be more efficient.

Suppose we wish to sample from a multivariate distribution $p(z) = p(z_1, \dots, z_M)$, e.g., representing several variables in a model. For example, we might be interested in their joint posterior distribution.

In Gibbs sampling, we update one component $z_i$ at a time.

Gibbs sampling

[Figure: a model with six variables $z_1, \dots, z_6$, shown three times; in each panel, a different component is resampled while the remaining variables are held fixed.]

Page 25: Stochastic approximate inference

Algorithm for sampling from $p(z) = p(z_1, \dots, z_M)$
  • Initialize $z^{(1)}$ somehow.
  • At cycle $\tau + 1$, sample $z_i^{(\tau+1)} \sim p\bigl(z_i \mid z_{\setminus i}^{(\tau)}\bigr)$, i.e., replace the $i$-th variable by a new sample, drawn from a distribution that is conditioned on the current values of all other variables. The resulting new vector $z^{(\tau+1)}$ is our new sample.
  • In the next cycle, replace a different variable $z_j$. The simplest procedure is to go round the variables in turn; alternatively, the variable to update could be chosen randomly at each cycle.

Gibbs sampling
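A minimal MATLAB sketch in a toy setting of our own: Gibbs sampling from a bivariate Gaussian with unit variances and correlation $\rho$, whose exact full conditionals are $p(z_1 \mid z_2) = \mathcal{N}(\rho z_2,\, 1 - \rho^2)$ and vice versa:

rho = 0.8; T = 10000;                             % toy example parameters
z = zeros(T, 2);                                  % initialize somehow (at the origin)
for tau = 1:T-1
    z1 = rho*z(tau, 2) + sqrt(1 - rho^2)*randn;   % resample z1 | current z2
    z2 = rho*z1        + sqrt(1 - rho^2)*randn;   % resample z2 | new z1
    z(tau+1, :) = [z1, z2];                       % the updated vector is the new sample
end
plot(z(:,1), z(:,2), '.')                         % samples trace out the joint density
corrcoef(z)                                       % off-diagonal entries approach rho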

Page 26: Stochastic approximate inference

Throughout Bayesian statistics, we encounter intractable problems. Most of these problems amount to one of two tasks: (i) sampling from a distribution; or (ii) computing an expectation with respect to a distribution.

Sampling methods provide a stochastic alternative to deterministic methods. They are usually computationally less efficient, but are asymptotically correct, broadly applicable, and easy to implement.

We looked at three main approaches:
  • Transformation method: efficient sampling from simple distributions
  • Rejection sampling and importance sampling: sampling from arbitrary distributions; direct computation of an expected value
  • Markov chain Monte Carlo (MCMC): efficient sampling from high-dimensional distributions through the Metropolis-Hastings algorithm or Gibbs sampling

Summary

Page 27: Stochastic approximate inference


Supplementary slides

Page 28: Stochastic approximate inference

Procedure: draw $u \sim U(0, 1)$ and set $z = F^{-1}(u)$, where $F$ is the cdf of the desired target distribution $p(z)$.

Proof: Is $z$ distributed according to $p(z)$? We have $P(z \le a) = P\bigl(F^{-1}(u) \le a\bigr) = P\bigl(u \le F(a)\bigr)$. Moreover, since $u$ is uniform on $[0, 1]$, $P\bigl(u \le F(a)\bigr) = F(a)$. Therefore $z$ has cdf $F$, and hence density $\frac{d}{dz}F(z) = p(z)$, as desired.

Transformation method: proof

Page 29: Stochastic approximate inference

Ergodicity: A Markov chain is ergodic if the distribution $p\bigl(z^{(m)}\bigr)$ converges to an invariant distribution irrespective of the choice of initial distribution $p\bigl(z^{(1)}\bigr)$. A homogeneous Markov chain will be ergodic, subject only to some weak restrictions on the invariant distribution and the transition probabilities.

Equilibrium distribution: Given an ergodic Markov chain, the invariant distribution to which any initial distribution converges is called the equilibrium distribution. (An ergodic Markov chain may have several invariant distributions, but only one of them is the equilibrium distribution.)

Background on Markov chains
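In symbols (a standard definition, stated here for completeness), a distribution $p^\ast(z)$ is invariant with respect to a chain with transition probabilities $T(z', z)$ if

$$p^\ast(z) = \sum_{z'} T(z', z)\,p^\ast(z'),$$

and a sufficient condition for invariance is detailed balance, $p^\ast(z)\,T(z, z') = p^\ast(z')\,T(z', z)$, which MCMC algorithms such as Metropolis-Hastings are constructed to satisfy.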