ABC Cancún

  • Published on
    23-Aug-2014


DESCRIPTION

ABC short course at ISBA 2014, Cancún

Transcript

  • ABC methodology and applications. Jean-Michel Marin & Christian P. Robert, I3M, Université Montpellier 2; Université Paris-Dauphine & University of Warwick. ISBA 2014, Cancún, July 13.
  • Outline 1 Approximate Bayesian computation 2 ABC as an inference machine 3 ABC for model choice 4 Genetics of ABC
  • Approximate Bayesian computation 1 Approximate Bayesian computation ABC basics Alphabet soup ABC-MCMC 2 ABC as an inference machine 3 ABC for model choice 4 Genetics of ABC
  • Intractable likelihoods. Cases when the likelihood function f(y|θ) is unavailable, and when the completion step f(y|θ) = ∫_Z f(y, z|θ) dz is impossible or too costly because of the dimension of z: MCMC cannot be implemented!
  • The ABC method. Bayesian setting: the target is π(θ)f(y|θ). When the likelihood f(y|θ) is not in closed form, a likelihood-free rejection technique applies. ABC algorithm: for an observation y ~ f(y|θ), under the prior π(θ), keep jointly simulating θ' ~ π(θ), z ~ f(z|θ'), until the auxiliary variable z is equal to the observed value, z = y. [Tavaré et al., 1997]
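The exact-match rejection sampler can be sketched in a few lines. The model below (a Binomial observation with a uniform prior) is an assumption chosen purely for illustration, since exact matching requires discrete data:

```python
import random

random.seed(42)

# Hypothetical discrete model for illustration: y ~ Binomial(n, theta), theta ~ U(0, 1)
n, y_obs = 10, 7

def simulate(theta):
    # draw z ~ Binomial(n, theta) as a sum of Bernoulli trials
    return sum(random.random() < theta for _ in range(n))

accepted = []
while len(accepted) < 1000:
    theta = random.random()   # theta' ~ prior U(0, 1)
    z = simulate(theta)       # z ~ f(.|theta')
    if z == y_obs:            # keep only exact matches, z = y
        accepted.append(theta)

post_mean = sum(accepted) / len(accepted)
# the exact posterior here is Beta(y + 1, n - y + 1) = Beta(8, 4), mean 2/3
```

Because the acceptance event is z = y exactly, the accepted θ are genuine posterior draws; the sample mean should sit near the Beta(8, 4) mean 2/3. Exact matching is only viable for low-dimensional discrete data, which motivates the tolerance version below.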
  • Why does it work?! The proof is trivial: f(θ_i) ∝ Σ_{z ∈ D} π(θ_i) f(z|θ_i) I_y(z) ∝ π(θ_i) f(y|θ_i) ∝ π(θ_i|y). [Accept-Reject 101]
  • Earlier occurrence: "Bayesian statistics and Monte Carlo methods are ideally suited to the task of passing many models over one dataset" [Don Rubin, Annals of Statistics, 1984]. Note: Rubin (1984) does not promote this algorithm for likelihood-free simulation, but as a frequentist intuition on posterior distributions: parameters from posteriors are more likely to be those that could have generated the data.
  • A as A...pproximative. When y is a continuous random variable, the equality z = y is replaced with a tolerance condition ρ(y, z) ≤ ε, where ρ is a distance. The output is then distributed from π(θ) P_θ{ρ(y, z) < ε} ∝ π(θ | ρ(y, z) < ε). [Pritchard et al., 1999]
  • ABC algorithm. Algorithm 1: Likelihood-free rejection sampler.
      for i = 1 to N do
        repeat
          generate θ' from the prior distribution π(·)
          generate z from the likelihood f(·|θ')
        until ρ{η(z), η(y)} ≤ ε
        set θ_i = θ'
      end for
    where η(y) defines a (not necessarily sufficient) [summary] statistic
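Algorithm 1 can be sketched directly. The model, prior, summary statistic, and tolerance below are all illustrative assumptions (a normal mean with the sample mean as summary):

```python
import random, statistics

random.seed(0)

# Hypothetical model for illustration: y_1..y_n ~ N(theta, 1), theta ~ N(0, 10),
# summary eta = sample mean, distance rho = absolute difference
n, theta_true = 50, 1.5
y = [random.gauss(theta_true, 1) for _ in range(n)]
eta_y = statistics.fmean(y)

eps, N = 0.05, 200
samples = []
while len(samples) < N:
    theta = random.gauss(0, 10 ** 0.5)               # theta' ~ prior N(0, 10)
    z = [random.gauss(theta, 1) for _ in range(n)]   # z ~ f(.|theta')
    if abs(statistics.fmean(z) - eta_y) <= eps:      # rho{eta(z), eta(y)} <= eps
        samples.append(theta)

abc_mean = statistics.fmean(samples)
```

Here the sample mean happens to be sufficient for θ, so for small ε the accepted θ approximate the exact posterior; with an insufficient summary η, the algorithm instead targets π(θ|η(y)).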
  • Output. The likelihood-free algorithm samples from the marginal in z of
      π_ε(θ, z|y) = π(θ) f(z|θ) I_{A_{ε,y}}(z) / ∫_{A_{ε,y}} π(θ) f(z|θ) dz dθ,
    where A_{ε,y} = {z ∈ D | ρ(η(z), η(y)) < ε}. The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution: π_ε(θ|y) = ∫ π_ε(θ, z|y) dz ≈ π(θ|y).
  • A toy example. Case of y|θ ~ N(2(θ + 2)θ(θ − 2), 0.1 + θ²) with θ ~ U[−10, 10], when y = 2 and ρ(y, z) = |y − z|. [Richard Wilkinson, Tutorial at NIPS 2013]
  • A toy example, ε = 7.5. [Figure: θ vs D scatter with ±ε band; ABC density against the true posterior density.]
  • A toy example, ε = 5. [Same figure layout.]
  • A toy example, ε = 2.5. [Same figure layout.]
  • A toy example, ε = 1. [Same figure layout.]
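A minimal sketch of the toy example above, reusing the same model and distance; the Monte Carlo sizes and random seed are arbitrary choices:

```python
import random

random.seed(1)

# y | theta ~ N(2*(theta + 2)*theta*(theta - 2), 0.1 + theta^2), theta ~ U(-10, 10)
def draw_z(theta):
    mean = 2 * (theta + 2) * theta * (theta - 2)
    sd = (0.1 + theta ** 2) ** 0.5
    return random.gauss(mean, sd)

y = 2.0
pairs = []
for _ in range(100_000):
    theta = random.uniform(-10, 10)          # prior draw
    pairs.append((theta, draw_z(theta)))

# accepted samples for a decreasing sequence of tolerances, rho(y, z) = |y - z|
kept = {eps: [t for t, z in pairs if abs(z - y) < eps] for eps in (7.5, 5, 2.5, 1)}
sizes = {eps: len(ts) for eps, ts in kept.items()}
```

As ε shrinks, the accepted θ concentrate near the roots of the cubic mean function (around θ = 0 and ±2), reproducing the increasingly multimodal ABC densities in the figures; since the acceptance regions are nested, the accepted sample sizes decrease with ε.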
  • ABC as knn. [Biau et al., 2013, Annales de l'IHP] Practice of ABC: determine the tolerance as a quantile of the observed distances, say the 10% or 1% quantile: ε = ε_N = q_α(d_1, ..., d_N). The interpretation of ε as a nonparametric bandwidth is only an approximation of the actual practice [Blum & François, 2010]. ABC is thus a k-nearest-neighbour (knn) method with k_N = ⌊α N⌋. [Loftsgaarden & Quesenberry, 1965]
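The quantile rule can be sketched directly: simulate N pairs, set ε_N to the α-quantile of the distances, and observe that this keeps exactly the k_N nearest neighbours of η(y). The normal model below is an arbitrary illustration:

```python
import random, statistics

random.seed(2)

# hypothetical model: y_i ~ N(theta, 1), theta ~ N(0, 4), summary = sample mean
n = 20
y = [random.gauss(1.0, 1) for _ in range(n)]
eta_y = statistics.fmean(y)

N, alpha = 10_000, 0.01
pairs = []
for _ in range(N):
    theta = random.gauss(0, 2)                        # prior draw
    z = [random.gauss(theta, 1) for _ in range(n)]
    pairs.append((abs(statistics.fmean(z) - eta_y), theta))

dists = sorted(d for d, _ in pairs)
k = int(alpha * N)              # k_N nearest neighbours
eps = dists[k - 1]              # epsilon_N = q_alpha(d_1, ..., d_N)
accepted = [t for d, t in pairs if d <= eps]
```

With continuous distances there are no ties, so accepting at tolerance ε_N and keeping the k_N nearest neighbours of η(y) are the same operation, which is what makes the knn theory applicable.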
  • ABC consistency. Provided k_N / log log N → ∞ and k_N / N → 0 as N → ∞, for almost all s_0 (with respect to the distribution of S), with probability 1, (1/k_N) Σ_{j=1}^{k_N} φ(θ_j) → E[φ(θ_j) | S = s_0]. [Devroye, 1982] Biau et al. (2013) also recall pointwise and integrated mean square error consistency results on the corresponding kernel estimate of the conditional posterior distribution, under the constraints k_N → ∞, k_N/N → 0, h_N → 0 and h_N^p k_N → ∞.
  • Rates of convergence. Further assumptions (on the target and kernel) allow for precise (integrated mean square) convergence rates (as a power of the sample size N), derived from classical k-nearest-neighbour regression:
      when m = 1, 2, 3: k_N ≈ N^{(p+4)/(p+8)} and rate N^{−4/(p+8)};
      when m = 4: k_N ≈ N^{(p+4)/(p+8)} and rate N^{−4/(p+8)} log N;
      when m > 4: k_N ≈ N^{(p+4)/(m+p+4)} and rate N^{−4/(m+p+4)};
    [Biau et al., 2013] where p is the dimension of θ and m the dimension of the summary statistics. Drag: only applies to sufficient summary statistics.
  • Probit modelling on Pima Indian women. Example (R benchmark): 200 Pima Indian women with observed variables: plasma glucose concentration in an oral glucose tolerance test, diastolic blood pressure, diabetes pedigree function, and presence/absence of diabetes. The probability of diabetes is a function of the above variables, P(y = 1|x) = Φ(x_1 β_1 + x_2 β_2 + x_3 β_3), for the 200 observations, based on a g-prior modelling: β ~ N_3(0, n (X^T X)^{−1}). Use of the MLE estimates as summary statistics and of the distance ρ(y, z) = ||β̂(z) − β̂(y)||².
  • Pima Indian benchmark. [Figure: comparison between density estimates of the marginals on β_1 (left), β_2 (center) and β_3 (right) from ABC rejection samples (red) and MCMC samples (black).]
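The "MLE as summary statistic" idea can be sketched on a deliberately simplified version of the probit setting: one covariate instead of three, simulated data instead of the Pima Indian dataset, a crude grid MLE, and an N(0, 4) prior instead of the g-prior — all assumptions for illustration only:

```python
import math, random

random.seed(3)
Phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))   # standard normal cdf

# simulated one-covariate probit data (an assumption, not the Pima dataset)
n, beta_true = 100, 0.8
x = [random.gauss(0, 1) for _ in range(n)]
y = [1 if random.random() < Phi(beta_true * xi) else 0 for xi in x]

# crude grid MLE over beta in [-3, 3]; log Phi / log(1 - Phi) precomputed,
# with clamping to guard against Phi underflowing to exactly 0 or 1
clamp = lambda p: min(max(p, 1e-12), 1 - 1e-12)
grid = [g / 10 for g in range(-30, 31)]
LP = [[math.log(clamp(Phi(b * xi))) for xi in x] for b in grid]
LQ = [[math.log(clamp(1 - Phi(b * xi))) for xi in x] for b in grid]

def mle(ys):
    # summary statistic eta(y): the (grid) maximum-likelihood estimate
    lls = [sum(lp[i] if ys[i] else lq[i] for i in range(n))
           for lp, lq in zip(LP, LQ)]
    return grid[max(range(len(grid)), key=lls.__getitem__)]

s_obs = mle(y)

# ABC with rho(y, z) = |beta_hat(z) - beta_hat(y)| and a 5% quantile tolerance
draws = []
for _ in range(500):
    beta = random.gauss(0, 2)                                      # prior draw
    z = [1 if random.random() < Phi(beta * xi) else 0 for xi in x]
    draws.append((abs(mle(z) - s_obs), beta))
draws.sort()
post = [b for _, b in draws[:25]]
post_mean = sum(post) / len(post)
```

The accepted β concentrate around the observed MLE, mimicking the agreement between ABC and MCMC marginals seen in the benchmark figure; a real implementation would fit the MLE by Newton iterations rather than a grid.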
  • MA example. Case of the MA(2) model x_t = ε_t + Σ_{i=1}^{2} θ_i ε_{t−i}. Simple prior: uniform prior over the identifiability zone, e.g. the triangle for MA(2).
  • MA example (2). The ABC algorithm is thus made of: 1. picking a new value (θ_1, θ_2) in the triangle; 2. generating an iid sequence (ε_t)...
  • Comparison of distance impact. Evaluation of the tolerance ε on the ABC sample against both distances (ε = 10%, 1%, 0.1% quantiles of simulated distances) for an MA(2) model. [Figure: ABC posterior densities of θ_1 and θ_2 at each tolerance.]
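The MA(2) experiment can be sketched end to end. The choice of the first two empirical autocovariances as summaries is one natural option (the slides compare distances on the raw series against distances on autocovariances); the true parameter, series length, and quantile level below are arbitrary:

```python
import random

random.seed(4)

# MA(2): x_t = eps_t + theta1 * eps_{t-1} + theta2 * eps_{t-2}, eps_t ~ N(0, 1)
T = 300

def prior_draw():
    # uniform over the MA(2) identifiability triangle:
    # -2 < t1 < 2, t1 + t2 > -1, t1 - t2 < 1
    while True:
        t1, t2 = random.uniform(-2, 2), random.uniform(-1, 1)
        if t1 + t2 > -1 and t1 - t2 < 1:
            return t1, t2

def simulate(t1, t2):
    eps = [random.gauss(0, 1) for _ in range(T + 2)]
    return [eps[t + 2] + t1 * eps[t + 1] + t2 * eps[t] for t in range(T)]

def acov(xs, lag):
    # empirical autocovariance at the given lag (summary statistic)
    return sum(xs[i] * xs[i + lag] for i in range(len(xs) - lag)) / len(xs)

theta_true = (0.6, 0.2)
x_obs = simulate(*theta_true)
s_obs = (acov(x_obs, 1), acov(x_obs, 2))

# ABC with squared distance on the two autocovariances, 1% quantile tolerance
draws = []
for _ in range(3000):
    t1, t2 = prior_draw()
    z = simulate(t1, t2)
    d = (acov(z, 1) - s_obs[0]) ** 2 + (acov(z, 2) - s_obs[1]) ** 2
    draws.append((d, t1, t2))
draws.sort()
kept = draws[:30]
m1 = sum(t1 for _, t1, _ in kept) / len(kept)
m2 = sum(t2 for _, _, t2 in kept) / len(kept)
```

Since (with unit noise variance) the theoretical autocovariances are γ_1 = θ_1(1 + θ_2) and γ_2 = θ_2, these two summaries identify (θ_1, θ_2) inside the triangle, and the kept draws cluster around the true value.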
  • Homonymy. The ABC algorithm is not to be confused with the other ABC algorithm: "The Artificial Bee Colony algorithm is a swarm-based meta-heuristic algorithm that was introduced by Karaboga in 2005 for optimizing numerical problems. It was inspired by the intelligent foraging behavior of honey bees. The algorithm is specifically based on the model proposed by Tereshko and Loengarov (2005...