Reproducibility and Stan

Aki Vehtari, Aalto University
and Stan development team

with special thanks to Andrew Gelman, Ben Goodrich, Bob Carpenter, Breck Baldwin, Daniel Lee, Daniel Simpson, Eric Novik, Jonah Gabry, Lauren Kennedy, Michael Betancourt, Rob Trangucci, Sean Talts

Funding acknowledgement: DARPA agreement D17AC00001, Academy of Finland grant 313122

Aki.Vehtari@aalto.fi – @avehtari – mc-stan.org




Stan - probabilistic programming framework

Language, inference engine, user interfaces, documentation, case studies, diagnostics, packages, ...

More than ten thousand users in social, biological, and physical sciences, medicine, engineering, and business

Several full time developers, 34 in dev team, more than 100 contributors

R, Python, Julia, Scala, Stata, Matlab, command line interfaces

More than 50 R packages using Stan

Hamiltonian Monte Carlo / No-U-Turn-Sampling, ADVI, optimization


Reproducibility and Stan

1) Reproducibility of StanCon contributed talks

2) Reproducibility of Stan

3) Validation of Bayesian inference code


• StanCon conference contributed oral submissions:
  - Notebook submission
  - On web page: notebook, link to a code repo, slides
  - DOI via Zenodo

• 50% of invited speakers have also provided the code


Reproducibility of notebooks


Reproducibility of R session


• Checkpoint handles CRAN, but not git repos

• Checkpoint doesn’t handle external libraries and compiler environment


Reproducibility of Stan

• There is more code for tests than algorithms

• Continuous integration tests

• Regression tests


Time and machine independent reproducibility of stochastic computation

• In case of MCMC, even small differences in floating point arithmetic can lead to very different Markov chains

• To freeze everything, containers could be used...
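The floating point sensitivity mentioned above can be illustrated with a minimal Python sketch. It is not taken from the talk: the function names and the three example terms are hypothetical. It only shows that summing the same terms in a different order (as a different compiler, library, or machine might do) can change the last bits of a result such as a log density. In MCMC such a difference can flip one accept/reject decision, after which the chains follow different paths.

```python
import math

def sum_left_to_right(terms):
    """Accumulate the terms from first to last."""
    total = 0.0
    for t in terms:
        total += t
    return total

def sum_right_to_left(terms):
    """Accumulate the same terms from last to first."""
    total = 0.0
    for t in reversed(terms):
        total += t
    return total

# Hypothetical stand-ins for three log-density contributions.
terms = [0.1, 0.2, 0.3]

forward = sum_left_to_right(terms)
backward = sum_right_to_left(terms)

# Mathematically identical sums, but floating point addition is not associative.
print("bitwise equal:", forward == backward)
print("equal up to rounding:", math.isclose(forward, backward))
```

Comparing results with a tolerance, as `math.isclose` does here, is therefore a more realistic check across machines than bitwise equality.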


Reproducibility of inference algorithms

• If you make changes to your existing code, or if you implement an algorithm from some paper
  - how do you know the code is correct?

• Stochastic algorithms and differences in floating point arithmetic make bitwise comparison of results impossible


Is your code producing draws from the posterior distribution?

• For simple models with analytic marginal posterior distributions, testing is easier

• Picking some true parameter values, generating a single data set from the model, and checking whether the true parameter values are inside some interval is not enough!
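For the simple analytic case, a check could look like the following Python sketch. Everything in it is an assumption made for illustration and not part of the talk: the conjugate model (prior θ ~ Normal(0, 1), observations y_i ~ Normal(θ, 1)), the small data set, and the random-walk Metropolis sampler standing in for "your code". The posterior is then Normal(sum(y) / (n + 1), 1 / sqrt(n + 1)), so the sampler's mean and standard deviation can be compared against exact values.

```python
import math
import random
import statistics

def log_posterior(theta, y):
    """Unnormalised log posterior: Normal(0, 1) prior, Normal(theta, 1) likelihood."""
    log_prior = -0.5 * theta ** 2
    log_lik = -0.5 * sum((yi - theta) ** 2 for yi in y)
    return log_prior + log_lik

def metropolis(y, n_iter=20000, step=1.0, seed=1):
    """Random-walk Metropolis, a stand-in for the code under test."""
    rng = random.Random(seed)
    theta = 0.0
    lp = log_posterior(theta, y)
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        lp_prop = log_posterior(proposal, y)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        draws.append(theta)
    return draws

def analytic_posterior(y):
    """Exact posterior mean and sd for this conjugate model."""
    n = len(y)
    return sum(y) / (n + 1), 1.0 / math.sqrt(n + 1)

y_obs = [1.2, 0.8, 1.5, 0.3, 1.1]          # hypothetical data
draws = metropolis(y_obs)[2000:]           # discard warm-up draws
exact_mean, exact_sd = analytic_posterior(y_obs)
print("mean: sampler", statistics.fmean(draws), "exact", exact_mean)
print("sd:   sampler", statistics.stdev(draws), "exact", exact_sd)
```

The comparison uses a tolerance, because Monte Carlo error makes exact agreement impossible. A check like this covers one data set only, which is why a more general calibration procedure is needed for models without analytic posteriors.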


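Validating Bayesian Inference Algorithms with Simulation-Based Calibration
Talts, Betancourt, Simpson, Vehtari, Gelman

• Sample a ground truth from the prior, $\tilde{\theta} \sim \pi(\theta)$

• Sample data from the corresponding data generating process, $\tilde{y} \sim \pi(y \mid \tilde{\theta})$

• Sample from the posterior $\pi(\theta \mid \tilde{y})$ using your algorithm

Then the data-averaged posterior equals the prior:

$$\pi(\theta) = \int \pi(\theta \mid \tilde{y})\, \pi(\tilde{y} \mid \tilde{\theta})\, \pi(\tilde{\theta})\, \mathrm{d}\tilde{y}\, \mathrm{d}\tilde{\theta}$$

• With a finite number of posterior draws, analyse the uniformity of the rank statistic, that is, the rank of $\tilde{\theta}$ among the posterior draws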
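The steps above can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the model (prior θ ~ Normal(0, 1), one observation y ~ Normal(θ, 1), exact posterior Normal(y / 2, sqrt(1/2))), the function names, and the chi-square summary are all chosen here for the example. `exact_sampler` plays the role of a correct algorithm, and `too_narrow_sampler` is a deliberately broken one whose posterior is too concentrated.

```python
import random

def sbc_ranks(posterior_sampler, n_sims=2000, n_draws=9, seed=1):
    """Return one rank statistic per simulated (theta, y) pair."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        theta_true = rng.gauss(0.0, 1.0)      # ground truth from the prior
        y = rng.gauss(theta_true, 1.0)        # data from the data generating process
        draws = [posterior_sampler(y, rng) for _ in range(n_draws)]
        # Rank of the prior draw among the posterior draws: 0 .. n_draws.
        ranks.append(sum(d < theta_true for d in draws))
    return ranks

def exact_sampler(y, rng):
    """Independent draws from the exact posterior Normal(y / 2, sqrt(1/2))."""
    return rng.gauss(y / 2.0, 0.5 ** 0.5)

def too_narrow_sampler(y, rng):
    """Deliberately wrong: posterior sd is half of what it should be."""
    return rng.gauss(y / 2.0, 0.5 * 0.5 ** 0.5)

def chi_square_uniform(ranks, n_bins):
    """Chi-square distance of the rank counts from a uniform histogram."""
    expected = len(ranks) / n_bins
    counts = [0] * n_bins
    for r in ranks:
        counts[r] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

n_bins = 10  # ranks 0..9 with 9 posterior draws per simulation
print("exact sampler:     ", chi_square_uniform(sbc_ranks(exact_sampler), n_bins))
print("too narrow sampler:", chi_square_uniform(sbc_ranks(too_narrow_sampler), n_bins))
```

With the exact sampler the ranks are uniform and the statistic stays small. With the too narrow sampler the prior draw falls outside the posterior draws too often, so ranks pile up at the extremes and the statistic becomes large. The posterior draws here are independent. With MCMC output, the draws would need to be thinned to be approximately independent before computing ranks.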


All good

[Figure: distribution of the rank statistic; x-axis "Rank Statistic", 0 to 10]


Overdispersed relative to the prior

[Figure: densities f(θ) of the prior and the computed data-averaged posterior, with the corresponding rank statistic plot; x-axis "Rank Statistic", 0 to 10]


Biased relative to the prior

[Figure: densities f(θ) of the prior and the computed data-averaged posterior, with the corresponding rank statistic plot; x-axis "Rank Statistic", 0 to 10]


SBC

• Can be used to check algorithms and their implementations

• Works for MCMC and distributional approximations like INLA and VI

• Reference
  - Talts, Betancourt, Simpson, Vehtari, and Gelman (2018). Validating Bayesian Inference Algorithms with Simulation-Based Calibration. arXiv:1804.06788.


Bonus: Garden of forking paths

• Instead of picking the best result, integrate over the paths
  - you can drop only paths with practically zero contribution to the integral

• Instead of selecting a model by computing a model selection criterion independently for each model, condition the selection on the integral over all the models; see, e.g.,
  - Vehtari, A., & Ojanen, J. (2012). A survey of Bayesian predictive methods for model assessment, selection and comparison. Statistics Surveys, 6, 142-228.
  - Piironen, J., & Vehtari, A. (2017). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27(3), 711-735.
