Introduction Bayesian Stats About Stan Examples Tips and Tricks
Introduction to Stan
Cameron Bracken
University of Colorado Boulder
February 2015
What is Stan?
“A probabilistic programming language implementing full Bayesian statistical inference with MCMC sampling (NUTS, HMC) and penalized maximum likelihood estimation with optimization (L-BFGS)”
!"#$%&'()*+,
!"#$%&'( )$"( "%#*+"')( ,-./0"1)",( /'"( -2( #%1,-0( '%0&*+13( )-( '-*4"( %0%)$"0%)+.%*(&#-5*"0(6%'()$%)(-2(7-0&)"(,"(8/22-1( +1(9::;<(=1()$"(2-**-6+13(1">)()6-."1)/#+"'?()$+'()".$1+@/"($%,(%(1/05"#(-2(-)$"#(/'"'<((=1()$"(9ABC'?(D1#+.-(E"#0+(/'",(+))-('-*4"(&#-5*"0'( +1(1"/)#-1(&$F'+.'?(%*)$-/3$($"(1"4"#(&/5*+'$",($+'( #"'/*)'<( ( =1(G-'H*%0-'(,/#+13(I-#*,(I%#(==?(E"#0+(%*-13(6+)$(J)%1(K*%0?(L-$1(4-1(M"/0%11?(M+.$-*%'N")#-&-*+'?(%1,(-)$"#'(,+'./''",()$"(%&&*+.%)+-1(-2()$+'(')%)+')+.%*('%0&*+13()".$1+@/"()-)$"( &#-5*"0'( )$"F( 6"#"( 6-#O+13( -1<( ( K*%0( &-+1)",( -/)( )$"( /'"( -2( "*".)#-0".$%1+.%*.-0&/)"#'( )-( -4"#.-0"( )$"( *-13( %1,)",+-/'( 1%)/#"( -2( )$"( .%*./*%)+-1'?( %1,N")#-&-*+'(1%0",( )$+'(&#"4+-/'*F(/11%0",)".$1+@/"(PN-1)"(7%#*-Q(%2)"#(K*%0R'(/1.*"6$-( 5-##-6",( 0-1"F( 2#-0( #"*%)+4"'5".%/'"($"(ST/')($%,()-(3-()-(N-1)"(7%#*-QU)$"(3%05*+13(.%'+1-V<
W1( N%#.$( 99?( 9AX:?( L-$1( 4-1M"/0%11('"1)(%(*"))"#(UY+.$)0F"#?(9AX:V()-)$"( Z$"-#")+.%*( [+4+'+-1( *"%,"#( &#-&-'+13)$"(/'"(-2()$+'()".$1+@/"(-1(DM=H7()-('-*4"1"/)#-1( ,+22/'+-1( %1,( 0/*)+&*+.%)+-1&#-5*"0'<( ( Z$+'( 6%'( )$"( 2+#')( &#-&-'%*( )-/'"( )$"( N-1)"( 7%#*-( )".$1+@/"( -1( %1"*".)#-1+.( ,+3+)%*( .-0&/)"#<( ( H*'-( +1( 9AX:?D1#+.-( E"#0+( $%,( EDYN=H7( UE+3/#"( 9V?( %0".$%1+.%*( %1%*-3( .-0&/)"#?( &#-3#%00",)-(#/1(N-1)"(7%#*-(&#-5*"0'<(( =1(9AX\?()$"2+#')( #/1'( -1( %,+3+)%*( .-0&/)"#)--O( &*%."( -1DM=H7( UE+3/#"( ;V<=1( )$"( *%)"( 9AXC'%1,( "%#*F( 9A]C'?0%1F(&%&"#'(6"#"6#+))"1( ,"'.#+5+13)$"( N-1)"( 7%#*-0")$-,( %1,( +)'/'"( +1( '-*4+13&#-5*"0'( +1#%,+%)+-1( %1,&%#)+.*"( )#%1'&-#)%1,( -)$"#( %#"%'<Z$"( 2+#')( -&"1N-1)"( 7%#*-.-12"#"1."( 6%'$"*,( %)( K7GH( +1)$"( '/00"#( -29AXA<( ( N%1F( -2( )$-'"( 0")$-,'( %#"( ')+**( +1( /'"( )-,%F( +1.*/,+13( )$"( #%1,-0( 1/05"#3"1"#%)+-1(0")$-,(/'",(+1(N7M!<
-'./+0%12%%3#45"
-'./+0%62%%7)89%:;8<%&*;='9.%-3>!45"
“Stanislaw Ulam, namesake of Stan and co-inventor of Monte Carlo methods, shown here holding the Fermiac, Enrico Fermi's physical Monte Carlo simulator for neutron diffusion.” (image from the Stan manual)
Bayesian Statistics
By Bayesian data analysis, we mean practical methods for making inferences from data using probability models for quantities we observe and about which we wish to learn.

The essential characteristic of Bayesian methods is their explicit use of probability for quantifying uncertainty in inferences based on statistical analysis.
[Gelman et al., Bayesian Data Analysis, 3rd edition, 2013]
Background on Bayesian Statistics
From Bayes’ rule, supposing the data is fixed (observed):
p(θ|y) = p(y, θ) / p(y)
       = p(y|θ) p(θ) / p(y)
       = p(y|θ) p(θ) / ∫ p(y, θ) dθ
       = p(y|θ) p(θ) / ∫ p(y|θ) p(θ) dθ
p(θ|y) ∝ p(y|θ)p(θ) = p(y, θ)
- Everyone: model data as random
- Bayesians: model parameters as random
Background on Bayesian Statistics

From Bayes' rule:
p(θ|y, x) ∝ p(y|θ, x) × p(θ|x)
Posterior ∝ Likelihood × Prior

θ  Parameters
y  Dependent data (response)
x  Independent data (covariates/predictors/constants)

Posterior: the answer, probability distributions of the parameters
Likelihood: a computable function of the parameters, model specific
Prior: an "initial guess" that incorporates existing knowledge of the system
The key to building Bayesian models is specifying the likelihood function, just as in frequentist statistics.
Markov chain Monte Carlo (MCMC) in a nutshell
- We want to generate random draws from a target distribution (the posterior). We then identify a way to construct a 'nice' Markov chain whose equilibrium distribution is our target distribution.
- If we can construct such a chain, we start from an arbitrary point and iterate the Markov chain many times (like how we forecasted the weather n times). Eventually, the draws we generate will appear as if they are coming from our target distribution.
- There are several ways to construct 'nice' Markov chains (e.g., the Gibbs sampler, the Metropolis-Hastings algorithm).

(explanation from Cross Validated)

MCMC is really a way to solve integrals that are impossible to solve analytically.
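The 'equilibrium' requirement above can be written down precisely: the target density π (for us, the posterior) must be the stationary distribution of the chain's transition kernel K, i.e.

```latex
% Stationarity: if the current draw already follows the target
% density, one more transition step leaves the distribution unchanged.
\pi(\theta') = \int \pi(\theta)\, K(\theta' \mid \theta)\, d\theta
```

Samplers like Metropolis-Hastings guarantee this by construction (via detailed balance), which is why the draws eventually look like draws from the posterior.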
What does Stan do?
- Samples from the posterior distribution (if your model is specified correctly)
- "Fits" Bayesian models
- Empowers you to write your own Bayesian models; it's much easier than you think!

NUTS: the No-U-Turn Sampler, with automatic step-size and number-of-steps adaptation.
Why Stan?
There are tons of other “black-box” MCMC samplers out there (BUGS, JAGS, Church, PyMC, and many more: http://probabilistic-programming.org/wiki/Home)

- Stan is open source
- Built to be fast (about 10 times faster than BUGS, according to Gelman)
- “Stan can handle problems that choke BUGS and JAGS” – Andrew Gelman
Using Stan
Stan is a library with a number of interfaces; we will use the R interface, called RStan.
Stan model code (.stan) + setup script (.R)
  → RStan → Stan C++ code → compiler
  → model run (with data) → results
Example 1 – Fit Normal Distribution – Model code

Download from: http://bechtel.colorado.edu/~bracken/stan/example_models.zip
Example files in 1-normal: normal.stan/normal.R
data {
  int<lower=0> N; // error checking for N
  vector[N] y;
}
parameters {
  real<lower=0> sigma;
  real mu;
}
model {
  y ~ normal(mu, sigma); // vectorized
}
Example 1 – Fit Normal Distribution – RStan code
library(rstan)
model_file = 'normal.stan'
iterations = 500
N = 1000
mu = 100
sigma = 10
y = rnorm(N, mu, sigma)  # simulate data
stan_data = list(N=N, y=y)  # data passed to stan

# set up (compile) the model
stan_model = stan(model_file, data = stan_data, chains = 0)
# run the model
stanfit = stan(fit = stan_model, data = stan_data, iter = iterations)
print(stanfit, digits = 2)
Diagnostics - Text output
Inference for Stan model: normal.
4 chains, each with iter=500; warmup=250; thin=1;
post-warmup draws per chain=250, total post-warmup draws=1000.

          mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
sigma     9.84    0.01 0.19     9.47     9.72     9.84     9.98    10.23   366 1.01
mu       99.68    0.01 0.31    99.08    99.48    99.69    99.88   100.30   617 1.00
lp__  -2785.83    0.04 0.87 -2788.07 -2786.19 -2785.58 -2785.19 -2784.97   395 1.00

Samples were drawn using NUTS(diag_e) at Thu Feb 19 21:04:56 2015.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
- thin, warmup
- n_eff
- R̂ (Rhat)
- lp__
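For reference, R̂ (the potential scale reduction factor) compares the within-chain variance W to the between-chain variance B over m chains of length n (Gelman et al., Bayesian Data Analysis):

```latex
% Pooled variance estimate and potential scale reduction factor
\widehat{\operatorname{var}}^{+}(\theta \mid y) = \frac{n-1}{n}\,W + \frac{1}{n}\,B,
\qquad
\hat{R} = \sqrt{\frac{\widehat{\operatorname{var}}^{+}(\theta \mid y)}{W}}
```

If the chains have mixed, B and W agree and R̂ → 1; values well above 1 signal non-convergence.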
Diagnostics - Traceplots
[Figure: traceplots of sigma, mu, and lp__ over 500 iterations]
Diagnostics - Posterior plots
[Figure: posterior histograms of extract(stanfit)$mu, extract(stanfit)$sigma, and extract(stanfit)$lp__]
Diagnostics - shinyStan
Old package (don't use it): https://github.com/jgabry/SHINYstan
Changing to: https://github.com/shinyStan/shinyStan

library("shinyStan")
launch_shinystan(stanfit)
Convergence - Traceplots
Converged:
[Figure: traceplots of beta_space_loc[1] through beta_space_loc[5] over 2000 iterations]
Not Converged:
[Figure: traceplots of gp_knot_value[1] through gp_knot_value[8] over 1000 iterations]
Convergence - Posterior plots
Converged:
[Figure: posterior histograms of beta_space_loc_01 through beta_space_loc_05]
Not Converged:
[Figure: posterior histograms of beta_space_loc_01 through beta_space_loc_05]
Example 2 – Fit normal distribution (fancier)
yn ∼ Normal(µ, σ)
The likelihood of observing a normally distributed data value is the normal density of that point given the parameter values.
Example files in 2-normal: normal2.stan/normal2.R
- priors
- initial values
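Stan lets you state priors explicitly in the model block. Here is a sketch of what a "fancier" normal model could look like (the actual normal2.stan may differ; the specific priors chosen below are illustrative assumptions, not the author's):

```stan
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // explicit, weakly informative priors (illustrative choices)
  mu ~ normal(0, 100);
  sigma ~ cauchy(0, 25);  // half-Cauchy via the lower=0 constraint
  y ~ normal(mu, sigma);
}
```

Initial values are supplied from the R side (e.g., via the init argument to stan()), not in the model file.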
Example 3 – fit GEV distribution
Generalized extreme value distribution
y ∼ GEV(µ, σ, ξ)
µ: location, σ: scale, ξ: shape
Example files in 3-gev: gev.stan/gev.R
- random seed
- constraints
- variable constraints
- initial values
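Stan has no built-in GEV distribution, so a GEV fit needs a user-defined log density. The sketch below is an assumption about how such a model might look (the actual gev.stan may differ); it shows the ξ ≠ 0 branch only, and illustrates why the "variable constraints" bullet matters: the support of the GEV itself depends on the parameters.

```stan
functions {
  // GEV log density for xi != 0 (hypothetical helper, not built in):
  // z_n = 1 + xi*(y_n - mu)/sigma must be positive for every y_n
  real gev_lpdf(vector y, real mu, real sigma, real xi) {
    int N = num_elements(y);
    vector[N] z = 1 + xi * (y - mu) / sigma;
    if (min(z) <= 0)
      reject("y outside GEV support for these parameters");
    return -N * log(sigma)
           - (1 + 1 / xi) * sum(log(z))
           - sum(exp(-log(z) / xi));  // elementwise z^(-1/xi)
  }
}
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;              // location
  real<lower=0> sigma;  // scale
  real xi;              // shape
}
model {
  y ~ gev(mu, sigma, xi);  // uses gev_lpdf above
}
```

In practice the support condition is better handled with parameter-dependent bounds (or careful initial values) than with reject(), which only makes the sampler back away from invalid proposals.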
Example 4 - multiple linear regression
yn = α + βxn + εn
where εn ∼ Normal(0, σ), which can be written:
yn − (α + βxn) ∼ Normal(0, σ)
yn ∼ Normal(α + βxn, σ)
The likelihood of observing a given yn is just the normal density with mean α + βxn and standard deviation σ.

Example files in 4-multiple-linear-regression: mlr.stan/mlr.R
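A regression model of this form might be sketched in Stan as follows (an illustrative sketch, not necessarily the contents of mlr.stan; the names N, K, X, y are assumptions):

```stan
data {
  int<lower=0> N;   // number of observations
  int<lower=0> K;   // number of predictors
  matrix[N, K] X;   // predictor matrix
  vector[N] y;      // response
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  // vectorized likelihood: mean is alpha + X * beta for every row
  y ~ normal(alpha + X * beta, sigma);
}
```

With flat (default) priors, the posterior mode here coincides with the ordinary least-squares fit.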
Example 5 - Binomial regression (glm)
g(E[yn]) = α + βxn

yn ∼ Bernoulli(g−1(α + βxn)), with g the logit link
Example files in 5-binomial-logit: binomial-logit.stan/binomial-logit.R

- Hierarchical
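A minimal logistic-regression sketch using Stan's built-in bernoulli_logit, which applies the inverse-logit link internally (illustrative; not necessarily the contents of binomial-logit.stan):

```stan
data {
  int<lower=0> N;
  vector[N] x;
  int<lower=0, upper=1> y[N];  // binary outcomes
}
parameters {
  real alpha;
  real beta;
}
model {
  // equivalent to y[n] ~ bernoulli(inv_logit(alpha + beta * x[n])),
  // but vectorized and numerically more stable
  y ~ bernoulli_logit(alpha + beta * x);
}
```

The hierarchical version would add group-level parameters (e.g., a varying intercept per group) with their own priors.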
Stan tips and tricks

#1 tip: Read the Manual! It is excellent.

Other things we didn't really talk about:

- Local variables in the model block, can be used to store intermediate results
- Matrices vs arrays, column vectors vs row vectors
- Constrained data types
- Transformed parameters
- Functions
- Logical operations / other types of looping
- Elementwise operators
- Built-in functions
- Print statements
- Missing data
- Prediction
- Discrete variables
Stan tips and tricks

No need to truncate priors; do that in the parameter bounds.

- BAD: setting constraints on parameters but using a prior with other constraints

parameters {
  real alpha; // implies no constraints
}
model {
  alpha ~ uniform(0,1);
}

- GOOD:

parameters {
  real<lower=0,upper=1> alpha;
}
model {
  // alpha ~ uniform(0,1); // default uniform priors
}
Stan tips and tricks
- No need to use conjugate priors
- Unlike BUGS (or other Gibbs-based samplers), avoid super vague priors if you can, e.g. inv_gamma(0.1,0.1)
- When in doubt, use a normal prior, or google it
- The Stan mailing list is very active
Speeding up Stan models

- Avoid repeated operations

// 1/alpha is repeated
for (n in 1:N)
  y[n] ~ exponential(1/alpha * x[n]);

- Vectorization is always faster

// not vectorized
for (n in 1:N)
  y[n] ~ normal(beta0 + beta1 * x[n], sigma);

// vectorized
y ~ normal(beta0 + beta1 * x, sigma);
- Priors: the more informative the better (think better initial conditions); use MLE to get initial estimates
- Parallelization: you can run multiple chains if you have multiple cores, but each chain is still serial
- More advanced: access increment_log_prob directly
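The repeated-operation snippet above can be repaired with a local variable plus vectorization; a minimal sketch, assuming exponentially distributed y with rate proportional to x (names are illustrative):

```stan
data {
  int<lower=0> N;
  vector<lower=0>[N] x;
  vector<lower=0>[N] y;
}
parameters {
  real<lower=0> alpha;
}
model {
  // compute the repeated term once, outside the likelihood
  real inv_alpha = 1 / alpha;
  y ~ exponential(inv_alpha * x);  // vectorized rate
}
```

The division now happens once per leapfrog step instead of N times, and the vectorized sampling statement lets Stan share work across observations.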