MARKOV CHAIN MONTE CARLO METHODS
Marko Laine, FMI
Inverse Problems Summer School, Helsinki 2019
TABLE OF CONTENTS
- Uncertainties in modelling
- Parameter estimation
- Markov chain Monte Carlo – MCMC
- MCMC in practice
- Some MCMC theory
- Adaptive MCMC methods
- Other MCMC variants and implementations
- Example: dynamical state space models and MCMC
- Exercises
UNCERTAINTIES IN MODELLING
First we introduce some basic concepts related to different sources of uncertainties in modelling and tools to quantify uncertainty. We start with a linear model with known properties.
INTRODUCTION
Consider a simple regression problem, where we are interested in modelling the systematic part behind the noisy observations. In addition to the best fitting model, we need information about the uncertainty in our estimates. For linear models, we have classical statistical theory giving formulas for the uncertainties, depending on the assumptions on the nature of the noise.
NON-LINEAR MODELS
For non-linear models, or high dimensional linear models, the situation is harder. Simulation based analysis, such as Markov chain Monte Carlo, provides remedies. If we are able to sample realizations from our model while perturbing the input, we can assess the sensitivity of the model output to the input. The Bayesian statistical paradigm allows handling of all uncertainties by a unified framework.
STATISTICAL ANALYSIS BY SIMULATION
The uncertainty distribution of the model parameter vector $\theta$ given the observations $y$ and the model $M$ is $p(\theta|y, M)$. This distribution is typically analytically intractable. But we can simulate observations from $p(y|\theta, M)$. Statistical analysis is used to define what is a good fit. Parameters that are consistent with the data and the modelling uncertainty are accepted.
MARKOV CHAIN MONTE CARLO – MCMC
Simulate the model while sampling the parameters from a proposal distribution. Accept (or weight) the parameters according to a suitable goodness-of-fit criterion depending on prior information and error statistics defining the likelihood function. The resulting chain is a sample from the Bayesian posterior distribution of parameter uncertainty.
POSTERIOR DISTRIBUTIONS
While sampling the model using MCMC, we get:
- Posterior distribution of model parameters.
- Posterior distribution of model predictions.
- Posterior distribution for model comparison.

In many inverse problems, model parameters are of secondary interest. We are mostly interested in model-based predictions of the state. Usually it is even enough to be able to simulate an ensemble of possible realizations that correspond to the prediction uncertainty.
EXAMPLE: UNCERTAINTY IN NUMERICAL WEATHER PREDICTIONS
European Centre for Medium-Range Weather Forecasts (ECMWF) runs an ensemble of 50 forecasts with perturbed initial values. This is done twice a day to get a posterior distribution of the forecast uncertainty.
https://en.ilmatieteenlaitos.fi/weather/helsinki?forecast=long
GENERAL MODEL
Here we will mostly consider a modelling problem in very general form
$$ y = f(x, \theta) + \epsilon, \quad \text{i.e. observations} = \text{model} + \text{error}. $$
If we assume independent Gaussian errors $\epsilon \sim N(0, \sigma^2 I)$, the likelihood function corresponds to a simple quadratic cost function,
$$ p(y|\theta) \propto \exp\left\{-\frac{1}{2}\frac{\sum_i^n (y_i - f(x_i|\theta))^2}{\sigma^2}\right\} = \exp\left\{-\frac{1}{2}\frac{SS(\theta)}{\sigma^2}\right\}. $$
We can directly extend this to non-Gaussian likelihoods by defining $SS(\theta) = -2\log(p(y|\theta))$, the log-likelihood in "sum-of-squares" format. For calculating the posterior, we also need to account for $S_{pri}(\theta) = -2\log(p(\theta))$, the prior "sum-of-squares".
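The cost function notation can be made concrete with a small sketch (in Python for illustration, although the course material uses Matlab; `model`, `xdata` and `ydata` are placeholders for the user's problem):

```python
# Sketch of the sum-of-squares cost SS(theta) and the corresponding
# Gaussian -2 log-likelihood for the general model y = f(x, theta) + eps.
# `model`, `xdata`, `ydata` are hypothetical placeholders.

def sum_of_squares(theta, xdata, ydata, model):
    """SS(theta) = sum_i (y_i - f(x_i, theta))^2."""
    return sum((y - model(x, theta)) ** 2 for x, y in zip(xdata, ydata))

def neg2_log_likelihood(theta, xdata, ydata, model, sigma2):
    """-2 log p(y|theta) = SS(theta) / sigma^2, up to an additive constant."""
    return sum_of_squares(theta, xdata, ydata, model) / sigma2
```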
SOURCES OF UNCERTAINTIES IN MODELLING

| uncertainty | source | methods |
|---|---|---|
| Observation | instrument noise, sampling, representation, retrieval | sampling design, retrieval method |
| Parameter | estimation, calibration, tuning | optimal estimation, MCMC |
| Model formulation | approximate physics, numerics, resolution, sub-grid scale processes | model diagnostics, model selection, averaging, Gaussian processes |
| Initial value | state space models | Kalman filter, assimilation |
PARAMETER ESTIMATION
Some remarks on different estimation paradigms, before we go fully Bayesian.
PARAMETER ESTIMATION
When considering the problem of parameter estimation we basically have two alternative methodologies: classical least squares estimation and the Bayesian approach. When the model is non-linear or the error distribution is non-Gaussian, we need simulation based or iterative numerical methods for estimation. With MCMC we apply Bayesian reasoning and get as a result a sample from the distribution that describes the uncertainty in the parameters. Uncertainty in the estimates, together with error in the observations, causes uncertainty in the model predictions. Monte Carlo methods allow ways to simulate model predictions while taking into account the uncertainty in the parameters and other input variables.
A NOTE ON NOTATION
The symbol $\theta$ stands for the unknown parameter to be estimated in the basic model equation $y = f(x; \theta) + \epsilon$. This is common in statistical literature. However, statistical inverse problem literature is usually concerned with estimating the unknown state of the system, which is typically denoted by $x$. In statistical terms, we are dealing with the same problem. However, in the state estimation problem there are usually specific things to take care of, such as the discretization of the model. Especially when we are following the Bayesian paradigm, all uncertainties, whether they concern the state of the system, the parameters of the model, or the uncertainty in the prior knowledge, are treated in a unified way.
EXAMPLE
Consider a chemical reaction $A \to B \to C$, modelled as an ODE system
$$ \frac{dA}{dt} = -k_1 A, \quad \frac{dB}{dt} = k_1 A - k_2 B, \quad \frac{dC}{dt} = k_2 B. $$
The data $y$ consists of measurements of the components $A, B, C$ at some sampling instants, and the $x$'s are the time instances $t_i$, $i = 1, 2, \ldots, n$, but $x$ could include other conditions, such as temperature. The unknowns to be estimated are the rate constants $\theta = (k_1, k_2)$, and perhaps some initial conditions. The model function $f(x, \theta)$ returns the solution of the above equations, perhaps using some numerical ODE solver.
ESTIMATION PARADIGMS
An estimate of a parameter is a value calculated from the data that tries to be as good an approximation of the unknown true value as possible. For example, the sample mean is an estimate of the mean of the distribution that is generating the numbers. There are several ways of defining optimal estimators: least squares, maximum likelihood, minimum loss, Bayes estimators, etc. An estimator must always be accompanied by an estimate of its uncertainty. Basically, there are two ways of considering the uncertainties: frequentist (sampling theory based) and Bayesian.
ESTIMATION PARADIGMS
Frequentist, sampling theory based uncertainty considers the sampling distribution of the estimator when we imagine independent replications of the same observation generating procedure under identical conditions. In Bayesian analysis the information on the uncertainty about the parameters is contained in the posterior distribution calculated according to the Bayes rule.
EXAMPLE
If we have independent observations $y_i \sim N(\theta, \sigma^2)$, $i = 1, \ldots, n$, from a normal distribution (assume $\sigma^2$ known), we know that the sample mean $\bar{y}$ is a minimum variance unbiased estimator for $\theta$ and it has sampling distribution
$$ \bar{y} \sim N(\theta, \sigma^2/n). $$
This can be used to construct the usual confidence intervals for $\theta$.
EXAMPLE (CONT.)
Bayesian inference assumes that we can directly talk about the distribution of the parameter (not just the distribution of an estimator) and use the Bayes formula to make inference about it. The 'drawback' is the necessary introduction of the prior distribution. If our prior information on $\theta$ is very vague, $p(\theta) = 1$, then after observing the data $y$ we have
$$ \theta \sim N(\bar{y}, \sigma^2/n), $$
and this distribution contains all the information about $\theta$ available to us.
MARKOV CHAIN MONTE CARLO – MCMC
Next we look in more detail at some specific MCMC algorithms and their uses.
MCMC METHODS
In principle, the Bayes formula solves the estimation problem in a fully probabilistic sense. The posterior distribution can be used for probability statements about the model unknowns. We can use the MAP or posterior mean as a point estimate, calculate probability limits (confidence regions), etc. However, some problems remain:
- How to define the prior distribution.
- How to calculate the integral of the normalizing constant.

The integration of the normalizing constant is a difficult task in high dimensional, non-linear cases (dimension higher than 2 or 3). The Bayesian approach has become truly accessible due to various Monte Carlo methods.
MCMC
Markov chain Monte Carlo (MCMC) algorithms generate a sequence of parameter values $\theta_1, \theta_2, \ldots$ whose empirical distribution approaches the posterior distribution. The generation of the vectors in the chain $\theta_n$, $n = 1, 2, \ldots$, is done by random numbers (Monte Carlo) in such a way that each new point $\theta_{n+1}$ may only depend on the previous point $\theta_n$ (Markov chain). Intuitively, a correct distribution of points is generated by favoring points with high values of the posterior $p(\theta|y)$. The chain may be used as if it were a sample from the posterior, and various conclusions concerning the model predictions may be based on mean values, variances, etc. computed from the chain. A simple but typical example of a Markov chain is a random walk $X_{n+1} = X_n + \epsilon_n$, where the random increment $\epsilon_n$ does not depend on $X_n, X_{n-1}, \ldots$.
HIGH DIMENSIONAL SPACES ARE VERY EMPTY
Things are more complicated in high dimensions and our intuition is easily fooled. The plot shows the volume of a hypersphere,
$$ \frac{2\pi^{d/2} r^d}{d\,\Gamma(d/2)}, $$
divided by the volume of a hypercube, $(2r)^d$. Random walk type methods are needed to explore the space of statistically significant probability. Otherwise we will always be lost at some distant corners.
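The plotted ratio is easy to compute directly; a small Python check (illustrative) shows how fast it collapses with the dimension $d$:

```python
# Volume of the d-ball of radius r, 2*pi^(d/2)*r^d / (d*Gamma(d/2)),
# divided by the volume (2r)^d of the enclosing hypercube.

import math

def ball_to_cube_ratio(d, r=1.0):
    vball = 2.0 * math.pi ** (d / 2) * r ** d / (d * math.gamma(d / 2))
    return vball / (2.0 * r) ** d
```

For $d = 2$ the ratio is $\pi/4 \approx 0.79$, for $d = 10$ it is already about $0.0025$, and it is independent of $r$.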
THE METROPOLIS-HASTINGS ALGORITHM
1. Choose initial value $\theta_0$ and proposal density $q$.
2. Using the current value of the chain $\theta_i$, propose a new value $\theta'$ using the proposal distribution $q(\theta_i, \cdot)$.
3. Generate a random number $u$ uniform on $[0, 1]$ and accept the new value if
$$ u \le \min\left\{1, \frac{\pi(\theta')\,q(\theta', \theta_i)}{\pi(\theta_i)\,q(\theta_i, \theta')}\right\}. $$
4. If accepted, set $\theta_{i+1} = \theta'$; if not, $\theta_{i+1} = \theta_i$.
5. Go to step 2 until enough values have been sampled.
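The steps above can be sketched in Python (illustrative, not from the course code; in practice one works on the log scale for numerical stability, and for a symmetric proposal the $q$ terms cancel):

```python
# Sketch of the Metropolis-Hastings steps, on the log scale.
# `logpost` returns log pi(theta) up to a constant, `propose` draws from
# q(theta, .), and `logq(a, b)` is the log proposal density of b given a
# (identically 0 for a symmetric proposal).

import math
import random

def metropolis_hastings(logpost, propose, logq, theta0, nsimu, seed=0):
    rng = random.Random(seed)
    theta = theta0
    chain = [theta]
    for _ in range(nsimu - 1):
        prop = propose(theta, rng)
        logalpha = (logpost(prop) + logq(prop, theta)
                    - logpost(theta) - logq(theta, prop))
        if math.log(rng.random()) <= min(0.0, logalpha):
            theta = prop            # accept
        chain.append(theta)         # on rejection the old value is repeated
    return chain
```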
RANDOM WALK METROPOLIS
Number 1 in the list of Top 10 Algorithms of the 20th Century.

chain(1,:) = oldpar;
for i = 2:nsimu % simulation loop
  newpar = oldpar + randn(1,npar)*R;
  newss  = ssfun(newpar,xdata,ydata);
  if newss < oldss || rand < exp(-0.5*(newss-oldss)/sigma2)
    chain(i,:) = newpar; % accept
    oldpar = newpar;
    oldss  = newss;
  else
    chain(i,:) = oldpar; % reject
  end
end
MCMC IN PRACTICE
How to do MCMC in practice.
NON-LINEAR MODEL FITTING
Consider a non-linear model describing observations $y$ by control variables $x$ and parameter vector $\theta$:
$$ y = f(x, \theta) + \epsilon, \quad \epsilon \sim N(0, \sigma^2 I). $$
In 'classical' theory we find the optimal $\theta$ by minimizing the sum of squares
$$ SS(\theta) = \sum_{i=1}^{n} (y_i - f(x_i, \theta))^2, $$
which leads to a non-linear minimization problem. Confidence regions for $\theta$ are usually obtained by linearizing the likelihood function and by asymptotic arguments. In the Bayesian approach we can get a similar fit by using a likelihood defined by $SS(\theta)$ augmented by suitable prior information, and the inference is done with the posterior distribution of $\theta$ obtained by MCMC sampling.
METROPOLIS-HASTINGS ALGORITHM
Random walk Metropolis-Hastings algorithm with Gaussian proposal distribution (and Gaussian likelihood).

Propose a new parameter value $\theta_{prop} = \theta_{curr} + \xi$, where $\xi \sim N(0, \Sigma_{prop})$ is drawn from the proposal distribution.

Accept $\theta_{prop}$ with probability
$$ \alpha(\theta_{curr}, \theta_{prop}) = 1 \wedge \exp\left\{ -\frac{1}{2}\frac{SS(\theta_{prop}) - SS(\theta_{curr})}{\sigma^2} - \frac{1}{2}\big(S_{pri}(\theta_{prop}) - S_{pri}(\theta_{curr})\big) \right\}. $$

For Gaussian likelihoods with scalar unknown $\sigma^2$ we can update it with a Gibbs draw from
$$ \sigma^{-2} \sim \Gamma\!\left(\frac{n_0 + n}{2},\ \frac{2}{n_0 S_0^2 + SS(\theta_{curr})}\right). $$
PRIOR INFORMATION
In the previous algorithm, we assumed that the prior distribution was given as $S_{pri}(\theta)$. In general $S_{pri}(\theta) = -2\log(p(\theta))$, which for standard distributions and independent components is easy to formulate.

If the prior is independent Gaussian, $\theta_i \sim N(\nu_i, \eta_i^2)$, $i = 1, \ldots, p$, then we have
$$ S_{pri}(\theta) = \sum_{i=1}^{p} \left(\frac{\theta_i - \nu_i}{\eta_i}\right)^2. $$

For a lognormal density $\theta_i \sim \log N(\log(\nu_i), \eta_i^2)$ we have
$$ S_{pri}(\theta) = \sum_{i=1}^{p} \left(\frac{\log(\theta_i/\nu_i)}{\eta_i}\right)^2. $$
OBSERVATION ERROR
In the previous example, we assumed a so-called conjugate prior for the error variance $\sigma^2$ with
$$ p(\sigma^{-2}) = \Gamma\!\left(\frac{n_0}{2},\ \frac{2}{n_0 S_0^2}\right). $$
This is useful as the conditional distribution $p(\sigma^{-2}|y, \theta)$ is known:
$$ p(\sigma^{-2}|y, \theta) = \Gamma\!\left(\frac{n_0 + n}{2},\ \frac{2}{n_0 S_0^2 + SS_\theta}\right). $$
And this allows us to sample a new value for $\sigma^2$ after every MH step by Gibbs sampling. The Matlab step required:

sigma2 = 1/gammar(1,1,(n0+n)./2,2./(n0*s20+oldss));

This prior distribution is also known as the inverse $\chi^2$ prior for $\sigma^2$.
GIBBS SAMPLING
In Gibbs sampling the target is updated componentwise (or in blocks), so that the proposal distribution is the conditional target distribution with respect to the other components:
$$ q_i(\theta_i | \theta_{i-}) = \pi(\theta_i | \theta_1, \theta_2, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_d). $$
We see that the acceptance probability is then always 1. The problem is the construction of the conditional distributions. If the one-dimensional conditional distribution is not known, it must be created approximately. This usually requires several evaluations of the likelihood to build an empirical version of the distribution. In some applications there are easy ways to construct these distributions, hierarchical linear models with conjugate priors being one typical example.
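A textbook Gibbs sampling illustration (not from the slides' applications): for a two-dimensional Gaussian target with correlation $\rho$, both full conditionals are known Gaussians, so each component can be drawn exactly and every draw is accepted.

```python
# Gibbs sampler sketch for a bivariate standard normal target with
# correlation rho: x | y ~ N(rho*y, 1 - rho^2) and symmetrically for y.

import random

def gibbs_bivariate_normal(rho, nsimu, seed=0):
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    s = (1.0 - rho * rho) ** 0.5   # conditional standard deviation
    chain = []
    for _ in range(nsimu):
        x = rng.gauss(rho * y, s)  # draw x from its full conditional
        y = rng.gauss(rho * x, s)  # draw y from its full conditional
        chain.append((x, y))
    return chain
```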
CONJUGACY
There was a trick in using the Gamma distribution in the sampling of the error variance before. When calculating the posterior we need to compute the product of likelihood and prior, $p(y|\theta)p(\theta)$, and consider it as a distribution for the parameter $\theta$. The likelihood itself is not generally a density with respect to $\theta$. However, it can have a form that can be scaled to be a density, and with a suitable prior the product will also retain this form.

For example, the Gaussian distribution considered as a function of $\tau = 1/\sigma^2$ is similar to a Gamma distribution:
$$ \frac{1}{\sqrt{\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \propto \tau^{\alpha-1} e^{-\beta\tau}. $$
WHY CONJUGACY
Conjugacy is many times just a computational aid that is used because one does not want to code more realistic alternatives. However, it sometimes comes from the idea that parameters and data arise from the same kind of reasoning. If prior and posterior have the same form, the prior can be thought of as arising from (imaginary) data by directly observing the parameter. Also, conjugacy is the basis for simple textbook demos of Bayesian analysis (examples in nornor.m, sigmaprior.m, binbeta).
EXAMPLE: NORMAL LIKELIHOOD, NORMAL PRIOR
Gaussian likelihood with Gaussian prior.

nobs = 10; xmean = 1.0; obssigma2 = 0.5^2;
priormu = 0; priorsigma2 = 0.2^2;
x = linspace(-2,2,200);
prior = norpf(x,priormu,priorsigma2);
postmu = (priormu/priorsigma2 + nobs*xmean/obssigma2) / ...
         (1/priorsigma2 + nobs/obssigma2);
postsigma2 = 1/(1/priorsigma2 + nobs/obssigma2);
posterior = norpf(x,postmu,postsigma2);
postuninf = norpf(x,xmean,obssigma2/nobs);
plot(x,prior,x,posterior,x,postuninf)
hline([],xmean);
legend({'prior','posterior','posterior (noninf)','obs'},'Location','best')
title('Gaussian likelihood with Gaussian prior for \mu')
EXAMPLE: INV-$\chi^2$
Inverse $\chi^2$ as prior for $\sigma^2$.

% prior parameters
n0 = 1; s20 = 5;
% observed sum-of-squares
n = 40; ss = 10*n;
x = linspace(0.01,20,200);
plot(x,invchipf(x,n0,s20),...
     x,invchipf(x,n+n0,(n0*s20+ss)/(n0+n)))
hline([],ss/n);
legend({'prior','posterior','obs'})
title('Inverse \chi^2 prior for \sigma^2')
EXAMPLE: BETA-BINOMIAL
Binomial likelihood with Beta prior.

n = 10;  % number of tries
y = 1;   % number of successes
p = linspace(0,1);
a = 3; b = 3;  % prior parameters
prior = betapf(p,a,b);           % prior
ptas  = betapf(p, y+a, n-y+b);   % posterior
plot(p,ptas,p,prior)
hline([],y/n);
legend({'posterior','prior','obs'},'Location','best')
title('Binomial likelihood with Beta prior for p')
CHAIN CONVERGENCE
Monte Carlo methods have error in the estimates that behaves like $1/\sqrt{n}$, where $n$ is the number of simulations, or in general, available resources or CPU cycles. Non-stochastic numerical methods have much better convergence results, but generally there are no usable direct numerical methods for high dimensions (> 3-4). In addition to slow convergence, the MCMC chain has correlation between simulated values, which affects the Monte Carlo error of the estimates calculated from the chain.
SERIAL AUTOCORRELATION
INTEGRATED AUTOCORRELATION TIME
The quantity
$$ \tau = 1 + 2 \sum_{k=1}^{\infty} \rho(k) $$
is called the integrated autocorrelation time. It tells the increase of variance of sample based estimates due to autocorrelation. It is the price to pay for not having an i.i.d. sample.

Function iact.m in the MCMC toolbox estimates $\tau$ using a method due to Sokal. An alternative method for estimating the Monte Carlo error is by batch mean standard error, functions bmstd.m and chainstats.m.
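A simplified Python sketch of such an estimator is below. The adaptive windowing here is a crude variant of Sokal's rule, not the exact iact.m implementation; for nearly independent samples it returns values near 1 (and can even drop below 1 for antithetic chains), for sticky chains much larger values.

```python
# Integrated autocorrelation time from the sample autocorrelations rho(k),
# truncating the sum with a self-consistent window k >= c * tau (Sokal-style).

def iact(x, c=5):
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0:
        return 1.0
    tau = 1.0
    for k in range(1, n):
        rho = sum((x[i] - mean) * (x[i + k] - mean)
                  for i in range(n - k)) / (n * var)
        tau += 2.0 * rho
        if k >= c * tau:   # adaptive truncation window
            break
    return tau
```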
CHAINS THAT HAVE NOT MIXED YET

>> chainstats(chain,names)
MCMC statistics, nsimu = 1000

             mean        std      MC_err     tau      geweke
---------------------------------------------------------------------
theta_1    0.037146    0.29164   0.054374   67.086   0.017371
theta_2   -0.98318     0.39323   0.082073  112.74    0.43098
theta_3    0.0077533   0.37183   0.076218  101.3     0.013535
theta_4    0.058239    0.34665   0.069839  101.88    0.014742
---------------------------------------------------------------------
CHAINS THAT HAVE BETTER MIXING

>> chainstats(chain,names)
MCMC statistics, nsimu = 1000

             mean        std      MC_err     tau      geweke
---------------------------------------------------------------------
theta_1    0.047117    0.42071   0.068536   43.274   0.027867
theta_2   -1.1139      0.49858   0.088196   47.542   0.60531
theta_3    0.050454    0.43527   0.069026   41.525   0.023323
theta_4    0.033922    0.42226   0.067013   43.516   0.038749
---------------------------------------------------------------------
ERGODICITY
The theoretical correctness of MCMC methods may be expressed by the following ergodicity theorem. Let $\theta_1, \ldots, \theta_n$ be the samples produced by an MCMC algorithm. The following should be valid, and indeed is, for example for the random walk Metropolis algorithm with a Gaussian proposal:

Theorem. Let $\pi$ be the density function of a target distribution in the Euclidean space $R^d$. Then the MCMC algorithm simulates the distribution $\pi$ correctly: for an arbitrary bounded and measurable function $f : R^d \to R$ it ('almost surely') holds that
$$ \lim_{n\to\infty} \frac{1}{n}\big(f(\theta_1) + \ldots + f(\theta_n)\big) = \int_{R^d} f(\theta)\,\pi(d\theta). $$
ERGODICITY AND PREDICTIVE INFERENCE
The ergodicity theorem simply states that the sampled values asymptotically approach the theoretically correct ones, for each realization of the chain. Note the role of the function $f$. If $f$ is the characteristic function of a set $A$, then the right hand side of the equation gives the probability measure of $A$, while the left hand side gives the frequency of 'hits' to $A$ by the sampling. But $f$ might also be our model, and then $f(\theta)$ is the model output assuming the parameter value $\theta$. The theorem states that the values calculated at the sampled parameters correctly give the distribution of the model predictions, the so-called predictive distribution.
PREDICTIVE INFERENCE
Predictive inference means studying the posterior distribution of the model predictions. In many applications these predictions are more interesting than the actual values of the model parameters. The nice feature of MCMC analyses is the ease with which you can make probability statements and inference on the values that can be calculated from the model. We simply calculate the model prediction for each row of the chain (or a random subset of it) and we have a sample from the posterior predictive distribution.
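The procedure can be sketched in Python (illustrative; here `chain` is a list of sampled parameter values and `model(x, theta)` a scalar model evaluation):

```python
# Posterior predictive envelope at a point x: evaluate the model for a
# random subset of chain rows and read off empirical quantiles.

import random

def predictive_envelope(chain, model, x, probs=(0.025, 0.5, 0.975),
                        nsample=200, seed=0):
    rng = random.Random(seed)
    rows = [chain[rng.randrange(len(chain))] for _ in range(nsample)]
    preds = sorted(model(x, theta) for theta in rows)
    return [preds[min(int(p * nsample), nsample - 1)] for p in probs]
```

Evaluating this over a grid of $x$ values gives the predictive envelopes around the best fit.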
PARAMETER UNCERTAINTY
MODEL PREDICTION UNCERTAINTY
MCMC TOOLBOX FOR MATLAB
Matlab toolbox for adaptive MCMC.

model.ssfun = @mycostfun;
data = load('datafile.dat');
parameters = {
  {'par1', 2.3 }
  {'par2', 1.2 }
};
options.nsimu = 5000;
options.method = 'am';
[results,chain] = mcmcrun(model,data,parameters,options);
mcmcplot(chain,[],results)

https://www.github.com/mjlaine/mcmcstat/
SOME MCMC THEORY
Let's review some theory of Markov chains related to MCMC.
SOME MCMC THEORY
Let $\theta$ be a parameter vector having values in a parameter space $\Theta$, indexing a family of possible probability distributions $p(y|\theta)$ describing our observations $y$. The Bayes formula gives the posterior $\pi$ in terms of the likelihood and the prior $p(\theta)$:
$$ \pi(\theta) := p(\theta|y) = \frac{p(y|\theta)\,p(\theta)}{\int_\Theta p(y|\theta)\,p(\theta)\,d\theta}. $$
Markov chain Monte Carlo methods construct a Markov chain which has $\Theta$ as its state space and $\pi$ as its limiting stationary distribution. That means we have a way of sampling values from the posterior distribution $\pi$ and therefore make Monte Carlo inference about $\theta$ in the form of sample averages and density estimates.
MARKOV CHAIN MONTE CARLO – MCMC
A Markov chain is described by a transition kernel $P(\theta, d\theta')$ that gives for each state $\theta$ the probability distribution for the chain to move to state $\theta'$ in the next step. Below we assume that there exists a corresponding transition density $p(\theta, \theta')$.

MCMC methods produce chains that are aperiodic, irreducible and fulfill a reversibility condition called the detailed balance equation:
$$ \pi(\theta)\,p(\theta, \theta') = \pi(\theta')\,p(\theta', \theta), \quad \theta, \theta' \in \Theta. $$
If $\pi$ is the initial distribution of the starting state, then the intensity of going from state $\theta$ to state $\theta'$ is the same as that of going from $\theta'$ to $\theta$. A direct consequence of the reversibility is
$$ \int \pi(\theta)\,p(\theta, \theta')\,d\theta = \pi(\theta'), \quad \text{for all } \theta' \in \Theta, $$
which means that $\pi$ is the stationary distribution of the chain and we can use a sample from the chain as a random sample from $\pi$.
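On a finite state space both conditions can be checked numerically. A small Python sketch (illustrative only; the slides work in continuous space): build the Metropolis transition matrix for a three-point target with a symmetric uniform proposal and verify detailed balance and $\pi P = \pi$.

```python
# Metropolis transition matrix on a discrete state space {0, ..., n-1}
# with a symmetric uniform proposal over the other states.

def metropolis_matrix(pi):
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            q = 1.0 / (n - 1)                  # symmetric proposal
            P[i][j] = q * min(1.0, pi[j] / pi[i])
        P[i][i] = 1.0 - sum(P[i])              # rejected mass stays at i
    return P
```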
THE METROPOLIS-HASTINGS ALGORITHM
In the Metropolis-Hastings algorithm we generate a Markov chain with a transition density
$$ p(\theta, \theta') = q(\theta, \theta')\,\alpha(\theta, \theta'), \quad \theta' \neq \theta, $$
$$ p(\theta, \theta) = 1 - \int q(\theta, \theta')\,\alpha(\theta, \theta')\,d\theta', $$
for some proposal density $q$ and acceptance probability $\alpha$.

The chain is reversible if and only if
$$ \pi(\theta)\,q(\theta, \theta')\,\alpha(\theta, \theta') = \pi(\theta')\,q(\theta', \theta)\,\alpha(\theta', \theta), $$
which leads to choosing $\alpha$ as
$$ \alpha(\theta, \theta') = \min\left\{1, \frac{\pi(\theta')\,q(\theta', \theta)}{\pi(\theta)\,q(\theta, \theta')}\right\}. $$
Usually $\Theta \subset R^d$, but the reversibility condition can be formulated in a more general state space.
NOTES
In the MH algorithm we need to calculate the posterior ratio $\pi(\theta')/\pi(\theta)$ in the formula for $\alpha$, but the Bayes formula gives this in terms of likelihood and prior as
$$ \frac{p(y|\theta')\,p(\theta')}{p(y|\theta)\,p(\theta)}, $$
and the constant of proportionality disappears.

We need some theory of Markov chains, but with important simplifications: we know by construction that the stationary distribution $\pi$ exists. Also, we are able to choose the initial distribution as we like. That gives us simple ways to prove important ergodic properties of the MH chain: the law of large numbers, which gives us permission to use sample averages as estimates, and the central limit theorem, which gives us the convergence rate for the algorithms.
REVERSIBLE JUMP METROPOLIS-HASTINGS ALGORITHM
The detailed balance equation can also be formulated in very general state spaces. For the Metropolis-Hastings algorithm to work it must only accept states from where there is a positive probability to do a reversible move back to the original state.

For the reversible jump Metropolis-Hastings algorithm the state space is written as
$$ E = \{(k, \theta^{(k)}),\ k \in \mathcal{K},\ \theta^{(k)} \in \Theta_k\}, $$
where $\mathcal{K}$ is an enumerable model space and $\Theta_k$ is the parameter space of model $k$. The dimension of $\theta^{(k)}$ can vary with $k$.

The posterior distribution can be factorized as
$$ \pi(\theta^{(k)}, k) = \pi(\theta^{(k)}|k)\,\pi(k), $$
and we might be interested in the posterior probabilities of different models $\pi(k)$ and draw conditional or marginal conclusions about different models in terms of $\pi(\theta^{(k)}|k)$ or $\pi(\theta^{(k)})$.
IMPLEMENTING RJMCMC
STEP OF THE ALGORITHM
When being in model $k_i$ with parameter vector $\theta^{(k_i)}_i$:

1. Choose a new model $j$ by drawing it from distribution $p(i, \cdot)$. Propose a value for the parameter $\theta^{(j)}$ by generating $u$ from distribution $q_{k_i}(j, u)$.
2. Accept the move with probability (**): $k_{i+1} = j$ and $\theta^{(k_{i+1})}_{i+1} = \theta^{(j)}$.
3. If the move is not accepted, stay in the current model: $k_{i+1} = k_i$ and $\theta^{(k_{i+1})}_{i+1} = \theta^{(k_i)}_i$.

In step 2 it is also possible to choose to stay in the current model and do a standard Metropolis-Hastings step.
ADAPTIVE MCMC METHODS
A short description of adaptive MCMC methods, which were developed to solve a large number of estimation problems without hand tuning of the algorithm. MCMC adaptation has many interesting theoretical questions about the convergence of the methods.
ADAPTIVE METHODS, AM
The bottleneck in MCMC (Metropolis) calculations often is to find a proposal that matches the target distribution, so that the sampling is efficient. This may lead to a time-consuming trial-and-error tuning of the proposal. Various adaptive methods have been developed in order to improve the proposal during the run. One relatively simple way is to compute the covariance matrix of the chain and use it as the proposal. This is called AM, the Adaptive Metropolis algorithm. In AM the new point depends not just on the previous point, but on the earlier history of the chain, so the algorithm is no longer Markovian. However, if the adaptation is based on an increasing part of the chain, one can prove that the algorithm produces a correct ergodic result.
ADAPTIVE METHODS, AM
Adaptive Metropolis is a random walk algorithm that uses a Gaussian proposal with a covariance $C_n$ that depends on the chain generated so far,
$$ C_n = \begin{cases} C_0, & n \le n_0 \\ s_d \operatorname{cov}(\theta_1, \ldots, \theta_n) + s_d\,\varepsilon\,I_d, & n > n_0, \end{cases} $$
where $s_d$ is a parameter that depends only on the dimension $d$ of the sampling space, $\varepsilon > 0$ is a constant that we may choose very small, and $n_0 > 0$ defines the length of the initial non-adaptation period.
ADAPTIVE METHODS, AM
This form of adaptation can be proven to be ergodic. Note that the same adaptation, but with a fixed update history length, is not ergodic. The choice for the length of the initial non-adaptive portion of the simulation, $n_0$, is free. The adaptation does not need to be done at each time step, only at given intervals. This form of adaptation improves the mixing properties of the algorithm, especially for high dimensions. The role of the parameter $\varepsilon$ is to ensure that $C_n$ will not become singular. In most practical cases $\varepsilon$ can be safely set to zero. The scaling parameter is usually taken as $s_d = 2.4^2/d$.
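The covariance formula above can be sketched in plain Python (illustrative; a real implementation such as the mcmcstat toolbox updates the covariance recursively rather than recomputing it from the whole chain):

```python
# AM proposal covariance: s_d * cov(theta_1..theta_n) + s_d * eps * I
# for n > n0, with s_d = 2.4^2 / d. Returns None during the initial
# non-adaptation period (meaning: keep using C0).

def am_covariance(chain, n0=100, eps=1e-8):
    n = len(chain)
    if n <= n0:
        return None
    d = len(chain[0])
    sd = 2.4 ** 2 / d
    mean = [sum(row[k] for row in chain) / n for k in range(d)]
    C = [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in chain) / (n - 1)
          for j in range(d)] for i in range(d)]
    return [[sd * (C[i][j] + (eps if i == j else 0.0)) for j in range(d)]
            for i in range(d)]
```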
DR, THE DELAYED REJECTION ALGORITHM
Suppose the current position of a sampled chain is $\theta_n = \theta$. As in a regular MH, a candidate move $\theta'_1$ is generated from a proposal $q_1(\theta, \cdot)$ and accepted with the usual probability
$$ \alpha_1(\theta, \theta'_1) = 1 \wedge \frac{\pi(\theta'_1)\,q_1(\theta'_1, \theta)}{\pi(\theta)\,q_1(\theta, \theta'_1)}. $$
Upon rejection, instead of retaining the same position, $\theta_{n+1} = \theta$, a second stage move $\theta'_2$ is proposed. The second stage proposal is allowed to depend not only on the current position of the chain but also on what we have just proposed and rejected: $q_2(\theta, \theta'_1, \cdot)$.
DR, THE DELAYED REJECTION ALGORITHM
It can be shown that an ergodic chain is created when the second stage proposal is accepted with probability
$$ \alpha_2(\theta, \theta'_1, \theta'_2) = 1 \wedge \frac{\pi(\theta'_2)\,q_1(\theta'_2, \theta'_1)\,q_2(\theta'_2, \theta'_1, \theta)\,[1 - \alpha_1(\theta'_2, \theta'_1)]}{\pi(\theta)\,q_1(\theta, \theta'_1)\,q_2(\theta, \theta'_1, \theta'_2)\,[1 - \alpha_1(\theta, \theta'_1)]}. $$
This process of delaying rejection can be iterated to try sampling from further proposals in case of rejection by the present one. However, in many cases the essential benefit (more accepted points in a situation where one, e.g. Gaussian, proposal does not seem to work properly) is already reached by the above 2-stage DR algorithm.
DR WITH ADAPTATION: DRAM
It is possible to combine the ideas of adaptation and delayed rejection. To avoid complications, a direct way of implementing AM adaptation with an $m$-stage DR algorithm is suggested. The proposal at the first stage of DR is adapted just as in AM: the covariance $C^1_n$ for the Gaussian proposal is computed from the points of the sampled chain, no matter at which stage of DR these points have been accepted in the sample path. The covariance $C^i_n$ of the proposal for the $i$-th stage ($i = 2, \ldots, m$) is computed as a scaled version of the first proposal, $C^i_n = \gamma_i C^1_n$. The scale factors $\gamma_i$ can be freely chosen: the proposals of the higher stages can have a smaller or larger variance than the proposals at earlier stages. We have seen that AM alone typically recovers from an initial proposal that is too small, while the adaptation has difficulties if no or only a few accepted points are created in the start. So a good default is to use just a 2-stage version where the second proposal is scaled down from the (adapted) proposal of the first stage.
DRAM - DELAYED REJECTION ADAPTIVE METROPOLIS
DRAM version of the MH algorithm that proposes from a Gaussian distribution. The second stage proposal covariance is half the size of the first stage. Adaptation is done after every 20 iterations. The animation shows the first 100 DRAM steps.
SHORT CHAINS AND ADAPTATION
It is important to make short chains as efficient as possible. Efficient: produce estimates with small Monte Carlo error.

Short MCMC chain repeated 1000 times with different algorithms; Gaussian 10 dimensional target and too large initial covariance.
SHORT CHAINS AND ADAPTATION
But adaptation might slow the convergence.

Same as in the previous slide, but now with a more optimal initial proposal; Gaussian 10 dimensional target, near optimal initial covariance.
FASTER MCMC: PARALLEL CHAINS
Random walk MCMC is by nature sequential, and it is generally more efficient to run one long chain than many short independent chains. One could run several chains in parallel, without communication between the chains. In parallel adaptive MCMC, the adaptation is done over the points in all chains and they share one common adapted proposal covariance. Communication between the chains can be asynchronous.
EARLY REJECTION
In many cases $SS(\theta)$ is a monotonically increasing function wrt. adding new observations or simulating the model further in time. With the acceptance probability $\alpha(\theta_{curr}, \theta_{prop})$ defined as before, we draw $u \sim U(0, 1)$, and accept if
$$ \frac{SS(\theta_{prop}) - SS(\theta_{curr})}{\sigma^2} + S_{pri}(\theta_{prop}) - S_{pri}(\theta_{curr}) < -2\log(u). $$
If we write this as
$$ SS_{crit} = -2\log(u) + SS(\theta_{curr})/\sigma^2 + S_{pri}(\theta_{curr}), $$
with acceptance when $SS(\theta_{prop})/\sigma^2 + S_{pri}(\theta_{prop}) < SS_{crit}$, then we can stop evaluating the model when $SS(\theta_{prop}) \ge (SS_{crit} - S_{pri}(\theta_{prop}))\,\sigma^2$.
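The stopping rule can be sketched as follows (Python, illustrative): accumulate the sum of squares term by term and stop as soon as the critical level for $SS(\theta_{prop})$ derived on this slide is exceeded.

```python
# Early rejection sketch: SS(theta) grows monotonically as residual terms
# are added, so once the running sum exceeds ss_crit (the precomputed
# critical level for SS(theta_prop)) the proposal can be rejected without
# evaluating the rest of the model.

def ss_with_early_rejection(residuals, ss_crit):
    """residuals: iterable of (y_i - f(x_i, theta)). Returns (SS so far, rejected?)."""
    ss = 0.0
    for r in residuals:
        ss += r * r
        if ss >= ss_crit:
            return ss, True    # reject early, skip remaining evaluations
    return ss, False           # full SS computed, proposal may be accepted
```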
OTHER MCMC VARIANTS AND IMPLEMENTATIONS
GRADIENT BASED METHODS
Langevin diffusion. Hamiltonian methods. Both need (automatic) differentiation of the likelihood / forward model.

Other toolboxes:
- STAN: https://mc-stan.org
- PyMC3: https://docs.pymc.io
- FME: https://cran.r-project.org/package=FME

Nice animations of different MCMC methods by Chi Feng: https://chi-feng.github.io/mcmc-demo/
See also the blog post by Colin Carroll, Hamiltonian Monte Carlo from scratch.
EXAMPLE: DYNAMICAL STATE SPACE MODELS AND MCMC
As a more advanced example on modelling with MCMC methods, we consider a general state space model as a framework for multivariate time series analysis. This is closely connected with Janne's lectures about the Kalman filter.
DLM – DYNAMIC LINEAR MODEL
State space form of the general model:
$$ x_t = M_t x_{t-1} + E_t, \quad E_t \sim N_m(0, Q_t), $$
$$ y_t = H_t x_t + \epsilon_t, \quad \epsilon_t \sim N_p(0, R_t). $$
Model operator $M_t$, observation operator $H_t$, model error covariance $Q_t$, observation error covariance $R_t$, with time index $t = 0, 1, \ldots, n$.

Bayesian hierarchical model:
- Observation model: $p(y_t | x_t, \theta)$.
- Process model: $p(x_{t+1} | x_t, \theta)$.
- Parameter model: $p(\theta)$.

Bayes formula:
$$ p(x_{1:n}, \theta | y_{1:n}) \propto \left[\prod_{t=1}^{n} p(y_t | x_t, \theta)\, p(x_t | x_{t-1}, \theta)\right] p(\theta). $$
ESTIMATING LINEAR STATE SPACE MODEL
For dynamic linear models we have computational tools for all relevant statistical distributions:
- one step predictions $p(x_{t+1} | x_t, y_{1:t}, \theta)$ by Kalman filter.
- filtering distribution $p(x_t | y_{1:t}, \theta)$ by Kalman filter.
- smoothing distribution $p(x_t | y_{1:n}, \theta)$ by Kalman smoother.
- joint state $p(x_{1:n} | y_{1:n}, \theta)$ by simulation smoother.
- parameter likelihood $p(y_{1:n} | \theta)$ by Kalman filter likelihood.
- joint state and parameter $p(x_{1:n}, \theta | y_{1:n})$ by MCMC.

The parameter $\theta$ contains the auxiliary model parameters related to the model structure and the observation and model error covariances. The parameters can be fixed by prior knowledge, estimated by maximum likelihood, or estimated and marginalized over by MCMC.
TREND ANALYSIS BY SIMULATION
Kalman formulas give marginal distributions $p(x_t | y_{1:n}, \theta)$. We can simulate DLM states from $p(x_{1:n} | y_{1:n}, \theta)$. We need MCMC to integrate out the uncertainty about $\theta$ and simulate from
$$ p(x_{1:n} | y_{1:n}) = \int p(x_{1:n} | y_{1:n}, \theta)\,d\theta. $$
This is needed, for example, to get uncertainty estimates of trend related statistics in time series analysis.
ESTIMATING PARAMETERS
How to select the model matrix $M$?
- By considering all the relevant processes.
- Diagnosing the residuals.

How to choose the initial state distribution $N(x_0, C_0)$?
- By assuming diffuse priors …

How to estimate the error covariances $Q_t$ and $R_t$?
- By Kalman filter likelihood:
$$ -2\log(p(y_{1:n} | x_{1:n}, \theta)) \propto \sum_{t=1}^{n} \left[ (y_t - H_t \hat{x}_t)^T (C^y_t)^{-1} (y_t - H_t \hat{x}_t) + \log(|C^y_t|) \right]. $$
DLM TOOLBOX
Matlab toolbox for DLM: https://www.github.com/mjlaine/dlm/
Ozone time series example: https://mjlaine.github.io/dlm/ex/ozonedemo.html
More info on DLM: https://arxiv.org/abs/1903.11309
THAT'S ALL!
H. Haario, E. Saksman, J. Tamminen: An adaptive Metropolis algorithm, Bernoulli, 7(2), 2001. http://dx.doi.org/10.2307/3318737
H. Haario, E. Saksman, J. Tamminen: Componentwise adaptation for high dimensional MCMC, Computational Statistics, 20(2), 2005. http://dx.doi.org/10.1007/BF02789703
H. Haario, M. Laine, A. Mira, E. Saksman: DRAM: Efficient adaptive MCMC, Statistics and Computing, 16(3), 2006. http://dx.doi.org/10.1007/s11222-006-9438-0
H. Haario, M. Laine, M. Lehtinen, E. Saksman, J. Tamminen: MCMC methods for high dimensional inversion in remote sensing, Journal of the Royal Statistical Society, Series B, 66(3), 2004. http://dx.doi.org/10.1111/j.1467-9868.2004.02053.x
M. Laine, J. Tamminen: Aerosol model selection and uncertainty modelling by adaptive MCMC technique, Atmospheric Chemistry and Physics, 8(24), 2008. http://dx.doi.org/10.5194/acp-8-7697-2008
A. Solonen, P. Ollinaho, M. Laine, H. Haario, J. Tamminen, H. Järvinen: Efficient MCMC for Climate Model Parameter Estimation: Parallel Adaptive Chains and Early Rejection, Bayesian Analysis, 7(3), 2012. http://dx.doi.org/10.1214/12-BA724
M. Laine: Introduction to Dynamic Linear Models for Time Series Analysis, in Geodetic Time Series Analysis and Applications, Springer 2019. https://arxiv.org/abs/1903.11309
EXERCISES
NON-LINEAR MODEL FITTING
Take a simple non-linear model, for example exponential decay:
$$ y = \theta_1 + (A_0 - \theta_1)\exp(-\theta_2 t), $$
with data

data.tdata = [1,2,3,4,5,6,7,8,9,10]';
data.ydata = [0.487 0.572 0.369 0.179 0.119 0.0809 0.104 0.091 0.047 0.051]';

Here, you can assume first that $A_0 = 1$. Familiarize yourself with some MCMC package (Matlab, Python, R) or even write your own code. Write the code needed for the basic model $y = f(x, \theta) + \epsilon$, with i.i.d. Gaussian errors $\epsilon \sim N(0, \sigma^2 I)$.
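A possible starting point, sketched in Python and assuming $A_0 = 1$ as stated above: only the model and the sum-of-squares are given here, to be plugged into the MCMC sampler of your choice.

```python
# Exponential decay model and sum-of-squares for the exercise data.
# theta = (theta1, theta2); A0 = 1 as suggested on the slide.

import math

tdata = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ydata = [0.487, 0.572, 0.369, 0.179, 0.119,
         0.0809, 0.104, 0.091, 0.047, 0.051]

def model(t, theta, A0=1.0):
    th1, th2 = theta
    return th1 + (A0 - th1) * math.exp(-th2 * t)

def ssfun(theta):
    """SS(theta) = sum_i (y_i - f(t_i, theta))^2."""
    return sum((y - model(t, theta)) ** 2 for t, y in zip(tdata, ydata))
```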
NON-LINEAR MODEL FITTING (CONT.)
With your chosen MCMC package:
- Estimate the posterior $p(\theta|y)$ with MCMC.
- Study the chain convergence. How long chains do you need?
- Write down the parameter estimates and the Monte Carlo errors of the estimates of the posterior mean and posterior covariance.
- Consider different strategies to handle the uncertainty in the observation error $\sigma$.
- Generate suitable predictive envelopes around the best fit model.
- How would you handle the fact that the observations are constrained to be positive?
- Add $A_0$ as an extra parameter.
EFFICIENCY OF MCMC METHODS
How would you judge the correctness, convergence and efficiency of an MCMC sampler?
- Compare your chosen MCMC sampler to a known Gaussian target (or even to the "banana" target).
- Do the sampling multiple times for the same target and see how the mean estimates behave. Could you infer the same from a single run?
- Estimate the Monte Carlo error by integrated autocorrelation and by batch means. Do you get similar results?
- What are the benefits of different adaptation schemes?
- When do you need burn-in?
LOGISTIC MODEL WITH POISSON LIKELIHOOD
Consider a model for discrete events from the Poisson distribution, $N_{obs} \sim \text{Poisson}(\mu(t))$, where the mean parameter $\mu(t) = E(N)$ is modelled further with a logistic function
$$ \mu' = k\mu(\mu_{max} - \mu), $$
and assuming $N(0) = 1$. Fit the model parameters $k$ and $\mu_{max}$ using MCMC and study the predictive behaviour of the fit, when we have the following observations (next slide).
LOGISTIC MODEL WITH POISSON LIKELIHOOD (CONT.)

t     Nobs
0     1
2     1
4     3
6     5
8     3
10    4
12    3
14    0
16    1
18    0
20    1