USER MANUAL FOR THE SOFTWARE PACKAGES HMSC-MATLAB 2.0 AND HMSC-R 2.0

Anna Norberg, Gleb Tikhonov, F. Guillaume Blanchet, Nerea Abrego and Otso Ovaskainen

Contents

General information
Getting started
  Installing the software
  How to organize the folder structure?
  The general workflow
  An overview of the structure of the rest of this manual
The general structure of the HMSC-model
  Input data
  Parameters to be estimated
Typical model outputs
  MCMC trace plots
  Parameter summaries
  Association networks
  Explanatory/predictive power
  Variance partitioning
  Predictions
Simulated Example 1
  Preliminaries
  Reading in the data
  Constructing the HMSC object in MatLab
  Constructing an HMSC object in R
  Defining the prior distributions
  Scaling the covariates
  Setting the values of the ‘true parameters’
  Setting options for MCMC sampling
  Performing the MCMC sampling
  Saving the HMSC object
  Preliminaries before generating model outputs
  Producing MCMC trace plots
  Producing posterior summaries
  Performing a variance partitioning

  Plotting association networks
  Computing the explanatory power of the model
  Sampling the posterior distribution
  Generating predictions for training data
  Generating predictions for new data
Simulated Example 2
  Modifications needed to fit a model with traits and phylogenetic correlations
  Additional output from a model with traits and phylogenetic correlations
A spatially hierarchical study design: Wood-Inhabiting Fungi on Beech Logs
  General description of this case study
  How to use the script for estimation and generating outputs?
  Reading in the data and dividing it into training and validation sets
  Constructing and fitting the HMSC-model
  Model outputs: conditional predictions
A Spatially Explicit Study Design: Butterflies in Great Britain
  General description of this case study
  Reading in the data and dividing it into training and validation sets
  Constructing and fitting the HMSC-model
  Model outputs: distance decay of community similarity and regions of common profile
A Temporally Explicit Study Design: Water Birds in Ponds
  General description of this study system
  Reading in the data
  Constructing and fitting the HMSC-model
  Model outputs: heterospecific interactions between years
An Applied Perspective: How Do Epiphytic Bryophytes Respond to Forest Management?
  General description of this case study
  Reading in the data
  Constructing and fitting the HMSC-model
  Generating scenario simulations
  Model outputs: recovery of species richness and community composition
  Model outputs: classification of species based on their response to management
Appendix: Generating simulated data
  Preliminaries
  Constructing an HMSC object
  Generating the parameters and the data matrices
  Saving the generated parameters and the data matrices
References


General information

The Hierarchical Modelling of Species Communities (HMSC) framework described in Ovaskainen et al. (2016b) has been implemented as MatLab and R packages. This manual aims to be a bridge between the help files of the packages (where more technical details are given) and the article (where the ecological interpretation of the results is discussed and the mathematical description of HMSC is given). To this end, we provide step-by-step guidelines for replicating the results of the four case studies illustrated in the main article, as well as simulated examples. This manual starts by giving instructions for installing the packages, then describes the model structures and parameters defined within the HMSC framework, and finishes by instructing the user on how to perform the analyses in practice. The material of this manual applies to both the MatLab and R packages. As the ultimate objective of the user is to apply the HMSC framework to her/his own data, we recommend that the user start by reading Ovaskainen et al. (2016b), where the capacity of the framework is discussed, then replicate the simulated examples, which illustrate the use of the framework in a simplified setting, and eventually replicate the real data examples, which also illustrate somewhat more complicated study designs or types of analyses.

Getting started

Installing the software

The HMSC software is available for both MatLab (HMSC-MatLab 2.0; developed under MatLab version R2015a) and R (HMSC-R 2.0; developed under R version 3.3.1). Both the MatLab and R packages (as well as the latest version of this manual) can be downloaded from https://www.helsinki.fi/en/researchgroups/metapopulation-research-centre/hmsc. All files related to the MatLab version are found in the file HMSC-MatLab.zip, whereas all files related to the R version are found in HMSC-R.zip. On the same web page, the simulated and real data files are available in HMSC-data.zip. To replicate the examples, the user needs to download the data, the software being used (i.e. MatLab or R), as well as the scripts for analysing the data.

The MatLab package can be unzipped to any location where the user wishes to keep the software. It contains code as .m files and does not require any installation step. The R package is available in its source (.tar.gz) and compiled versions (.tgz for Mac OS X and .zip for Windows). Many parts of the R package have been implemented in C++11, and for this reason it is simpler to use the compiled version of the R package. If the user wants to compile the R package, it is also necessary to have the packages ‘coda’, ‘Rcpp’ and ‘RcppArmadillo’ installed. After it is downloaded, the R package can be installed using the install.packages() function in R (with repos = NULL, as the package is installed from a local file).

In addition to this manual, we provide specific help files for both software types. For the MatLab package, the help file (‘help.pdf’) is included in the zip file. For the R version, the help files are available by typing a question mark followed by the name of the function of interest into the console (> ?hmsc).

How to organize the folder structure?

The user should create separate folders for individual studies, within which the model outputs will be stored in subfolders. For both the simulated and real examples that we go through below, such a folder structure already exists under the folder ‘case studies’. Each subfolder that corresponds to one study (such as ‘fungi’) includes the particular script for the study and one subfolder called ‘results’, where the model outputs (as Excel files or TIFF images) will be stored. The fitted HMSC model (‘model.mat’ for MatLab or ‘model.RData’ for R) is stored under the main folder of the specific study.

The general workflow

Whether using MatLab or R, we recommend that the user conduct HMSC analyses by writing a script (or a set of consecutive scripts) that contains all the commands needed to run the analyses, from importing the data to producing the result figures and tables. In this way, the user will more easily ensure the reproducibility of the results. As illustrated in the example scripts below, a typical script includes two parts: one which relates to model fitting and another which relates to the post-processing of the results. The first part of the workflow consists of fitting an HMSC model. This includes:

a) Setting up an HMSC object by defining the model structure (e.g. whether traits or random effects are to be included) and organizing the data (typically imported from files with standard MatLab or R commands) into the HMSC object.

b) Defining the priors required for Bayesian inference.

c) Defining the parameters related to the MCMC sampling scheme used for posterior sampling (e.g. how many iterations are to be run and how they will be thinned).

d) Running the actual estimation scheme. As running the estimation scheme may take some time, we recommend that the user save the HMSC object to a file (called ‘model.mat’ for MatLab and ‘model.RData’ for R). This object includes the model structure, the data, and the full posterior distribution.

The second part of the workflow consists of reading the HMSC object from the file and post-processing the results. Unlike the model fitting procedure described above, this part of the workflow does not necessarily follow any specific order or include any compulsory components, as it is up to the user what kind of output is needed. Typically, however, the user may wish to assess the convergence of the MCMC chains, produce posterior summaries (e.g. posterior means and quantiles) as tables or plots, measure the explanatory power of the model, perform a variance partitioning among the fixed and random effects, and make predictions.


An overview of the structure of the rest of this manual

We first describe the general structure of the HMSC model and give an overview of the typical outputs that the user may wish to generate. We then go through two simulated and four real case studies, which we hope will serve as illustrative examples to help users analyse their own data. The full scripts for running these examples are found in the folder of each case study. In the MatLab version, these scripts are called ‘simulated_example_1.m’, ‘simulated_example_2.m’, ‘fungi.m’, ‘butterflies.m’, ‘birds.m’ and ‘brypophytes.m’, while for the R package they are called ‘simulatedexample1.R’, ‘simulatedexample2.R’, ‘fungi.R’, ‘butterflies.R’, ‘birds.R’ and ‘brypophytes.R’.

The general structure of the HMSC-model

Figure 1 gives a graphical overview of the statistical model, whose mathematical description is given by Ovaskainen et al. (2016b). It is important to note that not all of the data types illustrated in Fig. 1 are necessary for building an HMSC model. Technically, the only compulsory component for fitting an HMSC model is the occurrence matrix $\mathbf{Y}$, which may cover as few as one species (in which case it is a vector rather than a matrix). If only $\mathbf{Y}$ is provided, the environmental matrix includes only the intercept, and the model estimates only the mean occurrence (abundance or prevalence) of each species. However, for ecologically meaningful analyses, the minimal set of data needed is either of the following:

a) The occurrence matrix $\mathbf{Y}$ for one species as well as the environmental covariate matrix $\mathbf{X}$, in which case the HMSC model corresponds to a traditional single-species model.

b) The occurrence matrix $\mathbf{Y}$ for several species, in which case the HMSC model corresponds to a model-based ordination.

Although such analyses are possible with the HMSC package, it is not primarily meant for these special situations, for which many other kinds of software are available. As more of the data types described in Fig. 1 become available, deeper analyses and consequently deeper ecological insights become possible, and this is what HMSC is really meant for. We next go through the types and dimensions of the input data (orange boxes in Fig. 1) as well as the parameters that are estimated through posterior sampling (blue ellipses in Fig. 1).


Input data

We index by $i = 1, \ldots, n_y$ the sampling units; by $j = 1, \ldots, n_s$ the species; by $k = 1, \ldots, n_c$ the covariates; by $t = 1, \ldots, n_t$ the traits; and by $r = 1, \ldots, n_r$ the random effect levels (e.g. sampling units, plots, years, ...) for which random effects are estimated. The main input data (Fig. 1) are contained in the following four matrices:

• $\mathbf{Y}$-matrix: an $n_y \times n_s$ matrix of the response data $y_{ij}$ (called m.Y in MatLab, Y in the R package).

• $\mathbf{X}$-matrix: an $n_y \times n_c$ matrix of the environmental covariates $x_{ik}$ (called m.X in MatLab, X in the R package).

Figure 1. A graphical summary of the statistical framework. The orange boxes refer to data, the blue ellipses to parameters to be estimated, and the arrows to functional relationships described with the help of statistical distributions. See Ovaskainen et al. (2016b) for more details.


• $\mathbf{T}$-matrix: an $n_s \times n_t$ matrix of the traits $t_{jt}$ (called m.T in MatLab; in the R package this matrix is called Tr and it is the transpose of $\mathbf{T}$).

• $\mathbf{C}$-matrix: an $n_s \times n_s$ matrix of the phylogenetic correlations $c_{jj'}$ (called m.C in MatLab, Phylo in the R package).
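The dimension rules above are easy to get wrong when assembling the data by hand. The following Python sketch is only an illustration and is not part of HMSC-MatLab or HMSC-R; the helper names and the toy matrices are hypothetical. It checks that $\mathbf{X}$ has one row per sampling unit, that $\mathbf{T}$ has one row per species and that $\mathbf{C}$ is $n_s \times n_s$, and it shows that the R-package matrix Tr has the transposed shape of $\mathbf{T}$.

```python
def shape(m):
    """Return (rows, cols) of a rectangular matrix stored as nested lists."""
    rows = len(m)
    cols = len(m[0]) if rows else 0
    if any(len(row) != cols for row in m):
        raise ValueError("matrix is not rectangular")
    return rows, cols


def transpose(m):
    """Swap rows and columns (e.g. T in HMSC-MatLab versus Tr in HMSC-R)."""
    return [list(col) for col in zip(*m)]


def check_dimensions(Y, X, T=None, C=None):
    """Return (n_y, n_s, n_c, n_t), raising ValueError on inconsistent shapes."""
    n_y, n_s = shape(Y)                  # Y: sampling units x species
    n_y_x, n_c = shape(X)                # X: sampling units x covariates
    if n_y_x != n_y:
        raise ValueError("X needs one row per sampling unit (row of Y)")
    n_t = 0
    if T is not None:
        n_s_t, n_t = shape(T)            # T: species x traits
        if n_s_t != n_s:
            raise ValueError("T needs one row per species (column of Y)")
    if C is not None and shape(C) != (n_s, n_s):
        raise ValueError("C must be an n_s x n_s matrix")
    return n_y, n_s, n_c, n_t


# Hypothetical toy data: 3 sampling units, 2 species, intercept + 1 covariate,
# 3 traits and a 2 x 2 phylogenetic correlation matrix.
Y = [[1, 0], [0, 1], [1, 1]]
X = [[1, 0.5], [1, -0.2], [1, 1.3]]
T = [[1, 0.3, 2.0], [1, -0.1, 1.5]]
C = [[1.0, 0.4], [0.4, 1.0]]

print(check_dimensions(Y, X, T, C))      # (3, 2, 2, 3)
print(shape(transpose(T)))               # (3, 2), i.e. n_t x n_s as for Tr
```

Passing an $\mathbf{X}$ with a missing row, for example, raises a ValueError before any model is built, which is cheaper than discovering the mismatch during sampling.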

Additionally, the HMSC framework accounts for random variation in species occurrence and co-occurrence, for which the spatio-temporal context of the study serves as input data. The structure of the study design is described by the following matrices:

• $\boldsymbol{\Pi}$-matrix: an $n_y \times n_r$ matrix of the units $\pi_{ir}$ to which the sampling units (rows of the $\mathbf{Y}$ matrix) belong (called m.pi in MatLab and Random in R). For example, if $r$ corresponds to a plot-level effect, $\pi_{ir}$ is the plot to which sampling unit $i$ belongs. The number of units at level $r$ is denoted by $n_p^{(r)}$.

• $\mathbf{xy}^{(r)}$-matrix: an $n_p^{(r)} \times d$ matrix containing the spatial (or temporal) coordinates of the $n_p^{(r)}$ units at level $r$ (called m.xy in MatLab and Auto in R). This matrix is defined for those levels $r$ for which the study design has an explicit spatial (or temporal) structure. The coordinates can be of any dimension $d = 1, 2, \ldots$, typically with $d = 2$ for spatial coordinates and $d = 1$ for time. They are translated into a distance matrix using Euclidean distance.
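The two study-design matrices can be illustrated with a small Python sketch. It does not use the HMSC packages, and the study design, plot labels and coordinates are hypothetical. It encodes the plot-level column of $\boldsymbol{\Pi}$ as integer unit indices and converts the plot coordinates $\mathbf{xy}^{(r)}$ (here $d = 2$) into a Euclidean distance matrix.

```python
import math

# Hypothetical study design: six sampling units nested in three plots.
# Entry i names the plot of sampling unit i (one column of the Pi matrix).
plot_of_unit = ["A", "A", "B", "B", "C", "C"]

# Index the plots 0..n_p-1 in order of first appearance.
levels = list(dict.fromkeys(plot_of_unit))
pi_r = [levels.index(p) for p in plot_of_unit]
n_p = len(levels)                        # number of units at this level

# Hypothetical plot coordinates with d = 2; rows follow the order of `levels`.
xy = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 8.0)}
coords = [xy[p] for p in levels]


def euclidean_distance_matrix(points):
    """Pairwise Euclidean distances between the rows of an n x d coordinate list."""
    return [[math.dist(p, q) for q in points] for p in points]


D = euclidean_distance_matrix(coords)
print(pi_r)                              # [0, 0, 1, 1, 2, 2]
print(D)                                 # zero diagonal; A-B = 5, A-C = 10, B-C = 5
```

For temporal data ($d = 1$), the same function works with one-element coordinates such as `(2001.0,)`.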

The user also needs to define the statistical distribution of the data. In the current version of the framework, four link functions/error distributions are available: 1) ‘normal’ for normally distributed data, 2) ‘probit’ for binary data, and 3) ‘Poisson’ and 4) ‘overdispersed Poisson’ for count data. To explain these mathematically, we denote by $L_{ij}$ the linear predictor for sampling unit $i$ and species $j$.

1) The ‘normal’ model is defined as $y_{ij} = L_{ij} + \varepsilon_{ij}$, where $\varepsilon_{ij} \sim N(0, \sigma_j^2)$.

2) The ‘probit’ model is defined as $\Pr(y_{ij} = 1) = \Phi(L_{ij})$, where $\Phi$ is the cumulative distribution function of the standard normal distribution $N(0, 1)$, i.e. the inverse of the probit link function. The estimation procedure utilizes the mathematically equivalent formulation $y_{ij} = 1$ for $z_{ij} \geq 0$ and $y_{ij} = 0$ for $z_{ij} < 0$, where $z_{ij} = L_{ij} + \varepsilon_{ij}$ and $\varepsilon_{ij} \sim N(0, \sigma_j^2)$, with $\sigma_j^2$ fixed to $\sigma_j^2 = 1$.

3) The ‘Poisson’ and 4) ‘overdispersed Poisson’ models are defined as $y_{ij} \sim \mathrm{Poi}\left(\exp(L_{ij} + \varepsilon_{ij})\right)$, where $\varepsilon_{ij} \sim N(0, \sigma_j^2)$. For the ‘Poisson’ model $\sigma_j^2$ is fixed to $\sigma_j^2 = 0.01$, whereas for the ‘overdispersed Poisson’ model $\sigma_j^2$ is estimated. The reason for including a small residual variance in the ‘Poisson’ model (rather than setting it to zero, which would correspond exactly to the usual Poisson model) relates to the posterior sampling schemes we have used, and is thus technical rather than biological in nature.
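To make the four observation models concrete, here is a minimal Python sketch (not part of HMSC-MatLab or HMSC-R) that draws a single response $y_{ij}$ from a given linear predictor $L_{ij}$ under each model, using the fixed variances stated above (1 for ‘probit’, 0.01 for ‘Poisson’). The function names are hypothetical, and the Poisson sampler uses Knuth's multiplication method, which is only suitable for small means. The final lines check by simulation that the latent-variable form of the probit model reproduces $\Pr(y_{ij} = 1) = \Phi(L_{ij})$, and that a large $\sigma_j^2$ makes the counts overdispersed (variance exceeding the mean).

```python
import math
import random


def std_normal_cdf(x):
    """Phi(x), the cumulative distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def poisson_draw(lam, rng):
    """Poisson variate by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_response(L, model, rng, sigma2=None):
    """Draw one y_ij given its linear predictor L = L_ij.

    sigma2 is the residual variance sigma_j^2; it is ignored for 'probit'
    (fixed to 1) and 'poisson' (fixed to 0.01), as described above.
    """
    if model == "normal":
        return L + rng.gauss(0.0, math.sqrt(sigma2))
    if model == "probit":
        z = L + rng.gauss(0.0, 1.0)      # latent z_ij with sigma_j^2 = 1
        return 1 if z >= 0 else 0
    if model in ("poisson", "overdispersed poisson"):
        s2 = 0.01 if model == "poisson" else sigma2
        return poisson_draw(math.exp(L + rng.gauss(0.0, math.sqrt(s2))), rng)
    raise ValueError(f"unknown model: {model}")


rng = random.Random(2024)
n = 20000

# Probit check: the share of ones should be close to Phi(L).
L = 0.5
freq = sum(sample_response(L, "probit", rng) for _ in range(n)) / n
print(freq, std_normal_cdf(L))           # both roughly 0.69

# Overdispersion check: with a large sigma_j^2 the variance exceeds the mean.
ys = [sample_response(1.0, "overdispersed poisson", rng, sigma2=0.5) for _ in range(n)]
mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / (n - 1)
print(mean, var)                         # variance clearly larger than the mean
```

The probit branch thresholds a normal latent variable rather than calling $\Phi$ directly, mirroring the equivalent formulation used in the estimation procedure.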

Parameters to be estimated

The posterior distributions of the parameters are stored in HMSC-MatLab within an HmscPar object named p, and in HMSC-R they are found in model$results$estimation.

• $\boldsymbol{\gamma}$-matrix: an $n_t \times n_c$ matrix of the $\gamma_{tk}$, i.e. the effects of the traits on the species' responses to covariates. In HMSC-MatLab, this is referred to as p.gamma. In HMSC-R, it is referred to as paramTr and it is the transpose of the $\boldsymbol{\gamma}$-matrix.


• $\mathbf{V}$-matrix: an $n_c \times n_c$ matrix of the variation among species (in their responses to covariates) that is not explained by traits. In HMSC-MatLab, this is referred to as p.V. In HMSC-R, it is referred to as varX.

• $\boldsymbol{\beta}$-matrix: an $n_s \times n_c$ matrix of the $\beta_{jk}$, i.e. the responses of the species to the covariates. In HMSC-MatLab, this is referred to as p.beta. In HMSC-R, it is referred to as paramX.

• $\rho$ (scalar): a parameter measuring the strength and sign of the phylogenetic signal in the species' responses to covariates. In HMSC-MatLab, this is referred to as p.rho. In HMSC-R, it is referred to as paramPhylo.

• $\boldsymbol{\Sigma}$-matrix: an $n_s \times n_s$ diagonal matrix of residual variances (see the data models above), with diagonal elements $\sigma_j^2$. In HMSC-MatLab, this is referred to as p.sigma. In HMSC-R, it is called residVar.

• $\boldsymbol{\Omega}$-matrix: an $n_s \times n_s$ species-to-species variance-covariance matrix with elements $\Omega_{jj'}$. This association matrix is estimated separately for each level listed in the $\boldsymbol{\Pi}$-matrix (e.g. one for the sampling unit level and another for the plot level). The off-diagonal elements of these matrices model species-to-species associations. The diagonal elements model random variation within species, which is needed to account for statistical dependency in cases where the data involve repeated samples from the same units. For example, if analysing the fungal data (with a study design consisting of sampling units within plots within forests) for a single species only, the $\boldsymbol{\Pi}$-matrix should involve the levels of plots and forests to make the analyses statistically valid, in the same way as one should include plot and forest as random effects in a usual GLMM analysis. If analysing the fungal data for multiple species, the $\boldsymbol{\Pi}$-matrix may or may not also involve the sampling unit level, depending on whether the user wishes to estimate the species-to-species association patterns at this level. Technically, these association matrices are estimated using a latent variable approach. The association matrix $\boldsymbol{\Omega}$ at level $r$ is defined as $\boldsymbol{\Omega} = \boldsymbol{\lambda}\boldsymbol{\lambda}^{T}$, where $\boldsymbol{\lambda}$ is the $n_s \times n_f^{(r)}$ matrix of factor loadings (in HMSC-MatLab, this is referred to as p.lambda; in HMSC-R, it is referred to as paramLatent), with $n_f^{(r)}$ being the number of latent factors (which is also estimated). The latent factors are included in the $n_p^{(r)} \times n_f^{(r)}$ matrix $\boldsymbol{\eta}$ (in HMSC-MatLab, this is referred to as p.eta; in HMSC-R, it is called latent). For spatial (or temporal) latent factors, we assume an exponentially decreasing correlation structure, with the spatial scale of factor $h$ at level $r$ denoted by $\alpha_{rh}$ (in HMSC-MatLab, this is referred to as p.alpha; in HMSC-R, it is referred to as paramLatentAuto).
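The relationship between the factor loadings and the association matrix, and the exponentially decreasing correlation of spatial latent factors, can be illustrated numerically. The Python sketch below uses hypothetical loadings for three species and two latent factors, computes $\boldsymbol{\Omega} = \boldsymbol{\lambda}\boldsymbol{\lambda}^{T}$, and rescales it to correlations, which are often easier to read as association strengths. The form $\exp(-d/\alpha_{rh})$ for the correlation at distance $d$ is an assumption about the parameterization (the standard exponential correlation function); the package help files give the exact form used.

```python
import math


def times_own_transpose(A):
    """Return A A^T for a matrix A stored as nested lists."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in A] for ri in A]


def cov_to_cor(S):
    """Rescale a variance-covariance matrix to a correlation matrix."""
    sd = [math.sqrt(S[i][i]) for i in range(len(S))]
    return [[S[i][j] / (sd[i] * sd[j]) for j in range(len(S))] for i in range(len(S))]


def exponential_correlation(D, alpha):
    """Correlation exp(-d / alpha) between units at distance d (assumed form)."""
    return [[math.exp(-d / alpha) for d in row] for row in D]


# Hypothetical loadings lambda: n_s = 3 species (rows), n_f = 2 factors (columns).
lam = [[0.8, 0.0],
       [0.6, 0.3],
       [-0.5, 0.4]]
Omega = times_own_transpose(lam)         # Omega = lambda lambda^T, 3 x 3
R = cov_to_cor(Omega)                    # associations on the correlation scale
print([round(v, 2) for v in Omega[0]])   # [0.64, 0.48, -0.4]
print(round(R[0][2], 2))                 # -0.78, a negative association

# Correlation of one spatial latent factor between two units 5 distance units
# apart, with a hypothetical spatial scale alpha_rh = 5.
print(exponential_correlation([[0.0, 5.0], [5.0, 0.0]], alpha=5.0)[0][1])  # exp(-1), about 0.368
```

Because $\boldsymbol{\Omega}$ is built from $n_f^{(r)}$ loadings per species, it is positive semi-definite by construction, which is one practical reason for the latent variable approach.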

Typical model outputs

After an HMSC model has been fitted, and thus the posterior distribution of the parameters estimated, the user can output the results in various forms. Most importantly, the user has access to the full posterior distribution, and can thus post-process the results in the way she/he likes. To aid users with limited experience in programming or Bayesian inference, the HMSC packages also include a set of standard outputs that are useful for evaluating the convergence of the MCMC sampling scheme, as well as for providing many kinds of summaries and predictions for interpreting the results. The exact form and amount of output that can be generated will vary depending on the components included in the fitted model. In this section, we introduce some standard output types that the reader may wish to generate. In the context of the real data examples, we illustrate some further kinds of output that are specific to particular study designs.

Page 9: USER MANUAL FOR THE SOFTWARE PACKAGES HMSC-MATLAB … · random effects are to be included) and organizing the data (typically imported from files with standard MatLab or R commands)


MCMC trace plots

With MCMC-based Bayesian inference, it is necessary to examine the mixing of the chains, i.e. whether the MCMC chains have become stationary and are long enough to yield a representative sample from the posterior. One simple way is to assess the convergence of the parameters by visual inspection of the MCMC trace plots. For example, Fig. 2 shows a trace plot of the 𝜸 parameters. In this trace plot, the chains appear to be converging properly. First, there is no obvious transient phase, as the parameter values at the beginning of the chains cover a similar range as those at the end of the chains. Second, while the chains show some level of autocorrelation (consecutive values are more similar to each other than values far apart), they move up and down many times, and thus contain several independent samples from the posterior. As it is not desirable to retain autocorrelated samples (they provide little additional information and increase e.g. file sizes), it is usual to thin the chains after sampling (as done in Fig. 2). To obtain a highly accurate sample of the posterior, the chain could be run e.g. 10 times longer, i.e. up to 100,000 iterations, and then thinned e.g. by a factor of 100 to produce 1000 essentially independent samples from the posterior.
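The effect of thinning can be seen on a synthetic chain. The sketch below uses an AR(1) series from base R's arima.sim as a stand-in for an autocorrelated MCMC chain (an assumption for illustration; it is not HMSC output), keeps every 10th value, and compares the lag-1 autocorrelation before and after thinning.

```r
set.seed(1)
n_iter <- 10000  # total number of MCMC iterations
thin   <- 10     # keep every 10th iteration

# AR(1) series as a synthetic stand-in for an autocorrelated MCMC chain
chain <- as.numeric(arima.sim(model = list(ar = 0.9), n = n_iter))

# Thinning: keep iterations 10, 20, ..., 10000
thinned <- chain[seq(thin, n_iter, by = thin)]
length(thinned)  # 1000 stored samples

# Lag-1 autocorrelation before and after thinning
ac_full    <- acf(chain,   lag.max = 1, plot = FALSE)$acf[2]
ac_thinned <- acf(thinned, lag.max = 1, plot = FALSE)$acf[2]
c(full = ac_full, thinned = ac_thinned)
```

With ar = 0.9, the lag-1 autocorrelation of the thinned chain should be roughly 0.9^10, i.e. about 0.35, compared with about 0.9 for the full chain.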

Figure 2. MCMC trace plot illustrating the mixing of the chains of the 𝜸 parameters. The x-axis shows the MCMC iteration and the y-axis the parameter value at each iteration, with each colour corresponding to one specific parameter within the 𝜸 matrix. The dashed lines correspond to the true parameter values (usually not known, but known here because this example is based on simulated data). The MCMC chain has been run for a total of 10,000 iterations and then thinned to include only every 10th sample, so 1000 distinct values are plotted for each estimated parameter. As detailed in the examples, this plot was generated with the MatLab package using the function call m.plotGamma.


Parameter summaries

Instead of reporting the full joint posterior distribution (which includes information e.g. on posterior correlations among the parameters), one often wishes to report summaries for individual parameters, i.e. properties of so-called marginal distributions. As we are working with Bayesian inference, instead of p-values and confidence intervals (which are typically reported in maximum likelihood estimation), one may compute e.g. posterior probabilities or credible intervals. In Bayesian analysis, the posterior mean (or median) can be considered the best point estimate. To assess the level of statistical support for whether the probability of occurrence (or abundance) of species $j$ increases or decreases with increasing value of a given environmental covariate $k$, one can construct e.g. the 95% central credible interval of $\beta_{jk}$ by computing the 0.025 and 0.975 quantiles of this parameter. If this credible interval contains only positive values, one may conclude that the parameter $\beta_{jk}$ is positive (occurrence of the species increases with the covariate) at this level of statistical support. As illustrated by Fig. 3, in the HMSC framework such posterior summaries can be exported as Excel files, in which the first sheet gives the posterior mean and the other sheets the quantiles defined by the user.
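The summaries described above are simple functions of the posterior sample. The sketch below computes them in base R for a synthetic vector of draws standing in for the posterior of one $\beta_{jk}$; the draws are made up for illustration and are not HMSC output.

```r
set.seed(1)
# Synthetic stand-in for 1000 posterior draws of one coefficient beta_jk
beta_jk <- rnorm(1000, mean = 0.8, sd = 0.2)

post_mean <- mean(beta_jk)                               # point estimate
ci95      <- quantile(beta_jk, probs = c(0.025, 0.975))  # 95% central credible interval
p_pos     <- mean(beta_jk > 0)                           # posterior probability that beta_jk > 0

# The interval lies entirely above zero: a positive effect at the 95% level
positive_support <- ci95[[1]] > 0
```

The posterior probability p_pos is a complementary summary: whenever the 0.025 quantile is above zero, at least 97.5% of the draws are positive.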

Another way to summarize marginal posterior distributions is to visualize them, e.g. as box plots (available in HMSC-MatLab and through the boxplot function in R) or bean plots (through the 'beanplot' package in R; see Fig. 4).

Figure 3. Example of a posterior summary table for the regression parameters 𝜷. The first sheet gives the posterior mean estimates, and the next two sheets the quantiles defined by the user (here the 0.025 and 0.975 quantiles, respectively). The green box indicates a specific parameter that is positive within the 95% credible interval. As shown in the examples, this table was generated using the function call m.summary in the MatLab package.


Association networks

A key aspect of the HMSC framework is that it enables the estimation of species-to-species association matrices 𝛀. These matrices are technically variance-covariance matrices, and they can be turned into correlation matrices 𝐑 by the conversion $R_{j_1 j_2} = \Omega_{j_1 j_2} / \sqrt{\Omega_{j_1 j_1}\,\Omega_{j_2 j_2}}$. Correlations have the advantage that their values always range from -1 to 1, and thus they are easier to interpret than covariance values, whose absolute values depend on the variances. The variance-covariance and correlation matrices can be interpreted as species-to-species association networks, and they can be estimated at different levels in the case of hierarchical study designs. Summaries of both the covariance and the correlation matrices (e.g. posterior means and quantiles) can be output not only as Excel tables (in the same way as illustrated above for the 𝜷 parameters), but also as two kinds of plots, illustrated in Fig. 5. In the association plots, red corresponds to positive species-to-species associations (two species co-occur more often than expected by random) and blue to negative associations (two species co-occur less often than expected by random). The user can set the threshold level of statistical support for associations to be included, and only species pairs with statistical support above the threshold level are shown. The association networks can be plotted as equivalent 'matrix plots' or 'circle plots' (also called chord diagrams). In Fig. 5, we can see that after accounting for the environmental covariates, e.g. species pair 5-6 is found together more often than expected by random, whereas species pair 1-6 is found together less often than expected by random. As discussed more thoroughly in Ovaskainen et al. (2016b), these associations can be considered as hypotheses for positive (species pair 5-6) and negative (species pair 1-6) species interactions.
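The conversion to correlations and the support threshold can be sketched in base R. Here Wishart draws from rWishart serve as a synthetic stand-in for posterior samples of 𝛀 (an assumption for illustration only); each sample is converted with cov2cor, which applies exactly the formula above, and a pair is kept only if the posterior probability of its sign reaches a hypothetical threshold of 0.95.

```r
set.seed(1)
ns <- 4
# Hypothetical covariance: positive 1-2 association, negative 1-3 association
Sigma <- matrix(c( 1.0, 0.6, -0.5, 0.0,
                   0.6, 1.0,  0.0, 0.0,
                  -0.5, 0.0,  1.0, 0.0,
                   0.0, 0.0,  0.0, 1.0), nrow = ns, byrow = TRUE)

# Synthetic stand-in for 1000 posterior samples of Omega (Wishart draws with mean Sigma)
df <- 50
Omega_samples <- rWishart(1000, df = df, Sigma = Sigma / df)

# Convert each sample to a correlation matrix: R_ij = Omega_ij / sqrt(Omega_ii * Omega_jj)
R_samples <- array(apply(Omega_samples, 3, cov2cor), dim = dim(Omega_samples))

R_mean  <- apply(R_samples, c(1, 2), mean)      # posterior mean correlations
support <- apply(R_samples > 0, c(1, 2), mean)  # posterior probability of a positive correlation

# Show only pairs whose sign is supported with probability >= 0.95
threshold <- 0.95
shown <- (support >= threshold | support <= 1 - threshold) & upper.tri(R_mean)
which(shown, arr.ind = TRUE)
```

In this synthetic example only pairs 1-2 (positive) and 1-3 (negative) pass the threshold; the pairs with zero true covariance have sign support near 0.5 and are not shown.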

Figure 4. Example of a bean plot summarizing the posterior distributions of the regression parameters 𝜷. This plot was generated in R with the package 'beanplot'.


Explanatory/predictive power

In Ovaskainen et al. (2016b), we illustrate the explanatory and predictive power of the model with R² values. Such values can be computed either for the same data to which the model has been fitted (training data) or for predictions related to independent data (validation data). These values characterize the match between predicted and true occurrences for the training/validation data. For normally distributed data, HMSC measures R² in the usual way, i.e. by calculating the proportion of explained variance over the total variance in the data (Zar 2010). For binary data, HMSC calculates Tjur's R² (Tjur 2009), defined as the mean model prediction for the sampling units where the species occurs minus the mean model prediction for the sampling units where the species does not occur. For abundance data, the predicted abundances are compared to the observed abundances using Pearson's correlation. For binary data, when evaluating the model's predictions at hierarchical levels higher than the sampling unit, the sampling-unit-level predictions are summed to generate predicted abundances, which are compared to the observed abundances using Pearson's correlation. The upper limit of each of these R² measures is 1, which corresponds to the ideal case where the model completely replicates the data. If the same data are used both to fit the model and to evaluate its performance, the R² values may be inflated due to overfitting. Thus, while HMSC produces R² summaries by default for the same data to which it is fitted, we recommend that the user split the data into separate training and validation data sets and compute the R² values for the validation data. We provide example scripts for separating training and validation data with the fungal and butterfly case studies. As illustrated in Fig. 6, the R² values can be computed separately for each species and then plotted against species' prevalence, abundance, or other such measures.
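The two binary-data measures described above can be written in a few lines of base R. The sketch below is illustrative only: tjur_r2 is a hypothetical helper (not an HMSC function), and the observations, predictions, and plot memberships are made up.

```r
# Tjur R2: mean predicted probability where the species is present
# minus mean predicted probability where it is absent
tjur_r2 <- function(y, p) {
  stopifnot(length(y) == length(p), all(y %in% c(0, 1)))
  mean(p[y == 1]) - mean(p[y == 0])
}

# Hypothetical observations and predicted occurrence probabilities for one species
y       <- c(1, 1, 0, 0, 1, 0)
p       <- c(0.9, 0.7, 0.2, 0.4, 0.6, 0.1)
plot_id <- factor(c("a", "a", "b", "b", "c", "c"))  # plot membership of the sampling units

r2_unit <- tjur_r2(y, p)  # (0.9 + 0.7 + 0.6)/3 - (0.2 + 0.4 + 0.1)/3 = 0.5

# Plot level: sum sampling-unit predictions and observations within plots,
# then compare predicted and observed abundances with Pearson's correlation
pred_plot <- as.vector(tapply(p, plot_id, sum))
obs_plot  <- as.vector(tapply(y, plot_id, sum))
r_plot    <- cor(obs_plot, pred_plot, method = "pearson")
```

Applied to validation data rather than training data, the same functions give the predictive rather than the explanatory version of these measures.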

Figure 5. Matrix plot (left) and circle plot (right) illustrating species-to-species associations measured by a correlation matrix 𝐑. Blue represents negative and red positive correlations between species pairs. Only those associations for which the lower and upper quantiles of a given interval (chosen by the user) are either both positive or both negative are shown. As detailed in the examples, this plot can be generated in HMSC-Matlab using the function call plotCorrelations.


Variance partitioning

It is often of interest to partition the explained variation into components related to responses to (groups of) environmental covariates and to random effects, possibly defined at various levels. One can furthermore compute the amount of among-species variation in the responses of the species to the environmental covariates (informally, the variation among species niches), and ask which fraction of this can be explained by the species traits. The equations on which this variance partitioning is based are presented in the Supplementary Information of Ovaskainen et al. (2016b). Figure 7 illustrates a case of variance partitioning.

Figure 6. Examples of the R² summaries computed for binary (presence-absence) data. The left panel corresponds to the sampling-unit-level R² values and the right panel to the plot-level R² values. The dots correspond to individual species. The species-specific R² values are plotted here against prevalence, i.e. the fraction of occupied sampling units. The figure titles give the mean values over the species, which can be used as an overall summary of the model's explanatory power. As detailed in the examples, this plot can be generated with HMSC-MatLab using the function call plotR2.


Predictions

The HMSC package offers the possibility of generating several kinds of predictions, or in other words, of generating simulated data. There are three kinds of choices related to generating predictions. First, one can generate either expected values (e.g. occupancy probabilities) or actual realizations (e.g. presences or absences). The choice between these depends on the use of the prediction: if interested in e.g. species-specific occurrence probabilities, it is more efficient to generate expected values directly rather than to generate many realizations and then average over them. But if interested in predicting co-occurrence patterns, one typically wishes to predict realizations, as (at least for new data; see below) the information about co-occurrence is not carried over by species-specific occupancy probabilities.
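The difference between expected values and realizations can be illustrated with a simplified probit model for two species whose latent residuals are correlated. This is a base-R sketch under stated assumptions (hypothetical linear predictors and residual correlation; not HMSC's prediction code): the marginal occurrence probabilities come directly from pnorm, but only the jointly simulated realizations reveal how often the two species co-occur.

```r
set.seed(1)
n   <- 20000                       # number of simulated realizations
L   <- c(0, 0)                     # hypothetical linear predictors for two species
rho <- 0.8                         # hypothetical residual correlation between them
C   <- matrix(c(1, rho, rho, 1), 2)

# Expected values under the probit link: occurrence probabilities pnorm(L)
p_expected <- pnorm(L)

# Realizations: correlated latent normals, thresholded at zero
eps <- matrix(rnorm(n * 2), n, 2) %*% chol(C)
z   <- sweep(eps, 2, L, "+")
y   <- (z > 0) * 1

colMeans(y)            # close to p_expected, i.e. 0.5 for both species
mean(y[, 1] * y[, 2])  # joint co-occurrence frequency from the realizations
prod(p_expected)       # 0.25: co-occurrence implied by the probabilities alone
```

Here the realized co-occurrence frequency should be close to the theoretical value 1/4 + arcsin(0.8)/(2π), about 0.40, well above the 0.25 one would infer from the species-specific probabilities alone.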

Figure 7. Example of a variance partitioning plot. Each bar corresponds to one species, and the colours show the proportion of variance related to different variance components (the legend shows mean values over the species). The plot title shows the percentage of variance in species' niches that can be attributed to the species traits. As detailed in the examples, this plot can be generated with HMSC-Matlab using the function call plotVariancePartitioning.


Second, one can generate predictions either for units that were included in the estimation (training data) or for new units (called validation data or new data). Intermediate cases are also possible: e.g. making a prediction for a new sampling unit located in a plot that was included in the training data.

Third, for new sampling units (i.e. sampling units that are not part of the data used for model fitting), it is possible to perform unconditional or conditional predictions. Unconditional predictions are the typical predictions needed to ask how the response variable (e.g. the occurrence of a species) changes under different environmental conditions (e.g. a changing value of an environmental covariate). In the case of conditional predictions, one can additionally ask how inference on species occurrence is influenced by information about the occurrence of the same or other species in the same or other sampling units.

All types of predictions can be made for multiple samples of the model parameters drawn from the posterior distribution in order to propagate parameter uncertainty to the predictions. We illustrate various kinds of predictions with the simulated data examples and with the real examples related to fungi, butterflies, birds and bryophytes.
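Propagating parameter uncertainty amounts to making one prediction per posterior sample and then summarizing across samples. The base-R sketch below does this for an unconditional probit prediction along a covariate gradient; the posterior samples of the intercept and slope are synthetic stand-ins, not HMSC output.

```r
set.seed(1)
n_post <- 1000
# Synthetic stand-in for posterior samples of one species' intercept and slope
beta0 <- rnorm(n_post, mean = -0.5, sd = 0.1)
beta1 <- rnorm(n_post, mean =  1.0, sd = 0.2)

# New values of an environmental covariate (unconditional prediction)
x_new <- seq(-2, 2, by = 1)

# Probit predictions: one row per posterior sample, one column per x_new value
pred <- pnorm(outer(beta0, rep(1, length(x_new))) + outer(beta1, x_new))

pred_mean <- colMeans(pred)                                    # posterior mean prediction
pred_ci   <- apply(pred, 2, quantile, probs = c(0.025, 0.975)) # 95% credible band
```

The width of pred_ci at each covariate value reflects how uncertain the parameters are; plugging in only the posterior means would hide this uncertainty.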

Simulated Example 1

We start illustrating how to perform analyses with HMSC with a highly simplified example consisting of presence-absence data for 10 species, acquired in 200 sampling units belonging to 10 plots. This example is found under the folder 'simulated example 1'. The data that are the starting point for the analyses are found as .csv files in the subfolder "data". In this folder, the data have been organised in three files (Fig. 8). These files include the information needed to construct the species occurrence data matrix 𝐘 (file 'Y.csv', where species are in columns and sampling units in rows), the environmental data matrix 𝐗 (file 'X.csv', where covariates are in columns and sampling units in rows), and the index matrix 𝚷 (file 'pi.csv', where the levels of the study design are in columns and sampling units in rows).

Figure 8. Illustration of the files containing the data matrices for Simulated Example 1.


With Simulated Example 1, we analyse these data with the aim of quantifying species niches (the responses of the species to the environmental covariates, through the parameters 𝜷), as well as species-to-species association networks at the levels of sampling units and plots (through the parameters 𝛀). The script that runs all the analyses, and which is described below, is called 'simulated example 1.m' (for MatLab) and 'simulated example 1.R' (for R). Before moving on with Simulated Example 1, let us make three remarks.

1) The data matrices do not need to be in this specific format. File types other than .csv can also be used, and the files may or may not include headers. Further, the information needed to construct e.g. the 𝐘 and 𝐗 matrices can be in a single file or in two separate files. This is because the actual data matrices 𝐘, 𝐗 and 𝚷 are constructed by the user from the imported files with standard MatLab or R functions, as is done in our example scripts.

2) In Simulated Example 2, we will return to these same data files, but we will additionally assume that we have information on species' traits (𝐓-matrix) and phylogenetic correlations among the species (𝐂-matrix). The reason for not including those yet is to keep this first example as simple as possible, as well as to illustrate how additional layers can be added to the model if more types of data are available.

3) In the Appendix of the manual, we describe how these simulated data were generated with HMSC-Matlab. The reason for presenting the generation of the simulated data only in the Appendix is that we expect the user to be mainly interested in applying HMSC to real empirical data, in which case she or he may not be interested in the description of data generation. On the other hand, the ability to generate simulated data with known parameter values (here called 'true parameter values') is very useful for two purposes: 1) for testing the validity of the posterior sampling methods that we have developed (we have done such testing rather extensively, but we appreciate any bug reports from users), and 2) for conducting analyses of statistical power (i.e. generating varying amounts of data for varying effect sizes and examining how accurately the true parameter values can be identified).
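As a small illustration of remark 1, the base-R sketch below reads a file that has no header row and holds both the species and the covariate columns, and then splits it into 𝐘 and 𝐗. The file content, the column layout, and the names sp1, sp2, cov1, cov2 are hypothetical; with a real file, the text argument would be replaced by the file path.

```r
# Hypothetical file content: species and covariates in one file, without a header row
txt <- "1,0,0.3,-1.2
0,1,1.1,0.4
1,1,-0.5,0.9"
raw <- read.csv(text = txt, header = FALSE)

# Split the columns into the response matrix Y and the covariate matrix X
Y <- as.matrix(raw[, 1:2])
X <- as.matrix(raw[, 3:4])

# Without a header, names have to be added by hand (hypothetical names)
colnames(Y) <- c("sp1", "sp2")
colnames(X) <- c("cov1", "cov2")
```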

Preliminaries

We first clean the workspace and set a seed for the random number generator so that it always starts from the same value, thus ensuring the reproducibility of the results. With HMSC-R, we also load the 'HMSC' package using the command library(HMSC).

MATLAB SCRIPT
clearvars;
rng(1);

R SCRIPT
rm(list = ls())
library(HMSC)
set.seed(1)

The next step with HMSC-Matlab is to ensure that the path includes the directory of the HMSC package (so that the HMSC functions can be called in case the HMSC directory is not the same as the working directory), as well as to define the folder containing the data and the main folder of the case study. With HMSC-R, the user should set the main HMSC folder (containing the 'data' and the 'case studies' folders) as the working directory.

MATLAB SCRIPT
% Add a path to the HMSC directory
addpath('D:\MY HMSC FOLDER\HMSC class');
% Assign the folder containing the simulated data
dataFolder = 'D:\MY HMSC FOLDER\data\simulated';
% Assign the folder where the results for this model will be saved (under
% the results subfolder to be generated by HMSC)
folder = 'D:\MY HMSC FOLDER\case studies\simulated example 1';

R SCRIPT
#============================================================================
### Set working directory
#============================================================================
setwd("/my HMSC-R folder/")

In these lines, the user should replace the placeholder paths with the paths to the folders where the software, the main folder, and the data are located.

Reading in the data

The next step is to read in the data files. The HMSC packages do not provide any special functions for this, so the usual MatLab or R functions are to be used. Below, we construct the numerical data matrices and store the names of the species, covariates, and levels, so that we can refer to them when outputting the results. In Matlab, we convert the levels of the 𝚷-matrix to a cell array of strings, which is analogous to defining factors in R (this enables us to use e.g. the names of sites as the levels of the site-level random effects).


MATLAB SCRIPT
file = fullfile(dataFolder, 'Y.csv');
Yda = importdata(file);
Y = Yda.data;
species = Yda.textdata;
file = fullfile(dataFolder, 'X.csv');
Xda = importdata(file);
X = Xda.data;
covariates = Xda.textdata;
file = fullfile(dataFolder, 'pi.csv');
PIda = importdata(file);
levels = PIda.textdata;
% Convert pi to a cell array of strings
piCell = num2cell(PIda.data);
piCell = cellfun(@num2str, piCell, 'UniformOutput', false);

R SCRIPT
# Community matrix
Y <- read.csv("data/simulated/Y.csv")
# Covariates
X <- read.csv("data/simulated/X.csv")
# Random effects
Pi <- read.csv("data/simulated/Pi.csv")
# Convert all columns of Pi to factors
for (i in 1:ncol(Pi)) {
  Pi[, i] <- as.factor(Pi[, i])
}

Next we describe (separately for MatLab and R) how to construct the main HMSC object. In the end, this object will contain all essential information about the model: its structure, the data, and the posterior distribution.

Constructing the HMSC object in MatLab

The user needs to specify the folder of the study, whether traits are included in the model, whether phylogenetic correlations are included, and whether the random effects at the included levels (if any) have a spatial structure. In this example, we include neither traits nor phylogeny, and we include random effects at two levels, neither with a spatial structure.

MATLAB SCRIPT
% General format: m = Hmsc(folder, traits, phylogeny, [spatial])
% In this example we have no traits, no phylogeny, and random effects at two
% levels (called sampling units and plots), neither of which is spatial
m = Hmsc(folder, false, false, [false, false]);


We next incorporate the data matrices into the HMSC object and define the data model (here probit regression for presence-absence data). Further, we incorporate the names of the species, covariates, and levels into the object.

MATLAB SCRIPT
% Adding the data to the HMSC object.
% General format: m.setData(Y, data_model, X, pi, xy, T, C)
% Data types not included are specified as empty matrices
m.setData(Y, 'probit', X, piCell, [], [], []);
% Adding names of species, covariates and levels to the HMSC object
m.setSpeciesNames(species);
m.setCovNames(covariates);
m.setLevelNames(levels);

Constructing an HMSC object in R

In R, the data need to have a particular structure, i.e. they need to be an object of class HMSCdata. Formatting the data in this way can be carried out using the as.HMSCdata function. When the data are in the right format and have passed a series of tests (included in the as.HMSCdata function), one can start the actual analyses. It is common for the as.HMSCdata function to send various messages and warnings to the console, asking the user to make sure that the data were formatted as the software requires. The as.HMSCdata function expects the data for the different matrices to be formatted in a particular way (Fig. 1). As such, error or warning messages may be sent to help users organize the data in the proper format. For example, the species and covariate matrices should always contain numeric values, while the random effects should be a factor if there is only a single random effect, or a data.frame if there are multiple random effects. If traits are considered in the analysis, they should be composed only of numeric values, while the phylogeny needs to be a square symmetric correlation or covariance matrix (if it is a covariance matrix, the as.HMSCdata function will convert it to a correlation matrix). The autocorrelated random effect has to be either a data.frame whose first column is a factor and whose other columns are spatial or temporal coordinates (the number of coordinates is not constrained), or a list of data.frames. The function also includes the logical arguments scaleX and scaleTr, which scale (center and divide by the standard deviation) the matrices of environmental covariates and traits, respectively. Lastly, it also includes the logical arguments interceptX and interceptTr, which automatically add an intercept to the environmental covariate and trait matrices, respectively. The last four arguments presented above (scaleX, scaleTr, interceptX and interceptTr) are set to TRUE by default.

R SCRIPT
formdata <- as.HMSCdata(Y = Y, X = X, Random = Pi, interceptX = FALSE, scaleX = FALSE)


The R package includes these data, which can be accessed as shown in the following R script box. Note, however, that knowing how to convert any data into an object of class HMSCdata is the only way to go forward with this package; the previous steps are therefore essential for reading any data into the HMSC R package. For simplicity, in the later examples we will import the data by loading them in the manner shown below.

R SCRIPT
data("simulEx1")

Defining the prior distributions

We next define the prior distributions simply by setting them to the default values defined in the Supporting Information of Ovaskainen et al. (2016b). A user interested in modifying the prior distributions can do so using the functions explained in the more detailed software-specific help files.

MATLAB SCRIPT
% Here we assume the default prior distributions
m.setPriorsDefault();

In HMSC-R, if the user wants to initiate a specific set of parameters for the MCMC algorithm, this can be done with the as.HMSCparam function. Note that if only one or a reduced set of parameters needs to be specified, the as.HMSCparam function will automatically generate all the other parameters. In the later HMSC-R examples, the priors are already defined in the loaded data object, so there will be no need to define them separately, but when users want to analyse their own real data, this is a necessary step.

R SCRIPT
formprior <- as.HMSCprior(formdata)
formparam <- as.HMSCparam(formdata, formprior)

Scaling the covariates

It is typically highly recommended to scale the covariates in the X matrix so that their mean is zero and their variance one. This is because the default priors are compatible with such scaled covariates. In the Matlab package, the scaling is done during the MCMC estimation, so that the estimated parameters relate to the original (non-scaled) values in the X matrix. In the R package, the scaling is done when constructing the HMSC object.
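The centring and scaling described under "Scaling the covariates" can be sketched in base R as follows. This is a generic illustration with a hypothetical covariate matrix and hypothetical coefficients, not HMSC's internal code: the intercept column is left untouched, the other columns get mean zero and standard deviation one, and coefficients estimated on the scaled covariates are mapped back to the original scale, which is the kind of relationship that lets estimates refer to the non-scaled X.

```r
# Hypothetical covariate matrix whose first column is an intercept
X <- cbind(intercept = 1,
           temp  = c(10, 12, 15, 20),
           moist = c(0.2, 0.5, 0.4, 0.9))

# Centre and scale all columns except the intercept
m <- colMeans(X[, -1])
s <- apply(X[, -1], 2, sd)
X_scaled <- X
X_scaled[, -1] <- scale(X[, -1], center = m, scale = s)

# Hypothetical coefficients estimated on the scaled covariates
b_scaled <- c(0.3, 1.2, -0.4)

# Equivalent coefficients for the original (non-scaled) covariates
b_orig <- c(b_scaled[1] - sum(b_scaled[-1] * m / s), b_scaled[-1] / s)

# Both parameterisations give the same linear predictor
all.equal(as.vector(X_scaled %*% b_scaled), as.vector(X %*% b_orig))
```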


Setting the values of the 'true parameters'

As already mentioned, the true parameter values that structure the data are not usually known. Further, if the true parameter values were known, there would be no need to fit the model and thus to estimate the parameters! But as the data we consider here have been generated by the same statistical model that is used to fit them, we know the true parameter values. We have stored these true parameter values in a file, which we will next read and include in the HMSC object. By doing so, we can compare the estimated parameter values to the true ones, and thus assess the performance of the model. With real data, this step is simply omitted.

MATLAB SCRIPT
% Here we set options for centring and scaling the covariates by giving a
% vector that specifies which columns not to scale (0), which to scale (1),
% and which column should be considered as the intercept when scaling (2).
% This is because the first column of the matrix m.X is the intercept
% and the remaining ones are the covariates to be scaled.
covScaleFlag = [2, ones(1, m.nc-1)];
m.setCovScaling(covScaleFlag);

R SCRIPT
# The scaling is done automatically when constructing the HMSC object
# with the scaleX = TRUE/FALSE and scaleTr = TRUE/FALSE options.

MATLAB SCRIPT
% For this example, we know the "true parameters", as the data analyzed here
% have been generated by the HMSC model itself. We read the values from
% file and add them to the HMSC object.
truePar = HmscPar.readFromText(fullfile(dataFolder, 'truePar.txt'));
m.setParameters(truePar);

R SCRIPT
data("simulParamEx1")

Setting options for MCMC sampling

In the context of a Bayesian analysis, model fitting or parameter estimation corresponds to sampling from the posterior distribution. As is often done with hierarchical model structures, we use Markov chain Monte Carlo (MCMC) methods for posterior sampling. For technically oriented users: by using conjugate prior distributions (or discrete priors for the spatial scale parameters and the phylogenetic signal parameter), we can use Gibbs sampling, i.e. sample each parameter (which may be a scalar, vector or matrix) in turn directly from its full conditional distribution. As a result, there are no proposal-acceptance steps, as is the case with e.g. the widely used Metropolis-Hastings algorithm. Thanks to the ability to sample directly from the full conditional distributions, the mixing of the parameters is generally rather good, and there is no need to tune the sampling algorithm, e.g. with automatic or manual adaptation. In spite of this, there are several parameters the user needs to set to guide the sampling process. However, the main take-home message for users who are not technically oriented is that none of these choices influences the end result, i.e. the parameter estimates, assuming that the chains mix well enough (see the section on MCMC trace plots). Thus, users may start their own analyses with the same settings that we use below, and then increase the amount of sampling if the mixing of the chains indicates a need for this. A technical detail related to the estimation of the association matrices is that the numbers of the underlying latent factors are adapted only in the initial phase of the MCMC, after which they are fixed. Thus, no posterior distribution is estimated for their numbers. This is because technically the model has an infinite number of factors, of which those that are negligible are dropped to speed up the calculations. The user needs to specify the end of the adaptation phase. In the scripts below, we stop the adaptation at the beginning of the fifth run. As an alternative, the user can fix the number of latent factors to a certain value from the very beginning, and thus decide not to adapt their number at all. With HMSC-Matlab, the results can be saved to RAM, written to files, or both. In the example script we save them to RAM only, meaning that they will be included in the result object.
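For readers who want to see what "sampling each parameter in turn from its full conditional distribution" means in practice, here is a minimal Gibbs sampler for a much simpler model than HMSC: normally distributed data with unknown mean and variance, a normal prior on the mean, and an inverse-gamma prior on the variance. These conjugate priors make both full conditionals standard distributions, so there is no accept/reject step. The model, priors, and data are assumptions chosen for illustration; this is not the HMSC sampler.

```r
set.seed(1)
y <- rnorm(200, mean = 3, sd = 2)  # synthetic data
n <- length(y)
m0 <- 0; v0 <- 100                 # prior: mu ~ Normal(m0, v0)
a0 <- 1; b0 <- 1                   # prior: sigma2 ~ Inverse-Gamma(a0, b0)

n_iter <- 5000
mu <- 0; sigma2 <- 1               # arbitrary starting values
draws <- matrix(NA_real_, n_iter, 2, dimnames = list(NULL, c("mu", "sigma2")))
for (it in seq_len(n_iter)) {
  # mu | sigma2, y: normal full conditional (conjugate update)
  prec <- 1 / v0 + n / sigma2
  mu <- rnorm(1, mean = (m0 / v0 + sum(y) / sigma2) / prec, sd = sqrt(1 / prec))
  # sigma2 | mu, y: inverse-gamma full conditional, drawn as 1 / Gamma
  sigma2 <- 1 / rgamma(1, shape = a0 + n / 2, rate = b0 + sum((y - mu)^2) / 2)
  draws[it, ] <- c(mu, sigma2)
}
post_means <- colMeans(draws[-(1:500), ])  # discard the first 500 draws as burn-in
post_means                                 # close to mean(y) and var(y)
```

With such weak priors the posterior means should be close to the sample mean and variance of y, and the trace of draws can be inspected in the same way as described in the section on MCMC trace plots.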
After the MCMC sampling, we save the result object (and thus also the posterior distribution) to a file, so there is usually no need to also write the samples to a file during model fitting. However, if model fitting is very time-consuming, it may be beneficial to also store the samples in a file, e.g. in case of a computer failure during model fitting. In the example scripts, we specify that for each 'run' of MCMC we will sample 100 draws from the posterior, obtained by thinning the chain, i.e. by storing the value from every 10th MCMC iteration (thus running 1000 iterations in total for each run).

MATLAB SCRIPT
% General formats:
% m.setMCMCOptions(how many stored at each run, thinning)
% m.setMCMCAdapt([run, iteration], number of factors fixed)
% m.setMCMCSaveOptions(store to RAM, save to file)
% In this example
m.setMCMCOptions(100, 10);
m.setMCMCAdapt([5, 0], false);
m.setMCMCSaveOptions(true, false);

R SCRIPT
# These steps are not necessary with the R package because they are handled
# by the "hmsc" function.


Performing the MCMC sampling

After all the preliminary steps above have been carried out, we are ready to perform the MCMC sampling. In the scripts below, we have decided to conduct the MCMC sampling for 10 runs, the length of which we defined above. Note that one can obtain the same amount of sampling by changing either the number of runs or the number of iterations within a run. The reason for having multiple runs (instead of a single long one) is that with very large models (the fitting of which may take a long and somewhat unpredictable time), it can be beneficial to set up the estimation for a very large number of runs and to select the option of storing the results to a file after each run. In this way, one can start examining the results while letting the estimation procedure continue, and stop the estimation only once the mixing of the chains is satisfactory. Optionally, the user can set the burn-in phase and thin the results further after the sampling. The resulting thinned sample is stored in addition to the original sample and used for all further analyses; the original sample can then be discarded to save disk space. Here we decide to thin the chain further by considering only every 5th sample, and to drop the results from the first 4 runs as a burn-in period. When performing the MCMC sampling, it is not necessary to specify the initial parameter values, as they are selected internally. During the model fitting phase, HMSC reports on the progress of the MCMC iterations as well as on the amount of computational time used. With this example, the estimation should run in less than a minute.
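The settings used in this example translate into the following sample sizes. This is plain arithmetic, sketched in R, under the assumption that the post-thinning step keeps runs 5 to 10 (i.e. that m.repN equals the number of runs here) and then retains every 5th stored sample; the variable names are illustrative only.

```r
samples_per_run <- 100  # m.setMCMCOptions(100, 10): samples stored per run
thin_within     <- 10   # ... storing every 10th iteration
n_runs          <- 10   # m.sampleMCMC(10, ...)
first_kept_run  <- 5    # m.setPostThinning(5:m.repN, 5): runs 5 onwards kept (assumption)
thin_post       <- 5    # ... and every 5th stored sample retained

iter_per_run   <- samples_per_run * thin_within                    # 1000
total_iter     <- iter_per_run * n_runs                            # 10000
stored_samples <- samples_per_run * n_runs                         # 1000
kept_samples   <- samples_per_run * (n_runs - first_kept_run + 1)  # 600
final_samples  <- kept_samples %/% thin_post                       # 120
```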

MATLAB SCRIPT
% General format:
% m.sampleMCMC(runs, append existing MCMC?, starting value ([] if to be
% determined automatically), level of technical output during sampling);
% m.setPostThinning(included runs, additional thinning);
% In this example
m.sampleMCMC(10, false, [], 3);
m.setPostThinning(5:m.repN, 5); % m.repN gives the length of the posterior sample
m.postRamClear();

R SCRIPT
model <- hmsc(simulEx1, family = "probit", niter = 10000, nburn = 1000, thin = 10)

Saving the HMSC object

Now we have finished the first part of the script, which corresponds to the estimation of the parameters. At this stage, it is beneficial to store the HMSC object in a file outside the work environment. In this way, we can develop the script for generating the model output independently of the estimation phase. Most importantly, we can return to the fitted model e.g. after restarting the software (MatLab or R) without re-running the sometimes time-consuming model fitting phase.

Preliminaries before generating model outputs

Now that the parameters have been estimated, it is time to start looking at the results. As the first step, the HMSC object should be loaded back from the file if the software has been restarted since the estimation was run, or if the object has otherwise been cleared from memory.

In HMSC-Matlab, it is useful (but optional) to define variables that describe what kind of output is to be generated and how it will be shown or stored. In this way, one can easily, e.g., suppress the generation of a possibly large number of figures and/or files just by changing these settings at the beginning of the script. Here we specify that we know the true values and wish to include them in the relevant outputs. We do not wish to display any plots on the screen, but to save them as files in the 'results' folder. We wish to see mixing plots and to generate parameter summary tables as well as box plots of the marginal distributions of the parameters. For the association networks, we generate both matrix plots and circle plots.

MATLAB SCRIPT
save(fullfile(folder, 'model.mat'), 'm')

R SCRIPT
save(model, file = "case studies/simulated example 1/model.RData")

MATLAB SCRIPT
load(fullfile(folder, 'model.mat'), 'm')

R SCRIPT
load(file = "case studies/simulated example 1/model.RData")

MATLAB SCRIPT
showTrueValues = true;
showToScreen = false;
saveToFile = true;
mixingPlots = true;
boxPlots = true;
summaryFiles = true;
matrixPlots = true;
circlePlots = true;

Producing MCMC trace plots

The example code below demonstrates how to generate MCMC trace plots (or mixing plots) for the parameters 𝜷 (see Fig. 2), 𝐕 and 𝛀. MCMC trace plots for other parameters can be produced with the same kind of function calls (see the help files for details).

The following R code produces trace plots and density plots for the 𝜷 parameters.

Producing posterior summaries

The example script for HMSC-Matlab below shows how the marginal posterior distributions of some of the parameters can be summarized in an Excel table (illustrated in Fig. 3), as well as how to plot them as box plots (illustrated in Fig. 3b). The Excel files include the posterior mean (first sheet) and the quantiles that the user selects (in the example, the 0.025 and 0.975 quantiles). Summaries of other parameters can be produced with the same kind of function calls (see the help files for details).

R SCRIPT
# The output settings defined above for HMSC-Matlab are not needed with HMSC-R.

MATLAB SCRIPT
% We construct the files mixing_beta.tiff, mixing_V.tiff, mixing_omega_1.tiff and mixing_omega_2.tiff
% The first argument of plotOmega controls the maximum number of randomly selected pairs to display.
if mixingPlots
    m.plotBeta(showTrueValues, showToScreen, saveToFile, 'mix');
    m.plotV(showTrueValues, showToScreen, saveToFile, 'mix');
    m.plotOmega(10, showTrueValues, showToScreen, saveToFile, 'mix');
end

R SCRIPT
### Mixing object
mixing <- as.mcmc(model, parameters = "paramX")
### Draw trace and density plots for all combinations of parameters
plot(mixing)
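Trace plots, box plots and bean plots all work on a layout with one row per posterior sample and one column per parameter. The base-R sketch below shows how such a layout can be built directly from a covariates × species × samples array. It assumes, as the apply(..., 1:2, ...) calls on model$results$estimation$paramX in this section suggest, that the samples are stored along the third dimension; the column naming scheme is our own and need not match the names produced by as.mcmc.

```r
# Flatten a covariates x species x samples array into a samples x parameters matrix
flattenPosterior <- function(post) {
  d <- dim(post)
  # column-major storage: element [i, j, k] becomes row i + (j - 1) * d[1], column k
  flat <- t(matrix(post, nrow = d[1] * d[2], ncol = d[3]))
  covNames <- dimnames(post)[[1]]
  spNames <- dimnames(post)[[2]]
  if (is.null(covNames)) covNames <- paste0("cov", seq_len(d[1]))
  if (is.null(spNames)) spNames <- paste0("sp", seq_len(d[2]))
  colnames(flat) <- paste(rep(covNames, times = d[2]), rep(spNames, each = d[1]), sep = "_")
  flat
}

# Hypothetical example: 2 covariates, 3 species, 4 posterior samples
post <- array(1:24, dim = c(2, 3, 4))
flat <- flattenPosterior(post)
```

The resulting matrix can be passed, for example, to matplot(flat, type = "l", lty = 1) for a basic trace plot, or to boxplot(flat, las = 2).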

The example script for HMSC-R below shows how the marginal posterior distributions of the 𝜷 parameters can be summarized with beanplots, boxplots and a table, and how to draw credible interval plots.

MATLAB SCRIPT
% We construct the files beta.xlsx, V.xlsx, omega_1.xlsx and omega_2.xlsx
% and also box_beta.tiff, box_V.tiff, box_omega_1.tiff and box_omega_2.tiff.
if summaryFiles
    m.summary('beta', [0.025 0.975], showTrueValues, showToScreen, saveToFile);
    m.summary('V', [0.025 0.975], showTrueValues, showToScreen, saveToFile);
    m.summary('omega', [0.025 0.975], showTrueValues, showToScreen, saveToFile);
end

MATLAB SCRIPT
if boxPlots
    m.plotBeta(showTrueValues, showToScreen, saveToFile, 'box');
    m.plotV(showTrueValues, showToScreen, saveToFile, 'box');
    m.plotOmega(10, showTrueValues, showToScreen, saveToFile, 'box');
end

R SCRIPT
### Convert the mixing object to a data frame
mixingDF <- as.data.frame(mixing)
### Draw beanplots
library(beanplot)
par(mar = c(7, 4, 4, 2))
beanplot(mixingDF, las = 2)
### Draw boxplot for each parameter
par(mar = c(7, 4, 4, 2))
boxplot(mixingDF, las = 2)
### True values
truth <- as.vector(simulParamEx1$param$paramX)
### Average
average <- apply(model$results$estimation$paramX, 1:2, mean)
### 95% credible intervals
CI.025 <- apply(model$results$estimation$paramX, 1:2, quantile, probs = 0.025)
CI.975 <- apply(model$results$estimation$paramX, 1:2, quantile, probs = 0.975)
CI <- cbind(as.vector(CI.025), as.vector(CI.975))
### Draw credible interval plots
plot(0, 0, xlim = c(1, nrow(CI)), ylim = range(CI, truth), type = "n", xlab = "", ylab = "", main = "paramX")
abline(h = 0, col = "grey")
arrows(x0 = 1:nrow(CI), x1 = 1:nrow(CI), y0 = CI[, 1], y1 = CI[, 2], code = 3, angle = 90, length = 0.05)
points(1:nrow(CI), average, pch = 15, cex = 1.5)
points(1:nrow(CI), truth, col = "red", pch = 19)

R SCRIPT
### Summary table
paramXCITable <- cbind(unlist(as.data.frame(average)), unlist(as.data.frame(CI.025)), unlist(as.data.frame(CI.975)))
colnames(paramXCITable) <- c("paramX", "lowerCI", "upperCI")
rownames(paramXCITable) <- paste(rep(colnames(average), each = nrow(average)), "_", rep(rownames(average), ncol(average)), sep = "")

Performing variance partitioning

With the scripts below, the user can perform variance partitioning for the model (illustrated in Figure 7 and explained in more detail in the section 'Variance partitioning' under 'Typical model outputs').
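To make the idea concrete, the base-R sketch below computes a deliberately simplified variance partitioning. It is not the algorithm behind computeVariances or variPart: for each species, it takes the variance of the linear predictor contributed by each covariate group (ignoring covariances between groups), appends user-supplied random-effect variances, and rescales each row to sum to one, which is the kind of per-species proportion shown in the variance partitioning bar plots. The function name and its inputs are hypothetical.

```r
# X: sampling units x covariates; beta: covariates x species;
# group: one group label per column of X; randomVar: species x levels variances
simpleVariPart <- function(X, beta, group, randomVar) {
  groups <- unique(group)
  ns <- ncol(beta)
  fixed <- sapply(groups, function(g) {
    cols <- which(group == g)
    # variance over sampling units of this group's contribution, per species
    apply(X[, cols, drop = FALSE] %*% beta[cols, , drop = FALSE], 2, var)
  })
  fixed <- matrix(fixed, nrow = ns, dimnames = list(colnames(beta), groups))
  parts <- cbind(fixed, randomVar)
  parts / rowSums(parts)  # each row (species) sums to one
}

# Hypothetical example: intercept + 2 covariates, 2 species, one random level
set.seed(1)
X <- cbind(1, rnorm(50), rnorm(50))
beta <- matrix(c(0, 1, 2, 0, 0.5, 0.5), nrow = 3)
randomVar <- matrix(c(1, 1), ncol = 1, dimnames = list(NULL, "plot"))
vp <- simpleVariPart(X, beta, c("climate", "climate", "habitat"), randomVar)
```

As in the scripts of this section, the intercept is placed in the 'climate' group; being constant, it contributes no variance.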

MATLAB SCRIPT
% The script below produces as output the files variance-partitioning.xlsx and
% variance-partitioning_legend.tiff to the results folder
% group assigns each column of X to a group: the intercept and the first
% covariate form group 1 ('climate') and the second covariate group 2 ('habitat')
group = [1 1 2];
groupnames = {'climate', 'habitat'};
[fixed, fixedsplit, random, traitR2] = m.computeVariances(group);
m.plotVariancePartitioning(fixed, fixedsplit, random, traitR2, groupnames, showToScreen, saveToFile);

R SCRIPT
variationPart <- variPart(model, c(rep("climate", 2), "habitat"))
barplot(t(variationPart), legend.text = colnames(variationPart),
        args.legend = list(y = 1.1, x = nrow(variationPart)/2, xjust = 0.5, horiz = TRUE))

Plotting association networks

The example scripts below show how to plot species-to-species association networks (illustrated in Fig. 4) at the levels included in the model (here the sampling unit and plot levels). The user can define a threshold to show only those correlations for which the posterior probability of being of a given sign (positive or negative) is greater than the threshold.

MATLAB SCRIPT
% The script below produces as output the files
% circle_Associations-sampling_unit.tiff, circle_Associations-plot.tiff,
% matrix_Associations-sampling_unit.tiff and matrix_Associations-plot.tiff
% Here we wish to include in the plot only associations which are either
% positive or negative with more than 75% posterior probability
threshold = 0.75;
for level = 1:m.nr
    [correlations, support, index] = m.computeCorrelations(level, threshold);
    ctoplot = correlations.*(support>threshold);
    ordering = index;
    m.plotCorrelations(ctoplot, ordering, ['Associations-', m.levelNames{level}], [matrixPlots circlePlots], showToScreen, saveToFile);
end
% Above, we loop over the m.nr = 2 levels to produce the association matrices
% both at the sampling unit and plot levels.
% 'correlations' is the posterior mean of the correlations
% 'support' gives the level of statistical support for each correlation
% 'index' suggests an ordering of the species that aims to cluster
% groups of co-occurring species in the figures.
% If the original order is to be used, define ordering = 1:m.ns
% 'ctoplot' sets those correlations to zero (resulting in white colour in
% the figure) for which the level of statistical support does not exceed
% the threshold.
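The thresholding in the MATLAB script above can be written out explicitly. The base-R sketch below assumes a species × species × samples array of posterior correlation samples (a hypothetical input; the array returned by corRandomEff also has a dimension for the random-effect level, which would be selected first, e.g. corMat[, , , 1]). It defines the support of a correlation as the posterior probability of its more probable sign, which follows the description of 'support' in the comments above but is not the package code.

```r
# corSamples: species x species x posterior samples array of correlations
supportMask <- function(corSamples, threshold = 0.75) {
  meanCor <- apply(corSamples, 1:2, mean)   # posterior mean correlation
  pPos <- apply(corSamples > 0, 1:2, mean)  # posterior probability of a positive sign
  pNeg <- apply(corSamples < 0, 1:2, mean)  # posterior probability of a negative sign
  support <- pmax(pPos, pNeg)
  # correlations without sufficient support are set to zero (white in the plots)
  list(mean = meanCor, support = support, toPlot = meanCor * (support > threshold))
}

# Hypothetical example: two species, four posterior samples
s <- array(0, dim = c(2, 2, 4))
s[1, 1, ] <- 1
s[2, 2, ] <- 1
s[1, 2, ] <- s[2, 1, ] <- c(0.5, 0.6, 0.4, -0.1)
res <- supportMask(s, threshold = 0.75)
```

Here the off-diagonal correlation is positive in three of four samples, so its support is exactly 0.75 and it is masked, matching the strict inequality support>threshold used in the MATLAB code.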

R SCRIPT
### Random-effect covariance matrices (cor = FALSE), compared below with the true covariances
corMat <- corRandomEff(model, cor = FALSE)
#----------------------------------------------------------------------------
### Sampling units level
#----------------------------------------------------------------------------
### Isolate the values of interest
ltri <- lower.tri(apply(corMat[, , , 1], 1:2, quantile, probs = 0.025), diag = TRUE)
### True values
truth <- as.vector(tcrossprod(simulParamEx1$param$paramLatent[[1]])[ltri])
### Average
average <- as.vector(apply(corMat[, , , 1], 1:2, mean)[ltri])
### 95% credible intervals
corMat.025 <- as.vector(apply(corMat[, , , 1], 1:2, quantile, probs = 0.025)[ltri])
corMat.975 <- as.vector(apply(corMat[, , , 1], 1:2, quantile, probs = 0.975)[ltri])
CI <- cbind(corMat.025, corMat.975)

R SCRIPT
### Plot the results
plot(0, 0, xlim = c(1, nrow(CI)), ylim = range(CI, truth), type = "n", xlab = "", ylab = "", main = "cov(paramLatent[[1, 1]])")
abline(h = 0, col = "grey")
arrows(x0 = 1:nrow(CI), x1 = 1:nrow(CI), y0 = CI[, 1], y1 = CI[, 2], code = 3, angle = 90, length = 0.05)
points(1:nrow(CI), average, pch = 15, cex = 1.5)
points(1:nrow(CI), truth, col = "red", pch = 19)
### Mixing object
mixing <- as.mcmc(model, parameters = "paramLatent")
### Draw trace and density plots for all combinations of parameters
plot(mixing[[1]])
### Convert the mixing object to a data frame
mixingDF <- as.data.frame(mixing[[1]])
### Draw boxplot for each parameter
par(mar = c(7, 4, 4, 2))
boxplot(mixingDF, las = 2)
### Draw beanplots
library(beanplot)
par(mar = c(7, 4, 4, 2))
beanplot(mixingDF, las = 2)
### Draw estimated correlation matrix
### (stored in corMatCor so that the covariances in corMat remain available for the plot level)
library(corrplot)
corMatCor <- corRandomEff(model, cor = TRUE)
averageCor <- apply(corMatCor[, , , 1], 1:2, mean)
corrplot(averageCor, method = "color", col = colorRampPalette(c("blue", "white", "red"))(200))
### Draw chord diagram
library(circlize)
colMat <- matrix(NA, nrow = nrow(averageCor), ncol = ncol(averageCor))
colMat[which(averageCor > 0.4, arr.ind = TRUE)] <- "red"
colMat[which(averageCor < -0.4, arr.ind = TRUE)] <- "blue"
chordDiagram(averageCor, symmetric = TRUE, annotationTrack = c("name", "grid"), grid.col = "grey", col = colMat)
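The 'true' values in the R script above are computed with tcrossprod(), which suggests that, at each level, the covariance matrix of the species is obtained from the species × latent factors loading matrix Λ as Ω = ΛΛᵀ. The short base-R sketch below makes this relationship explicit and converts the covariance into a correlation matrix with cov2cor(); the loading values are made up for illustration.

```r
# Lambda: species x latent factors matrix of loadings
latentCov <- function(Lambda) tcrossprod(Lambda)          # Omega = Lambda %*% t(Lambda)
latentCor <- function(Lambda) cov2cor(latentCov(Lambda))  # rescale to a unit diagonal

# Hypothetical loadings for three species on two latent factors
Lambda <- matrix(c(1, 2, 0,
                   0, 1, 1), nrow = 3)
Omega <- latentCov(Lambda)
Rmat <- latentCor(Lambda)
```

Species 1 and 3 load on different factors, so their covariance (and correlation) is exactly zero.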

R SCRIPT
#----------------------------------------------------------------------------
### Plot level
#----------------------------------------------------------------------------
### Isolate the values of interest
ltri <- lower.tri(apply(corMat[, , , 2], 1:2, quantile, probs = 0.025), diag = TRUE)
### True values
truth <- as.vector(tcrossprod(simulParamEx1$param$paramLatent[[2]])[ltri])
### Average
average <- as.vector(apply(corMat[, , , 2], 1:2, mean)[ltri])
### 95% credible intervals
corMat.025 <- as.vector(apply(corMat[, , , 2], 1:2, quantile, probs = 0.025)[ltri])
corMat.975 <- as.vector(apply(corMat[, , , 2], 1:2, quantile, probs = 0.975)[ltri])
CI <- cbind(corMat.025, corMat.975)
### Plot the results
plot(0, 0, xlim = c(1, nrow(CI)), ylim = range(CI, truth), type = "n", xlab = "", ylab = "", main = "cov(paramLatent[[1,2]])")
abline(h = 0, col = "grey")
arrows(x0 = 1:nrow(CI), x1 = 1:nrow(CI), y0 = CI[, 1], y1 = CI[, 2], code = 3, angle = 90, length = 0.05)
points(1:nrow(CI), average, pch = 15, cex = 1.5)
points(1:nrow(CI), truth, col = "red", pch = 19)
### Mixing object
mixing <- as.mcmc(model, parameters = "paramLatent")
### Draw trace and density plots for all combinations of parameters
plot(mixing[[2]])
### Convert the mixing object to a data frame
mixingDF <- as.data.frame(mixing[[2]])
### Draw boxplot for each parameter
par(mar = c(7, 4, 4, 2))
boxplot(mixingDF, las = 2)
### Draw beanplots
library(beanplot)
par(mar = c(7, 4, 4, 2))
beanplot(mixingDF, las = 2)
### Draw estimated correlation matrix
library(corrplot)
corMat <- corRandomEff(model, cor = TRUE)
averageCor <- apply(corMat[, , , 2], 1:2, mean)
corrplot(averageCor, method = "color", col = colorRampPalette(c("blue", "white", "red"))(200))

R SCRIPT
### Draw chord diagram
library(circlize)
corMat <- corRandomEff(model, cor = TRUE)
averageCor <- apply(corMat[, , , 2], 1:2, mean)
colMat <- matrix(NA, nrow = nrow(averageCor), ncol = ncol(averageCor))
colMat[which(averageCor > 0.4, arr.ind = TRUE)] <- "red"
colMat[which(averageCor < -0.4, arr.ind = TRUE)] <- "blue"
chordDiagram(averageCor, symmetric = TRUE, annotationTrack = c("name", "grid"), grid.col = "grey", col = colMat)

Computing the explanatory power of the model

With the example scripts below, the user can compute the R² values (illustrated in Figure 5 and described in more detail in the section 'Explanatory/predictive power' under 'Typical model outputs'), which quantify the explanatory power of the model at each of the levels of the study design. In this case, the R² values are computed for the sampling unit and plot levels.

MATLAB SCRIPT
% The script below produces as output the files R2-sampling_unit.tiff and
% R2-plot.tiff, based on 100 predictions that are compared to the data
R2 = m.computeR2(100);
m.plotR2(R2, showToScreen, saveToFile);

R SCRIPT
### Species prevalence against species-specific R2
Ymean <- apply(model$data$Y, 2, mean)
R2 <- Rsquare(model, averageSp = FALSE)
plot(Ymean, R2, pch = 19)

Sampling the posterior distribution

As illustrated by the scripts below, the user can access the full joint posterior distribution to do any kind of post-processing of the results that goes beyond the more specific tools we provide.

MATLAB SCRIPT
% The full joint posterior distribution of all parameters can be accessed as a vector:
m.postSamVec
% The posterior distribution of a single parameter (such as the beta matrix) can be obtained as a cell array:
postBetaArray = {m.postSamVec.beta}
% This can be modified into a three-dimensional matrix (covariates x species x samples):
postBetaMat = cat(3, m.postSamVec.beta)
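As an example of such post-processing, the base-R sketch below derives the posterior distribution of a quantity that no built-in function reports: the difference between two species' responses to the same covariate, computed sample by sample. It assumes a covariates × species × samples array, which is the layout of postBetaMat above and, as the apply(..., 1:2, ...) calls used for paramX suggest, of model$results$estimation$paramX in HMSC-R. The function is hypothetical.

```r
# post: covariates x species x posterior samples array
derivedDifference <- function(post, covariate, sp1, sp2, probs = c(0.025, 0.975)) {
  d <- post[covariate, sp1, ] - post[covariate, sp2, ]  # one value per posterior sample
  c(mean = mean(d), quantile(d, probs), P.positive = mean(d > 0))
}

# Hypothetical example: 2 covariates, 2 species, 4 posterior samples
post <- array(0, dim = c(2, 2, 4))
post[2, 1, ] <- c(1, 2, 3, 4)
post[2, 2, ] <- c(0, 0, 5, 0)
res <- derivedDifference(post, covariate = 2, sp1 = 1, sp2 = 2)
```

Because the difference is computed within each joint posterior sample, the correlation between the two species' parameters is automatically accounted for, which would not be the case if only their marginal summaries were compared.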

R SCRIPT
### For a particular set of parameters, the posterior samples are given directly by the hmsc function, e.g. for beta (paramX):
model$results$estimation$paramX
### For the full joint posterior distribution
fullPost <- jposterior(model)

Generating predictions for training data

As explained above, HMSC can be used to generate many kinds of predictions. Here we illustrate predictions that are not conditional on the occurrences of the other species; how to perform conditional predictions is shown in the fungal case study. We first make predictions for the training data, i.e. for the same environmental conditions (as described in the matrix 𝐗) and units (as described in the matrix 𝚷) as were used for model fitting. The motivation for computing such predictions may be, e.g., the evaluation of model fit. To illustrate this, we use the predictions to compute the species-specific Tjur R² values (for which a ready-made function also exists, as we illustrated earlier). As in the present case we are interested in species occurrence probabilities, we predict expectations (occurrence probabilities) rather than realizations (zeros or ones). To account for parameter uncertainty, we generate the predictions for 100 posterior samples.

MATLAB SCRIPT
% The script below produces as output the file predictions.csv
% General format
% predList = m.predict(ns, X, piCell, xy, expected)
% ns: number of predictions to be generated
% X: the environmental covariates for which predictions are to be created
% piCell: the units for which predictions are to be created
% xy: the coordinates for which predictions are to be created (relevant for
% spatial models only)
% expected: set true if expected values are to be predicted instead of realizations
% We wish to generate 100 replicate predictions of presence expectations for
% the original training data. As we assume the same units as used here, the
% predictions will be based on the estimated plot- and sampling-unit-specific
% random effects
predList = m.predict(100, m.X, m.piCell, [], true);
% The predictions are given in a cell array. We transform them into a
% three-dimensional matrix (sampling units x species x posterior samples)
predM = cat(3, predList{:});
% We save the predictions to a file
csvwrite(fullfile(folder, 'results', 'predictions.csv'), predM);
% If averaging over the posterior samples, we obtain the mean occurrence
% probability for each species
probs = mean(predM, 3);
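The species-specific Tjur R² mentioned above has a simple definition: the mean predicted occurrence probability over the sampling units where the species is present, minus the mean over the units where it is absent. For higher levels with several sampling units per unit, the MATLAB script for species-specific R² values instead correlates the summed observations with the summed predicted probabilities per unit. A base-R sketch of both computations, assuming a sampling units × species probability matrix such as probs and a matching 0/1 occurrence matrix Y (the function names are hypothetical):

```r
# Tjur R2 per species: mean probability where present minus mean where absent
tjurR2 <- function(Y, probs) {
  sapply(seq_len(ncol(Y)), function(j)
    mean(probs[Y[, j] == 1, j]) - mean(probs[Y[, j] == 0, j]))
}

# R2 at a higher level: correlation between summed observations and summed
# predicted probabilities per unit (e.g. per plot)
aggregatedR2 <- function(Y, probs, unit) {
  obs <- rowsum(Y, unit)       # summed occurrences per unit
  pred <- rowsum(probs, unit)  # summed probabilities per unit
  sapply(seq_len(ncol(Y)), function(j) cor(obs[, j], pred[, j]))
}

# Hypothetical example: 4 sampling units, 2 species
Y <- matrix(c(1, 0, 1, 0,
              1, 1, 0, 0), nrow = 4)
probs <- matrix(c(0.9, 0.1, 0.7, 0.3,
                  0.6, 0.8, 0.2, 0.4), nrow = 4)
r2 <- tjurR2(Y, probs)  # 0.6 and 0.4
```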

MATLAB SCRIPT
% These probabilities can be used to compute e.g. species-specific R2
% values at both the sampling unit and plot levels.
z = max(m.nr, 1);
R2b = zeros(m.ns, z);
for level = 1:z
    nunits = max(m.pi(:, level));
    if (nunits == m.ny)
        % One sampling unit per unit: Tjur R2
        for j = 1:m.ns
            R2b(j, level) = mean(probs(m.Y(:,j)==1, j)) - mean(probs(m.Y(:,j)==0, j));
        end
    else
        % Several sampling units per unit: correlate summed observations and predictions
        obs = zeros(nunits, m.ns);
        preds = zeros(nunits, m.ns);
        for j = 1:nunits
            % sum along dimension 1, so that a unit with a single sampling unit is handled correctly
            obs(j,:) = sum(m.Y(m.pi(:,level)==j, :), 1);
            preds(j,:) = sum(probs(m.pi(:,level)==j, :), 1);
        end
        for j = 1:m.ns
            R2b(j, level) = corr(obs(:,j), preds(:,j));
        end
    end
end
% This results in the same values that we computed earlier with the
% m.computeR2 function (R2 and R2b are identical up to randomness in
% generating presence probabilities from the posterior)

With the script below, the user can produce predictions for the training data with HMSC-R. This is done with the 'predict' function without defining any new explanatory data, hence using the data used for model training.

R SCRIPT
predTrain <- predict(model)

Generating predictions for new data

Predictions for new data (outside the training data) are commonly needed for at least two kinds of purposes. First, they are needed for comparing the model predictions to validation data, and thus for examining the predictive power of the model in a more appropriate way than we did above. Second, they are needed for making scenario simulations, which may aim, e.g., to ask how species occurrence depends on environmental conditions. In Simulated Example 1, the design matrix involves the intercept and two environmental covariates. Let us assume that we are interested in how the species occurrences depend on variation in the second environmental covariate. To do so, we construct a new matrix 𝐗 representing environmental conditions where the value of the first covariate is fixed to its mean, and the second covariate is varied systematically to cover the range of interest. As we do not want to apply the random effects estimated for the specific plots and sampling units included in the training data, we also specify that the units for which the predictions are to be generated are new: we simply assign new units (with labels that differ from those defined during model fitting) to the new 𝚷-matrix.
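For HMSC-R users, the same kind of environmental gradient can be built in base R. The sketch below mirrors the construction of Xnew and of the new unit labels in the MATLAB script for this step. It assumes that X has the columns (intercept, covariate 1, covariate 2), as in Simulated Example 1; the function names and the column names of the unit table (sampling_unit, plot) are hypothetical. Turning these objects into a data object for predict() would presumably follow the pattern used with as.HMSCdata() for the validation data.

```r
# Gradient: intercept, one covariate held at its mean, one covariate varied
makeGradient <- function(X, npred = 100, fix = 2, vary = 3) {
  Xnew <- matrix(0, nrow = npred, ncol = ncol(X), dimnames = list(NULL, colnames(X)))
  Xnew[, 1] <- 1                                   # intercept
  Xnew[, fix] <- mean(X[, fix])                    # covariate held at its mean
  Xnew[, vary] <- seq(min(X[, vary]), max(X[, vary]), length.out = npred)
  Xnew
}

# One new sampling unit in one new plot per prediction, labelled "<i>-new"
newUnits <- function(npred) {
  ids <- paste0(seq_len(npred), "-new")
  data.frame(sampling_unit = factor(ids), plot = factor(ids))
}

# Hypothetical example
X <- cbind(1, c(1, 2, 3), c(10, 20, 40))
Xn <- makeGradient(X, npred = 5)
units <- newUnits(3)
```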

MATLAB SCRIPT
% Initialize the matrix of environmental conditions
npred = 100;
Xnew = zeros(npred, m.nc);
% Set the intercept
Xnew(:,1) = 1;
% Set covariate 2 to its mean
Xnew(:,2) = mean(m.X(:,2));
% Vary covariate 3 linearly from its smallest to its largest value
Xnew(:,3) = linspace(min(m.X(:,3)), max(m.X(:,3)), npred)';
% Define new units: each prediction will be made for a distinct sampling
% unit in a distinct plot, and thus they involve random noise
piNew = [(1:npred)', (1:npred)'];
piNewCell = num2cell(piNew);
piNewCell = cellfun(@num2str, piNewCell, 'UniformOutput', false);
piNewCell = cellfun(@(c) [c, '-new'], piNewCell, 'UniformOutput', false);

After these preliminaries, we are ready to make the predictions and plot them. For example, we can plot how the species respond to variation in the environmental covariate of interest (illustrated in Figure 9).

MATLAB SCRIPT
% Generating the predictions and plotting the figure
predList = m.predict(1000, Xnew, piNewCell, [], true);
predM = cat(3, predList{:});
probs = mean(predM, 3);
if ~showToScreen
    set(0, 'DefaultFigureVisible', 'off');
end
f = figure;
for i = 1:m.ns
    plot(Xnew(:,3), probs(:,i));
    hold on
end
xlabel('covariate 3');
ylabel('probability of occurrence');

MATLAB SCRIPT
if saveToFile
    print(f, '-dtiff', char(fullfile(m.folder, 'results', 'newPredictions.tiff')));
end
set(0, 'DefaultFigureVisible', 'on');

With the script below, the user can produce predictions for new data with HMSC-R. This is done with the 'predict' function, to which the user provides the new explanatory data.

R SCRIPT
#----------------------------------------------------------------------------
### Simulating "validation" data
#----------------------------------------------------------------------------
X <- matrix(nrow = 10, ncol = 3)
colnames(X) <- colnames(simulEx1$X)
X[, 1] <- 1
X[, 2] <- rnorm(10)
X[, 3] <- rnorm(10)
RandomSel <- sample(200, 10)
Random <- simulEx1$Random[RandomSel, ]
for (i in 1:ncol(Random)) {
  Random[, i] <- as.factor(as.character(Random[, i]))
}
colnames(Random) <- colnames(simulEx1$Random)
dataVal <- as.HMSCdata(X = X, Random = Random)
#----------------------------------------------------------------------------
### Prediction for a new set of values
#----------------------------------------------------------------------------
predVal <- predict(model, dataVal)

Figure 9. Predicted responses to covariate number three, when covariates one (intercept) and two (the first measured covariate) are set to their mean values. The different colours indicate different species.

Simulated Example 2

We next continue the HMSC analyses by assuming that, in addition to the occurrence data and the environmental covariate data, we have access to species' trait data and phylogenetic correlations. The data, scripts and model outputs for this example are found in the folder 'simulated example 2', which is located under the 'case studies' folder. The data used in the example are found in the subfolder 'data', where five files are given: exactly the same 'Y.csv', 'X.csv' and 'pi.csv' files that we analysed in Simulated Example 1 (and that are illustrated in Fig. 7), as well as the new 'T.csv' and 'C.csv' files, which contain the trait matrix 𝐓 and the phylogenetic correlation matrix 𝐂. The latter two files are illustrated in Fig. 10.

We will now analyse this extended set of data to address not only the same aims as with Simulated Example 1, but also two additional aims. First, we wish to examine how traits influence species niches, and thus estimate the parameters 𝜸. Second, we wish to ask whether species niches show a phylogenetic signal after accounting for the variation explained by the traits, and thus estimate the parameter 𝜌. As many parts of the script for Simulated Example 2 are identical to the script for Simulated Example 1, we explain below only those parts that are different; for the remaining parts, the user can consult the explanations given for Simulated Example 1. However, the script itself is stand-alone, and thus can (and should) be run independently.

Modifications needed to fit a model with traits and phylogenetic correlations

In addition to reading from file the same data as in Simulated Example 1, we need to read the trait and phylogenetic correlation matrices. In the code below, we also store the names of the traits so that we can refer to them later on.

Figure 10. Illustration of the files that contain the 𝐓 and 𝐂 matrices used in Simulated Example 2.
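The roles of 𝜸 and 𝜌 in the aims stated above can be illustrated with a small simulation. The base-R sketch below assumes the HMSC prior structure in which the expected responses of the species are 𝜸𝐓ᵀ (so that 𝜸 has one row per covariate and one column per trait) and the responses of different species are correlated through the covariance (𝜌𝐂 + (1 − 𝜌)𝐈) ⊗ 𝐕 of the stacked response vector. The orientation in which either package stores 𝜸 may differ, and all inputs are made up. With 𝜌 = 0 the species vary independently around their trait-based expectations; with 𝜌 = 1 closely related species have similar deviations.

```r
# Gamma: covariates x traits; Tr: species x traits; V: covariates x covariates;
# C: species x species phylogenetic correlation matrix; rho between 0 and 1
traitPhyloPrior <- function(Gamma, Tr, V, C, rho) {
  mu <- Gamma %*% t(Tr)                                  # expected niches, covariates x species
  ns <- nrow(Tr)
  Sigma <- kronecker(rho * C + (1 - rho) * diag(ns), V)  # covariance of the stacked columns of beta
  list(mu = mu, Sigma = Sigma)
}

# One draw of the covariates x species matrix beta from this prior
drawBeta <- function(prior) {
  L <- t(chol(prior$Sigma))  # lower-triangular factor with L %*% t(L) = Sigma
  b <- as.vector(prior$mu) + L %*% rnorm(length(prior$mu))
  matrix(b, nrow = nrow(prior$mu))
}

# Hypothetical inputs: 2 covariates, 2 traits, 3 species
Gamma <- matrix(c(1, 0, 0.5, 2), nrow = 2)
Tr <- matrix(c(1, 1, 1, 0, 1, -1), nrow = 3)
C <- matrix(0.5, 3, 3)
diag(C) <- 1
V <- diag(2)
set.seed(1)
beta <- drawBeta(traitPhyloPrior(Gamma, Tr, V, C, rho = 0.8))
```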

MATLAB SCRIPT
file = fullfile(datafolder, 'T.csv');
Tda = importdata(file);
T = Tda.data;
traits = Tda.textdata;
file = fullfile(datafolder, 'C.csv');
Cda = importdata(file);
C = Cda.data;

When constructing the main HMSC object, we specify that, in addition to the elements from Simulated Example 1, we include the traits and the phylogeny in the model structure. Accordingly, we also set the names of the traits.

MATLAB SCRIPT
m = Hmsc(folder, true, true, [false false]);
m.setData(Y, 'probit', X, piCell, [], T, C);
m.setSpeciesNames(species)
m.setCovNames(covariates)
m.setLevelNames(levels)
m.setTraitNames(traits)

With HMSC-R, all the data matrices are included in the data object, which is therefore ready to be used after loading it.

R SCRIPT
data("simulEx2")

After these modifications to the data and model structure, the model is fitted using a script that is identical to that of Simulated Example 1.

Additional output from a model with traits and phylogenetic correlations

In addition to the results already introduced for Simulated Example 1, we can now examine the estimated parameters related to traits and phylogeny. The HMSC-Matlab script below produces MCMC trace plots showing the mixing of the parameters 𝜸 (effects of traits on species' responses to the covariates) and 𝜌 (phylogenetic signal). The script also shows how to access the posterior distribution of these parameters, as well as how to produce the posterior summaries:

MATLAB SCRIPT
% Producing MCMC trace plots: the script below produces as output the files
% mixing_gamma.tiff and mixing_rho.tiff to the results folder
if mixingPlots
    m.plotGamma(showTrueValues, showToScreen, saveToFile, 'mix');
    m.plotRho(showTrueValues, showToScreen, saveToFile, 'mix');
end
% The posterior distribution of the gamma matrix can be obtained as a cell array:
postGammaArray = {m.postSamVec.gamma}
% This can be modified into a three-dimensional matrix (traits x covariates x samples):
postGammaMat = cat(3, m.postSamVec.gamma)
% Producing posterior summaries
% The script below produces as output the files gamma.xlsx and rho.xlsx to the results folder
if summaryFiles
    m.summary('gamma', [0.025 0.975], showTrueValues, showToScreen, saveToFile);
    m.summary('rho', [0.025 0.975], showTrueValues, showToScreen, saveToFile);
end

The HMSC-R script below shows how to access the posterior distribution of the parameters 𝜸 and 𝜌, how to produce MCMC trace plots showing the mixing of these parameters, and how to produce various other posterior summaries:

R SCRIPT
#----------------------------------------------------------------------------
## Plot results paramTr
#----------------------------------------------------------------------------
### Mixing object
mixing <- as.mcmc(model, parameters = "paramTr")
### Draw trace and density plots for all combinations of parameters
plot(mixing)
### Convert the mixing object to a data frame
mixingDF <- as.data.frame(mixing)
### Draw boxplot for each parameter
par(mar = c(7, 4, 4, 2))
boxplot(mixingDF, las = 2)
### Draw beanplots
library(beanplot)
par(mar = c(7, 4, 4, 2))
beanplot(mixingDF, las = 2)

R SCRIPT
### True values
truth <- as.vector(simulParamEx2$param$paramTr)
### Average
average <- apply(model$results$estimation$paramTr, 1:2, mean)
### 95% credible intervals
CI.025 <- apply(model$results$estimation$paramTr, 1:2, quantile, probs = 0.025)
CI.975 <- apply(model$results$estimation$paramTr, 1:2, quantile, probs = 0.975)
CI <- cbind(as.vector(CI.025), as.vector(CI.975))
### Draw credible interval plots
plot(0, 0, xlim = c(1, nrow(CI)), ylim = range(CI, truth), type = "n", xlab = "", ylab = "", main = "paramTr")
abline(h = 0, col = "grey")
arrows(x0 = 1:nrow(CI), x1 = 1:nrow(CI), y0 = CI[, 1], y1 = CI[, 2], code = 3, angle = 90, length = 0.05)
points(1:nrow(CI), average, pch = 15, cex = 1.5)
points(1:nrow(CI), truth, col = "red", pch = 19)
#----------------------------------------------------------------------------
## Plot results for the phylogenetic signal Phylo
#----------------------------------------------------------------------------
boxplot(model$results$estimation$paramPhylo)
abline(h = simulParamEx2$param$paramPhylo, col = "red")

A spatially hierarchical study design: Wood-Inhabiting Fungi on Beech Logs

General description of this case study

In this case study, we are looking at wood-inhabiting fungal communities from beech forests in Northern Spain (see Ovaskainen et al. 2016a for the original study). The study design is hierarchical, with a large number of sampling units (pieces of dead wood) located in multiple plots, which are in turn located within multiple forest sites. The forest sites can be classified into two environmental categories (natural or managed), and there are several sampling-unit-level and plot-level environmental characteristics as well. As in Simulated Example 2, we examine here how species' traits (included in the matrix 𝐓) and phylogenetic correlations (included in the matrix 𝐂) influence their responses to their environment (included in the matrix 𝐗). Further, we examine species association networks at the levels of sampling units, plots, and forests. The trait data describing the species are based on the literature, and the phylogenetic relationships are derived from the taxonomical levels (see Ovaskainen et al. 2016b and its Supporting information).

We take a subset of the whole data set, which we further divide into training and validation data sets. We then fit the model to the training data set, evaluate its performance against the validation data, and illustrate the results. As the basic components of model fitting and result post-processing have been introduced with the simulated case studies, we do not go through these parts in detail in this section, but only through the parts that are new for the present case study. However, like the other scripts provided with this manual, the script 'fungi.m' can be run independently.

How to use the script for estimation and generating outputs?

Given the large size of the fungal data, running the code will take a few hours. There are two time-consuming parts: estimating the model parameters through MCMC, and producing predictions that are conditional on the occurrences of the other species (the generation of which also requires MCMC sampling). For this reason, the code should be used in this order: 1) run the estimation and save the parameterized model to a file, 2) generate the conditional predictions and save them to a file, and 3) output the results.

MATLAB SCRIPT
% Run the model = true; load an already finished model object = false
estimate = true;
% Make predictions conditional on other species = true
makeConditionalPredictions = true;
% First run the script with estimate = true;
% then run the script with estimate = false; makeConditionalPredictions = true;
% finally run the script with estimate = false; makeConditionalPredictions = false;

R SCRIPT
# This part is not relevant for HMSC-R.

Reading in the data and dividing it into training and validation sets

The first step is to read in the data files and construct the training and validation data. All the data are in a single file called 'data_fungi.csv', so after it has been imported into Matlab, the different data matrices needed are extracted from it. To increase the computational speed of this example, we use only a subset of the whole data set. In the script below, we set the parameters that determine which data are included: here we decide to include only the 100 largest logs for each training and validation site, and the 50 most abundant species.

The next step is to read in the data files. We first read the file called 'data_fungi.csv', then select the subset of the data that is to be used in this example, and construct the data matrices, which at this stage include both the training and the validation data. These matrices include the species occurrence data 𝐘, the environmental covariates 𝐗, the species traits 𝐓, the phylogenetic (in this case taxonomic) relationships 𝐂, and the study design 𝚷 (with three hierarchical levels).
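For readers working in R, the three subsetting steps described above (half of the sites of each management type for training, the largest logs per site, the most abundant species) can be sketched in base R as below. The sketch works on a numeric matrix whose column roles are passed in as arguments (in the MATLAB script, the site is in column 1, the management type in column 3 and the log size in column 6); the function names are hypothetical. One detail differs between the languages: MATLAB's round rounds halves away from zero, whereas R's round() rounds halves to the even digit, so floor(x + 0.5) is used to reproduce the MATLAB behaviour for the positive counts involved.

```r
# Randomly assign half of the sites of each management type to training
splitSites <- function(site, managed) {
  training <- c()
  for (man in sort(unique(managed))) {
    s <- sort(unique(site[managed == man]))
    k <- floor(length(s) / 2 + 0.5)                       # MATLAB-style rounding of half the count
    training <- c(training, s[sample.int(length(s), k)])  # sample.int avoids sample()'s length-one special case
  }
  training <- sort(training)
  list(training = training, validation = setdiff(sort(unique(site)), training))
}

# Keep, for each listed site, the n rows with the largest value in column sizeCol
largestPerSite <- function(data, sites, siteCol, sizeCol, n) {
  rows <- lapply(sites, function(s) {
    d <- data[data[, siteCol] == s, , drop = FALSE]
    d[order(d[, sizeCol], decreasing = TRUE)[seq_len(min(n, nrow(d)))], , drop = FALSE]
  })
  do.call(rbind, rows)
}

# Column indices of the k species with the most occurrences
mostAbundant <- function(Y, k) order(colSums(Y), decreasing = TRUE)[seq_len(min(k, ncol(Y)))]

# Hypothetical example: 6 sites with 3 logs each, sites 1-3 of one management type
set.seed(42)
site <- rep(1:6, each = 3)
managed <- rep(c(0, 1), each = 9)
sp <- splitSites(site, managed)
logs <- cbind(site = site, size = c(5, 1, 3, 2, 8, 4, 9, 9, 1, 1, 2, 3, 7, 6, 5, 4, 4, 4))
sub <- largestPerSite(logs, sites = c(1, 2), siteCol = 1, sizeCol = 2, n = 2)
Y <- matrix(c(1, 0, 1, 1, 1, 1, 0, 0, 1), nrow = 3)
top <- mostAbundant(Y, 2)
```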

MATLAB SCRIPT
if estimate
% Settings for dividing the data into training and validation sets,
% and for the MCMC sampling
nAx = 100; % number of sampling units per training site (15 such sites)
nBx = 100; % number of sampling units per validation site (13 such sites)
nsA = 50;  % number of species (the most abundant are selected; 326 in total)

MATLAB SCRIPT
% Reading in the data
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Import data and do subsetting to training and validation sets
file = fullfile(dataFolder, 'data_fungi.csv');
da = importdata(file);
data = da.data;
% Select half of the sites with each management type for the training set
% and put the rest into the validation set
site = data(:,1);
unSite = unique(site)';
unSiteN = length(unSite);
managed = data(:,3);
trainingSites = [];
for man = unique(managed)'
    unManSite = unique(site(managed==man))';
    trainingSites = [trainingSites, randsample(unManSite, round(length(unManSite)/2))];
end
trainingSites = sort(trainingSites);
validationSites = setdiff(unSite, trainingSites);
% For each training site select only the nAx largest resource units
subData = [];
for i = 1:length(trainingSites)
    dataSite = data(site==trainingSites(i),:);
    logSize = dataSite(:,6);
    [~, index] = sort(logSize, 'descend');
    subData = [subData; dataSite(index(1:min(nAx, length(index))),:)];
end
nA = size(subData, 1);
% For each validation site select only the nBx largest resource units
for i = 1:length(validationSites)
    dataSite = data(site==validationSites(i),:);
    logSize = dataSite(:,6);
    [~, index] = sort(logSize, 'descend');
    subData = [subData; dataSite(index(1:min(nBx, length(index))),:)];
end
data = subData;
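For readers working in R, the two subsetting rules used above (half of the sites of each management type go to training, and only the largest resource units are kept within each site) can be sketched in base R as follows. This is an illustrative sketch, not part of HMSC-R: the toy data frame and its column names (site, managed, logSize) are hypothetical, and k = 5 stands in for nAx and nBx.

```r
# Toy table (hypothetical): one row per resource unit (log), with the site it
# belongs to, the management type of that site and the log size.
set.seed(1)
d <- data.frame(site    = rep(1:8, each = 20),
                managed = rep(c(0, 1), each = 80),
                logSize = runif(160, 10, 60))

# 1) Within each management type, put (rounded) half of the sites into training
trainingSites <- unlist(lapply(split(d$site, d$managed), function(s) {
  u <- unique(s)
  u[sample.int(length(u), round(length(u) / 2))]
}), use.names = FALSE)
trainingSites <- sort(trainingSites)
validationSites <- setdiff(unique(d$site), trainingSites)

# 2) Within each site keep only the k largest units (all of them if fewer than k)
topK <- function(df, k) {
  do.call(rbind, lapply(split(df, df$site), function(s) {
    s[order(s$logSize, decreasing = TRUE)[seq_len(min(k, nrow(s)))], ]
  }))
}
train <- topK(d[d$site %in% trainingSites, ], k = 5)    # k plays the role of nAx
valid <- topK(d[d$site %in% validationSites, ], k = 5)  # k plays the role of nBx
```

Indexing with sample.int() avoids a pitfall shared with MATLAB's randsample: when a management type has only one site, sampling from a single number x would draw from 1:x instead of returning x itself.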


MATLAB SCRIPT
site = data(:,1);
plotId = data(:,2);  % not called 'plot', so that the built-in plot() function stays usable later
% Construct the matrices X, Y, C and T
X = nan(size(data,1),6);
X(:,1) = 1;
X(:,2) = data(:,3);
X(:,3) = log(data(:,4));
X(:,4) = log(data(:,6));
X(:,5) = data(:,5);
X(:,6) = data(:,5).^2;

covariates = {'intercept','managed','log dist','log diam','decay','decay2'};

Y = data(:,7:end);
[n, nc] = size(X);
ns = size(Y, 2);
species = da.textdata(7:7+ns-1);
% Read species trait data
file = fullfile(dataFolder, 'T_fungi.csv');
da = importdata(file);
traits = da.textdata(1,2:end);
T = da.data;
nt = size(T, 2);
% Read species phylogenetic relationships
file = fullfile(dataFolder, 'phylo_fungi.csv');
da = importdata(file);
C = da.data;
% Select only the nsA most abundant species
abu = sum(Y);
[~, index] = sort(abu, 'descend');
nsA = min(nsA, ns);
selSpec = index(1:nsA);
Y = Y(:,selSpec);
C = C(selSpec,selSpec);
T = T(selSpec,:);
species = species(selSpec);  % keep the species names aligned with the columns of Y
levels = {'site','plot','resource unit'};
unit = (1:n)';
piAll = [site, plotId, unit];
piAll = cellfun(@num2str, num2cell(piAll), 'UniformOutput', false);
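The key point of the species selection above is that Y, T, C and the species names must all be subset with the same index vector; otherwise names and traits no longer match the columns of Y. A minimal base-R sketch of the same idea, using small hypothetical inputs (Tr is used instead of T to avoid masking R's T shorthand for TRUE):

```r
# Hypothetical toy inputs: Y (units x species), Tr (species x traits),
# C (species x species) and a vector of species names
set.seed(2)
ns <- 6
probs <- c(0.1, 0.5, 0.3, 0.8, 0.05, 0.6)
Y <- matrix(rbinom(50 * ns, 1, rep(probs, each = 50)), nrow = 50, ncol = ns)
species <- paste0("sp", seq_len(ns))
Tr <- cbind(intercept = 1, size = rnorm(ns))
C <- diag(ns)

nsA <- min(3, ns)                                        # number of species to keep
selSpec <- order(colSums(Y), decreasing = TRUE)[seq_len(nsA)]

# Subset every species-indexed object with the same index vector
Y <- Y[, selSpec, drop = FALSE]
Tr <- Tr[selSpec, , drop = FALSE]
C <- C[selSpec, selSpec, drop = FALSE]
species <- species[selSpec]
```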


With HMSC-R, the ready-made training and validation data sets can be loaded as follows:

Constructing and fitting the HMSC-model
With HMSC-Matlab, we next take the subset of these data that we will use for fitting the model (the training data), and construct the HMSC-object.

After incorporating the data matrices into the HMSC object, defining the model structure, and incorporating the names of the species, covariates, traits, and the levels of the study design into the object, the user may want to have a look at the data assigned to the object m in HMSC-Matlab. The data can be accessed by typing m. followed by the name of the data. For example, to look at the species occurrence data matrix, type m.Y. All the data matrices are illustrated in Figure 11. In HMSC-R, the data are stored in the objects fungiTrain and fungiVal, and can be accessed by typing their names.

R SCRIPT
data("fungiTrain")
data("fungiVal")

MATLAB SCRIPT
training = 1:nA;
validation = (nA+1):n;
XTrain = X(training,:);
YTrain = Y(training,:);
piTrain = piAll(training,:);
m.setData(YTrain,'probit',XTrain,piTrain,[],T,C);


How the explanatory variables are scaled, how the prior distributions are defined and how the MCMC sampling is performed were introduced in the previous sections with the simulated examples. In the fungal case study, these steps are done essentially in the same way, and are therefore not explained in detail here (the exact code is provided in the 'fungi.m' and 'fungi.R' scripts). Running the model with these settings can take a few hours. After the model is fitted, the HMSC-object, which includes the training data, the model structure, and the posterior distributions, should be saved. One should also save the full data matrices, as we will need them as validation data.

Model outputs: conditional predictions
We next illustrate conditional predictions with the help of the fungal case study. Here our aim is to predict the occurrences of the species for new units (= the validation sites) in a way that accounts for the knowledge of the occurrences of all other species in those new units.

Figure 11. Illustration of the data matrices used for fitting the HMSC-model in the fungal case study, and their location in the object m.

MATLAB SCRIPT
% The script below generates both unconditional and conditional predictions
% and saves them to disk
predN = 10;  % The number of predictions to be made.
             % Set to 100 when producing the results of the paper, but here to 10
             % to make the code run faster
nmcmc = 10;  % The number of MCMC iterations run to update the latent
             % variables for making the conditional predictions


In the next step, we read in the generated predictions and compute the model's predictive power, both for unconditional and for conditional predictions, at all levels.

MATLAB SCRIPT
% Generating unconditional predictions (this part of the code is fast)
predList = m.predict(predN, X, piAll, [], true);
% Generating conditional predictions (this part of the code is slow because
% it involves MCMC and it is done separately for each focal species).
% Below, the matrix Yc gives the occurrences on which the predictions are
% conditioned. For the focal species the validation rows are set to NaN; the
% other species keep their true occurrences in the validation data
predListC = predList;
for j = 1:m.ns
    j  % display progress
    Yc = Y;
    Yc(nA+1:end,j) = NaN;
    predListC1 = m.predictConditional(predN,Yc,nmcmc,X,piAll,[],true);
    for i = 1:predN
        predListC{i}(:,j) = predListC1{i}(:,j);
    end
end
save(fullfile(m.folder,'predictions.mat'),'predList','predListC','predN','nmcmc');
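The only data manipulation needed for the conditional predictions is the construction of one conditioning matrix per focal species. The base-R sketch below reproduces just that step on a hypothetical 4 x 3 occurrence matrix; the conditional prediction itself (m.predictConditional in HMSC-Matlab) is not reproduced, and makeConditioningMatrix is an illustrative helper, not a package function.

```r
# Hypothetical occurrence matrix: rows 1..nA are training units, the rest validation units
Y <- matrix(c(1, 0, 1,
              0, 1, 1,
              1, 1, 0,
              0, 0, 1), nrow = 4, byrow = TRUE)
nA <- 2
valRows <- (nA + 1):nrow(Y)

# Illustrative helper (not an HMSC function): hide focal species j in the
# validation rows and keep the observed occurrences of all other species
makeConditioningMatrix <- function(Y, j, valRows) {
  Yc <- Y
  Yc[valRows, j] <- NA
  Yc
}

# One conditioning matrix per focal species, as in the loop over j above
YcList <- lapply(seq_len(ncol(Y)), function(j) makeConditioningMatrix(Y, j, valRows))
```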

MATLAB SCRIPT
% The script below computes the model's predictive power based on posterior
% means of conditional and unconditional predictions
load(fullfile(m.folder,'predictions.mat'));
pred = mean(cat(3, predList{:}), 3);
predC = mean(cat(3, predListC{:}), 3);
R2 = zeros(3,m.ns,4);
% 1st index: 1 = site, 2 = plot, 3 = resource unit
% 2nd index: species
% 3rd index: 1 = training, unconditional; 2 = training, conditional;
%            3 = validation, unconditional; 4 = validation, conditional


We may then, for example, plot the predictive power values against species prevalences in a way that compares the unconditional and conditional predictions for the validation sites.

MATLAB SCRIPT
% 1st index: 1 = site, 2 = plot, 3 = resource unit
% 2nd index: species
% 3rd index: 1 = training, unconditional; 2 = training, conditional;
%            3 = validation, unconditional; 4 = validation, conditional
for z = 1:4
    for j = 1:m.ns
        switch z
            case 1
                piX = piAll(1:nA,:); YX = Y(1:nA,j); predX = pred(1:nA,j);
            case 2
                piX = piAll(1:nA,:); YX = Y(1:nA,j); predX = predC(1:nA,j);
            case 3
                piX = piAll(nA+1:end,:); YX = Y(nA+1:end,j); predX = pred(nA+1:end,j);
            case 4
                piX = piAll(nA+1:end,:); YX = Y(nA+1:end,j); predX = predC(nA+1:end,j);
        end
        % Resource-unit level: Tjur's R2, i.e. the mean prediction for
        % presences minus the mean prediction for absences
        r1 = YX==1;
        r0 = YX==0;
        R2(3,j,z) = mean(predX(r1))-mean(predX(r0));
        % Site and plot levels: correlation between the predicted and observed
        % numbers of occurrences summed over the resource units of each site/plot
        for level = 1:2
            unit = piX(:,level);
            units = unique(unit);
            nu = length(units);
            tmp = zeros(nu,2);
            for i = 1:nu
                take = ismember(unit, units(i));
                tmp(i,:) = [sum(predX(take)) sum(YX(take))];
            end
            R2(level,j,z) = corr(tmp(:,1),tmp(:,2));
        end
    end
end
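The two predictive-power measures computed above can be written as small base-R functions. At the resource-unit level the script uses Tjur's R2 (mean prediction where the species is present minus mean prediction where it is absent); at the site and plot levels it correlates predicted and observed occurrence counts summed within each unit. The function names and the toy vectors below are hypothetical.

```r
# Tjur's R2: mean predicted probability where the species is present
# minus mean predicted probability where it is absent
tjurR2 <- function(y, p) mean(p[y == 1]) - mean(p[y == 0])

# Aggregated level (e.g. site or plot): correlation between the predicted and the
# observed number of occurrences, both summed over the units within each group
aggregatedR2 <- function(y, p, group) {
  predSum <- tapply(p, group, sum)
  obsSum  <- tapply(y, group, sum)
  cor(as.numeric(predSum), as.numeric(obsSum))
}

# Hypothetical observations, predictions and grouping of 8 resource units into 4 sites
y     <- c(1, 0, 1, 0, 1, 1, 0, 0)
p     <- c(0.9, 0.2, 0.7, 0.4, 0.6, 0.8, 0.1, 0.3)
group <- c("a", "a", "b", "b", "c", "c", "d", "d")

tjurR2(y, p)               # (0.9 + 0.7 + 0.6 + 0.8)/4 - (0.2 + 0.4 + 0.1 + 0.3)/4 = 0.5
aggregatedR2(y, p, group)  # 1 / sqrt(1.08), about 0.962
```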

MATLAB SCRIPT
pre = mean(Y);      % species prevalences
maxpre = max(pre);
minpre = min(pre);


The other outputs that we generate for this example are similar to those of the simulated examples. For these, the code is provided in the 'fungi.m' and 'fungi.R' scripts.

A Spatially Explicit Study Design: Butterflies in Great Britain
General description of this case study
This case study consists of a spatially explicit survey of butterflies in Great Britain (see Ovaskainen et al. (2016c) for the original study). The data involve presence-absences for 55 species acquired at a 10 x 10 km resolution. With this case study we illustrate how the HMSC framework can be used for spatially explicit data. The model involves spatially structured latent factors that generate spatial covariance in species occurrences and co-occurrences. As with the fungal example, we explain below only those parts of the scripts 'butterflies.m' and 'butterflies.R' that differ from the simulated examples. The scripts are, however, stand-alone.

MATLAB SCRIPT
for level = 1:3
    figure;
    m.levelNames(level)
    mean(R2(level,:,3))  % mean predictive power: validation, unconditional
    mean(R2(level,:,4))  % mean predictive power: validation, conditional
    % Unconditional predictions: open circles and grey regression line
    scatter(pre,R2(level,:,3),16,'black');
    hold on
    p = polyfit(pre,R2(level,:,3),1);
    plot([minpre,maxpre],polyval(p,[minpre,maxpre]),'color',[0.7 0.7 0.7],'LineWidth',2);
    % Conditional predictions: filled circles and black regression line
    scatter(pre,R2(level,:,4),16,'black','filled');
    p = polyfit(pre,R2(level,:,4),1);
    plot([minpre,maxpre],polyval(p,[minpre, maxpre]), 'black','LineWidth',2);
    xlim([0 1.1*maxpre]);
    ylim([0 min(1,1.1*max(max(R2(level,:,4)),max(R2(level,:,3))))]);
    set(gca,'fontsize',24);
    xlabel('prevalence');
    ylabel('Predictive power');
    title(m.levelNames(level));
    print(fullfile(m.folder,'results',['explanatory power', int2str(level), '.tiff']),'-dtiff');
end

R SCRIPT
# This output has not been implemented for HMSC-R yet.
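Although this figure is not implemented in HMSC-R, the comparison itself only needs straight-line fits of predictive power against prevalence. A base-R sketch with made-up values for one level of the study design; pre, R2u and R2c are hypothetical stand-ins for pre, R2(level,:,3) and R2(level,:,4) in the MATLAB script.

```r
# Hypothetical species prevalences and predictive power values at one level
pre <- c(0.05, 0.10, 0.20, 0.35, 0.50)
R2u <- c(0.02, 0.05, 0.10, 0.18, 0.25)   # validation, unconditional
R2c <- c(0.04, 0.09, 0.16, 0.26, 0.35)   # validation, conditional

# Straight-line fits, the counterpart of polyfit(pre, R2, 1) in the MATLAB script
fitU <- lm(R2u ~ pre)
fitC <- lm(R2c ~ pre)

plotPredictivePower <- function() {
  plot(pre, R2u, pch = 1, xlab = "prevalence", ylab = "Predictive power",
       xlim = c(0, 1.1 * max(pre)), ylim = c(0, min(1, 1.1 * max(R2u, R2c))))
  abline(fitU, col = "grey70", lwd = 2)   # unconditional: open circles, grey line
  points(pre, R2c, pch = 16)
  abline(fitC, col = "black", lwd = 2)    # conditional: filled circles, black line
}
if (interactive()) plotPredictivePower()
```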


Reading in the data and dividing it into training and validation sets
We first read in the data to construct the required matrices, which at this stage consist of both training and validation data. As a new element, the script below includes the spatial coordinates of the sampling sites, which are needed to construct a spatially explicit model.

We randomly select 300 sites for the training data and leave the rest for validation.

With HMSC-R, both the training and validation data sets can again be loaded as follows:

MATLAB SCRIPT
file = fullfile(dataFolder, 'data_butterflies.csv');
da = importdata(file);
tmp = da.data;
X = ones(size(tmp,1),5);
X(:,2:5) = tmp(:,1:4);
covariates{1} = 'intercept';
for i = 1:4
    covariates{i+1} = da.textdata{i};
end
xy = tmp(:,5:6);   % spatial coordinates of the grid cells
Y = tmp(:,7:end);
[n, nc] = size(X);
ns = size(Y, 2);
species = da.textdata(7:7+ns-1);
piCell = cellfun(@num2str, num2cell((1:n)'), 'UniformOutput', false);
xyCell = [piCell, num2cell(xy)];   % unit names paired with their coordinates
file = fullfile(dataFolder,'T_butterflies.csv');
da = importdata(file);
%disp(da.textdata);
traits = da.textdata;
T = da.data;
nt = size(T, 2);
file = fullfile(dataFolder,'phylo_butterflies.csv');
C = importdata(file);

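MATLAB SCRIPT
nA = 300;
training = randperm(n,nA);   % 300 randomly chosen training sites
validation = setdiff(1:n,training);
YA = Y(training,:);
XA = X(training,:);
piA = piCell(training,:);    % study design of the training sites, used in m.setData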

R SCRIPT
data("butterfliesTrain")
data("butterfliesVal")


Constructing and fitting the HMSC-model
We then construct the HMSC-object, where we specify that the study design has only one hierarchical level (the sampling unit, i.e. the 10 x 10 km grid cell), which has a spatial structure. We then incorporate the data matrices into the HMSC object and define the data model. As in all other examples, we incorporate the names of the species, covariates, and latent random effect levels into the object.

The remaining steps needed to fit the model are done in the same way as for the examples above: scaling of parameters, setting prior distributions, setting options for the MCMC, and performing the MCMC sampling. After conducting these steps, the script 'butterflies.m' saves the parameterized model as well as the full data (needed for model validation).

Model outputs: distance decay of community similarity and regions of common profile
We next go through the outputs specific to the butterfly case study. These kinds of outputs can be generated for essentially any kind of spatially explicit data: we decompose the distance decay in community similarity into components that can and cannot be explained by similarity in environmental covariates, and classify the spatial context into regions characterised by similar communities. The script below predicts communities for all sites (including training and validation sites) and clusters the predictions to classify the sites into regions of common profile.

MATLAB SCRIPT
% General format
% m = Hmsc(folder, traits, phylogeny, [spatial]);
% In this example
m = Hmsc(folder, true, true, [true]);
% General format
% m.setData(Y, data model, X, pi, xy, T, C);
% In this example; data types not included are specified as empty matrices
m.setData(YA,'probit',XA,piA,{xyCell},T,C);
setTraitNames(m, traits);
setCovNames(m, covariates);
setSpeciesNames(m, species);
m.setLevelNames('cell');

MATLAB SCRIPT
% Find and plot regions of common profile. The script below produces as
% output the file "regions of common profile.tiff" in the results folder
XT = X;
XTS = repmat(mean(XT),n,1);   % all sites set to the mean environmental conditions
piT = piCell;
predN = 3;
predList = m.predict(predN, XT, piT, {xyCell}, true);
predListS = m.predict(predN, XTS, piT, {xyCell}, true);
pred = mean(cat(3, predList{:}), 3);
predS = mean(cat(3, predListS{:}), 3);


The script below plots the distance decay of community similarity, both for predictions that account for environmental variation and for predictions that do not. Predictions that do not account for environmental variation (and thus include a spatial effect only) are obtained by setting the environmental conditions to their mean values.

MATLAB SCRIPT
if ~showToScreen
    set(0,'DefaultFigureVisible','off');
end
figure;
cl = kmeans(pred,7,'dist','sqeuclidean');   % 7 regions of common profile
scatter(cell2mat(xyCell(:,2)),cell2mat(xyCell(:,3)),25,cl,'s','filled')
set(gca,'fontsize',16)
axis equal tight;
if saveToFile
    print(fullfile(m.folder,'results','regions of common profile.tiff'), '-dtiff');
end
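The clustering step above is ordinary k-means applied to the rows of the prediction matrix (one row per grid cell, one column per species). A base-R sketch on simulated predictions; the data are hypothetical and three clusters are used instead of the seven in the MATLAB script. Setting nstart runs several random starts and keeps the best solution, which makes the regions less dependent on the initial centres.

```r
set.seed(3)
# Hypothetical predictions: 60 grid cells x 10 species, generated from three
# underlying community types plus a little noise
nCell <- 60; nSp <- 10
type <- rep(1:3, each = nCell / 3)
profiles <- matrix(runif(3 * nSp), nrow = 3, ncol = nSp)
pred <- profiles[type, ] + matrix(rnorm(nCell * nSp, sd = 0.02), nrow = nCell, ncol = nSp)
xy <- cbind(x = runif(nCell), y = runif(nCell))   # hypothetical cell coordinates

# k-means on the rows of pred: each cluster is a "region of common profile"
cl <- kmeans(pred, centers = 3, nstart = 10)$cluster

# Map the regions (only when running interactively)
if (interactive()) plot(xy, col = cl, pch = 15, asp = 1)
```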

MATLAB SCRIPT
% Plot community similarity as a function of distance, based on pairs of
% different points. The script below produces as output the file
% "distance decay of community similarity.tiff" in the results folder
nrepls = 100;
ta = zeros(nrepls,3);
for i = 1:nrepls
    s1s2 = randperm(n,2);
    s1 = s1s2(1);
    s2 = s1s2(2);
    % Euclidean distance between the two sites
    ta(i,1) = sqrt(sum((cell2mat(xyCell(s1,2:end))-cell2mat(xyCell(s2,2:end))).^2));
    % Community similarity: correlation of the predicted community profiles
    ta(i,2) = corr(pred(s1,:)', pred(s2,:)');
    ta(i,3) = corr(predS(s1,:)', predS(s2,:)');
end
maxd = max(ta(:,1));
figure;
scatter(ta(:,1),ta(:,2),25,'blue','filled')
hold on;
p = polyfit(ta(:,1),ta(:,2),1);
plot([0,maxd],polyval(p,[0 maxd]),'blue');
scatter(ta(:,1),ta(:,3),20,'red','filled')
p = polyfit(ta(:,1),ta(:,3),1);
plot([0,maxd],polyval(p,[0 maxd]),'red');
plot([0,maxd],[0,0],'black');
hold off;
set(gca,'fontsize',16)
axis([0,maxd -0.05 1.05]);
xlabel('distance');
ylabel('community similarity');
legend('','space + environmental covariates','','space only', 'Location', 'southwest');
if saveToFile
    print(fullfile(m.folder, 'results', 'distance decay of community similarity.tiff'),'-dtiff');
end
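The same distance-decay calculation can be sketched in base R: draw random pairs of distinct sites, record their Euclidean distance and the Pearson correlation of their predicted community profiles, and fit a straight line. The coordinates and predictions below are simulated (hypothetical species with optima along the x-axis), so the sketch shows only one curve rather than the "space + environmental covariates" versus "space only" comparison.

```r
set.seed(4)
n <- 50; nSp <- 8
# Hypothetical site coordinates (mostly spread along x) and species optima along x
xy  <- cbind(x = runif(n, 0, 100), y = runif(n, 0, 10))
opt <- seq(0, 100, length.out = nSp)
# Hypothetical predicted occurrence probabilities: Gaussian response to x
pred <- exp(-outer(xy[, "x"], opt, "-")^2 / (2 * 20^2))

nrepls <- 100
ta <- t(replicate(nrepls, {
  s <- sample.int(n, 2)                                   # two different sites
  c(distance   = sqrt(sum((xy[s[1], ] - xy[s[2], ])^2)),  # Euclidean distance
    similarity = cor(pred[s[1], ], pred[s[2], ]))         # Pearson correlation of profiles
}))
fit <- lm(similarity ~ distance, data = as.data.frame(ta))

if (interactive()) {
  plot(ta[, "distance"], ta[, "similarity"], pch = 16,
       xlab = "distance", ylab = "community similarity")
  abline(fit)
}
```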


The rest of the outputs generated by the 'butterflies.m' script are similar to those illustrated with the simulated case studies.

A Temporally Explicit Study Design: Water Birds in Ponds
General description of this study system
In this case study we consider water birds in southeast Spain. We use data on the presences and absences of seven water bird species that were surveyed for seven years in 215 irrigation ponds (see Sebastián-González et al. (2010) for the original study). With this case study we illustrate how the HMSC framework can be used for time-series data. Here we include the occurrences of all species in the previous year as predictors of their occurrences in the focal year. We include these predictors simply in the environmental covariate matrix 𝐗, in which case their influences are estimated by the regression parameters 𝜷. As with all other examples, we explain below only those parts of the scripts 'birds.m' and 'birds.R' that differ from the simulated examples. The scripts are, however, stand-alone.

Reading in the data
We next read in the data and construct the data matrices. The main difference from the other examples is that here we additionally include the occurrences of all species in the previous year as predictors.

R SCRIPT
# This output has not been implemented for HMSC-R yet.

MATLAB SCRIPT
file = fullfile(dataFolder, 'data_birds.csv');
da = importdata(file);
data = da.data;
site = data(:,1);
year = data(:,4);
cov = da.textdata(5:11);     % names of the environmental covariates
ncov = length(cov);
spec = da.textdata(12:end);  % species names
ns = length(spec);


With HMSC-R, the data are loaded as follows:

MATLAB SCRIPT
% Add the species occurrences of the previous year to the X-matrix
% and construct the corresponding pi matrix
X = []; Y = []; pi = [];
for y = 2:max(year)
    siteYear = site(year==y);
    for i = 1:length(siteYear)
        sitePrev = (site==siteYear(i)) & (year==y-1);
        siteNext = (site==siteYear(i)) & (year==y);
        if sum(sitePrev) == 1
            x = ones(1,ncov+ns+1);
            x(2:8) = data(siteNext,5:11);      % environmental covariates of the focal year
            x(9:end) = data(sitePrev,12:end);  % species occurrences of the previous year
            X = [X; x];
            Y = [Y; data(siteNext,12:end)];
            pi = [pi; [y-1, siteYear(i)]];
        else
            warning('HMSC (birds): check the data - %d records for (site %d X year %d) pair were found', sum(sitePrev), siteYear(i), y-1);
        end
    end
end
[n, nc] = size(X);
pi = [(1:n)', pi];
piCell = cellfun(@num2str, num2cell(pi), 'UniformOutput', false);
% Set names for data
levels = {'visit','year','pond'};
covariates = cell(1,1+ncov+ns);
covariates{1} = 'intercept';
for i = 1:ncov
    covariates{i+1} = cov{i};
end
for i = 1:ns
    covariates{ncov+i+1} = spec{i};
end
species = spec;
% Have a look at the data matrices produced: we have ny = 1091 samples in
% our training data; ns = 7 species; nc = 15 covariates, including both the
% environmental covariates and the species occurrences of the previous year;
% and np = 3 random effects
size(Y)
Y(1:10,:)
size(X)
X(1:10,:)
size(pi)
pi(1:10,:)

R SCRIPT
data("waterbirds")
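The lagged-predictor construction is the part of the birds example that differs from the others. A base-R sketch of the same idea on a hypothetical long-format table (the columns pond, year, env, sp1 and sp2 are assumptions, not the layout of data_birds.csv): each visit is joined with the visit to the same pond in the previous year, and the previous-year occurrences become extra columns of X. Unlike the MATLAB loop, which warns when the previous-year record is missing, the inner join in merge() silently drops such visits.

```r
# Hypothetical survey table: one row per (pond, year) visit
surveys <- data.frame(pond = c(1, 1, 1, 2, 2),
                      year = c(1, 2, 3, 1, 3),
                      env  = c(0.5, 0.6, 0.7, 1.0, 1.2),
                      sp1  = c(1, 0, 1, 0, 1),
                      sp2  = c(0, 1, 1, 1, 0))
spCols <- c("sp1", "sp2")

# Shift each record forward by one year so that it lines up with the next visit
prev <- surveys[, c("pond", "year", spCols)]
prev$year <- prev$year + 1
names(prev)[names(prev) %in% spCols] <- paste0(spCols, "_prev")

# Inner join: visits without a previous-year record at the same pond are dropped
lagged <- merge(surveys, prev, by = c("pond", "year"))

X <- cbind(intercept = 1, as.matrix(lagged[, c("env", paste0(spCols, "_prev"))]))
Y <- as.matrix(lagged[, spCols])
```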


Constructing and fitting the HMSC-model
We next construct the main HMSC-object and add the data in the same way as in the examples above. As the data include only 7 species, there is not enough information to ask how the responses of the species depend on their traits or phylogenetic relationships, and therefore these components are not included. We incorporate three levels in the study design: year, pond, and the sampling-unit level of a survey visit conducted in a given pond in a given year. Note that in this case the study design does not have a hierarchical structure, as the years are not nested within ponds, nor vice versa.

The remaining steps needed to fit the model are done in the same way as for the examples above: scaling of parameters, setting prior distributions, setting options for the MCMC, and performing the MCMC sampling. After conducting these steps, the script 'birds.m' saves the parameterized model.

Model outputs: heterospecific interactions between years
To illustrate the species-to-species associations over time, the script below extracts the posterior distribution of the temporal association matrix 𝚨 = (𝛼ᵢⱼ) from the parameterized model, and generates a plot of this matrix.

MATLAB SCRIPT
% General format
% m = Hmsc(folder, traits, phylogeny, [spatial]);
% In this example
m = Hmsc(folder, false, false, [false false false]);
% General format
% m.setData(Y, data model, X, pi, xy, T, C);
% In this example; data types not included are specified as empty matrices
m.setData(Y,'probit',X,piCell,[],[],[]);
setCovNames(m, covariates);
setSpeciesNames(m, species);
m.setLevelNames(levels);

MATLAB SCRIPT
% The script below produces as output the file "matrix_Influence of the
% species in the previous year-visit.tiff" in the results folder
Abeta = cat(3, m.postSamVec.beta);
Abeta = Abeta(9:end,:,:);   % rows of beta for the previous-year occurrences
beta1 = mean(Abeta, 3);     % posterior means
support = max(sum(Abeta>0, 3), sum(Abeta<0, 3))/m.postSamN;   % posterior support for the sign
ctoplot = beta1.*(support>threshold);   % show only effects with support above the threshold
plotCorrelations(m,ctoplot,1:m.ns,strcat('Influence of the species in the previous year-',m.levelNames{1}),[matrixPlots circlePlots], showToScreen,saveToFile);


The rest of the outputs generated by the scripts are similar to those illustrated with the simulated case studies.

An Applied Perspective: How Do Epiphytic Bryophytes Respond to Forest Management?
General description of this case study
In this case study we analyse data on an epiphytic bryophyte community growing on aspen trees in natural forests of varying stand ages, as well as on retention aspens located in logged forests with varying time since logging (see Oldén et al. (2014) for the original study). As discussed in Ovaskainen et al. (2016b), the interest lies in assessing how the community responds to forest management, in particular to Green-Tree Retention (GTR) cutting, as well as how the life form of the species influences their responses to the environmental covariates. As with the other examples, we explain below only those parts of the scripts 'bryophytes.m' and 'bryophytes.R' that differ from the simulated examples. The scripts are, however, stand-alone.

Reading in the data
The script reads in the data and constructs the matrices needed to fit the model. There are no new elements compared to the scripts presented above.

R SCRIPT
library(corrplot)
paramXReorg <- aperm(model$results$estimation$paramX, c(2, 1, 3))
Abeta <- paramXReorg[9:nrow(paramXReorg), , ]
beta1 <- apply(Abeta, c(1, 2), mean)
support <- pmax(apply(Abeta > 0, c(1, 2), sum),
                apply(Abeta < 0, c(1, 2), sum)) / dim(model$results$estimation$paramX)[3]
corrplot(support, method = "color", col = colorRampPalette(c("blue", "white", "red"))(200))
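The R script above plots the support values, whereas the MATLAB script plots the posterior means masked by a support threshold. The masking step can be sketched in base R on a simulated posterior array; the array, its dimensions and threshold = 0.95 are assumptions for illustration (the MATLAB script takes threshold from its own settings).

```r
set.seed(5)
# Hypothetical posterior samples of the lagged effects:
# previous-year species x focal species x posterior samples
nSam <- 200
trueMeans <- c(1, 0, 0, 0, -1, 0, 0, 0, 1.5)   # filled column-wise into each 3 x 3 slice
Abeta <- array(rnorm(9 * nSam, mean = trueMeans, sd = 0.3), dim = c(3, 3, nSam))
threshold <- 0.95                                # assumed support level

beta1   <- apply(Abeta, c(1, 2), mean)           # posterior means
support <- pmax(apply(Abeta > 0, c(1, 2), sum),
                apply(Abeta < 0, c(1, 2), sum)) / nSam
toPlot  <- beta1 * (support > threshold)         # zero out weakly supported effects
```

The masked matrix can then be passed to corrplot() in the same way as support above.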

MATLAB SCRIPT
file = fullfile(dataFolder, 'data_bryophytes.csv');
da = importdata(file);
data = da.data;
site = data(:,1);
X = [ones(size(data,1), 1), data(:,2:5)];
covariates{1} = 'intercept';
for i = 1:4
    covariates{i+1} = da.textdata{i+1};
end
Y = double(data(:,6:end) > 0);   % convert abundances to presence-absence
[n, nc] = size(X);
ns = size(Y, 2);


With HMSC-R, the data are loaded as follows. The environmental covariates are provided unscaled because the unscaled values are needed later, so we need to scale them before fitting the model.

Constructing and fitting the HMSC-model
We next construct the main HMSC-object and add the data to it in the same way as in the above examples. We incorporate two levels in the study design: site and tree.

MATLAB SCRIPT
species = da.textdata(6:end);
pi = [site, (1:n)'];
piCell = cellfun(@num2str, num2cell(pi), 'UniformOutput', false);
levels = {'site','tree'};
file = fullfile(dataFolder,'T_bryophytes.csv');
da = importdata(file);
%disp(da.textdata);
traits = da.textdata;
T = da.data;
nt = size(T, 2);
file = fullfile(dataFolder, 'phylo_bryophytes.csv');
C = importdata(file);

R SCRIPT
data("bryophytes")
### Keep an unscaled copy, then scale the X matrix of environmental covariates
bryophyteNoScaleX <- bryophytes
bryophytes <- as.HMSCdata(Y = bryophytes$Y, X = bryophytes$X, Tr = bryophytes$Tr,
                          Phylo = bryophytes$Phylo, Random = bryophytes$Random,
                          scaleX = TRUE, scaleTr = FALSE,
                          interceptX = FALSE, interceptTr = FALSE)

MATLAB SCRIPT
% General format: m = Hmsc(folder, traits, phylogeny, [spatial]);
% In this example
m = Hmsc(folder, true, true, [false false]);
% General format: m.setData(Y, data model, X, pi, xy, T, C);
% In this example; data types not included are specified as empty matrices
m.setData(Y,'probit',X,piCell,[],T,C);
setCovNames(m, covariates);
setSpeciesNames(m, species);
setTraitNames(m, traits);
m.setLevelNames(levels);


Generating scenario simulations
We next predict the influence of GTR on bryophyte communities through a scenario simulation. To do so, we generate simulated bryophyte communities for environmental conditions corresponding to an aspen in a natural forest, as well as to a retention aspen in a logging site. We vary the age of the natural forest, the time since logging, and the size of the aspen tree (see Ovaskainen et al. 2016b).

R SCRIPT
model <- hmsc(bryophytes, family = "probit", niter = 10000, nburn = 1000, thin = 10)

MATLAB SCRIPT
if ~showToScreen
    set(0,'DefaultFigureVisible','off');
end
% Preparing GTR and natural communities of various ages, as well as an old
% natural community as a reference
% Retention aspens: 0-30 years since logging
XR = ones(31,5);
XR(:,4) = 0:30;
XR(:,5) = 0;
diamR = 30+(60-30)*(XR(:,4))/(70);
XR(:,2) = log(diamR);
piR = zeros(31,2);
piR(:,1) = 29;
piR(:,2) = 205;
piR = cellfun(@num2str, num2cell(piR), 'UniformOutput', false);
% Aspens in natural forests: stand ages 80-150 years
XN = ones(71,5);
XN(:,3) = 0;
XN(:,4) = 0;
XN(:,5) = 80:150;
diamN = 30+(60-30)*(XN(:,5)-80)/(150-80);
XN(:,2) = log(diamN);
piN = zeros(71,2);
piN(:,1) = 29;
piN(:,2) = 205;
piN = cellfun(@num2str, num2cell(piN), 'UniformOutput', false);
% Reference: old natural forest (stand age 150) with tree diameter 60
XRef = ones(1,5);
XRef(:,3) = 0;
XRef(:,4) = 0;
XRef(:,5) = 150;
diamRef = 60;
XRef(:,2) = log(diamRef);
piRef = zeros(1,2);
piRef(:,1) = 29;
piRef(:,2) = 205;
piRef = cellfun(@num2str, num2cell(piRef), 'UniformOutput', false);


R SCRIPT
### Unscaled covariates
X <- bryophyteNoScaleX$X
### Preparing GTR and natural communities of various ages, as well as an old
### natural community as a reference, and generating simulated communities for
### these data
meanX <- apply(X, 2, mean)
meanX[1] <- 0
sc <- apply(bryophytes$X, 2, sd) / apply(X, 2, sd)
sc[1] <- 1
XR <- matrix(1, nrow = 31, ncol = 5)
XR[,4] <- 0:30
XR[,5] <- 0
diamR <- 30 + (60 - 30) * XR[, 4] / 70
XR[,2] <- log(diamR)
PiR <- matrix(0, nrow = 31, ncol = 2)
PiR[,1] <- 205
PiR[,2] <- 29
XN <- matrix(1, nrow = 71, ncol = 5)
XN[,3] <- 0
XN[,4] <- 0
XN[,5] <- 80:150
diamN <- 30 + (60 - 30) * (XN[, 5] - 80) / (150 - 80)
XN[,2] <- log(diamN)
PiN <- matrix(0, nrow = 71, ncol = 2)
PiN[,1] <- 205
PiN[,2] <- 29
XRef <- matrix(1, nrow = 1, ncol = 5)
XRef[,3] <- 0
XRef[,4] <- 0
XRef[,5] <- 150
diamRef <- 60
XRef[,2] <- log(diamRef)
PiRef <- matrix(0, nrow = 1, ncol = 2)
PiRef[,1] <- 205
PiRef[,2] <- 29
### Predictions
# Read the special prediction function for this case
source("case studies/bryophytes/predictNew.R")
n <- 1000
predVecR <- predictNew(model, n, t((t(XR) - meanX) * sc), PiR)
predVecN <- predictNew(model, n, t((t(XN) - meanX) * sc), PiN)
predVecRef <- predictNew(model, n, (XRef - meanX) * sc, PiRef)
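The expression t((t(XR) - meanX) * sc) above applies the training-data transformation to new covariate values. Assuming that scaleX = TRUE centres each covariate to mean zero and scales it to unit standard deviation, sc equals one over the standard deviation of each raw column, so the transformation amounts to: subtract the training mean, divide by the training standard deviation, and leave the intercept alone. A base-R sketch with hypothetical covariate values; centre, spread and scaleNew are illustrative names.

```r
# Hypothetical raw covariates; the first column is the intercept
Xraw <- cbind(1, diam = c(20, 35, 50, 65), age = c(80, 100, 120, 150))

# Training-data centring and scaling constants; the intercept is left as it is
centre <- colMeans(Xraw);      centre[1] <- 0
spread <- apply(Xraw, 2, sd);  spread[1] <- 1

# Apply the same transformation to any (training or new) covariate matrix
scaleNew <- function(Xnew) sweep(sweep(Xnew, 2, centre, "-"), 2, spread, "/")

Xscaled <- scaleNew(Xraw)
Xnew    <- scaleNew(cbind(1, diam = 40, age = 110))   # one hypothetical scenario row
```

Using the training constants for every new scenario row is what keeps the predictions on the same scale as the fitted regression parameters.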


Model outputs: recovery of species richness and community composition
We next use the predictions constructed above to illustrate how bryophyte species richness and community composition behave as a function of forest age (for aspens in natural forests) and of time since logging (for retention aspens).

MATLAB SCRIPT
% Calculate richness for retained and natural forests of different ages.
% The script below produces as output the files "species richness.tiff" and
% "community similarity.tiff" in the results folder
n = 1000;
predListR = m.predict(n, XR, piR, [], true);
predListN = m.predict(n, XN, piN, [], true);
predListRef = m.predict(n, XRef, piRef, [], true);
% Species richness: predictions summed over species, one row per posterior prediction
sRR = cellfun(@(c) sum(c, 2)', predListR, 'UniformOutput', false);
sRR = cat(1, sRR{:});
sRN = cellfun(@(c) sum(c, 2)', predListN, 'UniformOutput', false);
sRN = cat(1, sRN{:});
% Calculate similarity to the old natural community
csR = nan(n,31);
csN = nan(n,71);
for sN = 1:n
    predR = predListR{sN};
    predN = predListN{sN};
    predRef = predListRef{sN};
    csR(sN,:) = corr(predR',predRef');
    csN(sN,:) = corr(predN',predRef');
end
% Plot richness
figure;
lw = 2;
plot(80+XR(:,4),mean(sRR),'r','LineWidth',lw);
hold on;
plot(80+XR(:,4),quantile(sRR,[0.25 0.75]),'--r','LineWidth',lw);
plot(XN(:,5),mean(sRN),'k','LineWidth',lw);
plot(XN(:,5),quantile(sRN,[0.25 0.75]),'--k','LineWidth',lw);
axis([80 150 0 15]);
set(gca,'fontsize',16)
if saveToFile
    print(fullfile(m.folder,'results','species richness.tiff'),'-dtiff');
end
% Plot similarity
figure;
plot(80+XR(:,4),mean(csR),'r','LineWidth',lw);
hold on;
plot(80+XR(:,4),quantile(csR,[0.25 0.75]),'--r','LineWidth',lw);
plot(XN(:,5),mean(csN),'k','LineWidth',lw);
plot(XN(:,5),quantile(csN,[0.25 0.75]),'--k','LineWidth',lw);
% xlabel('stand age');
% ylabel('community similarity');
set(gca,'fontsize',16)
if saveToFile
    print(fullfile(m.folder,'results','community similarity.tiff'),'-dtiff');
end
if ~showToScreen
    set(0,'DefaultFigureVisible','on');
end


R SCRIPT
### Calculate richness for retained and natural forests of different ages
sRR <- apply(predVecR, c(1, 3), sum)
sRN <- apply(predVecN, c(1, 3), sum)
### Calculate similarity to the old natural community
csR <- matrix(NA, nrow = n, ncol = 31)
csN <- matrix(NA, nrow = n, ncol = 71)
for (sN in 1:n) {
  csR[sN, ] <- cor(t(predVecR[, , sN]), (predVecRef[, , sN]))
  csN[sN, ] <- cor(t(predVecN[, , sN]), (predVecRef[, , sN]))
}
### Plot richness and similarity
par(mfrow = c(1, 2), oma = c(0, 1, 0, 0))
# Richness
plot(80 + XR[, 4], apply(sRR, 1, mean), lwd = 3, type = "l", col = "red",
     ylim = c(0, 15), xlim = c(80, 150), main = "",
     ylab = "Species richness", xlab = "Stand age (years)")
lines(80 + XR[, 4], apply(sRR, 1, quantile, 0.25), col = "red", lty = 2, lwd = 3)
lines(80 + XR[, 4], apply(sRR, 1, quantile, 0.75), col = "red", lty = 2, lwd = 3)
lines(XN[, 5], apply(sRN, 1, mean), lwd = 3, type = "l", col = "black")
lines(XN[, 5], apply(sRN, 1, quantile, 0.25), col = "black", lty = 2, lwd = 3)
lines(XN[, 5], apply(sRN, 1, quantile, 0.75), col = "black", lty = 2, lwd = 3)
# Similarity
plot(80 + XR[, 4], apply(csR, 2, mean), lwd = 3, type = "l", col = "red",
     ylim = c(0.6, 0.95), xlim = c(80, 150), main = "",
     ylab = "Community similarity", xlab = "Stand age (years)")
lines(80 + XR[, 4], apply(csR, 2, quantile, 0.25), col = "red", lty = 2, lwd = 3)
lines(80 + XR[, 4], apply(csR, 2, quantile, 0.75), col = "red", lty = 2, lwd = 3)
lines(XN[, 5], apply(csN, 2, mean), lwd = 3, type = "l", col = "black")
lines(XN[, 5], apply(csN, 2, quantile, 0.25), col = "black", lty = 2, lwd = 3)
lines(XN[, 5], apply(csN, 2, quantile, 0.75), col = "black", lty = 2, lwd = 3)
mtext("Scenario simulations", side = 3, line = -2, outer = TRUE, font = 2, cex = 1.5)


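Model outputs: classification of species based on their response to management

We further utilize the same predictions to plot the species into a space that illustrates their responses to management and to time since logging. The x-axis (x1 in the script below) is the difference in logit-transformed occurrence probability between old natural and old retention forests, and the y-axis (x2) is the corresponding difference between old and young retention forests; both are truncated to the interval [-2, 2]. Based on these responses, the species can be classified into four groups (the green boxes in Fig. 8 of Ovaskainen et al. 2016b, drawn manually based on the interpretation of the x- and y-axes).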

MATLABSCRIPT
% The script below writes the files 'classification.tiff' and
% 'classification_example_species.tiff' to the results folder
% average the predicted matrices for retention (R) and natural (N) forests
mpredR = mean(cat(3, predListR{:}), 3);
mpredN = mean(cat(3, predListN{:}), 3);
% predicted occurrence probabilities at the first (young) and 31st (old) prediction point
youngR = mpredR(1,:);
oldR = mpredR(31,:);
youngN = mpredN(1,:);
oldN = mpredN(31,:);
me = mean([youngR; oldR; youngN; oldN]);
% x1: log odds ratio of occurrence, old natural vs. old retention forest
x1 = log(oldN./(1-oldN)) - log(oldR./(1-oldR));
% x2: log odds ratio of occurrence, old vs. young retention forest
x2 = log(oldR./(1-oldR)) - log(youngR./(1-youngR));
% truncate both axes to [-2, 2]
x1(x1>2) = 2; x1(x1<-2) = -2;
x2(x2>2) = 2; x2(x2<-2) = -2;
figure;
scatter(x1, x2, 'black', 'filled');
set(gca, 'fontsize', 16)
print(fullfile(m.folder, 'results', 'classification.tiff'), '-dtiff');
hold on
% highlight three example species
sel = [35 38 52];
% sel = [47 37 1];
m.spNames{sel}   % display the names of the selected species
scatter(x1(sel(1)), x2(sel(1)), 'red', 'filled');
scatter(x1(sel(2)), x2(sel(2)), 'blue', 'filled');
scatter(x1(sel(3)), x2(sel(3)), 'yellow', 'filled');
print(fullfile(m.folder, 'results', 'classification_example_species.tiff'), '-dtiff');
end

RSCRIPT # This output has not been implemented for HMSC-R yet.
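Since this output is not yet available in HMSC-R, the following Python sketch (standard library only; the function names are hypothetical) shows the computation performed by the MATLAB script: each species' predicted occurrence probabilities are logit-transformed, x1 is the difference between old natural and old retention forests, x2 is the difference between old and young retention forests, and both are truncated to [-2, 2]. It assumes all probabilities lie strictly between 0 and 1, where the logit is finite.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def truncate(v, bound=2.0):
    """Clamp v to the interval [-bound, bound], as the MATLAB script does."""
    return max(-bound, min(bound, v))


def classification_axes(young_r, old_r, old_n):
    """Per-species coordinates (x1, x2) for the management-response plot.

    x1: logit(old natural) - logit(old retention)
    x2: logit(old retention) - logit(young retention)
    """
    x1 = [truncate(logit(n) - logit(r)) for n, r in zip(old_n, old_r)]
    x2 = [truncate(logit(o) - logit(y)) for o, y in zip(old_r, young_r)]
    return x1, x2


# Hypothetical mean predictions for three species.
young_r = [0.5, 0.1, 0.3]
old_r = [0.5, 0.9, 0.2]
old_n = [0.5, 0.1, 0.6]
x1, x2 = classification_axes(young_r, old_r, old_n)
for sp, (a, b) in enumerate(zip(x1, x2), start=1):
    print(f"sp_{sp}: x1 = {a:+.2f}, x2 = {b:+.2f}")
```

The quadrant in which a species falls (sign of x1 and x2) is what the manually drawn groups are based on; the sketch deliberately stops at the coordinates.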


Appendix: Generating simulated data

Here we show how the simulated data set used in Simulated Examples 1 and 2 can be generated with HMSC-MATLAB, using the script 'generate simulated data.m' found in the folder 'generating simulated data' under the 'case studies' folder. The simulated data set can be used to test the model's performance: as we know the true parameter values with which the data were generated, we can compare our own model fitting results to the 'truth', as shown in more detail in the simulated examples.

Preliminaries

Run the header to clean the workspace and seed the random number generator.

MATLABSCRIPT
RC = 0;
clearvars -except RC;
RC = RC + 1;
rng(RC);   % seed the random number generator

Add the path to the directory of the HMSC package, as this is not necessarily the same as the working directory, and assign the folder where the data will be saved.

MATLABSCRIPT
% Add a path to the HMSC directory, if needed
addpath(fullfile('..', 'HMSC class'));
% Assign the folder where the simulated data will be saved
datafolder = fullfile('..', 'data', 'simulated');

Define the structure of our simulated data. In our example, we have $n_y = 200$ sampling units, $n_s = 10$ species, $n_t = 2$ traits and $n_c = 3$ environmental covariates. We include two hierarchically organised random effects ($n_r = 2$): the sampling-unit level ($n_p^{(1)} = n_y = 200$), nested within a larger sampling level that we call 'plot' ($n_p^{(2)} = 5$).

MATLABSCRIPT
% number of sampling units
ny = 200;
% number of species
ns = 10;
% number of traits
nt = 2;
% number of covariates
nc = 3;
% number of units at each random-effect level (sampling unit, plot)
np = [ny 5];
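The nested design defined above (200 sampling units within 5 plots) can be pictured as a two-column table of identifiers, one row per sampling unit, which is what is later written to pi.csv with the columns sampling_unit and plot. The Python sketch below builds such a table; the block-wise assignment of 40 consecutive sampling units to each plot is a hypothetical choice for illustration, and m.genPi may assign units to plots differently.

```python
def nested_design(n_units, n_plots):
    """Return (sampling_unit, plot) identifier pairs, 1-based, one row per sampling unit.

    Hypothetical assignment: consecutive equal-sized blocks of units go to each plot,
    so every sampling unit belongs to exactly one plot (i.e. units are nested in plots).
    """
    if n_units % n_plots != 0:
        raise ValueError("this sketch assumes n_units is a multiple of n_plots")
    per_plot = n_units // n_plots
    return [(unit, (unit - 1) // per_plot + 1) for unit in range(1, n_units + 1)]


ny, n_plots = 200, 5  # corresponds to np = [ny 5] in the MATLAB script
pi_rows = nested_design(ny, n_plots)
print(pi_rows[:2], pi_rows[-1])  # prints [(1, 1), (2, 1)] (200, 5)
```

Because the first level has as many units as there are sampling units, it acts as a sampling-unit-level random effect, while the plot level groups 40 units each.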


Constructing an HMSC object

We next construct an HMSC object, which defines the structure of the data set.

MATLABSCRIPT
m = Hmsc([], true, true, [false false]);

Generating the parameters and the data matrices

We next generate the data with the built-in functions. The first step is to set the dimensions of the data set according to what we defined before.

MATLABSCRIPT
% set the dimensions needed to generate data with the built-in class generators
m.setDim(ny, ns, nc);

We then define the link function and the error structure of the response data.

MATLABSCRIPT
% set the link function and the corresponding error distribution
m.setDist('probit');

We next generate the structure of the random effects (with $n_p^{(1)}$ units at the first level and $n_p^{(2)}$ at the second), the matrix of environmental covariates, and the species' traits ($n_t$ traits). As we aim to include phylogenetic correlations in the model, we also need to generate a phylogeny.

MATLABSCRIPT
% generate the structure of the random effects
m.genPi(np);
% generate the covariate matrix
m.genX();
% generate the traits
m.genTraits(nt);
% generate the matrix of phylogenetic relationships
m.genPhylogeny();

We next select the 'true' parameter values with which we generate the data. Here we sample the parameters from the prior distribution, for which we assume the default values; in this example this generates the parameters $\boldsymbol{\Sigma}$, $\boldsymbol{\gamma}$, $\boldsymbol{\beta}$ and $\mathbf{V}$. To include phylogenetic correlations in the generated data, we also have to define the parameter $\rho$, which measures the strength of the phylogenetic signal. The current HMSC implementation of $\rho$ is based on a discrete grid of values with prior weights (stored in m.rhopw and assigned to an arbitrary weighted discrete grid from 0 to 1 with the command m.setPriorsRho(points, weights)). By default, a probability of 0.5 is assigned to $\rho = 0$ (corresponding to independence among species), and the remaining probability of 0.5 is spread uniformly over the range [0, 1], discretized to 100 values. Note that tP.rho stores an index into this grid rather than the value of $\rho$ itself: the command tP.rho = 80; selects the 80th grid point, which corresponds to $\rho = 0.79$, an arbitrarily chosen strong positive phylogenetic signal in the species' responses to the environmental covariates. The script also doubles the latent-factor loadings (tP.lambda) before passing the modified parameters back to the object with m.setParameters(tP). As the final step, we generate the species occurrence data matrix $\mathbf{Y}$ without missing values.

MATLABSCRIPT
% set model priors to the default values
m.setPriorsDefault();
% generate the HmscPar structure of true parameters from the default priors;
% in this example this involves generating the sigma, gamma, beta and V parameters
m.genParameters();
tP = m.truePar;
% simulate a strong positive phylogenetic signal (index 80 of the rho grid)
tP.rho = 80;
% check that rho = 0.79
m.rhopw(tP.rho, 1)
% double the latent-factor loadings
lambda = cellfun(@(c) c*2, tP.lambda, 'UniformOutput', false);
tP.lambda = lambda;
m.setParameters(tP);
% generate data without missing values
m.genY(0);

Let us look at the generated data, as stored within the HMSC object m.

MATLABSCRIPT
% generated covariates
m.X
% generated data
m.Y
% generated traits
m.T
% generated phylogeny
m.C
% generated structure of the sampling-unit and plot-level random effects
m.pi
% true parameters that were used to generate the Y matrix
m.truePar
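Two ingredients of the simulation above can be sketched outside MATLAB. The Python sketch below (standard library only; all function names are hypothetical and not part of HMSC) rebuilds a default-style discrete prior for ρ, assuming the 100 non-zero grid values are 0.01, 0.02, ..., 1.00 (consistent with the 80th grid point being 0.79), and draws probit occurrences by setting y = 1 when the linear predictor plus standard normal noise is positive. The mixing of the phylogenetic correlation matrix C with the identity, ρC + (1 - ρ)I, is an assumption based on the HMSC framework of Ovaskainen et al. (2016b) rather than something shown in this manual; it illustrates why ρ = 0 corresponds to independence among species.

```python
import random


def rho_prior_grid(n_values=100, mass_at_zero=0.5):
    """Discrete prior for rho on the assumed grid 0, 0.01, ..., 1.00.

    Returns (points, weights): weight mass_at_zero at rho = 0 and the remaining
    probability spread uniformly over the n_values points in (0, 1].
    """
    points = [i / n_values for i in range(n_values + 1)]
    weights = [mass_at_zero] + [(1.0 - mass_at_zero) / n_values] * n_values
    return points, weights


def phylo_mixture(C, rho):
    """Assumed form rho * C + (1 - rho) * I: identity when rho = 0, C when rho = 1."""
    n = len(C)
    return [[rho * C[i][j] + (1.0 - rho) * (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def simulate_probit(linear_predictor, rng):
    """y = 1 if L + eps > 0 with eps ~ N(0, 1), so that P(y = 1) = Phi(L)."""
    return [[1 if value + rng.gauss(0.0, 1.0) > 0 else 0 for value in row]
            for row in linear_predictor]


points, weights = rho_prior_grid()
rho_index = 80                   # tP.rho = 80 in the MATLAB script (1-based index)
rho = points[rho_index - 1]      # Python lists are 0-based
print(rho)                       # 0.79

C = [[1.0, 0.6], [0.6, 1.0]]     # hypothetical phylogenetic correlation matrix
print(phylo_mixture(C, rho))

rng = random.Random(1)
L = [[-1.0, 0.0, 1.0], [2.0, -2.0, 0.5]]  # hypothetical linear predictors (units x species)
print(simulate_probit(L, rng))
```

Storing ρ as a grid index keeps sampling of ρ simple (a draw from a finite set of weighted values), which is why tP.rho is set to 80 rather than to 0.79.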


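Saving the generated parameters and the data matrices

As the object m could also be used for other purposes, it is convenient to assign the generated data to a new name (mData) for later use. Note, however, that the Hmsc methods modify the object in place (for example, m.setDim(ny, ns, nc) is called without reassigning m), which means that Hmsc is a MATLAB handle class: mData = m creates a second reference to the same object rather than an independent copy, so later changes to m also appear in mData. We save the generated data into .csv files and the true parameter values into separate files (truePar.mat and truePar.txt). These are the files that we used in Simulated Examples 1 and 2.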

MATLABSCRIPT
% assign the generated data
mData = m;
% save the generated data as an 'Hmsc' object as well as .csv files
% save(fullfile(datafolder, 'mData'), 'mData');
% save the true parameters as .mat and .txt files
truePar = mData.truePar;
save(fullfile(datafolder, 'truePar.mat'), 'truePar');
truePar.writeToText(fullfile(datafolder, 'truePar.txt'));
% column names for species, covariates and traits
spNames = cell(1, m.ns);
for i = 1:m.ns
    spNames{i} = strcat('sp_', int2str(i));
end
covNames = cell(1, m.nc);
for i = 1:m.nc
    covNames{i} = strcat('cov_', int2str(i));
end
trNames = cell(1, m.nt);
for i = 1:m.nt
    trNames{i} = strcat('tr_', int2str(i));
end
piNames = {'sampling_unit' 'plot'};
Y = array2table(mData.Y, 'VariableNames', spNames);
X = array2table(mData.X, 'VariableNames', covNames);
T = array2table(mData.T, 'VariableNames', trNames);
% named piTab to avoid shadowing MATLAB's built-in pi
piTab = array2table(mData.pi, 'VariableNames', piNames);
C = array2table(mData.C, 'VariableNames', spNames);
writetable(Y, fullfile(datafolder, 'Y.csv'));       % save the generated Y matrix
writetable(T, fullfile(datafolder, 'T.csv'));       % save the generated T matrix
writetable(X, fullfile(datafolder, 'X.csv'));       % save the generated X matrix
writetable(piTab, fullfile(datafolder, 'pi.csv'));  % save the generated pi matrix
writetable(C, fullfile(datafolder, 'C.csv'));       % save the generated C matrix
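The files written above are plain comma-separated tables: by default writetable writes the variable names (sp_1, sp_2, ...; cov_1, ...; tr_1, ...; sampling_unit, plot) as a header row, followed by one row per sampling unit (or per species for T and C). A minimal Python sketch of that layout, using only the standard csv module and a hypothetical 3 x 2 occurrence matrix, writes such a table to an in-memory buffer and reads it back, which can be handy for checking the files outside MATLAB. The helper names are hypothetical, and the reader converts values to integers, which suits only the 0/1 Y matrix.

```python
import csv
import io


def write_table(matrix, prefix):
    """Write a matrix as CSV with a header row prefix1, prefix2, ... (mirrors the naming loops above)."""
    names = [f"{prefix}{j}" for j in range(1, len(matrix[0]) + 1)]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(matrix)
    return buffer.getvalue()


def read_table(text):
    """Read the CSV text back into (column names, rows of integers)."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], [[int(v) for v in row] for row in rows[1:]]


Y = [[1, 0], [0, 0], [1, 1]]  # hypothetical occurrences: 3 sampling units x 2 species
text = write_table(Y, "sp_")
print(text)
names, values = read_table(text)
```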


References

Oldén, A., Ovaskainen, O., Kotiaho, J.S., Laaka-Lindberg, S. & Halme, P. (2014) Bryophyte species richness on retention aspens recovers in time but community structure does not. PLoS ONE, 9, e93786.

Ovaskainen, O., Abrego, N., Halme, P. & Dunson, D. (2016a) Using latent variable models to identify large networks of species-to-species associations at different spatial scales. Methods in Ecology and Evolution, 7, 549–555.

Ovaskainen, O., Tikhonov, G., Norberg, A., Blanchet, F.G., Duan, L., Dunson, D., Roslin, T. & Abrego, N. (2016b) How to make more out of community data? A conceptual framework and its implementation as models and software.

Ovaskainen, O., Roy, D.B., Fox, R. & Anderson, B.J. (2016c) Uncovering hidden spatial structure in species communities with spatially explicit joint species distribution models. Methods in Ecology and Evolution, 7, 428–436.

Sebastián-González, E., Sánchez-Zapata, J.A., Botella, F. & Ovaskainen, O. (2010) Testing the heterospecific attraction hypothesis with time-series data on species co-occurrence. Proceedings of the Royal Society B: Biological Sciences, 277, 2983–2990.

Tjur, T. (2009) Coefficients of determination in logistic regression models – a new proposal: the coefficient of discrimination. The American Statistician, 63, 366–372.

Zar, J.H. (2010) Biostatistical Analysis, 5th edn. Prentice-Hall/Pearson, Upper Saddle River, NJ.