Introduction to WinBUGS for Ecologists: A Bayesian Approach to Regression, ANOVA, Mixed Models and Related Analyses


Academic Press is an imprint of Elsevier
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
84 Theobald's Road, London WC1X 8RR, UK
Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands

First edition 2010

Copyright © 2010, Elsevier Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.

Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; e-mail: [email protected]. Alternatively you can submit your request online by visiting the Elsevier Web site at http://elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material.

Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence, or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

Library of Congress Cataloging-in-Publication Data
Kéry, Marc.
Introduction to WinBUGS for ecologists : a Bayesian approach to regression, ANOVA, mixed models and related analyses / Marc Kéry. – 1st ed.
p. cm.
ISBN 978-0-12-378605-0
1. Biometry–Data processing. 2. WinBUGS. I. Title.
QH323.5.K47 2010
577.01'5118–dc22
2009048549

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.

For information on all Academic Press publications visit our Web site at www.books.elsevier.com

Typeset by: diacriTech, Chennai, India

Printed and bound in China
10 10 9 8 7 6 5 4 3 2 1

A Creed for Modeling

To make sense of an observation, everybody needs a model … whether he or she knows it or not.

It is difficult to imagine another method that so effectively fosters clear thinking about a system than the use of a model written in the language of algebra.

Foreword

The title of Marc Kéry's book, Introduction to WinBUGS for Ecologists, provides some good hints about its content. From this title, we might guess that the book focuses on a piece of software, WinBUGS, that the treatment will not presuppose extensive knowledge of this software, and that the focus will be on the kinds of questions and inference problems that are faced by scientists who do ecology. So why WinBUGS and why ecologists? Of course, the most basic answer to this question is that Marc Kéry is an ecologist who has found WinBUGS to be extremely useful in his own work. But the important question then becomes, "Is Marc correct that WinBUGS can become an important tool for other ecologists?" The ultimate utility of this book will depend on the answer to this question, so I will try to develop a response here.

WinBUGS is a flexible, user-friendly software package that permits Bayesian inference from data, based on user-defined statistical models. Because the models must be completely specified by the user, WinBUGS may not be viewed by some as being as user-friendly as older statistical software packages that provide classical inference via methods such as maximum likelihood. So why should an ecologist invest the extra time and effort to learn WinBUGS? I can think of at least two reasons. The first is that all inference is based on underlying models (a basic constraint of the human condition). In the case of ecological data, the models represent caricatures of the processes that underlie both the data collection methods and the dynamics of ecological interest. I confess to knowing from personal experience that it is possible to obtain and "interpret" results of analyses from a standard statistical software package without properly understanding the underlying model(s) on which inference was based. In contrast, having to specify a model in WinBUGS ensures a basic understanding that need not accompany use of many common statistical software packages. So the necessity of specifying models, and thus of thinking clearly about underlying sampling and ecological processes, provides a good reason for ecologists to learn and use WinBUGS.

A second reason is that ecological data are typically generated by multiple processes, each of which induces variation. Frequently, such multiple sources of variation do not correspond closely to models available in more classical statistical software packages. Through my career as a quantitative ecologist, hundreds of field ecologists have brought me data sets asking for "standard" analyses, suggesting that I must have seen many similar data sets and that analysis of their data should thus be relatively quick and easy. However, despite these claims, I can't recall ever having seen a data set for which a standard, off-the-shelf analysis was strictly appropriate. There are always aspects of either the studied system or, more typically, the data collection process that require nonstandard models. It was noted above that most ecological data sets are generated by at least two classes of process: ecological and sampling. The ecological process generates the true patterns that our studies are designed to investigate, and conditional on this truth, the sampling process generates the data that we actually obtain. The data are thus generated by multiple processes that are best viewed and modeled as hierarchical. Indeed, hierarchical models appropriate for such data are readily constructed in WinBUGS, and hierarchical Bayes provides a natural approach to inference for such data. Relative ease of implementation for complex hierarchical models is a compelling reason for ecologists to become proficient with WinBUGS.

For many problems, use of WinBUGS to implement a complex model can result in substantial savings of time and effort. I am always impressed by the small amount of WinBUGS code needed to provide inference for capture–recapture models that are represented by extremely complicated-looking likelihood functions. However, even more important than problems that can be solved more easily using WinBUGS than using traditional likelihood approaches are the problems that biologists would be unable to solve using these traditional approaches. For example, natural variation among individual organisms of a species will always exist for any characteristic under investigation. Such variation is pervasive and provides the raw material for Darwinian evolution by natural selection, the central guiding paradigm of all biological sciences. Even within groups of animals defined by age, sex, size, and other relevant covariates, ecologists still expect variation among individuals in virtually any attribute of interest. In capture–recapture modeling, for example, we would like to develop models capable of accounting for variation in capture probabilities and survival probabilities among individuals within any defined demographic group. However, in the absence of individual covariates, it is simply not possible to estimate a separate capture and survival probability for each individual animal. But we can consider a distribution of such probabilities across individuals and attempt to estimate characteristics of that distribution. In WinBUGS, we can develop hierarchical models in which among-individual variation is treated as a random effect, with this portion of the inference problem becoming one of estimating the parameters of the distributions that describe this individual variation. In contrast, I (and most ecologists) would not know how to begin to construct even moderately complex capture–recapture models with random individual effects using a likelihood framework. So WinBUGS provides access to models and inferences that would be otherwise unapproachable for most ecological scientists.

I conclude that WinBUGS, specifically, and hierarchical Bayesian analysis, generally, are probably very good things for ecologists to learn. However, we should still ask whether the book's contents and Marc's tutorial writing style are likely to provide readers with an adequate understanding of this material. My answer to this question is a resounding "Yes!" I especially like Marc's use of simulation to develop the data sets used in exercises and analyses throughout the book, as this approach effectively exploits the close connection between data analysis and generation. Statistical models are intended to be simplified representations of the processes that generate real data, and the repeated interplay between simulation and analysis provides an extremely effective means of teaching the ability to develop such models and understand the inferences that they produce.

Finally, I like the selection of models that are explored in this book. The bulk of the book focuses on general model classes that are used frequently by ecologists, as well as by scientists in other disciplines: linear models, generalized linear models, linear mixed models, and generalized linear mixed models. The learn-by-example approach of simulating data sets and analyzing them using both WinBUGS and classical approaches implemented in R provides an effective way not only to teach WinBUGS but also to provide a general understanding of these widely used classes of statistical model. The two chapters dealing with specific classes of ecological models, site occupancy models and binomial mixture abundance models, then provide the reader with an appreciation of the need to model both sampling and ecological processes in order to obtain reasonable inferences using data produced by actual ecological sampling. Indeed, it is in the development and application of models tailored to deal with specific ecological sampling methods that the power and utility of WinBUGS are most readily demonstrated.

However, I believe that the book, Introduction to WinBUGS for Ecologists, is far too modest and does not capture the central reasons why ecologists should read this book and work through the associated examples and exercises. Most important, I believe that the ecologist who gives this book a serious read will emerge with a good understanding of statistical models as abstract representations of the various processes that give rise to a data set. Such an understanding is basic to the development of inference models tailored to specific sampling and ecological scenarios. A benefit that will accompany this general understanding is specific insights into major classes of statistical models that are used in ecology and other areas of science. In addition, the tutorial development of models and analyses in WinBUGS and R should leave the reader with the ability to implement both standard and tailored models. I believe that it would be hard to overstate the value of adding to an ecologist's toolbox this ability to develop and then implement models tailored to specific studies.

Jim Nichols
Patuxent Wildlife Research Center, Laurel, MD

Preface

This book is a gentle introduction to applied Bayesian modeling for ecologists using the highly acclaimed, free WinBUGS software, as run from program R. The bulk of the book is formed by a very detailed yet, I hope, enjoyable tutorial consisting of commented example analyses. These form a progression from the trivially simple to the moderately complex and cover linear, generalized linear (GLM), mixed, and generalized linear mixed models (GLMMs). Along the way, a comprehensive and largely nonmathematical overview is given of these important model classes, which represent the core of modern applied statistics and are those which ecologists use most in their work. I provide complete R and WinBUGS code for all analyses; this allows you to follow them step-by-step and at your desired pace. Being an ecologist myself and having collaborated with many ecologist colleagues, I am convinced that the large majority of us best understands more complex statistical methods by first executing worked examples step-by-step and then by modifying these template analyses to fit their own data.

All analyses with WinBUGS are directly compared with analyses of the same data using standard R functions such as lm(), glm(), and lmer(). Hence, I would hope that this book will appeal to most ecologists regardless of whether they ultimately choose a Bayesian or a classical mode of inference for their analyses. In addition, the comparison of classical and Bayesian analyses should help demystify the Bayesian approach to statistical modeling. A key feature of this book is that all data sets are simulated (="assembled") before analysis (="disassembly") and that fully commented R code is provided for both. Data simulation, along with the powerful, yet intuitive model specification language in WinBUGS, represents a unique way to truly understand that core of applied statistics in much of ecology and other quantitative sciences, generalized linear models (GLMs) and mixed models.
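To give a flavor of this assemble-then-disassemble workflow, here is a minimal R sketch of my own (not code from the book; the parameter values and variable names are purely illustrative) that simulates a small linear regression data set and then recovers its parameters with lm():

# Assemble: simulate a small linear regression data set
set.seed(42)                           # for reproducibility
n <- 30                                # sample size
x <- runif(n, 0, 10)                   # a continuous covariate
alpha <- 2 ; beta <- 1.5 ; sigma <- 2  # 'true' parameter values
y <- rnorm(n, mean = alpha + beta * x, sd = sigma)

# Disassemble: recover the parameters with a classical analysis
fm <- lm(y ~ x)
summary(fm)                            # estimates should resemble 2, 1.5, and 2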

This book traces my own journey as a quantitative ecologist toward an understanding of WinBUGS for Bayesian statistical modeling and of GLMs and mixed models. Both the simulation of data sets and model fitting in WinBUGS have been crucial for my own advancement in these respects. The book grew out of the documentation for a 1-week course that I teach at the graduate school for life sciences at the University of Zürich, Switzerland, and elsewhere to similar audiences. Therefore, the typical readership would be expected to be advanced undergraduate and graduate students, and researchers in ecology and other quantitative sciences. To maximize your benefits, you should have some basic knowledge of R computing and of statistics at the level of the linear model (LM) (i.e., analysis of variance and regression).

After three introductory chapters, normal LMs are dealt with in Chapters 4–11. In Chapter 9 and especially Chapter 12, they are generalized to contain more than a single stochastic process, i.e., to the (normal) linear mixed model (LMM). Chapter 13 introduces the GLM, i.e., the extension of the normal LM to allow error distributions other than the normal. Chapters 13–15 feature Poisson GLMs and Chapters 17–18 binomial GLMs. Finally, the GLM, too, is generalized to contain additional sources of random variation to become a GLMM in Chapter 16 for a Poisson example and in Chapter 19 for a binomial example. I strongly believe that this step-up approach, where the simplest of all LMs, that "of the mean" (Chapter 4), is made progressively more complex until we have a GLMM, helps you to get a synthetic understanding of these model classes, which have such a huge importance for applied statistics in ecology and elsewhere.

The final two main chapters go one step further and showcase two fairly novel and nonstandard versions of a GLMM. The first is the site-occupancy model for species distributions (Chapter 20; MacKenzie et al., 2002, 2003, 2006), and the second is the binomial (or N-) mixture model for estimation and modeling of abundance (Chapter 21; Royle, 2004). These models allow one to make inference about two pivotal quantities in ecology: distribution and abundance of a species (Krebs, 2001). Importantly, these models fully account for the imperfect detection of occupied sites and individuals, respectively. Arguably, imperfect detection is a hallmark of all ecological field studies. Hence, these models are extremely useful for ecologists but owing to their relative novelty are not yet widely known. Also, they are not usually described within the GLM framework, but I believe that recognizing how they fit into the larger picture of linear models is illuminating. The Bayesian analysis of these two models offers clear benefits over that by maximum likelihood, for instance, in the ease with which finite-sample inference is obtained (Royle and Kéry, 2007), but also just heuristically, since these models are easier to understand when fit in WinBUGS.

Owing to its gentle tutorial style, this book should be excellent for teaching yourself. I hope that you can learn much about Bayesian analysis using WinBUGS and about linear statistical models and their generalizations by simply reading it. However, the most effective way to do this obviously is by sitting at a computer and working through all examples, as well as by solving the exercises. Fairly often, I just give the code required to produce a certain output but do not show the actual result, so to fully grasp what is happening, it is best to execute all code.

If the book is used in a classroom setting and plenty of time is given to the solving of exercises, then up to two weeks might be required to cover all material. Alternatively, some chapters may be skipped or left for the students to go through for themselves. Chapters 1–5, inclusive, contain key material. If you already have experience with Bayesian inference, you may skip Chapters 1–2. If you understand (generalized) linear models well, you may also skip Chapter 6 and just skim Chapters 7–11 to see whether you can easily follow. Chapters 9 and 12 are the key chapters for your understanding of mixed models, whether LMM or GLMM, and should not be skipped. The same goes for Chapter 13, which introduces GLMs. The next chapters (14–19) are examples of (mixed) GLMs and may be sampled selectively as desired. There is some redundancy in content, e.g., between the following pairs of chapters, which illustrate the same kind of model for a Poisson and a binomial response: 13/17, 15/18, and 16/19. Finally, Chapters 20 and 21 are somewhat more specialized and may not have the same importance for all readers (though I find them to be the most fascinating models in the whole book).

As much as I believe in the great benefits of data simulation for your understanding of a model, the data assembly at the start of each chapter may be skipped. You can download all data sets from the book Web site or simply execute the R code to generate your own data sets, and only go to the line-by-line mode of study where the analysis begins. Similarly, comparison of the Bayesian solutions with the maximum likelihood estimates can be dropped by simply fitting the models in WinBUGS only and not in R as well.

All R and WinBUGS code in this book can be downloaded from the book Web site at http://www.mbr-pwrc.usgs.gov/software/kerybook/ maintained by Jim Hines at the Patuxent Wildlife Research Center. The Web site also contains some bonus material: a list of WinBUGS tricks, an Errata page, solutions to exercises, a text file containing all the code shown in the book, as well as the actual data sets that were used to produce the output shown in the book. It also contains a real data set (the Swiss hare data) we deal with extensively in the exercises. The Swiss hare data contain replicated counts of Brown hares (Lepus europaeus; see Chapter 13) conducted over 17 years (1992–2008) at 56 sites in eight regions of Switzerland. Replicated means that each year two counts were conducted during a 2-week period. Sites vary in area and elevation and belong to two types of habitat (arable and grassland); hence, there are both continuous and discrete explanatory variables. Unbounded counts may be modeled as Poisson random variables with log(area) as an offset, but we can also treat the observed density (i.e., the ratio of a count to area) as a normal or the incidence of a density exceeding some threshold as a binomial random variable. Hence, you can practice with all models shown in this book and meet many features of genuine data sets such as missing values and other nuisances of real life.
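To illustrate those three modeling options, here is a hedged R sketch (my own, not from the book; the file name, the columns count, area, and year, and the density threshold of 5 are invented and will likely differ from the actual Swiss hare data on the Web site):

# Hypothetical data set with columns count, area, and year
hares <- read.table("hares.txt", header = TRUE)

# (1) Count as a Poisson random variable with log(area) as an offset
fm1 <- glm(count ~ factor(year), offset = log(area), family = poisson, data = hares)

# (2) Observed density (count / area) treated as a normal random variable
fm2 <- lm(count / area ~ factor(year), data = hares)

# (3) Incidence of the density exceeding some threshold (here 5) as a binomial response
hares$high <- as.numeric(hares$count / hares$area > 5)
fm3 <- glm(high ~ factor(year), family = binomial, data = hares)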

ACKNOWLEDGMENTS

My sincere thanks go to the following people: Andy Royle for introducing me to WinBUGS and for his friendship and plenty of help over the years; Jim Hines for creating and maintaining the book Web site; Michael (Miguel) Schaub, Benedikt Schmidt, Beth Gardner, and Bob Dorazio for help and discussions; Wesley Hochachka for reading and commenting on large parts of the book draft; the photographers for providing me with beautiful illustrations of the organisms behind our statistical examples; and Christophe Berney, Andrew Gelman, Jérôme Guélat, Fränzi Korner, Konstans Wells, and Ben Zuckerberg for various comments and help. Thanks also to the WinBUGS developers for making modern Bayesian statistical modeling accessible for nonstatisticians like myself and to the Swiss Ornithological Institute, especially Niklaus Zbinden and Lukas Jenni, for giving me some freedom to research and write. I am indebted to Jim Nichols for writing the Foreword to my book and for introducing me to hierarchical thinking in ecology during two extremely pleasant years spent at the Patuxent Wildlife Research Center. And finally, I express my deep gratitude to Susana and Gabriel for their love and patience.

Marc Kéry
October 2009

C H A P T E R

1

Introduction

O U T L I N E

1.1 Advantages of the Bayesian Approach to Statistics
1.1.1 Numerical Tractability
1.1.2 Absence of Asymptotics
1.1.3 Ease of Error Propagation
1.1.4 Formal Framework for Combining Information
1.1.5 Intuitive Appeal
1.1.6 Coherence and Intellectual Beauty

1.2 So Why Then Isn't Everyone a Bayesian?

1.3 WinBUGS

1.4 Why This Book?
1.4.1 This Is Also an R Book
1.4.2 Juxtaposition of Classical and Bayesian Analyses
1.4.3 The Power of Simulating Data

1.5 What This Book Is Not About: Theory of Bayesian Statistics and Computation

1.6 Further Reading

1.7 Summary

WinBUGS (Gilks et al., 1994; Spiegelhalter et al., 2003; Lunn et al., 2009) is a general-purpose software program to fit statistical models under the Bayesian approach to statistics. That is, statistical inference is based on the posterior distribution, which expresses all that is known about the parameters of a statistical model, given the data and existing knowledge. In recent years, the Bayesian paradigm has gained tremendous momentum in statistics and its applications, including ecology, so it is natural to wonder about the reasons for this.

1.1 ADVANTAGES OF THE BAYESIAN APPROACH TO STATISTICS

Key assets of the Bayesian approach and of the associated computational methods include the following:

1.1.1 Numerical Tractability

Many statistical models are currently too complex to be fitted using classical statistical methods, but they can be fitted using Bayesian computational methods (Link et al., 2002). However, it may be reassuring that, in many cases, Bayesian inference gives answers that numerically closely match those obtained by classical methods.

1.1.2 Absence of Asymptotics

Asymptotically, that is, for a "large" sample, classical inference based on maximum likelihood (ML) is unbiased, i.e., in the long run right on target. However, for finite sample sizes, i.e., for your data set, ML may well be biased (Le Cam, 1990). Similarly, standard errors and confidence intervals are valid only for "large" samples. Statisticians never say what "large" exactly means, but you can be assured that typical ecological data sets aren't large. In contrast, Bayesian inference is exact for any sample size. This issue is not widely understood by ecological practitioners of statistics but may be particularly interesting for ecologists since our data sets are typically small to very small.
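As a concrete illustration of such finite-sample bias (my own example, not from the book), consider the ML estimator of a normal variance, which divides by n rather than by n − 1; a short R simulation shows the bias at a sample size of 10:

set.seed(1)
n <- 10 ; sigma2 <- 4                  # sample size and true variance
mle <- replicate(10000, {
  y <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  mean((y - mean(y))^2)                # ML estimate of the variance (divides by n)
})
mean(mle)                              # about 3.6: biased low relative to the true value of 4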

1.1.3 Ease of Error Propagation

In classical statistics, computing the uncertainty of functions of random variables such as parameters is not straightforward and involves approximations such as the delta method (Williams et al., 2002). For instance, consider obtaining an estimate for a population growth rate (r̂) that is composed of two estimates of population size in subsequent years (N̂1, N̂2). We have N̂1 and N̂2 and we want r̂: what should we do? Getting the point estimate of r̂ is easy, but what about its standard error? In a Bayesian analysis with Markov chain Monte Carlo, estimating such, and much more complex, derived quantities including their uncertainty is trivial once we have a random sample from the posterior distribution of their constituent parts, such as N̂1 and N̂2 in our example.
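For instance, with posterior draws of N1 and N2 in hand, the derived quantity r = N2/N1 and its uncertainty come essentially for free. A minimal R sketch (the draws below are simply simulated for illustration; in practice they would be the MCMC output of a WinBUGS analysis):

# Hypothetical posterior draws for the two population sizes
N1 <- rnorm(10000, mean = 100, sd = 10)
N2 <- rnorm(10000, mean = 90, sd = 10)

r <- N2 / N1                          # posterior draws of the growth rate
mean(r) ; sd(r)                       # point estimate and its uncertainty
quantile(r, c(0.025, 0.975))          # 95% credible interval
mean(r < 1)                           # probability that the population declined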

1.1.4 Formal Framework for Combining Information

By basing inference on both what we knew before (the prior) and what we see now (the data at hand), and using solely the laws of probability for that combination, Bayesian statistics provides a formal mechanism for introducing external knowledge into an analysis. This may greatly increase the precision of the estimates (McCarthy and Masters, 2005); some parameters may only become estimable through precisely this combination of information.

Using existing information also appears a very sensible thing to do: after all, only rarely do we know nothing at all about the likely magnitude of an estimated parameter. For instance, when estimating the annual survival rate in a population of some large bird species such as a condor, we would be rather surprised to find it to be less than, say, 0.9. Values of less than, say, 0.5 would appear downright impossible. However, in classical statistics, by not using any existing information, we effectively say that the survival rate in that population could just as well be 0.1 as 0.9, or even 0 or 1. This is not really a sensible attitude since every population ecologist knows very well a priori that no condor population would ever survive for very long with a survival rate of 0.1. In classical statistics, we always feign total ignorance about the system under study when we analyze it.

However, within some limits, it is also possible to specify ignorance in a Bayesian analysis. That is, also under the Bayesian paradigm, we can base our inference on the observed data alone and thereby obtain inferences that are typically very similar numerically to those obtained in a classical analysis.

1.1.5 Intuitive Appeal

The interpretation of probability in the Bayesian paradigm is much more intuitive than in the classical statistical framework; in particular, we directly calculate the probability that a parameter has a certain value rather than the probability of obtaining a certain kind of data set, given some null hypothesis. Hence, popular statements such as "I am 99% sure that …" are only possible in a Bayesian mode of inference, but they are impossible in principle under the classical mode of inference. This is because, in the Bayesian approach, a probability statement is made about a parameter, whereas in the classical approach, it is about a data set.

Furthermore, by drawing conclusions based on a combination of what we knew before (the prior, or the "experience" part of learning) and what we see now (the likelihood, or the "current observation" part of learning), Bayesian statistics represents a mathematical formalization of the learning process, i.e., of how we all deal with and process information in science as well as in our daily life.

1.1.6 Coherence and Intellectual Beauty

The entire Bayesian theory of statistics is based on just three axioms of probability (Lindley, 1983, 2006). This contrasts with classical statistics, which Bayesians are so fond of criticizing for being a patchwork of theory and ad hoc amendments containing plenty of internal contradictions.

1.2 SO WHY THEN ISN’T EVERYONE A BAYESIAN?

Given all the advantages of the Bayesian approach to statistics just mentioned, it may come as a surprise that currently almost all ecologists still use classical statistics. Why is this?

Of course, there is some resistance to the Bayesian philosophy, with its perceived subjectivity of prior choice and the challenge of avoiding unknowingly injecting information into an analysis via the priors (see Chapter 2). However, arguably, the lack of a much more widespread adoption of Bayesian methods in ecology has mostly practical reasons.

First, a Bayesian treatment shines most for complex models, which may not even be fit in a frequentist mode of inference (Link et al., 2002). Hence, until very recently, most applications of Bayesian statistics featured rather complex statistical models. These are neither the easiest to understand in the first place, nor may they be relevant to the majority of ecologists. Second, typical introductory books on Bayesian statistics are written in what is fairly heavy mathematics to most ecologists. Hence, getting to the entry point of the Bayesian world of statistics has been very difficult for many ecologists. Third, Bayesian philosophy and computational methods are not usually taught at universities. Finally, and perhaps most importantly, the practical implementation of a Bayesian analysis has typically involved custom-written code in general-purpose computer languages such as Fortran or C++. Therefore, for someone lacking a solid knowledge of statistics and computing, Bayesian analyses were essentially out of reach.

1.3 WinBUGS

This last point has radically changed with the advent of WinBUGS (Lunn et al., 2009). Arguably, WinBUGS is the only software that allows an average numerate ecologist to conduct his own Bayesian analyses of realistically complex, customized statistical models. By customized I mean that one is not constrained to run only those models that a program lets you select by clicking on a button. However, although WinBUGS has been and is increasingly being used in ecology, the paucity of really accessible and attractive introductions to WinBUGS for ecologists is a surprise (but see McCarthy, 2007). I believe that this is the main reason why WinBUGS isn't even more widely used in ecology.

1.4 WHY THIS BOOK?

This book aims at filling this gap by gently introducing ecologists to WinBUGS for exactly those methods they use most often, i.e., the linear, generalized linear, linear mixed, and generalized linear mixed model (GLMM). Table 1.1 shows how the three latter model classes are all generalizations of the simple Normal linear model (LM) in the top left cell of the body of the table. They extend the Normal model to contain either more than a single random process (represented by the residual in the Normal LM) and/or exponential family distributions other than the Normal, e.g., Poisson and Binomial. Alternatively, starting from the GLMM in the bottom right cell, the other three model classes can be viewed as special cases obtained by imposing restrictions on a general GLMM.

These four model classes form the core of modern applied statistics. However, even though many ecologists will have applied them often using click-and-point programs or even statistics packages with a programming language such as GenStat, R, or SAS, I dare express doubts whether they all really always understand the models they have fitted. Having to specify a model in the elementary way that one has to in WinBUGS will prove to greatly enhance your understanding of these models, whether you fit them by some sort of likelihood analysis (e.g., ML or restricted maximum likelihood [REML]) or in a Bayesian analysis.

TABLE 1.1 Classification of Some Core Models Used for Applied Statistical Analysis

                              Single Random Process             Two or More Random Processes
Normal response               Linear model (LM)                 Linear mixed model (LMM)
Exponential family response   Generalized linear model (GLM)    Generalized linear mixed model (GLMM)

Apart from the gentle and nonmathematical presentation by examples, the unique selling points of this book, which distinguish it from others, are the full integration of all WinBUGS analyses into program R, the parallel presentation of classical and Bayesian analyses of all models, and the use of simulated data sets. Next, I briefly expand on each of these points.

1.4.1 This Is Also an R Book

One key feature of this book as an introduction to WinBUGS is that we conduct all analyses in WinBUGS fully integrated within program R (R Development Core Team, 2007). R has become the lingua franca of modern statistical computing, and conducting your Bayesian analysis in WinBUGS from within an R session has great practical benefits. Moreover, we also see how to conduct all analyses using common R functions such as lm(), glm(), and glmer(). This has the added bonus that this book will be useful to you even if you only want to learn to understand and fit the models in Table 1.1 in a classical statistical setting.
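To indicate what this integration looks like in practice, here is a minimal sketch of the workflow using the R2WinBUGS package (an assumption on my part; the book's own code may differ in details such as file names, priors, and MCMC settings, and a local WinBUGS installation is required). It fits the simplest of all models, a "model of the mean":

library(R2WinBUGS)

# Simulate data for the "model of the mean"
y <- rnorm(50, mean = 600, sd = 30)

# Write the BUGS model description to a text file
sink("model.txt")
cat("
model {
  for (i in 1:n) {
    y[i] ~ dnorm(mu, tau)     # likelihood
  }
  mu ~ dnorm(0, 1.0E-6)       # vague prior for the mean
  tau <- 1 / (sigma * sigma)  # precision = 1/variance
  sigma ~ dunif(0, 100)       # vague prior for the standard deviation
}
", fill = TRUE)
sink()

# Bundle data, initial values, and parameters to save, then call WinBUGS
win.data <- list(y = y, n = length(y))
inits <- function() list(mu = rnorm(1, 500, 100), sigma = runif(1, 1, 50))
params <- c("mu", "sigma")

out <- bugs(data = win.data, inits = inits, parameters.to.save = params,
            model.file = "model.txt", n.chains = 3, n.iter = 2000, n.burnin = 500,
            bugs.directory = "c:/Program Files/WinBUGS14/")
print(out, digits = 2)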

1.4.2 Juxtaposition of Classical and Bayesian Analyses

Another key feature is the juxtaposition of analyses using the classical methods provided for in program R (mostly ML) and the analyses of the same models in a Bayesian mode of inference using WinBUGS. Thus, with the exception of Chapters 20 and 21, we fit every model in both the classical and the Bayesian mode of inference. I have two reasons for creating parallel examples. First, this should increase your confidence in the "new" (Bayesian) solutions since, with vague priors, they give numerically very similar answers to the "old" solutions (e.g., ML). Second, the analysis of a single model by both classical and Bayesian methods should help to demystify Bayesian analysis. One sometimes reads statements like "we used a Bayesian model," or "perhaps a Bayesian model should be tried on this difficult problem." This is nonsense! Any model exists independently of the method we choose to analyze it. For instance, the linear regression model is not Bayesian or non-Bayesian; rather, this model may be analyzed in a Bayesian or in a frequentist mode of inference. Even that class of models which has come to be seen as almost synonymous with Bayesian inference, hierarchical models which specify a hierarchy of stochastic processes, is not intrinsically Bayesian; rather, hierarchical models can be analyzed by frequentist (de Valpine and Hastings, 2002; Lee et al., 2006; de Valpine, 2009; Ponciano et al., 2009) or by Bayesian methods (Link and Sauer, 2002; Sauer and Link, 2002; Wikle, 2003; Clark et al., 2005). Indeed, many statisticians now use the two modes of inference quite opportunistically (Royle and Dorazio, 2006, 2008). Thus, the juxtaposition of classical and Bayesian analysis of the same models should make it very clear that a model is one thing and its analysis another and that there really is no such thing as a "Bayesian model."

1.4.3 The Power of Simulating Data

A third key feature of this book is the use of simulated data sets throughout (except for one data set used repeatedly in the exercises). At first, this may seem artificial, and I have no doubt that some readers may lose interest in an analysis when a problem is perceived as "unreal." However, I would claim that several very important benefits accrue from the use of simulated data sets, especially in an introductory book:

1. For simulated data, truth is known. That is, estimates obtained in the analysis of a model can be compared with what we know they should be in the long-run average.

2. When coding an analysis in WinBUGS, especially in more complex cases but even for simpler ones, it is very easy to make mistakes. Ensuring that an analysis recovers estimates that resemble the known input values used to generate a data set can be an important check that it has been coded correctly.

3. It has been said that one of the most difficult, but absolutely necessary statistical concepts to grasp is that of the sampling variation of an estimator. For nonstatisticians, I don't see any other way to grasp the meaning of sampling variation other than literally experiencing it by repeatedly simulating data under the same model, analyzing them, and seeing how estimates differ randomly from one sample to the next: this variation is exactly what the standard error of an estimate quantifies. In real life, one typically only ever observes a single realization (i.e., data set) from the stochastic system about which one wants to make an inference in a statistical analysis. Hence, for ecologists it may be hard to make the connection with the concept of repeated samples from a system, when all we have is a single data set (and related to that, to understand the difference between a standard deviation and a standard error).

4. Simulating data can be used to study the long-run average characteristics of estimates, given a certain kind of data set, by repeating the same data generation–data analysis cycle many times (see the sketch after this list). In this way, the (frequentist) operating characteristics of an estimator (bias, or "is it on target on average?"; efficiency, or "how far away from the target is the individual estimate on average?") can be studied by packaging both the simulation and the analysis into a loop and comparing the distribution of the resulting estimates to the known truth. Further, required sample sizes to obtain a desired level of precision can be investigated, as can issues of parameter estimability. All this can be done for exactly the specifications of one's data set, e.g., replicate data sets can be generated and analyzed with sample size and parameter values identical to those in one's real data set to get an impression, say, of the precision of the estimates that one is likely to obtain. This is also the idea behind posterior predictive checks of goodness-of-fit, where the "natural" lack of fit for a model is studied using ideal data sets and then compared with the lack of fit observed for the actual data set (see Section 8.4.2).

5. Simulated data sets can be used to study effects of assumption violations. All models embody a set of assumptions that will be violated to some degree. Whether this has serious consequences for those estimates one is particularly interested in can be studied using simulation.

6. Finally, and perhaps most importantly, I would claim that the ultimate proof that one has really understood the analysis of a statistical model is when one is able to simulate a data set under that very model. Analyzing data is a little like fixing a motorbike, but in reverse: it consists of breaking a data set into its parts (e.g., covariate effects and variances), whereas fixing a bike means putting all the parts of a bike into the right place. One way to convince yourself that you really understand how a bike works is to first dismantle and then reassemble it again into a functioning vehicle. Similarly, for data analysis, by first assembling a data set and then breaking it apart into recognizable parts by analyzing it, you can prove to yourself that you really understand the analysis.
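The simulation loop mentioned in point 4 might look like the following R sketch (my own illustration, with invented parameter values), which lets you experience the sampling variation of a regression slope estimator directly:

# Repeat the data generation-data analysis cycle many times
set.seed(24)
n <- 20 ; alpha <- 2 ; beta <- 1.5 ; sigma <- 2   # 'true' values
n.sim <- 1000
est <- numeric(n.sim)
for (i in 1:n.sim) {
  x <- runif(n, 0, 10)
  y <- rnorm(n, mean = alpha + beta * x, sd = sigma)
  est[i] <- coef(lm(y ~ x))["x"]                  # slope estimate for this replicate
}
mean(est)    # close to 1.5: the estimator is essentially on target on average
sd(est)      # the sampling variation, i.e., what a standard error estimates
hist(est)    # the sampling distribution of the slope estimator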

In summary, I believe that the value of simulation for analysis and understanding of complex stochastic systems can hardly be overstated. On a personal note, what has helped me most to understand nonnormal GLMs or mixed models, apart from having to specify them in the intuitive BUGS language, was to simulate the associated data sets in program R, which is great for simulating data.

Finally, I hope that the slightly artificial flavor of my data sets is more than made up for by their nice ecological setting and the attractive organisms we pretend to be studying. I imagine that many ecologists will by far prefer learning about new statistical methods using artificial ecological data sets than using real, but "boring," data sets from the political, social, economic, or medical sciences, as one has to do in many excellent introductory books.

1.5 WHAT THIS BOOK IS NOT ABOUT: THEORY OF BAYESIAN STATISTICS AND COMPUTATION

The theory of Bayesian inference is treated only very cursorily in this book (see Chapter 2). Other authors have done this admirably, and I refer you to them. Texts that should be accessible to ecologists include Ellison (1996), Wade (2000), Link et al. (2002), Bernardo (2003), Brooks (2003), Gelman et al. (2004), Woodworth (2004), McCarthy (2007), Royle and Dorazio (2008), King et al. (2009), and Link and Barker (2010).

Furthermore, I don't dwell on explaining Markov chain Monte Carlo (MCMC) or Gibbs sampling, the computational methods most frequently used to fit models in the Bayesian framework. Arguably, a deep understanding of the details of MCMC is not required for an ecologist to conduct an adequate Bayesian analysis using WinBUGS. After all, very few ecologists who nowadays fit a GLM or a mixed model understand the (possibly restricted) likelihood function or the algorithms used to find its maximum. (Or can you explain the Newton–Raphson algorithm? And how about iteratively reweighted least squares?) Rather, by using WinBUGS we are going to experience some of the key features of MCMC. This includes the chain's initial transient behavior, the resultant need for visual or numerical assessment of convergence that leads to discarding of initial ("burn-in") parts of a chain, and the fact that successive iterations are not independent. If you want to read more on Bayesian computation, most of the above references may serve as an entry point to a rich literature.
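These features of MCMC can also be experienced with a few lines of R. The toy random-walk Metropolis sampler below (my own illustration; it is not how WinBUGS itself works internally) estimates the mean of a normal distribution with known standard deviation and makes the transient burn-in phase and the dependence of successive draws visible:

# Toy random-walk Metropolis sampler for the mean (mu) of a normal
# distribution with known sigma, to illustrate burn-in and autocorrelation
set.seed(3)
y <- rnorm(100, mean = 5, sd = 2)                           # data
log.post <- function(mu) sum(dnorm(y, mu, 2, log = TRUE))   # flat prior on mu

n.iter <- 5000
chain <- numeric(n.iter)
chain[1] <- 50                                  # deliberately bad starting value
for (t in 2:n.iter) {
  prop <- chain[t - 1] + rnorm(1, 0, 0.5)       # propose a small random step
  if (log(runif(1)) < log.post(prop) - log.post(chain[t - 1])) {
    chain[t] <- prop                            # accept the proposal
  } else {
    chain[t] <- chain[t - 1]                    # reject: keep the current value
  }
}
plot(chain, type = "l")      # initial transient toward 5, then stationary behavior
acf(chain[-(1:500)])         # successive draws are clearly not independent
mean(chain[-(1:500)])        # posterior mean after discarding the burn-in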

1.6 FURTHER READING

If you seriously consider going Bayesian for your statistical modeling, you will probably want to purchase more than a single book. McCarthy (2007) is an accessible introduction to WinBUGS for beginners, although it presents WinBUGS only as a standalone application (i.e., not run from R) and the coverage of model classes dealt with is somewhat more limited. Gelman and Hill (2007) is an excellent textbook on linear, generalized, and mixed (generalized) linear models fit in both the classical and the Bayesian mode of inference and using both R and WinBUGS. Thus, its concept is somewhat similar to that of this book, though it does not feature the rigorous juxtaposition of both kinds of analysis. All examples are from the social and political sciences, which will perhaps not particularly interest an ecologist. However, the book contains a wealth of information that should be digestible for the audience of this book, as does Gelman et al. (2004). Ntzoufras (2009) is a new and comprehensive introduction to WinBUGS focusing on GLMs. It is very useful, but has a higher mathematical level and uses WinBUGS as a standalone application only. Woodworth (2004) is an entry-level introduction to Bayesian inference and also has some WinBUGS code examples.

Link and Barker (2010) is an excellent textbook on Bayesian inference written specifically for ecologists and featuring numerous WinBUGS examples.

As an introduction to Bayesianism written mostly in everyday language, Lindley, an influential Bayesian thinker, has written a delightful book, where he argues, among other things, that probability is the extension of logic to all events, both certain (like classical logic) and uncertain (Lindley, 2006, p. 66). His book is not about practical aspects of Bayesian analysis, but it is very informative, quite amusing, and above all, written in an accessible way.

In this book, we run WinBUGS from within program R; hence, some knowledge of R is required. Your level of knowledge of R only needs to be minimal, and any simple introduction to R would probably suffice to enable you to use this book. I like Dalgaard (2001) as a very accessible introduction that focuses mostly on linear models, and at a slightly higher level, featuring mostly GLMs, Crawley (2005) and Aitkin et al. (2009). More comprehensive R books will also contain everything required, e.g., Venables and Ripley (2002), Clark (2007), and Bolker (2008).

This book barely touches some of the statistical models that one would perhaps particularly expect to see in a statistics book for ecologists (namely, only in Chapters 20 and 21). I say nothing about such core topics in ecological statistics as the estimation of population density, survival and other vital rates, or community parameters (Buckland et al., 2001; Borchers et al., 2002; Williams et al., 2002). This is intentional. I hope that my book lays the groundwork for a much better understanding of statistical modeling using WinBUGS. This will allow you to better tackle more complex and specialized analyses, including those featured in books like Royle and Dorazio (2008), King et al. (2009), and Link and Barker (2010).

Free documentation for WinBUGS abounds; see http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml. The manual comes along with the program; within WinBUGS go Help > User Manual or press F1 and then scroll down. Recently, an open-source version of BUGS has been developed under the name of OpenBugs; see http://mathstat.helsinki.fi/openbugs/, and the latest release contains a set of ecological example analyses, including those featured in Chapters 20 and 21. WinBUGS can be run in combination with other programs such as R, GenStat, Matlab, and SAS; see the main WinBUGS Web site. There is even an Excel front-end (see http://www.axrf86.dsl.pipex.com/) that allows you to fit a wide range of complex models without even knowing the BUGS language. However, most serious WinBUGS users I know run it from R (see Chapter 5). It turns out that one of the main challenges for the budding WinBUGS programmer is to really understand the linear model (see Chapter 6). One particularly good introduction to the linear model in the context of survival and population estimation is Chapter 6 in Evan Cooch's Gentle Introduction to MARK (see http://www.phidot.org/software/mark/docs/book/pdf/chap6.pdf).

1.7 SUMMARY

This book attempts the following:

1. demystify Bayesian analyses by showing their application in the most widely used general-purpose Bayesian software WinBUGS, in a gentle tutorial-like style and in parallel with classical analyses using program R, for a large set of ecological problems that range from very simple to moderately complex;

2. enhance your understanding of the core of modern applied statistics: linear, generalized linear, linear mixed, and generalized linear mixed models and features common to all of them, such as statistical distributions and the design matrix;

3. demonstrate the great value of simulation; and

4. thereby build a solid grounding in the use of WinBUGS (and R) for relatively simple models, so you can tackle more complex ones, and help free the modeler in you.

C H A P T E R

2

Introduction to the Bayesian Analysis of a Statistical Model

O U T L I N E

2.1 Probability Theory and Statistics

2.2 Two Views of Statistics: Classical and Bayesian

2.3 The Importance of Modern Algorithms and Computers for Bayesian Statistics

2.4 Markov chain Monte Carlo (MCMC) and Gibbs Sampling

2.5 What Comes after MCMC?
2.5.1 Convergence Monitoring
2.5.2 Summarizing the Posterior for Inference
2.5.3 Computing Functions of Parameters
2.5.4 Forming Predictions

2.6 Some Shared Challenges in the Bayesian and the Classical Analysis of a Statistical Model
2.6.1 Checking Model Adequacy
2.6.2 Hypothesis Tests and Model Selection
2.6.3 Parameter Identifiability

2.7 Pointer to Special Topics in This Book

2.8 Summary

This is a practical book that does not cover the theory of Bayesian analysis or computational methods in any detail. Nevertheless, a very brief introduction is in order. For more detailed expositions of these topics see Royle and Dorazio (2008), King et al. (2009), or Link and Barker (2010), which are all specifically aimed at ecologists. In this chapter, I will first motivate a statistical view of ecology and the world at large and define a statistical model, then contrast classical and Bayesian statistics, briefly touch upon Bayesian computation, sketch the steps of a typical Bayesian analysis, and finally end with a brief pointer to special topics illustrated in this book.

2.1 PROBABILITY THEORY AND STATISTICS

Both probability theory and statistics are sciences that deal with uncertainty. Their subject is the description of stochastic systems, i.e., systems that are not fully predictable but include random processes that add a degree of chance and, therefore, uncertainty in their outcome. Stochastic systems are ubiquitous in nature; hence, probability and statistics are important not only in science but also to understand all facets of life.

Indeed, stochastic systems are everywhere! They may be the weather ("will it rain tomorrow?"), politics ("will my party win?"), life ("will she marry me?"), sports ("will my football team win?"), an exam ("will I pass?"), the sex of an offspring ("will I have a daughter?"), body size of an organism, and many, many more. Indeed, it is hard to imagine anything observable in the world that is not at least in part affected by chance, i.e., at least partly unpredictable. For such observables or data, probability and statistics offer the only adequate framework for rigorous description, analysis, and prediction (Lindley, 2006).

To formally interpret any observation, we always need a model, i.e., an abstract description of how we believe our observations are a result of observable and unobservable quantities. The latter are called parameters, and one main aim of analyzing the model is to obtain numerical estimates for them. A model is always an abstraction and thus strictly always wrong. However, according to one of the most famous sayings in statistics, some models are useful, and our goal must be to search for them. Useful models provide greater insights into a stochastic system that may otherwise be too complex to understand or to predict.

Both probability theory and statistics deal with the characteristics of a stochastic system (described by the parameters of a model) and its outcomes (the observed data), but these two fields represent different perspectives on stochastic systems. Probability theory specifies parameters and a model and then examines a variable outcome, whereas statistics takes the data, assumes a model, and then tries to infer the system properties, given the model. Parameters are key descriptors of the stochastic system about which one wants to learn something. Hence, statistics deals with making inferences (i.e., probabilistic conclusions about system components, the parameters) based on a model and the observed outcome of a stochastic system.

2.2 TWO VIEWS OF STATISTICS: CLASSICAL AND BAYESIAN

In statistics, there are two main views about how one should learn about the parameter values in a stochastic system: classical (also called conventional or frequentist) and Bayesian statistics. Although practical applications of Bayesian statistics in ecology have greatly increased only in recent years, Bayesian statistics is, in fact, very old and was the dominating school of statistics for a long time. Indeed, the foundations of Bayesian statistics, the use of conditional probability for inference embodied in Bayes rule, were laid as early as 1763 by Thomas Bayes, an English minister and mathematician. In contrast, the foundations of classical statistics were not really laid until the first half of the twentieth century. So what are the differences?

Both classical and Bayesian statistics view data as the observed realizations of stochastic systems that contain one or several random processes. However, in classical statistics, the quantities used to describe these random processes (parameters) are fixed and unknown constants, whereas in Bayesian statistics, parameters are themselves viewed as unobserved realizations of random processes. In classical statistics, uncertainty is evaluated and described in terms of the frequency of hypothetical replicates, although these inferences are typically only described from knowledge of a single data set. Therefore, classical statistics is also called frequentist statistics. In Bayesian statistics, uncertainty is evaluated using the posterior distribution of a parameter, which is the conditional probability distribution of all unknown quantities (e.g., the parameters), given the data, the model, and what we knew about these quantities before conducting the analysis.

In other words, classical and Bayesian statistics differ in their definition of probability. In classical statistics, probability is the relative frequency of a feature of observed data. In contrast, in Bayesian statistics, probability is used to express one’s uncertainty about the likely magnitude of a parameter; no hypothetical replication of the data set is required.

Under Bayesian inference, we fundamentally distinguish observable quantities x from unobservable quantities θ. Observables x are the data, whereas unobservables θ can be statistical parameters, missing data, mismeasured data, or future outcomes of the modeled system (predictions); they are all treated as random variables, i.e., quantities that can only be determined probabilistically. Because parameters θ are random variables under the Bayesian paradigm, we can make probabilistic statements about them, e.g., say things like “the probability that this population is in decline is 24%.” In contrast, under the classical view of statistics, such statements are impossible in principle because parameters are fixed and only the data are random.

Central to both modes of inference is the sampling distribution p(x | θ) of the data x as a function of a model with its parameters θ. For instance, the sampling distribution for the total number of heads among 10 flips of a fair coin is the Binomial distribution with p = 0.5 and trial size N = 10. This is the distribution used to describe the effects of chance on the outcome of the random variable (here, the sum of heads). In much of classical statistics, the likelihood function p(x | θ) is used as a basis for inference. The likelihood function is the same as the sampling distribution of the observed data x, but “read in the opposite direction”: the value θ̂ that yields the maximum of the likelihood function for the observed data x is taken as the best estimate for θ and is called the maximum likelihood estimate (MLE) of the parameter θ. That is, much of classical inference is based on the estimation of a single point that corresponds to the maximum of a function. Note that θ can be a scalar or a vector.

The basis for Bayesian inference is Bayes rule, also called Bayes’ theorem, which is a simple result of conditional probability. Bayes rule describes the relationship between the two conditional probabilities p(A | B) and p(B | A), where “|” is read as “given”:

p(A | B) = p(B | A) p(A) / p(B).

This equation is an undisputed fact and can be proven from simple axioms of probability. However, what used to be more controversial, and partly still is (e.g., Dennis, 1996; de Valpine, 2009; Lele and Dennis, 2009; Ponciano et al., 2009), is how Bayes used his rule. He used it to derive the probability of the parameters θ, given the data x, that is, the posterior distribution p(θ | x):

p(θ | x) = p(x | θ) p(θ) / p(x).

We see that the posterior distribution p(θ | x) is proportional to the product of the likelihood function p(x | θ) and the prior distribution of the parameter p(θ). To make this product a genuine probability distribution function, with an integral equal to 1, a normalizing constant p(x) is needed as a denominator; this is the probability of observing one’s particular data set x. Ignoring the denominator (which is just a constant and does not involve the unknowns θ), Bayes rule as applied in Bayesian statistics can be paraphrased as

Posterior distribution ∝ Likelihood × Prior distribution,

where ∝ reads as “is proportional to.” Thus, Bayesian inference works by using the laws of probability to combine the information about parameter θ contained in the observed data x, as quantified in the likelihood function
p(x | θ), with what is known or assumed about the parameter before the data are collected or analyzed, i.e., the prior distribution p(θ). This results in a rigorous mathematical statement about the probability of parameter θ, given the data: the posterior distribution p(θ | x). Hence, while classical statistics works by estimating a single point for a parameter (which is an unknown constant), Bayesian statistics makes inference about an entire distribution instead, because parameters are random variables described by a statistical distribution.

A prior distribution does not necessarily imply a temporal priority; instead, it simply represents a specific assumption about a model parameter. Bayes rule tells us how to combine such an assumption about a parameter with our current observations into a logical, quantitative conclusion. The latter is represented by the posterior distribution of the parameter.

I find it hard not to be impressed by the application of Bayes rule to statistical inference, because it so perfectly mimics the way in which we learn in everyday life! In our guts, we always weigh any observation we make, or new information we get, against what we know, or believe to know, to be the case. For instance, if someone tells me that he went to the zoo and saw an elephant that stood 5 m tall, I believe this information and find the observation remarkable. However, if someone claimed that he just saw an elephant that stood 10 m tall, I don’t believe him. This shows that human psychology works exactly as Bayes rule applied to statistical inference: we always weigh new information by its prior probability when drawing our conclusions (here, “Oh, that’s amazing!” or “You must be mad!”). An elephant height of 10 m has a prior probability close to zero to me; hence, I am not at all impressed by this claim (except to find it outrageous). Note, however, that I am not particularly knowledgeable about elephants. Perhaps someone with more specialist knowledge about pachyderms would already have serious doubts about the former claim (I haven’t checked). This is the reason why many Bayesians emphasize that all probability is subjective, or personal: it depends on what we knew before observing a datum (Lindley, 2006). It is easy to find plenty more examples of where we naturally think according to Bayes rule.

Inference in Bayesian statistics is a simple probability calculation, and one of the things Bayesians are most proud of is the parsimony and internal logic of their framework for inference. Thus, the entire Bayesian theory for inference can be derived using just three axioms of probability (Lindley, 1983, 2006). Bayes rule can be deduced from them, and the entire framework for Bayesian statistics, such as estimation, prediction, and hypothesis testing, is based on just these three premises. In contrast, classical statistics lacks such an internally coherent body of theory.

However, the requirement to determine a prior probability p(θ) for model parameters (“prior belief”) has caused fierce opposition to the
Bayesian paradigm because this was (and partly still is) seen to bring into science an unwanted subjective element. However, as we shall see, it is easy to exaggerate this issue, for several reasons. First, objective science or statistics is an illusion anyway: there are always decisions to be made, e.g., what questions to ask, what factor levels to study, whether to transform a response, and literally myriads more. Each one of these decisions may have an effect on the outcome of a study. Second, it is possible to use the Bayesian machinery for inference (Bayes rule and Markov chain Monte Carlo [MCMC] computing algorithms, see later) with so-called flat priors (also vague, diffuse, uninformative, minimally informative, or low-information priors). Such priors represent our ignorance about a parameter or our wish to let inference, i.e., the posterior distribution, be dominated by the observed data. Actually, this is exactly what we do throughout this book. Third, the prior is seen by some statisticians as a strength rather than a weakness of the Bayesian framework (Link and Barker, 2010): it lets one formally examine the effect on one’s conclusions of different assumptions about the parameters. Also, anybody using informative priors must say so and justify this choice. When the choice of priors is suspected to have an undue influence on the posterior distribution, it is good practice to conduct a sensitivity analysis to see how much one’s conclusions are changed when a different set of priors is used.

Nevertheless, it is fair to say that there can be challenges involving the priors. First, one possible problem is that priors are not invariant to transformation of parameters. A prior that is uninformative for θ may well be informative for a one-to-one transformation g(θ) of θ, such as log(θ) or 1/θ. Hence, it is possible to introduce information into an analysis without intending to do so. Especially in complex models (and these are the ones where a Bayesian treatment and the Bayesian model fitting algorithms offer the most rewards), it is quite possible that one unknowingly introduces unwanted information by the choice of ostensibly vague priors. Hence, for more complex models, a sensitivity analysis of priors is even more useful. Still, these challenges are not seen as insurmountable by many statisticians, and Bayesian statistics has now very much entered the mainstream of statistical science. This can be seen immediately when browsing journals such as Biometrics, Biometrika, or the Journal of the American Statistical Association, which contain both frequentist and Bayesian work. Many statisticians now use Bayesian and classical statistics alike, and some believe that in the future, we will see some kind of merging of the paradigms (e.g., Little, 2006).

Finally, in view of the profound philosophical difference between the two paradigms for statistical inference, it is remarkable how little parameter estimates actually differ numerically in practical applications when vague priors are used in a Bayesian analysis. We shall see this in almost every example in this book. Indeed, MLEs are an approximation to
the mode of the posterior distribution of a parameter when vague priors are assumed. This is one of the reasons for the ironic claim made by I. J. Good: “People who don’t know they are Bayesians are called non-Bayesians.”

2.3 THE IMPORTANCE OF MODERN ALGORITHMS AND COMPUTERS FOR BAYESIAN STATISTICS

For most modeling applications, the denominator in Bayes rule, p(x), contains high-dimensional integrals which are analytically intractable. Historically, they had to be solved by more or less adequate numerical approximations. Often, they could not be solved at all. Ironically, therefore, for a long time Bayesians thought that, in principle, they had better solutions than classical statisticians but unfortunately could not apply them in practice to any but very simple problems, for want of a method to solve their equations.

A dramatic change of this situation came with the advent of simulation-based approaches like MCMC and related techniques that draw samples from the posterior distribution (see for instance the article entitled “Bayesian statistics without tears” by Smith and Gelfand, 1992). These techniques circumvent the need for actually computing the normalizing constant in Bayes rule. This, along with the ever-increasing computer power required for these highly iterative techniques, made the Bayesian revolution in statistics possible (Brooks, 2003).

It seems fair to say that the ease with which difficult computational problems are solved by MCMC algorithms is one of the main reasons for the recent upsurge of Bayesian statistics in ecology, rather than the ability to conduct an inference without pretending one is completely stupid (i.e., has no prior knowledge about the analyzed system). Indeed, so far there are only a few articles in ecological journals that have actually used this asset of Bayesian statistics, i.e., have formally injected prior knowledge into their Bayesian analyses. They include Martin et al. (2005); McCarthy and Masters (2005); Mazzetta et al. (2007); and Swain et al. (2009). Nevertheless, it is likely that analyses with informative priors will become more common in the future.

2.4 MARKOV CHAIN MONTE CARLO (MCMC) AND GIBBS SAMPLING

MCMC is a set of techniques to simulate draws from the posterior distribution p(θ | x) given a model, a likelihood p(x | θ), and data x, using dependent sequences of random variables. That is, MCMC yields a sample from
the posterior distribution of a parameter. MCMC was developed in 1953 by the physicists Metropolis et al. and later generalized by Hastings (1970), so one of the main MCMC algorithms is called the Metropolis–Hastings algorithm. Many different flavors of MCMC are available now. One of the most widely used MCMC techniques is Gibbs sampling (Geman and Geman, 1984). It is based on the idea that, to solve a large problem, instead of trying to do it all at once, it is more efficient to break the problem down into smaller subunits and solve each one in turn. Here is a sketch of how Gibbs sampling works, taken from a course taught in 2005 by Nicky Best and Sylvia Richardson at Imperial College in London.

Let our data be x and our vector of unknowns θ consist of k subcomponents θ = (θ_1, θ_2, …, θ_k); hence, we want to estimate k parameters.

1. Choose starting (initial) values θ_1^(0), θ_2^(0), …, θ_k^(0).

2. Sample θ_1^(1) from p(θ_1 | θ_2^(0), θ_3^(0), …, θ_k^(0), x)
   Sample θ_2^(1) from p(θ_2 | θ_1^(1), θ_3^(0), …, θ_k^(0), x)
   …
   Sample θ_k^(1) from p(θ_k | θ_1^(1), θ_2^(1), …, θ_{k−1}^(1), x)

3. Repeat step 2 many times (e.g., hundreds or thousands of iterations) to eventually obtain a sample from p(θ | x).

Step 2 is called an update or iteration of the Gibbs sampler and, after convergence is reached, it leads to one draw (= sample) consisting of k values from the joint posterior distribution p(θ | x). The conditional distributions in this step are called “full conditionals” as they condition on all other parameters. The sequence of random draws for each of the k parameters resulting from step 3 forms a Markov chain.
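
To make this recipe concrete, here is a minimal R sketch of a Gibbs sampler for the two parameters of a Normal distribution (a mean and a variance). It is an illustration only, not the sampler that WinBUGS constructs; it assumes a flat prior for the mean and a Jeffreys-type prior for the variance so that both full conditionals have standard forms, and all names are hypothetical.

set.seed(1)
y <- rnorm(30, mean = 600, sd = 30)      # simulated data, e.g., body mass measurements
n <- length(y)
n.iter <- 5000
mu <- numeric(n.iter)                    # container for draws of the mean
sigma2 <- numeric(n.iter)                # container for draws of the variance
mu[1] <- 500 ; sigma2[1] <- 100          # step 1: initial values
for (t in 2:n.iter) {                    # steps 2 and 3: iterate the full conditionals
   # full conditional of mu, given sigma2 and y (flat prior on mu): Normal
   mu[t] <- rnorm(1, mean(y), sqrt(sigma2[t - 1] / n))
   # full conditional of sigma2, given mu and y (Jeffreys prior): inverse gamma
   sigma2[t] <- 1 / rgamma(1, shape = n / 2, rate = sum((y - mu[t])^2) / 2)
}
keep <- 1001:n.iter                      # discard the first 1000 draws as burn-in
mean(mu[keep]) ; quantile(mu[keep], c(0.025, 0.975))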

So far, a very simplistic summary of a Bayesian statistical analysis as illustrated in this book would go as follows:

1. We use a degree-of-belief definition of probability rather than a definition of probability based on the frequency of events among hypothetical replicates.

2. We use probability distributions to summarize our beliefs or our knowledge (or lack thereof) about each model parameter and apply Bayes rule to update that knowledge with observed data to obtain the posterior distribution of every unknown in our model. The posterior distribution quantifies all our knowledge about these unknowns given the data, our model, and prior assumptions. All statistical inference is based on the posterior distribution.

3. However, posterior distributions are virtually impossible to compute analytically in all but the simplest cases; hence, we use simulation (MCMC) to draw a series of dependent samples from the posterior distribution and base our inference on that sample.

How do we construct such a Gibbs sampler or other MCMC algorithm? I am told by statistician colleagues that this is surprisingly easy (but that the art consists of constructing an efficient sampler). However, for most ecologists, this will arguably be prohibitively complicated. And this is where WinBUGS comes in: WinBUGS constructs an MCMC algorithm for us for the model specified and our data set and conducts the iterative simulations for as long as we desire and have time to wait. Essentially, WinBUGS is an MCMC blackbox (J. A. Royle, pers. comm.).

2.5 WHAT COMES AFTER MCMC?

Once we are done with MCMC, we have a series of random numbers from the joint posterior distribution p(θ | x) that may look like this for a two-parameter model such as the model of the mean in Chapters 4 and 5 (showing only the first six draws):

μ: 4.28, 6.09, 7.37, 6.10, 4.72, 6.67, …
σ²: 10.98, 11.23, 15.26, 9.17, 14.82, 18.19, …

Now what should we do with these numbers?

Essentially, we have to make sure that these numbers come from a stationary distribution, i.e., that the Markov chain that produced them was at an equilibrium. If that is the case, then this is our estimate of the posterior distribution. Also, these numbers should not be influenced by our choice of initial parameter values supplied to start the Markov chains (the initial values); remember that successive values are correlated. This is called convergence monitoring. Once we are satisfied, we can summarize our samples to estimate any desired feature of the posterior distribution we like, for instance, the mean, median, or mode as a measure of central tendency (a Bayesian point estimate) or the standard deviation of the posterior distribution as a Bayesian measure of the uncertainty of a parameter estimate. Then, we can compute the posterior distribution for derived variables. For instance, if parameter γ is the ratio of α and β and we are interested in γ, we can simply divide α by β for each iteration in the Markov chain to obtain a sample of γ and then summarize that for inference about γ. We can also compute inferences for very complicated functions of parameters, such as the probability that γ exceeds some threshold value. Next, we briefly expand on each of these topics.

2.5.1 Convergence Monitoring

The first step in making an inference from an MCMC analysis is to ensure that an equilibrium distribution has indeed been reached by the
Markov chain, i.e., that the chain has converged. For each parameter, we started the chain at an arbitrary point (the initial value or init chosen for each parameter), and because successive draws are dependent on the previous values of each parameter, the actual values chosen for the inits will be noticeable for a while. Therefore, only after a while is the chain independent of the values with which it was started. These first draws ought to be discarded as a burn-in as they are unrepresentative of the equilibrium distribution of the Markov chain.

There are several ways to check for convergence. Most methods use at least two parallel chains, but another possibility is to compare successive sections of a single long chain. The simplest method is just to inspect plots of the chains visually: they should look like nice oscillograms around a horizontal line without any trend. Visual checks are routinely used to confirm convergence. For example, Fig. 2.1 shows the time-series plot for five parallel Markov chains for a parameter in a dynamic occupancy model (MacKenzie et al., 2003) fitted to 16 years’ worth of Swiss wallcreeper data (see Chapter 8). Convergence seems to be achieved after about 60 iterations.

Another, more formal check for convergence is based on the Gelman–Rubin (or Brooks–Gelman–Rubin) statistic (Gelman et al., 2004), called Rhat when using WinBUGS from R via R2WinBUGS (see Chapter 5). It compares between- and within-chain variance in an analysis-of-variance fashion. Values near 1 indicate likely convergence, and 1.1 is considered by some as an acceptable threshold (Gelman et al., 2004; Gelman and Hill, 2007). With this approach, it is important to start the parallel chains at different places, chosen either deliberately or at random.
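
Outside of WinBUGS, the same kind of diagnostic can be computed in R with the coda package; this is an alternative route to the Rhat values that R2WinBUGS reports, shown here as a hedged sketch with two artificial chains standing in for real MCMC output.

library(coda)
chain1 <- mcmc(cbind(mu = rnorm(1000, 600, 10), sigma = runif(1000, 25, 35)))
chain2 <- mcmc(cbind(mu = rnorm(1000, 600, 10), sigma = runif(1000, 25, 35)))
chains <- mcmc.list(chain1, chain2)      # at least two parallel chains are needed
gelman.diag(chains)                      # point estimates near 1 suggest convergence
traceplot(chains)                        # visual check: well-mixed traces without trend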

FIGURE 2.1 Time-series plot of five Markov chains for an occupancy parameter (psi[11]) in a dynamic occupancy model fitted to Swiss wallcreeper data using WinBUGS (from Kéry et al., 2010a). Typically, the chains of all parameters do not converge equally rapidly.

Convergence monitoring may be a thorny issue, and there are horror stories about how difficult it can be to make sure that convergence has actually been achieved. I have repeatedly found cases where Rhat erroneously indicated convergence; see Chapters 11 and 21 for examples. However, it is also easy to exaggerate this challenge, and with modern computing power, many models can be run for 100,000 iterations or more. Ensuring convergence in MCMC analyses is in a sense akin to making sure that the global maximum of a likelihood function has been found in classical statistical analyses. In both cases, it can be difficult to determine that the desired goal has been achieved.

2.5.2 Summarizing the Posterior for Inference

Again, the aim of a Bayesian analysis is not the estimate of a single point, as is the maximum of the likelihood function in classical statistics, but the estimate of an entire distribution. That means that every unknown (e.g., parameter, function of parameters, prediction, residual) has an entire distribution. This usually appears a bit odd at first. The posterior can be summarized graphically, e.g., using a histogram or a kernel smoother. Alternatively, we can use the mean, median, or mode as a measure of central tendency of a parameter (i.e., as a point estimate) and the standard deviation of the posterior as a measure of the uncertainty in the estimate, i.e., as the standard error of a parameter estimate. (Beware of challenging cases such as estimating a parameter that represents a standard deviation, e.g., the square root of a variance component. We will obtain the posterior distribution of that standard deviation, which will itself have a standard deviation that is used as the standard error of the estimate of the standard deviation … simple, eh?) Finally, the Bayesian analog to a 95% confidence interval is called a (Bayesian) credible interval (CRI) and is any region of the posterior containing 95% of the area under the curve. There is more than one such region, and one particular CRI is the highest posterior density interval (HPDI). However, in this book, we will only consider 95% CRIs bounded by the 2.5 and the 97.5 percentile points of the posterior sample of a parameter.
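
As an illustration of such numerical summaries, here is a small R sketch; mu.draws stands in for a vector of posterior draws of a parameter and is simulated here purely so the code is self-contained, whereas in a real analysis it would come from the Markov chain output.

mu.draws <- rnorm(3000, 600, 10)              # stand-in for real posterior draws
mean(mu.draws) ; median(mu.draws)             # Bayesian point estimates
sd(mu.draws)                                  # posterior SD, used as a standard error
quantile(mu.draws, c(0.025, 0.975))           # 95% CRI from the percentile points
hist(mu.draws, breaks = 50)                   # graphical summary of the posterior
# library(coda) ; HPDinterval(as.mcmc(mu.draws))   # highest posterior density interval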

2.5.3 Computing Functions of Parameters

As mentioned earlier, one of the greatest features of the Bayesian mode of inference using MCMC is the ease with which any function of model parameters can be computed along with its standard error exactly, while fully accounting for all the uncertainty involved in computing this function and without the need for any approximations such as the delta method (Powell, 2007). To get a posterior sample for a population growth rate r from two estimates of population size, N1 and N2, we simply compute the ratio of the two at every iteration of the Markov chain and summarize the resulting posterior sample for inference about r.
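
A minimal R sketch of this idea, with N1.draws and N2.draws standing in for vectors of posterior draws of the two population sizes (simulated here only to make the code self-contained):

N1.draws <- rnorm(3000, 100, 5)               # stand-in for posterior draws of N1
N2.draws <- rnorm(3000, 110, 5)               # stand-in for posterior draws of N2
r.draws <- N2.draws / N1.draws                # ratio computed draw by draw
mean(r.draws) ; sd(r.draws)                   # point estimate and its uncertainty
quantile(r.draws, c(0.025, 0.975))            # 95% CRI for the growth rate r
mean(r.draws > 1)                             # P(population increased), a derived probability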

2.5.4 Forming Predictions

Predictions are expected values of the response for future samples or for hypothetical values of the explanatory variables of a model or, more generally, of any unobserved quantity. Predictions are very important for (a) presenting the results from an analysis and (b) even understanding what a model tells us. For example, the biological meaning of an interaction or a polynomial term can be difficult to determine from a set of parameter estimates. Because predictions are functions of parameters and of data (values of covariates), their posterior distributions can again be used for inference, with the mean and the 95% CRIs often used as the predicted values along with a 95% prediction interval.
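
For instance, with posterior draws of a regression intercept and slope (simulated below as stand-ins for real MCMC output; all names are hypothetical), predictions and their credible band can be formed draw by draw:

alpha.draws <- rnorm(3000, 2.0, 0.3)          # stand-in draws of an intercept
beta.draws <- rnorm(3000, 0.5, 0.1)           # stand-in draws of a slope
x.new <- seq(0, 10, length.out = 50)          # covariate values at which to predict
pred <- sapply(x.new, function(x) alpha.draws + beta.draws * x)   # draws x new values
pred.mean <- colMeans(pred)                   # predicted values
pred.cri <- apply(pred, 2, quantile, probs = c(0.025, 0.975))     # 95% credible band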

2.6 SOME SHARED CHALLENGES IN THE BAYESIAN AND THE CLASSICAL ANALYSIS OF A STATISTICAL MODEL

Other, much more general and also more difficult topics in a Bayesian analysis include model criticism (checking whether the chosen model is adequate for a data set), hypothesis testing and model selection, and checking parameter identifiability (e.g., making sure that there are enough data to actually estimate a parameter). These topics are also challenging in classical statistics, although they are frequently neglected.

2.6.1 Checking Model Adequacy

In models with a single error term (e.g., the linear model [LM] and generalized linear model [GLM]), the usual residual diagnostics can be applied; e.g., plots of residuals vs. fitted values, histograms of the residuals, and so on, can be produced. We will see some examples of this in this book. In hierarchical models (i.e., models that include random effects other than residuals), checking model adequacy is more difficult and may have to involve (internal) cross-validation, validation against external data, or posterior predictive checks; see Gelman et al. (1996, 2004). We will see several examples of posterior predictive checks, including computation of the Bayesian p-value, which is not about hypothesis testing as in a classical analysis, but is a measure of how well a model fits a given data set, i.e., a measure of goodness-of-fit.

2.6.2 Hypothesis Tests and Model Selection

As in classical statistics with a confidence interval, in the Bayesian paradigm, hypothesis tests can be conducted based on a CRI: if it overlaps
zero, then the evidence for an effect of that parameter is ambiguous. In addition, we can make direct probability statements about the magnitude of a parameter, as mentioned above.

To measure the evidence for a single effect or a collection of effects in the model, we can use an idea described by Kuo and Mallick (1998) and also Dellaportas et al. (2002): premultiply each regression parameter with a binary indicator w and give w a Bernoulli(p = 0.5) prior. The posterior of w will then measure the probability that the associated effect belongs in the model; see Royle (2008) for an example of this. Based on the MCMC output from such a model run, model-averaged parameter estimates can also be produced. It must be noted, though, that implementing this feature greatly slows down MCMC samplers.
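
In the BUGS language, this idea amounts to only a couple of extra lines in the model definition. The following fragment is a hedged sketch for a simple linear regression (variable names and priors are illustrative, not code from this book):

model {
   alpha ~ dnorm(0, 0.001)                # intercept
   beta ~ dnorm(0, 0.001)                 # candidate covariate effect
   w ~ dbern(0.5)                         # binary inclusion indicator
   sigma ~ dunif(0, 100)
   tau <- pow(sigma, -2)
   for (i in 1:n) {
      mu[i] <- alpha + w * beta * x[i]    # the effect contributes only when w = 1
      y[i] ~ dnorm(mu[i], tau)
   }
}

The posterior mean of w then estimates the probability that the effect of x belongs in the model.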

For model selection in nonhierarchical models, that is, models with a single random component, e.g., LMs and GLMs, there is a Bayesian analog to Akaike’s information criterion (AIC) called the DIC (deviance information criterion; Spiegelhalter et al., 2002). Similar to the AIC, the DIC is computed as the sum of the deviance plus twice the effective number of parameters (called pD) and expresses the trade-off between the fit of a model and the variance of (i.e., uncertainty around) its estimates. All else being equal, a more complex model fits better than a simpler one but has less precise parameter estimates, so the best choice will be some intermediate degree of model complexity.

The DIC as computed by WinBUGS seems to work well for nonhierarchical models, but unfortunately, for models more complex than GLMs, especially hierarchical models (e.g., linear mixed models and generalized linear mixed models), the DIC needs to be computed in a different and more complicated manner; see Millar (2009). Thus, in this book, we are not going to pay special attention to the DIC, nor to the estimate of the effective number of parameters (pD). However, you are invited to observe, for any of the models analyzed, whether the pD or the DIC score computed makes sense or not, for instance, when you add or drop a covariate or change other parts of a model.

There are other ways to decide on how much complexity is warranted in a model, one of which goes under the name reversible-jump (or RJ-) MCMC (King et al., 2009; Ntzoufras, 2009). Simple versions of RJ-MCMC can be implemented in WinBUGS; see http://www.winbugs-development.org.uk/.

No doubt the lack of a semiautomated way of conducting model selection or model averaging (except for RJ-MCMC) will come as a disappointment to many readers. After all, many ecologists have come to think that the problem of model selection has been solved and that this solution has a name, AIC (see the extensive review by Burnham and Anderson, 2002). However, this rosy impression is probably too optimistic, as argued, for instance, by Link and Barker (2006). It is perhaps instructive for an
ecologist to browse through Kadane and Lazar (2004); this article, published in one of the premier statistical research journals, reviews model selection and clearly shows that in the field of statistics (as opposed to parts of ecology), the challenge of model selection is not yet viewed as having been solved.

2.6.3 Parameter Identifiability

A parameter can be loosely said to be identifiable when there is enough information in the data to determine its value unambiguously. This has nothing to do with the precision of the estimate. For instance, in the equation a + b = 7, neither parameter is estimable, and we would need an additional equation in a and/or b to be able to determine the values of the two parameters, i.e., to make them estimable.

TABLE 2.1 Examples for the Illustration of Some Important Special Topics

Topic                                                                        Location (Chapters)
Importance of detection probability when analyzing counts                   13, 16, 16E, 17, 18, 20, and 21
Random effects/hierarchical models/mixed models                             9, 12, 16, 19, 20, and 21
Computing residuals                                                          7, 8, 13, 18, and 20
Posterior predictive checks, including Bayesian p-value                     8, 13, 18, 20, and 21
Forming predictions                                                          8, 10, 13, 15, 20, and 21
Prior sensitivity                                                            5E, 20, and 21
Nonconvergence or convergence wrongly indicated                              11 and 21
Missing values                                                               4E and 8E
Standardization of covariates                                                11
Use of simulation for assessment of bias (or estimability) in an estimator  10

E denotes the exercises at the end of each chapter.

Strictly speaking, parameter identifiability is not an issue in the Bayesian framework because, in principle, we can always compute a posterior distribution for a parameter. At worst, the posterior will be the same as the prior, but then we haven’t learned anything about that parameter. A common check for identifiability is therefore to compare the prior and the posterior distribution of a parameter: if the two coincide approximately, there does not seem to be any information in the data about that parameter. Assessing parameter identifiability is another difficult topic in the Bayesian world, and one where more research is needed, but the same state of affairs also exists for any complex statistical model analyzed by classical methods (see for instance Dennis et al., 2006). In WinBUGS, nonconvergence may not only indicate lack of identifiability of one or more parameters but also other problems. Perhaps one of the simplest ways of finding out whether a parameter is indeed identifiable is by simulation: we simulate a data set and see whether we are able to recover parameter values that resemble those used to generate the data. To distinguish sampling and estimation error from lack of estimability, simulations will have to be repeated many times, e.g., 10 or 100.
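
As a concrete illustration of this simulation check, here is a generic R sketch (not one of the book's examples): data are simulated under known parameter values, the parameters are re-estimated, and the whole exercise is repeated.

n.sim <- 100                                  # number of simulation replicates
true.mean <- 600 ; true.sd <- 30              # values used to generate the data
est <- matrix(NA, nrow = n.sim, ncol = 2)
for (s in 1:n.sim) {
   y <- rnorm(20, true.mean, true.sd)         # simulate one data set
   est[s, ] <- c(mean(y), sd(y))              # estimate the two parameters
}
colMeans(est)                                 # values close to 600 and 30 suggest estimability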

2.7 POINTER TO SPECIAL TOPICS IN THIS BOOK

In this book, we learn extensively by example. Hence, Table 2.1 lists some special topics and shows where examples for them may be found.

2.8 SUMMARY

I have given a very brief introduction to Bayesian statistics and how it is conducted in practice using simulation-based methods (e.g., MCMC, Gibbs sampling). This chapter, and indeed the whole book, is not meant to deal with the theory of Bayesian statistics and the associated computational methods in any exhaustive way. Rather, books like King et al. (2009) and Link and Barker (2010) should be consulted for that.

EXERCISE

1. Bayes rule in classical statistics: Not every application of Bayes rule makes a probability calculation Bayesian, as shown next in a classical example from medical testing. An important issue in medical testing is the probability that one actually has a disease (denoted “D”), given that one gets a positive test result, denoted “+” (which, depending on the test, in common language may be very negative; just think about an AIDS test). This probability is p(D | +). With only three pieces of information that are often known for diagnostic tests and a given population, we can use Bayes rule to compute p(D | +). We simply need to know the sensitivity of the diagnostic test, denoted p(+ | D), its specificity p(− | not D), and the general prevalence, or incidence, of the disease in the study population, p(D). Note that one minus the sensitivity and one minus the specificity are the probabilities of the two possible kinds of diagnostic error (a false negative and a false positive, respectively).

Compute the probability of having the disease, given that you got a positive test result. Assume the following values: sensitivity = 0.99, specificity = 0.95, and prevalence = 5%. Start with p(D | +) = p(+ | D) p(D) / p(+) and note that a positive test result, which has probability p(+), can be obtained in two ways: either one has the disease (with probability p(D)) or one does not have it (with probability p(not D)). Does the result surprise you?

C H A P T E R  3

WinBUGS

O U T L I N E

3.1 What Is WinBUGS?

3.2 Running WinBUGS from R

3.3 WinBUGS Frees the Modeler in You

3.4 Some Technicalities and Conventions

3.1 WHAT IS WinBUGS?

The BUGS language and program was developed by epidemiologists in Cambridge, UK, in the 1990s (Gilks et al., 1994; Lunn et al., 2009). The acronym stands for Bayesian analysis Using Gibbs Sampling. In later years, a Windows version called WinBUGS was developed (Spiegelhalter et al., 2003). Despite imperfections, (Win)BUGS is a groundbreaking program; for the first time, it has made really flexible and powerful Bayesian statistical modeling available to a large range of users, especially for users who lack the experience in statistics and computing to fit such fully custom models by maximizing their likelihood in a frequentist mode of inference. (Although no doubt some statisticians may deplore this because it may also lead to misuse; Lunn et al., 2009.)

WinBUGS lets one specify almost arbitrarily complex statistical models using a fairly simple model definition language that describes the stochastic and deterministic “local” relationships among all observable and unobservable quantities in a fully specified statistical model. These statistical models contain prior distributions for all top-level quantities, i.e., quantities that do not depend on other quantities. From this, WinBUGS determines the so-called full conditional distributions and then constructs a Gibbs or other MCMC sampler and uses it to produce the desired number of random samples from the joint posterior distribution.

To let an ecologist really grasp what WinBUGS has brought us, it is instructive to compare the code (say, in program R) required to find the maximum likelihood estimates (MLEs) of a custom model using numerical optimization with the code to specify the same model in WinBUGS: the difference is dramatic! With the former, some ugly likelihood expression appears at some point. In contrast, in WinBUGS, the likelihood of a model is specified implicitly as a series of deterministic and stochastic relationships; the latter types of relationships are specified as the statistical distributions assumed for all random quantities in the model.
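
To make the contrast concrete, here is a hedged sketch for a simple "model of the mean" (not code from this book): first R code that finds the MLEs by numerical optimization of the likelihood, then, as comments, the same model as it might be expressed in the BUGS language (priors and names are illustrative).

# Maximum likelihood in R via numerical optimization
y <- rnorm(100, 600, 30)                      # some data
negLL <- function(p) -sum(dnorm(y, mean = p[1], sd = exp(p[2]), log = TRUE))
fit <- optim(c(500, log(10)), negLL)          # minimize the negative log-likelihood
c(fit$par[1], exp(fit$par[2]))                # MLEs of the mean and the SD

# Essentially the same model in the BUGS language:
# model {
#    mu ~ dnorm(0, 1.0E-6)                    # vague priors
#    sigma ~ dunif(0, 1000)
#    tau <- pow(sigma, -2)
#    for (i in 1:n) { y[i] ~ dnorm(mu, tau) } # likelihood, specified implicitly
# }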

WinBUGS is a fairly slow program; for large problems, it may fail to provide posterior samples of reasonable size within practical time limits. Custom-written samplers in more general programming languages such as Fortran or C++, or even R, can easily beat WinBUGS in terms of speed (Brooks, 2003), and often do so by a large margin. However, for many ecologists, writing their own samplers is simply not an option, and this makes WinBUGS so unique.

3.2 RUNNING WinBUGS FROM R

In contrast to most other WinBUGS introductions (e.g., McCarthy, 2007; Ntzoufras, 2009), all examples in this book will be analyzed with WinBUGS run from within program R by use of the R2WinBUGS package (Sturtz et al., 2005). R has become the lingua franca of statistical computing, and conducting a Bayesian analysis in WinBUGS directly from within one’s normal computing environment is a great advantage. In addition, the steps before and after an analysis in WinBUGS are greatly facilitated in R, e.g., data preparation as well as computations on the Markov chain Monte Carlo (MCMC) output and the presentation of results in graphs and tables.

Importantly, after conducting an analysis in WinBUGS, R2WinBUGS will import back into R the results of the Bayesian analysis, which essentially consist of the Markov chains for each monitored parameter. Hence, these results must be saved before exiting R, otherwise all results will be lost! (Although the coda files are saved into the working directory; see Section 5.3.) This would not be dramatic for most examples in this book, but it can be very annoying if you have just run a model for 7 days.
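
As a hedged sketch of what such a call and the subsequent saving step might look like (object and file names are illustrative; the real workflow is developed from Chapter 5 onwards):

library(R2WinBUGS)
# win.data, inits and params are assumed to have been created beforehand
out <- bugs(data = win.data, inits = inits, parameters.to.save = params,
   model.file = "model.txt", n.chains = 3, n.iter = 1200, n.burnin = 200,
   n.thin = 2, bugs.directory = "C:/Program Files/WinBUGS14/",
   working.directory = getwd())
print(out, dig = 3)                  # posterior summaries, Rhat, pD and DIC
save(out, file = "out.RData")        # save the imported results before closing R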

3.3 WinBUGS FREES THE MODELER IN YOU

Fitting statistical models in WinBUGS opens up a new world of modeling freedom to many ecologists. In my experience, writing WinBUGS code, or understanding and tailoring to one’s own needs WinBUGS code written by others, is much more within the reach of typical ecologists
than writing or adapting a similar analysis in some software that explicitly maximizes a likelihood function for the same problem. The simple pseudo-code model specification in WinBUGS is just wonderful. Thus, in theory, and often also in practice, WinBUGS really frees your creativity as a modeler and allows you to fit realistically complex models to observations from your study system.

However, it has been said (by Box or Cox, I believe) that statistical modeling is as much an art as a science. This applies particularly to modeling in WinBUGS, where, for more complex problems, a lot of art (and patience!) may often be required to get an analysis running. WinBUGS is great when it works, but on the other hand, there are many things that may go wrong (e.g., traps, nonconvergence) without any obvious error in one’s programming. For instance, cryptic error messages such as “trap 66” can make one feel really miserable. Of course, as usual, experience helps a great deal, so in an appendix on the book Web site I provide a list of tips that hopefully allow you to love WinBUGS more unconditionally (see Appendix—A list of WinBUGS tricks). I would suggest you skim over them now and then refresh your memory from time to time later.

3.4 SOME TECHNICALITIES AND CONVENTIONS

As a typographical convention in this book, WinBUGS and R code is shown in Courier font, like this. When part of the output was removed for the sake of clarity, I denote this by [ ... ]. When starting a new R session, it is useful or even obligatory to set a few options, e.g., choose a working directory (where WinBUGS will save the files it creates, such as those containing the Markov chain values), load the R2WinBUGS package, and tell R where the WinBUGS program is located. Each step may have to be adapted to your particular case. You can set the R working directory by issuing a command such as setwd("C:/"). The R2WinBUGS function bugs(), which we use all the time to call WinBUGS from within R, has an argument called bugs.directory that defaults to "C:/Program Files/WinBUGS14/" (though this is not the case under Windows VISTA). This is fine for most Anglo-Saxon computers but will have to be adapted for other language locales. Most recent versions of the bugs() function have another option called working.directory that partly overrides the global R setting from setwd(). It is best set to working.directory = getwd().

To conduct the analyses in this book, you will need to load some R packages, most notably R2WinBUGS, but sometimes also lme4, MASS, or lattice. At the beginning of every session, execute library("R2WinBUGS") and do the same for other required packages. One
useful R feature is a simple text file called Rprofile.site, which sits in C:\Program Files\R\R-2.8.1\etc and contains global R settings. For instance, you could add the following lines:

library("R2WinBUGS")library("lme4")bd < "C:/Program files/WinBUGS14/"

This causes R to load the packages R2WinBUGS and lme4 whenever it is started and to have an object called bd that contains the usual WinBUGS address. We could add the option bugs.directory = bd whenever calling the bugs() function, and when running an analysis on a German-language locale, redefine bd appropriately.

In all analyses, I have striven for a consistent presentation. First, we write into the R working directory a text file containing the WinBUGS model description. After that, the other components required for the analysis are written into R objects: data, initial values, a list of parameters to be estimated, and MCMC settings. Finally, these objects are sent to WinBUGS by the R2WinBUGS workhorse function bugs(). Within the WinBUGS model description, I have also sought consistency of layout by grouping as far as possible statements defining priors, the likelihood, and derived quantities. This is not required; indeed, within some limits (e.g., putting code inside or outside of loops or pairs of braces), WinBUGS code lines may be swapped freely. This is not commonly the case with other programming languages.

When conducting a WinBUGS analysis from R, the WinBUGS model description, i.e., the text file that contains the model definition in the BUGS language, could be written in any text editor and then saved into the R working directory. However, I find it useful to use the sink() function for that and to keep both WinBUGS and R code in the same file. However, you must not get confused about which code is R and which is WinBUGS: put simply, everything inside the pair of sink() calls and inside the paired curly braces after the key word model will be WinBUGS code. The book Web site contains a text file with all the code shown in this book, and copying and pasting the code into an open R window works well. However, when using the popular R editor Tinn-R, a correct WinBUGS model description file will not always be produced when using sink(), so the WinBUGS trick list on the book Web site shows an alternative that works when using Tinn-R.
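
As a minimal sketch of this convention (file name and contents are illustrative only):

sink("model.txt")                    # open a text file in the R working directory
cat("
model {
   # priors and likelihood for your model go here, written in the BUGS language
}
", fill = TRUE)
sink()                               # close the file; everything cat()'ed above is WinBUGS code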

All analyses have been tested with R 2.8.1 and WinBUGS 1.4.3. I would hope that some backwards compatibility is present. Also, most or all of the WinBUGS code should work fine in OpenBUGS, but I have not verified this.

C H A P T E R  4

A First Session in WinBUGS: The “Model of the Mean”

O U T L I N E

4.1 Introduction

4.2 Setting Up the Analysis

4.3 Starting the MCMC Blackbox

4.4 Summarizing the Results

4.5 Summary

4.1 INTRODUCTION

To familiarize ourselves with WinBUGS and key features of a Bayesian analysis in practice (such as the prior, the likelihood, Markov chain Monte Carlo (MCMC) settings, initial values, updates, convergence, etc.), we will first run an analysis directly within WinBUGS of what is perhaps the simplest of all models—the “model of the mean.” That is, we estimate the mean of a Normal population from a sample of measurements taken in that population. Our first example will deal with the body mass of male peregrines (Fig. 4.1).

As an aside, this example emphasizes that even something that most people would not think of as a model—taking a simple average—in fact already implies a model that may be more or less adequate in any given case and is thus not as innocuous as many would think. For instance, using a simple average to express the central tendency of a skewed population is not very useful, and the “model of the mean” is then not adequate. Taking an average over nonindependent measurements, such as the diameter of each of several flowers measured within each of a sample
of plants, is also not such a good idea since the “model of the mean” underlying that average does not account for dependencies among measurements (as would, for instance, a mixed model). Finally, the computation and usual interpretation of a standard error for the population mean (as SD(x)/sqrt(n(x))) is implicitly based on a “model of the mean” where each measurement is independent and the population distribution is reasonably close to a Normal or at least symmetric.

4.2 SETTING UP THE ANALYSIS

To open WinBUGS, we click on its bug-like icon. WinBUGS uses files in its own ODC format. To create an ODC file, you can either modify an existing ODC file (e.g., one of the examples contained in the WinBUGS help: go to Help > Examples Vol I and II) or write a program in a text editor such as Word, save it as a text file (with ending .txt), read it into WinBUGS, and save it as an ODC document. Within WinBUGS, these files can be edited in content and format.

Here, we open the file called "The model of the (normal) mean.odc" (from the book Web site) via File > Open. We see something like what is shown at the top of the next page.
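
That screenshot cannot be reproduced here, but in essence the file contains a BUGS description of the “model of the mean” plus the two data sets and initial values. A hedged sketch of what the model part of such a file might look like (node names are illustrative, not the verbatim file contents):

model {
   # Priors
   population.mean ~ dunif(0, 5000)          # vague prior for the mean
   population.sd ~ dunif(0, 100)             # vague prior for the standard deviation
   precision <- 1 / (population.sd * population.sd)
   # Likelihood
   for (i in 1:nobs) {
      mass[i] ~ dnorm(population.mean, precision)
   }
}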

FIGURE 4.1 Male peregrine falcon (Falco peregrinus) wintering in the French Mediterranean, Sète, 2008. (Photo: J. M. Delaunay)

Note that we have two data sets available on the body mass of male peregrines. Both were created by using program R to produce random Normal numbers with specified values for the mean and the standard deviation. One data set contains 10 observations and the other contains 1000 (scroll down to see them). Male peregrines in Western Europe weigh on average about 600 g, and Monneret (2006) gives a weight range of 500–680 g. Hence, the assumption of a Normal distribution of body mass implies a standard deviation of about 30 g.
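
A hedged sketch of how such data sets could be generated in R (the seed, and hence the actual values in the book's files, is not reproduced here):

set.seed(42)                                  # arbitrary seed, purely for illustration
y10 <- rnorm(n = 10, mean = 600, sd = 30)     # small data set: 10 male peregrines
y1000 <- rnorm(n = 1000, mean = 600, sd = 30) # large data set: 1000 male peregrines
round(y10)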

To run a Bayesian analysis of the “model of the mean” for one of these data sets, we have to tell WinBUGS what the model is, what data to use, provide initial values from which to start the Markov chains, and set some MCMC features, e.g., say how many Markov chains we want, how many draws from the posterior distribution we would like to have, how many draws WinBUGS should discard as a burn-in at the start, and perhaps by
how much we want to thin the resulting chain to save disk space and reduce autocorrelation among repeated draws.

First, we have to tell WinBUGS what the model is. We select Model > Specification, which causes the model Specification Tool window to pop up. Then, we put the cursor somewhere on the word “model,” or indeed anywhere within the model definition in the ODC document, and
press “check model,” whereupon, if the model is syntactically correct, WinBUGS says so in the bottom left corner.

Next, we load the data. We start with the analysis of data set number 1, which comprises 10 measurements, mark the entire word “list” in the data statement, and press “load data,” whereupon WinBUGS responds in the bottom left corner “data loaded.”

Then, we select the number of parallel Markov chains that WinBUGS should “prepare” (i.e., compile); here, 3. This needs to be done before the “compile” button is pressed! Then press the “compile” button.

WinBUGS replies (in the bottom left corner) that the model is compiled. We still need to provide the starting values. One after another, mark the word “list” in that part of the program where the initial values are defined and then press “load inits.” After you have done this once for each chain, the model is initialized and ready to run.

It is possible to let WinBUGS select the initial values by itself by instead pressing the “gen inits” button. This often works, and may be necessary for some parameters in more complex models, but in general, it is preferable to explicitly specify inits for as many parameters as possible.

Next, we choose Inference > Samples to tell WinBUGS what kind of inference we want for which parameter, which incidentally is called a node by WinBUGS. This makes the “Sample Monitor Tool” pop up (see top of p. 38).

We write into the blank space next to the word “node” the names of those parameters whose posterior distributions we want to estimate by sampling them and press the “set” button. This button only becomes black, i.e., activated, once the correct name of a parameter from the previously defined model has been entered.

For models with many parameters, this can be tedious because we have to enter manually all of their names (correctly!). Moreover, every time we want to rerun the model, we have to repeat this whole procedure.

Once we have entered all parameters whose Markov chains we want to keep track of, i.e., for which we want to save the values and get estimates, we type an asterisk into the blank space to the right of the word “node.”

This tells WinBUGS that it should save the simulated draws from the posterior distributions of all the parameters whose names we have entered previously. On typing the asterisk, a few further buttons become activated, among them “trace” and “history” (see top of this page).

We select “trace,” which will provide us with a moving time-series plot of the posterior draws for each selected parameter. (The button “history” produces a static graph of all draws up to the particular draw at which the history button is pushed.) Another window pops up that has an empty graph for every selected parameter. There are several other settings in this window that could be changed, perhaps only at later stages of the analysis. An important one is the “beg” and the “end” setting. Together, they define which draws should be used when summarizing the inference about each
parameter, for instance, when pressing the “density” or the “stats” buttons (see later). Changing the “beg” setting to, say, 100 specifies a burn-in of 100. This means that the first 100 draws from each Markov chain are discarded as not representative of the stationary distribution of the chain (i.e., the posterior distribution of the parameters in the model).

Now we are ready to start our simulation, i.e., start the Markov chains at their specified initial values and let the chains evolve for the requested number of iterations. To do this, we select Model > Update, which makes a third window pop open, the “Update Tool.”

4.3 STARTING THE MCMC BLACKBOX

We choose the default 1000 updates but change the “refresh” setting to 1 or 10, which gives a more smoothly moving dynamic trace of the Markov chains. Then, we press the “update” button and off we go. The
MCMC blackbox WinBUGS (Andy Royle, pers. comm.) draws samples from the posterior distributions of all monitored parameters. One continuous stream for each chain (here, 3) appears and represents the sampled values for each monitored parameter. Also, note that WinBUGS explains what it is doing at the bottom left corner, i.e., WinBUGS says that it is updating (= drawing samples from the posterior distribution).

Once the specified number of iterations (= updates = draws) has been completed, WinBUGS stops and says that (on my laptop) the “updates took 62 s.”

4.4 SUMMARIZING THE RESULTS

From each Markov chain, we now have a sample of 1000 random draws from the joint posterior distribution of the two parameters in the model. For inference about the mass of male peregrines, we can summarize these samples numerically, or we can graph them, either in one dimension (for each parameter singly) or in two, to look at correlations between two parameters. If we wanted more samples, i.e., longer Markov chains, we could just press the “update” button again and would get another 1000 draws added. For now, let’s be happy with a sample of 1000 (which is by far enough for this simple model and the small data set) and proceed with the inference about the parameters.

First we should check whether the Markov chains have indeed reacheda stable equilibrium distribution, i.e., have converged. This can be donevisually or by inspecting the Brooks–Gelman–Rubin (BGR) diagnosticstatistic that WinBUGS displays on pressing the “bgr diag” button inthe Sample Monitor Tool. The BGR statistic (graphed by the red line) is



an analysis of variance (ANOVA)-type diagnostic that compares within- and among-chain variance. Values around 1 indicate convergence, with 1.1 considered an acceptable limit by Gelman and Hill (2007). Numerical summaries of the BGR statistic for successive sections of the chains can be obtained by first double-clicking on a BGR plot and then holding the CTRL key and clicking on the plot once again.

According to this diagnostic, the chains converge to a stationary distribution almost instantly; thus, we may well use all 3*1000 iterations for inference about the parameters. In contrast, for more complex models, nonzero burn-in lengths are always needed. For instance, for some complex models, some statisticians may use chains one million iterations long and routinely discard the first half of every chain as a burn-in.

Once we are satisfied that we have a valid sample from the posterior distribution of the parameters of the model of the mean, based on the observed 10 values of male peregrine body mass, we can use these draws to make an inference. Inference means to draw a probabilistic conclusion about the population from which these 10 peregrines came. Because we have created both this population and the samples from it, we know of course that the true population mean and standard deviation are 600 and 30 g, respectively.

We close some windows and press the "history," "density," "stats," and the "auto cor" buttons, which produces the following graphical display of information (after some rearranging of the pop-up windows):



Visual inspection of the time series plot produced by "history" again suggests that the Markov chains have converged. Then, we obtain a kernel-smoothed histogram estimate of the posterior distributions of both parameters. The posterior of the mean looks symmetrical, while that for the standard deviation is fairly skewed. The node statistics give the formal parameter estimates. As a Bayesian point estimate, typically the posterior mean or the posterior median (or sometimes also the mode) is reported, while the posterior standard deviation is used as a standard error of the parameter estimate. The range between the 2.5th and 97.5th percentiles represents a 95% Bayesian confidence interval and is called a credible interval.

The autocorrelation function is depicted last. For this simple model, the chains are hardly autocorrelated at all. This is good, as our posterior sample contains more information about the parameters than when successive draws are correlated. In more complex models, we will frequently find considerable autocorrelation between consecutive draws in a chain, and then it may be useful to thin the chains to get more approximately independent draws, i.e., more "concentrated" information. Note that a very high autocorrelation may also mean that a parameter is not identifiable in the model, i.e., not estimable, or that there are other structural problems with a model.

You should note two things about Bayesian credible intervals: First, though in practice most nonstatisticians will not make a distinction between a Bayesian credible interval and a frequentist confidence interval, and indeed, for large sample sizes and vague priors, the two will typically be very similar numerically, there is in fact a major conceptual difference between them (see Chapter 2). Second, there are different ways in which the posterior distribution of a parameter may be summarized by a Bayesian credible interval. One is the highest posterior density interval (HPDI), which denotes the limits of the narrowest segment of the posterior distribution containing 95% of its total mass. HPDIs are not computed by WinBUGS (nor by R2WinBUGS, see later), but can be obtained using the function HPDinterval() in the R package coda.
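As a minimal sketch (not from the book) of how this could be done in R, once a vector of posterior draws is available: here, the vector draws is only a stand-in for the real output of an analysis.

library(coda)
# Stand-in for a vector of posterior draws of one parameter (hypothetical numbers)
draws <- rnorm(3000, mean = 600, sd = 12)
# 95% highest posterior density interval for that parameter
HPDinterval(as.mcmc(draws), prob = 0.95)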

One should also keep an eye on the Monte Carlo (MC) error under node statistics. The MC error quantifies the variability in the estimates that is due to Markov chain variability, which is the sampling error in the simulation-based solution for Bayes rule for our problem. MC error should be small. According to one rule of thumb, it should be <5% of the posterior standard deviation for a parameter.
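A minimal sketch of that rule of thumb, using made-up numbers read off the node statistics (both values are hypothetical, not the book's output):

mc.error <- 0.4              # hypothetical MC error from the "stats" table
post.sd  <- 12.6             # hypothetical posterior SD of the same parameter
mc.error / post.sd < 0.05    # TRUE means the rule of thumb is satisfied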

One possible summary from our Bayesian analysis of the model of the mean adopted for the body mass of male peregrines would be that mean body mass is estimated at 604.8 g (SE 12.62, 95% CI 579.2–629.1 g) and that the standard deviation of the body mass of male peregrines is 39.06 g (SE 10.99, 95% CI 23.77–66.24 g). (Note how variance parameters are typically estimated much less precisely than are means.)



There is other summary information from WinBUGS that we can examine; for instance, Inference > Compare or Inference > Correlation allows us to plot parameter estimates or to see how they are related across two or more parameters. Let's have a look at whether the estimates of the mean body mass and its standard deviation are correlated: choose Inference > Correlation, type the names of the two parameters (nodes), and we see that the estimates of the two parameters are not correlated at all.

4.5 SUMMARY

We have conducted our first Bayesian analysis of the simplest statistical model, the "model of the mean." We see that WinBUGS is a very powerful software to fit models in a Bayesian mode of inference using MCMC and that a lot can be achieved using simple click and point techniques.

EXERCISES

1. Save your file under another name, e.g., test.odc. Then try out a few things that can go wrong.

• Create a typo in one of the node names of the model and WinBUGS complains "made use of undefined node."



• Select extremely diffuse priors and see whether WinBUGS gets caught in a trap.

• Select initial values outside of the range of a (uniform) prior and WinBUGS will crash.

• Add data to the data set (e.g., silly.dat = c(0,1,2)) and WinBUGS will say "undefined variable."

• Try out other typos in model, inits, or data, e.g., an o ('oh') instead of a zero.

• See how WinBUGS deals with missing responses: turn the second element of the mass vector (in the smaller data set, but it doesn't really matter) into a missing value (NA). On loading the inits, you will now see that WinBUGS says (in its bottom left corner) that for every chain, there are still uninitialized variables. By this, it means the second element of mass, as we shall see. Continue (press the "gen inits" button) and then request a sample for both population.mean and mass in the Sample Monitor Tool. Request the Trace to be displayed and you will see that we get one for the node population.mean and another one for mass[2]. We see that this missing value is automatically imputed (estimated). The imputed value in this example is the same as the population mean, except that its uncertainty is much larger. Adding missing response values to the data set is one of the easiest ways in which predictions can be obtained in WinBUGS.



C H A P T E R

5

Running WinBUGS from R via R2WinBUGS

O U T L I N E

5.1 Introduction
5.2 Data Generation
5.3 Analysis Using R
5.4 Analysis Using WinBUGS
5.5 Summary

5.1 INTRODUCTION

In Chapter 4, it has become clear that as soon as you want to use WinBUGS more frequently, pointing and clicking is a pain and it is much more convenient to run the analyses by using a programming language. In program R, we can use the add-on package R2WinBUGS (Sturtz et al., 2005) to communicate with WinBUGS, so WinBUGS analyses can be run directly from R (also see package BRugs, not featured here, in connection with OpenBugs). The ingredients of an entire analysis are specified in R and sent to WinBUGS, then WinBUGS runs the desired number of updates, and the results, essentially just a long stream of random draws from the posterior distribution for each monitored parameter, along with an index showing which value of that stream belongs to which parameter, are sent back to R for convenient summary and further analysis. This is how we will be using WinBUGS from now on (so I will assume that you have R installed on your computer).

Any WinBUGS run produces a text file called "codaX.txt" for each chain X requested along with another text file called "codaIndex.txt."



When requesting three chains, we get coda1.txt, coda2.txt, coda3.txt, and the index file. They contain all sampled values of the Markov chains (MCs) and can be read into R using facilities provided by the R packages coda or boa and then analyzed further. Indeed, sometimes, for big models, the R object produced by R2WinBUGS may be too big for one's computer to swallow, although it may still be possible to read in the coda files. Doing this and using coda or boa for analyzing the Markov chain Monte Carlo (MCMC) output may then be one's only way of obtaining inferences.
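As a sketch of that fallback (assuming three chains were run and the coda files sit in the R working directory), the files could be read and summarized with the coda package like this:

library(coda)
# Read each chain file together with the shared index file
chain1 <- read.coda("coda1.txt", "codaIndex.txt")
chain2 <- read.coda("coda2.txt", "codaIndex.txt")
chain3 <- read.coda("coda3.txt", "codaIndex.txt")
all.chains <- mcmc.list(chain1, chain2, chain3)
summary(all.chains)      # numerical summaries computed from the raw draws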

As an example for how to use WinBUGS in combination with R through R2WinBUGS, we rerun the model of the mean, this time for a custom-drawn sample of male peregrines. This is the first time where we use R code to simulate our data set and then analyze it in WinBUGS as well as by using the conventional facilities in R.

First, we assemble our data set, i.e., simulate a sample of male peregrine body mass measurements. Second, we use R to analyze the model of the mean for this sample in the frequentist mode of inference. Third, we use WinBUGS called from R to conduct the same analysis in a Bayesian mode of inference. Remember, analyzing data is like repairing motorcycles. Thus, analyzing the data set means breaking it apart into those pieces that we had used earlier to build it up. In real life, of course, nature assembles the data sets for us and we have to infer truth (i.e., nature's assembly rules).

5.2 DATA GENERATION

First, we create our data:

# Generate two samples of body mass measurements of male peregrines
y10 <- rnorm(n = 10, mean = 600, sd = 30)        # Sample of 10 birds
y1000 <- rnorm(n = 1000, mean = 600, sd = 30)    # Sample of 1000 birds

# Plot data
xlim <- c(450, 750)
par(mfrow = c(2,1))
hist(y10, col = 'grey', xlim = xlim, main = 'Body mass (g) of 10 male peregrines')
hist(y1000, col = 'grey', xlim = xlim, main = 'Body mass (g) of 1000 male peregrines')

Here, and indeed in all further examples, it is extremely instructive to execute the previous set of statements repeatedly to experience sampling error: the variation in one's data that stems from the fact that only part but not the whole of a variable population has been measured. Sampling error, or sampling variance, is something absolutely central to statistics, yet it is among the most difficult concepts for ecologists to grasp,



especially since in practice we only ever observe a single sample from the distribution that characterizes that variation! It is astonishing to observe how different repeated realizations from the exact same random process can be, here, the sampling of 10 or 1000 male peregrines from an assumed infinite population of male peregrines. Also surprising is how far from normal the distribution in the smaller sample may look.

After playing around a while, we keep one pair of samples and go on. Note that unless you set a so-called seed for the random-number generator in R (for details, type ?set.seed), you will have a different sample from me (i.e., this book) and therefore also get slightly different output in the ensuing analyses (although you can download the exact data sets analyzed for the book from the Web site).
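For instance, fixing the seed before the two rnorm() calls makes the samples reproducible (the seed value here is arbitrary):

set.seed(24)     # any integer; fixes the random-number stream
y10 <- rnorm(n = 10, mean = 600, sd = 30)
y1000 <- rnorm(n = 1000, mean = 600, sd = 30)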

5.3 ANALYSIS USING R

We can conduct a quick classic analysis of this model using the linear regression facilities in R on the larger sample.

summary(lm(y1000 ~ 1))

> summary(lm(y1000 ~ 1))

[ ...]

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  599.664      1.008   594.6   <2e-16 ***

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 31.89 on 999 degrees of freedom

We recognize well the estimates of the population mean (599.664) and population standard deviation (SD), which is called the residual standard error (31.89). Now let's do the same analysis in WinBUGS.

5.4 ANALYSIS USING WinBUGS

Remember that you always need to load the R2WinBUGS package first, although we won't show this again from now on. Also, we need to set the R working directory.

library(R2WinBUGS) # Load the R2WinBUGS library

setwd("F:/ WinBUGS book/Simple Normal mean model in WinBUGS") # wd

# Save BUGS description of the model to working directory

sink("model.txt")



cat("

model {

# Priors

population.mean ~ dunif(0,5000) # Normal parameterized by precision

precision <- 1 / population.variance # Precision = 1/variance

population.variance <- population.sd * population.sd

population.sd ~ dunif(0,100)

# Likelihood

for(i in 1:nobs){

mass[i] ~ dnorm(population.mean, precision)

}

}

",fill TRUE)

sink()

This last bit of code writes into the R working directory a text file named "model.txt" containing the WinBUGS description of the model. We can see this when we look at the Windows Explorer after execution of this set of statements.

Next, we need to package the data that WinBUGS uses in the analysis. We do this by creating a bundle that contains both the data themselves and a count of the number of data points.

# Package all the stuff to be handed over to WinBUGS
# Bundle data
win.data <- list(mass = y1000, nobs = length(y1000))



Then, we define a function that creates random starting values, the inits function. We could also explicitly supply these initial values for each Markov chain requested, as we did in the previous chapter. However, this is less flexible, since we would need to explicitly specify one set of inits for each chain. If we wanted five chains instead of three, we would have to add two sets of inits. In contrast, a function will simply be executed two more times.

# Function to generate starting values
inits <- function() list(population.mean = rnorm(1, 600), population.sd = runif(1, 1, 30))

As a reminder, if you are unsure what an R function such as rnorm() or runif() means, just type ?rnorm or ?runif in the R console. We also have to tell WinBUGS for which parameters it should save the posterior draws. Let's say we want to keep the variance also.

# Parameters to be monitored (= to estimate)
params <- c("population.mean", "population.sd", "population.variance")

Then, the MCMC settings need to be selected.

# MCMC settings
nc <- 3      # Number of chains
ni <- 1000   # Number of draws from posterior (for each chain)
nb <- 1      # Number of draws to discard as burn-in
nt <- 1      # Thinning rate

Finally, the function bugs() is called to perform the analysis in WinBUGS and put its results into an R object called out:

# Start Gibbs sampler: Run model in WinBUGS and save results in object called out
out <- bugs(data = win.data, inits = inits, parameters.to.save = params,
   model.file = "model.txt", n.thin = nt, n.chains = nc, n.burnin = nb,
   n.iter = ni, debug = TRUE, DIC = TRUE, working.directory = getwd())

During execution of the program in WinBUGS, the R window is frozen. Once the requested number of draws from the posterior has been produced, WinBUGS presents a graphical (the "history") and numerical summary of those parameters for which monitoring (i.e., estimation) was requested. On exiting WinBUGS, we have the results in various formats in our working directory (e.g., coda1.txt, coda2.txt, coda3.txt, log.odc, and log.txt) as well as, in the R workspace, a new object named out, a summary of which can be obtained by just typing its name (i.e., out). Note that setting debug = FALSE would cause WinBUGS to exit automatically after completion of the requested number of draws. This is important, for instance, when running repeated simulations. Otherwise, keeping WinBUGS open after the required



number of draws have been taken from the posterior (i.e., setting debug = TRUE) allows you to use WinBUGS for additional analyses, for instance, to make plots.

Now look at the R workspace and note the new object called out:

ls()

> ls()
 [1] "inits"    "nb"       "nc"       "ni"       "nt"       "out"      "params"
 [8] "win.data" "xlim"     "y10"      "y1000"

We look at a summary of the Bayesian analysis:

out # Produces a summary of the object

> out

Inference for Bugs model at "model.txt", fit using WinBUGS,

3 chains, each with 1000 iterations (first 1 discarded)

n.sims = 2997 iterations saved
                      mean    sd   2.5%    25%    50%    75%  97.5% Rhat n.eff
population.mean      599.7   1.0  597.7  598.9  599.6  600.4  601.7  1.0  3000
population.sd         32.0   1.8   30.6   31.4   31.9   32.4   33.4  1.1  3000
population.variance 1027.0 173.7  934.6  988.7 1019.0 1050.0 1116.0  1.1  3000
deviance            9765.1  33.4 9762.0 9762.0 9763.0 9764.0 9769.0  1.1  1500



For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat = 1).

DIC info (using the rule, pD = Dbar - Dhat)

pD = 3.5 and DIC = 9768.6

DIC is an estimate of expected predictive error (lower deviance is better).

Actually, object out contains a lot more information, which we can see by listing all the objects contained within out by typing names(out).

> names(out)
 [1] "n.chains"        "n.iter"          "n.burnin"        "n.thin"
 [5] "n.keep"          "n.sims"          "sims.array"      "sims.list"
 [9] "sims.matrix"     "summary"         "mean"            "sd"
[13] "median"          "root.short"      "long.short"      "dimension.short"
[17] "indexes.short"   "last.values"     "isDIC"           "DICbyR"
[21] "pD"              "DIC"             "model.file"      "program"

You can look into any of these objects by typing their name, which will usually print their full content, or by applying some summarizing function such as names() or str() on them. (Again, when not sure what an R function does, just type ?names or ?str into your R console.) You can also look deeper; note some degree of redundancy among sims.array, sims.list, and sims.matrix:

str(out)

> str(out)

List of 24

$ n.chains : num 3

$ n.iter : num 1000

$ n.burnin : num 1

$ n.thin : num 1

$ n.keep : num 999

$ n.sims : num 2997

$ sims.array : num [1:999, 1:3, 1:4] 600 601 600 599 599 ...

..- attr(*, "dimnames") List of 3

.. ..$ : NULL

.. ..$ : NULL

.. ..$ : chr [1:4] "population.mean" "population.sd" "population.variance"

"deviance"

$ sims.list :List of 4

..$ population.mean : num [1:2997] 600 600 601 601 600 ...

..$ population.sd : num [1:2997] 31.8 32.8 31.6 32 31.8 ...

..$ population.variance: num [1:2997] 1013 1076 1000 1027 1011 ...

..$ deviance : num [1:2997] 9762 9763 9764 9763 9762 ...

$ sims.matrix : num [1:2997, 1:4] 600 600 601 601 600 ...

5.4 ANALYSIS USING WinBUGS 53

Page 63: Introduction to WinBUGS for Ecologists: Bayesian approach ...bayanbox.ir/view/5958854163974907854/Introduction-to-WinBUGS-for... · Foreword The title of Marc Kéry’s book, Introduction

..- attr(*, "dimnames") List of 2

.. ..$ : NULL

.. ..$ : chr [1:4] "population.mean" "population.sd" "population.variance"

"deviance"

$ summary : num [1:4, 1:9] 599.66 32 1027.02 9765.13 1.04 ...

..- attr(*, "dimnames") List of 2

.. ..$ : chr [1:4] "population.mean" "population.sd" "population.variance"

"deviance"

.. ..$ : chr [1:9] "mean" "sd" "2.5%" "25%" ...

$ mean :List of 4

..$ population.mean : num 600

..$ population.sd : num 32

..$ population.variance: num 1027

..$ deviance : num 9765

$ sd :List of 4

..$ population.mean : num 1.04

..$ population.sd : num 1.78

..$ population.variance: num 174

..$ deviance : num 33.4

$ median :List of 4

..$ population.mean : num 600

..$ population.sd : num 31.9

..$ population.variance: num 1019

..$ deviance : num 9763

[ ... ]

Object out contains all the information contained in the coda files that WinBUGS produced plus some processed items such as summaries like mean values of all monitored parameters, the Brooks–Gelman–Rubin (BGR) convergence diagnostic called Rhat, and an effective sample size that corrects for the degree of autocorrelation within the chains (more autocorrelation means a smaller effective sample size). We can now apply standard R commands to get what we want from this raw output of our Bayesian analysis of the model of the mean.

For a quick check whether any of the parameters has a BGR diagnostic greater than 1.1 (i.e., has Markov chains that have not converged), you can type this:

hist(out$summary[,8]) # Rhat values in the eighth column of the summary

which(out$summary[,8] > 1.1) # None in this case

For trace plots for the entire chains, do

par(mfrow = c(3,1))
matplot(out$sims.array[1:999, 1:3, 1], type = "l")
matplot(out$sims.array[,,2], type = "l")
matplot(out$sims.array[,,3], type = "l")



… or just for the start of the chains to see how rapidly they converge …

par(mfrow = c(3,1))
matplot(out$sims.array[1:20, 1:3, 1], type = "l")
matplot(out$sims.array[1:20,,2], type = "l")
matplot(out$sims.array[1:20,,3], type = "l")

We can also produce graphical summaries, e.g., histograms of the posterior distributions for each parameter:

par(mfrow = c(3,1))
hist(out$sims.list$population.mean, col = "grey")
hist(out$sims.list$population.sd, col = "blue")
hist(out$sims.list$population.variance, col = "green")

… or plot the (lack of) correlation between two parameters:

par(mfrow = c(1,1))
plot(out$sims.list$population.mean, out$sims.list$population.sd)

or

pairs(cbind(out$sims.list$population.mean, out$sims.list$population.sd,

out$sims.list$population.variance))

Numerical summaries of the posterior distribution can also be obtained, with the standard deviation requested separately:

summary(out$sims.list$population.mean)
summary(out$sims.list$population.sd)
sd(out$sims.list$population.mean)
sd(out$sims.list$population.sd)

Now compare this again with the classical analysis using maximum likelihood and the R function lm(). The results are almost indistinguishable.

summary(lm(y1000 ~ 1))

To summarize, after using R2WinBUGS, the R workspace contains all the results from your Bayesian analysis (the coda files contain them as well but in a less accessible format). If you want to keep these results, you must save the R workspace or save them in an external file.
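Two simple ways of doing this (the file names are arbitrary examples, not from the book) are:

save(out, file = "out.RData")                 # save just the bugs() output object
write.csv(out$summary, "out_summary.csv")     # or export the summary table as text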

5.5 SUMMARY

We have repeated the analysis of the model of the mean from the previous chapter by calling WinBUGS from within program R using the R2WinBUGS interface package. This is virtually always the most efficient mode of running analyses in WinBUGS and is the way in which we will use WinBUGS in the remainder of this book. We have also seen that a



Bayesian analysis with vague priors yields almost identical estimates as a classical analysis, something that we will see many more times.

EXERCISES

1. Informative priors: The analysis just presented uses vague priors, i.e., the effect of the prior on the posterior distribution is minimal. As we have just seen, Bayesian analyses with such priors typically yield estimates very similar numerically to those obtained using maximum likelihood.

To see how the posterior distribution is influenced by both prior and likelihood, assume that we knew that the average body mass of male peregrines in populations like the one we sampled lay between 500 and 590 g. We can then formally incorporate this information into the analysis by means of the prior distribution. Hint: Change some code bits as follows, and let WinBUGS generate initial values automatically:

# Priors

population.mean ~ dunif(500, 590) # Mean mass must lie between 500 and 590

...

Rerun the analysis (again, with the prior for the population.sd slightly changed) and see how the parameter estimates and their uncertainty change under this prior. Is this a bad thing? You may want to experiment with other limits for the uniform prior on the sd, for instance, set it at (0,10). Run the model and inspect the posterior distributions.

2. Run the model for the small and large data set and compare posterior means and SDs. Explain the differences.

3. Run the analysis for the larger data set for a long time, and from time to time, draw a density plot (in WinBUGS) or inspect the stats, for instance, after 10, 100, 1000, and 10 000 iterations: see how increasing length of Markov chains leads to improved estimates. Look at the MC error (in the WinBUGS output, hence you have to set debug = TRUE) and at the values of the BGR statistic (= Rhat). Explain what you see.

4. Compare MC error and posterior SD and distinguish these things conceptually. See how they change in Markov chains of different lengths (as in Exercise 3).

5. Derived quantities: Estimate the coefficient of variation (CV = SD/mean) of body mass in that peregrine population. Report a point estimate (with SE) of that CV and also draw a picture of its posterior distribution.

6. Swiss hare data: Fit the model of the mean to mean.density and report the mean, the SE of the mean, and a 95% credible interval for mean hare density.



C H A P T E R

6

Key Components of (Generalized) Linear Models: Statistical Distributions and the Linear Predictor

O U T L I N E

6.1 Introduction
6.2 Stochastic Part of Linear Models: Statistical Distributions
    6.2.1 Normal Distribution
    6.2.2 Continuous Uniform Distribution
    6.2.3 Binomial Distribution: The "Coin-Flip Distribution"
    6.2.4 Poisson Distribution
6.3 Deterministic Part of Linear Models: Linear Predictor and Design Matrices
    6.3.1 The Model of the Mean
    6.3.2 t-Test
    6.3.3 Simple Linear Regression
    6.3.4 One-Way Analysis of Variance
    6.3.5 Two-Way Analysis of Variance
    6.3.6 Analysis of Covariance
6.4 Summary



6.1 INTRODUCTION

All models in this book explain the variation in an observed response as being composed of a deterministic and a stochastic part. Thus, the concept of these models is this:

response = deterministic part + stochastic part

The deterministic part is also called the systematic part and the stochastic the random part of a model. It is the presence of the stochastic part that makes a model a statistical, rather than simply a mathematical, model. For the description of the stochastic part of the model, we use a statistical distribution. In order to choose the "right" distribution, we need to know the typical sampling situation that leads to one rather than another distribution. Sampling situation means how the studied objects were chosen and how and what characteristic was measured. The quantity measured or otherwise assessed, the variation of which one wants to explain, is the response. In this chapter, we briefly review four of the most common statistical distributions that are used to capture the variability in a response: the normal, the uniform, the binomial, and the Poisson. These are virtually all the distributions that you need to know for this book. In addition, they are also those that underlie a vast range of statistical techniques that ecologists commonly employ.

Linear models are so called because they assume that the mean (i.e., the expected) response can be treated as the result of explanatory variables whose effects add together. Effects are additive regardless of whether the explanatory variables are continuous (covariates, regressors) or discrete (factors). To be able to specify how exactly we think the explanatory variables affect the response, we need to understand the so-called linear predictor of the model, the design matrix, and different parameterizations of a linear model. It is the design matrix along with the observed values of the covariates that makes up the deterministic part of a linear statistical model. They combine to form the linear predictor, i.e., the expected response, which is the value of the response that we would expect to observe in the absence of a stochastic component in the studied system.

In programs such as R or GenStat, the specification of the design matrix, and hence the linear predictor, just "happens" internally without us having to know how exactly this works. All we type is a formula to describe the linear model, as in response ~ effect1 + effect2. In contrast and unfortunately (or actually, fortunately!), using WinBUGS requires us to know exactly what kind of linear model we want to fit because we have to describe this model at a very elementary level. Therefore, in this chapter, we review some of the basics of linear models. I expect them to be useful also for your general understanding of linear statistical models,



whether analyzed in a Bayesian or frequentist mode of inference. We deal first with distributions (the stochastic part of a model) and then with the design matrix and the linear predictor (the deterministic part).

6.2 STOCHASTIC PART OF LINEAR MODELS: STATISTICAL DISTRIBUTIONS

Parametric statistical modeling means describing a caricature of the "machine" that plausibly could have produced the numbers we observe. The machine is nature, and nature is stochastic, i.e., nature is never fully predictable. An element of chance in the observed outcome of something, e.g., body mass, does not mean that the outcome has no reason for happening. There is always a reason; we simply don't know it. Chance just means that there are a few or many unrecognized factors that affect the outcome, but we either don't know or haven't measured them, or that we don't fully understand the relationship between these factors and an outcome. Things whose outcome is affected by chance and thus are only predictable up to a certain degree are called random variables. At some level, almost anything in nature is best thought of as a random variable.

However, random variables are seldom totally unpredictable; instead, the combined effect of all the unmeasured factors can often reasonably well be described by a mathematical abstraction: a probability distribution. A probability distribution assigns to each possible realization (value or event) of a random variable a probability of occurrence. To be a proper probability distribution, that description must be complete, i.e., all possible realizations must be described so that the sum of their probabilities of occurring is equal to 1.

The types of events vary greatly and may be binary like a coin flip (heads/tails), a survival event (dead/survived), or sex (male/female). They may be categorical like hair color (brown/blonde/red/white/gray), nationality (Swiss/French/Italian), or geographical location. (Note that a binary random variable is just a special case of a categorical one.) They may be counts, like the number of birds in a sampling quadrat, the number of people in a park or pub, or the number of boys in a school class (note the slight difference in this last kind of count?). Or they can be measurements, like body mass, wing length, or diameter of a tree.

Probability distributions are themselves governed (described) by one or a few parameters, and their actual form depends on the particular values of these parameters. A probability distribution can be described using a formula, a table, or a picture. Statisticians usually prefer formulae and ecologists pictures. The advantage of a formula is that it is completely general, while a picture can only ever show how a distribution looks for particular parameter values. Depending on these values, different



distributions can yield very similar pictures, or the same distribution can look very different. Nevertheless, there are some typical features of many distributions, and it may be worthwhile to describe these, so you have a better chance of recognizing a distribution when you meet it.

Of the four distributions we encounter in this book, two are discrete and have non-negative values, i.e., they can only take on integer values from 0, 1, and upwards. They are the binomial distribution and the Poisson distribution. Then, we have two continuous distributions, i.e., that can take on any value, within measuring accuracy. One is the famous normal distribution, which is defined on the entire real line (from −∞ to ∞), and the other is the uniform distribution, which is usually defined between its lower and upper limits.

Sometimes, people are astonished at how modelers know how to choose the right kind of distribution for describing their random variables. It is true that there is some arbitrary element involved and, often, there is more than a single distribution that could be used to describe the output of the machine that produced our observations. However, very frequently the circumstances of the data and how these were collected quite firmly point to one rather than another distribution as the most appropriate description of the number-generating machine and therefore also of the random variability in the output of that machine, the response.

Here, I describe circumstances in which each distribution would arise during sampling and then provide a pictorial example for each. I also give a few lines of R code that are required to create random samples from each distribution and then plot them in a histogram for checking how selected parameter values influence the shape of a distribution. You can play around with different parameter values and so get a feel for how the distributions change as parameters are varied.

Among the four distributions (normal, uniform, Poisson, and binomial), one big divide is whether your response (i.e., the data) is discrete or continuous. Counts are discrete and point to the two latter distributions, while measurements are continuous and point to the two former distributions. In practice, there is some overlap. Since measurement accuracy is finite, every continuous random variable is recorded in a discrete way. However, this is usually of no consequence. On the other hand, under certain circumstances (e.g., large sample sizes, many observed unique values), the two discrete distributions can often be well approximated by a normal distribution. For instance, large counts in practice are often modeled as coming from a normal distribution.
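As a quick illustration of that last point (a sketch, not from the book; the value of lambda is arbitrary), a Poisson with a large mean looks very much like a normal with the same mean and variance:

n <- 100000
lambda <- 30
par(mfrow = c(2,1))
hist(rpois(n, lambda = lambda), col = "grey", main = "Poisson(30)")
hist(rnorm(n, mean = lambda, sd = sqrt(lambda)), col = "grey", main = "Normal(30, sqrt(30))")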

What follows are brief vignettes of the four distributions we use throughout this book. Further features of these and many other distributions for selected parameter values can be studied in R (type ?rnorm, ?runif, ?rpois, or ?rbinom to find out more).



6.2.1 Normal Distribution

Sampling situation: Measurements are taken, which are affected by a large number of effects that act in an additive way.
Classical examples: They include body size and other linear measurements. In WinBUGS, they are also used to specify ignorance in priors when the precision is small (meaning the variance large).
Varieties: When effects are multiplicative instead of additive, we get the log-normal distribution. A log-normal random variable can be transformed into a normal one by log-transforming it.
Typical picture: It is the Gaussian bell curve, i.e., symmetrical, single hump, more or less long tails. In small samples, it can look remarkably irregular, e.g., skewed.
Mathematical description: It includes two parameters, the mean (location) and standard deviation (spread, average deviation from the mean) or, equivalently, the variance (squared standard deviation). In WinBUGS, spread is specified as precision = 1/variance.
Specification in WinBUGS:

x ~ dnorm(mean, tau) # note tau = 1/variance

R code to draw n random numbers with specified parameter(s) and plot a histogram (see Fig. 6.1):

n <- 100000               # Sample size
mu <- mean <- 600         # Body mass of male peregrines
sd <- st.dev <- 30        # SD of body mass of male peregrines


FIGURE 6.1 Sample histogram of normal distribution.



sample <- rnorm(n = n, mean = mu, sd = sd)
print(sample, dig = 4)
hist(sample, col = "grey")

Mean and standard deviation:

E(y) = μ         (mean)
sd(y) = σ        (standard deviation)

6.2.2 Continuous Uniform Distribution

Sampling situation: Measurements are taken, which are all equally likely to occur in a certain range of values.
Classical examples: In WinBUGS, this distribution is typically used to specify ignorance in a prior (as an alternative to a "flat" normal).
Varieties: The distribution can be discrete uniform, e.g., the result of rolling a die with the response being anything from 1 to 6.
Typical picture: It is a rectangle shape (with variation due to sampling variability).
Mathematical description: It includes two parameters, the lower (a) and upper (b) limits.
Specification in WinBUGS:

x ~ dunif(lower, upper)

R code to draw n random numbers with specified parameter(s) and plot a histogram (see Fig. 6.2):

n <- 100000                # Sample size
a <- lower.limit <- 0
b <- upper.limit <- 10

sample <- runif(n = n, min = a, max = b)
print(sample, dig = 3)
hist(sample, col = "grey")

Mean and standard deviation:

E(y) = (a + b) / 2               (mean)
sd(y) = sqrt((b - a)^2 / 12)     (standard deviation)
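A quick numerical check of these formulas, continuing the R snippet above (with a = 0 and b = 10):

mean(sample)    # should be close to (a + b)/2 = 5
sd(sample)      # should be close to sqrt((b - a)^2 / 12), about 2.89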

6.2.3 Binomial Distribution: The “Coin-Flip Distribution”

Sampling situation: When N available things all have the same probability p of being in a certain state (e.g., being counted, or having a certain attribute, like being male or dead), then the number x that is actually counted in that sample, or has that attribute, is binomially distributed.



Classical examples: Number of males in a clutch, school class, or herd of size N; number of times heads shows up among N flips of a coin; number of times you get a six among N = 10 rolls of a die; and the number of animals among those N present that you actually detect.
Varieties: The Bernoulli distribution corresponds to a single coin flip and has only a single parameter, p. Actually, a binomial is the sum of N Bernoullis (or coin flips).
Typical picture: It varies a lot but is, strictly speaking, always discrete. Normally, it is skewed, but skewness depends on the actual values of the parameters. The binomial distribution is symmetrical for p = 0.5.
Mathematical description: It includes one or two parameters, the probability of being chosen or having a certain trait (male, dead), often called success probability p, and the "binomial total" N, which is the sample or trial "size." N represents a ceiling to a binomial count; this is an important distinction to the similar Poisson distribution. Usually, N is observed and therefore not a parameter (but see Chapter 21). The aim of modeling is typically estimation and modeling of p (but sometimes also of N; see Chapter 21).
Important feature: The binomial comes with a "built-in" variance equal to N * p * (1 - p), i.e., the variance is a function of the mean, which is N * p. See also Fig. 6.3.
Specification in WinBUGS:

x ~ dbin(p, N) # Note order of parameters !


FIGURE 6.2 Sample histogram of uniform distribution.



Bernoulli:
x ~ dbern(p)

R code to draw n random numbers with specified parameter(s) and plot a histogram (Fig. 6.3):

n <- 100000   # Sample size
N <- 16       # Number of individuals that flip the coin
p <- 0.8      # Probability of being counted (seen), dead or a male

sample <- rbinom(n = n, size = N, prob = p)
print(sample, dig = 3)
hist(sample, col = "grey")

Extra comment: This picture can be thought to give the frequency with which we count 4, 5, 6, …, 16 greenfinches in a monitoring plot that has a population of N = 16 individuals and where each greenfinch has a probability of being seen (= counted) of exactly p = 0.8. It is important, though not sufficiently widely recognized among ornithologists, to note that there is variation in bird counts even under constant conditions, i.e., when N and p are constant (Kéry and Schmidt, 2008).

Mean and standard deviation:

E(y) = N * p                      (mean)
sd(y) = sqrt(N * p * (1 - p))     (standard deviation)

Mean and standard deviation for the Bernoulli are p and sqrt(p * (1 - p)), respectively.
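Again, a quick check against the sample drawn above (with N = 16 and p = 0.8):

mean(sample)    # should be close to N * p = 12.8
sd(sample)      # should be close to sqrt(N * p * (1 - p)), about 1.6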


FIGURE 6.3 Sample histogram of binomial distribution with N = 16 and p = 0.8.



6.2.4 Poisson Distribution

Sampling situation: When things (e.g., birds, cars, or erythrocytes) are randomly distributed in one or two (or more) dimensions and we randomly place a "counting window" along that dimension or in that space and record the number of things, then that number is Poisson distributed.
Classical examples: Number of passing cars during 10-min counts at a street corner; number of birds that fly by you at a migration site; annual number of Prussian soldiers kicked to death by a horse; number of car accidents per day, month, or year; and number of birds or hares per sample quadrat.
Varieties: None really. But the Poisson is an approximation to the binomial when N is large and p is small and can itself be approximated by the normal when the average count, λ, is large, e.g., greater than 10. The negative binomial distribution is an overdispersed version of the Poisson and can be derived by assuming that the Poisson mean, λ, is itself a random variable with another distribution, the gamma. Therefore, it is also referred to as a Poisson–gamma mixture.
Typical picture: It varies a lot but is, strictly speaking, always discrete. It is normally skewed, but skewness depends on the value of lambda.
Mathematical description: It includes a single parameter called λ, which is equal to the mean (= expectation, average count, intensity), as well as the variance (i.e., variance = mean). That is, as for the binomial distribution, the variance is not a free parameter but is a function of the mean.
Important feature: As for the binomial, values from a Poisson distribution are non-negative integers that come with a built-in variance.
Specification in WinBUGS:

x ~ dpois(lambda)

R code to draw n random numbers with specified parameter(s) and plot a histogram (see Fig. 6.4):

n <- 100000      # Sample size
lambda <- 5      # Average # individuals per sample, density

sample <- rpois(n = n, lambda = lambda)
print(sample, dig = 3)
par(mfrow = c(2,1))
hist(sample, col = "grey", main = "Default histogram")
plot(table(sample), main = "A better graph", lwd = 3, ylab = "Frequency")



Mean and standard deviation:

E(y) = λ              (mean)
sd(y) = sqrt(λ)       (standard deviation)

6.3 DETERMINISTIC PART OF LINEAR MODELS: LINEAR PREDICTOR AND DESIGN MATRICES

Linear models describe the expected response as a linear combination of the effects of discrete or continuous explanatory variables. That is, they directly specify the relationship between the response and one or more explanatory variables. I note, in passing, that it is this linear (additive) relationship that makes a statistical model linear, not that a picture of it is a straight line. Most statistically linear models describe relationships that are represented by curvilinear graphs of some kind. Thus, one must differentiate between linear statistical models and linear models that also imply straight-line relationships.

Most ecologists have become accustomed to using a simple formula language when specifying linear models in statistical software. For instance, to specify a two-way analysis of variance (ANOVA) with interaction in R, we just type y ~ A*B as an argument of the functions lm() or glm(). Many of us don't know exactly what typing y ~ A*B causes R to do internally, and this may not be fatal. However, to fit a linear model in


FIGURE 6.4 Sample histogram of Poisson distribution. The lower graph is better since it shows the discrete nature of a Poisson random variable.



WinBUGS, you must know exactly what this and other model descriptions mean and how they can be adapted to match your hypotheses! Therefore, I next present a short guide to the so-called design matrix of a linear or generalized linear model (GLM). We will see how the same linear model can be described in different ways; the different forms of the associated design matrices are called parameterizations of a model.

The material in the rest of this chapter may be seen by some as a nuisance. It may look difficult at first and indeed may not be totally necessary when fitting linear models using one of the widely known stats packages. For better or worse, in order to use WinBUGS, you must understand this material. However, understanding design matrices will greatly increase your grasp of statistical models in general. In particular, you will greatly benefit from this understanding when you use other software to conduct more specialist analyses that ecologists typically conduct. For instance, you need to understand the design matrix when you use program MARK (White and Burnham, 1999) to fit any of a very large range of capture–recapture types of models (for instance, most of those described by Williams et al., 2002). Thus, time invested to understand this material is time well spent for most ecologists!

For each element of the response vector, the design matrix indicates which effect is present for categorical (= discrete) explanatory variables and what "amount" of an effect is present in the case of continuous explanatory variables. The design matrix contains as many columns as the fitted model has parameters, and when matrix-multiplied with the parameter vector, it yields the linear predictor, another vector. The linear predictor contains the expected value of the response (on the link scale), given the values of all explanatory variables in the model. Expected value means the response that would be observed when all random variation is averaged out. For a fuller understanding of some of this GLM jargon, you may need to jump to Chapter 13 (and back again). However, this is not required for an understanding of this chapter.

In the remainder of this chapter, we look at a progression of typical linear models (e.g., t-test, simple linear regression, one-way ANOVA, and analysis of covariance (ANCOVA)). We see how to specify them in R and then find out what R does internally when we do this. We look at how to write these models algebraically and what system of equations they imply. We see different ways of writing what is essentially the same model (i.e., different parameterizations of a model) and how these affect the interpretation of the model parameters. Only understanding all the above allows us to specify these linear models in WinBUGS (though, notably, none of this has anything to do with Bayesian statistics).

For most nonstatisticians, it is usually much easier to understand something with the aid of a numerical example. So to study the design matrices for these linear models, we introduce a toy data set consisting of just six



data points (Table 6.1). Let's imagine that we measured body mass (mass, in units of 10 g; a continuous response variable) of six snakes in three populations (pop), two regions (region), and three habitat types (hab). These three discrete explanatory variables are factors with 3, 2, and 3 levels, respectively. We also measured a continuous explanatory variable for each snake, snout–vent length (svl).

Here is the R code for setting up these variables:

mass <- c(6, 8, 5, 7, 9, 11)
pop <- factor(c(1,1,2,2,3,3))
region <- factor(c(1,1,1,1,2,2))
hab <- factor(c(1,2,3,1,2,3))
svl <- c(40, 45, 39, 50, 52, 57)

We use factor() to tell R that the numbers in this variable are just names that lack any quantitative meaning. Next, we use the R functions lm() or glm() to specify different linear models for these data and inspect what this exactly means in terms of the design matrix.

6.3.1 The Model of the Mean

What if we just wanted to fit a common mean to the mass of all six snakes? That is, we want to fit a linear model with an intercept only (see Chapters 4 and 5). In R, this is done simply by issuing the command:

lm(mass ~ 1)

The 1 implies a covariate with a single value of one for every snake. One way to write this model algebraically is this:

mass_i = μ + ε_i

That is, we imagine that the mass measured for snake i is composed of an overall mean, μ, plus an individual deviation from that mean

TABLE 6.1 Our Toy Data Set for Six Snakes

mass pop region hab svl

6 1 1 1 40

8 1 1 2 45

5 2 1 3 39

7 2 1 1 50

9 3 2 2 52

11 3 2 3 57



called ε_i. Of course, the latter is the residual for snake i. If we want to estimate the mean within a linear statistical model as above, then we also need an assumption about these residuals, for instance ε_i ~ Normal(0, σ²). That is, we assume that the residuals are normally distributed around μ with a variance of σ².

Let's look at the design matrix of that model using the function model.matrix(). We see that it consists just of a vector of ones and that this vector is termed the intercept by R.

model.matrix(mass ~ 1)
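For our six snakes, the output should consist of a single column of ones, roughly like this (a sketch of R's output; R also prints a couple of attribute lines below the matrix):

  (Intercept)
1           1
2           1
3           1
4           1
5           1
6           1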

In the rest of this book, we make extensive use of this very useful R function, model.matrix(). Next, we consider in more detail some less trivial linear statistical models.

6.3.2 t-Test

When we are interested in the effect of a single, binary explanatory variable such as region on a continuous response such as mass, we can conduct a t-test (see Chapter 7). Here is the specification of a t-test as a linear model in R:

lm(mass ~ region)

Some might wonder now: "But where is the p-value?" However, the most important thing in the linear model underlying a t-test is not the p-value but the vector of parameter estimates. R is a respectable statistics program and thus presents the most important things first. But of course, there are several ways the p-value might be obtained, for instance, by summary(lm(mass ~ region)).

In the methods section of a research paper, you would not describe your model with the R description above; rather, you could describe it algebraically. Of course, for such a simple model as that underlying the t-test, you would simply mention that you used a t-test, but for the sake of getting some mental exercise, let's do it now in a more general way. One way of describing the linear model that relates snake mass to region is with the following statement:

mass_i = α + β * region_i + ε_i

This means that the mass of snake i is made up of the sum of three components: a constant α, the product of another constant (β) with the value of the indicator for the region in which snake i was caught, and a third term ε_i that is specific to each snake i. The last term, ε_i, is the residual, and to make this mathematical model a statistical model, we need some assumption about how these residuals vary among individuals. One of the



simplest assumptions about these unexplained, snake-specific contributions to mass is that they follow a normal distribution centered on zero and with a spread (standard deviation or variance) that we are going to estimate. We can write this as follows:

ε_i ~ Normal(0, σ²)

This way of writing the linear model underlying a t-test immediately clarifies a widespread confusion about what exactly needs to be normally distributed in normal linear models: it is the residuals and not the raw response. Thus, if you are concerned about the normality or not of a response variable, you must first fit a linear model and only then inspect the residual distribution.

A second and equivalent way to write the linear model for the t-test algebraically is this:

mass_i ~ Normal(α + β * region_i, σ²)

This way to write down the model resembles more the way in which wespecify it in WinBUGS, as we will see. It clarifies that the mean response isnot the same for each snake. Moreover, in this way to write the model,α + β � regioni represents the deterministic part of the model andNormalð:::, σ2Þ its stochastic part.

Finally, we might also write this model like this: massi = μi + εi, whichagain says that the mass of snake i is the sum of a systematic effect, embo-died by the mean or expected mass μi, and a random effect εi. The expectedmass of snake i, μi, consists of α + β � regioni. This is called the linear pre-dictor of the model, and the residuals are assumed to follow a normal dis-tribution centered on the value of the linear predictor, as before.

Regardless of how we describe the model, we must know what thetwo-level variable named region actually looks like in the analysis ofthe linear model underlying the t-test. When we define region to be afactor and then fit its effect, what R does internally is to convert it intoa design matrix with two columns, one containing only ones (correspond-ing to the intercept) and the other being a single indicator or dummy vari-able. That is a variable containing just ones and zeroes that indicateswhich snake was caught in the region indicated in R’s name for that col-umn in the design matrix.

We can look at this using the function model.matrix():

model.matrix(~region)

> model.matrix(~region)
  (Intercept) region2
1           1       0
2           1       0
3           1       0
4           1       0
5           1       1
6           1       1
[ ... ]

(This is what the R default treatment contrasts yield; type ?model.matrix to find out what other contrasts are possible in R.) Therefore, the indicator variable named region2 contains a one for those snakes in region 2.

Fitting the model underlying the t-test to our six data points with region defined in this way implies a system of equations, and it is very instructive to write it down:

6 = α · 1 + β · 0 + ε_1
8 = α · 1 + β · 0 + ε_2
5 = α · 1 + β · 0 + ε_3
7 = α · 1 + β · 0 + ε_4
9 = α · 1 + β · 1 + ε_5
11 = α · 1 + β · 1 + ε_6

Here, one sees why the intercept is often represented by a 1: it is always present and identical for all the values of the response variable. To get a solution for this system of equations, i.e., to obtain values for the unknowns α and β that are "good" in some way, we need to define some criterion for dealing with the residuals ε_i. Usually, in this system of equations, the unknowns α and β are chosen such that the sum of the squared residuals is minimal. This is called the least-squares method, and for normal GLMs (again, see Chapter 13), the resulting parameter estimates for α and β are equivalent to those obtained using the more general maximum likelihood method.
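To see the least-squares machinery at work, the normal equations can be solved directly in R; this is only a sketch (assuming the toy vectors mass and region of the snake example are in the workspace), and it reproduces what lm() does internally for this model:

X <- model.matrix(~ region)                        # Design matrix of the t-test model
solve(t(X) %*% X) %*% t(X) %*% mass                # Least-squares estimates of alpha and beta
coef(lm(mass ~ region))                            # Same values from lm()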

A more concise way of writing the same system of equations is using vectors and matrices:

$$
\begin{pmatrix} 6 \\ 8 \\ 5 \\ 7 \\ 9 \\ 11 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 1 \\ 1 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} \alpha \\ \beta \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
$$

We see again that the response is made up of the value of the linear predictor plus the vector of residuals. The linear predictor consists of the product of the design matrix (also called model matrix or X matrix) and the parameter vector. In a sense, the design matrix contains the "weights" with which α and β enter the linear predictor for each snake i. The linear predictor, i.e., the result of that matrix multiplication, is another vector containing the expected value of the response, given the covariates, for each snake i.

What is the interpretation of these parameters? This is a crucial topic and one that depends entirely on the way in which the design matrix is specified for a model. It is clear from looking at the extended version of the system of equations above (or the matrix version as well) that α must represent the expected mass of a snake in region 1 and β is the difference in the expected mass of a snake in region 2 compared with that of a snake in region 1. Thus, region 1 serves as a baseline or reference level and in the model becomes represented by the intercept parameter, and the effect of region 2 is parameterized as a difference from that baseline. Let's look again at an analysis of this model parameterization in R:

lm(mass ~ region)
> lm(mass ~ region)

Call:
lm(formula = mass ~ region)

Coefficients:
(Intercept)     region2
        6.5         3.5

We recognize the value of the intercept as the mean mass of snakes in region 1 and the parameter called region2 as the difference between the mass in region 2 and that in region 1, i.e., the effect of region 2, or here the effect of region. Hence, we may call this an effects parameterization of the t-test model. Actually, in the output, R labels the parameter for region 2 by the name of its column in the design matrix, and that may be slightly misleading.

Importantly, this is not the only way in which a linear model representing a t-test for a difference between the two regions might be specified. Another way to set up the equations, which is equivalent to the one above and is called a reparameterization of that model, is this:

$$
\begin{pmatrix} 6 \\ 8 \\ 5 \\ 7 \\ 9 \\ 11 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} \alpha \\ \beta \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
$$

What do the parameters α and β mean now? The interpretation of α has not changed; it is the expected mass of a snake in region 1. However, the interpretation of parameter β is different, since now it is the expected body mass of a snake in region 2. Thus, in this parameterization of the linear model underlying the t-test, the parameters directly represent the group means, and we may call this a means parameterization of the model. This model can be fitted in R by removing the intercept:

model.matrix(~region - 1)
> model.matrix(~region - 1)
  region1 region2
1       1       0
2       1       0
3       1       0
4       1       0
5       0       1
6       0       1
[ ... ]

lm(mass ~ region - 1)
> lm(mass ~ region - 1)

Call:
lm(formula = mass ~ region - 1)

Coefficients:
region1  region2
    6.5     10.0

Why would we want to use these different parameterizations of the same model? The answer is that they serve different aims: the effects parameterization is more useful for testing for a difference between the means in the two regions; this is equivalent to testing whether the effect of region 2, i.e., parameter β, is equal to zero. In contrast, for a summary of the analysis, we might prefer the means parameterization and directly report the estimated expected mass of snakes for each region. However, the two models are equivalent; for instance, the sum of the two effects in the former parameterization is equal to the value of the second parameter in the latter parameterization, i.e., 6.5 + 3.5 = 10.

6.3.3 Simple Linear Regression

To examine the relationship between a continuous response (mass) and a continuous explanatory variable such as svl, we would specify a simple linear regression in R (see also Chapter 8):

lm(mass ~ svl)

This model is written algebraically in the same way as that underlying the t-test:

mass_i = α + β · svl_i + ε_i


and

ε_i ∼ Normal(0, σ²)

Or like this:

mass_i ∼ Normal(α + β · svl_i, σ²)

The only difference to the t-test lies in the contents of the explanatory variable, svl, which may contain any real number rather than just the two values of 0 and 1. Here, svl contains the lengths for each of the six snakes. We inspect the design matrix that R builds on issuing the above call to lm():

model.matrix(~svl)
> model.matrix(~svl)
  (Intercept) svl
1           1  40
2           1  45
3           1  39
4           1  50
5           1  52
6           1  57
[ ... ]

Thus, fitting this simple linear regression model to our six data points implies solving the following system of equations:

6 = α + β · 40 + ε_1
8 = α + β · 45 + ε_2
5 = α + β · 39 + ε_3
7 = α + β · 50 + ε_4
9 = α + β · 52 + ε_5
11 = α + β · 57 + ε_6

Again, a more concise way of writing the same equations is by using vectors and a matrix:

$$
\begin{pmatrix} 6 \\ 8 \\ 5 \\ 7 \\ 9 \\ 11 \end{pmatrix}
=
\begin{pmatrix} 1 & 40 \\ 1 & 45 \\ 1 & 39 \\ 1 & 50 \\ 1 & 52 \\ 1 & 57 \end{pmatrix}
\cdot
\begin{pmatrix} \alpha \\ \beta \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
$$

The design matrix contains an intercept and another column with the values of the covariate svl. The interpretation of the parameters α and β is thus that of a baseline, representing the expected value of the response (mass) at a covariate value of svl = 0, and a difference or an effect, representing the change in mass for each unit change in svl. Equivalently, this effect is the slope of the regression of mass on svl or the effect of svl.

Let’s look at an analysis of this model in R:

lm(mass ~ svl)
> lm(mass ~ svl)

Call:
lm(formula = mass ~ svl)

Coefficients:
(Intercept)          svl
    -5.5588       0.2804

The intercept is biological nonsense; it says that a snake of zero length weighs −5.6 mass units! This illustrates that a linear model can only ever be a useful characterization of a biological relationship over a restricted range of the explanatory variables. To give the intercept a more sensible interpretation, the model could be reparameterized by transforming svl to svl − mean(svl). Fitting this centered version of svl will cause the intercept to become the expected mass of a snake at the average of the observed size distribution.
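As a minimal sketch of this centering (assuming the toy vectors mass and svl), the wrapper I() lets us do the transformation directly inside the model formula:

lm(mass ~ I(svl - mean(svl)))   # Intercept is now the expected mass at the average svl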

Again, this is not the only way to specify a linear regression of mass on svl, and we could again remove the intercept as in the t-test. However, removing the intercept is not just a reparameterization of the same model; instead, it changes the model and forces the regression line to go through the origin. The design matrix then contains just the values of the covariate svl:

model.matrix(~svl - 1)
> model.matrix(~svl - 1)
  svl
1  40
2  45
3  39
4  50
5  52
6  57
[ ... ]

lm(mass ~ svl - 1)
> lm(mass ~ svl - 1)

Call:
lm(formula = mass ~ svl - 1)

Coefficients:
   svl
0.1647


We see that the estimated slope is less than before, which makes sense, since previously the estimate of the intercept was negative and now it is forced to be zero. In most instances, the no-intercept model is not very sensible, and we should have strong reasons for forcing the intercept of a linear regression to be zero. Also, the lower limit of the observed range of the covariate should be close to, or include, zero.

6.3.4 One-Way Analysis of Variance

To examine the relationship between mass and pop, a factor with three levels, we would specify a one-way ANOVA in R:

lm(mass ~ pop)

There are different equivalent ways in which to write this model algebraically; see also Chapter 9. Here, we focus on the effects and the means parameterizations and show how their design matrices look. For the mass of individual i in population j, one way to write the model is like this:

mass_i = α + β_{j(i)} · pop_i + ε_i

and

ε_i ∼ Normal(0, σ²)

Another way to specify the same model is this:

mass_i ∼ Normal(α + β_{j(i)} · pop_i, σ²)

This parameterization means to set up the design matrix in an effects format, i.e., with an intercept α, representing the mean for population 1, plus indicators for every population except the first one, representing the differences between the means in these populations and the mean in the reference population. (The choice of reference is arbitrary and has no influence on inference.) For n populations, there are n − 1 β_j parameters, which are indexed by factor level j.

An algebraic description of the means parameterization of the one-way ANOVA would be this:

y_i = α_{j(i)} · pop_i + ε_i

and

ε_i ∼ Normal(0, σ²)

Here, the design matrix is set up to indicate each individual population, and the parameters α_j represent the mean body mass of snakes in each population j.


We look at the two design matrices associated with the two parameterizations.

First, the effects parameterization, which is the default in R:

model.matrix(~pop)

> model.matrix(~pop)
  (Intercept) pop2 pop3
1           1    0    0
2           1    0    0
3           1    1    0
4           1    1    0
5           1    0    1
6           1    0    1
[ ... ]

Second, the means parameterization, which is specified in R by removing the intercept:

model.matrix(~pop - 1)
> model.matrix(~pop - 1)
  pop1 pop2 pop3
1    1    0    0
2    1    0    0
3    0    1    0
4    0    1    0
5    0    0    1
6    0    0    1
[ ... ]

Again, the interpretation of the first parameter of the model is the same for both parameterizations: it is the mean body mass in population 1. However, while in the effects parameterization parameters 2 and 3 correspond to differences in the means (above), in the means parameterization they represent the actual expected body mass for snakes in populations 2 and 3 (below). Also note how the first parameter changes names when going from the former to the latter parameterization of the one-way ANOVA.

Here is the matrix–vector description of the effects ANOVA model applied to our toy snake data set:

$$
\begin{pmatrix} 6 \\ 8 \\ 5 \\ 7 \\ 9 \\ 11 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} \alpha \\ \beta_2 \\ \beta_3 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
$$


And here is the means parameterization:

$$
\begin{pmatrix} 6 \\ 8 \\ 5 \\ 7 \\ 9 \\ 11 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
$$

Again, we see clearly that the interpretation of the first parameter in the model is not affected by the parameterization chosen, but those for the second and third are.

Let's fit the two versions of the ANOVA in R, first the effects and then the means model:

lm(mass ~ pop)       # Effects parameterization (R default)
> lm(mass ~ pop)

Call:
lm(formula = mass ~ pop)

Coefficients:
(Intercept)        pop2        pop3
          7          -1           3

lm(mass ~ pop - 1)   # Means parameterization
> lm(mass ~ pop - 1)

Call:
lm(formula = mass ~ pop - 1)

Coefficients:
pop1  pop2  pop3
   7     6    10

Each parameterization is better suited to a different aim: the effects model is better for testing for differences and the means model is better for presentation.

6.3.5 Two-Way Analysis of Variance

A two-way or two-factor ANOVA serves to examine the relationship between a continuous response, such as mass, and two discrete explanatory variables, such as region and habitat (hab) in our example (see also Chapter 10). Importantly, there are two different ways in which to combine the effects of two explanatory variables: additive (also called main effects) and multiplicative (also called interaction effects). In addition, there is the possibility to specify these models using an effects parameterization or a means parameterization. We consider each one in turn.


To specify a main-effects ANOVA with region and hab in R, we would write

lm(mass ~ region + hab)
> lm(mass ~ region + hab)

Call:
lm(formula = mass ~ region + hab)

Coefficients:
(Intercept)     region2        hab2        hab3
       6.50        3.50        0.25       -0.25

To avoid overparameterization (i.e., trying to estimate more parameters than the available data allow us to do), we need to arbitrarily set to zero the effects of one level for each factor. The effects of the remaining levels then get the interpretation of differences relative to the base level. It does not matter which level is used as a baseline or reference, but often stats programs use the first or the last level of each factor. R sets the effects of the first level to zero.
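If a different baseline is preferred, the reference level of a factor can be changed before fitting; this is just a sketch (assuming region is a factor with levels "1" and "2" as in the toy data set):

region.b <- relevel(region, ref = "2")   # Make region 2 the reference level
lm(mass ~ region.b + hab)                # The intercept now refers to region 2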

In our snake toy data set, for the mass of individual i in region j and habitat k, we can write this model as follows:

mass_i = α + β_{j(i)} · region_i + δ_{k(i)} · hab_i + ε_i

and

ε_i ∼ Normal(0, σ²)

Here, α is the expected mass of a snake in habitat 1 and region 1. There is only one parameter β_j, so the subscript could as well be dropped. It specifies the difference in the expected mass between snakes in region 2 and snakes in region 1. We need two parameters δ_k to specify the differences in the expected mass for snakes in habitats 2 and 3, respectively, relative to those in habitat 1.

We look at the design matrix

model.matrix(~region + hab)

> model.matrix(~region + hab)
  (Intercept) region2 hab2 hab3
1           1       0    0    0
2           1       0    1    0
3           1       0    0    1
4           1       0    0    0
5           1       1    1    0
6           1       1    0    1
[ ... ]


Hence, the implied system of equations that the software solves for us is this:

$$
\begin{pmatrix} 6 \\ 8 \\ 5 \\ 7 \\ 9 \\ 11 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} \alpha \\ \beta_2 \\ \delta_2 \\ \delta_3 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
$$

Interestingly, there is no way to specify the main-effects model in a means parameterization.

Next, consider the model with interactive effects, which lets the effect of one factor level depend on the level of the other factor. The default effects parameterization is written like this in R:

lm(mass ~ region * hab)
> lm(mass ~ region * hab)

Call:
lm(formula = mass ~ region * hab)

Coefficients:
(Intercept)       region2          hab2          hab3  region2:hab2  region2:hab3
        6.5           6.0           1.5          -1.5          -5.0            NA

We see that one parameter is not estimable. The reason for this is that in the 2-by-3 table of effects of region crossed with habitat, we lack snake observations for habitat 1 in region 2.
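The empty cell is easy to see from a cross-tabulation of the two factors; a minimal sketch (assuming the factors region and hab from the toy data set), along with the counts the six snakes imply:

table(region, hab)
#       hab
# region 1 2 3
#      1 2 1 1
#      2 0 1 1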

Algebraically, we assume that the mass of individual i in region j and habitat k can be broken down as follows:

mass_i = α + β_{j(i)} · region_i + δ_{k(i)} · hab_i + γ_{jk(i)} · region_i · hab_i + ε_i

with

ε_i ∼ Normal(0, σ²)

In this equation, the meanings of parameters α, β_j, and δ_k remain as before, i.e., they specify the main effects of the levels for the habitat and region factors. The new coefficients, γ_jk, of which there are two, specify the interaction effects between these two factors.

Here is the design matrix for this model:

model.matrix(~region * hab)

> model.matrix(~region * hab)
  (Intercept) region2 hab2 hab3 region2:hab2 region2:hab3
1           1       0    0    0            0            0
2           1       0    1    0            0            0
3           1       0    0    1            0            0
4           1       0    0    0            0            0
5           1       1    1    0            1            0
6           1       1    0    1            0            1
[ ... ]

And here is, therefore, the system of equations that needs to be solved to get the parameter estimates for this model:

$$
\begin{pmatrix} 6 \\ 8 \\ 5 \\ 7 \\ 9 \\ 11 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} \alpha \\ \beta_2 \\ \delta_2 \\ \delta_3 \\ \gamma_{22} \\ \gamma_{23} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
$$

Finally, the means parameterization of the interaction model:

lm(mass ~ region * hab - 1 - region - hab)
> lm(mass ~ region * hab - 1 - region - hab)

Call:
lm(formula = mass ~ region * hab - 1 - region - hab)

Coefficients:
reg1:hab1  reg2:hab1  reg1:hab2  reg2:hab2  reg1:hab3  reg2:hab3
      6.5         NA        8.0        9.0        5.0       11.0

(I slightly edited the output so it fits a single line.)

We see again that in the interactive model, we have no information to estimate the expected body mass of snakes in habitat of type 1 in region 2, since no snakes were examined for that combination of factor levels (called a cell in the cross-classification of the two factors). In the additive model, the information for that cell in the table comes from the other cells in the table, but in the interactive model, each cell mean is estimated independently.

This model can be written algebraically like this:

mass_i = α_{jk(i)} · region_i · hab_i + ε_i

and

ε_i ∼ Normal(0, σ²)

There are six elements in the vector α_jk, corresponding to the six ways in which the levels of the two factors region and habitat can be combined. Also note that region_i · hab_i implies six columns in the design matrix. Now look at the design matrix (again, slightly edited for an improved presentation):

model.matrix(~ region * hab - 1 - region - hab)
> model.matrix(~ region * hab - 1 - region - hab)
  reg1:hab1 reg2:hab1 reg1:hab2 reg2:hab2 reg1:hab3 reg2:hab3
1         1         0         0         0         0         0
2         0         0         1         0         0         0
3         0         0         0         0         1         0
4         1         0         0         0         0         0
5         0         0         0         1         0         0
6         0         0         0         0         0         1
[ ... ]

And the system of equations:

$$
\begin{pmatrix} 6 \\ 8 \\ 5 \\ 7 \\ 9 \\ 11 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} \alpha_{11} \\ \alpha_{21} \\ \alpha_{12} \\ \alpha_{22} \\ \alpha_{13} \\ \alpha_{23} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
$$

We see clearly the lack of information about effect α_21, which is the expected mass of a snake in region 2 and habitat 1, represented by the second column in the design matrix containing all zeroes.

6.3.6 Analysis of Covariance

When we are interested in the effects on mass of both a factor (= discrete explanatory variable, e.g., pop) and a continuous covariate like svl, we could specify an ANCOVA model (see also Chapter 11). There are two ways in which we might like to specify that model. First, we may think that the relationship between mass and svl is the same in all populations, or worded in another way, that the mass differences among populations do not depend on the length of the snake. In statistical terms, this would be represented by a main-effects model. Second, if we admitted that the mass–length relationship might differ among populations or that the differences in mass among populations might depend on the length of a snake, we would fit an interaction-effects model. (Note that here "effects" has a slightly different meaning from that when used as effects parameterization vs. means parameterization.) To specify these two versions of an ANCOVA in R, we write this:

lm(mass ~ pop + svl) # Additive model

lm(mass ~ pop * svl) # Interactive model

lm(mass ~ pop + svl + pop:svl) # Same, R's way of specifying the interaction term

Here’s the additive model algebraically in the effects parameterization:

mass_i = α + β_{j(i)} · pop_i + δ · svl_i + ε_i,

with

ε_i ∼ Normal(0, σ²)

This model says that the mass of snake i in population j is made up of a constant α plus the effects β_j when in populations 2 and 3 plus a constant δ times svl plus the residual. In the effects parameterization, α is the intercept for population 1 and vector β_j has two elements, one for population 2 and another one for population 3, being the differences of the intercepts in these populations and the intercept in population 1. Finally, δ is the common slope of the mass–length relationship in all three populations.

The means parameterization of the same model is this:

mass_i = α_{j(i)} · pop_i + δ · svl_i + ε_i,

with

ε_i ∼ Normal(0, σ²)

All that changes is that now the vector α_j has three elements representing the intercepts in each population.

The effects parameterization of the interaction-effects model is written like this:

mass_i = α + β_{j(i)} · pop_i + δ · svl_i + γ_{j(i)} · svl_i · pop_i + ε_i

and

ε_i ∼ Normal(0, σ²)

In this model, α is the intercept for population 1 and vector β_j has two elements, representing the difference in the intercept between population 1 and populations 2 and 3, respectively. Parameter δ is the slope of the mass–length relationship in the first population. Vector γ_j has two elements corresponding to the difference in the slope between population 1 and populations 2 and 3, respectively.

The means parameterization of the same model is probably easier to understand and is written like this:

mass_i = α_{j(i)} · pop_i + δ_{j(i)} · svl_i + ε_i

and

ε_i ∼ Normal(0, σ²)

The only change relative to the main-effects model is that we added a subscript j to the effect δ of svl, meaning that we now estimate three slopes δ, one for each population, instead of a single one that is common to snakes from all three populations.

Let's now look at the design matrices of both the effects and the means parameterizations of both the main- and interaction-effects models. Here is the design matrix of the main-effects ANCOVA model using the R default, the effects parameterization (I do hope that this terminology is not too confusing here …):

model.matrix(lm(mass ~ pop + svl)) # Additive model

> model.matrix(lm(mass ~ pop + svl))
  (Intercept) pop2 pop3 svl
1           1    0    0  40
2           1    0    0  45
3           1    1    0  39
4           1    1    0  50
5           1    0    1  52
6           1    0    1  57
[ ... ]

The first parameter, the intercept, signifies the expected mass in population 1 at the point where the value of the covariate is equal to zero, i.e., for a snake of length zero. That is, therefore, the intercept of the regression model. The parameters associated with the design matrix columns named pop2 and pop3 quantify the difference in the intercept between these populations and population 1. The parameter associated with the last column in the design matrix measures the common slope of mass on svl for snakes in all three populations.

And here is the design matrix of the interaction-effects ANCOVA model using the R default, the effects parameterization:

model.matrix(lm(mass ~ pop * svl)) # Interactive model

> model.matrix(lm(mass ~ pop * svl))
  (Intercept) pop2 pop3 svl pop2:svl pop3:svl
1           1    0    0  40        0        0
2           1    0    0  45        0        0
3           1    1    0  39       39        0
4           1    1    0  50       50        0
5           1    0    1  52        0       52
6           1    0    1  57        0       57
[ ... ]

The parameters associated with the first three columns in this design matrix signify the population 1 intercept and the effects of populations 2 and 3, respectively, i.e., the difference in intercepts. The parameter associated with the fourth column, svl, is the slope of the mass–length regression in the first population, and the parameters associated with the last two columns are the differences between the slopes in populations 2 and 3 relative to the slope in population 1.

Now let’s see the main-effects model using the means parameterization:

model.matrix(lm(mass ~ pop + svl - 1))   # Additive model

> model.matrix(lm(mass ~ pop + svl - 1))
  pop1 pop2 pop3 svl
1    1    0    0  40
2    1    0    0  45
3    0    1    0  39
4    0    1    0  50
5    0    0    1  52
6    0    0    1  57
[ ... ]

The parameters associated with the first three columns in the design matrix now directly represent the intercepts for each population, while that associated with the fourth column denotes the common slope of the mass–svl relationship.

And finally, let's formulate the interaction-effects model using the means parameterization.

model.matrix(lm(mass ~ pop * svl - 1 - svl))   # Interactive model

> model.matrix(lm(mass ~ pop * svl - 1 - svl))
  pop1 pop2 pop3 pop1:svl pop2:svl pop3:svl
1    1    0    0       40        0        0
2    1    0    0       45        0        0
3    0    1    0        0       39        0
4    0    1    0        0       50        0
5    0    0    1        0        0       52
6    0    0    1        0        0       57
[ ... ]

This parameterization of the ANCOVA model with interaction between population and svl contains parameters that have the direct interpretation as intercepts and slopes of the three mass–svl relationships (one in each population). We will see this below, where we will also see that in R we can directly fit in an lm() function an object that is a design matrix.
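As a small sketch of that last point (assuming the toy vectors mass, pop, and svl), a design matrix built with model.matrix() can be handed directly to lm():

X <- model.matrix(~ pop * svl - 1 - svl)   # Means parameterization of the interactive ANCOVA
lm(mass ~ X - 1)                           # Drop the intercept, since X already contains all columns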

Here is the matrix–vector description of the main-effects ANCOVA model with effects parameterization applied to our snake data set:

$$
\begin{pmatrix} 6 \\ 8 \\ 5 \\ 7 \\ 9 \\ 11 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 40 \\ 1 & 0 & 0 & 45 \\ 1 & 1 & 0 & 39 \\ 1 & 1 & 0 & 50 \\ 1 & 0 & 1 & 52 \\ 1 & 0 & 1 & 57 \end{pmatrix}
\cdot
\begin{pmatrix} \alpha \\ \beta_1 \\ \beta_2 \\ \delta \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
$$

And here is the means parameterization of the main-effects ANCOVA:

$$
\begin{pmatrix} 6 \\ 8 \\ 5 \\ 7 \\ 9 \\ 11 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 40 \\ 1 & 0 & 0 & 45 \\ 0 & 1 & 0 & 39 \\ 0 & 1 & 0 & 50 \\ 0 & 0 & 1 & 52 \\ 0 & 0 & 1 & 57 \end{pmatrix}
\cdot
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \delta \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
$$


Here is the effects parameterization of the interactive model:

$$
\begin{pmatrix} 6 \\ 8 \\ 5 \\ 7 \\ 9 \\ 11 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 40 & 0 & 0 \\ 1 & 0 & 0 & 45 & 0 & 0 \\ 1 & 1 & 0 & 39 & 39 & 0 \\ 1 & 1 & 0 & 50 & 50 & 0 \\ 1 & 0 & 1 & 52 & 0 & 52 \\ 1 & 0 & 1 & 57 & 0 & 57 \end{pmatrix}
\cdot
\begin{pmatrix} \alpha \\ \beta_1 \\ \beta_2 \\ \delta \\ \gamma_1 \\ \gamma_2 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
$$

And finally the means parameterization of the same model:

$$
\begin{pmatrix} 6 \\ 8 \\ 5 \\ 7 \\ 9 \\ 11 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 40 & 0 & 0 \\ 1 & 0 & 0 & 45 & 0 & 0 \\ 0 & 1 & 0 & 0 & 39 & 0 \\ 0 & 1 & 0 & 0 & 50 & 0 \\ 0 & 0 & 1 & 0 & 0 & 52 \\ 0 & 0 & 1 & 0 & 0 & 57 \end{pmatrix}
\cdot
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \delta_1 \\ \delta_2 \\ \delta_3 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
$$

Now let's use R to fit the main-effects and the interaction-effects models using both the effects parameterization and the means parameterization. First, R's default effects parameterization of the main-effects model:

lm(mass ~ pop + svl)
> lm(mass ~ pop + svl)

Call:
lm(formula = mass ~ pop + svl)

Coefficients:
(Intercept)         pop2         pop3          svl
   -3.43860     -1.49123      0.05263      0.24561

So, the intercept is the expected mass of a snake in population 1 that has svl equal to zero. Pop2 and pop3 are the differences in the intercept between populations 2 and 3 compared with that in population 1, and the parameter named svl measures the slope of the mass–svl relationship common to snakes in all populations.

It is instructive to plot the estimates of the relationships for each population under this model (Fig. 6.5).

fm <- lm(mass ~ pop + svl)   # Refit model
plot(svl, mass, col = c(rep("red", 2), rep("blue", 2), rep("green", 2)))
abline(fm$coef[1], fm$coef[4], col = "red")
abline(fm$coef[1] + fm$coef[2], fm$coef[4], col = "blue")
abline(fm$coef[1] + fm$coef[3], fm$coef[4], col = "green")

This model assumes that the mass–svl relationship differs among populations only in the average level. What we see then is that population 1 (red) hardly differs from population 3 (green), but that snakes in population 2 (blue) weigh less at a given length than do snakes in population 1 or 3.

Next, the interaction-effects model using the R default effects parameterization:

lm(mass ~ pop * svl)
> lm(mass ~ pop * svl)

Call:
lm(formula = mass ~ pop * svl)

Coefficients:
(Intercept)        pop2        pop3         svl    pop2:svl    pop3:svl
 -1.000e+01   7.909e+00  -1.800e+00   4.000e-01  -2.182e-01   6.232e-17

The first and the fourth parameters describe intercept and slope of the relationship between mass and svl in the first population, while the remainder refer to intercept and slope differences between the other two populations and those of the baseline population. Note that the last parameter estimate should in fact be zero but is not, due to rounding error.

We plot the estimated relationships also under the interaction model (Fig. 6.6):

fm <- lm(mass ~ pop * svl)   # Refit model
plot(svl, mass, col = c(rep("red", 2), rep("blue", 2), rep("green", 2)))
abline(fm$coef[1], fm$coef[4], col = "red")
abline(fm$coef[1] + fm$coef[2], fm$coef[4] + fm$coef[5], col = "blue")
abline(fm$coef[1] + fm$coef[3], fm$coef[4] + fm$coef[6], col = "green")

FIGURE 6.5 The main-effects ANCOVA model (plot of mass against svl for the three populations).

As an aside, this is not a useful statistical model, since it has an R² = 1, but the example does serve to illustrate the two kinds of assumptions about homogeneity or not of slopes that one may examine in an ANCOVA model.

Next, we try the means parameterizations of both the main-effects model and the interaction-effects model. First the main-effects model:

lm(mass ~ pop + svl - 1)
> lm(mass ~ pop + svl - 1)

Call:
lm(formula = mass ~ pop + svl - 1)

Coefficients:
   pop1     pop2     pop3      svl
-3.4386  -4.9298  -3.3860   0.2456

This gives us the estimates of each population-specific intercept plus the slope estimate common to all three populations.

What about the interaction-effects model?

lm(mass ~ pop * svl - 1 - svl)
> lm(mass ~ pop * svl - 1 - svl)

Call:
lm(formula = mass ~ pop * svl - 1 - svl)

Coefficients:
    pop1      pop2      pop3  pop1:svl  pop2:svl  pop3:svl
-10.0000   -2.0909  -11.8000    0.4000    0.1818    0.4000

FIGURE 6.6 The interaction-effects ANCOVA model (plot of mass against svl for the three populations).

These estimates have direct interpretations as the intercept and the slope of the three regressions of mass on svl.

6.4 SUMMARY

We have briefly reviewed the two key components of linear statistical models: statistical distributions and the linear predictor, which is represented by the product of the design matrix and the parameter vector. Understanding both is essential for applied statistics. But while sometimes one may get away in R or other useful stats packages with not exactly knowing what parameterization of a model is fit by the software and what the parameters effectively mean, this is not the case when using WinBUGS. In WinBUGS, we have to specify all columns of the design matrix and thus must know exactly what parameterization of a model we want to fit and how this is done. The linear models in this chapter were presented in a progression from simple to complex. Chapters 7–11 follow that structure and show how to fit these same models in WinBUGS and also in R for normal responses, i.e., for normal linear models, to which ANOVA, regression, and related methods all belong.

EXERCISE
1. Fitting a design matrix: The interaction-effects ANCOVA wasn't a useful statistical model for the toy snake data set, since six fitted parameters perfectly explain six observations and we can no longer estimate the variability in the system. Use lm() to fit a custom-built design matrix, i.e., the design matrix of an ANCOVA with partial interaction effects, where the slopes of the mass–length relationship are the same in population 1 and population 3. Build this design matrix in R, call it X, and fit the model by directly specifying X as the explanatory variable in function lm().


C H A P T E R

7

t-Test: Equal and Unequal Variances

O U T L I N E

7.1 t-Test with Equal Variances
    7.1.1 Introduction
    7.1.2 Data Generation
    7.1.3 Analysis Using R
    7.1.4 Analysis Using WinBUGS

7.2 t-Test with Unequal Variances
    7.2.1 Introduction
    7.2.2 Data Generation
    7.2.3 Analysis Using R
    7.2.4 Analysis Using WinBUGS

7.3 Summary and a Comment on the Modeling of Variances

One of the most commonly used linear models is that underlying the simple t-test. Actually, almost as with the computation of the arithmetic mean, many people wouldn't even think of a t-test as being a form of linear model. The t-test comes in two flavors; one for the case with equal variances and another for unequal variances. We will look at both in this chapter.


7.1 t-TEST WITH EQUAL VARIANCES

7.1.1 Introduction

The model underlying the t-test with equal variances states that:

y_i = α + β · x_i + ε_i
ε_i ∼ Normal(0, σ²)

Here, a response y_i is a measurement on a continuous scale taken on individuals i, which belong to either of two groups, and x_i is an indicator or dummy variable for group 2. (See Chapter 6 for different parameterizations of this model.) This simple t-test model has three parameters: the mean α for group 1, the difference in the means between groups 1 and 2 (β), and the variance σ² of the normal distribution from which the residuals ε_i are assumed to have come.

7.1.2 Data Generation

We first simulate data under this model and for a motivating example return to peregrine falcons. We imagine that we had measured male and female wingspan and are interested in a sex difference in size. For Western Europe, Monneret et al. (2006) gives the range of male and female wingspan as 70–85 cm and 95–115 cm, respectively. Assuming normal distributions for wingspan, this implies means and standard deviations of about 77.5 and 2.5 cm for males and 105 and 3 cm for females.

n1 <- 60                           # Number of females
n2 <- 40                           # Number of males
mu1 <- 105                         # Population mean of females
mu2 <- 77.5                        # Population mean of males
sigma <- 2.75                      # Average population SD of both
n <- n1 + n2                       # Total sample size
y1 <- rnorm(n1, mu1, sigma)        # Data for females
y2 <- rnorm(n2, mu2, sigma)        # Data for males
y <- c(y1, y2)                     # Aggregate both data sets
x <- rep(c(0,1), c(n1, n2))        # Indicator for male
boxplot(y ~ x, col = "grey", xlab = "Male", ylab = "Wingspan (cm)", las = 1)

The manner in which we just generated this data set (Fig. 7.1) corresponds in a way to a means parameterization of the linear model of the t-test. Here is a different way to generate a set of the same kind of data. Perhaps it lets one see more clearly the principle of an effects parameterization of the linear model:

n <- n1 + n2                       # Total sample size
alpha <- mu1                       # Mean for females serves as the intercept
beta <- mu2 - mu1                  # Beta is the difference male - female
E.y <- alpha + beta * x            # Expectation
y.obs <- rnorm(n = n, mean = E.y, sd = sigma)   # Add random variation
boxplot(y.obs ~ x, col = "grey", xlab = "Male", ylab = "Wingspan (cm)", las = 1)

An important aside (again): To get a feel for the effect of chance, or technically, for sampling variance (= sampling error), you can repeatedly execute one of the previous sets of commands and observe how different repeated realizations of the same random process are.
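A minimal sketch of such a repetition (assuming n1, n2, mu1, mu2, sigma, and x from above); each pass generates a fresh data set and prints the two group means, which will wobble around 105 and 77.5:

for (i in 1:3) {
   y.rep <- rnorm(n1 + n2, mean = rep(c(mu1, mu2), c(n1, n2)), sd = sigma)
   print(tapply(y.rep, x, mean))   # Sample means for females (0) and males (1)
}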

7.1.3 Analysis Using R

There is an R function called t.test(), but we will use the linear model function lm() instead to fit the t-test with equal variances for both groups.

fit1 <- lm(y ~ x)        # Analysis of first data set
fit2 <- lm(y.obs ~ x)    # Analysis of second data set
summary(fit1)
summary(fit2)

> fit1 <- lm(y ~ x)
> fit2 <- lm(y.obs ~ x)
> summary(fit1)

[...]

FIGURE 7.1 A boxplot of the generated data set on wingspan of female and male peregrines when the residual variance is constant (0 = females, 1 = males).


Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  104.6452     0.3592  291.37   <2e-16 ***
x            -26.5737     0.5679  -46.80   <2e-16 ***
[...]
Residual standard error: 2.782 on 98 degrees of freedom
Multiple R-squared: 0.9572,  Adjusted R-squared: 0.9567
F-statistic: 2190 on 1 and 98 DF,  p-value: < 2.2e-16

> summary(fit2)

[...]

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  105.1785     0.3621  290.45   <2e-16 ***
x            -27.6985     0.5726  -48.38   <2e-16 ***
[...]
Residual standard error: 2.805 on 98 degrees of freedom
Multiple R-squared: 0.9598,  Adjusted R-squared: 0.9594
F-statistic: 2340 on 1 and 98 DF,  p-value: < 2.2e-16

The difference between the two analyses is because of sampling variance, i.e., the fact that two different samples from the same population were taken. You may use anova() for an analysis of variance (ANOVA) table of this model:

anova(fit1)
anova(fit2)

Just for fun, check the design matrices for the two models (they are the same):

model.matrix(fit1)
model.matrix(fit2)

7.1.4 Analysis Using WinBUGS

Here's the Bayesian analysis of the first data set (and don't forget to load package R2WinBUGS). We need to specify the model first. As an extra, we also compute the residuals.

# Define BUGS model
sink("ttest.txt")
cat("
model {

# Priors
 mu1 ~ dnorm(0, 0.001)           # Precision = 1/variance
 delta ~ dnorm(0, 0.001)         # Large variance = small precision
 tau <- 1 / (sigma * sigma)
 sigma ~ dunif(0, 10)

# Likelihood
 for (i in 1:n) {
    y[i] ~ dnorm(mu[i], tau)
    mu[i] <- mu1 + delta * x[i]
    residual[i] <- y[i] - mu[i]  # Define residuals
 }

# Derived quantities: one of the greatest things about a Bayesian analysis
 mu2 <- mu1 + delta              # Mean wingspan of males
}
",fill = TRUE)
sink()

Then, we provide the data, a function to generate inits, a list of parameters we want WinBUGS to keep track of, and specify the Markov chain Monte Carlo (MCMC) settings, after which we use the function bugs() to run the analysis. Note the use of lognormal (instead of normal) random numbers as initial values for the positive-valued standard deviation sigma.

# Bundle data
win.data <- list("x", "y", "n")

# Inits function
inits <- function(){ list(mu1 = rnorm(1), delta = rnorm(1), sigma = rlnorm(1))}

# Parameters to estimate
params <- c("mu1", "mu2", "delta", "sigma", "residual")

# MCMC settings
nc <- 3                  # Number of chains
ni <- 1000               # Number of draws from posterior for each chain
nb <- 1                  # Number of draws to discard as burn-in
nt <- 1                  # Thinning rate

# Start Gibbs sampler
out <- bugs(data = win.data, inits = inits, parameters = params, model = "ttest.txt",
   n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE,
   working.directory = getwd())

print(out, dig = 3)

> print(out, dig = 3)
Inference for Bugs model at "ttest.txt", fit using WinBUGS,
3 chains, each with 1000 iterations (first 1 discarded)
n.sims = 2997 iterations saved


                 mean    sd    2.5%     25%     50%     75%   97.5%  Rhat n.eff
mu1           104.638 0.367 103.900 104.400 104.600 104.900 105.400 1.001  2100
mu2            78.070 0.448  77.200  77.770  78.080  78.370  78.950 1.001  3000
delta         -26.568 0.577 -27.680 -26.960 -26.560 -26.190 -25.410 1.001  3000
sigma           2.820 0.205   2.471   2.672   2.808   2.950   3.250 1.001  2400
residual[1]    -2.671 0.366  -3.411  -2.911  -2.669  -2.430  -1.937 1.001  2200
[ ... ]
residual[100]   4.579 0.449   3.701   4.279   4.573   4.875   5.443 1.001  3000
deviance      489.489 2.579 486.600 487.700 488.800 490.600 496.300 1.001  3000
[ ... ]
DIC info (using the rule, pD = Dbar - Dhat)
pD = 3.0 and DIC = 492.5

DIC is an estimate of expected predictive error (lower deviance is better).

>

Comparing the inference from WinBUGS with that using frequentist statistics, we see that the estimates of the means are almost identical, but that the residual standard deviation estimate is slightly larger in WinBUGS. This last point is general. In later chapters, we will often see that estimates of variances are greater in a Bayesian than in a frequentist analysis. Presumably, the difference will be greatest with smaller sample sizes. This is an indication of the approximate and asymptotic nature of frequentist inference that may differ from the exact inference under the Bayesian paradigm.

One of the nicest things about a Bayesian analysis is that parameters that are functions of primary parameters, along with their uncertainty (e.g., standard errors or credible intervals), can very easily be obtained using the MCMC posterior samples. Thus, in the above model code, the primary parameters are the female mean and the male–female difference, but we just added a line that computes the mean for males at every iteration, and we directly obtain samples from the posterior distributions of not only the female mean wingspan and the sex difference, but also directly of the mean male wingspan. In a frequentist mode of inference, this would require application of the delta method, which is more complicated and also makes more assumptions. In the Bayesian analysis, estimation error is automatically propagated into functions of parameters.
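A tiny sketch of how such posterior samples can be summarized in R (assuming the bugs() fit from above is stored in out; out$sims.list holds the saved draws):

quantile(out$sims.list$mu2, probs = c(0.025, 0.5, 0.975))   # Posterior summary of male mean wingspan
mu2.draws <- out$sims.list$mu1 + out$sims.list$delta        # Or build it from the primary parameters
mean(mu2.draws)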

Here are two further comments about model assessment. First, we see that the effective number of parameters estimated is 3.0, which is right because we are estimating one variance and two means. And second, before making an inference about the wingspan in this peregrine population, we should really check whether the model is adequate. Of course, the check of model adequacy is somewhat contrived because we use exclusively simulated and therefore, in a sense, perfect data sets. However, it is important to practice, so we will check the residuals here.


One of the first things to notice about the residuals, which is a little strange at first, is that each one has a distribution. This makes sense, because in a Bayesian analysis, every unknown has a posterior distribution representing our uncertainty about the magnitude of that unknown. Here, we plot the residuals against the order in which individuals were present in the data set and then produce a boxplot for male and female residuals to get a feel whether the distributions of residuals for the two groups are similar.

plot(1:100, out$mean$residual)
abline(h = 0)
boxplot(out$mean$residual ~ x, col = "grey", xlab = "Male", ylab = "Wingspan residuals (cm)", las = 1)
abline(h = 0)

No violation of the model assumption of homoscedasticity is apparent from these residual checks.

7.2 t-TEST WITH UNEQUAL VARIANCES

7.2.1 Introduction

The previous analysis assumed that interindividual variation in wingspan is identical for male and female peregrines. This may well not be the case, and it may be better to use a model that can accommodate possibly different variances. Our model then becomes as follows:

y_i = α + β · x_i + ε_i
ε_i ∼ Normal(0, σ₁²) for x_i = 0 (females)
ε_i ∼ Normal(0, σ₂²) for x_i = 1 (males)

7.2.2 Data Generation

We first simulate data under the heterogeneous groups model (Fig. 7.2).

n1 <- 60                           # Number of females
n2 <- 40                           # Number of males
mu1 <- 105                         # Population mean for females
mu2 <- 77.5                        # Population mean for males
sigma1 <- 3                        # Population SD for females
sigma2 <- 2.5                      # Population SD for males
n <- n1 + n2                       # Total sample size
y1 <- rnorm(n1, mu1, sigma1)       # Data for females
y2 <- rnorm(n2, mu2, sigma2)       # Data for males
y <- c(y1, y2)                     # Aggregate both data sets
x <- rep(c(0,1), c(n1, n2))        # Indicator for male
boxplot(y ~ x, col = "grey", xlab = "Male", ylab = "Wingspan (cm)", las = 1)

7.2.3 Analysis Using R

A frequentist analysis, using the Welch test to allow for unequal variances, is easy. R defaults to heterogeneous variances when calling the t-test function:

t.test(y ~ x)
> t.test(y ~ x)

        Welch Two Sample t-test

data:  y by x
t = 46.7144, df = 88.497, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 25.93319 28.23750
sample estimates:
mean in group 0 mean in group 1
      104.67641        77.59106

FIGURE 7.2 A boxplot of the generated data set on wingspan of female and male peregrines when the residuals depend on sex (0 = females, 1 = males).


7.2.4 Analysis Using WinBUGS

Now here is the Bayesian analysis.

# Define BUGS model
sink("h.ttest.txt")
cat("
model {

# Priors
 mu1 ~ dnorm(0, 0.001)
 mu2 ~ dnorm(0, 0.001)
 tau1 <- 1 / (sigma1 * sigma1)
 sigma1 ~ dunif(0, 1000)         # Note: Large var. = small precision
 tau2 <- 1 / (sigma2 * sigma2)
 sigma2 ~ dunif(0, 1000)

# Likelihood
 for (i in 1:n1) {
    y1[i] ~ dnorm(mu1, tau1)
 }
 for (i in 1:n2) {
    y2[i] ~ dnorm(mu2, tau2)
 }

# Derived quantities
 delta <- mu2 - mu1
}
",fill = TRUE)
sink()

# Bundle data
win.data <- list("y1", "y2", "n1", "n2")

# Inits function
inits <- function(){ list(mu1 = rnorm(1), mu2 = rnorm(1), sigma1 = rlnorm(1), sigma2 = rlnorm(1))}

# Parameters to estimate
params <- c("mu1", "mu2", "delta", "sigma1", "sigma2")

# MCMC settings
nc <- 3                  # Number of chains
ni <- 2000               # Number of draws from posterior for each chain
nb <- 500                # Number of draws to discard as burn-in
nt <- 1                  # Thinning rate

# Unleash Gibbs sampler
out <- bugs(data = win.data, inits = inits, parameters = params, model = "h.ttest.txt",
   n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE)

print(out, dig = 3)

> print(out, dig = 3)
Inference for Bugs model at "h.ttest.txt", fit using WinBUGS,
3 chains, each with 2000 iterations (first 500 discarded)
n.sims = 4500 iterations saved

            mean    sd    2.5%     25%     50%     75%   97.5%  Rhat n.eff
mu1      104.667 0.400 103.900 104.400 104.700 104.900 105.400 1.001  4500
mu2       77.573 0.444  76.705  77.270  77.570  77.870  78.460 1.001  2700
delta    -27.093 0.592 -28.250 -27.490 -27.090 -26.700 -25.890 1.001  4500
sigma1     3.053 0.292   2.545   2.848   3.029   3.236   3.688 1.001  4500
sigma2     2.825 0.328   2.269   2.592   2.795   3.025   3.540 1.001  4500
deviance 497.784 2.882 494.200 495.600 497.200 499.300 504.952 1.001  4500
[ ... ]
DIC info (using the rule, pD = Dbar - Dhat)
pD = 3.9 and DIC = 501.7

DIC is an estimate of expected predictive error (lower deviance is better).

The complication of sex-dependent variances is trivial to deal with in the Bayesian framework. To formally test whether the two variances really differ, one could reparameterize the model such that the variance for one group is expressed as the variance of the other plus some constant to be estimated. Actually, this could also be done "outside" of WinBUGS in R by forming the difference, for each draw in the Markov chain, between sigma1 and sigma2. If the credible interval for that parameter covers zero, then that would be taken as lack of evidence for different variances. This is an important idea: derived variables with their full posterior uncertainty can also be computed outside of WinBUGS in R if posterior samples of all of their components are available. This is often easier than putting the added code into the WinBUGS model description.
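A minimal sketch of that idea in R (assuming the bugs() fit from above is stored in out):

sigma.diff <- out$sims.list$sigma1 - out$sims.list$sigma2   # One difference per saved MCMC draw
mean(sigma.diff)                                            # Posterior mean of the difference
quantile(sigma.diff, probs = c(0.025, 0.975))               # Does the 95% CRI cover zero?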

7.3 SUMMARY AND A COMMENT ON THE MODELING OF VARIANCES

We have used WinBUGS to conduct the most widely used statistical test, the t-test. The version of that test with unequal variances is the only place in this book where we explicitly model the variance (except for the modeling of variances by variance components; see Chapters 9, 12, 16, 19, 20, and 21). This chapter shows that not only the mean but also the variance may be modeled. In classical statistics, variance modeling may be rather hard and fairly obscure in its application to an ecologist. In contrast, in WinBUGS, the modeling of variances, e.g., as a function of some covariate, could be simply undertaken by use of a log link function; see Lee and Nelder (2006) and Lee et al. (2006) for (frequentist) examples of such models and Exercise 4 (below) for a Bayesian example. Variance modeling, either for the residuals or for random effects, may be required to adequately characterize the stochastic system components when inference is focusing on the mean structure. Alternatively, one may focus on a relation between an explanatory variable and a variance, for instance, to test a hypothesis that some conditions increase the variance in some trait.

EXERCISES
1. Comparing variances: See whether you can adapt the WinBUGS code directly to test for equal variances of wingspan. Try a solution using the quantities monitored in the previous analysis.
2. Assumption violations: Use simulation to study the effects of heterogeneous variances on the inference by a t-test that assumes homogeneous variances. Assemble data with different SD for males and females, but a common mean, and analyze them in WinBUGS assuming a common dispersion. See what kind of bias is introduced.
3. Swiss hare data 1: Use WinBUGS to fit a t-test to the mean.density in arable and grassland sites. Repeat that assuming unequal variances. Test for a difference of the variance.
4. Swiss hare data 2: Use WinBUGS to fit a t-test to the mean.density in arable and grassland sites and introduce a log-linear regression of the variance on elevation.


C H A P T E R

8

Normal Linear Regression

O U T L I N E

8.1 Introduction

8.2 Data Generation

8.3 Analysis Using R

8.4 Analysis Using WinBUGS
    8.4.1 Fitting the Model
    8.4.2 Goodness-of-Fit Assessment in Bayesian Analyses
    8.4.3 Forming Predictions
    8.4.4 Interpretation of Confidence vs. Credible Intervals

8.5 Summary

8.1 INTRODUCTION

We have seen in Chapter 6 that the linear model underlying the simple normal linear regression is the same as that for the t-test:

y_i = α + β · x_i + ε_i
ε_i ∼ Normal(0, σ²)

The only difference is that the variable x_i doesn't just take on two possible values to indicate membership to one of two groups; rather, it is a measurement that can take on any possible value, within some bounds and up to measurement accuracy. The geometric representation of this model is a straight line, with α being the intercept and β the slope.

As a motivating example for a linear regression analysis, we take a Swiss survey of the wallcreeper (Fig. 8.1), a spectacular little cliff-inhabiting bird that appears to have declined greatly in Switzerland in recent years.


Assume that we had data on the proportion of sample quadrats in which the species was observed in Switzerland for the years 1990–2005 and that we were willing to assume that the random deviations about a linear time trend were normally distributed. This is for illustration only; usually, we would use logistic regression (Chapters 17–19) or a site-occupancy model (see Chapter 20) to make inference about such data, which have to do with the distribution of a species and represent a proportion (i.e., number of sites occupied/number of sites surveyed).

Importantly, in this chapter, we will also introduce posterior predictive model checking, including the Bayesian p-value (Gelman et al., 1996; Gelman and Hill, 2007, Chapter 24). This is a very general concept for checking the goodness-of-fit of a model analyzed using simulation techniques like MCMC.

8.2 DATA GENERATION

We generate simple linear regression data (see later Fig. 8.4):

n <- 16 # Number of years

a <- 40 # Intercept

b <- -1.5 # Slope

sigma2 <- 25 # Residual variance

FIGURE 8.1 Wallcreeper (Tichodroma muraria), Switzerland, 1989. (Photo E. Hüttenmoser)


x <- 1:16 # Values of covariate year

eps <- rnorm(n, mean = 0, sd = sqrt(sigma2))

y <- a + b*x + eps # Assemble data set

plot((x+1989), y, xlab = "Year", las = 1, ylab = "Prop. occupied (%)", cex = 1.2)

8.3 ANALYSIS USING R

Here is a classical analysis using the linear regression facility in R. The function I(.) makes it more straightforward to plot the results:

print(summary(lm(y ~ I(x+1989))))
abline(lm(y ~ I(x+1989)), col = "blue", lwd = 2)

8.4 ANALYSIS USING WinBUGS

Next, we conduct a Bayesian analysis of the same model, which will also include a posterior predictive check plus a Bayesian p-value (Gelman et al., 1996) to assess the adequacy of the model for our data set. (Hint: If you don't understand some WinBUGS expressions, such as pow() or step(), open the WinBUGS manual under the Help menu, and in the contents go to Model Specification > Logical nodes.)

8.4.1 Fitting the Model

# Write model

sink("linreg.txt")

cat("

model {

# Priors

alpha ~ dnorm(0,0.001)

beta ~ dnorm(0,0.001)

sigma ~ dunif(0, 100)

# Likelihood

for (i in 1:n) {

y[i] ~ dnorm(mu[i], tau)

mu[i] <- alpha + beta*x[i]

}

# Derived quantities

tau <- 1/ (sigma * sigma)

p.decline <- 1-step(beta) # Probability of decline


# Assess model fit using a sums-of-squares-type discrepancy

for (i in 1:n) {

residual[i] <- y[i]-mu[i] # Residuals for observed data

predicted[i] <- mu[i] # Predicted values

sq[i] <- pow(residual[i], 2) # Squared residuals for observed data

# Generate replicate data and compute fit stats for them

y.new[i] ~ dnorm(mu[i], tau) # one new data set at each MCMC iteration

sq.new[i] <- pow(y.new[i]-predicted[i], 2) # Squared residuals for new data

}

fit <- sum(sq[]) # Sum of squared residuals for actual data set

fit.new <- sum(sq.new[]) # Sum of squared residuals for new data set

test <- step(fit.new - fit) # Test whether new data set more extreme

bpvalue <- mean(test) # Bayesian p-value

}

",fill = TRUE)

sink()

# Bundle data

win.data <- list("x","y", "n")

# Inits function

inits <- function(){ list(alpha = rnorm(1), beta = rnorm(1), sigma = rlnorm(1))}

# Parameters to estimate

params <- c("alpha","beta", "p.decline", "sigma", "fit", "fit.new", "bpvalue",

"residual", "predicted")

# MCMC settings

nc <- 3 ; ni <- 1200 ; nb <- 200 ; nt <- 1

# Start Gibbs sampler

out <- bugs(data = win.data, inits = inits, parameters = params, model.file = "linreg.txt", n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE)

print(out, dig = 3)

8.4.2 Goodness-of-Fit Assessment in Bayesian Analyses

In the WinBUGS code, there are two components included to assess the goodness-of-fit of our model. First, there are two lines that compute residuals and predicted values under the model. And second, there is code to compute a Bayesian p-value, i.e., a posterior predictive check (Gelman et al., 1996, 2004; Gelman and Hill, 2007). As an instructive example, we will assess the adequacy of the model using a traditional residual check and then using posterior predictive distributions, including a Bayesian p-value, as an overall measure of fit for a chosen fit criterion.


Residual Plots

One commonly produced graphical check of the residuals of a linear model is a plot of the residuals against the predicted values. Under the normal linear regression model, residuals are assumed to be a random sample from one single normal distribution. There should be no visible structure in the residuals. In particular, the scatterplot of the residuals should not have the shape of a fan, which would indicate that the variance is not constant but is larger, or smaller, for larger responses. We check this first and find no sign of a violation of the homoscedasticity assumption (Fig. 8.2).

plot(out$mean$predicted, out$mean$residual, main = "Residuals vs. predicted values", las = 1, xlab = "Predicted values", ylab = "Residuals")

abline(h = 0)

FIGURE 8.2 Residual plot for the linear regression analysis for the trend in the Swiss wallcreeper distribution.

Posterior Predictive Distributions and Bayesian p-Values

The use of posterior predictive distributions is a very general way of assessing the fit of a model when using MCMC model-fitting techniques (Gelman et al., 1996; Gelman and Hill, 2007). The idea of a posterior predictive check is to compare the lack of fit of the model for the actual data set with the lack of fit of the model when fitted to replicated, "ideal" data sets. Ideal means that a data set conforms exactly to the assumptions made by the model and is generated under the parameter estimates obtained from the analysis of the actual data set. In contrast to a frequentist analysis, where the solution of a model consists in a single value for each parameter, we estimate a distribution in a Bayesian analysis; hence, any lack-of-fit statistic will also have a distribution.

To obtain such perfect data sets, at each MCMC iteration one replicate data set is assembled under the same model that we fit to the actual data set and using the values of all parameters from the current MCMC iteration. A discrepancy measure chosen to embody a certain kind of lack of fit is computed for both that perfect data set and for the actual data set. Therefore, at the end of an MCMC run for n chains of length m, we have n*m draws from the posterior predictive distribution of the discrepancy measure applied to the actual data set as well as of the discrepancy measure applied to a perfect data set.

What does "discrepancy measure" mean and how is it chosen? The discrepancy measure can be chosen to assess particular features of the model. Often, some global measure of lack of fit will be selected, e.g., a sums-of-squares-type discrepancy as we do here, or a Chi-squared-type discrepancy (see Chapter 21 for another example in a more complex hierarchical model). However, entirely different measures may also be chosen; for instance, a discrepancy measure that quantifies the incidence or magnitude of extreme values to assess the adequacy of the model for outliers; see Gelman et al. (1996) for examples.
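For instance, a Chi-squared-type discrepancy could be computed alongside the sums-of-squares version. The following lines are a sketch only (not the book's code); y, mu, sigma, and y.new are the nodes already defined in the model of Section 8.4.1, while chi2, chi2.new, fit.chi2, and fit.chi2.new are illustrative new names:

# Inside the likelihood loop over i:
chi2[i] <- pow(y[i] - mu[i], 2) / (sigma * sigma)            # Scaled squared residual, actual data
chi2.new[i] <- pow(y.new[i] - mu[i], 2) / (sigma * sigma)    # Same, for the replicate data
# After the loop:
fit.chi2 <- sum(chi2[])
fit.chi2.new <- sum(chi2.new[])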

One of the best ways to assess model adequacy based on posterior predictive distributions is graphically, in a plot of the lack of fit for the ideal data vs. the lack of fit for the actual data (Fig. 8.3). If the model fits the data, then about half of the points should lie above and half of them below a 1:1 line. Alternatively, a numerical summary, called a Bayesian p-value, can be computed that quantifies the proportion of times when the discrepancy measure for the perfect data sets is greater than the discrepancy measure computed for the actual data set. A fitting model has a Bayesian p-value near 0.5, and values close to 0 or close to 1 suggest doubtful fit of the model.

lim <- c(0, 3200)

plot(out$sims.list$fit, out$sims.list$fit.new, main = "Graphical posterior predictive check", las = 1, xlab = "SSQ for actual data set", ylab = "SSQ for ideal (new) data sets", xlim = lim, ylim = lim)

abline(0, 1)

mean(out$sims.list$fit.new > out$sims.list$fit) # Bayesian p-value

> mean(out$sims.list$fit.new > out$sims.list$fit)

[1] 0.547

The graphical posterior predictive check and the numerical Bayesian p-value concur in suggesting that our fitted model is adequate for the wallcreeper data, something that will hardly come as a surprise with these simulated data.

Some statisticians don't like posterior predictive checks because they use the data twice: first, to generate the replicate data and second, to compare them with these replicates. Model fit assessments based on posterior predictive checks are somewhat too liberal, and posterior predictive checks should not be used for model selection; see Chapter 10 in Ntzoufras (2009) for alternatives.

FIGURE 8.3 Graphical posterior predictive check (PPC) of the model adequacy for the wallcreeper analysis, plotting predictive vs. realized sums-of-squares discrepancies. The Bayesian p-value is equal to the proportion of plot symbols above the 1:1 line. Note that the truncation on the left is due to the hard minimum provided by the least-squares estimate.

8.4.3 Forming Predictions

Predictions are expected values of the response variable at some hypothetical values of one or more explanatory variables. Forming predictions is extremely important in applied statistical modeling for two reasons. First, predictions, especially when represented as a graph, are one of the best ways of communicating what can be learned from a model. Second, especially for more complex models, for instance, when there are polynomial terms or interactions, and also for Poisson or binomial models (see Chapters 13–21), predictions may be the only way to understand what a model is telling us. Of course, for the simple normal straight-line model in this chapter, we can simply look at the magnitude and sign of the slope estimate to understand what the model is telling us about the population trend in Swiss wallcreepers. However, as an exercise, we next plot the estimated trend line from the classical (maximum likelihood [ML]) and the Bayesian (MCMC) fits of the linear regression model:

plot((x+1989), y, xlab = "Year", las = 1, ylab = "Prop. occupied (%)", cex = 1.2)

abline(lm(y ~ I(x+1989)), col = "blue", lwd = 2)

pred.y <- out$mean$alpha + out$mean$beta * x

points(1990:2005, pred.y, type = "l", col = "red", lwd = 2)

text(1994, 20, labels = "blue - ML; red - MCMC", cex = 1.2)

Given the small sample size, we get remarkably similar and indeed virtually identical inferences under the two paradigms (Fig. 8.4).

FIGURE 8.4 Observed and predicted change in the distribution of Swiss wallcreepers (blue: maximum likelihood; red: Bayesian).

We can also easily calculate a 95% uncertainty interval by simulation. There are two kinds of such an uncertainty interval: one for the actual data set (called a credible interval) and another for a new data set sampled from the same population (called a prediction interval). Because there is more uncertainty about the line when adding the variability due to the new sample, the latter is wider than the former. Here, we give an example of a credible interval. To produce the interval, we compute the expected response for each of the 3000 elements of our posterior sample of the intercept α and the slope β, and at each point along the x-axis (i.e., for each of the 16 years). Then, we use the 2.5th and 97.5th percentiles of each posterior distribution as our bounds of the credible interval.

We set up an R data structure to hold the predictions, fill them, then determine the appropriate percentile points and produce a plot (Fig. 8.5):

predictions <- array(dim = c(length(x), length(out$sims.list$alpha)))

for(i in 1:length(x)){

predictions[i,] <- out$sims.list$alpha + out$sims.list$beta*i

}

LPB <- apply(predictions, 1, quantile, probs = 0.025) # Lower bound

UPB <- apply(predictions, 1, quantile, probs = 0.975) # Upper bound

plot((x+1989), y, xlab = "Year", las = 1, ylab = "Prop. occupied (%)", cex = 1.2)

points(1990:2005, out$mean$alpha + out$mean$beta * x, type = "l", col = "black", lwd = 2)

points(1990:2005, LPB, type = "l", col = "grey", lwd = 2)

points(1990:2005, UPB, type = "l", col = "grey", lwd = 2)
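For completeness, here is a sketch (not part of the original analysis) of how the corresponding 95% prediction interval for a new observation in each year could be obtained from the same posterior sample, simply by adding, to every posterior draw of the regression line, a random residual drawn with the sampled sigma; new.obs, LPB.pred, and UPB.pred are illustrative names:

n.sims <- length(out$sims.list$alpha)            # Number of posterior draws
new.obs <- array(dim = c(length(x), n.sims))
for(i in 1:length(x)){
new.obs[i,] <- out$sims.list$alpha + out$sims.list$beta * i + rnorm(n.sims, 0, out$sims.list$sigma)   # Add sampling variability
}
LPB.pred <- apply(new.obs, 1, quantile, probs = 0.025)   # Lower bound
UPB.pred <- apply(new.obs, 1, quantile, probs = 0.975)   # Upper bound

These bounds will be noticeably wider than LPB and UPB above, reflecting the extra uncertainty about a single new observation.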

FIGURE 8.5 Predicted wallcreeper trend (black line) with 95% credible interval (grey lines).

8.4.4 Interpretation of Confidence vs. Credible Intervals

Consider the frequentist inference about the slope parameter: −1.763, with SE = 0.241. A quick and dirty frequentist 95% confidence interval is provided by −1.763 ± 2*0.241 = (−2.245, −1.281). This means that if we took, for example, 100 replicate sample observations of 16 annual surveys each in the same Swiss wallcreeper population and 100 times estimated an annual trend with an associated 95% CI using linear regression, then on average we would expect 95 intervals to contain the true value of the population trend. We cannot make any direct probability statement about the trend itself; the true value of the trend is either in or out of our single interval, but there is no probability associated with this. In particular, it is wrong to say that the population trend of the wallcreeper lies between −2.245 and −1.281 with a probability of 95%. The probability statement in the 95% CI refers to the reliability of the tool, i.e., the computation of the confidence interval, and not to the parameter for which a CI is constructed.
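For reference, both the quick interval and R's exact t-based interval can be computed as follows (a sketch only; the lm() call is the one from Section 8.3):

-1.763 + c(-2, 2) * 0.241       # Quick-and-dirty 95% CI for the slope
confint(lm(y ~ I(x+1989)))      # Exact frequentist 95% CIs for intercept and slope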

In contrast, the posterior probability in a Bayesian analysis measures our degree of belief about the likely magnitude of a parameter, given the model, the observed data, and our priors. Hence, we can make direct probability statements about a parameter using its posterior distribution. Let's do this here for the slope parameter, which represents the population trend of the wallcreeper in Switzerland (Fig. 8.6).

hist(out$sims.list$beta, main = "", col = "grey", xlab = "Trend estimate", xlim = c(-4, 0))

abline(v = 0, col = "black", lwd = 2)

FIGURE 8.6 Posterior distribution of the distributional trend in Swiss wallcreepers. The value of zero (representing no trend) is shown as a black vertical line.


We see clearly that values representing no decline or an increase, i.e., values of the slope of 0 and larger, have no mass at all under this posterior distribution. We can thus say that the probability of a stable or increasing wallcreeper population is essentially nil. Such a statement is exactly what most users of statistics, such as politicians, would like to have, rather than the somewhat contorted statement about a population trend as based on the frequentist confidence interval.
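Such probability statements are just summaries of the posterior sample. As a sketch (assuming the bugs() output object out from Section 8.4.1 is still available):

mean(out$sims.list$beta >= 0)   # Posterior probability of a stable or increasing trend
mean(out$sims.list$beta < 0)    # Posterior probability of a decline (cf. p.decline in the model)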

8.5 SUMMARY

We have used WinBUGS to fit a linear regression, the algebraic model and the WinBUGS code for which are essentially identical to those for the t-test. We have also introduced posterior predictive distributions along with the Bayesian p-value as a very general and flexible way of assessing the goodness-of-fit of a model analyzed using MCMC.

EXERCISES

1. Toy problem: Assume you examined five frogs that weighed 10, 20, 23, 32, and 35 g and had lengths of 5, 7, 10, 12, and 15 units. Write out the linear regression model using vectors and matrices and the set of equations implied. Conduct a normal linear regression analysis for this data set using R and WinBUGS.

2. Prediction: One way of forming predictions in WinBUGS is by specifying them as additional (derived) variables in the model. Another way is to form them outside of WinBUGS in R, if Markov chains for all the required ingredients are available. The third and perhaps the simplest way is by adding to the data set missing values (NAs) in the response along with the desired levels of the explanatory variables in the model. In a Bayesian analysis, missing values are treated exactly like parameters, i.e., WinBUGS will draw samples for each missing value as part of the model fitting. The resulting posterior predictive distribution can then be summarized for inference in the usual way. Try this for the frog data set and predict mass at lengths 16, 17, 18, 19, and 20 units.

3. Swiss hare data: Fit a normal linear regression analysis for mean.density on year for grassland areas. Hint: You must first select the data for grassland areas and then aggregate over sites to obtain mean annual density in grassland. Does a linear trend adequately capture the variability in the data?


CHAPTER 9

Normal One-Way ANOVA

OUTLINE

9.1 Introduction: Fixed and Random Effects
9.2 Fixed-Effects ANOVA
    9.2.1 Data Generation
    9.2.2 Maximum Likelihood Analysis Using R
    9.2.3 Bayesian Analysis Using WinBUGS
9.3 Random-Effects ANOVA
    9.3.1 Data Generation
    9.3.2 Restricted Maximum Likelihood Analysis Using R
    9.3.3 Bayesian Analysis Using WinBUGS
9.4 Summary

9.1 INTRODUCTION: FIXED AND RANDOM EFFECTS

Analysis of variance (ANOVA) is the generalization of a t-test to more than two groups. There are different kinds of ANOVA: one-way, with just a single factor, and two- or multiway, with two or more factors, and main- and interaction-effects models (see Chapter 10). Here, we present a one-way ANOVA and introduce the concept of random effects along the way. In random-effects models, a set of effects (e.g., group means) is constrained to come from some distribution, which is most often a normal, although it may be a Bernoulli (see Chapter 20), a Poisson (see Chapter 21), or yet another distribution. In this chapter, we will first generate and analyze a fixed-effects and then a random-effects ANOVA data set. In Chapters 12, 16, and 19–21, we will focus on mixed models, i.e., those containing both fixed and random effects. As a motivating example for this chapter, we assume that we measured snout–vent length (SVL) in five populations of Smooth snakes (Fig. 9.1) and were interested in whether populations differ.

The one-way ANOVA can be parameterized in various ways (see Section 6.3.4). We adopt a means parameterization of the linear model for the fixed-effects, one-way ANOVA:

yi = αj(i) + εi
εi ~ Normal(0, σ²)

Here, yi is the observed SVL of Smooth snake i in population j, αj(i) is the expected SVL of a snake in population j, and the residual εi is the random SVL deviation of snake i from its population mean αj(i). It is assumed to be normally distributed around zero with constant variance σ².

Without any further assumption, the population means αj(i) are simply some unknown constants that are estimated in a fixed-effects ANOVA. If, however, we add a distributional assumption about the population means αj(i), we obtain a random-effects ANOVA:

yi = αj(i) + εi
εi ~ Normal(0, σ²)
αj(i) ~ Normal(μ, τ²)

The interpretation of αj(i) and εi as population mean SVL and residual, respectively, is unchanged. But now, the αj(i) parameters are no longer assumed to be independent; rather, they come from a second normal distribution with mean μ and variance τ². The latter are also called hyperparameters, because they are one level higher than the parameters αj(i) that they govern.

FIGURE 9.1 Smooth snake (Coronella austriaca), France, 2006. (Photo C. Berney)

Thus, the models for the fixed- and random-effects ANOVA differ only subtly; so how do we know when to apply which one in practice? Unfortunately, there are fairly differing views on this decision; see Gelman and Hill (2007, p. 245). The traditional view goes about as follows. When you have a particular interest in the studied factor levels and/or when you have included (nearly) all conceivable levels in a study, the associated factor should be viewed as having fixed effects. You estimate the effects of each level but are not interested in the variance among levels, except as part of an ANOVA table to construct an F-test statistic for that factor. Importantly, you cannot generalize to factor levels that you did not study. In contrast, you consider a factor as random when you don't have a particular interest in the levels that actually appear in your study and/or when these levels form a sample from a (much) larger set of possible levels that you could have included in your study. Typically, you want to generalize to this larger population and are more interested in the variation among the factor levels in that population, although you may still want to estimate the effects of the levels actually observed in your study. Thus, typical fixed-effects factors would be sex or cereal variety in an agricultural experiment. Typical random-effects factors might be time (e.g., year, month, or day) or location, such as experimental blocks or other spatial units on which repeated measurements are taken.

It has been argued (e.g., Robinson, 1991) that it doesn't make sense to claim that you studied only a sample of effects and then estimate them anyway, and that a more natural distinction between fixed and random effects is simply based on the question of whether they could plausibly have come from some distribution of effects. Such random effects are generated by the same homogeneous, stochastic process, and statisticians also say they are exchangeable, which in common language means that they are similar, but not identical. This similarity is because of the common stochastic process that generated them and thus creates a stochastic relationship among the effects of the levels of a random-effects factor. In contrast, when factor levels are modeled as fixed, they are considered unrelated or independent.

Why should one make distributional assumptions about a set of effects in a model, i.e., go from a fixed-effects ANOVA to the corresponding random-effects ANOVA? There are three reasons: extrapolation of inference to a wider population, improved accounting for system uncertainty, and efficiency of estimation. First, viewing the studied effects as a random sample from some population enables one to extrapolate to that population. This generalization can only be achieved by modeling the process that generates the realized values of the random effects (i.e., by assuming a normal distribution for the αj(i) above). Second, declaring factor effects as random acknowledges that when repeating our study, we obtain a different set of effects, so the resulting parameter estimates will differ from those in our current study. Random-effects modeling properly accounts for this added uncertainty in our inference about the analyzed system. Third, when making a random-effects assumption about a factor, these effects are no longer estimated independently; instead, estimates are influenced by each other and therefore are dependent. Specifically, individual estimates are "pulled in" toward the common mean μ, i.e., they are closer to μ than the corresponding fixed-effects estimates. This is why random-effects estimators are said to be "shrinkage estimators". Estimates that are more imprecise and are based on a smaller sample size are shrunk more. When effects are indeed exchangeable, shrinkage results in better estimates (e.g., with smaller prediction error) than the estimates obtained from a fixed-effects analysis. This is why one also says that a random-effects analysis "borrows strength".
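For a balanced one-way layout with known variances, the degree of this pulling-in can be written down explicitly: the shrunk estimate of a population mean is a weighted average of the observed population mean and the grand mean, with more weight on the grand mean when a group is noisy or small. A minimal numerical sketch in R (all values purely illustrative, not taken from any analysis in this book):

tau2 <- 25      # Among-population variance (illustrative)
sigma2 <- 9     # Residual variance (illustrative)
n.j <- 10       # Number of snakes measured in population j
w <- tau2 / (tau2 + sigma2 / n.j)   # Weight given to the observed population mean
ybar.j <- 45 ; mu <- 50             # Observed population mean and grand mean
w * ybar.j + (1 - w) * mu           # Shrunk estimate, pulled toward mu

With only a few observations in population j, sigma2/n.j grows, w decreases, and the estimate is pulled more strongly toward the grand mean.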

Random-effects modeling can also be viewed as a compromise between assuming no effects and fully independent effects of the levels of a factor. When assuming a factor has no effect, you pool its effects, whereas when assuming it has fixed effects, you treat all effects as completely independent instead. When assuming a factor has random effects, you pool effects only partially, and the degree of pooling is based on the amount of information that you have about the effect of each level. According to this view, you might always want to assume all factors as random and let the data determine the degree of pooling; see Gelman (2005) and Gelman and Hill (2007).

Recently, statisticians seem to prefer the view of random-effects factors as those whose effects result from a common stochastic process, with the resulting benefits of the ability to extrapolate, more honest accounting for uncertainty, and shrinkage estimation. For instance, Sauer and Link (2002) assessed population trends in a large number of bird species and showed how imprecise estimates for species with little information borrowed strength from the "ensemble" (the group of species) and got pulled toward the group mean, and this yielded better predictions. Similarly, Welham et al. (2004) analyzed a huge wheat variety testing experiment and treated variety as random. Again, they found that this gave better predictions of future yield than treating variety as fixed.

Next, we generate one data set under a fixed-effects design and another under a random-effects design. We do this in a very "linear model" fashion, i.e., by first specifying a design matrix and choosing parameter values. For the fixed-effects analysis, we will arbitrarily select these values, whereas for the random-effects analysis, we will draw them from a normal distribution with specified hyperparameters. Then, we multiply the design matrix by the parameter vector to obtain the linear predictor, to which we add residuals to obtain the actual measurements.

9.2 FIXED-EFFECTS ANOVA

9.2.1 Data Generation

We assume five groups with 10 snakes measured in each, with SVL averages of 50, 40, 45, 55, and 60. This corresponds to a baseline population mean of 50 and effects of populations 2–5 of −10, −5, 5, and 10. We choose a residual standard deviation of SVL of 3 and assemble everything (Fig. 9.2). (Note that %*% denotes matrix multiplication in the R code below.)

ngroups <- 5 # Number of populations

nsample <- 10 # Number of snakes in each

pop.means <- c(50, 40, 45, 55, 60) # Population mean SVL

sigma <- 3 # Residual sd

n <- ngroups * nsample # Total number of data points

eps <- rnorm(n, 0, sigma) # Residuals

x <- rep(1:5, rep(nsample, ngroups)) # Indicator for population

means <- rep(pop.means, rep(nsample, ngroups))

X <- as.matrix(model.matrix(~ as.factor(x) - 1)) # Create design matrix

FIGURE 9.2 Snout–vent length (SVL) of Smooth snakes in five populations, simulated under a fixed-effects model.


X # Inspect that

y <- as.numeric(X %*% as.matrix(pop.means) + eps) # Assemble data; NOTE: as.numeric is ESSENTIAL for WinBUGS

boxplot(y ~ x, col = "grey", xlab = "Population", ylab = "SVL", main = "", las = 1)

9.2.2 Maximum Likelihood Analysis Using R

print(anova(lm(y ~ as.factor(x))))
cat("\n")
print(summary(lm(y ~ as.factor(x)))$coeff, dig = 3)
cat("Sigma: ", summary(lm(y ~ as.factor(x)))$sigma, "\n")

> print(summary(lm(y~as.factor(x)))$coeff, dig = 3)
              Estimate Std. Error t value Pr(>|t|)
(Intercept)      48.84      0.864   56.51 1.96e-43
as.factor(x)2    -9.48      1.222   -7.75 7.89e-10
as.factor(x)3    -4.54      1.222   -3.71 5.67e-04
as.factor(x)4     4.76      1.222    3.90 3.20e-04
as.factor(x)5    11.25      1.222    9.20 6.56e-12
> cat("Sigma: ", summary(lm(y~as.factor(x)))$sigma, "\n")
Sigma: 2.733362

Remembering that R fits an effects parameterization, we recognize fairly well the input values of the analysis.

9.2.3 Bayesian Analysis Using WinBUGS

We fit a means parameterization of the model and obtain effects estimates (i.e., population differences) as derived quantities. Note WinBUGS' elegant double-indexing (alpha[x[i]]) to specify the expected SVL of snake i according to the i-th value of the population index x. We also add two lines to show how custom hypotheses can easily be tested as derived quantities. Test 1 examines whether snakes in populations 2 and 3 have the same size as those in populations 4 and 5. Test 2 checks whether the size difference between snakes in populations 5 and 1 is twice that between populations 4 and 1.

# Write model

sink("anova.txt")

cat("

model {

# Priors

for (i in 1:5){ # Implicitly define alpha as a vector

alpha[i] ~ dnorm(0, 0.001)

}

sigma ~ dunif(0, 100)


# Likelihood

for (i in 1:50) {

y[i] ~ dnorm(mean[i], tau)

mean[i] <- alpha[x[i]]

}

# Derived quantities

tau <- 1 / ( sigma * sigma)

effe2 <- alpha[2] - alpha[1]

effe3 <- alpha[3] - alpha[1]

effe4 <- alpha[4] - alpha[1]

effe5 <- alpha[5] - alpha[1]

# Custom hypothesis test / Define your own contrasts

test1 <- (effe2+effe3) - (effe4+effe5) # Equals zero when effe2+effe3 = effe4+effe5

test2 <- effe5 - 2 * effe4 # Equals zero when effe5 = 2*effe4

}

",fill = TRUE)

sink()

# Bundle data

win.data <- list("y", "x")

# Inits function

inits <- function(){ list(alpha = rnorm(5, mean = mean(y)), sigma = rlnorm(1) )}

# Parameters to estimate

params <- c("alpha", "sigma", "effe2", "effe3", "effe4", "effe5", "test1", "test2")

# MCMC settings

ni <- 1200

nb <- 200

nt <- 2

nc <- 3

# Start Gibbs sampling

out <- bugs(win.data, inits, params, "anova.txt", n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE)

# Inspect estimates

print(out, dig = 3)

> print(out, dig = 3)

Inference for Bugs model at "anova.txt", fit using WinBUGS,

3 chains, each with 1200 iterations (first 200 discarded), n.thin = 2

n.sims = 1500 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

alpha[1] 48.837 0.877 47.049 48.240 48.840 49.420 50.636 1.011 180

alpha[2] 39.354 0.888 37.595 38.770 39.350 39.990 41.080 1.000 1500


alpha[3] 44.315 0.867 42.710 43.720 44.290 44.910 46.016 1.002 1500

alpha[4] 53.580 0.886 51.759 53.020 53.565 54.150 55.356 1.001 1500

alpha[5] 60.019 0.882 58.285 59.430 60.030 60.590 61.750 1.001 1500

sigma 2.811 0.303 2.296 2.594 2.777 3.015 3.451 1.002 1200

effe2 −9.482 1.273 −12.040 −10.320 −9.456 −8.665 −7.046 1.003 590

effe3 −4.521 1.275 −7.079 −5.356 −4.548 −3.677 −2.037 1.006 360

effe4 4.744 1.241 2.325 3.944 4.768 5.514 7.205 1.004 470

effe5 11.182 1.239 8.767 10.390 11.180 12.002 13.615 1.002 810

test1 −29.929 1.764 −33.425 −31.100 −29.900 −28.800 −26.415 1.000 1500

test2 1.694 2.158 −2.670 0.319 1.644 3.139 6.016 1.002 940

deviance 243.557 3.671 238.500 240.800 242.800 245.600 252.452 1.005 580

[ ... ]

DIC info (using the rule, pD = Dbar-Dhat)

pD = 5.8 and DIC = 249.3

DIC is an estimate of expected predictive error (lower deviance is better).

As an aside, the effective number of parameters, pD, is estimated quite correctly, as we have five group means and one variance parameter. Comparison with the maximum likelihood solutions (see Section 9.2.2) shows how, with vague priors, a Bayesian analysis yields numerically virtually identical inferences to a frequentist analysis. However, one of the most compelling things about a Bayesian analysis conducted using Markov chain Monte Carlo methods is the ease with which derived quantities can be estimated and custom tests conducted. In the above WinBUGS model code, we see how easily such custom contrasts (comparisons) can be estimated with full error propagation from all the involved random quantities. Of course, for a simple model such as a one-way ANOVA, this can also be done fairly easily in standard stats packages. However, in WinBUGS, this is equally simple for any kind of parameter, e.g., for variances (see Chapter 7), and in any model type, e.g., mixed models, generalized linear models (GLMs), or generalized linear mixed models (GLMMs).

9.3 RANDOM-EFFECTS ANOVA

9.3.1 Data Generation

For our second data set, we assume that population SVL means come from a normal distribution with selected hyperparameters. The code is only slightly different from the previous data-generating code. First, we choose the two sample sizes: the number of populations and that of snakes examined in each. As always, we generate balanced data for convenience only. The methods to "decompose" a data set in R and WinBUGS work just as well for unbalanced data.


npop <- 10 # Number of populations: now choose 10 rather than 5

nsample <- 12 # Number of snakes in each

n <- npop * nsample # Total number of data points

We choose the hyperparameters of the normal distribution from which the random population means are thought to come and use rnorm() to draw one realization from that distribution for each population. Then, we select the residual standard deviation and draw residuals.

pop.grand.mean <- 50 # Grand mean SVL
pop.sd <- 5 # sd of population effects about mean
pop.means <- rnorm(n = npop, mean = pop.grand.mean, sd = pop.sd)
sigma <- 3 # Residual sd
eps <- rnorm(n, 0, sigma) # Draw residuals

We build the design matrix, expand the population effects to the chosen (larger) sample size, use matrix multiplication to assemble our data set, and have a look at what we've created (Fig. 9.3):

x <- rep(1:npop, rep(nsample, npop))

X <- as.matrix(model.matrix(~ as.factor(x) −1))

y <- as.numeric(X %*% as.matrix(pop.means) + eps) # as.numeric is ESSENTIAL

boxplot(y ~ x, col = "grey", xlab = "Population", ylab = "SVL", main = "", las = 1) # Plot of generated data

abline(h = pop.grand.mean)

FIGURE 9.3 Simulated snout–vent length (SVL) of Smooth snakes in 10 populations. We can't tell from this graph that the population effects are now random rather than fixed as in Fig. 9.2. The horizontal line shows the grand mean.


9.3.2 Restricted Maximum Likelihood Analysis Using R

The functions contained in the R package lme4 allow one to fit, using approximate methods (Gelman and Hill, 2007), a wide range of linear, nonlinear, and generalized linear mixed models. We use this package for fitting a random-effects ANOVA by restricted maximum likelihood (REML). lme4 requires the package Matrix, which we need to download and install in case we haven't done so already.

library('lme4') # Load lme4

pop <- as.factor(x) # Define x as a factor and call it pop

lme.fit <- lmer(y ~ 1 + 1 | pop, REML = TRUE)
lme.fit # Inspect results
ranef(lme.fit) # Print random effects

> lme.fit    # Inspect results
Linear mixed model fit by REML
Formula: y ~ 1 + 1 | pop
   AIC   BIC logLik deviance REMLdev
 630.1 638.5 -312.1    626.3   624.1
Random effects:
 Groups   Name        Variance Std.Dev.
 pop      (Intercept) 13.1244  3.6228
 Residual              8.5172  2.9184
Number of obs: 120, groups: pop, 10

Fixed effects:
            Estimate Std. Error t value
(Intercept)   50.318      1.176   42.78

> ranef(lme.fit)    # Print random effects
$pop
   (Intercept)
1   -1.0784560
2   -2.2585587
3    6.5857470
4   -0.5906953
5   -6.2970554
6    3.3997612
7   -1.9525175
8   -1.0407413
9    0.9989122
10   2.2336038


9.3.3 Bayesian Analysis Using WinBUGS

Now, WinBUGS. Note that adopting a common prior distribution for the 10 population means represents the random-effects assumption in this model.

sink("re.anova.txt")

cat("

model {

# Priors and some derived things

for (i in 1:npop){

pop.mean[i] ~ dnorm(mu, tau.group) # Prior for population means

effe[i] <- pop.mean[i] - mu # Population effects as derived quant's

}

mu ~ dnorm(0,0.001) # Hyperprior for grand mean svl

sigma.group ~ dunif(0, 10) # Hyperprior for sd of population effects

sigma.res ~ dunif(0, 10) # Prior for residual sd

# Likelihood

for (i in 1:n) {

y[i] ~ dnorm(mean[i], tau.res)

mean[i] <- pop.mean[x[i]]

}

# Derived quantities

tau.group <- 1 / (sigma.group * sigma.group)

tau.res <- 1 / (sigma.res * sigma.res)

}

",fill = TRUE)

sink()

# Bundle data

win.data <- list(y = y, x = x, npop = npop, n = n)

# Inits function

inits <- function(){ list(mu = runif(1, 0, 100), sigma.group = rlnorm(1), sigma.res = rlnorm(1) )}

# Params to estimate

parameters <- c("mu", "pop.mean", "effe", "sigma.group", "sigma.res")

# MCMC settings

ni <- 1200

nb <- 200

nt <- 2

nc <- 3


# Start WinBUGS

out <- bugs(win.data, inits, parameters, "re.anova.txt", n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE)

# Inspect estimates

print(out, dig = 3)

Inference for Bugs model at "re.anova.txt", fit using WinBUGS,

3 chains, each with 1200 iterations (first 200 discarded), n.thin = 2

n.sims = 1500 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

mu 50.199 1.438 47.309 49.337 50.195 51.070 53.076 1.002 1500

pop.mean[1] 49.247 0.834 47.589 48.670 49.250 49.810 50.875 1.001 1500

pop.mean[2] 48.054 0.852 46.405 47.490 48.020 48.640 49.655 1.001 1500

pop.mean[3] 56.933 0.871 55.275 56.340 56.950 57.520 58.590 1.001 1500

pop.mean[4] 49.724 0.839 48.100 49.180 49.710 50.290 51.290 1.001 1500

pop.mean[5] 43.962 0.847 42.290 43.380 43.980 44.540 45.510 1.001 1500

pop.mean[6] 53.705 0.849 52.070 53.120 53.685 54.300 55.370 1.000 1500

pop.mean[7] 48.349 0.799 46.715 47.800 48.360 48.920 49.875 1.001 1500

pop.mean[8] 49.289 0.829 47.650 48.720 49.310 49.842 50.846 1.003 1000

pop.mean[9] 51.267 0.844 49.650 50.700 51.240 51.860 52.920 1.001 1500

pop.mean[10] 52.567 0.806 51.020 52.000 52.580 53.130 54.115 1.004 540

effe[1] −0.952 1.589 −4.156 −1.990 −0.926 0.095 2.189 1.001 1500

effe[2] −2.145 1.631 −5.513 −3.197 −2.126 −1.113 1.171 1.002 1200

effe[3] 6.734 1.639 3.558 5.680 6.781 7.736 10.055 1.001 1500

effe[4] −0.475 1.617 −3.716 −1.461 −0.493 0.532 2.744 1.001 1500

effe[5] −6.237 1.628 −9.426 −7.231 −6.255 −5.202 −3.169 1.001 1500

effe[6] 3.506 1.616 0.458 2.490 3.435 4.542 6.749 1.001 1500

effe[7] −1.850 1.576 −5.136 −2.832 −1.858 −0.832 1.124 1.000 1500

effe[8] −0.910 1.611 −4.105 −1.940 −0.898 0.074 2.352 1.003 1300

effe[9] 1.067 1.622 −1.975 −0.008 1.096 2.121 4.104 1.001 1500

effe[10] 2.368 1.582 −0.729 1.342 2.383 3.392 5.595 1.001 1500

sigma.group 4.287 1.254 2.546 3.380 4.036 4.941 7.305 1.002 1400

sigma.res 2.946 0.199 2.584 2.804 2.931 3.077 3.376 1.000 1500

deviance 598.756 5.010 591.347 595.200 597.900 601.600 610.210 1.005 590

[ ... ]

DIC info (using the rule, pD = Dbar-Dhat)

pD = 10.6 and DIC = 609.3

DIC is an estimate of expected predictive error (lower deviance is better).

On comparing the inference using REML in lme4 (see Section 9.3.2) with our Bayesian analysis, we find similar, but not identical, results of the two analyses. For instance, the among-population SVL variation (sigma.group) is estimated better in the Bayesian analysis (remember that truth is 5). The classical analysis, using lmer(), appears to be more biased and does not give a standard error of the estimated random-effects standard deviation. This illustrates that lmer() may yield more biased estimates for small samples, i.e., when there are only few levels (Gelman and Hill, 2007). However, it has to be said that estimation of a between-group variance is always difficult when there are few levels and/or that variance is small (Lambert et al., 2005).

9.4 SUMMARY

We have introduced fixed (independent) and random (dependent) factor effects and fitted the corresponding one-way ANOVA models using WinBUGS and R, using lme4. We found that the WinBUGS solution is presumably less biased for the random-effects variance than the solution by lme4 when sample sizes are small.

EXERCISES

1. Convert the ANOVA to an ANCOVA (analysis of covariance): Within the fixed-effects ANOVA, add the effect on SVL of a continuous measure of habitat quality that varies by individual snake (perhaps individuals in better habitat are larger and more competitive). You may either recreate a new data set that contains such an effect or simply create a habitat covariate (e.g., by drawing random numbers) and add it as a covariate into the previous analysis.

2. Watch shrinkage happen: Population mean estimates under the random-effects ANOVA are shrunk toward the grand mean when compared with those under a fixed-effects model. First, watch this shrinkage by fitting a fixed-effects ANOVA to the random-effects data simulated in Section 9.3.1. Second, discard the data from 8 out of 10 snakes in one population and see what happens to the estimate of that population mean.

3. Swiss hare data: Compare mean observed population density among all surveyed sites when treating years as replicates. Do this once assuming that these populations corresponded to fixed effects; then repeat the analysis assuming they are random effects.


CHAPTER 10

Normal Two-Way ANOVA

OUTLINE

10.1 Introduction: Main and Interaction Effects
10.2 Data Generation
10.3 Aside: Using Simulation to Assess Bias and Precision of an Estimator
10.4 Analysis Using R
10.5 Analysis Using WinBUGS
    10.5.1 Main-Effects ANOVA Using WinBUGS
    10.5.2 Interaction-Effects ANOVA Using WinBUGS
    10.5.3 Forming Predictions
10.6 Summary

10.1 INTRODUCTION: MAIN AND INTERACTION EFFECTS

We now extend the one-way analysis of variance (ANOVA) model by adding another factor and arrive at a two-way ANOVA. We only consider fixed effects here. There are two ways in which the effects of two factors A and B can be combined, and the associated models are called the main-effects and the interaction-effects model. In the main-effects model, the effects of A and B are additive, i.e., the effect of one level of factor A, say a1, does not depend on whether it is assessed at one level of B, say b1, or at another, say b2. In contrast, with an interaction between A and B, some or all effects depend on some or all of each other, and the effect of a1 may not be identical when assessed at b1 or at b2. Interaction is symmetric, so the effect of b1 will in general also not be the same whether assessed at a1 or at a2. However, the interaction model is still linear, since effects are simply added together, only with an additional set of effects: those for the combination of each level of two or more factors. When factors are considered fixed, then depending on the model, not all effects will be estimable; see later (10.5.1). In contrast, in a random-effects model with interaction, all effects will in general be estimable.

In Chapter 6, we already saw the linear models for the two-way ANOVA with interaction. In short, the effects parameterization for a model with two factors A (with j levels) and B (with k levels) is this:

yi = α + βj(i)·Ai + δk(i)·Bi + γjk(i)·Ai·Bi + εi,

while the means parameterization is this:

yi = αjk(i)·Ai·Bi + εi

In both cases, we need to assume a distribution for the residuals to complete the model description:

εi ~ Normal(0, σ²)

We will use the beautiful mourning cloak (Fig. 10.1) as an illustration for this chapter and assume that we measured wing length of butterflies in each of three elevation classes in five different populations and that the effects of these factors interact. Table 10.1 shows the meaning of the coefficients in the linear model for the effects parameterization.

FIGURE 10.1 Mourning cloak (Nymphalis antiopa), Switzerland, 2006. (Photo: T. Marent)


If the population factor has n.pop levels and the elevation factor n.elev levels, we have one intercept, n.pop - 1 = 4 effects for the population factor, n.elev - 1 = 2 effects for the elevation factor, and (n.pop - 1)*(n.elev - 1) = 8 effects for the interaction between population and elevation. This adds up to the 15 degrees of freedom that it will cost us to fit this model to a data set that contains observations in every cell (combination of levels) in the cross-classification of population and elevation.
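This bookkeeping can be checked directly in R. The following is only a sketch; it re-creates, purely for illustration, factors of the same structure as those built in the next section:

n.pop <- 5 ; n.elev <- 3 ; nsample <- 12
pop <- gl(n = n.pop, k = nsample, length = n.pop * nsample)
elev <- gl(n = n.elev, k = nsample / n.elev, length = n.pop * nsample)
ncol(model.matrix(~ pop * elev))    # 1 + 4 + 2 + 8 = 15 columns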

10.2 DATA GENERATION

We assume five populations with 12 butterflies measured in each, and that of these 12, four butterflies were studied at each of three elevation classes (low, medium, high). Wing length differs with elevation, perhaps because butterflies hatch at different sizes at different elevations or because of different size-dependent predation owing to different bird communities at different elevations. Furthermore, the relationship between wing length and elevation class is not homogeneous among the five studied populations, so there is a population-elevation interaction. The residual wing length standard deviation will be 3.

# Choose sample size

n.pop <- 5

n.elev <- 3

nsample <- 12

n <- n.pop * nsample

# Create factor levels

pop <- gl(n = n.pop, k = nsample, length = n)

elev <- gl(n = n.elev, k = nsample / n.elev, length = n)

# Choose effects

baseline <- 40 # Intercept

pop.effects <- c(-10, −5, 5, 10) # Population effects

elev.effects <- c(5, 10) # Elev effects

TABLE 10.1 The 15 Parameters Estimated in the Effects Parameterization of aTwo-Way ANOVA with Interaction for the Mourning Cloak Example.

Intercept Elevation2 Elevation3

pop2 pop2.elevation2 pop2.elevation3

pop3 pop3.elevation2 pop3.elevation3

pop4 pop4.elevation2 pop4.elevation3

pop5 pop5.elevation2 pop5.elevation3


interaction.effects <- c(−2, 3, 0, 4, 4, 0, 3, −2) # Interaction effects

all.effects <- c(baseline, pop.effects, elev.effects, interaction.effects)

sigma <- 3

eps <- rnorm(n, 0, sigma) # Residuals

X <- as.matrix(model.matrix(~ pop*elev) ) # Create design matrix

X # Have a look at that

Use matrix multiplication to assemble all components for the final wing length measurements y, which we inspect in a grouped boxplot (Fig. 10.2).

wing <- as.numeric(as.matrix(X) %*% as.matrix(all.effects) + eps)

# NOTE: as.numeric is ESSENTIAL for WinBUGS later

boxplot(wing ~ elev*pop, col = "grey", xlab = "Elevation-by-Population", ylab = "Wing length", main = "Simulated data set", las = 1, ylim = c(20, 70)) # Plot of generated data

abline(h = 40)

We have generated data for which the wing length–elevation relationship varies considerably among the five populations. This can also be nicely seen in a useful conditioning plot, which can be drawn using the function xyplot() in the lattice package. The data can be viewed in two ways; both plots show that the effects of population and elevation are not independent.

FIGURE 10.2 Mean wing length of mourning cloaks at each of three elevations and in each of five populations. Boxplots are ordered first by elevation and second by population, and their identity is recognizable from the tick labels on the x-axis. For instance, the boxplot labelled 3.2 on the x-axis shows the mean wing length in population 2 at elevation 3.


library("lattice") # Load the lattice library

xyplot(wing ~ elev | pop, ylab = "Wing length", xlab = "Elevation", main = "Population-specific relationship between wing and elevation class")

xyplot(wing ~ pop | elev, ylab = "Wing length", xlab = "Population", main = "Elevation-specific relationship between wing and population")

10.3 ASIDE: USING SIMULATION TO ASSESS BIAS AND PRECISION OF AN ESTIMATOR

Let's quickly compare our parameter estimates with what we put into the data:

lm(wing ~ pop*elev)

all.effects

> lm(wing ~ pop*elev)

Call:

lm(formula = wing ~ pop * elev)

Coefficients:

(Intercept)        pop2        pop3        pop4        pop5       elev2
     38.859      -8.543      -4.793       4.702      13.410       6.437
      elev3  pop2:elev2  pop3:elev2  pop4:elev2  pop5:elev2  pop2:elev3
     11.213      -1.284       1.530       0.332       1.857       1.359
 pop3:elev3  pop4:elev3  pop5:elev3
      1.820       2.835      -6.609

> all.effects

[1] 40 -10 -5 5 10 5 10 -2 3 0 4 4 0 3 -2

The coefficient estimates don't necessarily resemble very much the parameters from which we simulated these data; after all, our sample size is rather small. So, to reassure ourselves that these differences are simply due to sampling variation, we repeat this data generation–analysis cycle 1000 times and average over the random sampling variation to convince ourselves that the estimators from the linear model are indeed unbiased. Simulations of this kind can be done easily in R, and this is one of the great strengths of R.

n.iter <- 1000 # Desired number of iterations

estimates <- array(dim = c(n.iter, length(all.effects))) # Data structure to hold results

for(i in 1:n.iter) { # Run simulation n.iter times

print(i) # Optional

eps <- rnorm(n, 0, sigma) # Residuals


y <- as.numeric(as.matrix(X) %*% as.matrix(all.effects) + eps) # Assemble data

fit.model <- lm(y ~ pop*elev) # Break down data

estimates[i,] <- fit.model$coefficients # Save estimates of coefs.

}

Compare the input (i.e., the chosen effects) and the output when averaged over sampling variation:

print(apply(estimates, 2, mean), dig = 2)

all.effects

> print(apply(estimates, 2, mean), dig = 2)

[1] 40.0102 −10.0452 −5.0717 4.9687 9.9524 5.0417 10.0419 −1.9761

[9] 2.9493 −0.0226 4.0031 3.9407 −0.0039 3.0234 −1.9838

> all.effects

[1] 40 −10 −5 5 10 5 10 −2 3 0 4 4 0 3 −2

These are much closer to our input parameters. Depending on the number of iterations, we can get arbitrarily close to the input. Alternatively, we could increase the sample size from 12 to 12,000, and we would get estimates that are still closer to the input values.

10.4 ANALYSIS USING R

We continue with the analysis of our data set and use R to fit the main-effects model first.

mainfit <- lm(wing ~ elev + pop)
mainfit

> mainfit
[ ... ]
Coefficients:
(Intercept)       elev2       elev3        pop2
     38.736       6.924      11.095      -8.518
       pop3        pop4        pop5
     -3.676       5.757      11.826

Then, we fit the means parameterization of the interaction model.

intfit <- lm(wing ~ elev*pop - 1 - pop - elev)
intfit

> intfit <- lm(wing ~ elev*pop - 1 - pop - elev)
> intfit
[ ... ]


Coefficients:
elev1:pop1  elev2:pop1  elev3:pop1
     38.86       45.30       50.07
elev1:pop2  elev2:pop2  elev3:pop2
     30.32       35.47       42.89
elev1:pop3  elev2:pop3  elev3:pop3
     34.07       42.03       47.10
elev1:pop4  elev2:pop4  elev3:pop4
     43.56       50.33       57.61
elev1:pop5  elev2:pop5  elev3:pop5
     52.27       60.56       56.87

10.5 ANALYSIS USING WinBUGS

10.5.1 Main-Effects ANOVA Using WinBUGS

We fit the main-effects model in the effects parameterization because I find that easier to code. One minor feature in this analysis is the way in which we specify the priors for the elements of the parameter vectors: instead of looping over each of them, we now write them all out, since we have to set to zero the first (or another) level of each factor to make this fixed-effects model identifiable.

# Define model

sink("2w.anova.txt")

cat("

model {

# Priors

alpha ~ dnorm(0, 0.001) # Intercept

beta.pop[1] <- 0 # set to zero effect of 1st level

beta.pop[2] ~ dnorm(0, 0.001)

beta.pop[3] ~ dnorm(0, 0.001)

beta.pop[4] ~ dnorm(0, 0.001)

beta.pop[5] ~ dnorm(0, 0.001)

beta.elev[1] <- 0 # ditto

beta.elev[2] ~ dnorm(0, 0.001)

beta.elev[3] ~ dnorm(0, 0.001)

sigma ~ dunif(0, 100)

# Likelihood

for (i in 1:n) {

wing[i] ~ dnorm(mean[i], tau)

mean[i] <- alpha + beta.pop[pop[i]] + beta.elev[elev[i]]

}


# Derived quantities

tau <- 1 / ( sigma * sigma)

}
",fill = TRUE)

sink()

# Bundle data
win.data <- list(wing = wing, elev = as.numeric(elev), pop = as.numeric(pop),
                 n = length(wing))

# Inits function
inits <- function(){ list(alpha = rnorm(1), sigma = rlnorm(1))}

# Parameters to estimate
params <- c("alpha", "beta.pop", "beta.elev", "sigma")

# MCMC settings
ni <- 1200
nb <- 200
nt <- 2
nc <- 3

# Start Gibbs sampling
out <- bugs(win.data, inits, params, "2w.anova.txt", n.thin = nt, n.chains = nc,
            n.burnin = nb, n.iter = ni, debug = TRUE)

# Print estimates

print(out, dig = 3)

> print(out, dig = 3)

Inference for Bugs model at "2w.anova.txt", fit using WinBUGS,

3 chains, each with 1200 iterations (first 200 discarded), n.thin = 2

n.sims = 1500 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

alpha 38.701 1.216 36.355 37.870 38.760 39.510 40.945 1.006 680

beta.pop[2] -8.447 1.476 -11.325 -9.492 -8.442 -7.428 -5.547 1.001 1500

beta.pop[3] -3.567 1.458 -6.360 -4.512 -3.586 -2.664 -0.670 1.001 1500

beta.pop[4] 5.820 1.470 3.040 4.825 5.871 6.788 8.713 1.003 780

beta.pop[5] 11.841 1.454 9.090 10.850 11.860 12.770 14.725 1.001 1500

beta.elev[2] 6.901 1.156 4.714 6.106 6.923 7.671 9.184 1.004 560

beta.elev[3] 11.077 1.119 8.988 10.300 11.040 11.850 13.220 1.002 970

sigma 3.561 0.360 2.954 3.306 3.538 3.788 4.323 1.000 1500

deviance 320.983 4.308 314.900 317.900 320.400 323.200 331.600 1.001 1500

[ ... ]

DIC info (using the rule, pD = Dbar-Dhat)

pD = 7.8 and DIC = 328.8

DIC is an estimate of expected predictive error (lower deviance is better).


We get estimates that are fairly similar to the MLEs above. To see the frequentist estimate of the residual standard deviation, you can type summary(mainfit).
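For instance, a minimal way to pull out just that quantity from the frequentist fit (using the mainfit object from above):

summary(mainfit)$sigma   # Residual SD of the main-effects model, to compare with sigma above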

10.5.2 Interaction-Effects ANOVA Using WinBUGS

We will specify the means parameterization for ease of coding and show how parameters in WinBUGS can be arrays with two (or more) dimensions. This is handy when organizing an analysis.

# Write model

sink("2w2.anova.txt")

cat("

model {

# Priors

for (i in 1:n.pop){

for(j in 1:n.elev) {

group.mean[i,j] ~ dnorm(0, 0.0001)

}

}

sigma ~ dunif(0, 100)

# Likelihood

for (i in 1:n) {

wing[i] ~ dnorm(mean[i], tau)

mean[i] <- group.mean[pop[i], elev[i]]
}

# Derived quantities
tau <- 1 / (sigma * sigma)
}
",fill = TRUE)

sink()

# Bundle data
win.data <- list(wing = wing, elev = as.numeric(elev), pop = as.numeric(pop),
                 n = length(wing), n.elev = length(unique(elev)), n.pop = length(unique(pop)))

# Inits function
inits <- function(){ list(sigma = rlnorm(1))}

# Parameters to estimate
params <- c("group.mean", "sigma")

# MCMC settings
ni <- 1200
nb <- 200
nt <- 2
nc <- 3


# Start Gibbs sampling
out <- bugs(win.data, inits, params, "2w2.anova.txt", n.thin = nt, n.chains = nc,
            n.burnin = nb, n.iter = ni, debug = TRUE)

# Print estimates

print(out, dig = 3)

> print(out, dig = 3)

Inference for Bugs model at "2w2.anova.txt", fit using WinBUGS,

3 chains, each with 1200 iterations (first 200 discarded), n.thin = 2

n.sims = 1500 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

group.mean[1,1] 38.864 1.644 35.500 37.790 38.885 39.982 42.040 1.001 1500

group.mean[1,2] 45.263 1.600 42.135 44.187 45.250 46.300 48.506 1.000 1500

group.mean[1,3] 50.062 1.645 46.784 48.987 50.080 51.110 53.330 1.000 1500

group.mean[2,1] 30.330 1.589 27.150 29.297 30.320 31.370 33.495 1.000 1500

group.mean[2,2] 35.405 1.637 32.284 34.340 35.390 36.532 38.495 1.004 720

group.mean[2,3] 42.905 1.564 39.708 41.865 42.930 43.950 45.850 1.003 780

group.mean[3,1] 34.071 1.634 30.975 32.977 34.065 35.120 37.346 1.002 1500

group.mean[3,2] 41.932 1.609 38.714 40.897 41.940 43.010 45.041 1.001 1500

group.mean[3,3] 47.077 1.628 43.855 46.017 47.040 48.160 50.290 1.004 500

group.mean[4,1] 43.508 1.663 40.345 42.370 43.440 44.580 46.895 1.001 1500

group.mean[4,2] 50.403 1.635 47.239 49.310 50.365 51.520 53.590 1.004 540

group.mean[4,3] 57.630 1.617 54.525 56.570 57.640 58.720 60.770 1.002 1500

group.mean[5,1] 52.246 1.608 49.054 51.177 52.260 53.350 55.411 1.002 1100

group.mean[5,2] 60.538 1.606 57.384 59.457 60.580 61.630 63.625 1.000 1500

group.mean[5,3] 56.893 1.621 53.670 55.877 56.870 57.962 60.095 1.002 1300

sigma 3.227 0.358 2.617 2.980 3.193 3.447 4.014 1.001 1500

deviance 309.326 6.734 299.100 304.500 308.350 313.000 324.952 1.001 1500

[...]

DIC info (using the rule, pD = Dbar-Dhat)

pD = 15.7 and DIC = 325.1

DIC is an estimate of expected predictive error (lower deviance is better).

We find the usual similarity between the Bayes and the maximum likelihood solution above (do summary(intfit)) and note in passing that the estimated number of parameters (pD) is pretty close to what we would expect it to be.
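One compact way to line the two solutions up side by side (a sketch using the intfit and out objects from above; the transpose makes the ordering of the WinBUGS cell means match R's coefficient ordering):

# MLEs vs. posterior means for the 15 population-by-elevation means
cbind(MLE = coef(intfit), Bayes = as.vector(t(out$mean$group.mean)))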

10.5.3 Forming Predictions

Let’s present the Bayesian inference for the interaction-effects model in a graph showing the predicted response, analogous to least-squares means in a classical analysis, for each combination of elevation and population (Fig. 10.3). This plot corresponds to the boxplot of the data set (Fig. 10.2); the vector or defined below selects the order of the predictions to match that in Fig. 10.2.


or <- c(1,4,7,10,13,2,5,8,11,14,3,6,9,12,15)
plot(or, out$mean$group.mean, xlab = "Elev by Population", las = 1,
   ylab = "Predicted wing length", cex = 1.5, ylim = c(20, 70))
segments(or, out$mean$group.mean, or, out$mean$group.mean + out$sd$group.mean,
   col = "black", lwd = 1)
segments(or, out$mean$group.mean, or, out$mean$group.mean - out$sd$group.mean,
   col = "black", lwd = 1)
abline(h = 40)

10.6 SUMMARY

We have introduced the concepts of main and interaction effects and used R and WinBUGS to fit the corresponding two-way ANOVA models. In an aside, we have illustrated R's flexibility to conduct simulations to verify the effects of sampling variation on the parameter estimates.

EXERCISES

1. Toy snake example: Fit a two-way ANOVA with interaction to the toy example of Chapter 6 and see what happens to the nonidentifiable parameter.

2. Swiss hare data: Fit an ANOVA model to mean hare density to decide whether the effect of grassland and arable land use is the same in all regions. Regions and land use are somewhat confounded, but we ignore this here.

FIGURE 10.3 Predicted wing length of mourning cloaks for each elevation-population combination. Error bars are 1 SE.


C H A P T E R

11

General Linear Model (ANCOVA)

O U T L I N E

11.1 Introduction 141

11.2 Data Generation 143

11.3 Analysis Using R 145

11.4 Analysis Using WinBUGS (And A Cautionary Tale About the Importance of Covariate Standardization) 145

11.5 Summary 149

11.1 INTRODUCTION

The “model of the mean,” t-test, simple linear regression, and analysis of variance (ANOVA) are all just special cases of a very general and powerful statistical model, the general linear model (≠ GLM!; see Chapter 13). This model expresses a continuous response as a linear combination of the effects of discrete and/or continuous explanatory variables plus a single random contribution from a normal distribution, whose variance is estimated along with the coefficients of all discrete and continuous covariates and possible interactions.

Before this was widely recognized, people used to make a rather sharp and artificial distinction between linear models that contain categorical explanatory variables only and were called t-test or ANOVA models and those that contain continuous covariates only and were called regression models. Models that contained both types of explanatory variables were usually treated as ANOVAs with typically a single continuous covariate to correct for preexisting variation among experimental units. These


models were called analysis of covariance (ANCOVA) models, and that's why I use this term here.

Nowadays, in many practical applications, we typically have several explanatory variables of both types. In addition, we want to fit both main effects of these covariates and some or all their pairwise or even higher-order interactions. As an example, we here consider an ANCOVA model with an interaction between a discrete and a continuous covariate. We saw how to fit interactions between two discrete covariates in Chapter 10. Interactions between two continuous covariates are easy to fit: simply fit one additional covariate whose values are obtained by multiplication of the two main covariates.
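In R syntax, this amounts to the following (x1, x2, and y are made-up continuous toy variables, used only to illustrate the formula):

x1 <- runif(30) ; x2 <- runif(30) ; y <- rnorm(30)
lm(y ~ x1 + x2 + I(x1 * x2))   # Interaction as an explicit product term
lm(y ~ x1 * x2)                # Shorthand for the same model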

In this chapter, we consider the relationship between body mass and body length of the asp viper (Fig. 11.1) in three populations: Pyrenees, Massif Central, and the Jura mountains. We are particularly interested in population-specific differences of the mass–length relationship, i.e., in the interaction between length and population. The means parameterization of the model we will fit can be written as (see Section 6.3.6)

yi = αj(i) + βj(i) · xi + εi

and

εi ~ Normal(0, σ²),

FIGURE 11.1 Male asp viper (Vipera aspis), Switzerland, 2007. (Photo T. Ott)


where yi is the body mass of individual i, αj(i) and βj(i) are the intercept and the slope, respectively, of the mass–length relationship in population j, xi is the body length of snake i, and as usual, εi describes the combined effects of all unmeasured influences on the body mass of snake i and is assumed to behave like a normal random variable whose variance σ² we estimate.

The effects parameterization of the same model is this:

yi = αPyr + β1 · xMC(i) + β2 · xJura(i) + β3 · xbody(i) + β4 · xbody(i) · xMC(i) + β5 · xbody(i) · xJura(i) + εi

In addition to yi and εi, which are as before, αPyr is the expected mass of snakes in the Pyrenees, β1 is the difference between the expected mass of snakes in the Massif Central and that in the Pyrenees, and xMC(i) is the indicator for snakes caught in the Massif Central. β2 is the difference between the expected mass in the Jura and that in the Pyrenees, xJura(i) is the indicator for snakes in the Jura, β3 is the slope of the regression of body mass on body length xbody in the Pyrenees, β4 is the difference in that slope between the Massif Central and the Pyrenees, and β5 is the difference of slopes between the Jura and the Pyrenees. Thus, snakes in the Pyrenees act as baseline with which snakes from the Massif Central and the Jura are compared, but as usual, this choice has no effect on inference.

11.2 DATA GENERATION

As always, we assume a balanced design simply for convenience of data generation.

n.groups <- 3
n.sample <- 10
n <- n.groups * n.sample                       # Total number of data points
x <- rep(1:n.groups, rep(n.sample, n.groups))  # Indicator for population
pop <- factor(x, labels = c("Pyrenees", "Massif Central", "Jura"))
length <- runif(n, 45, 70)                     # Obs. body length (cm) is rarely less than 45

We build the design matrix of an interactive combination of length and population, inspect that, and select the parameter values, i.e., choose values for αPyr, β1, β2, β3, β4, and β5.

Xmat <- model.matrix(~ pop*length)
print(Xmat, dig = 2)
beta.vec <- c(-250, 150, 200, 6, -3, -4)


Next, we build up the body mass measurements yi by adding the residual to the value of the linear predictor, with residuals drawn from an appropriate zero-mean normal distribution. The value of the linear predictor is obtained by matrix multiplication of the design matrix (Xmat) and the parameter vector (beta.vec). Our vipers are probably all too fat, but that doesn't really matter for our purposes.

lin.pred <- Xmat[,] %*% beta.vec        # Value of lin.predictor
eps <- rnorm(n = n, mean = 0, sd = 10)  # residuals
mass <- lin.pred + eps                  # response = lin.pred + residual
hist(mass)                              # Inspect what we've created
matplot(cbind(length[1:10], length[11:20], length[21:30]), cbind(mass[1:10],
   mass[11:20], mass[21:30]), ylim = c(0, max(mass)), ylab = "Body mass (g)",
   xlab = "Body length (cm)", col = c("Red","Green","Blue"), pch = c("P","M","J"),
   las = 1, cex = 1.2, cex.lab = 1.5)

We have created a data set in which vipers from the Pyrenees have the steepest slope between mass and length, followed by those from the Massif Central and finally those in the Jura mountains (Fig. 11.2). Now let's disassemble these data.

FIGURE 11.2 Simulated data set showing body mass versus length of 10 asp vipers in each of three populations (P = Pyrenees, M = Massif Central, and J = Jura).


11.3 ANALYSIS USING R

The code for an analysis in R is very parsimonious indeed:

summary(lm(mass ~ pop * length))

> summary(lm(mass ~ pop * length))

[...]

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept)              -246.6623    24.8588  -9.923 5.72e-10 ***

popMassif Central         197.5543    37.8767   5.216 2.41e-05 ***

popJura                   241.5563    39.1405   6.172 2.24e-06 ***

length                      5.9623     0.4323  13.792 6.66e-13 ***

popMassif Central:length   -3.8041     0.6631  -5.737 6.52e-06 ***

popJura:length             -4.7114     0.6595  -7.144 2.20e-07 ***

[...]

Residual standard error: 10.01 on 24 degrees of freedom

Multiple R-squared: 0.9114,  Adjusted R-squared: 0.8929

F-statistic: 49.37 on 5 and 24 DF,  p-value: 7.48e-12

These coefficients can directly be compared with the beta vector since we simulated the data exactly in the default effects format of a linear model specified in R. The residual standard deviation is called residual standard error by R.

beta.vec
cat("And the residual SD was 10 \n")

> beta.vec
[1] -250  150  200    6   -3   -4
> cat("And the residual SD was 10 \n")
And the residual SD was 10

11.4 ANALYSIS USING WinBUGS (AND A CAUTIONARY TALE ABOUT THE IMPORTANCE OF COVARIATE STANDARDIZATION)

In WinBUGS, I find it much easier to fit the means parameterization of the model, i.e., to specify three separate linear regressions for each mountain range. The effects (i.e., differences of intercept or slopes with reference to the Pyrenees) are trivially easy to recover as derived parameters by just adding a few WinBUGS code lines. This allows for better comparison between input and output values.


# Define model

sink("lm.txt")

cat("

model {

# Priors

for (i in 1:n.group){

alpha[i] ~ dnorm(0, 0.001) # Intercepts

beta[i] ~ dnorm(0, 0.001) # Slopes

}

sigma ~ dunif(0, 100) # Residual standard deviation

tau <- 1 / (sigma * sigma)

# Likelihood

for (i in 1:n) {

mass[i] ~ dnorm(mu[i], tau)

mu[i] <- alpha[pop[i]] + beta[pop[i]]*length[i]

}

# Derived quantities

# Define effects relative to baseline level

a.effe2 <- alpha[2] - alpha[1]   # Intercept Massif Central vs. Pyr.
a.effe3 <- alpha[3] - alpha[1]   # Intercept Jura vs. Pyr.
b.effe2 <- beta[2] - beta[1]     # Slope Massif Central vs. Pyr.
b.effe3 <- beta[3] - beta[1]     # Slope Jura vs. Pyr.

# Custom tests
test1 <- beta[3] - beta[2]       # Slope Jura vs. Massif Central
}
",fill = TRUE)

sink()

# Bundle data
win.data <- list(mass = as.numeric(mass), pop = as.numeric(pop), length = length,
                 n.group = max(as.numeric(pop)), n = n)

# Inits function
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta = rnorm(n.groups, 1, 1),
                          sigma = rlnorm(1))}

# Parameters to estimate
parameters <- c("alpha", "beta", "sigma", "a.effe2", "a.effe3", "b.effe2",
                "b.effe3", "test1")

# MCMC settings
ni <- 1200
nb <- 200
nt <- 2
nc <- 3


# Start Markov chains
out <- bugs(win.data, inits, parameters, "lm.txt", n.thin = nt, n.chains = nc,
            n.burnin = nb, n.iter = ni, debug = TRUE)

This is a simple model that converges rapidly. We inspect the results and compare them with the “truth” in the data-generating random process as well as with the inference from R …

print(out, dig = 3)               # Bayesian analysis

beta.vec                          # Truth in the data generating process

summary(lm(mass ~ pop * length))  # The ML solution again

… and are perplexed! WinBUGS claims that the Markov chains have converged (see Rhat values), but we get totally different estimates from what we should! Remember that alpha[1] and beta[1] in WinBUGS correspond to the intercept and the length main effect in the analysis in R, and a.effe2, a.effe3, b.effe2, and b.effe3 to the remaining terms of the analysis in R.

> print(out, dig = 3)             # Bayesian analysis

Inference for Bugs model at "lm.txt", fit using WinBUGS,

3 chains, each with 1200 iterations (first 200 discarded), n.thin = 2

n.sims = 1500 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

alpha[1] -100.524 31.918 -162.910 -122.400 -100.600 -79.105 -38.406 1.002 1500

alpha[2] -16.420 26.238 -65.813 -34.060 -16.670 1.595 34.926 1.000 1500

alpha[3] -2.613 26.127 -51.847 -20.862 -2.920 15.893 47.693 1.001 1500

beta[1] 3.442 0.557 2.336 3.066 3.443 3.813 4.540 1.002 1500

beta[2] 1.586 0.464 0.669 1.272 1.588 1.896 2.467 1.000 1500

beta[3] 1.205 0.438 0.367 0.885 1.207 1.503 2.059 1.002 1500

sigma 15.841 3.089 10.669 13.650 15.525 17.637 22.610 1.001 1500

a.effe2 84.103 38.764 6.313 57.752 84.620 110.300 160.082 1.002 1500

a.effe3 97.911 41.127 16.015 70.735 97.855 126.200 175.105 1.001 1500

b.effe2 -1.856 0.684 -3.178 -2.327 -1.856 -1.401 -0.479 1.002 1500

b.effe3 -2.237 0.708 -3.574 -2.716 -2.253 -1.777 -0.820 1.001 1500

test1 -0.381 0.633 -1.608 -0.782 -0.396 0.032 0.866 1.001 1500

deviance 249.071 8.258 232.847 243.300 249.000 254.800 264.552 1.000 1500

[ ... ]

> beta.vec # Truth in the data generating process

[1] -250  150  200    6   -3   -4

So lm() is able to recover the right parameter values, up to sampling and estimation error, but WinBUGS is not. Why is this?

The problem turns out to reside in the lack of standardization of the covariate length. In WinBUGS, it is always advantageous to scale


covariates so that their extremes are not too far away from zero; otherwise, there may be nonconvergence or other problems. And the ugly thing here is that from looking at the convergence diagnostics (Rhat), we would never have guessed that there was a problem!

This example illustrates how useful it is to check the consistency of one's inference from WinBUGS with other sources, e.g., estimates from a simpler, but similar model run in WinBUGS or maximum likelihood estimates from another software. Know thy model! Alternatively, we could also have plotted the estimated regression lines into the observed data and would have seen easily that something was wrong.
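A minimal sketch of such a plot, using the unstandardized fit in out from above (the mismatch between lines and data would be obvious at a glance):

plot(length, mass, col = as.numeric(pop), pch = 16, xlab = "Body length (cm)",
   ylab = "Body mass (g)")
for (j in 1:3){                  # Add the three estimated regression lines
   abline(out$mean$alpha[j], out$mean$beta[j], col = j, lwd = 2)
}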

As a quick check that lack of standardization was indeed the problem, we repeat both the maximum likelihood and the Bayesian analysis using a normalized version of the length covariate. This is simple; we just need to redefine the length covariate in the data list we pass to WinBUGS and can rerun the same code as before.

# Data passed to WinBUGS

win.data <- list(mass = as.numeric(mass), pop = as.numeric(pop),
                 length = as.numeric(scale(length)), n.group = max(as.numeric(pop)), n = n)

# Start Markov chains
out <- bugs(win.data, inits, parameters, "lm.txt", n.thin = nt, n.chains = nc,
            n.burnin = nb, n.iter = ni, debug = FALSE)

...

# Inspect results

print(out, dig = 3)

> print(out, dig = 3)

[ ... ]

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

alpha[1] 97.796 3.406 90.679 95.820 97.905 100.100 104.252 1.004 1500

alpha[2] 74.952 3.504 67.864 72.690 75.030 77.292 81.580 1.004 1200

alpha[3] 66.535 3.653 59.009 64.060 66.660 68.995 73.265 1.007 350

beta[1] 41.243 3.158 34.829 39.150 41.215 43.332 47.316 1.005 870

beta[2] 14.486 3.763 7.318 11.947 14.570 16.920 22.112 1.002 1500

beta[3] 8.868 3.820 1.635 6.401 8.803 11.322 16.816 1.000 1500

sigma 10.587 1.678 7.937 9.380 10.370 11.510 14.516 1.005 630

a.effe2 −22.844 4.756 −31.771 −25.962 −22.815 −19.718 −13.498 1.003 1200

a.effe3 −31.262 5.000 −41.826 −34.470 −31.260 −28.050 −21.384 1.005 400

b.effe2 −26.757 4.802 −36.065 −30.032 −26.800 −23.620 −16.660 1.002 1500

b.effe3 −32.375 4.907 −41.445 −35.853 −32.350 −29.345 −22.162 1.001 1500

test1 −5.618 5.334 −16.024 −9.007 −5.836 −2.101 5.108 1.001 1500

deviance 225.218 4.558 218.700 221.900 224.500 227.800 235.500 1.009 270

[ ... ]


Compare with MLEs (R output slightly edited):

print(lm(mass ~ pop * as.numeric(scale(length)))$coefficients, dig = 4)

...

> print(lm(mass ~ pop * as.numeric(scale(length)))$coefficients, dig = 4)

(Intercept) popMassif Central

98.94 −22.95

popJura as.numeric(scale(length))

−31.54 41.78

popMassif Central:as.numeric(scale(length)) popJura:as.numeric(scale(length))

−26.66 −33.01

Indeed, we now get consistent estimates in both analyses. Hence, previously the scale of the covariate didn't allow WinBUGS to converge, even though the Rhat values reported did indicate convergence. As a cautionary principle, we might therefore always consider to transform all covariates for WinBUGS, even if that slightly complicates presentation of results afterwards (for instance, in graphics). Transforming can mean centering, i.e., subtracting the mean, which changes the intercept only, but not the slope. Transforming can also mean normalizing, i.e., subtracting the mean and dividing the result by the standard deviation of the original covariate values. This changes both the intercept and the slope relative to an analysis with the original covariate. In the above case, centering will also work (want to try this out?).
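Here is a minimal sketch of the centering variant (only the data bundle changes; the model file, inits, and MCMC settings from above are reused; out.centered is just a hypothetical object name):

# Center length: subtract the mean, keep the original units
win.data <- list(mass = as.numeric(mass), pop = as.numeric(pop),
   length = as.numeric(length - mean(length)), n.group = max(as.numeric(pop)), n = n)
out.centered <- bugs(win.data, inits, parameters, "lm.txt", n.thin = nt, n.chains = nc,
   n.burnin = nb, n.iter = ni, debug = FALSE)
print(out.centered, dig = 3)  # Slopes now match lm(); intercepts refer to a snake of average length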

11.5 SUMMARY

In a key chapter for your understanding of the modeling of grouped data, we have looked at the general linear model, or ANCOVA, in R and WinBUGS. We have focused on a model with one discrete and one continuous predictor and their interaction. Understanding ANCOVA is an important intermediate step to understanding the linear mixed model in the next chapter.

EXERCISES

1. Probability of a parameter: What is the probability that the slope of the mass–length relationship of asp vipers is smaller in the Jura than in the Massif Central? Produce a graphical and a numerical answer.

2. Related models: Adapt the code to fit two variations of the model:
   • Fit different intercepts but a common slope
   • Fit the same intercept and the same slope


3. Quadratic effects: Add a quadratic term to the mass–length relationship, i.e., fit the model pop + (length + length^2). You do not need to reassemble a data set that contains an effect of length squared, but you can simply take the data set we have already created in this chapter.

4. Swiss hare data: Fit an ANCOVA (pop * year, with year as a continuous explanatory variable) to the mean density. Also compute residuals and plot them to check for outliers.


C H A P T E R

12

Linear Mixed-Effects Model

O U T L I N E

12.1 Introduction 151

12.2 Data Generation 154

12.3 Analysis under a Random-Intercepts Model 156
     12.3.1 REML Analysis Using R 156
     12.3.2 Bayesian Analysis Using WinBUGS 156

12.4 Analysis under a Random-Coefficients Model without Correlation between Intercept and Slope 158
     12.4.1 REML Analysis Using R 158
     12.4.2 Bayesian Analysis Using WinBUGS 159

12.5 The Random-Coefficients Model with Correlation between Intercept and Slope 161
     12.5.1 Introduction 161
     12.5.2 Data Generation 162
     12.5.3 REML Analysis Using R 163
     12.5.4 Bayesian Analysis Using WinBUGS 164

12.6 Summary 165

12.1 INTRODUCTION

Mixed-effects or mixed models contain factors, or more generally covariates, with both fixed and random effects. During the last 15 years or so, the use of mixed models has greatly increased in statistical applications in ecology and related disciplines (Pinheiro and Bates, 2000; McCulloch and Searle, 2001; Lee et al., 2006; Littell et al., 2008). As explained in Chapter 9, there may be at least three benefits to assuming


a set of parameters constitutes a random sample from some distribution, whose hyperparameters are then estimated as the main structural parameters of a model: increased scope of the inference, more honest accounting for system uncertainty, and efficiency of estimation.

In Chapter 9, we met a one-way analysis of variance model that, apart from an overall mean, contained only random effects and could be called a variance-components model. It could also be called a mixed model if the overall mean, the intercept, is viewed as a fixed effect, but this terminology is not standard. Here, we consider a classic mixed model that arises as a direct generalization of the analysis of covariance (ANCOVA) model in Chapter 11. We modify our asp viper (Fig. 12.1) data set from there just a bit and assume we now have measurements from a much larger number of populations, say, 56. A random-effects factor need not possess that many levels (some statisticians even fit a two-level factor such as sex as random; see Gelman, 2005), but one rarely sees fewer than, say, 5–10 or so parameters fitted as random effects. Estimating a variance from so few values, which are moreover unobserved, will not result in very precise estimates, and the estimates may even be biased (see also Lambert et al., 2005).

We resimulate some asp viper data using R code fairly similar to that in the previous chapter. However, we now constrain the values for at least one set of effects (intercepts and/or slopes) to come from a normal

FIGURE 12.1 Gravid female asp viper (Vipera aspis), France, 2008. (Photo T. Ott)


distribution: this is what the random-effects assumption means. There are at least three sets of assumptions that one may make about the random effects for the intercept and/or the slope of regression lines that are fitted to grouped (here, population-specific) data:

1. Only intercepts are random, but slopes are identical for all groups.
2. Both intercepts and slopes are random, but they are independent.
3. Both intercepts and slopes are random, and there is a correlation between them.

(An additional case, where slopes are random and intercepts are fixed, is not a sensible model in most circumstances.) Model No. 1 is often called a random-intercepts model, and both models No. 2 and 3 are also called random-coefficients models. Model No. 3 is the default in R's function lmer() in package lme4 when fitting a random-coefficients model.
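For orientation, the three variants can be sketched in lme4 syntax as follows (the mass, length, and pop objects are only created in the next section, so this is a sketch, not code to run yet):

library('lme4')
lmer(mass ~ length + (1 | pop))                       # 1. random intercepts, common slope
lmer(mass ~ length + (1 | pop) + (0 + length | pop))  # 2. independent random intercepts and slopes
lmer(mass ~ length + (length | pop))                  # 3. correlated random intercepts and slopes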

We now first generate a random-coefficients data set under model No. 2, where both intercepts and slopes are uncorrelated random effects. We then fit both a random-intercepts (No. 1) and a random-coefficients model without correlation (No. 2) to these data (see Sections 12.2–12.4). Then, we generate a second data set that includes a correlation between random intercepts and random slopes and adopt the random-coefficients model with correlation between intercepts and slopes (No. 3) to analyze it (see Section 12.5).

This is a key chapter for your understanding of mixed models, and I expect its contents to be helpful for the general understanding of mixed models to many ecologists. A close examination of how such data can be assembled (i.e., simulated) will be an invaluable help for understanding how analogous data sets are broken down (i.e., analyzed) using mixed models. Indeed, I believe that very few strategies can be more effective to understand this type of mixed model than the combination of simulating data sets and describing the models fitted in WinBUGS syntax.

Here is one way in which to write the random-coefficients model without correlation between the random effects for mass yi of snake i in population j:

yi = αj(i) + βj(i) · xi + εi

αj ~ Normal(μα, σ²α)    # Random effects for intercepts
βj ~ Normal(μβ, σ²β)    # Random effects for slopes

εi ~ Normal(0, σ²)      # Residual “random” effects

Exactly as in the ANCOVA model in Chapter 11, mass yi is related to body length xi of snake i by a straight-line relationship with population-specific values for intercept αj and slope βj. (These regression parameters vary by individual i according to their membership to population j.) However, both αj and βj are now each assumed to come from an independent normal distribution, with means μα and μβ and variances σ²α and σ²β, respectively.


The residuals εi for snake i are assumed to come from another independent normal distribution with variance σ².

12.2 DATA GENERATION

As always, we assume a balanced design for simple convenience, though that is not required to conduct mixed model analyses using restricted maximum likelihood (REML) in R or a Bayesian analysis in WinBUGS. Indeed, the flexibility with unbalanced data sets was one of the main reasons why REML-based mixed model estimation became so much more popular than estimation based on sums of squares decompositions (Littell et al., 2008).

n.groups <- 56                         # Number of populations
n.sample <- 10                         # Number of vipers in each pop
n <- n.groups * n.sample               # Total number of data points
pop <- gl(n = n.groups, k = n.sample)  # Indicator for population

We directly normalize covariate length to avoid trouble with WinBUGS.

# Body length (cm)
original.length <- runif(n, 45, 70)
mn <- mean(original.length)
sd <- sd(original.length)
cat("Mean and sd used to normalise original length:", mn, sd, "\n\n")
length <- (original.length - mn) / sd
hist(length, col = "grey")

We build a design matrix without intercept.

Xmat <- model.matrix(~ pop*length - 1 - length)
print(Xmat[1:21,], dig = 2)   # Print top 21 rows

Next, we choose parameter values, but this time, we need to constrain them, i.e., both the values for the intercepts and those for the slopes will be drawn from two normal distributions for which we specify four hyperparameters, i.e., two means (corresponding to μα and μβ) and two standard deviations (SDs) (corresponding to the square roots of σ²α and σ²β). As residual variation, we use a mean-zero normal distribution with SD of 30.

intercept.mean <- 230    # mu alpha
intercept.sd <- 20       # sigma alpha
slope.mean <- 60         # mu beta
slope.sd <- 30           # sigma beta
intercept.effects <- rnorm(n = n.groups, mean = intercept.mean, sd = intercept.sd)
slope.effects <- rnorm(n = n.groups, mean = slope.mean, sd = slope.sd)
all.effects <- c(intercept.effects, slope.effects)   # Put them all together


We assemble the measurements yi as before.

lin.pred <- Xmat[,] %*% all.effects     # Value of lin.predictor
eps <- rnorm(n = n, mean = 0, sd = 30)  # residuals
mass <- lin.pred + eps                  # response = lin.pred + residual
hist(mass, col = "grey")                # Inspect what we've created

We produce a trellis graph of the relationships in all n.groups populations (Fig. 12.2). Depending on the particular realization of the simulated stochastic system, we generally have quite a few fatties and may even have a few negative masses, but this doesn't really matter for our analysis.

library("lattice")
xyplot(mass ~ length | pop)

FIGURE 12.2 Trellis plot of the mass–length relationships in 56 asp viper populations (length has been standardized).


We can detect straight-line relationships between mass and length that differ among the 56 populations. What we can't see is the random-effects assumption built into the data set. That is, we are unable to distinguish a simple ANCOVA data set as in Chapter 11 from a mixed model data set as in this chapter.
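Because we simulated the data, we can at least inspect the population-level parameters directly, something that is impossible with a real data set. A small sketch using the objects created above:

# The random-effects assumption lives in these (normally distributed) population parameters
par(mfrow = c(1, 2))
hist(intercept.effects, col = "grey", main = "Population intercepts")
hist(slope.effects, col = "grey", main = "Population slopes")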

12.3 ANALYSIS UNDER A RANDOM-INTERCEPTS MODEL

12.3.1 REML Analysis Using R

We first assume that the slope of the mass–length relationship is identical in all populations and that only the intercepts differ randomly from one population to another.

library('lme4')
lme.fit1 <- lmer(mass ~ length + (1 | pop), REML = TRUE)
lme.fit1

> lme.fit1
Linear mixed model fit by REML
Formula: mass ~ length + (1 | pop)
  AIC  BIC logLik deviance REMLdev
 5873 5890  -2932     5872    5865
Random effects:
 Groups   Name        Variance Std.Dev.
 pop      (Intercept)  260.60  16.143
 Residual             1930.94  43.942
Number of obs: 560, groups: pop, 56

Fixed effects:
            Estimate Std. Error t value
(Intercept)  226.527      2.846   79.59
length        59.647      1.916   31.13

Correlation of Fixed Effects:
       (Intr)
length 0.000

12.3.2 Bayesian Analysis Using WinBUGS

In our Bayesian analysis of the random-intercepts model, we use a suitably wide uniform distribution as a prior for the standard deviation of the random-effects distribution (Gelman, 2006).


# Write model

sink("lme.model1.txt")

cat("

model {

# Priors

for (i in 1:ngroups){

alpha[i] ~ dnorm(mu.int, tau.int) # Random intercepts

}

mu.int ~ dnorm(0, 0.001) # Mean hyperparameter for random intercepts

tau.int <- 1 / (sigma.int * sigma.int)
sigma.int ~ dunif(0, 100)   # SD hyperparameter for random intercepts
beta ~ dnorm(0, 0.001)      # Common slope
tau <- 1 / (sigma * sigma)  # Residual precision

sigma ~ dunif(0, 100) # Residual standard deviation

# Likelihood

for (i in 1:n) {

mass[i] ~ dnorm(mu[i], tau) # The random variable

mu[i] <- alpha[pop[i]] + beta*length[i]   # Expectation
}
}
",fill = TRUE)

sink()

# Bundle data
win.data <- list(mass = as.numeric(mass), pop = as.numeric(pop), length = length,
                 ngroups = max(as.numeric(pop)), n = n)

# Inits function
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta = rnorm(1, 1, 1),
                          mu.int = rnorm(1, 0, 1), sigma.int = rlnorm(1), sigma = rlnorm(1))}

# Parameters to estimate
parameters <- c("alpha", "beta", "mu.int", "sigma.int", "sigma")

# MCMC settings
ni <- 2000
nb <- 500
nt <- 2
nc <- 3

# Start Gibbs sampling
out <- bugs(win.data, inits, parameters, "lme.model1.txt", n.thin = nt, n.chains = nc,
            n.burnin = nb, n.iter = ni, debug = TRUE)


# Inspect results

print(out, dig = 3)

> print(out, dig = 3)

Inference for Bugs model at "lme.model.txt", fit using WinBUGS,

3 chains, each with 2000 iterations (first 500 discarded), n.thin = 2

n.sims = 2250 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

[...]

beta 59.348 1.948 55.457 58.030 59.33 60.700 62.978 1.002 1300

mu.int 224.606 2.925 218.600 222.700 224.70 226.600 230.077 1.001 2200

sigma.int 16.507 2.760 11.582 14.580 16.42 18.230 22.356 1.012 180

sigma 44.120 1.409 41.570 43.150 44.08 45.040 46.958 1.002 1600

[...]

# Compare with input values

intercept.mean ; slope.mean ; intercept.sd ; slope.sd ; sd(eps)

> intercept.mean ; slope.mean ; intercept.sd ; slope.sd ; sd(eps)

[1] 230

[1] 60

[1] 20

[1] 30

[1] 29.86372

As usual with vague priors, the two analyses yield rather comparable results. Interestingly, the residual standard deviation in both is estimated too high. This is because we simulated the data to contain random variation among the slopes, but we did not fit this model, so this variation is unaccounted for and gets absorbed into the residual.
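A rough back-of-the-envelope check of this explanation (my own aside, relying on the standardized length having a variance of about 1): leaving the slope variation (SD of 30) unmodeled should inflate the apparent residual variance by roughly the slope variance times the variance of the covariate.

# Expected residual SD under the random-intercepts model when slope variation is ignored
sqrt(30^2 + 30^2 * var(length))   # approx. 42, close to the ~44 estimated above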

12.4 ANALYSIS UNDER A RANDOM-COEFFICIENTS MODEL WITHOUT CORRELATION BETWEEN INTERCEPT AND SLOPE

12.4.1 REML Analysis Using R

Next, we assume that both slopes and intercepts of the mass–length relationship differ among populations in the fashion of two independent random variables, i.e., we assume the absence of a correlation between intercept and slope. Thus, we will analyze the data under the same model that we used to generate our data set.

library('lme4')
lme.fit2 <- lmer(mass ~ length + (1 | pop) + (0 + length | pop))
lme.fit2

> lme.fit2
Linear mixed model fit by REML


Formula: mass ~ length + (1 | pop) + (0 + length | pop)
  AIC  BIC logLik deviance REMLdev
 5598 5619  -2794     5596    5588
Random effects:
 Groups   Name        Variance Std.Dev.
 pop      (Intercept)  274.37  16.564
 pop      length      1012.49  31.820
 Residual              875.96  29.597
Number of obs: 560, groups: pop, 56

Fixed effects:
            Estimate Std. Error t value
(Intercept)  228.698      2.579   88.68
length        59.774      4.461   13.40

Correlation of Fixed Effects:
       (Intr)
length 0.002

12.4.2 Bayesian Analysis Using WinBUGS

Finally, here is the Bayesian analysis of the simple random-coefficients model:

# Define model

sink("lme.model2.txt")

cat("

model {

# Priors

for (i in 1:ngroups){

alpha[i] ~ dnorm(mu.int, tau.int) # Random intercepts

beta[i] ~ dnorm(mu.slope, tau.slope) # Random slopes

}

mu.int ~ dnorm(0, 0.001) # Mean hyperparameter for random intercepts

tau.int <- 1 / (sigma.int * sigma.int)
sigma.int ~ dunif(0, 100)    # SD hyperparameter for random intercepts
mu.slope ~ dnorm(0, 0.001)   # Mean hyperparameter for random slopes
tau.slope <- 1 / (sigma.slope * sigma.slope)
sigma.slope ~ dunif(0, 100)  # SD hyperparameter for slopes
tau <- 1 / (sigma * sigma)   # Residual precision

sigma ~ dunif(0, 100) # Residual standard deviation

# Likelihood

for (i in 1:n) {


mass[i] ~ dnorm(mu[i], tau)
mu[i] <- alpha[pop[i]] + beta[pop[i]]*length[i]
}
}
",fill = TRUE)

sink()

# Bundle data
win.data <- list(mass = as.numeric(mass), pop = as.numeric(pop), length = length,
                 ngroups = max(as.numeric(pop)), n = n)

# Inits function
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta = rnorm(n.groups, 10, 2),
                          mu.int = rnorm(1, 0, 1), sigma.int = rlnorm(1),
                          mu.slope = rnorm(1, 0, 1), sigma.slope = rlnorm(1),
                          sigma = rlnorm(1))}

# Parameters to estimate
parameters <- c("alpha", "beta", "mu.int", "sigma.int", "mu.slope", "sigma.slope",
                "sigma")

# MCMC settings
ni <- 2000
nb <- 500
nt <- 2
nc <- 3

# Start Gibbs sampling
out <- bugs(win.data, inits, parameters, "lme.model2.txt", n.thin = nt, n.chains = nc,
            n.burnin = nb, n.iter = ni, debug = TRUE)

This is still a relatively simple model that converges rapidly.

print(out, dig = 2)

> print(out, dig = 2)

Inference for Bugs model at "lme.model2.txt", fit using WinBUGS,

3 chains, each with 2000 iterations (first 500 discarded), n.thin = 2

n.sims = 2250 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

[...]

mu.int 227.22 2.68 221.90 225.50 227.20 229.00 232.40 1.00 1400

sigma.int 17.05 2.29 12.96 15.41 16.94 18.52 21.80 1.00 2200

mu.slope 58.49 4.57 49.48 55.37 58.49 61.59 66.96 1.00 2200

sigma.slope 32.48 3.39 26.50 30.16 32.24 34.63 39.74 1.00 2200

sigma 29.67 1.01 27.77 28.95 29.65 30.34 31.66 1.00 2200

[...]

>


# Compare with input values

> intercept.mean ; slope.mean ; intercept.sd ; slope.sd ; sd(eps)

[1] 230

[1] 60

[1] 20

[1] 30

[1] 29.86372

The two sets of numbers agree rather nicely, as do the solutions obtained by lmer() and the input values. I emphasize again that using simulated data and successfully recovering the input values gives one the confidence that the analysis in WinBUGS has probably been specified correctly. For more complex models, this is helpful, since it's so easy to make mistakes!

Finally, we note that the realized values of the intercept and slope random effects are estimated and are returned by typing ranef(lme.fit2) for the analysis in R. They are also contained in the WinBUGS output that we get by typing print(out, dig = 2).
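For instance, a small sketch using the objects from above:

ranef(lme.fit2)   # Realized random-effects deviations from the REML fit
out$mean$alpha    # Posterior means of the 56 population intercepts
out$mean$beta     # Posterior means of the 56 population slopes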

12.5 THE RANDOM-COEFFICIENTS MODEL WITH CORRELATION BETWEEN INTERCEPT AND SLOPE

12.5.1 Introduction

The random-coefficients model with correlation is a simple extension of the previous model. The mass yi of snake i in population j is assumed to be described by the following relations:

yi = αj(i) + βj(i) · xi + εi

(αj, βj) ~ MVN(μ, S)      # Bivariate normal random effects

μ = (μα, μβ)              # Mean vector

S = | σ²α   σαβ |
    | σαβ   σ²β |         # Variance–covariance matrix

εi ~ Normal(0, σ²)        # Residual “random” effects

As before, the mass yi of snake i in population j is related to its body length xi by a straight-line relationship with population-specific values for intercept αj and slope βj. But now, pairs of αj and βj from the same population are assumed to come from a multivariate normal distribution (actually, here, a bivariate normal) with mean vector μ and variance–covariance matrix S. The latter contains the variances of the intercept (σ²α) and the slope (σ²β) in the diagonal and the covariance between αj and βj (σαβ) in the off-diagonals. As before, the residuals εi for snake i are assumed to


come from an independent (univariate) normal distribution with variance σ². The interpretation of the covariance is such that positive values indicate that populations with a greater intercept (i.e., heavier snakes on average) also tend to have a steeper mass–length relationship.

12.5.2 Data Generation

We generate data under the random-coefficients model.

n.groups <- 56
n.sample <- 10
n <- n.groups * n.sample
pop <- gl(n = n.groups, k = n.sample)

We generate the covariate length.

original.length <- runif(n, 45, 70)   # Body length (cm)
mn <- mean(original.length)
sd <- sd(original.length)
cat("Mean and sd used to normalise original length:", mn, sd, "\n\n")
length <- (original.length - mn) / sd
hist(length, col = "grey")

We build the same design matrix as before.

Xmat <- model.matrix(~ pop*length - 1 - length)
print(Xmat[1:21,], dig = 2)   # Print top 21 rows

We choose the parameter values, i.e., the population-specific intercepts and slopes, from a bivariate normal distribution (available in the R package MASS) whose hyperparameters (two means and the four cells of the variance–covariance matrix) we need to specify. We use again as residual variation a mean-zero normal distribution with SD of 30.
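As a side note (my own calculation from the values chosen below), a covariance of 10 combined with SDs of 20 and 30 implies only a very weak correlation between intercepts and slopes:

# Implied correlation rho = covariance / (SD of intercepts * SD of slopes)
10 / (20 * 30)   # about 0.017, i.e., almost no correlation is built into these data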

library(MASS)                       # Load MASS
?mvrnorm                            # Calls help file
intercept.mean <- 230               # Values for five hyperparameters
intercept.sd <- 20
slope.mean <- 60
slope.sd <- 30
intercept.slope.covariance <- 10
mu.vector <- c(intercept.mean, slope.mean)
var.cova.matrix <- matrix(c(intercept.sd^2, intercept.slope.covariance,
   intercept.slope.covariance, slope.sd^2), 2, 2)
effects <- mvrnorm(n = n.groups, mu = mu.vector, Sigma = var.cova.matrix)
effects                             # Look at what we've created
apply(effects, 2, mean)
var(effects)


intercept.effects <- effects[,1]
slope.effects <- effects[,2]
all.effects <- c(intercept.effects, slope.effects)   # Put them all together

Assemble the measurements yi.

lin.pred <- Xmat[,] %*% all.effects     # Value of lin.predictor
eps <- rnorm(n = n, mean = 0, sd = 30)  # residuals
mass <- lin.pred + eps                  # response = lin.pred + residual
hist(mass, col = "grey")                # Inspect what we've created

Again, negative masses are possible for some realizations of the data set. We look at the simulated data set:

library("lattice")
xyplot(mass ~ length | pop)

Now, we analyze this second data set allowing for a nonzero covariance between intercept and slope effects.

12.5.3 REML Analysis Using R

The model with an intercept–slope correlation is the default when specifying a random-coefficients model in R using function lmer().

library('lme4')
lme.fit3 <- lmer(mass ~ length + (length | pop))
lme.fit3

> lme.fit3
Linear mixed model fit by REML
Formula: mass ~ length + (length | pop)
  AIC  BIC logLik deviance REMLdev
 5624 5650  -2806     5620    5612
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 pop      (Intercept)  255.86  15.996
          length       652.01  25.534   0.333
 Residual              979.29  31.294
Number of obs: 560, groups: pop, 56

Fixed effects:
            Estimate Std. Error t value
(Intercept)  233.342      2.554   91.38
length        68.811      3.698   18.61

Correlation of Fixed Effects:
       (Intr)
length 0.254


12.5.4 Bayesian Analysis Using WinBUGS

Here is one way in which to specify a Bayesian analysis of the random-coefficients model with correlation. For a different and more general way to allow for correlation among two or more sets of random effects in a model, see Gelman and Hill (2007, p. 376–377).

# Define model

sink("lme.model3.txt")

cat("

model {

# Priors

for (i in 1:ngroups){

alpha[i] <- B[i,1]
beta[i] <- B[i,2]
B[i,1:2] ~ dmnorm(B.hat[i,], Tau.B[,])
B.hat[i,1] <- mu.int
B.hat[i,2] <- mu.slope

}

mu.int ~ dnorm(0, 0.001) # Hyperpriors for random intercepts

mu.slope ~ dnorm(0, 0.001) # Hyperpriors for random slopes

Tau.B[1:2,1:2] <- inverse(Sigma.B[,])
Sigma.B[1,1] <- pow(sigma.int,2)
sigma.int ~ dunif(0, 100)      # SD of intercepts
Sigma.B[2,2] <- pow(sigma.slope,2)
sigma.slope ~ dunif(0, 100)    # SD of slopes
Sigma.B[1,2] <- rho*sigma.int*sigma.slope
Sigma.B[2,1] <- Sigma.B[1,2]
rho ~ dunif(-1, 1)
covariance <- Sigma.B[1,2]
tau <- 1 / (sigma * sigma)     # Residual

sigma ~ dunif(0, 100) # Residual standard deviation

# Likelihood

for (i in 1:n) {

mass[i] ~ dnorm(mu[i], tau) # The "residual" random variable

mu[i] <- alpha[pop[i]] + beta[pop[i]]*length[i]   # Expectation
}
}
",fill = TRUE)

sink()

# Bundle data
win.data <- list(mass = as.numeric(mass), pop = as.numeric(pop), length = length,
                 ngroups = max(as.numeric(pop)), n = n)


# Inits function

inits <- function(){ list(mu.int = rnorm(1, 0, 1), sigma.int = rlnorm(1),
                          mu.slope = rnorm(1, 0, 1), sigma.slope = rlnorm(1),
                          rho = runif(1, -1, 1), sigma = rlnorm(1))}

# Parameters to estimate
parameters <- c("alpha", "beta", "mu.int", "sigma.int", "mu.slope", "sigma.slope",
                "rho", "covariance", "sigma")

# MCMC settings
ni <- 2000
nb <- 500
nt <- 2
nc <- 3

# Start Gibbs sampler
out <- bugs(win.data, inits, parameters, "lme.model3.txt", n.thin = nt, n.chains = nc,
            n.burnin = nb, n.iter = ni, debug = TRUE)

We inspect the results and compare them with the frequentist analysis in Section 12.5.3 and find the usual comforting agreement between the two approaches (note that rho in the Bayesian analysis has to be compared with Corr in the frequentist analysis).

print(out, dig = 2)   # Bayesian analysis
lme.fit3              # Frequentist analysis

As usual, the approximate solution given by lmer() comes reasonably close to the exact solution from the Bayesian analysis (Gelman and Hill, 2007). Even though convergence is achieved fairly rapidly in the latter, it often takes much longer to obtain the exact Bayesian solution in a mixed model analysis. So there is a price to pay for enjoying the advantages of the Bayesian analysis, and this price can be fairly high when using WinBUGS to fit more complex models.

For some realizations of the data set, the covariance may be estimated at a negative value, even though we've built the data with a positive covariance in the parent (statistical) population of intercepts and slopes. This is a reflection of both sampling variation and estimation error. Also look at how imprecise the estimate for the covariance is; covariances are even harder to estimate than variances. R doesn't return a standard error for that estimate.
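The Bayesian fit, in contrast, carries a full posterior distribution for the covariance; a small sketch using the out object from above:

out$mean$covariance   # Posterior mean of the intercept-slope covariance
out$sd$covariance     # Posterior SD, a direct measure of its (im)precision
hist(out$sims.list$covariance, col = "grey", main = "Posterior of the covariance")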

12.6 SUMMARY

We have introduced the classic mixed ANCOVA model with random intercepts, random slopes, and the possibility of an intercept–slope covariance. Understanding the material presented in this chapter is essential for


a thorough understanding of much of the current mixed modeling in ecology. The ideas presented here appear over and over again, in later chapters of this book, as well as in the applied work of many quantitative ecologists.

EXERCISES

1. Specification of fixed and random effects in WinBUGS: The WinBUGS model description for the random-intercepts, random-slopes model (i.e., the second one we fit in this chapter) is very similar to the fixed-effects "version" of the same model, i.e., the one we fitted in Chapter 11. Without looking at the WinBUGS model description in that chapter, take the linear mixed model description for WinBUGS from the current chapter and change it back to a fixed-effects ANCOVA with population-specific intercepts and slopes, i.e., corresponding to what you would fit in R as lm(mass ~ pop*length).

2. Swiss hare data: Fit a random-coefficients regression without intercept–slope correlation to mean density (i.e., ~ population * year, with year continuous).


C H A P T E R

13

Introduction to the Generalized Linear Model: Poisson “t-test”

O U T L I N E

13.1 Introduction 167

13.2 An Important but Often Forgotten Issue with Count Data 170

13.3 Data Generation 170

13.4 Analysis Using R 171

13.5 Analysis Using WinBUGS 171
     13.5.1 Check of Markov Chain Monte Carlo Convergence and Model Adequacy 173
     13.5.2 Inference Under the Model 174

13.6 Summary 177

13.1 INTRODUCTION

The unification of a large number of statistical methods such as regression, analysis of variance (ANOVA), and analysis of covariance (ANCOVA) under the umbrella of the general linear model was a big advancement for applied statistics. However, even more significant was the unification of an even wider range of statistical methods within the class of the generalized linear model or GLM in 1972 by Nelder and Wedderburn (also see McCullagh and Nelder, 1989). They showed that a large number of techniques previously thought of as representing quite separate types of analyses, including logistic regression, multinomial regression, Chi-square tests, log-linear models, as well as the general linear model,


could all be represented as special cases of a generalized version of a linear model. In that way, much of what was well understood for the linear model could be carried over to that much larger class of models.

The two main ideas of the GLM are that, first, a transformation of the expectation of the response E(y) is expressed as a linear combination of covariate effects rather than the expected (mean) response itself. And second, for the random part of the model, distributions other than the normal can be chosen, e.g., Poisson or binomial.

Formally, a GLM is described by the following three components:

1. a statistical distribution is used to describe the random variation in the response y; this is the stochastic part of the system description,

2. a so-called link function g, that is applied to the expectation of the response E(y), and

3. a linear predictor, which is a linear combination of covariate effects that are thought to make up g(E(y)); this is the systematic or deterministic part of the system description.

Binomial, Poisson, and normal are probably the three most widely used statistical distributions in a GLM (see Chapter 6). The former two are distributions for non-negative, discrete responses and therefore suitable to describe counts. The normal is the most widely used distribution for continuous responses such as measurements. The three most widely used link functions are the identity, the logit (= log(odds) = log(x/(1 − x))), and the log. For various reasons, one link function is typically advantageous, although not obligate, for each of these distributions. For instance, the normal distribution combined with an identity link yields the general linear model; the Poisson with a log link yields a log-linear model; and the binomial with a logit link yields a logistic regression. Hence, all the normal linear models seen in Chapters 4–11 are simply special cases of a GLM.
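In R, these three classic combinations can be sketched with glm() as follows (x, y, C, and z are made-up toy objects, used here only to show the family argument):

x <- rep(0:1, each = 10)                            # A binary covariate
y <- rnorm(20) ; C <- rpois(20, 5) ; z <- rbinom(20, 1, 0.5)
glm(y ~ x, family = gaussian(link = "identity"))    # General linear model
glm(C ~ x, family = poisson(link = "log"))          # Log-linear (Poisson) model
glm(z ~ x, family = binomial(link = "logit"))       # Logistic regression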

In the next nine chapters, we will go through a progression from simple to more complex models for Poisson and binomial responses. As for normal linear models, we begin again with what might be called a “Poisson t-test” in the sense that it consists of a comparison of two groups. To better see the analogy with the normal linear model, we start by writing the model for the normal two-group comparison (see Chapter 7) in GLM format:

1. Distribution: yi ~ Normal(μi, σ²)
2. Link function: identity, i.e., μi = E(yi) = linear predictor
3. Linear predictor: α + β · xi

Next, we generalize this model to count data. The inferential situation considered is that of counts (C) of Brown hares (Fig. 13.1) in a sample of 10 arable and 10 grassland study areas. We wonder whether hare density depends on land use.


The typical distribution assumed for such counts is a Poisson, which applies when counted things are distributed independently and randomly and samples of equal size are taken randomly. Then, the number of hares counted per study area (C) will be described by a Poisson. The Poisson has a single parameter, the expected count λ, which is often called the intensity and here represents the mean hare density. In contrast to the normal, the Poisson variance is not a free parameter but is equal to the mean λ. For a Poisson-distributed random variable C, we write C ~ Poisson(λ).
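As a quick sanity check (a sketch only, not part of the original analysis), the mean–variance equality of the Poisson is easy to verify by simulation in R:

sim <- rpois(n = 100000, lambda = 4)   # a large Poisson sample with lambda = 4
mean(sim)                              # close to 4
var(sim)                               # also close to 4: the variance equals the mean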

If hare density depends on land-use, i.e., is different in arable and grassland areas, the assumption of a constant mean density across all 20 study areas is not realistic. And in a “Poisson t-test” we are specifically interested in whether hare density differs between grassland and arable areas. Therefore, here is a model for hare count Ci in area i:

1. Distribution: Ci ~ Poisson(λi)
2. Link function: log, i.e., log(λi) = log(E(Ci)) = linear predictor
3. Linear predictor: α + β · xi

In words, hare count Ci in area i is distributed as a Poisson random variable with mean E(Ci) = λi. The log-transformation of λi is assumed to be a linear function α + β · xi, where α and β are unknown constants and xi is the value of an area-specific covariate. If xi is an indicator for arable areas, then α becomes the mean hare density on a log-scale in grassland areas and β, again on a log-scale, is the difference in mean density between the two land-use types.

FIGURE 13.1 Brown hare (Lepus europaeus), Germany, 2008. (Photo N. Zbinden)


13.2 AN IMPORTANT BUT OFTEN FORGOTTEN ISSUE WITH COUNT DATA

Whenever we interpret λ as the mean hare density, we make the implicit assumption that every individual hare is indeed seen, i.e., that detection probability (p) is equal to 1. This is not very likely for hares nor indeed for any wild animal because typically some individuals are overlooked (Yoccoz et al., 2001; Kéry, 2002; Williams et al., 2002; Kéry and Juillerat, 2004; Schmidt, 2005, 2008). Alternatively, we may assume that the proportion of hares overlooked per area is the same, on average, in both land-use types. In that case, counts are considered just an index to absolute density, i.e., a measure of relative density, and what we model as the Poisson parameter λi is in reality the product of absolute hare density and the proportion p of hares seen. Only by making the assumption that p is identical, on average, in both land-use types may we validly interpret a mean difference between counts in arable and grassland areas as an indication of a difference in true hare density. See Chapters 20 and 21 for more on this important topic, the distinction between the imperfectly observed true state and the observed data, or, between the ecological and the observation processes underlying ecological field data.

13.3 DATA GENERATION

For now, we simulate and analyze hare counts under the assumption that detectability is perfect. First we need an indicator for land-use:

n.sites <- 10
x <- gl(n = 2, k = n.sites, labels = c("grassland", "arable"))
n <- 2*n.sites

Let the mean hare density in grassland and arable areas be 2 and 5 hares, respectively. Then, α = log(2) = 0.69 and log(5) = α + β, thus, β = log(5) − log(2) = 0.92. Therefore, the expected density λi is given by:

lambda <- exp(0.69 + 0.92*(as.numeric(x)-1))   # x has levels 1 and 2, not 0 and 1
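Equivalently (a small sketch of the same arithmetic; the object names alpha.true and beta.true are just illustrative), the coefficients can be computed from the target densities instead of being typed in by hand:

alpha.true <- log(2)                     # log mean density in grassland: 0.69
beta.true <- log(5) - log(2)             # difference arable vs. grassland on the log scale: 0.92
lambda <- exp(alpha.true + beta.true*(as.numeric(x)-1))   # same values as above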

We add the noise that comes from a Poisson distribution and inspect the hare counts we’ve thus generated (Fig. 13.2):

C <- rpois(n = n, lambda = lambda)         # Add Poisson noise
aggregate(C, by = list(x), FUN = mean)     # The observed means
boxplot(C ~ x, col = "grey", xlab = "Land use", ylab = "Hare count", las = 1)

Again, we can get a feel for the strong effects of chance (sampling variation) by repeatedly generating hare counts and observing by how much they vary from one sample of 20 counts to another sample of 20 counts.


13.4 ANALYSIS USING R

We fit the “Poisson t-test” using the R function glm(..., family = poisson). To test whether mean density in grassland differs from that in arable areas, we use the t-test provided by the function summary() or a likelihood ratio test from anova(). There is no big difference here in terms of the inferences.

poisson.t.test <- glm(C ~ x, family = poisson)   # Fit the model
summary(poisson.t.test)                          # t-Test
anova(poisson.t.test, test = "Chisq")            # Likelihood ratio test (LRT)

13.5 ANALYSIS USING WinBUGS

Let’s now fit the “Poisson t-test” in WinBUGS. To do this, we will take the code from the normal t-test (Chapter 7) and adapt it to the Poisson GLM case. In addition, we will do two more things in the WinBUGS program below:

1. compute Pearson residuals to assess model fit and
2. conduct a posterior predictive check including a Bayesian p-value as we did for normal linear regression in Chapter 8 (and will do for a “generalized” Poisson regression in Chapter 21).

FIGURE 13.2 Relationship between hare count and land use.


# Define model
sink("Poisson.t.test.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.001)
 beta ~ dnorm(0,0.001)

# Likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i])
    log(lambda[i]) <- alpha + beta *x[i]

# Fit assessments
    Presi[i] <- (C[i] - lambda[i]) / sqrt(lambda[i])           # Pearson residuals
    C.new[i] ~ dpois(lambda[i])                                # Replicate data set
    Presi.new[i] <- (C.new[i] - lambda[i]) / sqrt(lambda[i])   # Pearson residuals for replicate data set
    D[i] <- pow(Presi[i], 2)
    D.new[i] <- pow(Presi.new[i], 2)
 }

# Add up discrepancy measures
 fit <- sum(D[])
 fit.new <- sum(D.new[])
}
",fill = TRUE)
sink()

# Bundle data
win.data <- list(C = C, x = as.numeric(x)-1, n = length(x))

# Inits function
inits <- function(){ list(alpha = rlnorm(1), beta = rlnorm(1))}

# Parameters to estimate
params <- c("lambda","alpha", "beta", "Presi", "fit", "fit.new")

# MCMC settings
nc <- 3
ni <- 3000
nb <- 1000
nt <- 2

# Start Gibbs sampler
out <- bugs(data = win.data, inits = inits, parameters.to.save = params,
model.file = "Poisson.t.test.txt", n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni,
debug = TRUE)


13.5.1 Check of Markov Chain Monte Carlo Convergence and Model Adequacy

The first two things to do before even looking at the estimates of a model fitted using MCMC should really be to check (1) that the Markov chains have converged and (2) that the fitted model is adequate for the data set. We do both here in an exemplary manner.

Convergence—Again, we can assess convergence by graphical means (typically directly within WinBUGS) or using a numerical summary, the Brooks–Gelman–Rubin statistic, which R2WinBUGS calls Rhat. Rhat is about 1 at convergence, with 1.1 often taken as an acceptable threshold. We will look at the Rhat values first.

print(out, dig = 3)

If we briefly look at the second to last column in the table, we see that the chains for all parameters seem to have converged admirably. For larger models with many more parameters, a summary of this summary table may be useful. For instance, we may ask which (if any) parameters have a value of Rhat greater than 1.1. Or we can draw a histogram of the Rhat values.

which(out$summary[,8] > 1.1)     # which value in the 8th column is > 1.1 ?

> which(out$summary[,8] > 1.1)
named integer(0)                 # So here we have none

hist(out$summary[,8], col = "grey", main = "Rhat values")

So, as expected in this simple model fitted to a “good” data set, there is no problem with convergence.

Residuals and posterior predictive check—We proceed analogously to what we did in the normal linear regression example in Chapter 8. That is, we plot the residuals first and then plot the two fit statistics (for the actual data set and for the perfect, new, data sets) against each other and compute the Bayesian p-value as a numerical summary of overall lack of fit. The fit statistic for the new data sets represents, in a way, the reference distribution for the chosen test statistic, here, the sum of squared Pearson residuals.

For GLMs other than the normal linear model, the variability of the response depends on the mean response. To get residuals with approximately constant variance, Pearson residuals are often computed. They are obtained by dividing the raw residuals (yi − E(yi)) by the standard deviation of yi, which for the Poisson is √λi; see also the WinBUGS code above.

plot(out$mean$Presi, las = 1)
abline(h = 0)


There is no obvious sign of lack of fit for any particular data point (Fig. 13.3). Next, we conduct a posterior predictive check (Fig. 13.4):

plot(out$sims.list$fit, out$sims.list$fit.new, main = "Posterior predictive check\nfor sum of squared Pearson residuals", xlab = "Discrepancy measure for actual data set", ylab = "Discrepancy measure for perfect data sets")
abline(0, 1, lwd = 2, col = "black")

Of course, this looks perfect and computation of the Bayesian p-value (below) confirms this impression. Here, we compute the Bayesian p-value outside WinBUGS in R. This is easier, but of course, we need to have saved the Markov chains for both fit and fit.new.

mean(out$sims.list$fit.new > out$sims.list$fit)

> mean(out$sims.list$fit.new > out$sims.list$fit)
[1] 0.624

13.5.2 Inference Under the Model

Now that we are convinced that the model is adequate for these data, we inspect the estimates and compare them with what we put into the data set, as well as what the frequentist analysis in R tells us.

print(out, dig = 3)

A comparison of the Bayesian solution with the input values that were used for generating the data set (α = 0.69, β = 0.92) and the solution given by glm() shows a reasonably decent consistency (in view of the small sample size).

FIGURE 13.3 Pearson residuals for the hare counts.


summary(poisson.t.test)

So is there a difference in hare density according to land-use? Let’s look at the posterior distribution of the coefficient for arable (Fig. 13.5).

hist(out$sims.list$beta, col = "grey", las = 1, xlab = "Coefficient for arable", main = "")

The posterior distribution does not overlap zero, so arable sites really do appear to have a different hare density than grassland sites. The same conclusion is arrived at when looking at the 95% credible interval of β in the summary of the analysis mentioned earlier: (0.64–1.72).

Finally, we will form predictions for presentation. Predictions are the expected values of the response under certain conditions, such as for particular covariate values. We have seen earlier that predictions are a valuable means for synthesizing the information that a model extracts from a data set. In a Bayesian analysis, forming predictions is easy. Predictions are just another form of unobservables, such as parameters or missing values. Therefore, we can base inference about predictions on their posterior distributions.

To summarize what we have learned about the differences in hare densities in grassland and arable study areas, we plot the posterior distributions of the expected hare counts (λ) for both habitat types (Fig. 13.6). We obtain the expected hare counts by exponentiating parameter α and α + β, respectively.

FIGURE 13.4 Graphical posterior predictive check based on the sum of squared Pearson residuals. The Bayesian p-value (0.62) is the proportion of points above the line. The hard boundary on the left is due to the fact that the discrepancy measure cannot be smaller than that corresponding to the maximum likelihood estimate.


par(mfrow = c(2,1))
hist(exp(out$sims.list$alpha), main = "Grassland study areas", col = "grey", xlab = "", xlim = c(0,10), breaks = 20)
hist(exp(out$sims.list$alpha + out$sims.list$beta), main = "Arable study areas", col = "grey", xlab = "Expected hare count", xlim = c(0,10), breaks = 20)

FIGURE 13.6 Posterior distribution of mean hare density in grassland (top) and arable areas (bottom).

FIGURE 13.5 Posterior distribution of the coefficient of arable.


13.6 SUMMARY

We have introduced the generalized linear model, or GLM, where effects of covariates are linear in the transformed expectation of a response, which may come from a distribution other than the normal. The GLM is another key concept that appears over and over again in modern applied statistics in empirical sciences such as ecology. Therefore, we will deepen our understanding of this essential model class in subsequent chapters. Furthermore, we will combine the GLM and the mixed model to arrive at the most complex model considered in this book, the generalized linear mixed model, in Chapters 16 and 19–21.

EXERCISES

1. Predictions: Within the WinBUGS code, add a line that directly computes the mean hare density in arable areas.
2. Derived quantities: Summarize the posterior distribution for the difference in mean hare density in grassland and arable areas.
3. Zeroes (migrating raptors): This fine example is borrowed from Bernardo (2003). It beautifully illustrates the power of Bayesian inference based on the posterior distribution of the unobservables (parameters, etc.). Ornithologists frequently count migrating birds of prey at places where they concentrate in spring or autumn, e.g., along coasts, mountain ridges, or on isthmuses. Assume that at a certain place, one rare raptor species had not been seen during 10 consecutive days. What is the probability that we see at least one on day 11? What is the probability that we see two or more? In your solution, make explicit your reasoning for using the particular statistical model you choose and discuss a few of its assumptions that may not hold in reality (e.g., serial independence, constancy of rates).
4. Zeroes (contrast estimate): Assume that no hare was ever observed in grassland areas, i.e., that the counts in all 10 grassland areas were zero. Try to fit the Poisson t-test using R and using WinBUGS.
5. Swiss hare data: Compare mean counts (not density) in arable and in grassland areas in one selected year (e.g., 2000; take the smallest counts when there is more than one per year). When taking these counts as observations from an identical Poisson distribution, is there anything that strikes you as inadequate?


C H A P T E R

14
Overdispersion, Zero-Inflation, and Offsets in the GLM

O U T L I N E

14.1 Overdispersion
   14.1.1 Introduction
   14.1.2 Data Generation
   14.1.3 Analysis Using R
   14.1.4 Analysis Using WinBUGS
14.2 Zero-Inflation
   14.2.1 Introduction
   14.2.2 Data Generation
   14.2.3 Analysis Using R
   14.2.4 Analysis Using WinBUGS
14.3 Offsets
   14.3.1 Introduction
   14.3.2 Data Generation
   14.3.3 Analysis Using R
   14.3.4 Analysis Using WinBUGS
14.4 Summary

Two features specific to nonnormal generalized linear models (GLMs) are overdispersion and offsets. Zero-inflation can be called a specific form of overdispersion: there are more zeroes than expected. Here, we briefly deal with each of them in the context of the hare counts example.


14.1 OVERDISPERSION

14.1.1 Introduction

In both distributions commonly used to model counts (Poisson and binomial), the dispersion (the variability in the counts) is not a free parameter but instead is a function of the mean. The variance is equal to the mean (λ) for the Poisson and equal to the mean (N · p) times 1 − p for the binomial distribution (see Chapters 17–19). This means that for a Poisson or binomial random variable, the models for the counts come with a “built-in” variability and the magnitude of that variability is known. In an analysis of deviance conducted in a classical statistical analysis of the model, the residual deviance of the model will be about the same magnitude as the residual degrees of freedom, i.e., the mean deviance ratio (= residual deviance/residual df) is about 1.
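For instance (a quick sketch, assuming the Poisson t-test fit to the hare counts from Chapter 13 is still available), the mean deviance ratio can be computed directly from a fitted glm() object:

poisson.t.test$deviance / poisson.t.test$df.residual   # about 1 for Poisson data; values much greater than 1 suggest overdispersion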

However, in real life, count data are almost always more variable than expected under the Poisson or binomial models. This is called overdispersion or extra-Poisson or extra-binomial variation and means that the residual variation is larger than prescribed by a Poisson or binomial. Overdispersion can occur because there are hidden correlations that have not been included in the model, e.g., when individuals in family groups are assumed to be independent, or when important covariates have not been included. When overdispersion is not modeled, tests and confidence intervals will be overconfident (although means won’t normally be biased). Therefore, overdispersion should be tested and corrected for when necessary.

The simplest way to correct for overdispersion in a classical analysis is by the quasi-likelihood (McCullagh and Nelder, 1989) and by using family = quasipoisson (or quasibinomial) in the R function glm(). Using WinBUGS, there are several ways in which one can account for overdispersion. One is to specify a distribution that is overdispersed relative to the Poisson, such as the negative binomial. Another solution, and the one we illustrate here, is to add into the linear predictor for the Poisson intensity a normally distributed random effect. Technically, this model is then a Poisson generalized linear mixed model (GLMM; see Chapter 16 for a more formal introduction). It is sometimes called a Poisson-lognormal model (Millar 2009).
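For comparison, the negative binomial route could be taken in a classical analysis along these lines (a hedged sketch only, not part of the book's analysis; it assumes the MASS package is available and uses the overdispersed counts C.OD generated in the next section):

library(MASS)
nb.fit <- glm.nb(C.OD ~ x)   # negative binomial GLM; theta is the estimated dispersion parameter
summary(nb.fit)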

14.1.2 Data Generation

We generate a slightly modified hare count data set, where in addition to the land-use difference in mean density, there is also a normally distributed site-specific effect in the linear predictor. For illustrative purposes, we also generate a sister data set without overdispersion.


n.site <- 10
x <- gl(n = 2, k = n.site, labels = c("grassland", "arable"))
eps <- rnorm(2*n.site, mean = 0, sd = 0.5)                 # Normal random effect
lambda.OD <- exp(0.69 + (0.92*(as.numeric(x)-1) + eps) )
lambda.Poisson <- exp(0.69 + (0.92*(as.numeric(x)-1)) )    # For comparison

We add the noise that comes from a Poisson and inspect the hare counts we’ve generated (Fig. 14.1):

C.OD <- rpois(n = 2*n.site, lambda = lambda.OD)
C.Poisson <- rpois(n = 2*n.site, lambda = lambda.Poisson)
par(mfrow = c(1,2))
boxplot(C.OD ~ x, col = "grey", xlab = "Land-use", main = "With OD", ylab = "Hare count", las = 1, ylim = c(0, max(C.OD)))
boxplot(C.Poisson ~ x, col = "grey", xlab = "Land-use", main = "Without OD", ylab = "Hare count", las = 1, ylim = c(0, max(C.OD)) )

14.1.3 Analysis Using R

We conduct a classical analysis of the overdispersed data once without and then with correction for overdispersion (using glm(..., family = quasi...)).

glm.fit.no.OD <- glm(C.OD ~ x, family = poisson)
glm.fit.with.OD <- glm(C.OD ~ x, family = quasipoisson)
summary(glm.fit.no.OD)

FIGURE 14.1 Hare counts by land use with and without overdispersion (OD). Overdispersion was caused by site-specific differences in hare density.


summary(glm.fit.with.OD)
anova(glm.fit.no.OD, test = "Chisq")
anova(glm.fit.with.OD, test = "F")

> summary(glm.fit.no.OD)

Call:
glm(formula = C.OD ~ x, family = poisson)

[ ... ]

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.9933     0.1925   5.161 2.46e-07 ***
xarable       0.7646     0.2330   3.282  0.00103 **

[ ... ]

(Dispersion parameter for poisson family taken to be 1)

[ ... ]

> summary(glm.fit.with.OD)

Call:
glm(formula = C.OD ~ x, family = quasipoisson)

[ ... ]

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.9933     0.2596   3.826  0.00124 **
xarable       0.7646     0.3143   2.433  0.02563 *

[ ... ]

(Dispersion parameter for quasipoisson family taken to be 1.819569)

    Null deviance: 44.198  on 19  degrees of freedom
Residual deviance: 32.627  on 18  degrees of freedom
[ ... ]

> anova(glm.fit.no.OD, test = "Chisq")
Analysis of Deviance Table

Model: poisson, link: log

[ ... ]

     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    19     44.198
x     1   11.571        18     32.627     0.001

> anova(glm.fit.with.OD, test = "F")
Analysis of Deviance Table

Model: quasipoisson, link: log

[ ... ]

     Df Deviance Resid. Df Resid. Dev      F  Pr(>F)
NULL                    19     44.198
x     1   11.571        18     32.627 6.3591 0.02132 *

Thus, the parameter estimates don’t change when accounting for overdispersion, but tests and standard errors do.
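The link between the two sets of standard errors is worth noting: the quasi-Poisson standard errors are the Poisson standard errors inflated by the square root of the estimated dispersion parameter. A quick check with the numbers printed above:

sqrt(1.819569)            # about 1.349
0.2330 * sqrt(1.819569)   # about 0.314, the quasi-Poisson SE of xarable shown above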

14.1.4 Analysis Using WinBUGS

In WinBUGS, it is easy to get from the simple Poisson t-test with homogeneous (Poisson) variance in the last chapter to the overdispersed Poisson t-test represented by the Poisson-lognormal model.

# Define model
sink("Poisson.OD.t.test.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.001)
 beta ~ dnorm(0,0.001)
 sigma ~ dunif(0, 10)
 tau <- 1 / (sigma * sigma)

# Likelihood
 for (i in 1:n) {
    C.OD[i] ~ dpois(lambda[i])
    log(lambda[i]) <- alpha + beta *x[i] + eps[i]
    eps[i] ~ dnorm(0, tau)
 }
}
",fill = TRUE)
sink()

# Bundle data
win.data <- list(C.OD = C.OD, x = as.numeric(x)-1, n = length(x))

# Inits function
inits <- function(){ list(alpha = rlnorm(1), beta = rlnorm(1), sigma = rlnorm(1))}

# Parameters to estimate
params <- c("lambda","alpha", "beta", "sigma")

Note that as soon as we start estimating variances (here, of the overdispersion effects eps), we need longer chains.


# MCMC settings
nc <- 3      # Number of chains
ni <- 3000   # Number of draws from posterior per chain
nb <- 1000   # Number of draws to discard as burn-in
nt <- 5      # Thinning rate

# Start Gibbs sampling
out <- bugs(data = win.data, inits = inits, parameters.to.save = params,
model.file = "Poisson.OD.t.test.txt", n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni,
debug = TRUE)

print(out, dig = 3)

14.2 ZERO-INFLATION

14.2.1 Introduction

Zero-inflation can be called a specific form of overdispersion and is frequently found in count data. It means that there are more zeroes than expected under the assumed (e.g., Poisson or binomial) distribution. In the context of our hare counts, a typical explanation for excess zeroes is that some sites are simply not suitable for hares, such as paved parking lots, roof tops, or lakes; hence, resulting counts must be zeroes. In the remaining suitable sites, counts vary according to the assumed distribution. Thus, we may imagine a sequential genesis of zero-inflated counts: first, Nature determines whether a site may be occupied at all, and second, she selects the counts for those that are habitable in principle. Regression models that account for this kind of overdispersion are often called zero-inflated Poisson (ZIP) or zero-inflated binomial (ZIB) models.

A ZIP model for count Ci at site i can be written algebraically like this:

wi ~ Bernoulli(ψi)        Suitability of a site    (14.1)
Ci ~ Poisson(wi · λi)     Observed counts          (14.2)

For each site i, Nature flips a coin that lands heads (i.e., wi = 1) with probability ψi. We can’t observe wi perfectly, i.e., it is a latent or random effect. Only for sites with wi = 1, Nature then rolls her Poisson(λi) die to determine the count Ci at that site. For sites with wi = 0, the Poisson mean is 0 × λi = 0 and the corresponding Poisson die produces zero counts only.

We see that a ZIP model simply represents a set of two coupled GLMs: the logistic regression describes the suitability in principle of a site while the Poisson regression describes the variation of counts among suitable sites, i.e., those with wi = 1. All the usual GLM features apply, and in particular, both the Bernoulli and the Poisson parameter can be expressed as a function of covariates on the link scale. These covariates may or may not be the same for both regressions.


I make four notes on the ZIP model in our context of hare counts: First, the model allows for two entirely different kinds of zero counts; those coming from the Bernoulli and those from the Poisson process. The former are zero counts at unsuitable sites while the latter are due to Poisson chance, i.e., for them, Nature’s Poisson die happened to yield a zero. The actual distribution of an organism, i.e., the proportion of sites that is occupied (has nonzero counts), is a function of both processes. Hence, it would be wrong to say that Eqn. 14.1 describes the distribution and Eqn. 14.2 the abundance.

Second, the above ZIP model is a hierarchical, or random-effects, model with binary instead of normal random effects. It is an example of the kind of nonstandard GLMMs that are featured extensively in Chapters 20 and 21. There, we will see the site-occupancy species distribution model, another kind of zero-inflated GLM, but one where a Bernoulli or binomial distribution is zero-inflated with another Bernoulli, so we get a zero-inflated binomial (ZIB) model.

Third, some authors advocate ZIP models widely for inference about count data (Martin et al., 2005; Joseph et al., 2009). However, on ecological grounds, they appear most adequate in situations where unknown environmental covariates determine the suitability of a site. If covariates are known and have been measured, they are probably best added to the linear model for the Poisson mean. Distribution, or occurrence, is fundamentally a function of abundance, i.e., a species occurs at all sites where abundance is greater than zero. It appears contrived to model distribution completely separately from abundance.

Fourth, there is a variant of a ZIP model called the hurdle model (Zeileis et al., 2008), where the first step in the hierarchical genesis of the counts is assumed to be the same as in a ZIP model, i.e., wi ~ Bernoulli(ψi). But then, counts at suitable sites (i.e., with wi = 1) are modeled as coming from a zero-truncated Poisson distribution, i.e., a Poisson for values excluding zero. Hurdles (thresholds) other than zero are also possible. Superficially, this model may appear “better” than a ZIP model because it only allows one kind of zero: that coming from the Bernoulli process. However, it posits that all sites that are suitable in principle will be occupied and have a count greater than 0. This is not sensible biologically because, in reality, a suitable site may well be unoccupied as a result of local extinction, dispersal limitation, or some other reason.
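For completeness, a hurdle model can also be fit in R with the pscl package (a sketch only, not part of the book's analysis; it assumes the zero-inflated hare counts C and covariate x generated in the next section):

library(pscl)
hm <- hurdle(C ~ x | 1, dist = "poisson")   # zero-truncated Poisson counts, constant hurdle part
summary(hm)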

14.2.2 Data Generation

We generate the simplest kind of zero-inflated count data for our (Poisson) hare example. We assume different densities in arable and grassland areas and a constant zero-inflation, i.e., a single value of ψ for all sites, regardless of land-use or other environmental covariates.


psi <- 0.8
n.site <- 20
x <- gl(n = 2, k = n.site, labels = c("grassland", "arable"))

For each site, we flip a coin to determine its suitability and store the result in the latent state wi.

w <- rbinom(n = 2*n.site, size = 1, prob = psi)

We assume identical effects of arable and grass as before and generate expected counts at suitable sites as before.

lambda <- exp(0.69 + (0.92*(as.numeric(x)-1)) )

We then add up (actually, multiply) the effects of both processes (Bernoulli and Poisson) and inspect the counts we’ve generated. Note how all counts at unsuitable sites (with wi = 0) are zero.

C <- rpois(n = 2*n.site, lambda = w * lambda)
cbind(x, w, C)

14.2.3 Analysis Using R

A wide range of ZIP and related models can be fitted in R using the function zeroinfl() in package pscl; see, for instance, Zeileis et al. (2008). We load that package and fit the simplest possible ZIP model.

library(pscl)
fm <- zeroinfl(C ~ x | 1, dist = "poisson")
summary(fm)

> summary(fm)

Call:
zeroinfl(formula = C ~ x | 1, dist = "poisson")

Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.7441     0.1773   4.197 2.71e-05 ***
xarable       0.8820     0.2095   4.209 2.56e-05 ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.6786     0.4943  -3.396 0.000684 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of iterations in BFGS optimization: 8
Log-likelihood: -78.4 on 3 Df


Because of sampling and estimation error, the coefficients for the count model (corresponding to Eqn. 14.2) may not always be very close to the input values. Also note that what pscl calls the coefficient in the zero-inflation model corresponds to 1 − ψ in Eqn. 14.1. Typing plogis(-1.6786) in R convinces us that the function is doing what it should do.
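Concretely, using the value from the output above:

plogis(-1.6786)       # inverse logit: about 0.157, the estimated probability that a site is unsuitable (1 - psi)
1 - plogis(-1.6786)   # about 0.843, the estimated psi, close to the value of 0.8 used to generate the data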

14.2.4 Analysis Using WinBUGS

Next, the solution in WinBUGS. As always, the elementary manner of model specification using the BUGS language makes it very clear what model is fitted. To make the parameter estimates directly comparable, we also add a line that computes the logit of the zero-inflation parameter in R from the parameter ψ that we use here.

# Define model
sink("ZIP.txt")
cat("
model {

# Priors
 psi ~ dunif(0,1)
 alpha ~ dnorm(0,0.001)
 beta ~ dnorm(0,0.001)

# Likelihood
 for (i in 1:n) {
    w[i] ~ dbern(psi)
    C[i] ~ dpois(eff.lambda[i])
    eff.lambda[i] <- w[i]*lambda[i]
    log(lambda[i]) <- alpha + beta *x[i]
 }

# Derived quantity
 R.lpsi <- logit(1-psi)
}
",fill = TRUE)
sink()

# Bundle data
win.data <- list(C = C, x = as.numeric(x)-1, n = length(x))

# Inits function
inits <- function(){ list(alpha = rlnorm(1), beta = rlnorm(1), w = rep(1, 2*n.site))}

We will also estimate the latent state wi, i.e., the intrinsic suitability for brown hares at each site.


# Parameters to estimate
params <- c("lambda","alpha", "beta", "w", "psi", "R.lpsi")

# MCMC settings (need fairly long chains)
nc <- 3       # Number of chains
ni <- 50000   # Number of draws from posterior per chain
nb <- 10000   # Number of draws to discard as burn-in
nt <- 4       # Thinning rate

# Start WinBUGS
out <- bugs(data = win.data, inits = inits, parameters.to.save = params,
model.file = "ZIP.txt", n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE)

print(out, dig = 3)

> print(out, dig = 3)
Inference for Bugs model at "ZIP.txt", fit using WinBUGS,
 3 chains, each with 50000 iterations (first 10000 discarded), n.thin = 4
 n.sims = 30000 iterations saved
          mean    sd   2.5%    25%    50%    75%  97.5%  Rhat n.eff
[ ... ]
alpha    0.733 0.175  0.384  0.617  0.736  0.853  1.066 1.001 11000
beta     0.884 0.208  0.479  0.744  0.883  1.024  1.300 1.001 30000
[ ... ]
psi      0.827 0.064  0.689  0.786  0.831  0.873  0.936 1.001 30000
R.lpsi  -1.635 0.484 -2.684 -1.931 -1.596 -1.302 -0.795 1.001 30000

We find pretty similar estimates between R and WinBUGS.

14.3 OFFSETS

14.3.1 Introduction

In a Poisson GLM, we assume that the expected counts are adequately described by the effect of the covariates in the model. However, frequently the “counting window” is not constant, e.g., study areas don’t have the same size or, in temporal samples, the duration of counting periods differs. To account for this known component of variation in the conditional Poisson mean, we define the log of the size of the “counting window” (study area size, count duration) as an offset. Effectively, we then model density as a response.

Let’s consider this for the hare counts using algebra. The Poisson GLM is Ci ~ Poisson(λi), i.e., hare counts Ci are conditionally distributed as Poisson with expected count λi. When study areas differ in size, we have Ci ~ Poisson(Ai · λi), where Ai is the area of study area i. Therefore, the linear predictor becomes log(Ai · λi) = log(Ai) + log(λi). If we also wish to model a covariate x into the mean, we get log(Ai · λi) = log(Ai) + α + β · xi. This is equivalent to forcing the coefficient of log(area) to be equal to 1. That is, we effectively fit the model log(Ai · λi) = β0 · log(Ai) + α + β · xi with β0 = 1. The offset compensates for the additional and known variation in the response resulting from differing study area size.
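In R, the offset can be supplied either as an argument to glm(), as in the next section, or inside the model formula; the two calls below fit the same model (a sketch, assuming the counts C, covariate x, and areas A generated in the next section):

glm(C ~ x, family = poisson, offset = log(A))
glm(C ~ x + offset(log(A)), family = poisson)   # identical model, offset written into the formula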

14.3.2 Data Generation

n.site <- 10
A <- runif(n = 2*n.site, 2, 5)        # Areas range in size from 2 to 5 km2
x <- gl(n = 2, k = n.site, labels = c("grassland", "arable"))
linear.predictor <- log(A) + 0.69 + (0.92*(as.numeric(x)-1))
lambda <- exp(linear.predictor)
C <- rpois(n = 2*n.site, lambda = lambda)   # Add Poisson noise

14.3.3 Analysis Using R

We use R for an analysis with and without consideration of the differing areas.

glm.fit.no.offset <- glm(C ~ x, family = poisson)
glm.fit.with.offset <- glm(C ~ x, family = poisson, offset = log(A))
summary(glm.fit.no.offset)
summary(glm.fit.with.offset)
anova(glm.fit.with.offset, test = "Chisq")   # LRT

Comparing the residual deviance of the two models makes clear that specification of an offset represents a sort of correction for a systematic kind of overdispersion.

14.3.4 Analysis Using WinBUGS

Note how simple it is in WinBUGS to jump from one kind of analysis for the hare counts to another.

# Define model
sink("Offset.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.001)
 beta ~ dnorm(0,0.001)

# Likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i])
    log(lambda[i]) <- 1 * logA[i] + alpha + beta *x[i]   # Note offset
 }
}
",fill = TRUE)
sink()

# Bundle data
win.data <- list(C = C, x = as.numeric(x)-1, logA = log(A), n = length(x))

# Inits function
inits <- function(){ list(alpha = rlnorm(1), beta = rlnorm(1))}

# Parameters to estimate
params <- c("lambda","alpha", "beta")

# MCMC settings
nc <- 3      # Number of chains
ni <- 1100   # Number of draws from posterior
nb <- 100    # Number of draws to discard as burn-in
nt <- 2      # Thinning rate

# Start Gibbs sampling
out <- bugs(data = win.data, inits = inits, parameters.to.save = params,
model.file = "Offset.txt", n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni,
debug = TRUE)

print(out, dig = 3)

14.4 SUMMARY

Overdispersion, zero-inflation, and offsets are important GLM topics. The specification of the associated models in WinBUGS is fairly easy and clarifies the actual meaning of these three topics. This is not usually the case when fitting these models in a canned routine in R or another software. Thus, this is another example of where the simple model specification in the BUGS language enforces an understanding of the fitted model that is easily lost in other stats packages.

EXERCISES

1. Estimating a coefficient for an offset covariate: Forcing the coefficient of area to be 1 implies the assumption that hare density is unaffected by area. However, we can also estimate a coefficient for log(area) rather than setting it to 1. For instance, in real life, density may well differ between large and small areas, perhaps because predator density is also related to area. This hypothesis could be tested by fitting a coefficient for log(area) and seeing whether it differs from 1. Adapt the WinBUGS code to achieve this. A Bayesian analysis is extremely suited for this type of question (i.e., to test whether a parameter has a value other than what it is expected to be under a certain hypothesis).
2. Swiss hare data: Fit a model that contains both overdispersion, modeled as a log-normal random effect, and an offset of log(area), when estimating the difference in mean density between arable and grassland study areas in one year, e.g., 2000 (use a single count per year and area). In a variant of that analysis, estimate the coefficient of log(area) to test whether hare density depends on area.


C H A P T E R

15
Poisson ANCOVA

O U T L I N E

15.1 Introduction
15.2 Data Generation
15.3 Analysis Using R
15.4 Analysis Using WinBUGS
   15.4.1 Fitting the Model
   15.4.2 Forming Predictions
15.5 Summary

15.1 INTRODUCTION

We may call a Poisson analysis of covariance (ANCOVA) a Poisson regression with both discrete and continuous covariates. In most practical applications, Poisson models will have several covariates and of both types. Therefore, here we look at this important variety of a generalized linear model (GLM). To stress the similarity with the normal linear case, we only slightly alter the inferential setting sketched in Chapter 11. We assume that instead of measuring body mass in asp vipers in three populations in the Pyrenees, Massif Central, and the Jura mountains, leading to a normal model, we had instead assessed ectoparasite load in a dragonfly, the Sombre Goldenring (Fig. 15.1), leading to a Poisson model. We are particularly interested in whether there are more or less little red mites on dragonflies of different size (expressed as wing length) and whether this relationship differs among the three mountain ranges. (Actually, dragonflies don’t vary that much in body size, but let’s assume there is sufficient variation to make such a study worthwhile.)


We will fit the following model to mite count Ci on individual i:

1. Distribution: Ci ~ Poisson(λi)
2. Link function: log, i.e., log(λi) = log(E(Ci)) = linear predictor
3. Linear predictor: log(λi) = αPyr + β1 · xMC + β2 · xJura + β3 · xwing + β4 · xwing · xMC + β5 · xwing · xJura

Note the great similarity between this model and the one we fitted to the mass of asps in Chapter 11. Apart from the link function, the main other difference is simply that for this model we don’t have a dispersion term; the Poisson already comes with a built-in variability. We could model overdispersion in mite counts by using a Poisson-lognormal formulation as in the previous chapter, but we omit such added complexity here.

15.2 DATA GENERATION

We assemble the data set.

n.groups <- 3
n.sample <- 100
n <- n.groups * n.sample

FIGURE 15.1 Sombre Goldenring (Cordulegaster bidentata), Switzerland, 1995. (Photo F. Labhardt)


x <- rep(1:n.groups, rep(n.sample, n.groups))    # Population indicator
pop <- factor(x, labels = c("Pyrenees", "Massif Central", "Jura"))
length <- runif(n, 4.5, 7.0)          # Wing length (cm)
length <- length - mean(length)       # Centre by subtracting the mean

We build the design matrix of an interactive combination of length and population:

Xmat <- model.matrix(~ pop*length)
print(Xmat, dig = 2)

Select the parameter values, i.e., choose values for αPyr, β1, β2, β3, β4, β5

beta.vec <- c(-2, 1, 2, 5, -2, -7)

Here’s the recipe for assembling the mite counts in three steps:

1. we add up all components of the linear model to get the linear predictor, which is the expected mite count on a (natural) log scale,
2. we exponentiate to get the actual value of the expected mite count, and
3. we add Poisson noise.

We again obtain the value of the linear predictor by matrix multiplication of the design matrix (Xmat) and the parameter vector (beta.vec). (As in the viper example, don’t be too harsh on me in case we achieve unnatural levels of parasitation …):

lin.pred <- Xmat[,] %*% beta.vec       # Value of lin.predictor
lambda <- exp(lin.pred)                # Poisson mean
C <- rpois(n = n, lambda = lambda)     # Add Poisson noise

# Inspect what we've created
par(mfrow = c(2,1))
hist(C, col = "grey", breaks = 30, xlab = "Parasite load", main = "", las = 1)
plot(length, C, pch = rep(c("P","M","J"), each = n.sample), las = 1, col =
rep(c("Red","Green","Blue"), each = n.sample), ylab = "Parasite load", xlab = "Wing length", cex = 1.2)
par(mfrow = c(1,1))

We have created a data set where parasite load increases with wing length in the South (Pyrenees, Massif Central) but decreases in the North (Jura mountains); see Fig. 15.2.


15.3 ANALYSIS USING R

Again, the R code to fit the model is very parsimonious and the estimates resemble the input values reasonably well (coefficients can be compared directly with beta.vec). Obviously, with larger sample sizes the correspondence would be better still (you could try that out, e.g., by setting n.sample = 1000).

summary(glm(C ~ pop * length, family = poisson))
beta.vec

> summary(glm(C ~ pop * length, family = poisson))

Call:
glm(formula = C ~ pop * length, family = poisson)

Deviance Residuals:
     Min       1Q   Median       3Q      Max
 -2.1471  -0.6680  -0.1961   0.2141   2.8843

FIGURE 15.2 Top: Frequency distribution of ectoparasite load in Sombre Goldenrings. Bottom: Relationship between parasite load and wing length (deviation from mean, in cm) in three mountain ranges (P = Pyrenees, M = Massif Central, J = Jura mountains).


Coefficients:
                          Estimate Std. Error  z value Pr(>|z|)
(Intercept)                -1.8540     0.2554   -7.260 3.87e-13 ***
popMassif Central           0.6386     0.3463    1.844   0.0652 .
popJura                     1.8046     0.2851    6.330 2.46e-10 ***
length                      4.8199     0.2505   19.244  < 2e-16 ***
popMassif Central:length   -1.6938     0.3516   -4.817 1.46e-06 ***
popJura:length             -6.9059     0.2859  -24.156  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2402.99  on 299  degrees of freedom
Residual deviance:  205.35  on 294  degrees of freedom
AIC: 684.25

Number of Fisher Scoring iterations: 5

> beta.vec
[1] -2  1  2  5 -2 -7

Don’t forget that the difference between the coefficients and the beta vector is due to the combined effect of sampling and estimation error. The former means that due to natural variation, 300 dragonflies sampled from large populations cannot possibly represent the population values perfectly.

15.4 ANALYSIS USING WinBUGS

15.4.1 Fitting the Model

We simply adapt the code for the normal linear case (Chapter 11) to the Poisson case and again fit a reparameterized model with three separate log-linear regressions.

# Define model
sink("glm.txt")
cat("
model {

# Priors
 for (i in 1:n.groups){
    alpha[i] ~ dnorm(0, 0.01)    # Intercepts
    beta[i] ~ dnorm(0, 0.01)     # Slopes
 }

# Likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i])      # The random variable
    lambda[i] <- exp(alpha[pop[i]] + beta[pop[i]]* length[i])
 }                               # Note double indexing: alpha[pop[i]]

# Derived quantities
# Recover effects relative to baseline level (no. 1)
 a.effe2 <- alpha[2] - alpha[1]  # Intercept Massif Central vs. Pyr.
 a.effe3 <- alpha[3] - alpha[1]  # Intercept Jura vs. Pyr.
 b.effe2 <- beta[2] - beta[1]    # Slope Massif Central vs. Pyr.
 b.effe3 <- beta[3] - beta[1]    # Slope Jura vs. Pyr.

# Custom test
 test1 <- beta[3] - beta[2]      # Slope Jura vs. Massif Central
}
",fill = TRUE)
sink()

# Bundle data
win.data <- list(C = C, pop = as.numeric(pop), n.groups = n.groups, length = length, n = n)

# Inits function
inits <- function(){list(alpha = rlnorm(n.groups,3,1), beta = rlnorm(n.groups,2,1))}

# Parameters to estimate
params <- c("alpha", "beta", "a.effe2", "a.effe3", "b.effe2", "b.effe3", "test1")

# MCMC settings
ni <- 4500
nb <- 1500
nt <- 5
nc <- 3

# Start Gibbs sampling
out <- bugs(data = win.data, inits = inits, parameters.to.save = params, model.file =
"glm.txt", n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE)

This is still a simple model that converges fairly rapidly. We inspect the results and compare them with “truth” in the data-generating random process, as well as with the inference from R’s function glm().

print(out, dig = 3)     # Bayesian analysis
beta.vec                # Truth in data-generating process
print(glm(C ~ pop * length, family = poisson)$coef, dig = 4)   # The ML solution


Remember that alpha[1] and beta[1] in WinBUGS correspond to the intercept and the length main effect in the analysis in R, and a.effe2, a.effe3, b.effe2, b.effe3 to the remaining terms of the analysis in R. To ease comparison, these terms are shown in boldface.

> print(out, dig = 3)   # Bayesian analysis
Inference for Bugs model at "glm.txt", fit using WinBUGS,
 3 chains, each with 4500 iterations (first 1500 discarded), n.thin = 5
 n.sims = 1800 iterations saved
            mean    sd   2.5%    25%    50%    75%  97.5%  Rhat n.eff
alpha[1]  -1.844 0.261 -2.384 -2.016 -1.852 -1.666 -1.360 1.017   120
alpha[2]  -1.252 0.233 -1.744 -1.399 -1.235 -1.089 -0.821 1.016   130
alpha[3]  -0.050 0.130 -0.321 -0.136 -0.046  0.037  0.186 1.002  1400
beta[1]    4.808 0.258  4.329  4.628  4.809  4.974  5.329 1.015   140
beta[2]    3.159 0.245  2.712  2.987  3.141  3.322  3.661 1.015   140
beta[3]   -2.087 0.141 -2.367 -2.176 -2.083 -1.987 -1.811 1.002   940
a.effe2    0.592 0.344 -0.063  0.356  0.590  0.825  1.264 1.005   440
a.effe3    1.793 0.291  1.231  1.597  1.793  1.989  2.371 1.015   140
b.effe2   -1.649 0.352 -2.324 -1.887 -1.647 -1.403 -0.970 1.004   530
b.effe3   -6.895 0.296 -7.470 -7.092 -6.891 -6.690 -6.336 1.011   190
test1     -5.246 0.282 -5.823 -5.439 -5.240 -5.044 -4.720 1.015   140
deviance 678.382 3.450 673.597 675.800 677.800 680.200 686.202 1.003 1200

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat = 1).

DIC info (using the rule, pD = Dbar - Dhat)
pD = 6.1 and DIC = 684.5
DIC is an estimate of expected predictive error (lower deviance is better).

> beta.vec   # Truth in data-generating process
[1] -2  1  2  5 -2 -7

> print(glm(C ~ pop * length, family = poisson)$coef, dig = 4)   # The ML solution
             (Intercept)        popMassif Central                  popJura
                 -1.8540                   0.6386                   1.8046
                  length popMassif Central:length           popJura:length
                  4.8199                  -1.6938                  -6.9059

As expected, we find fairly concurrent estimates between the two modes of inference.

15.4.2 Forming Predictions

Finally, let’s summarize our main findings from the analysis in a graph. We illustrate the posterior distribution of the relationship between mite load and wing length for each of the three study areas. To do that, we predict mite load for 100 dragonflies in each of the three mountain ranges and plot these estimates along with their uncertainty. We compute the predicted relationship between mite count and wing-length for a sample of 100 of all Markov chain Monte Carlo (MCMC) draws of the involved parameters and plot that (Fig. 15.3).

# Create a vector with 100 wing lengths
original.wlength <- sort(runif(100, 4.5, 7.0))
wlength <- original.wlength - 5.75   # 5.75 is approximate mean (correct would be that from the original data really)

# Create matrices to contain prediction for each winglength and MCMC iteration
sel.sample <- sample(1:1800, size = 100)
mite.load.Pyr <- mite.load.MC <- mite.load.Ju <- array(dim = c(100, 100))

# Fill in these vectors: this is clumsy, but it works
for(i in 1:100) {
   for(j in 1:100) {
      mite.load.Pyr[i,j] <- exp(out$sims.list$alpha[sel.sample[j],1] +
         out$sims.list$beta[sel.sample[j],1] * wlength[i])
      mite.load.MC[i,j] <- exp(out$sims.list$alpha[sel.sample[j],2] +
         out$sims.list$beta[sel.sample[j],2] * wlength[i])
      mite.load.Ju[i,j] <- exp(out$sims.list$alpha[sel.sample[j],3] +
         out$sims.list$beta[sel.sample[j],3] * wlength[i])
   }
}

FIGURE 15.3 Predicted relationship between mite load and the size of a dragonfly in three mountain ranges: Pyrenees (red), Massif Central (green), and Jura mountains (blue). Shaded grey lines illustrate the uncertainty in these relationships based on a random sample of 100 from the posterior distribution and colored lines represent posterior means.


matplot(original.wlength, mite.load.Pyr, col = "grey", type = "l", las = 1, ylab =
"Expected mite load", xlab = "Wing length (cm)")
for(j in 1:100){
   points(original.wlength, mite.load.MC[,j], col = "grey", type = "l")
   points(original.wlength, mite.load.Ju[,j], col = "grey", type = "l")
}
points(original.wlength, exp(out$mean$alpha[1] + out$mean$beta[1] * wlength), col =
"red", type = "l", lwd = 2)
points(original.wlength, exp(out$mean$alpha[2] + out$mean$beta[2] * wlength), col =
"green", type = "l", lwd = 2)
points(original.wlength, exp(out$mean$alpha[3] + out$mean$beta[3] * wlength), col =
"blue", type = "l", lwd = 2)

I find this a rather nice plot, but if an editor asks for conventional 95% credible intervals instead, you can give the results of this code:

LCB.Pyr <- apply(mite.load.Pyr, 1, quantile, prob = 0.025)
UCB.Pyr <- apply(mite.load.Pyr, 1, quantile, prob = 0.975)
LCB.MC <- apply(mite.load.MC, 1, quantile, prob = 0.025)
UCB.MC <- apply(mite.load.MC, 1, quantile, prob = 0.975)
LCB.Ju <- apply(mite.load.Ju, 1, quantile, prob = 0.025)
UCB.Ju <- apply(mite.load.Ju, 1, quantile, prob = 0.975)

mean.rel <- cbind(exp(out$mean$alpha[1] + out$mean$beta[1] * wlength),
exp(out$mean$alpha[2] + out$mean$beta[2] * wlength), exp(out$mean$alpha[3] +
out$mean$beta[3] * wlength))
covar <- cbind(original.wlength, original.wlength, original.wlength)

matplot(original.wlength, mean.rel, col = c("red", "green", "blue"), type = "l",
lty = 1, lwd = 2, las = 1, ylab = "Expected mite load", xlab = "Wing length (cm)")
points(original.wlength, LCB.Pyr, col = "grey", type = "l", lwd = 1)
points(original.wlength, UCB.Pyr, col = "grey", type = "l", lwd = 1)
points(original.wlength, LCB.MC, col = "grey", type = "l", lwd = 1)
points(original.wlength, UCB.MC, col = "grey", type = "l", lwd = 1)
points(original.wlength, LCB.Ju, col = "grey", type = "l", lwd = 1)
points(original.wlength, UCB.Ju, col = "grey", type = "l", lwd = 1)

15.5 SUMMARY

We have generalized the general linear model from the normal to the Poisson case to model the effects on grouped counts of a continuous covariate. The changes involved in doing so in WinBUGS were minor, and the inclusion of further covariates is straightforward. The Poisson ANCOVA is an important intermediate step for your understanding of the Poisson generalized linear mixed model.

EXERCISES

1. Multiple Poisson regression: Invent a new covariate called xnonsense, for instance, and fit this model: yi = αj + βj · xwing + δ · xnonsense. You can simply create values for xnonsense by sampling a normal or uniform random variable; you don’t need to assemble a new data set.
2. Polynomial Poisson regression: In addition to the linear effect (on a log-scale) of wing length, check for a quadratic effect also. You may or may not reassemble the data to include that effect.
3. Swiss hare data: Take a single count per year and site and fit a Poisson ANCOVA to the counts by expressing counts as a function of site and year. You may ignore the variable site area or incorporate this source of variation by fitting an offset.



C H A P T E R

16
Poisson Mixed-Effects Model

(Poisson GLMM)

O U T L I N E

16.1 Introduction 203

16.2 Data Generation 205

16.3 Analysis Under a Random-Coefficients Model 206
16.3.1 Analysis Using R 207
16.3.2 Analysis Using WinBUGS 207

16.4 Summary 209

16.1 INTRODUCTION

The Poisson generalized linear mixed model (GLMM) is an extension of the Poisson generalized linear model (GLM) to include at least one additional source of random variation over and above the random variation intrinsic to a Poisson distribution. Here, we adopt a Poisson GLMM to analyze a set of long-term population surveys of Red-backed shrikes (Fig. 16.1).

We assume that pair counts over 30 years were available in each of 16 shrike populations (again, the balanced design is for convenience only). Our intent is to model population trends. First, we write down the random-coefficients model without correlation between the intercepts and slopes. This model is very similar to that for the normal linear case that we examined extensively in Chapter 12. Thanks to how we specify models in the BUGS language, this similarity is more evident than when fitting the model with a canned routine such as in R.



We model Ci, the number of pairs of Red-backed shrikes counted in year i in study area j:

1. Distribution: Ci ~ Poisson(λi)
2. Link function: log, i.e., log(λi) = log(E(Ci)) = linear predictor
3. Linear predictor: αj(i) + βj(i) · xi
4. Submodel for parameters/distribution of random effects:
   αj ~ Normal(μα, σ²α)
   βj ~ Normal(μβ, σ²β)

So the GLMM is just the same as a simple GLM, but with the added submodels for the log-linear intercept and slope parameters that we use to describe the population trends. We don't add a year-specific "residual" to the linear predictor. This could be done to account for random year effects and would be a first step towards modeling serial autocorrelation, e.g., by imposing an autoregressive structure on successive random year effects (Littell et al., 2008), but we omit this added complexity here. Also, we don't model a correlation between intercepts and slopes.

In conducting this analysis, we implicitly make one of two assumptions. Either we assume that we find all shrike pairs in every year and study area, or we assume that at least the proportion of pairs overlooked does not vary among years or study areas in a systematic way, i.e., that there is no time trend in detection probability. Otherwise, the interpretation of these data would be difficult or impossible. For an alternative protocol for data collection and associated analysis method that doesn't require these potentially restrictive assumptions, see the binomial mixture model in Chapter 21.

FIGURE 16.1 Male Red-backed shrike (Lanius collurio), Switzerland, 2004. (Photo A. Saunier)

16.2 DATA GENERATION

We generate data under the random-coefficients model without correlation between the intercepts and slopes.

n.groups <- 16
n.years <- 30
n <- n.groups * n.years
pop <- gl(n = n.groups, k = n.years)

We standardize the year covariate to a range from zero to one.

original.year <- rep(1:n.years, n.groups)
year <- (original.year - 1) / 29

We build a design matrix without the intercept.

Xmat <- model.matrix(~ pop * year - 1 - year)
print(Xmat[1:91,], dig = 2)		# Print top 91 rows

Next, we draw the intercept and slope parameter values from their normal distributions, whose hyperparameters we need to select.

intercept.mean <- 3			# Choose values for the hyperparams
intercept.sd <- 1
slope.mean <- -2
slope.sd <- 0.6

intercept.effects <- rnorm(n = n.groups, mean = intercept.mean, sd = intercept.sd)
slope.effects <- rnorm(n = n.groups, mean = slope.mean, sd = slope.sd)
all.effects <- c(intercept.effects, slope.effects)	# Put them all together

We assemble the counts Ci by first computing the linear predictor, then exponentiating it and finally adding Poisson noise. Then, we look at the data.

lin.pred <- Xmat[,] %*% all.effects		# Value of lin.predictor
C <- rpois(n = n, lambda = exp(lin.pred))	# Exponentiate and add Poisson noise
hist(C, col = "grey")			# Inspect what we've created

We use a lattice graph to plot the shrike counts against time for each population (Fig. 16.2).

library("lattice")

xyplot(C ~ original.year | pop, ylab "Red-backed shrike counts", xlab "Year")



16.3 ANALYSIS UNDER A RANDOM-COEFFICIENTS MODEL

We could analyze the shrike counts under the assumption that all shrike populations have the same trend, but at different levels, corresponding to a random-intercepts model, as we did for the normal case (see Section 12.3). However, we only show here the analysis under the random-coefficients model. We assume that each population has a specific trend, i.e., that both slopes and intercepts are independent random variables governed by common hyperparameters.

FIGURE 16.2 Trellis plot of pair counts in 16 populations of Red-backed shrikes over 30 years.



16.3.1 Analysis Using R

library('lme4')
glmm.fit <- glmer(C ~ year + (1 | pop) + (0 + year | pop), family = poisson)
glmm.fit			# Inspect results

> glmm.fit			# Inspect results
Generalized linear mixed model fit by the Laplace approximation
Formula: C ~ year + (1 | pop) + (0 + year | pop)
   AIC   BIC logLik deviance
 602.7 619.3 -297.3    594.7

Random effects:

Groups Name Variance Std.Dev.

pop (Intercept) 0.67721 0.82293

pop year 0.13311 0.36484

Number of obs: 480, groups: pop, 16

Fixed effects:

Estimate Std. Error z value Pr(>|z|)

(Intercept)   3.2406     0.2071   15.65   <2e-16 ***
year         -1.8233     0.1055  -17.28   <2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:

(Intr)

year 0.044

16.3.2 Analysis Using WinBUGS

# Define model
sink("glmm.txt")
cat("
model {

# Priors
for (i in 1:n.groups){
   alpha[i] ~ dnorm(mu.int, tau.int)	# Intercepts
   beta[i] ~ dnorm(mu.beta, tau.beta)	# Slopes
}
mu.int ~ dnorm(0, 0.001)		# Hyperparameter for random intercepts
tau.int <- 1 / (sigma.int * sigma.int)
sigma.int ~ dunif(0, 10)

mu.beta ~ dnorm(0, 0.001)		# Hyperparameter for random slopes
tau.beta <- 1 / (sigma.beta * sigma.beta)
sigma.beta ~ dunif(0, 10)

# Poisson likelihood
for (i in 1:n) {
   C[i] ~ dpois(lambda[i])
   lambda[i] <- exp(alpha[pop[i]] + beta[pop[i]] * year[i])
}
}
",fill = TRUE)
sink()

# Bundle data
win.data <- list(C = C, pop = as.numeric(pop), year = year, n.groups = n.groups, n = n)

# Inits function
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta = rnorm(n.groups, 0, 2),
   mu.int = rnorm(1, 0, 1), sigma.int = rlnorm(1), mu.beta = rnorm(1, 0, 1),
   sigma.beta = rlnorm(1))}

# Parameters to estimate
parameters <- c("alpha", "beta", "mu.int", "sigma.int", "mu.beta", "sigma.beta")

# MCMC settings
ni <- 2000
nb <- 500
nt <- 2
nc <- 3

# Start Gibbs sampling
out <- bugs(win.data, inits, parameters, "glmm.txt", n.thin = nt, n.chains = nc,
   n.burnin = nb, n.iter = ni, debug = TRUE)

This GLMM converges easily.

print(out, dig = 2)

> print(out, dig = 2)
Inference for Bugs model at "glmm.txt", fit using WinBUGS,
 3 chains, each with 2000 iterations (first 500 discarded), n.thin = 2
 n.sims = 2250 iterations saved
           mean   sd  2.5%   25%   50%   75% 97.5% Rhat n.eff

[ ... ]

mu.int 3.24 0.24 2.77 3.08 3.23 3.40 3.75 1.00 2200

sigma.int 0.94 0.20 0.64 0.79 0.91 1.04 1.44 1.01 280

mu.beta   -1.83 0.12 -2.08 -1.90 -1.82 -1.74 -1.60 1.00  870

sigma.beta 0.41 0.10 0.25 0.34 0.40 0.47 0.65 1.00 970

[ ... ]



DIC info (using the rule, pD = var(deviance)/2)
pD = 31.7 and DIC = 2478.9

DIC is an estimate of expected predictive error (lower deviance is better).

# Compare with input values

intercept.mean ; intercept.sd ; slope.mean ; slope.sd

> intercept.mean ; intercept.sd ; slope.mean ; slope.sd

[1] 3

[1] 1

[1] -2

[1] 0.6

>

This comparison looks good.

16.4 SUMMARY

We have seen how the concept of random effects, which we first met in Chapters 9 and 12 for the normal linear case, can be carried over fairly smoothly to the non-normal case. Both frequentist and Bayesian analyses are fairly straightforward, but the model fitted in WinBUGS appears more transparent.

EXERCISES
1. Swiss hare data 1: Fit the random-coefficient Poisson regression to the counts without regard to land use. Take a single count per site and year. Don't forget to account for variable area by inclusion of an offset.
2. Swiss hare data 2: Now add a separate pair of hyperparameters (for intercept and slope) for arable and grassland areas; this should definitely get you published somewhere really nice. You could also add other explanatory variables, such as an index for fox abundance if you had that. Do not forget that you are not modeling hare abundance, just the expected hare counts. Nobody really knows what this means, but expected counts presumably represent an unknown, and you hope, constant, proportion of the true number of hares out there.



C H A P T E R

17
Binomial "t-Test"

O U T L I N E

17.1 Introduction 211

17.2 Data Generation 213

17.3 Analysis Using R 213

17.4 Analysis Using WinBUGS 214

17.5 Summary 216

17.1 INTRODUCTION

The next three chapters deal with another common kind of count data, where we want to estimate a binomial proportion. The associated models are often called logistic regressions. The crucial difference between binomial and Poisson random variables is the presence of a ceiling in the former: binomial counts cannot be greater than some upper limit. Therefore, we model bounded counts, where the bound is provided by trial size N. It makes sense to express a count relative to that bound, and this yields a proportion. In contrast, Poisson counts lack an upper limit, at least in principle.

Modeling a binomial random variable in essence means modeling a series of coin flips. We count the number of heads among the total number of coin flips (N) and from this want to estimate the general tendency of the coin to show heads. That is, we want to estimate Pr(heads). Data coming from coin flip-like processes are ubiquitous in nature and include survival or the occurrence of an organism. In coin flips, the binomial distribution describes the number of times r a coin shows heads among a number of N flips, where the coin has Pr(heads) = p. We also write r ~ Binomial(N, p). A special case of the binomial distribution with N = 1, corresponding to a single flip of the coin, is called the Bernoulli distribution. It has just a single parameter p.
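As a quick numerical illustration of this relationship (a sketch with made-up names and values, not part of the book's analysis), a Binomial(N, p) count is simply the sum of N Bernoulli(p) coin flips:

set.seed(5)					# Sketch only: binomial vs. Bernoulli
n.flips <- 50 ; p.heads <- 0.26			# Hypothetical trial size and Pr(heads)
rbinom(1, size = n.flips, prob = p.heads)	# One binomial count of heads among n.flips
sum(rbinom(n.flips, size = 1, prob = p.heads))	# Sum of n.flips individual Bernoulli flips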

As our inferential setting of this chapter, we consider a plant inventory on calcareous grasslands in the Jura mountains. A total of 50 sites were visited by experienced botanists who recorded whether they saw a species or not. The Cross-leaved gentian (Fig. 17.1) was found at 13 sites and the Chiltern gentian (see Chapter 20) at 29 sites. We wonder whether this is enough evidence, given the variation inherent in binomial sampling, to claim that the Cross-leaved gentian has a more restricted distribution in the Jura mountains.

FIGURE 17.1 Cross-leaved gentian (Gentiana cruciata), Spanish Pyrenees, 2006. (Photo M. Kéry)

This type of data is often called "presence–absence data." It is more accurate to call it "detection–nondetection data," since the number of sites at which a species is detected depends on two entirely different things: first, the number of sites where a species is actually present and second, the ease with which a species is detected at an occupied site. Without a special kind of data (see Chapter 20), we have no way of distinguishing between the two components of "presence–absence." All we can do is hope, pray, or claim either that both gentian species are found at every occupied site or else that their probability to be overlooked is identical. For now, we assume that every individual present is detected, i.e., that every occupied site is observed as such. Given such data, our question can be framed statistically by what can be called a binomial version of the t-test, i.e., a logistic regression that contrasts two groups. For gentian species i, let Ci be the number of sites at which it was detected. A simple model for Ci is this:

1. (Statistical) Distribution: Ci ~ Binomial(N, pi)
2. Link function: logit, i.e., logit(pi) = log(pi / (1 − pi)) = linear predictor
3. Linear predictor: α + β · xi

If x is an indicator for the Chiltern gentian, then α can be interpreted as a logit-scale parameter for the probability of occurrence of the Cross-leaved gentian in the Jura mountains and β is the difference, on a logit scale, between the probability of occurrence of the Chiltern gentian and that of the Cross-leaved gentian.
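To see what such logit-scale quantities mean on the probability scale, a two-line sketch with plogis(), R's inverse-logit function, can help; the values -1.05 and 1.37 are made up here, but they are close to the estimates that will appear in Section 17.3:

plogis(-1.05)		# Back-transformed occurrence probability of the Cross-leaved gentian (about 0.26)
plogis(-1.05 + 1.37)	# Back-transformed occurrence probability of the Chiltern gentian (about 0.58)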

17.2 DATA GENERATION

We simulate the data from a binomial process whose parameters were defined such that the sample data approximately match those in our example. Note that our modeled response simply consists of two numbers.

N <- 50			# Binomial total (Number of coin flips)
p.cr <- 13/50		# Success probability Cross-leaved
p.ch <- 29/50		# Success probability Chiltern gentian

C.cr <- rbinom(1, 50, prob = p.cr) ; C.cr	# Add Binomial noise
C.ch <- rbinom(1, 50, prob = p.ch) ; C.ch	# Add Binomial noise
C <- c(C.cr, C.ch)
species <- factor(c(0,1), labels = c("Cross-leaved gentian", "Chiltern gentian"))

17.3 ANALYSIS USING R

A binomial t-test in R suggests a significant difference in the perceived distribution of the Cross-leaved gentian and the Chiltern gentian.

summary(glm(cbind(C, N - C) ~ species, family = binomial))
predict(glm(cbind(C, N - C) ~ species, family = binomial), type = "response")

> summary(glm(cbind(C, N - C) ~ species, family = binomial))
Call:
glm(formula = cbind(C, N - C) ~ species, family = binomial)

[ ... ]



Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) -1.0460 0.3224 -3.244 0.00118 **

speciesChiltern gentian 1.3687 0.4313 3.173 0.00151 **

- - -

[ ... ]

> predict(glm(cbind(C, N - C) ~ species, family = binomial), type = "response")

1 2

0.26 0.58

17.4 ANALYSIS USING WinBUGS

# Define model
sink("Binomial.t.test.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.01)
 beta ~ dnorm(0,0.01)

# Likelihood
 for (i in 1:n) {
    C[i] ~ dbin(p[i], N)		# Note p before N
    logit(p[i]) <- alpha + beta * species[i]
 }

# Derived quantities
 Occ.cross <- exp(alpha) / (1 + exp(alpha))
 Occ.chiltern <- exp(alpha + beta) / (1 + exp(alpha + beta))
 Occ.Diff <- Occ.chiltern - Occ.cross	# Test quantity
}
",fill = TRUE)
sink()

# Bundle data
win.data <- list(C = C, N = 50, species = c(0,1), n = length(C))

# Inits function
inits <- function(){ list(alpha = rlnorm(1), beta = rlnorm(1))}

# Parameters to estimate
params <- c("alpha", "beta", "Occ.cross", "Occ.chiltern", "Occ.Diff")

# MCMC settings
nc <- 3
ni <- 1200
nb <- 200
nt <- 2

# Start Gibbs sampling
out <- bugs(data = win.data, inits = inits, parameters.to.save = params,
   model.file = "Binomial.t.test.txt", n.thin = nt, n.chains = nc, n.burnin = nb,
   n.iter = ni, debug = TRUE)

print(out, dig = 3)

> print(out, dig = 3)
Inference for Bugs model at "Binomial.t.test.txt", fit using WinBUGS,
 3 chains, each with 1200 iterations (first 200 discarded), n.thin = 2
 n.sims = 1500 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

alpha −1.047 0.333 −1.710 −1.273 −1.034 −0.810 −0.441 1.007 290

beta 1.362 0.449 0.520 1.059 1.375 1.657 2.278 1.007 300

Occ.cross 0.265 0.063 0.153 0.219 0.262 0.308 0.392 1.007 290

Occ.chiltern 0.577 0.069 0.438 0.530 0.580 0.626 0.701 1.003 960

Occ.Diff 0.312 0.095 0.124 0.251 0.318 0.376 0.485 1.006 340

[ ... ]
DIC info (using the rule, pD = Dbar-Dhat)
pD = 2.1 and DIC = 12.6

Next, we plot the posterior distribution of the (biological) distributions of the two species and their difference, as perceived from the detection–nondetection data (Fig. 17.2):

par(mfrow = c(3,1))
hist(out$sims.list$Occ.cross, col = "grey", xlim = c(0,1), main = "",
   xlab = "Occupancy Cross-leaved", breaks = 30)
abline(v = out$mean$Occ.cross, lwd = 3, col = "red")
hist(out$sims.list$Occ.chiltern, col = "grey", xlim = c(0,1), main = "",
   xlab = "Occupancy Chiltern", breaks = 30)
abline(v = out$mean$Occ.chiltern, lwd = 3, col = "red")
hist(out$sims.list$Occ.Diff, col = "grey", xlim = c(0,1), main = "",
   xlab = "Difference in occupancy", breaks = 30)
abline(v = 0, lwd = 3, col = "red")

The posterior distribution of the difference barely overlaps zero (Fig. 17.2, bottom), and the 95% credible interval is (0.015, 0.485). Therefore, we can say that the Chiltern gentian is found at significantly more sites than the Cross-leaved gentian. This is not equivalent to saying that the Chiltern gentian was more widespread in the Jura mountains than is the Cross-leaved gentian, only that it is detected at more sites. Whether it also means the former depends on whether the assumption of perfect or at least of constant detection probability holds (MacKenzie and Kendall, 2002; Kéry and Schmidt, 2008).
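Such interval summaries can also be pulled directly from the saved posterior draws rather than read off the printed table; a one-line sketch, assuming the bugs() fit object out from above:

quantile(out$sims.list$Occ.Diff, probs = c(0.025, 0.975))	# 95% credible interval of the difference
mean(out$sims.list$Occ.Diff > 0)		# Posterior probability that the Chiltern gentian occupies more sites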

17.5 SUMMARY

As for all generalized linear models, a binomial t-test in WinBUGS is a fairly trivial generalization of the corresponding normal response model. We have seen another example of the ease with which, in Markov chain Monte Carlo-based statistical inference, we can compute derived parameters exactly, i.e., without any approximations, and with full error propagation.

FIGURE 17.2 Posterior distributions of the occupancy probability of Cross-leaved (top panel) and Chiltern gentian (middle panel); vertical red lines indicate posterior means. Bottom panel shows the posterior distribution of the difference of the two species' occupancy probability; vertical red line indicates zero.



EXERCISES
1. Binomial and Bernoulli: Try to formulate the same problem using a Bernoulli distribution. You will need to simulate, not the aggregate number of sites at which each gentian species was found, but rather the detected occupancy status of each individual site. Adapt the code for analysis. In WinBUGS, you can directly specify a Bernoulli by dbern(p). (As usual, when you're unsure, go to the WinBUGS manual: Help > User Manual and then scroll down.)
2. Swiss hare data: Model the probability of occurrence of a "large" count in arable and grassland areas. Select a useful threshold to call a count "large." Are large counts more common in arable areas?



C H A P T E R

18
Binomial Analysis of Covariance

O U T L I N E

18.1 Introduction 219

18.2 Data Generation 221

18.3 Analysis Using R 223

18.4 Analysis Using WinBUGS 224

18.5 Summary 228

18.1 INTRODUCTION

We can specify a binomial analysis of covariance (ANCOVA) by adding discrete and continuous covariates to the linear predictor of a binomial generalized linear model (GLM). Once again, to stress the structural similarity with the normal linear model in Chapter 11, we modify the asp viper example just slightly. Instead of modeling a continuous measurement such as body mass in Chapter 11, we model a count governed by an underlying probability; specifically, we model the proportion of black individuals in adder populations. The adder has an all-black and a zigzag morph, where females are brown and males are gray (Fig. 18.1).

It has been hypothesized that the black color confers a thermal advantage, and therefore, the proportion of black individuals should be greater in cooler or wetter habitats. We simulate data that bear on this question and "study," by simulation, 10 adder populations each in the Jura mountains, the Black Forest, and the Alps. We capture a number of snakes in these populations and record the proportion of black adders. Then, we relate these proportions to the mountain range and to a combined index of low temperature, wetness, and northerliness of the site. Our expectation will of course be that there are relatively more black adders at cool and wet sites. As always, a count is the result of a true number (here, of black and zigzag adders) and a detection probability. Hence, in the following analyses, we make the implicit assumption that the detectability of black and zigzag adders differs neither between the two morphs nor among populations.

We model the number of black adders Ci among Ni captured animals in population i. Here is the description of the model.

1. Distribution: Ci ~ Binomial(pi, Ni)
2. Link function: logit, i.e., logit(pi) = log(pi / (1 − pi)) = linear predictor
3. Linear predictor: αJura + β1 · xBlackF + β2 · xAlps + β3 · xwet + β4 · xwet · xBlackF + β5 · xwet · xAlps

Note that the number of animals captured, Ni, is not a parameter of the binomial distribution in this example but is known (in contrast to the model of Chapter 21 where Ni will be estimated). The link function is the logit, as is customary for a binomial distribution, though other links are possible (e.g., the complementary log–log, which is asymmetrical; see GLM textbooks such as McCullagh and Nelder, 1989). Finally, the linear predictor is made up of an intercept αJura for the expected proportion, on the logit scale, of black adders in the Jura and parameters β1 and β2 for the difference in the intercept between the Black Forest and the Jura and between the Alps and the Jura, respectively. The parameters β3, β4, and β5 specify the slope of the (logit-linear) relationship between the proportion of black adders and the wetness indicator of a site in the Jura and the difference from β3 of these slopes in the Black Forest and the Alps, respectively.

FIGURE 18.1 Male adder (Vipera berus) of the zigzag morph, Germany, 2007. (Photo T. Ott)

18.2 DATA GENERATION

n.groups <- 3
n.sample <- 10
n <- n.groups * n.sample
x <- rep(1:n.groups, rep(n.sample, n.groups))
pop <- factor(x, labels = c("Jura", "Black Forest", "Alps"))

We construct a continuous wetness index: 0 denotes wet sites lacking sun and 1 is the converse. For ease of presentation, we sort this covariate; this has no effect on the analysis.

wetness.Jura <- sort(runif(n.sample, 0, 1))
wetness.BlackF <- sort(runif(n.sample, 0, 1))
wetness.Alps <- sort(runif(n.sample, 0, 1))
wetness <- c(wetness.Jura, wetness.BlackF, wetness.Alps)

We also need the number of adders examined in each population (Ni), i.e., the binomial totals, also called sample or trial size of the binomial distribution. We assume that the total number of snakes examined in each population is a random variable drawn from a uniform distribution, but this is not essential.

N <- round(runif(n, 10, 50))		# Get discrete Uniform values

We build the design matrix of an interactive combination of population and wetness.

Xmat <- model.matrix(~ pop * wetness)
print(Xmat, dig = 2)

Select the parameter values, i.e., choose values for αJura, β1, β2, β3, β4, and β5.

beta.vec <- c(-4, 1, 2, 6, 2, -5)

We assemble the number of black adders captured in each population in three steps:

1. We add up all components of the linear model to get the value of the linear predictor,
2. We apply the inverse logit transformation to get the expected proportion (pi) of black adders in each population (Fig. 18.2; top), and finally,
3. We add binomial noise, i.e., use pi and Ni to draw binomial random numbers representing the count of black adders in each sample of Ni snakes (Fig. 18.2; bottom).

The value of the linear predictor is again obtained by matrix multiplication of the design matrix (Xmat) and the parameter vector (beta.vec). (For ecological purists, the usual reality disclaimer applies again.)

lin.pred <- Xmat[,] %*% beta.vec		# Value of lin.predictor
exp.p <- exp(lin.pred) / (1 + exp(lin.pred))	# Expected proportion
C <- rbinom(n = n, size = N, prob = exp.p)	# Add binomial noise
hist(C)					# Inspect what we've created

par(mfrow = c(2,1))
matplot(cbind(wetness[1:10], wetness[11:20], wetness[21:30]),
   cbind(exp.p[1:10], exp.p[11:20], exp.p[21:30]), ylab = "Expected black",
   xlab = "", col = c("red","green","blue"), pch = c("J","B","A"), lty = "solid",
   type = "b", las = 1, cex = 1.2, main = "", lwd = 1)

matplot(cbind(wetness[1:10], wetness[11:20], wetness[21:30]),
   cbind(C[1:10]/N[1:10], C[11:20]/N[11:20], C[21:30]/N[21:30]),
   ylab = "Observed black", xlab = "Wetness index", col = c("red","green","blue"),
   pch = c("J","B","A"), las = 1, cex = 1.2, main = "")
par(mfrow = c(1,1))

FIGURE 18.2 Relationship between the wetness index and the expected (top) and observed proportion (bottom) of black adders in the Jura mountains (red J), the Black Forest (green B), and the Alps (blue A). The discrepancy between the two graphs is due to binomial sampling variation.

Hence, the proportion of black adders increases with wetness most steeply in the Black Forest, less so in the Jura, and hardly at all in the Alps (Fig. 18.2).

18.3 ANALYSIS USING R

As usual, the analysis in R is concise.

summary(glm(cbind(C, N - C) ~ pop * wetness, family = binomial))
beta.vec

> summary(glm(cbind(C, N - C) ~ pop * wetness, family = binomial))
Call:
glm(formula = cbind(C, N - C) ~ pop * wetness, family = binomial)

Deviance Residuals:

Min 1Q Median 3Q Max

-1.8911  -0.5785   0.1309   0.8162   1.8866

Coefficients:
                        Estimate Std. Error z value Pr(>|z|)
(Intercept)              -3.7255     0.3905  -9.541  < 2e-16 ***
popBlack Forest           0.1342     0.5894   0.228  0.81990
popAlps                   1.5118     0.4946   3.057  0.00224 **
wetness                   5.5286     0.7215   7.663 1.81e-14 ***
popBlack Forest:wetness   3.6046     1.3070   2.758  0.00582 **
popAlps:wetness          -4.3748     0.8607  -5.083 3.72e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 374.63 on 29 degrees of freedom

Residual deviance: 26.41 on 24 degrees of freedom

AIC: 123.75

Number of Fisher Scoring iterations: 4

> print(beta.vec)

[1] -4  1  2  6  2 -5

>

Owing to the small sample size, we observe only a moderate correspondence with the input values.



18.4 ANALYSIS USING WinBUGS

In the following WinBUGS code, we add a few lines to compute Pearson residuals. For a binomial response, these can be computed as (Ci − Ni pi) / √(Ni pi (1 − pi)), where Ci is the binomial count for unit i and Ni and pi are the trial size and success probability of the associated binomial distribution. The Pearson residual has the form of a raw residual divided by the standard deviation of unit i.
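This formula can be checked quickly in R against the built-in Pearson residuals of the frequentist fit from Section 18.3; this is just a side check, not part of the book's WinBUGS analysis, and the object names below (glm.fit, p.hat, Presi.by.hand, Presi.glm) are introduced here for illustration:

glm.fit <- glm(cbind(C, N - C) ~ pop * wetness, family = binomial)
p.hat <- fitted(glm.fit)				# Estimated success probabilities
Presi.by.hand <- (C - N * p.hat) / sqrt(N * p.hat * (1 - p.hat))	# Formula above
Presi.glm <- residuals(glm.fit, type = "pearson")	# R's built-in Pearson residuals
plot(Presi.by.hand, Presi.glm) ; abline(0, 1)		# Points should fall on the 1:1 line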

# Define model
sink("glm.txt")
cat("
model {

# Priors
for (i in 1:n.groups){
   alpha[i] ~ dnorm(0, 0.01)		# Intercepts
   beta[i] ~ dnorm(0, 0.01)		# Slopes
}

# Likelihood
for (i in 1:n) {
   C[i] ~ dbin(p[i], N[i])
   logit(p[i]) <- alpha[pop[i]] + beta[pop[i]] * wetness[i]	# Baseline Jura

# Fit assessments: Pearson residuals and posterior predictive check
   Presi[i] <- (C[i] - N[i]*p[i]) / sqrt(N[i]*p[i]*(1-p[i]))	# Pearson resi
   C.new[i] ~ dbin(p[i], N[i])		# Create replicate data set
   Presi.new[i] <- (C.new[i] - N[i]*p[i]) / sqrt(N[i]*p[i]*(1-p[i]))
   D[i] <- pow(Presi[i], 2)		# Squared Pearson residuals
   D.new[i] <- pow(Presi.new[i], 2)
}

# Derived quantities
# Recover the effects relative to baseline level (no. 1)
a.effe2 <- alpha[2] - alpha[1]		# Intercept Black Forest vs. Jura
a.effe3 <- alpha[3] - alpha[1]		# Intercept Alps vs. Jura
b.effe2 <- beta[2] - beta[1]		# Slope Black Forest vs. Jura
b.effe3 <- beta[3] - beta[1]		# Slope Alps vs. Jura

# Custom tests
test1 <- beta[3] - beta[2]		# Difference slope Alps - Black Forest

# Add up discrepancy measures
fit <- sum(D[])
fit.new <- sum(D.new[])
}
",fill = TRUE)
sink()

# Bundle data
win.data <- list(C = C, N = N, pop = as.numeric(pop), n.groups = n.groups,
   wetness = wetness, n = n)

# Inits function
inits <- function(){ list(alpha = rlnorm(n.groups, 3, 1),
   beta = rlnorm(n.groups, 2, 1))}	# Note log-normal inits here

# Parameters to estimate
params <- c("alpha", "beta", "a.effe2", "a.effe3", "b.effe2", "b.effe3", "test1",
   "Presi", "fit", "fit.new")

# MCMC settings
ni <- 1500
nb <- 500
nt <- 5
nc <- 3

# Start Gibbs sampler
out <- bugs(data = win.data, inits = inits, parameters.to.save = params,
   model.file = "glm.txt", n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni,
   debug = TRUE)

The model converges rapidly. To practice good statistical behavior, we now first assess model fit before rushing on to inspect the parameter estimates (or indulge in some activity related to significance testing). After all, only when a model adequately reproduces the salient features in the data set should we believe what it says about the parameters. We first plot the Pearson residuals and, naturally, find no obvious remaining pattern related to order or wetness of a site (Fig. 18.3).

par(mfrow = c(1,2), cex = 1.5)
plot(out$mean$Presi, ylab = "Residual", las = 1)
abline(h = 0)
plot(wetness, out$mean$Presi, ylab = "Residual", las = 1)
abline(h = 0)

Next, we conduct a posterior predictive check for overall goodness of fit of the model (Fig. 18.4). Remember that our discrepancy measure is the sum of squared Pearson residuals.

plot(out$sims.list$fit, out$sims.list$fit.new, main = "",
   xlab = "Discrepancy actual data", ylab = "Discrepancy ideal data")
abline(0, 1, lwd = 2, col = "black")



FIGURE 18.3 Pearson residuals plotted against the order of each observation (left) and against the value of the wetness indicator (right).

FIGURE 18.4 Posterior predictive check for the black adder analysis. The Bayesian p-value (here, 0.52) is the proportion of points above the line.



Of course, this looks good and so does the Bayesian p-value.

mean(out$sims.list$fit.new > out$sims.list$fit)

> mean(out$sims.list$fit.new > out$sims.list$fit)
[1] 0.52

Hence, we feel justified in comparing the results of the Bayesian analysis with the truth from the data-generating random process and with the frequentist inference from R's function glm().

print(out, dig = 3)			# Bayesian analysis
beta.vec				# Truth in data generation
print(glm(cbind(C, N - C) ~ pop * wetness, family = binomial)$coef, dig = 4)
					# The ML solution

We get rather similar estimates. Parameters in the output of the Bayesian analysis printed in boldface correspond to the quantities used for data generation and also to those of the model parameterization in the frequentist analysis in R.

> print(out, dig = 3)			# Bayesian analysis
Inference for Bugs model at "glm.txt", fit using WinBUGS,
 3 chains, each with 1500 iterations (first 500 discarded), n.thin = 5
 n.sims = 600 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

alpha[1] −3.797 0.383 −4.514 −4.056 −3.782 −3.518 −3.048 1.009 190

alpha[2] −3.591 0.421 −4.408 −3.896 −3.580 −3.302 −2.825 1.002 600

alpha[3] −2.225 0.301 −2.837 −2.422 −2.235 −2.021 −1.651 1.031 68

beta[1] 5.655 0.701 4.372 5.164 5.632 6.131 7.004 1.009 200

beta[2] 9.139 1.086 7.148 8.362 9.102 9.927 11.300 1.000 600

beta[3] 1.162 0.477 0.180 0.843 1.192 1.476 2.086 1.034 60

a.effe2 0.207 0.597 −0.972 −0.190 0.201 0.631 1.289 1.012 200

a.effe3 1.572 0.480 0.639 1.256 1.559 1.894 2.536 1.018 110

b.effe2 3.484 1.337 0.960 2.543 3.490 4.472 6.148 1.009 360

b.effe3 −4.492 0.832 −6.134 −5.000 −4.471 −3.916 −3.022 1.012 150

test1 −7.977 1.164 −10.501 −8.788 −7.897 −7.218 −5.741 1.007 280

Presi[1] −1.109 0.185 −1.520 −1.235 −1.103 −0.979 −0.805 1.009 190

[ ... ]

DIC info (using the rule, pD = Dbar-Dhat)
pD = 6.0 and DIC = 123.8

DIC is an estimate of expected predictive error (lower deviance is better).

> beta.vec # Truth in data-generation

[1] −4 1 2 6 2 −5



> print(glm(cbind(C, N - C) ~ pop * wetness, family = binomial)$coef, dig = 4)

# The ML solution

(Intercept) popBlack Forest popAlps

        -3.7255          0.1342          1.5118

wetness popBlack Forest:wetness popAlps:wetness

5.5286 3.6046 −4.3748

18.5 SUMMARY

Moving from the normal and the Poisson to a binomial ANCOVA involves only minor changes in the code of WinBUGS (and also R). Similarly, the concepts of residuals and posterior predictive distributions carry over to this class of models. We have seen examples of both.

EXERCISES
1. Multiple binomial regression: Create another habitat variable X and, using the data set created in this chapter, use WinBUGS to fit the following model: prop.black ~ pop*wet + X + wet:X. That is, add the main effect of X plus the interaction between wet and X.
2. Swiss hare data: Model the probability of a "large" count as a function of land use and year (continuous), i.e., fit the following model:
   large.pop ~ land use * year
   You may choose a threshold of about 5. You may also want to check for nonlinearity (on the scale of the logit link) by adding a quadratic effect of year.



C H A P T E R

19
Binomial Mixed-Effects Model

(Binomial GLMM)

O U T L I N E

19.1 Introduction 229

19.2 Data Generation 230

19.3 Analysis Under a Random-Coefficients Model 231
19.3.1 Analysis Using R 232
19.3.2 Analysis Using WinBUGS 234

19.4 Summary 236

19.1 INTRODUCTION

As in a Poisson generalized linear mixed model (GLMM), we can also add into a binomial generalized linear model (GLM) random variation beyond what is stipulated by the binomial distribution. We show this for a slight modification of the Red-backed shrike example from Chapter 16. Instead of counting the number of pairs, which naturally leads to the adoption of a Poisson model, we now study the reproductive success (success or failure) of its much rarer cousin, the woodchat shrike (Fig. 19.1). We examine the relationship between precipitation during the breeding season and reproductive success; wet springs are likely to depress the proportion of successful nests. We assemble data from 16 populations studied over 10 years.

First, we write down the random-coefficients model (without intercept-slope correlation) for a binomial response. We model Ci, the number of successful pairs among Ni studied pairs in year i and study area j:

1. Distribution: Ci ~ Binomial(pi, Ni)
2. Link function: logit, i.e., logit(pi) = log(pi / (1 − pi)) = linear predictor
3. Linear predictor: αj(i) + βj(i) · xi
4. Submodel for parameters: αj ~ Normal(μα, σ²α)
                            βj ~ Normal(μβ, σ²β)

Except for a different distribution and link function, the additional kind of data provided by the binomial totals Ni, and a different interpretation and indexing of the covariate, this model looks exactly like the Poisson GLMM in Chapter 16! The linear predictor, αj(i) + βj(i) · xi, specifies a population-specific, logit-linear relationship between breeding success and precipitation. Furthermore, populations are assumed to be related in the sense that both intercepts (αj) and slopes (βj) of those relationships come from two normal distributions whose hyperparameters we estimate.

19.2 DATA GENERATION

We generate data under the random-coefficients model, i.e., with both αj and βj assumed independent and random effects. We assume no correlation.

n.groups <- 16
n.years <- 10
n <- n.groups * n.years
pop <- gl(n = n.groups, k = n.years)

FIGURE 19.1 Woodchat shrike (Lanius senator), Catalonia, 2008. (Photo J. Rojals)



We create a uniform covariate as an index to spring precipitation: 0 denotes little rain and 1 much.

prec <- runif(n, 0, 1)

Ni, the binomial total, is the number of nesting attempts surveyed in year i.

N <- round(runif(n, 10, 50))

We build the design matrix as before.

Xmat <- model.matrix(~ pop * prec - 1 - prec)
print(Xmat[1:91,], dig = 2)		# Print top 91 rows

Next, we choose the parameter values from their respective normal distributions, but first, we need to select the associated hyperparameters.

intercept.mean <- 1			# Select hyperparams
intercept.sd <- 1
slope.mean <- -2
slope.sd <- 1

intercept.effects <- rnorm(n = n.groups, mean = intercept.mean, sd = intercept.sd)
slope.effects <- rnorm(n = n.groups, mean = slope.mean, sd = slope.sd)
all.effects <- c(intercept.effects, slope.effects)	# Put them all together

We assemble the counts Ci by first computing the value of the linear predictor, then applying the inverse-logit transformation, and finally integrating binomial noise (where we need Ni).

lin.pred <- Xmat %*% all.effects		# Value of lin.predictor
exp.p <- exp(lin.pred) / (1 + exp(lin.pred))	# Expected proportion

For each population, we plot the expected (Fig. 19.2) and the observed breeding success (Fig. 19.3) of woodchat shrikes against standardized spring precipitation using a Trellis plot.

library("lattice")

xyplot(exp.p ~ prec | pop, ylab "Expected woodchat shrike breeding success ",

xlab "Spring precipitation index", main "Expected breeding success")

C <– rbinom(n n, size N, prob exp.p) # Add binomial variation

xyplot(C/N ~ prec | pop, ylab "Realized woodchat shrike breeding success", xlab

"Spring precipitation index", main "Realized breeding success")

19.3 ANALYSIS UNDER A RANDOM-COEFFICIENTS MODEL

We could assume that all shrike populations have the same relationship between breeding success and standardized spring precipitation, but at different levels, corresponding to a random-intercepts model (see Section 12.3). However, we directly adopt the random-coefficients model without correlation instead. This means assuming that every shrike population has a specific response to precipitation but that both intercept and slope are "similar" among populations.

19.3.1 Analysis Using R

library('lme4')
glmm.fit <- glmer(cbind(C, N - C) ~ prec + (1 | pop) + (0 + prec | pop),
   family = binomial)
glmm.fit

> glmm.fit

Generalized linear mixed model fit by the Laplace approximation

Formula: cbind(C, N - C) ~ prec + (1 | pop) + (0 + prec | pop)

FIGURE 19.2 Trellis plot of the relationship between spring precipitation (standardized) and expected breeding success of woodchat shrikes in 16 populations over 10 years.

AIC BIC logLik deviance

229.3 241.6 −110.6 221.3

Random effects:

Groups Name Variance Std.Dev.

pop (Intercept) 0.38785 0.62277

pop prec 0.53258 0.72978

Number of obs: 160, groups: pop, 16

Fixed effects:

Estimate Std.Error z value Pr(>|z|)

(Intercept) 0.8006 0.1690 4.738 2.16e–06 ***

prec −2.1290 0.2173 −9.796 < 2e–16 ***

–––

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

FIGURE 19.3 Trellis plot of the relationship between spring precipitation (standardized) and the realized breeding success of woodchat shrikes in 16 populations over 10 years. The difference between this and the previous plot is due to binomial sampling variation.

Correlation of Fixed Effects:

(Intr)

prec −0.182

19.3.2 Analysis Using WinBUGS

# Define model
sink("glmm.txt")
cat("
model {

# Priors
for (i in 1:n.groups){
   alpha[i] ~ dnorm(mu.int, tau.int)	# Intercepts
   beta[i] ~ dnorm(mu.beta, tau.beta)	# Slopes
}
mu.int ~ dnorm(0, 0.001)		# Hyperparameter for random intercepts
tau.int <- 1 / (sigma.int * sigma.int)
sigma.int ~ dunif(0, 10)

mu.beta ~ dnorm(0, 0.001)		# Hyperparameter for random slopes
tau.beta <- 1 / (sigma.beta * sigma.beta)
sigma.beta ~ dunif(0, 10)

# Binomial likelihood
for (i in 1:n) {
   C[i] ~ dbin(p[i], N[i])
   logit(p[i]) <- alpha[pop[i]] + beta[pop[i]] * prec[i]
}
}
",fill = TRUE)
sink()

# Bundle data
win.data <- list(C = C, N = N, pop = as.numeric(pop), prec = prec,
   n.groups = n.groups, n = n)

# Inits function
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta = rnorm(n.groups, 1, 1),
   mu.int = rnorm(1, 0, 1), mu.beta = rnorm(1, 0, 1))}

# Parameters to estimate
params <- c("alpha", "beta", "mu.int", "sigma.int", "mu.beta", "sigma.beta")

# MCMC settings
ni <- 2000
nb <- 500
nt <- 2
nc <- 3

# Start Gibbs sampling
out <- bugs(win.data, inits, params, "glmm.txt", n.thin = nt, n.chains = nc,
   n.burnin = nb, n.iter = ni, debug = TRUE)

This standard GLMM converges nicely.

print(out, dig = 2)

> print(out, dig = 2)
Inference for Bugs model at "glmm.txt", fit using WinBUGS,
 3 chains, each with 2000 iterations (first 500 discarded), n.thin = 2
 n.sims = 2250 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

[ ... ]

mu.int 0.80 0.19 0.44 0.68 0.80 0.92 1.17 1.00 2200

sigma.int 0.70 0.16 0.45 0.59 0.68 0.79 1.08 1.00 1600

mu.beta   -2.13 0.25 -2.64 -2.28 -2.12 -1.97 -1.64 1.00 2200

sigma.beta 0.84 0.23 0.48 0.67 0.81 0.96 1.38 1.00 1100

[ ... ]

DIC info (using the rule, pD = var(deviance)/2)
pD = 32.8 and DIC = 748.4

DIC is an estimate of expected predictive error (lower deviance is better).

>

# Compare with input values
intercept.mean ; intercept.sd ; slope.mean ; slope.sd

> intercept.mean ; intercept.sd ; slope.mean ; slope.sd
[1] 1
[1] 1
[1] -2
[1] 1
>

This seems to work, too. As is typical, the estimates of random-effects variances are greater in the Bayesian approach, presumably since inference is exact and incorporates all uncertainty in the modeled system. In contrast, the approach in lmer() or glmer() is approximate and may underestimate these variances (Gelman and Hill, 2007).



19.4 SUMMARY

As for the Poisson case, the introduction of random effects into a binomial GLM in WinBUGS is particularly straightforward and transparent. Fitting the resulting binomial GLMM in WinBUGS is very helpful for your general understanding of this class of models.

EXERCISES
1. Predictions: Produce a plot of the mean expected response of breeding success to the spring precipitation index. Try to overlay these estimates on the observed data.
2. Fixed and random, shrinkage: Fit a fixed-effects binomial analysis of covariance (ANCOVA) model to the data set we just assembled and compare the estimates of the population-specific intercepts and slopes under the fixed- and the random-effects models.



C H A P T E R

20
Nonstandard GLMMs 1: Site-Occupancy Species Distribution Model

O U T L I N E

20.1 Introduction 237

20.2 Data Generation 242

20.3 Analysis Using WinBUGS 246

20.4 Summary 251

20.1 INTRODUCTION

We have now seen a wide range of random-effects (also called mixed or hierarchical) models, including the normal, Poisson, and binomial generalized linear models (GLMs) with random effects. This means that some parameters are themselves represented as realizations from a random process. With the exception of the zero-inflated models in Chapter 14, we have used a normal (or a multivariate normal) distribution as our sole description of this additional random component. However, nothing constrains us to use the normal distribution only and sometimes, other distributions will be appropriate for some parameters. The next two chapters illustrate two cases with discrete random effects that are assumed to be drawn from a Bernoulli or a Poisson distribution. Importantly, these effects have a precise biological meaning in these models: they correspond to the true, but imperfectly observed, state of occurrence (Chapter 20) or of abundance (Chapter 21).



The modeling of animal and plant distributions is an important and active area of ecological research and applications. The most frequently applied method is a binomial GLM, or logistic regression (see Chapters 17–19), where the probability that an organism is found is modeled from what typically, and misleadingly, are called "presence–absence" data. These are binary indicators for whether a species was found (1) or not (0) in a spatial sample unit, and the effect of covariates is modeled through the logit link function. There are many variants of this approach, but the basic principle is often the same: so-called "presence–absence" data are directly modeled as coming from a Bernoulli distribution and the Bernoulli parameter is interpreted as the probability of occurrence.

However, a fundamental and extremely widely overlooked issue in almost all species distribution models is that detectability (p) of most species is imperfect—typically, a species will not always be detected where it occurs (MacKenzie and Kendall, 2002; Pellet and Schmidt, 2005; Kéry and Schmidt, 2008; Kéry et al., 2010b). In other words, detection probability is typically less than one (p < 1). This basic fact is very well known to field naturalists and reasonably understood for animals, and it even applies to populations of immobile organisms such as plants (Kéry, 2004; Kéry et al., 2006). However, it seems to have been overlooked by most professional ecologists dealing with distributional data (Araujo and Guisan, 2006; Elith et al., 2006).

As a consequence, virtually no study generated by current species distribution modelers actually models the true occurrence of a species as pretended or believed. Rather, the product of the two probabilities of occurrence and detection of a species is modeled. For noteworthy exceptions, see Gelfand et al. (2005), Royle et al. (2005), Latimer et al. (2006), and Altwegg et al. (2008); also see Royle et al. (2007) and Webster et al. (2008).

The widespread confusion about what is actually being modeled in most species distribution models has three main consequences (MacKenzie et al., 2002; Tyre et al., 2003; Gu and Swihart, 2004; MacKenzie, 2006; MacKenzie et al., 2006; Kéry and Schmidt, 2008; Royle and Dorazio, 2008; Kéry et al., 2010b):

1. species distributions will be underestimated whenever p < 1,
2. estimates of covariate relationships will be biased toward zero whenever p < 1, and
3. the factors that affect the difficulty with which a species is found may end up in predictive models of species occurrence.

The first is easy to understand, but the second has not been widely recognized, although Tyre et al. (2003), Gu and Swihart (2004), and MacKenzie et al. (2006, pp. 34–35) described this effect already a few years ago. Interestingly, and in contrast to a naïve analysis of count data in the presence of imperfect detection (cf. simple Poisson regression examples in Chapters 13–16), in distribution modeling even a constant p < 1 will bias low the slope estimate of the relationship between occurrence probability and a covariate. Presumably, this also includes a time covariate, i.e., situations where changes in distribution over time are modeled. As an example of the third effect, if a species has a higher probability of being found near roads, perhaps because near roads more people are likely to stumble upon it, then obviously roads or habitat types associated with roads will show up as important for that species, unless detection probability is accounted for. As an extreme example, when a species distribution map is constructed from roadkill records, then no matter how much roads might actually be avoided by that species in reality, the resulting distribution map will emphasize the great positive effect of roads on the distribution of the species!
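The second consequence is easy to verify by simulation. The following minimal sketch is not from the book; the number of sites, the covariate, and the parameter values are made up purely for illustration. A single visit to each site with a constant detection probability of 0.5 attenuates the estimated logit-scale slope relative to the truth:

set.seed(1)
n.sites <- 500				# Number of sites (made-up value)
x <- rnorm(n.sites)			# Some site covariate
psi <- plogis(-0.5 + 1 * x)		# True occupancy probability; true logit-scale slope = 1
z <- rbinom(n.sites, 1, psi)		# True presence/absence
p <- 0.5				# Constant detection probability
y <- rbinom(n.sites, 1, z * p)		# Observed 'presence-absence' after a single visit
coef(glm(z ~ x, family = binomial))["x"]	# Slope from the true state: close to 1
coef(glm(y ~ x, family = binomial))["x"]	# Slope from the observations: pulled toward zero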

In contrast, a novel class of models with the rather peculiar name "site-occupancy models" (MacKenzie et al., 2002, 2003, 2006; Tyre et al., 2003) is able to estimate the true distribution of animals or plants free of any distorting effects of the difficulty with which they are found. This chapter deals with these models and shows how to fit them using WinBUGS.

FIGURE 20.1 Chiltern gentian (Gentianella germanica), Slovenia, 2007. (Photo: M. Vogrin)

As a motivating example, we consider an inventory of the beautiful Chiltern gentian (Fig. 20.1) conducted in 150 calcareous grasslands in the Jura mountains. Our aim is to estimate the proportion or number of occupied sites and to identify environmental factors related to the occurrence of the gentian. Interestingly, Gentianella germanica is a typical plant of nutrient-poor sites, which are thus a priori often rather dry. However, within the class of nutrient-poor grasslands, it preferentially occurs on wetter sites. However, these sites often have a higher and denser vegetation cover, so the rather small gentian (5–40 cm height) may more frequently be overlooked at these better sites (we ignore here the fact that better sites may hold larger populations). This effect could mask its preference for wetter sites. None of the currently widespread methods for distribution modeling such as GLM, generalized additive models (GAM), or boosted regression trees (Elith et al., 2006) are able to tease apart the effects of a covariate that influences both the occurrence of a species and the ease with which it is found (i.e., detection probability).

We will use site-occupancy models (MacKenzie et al., 2002, 2003, 2006) to separately estimate the gentian's probability of occurrence (called occupancy or species distribution) and its probability of being detected at occupied sites (detection probability), along with covariate effects on either occurrence or detection. It may be claimed that site-occupancy models are currently the only genuine distribution models available. All other widespread distribution modeling approaches confound occurrence and detection and only estimate apparent occurrence or, more explicitly, the combination of the probability of occurrence and the probability of detection given occurrence.

The price to be paid for this improved inference is a sort of repeated-measures design, i.e., at least some sites need to be visited twice or, preferably, more often. This field protocol may be called a metapopulation design because the same quantity (occurrence) is assessed at many spatial replicates (Royle, 2004b; Royle and Dorazio, 2006). It is from the pattern of detection and nondetection at multiply visited sites that we obtain the information about detection probability, separately from occurrence probability. See MacKenzie and Royle (2005), MacKenzie et al. (2006), and Bailey et al. (2007) for design considerations relevant to this model.

A balanced design, i.e., an equal number of visits to all sites, is not essential for site-occupancy models; it simply makes things easier to simulate and present. (See Chapter 21 for an alternative, "vertical" data format, which is more convenient for the analysis of unbalanced metapopulation data.) Therefore, in our inventory of G. germanica, we assume that each site is visited three times by an independent botanist, and every time she notes whether at least one plant of G. germanica is detected or not. The results of these surveys may be summarized in a binary string, such as 010 for a site where a gentian is detected during the second, but not during the first or third, survey. Generally, for a species surveyed T times at each of R sites, survey results are summarized in an R-by-T matrix containing a 1 when the species is detected at site i during survey j and a 0 when it is not.
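To make this encoding concrete, here is a small, purely illustrative R sketch (the detection histories and object names are made up and are not part of the gentian example) that turns detection-history strings such as 010 into the R-by-T matrix just described:

hist.strings <- c("010", "000", "110")                       # Hypothetical detection histories for 3 sites
y.mat <- t(sapply(strsplit(hist.strings, ""), as.numeric))   # One row per site, one column per survey
y.mat                                                        # Element [i,j] is 1 if detected at site i in survey j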


The genesis, and therefore the analysis, of detection/nondetection observation yij at site i during survey j is naturally described by a hierarchical, or state-space, model that contains one submodel for the only partially observed true state (occurrence, the result of the biological process) and another submodel for the actual observations. The actual observations result both from the particular realization of the biological process and from the observation process.

zi ~ Bernoulli(ψ)              Biological process yields true state
yij ~ Bernoulli(zi × pij)      Observation process yields observations

Hence, true occurrence zi of G. germanica at site i is a Bernoulli random variable governed by the parameter ψ (occurrence probability), which is exactly the parameter that most distribution modelers wish they were modeling. The actual gentian observation yij, detection or not at site i during survey j (or "presence–absence" datum yij), is another Bernoulli random variable with a success rate that is the product of the actual occurrence of G. germanica at that site, zi, and detection probability pij at site i during survey j. Hence, at a site where the gentian does not occur, z = 0, and y must be 0. Conversely, at an occupied site, we have z = 1, and G. germanica is detected with probability pij. That is, in the site-occupancy model, detection probability is expressed conditional on occurrence, and the two parameters ψ and p are separately estimable if replicate visits are available.
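To make the two linked Bernoulli processes concrete, here is a minimal R sketch for a single site (the parameter values are hypothetical and are not those used later in this chapter):

psi <- 0.6 ; p <- 0.4 ; n.surveys <- 3                         # Hypothetical occupancy and detection probabilities
z.demo <- rbinom(n = 1, size = 1, prob = psi)                  # Biological process: site occupied (1) or not (0)
y.demo <- rbinom(n = n.surveys, size = 1, prob = z.demo * p)   # Observation process: 3 survey outcomes
y.demo                                                         # All zero whenever z.demo = 0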

One way to look at this model in terms of the GLM framework that features so prominently in this book is as a hierarchical, coupled logistic regression: one logistic regression describes true occurrence, and the other describes detection, given that the species does occur. Another description of the model is as a nonstandard binomial GLMM with a binary distribution for the random effects (occupied or not occupied) instead of the normal distribution that we assumed in the "standard" binomial GLMM in Chapter 19 and in most other mixed models in previous chapters.

The above equations describe just the simplest kind of site-occupancy model; they can readily be extended to more complex cases. First, the ability to model covariate effects is crucial; indeed, covariates can easily be modeled into both the occurrence (ψ) and the detection (p) part of the model in a simple GLM fashion. That is, we can add to the model a statement like this:

logit(ψi) = α + β × xi,

where xi is the value of some occurrence-relevant covariate measured at site i, and α and β are parameters. The same can be done for the observation model, i.e., the logit transformation of the detection parameter can be modeled by either a site covariate (xi) or a survey covariate (xij).


We could also model the effects of many explanatory variables, of polynomial terms, or even of splines.

Second, the model as hitherto described assumes a so-called closed population, i.e., a site is assumed to be either occupied or not occupied over the entire survey period. For plant surveys conducted during a single season, this may be a sensible assumption. But as soon as surveys extend over several growing seasons, or the framework is applied to more short-lived or mobile organisms, such as most animals, occupancy status may change within the survey period. Indeed, colonization/extinction dynamics may be a research focus, as in metapopulation studies (Hanski, 1998). The solution then is to collect data according to the robust design, where two or more surveys are conducted within a short time period (called a "season") during which the population can be assumed closed (Williams et al., 2002). This is repeated over multiple seasons, between which the population may change. For such data, there is a dynamic formulation of the simple site-occupancy model described here that can be used to analyze occupancy, colonization, and extinction rates corrected for imperfect detection and to estimate covariate effects on each of these parameters; see MacKenzie et al. (2003, 2006) and Royle and Kéry (2007). In our case, however, we will simulate and analyze data only from a single season and assume a closed population.

20.2 DATA GENERATION

We now simulate the data for our inventory of G. germanica. We assume that the 150 sites visited form a sample of a larger number of calcareous grasslands in the Jura. The study objective is to use these data to learn about all calcareous grassland sites in the Jura and to see whether site humidity affects the distribution of G. germanica.

n.site <- 150                # 150 sites visited

We create an arbitrary continuous index for soil humidity, where −1 means dry and 1 means wet, and sort the values for convenient presentation of the results.

humidity <- sort(runif(n = n.site, min = -1, max = 1))

Next, we create the positive true relationship between occurrence probability of G. germanica and soil humidity (Fig. 20.2). We do this by a logit-linear regression, as is customary for binomial responses. We choose the intercept and the slope of this relationship so that about 50% of all sites end up being occupied by the gentian.

alpha.occ <- 0               # Logit-linear intercept for humidity on occurrence
beta.occ <- 3                # Logit-linear slope for humidity on occurrence


occ.prob <- exp(alpha.occ + beta.occ * humidity) / (1 + exp(alpha.occ + beta.occ * humidity))
plot(humidity, occ.prob, ylim = c(0,1), xlab = "Humidity index", ylab = "Occurrence probability", main = "", las = 1)
true.presence <- rbinom(n = n.site, size = 1, prob = occ.prob)
true.presence                # Look at the true occupancy state of each site
sum(true.presence)           # Among the 150 visited sites

> true.presence <- rbinom(n = n.site, size = 1, prob = occ.prob)
> true.presence              # Look at the true occupancy state of each site
  [1] 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0
 [36] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0
 [71] 0 0 0 1 1 0 0 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 0 0 1 0 0 1 1 1 1 1 1 0 1
[106] 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[141] 0 1 1 1 1 1 1 1 0 1
> sum(true.presence)         # Among the 150 visited sites
[1] 75

This is the true state of the gentian system we are studying, i.e., the realization of the stochastic biological process we are interested in. This state is only imperfectly observable in nature, even for plant populations (Kéry et al., 2006). However, it is what we would like to observe and what we would like to relate to habitat variables such as humidity in our distribution model for G. germanica.


FIGURE 20.2 The unobserved true relationship between humidity and occurrence probability of the Chiltern gentian.


Unfortunately, we only observe a degraded image of that true state of nature, where the degradation arises because G. germanica may be overlooked, particularly at the wetter sites with higher vegetation. So, next we simulate this effect to obtain our actual "presence–absence" observations. After simulating the biological process (which resulted in 150 realizations of true occurrence zi), we now model the observation process that resides between the true biological process ("truth") and our observations (Fig. 20.3).

alpha.p <- 0                 # Logit-linear intercept for humidity on detection
beta.p <- -5                 # Logit-linear slope for humidity on detection
det.prob <- exp(alpha.p + beta.p * humidity) / (1 + exp(alpha.p + beta.p * humidity))
plot(humidity, det.prob, ylim = c(0,1), main = "", xlab = "Humidity index", ylab = "Detection probability", las = 1)

Assuming no false-positive errors, i.e., that no other species is erroneously identified as G. germanica, the Chiltern gentian can only be detected at sites where it occurs. Hence, the effective detection probability is the product of true occurrence (zi) and detection probability (det.prob):

eff.det.prob <- true.presence * det.prob

Importantly, this effective detection probability, or apparent occurrence probability, is precisely the quantity modeled by conventional species distribution models (e.g., Elith et al., 2006)! Its expectation is the product of occurrence probability and detection probability, i.e., ψ × p.
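As a quick numerical check of that claim (a sketch using the objects just created; it is not part of the original analysis), the average of eff.det.prob over sites should be close to the average of the product ψ × p:

mean(eff.det.prob)           # Realized average apparent occurrence, z * p
mean(occ.prob * det.prob)    # Its expectation, psi * p, averaged over the 150 sites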


FIGURE 20.3 Relationship between site humidity and detection probability in the Chiltern gentian.


We store the results of each survey, 1 (gentian detected) or 0 (no gentian detected), in an n.site-by-3 matrix and fill it by simulating coin flips (i.e., drawing Bernoulli trials) with the detection probabilities just computed. We note that this is the first time in the book that we use a two-dimensional array for our data. As a consequence, we will see a double for loop in the BUGS code to analyze these data.

R <- n.site
T <- 3
y <- array(dim = c(R, T))

# Simulate results of first through last surveys
for(i in 1:T){
y[,i] <- rbinom(n = n.site, size = 1, prob = eff.det.prob)
}

Hence, y now contains the results of our simulated surveys to find G. germanica at the 150 sites.

y # Look at the data

sum(apply(y, 1, sum) > 0) # Apparent distribution among 150 visited sites

> y# Look at the detection/nondetection data

[,1] [,2] [,3]

[1,] 0 0 0

[2,] 0 0 0

[ ... ]

[149,] 0 0 0

[150,] 0 0 0

> sum(apply(y, 1, sum) > 0)# Apparent distribution

[1] 31
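In this particular realization, the gentian was thus detected at 31 of the 75 truly occupied sites. That such a fraction is to be expected under our parameter values can be checked directly: at an occupied site, the probability of at least one detection in T = 3 surveys is 1 − (1 − p)^3. A minimal sketch (using the simulation objects created above; not part of the original code):

p.detected <- 1 - (1 - det.prob)^3                      # Pr(at least one detection in 3 surveys)
sum(true.presence * p.detected) / sum(true.presence)    # Expected fraction of occupied sites detected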

On average (i.e., if we simulated this stochastic system many times), our parameter values yield about 42% detected gentian populations. Let's see what a naïve analysis of these observations would tell us about the relationship between humidity and the occurrence of G. germanica. (I call this analysis naïve because it omits an important system component, the observation process.) The simplest way to analyze this relationship is by a logistic regression of an indicator for "ever detected" (here called obs) on humidity.

obs <- as.numeric(apply(y, 1, sum) > 0)
naive.analysis <- glm(obs ~ humidity, family = binomial)
summary(naive.analysis)
lin.pred <- naive.analysis$coefficients[1] + naive.analysis$coefficients[2] * humidity
plot(humidity, exp(lin.pred) / (1 + exp(lin.pred)), ylim = c(0,1), main = "", xlab = "Humidity index", ylab = "Predicted probability of occurrence", las = 1)


We see that in a naïve analysis, the gentian's preference for wetter sites is totally masked (Fig. 20.4; although in different realizations of the data-generating process, you might also find significant positive or even negative slopes for humidity). Let's see whether a site-occupancy model can do better.

20.3 ANALYSIS USING WinBUGS

A variety of site-occupancy models (e.g., with covariates, for single or multiple seasons, single or multiple species) may be fitted using maximum likelihood in the free Windows-based programs MARK (see http://welcome.warnercnr.colostate.edu/~gwhite/mark/mark.htm) and PRESENCE (see http://www.mbr-pwrc.usgs.gov/software/doc/presence/presence.html). R code for obtaining maximum likelihood estimates (MLEs) can be found in the Web appendix of the book by Royle and Dorazio (2008). Furthermore, there is a new R package called unmarked, which allows one to fit these models using maximum likelihood (Fiske and Chandler, 2010). Here, however, we will directly use WinBUGS to fit the site-occupancy model in a Bayesian mode of inference.

As an added feature, we will perform a posterior predictive check based on the sum of absolute residuals. The resulting Bayesian p-value allows us to judge whether the assumed model is appropriate for our data set (based on the selected discrepancy measure).


FIGURE 20.4 A naïve analysis of the apparent relationship between occurrence of the Chiltern gentian and humidity. This conventional species distribution model ignores detection probability.


# Define model
sink("model.txt")
cat("
model {

# Priors
alpha.occ ~ dunif(-10, 10)        # Set A of priors
beta.occ ~ dunif(-10, 10)
alpha.p ~ dunif(-10, 10)
beta.p ~ dunif(-10, 10)
# alpha.occ ~ dnorm(0, 0.01)      # Set B of priors
# beta.occ ~ dnorm(0, 0.01)
# alpha.p ~ dnorm(0, 0.01)
# beta.p ~ dnorm(0, 0.01)

# Likelihood
for (i in 1:R) {                  # Start initial loop over the R sites
# True state model for the partially observed true state
z[i] ~ dbern(psi[i])              # True occupancy z at site i
logit(psi[i]) <- alpha.occ + beta.occ * humidity[i]

for (j in 1:T) {                  # Start a second loop over the T replicates
# Observation model for the actual observations
y[i,j] ~ dbern(eff.p[i,j])        # Detection-nondetection at i and j
eff.p[i,j] <- z[i] * p[i,j]
logit(p[i,j]) <- alpha.p + beta.p * humidity[i]

# Computation of fit statistic (for Bayesian p-value)
Presi[i,j] <- abs(y[i,j] - p[i,j])            # Absolute residual
y.new[i,j] ~ dbern(eff.p[i,j])
Presi.new[i,j] <- abs(y.new[i,j] - p[i,j])
}
}

fit <- sum(Presi[,])              # Discrepancy for actual data set
fit.new <- sum(Presi.new[,])      # Discrepancy for replicate data set

# Derived quantities
occ.fs <- sum(z[])                # Number of occupied sites among 150
}
",fill = TRUE)
sink()

# Bundle data
win.data <- list(y = y, humidity = humidity, R = dim(y)[1], T = dim(y)[2])

# Inits function
zst <- apply(y, 1, max)           # Good starting values for latent states essential!
inits <- function(){list(z = zst, alpha.occ = runif(1, -5, 5), beta.occ = runif(1, -5, 5), alpha.p = runif(1, -5, 5), beta.p = runif(1, -5, 5))}


# Parameters to estimate
params <- c("alpha.occ", "beta.occ", "alpha.p", "beta.p", "occ.fs", "fit", "fit.new")

# MCMC settings
nc <- 3
nb <- 2000
ni <- 12000
nt <- 5

# Start Gibbs sampler
out <- bugs(win.data, inits, params, "model.txt", n.chains = nc, n.iter = ni, n.burnin = nb, n.thin = nt, debug = TRUE)

Before inspecting the parameter estimates, we first check the adequacy of the model for our data set using a posterior predictive check (Fig. 20.5).

plot(out$sims.list$fit, out$sims.list$fit.new, main = "", xlab = "Discrepancy for actual data set", ylab = "Discrepancy for perfect data sets", las = 1)
abline(0, 1, lwd = 2)

We then compute a Bayesian p-value based on the posterior predictive distributions of our discrepancy measures. In our graph, this corresponds to the proportion of points above the 1:1 line; for an adequate model, it should be around 0.5 and not approach 0 or 1 too closely.


FIGURE 20.5 Posterior predictive check of the adequacy of the site-occupancy model for the gentian data based on the sum of absolute residuals. A well-fitting model has about equal numbers of circles on either side of the 1:1 line.


mean(out$sims.list$fit.new > out$sims.list$fit)

> mean(out$sims.list$fit.new > out$sims.list$fit)
[1] 0.3688333

The model seems to fit well, so we compare the known true values from the data-generating process with what the site-occupancy analysis has recovered.

cat("\n *** Known truth ***\n\n")
alpha.occ ; beta.occ ; alpha.p ; beta.p
sum(true.presence)           # True number of occupied sites, to be compared with occ.fs
sum(apply(y, 1, sum) > 0)    # Apparent number of occupied sites

cat("\n *** Our estimate of truth *** \n\n")
print(out, dig = 3)

> cat("\n *** Known truth ***\n\n")

*** Known truth ***

> alpha.occ ; beta.occ ; alpha.p ; beta.p

[1] 0

[1] 3

[1] 0

[1] −5

> sum(true.presence) # True number of occupied sites, to be compared with occ.fs

[1] 75

> sum(apply(y, 1, sum) > 0) # Apparent number of occupied sites

[1] 31

> cat("\n *** Our estimate of truth *** \n\n")

*** Our estimate of truth ***

> print(out, dig = 3)
Inference for Bugs model at "model.txt", fit using WinBUGS,
3 chains, each with 12000 iterations (first 2000 discarded), n.thin = 5
n.sims = 6000 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

alpha.occ 0.445 0.466 −0.417 0.127 0.427 0.744 1.394 1.001 3900

beta.occ 3.737 1.094 1.769 2.971 3.673 4.444 6.043 1.002 2600

alpha.p −0.255 0.259 −0.767 −0.432 −0.256 −0.084 0.256 1.001 6000

beta.p −5.514 0.817 −7.237 −6.030 −5.484 −4.955 −3.988 1.001 6000

occ.fs 76.752 5.812 62.000 74.000 78.000 81.000 85.000 1.001 5800

fit 218.370 6.399 204.297 214.500 218.800 222.600 229.900 1.001 6000

fit.new 217.376 7.246 201.900 212.900 217.700 222.200 230.600 1.001 6000

[ ... ]


Thus, the site-occupancy species distribution model succeeds well in recovering the true relationships between humidity and occurrence and detection probability, respectively, which we built into the data. Particularly impressive is its ability to estimate the true number of occupied sites: gentians were detected at only 31 of the 75 sites where they actually occurred, and the site-occupancy model estimates this number at 76.7, with a 95% credible interval of 62–85. (Note that when analyzing my simulated data set from the book Web site, you will get slightly different results because of Monte Carlo sampling error.)

Finally, we will graphically compare the results from the naïve and the site-occupancy analyses with truth by plotting the true and estimated relationships between occurrence probability of the Chiltern gentian and site humidity (Fig. 20.6). This plot shows the mean predictions under the models; credible intervals could be added if we wished (a sketch of how to do this follows the figure).

plot(humidity, exp(lin.pred) / (1 + exp(lin.pred)), ylim = c(0,1), main = "", ylab = "Occurrence probability", xlab = "Humidity index", type = "l", lwd = 2, col = "red", las = 1)
points(humidity, occ.prob, ylim = c(0,1), type = "l", lwd = 2, col = "black")
lin.pred2 <- out$mean$alpha.occ + out$mean$beta.occ * humidity
points(humidity, exp(lin.pred2) / (1 + exp(lin.pred2)), ylim = c(0,1), type = "l", lwd = 2, col = "blue")


FIGURE 20.6 Comparison of the true and estimated relationships between occurrence probability and humidity in the Chiltern gentian (G. germanica) under a site-occupancy model (dashed line) and under the naïve approach that ignores detection probability (dotted line). Truth is shown as a solid line.
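If we wished to add the credible intervals mentioned above, they could be computed from the saved posterior draws and drawn onto Fig. 20.6. A minimal sketch (an illustration only; the object names are introduced here, and the plot created by the code above is assumed to be still open):

lin.pred.mat <- outer(out$sims.list$alpha.occ, rep(1, length(humidity))) + outer(out$sims.list$beta.occ, humidity)
post.pred <- exp(lin.pred.mat) / (1 + exp(lin.pred.mat))          # One row per posterior draw, one column per site
cri <- apply(post.pred, 2, quantile, probs = c(0.025, 0.975))     # Pointwise 95% credible limits
lines(humidity, cri[1,], lty = 2, col = "blue")
lines(humidity, cri[2,], lty = 2, col = "blue")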


It is evident from Fig. 20.6 that not accounting for detection in species distribution models may lead one spectacularly astray. However, some might argue that we have simulated a pathological case and that one would rarely find such a situation in nature. This may be true. But we don't know until we have conducted the right analysis. And we know that even a constant detection probability <1 not only biases the apparent distribution in conventional models but also biases low the strength of covariate relationships. Also, given suitable data, the site-occupancy distribution model can correct for that. This should make it a serious candidate for species distribution modeling when replicate observations are available from at least some sites.

In practice, this very positive conclusion about the model and its implementation in WinBUGS needs to be moderated somewhat. Performance of the model will be inferior with smaller samples (e.g., with fewer sites, a smaller proportion of occupied sites, lower detection probability, or fewer replicate visits) and presumably also in the presence of important unmeasured covariates. On the application side, it must be said that WinBUGS can be painful when fitting these slightly more complex models. For instance, it is essential to provide adequate starting values, in particular for the latent state (occupancy, the code bit z = zst). There may also be prior sensitivity. For instance, altering the set B prior precision from 0.01 to 0.001 may cause WinBUGS to issue one of its dreaded trap messages (e.g., TRAP 66 (postcondition violated)). Sometimes, these are produced even with uniform or normal(0, 0.01) priors. Many other seemingly innocent modeling choices may influence success or failure when fitting a particular model to a given data set.

Hence, for suitable data, site-occupancy models in WinBUGS are great, but a comprehensive analysis of a more complex model may have to be accompanied by a few simulations to check the quality of the inference for the particular case. In addition, sometimes you must be prepared to do quite some amount of painstaking trial and error until the code works. But then, to a large part, this applies quite generally to more complex models, not just site-occupancy models and not just to models fitted using WinBUGS.

20.4 SUMMARY

The site-occupancy model is an extended logistic regression that can estimate true occurrence probability (ψ) and the factors affecting it, while correcting for imperfect detection. The extension is represented by the model component for detection probability (p): conventional logistic regression is a special case, where p = 1. Site-occupancy models are the only current framework for inference about species distributions that models true rather than apparent distributions; the latter is the product of occurrence and detection probability (i.e., ψ × p). Our example shows that not accounting for detection probability may lead to spectacularly wrong inferences about the distribution of a species under a conventional, naïve modeling approach. In contrast, the site-occupancy model applied to replicated "presence–absence" data was much better able to estimate the true system state (site occupancy and the covariate relationship).

EXERCISES

1. Prior sensitivity: Conduct a simple prior sensitivity analysis. Compare the inference under the normal and the uniform sets of priors. Plot a histogram of the posterior draws for each of the four primary model parameters to see whether the uniform(−10, 10) priors were not too restrictive to be uninformative.

2. Site and survey covariates: We have fitted a site covariate, i.e., one that varied among sites but not among surveys. Incorporate a survey covariate into the simulation and analysis code, i.e., one that varies by individual survey. An example might be an inventory conducted by different people with differing and known experience. Experience could be rated on a continuous scale from 0 to 1, and more than one person would be sent to each site. Hint: A sampling covariate has the same 2-D format as the observed detection/nondetection data.

3. Swiss hare data: Collapse the hare counts of one particular year (e.g., 2000) to binary data where a 1 indicates the observation of a "large population" (say, a count ≥10). Estimate the proportion of sites inhabited by a large population, i.e., one capable of producing a count ≥10. A site-occupancy model will correct for the fact that a "large population" may appear small, i.e., produce a count <10, or be missed altogether. You may fit a site covariate such as elevation on the occurrence probability of "large" populations (i.e., on ψ).

4. Simulation study: Extend the simulation in this chapter to see under which conditions the site-occupancy model is superior in performance to a binomial GLM. Things to vary might be the number of sites (e.g., 20, 50, 150, 500), the number of replicate visits (e.g., 2, 3, 5, 10), true average occupancy, and true average detection probability. This is a larger project that could be part of a thesis.


C H A P T E R

21

Nonstandard GLMMs 2: Binomial Mixture Model to Model Abundance

O U T L I N E

21.1 Introduction
21.2 Data Generation
21.3 Analysis Using WinBUGS
21.4 Summary

21.1 INTRODUCTION

Ecology has been defined as the study of distribution and abundance (Andrewartha and Birch, 1954; Krebs, 2001). However, in nature, neither of them can usually be observed without error, and methods may need to be applied to infer the true states of distribution and abundance from imperfect observations. In Chapter 20, we met a protocol, which we called a metapopulation design, where the same quantity was assessed in a similar way across R sites and T temporal replicates. We saw that such a metapopulation design enables the application of site-occupancy models, a kind of nonstandard generalized linear mixed model (GLMM) with binary random effects, to estimate true species distribution free of the distorting effects of detection probability. Temporal replicate observations in a closed system allowed us to resolve the confounding between occurrence and detection.

This chapter features another nonstandard GLMM where the random-effects distribution is different from normal, namely Poisson. As for the site-occupancy model, the random effects in this model have a precise biological meaning, which in this case is local population size. Thus, this model estimates abundance corrected for imperfect detection from temporally and spatially replicated counts, rather than occurrence from detection/nondetection observations as does the site-occupancy model in Chapter 20.

Our ecological motivation in this chapter is that of Dutch sand lizards (Lacerta agilis), one of the most widespread reptiles in large parts of Western Europe (Fig. 21.1). For more than a decade, The Netherlands have had a rather interesting reptile-monitoring program in which volunteers walk transects of about 2 km length repeatedly during spring and count all reptiles they see (Kéry et al., 2009). We will generate and analyze data in the format collected in this scheme.

FIGURE 21.1 Pair of sand lizards (Lacerta agilis), Switzerland, 2006. (Photo: T. Ott)


The typical question in monitoring programs is always "are things getting better or worse?", i.e., is there a trend over time in abundance (or distribution)? The usual type of analysis to answer this question for counts used to be linear regression, but since generalized linear models (GLMs) have become fashionable, some sort of Poisson regression has become the method of choice for the analysis of count data. And, as in other ecological analyses, random effects have become popular; thus, nowadays, inference from many time series of animal count data is often based on variants of Poisson mixed models (e.g., Link and Sauer, 2002; Ver Hoef and Jansen, 2007).

When repeated counts are available, often only the maximum count is analyzed, although this approach simply throws away valuable information. In this chapter, we make full use of the replicated counts and adopt the binomial mixture model (also called the N-mixture model) of Royle (2004a) to estimate true abundance, corrected for imperfect detection (also see Dodd and Dorazio, 2004; Royle, 2004b; Kéry et al., 2005; Royle et al., 2005; Dorazio, 2007; Kéry, 2008; Wenger and Freeman, 2008; Joseph et al., 2009). For simplicity, we will consider only data from a single year, and hence we assume population closure, but the model can also be fitted to multiyear data, and a trend in abundance can then be estimated directly (Royle and Dorazio, 2008, pp. 4–7; Kéry and Royle, 2009; Kéry et al., 2010).

First, the model: we assume that a count yij at site i made during survey j comes from a two-stage stochastic process. The first stochastic process is the biological process that distributes the animals among the sites. This process generates the site-specific abundance that we would like to model directly but cannot, because we hardly ever see all individuals. The standard statistical model for such data is the Poisson distribution, governed by the intensity (density) parameter λ, which is typically conditional on a few habitat covariates. The result of this first stochastic process is the local, site-specific abundance Ni. Given that true state Ni, the second stochastic process is the observation process which, together with Ni, determines the data actually observed, i.e., the counts yij. A natural model for the observation process in the presence of imperfect detection is the binomial distribution: given that there are Ni sand lizards present and that each has a probability pij of being observed at site i during replicate survey j, the number of lizards actually observed is binomially distributed. Two important consequences are that (1) we typically observe fewer than Ni lizards, and (2) the counts yij will vary from survey to survey even under identical conditions (Kéry and Schmidt, 2008). Three important assumptions of the binomial mixture model are population closure; independent and identical detection probability for all individuals at site i during survey j, except insofar as differences among sites or surveys are modeled by covariates; and the absence of double counts and other false-positive errors. The effects of violations of these assumptions are still being investigated (e.g., Joseph et al., 2009).

In summary, the binomial mixture model to estimate abundance from temporally and spatially replicated counts can be written succinctly in just two lines:

Ni ~ Poisson(λ)               Biological process yields true state
yij ~ Binomial(Ni, pij)       Observation process yields observations

It is fascinating to note the similarity of this hierarchical model (Royle and Dorazio, 2006, 2008) for abundance and that for occurrence, the site-occupancy model from Chapter 20. Recognizing that in the site-occupancy model the observation process may also be described by a binomial distribution (rather than by a Bernoulli), the sole thing that changes when we go from the modeling of occurrence to that of abundance is the distribution used to model the biological process, a Poisson instead of a Bernoulli. The binomial mixture model can be described as a binomial GLMM with a discrete (Poisson-distributed) random effect or, alternatively, as a logistic regression for the count observations coupled with a Poisson regression for the imperfectly observed abundances.

Furthermore, as in the site-occupancy model, covariate effects can be modeled into the Poisson parameter λ via a log link function and into the binomial success rate p via the logit link. We can add to the model expressions such as log(λi) = α + β × xi and logit(pij) = α + β × xij, where xi and xij are the values of a site covariate measured at site i (xi) or of a survey covariate measured at site i during survey j (xij). Of course, more than a single covariate could be fitted, and the covariates for detection can be of both types.

Before we embark on our usual simulation-analysis exercise, we make two important observations about the abundance parameter of the binomial mixture model. First, even when correcting for imperfect detection (pij), the interpretation of the abundance parameter Ni is not what we might want it to be: the number of individuals that permanently reside within a well-defined plot of land. The reason why this is not so is that animals move around, so the effective sampling area will be greater than the nominal sampling area. Hence, the estimate of Ni refers to a larger area, and we don't know exactly how large that area is. The magnitude of this discrepancy depends on two things: the typical dispersal distances of the study species and the time frame of the repeated surveys. The discrepancy will be greater for greater dispersal and a longer total survey period. If we want to circumvent this difficulty, other sampling and analysis protocols must be used, such as distance sampling (Buckland et al., 2001) or, more recently, spatial capture–recapture methods (Efford, 2004; Borchers and Efford, 2008; Royle and Dorazio, 2008; Royle and Young, 2008; Efford et al., 2009; Royle et al., 2009).

Second, when animals move through the sampling area randomly, thus in effect violating the closure assumption, it appears that the estimate of Ni does not refer to the number of animals that permanently reside within the sample area, but to the number of animals that ever use the area during the entire sampling period (Joseph et al., 2009). This reasoning is analogous to that about the interpretation of the occupancy parameter in site-occupancy models in the face of temporary emigration (MacKenzie et al., 2006). In effect, it again makes the effective sampling area larger than the nominal sampling area.

These issues are not a fault of the binomial mixture model; rather, even in the absence of a formal framework for interpreting a count in the light of both the biological and the observation process, we never know exactly with which area a count is associated. Nor do we know by how much movement inflates our counts relative to the number of individuals that permanently reside within the area. Thus, the binomial mixture model solves the problem of imperfect detection when interpreting (i.e., analyzing) counts, but the issue of how exactly abundance should be interpreted may still remain a challenge.

21.2 DATA GENERATION

In this example of the analysis of data collected from a metapopulation design, we will choose a different format from that in Chapter 20. Instead of a rectangular or horizontal format (remember the data matrix yij in the site-occupancy analysis), we will assemble and analyze the data in a vertical format and use a population index covariate to keep track, for each count, of the population in which it was made. This format is more convenient when there are many missing values, i.e., in unbalanced designs, where the number of replicate surveys varies among sites. (It is fairly easy to formulate the site-occupancy model in this format also, and the code in the current chapter may be used as a template. For an example of WinBUGS code for the binomial mixture model in horizontal format, see Kéry, 2008.) Also note the similarity of this to a traditional (normal) repeated-measures analysis of variance, where some statistics packages allow the replicate observations to be stored in parallel columns (the horizontal format) and others prefer the observations in a single column (the vertical format) with an additional covariate that indexes "subjects." This reiterates the fact that both site-occupancy and binomial mixture models are a kind of repeated-measures analysis.
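To see why the vertical format deals so gracefully with unbalanced designs, consider this small, self-contained sketch (hypothetical numbers; the sand lizard example below uses a balanced design). If surveys that were never conducted are coded as NA in the site-by-survey matrix, the vertical data vector and its site index simply drop those entries:

y.unbal <- matrix(c(3, 0, NA,
                    1, NA, NA,
                    0, 2, 1), nrow = 3, byrow = TRUE)      # 3 sites, up to 3 surveys, NA = not surveyed
keep <- !is.na(c(y.unbal))
C.unbal <- c(y.unbal)[keep]                                # Vertical vector of counts
site.unbal <- rep(1:nrow(y.unbal), ncol(y.unbal))[keep]    # Site index, one entry per count
cbind(C.unbal, site.unbal)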

To simulate our data, we assume that we surveyed 200 sites. We choose a site covariate that affects the abundance of sand lizards and also their detection probability. We take vegetation cover this time and assume that lizard abundance is highest at medium values: too open is bad, but too dense vegetation is also bad. We will model this as a quadratic effect of vegetation density on abundance.

n.site <- 200
vege <- sort(runif(n = n.site, min = -1.5, max = 1.5))

We construct the relationship between vegetation density and abundance (Fig. 21.2).

alpha.lam <- 2               # Intercept
beta1.lam <- 2               # Linear effect of vegetation
beta2.lam <- -2              # Quadratic effect of vegetation
lam <- exp(alpha.lam + beta1.lam * vege + beta2.lam * (vege^2))

par(mfrow = c(2,1))
plot(vege, lam, main = "", xlab = "", ylab = "Expected abundance", las = 1)
N <- rpois(n = n.site, lambda = lam)

table(N) # Distribution of abundances across sites

sum(N > 0) / n.site # Empirical occupancy

> table(N)# Distribution of abundances across sites

N
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 19 20 21
57 12 11  7  7 16  8 13 14 12  9  9  7  4  5  3  2  2  1  1

FIGURE 21.2 Expected (top) and realized (bottom) relationship between sand lizard abundance and vegetation cover. The expected abundance is shown as a black line in the bottom panel and is the same as the realized abundance minus the Poisson variability.

> sum(N > 0) / n.site # Empirical occupancy

[1] 0.715

plot(vege, N, main = "", xlab = "Vegetation cover", ylab = "Realized abundance")
points(vege, lam, type = "l", lwd = 2)

This concludes our description of the biological process: we have a random process that distributes the sand lizards across sites, and we assume that the result of this stochastic process at site i can be approximated by a conditional Poisson distribution with rate parameter λi, which itself depends on vegetation density xi in a quadratic fashion.

Next, we need to simulate the observation process, i.e., something like a "machine" that maps abundance Ni onto lizard counts yij. We assume that the observation process is also affected by vegetation density: denser vegetation reduces detection probability (Fig. 21.3, upper panel). I note in passing that in the binomial mixture model, detection probability is defined per individual animal, whereas in the occupancy model, it refers to the probability of detecting at least one among the Ni animals or plants present at a site. In fact, there is a precise mathematical relationship between the two kinds of detection probability, which we do not show here (see Royle and Nichols, 2003; Dorazio, 2007).
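One form of that relationship, exploited by Royle and Nichols (2003), links the two directly: if each of the N individuals present at a site is detected independently with per-individual probability p, then the probability of detecting the species at all during one survey is 1 − (1 − p)^N. A one-line illustration with made-up numbers:

1 - (1 - 0.3)^5              # Individual detection probability 0.3 and N = 5 give roughly 0.83 at the species level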


FIGURE 21.3 The relationship between vegetation cover and detection probability (top) and the expected sand lizard counts (= apparent abundance; bottom). Truth is shown as a black line in the bottom panel.


par(mfrow = c(2,1))
alpha.p <- 1                 # Intercept
beta.p <- -4                 # Linear effect of vegetation
det.prob <- exp(alpha.p + beta.p * vege) / (1 + exp(alpha.p + beta.p * vege))
plot(vege, det.prob, ylim = c(0,1), main = "", xlab = "", ylab = "Detection probability")

Now, just for fun, let's look at the expected lizard count at each site in relation to vegetation cover. The expected count at site i is given by the product of abundance Ni and detection probability pi at that site. Let's put it all together and also inspect the truth.

expected.count <- N * det.prob
plot(vege, expected.count, main = "", xlab = "Vegetation cover", ylab = "Apparent abundance", ylim = c(0, max(N)), las = 1)
points(vege, lam, type = "l", col = "black", lwd = 2)     # Truth

A conventional analysis would use some sort of Poisson regression and model the expected count, or apparent abundance. This is the bell-shaped cloud in the lower panel of Fig. 21.3, where abundance and detection are confounded. Thus, compared with the truth represented by the black line, a conventional analysis will underestimate average abundance and (in our case) place maximum abundance at too low a value of vegetation cover. This is because the Poisson regression does not model abundance but rather the product of expected abundance and detection probability. This is hardly ever made explicit by authors and apparently often not even recognized.

Now let's simulate three replicate counts at each site and look at the data.

R <- n.site
T <- 3                       # Number of replicate counts at each site
y <- array(dim = c(R, T))

for(j in 1:T){
y[,j] <- rbinom(n = n.site, size = N, prob = det.prob)
}
y

A species (occurrence) distribution is fundamentally the same as an abundance distribution, but with much reduced information: a species occurs at all sites where abundance N > 0 (Royle et al., 2005; Dorazio, 2007). Hence, any model of abundance is also a model of species distribution. It is seldom useful to think of distribution as something separate from abundance.
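For the Poisson biological process used here, that link can be written down explicitly: the probability that a site is occupied is Pr(N > 0) = 1 − exp(−λ). A quick sketch for our simulated lizard data (using lam and N created above; this check is not part of the original code):

mean(1 - exp(-lam))          # Expected proportion of occupied sites implied by the abundance model
mean(N > 0)                  # Realized proportion in this simulation (0.715, as printed above)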


sum(apply(y, 1, sum) > 0)    # Apparent distribution (number of apparently occupied sites)

sum(N > 0) # True occupancy

> sum(apply(y, 1, sum) > 0)

[1] 126

> sum(N > 0)

[1] 143

Now we stack the replicated counts on top of each other to obtain a vertical data format (i.e., we convert the matrix to a vector).

C <- c(y)

We also need a site index and a vegetation covariate that have the same length as the variable C (for the observation model, i.e., to model p; see the WinBUGS code in Section 21.3). We will denote them by a .p suffix in the variable name.

site <- 1:R                  # 'Short' version of site covariate
site.p <- rep(site, T)       # 'Long' version of site covariate
vege.p <- rep(vege, T)       # 'Long' version of vegetation covariate
cbind(C, site.p, vege.p)     # Check that all went right

Here is a quick-and-dirty conventional analysis of the maximum counts, assuming a Poisson distribution for the maximum count per site. (A slightly better alternative would perhaps be to assume a normal distribution for the mean of the counts; a sketch of that alternative follows the code below.)

max.count <- apply(y, 1, max)
naive.analysis <- glm(max.count ~ vege + I(vege^2), family = poisson)
summary(naive.analysis)
lin.pred <- naive.analysis$coefficients[1] + naive.analysis$coefficients[2] * vege + naive.analysis$coefficients[3] * (vege*vege)
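The alternative mentioned in parentheses above could be sketched as follows (an illustration only; mean.count and naive.analysis2 are names introduced here and are not used elsewhere in the chapter):

mean.count <- apply(y, 1, mean)                           # Mean count per site
naive.analysis2 <- lm(mean.count ~ vege + I(vege^2))      # Normal model for the mean counts
summary(naive.analysis2)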

We compare truth and the naïve analysis in a graph (Fig. 21.4):

par(mfrow = c(1,1))
plot(vege, max.count, main = "", xlab = "Vegetation cover", ylab = "Abundance or count", ylim = c(0, max(N)), las = 1)
points(vege, lam, type = "l", col = "black", lwd = 2)
points(vege, exp(lin.pred), type = "l", col = "red", lwd = 2)

FIGURE 21.4 Relationship between abundance of Dutch sand lizards and vegetation cover as inferred by a naïve analysis not accounting for detection probability. Circles show the maximum count at each site and the black solid line the predicted relationship under the naïve model. Truth is shown as a dashed line.

Clearly, the predictions under the naïve analysis yield a biased picture of the relationship between abundance and vegetation cover because sand lizards are easier to see in more open vegetation.

21.3 ANALYSIS USING WinBUGS

Now let's see how the binomial mixture model can do better. As for the site-occupancy model, a variety of binomial mixture models can be fitted using maximum likelihood in the free Windows programs MARK and PRESENCE. R code for obtaining maximum likelihood estimates under the model can be found in Kéry et al. (2005), Royle and Dorazio (2008) and its Web appendix, and Wenger and Freeman (2008). Furthermore, the model can be fitted using functions in the new R package unmarked (Fiske and Chandler, 2010). We show a Bayesian solution here.

# Define model

sink("BinMix.txt")

cat("

model {

# Priors

alpha.lam ~ dnorm(0, 0.1)

beta1.lam ~ dnorm(0, 0.1)

beta2.lam ~ dnorm(0, 0.1)

alpha.p ~ dnorm(0, 0.1)

beta.p ~ dnorm(0, 0.1)


# Likelihood

# Biological model for true abundance

for (i in 1:R) { # Loop over R sites

N[i] ~ dpois(lambda[i])

log(lambda[i]) <- alpha.lam + beta1.lam * vege[i] + beta2.lam * vege2[i]
}

# Observation model for replicated counts
for (i in 1:n) {             # Loop over all n observations
C[i] ~ dbin(p[i], N[site.p[i]])
logit(p[i]) <- alpha.p + beta.p * vege.p[i]
}

# Derived quantities
totalN <- sum(N[])           # Estimate total population size across all sites
}
",fill = TRUE)

sink()

# Bundle data

R <- dim(y)[1]
n <- dim(y)[1] * dim(y)[2]
vege2 <- (vege * vege)
win.data <- list(R = R, vege = vege, vege2 = vege2, n = n, C = C, site.p = site.p, vege.p = vege.p)

As for the site-occupancy model, clever starting values for the latent states (the Ni's) are essential. We use the maximum count at each site as a first guess of what N might be and add 1 to avoid zeroes: WinBUGS cannot use a zero for the binomial sample size N and will crash if you initialize the model with zeroes.

# Inits function
Nst <- apply(y, 1, max) + 1
inits <- function(){list(N = Nst, alpha.lam = rnorm(1, 0, 1), beta1.lam = rnorm(1, 0, 1), beta2.lam = rnorm(1, 0, 1), alpha.p = rnorm(1, 0, 1), beta.p = rnorm(1, 0, 1))}

# Parameters to estimate
params <- c("N", "totalN", "alpha.lam", "beta1.lam", "beta2.lam", "alpha.p", "beta.p")

# MCMC settings
nc <- 3
nb <- 200
ni <- 1200
nt <- 5


# Start Gibbs sampler
out <- bugs(win.data, inits, params, "BinMix.txt", n.chains = nc, n.iter = ni, n.burnin = nb, n.thin = nt, debug = TRUE)

Here is a first note on the practical implementation of such slightly more complex models in WinBUGS: this code works fine and appears to converge surprisingly quickly for our data set. But as an illustration of how "difficult" WinBUGS can sometimes be, try widening the range of the priors by reducing some or all of the precisions from 0.1 to 0.01: WinBUGS will crash. Many modeling choices that are not wrong, but simply not chosen in an optimal manner, can throw you off the track in your attempts to exploit the great modeling freedom that WinBUGS gives you in principle. It is true that with experience, the reasons for many crashes can be diagnosed, but for a beginner, they may represent major stumbling blocks.

Let's now compare the known true values with what the analysis has recovered.

cat("\n *** Our estimate of truth *** \n\n")
print(out, dig = 2)
cat("\n *** Compare with known truth ***\n\n")
alpha.lam ; beta1.lam ; beta2.lam ; alpha.p ; beta.p
sum(N)                       # True total population size across all sites
sum(apply(y, 1, max))        # Sum of site max counts

> cat("\n *** Our estimate of truth *** \n\n")

*** Our estimate of truth ***

> print(out, dig = 2)
Inference for Bugs model at "BinMix.txt", fit using WinBUGS,
3 chains, each with 1200 iterations (first 200 discarded), n.thin = 5
n.sims = 600 iterations saved
mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

[ ... ]

totalN 957.01 160.99 683.95 836.00 948.00 1066.00 1282.22 1.01 310

alpha.lam 2.47 0.18 2.11 2.35 2.48 2.59 2.80 1.00 340

beta1.lam 0.71 0.23 0.27 0.54 0.72 0.85 1.17 1.01 480

beta2.lam -2.81 0.23 -3.25 -2.97 -2.80 -2.65 -2.33 1.02 150
alpha.p 0.14 3.16 -5.80 -2.10 0.06 2.57 6.32 1.00 600
beta.p 0.14 2.95 -5.95 -2.15 0.17 1.96 5.76 1.00 600

deviance 5844.93 4799.46 1385.70 2547.73 4475.00 7278.66 18910.75 1.00 600

[ ... ]

>

> cat("\n *** Compare with known truth ***\n\n")


*** Compare with known truth ***

> alpha.lam ; beta1.lam ; beta2.lam ; alpha.p ; beta.p

[1] 2

[1] 2

[1] -2
[1] 1
[1] -4

> sum(N) # True total population size across all sites

[1] 1073

> sum(apply(y, 1, max))# Sum of site max counts

[1] 481

With truth being 1073 sand lizards, the estimate of total N (957 lizards) appears decent relative to the sum of the maximum counts across all 200 sites, which was only 481. However, there is only a slight correspondence for the coefficients in the biological process model (alpha.lam, beta1.lam, beta2.lam) and no similarity at all for the coefficients in the observation process model ... That is disappointing! What has happened? Note again that WinBUGS claims that the chains have converged.

Seeing these results for the first time, I couldn't believe that this would go wrong, because I knew that these parameters should all be identifiable (and were so for an only slightly different model in Kéry, 2008). I tried several things to see whether I could get a better result: I increased sample sizes (e.g., to R = 2000 and T = 10) and dropped the quadratic term in the biological process model, but none of this helped. In the end, I tried an alternative set of uniform priors instead of the fairly uninformative normals used previously. I also avoided the WinBUGS logit and defined that function myself (see WinBUGS tricks in the Web appendix). And this worked!

That is, I fitted the following version of the model, where I also added code to assess model goodness of fit using a posterior predictive check for a Chi-square discrepancy measure. Convergence in a binomial mixture model is notoriously attained much more slowly than, say, in a site-occupancy model; therefore, I ran considerably longer chains.

# Define model with new uniform priors

sink("BinMix.txt")

cat("

model {

# Priors (new)

alpha.lam ~ dunif(-10, 10)
beta1.lam ~ dunif(-10, 10)
beta2.lam ~ dunif(-10, 10)
alpha.p ~ dunif(-10, 10)
beta.p ~ dunif(-10, 10)


# Likelihood
# Biological model for true abundance
for (i in 1:R) {
N[i] ~ dpois(lambda[i])
log(lambda[i]) <- alpha.lam + beta1.lam * vege[i] + beta2.lam * vege2[i]
}

# Observation model for replicated counts
for (i in 1:n) {
C[i] ~ dbin(p[i], N[site.p[i]])
lp[i] <- alpha.p + beta.p * vege.p[i]
p[i] <- exp(lp[i]) / (1 + exp(lp[i]))
}

# Derived quantities
totalN <- sum(N[])

# Assess model fit using a Chi-square discrepancy
for (i in 1:n) {
# Compute fit statistic for observed data
eval[i] <- p[i] * N[site.p[i]]
E[i] <- pow((C[i] - eval[i]), 2) / (eval[i] + 0.5)
# Generate replicate data and compute fit stats for them
C.new[i] ~ dbin(p[i], N[site.p[i]])
E.new[i] <- pow((C.new[i] - eval[i]), 2) / (eval[i] + 0.5)
}
fit <- sum(E[])
fit.new <- sum(E.new[])
}
",fill = TRUE)

sink()

# Parameters to estimate
params <- c("N", "totalN", "alpha.lam", "beta1.lam", "beta2.lam", "alpha.p", "beta.p", "fit", "fit.new")

# MCMC settings
nc <- 3
nb <- 10000
ni <- 60000
nt <- 50                     # Takes about 20 mins on my laptop

# Start Gibbs sampler
out <- bugs(win.data, inits, params, "BinMix.txt", n.chains = nc, n.iter = ni, n.burnin = nb, n.thin = nt, debug = TRUE)


Because convergence in Bayesian analyses of binomial mixture models may be hard to achieve, we first check whether this run has converged. We find that the specifications of this Markov chain Monte Carlo (MCMC) run seem to have been long enough: the Markov chains of all primary, structural model parameters appear to have converged; only those for some of the latent (local abundance Ni) parameters have not. Their convergence is of lesser concern, unless of course the chains for the local abundance of your favorite population happened not to converge.

print(out, dig = 3)
which(out$summary[,8] > 1.1)

Next, we check whether the uniform prior on beta.p was too restrictive by producing a histogram of the posterior (the posterior of this parameter seemed to come closest to the bounds of the uniform prior). There is no indication of any mass being piled up toward one of the bounds (Fig. 21.5), and therefore we conclude that this prior had no undue influence on our inference.

hist(out$sims.list$beta.p, col = "grey", main = "", xlab = "")

Next, we check the adequacy of the model for the data set using a posterior predictive check, before inspecting the parameter estimates (Fig. 21.6):

plot(out$sims.list$fit, out$sims.list$fit.new, main = "",
   xlab = "Discrepancy measure for actual data set",
   ylab = "Discrepancy measure for perfect data sets")
abline(0, 1, lwd = 2, col = "black")


FIGURE 21.5 Posterior distribution of the slope estimate of sand lizard detection probability on vegetation cover (beta.p).


mean(out$sims.list$fit.new > out$sims.list$fit)

> mean(out$sims.list$fit.new > out$sims.list$fit)

[1] 0.1906667

Both the graphical check and the Bayesian p-value indicate an adequate model for our data set, so we inspect the parameter estimates and compare them with truth in the data-generating process.

cat("\n *** Our estimate of truth *** \n\n")

print(out, dig = 2)

cat("\n *** Compare with known truth ***\n\n")

alpha.lam ; beta1.lam ; beta2.lam ; alpha.p ; beta.p

sum(N) # True total population size across all sites

sum(apply(y, 1, max)) # Sum of site max counts

> cat("\n *** Our estimate of truth *** \n\n")

*** Our estimate of truth ***

> print(out, dig = 2)

Inference for Bugs model at "BinMix.txt", fit using WinBUGS,
 3 chains, each with 60000 iterations (first 10000 discarded), n.thin = 50
 n.sims = 3000 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

N[1] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1

[ ... ]


FIGURE 21.6 Posterior predictive check of the binomial mixture model using a Chi-square discrepancy.


N[200] 2.26 3.07 0.00 0.00 1.00 3.00 11.00 1.03 290

totalN 1014.12 268.27 683.00 822.00 942.00 1136.25 1741.17 1.02 160

alpha.lam 1.96 0.07 1.82 1.91 1.96 2.00 2.10 1.00 1300

beta1.lam 1.81 0.26 1.34 1.62 1.79 1.99 2.36 1.01 190

beta2.lam -1.99 0.33 -2.64 -2.22 -1.99 -1.76 -1.35 1.01 280

alpha.p 1.12 0.17 0.78 1.02 1.13 1.24 1.44 1.01 310

beta.p -3.94 0.49 -4.94 -4.28 -3.92 -3.59 -3.02 1.01 260

fit 153.07 10.64 133.10 145.70 152.70 160.10 175.10 1.00 550

fit.new 139.98 16.73 110.60 128.00 139.10 150.60 176.20 1.01 290

[ ... ]

DIC info (using the rule, pD = var(deviance)/2)
pD = 209.2 and DIC = 1161.9

DIC is an estimate of expected predictive error (lower deviance is better).

>

> cat("\n *** Compare with known truth ***\n\n")

*** Compare with known truth ***

> alpha.lam ; beta1.lam ; beta2.lam ; alpha.p ; beta.p

[1] 2

[1] 2

[1] -2

[1] 1

[1] -4

> sum(N) # True total population size across all sites

[1] 1073

> sum(apply(y, 1, max))# Sum of site max counts

[1] 481

Our model recovered adequate parameter estimates for the covariate relationships and estimated a total population size across all 200 sites of 1014 sand lizards (95% CI: 683–1741). As a comparison, truth was 1073 lizards, and the sum of the max counts, a conventional estimate of total population size, was only 481.
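To express how severe the undercount of the conventional estimator is, we can relate both estimates to truth. This is a quick sketch only; it assumes the simulated objects y and N and the fitted object out from this chapter are still in the workspace.

# Proportion of the true total population size recovered by the sum of the
# site maxima (conventional estimate) and by the binomial mixture model
max.counts <- sum(apply(y, 1, max))
round(c(conventional = max.counts / sum(N),
        model.based = out$mean$totalN / sum(N)), 2)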

Figure 21.7 gives a graphical comparison between the parameter estimates under the binomial mixture model and the values of the associated data-generating parameters. It shows again that the model does a good job at estimating abundance.

par(mfrow = c(3, 2))
hist(out$sims.list$alpha.lam, col = "grey", main = "alpha.lam", xlab = "")
abline(v = alpha.lam, lwd = 3, col = "black")
hist(out$sims.list$beta1.lam, col = "grey", main = "beta1.lam", xlab = "")


abline(v = beta1.lam, lwd = 3, col = "black")
hist(out$sims.list$beta2.lam, col = "grey", main = "beta2.lam", xlab = "")
abline(v = beta2.lam, lwd = 3, col = "black")
hist(out$sims.list$alpha.p, col = "grey", main = "alpha.p", xlab = "")
abline(v = alpha.p, lwd = 3, col = "black")
hist(out$sims.list$beta.p, col = "grey", main = "beta.p", xlab = "")
abline(v = beta.p, lwd = 3, col = "black")
hist(out$sims.list$totalN, col = "grey", main = "Total N", xlab = "")
abline(v = sum(N), lwd = 3, col = "black")

Here is a second note on the practical implementation of such slightly more complex models in WinBUGS: we saw that a relatively slight change (here, in the priors) had a very large effect on the inference. This could also be called an example of prior sensitivity of the inference. It is definitely a good idea to check the Bayesian analysis of more complex models by exploring "neighboring model regions": vary the likelihood (e.g., the covariates that are included or not), the priors, the model parameterization, or other things slightly, and see whether your inference is robust.
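As a minimal sketch of such a check, one could rerun the exact same analysis with a second model file, hypothetically called "BinMix2.txt", that is identical except that the uniform priors are replaced by vague normals again, and then compare the key posterior means side by side (the file name and the comparison are illustrative, not part of the original analysis):

pars <- c("alpha.lam", "beta1.lam", "beta2.lam", "alpha.p", "beta.p")
out2 <- bugs(win.data, inits, params, "BinMix2.txt", n.chains = nc,
   n.iter = ni, n.burnin = nb, n.thin = nt, debug = TRUE)
round(cbind(uniform.priors = out$summary[pars, "mean"],
   normal.priors = out2$summary[pars, "mean"]), 2)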


FIGURE 21.7 Comparison of estimates under the binomial mixture model (posterior distributions) and truth in the data-generating algorithm (black line) for six estimands.


In our case, we now get rather decent estimates fairly close to the known truth. Hence, we illustrate a few further inferences that can be made under the model by looking at some additional posterior distributions. In Fig. 21.7, we have seen those for some primary parameters of the model. One of the most interesting features of the binomial mixture model is that site-specific abundance estimates, Ni, can be obtained. Let's now have a look at these estimates of local abundance for a random sample of sites (Fig. 21.8).

sel <- sort(sample(1:200, size = 4))
sel

par(mfrow = c(2, 2))
hist(out$sims.list$N[,sel[1]], col = "grey", xlim = c(Nst[sel[1]]-1,
   max(out$sims.list$N[,sel[1]])), main = "Site 48", xlab = "")
abline(v = Nst[sel[1]]-1, lwd = 3, col = "red")


FIGURE 21.8 Comparison of estimates of local abundance (Ni) under the binomial mixture model (posterior distributions) and the maximum count (black line) at a sample of four sites.


hist(out$sims.list$N[,sel[2]], col = "grey", xlim = c(Nst[sel[2]]-1,
   max(out$sims.list$N[,sel[2]])), main = "Site 95", xlab = "")
abline(v = Nst[sel[2]]-1, lwd = 3, col = "red")
hist(out$sims.list$N[,sel[3]], col = "grey", xlim = c(Nst[sel[3]]-1,
   max(out$sims.list$N[,sel[3]])), main = "Site 134", xlab = "")
abline(v = Nst[sel[3]]-1, lwd = 3, col = "red")
hist(out$sims.list$N[,sel[4]], col = "grey", xlim = c(Nst[sel[4]]-1,
   max(out$sims.list$N[,sel[4]])), main = "Site 137", xlab = "")
abline(v = Nst[sel[4]]-1, lwd = 3, col = "red")

> sel

[1] 48 95 134 137

The posterior distributions show the likely size of the local populations (Ni) of sand lizards at sites number 48, 95, 134, and 137. We can compare this to the observed data at these sites:

y[sel,]

> y[sel,]
     [,1] [,2] [,3]
[1,]    3    3    3
[2,]    6    7    9
[3,]    7    7    3
[4,]    1    3    2

And since we know truth, why not have a look at it? Here are the true population sizes at these sites:

N[sel]

> N[sel]
[1]  3  9 21  8

Compare this with the estimates of these local Ni:

print(out$mean$N[sel], dig = 3)

> print(out$mean$N[sel], dig = 3)
[1]  3.00  9.34 15.31  9.31

Finally, Fig. 21.9 shows a comparison of the relationship between sand lizard abundance and vegetation cover using a naïve analysis and under the binomial mixture model.

par(mfrow = c(1, 1))
plot(vege, N, main = "", xlab = "Vegetation cover", ylab = "Abundance",
   las = 1, ylim = c(0, max(N)))


points(sort(vege), lam[order(vege)], type = "l", col = "black", lwd = 2)
points(vege, exp(lin.pred), type = "l", col = "red", lwd = 2)
BinMix.pred <- exp(out$mean$alpha.lam + out$mean$beta1.lam * vege +
   out$mean$beta2.lam * (vege*vege))
points(vege, BinMix.pred, type = "l", col = "blue", lwd = 2)

21.4 SUMMARY

Binomial mixture modeling in metapopulation designs offers great opportunities for the estimation of animal or plant abundance corrected for imperfect detection probability (p). Essentially, the model is simply a generalized version of the familiar Poisson regression that accommodates imperfect detection; when p = 1, we are back to a classical Poisson generalized linear model. However, it is more complex than a simple Poisson regression and even more so than a conventional hierarchical Poisson regression or GLMM (see Chapter 16). There are many ways in which the practical implementation in WinBUGS may fail, and a lot of trial and error and model checking may be required. Nevertheless, if replicated count data are available from a number of sites, you should absolutely try out this new and exciting model.
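To see the connection with the classical Poisson GLM, here is a tiny sketch using freshly simulated data (not the lizard data set) in which detection is perfect: every count then equals the true local abundance, and a plain Poisson regression recovers the abundance model.

set.seed(24)
R <- 200
vege <- runif(R, -1.5, 1.5)
N <- rpois(R, lambda = exp(2 + 2 * vege - 2 * vege^2))
C <- N                                  # with p = 1, each count equals true abundance
summary(glm(C ~ vege + I(vege^2), family = poisson))   # estimates close to (2, 2, -2)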


FIGURE 21.9 Comparison of the estimated abundance–vegetation relationship in Dutch sand lizards under a naïve approach that ignores imperfect detection (red) with that under the binomial mixture model (blue). Truth is shown in black: circles are the realized abundances at each site (Ni) and the black line is their expectation (as in Fig. 21.1).


EXERCISES

1. Survey covariates: In metapopulation designs, we frequently have detection-relevant covariates that vary by site and survey (i.e., survey covariates). For our Dutch sand lizards, one such covariate is ambient temperature (Kéry et al., 2009): presumably, lizard activity depends on temperature, and this affects their detection probability. Modify the data generation code as well as the WinBUGS model to include the effects of a temperature covariate.

2. Prior sensitivity: Play around with prior settings in the last model we ran. Change the uniform distributions to have a very wide range and see whether the model converges. Conversely, set the range very narrow and see whether the inference is affected, i.e., whether the parameter estimates are changed.

3. Swiss hare data: Fit the binomial mixture model to the hare data from a single year (e.g., 2000) and see whether there is a difference in the probability of detection in grassland and arable areas. By what proportion will mean or max counts underestimate true population size?

4. Simulation exercise: The binomial mixture model was described in 2004 (Royle, 2004a) and so is still fairly young. Five years later, it had been applied in hardly more than 10 publications, and much remains to be found out about the model that can be tackled by simulation studies. For instance, doing a simulation study (varying R, T, covariate effects, and other things) for models with covariates, similar to what was suggested for the site-occupancy model (Exercise 4 in Chapter 20), might be worthwhile. This is another serious study that could easily yield a decent article or thesis chapter.


C H A P T E R

22

Conclusions

Now that you are through, it is perhaps worth looking back and seeing where you have come. I hope that you have achieved four things. First, you have achieved some practical understanding of how Bayesian inference with vague priors works and why Bayesian inference is simply so useful (Link and Barker, 2010). Second, you have gained plenty of practice with WinBUGS, the most widely used general-purpose Bayesian software. Third, I imagine that many of you have obtained a much deeper understanding of what you have been doing for years: fitting linear, generalized linear, and mixed models. In addition to the insights provided by the simulation of data sets, it is the natural way of model specification in the BUGS language that makes WinBUGS uniquely suitable for really understanding these models. So, funny perhaps for a book about applied Bayesian modeling, I think that one of its greatest benefits for you may be something more general: an improved understanding of linear models and their extensions.

Finally, and fourth, I hope you have gained a taste for modern statistical modeling. In statistics classes at university, many ecologists have only seen a sad caricature of statistics. We were taught to think in terms of a decision tree for black-box procedures. The tree started with a question like "Are the data normally distributed?" and its terminal branches prescribed a t-test or a Kruskal–Wallis test or an analysis of covariance with homogeneous slopes (or else we were in deep trouble …). And then, a p-value popped up somewhere, and if it was <0.05, life was good. This is a terrible way of doing statistics, boring and devoid of any creative energy. It also does great injustice to the inspiring activity of trying to make sense of incomplete and imperfect observations made in a noisy world. Perhaps it should come as no surprise that many "sensible" people hate and distrust statistics.

To me, collaborating with some statistician colleagues, and especially starting to use WinBUGS, has been an eye-opener. I have come to see data analysis as equivalent to the creation of a statistical model that attempts to emulate the main features of the stochastic process that


could conceivably have generated our data. We then study features of our statistical model representation of that part of the world we are interested in, such as body mass of peregrines, snout-vent length of snakes, population size of lizards, or the distribution of a plant. By doing so, we hope to learn something about that world, something that may be hidden underneath the tangle of detail, distortion, and noise that is a hallmark of most data sets.

Many modern statisticians build their models in an organic way, without ever thinking about that old decision tree that many of us ecologists have come to learn by heart. They choose their ingredients from a vast array and assemble their models in a modular way: take a little of this distribution and add a little of that feature to the linear predictor. This is a much more creative and inspiring act than simply working our way along the branches of that old tree. And one that likely leads to much improved inference and mechanistic insight, since the model can be adapted to the particular situation at hand rather than the converse, with us having to shoehorn reality into the conceptual box provided by our statistics program. Thus, modern statistical analysis means building one's own custom models.

Unfortunately, I doubt whether most ecologists will ever be sufficiently numerate to fit their custom models by maximum likelihood, at least if they have to specify the likelihood of their model explicitly themselves. This is where WinBUGS comes in. It is the only software I know of that allows the average, somewhat numerate ecologist to conduct his or her own creative statistical modeling. Therefore, again, I believe that WinBUGS has the potential to free the statistical modeler in us. Even without the benefits of the Bayesian paradigm of statistical inference, this would be enough for me to recommend WinBUGS to any ambitious quantitative ecologist.

Where will you go from here? I hope that my book enables you to better tackle more advanced books on ecological modeling, such as those by Royle and Dorazio (2008), King et al. (2009), or Link and Barker (2010). Much of this advanced modeling is based on the powerful notion of hierarchical models (HMs; e.g., Royle and Dorazio, 2006, 2008). HMs describe observed data as the result of a series of linked stochastic processes, whose outcomes may be observed or unobserved, i.e., latent. HMs are very powerful for understanding and predicting complex ecological systems. They can also be fitted using frequentist methods (e.g., Lee et al., 2006; Ponciano et al., 2009), but their implementation in WinBUGS is much more straightforward and arguably easier for an ecologist to understand. The reason for this is that in the BUGS language, a complex model is naturally broken apart into hierarchically linked submodels. Indeed, in this book, we have seen many instances of HMs: all the models containing random effects and more particularly the site-occupancy and binomial mixture


models for inference about metapopulation distribution and abundance (Chapters 20 and 21). In these final chapters, we have just about reached the level of modeling where the book by Royle and Dorazio (2008) starts.

Another direction that you may want to explore is nonlinear models and especially nonlinear mixed models (Pinheiro and Bates, 2000). In this book, we have exclusively adopted linear statistical models, but there is no reason to restrict the deterministic part of our system description to be additive. Nonlinear models may be more realistic representations of a study system and may yield better predictions, especially outside of the observed range of covariates.

Finally, I have no doubt that many Bayesians would accuse me of being Bayesian in a purely opportunistic way. In this book, we have used the Bayesian computing machinery as a handy way of fitting sometimes complex models without ever really taking advantage of the true hallmark of Bayesian inference: the ability to formally combine the information contained in the data with all available knowledge about the study system. That is, we don't really use informative priors. This is true, and I believe that much can be gained by not feigning ignorance about the system analyzed in the way that inference by maximum likelihood or Bayesian analysis with vague priors does. Indeed, especially with the small data sets that are so typical of ecology, much may be gained in terms of precision of the estimates and parameter estimability when informative priors are used. It seems likely that in the future, we will increasingly see Bayesian analyses with informative priors.

Perhaps WinBUGS may not allow you to go all the way there. WinBUGS may be too slow for your data set, you may never get convergence, be caught in a trap, or just lose patience when WinBUGS behaves just like a 15-year-old. Also, new programs are likely to be developed and improved in the future, such as OpenBUGS (http://mathstat.helsinki.fi/openbugs/), AD Model Builder (http://admb-foundation.org/), JAGS (http://www-fis.iarc.fr/~martyn/software/jags/), and PyMC (http://code.google.com/p/pymc/), among others, and they may eventually be even more accessible to ecologists than WinBUGS is now and offer greater computing power. Or, you may learn how to code your own Markov chain Monte Carlo samplers and perhaps use a general programming language such as Fortran or C++, which may speed up your Bayesian computations by orders of magnitude relative to what you can achieve in WinBUGS. But even then, I believe that for many of you, WinBUGS will have achieved one important thing: it will have changed forever the way in which you think about, and conduct, statistical modeling.


A P P E N D I X

A List of WinBUGS Tricks

I provide here a list of tips that hopefully allow you to love WinBUGS more unconditionally. I would suggest you skim over the list now and then refer back to it later as necessary.

1. Do read the manual: WinBUGS may not have the best documentation available for a piece of software, but its manual is nevertheless very useful. Be sure to at least skim over most of it once when you start getting into WinBUGS (i.e., now for many of you), so you may remember that the manual has something to say about a particular topic when you need it. Don't forget the sections entitled "Tricks: Advanced Use of the BUGS Language" and "Tips and Troubleshooting."

2. How to begin: When starting a new analysis, always start from a template of a similar analysis. Only ever try to write an analysis from scratch if you want to test yourself.

3. Initial values 1: The wise choice of inits can be the key to success or failure of an analysis in WinBUGS, although we don't see this so much in the fairly simple models in this book (the nonstandard generalized linear mixed models in Chapters 20 and 21 are an exception). With more complex models, WinBUGS needs to start the Markov chains not too far away from their stationary distribution, or it will crash or not even start to update. Of course, the requirement to start the chains close to the solution runs counter to the requirement to start them at dispersed places to assess convergence, so some reasonable intermediate choice is important; see the sketch below.
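A sketch of such inits for the binomial mixture model of Chapter 21: each latent abundance N[i] is started at the site maximum count plus 1, i.e., at a value that is guaranteed to be consistent with the observed data (this assumes the count matrix y from that chapter is in the workspace).

Nst <- apply(y, 1, max) + 1
inits <- function() list(N = Nst, alpha.lam = rnorm(1, 0, 1),
   beta1.lam = rnorm(1, 0, 1), beta2.lam = rnorm(1, 0, 1),
   alpha.p = rnorm(1, 0, 1), beta.p = rnorm(1, 0, 1))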

4. Initial values 2: Inits must not be outside of the possible range of a parameter. For instance, negative inits for a parameter that has prior mass only for positive values (such as a variance) will cause WinBUGS to crash, and so do inits outside the range of a uniform prior.

5. Be ignorant, but not too ignorant: When you want your Bayesian inference to be dominated by your data and choose priors intended to be vague, don't specify too much ignorance; otherwise, traps may result or convergence may not be achieved. For instance, don't specify the limits of a uniform prior or the variance of a normal prior to be too wide.


6. Missing values (NAs): NAs are not an issue in this book, since they do not occur in our "perfect" data sets. However, in your real-world data sets, there will always be NAs. In WinBUGS, NAs are dealt with less automatically than in the conventional stats programs with which you are likely familiar; hence, it is important to know how to deal with them. Briefly, missing responses (i.e., missing y's) are no problem, but NAs in the explanatory variables (the x's) need attention. A missing response is simply estimated, and indeed, adding missing responses for selected covariate values is one of the simplest ways to form predictions for desired values of explanatory variables. On the other hand, a missing explanatory variable must either be replaced with some number, e.g., the mean observed value for that variable, or else given some prior distribution (see the sketch below). In general, the former is easier and should not pose a problem unless the number of missing x's is large.
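A minimal BUGS sketch of the second option, in which a partially observed covariate x gets its own prior (imputation) model so that WinBUGS estimates the missing values along with everything else (all names here are generic placeholders, not from a particular analysis in the book):

model {
   for (i in 1:n) {
      x[i] ~ dnorm(mu.x, tau.x)     # imputation model: observed x act as data,
      y[i] ~ dnorm(mu[i], tau)      # missing x are estimated
      mu[i] <- alpha + beta * x[i]
   }
   alpha ~ dnorm(0, 0.001)
   beta ~ dnorm(0, 0.001)
   tau <- pow(sd.res, -2)
   sd.res ~ dunif(0, 100)
   mu.x ~ dnorm(0, 0.001)
   tau.x <- pow(sd.x, -2)
   sd.x ~ dunif(0, 100)
}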

7. Think in a box (and know your box): When coding an analysis in WinBUGS, you often have to deal with data that come in arrays with more than one dimension. For instance, when analysing animal counts from different sites, over several years, and taken in various months of a year, it may be useful to format them into a three-dimensional array. Some covariates of such an analysis will then have two or even three dimensions, too. Obviously, you must then be absolutely clear about the dimensions of these "boxes" in which your data are and not get confused by the indexing of the data. In my experience, knowing how to format data into such arrays and then not getting lost is one of the most difficult things to learn about the routine use of WinBUGS.

8. Know your box (2): In "serious" analyses, your modeling often requires the data to be formatted in some multidimensional array. For instance, for a multispecies version of a site-occupancy model (see Chapter 20), you have at least three dimensions corresponding to species (i), site (j), and replicate survey (k). How you build your array and, especially, how you loop over that array in the definition of the likelihood can make a huge difference in terms of the speed with which your Markov chains in WinBUGS evolve. It appears that you should loop over the longest dimension first and over the shortest last. For instance, if you have data from 450 sites, 100 species, and two surveys each, then it appears best to format the data as y[j, i, k] and then loop over sites (j) first, then over species (i), and finally over replicate surveys (k).

9. Don't define things twice: Every parameter in WinBUGS can only be defined once. For instance, writing y ~ dnorm(mu, tau) and then adding y[3] <- 5 will cause an error. There is a single exception to this rule, and that is the transformation of the response by some


function such as log() or abs(). So to conduct an analysis of a log-transformed response, you may write y <- log(y) and then y ~ dnorm(mu, tau). Beware of inadvertently defining quantities multiple times by erroneously putting them within a loop to which they don't belong.

10. logit function: In more complex models, I have fairly often experienced problems when using WinBUGS' own logit function, for instance, with achieving convergence (actually, problems may arise even with fairly simple models). Therefore, it is often better to specify that transformation explicitly as logit.p[i] <- log(p[i] / (1 - p[i])), p[i] <- exp(logit.p[i]) / (1 + exp(logit.p[i])), or p[i] <- 1 / (1 + exp(-logit.p[i])).

11. Stabilized logit: To avoid numerical overflow or underflow, you may "stabilize" the logit function by excluding extreme values (B. Wintle, pers. comm.). Here's a sketch of how to do that. The Gibbs sampling will typically get slower, but at least WinBUGS will be less likely to crash:

logit(psi.lim[i]) <- lpsi.lim[i]
lpsi.lim[i] <- min(999, max(-999, lpsi[i]))
lpsi[i] <- alpha.occ + beta.occ * something[i]

12. Truncated normals: Similarly, in log- or logit-normal mixtures (which we see when introducing a normal random effect into the linear predictor of a Poisson or binomial generalized linear model), you may want to truncate the zero-mean normal distribution, e.g., at ±20 (Kéry and Royle, 2009); see the sketch below.
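A minimal BUGS sketch of such a truncated random effect in a Poisson log-normal GLMM, using WinBUGS' I(,) construct (all names are generic placeholders):

for (i in 1:n) {
   C[i] ~ dpois(lambda[i])
   log(lambda[i]) <- alpha + beta * x[i] + eps[i]
   eps[i] ~ dnorm(0, tau.eps)I(-20, 20)   # truncated zero-mean random effect
}
alpha ~ dnorm(0, 0.001)
beta ~ dnorm(0, 0.001)
tau.eps <- pow(sd.eps, -2)
sd.eps ~ dunif(0, 10)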

13. Tinn-R special: Tinn-R is a popular R editor. Users of Tinn-R 2.0 (or newer) may have problems writing the text file containing the BUGS model description with the sink() function; Tinn-R adds to that file some gibberish that will cause WinBUGS to crash. You must then use an alternative way of writing the model file. As an example, here is a workaround for the BUGS description of the model of the mean in Chapter 5 that should be compatible with Tinn-R (thanks to Wesley Hochachka):

modelFilename = 'model.txt'
cat("
# model of the mean
model {

# Priors
population.mean ~ dunif(0, 5000)
precision <- 1 / population.variance
population.variance <- population.sd * population.sd
population.sd ~ dunif(0, 100)


# Likelihood
for(i in 1:nobs){
   mass[i] ~ dnorm(population.mean, precision)
}
}
", fill = TRUE, file = modelFilename)

An alternative solution is given by Jérôme Guélat: the "R send" functions available in Tinn-R allow sending commands into R. However, the "(echo=TRUE)" versions of these functions should not be used when sending the sink() function and its content into R. For example, one should use "R send: selection" instead of "R send: selection (echo=TRUE)".

14. Trial runs first: Run very short chains first, e.g., of length 12 with a burn-in of 2, just to confirm that there are no coding or other errors. Only when you are satisfied that your code works and your model does what it should, increase the chain length to get a production run.

15. Use of native WinBUGS: A key feature of this book is that we run WinBUGS exclusively from within program R. I believe that this is much more efficient than running native WinBUGS. However, with some complex models and/or large data sets, WinBUGS will be extremely slow. This may be the one exception where it is perhaps more efficient to run WinBUGS natively. You may still prepare the analysis in R as shown in this book, but only request WinBUGS to run very short Markov chains. When you set the option debug = TRUE in the function bugs(), WinBUGS stays open after the requested number of iterations have been conducted. Then, you can request more iterations to be executed directly in WinBUGS (i.e., using the Update Tool; see Chapter 4). You can then incrementally increase the total chain length and monitor convergence as you go. Once convergence has been achieved, do the required additional number of iterations and save them into coda files. You must do this later, since when exiting WinBUGS, the bugs() function only imports back into R the (small) number of iterations that you originally requested. When you have your valuable samples of your complex model's posterior distribution in coda files, use the facilities provided by the R packages boa or coda to import them into R and process them (e.g., compute Brooks–Gelman–Rubin convergence tests or posterior summaries for inference about the parameters); see the sketch below.
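A sketch of that last step, assuming three chains were saved from native WinBUGS under its default coda file names (which may differ on your system):

library(coda)
ch1 <- read.coda("coda1.txt", "codaIndex.txt")
ch2 <- read.coda("coda2.txt", "codaIndex.txt")
ch3 <- read.coda("coda3.txt", "codaIndex.txt")
mc <- mcmc.list(ch1, ch2, ch3)
gelman.diag(mc, multivariate = FALSE)   # Brooks-Gelman-Rubin diagnostic
summary(mc)                             # posterior summaries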

16. Be flexible in your modeling: Try out different priors, e.g., for parameters representing probabilities try a uniform(0,1), a flat normal for the logit transform, or a Beta(1,1). Sometimes, and for no obvious reason, one may work while another doesn't. Similarly, WinBUGS is very sensitive to changes in the parameterization of a model. Sometimes, one way of


writing the model may work and the other doesn't, for unknown reasons, or one works much faster than the other (which, usually, is mostly an issue for more complex models than most of those featured in this book).

17. Scale continuous covariates: Scaling continuous covariates, so that their range does not extend too far away on either side of zero, can greatly improve mixing of the chains and is often what makes convergence possible in the first place (see, e.g., Section 11.4 in this book and the sketch below).
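A one-line sketch for a hypothetical elevation covariate elev, centering and scaling it before it enters the analysis (remember to form predictions on the same scale afterward):

elev.sc <- (elev - mean(elev)) / sd(elev)   # equivalently: as.numeric(scale(elev))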

18. WinBUGS hangs after compiling: Try a restart, and if that doesn't work, find better starting values (this is an important tip from the manual).

19. Debugging a WinBUGS analysis 1: If something went wrong, you need to read attentively through the entire WinBUGS log file from the top to identify the first thing that went wrong. Other errors may follow, but they may not be the actual cause of the failure.

20. Debugging a WinBUGS analysis 2: When something doesn't work, the simplest and best advice (see also Gelman and Hill, 2007) is to go back to a simpler version of the same model, or to a similar model, that did work, and then incrementally increase the complexity of that model until you arrive at the desired model. That is, sneak up on the model you want from less complex models. Indeed, when using WinBUGS, you learn to always start from the simplest version of a problem and gradually build in more complexity until you are at the level of complexity that you require.

21. DIC problems: Surprisingly, sometimes when getting a trap (including one with the very informative title "NIL dereference (read)"), setting the argument DIC = FALSE in the bugs() function has helped.

22. R2WinBUGS chokes: Sometimes the R object created by R2WinBUGS is too big for one's computer. Then, use BOA or CODA to read in the coda files directly, and use their facilities to produce your inference in this way (e.g., convergence diagnostics and posterior summaries).

23. Identifiability/estimability of parameters: To see whether two or more parameters are difficult to estimate separately, you can plot the values of their Markov chains against each other, as in the sketch below.
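A sketch using the fitted object out from Chapter 21; a tight ridge or a very high correlation between the draws suggests that the two parameters are hard to separate:

plot(out$sims.list$alpha.p, out$sims.list$beta.p, xlab = "alpha.p", ylab = "beta.p")
cor(out$sims.list$alpha.p, out$sims.list$beta.p)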

24. Check of model adequacy: Do residual analysis, posterior predictive checks, and cross-validation to see whether your model appears to be an adequate representation of the main features in the data.

25. Predictions: They are a very important part of inference, i.e., the estimation of unobserved or future data. One particularly useful way to examine predictions is to estimate what a response would look like for a chosen combination of values of the explanatory variables. The generation and examination of such predicted values is an important


method for understanding complex models (for instance, to see what a particular interaction means) and is also needed to illustrate the results of an analysis, e.g., as a figure in a paper; see the sketch below.
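A sketch of forming such predictions directly from the posterior draws, here for the expected sand lizard abundance of Chapter 21 over a grid of vegetation-cover values (it assumes the fitted object out from that chapter):

veg.pred <- seq(-1.5, 1.5, length.out = 100)
pred <- sapply(veg.pred, function(v) exp(out$sims.list$alpha.lam +
   out$sims.list$beta1.lam * v + out$sims.list$beta2.lam * v^2))
plot(veg.pred, apply(pred, 2, mean), type = "l", xlab = "Vegetation cover",
   ylab = "Expected abundance")
matlines(veg.pred, t(apply(pred, 2, quantile, probs = c(0.025, 0.975))),
   lty = 2, col = "black")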

26. Sensitivity analysis for priors: Consider assessing prior sensitivity, i.e., repeat your analyses, or those for key models, with different prior specifications and see whether your inference is robust in this respect. If it is not, then not all is lost, but you must report on that in the methods section of your paper.

27. Have a healthy distrust of your solutions: Always inspect your inference, e.g., plot predictions against observed values for quantities that can be observed, to make sure that the WinBUGS solution is sensible. Also watch out for unexplained differences in parameter estimates between neighboring models, e.g., those that differ by only one covariate or some other rather minor model characteristic. This can be an indication that something went wrong or that there are estimability problems with the model for your data set.

28. NAs and NaNs: When dealing with data in multidimensional arrays, a very useful R package is 'reshape'. The newer versions of the reshape package in R 2.9 use NaN to fill in missing cells. This makes WinBUGS very unhappy: you must have NA, not NaN. In general, this is probably good to know about BUGS, and newer versions of other packages may be doing the same thing. So, if you use the melt/cast functions in reshape to organize data, then you will need to update your code in the newer R versions by adding "fill = NA_real_". Example: Ymat = cast(data.melt, SppCode~JulianDate~GridCellID, fun.aggregate = mean, fill = NA_real_) (tip from Beth Gardner).

29. Long Windows addresses: WinBUGS doesn't like overly long Windows paths (C:\My harddisk\Important stuff\Less important stuff\…) for its working directory. Hence, you should not bury your WinBUGS analyses too deep in a directory tree.

30. VISTA problems: Windows Vista has caused all sorts of "challenges" in workshops taught using this book, so be prepared! One problem was that the default BUGS directory is not the same as that stated in Section 3.4 of the book.


References

Aitkin, M., Francis, B., Hinde, J., Darnell, R., 2009. Statistical Modelling in R. Oxford University Press, Oxford, UK.

Altwegg, R., Wheeler, M., Erni, B., 2008. Climate and the range dynamics of species withimperfect detection. Biol. Lett. 4, 581 584.

Andrewartha, H.G., Birch, L.C., 1954. The Distribution and Abundance of Animals. University of Chicago Press, Chicago.

Araujo, M.B., Guisan, A., 2006. Five (or so) challenges for species distribution modeling.J. Biogeogr. 33, 1677 1688.

Bailey, L.L., Hines, J.E., Nichols, J.D., MacKenzie, D.I., 2007. Sampling design trade offs inoccupancy studies with imperfect detection: examples and software. Ecol. Appl. 17,281 290.

Bernardo, J.M., 2003. Bayesian Statistics. Encyclopedia of Life Support Systems (EOLSS).Probability and Statistics. UNESCO, Oxford, UK. <http://www.uv.es/~bernardo/BayesStat2.pdf> (accessed 4 May 2009).

Bolker, B.M., 2008. Ecological Models and Data in R. Princeton University Press, NJ.Borchers, D.L., Buckland, S.T., Zucchini, W., 2002. Estimating Animal Abundance. Springer,

London.Borchers, D.L., Efford, M.G., 2008. Spatially explicit maximum likelihood methods for

capture recapture studies. Biometrics 64, 377 385.Brooks, S.P., 2003. Bayesian computation: a statistical revolution. Philos. Trans. R. Soc. Lond.

A. 361, 2681 2697.Buckland, S.T., Anderson, D.R., Burnham, K.P., Laake, J.L., Borchers, D.L., Thomas, L., 2001.

Introduction to Distance Sampling. Oxford University Press, Oxford.Burnham, K.P., Anderson, D.R., 2002. Model Selection and Multimodel Inference: Practical

Information Theoretic Approach. Springer, New York.Clark, J.S., 2007. Models for Ecological Data: An Introduction. Princeton University Press, NJ.Clark, J.S., Ferraz, G., Oguge, N., Hays, H., DiCostanzo, J., 2005. Hierarchical Bayes for struc

tured, variable populations: from recapture data to life history prediction. Ecology 86,2232 2244.

Crawley, M.J., 2005. Statistics. An Introduction Using R. Wiley, Chichester, West Sussex.Dalgaard, P., 2001. Introductory Statistics with R. Springer, New York.de Valpine, P., 2009. Shared challenges and common ground for Bayesian and classical

analysis of hierarchical statistical models. Ecol. Appl. 19, 584 588.de Valpine, P., Hastings, A., 2002. Fitting population models incorporating process noise and

observation error. Ecol. Monogr. 72, 57 76.Dellaportas, P., Forster, J.J., Ntzoufras, I., 2002. On Bayesian model and variable selection

using MCMC. Stat. Comput. 12, 27 36.Dennis, B., 1996. Discussion: should ecologists become Bayesians? Ecol. Appl. 6, 1095 1103.Dennis, B., Ponciano, J.M., Lele, S.R., Taper, M.L., Staples, D.F., 2006. Estimating density

dependence, process noise, and observation error. Ecol. Monogr. 76, 323 341.Dodd Jr., C.K., Dorazio, R.M., 2004. Using counts to simultaneously estimate abundance and

detection probabilities in salamander surveys. Herpetologica 60, 468 478.


Dorazio, R.M., 2007. On the choice of statistical models for estimating occurrence and extinction from animal surveys. Ecology 88, 2773 2782.

Efford, M.G., 2004. Density estimation in live trapping studies. Oikos 106, 598 610.Efford, M.G., Dawson, D.K., Borchers, D.L., 2009. Population density estimated from

locations of individuals on a passive detector array. Ecology 90, 2676 2682.Elith, J., et al. 2006. Novel methods improve prediction of species’ distributions from occur

rence data. Ecography 29, 129 151.Ellison, A.M., 1996. An introduction to Bayesian inference for ecological research and envir

onmental decision making. Ecol. Appl. 6, 1036 1046.Fiske, I., Chandler, R., 2010. unmarked:models for data from unmarked animals. R package ver

sion 0.9.0. http://cran.r project.org/web/packages/unmarked/index.html (accessed 1 March 2010).Gelfand, A.E., Schmidt, A.E., Wu, S., Silander Jr., J.A., Latimer, A., Rebelo, A.G., 2005. Mod

elling species diversity through species level hierarchical modelling. Appl. Stat. 54, 1 20.Gelman, A., 2005. Analysis of variance: why it is more important than ever (with discussion).

Ann. Stat. 33, 1 53.Gelman, A., 2006. Prior distributions for variance parameters in hierarchical models. Bayesian

Anal. 1, 514 534.Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2004. Bayesian Data Analysis. Second ed.

CRC/Chapman & Hall, Boca Raton, FL.Gelman, A., Hill, J., 2007. Data Analysis Using Regression and Multilevel/Hierarchical

Models. Cambridge University Press, Cambridge.Gelman, A., Meng, X. L., Stern, H.S., 1996. Posterior predictive assessment of model fitness

via realized discrepancies (with discussion). Stat. Sin. 6, 733 807.Geman, S., Geman, S., 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian

restoration of images. IEEE Trans. Pattern. Anal. Mach. Intell. 6, 721 741.Gilks, W.R., Thomas, A., Spiegelhalter, D.J., 1994. A language and program for complex

Bayesian modeling. Statistician 43, 169 178.Gu, W., Swihart, R.K., 2004. Absent or undetected? Effects of non detection of species occur

rence on wildlife habitat models. Biol. Conserv. 116, 195 203.Hanski, I., 1998. Metapopulation dynamics. Nature 396, 41 49.Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their

applications. Biometrika 57, 97 109.Joseph, L.N., Elkin, C., Martin, T.G., Possingham, H.P., 2009. Modeling abundance using

N mixture models: the importance of considering ecological mechanisms. Ecol. Appl.19, 631 642.

Kadane, J.B., Lazar, N.A., 2004. Methods and criteria for model selection. J. Am. Stat. Assoc.99, 279 290.

Kéry, M., 2002. Inferring the absence of a species A case study of snakes. J. Wildl. Manage66, 330 338.

Kéry, M., 2004. Extinction rate estimates for plant populations in revisitation studies: theimportance of detectability. Conserv. Biol. 18, 570 574.

Kéry, M., 2008. Estimating abundance from bird counts: binomial mixture models uncovercomplex covariate relationships. Auk 125, 336 345.

Kéry, M., Dorazio, R.M., Soldaat, L., van Strien, A., Zuiderwijk, A., Royle, J.A., 2009. Trendestimation in populations with imperfect detection. J. Appl. Ecol. 46, 1163 1172.

Kéry, M., Gardner, B., Monnerat, C., 2010b. Predicting species distributions from checklistdata using site occupancy models. J. Biogeogr. (in press)

Kéry, M., Juillerat, L., 2004. Sex ratio estimation and survival analysis for Orthetrum coerulescens (Odonata, Libellulidae). Can. J. Zool. 82, 399 406.

Kéry, M., and Royle, J.A., 2009. Inference about species richness and community structureusing species specific occupancy models in the national Swiss breeding bird surveyMHB. pp. 639 656 in Thomson, D.L., Cooch, E.G., and Conroy, M.J. (eds.) Modeling


Demographic Processes in Marked Populations. Series: Environmental and Ecological Statistics, Vol. 3, Springer.

Kéry, M., Royle, J.A., 2010. Hierarchical modeling and estimation of abundance in metapopulation designs. J. Anim. Ecol. 79, 453 461.

Kéry, M., Royle, J.A., Schmid, H., 2005. Modeling avian abundance from replicated countsusing binomial mixture models. Ecol. Appl. 15, 1450 1461.

Kéry, M., Royle, J.A., Schmid, H., Schaub, M., Volet, B., Häfliger, G., Zbinden, N., 2010a. Siteoccupancy distribution modeling to correct population trend estimates derived fromopportunistic observations. Conserv. Biol. (in press).

Kéry, M., Schmidt, B.R., 2008. Imperfect detection and its consequences for monitoring forconservation. Community Ecol. 9, 207 216.

Kéry, M., Spillmann, J.H., Truong, C., Holderegger, R., 2006. How biased are estimates ofextinction probability in revisitation studies? J. Ecol. 94, 980 986.

King, R., Morgan, B.J.T., Gimenez, O., Brooks, S.P., 2009. Bayesian Analysis for PopulationEcology. Chapman and Hall/CRC, Boca Raton, FL.

Krebs, C.J., 2001. Ecology: The Experimental Analysis of Distribution and Abundance. Fifthed. Benjamin Cummings, Menlo Park, CA.

Kuo, L., Mallick, B., 1998. Variable selection for regression models. Sankhya 60B, 65 81.Lambert, P.C., Sutton, A.J., Burton, P.R., Abrams, K.R., Jones, D.R., 2005. How vague is

vague? A simulation study of the impact of the use of vague prior distributions inMCMC using WinBUGS. Stat. Med. 24, 2401 2428.

Latimer, A.M., Wu, S., Gelfand, A.E., Silander Jr., J.A., 2006. Building statistical models toanalyse species distributions. Ecol. Appl. 16, 33 50.

Le Cam, L., 1990. Maximum likelihood: an introduction. Int. Stat. Rev. 58, 153 171.Lee, Y., Nelder, J.A., 2006. Double hierarchical generalised linear models. Appl. Stat. 55,

139 185.Lee, Y., Nelder, J.A., Pawitan, Y., 2006. Generalized Linear Models with Random Effects.

Unified Analysis via H likelihood. Chapman and Hall/CRC, Boca Raton, FL.Lele, S.R., Dennis, B., 2009. Bayesian methods for hierarchical models: are ecologists making a

Faustian bargain? Ecol. Appl. 19, 581 584.Lindley, D.V., 1983. Theory and practice of Bayesian statistics. Statistician 32, 1 11.Lindley, D.V., 2006. Understanding Uncertainty. Wiley, Hoboken, NJ.Link, W.A., Barker, R.J., 2006. Model weights and the foundations of multimodel inference.

Ecology 87, 2626 2635.Link, W.A., Barker, R.J., 2010. Bayesian Inference with Ecological Examples. Academic Press,

San Diego, CA.Link, W.A., Cam, E., Nichols, J.D., Cooch, E.G., 2002. Of BUGS and birds: Markov chain

Monte Carlo for hierarchical modeling in wildlife research. J. Wildl. Manage. 66, 277 291.Link, W.A., Sauer, J.R., 2002. A hierarchical analysis of population change with application to

Cerulean warblers. Ecology 83, 2832 2840.Littell, R.C., Milliken, G.A., Stroup, W.W., Wolfinger, R.D., Schabenberger, O., 2008. SAS for

Mixed Models. Second ed. SAS Institute, Cary, NC.Little, R.J.A., 2006. Calibrated Bayes: a Bayes/Frequentist roadmap. Am. Stat. 60, 213 223.Lunn, D., Spiegelhalter, D., Thomas, A., Best, N., 2009. The BUGS project: evolution, critique

and future directions. Stat. Med. 28, 3049 3067.MacKenzie, D.I., 2006. Modeling the probability of resource use: the effect of, and dealing

with, detecting a species imperfectly. J. Wildl. Manage. 70, 367 374.MacKenzie, D.I., Kendall, W.L., 2002. How should detection probability be incorporated into

estimates of relative abundance? Ecology 83, 2387 2393MacKenzie, D.I., Nichols, J.D., Hines, J.E., Knutson, M.G., Franklin, A.B., 2003. Estimating site

occupancy, colonization and local extinction when a species is detected imperfectly.Ecology 84, 2200 2207.


MacKenzie, D.I., Nichols, J.D., Lachman, G.B., Droege, S., Royle, J.A., Langtimm, C.A., 2002.Estimating site occupancy rates when detection probability rates are less than one.Ecology 83, 2248 2255.

MacKenzie, D.I., Nichols, J.D., Royle, J.A., Pollock, K.H., Hines, J.E., Bailey, L.L., 2006.Occupancy Estimation and Modeling: Inferring Patterns and Dynamics of SpeciesOccurrence. Elsevier, San Diego, CA.

MacKenzie, D.I., Royle, J.A., 2005. Designing occupancy studies: general advice and allocating survey effort. J. Appl. Ecol. 42, 1105 1114.

Martin, T.G., Kuhnert, P.M., Mergersen, K., Possingham, H.P., 2005. The power of expertopinion in ecological models using Bayesian methods: impact of grazing on birds. Ecol.Appl. 15, 266 280.

Mazzetta, C., Brooks, S., Freeman, S.N., 2007. On smoothing trends in population indexmodeling. Biometrics 63, 1007 1014.

McCarthy, M.A., 2007. Bayesian Methods for Ecology. Cambridge University Press,Cambridge.

McCarthy, M.A., Masters, P., 2005. Profiting from prior information in Bayesian analyses ofecological data. J. Appl. Ecol. 42, 1012 1019.

McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models. Second ed. Chapman and Hall,London.

McCulloch, C.E., Searle, S.R., 2001. Generalized, Linear, and Mixed Models. Wiley, New York.Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equation of

state calculations by fast computing machines. J. Chem. Phys. 21, 1087 1092.Millar, R.B., 2009. Comparison of hierarchical Bayesian models for over dispersed count data

using DIC and Bayes factors. Biometrics 65, 962 969.Monneret, R. J., 2006. Le faucon pèlerin. Delachaux and Niestlé, Paris, France.Nelder, J.A., Wedderburn, R.W.M., 1972. Generalized linear models. J. R. Stat. Soc. Ser. A.

135, 370 384.Ntzoufras, I., 2009. Bayesian Modeling Using WinBUGS. Wiley, Hoboken, NJ.Pellet, J., Schmidt, B.R., 2005. Monitoring distributions using call surveys: estimating site

occupancy, detection probabilities and inferring absence. Biol. Conserv. 123, 27 35.Pinheiro, J.C., Bates, D.M., 2000. Mixed Effects Models in S and S Plus. Springer, New York.Ponciano, J.M., Taper, M.L., Dennis, B., Lele, S.R., 2009. Hierarchical models in ecology:

confidence intervals, hypothesis testing, and model selection using data cloning. Ecology90, 356 362.

Powell, L.A., 2007. Approximating variance of demographic parameters using the deltamethod: a reference for avian biologists. Condor 109, 949 954.

R Development Core Team 2007. R: A language and Environment for Statistical Computing.R Foundation for Statistical Computing, Vienna, Austria. <http://www.R project.org>(accessed 4 May 2009).

Robinson, G.K., 1991. That BLUP is a good thing: the estimation of random effects (withdiscussion). Stat. Sci. 6, 15 51.

Royle, J.A., 2004a. N mixture models for estimating population size from spatially replicatedcounts. Biometrics 60, 108 115.

Royle, J.A., 2004b. Generalized estimators of avian abundance from count survey data. Anim.Biodivers. Conserv. 27, 375 386.

Royle, J.A., 2008. Modeling individual effects in the Cormack Jolly Seber model: a state spaceformulation. Biometrics 64, 364 370.

Royle, J.A., Dorazio, R.M., 2006. Hierarchical models of animal abundance and occurrence.J. Agric. Biol. Environ. Stat. 11, 249 263.

Royle, J.A., Dorazio, R.M., 2008. Hierarchical Modeling and Inference in Ecology. AcademicPress, Amsterdam.

Royle, J.A., Karanth, K.U., Gopalaswamy, A.M., Kumar, N.S., 2009. Bayesian inference in camera trapping studies for a class of spatial capture recapture models. Ecology 90, 3233 3244.


Royle, J.A., Kéry, M., 2007. A Bayesian state space formulation of dynamic occupancymodels. Ecology 88, 1813 1823.

Royle, J.A., Kéry, M., Gautier, R., Schmid, H., 2007. Hierarchical spatial models of abundanceand occurrence from imperfect survey data. Ecol. Monogr. 77, 465 481.

Royle, J.A., Nichols, J.D., 2003. Estimating abundance from repeated presence absence data orpoint counts. Ecology 84, 777 790.

Royle, J.A., Nichols, J.D., Kéry, M., 2005. Modeling occurrence and abundance of species withimperfect detection. Oikos 110, 353 359.

Royle, J.A., Young, K.G., 2008. A hierarchical model for spatial capture recapture data.Ecology 89, 2281 2289.

Sauer, J.R., Link, W.A., 2002. Hierarchical modeling of population stability and species groupattributes from survey data. Ecology 86, 1743 1751.

Schmidt, B.R., 2005. Monitoring the distribution of pond breeding amphibians when speciesare detected imperfectly. Aquat. Conserv. Mar. Freshwater Ecosyst. 15, 681 692.

Schmidt, B.R., 2008. Neue statistische Verfahren zur Analyse von Monitoring und Verbreitungsdaten von Amphibien und Reptilien. Zeitschrift für Feldherpetologie 15, 1 14.

Smith, A.F.M., Gelfand, A.E., 1992. Bayesian statistics without tears: A sampling resamplingperspective. Am. Stat. 46, 84 88.

Spiegelhalter, D., Best, N.G., Carlin, B.P., van der Linde, A., 2002. Bayesian measures of complexity and fit (with discussion). J. R. Stat. Soc. Series B. 64, 583 639.

Spiegelhalter, D., Thomas, A., Best, N.G., 2003. WinBUGS User Manual, Version 1.4. MCRBiostatistics Unit, Cambridge.

Sturtz, S., Ligges, U., Gelman, A., 2005. R2WinBUGS: a package for running WinBUGS fromR. J. Stat. Softw. 12, 1 16.

Swain, D.P., Jonsen, I.D., Simon, J.E., Myers, R.A., 2009. Assessing threats to species at riskusing stage structured state space models: mortality trends in skate populations. Ecol.Appl. 19, 1347 1364.

Tyre, A.J., Tenhumberg, B., Field, S.A., Niejalke, D., Parris, K., Possingham, H.P., 2003.Improving precision and reducing bias in biological surveys: Estimating false negativeerror rates. Ecological Applications 13, 1790 1801.

Venables, W.N., Ripley, B.D., 2002. Modern Applied Statistics with S. Springer, New York.Ver Hoef, J.M., Jansen, J.K., 2007. Space time zero inflated count models of Harbour seals.

Environmetrics 18, 697 712.Wade, P.R., 2000. Bayesian methods in conservation biology. Conserv. Biol. 14, 1308 1316.Webster, R.A., Pollock, K.H., Simons, T.R., 2008. Bayesian spatial modeling of data from

avian point count surveys. J. Agric. Biol. Environ. Stat. 13, 121 139.Welham, S., Cullis, B., Gogel, B., Gilmour, A., Thompson, R., 2004. Prediction in linear mixed

models. Aust. N. Z. J. Stat. 46, 325 347.Wenger, S.J., Freeman, M.C., 2008. Estimating species occurrence, abundance, and detection

probability using zero inflated distributions. Ecology 89, 2953 2959.White, G.C., Burnham, K.P., 1999. Program MARK: survival estimation from populations of

marked animals. Bird Study 46 (Suppl.), 120 138.Wikle, C.K., 2003. Hierarchical Bayesian models for predicting the spread of ecological

processes. Ecology 84, 1382 1394.Williams, B.K., Nichols, J.D., Conroy, M.J., 2002. Analysis and Management of Animal

Populations. Academic Press, San Diego, CA.Woodworth, G.G., 2004. Biostatistics. A Bayesian Introduction. Wiley, Hoboken, NJ.Yoccoz, N.G., Nichols, J.D., Boulinier, T., 2001. Monitoring biological diversity in space and

time. Trends Ecol. Evol. 16, 446 453.Zeileis, A., Kleiber, C., Jackman, S., 2008. Regression models for count data in R. J. Stat. Softw.

27, <http://www.jstatsoft.org/v27/i08/> (accessed 4 May 2009).


Index

Page numbers followed by f indicate figure, t indicate table

A
Abundance, 253–255
  in binomial mixture model, 256–257, 271–273, 271f
  distribution and, 260–261
  naïve analysis and, 272, 273f
  for species distribution, 260
  vegetation and, 257–262
AD model builder (program), 277
Adder, 219–228, 220f
ADMB (program), 277
ADMB foundation, 277
AIC. See Akaike's information criterion
Akaike's information criterion (AIC), 25, 124, 156, 159, 207, 233
Algorithm
  computing, 18
  Metropolis-Hastings, 20
  Newton-Raphson, 9
Alps, 220
Analysis of Covariance. See ANCOVA
Analysis of Variance. See ANOVA
ANCOVA. See also Binomial ANCOVA; Poisson ANCOVA
  analysis of, 82–89
  ANOVA converted to, 127
  ANOVA distinguished from, 141–142
  data generation for, 143–144
  effects parameterization and, 143
  interaction effects, 84, 88f
  main effects, 85–86, 87f
  R analysis, 145
  Swiss hare data, 150
  WinBUGS analysis, 145–149
ANOVA, 41, 66
  ANCOVA converted from, 127
  ANCOVA distinguished from, 141–142
  continuous covariate, 141–142
  main effects, 79
  means parameterization for one-way, 76–77
  parameterizations for one-way, 78
  types of, 115
ANOVA (one-way), 76–78
  fixed effects, Bayesian analysis using WinBUGS, 120–122
  fixed effects, data generation, 119–120
  fixed effects, ML analysis using R, 120
  fixed effects compared to random effects, 116–118
  parameterization of, 116
  random effects, Bayesian analysis using WinBUGS, 125–127
  random effects, data generation, 122–123, 123f
  random effects, REML using R, 124
  Swiss hare data, 127
ANOVA (two-way), 78–82
  bias assessment, 133–134
  data generation, 131–133
  fixed effects, 129–139
  interaction effects, R analysis, 134–135
  interaction effects using WinBUGS, 137–138
  main effects, R analysis, 134
  main effects using WinBUGS, 135–137
  parameterization, 130
  predictions, 138–139, 139f
  Swiss hare data, 139
Apparent occurrence, 240
Arable, grasslands and, 101, 167–177
Asp viper, 142, 142f, 152, 152f
Assumption violations, 101
Asymptotic, 2
Autocorrelation, 42, 204
Axioms, of probability, 4, 16–17

B
Backwards compatibility, 32
Balanced data, 240
Bayes rule, 16–17
Bayesian (approach to statistics)
  advantages of, 2–4
  asymptotics and, 2
  classical statistics compared to, 6, 15–19
  combining information, 3

  computers aiding, 19
  definition, 1–2
  error propagation, 2–3
  inference, 8–9, 16–17
  intuitive appeal, 3–4
  literature on, 9–10
  modern algorithms, 19
  numerical tractability, 2
  probability, interpretation in, 3–4, 15
  resistance to, 4
  summary of, 53–54
  vague prior and, 55–56, 275
Bayesian p-value, 104, 227
  linear regression and, 107–109
Belief
  degree of, 20
  prior, 17–18
Bell-shaped cloud, 260
Bernoulli (distribution), 63–64
  presence-absence data and, 238
Between-chain variance, 22
BGR statistic
  convergence monitoring and, 54–55
Bias, 101
  detection probability and, 251
  Maximum likelihood and, 2
  two-way ANOVA, assessing, 133–134
  un-, 2, 133
Binomial
  counts, 211
  distribution, 211–213
  noise, 222, 231
  regression, multiple, 228
  sampling variation, 233f
  t-test, 213–216
Binomial ANCOVA, 219
  data generation, 221–223
  Pearson residual, 224
  R analysis, 223
  Swiss hare data, 228
  WinBUGS analysis, 224–228
Binomial distribution, 16, 58, 60
  inbuilt variance, 63
  picture, 64f
  sampling situation, 62
  uses, 168
  in WinBUGS, 63–64
Binomial mixed-effects model, 229
  data generation, 230–231
  Poisson GLMM compared to, 230
  random-coefficients model, 229–230, 231–235
Binomial mixture model, 253
  abundance, 256–257, 271–273, 271f
  assumptions, 255–256
  convergence, 265–266
  data generation, 257–262
  naïve analysis and, 261f, 262
  posterior predictive check, 268f
  site-occupancy model compared to, 256, 259
  Swiss hare data, 274
  WinBUGS analysis, 262–273
Biological distribution, 265
Biological process, 241, 243–244, 255–256, 259
Biometrics (journal), 18
Biometrika (journal), 18
Bird. See specific species
Bivariate Normal (distribution), 161–162
Black adder, 219–228, 220f
Black Forest, 220
Block, experimental, 117
BOA, 48
Boosted regression tree, 240
Boxplot, 98f, 132f
Brooks-Gelman-Rubin (BGR). See BGR statistic
Brown hare, 168–169, 169f
BRugs, 47
BUGS, 8. See also WinBUGS
Bugs(), 31–32, 51, 95
Burn-in, 9, 22
  nonzero, 42
  specifying, 39
Butterflies, 130, 130f
  population data for, 131–133

C
C++, 4, 277
Calcareous, 239–240
Cambridge, 29
Canned routine, 190, 203
Capture-recapture, 256
Catalonia, 230
Categorical variables, 59, 67
Centering, 149
Chain(s). See also Markov chain Monte Carlo
  requesting, 47–48
  variance, 22
Chiltern gentian, 212–216, 216f, 239–252, 239f

Chi-squared, 167, 268f
CI. See Confidence interval
Click point, 5
Closed population, 242
CODA, 48
Coherence, 4
Coin flip, 211–212
Colonization, 242
Complementary log-log, 220
Conditional distribution, 20
Condor, 3
Confidence interval (CI), 111–113
Confound, occurrence/detection, 240, 254, 260
Continuous
  discrete compared to, 60
  index, 242
  response and explanatory variable, 73–76
  uniform distribution, 62
  wetness index, 221, 222f
Continuous covariate, 82, 193, 219
  ANOVA, 141–142
Contrast
  custom, 122
  estimate, 177
Converged MCMC, 21–22, 41, 173–174
Convergence
  in binomial mixture model, 265–266
  non-, 26t, 27, 31
  of Poisson GLMM, 208–209
  Rhat and, 173
Convergence diagnostics, 148
Convergence monitoring, 21–23
  BGR statistic and, 54–55
Copy-pasting, 32
Cordulegaster bidentata, 194f
Coronella austriaca, 116f
Correlation, 43–44
Counting window, 65
  consistency of, 188
Counts
  binomial, 211
  land use, 170
  mite, 195
  naïve analysis, 238–239
  overdispersion in, 180
  Poisson distribution, 169, 211
  replicate, 255, 260
  ZIP, 185
Courier font, 31
Covariate, 24
  design matrix and values of, 75–76
  of mixed model, 151
  normalize, 154
  offset and, 188–189
  of Poisson ANCOVA, 193
  site, 252
  site-occupancy model and, 241–242
  survey, 252, 273–274
  time, 239
  transformation, 149
  uniform, 231
  WinBUGS and standardization of, 147–149
Crash, in WinBUGS, 45, 263–264
Creativity
  statistics and, 275
  in WinBUGS, 30–31
Credible interval (CRI), 23
  Bayesian CRI v. frequentist CI, 43
  creating, 42
  hypothesis testing, 24–25
  for linear regression, 110–111, 111f
CRI. See Credible interval
Cross-classification, 81, 131
Cross-leaved gentian, 212–216, 212f, 216f
Cross-validation, 24
Curvilinear graphs, 66
Custom
  contrast, 122
  hypothesis test, 121
  tests, 146
CV, 56

D
Data sets
  actual, 8
  assemble, 105, 123, 194–195, 229–230
  bundle, 50
  generation of, 48–49
  ideal, 8
  loading, 36
  simulated, 7–8
Debug
  false, 51
  true, 52
Decision tree, 275
Decline
  population in, 15, 103
  probability of, 105–106
Degree of belief, 20
Delta method, 23

Density, 39, 42, 56
  of land use, 169
Derived quantity, 56, 95, 99, 146, 177
Design matrix
  covariate values, 75–76
  without intercept, 154
  linear predictor and, 67, 71–72, 144
  parameterizations associated with, 77
  R function for, 68–69
  specification of, 58–59
Detection
  imperfect, 251–252, 254–257
  occurrence confounded with, 240, 254, 260
Detection probability, 238, 240
  bias and, 251
  importance of, 251
  in site-occupancy model, 241, 250–251, 250f
Detection-nondetection data, 212, 215
Deterministic systems, 29, 277
  stochastic systems compared to, 58
Deviance, 68–69
Deviance information criterion (DIC), 25
Df, 94, 98, 145, 180, 182–183
DIC. See Deviance information criterion
Diffuse prior, 18
Discrepancy measure, 172, 248
Discrete
  continuous compared to, 60
  explanatory variables, 68
Disease, 27
Dispersal, 256
Dispersion, 101, 194, 223. See also overdispersion
Distance sampling, 256
Distribution, 204, 220. See also specific distribution types
  abundance and, 260–261
  binomial, 211–213
  choosing type of, 60
  conditional, 20
  equilibrium, 21–22
  exponential family, 5
  gamma, 65
Double indexing, 120, 198
Dragonfly, 193–194, 194f
Dummy variable, 70, 92
Dynamic, site-occupancy model, 22, 242

E
Ecology, 253
Ectoparasite, 193, 196f
Effective number of parameters (pD), 25
Effects parameterization, 72, 77
  ANCOVA, 143
  interaction effects, 83, 86
  t-test, 92–93
Equal variances, 92–97
Equation. See specific types
Equilibrium distribution, 21–22
Error propagation, 216
  Bayesian approach and, 2–3
  false positive, 244
Estimable. See also Maximum likelihood estimate
  contrast, 177
  fixed effects, 118
Exchangeable, 117
Experimental block, 117
Explanatory variable, 242
  combining effects of multiple, 78–82
  continuous response and continuous, 73–76
  discrete, 68
Exponential family distributions, 5
Extinction, 242
Extra-binomial, 180
Extra-Poisson, 180
Eye-opener, WinBUGS as, 275

F
Factor. See also fixed effects; mixed effect; two-factor ANOVA
  levels, 18
  scale reduction, 199
  two-level, 152
  unrecognized, 59
Falco peregrinus, 34f
False positive, 244
Fixed effects
  one-way ANOVA, Bayesian analysis using WinBUGS, 120–122
  one-way ANOVA, data generation, 119–120
  one-way ANOVA, ML analysis using R, 120
  one-way ANOVA, random effects compared to, 116–118
  two-way ANOVA, 129–139
  in WinBUGS, 166
Flat prior, 18
Fortran, 4, 277
Frog, 113
F-test, 117

G
GAM, 240
Gamma (distribution), 65
Gaussian bell curve, 61
Gelman-Rubin, 22
Generalized linear mixed model (GLMM), 5, 5t. See also Binomial GLMM and Poisson GLMM; Mixed model; Non-standard (GLMM)
Generalized linear model (GLM), 5, 5t, 8, 24, 67. See also ANCOVA; Binomial ANCOVA; Poisson ANCOVA
  components, 168
  recognition of, 141–142
  unification of models under, 167–168
General-purpose software, 1, 4, 11, 275
GenStat (program), 5
Gentiana cruciata, 212f
Gentianella germanica, 239–252, 239f
Gibbs sampling, 9, 100
  basis of, 20
  starting, 40–41
GLM. See Generalized linear model
Glm(), 6, 66–67, 174, 180
Glmer(), 6, 235
GLMM. See Generalized linear mixed model
Goodness of fit
  code assessing, 265
  linear regression and, 106–109
  measuring, 24
Graphical check, 107, 267
Grassland, 242
  arable sites and, 101, 167–177
Greenfinch, 64

H
Habitat suitability, 68, 127
Hare. See Swiss hare data
Heterogeneous variance, 97, 98f
Hierarchical model, 241, 276–277
  non-, 25
High-dimensional integrals, 19
Higher-order interactions, 142
Highest posterior density interval (HPDI), 23
  obtaining, 43
History button, 38–39, 42, 51
Homogeneous slopes, 275
Homogeneous variance, 101
Homoscedasticity, 97, 107
HPDI. See Highest posterior density interval
Hurdle model, 185
Hyperparameter, 122–123, 152, 162
Hyperprior, 125, 164
Hypothesis testing, 24
  CRI for, 24–25
  custom, 121

I
Identifiability, parameter, 24
  assessing, 26–27
  definition of, 26
  simulation for, 27
Identity (link function), 168
Ignorance, 18
  specifying, 3, 61
Imperfect detection, 251–252, 254–257
Implicit assumption, 170, 205, 220
Impute (data), 45
Inbuilt variance (in Poisson and Binomial distributions), 63, 65
  Poisson ANCOVA and, 194
Indicator variable, 71, 143
Inference. See also Scope (of inference)
  Bayesian, 8–9, 16–17
  summarizing, 39
Informative prior, 19, 277
  posterior distribution and, 56
Initial value, 22
  defining, 51
  loading, 37
  for MCMC, 35
Intensity (Poisson), 169, 180
Interaction effects
  ANCOVA, 84, 88f
  effects parameterization, 83, 86
  main effects compared to, 129–130
  means parameterization, 81, 83, 86
  specifying, 82–83
  two-way ANOVA, R analysis, 134–135
  two-way ANOVA, WinBUGS, 137–138
Intercept, 72, 75, 84
  design matrix without, 154
  random-coefficients mixed model ANCOVA, with correlation of slope and, 161–165
  random-coefficients mixed model ANCOVA, without correlation of slope and, 158–161
Inventory, plant, 212, 239–240, 242
Iteratively reweighted least squares, 9

J
JAGS (program), 277
Jura, 193–194, 212, 221, 240, 242

K
Kruskal-Wallis test, 275

L
Lacerta agilis, 254f
Land use, 139, 180–181, 209, 228
  counts and, 170
  density of, 169
Lanius collurio, 204f
Lanius senator, 230f
Latent effect, 184
Latent state, 187–188, 251
Lattice, 31, 205, 206f
Least squares
  in classical analysis, 138
  iteratively reweighted, 9
  method of, 71
Lepus europaeus, 169f
Likelihood, 16, 95, 99
Likelihood ratio test (LRT), 171
Linear mixed model (LMM). See Mixed model
Linear model (LM), 5, 5t, 24
  explanatory variable specified, 66, 75
  normal distribution, 70
  of t-test compared to linear regression, 103
  usefulness of, 58–59
  in WinBUGS, 66–67
Linear predictor, 168, 204, 213, 220
  design matrix and, 67, 71–72, 144
  residuals and, 70
Linear regression
  Bayesian p-value, 107–109
  CI v. CRI, 111–113
  data generation, 104–105
  goodness of fit, 106–109
  LM of t-test compared to, 103
  model adequacy, 105–106
  posterior predictive distributions (PPD), 107–109
  predictions, 109–111, 110–111f
  R analysis, 105
  residual plot, 107, 107f
  simple, 73–76
  Swiss hare data, 113
  WinBUGS analysis, 105–113
Link function, 204, 213, 220
  identity, 168
LM. See Linear model
Lm(), 6, 66–67
Lme4, 31
Lmer(), 165, 235
LMM. See Linear mixed model
Log (link function), 168
Logistic regression, 104, 167, 251
Logit (link function), 168, 220
Logit-linear regression, 242–243
Log-linear model, 167, 197–199
Log-normal, 61, 95, 183, 225
Log transformation, 169
Low-information prior, 18
LRT. See likelihood ratio test

M
Main effects
  ANCOVA, 85–86, 87f
  ANOVA, 79
  interaction effects compared to, 129–130
  means parameterization, 80, 84–85
  two-way ANOVA, R analysis, 134
  two-way ANOVA, WinBUGS, 135–137
MARK (program), 67, 246, 262
Markov chain Monte Carlo (MCMC), 9, 18
  compiling, 36–37
  converged, 21–22, 41, 173–174
  definition of, 19–20
  history, 20
  initial values, 35
  posterior distribution, 20
  reversible jump, 25
  sampling error, 250
  time series plot, 22f
MASS, 31, 162
Massif Central, 193–194
Matlab (program), 10
Matrix, 53, 71, 74
  multiplication, 123, 132, 144
  variance-covariance, 162
  X, 71
Maximum likelihood (ML). See also Restricted maximum likelihood
  bias and, 2
Maximum likelihood estimate (MLE), 16
MC error, 43
MCMC. See Markov chain Monte Carlo
Means parameterization, 73
  ANOVA, one-way, 76–77
  of interaction effects, 81, 83, 86
  of main effects, 80, 84–85
  t-test, 92

Metapopulation, 240, 253, 257
Metropolis-Hastings algorithm, 20
Minimally informative prior, 18
Missing data, 113
Missing value, 257
Mite, 193–194
  counts, 195
  load, 199–201
Mixed effect. See mixed model; Binomial mixed-effects model; Poisson mixed-effects model
Mixed model. See also Binomial mixed-effects model; Generalized linear mixed model
  benefits, 152
  covariates, 151
  data generation for, 154–156
  linear, 5, 5t, 151–166
  non-linear, 277
  random-coefficients model, with slope/intercept correlation, 161–165
  random-coefficients model, without slope/intercept correlation, 158–161
  random-coefficients model, 153
  random-intercepts model, Bayesian analysis with WinBUGS, 156–158
  random-intercepts model, REML analysis with R, 156
  random-intercepts model, 153, 156–158
  REML, 154
  REML analysis using R, 156
Mixture model. See Binomial mixture model
ML. See Maximum likelihood
MLE. See Maximum likelihood estimate
Model adequacy
  checking, 24
  for linear regression, 105–106
  posterior predictive check, 108–109, 109f, 248
  posterior predictive distributions, 108
  for t-test, 96
Model averaging, 25
Model criticism, 24
Model matrix, 69–70, 143
“Model of the mean,” 33–34, 68–69
  result summary, 41–44
  setting up, 34–40
Model selection, 24
  challenge, 26
  DIC, 25
Monitoring, convergence, 21–23
Morph, zigzag, 219–220
Motorbike, 8
Motorcycle, 48
Mourning cloak, 130, 130f
Multiple binomial regression, 228
Multi-way ANOVA, 115. See also ANOVA (two-way)
MVN, 161

N
Naïve analysis
  abundance and, 272, 273f
  binomial mixture model, 261f, 262
  counts, 238–239
  for site-occupancy model, 246, 246f
Negative binomial (distribution), 65
Neighboring model, 270
Netherlands, The, 254–274
Newton-Raphson algorithm, 9
N-mixture model. See Binomial mixture model
Node, 38, 42
Non-Bayesian, 6, 19
Nonconvergence, 26t, 27, 31
Nondetection. See detection-nondetection data
Nonhierarchical, 25
Nonidentifiable parameter, 139
Non-linear mixed models, 277
Non-normal, 229
Non-standard (GLMM). See Binomial mixture model; Site-occupancy model
Normal distribution, 34, 58
  mathematical description, 61
  picture, 61f
  uses, 168
  in WinBUGS, 61–62
Normalize covariate, 154
Nymphalis antiopa, 130f

O
Observation process, 265
  modeling, 244
Occupancy, 215–216, 216f
Occurrence
  apparent, 240
  detection confounded with, 240, 254, 260
  probability of, 238, 240, 242, 243f
OD. See overdispersion
ODC format, 34

Offset, 179
  analysis methods for, 189–190
  covariates and, 188–189
OpenBUGS (program), 47, 277
Optimization, 30
Outlier, 108
Overdispersion (OD)
  correcting, 180
  data generation, 180–181
  in counts, 180
  R analysis, 181–183
  Swiss hare data, 191
  WinBUGS analysis, 183–184
  zero-inflation as, 179
Overparameterization, 79

P
Pachyderm, 17
Parameter estimability, 7, 21, 27
Parameter identifiability, 24
  assessing, 26–27
  definition of, 26
  simulation for, 27
Parameter vector, 71, 144, 195, 222
Parameterization, 7, 67, 72. See also effects parameterization; means parameterization
  ANOVA, one-way, and, 78
  design matrices, 77
  of one-way ANOVA, 116
  over-, 79
  re-, 72
  for two-way ANOVA, 130
Parametric statistical modeling, 59
Parasite data sets, 195, 196f
Parsimony, 18, 145, 196, 223
PD, 25
Pearson residual, 171, 173, 174f
  for Binomial ANCOVA, 224
  posterior predictive check and, 175f, 225, 226f
Peregrine falcon, 34f, 48
  body mass, 33
  summary of results, 41–44
  wingspan, 92
Poisson ANCOVA
  covariates, 193
  data generation, 194–196
  inbuilt variance, 194
  predictions, 199–201
  R analysis, 196–197
  Swiss hare data, 202
  WinBUGS analysis, 197–201
Poisson distribution, 60
  for counts, 169, 211
  data generation, 170
  inbuilt variance, 65
  posterior distribution, 175
  predictions, 175
  R analysis, 171
  R code, 65, 66f
  Swiss hare data, 177
  t-test, 168–169
  in WinBUGS, 65
  WinBUGS analysis, 171–177
  zero-truncated, 185
Poisson GLMM, 203
  binomial mixed-effects model compared to, 230
  convergence, 208–209
  data generation, 205
  R analysis, 207
  random-coefficients model, 206–209
  Swiss hare data, 209
  WinBUGS analysis, 207–209
Poisson regression, 202
Poisson-gamma (mixture), 65
Poisson-log-normal (distribution), 183
Polynomial, 24, 109, 242
Population, 10
  closed, 242
  covariate, 257
  in decline, 15, 103
  density, 10
  trends, 109, 112–113, 119, 203
Postcondition violated, 251
Posterior distribution, 16–17, 19
  graphical summaries for, 55
  informative prior and, 56
  numerical summaries, 55
  samples from, 19
  summarizing, 23
Posterior draws, 51
Posterior predictive check (PPC), 104–105, 171
  for binomial mixture model, 268f
  for model adequacy, 108–109, 109f, 225, 226f, 248
  Pearson residuals and, 175f, 225, 226f
  for residuals, 173–174
  for site-occupancy model, 246

Posterior predictive distributions (PPD), 106
  linear regression and, 107–109
  model adequacy and, 108
PPC. See Posterior predictive check
PPD. See Posterior predictive distributions
Predation, size-dependent, 131
Prediction(s), 236
  forming, 24
  for linear regression, 109–111, 110–111f
  for Poisson ANCOVA, 199–201
  for Poisson distribution, 175
  for two-way ANOVA, 138–139, 139f
Prediction interval, 110
PRESENCE (program), 246, 262
Presence-absence data, 212, 244
  Bernoulli distribution and, 238
Prevalence, 27
Prior(s), 135. See also specific types
  belief, 17–18
  inference affected by change in, 270
  uniform, 267
Prior distribution, 16–17
  problems with, 17–18
Prior sensitivity, 251, 274
  analysis for, 252
Probability
  axioms of, 4, 16–17
  Bayesian approach and interpretation of, 3–4, 15
  classical interpretation of, 15
  of decline, 105–106
  degree of belief, 20
  detection, 238, 240–241
  of occurrence, 238, 240, 242, 243f
  theory, 14
Probability distributions, 20
  parameters of, 59–60
  random variables and, 59
Prussian soldiers, 65
Pscl, 186
Pseudo-code, 31
P-value. See Bayesian p-value
PyMC (program), 277
Pyrenees, 142–145, 193, 195–196

Q
Quadratic effect, 257
Quasibinomial, 180
Quasi-likelihood, 180
Quasipoisson, 182

R
R (program), 5–6, 8
  basic knowledge of, 10
  Poisson distribution code in, 65, 66f
  in R2WinBUGS used with WinBUGS, 49–55
  WinBUGS integrated with, 6
  WinBUGS run from, 30
  working directory set for, 49–51
  workspace, 51–52, 55
R function. See also specific types
  common, 6
  defining types of, 51, 53
  for design matrix, 68–69
R2WinBUGS, 22, 30, 47
  WinBUGS with R program used through, 49–55
Random variables
  Poisson, 211
  probability distributions and, 59
Random-coefficients model
  for binomial mixed-effects model, 229–230, 231–235
  for mixed model ANCOVA, 153
  for mixed model ANCOVA with slope/intercept correlation, 161–165
  for mixed model ANCOVA without slope/intercept correlation, 158–161
  for Poisson GLMM, 206–209
  for Poisson mixed-effects model. See for Poisson GLMM
  Swiss hare data, 166
Random effects
  in Bayesian analysis, 235
  definition of, 115
  ZIP, 185
Random-intercepts model
  for mixed model ANCOVA, Bayesian analysis with WinBUGS, 156–158
  for mixed model ANCOVA, REML analysis with R, 156
Raptor, 177
Red-backed shrike, 203–204, 204f, 229–236, 230f
Refresh setting, 40
Regression. See also Linear regression
  linear, 73–76
  logistic, 104, 167, 251
  logit-linear, 242–243
  multinomial, 167

  multiple binomial, 228
  Poisson, 202
  ZIP, 184
REML. See restricted maximum likelihood
Reparameterization, 72
Repeated measurement, 240, 257
Replicate counts, 255, 260
Reptiles, 254
Residual(s), 24. See also Pearson residual
  assumptions, 69–70
  linear predictor and, 70
  PPC for, 173–174
  raw, 224
  t-test, checking, 97
Residual plot, of linear regression, 107, 107f
Residual standard error, 49
Restricted maximum likelihood (REML), 5
  mixed model, 154
  mixed model ANCOVA analysis using R and, 156
  random effects one-way ANOVA using R and, 124
Reversible jump (MCMC), 25
Rhat, 22, 54, 148
  convergence and, 173
R-squared, 94, 145

S
Sample Monitor Tool, 37, 41
Sampling error
  exercise, 93
  MCMC, 250
  statistics and understanding, 48–49
  understanding, 7
Sampling variance. See Sampling error
Sand lizard, 254–274, 254f
SAS (program), 5, 10
Scale reduction factor, 199
Scope (of inference), 152
SD. See Standard deviation
Sex, 59, 92
Shrinkage estimators, 118
  fixed/random, 236
Simulated data sets, 7–8
Simulation, 27. See also specific simulations
Site covariate, 252
Site-occupancy model, 104, 237, 239
  binomial mixture model compared to, 256, 259
  confusions with, 238–239
  covariates, 241–242
  data generation, 242–246
  detection probability, 241, 250–251, 250f
  dynamic, 22, 242
  naïve analysis, 246, 246f
  posterior predictive check, 246
  probability of occurrence, 240
  starting values, 263
  Swiss hare data, 252
  truth, 251–252
  WinBUGS analysis, 246–251
Slope
  random-coefficients mixed model ANCOVA, with correlation of intercept and, 161–165
  random-coefficients mixed model ANCOVA, without correlation of intercept and, 158–161
Slovenia, 239
Smooth snake, 116, 116f
Snake, 142, 142f, 219–228, 220f
  data set for, 67–68, 68t
  smooth, 116, 116f
Snout-vent length (SVL), 115–116
Software. See specific software
Sombre goldenring, 193–194, 194f
Splines, 242
Standard deviation (SD), 7, 42, 49
  for Bernoulli distribution, 64
  residual, 123
Standard error, 7
Standardization, covariate, 147–149
Starting value
  random, 51
  for site-occupancy model, 263
State-space (model), 241
Statistical analysis, 5, 5t
Statisticians, 275–276
Statistics. See also Bayesian (approach to statistics)
  Bayesian v. classical, 6, 15–19
  creativity and, 275
  definition/use of, 14
  sampling error and, 48–49
Stochastic systems, 7
  binomial mixture model and, 255
  deterministic systems compared to, 58
  in life, 14
Straight-line relationships, 66, 156
Structural
  model parameters, 152, 266
  model parts, 25
  model problems, 43

Survey covariate, 252, 273–274
Survival, 10, 59, 211–212
SVL. See Snout-vent length
Swiss hare data, 56
  ANCOVA, 150
  Binomial ANCOVA, 228
  binomial mixture model, 274
  linear regression, 113
  one-way ANOVA, 127
  overdispersion, 191
  Poisson ANCOVA, 202
  Poisson distribution, 177
  Poisson GLMM, 209
  random-coefficients model, 166
  site-occupancy model, 252
  t-test, 101
  two-way ANOVA, 139

T
Tichodroma muraria, 104f
Transect, 254
Transformation, covariate, 149
Treatment contrasts, 71
Trellis plot, 206f, 232–233f
  constructing, 155, 155f
Trends, population, 109, 112–113, 119, 203
True state, 242–244, 253
  partially observed, 247
T-test, 69–73
  binomial, 213–216
  data generation, 92–93, 97–98
  effects parameterization, 92–93
  with equal variances, 92–97
  as LM, 91
  LM of linear regression compared to, 103
  means parameterization, 92
  model adequacy, 96
  Poisson distribution, 168–169
  residuals, 97
  Swiss hare data, 101
  with unequal variances, 97–100
Tutorial, WinBUGS, 11

U
Unbalanced data, 257
Unbiased, 2, 133
Uncertainty interval, 110
Unequal variances, 97–100
Uniform distribution, 58
  continuous, 62
  picture, 63f
Uninformative prior, 18

V
Vague prior, 18–19, 158, 277
  Bayesian approach with, 55–56, 275
Variance(s). See also specific variances
  Bayesian compared to frequentist, 96
  chain, 22
  comparing, 101
  modeling, 100–101
  t-test and equal, 92–97
  t-test and unequal, 97–100
Variance component, 51
Variance components model, 152
Variance-covariance matrix, 162
Vegetation, 257–262
  covariable, 261
Vipera aspis, 142f, 152f
Vipera berus, 220f

W
Wallcreeper, 103–104, 104f
Website
  for book, 31–32, 34, 49, 250
  for WinBUGS, 10
Welch test, 98
WinBUGS
  advantages, 4–5
  ANCOVA, 145–149
  Binomial ANCOVA, 224–228
  binomial distribution, 63–64
  binomial mixture model, 262–273
  covariate standardization, 147–149
  crash, 45, 263–264
  creativity, 30–31
  as eye-opener, 275
  fixed effects, 166
  fixed effects one-way ANOVA, 120–122
  freedom of, 30–31
  Gibbs sampling, 21
  interaction effects two-way ANOVA, 137–138
  linear regression analysis, 105–113
  literature on, 9–10
  main effects two-way ANOVA, 135–137
  mixed model ANCOVA, 156–158
  normal distribution, 61–62
  overdispersion, 183–184
  Poisson ANCOVA, 197–201
  Poisson distribution, 171–177

  Poisson distribution, 65
  Poisson GLMM, 207–209
  power of, 29
  practical implementation of complex models in, 263–264, 270–271
  R integrated with, 6
  R2WinBUGS, 49–55
  random effects in, 166
  random effects one-way ANOVA, 125–127
  site-occupancy model, 246–251
  technicalities, 31–32
  t-test analysis, 94–97
  tutorial, 11
  website for, 10
  ZIP model, 187–188
Within-chain variance, 22
Woodchat shrike, 229–236, 230f
Working directory, 31–32
  setting R, 49–51
Workspace, R, 51–52, 55

X
X matrix, 71

Z
Zeroes, 177
  avoiding, 263
Zeroinfl(), 186
Zero-inflated binomial (ZIB), 184
Zero-inflated Poisson (ZIP)
  counts, 185
  data generation, 185–186
  hurdle model compared to, 185
  R analysis, 186–187
  random effects, 185
  WinBUGS analysis, 187–188
Zero-inflation
  definition, 184
  as overdispersion, 179
Zero-truncated, Poisson distribution, 185
ZIB. See zero-inflated binomial
Zigzag adders, 219–228, 220f
ZIP. See zero-inflated Poisson
