Bayesian Modelling of The Indirect Effects of Insecticides on Yellowhammer Chick Survival
Xiaosi Wang
Dissertation submitted for the MSc in Data Analysis, Networks
and Nonlinear Dynamics, Department of Mathematics, University
of York, UK
August 2005
Contents
List of Figures
List of Tables
Preface
Acknowledgements
1 Introduction of The Project
1.1 An Overview
1.2 The Experiments on The Yellowhammer
1.3 A Non-Bayesian Method
1.4 Bayesian Method in A Relevant Area
1.5 Motivations & Difficulties
1.6 Summary
2 Bayesian Approaches
2.1 Background
2.2 Bayes’ Theorem for General Quantities
2.3 Bayesian Inference
2.3.1 Analysis of Binary Data
2.3.2 Inference with Normal Distribution
2.3.3 Point Estimation & Interval Estimation
2.4 Computational Methods
2.4.1 The Problem
2.4.2 Markov Chain Monte Carlo
2.4.3 Introduction of WinBUGS
2.5 Summary
3 Data Analysis and Model Handling in WinBUGS
3.1 Formatting The Data
3.2 The Analysis of Failure Time Data
3.2.1 Failure Time Distributions
3.2.2 Regression Models
3.2.3 Survival Analysis in WinBUGS
3.3 Models Considering Time Dependence
3.4 Summary
4 Discussions on Failure Time Data Models
4.1 Survival Models Based on The Exponential Prior
4.1.1 Models Based on Insecticide Quantities
4.1.2 Models Based on Time
4.1.3 Models Based on Block Structure
4.1.4 Analysis of The Posterior Summaries & Convergence of MCMC Sampling
4.2 Survival Models Based on The Weibull Prior
4.2.1 Models Based on Non-Informative Priors
4.2.2 Models Based on Informative Priors
4.3 Summary
5 Discussions on Time Dependence Models
5.1 Model I
5.2 Model II
5.3 Summary
Further Developments
Conclusions
Appendix I
Appendix II
Bibliography
List of Figures
1.1 Three patterns of influences on food chains
1.2 A GIS Map of Nests with 200m Radius Foraging Areas
2.1 The comparison between the MCMC and the common Markov Chain
2.2 An example of using the Sample Monitoring tool
3.1 Original data
3.2 Insecticide data
4.1 The model fit of the daily survival rate over the different proportions of insecticides from exponential model
4.2 The model fit of the nestling period survival probability over different proportions of insecticides from the exponential model
4.3 The model fit of the chick daily survival rate over time
4.4 The trend of the chick daily survival rate over time
4.5 The model fit of the nestling period survival probability over time
4.6 The trend of the nestling period survival probability over time
4.7 The model fit of the probability density of chick survival life time
4.8 The trend of the probability density of chick survival life time
4.9 The model fit of nestling daily survival rate against different blocks
4.10 The convergence of two chains run for t[10]
4.11 The model fit of the chick daily survival rate over different proportions of insecticides (the Weibull model, under non-informative prior distributions)
4.12 The model fit of the nestling period survival rate over different proportions of insecticides (the Weibull model, under non-informative prior distributions)
4.13 The model fit of the daily survival rate over time (the Weibull model, under non-informative prior distributions)
4.14 The trend of the daily survival rate over time (the Weibull model, under non-informative prior distributions)
4.15 The model fit of the individual chick survival rate to days (the Weibull model, under non-informative prior distributions)
4.16 The trend of the individual chick survival rate to days (the Weibull model, under non-informative prior distributions)
4.17 The model fit of the probability density of the nestling death time (the Weibull model, under non-informative prior distributions)
4.18 The trend of the probability density of the nestling death time (the Weibull model, under non-informative prior distributions)
4.19 The multiple chains of some monitored variables which have reached convergence
4.20 The posterior densities of four monitored variables of the 17th nestling
4.21 The pairs of chains for bn which have clearly not reached convergence under very vague priors
4.22 The model fit of nestling daily survival rate over time (the Weibull, under standard normal distribution priors)
4.23 The model fit of the chick daily survival rate over different proportions of insecticides (the Weibull model, under standard normal distribution priors)
4.24 The pairs of chains for bn which have reached reasonable convergence under informative priors
5.1 Data format for model I
5.2 The autocorrelation functions of the daily survival rate on the 9th day and the 10th day respectively
5.3 The daily survival rate of chicks from 20th May, 2002 to 3rd June, 2002 (Block One)
5.4 The convergence for parameters αc and βc respectively
5.5 The dynamics of the convergence for parameters αc and βc respectively
List of Tables
1.1 Bobwhite Data-Knox County, 1991
2.1 Conjugate Distributions Table
3.1 Explanations of Input Data
4.1 Posterior summaries from the Exponential Model
4.2 A random subset of the posterior summaries and the MC errors of some monitored variables
4.3 A random subset of the posterior summaries and the MC errors of some monitored variables from the Weibull Model under very vague priors
4.4 A random subset of the posterior summaries and the MC errors of some monitored variables from the Weibull Model under informative priors
4.5 A random subset of the posterior summaries and the MC errors of some monitored variables from the Weibull Model under informative priors
5.1 Input data for model II
Preface
This paper concerns an important ecological problem. In Britain, the populations of many
bird species, especially farmland birds, have been declining in recent years.
In the past, the poisoning of farmland birds was a direct effect of agricultural chemical
treatments. However, since the disappearance of poisonous or sub-poisonous insecticides,
one possible reason for the continued decline in bird populations is the indirect effect
of pesticides acting through the food chain.
An extensive series of farmland experiments on this issue has been completed by the
Central Science Laboratory (CSL), an executive agency of the Department for Environment,
Food and Rural Affairs (DEFRA), UK, which provides research and development to safeguard
the food supply and to protect the environment. Although several interesting findings have
been reported in recent papers, only classical frequentist techniques involving Generalized
Linear Models have been applied. A newer statistical method, Bayesian inference, is
expected to lead to further developments. Until now, few attempts have been made to use
Bayesian approaches in nestling survival analysis. Therefore, a scientific publication
may be possible if the Bayesian models are successful.
As a result, CSL proposed a project to model the indirect effects of insecticides
on yellowhammer (Emberiza citrinella) chick survival by using Bayesian
statistics. It was then offered to the MSc course in Data Analysis, Networks and Nonlinear
Dynamics at the University of York, 2004-2005, as a student placement for the dissertation.
The author took the placement under the supervision of senior bio-statistician
Alistair Murray, team leader of the Applied Mathematics and Statistics Department,
CSL, and Dr. Peter M Lee, the University of York.
The Bayesian approaches were entirely new to the author, and the specific method
originally intended for the project proved infeasible after a considerable time, for
several unavoidable reasons. Nevertheless, a variety of new models were eventually
completed after a great deal of hard work.
The structure of the dissertation is as follows:
Firstly, the ecological experiments and the relevant statistical approaches are introduced
to give a general idea of the project.
Secondly, the main features of the Bayesian methods related to the project are presented.
WinBUGS, a software package for Bayesian analysis, is selected as the main tool
for building Bayesian models, whilst the R language is also applied in Chapter 5.
More importantly, Chapter 3 gives a detailed account of the data reformatting, the
mathematical analysis of the data and the model handling in WinBUGS.
The models are then discussed from different statistical modelling perspectives
in Chapters 4 and 5 separately. These two chapters demonstrate the new achievements
of the project, and are therefore the most important parts of the paper.
Finally, the summary at the end of each chapter lists the most important features
of that chapter.
Acknowledgements
My thanks go firstly to my supervisors, Alistair Murray, Central Science Laboratory, and
Dr. Peter Lee, the University of York. Alistair arranged the student placement for the
project and has provided me with valuable supervision throughout the whole process. Peter
has provided me with precious expertise on Bayesian approaches and WinBUGS programmes.
As his last student before his retirement, I hope my work is a nice present.
I am also grateful to the ornithologists Justin Hart and Dr. Tim Milsom of CSL. They
have provided me with expertise in ecological issues. Justin also helped me enter the data
into different databases, which was time-consuming work.
I should also say thanks to Dr. David Spiegelhalter and Dr. Nicky Best for their
extremely useful course on WinBUGS.
I would also like to give my thanks to Professor Mike Smith. His unselfishness helped
me overcome my family misfortune both mentally and substantially.
Finally, my thanks go to my parents and my sincere friends, Anna Armstrong, Kit Fan,
Lieven Clarisse, Mahlet Getachew, and Marina Theodoropoulou. Their support in every
aspect has encouraged me to accomplish the MSc course.
Chapter 1
Introduction of The Project
1.1 An Overview
Since the middle of the 1980s, the populations of many farmland birds in the UK, including
the yellowhammer (Emberiza citrinella), have declined. The reductions of arthropods,
seeds and weeds on arable farmlands caused by intensive agricultural practices are related
to the population decline. However, the potential mechanisms are still unknown ([Boatman
N. D. et al, 2004]; [Hart, J. D. et al, 2005]).
The simultaneous increase in pesticide applications is suspected to have affected
the bird populations. Since the withdrawal of the organochlorine insecticides in the
1950s and 1960s, little evidence of lethal or sub-lethal poisoning of chicks or provisioning
adults, termed the direct effects of pesticides, has been observed in the last three decades.
Therefore, more attention has been focused on the indirect effects of pesticides. For
example, insecticides reduce the various insects that the nestlings of most bird species
take as food during the breeding season, and chick populations subsequently
shrink because of the lack of food. In other words, it is a mechanism that works
by operating on food chains. Although the influence has been considered in general, the
extent to which the survival of arable farmland birds is affected by this kind
of agricultural treatment is still unidentified. Three patterns of impact have been
suspected:
• the arthropod population is depleted by insecticides
• the supply of plants which are used as hosts by arthropods is reduced by herbicides
• the weed species which provide either green matter or seeds for herbivorous and seed-eating species respectively are eliminated by herbicides
These routes are shown in Figure 1.1.
[Flow diagram with three routes:
insecticides → arthropod reduction → reduction of surviving chicks
herbicides → weed reduction → reduction of surviving chicks
herbicides → plant reduction → arthropod reduction → reduction of surviving chicks]
Figure 1.1: Three patterns of influences on food chains
Although it was concluded that at least 11 species are possibly affected by the increased
use of pesticides, sufficient evidence and detailed data analysis were available only
for the grey partridge (Perdix perdix) [Campbell, L. H. et al, 1997]. This lack of research
arises because the field work required in the relevant experiments is expensive and
labour-intensive.
In order to halt the decline and to promote healthier farming practices, a series
of large-scale field experiments was designed and carried out by scientists from the Central
Science Laboratory between 2000 and 2003 to study the indirect effects of insecticides
(pattern 1, Figure 1.1) on three farmland bird species: the yellowhammer, the skylark,
and the corn bunting. Of these, the yellowhammer was emphasized for three main
reasons:
Multiple foraging areas: It forages not only in arable crops but also in field
margins. Therefore, the field margins included in the study areas do not need to be
considered separately.
Invertebrate-dominated nestling diet: A range of arthropods makes up the main
nestling diet, even though semi-ripe cereal grain is also consumed later in the breeding
season. Thus, insecticides alone, without herbicides and fungicides, can influence the
food supply.
Sufficient abundance for sampling: Although the species is in marked decline, it is
still widespread on farmland. This enables us to collect enough samples
for study.
Treatments in these experiments included supplementary seed supplies in winter to
improve winter food availability, and an increased number of normal insecticide
applications to depress invertebrate food resources. Three sites, each composed of
four blocks of 1 km radius, were involved in the experiments. They were
located in Hampshire, Lincolnshire, and North Yorkshire, on arable or mixed farms. The
data covered several aspects, including insecticide use, invertebrate availability,
foraging, chick condition, growth rate and chick survival.
The data on insecticide inputs and chick survival of the yellowhammer, which had
been grouped and used in the previous analysis, were provided to
the student placement for further Bayesian analysis, a method which can be
seen as a novel approach to the study of bird reproductive performance under controlled
parameters (insecticide applications). The earlier analysis was conducted by traditional
statistical methods, and therefore the data were not in an appropriate form at the
beginning of the placement. Reformatting the data (see Section 3.1, Chapter 3) was needed.
Although the experiments were carried out on different sites in successive years, they
were designed as replicated studies. Therefore, once a statistical model based on one data
set from a specific site in a certain year is shown to be applicable, it can be quickly
transferred to the other data sets.
1.2 The Experiments on The Yellowhammer
The yellowhammer is a typical farmland bird species which is suffering from population
decline. Earlier statistical studies have provided some evidence on the relationship
between the abundance of arthropods, which was heavily reduced by insecticides, and the
physical mass of chicks. Building on these results, the major task of the project is to
model, from different perspectives, the impact of summer insecticide inputs on yellowhammer
chick survival from hatching to fledging. This includes studies of the daily survival rate
of chicks, the nestling period survival rate of an individual chick, the direct relation
between the daily survival rate and insecticide quantities,
and so forth.
The data used in the paper were from four blocks of farmlands at Castle Howard
site, North Yorkshire, 2002. The breeding season, a critical phase in the life cycle of
the yellowhammer, lasts from late April to early August. Hence, this period was
chosen for the data collection. During the monitoring period, all active nests located
in the study area were found, and each nest was visited every 2-4 days rather than daily to
minimise disturbance. Chick conditions were recorded in several respects, including
general condition (alive or dead) at each visit, physical mass at certain ages and so on.
In particular, a chick was always weighed to determine its age when it was found for the
first time.
The insecticide data were recorded from farming practice. All insecticides applied
here belonged to the pyrethroid family, which has no short-term or chronic poisonous effects
on chicks. Most fields on blocks 1 and 4 were sprayed only once at a normal quantity, whereas
those on blocks 2 and 3 received two normal insecticide treatments. Conventionally, blocks
treated twice at different times were referred to as ‘extra’ while those treated only once
were referred to as ‘normal’. There was no evident difference between ‘extra’ and ‘normal’
blocks in general [Boatman N. D. et al, 2004], mainly because ‘extra’ only meant a
second treatment at a normal quantity, the first spray having also been at a normal level
of insecticides but on a much earlier date. During the interval between the two sprays, the
populations of most arthropods had recovered rapidly. As a result, the block structure was
ignored and only the data on pesticide quantities related to individual nests were used.
In general, 90% of the foraging flights were found to take place within a circle of 200 m
radius around a single yellowhammer nest. Hence nests were located on a Geographic
Information Systems (GIS) map, and the proportion of each such foraging area covered by
insecticide-treated fields was calculated. The method is shown in Figure 1.2. Note that the
different colours only denote different crops; they have nothing to do with the proportion
of insecticide-treated area.
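The proportion calculation can be illustrated with a rough sketch. This is not the GIS procedure used at CSL; it assumes, purely for illustration, that treated fields are axis-aligned rectangles given in metres, and it approximates the area by counting grid points inside the circle. The function name `treated_proportion` and the field coordinates are hypothetical.

```python
def treated_proportion(nest_xy, treated_fields, radius=200.0, cells=200):
    """Approximate the share of a circular foraging area covered by treated fields.

    nest_xy        -- (x, y) position of the nest in metres
    treated_fields -- list of (xmin, ymin, xmax, ymax) rectangles in metres
                      (a simplifying assumption; real field outlines are polygons)
    radius         -- foraging radius in metres (200 m in the text)
    cells          -- grid resolution along each axis
    """
    cx, cy = nest_xy
    step = 2.0 * radius / cells
    inside = 0
    treated = 0
    for i in range(cells):
        x = cx - radius + (i + 0.5) * step
        for j in range(cells):
            y = cy - radius + (j + 0.5) * step
            if (x - cx) ** 2 + (y - cy) ** 2 > radius ** 2:
                continue  # grid point lies outside the foraging circle
            inside += 1
            if any(x0 <= x <= x1 and y0 <= y <= y1
                   for x0, y0, x1, y1 in treated_fields):
                treated += 1
    return treated / inside


# Hypothetical nest at the origin with one treated field covering the eastern half.
half = treated_proportion((0.0, 0.0), [(0.0, -500.0, 500.0, 500.0)])
print(half)
```

A grid count is used only to keep the sketch free of external libraries; a GIS package would intersect the field polygons with the circle directly.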
Figure 1.2: A GIS Map of Nests with 200m Radius Foraging Areas
The insecticide data are available for most nests. However, a few nests failed shortly
after they had been found, for various unusual reasons unrelated to the pesticide treatments,
so no insecticide data were recorded for such cases. For example, all chicks in nest No. 254
died in an accident: the nest was not stably built and fell from the tree in a
heavy wind.
The previous data analysis mainly reported evidence on the relationships between
insecticide use and chick-food invertebrates, between chick condition and chick-food
invertebrates, and between chick growth rate and chick survival. However, some crucial
quantities, such as the daily survival rate of yellowhammer chicks from hatching to fledging
and the relationship between insecticide use and chick survival, were not studied.
Statistical models addressing these problems are built in this project.
1.3 A Non-Bayesian Method
In bird studies, a widely used statistical method is the Mayfield method, named
after the ornithologist Harold Mayfield, whose two papers ([Mayfield,
1961]; [Mayfield, 1975]) presented an easily applied statistical method for estimating the
daily survival rate (dsr) of bird nests. Further modifications and developments were made by
other statisticians ([Johnson, 1979]; [Hensler and Nichols, 1981]). As a result, the initial
method was formalized into a standard mathematical method. It has been shown that
the Mayfield formula is in fact a Maximum Likelihood Estimator (MLE) of the dsr. A
brief introduction is presented here to allow the comparison between ‘Bayesian’
and ‘Mayfield’ (see Chapters 4 and 5).
To start the discussion of ‘Mayfield’, some assumptions are necessary:
• a nest survives from one day to the next with the same probability throughout the
nest life
• the complete period to success (the nesting stage) is of the same length for all nests
• all nests being observed have the same probability to survive from one day to the
next
• the exact date of a nest’s success or failure is known
• the nests under monitoring comprise a random sample of the population of nests
under consideration
• all nests are independent
The derivations presented here are mainly based on [Aebischer, 1999] and [Hensler
and Nichols, 1981]. Suppose that a nest is found at a particular nesting stage, i.e.
egg-laying, incubation or brood-rearing. Let $\omega_k$ denote the probability that it is found on
day $k$ (encounter probability) within the total nesting days $K$. These
probabilities form a vector $\vec{\omega} = (\omega_1, \omega_2, \ldots, \omega_K)$. The nest is observed for $n$ days, and its
outcome $o$ is recorded as failure (0) or success (1). The quantity of interest is the probability
that a nest active on one day survives to the next (the daily survival rate $s$). Let $f$ denote
the conditional probability of the binary outcome, i.e. either failure or success. It can be
expressed as a function of $s$ (the likelihood function) given the conditions described above:
\[
f(s \mid o, n, K, \vec{\omega}) = \left( \omega_{K-n+1}\, s^{n} \right)^{o} \left[ s^{n-1} (1 - s) \sum_{k=1}^{K-n+1} \omega_k \right]^{1-o}
\]
If $M$ nests are found, the $m$th nest is monitored for $n_m$ days with outcome $o_m$. Given that
all the nests are independent, the joint probability of the outcomes is:
\[
\prod_{m=1}^{M} f(s \mid o_m, n_m, K_m, \vec{\omega}_m)
\]
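So the log-likelihood function is:
\[
\begin{aligned}
l(s \mid \vec{o}, \vec{n}, \vec{K}, \vec{\omega})
&= \ln \left[ \prod_{m=1}^{M} f(s \mid o_m, n_m, K_m, \vec{\omega}_m) \right] \\
&= \sum_{m=1}^{M} (n_m + o_m - 1) \ln s + \sum_{m=1}^{M} (1 - o_m) \ln(1 - s)
 + \sum_{m=1}^{M} \ln \left[ \omega_{K_m-n_m+1,\,m}^{\,o_m} \left( \sum_{k=1}^{K_m-n_m+1} \omega_{km} \right)^{1-o_m} \right] \\
&= (N + O - M) \ln s + (M - O) \ln(1 - s) + \sum_{m=1}^{M} \ln g(o_m, n_m, K_m, \vec{\omega}_m)
\end{aligned}
\]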
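where
\[
N = \sum_{m=1}^{M} n_m; \qquad
O = \sum_{m=1}^{M} o_m; \qquad
g(o_m, n_m, K_m, \vec{\omega}_m) = \omega_{K_m-n_m+1,\,m}^{\,o_m} \left( \sum_{k=1}^{K_m-n_m+1} \omega_{km} \right)^{1-o_m}
\]
Here $N$ denotes the total number of nest-days under monitoring, and $O$ represents the
number of successful nests.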
Solving the equation
\[
\frac{d\, l(s \mid \vec{o}, \vec{n}, \vec{K}, \vec{\omega})}{ds} = 0
\]
we get the maximum likelihood estimator $\hat{s}$ of $s$:
\[
\hat{s} = \frac{N + O - M}{N}
\]
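The intermediate step can be spelled out as follows. Only the first two terms of the log-likelihood depend on $s$, because $g$ involves the encounter probabilities alone, so setting the derivative to zero gives:
\[
\frac{dl}{ds} = \frac{N + O - M}{s} - \frac{M - O}{1 - s} = 0
\;\Longrightarrow\;
(N + O - M)(1 - s) = (M - O)\, s
\;\Longrightarrow\;
\hat{s} = \frac{N + O - M}{N}
\]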
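Furthermore, the asymptotic variance can be estimated by
\[
\mathrm{Var}(\hat{s}) = \frac{-1}{E\left[ \dfrac{d^{2} l(s \mid \vec{o}, \vec{n}, \vec{K}, \vec{\omega})}{ds^{2}} \right]}, \quad \text{when } O < M
\]
Hence,
\[
\mathrm{Var}(\hat{s}) = \frac{\hat{s}(1 - \hat{s})}{N}, \quad (\hat{s} \neq 1)
\]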
For $O = M$, the estimate $\hat{s} = 1$ lies on the boundary, and an approximate confidence
interval is instead obtained from the minimum value of $s$ whose 95% confidence
interval includes 1. For further extensions on multi-way comparisons and generalized linear
regression models, please refer to [Aebischer, 1999].
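As a numerical illustration of the estimator and its variance, the following sketch applies the two formulas to a small invented data set. The function name `mayfield_estimate`, the data values and the 12-day period are assumptions made for the example; none of them come from the CSL data.

```python
def mayfield_estimate(observed_days, outcomes):
    """Mayfield maximum likelihood estimate of the daily survival rate.

    observed_days[m] is n_m, the number of days nest m was monitored;
    outcomes[m] is o_m (1 = success, 0 = failure).
    Returns (s_hat, variance); variance is None when every nest succeeded
    (O = M), because the asymptotic formula does not apply at s_hat = 1.
    """
    if len(observed_days) != len(outcomes) or not outcomes:
        raise ValueError("need one outcome per nest and at least one nest")
    M = len(outcomes)            # number of nests
    N = sum(observed_days)       # total nest-days under monitoring
    O = sum(outcomes)            # number of successful nests
    s_hat = (N + O - M) / N
    variance = None if O == M else s_hat * (1.0 - s_hat) / N
    return s_hat, variance


# Invented data: five nests, two of which succeeded.
observed_days = [24, 2, 5, 3, 24]
outcomes = [1, 0, 0, 0, 1]
s_hat, variance = mayfield_estimate(observed_days, outcomes)

# Under the constant-rate assumption, surviving a whole period of K days
# has probability s**K; K = 12 is an assumed period length for illustration.
period_survival = s_hat ** 12
print(round(s_hat, 4), round(variance, 6), round(period_survival, 4))
```

The last line shows why the constant daily rate matters: any period survival probability derived this way inherits that assumption.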
Although it is hinted in [Mayfield, 1975] that this method can be generalized to calculate
the survival rate of eggs or nestlings, the differences between chicks and nests can cause
problems if it is transferred directly. For instance, the ages of nests are usually
left-truncated, right-truncated, or even double-interval truncated, because nests are usually
not found until incubation starts, implying that the building period of the nest is unknown.
Worse still, if the nest is still active when the observation stops, the
finishing age of the nest is missing too. The situation is different for
chicks: their ages are derived from their physical mass, since they are weighed
when they are first found.
It has also been noted ([He, 2003]) that the Mayfield methodology may give heavily biased
estimates of the daily survival rate. This is mainly because some of the assumptions
are rarely satisfied in practice. As a result, uncertainties in the fates of nests are
difficult to model in the Mayfield formula, which to some extent reflects the general
disadvantages of the frequentist perspective. That is why Bayesian methods are needed.
1.4 Bayesian Method in A Relevant Area
The idea of applying Bayesian methods to the analysis of the yellowhammer survival rate
was prompted by a high-profile paper [He, 2003] by Dr. Zhechong He, University of
Missouri-Columbia, US. Bayesian inference was used to model the age-specific survival
rate of bird nests with double-interval censoring of the active nest lifetime. Dr. He’s
method has been developed further in a new paper [Cao and
He, 2005].
In that paper, data on 36 active northern bobwhite nests were used to demonstrate
the Bayesian method. The nesting period was assumed to be the same for every nest, 25 days.
Nest conditions were grouped into different categories at each visit:
(1) destroyed or abandoned nests were considered failed
(2) nests with at least one egg hatched or one chick fledged inside were assumed successful
(3) other nests were called active nests
The uncertainty involved here included data truncation. Nests were usually visited not
daily but at intervals of 2-4 days to avoid disturbance, as was also the case
in the CSL experiments. Moreover, nests were usually not discovered until
breeding had begun, so the encounter age of an individual nest was unknown.
By contrast, ornithologists at CSL weighed each chick to determine its age in
order to reduce the uncertainty. This means more information is available for our project.
Table 1.1 shows the data format from the paper [He, 2003]:
Table 1.1: Bobwhite Data-Knox County, 1991
           Encounter age        Observed days
Outcome    Minimum   Maximum    Minimum   Maximum
0          1         23         2         2
1          1         1          24        24
0          1         22         3         5
0          1         22         3         5
1          1         1          24        24
From Table 1.1, we can see that if a nest failed, the encounter age was taken to lie
within a range restricted by the observed days. On the other hand, if it was successful,
the encounter age was derived from the observed days under the assumption that the life
length of every nest is the same. This is why the sum of the minimum of the observed
days and the maximum of the encounter age is always 25. Obviously, in the
real world, the total number of nesting days is not fixed but may vary within a small
range. For example, the incubation and the nestling periods of the yellowhammer
each last 10-15 days. Therefore, the fledging age of the yellowhammer chicks has
to be assumed equal if we would like to use the same model. After discussion with
the ecologists at CSL, this hypothesis was considered inappropriate for our project.
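The fixed-length assumption behind Table 1.1 can be checked directly on the five rows shown there. The sketch below only restates the table; the variable names are illustrative.

```python
# Rows of Table 1.1 as
# (outcome, min encounter age, max encounter age, min observed days, max observed days).
rows = [
    (0, 1, 23, 2, 2),
    (1, 1, 1, 24, 24),
    (0, 1, 22, 3, 5),
    (0, 1, 22, 3, 5),
    (1, 1, 1, 24, 24),
]

NESTING_DAYS = 25  # nest life length assumed equal for all nests

# Minimum observed days plus maximum encounter age, row by row.
totals = [min_obs + max_enc for (_, _, max_enc, min_obs, _) in rows]

# For successful nests the encounter age is known exactly (minimum == maximum).
exact_for_success = all(
    min_enc == max_enc for (outcome, min_enc, max_enc, _, _) in rows if outcome == 1
)
print(totals, exact_for_success)
```

If the nesting period were allowed to vary, as it does for the yellowhammer, this sum would no longer be constant and the encounter ages could not be recovered in this way.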
It is worth mentioning that the results given by the model of [He, 2003] were shown,
using simulated data with known values, to be better than those from the classical Mayfield
method. This supports the advantages of Bayesian methods.
1.5 Motivations & Difficulties
Although the project brief only emphasized modelling the relationship
between insecticide applications and nestling survival, scientists at CSL are
also very interested in Bayesian results for the survival probability over time, because
the ‘Mayfield’ results rest on the assumption that the daily hazard rate is
constant, which in some sense implies an unnatural restriction. Modelling
different survival properties over time by applying Bayesian ideas has therefore motivated
the author to seek solutions.
Several difficulties arising from the missing data and the relevant Bayesian methods
have been overcome. Because nests were visited irregularly, much information was missing.
As mentioned above, daily survival analysis requires the exact time of each death, usually
measured in days. However, this information is unavailable if some
days were skipped between the last visit at which the chick was still alive and the visit
at which it was found dead.
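The missing-data problem can be made concrete with a small sketch that turns a chick's visit record into the interval known to contain its day of death. The record format (a list of day and alive/dead pairs) and the function names are hypothetical, not the format of the CSL data.

```python
def death_interval(visits):
    """Return the interval known to contain a chick's day of death.

    visits is a list of (day, alive) pairs. The result is a pair
    (last_alive_day, found_dead_day): death happened after last_alive_day
    and no later than found_dead_day. last_alive_day is None if the chick
    was never seen alive. Returns None if the chick was never found dead.
    """
    last_alive = None
    for day, alive in sorted(visits):
        if alive:
            last_alive = day
        else:
            return (last_alive, day)
    return None


def exact_death_day(visits):
    """Return the day of death if the record pins it down exactly, else None."""
    interval = death_interval(visits)
    if interval is None or interval[0] is None:
        return None
    last_alive, found_dead = interval
    return found_dead if found_dead - last_alive == 1 else None


# Hypothetical record: alive on days 3 and 6, found dead on day 9.
visits = [(3, True), (6, True), (9, False)]
print(death_interval(visits), exact_death_day(visits))
```

With visits every 2-4 days, most deaths fall into intervals of this kind rather than on a known day.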
At the beginning of the student placement, it was suggested that exactly the same Bayesian
models as in [He, 2003] be transferred to the project. However, after repeated
comparisons of He’s model with the problems we needed to solve, several fundamental
differences were found, preventing us from following in He’s footsteps. Moreover, because very
complex computations, several new theorems and a distinct data form (Table 1.1) were
used in [He, 2003], nearly one month was spent investigating the feasibility of applying
those methods to the CSL data. The author of that paper, Dr. Zhechong He, and her student,
Jing Cao, who has made some new developments, were contacted, and after a considerable delay
their feedback also suggested that models using different methods
should be built. The integrals that must be solved to implement their method
are complex and non-standard, and would have taken too long to re-implement. Dr. He
was unable to make the software available at this time, although a commitment to do so
was made in [He, 2003]; it is to be hoped this will eventually be done. I am grateful for
their feedback and advice.
1.6 Summary
The information provided in this chapter will be used throughout the paper.
Firstly, an overview of the ecological problem is provided. The lack of evidence on the
indirect effects of pesticides on bird species other than the grey partridge (Perdix perdix)
has motivated ecologists to search for new evidence on other species, such as the
yellowhammer, on which the data analysis in the project is based.
Secondly, important concepts and detailed descriptions of the experiments on the
yellowhammer are presented. Some relevant statistical methods are briefly introduced.
Although they are not used analytically in this project, they have played a very important
role in helping the author understand the project and bio-statistical modelling.
Finally, the motivations and difficulties of the project are mentioned.
Chapter 2
Bayesian Approaches
The core issues of the Bayesian analysis related to the project are introduced in this
chapter. In probability theory, both the continuous and the discrete case have to be
discussed. However, because daily collected survival data are in practice treated as a
continuous series in bird studies, Bayesian ideas will be illustrated in the continuous
setting. The discussions presented in this chapter are mainly based on two books: [Lee,
2004] and [Spiegelhalter, D. J. et al, 2004].
2.1 Background
Traditional probability theory is based on the long-run properties of random events.
This is referred to as the frequency interpretation of probability. Consequently,
the standard statistical methods were developed from the frequentist perspective. For
example, “although there is still no universal agreements on the definition of probability,
most people might agree the following statement: Probability represents the times of the
occurring of a random event θ in the repeated trials after a ‘long time’. ”—[Spiegelhalter,
D. J. et al, 2004] We may use a mathematical notation P (θ) to represent Probability. This
definition describes the frequency with which specific events take place.
However, in the last two decades, more and more statisticians have started interpreting
probability theory from another perspective. “As [Lindley, 2000] addressed, we do not
need to assume that the rules of probability are objective, but we can derive it from our
individual feelings of the uncertainty. For example, a gambling game in Macao is carried
out like this: two dice are covered instantaneously outside of the gamblers’ sight after being
tossed up to the sky. Then the players are asked to guess the different combinations of
the dice. The fact that the event has happened means the participants bet their fortunes
only depending on their subjective feelings, i.e.personal probabilities.”—[Spiegelhalter, D.
J. et al, 2004] Such a situation calls for a subjective interpretation of probability. The
key point is that your probability for an event represents your relationship to that event,
and has nothing to do with any objective property of the event itself.
Bayesian statistical methods are based on the subjective view of probability introduced
above. This kind of thinking in some sense reflects the dynamic, natural process by which
human beings learn and accumulate knowledge. Bayesian inference has found more and more
applications because it is a comprehensive and robust method. In the past, the properties
required of estimators usually included only unbiasedness, minimum variance, efficiency,
consistency, and sufficiency. In recent years, special attention has been given to a
statistical property called robustness. An estimator is said to be robust if the results
given by a statistical model remain stable after new samples are pooled in. Traditional
methods are known to be weak in robustness.
The Bayesian approach began life with a simple theorem, named Bayes’ Theorem after
Thomas Bayes, a nonconformist minister from the small English town of Tunbridge Wells.
It first appeared in one of his posthumous publications in 1763. The simple equation form
of the theorem is taught in a first course in mathematical statistics; in practice,
however, a form for general quantities is preferred.
2.2 Bayes’ Theorem for General Quantities
Firstly, suppose that we are interested in a vector of unknown parameters
\[ \Theta = (\theta_1, \theta_2, \ldots), \quad \Theta \in \mathbb{R}^{P} \]
where the superscript P is the dimension of Θ. We might have some personal beliefs or
guesses about the probability density function (pdf) of Θ before we see the data:
\[ P(\Theta) \]
This is given the statistical term prior distribution. The prior is the statistician’s
hypothesis about the unidentified quantities before the data were available. Then a
sequence of data is observed:
\[ B = (B_1, B_2, \ldots) \]
A conditional pdf can be used to express how likely the observed values of B are, given
the current values of Θ:
\[ P(B \mid \Theta) \]
This is usually referred to as the likelihood function of Θ. The next step is to update the
prior by the likelihood. Recall the rule for conditional distributions in probability
theory:
\[ P(\Theta \mid B)\,P(B) = P(\Theta, B) = P(B \mid \Theta)\,P(\Theta) \]
Therefore,
\[ P(\Theta \mid B) = \frac{P(B \mid \Theta)\,P(\Theta)}{P(B)} \tag{2.1} \]
which is the simplest form of Bayes’ theorem. Note that P(B) is just a normalizing
denominator that ensures \(\int P(\Theta \mid B)\,d\Theta = 1\), and its value is usually
of no concern unless alternative models are compared. Thus, a proportional form that keeps
only the terms involving Θ is commonly used in practice:
\[ P(\Theta \mid B) \propto P(B \mid \Theta)\,P(\Theta) \tag{2.2} \]
It is a significant step that the distribution of the parameters of interest has been
modified in the light of the data. The pdf on the left-hand side of the proportion is called
the posterior distribution; it reflects how the data change our initial state of knowledge.
When we obtain new data, the current posterior distribution becomes the prior for the new
sample. Our knowledge of the parameters increases in this process and can always be
revised, because the parameters are treated not as constants but as random variables.
In theory, statistical inferences can be derived from the entire posterior distribution.
However, the posterior encapsulates all the information in a very compact form, and a
bare mathematical formula does not convey that information in a useful way.
Therefore, a range of helpful summaries should be supplied. The next section discusses
in detail how to make inferences.
2.3 Bayesian Inference
In statistics, a main task is to summarize the mechanisms operating behind the observed
data. The process of making such statements about a given physical system is called
inference.
Bayesian inference allows us to combine the substantive information we have with our
personal beliefs about the unknown parameters, and so leads to the kind of conclusions
that we might naively have expected to get from statistics. For instance, regarding
the traditional 95% confidence interval, “all we can say is that if we carry out similar
procedures time after time then the unknown parameters will lie in the confidence intervals
we construct 95% of the time. It appears that the books we look at are not answering
the questions that naturally occur to a reader, and that instead they answer some rather
recondite questions which no-one is likely to want to ask.”—[Lee, 2004] While on the
contrary, in Bayesian analysis, “the 95% Highest Posterior Density Interval (HPDI ) really
does mean an interval in which the statistician is justified in thinking that there is a 95%
probability of finding the unknown parameter.”—[Lee, 2004] This interpretation is more
natural and easier to work with.
2.3.1 Analysis of Binary Data
Although the likelihood of binary data is defined on a discrete sample space, the parameter
θ of interest is usually given a continuous prior distribution. The simplest assumption is
that θ follows a uniform distribution, so that P(θ) = 1 for 0 ≤ θ ≤ 1. Using the
proportional form of Bayes’ Theorem (2.2),
\[
\begin{aligned}
P(\theta \mid B) &\propto \theta^{k}(1-\theta)^{n-k} \times 1 \\
&= \theta^{(k+1)-1}(1-\theta)^{(n-k+1)-1} \\
&\propto \mathrm{Beta}(k+1,\; n-k+1)
\end{aligned}
\]
where k is the number of events that occurred and n is the total number of trials. The
posterior therefore takes the form of a Beta distribution. “It immediately suggests that
we can summarize the posterior distribution in terms of mean and variance of Beta
distribution, and make probability statements based on what we know about it (for example,
many common statistical
packages will calculate tail area probabilities for the Beta distribution).”—[Spiegelhalter,
D. J. et al, 2004]
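Instead of a uniform prior, Beta(a, b) can be used as a prior and the following analysis
can be obtained
\[
\begin{aligned}
\text{Prior} &\propto \theta^{a-1}(1-\theta)^{b-1} \\
\text{Likelihood} &\propto \theta^{k}(1-\theta)^{n-k} \\
\text{Posterior} &\propto \theta^{a-1}(1-\theta)^{b-1}\,\theta^{k}(1-\theta)^{n-k} \\
&\propto \mathrm{Beta}(a+k,\; b+n-k)
\end{aligned}
\]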
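The conjugate update above can be checked with a few lines of Python. This is only a minimal sketch: the function names and the data (7 events in 10 trials, split into two batches) are hypothetical and do not come from the project. It also illustrates the earlier remark that a posterior can serve as the prior for the next sample.

```python
def beta_update(a, b, k, n):
    """Beta(a, b) prior plus k events in n trials -> Beta(a + k, b + n - k)."""
    return a + k, b + (n - k)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


# Hypothetical data: 7 events in 10 trials, uniform prior Beta(1, 1).
posterior = beta_update(1, 1, k=7, n=10)

# The same data in two batches (4 of 6, then 3 of 4): the first posterior
# is used as the prior for the second batch.
after_first = beta_update(1, 1, k=4, n=6)
after_second = beta_update(*after_first, k=3, n=4)

print(posterior, after_second, beta_mean(*posterior), beta_variance(*posterior))
```

With a uniform prior the result agrees with the Beta(k + 1, n − k + 1) form derived earlier, and the batch-wise update gives the same parameters as the single update.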
The prior is said to be conjugate to the likelihood if the prior and posterior are from the
same family of distributions. This has the advantage that prior parameters can usually be
interpreted as a prior sample. Some common conjugate families are shown in Table 2.1:
Table 2.1: Conjugate Distributions Table
Prior       Likelihood    Posterior
Normal      Normal        Normal
Beta        Binomial      Beta
Gamma       Poisson       Gamma
Dirichlet   Multinomial   Dirichlet
However, conjugate priors do not exist for every likelihood, and they can be restrictive.
Hence, a uniform distribution or a normal distribution with large variance is usually used
as the prior, especially when little information about the shape of the likelihood is
available.
2.3.2 Inference with Normal Distribution
In many circumstances, it is appropriate to suppose that the likelihood follows a normal
distribution, which is also the case in the project. From the conjugate family, the prior
distribution has the form
\[ P(\theta) = N\!\left(\theta \,\middle|\, \mu, \frac{\sigma^2}{n}\right) \tag{2.3} \]
where µ is the prior mean, σ is the standard deviation of the likelihood, and n is the
implicit sample size of the prior. As n → 0, the variance becomes very large and the
distribution becomes very ‘flat’. In the limit, the prior is essentially a uniform
distribution over (−∞,∞), which is known as a non-informative distribution.
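Assume a normal prior
\[ \theta \sim N\!\left(\mu, \frac{\sigma^2}{n}\right) \tag{2.4} \]
and the likelihood
\[ B \sim N\!\left(\theta, \frac{\sigma^2}{l}\right) \tag{2.5} \]
where l is the sample size of the real data set. Applying Bayes’ theorem
\[ P(\theta \mid B) \propto P(\theta)\,P(B \mid \theta) \tag{2.6} \]
\[ \propto \exp\!\left(-\frac{l(B-\theta)^2}{2\sigma^2}\right) \times \exp\!\left(-\frac{n(\mu-\theta)^2}{2\sigma^2}\right) \tag{2.7} \]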
The terms without θ can be ignored in the proportional form of Bayes’ theorem. After
rearranging the terms in θ, the posterior distribution is obtained:
\[ \theta \mid B \sim N\!\left(\theta \,\middle|\, \frac{n\mu + lB}{n + l},\; \frac{\sigma^2}{n + l}\right) \tag{2.8} \]
The expression (2.8) is important: “It says that the posterior mean (nµ + lB)/(n + l) is
weighted average of the prior mean µ and the likelihood B, weighted by their precisions,
and therefore
is always a compromise between the two. The posterior variance (1/precision) is based on
an implicit size equivalent to the sum of the prior ‘sample size’ n and the sample size of
the data l: thus, when combining sources of evidence from the prior and the likelihood,
we add precisions and hence always decrease the uncertainty.”—[Spiegelhalter, D. J. et al,
2004]
If n tends to 0, the prior approaches a uniform distribution and the posterior distribution
is totally dominated by the data, i.e. the posterior has the same shape as the
likelihood.
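The posterior in (2.8) is easy to evaluate directly. The sketch below assumes the notation of this section (prior mean µ with prior ‘sample size’ n, data summary B with sample size l, common standard deviation σ); the function name and all numerical values are hypothetical. A very small n is used to mimic the flat-prior limit.

```python
def normal_posterior(mu, n, B, l, sigma):
    """Posterior mean and variance of theta as given by (2.8)."""
    mean = (n * mu + l * B) / (n + l)   # precision-weighted average
    variance = sigma ** 2 / (n + l)     # precisions add, so variance shrinks
    return mean, variance


# Hypothetical values: prior worth 2 observations centred at 0,
# data worth 8 observations centred at 1, sigma = 2.
mean, variance = normal_posterior(mu=0.0, n=2, B=1.0, l=8, sigma=2.0)

# Nearly flat prior (n close to 0): the posterior is dominated by the data.
flat_mean, flat_variance = normal_posterior(mu=0.0, n=1e-9, B=1.0, l=8, sigma=2.0)

print(mean, variance, flat_mean, flat_variance)
```

The posterior mean lies between the prior mean and B, and the posterior variance is smaller than either σ²/n or σ²/l, in line with the quoted interpretation.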
2.3.3 Point Estimation & Interval Estimation
Posterior summaries, such as point estimates and interval estimates, are generally
needed. In the project, summary statistics are used constantly. Therefore,
some specific terms in Bayesian inference must be introduced.
Point Estimates The mean, median and mode derived from the posterior distribution
are given the same interpretations as in classical frequentist theory. It is usually
preferable to report all three measures of location. In the project, the median is
suggested for plotting the exact model fit, in order to avoid the influence of the
tails of skewed distributions, whereas the mean is preferred for plotting the trend.
Interval estimates In Bayesian inference, interval estimates are usually termed credible
intervals or posterior intervals to distinguish them from the traditional confidence
interval. Three types of interval exist in the Bayesian literature, of which the Highest
Posterior Density Interval (HPDI or HPD) is the most commonly used.
Suppose a continuous parameter θ ∈ (−∞, +∞) and a posterior conditional on
generic data B.
One-sided “A one sided upper 95% interval would be expressed as (θl, +∞), where
P (θ < θl|B) = 0.05.”
Two-sided (equal tail area) “A two-sided 95% interval with equal probability in
each tail area would comprise (θl, θu), where P (θ < θl|B) = 0.025, and P (θ >
θu|B) = 0.025.”
Highest Posterior Density “If the posterior distribution is skewed, then a two-
sided interval with equal tail areas will generally contain some parameter values
that have lower posterior probability than values outside the interval. An HPDI
does not have this property - it is adjusted so that the probability ordinates at
each end of the interval are identical, and hence it is also the narrowest possible
interval containing the required probability. ”
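The difference between the equal-tail interval and the HPDI can be illustrated on simulated draws from a skewed distribution. The sketch below is not the procedure used by WinBUGS; it is a simple sample-based approximation that assumes a unimodal posterior, and the Beta(2, 8) draws and function names are hypothetical.

```python
import random


def equal_tail_interval(samples, prob=0.95):
    """Interval that leaves the same number of draws in each tail."""
    s = sorted(samples)
    inside = int(round(prob * len(s)))   # draws kept inside the interval
    cut = (len(s) - inside) // 2         # draws dropped from the lower tail
    return s[cut], s[cut + inside - 1]


def hpd_interval(samples, prob=0.95):
    """Narrowest window of sorted draws holding `prob` of them.

    For a unimodal posterior this approximates the HPDI.
    """
    s = sorted(samples)
    inside = int(round(prob * len(s)))
    start = min(range(len(s) - inside + 1),
                key=lambda i: s[i + inside - 1] - s[i])
    return s[start], s[start + inside - 1]


random.seed(1)
draws = [random.betavariate(2, 8) for _ in range(20000)]  # right-skewed

et = equal_tail_interval(draws)
hpd = hpd_interval(draws)
print("equal-tail:", et, "HPD:", hpd)
```

For a right-skewed sample such as this one, the HPD window is shifted towards the mode and is never wider than the equal-tail window, which matches the quoted description.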
2.4 Computational Methods
Although the theory of Bayesian inference is well developed, the required integrals are
complex and high dimensional and can hardly ever be solved analytically, which restricted
the application of the method for the past two hundred years. Only in the last two
decades has a computer-based method, Markov Chain Monte Carlo (MCMC), provided
an alternative way to solve the problem. Because MCMC is built into WinBUGS and used by
default, and because building proper statistical models is much more important and
meaningful for this project, only a short introduction to this numerical method is
given.
2.4.1 The Problem
All the mathematical derivations presented here are mainly based on the tutorial paper
[Brooks, 1998]. Recall that P(Θ) is the prior distribution and P(B|Θ) is the likelihood
function; the posterior is usually written in the proportional form
\[ P(\Theta \mid B) \propto P(B \mid \Theta)\,P(\Theta) \]
so the normalizing factor is given by
\[ \int P(B \mid \theta)\,P(\theta)\,d\theta \tag{2.9} \]
Suppose now that Θ = (θ1, θ2). The marginal posterior distribution might be of interest:
\[ P(\theta_1 \mid B) = \int P(\theta_1, \theta_2 \mid B)\,d\theta_2 \tag{2.10} \]
More generally, any summary of the posterior distribution, for example moments, quantiles
or the HPDI, is legitimate for Bayesian inference. All these features can be related to
the posterior expectation of functions of θ. For instance, the posterior expectation of a
function g(θ) is
\[ E[g(\theta) \mid B] = \int g(\theta)\,P(\theta \mid B)\,d\theta \tag{2.11} \]
Whether the quantity of interest is the constant of proportionality (2.9), the marginal
density (2.10), or the posterior expectation (2.11), everything depends on the ability to
solve integrals that are often complex and high dimensional, and analytical solutions are
extremely hard to obtain. MCMC instead simulates draws from the posterior density, which
largely overcomes the computational difficulty. This breakthrough in numerical methods has
accelerated the boom in Bayesian applications.
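For a single parameter, the integrals (2.9) and (2.11) can still be approximated on a grid, which makes it clear what MCMC has to replace in higher dimensions. The sketch below uses a hypothetical binomial example (7 events in 10 trials, uniform prior) and a plain midpoint rule; the names are illustrative only.

```python
def unnormalized_posterior(theta, k=7, n=10):
    """Likelihood times a uniform prior: theta^k (1 - theta)^(n - k)."""
    return theta ** k * (1.0 - theta) ** (n - k)


def integrate(f, a=0.0, b=1.0, steps=10000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# Normalizing factor, as in (2.9).
norm_const = integrate(unnormalized_posterior)

# Posterior expectation of g(theta) = theta, as in (2.11).
post_mean = integrate(lambda t: t * unnormalized_posterior(t)) / norm_const

print(norm_const, post_mean)
```

The grid needs 10,000 evaluations for one parameter; the same resolution in P dimensions would need 10,000 to the power P evaluations, which is why this direct approach does not scale to the models in the project.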
2.4.2 Markov Chain Monte Carlo
The idea of MCMC sampling is simple. There is a detailed description in [Brooks,
1998]: “Suppose we have some distribution P (θ), θ ∈ S ⊆ R^P , which is known as only
up to some multiplicative constant. We commonly refer to this as the target distribution.
If P is sufficiently complex that we cannot sample from it directly, an indirect method
for obtaining samples from P is to construct an aperiodic and irreducible Markov Chain
with state space S, and whose stationary (or invariant) distribution is P (θ), as discussed
in Smith and Roberts (1993), for example. Then, if we run the chain for sufficiently long,
simulated values from the chain can be treated as a dependent sample from the target
distribution and used as a basis for summarizing important features of P . ”
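The relationship between the MCMC approach and the usual Markov Chain theory is
shown in Figure 2.1. In common Markov Chain theory the transition distribution is given
and the stationary distribution is sought. In MCMC the roles are reversed: the stationary
distribution is known (it is the target distribution), and a transition distribution that
produces it has to be constructed.
[Diagram: Transition Distribution and Stationary Distribution, under Common Markov Chain Theory and under Markov Chain Monte Carlo]
Figure 2.1: The comparison between the MCMC and the common Markov Chain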
For convenience, and for consistency with [Brooks, 1998], the Markov chain transition
distribution is denoted by K, since it is more often referred to as the Markov chain
transition kernel. The conditional distribution of the current state θ^(current) given the
previous state θ^(previous) is therefore K(θ^(previous), θ^(current)).
“The main theorem underpinning the MCMC method is that any chain which is ir-
reducible and aperiodic will have a unique stationary distribution, and that the t-step
transition kernel will ‘converge’ to that stationary distribution as t → ∞. Thus, to gen-
erate a chain with stationary distribution P , we need only to find transition kernels K
that satisfy these conditions and for which PK = P , i.e. K is such that, given an
observation θ(previous) ∼ P (θ(previous)), if θ(current) ∼ K(θ(previous), θ(current)), then
θ(current) ∼ P (θ(current)), also. ”—[Brooks, 1998]
In practice, different kernels are used to update distinct components of the chain,
rather than a single form of transition distribution. The most popular updating
algorithm is Gibbs sampling, which is a special case of Metropolis-Hastings
updating. The two can be combined to construct a Markov chain that performs
appropriately; the combination used in WinBUGS is called the Metropolis-within-Gibbs
algorithm. The details are beyond the scope of the paper. Detailed discussions of
these update schemes can be found in [Gilks and Wild, 1992], [Gelman and Rubin, 1992],
[Gilks, W. R. et al, 1996] and [Gamerman, 1997].
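To make the idea of constructing a chain with a given stationary distribution concrete, the following is a minimal random-walk Metropolis sketch. It is not the Metropolis-within-Gibbs scheme used by WinBUGS; the target (the unnormalized posterior for 7 events in 10 trials under a uniform prior), the step size, the burn-in length and all names are assumptions chosen for illustration.

```python
import math
import random


def log_target(theta, k=7, n=10):
    """Log of the unnormalized target theta^k (1 - theta)^(n - k) on (0, 1)."""
    if theta <= 0.0 or theta >= 1.0:
        return -math.inf
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)


def metropolis(n_iter, start=0.5, step=0.2, seed=0):
    """Random-walk Metropolis chain whose stationary distribution is the target."""
    rng = random.Random(seed)
    theta = start
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)          # symmetric proposal
        log_ratio = log_target(proposal) - log_target(theta)
        # Accept with probability min(1, target(proposal) / target(theta)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
        chain.append(theta)
    return chain


chain = metropolis(20000)
kept = chain[2000:]                 # discard an assumed burn-in period
post_mean = sum(kept) / len(kept)   # Monte Carlo estimate of E[theta | B]
print(post_mean)
```

Only ratios of the target are used, so the normalizing factor (2.9) is never needed. The dependent draws in `kept` can then be summarized exactly as described in Section 2.3.3.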
2.4.3 Introduction of WinBUGS
WinBUGS is free software for Bayesian analysis, especially for complex Bayesian models.
It was developed by David Spiegelhalter, Andrew Thomas, Nicky Best, and Dave Lunn, all
of whom work in the area of public health. The initial purpose of developing
WinBUGS was to use Bayesian approaches to solve problems in clinical trials and
health-care evaluation. Its strengths have been acknowledged by more and more statisticians
([Congdon, 2001];[Congdon, 2003];[Lee, 2004]). As a result, there are more attempts to use
WinBUGS in other research areas, such as economics, biology and ecology, which is
the case in this paper. The author was sent by CSL to take a short course on WinBUGS
at the Medical Research Council Biostatistics Unit, Cambridge, run by Dr. Spiegelhalter
and Dr. Best. The valuable expertise of Dr. Spiegelhalter has definitely accelerated the
process of modelling.
The main advantages of WinBUGS are that model specification is fairly straightforward
for anyone familiar with Bayesian analysis, and that suitable updating algorithms for the
MCMC method are chosen automatically to sample from the posterior distribution. The
disadvantages are that “it cannot be integrated with a traditional statistical
package for data manipulation, exploratory analysis and so forth since it is designed as a
‘stand-alone’ program”—[Spiegelhalter, D. J. et al, 2004] and the speed for the convergence
of MCMC sampling can be extremely slow if the data set is huge.
Some basic notation that has contributed to the models in the paper is shown
below: “
• <- represents logical dependence, e.g.
m <- a+b*x
• ∼ represents stochastic dependence, e.g.
r ~ dunif(a, b)
• Can use arrays and loops, e.g.
for (i in 1:n){
r[i] ~ dbin(p[i], n[i])
p[i] ~ dunif(0, 1)
}
• Some functions can appear on the left-hand-side of an expression, e.g.
logit(p[i])<- a+b*x[i]
log(m[i])<- c+d*y[i]
• The normal is parameterized in terms of its mean and precision = 1/variance = 1/sd^2
• Functions cannot be used as arguments in distributions
”—[Spiegelhalter and Best, 2005].
Some tools in WinBUGS are very helpful. For example, Figure 2.2 shows a posterior
density of an element of a random variable by using the Sample Monitoring tool.
[Plot: posterior distribution of an element of a random variable]
Figure 2.2: An example of using the Sample Monitoring tool
More applications will be described alongside the model discussions. Finally, it should be
mentioned that [Celeux, 2005] concluded that the DIC tool [Spiegelhalter, D. J. et al,
2002] in WinBUGS, which is well known for comparing models, is not naturally defined for
missing data models. Therefore, it cannot be applied in the project.
2.5 Summary
The comparison between the novel Bayesian perspective and the classical frequentist theory
is given as background, but not emphasized.
The proportional form of Bayes’ theorem is the backbone of the whole theory, and
is therefore introduced in detail. The Bayesian methods for inference are the most
important part of this chapter. The difficulties in computation are solved by
MCMC simulation, of which a brief introduction is presented. Finally, a short description
is given of WinBUGS, the software used throughout the project.
Chapter 3
Data Analysis and Model Handling
in WinBUGS
3.1 Formatting The Data
The quality and suitability of the statistical models are in some sense determined by the
way the data are handled. Moreover, data structures for Bayesian inference are
usually different from those for traditional statistical models. It is therefore important
to format the original data appropriately for each of the models built for the
project.
Originally, all the data about chicks were recorded in the ornithologist’s observation
diaries, i.e. field work diaries. Usually, the data on chick condition in a specific nest
were collected twice a week if the first recording was early in that week. However,
if a nest was visited on Thursday or Friday, the next visit was delayed until the following
Monday, and more information was lost as a result. The original data
form is shown in Figure 3.1:
Figure 3.1: Original data
The percentages of the 200m-radius foraging area around an individual nest sprayed with
insecticides were arranged into two groups according to timing: the proportion of the
200m-radius foraging area sprayed with insecticides ≤ 20 days before hatching, and the
proportion of the 200m-radius foraging area sprayed with insecticides > 20 days before
hatching. These data are shown in Figure 3.2.
Figure 3.2: Insecticide data
In [Boatman N. D. et al, 2004], it was mentioned that there was no significant
relationship between brood reduction and the proportion of the 200m-radius foraging area
sprayed with insecticides > 20 days before hatching, so those data (column “more20”,
Figure 3.2) are discarded in the current analysis. In the light of the expertise and
experience of bio-statisticians and ornithologists, the exact values of the proportion of
the 200m-radius foraging area sprayed with insecticides ≤ 20 days before hatching are
grouped into ≤ 25% and > 25% to simplify the calculation.
3.2 The Analysis of Failure Time Data
Survival analysis is usually regarded as part of the analysis of failure time data.
Generally speaking, the analysis of data in which the response of interest is the duration
until a single event occurs is called the analysis of failure time data. The events, such
as the death of a chick or the reappearance of a specific illness, are defined by temporary
or permanent changes of state. In our project, the deaths of chicks are obviously of the
latter kind. Such events are also generically called failures. More importantly,
“such data can be studied in terms of the distribution of waiting times or in terms
of the rate of change between states in given time intervals.”—[Congdon, 2001]
Therefore, the questions we would like to answer can be described within the framework
of the analysis of failure time data:
• the impact of the covariates (the proportion of the foraging area sprayed with
insecticides, blocks) on the survival rate or the length of survival
• the characteristics of the survival rate (the rate of no transition between states), for
example, how it changes with time spent in the current state
The whole theory can be found in the book [Kalbfleisch and Prentice, 2002].
Let T denote the random survival time of an individual chick from hatching, i.e. the age
of the chick, with probability density f(t). The probability that an individual chick
dies before time t from hatching is known as the failure function F(t), expressed as
\[ F(t) = P(T < t), \quad 0 < t < \infty \tag{3.1} \]
The two most useful functions are given as definitions:
Definition 3.2.1 The survival function S(t) is the probability that an individual chick
survives beyond time t.
\[ S(t) = P(T > t) = 1 - F(t), \quad 0 < t < \infty \tag{3.2} \]
Definition 3.2.2 The hazard rate is the chance of the death of an individual chick in the
interval (t, t + ∆t) given survival until t. Hence
\[ h(t) = \frac{f(t)}{S(t)} \tag{3.3} \]
If ∆t is very short, the function represents an instantaneous hazard rate. If ∆t represents
one day, the hazard rate is a daily hazard rate (dhr), also known as the daily mortality
rate (dmr). Therefore, the daily survival rate (dsr) is
\[ dsr = 1 - dhr \tag{3.4} \]
These abbreviations will be used throughout the paper for brevity. It also follows that
the cumulative hazard is
\[ H(t) = \int_0^t h(\nu)\,d\nu \tag{3.5} \]
Hence, the survival function (S) can be written as
\[ S(t) = e^{-H(t)} \tag{3.6} \]
The probability density can be written as
\[ f(t) = h(t)\,e^{-H(t)} \tag{3.7} \]
It must be mentioned that in the project, the survival function (S ) is known as the
nestling period survival probability or the single chick survival rate to days.
3.2.1 Failure Time Distributions
From the Bayesian perspective, a prior distribution for the daily hazard rate is needed
before the data are brought into the models. The simplest hypothesis is that the survival
time follows the exponential distribution. Consequently, the prior assumption on the
hazard rate is h(t) = µ, where µ is a constant. Even though it is assumed invariant in the
prior, it will be dominated and modified by the likelihood of the data. The survival
function and the density function of T therefore become, respectively,
\[ S(t) = e^{-\mu t} \tag{3.8} \]
\[ f(t) = \mu e^{-\mu t} \tag{3.9} \]
from (3.6) and (3.7). Recall that a constant daily hazard rate is also used by the
traditional Mayfield method. In the Bayesian setting, however, it is treated as a random
variable that can be modified by the data. The results, which will be discussed later,
show significant advantages of the Bayesian methods.
The posterior distribution of the daily hazard rate obtained from the exponential models
shows a slightly increasing trend. Therefore, the Weibull distribution is applied to set
up a more informative prior distribution for the survival time T.
The two-parameter Weibull distribution W(µ, γ) (µ, γ > 0) allows for a power dependence
of the hazard function on time, where µ and γ are the scale and shape parameters
respectively [Congdon, 2003]. The Weibull hazard is expressed as
\[ h(t) = \mu \gamma t^{\gamma - 1} \tag{3.10} \]
The survival function is
\[ S(t) = e^{-\mu t^{\gamma}} \tag{3.11} \]
The probability density function is
\[ f(t) = \mu \gamma t^{\gamma - 1} e^{-\mu t^{\gamma}} \tag{3.12} \]
The hazard is monotonically increasing for γ > 1 and decreasing for γ < 1. In the project,
an empirical value γ = 1.4 is used.
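The relations (3.5) to (3.7) and the Weibull forms (3.10) to (3.12) can be verified numerically. The sketch below uses γ = 1.4 as in the text, but the value µ = 0.01 and the function names are hypothetical choices made only for illustration.

```python
import math


def weibull_hazard(t, mu, gamma):
    """h(t) = mu * gamma * t^(gamma - 1), as in (3.10)."""
    return mu * gamma * t ** (gamma - 1.0)


def weibull_survival(t, mu, gamma):
    """S(t) = exp(-mu * t^gamma), as in (3.11)."""
    return math.exp(-mu * t ** gamma)


def weibull_density(t, mu, gamma):
    """f(t) = h(t) * S(t), as in (3.7) and (3.12)."""
    return weibull_hazard(t, mu, gamma) * weibull_survival(t, mu, gamma)


def cumulative_hazard(t, mu, gamma, steps=20000):
    """Midpoint-rule approximation of H(t), the integral of h from 0 to t (3.5)."""
    width = t / steps
    return width * sum(weibull_hazard((i + 0.5) * width, mu, gamma)
                       for i in range(steps))


mu, gamma = 0.01, 1.4   # mu is a hypothetical value; gamma = 1.4 as in the text

# S(t) = exp(-H(t)) should agree with the closed form at, say, day 13.
s_numeric = math.exp(-cumulative_hazard(13.0, mu, gamma))
s_closed = weibull_survival(13.0, mu, gamma)
print(s_numeric, s_closed, weibull_density(13.0, mu, gamma))
```

Setting γ = 1 in `weibull_hazard` returns the constant µ, so the exponential model of (3.8) and (3.9) is recovered as a special case.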
3.2.2 Regression Models
In order to model and determine the relationship between insecticide applications and
yellowhammer chick survival rate, regression models are necessary. The explanatory
variables are the proportion of the 200m-radius foraging area sprayed with insecticides
≤ 20 days before hatching, and the blocks, which are non-informative.
A vector of covariates x with two dimensions (treatments and blocks) is available for
each chick. In general, x may include both qualitative and quantitative variables.
Qualitative variables are adopted in the project, following the ecological expertise at
CSL. The first variable is the proportion of the 200m-radius foraging area sprayed with
insecticides before hatching, grouped into ≤ 25% of the foraging area and > 25% of the
foraging area, whilst the second variable is the block number (1, 2, 3, 4). It was also
reported in the previous paper [Boatman N. D. et al, 2004] that two blocks received normal
insecticide applications whereas the other two blocks received extra insecticide
applications. It might seem that this information could be used to set up two groups
forming an independent covariate. However, the actual procedure included a first spray in
late May at the normal farmland practice quantity and a second spray in late June, also at
the normal quantity. The invertebrate groups that are important in the diet of
yellowhammer chicks might have recovered [Boatman N. D. et al, 2004]. Hence, the block
summaries (normal, extra) are non-informative, i.e. it would be improper to mix them with
the information on the proportions and treat them as an independent regression variable.
The next step is to set up appropriate regression functions. Generally speaking, the
hazard rate depends on both time t and the explanatory variables x. Regression models
are therefore obtained by allowing the hazard rate to be a function of the covariates.
Let b = (b1, . . . , bk) denote the regression parameters, and let G be a specified
function. For the exponential distribution, the function can be written as
\[ h(t, x) = \mu\,G(bx) \]
where µ is a constant. The most natural form of G in survival analysis [Kalbfleisch and
Prentice, 2002] is
\[ G(y) = e^{y} \]
since it guarantees that G(bx) > 0 for all possible x. The model of the hazard rate
therefore becomes
\[ h(t, x) = \mu\,e^{bx} \tag{3.13} \]
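The effect of a covariate in (3.13) is multiplicative on the hazard, which a short calculation makes visible. In this sketch the baseline hazard, the coefficient and the single 0/1 covariate (an indicator for more than 25% of the foraging area sprayed) are hypothetical values, not estimates from the project, and the block covariate is left out for brevity.

```python
import math


def hazard(mu, b, x):
    """h(t, x) = mu * exp(b . x); constant in t for the exponential model."""
    return mu * math.exp(sum(bj * xj for bj, xj in zip(b, x)))


def survival(t, mu, b, x):
    """S(t | x) = exp(-h(t, x) * t) for a hazard that is constant in t."""
    return math.exp(-hazard(mu, b, x) * t)


mu = 0.02          # hypothetical baseline daily hazard
b = [0.5]          # hypothetical coefficient of the spray indicator
low_spray = [0]    # indicator = 0: at most 25% of the area sprayed
high_spray = [1]   # indicator = 1: more than 25% of the area sprayed

hazard_ratio = hazard(mu, b, high_spray) / hazard(mu, b, low_spray)
print(hazard_ratio, survival(13, mu, b, low_spray), survival(13, mu, b, high_spray))
```

The hazard ratio between the two groups equals exp(b) regardless of µ, and the exponential form keeps the hazard positive for any sign of b.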
In practice, a slight modification is made: to simplify the calculation, the regression
model is set up on the log survival time instead of the hazard rate. This is demonstrated
in the models for the project. In terms of the log survival time
\[ L = \ln t \tag{3.14} \]
the regression equation can be written as
\[ L = b_0 + bx \tag{3.15} \]
where the intercept b0 = − ln µ.
Similarly, the GLMs are built for the Weibull distribution. The conditional hazard is
\[ h(t, x) = \mu \gamma t^{\gamma - 1} e^{bx} \tag{3.16} \]
Alternatively, the linear regression on the log survival time is given as
\[ L = b_0 - bx \tag{3.17} \]
where b0 = − ln µ, and b = µ−1b. Note that the sign in front of the regression
coefficients differs between (3.15) and (3.17), which calls for caution when analyzing the
posterior summaries.
Finally, it should be mentioned that linear and logistic regressions will not work for
right-censored data [Vittinghoff, E. et al, 2005]. For example, logistic regression is
untenable because it requires observation periods of equal length, whereas right-censoring
of the data led to different lengths of monitoring time.
3.2.3 Survival Analysis in WinBUGS
In the model specification, a for-loop in which all the functions are specified goes
through all the individuals in the sample. For each stochastic node, the exponential
distribution is routinely implemented as
t[i] ∼ dexp(mu[i])
while the Weibull is
t[i] ∼ dweib(r, mu[i])
where i is the loop variable and r is the Weibull shape parameter. Then the daily mortality
rate, the daily survival rate, the nestling period survival probability and the probability
density of the chick survival time are built up in turn. Finally, the log of mu[i] is
written as a linear function of the covariates.
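The derived quantities listed above can be sketched in Python for the exponential case. The formulas are an assumption based on the standard exponential survival model (daily survival rate exp(−mu), survival function exp(−mu·t), density mu·exp(−mu·t)); the exact definitions used in the project's WinBUGS code are not shown here, and the function name is hypothetical.

```python
import math

def exponential_summaries(mu, t):
    """Summaries of an exponential survival model with constant hazard mu.

    Assumed definitions (standard results, not copied from the thesis code):
      dsr = probability of surviving one more day
      dmr = daily mortality rate, taken here as 1 - dsr
      S   = probability of surviving from hatching to age t
      pdf = probability density of the survival time at age t
    """
    dsr = math.exp(-mu)
    return {
        "dmr": 1.0 - dsr,
        "dsr": dsr,
        "S": math.exp(-mu * t),
        "pdf": mu * math.exp(-mu * t),
    }

# Illustrative hazard of 0.1 per day, evaluated at chick age 8.
summaries = exponential_summaries(mu=0.1, t=8.0)
```

Under these assumed definitions S equals dsr raised to the power t, and the ratio pdf/S recovers the hazard mu.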
The coefficients of the covariates were given, in turn, very vague, moderately non-
informative and informative priors. The reasons and discussion are given in the next
chapter. The main difficulty in the project is that right-censoring caused missing data. In
WinBUGS, missing data are treated as unknown parameters. Hence, the inferences are
based on the joint posterior distribution of the parameters and the missing data, given the
observed data and the prior distribution.
In WinBUGS, “Censoring is denoted using the notation I(lower, upper) e.g.
y ∼ ddist(θ)I(lower, upper)
would denote a quantity y from distribution dist with parameters θ, which had been
observed to lie between lower and upper.”—[User Manual of WinBUGS]
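To make the idea of a censored node concrete, the sketch below draws a value from an exponential distribution restricted to the interval (lower, upper) by inverting the survival function. It only illustrates the concept of sampling an unobserved time between known bounds; it is not WinBUGS's internal algorithm, and the function name and rate are hypothetical.

```python
import math
import random

def sample_censored_exponential(rate, lower, upper, rng):
    """Draw t ~ Exponential(rate) conditional on lower <= t <= upper.

    Uses the survival function S(t) = exp(-rate * t): a value s is drawn
    uniformly between S(upper) and S(lower) and mapped back to t.
    """
    s_upper = math.exp(-rate * upper)
    s_lower = math.exp(-rate * lower)
    s = s_upper + (s_lower - s_upper) * rng.random()
    t = -math.log(s) / rate
    # Guard against floating-point rounding at the interval ends.
    return min(max(t, lower), upper)

rng = random.Random(1)
# Illustrative rate; bounds as for a chick last seen alive at age 12, dead by 15.
draw = sample_censored_exponential(rate=0.1, lower=12, upper=15, rng=rng)
```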
In the project, y is the survival time t of the yellowhammer chicks, while lower and
upper are denoted by tmin and tmax respectively. Furthermore, starting values must be
given to the MCMC sampler for the stochastic node t. The two sets of initial values are
denoted as tinitial1 and tinitial2. Three representative types of input data are shown in
Table 3.1 to illustrate how the missing data problem is handled in the real models.
Table 3.1: Explanations of Input Data
t     tmin   tmax   tinitial1   tinitial2
NA    13     300    14          200
NA    13     300    14          200
NA    13     300    14          200
NA    13     300    14          200
NA    12     15     12          15
NA    8      11     8           11
NA    5      8      5           8
NA    1      3      1           3
7     0      300    NA          NA
7     0      300    NA          NA
The data of the four chicks in nest No.247 are shown in the first four rows. All of them
fledged at age 13 (i.e. on the 13th day, counting the hatch day). It is common for chicks
in a specific nest to fledge or to be predated on the same day. The death of each chick
must therefore have occurred between day 13 and an unknown later date, since the
ornithologists stopped monitoring once they had observed the outcome of the nest (i.e.
right-censoring occurred). In WinBUGS, tmax can be given an arbitrary number larger than
tmin, because fledged individuals only contribute to the denominator of the daily hazard
rate from hatching to fledging, and the period beyond the nestling duration is discarded
automatically if the study is restricted to the chick period from hatching to fledging. In
the models, a reasonable value, the average life length of a yellowhammer (about 300 days),
is given to tmax. Theoretically, the model can also be used to estimate the chick survival
rate within the 300-day average life length, although such estimates are not accurate.
From the 5th to the 8th row, information about four chicks in the nest No.250 is
provided. They all died from starvation, but none of the failure dates was known exactly.
Therefore uncertainty was involved. Since the death of an individual chick happened
between the last observation when it was still alive and the first observation when it was
found to have failed, the corresponding chick ages on these two days are filled into columns
tmin and tmax respectively.
In the two situations above, the exact age at failure was missing, so the values in the
column t must be denoted as NA. Furthermore, the most popular method of checking the
convergence of the MCMC simulation is to run two chains simultaneously with widely
differing starting values. For missing data models, these widely differing initial values
have to lie between the lower and upper bounds (Table 3.1).
Occasionally, the dates on which the chicks died were known exactly. This was the case
for the chicks in nest No.304, whose data fill the last two rows of Table 3.1. In such a
case, the missing data structure is ignored by WinBUGS. Hence, the exact chick survival
days are entered in the column t. Two arbitrary numbers have to be entered in the columns
tmin and tmax such that the exact t lies in the range. These two nodes are no longer
stochastic, so no MCMC is applied to them and their initial values are denoted as NA.
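The three cases described for Table 3.1 can be summarised in a short Python sketch that builds one input row per chick. The helper `make_row` and its argument names are hypothetical; the default upper bound of 300 and the second initial value of 200 for fledged chicks simply mirror the entries of Table 3.1.

```python
NA = None  # stands for the WinBUGS missing-value marker "NA"

def make_row(kind, lower=None, upper=None, exact=None, t_max=300):
    """Return (t, tmin, tmax, tinitial1, tinitial2) for one chick.

    kind = "fledged":  alive at age `lower`, death time unknown afterwards
    kind = "interval": died between ages `lower` and `upper`
    kind = "exact":    age at death `exact` is known
    """
    if kind == "fledged":
        # Initial values must lie inside (lower, t_max); 200 mirrors Table 3.1.
        return (NA, lower, t_max, lower + 1, 200)
    if kind == "interval":
        # The two chains start from the two ends of the interval.
        return (NA, lower, upper, lower, upper)
    if kind == "exact":
        # The node is observed, so no initial values are needed.
        return (exact, 0, t_max, NA, NA)
    raise ValueError("unknown kind: " + str(kind))

rows = [
    make_row("fledged", lower=13),
    make_row("interval", lower=12, upper=15),
    make_row("exact", exact=7),
]
```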
3.3 Models Considering Time Dependence
Sometimes it is of interest whether chicks monitored in May have a different daily
survival rate from those found in June. Therefore, two models (I and II) considering time
dependence are built. These models need some assumptions to deal with the missing data
problem, so they are not as appropriate as the failure time models introduced in the last
section. However, as models from a totally different perspective of survival modelling,
they deserve discussion.
Two assumptions are needed for model I:
• The total number of chicks on day t is known
• The number of chicks surviving from day t-1 is known
The idea is simple: the chicks surviving on day t were among the total number of chicks
on day t-1. Hence, a natural model is to assume that
$$N_{s,t} \sim \mathrm{Binomial}(dsr, N_{t-1})$$
where $N_{s,t}$ denotes the number of nestlings on day t surviving from day t-1, $N_{t-1}$
denotes the total number of chicks on day t-1, and dsr is the daily survival rate as usual.
Logistic regression is applied to include the covariate t:
$$\mathrm{logit}(dsr) = \alpha_0 + \alpha_1 t \qquad (3.18)$$
The regression parameters α0 and α1 are given very vague priors
αn ∼ Normal(0.0, 1.0E-6)
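The binomial model with the logit link of Equation 3.18 can be sketched as a log-likelihood in Python. The daily counts below are invented purely for illustration and are not the Castle Howard data; the function names are hypothetical.

```python
import math

def inv_logit(z):
    """Inverse of the logit link: maps a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(alpha0, alpha1, days, survivors, at_risk):
    """Sum of binomial log-probabilities with logit(dsr) = alpha0 + alpha1 * t."""
    total = 0.0
    for t, n_s, n_prev in zip(days, survivors, at_risk):
        dsr = inv_logit(alpha0 + alpha1 * t)
        total += (math.log(math.comb(n_prev, n_s))
                  + n_s * math.log(dsr)
                  + (n_prev - n_s) * math.log(1.0 - dsr))
    return total

# Made-up counts: chicks present on day t-1 and survivors on day t.
days = [1, 2, 3]
at_risk = [10, 9, 9]
survivors = [9, 9, 8]
ll_high = log_likelihood(2.0, 0.0, days, survivors, at_risk)
ll_low = log_likelihood(-2.0, 0.0, days, survivors, at_risk)
```

With these invented counts most chicks survive each day, so a positive intercept (high dsr) gives a larger log-likelihood than a negative one.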
The data used here are from block one, Castle Howard, 2002. Since not many yellowhammer
nestlings are found until after the middle of May, the 17-day period from 18th May to
3rd June is studied. Another advantage of using information from that period is that most
fields on block one received the insecticide treatment only once, on 20th May. Therefore,
any effect of the insecticide treatment should be reflected in the daily survival rate.
In this model, missing data have to be entered under some assumptions. One assumption
from [Johnson, 1979] is suggested by [He, C. Z. et al, 2001]: if days were skipped
between the last visit on which the chick was found alive and the first observation of its
death, the 40% point is assigned as the day of failure. However, after discussion with
ecologists, a modification is made: the day before the first observation of the failure is
taken to be the day of death. Another problem is that the sample size is comparatively
small, because the total number of chicks found on one block within a specific duration is
limited. Therefore, the results are comparatively heavily biased. In spite of this, the
model can still provide some useful information. The results are discussed in the next
chapter.
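The two rules for assigning a day of failure can be written as a small Python sketch. The function name, the rule labels and the example ages are hypothetical; the 40% rule is read here as 40% of the way from the last visit alive to the first observation of death.

```python
def assign_failure_day(last_alive, first_dead, rule="previous_day"):
    """Assign a day of death when visits were skipped.

    rule = "johnson40":    40% point of the interval between the two visits
    rule = "previous_day": the day before the death was first observed
                           (the modification adopted after discussion with ecologists)
    """
    if first_dead <= last_alive:
        raise ValueError("first_dead must be later than last_alive")
    if rule == "johnson40":
        return last_alive + 0.4 * (first_dead - last_alive)
    if rule == "previous_day":
        return first_dead - 1
    raise ValueError("unknown rule: " + str(rule))

# Illustrative case: last seen alive at age 8, found dead at age 11.
day_johnson = assign_failure_day(8, 11, rule="johnson40")
day_previous = assign_failure_day(8, 11, rule="previous_day")
```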
Model II, with a much more complex structure, is built to find not only the daily
hazard rate but also the probability that a nestling found alive on date ta was
subsequently found dead on a later day td. It is naturally assumed that, for each specific
ta, the numbers of chicks found alive on ta and subsequently found dead on td follow a
multinomial distribution with proportions $P_{t_a, t_d}$ denoting the probability mentioned
above. The regression equation is the same as equation 3.3.
Daily collected data are also required for this model. Because of the time limit of the
placement, real data are not available for the model. Therefore, only a simulated data
structure in the form of a 5 × 6 matrix is applied. The model is discussed in the next
chapter. For more models considering time dependence in animal conservation studies,
good references include [Brooks, S. P. et al, 2000], [Besbeas, 2002], [Brooks, S. P. et
al, 2004], and [King, R. et al, 2005].
3.4 Summary
This chapter introduces the core methods of data analysis involved in the project.
First we discuss how to arrange the data to make them more convenient for Bayesian
analysis. Then the core issues of model building are described in detail from different
aspects.
Failure time models have successfully solved the missing data problem, mainly because
uncertainty modelling is available from the Bayesian perspective. The main problem
with the data involved in the project is therefore largely solved. These models are
carefully built and are the most important models for the posterior inferences carried out
in the next chapter. Some very good estimates are given by these models. It is worth noting
that no approximations of the missing data are needed in these models, which highlights the
advantage of Bayesian approaches over the traditional Mayfield method.
Moreover, new models from a different logical aspect are built. They consider time
dependence. No specific functions in WinBUGS are needed for them. However, they are
not comprehensively developed because of the time limit of the student placement.
Chapter 4
Discussions on Failure Time Data Models
All the random variables are monitored in WinBUGS, and posterior summaries are obtained
for the variables and the regression coefficients. The discussion begins with the
exponential model. The comparison between the exponential and Weibull models is carried
out in the Weibull model part. The data from the experiment site, Castle Howard, 2002
(four blocks), are pooled. In the sample, the longest nestling period of an individual
chick is 15 days. Therefore, the study is restricted to a 15-day range. By default, the
burn-in and the follow-up run are 5000 and 20000 iterations respectively.
4.1 Survival Models Based on The Exponential Prior
It takes 132s to run the program on a 1.24 GHz personal computer. Moreover, some longer
sampling chains are also used to test the robustness of the model.
4.1.1 Models Based on Insecticide Quantities
In this project, the main goal is to find out whether there are indirect effects of
insecticides on the survival of the yellowhammer chicks. Therefore, this is modelled first
by combining the data on the proportions of the foraging area (200m-radius) sprayed with
insecticides ≤ 20 days before hatching with the data on chick survival. A negative
relationship between them is found. Figure 4.1 plots the posterior means (dots) of
the daily survival rate against the two treatment groups.
[Plot: daily survival rate against insecticide application group (1: <= 25% of the 200m-radius foraging area was sprayed; 2: > 25% of the 200m-radius foraging area was sprayed); posterior means of the monitored variable with a linear fit.]
Figure 4.1: The model fit of the daily survival rate over the different proportions of insecticides from the exponential model
Chicks in nests for which more than 25% of the foraging circle area was sprayed with
insecticides had a much lower daily survival rate than those in nests for which at most
25% of the foraging area was treated. The mean dsr of the former is less than 0.925,
whereas that of the latter is about 0.95. Furthermore, the analysis of the nestling period
survival rate (S) also shows a negative relationship, which is illustrated in Figure 4.2.
[Plot: single survival rate to days against insecticide application group (1: <= 25% of the 200m-radius foraging area was sprayed; 2: > 25% of the 200m-radius foraging area was sprayed); posterior means of the monitored variable with a linear fit.]
Figure 4.2: The model fit of the nestling period survival probability over different proportions of insecticides from the exponential model
4.1.2 Models Based on Time
The direct relationship between insecticides and nestling survival was established in the
last section. More detailed information, i.e. how the insecticides affect chick survival,
has to be obtained by modelling the variables of interest over time.
The model fit of the chick daily survival rate (dsr) against time t is shown in Figure
4.3. The posterior medians and the 95% Highest Posterior Density Intervals are plotted.
As we can see, the daily survival rate ranged from slightly below 0.80 to 1.00. The
model fit curve oscillates heavily before chick age 10. For the oscillation before chick
age 6, especially the steep drop around age 2, the most likely ecological explanation is
that yellowhammer chicks suffered from starvation caused by the reduction of arthropod
abundance after the insecticide applications [Boatman N. D. et al, 2004].
The steep drop around chick age 9 is possibly because the risk of predation increased
when the chicks became more vocal and the adult feeding visits to the nest became more
frequent after chick age 8. After about chick age 9, the curve becomes much smoother. It
declines slightly at first, reaches a local minimum at about chick age 12, increases again
to a local maximum after age 13 and finally goes down. In the sample, most chicks fledged
at about age 13, which can explain the relatively higher dsr at that time. In general, the
model fit plot implies that the likelihood of the data dominates the posterior
distribution. Evidently, noise in the sample has influenced the shape of the curve.
Therefore, it is better to refer to the plot of the trend.
[Plot: daily survival rate against survival time t from hatching to fledging (15 days); posterior median of each element of the monitored variable with 95% posterior interval.]
Figure 4.3: The model fit of the chick daily survival rate over time
The trend of the dsr is shown in Figure 4.4. A slightly declining trend of the daily
survival rate is observed, implying that the posterior result has changed the prior
opinion, because the prior distribution assumes a constant daily mortality rate. Therefore,
a more informative prior, the Weibull distribution, should lead to more realistic results.
[Plot: daily survival rate against survival time t from hatching to fledging (15 days); posterior means of the monitored variable with 95% posterior interval and trend.]
Figure 4.4: The trend of the chick daily survival rate over time
Another important summary is the nestling period survival probability (S), which is
also known as the single chick survival rate to days. It is the probability that a chick
survives to a certain future date given that it has survived the past days. The model fit
and the trend are shown in Figure 4.5 and Figure 4.6 respectively.
[Plot: the nestling period survival probability against survival time t from hatching to fledging (15 days); posterior median of each element of the monitored variable with 95% posterior interval.]
Figure 4.5: The model fit of the nestling period survival probability over time
[Plot: the nestling period survival probability against survival time t from hatching to fledging (15 days); posterior means of the monitored variable with 95% posterior interval and trend.]
Figure 4.6: The trend of the nestling period survival probability over time
The model fit shows an oscillating curve before chick age 10, consistent with the plot
of the daily survival rate. Therefore, similar ecological interpretations should be
applicable.
Meanwhile, the trend shows a smooth decline with two drops around nestling age
5 and age 12. These changes suggest slightly increased hazard rates around the two ages,
which is also consistent with the earlier results.
In bird brood performance studies, the probability density function (pdf) of the nestling
life time has not been emphasized. However, one advantage of this summary is that it
presents the information more visually, for example at which stage of the chick age a
single chick suffers a higher hazard rate. In other words, the corresponding ecological
explanations are better supported if this summary is provided. The model fit and the trend
of the pdf are shown in Figures 4.7 and 4.8.
[Plot: probability density function of survival time t against survival time from hatching to fledging (15 days); posterior median of each element of the monitored variable with 95% posterior interval.]
Figure 4.7: The model fit of the probability density of chick survival life time
[Plot: probability density function of survival time t against survival time from hatching to fledging (15 days); posterior means of the monitored variable with 95% posterior interval and trend.]
Figure 4.8: The trend of the probability density of chick survival life time
It can be seen that around age 2 and age 5, chicks were exposed to relatively high
hazards, confirming the earlier results for the dsr and S. Moreover, it also shows that
the death of a single chick most likely happened around chick age 2, which is a new
finding. It is known that the body mass of the newly hatched chicks in a specific nest is
usually uneven. Therefore, once food resources are scarce, the weak ones will probably die
from starvation. The new finding thus provides strong evidence that the rapid reduction of
the arthropod groups that comprised the main diet of the yellowhammer nestlings, caused
by insecticide applications, did influence the chick survival conditions, especially those
of the newly hatched chicks.
4.1.3 Models Based on Block Structure
Although it was shown that no significant difference between blocks was detected, detailed
estimates were not available. It may be of interest to obtain more information at the
block level. The model fit of the daily survival rate against blocks is shown in Figure
4.9. In the experiment, blocks 1 and 4 received the normal treatment twice, referred to
as ‘extra’, whereas blocks 2 and 3 received the normal treatment only once, referred
to as ‘normal’.
[Plot: daily survival rate against the four experiment blocks; posterior median of each element of the monitored variable with 95% posterior interval.]
Figure 4.9: The model fit of nestling daily survival rate against different blocks
In general, there was no significant difference in the dsr between the first three blocks.
However, the daily survival rate of chicks on the fourth block is much lower than on
the other three blocks. A check of the original data shows that a large number of chicks
found on that block died from starvation. Unfortunately, an ecological interpretation is
not available at present.
4.1.4 Analysis of The Posterior Summaries & Convergence of MCMC Sampling
Besides the discussion from the biometrics perspective, the mathematical analysis of the
posterior summaries obtained by MCMC simulation is also important. Table 4.1
provides various statistics of the posterior summaries of the regression coefficients.
Table 4.1: Posterior summaries from the Exponential Model
node    mean      sd      2.5%     median    97.5%
b0      -3.946    7.612   -14.26   -6.229    12.82
b1[1]   1.013     5.276   -9.556   1.949     8.918
b1[2]   1.756     5.278   -8.799   2.695     9.663
b2[1]   -0.6453   3.625   -8.066   -0.2109   5.795
b2[2]   -0.9388   3.645   -8.462   -0.5143   5.576
b2[3]   -0.8334   3.634   -8.293   -0.4005   5.622
b2[4]   0.6511    3.626   -6.743   1.079     7.095
We note that the posterior means of the slope parameter b1 are positive, implying
a positive relationship (Equation 3.15) between the daily hazard rate and the
insecticide treatments, which is consistent with the model fit plots. Moreover, the
posterior mean of b1[2] is larger than that of b1[1], again confirming the earlier
conclusion that the daily hazard was higher if a greater proportion of the foraging area
was sprayed. In general, no conclusion about the relationship between the hazard rate and
the block index can be derived, since b2 has both positive and negative means. However, the
posterior mean of b2[4] suggests that chicks found on block No.4 suffered a much higher
mortality rate than those on the other three blocks, confirming the model fit in Figure 4.9.
Checking the convergence of the chains for the monitored variables of interest is also
important. In practice, running two long chains with widely differing initial conditions
is the most popular approach. The convergence is said to have been achieved if the two
chains appear to be overlapping one another. Figure 4.10 shows a reasonable convergence
of the two chains for the tenth element of the stochastic node t. From the plot, we can
conclude that the model is appropriate as the convergence is reached rapidly.
[Plot: traces of the two chains, with the initial values for MCMC marked.]
Figure 4.10: The convergence of two chains run for t[10]
Once the convergence is satisfactory, the burn-in iterations are discarded. A further
number of iterations should be run to obtain samples that can be used for posterior
inference. There is no fixed value for the number of these further iterations. However,
the Monte Carlo error (MC error) is usually used to assess the accuracy of the required
posterior distributions. It is defined as “the difference between the mean of the sampled
values (which we are using as our estimate of the posterior mean for each parameter) and
the true posterior mean” [User Manual of WinBUGS]. In practice, the number of iterations
is said to be sufficient if the MC error for each variable of interest is less than 5%
of the sample standard deviation. Table 4.2 reports a random subset of the posterior
summaries and the MC errors of some monitored variables. For a formal method of diagnosing
convergence, please refer to [Brooks and Gelman, 1998].
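The 5% rule of thumb can be sketched in Python as follows. The batch-means estimator shown is one common way of estimating an MC error from a chain and is given here as an assumption, not as a description of the computation behind the tables; the function names and the simulated chain are hypothetical, while the two (sd, MC error) pairs in the final lines are the S[116] and t[119] rows of Table 4.2.

```python
import math
import random

def batch_means_mc_error(samples, n_batches=50):
    """Estimate the Monte Carlo error of the sample mean by batch means."""
    size = len(samples) // n_batches
    means = [sum(samples[i * size:(i + 1) * size]) / size for i in range(n_batches)]
    grand_mean = sum(means) / n_batches
    var_of_means = sum((m - grand_mean) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var_of_means / n_batches)

def enough_iterations(mc_error, sd, threshold=0.05):
    """Rule of thumb: MC error below 5% of the posterior standard deviation."""
    return mc_error < threshold * sd

# Simulated chain of 20000 independent standard normal draws (illustration only).
rng = random.Random(0)
chain = [rng.gauss(0.0, 1.0) for _ in range(20000)]
simulated_error = batch_means_mc_error(chain)

# Values reported in Table 4.2 for S[116] and t[119].
ok_s116 = enough_iterations(mc_error=5.503e-4, sd=0.08134)
ok_t119 = enough_iterations(mc_error=0.05434, sd=10.47)
```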
Table 4.2: A random subset of the posterior summaries and the MC errors of some monitored variables
node       mean      sd         MC error   2.5%       median    97.5%
S[116]     0.4233    0.08134    5.503E-4   0.2674     0.4229    0.5839
S[117]     0.4237    0.08143    5.477E-4   0.268      0.4233    0.5849
S[118]     0.6099    0.09249    5.338E-4   0.4212     0.6146    0.7708
S[119]     0.1625    0.103      4.897E-4   0.007498   0.1541    0.3759
S[120]     0.163     0.1034     5.39E-4    0.008026   0.1541    0.3771
pdf[116]   0.04263   0.004261   2.155E-5   0.03574    0.04237   0.05075
pdf[117]   0.04267   0.004249   2.274E-5   0.03577    0.04243   0.05079
pdf[118]   0.06238   0.01033    6.166E-5   0.04595    0.06104   0.085
pdf[119]   0.01619   0.009402   4.469E-5   7.888E-4   0.01612   0.03194
pdf[120]   0.01622   0.0094     4.876E-5   8.559E-4   0.01614   0.03194
t[116]     8.428     0.8638     0.004302   7.065      8.391     9.913
t[117]     8.419     0.8622     0.004417   7.066      8.381     9.911
t[118]     4.862     1.153      0.005515   3.079      4.789     6.871
t[119]     21.04     10.47      0.05434    11.25      17.82     49.27
t[120]     20.98     10.41      0.05898    11.24      17.77     48.54
In Table 4.2, all the MC errors after the 20000 follow-up iterations are much smaller than
5% of the standard deviations. Therefore, the posterior inferences are appropriate. It is
also worth noting that the last two elements of t are estimated with low precision.
Although the survival of the yellowhammer adults after fledging is not of interest, the
lack of such information, together with the use of the average life length of adult birds
as the upper bound of the missing data structure, has caused comparatively large
posterior standard deviations.
4.2 Survival Models Based on The Weibull Prior
Since a slightly decreasing trend of the daily survival rate was observed, a more expert
prior on the individual survival time, a Weibull distribution, is applied. Model
comparisons between the exponential model and the Weibull model are carried out. The
effects of different priors are also discussed. An empirical hyperparameter γ = 1.4
is selected to run the Weibull models. The first 5000 iterations are discarded and the
posterior distributions are then inferred from a 20000-iteration run.
4.2.1 Models Based on Non-Informative Priors
In this section, a very ‘vague’ prior for the regression coefficients is used to avoid
disturbing the posterior distribution:
bn ∼ Normal(0.0, 1.0E-4)
In fact, extensive sensitivity studies in which each of these prior values is increased by
several orders of magnitude have given essentially identical results, implying that the
exact choice of prior had little influence on the posteriors obtained. Therefore, only one
of them is reported in Section 4.2.2. It takes 169s to run the program on a 1.24 GHz
personal computer.
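To see how vague such a prior is, the sketch below converts the second argument of the normal prior to a standard deviation. It assumes that this argument is a precision (inverse variance), as in the WinBUGS `dnorm(mean, precision)` parameterisation; if it were meant as a variance, the conversion would not apply. The function name is hypothetical.

```python
import math

def precision_to_sd(tau):
    """Standard deviation of a normal prior with precision tau (assumed meaning)."""
    return 1.0 / math.sqrt(tau)

# Prior values quoted in the text, read as precisions.
sd_vague = precision_to_sd(1.0e-4)       # prior used for bn in this section
sd_very_vague = precision_to_sd(1.0e-6)  # prior used for the logistic model (3.18)
sd_informative = precision_to_sd(1.0)    # standard normal prior
```

Under this assumption, raising the prior value by several orders of magnitude shrinks the prior standard deviation and so makes the prior more informative.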
In order to compare the results with those given by the exponential model, Table 4.3
reports the same subset of the posterior summaries and the MC errors of some monitored
variables as Table 4.2.
Table 4.3: A random subset of the posterior summaries and the MC errors of some monitored variables from the Weibull Model under very vague priors
node       mean      sd         MC error   2.5%       median    97.5%
S[116]     0.4248    0.0878     5.464E-4   0.2551     0.4246    0.5952
S[117]     0.425     0.08755    5.393E-4   0.2568     0.4255    0.5941
S[118]     0.6645    0.1013     5.426E-4   0.4625     0.6704    0.8333
S[119]     0.1449    0.09317    4.697E-4   0.006617   0.1359    0.3433
S[120]     0.1449    0.09322    4.41E-4    0.006844   0.1365    0.3431
pdf[116]   0.05937   0.005737   2.995E-5   0.04976    0.05914   0.07042
pdf[117]   0.05941   0.005766   2.909E-5   0.04984    0.05913   0.07048
pdf[118]   0.07543   0.009935   6.467E-5   0.05692    0.07479   0.09644
pdf[119]   0.02444   0.01282    6.512E-5   0.001627   0.02519   0.04473
pdf[120]   0.02441   0.01278    5.589E-5   0.001657   0.02524   0.04461
t[116]     8.43      0.8606     0.004425   7.067      8.398     9.912
t[117]     8.427     0.8648     0.004195   7.064      8.391     9.918
t[118]     4.946     1.146      0.005735   3.099      4.911     6.888
t[119]     16.56     5.244      0.02886    11.15      15.04     30.41
t[120]     16.55     5.192      0.02444    11.16      15.07     30.19
Compared to Table 4.2, the 95% HPDIs have shrunk and the posterior standard deviations
have decreased in general. This is particularly true for the estimates of t in the last
two rows. Moreover, the means of the elements of a monitored variable are much closer to
each other, implying that a few exceptional data values hardly affect the model.
Therefore, the posterior summaries from the Weibull model are more realistic and
robust. On the other hand, the results from the exponential model are relatively more
biased, implying that the model lacks robustness.
Models Based on Insecticide Quantities
First of all, it is vital to bring the insecticide data into the Weibull model. Figure
4.11 shows the relationship between the proportions of the foraging area sprayed with
insecticides and the chick daily survival rate, while Figure 4.12 illustrates the
relationship between the same proportions and the nestling period survival rate. Compared
to Figure 4.1 and Figure 4.2, the fitted lines slope down more evidently, supporting our
assumption that the insecticide applications influenced the survival conditions of the
yellowhammer chicks.
[Plot: daily survival rate against insecticide application group (1: <= 25% of the 200m-radius foraging area was sprayed; 2: > 25% of the 200m-radius foraging area was sprayed); posterior means of the monitored variable with a linear fit.]
Figure 4.11: The model fit of the chick daily survival rate over different proportions of insecticides (the Weibull model, under non-informative prior distributions)
[Plot: single survival rate to days against insecticide application group (1: <= 25% of the 200m-radius foraging area was sprayed; 2: > 25% of the 200m-radius foraging area was sprayed); posterior means of the monitored variable with a linear fit.]
Figure 4.12: The model fit of the nestling period survival rate over different proportions of insecticides (the Weibull model, under non-informative prior distributions)
Models Based on Time
Models based on time are also necessary. Figure 4.13 and Figure 4.14 provide the model
fit and the trend of the daily survival rate over time t respectively. In the model fit
plot, the drops are not as steep as those shown in Figure 4.3, which confirms the analysis
of Table 4.3 that the results are less influenced by a few nestlings with comparatively
extreme life lengths. Furthermore, the 95% Highest Posterior Density Intervals are much
narrower. Meanwhile, the trend shows a more evident decline than that in Figure 4.4. The
plot of the trend also shows some stages in chick age around which relatively low daily
survival rates occurred. These results are consistent with those in Figure 4.4, but
much clearer. Therefore, the same ecological explanations should apply and be given more
credence.
[Plot: daily survival rate against survival time t from hatching to fledging (15 days); posterior median of each element of the monitored variable with 95% posterior interval.]
Figure 4.13: The model fit of the daily survival rate over time (the Weibull model, under non-informative prior distributions)
[Plot: daily survival rate against survival time t from hatching to fledging (15 days); posterior means of the monitored variable with trend.]
Figure 4.14: The trend of the daily survival rate over time (the Weibull model, under non-informative prior distributions)
Similarly, the plots of model fit and the trend of the single chick survival rate to days
(Figure 4.15 and Figure 4.16) have also shown better estimates than Figure 4.5 and Figure
4.6.
[Plot: the nestling period survival probability against survival time t from hatching to fledging (15 days); posterior median of each element of the monitored variable with 95% posterior interval.]
Figure 4.15: The model fit of the individual chick survival rate to days (the Weibull model, under non-informative prior distributions)
[Plot: the nestling period survival probability against survival time t from hatching to fledging (15 days); posterior means of the monitored variable with 95% posterior interval and trend.]
Figure 4.16: The trend of the individual chick survival rate to days (the Weibull model, under non-informative prior distributions)
Finally, the model fit and the trend of the probability density of chick survival life
time are provided in Figure 4.17 and Figure 4.18 respectively. It is worth noting that the
95% HPDIs of the pdf of the survival time after chick age 13 (Figure 4.17) suddenly
widen, suggesting that the monotone hazard rate imposed by the Weibull prior is
not very suitable after that time point. By contrast, the exponential model gives much
narrower HPDIs (Figure 4.7), implying that the daily survival rate after that chick age
was almost invariant. On the other hand, compared to the results for dsr and S, more
information is provided by the pdf of the chick survival time, confirming the earlier
conclusion that the study of the probability density of chick survival time is necessary.
[Plot: probability density function of survival time t against survival time from hatching to fledging (15 days); posterior median of each element of the monitored variable with 95% posterior interval.]
Figure 4.17: The model fit of the probability density of the nestling death time (the Weibull model, under non-informative prior distributions)
[Plot: probability density function of survival time t against survival time from hatching to fledging (15 days); posterior means of the monitored variable with 95% posterior interval and trend.]
Figure 4.18: The trend of the probability density of the nestling death time (the Weibull model, under non-informative prior distributions)
Checking The Convergence of The MCMC Simulation
It is still important to check the MCMC simulations for the stochastic nodes of interest,
such as the daily survival rate, the nestling period survival rate and the pdf of the
single chick failure time. Table 4.3 reports very small MC errors. In addition, once
convergence has been reached, the samples should look like a random scatter about a stable
mean value. Such evidence is shown in Figure 4.19. Moreover, smoothed kernel density
estimates are also available. Four posterior densities for the seventeenth nestling are
shown in Figure 4.20. Two chains of 20000 iterations are run simultaneously, so the total
sample size is 40000.
Figure 4.19: The multiple chains of some monitored variables which have reached convergence. [Three trace-plot panels; y-axis: variable value.]
Figure 4.20: The posterior densities of four monitored variables of the 17th nestling
However, on inspection, the chains for the regression coefficients bn are still far
from convergence, as shown in Figure 4.21.
Figure 4.21: The pairs of chains for bn which have clearly not reached convergence under very vague priors. [Three trace-plot panels, "b0 chains 1:2", "b1[1] chains 1:2" and "b2[1] chains 1:2"; x-axis: iteration, 5001 to 25000; y-axis: parameter value, with axis ranges of -300 to -220 for b0, 100 to 180 for b1[1] and 80 to 160 for b2[1].]
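The thesis judges convergence from trace plots. A numerical complement, named here plainly as a different technique from the visual check used above, is the potential scale reduction factor of Gelman and Rubin (1992), which appears in the bibliography. The sketch below is a minimal version for one scalar parameter; the two pairs of chains are synthetic and are not WinBUGS output from the thesis.

```python
import random
import statistics


def potential_scale_reduction(chains):
    """Gelman-Rubin R-hat for equal-length chains of one scalar parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B / n: sample variance of the chain means
    between_over_n = statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


rng = random.Random(2005)

# Synthetic chains (assumptions for illustration only):
# "mixed" chains explore the same distribution, "stuck" chains sit apart.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(2)]
stuck = [[rng.gauss(centre, 1.0) for _ in range(2000)] for centre in (0.0, 5.0)]

rhat_mixed = potential_scale_reduction(mixed)
rhat_stuck = potential_scale_reduction(stuck)
print(f"R-hat, overlapping chains: {rhat_mixed:.3f}")
print(f"R-hat, separated chains:   {rhat_stuck:.3f}")
```

Values close to 1 are what overlapping chains should give, while chains that stay apart, as in Figure 4.21, push the factor well above 1.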
Although they do not influence the inferences based on the monitored variables
whose posterior samples have converged, they do imply that the prior distributions
are too vague. Therefore, more informative priors should be applied and the robustness of
the Weibull model should be tested. To this end, the prior parameter values are increased
by several orders of magnitude. As mentioned above, all of them give essentially
equal results, and therefore only one of them is discussed in the next section as an
example.
4.2.2 Models Based on Informative Priors
Given the prior distribution,
bn ∼ Normal(0.0, 1.0)
which is in fact the standard normal distribution, the posterior summaries of the regression
coefficients and the monitored variables are reported in Table 4.4 and Table 4.5 respectively.
It takes 177s to run the model on a 1.24 GHz personal computer.
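In WinBUGS the second argument of `dnorm` is a precision (the reciprocal of the variance), so the step from the vague priors to the standard normal prior is easier to judge on the standard deviation scale. The sketch below makes that conversion; the vague precision 1.0E-4 is taken from the Weibull listing in Appendix I, and writing the informative prior as `dnorm(0.0, 1.0)` is an assumption based on the Normal(0.0, 1.0) statement above.

```python
import math


def precision_to_sd(tau):
    """Standard deviation implied by a normal precision tau = 1 / variance."""
    return 1.0 / math.sqrt(tau)


priors = {
    "vague prior, dnorm(0.0, 1.0E-4)": 1.0e-4,
    "informative prior, dnorm(0.0, 1.0) (assumed form)": 1.0,
}

for label, tau in priors.items():
    print(f"{label}: sd = {precision_to_sd(tau):g}")
```

On this scale the vague prior allows coefficients of the order of a hundred on the log-hazard scale, whereas the standard normal prior keeps them within a few units.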
Table 4.4: A random subset of the posterior summaries and the MC errors of some monitored variables from the Weibull Model under informative priors
node     mean      sd       MC error   2.5%      median    97.5%
b0       -2.201    0.6409   0.02564    -3.488    -2.189    -0.9946
b1[1]    -1.499    0.5977   0.0226     -2.666    -1.5      -0.3334
b1[2]    -0.6668   0.6061   0.02214    -1.852    -0.6708   0.5043
b2[1]    -0.858    0.5166   0.01539    -1.898    -0.85     0.1381
b2[2]    -1.022    0.6016   0.01349    -2.234    -1.009    0.132
b2[3]    -0.9346   0.5544   0.01438    -2.044    -0.9276   0.1263
b2[4]    0.5433    0.5003   0.01464    -0.4686   0.5471    1.516
Table 4.5: A random subset of the posterior summaries and the MC errors of some monitored variables from the Weibull Model under informative priors
node       mean      sd         MC error   2.5%       median    97.5%
S[116]     0.432     0.08757    5.272E-4   0.2634     0.4318    0.6019
S[117]     0.4318    0.08739    5.146E-4   0.265      0.4316    0.602
S[118]     0.6686    0.1009     5.742E-4   0.4667     0.675     0.8353
S[119]     0.1474    0.0949     4.811E-4   0.00679    0.1382    0.3493
S[120]     0.1478    0.09517    4.908E-4   0.006506   0.1395    0.348
pdf[116]   0.05925   0.005629   3.014E-5   0.04997    0.0589    0.07024
pdf[117]   0.05923   0.005626   3.102E-5   0.04994    0.0589    0.07016
pdf[118]   0.07452   0.009777   6.182E-5   0.0562     0.07404   0.09523
pdf[119]   0.02444   0.01282    6.409E-5   0.001629   0.02529   0.04474
pdf[120]   0.0245    0.0129     6.75E-5    0.0016     0.02541   0.0448
t[116]     8.425     0.8605     0.004308   7.068      8.395     9.909
t[117]     8.429     0.8597     0.004353   7.066      8.392     9.911
t[118]     4.958     1.149      0.005981   3.099      4.932     6.887
t[119]     16.7      5.338      0.02796    11.16      15.17     30.81
t[120]     16.7      5.364      0.02825    11.15      15.16     30.69
On comparison, there is essentially no difference between the results in Table 4.5
and Table 4.3, confirming that the model is very robust, since the exact choice of prior
has little influence on the posteriors. Moreover, Figure 4.22 and Figure 4.23 also give
results identical to the corresponding ones under non-informative priors.
Figure 4.22: The model fit of nestling daily survival rate over time (the Weibull model, under standard normal distribution priors). [Plot; x-axis: survival time t from hatching to fledging (15 days); y-axis: daily survival rate; legend: posterior median of each element of the monitored variable, 95% posterior interval.]
Figure 4.23: The model fit of the chick daily survival rate over different proportions of insecticides (the Weibull model, under standard normal distribution priors). [Plot; x-axis: insecticide applications, 1: <= 25% of the 200m-radius foraging area was sprayed, 2: > 25% of the 200m-radius foraging area was sprayed; y-axis: daily survival rate; legend: posterior means of the monitored variable, linear fitting.]
Finally, it can be seen that the MC errors are less than 5% of the standard deviations in
Table 4.4, suggesting that reasonable convergence of the chains has been reached, which is
also shown in Figure 4.24. Recall that, under non-informative priors, the samples of the
regression coefficients are still far from stable means (Figure 4.21) even after a longer
running time. Therefore, the informative priors are preferred in practice.
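The 5% rule of thumb can be checked directly against the figures in Table 4.4. The sketch below copies the posterior standard deviations and MC errors from that table and computes their ratio for each node; the variable names are illustrative only.

```python
# (node, posterior sd, MC error) copied from Table 4.4
TABLE_4_4 = [
    ("b0", 0.6409, 0.02564),
    ("b1[1]", 0.5977, 0.0226),
    ("b1[2]", 0.6061, 0.02214),
    ("b2[1]", 0.5166, 0.01539),
    ("b2[2]", 0.6016, 0.01349),
    ("b2[3]", 0.5544, 0.01438),
    ("b2[4]", 0.5003, 0.01464),
]

THRESHOLD = 0.05  # MC error should stay below 5% of the posterior sd

ratios = {node: mc_error / sd for node, sd, mc_error in TABLE_4_4}
for node, ratio in ratios.items():
    print(f"{node:6s} MC error / sd = {ratio:.3f}")

all_below = all(ratio < THRESHOLD for ratio in ratios.values())
print("all nodes below 5%:", all_below)
```

The largest ratio belongs to b0, at about 4% of its standard deviation, so every coefficient in the table satisfies the criterion.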
Figure 4.24: The pairs of chains for bn which have reached reasonable convergence under informative priors. [Three trace-plot panels; y-axis: parameter value.]
4.3 Summary
One of the distinguishing features of this chapter is that the posterior distributions
are discussed in detail from many different aspects. Sufficient data are used
to ensure that the models give reasonable estimates. Different prior distributions (the
exponential distribution, the Weibull distribution) are used to build the Bayesian models.
Under the Weibull prior for the chick survival time, different prior distributions on the
regression coefficients (informative and non-informative) are also used to test the robustness
of the posterior inferences. Therefore, all the models can be seen as hierarchical Bayesian
GLMs.
The most important finding is that there is a negative relationship between the insecticide
applications and the yellowhammer chick daily survival rate. The main research
goal of the project has thus been achieved. Furthermore, it is also found that the insecticide
applications negatively influence the yellowhammer nestling period survival rate in general.
Models for the chick daily survival rate, the nestling period survival rate and the probability
density function of the chick survival time are built. Many informative
results are obtained and possible ecological explanations are presented. Important findings
include: the yellowhammer chicks in the sample suffered from a comparatively high
hazard at about chick age 2 and chick age 5; the dsr of the sample was not constant
but decreased. This suggests that a Weibull prior with a monotonically declining
trend should provide more realistic estimates than the exponential prior.
Results from the Weibull prior show clearer features for all the monitored variables,
implying that it is more suitable than the simple exponential prior. Different priors on the
regression coefficients have no impact on the results, suggesting that the model is very robust.
The posterior summaries and MC errors in the tables are also analysed. The
convergence of the MCMC simulation is checked to make sure that the inference is carried out
under a stable posterior distribution.
Finally, modelling at block level shows no significant difference except for block 4. However,
the reason for this is still unknown.
Chapter 5
Discussions on Time Dependence Models
In general, the models discussed in this chapter take a completely different logical
perspective from those in the last chapter. However, results similar to those presented in the
last chapter are obtained, implying that the Bayesian approaches are very flexible and
powerful. Meanwhile, some new tools in WinBUGS are used in this chapter. The language
R is applied as well.
5.1 Model I
For Model I, Figure 5.1 shows the format of the input data.
Figure 5.1: Data format for model I
Figure 5.2 shows the autocorrelation functions of two elements of the variable dsr out to
lag 50. The plots illustrate that the autocorrelation coefficients drop off within a
very few lags, implying that reasonable convergence of the simulated chains has been reached
rapidly.
Figure 5.2: The autocorrelation functions of the daily survival rate on the 9th day and the 10th day respectively
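The quantity plotted by an autocorrelation tool of this kind can be sketched as follows. This is a minimal version of the usual sample autocorrelation at a given lag, evaluated out to lag 50 on a synthetic chain; the chain, its coefficient `PHI` and its length are assumptions for illustration and are not samples of dsr from the thesis.

```python
import random


def autocorrelation(samples, lag):
    """Sample autocorrelation of a chain at the given non-negative lag."""
    n = len(samples)
    mean = sum(samples) / n
    denominator = sum((x - mean) ** 2 for x in samples)
    numerator = sum(
        (samples[i] - mean) * (samples[i + lag] - mean) for i in range(n - lag)
    )
    return numerator / denominator


rng = random.Random(1)

# Synthetic first-order autoregressive chain (hypothetical, for illustration).
PHI = 0.5
chain = [0.0]
for _ in range(4999):
    chain.append(PHI * chain[-1] + rng.gauss(0.0, 1.0))

acf = [autocorrelation(chain, lag) for lag in range(51)]
for lag in (0, 1, 2, 5, 10, 50):
    print(f"lag {lag:2d}: {acf[lag]: .3f}")
```

A chain that mixes well shows coefficients falling towards zero after a few lags, which is the pattern described for Figure 5.2.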
Figure 5.3 provides a plot (in R) demonstrating the decrease of the daily survival rate
from 20th May 2002 to 3rd June 2002. A very smooth declining curve is obtained,
confirming the result from the failure time data models that the dsr of the yellowhammer
chicks found at the experiment site was not constant.
Figure 5.3: The daily survival rate of chicks from 20th May 2002 to 3rd June 2002 (Block One). [Plot; x-axis: survival time t from hatching to fledging (15 days); y-axis: daily survival rate; legend: posterior median of each element of the monitored variable, 95% posterior interval.]
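The smoothness of the curve follows from the form of Model I in Appendix I, where logit(dsr[t]) = alpha0 + alpha1 * t. The sketch below evaluates that relation over 16 days; the values of `ALPHA0` and `ALPHA1` are hypothetical and are not posterior estimates from the thesis.

```python
import math


def inv_logit(x):
    """Inverse of the logit link: maps a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical coefficients; a negative slope gives a declining dsr.
ALPHA0 = 3.2
ALPHA1 = -0.09

dsr = [inv_logit(ALPHA0 + ALPHA1 * t) for t in range(1, 17)]
for t, rate in enumerate(dsr, start=1):
    print(f"day {t:2d}: dsr = {rate:.4f}")
```

Any negative slope on the logit scale gives a monotonically declining daily survival rate that stays strictly between 0 and 1.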
5.2 Model II
For Model II, only simulated data, shown in Table 5.1, are used to investigate the feasibility
of the model. Suppose that the chick survival conditions were recorded
for six days, from day 1 to day 6. Denote each row as t1 (2 ≤ t1 ≤ 6) and each column
as t2 (1 ≤ t2 ≤ 6). Because the model is built to calculate the probability of survival over
days with an arbitrary starting day, each element of the matrix, except those in the rightmost
column, denotes the total number of chicks still found alive on t1 − 1 and
subsequently found dead on t2. The rightmost column holds the number of chicks from
t1 − 1 that were still alive when the observation stopped. Consequently, the sum of the elements
in row t1 is the total number of chicks found on t1 − 1.
Table 5.1: Input data for model II
13    2.    1.    3.    4.    60
0.    14    3.    5.    3.    50
0.    0.    15    3.    5.    40
0.    0.    0.    19    5.    30
0.    0.    0.    0.    5     20
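The row-sum property can be verified on the simulated counts. The sketch below uses the 5 x 6 matrix as it is listed in the Model II data of Appendix I and sums each row, mirroring the statement `alive[t] <- sum(m[t, ])` in the model listing; the Python names are illustrative only.

```python
# Simulated counts for Model II, as listed in the data section of Appendix I.
# Rows: starting day; the last column holds the chicks still alive
# when the observation stopped.
TABLE_5_1 = [
    [13, 2, 1, 3, 4, 60],
    [0, 14, 3, 5, 3, 50],
    [0, 0, 15, 3, 5, 40],
    [0, 0, 0, 19, 5, 30],
    [0, 0, 0, 0, 5, 20],
]

# Total number of chicks in each row, i.e. alive[t] in the model listing.
alive = [sum(row) for row in TABLE_5_1]
for index, total in enumerate(alive, start=1):
    print(f"row {index}: {total} chicks")
```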
Although Model II is much more complex than Model I, the convergence of the chains
is reached rapidly (Figure 5.4). This is because a very small data set is used. Widely
differing initial values for the chains are used to test the convergence.
Figure 5.4: The convergence for parameters αc and βc respectively
Alternatively, the convergence can be monitored dynamically by using the trace tool.
Figure 5.5 provides two examples.
Figure 5.5: The dynamics of the convergence for parameters αc and βc respectively
Because the original data are not arranged in a form suitable for Model II, survival
analysis is not available. However, interesting findings should be obtainable if this model
can be further developed.
5.3 Summary
In general, the models in this chapter represent very novel ideas. Time dependence should
be considered in this project because the effects of insecticides decline as time passes,
i.e. the insecticide effect is a time-dependent variable. A very exciting finding is that the
declining trend of the dsr between 20th May and 3rd June 2002, given by Model I, confirms the
models in the last chapter, which arise from a totally different logical perspective. Model I
should give more accurate results if more data from more blocks are used. However, that was
not carried out because of the time limit of the student placement.
Model II can be used to model the nestling period survival rate with an arbitrary start
point and end point. Compared to the other models, in which the start point is restricted to
the hatch day, the results given by this model would be more flexible and richer if real
data could be used.
Both models may overestimate the underlying variables because approximations
are used when right censoring occurs on both death and fledging.
Further Developments
First of all, for the failure time models, non-grouped insecticide data will be applied to
give more detailed relationships between the insecticides and chick survival conditions. All
data from all experiments will be pooled. In the current models, chicks are not distinguished
by their encounter ages; therefore the age-specific survival rate will be studied. Similarly, the
chicks are assumed to be independent. However, the death of
a chick in a specific nest might influence the survival conditions of the living ones. For
example, the death of one chick might give the surviving chicks a higher probability of
survival, through an increase in the available food resources and a
reduction in food competition.
Secondly, for the models with time dependence, more data from a different time interval,
say July, are needed for Model I, and real data should be entered for Model II.
At present, WinBUGS still suffers from convergence problems caused by huge data
sets. A computer in CSL always crashed when the author ran the WinBUGS programmes.
Similar experiences have also been reported by other WinBUGS users. On the other hand, all data
from all sites will surely comprise an extremely large data set, because chick numbers are
much greater than nest numbers. Therefore, a lower-level language will be used instead of
WinBUGS to overcome such problems.
Conclusions
To demonstrate a link between the insecticides and the population decline of a specific bird
species, evidence is needed from different aspects [Boatman N. D. et al, 2004].
New and very important evidence for the indirect effects of insecticides on the yellowhammer
chick survival conditions has been found in the project. Therefore, more information
from different categories is now available on the link between the indirect effects of
insecticides and the population decline of the yellowhammer. If further work can be done,
a conclusive demonstration of that link should be very possible. Consequently, it will help
people to adopt healthier farming practices that protect the ecological system.
Moreover, this project represents a new attempt at applying Bayesian approaches to
model chick survival conditions. No assumptions are needed for the missing data in the
Bayesian failure time models. Therefore, the approach is much more advanced than the
traditional Mayfield method, which has dominated this area since the 1960s. Other uses of
Bayesian methods here include summarizing the data from different aspects and modelling
more complex chick survival probabilities.
A high-profile publication is very likely in the light of the new evidence and the novel
Bayesian application.
Personally, the author has learnt a lot about the Bayesian approaches and has been impressed
by the power and flexibility of such methods in bio-statistical modelling.
Meanwhile, the author has built a good working relationship with scientists in CSL and
is therefore motivated to carry out further developments on the project in CSL.
Appendix I
The Exponential Model
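model{
   for(i in 1:N){
      # failure time, censored to the interval (tmin[i], tmax[i])
      t[i] ~ dexp(mu[i]) I(tmin[i], tmax[i])
      S[i] <- exp(-mu[i]*t[i])      # survival function at t[i]
      h[i] <- mu[i]                 # constant hazard
      dsr[i] <- 1-h[i]              # daily survival rate
      pdf[i] <- mu[i]*S[i]          # density of the failure time
      log(mu[i]) <- b0 + b1[treat[i]] + b2[block[i]]
   }
   # vague normal priors (the second argument of dnorm is the precision)
   b0 ~ dnorm(0.0, 0.001)
   for(k in 1:4){
      b1[k] ~ dnorm(0.0, 0.01)
      b2[k] ~ dnorm(0.0, 0.01)
   }
}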
The Weibull Model
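model{
   for(i in 1:N){
      # failure time, censored to the interval (tmin[i], tmax[i]);
      # the shape r is not defined in this listing and has to be
      # supplied with the data or given a prior
      t[i] ~ dweib(r, mu[i]) I(tmin[i], tmax[i])
      h[i] <- mu[i]*r*pow(t[i], r-1)      # hazard at t[i]
      dsr[i] <- 1-h[i]                    # daily survival rate
      S[i] <- exp(-mu[i]*pow(t[i], r))    # survival function at t[i]
      pdf[i] <- h[i]*S[i]                 # density of the failure time
      log(mu[i]) <- b0 + b1[treat[i]] + b2[block[i]]
   }
   # very vague normal priors (the second argument of dnorm is the precision)
   b0 ~ dnorm(0.0, 1.0E-4)
   for(k in 1:2){
      b1[k] ~ dnorm(0.0, 1.0E-4)
   }
   for(k in 1:4){
      b2[k] ~ dnorm(0.0, 1.0E-4)
   }
}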
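The `I(tmin[i], tmax[i])` construct restricts each unobserved failure time to its observation interval. What a draw from such a restricted Weibull looks like can be sketched with inverse-CDF sampling, as below. This is an illustration of the idea only, not the sampler WinBUGS uses internally; `MU` and `R` are hypothetical values, while the intervals (12, 15) and (13, 300) are taken from rows of the input data in Appendix II.

```python
import math
import random


def sample_truncated_weibull(mu, r, tmin, tmax, rng):
    """Draw t from the Weibull with S(t) = exp(-mu * t**r), kept in [tmin, tmax]."""

    def cdf(t):
        return 1.0 - math.exp(-mu * t ** r)

    lower, upper = cdf(tmin), cdf(tmax)
    # Uniform draw on the part of the CDF scale that maps into [tmin, tmax]
    u = lower + (upper - lower) * rng.random()
    # Invert the CDF: t = (-log(1 - u) / mu) ** (1 / r)
    return (-math.log(1.0 - u) / mu) ** (1.0 / r)


# Hypothetical parameter values, for illustration only.
MU, R = 0.1, 0.8
rng = random.Random(42)

# (12, 15): death observed between two visits; (13, 300): right-censored chick.
interval_draws = [sample_truncated_weibull(MU, R, 12, 15, rng) for _ in range(1000)]
censored_draws = [sample_truncated_weibull(MU, R, 13, 300, rng) for _ in range(1000)]

print(f"interval-censored draws: {min(interval_draws):.2f} to {max(interval_draws):.2f}")
print(f"right-censored draws:    {min(censored_draws):.2f} to {max(censored_draws):.2f}")
```

The upper limit of 300 in the data acts as a stand-in for "no upper bound" for chicks that were still alive at the last visit.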
Model I
Main Model in WinBUGS
model{
   for( t in 1 : (T-1) ) {
      # binomial survival from day t to day t+1,
      # with a logistic-linear daily survival rate
      Ns[t+1] ~ dbin(dsr[t],N[t])
      logit(dsr[t]) <- alpha0+alpha1*(t)
   }
   alpha0 ~ dnorm(0.0,1.0E-6)
   alpha1 ~ dnorm(0.0,1.0E-6)
}
data
list(T=17, N= c(3, 10, 10, 13, 12, 11, 11, 11, 10, 11, 11, 13,
12, 8, 6, 6, 6), Ns=c(0, 2, 10, 10, 12, 11, 11, 11, 10, 9, 11, 11,
12, 8, 6, 6, 6) )
initials(1)
list(alpha0 = 0, alpha1 = 0 )
initials(2)
list(alpha0 = 500, alpha1 =500 )
R Commands for The Plot
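# Read the posterior summaries (columns: node, mean, sd) saved from WinBUGS.
# header = TRUE assumes the file keeps the "node mean sd" header row
# shown in Appendix II.
setwd("I:/")
data <- read.table("I:/Model 1 Input data.txt", header = TRUE)
dsr <- data[,2]             # posterior means of dsr[1..16]
s <- data[,3]               # posterior standard deviations
hi <- dsr+1.96*s            # approximate upper 95% bound
lo <- dsr-1.96*s            # approximate lower 95% bound
# Draw the mean curve once, then overlay the two bounds on the same axes
plot(1:16,dsr,ylim=c(0,1),type="l")
lines(1:16,hi,type="o")
lines(1:16,lo,type="o")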
Model II
alive[t]: the no. of chicks found alive on day t
dsr[t]: chick survival rate from day t-1 to day t
model{
alphac~ dnorm(0, 0.01)
betac~ dnorm(0, 0.01)
for(i in 1:T){
logit(dsr[i])<- alphac+betac*(i)
}
for(t in 1:T){
m[t, 1:(T+1)] ~ dmulti(p[t, ], alive[t])
}
for(t in 1:T){
alive[t]<-sum(m[t, ])
}
#T:total observation days -1= 5
#chicks found alive on day 0~(T=4)
#are found dead on t2: 1<=t1<=(T-1), t1<t2<=T
#t1: the second day to the fifth day starting
#t2: same with t1
for(t1 in 1:T){
for(t2 in t1:T){
for(i in t1:t2){
lsur[t1, t2, i]<-log(dsr[i])
}
p[t1, t2]<- exp(sum(lsur[t1, t2, t1:t2]))*(1-dsr[t2])
}
for(t2 in 1:(t1-1)){
p[t1, t2]<-0
}
p[t1, T+1] <- 1-sum(p[t1, 1:T])
}
}
data
list(T=5,
m=structure(
.Data = c(
13, 2., 1., 3., 4., 60,
0., 14, 3., 5., 3., 50,
0., 0., 15, 3., 5., 40,
0., 0., 0., 19, 5., 30,
0., 0., 0., 0., 5, 20)
,.Dim = c(5,6)
)
)
initials
list(alphac=2, betac=-0.1)
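The multinomial cell probabilities defined in the Model II listing can be traced by hand with a short sketch. The code below mirrors the expressions for `dsr[i]` and `p[t1, t2]` as they are written in the listing and evaluates them at the initial values alphac = 2 and betac = -0.1 with T = 5; the function and variable names are illustrative only, and the sketch makes no claim about whether the listing's indexing is the intended one.

```python
import math


def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def cell_probabilities(alphac, betac, T):
    """Mirror of p[t1, t2] in the Model II listing (days are 1-based there)."""
    dsr = {i: inv_logit(alphac + betac * i) for i in range(1, T + 1)}
    p = []
    for t1 in range(1, T + 1):
        row = [0.0] * (T + 1)          # cells with t2 < t1 stay at zero
        survived = 1.0
        for t2 in range(t1, T + 1):
            survived *= dsr[t2]        # exp(sum(log(dsr[t1..t2])))
            row[t2 - 1] = survived * (1.0 - dsr[t2])
        row[T] = 1.0 - sum(row[:T])    # last cell: p[t1, T+1]
        p.append(row)
    return p


p = cell_probabilities(alphac=2.0, betac=-0.1, T=5)
for t1, row in enumerate(p, start=1):
    print(f"t1 = {t1}: " + "  ".join(f"{value:.4f}" for value in row))
```

Each row is a proper probability vector by construction, because the last cell is defined as one minus the sum of the others.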
Appendix II
Initial Values for The Failure Time Models
Inits(1)
list(
t=c(14,14,14,14,12,8,5,1,14,14,14,14,14,14,
14,14,0,14,14,0,14,3,0,14,14,14,14,14,14,14,
14,6,6,0,0,3,3,3,3,5,5,5,14,14,14,0,14,14,5,
5,2,14,14,14,14,14,14,14,14,16,16,16,16,14,
8,2,2,14,14,14,14,14,14,14,14,14,14,2,2,2,2,
2,14,14,14,14,14,3,14,14,14,3,3,0,14,14,
4,4,0,0,NA,6,11,3,3,3,14,14,3,3,NA,NA,3,
3,7,7,7,3,11,11,3,3,9,9,6)
)
Inits(2)
list(
t=c(200,200,200,200,15,11,8,3,200,200,200,
200,200,200,200,200,1,200,200,3,200,6,3,200,
200,200,200,200,200,200,200,7,7,3,3,6,6,6,6,
8,8,8,200,200,200,2,200,200,9,9,5,200,200,200,
200,200,200,200,200,200,200,200,200,200,9,6,6,
200,200,200,200,200,200,200,200,200,200,6,6,5,
5,5,200,200,200,200,200,6,200,200,200,5,5,3,
200,200,7,7,4,4,NA,8,14,7,7,7,200,200,7,7,NA,
NA,6,6,10,10,10,7,200,200,7,7,13,13,9)
)
Input Data for The Failure Time Models
Rectangular Structure
Data
list(N=125)
t[] tmin[] tmax[] treat[] block[]
NA 13 300 2 1
NA 13 300 2 1
NA 13 300 2 1
NA 13 300 2 1
NA 12 15 2 1
NA 8 11 2 1
NA 5 8 2 1
NA 1 3 2 1
NA 11 300 2 1
NA 11 300 2 1
NA 13 300 1 1
NA 13 300 1 1
NA 13 300 1 1
NA 12 300 1 1
NA 12 300 1 1
NA 12 300 1 1
NA 0 1 1 1
NA 13 300 2 1
NA 13 300 2 1
NA 0 3 2 1
NA 12 300 2 1
NA 3 6 2 1
NA 0 3 2 1
NA 12 300 2 1
NA 12 300 2 1
NA 12 300 1 1
NA 12 300 1 1
NA 12 300 1 1
NA 12 300 1 1
NA 11 300 2 1
NA 11 300 2 1
NA 6 7 1 1
NA 6 7 1 1
NA 0 3 1 1
NA 0 3 1 1
NA 3 6 2 1
NA 3 6 2 1
NA 3 6 2 1
NA 3 6 2 1
NA 5 8 2 1
NA 5 8 2 1
NA 5 8 2 1
NA 13 300 2 1
NA 13 300 2 1
NA 13 300 2 1
NA 0 2 2 1
NA 9 300 2 1
NA 9 300 2 1
NA 5 9 1 1
NA 5 9 1 1
NA 2 5 1 1
NA 11 300 1 1
NA 11 300 1 1
NA 11 300 1 1
NA 13 300 1 2
NA 13 300 1 2
NA 13 300 1 2
NA 13 300 1 2
NA 11 300 1 2
NA 15 300 1 2
NA 15 300 1 2
NA 15 300 1 2
NA 15 300 1 2
NA 9 300 1 2
NA 8 9 1 2
NA 2 6 1 2
NA 2 6 1 2
NA 10 300 1 3
NA 10 300 1 3
NA 10 300 1 3
NA 9 300 1 3
NA 9 300 1 3
NA 11 300 1 3
NA 11 300 1 3
NA 11 300 1 3
NA 11 300 1 3
NA 13 300 2 3
NA 2 6 2 3
NA 2 6 2 3
NA 2 5 2 3
NA 2 5 2 3
NA 2 5 2 3
NA 10 300 1 3
NA 10 300 1 3
NA 10 300 1 3
NA 13 300 1 3
NA 13 300 1 3
NA 3 6 1 3
NA 13 300 1 3
NA 13 300 1 3
NA 13 300 1 3
NA 3 5 1 4
NA 3 5 1 4
NA 0 3 1 4
NA 8 300 1 4
NA 8 300 1 4
NA 4 7 2 4
NA 4 7 2 4
NA 0 4 2 4
NA 0 4 2 4
9 0 300 2 4
NA 6 8 2 4
NA 11 14 1 4
NA 3 7 1 4
NA 3 7 1 4
NA 3 7 1 4
NA 14 300 1 4
NA 14 300 1 4
NA 3 7 1 4
NA 3 7 1 4
7 0 300 1 4
7 0 300 1 4
NA 3 6 1 4
NA 3 6 1 4
NA 7 10 1 4
NA 7 10 1 4
NA 7 10 1 4
NA 3 7 1 4
NA 11 300 1 4
NA 11 300 1 4
NA 3 7 1 4
NA 3 7 1 4
NA 9 13 1 4
NA 9 13 1 4
NA 6 9 1 4
NA 3 6 1 4
END
S-Plus Structure
list( N=126, t=c(NA,NA,NA,NA,NA,NA,
NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
NA,15,15,15,15,NA,NA,NA,NA,NA,NA, NA,
NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
NA,NA,NA,NA,NA,NA,9,NA,NA,NA,NA,NA,
NA,NA,NA,NA,7,7,NA,NA,NA,NA,NA,NA,
NA,NA, NA,NA,NA,NA,NA,NA),
tmin=c(13,13,13,13,12,8,5,1,11,11,13,
13,13,12,12,12,0,13,13,0,12,3,0,12,12,
12,12,12,12,11,11,6,6,0,0,3,3,3,3,5,5,
5,13,13,13,0,9,9,5,5,2,11,11,11,13,13,
13,13,11,0,0,0,0,9,8,2,2,10,10,10,9,9,
11,11,11,11,13,2,2,2,2,2,10,10,10,13,
13,3,13,13,13,3,3,0,8,8,4,4,0,0,0,6,11,
3,3,3,14,14,3,3,0,0,3,3,7,7,7,3,11,11,3,
3,9,9,6,3),
tmax=c(300,300,300,300,15,11,8,3,300,
300,300,300,300,300,300,300,1,300,300,
3,300,6,3,300,300,300,300,300,300,300,
300,7,7,3,3,6,6,6,6,8,8,8,300,300,300,
2,300,300,9,9,5,300,300,300,300,300,300,
300,300,300,300,300,300,300,9,6,6,300,
300,300,300,300,300,300,300,300,300,6,
6,5,5,5,300,300,300,300,300,6,300,
300,300,5,5,3,300,300,7,7,4,4,300,
8,14,7,7,7,300,300,7,7,300,
300,6,6,10,10,10,7,300,300,
7,7,13,13,9,6),
treat =c(4,4,4,4,4,4,4,4,4,4,3,3,3,
3,3,3,3,4,4,4,4,4,4,4,4,3,3,3,3,4,
4,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,
4,3,3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,
2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,
2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1),
block =c(1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,
2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,
3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
4, 4,4,4,4,4,4,4,4,4,4) )
Input Data for Model I Plot
node mean sd
dsr[1] 0.9599 0.0281
dsr[2] 0.9575 0.02669
dsr[3] 0.9548 0.02534
dsr[4] 0.9516 0.02404
dsr[5] 0.948 0.02283
dsr[6] 0.9439 0.02175
dsr[7] 0.9391 0.02091
dsr[8] 0.9337 0.02053
dsr[9] 0.9275 0.02088
dsr[10] 0.9205 0.02235
dsr[11] 0.9125 0.0253
dsr[12] 0.9033 0.02996
dsr[13] 0.893 0.03649
dsr[14] 0.8813 0.04491
dsr[15] 0.8683 0.05525
dsr[16] 0.8538 0.06742
Bibliography
Aebischer, N. J. (1999). Multi-way comparisons and generalized linear models of nest
success: extensions of the Mayfield method. Bird Study, 46(Suppl.):22–31.
Besbeas, P. (2002). Integrating mark-recapture-recovery and census data to estimate ani-
mal abundance and demographic parameters. Biometrics, 58:540–547.
Boatman N. D. et al (2004). Evidence for the indirect effects of pesticides on farmland
birds. Ibis, 146(Suppl.2):131–143.
Brooks, S. P. (1998). Markov Chain Monte Carlo method and its application. The
Statistician, 47(Part 1):69–100.
Brooks, S. P. and Gelman, A. (1998). General methods for monitoring convergence of
iterative simulations. Journal of Computational and Graphical Statistics, 7(4):434–455.
Brooks, S. P. et al (2000). Bayesian animal survival estimation. Statistical Science,
15(4):357–376.
Brooks, S. P. et al (2004). A Bayesian approach to combining animal abundance and
demographic data. Animal Biodiversity and Conservation, 27(1):515–529.
BugsHelpList (2005). Bugs Help List. [email protected].
Campbell, L. H. et al (1997). A review of the indirect effects of pesticides on birds.
Technical Report 227, JNCC, Peterborough.
Cao, J. and He, Z. (2005). Bias adjustment in Bayesian estimation of bird nest age-specific
survival rates. unpublished.
Celeux, G. (2005). Deviance information criteria for missing data models. unpublished.
Congdon, P. (2001). Bayesian Statistics Modeling. John Wiley & Sons Ltd.
Congdon, P. (2003). Applied Bayesian Modelling. Wiley Series in Probability and Statistics.
John Wiley & Sons, Ltd.
D’ Agostino, R. B., editor (2004). Tutorials in Biostatistics, volume 1. John Wiley & Sons
Ltd.
Gamerman, D. (1997). Sampling from the posterior distribution in generalized linear mixed
models. Statistics and Computing, 7:57–68.
Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple
sequences. Statistical Science, 7(4):457–472.
Gilks, W. R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied
Statistics, 41(2):337–348.
Gilks, W. R. et al (1996). Markov Chain Monte Carlo in practice. Chapman & Hall.
Hart, J. D. et al (2005). The relationship between yellowhammer breeding performance,
arthropod abundance and insecticide applications on arable farmland. Journal of
applied ecology(in press).
He, C. Z. (2003). Bayesian modeling of age-specific survival in bird nesting studies under
irregular visits. Biometrics, 59:962–973.
He, C. Z. et al (2001). Bayesian modeling of age-specific survival in nesting studies under
dirichlet priors. Biometrics, 57:1059–1066.
Hensler, G. L. and Nichols, J. D. (1981). The Mayfield method of estimating nesting
success: a model estimators and simulation results. Wilson Bulletin, 93(1):42–53.
Johnson, D. H. (1979). Estimating nest success: the Mayfield method and an alternative.
The Auk, 96:651–661.
Kalbfleisch, J. and Prentice, R. (2002). The Statistical Analysis of Failure Time Data.
John Wiley & Sons, Inc.
King, R. et al (2005). Identifying and diagnosing population declines: a Bayesian
assessment of lapwings in the UK. unpublished.
Krause, A. and Olson, M. (2000). The Basics of S and S-Plus. Springer.
Lee, P. M. (2004). Bayesian Statistics, an introduction. Arnold, third edition.
Lindley, D. V. (2000). The philosophy of statistics (with discussion). The Statistician,
13:136–141.
Manly, B. F. J. and Schmutz, J. A. (2001). Estimation of brood and nest survival: com-
parative methods in the presence of heterogeneity. Journal of wildlife management,
65(2):258–270.
Mayfield, H. F. (1961). Nesting success calculated from exposure. Wilson Bulletin,
73:255–261.
Mayfield, H. F. (1975). Suggestions for calculating nest success. Wilson Bulletin, 87(4):456–
466.
Miller, I. and Miller, M. (2004). John E. Freund’s Mathematical Statistics with Applica-
tions. Pearson Education, Inc., seventh edition.
Morris, A. J. et al (2005). Indirect effects of pesticides on breeding yellowhammer.
Agriculture, Ecosystems and Environment, 106:1–16.
O’Hagan, A. and Forster, J. (2004). Bayesian Inference, volume 2B of Kendall’s advanced
theory of statistics. Arnold.
R Project (2005). R Project. http://www.r-project.org/.
Spiegelhalter, D. and Best, N. (2005). Short course in Bayesian Analysis, MCMC, and
WinBUGS.
Spiegelhalter, D. J. et al (2002). Bayesian measures of model complexity and fit. Journal
of the Royal Statistical Society, 64(Series b):583–640.
Spiegelhalter, D. J. et al (2004). Bayesian Approaches to Clinical Trials and Health-Care
Evaluation. John Wiley & Sons, Ltd.
Spiegelhalter D. J. et al (2005). WinBUGS Website. http://www.mrc-bsu.cam.ac.uk/
bugs/welcome.shtml.
Vittinghoff, E. et al (2005). Regression Methods in biostatistics, Linear, Logistic, Survival,
and Repeated Measure Models. Statistics for Biology and Health. Springer.