
RACSAM
DOI 10.1007/s13398-012-0072-8

ORIGINAL PAPER

Bayesian inference for controlled branching processes through MCMC and ABC methodologies

Miguel González · Cristina Gutiérrez · Rodrigo Martínez · Inés M. del Puerto

Received: 18 February 2012 / Accepted: 5 June 2012
© Springer-Verlag 2012

Abstract The controlled branching process (CBP) is a generalization of the classical Bienaymé–Galton–Watson branching process and, in the terminology of population dynamics, is used to describe the evolution of populations in which the population size must be controlled at each generation. In this work, we deal with the problem of estimating the offspring distribution and its main parameters for a CBP with a deterministic control function, assuming that the only observable data are the total number of individuals in each generation. We tackle the problem from a Bayesian perspective in a nonparametric context. We consider a Markov chain Monte Carlo (MCMC) method, in particular the Gibbs sampler, and approximate Bayesian computation (ABC) methodology. The first is a data-imputation method and the second relies on numerical simulations. Through a simulated experiment we evaluate the accuracy of the MCMC and ABC techniques and compare their performances.

Keywords Controlled branching process · Bayesian inference · Gibbs sampler · Approximate Bayesian computation · Non-parametric

This research was supported by the Ministerio de Ciencia e Innovación and the FEDER through the grant MTM2009-13248.

M. González · C. Gutiérrez · R. Martínez · I. M. del Puerto (B)
Department of Mathematics, University of Extremadura, Badajoz, Spain
e-mail: [email protected]

M. González
e-mail: [email protected]

C. Gutiérrez
e-mail: [email protected]

R. Martínez
e-mail: [email protected]


1 Introduction

The long-term behaviour of many populations depends on the conditions imposed on the offspring reproduction model and on the life span of their individuals. The theory of branching processes was developed to deal with these models. The term Branching Process was introduced by Kolmogorov in the 1940s, but the subject is much older and goes back more than a century and a half. Initially it was motivated by the need to explain the extinction of certain family lines of the European aristocracy, and names such as Bienaymé, Galton and Watson are linked to those early studies. Nowadays, these processes are treated extensively both for their mathematical interest and as theoretical approaches to solving problems in applied fields such as Biology (gene amplification, the clonal resistance theory of cancer cells, polymerase chain reactions, etc.), Epidemiology, Genetics, and Cell Kinetics (the evolution of infectious diseases, sex-linked genes, stem cells, etc.), Computer Algorithms and Economics, and, of course, Population Dynamics, to mention only some of the more important applications.

In particular, in this work we are interested in the class of controlled branching processes. These are discrete-time stochastic processes that model populations developing in the following manner: at generation 0, the population consists of a fixed number of individuals or progenitors; each of them, independently of the others and in accordance with a common probability distribution, gives rise to offspring and then ceases to participate in subsequent reproduction. Thus, each individual lives for one unit of time and is replaced by a random number of offspring. Moreover, since for reasons of an environmental, social, or other nature the number of progenitors taking part in each generation must be controlled, a control mechanism is introduced in the model to determine the number of individuals with reproductive capacity in each generation, and so on. Mathematically, a controlled branching process (CBP), {Z_n}_{n≥0}, is defined recursively as

Z_0 = N ≥ 1,    Z_{n+1} = ∑_{i=1}^{φ(Z_n)} X_{ni},    n = 0, 1, ...,    (1)

where {X_{ni} : i = 1, 2, ...; n = 0, 1, ...} is a sequence of independent and identically distributed non-negative integer-valued random variables and φ is a function assumed to be non-negative and integer-valued for integer-valued arguments. The empty sum in (1) is defined to be 0.

Intuitively, Z_n denotes the number of individuals in generation n, and X_{ni} the number of offspring of the i-th individual in generation n. The probability law {p_k}_{k≥0} is called the offspring distribution or reproduction law, and m and σ² are, respectively, the offspring mean and variance (both assumed finite). With respect to the control function φ, if φ(Z_n) < Z_n, then Z_n − φ(Z_n) individuals are artificially removed from the population and therefore do not participate in the future evolution of the process. If φ(Z_n) > Z_n, then φ(Z_n) − Z_n new individuals of the same type are added to the population, participating under the same conditions as the others. No control is applied to the population when φ(Z_n) = Z_n.

It is easy to see that {Z_n}_{n≥0} is a homogeneous Markov chain. Moreover, assuming that φ(0) = 0 and p_0 > 0, the classical extinction–explosion duality of branching process theory holds, i.e., P(Z_n → 0) + P(Z_n → ∞) = 1.

For this process, theoretical considerations have been tackled in several papers; see, for example, [1,13,14,24,30,35]. Various extensions have been considered. It is worth highlighting the controlled branching processes with random control function [7,8,15–20,25,34]. For these processes, for each n = 0, 1, ..., independent stochastic processes {φ_n(k)}_{k≥0} are introduced, with equal one-dimensional probability distributions and independent of the offspring distribution. Thus, in each generation n = 1, 2, ..., with size Z_n, the number of progenitors for the next generation is determined by the random variable φ_n(Z_n). It is clear that the CBP (1) is a particular case of a CBP with random control function.

Few of the quoted references deal with statistical issues, and most of those that do adopt a frequentist viewpoint. Indeed, the only paper that takes a Bayesian outlook is [25]. In particular, it considers a CBP with random control function whose offspring distribution belongs to the power series family and establishes the asymptotic normality of the posterior distribution of the offspring mean.

The aim of this paper is to develop Bayesian inference, in a nonparametric scenario, for the class of CBPs with a deterministic control function. Mainly, inference on the offspring distribution and on its main moments is developed. Branching process theory has usually assumed that the entire family tree must be observed in order to make nonparametric inferences on the offspring distribution (see [21,23]). As a novelty, we consider in this paper that the observable sample is given only by the generation-by-generation population sizes. In this case, the problem can be treated as an incomplete data problem. To make inference based on this sample we first consider a traditional Markov chain Monte Carlo (MCMC) method, the Gibbs sampler. After that, approximate Bayesian computation (ABC) methods are considered. ABC methods have been developed over the last decade as an alternative to more traditional MCMC methods. These likelihood-free techniques are very well suited to models for which the likelihood of the data is either mathematically or computationally intractable but from which it is easy to simulate, so they are very appropriate for studying inference for CBPs.

Through a simulated example we first evaluate the accuracy of the MCMC method in estimating the reproduction law and the offspring mean and variance. To this end we rely on well-known theoretical convergence results as well as on practical procedures for checking the numerical convergence of MCMC methods. We then analyze whether ABC techniques can provide accurate estimates of the parameters of interest. In the literature, no methods are available to evaluate the approximations given by the ABC methodology. Thus, the performance indicator we consider for the ABC procedure is its ability to provide posterior probability distributions close enough to those given by MCMC (as was also proposed in [5]). Mainly, we evaluate the accuracy of the ABC methods by focussing on the performance of the estimate of the offspring mean.

The paper is organized as follows. Section 2 describes an algorithm based on the Gibbs sampler, and Sect. 3 is devoted to ABC methods; we describe the rejection ABC algorithm and two post-processing schemes that change the analysis of the ABC output. Section 4 illustrates the described algorithms with a simulated example and compares their performances.

Throughout the paper, we shall assume a CBP with known control function φ and with an offspring distribution p = {p_k} whose support, S, is assumed to be finite and, for simplicity, also known; we denote its cardinality by s. We write Z_n = {Z_0, ..., Z_n}, and denote the prior probability density of p by π(p) and the posterior distribution of p after observing Z_n by π(p | Z_n).

2 Gibbs sampler

We shall describe an algorithm based on the Gibbs sampler (see, for instance, [6]) to approximate the posterior offspring distribution of a CBP from the observation of Z_n alone, the control function φ being known.


Let us introduce Z_l(k), k ∈ S, l = 0, 1, ..., n−1, as the random variable that represents the number of individuals in the l-th generation with exactly k offspring, and denote Z*_n = {Z_l(k), k ∈ S, l = 0, 1, ..., n−1}. Mathematically,

Z_l(k) = ∑_{j=1}^{φ(Z_l)} I_{{X_{lj} = k}},

with I_A standing for the indicator function of the set A. Taking advantage of conjugate theory, we consider that the prior distribution of p is a Dirichlet distribution with parameter α = (α_k, k ∈ S), α_k > 0, i.e.,

π(p) = d(α) ∏_{k∈S} p_k^{α_k − 1},    with  d(α) = Γ(α*) ( ∏_{k∈S} Γ(α_k) )^{−1},

where α* = ∑_{k∈S} α_k and Γ(·) denotes the Gamma function. It is easy to check that the likelihood function based on Z*_n is proportional to

∏_{k∈S} p_k^{Z*_{n,k}},    (2)

with Z*_{n,k} = ∑_{l=0}^{n−1} Z_l(k). We consider the unobservable variables Z_l(k), k ∈ S, l = 0, 1, ..., n−1, as latent variables and form the augmented parameter vector (p, Z*_n). We shall approximate the posterior distribution of (p, Z*_n) after observing Z_n, denoted by π(p, Z*_n | Z_n), and from this obtain an approximation of π(p | Z_n). In order to sample from the posterior distribution π(p, Z*_n | Z_n) using the Gibbs sampler, it is only necessary to determine the conditional posterior distribution of p after observing Z_n and Z*_n, denoted by π(p | Z_n, Z*_n), and the conditional posterior distribution of Z*_n after observing (p, Z_n), denoted by f(Z*_n | p, Z_n). Taking into account that

φ(Z_l) = ∑_{k∈S} Z_l(k)    and    Z_{l+1} = ∑_{k∈S} k Z_l(k),    l = 0, ..., n−1,    (3)

π(p | Z_n, Z*_n) is the same as π(p | Z*_n). Now, from (2) it is deduced that

π(p | Z*_n) ∝ d(β) ∏_{k∈S} p_k^{β_k − 1},    (4)

with β_k = α_k + Z*_{n,k} and β = (β_k, k ∈ S).

With respect to f(Z*_n | p, Z_n), since the individuals reproduce independently, one has that

f(Z*_n | p, Z_n) = ∏_{l=0}^{n−1} f({Z_l(k) : k ∈ S} | p, Z_l, Z_{l+1}),

with f({Z_l(k) : k ∈ S} | p, Z_l, Z_{l+1}) denoting the conditional distribution of the random vector (Z_l(k), k ∈ S) given p, Z_l, and Z_{l+1} (the proof follows similar steps to those developed in the Appendix of [12]).

Moreover, it is easy to see that the distribution of {Z_l(k) : k ∈ S} given p, Z_l and Z_{l+1} is obtained from a multinomial distribution with size φ(Z_l) and probabilities given by p, normalized by the constraint Z_{l+1} = ∑_{k∈S} k Z_l(k). Once it is known how to obtain samples from the distributions π(p | Z_n, Z*_n) and f(Z*_n | p, Z_n), the Gibbs sampler algorithm works in the following way:


Gibbs sampler algorithm

generate p^(0) ∼ π(p)
do i = 1
    generate Z*_n^(i) ∼ f(Z*_n | Z_n, p^(i−1))
    generate p^(i) ∼ π(p | Z*_n^(i))
do i = i + 1

Thus, for a run of the sequence {(p^(l), Z*_n^(l))}_{l≥0}, we choose Q+1 vectors p^(N), p^(N+G), ..., p^(N+QG), with N, G, Q > 0. These vectors are approximately independent sampled values from the distribution π(p | Z_n) if G and N are large enough (see details in [32]). Since these vectors could be affected by the initial state p^(0), we apply the algorithm T times, obtaining a final sample of length T(Q + 1).

3 Approximate Bayesian computation

The use of ABC ideas originally comes from the field of population genetics, although they were quickly extended to a great variety of scientific application areas. The basic idea is to simulate a large number of data sets from a model whose parameter is drawn from a prior distribution, and to calculate for each simulated data set summary statistics that are compared with the values for the observed sample. An approximate sample from the posterior distribution is given by the parameters that provide summary statistics close enough to the summary statistics of the observed sample. That is, the aim of the ABC methodology is to provide samples from a posterior distribution which is a good (enough) approximation of the target distribution. A survey of ABC algorithms can be found in [22].

For a CBP, given a specific p, it is easy to simulate the entire family tree up to the current n-th generation, and thus to obtain samples of the random variables {Z_l(k), k ∈ S, l = 0, 1, ..., n−1} and, from (3), the population size in each generation.
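A minimal R sketch of this forward simulation is given below. The function name simulate_cbp and its interface are our own choices; it assumes a finite support, a known control function phi, and that the counts {Z_l(k)} of parents having exactly k offspring follow a multinomial(φ(Z_l), p) law, as described in Sect. 2.

# Forward simulation of a CBP up to generation n (a sketch; names are ours).
# Returns the sizes Z_0, ..., Z_n and the latent counts Z_l(k), l = 0, ..., n-1.
simulate_cbp <- function(p, phi, support, n, Z0) {
  Z <- numeric(n + 1)
  Z[1] <- Z0
  latent <- matrix(0, nrow = length(support), ncol = n)   # column l+1 holds {Z_l(k)}
  for (l in 0:(n - 1)) {
    parents <- phi(Z[l + 1])
    counts <- rep(0, length(support))
    if (parents > 0) counts <- as.vector(rmultinom(1, size = parents, prob = p))
    latent[, l + 1] <- counts
    Z[l + 2] <- sum(support * counts)                     # Z_{l+1} = sum_k k Z_l(k)
  }
  list(Z = Z, latent = latent)
}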

Let f(Z*_n | p) denote the probability density of Z*_n given p. As before, Z_n = {Z_0, ..., Z_n}, and let Z_n^obs = {Z_0^obs, ..., Z_n^obs} denote the observed data from the model (1). We describe the rejection ABC algorithm and two post-processing schemes to obtain good approximations to π(p | Z_n).

3.1 Likelihood-free sampler: tolerance rejection algorithm

This algorithm is an adaptation of that proposed in [28]. It is based on summary statistics, S(·), calculated from Z*_n, and on a distance, ρ(·, ·), on S(Z*_n).

For a given ε > 0, known as a tolerance level, the algorithm proposed below provides samples from π(p | ρ(S(Z_n), S(Z_n^obs)) ≤ ε), which allow us to obtain a good approximation to π(p | Z_n^obs) by using a small enough ε and good choices of S(·) and ρ(·, ·). Taking into account that the available sample is the total population size in each generation and that our aim is to obtain an approximation of π(p | Z_n), we consider S(Z*_n) = Z_n; that is, from a simulation of the entire family tree up to generation n we keep the total population size in each generation of the simulated data. Indeed, we implicitly assume that we work with the entire observed data, Z_n, rather than building reduced-dimension summary statistics of them, as is usually done in practice. This assumption is also considered in a Markovian context in [9]. The search for sufficient statistics for p which guarantee π(p | Z_n) = π(p | S(Z_n)) is difficult.


Several metrics can be proposed to evaluate when the simulated data match the observed data. The most intuitive and usual ones are:

– ℓ1 metric:

ρ_{l1}(Z_n, Z_n^obs) = ∑_{i=1}^{n} |Z_i^obs − Z_i|

– Euclidean metric:

ρ_e(Z_n, Z_n^obs) = ( ∑_{i=1}^{n} (Z_i^obs − Z_i)² )^{1/2}

We also consider the Hellinger metric, defined as

ρ_h(Z_n, Z_n^obs) = ( ∑_{i=1}^{n} ((Z_i^obs)^{1/2} − Z_i^{1/2})² )^{1/2}

This was first considered in [3] to deal with efficiency and robustness properties in parametric estimation problems.

Finally, owing to the nature of our particular data, we can also consider the following metric:

ρ_s(Z_n, Z_n^obs) = | (∑_{i=1}^{n} Z_i^obs)/(∑_{i=1}^{n} Z_i) − (∑_{i=1}^{n} Z_i)/(∑_{i=1}^{n} Z_i^obs) | + (1/2) ∑_{j=1}^{n} | Z_j/(∑_{i=1}^{n} Z_i) − Z_j^obs/(∑_{i=1}^{n} Z_i^obs) |

This metric is a modification of that proposed in [26], made in order to obtain the symmetry property. The first summand measures the difference between the total progeny in the observed and in the simulated data, whereas the second summand is the total variation distance between the two vectors of proportions of individuals in each generation with respect to the total progeny, for the simulated and the observed data. Thinking of the estimation of the offspring probabilities, this metric considers as valid those simulated data that provide a total progeny similar to the observed data and a small total variation distance between the two vectors of proportions.
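In R, these four distances can be written directly as follows (a sketch; function names are ours, and z, zobs stand for the simulated and observed size vectors (Z_1, ..., Z_n), excluding the common Z_0).

# The four metrics of Sect. 3.1 for size vectors z, zobs = (Z_1, ..., Z_n) (a sketch).
rho_l1 <- function(z, zobs) sum(abs(zobs - z))
rho_e  <- function(z, zobs) sqrt(sum((zobs - z)^2))
rho_h  <- function(z, zobs) sqrt(sum((sqrt(zobs) - sqrt(z))^2))
rho_s  <- function(z, zobs) {
  abs(sum(zobs) / sum(z) - sum(z) / sum(zobs)) +
    0.5 * sum(abs(z / sum(z) - zobs / sum(zobs)))
}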

Let us formulate the tolerance rejection ABC algorithm by considering a generic metric ρ(·, ·).

Likelihood-free sampler: Tolerance rejection algorithm

for i = 1 to m do
    repeat
        generate p ∼ π(p)
        generate Z*_n from the likelihood f(Z*_n | p)
    until ρ(Z_n, Z_n^obs) ≤ ε, with Z_n = S(Z*_n)
    set p^(i) = p
end for
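A direct R sketch of this sampler, reusing the hypothetical simulate_cbp, rdirichlet1 and rho_s helpers introduced above, could be the following.

# Tolerance-rejection ABC for the CBP (a sketch; helper names are ours).
abc_rejection <- function(m, eps, Zobs, phi, support, rho = rho_s) {
  n <- length(Zobs) - 1
  out <- matrix(NA_real_, nrow = m, ncol = length(support))
  for (i in 1:m) {
    repeat {
      p_try <- rdirichlet1(rep(1/2, length(support)))        # draw from the prior
      sim <- simulate_cbp(p_try, phi, support, n, Zobs[1])    # simulate Z*_n and Z_n
      if (rho(sim$Z[-1], Zobs[-1]) <= eps) break              # compare Z_1, ..., Z_n
    }
    out[i, ] <- p_try
  }
  out                                                         # m accepted offspring laws
}

In practice the paper simulates a fixed pool of CBPs and sets ε to a small quantile of the resulting distances, which amounts to filtering that pool rather than looping until acceptance.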

In Sect. 4 we compare the effect of the metrics through a simulated example.

3.2 Likelihood-free sampler: local linear regression algorithm

In [2] an extension of the rejection ABC algorithm is proposed. It is based on a local-linear regression fit of the simulated parameters on the simulated summary statistics, which is then used to predict the true parameter values by substituting the observed sample into the regression equation. Let us develop the algorithm: suppose that we have simulated independent pairs (p^(i), Z*_n^(i)), i = 1, ..., m, and that we calculate on the simulated data the summary statistic, S(·), described in the previous section, obtaining the independent pairs (p^(i), Z_n^(i)), i = 1, ..., m.

The following regression model is considered:

p^(i) = α + ∑_{j=1}^{n} (Z_j^(i) − Z_j^obs) β_j + E_i,    i = 1, ..., m,    (5)

where α and β_j, j = 1, ..., n, are s-dimensional parameter row vectors (recall that s is the cardinality of the support of the offspring distribution) and the s-dimensional random row vectors E_i, i = 1, ..., m, have zero-mean, uncorrelated and constant-variance components. In matrix form, (5) can be written as

P = DB + E,

with P = (p^(1), ..., p^(m))^t, D = (d_{ij}), i ∈ {1, ..., m}, j ∈ {1, ..., n+1}, where d_{i,1} = 1 and d_{i,j} = Z_{j−1}^(i) − Z_{j−1}^obs for j = 2, ..., n+1, i = 1, ..., m, B = (α, β_1, ..., β_n)^t and E = (E_1, ..., E_m)^t.

The linearity and additivity assumptions in (5) will often be implausible; however, they can be applied locally. Thus, the estimate of B is obtained by minimizing the weighted least-squares criterion

∑_{i=1}^{m} (p^(i) − D_{i·} B)² W_i,    (6)

where D_{i·} is the i-th row of the matrix D and {W_i} is a collection of weights. The solution to (6) is given by

B̂ = (D^t W D)^{−1} D^t W P,    (7)

with W a diagonal matrix whose i-th diagonal element is W_i. In [2] the use of the Epanechnikov kernel is recommended to determine the weights W_i. Thus, for each i ∈ {1, ..., m}, let t_i = ρ(Z_n^(i), Z_n^obs), and

W_i = c ε^{−1} (1 − (t_i/ε)²)  if t_i ≤ ε,    W_i = 0  if t_i > ε,    (8)

where c is a normalizing constant and ε the tolerance level.


For a generic metric ρ(·, ·), the following algorithm is proposed:

Likelihood-free sampler: Local-linear regression algorithm

for i = 1 to m do
    generate p^(i) ∼ π(p)
    generate Z*_n^(i) from the likelihood f(Z*_n | p^(i))
    calculate t_i = ρ(Z_n^(i), Z_n^obs), with Z_n^(i) = S(Z*_n^(i))
end for
pick out the runs with t_i ≤ ε; let I be the index set of the selected runs
define the weights W_i according to (8), i ∈ I
solve (6) to obtain B̂
calculate

p^(i*) = p^(i) − (Z_n^(i) − Z_n^obs)(β̂_1, ..., β̂_n)^t,  i ∈ I,    (9)

the selected runs being (p^(i*), Z_n^(i)), i ∈ I.
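A compact R sketch of this adjustment is given below (names are ours; psim is the m × s matrix of simulated offspring laws, zsim the m × n matrix whose rows are (Z_1^(i), ..., Z_n^(i)), zobs the observed vector (Z_1^obs, ..., Z_n^obs), and dists the m distances t_i). It fits one weighted least-squares regression per coordinate of p and applies the correction (9).

# Local-linear regression adjustment of the ABC output (a sketch; names are ours).
abc_loclinear <- function(psim, zsim, zobs, dists, eps) {
  keep <- which(dists <= eps)
  w <- 1 - (dists[keep] / eps)^2                 # Epanechnikov weights; the constant
                                                 # c * eps^{-1} cancels in the WLS fit
  D <- sweep(zsim[keep, , drop = FALSE], 2, zobs)            # rows: Z^(i) - Z^obs
  padj <- psim[keep, , drop = FALSE]
  for (j in seq_len(ncol(psim))) {               # one weighted LS fit per coordinate
    fit <- lm(psim[keep, j] ~ D, weights = w)
    beta <- coef(fit)[-1]                        # slopes (drop the intercept)
    beta[is.na(beta)] <- 0                       # guard against collinear columns
    padj[, j] <- psim[keep, j] - D %*% beta      # the correction (9), coordinate j
  }
  padj[apply(padj >= 0, 1, all), , drop = FALSE] # drop vectors with negative entries
}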

The outputs p^(i*) are s-dimensional vectors whose coordinates sum to one but, due to the regression, some of them can be negative. Such outputs must be removed from the sample. An alternative that guarantees that the adjusted parameters are probability vectors is to consider a transformation of the original responses. In our case, we set out a multinomial logistic regression by using as responses

q^(i) = ( log(p_1^(i)/p_s^(i)), ..., log(p_{s−1}^(i)/p_s^(i)) )

and solve Q = DB, with Q = (q^(1), ..., q^(m))^t, keeping the same notation for D and B (notice that now α and β_j, j = 1, ..., n, are (s−1)-dimensional parameter row vectors). Finally, setting q^(i*) = q^(i) − (Z_n^(i) − Z_n^obs)(β̂_1, ..., β̂_n)^t, Eq. (9) in the algorithm is replaced by p^(i*) = (p_1^(i*), ..., p_s^(i*)), with

p_j^(i*) = exp{q_j^(i*)} / (1 + ∑_{k=1}^{s−1} exp{q_k^(i*)}),  j = 1, ..., s−1,    and    p_s^(i*) = 1 / (1 + ∑_{k=1}^{s−1} exp{q_k^(i*)}).

As suggested in [2], for both the rejection and the regression methods we set the tolerance ε to be a quantile, q_δ, of the empirical distribution of the simulated distances ρ(Z_n, Z_n^obs), with Z_n = S(Z*_n). For instance, taking q_δ with δ = 0.15 for the regression methods means that the 15 % of the simulated S(Z*_n) that provide the smallest values of ρ(Z_n, Z_n^obs) are assigned a nonzero weight.

4 Simulated example

(Fig. 1: Evolution of the simulated population sizes {Z_0^obs, ..., Z_40^obs}; generations on the horizontal axis, number of individuals on the vertical axis.)

To compare the performance of the algorithms previously described, we consider a particular case of a control function that keeps the number of parents between two bounds. In an ecological context, a CBP with this kind of control function would be very useful, for example, to model the evolution of animal populations of invasive species that are widely recognized as a threat to native ecosystems but about whose eradication plans there is disagreement. That is, while the presence of the species is appreciated by part of society, if its numbers are left uncontrolled it is known to be very harmful to native ecosystems. In such a case it is better to control the population to keep it between admissible limits, even though this may mean periods when animals have to be culled. Two recent discussions of this topic are [10] and [33]. Thus, we have considered a CBP with offspring distribution

p_0 = 0.28398,  p_1 = 0.42014,  p_2 = 0.23309,  p_3 = 0.05747,  p_4 = 0.00531

and control function

φ(x) = 7 I_{{0<x≤7}} + x I_{{7<x≤20}} + 20 I_{{x>20}},    φ(0) = 0.

In our example, the choice of these particular values for the bounds responds to computational reasons. Let us denote by μ and σ² the mean and variance of the offspring law, called the offspring mean and variance, respectively. Under these conditions, μ = 1.08 and σ² = 0.7884. For this model, we have simulated 40 generations starting with Z_0 = 10.
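In R, the ingredients of this example can be set up as follows (a sketch using the hypothetical simulate_cbp helper of Sect. 3; the middle branch of the control function is read as the identity, i.e. no control between the bounds, and the seed is arbitrary, so this does not reproduce the exact trajectory of Fig. 1).

# Ingredients of the simulated example of Sect. 4 (a sketch).
support <- 0:4
p_true  <- c(0.28398, 0.42014, 0.23309, 0.05747, 0.00531)
mu      <- sum(support * p_true)                          # offspring mean, 1.08
sigma2  <- sum(support^2 * p_true) - mu^2                 # offspring variance, 0.7884
phi <- function(x) if (x == 0) 0 else max(7, min(x, 20))  # keep parents within [7, 20]
set.seed(1)
sim <- simulate_cbp(p_true, phi, support, n = 40, Z0 = 10)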

Figure 1 shows the evolution of the simulated population sizes. The horizontal lines represent the lower and upper control bounds. Using the standard classification for a classical BGW process, we refer to a subcritical, critical, or supercritical offspring law depending on whether μ is less than, equal to, or greater than one, respectively. It is known that a classical BGW process with a supercritical offspring law has a positive probability of non-extinction, and in that case it grows exponentially. However, the presence of a control function can have a drastic effect on the population. Considering this particular kind of control function and using the results in [15], it can be deduced that this process dies out with probability one, although the extinction time may be large, depending on the individuals' reproductive capacity, i.e., on the value of μ.


Table 1 Potential scale reduction factor and autocorrelation for p

        Potential scale reduction        Autocorrelation
        Est.      97.5 %                 lag 1      lag 100    lag 350
p0      1.02      1.02                   0.9407     0.2663      0.0373
p1      1.03      1.03                   0.9868     0.4143      0.0605
p2      1.03      1.04                   0.9863     0.4788      0.0696
p3      1.02      1.02                   0.9682     0.2143      0.0077
p4      1.01      1.01                   0.9372     0.1302     −0.0024

Table 2 Summary statistics for the posterior distributions of μ and σ²

        Mean      SD        MCSE      TSSE
μ       1.0515    0.0420    0.0006    0.0006
σ²      0.9656    0.2192    0.0030    0.0028

4.1 Evaluations and comparisons between MCMC and ABC methods

To this end we focus mainly on the estimation of μ, based on the population sizes in each generation.

Assuming that no prior information was available, we considered a Dirichlet prior distribution with parameters α_k = 1/2, k ∈ S, as suggested in [4], in order to apply the algorithms described in the previous sections.

For the Gibbs method, we set T = 200 and ran the algorithm described in Sect. 2 for 10,000 iterations for each chain. Using the Gelman–Rubin–Brooks diagnostic plots as a test of the convergence of the resulting probabilities to the stationary distribution, we set N = 1,000. Table 1 gives the estimated potential scale reduction factors together with their 97.5 % upper confidence bounds. That the values of the estimated scale reduction factors are close to one suggests that further simulations will not improve the values of the listed scalar estimators (see [6,11]). The table also shows that the batch size G = 350 is sufficient. The final sample size was 5,200.
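For reference, diagnostics of this kind can be obtained in R with the coda package along the following lines (a sketch; chain_list stands for the T parallel chains, stored as iteration × parameter matrices, and is not an object from the paper; argument names are as we recall them from the coda documentation and should be checked against the installed version).

# Convergence diagnostics for the T parallel Gibbs chains (a sketch).
library(coda)
chains <- mcmc.list(lapply(chain_list, mcmc))   # one mcmc object per chain
gelman.diag(chains)                             # potential scale reduction factors (Table 1)
gelman.plot(chains)                             # Gelman-Rubin-Brooks plots
autocorr.diag(chains, lags = c(1, 100, 350))    # autocorrelations at lags 1, 100, 350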

To evaluate the algorithm's efficiency, Table 2 gives some summary statistics for the posterior distributions of μ and σ². Note that, due to the batch procedure, the time-series standard errors (TSSE) are very close to the Monte Carlo standard errors (MCSE). Also, for both parameters, the standard errors (MCSE and TSSE) are less than 5 % of the posterior standard deviation (SD), indicating that the number of observations considered seems to be a reasonable choice.

Figure 2 shows the estimated posterior densities of μ and σ², together with their Bayes estimates under squared error loss and the 95 % HPD sets (considering the sample up to generation 40). One observes that the 95 % HPD sets for μ and σ² contain the true values of these parameters. These can be considered accurate estimates of the posterior distributions of the parameters.

We now analyze the performance of the ABC approach and compare it with the MCMC method.

For the ABC methods, it is worth remarking how easy it is to simulate samples from a CBP. Hence, we simulated a pool of 20 million CBPs up to generation 40, with Z_0 = 10, the previous control function (considered known), and offspring laws randomly drawn from the prior Dirichlet distribution used for the MCMC method.


(Fig. 2: Estimated posterior density for μ (left) and σ² (right), with the 95 % HPD sets marked.)

(Fig. 3: RMSE in the estimates of the offspring mean by MCMC and by rejection ABC (Euclidean, Hellinger, ℓ1 and ρ_s metrics), jointly with the standard errors (vertical bars).)

First, we study the behaviour of the different metrics by considering the rejection ABC method and paying attention to generation 10. A tolerance level equal to the 0.00025 quantile of the sample of distances is considered for each metric, so the size of the ABC samples used to approximate the posterior distribution of the offspring mean is 5,000. It is worth noting that the metrics considered rank the simulated data sets differently, so each one can lead to different results.

As a measure of the accuracy of the methods we consider the relative mean squared error (RMSE), also proposed in [2], calculated as n^{-1} ∑_{i=1}^{n} (μ_i − μ)²/μ², with n the sample size and μ_i the offspring mean of the reproduction law of the i-th selected sample.
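This quantity can be computed directly from the retained sample (a sketch; p_samples stands for the matrix of retained offspring laws, one per row, and mu for the true offspring mean; neither is an object from the paper).

# Relative mean squared error of the offspring-mean estimates (a sketch).
mu_hat <- as.vector(p_samples %*% support)   # offspring mean of each retained law
rmse   <- mean((mu_hat - mu)^2) / mu^2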

Figure 3 shows the RMSE of the estimates of μ by MCMC and by the rejection ABC method (for the four metrics), jointly with the standard errors (vertical bars). Interestingly, the differences between the RMSEs of the different approaches are not substantial (notice that the sample sizes are quite big: n = 5,000 for ABC and n = 5,200 for MCMC).


(Fig. 4: Left, comparison of the posterior densities obtained with MCMC and rejection ABC (for the four metrics); the vertical line corresponds to the true value of the offspring mean. Right, ISE for the posterior distributions estimated by rejection ABC under the four metrics, using as reference the posterior distribution given by the MCMC algorithm.)

(Fig. 5: Comparison of the posterior densities obtained with MCMC, rejection, local-linear regression and logistic regression ABC; the vertical line corresponds to the true value of the offspring mean.)

Figure 4, left, shows the estimated posterior density obtained by MCMC and the approximations given by the rejection ABC methodology for the four metrics. To compare the whole posterior distributions obtained by the rejection ABC methods we compute the integrated squared error (ISE), taking as reference the distribution provided by the MCMC methodology. As stated before, a performance indicator for ABC techniques is their ability to replicate the likelihood-based results given by MCMC. Figure 4, right, shows that the ISE obtained with the rejection ABC method under the metric ρ_s is the smallest.
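A simple way to compute such an ISE, assuming mu_mcmc and mu_abc are vectors of posterior draws of μ from the two methods (hypothetical objects; the grid limits are arbitrary), is the following sketch.

# Integrated squared error between the ABC and the reference MCMC posterior
# densities of mu, evaluated on a common grid (a sketch).
grid   <- seq(0.8, 1.4, length.out = 512)
d_mcmc <- density(mu_mcmc, from = 0.8, to = 1.4, n = 512)$y
d_abc  <- density(mu_abc,  from = 0.8, to = 1.4, n = 512)$y
ise    <- sum((d_abc - d_mcmc)^2) * diff(grid)[1]        # Riemann approximation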

In the following, we perform the post-processing of the ABC output considering only the ρ_s metric. This choice is motivated by the fact that ρ_s leads to the smallest ISE and to an RMSE similar to those of the other metrics; it has also been chosen because it is an ad hoc metric for this problem. Figure 5 shows the posterior densities of the offspring mean estimated by MCMC and by the rejection, local-linear regression and logistic regression methods. The smallest value of the ISE is found when performing the rejection ABC algorithm (ise_rejection = 0.104, ise_local-linear = 0.268 and ise_logistic = 2.091). The posterior densities estimated by local-linear regression and by logistic regression are, respectively, more concentrated and more dispersed around the offspring mean. We conclude that, for our problem, rejection ABC is better than the other schemes at providing an accurate estimate of the posterior distribution of the offspring mean.

(Fig. 6: Comparison of the posterior densities obtained with MCMC and rejection ABC at generations 20, 30 and 40; the vertical line corresponds to the true value of the offspring mean.)

Finally, Figure 6 compares across generations the behaviour of the posterior densities estimated by MCMC and by rejection ABC. As the simulation size is fixed (20 million), increasing the number of generations (that is, increasing the number of summary statistics) implies an increase in the tolerance level (in particular, a tolerance level of q_0.00025 of the corresponding distributions means, in absolute terms, ε = 0.104 for n = 10; ε = 0.108 for n = 20; ε = 0.153 for n = 30; and ε = 0.187 for n = 40). With respect to the ISEs, taking as reference the MCMC posterior density in each generation we obtain ise_generation 10 = 0.104, ise_generation 20 = 0.353, ise_generation 30 = 0.106 and ise_generation 40 = 0.324 (there is no monotonic relationship among them), but there could be an increase in the bias of the ABC approximation as the number of generations grows. In general, the fact that the bias may increase with the number of summary statistics is known as the curse of dimensionality. One possibility for alleviating this phenomenon is the search for different summary statistics that allow possible dimension reductions. Our goal in this paper has mainly been to approximate the posterior distribution π(p | Z_n), so the summary statistic was fixed and determined by the generation-by-generation population sizes. In any case, beyond the objectives of this paper, it is important to point out that the ABC methodology provides a new way to set out the inference problems that arise in the context of branching processes, namely the search for summary statistics S(Z_n), with possible dimension reductions, that provide posterior distributions π(p | S(Z_n)) close enough to π(p | Z_n).

5 Concluding remark

We have considered Bayesian inference for controlled branching processes with a deterministic control function. Through a simulated study, one can establish that for this kind of branching model the MCMC method, based on the Gibbs sampler, works quite well, allowing inference on the parameters of the model under realistic samples consisting of the observed population size in each generation. Computationally, however, this methodology has a substantial cost in computing time: the mean computation time for a chain of length 10,000 was 1,680 seconds using an Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40 GHz processor. In this framework, the rejection ABC algorithm with the ρ_s metric behaves better than the others considered in the paper and represents a good alternative, showing a sufficiently accurate approximation to the MCMC estimate while reducing the computation time: the mean computation time for a block of 100,000 simulations was 110 seconds using the same processor.

Remark 1 The simulations were performed with the statistical software R (see [29]), using the coda package (see [27]) for convergence diagnostics.

Acknowledgments The authors would like to thank the associate editor and the reviewers for providing valuable comments and suggestions which have significantly improved this paper. They are also grateful to Professors Horacio González-Velasco and Carlos García-Orellana for providing them with computational support.

References

1. Bagley, J.H.: On the almost sure convergence of controlled branching processes. J. Appl. Prob. 23, 827–831 (1986)
2. Beaumont, M.A., Zhang, W., Balding, D.J.: Approximate Bayesian computation in population genetics. Genetics 162, 2025–2035 (2002)
3. Beran, R.: Minimum Hellinger distance estimates for parametric models. Ann. Stat. 5, 445–463 (1977)
4. Berger, J., Bernardo, J.: Ordered group reference priors with applications to a multinomial problem. Biometrika 79, 25–37 (1992)
5. Blum, M.G.B., Tran, V.C.: HIV with contact tracing: a case study in approximate Bayesian computation. Biostatistics 11, 644–660 (2010)
6. Brooks, S.: Markov Chain Monte Carlo method and its application. J. R. Stat. Soc. Ser. D (The Statistician) 47, 69–100 (1998)
7. Bruss, F.T.: A counterpart of the Borel-Cantelli lemma. J. Appl. Prob. 17, 1094–1101 (1980)
8. Dion, J.P., Essebbar, B.: On the statistics of controlled branching processes. Lect. Notes Stat. 99, 14–21 (1995)
9. Dean, T.A., Singh, S.S., Jasra, A., Peters, G.W.: Parameter Estimation for Hidden Markov Models with Intractable Likelihoods. http://arxiv.org/abs/1103.5399
10. Ellis, M., Elphick, C.: Using a stochastic model to examine the ecological, economic and ethical consequences of population control in a charismatic invasive species: mute swans in North America. J. Appl. Ecol. 44, 312–322 (2007)
11. Gelman, A., Rubin, D.: Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–511 (1992)
12. González, M., Martín, J., Martínez, R., Mota, M.: Non-parametric Bayesian estimation for multitype branching processes through simulation-based methods. Comput. Stat. Data Anal. 52, 1281–1291 (2008)
13. González, M., Martínez, R., del Puerto, I.: Nonparametric estimation of the offspring distribution and mean for a controlled branching process. Test 13, 465–479 (2004)
14. González, M., Martínez, R., del Puerto, I.: Estimation of the variance for a controlled branching process. Test 14, 199–213 (2005)
15. González, M., Molina, M., del Puerto, I.: On the class of controlled branching process with random control functions. J. Appl. Prob. 39, 804–815 (2002)
16. González, M., Molina, M., del Puerto, I.: On the geometric growth in controlled branching process with random control functions. J. Appl. Prob. 40, 995–1006 (2003)
17. González, M., Molina, M., del Puerto, I.: Limiting distribution for subcritical controlled branching processes with random control function. Stat. Probab. Lett. 63, 227–284 (2004)
18. González, M., Molina, M., del Puerto, I.: Asymptotic behaviour for the critical controlled branching process with random control function. J. Appl. Probab. 42, 463–477 (2005)
19. González, M., Molina, M., del Puerto, I.: On the L2-convergence of controlled branching processes with random control functions. Bernoulli 11, 37–46 (2005)
20. González, M., del Puerto, I.: Diffusion approximation of an array of controlled branching processes. Methodol. Comput. Appl. Probab. (2012). doi:10.1007/s11009-012-9285-8
21. Guttorp, P.: Statistical Inference for Branching Processes. Wiley, New York (1991)
22. Marin, J.M., Pudlo, P., Robert, C.P., Ryder, R.: Approximate Bayesian Computational methods. Stat. Comput. 21(2), 289–291 (2011)
23. Mendoza, M., Gutiérrez-Peña, E.: Bayesian Conjugate Analysis for the Galton–Watson Process. Test 9, 149–172 (2000)
24. Molina, M., González, M., Mota, M.: Some theoretical results about superadditive controlled Galton–Watson branching processes. In: Husková, M., Lachout, P., Visek, J. (eds.) Proceedings of Prague Stochastic'98. Union of Czech Mathematicians and Physicists (1998)
25. Martínez, R., Mota, M., del Puerto, I.: On asymptotic posterior normality for controlled branching processes with random control function. Statistics 43, 367–378 (2009)
26. Plagnol, V., Tavaré, S.: Approximate Bayesian computation and MCMC. In: Niederreiter, H. (ed.) Proceedings of Monte Carlo and Quasi-Monte Carlo Methods 2002, pp. 99–114. Springer, Berlin (2004)
27. Plummer, M., Best, N., Cowles, K., Vines, K.: coda: Output analysis and diagnostics for MCMC (2010). http://CRAN.R-project.org/package=coda. R package version 0.13-5
28. Pritchard, J., Seielstad, M., Perez-Lezaun, A., Feldman, M.: Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol. Biol. Evol. 16, 1791–1798 (1999)
29. R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2012). http://www.R-project.org. ISBN 3-900051-07-0
30. Sevastyanov, B.A., Zubkov, A.: Controlled branching processes. Theor. Probab. Appl. 19, 14–24 (1974)
31. Sriram, T., Bhattacharya, A., González, M., Martínez, R., del Puerto, I.: Estimation of the offspring mean in a controlled branching process with a random control function. Stoch. Proc. Appl. 117, 928–946 (2007)
32. Tierney, L.: Markov chains for exploring posterior distributions. Ann. Stat. 22, 1701–1762 (1994)
33. Todd, C., Forsyth, D., Choquenot, D.: Modelling the effect of fertility control on koala-forest dynamics. J. Appl. Ecol. 45, 568–578 (2007)
34. Yanev, N.M.: Conditions for degeneracy of ϕ-branching processes with random ϕ. Theor. Probab. Appl. 20, 421–428 (1975)
35. Zubkov, A.M.: Analogies between Galton–Watson processes and ϕ-branching processes. Theor. Probab. Appl. 19, 309–331 (1974)