Optimal Bayesian experimental designs for complex models

Mahasen Bandara Dehideniya
Bachelor of Science (Statistics with Computer Science), University of Colombo
under the supervision of
Principal Supervisor: Associate Professor James M. McGree
Associate Supervisor: Associate Professor Christopher C. Drovandi
Mathematical Sciences
Faculty of Science and Engineering
Queensland University of Technology
2019
Submitted in fulfillment of the requirements of the degree of
Doctor of Philosophy
Abstract
Experimental design methods are used in areas such as epidemiology, systems biology and ecology to collect informative data. The increasingly complex systems studied in these areas mean that realistic statistical models, with hidden states or auxiliary variables potentially arranged in hierarchies, are required to describe the observed data. This in turn imposes significant challenges in experimental design, as the informativeness of a given design is typically assessed based on a candidate model or set of competing models given previously collected or expert-elicited data. Design and inference are therefore naturally linked, so as more complex models are developed and used in real-world problems, new methods in design also need to be considered.
The problem of designing efficient experiments has been addressed in both the frequentist and Bayesian literature. In contrast with the frequentist approach, Bayesian methods for designing experiments provide a mathematically rigorous framework for handling uncertainty about, for example, the parameter values and the data-generating model. However, Bayesian designs are much more computationally expensive to evaluate than frequentist designs, even for simple models, as they require a large number of posterior evaluations. Consequently, Bayesian designs have been limited to relatively simple models, and further research is needed for complex models to address real-world design problems in areas such as epidemiology, ecology and pharmacology.
The work considered in this thesis is motivated by experiments in epidemiology. Unfortunately, most epidemiological models have computationally expensive or intractable likelihoods, meaning that evaluating the likelihood many times is computationally infeasible. In recent years, methods that facilitate Bayesian inference, at least approximately, in such settings have become well established (i.e. approximate Bayesian computation; ABC) and are used in many areas including epidemiology. In contrast, Bayesian design in such settings has received little attention. To address this gap in the literature, this thesis proposes new methodologies to design experiments for discriminating between competing models using various adaptations and developments of ABC methods.
In the presence of uncertainty about both the model and its parameters, conducting experiments to resolve only a single source of uncertainty is inefficient given the high cost of conducting experiments, particularly in epidemiology. However, dual-purpose experiments in epidemiology have not previously been considered due to a lack of methods for efficiently approximating the utility functions used to quantitatively evaluate designs. The developments in approximate Bayesian inference proposed in this thesis address this need: we demonstrate that our proposed methods can be used to locate dual-purpose designs in settings where the likelihood is computationally intractable.
Of the little research that has been conducted in Bayesian design for models with intractable likelihoods, all has been limited to designs with a small number of dimensions (around four). Unfortunately, realistically sized experiments require many more design dimensions to be considered, rendering current approaches inapplicable. Consequently, our computational methods enable designs to be found in high dimensions, and this is demonstrated for a variety of utility functions including those for parameter estimation, model discrimination and dual-purpose experiments. The methodological developments proposed in this thesis enable the derivation of efficient experiments for understanding biological processes in epidemiology and ecology. Primarily, the proposed methods can be used to understand how a disease spreads through large-scale agricultural fields or livestock. Such an understanding can lead to the development of appropriate and informed measures for early detection and control to prevent large-scale spread of the disease. In addition, these methods can be used to find efficient experiments in ecology to understand biological phenomena such as predator-prey interactions, which can inform the development of policies for sustainable environments and the protection of endangered species.
Declaration
I hereby declare that this submission is my own work and, to the best of my knowledge, it contains no material previously published or written by another person, nor material which to a substantial extent has been accepted for the award of any other degree or diploma at QUT or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by colleagues, with whom I have worked at QUT or elsewhere, during my candidature, is fully acknowledged.

I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.
QUT Verified Signature, 05/07/2019
Acknowledgements
First and foremost, I would like to thank my supervisors for giving me the opportunity to complete my PhD thesis under their supervision. I would like to express my deepest and most sincere gratitude to my principal supervisor, Associate Professor James McGree, for his invaluable advice and patience in guiding me at each step of my PhD research journey. I would like to thank Associate Professor Christopher Drovandi for his continuous support and guidance throughout this work. It was a great privilege and honour to learn and develop my research skills under their supervision.
This research would never have been possible without the support of various people at QUT, Brisbane. Firstly, I am grateful for the financial support provided by QUT to cover my living expenses in Brisbane and the tuition fees to complete my PhD at QUT. I would also like to thank the staff of the high-performance computing facility at QUT for their support in conducting my computationally intensive simulation studies. I extend my thanks to my BRAG and ACEMS friends for their constant encouragement and genuine support throughout my PhD life.
I offer my sincerest thanks to the University of Peradeniya for providing study leave and allowing me to continue my PhD at QUT. Furthermore, my special thanks go to the staff of the Department of Statistics and Computer Science at the University of Peradeniya for their tremendous support and guidance.
I am incredibly grateful to my parents for their continuous love and guidance throughout my life. I also express my thanks to my beloved sister and brother-in-law for their support. My heartfelt thanks go to my partner, Arosha, for her endless love and patience. I would also like to thank all my friends living in Brisbane for their great friendship, which has made me feel at home during the last four years.
List of Publications Arising from this Thesis
Chapter 3: Dehideniya M. B., Drovandi C. C., and McGree J. M. (2018). Optimal Bayesian design for discriminating between models with intractable likelihoods in epidemiology, Computational Statistics & Data Analysis, 124: 277-297.

Chapter 4: Dehideniya M. B., Drovandi C. C., and McGree J. M. Dual purpose Bayesian design for parameter estimation and model discrimination in epidemiology using a synthetic likelihood approach, Bayesian Analysis (submitted for publication).

Chapter 5: Dehideniya M. B., Drovandi C. C., Overstall A. M., and McGree J. M. A synthetic likelihood-based Laplace approximation for efficient design of biological processes, Electronic Journal of Statistics (submitted for publication).
Contents
Abstract i
Declaration iii
Acknowledgements v
List of Publications Arising from this Thesis vii
Chapter 1 Introduction 7
  1.1 Motivation 7
    1.1.1 Research aim and objectives 8
  1.2 Research contribution 9
    1.2.1 Contribution to methodology 9
    1.2.2 Contribution to application 9
  1.3 Research scope 10
  1.4 Thesis structure 10

Chapter 2 Literature Review 13
  2.1 Introduction 13
  2.2 Bayesian inference 14
    2.2.1 Markov chain Monte Carlo 15
    2.2.2 Importance sampling 16
    2.2.3 Laplace approximation 17
  2.3 Approximate Bayesian computation 17
    2.3.1 Synthetic likelihood 19
    2.3.2 Indirect inference 20
  2.4 Bayesian experimental designs 20
    2.4.1 Utility functions for parameter estimation 22
    2.4.2 Utility functions for model discrimination 22
    2.4.3 Utility functions for dual experimental goals 23
  2.5 Experimental designs for models with intractable likelihood 24
  2.6 Optimisation algorithm 25
  2.7 Conclusion 26

Chapter 3 Optimal Bayesian design for discriminating between models with intractable likelihoods in epidemiology 29
  3.1 Abstract 31
  3.2 Introduction 31
  3.3 Bayesian model choice 34
  3.4 Bayesian experimental design 34
    3.4.1 Utility function for parameter estimation 35
    3.4.2 Utility functions for model discrimination 36
  3.5 Approximate Bayesian computation (ABC) and utility estimation 37
    3.5.1 ABC for parameter estimation 37
    3.5.2 ABC for model choice 38
    3.5.3 Estimating the model discrimination utility functions 39
  3.6 Optimisation algorithm 41
    3.6.1 Refined coordinate exchange (RCE) algorithm 41
  3.7 Examples 43
    3.7.1 Example 1 - Designs for parameter estimation of a pharmacokinetic model 45
    3.7.2 Example 2 - Designs for model discrimination 47
    3.7.3 Example 3 - Designs for model discrimination 54
  3.8 Discussion 58

Chapter 4 Dual purpose Bayesian design for parameter estimation and model discrimination in epidemiology using a synthetic likelihood approach 61
  4.1 Abstract 63
  4.2 Introduction 63
  4.3 Inference framework 65
  4.4 Bayesian experimental designs 68
    4.4.1 Dual purpose utility function of parameter estimation and model discrimination 69
    4.4.2 Utility function for parameter estimation 70
    4.4.3 Utility function for model discrimination 71
  4.5 Examples 71
    4.5.1 Example 1 - Death and SI models 74
    4.5.2 Example 2 - SIR and SEIR models 79
  4.6 Discussion 84

Chapter 5 A synthetic likelihood-based Laplace approximation for efficient design of biological processes 85
  5.1 Abstract 87
  5.2 Introduction 87
  5.3 Inference framework 88
  5.4 Bayesian experimental designs 90
    5.4.1 Utility functions for parameter estimation 92
    5.4.2 Utility function for model discrimination 93
    5.4.3 Utility function for dual purpose experiments 93
    5.4.4 Optimisation algorithm 94
  5.5 Examples 94
    5.5.1 Example 1 - Dual purpose designs for the death and SI models 94
    5.5.2 Example 2 - Dual purpose designs for foot and mouth disease 101
    5.5.3 Example 3 - Design for parameter estimation of predator-prey model 106
  5.6 Discussion 109

Chapter 6 Conclusion 111
  6.1 Summary 111
  6.2 Limitations and future work 113
Appendix A Supplementary Material for Chapter 3: ‘Optimal Bayesian design for discriminating between models with intractable likelihoods in epidemiology’ 115
  A.1 Monte Carlo error of utility estimation in Example 2 116
  A.2 Performance of optimisation algorithms in locating three- and four-point designs for discriminating between Models 1 and 2 117
  A.3 Performance of the optimal designs 118
    A.3.1 Performance of the optimal designs for model discrimination - Model 2 118
    A.3.2 Performance of the optimal designs for model discrimination - Model 3 119
    A.3.3 Performance of the optimal designs for model discrimination - Model 4 120

Appendix B Supplementary Material for Chapter 4: ‘Dual purpose Bayesian design for parameter estimation and model discrimination in epidemiology using a synthetic likelihood approach’ 121
  B.1 Derivation of the total entropy utility for static designs 121
    B.1.1 Expected change in H_M 121
    B.1.2 Expected change in H_P 123
    B.1.3 Expected change in H_T 125
  B.2 Comparison between synthetic likelihood approach and ABC rejection method 127

Appendix C Supplementary Material for Chapter 5: ‘A synthetic likelihood-based Laplace approximation for efficient design of biological processes’ 129
  C.1 Informativeness of summary statistics 129
    C.1.1 Death model 129
    C.1.2 SI model 130
List of Figures
3.1 Trace plots of U(d) at each iteration of the ACE and RCE algorithms in locating 15 optimal blood sampling times based on a discretised design space. 47
3.2 Prior predictive distributions of Models 1 (solid) and 2 (dashed). Here, dot-dashed and dotted lines represent the 2.5% and 97.5% prior predictive quantiles of Models 1 and 2, respectively. 48
3.3 Comparison of the estimated expected utility of the mutual information utility (first row), the Zero-One utility (second row) and the Ds-optimality utility (third row) using ABC likelihoods and actual likelihoods. In each plot, the y = x line indicates a perfect match between approximated and actual utility evaluations. 49
3.4 Trace plots of the expected utility for each run of the ACE-D and RCE algorithms in locating optimal two-point designs based on (a) the mutual information utility, (b) the Ds-optimality utility and (c) the Zero-One utility. 51
3.5 Empirical cumulative probabilities of the posterior model probability of Model 1 (true model) obtained for observations generated from Model 1 according to optimal designs for discriminating between Models 1 and 2, and random designs. 53
3.6 Empirical cumulative probabilities of the posterior model probability of Model 2 (true model) obtained for observations generated from Model 2 according to optimal designs for discriminating between Models 1 and 2, and random designs. 54
3.7 Empirical cumulative probabilities of the ABC posterior model probability of Model 1 (true model) obtained for observations generated from Model 1 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs. 57
4.1 Comparison of the estimated expected utility of the mutual information utility (first row), the total entropy utility (second row) and the KLD utility (third row) using synthetic and actual likelihoods. In each plot, the y = x line indicates a perfect match between approximated and actual utility evaluations. 75
4.2 The posterior model probability of the data generating model obtained for observations generated from (a) the death model and (b) the SI model according to optimal designs and random designs. 77
4.3 Log determinant of the posterior covariance of the parameters of (a) the death model and (b) the SI model obtained for observations generated from the corresponding model according to optimal designs and random designs. 78
4.4 The approximated posterior model probability of (a) the SIR model and (b) the SEIR model obtained for observations generated from the corresponding model according to optimal designs and random designs. 81
4.5 Log determinant of the approximated posterior covariance of the parameters of (a) the SIR model and (b) the SEIR model obtained for observations generated from the corresponding model according to optimal designs and random designs. 83
5.1 Comparison of the estimated expected utility of designs with 15 design points according to the (a) total entropy, (b) mutual information and (c) KLD utilities using the Laplace approximation based on synthetic and actual likelihoods. Here, designs with a biased estimate of the expected utility are represented by (×). In each plot, the y = x line indicates a perfect match between approximated and actual utility evaluations. 96
5.2 Prior predictive distributions of the number of infecteds based on the death model (solid) and the SI model (dashed) are given in subfigure (a). Here, dot-dashed and dotted lines represent the 10% and 90% prior predictive quantiles of the death and SI models, respectively. Subfigure (b) illustrates the optimal designs found under the total entropy utility (∗) along with the KLD utility (×) and mutual information utility (+) for the death and SI models. 97
5.3 The posterior model probability of the data generating model obtained for observations generated from (a) the death model and (b) the SI model according to optimal designs and an equally spaced design. 99
5.4 The log determinant of the inverse of the posterior variance-covariance matrix of the parameters of the data generating model when observations are generated from (a) the death model and (b) the SI model according to optimal designs and an equally spaced design. 100
5.5 Subfigures (a) and (b) show the prior predictive distributions of infectious and recovered individuals based on the SIR (solid) and SEIR (dashed) models. In both figures, dotted and dot-dashed lines represent the 10% and 90% prior predictive quantiles based on the SIR and SEIR models, respectively. Optimal designs found under the total entropy utility (∗) along with the KLD utility (×) and mutual information utility (+) for the SIR and SEIR models are illustrated in subfigure (c). 102
5.6 The posterior model probability of the data generating model obtained for observations generated from (a) the SIR model and (b) the SEIR model according to optimal designs and an equally spaced design. 104
5.7 The log determinant of the inverse of the posterior variance-covariance matrix of the parameters of the data generating model when observations are generated from (a) the SIR model and (b) the SEIR model according to optimal designs and an equally spaced design. 105
5.8 Prior predictive distributions of prey and predators are given in subfigures (a) and (b), respectively. In both figures, dotted lines represent the 10% and 90% prior predictive quantiles of prey and predators. Plot (c) illustrates the optimal designs found under the KLD utility (+) and NSEL utility (∗) for estimating parameters of the modified LV model. 107
5.9 The log determinant of the inverse of the posterior variance-covariance matrix of the parameters of the data generating model when observations are generated from the modified LV model according to optimal designs and an equally spaced design. 109
A.1 Monte Carlo error of the estimated expected utility of the mutual information utility (first row), the Ds-optimality utility (second row) and the Zero-One utility (third row). In each plot, the solid line represents the Monte Carlo error associated with the estimated utility of the optimal design found under each utility, and dashed lines represent the Monte Carlo errors for randomly selected designs. 116
A.2 Empirical cumulative probabilities of the ABC posterior model probability of Model 2 (true model) obtained for observations generated from Model 2 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs. 118
A.3 Empirical cumulative probabilities of the ABC posterior model probability of Model 3 (true model) obtained for observations generated from Model 3 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs. 119
A.4 Empirical cumulative probabilities of the ABC posterior model probability of Model 4 (true model) obtained for observations generated from Model 4 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs. 120
B.1 Comparison of the accuracy of the estimated expected utility of the mutual information utility using the synthetic likelihood based on 1 × 10^6 model simulations (first column), and ABC-MC based on 1 × 10^6 (second column) and 2 × 10^6 (third column) model simulations. In each plot, the y = x line indicates a perfect match between approximated and actual utility evaluations. 128
C.1 Scatter plot between model parameter b and summary statistics, (a) mean and (b) variance, of observations simulated from the death model according to a random design with 15 design points. 129
C.2 Scatter plot between model parameter b1 and summary statistics, (a) mean and (b) variance, of observations simulated from the SI model according to a random design with 15 design points. 130
C.3 Scatter plot between model parameter b2 and summary statistics, (a) mean and (b) variance, of observations simulated from the SI model according to a random design with 15 design points. 130
List of Tables
3.1 Performance of optimisation algorithms in locating 15-point designs for parameter estimation of the PK model. 46
3.2 Performance of optimisation algorithms in locating two-point designs for discriminating between Models 1 and 2 based on different utility functions. 50
3.3 Optimal designs for discriminating between Model 1 and Model 2 derived under different utility functions. 52
3.4 Utility of optimal designs for discriminating between Models 1, 2, 3 and 4 derived under different utility functions. 56
3.5 Utility of optimal designs for discriminating between Models 1, 2, 3 and 4 derived under the mutual information utility. 57
4.1 Optimal designs derived under different utility functions. 76
4.2 Optimal designs derived under different utility functions. 80
5.1 Expected utility values (standard deviation) of optimal designs derived under different utility functions. 98
5.2 Expected utility values (standard deviation) of optimal designs derived under different utility functions. 103
5.3 Expected utility values (standard deviation) of optimal designs derived under the KLD and NSEL utility functions. 108
A.1 Performance of optimisation algorithms in locating three- and four-point designs for discriminating between Models 1 and 2 based on different utility functions. 117
1 Introduction
1.1 Motivation
Understanding the mechanisms underpinning the dynamics of biological systems is important in many areas such as epidemiology, ecology and systems biology. For instance, in epidemiology, understanding the dynamics of an infectious disease spreading among a population of animals or plants is crucial for developing targeted strategies for detection, prevention and control. In these areas, experiments are one of the main methods of collecting data to discover new knowledge about the process of interest, and these experiments should be carefully designed to ensure the data to be obtained are as informative as possible. However, the increasing complexity of models for biological systems presents new challenges in the design of experiments, motivating the need to develop modern statistical methods in design for real-world experiments in biology.
The problem of designing efficient experiments has been addressed in both the frequentist and Bayesian literature. In both approaches, optimal designs are selected based on a utility function defined to reflect the worth of the data to be obtained from the experiment in achieving the intended experimental goals, such as parameter estimation, model discrimination and/or prediction. In general, existing information from previous experiments and expert knowledge can be used to design efficient experiments. In contrast to the frequentist approach, Bayesian methods for designing experiments provide a principled framework to incorporate prior information into the selection of optimal designs. This is achieved by defining a utility function based on the posterior distribution, which is a combination of prior information and information from the data (through the likelihood function). Therefore, Bayesian designs can yield data which provide information that is additional to what is already known. This is a significant advantage over the frequentist approach, where prior information is not typically included in the analysis.
In addition, uncertainty is most rigorously handled within a Bayesian framework. This enables the construction of important utility functions for learning about specific unknowns. For instance, the mutual information utility for model discrimination (Box and Hill, 1967) selects designs based on the posterior model probability. Such a utility function therefore captures (and handles) uncertainty across all outcomes within the model space. Another useful utility is the total entropy utility (Borth, 1975), which is constructed based on entropy (i.e. uncertainty) to design dual purpose experiments for parameter estimation and model discrimination. In the frequentist context, dual purpose utility functions are typically defined as a weighted sum of utility functions for each experimental goal, for example, the DT-optimality criterion (Atkinson, 2008) and the DKL-optimality criterion (Tommasi, 2009). Consequently, the experimenter must select suitable weights for each goal, which is not straightforward. In contrast, the total entropy utility does not require such a selection of weights, as these are determined by the uncertainty in the prior information.
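To make the mutual information utility concrete, the sketch below estimates it by Monte Carlo for a hypothetical toy problem (two count models whose mean functions coincide at design point d = 1 but diverge elsewhere; the models, prior and sample sizes are all illustrative assumptions, not the epidemic models considered in this thesis). The utility rewards designs under which the data-generating model receives high posterior probability.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(0)

def poisson_pmf(y, mu):
    # Poisson pmf evaluated in log space for numerical stability.
    return np.exp(y * np.log(mu) - mu - lgamma(y + 1))

def mean_fn(m, theta, d):
    # Two illustrative candidate mean functions (stand-ins only):
    # Model 0 has mean theta * d, Model 1 has mean theta * d**2.
    return theta * d if m == 0 else theta * d ** 2

def evidence(m, y, d, n=500):
    # Monte Carlo evidence p(y | m, d): average the likelihood over
    # draws of theta from a Gamma(2, 0.5) prior.
    theta = rng.gamma(2.0, 0.5, size=n)
    return poisson_pmf(y, mean_fn(m, theta, d)).mean()

def mutual_information_utility(d, n_sim=200):
    # U(d) = E[log p(m | y, d)], averaging over models (uniform prior)
    # and prior predictive data; higher values indicate designs that
    # discriminate the models better.
    total = 0.0
    for _ in range(n_sim):
        m = int(rng.integers(2))
        y = rng.poisson(mean_fn(m, rng.gamma(2.0, 0.5), d))
        ev = np.array([evidence(0, y, d), evidence(1, y, d)])
        total += np.log(ev[m] / ev.sum())
    return total / n_sim

# At d = 1 the two mean functions coincide, so the data cannot
# discriminate the models; a design where they differ should score higher.
print(mutual_information_utility(3.0) > mutual_information_utility(1.0))
```

In the thesis setting the likelihood inside `evidence` is intractable, which is precisely why the ABC and synthetic likelihood approximations of Chapters 3 to 5 are needed.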
Often, models used to describe biological processes have likelihood functions that are not available in closed form or are expensive to evaluate a large number of times. As a consequence, Bayesian design in these areas has been limited to low dimensions, a single model and utility functions for parameter estimation. However, in practice, the true biological process governing, for example, the spread of a disease within a population, is rarely known. Therefore, new methodologies are required to handle multiple competing models and to discriminate between models with intractable likelihoods.
Further, experiments with multiple goals, such as parameter estimation and model discrimination, have been proposed in the design literature as a way of reducing the number of experiments needed to learn about the process of interest. These dual purpose experiments are useful in the context of epidemiological experiments, where the cost of conducting experiments is high and the number of experiments is restricted by ethical concerns about using animals as experimental units. Currently, there are no existing methods to design such experiments for models with intractable likelihoods.
This thesis, therefore, aims to develop new Bayesian computational algorithms to advance the field of design of experiments and, through their application, advance applied areas such as epidemiology and ecology. This is detailed in the next section.
1.1.1 Research aim and objectives
The central aim of this thesis is to develop methods to design experiments for models with intractable likelihoods, with particular applications in epidemiology and ecology. This aim will be achieved by addressing the following objectives:
1. Develop a method to design experiments for discriminating between models with
intractable likelihoods.
2. Develop a new optimisation algorithm for computationally expensive utility func-
tions.
3. Develop a method to design dual purpose experiments for models with intractable
likelihoods.
4. Extend Objective 3 to find high dimensional designs for models with intractable
likelihoods.
1.2 Research contribution
Addressing each of the above objectives will provide methodological contributions to the
field of Bayesian experimental design and applied areas for experimenting in epidemiology
and ecology. These specific contributions are outlined below.
1.2.1 Contribution to methodology
Objective 1 aims to develop and implement a computationally efficient algorithm to ap-
proximate three model discrimination utility functions for models with intractable likeli-
hoods. The algorithm will be based on methods from approximate Bayesian computation
(ABC), and will facilitate efficient approximations to posterior distributions. Such devel-
opments will provide the first approach to address model uncertainty in Bayesian design
for models with intractable likelihoods.
In general, evaluating utility functions in Bayesian design is computationally expensive,
and this becomes more computationally demanding for models with intractable likeli-
hoods. This presents significant challenges when optimising the design, particularly in
high dimensions. Therefore, to address Objective 2, we will extend the coordinate ex-
change algorithm such that it can be used to efficiently optimise expensive utility func-
tions. As a result, optimal designs will be found with a relatively small number of utility
evaluations facilitating high dimensional designs to be found in a timely manner.
Despite the advantages of conducting the dual purpose experiments, there has not been
any methodological development to design such experiments for models with intractable
likelihoods. We address this lack of methods through Objective 3 by developing an exten-
sion of the synthetic likelihood approach to handle discrete data. The proposed likelihood
approximation facilitates the evaluation of a wide range of utility functions including the
total entropy utility for designing dual purpose experiments for models with intractable
likelihoods.
For models with intractable likelihoods, limited approaches have been proposed for de-
signing experiments to collect a reasonably large number of observations. Objective 4
focuses on developing a synthetic likelihood-based Laplace approximation to efficiently
approximate posterior distributions. This extends the methodologies from Objective 3 to
design high dimensional experiments for models with intractable likelihoods.
1.2.2 Contribution to application
The methods developed in this thesis are predominantly applied to design experiments in
epidemiology to generate new knowledge about infectious diseases such as foot and mouth
disease. Designing such experiments to learn about the appropriate model to describe
the spread of an infectious disease enables the discovery of mechanisms that potentially
promote, limit and/or initialise the spread of the infection. Such knowledge can therefore
be used to develop targeted detection, prevention and control strategies. Further, the
development of methods to design dual purpose experiments for model discrimination
and parameter estimation reduces the number of experiments needed. This facilitates more ethical experimentation with animals.
1.3 Research scope
The scope of this research is the development of new Bayesian methods to design experiments
for models with intractable likelihoods as encountered in areas such as epidemiology and
ecology. This is distinctly different from frequentist approaches for models with tractable
likelihoods which are well-developed for, for example, linear models (Montgomery, 2017),
generalised linear models (Biedermann and Woods, 2011, Dror and Steinberg, 2006, Mc-
Gree and Eccleston, 2008, 2012, Woods et al., 2006, Wu and Stufken, 2014), multiple
response models (Denman et al., 2011, Perrone and Muller, 2016) and survival models
(Konstantinou et al., 2015, McGree, 2010).
1.4 Thesis structure
This thesis consists of one published and two submitted journal articles which are first
authored by the candidate. Since these chapters have been written as independent pub-
lications, there is some overlap between them. That is, each of Chapters 3 to 5 contains a relevant literature review and methodology section, but a more comprehensive literature review is provided in Chapter 2.
Chapter 3 presents the proposed methodological developments to design experiments for
discriminating between models with intractable likelihoods using methods from ABC. Fur-
ther, an extended coordinate exchange algorithm, called the Refined coordinate exchange
algorithm, is proposed to reduce the computational burden of locating optimal designs for models with intractable likelihoods. This chapter addresses Objectives 1 and 2 of this thesis, and has appeared in Computational Statistics & Data Analysis.
Chapter 4 addresses Objective 3 and presents a novel approach to design dual purpose
experiments for models with intractable likelihoods using an extension of synthetic like-
lihood for discrete observations. This work is motivated by the application of designing
dual purpose experiments to study foot and mouth disease, and has been submitted for
publication in Bayesian Analysis.
Chapter 5 extends the methodologies from Chapter 4 to facilitate high dimensional de-
sign via a synthetic likelihood-based Laplace approximation to posterior inference. The
proposed approach is validated through an illustrative example and also applied to find
high dimensional designs for motivating examples in epidemiology and ecology. The developments presented in this chapter have been submitted for publication in the Electronic Journal of Statistics.
Finally, Chapter 6 summarises the key findings from Chapters 3 to 5. The limitations of
this work are then discussed, and avenues for future research are proposed.
2 Literature Review
“To consult the statistician after an experiment is finished is often merely to ask him to
conduct a post mortem examination. He can perhaps say what the experiment died of.”
- Sir Ronald Aylmer Fisher
2.1 Introduction
The task of designing experiments is crucial in most scientific exploration.
In conducting an experiment, the experimenter has the freedom to select values for some
variables associated with the process of interest. These variables are referred to as control
variables, and different values of these may result in variations of the response variable
which is measured by the experimenter. Therefore, prior to conducting an experiment,
the experimenter should decide the most appropriate values for the control variables such
that the collected data are as informative as possible. The combination of values for all
control variables is referred to as the design. The field of design of experiments provides
a collection of statistical methods for selecting designs, and has been developed as one of
the main branches of statistics.
In this thesis, primarily we consider the design of experiments in epidemiology to study
the dynamics of the spread of a disease in a closed population over time. In epidemiolog-
ical experiments, initially the virus or pathogen of interest is introduced to a population
of plants (Bailey and Gilligan, 1999, Bailey et al., 2004, Kleczkowski et al., 1996, Leclerc
et al., 2014, Otten et al., 2003) or animals (Backer et al., 2012, Bravo de Rueda et al., 2015,
Hu et al., 2017, Orsel et al., 2007, van der Goot et al., 2005) in a controlled environment,
and then the population of individuals is observed over time. Due to the high cost of col-
lecting data, the population of individuals is only observed at some selected time points
where the disease state of each individual is identified as, for example, being susceptible,
exposed, infectious, or recovered with respect to the disease. Let d = {t1, t2, . . . , tn} de-
note a vector of n time points at which the experimenter desires to observe the spread of
a disease among N individuals. Then, the number of individuals in each disease state at
each observational time point is the outcome of the experiment. Further, we also consider
experiments in ecology which are focused on investigating the interactions between two or
more species populations to understand ecological phenomena such as predator-prey in-
teractions (Luckinbill, 1973, Zhang et al., 2018). These ecological experiments start with
an initially specified number of individuals from each species, and the experimenter observes the size of each population over time, in a similar fashion to epidemiological experiments.
Continuous time Markov chain (CTMC) models can be used to model the data from
these experiments where such models typically describe the probability of individuals
transitioning between different disease states.
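Such a data-generating process can be sketched with the Gillespie algorithm (Gillespie, 1977), which appears again later in this chapter. The SI (susceptible to infected) model, the transmission rate β, and all names below are illustrative assumptions, not the specific models analysed in this thesis.

```python
import random

def simulate_si(beta, N, I0, obs_times, seed=None):
    """Gillespie simulation of an SI epidemic in a closed population of size N.

    Returns the number of infected individuals at each observation time in
    obs_times; beta is the (assumed) transmission rate.
    """
    rng = random.Random(seed)
    t, infected = 0.0, I0
    counts = []
    for t_obs in sorted(obs_times):
        while infected < N:
            # Rate of the S -> I transition under density-dependent mixing.
            rate = beta * infected * (N - infected) / N
            dt = rng.expovariate(rate)
            if t + dt > t_obs:
                # No event before this observation; by memorylessness of the
                # exponential waiting time, restart the clock at t_obs.
                t = t_obs
                break
            t += dt
            infected += 1
        counts.append(infected)
    return counts

# One simulated dataset at a design d = {1, 3, 10} (time units are arbitrary).
y = simulate_si(beta=1.5, N=50, I0=1, obs_times=[1.0, 3.0, 10.0], seed=0)
```

The recorded counts are non-decreasing because the SI model has no recovery; richer models (SIS, SIR) would track several compartments per observation time.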
The selection of n observational time points plays a critical role in the informativeness
of the experiment. In this thesis, we propose new methodologies to determine when the
experiment should be observed in order to efficiently understand the underlying process
that governs the spread of the disease. A Bayesian framework is adopted for inference
and design throughout this thesis. Before providing background in Bayesian design of
experiments, Bayesian inference methods for parameter estimation and model selection
are described in Section 2.2.
2.2 Bayesian inference
In a Bayesian framework, the unknown model parameters θ are treated as random vari-
ables. The uncertainty on these parameters a priori is represented by a probability dis-
tribution p(θ) which is referred to as the prior distribution. Upon observing data y under
design d, the prior distribution p(θ) and the likelihood of the observed data p(y|θ,d) are combined via Bayes' theorem, as shown in Equation (2.1). The resulting distribution
of θ is referred to as the posterior distribution.
p(θ|y,d) = p(y|θ,d) p(θ) / p(y|d). (2.1)
Here, the denominator, p(y|d) = ∫_θ p(y|θ,d) p(θ) dθ, is the normalising constant or the
model evidence. For all but the simplest of models, p(y|d) cannot be evaluated analyt-
ically. However, approaches in Bayesian inference have been proposed that only require
the evaluation of a quantity that is proportional to the posterior density or probability,
and as such the posterior distribution can be expressed as follows:
p(θ|y,d) ∝ p(y|θ,d)p(θ). (2.2)
Often more than one model may be contemplated to describe the underlying process which
generates the observations y. Thus, the selection of the most suitable model from a finite
number of K candidate models, described by a random variable M ∈ {1, 2, ...,K}, is of
interest. Let model m contain parameters θm with a prior distribution p(θm|M = m), and
prior model probability p(M = m), and the likelihood function of each model m be given
by p(y|θm,d). For ease of notation, M = m will be abbreviated by m throughout the
rest of this thesis. Then, model choice is performed using the posterior model probability
which can be expressed as follows:
p(m|y,d) = p(y|m,d) p(m) / Σ_{m′=1}^K p(y|m′,d) p(m′), (2.3)
where
p(y|m,d) = ∫_{θm} p(y|θm,m,d) p(θm|m) dθm.
The following subsections describe computational methods for parameter estimation and
model discrimination when the likelihood is available in a closed form.
2.2.1 Markov chain Monte Carlo
When the posterior of θ is only available up to a normalisation constant, as given in
Equation (2.2), Markov chain Monte Carlo (MCMC) is the most commonly used method
to sample from the target posterior of θ. MCMC is based on an iterative exploration of the parameter space by a Markov chain starting from an initial state θ0. In each iteration,
the state of the chain moves from the current state θ to a proposed state θ∗, generated from a proposal distribution q(·|·), which is accepted with the following acceptance probability:
r = min{ [p(y|θ∗,d) p(θ∗) q(θ|θ∗)] / [p(y|θ,d) p(θ) q(θ∗|θ)], 1 }. (2.4)
After a large number of iterations, the Markov chain converges to its limiting distribution, which is the posterior distribution. The first B draws of the chain are discarded to remove the effect of the initial value of θ. Algorithm 2.1 below describes the basic MCMC (Metropolis-Hastings) algorithm for sampling from a posterior distribution.
1: Select an initial value θ0.
2: for i = 1 to N do
3:   Generate θ∗ ∼ q(·|θi−1).
4:   Compute r = min{ [p(y|θ∗,d) p(θ∗) q(θi−1|θ∗)] / [p(y|θi−1,d) p(θi−1) q(θ∗|θi−1)], 1 }.
5:   Generate u ∼ U(0, 1).
6:   if u ≤ r then
7:     Set θi = θ∗
8:   else
9:     Set θi = θi−1
10:  end
11: end
Algorithm 2.1: Metropolis-Hastings algorithm (Hastings, 1970)
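As a minimal illustration of Algorithm 2.1, the sketch below implements a random-walk Metropolis sampler; because the Gaussian proposal is symmetric, the q(·|·) terms in the acceptance ratio cancel. The standard normal target is an illustrative stand-in for an actual unnormalised log-posterior.

```python
import math
import random

def metropolis(log_target, theta0, n_iter, step=1.0, seed=None):
    """Random-walk Metropolis sampler for a scalar unnormalised log-target."""
    rng = random.Random(seed)
    theta, chain = theta0, []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        # Symmetric proposal: the acceptance ratio reduces to the target ratio.
        log_r = log_target(proposal) - log_target(theta)
        if log_r >= 0 or rng.random() < math.exp(log_r):
            theta = proposal
        chain.append(theta)
    return chain

# Illustrative target: a standard normal log-density, up to a constant.
chain = metropolis(lambda th: -0.5 * th * th, theta0=5.0, n_iter=6000, seed=1)
burned = chain[1000:]  # discard the first B = 1000 draws as burn-in
```

The retained draws then approximate the target; tuning the proposal scale `step` trades off acceptance rate against the distance moved per iteration.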
The model evidence is typically an intractable integral even for models with analytically tractable likelihoods p(y|θm,d), and therefore the estimation of the posterior model probability (see Equation (2.3)) is a difficult task. The reversible jump Markov chain Monte Carlo (RJMCMC) sampler proposed by Green (1995) can be used to estimate p(m|y,d). The RJMCMC sampler draws samples from the joint distribution of the
model parameters and model indicator (θm,m). Then, p(m|y,d) can be estimated by the
proportion of iterations that the sampler visited model m. Although, in principle, this
algorithm is straightforward to implement, in practice it suffers from poor mixing, par-
ticularly across the model space. Thus, the use of this algorithm for estimating posterior
model probabilities has been limited.
2.2.2 Importance sampling
Importance sampling is an alternative method for approximating a distribution that is difficult to sample from directly, for example, a posterior distribution. Here, we describe importance sampling for estimating the expectation of a function h(X), where X ∼ f(·) and f(·) is difficult to sample from directly. Let g(·) be another distribution which is easy to sample from and has the same support as f(·); g(·) is called the importance distribution. Then,
E[h(X)] can be expressed as,
E[h(X)] = ∫ h(x) [f(x)/g(x)] g(x) dx.
The ratio w(x) = f(x)/g(x) is referred to as the importance weight of each particle x. Then, the importance sampling estimate of E[h(X)] using a sample {xi}_{i=1}^N drawn from g(·) is given by,
Ê[h(X)] = (1/N) Σ_{i=1}^N h(xi) f(xi)/g(xi) = (1/N) Σ_{i=1}^N h(xi) w(xi).
When f(·) is only available up to a normalisation constant, the self-normalised estimate Σ_{i=1}^N h(xi) W(xi), with normalised weights W(xi) = w(xi) / Σ_{j=1}^N w(xj), should be used. This approach can be used to approximate a posterior distribution: the prior p(θ) is taken as the importance distribution, so the ratio f(θ)/g(θ) is proportional to the likelihood p(y|θ,d). Thus, a sample {θi}_{i=1}^N drawn from the prior is weighted by the normalised likelihood weights Wi. The efficiency of the approximation can be assessed by the effective sample size (ESS), which can be approximated as,

ESS = 1 / Σ_{i=1}^N Wi². (2.5)
The ESS approximates the number of equivalent independent samples from the posterior distribution.
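The prior-as-importance-distribution scheme just described can be sketched as follows. The Binomial likelihood and U(0,1) prior are illustrative assumptions used only to make the example self-contained.

```python
import math
import random

def prior_importance_sampling(loglik, n, seed=None):
    """Weight a U(0,1) prior sample by normalised likelihood weights W_i and
    report the effective sample size ESS = 1 / sum(W_i^2)."""
    rng = random.Random(seed)
    thetas = [rng.uniform(1e-9, 1.0 - 1e-9) for _ in range(n)]
    logw = [loglik(th) for th in thetas]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]  # subtract max for numerical stability
    total = sum(w)
    W = [wi / total for wi in w]           # normalised weights
    ess = 1.0 / sum(Wi * Wi for Wi in W)
    return thetas, W, ess

# Illustrative data: y = 7 successes in 10 Bernoulli(theta) trials.
def loglik(theta):
    return 7 * math.log(theta) + 3 * math.log(1.0 - theta)

thetas, W, ess = prior_importance_sampling(loglik, n=5000, seed=2)
post_mean = sum(t * w for t, w in zip(thetas, W))  # exact posterior mean is 8/12
```

The weighted sample targets the Beta(8, 4) posterior here; a low ESS relative to n would signal a poor match between prior and posterior.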
2.2.3 Laplace approximation
The Laplace approximation is a deterministic and computationally efficient approximation
to the posterior distribution. Suppose we have observed data y under design d generated
from model m with q parameters θ. Then, the Laplace approximation approximates
the posterior distribution of θ via a multivariate normal distribution with mean θ∗ and
covariance matrix H(θ∗)−1 where θ∗ is the posterior mode and H(θ∗)−1 is the inverse of
the Hessian matrix evaluated at θ∗.
One advantage of using the Laplace approximation is the availability of an approximation
to the model evidence which can be used for model choice. The approximation is as
follows:
p(y|m,d) ≈ (2π)^{q/2} |H(θ∗)^{-1}|^{1/2} p(y|θ∗,m,d) p(θ∗|m). (2.6)
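For a scalar parameter (q = 1), the Laplace approximation and its evidence estimate can be sketched with Newton's method and finite differences. The quadratic log-joint used below is an illustrative assumption for which the approximation is exact; the function names are not from the thesis.

```python
import math

def laplace_1d(log_joint, theta0, h=1e-4, n_iter=50):
    """Laplace approximation for a scalar parameter.

    log_joint(theta) = log p(y|theta, d) + log p(theta). Returns the posterior
    mode theta*, the approximate posterior variance H(theta*)^-1, and the log
    of the evidence approximation in Equation (2.6) with q = 1.
    """
    theta = theta0
    for _ in range(n_iter):
        # Newton step using central finite differences.
        g = (log_joint(theta + h) - log_joint(theta - h)) / (2.0 * h)
        c = (log_joint(theta + h) - 2.0 * log_joint(theta) + log_joint(theta - h)) / h**2
        theta -= g / c
    # Negative second derivative of the log-joint at the mode.
    hess = -(log_joint(theta + h) - 2.0 * log_joint(theta) + log_joint(theta - h)) / h**2
    log_evidence = 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(hess) + log_joint(theta)
    return theta, 1.0 / hess, log_evidence

# Illustrative quadratic log-joint: mode 1, curvature 2, evidence sqrt(pi).
mode, var, log_ev = laplace_1d(lambda t: -(t - 1.0) ** 2, theta0=0.0)
```

For non-Gaussian posteriors the same recipe gives an approximation whose quality depends on how close the log-posterior is to quadratic near the mode.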
Most models considered in this thesis have likelihoods which are not available in closed-
form or are computationally expensive to evaluate a large number of times. Thus, in the
following section, likelihood-free methods for inference are described.
2.3 Approximate Bayesian computation
For complex models, the likelihood may not be available in closed form, or cannot be evaluated a large number of times. In the last two decades, approximate Bayesian computation (ABC) methods have been developed as an alternative way to undertake Bayesian inference for these complex models, given that it is possible to simulate data from the model. Originally proposed in population genetics (Beaumont et al., 2002), such methods assume that, despite the likelihood not being available, it is relatively straightforward (and efficient) to sample from the likelihood. We note that this is certainly the case for
CTMC models via the Gillespie algorithm (Gillespie, 1977). The simplest approach in
ABC is ABC rejection, in which the posterior distribution is approximated by simulating prior predictive data x and retaining the parameter values that generated data close to the observed data y, as determined by a discrepancy function. More specifically, a sample of N parameter values is drawn from the prior p(θ), and for each parameter
value θi, a dataset xi is generated. Then, each xi is compared with y using a discrepancy
function ρ(x,y), and the parameter values which generate x with discrepancy less than a
pre-defined threshold ε are kept to form the ABC posterior of θ. Thus, the ABC posterior
of θ can be expressed as,
pABC(θ|y,d, ε) ∝ ∫_x p(x|θ,d) I(ρ(y,x|d) ≤ ε) dx, (2.7)
where I(A) is the indicator function for event A. A sample from the ABC posterior
pABC(θ|y,d, ε) can be obtained by ABC rejection as described in Algorithm 2.2.
1: Generate θi ∼ p(θ) for i = 1, ..., N
2: Generate xi ∼ p(·|θi,d) for i = 1, ..., N
3: Compute discrepancies of ρi = ρ(xi,y|d) for i = 1, ..., N creating particles {θi, ρi}Ni=1
4: Sort the particle set according to the discrepancy ρ such that ρ1 ≤ ρ2 ≤ ... ≤ ρN
5: Determine ε = ρbαNc (where b·c denotes the floor function and 0 < α < 1).
6: Select the subset of particles {θi|ρi ≤ ε}Ni=1, which gives the ABC posterior sample of θ.
Algorithm 2.2: ABC rejection algorithm (Beaumont et al., 2002)
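Algorithm 2.2 can be sketched in a few lines. The Binomial model, the uniform prior and the absolute-difference discrepancy below are illustrative assumptions chosen so that the true posterior is known.

```python
import random

def abc_rejection(simulate, prior_sample, discrepancy, y_obs, n, alpha, seed=None):
    """ABC rejection (Algorithm 2.2): keep the 100*alpha% of simulations whose
    discrepancy from y_obs is smallest; epsilon is implicitly the alpha-quantile
    of the simulated discrepancies."""
    rng = random.Random(seed)
    particles = []
    for _ in range(n):
        theta = prior_sample(rng)
        x = simulate(theta, rng)
        particles.append((discrepancy(x, y_obs), theta))
    particles.sort(key=lambda p: p[0])
    return [theta for _, theta in particles[: int(alpha * n)]]

# Illustrative model: y ~ Binomial(20, theta) with a U(0,1) prior; observed y = 14.
def simulate(theta, rng):
    return sum(rng.random() < theta for _ in range(20))

abc_sample = abc_rejection(simulate, lambda rng: rng.random(),
                           lambda x, y: abs(x - y), y_obs=14, n=4000, alpha=0.05, seed=3)
```

With this discrete summary and a small alpha, most retained particles match the observed count exactly, so the ABC sample approximates the Beta(15, 7) posterior.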
The ABC rejection method suffers from the curse of dimensionality as the number of ob-
servations in y increases. Thus, various dimension reduction methods have been proposed
in the literature (Blum et al., 2013). Further, the chosen value of tolerance ε determines
the accuracy of the approximation. A smaller value of ε provides a more accurate ap-
proximation, but also results in a low acceptance rate. Consequently, it increases the
computational cost of generating a large number of simulations to obtain a reasonably
sized sample {θi|ρi ≤ ε}Ni=1, see line 6 of Algorithm 2.2, to represent the posterior.
For K competing models with intractable likelihoods, the joint ABC posterior of model
indicator m and model parameters θm can be expressed as,
pABC(θm, m|y,d, ε) ∝ p(θm|m) p(m) ∫_x p(x|θm,m,d) I(ρ(y,x|d) ≤ ε) dx, (2.8)
where y are the observed data, and x are the simulated data from model m. Upon sampling from pABC(θm,m|y,d, ε), the ABC posterior model probability of model m can be approximated by the proportion of retained samples for which the model indicator is equal to m. The ABC model choice algorithm (Grelaud et al., 2009), as described in Algorithm 2.3, can be used to approximate the posterior model probabilities of the candidate models.
1: Generate mi ∼ p(m) for i = 1, ..., N
2: Generate θim ∼ p(·|mi,d) for i = 1, ..., N
3: Generate xi ∼ p(·|θim,d) for i = 1, ..., N
4: Compute discrepancies of ρi = ρ(xi,y|d) for i = 1, ..., N creating particles {mi, ρi}Ni=1
5: Sort the particle set according to the discrepancy ρ such that ρ1 ≤ ρ2 ≤ ... ≤ ρN
6: Determine ε = ρbαNc (where b·c denotes the floor function and 0 < α < 1).
7: Select the subset of particles {mi|ρi ≤ ε}Ni=1
Algorithm 2.3: ABC algorithm for model choice (ABC-MC) (Grelaud et al., 2009)
Then, the ABC approximation to the posterior model probability of model m is given by,
p̂(M = m|y,d) = (1/Nε) Σ_{i=1}^{Nε} I(mi = m), (2.9)
where Nε is the number of particles with discrepancy value less than ε (see line 7 of
Algorithm 2.3).
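A compact sketch of Algorithm 2.3 and the estimate in Equation (2.9) follows. The two rival Binomial models with fixed success probabilities (degenerate parameter priors) are illustrative assumptions that keep the example self-contained.

```python
import random

def abc_model_choice(simulators, prior_m, y_obs, discrepancy, n, alpha, seed=None):
    """ABC-MC (Algorithm 2.3): approximate posterior model probabilities.

    simulators[m](rng) returns one draw from the prior predictive distribution
    of model m (drawing theta_m from its prior first, if the model has
    parameters); the returned dictionary implements Equation (2.9).
    """
    rng = random.Random(seed)
    labels = list(simulators)
    particles = []
    for _ in range(n):
        m = rng.choices(labels, weights=[prior_m[k] for k in labels])[0]
        x = simulators[m](rng)
        particles.append((discrepancy(x, y_obs), m))
    particles.sort(key=lambda p: p[0])
    kept = [m for _, m in particles[: int(alpha * n)]]
    return {m: kept.count(m) / len(kept) for m in labels}

# Illustrative rivals: Binomial(20, 0.2) versus Binomial(20, 0.8); observed y = 16.
simulators = {1: lambda rng: sum(rng.random() < 0.2 for _ in range(20)),
              2: lambda rng: sum(rng.random() < 0.8 for _ in range(20))}
probs = abc_model_choice(simulators, {1: 0.5, 2: 0.5}, y_obs=16,
                         discrepancy=lambda x, y: abs(x - y), n=4000, alpha=0.05, seed=4)
```

Because the observed count sits in the bulk of model 2's prior predictive and far in the tail of model 1's, nearly all retained particles carry the model-2 indicator.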
2.3.1 Synthetic likelihood
The synthetic likelihood (Wood, 2010) approach is another method of approximating
an intractable likelihood of observed data y for a given value of model parameters θ.
However, the synthetic likelihood approach is based on a parametric approximation to
the sampling distribution of the data or summary statistics. This is achieved by evaluating
summary statistics sobs = S(y) of observed data y, and assuming these summary statistics
follow a multivariate normal distribution with mean µ(θ) and variance-covariance matrix Σ(θ). Then, the synthetic log-likelihood ls(sobs|θ) can be expressed, up to an additive constant, as,

ls(sobs|θ) = −(1/2) (sobs − µ(θ))^T Σ(θ)^{-1} (sobs − µ(θ)) − (1/2) log |Σ(θ)|. (2.10)
In general, µ(θ) and Σ(θ) cannot be evaluated analytically for a given value of θ, but they can be estimated by simulating n datasets from the model of interest with parameter θ, computing the summary statistics of each simulated dataset, and taking the sample mean and sample variance-covariance matrix of these summary statistics. By substituting the estimated mean vector and variance-covariance matrix into Equation (2.10), ls(sobs|θ) can be approximated.
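The simulation-based estimate of the synthetic log-likelihood can be sketched as follows for a scalar summary statistic. The Gaussian data model and sample-mean summary are illustrative assumptions, not the summaries used later in the thesis.

```python
import math
import random

def synthetic_loglik(theta, s_obs, simulate_summary, n_sim, rng):
    """Estimated synthetic log-likelihood of a scalar summary statistic s_obs.

    mu(theta) and Sigma(theta) (here a scalar variance) are estimated from
    n_sim datasets simulated at parameter theta.
    """
    sims = [simulate_summary(theta, rng) for _ in range(n_sim)]
    mu = sum(sims) / n_sim
    var = sum((s - mu) ** 2 for s in sims) / (n_sim - 1)
    return -0.5 * (s_obs - mu) ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)

# Illustrative model: data are 30 draws from N(theta, 1); the summary is the mean.
def summary(theta, rng):
    return sum(rng.gauss(theta, 1.0) for _ in range(30)) / 30.0

rng = random.Random(5)
ll_near = synthetic_loglik(2.0, 2.0, summary, n_sim=200, rng=rng)
ll_far = synthetic_loglik(5.0, 2.0, summary, n_sim=200, rng=rng)
```

Parameter values that reproduce the observed summary receive a much higher synthetic log-likelihood than distant ones, which is what makes the approach usable inside samplers and utility evaluations.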
Originally, Wood (2010) used the synthetic likelihood to find the maximum likelihood
estimates of parameters in nonlinear models in ecology. Recently, Price et al. (2018c)
extended the synthetic likelihood to the Bayesian framework by incorporating parameter
uncertainty. Similarly, the synthetic likelihood can be used as an approximation to the
actual likelihood when using posterior approximations such as importance sampling and
variational Bayes methods (Ong et al., 2018).
The Gaussian diffusion approximation can be viewed as a deterministic counterpart of the synthetic likelihood approach in which the mean and variance are obtained analytically. Thus, the Gaussian diffusion approximation is computationally less expensive than simulation-based methods, as summaries like the mean and variance can be found without simulation. Despite these computational advantages, the applicability of this method is limited to density dependent Markov chains (Pagendam and Pollett, 2013).
2.3.2 Indirect inference
In indirect inference (II), a model with a tractable likelihood, called an auxiliary model,
is used to describe data from a generative model with an intractable likelihood. Drovandi et al. (2011) employed II within the ABC algorithm, using the estimated parameters of the auxiliary model as summary statistics. Alternatively, the likelihood of the generative model can be approximated by the likelihood of the auxiliary model evaluated at φ(θ), where θ is the parameter of the generative model (see, for example, Drovandi et al. (2015) and Gallant and McCulloch (2009)). The relationship between the parameters of the
generative model and the auxiliary model, represented by a mapping function φ(θ), is found by fitting the auxiliary model to data simulated from the generative model. The accuracy of this approach depends heavily on the availability of an auxiliary model that describes the data generated from the generative model well.
2.4 Bayesian experimental designs
The field of design of experiments provides a collection of methodologies to plan an
experiment that will yield informative data for statistical inference on the process of
interest. The informativeness of data y to be obtained under a design d to achieve
the purpose of the experiment such as parameter estimation, model selection and/or
prediction, is measured by a utility function. We consider design within a Bayesian
framework due to the mathematically rigorous handling of uncertainty and the availability
of important utility functions such as those based on mutual information (discussed later).
Define a utility function as u(d,y,θ) where y and θ are unknown a priori. Therefore, the
utility function cannot be used directly to design experiments. To do so, the expectation
is taken with respect to all unknowns to form an expected utility that can be expressed
as follows:
U(d) = ∫_y ∫_θ u(d,y,θ) p(y|θ,d) p(θ) dθ dy. (2.11)
In the presence of model uncertainty, Equation (2.11) can be extended to incorporate
the uncertainty about each model given by the prior model probability p(m). Then, the
expected utility of design d is given by,
U(d) = Σ_{m=1}^K p(m) { ∫_y ∫_{θm} u(d,y,θm,m) p(y|θm,m,d) p(θm|m) dθm dy }. (2.12)
The utility u(d,y,θm,m) can be defined according to the purpose of the experiment, such as estimation of parameters across all K competing models, discrimination between competing models, or the dual goal of parameter estimation and model discrimination. When
the utility function does not depend on the model parameters, Equation (2.12) can be
simplified to yield,
U(d) = Σ_{m=1}^K p(m) { ∫_y u(d,y,m) p(y|m,d) dy }. (2.13)
Unfortunately, given the form of most utility functions, the above integral is generally
analytically intractable, and therefore needs to be estimated using numerical methods
such as Monte Carlo integration. This estimation can be expressed as follows:
Û(d) = Σ_{m=1}^K p(m) (1/N) Σ_{j=1}^N u(d, ymj, m), (2.14)
where ymj ∼ p(y|m,d) are independent draws from the model m at time points d. Ac-
cording to Equation (2.14), a single evaluation of U(d) requires K × N evaluations of
the utility function. Given the utility is typically a function of the posterior distribution,
evaluating this approximation is a computationally challenging task. Further challenges
occur in locating the optimal design which is defined as the design that maximises U(d)
over the design space.
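The Monte Carlo approximation in Equation (2.14) can be sketched directly. The two Binomial models and the toy utility below are illustrative assumptions: a real utility would typically be a function of the posterior distribution, which is what makes each evaluation expensive in practice.

```python
import random

def expected_utility(d, simulators, prior_m, utility, n, seed=None):
    """Monte Carlo estimate of U(d) as in Equation (2.14).

    simulators[m](d, rng) draws one dataset y ~ p(y|m, d), and utility(d, y, m)
    scores that draw; the estimate averages N draws per model, weighted by p(m).
    """
    rng = random.Random(seed)
    total = 0.0
    for m, p_m in prior_m.items():
        utils = [utility(d, simulators[m](d, rng), m) for _ in range(n)]
        total += p_m * sum(utils) / n
    return total

# Illustrative setting: the design d is the number of Bernoulli trials observed.
simulators = {1: lambda d, rng: sum(rng.random() < 0.2 for _ in range(d)),
              2: lambda d, rng: sum(rng.random() < 0.8 for _ in range(d))}
# Toy utility u(d, y, m) = y, so U(10) = 0.5*(0.2*10) + 0.5*(0.8*10) = 5.
U = expected_utility(10, simulators, {1: 0.5, 2: 0.5},
                     lambda d, y, m: float(y), n=2000, seed=6)
```

Optimising the design then means repeating this K × N simulation loop for every candidate d, which is why cheap utility approximations and efficient optimisers matter.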
The following subsections describe utility functions that have been used in the design literature for parameter estimation, model discrimination, and the dual purpose of parameter estimation and model discrimination.
2.4.1 Utility functions for parameter estimation
Design of experiments for efficient parameter estimation has received major attention in
both frequentist and Bayesian design literature. A number of criteria have been used in
the frequentist design literature as functions of the Fisher information matrix, such as
the D-optimality, A-optimality and Ds-optimality.
In the Bayesian literature, the Kullback-Leibler (KL) divergence between the prior and
posterior distributions of parameters has been widely used as a utility function, see
Drovandi et al. (2013), Ryan et al. (2014), Ryan (2003). For a dataset y to be obtained
under a design d, the KL divergence utility can be expressed as follows:
u(d,y) = ∫_θ log( p(θ|y,d) / p(θ) ) p(θ|y,d) dθ. (2.15)
By applying Bayes' theorem, this can be simplified as (Lindley, 1956),

u(d,y) = ∫_θ log p(y|θ,d) p(θ|y,d) dθ − log p(y|d), (2.16)
where p(y|d) is the marginal likelihood or model evidence.
As can be seen, in using the KL divergence utility, an estimate of the marginal likelihood
is required. Given this is typically difficult to achieve, alternative estimation utilities
have been adopted in the literature. For example, Ryan et al. (2014) used the inverse of the determinant of the posterior covariance matrix of θ as a utility to derive designs for parameter estimation; it can be expressed as follows:

u(d,y) = 1/det(Var(θ|y,d)). (2.17)
As pointed out by Overstall et al. (2018), the logarithm of Equation (2.17) would be more
appropriate to use as it is more closely related to the KL divergence utility.
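As an illustration of the posterior-precision idea in Equation (2.17), on the log scale suggested by Overstall et al. (2018), the sketch below estimates Var(θ|y,d) for a scalar θ by importance sampling from the prior. The Binomial model and uniform prior are illustrative assumptions only.

```python
import math
import random

def log_precision_utility(y, loglik, n, seed=None):
    """log(1 / Var(theta | y, d)) for a scalar theta under a U(0,1) prior,
    with the posterior variance estimated by importance sampling from the prior."""
    rng = random.Random(seed)
    thetas = [rng.uniform(1e-9, 1.0 - 1e-9) for _ in range(n)]
    logw = [loglik(y, th) for th in thetas]
    mx = max(logw)
    w = [math.exp(lw - mx) for lw in logw]
    total = sum(w)
    mean = sum(wi * th for wi, th in zip(w, thetas)) / total
    var = sum(wi * (th - mean) ** 2 for wi, th in zip(w, thetas)) / total
    return -math.log(var)

# Illustrative model: y successes out of 20 Bernoulli(theta) trials.
def loglik(y, theta):
    return y * math.log(theta) + (20 - y) * math.log(1.0 - theta)

u_mid = log_precision_utility(10, loglik, n=5000, seed=7)   # Beta(11,11) posterior
u_edge = log_precision_utility(20, loglik, n=5000, seed=7)  # Beta(21,1) posterior
```

Datasets that concentrate the posterior (here, an extreme count) yield a larger utility, which is exactly the behaviour the estimation utility is meant to reward.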
2.4.2 Utility functions for model discrimination
In the frequentist literature, the T-optimality criterion (Atkinson and Fedorov, 1975a,b)
has been used to design experiments to discriminate between an assumed true model and
one or several competing models with normal errors. Lopez-Fidalgo et al. (2007) proposed
a utility based on the KL distance between the predictive distributions of assumed true
model and alternative model, which can be used to discriminate between any models.
However, both T and KL optimal designs are based on the assumption of a true model, and
thus can be considered locally optimal. Ponce de Leon and Atkinson (1991) and Tommasi
and Lopez-Fidalgo (2010) extended the T-optimality and KL-optimality, respectively,
by incorporating the prior model probability of each model and the prior distribution of the model parameters.
The mutual information between the observed data and model indicator m has been
widely used as a discrimination utility to obtain fully Bayesian designs, for instance, Box
and Hill (1967), Cavagnaro et al. (2010), Drovandi et al. (2014). In contrast to the T and
KL optimality criteria, the mutual information utility is based on the posterior model
probability and does not require the specification of a true model. Further, the mutual
information utility is straightforward to extend to cases that involve many models, and
it is given by,
uMI(d,y,m) = log p(m |y,d). (2.18)
The Zero-One utility selects the design which correctly classifies, on average, the true
model based on the posterior model probability (Overstall et al., 2018, Rose, 2008). It
can be defined as follows:
u0−1(d,y,m) = 1 if m̂ = m, and 0 otherwise, (2.19)
where m̂ = arg max_{m′∈M} p(m′|y,d) and M is the set of rival models. An advantage of the
Zero-One utility over the mutual information for model discrimination utility is that in-
correct choices of different models can be penalised differently. For example, the selection
of a more complex model can be penalised more than the selection of an overly simplified
model.
Both the mutual information for model discrimination and Zero-One utility are compu-
tationally expensive to evaluate. As an alternative, Ds-optimality has been used in the
frequentist design literature (Biedermann et al., 2007, Muller and Ponce de Leon, 1996),
and can be extended for use within a Bayesian framework as the posterior precision of
extra parameters of the most complex model among the nested models. This can be
expressed as follows:
uDs(d,y) = log(1/det(Var(θs|y,d))), (2.20)
where θs is the set of extra parameters of the most complex model among the nested
models.
2.4.3 Utility functions for dual experimental goals
Often the purpose of experiments is to learn about both the model and parameters which
adequately describe the data generated from the process of interest. Thus, experiments
have been designed to achieve more than one goal simultaneously (Atkinson, 2008, Borth,
1975, Hill et al., 1968, McGree, 2017, McGree et al., 2008, Ng and Chick, 2004, Tommasi,
2009). The obvious advantage of such an experiment is the reduction of the cost of
conducting multiple experiments to achieve individual goals such as parameter estimation,
model discrimination and prediction.
A common approach to designing experiments with more than one experimental goal is to maximise a weighted product (geometric mean) of efficiencies under each goal; examples include the DT-optimality criterion (Atkinson, 2008), the DKL-optimality criterion (Tommasi, 2009) and the DP-optimality criterion (McGree et al., 2008). However, in general, the selection of an appropriate weight for each experimental goal is not straightforward.
In the Bayesian design literature, Borth (1975) proposed the total entropy utility to design
dual-purpose experiments for parameter estimation and model discrimination. The total
entropy utility uses the additive property of entropy to combine the utilities for parameter estimation and model discrimination. Consequently, the experimenter is not required
to select weights for each experimental goal. However, application of the total entropy
utility has been limited to simple models as evaluation of this utility is a computationally
intensive task. Recently, McGree (2017) addressed the computational difficulties in eval-
uating the total entropy utility using sequential Monte Carlo methods to find designs for
non-linear models for binary and count response data. However, this work was limited to a discretised design space and to generalised linear and generalised nonlinear models.
2.5 Experimental designs for models with intractable likelihoods
The evaluation of designs for models with intractable likelihoods, commonly found in epidemiology, ecology and queueing systems, is a challenging task. Consequently, only a few attempts have been made in the frequentist literature (Pagendam and Pollett, 2013, Parker et al., 2015) and the Bayesian literature (Cook et al., 2008, Drovandi and Pettitt, 2013, Hainy et al., 2013, 2014, Overstall and McGree, 2018, Price et al., 2016, 2018a, Ryan et al., 2016a) to derive optimal designs for experiments to estimate parameters of an assumed model.
In the frequentist literature, Pagendam and Pollett (2013) used a Gaussian diffusion approximation in deriving D-optimal experimental designs to estimate parameters of the SI (Susceptible-Infected), SIS (Susceptible-Infected-Susceptible) and SIR (Susceptible-Infected-Recovered) epidemic models. However, the Gaussian diffusion approximation can only be applied to density dependent Markov chains. As noted by Pagendam and Pollett (2013), this approximation only produces accurate results for large populations and thus cannot be used to design experiments where a small number of experimental units is considered, such as experiments in veterinary epidemiology. Parker et al. (2015) found D- and Ds-optimal designs for parameter estimation of a queueing system. This work was limited to a simple queueing model, the M/M/1 queue, where the likelihood function can be approximated using hyperbolic Bessel functions (Morse, 1955). Further, these designs are not robust against parameter uncertainty and are thus termed locally optimal designs.
In the Bayesian framework, Cook et al. (2008) derived optimal observation times for parameter estimation of the SI model via the moment closure method to approximate the likelihood. However, moment closure may not be appropriate for complex models with multiple sub-populations, such as the SIR model, as it is not straightforward to find an appropriate probability distribution to approximate the actual likelihood. Drovandi and Pettitt (2013) used the ABC rejection method (Beaumont et al., 2002) to find optimal Bayesian designs for parameter estimation of Markov process models of epidemics and macroparasite population evolution. In this approach, the computational expense of simulating a large number of datasets was reduced by pre-simulating and storing datasets prior to the search for the optimal design using the Muller algorithm (Muller, 1999). The idea of using pre-simulated data in utility evaluations has been employed in subsequent design papers by Price et al. (2016, 2018a) for parameter estimation of epidemiological models and by Hainy et al. (2016) for spatial extremes models.
Ryan et al. (2016a) used indirect inference to approximate the intractable likelihoods of epidemiological models and found optimal designs for parameter estimation. Indirect inference methods approximate the likelihood of the observed data (based on the generative model) via the likelihood of an auxiliary model, which is comparatively inexpensive to evaluate. In contrast to pre-simulating data for each possible design point (Drovandi and Pettitt, 2013), the indirect inference approach requires finding and storing a mapping between the parameters of the generative model and those of the auxiliary model, based on data simulated according to a selected training design. Thus, this approach can be used to search for optimal designs in a continuous design space, and it is straightforward to consider design problems with more design variables. However, this method depends heavily on the adequacy of the auxiliary model, which may be difficult to find in general.
2.6 Optimisation algorithm
Finding the optimal design for an experiment requires maximising the expected utility over all possible designs. An exhaustive search of the design space is computationally prohibitive except for design problems in which only a few candidate designs need to be considered. For instance, Hainy et al. (2016) considered the selection of 3 weather stations from 39 stations to yield data for efficient parameter estimation of spatial extremes models. Further, in the absence of an analytical expression for the expected utility U(d), numerical approximations result in a noisy utility surface on which standard optimisation algorithms may fail to find the optimal design.
The Muller algorithm (Muller, 1999) has been widely used to find Bayesian designs for parameter estimation (Cook et al., 2008, Drovandi and Pettitt, 2013, Ryan et al., 2016a). The Muller algorithm first samples from a joint distribution of parameters θ, data y and designs d using MCMC. Then, the optimal design is estimated as the multivariate mode of the marginal distribution of d. The applicability of this method is limited to design problems with only a few design dimensions due to the difficulty in estimating the multivariate mode. To overcome these computational issues, Ryan et al. (2014, 2015) used a low dimensional parametrisation of the design space based on a beta distribution to find optimal sampling times for pharmacokinetic experiments. The sampling times are represented by quantiles of a beta distribution, which is characterised by just two parameters. Thus, the high dimensional design space of sampling times reduces to a two dimensional space. However, this low dimensional parametrisation may result in sub-optimal designs.
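The beta-quantile parametrisation described above can be sketched as follows. This is an illustrative implementation only (the shape parameters, number of sampling times and observation window are hypothetical); the beta quantile function is computed by numerical integration and bisection to keep the sketch self-contained:

```python
import math

def beta_cdf(x, a, b, n_grid=4000):
    """Regularised incomplete beta function via midpoint-rule integration."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    h = x / n_grid
    total = 0.0
    for i in range(n_grid):
        t = (i + 0.5) * h  # midpoint of the i-th sub-interval
        total += math.exp((a - 1.0) * math.log(t) + (b - 1.0) * math.log(1.0 - t))
    return total * h / math.exp(log_beta)

def beta_quantile(p, a, b, tol=1e-6):
    """Invert the beta CDF by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sampling_times(a, b, n_times, t_max):
    """Map n_times equally spaced beta quantiles onto the window [0, t_max]."""
    return [t_max * beta_quantile(k / (n_times + 1.0), a, b)
            for k in range(1, n_times + 1)]

# eight sampling times over a 24-hour window; the shapes (2, 5) concentrate
# observations early, where a pharmacokinetic profile changes quickly
times = sampling_times(2.0, 5.0, 8, 24.0)
```

The key point is that an 8-dimensional vector of sampling times is controlled entirely by the two shape parameters (a, b), which is exactly the dimension reduction that can exclude the truly optimal schedule.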
Alternatively, the coordinate exchange (CE) algorithm of Meyer and Nachtsheim (1995) can be used to find optimal designs. The CE algorithm iteratively optimises one design variable (design point) at a time until no improvement in the utility function can be achieved. However, as the number of candidate solutions for each design point increases, the CE algorithm can require a large number of evaluations of the expected utility. Thus, Overstall and Woods (2017) extended the CE algorithm by emulating the expected utility in each design dimension and optimising the predicted value given by the emulator. In general, this reduces the number of utility evaluations required to find the optimal design. However, this algorithm may require a large number of iterations to locate the optimal design for problems with utility surfaces that are not well approximated by a Gaussian process emulator.
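A minimal sketch of the basic coordinate exchange idea (without the emulation step of Overstall and Woods (2017)) over a discrete candidate set is given below; the toy utility function is hypothetical:

```python
def coordinate_exchange(utility, init_design, candidates, max_sweeps=50):
    """Greedy coordinate exchange: optimise one coordinate at a time over a
    discrete candidate set, sweeping until a full pass yields no improvement.

    utility: maps a design (tuple) to a scalar
    init_design: starting design
    candidates: the allowed values for each coordinate
    """
    design = list(init_design)
    best = utility(tuple(design))
    for _ in range(max_sweeps):
        improved = False
        for j in range(len(design)):
            for c in candidates:
                trial = list(design)
                trial[j] = c
                u = utility(tuple(trial))
                if u > best:
                    best, design, improved = u, trial, True
        if not improved:
            break
    return tuple(design), best

# toy utility with a known optimum at (3, 7)
toy_utility = lambda d: -((d[0] - 3) ** 2 + (d[1] - 7) ** 2)
d_star, u_star = coordinate_exchange(toy_utility, (0, 0), list(range(11)))
```

The inner loop over candidates is where the evaluation cost grows, which is precisely what emulation aims to reduce.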
Price et al. (2018b) proposed the Induced Natural Selection Heuristic (INSH) algorithm, a nature-inspired population-based optimisation algorithm, to locate optimal designs. The INSH algorithm starts with a randomly selected sample of designs (the initial population), and evaluates the utility of each design using available parallel computing resources. Then, a fixed number of the best designs are retained, and a pre-specified number of new designs are selected around these best designs to form the next generation. This process iterates for a pre-specified number of iterations, after which the best design among the population is selected as the optimal design. The INSH algorithm can be used for any utility surface without making assumptions about its shape. However, this approach may result in near-optimal designs and requires specifying a relatively large number of tuning parameters.
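The INSH-style search can be sketched loosely as follows; the tuning parameters, toy design space and utility are hypothetical, and the within-wave utility evaluations are shown serially although in practice they would be distributed across parallel workers:

```python
import random

def insh_search(utility, sample_design, perturb, n_init=40, n_keep=8,
                n_children=5, n_iters=20, seed=1):
    """INSH-style population search: score a wave of designs, retain the best
    few, and scatter the next wave around them.

    sample_design(): draws a random design from the design space
    perturb(d): draws a design from a neighbourhood of d
    Utility evaluations within a wave are mutually independent, so in
    practice they would be sent to parallel workers.
    """
    random.seed(seed)
    population = [sample_design() for _ in range(n_init)]
    best_d, best_u = None, float("-inf")
    for _ in range(n_iters):
        scored = sorted(population, key=utility, reverse=True)
        kept = scored[:n_keep]
        u_top = utility(kept[0])
        if u_top > best_u:
            best_d, best_u = kept[0], u_top
        # next generation: children scattered around each retained design
        population = [perturb(d) for d in kept for _ in range(n_children)]
    return best_d, best_u

# toy problem: maximise -|d - 5| over the interval [0, 10]
sample = lambda: random.uniform(0.0, 10.0)
step = lambda d: min(10.0, max(0.0, d + random.gauss(0.0, 0.5)))
d_star, u_star = insh_search(lambda d: -abs(d - 5.0), sample, step)
```

Note that n_init, n_keep, n_children, n_iters and the perturbation scale are all tuning parameters, which illustrates the specification burden mentioned above.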
2.7 Conclusion
In summary, the existing research on ABC and Bayesian experimental design has considered design problems in low dimensional design spaces. Further, methodologies for Bayesian design for discriminating between models with intractable likelihoods, and for dual-purpose experiments for such models, do not exist in the current literature. In this thesis, we thus propose new methods to address these gaps in the literature, with a particular focus on designing experiments in epidemiology.
3 Optimal Bayesian design for discriminating between models with intractable likelihoods in epidemiology
Statement for Authorship
This chapter has been written as a journal article. The authors listed below have certified
that:
(a) They meet the criteria for authorship as they have participated in the conception,
execution or interpretation of at least the part of the publication in their field of
expertise;
(b) They take public responsibility for their part of the publication, except for the
responsible author who accepts overall responsibility for the publication;
(c) There are no other authors of the publication according to these criteria;
(d) Potential conflicts of interest have been disclosed to granting bodies, the editor
or publisher of the journals of other publications and the head of the responsible
academic unit; and
(e) They agree to the use of the publication in the student’s thesis and its publication
on the Australian Digital Thesis database consistent with any limitations set by
publisher requirements.
The reference for the publication associated with this chapter is: Dehideniya M. B.,
Drovandi C. C. , and McGree J. M. (2018). Optimal Bayesian design for discriminating
between models with intractable likelihoods in epidemiology, Computational Statistics &
Data Analysis, 124: 277-297.
Contributor Statement of contribution
M. B. Dehideniya Developed and implemented the statistical methods, wrote
the manuscript, revised the manuscript as suggested by
co-authors and reviewers.
Signature and date:
J. M. McGree Supervised research, assisted in interpreting results,
critically reviewed manuscript
C. C. Drovandi Initiated the research concept, supervised research,
assisted in interpreting results, critically reviewed
manuscript
Principal Supervisor Confirmation: I have sighted email or other correspondence for all
co-authors confirming their authorship.
Name: James McGree    Signature: ______________    Date: 05/07/2019
3.1 Abstract
A methodology is proposed to derive Bayesian experimental designs for discriminating be-
tween rival epidemiological models with computationally intractable likelihoods. Methods
from approximate Bayesian computation are used to facilitate inference in this setting,
and an efficient implementation of this inference framework for approximating the expec-
tation of utility functions is proposed. Three utility functions for model discrimination
are considered, and the performance of each utility is explored in designing experiments for
discriminating between three epidemiological models: the death model, the Susceptible-
Infected model, and the Susceptible-Exposed-Infected model. The challenge of efficiently
locating optimal designs is addressed by an adaptation of the coordinate exchange algo-
rithm which exploits parallel computational architectures.
3.2 Introduction
Epidemiological studies are important for understanding how a disease is transmitted,
and for the development of preventative measures which might reduce or limit the spread
of the disease. Informative data collection is crucial in developing this understanding, and
can be achieved by conducting an experiment according to an optimal design that provides
the maximum amount of information to address the aim of the experiment which could
include model selection, parameter estimation and prediction. However, the derivation of
optimal designs in epidemiological experiments is a challenging task as most epidemiolog-
ical models contain likelihoods which are computationally expensive to evaluate (Becker,
1993). Consequently, only a few attempts have been made in both the frequentist lit-
erature (Pagendam and Pollett, 2013) and the Bayesian literature (Cook et al., 2008,
Drovandi and Pettitt, 2013) to derive optimal designs for experiments in epidemiology.
In the frequentist literature, the design of epidemiological experiments has been facil-
itated via an approximation to the likelihood. Pagendam and Pollett (2013) used a
Gaussian diffusion approximation in deriving D-optimal experimental designs to estimate
parameters of the SI (Susceptible-Infected), SIS (Susceptible-Infected-Susceptible) and
SIR (Susceptible-Infected-Recovered) epidemic models. The designs derived in this work
were dependent upon point estimates of the parameter values, and are thus termed locally
optimal designs. In contrast, the Bayesian approach provides a framework to account for
the uncertainty in parameters when deriving optimal designs (Ryan, 2003). This was
demonstrated in the work of Cook et al. (2008) who derived optimal observation times
for parameter estimation of the death model and the SI model. In their work, the moment
closure method was used to approximate the likelihood of the SI model.
Recent developments in approximate Bayesian computation (ABC) provide a compre-
hensive framework to undertake Bayesian inference and design when the likelihood is
intractable. Drovandi and Pettitt (2013) presented a likelihood-free method to derive
Bayesian designs for parameter estimation of Markov process models of epidemics and
macroparasite population evolution using the ABC rejection method (Beaumont et al.,
2002). In the work of Price et al. (2016), ABC rejection was used to approximate a utility
function based on the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951) in
designing experiments for parameter estimation of epidemiological models.
Previous work in the design of epidemiological experiments has focussed on estimating
model parameters of an assumed true model to describe the process of interest (Cook
et al., 2008, Drovandi and Pettitt, 2013, Pagendam and Pollett, 2013). However, in re-
ality, there may be uncertainty about the true epidemiological process (see Lee et al.
(2015)), and indeed, the purpose of the experiment could be to determine how a disease
spreads. Hence, the lack of knowledge about the true model should be taken into account
when designing efficient experiments. Thus, the need for the development of new meth-
ods to design efficient epidemiological experiments for model discrimination motivates the
work described in this article. Here, we consider the design problem of locating a set of
observation times which yields information to efficiently discriminate between compet-
ing models. Moreover, previous work on designing experiments for model discrimination
(Atkinson and Fedorov, 1975a, Cavagnaro et al., 2010, Drovandi et al., 2014, Overstall
et al., 2018, Woods et al., 2017) was limited to models where the likelihood can be easily
computed. Thus, this is the first paper to propose methods for finding Bayesian optimal
designs for discriminating between models with intractable likelihoods.
Finding the optimal design for an experiment requires the maximisation of an expected
utility over all possible designs, and it is a challenging optimisation problem because the
utility surface is noisy and may be relatively flat around its maximum. Further, it can
be computationally prohibitive to undertake the optimisation even for experiments with
a moderate number of design variables (see the review by Ryan et al. (2016b)). Muller
(1999) proposed a simulation-based approach that converts the optimisation problem to a
problem of sampling from a target distribution for which the mode is the optimal design.
First, samples are drawn from the target distribution h(θ,y,d) (joint distribution of the
parameters, data, and design) using Markov chain Monte Carlo (MCMC) simulations,
and then the estimated multivariate mode of the marginal distribution of d is deemed the
optimal design. The Muller algorithm has been widely used in the Bayesian experimental
design literature (Cook et al., 2008, Drovandi and Pettitt, 2013, Ryan et al., 2014, Stroud
et al., 2001). However, in practice, this method suffers from slow convergence. Moreover,
sampling from the joint distribution h(θ,y,d) using an MCMC method and determining
the multivariate mode for a large number of design variables are computationally expen-
sive tasks (Drovandi and Pettitt, 2013).
Alternatively, existing local search optimisation methods can be used to locate the optimal
design. For instance, the coordinate exchange (CE) algorithm of Meyer and Nachtsheim
(1995) has been used to find D-optimal designs in screening experiments by Goos and Jones (2011) and Palhazi Cuervo et al. (2016). Further, Gotwalt et al. (2009) used the coor-
dinate exchange algorithm in constructing pseudo-Bayesian optimal designs for parameter
estimation of non-linear models. The coordinate exchange algorithm starts from a given
initial design and iteratively maximises the utility function by changing one design vari-
able at a time while keeping all other variables fixed. This iterative procedure continues
until there is little or no improvement in the value of the utility. In practice, this may
require a large number of utility evaluations, especially when continuous design variables
are involved in the experiment. Recent work of Overstall and Woods (2017) extends
the idea of the coordinate exchange algorithm by considering an approximation of the
expected utility as a function of a single design variable conditional on the remaining
fixed variables. This approximation is facilitated by fitting a Gaussian process emulator
based on a relatively small number of utility evaluations. This emulator is then used to
approximate the utility function across the entire range of the considered variable and to
estimate the maximum at each iteration.
In this work, evaluating the approximate utility of a given design is computationally demanding, as it requires a large number of simulations from the model in order to approximate a
posterior distribution via ABC methods. Consequently, finding Bayesian optimal designs
for models with intractable likelihoods in a continuous design space could be computa-
tionally prohibitive. However, the use of a discrete design space to locate optimal designs
significantly reduces the required computational effort as it allows the use of pre-simulated
data for the posterior approximations in utility evaluations (discussed later). This idea
has been used by Drovandi and Pettitt (2013) within the Muller algorithm and by Price et al. (2016), who found the optimal design using an exhaustive search. However, these
methods quickly become computationally intensive as the number of design dimensions
increases. Hence, in this setting, it would be advantageous to reduce the required num-
ber of utility evaluations when searching for the optimal design. For this purpose, we propose a refined coordinate exchange algorithm in which, at each iteration, the coordinate space is reduced and becomes more refined. Further, the
algorithm is structured such that parallel computational architectures can be exploited.
As will be seen, through using this algorithm, we are able to efficiently locate Bayesian
designs in higher dimensions than previously explored in the design literature related to
models with intractable likelihoods.
The paper is organised as follows. In the next section, the problem of model choice in
the Bayesian framework is described. Section 3.4 presents the utility functions used in
this work, and Section 3.5 describes the ABC methods that are used for inference and in
estimating the expected utility of a given design. An adapted version of the coordinate
exchange algorithm which exploits parallel computational architectures is presented in
Section 3.6. In Section 3.7, the design for a pharmacokinetic model is considered to explore
and demonstrate the performance of our proposed optimisation algorithm. Following this,
two epidemiological examples are considered to demonstrate the performance of three
utility functions, namely the mutual information utility, the Ds-optimal utility and the
Zero-One (0-1) utility for model discrimination. The paper concludes with a discussion
and suggestions for further research.
3.3 Bayesian model choice
Consider the problem of designing an experiment to select the preferred model from a
finite number of K candidate models, described by a random variable M ∈ {1, 2, . . . , K}. Each model m is parameterised by θm with a prior distribution p(θm | M = m), and
the likelihood function of each model m is given by p(y |θm,d), where y represents the
observed data from the experiment conducted under design d. The prior probability of
each model m is represented by p(M = m), for m = 1, 2, . . . ,K. For ease of notation,
M = m will be abbreviated by m throughout the rest of the paper. In this work, model
choice is performed using the posterior model probability which can be expressed as
follows:
p(m | y, d) = p(y | m, d) p(m) / Σ_{m′=1}^{K} p(y | m′, d) p(m′),   (3.1)

where

p(y | m, d) = ∫_{θm} p(y | θm, d) p(θm | m) dθm.
In Section 3.5.2 we describe how to estimate p(m |y,d) in an intractable likelihood setting.
3.4 Bayesian experimental design
Experimental designs provide plans for the collection of informative data to efficiently address experimental aims. In the Bayesian setting, an experimental design d is evaluated by estimating the expected utility U(d), which represents the expected worth of the experimental data obtained under the design d. Finding the Bayesian optimal design involves locating the design, over the design space, that maximises U(d); we denote the optimal design by d∗. Formally, U(d) is the expected value of a chosen utility function u(d, y, θ):
U(d) = ∫_y ∫_θ u(d, y, θ) p(y | θ, d) p(θ) dθ dy,   (3.2)
where p(y |θ,d) is the likelihood of the possible outcomes under parameters θ and design
d, and p(θ) is the prior distribution of θ. When the utility function u(.) is independent
of θ, the expected utility can be simplified to yield
U(d) = ∫_y u(d, y) p(y | d) dy,   (3.3)
where p(y |d) is the prior predictive distribution of observed data y under design d.
When model uncertainty is present, Equation (3.3) can be extended to yield
U(d) = Σ_{m=1}^{K} p(m) ∫_y u(d, y, m) p(y | m, d) dy.   (3.4)
In the Bayesian experimental design context, most utility functions u(.) are based on
the posterior of unknowns (parameters and/or model), and thus given the form of most
utility functions, the above integrals are analytically intractable. Therefore, an approxi-
mation needs to be considered. A common approach is to use Monte Carlo integration,
where data are sampled from the prior predictive distribution and the utility is evaluated
for each sample. Variance reduction can be obtained by drawing from the prior predic-
tive using randomised Quasi-Monte Carlo (Drovandi and Tran, 2018) but for simplicity
we consider pseudo-random numbers. Unfortunately, the approximation of the expected
utility requires a large number of posterior distributions to be approximated (or sampled
from) rendering Bayesian design more computationally challenging than Bayesian infer-
ence. Further, in the search for an optimal design, one needs to approximate U(d) a large
number of times over the design space. This presents significant computational challenges
and has been the main reason why Bayesian design has generally been restricted to low-
dimensional settings. The specific forms of the utility functions considered in this work
are outlined next.
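The Monte Carlo approximation of Equation (3.4) described above can be sketched as follows; the model labels, simulator and utility are placeholders, not the models used later in this chapter:

```python
import random

def expected_utility(models, prior_probs, simulate, utility, design,
                     n_draws=100, seed=0):
    """Monte Carlo approximation of Equation (3.4):
    U(d) ~ sum_m p(m) * (1/Q) sum_k u(d, y_mk, m), with y_mk ~ p(y | m, d).

    simulate(m, design): one dataset from model m under the design
    utility(design, y, m): scalar worth of outcome y under model m
    """
    random.seed(seed)
    total = 0.0
    for m, p_m in zip(models, prior_probs):
        inner = sum(utility(design, simulate(m, design), m)
                    for _ in range(n_draws)) / n_draws
        total += p_m * inner
    return total

# toy check: two "models" whose utility is a known constant, so the
# expectation is exactly 0.5 * 0 + 0.5 * 1
U = expected_utility([0, 1], [0.5, 0.5],
                     simulate=lambda m, d: m,
                     utility=lambda d, y, m: float(m),
                     design=None)
```

In practice each call to `utility` itself requires approximating a posterior, which is the nested cost that makes Bayesian design so much more expensive than Bayesian inference.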
3.4.1 Utility function for parameter estimation
In the design literature, utilities based on the expected information gain from the exper-
iment have been used to evaluate designs for efficient parameter estimation. Cook et al.
(2008) and Price et al. (2016) used the KL divergence between the prior and posterior
distributions of the parameters:
u(d, y) = ∫_θ log( p(θ | y, d) / p(θ) ) p(θ | y, d) dθ.   (3.5)
This utility is equivalent to the Shannon information gain (SIG), which can be derived by applying Bayes' theorem to the above equation. The SIG utility can be expressed as,
u(d, y) = ∫_θ log p(y | θ, d) p(θ | y, d) dθ − log p(y | d).   (3.6)
This utility will be applied in Section 3.7.1 of this paper when we consider designing a
pharmacokinetic study to explore and evaluate the performance of our proposed optimi-
sation algorithm.
3.4.2 Utility functions for model discrimination
In this study, we consider three model discrimination utilities for discriminating between
rival models. First, we describe the mutual information utility which has been used in
Bayesian experimental design as a utility function for model discrimination (Cavagnaro
et al., 2010, Drovandi et al., 2014) when the likelihood can be computed analytically. This
utility evaluates the mutual information between the data and the model indicator, and
can be expressed as,
u_MI(d, y, m) = log p(m | y, d),   (3.7)
where y are the possible outcomes of the experiment under model m and design d. This
was originally proposed by Box and Hill (1967) and used recently by Drovandi et al. (2014)
to derive sequential designs for model discrimination. Here, we use the expected value
of the mutual information utility, UMI(d), to evaluate static designs for discriminating
between models with intractable likelihoods. Estimation of UMI(d) will be described in
the next section.
The Zero-One utility given in Equation (3.8) selects the design which, on average, has the
highest chance of selecting the true model based on the posterior model probabilities of
the rival models, that is,
u_{0-1}(d, y, m) = 1 if m̂ = m, and 0 otherwise,   (3.8)

where m̂ = arg max_{m′ ∈ M} p(m′ | y, d) and M is the set of rival models. Such a utility function has been used previously to discriminate between competing linear (Rose, 2008) and
logistic (Overstall et al., 2018) regression models. This utility has a straightforward inter-
pretation as maximising the probability of selecting the true model. An advantage of this
utility is that it can be easily modified to penalise incorrect selection of the true model.
For example, this utility can be adjusted to more severely penalise the incorrect selection
of the simpler of two models than incorrectly selecting the more complex model.
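A direct transcription of the Zero-One utility into code is straightforward; the posterior model probabilities below are hypothetical values, not output from the examples in this chapter:

```python
def zero_one_utility(posterior_probs, true_model):
    """Equation (3.8): 1 if the posterior-modal model matches the
    data-generating model, and 0 otherwise.

    posterior_probs: dict mapping model label -> p(m | y, d)
    """
    modal_model = max(posterior_probs, key=posterior_probs.get)
    return 1.0 if modal_model == true_model else 0.0

# hypothetical posterior model probabilities for three rival models
probs = {"death": 0.2, "SI": 0.7, "SEI": 0.1}
u_correct = zero_one_utility(probs, "SI")   # data generated from SI
u_wrong = zero_one_utility(probs, "death")
```

The penalised variants mentioned above would simply replace the 0/1 return values with model-pair-specific losses.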
Ds-optimality has been used for designing efficient experiments to estimate a specific
subset of parameters of interest in a given model (Atkinson and Bogacka, 1997, Solkner,
1993), and to discriminate between nested models (Muller and Ponce de Leon, 1996,
Waterhouse et al., 2008). As a discrimination utility, Ds-optimality selects the designs
which provide precise estimates of the additional parameters in the most complex model
among nested rival models. Thus, it is computationally less expensive to evaluate than
other model discrimination utilities which are based on the posterior model probabilities
of the competing models. It can be expressed as follows:
u_Ds(d, y) = log( 1 / det( Var(θs | y, d) ) ),   (3.9)
where θs denotes the additional parameters of the most complex model among the rival
models.
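Given posterior samples of the additional parameters θs, the Ds utility can be estimated from the sample covariance matrix; the following self-contained sketch (with a toy set of posterior draws) illustrates this:

```python
import math

def covariance(samples):
    """Sample covariance matrix of posterior draws (one draw per row)."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j])
                 for row in samples) / (n - 1)
             for j in range(p)] for i in range(p)]

def det(mat):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in mat]
    p = len(a)
    d = 1.0
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(a[r][k]))
        if a[piv][k] == 0.0:
            return 0.0
        if piv != k:
            a[k], a[piv] = a[piv], a[k]
            d = -d
        d *= a[k][k]
        for r in range(k + 1, p):
            f = a[r][k] / a[k][k]
            for c in range(k, p):
                a[r][c] -= f * a[k][c]
    return d

def ds_utility(theta_s_samples):
    """Equation (3.9): log(1 / det(Var(theta_s | y, d)))."""
    return -math.log(det(covariance(theta_s_samples)))

# toy posterior draws of two extra parameters, each with variance 1/3
ds_val = ds_utility([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Because only a posterior covariance is needed (rather than posterior model probabilities), this is the cheapest of the three discrimination utilities to evaluate, as noted above.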
Hence, three discrimination utilities will be considered in the examples that follow in
Section 3.7. However, before the examples can be considered, we must first show how
each utility can be approximated in our inference framework. Thus, in the next section,
ABC is described for posterior inference in intractable likelihood settings. It is then shown
how the expectation of each utility function can be approximated.
3.5 Approximate Bayesian computation (ABC) and utility estimation
ABC methods were originally developed for inference in population genetics (Beaumont
et al., 2002), and these methods are used for Bayesian inference through the simulation
of data from the model when the likelihood cannot be evaluated analytically or is com-
putationally infeasible to evaluate a large number of times. In using this approach, a
large number of simulations are drawn from the prior predictive distribution and then the
parameters which generate data x similar to the observed data y are used to approximate
the posterior of θ. The similarity between x and y is measured by a discrepancy function
ρ(x,y) which is based on some summary statistics of x and y. We outline two existing
ABC algorithms: an ABC algorithm for parameter estimation, and an extension of this
ABC algorithm for model choice.
3.5.1 ABC for parameter estimation
The ABC posterior of θ based on data y observed under design d can be expressed as,
p(θ | y, d, ε) ∝ p(θ) ∫_x p(x | θ, d) I(ρ(y, x | d) ≤ ε) dx,   (3.10)
where I(.) is the indicator function, ρ(y,x |d) is a discrepancy function of simulated data x
and observed data y under design d, and ε is a predefined tolerance value. In this work, we
directly compare simulated values x and observed values y using the discrepancy function
ρ(y,x |d) proposed by Drovandi and Pettitt (2013) as only low-dimensional designs are
considered. This can be defined as follows:
ρ(x, y | d) = Σ_{j=1}^{D} |x_j − y_j| / std(x_{·j} | d_j),   (3.11)
where D is the number of design points, and std(x_{·j} | d_j) is the standard deviation of the pre-simulated prior predictive data x_{·j} at time d_j (see Section 3.5.3 for more details), for j = 1, . . . , D. A sample from the ABC posterior p(θ | y, d, ε) can be obtained by ABC rejection
(Beaumont et al., 2002), MCMC ABC (Marjoram et al., 2003) or SMC ABC (Sisson
et al., 2007). Algorithm 3.1 describes a modified version of the ABC rejection algorithm
(Drovandi and Pettitt, 2013) for designing experiments. This approach is adopted in our
work as it is computationally less expensive than the alternative methods.
1: Generate θ^i ∼ p(θ) for i = 1, . . . , N

2: Generate x^i ∼ p(· | θ^i, d) for i = 1, . . . , N

3: Compute discrepancies ρ^i = ρ(x^i, y | d) for i = 1, . . . , N, creating particles {θ^i, ρ^i}_{i=1}^{N}

4: Sort the particle set according to the discrepancy ρ such that ρ^1 ≤ ρ^2 ≤ . . . ≤ ρ^N

5: Determine ε = ρ^⌊αN⌋ (where ⌊·⌋ denotes the floor function and 0 < α < 1)

6: Select the subset of particles {θ^i | ρ^i ≤ ε}_{i=1}^{N}, which gives the ABC posterior sample of θ

Algorithm 3.1: ABC rejection algorithm (Drovandi and Pettitt, 2013)
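Algorithm 3.1 can be transcribed into code along the following lines; the prior, simulator and discrepancy used in the demonstration are toy placeholders, not the epidemiological models considered later:

```python
import random

def abc_rejection(prior_draw, simulate, discrepancy, y_obs,
                  n_particles=2000, alpha=0.05, seed=0):
    """ABC rejection in the style of Algorithm 3.1: keep the alpha fraction
    of prior-predictive draws whose simulated data lie closest to y_obs.

    prior_draw(): one draw from p(theta)
    simulate(theta): one dataset from p(. | theta, d)
    discrepancy(x, y): scalar distance between two datasets
    """
    random.seed(seed)
    particles = []
    for _ in range(n_particles):                       # lines 1-3
        theta = prior_draw()
        x = simulate(theta)
        particles.append((discrepancy(x, y_obs), theta))
    particles.sort(key=lambda pair: pair[0])           # line 4
    eps = particles[int(alpha * n_particles)][0]       # line 5
    return [theta for rho, theta in particles if rho <= eps]  # line 6

# toy demonstration: the mean of a noisy observation, vague uniform prior
y = 3.0
posterior_sample = abc_rejection(
    prior_draw=lambda: random.uniform(-10.0, 10.0),
    simulate=lambda theta: theta + random.gauss(0.0, 0.2),
    discrepancy=lambda x, y_o: abs(x - y_o),
    y_obs=y)
```

The accepted particles concentrate around parameter values consistent with the observed data, with the tolerance ε set adaptively from the α quantile of the discrepancies rather than fixed in advance.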
3.5.2 ABC for model choice
The joint ABC posterior distribution of model m and model parameters θm is given by,
p(m, θm | y, d, ε) ∝ p(θm | m) p(m) ∫_x p(x | m, θm, d) I(ρ(y, x | d) ≤ ε) dx,   (3.12)
where y are the observed data under design d, and x are simulated data from model
m under design d. The posterior model probabilities can be approximated via sampling
from the ABC joint posterior p(m,θm |y,d, ε) using a modified version of the ABC model
choice (ABC-MC) approach (Grelaud et al., 2009) given in Algorithm 3.2.
1: Generate m^i ∼ p(m) for i = 1, . . . , N

2: Generate θ^i_m ∼ p(· | m^i, d) for i = 1, . . . , N

3: Generate x^i ∼ p(· | θ^i_m, d) for i = 1, . . . , N

4: Compute discrepancies ρ^i = ρ(x^i, y | d) for i = 1, . . . , N, creating particles {m^i, ρ^i}_{i=1}^{N}

5: Sort the particle set according to the discrepancy ρ such that ρ^1 ≤ ρ^2 ≤ . . . ≤ ρ^N

6: Determine ε = ρ^⌊αN⌋ (where ⌊·⌋ denotes the floor function and 0 < α < 1)

7: Select the subset of particles {m^i | ρ^i ≤ ε}_{i=1}^{N}

Algorithm 3.2: ABC algorithm for model choice (ABC-MC) (Grelaud et al., 2009)
Then, the ABC approximation to the posterior model probability of model m is given by,

p̂(m | y, d) = (1/Nε) Σ_{i=1}^{Nε} I(m^i = m),   (3.13)

where Nε is the number of particles in {m^i | ρ^i ≤ ε}. The discrepancy function given in
Equation (3.11) is used in Algorithm 3.2 to evaluate the similarity between simulated and
observed data.
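Algorithm 3.2 and the probability estimate of Equation (3.13) can be sketched together as follows; the two toy models in the demonstration are placeholders:

```python
import random

def abc_model_choice(model_priors, param_prior_draw, simulate, discrepancy,
                     y_obs, n_particles=5000, alpha=0.05, seed=0):
    """ABC-MC in the style of Algorithm 3.2, returning the estimated
    posterior model probabilities of Equation (3.13).

    model_priors: dict model -> prior probability p(m)
    param_prior_draw(m): one draw from the parameter prior of model m
    simulate(m, theta): one dataset from model m
    """
    random.seed(seed)
    models = list(model_priors)
    weights = [model_priors[m] for m in models]
    particles = []
    for _ in range(n_particles):
        m = random.choices(models, weights=weights)[0]  # line 1
        theta = param_prior_draw(m)                     # line 2
        x = simulate(m, theta)                          # line 3
        particles.append((discrepancy(x, y_obs), m))    # line 4
    particles.sort(key=lambda pair: pair[0])            # line 5
    kept = [m for rho, m in particles[:int(alpha * n_particles)]]  # lines 6-7
    # Equation (3.13): relative frequency of each model among kept particles
    return {m: kept.count(m) / len(kept) for m in models}

# toy demonstration: two normal models with different means; the observed
# value 3.0 should strongly favour model "b"
model_probs = abc_model_choice(
    model_priors={"a": 0.5, "b": 0.5},
    param_prior_draw=lambda m: None,   # no free parameters in this toy
    simulate=lambda m, theta: random.gauss(0.0 if m == "a" else 3.0, 1.0),
    discrepancy=lambda x, y_o: abs(x - y_o),
    y_obs=3.0)
```

The returned probabilities feed directly into the discrimination utilities of Section 3.4.2.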
In this paper, both the ABC rejection and ABC-MC algorithms are used to approxi-
mate posteriors of parameters and posterior model probabilities, respectively, to assist in
approximating utility functions. This is described next.
3.5.3 Estimating the model discrimination utility functions
The expected utility UMI(d) described in Section 3.4.2 can be approximated by Monte
Carlo integration, and it can be expressed as follows:
UMI(d) = Σ_{m=1}^{K} p(m) { (1/Qm) Σ_{k=1}^{Qm} log p(m | ymk, d) },    (3.14)
where ymk ∼ p(y |m,d),m = 1, 2, . . . ,K and p(m |ymk ,d) is an ABC approximation of
the posterior probability of model m obtained via the ABC-MC algorithm described in
Section 3.5.2.
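Once the ABC posterior model probabilities are tabulated, the Monte Carlo estimate in Equation (3.14) reduces to a prior-weighted average of log-probabilities. A minimal sketch (the probabilities here are made-up toy numbers, not output of an actual ABC run):

```python
import numpy as np

def expected_mutual_info(post_probs, prior_m):
    # post_probs[m] is a (Q_m, K) array whose k-th row holds the ABC estimates
    # of p(. | y_mk, d) for the k-th dataset pre-simulated from model m.
    total = 0.0
    for m, pm in enumerate(prior_m):
        p_true = np.clip(post_probs[m][:, m], 1e-12, 1.0)   # guard log(0)
        total += pm * np.mean(np.log(p_true))
    return total

# Toy illustration: K = 2 models, Q_m = 3 datasets each (made-up probabilities).
pp = [np.array([[0.8, 0.2], [0.6, 0.4], [0.9, 0.1]]),
      np.array([[0.3, 0.7], [0.1, 0.9], [0.2, 0.8]])]
u_mi = expected_mutual_info(pp, prior_m=[0.5, 0.5])   # closer to 0 is better
```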
It is a computationally demanding task to generate new datasets for each observed value
ymk in estimating the utility of design d and optimising it over the design space. However,
in ABC, simulated data x are independent from the observed data y (see lines 1-3 in
Algorithm 3.2) and thus, a set of pre-simulated datasets Xm from each model m for a
given design d could be used instead of generating new datasets for each approximation.
Hence, one could discretise the design space and generate datasets from each competing
model and each design prior to running the optimisation algorithm. Such an approach
will significantly reduce computational burden when evaluating an expected utility. In
this work, we consider a grid of times from tmin to tmax with increments of tinc and gen-
erate datasets from each competing model for each time point. This approach has been
used by Drovandi and Pettitt (2013), Hainy et al. (2016) and Price et al. (2016) for find-
ing ABC posteriors, and achieved considerable computational efficiency when estimating
utility functions in locating optimal designs for parameter estimation.
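The reuse of pre-simulated data can be sketched as follows: trajectories are simulated once over the full time grid (tmin = 0.1 to tmax = 20 in increments of tinc = 0.1, as used later in the examples), after which the simulated data for any design on the grid is a column lookup. The death-model simulator and the prior range here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Candidate observation times: t_min = 0.1 to t_max = 20 in steps of t_inc = 0.1.
t_grid = np.round(np.arange(0.1, 20.0 + 1e-9, 0.1), 1)
n, n_sims = 50, 1000

def death_trajectories(b1_draws, n, t_grid):
    # Simulate the death model once per prior draw over the WHOLE time grid:
    # infection times are i.i.d. Exp(b1), so I(t) = #{infection times <= t}.
    out = np.empty((len(b1_draws), len(t_grid)), dtype=int)
    for i, b1 in enumerate(b1_draws):
        out[i] = np.searchsorted(np.sort(rng.exponential(1.0 / b1, size=n)),
                                 t_grid, side="right")
    return out

X = death_trajectories(rng.uniform(0.2, 1.0, size=n_sims), n, t_grid)

# Simulated data for ANY design on the grid is then a column lookup,
# with no further simulation needed while the optimiser runs.
design = np.array([2.0, 5.0, 11.0])
cols = np.searchsorted(t_grid, design)
sim_data = X[:, cols]                   # shape (n_sims, len(design))
```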
Further, instead of generating a new sample of ymk for k = 1, . . . , Qm and m = 1, . . . ,K, a fixed
set of ymk values from Xm is used as the observed data. This was originally
proposed by Price et al. (2016). The use of a fixed set of ymk from Xm simplifies the
optimisation step as the utilities are deterministic (for a given design d and pre-simulated
datasets Xm;m = 1, . . . ,K). The accuracy of the approximation will depend upon the
value of Qm (see Equation (3.14)). In practice, there will be a trade-off between the accu-
racy and run time. That is, a more accurate approximation can be obtained by increasing
Qm. However, this increases the computational burden of estimating UMI(d).
Similarly, for estimating the expected Zero-One utility, the posterior model probability of
each rival model is approximated by the ABC-MC algorithm using pre-simulated datasets
Xm;m = 1, . . . ,K. The estimated utility can be expressed as,
U0−1(d) = Σ_{m=1}^{K} p(m) { (1/Qm) Σ_{k=1}^{Qm} u0−1(d, ymk, m) }.    (3.15)
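The Zero-One counterpart replaces the log-probability with an indicator of whether the data-generating model attains the highest posterior model probability. A minimal sketch with made-up toy probabilities:

```python
import numpy as np

def expected_zero_one(post_probs, prior_m):
    # u_{0-1}(d, y_mk, m) = 1 when model m attains the highest ABC posterior
    # model probability for dataset y_mk, and 0 otherwise (Equation (3.15)).
    total = 0.0
    for m, pm in enumerate(prior_m):
        total += pm * np.mean(np.argmax(post_probs[m], axis=1) == m)
    return total

# Toy illustration: K = 2 models, Q_m = 3 datasets each (made-up probabilities).
pp = [np.array([[0.8, 0.2], [0.45, 0.55], [0.9, 0.1]]),
      np.array([[0.3, 0.7], [0.6, 0.4], [0.2, 0.8]])]
u01 = expected_zero_one(pp, prior_m=[0.5, 0.5])   # 2 of 3 correct per model
```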
Lastly, the expected value of the Ds-optimal utility UDs(d) is estimated using a similar
approach to that described above. Here, a pre-simulated dataset (Xms) is generated from the
most complex model (ms) among the competing models. Then, UDs(d) is estimated
by Monte Carlo integration using a fixed set of values yk selected from Xms. It can
be expressed as follows:
UDs(d) = (1/Q) Σ_{k=1}^{Q} uDs(d, yk),    (3.16)
where yk ∼ pms(y | d), and ûDs(d, yk) is the estimate of uDs(d, yk) using the ABC posterior
of θs (see Equation (3.9)) obtained via Algorithm 3.1 using pre-simulated datasets
from the model ms. Let xij be the observed value (number of infected individuals) at the
jth time point in the ith simulation. In this setting, the posterior of θs for a given yk is
approximated via the ABC rejection algorithm using xi· as simulated data (see line 2 of
Algorithm 3.1), and the standard deviation of the pre-simulated data x·j at each time point can
be easily computed and then used in the ABC discrepancy function given in Equation (3.11).
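The exact form of the discrepancy in Equation (3.11) is defined earlier in the chapter; the sketch below uses a plausible stand-in (squared differences scaled by the per-time-point standard deviations of the pre-simulated data) purely to illustrate how x·j feeds into the discrepancy.

```python
import numpy as np

def scaled_discrepancy(x, y, sd):
    # Illustrative stand-in for Equation (3.11): squared differences between
    # simulated (x) and observed (y) counts at each time point, scaled by the
    # standard deviation of the pre-simulated data at that time point.
    return float(np.sum(((x - y) / sd) ** 2))

# x_ij: infected count at the j-th time point in the i-th pre-simulation.
X = np.array([[10.0, 20.0], [12.0, 26.0], [8.0, 23.0]])
sd = X.std(axis=0)                         # per-time-point standard deviations
rho = scaled_discrepancy(X[0], np.array([11.0, 22.0]), sd)
```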
3.6 Optimisation algorithm
The task of locating the optimal design d∗ poses a challenging optimisation problem as
one must maximise a noisy function which may be relatively flat around its maximum.
Locating Bayesian optimal designs for models with intractable likelihoods has an addi-
tional challenge due to the cost of utility evaluations as described in the previous section.
Thus, in this work, we consider a discretised design space to take advantage of the use of
pre-simulated data when evaluating a utility function. Here, we present an adaptation of
the coordinate exchange algorithm for design problems with a discrete (or discretisable)
design space, which exploits parallel computational architectures.
3.6.1 Refined coordinate exchange (RCE) algorithm
In this study, we consider a discretised design space with a large number of possible val-
ues per design variable (observation time of the ith observation). Thus, if one were to
apply the coordinate exchange algorithm here, then U(d) would need to be evaluated for
changes in each element of the design, across all possibilities. This would be very
computationally expensive. Instead, here we use the idea of refining the search space (Dror and
Steinberg, 2006) when optimising U(d), which requires relatively few utility evaluations.
This idea has also been used in the optimisation literature (for example, the grid walk
algorithm) and in approximating integrals with adaptive quadrature (for example, the
adaptive Simpson’s rule (McKeeman, 1962)).
To be more explicit, consider the problem of locating an optimal design with p design vari-
ables based on a given utility function. The RCE algorithm starts with a random design
d0 (see line 1 of Algorithm 3.3), and iteratively changes one variable at a time to maximise
U(d), until the improvement of utility U(d) from one iteration to the next, indexed by k,
is less than a predefined threshold value (ζ = 1 × 10^−8) or a predefined maximum number
of iterations (kmax = 20) has been reached. Let d^k = [d^k_1, . . . , d^k_q, . . . , d^k_p] be a candidate
design at the kth iteration, where U(d^k) is optimised with respect to the qth design variable
(see lines 6-23 of Algorithm 3.3). Here, U(d^k) changes only with d^k_q, as the other p − 1
variables are fixed, and hence it is denoted by U(d^k_q). For a given iteration k and a design
variable q, U(d^k_q) is first evaluated at each value of the set D_q1, which contains the possible
values of the qth design variable from L_q to U_q in increments of st_init (a predefined value),
and the value of the qth variable in D_q1 which gives the maximum utility value U^k*_q is
selected. Then, at each subsequent sub-iteration r, this process is repeated with redefined
lower l_q(r+1) and upper u_q(r+1) limits around d^k*_q and an increment of st_r, until the
increment st_r reaches st_min, the smallest increment of the discretised design space
considered (see lines 10-19 of Algorithm 3.3). For each of these sub-iterations, the set D_qr
contains the possible values of the design variable q from l_qr to u_qr with an increment
of st_r. Here, simulation results suggested that the use of η = 1.5, where η is the grid
reduction parameter, generally avoids local maxima. Then, the current optimal design d^k
is updated with d^k*_q, the value which gives the maximum utility with respect to the qth
variable, as outlined in lines 20-23 of Algorithm 3.3. Further, the chance of locating the
global optimum can be increased by re-running the algorithm from different initial designs.
1:  Initialise: Set k = 0, d^0 = [d^0_1, d^0_2, . . . , d^0_p], U^0_max = U(d^0)
2:  repeat
3:      k = k + 1
4:      U^k_max = U^(k−1)_max
5:      for q = 1 to p do
6:          Evaluate U(d^k_q) for all d^k_q ∈ D_q1, where D_q1 = {L_q, L_q + st_init, L_q + 2 × st_init, . . . , U_q}
            and L_q and U_q are the lower and the upper limits, respectively, of the qth variable
7:          Set d^k*_q = argmax_{d^k_q ∈ D_q1} U(d^k_q)
8:          Set U^k*_q = U(d^k*_q), r = 2, st_2 = st_init/2
9:          Set l_q2 = d^k*_q − η × st_2, u_q2 = d^k*_q + η × st_2
10:         repeat
11:             Evaluate U(d^k_q) for all d^k_q ∈ D_qr, where D_qr = {l_qr, l_qr + st_r, l_qr + 2 × st_r, . . . , u_qr}
12:             Set d^k*_qr = argmax_{d^k_q ∈ D_qr} U(d^k_q)
13:             if U^k*_q ≤ U(d^k*_qr) then
14:                 Set U^k*_q = U(d^k*_qr)
15:                 Set d^k*_q = d^k*_qr
16:             end
17:             Set l_q(r+1) = d^k*_q − η × st_r, u_q(r+1) = d^k*_q + η × st_r
18:             Set st_(r+1) = st_r/2, r = r + 1
19:         until st_r < st_min
20:         if U^k_max ≤ U^k*_q then
21:             Set U^k_max = U^k*_q
22:             Set d^k = [d^k_1, d^k_2, . . . , d^k_(q−1), d^k*_q, d^k_(q+1), . . . , d^k_p]
23:         end
24:     end
25: until U^k_max − U^(k−1)_max < ζ or k = kmax

Algorithm 3.3: Refined coordinate exchange algorithm
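The refinement idea can be sketched compactly in Python. This is an illustrative sketch, not the thesis implementation: a cheap deterministic toy utility stands in for an expensive expected utility, and the initial design, bounds and step sizes are example values.

```python
import numpy as np

def refine_search(u, lo, hi, st_init, st_min, eta=1.5):
    # Search one coordinate: coarse grid scan, then repeatedly halve the step
    # and re-scan a shrinking window around the best value found so far.
    grid = np.arange(lo, hi + 1e-9, st_init)
    best = grid[int(np.argmax([u(v) for v in grid]))]
    st = st_init / 2.0
    while st >= st_min:
        lo_r, hi_r = max(lo, best - eta * st), min(hi, best + eta * st)
        grid = np.arange(lo_r, hi_r + 1e-9, st)
        cand = grid[int(np.argmax([u(v) for v in grid]))]
        if u(cand) >= u(best):
            best = cand
        st /= 2.0
    return best

def rce(utility, p, lo, hi, st_init=1.0, st_min=0.1, kmax=20, zeta=1e-8):
    # Refined coordinate exchange over a deterministic utility (Algorithm 3.3).
    rng = np.random.default_rng(7)
    d = rng.uniform(lo, hi, size=p)          # random initial design
    u_max = utility(d)
    for _ in range(kmax):
        for q in range(p):
            def u_q(v, q=q):                 # utility as a function of d_q only
                dd = d.copy(); dd[q] = v
                return utility(dd)
            v = refine_search(u_q, lo, hi, st_init, st_min)
            if u_q(v) >= u_q(d[q]):          # update only on improvement
                d[q] = v
        u_new = utility(d)
        if u_new - u_max < zeta:
            break
        u_max = u_new
    return d, utility(d)

# Toy deterministic "expected utility" peaked at d = (3.0, 7.0) on [0.1, 20].
target = np.array([3.0, 7.0])
d_opt, u_opt = rce(lambda d: -float(np.sum((d - target) ** 2)),
                   p=2, lo=0.1, hi=20.0)
```

With a separable unimodal utility like this one, each coordinate converges to within the final step size of the optimum after a single pass.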
In the described optimisation algorithm, parallel computing can be used in two ways: (i)
to evaluate U(d) in parallel (see lines 6 and 11 of Algorithm 3.3) and/or (ii) to evaluate
u(d,y,m) in parallel. We experienced significant improvements in computation time
by using the first option. As will be seen in Example 3, we are interested in locating
designs for discriminating between four candidate models. This leads to particularly
expensive utility evaluations for the mutual information and 0-1 utility functions. Hence,
we evaluated U(d) in parallel to allow designs to be found in a reasonable amount of time.
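For illustration, here is one way the per-candidate utility evaluations of option (i) could be parallelised in Python. A thread pool keeps the sketch self-contained and portable; genuinely CPU-bound utility evaluations would use a process pool or a cluster scheduler instead. The quadratic "utility" is a toy stand-in.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def utility(d):
    # Toy stand-in for an expensive expected-utility evaluation U(d).
    return -float(np.sum((np.asarray(d) - 3.0) ** 2))

# Candidate values of a single coordinate (one-point designs on the time grid).
candidates = [[t] for t in np.arange(0.1, 20.05, 0.1)]

# Evaluate U(d) for every candidate in parallel, as in lines 6 and 11 of
# Algorithm 3.3. Threads keep the sketch portable; CPU-bound utilities would
# use a process pool (or the cores of a cluster) instead.
with ThreadPoolExecutor(max_workers=4) as pool:
    utils = list(pool.map(utility, candidates))

best = candidates[int(np.argmax(utils))]   # coordinate value maximising U
```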
3.7 Examples
In epidemiology, individuals in a closed population are divided into sub-populations based
on their disease states, such as susceptible, exposed, infected and recovered, and the spread
of disease is modelled by the transitions of individuals between the sub-populations,
governed by unknown model parameters. The size of each sub-population at time t is
represented by the state of a Markov chain which models the dynamics of the disease
spread among the individuals over time. Here, we consider the following models, which
describe the spread of an infectious disease in a closed population of size n, to demonstrate
the proposed methodology.
Model 1 : Death model
The death model (Cook et al., 2008) is a simple stochastic model which divides the
population of individuals into two states: susceptible and infected, and the number
of individuals in each subpopulation at time t is denoted by S(t) and I(t), respec-
tively. Once susceptible individuals become infected, they remain in the infected
state as they cannot recover. The probability that an infection occurs in the next
infinitesimal time period ∆t, at the time t with j susceptible individuals is,
P (S(t+ ∆t) = j − 1 |S(t) = j) = b1 j∆t + o(∆t),
where b1 is the rate at which susceptible individuals become infectious due to envi-
ronmental sources.
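Because the infection times in the death model are i.i.d. exponential, the marginal distribution is available in closed form, I(t) ~ Binomial(n, 1 − exp(−b1 t)), which gives a handy check on any simulator. A small sketch (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_death_model(b1, n, t):
    # Each susceptible is infected independently at rate b1, so its infection
    # time is Exp(b1); I(t) counts how many infection times fall at or before t.
    return int(np.sum(rng.exponential(1.0 / b1, size=n) <= t))

# Marginally, I(t) ~ Binomial(n, 1 - exp(-b1 * t)): a useful simulator check.
b1, n, t, reps = 0.6, 50, 2.0, 4000
mean_infected = float(np.mean([simulate_death_model(b1, n, t)
                               for _ in range(reps)]))
analytic = n * (1.0 - np.exp(-b1 * t))   # expected number of infecteds
```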
Model 2 : Susceptible-Infected (SI) model
The SI model (Cook et al., 2008) is an extension of the death model via the inclusion
of a parameter b2 which describes the infection rate per susceptible due to other
infectious individuals in the population. The probability that an infection occurs in
the next infinitesimal time period ∆t, at time t with j susceptible individuals is,
P (S(t+ ∆t) = j − 1 |S(t) = j) = (b1 + b2(n− j)) j∆t + o(∆t).
Model 3 : Susceptible-Exposed-Infected (SEI) model
For some diseases, the susceptible individuals are infected but become infectious
after a latent period of time T (Kim and Lin, 2008). The duration of the latent
period of each individual is independently distributed according to an exponential
distribution with rate parameter λ. The individuals in the latent period are called
exposed individuals, and they cannot be distinguished from the susceptible individuals.
Thus, both the number of susceptible and exposed individuals in the population, de-
noted by S(t) and E(t) respectively, cannot be observed. Consequently, the spread
of a disease described by the SEI model is a partially observable process. The prob-
ability that a susceptible becomes an exposed individual in the next infinitesimal
time period ∆t, at the time t with j susceptible individuals is,
P (S(t+ ∆t) = j − 1, E(t+ ∆t) = e+ 1 |S(t) = j, E(t) = e) = b1 j∆t + o(∆t).
The probability that an exposed individual becomes an infectious individual in the
next infinitesimal time period ∆t, at time t with e exposed individuals is,
P (S(t+ ∆t) = j, E(t+ ∆t) = e− 1 |S(t) = j, E(t) = e) = λ e∆t + o(∆t).
Model 4 : Susceptible-Exposed-Infected-II (SEI-II) model
In the SEI-II model, the rate that a susceptible individual becomes an exposed
individual also depends on the number of infectious individuals in the population
via an additional parameter b2. Thus, the probability that a susceptible becomes
an exposed individual in the next infinitesimal time period ∆t, at the time t with j
susceptible individuals and i infectious individuals is,
P (S(t+ ∆t) = j − 1, I(t+ ∆t) = i |S(t) = j, I(t) = i) = (b1 + b2i) j∆t + o(∆t).
Then, each exposed individual becomes an infectious individual after a latent period
of T , where T ∼ exp(λ). The probability that an exposed individual becomes an
infectious individual in the next infinitesimal time period ∆t, at time t with e
exposed individuals is,
P (S(t+ ∆t) = j, E(t+ ∆t) = e− 1 |S(t) = j, E(t) = e) = λ e∆t + o(∆t).
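The SEI dynamics above can be simulated with a minimal Gillespie-style scheme in which the two event types compete; the parameter values are illustrative, and in practice only I(t) would be recorded since S(t) and E(t) are unobservable:

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_sei(b1, lam, n, t_end):
    # Gillespie simulation of the SEI model: infections S -> E occur at rate
    # b1 * S(t) and progressions E -> I at rate lam * E(t); the two event
    # types compete until t_end. Only I(t) is observable in practice.
    t, s, e, i = 0.0, n, 0, 0
    while s + e > 0:
        rate_se, rate_ei = b1 * s, lam * e
        total = rate_se + rate_ei
        t += rng.exponential(1.0 / total)
        if t > t_end:
            break
        if rng.uniform() < rate_se / total:
            s, e = s - 1, e + 1          # a susceptible becomes exposed
        else:
            e, i = e - 1, i + 1          # an exposed becomes infectious
    return s, e, i

s, e, i = simulate_sei(b1=0.6, lam=1.0, n=50, t_end=5.0)
```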
The following subsections describe three examples to demonstrate the methodology pre-
sented in this paper. The first example demonstrates the performance of the proposed
optimisation algorithm on a standard design problem in the literature. The next two
examples consider the design problem of determining a set of time points to observe a
process which yields observations to discriminate between rival models with intractable
likelihoods. In the first of these two examples, the performance of designs derived under
three utilities for discriminating between Model 1 and Model 2 are compared. Then, in
the third example, the performance of these model discrimination utilities in designing
experiments to discriminate between more than two epidemiological models is explored.
In particular, this example explores the performance of the proposed methodology for finding
designs in up to ten dimensions. In the last two examples, optimal designs
are derived based on a discretised design space which consists of discrete time points from
0.1 to 20 days with increments of 0.1 days.
3.7.1 Example 1 - Designs for parameter estimation of a
pharmacokinetic model
In this example, the performance of the RCE algorithm is demonstrated by considering
the problem of locating optimal sampling times for a pharmacokinetic experiment which
has been considered by Overstall and Woods (2017) and Ryan et al. (2014). In pharmacokinetic
studies, compartmental models are typically used to describe the time course of the
concentration of an administered drug in a subject's bloodstream. Let yt be the measured
concentration of the drug at time t, which can be modelled by

yt ∼ N(a(θ) µ(θ; t), σ² b(θ; t)),

where µ(θ; t) = exp(−θ1 t) − exp(−θ2 t), a(θ) = 400 θ2 / (θ3 (θ2 − θ1)),
b(θ; t) = 1 + a(θ)² µ(θ; t)² / 10, and σ² = 0.1.
As in Ryan et al. (2014), independent log-normal priors were assumed for the parameters
of interest, θ = (θ1, θ2, θ3)T where, on the log scale, each θ has a common variance of 0.05
and mean of log(0.1), log(1) and log(20) for θ1, θ2, θ3, respectively. Here, the design prob-
lem is to choose 15 sampling times to measure the concentration of the drug to estimate
θ as precisely as possible. Further, as considered by Ryan et al. (2014), these samples are
taken within the first 24 hours after the administration of the drug and these sampling
times must be at least 15 minutes apart.
Here, we compared the performance of the RCE, CE and ACE algorithms in locating
optimal blood sampling times based on a discrete design space: a 15-dimensional grid
with 0.01 increments, denoted G(15, 0.01) (dimension, increment). Each algorithm was executed
20 times in parallel using a 10-core processor and, for each algorithm, the same set of 20
designs was used as the initial designs. For the purpose of comparison, in this example
we used the implementation of the SIG utility available in acebayes R package (Over-
stall et al., 2017b). In this implementation, the double-loop Monte Carlo approximation
was used to estimate the expected SIG utility (see Overstall and Woods (2017) for more
details). Further, in order to adapt the methodology described in Section 3.5.3 for this
example, a fixed set of Monte Carlo samples (prior predictive samples) of size 1000 was
used in evaluating the expected SIG utility of a given design in all three algorithms.
The ACE algorithm locates the optimal design by iteratively selecting an optimal value of
one design variable at a time while keeping other associated design variables as constants.
In each of these iterations, an optimal value of the selected variable is chosen based on an
interpolated utility surface via Gaussian process (GP) (see Overstall and Woods (2017)
for more details). In this example, the utility function is deterministic throughout the
optimisation procedure. Thus, the same number of Monte Carlo samples was used for
the comparison and GP construction steps. The ACE algorithm is a continuous search
algorithm. Thus, it must be adapted for use when searching across discrete design spaces.
Here, we propose that, when a given design d is not available on the grid, the nearest
design dG on the grid G(15, 0.01) is instead considered. As this is an adaptation of the
ACE algorithm such that it can search a discrete design space, we denote this algorithm
as ACE-D.
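The nearest-grid snap used by ACE-D can be sketched directly; the 24-hour grid below is an illustrative reconstruction of G(15, 0.01), and the continuous design values are made up for the example.

```python
import numpy as np

def snap_to_grid(d, grid):
    # Map each coordinate of a continuous design to its nearest grid value,
    # as in the ACE-D adaptation described above.
    idx = np.abs(grid[None, :] - np.asarray(d)[:, None]).argmin(axis=1)
    return grid[idx]

grid = np.round(np.arange(0.0, 24.0 + 1e-9, 0.01), 2)   # times (hours), 0.01 apart
d_cont = np.array([0.234, 5.678, 23.999])                # continuous proposal
d_grid = snap_to_grid(d_cont, grid)                      # -> [0.23, 5.68, 24.0]
```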
In Table 3.1, the total run time and average number of utility evaluations of each algo-
rithm to locate 20 designs and the maximum expected utility found from these 20 runs
are given. According to the results, all three optimisation algorithms perform similarly
with ACE-D locating a design with the highest utility, and RCE and CE locating designs
which are 99.8% efficient. Further, it is evident that the discretisation of the design space
has a minor effect on locating the optimal design as the RCE located a design which is
98.0% efficient with respect to the ACE design on a continuous design space. The total
run times show that the CE algorithm is by far the most expensive, requiring over 20 times
more computation than RCE and ACE-D. Further, RCE appears to be more efficient than
ACE-D, regardless of which of the three initial step sizes is used. This efficiency is explored
further by inspecting the trace plots of the 20 runs of the RCE and ACE-D algorithms. Such trace
plots are shown in Figure 3.1 which presents the utility value of the best design found at
each iteration of each algorithm. It is evident that RCE relatively quickly locates highly
efficient designs, and appears to be more robust to the initial design.
Table 3.1: Performance of optimisation algorithms in locating 15-point designs for parameter estimation of the PK model.

Algorithm   Initial step size   U(d*)   Total run time (minutes)   Average number of utility evaluations
RCE         1                   4.82     19.2                        3109.3
RCE         0.8                 4.82     16.8                        2856.1
RCE         0.5                 4.82     23.2                        3845.4
ACE-D       -                   4.83     34.7                        6636.3
ACE         -                   4.92     35.8                        6633.9
CE          -                   4.82    735.0                      128684.9
Figure 3.1: Trace plots of U(d) (expected Shannon information gain) at each iteration of the ACE-D and RCE algorithms in locating 15 optimal blood sampling times based on a discretised design space.
In this example, we found that our optimisation algorithm could locate a relatively efficient
design quicker than the ACE-D algorithm, but the latter was still able to eventually find a
design with a slightly higher expected utility. Hence, we explore both of these algorithms
further in the next example.
3.7.2 Example 2 - Designs for model discrimination
In this example, we consider the design problem of deriving a set of distinct time points
at which to observe the process of interest (the spread of a disease) to gain the maximum
information for discriminating between Model 1 and Model 2. Here, time is considered as
the design variable and the ith design point represents the ith time at which to observe the
process. Optimal designs with 1-4 design points were derived using the three utility func-
tions described in Section 3.4.2. In contrast to Model 1, Model 2 has a likelihood which
is computationally intensive to evaluate a large number of times. Thus, here we used our
proposed ABC methods as described in Section 3.5.3 for utility evaluations in locating
optimal designs. As explained in Section 3.5.3, each utility function was evaluated by ap-
proximating posteriors using 50,000 prior predictive datasets from each model. The prior
distributions for the parameters in each model were formed as follows. The parameter b1 of
Model 1 was assumed to be 0.6, and data were generated from this model at three time
points, t = [2, 5, 11] (days). Both models were fitted using ABC rejection with the follow-
ing prior distributions: Model 1, b1 ∼ U(0, 1); Model 2, b1 ∼ U(0, 1), b2 ∼ U(0, 0.05).
The resulting posterior distributions then formed our priors and yielded the predictive
distributions as shown in Figure 3.2. This procedure was adopted to form our prior in-
formation so that the predictive distributions under each model were similar (as seen in
Figure 3.2), resulting in a particularly challenging model discrimination problem.
Figure 3.2: Prior predictive distributions (number of infecteds against time) of Model 1 (solid) and Model 2 (dashed). Here, dot-dashed and dotted lines represent the 2.5% and 97.5% prior predictive quantiles of Models 1 and 2, respectively.
For approximating the mutual information and Zero-One utilities, an equal number of
Monte Carlo samples from each candidate model (Qm, the number of prior predictive draws
from model m) was used, and the value of Qm was set to 500 since it yields estimates
of the utilities with a Monte Carlo error (standard deviation) of less than 0.02. It was found
that significant increases in precision could not be achieved for modest increases in Qm
(see Figure A.1). Similarly, the number of Monte Carlo samples from Model 2 used to
approximate the Ds-optimal utility was set to 500.
The accuracy of using estimates based on the ABC posterior in utility approximations
was evaluated by making comparisons with the case where the likelihood can be evaluated. That
is, for Model 1, the likelihood is straightforward to evaluate, and, for Model 2, the compu-
tationally expensive moment closure method can be used to approximate the likelihood.
Hence, our utilities were approximated both using ABC methods and using the likelihood
directly. This comparison was undertaken by randomly generating 500 two, three and
four point designs from the design space. For the one point case, all 200 possible designs
were considered. First, the expected utility of each of these designs was approximated
via ABC methods as described in Section 3.5.3 and then, these approximated expected
utilities were compared with the corresponding expected utilities evaluated where the
likelihood is available.
Figure 3.3: Comparison of estimated expected utility values for the mutual information utility (first row), the Zero-One utility (second row) and the Ds-optimality utility (third row) using ABC likelihoods and actual likelihoods, with columns for one, two, three and four design points (ABC approximation plotted against the actual expected utility). In each plot, the y = x line indicates a perfect match between approximated and actual utility evaluations.
Figure 3.3 shows a comparison of the expected utility values based on the ABC approx-
imation and the actual likelihood for each utility function considered in this study. The
first row shows that the ABC-MC approximation of the mutual information utility
closely matches the expected utility values obtained when using the actual likelihood. In the
case of designs with four points, the approximated utility values slightly deviate from the
actual utility value, particularly for designs with higher utility values. Such noise could
potentially hinder our ability to locate the optimal design. Similarly, the approximation
of the Zero-One utility via the ABC posterior model probability results in a one-to-one
match for low dimension designs and some deviations for the four point design, as shown
in the second row of Figure 3.3. The comparison of the expected utility values based
on the ABC approximation and the actual likelihood for Ds-optimality illustrated in the
third row suggests that the ABC approximation is biased. However, the relative ordering
of the designs (in terms of utility) is still preserved which suggests the approximation can
still be used when searching for the optimal design.
Table 3.2: Performance of optimisation algorithms in locating two points designs for discriminatingbetween Model 1 and 2 based on different utility functions.
Utility functionOptimisation
algorithm
Optimal design
d∗U(d∗)
Total
run time
(minutes)
Mutual information
RCE(0.5) (0.9, 4.2) -0.46 120.6
RCE(0.8) (0.9, 4.2) -0.46 91.8
RCE(1) (0.9, 4.2) -0.46 80.4
ACE-D (0.9, 4.2) -0.46 434.4
Ds-optimal
RCE(0.5) (0.6, 3.7) 10.36 75.0
RCE(0.8) (0.6, 3.7) 10.36 59.4
RCE(1) (0.6 , 3.7) 10.36 72.0
ACE-D (0.6 , 3.7) 10.36 332.4
Zero-One (0-1)
RCE(0.5) (0.7 , 4.3) 0.788 165.0
RCE(0.8) (0.9, 4.0) 0.791 117.6
RCE(1) (0.9 , 4.0) 0.791 125.4
ACE-D (0.9, 4.0) 0.791 475.8
The performance of the RCE and ACE-D algorithms were further explored for locating
optimal designs on a two dimensional grid with increments 0.1, that is, G(2,0.1) (similar
results were found for 3 and 4 design points, so they are given in the appendix - see Table
A.1). Here, each algorithm was executed 10 times in parallel starting from the same (ran-
domly drawn) initial designs. For each algorithm, the maximum expected utility value
found from the 10 runs and the total run time of 10 runs are given in Table 3.2. The
results show that both algorithms locate the same designs. One advantage of the RCE
algorithm is that it appears to locate these designs in a relatively short amount of time
(around 2 to 3 times faster than ACE-D). To further explore this, trace plots were again
inspected, see Figure 3.4. From this figure, it can be seen that ACE-D requires more
iterations of the exchange algorithm to locate the optimal design under all three utility
functions. As the utility functions considered for the remainder of this example and, in
particular, the next example are expensive to approximate, we propose only using RCE
to locate designs. Further, it is evident that the optimal designs found by the RCE al-
gorithm somewhat depend on the initial step size with both 0.8 and 1 being preferred in
terms of computation time. Thus, we ran the RCE algorithm with an initial step size of 1.
Figure 3.4: Trace plots of the expected utility for each run of the ACE-D and RCE algorithms in locating optimal two-point designs based on (a) the mutual information utility, (b) the Ds-optimality utility and (c) the Zero-One utility.
The optimal designs found under three utility functions, the mutual information util-
ity, the Ds-optimal utility and the Zero-One utility are given in Table 3.3. Here, we
re-evaluated the expected utility of each optimal design 500 times using different Monte
Carlo samples of size 500 from each model to estimate U(d∗) and the associated Monte
Carlo standard error. According to these designs, the process should be observed in its
early stages, when most of the susceptible individuals become infected, and the derived
optimal observation times are those at which there is a relatively large difference between
the prior predictive distributions of the candidate models (see Figure 3.2). According to
the estimated utility values, no noticeable increase in the model discrimination ability of
the designs is achieved by collecting more than two observations.
The performance of the optimal designs found under each utility was assessed against a
set of randomly selected designs with an equal number of design points. In each
case, 500 observations from Model 1 were generated according to the optimal design and
a corresponding random set of designs. Then, the posterior model probabilities of Model
Table 3.3: Optimal designs for discriminating between Model 1 and Model 2 derived under different utility functions.

Utility function     Number of design points |d|   Optimal design d*       U(d*)          Total run time (minutes)
Mutual information   1                             0.4                     -0.55 (0.01)    11.7
                     2                             (0.9, 4.2)              -0.47 (0.02)    80.2
                     3                             (0.6, 4.3, 6.0)         -0.47 (0.02)   233.8
                     4                             (0.6, 4.0, 5.1, 10.8)   -0.47 (0.02)   340.0
Ds-optimal           1                             0.4                      9.85 (0.01)     9.8
                     2                             (0.6, 3.7)              10.35 (0.02)    72.3
                     3                             (0.6, 3.3, 5.7)         10.43 (0.02)   145.7
                     4                             (0.5, 1.0, 3.3, 5.7)    10.46 (0.02)   228.0
Zero-One (0-1)       1                             0.3                      0.71 (0.01)    13.0
                     2                             (0.9, 4.0)               0.77 (0.01)   125.5
                     3                             (0.4, 5.3, 8.1)          0.77 (0.01)   187.5
                     4                             (0.3, 1.2, 3.3, 5.5)     0.77 (0.01)   331.1
1 were evaluated based on these simulated observations. In order to avoid potential inac-
curacies introduced by estimating the posterior model probabilities via ABC, the actual
likelihood was evaluated for this validation. In Figure 3.5, the empirical cumulative dis-
tribution function (CDF) of the estimated posterior model probability of Model 1 (true model)
based on each design is plotted. Here, we note that the proposed optimal designs under
each utility function perform equally well in discriminating between Model 1 and Model
2. According to Figure 3.5, randomly selected designs can perform as well as optimal
designs in discriminating between rival models, particularly in the case of one point de-
signs. This outcome is due to the fact that the set of random designs might actually
contain the optimal or near-optimal design. We note that, on average, the optimal de-
signs have a higher chance of correctly selecting the model responsible for data generation.
Figure 3.5: Empirical cumulative probabilities of the posterior model probability of Model 1 (true model) obtained for observations generated from Model 1 according to optimal designs (0-1, Ds and MI) for discriminating between Models 1 and 2, and random designs (RD), with panels (a)-(d) for one to four design points.
A similar simulation study was conducted using data generated from Model 2, and Figure 3.6 presents the empirical CDF of the estimated posterior probability of Model 2 in each case. In this instance, across both models, the optimal designs perform better for discrimination than the random designs (in expectation). However, the optimal designs appear to have a slightly higher chance of selecting the wrong model on some occasions. Optimal designs were obtained by maximising the expected utility, which is evaluated over the possible outcomes of all competing models. Consequently, a selected optimal design might not be preferred for discrimination for each individual realisation from the prior information. However, over all realisations (or a large set of realisations), the optimal designs are preferred for discrimination.
[Figure: four panels, (a) 1 design point, (b) 2 design points, (c) 3 design points, (d) 4 design points; each plots the empirical cumulative probability against the posterior model probability for the 0-1, Ds, MI and RD designs.]

Figure 3.6: Empirical cumulative probabilities of the posterior model probability of Model 2 (true model) obtained for observations generated from Model 2 according to optimal designs for discriminating between Models 1 and 2, and random designs.
3.7.3 Example 3 - Designs for model discrimination
In this example, we derived optimal observation times which yield informative observations for discriminating between Models 1, 2, 3 and 4 using the utility functions considered in Example 2. As in the previous example, the ith design point represents the ith time at which to observe the process of interest. As there are four competing models in this example, we considered designs with up to ten design points for model discrimination. Note that in the Ds-optimal utility function, b1 and λ of Model 4 were taken as θs, the extra parameters of the most complex candidate model. For Models 1 and 2, the prior distributions from Example 2 were used, and for Models 3 and 4, the prior distribution of the parameters was taken as the ABC posterior obtained from the same dataset from Model 1 used in Example 2, with priors for Model 3 of b1 ∼ U(0.4, 1) and λ ∼ exp(rate = 0.01), and for Model 4 of b1 ∼ U(0.01, 0.6), b2 ∼ U(0.01, 0.0.5) and λ ∼ exp(rate = 0.01).
In this example, the utility evaluations are computationally more expensive than in the previous example as there are more competing models to consider. Thus, we implemented parallel computing to evaluate utilities within the RCE algorithm (see lines 6 and 11 of Algorithm 3.3). Specifically, the RCE algorithm was run five times in parallel using five nodes, each with 12 cores. In estimating the expected utility for the mutual information and Zero-One utilities, 500 Monte Carlo samples from each model were used, and for the Ds-optimal utility, 1000 Monte Carlo samples from Model 4 were used to achieve the desired accuracy (Monte Carlo error less than 0.02). Optimal designs with 1 to 10 design points obtained under each utility function, and their corresponding utility values, are given in Table 3.4. In contrast to the previous examples, here the average run time over five runs of the RCE algorithm in locating each optimal design is given. It is evident that the Ds-optimal utility function is computationally less expensive than the other two utility functions, and that the Zero-One utility function rapidly becomes computationally intensive as the number of candidate models increases. As in Example 2, the process should be observed in its early stages to collect informative observations for discriminating between competing models.
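The Monte Carlo error reported in parentheses alongside each utility value can be estimated directly from the per-sample utilities; the following is a generic sketch (the function name and interface are our own, not from the thesis code):

```python
import numpy as np

def mc_utility_with_error(u_samples):
    """Return the Monte Carlo estimate of the expected utility U(d) and
    its standard error, computed from per-sample utility values
    u(d, y, m); the target accuracy here was an error below 0.02."""
    u = np.asarray(u_samples, dtype=float)
    return u.mean(), u.std(ddof=1) / np.sqrt(len(u))
```

When the standard error exceeds the target, the number of Monte Carlo samples can be increased (error shrinks at rate 1/√n).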
Table 3.4: Utility of optimal designs for discriminating between Models 1, 2, 3 and 4 derived under different utility functions.

Utility function | |d| | Optimal design d* | U(d*) | Average run time (minutes)
Mutual information | 1 | 0.5 | -1.22 (0.01) | 5.2
Mutual information | 2 | (0.9, 5.3) | -1.11 (0.01) | 21.9
Mutual information | 3 | (0.8, 3.8, 6.9) | -1.08 (0.01) | 52.2
Mutual information | 4 | (0.1, 1.0, 3.8, 5.2) | -1.06 (0.02) | 82.6
Mutual information | 6 | (0.1, 0.9, 3.7, 5.2, 7.1, 8.5) | -1.03 (0.01) | 113.3
Mutual information | 8 | (0.1, 0.2, 0.9, 3.7, 5.1, 7.1, 8.1, 10.8) | -1.00 (0.01) | 226.6
Mutual information | 10 | (0.1, 0.2, 1.0, 3.7, 4.9, 5.8, 7.7, 8.7, 11.2, 13.8) | -1.00 (0.01) | 312.2
Ds-optimal | 1 | 0.2 | 1.48 (0.02) | 4.7
Ds-optimal | 2 | (0.4, 3.5) | 1.75 (0.02) | 32.2
Ds-optimal | 3 | (0.1, 0.6, 3.4) | 1.87 (0.02) | 25.5
Ds-optimal | 4 | (0.1, 0.6, 3.1, 4.1) | 1.87 (0.02) | 64.9
Ds-optimal | 6 | (0.1, 0.3, 0.8, 3.1, 5.7, 9.5) | 1.89 (0.02) | 95.8
Ds-optimal | 8 | (0.1, 0.4, 2.3, 3.9, 7.8, 10.2, 13.1, 15.0) | 1.88 (0.02) | 109.7
Ds-optimal | 10 | (0.1, 0.6, 2.2, 3.5, 7.8, 9.2, 9.5, 11.0, 14.3, 15.6) | 1.89 (0.02) | 179.5
Zero-One (0-1) | 1 | 5.6 | 0.39 (0.01) | 15.9
Zero-One (0-1) | 2 | (0.9, 7.0) | 0.45 (0.01) | 99.0
Zero-One (0-1) | 3 | (0.1, 0.4, 5.0) | 0.48 (0.01) | 170.9
Zero-One (0-1) | 4 | (0.1, 0.9, 3.4, 7.2) | 0.51 (0.01) | 255.4
Zero-One (0-1) | 6 | (0.1, 0.2, 0.9, 4.1, 6.7, 7.5) | 0.54 (0.01) | 364.5
Zero-One (0-1) | 8 | (0.1, 0.2, 1.2, 3.9, 5.2, 7.1, 8.0, 9.1) | 0.55 (0.01) | 635.6
Zero-One (0-1) | 10 | (0.1, 0.2, 0.4, 1.3, 3.5, 5.1, 7.2, 8.7, 9.0, 12.1) | 0.55 (0.01) | 788.8
Further, the performance of our proposed methodology for locating designs in higher dimensions (|d| ≥ 6) was explored using the mutual information utility. Table 3.5 compares the quality of designs obtained as the number of model simulations (N) used for the posterior approximations is increased (see Algorithm 3.2), where the expected utility of each design was evaluated using 4 × 10^5 prior predictive samples. It is evident from these designs that little to no improvement in the utility is achieved by increasing the number of model simulations. This provides some assurance that highly efficient designs have been located under the initially selected settings for this example.
Table 3.5: Utility of optimal designs for discriminating between Models 1, 2, 3 and 4 derived under the mutual information utility.

|d| | N (×10^5) | Optimal design d* | U(d*) | Average run time (minutes)
6 | 1 | (0.1, 0.9, 3.7, 5.2, 7.1, 8.5) | -1.03 (0.01) | 113.3
6 | 2 | (0.1, 0.8, 2.2, 4.3, 5.4, 7.1) | -1.03 (0.01) | 313.7
6 | 4 | (0.1, 0.3, 1.2, 3.7, 4.8, 6.9) | -1.01 (0.01) | 766.4
8 | 1 | (0.1, 0.2, 0.9, 3.7, 5.1, 7.1, 8.1, 10.8) | -1.00 (0.01) | 226.6
8 | 2 | (0.1, 0.2, 1.0, 3.7, 5.1, 7.1, 8.1, 10.8) | -1.00 (0.01) | 575.8
8 | 4 | (0.1, 0.2, 1.0, 3.7, 4.8, 5.1, 6.8, 8.5) | -1.00 (0.01) | 1244.7
10 | 1 | (0.1, 0.2, 1.0, 3.7, 4.9, 5.8, 7.7, 8.7, 11.2, 13.8) | -1.00 (0.01) | 312.2
10 | 2 | (0.1, 0.2, 0.9, 3.3, 4.3, 5.1, 7.1, 8.1, 10.8, 12.1) | -0.99 (0.01) | 709.5
10 | 4 | (0.1, 0.3, 1.0, 3.8, 5.1, 6.0, 7.1, 8.1, 8.7, 11.0) | -1.00 (0.01) | 1507.8
[Figure: four panels, (a) 1 design point, (b) 4 design points, (c) 8 design points, (d) 10 design points; each plots the empirical cumulative probability against the ABC posterior model probability for the 0-1, Ds, MI and RD utility functions.]

Figure 3.7: Empirical cumulative probabilities of the ABC posterior model probability of Model 1 (true model) obtained for observations generated from Model 1 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs.
As in Example 2, the performance of the derived optimal designs given in Table 3.4 was evaluated via a simulation study by generating data from all four competing models considered in this example. However, here the ABC posterior model probability of the true model was used for validation to avoid the high computational cost of computing the actual posterior model probabilities of Models 3 and 4. Across all models, the optimal designs generally perform better than the random designs, as shown by much larger empirical CDF values near 1. Specifically, the optimal designs found under the mutual information and Zero-One utility functions perform equally well across all data generating models. When Model 1 is the data generating model (see Figure 3.7), the Ds-optimal designs do not perform as well as the designs from the other two utilities, but they actually outperform the other optimal designs when Model 4 is generating data (see Figure A.4). This could be because the Ds-optimality utility focusses on estimating the additional parameters in Model 4, with the single parameter of Model 1 not being included in the utility function.
3.8 Discussion
In this work, a methodology for selecting Bayesian designs for discriminating between models with intractable likelihoods in epidemiology has been developed. First, Bayesian design for model discrimination in epidemiology was considered by introducing a computationally efficient method to estimate three model discrimination utilities, namely the mutual information utility, the Ds-optimal utility and the Zero-One utility, via ABC methods. Second, an adaptation of the coordinate exchange algorithm which exploits parallel computational architectures was presented.
The results from comparing our utility approximation in Example 2 with the estimated utility when the likelihood could be computed suggest that our approach can be used to evaluate designs when competing models contain intractable likelihoods. Across Examples 2 and 3, all discrimination utilities generally performed well, yielding data which could be used to determine (with high probability) the true data generating model. However, there was one instance where the Ds-optimality utility performed poorly, and this appeared to be due to how the utility was constructed (in terms of which subset of parameters was considered). One benefit of implementing this utility is that it required much less computational effort than the other two utilities. This is particularly noticeable when compared to the Zero-One utility, which becomes computationally expensive as the number of rival models (K) increases since it requires (K − 1) posterior model probability approximations for a single evaluation of u(d, y, m), whereas the Ds-optimality utility requires only one posterior approximation. The mutual information utility also becomes moderately expensive to evaluate as more candidate models are considered because it needs to evaluate u(d, y, m) for each model m, each of which requires a single posterior model probability approximation.
In Example 3, we demonstrated our methodology for the location of designs of up to ten dimensions. This methodology could potentially be used to find designs in more than ten dimensions. However, as the number of dimensions increases, so does the ABC tolerance. As such, there may be a point where this tolerance is so large that it hinders our ability to locate optimal designs. This could be addressed by increasing the number of prior simulations to obtain a reasonably sized posterior sample with an acceptable tolerance, but this comes at a computational cost and increases the amount of memory required to store these simulations. Thus, alternative approaches to ABC rejection may be of interest, including the expectation propagation ABC (EP-ABC) algorithm of Barthelme and Chopin (2014) and the synthetic likelihood approach of Wood (2010) and Price et al. (2018c). We plan to explore these ideas in future work.
In this paper, the RCE algorithm was proposed to locate Bayesian designs. We also proposed an adaptation of the ACE algorithm such that it can be used to search across discrete design spaces. Both algorithms are able to take advantage of pre-simulated data, and gave promising results, yielding relatively highly efficient designs in high dimensions and in intractable likelihood settings. The computational efficiency of RCE, and its ability to exploit parallel computational architectures, led to this algorithm being preferred for design problems involving computationally intensive utility evaluations. However, the RCE algorithm could potentially benefit from employing an emulator of the utility surface (as used in ACE-D), and this could be considered in further research. One drawback of using a discrete design space is that the located design could be sub-optimal. However, selecting a small increment for the grid used to discretise the design space should enable the location of relatively highly efficient designs in a reasonable amount of time where the use of a continuous design space could be computationally prohibitive, particularly for complex models.
Given the developments presented in this paper, it is now possible to undertake Bayesian design for parameter estimation and model discrimination in settings where the likelihood is intractable. Thus, it should also be possible to consider dual purpose utility functions such as the total entropy utility of Borth (1975), which addresses both of these experimental goals. Recently, this utility has been implemented in settings where the likelihood can be evaluated straightforwardly (McGree, 2017), but it is of interest to extend such methodology so that experiments in epidemiology are informative for both model selection and parameter estimation.
4 Dual purpose Bayesian design for parameter estimation and model discrimination in epidemiology using a synthetic likelihood approach
Chapter 4. Dual purpose Bayesian design for experiments in epidemiology 62
Statement for Authorship
This chapter has been written as a journal article. The authors listed below have certified
that:
(a) They meet the criteria for authorship as they have participated in the conception,
execution or interpretation of at least the part of the publication in their field of
expertise;
(b) They take public responsibility for their part of the publication, except for the
responsible author who accepts overall responsibility for the publication;
(c) There are no other authors of the publication according to these criteria;
(d) Potential conflicts of interest have been disclosed to granting bodies, the editor
or publisher of the journals of other publications and the head of the responsible
academic unit; and
(e) They agree to the use of the publication in the student’s thesis and its publication
on the Australian Digital Thesis database consistent with any limitations set by
publisher requirements.
The reference for the publication associated with this chapter is: Dehideniya M. B., Drovandi C. C., and McGree J. M. Dual purpose Bayesian design for parameter estimation and model discrimination in epidemiology using a synthetic likelihood approach. Bayesian Analysis (Submitted for publication).
Contributor Statement of contribution
M. B. Dehideniya Developed and implemented the statistical methods, wrote
the manuscript, revised the manuscript as suggested by
co-authors.
Signature and date:
J. M. McGree Initiated the research concept, supervised research,
assisted in interpreting results, critically reviewed
manuscript.
C. C. Drovandi Supervised research, assisted in interpreting results,
critically reviewed manuscript.
Principal Supervisor Confirmation: I have sighted email or other correspondence for all
co-authors confirming their authorship.
Name: James McGree    Signature: ______________    Date: 05/07/2019
4.1 Abstract
Foot and mouth disease (FMD) is a highly contagious infectious disease which has frequently plagued livestock across many different countries worldwide. Currently, the spread of the disease is not well understood, and thus experiments are needed so that targeted disease detection, prevention and control measures can be developed. However, developing such experiments is challenging as the likelihood of models for such infectious diseases is typically computationally intractable. This poses challenges in quantifying the usefulness of different experiments through a utility function. For this purpose, a novel synthetic likelihood approach is considered which allows experiments for infectious diseases to be developed through a dual-purpose utility function for parameter estimation and model discrimination. The new methodology is validated on an illustrative example before being applied to the experiments for FMD which motivate this work. Across both examples, the results suggest that the derived dual purpose designs perform similarly well in achieving each experimental goal when compared to the designs optimised for each individual goal. Further, the results from the motivating example suggest that new knowledge about how FMD spreads throughout a population could be discovered if our approaches are adopted in future experimentation.
4.2 Introduction
Understanding the dynamics of infectious diseases is important for the development and implementation of detection, prevention and control measures. In the field of veterinary epidemiology, the dynamics of disease transmission are investigated by observing the spread of the disease over time among a small population of animals in a controlled experiment. Unfortunately, such experimentation can have detrimental effects on animal welfare and can be costly in terms of time and money. This motivates the need to efficiently design such studies, both in terms of the number of experiments which need to be conducted and the number of times at which the population needs to be observed.
Methods from optimal design can be used to address this need for efficiency, and have been specifically developed to handle the computational intractability of evaluating the likelihood of models typically found in epidemiology. To determine informative times at which to observe these controlled experiments, methods have been proposed for efficient estimation of transmission parameters (Cook et al., 2008, Drovandi and Pettitt, 2013, Pagendam and Pollett, 2013) and for determining the most appropriate model to describe the spread of the disease (Dehideniya et al., 2018b). However, further efficiency could be achieved through the consideration of multiple experimental objectives, leading to more ethical experimentation. Such developments are of particular importance when experimenting with infectious diseases such as foot and mouth disease (FMD), which can be quite harmful to animals, and this motivates the research presented in this article.
FMD is a highly contagious disease which impacts livestock such as cattle, pigs and sheep (Knight-Jones and Rushton, 2013). There have been a number of outbreaks of FMD, including in 2001 in the United Kingdom (Haydon et al., 2004) and the Netherlands (Bouma et al., 2003), and in 2010 in Japan (Muroga et al., 2012), which had large-scale economic impacts and resulted in the mass culling of livestock in attempts to prevent the disease from spreading further. Consequently, much attention has focussed on studying the transmission dynamics of FMD, for instance, Backer et al. (2012), Bravo de Rueda et al. (2015), Hu et al. (2017), Orsel et al. (2007) and the references therein. These studies have led to the development of conflicting views about which model is most appropriate to describe the dynamics of FMD. Hence, experiments are required which provide highly informative data to select, among the competing models found in the literature, the most appropriate model to describe the dynamics of FMD, and also to estimate the parameters of this model as precisely as possible. This motivates the development of a dual purpose utility function that can be used to design experiments for epidemiological models, which typically have intractable likelihoods.
In general, deriving dual purpose designs for model discrimination and parameter estimation is challenging as these are typically competing objectives. That is, designs for model discrimination typically perform poorly for parameter estimation, and vice versa (Atkinson, 2008). Many authors have used a weighted sum of utilities representing the different objectives, for example, Atkinson (2008), Clyde and Chaloner (1996), McGree et al. (2008) and Tommasi (2009). However, such an approach has been shown to be difficult to implement in practice due to the choice of the weighting parameter. In the Bayesian context, Borth (1975) suggests an entropy-based utility to derive dual purpose designs which avoids the pre-specification of weights and the need to pre-specify a true model. Thus, in this work, we adopt the Bayesian approach to design dual purpose experiments. McGree (2017) notes the computational difficulties in using the total entropy utility even in situations where the likelihood can be evaluated, and these are further compounded by the intractability of the likelihood function for models typically found in epidemiology. As such, dual purpose experiments to estimate parameters and discriminate between models with intractable likelihoods have not been considered in the literature.
In the Bayesian context, methods from approximate Bayesian computation (ABC) have been used in designing experiments for parameter estimation (Drovandi and Pettitt, 2013, Price et al., 2016) and model discrimination (Dehideniya et al., 2018b). However, these methodologies are restricted to a small class of utility functions due to the use of ABC rejection for approximating the posterior distribution. In particular, ABC rejection methods do not provide an efficient estimate of the model evidence, and as such cannot be used with the widely popular Kullback-Leibler (KL) divergence utility for parameter estimation. This also renders such methods inapplicable for use with the total entropy utility. Further, the ABC approximation can perform poorly as the number of design dimensions increases. Consequently, there will be a point where this potentially hinders locating the optimal design, depending on the problem and the available prior information.
Hence, we propose a novel synthetic likelihood method to approximate the likelihood, and then to estimate the utility function more efficiently. Our developments of the synthetic likelihood facilitate not only the consideration of the total entropy utility function but a much wider class of utilities which it has not been possible to consider previously (including the KL divergence utility). Thus, our methodology should be useful in designing experiments for models with intractable likelihoods in general.
Pagendam and Pollett (2013) used the Gaussian diffusion approximation in evaluating
designs for parameter estimation in the frequentist framework. Despite its computational
efficiency compared with the simulation-based approximations such as ABC, the Gaussian
diffusion approximation can only be used for a certain class of models, that is, density
dependent Markov chains. Further, as pointed out by Pagendam and Pollett (2013), the
diffusion approximation is only valid for reasonably large populations (say > 100). Hence,
the diffusion approximation cannot be used in experiments in veterinary epidemiology
which typically consider a small number of animals due to the cost of conducting such
experiments, and concerns about animal welfare. The proposed synthetic likelihood does
not have such constraints, and thus our methodology is more appropriate in general.
The paper is organised as follows. In the next section, our proposed synthetic likelihood
method is described. Section 4.4 presents the utility functions used in this work, and we
show how these can be efficiently estimated within our synthetic likelihood framework.
Then, an illustrative example is considered in Section 4.5 along with the motivating
application for this work. Finally, we conclude with a discussion of our proposed methods
and directions for future research.
4.3 Inference framework
In epidemiology, continuous-time Markov chain (CTMC) models are used to describe the spread of a disease in a closed population where the individuals are divided into a set of non-overlapping subpopulations according to their disease status, such as susceptible, exposed, infectious, or recovered. Depending on the disease of interest, all or a subset of these subpopulations are considered, and the state of the Markov chain at time t is defined by the number of individuals in each subpopulation. The transitions of individuals between these subpopulations (disease states) are then described by transition probabilities defined in terms of unknown parameter values. The likelihood of a series of observed states, y = {y1, y2, ..., yk}, of the Markov process at times d = {t1, t2, ..., tk} can be expressed as the product of transition probabilities, p(y|θ, d) = ∏_{i=1}^{k} p(y_i | y_{i-1}), where p(y_i | y_{i-1}) is the transition probability of moving from state y_{i-1} to y_i during the time from t_{i-1} to t_i for given parameter values θ. Unfortunately, the evaluation of these transition probabilities is computationally expensive as the Markov processes have (at least) a reasonable number of states. This results in a computationally intractable likelihood for most epidemiological models; thus, likelihood-free methods are required for inference.
The synthetic likelihood approach (Wood, 2010) is a simulation-based method used to form an approximation to intractable likelihoods. To approximate the likelihood of observed data yobs for given parameter values θ, summary statistics sobs = S(yobs) are considered. This typically maps the data into a smaller number of dimensions, and is frequently used in likelihood-free inference to avoid the curse of dimensionality. The likelihood for the observed data is then approximated by the likelihood of the summary statistics given the model and θ, where these summary statistics are assumed to be multivariate Normal with mean vector µ(θ) and covariance matrix Σ(θ). Usually, µ(θ) and Σ(θ) are intractable functions but can be estimated via simulation from the model for given values of θ. That is, one can simulate n datasets from the model of interest and compute summary statistics (s_sim^(1), s_sim^(2), ..., s_sim^(n)) which can be used to estimate µ(θ) and Σ(θ). Thus, despite the likelihood being intractable, we require it to be computationally efficient to generate data from the model. Originally, Wood (2010) focused on estimating parameters via maximising the synthetic likelihood, and recently Price et al. (2018c) extended this to the Bayesian paradigm by incorporating a prior distribution on the parameters of interest.
The FMD application considered here involves datasets with a moderate number of observations, and thus we derive the synthetic likelihood based on the distribution of the simulated data themselves rather than considering summary statistics. To specify the proposed synthetic likelihood approach, consider a univariate Markov process with a discrete state space {1, 2, ..., N}, and let y = {y1, y2, ..., yk} be the observed states of the Markov process at times d = {t1, t2, ..., tk}. For instance, the Susceptible-Infected (SI) model (Cook et al., 2008) divides a closed population of N individuals into two subpopulations, susceptible and infectious individuals, and the state of the Markov process at time ti is the number of infectious individuals at ti. Then, the likelihood of y for a given model m with parameters θ, p(y|θ, m, d), can be approximated by assuming that y follows a k-dimensional multivariate Normal distribution, N(µ(θ, m, d), Σ(θ, m, d)), where µ(θ, m, d) and Σ(θ, m, d) are the mean vector and covariance matrix estimated from n datasets, denoted by X = {x_ti ; i = 1, ..., k}, simulated from model m given parameter values θ at each time point ti. Specifically, µ(θ, m, d) is a vector of k elements where the ith element is the estimated mean of x_ti, and the (i, j) element of Σ(θ, m, d) is the estimated covariance between x_ti and x_tj.
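The simulated datasets required above can be generated exactly with the Gillespie algorithm; below is a minimal sketch for the SI model described in the text. The infection-rate form β I (N − I)/N is one common parameterisation and is an assumption here, as is the interface:

```python
import numpy as np

def simulate_si(beta, N, obs_times, i0=1, rng=None):
    """Gillespie simulation of the SI model: a closed population of N
    individuals with I infectious, and (assumed) infection rate
    beta * I * (N - I) / N. Returns the number infectious at each
    (increasing) observation time."""
    rng = np.random.default_rng() if rng is None else rng
    t, I = 0.0, i0
    out = []
    for t_obs in obs_times:
        while I < N:
            # exponential waiting time to the next infection event
            wait = rng.exponential(N / (beta * I * (N - I)))
            if t + wait > t_obs:
                break
            t += wait
            I += 1
        out.append(I)
        t = t_obs  # memoryless waiting times: restart the clock at t_obs
    return np.array(out)
```

Repeated calls with the same θ give the x_ti trajectories from which µ(θ, m, d) and Σ(θ, m, d) are estimated.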
However, in our motivating study, count data are observed, and thus the multivariate Normal density may not be appropriate to describe the likelihood of these discrete data, particularly in the later stages of the spread of the disease where only a few unique outcomes are plausible. Thus, a continuity correction is applied to approximate p(y|θ, m, d), which can be expressed as follows,
pSL(y|θ,m,d) = p(y1 − c < X1 < y1 + c, ..., yk − c < Xk < yk + c), (4.1)
where (X1, X2, ..., Xk) ∼ N(µ(θ, m, d), Σ(θ, m, d)) and c is the continuity correction factor, which is set to 0.5 here. The accuracy of this approximation depends on n, the number of simulations used, and this makes the synthetic likelihood method more convenient to use in practice as less tuning is required compared to ABC methods. Further, in principle, it is straightforward to extend this to l-dimensional stochastic processes by considering an (l × k)-dimensional multivariate Normal distribution. However, as (l × k) increases, the evaluation of pSL(y|θ, m, d) becomes more computationally expensive, and this may be a limitation if (l × k) is large.
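A simple way to realise Eq. (4.1) numerically is to fit the multivariate Normal to simulated trajectories and then estimate the box probability by Monte Carlo; this sketch uses an illustrative interface (`simulate_model(theta)` returning the k observed states of one trajectory is our assumption, not the thesis code):

```python
import numpy as np

def synthetic_likelihood(y, simulate_model, theta, n=500, c=0.5,
                         n_mc=20000, rng=None):
    """Estimate p_SL(y | theta, m, d) of Eq. (4.1): fit a multivariate
    Normal to n simulated trajectories, then estimate the probability of
    the continuity-corrected box (y_i - c, y_i + c) by Monte Carlo."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    X = np.array([simulate_model(theta) for _ in range(n)], dtype=float)
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False) + 1e-8 * np.eye(len(y))  # jitter for stability
    Z = rng.multivariate_normal(mu, Sigma, size=n_mc)
    return np.mean(np.all(np.abs(Z - y) < c, axis=1))
```

For low-dimensional y, the box probability could instead be computed with a multivariate Normal CDF routine; the Monte Carlo form above keeps the sketch dependency-free.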
When the prior of the parameters θ is updated based on a small to moderate number of observations, the posterior of θ can be approximated via importance sampling. However, this requires evaluating the likelihood, which is generally computationally intractable for epidemiological models. In its place, we propose to use our synthetic likelihood approximation. That is, first, a set of particles {θ^i}_{i=1}^{Q} is drawn from p(θ) with equal weights. Then, to reflect the posterior of θ upon observing y, these particles are weighted by their corresponding normalised weights, W^i, obtained by normalising the likelihoods w^i = pSL(y|θ^i, m, d).
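The reweighting step is a one-liner in practice; a minimal sketch (working on the log scale for numerical stability, a standard choice rather than anything prescribed by the text):

```python
import numpy as np

def normalise_weights(log_sl):
    """Importance-sampling update of the prior particles: given log
    synthetic likelihoods log w^i = log p_SL(y | theta^i, m, d), return
    the normalised weights W^i (log-sum-exp trick for stability)."""
    logw = np.asarray(log_sl, dtype=float)
    w = np.exp(logw - logw.max())  # subtract max to avoid underflow
    return w / w.sum()
```

The weighted particle set {θ^i, W^i} then serves as the approximate posterior.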
Moreover, the marginal likelihood of y for model m can be approximated via Monte Carlo integration based on the synthetic likelihood of y evaluated for a sample {θ^i}_{i=1}^{Q} drawn from the corresponding parameter distribution. The approximated marginal likelihood is given by,

p(y|m, d) = (1/Q) ∑_{i=1}^{Q} pSL(y|θ^i, m, d).   (4.2)
When there are K competing models to describe the observed data y at times d, the probability p(y|d) can be approximated by,

p(y|d) = ∑_{m=1}^{K} p(y|m, d) p(m),   (4.3)

where p(m) is the prior model probability of model m. Then, the posterior model probability of model m can be approximated as follows:

p(m|y, d) = p(y|m, d) p(m) / p(y|d).   (4.4)
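Equations (4.2)-(4.4) chain together directly; a minimal sketch, with an interface of our own choosing:

```python
import numpy as np

def posterior_model_probs(sl_by_model, prior_m=None):
    """sl_by_model[m] holds the Q synthetic likelihoods
    p_SL(y | theta^i, m, d) for prior draws theta^i from model m.
    Averaging gives the marginal likelihood (4.2); weighting by the
    prior model probabilities and normalising as in (4.3)-(4.4) gives
    p(m | y, d)."""
    evidence = np.array([np.mean(sl) for sl in sl_by_model])  # (4.2)
    K = len(evidence)
    p_m = np.full(K, 1.0 / K) if prior_m is None else np.asarray(prior_m, dtype=float)
    joint = evidence * p_m          # numerator of (4.4)
    return joint / joint.sum()      # divide by (4.3)
```

With equal prior model probabilities, the posterior probabilities are simply the normalised average synthetic likelihoods.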
Hence, our proposed synthetic likelihood approach can be used to undertake parameter
estimation and model selection. Moreover, such approximations are of general use when
approximating utility functions in Bayesian design for models with intractable likelihoods.
Further details about this will be shown in the next section.
4.4 Bayesian experimental designs
Designing an experiment is a decision making process about how to select suitable values of
controllable variables of the experiment in order to efficiently address experimental aims.
The set of values selected for the controllable variables is referred to as the design, and
the experimental aims are defined via a utility function u(.). In deciding upon a design,
one must account for uncertainty about the model (m), the ensuing parameter values (θ)
and the data (y) that will be observed. This can be achieved through evaluating the
expected value of the utility function. Hence, in the Bayesian setting, possible designs
are evaluated and compared based on the expected value of u(.) which represents the
informativeness of y for addressing the aim of the experiment.
When the experimenter considers a single model to describe the process of interest, find-
ing the optimal design involves maximising the expected value of the utility function
u(d,y,θ), with respect to the joint distribution of y and θ, and this expectation can be
expressed as follows:
$$ U(d) = \int_{y} \int_{\theta} u(d, y, \theta)\, p(y \mid \theta, d)\, p(\theta)\, d\theta\, dy, \qquad (4.5) $$
where p(y|θ,d) is the likelihood of a possible outcome y under parameters θ and design
d, and p(θ) is the prior distribution of θ which allows prior knowledge or expert opinion
to be incorporated into the process of designing efficient experiments. In practice, utility functions for parameter estimation are often defined as a function of the posterior distribution of the parameters, such as the KL divergence between the prior and the posterior (Cook et al., 2008), a commonly used utility in Bayesian experimental design. In such utilities, θ is integrated out, and consequently the expected utility can be defined as follows:
$$ U(d) = \int_{y} u(d, y)\, p(y \mid d)\, dy, \qquad (4.6) $$
where p(y|d) is the prior predictive distribution of observed data y under design d.
In the presence of K competing models to describe the process of interest, the sum of
expected utilities under each model, weighted by the corresponding prior model probabilities p(m), m = 1, 2, …, K, can be considered as the expected utility. This can be expressed
as follows:
$$ U(d) = \sum_{m=1}^{K} p(m) \left\{ \int_{y} u(d, y, m)\, p(y \mid m, d)\, dy \right\}, \qquad (4.7) $$
where p(y|m,d) is the prior predictive distribution of observed data y under design d
according to model m and p(m) is the prior model probability of model m.
Unfortunately, given the form of most utility functions, the above integral is generally ana-
lytically intractable, and therefore needs to be approximated. One conventional approach
is to approximate the above integral via Monte Carlo integration using N independent
datasets (y) generated from the prior predictive distribution of each model, and this can
be expressed as follows:
$$ U(d) \approx \sum_{m=1}^{K} p(m)\, \frac{1}{N} \sum_{j=1}^{N} u(d, y_j^m, m), \qquad (4.8) $$
where $y_j^m \sim p(y \mid m, d)$ are independent draws from model m at time points d. The
number of prior predictive simulations, K × N , needed to estimate U(d) with a desired
accuracy can be reduced using the randomised Quasi-Monte Carlo method (Drovandi and
Tran, 2018). However, for simplicity, here we estimate the expected utility via standard
Monte Carlo integration as given in Equation (4.8).
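The estimator in Equation (4.8) can be sketched as follows, where the sampler and utility interfaces are illustrative placeholders rather than the chapter's implementation:

```python
import numpy as np

def expected_utility(d, samplers, prior_model_probs, utility, N=500, rng=None):
    """Eq (4.8): samplers[m](d, rng) draws one dataset y ~ p(y | m, d), and
    utility(d, y, m) evaluates u(d, y, m); both are user supplied."""
    rng = np.random.default_rng() if rng is None else rng
    U = 0.0
    for m, p_m in enumerate(prior_model_probs):
        u_vals = [utility(d, samplers[m](d, rng), m) for _ in range(N)]
        U += p_m * np.mean(u_vals)
    return U
```

Each call to `utility` hides the posterior approximation discussed next, which is where the real computational cost lies.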
As will be seen in the next subsections, the evaluation of each u(d,ymj ,m) involves some
posterior evaluation, and thus the approximation of U(d) requires approximating or sam-
pling from K × N posterior distributions, which is a considerable computational challenge.
The following subsections describe the total entropy utility function and the two other utility functions considered in this work. Then, a computationally efficient approach to
approximate each utility function via the synthetic likelihood will be discussed.
4.4.1 Dual purpose utility function of parameter estimation and
model discrimination
In the design literature, the derivation of dual purpose experimental designs for parame-
ter estimation and model discrimination has been considered by means of weighted crite-
ria (Atkinson, 2008, Hill et al., 1968) and entropy-based utilities (Borth, 1975, McGree,
2017). In practice, the choice of appropriate weight for each experimental aim when using
weighted criteria is not straightforward (Clyde and Chaloner, 1996, McGree et al., 2008).
In contrast, the utility function based on the total entropy does not require pre-specified
weights for each experimental aim as the additive property of entropy provides a natural
way to weight both model discrimination and parameter estimation within a utility func-
tion. However, due to the computational challenges in evaluating the total entropy utility,
it has received little attention in the literature until the recent work of McGree (2017) who
proposed a computationally efficient way of estimating the utility via sequential Monte
Carlo.
In this work, we consider a utility based on the expected change in total entropy upon
observing the experimental data to derive static designs for dual purpose experiments of
parameter estimation and model discrimination. The total entropy utility for a possible
dataset y under model m according to the design d can be expressed as,
$$ u_{TE}(d, y, m) = \int_{\theta_m} \log p(y \mid \theta_m, m, d)\, p(\theta_m \mid m, y, d)\, d\theta_m - \log p(y \mid d). \qquad (4.9) $$
The derivation of this utility function is described in Appendix B.1 of the supplementary material. For an observed dataset y under model m given design d, the first term of $u_{TE}(d, y, m)$ can be approximated via Monte Carlo integration using a sample of weighted particles $\{\theta_m^i, W_m^i\}_{i=1}^{Q}$, which forms a particle approximation to $p(\theta_m \mid m, y, d)$. These weighted particles can be obtained via importance sampling based on the synthetic likelihood method, as explained in Section 4.3. The second term, $p(y \mid d)$, can be approximated as shown in Equation (4.3) based on draws from $p(\theta_m \mid m)$ for each model m. Then,
the approximated total entropy utility can be expressed as,
$$ u_{TE}(d, y, m) \approx \sum_{i=1}^{Q} W_m^i \log p(y \mid \theta_m^i, m, d) - \log p(y \mid d). \qquad (4.10) $$
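As an illustration, the approximation in Equation (4.10) can be computed from per-model synthetic log-likelihoods alone, since both the weights $W_m^i$ and $p(y \mid d)$ derive from them. A sketch in log space:

```python
import numpy as np
from scipy.special import logsumexp

def u_total_entropy(log_sl_by_model, prior_model_probs, m):
    """Eq (4.10) for model m, where log_sl_by_model[k][i] is the synthetic
    log-likelihood log p_SL(y | theta_k^i, k, d) at Q prior draws per model."""
    log_sl_m = np.asarray(log_sl_by_model[m])
    # Self-normalised importance weights W_m^i (Section 4.3)
    W = np.exp(log_sl_m - logsumexp(log_sl_m))
    # log p(y | m, d) for every model (Eq 4.2), then log p(y | d) (Eq 4.3)
    log_ev = np.array([logsumexp(ls) - np.log(len(ls))
                       for ls in log_sl_by_model])
    log_ev_all = logsumexp(log_ev + np.log(prior_model_probs))
    return float(W @ log_sl_m - log_ev_all)
```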
The total entropy utility can be expressed as a summation of two widely used utility
functions, the KL divergence utility (Ryan, 2003) for parameter estimation and the mutual
information utility for model discrimination (Box and Hill, 1967) (see Appendix B.1 of
the supplementary material). Thus, we consider the designs optimised based on these
utility functions to investigate the performance of total entropy designs for addressing
each objective. For completeness, the following subsections briefly describe the KLD
utility and the mutual information utility.
4.4.2 Utility function for parameter estimation
The KL divergence between the prior and the posterior distribution of the model pa-
rameters can be used as a utility function to design an experiment to efficiently estimate
parameters of a given model m (Cook et al., 2008, Price et al., 2016, Ryan, 2003). It can
be expressed as follows:
$$ u_{KLD}(d, y, m) = \int_{\theta_m} \log p(y \mid \theta_m, m, d)\, p(\theta_m \mid m, y, d)\, d\theta_m - \log p(y \mid m, d), \qquad (4.11) $$
where y is a possible outcome of the experiment conducted under the design d according
to the model m. When the experimenter considers more than one potential model to
describe the process of interest, the sum of the expected KL-divergence between the prior
and the posterior of parameters of each model m, weighted by the corresponding prior
model probabilities, can be used to obtain designs for parameter estimation under model
uncertainty (see Equation (4.7)). Further, this can be expressed as the expected change
in the entropy of the parameter based on the observations y given design d (see Appendix
B.1 in the supplementary material).
In a similar way to the approximation of $u_{TE}(d, y, m)$ in Section 4.4.1, the first term of the KLD utility can be approximated using a set of weighted particles $\{\theta_m^i, W_m^i\}_{i=1}^{Q}$ obtained for an observed dataset y from model m. The log-marginal likelihood of y, log p(y|m,d),
for an observed dataset y from model m. The log-marginal likelihood of y, log p(y|m,d),
can also be approximated using the synthetic likelihood approach described in Section
4.3 (see Equation (4.2)). Then, the approximation of uKLD(d,y,m) can be expressed as
follows:
$$ u_{KLD}(d, y, m) \approx \sum_{i=1}^{Q} W_m^i \log p(y \mid \theta_m^i, m, d) - \log p(y \mid m, d). \qquad (4.12) $$
4.4.3 Utility function for model discrimination
In the design literature, the mutual information between the model indicator and the
outcomes of the experiment has been used as a utility function to evaluate sequential
designs for this purpose (Box and Hill, 1967, Drovandi et al., 2014). More recently,
Dehideniya et al. (2018b) used this utility to derive designs for discriminating between
epidemiological models with intractable likelihoods. The mutual information utility can
be expressed as,
$$ u_{MI}(d, y, m) = \log p(m \mid y, d), \qquad (4.13) $$
where p(m|y,d) is the posterior model probability of model m. For models with in-
tractable likelihoods, this can be approximated by replacing p(m|y,d) with the approxi-
mated posterior model probability of m via our synthetic likelihood approach as described
in Section 4.3 (see Equation (4.4)). Then, the expected mutual information utility for a
given design d can be obtained by substituting approximated uMI(d,y,m) in Equation
(4.8).
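Combining Equations (4.4) and (4.10)–(4.13) gives the identity $u_{TE} = u_{KLD} + u_{MI} - \log p(m)$, so with equal prior model probabilities the total entropy utility equals the sum of the other two up to an additive constant. A quick numerical check with illustrative values (not taken from the examples):

```python
import numpy as np

# Illustrative quantities for one model m under a given design d
W        = np.array([0.25, 0.75])     # normalised particle weights W_m^i
log_like = np.array([-1.2, -0.8])     # log p(y | theta_m^i, m, d)
log_ev_m = -1.0                       # log p(y | m, d), Eq (4.2)
log_ev   = -1.4                       # log p(y | d),   Eq (4.3)
log_pm   = np.log(0.5)                # log p(m)

u_te  = W @ log_like - log_ev         # Eq (4.10)
u_kld = W @ log_like - log_ev_m       # Eq (4.12)
u_mi  = log_ev_m + log_pm - log_ev    # Eq (4.13) via Eq (4.4)

# u_TE and u_KLD + u_MI differ only by the constant log p(m)
assert abs(u_te - (u_kld + u_mi - log_pm)) < 1e-12
```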
It is worth noting that p(y|m,d) and p(y|d) cannot be evaluated efficiently using ABC
methods, and this is another motivation for adopting our synthetic likelihood approach
in this work.
In the next section, the total entropy utility will be used to derive designs for two epi-
demiological experiments which aim to both discriminate between competing models and
estimate model parameters. Then, the performance of derived dual purpose designs will
be compared to the performance of designs which are optimised for each objective.
4.5 Examples
In this section, two examples which involve epidemiological models with intractable like-
lihoods are considered to demonstrate the methodology presented in this paper. The first
example demonstrates the performance of the proposed synthetic likelihood approxima-
tion in estimating the utility functions described in the previous section, and investigates
the performance of the dual purpose designs derived based on the total entropy utility.
The second example, which motivates our research, focuses on designing an efficient ex-
periment to investigate the within-herd spread of the FMD in a cattle population. In
these examples, the following models are used to describe the spread of the infectious
disease of interest in a closed-population of size N .
Model 1 : Death model
The death model (Cook et al., 2008) divides the population into two sub-populations,
susceptible and infected. The state of the CTMC at time t is defined as the number
of infected individuals at time t, I(t). Given I(t) = i, the transition probability of
the possible state at t+ ∆t is given by
$$ P[\, i + 1 \mid i \,] = \beta_1 (N - i)\, \Delta t + O(\Delta t), $$
where β1 is the rate at which susceptible individuals become infected due to envi-
ronmental sources.
Model 2 : Susceptible-Infected (SI) model
The SI model (Cook et al., 2008) assumes that the infected individuals in the population also contribute to the spread of the disease, represented by an additional parameter β2, in addition to environmental sources. Given that I(t) = i, the transition
probability of the possible state at t+ ∆t is given by
$$ P[\, i + 1 \mid i \,] = (\beta_1 + \beta_2\, i)\, (N - i)\, \Delta t + O(\Delta t). $$
Model 3 : Susceptible-Infectious-Recovered (SIR) model
The SIR model divides a closed-population of individuals into three sub-populations:
susceptible, infectious and recovered. The state of the CTMC at time t is defined as
the number of susceptible and infectious individuals at time t, {S(t), I(t)}, respec-
tively. According to the SIR model, the susceptible individuals become infected and
are immediately infectious. These infectious individuals contribute to the spread of
the disease until they recover. The duration of the infectious period of each individ-
ual is independently and identically distributed as an exponential random variable
with rate parameter α. Given that (S(t), I(t)) = (s, i), the transition probabilities for possible states at t + ∆t are given by
$$ P[\, s - 1,\, i + 1 \mid s,\, i \,] = \frac{\beta s i}{N}\, \Delta t + O(\Delta t), $$
$$ P[\, s,\, i - 1 \mid s,\, i \,] = \alpha i\, \Delta t + O(\Delta t), $$
where β is the rate at which an infectious individual makes infected-contacts per
unit time.
Model 4 : Susceptible-Exposed-Infectious-Recovered (SEIR) model
In contrast to the SIR model, the SEIR model assumes that the susceptible indi-
viduals do not become infectious immediately after they become infected but after
a latent period of time TE. During this latent period, which is distributed as an exponential random variable with rate αE, the infected individuals cannot be differentiated from the susceptible individuals. Thus, these individuals are called
by E(t). Once the exposed individual becomes infectious, it spreads the disease
to other susceptible individuals until it recovers after time TI which is distributed
as an exponential random variable with rate αI . The state of the CTMC at time
t is defined as the number of susceptible, exposed and infectious individuals at
time t, {S(t), E(t), I(t)}, respectively. Given that (S(t), E(t), I(t)) = (s, e, i), the
transition probabilities for possible states at t + ∆t are given by
$$ P[\, s - 1,\, e + 1,\, i \mid s,\, e,\, i \,] = \frac{\beta s i}{N}\, \Delta t + O(\Delta t), $$
$$ P[\, s,\, e - 1,\, i + 1 \mid s,\, e,\, i \,] = \alpha_E\, e\, \Delta t + O(\Delta t), $$
$$ P[\, s,\, e,\, i - 1 \mid s,\, e,\, i \,] = \alpha_I\, i\, \Delta t + O(\Delta t), $$
where β is the rate at which an infectious individual makes infected-contacts per
unit time.
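Since the four models above are continuous-time Markov chains, prior predictive datasets can be simulated exactly with the Gillespie algorithm. The sketch below, for the SI model (Model 2), records I(t) at a grid of observation times and then estimates a mean vector and covariance matrix over that grid; it is an illustration of the idea rather than the chapter's code.

```python
import numpy as np

def simulate_si(theta, N, t_grid, rng):
    """Exact (Gillespie) simulation of the SI model: with I(t) = i, the next
    infection occurs at rate (beta1 + beta2*i)*(N - i). Returns I(t) at each
    observation time in the increasing grid t_grid."""
    beta1, beta2 = theta
    t, i = 0.0, 0
    out, k = np.empty(len(t_grid), dtype=int), 0
    while k < len(t_grid):
        rate = (beta1 + beta2 * i) * (N - i)
        t_next = t + rng.exponential(1.0 / rate) if rate > 0 else np.inf
        while k < len(t_grid) and t_grid[k] < t_next:
            out[k] = i              # state is constant between events
            k += 1
        t, i = t_next, i + 1
    return out

def grid_summaries(theta, N, t_grid, n_sims, rng):
    """Mean vector and covariance matrix of I(t) over the grid, i.e. the mu
    and Sigma of a synthetic likelihood for a design on these times."""
    sims = np.array([simulate_si(theta, N, t_grid, rng)
                     for _ in range(n_sims)])
    return sims.mean(axis=0), np.cov(sims, rowvar=False)
```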
In conducting experiments to investigate the spread of a disease, it is impractical to
observe the process and collect samples continuously. Even collecting samples at frequent
intervals would be costly. Thus, in the following examples, we consider a set of k optimal
times at which to observe the process to collect samples for both parameter estimation and
model discrimination. It may be computationally prohibitive to find the optimal design
in continuous design space (time) as it requires a large number of model simulations for
each utility evaluation, particularly for complex models such as SIR and SEIR. However,
the n datasets simulated to obtain µ(θ,m,d) and Σ(θ,m,d) are independent of the observed data y (see Section 4.3). Thus, we avoid undertaking an excessively large number
of simulations in the optimal design procedure by discretising the design space as a grid
of times from tmin to tmax with increments of tinc and pre-computing µ(θ,m,d) and
Σ(θ,m,d) using the pre-simulated datasets at each time point of the grid. Then, these pre-
computed mean vectors and covariance matrices are used for approximating the likelihood
and utility values in a computationally efficient way during the process of locating optimal
designs. Another advantage of this approach is that the accuracy of the approximation can
be increased by increasing the number of model simulations used to estimate µ(θ,m,d)
and Σ(θ,m,d) without any additional storage requirements. In the design literature, pre-
simulated datasets have been used to estimate utilities, for instance, Drovandi and Pettitt
(2013), Hainy et al. (2016). As we use a discrete design space in the examples considered
in this work, the refined coordinate exchange (RCE) algorithm (Dehideniya et al., 2018b)
is used to locate optimal designs. The RCE algorithm starts with a given initial design
and searches for the optimal design by iteratively optimising one design variable at a time
while keeping all other design variables fixed. In each iteration, the best value for the
selected design variable is found by refining the grid on which U(d) is evaluated, starting
from a grid with a given initial step size (see Section 5 of Dehideniya et al. (2018b) for
the complete algorithm). In each example, the RCE algorithm was run five times, with
different initial designs, in parallel with an initial step size of 1.
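Stripped of its grid-refinement step, the coordinate-wise search underlying the RCE algorithm can be sketched as follows, where U stands for any expected-utility estimator (in practice the synthetic likelihood approximation described above; the full algorithm of Dehideniya et al. (2018b) additionally refines the grid around the current best value):

```python
import numpy as np

def coordinate_exchange(U, d0, grid, max_sweeps=20):
    """Greedy coordinate-wise maximisation of U(d) over a discrete set of
    candidate times: optimise one design point at a time, holding the rest
    fixed, and sweep until no coordinate changes."""
    d = np.array(d0, dtype=float)
    best = U(d)
    for _ in range(max_sweeps):
        changed = False
        for j in range(len(d)):
            for t in grid:
                trial = d.copy()
                trial[j] = t
                u = U(trial)
                if u > best:
                    d, best, changed = trial, u, True
        if not changed:
            break
    return d, best
```

Because U is estimated by Monte Carlo, in practice the search is typically repeated from several initial designs, as done in the examples below.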
4.5.1 Example 1 - Death and SI models
To illustrate the proposed methodology, first, we consider the problem of deriving an
efficient set of time points to observe the spread of a disease, which provides maximum
information to both discriminate between two rival models, namely the death and SI models, and to estimate model parameters. Then, designs for each individual experimental
goal are also considered to compare the performance of derived dual purpose designs
in addressing each individual experimental goal. Here, a grid of times from 0.1 to 20
days with increment 0.1 was considered as the design space, and designs with 1 to 8
design points were derived. This example has been previously considered by Dehideniya
et al. (2018b) for discriminating between the death and SI models. Here, for the death model, β1 ∼ log-normal(−0.48, 0.09) is used as the prior distribution, and for the SI model, β1 ∼ log-normal(−1.1, 0.16) and β2 ∼ log-normal(−4.5, 0.4) are used as the prior distributions. These two models were assumed equally likely a priori.
First, the accuracy of approximating the expected utility of a given design d using the
proposed synthetic likelihood approach was evaluated. In this comparison, 500 randomly
selected designs with two, four and eight points were used, and for the one-point case, all
200 possible designs were considered. For each of these designs, U(d) was evaluated for
the KLD, mutual information and total entropy utilities based on both the
actual likelihood and the approximated likelihood via our synthetic likelihood approach,
and the results are shown in Figure 4.1. Here, 500 independent datasets from each model
were used to approximate the expected utilities (see Equation (4.8)), and the posterior
distribution of parameters of each model was approximated via importance sampling as
described in Section 4.3, using 500 parameter values drawn from the corresponding prior
distribution. Further, 1000 model simulations were used to estimate the mean vector and
covariance matrix in each synthetic likelihood evaluation. As is evident from the first row
of Figure 4.1, the approximated expected values of the mutual information utility closely
match the corresponding estimated expected utility values based on actual likelihood
evaluations. In contrast, the adapted synthetic likelihood approach overestimates the
expected utility values of both total entropy and KLD utilities (see Figure 4.1). However,
the candidate designs can still be ranked correctly (in terms of utility) based on our
approximation. Thus, our approach can be used to find optimal designs for models with
intractable likelihoods. Further, an additional simulation study reveals that our synthetic
[Figure 4.1 appears here: a 3 × 4 grid of scatter plots (columns: 1, 2, 4 and 8 design points), each plotting U(d) – SL Approximation against U(d) – Actual.]
Figure 4.1: Comparison of the estimated expected utility of the mutual information utility (first row), the total entropy utility (second row) and the KLD utility (third row) using synthetic and actual likelihoods. In each plot, the y = x line indicates a perfect match of approximated and actual utility evaluations.
likelihood approach outperforms the ABC model choice algorithm (Grelaud et al., 2009)
in approximating the mutual information utility for model discrimination particularly for
designs with a moderate number of design points (see Appendix B.2 of the supplementary
material).
The derived dual purpose designs for discriminating between the death and SI models and estimating the parameters of both models are given in Table 4.1 along with the designs
found under the mutual information and KLD utilities. Here, the expected utility of each
optimal design was estimated using 100 re-evaluations of U(d∗) for different Monte Carlo
samples of size 500 from each model, and the estimated Monte Carlo error is given as a
standard deviation in parentheses.
The performance of the total entropy designs for model discrimination and parameter
estimation was assessed against designs optimised for each task and randomly selected
Table 4.1: Optimal designs derived under different utility functions.

Utility function      Optimal design d∗                               U(d∗)
Mutual Information    (0.5)                                           -0.59 (0.01)
                      (0.5, 3.5)                                      -0.49 (0.01)
                      (0.5, 1.2, 3.5)                                 -0.46 (0.02)
                      (0.5, 1.2, 3.5, 5.6)                            -0.44 (0.02)
                      (0.5, 1.2, 2.2, 3.5, 5.6, 12.1)                 -0.43 (0.02)
                      (0.2, 0.5, 1.2, 2.2, 3.5, 5.6, 12.1, 13.7)      -0.43 (0.02)
Total Entropy         (2.8)                                           0.77 (0.02)
                      (0.9, 3.6)                                      1.11 (0.03)
                      (0.6, 2.5, 5.5)                                 1.25 (0.03)
                      (0.6, 2.2, 3.7, 5.6)                            1.32 (0.03)
                      (0.5, 2.1, 3.5, 4.6, 5.7, 7.3)                  1.41 (0.03)
                      (0.5, 1.2, 2.2, 3.5, 4.7, 5.7, 7.3, 8.9)        1.48 (0.03)
KLD                   (2.7)                                           0.74 (0.02)
                      (1.9, 4.4)                                      0.92 (0.02)
                      (0.7, 2.7, 5.7)                                 1.01 (0.02)
                      (0.7, 2.3, 4.3, 6.0)                            1.08 (0.02)
                      (0.7, 2.2, 3.6, 4.4, 5.7, 7.7)                  1.17 (0.02)
                      (0.7, 2.3, 3.6, 4.4, 5.0, 5.8, 7.0, 8.9)        1.23 (0.02)
designs. In this evaluation, first, 500 datasets were generated from the death model
according to each of the optimal designs given in Table 4.1 and randomly selected designs
with an equivalent number of design points. Then, the posterior model probability of
the death model and log reciprocal of the posterior variance of β1 of the death model
were used to evaluate the efficiency of the optimal and the random designs for model
discrimination and parameter estimation, respectively. The performance of designs was
evaluated in a similar manner for the SI model. In each case, posterior inference was
undertaken based on actual likelihood evaluations.
According to Figure 4.2, the total entropy designs perform similarly to the corresponding
discrimination designs, except for the one-point design where the total entropy design
is less efficient than the discrimination design. As expected, the one-point KLD design
performs poorly for discrimination while the mutual information design performs well
across both models. However, all optimal designs with more than three design points
perform similarly well for model discrimination (as the designs are quite similar) while
the random designs perform poorly. Moreover, it seems that there is only a small benefit
in collecting more than three observations for discriminating between the death and SI
models.
Figure 4.3 compares the performance of optimal designs in terms of parameter estimation.
Here, for both models, all optimal designs provide more information than the correspond-
ing random designs. Both KLD and total entropy one-point designs perform significantly
better than the discrimination design across both data generating models. It appears that
[Figure 4.2 appears here: panels (a) Death model and (b) SI model, plotting the posterior model probability for each design (MI, TE, KLD and random designs with 1–8 design points).]
Figure 4.2: The posterior model probability of the data generating model obtained for observations generated from (a) the death model and (b) the SI model according to optimal designs and random designs.
[Figure 4.3 appears here: panels (a) Death model and (b) SI model, plotting log(1/det(cov(theta|y,d))) for each design (MI, TE, KLD and random designs with 1–8 design points).]
Figure 4.3: Log determinant of the posterior covariance of the parameters of (a) the death model and (b) the SI model obtained for observations generated from the corresponding model according to optimal designs and random designs.
the performance of designs with more than two design points is similar across all the
utility functions regardless of the data generating model as (again) designs found under
each utility are quite similar (see Table 4.1).
Given that our proposed approach to approximate U(d) appears reasonable across the three utilities considered in this work, and that the total entropy utility appears to perform well in
both model discrimination and parameter estimation, we next consider this methodology
to design experiments to learn more about FMD.
4.5.2 Example 2 - SIR and SEIR models
Transmission experiments have been used as a primary tool to investigate the transmission
dynamics of FMD within a herd of animals, for instance, Backer et al. (2012), Bravo de
Rueda et al. (2015), Hu et al. (2017), Orsel et al. (2007). These experiments generally start
with giving a viral dose containing a prespecified amount of plaque forming units (PFU)
of the FMD viral strain of interest, for instance approximately 37500 PFU of FMD virus
strain O/NET/2001 (Orsel et al., 2007), to randomly selected animals and letting them
interact with susceptible animals. Then, the animals are observed over time for evidence
of disease states such as infectious and recovery based on clinical signs and blood samples
collected from the animals. In the veterinary epidemiological literature, both the SIR
(Orsel et al., 2007) and SEIR (Backer et al., 2012, Bradhurst et al., 2015) models have
been proposed to model the within-herd spread of FMD in cattle populations. Thus, in
this example, we focus on deriving a set of time points to observe a closed population
of 50 cattle to both discriminate between SIR and SEIR models and estimate model
parameters.
In this hypothetical example, the number of animals who are administered the virus,
approximately 37500 PFU of FMD virus strain O/NET/2001, prior to the experiment is
considered as a fixed variable, specifically I(t = 0) = 5, in order to consider a realistic
scenario. These inoculated animals are kept separate from the other animals for a pre-
specified period (say 24 hours), and it is assumed that these animals are infectious at the
beginning of the experiment (t = 0) when they are allowed to mix with the remaining 45 animals, who are assumed to be susceptible to FMD. The experiment is run over the
course of 30 days.
As it is costly to collect and test various samples from animals at regular intervals, here
we consider a set of k optimal times at which to collect those samples. Since the SEIR
model assumes a non-observable disease state, E(t), here we consider only the number of
infectious and recovered individuals observed at each time point in the utility evaluations.
Thus, subsequent inference is undertaken based on observed data Y = {(I(ti), R(ti)); i =
1, . . . , k} where I(ti) and R(ti) are the number of infectious and recovered individuals,
respectively, at time ti. Here, infectious or recovered individuals are identified according
to the amount of virus measured in the collected blood samples of each individual as it is
Table 4.2: Optimal designs derived under different utility functions.

Utility function      Optimal design d∗                                    U(d∗)
Mutual Information    (3.1)                                                -0.43 (0.02)
                      (4.1, 16.0)                                          -0.34 (0.02)
                      (0.7, 4.1, 18.4)                                     -0.30 (0.02)
                      (0.7, 4.1, 10.1, 25.3)                               -0.28 (0.02)
                      (0.7, 3.1, 5.3, 6.5, 10.1, 25.3)                     -0.27 (0.02)
                      (0.7, 2.9, 4.1, 5.3, 6.3, 6.5, 10.1, 25.4)           -0.27 (0.02)
Total Entropy         (7.0)                                                0.97 (0.02)
                      (6.7, 17.5)                                          1.56 (0.03)
                      (6.5, 13.5, 27.1)                                    1.81 (0.03)
                      (5.5, 10.8, 16.3, 27.1)                              1.97 (0.03)
                      (4.1, 7.0, 10.8, 14.2, 18.8, 27.1)                   2.16 (0.03)
                      (4.1, 7.0, 10.8, 12.9, 15.2, 17.5, 21.7, 27.3)       2.30 (0.04)
KLD                   (11.6)                                               0.91 (0.02)
                      (9.4, 19.1)                                          1.26 (0.03)
                      (7.4, 14.2, 27.1)                                    1.47 (0.03)
                      (7.3, 10.9, 16.4, 27.1)                              1.60 (0.03)
                      (7.3, 10.7, 14.2, 17.8, 21.7, 28.2)                  1.79 (0.03)
                      (7.3, 10.7, 12.8, 15.0, 17.4, 21, 23.8, 28.2)        1.94 (0.03)
more reliable than classifying them according to their clinical signs (see Stenfeldt et al. (2016) for more details).
Based on the parameter values of the SEIR model given in Backer et al. (2012), β ∼ log-normal(0.44, 0.16²), αE ∼ gamma(25.55, 0.02) (shape, scale) and αI ∼ gamma(7.25, 0.04) were chosen as the prior distributions of the parameters of the SEIR model. Then, β ∼ log-normal(−0.09, 0.19²) and αI ∼ gamma(10.30, 0.02) were used as the prior distributions of the parameters of the SIR model, as these yielded similar prior predictive distributions for infectious and recovered individuals as given by the SEIR model. These
two models were assumed equally likely a priori. Table 4.2 summarises the optimal designs
found under our three utility functions.
According to these designs, the process should be observed at its early stages to dis-
criminate between models. In contrast, observations collected at the middle and later
stages provide more information about parameters. A combination of such design char-
acteristics is reflected in the designs derived based on the total entropy utility. As in
the previous example, the performance of the derived optimal design for discriminating
between competing models and estimating model parameters was evaluated. Here, the
synthetic likelihood was used in this evaluation as the actual likelihood of either of these
models is not available in closed-form.
Figure 4.4 compares the performance of the optimal and random designs for discriminat-
ing between the SIR and SEIR model when (a) the SIR model and (b) the SEIR model is
[Figure 4.4 appears here: panels (a) SIR model and (b) SEIR model, plotting the approximated posterior model probability for each design (MI, TE, KLD and random designs with 1–8 design points).]
Figure 4.4: The approximated posterior model probability of (a) the SIR model and (b) the SEIR model obtained for observations generated from the corresponding model according to optimal designs and random designs.
responsible for data generation. For both models, the model discrimination designs perform better than the other designs. When the SIR model was the data-generating model, the total entropy designs yield observations with slightly less information for discriminating between models compared to the discrimination designs. However, as the number of design points increases, the total entropy designs yield comparatively more information about the data-generating model. When the data were generated from the SEIR model, the total entropy designs perform similarly to the model discrimination designs, except for the one-point design.
[Boxplots: log(1/det(cov(theta|y,d))) (y-axis) for the MI, TE, KLD and random (RD) designs with 1, 2, 3, 4, 6 and 8 design points (x-axis); panels (a) SIR model and (b) SEIR model.]
Figure 4.5: Log determinant of the approximated posterior covariance of the parameters of the (a) SIR model and (b) SEIR model obtained for observations generated from the corresponding model according to the optimal and random designs.
Figure 4.5 shows that, across both models, total entropy designs yield observations which
provide efficient estimates of the model parameters when compared to designs under the
KLD utility. In contrast, model discrimination designs do not provide more informative
observations to estimate parameters, particularly in the case of the one-point design, which performs poorly (in expectation) even when compared to the random designs.
4.6 Discussion
In this work, a methodology for designing dual purpose experiments for parameter estima-
tion and model discrimination for models with intractable likelihoods has been presented.
These developments were motivated by the need to conduct more ethical experiments in
epidemiology, and the current lack of knowledge about FMD. Our approach is based on
the synthetic likelihood method for approximating an entropy-based utility function. Our
illustrative example revealed that the proposed synthetic likelihood method estimates the
utility of a design reasonably well when compared to using the actual likelihood. These
methods were then applied in a second example to design experiments to learn about
FMD.
Although simulation-based likelihood approximation methods enable the derivation of Bayesian designs for models with intractable likelihoods, they pose computational challenges as a large number of datasets must be simulated. Consequently, pre-simulated datasets have been used in the literature, with the design space discretised to reduce the computational effort of utility evaluations. One advantage of the proposed synthetic likelihood approach over the ABC method is that the former requires less storage, as only mean vectors and covariance matrices need to be stored instead of all pre-simulated datasets.
However, the proposed synthetic likelihood method may not be appropriate for approxi-
mating the likelihood in some cases when the Gaussian density is not a reasonable approx-
imation to the distribution of the data. Thus, further research is required to investigate
more suitable distributions in place of the Gaussian distribution, particularly when a
small number of experimental units will be considered. In addition, the computational
challenges of estimating the multivariate Normal integral to approximate the likelihood
could potentially hinder the exploration of higher dimensional designs, motivating the
need to develop fast and/or approximate methods to evaluate this integral.
The performance of the total entropy designs in both model discrimination and parameter estimation, particularly in the FMD example, motivates the use of total entropy for design problems in other areas such as systems biology and queueing systems. Further, extending the proposed methodology to derive higher dimensional designs using more sophisticated posterior approximation methods, such as the Laplace approximation or Laplace importance sampling instead of importance sampling from the prior, could be another potential avenue for future research.
5 A synthetic likelihood-based Laplace approximation for efficient design of biological processes
Chapter 5. A synthetic likelihood-based Laplace approximation for efficient designs
Statement for Authorship
This chapter has been written as a journal article. The authors listed below have certified
that:
(a) They meet the criteria for authorship as they have participated in the conception,
execution or interpretation of at least the part of the publication in their field of
expertise;
(b) They take public responsibility for their part of the publication, except for the
responsible author who accepts overall responsibility for the publication;
(c) There are no other authors of the publication according to these criteria;
(d) Potential conflicts of interest have been disclosed to granting bodies, the editor
or publisher of the journals of other publications and the head of the responsible
academic unit; and
(e) They agree to the use of the publication in the student’s thesis and its publication
on the Australian Digital Thesis database consistent with any limitations set by
publisher requirements.
Dehideniya M. B., Drovandi C. C., Overstall A. M., and McGree J. M. A synthetic likelihood-based Laplace approximation for efficient design of biological processes. Electronic Journal of Statistics (Submitted for publication).
Contributor Statement of contribution
M. B. Dehideniya Developed and implemented the statistical methods, wrote
the manuscript, revised the manuscript as suggested by
co-authors.
Signature and date:
J. M. McGree Initiated the research concept, supervised research, assisted in interpreting results,
critically reviewed manuscript.
C. C. Drovandi Supervised research, assisted in interpreting results,
critically reviewed manuscript.
A. M. Overstall Proposed additional application, critically reviewed manuscript.
Principal Supervisor Confirmation: I have sighted email or other correspondence for all
co-authors confirming their authorship.
Name: James McGree    Signature: [signed]    Date: 05/07/2019
5.1 Abstract
Complex models used to describe biological processes in epidemiology and ecology often
have computationally intractable or expensive likelihoods. This poses significant chal-
lenges in terms of Bayesian inference but more significantly in the design of efficient
experiments. This is because Bayesian designs are found by maximising the expectation
of a utility function over a design space. The difficulty comes from having to approximate
this expected utility as it requires sampling from or approximating a large number of
posterior distributions. This renders approaches adopted in inference computationally
infeasible to implement in design. Consequently, design in such fields has been limited to
a small number of dimensions or a restricted range of utilities. To overcome such limita-
tions, we propose a synthetic likelihood-based Laplace approximation for approximating
utility functions for models with intractable likelihoods. As will be seen, the proposed
approximation is flexible in that a wide range of utility functions can be considered, and
computationally efficient when compared to alternative methods for inference. To explore
the validity of this approximation, an illustrative example from epidemiology is consid-
ered. Then, our approach is used to design experiments with a relatively large number of
observations in two motivating applications from epidemiology and ecology.
5.2 Introduction
Designing experiments to collect data that are as informative as possible about the process
of interest is an important task in scientific investigation in, for instance, epidemiology
(Orsel et al., 2007), systems biology (Faller et al., 2003) and ecology (Zhang et al., 2018).
These biological systems require the development and use of realistic statistical mod-
els that involve computationally intensive or intractable likelihoods. Unfortunately, this
poses significant challenges in the design of experiments, and has led to a number of recent developments; see Ryan et al. (2016b) for a recent review.
Often when designing an experiment, there is uncertainty about the appropriate stochastic process to describe the observed data and also about the ensuing model parameters. In the design literature for models with intractable likelihoods, these two levels of uncertainty have been considered separately. Cook et al. (2008) considered the Kullback-Leibler divergence (KLD) between the prior and the posterior as a utility function to find designs for parameter estimation, and used the moment closure method to approximate the likelihood. Alternatively, methods from approximate Bayesian computation (ABC) have been used to approximate utility functions based on summaries of the posterior
distribution (Drovandi and Pettitt, 2013, Price et al., 2016) to design efficient experiments
for parameter estimation. In the presence of model uncertainty, designs have been found
for discriminating between competing models with intractable likelihoods (Dehideniya
et al., 2018b, Hainy et al., 2018). To approximate utility functions, Dehideniya et al.
(2018b) considered the ABC rejection algorithm for model choice (Grelaud et al., 2009)
while Hainy et al. (2018) proposed a classification based approach. Extensions to dual
purpose experiments for model discrimination and parameter estimation have also been
proposed by Dehideniya et al. (2018a) where the total entropy utility (Borth, 1975) was
considered. To approximate this utility, a synthetic likelihood approach for discrete data
was developed, and it was shown that such an approach allows a wide variety of utility
functions to be considered in design for models with intractable likelihoods.
Until the recent work by Overstall and McGree (2018), designs for models with intractable
likelihoods have been limited to a small number of design dimensions. Overstall and Mc-
Gree (2018) used emulation within an indirect inference framework to avoid the evalu-
ation of computationally expensive or intractable likelihoods, and were able to consider
design spaces of an order of magnitude larger than what has been considered previously.
However, their approach is limited to likelihood-based utility functions, that is, utility
functions that can be expressed in terms of the likelihood and/or the marginal likeli-
hood. In this paper, we address this limitation by extending the work of Dehideniya et al.
(2018a) to high dimensions thus allowing a wide variety of utility functions to be consid-
ered when designing large-scale experiments for models with intractable likelihoods. In
our approach, we use summary statistics to avoid the curse of dimensionality, and also
develop a Laplace-based approximation to the posterior distribution. This enables fast
posterior inference, and thus allows designs to be found in reasonable time frames.
This paper is outlined as follows. Section 5.3 presents the proposed synthetic likelihood-based Laplace approximation for models with intractable likelihoods. Section 5.4 provides background on Bayesian experimental design with a description of the utility functions considered in this paper. Within this section, we also show how to approximate these utility functions within our inference framework. Then, an illustrative example is considered in Section 5.5 along with the motivating examples for this work. Finally, a summary of this work is given in Section 5.6 along with some limitations and suggestions for future research directions.
5.3 Inference framework
The synthetic likelihood (Wood, 2010) approach is a method of approximating the like-
lihood of observed data y = {y1, y2, . . . , yL} for a given value of model parameters θ
for models with intractable likelihoods. This is achieved by assuming that the summary
statistics S conditional on θ follow a multivariate normal distribution with mean µ(θ)
and variance-covariance Σ(θ). In general, µ(θ) and Σ(θ) cannot be evaluated analytically
for a given value of θ but can be approximated by simulating n datasets from the model
conditional on θ and evaluating the summary statistics for each dataset. This yields a
distribution of summary statistics for which µ(θ) and Σ(θ) can be estimated. Then,
the log-likelihood of observed summary statistics, sobs = S(y), can be approximated as
follows:
ls(sobs|θ) = −(1/2) [ log |Σ(θ)| + (sobs − µ(θ))ᵀ Σ(θ)⁻¹ (sobs − µ(θ)) + L log(2π) ],   (5.1)
where µ(θ) and Σ(θ) are the estimated mean vector and variance-covariance matrix of
the simulated summary statistics from the model of interest with parameter θ.
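To make the estimator in (5.1) concrete, it can be sketched in a few lines of Python. The simulator and summary statistics below are toy stand-ins (a binomial count model, with the mean and variance summaries used later in Example 1); they are illustrative assumptions, not the simulators used in this chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_counts(theta, n_obs=10):
    # toy stand-in simulator: n_obs infection counts out of 50 individuals
    return rng.binomial(50, theta, size=n_obs)

def summaries(y):
    # mean and variance of the observed counts
    return np.array([y.mean(), y.var(ddof=1)])

def synthetic_loglik(s_obs, theta, n_sim=200):
    # estimate mu(theta) and Sigma(theta) from n_sim simulated datasets,
    # then evaluate the Gaussian log-density (5.1) at the observed summaries
    S = np.array([summaries(simulate_counts(theta)) for _ in range(n_sim)])
    mu = S.mean(axis=0)
    Sigma = np.cov(S, rowvar=False)
    L = len(s_obs)
    resid = s_obs - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (logdet + resid @ np.linalg.solve(Sigma, resid)
                   + L * np.log(2 * np.pi))

y_obs = simulate_counts(0.3)
ll = synthetic_loglik(summaries(y_obs), 0.3)
```

A parameter value far from the data-generating one yields summaries far from sobs, so its synthetic log-likelihood is much lower.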
Given the above approximation to the log-likelihood, we now outline how we propose to
approximate the posterior distribution for models with intractable likelihoods. Previous
studies have considered importance sampling for models with tractable (McGree et al.,
2012, Weir et al., 2007) and intractable (Dehideniya et al., 2018a, Ryan et al., 2016a)
likelihoods as a fast approximation when the distance between the prior and posterior is
relatively small. However, as the number of observations increases the resultant posterior
distribution can be considerably different from the prior, and importance sampling may
provide an inefficient approximation to the posterior. One alternative that has been used
in Bayesian design is the Laplace approximation (Long et al., 2013, Overstall et al., 2018,
Ryan et al., 2015). Suppose we have observed data y under design d generated from
model m with qm parameters θm. Then, the Laplace approximation approximates the
posterior distribution of θm via a multivariate normal distribution with mean θ∗m and
covariance matrix H(θ∗m)⁻¹, where θ∗m is the posterior mode and H(θ∗m)⁻¹ is the inverse of the Hessian matrix of the negative log posterior evaluated at θ∗m. One advantage of using
the Laplace approximation is the availability of an approximation to the model evidence
which can be used for model choice. The approximation to the model evidence is as
follows:
p(y|m,d) = (2π)^{qm/2} |H(θ∗m)⁻¹|^{1/2} p(y|θ∗m,m,d) p(θ∗m|m).   (5.2)
When there are K candidate models to describe the process of interest, the posterior
model probability of model m is estimated by

p(m|y,d) = p(y|m,d) p(m) / ∑_{m′=1}^{K} p(y|m′,d) p(m′).   (5.3)
As pointed out by Wood (2010), due to small-scale noise associated with evaluating
ls(sobs|θ), derivative-based, numerical optimisation approaches cannot be used to find
the posterior mode. Thus, in this work the Nelder-Mead algorithm for derivative-free
optimisation (Kelley, 1999) is used to find the parameter value which maximises the pos-
terior density. To approximate the Hessian matrix, methods proposed by Fasiolo (2016)
can be considered where the Hessian of the synthetic likelihood function is estimated at
a given parameter value θ based on a set of local regression models fitted between model
parameters and summary statistics (see Algorithm 2 on page 61 of Fasiolo (2016)).
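A minimal sketch of this Laplace step, assuming a tractable toy Gaussian log-posterior so the answer can be checked: the posterior mode is found with Nelder-Mead, and the evidence is formed as in (5.2). The finite-difference Hessian below is a placeholder for the regression-based estimator of Fasiolo (2016), which would be used with a noisy synthetic likelihood.

```python
import numpy as np
from scipy.optimize import minimize

mu_true = np.array([0.5, -1.0])

def log_post(theta):
    # toy unnormalised log posterior: a N(mu_true, I) kernel
    return -0.5 * np.sum((theta - mu_true) ** 2)

def num_hessian(f, x, h=1e-4):
    # central finite-difference Hessian of the NEGATIVE log posterior
    q = len(x)
    H = np.zeros((q, q))
    for i in range(q):
        for j in range(q):
            e_i, e_j = np.eye(q)[i] * h, np.eye(q)[j] * h
            H[i, j] = -(f(x + e_i + e_j) - f(x + e_i - e_j)
                        - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return H

# derivative-free mode search, as advocated for noisy synthetic likelihoods
res = minimize(lambda t: -log_post(t), x0=np.zeros(2), method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-8})
theta_star = res.x
H = num_hessian(log_post, theta_star)
q = len(theta_star)
# log of the evidence approximation (5.2)
_, logdet_Hinv = np.linalg.slogdet(np.linalg.inv(H))
log_evidence = 0.5 * q * np.log(2 * np.pi) + 0.5 * logdet_Hinv + log_post(theta_star)
```

For this Gaussian kernel the true normalising constant is (2π)^{q/2}, so the Laplace evidence is exact, which provides a convenient check.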
Adopting the above approach for Bayesian inference in design has a number of advantages
over the recent work of Dehideniya et al. (2018a) who proposed a synthetic likelihood
approximation using the full dataset with continuity correction for discrete observations.
Although their approximation was shown to work well when a few data points were
observed, it becomes computationally expensive as the number of observations increases.
In contrast, the synthetic likelihood approach based on summary statistics provides a
computationally feasible method of approximating the likelihood of a large number of
observations. Thus, the proposed approach is more suitable for designing experiments
which yield a large number of observations, where large is defined in the context of design
for models with intractable likelihoods.
Another advantage of our proposed approach is that the Laplace approximation requires fewer likelihood evaluations when compared to alternative posterior approximations such as
importance sampling and Markov chain Monte Carlo. Consequently, the use of Laplace
approximation reduces the number of datasets that need to be simulated from the model.
While model simulation is generally an efficient process, having to repeat this a large
number of times imposes significant computational burden. Indeed, in the Bayesian ex-
perimental design literature, pre-simulated data have been used to avoid the computa-
tional cost of simulating a large number of datasets during the optimisation, for instance
Drovandi and Pettitt (2013), Hainy et al. (2016), Price et al. (2016). However, this re-
sults in having to consider a discrete design space, and therefore potentially suboptimal
designs. In adopting our proposed methods, a continuous design space can be considered
which should lead to the location of designs that perform better with respect to the experimental goal/s. The recent work of Hainy et al. (2018) also proposed a classification-based approach to reduce the number of model simulations, thus allowing a continuous design space to be considered. However, their approach is currently limited to a small
number of utility functions for model discrimination.
5.4 Bayesian experimental designs
A properly planned experiment will provide informative data for subsequent inferences
of interest, such as parameter estimation, model discrimination and/or prediction. Thus,
prior to any experimentation, the values of controllable variables need to be carefully
chosen to ensure the data are as informative as possible. In the context of experimental
design, a set of possible values for these controllable variables is defined as the design,
and the informativeness of data obtained from such a design is measured by a utility
function which is defined according to the experimental goal/s. Applications considered
in this paper focus on when to observe a biological process of interest, and thus time is the
design/controllable variable. In the presence of model uncertainty, suppose there are K competing models, each with prior model probability p(m) and model parameters θm with prior distribution p(θm|m). Then, the selection of the values for the controllable variables can
be defined as a decision problem under the uncertainty about the parameters θm, model
m and data y yet to be observed under design d (Lindley et al., 1978). Therefore, the
expected utility is considered in finding the optimal design. This expected utility can be
defined as follows:
U(d) = ∑_{m=1}^{K} p(m) { ∫_y ∫_{θm} u(d,y,θm,m) p(y|θm,m,d) p(θm|m) dθm dy },   (5.4)
where, p(y |θm, m,d) is the likelihood of y given the model parameter θm, model m and
design d. The utility u(d,y,θm,m) can be defined to encapsulate the purpose of the
experiment such as estimation of parameters across all the K competing models (McGree
et al., 2016), discriminating between competing models (Drovandi et al., 2014) or dual
goals of parameter estimation and model discrimination (Borth, 1975, McGree, 2017).
When the utility function u(.) is independent of the model parameters θm, Equation (5.4) can be simplified to
U(d) = ∑_{m=1}^{K} p(m) { ∫_y u(d,y,m) p(y|m,d) dy }.   (5.5)
Often the expected utility is not available in closed-form and thus needs to be approx-
imated, for example, by Monte Carlo integration. When u(.) depends on the model
parameters θm, the approximate expected utility of design d can be expressed as follows:
U(d) = ∑_{m=1}^{K} p(m) (1/Q) ∑_{j=1}^{Q} u(d, yjm, θjm, m),   (5.6)
where yjm is a possible dataset generated from model m using model parameters θjm and design d. Similarly, when u(.) is independent of θm, the expected utility given by Equation (5.5) can be approximated as
U(d) = ∑_{m=1}^{K} p(m) (1/Q) ∑_{j=1}^{Q} u(d, yjm, m),   (5.7)
where yjm is a possible dataset generated from the model m under design d. Generally,
the utility function u(.) is based on some posterior quantity. Therefore, the evaluation of
U(d) requires K×Q posterior evaluations or approximations, which is a computationally
intensive task in general but particularly so for models with intractable likelihoods.
The accuracy of this approximation increases as the number of Monte Carlo samples Q
increases for each model m. Drovandi and Tran (2018) demonstrated that the accuracy of
U(d) for a given number of Monte Carlo samples can be increased by using the randomised
Quasi-Monte Carlo (RQMC) method. Following Drovandi and Tran (2018), here a Sobol
sequence on (0, 1]^{qm+n} is used to first generate the qm parameters θm of model m and then simulate n observations y from the model conditional on θm and d. In order to obtain an unbiased estimate of U(d), these deterministic sequences are randomised using the Owen-type scrambling (Owen, 1997) implemented in Christophe and Petr (2018), with a different seed value each time a sequence is simulated. In our implementation, the system time is used as the seed value for each simulated sequence.
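The RQMC estimator of (5.7) can be sketched as follows, assuming a toy one-parameter model with a uniform prior and Bernoulli observations, and a placeholder utility; SciPy's scrambled Sobol generator stands in for the implementation of Christophe and Petr (2018).

```python
import numpy as np
from scipy.stats import qmc

qm, n, Q = 1, 3, 256  # one parameter, three observations, Q quadrature draws

def utility(y):
    # placeholder utility of a simulated dataset (illustrative only)
    return -np.var(y)

def expected_utility(seed):
    # scrambled Sobol points in (0,1)^(qm+n): first qm coordinates generate
    # theta, the remaining n generate the data conditional on theta
    sob = qmc.Sobol(d=qm + n, scramble=True, seed=seed)
    u = sob.random(Q)
    theta = u[:, :qm]                      # theta ~ Uniform(0,1)
    y = (u[:, qm:] < theta).astype(float)  # n Bernoulli(theta) obs per draw
    return np.mean([utility(yj) for yj in y])

U1, U2 = expected_utility(seed=1), expected_utility(seed=2)
```

Two independent scramblings give nearly identical estimates, reflecting the low variance of RQMC relative to plain Monte Carlo at the same Q.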
5.4.1 Utility functions for parameter estimation
In this work, we consider two utility functions for designing efficient experiments to estimate the parameters of models with intractable likelihoods, namely the KLD utility and the negative squared error loss utility.
5.4.1.1 Kullback-Leibler divergence utility
The KLD between the prior and posterior distributions of parameters θm of model m has
been commonly used as a utility function to design Bayesian experiments for estimating
parameters, for instance Cook et al. (2008), Ryan et al. (2014), Ryan (2003). The KLD
utility is given by,
uKLD(d,y,m) = ∫_{θm} log( p(θm|y,m,d) / p(θm|m) ) p(θm|y,m,d) dθm.   (5.8)
When p(θm|m) follows a multivariate normal distribution with mean µ1 and variance-
covariance matrix Σ1, the KLD utility can be calculated analytically based on the pos-
terior distribution of θm as given by the Laplace approximation. It can be expressed as
follows:
uKLD(d,y,m) = (1/2) ( tr(Σ1⁻¹ Σ2) + (µ1 − µ2)ᵀ Σ1⁻¹ (µ1 − µ2) − qm + log( det(Σ1) / det(Σ2) ) ),   (5.9)
where µ2 and Σ2 are the estimated posterior mean and variance-covariance matrix, respectively, and tr(A) is the trace of the matrix A.
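Equation (5.9) is a direct computation once the Laplace-approximated posterior moments are available; a sketch:

```python
import numpy as np

def kld_gaussian(mu1, Sigma1, mu2, Sigma2):
    # closed-form KLD (5.9) between the Gaussian prior N(mu1, Sigma1)
    # and the Gaussian (Laplace-approximated) posterior N(mu2, Sigma2)
    q = len(mu1)
    S1inv = np.linalg.inv(Sigma1)
    diff = mu1 - mu2
    _, ld1 = np.linalg.slogdet(Sigma1)
    _, ld2 = np.linalg.slogdet(Sigma2)
    return 0.5 * (np.trace(S1inv @ Sigma2) + diff @ S1inv @ diff - q + ld1 - ld2)

u_same = kld_gaussian(np.zeros(2), np.eye(2), np.zeros(2), np.eye(2))
u_shift = kld_gaussian(np.zeros(2), np.eye(2), np.ones(2), np.eye(2))
```

Identical distributions give zero divergence, and a unit shift in each of two coordinates (with identity covariances) gives exactly 1.0, which makes a convenient sanity check.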
5.4.1.2 Negative squared error loss utility
When the goal of the experiment is to obtain a posterior summary such as a posterior
mean that is as close as possible to the truth, the negative squared error loss (NSEL)
utility can be used to design such an experiment. Given that θ is a vector of q elements,
the NSEL utility can be expressed as follows:
uNSEL(d,y,θ) = − ∑_{i=1}^{q} ( θi − E[θi|y,d] )²,   (5.10)
where θi is the ith parameter value of θ used to generate y under design d. The NSEL
utility has been used to find designs for parameter estimation of logistic regression mod-
els (Overstall and Woods, 2017). Following the approach of Overstall et al. (2018), we
approximate E[θi|y,d] by the posterior mode θ∗ found using the Laplace approximation.
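A sketch of (5.10) with the posterior mode standing in for E[θ|y,d], as described above; the numbers are illustrative only.

```python
import numpy as np

def nsel(theta_true, theta_mode):
    # negative squared error loss (5.10), with the Laplace posterior mode
    # theta_mode substituted for the posterior mean E[theta|y,d]
    theta_true, theta_mode = np.asarray(theta_true), np.asarray(theta_mode)
    return -np.sum((theta_true - theta_mode) ** 2)

val = nsel([0.5, -1.0], [0.4, -1.2])  # -(0.1**2 + 0.2**2)
```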
5.4.2 Utility function for model discrimination
The mutual information between the model indicator m and the observed data y under
design d has been used as a discrimination utility (Box and Hill, 1967, Drovandi et al.,
2014) to design experiments for model discrimination, and it can be expressed as follows:
uMI(d,y,m) = log p(m|y,d) − log p(m).   (5.11)
As described in Section 5.3, the posterior model probability of model m can be approximated via the Laplace approximation. Then, the mutual information utility can be approximated as

ûMI(d,y,m) = log p̂(m|y,d) − log p(m),   (5.12)

where p̂(m|y,d) is the approximated posterior model probability from Equation (5.3).
5.4.3 Utility function for dual purpose experiments
When more than one model is being considered to describe the data generated from the
process of interest, it is beneficial to design a dual purpose experiment to discriminate
between the competing models and estimate parameters of all competing models. The
total entropy (Borth, 1975) about the model and model parameters has been used as a
utility function to design dual purpose experiments for model discrimination and parameter estimation (Dehideniya et al., 2018a, McGree, 2017). This utility can be expressed
as a sum of the mutual information utility for model discrimination and KLD utility,
uTE(d,y,m) = uMI(d,y,m) + uKLD(d,y,m). (5.13)
By substituting the approximated uKLD(d,y,m) and uMI(d,y,m) in Equation (5.13), the
total entropy utility can be approximated.
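In code, the composition in (5.13) is a simple sum; the probabilities and KLD value below are illustrative placeholders, with the posterior model probability assumed to come from the approximation in (5.3).

```python
import numpy as np

def u_mi(prior_m, post_m):
    # mutual information utility for the data-generating model m:
    # rewards a high (approximated) posterior probability of that model
    return np.log(post_m) - np.log(prior_m)

def u_te(prior_m, post_m, u_kld):
    # total entropy utility (5.13): discrimination + estimation components
    return u_mi(prior_m, post_m) + u_kld

te = u_te(prior_m=0.5, post_m=0.9, u_kld=1.2)
```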
5.4.4 Optimisation algorithm
Locating optimal designs involves maximising U(d) over a design space. In this work,
we used the approximate coordinate exchange (ACE) algorithm (Overstall and Woods,
2017) to find optimal designs in a continuous design space. The ACE algorithm iteratively
optimises one design variable at a time by emulating the expected utility of the given
design dimension, and optimising the predicted value as given by the emulator. At each
iteration, the newly found design is compared with the current design, and is selected
based on a Bayesian hypothesis test. When implementing ACE, a number of tuning
parameters need to be specified. In the examples described in the following section, 5000
and 500 Monte Carlo samples were used for the hypothesis test and constructing the
emulator, respectively. Otherwise, the default settings for ACE were used as given in the
R package acebayes (Overstall et al., 2017a).
5.5 Examples
In this section, we consider one illustrative example and two motivating examples to
demonstrate the practical implications of our proposed methodologies. First, the pro-
posed utility approximation is validated by evaluating the total entropy utility of designs
for dual purpose experiments for two epidemiological models, namely the death model and the
Susceptible-Infected (SI) model. Secondly, the performance of our approach is demon-
strated through designing experiments to learn about foot and mouth disease. Finally, as
the third example, we consider designing laboratory microcosm experiments to estimate
the parameters of a predator-prey model found in ecology.
5.5.1 Example 1 - Dual purpose designs for the death and SI models
The death and SI models can be used to describe the spread of a disease among a closed
population of size N . In this example, optimal time points which yield informative obser-
vations for both discriminating between these competing models and estimating param-
eters of the models are considered.
The death model (Cook et al., 2008) divides the population into two sub-populations,
susceptible and infected. The state of the continuous-time Markov chain (CTMC) at
time t is defined as the number of infected individuals at time t, I(t). Given I(t) = i, the
probability of a transition to state i + 1 by time t + Δt is given by

P[i + 1 | i] = β1 (N − i) Δt + o(Δt),
where β1 is the rate at which susceptible individuals become infected due to environmental
sources.
The SI model (Cook et al., 2008) assumes that the infected individuals in the population
also contribute to the spread of the disease, represented by an additional parameter β2, in addition to environmental sources. Given I(t) = i, the probability of a transition to state i + 1 by time t + Δt is given by

P[i + 1 | i] = (β1 + β2 i) (N − i) Δt + o(Δt).
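Both models can be simulated exactly as CTMCs with a Gillespie-style scheme consistent with the rates above (setting β2 = 0 recovers the death model). The sketch below restarts the exponential clock at each observation time, which is valid by the memorylessness of the exponential distribution; the function names and settings are ours, for illustration.

```python
import numpy as np

def simulate_epidemic(beta1, beta2, N, obs_times, rng):
    # exact CTMC simulation: i -> i+1 at rate (beta1 + beta2*i)*(N - i),
    # recording the number of infecteds at each observation time
    i, t, traj = 0, 0.0, []
    for t_obs in obs_times:
        while True:
            rate = (beta1 + beta2 * i) * (N - i)
            if rate == 0.0:          # everyone infected: absorbing state
                break
            t_next = t + rng.exponential(1.0 / rate)
            if t_next > t_obs:       # next event falls after this obs time
                break
            t, i = t_next, i + 1
        t = t_obs                    # valid restart by memorylessness
        traj.append(i)
    return np.array(traj)

rng = np.random.default_rng(7)
obs = simulate_epidemic(beta1=0.6, beta2=0.0, N=50, obs_times=[1, 2, 5, 10], rng=rng)
```

The simulated trajectory is non-decreasing and bounded by the population size, matching the prior predictive behaviour shown in Figure 5.2a.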
The same priors for the unknown parameters considered by Dehideniya et al. (2018a) are used: log(β1) ∼ N(−0.48, 0.3²) for the death model, and log(β1) ∼ N(−1.1, 0.4²) and log(β2) ∼ N(−4.5, 0.63²) for the SI model.
Forming informative summary statistics for inference is generally a difficult task as the
summaries need to be informative over the entire prior predictive distribution. This
difficulty is further compounded in the context of experimental design as the summary
statistics not only need to be informative across the entire prior predictive distribution but
also across the entire design space. To handle this, we propose to use summary statistics
that are informative across a subset of the design space, where this subset is defined by
the perceived informativeness of the data obtained from a given design. That is, it seems
intuitively reasonable that, in order to estimate the probability of becoming infectious
given an individual is susceptible, data on individuals in both states are needed. To be
more precise, designs which observe the process in the latter stages generally yield all
individuals being infected (see Figure 5.2a). Consequently, observing the population of
individuals in this stage of the experiment will yield little information, and are therefore
avoided. Thus, it is this intuition that is used in subsetting the design space, and this
is achieved through inspection of the prior predictive data from a given design. Here,
we propose to subset the design space in such a way that the set of observation times
should contain at least one time point during the first half of the experiment. In terms
of summary statistics, here we propose to use the mean and variance of the observed
counts as these were shown to be informative across the prior predictive distribution for
a random selection of designs (see Appendix C.1).
The Hessian approximation proposed in Fasiolo (2016) is based on a set of regression
models fitted between the model parameters and summary statistics. They also pro-
pose an additional regression step, where each model parameter is regressed against the
summary statistics, and the fitted model parameters are used as summary statistics to
improve the scalability of the Hessian approximation as the number of summary statistics
increases (see Section 4.5.1 of Fasiolo (2016)). In order to ensure reasonable accuracy in
the approximation, we propose to assess the validity of these additional linear regression
models based on a measure of goodness-of-fit (coefficient of multiple determination, R2).
This enables the identification of datasets (y in Equation (5.4)) which yield poor approx-
imations of the Hessian matrix and consequently the utility. Then, based on a defined
threshold value for R2, we propose to substitute poor estimates of the utility function
with a minimum utility value. As such, the estimate of the expected utility will be down
weighted, and potentially avoided within the optimisation. For applications in this pa-
per, it was found that for models with a single parameter, a threshold value of around 0.7
should be used while for other models a lower threshold can be considered (around 0.1).
Obviously, when this occurs, this will introduce a bias in the estimation of the expected
utility. Thus, we investigated the effect of this bias by comparing the expected utility
of randomly selected designs evaluated based on the actual likelihood to our synthetic
likelihood approach, and these results are shown in Figure 5.1.
As is evident from Figure 5.1, the proposed approach preserves a monotonic relationship
between the approximated and actual utility values, and reasonably approximates the
utility values for designs with higher utility values. It is noted that the proposed approach
provides a biased estimate of the mutual information utility for some designs, see (×) in
Figure 5.1b. Given that these designs have relatively low expected utility values under the
actual utility evaluation, the maximisation of the expected utility should not be affected by
the introduced bias in handling the poor approximation of the Hessian matrix. Therefore,
we propose that our approximation can be used to locate optimal designs.
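A sketch of the R² screening step under stated assumptions: each parameter is regressed (with intercept) on the summary statistics, and a dataset whose minimum R² falls below the threshold has its utility replaced by a floor value. The data, threshold and floor below are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)

def r_squared(X, y):
    # least-squares fit of y on X with intercept, and its R^2
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def screened_utility(u_raw, theta_sims, summary_sims, threshold=0.7, u_floor=-10.0):
    # keep u_raw only if every parameter's regression on the summaries fits
    # adequately; otherwise substitute the minimum (floor) utility value
    r2 = [r_squared(summary_sims, theta_sims[:, j])
          for j in range(theta_sims.shape[1])]
    return u_raw if min(r2) >= threshold else u_floor

theta = rng.normal(size=(100, 1))
good_summaries = theta @ np.array([[1.0, 2.0]]) + 0.05 * rng.normal(size=(100, 2))
bad_summaries = rng.normal(size=(100, 2))   # unrelated to the parameter
u_good = screened_utility(1.5, theta, good_summaries)
u_bad = screened_utility(1.5, theta, bad_summaries)
```

Summaries that are informative about the parameter pass the screen and retain their utility, while uninformative summaries trigger the floor value and so down-weight that dataset in the expected utility.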
Figure 5.1: Comparison of the estimated expected utilities of designs with 15 design points according to the (a) total entropy, (b) mutual information and (c) KLD utilities, using the Laplace approximation based on synthetic and actual likelihoods. Designs with biased estimates of the expected utility are represented by (×). In each plot, the y = x line indicates a perfect match between the approximated and actual utility evaluations.
Optimal designs under total entropy, mutual information and KLD utility functions were
located by the ACE algorithm. The optimal designs are shown in Figure 5.2b. The
expected utility values of the optimal designs were re-evaluated 100 times with different
draws from the prior predictive distribution, and the mean and standard deviation of
these expected utility values are given in Table 5.1.
Figure 5.2: Prior predictive distributions of the number of infecteds based on the death model (solid) and SI model (dashed) are given in sub-figure (a). Here, dot-dashed and dotted lines represent the 10% and 90% prior predictive quantiles of the death and SI models, respectively. Sub-figure (b) illustrates the optimal designs found under the total entropy utility (∗) along with the KLD utility (×) and mutual information utility (+) for the death and SI models.
The optimal designs based on total entropy were compared with discrimination and esti-
mation designs in terms of addressing each experimental goal individually. Further, for
each optimal design, an equally spaced design with the same number of design points was also
considered. In this comparison, for each design, 1000 datasets were simulated from both
Table 5.1: Expected utility values (standard deviation) of optimal designs derived under different utility functions.

Utility function      |d|   U(d*) (SD)
Mutual information     8    -0.444 (0.002)
                      10    -0.434 (0.002)
                      15    -0.423 (0.002)
Total entropy          8     1.891 (0.006)
                      10     1.931 (0.006)
                      15     2.007 (0.007)
KLD                    8     2.328 (0.007)
                      10     2.375 (0.007)
                      15     2.433 (0.007)
models, and posterior inference was undertaken based on the actual likelihood for the death model and an approximated likelihood for the SI model (Sidje, 1998). First, posterior model
probabilities of the data generating model were estimated based on Laplace approxima-
tion as described by Equations (5.2) and (5.3) in Section 5.3. These results are shown in
Figure 5.3. From this figure, it is evident that all optimal designs perform equally well
for model discrimination, with some advantage over the equally spaced design.
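The posterior model probabilities referred to here combine a per-model evidence approximation with prior model probabilities. A hedged sketch of the standard Laplace approximation to the evidence is given below; it is generic, and is not claimed to reproduce Equations (5.2) and (5.3) of the thesis exactly.

```python
import numpy as np

def laplace_evidence(logpost_at_mode, hessian):
    """Laplace approximation to the log model evidence log p(y|m):
    log[p(y|theta*) p(theta*)] + (d/2) log(2*pi) - 0.5 log det(-H),
    where H is the Hessian of the log posterior kernel at the mode theta*."""
    d = hessian.shape[0]
    sign, logdet = np.linalg.slogdet(-hessian)
    return logpost_at_mode + 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet

def posterior_model_probs(log_evidences, prior_probs):
    """Combine per-model log evidences with prior model probabilities."""
    log_w = np.asarray(log_evidences) + np.log(prior_probs)
    log_w -= log_w.max()          # stabilise the exponentials
    w = np.exp(log_w)
    return w / w.sum()
```

Working on the log scale and subtracting the maximum before exponentiating avoids numerical underflow when evidences differ by many orders of magnitude.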
Secondly, the designs were compared in terms of parameter estimation for both models. The log reciprocal of the posterior variance of β1 (death model) and the log determinant of the inverse of the posterior variance-covariance matrix of (β1, β2) (SI model) were used to measure the performance of the designs for parameter
estimation. In this validation, Laplace importance sampling (LIS) was considered to ap-
proximate the posterior of the model parameters more accurately. LIS is a combination of the Laplace approximation and importance sampling, where the Laplace approximation is used as the importance distribution. Following Ryan et al. (2015), here we multiplied
the variance-covariance matrix obtained via the Laplace approximation by 2 in forming
the importance distribution to ensure it covered the tails of the target distribution. The
results are shown in Figure 5.4. As can be seen, all optimal designs perform similarly
well, and consistently outperform the equally spaced design.
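A minimal sketch of LIS as described above, with the Laplace covariance inflated by a factor of 2 to form the importance distribution. Here `log_target` is assumed to be a vectorised log posterior kernel, and the returned criterion is the log determinant of the inverse posterior variance-covariance matrix estimated from the weighted draws; this is illustrative, not the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def laplace_importance_sampling(log_target, mode, cov, n=2000, inflate=2.0):
    """Laplace importance sampling: draw from a Gaussian centred at the
    posterior mode with the Laplace covariance multiplied by `inflate`
    (2 here, following Ryan et al. (2015)) so the proposal covers the
    tails of the target, then self-normalise the importance weights."""
    cov_q = inflate * np.asarray(cov, dtype=float)
    d = len(mode)
    theta = rng.multivariate_normal(mode, cov_q, size=n)
    # log density of the Gaussian importance distribution
    diff = theta - mode
    prec = np.linalg.inv(cov_q)
    log_q = (-0.5 * np.einsum('ij,jk,ik->i', diff, prec, diff)
             - 0.5 * (d * np.log(2 * np.pi) + np.linalg.slogdet(cov_q)[1]))
    log_w = log_target(theta) - log_q
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # weighted posterior covariance -> log(1/det(cov)) design criterion
    mean = w @ theta
    centred = theta - mean
    post_cov = np.atleast_2d(centred.T @ (centred * w[:, None]))
    return theta, w, -np.linalg.slogdet(post_cov)[1]
```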
Figure 5.3: The posterior model probability of the data generating model obtained for observations generated from the (a) death model and (b) SI model according to the optimal designs and equally spaced designs.
Figure 5.4: The log determinant of the inverse of the posterior variance-covariance matrix of the parameters of the data generating model when observations are generated from the (a) death model and (b) SI model according to the optimal designs and equally spaced designs.
5.5.2 Example 2 - Dual purpose designs for foot and mouth disease

Foot and mouth disease (FMD) is a contagious disease which affects livestock such as cattle, pigs and sheep (Knight-Jones and Rushton, 2013). Both the Susceptible-Infectious-
Recovered (SIR) model (Orsel et al., 2007) and the Susceptible-Exposed-Infectious-Recovered
(SEIR) model (Backer et al., 2012) have been proposed to describe epidemic data from
FMD.
The SIR model assumes that susceptible individuals become infectious immediately after they make an infected contact with an infectious individual in the population. The infectious individuals then recover after time T ∼ Exp(α), becoming immune and no longer spreading the disease to other individuals. As in the previous example, the spread of FMD
among N individuals can be described by a CTMC model. Given that there are s sus-
ceptible and i infectious individuals at time t, the probabilities of possible events in the
next infinitesimal time period ∆t are given by,
P[ s−1, i+1 | s, i ] = (β s i / N) Δt + O(Δt),
P[ s, i−1 | s, i ] = α i Δt + O(Δt),

where β is the rate at which an infectious individual makes infected contacts per unit time.
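These transition probabilities can be simulated exactly with a Gillespie-style algorithm. The sketch below records the number of infectious individuals at a set of observation times; it is a simplified illustration under the rates above, not the thesis code.

```python
import numpy as np

def simulate_sir(beta, alpha, s0, i0, obs_times, rng):
    """Exact (Gillespie) simulation of the SIR CTMC, recording the
    number of infectious individuals at each requested observation time."""
    N = s0 + i0
    s, i, t = s0, i0, 0.0
    obs, k = [], 0
    while k < len(obs_times):
        rate_inf = beta * s * i / N      # S -> I: rate beta*s*i/N
        rate_rec = alpha * i             # I -> R: rate alpha*i
        total = rate_inf + rate_rec
        # next event time (or no further events once the epidemic dies out)
        t_next = t + rng.exponential(1.0 / total) if total > 0 else np.inf
        # record the current state at all observation times before that event
        while k < len(obs_times) and obs_times[k] < t_next:
            obs.append(i)
            k += 1
        if total == 0:
            break
        if rng.random() < rate_inf / total:
            s, i = s - 1, i + 1          # infection event
        else:
            i -= 1                       # recovery event
        t = t_next
    return np.array(obs)
```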
In contrast, the SEIR model assumes that susceptible individuals do not become infectious immediately after they have been exposed to the disease, but after time TE ∼ Exp(αE).
During this period, exposed individuals do not show any symptoms of being infected, and
therefore the number of exposed individuals e at time t is unobservable. Once the exposed
individuals become infectious, they contribute to the spread of the disease and recover after time TI ∼ Exp(αI). Given that there are s susceptible, e exposed and i infectious
individuals at time t, the probabilities of possible events in the next infinitesimal time
period ∆t are given by,
P[ s−1, e+1, i | s, e, i ] = (β s i / N) Δt + O(Δt),
P[ s, e−1, i+1 | s, e, i ] = αE e Δt + O(Δt),
P[ s, e, i−1 | s, e, i ] = αI i Δt + O(Δt),

where β is the rate at which an infectious individual makes infected contacts per unit time.
Figure 5.5: Sub-figures (a) and (b) show the prior predictive distributions of infectious and recovered individuals based on the SIR (solid) and SEIR (dashed) models. In both figures, dotted and dot-dashed lines represent the 10% and 90% prior predictive quantiles based on the SIR and SEIR models, respectively. Optimal designs found under the total entropy utility (∗) along with the KLD utility (×) and mutual information utility (+) for the SIR and SEIR models are illustrated in sub-figure (c).
Following Dehideniya et al. (2018a), log(β) ∼ N(−0.09, 0.19²) and log(α) ∼ N(−1.63, 0.32²) were chosen to describe the uncertainty about the parameters of the SIR model, and for the SEIR model log(β) ∼ N(0.44, 0.16²), log(αE) ∼ N(−0.69, 0.2²) and log(αI) ∼ N(−1.31, 0.38²) were chosen as the priors. At the beginning of the experiment, t = 0,
there are 5 infectious and 45 susceptible individuals and the population is observed start-
ing from t = 0.25 days (6 hours) until 30 days. In order to obtain an observation schedule
which is feasible to implement, these observation times were selected such that they are
at least 0.25 days apart, and we consider up to 20 design points. At each observation
time, the number of infectious (I) and recovered (R) individuals are recorded. In approximating the expected utility of designs, the mean, median and variance were considered as the summary statistics used to approximate the synthetic likelihood of the observed data, as described in Section 5.3. Optimal designs found under the three utility functions are illustrated
in Figure 5.5. The expected utility of each optimal design was re-evaluated 100 times and
the mean and the standard deviation of those estimated utilities are given in Table 5.2.
Table 5.2: Expected utility values (standard deviation) of optimal designs derived under different utility functions.

Utility function      |d|   U(d*) (SD)
Mutual information     8    -0.275 (0.005)
                      10    -0.266 (0.005)
                      15    -0.267 (0.005)
                      20    -0.261 (0.005)
Total entropy          8     1.152 (0.010)
                      10     1.172 (0.008)
                      15     1.196 (0.009)
                      20     1.205 (0.009)
KLD                    8     1.569 (0.013)
                      10     1.583 (0.008)
                      15     1.592 (0.009)
                      20     1.611 (0.007)
As in Example 1, the optimal designs found using the total entropy utility were assessed
for model discrimination and parameter estimation. For each case, posterior inference
was undertaken using the synthetic likelihood approach described in Section 5.3. The
posterior model probabilities of the data generating model were determined for each
optimal and equally spaced design. As shown in Figure 5.6, designs found using the
model discrimination utility perform well across both SIR and SEIR models. When the
SIR model is the data generating model, KLD designs yield less informative datasets for
model discrimination while both total entropy designs and equally spaced designs perform
equally well. For the SEIR model, no clear difference in the discrimination ability of the designs is visible.
Figure 5.6: The posterior model probability of the data generating model obtained for observations generated from the (a) SIR model and (b) SEIR model according to the optimal designs and equally spaced designs.
In order to assess the performance of the optimal designs for parameter estimation, LIS
is used as in Example 1. Figure 5.7 compares the log determinant of the inverse of the posterior variance-covariance matrix of the parameters of the data generating model based
on the optimal and equally spaced designs. It is evident that the total entropy designs perform as well as the designs found under the KLD utility across both models, while designs found for model discrimination (only) do not provide precise estimates of the parameters and are actually less efficient than the equally spaced designs.
Figure 5.7: The log determinant of the inverse of the posterior variance-covariance matrix of the parameters of the data generating model when observations are generated from the (a) SIR model and (b) SEIR model according to the optimal designs and equally spaced designs.
5.5.3 Example 3 - Design for parameter estimation of a predator-prey model
Laboratory microcosm experiments play a key role in developing and refining ecological
theories (Bonsall and Hassell, 2005), where single-celled organisms or insects are placed
in a controlled environment to imitate complex natural environments. These experiments
provide many advantages over field studies, such as the ability to replicate, control of the environmental conditions and convenient sampling. Consequently, laboratory microcosm experiments have been used in ecology to explore ecological concepts such as intraspecific
competition (Hassell et al., 1976, Nicholson, 1954) and predator and prey interaction
(Balciunas and Lawler, 1995, Lawler, 1993, Luckinbill, 1973).
Luckinbill (1973) conducted a series of experiments to investigate interactions between Paramecium aurelia (prey) and Didinium nasutum (predator). In this example, we consider Luckinbill's experiment as a motivating application, and find optimal sampling
times to obtain data for estimating the parameters of the modified Lotka-Volterra (LV)
model with logistic growth of prey. Let the birth rate of prey be given by a and, in
the absence of predators, the prey population follows a logistic growth with a carrying
capacity K. Further, the rate of predation is given by b and the death rate of predators
is given by c. At time t, let the sizes of the prey and predator populations be x and y, respectively; then the probabilities of possible events in the next infinitesimal time period Δt are given by,
P[ x+1, y | x, y ] = a x Δt + o(Δt),
P[ x−1, y | x, y ] = a (1 − x/K) x Δt + o(Δt),
P[ x−1, y+1 | x, y ] = b x y Δt + o(Δt),
P[ x, y−1 | x, y ] = c y Δt + o(Δt).
Following the experimental set-up used by Luckinbill (1973), we assume that there are
90 prey and 35 predators at the beginning of the experiment. To obtain oscillatory
population dynamics over time, the following priors are chosen for the model parameters: log(K) ∼ N(6.87, 0.20²), log(a) ∼ N(0.01, 0.12²), log(b) ∼ N(−5.03, 0.12²) and log(c) ∼ N(−0.69, 0.16²); see Figures 5.8a and 5.8b.
Figure 5.8: Prior predictive distributions of prey and predators are given in sub-figures (a) and (b), respectively. In both figures, dotted lines represent the 10% and 90% prior predictive quantiles of prey and predators. Sub-figure (c) illustrates the optimal designs found under the KLD utility (+) and NSEL utility (∗) for estimating the parameters of the modified LV model.
The Gillespie algorithm (Gillespie, 1977) simulates every event that changes the state of
the system. Compared to the epidemiological models considered in Example 1 and 2,
the LV model can result in a large number of events being observed depending on the
parameter values used for the model simulation. Consequently, the Gillespie algorithm
can be computationally expensive when data must be simulated a large number of times. Alternatively, the explicit tau-leap (ETL) method (Gillespie, 2001) can be considered, where data are simulated by sampling the number of times each possible event occurs during a time step τ from a Poisson distribution. The selection of the value of τ is a trade-off between the
accuracy and the computational efficiency of the simulation. In this example, we used the
ETL method and set τ = 0.08 to simulate data in evaluating synthetic likelihood, as this
provided reasonable accuracy and significantly reduced computing time when compared to
the Gillespie algorithm. Here, mean, log variance and maximum of the observed counts of
prey and predators according to the design (d) were considered as the summary statistics.
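A minimal sketch of the ETL scheme for this model follows. The event rates come from the transition probabilities given earlier; clamping populations at zero is one simple guard against the invalid negative states tau-leaping can produce, and is an illustrative assumption here rather than the safeguard used in the thesis.

```python
import numpy as np

def etl_lotka_volterra(a, b, c, K, x0, y0, t_end, tau=0.08, rng=None):
    """Explicit tau-leap simulation of the predator-prey CTMC: over each
    step of length tau, the number of occurrences of each event is drawn
    from a Poisson distribution with mean rate*tau."""
    if rng is None:
        rng = np.random.default_rng()
    x, y = x0, y0
    path = [(0.0, x, y)]
    t = 0.0
    while t < t_end:
        rates = np.array([a * x,                # prey birth
                          a * (1 - x / K) * x,  # prey death (logistic term)
                          b * x * y,            # predation
                          c * y])               # predator death
        counts = rng.poisson(np.clip(rates, 0, None) * tau)
        # clamp at zero to avoid invalid negative population sizes
        x = max(x + counts[0] - counts[1] - counts[2], 0)
        y = max(y + counts[2] - counts[3], 0)
        t += tau
        path.append((t, x, y))
    return np.array(path)
```

Larger τ values cut the number of Poisson draws per simulated path, which is the accuracy/cost trade-off discussed above; τ = 0.08 was the value used in this example.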
Figure 5.8c shows the optimal sampling times for parameter estimation of the modified
LV model found under the KLD and NSEL utilities. There is overlap between the selected
sampling times which maximise the KLD and NSEL utilities. However, the NSEL designs
suggest observing the process at the beginning of the experiment, while the KLD designs include sampling times at the end of the experiment. As in the previous examples, the expected utility of each optimal design was re-evaluated 100 times, and the mean and
standard deviation of these expected utility values are given in Table 5.3.
Table 5.3: Expected utility values (standard deviation) of optimal designs derived under the KLD and NSEL utility functions.

Utility function   |d|   U(d*) (SD)
KLD                10     2.87 (0.01)
                   15     2.95 (0.02)
                   20     2.97 (0.02)
NSEL               10    -0.0596 (0.0004)
                   15    -0.0588 (0.0005)
                   20    -0.0589 (0.0005)
As in the other two examples, the performance of the optimal designs in estimating parameters was assessed based on the log determinant of the inverse of the posterior variance-covariance matrix of the parameters. Figure 5.9 compares the optimal designs found under the KLD and
NSEL utilities with equally spaced designs. Despite noticeable differences between KLD
and NSEL designs, they perform similarly well compared to the equally spaced designs.
As seen in Figures 5.8a and 5.8b, prior predictive distributions of prey and predators have
two oscillations. Thus, potentially there are two regions of the prior predictive distribu-
tions where information about parameters can be obtained. Consequently, the utilities
considered here select quite different regions of the design space as being informative.
Figure 5.9: The log determinant of the inverse of the posterior variance-covariance matrix of the parameters of the data generating model when observations are generated from the modified LV model according to the optimal designs and equally spaced designs.
5.6 Discussion
In this work, we proposed a synthetic likelihood-based Laplace approximation to evaluate utility functions in designing experiments that collect a large number of observations in epidemiology and ecology. The proposed Laplace approximation requires a relatively small number of likelihood evaluations compared to other posterior approximation methods, and thus reduces the number of model simulations required for utility evaluations.
This approach avoids the use of pre-simulated datasets, which generally require large storage. Further, the computational cost of approximating the likelihood of a large number of observations is reduced by using summary statistics instead of the full dataset. Consequently, our approach enables the location of high dimensional designs for models with intractable likelihoods in a continuous design space, providing a significant improvement on what has been proposed previously in the literature.
Although the proposed approach provides an efficient approximation for a wide range of utility functions, there are a few limitations. First, the selection of summary statistics which are informative not only over the entire prior predictive distribution but also across the design space appears to be a difficult task. We addressed this by avoiding designs which yielded low utility values and/or poor approximations to the utility. However, robust methods for estimating the synthetic likelihood have recently been proposed, for instance the extended empirical saddlepoint estimation (Fasiolo et al., 2018) and a semi-parametric approach (An et al., 2018). Exploration of the use of such methods to
improve the approximation of low utility values is a potential research avenue that could be explored in the future.
Secondly, the use of the Gillespie algorithm (Gillespie, 1977) to simulate data can be prohibitively expensive in our approach; for example, see Example 3. Computationally less expensive alternatives such as the ETL method (Gillespie, 2001) can be used to simulate data. However, depending on the specified value of τ, the ETL method can produce invalid states (negative values for population sizes). Consequently, improved versions of the ETL method, such as the binomial tau-leap method or the optimised tau-leap method, have been proposed, but employing these methods comes at additional computational cost. Thus, further development of such methods is needed so that they can be used efficiently in design. The exploitation of the parallel computation available on graphics processing units (GPUs) could also be useful for alleviating some of the computational burden when finding optimal designs. We plan to explore this in the future.
6 Conclusion
This chapter summarises the key developments proposed in this thesis. The limitations of these developments are then discussed, along with potential future research directions.
6.1 Summary
The primary aim of this thesis was to develop and implement efficient methods to design
experiments for models with intractable likelihoods. To achieve this aim, the following
objectives were defined.
1. Develop a method to design experiments for discriminating between models with
intractable likelihoods.
2. Develop a new optimisation algorithm for computationally expensive utility func-
tions.
3. Develop a method to design dual purpose experiments for models with intractable
likelihoods.
4. Extend Objective 3 to find high dimensional designs for models with intractable
likelihoods.
To address the first objective, in Chapter 3 we proposed a novel method to efficiently
approximate model discrimination utilities using methods from approximate Bayesian
computation (ABC), specifically, ABC rejection algorithms for parameter estimation and
model choice. Three model discrimination utility functions, namely the mutual infor-
mation utility for model discrimination, Zero-one utility and Ds-optimal utility, were
considered to find optimal time points to observe the spread of a disease in a population of individuals. The performance of the designs found under these three utilities was compared. It was found that the mutual information designs generally performed better
when compared to the other two utilities, yielding data that provided the most certainty
about the appropriate model.
Chapter 6. Discussion and conclusions
To address Objective 2, we proposed an extension to the coordinate exchange (CE) al-
gorithm, called the Refined coordinate exchange (RCE) algorithm, to reduce the number
of utility evaluations required to locate optimal designs. We compared the performance
of the RCE algorithm with the CE algorithm, and found that both algorithms located
the same optimal design, but the RCE algorithm required much fewer evaluations of
the expected utility. We also compared the performance of the RCE algorithm with an
adaptation of the approximate coordinate exchange (ACE) algorithm to handle a discrete
design space, called ACE-D. It was found that the RCE algorithm locates highly efficient
designs with respect to ACE-D within a small number of iterations. Moreover, the RCE
algorithm was shown to be more robust to the initial design chosen. In order to locate
high dimensional designs, we implemented the RCE algorithm to exploit parallel computational architectures. Using the RCE algorithm, we found optimal designs of up to ten
design points for discriminating between four competing epidemiological models. Such a
design problem would be computationally infeasible to consider with the CE algorithm,
and this was achieved without relying on an emulator of the expected utility surface (as
used in ACE-D) which, in practice, may suffer from issues of lack-of-fit.
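As background for this comparison, the basic CE loop can be sketched as follows. The RCE refinements, which are what reduce the number of utility evaluations, are not shown, and all names here are illustrative.

```python
import numpy as np

def coordinate_exchange(utility, d0, candidates, max_sweeps=10):
    """Generic coordinate exchange over a discrete design space: cycle
    through the design points and, one coordinate at a time, swap in the
    candidate value giving the largest expected utility, stopping when a
    full sweep makes no change."""
    d = np.array(d0, dtype=float)
    best = utility(d)
    for _ in range(max_sweeps):
        changed = False
        for j in range(len(d)):
            for c in candidates:
                trial = d.copy()
                trial[j] = c            # exchange coordinate j
                u = utility(trial)      # one utility evaluation per trial
                if u > best:
                    d, best, changed = trial, u, True
        if not changed:
            break
    return d, best
```

The inner loop makes clear why CE becomes infeasible for expensive utilities: every coordinate/candidate pair costs one full expected-utility evaluation, which is exactly the cost the RCE algorithm was designed to cut.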
Chapter 4 addressed the third objective by proposing methodology to design dual pur-
pose experiments for estimating parameters and discriminating between models with in-
tractable likelihoods. To do this, we extended the synthetic likelihood approximation by
applying a continuity correction for approximating the likelihood of discrete observations
via the multivariate normal distribution. Our approximation facilitates the evaluation of a wider range of utility functions than previously proposed methodologies, allowing utilities such as the KLD utility, the mutual information utility for model discrimination and the total entropy utility to be considered. After defining this synthetic likelihood
approximation, we validated the proposed utility approximation on an illustrative exam-
ple which involved designing dual purpose experiments for two epidemiological models
namely the death and Susceptible-Infected (SI) models. It was evident that the proposed
approximation was sufficient for locating optimal designs. Given this, we found dual
purpose designs for experiments studying foot and mouth disease (FMD), which has previously been described by two competing epidemiological models, the Susceptible-Infectious-
Recovered (SIR) and Susceptible-Exposed-Infectious-Recovered (SEIR) models. Then,
the performance of the total entropy designs for discriminating between the SIR and SEIR models and estimating the parameters of these models was compared with optimal designs found
under the KLD utility and the mutual information utility for model discrimination. Over-
all, the total entropy designs performed similarly to the designs optimised for each design
objective, particularly as the number of design points increased. Therefore, we concluded
that our approximation was suitable and useful for finding dual purpose designs for ex-
periments in epidemiology.
The methodology proposed in Chapter 4 is limited to a small to moderate number of
design points as it involves computationally expensive integration across a multivariate
normal distribution, and requires many likelihood evaluations as importance sampling
was used in approximating the posterior distribution. Thus, Objective 4 addressed these
limitations by proposing a synthetic likelihood-based Laplace approximation to the pos-
terior distribution that enables utility functions to be efficiently evaluated for high di-
mension designs. The proposed approach avoids the computational expense of evaluating
the likelihood of data by instead using summary statistics. Further, use of the Laplace
approximation provides a more efficient approximation to the posterior distribution (com-
pared to importance sampling) as the number of observations increases. We evaluated the
performance of the synthetic likelihood-based Laplace approximation for estimating util-
ity functions via an illustrative example. Then, we found high dimensional dual purpose
designs of up to 20 dimensions to learn about FMD, and designs to estimate parameters of the modified Lotka-Volterra (LV) model with logistic growth of prey. Given that we developed a computationally efficient utility approximation, here we considered a continuous design space, where potentially more efficient designs can be found when compared to a discretised design space.
6.2 Limitations and future work
There are some limitations of the proposed methods in this thesis which motivate future
research. One of the limitations of the methods in Chapters 4 and 5 is the Gaussian
assumption of the distribution of the data and summary statistics. To relax this assump-
tion, more robust likelihood approximations such as the extended empirical saddlepoint
approximation (Fasiolo et al., 2018) and SemiBSL (An et al., 2018) could be considered.
In Chapter 5, the Laplace approximation assumes that the posterior of parameters can
be well approximated by a Gaussian distribution. The use of more sophisticated methods
such as Laplace importance sampling and Variational Bayes could potentially improve
the posterior approximation, and therefore the approximation to the expected utility.
However, these methods will increase the computational cost of utility evaluations. Thus,
possible improvements in the computational efficiency of these methods, and/or implementations that exploit parallel computing architectures, would be needed in order to locate designs in a reasonable amount of time. This is something that could be investigated in the future.
Despite the various developments proposed in this thesis to improve the computational
efficiency in obtaining posterior approximations, the process of locating optimal designs
for models with intractable likelihoods is still quite time consuming. We would suggest
three research directions to address this limitation. First, although model simulation using the explicit tau-leap (ETL) method (Gillespie, 2001) is computationally less expensive, this method may produce invalid datasets. Improved versions of the ETL method, such as the binomial tau-leap method or the optimised tau-leap method, could be considered, but this comes at additional computational cost. Thus, developments are needed to simulate data from such models in an efficient manner; this is an area of research that could be pursued in the future. Secondly, the development of deterministic approaches to
evaluate likelihoods of stochastic models is also a potential future research direction to
reduce or avoid a large number of model simulations. Finally, parallel computational
architectures available in graphics processing units (GPU) can be exploited to reduce
computational time in utility evaluations.
In this thesis, we have mainly considered designing experiments in epidemiology. However, the methods developed can be used to design experiments in other areas, such as queueing systems and systems biology. Further, we have only considered designing experiments without replicates due to the impracticality of collecting multiple measurements at
a single time point. In future research, designing experiments with replicates could also
be considered.
A Supplementary Material for Chapter 3: 'Optimal Bayesian design for discriminating between models with intractable likelihoods in epidemiology'
A.1 Monte Carlo error of utility estimation in Example 2
[Figure A.1 appears here: a 3 × 4 grid of panels (columns labelled 1, 2, 3 and 4 design points), each plotting Monte Carlo error against the number of Monte Carlo samples (Qm).]
Figure A.1: Monte Carlo error of the estimated expected utility of the mutual information utility (first row), the Ds-optimality utility (second row) and the Zero-One utility (third row). In each plot, the solid line represents the Monte Carlo error associated with the estimated utility of the optimal design found under each utility, and dashed lines represent the Monte Carlo errors for randomly selected designs.
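The kind of assessment summarised in Figure A.1 can be reproduced in miniature: approximate the Monte Carlo error of a sample-mean estimator by the standard deviation of repeated estimates at each number of samples. The toy estimand below is an assumption for illustration only, not one of the thesis's utility functions.

```python
import math
import random

def mc_error(estimator, q, repeats, rng):
    """Standard deviation of repeated Monte Carlo estimates, each based on
    q samples -- an empirical proxy for the Monte Carlo error."""
    est = [estimator(q, rng) for _ in range(repeats)]
    mean = sum(est) / repeats
    return math.sqrt(sum((e - mean) ** 2 for e in est) / (repeats - 1))

def toy_utility_estimate(q, rng):
    # Toy stand-in for a utility estimate: the sample mean of log(1 + X),
    # X ~ Uniform(0, 1).  (Illustrative assumption only.)
    return sum(math.log(1.0 + rng.random()) for _ in range(q)) / q

rng = random.Random(0)
errors = {q: mc_error(toy_utility_estimate, q, 200, rng)
          for q in (250, 1000, 4000)}
# The error shrinks roughly like 1/sqrt(q), mirroring the decay in Figure A.1.
```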
A.2 Performance of optimisation algorithms in locating three- and four-point designs for discriminating between Models 1 and 2
Table A.1: Performance of optimisation algorithms in locating three- and four-point designs for discriminating between Models 1 and 2 based on different utility functions.

Utility function      |d|   Optimisation   Optimal design d*        U(d*)   Total run time
                            algorithm                                       (hours)
Mutual information     3    RCE(1)         (0.6, 4.3, 6.0)          -0.45    3.90
                            ACE-D          (0.9, 4.2, 6.9)          -0.45   14.62
                       4    RCE(1)         (0.6, 4.0, 5.1, 10.8)    -0.44    5.67
                            ACE-D          (0.6, 4.1, 5.2, 11.5)    -0.44   19.90
Ds-optimal             3    RCE(1)         (0.7, 3.0, 4.8)          10.44    2.43
                            ACE-D          (0.6, 2.9, 4.7)          10.44   10.10
                       4    RCE(1)         (0.5, 1.0, 3.3, 5.7)     10.48    3.80
                            ACE-D          (0.6, 1.0, 3.0, 4.8)     10.48   11.95
Zero-One (0-1)         3    RCE(1)         (0.4, 5.3, 8.1)          0.790    3.12
                            ACE-D          (0.6, 4.2, 9.2)          0.798   11.91
                       4    RCE(1)         (0.3, 1.2, 3.3, 5.5)     0.798    5.52
                            ACE-D          (0.6, 4.0, 9.4, 13.6)    0.803   17.98
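The coordinate-exchange idea behind the algorithms compared in Table A.1 can be sketched generically: cycle over the design's coordinates and greedily replace each with the best value from a candidate grid. The toy utility, grid and starting design below are illustrative assumptions and do not reproduce the RCE(1) or ACE-D implementations.

```python
import random

def coordinate_exchange(utility, d0, candidates, sweeps, rng):
    """Minimal coordinate-exchange sketch: cycle through the design points,
    replacing each coordinate with the best value from a candidate grid.
    (Illustrative only -- not the RCE/ACE-D implementations of Table A.1.)"""
    d = list(d0)
    for _ in range(sweeps):
        for i in rng.sample(range(len(d)), len(d)):  # random coordinate order
            best_u = utility(d)
            for c in candidates:
                trial = d[:i] + [c] + d[i + 1:]
                u = utility(trial)
                if u > best_u:  # accept only strict improvements
                    d, best_u = trial, u
    return d

def toy_utility(d):
    # Toy utility: reward well-separated observation times (assumption).
    s = sorted(d)
    return min(b - a for a, b in zip(s, s[1:])) if len(s) > 1 else 0.0

rng = random.Random(2)
grid = [0.5 * k for k in range(1, 31)]  # candidate times 0.5, 1.0, ..., 15.0
d_opt = coordinate_exchange(toy_utility, [1.0, 1.5, 2.0], grid, 5, rng)
```

Because each sweep only accepts improvements, the algorithm converges to a local optimum; multiple random starts are commonly used to mitigate this.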
A.3 Performance of the optimal designs
A.3.1 Performance of the optimal designs for model discrimination -
Model 2
[Figure A.2 appears here: panels (a) 1 design point, (b) 4 design points, (c) 8 design points and (d) 10 design points, each plotting empirical cumulative probability against ABC posterior model probability for the utility functions 0−1, Ds, MI and RD.]
Figure A.2: Empirical cumulative probabilities of the ABC posterior model probability of Model 2 (true model) obtained for observations generated from Model 2 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs.
A.3.2 Performance of the optimal designs for model discrimination -
Model 3
[Figure A.3 appears here: panels (a) 1 design point, (b) 4 design points, (c) 8 design points and (d) 10 design points, each plotting empirical cumulative probability against ABC posterior model probability for the utility functions 0−1, Ds, MI and RD.]
Figure A.3: Empirical cumulative probabilities of the ABC posterior model probability of Model 3 (true model) obtained for observations generated from Model 3 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs.
A.3.3 Performance of the optimal designs for model discrimination -
Model 4
[Figure A.4 appears here: panels (a) 1 design point, (b) 4 design points, (c) 8 design points and (d) 10 design points, each plotting empirical cumulative probability against ABC posterior model probability for the utility functions 0−1, Ds, MI and RD.]
Figure A.4: Empirical cumulative probabilities of the ABC posterior model probability of Model 4 (true model) obtained for observations generated from Model 4 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs.
B Supplementary Material for Chapter 4: ‘Dual purpose Bayesian design for parameter estimation and model discrimination in epidemiology using a synthetic likelihood approach’
B.1 Derivation of the total entropy utility for static
designs
Let y be n observations obtained by conducting an experiment under a static design d which consists of n distinct time points, d = {t_1, t_2, ..., t_n}. When the aim of the experiment is to discriminate between K candidate models and to estimate model parameters, a dual purpose utility function based on total entropy can be used. Total entropy was originally proposed by Borth (1975), and can be expressed as follows:
\[
H_T(M, \theta \mid y, d) = H_M(M \mid y, d) + H_P(\theta \mid y, d), \tag{B.1}
\]
where H_M(M|y,d) is the entropy about which model is correct and H_P(θ|y,d) is the entropy of all parameters across the K models. Then, the expected change in total entropy based on observations collected under a given design d can be used to measure the informativeness of the design. In the following subsections, the expected change in entropy about the model indicator (I_M) and model parameters (I_P) will be described. Following this, the total entropy utility is derived based on the sum of the expected changes in entropy about the model indicator and parameter values.
B.1.1 Expected change in HM
The entropy about the model indicator (Box and Hill, 1967) upon observing y given design d is given by,
\[
H_M(M \mid y, d) = -\sum_{m=1}^{K} p(m \mid y, d) \log p(m \mid y, d). \tag{B.2}
\]
The entropy about the model indicator based on prior model probabilities, p(m), is,
\[
H_M(M) = -\sum_{m=1}^{K} p(m) \log p(m). \tag{B.3}
\]
The expected change in entropy about the model indicator based on observations obtained
under design d can be expressed as,
\[
I_M = H_M(M) - E[H_M(M \mid y, d)]. \tag{B.4}
\]
Here,
\begin{align}
E[H_M(M \mid y, d)] &= \sum_{y} H_M(M \mid y, d)\, p(y \mid d), \notag \\
&= -\sum_{y} \sum_{m=1}^{K} p(m \mid y, d) \log p(m \mid y, d)\, p(y \mid d), \notag \\
&= -\sum_{m=1}^{K} \sum_{y} p(m \mid y, d)\, p(y \mid d) \log p(m \mid y, d). \tag{B.5}
\end{align}
By Bayes' theorem, p(m|y, d) p(y|d) = p(y|m, d) p(m), so (B.5) can be simplified as,
\begin{align}
E[H_M(M \mid y, d)] &= -\sum_{m=1}^{K} \sum_{y} p(y \mid m, d)\, p(m) \log p(m \mid y, d), \tag{B.6} \\
&= -\sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \log p(m \mid y, d). \tag{B.7}
\end{align}
Then, by applying (B.3) and (B.7) in Equation (B.4),
\begin{align}
I_M &= -\sum_{m=1}^{K} p(m) \log p(m) + \sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \log p(m \mid y, d), \notag \\
&= \sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \left\{ \log p(m \mid y, d) - \log p(m) \right\}. \tag{B.8}
\end{align}
B.1.2 Expected change in HP
The entropy about the model parameters across the K models (McGree, 2017) upon observing y given design d is given by,
\[
H_P(\theta \mid y, d) = -\sum_{m=1}^{K} p(m \mid y, d) \int_{\theta_m} p(\theta_m \mid m, y, d) \log p(\theta_m \mid m, y, d) \, d\theta_m. \tag{B.9}
\]
The entropy of model parameters, θm, based on the prior distribution of θm is given by,
\[
H_P(\theta) = -\sum_{m=1}^{K} p(m) \int_{\theta_m} p(\theta_m \mid m) \log p(\theta_m \mid m) \, d\theta_m. \tag{B.10}
\]
The expected change in entropy about the model parameters based on observations obtained under design d can be expressed as,
\[
I_P = H_P(\theta) - E[H_P(\theta \mid y, d)]. \tag{B.11}
\]
Here,
\begin{align}
E[H_P(\theta \mid y, d)] &= \sum_{y} H_P(\theta \mid y, d)\, p(y \mid d), \tag{B.12} \\
&= -\sum_{y} \sum_{m=1}^{K} p(m \mid y, d) \int_{\theta_m} p(\theta_m \mid m, y, d) \log p(\theta_m \mid m, y, d)\, p(y \mid d) \, d\theta_m, \tag{B.13} \\
&= -\sum_{y} \sum_{m=1}^{K} p(m \mid y, d)\, p(y \mid d) \int_{\theta_m} p(\theta_m \mid m, y, d) \log p(\theta_m \mid m, y, d) \, d\theta_m. \tag{B.14}
\end{align}
Appendix B. Supplementary Material for Chapter 4 124
By Bayes' theorem, p(m|y, d) p(y|d) = p(y|m, d) p(m), so (B.14) can be expressed as,
\begin{align}
E[H_P(\theta \mid y, d)] &= -\sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \int_{\theta_m} p(\theta_m \mid m, y, d) \log p(\theta_m \mid m, y, d) \, d\theta_m, \tag{B.15} \\
&= -\sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \int_{\theta_m} p(\theta_m \mid m, y, d) \log \left[ \frac{p(y \mid \theta_m, m, d)\, p(\theta_m \mid m)}{p(y \mid m, d)} \right] d\theta_m, \tag{B.16} \\
&= -\sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \int_{\theta_m} p(\theta_m \mid m, y, d) \log \left[ \frac{p(y \mid \theta_m, m, d)}{p(y \mid m, d)} \right] d\theta_m \notag \\
&\quad - \sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \int_{\theta_m} p(\theta_m \mid m, y, d) \log p(\theta_m \mid m) \, d\theta_m. \tag{B.17}
\end{align}
Let
\[
A = -\sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \int_{\theta_m} p(\theta_m \mid m, y, d) \log \left[ \frac{p(y \mid \theta_m, m, d)}{p(y \mid m, d)} \right] d\theta_m,
\]
and
\[
B = -\sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \int_{\theta_m} p(\theta_m \mid m, y, d) \log p(\theta_m \mid m) \, d\theta_m.
\]
Then,
\begin{align}
A &= -\sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \int_{\theta_m} p(\theta_m \mid m, y, d) \log \left[ \frac{p(y \mid \theta_m, m, d)}{p(y \mid m, d)} \right] d\theta_m, \tag{B.18} \\
&= -\sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \left\{ \int_{\theta_m} p(\theta_m \mid m, y, d) \log p(y \mid \theta_m, m, d) \, d\theta_m - \log p(y \mid m, d) \right\}. \tag{B.19}
\end{align}
\begin{align}
B &= -\sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \int_{\theta_m} \frac{p(y \mid \theta_m, m, d)\, p(\theta_m \mid m)}{p(y \mid m, d)} \log p(\theta_m \mid m) \, d\theta_m, \tag{B.20} \\
&= -\sum_{m=1}^{K} p(m) \sum_{y} \int_{\theta_m} p(y \mid \theta_m, m, d)\, p(\theta_m \mid m) \log p(\theta_m \mid m) \, d\theta_m, \tag{B.21} \\
&= -\sum_{m=1}^{K} p(m) \int_{\theta_m} p(\theta_m \mid m) \log p(\theta_m \mid m) \sum_{y} p(y \mid \theta_m, m, d) \, d\theta_m, \tag{B.22} \\
&= -\sum_{m=1}^{K} p(m) \int_{\theta_m} p(\theta_m \mid m) \log p(\theta_m \mid m) \, d\theta_m, \tag{B.23} \\
&= H_P(\theta). \tag{B.24}
\end{align}
By substituting A and B into Equation (B.17),
\begin{align}
E[H_P(\theta \mid y, d)] = -\sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \bigg\{ \int_{\theta_m} p(\theta_m \mid m, y, d) \log p(y \mid \theta_m, m, d) \, d\theta_m \notag \\
- \log p(y \mid m, d) \bigg\} + H_P(\theta). \tag{B.25}
\end{align}
Then, by applying (B.10) and (B.25) in Equation (B.11),
\[
I_P = \sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \left\{ \int_{\theta_m} p(\theta_m \mid m, y, d) \log p(y \mid \theta_m, m, d) \, d\theta_m - \log p(y \mid m, d) \right\}. \tag{B.26}
\]
This can be used as a utility function for parameter estimation under model uncertainty.
B.1.3 Expected change in HT
The expected change in the total entropy can be expressed as follows:
\begin{align}
I_T &= H_T(M, \theta) - E[H_T(M, \theta \mid y, d)], \tag{B.27} \\
&= \left\{ H_M(M) + H_P(\theta) \right\} - \left\{ E[H_M(M \mid y, d)] + E[H_P(\theta \mid y, d)] \right\}, \tag{B.28} \\
&= \left\{ H_M(M) - E[H_M(M \mid y, d)] \right\} + \left\{ H_P(\theta) - E[H_P(\theta \mid y, d)] \right\}, \tag{B.29} \\
&= I_M + I_P. \tag{B.30}
\end{align}
By applying Equations (B.8) and (B.26) in (B.30),
\begin{align}
I_T &= \sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \bigg\{ \log p(m \mid y, d) - \log p(m) \notag \\
&\qquad + \int_{\theta_m} p(\theta_m \mid m, y, d) \log p(y \mid \theta_m, m, d) \, d\theta_m - \log p(y \mid m, d) \bigg\}, \tag{B.31} \\
&= \sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \left\{ \log \left[ \frac{p(m \mid y, d)}{p(y \mid m, d)\, p(m)} \right] + \int_{\theta_m} p(\theta_m \mid m, y, d) \log p(y \mid \theta_m, m, d) \, d\theta_m \right\}, \tag{B.32} \\
&= \sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \left\{ -\log p(y \mid d) + \int_{\theta_m} p(\theta_m \mid m, y, d) \log p(y \mid \theta_m, m, d) \, d\theta_m \right\}. \tag{B.33}
\end{align}
The utility function based on the expected change in total entropy can be expressed as
follows:
\[
U(d) = \sum_{m=1}^{K} p(m) \sum_{y} p(y \mid m, d) \left\{ -\log p(y \mid d) + \int_{\theta_m} p(\theta_m \mid m, y, d) \log p(y \mid \theta_m, m, d) \, d\theta_m \right\}. \tag{B.34}
\]
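When θ_m is restricted to a discrete grid, the evidence terms p(y|m,d) and p(y|d) in the total entropy utility become exact sums, and the outer expectation can be estimated by Monte Carlo: draw (m, θ, y) from the prior and likelihood, then average the bracketed term. The two Poisson rate models and the grid prior below are illustrative assumptions, not the epidemiological models considered in the thesis.

```python
import math
import random

def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def sample_poisson(lam, rng):
    # Inversion sampling; adequate for the small means used here.
    u, k, p = rng.random(), 0, math.exp(-lam)
    cum = p
    while u > cum:
        k += 1
        p *= lam / k
        cum += p
    return k

# Toy setting: two rival rate models observed at time points d (assumptions).
THETAS = [0.5, 1.0, 1.5, 2.0, 2.5]           # discrete prior grid for theta
RATES = {0: lambda th, t: th * t,             # Model 1 (illustrative)
         1: lambda th, t: th * math.sqrt(t)}  # Model 2 (illustrative)
PRIOR_M = [0.5, 0.5]

def lik(y, m, theta, d):
    return math.prod(poisson_pmf(yt, RATES[m](theta, t)) for yt, t in zip(y, d))

def total_entropy_utility(d, n_mc, rng):
    """Monte Carlo estimate of the total entropy utility, with the integral
    over theta replaced by an exact sum over the discrete grid."""
    acc = 0.0
    for _ in range(n_mc):
        m = 0 if rng.random() < PRIOR_M[0] else 1
        theta = rng.choice(THETAS)
        y = [sample_poisson(RATES[m](theta, t), rng) for t in d]
        # Evidence terms p(y|m,d) and p(y|d) by summing over the theta grid.
        ev = [sum(lik(y, mm, th, d) for th in THETAS) / len(THETAS) for mm in (0, 1)]
        p_y = PRIOR_M[0] * ev[0] + PRIOR_M[1] * ev[1]
        post = [lik(y, m, th, d) / len(THETAS) / ev[m] for th in THETAS]
        inner = sum(w * math.log(lik(y, m, th, d)) for w, th in zip(post, THETAS))
        acc += -math.log(p_y) + inner
    return acc / n_mc

rng = random.Random(7)
print(total_entropy_utility((1.0, 4.0), 400, rng))
```

Loosely speaking, the −log p(y|d) term drives model discrimination while the posterior-weighted log-likelihood term rewards parameter estimation, reflecting the decomposition I_T = I_M + I_P.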
B.2 Comparison between synthetic likelihood approach
and ABC rejection method
We compare the performance of the proposed synthetic likelihood approach with rejection sampling in approximate Bayesian computation (ABC) for approximating the expected utility, using the design scenario described in Example 1 of the main paper.
However, here we only consider the mutual information utility for discriminating between the death and SI models, as ABC rejection sampling does not provide an efficient approximation to the Kullback-Leibler (KL) divergence utility or the total entropy utility. In this comparison, we first compare our synthetic likelihood approach with the ABC model choice (ABC-MC) method of Grelaud et al. (2009) for approximating the expected utility of designs with two, four, six and eight randomly selected design points. For each design, we compare both approximations, each based on 1 × 10^6 model simulations, to the case where U(d) is estimated by evaluating the actual likelihood of the data; see Figure B.1.
From the first column of Figure B.1, it is evident that the proposed synthetic likelihood
approach approximates the actual mutual information utility well, even for designs with a
moderate number of design points (|d |). However, the ABC approximation, based on the
same number of model simulations, results in some deviation of the approximated utility
from the actual utility value as |d | increases.
As discussed in Dehideniya et al. (2018b), the reduced performance of ABC-MC as |d| increases is due to the increase in the ABC tolerance. This could be remedied by increasing the number of model simulations, but comes at the cost of computational efficiency.
Nonetheless, we next investigate whether the ABC-MC approximation improves as the
number of simulated datasets increases. As can be seen in the third column of Figure B.1,
potentially some improvement is observed in the ABC-MC approximation by increasing
the number of simulated datasets to 2 × 10^6. However, our synthetic likelihood approach still performs better overall, with strong agreement with the utility values based on evaluating the actual likelihood.
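The synthetic likelihood approximation used in this comparison can be sketched in a few lines: simulate datasets at a given parameter value, reduce each to summary statistics, fit a multivariate normal to the simulated summaries, and evaluate its log-density at the observed summaries. The toy normal model and summaries below are assumptions for illustration, not the epidemic models of the paper.

```python
import math
import random

def summaries(y):
    """Summary statistics: sample mean and variance of a dataset."""
    n = len(y)
    mu = sum(y) / n
    return (mu, sum((v - mu) ** 2 for v in y) / (n - 1))

def synthetic_loglik(s_obs, simulate, n_sim, rng):
    """Gaussian synthetic log-likelihood of observed summaries s_obs,
    based on n_sim simulated datasets from `simulate(rng)`."""
    sims = [summaries(simulate(rng)) for _ in range(n_sim)]
    m = [sum(s[i] for s in sims) / n_sim for i in (0, 1)]
    c = [[sum((s[i] - m[i]) * (s[j] - m[j]) for s in sims) / (n_sim - 1)
          for j in (0, 1)] for i in (0, 1)]
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    inv = [[c[1][1] / det, -c[0][1] / det], [-c[1][0] / det, c[0][0] / det]]
    d = [s_obs[0] - m[0], s_obs[1] - m[1]]
    quad = sum(d[i] * inv[i][j] * d[j] for i in (0, 1) for j in (0, 1))
    return -0.5 * (quad + math.log(det) + 2.0 * math.log(2.0 * math.pi))

# Illustrative model (an assumption): datasets of 30 draws from N(theta, 1),
# simulated via the Box-Muller transform.
def make_simulator(theta):
    def simulate(rng):
        return [theta + math.sqrt(-2 * math.log(rng.random()))
                * math.cos(2 * math.pi * rng.random()) for _ in range(30)]
    return simulate

rng = random.Random(3)
y_obs = make_simulator(0.0)(rng)
ll0 = synthetic_loglik(summaries(y_obs), make_simulator(0.0), 500, rng)
ll5 = synthetic_loglik(summaries(y_obs), make_simulator(5.0), 500, rng)
# The synthetic likelihood favours the parameter that generated the data.
```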
[Figure B.1 appears here: a 4 × 3 grid of scatter plots, rows |d| = 2, 4, 6 and 8, columns SL (1 × 10^6), ABC (1 × 10^6) and ABC (2 × 10^6), each plotting the approximated utility U(d) against the actual utility U(d).]
Figure B.1: Comparison of the accuracy of the estimated expected utility of the mutual information utility using the synthetic likelihood based on 1 × 10^6 model simulations (first column), and ABC-MC based on 1 × 10^6 (second column) and 2 × 10^6 (third column) model simulations. In each plot, the y = x line indicates a perfect match of approximated and actual utility evaluations.
C Supplementary Material for Chapter 5: ‘A synthetic likelihood-based Laplace approximation for efficient design of biological processes’
C.1 Informativeness of summary statistics
C.1.1 Death model
[Figure C.1 appears here: panels (a) mean and (b) variance plotted against b.]
Figure C.1: Scatter plot between model parameter b and summary statistics, (a) mean and (b) variance, of observations simulated from the death model according to a random design with 15 design points.
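The pattern in Figure C.1 can be explored with a minimal exact (Gillespie-type) simulator of a pure death process. The parameterisation below, a per-capita death rate of exp(b), and all settings are assumptions for illustration.

```python
import math
import random

def simulate_death_model(n0, b, times, rng):
    """Exact (Gillespie) simulation of a pure death process with per-capita
    rate exp(b), recording the population at each observation time."""
    rate_pc = math.exp(b)
    t, n, out = 0.0, n0, []
    for obs_t in times:
        while n > 0:
            # Exponential waiting time until the next death event.
            dt = -math.log(1.0 - rng.random()) / (rate_pc * n)
            if t + dt > obs_t:
                break  # discarding the overshoot is valid by memorylessness
            t += dt
            n -= 1
        t = max(t, obs_t)
        out.append(n)
    return out

rng = random.Random(5)
times = [0.5 * (i + 1) for i in range(15)]  # 15 design points (illustrative)
for b in (-1.5, -0.5, 0.5):
    ys = [simulate_death_model(50, b, times, rng) for _ in range(200)]
    flat = [v for y in ys for v in y]
    mu = sum(flat) / len(flat)
    var = sum((v - mu) ** 2 for v in flat) / (len(flat) - 1)
    print(b, round(mu, 1), round(var, 1))
# Under this parameterisation, larger b (a faster death rate) gives smaller
# mean counts, so the summaries carry information about b.
```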
C.1.2 SI model
[Figure C.2 appears here: panels (a) mean and (b) variance plotted against b1.]
Figure C.2: Scatter plot between model parameter b1 and summary statistics, (a) mean and (b) variance, of observations simulated from the SI model according to a random design with 15 design points.
[Figure C.3 appears here: panels (a) mean and (b) variance plotted against b2.]
Figure C.3: Scatter plot between model parameter b2 and summary statistics, (a) mean and (b) variance, of observations simulated from the SI model according to a random design with 15 design points.
Bibliography
An, Z., Nott, D. J., and Drovandi, C. C. (2018). Robust Bayesian synthetic likelihood via a semi-parametric approach. arXiv preprint arXiv:1809.05800.
Atkinson, A. (2008). DT-optimum designs for model discrimination and parameter esti-
mation. Journal of Statistical Planning and Inference, 138(1):56 – 64.
Atkinson, A. C. and Bogacka, B. (1997). Compound D- and Ds-optimum designs for
determining the order of a chemical reaction. Technometrics, 39(4):347–356.
Atkinson, A. C. and Fedorov, V. V. (1975a). The design of experiments for discriminating
between two rival models. Biometrika, 62(1):57–70.
Atkinson, A. C. and Fedorov, V. V. (1975b). Optimal design: Experiments for discrimi-
nating between several models. Biometrika, 62(2):289–303.
Backer, J., Hagenaars, T., Nodelijk, G., and van Roermund, H. (2012). Vaccination
against foot-and-mouth disease i: Epidemiological consequences. Preventive Veterinary
Medicine, 107(1-2):27 – 40.
Bailey, D. J. and Gilligan, C. A. (1999). Dynamics of primary and secondary infection in
take-all epidemics. Phytopathology, 89(1):84 – 91.
Bailey, D. J., Kleczkowski, A., and Gilligan, C. A. (2004). Epidemiological dynamics and
the efficiency of biological control of soil-borne disease during consecutive epidemics in
a controlled environment. New Phytologist, 161(2):569–575.
Balciunas, D. and Lawler, S. P. (1995). Effects of basal resources, predation, and alter-
native prey in microcosm food chains. Ecology, 76(4):1327–1336.
Barthelme, S. and Chopin, N. (2014). Expectation propagation for likelihood-free infer-
ence. Journal of the American Statistical Association, 109(505):315–333.
Beaumont, M. A., Zhang, W., and Balding, D. J. (2002). Approximate Bayesian compu-
tation in population genetics. Genetics, 162(4):2025–2035.
Becker, N. G. (1993). Parametric inference for epidemic models. Mathematical Bio-
sciences, 117(1):239 – 251.
Biedermann, S., Dette, H., and Pepelyshev, A. (2007). Optimal discrimination de-
signs for exponential regression models. Journal of Statistical Planning and Inference,
137(8):2579 – 2592. 5th St. Petersburg Workshop on Simulation.
Biedermann, S. and Woods, D. C. (2011). Optimal designs for generalized non-linear
models with application to second-harmonic generation experiments. Journal of the
Royal Statistical Society: Series C (Applied Statistics), 60(2):281–299.
Blum, M. G. B., Nunes, M. A., Prangle, D., and Sisson, S. A. (2013). A comparative
review of dimension reduction methods in approximate Bayesian computation. Statist.
Sci., 28(2):189–208.
Bonsall, M. B. and Hassell, M. P. (2005). Understanding ecological concepts: The role
of laboratory systems. In Population Dynamics and Laboratory Ecology, volume 37 of
Advances in Ecological Research, pages 1 – 36. Academic Press.
Borth, D. M. (1975). A total entropy criterion for the dual problem of model discrim-
ination and parameter estimation. Journal of the Royal Statistical Society. Series B
(Methodological), 37(1):77–87.
Bouma, A., Elbers, A., Dekker, A., de Koeijer, A., Bartels, C., Vellema, P., van der Wal,
P., van Rooij, E., Pluimers, F., and de Jong, M. (2003). The foot-and-mouth disease
epidemic in the Netherlands in 2001. Preventive Veterinary Medicine, 57(3):155 – 166.
Box, G. E. P. and Hill, W. J. (1967). Discrimination among mechanistic models. Tech-
nometrics, 9(1):57–71.
Bradhurst, R. A., Roche, S. E., East, I. J., Kwan, P., and Garner, M. G. (2015). A hy-
brid modeling approach to simulating foot-and-mouth disease outbreaks in Australian
livestock. Frontiers in Environmental Science, 3:17.
Bravo de Rueda, C., de Jong, M. C., Eble, P. L., and Dekker, A. (2015). Quantification of
transmission of foot-and-mouth disease virus caused by an environment contaminated
with secretions and excretions from infected calves. Veterinary Research, 46(1):43.
Cavagnaro, D. R., Myung, J. I., Pitt, M. A., and Kujala, J. V. (2010). Adaptive de-
sign optimization: A mutual information-based approach to model discrimination in
cognitive science. Neural Computation, 22(4):887–905.
Christophe, D. and Petr, S. (2018). randtoolbox: Generating and Testing Random Num-
bers. R package version 1.17.1.
Clyde, M. and Chaloner, K. (1996). The equivalence of constrained and weighted designs
in multiple objective design problems. Journal of the American Statistical Association,
91(435):1236–1244.
Cook, A. R., Gibson, G. J., and Gilligan, C. A. (2008). Optimal observation times in
experimental epidemic processes. Biometrics, 64(3):860–868.
Dehideniya, M. B., Drovandi, C. C., and McGree, J. M. (2018a). Dual purpose Bayesian design for parameter estimation and model discrimination in epidemiology using a synthetic likelihood approach. Technical report. URL: https://eprints.qut.edu.au/118569/.
Dehideniya, M. B., Drovandi, C. C., and McGree, J. M. (2018b). Optimal Bayesian
design for discriminating between models with intractable likelihoods in epidemiology.
Computational Statistics & Data Analysis, 124:277 – 297.
Denman, N., McGree, J., Eccleston, J., and Duffull, S. (2011). Design of experiments for
bivariate binary responses modelled by copula functions. Computational Statistics &
Data Analysis, 55(4):1509 – 1520.
Dror, H. A. and Steinberg, D. M. (2006). Robust experimental design for multivariate
generalized linear models. Technometrics, 48(4):520–529.
Drovandi, C. C., McGree, J. M., and Pettitt, A. N. (2013). Sequential Monte Carlo for
Bayesian sequentially designed experiments for discrete data. Computational Statistics
& Data Analysis, 57(1):320 – 335.
Drovandi, C. C., McGree, J. M., and Pettitt, A. N. (2014). A sequential Monte Carlo
algorithm to incorporate model uncertainty in Bayesian sequential design. Journal of
Computational and Graphical Statistics, 23(1):3–24.
Drovandi, C. C. and Pettitt, A. N. (2013). Bayesian experimental design for models with
intractable likelihoods. Biometrics, 69(4):937–948.
Drovandi, C. C., Pettitt, A. N., and Faddy, M. J. (2011). Approximate Bayesian com-
putation using indirect inference. Journal of the Royal Statistical Society: Series C
(Applied Statistics), 60(3):317–337.
Drovandi, C. C., Pettitt, A. N., and Lee, A. (2015). Bayesian indirect inference using a
parametric auxiliary model. Statistical Science, 30(1):72–95.
Drovandi, C. C. and Tran, M.-N. (2018). Improving the efficiency of fully Bayesian
optimal design of experiments using randomised quasi-Monte Carlo. Bayesian Analysis,
13(1):139–162.
Faller, D., Klingmuller, U., and Timmer, J. (2003). Simulation methods for optimal
experimental design in systems biology. SIMULATION, 79(12):717–725.
Fasiolo, M. (2016). Statistical Methods for Complex Population Dynamics. PhD thesis,
University of Bath.
Fasiolo, M., Wood, S. N., Hartig, F., and Bravington, M. V. (2018). An extended empirical
saddlepoint approximation for intractable likelihoods. Electronic Journal of Statistics,
12(1):1544–1578.
Gallant, A. R. and McCulloch, R. E. (2009). On the determination of general scientific
models with application to asset pricing. Journal of the American Statistical Associa-
tion, 104(485):117–131.
Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. The
Journal of Physical Chemistry, 81(25):2340–2361.
Gillespie, D. T. (2001). Approximate accelerated stochastic simulation of chemically
reacting systems. The Journal of Chemical Physics, 115(4):1716–1733.
Goos, P. and Jones, B. (2011). An Optimal Screening Experiment, chapter 2, pages 9–45.
John Wiley & Sons, Ltd.
Gotwalt, C. M., Jones, B. A., and Steinberg, D. M. (2009). Fast computation of designs
robust to parameter uncertainty for nonlinear settings. Technometrics, 51(1):88–95.
Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and
Bayesian model determination. Biometrika, 82(4):711–732.
Grelaud, A., Robert, C. P., Marin, J.-M., Rodolphe, F., and Taly, J.-F. (2009). ABC
likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis,
4(2):317–335.
Hainy, M., Muller, W. G., and Wagner, H. (2014). Likelihood-free simulation-based
optimal design: An Introduction. In Melas, V., Mignani, S., Monari, P., and Salmaso,
L., editors, Topics in Statistical Simulation, pages 271–278, New York, NY. Springer
New York.
Hainy, M., Muller, W. G., and Wagner, H. (2016). Likelihood-free simulation-based opti-
mal design with an application to spatial extremes. Stochastic Environmental Research
and Risk Assessment, 30(2):481–492.
Hainy, M., Muller, W. G., and Wynn, H. P. (2013). Approximate Bayesian computation
design (ABCD), an Introduction. In Ucinski, D., Atkinson, A. C., and Patan, M.,
editors, mODa 10 – Advances in Model-Oriented Design and Analysis, pages 135–143,
Heidelberg. Springer International Publishing.
Hainy, M., Price, D. J., Restif, O., and Drovandi, C. (2018). Optimal Bayesian design for
model discrimination via classification. arXiv preprint arXiv:1809.05301.
Hassell, M. P., Lawton, J. H., and May, R. M. (1976). Patterns of dynamical behaviour
in single-species populations. Journal of Animal Ecology, 45(2):471–486.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their
applications. Biometrika, 57(1):97–109.
Haydon, D. T., Kao, R. R., and Kitching, R. P. (2004). The UK foot-and-mouth disease
outbreak - the aftermath. Nature Reviews Microbiology, 2(8):675–681.
Hill, W. J., Hunter, W. G., and Wichern, D. W. (1968). A joint design criterion for
the dual problem of model discrimination and parameter estimation. Technometrics,
10(1):145–160.
Hu, B., Gonzales, J. L., and Gubbins, S. (2017). Bayesian inference of epidemiological
parameters from transmission experiments. Scientific Reports, 7(1):16774.
Kelley, C. (1999). Iterative Methods for Optimization. SIAM.
Kim, K. I. and Lin, Z. (2008). Asymptotic behavior of an SEI epidemic model with
diffusion. Mathematical and Computer Modelling, 47(11):1314–1322.
Kleczkowski, A., Bailey, D. J., and Gilligan, C. A. (1996). Dynamically generated variabil-
ity in plant-pathogen systems with biological control. Proceedings of the Royal Society
of London B: Biological Sciences, 263(1371):777–783.
Knight-Jones, T. and Rushton, J. (2013). The economic impacts of foot and mouth disease
- what are they, how big are they and where do they occur? Preventive Veterinary
Medicine, 112(3):161 – 173.
Konstantinou, M., Biedermann, S., and Kimber, A. C. (2015). Optimal designs for full
and partial likelihood information — with application to survival models. Journal of
Statistical Planning and Inference, 165:27 – 37.
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. The Annals of
Mathematical Statistics, 22(1):79–86.
Lawler, S. P. (1993). Direct and indirect effects in microcosm communities of protists.
Oecologia, 93(2):184–190.
Leclerc, M., Dore, T., Gilligan, C. A., Lucas, P., and Filipe, J. A. N. (2014). Estimat-
ing the delay between host infection and disease (incubation period) and assessing its
significance to the epidemiology of plant diseases. PLOS ONE, 9(1):1–15.
Lee, X. J., Drovandi, C. C., and Pettitt, A. N. (2015). Model choice problems using
approximate Bayesian computation with applications to pathogen transmission data
sets. Biometrics, 71(1):198–207.
Lindley, D. V., Barndorff-Nielsen, O., Elfving, G., Harsaae, E., Thorburn, D., Hald, A.,
and Spjotvoll, E. (1978). The Bayesian approach [with discussion and reply]. Scandi-
navian Journal of Statistics, 5(1):1–26.
Long, Q., Scavino, M., Tempone, R., and Wang, S. (2013). Fast estimation of expected
information gains for Bayesian experimental designs based on Laplace approximations.
Computer Methods in Applied Mechanics and Engineering, 259:24 – 39.
Lopez-Fidalgo, J., Tommasi, C., and Trandafir, P. C. (2007). An optimal experimental
design criterion for discriminating between non-normal models. Journal of the Royal
Statistical Society: Series B (Statistical Methodology), 69(2):231–242.
Luckinbill, L. S. (1973). Coexistence in laboratory populations of paramecium aurelia
and its predator didinium nasutum. Ecology, 54(6):1320–1327.
Marjoram, P., Molitor, J., Plagnol, V., and Tavare, S. (2003). Markov chain Monte Carlo
without likelihoods. Proceedings of the National Academy of Sciences, 100(26):15324–
15328.
McGree, J. M. and Eccleston, J. A. (2010). Investigating design for survival models. Metrika, 72(3):295–311.
McGree, J. M. (2017). Developments of the total entropy utility function for the dual
purpose of model discrimination and parameter estimation in Bayesian design. Com-
putational Statistics and Data Analysis, 113:207–225.
McGree, J. M., Drovandi, C. C., Thompson, M. H., Eccleston, J. A., Duffull, S. B.,
Mengersen, K., Pettitt, A. N., and Goggin, T. (2012). Adaptive Bayesian compound de-
signs for dose finding studies. Journal of Statistical Planning and Inference, 142(6):1480
– 1492.
McGree, J. M., Drovandi, C. C., White, G., and Pettitt, A. N. (2016). A pseudo-marginal
sequential Monte Carlo algorithm for random effects models in Bayesian sequential
design. Statistics and Computing, 26(5):1121–1136.
McGree, J. M. and Eccleston, J. A. (2008). Probability-based optimal design. Australian
& New Zealand Journal of Statistics, 50(1):13–28.
McGree, J. M. and Eccleston, J. A. (2012). Robust designs for Poisson regression models. Technometrics, 54(1):64–72.
McGree, J. M., Eccleston, J. A., and Duffull, S. B. (2008). Compound optimal design
criteria for nonlinear models. Journal of Biopharmaceutical Statistics, 18(4):646–661.
McKeeman, W. M. (1962). Algorithm 145: Adaptive numerical integration by Simpson’s
rule. Communications of the ACM, 5(12):604.
Meyer, R. K. and Nachtsheim, C. J. (1995). The coordinate-exchange algorithm for
constructing exact optimal experimental designs. Technometrics, 37(1):60–69.
Montgomery, D. C. (2017). Design and Analysis of Experiments. John Wiley & Sons, 9
edition.
Morse, P. M. (1955). Stochastic properties of waiting lines. Journal of the Operations
Research Society of America, 3(3):255–261.
Muller, P. (1999). Simulation-based optimal design. Bayesian Statistics 6, pages 459–474.
Muller, W. G. and Ponce de Leon, A. C. M. (1996). Optimal design of an experiment in
economics. The Economic Journal, 106(434):122–127.
Muroga, N., Hayama, Y., Yamamoto, T., Kurogi, A., Tsuda, T., and Tsutsui, T. (2012).
The 2010 foot-and-mouth disease epidemic in Japan. Journal of Veterinary Medical
Science, 74(4):399–404.
Ng, S. H. and Chick, S. E. (2004). Design of follow-up experiments for improving model
discrimination and parameter estimation. Naval Research Logistics (NRL), 51(8):1129–
1148.
Nicholson, A. J. (1954). An outline of the dynamics of animal populations. Australian
Journal of Zoology, 2(1):9–65.
Ong, V. M. H., Nott, D. J., Tran, M.-N., Sisson, S. A., and Drovandi, C. C. (2018). Variational Bayes with synthetic likelihood. Statistics and Computing, 28(4):971–988.
Orsel, K., de Jong, M., Bouma, A., Stegeman, J., and Dekker, A. (2007). The effect of
vaccination on foot and mouth disease virus transmission among dairy cows. Vaccine,
25(2):327 – 335.
Otten, W., Filipe, J. A. N., Bailey, D. J., and Gilligan, C. A. (2003). Quantification and
analysis of transmission rates for soilborne epidemics. Ecology, 84(12):3232–3239.
Overstall, A. M. and McGree, J. M. (2018). Bayesian design of experiments for intractable
likelihood models using coupled auxiliary models and multivariate emulation. Bayesian
Analysis.
Overstall, A. M., McGree, J. M., and Drovandi, C. C. (2018). An approach for finding
fully Bayesian optimal designs using normal-based approximations to loss functions.
Statistics and Computing, 28(2):343–358.
Overstall, A. M. and Woods, D. C. (2017). Bayesian design of experiments using approx-
imate coordinate exchange. Technometrics, 59(4):458–470.
Overstall, A. M., Woods, D. C., and Adamou, M. (2017a). acebayes: An R package for
Bayesian optimal design of experiments via approximate coordinate exchange. arXiv
preprint arXiv:1705.08096.
Overstall, A. M., Woods, D. C., and Adamou, M. (2017b). acebayes: Optimal Bayesian
Experimental Design using the ACE Algorithm. R package version 1.4.
Owen, A. B. (1997). Scrambled net variance for integrals of smooth functions. The Annals
of Statistics, 25(4):1541–1562.
Pagendam, D. and Pollett, P. (2013). Optimal design of experimental epidemics. Journal
of Statistical Planning and Inference, 143(3):563–572.
Palhazi Cuervo, D., Goos, P., and Sörensen, K. (2016). Optimal design of large-scale
screening experiments: a critical look at the coordinate-exchange algorithm. Statistics
and Computing, 26(1):15–28.
Parker, B. M., Gilmour, S., Schormans, J., and Maruri-Aguilar, H. (2015). Optimal design
of measurements on queueing systems. Queueing Systems, 79(3):365–390.
Perrone, E. and Müller, W. (2016). Optimal designs for copula models. Statistics,
50(4):917–929.
Ponce de Leon, A. C. and Atkinson, A. C. (1991). Optimum experimental design for dis-
criminating between two rival models in the presence of prior information. Biometrika,
78(3):601–608.
Price, D. J., Bean, N. G., Ross, J. V., and Tuke, J. (2016). On the efficient determi-
nation of optimal Bayesian experimental designs using ABC: A case study in optimal
observation of epidemics. Journal of Statistical Planning and Inference, 172:1–15.
Price, D. J., Bean, N. G., Ross, J. V., and Tuke, J. (2018a). Designing group dose-response
studies in the presence of transmission. Mathematical Biosciences, 304:62–78.
Price, D. J., Bean, N. G., Ross, J. V., and Tuke, J. (2018b). An induced natural selection
heuristic for finding optimal Bayesian experimental designs. Computational Statistics
& Data Analysis, 126:112–124.
Price, L. F., Drovandi, C. C., Lee, A., and Nott, D. J. (2018c). Bayesian synthetic
likelihood. Journal of Computational and Graphical Statistics, 27(1):1–11.
Rose, A. D. (2008). Bayesian experimental design for model discrimination. PhD thesis,
University of Southampton.
Ryan, C. M., Drovandi, C. C., and Pettitt, A. N. (2016a). Optimal Bayesian experimen-
tal design for models with intractable likelihoods using indirect inference applied to
biological process models. Bayesian Analysis, 11(3):857–883.
Ryan, E. G., Drovandi, C. C., McGree, J. M., and Pettitt, A. N. (2016b). A review of
modern computational algorithms for Bayesian optimal design. International Statistical
Review, 84(1):128–154.
Ryan, E. G., Drovandi, C. C., and Pettitt, A. N. (2015). Fully Bayesian experimental
design for pharmacokinetic studies. Entropy, 17(3):1063–1089.
Ryan, E. G., Drovandi, C. C., Thompson, M. H., and Pettitt, A. N. (2014). Towards
Bayesian experimental design for nonlinear models that require a large number of
sampling times. Computational Statistics & Data Analysis, 70:45–60.
Ryan, K. J. (2003). Estimating expected information gains for experimental designs with
application to the random fatigue-limit model. Journal of Computational and Graphical
Statistics, 12(3):585–603.
Sidje, R. B. (1998). Expokit: A software package for computing matrix exponentials.
ACM Transactions on Mathematical Software, 24(1):130–156.
Sisson, S. A., Fan, Y., and Tanaka, M. M. (2007). Sequential Monte Carlo without
likelihoods. Proceedings of the National Academy of Sciences, 104(6):1760–1765.
Sölkner, J. (1993). Choice of optimality criteria for the design of crossbreeding experi-
ments. Journal of Animal Science, 71(11):2867–2873.
Stenfeldt, C., Pacheco, J. M., Brito, B. P., Moreno-Torres, K. I., Branan, M. A., Delgado,
A. H., Rodriguez, L. L., and Arzt, J. (2016). Transmission of foot-and-mouth disease
virus during the incubation period in pigs. Frontiers in Veterinary Science, 3:105.
Stroud, J. R., Müller, P., and Rosner, G. L. (2001). Optimal sampling times in population
pharmacokinetic studies. Journal of the Royal Statistical Society: Series C (Applied
Statistics), 50(3):345–359.
Tommasi, C. (2009). Optimal designs for both model discrimination and parameter
estimation. Journal of Statistical Planning and Inference, 139(12):4123–4132.
Tommasi, C. and Lopez-Fidalgo, J. (2010). Bayesian optimum designs for discriminat-
ing between models with any distribution. Computational Statistics & Data Analysis,
54(1):143–150.
van der Goot, J. A., Koch, G., de Jong, M. C. M., and van Boven, M. (2005). Quantifica-
tion of the effect of vaccination on transmission of avian influenza (H7N7) in chickens.
Proceedings of the National Academy of Sciences, 102(50):18141–18146.
Waterhouse, T., Woods, D., Eccleston, J., and Lewis, S. (2008). Design selection criteria
for discrimination/estimation for nested models and a binomial response. Journal of
Statistical Planning and Inference, 138(1):132–144.
Weir, C. J., Spiegelhalter, D. J., and Grieve, A. P. (2007). Flexible design and efficient im-
plementation of adaptive dose-finding studies. Journal of Biopharmaceutical Statistics,
17(6):1033–1050.
Wood, S. N. (2010). Statistical inference for noisy nonlinear ecological dynamic systems.
Nature, 466(7310):1102–1104.
Woods, D. C., Lewis, S. M., Eccleston, J. A., and Russell, K. G. (2006). Designs for
generalized linear models with several variables and model uncertainty. Technometrics,
48(2):284–292.
Woods, D. C., McGree, J. M., and Lewis, S. M. (2017). Model selection via Bayesian
information capacity designs for generalised linear models. Computational Statistics &
Data Analysis, 113:226–238.
Wu, H.-P. and Stufken, J. (2014). Locally Φp-optimal designs for generalized linear
models with a single-variable quadratic polynomial predictor. Biometrika, 101(2):365–
375.
Zhang, J. F., Papanikolaou, N. E., Kypraios, T., and Drovandi, C. C. (2018). Optimal
experimental design for predator–prey functional response experiments. Journal of The
Royal Society Interface, 15(144).