
Optimal Bayesian experimental designs for complex models

Mahasen Bandara Dehideniya
Bachelor of Science (Statistics with Computer Science), University of Colombo

under the supervision of

Principal Supervisor: AProf James M. McGree
Associate Supervisor: AProf Christopher C. Drovandi

Mathematical Sciences
Faculty of Science and Engineering
Queensland University of Technology
2019

Submitted in fulfilment of the requirements of the degree of Doctor of Philosophy


Abstract

Experimental design methods are used in areas such as epidemiology, systems biology and ecology to collect informative data. The increasingly complex systems in these areas mean that realistic statistical models, with hidden states or auxiliary variables potentially arranged in hierarchies, are required to describe the observed data. This in turn imposes significant challenges in experimental design, as the informativeness of a given design is typically assessed based on a candidate model or a set of competing models given previously collected or expert-elicited data. Thus, design and inference are naturally linked, so as more complex models are developed and used in real-world problems, new methods in design also need to be considered.

The problem of designing efficient experiments has been addressed in both the frequentist and Bayesian literature. In contrast with the frequentist approach, Bayesian methods for designing experiments provide a mathematically rigorous framework to handle uncertainty about, for example, the parameter values and the data generating model. However, Bayesian designs are much more computationally expensive to evaluate than frequentist designs, even for simple models, as they require a large number of posterior evaluations. Consequently, Bayesian designs have been limited to relatively simple models, and future research is needed for complex models to address real-world design problems in areas such as epidemiology, ecology and pharmacology.

The work considered in this thesis is motivated by experiments in epidemiology. Unfortunately, most epidemiological models have computationally expensive or computationally intractable likelihoods, meaning that evaluating the likelihood many times is computationally infeasible. In recent years, methods that facilitate Bayesian inference, at least approximately, in such settings have become well established (i.e. approximate Bayesian computation; ABC) and are used in many areas including epidemiology. In contrast, Bayesian design in such settings has received little attention. To address this gap in the literature, in this thesis we propose new methodologies to design experiments for discriminating between competing models using various adaptations and developments of ABC methods.

In the presence of uncertainty about the model and parameters, conducting experiments to unravel a single source of uncertainty is not efficient due to the high cost of conducting experiments, particularly in epidemiology. However, dual-purpose experiments in epidemiology have not previously been considered due to a lack of methods for efficiently approximating the utility functions which are used to quantitatively evaluate designs. Thus, the developments in approximate Bayesian inference proposed in this thesis will address this need. That is, we will demonstrate that our proposed methods can be used to locate dual-purpose designs in settings where the likelihood is computationally intractable.


Of the little research that has been conducted in Bayesian design for models with intractable likelihoods, all approaches have been limited to designs with a small number of dimensions (around 4). Unfortunately, realistically sized experiments require many more design dimensions to be considered, rendering current approaches inapplicable. Consequently, our developed computational methods will enable designs to be found in high dimensions, and this will be demonstrated for a variety of utility functions, including those for parameter estimation, model discrimination and dual-purpose experiments. The proposed methodological developments in this thesis enable the derivation of efficient experiments to understand biological processes in epidemiology and ecology. Primarily, the proposed methods can be used to understand how a disease spreads through large-scale agricultural fields or livestock. Such an understanding can lead to the development of appropriate and informed measures for early detection and control to prevent large-scale spread of the disease. In addition, these methods can be used to find efficient experiments in ecology to understand biological phenomena such as predator-prey interactions, which can inform the development of policies for sustainable environments and the protection of endangered species.


Declaration

I hereby declare that this submission is my own work and to the best of my knowledge it contains no material previously published or written by another person, nor material which to a substantial extent has been accepted for the award of any other degree or diploma at QUT or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by colleagues, with whom I have worked at QUT or elsewhere, during my candidature, is fully acknowledged.

I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.


QUT Verified Signature
05/07/2019


Acknowledgements

First and foremost, I would like to thank my supervisors for giving me the opportunity to complete my PhD thesis under their supervision. I would like to express my deepest and sincere gratitude to my principal supervisor, Associate Professor James McGree, for his invaluable advice and patience in guiding me at each step of my PhD research journey. I would like to thank Associate Professor Christopher Drovandi for his continuous support and guidance throughout this work. It was a great privilege and honour to learn and develop my research skills under their supervision.

This research would never have been possible without the support of various people at QUT, Brisbane. Firstly, I am grateful for the financial support provided by QUT to cover my living expenses in Brisbane and the tuition fees to complete my PhD. I would also like to thank the staff members of the high-performance computing facility at QUT for their support in conducting my computationally intensive simulation studies. I extend my thanks to my BRAG and ACEMS friends for their constant encouragement and genuine support throughout my PhD life.

I offer my sincerest thanks to the University of Peradeniya for providing study leave and allowing me to continue my PhD at QUT. Furthermore, my special thanks go to the staff of the Department of Statistics and Computer Science at the University of Peradeniya for their tremendous support and guidance.

I am incredibly grateful to my parents for their continuous love and guidance throughout my life. I also express my thanks to my beloved sister and brother-in-law for their support. My heartfelt thanks go to my partner, Arosha, for her endless love and patience. I would also like to thank all my friends in Brisbane for their great friendship, which made me feel at home during the last four years.


List of Publications Arising from this Thesis

Chapter 3: Dehideniya M. B., Drovandi C. C., and McGree J. M. (2018). Optimal Bayesian design for discriminating between models with intractable likelihoods in epidemiology, Computational Statistics & Data Analysis, 124: 277-297.

Chapter 4: Dehideniya M. B., Drovandi C. C., and McGree J. M. Dual purpose Bayesian design for parameter estimation and model discrimination in epidemiology using a synthetic likelihood approach, Bayesian Analysis (Submitted for publication).

Chapter 5: Dehideniya M. B., Drovandi C. C., Overstall A. M., and McGree J. M. A synthetic likelihood-based Laplace approximation for efficient design of biological processes, Electronic Journal of Statistics (Submitted for publication).


Contents

Abstract
Declaration
Acknowledgements
List of Publications Arising from this Thesis

Chapter 1  Introduction
    1.1  Motivation
        1.1.1  Research aim and objectives
    1.2  Research contribution
        1.2.1  Contribution to methodology
        1.2.2  Contribution to application
    1.3  Research scope
    1.4  Thesis structure

Chapter 2  Literature Review
    2.1  Introduction
    2.2  Bayesian inference
        2.2.1  Markov chain Monte Carlo
        2.2.2  Importance sampling
        2.2.3  Laplace approximation
    2.3  Approximate Bayesian computation
        2.3.1  Synthetic likelihood
        2.3.2  Indirect inference
    2.4  Bayesian experimental designs
        2.4.1  Utility functions for parameter estimation
        2.4.2  Utility functions for model discrimination
        2.4.3  Utility functions for dual experimental goals
    2.5  Experimental designs for models with intractable likelihood
    2.6  Optimisation algorithm
    2.7  Conclusion

Chapter 3  Optimal Bayesian design for discriminating between models with intractable likelihoods in epidemiology
    3.1  Abstract
    3.2  Introduction
    3.3  Bayesian model choice
    3.4  Bayesian experimental design
        3.4.1  Utility function for parameter estimation
        3.4.2  Utility functions for model discrimination
    3.5  Approximate Bayesian computation (ABC) and utility estimation
        3.5.1  ABC for parameter estimation
        3.5.2  ABC for model choice
        3.5.3  Estimating the model discrimination utility functions
    3.6  Optimisation algorithm
        3.6.1  Refined coordinate exchange (RCE) algorithm
    3.7  Examples
        3.7.1  Example 1 - Designs for parameter estimation of a pharmacokinetic model
        3.7.2  Example 2 - Designs for model discrimination
        3.7.3  Example 3 - Designs for model discrimination
    3.8  Discussion

Chapter 4  Dual purpose Bayesian design for parameter estimation and model discrimination in epidemiology using a synthetic likelihood approach
    4.1  Abstract
    4.2  Introduction
    4.3  Inference framework
    4.4  Bayesian experimental designs
        4.4.1  Dual purpose utility function of parameter estimation and model discrimination
        4.4.2  Utility function for parameter estimation
        4.4.3  Utility function for model discrimination
    4.5  Examples
        4.5.1  Example 1 - Death and SI models
        4.5.2  Example 2 - SIR and SEIR models
    4.6  Discussion

Chapter 5  A synthetic likelihood-based Laplace approximation for efficient design of biological processes
    5.1  Abstract
    5.2  Introduction
    5.3  Inference framework
    5.4  Bayesian experimental designs
        5.4.1  Utility functions for parameter estimation
        5.4.2  Utility function for model discrimination
        5.4.3  Utility function for dual purpose experiments
        5.4.4  Optimisation algorithm
    5.5  Examples
        5.5.1  Example 1 - Dual purpose designs for the death and SI models
        5.5.2  Example 2 - Dual purpose designs for foot and mouth disease
        5.5.3  Example 3 - Design for parameter estimation of a predator-prey model
    5.6  Discussion

Chapter 6  Conclusion
    6.1  Summary
    6.2  Limitations and future work

Appendix A  Supplementary Material for Chapter 3: 'Optimal Bayesian design for discriminating between models with intractable likelihoods in epidemiology'
    A.1  Monte Carlo error of utility estimation in Example 2
    A.2  Performance of optimisation algorithms in locating three- and four-point designs for discriminating between Models 1 and 2
    A.3  Performance of the optimal designs
        A.3.1  Performance of the optimal designs for model discrimination - Model 2
        A.3.2  Performance of the optimal designs for model discrimination - Model 3
        A.3.3  Performance of the optimal designs for model discrimination - Model 4

Appendix B  Supplementary Material for Chapter 4: 'Dual purpose Bayesian design for parameter estimation and model discrimination in epidemiology using a synthetic likelihood approach'
    B.1  Derivation of the total entropy utility for static designs
        B.1.1  Expected change in HM
        B.1.2  Expected change in HP
        B.1.3  Expected change in HT
    B.2  Comparison between the synthetic likelihood approach and the ABC rejection method

Appendix C  Supplementary Material for Chapter 5: 'A synthetic likelihood-based Laplace approximation for efficient design of biological processes'
    C.1  Informativeness of summary statistics
        C.1.1  Death model
        C.1.2  SI model


List of Figures

3.1  Trace plots of U(d) at each iteration of the ACE and RCE algorithms in locating 15 optimal blood sampling times based on a discretised design space.

3.2  Prior predictive distributions of Model 1 (solid) and 2 (dashed). Here, dot-dashed and dotted lines represent the 2.5% and 97.5% prior prediction quantiles of Model 1 and 2, respectively.

3.3  Comparison of the estimated expected utility of the mutual information utility (first row), the Zero-One utility (second row) and the Ds-optimality utility (third row) using ABC likelihoods and actual likelihoods. In each plot, the y = x line indicates a perfect match of approximated and actual utility evaluations.

3.4  Trace plots of the expected utility for each run of the ACE-D and RCE algorithms in locating the optimal two-point design based on (a) the mutual information utility, (b) the Ds-optimality utility and (c) the Zero-One utility.

3.5  Empirical cumulative probabilities of the posterior model probability of Model 1 (true model) obtained for observations generated from Model 1 according to optimal designs for discriminating between Models 1 and 2, and random designs.

3.6  Empirical cumulative probabilities of the posterior model probability of Model 2 (true model) obtained for observations generated from Model 2 according to optimal designs for discriminating between Models 1 and 2, and random designs.

3.7  Empirical cumulative probabilities of the ABC posterior model probability of Model 1 (true model) obtained for observations generated from Model 1 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs.


4.1  Comparison of the estimated expected utility of the mutual information utility (first row), the total entropy utility (second row) and the KLD utility (third row) using synthetic and actual likelihoods. In each plot, the y = x line indicates a perfect match of approximated and actual utility evaluations.

4.2  The posterior model probability of the data generating model obtained for observations generated from (a) the death model and (b) the SI model according to optimal designs and random designs.

4.3  Log determinant of the posterior covariance of the parameters of (a) the death model and (b) the SI model obtained for observations generated from the corresponding model according to optimal designs and random designs.

4.4  The approximated posterior model probability of (a) the SIR model and (b) the SEIR model obtained for observations generated from the corresponding model according to optimal designs and random designs.

4.5  Log determinant of the approximated posterior covariance of the parameters of (a) the SIR model and (b) the SEIR model obtained for observations generated from the corresponding model according to optimal designs and random designs.

5.1  Comparison of the estimated expected utility of designs with 15 design points according to the (a) total entropy, (b) mutual information and (c) KLD utilities using the Laplace approximation based on synthetic and actual likelihoods. Here, designs with a biased estimate of the expected utility are represented by (×). In each plot, the y = x line indicates a perfect match of approximated and actual utility evaluations.

5.2  Prior predictive distributions of the number of infecteds based on the death model (solid) and the SI model (dashed) are given in sub-figure (a). Here, dot-dashed and dotted lines represent the 10% and 90% prior predictive quantiles of the death and SI model, respectively. Sub-figure (b) illustrates the optimal designs found under the total entropy utility (∗) along with the KLD utility (×) and mutual information utility (+) for the death and SI models.

5.3  The posterior model probability of the data generating model obtained for observations generated from (a) the death model and (b) the SI model according to optimal designs and an equally spaced design.


5.4  The log determinant of the inverse of the posterior variance-covariance matrix of the parameters of the data generating model when observations are generated from (a) the death model and (b) the SI model according to optimal designs and an equally spaced design.

5.5  Sub-figures (a) and (b) show the prior predictive distributions of infectious and recovered individuals based on the SIR (solid) and SEIR (dashed) models. In both figures, dotted and dot-dashed lines represent the 10% and 90% prior predictive quantiles based on the SIR and SEIR model, respectively. Optimal designs found under the total entropy utility (∗) along with the KLD utility (×) and mutual information utility (+) for the SIR and SEIR models are illustrated in sub-figure (c).

5.6  The posterior model probability of the data generating model obtained for observations generated from (a) the SIR model and (b) the SEIR model according to optimal designs and an equally spaced design.

5.7  The log determinant of the inverse of the posterior variance-covariance matrix of the parameters of the data generating model when observations are generated from (a) the SIR model and (b) the SEIR model according to optimal designs and an equally spaced design.

5.8  Prior predictive distributions of prey and predators are given in sub-figures (a) and (b), respectively. In both figures, dotted lines represent the 10% and 90% prior prediction quantiles of prey and predators. Plot (c) illustrates the optimal designs found under the KLD utility (+) and the NSEL utility (∗) for estimating the parameters of the modified LV model.

5.9  The log determinant of the inverse of the posterior variance-covariance matrix of the parameters of the data generating model when observations are generated from the modified LV model according to optimal designs and an equally spaced design.

A.1  Monte Carlo error of the estimated expected utility of the mutual information utility (first row), the Ds-optimality utility (second row) and the Zero-One utility (third row). In each plot, the solid line represents the Monte Carlo error associated with the estimated utility of the optimal design found under each utility, and dashed lines represent the Monte Carlo errors for randomly selected designs.

A.2  Empirical cumulative probabilities of the ABC posterior model probability of Model 2 (true model) obtained for observations generated from Model 2 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs.


A.3  Empirical cumulative probabilities of the ABC posterior model probability of Model 3 (true model) obtained for observations generated from Model 3 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs.

A.4  Empirical cumulative probabilities of the ABC posterior model probability of Model 4 (true model) obtained for observations generated from Model 4 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs.

B.1  Comparison of the accuracy of the estimated expected utility of the mutual information utility using the synthetic likelihood based on 1 × 10^6 model simulations (first column), and ABC-MC based on 1 × 10^6 (second column) and 2 × 10^6 (third column) model simulations. In each plot, the y = x line indicates a perfect match of approximated and actual utility evaluations.

C.1  Scatter plot between model parameter b and the summary statistics, (a) mean and (b) variance, of observations simulated from the death model according to a random design with 15 design points.

C.2  Scatter plot between model parameter b1 and the summary statistics, (a) mean and (b) variance, of observations simulated from the SI model according to a random design with 15 design points.

C.3  Scatter plot between model parameter b2 and the summary statistics, (a) mean and (b) variance, of observations simulated from the SI model according to a random design with 15 design points.


List of Tables

3.1  Performance of optimisation algorithms in locating 15-point designs for parameter estimation of the PK model.

3.2  Performance of optimisation algorithms in locating two-point designs for discriminating between Models 1 and 2 based on different utility functions.

3.3  Optimal designs for discriminating between Model 1 and Model 2 derived under different utility functions.

3.4  Utility of optimal designs for discriminating between Models 1, 2, 3 and 4 derived under different utility functions.

3.5  Utility of optimal designs for discriminating between Models 1, 2, 3 and 4 derived under the mutual information utility.

4.1  Optimal designs derived under different utility functions.

4.2  Optimal designs derived under different utility functions.

5.1  Expected utility values (standard deviation) of optimal designs derived under different utility functions.

5.2  Expected utility values (standard deviation) of optimal designs derived under different utility functions.

5.3  Expected utility values (standard deviation) of optimal designs derived under the KLD and NSEL utility functions.

A.1  Performance of optimisation algorithms in locating three- and four-point designs for discriminating between Models 1 and 2 based on different utility functions.


1 Introduction

1.1 Motivation

Understanding the mechanisms underpinning the dynamics of biological systems is important in many areas such as epidemiology, ecology and systems biology. For instance, in epidemiology, understanding the dynamics of an infectious disease spreading among a population of animals or plants is crucial for developing targeted strategies for detection, prevention and control. In these areas, experiments are used as one of the main methods of collecting data to discover new knowledge about the process of interest, and these experiments should be carefully designed to ensure the data to be obtained are as informative as possible. However, the increasing complexity of models for biological systems presents new challenges in design of experiments, motivating the need to develop modern statistical methods in design for real-world experiments in biology.

The problem of designing efficient experiments has been addressed in both the frequentist and Bayesian literature. In both approaches, optimal designs are selected based on a utility function defined to reflect the worth of data to be obtained from the experiment to achieve the intended experimental goals such as parameter estimation, model discrimination and/or prediction. In general, existing information from previous experiments and expert knowledge can be used to design efficient experiments. In contrast to the frequentist approach, Bayesian methods to design experiments provide a principled framework to incorporate prior information into the selection of optimal designs. This is achieved by defining a utility function based on the posterior distribution, which is a combination of prior information and information from the data (through the likelihood function). Therefore, Bayesian designs can yield data which provide information that is additional to what is already known. This is a significant advantage over the frequentist approach, where prior information is not typically included in the analysis.

In addition, uncertainty is most rigorously handled within a Bayesian framework. This enables the construction of important utility functions for learning about specific unknowns. For instance, the mutual information utility for model discrimination (Box and Hill, 1967) selects designs based on the posterior model probability. Such a utility function therefore captures (and handles) uncertainty across all outcomes within the model space. Another useful utility is the total entropy utility (Borth, 1975), which is constructed based on entropy (i.e. uncertainty) to design dual purpose experiments for parameter estimation and model discrimination. In the frequentist context, dual purpose utility functions are typically defined as a weighted sum of utility functions for each experimental goal, for example, the DT-optimality criterion (Atkinson, 2008) and the DKL-optimality criterion (Tommasi, 2009). Consequently, the experimenter must select suitable weights for each goal, which is not straightforward. In contrast, the total entropy utility does not require such a selection of weights, as these are determined by the uncertainty in the prior information.

Often, models used to describe biological processes have likelihood functions that are not available in closed form or are expensive to evaluate a large number of times. As a consequence, Bayesian design in these areas has been limited to low dimensions, a single model and utility functions for parameter estimation. However, in practice, the true biological process governing, for example, the spread of a disease within a population, is rarely known. Therefore, new methodologies are required to handle multiple competing models and to discriminate between models with intractable likelihoods.

Further, experiments with multiple goals such as parameter estimation and model discrimination have been proposed in the design literature as a way of reducing the number of experiments needed to learn about the process of interest. These dual purpose experiments are useful in the context of epidemiological experiments where the cost of conducting experiments is high, and the number of experiments is restricted by ethical concerns about using animals as experimental units. Currently, there are no existing methods to design such experiments for models with intractable likelihoods.

This thesis, therefore, aims to develop new Bayesian computational algorithms to advance the field of design of experiments and, through their application, advance applied areas such as epidemiology and ecology. This is detailed in the next section.

1.1.1 Research aim and objectives

The central aim of this thesis is to develop methods to design experiments for models with intractable likelihoods, with particular applications in epidemiology and ecology. This aim will be achieved by addressing the following objectives:

1. Develop a method to design experiments for discriminating between models with intractable likelihoods.

2. Develop a new optimisation algorithm for computationally expensive utility functions.

3. Develop a method to design dual purpose experiments for models with intractable likelihoods.


4. Extend Objective 3 to find high dimensional designs for models with intractable likelihoods.

1.2 Research contribution

Addressing each of the above objectives will provide methodological contributions to the field of Bayesian experimental design and to applied areas for experimenting in epidemiology and ecology. These specific contributions are outlined below.

1.2.1 Contribution to methodology

Objective 1 aims to develop and implement a computationally efficient algorithm to approximate three model discrimination utility functions for models with intractable likelihoods. The algorithm will be based on methods from approximate Bayesian computation (ABC), and will facilitate efficient approximations to posterior distributions. Such developments will provide the first approach to address model uncertainty in Bayesian design for models with intractable likelihoods.

In general, evaluating utility functions in Bayesian design is computationally expensive, and this becomes more computationally demanding for models with intractable likelihoods. This presents significant challenges when optimising the design, particularly in high dimensions. Therefore, to address Objective 2, we will extend the coordinate exchange algorithm such that it can be used to efficiently optimise expensive utility functions. As a result, optimal designs will be found with a relatively small number of utility evaluations, facilitating high dimensional designs to be found in a timely manner.

Despite the advantages of conducting dual purpose experiments, there has not been any methodological development to design such experiments for models with intractable likelihoods. We address this lack of methods through Objective 3 by developing an extension of the synthetic likelihood approach to handle discrete data. The proposed likelihood approximation facilitates the evaluation of a wide range of utility functions, including the total entropy utility for designing dual purpose experiments for models with intractable likelihoods.

For models with intractable likelihoods, limited approaches have been proposed for designing experiments to collect a reasonably large number of observations. Objective 4 focuses on developing a synthetic likelihood-based Laplace approximation to efficiently approximate posterior distributions. This extends the methodologies from Objective 3 to design high dimensional experiments for models with intractable likelihoods.

1.2.2 Contribution to application

The methods developed in this thesis are predominantly applied to design experiments in epidemiology to generate new knowledge about infectious diseases such as foot and mouth disease. Designing such experiments to learn about the appropriate model to describe the spread of an infectious disease enables the discovery of mechanisms that potentially promote, limit and/or initialise the spread of the infection. Such knowledge can therefore be used to develop targeted detection, prevention and control strategies. Further, the development of methods to design dual purpose experiments for model discrimination and parameter estimation reduces the number of experiments needed. This facilitates more ethical experimentation with animals.

1.3 Research scope

The scope of this research is developing new Bayesian methods to design experiments for models with intractable likelihoods as encountered in areas such as epidemiology and ecology. This is distinctly different from frequentist approaches for models with tractable likelihoods, which are well developed for, for example, linear models (Montgomery, 2017), generalised linear models (Biedermann and Woods, 2011, Dror and Steinberg, 2006, McGree and Eccleston, 2008, 2012, Woods et al., 2006, Wu and Stufken, 2014), multiple response models (Denman et al., 2011, Perrone and Muller, 2016) and survival models (Konstantinou et al., 2015, McGree, 2010).

1.4 Thesis structure

This thesis consists of one published and two submitted journal articles which are first-authored by the candidate. Since these chapters have been written as independent publications, there is some overlap between them. That is, each of these chapters (3 to 5) contains a relevant literature review and methodology section, but a more comprehensive literature review is provided in Chapter 2.

Chapter 3 presents the proposed methodological developments to design experiments for discriminating between models with intractable likelihoods using methods from ABC. Further, an extended coordinate exchange algorithm, called the Refined coordinate exchange algorithm, is proposed to reduce the computational burden of locating optimal designs for models with intractable likelihoods. This chapter addresses Objectives 1 and 2 of this thesis, and has appeared in Computational Statistics & Data Analysis.

Chapter 4 addresses Objective 3 and presents a novel approach to design dual purpose experiments for models with intractable likelihoods using an extension of the synthetic likelihood for discrete observations. This work is motivated by the application of designing dual purpose experiments to study foot and mouth disease, and has been submitted for publication in Bayesian Analysis.

Chapter 5 extends the methodologies from Chapter 4 to facilitate high dimensional design via a synthetic likelihood-based Laplace approximation to posterior inference. The proposed approach is validated through an illustrative example and also applied to find high dimensional designs for motivating examples in epidemiology and ecology. The developments presented in this chapter have been submitted for publication in the Electronic Journal of Statistics.

Finally, Chapter 6 summarises the key findings from Chapters 3 to 5. The limitations of this work are then discussed, and avenues for future research are proposed.


2 Literature Review

“To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.”

- Sir Ronald Aylmer Fisher

2.1 Introduction

The task of designing experiments is crucial in most scientific exploration (discovery). In conducting an experiment, the experimenter has the freedom to select values for some variables associated with the process of interest. These variables are referred to as control variables, and different values of these may result in variations of the response variable which is measured by the experimenter. Therefore, prior to conducting an experiment, the experimenter should decide the most appropriate values for the control variables such that the collected data are as informative as possible. The combination of values for all control variables is referred to as the design. The field of design of experiments provides a collection of statistical methods for selecting designs, and has been developed as one of the main branches of statistics.

In this thesis, we primarily consider the design of experiments in epidemiology to study the dynamics of the spread of a disease in a closed population over time. In epidemiological experiments, the virus or pathogen of interest is initially introduced to a population of plants (Bailey and Gilligan, 1999, Bailey et al., 2004, Kleczkowski et al., 1996, Leclerc et al., 2014, Otten et al., 2003) or animals (Backer et al., 2012, Bravo de Rueda et al., 2015, Hu et al., 2017, Orsel et al., 2007, van der Goot et al., 2005) in a controlled environment, and then the population of individuals is observed over time. Due to the high cost of collecting data, the population of individuals is only observed at some selected time points, where the disease state of each individual is identified as, for example, being susceptible, exposed, infectious, or recovered with respect to the disease. Let d = {t_1, t_2, . . . , t_n} denote a vector of n time points at which the experimenter desires to observe the spread of a disease among N individuals. Then, the number of individuals in each disease state at each observational time point is the outcome of the experiment. Further, we also consider experiments in ecology which are focused on investigating the interactions between two or more species populations to understand ecological phenomena such as predator-prey interactions (Luckinbill, 1973, Zhang et al., 2018). These ecological experiments start with an initially specified number of species from each population, and the experimenter observes the size of each population over time in a similar fashion to epidemiological experiments.

Continuous time Markov chain (CTMC) models can be used to model the data from these experiments, where such models typically describe the probability of individuals transitioning between different disease states.
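To make the simulation step concrete, the sketch below simulates an SI-type epidemic CTMC with the Gillespie algorithm and records the number of infectious individuals at a set of observation times d = {t_1, ..., t_n}. The function name, the density-dependent infection rate and the parameter values are illustrative assumptions rather than the exact models used later in the thesis.

```python
import numpy as np

def simulate_si_gillespie(beta, N, I0, obs_times, rng=None):
    """Simulate an SI epidemic CTMC with the Gillespie algorithm and return the
    number of infectious individuals at each requested observation time.
    The density-dependent infection rate beta * S * I / N is an illustrative choice."""
    rng = np.random.default_rng(rng)
    t, I = 0.0, I0
    counts = np.zeros(len(obs_times), dtype=int)
    next_obs = 0
    while next_obs < len(obs_times):
        S = N - I
        rate = beta * S * I / N                   # total rate of the next infection event
        if rate == 0.0:                           # no susceptibles remain; record remaining times
            counts[next_obs:] = I
            break
        t += rng.exponential(1.0 / rate)          # exponential waiting time to the next event
        while next_obs < len(obs_times) and obs_times[next_obs] < t:
            counts[next_obs] = I                  # record the state just before the event
            next_obs += 1
        I += 1                                    # one susceptible becomes infectious
    return counts

# Example: a design d = (1, 2, 4, 8, 16) days observing N = 50 individuals with I0 = 1.
y = simulate_si_gillespie(beta=0.5, N=50, I0=1, obs_times=[1.0, 2.0, 4.0, 8.0, 16.0], rng=1)
```

Likelihood-free methods such as ABC (Section 2.3) only require repeated calls to a simulator of this kind.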

The selection of the n observational time points plays a critical role in the informativeness of the experiment. In this thesis, we propose new methodologies to determine when the experiment should be observed in order to efficiently understand the underlying process that governs the spread of the disease. A Bayesian framework is adopted for inference and design throughout this thesis. Before providing background on Bayesian design of experiments, Bayesian inference methods for parameter estimation and model selection are described in Section 2.2.

2.2 Bayesian inference

In a Bayesian framework, the unknown model parameters θ are treated as random variables. The uncertainty about these parameters a priori is represented by a probability distribution p(θ), which is referred to as the prior distribution. Upon observing data y under design d, the prior distribution p(θ) and the likelihood of the observed data p(y|θ,d) are combined by Bayes' theorem as shown in Equation (2.1). The resulting distribution of θ is referred to as the posterior distribution,

    p(θ|y,d) = p(y|θ,d) p(θ) / p(y|d).    (2.1)

Here, the denominator, p(y|d) = ∫ p(y|θ,d) p(θ) dθ, is the normalising constant or the model evidence. For all but the simplest of models, p(y|d) cannot be evaluated analytically. However, approaches in Bayesian inference have been proposed that only require the evaluation of a quantity that is proportional to the posterior density or probability, and as such the posterior distribution can be expressed as follows:

    p(θ|y,d) ∝ p(y|θ,d) p(θ).    (2.2)
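As a simple illustration of Equation (2.2) (a standard conjugate example, not one from the thesis), suppose y is the number of infected individuals out of n independently exposed individuals, each infected with probability θ, and θ is given a Beta(a, b) prior. Then

    p(θ|y,d) ∝ θ^y (1 − θ)^{n−y} × θ^{a−1} (1 − θ)^{b−1} = θ^{y+a−1} (1 − θ)^{n−y+b−1},

which is recognised as a Beta(y + a, n − y + b) density without ever evaluating the normalising constant p(y|d).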

Often more than one model may be contemplated to describe the underlying process which generates the observations y. Thus, the selection of the most suitable model from a finite number of K candidate models, described by a random variable M ∈ {1, 2, . . . , K}, is of interest. Let model m contain parameters θ_m with a prior distribution p(θ_m|M = m) and prior model probability p(M = m), and let the likelihood function of each model m be given by p(y|θ_m,d). For ease of notation, M = m will be abbreviated by m throughout the rest of this thesis. Then, model choice is performed using the posterior model probability, which can be expressed as follows:

    p(m|y,d) = p(y|m,d) p(m) / Σ_{m=1}^{K} p(y|m,d) p(m),    (2.3)

where

    p(y|m,d) = ∫ p(y|θ_m,m,d) p(θ_m|m) dθ_m.

The following subsections describe computational methods for parameter estimation and model discrimination when the likelihood is available in a closed form.

2.2.1 Markov chain Monte Carlo

When the posterior of θ is only available up to a normalisation constant, as given in Equation (2.2), Markov chain Monte Carlo (MCMC) is the most commonly used method to sample from the target posterior of θ. MCMC is based on an iterative exploration of the parameter space by a Markov chain starting from an initial state θ_0. In each iteration, the state of the chain moves from the current state θ to a proposed state θ* via a proposal distribution q(·|·), according to the following acceptance probability:

    r = min{ [p(y|θ*,d) p(θ*) q(θ|θ*)] / [p(y|θ,d) p(θ) q(θ*|θ)], 1 }.    (2.4)

After a large number of iterations, the Markov chain reaches its limiting distribution, which represents the posterior distribution. The first B draws of the chain are discarded to remove the effect of the initial value of θ. Algorithm 2.1 below describes the basic MCMC algorithm for sampling from a posterior distribution.

1   Select an initial value θ_0.
2   for i = 1 to N do
3       Generate θ* ∼ q(·|θ_{i−1}).
4       Compute r = min{ [q(θ_{i−1}|θ*) p(y|θ*,d) p(θ*)] / [q(θ*|θ_{i−1}) p(y|θ_{i−1},d) p(θ_{i−1})], 1 }.
5       Generate u ∼ U(0, 1).
6       if u ≤ r then
7           Set θ_i = θ*.
8       else
9           Set θ_i = θ_{i−1}.
10      end
11  end

Algorithm 2.1: Metropolis-Hastings Algorithm (Hastings, 1970)
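A minimal Python sketch of Algorithm 2.1 is given below, using a Gaussian random-walk proposal so that the proposal densities cancel in the acceptance ratio. The functions log_likelihood(θ) and log_prior(θ) are assumed to be supplied by the user for the design d under consideration; this is an illustrative implementation, not the code used in the thesis.

```python
import numpy as np

def metropolis_hastings(log_likelihood, log_prior, theta0, n_iter, step_size, burn_in=0, rng=None):
    """Random-walk Metropolis-Hastings: a special case of Algorithm 2.1 in which
    the symmetric proposal q cancels from the acceptance ratio."""
    rng = np.random.default_rng(rng)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    log_post = log_likelihood(theta) + log_prior(theta)
    draws = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        proposal = theta + step_size * rng.standard_normal(theta.size)
        log_post_prop = log_likelihood(proposal) + log_prior(proposal)
        # Accept with probability min(1, posterior ratio), computed on the log scale.
        if np.log(rng.uniform()) <= log_post_prop - log_post:
            theta, log_post = proposal, log_post_prop
        draws[i] = theta
    return draws[burn_in:]  # discard the first B = burn_in draws
```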

Typically, the model evidence is an intractable integral even for models with analytically tractable likelihoods p(y|θ_m,d), and therefore the estimation of the posterior model probability (see Equation (2.3)) is a difficult task. The reversible jump Markov chain Monte Carlo (RJMCMC) sampler proposed by Green (1995) can be used to estimate p(m|y,d). The RJMCMC sampler draws samples from the joint distribution of the model parameters and the model indicator, (θ_m, m). Then, p(m|y,d) can be estimated by the proportion of iterations in which the sampler visited model m. Although, in principle, this algorithm is straightforward to implement, in practice it suffers from poor mixing, particularly across the model space. Thus, the use of this algorithm for estimating posterior model probabilities has been limited.

2.2.2 Importance sampling

Importance sampling is an alternative method for approximating a distribution that is difficult to sample from directly, such as a posterior distribution. Here, we describe importance sampling for estimating the expectation of a function h(X), where X ∼ f(·) and it is difficult to sample from f(·). Let g(·) be another distribution which is easy to sample from and has the same support as f(·); g(·) is called the importance distribution. Then, E[h(X)] can be expressed as

    E[h(X)] = ∫ h(x) [f(x)/g(x)] g(x) dx.

The ratio w(x) = f(x)/g(x) is referred to as the importance weight of each particle x. Then, the importance sampling estimate of E[h(X)] using a sample {x_i}_{i=1}^{N} drawn from g(·) is given by

    Ê[h(X)] = (1/N) Σ_{i=1}^{N} h(x_i) f(x_i)/g(x_i) = (1/N) Σ_{i=1}^{N} h(x_i) w(x_i).

When f(·) is only available up to a normalisation constant, normalised weights W(x_i) = w(x_i) / Σ_{i=1}^{N} w(x_i) should be used. This approach can be used to approximate a posterior distribution. Here, p(θ) can be considered as the importance distribution, and the ratio f(θ)/g(θ) simplifies to the likelihood p(y|θ,d). Thus, a sample {θ_i}_{i=1}^{N} drawn from the prior is weighted by the normalised likelihood weights W_i. The efficiency of the approximation can be assessed by the effective sample size (ESS), which can be approximated as

    ESS = 1 / Σ_{i=1}^{N} W_i^2.    (2.5)

The ESS can be interpreted as the number of independent samples from the posterior distribution to which the weighted sample is equivalent.
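The sketch below illustrates this prior-as-importance-distribution scheme: prior draws are weighted by normalised likelihood weights and the ESS of Equation (2.5) is reported. The function names sample_prior and log_likelihood are assumed to be user supplied; this is an illustrative sketch only.

```python
import numpy as np

def prior_importance_sampling(sample_prior, log_likelihood, n_particles, rng=None):
    """Weight prior draws by normalised likelihood weights (importance sampling
    with the prior as the importance distribution) and report the ESS of Equation (2.5)."""
    rng = np.random.default_rng(rng)
    particles = np.array([sample_prior(rng) for _ in range(n_particles)])
    log_w = np.array([log_likelihood(theta) for theta in particles])
    W = np.exp(log_w - log_w.max())      # subtract the maximum for numerical stability
    W /= W.sum()                         # normalised weights W_i
    ess = 1.0 / np.sum(W ** 2)           # effective sample size, Equation (2.5)
    posterior_mean = np.average(particles, axis=0, weights=W)
    return particles, W, ess, posterior_mean
```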

2.2.3 Laplace approximation

The Laplace approximation is a deterministic and computationally efficient approximation to the posterior distribution. Suppose we have observed data y under design d generated from model m with q parameters θ. Then, the Laplace approximation approximates the posterior distribution of θ via a multivariate normal distribution with mean θ* and covariance matrix H(θ*)^{-1}, where θ* is the posterior mode and H(θ*)^{-1} is the inverse of the Hessian matrix of the negative log posterior evaluated at θ*.

One advantage of using the Laplace approximation is the availability of an approximation to the model evidence, which can be used for model choice. The approximation is as follows:

    p(y|m,d) ≈ (2π)^{q/2} |H(θ*)^{-1}|^{1/2} p(y|θ*,m,d) p(θ*|m).    (2.6)
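A minimal sketch of the Laplace approximation is given below, assuming the user supplies the log of the unnormalised posterior, log p(y|θ,d) + log p(θ|m). The mode is found numerically, the Hessian is approximated by finite differences, and the log of Equation (2.6) is returned. All function names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def numerical_hessian(f, x, eps=1e-4):
    """Central finite-difference Hessian of f at x (an illustrative helper)."""
    q = x.size
    H = np.zeros((q, q))
    for i in range(q):
        for j in range(q):
            ei = np.eye(q)[i] * eps
            ej = np.eye(q)[j] * eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * eps ** 2)
    return H

def laplace_approximation(log_post, theta_init):
    """Approximate the posterior by N(theta*, H(theta*)^{-1}), where theta* is the
    posterior mode and H is the Hessian of the negative log posterior, and return
    the log model evidence of Equation (2.6).  'log_post' is the log of the
    unnormalised posterior, log p(y|theta,d) + log p(theta|m)."""
    res = minimize(lambda t: -log_post(t), np.atleast_1d(theta_init), method="Nelder-Mead")
    mode = res.x
    q = mode.size
    H = -numerical_hessian(log_post, mode)   # Hessian of the negative log posterior
    cov = np.linalg.inv(H)                   # covariance of the normal approximation
    log_evidence = 0.5 * q * np.log(2.0 * np.pi) - 0.5 * np.linalg.slogdet(H)[1] + log_post(mode)
    return mode, cov, log_evidence
```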

Most models considered in this thesis have likelihoods which are not available in closed form or are computationally expensive to evaluate a large number of times. Thus, in the following section, likelihood-free methods for inference are described.

2.3 Approximate Bayesian computation

For complex models, the likelihood may not be available in closed form or cannot be evaluated a large number of times. In the last two decades, approximate Bayesian computation (ABC) methods have been developed as an alternative approach to undertake Bayesian inference for these complex models, given that it is possible to simulate data from the model. Originally proposed in population genetics (Beaumont et al., 2002), such methods assume that, despite the likelihood not being available, it is relatively straightforward (and efficient) to sample from the likelihood. We note that this is certainly the case for CTMC models via the Gillespie algorithm (Gillespie, 1977). The simplest approach in ABC is ABC rejection. Here, the posterior distribution is approximated through the simulation of prior predictive data x, retaining the parameter values that generated data close to the observed data y, as determined by a discrepancy function. More specifically, a sample of N parameter values is drawn from the prior p(θ), and for each parameter value θ_i a dataset x_i is generated. Then, each x_i is compared with y using a discrepancy function ρ(x,y), and the parameter values which generate x with discrepancy less than a pre-defined threshold ε are kept to form the ABC posterior of θ. Thus, the ABC posterior of θ can be expressed as

    p_ABC(θ|y,d,ε) ∝ p(θ) ∫ p(x|θ,d) I(ρ(y,x|d) ≤ ε) dx,    (2.7)

where I(A) is the indicator function for event A. A sample from the ABC posterior p_ABC(θ|y,d,ε) can be obtained by ABC rejection as described in Algorithm 2.2.

1: Generate θ_i ∼ p(θ) for i = 1, ..., N
2: Generate x_i ∼ p(· | θ_i, d) for i = 1, ..., N
3: Compute discrepancies ρ_i = ρ(x_i, y | d) for i = 1, ..., N, creating particles {θ_i, ρ_i}_{i=1}^{N}
4: Sort the particle set according to the discrepancy ρ such that ρ_1 ≤ ρ_2 ≤ ... ≤ ρ_N
5: Determine ε = ρ_{⌊αN⌋} (where ⌊·⌋ denotes the floor function and 0 < α < 1)
6: Select the subset of particles {θ_i | ρ_i ≤ ε}, which gives the ABC posterior sample of θ

Algorithm 2.2: ABC rejection algorithm (Beaumont et al., 2002)
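A minimal sketch of Algorithm 2.2 follows (Python; the data simulator simulate(theta, d) and discrepancy rho(x, y) are hypothetical user-supplied functions), retaining the proportion α of particles with the smallest discrepancies.

```python
import numpy as np

def abc_rejection(prior_sample, simulate, rho, y, d, alpha=0.01):
    """ABC rejection (cf. Algorithm 2.2).

    prior_sample : array (N, q) of draws theta_i ~ p(theta)
    simulate     : function (theta, d) -> simulated dataset x   (hypothetical)
    rho          : function (x, y) -> scalar discrepancy        (hypothetical)
    alpha        : proportion of particles to keep (defines the tolerance eps)
    """
    discrepancies = np.array([rho(simulate(theta, d), y) for theta in prior_sample])
    order = np.argsort(discrepancies)                   # sort particles by discrepancy
    n_keep = int(np.floor(alpha * len(prior_sample)))   # eps = rho_{floor(alpha * N)}
    keep = order[:n_keep]
    eps = discrepancies[order[n_keep - 1]]
    return prior_sample[keep], eps                      # ABC posterior sample and tolerance
```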

The ABC rejection method suffers from the curse of dimensionality as the number of ob-

servations in y increases. Thus, various dimension reduction methods have been proposed

in the literature (Blum et al., 2013). Further, the chosen value of tolerance ε determines

the accuracy of the approximation. A smaller value of ε provides a more accurate ap-

proximation, but also results in a low acceptance rate. Consequently, a large number of simulations must be generated, at increased computational cost, to obtain a reasonably sized sample {θ_i | ρ_i ≤ ε} (see line 6 of Algorithm 2.2) to represent the posterior.

For K competing models with intractable likelihoods, the joint ABC posterior of model

indicator m and model parameters θm can be expressed as,

\[
p_{\mathrm{ABC}}(\theta_m, m|y,d,\varepsilon) \propto p(\theta_m|m)\, p(m) \int_{x} p(x|\theta_m,m,d)\, I(\rho(y,x|d) \leq \varepsilon)\, dx, \tag{2.8}
\]

where y are the observed data, and x are the simulated data from model m. Upon

sampling from pABC(θm,m|y,d, ε), the ABC posterior model probability of model m can


be approximated by counting the number of retained samples for which the model indicator equals m. The ABC model choice algorithm (Grelaud et al., 2009), described in Algorithm 2.3, can be used to approximate the posterior model probabilities of the candidate models.

1: Generate m_i ∼ p(m) for i = 1, ..., N
2: Generate θ_m^i ∼ p(· | m_i, d) for i = 1, ..., N
3: Generate x_i ∼ p(· | θ_m^i, d) for i = 1, ..., N
4: Compute discrepancies ρ_i = ρ(x_i, y | d) for i = 1, ..., N, creating particles {m_i, ρ_i}_{i=1}^{N}
5: Sort the particle set according to the discrepancy ρ such that ρ_1 ≤ ρ_2 ≤ ... ≤ ρ_N
6: Determine ε = ρ_{⌊αN⌋} (where ⌊·⌋ denotes the floor function and 0 < α < 1)
7: Select the subset of particles {m_i | ρ_i ≤ ε}

Algorithm 2.3: ABC algorithm for model choice (ABC-MC) (Grelaud et al., 2009)

Then, the ABC approximation to the posterior model probability of model m is given by,

\[
p(M = m|y,d) = \frac{1}{N_{\varepsilon}} \sum_{i=1}^{N_{\varepsilon}} I(m_i = m), \tag{2.9}
\]
where N_ε is the number of particles with discrepancy value no greater than ε (see line 7 of Algorithm 2.3).
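A minimal sketch of Algorithm 2.3 and Equation (2.9) follows (Python; the per-model prior sampler, simulator and discrepancy are again hypothetical user-supplied functions).

```python
import numpy as np

def abc_model_choice(prior_model_probs, sample_prior, simulate, rho, y, d,
                     N=100_000, alpha=0.01, rng=np.random.default_rng(1)):
    """ABC-MC (cf. Algorithm 2.3): approximate posterior model probabilities.

    prior_model_probs : array of length K with p(m)
    sample_prior      : function m -> draw theta_m ~ p(theta_m | m)   (hypothetical)
    simulate          : function (m, theta_m, d) -> dataset x         (hypothetical)
    rho               : function (x, y) -> scalar discrepancy         (hypothetical)
    """
    K = len(prior_model_probs)
    models = rng.choice(K, size=N, p=prior_model_probs)          # step 1
    disc = np.empty(N)
    for i, m in enumerate(models):
        theta = sample_prior(m)                                   # step 2
        disc[i] = rho(simulate(m, theta, d), y)                   # steps 3-4
    keep = np.argsort(disc)[: int(np.floor(alpha * N))]           # steps 5-7
    kept_models = models[keep]
    # Equation (2.9): proportion of retained particles from each model
    return np.array([(kept_models == m).mean() for m in range(K)])
```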

2.3.1 Synthetic likelihood

The synthetic likelihood (Wood, 2010) approach is another method of approximating

an intractable likelihood of observed data y for a given value of model parameters θ.

However, the synthetic likelihood approach is based on a parametric approximation to

the sampling distribution of the data or summary statistics. This is achieved by evaluating

summary statistics sobs = S(y) of observed data y, and assuming these summary statistics

follow a multivariate normal distribution with mean µ(θ) and variance-covariance Σ(θ).

Then, the log synthetic likelihood l_s(s_obs|θ) can be expressed as,
\[
l_s(s_{\mathrm{obs}}|\theta) = -\tfrac{1}{2}\,(s_{\mathrm{obs}} - \mu(\theta))^{T}\, \Sigma(\theta)^{-1}\, (s_{\mathrm{obs}} - \mu(\theta)) - \tfrac{1}{2}\log |\Sigma(\theta)|. \tag{2.10}
\]

In general, µ(θ) and Σ(θ) cannot be evaluated analytically for a given value of θ but

can be approximated by simulating n datasets from the model of interest with parameter θ, computing the summary statistics of each dataset, and evaluating the sample mean and variance-covariance matrix of these summary statistics. By substituting the estimated mean vector and variance-covariance matrix into Equation (2.10), l_s(s_obs|θ) can be approximated.
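This estimation step can be sketched as follows (Python; simulate(theta, d, rng) and the summary function S(x) are hypothetical placeholders for the model simulator and the chosen summaries).

```python
import numpy as np

def log_synthetic_likelihood(theta, s_obs, simulate, S, d, n=200,
                             rng=np.random.default_rng(0)):
    """Estimate the log synthetic likelihood of Equation (2.10) by simulation.

    simulate : function (theta, d, rng) -> simulated dataset x   (hypothetical)
    S        : function x -> vector of summary statistics        (hypothetical)
    """
    sims = np.array([S(simulate(theta, d, rng)) for _ in range(n)])
    mu_hat = sims.mean(axis=0)                            # estimated mean of the summaries
    Sigma_hat = np.atleast_2d(np.cov(sims, rowvar=False)) # estimated covariance of the summaries
    resid = np.atleast_1d(s_obs - mu_hat)
    sign, logdet = np.linalg.slogdet(Sigma_hat)
    return -0.5 * resid @ np.linalg.solve(Sigma_hat, resid) - 0.5 * logdet
```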

Originally, Wood (2010) used the synthetic likelihood to find the maximum likelihood

estimates of parameters in nonlinear models in ecology. Recently, Price et al. (2018c)


extended the synthetic likelihood to the Bayesian framework by incorporating parameter

uncertainty. Similarly, the synthetic likelihood can be used as an approximation to the

actual likelihood when using posterior approximations such as importance sampling and

variational Bayes methods (Ong et al., 2018).

The Gaussian diffusion approximation can be viewed as a deterministic counterpart of the synthetic likelihood approach in which the mean and variance are obtained analytically. Thus, the Gaussian

diffusion approximation is computationally less expensive than simulation-based methods

as summaries like the mean and variance can be found without simulation. Despite the

computational advantages, the applicability of this method is limited to density dependent

Markov chains (Pagendam and Pollett, 2013).

2.3.2 Indirect inference

In indirect inference (II), a model with a tractable likelihood, called an auxiliary model,

is used to describe data from a generative model with an intractable likelihood. Drovandi

et al. (2011) employed II within the ABC algorithm to summarise data by using the estimated parameters of the auxiliary model as summary statistics. Alternatively, the likelihood of

the generative model is approximated by the likelihood of the auxiliary model evaluated at

φ(θ) where θ is the parameter of the generative model, for example, Drovandi et al.

(2015), Gallant and McCulloch (2009). The relationship between the parameters of the

generative model and auxiliary model, represented by a mapping function, φ(θ), is found

by fitting the auxiliary model to data simulated from the generative model. The accuracy of this approach depends strongly on the availability of an auxiliary model that describes the data generated from the generative model well.

2.4 Bayesian experimental designs

The field of design of experiments provides a collection of methodologies to plan an

experiment that will yield informative data for statistical inference on the process of

interest. The informativeness of the data y to be obtained under a design d, with respect to the purpose of the experiment (such as parameter estimation, model selection and/or prediction), is measured by a utility function. We consider design within a Bayesian

framework due to the mathematically rigorous handling of uncertainty and the availability

of important utility functions such as those based on mutual information (discussed later).

Define a utility function as u(d,y,θ) where y and θ are unknown a priori. Therefore, the

utility function cannot be used directly to design experiments. To do so, the expectation

is taken with respect to all unknowns to form an expected utility that can be expressed

as follows:


\[
U(d) = \int_{y} \int_{\theta} u(d,y,\theta)\, p(y|\theta,d)\, p(\theta)\, d\theta\, dy. \tag{2.11}
\]

In the presence of model uncertainty, Equation (2.11) can be extended to incorporate

the uncertainty about each model given by the prior model probability p(m). Then, the

expected utility of design d is given by,

\[
U(d) = \sum_{m=1}^{K} p(m) \left\{ \int_{y} \int_{\theta_m} u(d,y,\theta_m,m)\, p(y|\theta_m,m,d)\, p(\theta_m|m)\, d\theta_m\, dy \right\}. \tag{2.12}
\]

The utility u(d,y,θm,m) can be defined according to the purpose of the experiment such

as estimation of parameters across all K competing models, discriminating between competing models, or the dual goal of parameter estimation and model discrimination. When

the utility function does not depend on the model parameters, Equation (2.12) can be

simplified to yield,

\[
U(d) = \sum_{m=1}^{K} p(m) \left\{ \int_{y} u(d,y,m)\, p(y|m,d)\, dy \right\}. \tag{2.13}
\]

Unfortunately, given the form of most utility functions, the above integral is generally

analytically intractable, and therefore needs to be estimated using numerical methods

such as Monte Carlo integration. This estimation can be expressed as follows:

\[
U(d) = \sum_{m=1}^{K} p(m)\, \frac{1}{N} \sum_{j=1}^{N} u(d, y_{j}^{m}, m), \tag{2.14}
\]

where y_j^m ∼ p(y|m,d) are independent draws from model m at the time points d. Ac-

cording to Equation (2.14), a single evaluation of U(d) requires K × N evaluations of

the utility function. Given the utility is typically a function of the posterior distribution,

evaluating this approximation is a computationally challenging task. Further challenges

occur in locating the optimal design which is defined as the design that maximises U(d)

over the design space.
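As a sketch of Equation (2.14), the following Python fragment estimates U(d) for a generic utility; the prior predictive sampler and the utility evaluator (which in this thesis would itself typically involve an ABC posterior approximation) are hypothetical arguments.

```python
import numpy as np

def expected_utility(d, prior_model_probs, sample_prior_predictive, utility, N=100):
    """Monte Carlo estimate of U(d) in Equation (2.14).

    sample_prior_predictive : function (m, d) -> dataset y ~ p(y|m,d)   (hypothetical)
    utility                 : function (d, y, m) -> scalar u(d, y, m)   (hypothetical)
    """
    U = 0.0
    for m, p_m in enumerate(prior_model_probs):
        draws = [sample_prior_predictive(m, d) for _ in range(N)]
        U += p_m * np.mean([utility(d, y, m) for y in draws])
    return U
```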

The following subsections describe the utility functions that have been used in the design literature for parameter estimation, for model discrimination, and for the dual purpose of parameter estimation and model discrimination.


2.4.1 Utility functions for parameter estimation

Design of experiments for efficient parameter estimation has received major attention in

both frequentist and Bayesian design literature. A number of criteria have been used in

the frequentist design literature as functions of the Fisher information matrix, such as

the D-optimality, A-optimality and Ds-optimality.

In the Bayesian literature, the Kullback-Leibler (KL) divergence between the prior and

posterior distributions of parameters has been widely used as a utility function, see

Drovandi et al. (2013), Ryan et al. (2014), Ryan (2003). For a dataset y to be obtained

under a design d, the KL divergence utility can be expressed as follows:

\[
u(d,y) = \int_{\theta} \log\!\left(\frac{p(\theta|y,d)}{p(\theta)}\right) p(\theta|y,d)\, d\theta. \tag{2.15}
\]

By applying Bayes theorem, this can be simplified as (Lindley, 1956),

\[
u(d,y) = \int_{\theta} \log p(y|\theta,d)\, p(\theta|y,d)\, d\theta - \log p(y|d), \tag{2.16}
\]

where p(y|d) is the marginal likelihood or model evidence.

As can be seen, in using the KL divergence utility, an estimate of the marginal likelihood

is required. Given this is typically difficult to achieve, alternative estimation utilities

have been adopted in the literature. For example, Ryan et al. (2014) used the inverse of

the determinant of the posterior covariance matrix of θ as a utility to derive designs for

parameter estimation and it can be expressed as follows:

\[
u(d,y) = 1/\det(\mathrm{Var}(\theta|y,d)). \tag{2.17}
\]

As pointed out by Overstall et al. (2018), the logarithm of Equation (2.17) would be more

appropriate to use as it is more closely related to the KL divergence utility.
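Given a posterior sample (for example, the particles retained by ABC rejection), this utility and its logarithm are straightforward to estimate; the sketch below assumes only that posterior_sample is an array of posterior draws of θ for a particular simulated dataset.

```python
import numpy as np

def log_precision_utility(posterior_sample):
    """log(1/det(Var(theta | y, d))), the logarithm of Equation (2.17).

    posterior_sample : array (n_draws, q) of posterior draws of theta
                       (e.g. the particles retained by ABC rejection)
    """
    cov = np.atleast_2d(np.cov(posterior_sample, rowvar=False))  # posterior covariance
    sign, logdet = np.linalg.slogdet(cov)
    return -logdet                                               # log(1/det) = -log det
```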

2.4.2 Utility functions for model discrimination

In the frequentist literature, the T-optimality criterion (Atkinson and Fedorov, 1975a,b)

has been used to design experiments to discriminate between an assumed true model and

one or several competing models with normal errors. Lopez-Fidalgo et al. (2007) proposed

a utility based on the KL divergence between the predictive distributions of the assumed true model and an alternative model, which can be used to discriminate between models of any form.

However, both T and KL optimal designs are based on the assumption of a true model, and

thus can be considered locally optimal. Ponce de Leon and Atkinson (1991) and Tommasi

and Lopez-Fidalgo (2010) extended the T-optimality and KL-optimality, respectively,


by incorporating the prior model probability of each model and the prior distribution of the model parameters.

The mutual information between the observed data and model indicator m has been

widely used as a discrimination utility to obtain fully Bayesian designs, for instance, Box

and Hill (1967), Cavagnaro et al. (2010), Drovandi et al. (2014). In contrast to the T and

KL optimality criteria, the mutual information utility is based on the posterior model

probability and does not require the specification of a true model. Further, the mutual

information utility is straightforward to extend to cases that involve many models, and

it is given by,

\[
u_{MI}(d,y,m) = \log p(m|y,d). \tag{2.18}
\]

The Zero-One utility selects the design which correctly classifies, on average, the true

model based on the posterior model probability (Overstall et al., 2018, Rose, 2008). It

can be defined as follows:

\[
u_{0\text{-}1}(d,y,m) =
\begin{cases}
1, & \text{if } \hat{m} = m,\\
0, & \text{otherwise},
\end{cases} \tag{2.19}
\]

where m̂ = arg max_{m ∈ M} p(m|y,d) and M is the set of rival models. An advantage of the

Zero-One utility over the mutual information utility for model discrimination is that in-

correct choices of different models can be penalised differently. For example, the selection

of a more complex model can be penalised more than the selection of an overly simplified

model.

Both the mutual information and Zero-One utilities for model discrimination are compu-

tationally expensive to evaluate. As an alternative, Ds-optimality has been used in the

frequentist design literature (Biedermann et al., 2007, Muller and Ponce de Leon, 1996),

and can be extended for use within a Bayesian framework as the posterior precision of

extra parameters of the most complex model among the nested models. This can be

expressed as follows:

\[
u_{Ds}(d,y) = \log(1/\det(\mathrm{Var}(\theta_s|y,d))), \tag{2.20}
\]

where θs is the set of extra parameters of the most complex model among the nested

models.

2.4.3 Utility functions for dual experimental goals

Often the purpose of experiments is to learn about both the model and parameters which

adequately describe the data generated from the process of interest. Thus, experiments

have been designed to achieve more than one goal simultaneously (Atkinson, 2008, Borth,


1975, Hill et al., 1968, McGree, 2017, McGree et al., 2008, Ng and Chick, 2004, Tommasi,

2009). The obvious advantage of such an experiment is the reduction of the cost of

conducting multiple experiments to achieve individual goals such as parameter estimation,

model discrimination and prediction.

A common approach to designing experiments with more than one experimental goal is to maximise a weighted product of (geometric mean) efficiencies under each experimental goal; examples include the DT-optimality criterion (Atkinson, 2008), the DKL-optimality criterion (Tommasi, 2009) and the DP-optimality criterion (McGree et al., 2008). However, in general, the selection of an appropriate weight for each experimental goal is not straightforward.

In the Bayesian design literature, Borth (1975) proposed the total entropy utility to design

dual-purpose experiments for parameter estimation and model discrimination. The total

entropy utility uses the additive property of entropy to combine the utilities for parame-

ter estimation and model discrimination. Consequently, the experimenter is not required

to select weights for each experimental goal. However, application of the total entropy

utility has been limited to simple models as evaluation of this utility is a computationally

intensive task. Recently, McGree (2017) addressed the computational difficulties in eval-

uating the total entropy utility using sequential Monte Carlo methods to find designs for

non-linear models for binary and count response data. However, this work was limited to

a discretised design space and to generalised linear and generalised nonlinear models.

2.5 Experimental designs for models with intractable likelihood

The evaluation of designs for models with intractable likelihoods, commonly found in

epidemiology, ecology and queue systems, is a challenging task. Consequently, only a few

attempts have been made in both the frequentist literature (Pagendam and Pollett, 2013,

Parker et al., 2015) and the Bayesian literature (Cook et al., 2008, Drovandi and Pettitt, 2013, Hainy et al., 2013, 2014, Overstall and McGree, 2018, Price et al., 2016, 2018a, Ryan et al., 2016a) to derive optimal designs for experiments to estimate parameters of

an assumed model.

In the frequentist literature, Pagendam and Pollett (2013) used a Gaussian diffusion

approximation in deriving D-optimal experimental designs to estimate parameters of epi-

demic models: the SI (susceptible-infected), SIS (susceptible-infected-susceptible) and SIR (susceptible-infected-recovered) models. However, the Gaussian diffusion approxi-

mation can only be applied for density dependent Markov chains. As noted by Pagendam

and Pollett (2013) this approximation only produces accurate results for large populations

and thus, cannot be used to design experiments where a small number of experimental

units are considered, such as experiments in veterinary epidemiology. Parker et al. (2015) found D- and Ds-optimal designs for parameter estimation of a queueing system. This work


was limited to a simple queueing model, the M/M/1 queue, where the likelihood function can be approximated using hyperbolic Bessel functions (Morse, 1955). Further, these designs are

not robust against parameter uncertainty and are thus termed locally optimal designs.

In the Bayesian framework, Cook et al. (2008) derived optimal observation times for pa-

rameter estimation of the SI model via the moment closure method to approximate the

likelihood. However, the moment closure may not be appropriate for complex models with

multiple sub-populations, such as the SIR model, as it is not straightforward to find an appro-

priate probability distribution to approximate the actual likelihood. Drovandi and Pettitt

(2013) used the ABC rejection method (Beaumont et al., 2002) to find optimal Bayesian designs for parameter estimation of Markov process models of epidemics and macroparasite

population evolution. In this approach, the computational expense of simulating a large

number of datasets was reduced by pre-simulating and storing datasets prior to the search for the optimal design using the Muller algorithm (Muller, 1999). The idea of using pre-simulated data in utility evaluations has since been employed in subsequent design papers by Price et al. (2016, 2018a) for parameter estimation of epidemiological models and Hainy

et al. (2016) for spatial extremes models.

Ryan et al. (2016a) used indirect inference to approximate intractable likelihoods in epidemiological models and to find optimal designs for parameter estimation. Indirect inference

methods approximate the likelihood of observed data (based on the generative model) via

the likelihood of an auxiliary model which is comparatively computationally less expen-

sive to evaluate. In contrast to the pre-simulation of data for each possible design point

(Drovandi and Pettitt, 2013), the indirect inference approach requires finding and storing a mapping function between the parameters of the generative model and the auxiliary model

based on data simulated according to the selected training design. Thus, this approach

can be used to search for optimal designs in a continuous design space, and it is straight-

forward to consider design problems with more design variables. However, this method

depends strongly on the adequacy of the auxiliary model, which may be difficult to find in general.

2.6 Optimisation algorithm

Finding the optimal design for an experiment requires maximising the expected utility

over all possible designs. An exhaustive search of the design space is computationally prohibitive except for design problems in which only a few candidate designs need to be considered.

For instance, Hainy et al. (2016) considered the selection of 3 weather stations from

39 stations to yield data for efficient parameter estimation for spatial extremes models.

Further, in the absence of an analytical expression for the expected utility U(d), the

numerical approximations result in a noisy utility surface on which standard optimisation algorithms may fail to find the optimal design.


The Muller algorithm (Muller, 1999) has been widely used to find Bayesian designs for

parameter estimation (Cook et al., 2008, Drovandi and Pettitt, 2013, Ryan et al., 2016a).

The Muller algorithm first samples from a joint distribution of parameters θ, data y and

design d using MCMC. Then, the optimal design is estimated as the multivariate mode

of the marginal distribution of d. The applicability of this method is limited to design

problems with only a few design dimensions due to the difficulty in estimating/finding

the multivariate mode. To overcome these computational issues, Ryan et al. (2015, 2014) used a low dimensional parametrisation of the design space based on a beta distribution when finding optimal sampling times for pharmacokinetic experiments. The sampling times are represented by the quantiles of a beta distribution, which is characterised by two parameters. Thus, the higher dimensional design space of sampling times reduces to a

two dimensional design space. However, the low dimensional parametrisation may result

in sub-optimal designs.

Alternatively, the Coordinate exchange (CE) algorithm of Meyer and Nachtsheim (1995)

can be used to find optimal designs. The CE algorithm iteratively optimises one design

variable (design point) at a time until no improvement can be achieved in the utility

function. However, as the number of candidate solutions for each design point increases,

the CE algorithm can require a large number of evaluations of the expected utility. Thus,

Overstall and Woods (2017) extended the CE algorithm by emulating the expected utility

in each design dimension and optimising the predicted value given by the emulator. In

general, this reduces the number of utility evaluations required to find the optimal design.

However, this algorithm may require a large number of iterations to locate the optimal

design for design problems with utility surfaces which may not be well approximated by

the Gaussian Process emulator.

Price et al. (2018b) proposed the Induced Natural Selection Heuristic (INSH) algorithm, a

nature-inspired population-based optimisation algorithm, to locate optimal designs. The

INSH algorithm starts with a randomly selected sample of designs (the initial population),

and evaluates the utility of each design using available parallel computing resources. Then,

a fixed number of best designs are selected, and a pre-specified number of designs are

selected around these best designs to form the next generation of designs. This process

then iterates for a pre-specified number of iterations, after which the best design among the

population is selected as the optimal design. The INSH algorithm can be used for any

utility surface without making any assumptions about its shape. However, this approach

may result in near-optimal designs and requires specifying a relatively large number of

tuning parameters.

2.7 Conclusion

In summary, the existing research on ABC and Bayesian experimental design has con-

sidered design problems in a low dimensional design space. Further, methodologies for

Bayesian design for discriminating between models with intractable likelihoods and dual


purpose experiments for these models do not exist in the current literature. In this thesis,

we thus propose new methods to address these gaps in the literature with a particular

focus on designing experiments in epidemiology.


3 Optimal Bayesian design for discriminating between models with intractable likelihoods in epidemiology



Statement for Authorship

This chapter has been written as a journal article. The authors listed below have certified

that:

(a) They meet the criteria for authorship as they have participated in the conception,

execution or interpretation of at least the part of the publication in their field of

expertise;

(b) They take public responsibility for their part of the publication, except for the

responsible author who accepts overall responsibility for the publication;

(c) There are no other authors of the publication according to these criteria;

(d) Potential conflicts of interest have been disclosed to granting bodies, the editor

or publisher of the journals of other publications and the head of the responsible

academic unit; and

(e) They agree to the use of the publication in the student’s thesis and its publication

on the Australian Digital Thesis database consistent with any limitations set by

publisher requirements.

The reference for the publication associated with this chapter is: Dehideniya M. B., Drovandi C. C., and McGree J. M. (2018). Optimal Bayesian design for discriminating

between models with intractable likelihoods in epidemiology, Computational Statistics &

Data Analysis, 124: 277-297.

Contributor: M. B. Dehideniya
Statement of contribution: Developed and implemented the statistical methods, wrote the manuscript, revised the manuscript as suggested by co-authors and reviewers.
Signature and date:

Contributor: J. M. McGree
Statement of contribution: Supervised research, assisted in interpreting results, critically reviewed manuscript.

Contributor: C. C. Drovandi
Statement of contribution: Initiated the research concept, supervised research, assisted in interpreting results, critically reviewed manuscript.

Principal Supervisor Confirmation: I have sighted email or other correspondence for all

co-authors confirming their authorship.

Name: James McGree    Signature: ______________    Date: 05/07/2019


3.1 Abstract

A methodology is proposed to derive Bayesian experimental designs for discriminating be-

tween rival epidemiological models with computationally intractable likelihoods. Methods

from approximate Bayesian computation are used to facilitate inference in this setting,

and an efficient implementation of this inference framework for approximating the expec-

tation of utility functions is proposed. Three utility functions for model discrimination

are considered, and the performance of each utility is explored in designing experiments for

discriminating between three epidemiological models: the death model, the Susceptible-

Infected model, and the Susceptible-Exposed-Infected model. The challenge of efficiently

locating optimal designs is addressed by an adaptation of the coordinate exchange algo-

rithm which exploits parallel computational architectures.

3.2 Introduction

Epidemiological studies are important for understanding how a disease is transmitted,

and for the development of preventative measures which might reduce or limit the spread

of the disease. Informative data collection is crucial in developing this understanding, and

can be achieved by conducting an experiment according to an optimal design that provides

the maximum amount of information to address the aim of the experiment which could

include model selection, parameter estimation and prediction. However, the derivation of

optimal designs in epidemiological experiments is a challenging task as most epidemiolog-

ical models contain likelihoods which are computationally expensive to evaluate (Becker,

1993). Consequently, only a few attempts have been made in both the frequentist lit-

erature (Pagendam and Pollett, 2013) and the Bayesian literature (Cook et al., 2008,

Drovandi and Pettitt, 2013) to derive optimal designs for experiments in epidemiology.

In the frequentist literature, the design of epidemiological experiments has been facil-

itated via an approximation to the likelihood. Pagendam and Pollett (2013) used a

Gaussian diffusion approximation in deriving D-optimal experimental designs to estimate

parameters of the SI (Susceptible-Infected), SIS (Susceptible-Infected-Susceptible) and

SIR (Susceptible-Infected-Recovered) epidemic models. The designs derived in this work

were dependent upon point estimates of the parameter values, and are thus termed locally

optimal designs. In contrast, the Bayesian approach provides a framework to account for

the uncertainty in parameters when deriving optimal designs (Ryan, 2003). This was

demonstrated in the work of Cook et al. (2008) who derived optimal observation times

for parameter estimation of the death model and the SI model. In their work, the moment

closure method was used to approximate the likelihood of the SI model.

Recent developments in approximate Bayesian computation (ABC) provide a compre-

hensive framework to undertake Bayesian inference and design when the likelihood is


intractable. Drovandi and Pettitt (2013) presented a likelihood-free method to derive

Bayesian designs for parameter estimation of Markov process models of epidemics and

macroparasite population evolution using the ABC rejection method (Beaumont et al.,

2002). In the work of Price et al. (2016), ABC rejection was used to approximate a utility

function based on the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951) in

designing experiments for parameter estimation of epidemiological models.

Previous work in the design of epidemiological experiments has focussed on estimating

model parameters of an assumed true model to describe the process of interest (Cook

et al., 2008, Drovandi and Pettitt, 2013, Pagendam and Pollett, 2013). However, in re-

ality, there may be uncertainty about the true epidemiological process (see Lee et al.

(2015)), and indeed, the purpose of the experiment could be to determine how a disease

spreads. Hence, the lack of knowledge about the true model should be taken into account

when designing efficient experiments. Thus, the need for the development of new meth-

ods to design efficient epidemiological experiments for model discrimination motivates the

work described in this article. Here, we consider the design problem of locating a set of

observation times which yields information to efficiently discriminate between compet-

ing models. Moreover, previous work on designing experiments for model discrimination

(Atkinson and Fedorov, 1975a, Cavagnaro et al., 2010, Drovandi et al., 2014, Overstall

et al., 2018, Woods et al., 2017) was limited to models where the likelihood can be easily

computed. Thus, this is the first paper to propose methods for finding Bayesian optimal

designs for discriminating between models with intractable likelihoods.

Finding the optimal design for an experiment requires the maximisation of an expected

utility over all possible designs, and it is a challenging optimisation problem because the

utility surface is noisy and may be relatively flat around its maximum. Further, it can

be computationally prohibitive to undertake the optimisation even for experiments with

a moderate number of design variables (see the review by Ryan et al. (2016b)). Muller

(1999) proposed a simulation-based approach that converts the optimisation problem to a

problem of sampling from a target distribution for which the mode is the optimal design.

First, samples are drawn from the target distribution h(θ,y,d) (joint distribution of the

parameters, data, and design) using Markov chain Monte Carlo (MCMC) simulations,

and then the estimated multivariate mode of the marginal distribution of d is deemed the

optimal design. The Muller algorithm has been widely used in the Bayesian experimental

design literature (Cook et al., 2008, Drovandi and Pettitt, 2013, Ryan et al., 2014, Stroud

et al., 2001). However, in practice, this method suffers from slow convergence. Moreover,

sampling from the joint distribution h(θ,y,d) using an MCMC method and determining

the multivariate mode for a large number of design variables are computationally expen-

sive tasks (Drovandi and Pettitt, 2013).


Alternatively, existing local search optimisation methods can be used to locate the optimal

design. For instance, the coordinate exchange (CE) algorithm of Meyer and Nachtsheim

(1995) has been used to find D-optimal designs in screening experiments by Goos and

Jones (2011), Palhazi Cuervo et al. (2016). Further, Gotwalt et al. (2009) used the coor-

dinate exchange algorithm in constructing pseudo-Bayesian optimal designs for parameter

estimation of non-linear models. The coordinate exchange algorithm starts from a given

initial design and iteratively maximises the utility function by changing one design vari-

able at a time while keeping all other variables fixed. This iterative procedure continues

until there is little or no improvement in the value of the utility. In practice, this may

require a large number of utility evaluations, especially when continuous design variables

are involved in the experiment. Recent work of Overstall and Woods (2017) extends

the idea of the coordinate exchange algorithm by considering an approximation of the

expected utility as a function of a single design variable conditional on the remaining

fixed variables. This approximation is facilitated by fitting a Gaussian process emulator

based on a relatively small number of utility evaluations. This emulator is then used to

approximate the utility function across the entire range of the considered variable and to

estimate the maximum at each iteration.

In this work, evaluating the approximate utility of a given design is computationally demanding

as it requires a large number of simulations from the model in order to approximate a

posterior distribution via ABC methods. Consequently, finding Bayesian optimal designs

for models with intractable likelihoods in a continuous design space could be computa-

tionally prohibitive. However, the use of a discrete design space to locate optimal designs

significantly reduces the required computational effort as it allows the use of pre-simulated

data for the posterior approximations in utility evaluations (discussed later). This idea

has been used by Drovandi and Pettitt (2013) within the Muller algorithm and by Price

et al. (2016), who find the optimal design using an exhaustive search. However, these

methods quickly become computationally intensive as the number of design dimensions

increases. Hence, in this setting, it would be advantageous to reduce the required num-

ber of utility evaluations when searching for the optimal design. For this purpose, we

propose using the refined coordinate exchange algorithm where, at each iteration of the

exchange algorithm, the coordinate space reduces and becomes more refined. Further, the

algorithm is structured such that parallel computational architectures can be exploited.

As will be seen, through using this algorithm, we are able to efficiently locate Bayesian

designs in higher dimensions than previously explored in the design literature related to

models with intractable likelihoods.

The paper is organised as follows. In the next section, the problem of model choice in

the Bayesian framework is described. Section 3.4 presents the utility functions used in

this work, and Section 3.5 describes the ABC methods that are used for inference and in

estimating the expected utility of a given design. An adapted version of the coordinate


exchange algorithm which exploits parallel computational architectures is presented in

Section 3.6. In Section 3.7, the design for a pharmacokinetic model is considered to explore

and demonstrate the performance of our proposed optimisation algorithm. Following this,

two epidemiological examples are considered to demonstrate the performance of three

utility functions, namely the mutual information utility, the Ds-optimal utility and the

Zero-One (0-1) utility for model discrimination. The paper concludes with a discussion

and suggestions for further research.

3.3 Bayesian model choice

Consider the problem of designing an experiment to select the preferred model from a

finite number of K candidate models, described by a random variable M ∈ {1, 2, . . . ,K}. Each model m is parameterised by θm with a prior distribution p(θm |M = m), and

the likelihood function of each model m is given by p(y |θm,d), where y represents the

observed data from the experiment conducted under design d. The prior probability of

each model m is represented by p(M = m), for m = 1, 2, . . . ,K. For ease of notation,

M = m will be abbreviated by m throughout the rest of the paper. In this work, model

choice is performed using the posterior model probability which can be expressed as

follows:

\[
p(m|y,d) = \frac{p(y|m,d)\, p(m)}{\sum_{m=1}^{K} p(y|m,d)\, p(m)}, \tag{3.1}
\]
where
\[
p(y|m,d) = \int_{\theta_m} p(y|\theta_m,d)\, p(\theta_m|m)\, d\theta_m.
\]

In Section 3.5.2 we describe how to estimate p(m |y,d) in an intractable likelihood setting.

3.4 Bayesian experimental design

Experimental designs provide plans for the collection of informative data to efficiently

address experimental aims. In the Bayesian setting, an experimental design d is evaluated

by estimating the expected value of a utility function U(d) which represents the expected

worth of the experimental data obtained under the design d. Finding the Bayesian optimal

design involves locating the design over the design space that maximises U(d). We denote the

optimal design as d∗. In a Bayesian setting, we consider the expected value of a chosen

utility function u(d,y,θ):

\[
U(d) = \int_{y} \int_{\theta} u(d,y,\theta)\, p(y|\theta,d)\, p(\theta)\, d\theta\, dy, \tag{3.2}
\]


where p(y |θ,d) is the likelihood of the possible outcomes under parameters θ and design

d, and p(θ) is the prior distribution of θ. When the utility function u(.) is independent

of θ, the expected utility can be simplified to yield

\[
U(d) = \int_{y} u(d,y)\, p(y|d)\, dy, \tag{3.3}
\]

where p(y |d) is the prior predictive distribution of observed data y under design d.

When model uncertainty is present, Equation (3.3) can be extended to yield

\[
U(d) = \sum_{m=1}^{K} p(m) \left\{ \int_{y} u(d,y,m)\, p(y|m,d)\, dy \right\}. \tag{3.4}
\]

In the Bayesian experimental design context, most utility functions u(.) are based on

the posterior of unknowns (parameters and/or model), and thus given the form of most

utility functions, the above integrals are analytically intractable. Therefore, an approxi-

mation needs to be considered. A common approach is to use Monte Carlo integration,

where data are sampled from the prior predictive distribution and the utility is evaluated

for each sample. Variance reduction can be obtained by drawing from the prior predic-

tive using randomised Quasi-Monte Carlo (Drovandi and Tran, 2018) but for simplicity

we consider pseudo-random numbers. Unfortunately, the approximation of the expected

utility requires a large number of posterior distributions to be approximated (or sampled

from) rendering Bayesian design more computationally challenging than Bayesian infer-

ence. Further, in the search for an optimal design, one needs to approximate U(d) a large

number of times over the design space. This presents significant computational challenges

and has been the main reason why Bayesian design has generally been restricted to low-

dimensional settings. The specific forms of the utility functions considered in this work

are outlined next.

3.4.1 Utility function for parameter estimation

In the design literature, utilities based on the expected information gain from the exper-

iment have been used to evaluate designs for efficient parameter estimation. Cook et al.

(2008) and Price et al. (2016) used the KL divergence between the prior and posterior

distributions of the parameters:

\[
u(d,y) = \int_{\theta} \log\!\left(\frac{p(\theta|y,d)}{p(\theta)}\right) p(\theta|y,d)\, d\theta. \tag{3.5}
\]

This utility is equivalent to the Shannon Information gain (SIG) which can be derived by

applying Bayes theorem to the above equation. Then the SIG utility can be expressed as,


\[
u(d,y) = \int_{\theta} \log(p(y|\theta,d))\, p(\theta|y,d)\, d\theta - \log(p(y|d)). \tag{3.6}
\]

This utility will be applied in Section 3.7.1 of this paper when we consider designing a

pharmacokinetic study to explore and evaluate the performance of our proposed optimi-

sation algorithm.

3.4.2 Utility functions for model discrimination

In this study, we consider three model discrimination utilities for discriminating between

rival models. First, we describe the mutual information utility which has been used in

Bayesian experimental design as a utility function for model discrimination (Cavagnaro

et al., 2010, Drovandi et al., 2014) when the likelihood can be computed analytically. This

utility evaluates the mutual information between the data and the model indicator, and

can be expressed as,

\[
u_{MI}(d,y,m) = \log p(m|y,d), \tag{3.7}
\]

where y are the possible outcomes of the experiment under model m and design d. This

was originally proposed by Box and Hill (1967) and used recently by Drovandi et al. (2014)

to derive sequential designs for model discrimination. Here, we use the expected value

of the mutual information utility, UMI(d), to evaluate static designs for discriminating

between models with intractable likelihoods. Estimation of UMI(d) will be described in

the next section.

The Zero-One utility given in Equation (3.8) selects the design which, on average, has the

highest chance of selecting the true model based on the posterior model probabilities of

the rival models, that is,

\[
u_{0\text{-}1}(d,y,m) =
\begin{cases}
1, & \text{if } \hat{m} = m,\\
0, & \text{otherwise},
\end{cases} \tag{3.8}
\]

where m̂ = arg max_{m ∈ M} p(m|y,d) and M is the set of rival models. Such a utility func-

tion has been used previously to discriminate between competing linear (Rose, 2008) and

logistic (Overstall et al., 2018) regression models. This utility has a straightforward inter-

pretation as maximising the probability of selecting the true model. An advantage of this

utility is that it can be easily modified to penalise incorrect selection of the true model.

For example, this utility can be adjusted to more severely penalise the incorrect selection

of the simpler of two models than incorrectly selecting the more complex model.


Ds-optimality has been used for designing efficient experiments to estimate a specific

subset of parameters of interest in a given model (Atkinson and Bogacka, 1997, Solkner,

1993), and to discriminate between nested models (Muller and Ponce de Leon, 1996,

Waterhouse et al., 2008). As a discrimination utility, Ds-optimality selects the designs

which provide precise estimates of the additional parameters in the most complex model

among nested rival models. Thus, it is computationally less expensive to evaluate than

other model discrimination utilities which are based on the posterior model probabilities

of the competing models. It can be expressed as follows:

\[
u_{Ds}(d,y) = \log(1/\det(\mathrm{Var}(\theta_s|y,d))), \tag{3.9}
\]

where θs denotes the additional parameters of the most complex model among the rival

models.

Hence, three discrimination utilities will be considered in the examples that follow in

Section 3.7. However, before the examples can be considered, we must first show how

each utility can be approximated in our inference framework. Thus, in the next section,

ABC is described for posterior inference in intractable likelihood settings. It is then shown

how the expectation of each utility function can be approximated.

3.5 Approximate Bayesian computation (ABC) and utility estimation

ABC methods were originally developed for inference in population genetics (Beaumont

et al., 2002), and these methods are used for Bayesian inference through the simulation

of data from the model when the likelihood cannot be evaluated analytically or is com-

putationally infeasible to evaluate a large number of times. In using this approach, a

large number of simulations are drawn from the prior predictive distribution and then the

parameters which generate data x similar to the observed data y are used to approximate

the posterior of θ. The similarity between x and y is measured by a discrepancy function

ρ(x,y) which is based on some summary statistics of x and y. We outline two existing

ABC algorithms: an ABC algorithm for parameter estimation, and an extension of this

ABC algorithm for model choice.

3.5.1 ABC for parameter estimation

The ABC posterior of θ based on data y observed under design d can be expressed as,

\[
p(\theta|y,d,\varepsilon) \propto p(\theta) \int_{x} p(x|\theta,d)\, I(\rho(y,x|d) \leq \varepsilon)\, dx, \tag{3.10}
\]


where I(.) is the indicator function, ρ(y,x |d) is a discrepancy function of simulated data x

and observed data y under design d, and ε is a predefined tolerance value. In this work, we

directly compare simulated values x and observed values y using the discrepancy function

ρ(y,x |d) proposed by Drovandi and Pettitt (2013) as only low-dimensional designs are

considered. This can be defined as follows:

\[
\rho(x,y|d) = \sum_{j=1}^{D} \frac{|x_j - y_j|}{\mathrm{std}(x_{\cdot j} \mid d_j)}, \tag{3.11}
\]

where D is the number of design points, and std(x·j | dj) is the standard deviation of pre-

simulated prior predictive data x·j at time dj (see Section 3.5.3 for more details), for j =

1, . . . , D. A sample from the ABC posterior p(θ |y,d, ε) can be obtained by ABC rejection

(Beaumont et al., 2002), MCMC ABC (Marjoram et al., 2003) or SMC ABC (Sisson

et al., 2007). Algorithm 3.1 describes a modified version of the ABC rejection algorithm

(Drovandi and Pettitt, 2013) for designing experiments. This approach is adopted in our

work as it is computationally less expensive than the alternative methods.

1: Generate θ^i ∼ p(θ) for i = 1, . . . , N
2: Generate x^i ∼ p(· | θ^i, d) for i = 1, . . . , N
3: Compute discrepancies ρ^i = ρ(x^i, y | d) for i = 1, . . . , N, creating particles {θ^i, ρ^i}_{i=1}^{N}
4: Sort the particle set according to the discrepancy ρ such that ρ^1 ≤ ρ^2 ≤ . . . ≤ ρ^N
5: Determine ε = ρ^{⌊αN⌋} (where ⌊·⌋ denotes the floor function and 0 < α < 1)
6: Select the subset of particles {θ^i | ρ^i ≤ ε}, which gives the ABC posterior sample of θ

Algorithm 3.1: ABC rejection algorithm (Drovandi and Pettitt, 2013)
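For illustration, the discrepancy in Equation (3.11) can be computed as follows (Python; X_pre is assumed to be a matrix of pre-simulated prior predictive datasets, one row per simulation and one column per observation time in d).

```python
import numpy as np

def discrepancy(x, y, X_pre):
    """Discrepancy of Equation (3.11) between simulated data x and observed data y.

    x, y  : arrays of length D (one value per design point)
    X_pre : array (n_sims, D) of pre-simulated prior predictive data used
            to estimate std(x_{.j} | d_j) at each design point
    """
    scale = X_pre.std(axis=0)          # std of pre-simulated data at each time point
    return np.sum(np.abs(x - y) / scale)
```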

3.5.2 ABC for model choice

The joint ABC posterior distribution of model m and model parameters θm is given by,

\[
p(m,\theta_m|y,d,\varepsilon) \propto p(\theta_m|m)\, p(m) \int_{x} p(x|m,\theta_m,d)\, I(\rho(y,x|d) \leq \varepsilon)\, dx, \tag{3.12}
\]

where y are the observed data under design d, and x are simulated data from model

m under design d. The posterior model probabilities can be approximated via sampling

from the ABC joint posterior p(m,θm |y,d, ε) using a modified version of the ABC model


choice (ABC-MC) approach (Grelaud et al., 2009) given in Algorithm 3.2.

1: Generate m^i ∼ p(m) for i = 1, . . . , N
2: Generate θ^i_m ∼ p(· | m^i, d) for i = 1, . . . , N
3: Generate x^i ∼ p(· | θ^i_m, d) for i = 1, . . . , N
4: Compute discrepancies ρ^i = ρ(x^i, y | d) for i = 1, . . . , N, creating particles {m^i, ρ^i}_{i=1}^{N}
5: Sort the particle set according to the discrepancy ρ such that ρ^1 ≤ ρ^2 ≤ . . . ≤ ρ^N
6: Determine ε = ρ^{⌊αN⌋} (where ⌊·⌋ denotes the floor function and 0 < α < 1)
7: Select the subset of particles {m^i | ρ^i ≤ ε}

Algorithm 3.2: ABC algorithm for model choice (ABC-MC) (Grelaud et al., 2009)

Then, the ABC approximation to the posterior model probability of model m is given by,

\[
p(m|y,d) = \frac{1}{N_{\varepsilon}} \sum_{i=1}^{N_{\varepsilon}} I(m^i = m), \tag{3.13}
\]

where Nε is the number of particles in {m i | ρ i ≤ ε}. The discrepancy function given in

Equation (3.11) is used in Algorithm 3.2 to evaluate the similarity between simulated and

observed data.

In this paper, both the ABC rejection and ABC-MC algorithms are used to approxi-

mate posteriors of parameters and posterior model probabilities, respectively, to assist in

approximating utility functions. This is described next.

3.5.3 Estimating the model discrimination utility functions

The expected utility UMI(d) described in Section 3.4.2 can be approximated by Monte

Carlo integration, and it can be expressed as follows:

\[
U_{MI}(d) = \sum_{m=1}^{K} p(m) \left\{ \frac{1}{Q_m} \sum_{k=1}^{Q_m} \log p(m|y_k^m,d) \right\}, \tag{3.14}
\]

where y_k^m ∼ p(y|m,d), m = 1, 2, . . . ,K, and p(m|y_k^m,d) is an ABC approximation of

the posterior probability of model m obtained via the ABC-MC algorithm described in

Section 3.5.2.

It is a computationally demanding task to generate new datasets for each observed value

ymk in estimating the utility of design d and optimising it over the design space. However,

in ABC, simulated data x are independent from the observed data y (see lines 1-3 in

Algorithm 3.2) and thus, a set of pre-simulated datasets Xm from each model m for a


given design d could be used instead of generating new datasets for each approximation.

Hence, one could discretise the design space and generate datasets from each competing

model and each design prior to running the optimisation algorithm. Such an approach

will significantly reduce computational burden when evaluating an expected utility. In

this work, we consider a grid of times from tmin to tmax with increments of tinc and gen-

erate datasets from each competing model for each time point. This approach has been

used by Drovandi and Pettitt (2013), Hainy et al. (2016) and Price et al. (2016) for find-

ing ABC posteriors, and achieved considerable computational efficiency when estimating

utility functions in locating optimal designs for parameter estimation.

Further, instead of generating a new sample of y_k^m, k = 1, . . . , Q_m, m = 1, . . . ,K, a fixed set of y_k^m values from X_m is used as the observed data. This was originally proposed by Price et al. (2016). The use of a fixed set of y_k^m from X_m simplifies the

optimisation step as the utilities are deterministic (for a given design d and pre-simulated

datasets Xm;m = 1, . . . ,K). The accuracy of the approximation will depend upon the

value of Qm (see Equation (3.14)). In practice, there will be a trade-off between the accu-

racy and run time. That is, a more accurate approximation can be obtained by increasing

Qm. However, this increases the computational burden of estimating UMI(d).
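A minimal sketch of this estimator follows (Python). It assumes the pre-simulated datasets are stored as arrays X[m] of shape (number of simulations, D) for each model m (a hypothetical storage layout), reuses the discrepancy of Equation (3.11) with a per-model scaling, and approximates Equation (3.14) using the first Q rows of each X[m] as the 'observed' datasets (taking Q_m = Q for all models).

```python
import numpy as np

def abc_posterior_model_probs(y, X, alpha=0.01):
    """ABC-MC approximation of p(m | y, d) from pre-simulated datasets X[m]."""
    disc, labels = [], []
    for m, X_m in enumerate(X):
        scale = X_m.std(axis=0)
        disc.append(np.sum(np.abs(X_m - y) / scale, axis=1))   # Equation (3.11), scaled per model
        labels.append(np.full(len(X_m), m))
    disc, labels = np.concatenate(disc), np.concatenate(labels)
    keep = labels[np.argsort(disc)[: int(np.floor(alpha * len(disc)))]]
    return np.array([(keep == m).mean() for m in range(len(X))])

def mutual_information_utility(X, prior_model_probs, Q=100, alpha=0.01):
    """Estimate U_MI(d) of Equation (3.14) using pre-simulated data only."""
    U = 0.0
    for m, p_m in enumerate(prior_model_probs):
        log_probs = []
        for y in X[m][:Q]:                                      # fixed set of 'observed' datasets
            p_hat = abc_posterior_model_probs(y, X, alpha)[m]
            log_probs.append(np.log(max(p_hat, 1.0 / len(X[m]))))  # crude guard against log(0)
        U += p_m * np.mean(log_probs)
    return U
```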

Similarly, for estimating the expected Zero-One utility, the posterior model probability of

each rival model is approximated by the ABC-MC algorithm using pre-simulated datasets

Xm;m = 1, . . . ,K. The estimated utility can be expressed as,

\[
U_{0\text{-}1}(d) = \sum_{m=1}^{K} p(m) \left\{ \frac{1}{Q_m} \sum_{k=1}^{Q_m} u_{0\text{-}1}(d, y_k^m, m) \right\}. \tag{3.15}
\]

Lastly, the expected value of the Ds-optimal utility U_Ds(d) is estimated using a similar approach to that described above. Here, a pre-simulated dataset (X_{m_s}) is generated from the most complex model (m_s) among the competing models. Then, U_Ds(d) is estimated by Monte Carlo integration using a fixed set of values y_k selected from X_{m_s}. It can

be expressed as follows:

\[
U_{Ds}(d) = \frac{1}{Q} \sum_{k=1}^{Q} \hat{u}_{Ds}(d, y_k), \tag{3.16}
\]

where y_k ∼ p_{m_s}(y|d), and \hat{u}_{Ds}(d,y_k) is the estimate of u_{Ds}(d,y_k) using the ABC pos-

terior of θs (see Equation (3.9)) obtained via Algorithm 3.1 using pre-simulated datasets

from the model ms. Let xij be the observed value (number of infected individuals) at the


jth time point in the ith simulation. In this setting, the posterior of θs for a given yk is

approximated via the ABC rejection algorithm using xi· as simulated data (see line 2 of

Algorithm 3.1), and the standard deviation of the pre-simulated data x_{·j} at each time point can

be easily computed and then used in the ABC discrepancy function given in Equation

(3.11).

3.6 Optimisation algorithm

The task of locating the optimal design d∗ poses a challenging optimisation problem as

one must maximise a noisy function which may be relatively flat around its maximum.

Locating Bayesian optimal designs for models with intractable likelihoods has an addi-

tional challenge due to the cost of utility evaluations as described in the previous section.

Thus, in this work, we consider a discretised design space to take advantage of the use of

pre-simulated data when evaluating a utility function. Here, we present an adaptation of

the coordinate exchange algorithm for design problems with a discrete (or discretizable)

design space, which exploits parallel computational architectures.

3.6.1 Refined coordinate exchange (RCE) algorithm

In this study, we consider a discretised design space with a large number of possible val-

ues per design variable (observation time of the ith observation). Thus, if one were to

apply the coordinate exchange algorithm here, then U(d) would need to be evaluated for

changes in each element of the design, across all possibilities. This would be very compu-

tationally expensive. Instead, here we use the idea of refining the search space (Dror and

Steinberg, 2006) in optimising U(d) which requires relatively fewer utility evaluations.

This idea has also been used in the optimisation literature (for example, the grid walk

algorithm) and in approximating integrals with adaptive quadrature (for example, the

adaptive Simpson’s rule (McKeeman, 1962)).

To be more explicit, consider the problem of locating an optimal design with p design vari-

ables based on a given utility function. The RCE algorithm starts with a random design

d0 (see line 1 of Algorithm 3.3), and iteratively changes one variable at a time to maximise

U(d), until the improvement of utility U(d) from one iteration to the next, indexed by k,

is less than a predefined threshold value (ζ = 1 × 10^-8) or a predefined maximum number

of iterations (kmax = 20) has been reached. Let dk = [dk1, . . . , dkq, . . . , dkp] be a candidate

design at the kth iteration and U(dk) is optimised with respect to the qth design variable

(see lines 6-23 of Algorithm 3.3). Here, U(dk) changes only due to dkq as the other p− 1

variables are fixed and hence it is represented by U(dkq ). For a given iteration k and a

design variable q, U(dkq ) is evaluated at each value of set Dq1 which contains possible

values of the qth design variable from Lq to Uq with increments of stinit (a predefined value), and the optimal value of the qth variable is taken as the value in Dq1 which gives the


maximum utility value Uk∗q . Then, at the next sub-iteration, this process is repeated

with the redefined lower lq(r+1) and upper uq(r+1) limits around dk∗q and the halved increment str+1, until the increment reaches stmin, the smallest increment of the discretised design space considered (see lines 10-19 of Algorithm 3.3). For each of these sub-iterations, the set Dqr contains the possible values of the design variable q from lqr to uqr with an increment of str. Here, simulation results suggested that the use of η = 1.5, where η is

the grid reduction parameter, generally avoids local maxima. Then, the current optimal

design dk is updated with dk∗q , which gives the maximum utility with respect to the qth

variable, as outlined in lines 20-23 of Algorithm 3.3. Further, the chance of locating the

global optimum can be increased by re-running the algorithm from different initial designs.

 1:  Initialise: Set k = 0, d0 = [d01, d02, . . . , d0p], U0max = U(d0).
 2:  repeat
 3:      k = k + 1
 4:      Ukmax = Uk−1max
 5:      for q = 1 to p do
 6:          Evaluate U(dkq) for all dkq ∈ Dq1, where Dq1 = {Lq, Lq + stinit, Lq + 2×stinit, . . . , Uq} and Lq and Uq are the lower and the upper limits, respectively, of the qth variable
 7:          Set dk∗q = arg max over dkq ∈ Dq1 of U(dkq)
 8:          Set Uk∗q = U(dk∗q), r = 2, st2 = stinit/2
 9:          Set lq2 = dk∗q − η×st2, uq2 = dk∗q + η×st2
10:          repeat
11:              Evaluate U(dkq) for all dkq ∈ Dqr, where Dqr = {lqr, lqr + str, lqr + 2×str, . . . , uqr}
12:              Set dk∗qr = arg max over dkq ∈ Dqr of U(dkq)
13:              if Uk∗q ≤ U(dk∗qr) then
14:                  Set Uk∗q = U(dk∗qr)
15:                  Set dk∗q = dk∗qr
16:              end
17:              Set lq(r+1) = dk∗q − η×str, uq(r+1) = dk∗q + η×str
18:              Set str+1 = str/2, r = r + 1
19:          until str < stmin
20:          if Ukmax ≤ Uk∗q then
21:              Set Ukmax = Uk∗q
22:              Set dk = [dk1, dk2, . . . , dk(q−1), dk∗q, dk(q+1), . . . , dkp]
23:          end
24:      end
25:  until Ukmax − Uk−1max < ζ or k = kmax

Algorithm 3.3: Refined coordinate exchange algorithm
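As a complement to the pseudo-code above, the following is a minimal Python sketch of the grid-refinement step for a single design variable (roughly lines 6-19 of Algorithm 3.3). It is illustrative only: the utility argument stands for any (expensive) estimator of U(d), and the clipping of the refined grid to the design region is an added safeguard rather than something spelled out in the pseudo-code.

```python
import numpy as np

def refine_coordinate(utility, d, q, lower, upper, st_init, st_min, eta=1.5):
    """Optimise design variable q of design d: evaluate U on a coarse grid
    over [lower, upper], then repeatedly recentre and shrink the grid around
    the current best value until the increment reaches st_min."""
    d = np.array(d, dtype=float)

    def best_on_grid(grid):
        utils = [utility(np.concatenate([d[:q], [v], d[q + 1:]])) for v in grid]
        i = int(np.argmax(utils))
        return grid[i], utils[i]

    best_v, best_u = best_on_grid(np.arange(lower, upper + 1e-9, st_init))

    st = st_init / 2.0
    while st >= st_min:
        lo = max(lower, best_v - eta * st)     # recentre the grid around the current best
        hi = min(upper, best_v + eta * st)
        v, u = best_on_grid(np.arange(lo, hi + 1e-9, st))
        if u >= best_u:
            best_v, best_u = v, u
        st /= 2.0                              # halve the increment for the next sub-iteration

    d[q] = best_v
    return d, best_u
```

A full pass of the RCE algorithm would cycle q over all p design variables and repeat until the improvement in U(d) falls below ζ or kmax iterations are reached, ideally restarting from several random initial designs as noted above.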

In the described optimisation algorithm, parallel computing can be used in two ways, (i)

to evaluate U(d) in parallel (see lines 6 and 11 of Algorithm 3.3) and/or (ii) to evalu-

ate u(d,y,m) in parallel. We experienced significant improvements in computation time

by using the first option. As will be seen in Example 3, we are interested in locating

designs for discriminating between four candidate models. This leads to particularly ex-

pensive utility evaluations for the mutual information and 0-1 utility functions. Hence,


we evaluated U(d) in parallel to allow designs to be found in a reasonable amount of time.
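The sketch below indicates, under simplifying assumptions, how option (i) might be realised with Python's standard multiprocessing module: the candidate designs produced by varying one coordinate over its grid are scored in parallel. The utility function shown is only a cheap placeholder standing in for an expensive estimator such as those described in Section 3.5.3.

```python
from multiprocessing import Pool
import numpy as np

def utility(d):
    """Placeholder for an expensive estimate of U(d); in practice this would be
    one of the ABC-based utility estimators built on pre-simulated data."""
    return -float(np.sum((np.sort(d) - np.linspace(1.0, 5.0, len(d))) ** 2))

def candidates(d, q, grid):
    """All designs obtained by setting coordinate q of d to each grid value."""
    out = []
    for v in grid:
        cand = np.array(d, dtype=float)
        cand[q] = v
        out.append(cand)
    return out

if __name__ == "__main__":
    d = np.array([0.5, 2.0, 6.0, 10.0])
    grid = np.arange(0.1, 20.0 + 1e-9, 0.1)
    with Pool(processes=4) as pool:
        utils = pool.map(utility, candidates(d, q=1, grid=grid))
    print("best value for coordinate 1:", grid[int(np.argmax(utils))])
```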

3.7 Examples

In epidemiology, individuals in a closed population are divided into sub-populations based

on their disease states such as susceptible, exposed, infected and recovered, and the spread

of disease among the individuals is modelled by the transitions of the individuals between

the sub-populations based on unknown model parameters. The size of each sub-population

at time t is represented by the state of the Markov chain which models the dynamics of

the disease spread among the individuals over time. Here, we consider the following

models which describe the spread of an infectious disease in a closed population of size n

to demonstrate the proposed methodology.

Model 1 : Death model

The death model (Cook et al., 2008) is a simple stochastic model which divides the

population of individuals into two states: susceptible and infected, and the number

of individuals in each subpopulation at time t is denoted by S(t) and I(t), respec-

tively. Once susceptible individuals become infected, they remain in the infected

state as they cannot recover. The probability that an infection occurs in the next

infinitesimal time period ∆t, at the time t with j susceptible individuals is,

P (S(t+ ∆t) = j − 1 |S(t) = j) = b1 j∆t + o(∆t),

where b1 is the rate at which susceptible individuals become infectious due to envi-

ronmental sources.

Model 2 : Susceptible-Infected (SI) model

The SI model (Cook et al., 2008) is an extension of the death model via the inclusion

of a parameter b2 which describes the infection rate per susceptible due to other

infectious individuals in the population. The probability that an infection occurs in

the next infinitesimal time period ∆t, at time t with j susceptible individuals is,

P (S(t+ ∆t) = j − 1 |S(t) = j) = (b1 + b2(n− j)) j∆t + o(∆t).

Model 3 : Susceptible-Exposed-Infected (SEI) model

For some diseases, the susceptible individuals are infected but become infectious

after a latent period of time T (Kim and Lin, 2008). The duration of the latent


period of each individual is independently distributed according to an exponential

distribution with rate parameter λ. The individuals in the latent period are called

exposed individuals, and they are not discerned from the susceptible individuals.

Thus, both the number of susceptible and exposed individuals in the population, de-

noted by S(t) and E(t) respectively, cannot be observed. Consequently, the spread

of a disease described by the SEI model is a partially observable process. The prob-

ability that a susceptible becomes an exposed individual in the next infinitesimal

time period ∆t, at the time t with j susceptible individuals is,

P (S(t+ ∆t) = j − 1, E(t+ ∆t) = e+ 1 |S(t) = j, E(t) = e) = b1 j∆t + o(∆t).

The probability that an exposed individual becomes an infectious individual in the

next infinitesimal time period ∆t, at time t with e exposed individuals is,

P (S(t+ ∆t) = j, E(t+ ∆t) = e− 1 |S(t) = j, E(t) = e) = λ e∆t + o(∆t).

Model 4 : Susceptible-Exposed-Infected-II (SEI-II) model

In the SEI-II model, the rate that a susceptible individual becomes an exposed

individual also depends on the number of infectious individuals in the population

via an additional parameter b2. Thus, the probability that a susceptible becomes

an exposed individual in the next infinitesimal time period ∆t, at the time t with j

susceptible individuals and i infectious individuals is,

P (S(t+ ∆t) = j − 1, I(t+ ∆t) = i |S(t) = j, I(t) = i) = (b1 + b2i) j∆t + o(∆t).

Then, each exposed individual becomes an infectious individual after a latent period

of T , where T ∼ exp(λ). The probability that an exposed individual becomes an

infectious individual in the next infinitesimal time period ∆t, at time t with e

exposed individuals is,

P (S(t+ ∆t) = j, E(t+ ∆t) = e− 1 |S(t) = j, E(t) = e) = λ e∆t + o(∆t).
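Although the likelihoods of these models are expensive or intractable, simulating from them is straightforward. The sketch below is an illustrative Gillespie-type simulator for the SEI-II model (Model 4) that records the number of infectious individuals at a set of observation times; Models 1-3 arise as special cases (for example, b2 = 0 recovers the SEI model, and removing the latent state recovers the SI and death models). The parameter values in the example call are illustrative only.

```python
import numpy as np

def simulate_sei2(b1, b2, lam, n, obs_times, seed=None):
    """Simulate the SEI-II model for a closed population of size n using the
    Gillespie algorithm and return the number of infectious individuals I(t)
    at each time in obs_times (assumed sorted)."""
    rng = np.random.default_rng(seed)
    S, E, I = n, 0, 0
    t = 0.0
    obs = np.zeros(len(obs_times))
    j = 0                                      # index of the next observation time
    while j < len(obs_times):
        rate_expose = (b1 + b2 * I) * S        # rate of S -> E events
        rate_progress = lam * E                # rate of E -> I events
        total = rate_expose + rate_progress
        if total == 0.0:                       # no further events possible (S = E = 0)
            obs[j:] = I
            break
        t += rng.exponential(1.0 / total)
        while j < len(obs_times) and obs_times[j] < t:
            obs[j] = I                         # state held over the interval before this event
            j += 1
        if rng.random() < rate_expose / total:
            S -= 1; E += 1
        else:
            E -= 1; I += 1
    return obs

# One realisation observed at three times (illustrative parameter values)
print(simulate_sei2(b1=0.1, b2=0.02, lam=0.5, n=50, obs_times=[2.0, 5.0, 11.0], seed=1))
```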

The following subsections describe three examples to demonstrate the methodology pre-

sented in this paper. The first example demonstrates the performance of the proposed

optimisation algorithm on a standard design problem in the literature. The next two

examples consider the design problem of determining a set of time points to observe a

process which yields observations to discriminate between rival models with intractable

likelihoods. In the first of these two examples, the performance of designs derived under

three utilities for discriminating between Model 1 and Model 2 are compared. Then, in


the third example, the performance of these model discrimination utilities in designing

experiments to discriminate between more than two epidemiological models is explored.

Particularly in this example, the performance of the proposed methodology for finding

designs in, up to, ten dimensions is explored. In the last two examples, optimal designs

are derived based on a discretised design space which consists of discrete time points from

0.1 to 20 days with increments of 0.1 days.

3.7.1 Example 1 - Designs for parameter estimation of a

pharmacokinetic model

In this example, the performance of the RCE algorithm is demonstrated by considering

the problem of locating optimal sampling times for a pharmacokinetic experiment which

has been considered by Overstall and Woods (2017), Ryan et al. (2014). In pharmacoki-

netic studies, compartmental models are typically used to describe the time course of the

concentration of an administered drug in a subject’s bloodstream. Let yt be the measured

concentration of a drug at time t, which can be modelled by yt ∼ N(a(θ)µ(θ; t), σ2 b(θ; t)), where µ(θ; t) = exp(−θ1 t) − exp(−θ2 t), a(θ) = 400 θ2 / (θ3 (θ2 − θ1)), b(θ; t) = 1 + a(θ)2 µ(θ; t)2 / 10, and σ2 = 0.1.

As in Ryan et al. (2014), independent log-normal priors were assumed for the parameters

of interest, θ = (θ1, θ2, θ3)T where, on the log scale, each θ has a common variance of 0.05

and mean of log(0.1), log(1) and log(20) for θ1, θ2, θ3, respectively. Here, the design prob-

lem is to choose 15 sampling times to measure the concentration of the drug to estimate

θ as precisely as possible. Further, as considered by Ryan et al. (2014), these samples are

taken within the first 24 hours after the administration of the drug and these sampling

times must be at least 15 minutes apart.
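The following sketch shows, under the model and priors just described, how a single prior predictive dataset for this design problem might be simulated. It is illustrative only; in particular, drawing the 15 sampling times from a 15-minute grid without replacement is just one simple way of respecting the minimum-spacing constraint and is not the search strategy used by any of the algorithms compared below.

```python
import numpy as np

def simulate_pk(theta, times, rng):
    """Simulate concentrations y_t ~ N(a(theta) mu(theta; t), sigma2 * b(theta; t))."""
    th1, th2, th3 = theta
    mu = np.exp(-th1 * times) - np.exp(-th2 * times)
    a = 400.0 * th2 / (th3 * (th2 - th1))
    b = 1.0 + (a * mu) ** 2 / 10.0
    sigma2 = 0.1
    return rng.normal(a * mu, np.sqrt(sigma2 * b))

rng = np.random.default_rng(0)
# Independent log-normal priors: log-scale means log(0.1), log(1), log(20), common variance 0.05
theta = np.exp(rng.normal(np.log([0.1, 1.0, 20.0]), np.sqrt(0.05)))
# 15 sampling times in (0, 24] hours, at least 15 minutes (0.25 h) apart
times = np.sort(rng.choice(np.arange(0.25, 24.0 + 1e-9, 0.25), size=15, replace=False))
y = simulate_pk(theta, times, rng)
print(np.round(y, 2))
```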

Here, we compared the performance of the RCE, CE and ACE algorithms in locating

optimal blood sampling times based on a discrete design space; a 15 dimensional grid

with 0.01 increments - G(15, 0.01) (dimension, increment). Each algorithm was executed

20 times in parallel using a 10-core processor and, for each algorithm, the same set of 20

designs was used as the initial designs. For the purpose of comparison, in this example

we used the implementation of the SIG utility available in the acebayes R package (Over-

stall et al., 2017b). In this implementation, the double-loop Monte Carlo approximation

was used to estimate the expected SIG utility (see Overstall and Woods (2017) for more

details). Further, in order to adapt the methodology described in Section 3.5.3 for this

example, a fixed set of Monte Carlo samples (prior predictive samples) of size 1000 was

used in evaluating the expected SIG utility of a given design in all three algorithms.


The ACE algorithm locates the optimal design by iteratively selecting an optimal value of

one design variable at a time while keeping other associated design variables as constants.

In each of these iterations, an optimal value of the selected variable is chosen based on an

interpolated utility surface via a Gaussian process (GP) (see Overstall and Woods (2017)

for more details). In this example, the utility function is deterministic throughout the

optimisation procedure. Thus, the same number of Monte Carlo samples was used for

the comparison and GP construction steps. The ACE algorithm is a continuous search

algorithm. Thus, it must be adapted for use when searching across discrete design spaces.

Here, we propose that, when a given design d is not available on the grid, the nearest

design dG on the grid G(15, 0.01) is instead considered. As this is an adaptation of the

ACE algorithm such that it can search a discrete design space, we denote this algorithm

as ACE-D.

In Table 3.1, the total run time and average number of utility evaluations of each algo-

rithm to locate 20 designs and the maximum expected utility found from these 20 runs

are given. According to the results, all three optimisation algorithms perform similarly

with ACE-D locating a design with the highest utility, and RCE and CE locating designs

which are 99.8% efficient. Further, it is evident that the discretisation of the design space

has a minor effect on locating the optimal design as the RCE located a design which is

98.0% efficient with respect to the ACE design on a continuous design space. The total

run times show that the CE algorithm is the most expensive requiring over 20 times more

resources than RCE and ACE-D. Further, RCE seems to be more efficient than ACE-D,

regardless of which of the three initial step sizes is used. This efficiency is explored further by in-

specting the trace plots of the 20 runs of the RCE and ACE-D algorithms. Such trace

plots are shown in Figure 3.1 which presents the utility value of the best design found at

each iteration of each algorithm. It is evident that RCE relatively quickly locates highly

efficient designs, and appears to be more robust to the initial design.

Table 3.1: Performance of optimisation algorithms in locating 15-point designs for parameter estimation of the PK model.

Algorithm   Initial step size   U(d*)   Total run time (minutes)   Average number of utility evaluations
RCE         1                   4.82     19.2                        3109.3
RCE         0.8                 4.82     16.8                        2856.1
RCE         0.5                 4.82     23.2                        3845.4
ACE-D       -                   4.83     34.7                        6636.3
ACE         -                   4.92     35.8                        6633.9
CE          -                   4.82    735.0                      128684.9


[Figure 3.1 shows trace plots of the expected Shannon information gain against iteration number for the ACE-D and RCE algorithms.]
Figure 3.1: Trace plots of U(d) at each iteration of the ACE-D and RCE algorithms in locating 15 optimal blood sampling times based on a discretised design space.

In this example, we found that our optimisation algorithm could locate a relatively efficient

design quicker than the ACE-D algorithm, but the latter was still able to eventually find a

design with a slightly higher expected utility. Hence, we explore both of these algorithms

further in the next example.

3.7.2 Example 2 - Designs for model discrimination

In this example, we consider the design problem of deriving a set of distinct time points

at which to observe the process of interest (the spread of a disease) to gain the maximum

information for discriminating between Model 1 and Model 2. Here, time is considered as

the design variable and the ith design point represents the ith time at which to observe the

process. Optimal designs with 1-4 design points were derived using the three utility func-

tions described in Section 3.4.2. In contrast to Model 1, Model 2 has a likelihood which

is computationally intensive to evaluate a large number of times. Thus, here we used our

proposed ABC methods as described in Section 3.5.3 for utility evaluations in locating

optimal designs. As explained in Section 3.5.3, each utility function was evaluated by ap-

proximating posteriors using 50,000 prior predictive datasets from each model. The prior

distributions for the parameters in each model were found as follows. The parameter b1 of

Model 1 was assumed to be 0.6, and data were generated from this model at three time

points, t = [2, 5, 11] (days). Both models were fitted using ABC rejection with the follow-

ing prior distributions: Model 1 - b1 ∼ U(0, 1), Model 2 - b1 ∼ U(0, 1) , b2 ∼ U(0, 0.05).

The resulting posterior distributions then formed our priors and yielded the predictive


distributions as shown in Figure 3.2. This procedure was adopted to form our prior in-

formation so that the predictive distributions under each model were similar (as seen in

Figure 3.2), resulting in a particularly challenging model discrimination problem.

[Figure 3.2 shows the number of infecteds against time (t) under the prior predictive distributions of the two models.]
Figure 3.2: Prior predictive distributions of Model 1 (solid) and Model 2 (dashed). Here, dot-dashed and dotted lines represent the 2.5% and 97.5% prior prediction quantiles of Models 1 and 2, respectively.

For approximating the mutual information and Zero-One utilities, an equal number of

Monte Carlo samples from each candidate model (Qm, number of prior predictive draws

from model m) were used and the value of Qm was set to 500 since it yields estimates

of utilities with Monte Carlo error of less than 0.02 (standard deviation). It was found

that significant increases in precision could not be achieved for modest increases in Qm

(see Figure A.1). Similarly, the number of Monte Carlo samples from Model 2 used to

approximate the Ds-optimal utility was set to 500.
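The choice of Qm can be assessed by re-estimating the expected utility of a fixed design several times with independent Monte Carlo draws and inspecting the spread of the estimates, as was done to arrive at Qm = 500 above. The sketch below illustrates this with a toy estimator standing in for the ABC-based utility estimators; the estimator, design and numbers are placeholders only.

```python
import numpy as np

def monte_carlo_error(estimate_utility, d, Qm, repeats=50, seed=0):
    """Mean and standard deviation (Monte Carlo error) of repeated
    independent estimates of U(d) based on Qm samples per model."""
    rng = np.random.default_rng(seed)
    estimates = np.array([estimate_utility(d, Qm, rng) for _ in range(repeats)])
    return estimates.mean(), estimates.std(ddof=1)

def toy_estimator(d, Qm, rng):
    """Stand-in for an ABC-based utility estimate whose noise shrinks with Qm."""
    return rng.normal(loc=-0.5, scale=0.3 / np.sqrt(Qm))

for Qm in (100, 500, 2000):
    mean, err = monte_carlo_error(toy_estimator, d=[0.9, 4.2], Qm=Qm)
    print(f"Qm = {Qm:4d}: estimate {mean:.3f}, Monte Carlo error {err:.3f}")
```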

The accuracy of using estimates based on the ABC posterior in utility approximations

was evaluated by making comparisons against the case where the likelihood can be evaluated. That

is, for Model 1, the likelihood is straightforward to evaluate, and, for Model 2, the compu-

tationally expensive moment closure method can be used to approximate the likelihood.

Hence, our utilities were approximated both using ABC methods and when the likelihood

is available. This comparison was undertaken by randomly generating 500 two, three and

four point designs from the design space. For the one point case, all 200 possible designs

were considered. First, the expected utility of each of these designs was approximated

via ABC methods as described in Section 3.5.3 and then, these approximated expected


utilities were compared with the corresponding expected utilities evaluated where the

likelihood is available.

[Figure 3.3 shows scatter plots of U(d) - ABC Approximation against U(d) - Actual, with columns corresponding to 1, 2, 3 and 4 design points.]
Figure 3.3: Comparison of the estimated expected utility of the mutual information utility (first row), the Zero-One utility (second row) and the Ds-optimality utility (third row) using ABC likelihoods and actual likelihoods. In each plot, the y = x line indicates a perfect match of approximated and actual utility evaluations.

Figure 3.3 shows a comparison of the expected utility values based on the ABC approx-

imation and the actual likelihood for each utility function considered in this study. The

first row shows that the ABC-MC algorithm to approximate the mutual information util-

ity closely matches the expected utility values when using the actual likelihood. In the

case of designs with four points, the approximated utility values slightly deviate from the

actual utility value, particularly for designs with higher utility values. Such noise could

potentially hinder our ability to locate the optimal design. Similarly, the approximation

of the Zero-One utility via the ABC posterior model probability results in a one-to-one

match for low-dimensional designs and some deviations for the four-point designs, as shown


in the second row of Figure 3.3. The comparison of the expected utility values based

on the ABC approximation and the actual likelihood for Ds-optimality illustrated in the

third row suggests that the ABC approximation is biased. However, the relative ordering

of the designs (in terms of utility) is still preserved which suggests the approximation can

still be used when searching for the optimal design.

Table 3.2: Performance of optimisation algorithms in locating two-point designs for discriminating between Models 1 and 2 based on different utility functions.

Utility function     Optimisation algorithm   Optimal design d*   U(d*)   Total run time (minutes)
Mutual information   RCE(0.5)                 (0.9, 4.2)          -0.46   120.6
Mutual information   RCE(0.8)                 (0.9, 4.2)          -0.46    91.8
Mutual information   RCE(1)                   (0.9, 4.2)          -0.46    80.4
Mutual information   ACE-D                    (0.9, 4.2)          -0.46   434.4
Ds-optimal           RCE(0.5)                 (0.6, 3.7)          10.36    75.0
Ds-optimal           RCE(0.8)                 (0.6, 3.7)          10.36    59.4
Ds-optimal           RCE(1)                   (0.6, 3.7)          10.36    72.0
Ds-optimal           ACE-D                    (0.6, 3.7)          10.36   332.4
Zero-One (0-1)       RCE(0.5)                 (0.7, 4.3)          0.788   165.0
Zero-One (0-1)       RCE(0.8)                 (0.9, 4.0)          0.791   117.6
Zero-One (0-1)       RCE(1)                   (0.9, 4.0)          0.791   125.4
Zero-One (0-1)       ACE-D                    (0.9, 4.0)          0.791   475.8

The performance of the RCE and ACE-D algorithms was further explored for locating

optimal designs on a two dimensional grid with increments 0.1, that is, G(2,0.1) (similar

results were found for 3 and 4 design points, so they are given in the appendix - see Table

A.1). Here, each algorithm was executed 10 times in parallel starting from the same (ran-

domly drawn) initial designs. For each algorithm, the maximum expected utility value

found from the 10 runs and the total run time of 10 runs are given in Table 3.2. The

results show that both algorithms locate the same designs. One advantage of the RCE

algorithm is that it appears to locate these designs in a relatively short amount of time

(around 2 to 3 times faster than ACE-D). To further explore this, trace plots were again

inspected, see Figure 3.4. From this figure, it can be seen that ACE-D requires more

iterations of the exchange algorithm to locate the optimal design under all three utility

functions. As the utility functions considered for the remainder of this example and, in

particular, the next example are expensive to approximate, we propose only using RCE

to locate designs. Further, it is evident that the optimal designs found by the RCE al-

gorithm somewhat depend on the initial step size with both 0.8 and 1 being preferred in

terms of computation time. Thus, we ran the RCE algorithm with an initial step size of 1.


[Figure 3.4 shows trace plots of the expected utility against iteration number for the ACE-D and RCE algorithms, with panels (a)-(c) corresponding to the three utility functions.]
Figure 3.4: Trace plots of the expected utility for each run of the ACE-D and RCE algorithms in locating optimal two-point designs based on (a) the mutual information utility, (b) the Ds-optimality utility and (c) the Zero-One utility.

The optimal designs found under three utility functions, the mutual information util-

ity, the Ds-optimal utility and the Zero-One utility are given in Table 3.3. Here, we

re-evaluated the expected utility of each optimal design 500 times using different Monte

Carlo samples of size 500 from each model to estimate U(d∗) and the associated Monte

Carlo standard error. According to these designs, the process should be observed in its

early stages, when most of the susceptible individuals become infected. The derived optimal observation times are the time points which have a relatively large difference

between the prior predictive distributions of the candidate models (see Figure 3.2). Ac-

cording to the estimated utility values, a noticeable increase in the model discrimination

ability of designs is not achieved by collecting more than two observations.

The performance of the optimal designs found under each utility was assessed against a

set of randomly selected designs with an equal number of design points. In each

case, 500 observations from Model 1 were generated according to the optimal design and

a corresponding random set of designs. Then, the posterior model probabilities of Model 1 were evaluated based on these simulated observations.


Table 3.3: Optimal designs for discriminating between Model 1 and Model 2 derived under different utility functions.

Utility function     Number of design points |d|   Optimal design d*       U(d*)          Total run time (minutes)
Mutual information   1                             0.4                     -0.55 (0.01)    11.7
Mutual information   2                             (0.9, 4.2)              -0.47 (0.02)    80.2
Mutual information   3                             (0.6, 4.3, 6.0)         -0.47 (0.02)   233.8
Mutual information   4                             (0.6, 4.0, 5.1, 10.8)   -0.47 (0.02)   340.0
Ds-optimal           1                             0.4                      9.85 (0.01)     9.8
Ds-optimal           2                             (0.6, 3.7)              10.35 (0.02)    72.3
Ds-optimal           3                             (0.6, 3.3, 5.7)         10.43 (0.02)   145.7
Ds-optimal           4                             (0.5, 1.0, 3.3, 5.7)    10.46 (0.02)   228.0
Zero-One (0-1)       1                             0.3                      0.71 (0.01)    13.0
Zero-One (0-1)       2                             (0.9, 4.0)               0.77 (0.01)   125.5
Zero-One (0-1)       3                             (0.4, 5.3, 8.1)          0.77 (0.01)   187.5
Zero-One (0-1)       4                             (0.3, 1.2, 3.3, 5.5)     0.77 (0.01)   331.1

In order to avoid potential inac-

curacies introduced by estimating the posterior model probabilities via ABC, the actual

likelihood was evaluated for this validation. In Figure 3.5, the empirical cumulative distribution function (CDF) of the estimated posterior model probability of Model 1 (true model)

based on each design is plotted. Here, we note that the proposed optimal designs under

each utility function perform equally well in discriminating between Model 1 and Model

2. According to Figure 3.5, randomly selected designs can perform as well as optimal

designs in discriminating between rival models, particularly in the case of one point de-

signs. This outcome is due to the fact that the set of random designs might actually

contain the optimal or near-optimal design. We note that, on average, the optimal de-

signs have a higher chance of correctly selecting the model responsible for data generation.


[Figure 3.5 shows empirical cumulative probability against posterior model probability for the 0-1, Ds, MI and random (RD) designs, with panels (a)-(d) for 1, 2, 3 and 4 design points.]
Figure 3.5: Empirical cumulative probabilities of the posterior model probability of Model 1 (true model) obtained for observations generated from Model 1 according to optimal designs for discriminating between Models 1 and 2, and random designs.

A similar simulation study was conducted using data generated from Model 2, and Fig-

ure 3.6 presents the empirical CDF of the estimated posterior probability of Model 2 in

each case. In this instance, across both models, the optimal designs perform better for

discrimination than the random designs (in expectation). However, it appears as though

the optimal designs have a slightly higher chance of selecting the wrong model on some

occasions. Optimal designs were obtained by maximising the expected utility which is

evaluated based on possible outcomes of all competing models. Consequently, the selected

optimal designs might not be preferred for discrimination for each individual realisation

from the prior information. However, over all realisations (or a large set of realisations),

the optimal designs are preferred for discrimination.


[Figure 3.6 shows empirical cumulative probability against posterior model probability for the 0-1, Ds, MI and random (RD) designs, with panels (a)-(d) for 1, 2, 3 and 4 design points.]
Figure 3.6: Empirical cumulative probabilities of the posterior model probability of Model 2 (true model) obtained for observations generated from Model 2 according to optimal designs for discriminating between Models 1 and 2, and random designs.

3.7.3 Example 3 - Designs for model discrimination

In this example, we derived optimal observation times which yield informative observa-

tions for discriminating between Models 1, 2, 3 and 4 using the utility functions previously

considered in Example 2. Similar to the previous example, here, the ith design point repre-

sents the ith time at which to observe the process of interest. As there are four competing

models in this example, we considered designs with up to ten design points for model

discrimination. Note that in the Ds-optimal utility function, b2 and λ of Model 4 were

considered as θs, the extra parameters of the most complex candidate model. For Model

1 and Model 2, the prior distributions from Example 2 were considered here, and for

Model 3 and Model 4, the prior distribution of the parameters was taken as the ABC

posterior obtained based on the same dataset from Model 1 used in Example 2, with priors

for Model 3 - b1 ∼ U(0.4, 1) and λ ∼ exp(rate= 0.01) and Model 4 - b1 ∼ U(0.01, 0.6),

b2 ∼ U(0.01, 0.05) and λ ∼ exp(rate = 0.01).


In this example, the utility evaluations are computationally more expensive than in the pre-

vious example as there are more competing models to consider. Thus, we implemented

parallel computing to evaluate utilities within the RCE algorithm (see lines 6 and 11 of

Algorithm 3.3). Specifically, the RCE algorithm was run five times in parallel using five

nodes each with 12 cores. In estimating the expected utility of mutual information and

Zero-One utility, 500 Monte Carlo samples from each model were used, and for Ds-optimal

utility 1000 Monte Carlo samples from Model 4 were used to achieve the desired accuracy

(Monte Carlo error less than 0.02). Optimal designs with 1 to 10 design points obtained

under each utility function and their corresponding utility values are given in Table 3.4.

In contrast to previous examples, here the average run time over five runs of the RCE

algorithm in locating each optimal design is given. It is evident that the Ds-optimal util-

ity function is computationally less expensive than the other two utility functions, and

the Zero-One utility function rapidly becomes computationally intensive as the number

of candidate models increases. As experienced in Example 2, the process should be ob-

served in its early stages to collect informative observations for discriminating between

competing models.


Table 3.4: Utility of optimal designs for discriminating between Models 1, 2, 3 and 4 derived under different utility functions.

Utility function     |d|   Optimal design d*                                        U(d*)          Average run time (minutes)
Mutual information    1    0.5                                                      -1.22 (0.01)     5.2
Mutual information    2    (0.9, 5.3)                                               -1.11 (0.01)    21.9
Mutual information    3    (0.8, 3.8, 6.9)                                          -1.08 (0.01)    52.2
Mutual information    4    (0.1, 1.0, 3.8, 5.2)                                     -1.06 (0.02)    82.6
Mutual information    6    (0.1, 0.9, 3.7, 5.2, 7.1, 8.5)                           -1.03 (0.01)   113.3
Mutual information    8    (0.1, 0.2, 0.9, 3.7, 5.1, 7.1, 8.1, 10.8)                -1.00 (0.01)   226.6
Mutual information   10    (0.1, 0.2, 1.0, 3.7, 4.9, 5.8, 7.7, 8.7, 11.2, 13.8)     -1.00 (0.01)   312.2
Ds-optimal            1    0.2                                                       1.48 (0.02)     4.7
Ds-optimal            2    (0.4, 3.5)                                                1.75 (0.02)    32.2
Ds-optimal            3    (0.1, 0.6, 3.4)                                           1.87 (0.02)    25.5
Ds-optimal            4    (0.1, 0.6, 3.1, 4.1)                                      1.87 (0.02)    64.9
Ds-optimal            6    (0.1, 0.3, 0.8, 3.1, 5.7, 9.5)                            1.89 (0.02)    95.8
Ds-optimal            8    (0.1, 0.4, 2.3, 3.9, 7.8, 10.2, 13.1, 15.0)               1.88 (0.02)   109.7
Ds-optimal           10    (0.1, 0.6, 2.2, 3.5, 7.8, 9.2, 9.5, 11.0, 14.3, 15.6)     1.89 (0.02)   179.5
Zero-One (0-1)        1    5.6                                                       0.39 (0.01)    15.9
Zero-One (0-1)        2    (0.9, 7.0)                                                0.45 (0.01)    99.0
Zero-One (0-1)        3    (0.1, 0.4, 5.0)                                           0.48 (0.01)   170.9
Zero-One (0-1)        4    (0.1, 0.9, 3.4, 7.2)                                      0.51 (0.01)   255.4
Zero-One (0-1)        6    (0.1, 0.2, 0.9, 4.1, 6.7, 7.5)                            0.54 (0.01)   364.5
Zero-One (0-1)        8    (0.1, 0.2, 1.2, 3.9, 5.2, 7.1, 8.0, 9.1)                  0.55 (0.01)   635.6
Zero-One (0-1)       10    (0.1, 0.2, 0.4, 1.3, 3.5, 5.1, 7.2, 8.7, 9.0, 12.1)       0.55 (0.01)   788.8

Further, the performance of our proposed methodology for locating designs in higher di-

mensions (|d| ≥ 6) was explored. For this exploration, the mutual information utility was

considered. Table 3.5 compares the quality of designs obtained by increasing the number

of model simulations (N) used for posterior approximations (see Algorithm 3.2), where the expected utility of each design was evaluated using 4 × 10^5 prior predictive samples.

According to these designs, it is evident that little to no improvement in the utility can be

achieved by increasing the number of model simulations. This provides some assurance

that highly efficient designs have been located under the initially selected settings for this

example.


Table 3.5: Utility of optimal designs for discriminating between Models 1, 2, 3 and 4 derived under the mutual information utility.

|d|   N (x 10^5)   Optimal design d*                                       U(d*)          Average run time (minutes)
 6    1            (0.1, 0.9, 3.7, 5.2, 7.1, 8.5)                          -1.03 (0.01)    113.3
 6    2            (0.1, 0.8, 2.2, 4.3, 5.4, 7.1)                          -1.03 (0.01)    313.7
 6    4            (0.1, 0.3, 1.2, 3.7, 4.8, 6.9)                          -1.01 (0.01)    766.4
 8    1            (0.1, 0.2, 0.9, 3.7, 5.1, 7.1, 8.1, 10.8)               -1.00 (0.01)    226.6
 8    2            (0.1, 0.2, 1.0, 3.7, 5.1, 7.1, 8.1, 10.8)               -1.00 (0.01)    575.8
 8    4            (0.1, 0.2, 1.0, 3.7, 4.8, 5.1, 6.8, 8.5)                -1.00 (0.01)   1244.7
10    1            (0.1, 0.2, 1.0, 3.7, 4.9, 5.8, 7.7, 8.7, 11.2, 13.8)    -1.00 (0.01)    312.2
10    2            (0.1, 0.2, 0.9, 3.3, 4.3, 5.1, 7.1, 8.1, 10.8, 12.1)    -0.99 (0.01)    709.5
10    4            (0.1, 0.3, 1.0, 3.8, 5.1, 6.0, 7.1, 8.1, 8.7, 11.0)     -1.00 (0.01)   1507.8

[Figure 3.7 shows empirical cumulative probability against ABC posterior model probability for the 0-1, Ds, MI and random (RD) designs, with panels (a)-(d) for 1, 4, 8 and 10 design points.]
Figure 3.7: Empirical cumulative probabilities of the ABC posterior model probability of Model 1 (true model) obtained for observations generated from Model 1 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs.


As in Example 2, the performance of the derived optimal designs given in Table 3.4

was evaluated via a simulation study by generating data from all four competing models

considered in this example. However, here the ABC posterior model probability of the

true model was used for validation to avoid the high computational cost of computing

the actual posterior model probabilities of Model 3 and Model 4. Across all models, the

optimal designs generally perform better than the random designs, as shown by much larger

empirical CDF values near 1. Specifically, the optimal designs found under the mutual

information and Zero-One utility functions perform equally well across all data generating

models. When Model 1 is the data generating model (see Figure 3.7), the Ds-optimal designs do not perform as well as the designs from the other two utilities but actually outperform

the other optimal designs when Model 4 is generating data (see Figure A.4). This could

be due to the fact that the Ds-optimality utility is focussing on estimating the additional

parameters in Model 4 with the single parameter of Model 1 not being included in the

utility function.

3.8 Discussion

In this work, a methodology for selecting Bayesian designs for discriminating between

models with intractable likelihoods in epidemiology has been developed. Firstly, Bayesian

design for model discrimination in epidemiology has been considered by introducing a

computationally efficient method to estimate three model discrimination utilities, namely

the mutual information utility, the Ds-optimal utility and the Zero-One utility, via ABC

methods. Secondly, an adaptation of the coordinate exchange algorithm which exploits

parallel computational architectures was presented.

The results from comparing our utility approximation in Example 2 with the estimated

utility when the likelihood could be computed suggest that our approach can be used to

evaluate designs when competing models contain intractable likelihoods. Across Examples

2 and 3, all discrimination utilities generally performed well yielding data which could be

used to determine (with high probability) the true data generating model. However, there

was one instance where the Ds-optimality utility performed poorly, and this appeared to

be due to how the utility was constructed (in terms of which subset of parameters was

considered). One benefit of implementing this utility is that it required much less compu-

tational effort than the other two utilities. This is particularly noticeable when compared

to the Zero-One utility which becomes computationally expensive as the number of rival

models (K) increases since it requires (K−1) posterior model probability approximations

for a single evaluation of u(d,y,m) whereas Ds-optimality utility requires only one poste-

rior approximation. The mutual information utility also becomes moderately expensive to

evaluate as more candidate models are considered because it needs to evaluate u(d,y,m)

for each model m which requires a single posterior model probability approximation.


In Example 3, we demonstrated our methodology for the location of designs of up to ten

dimensions. This methodology could potentially be used to find designs in more than

ten dimensions. However, as the number of dimensions increases, so does the ABC toler-

ance. As such, there may be a point where this tolerance is so large that it hinders our

ability to locate optimal designs. This could be addressed by increasing the number of

prior simulations to obtain a reasonably sized posterior sample with acceptable tolerance,

but this comes at a computational cost and increases the amount of memory required

to store these simulations. Thus, alternative approaches to ABC rejection may be of

interest. These could include the expectation propagation ABC (EP-ABC) algorithm of

Barthelme and Chopin (2014) and the synthetic likelihood approach of Wood (2010) and

Price et al. (2018c). We plan to explore these ideas in future work.

In this paper, the RCE algorithm was proposed to locate Bayesian designs. We also

proposed an adaptation to the ACE algorithm such that it can be used to search across

discrete design spaces. Both algorithms are able to take advantage of the use of pre-

simulated data, and gave promising results yielding relatively highly efficient designs in

high dimensions and in intractable likelihood settings. The computational efficiency of

RCE and being able to exploit parallel computational architectures eventually led to this

algorithm being preferred for design problems where computationally intensive utility

evaluations are involved. However, the RCE algorithm could potentially benefit from

employing an emulator of the utility surface (as used in ACE-D), and this could be con-

sidered in further research. One drawback of using a discrete design space is that the located

design could be sub-optimal. However, the selection of a small increment for the grid

that is used to discretise the design space should enable the location of relatively highly

efficient designs in a reasonable amount of time where the use of a continuous design space

could be computationally prohibitive, particularly for complex models.

Given the developments presented in this paper, it is now possible to undertake Bayesian

design for parameter estimation and model discrimination in settings where the likelihood

is intractable. Thus, it should also be possible to consider dual purpose utility functions

such as the total entropy utility of Borth (1975) which addresses both of these experi-

mental goals. Recently, this utility has been implemented in settings where the likelihood

can be evaluated straightforwardly (McGree, 2017), but it is of interest to extend such

methodology so that experiments in epidemiology are informative for both model selec-

tion and parameter estimation.


4 Dual purpose Bayesian design for parameter estimation and model discrimination in epidemiology using a synthetic likelihood approach


Statement for Authorship

This chapter has been written as a journal article. The authors listed below have certified

that:

(a) They meet the criteria for authorship as they have participated in the conception,

execution or interpretation of at least the part of the publication in their field of

expertise;

(b) They take public responsibility for their part of the publication, except for the

responsible author who accepts overall responsibility for the publication;

(c) There are no other authors of the publication according to these criteria;

(d) Potential conflicts of interest have been disclosed to granting bodies, the editor

or publisher of the journals of other publications and the head of the responsible

academic unit; and

(e) They agree to the use of the publication in the student’s thesis and its publication

on the Australian Digital Thesis database consistent with any limitations set by

publisher requirements.

The reference for the publication associated with this chapter is: Dehideniya, M. B., Drovandi, C. C., and McGree, J. M. Dual purpose Bayesian design for parameter estimation and model discrimination in epidemiology using a synthetic likelihood approach. Bayesian Analysis (submitted for publication).

Contributor and statement of contribution:

M. B. Dehideniya: Developed and implemented the statistical methods, wrote the manuscript, revised the manuscript as suggested by co-authors. Signature and date:

J. M. McGree: Initiated the research concept, supervised research, assisted in interpreting results, critically reviewed the manuscript.

C. C. Drovandi: Supervised research, assisted in interpreting results, critically reviewed the manuscript.

Principal Supervisor Confirmation: I have sighted email or other correspondence for all

co-authors confirming their authorship.

Name: James McGree    Signature: ______________    Date: 05/07/2019


4.1 Abstract

Foot and mouth disease (FMD) is a highly contagious infectious disease which has fre-

quently plagued livestock across many different countries worldwide. Currently, the

spread of the disease is not well understood, and thus experiments are needed such that

targeted disease detection, prevention and control measures can be developed. However,

developing such experiments is challenging as typically the likelihood of models for such

infectious diseases is computationally intractable. This poses challenges in quantifying

the usefulness of different experiments through a utility function. For this purpose, a

novel synthetic likelihood approach is considered which allows experiments for infectious

diseases to be developed through the consideration of a dual-purpose utility function for

parameter estimation and model discrimination. The new methodology is validated on

an illustrative example before being applied to experiments for FMD which motivate this

work. Across both examples, the results suggest that the derived dual purpose designs

perform similarly well in achieving each experimental goal when compared to the designs

optimised for each individual goal. Further, the results from the motivating example

suggest that new knowledge about how FMD spreads throughout a population could be

discovered if our approaches are adopted in future experimentation.

4.2 Introduction

Understanding the dynamics of infectious diseases is important for the development and

implementation of detection, prevention and control measures. In the field of veterinary

epidemiology, the dynamics of disease transmission is investigated by observing the spread

of the disease over time among a small population of animals in a controlled experiment.

Unfortunately, such experimentation can have detrimental effects on animal welfare and

can be costly in terms of time and money. This motivates the need to efficiently design

such studies, both in terms of the number of experiments which need to be conducted

and the number of times with which the population needs to be observed.

Methods from optimal design can be used to address this need for efficiency, and have

been specifically developed to handle the computational intractability of evaluating the

likelihood of models typically found in epidemiology. To determine informative times for

when to observe these controlled experiments, methods have been proposed for efficient

estimation of transmission parameters (Cook et al., 2008, Drovandi and Pettitt, 2013,

Pagendam and Pollett, 2013) and to determine the most appropriate model to describe

the spread of the disease (Dehideniya et al., 2018b). However, further efficiency could

be achieved through the consideration of multiple experimental objectives, and thus lead

to more ethical experimentation. Such developments are of particular importance when

experimenting with infectious diseases such as foot and mouth disease (FMD) which can

be quite harmful to animals, and this motivates the research presented in this article.


FMD is a highly contagious disease which impacts livestock such as cattle, pigs and sheep

(Knight-Jones and Rushton, 2013). There have been a number of outbreaks of FMD

including in 2001 in the United Kingdom (Haydon et al., 2004) and the Netherlands

(Bouma et al., 2003), and in 2010 in Japan (Muroga et al., 2012) which had large-scale

economic impact, and resulted in the mass culling of livestock in attempts to prevent the

disease from spreading further. Consequently, much attention has focussed on studying

the transmission dynamics of FMD, for instance, Backer et al. (2012), Bravo de Rueda

et al. (2015), Hu et al. (2017), Orsel et al. (2007) and the references therein. These studies

have led to the development of conflicting views about which model is most appropriate to

describe the dynamics of FMD. Hence, experiments are needed which provide

highly informative data to select, among the competing models found in the literature,

the most appropriate model to describe the dynamics of FMD and to also estimate the

parameters of this model as precisely as possible. This motivates the development of a

dual purpose utility function that can be used to design experiments for epidemiological

models which typically have intractable likelihoods.

In general, deriving dual purpose designs for model discrimination and parameter esti-

mation is challenging as typically these are competing objectives. That is, designs for

model discrimination typically perform poorly for parameter estimation, and vice versa

(Atkinson, 2008). Many authors have used a weighted sum of utilities which represent

different objectives, for example, Atkinson (2008), Clyde and Chaloner (1996), McGree

et al. (2008), Tommasi (2009). However, such an approach has been shown to be diffi-

cult to implement in practice due to the choice of weighting parameter. In the Bayesian

context, Borth (1975) suggests an entropy-based utility to derive dual purpose designs

which avoids the pre-specification of weights and the need to pre-specify a true model.

Thus, in this work, we adopt the Bayesian approach to design dual purpose experiments.

McGree (2017) notes the computational difficulties in using the total entropy utility in

situations where the likelihood can be evaluated, and this is further compounded by the

intractability of the likelihood function for models typically found in epidemiology. As

such, dual purpose experiments to estimate parameters and discriminate between models

with intractable likelihoods have not been considered in the literature.

In the Bayesian context, methods from approximate Bayesian computation (ABC) have

been used in designing experiments for parameter estimation (Drovandi and Pettitt, 2013,

Price et al., 2016) and model discrimination (Dehideniya et al., 2018b). However, these

methodologies are restricted to a small class of utility functions due to the use of ABC

rejection for approximating the posterior distribution. In particular, ABC rejection meth-

ods do not provide an efficient estimate of the model evidence, and as such cannot be

used with the widely popular Kullback-Leibler (KL) divergence utility for parameter es-

timation. This also renders such methods inapplicable for use with the total entropy

utility. Further, the ABC approximation could perform poorly as the number of design

dimensions increases. Consequently, there will be a point where this potentially hinders

locating the optimal design depending on the problem and the available prior information.


Hence we propose a novel synthetic likelihood method to approximate the likelihood, and

then to estimate the utility function more efficiently. Our developments of the synthetic

likelihood facilitate, not only the consideration of the total entropy utility function, but

a much wider class of utilities which have not been possible to consider previously (in-

cluding the KL divergence utility). Thus, our methodology should be useful in designing

experiments for models with intractable likelihoods in general.

Pagendam and Pollett (2013) used the Gaussian diffusion approximation in evaluating

designs for parameter estimation in the frequentist framework. Despite its computational

efficiency compared with the simulation-based approximations such as ABC, the Gaussian

diffusion approximation can only be used for a certain class of models, that is, density

dependent Markov chains. Further, as pointed out by Pagendam and Pollett (2013), the

diffusion approximation is only valid for reasonably large populations (say > 100). Hence,

the diffusion approximation cannot be used in experiments in veterinary epidemiology

which typically consider a small number of animals due to the cost of conducting such

experiments, and concerns about animal welfare. The proposed synthetic likelihood does

not have such constraints, and thus our methodology is more appropriate in general.

The paper is organised as follows. In the next section, our proposed synthetic likelihood

method is described. Section 4.4 presents the utility functions used in this work, and we

show how these can be efficiently estimated within our synthetic likelihood framework.

Then, an illustrative example is considered in Section 4.5 along with the motivating

application for this work. Finally, we conclude with a discussion of our proposed methods

and directions for future research.

4.3 Inference framework

In epidemiology, continuous-time Markov chain (CTMC) models are used to describe the

spread of a disease in a closed-population where the individuals are divided into a set

of non-overlapping subpopulations according to their disease status, such as susceptible,

exposed, infectious, or recovered. Depending on the disease of interest, all or a subset of

these subpopulations are considered, and the state of the Markov chain at time t is defined

by the number of individuals in each subpopulation. Then, the transitions of individuals
between these subpopulations (disease states) are described by transition probabilities

defined based on unknown parameter values. The likelihood of a series of observed states,

y = {y1,y2, . . . ,yk}, of the Markov process at times d = {t1, t2, . . . , tk} can be expressed

as the product of transition probabilities, p(y|θ,d) = ∏_{i=1}^{k} p(yi|yi−1), where p(yi|yi−1)
is the transition probability of moving from state yi−1 to yi during the time from ti−1

to ti for given parameter values θ. Unfortunately, the evaluation of these transition

probabilities is a computationally expensive task when the Markov process has even a
moderately large number of states. This results in a computationally intractable

likelihood for most epidemiological models. Thus, likelihood-free methods are required

for inference.
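To illustrate the cost involved, the following Python sketch (a hypothetical illustration; the rate structure, population size, parameter value and observation times are not taken from this work) computes exact transition probabilities for a simple one-dimensional infection process by exponentiating its generator matrix. Even for this univariate chain the computation requires an (N + 1) × (N + 1) matrix exponential per inter-observation interval, and for multivariate processes such as the SIR and SEIR models the state space, and hence the cost, grows far more quickly.

import numpy as np
from scipy.linalg import expm

def generator(N, beta):
    """Generator matrix Q on states i = 0, 1, ..., N with infection rate beta * (N - i)."""
    Q = np.zeros((N + 1, N + 1))
    for i in range(N):
        rate = beta * (N - i)
        Q[i, i + 1] = rate       # one more infection
        Q[i, i] = -rate
    return Q

def log_likelihood(y, t, N, beta):
    """Exact log-likelihood of observed states y at times t as a product of transition probabilities."""
    loglik = 0.0
    for i in range(1, len(t)):
        P = expm(generator(N, beta) * (t[i] - t[i - 1]))   # cost grows with the size of the state space
        loglik += np.log(P[y[i - 1], y[i]])
    return loglik

# Illustrative values only.
print(log_likelihood(y=[0, 12, 31, 44], t=[0.0, 1.0, 3.0, 6.0], N=50, beta=0.6))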


The synthetic likelihood approach (Wood, 2010) is a simulation-based method used to

form an approximation to intractable likelihoods. To approximate the likelihood of ob-

served data yobs for given parameter values θ, summary statistics sobs = S(yobs) are

considered. This will typically map the data into a smaller number of dimensions, and

is frequently used in likelihood-free inference to avoid the curse of dimensionality. The

likelihood for the observed data is then approximated by the likelihood of the summary

statistics given the model and θ where these summary statistics are assumed to be mul-

tivariate Normal with mean vector µ(θ) and covariance matrix Σ(θ). Usually, µ(θ) and

Σ(θ) are intractable functions but can be estimated via simulation from the model for

given values of θ. That is, one can simulate n datasets from the model of interest, and

compute summary statistics (s_sim^(1), s_sim^(2), . . . , s_sim^(n)) which can be used to estimate µ(θ)

and Σ(θ). Thus, despite the likelihood being intractable, we require it to be computa-

tionally efficient to generate data from the model. Originally, Wood (2010) focused on

estimating parameters via maximising the synthetic likelihood, and recently Price et al.

(2018c) extended this to the Bayesian paradigm by incorporating a prior distribution on

the parameters of interest.

The FMD application considered here involves datasets with a moderate number of ob-

servations, and thus, we derive the synthetic likelihood based on the distribution of the

simulated data themselves rather than considering summary statistics. To specify the

proposed synthetic likelihood approach, let us consider a univariate Markov process with

a discrete state space {1, 2, . . . , N} and y = {y1, y2, ..., yk} which are the observed states

of the Markov process at times d = {t1, t2, ..., tk}. For instance, the Susceptible-Infected

(SI) model (Cook et al., 2008) divides a closed-population of N individuals into two sub-

populations: susceptible and infectious individuals, and the state of the Markov process

at time ti is the number of infectious individuals at ti. Then, the likelihood of y for a

given model m with parameters θ, p(y|θ,m,d), can be approximated by assuming that y

follows a k-dimensional multivariate Normal distribution, N (µ(θ,m,d), Σ(θ,m,d)) where

µ(θ,m,d) and Σ(θ,m,d) are the estimated mean vector and the covariance matrix based

on n simulated datasets, denoted by X = {xti ; i = 1, ..., k}, from model m given pa-

rameter values θ at each time point ti. Specifically, µ(θ,m,d) is a vector of k elements

where the ith element is the estimated mean of xti and (i, j) element of Σ(θ,m,d) is the

estimated covariance between xti and xtj .

However, in our motivating study, count data are observed, and thus the multivariate

Normal density may not be appropriate to describe the likelihood of these discrete data,

particularly in the later stages of the spread of disease where only a few unique out-

comes are plausible. Thus, the idea of continuity correction is applied to approximate

p(y|θ,m,d), and this can be expressed as follows,

pSL(y|θ,m,d) = p(y1 − c < X1 < y1 + c, ..., yk − c < Xk < yk + c), (4.1)


where (X1, X2, ..., Xk) ∼ N (µ(θ,m,d), Σ(θ,m,d)) and c is the continuity correction fac-

tor which is set to 0.5 here. The accuracy of this approximation depends on n, the number

of simulations used, and this makes the synthetic likelihood method more convenient to

use in practice as less tuning is required compared to ABC methods. Further, in principle,

it is straightforward to extend this to l-dimensional stochastic processes by considering an

(l × k)-dimensional multivariate Normal distribution. However, as (l × k) increases, the

evaluation of pSL(y|θ,m,d) becomes more computationally expensive, and this may be

a limitation if (l × k) is large.
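To make the proposed approximation concrete, the following sketch estimates pSL(y|θ,m,d) in Equation (4.1) by fitting a Gaussian to n simulated trajectories and estimating the continuity-corrected rectangle probability by simple Monte Carlo; the simulator and all tuning values are illustrative placeholders, and a dedicated multivariate Normal rectangle-probability routine could be used in place of the Monte Carlo step.

import numpy as np

def p_SL(y, d, simulate, theta, n=1000, c=0.5, n_mc=5000, rng=None):
    """Continuity-corrected synthetic likelihood of the observed states y at times d.

    simulate(theta, d, rng) must return one simulated trajectory observed at the times in d.
    """
    rng = rng or np.random.default_rng()
    X = np.array([simulate(theta, d, rng) for _ in range(n)])            # n x k simulated datasets
    mu = X.mean(axis=0)                                                  # estimate of mu(theta, m, d)
    Sigma = np.atleast_2d(np.cov(X, rowvar=False)) + 1e-8 * np.eye(len(d))   # estimate of Sigma, with jitter
    draws = rng.multivariate_normal(mu, Sigma, size=n_mc)
    inside = np.all((draws > np.asarray(y) - c) & (draws < np.asarray(y) + c), axis=1)
    return inside.mean()                                                 # Monte Carlo estimate of the rectangle probability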

When the prior of parameters θ is updated based on a small to moderate number of

observations, the posterior of θ can be approximated via importance sampling. However,

this requires evaluating the likelihood which is generally computationally intractable for

epidemiological models. In place of this, we propose to use our synthetic likelihood

approximation. That is, first, a set of particles {θ^i}_{i=1}^{Q} is drawn from p(θ) with equal
weights. Then, to reflect the posterior of θ upon observing y, these particles are weighted
by their corresponding normalised weights, W^i, which are obtained by normalising the
likelihoods w^i = pSL(y|θ^i,m,d).

Moreover, the marginal likelihood of y for model m can be approximated via Monte

Carlo integration based on the synthetic likelihood of y evaluated for a sample {θ^i}_{i=1}^{Q}
drawn from the corresponding parameter distribution. Thus, the approximated marginal
likelihood is given by,

p(y|m,d) = (1/Q) ∑_{i=1}^{Q} pSL(y|θ^i,m,d).   (4.2)

When there are K competing models to describe the observed data y at times d, the

probability p(y|d) can be approximated by,

p(y|d) = ∑_{m=1}^{K} p(y|m,d) p(m),   (4.3)

where p(m) is the prior model probability of model m. Then, the posterior model prob-

ability of model m can be approximated as follows:

p(m|y,d) = p(y|m,d) p(m) / p(y|d).   (4.4)
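As a minimal sketch of Equations (4.2)-(4.4), the importance sampling and model probability calculations can be organised as follows; the prior samplers, prior model probabilities and synthetic likelihood evaluations are assumed to be supplied by the user, and none of the names below are taken from the original implementation.

import numpy as np

def fit_models(y, d, models, Q=500):
    """Return, per model, prior particles, normalised weights and the approximate
    marginal likelihood, together with the posterior model probabilities."""
    fits = []
    for m in models:
        theta = [m["prior_sample"]() for _ in range(Q)]           # particles from p(theta | m)
        w = np.array([m["syn_lik"](y, t, d) for t in theta])      # w_i = p_SL(y | theta_i, m, d)
        fits.append({"theta": theta,
                     "W": w / w.sum(),                            # normalised importance weights
                     "evidence": w.mean()})                       # Equation (4.2)
    joint = np.array([m["prob"] * f["evidence"] for m, f in zip(models, fits)])
    post_model_probs = joint / joint.sum()                        # Equations (4.3) and (4.4)
    return fits, post_model_probs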

Hence, our proposed synthetic likelihood approach can be used to undertake parameter

estimation and model selection. Moreover, such approximations are of general use when

approximating utility functions in Bayesian design for models with intractable likelihoods.

Further details about this will be shown in the next section.


4.4 Bayesian experimental designs

Designing an experiment is a decision making process about how to select suitable values of

controllable variables of the experiment in order to efficiently address experimental aims.

The set of values selected for the controllable variables is referred to as the design, and

the experimental aims are defined via a utility function u(.). In deciding upon a design,

one must account for uncertainty about the model (m), the ensuing parameter values (θ)

and the data (y) that will be observed. This can be achieved through evaluating the

expected value of the utility function. Hence, in the Bayesian setting, possible designs

are evaluated and compared based on the expected value of u(.) which represents the

informativeness of y for addressing the aim of the experiment.

When the experimenter considers a single model to describe the process of interest, find-

ing the optimal design involves maximising the expected value of the utility function

u(d,y,θ), with respect to the joint distribution of y and θ, and this expectation can be

expressed as follows:

U(d) = ∫_y ∫_θ u(d,y,θ) p(y|θ,d) p(θ) dθ dy,   (4.5)

where p(y|θ,d) is the likelihood of a possible outcome y under parameters θ and design

d, and p(θ) is the prior distribution of θ which allows prior knowledge or expert opinion

to be incorporated into the process of designing efficient experiments. In practice, utility

functions for parameter estimation are often defined as a function of the posterior distribution
of the parameters, such as the KL divergence between the prior and the posterior (Cook et al., 2008),
a commonly used utility in Bayesian experimental design. In such utilities, θ is integrated
out, and consequently the expected utility can be defined as follows:

U(d) = ∫_y u(d,y) p(y|d) dy,   (4.6)

where p(y|d) is the prior predictive distribution of observed data y under design d.

In the presence of K competing models to describe the process of interest, the sum of

expected utilities under each model weighted by the corresponding prior model probabili-

ties p(m);m = 1, 2, ...,K can be considered as the expected utility. This can be expressed

as follows:

U(d) = ∑_{m=1}^{K} p(m) { ∫_y u(d,y,m) p(y|m,d) dy },   (4.7)

where p(y|m,d) is the prior predictive distribution of observed data y under design d

according to model m and p(m) is the prior model probability of model m.


Unfortunately, given the form of most utility functions, the above integral is generally ana-

lytically intractable, and therefore needs to be approximated. One conventional approach

is to approximate the above integral via Monte Carlo integration using N independent

datasets (y) generated from the prior predictive distribution of each model, and this can

be expressed as follows:

U(d) = ∑_{m=1}^{K} p(m) (1/N) ∑_{j=1}^{N} u(d, ymj, m),   (4.8)

where ymj ∼ p(y|m,d) are independent draws from the model m at time points d. The

number of prior predictive simulations, K × N , needed to estimate U(d) with a desired

accuracy can be reduced using the randomised Quasi-Monte Carlo method (Drovandi and

Tran, 2018). However, for simplicity, here we estimate the expected utility via standard

Monte Carlo integration as given in Equation (4.8).
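A minimal sketch of the Monte Carlo approximation in Equation (4.8) is given below; the prior predictive simulator, the utility function and the prior model probabilities are assumed user-supplied, and the names are illustrative only.

import numpy as np

def expected_utility(d, models, utility, N=500, rng=None):
    """Monte Carlo estimate of U(d) as in Equation (4.8)."""
    rng = rng or np.random.default_rng()
    U = 0.0
    for m in models:
        u_vals = np.empty(N)
        for j in range(N):
            y = m["simulate_data"](d, rng)     # y_mj ~ p(y | m, d), prior predictive draw
            u_vals[j] = utility(d, y, m)       # u(d, y_mj, m)
        U += m["prob"] * u_vals.mean()
    return U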

As will be seen in the next subsections, the evaluation of each u(d,ymj ,m) involves some

posterior evaluation, and thus the approximation of U(d) requires approximating or sam-

pling from K×N posterior distributions which is a considerable computational challenge.

The following subsections will describe the total entropy utility function and two other

utility functions considered in this work. Then, a computationally efficient approach to

approximate each utility function via the synthetic likelihood will be discussed.

4.4.1 Dual purpose utility function for parameter estimation and model discrimination

In the design literature, the derivation of dual purpose experimental designs for parame-

ter estimation and model discrimination has been considered by means of weighted crite-

ria (Atkinson, 2008, Hill et al., 1968) and entropy-based utilities (Borth, 1975, McGree,

2017). In practice, the choice of appropriate weight for each experimental aim when using

weighted criteria is not straightforward (Clyde and Chaloner, 1996, McGree et al., 2008).

In contrast, the utility function based on the total entropy does not require pre-specified

weights for each experimental aim as the additive property of entropy provides a natural

way to weight both model discrimination and parameter estimation within a utility func-

tion. However, due to the computational challenges in evaluating the total entropy utility,

it has received little attention in the literature until the recent work of McGree (2017) who

proposed a computationally efficient way of estimating the utility via sequential Monte

Carlo.

In this work, we consider a utility based on the expected change in total entropy upon

observing the experimental data to derive static designs for dual purpose experiments of

parameter estimation and model discrimination. The total entropy utility for a possible

dataset y under model m according to the design d can be expressed as,


uTE(d,y,m) = ∫_{θm} log p(y|θm,m,d) p(θm|m,y,d) dθm − log p(y|d).   (4.9)

The derivation of this utility function is described in Appendix B.1 of the supplemen-

tary material. For an observed dataset y under model m given design d, the first term

of uTE(d,y,m) can be approximated via Monte Carlo integration using a sample of

weighted particles {θm^i, Wm^i}_{i=1}^{Q} which forms a particle approximation to p(θm|m,y,d).

These weighted particles can be obtained via importance sampling based on the synthetic

likelihood method as explained in Section 4.3. The second term p(y|d) can be approxi-

mated as shown in Equation (4.3) based on draws from p(θm|m) for each model m. Then,

the approximated total entropy utility can be expressed as,

uTE(d,y,m) = ∑_{i=1}^{Q} Wm^i log p(y|θm^i,m,d) − log p(y|d).   (4.10)

The total entropy utility can be expressed as a summation of two widely used utility

functions, the KL divergence utility (Ryan, 2003) for parameter estimation and the mutual

information utility for model discrimination (Box and Hill, 1967) (see Appendix B.1 of

the supplementary material). Thus, we consider the designs optimised based on these

utility functions to investigate the performance of total entropy designs for addressing

each objective. For completeness, the following subsections briefly describe the KLD

utility and the mutual information utility.

4.4.2 Utility function for parameter estimation

The KL divergence between the prior and the posterior distribution of the model pa-

rameters can be used as a utility function to design an experiment to efficiently estimate

parameters of a given model m (Cook et al., 2008, Price et al., 2016, Ryan, 2003). It can

be expressed as follows:

uKLD(d,y,m) = ∫_{θm} log p(y|θm,m,d) p(θm|m,y,d) dθm − log p(y|m,d),   (4.11)

where y is a possible outcome of the experiment conducted under the design d according

to the model m. When the experimenter considers more than one potential model to

describe the process of interest, the sum of the expected KL-divergence between the prior

and the posterior of parameters of each model m, weighted by the corresponding prior

model probabilities, can be used to obtain designs for parameter estimation under model

uncertainty (see Equation (4.7)). Further, this can be expressed as the expected change

in the entropy of the parameter based on the observations y given design d (see Appendix

B.1 in the supplementary material).


In a similar way that uTE(d,y,m) is approximated in Section 4.4.1, the first term of the

KLD utility can be approximated using a set of weighted particles {θm^i, Wm^i}_{i=1}^{Q} obtained

for an observed dataset y from model m. The log-marginal likelihood of y, log p(y|m,d),

can also be approximated using the synthetic likelihood approach described in Section

4.3 (see Equation (4.2)). Then, the approximation of uKLD(d,y,m) can be expressed as

follows:

uKLD(d,y,m) = ∑_{i=1}^{Q} Wm^i log p(y|θm^i,m,d) − log p(y|m,d).   (4.12)

4.4.3 Utility function for model discrimination

In the design literature, the mutual information between the model indicator and the

outcomes of the experiment has been used as a utility function to evaluate sequential

designs for this purpose (Box and Hill, 1967, Drovandi et al., 2014). More recently,

Dehideniya et al. (2018b) used this utility to derive designs for discriminating between

epidemiological models with intractable likelihoods. The mutual information utility can

be expressed as,

uMI(d,y,m) = log p(m|y,d), (4.13)

where p(m|y,d) is the posterior model probability of model m. For models with in-

tractable likelihoods, this can be approximated by replacing p(m|y,d) with the approxi-

mated posterior model probability of m via our synthetic likelihood approach as described

in Section 4.3 (see Equation (4.4)). Then, the expected mutual information utility for a

given design d can be obtained by substituting the approximated uMI(d,y,m) into Equation (4.8).

It is worth noting that p(y|m,d) and p(y|d) cannot be evaluated efficiently using ABC

methods, and this is another motivation for adopting our synthetic likelihood approach

in this work.
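For reference, the three utility approximations can all be computed from the same weighted particles and synthetic likelihood evaluations produced by the importance sampling step of Section 4.3, as in the following illustrative sketch (names are not from the original code).

import numpy as np

def utilities(W, syn_lik_vals, evidence_m, prior_m, evidences, priors):
    """W, syn_lik_vals: weights and p_SL(y | theta_i, m, d) under the data-generating
    model m; evidence_m = p(y | m, d); evidences, priors: the same quantities over all models."""
    first_term = np.sum(W * np.log(syn_lik_vals))                  # weighted log-likelihood term
    p_y_d = np.sum(np.asarray(priors) * np.asarray(evidences))     # p(y | d), Equation (4.3)
    u_KLD = first_term - np.log(evidence_m)                        # Equation (4.12)
    u_MI = np.log(prior_m * evidence_m / p_y_d)                    # Equation (4.13), log p(m | y, d)
    u_TE = first_term - np.log(p_y_d)                              # Equation (4.10)
    return u_KLD, u_MI, u_TE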

In the next section, the total entropy utility will be used to derive designs for two epi-

demiological experiments which aim to both discriminate between competing models and

estimate model parameters. Then, the performance of derived dual purpose designs will

be compared to the performance of designs which are optimised for each objective.

4.5 Examples

In this section, two examples which involve epidemiological models with intractable like-

lihoods are considered to demonstrate the methodology presented in this paper. The first


example demonstrates the performance of the proposed synthetic likelihood approxima-

tion in estimating the utility functions described in the previous section, and investigates

the performance of the dual purpose designs derived based on the total entropy utility.

The second example, which motivates our research, focuses on designing an efficient ex-

periment to investigate the within-herd spread of the FMD in a cattle population. In

these examples, the following models are used to describe the spread of the infectious

disease of interest in a closed-population of size N .

Model 1 : Death model

The death model (Cook et al., 2008) divides the population into two sub-populations,

susceptible and infected. The state of the CTMC at time t is defined as the number

of infected individuals at time t, I(t). Given I(t) = i, the transition probability of

the possible state at t+ ∆t is given by

P[i+1 | i] = β1 (N − i) ∆t + O(∆t),

where β1 is the rate at which susceptible individuals become infected due to envi-

ronmental sources.

Model 2 : Susceptible-Infected (SI) model

The SI model (Cook et al., 2008) assumes that the infected individuals in the popu-

lation also contribute to the spread of the disease, represented by an additional param-

eter β2, in addition to environmental sources. Given that I(t) = i, the transition

probability of the possible state at t+ ∆t is given by

P[i+1 | i] = (β1 + β2 i) (N − i) ∆t + O(∆t).

Model 3 : Susceptible-Infectious-Recovered (SIR) model

The SIR model divides a closed-population of individuals into three sub-populations:

susceptible, infectious and recovered. The state of the CTMC at time t is defined as

the number of susceptible and infectious individuals at time t, {S(t), I(t)}, respec-

tively. According to the SIR model, the susceptible individuals become infected and

are immediately infectious. These infectious individuals contribute to the spread of

the disease until they recover. The duration of the infectious period of each individ-

ual is independently and identically distributed as an exponential random variable

with rate parameter α. Given that (S(t), I(t)) = (s, i), the transition probabilities

for possible states at t+ ∆t are given by

P[s−1, i+1 | s, i] = (β s i / N) ∆t + O(∆t),
P[s, i−1 | s, i] = α i ∆t + O(∆t),

where β is the rate at which an infectious individual makes infected-contacts per

unit time.


Model 4 : Susceptible-Exposed-Infectious-Recovered (SEIR) model

In contrast to the SIR model, the SEIR model assumes that the susceptible indi-

viduals do not become infectious immediately after they become infected but after

a latent period of time TE . During the latent period, TE which is distributed as

an exponential random variable with rate αE , the infected individuals cannot be

differentiated from the susceptible individuals. Thus, these individuals are called

exposed individuals, and the number of exposed individuals at time t is denoted

by E(t). Once the exposed individual becomes infectious, it spreads the disease

to other susceptible individuals until it recovers after time TI which is distributed

as an exponential random variable with rate αI . The state of the CTMC at time

t is defined as the number of susceptible, exposed and infectious individuals at

time t, {S(t), E(t), I(t)}, respectively. Given that (S(t), E(t), I(t)) = (s, e, i), the

transition probabilities for possible states at t + ∆t are given by

P[s−1, e+1, i | s, e, i] = (β s i / N) ∆t + O(∆t),
P[s, e−1, i+1 | s, e, i] = αE e ∆t + O(∆t),
P[s, e, i−1 | s, e, i] = αI i ∆t + O(∆t),

where β is the rate at which an infectious individual makes infected-contacts per

unit time.
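The transition rates above fully specify each CTMC, so exact forward simulation via the Gillespie algorithm is straightforward; this is the simulator on which the synthetic likelihood approximation relies. The sketch below simulates the SI model (Model 2) at a set of observation times; the population size, initial number of infectives and parameter values in the example call are illustrative assumptions only.

import numpy as np

def simulate_si(theta, d, N=50, I0=1, rng=None):
    """Return the number of infectious individuals at each observation time in d (sorted)."""
    beta1, beta2 = theta
    rng = rng or np.random.default_rng()
    t, i, j = 0.0, I0, 0
    obs = np.empty(len(d), dtype=int)
    while j < len(d):
        rate = (beta1 + beta2 * i) * (N - i)                    # total rate of the next infection
        t_next = t + rng.exponential(1.0 / rate) if rate > 0 else np.inf
        while j < len(d) and d[j] < t_next:                     # record the state at observation times
            obs[j] = i
            j += 1
        t, i = t_next, i + 1
    return obs

# One simulated dataset at an illustrative three-point design.
print(simulate_si(theta=(0.33, 0.011), d=np.array([0.5, 1.2, 3.5])))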

In conducting experiments to investigate the spread of a disease, it is impractical to

observe the process and collect samples continuously. Even collecting samples at frequent

intervals would be costly. Thus, in the following examples, we consider a set of k optimal

times at which to observe the process to collect samples for both parameter estimation and

model discrimination. It may be computationally prohibitive to find the optimal design

in continuous design space (time) as it requires a large number of model simulations for

each utility evaluation, particularly for complex models such as SIR and SEIR. However,

the n datasets simulated to obtain µ(θ,m,d) and Σ(θ,m,d) are independent of the
observed data y (see Section 4.3). Thus, we avoid undertaking an excessively large number

of simulations in the optimal design procedure by discretising the design space as a grid

of times from tmin to tmax with increments of tinc and pre-computing µ(θ,m,d) and

Σ(θ,m,d) using the pre-simulated datasets at each time point of the grid. Then, these pre-

computed mean vectors and covariance matrices are used for approximating the likelihood

and utility values in a computationally efficient way during the process of locating optimal

designs. Another advantage of this approach is that the accuracy of the approximation can

be increased by increasing the number of model simulations used to estimate µ(θ,m,d)

and Σ(θ,m,d) without any additional storage requirements. In the design literature, pre-

simulated datasets have been used to estimate utilities, for instance, Drovandi and Pettitt

(2013), Hainy et al. (2016). As we use a discrete design space in the examples considered


in this work, the refined coordinate exchange (RCE) algorithm (Dehideniya et al., 2018b)

is used to locate optimal designs. The RCE algorithm starts with a given initial design

and searches for the optimal design by iteratively optimising one design variable at a time

while keeping all other design variables fixed. In each iteration, the best value for the

selected design variable is found by refining the grid on which U(d) is evaluated, starting

from a grid with a given initial step size (see Section 5 of Dehideniya et al. (2018b) for

the complete algorithm). In each example, the RCE algorithm was run five times, with

different initial designs, in parallel with an initial step size of 1.
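For completeness, a stripped-down coordinate exchange over the discrete time grid is sketched below; it omits the grid refinement step of the full RCE algorithm and assumes an expected_utility callable such as the Monte Carlo estimate of Equation (4.8).

import numpy as np

def coordinate_exchange(expected_utility, grid, k, max_sweeps=10, rng=None):
    """Search for a k-point design maximising expected_utility(d) over times in 'grid'."""
    rng = rng or np.random.default_rng()
    d = np.sort(rng.choice(grid, size=k, replace=False))       # random initial design
    best = expected_utility(d)
    for _ in range(max_sweeps):
        improved = False
        for i in range(k):                                      # optimise one design point at a time
            for t in grid:
                cand = np.sort(np.concatenate([np.delete(d, i), [t]]))
                u = expected_utility(cand)
                if u > best:
                    d, best, improved = cand, u, True
        if not improved:                                        # stop when a full sweep yields no improvement
            break
    return d, best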

4.5.1 Example 1 - Death and SI models

To illustrate the proposed methodology, first, we consider the problem of deriving an

efficient set of time points to observe the spread of a disease, which provides maximum

information to both discriminate between two rival models, namely the death model and the SI
model, and estimate model parameters. Then, designs for each individual experimental

goal are also considered to compare the performance of derived dual purpose designs

in addressing each individual experimental goal. Here, a grid of times from 0.1 to 20

days with increment 0.1 was considered as the design space, and designs with 1 to 8

design points were derived. This example has been previously considered by Dehideniya

et al. (2018b) for discriminating between the death and SI model. Here, for the death

model β1 ∼ log − normal(−0.48, 0.09) is used as the prior distribution, and for SI model

β1 ∼ log − normal(−1.1, 0.16) and β2 ∼ log − normal(−4.5, 0.4) are used as the prior

distributions. These two models were assumed equally likely a priori.

First, the accuracy of approximating the expected utility of a given design d using the

proposed synthetic likelihood approach was evaluated. In this comparison, 500 randomly

selected designs with two, four and eight points were used, and for the one-point case, all

200 possible designs were considered. For each of these designs, U(d) was evaluated for

KLD, mutual information for model discrimination and total entropy based on both the

actual likelihood and the approximated likelihood via our synthetic likelihood approach,

and the results are shown in Figure 4.1. Here, 500 independent datasets from each model

were used to approximate the expected utilities (see Equation (4.8)), and the posterior

distribution of parameters of each model was approximated via importance sampling as

described in Section 4.3, using 500 parameter values drawn from the corresponding prior

distribution. Further, 1000 model simulations were used to estimate the mean vector and

covariance matrix in each synthetic likelihood evaluation. As is evident from the first row

of Figure 4.1, the approximated expected values of the mutual information utility closely

match the corresponding estimated expected utility values based on actual likelihood

evaluations. In contrast, the adapted synthetic likelihood approach overestimates the

expected utility values of both total entropy and KLD utilities (see Figure 4.1). However,

the candidate designs can still be ranked correctly (in terms of utility) based on our

approximation. Thus, our approach can be used to find optimal designs for models with

intractable likelihoods. Further, an additional simulation study reveals that our synthetic

likelihood approach outperforms the ABC model choice algorithm (Grelaud et al., 2009) in approximating the mutual information utility for model discrimination, particularly for designs with a moderate number of design points (see Appendix B.2 of the supplementary material).

Figure 4.1: Comparison of the estimated expected utility of the mutual information utility (first row), the total entropy utility (second row) and the KLD utility (third row) using synthetic and actual likelihoods, for designs with one, two, four and eight design points. In each plot, the y = x line indicates a perfect match between approximated and actual utility evaluations.

The derived dual purpose designs for discriminating between the death and SI model

and estimating parameters of both models are given in Table 4.1 along with the designs

found under the mutual information and KLD utilities. Here, the expected utility of each

optimal design was estimated using 100 re-evaluations of U(d∗) for different Monte Carlo

samples of size 500 from each model, and the estimated Monte Carlo error is given as a

standard deviation in the parenthesis.

The performance of the total entropy designs for model discrimination and parameter

estimation was assessed against designs optimised for each task and randomly selected

designs.

Table 4.1: Optimal designs derived under different utility functions.

Utility function      Optimal design d∗                                   U(d∗)
Mutual Information    (0.5)                                               -0.59 (0.01)
                      (0.5, 3.5)                                          -0.49 (0.01)
                      (0.5, 1.2, 3.5)                                     -0.46 (0.02)
                      (0.5, 1.2, 3.5, 5.6)                                -0.44 (0.02)
                      (0.5, 1.2, 2.2, 3.5, 5.6, 12.1)                     -0.43 (0.02)
                      (0.2, 0.5, 1.2, 2.2, 3.5, 5.6, 12.1, 13.7)          -0.43 (0.02)
Total Entropy         (2.8)                                                0.77 (0.02)
                      (0.9, 3.6)                                           1.11 (0.03)
                      (0.6, 2.5, 5.5)                                      1.25 (0.03)
                      (0.6, 2.2, 3.7, 5.6)                                 1.32 (0.03)
                      (0.5, 2.1, 3.5, 4.6, 5.7, 7.3)                       1.41 (0.03)
                      (0.5, 1.2, 2.2, 3.5, 4.7, 5.7, 7.3, 8.9)             1.48 (0.03)
KLD                   (2.7)                                                0.74 (0.02)
                      (1.9, 4.4)                                           0.92 (0.02)
                      (0.7, 2.7, 5.7)                                      1.01 (0.02)
                      (0.7, 2.3, 4.3, 6.0)                                 1.08 (0.02)
                      (0.7, 2.2, 3.6, 4.4, 5.7, 7.7)                       1.17 (0.02)
                      (0.7, 2.3, 3.6, 4.4, 5.0, 5.8, 7.0, 8.9)             1.23 (0.02)

In this evaluation, first, 500 datasets were generated from the death model

according to each of the optimal designs given in Table 4.1 and randomly selected designs

with an equivalent number of design points. Then, the posterior model probability of

the death model and log reciprocal of the posterior variance of β1 of the death model

were used to evaluate the efficiency of the optimal and the random designs for model

discrimination and parameter estimation, respectively. The performance of designs was

evaluated in a similar manner for the SI model. In each case, posterior inference was

undertaken based on actual likelihood evaluations.

According to Figure 4.2, the total entropy designs perform similarly to the corresponding

discrimination designs, except for the one-point design where the total entropy design

is less efficient than the discrimination design. As expected, the one-point KLD design

performs poorly for discrimination while the mutual information design performs well

across both models. However, all optimal designs with more than three design points

perform similarly well for model discrimination (as the designs are quite similar) while

the random designs perform poorly. Moreover, it seems that there is only a small benefit

in collecting more than three observations for discriminating between the death and SI

models.

Figure 4.3 compares the performance of optimal designs in terms of parameter estimation.

Here, for both models, all optimal designs provide more information than the correspond-

ing random designs. Both KLD and total entropy one-point designs perform significantly

better than the discrimination design across both data generating models. It appears that

the performance of designs with more than two design points is similar across all the utility functions regardless of the data generating model as (again) the designs found under each utility are quite similar (see Table 4.1).

Figure 4.2: The posterior model probability of the data generating model obtained for observations generated from (a) the death model and (b) the SI model according to optimal designs and random designs.

Figure 4.3: Log reciprocal of the determinant of the posterior covariance, log(1/det(cov(θ|y,d))), of the parameters of (a) the death model and (b) the SI model obtained for observations generated from the corresponding model according to optimal designs and random designs.

Given that our proposed approach to approximate U(d) appears reasonable across the three
utilities considered in this work, and that the total entropy utility appears to perform well in

both model discrimination and parameter estimation, we next consider this methodology

to design experiments to learn more about FMD.

4.5.2 Example 2 - SIR and SEIR models

Transmission experiments have been used as a primary tool to investigate the transmission

dynamics of FMD within a herd of animals, for instance, Backer et al. (2012), Bravo de

Rueda et al. (2015), Hu et al. (2017), Orsel et al. (2007). These experiments generally start

with giving a viral dose containing a prespecified number of plaque forming units (PFU)

of the FMD viral strain of interest, for instance approximately 37500 PFU of FMD virus

strain O/NET/2001 (Orsel et al., 2007), to randomly selected animals and letting them

interact with susceptible animals. Then, the animals are observed over time for evidence

of disease states such as infectious and recovery based on clinical signs and blood samples

collected from the animals. In the veterinary epidemiological literature, both the SIR

(Orsel et al., 2007) and SEIR (Backer et al., 2012, Bradhurst et al., 2015) models have

been proposed to model the within-herd spread of FMD in cattle populations. Thus, in

this example, we focus on deriving a set of time points to observe a closed population

of 50 cattle to both discriminate between SIR and SEIR models and estimate model

parameters.

In this hypothetical example, the number of animals who are administered the virus,

approximately 37500 PFU of FMD virus strain O/NET/2001, prior to the experiment is

considered as a fixed variable, specifically, I(t = 0) = 5 in order to consider a realistic

scenario. These inoculated animals are kept separate from the other animals for a pre-

specified period (say 24 hours), and it is assumed that these animals are infectious at the

beginning of the experiment (t = 0) when they are allowed to mix with the remaining

45 animals who are assumed to be susceptible to FMD. The experiment is run over the

course of 30 days.

As it is costly to collect and test samples from animals at regular intervals, here
we consider a set of k optimal times at which to collect those samples. Since the SEIR

model assumes a non-observable disease state, E(t), here we consider only the number of

infectious and recovered individuals observed at each time point in the utility evaluations.

Thus, subsequent inference is undertaken based on observed data Y = {(I(ti), R(ti)); i =

1, . . . , k} where I(ti) and R(ti) are the number of infectious and recovered individuals,

respectively, at time ti. Here, infectious or recovered individuals are identified according to the amount of virus measured in the blood samples collected from each individual, as this is more reliable than classifying them according to their clinical signs (see Stenfeldt et al. (2016) for more details).

Table 4.2: Optimal designs derived under different utility functions.

Utility function      Optimal design d∗                                   U(d∗)
Mutual Information    (3.1)                                               -0.43 (0.02)
                      (4.1, 16.0)                                         -0.34 (0.02)
                      (0.7, 4.1, 18.4)                                    -0.30 (0.02)
                      (0.7, 4.1, 10.1, 25.3)                              -0.28 (0.02)
                      (0.7, 3.1, 5.3, 6.5, 10.1, 25.3)                    -0.27 (0.02)
                      (0.7, 2.9, 4.1, 5.3, 6.3, 6.5, 10.1, 25.4)          -0.27 (0.02)
Total Entropy         (7.0)                                                0.97 (0.02)
                      (6.7, 17.5)                                          1.56 (0.03)
                      (6.5, 13.5, 27.1)                                    1.81 (0.03)
                      (5.5, 10.8, 16.3, 27.1)                              1.97 (0.03)
                      (4.1, 7.0, 10.8, 14.2, 18.8, 27.1)                   2.16 (0.03)
                      (4.1, 7.0, 10.8, 12.9, 15.2, 17.5, 21.7, 27.3)       2.30 (0.04)
KLD                   (11.6)                                               0.91 (0.02)
                      (9.4, 19.1)                                          1.26 (0.03)
                      (7.4, 14.2, 27.1)                                    1.47 (0.03)
                      (7.3, 10.9, 16.4, 27.1)                              1.60 (0.03)
                      (7.3, 10.7, 14.2, 17.8, 21.7, 28.2)                  1.79 (0.03)
                      (7.3, 10.7, 12.8, 15.0, 17.4, 21, 23.8, 28.2)        1.94 (0.03)

Based on the parameter values of the SEIR model given in Backer et al. (2012), β ∼ log-normal(0.44, 0.16²), αE ∼ gamma(25.55, 0.02) (shape, scale) and αI ∼ gamma(7.25, 0.04)
were chosen as the prior distributions of the parameters of the SEIR model. Then, β ∼ log-normal(−0.09, 0.19²) and αI ∼ gamma(10.30, 0.02) were used as the prior dis-
tributions of the parameters of the SIR model as these yielded similar prior predictive
distributions for infectious and recovered individuals as given by the SEIR model. These
two models were assumed equally likely a priori. Table 4.2 summarises the optimal designs

found under our three utility functions.

According to these designs, the process should be observed at its early stages to dis-

criminate between models. In contrast, observations collected at the middle and later

stages provide more information about parameters. A combination of such design char-

acteristics is reflected in the designs derived based on the total entropy utility. As in

the previous example, the performance of the derived optimal design for discriminating

between competing models and estimating model parameters was evaluated. Here, the

synthetic likelihood was used in this evaluation as the actual likelihood of either of these

models is not available in closed-form.

Figure 4.4 compares the performance of the optimal and random designs for discriminat-

ing between the SIR and SEIR model when (a) the SIR model and (b) the SEIR model is

responsible for data generation.

Figure 4.4: The approximated posterior model probability of (a) the SIR model and (b) the SEIR model obtained for observations generated from the corresponding model according to optimal designs and random designs.

For both models, the model discrimination designs per-

form better than the other designs. When the SIR model was the data generating model,

the total entropy designs yield observations with slightly less information for discriminat-

ing between models compared to the discrimination designs. However, as the number of

design points increases, the total entropy designs yield comparatively more information

about the data generation model. When the data were generated from the SEIR model,

the total entropy designs perform similarly to the model discrimination designs except for

the one-point design.

Figure 4.5: Log reciprocal of the determinant of the approximated posterior covariance, log(1/det(cov(θ|y,d))), of the parameters of (a) the SIR model and (b) the SEIR model obtained for observations generated from the corresponding model according to optimal designs and random designs.

Figure 4.5 shows that, across both models, total entropy designs yield observations which

provide efficient estimates of the model parameters when compared to designs under the

KLD utility. In contrast, model discrimination designs do not provide more informative


observations to estimate parameters, particularly in the case of the one-point design where it

performs poorly (in expectation) even when compared to the random designs.

4.6 Discussion

In this work, a methodology for designing dual purpose experiments for parameter estima-

tion and model discrimination for models with intractable likelihoods has been presented.

These developments were motivated by the need to conduct more ethical experiments in

epidemiology, and the current lack of knowledge about FMD. Our approach is based on

the synthetic likelihood method for approximating an entropy-based utility function. Our

illustrative example revealed that the proposed synthetic likelihood method estimates the

utility of a design reasonably well when compared to using the actual likelihood. These

methods were then applied in a second example to design experiments to learn about

FMD.

Although simulation-based likelihood approximation methods enable the derivation of

designs for Bayesian experiments for models with intractable likelihoods, they pose compu-
tational challenges in simulating a large number of datasets. Consequently, pre-simulated

datasets have been used in the literature by discretising the design space which reduces

the required computational effort in utility evaluations. One advantage of the proposed

synthetic likelihood approach over the ABC method is that the former requires less storage,
as only mean vectors and covariance matrices need to be stored instead of all pre-simulated
datasets.

However, the proposed synthetic likelihood method may not be appropriate for approxi-

mating the likelihood in some cases when the Gaussian density is not a reasonable approx-

imation to the distribution of the data. Thus, further research is required to investigate

more suitable distributions in place of the Gaussian distribution, particularly when a

small number of experimental units will be considered. In addition, the computational

challenges of estimating the multivariate Normal integral to approximate the likelihood

could potentially hinder the exploration of higher dimensional designs, motivating the

need to develop fast and/or approximate methods to evaluate this integral.

The performance of the total entropy designs in both model discrimination and parameter

estimation, particularly in the FMD example, motivates the use of total entropy for

design problems in other areas such as systems biology and queueing systems. Further,

extending the proposed methodology to derive higher dimensional designs using more

sophisticated posterior approximation methods such as the Laplace approximation or

Laplace importance sampling instead of importance sampling from the prior could be

another potential avenue for future research.


5 A synthetic likelihood-based Laplace approximation for efficient design of biological processes


Statement for Authorship

This chapter has been written as a journal article. The authors listed below have certified

that:

(a) They meet the criteria for authorship as they have participated in the conception,

execution or interpretation of at least the part of the publication in their field of

expertise;

(b) They take public responsibility for their part of the publication, except for the

responsible author who accepts overall responsibility for the publication;

(c) There are no other authors of the publication according to these criteria;

(d) Potential conflicts of interest have been disclosed to granting bodies, the editor

or publisher of the journals of other publications and the head of the responsible

academic unit; and

(e) They agree to the use of the publication in the student’s thesis and its publication

on the Australian Digital Thesis database consistent with any limitations set by

publisher requirements.

Dehideniya M. B., Drovandi C. C., Overstall A. M., and McGree J. M. A synthetic

likelihood-based Laplace approximation for efficient design of biological processes. Electronic Journal of Statistics (Submitted for publication).

Contributor Statement of contribution

M. B. Dehideniya Developed and implemented the statistical methods, wrote

the manuscript, revised the manuscript as suggested by

co-authors.

Signature and date:

J. M. McGree Initiated the research concept, supervised research, assisted in interpreting results,

critically reviewed manuscript.

C. C. Drovandi Supervised research, assisted in interpreting results,

critically reviewed manuscript.

A. M. Overstall Proposed additional application, critically reviewed manuscript.

Principal Supervisor Confirmation: I have sighted email or other correspondence for all

co-authors confirming their authorship.

Name: James McGree    Signature: ______________    Date: 05/07/2019


5.1 Abstract

Complex models used to describe biological processes in epidemiology and ecology often

have computationally intractable or expensive likelihoods. This poses significant challenges
in terms of Bayesian inference, and even more so in the design of efficient

experiments. This is because Bayesian designs are found by maximising the expectation

of a utility function over a design space. The difficulty comes from having to approximate

this expected utility as it requires sampling from or approximating a large number of

posterior distributions. This renders approaches adopted in inference computationally

infeasible to implement in design. Consequently, design in such fields has been limited to

a small number of dimensions or a restricted range of utilities. To overcome such limita-

tions, we propose a synthetic likelihood-based Laplace approximation for approximating

utility functions for models with intractable likelihoods. As will be seen, the proposed

approximation is flexible in that a wide range of utility functions can be considered, and

computationally efficient when compared to alternative methods for inference. To explore

the validity of this approximation, an illustrative example from epidemiology is consid-

ered. Then, our approach is used to design experiments with a relatively large number of

observations in two motivating applications from epidemiology and ecology.

5.2 Introduction

Designing experiments to collect data that are as informative as possible about the process

of interest is an important task in scientific investigation in, for instance, epidemiology

(Orsel et al., 2007), system biology (Faller et al., 2003) and ecology (Zhang et al., 2018).

These biological systems require the development and use of realistic statistical mod-

els that involve computationally intensive or intractable likelihoods. Unfortunately, this

poses significant challenges in the design of experiments, and has led to a number of recent
developments; see Ryan et al. (2016b) for a recent review.

Often when designing an experiment, there is uncertainty about the appropriate

stochastic process to describe the observed data and also the ensuing model parame-

ters. In the design literature for models with intractable likelihoods, these two levels of

uncertainty have been considered separately. Cook et al. (2008) considered the Kullback-Leibler
distance (KLD) between the prior and the posterior as a utility function to find

designs for parameter estimation, and they used the moment closure method to approx-

imate the likelihood. Alternatively, methods from approximate Bayesian computation

(ABC) have been used to approximate utility functions based on summaries of the posterior

distribution (Drovandi and Pettitt, 2013, Price et al., 2016) to design efficient experiments

for parameter estimation. In the presence of model uncertainty, designs have been found

for discriminating between competing models with intractable likelihoods (Dehideniya

et al., 2018b, Hainy et al., 2018). To approximate utility functions, Dehideniya et al.

(2018b) considered the ABC rejection algorithm for model choice (Grelaud et al., 2009)


while Hainy et al. (2018) proposed a classification based approach. Extensions to dual

purpose experiments for model discrimination and parameter estimation have also been

proposed by Dehideniya et al. (2018a) where the total entropy utility (Borth, 1975) was

considered. To approximate this utility, a synthetic likelihood approach for discrete data

was developed, and it was shown that such an approach allows a wide variety of utility

functions to be considered in design for models with intractable likelihoods.

Until the recent work by Overstall and McGree (2018), designs for models with intractable

likelihoods had been limited to a small number of design dimensions. Overstall and Mc-

Gree (2018) used emulation within an indirect inference framework to avoid the evalu-

ation of computationally expensive or intractable likelihoods, and were able to consider

design spaces of an order of magnitude larger than what has been considered previously.

However, their approach is limited to likelihood-based utility functions, that is, utility

functions that can be expressed in terms of the likelihood and/or the marginal likeli-

hood. In this paper, we address this limitation by extending the work of Dehideniya et al.

(2018a) to high dimensions thus allowing a wide variety of utility functions to be consid-

ered when designing large-scale experiments for models with intractable likelihoods. In

our approach, we use summary statistics to avoid the curse of dimensionality, and also

develop a Laplace-based approximation to the posterior distribution. This enables fast

posterior inference, and thus allows designs to be found in reasonable time frames.

This paper is outlined as follows. Section 5.3 presents the proposed synthetic likelihood-based Laplace approximation for models with intractable likelihoods. Section 5.4 provides background on Bayesian experimental design with a description of the utility functions considered in this paper. Within this section, we also show how to approximate these utility functions within our inference framework. Then, an illustrative example is considered in Section 5.5 along with the motivating examples for this work. Finally, a summary of this work is given in Section 5.6 along with some limitations and suggestions for future research directions.

5.3 Inference framework

The synthetic likelihood (Wood, 2010) approach is a method for approximating the likelihood of observed data y = {y1, y2, . . . , yL} for a given value of the model parameters θ when the likelihood itself is intractable. This is achieved by assuming that the summary statistics S conditional on θ follow a multivariate normal distribution with mean µ(θ) and variance-covariance matrix Σ(θ). In general, µ(θ) and Σ(θ) cannot be evaluated analytically for a given value of θ, but they can be approximated by simulating n datasets from the model conditional on θ and evaluating the summary statistics for each dataset. This yields a distribution of summary statistics from which µ(θ) and Σ(θ) can be estimated. Then, the log-likelihood of the observed summary statistics, sobs = S(y), can be approximated as follows:

Page 103: Optimal Bayesian experimental designs for complex models · intractable likelihoods meaning evaluating the likelihood many times is computationally infeasible. In recent years, methods

Chapter 5. A synthetic likelihood-based Laplace approximation for efficientdesigns

89

$$ l_s(s_{\mathrm{obs}} \mid \theta) = -\frac{1}{2}\left[ \log |\Sigma(\theta)| + (s_{\mathrm{obs}} - \mu(\theta))^{\top} \Sigma(\theta)^{-1} (s_{\mathrm{obs}} - \mu(\theta)) + L \log(2\pi) \right], \qquad (5.1) $$

where µ(θ) and Σ(θ) are the estimated mean vector and variance-covariance matrix of

the simulated summary statistics from the model of interest with parameter θ.
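As a concrete illustration, the following Python sketch (not the code used in this work) estimates the synthetic log-likelihood by simulation; the `simulate` and `summarise` functions are hypothetical placeholders for a model simulator and a summary-statistic mapping.

```python
import numpy as np

def synthetic_loglik(s_obs, theta, simulate, summarise, n_sims=200, rng=None):
    """Estimate the synthetic log-likelihood of observed summaries s_obs at theta.

    `simulate(theta, rng)` draws one dataset from the model and
    `summarise(y)` maps a dataset to a vector of summary statistics.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Simulate n_sims datasets at theta and compute their summary statistics.
    sims = np.array([summarise(simulate(theta, rng)) for _ in range(n_sims)])
    mu = sims.mean(axis=0)                              # estimate of mu(theta)
    sigma = np.atleast_2d(np.cov(sims, rowvar=False))   # estimate of Sigma(theta)
    diff = np.asarray(s_obs, dtype=float) - mu
    _, logdet = np.linalg.slogdet(sigma)
    quad = diff @ np.linalg.solve(sigma, diff)
    d = len(mu)                                         # dimension of the summary vector
    # Gaussian log-density of s_obs under N(mu(theta), Sigma(theta)).
    return -0.5 * (logdet + quad + d * np.log(2 * np.pi))
```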

Given the above approximation to the log-likelihood, we now outline how we propose to approximate the posterior distribution for models with intractable likelihoods. Previous studies have considered importance sampling for models with tractable (McGree et al., 2012, Weir et al., 2007) and intractable (Dehideniya et al., 2018a, Ryan et al., 2016a) likelihoods as a fast approximation when the distance between the prior and posterior is relatively small. However, as the number of observations increases, the resultant posterior distribution can be considerably different from the prior, and importance sampling may provide an inefficient approximation to the posterior. One alternative that has been used in Bayesian design is the Laplace approximation (Long et al., 2013, Overstall et al., 2018, Ryan et al., 2015). Suppose we have observed data y under design d generated from model m with qm parameters θm. Then, the Laplace approximation represents the posterior distribution of θm by a multivariate normal distribution with mean θ∗m and covariance matrix H(θ∗m)−1, where θ∗m is the posterior mode and H(θ∗m)−1 is the inverse of the Hessian matrix of the negative log posterior evaluated at θ∗m. One advantage of using the Laplace approximation is the availability of an approximation to the model evidence which can be used for model choice. The approximation to the model evidence is as follows:

$$ p(y \mid m, d) = (2\pi)^{q_m/2} \, \left| H(\theta^{*}_m)^{-1} \right|^{1/2} \, p(y \mid \theta^{*}_m, m, d) \, p(\theta^{*}_m \mid m). \qquad (5.2) $$

When there are K candidate models to describe the process of interest, the posterior

model probability of model m is estimated by,

$$ p(m \mid y, d) = \frac{p(y \mid m, d)\, p(m)}{\sum_{m'=1}^{K} p(y \mid m', d)\, p(m')}. \qquad (5.3) $$

As pointed out by Wood (2010), due to the small-scale noise associated with evaluating ls(sobs|θ), derivative-based numerical optimisation approaches cannot be used to find the posterior mode. Thus, in this work the Nelder-Mead algorithm for derivative-free optimisation (Kelley, 1999) is used to find the parameter value which maximises the posterior density. To approximate the Hessian matrix, the methods proposed by Fasiolo (2016) can be considered, where the Hessian of the synthetic likelihood function is estimated at a given parameter value θ based on a set of local regression models fitted between the model parameters and the summary statistics (see Algorithm 2 on page 61 of Fasiolo (2016)).
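A minimal sketch of this Laplace step is given below, assuming a `neg_log_post` function built from the synthetic log-likelihood plus the log prior, and a `hessian_fn` that returns a regression-based Hessian estimate; both are hypothetical placeholders rather than the implementation used in this work.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_approximation(neg_log_post, theta_init, hessian_fn):
    """Laplace approximation to the posterior and the model evidence (Equation (5.2)).

    neg_log_post(theta) is the negative log posterior (synthetic log-likelihood
    plus log prior, negated); hessian_fn(theta) estimates its Hessian, e.g. via
    a local-regression approach as in Fasiolo (2016).
    """
    # Derivative-free search for the posterior mode, since the noise in the
    # synthetic likelihood rules out gradient-based optimisation.
    res = minimize(neg_log_post, theta_init, method="Nelder-Mead",
                   options={"xatol": 1e-4, "fatol": 1e-4, "maxiter": 2000})
    theta_star = np.atleast_1d(res.x)
    H = hessian_fn(theta_star)                   # Hessian of the negative log posterior
    cov = np.linalg.inv(np.atleast_2d(H))        # posterior covariance approximation
    q = len(theta_star)
    # Log evidence: (q/2) log(2*pi) + 0.5 log|H^{-1}| + log p(y | theta*) + log p(theta*),
    # where the last two terms equal -res.fun because neg_log_post includes the prior.
    log_evidence = 0.5 * q * np.log(2 * np.pi) + 0.5 * np.linalg.slogdet(cov)[1] - res.fun
    return theta_star, cov, log_evidence
```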


Adopting the above approach for Bayesian inference in design has a number of advantages over the recent work of Dehideniya et al. (2018a), who proposed a synthetic likelihood approximation based on the full dataset with a continuity correction for discrete observations. Although their approximation was shown to work well when a few data points were observed, it becomes computationally expensive as the number of observations increases. In contrast, the synthetic likelihood approach based on summary statistics provides a computationally feasible method of approximating the likelihood of a large number of observations. Thus, the proposed approach is more suitable for designing experiments which yield a large number of observations, where large is defined in the context of design for models with intractable likelihoods.

Another advantage of our proposed approach is that the Laplace approximation requires fewer likelihood evaluations when compared to alternative posterior approximations such as importance sampling and Markov chain Monte Carlo. Consequently, the use of the Laplace approximation reduces the number of datasets that need to be simulated from the model. While model simulation is generally an efficient process, having to repeat this a large number of times imposes a significant computational burden. Indeed, in the Bayesian experimental design literature, pre-simulated data have been used to avoid the computational cost of simulating a large number of datasets during the optimisation, for instance Drovandi and Pettitt (2013), Hainy et al. (2016) and Price et al. (2016). However, this results in having to consider a discrete design space, and therefore potentially suboptimal designs. In adopting our proposed methods, a continuous design space can be considered, which should lead to the location of designs that perform better with respect to the experimental goal/s. The recent work of Hainy et al. (2018) also proposed a classification-based approach to reduce the number of model simulations, thus allowing a continuous design space to be considered. However, their approach is currently limited to a small number of utility functions for model discrimination.

5.4 Bayesian experimental designs

A properly planned experiment will provide informative data for subsequent inferences of interest, such as parameter estimation, model discrimination and/or prediction. Thus, prior to any experimentation, the values of controllable variables need to be carefully chosen to ensure the data are as informative as possible. In the context of experimental design, a set of possible values for these controllable variables is defined as the design, and the informativeness of data obtained from such a design is measured by a utility function which is defined according to the experimental goal/s. Applications considered in this paper focus on when to observe a biological process of interest, and thus time is the design/controllable variable. In the presence of model uncertainty, define K competing models, each with prior model probability p(m) and model parameters θm with prior distribution p(θm|m). Then, the selection of the values for the controllable variables can be defined as a decision problem under uncertainty about the parameters θm, the model m and the data y yet to be observed under design d (Lindley et al., 1978). Therefore, the


expected utility is considered in finding the optimal design. This expected utility can be

defined as follows:

$$ U(d) = \sum_{m=1}^{K} p(m) \left\{ \int_{y} \int_{\theta_m} u(d, y, \theta_m, m)\, p(y \mid \theta_m, m, d)\, p(\theta_m \mid m)\, d\theta_m\, dy \right\}, \qquad (5.4) $$

where p(y |θm, m, d) is the likelihood of y given the model parameters θm, model m and design d. The utility u(d,y,θm,m) can be defined to encapsulate the purpose of the experiment, such as estimation of parameters across all K competing models (McGree et al., 2016), discriminating between competing models (Drovandi et al., 2014) or the dual goals of parameter estimation and model discrimination (Borth, 1975, McGree, 2017). When the utility function u(.) is independent of the model parameters θm, Equation (5.4) can be simplified to

$$ U(d) = \sum_{m=1}^{K} p(m) \left\{ \int_{y} u(d, y, m)\, p(y \mid m, d)\, dy \right\}. \qquad (5.5) $$

Often the expected utility is not available in closed form and thus needs to be approximated, for example, by Monte Carlo integration. When u(.) depends on the model parameters θm, the approximate expected utility of design d can be expressed as follows:

$$ U(d) \approx \sum_{m=1}^{K} p(m)\, \frac{1}{Q} \sum_{j=1}^{Q} u(d, y_{jm}, \theta_{jm}, m), \qquad (5.6) $$

where yjm is a possible dataset generated from model m using model parameters θjm and design d. Similarly, when u(.) is independent of θm, the expected utility given by Equation (5.5) can be approximated as

$$ U(d) \approx \sum_{m=1}^{K} p(m)\, \frac{1}{Q} \sum_{j=1}^{Q} u(d, y_{jm}, m), \qquad (5.7) $$

where yjm is a possible dataset generated from model m under design d. Generally, the utility function u(.) is based on some posterior quantity. Therefore, the evaluation of U(d) requires K × Q posterior evaluations or approximations, which is a computationally intensive task in general but particularly so for models with intractable likelihoods. The accuracy of this approximation increases as the number of Monte Carlo samples Q increases for each model m. Drovandi and Tran (2018) demonstrated that the accuracy of U(d) for a given number of Monte Carlo samples can be increased by using the randomised quasi-Monte Carlo (RQMC) method. Following Drovandi and Tran (2018), here a Sobol


sequence on (0, 1]^(qm+n) is used to first generate the qm parameters θm of model m and then simulate n observations y from the model conditional on θm and d. In order to obtain an unbiased estimate of U(d), these deterministic sequences are randomised using the Owen-type scrambling (Owen, 1997) implemented in Christophe and Petr (2018), with a different seed value used each time a sequence is simulated. In our implementation, the system time is used as the seed value for each simulated sequence.
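The sketch below illustrates such an RQMC estimate of the expected utility using the scrambled Sobol generator in scipy. The `prior_ppf`, `simulate_ppf` and `utility` entries are hypothetical placeholders, and the sketch assumes that both the prior draw and the model simulation can be driven by uniform variates via inverse-transform sampling.

```python
import time
import numpy as np
from scipy.stats import qmc

def expected_utility_rqmc(design, models, Q=512):
    """RQMC approximation to the expected utility, Equations (5.6)/(5.7).

    `models` is a list of dicts with keys 'prior_prob' (p(m)), 'n_params' (q_m),
    'prior_ppf' (uniforms -> prior draw of theta_m), 'simulate_ppf'
    (theta_m, uniforms, design -> simulated dataset y) and 'utility'
    (design, y, theta_m, m -> utility value).
    """
    total = 0.0
    for m, model in enumerate(models):
        dim = model["n_params"] + len(design)          # q_m + n uniforms per draw
        # Owen-scrambled Sobol points; a fresh seed randomises each sequence.
        sobol = qmc.Sobol(d=dim, scramble=True, seed=int(time.time()) + m)
        u = sobol.random(Q)                            # Q is ideally a power of two
        utilities = np.empty(Q)
        for j in range(Q):
            theta = model["prior_ppf"](u[j, :model["n_params"]])
            y = model["simulate_ppf"](theta, u[j, model["n_params"]:], design)
            utilities[j] = model["utility"](design, y, theta, m)
        total += model["prior_prob"] * utilities.mean()
    return total
```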

5.4.1 Utility functions for parameter estimation

In this work, we consider two utility functions for designing efficient experiments to estimate the parameters of models with intractable likelihoods, namely the KLD utility and the negative squared error loss utility.

5.4.1.1 Kullback-Leibler divergence utility

The KLD between the prior and posterior distributions of parameters θm of model m has

been commonly used as a utility function to design Bayesian experiments for estimating

parameters, for instance Cook et al. (2008), Ryan et al. (2014), Ryan (2003). The KLD

utility is given by,

$$ u_{KLD}(d, y, m) = \int_{\theta_m} \log\left( \frac{p(\theta_m \mid y, m, d)}{p(\theta_m \mid m)} \right) p(\theta_m \mid y, m, d)\, d\theta_m. \qquad (5.8) $$

When p(θm|m) follows a multivariate normal distribution with mean µ1 and variance-

covariance matrix Σ1, the KLD utility can be calculated analytically based on the pos-

terior distribution of θm as given by the Laplace approximation. It can be expressed as

follows:

$$ u_{KLD}(d, y, m) = \frac{1}{2}\left( \operatorname{tr}\left(\Sigma_1^{-1} \Sigma_2\right) + (\mu_1 - \mu_2)^{\top} \Sigma_1^{-1} (\mu_1 - \mu_2) - q_m + \log\left( \frac{\det(\Sigma_1)}{\det(\Sigma_2)} \right) \right), \qquad (5.9) $$

where µ2 and Σ2 are the estimated posterior mean and variance-covariance matrix, respectively, and tr(A) is the trace of the matrix A.
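For a Gaussian prior and the Gaussian Laplace posterior, Equation (5.9) can be evaluated directly; a minimal sketch is given below.

```python
import numpy as np

def kld_utility_gaussian(mu1, sigma1, mu2, sigma2):
    """KLD utility of Equation (5.9): the Kullback-Leibler divergence between the
    Gaussian prior N(mu1, Sigma1) and the Gaussian (Laplace) posterior N(mu2, Sigma2),
    i.e. KL(posterior || prior)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    q = len(mu1)
    diff = mu1 - mu2
    sigma1_inv = np.linalg.inv(sigma1)
    trace_term = np.trace(sigma1_inv @ sigma2)      # tr(Sigma1^{-1} Sigma2)
    quad_term = diff @ sigma1_inv @ diff            # (mu1 - mu2)' Sigma1^{-1} (mu1 - mu2)
    logdet1 = np.linalg.slogdet(sigma1)[1]
    logdet2 = np.linalg.slogdet(sigma2)[1]
    return 0.5 * (trace_term + quad_term - q + logdet1 - logdet2)
```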

5.4.1.2 Negative squared error loss utility

When the goal of the experiment is to obtain a posterior summary such as a posterior

mean that is as close as possible to the truth, the negative squared error loss (NSEL)

utility can be used to design such an experiment. Given that θ is a vector of q elements,

the NSEL utility can be expressed as follows:


$$ u_{NSEL}(d, y, \theta) = -\sum_{i=1}^{q} \left( \theta_i - E[\theta_i \mid y, d] \right)^2, \qquad (5.10) $$

where θi is the ith parameter value of θ used to generate y under design d. The NSEL

utility has been used to find designs for parameter estimation of logistic regression mod-

els (Overstall and Woods, 2017). Following the approach of Overstall et al. (2018), we

approximate E[θi|y,d] by the posterior mode θ∗ found using the Laplace approximation.
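Under this substitution, evaluating the NSEL utility for a simulated dataset is immediate; a minimal sketch, assuming the generating parameter values and the Laplace posterior mode are available, is as follows.

```python
import numpy as np

def nsel_utility(theta_true, theta_star):
    """NSEL utility, Equation (5.10), with E[theta_i | y, d] replaced by the
    Laplace posterior mode theta_star."""
    theta_true = np.asarray(theta_true, dtype=float)
    theta_star = np.asarray(theta_star, dtype=float)
    return -np.sum((theta_true - theta_star) ** 2)
```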

5.4.2 Utility function for model discrimination

The mutual information between the model indicator m and the observed data y under

design d has been used as a discrimination utility (Box and Hill, 1967, Drovandi et al.,

2014) to design experiments for model discrimination, and it can be expressed as follows:

$$ u_{MI}(d, y, m) = \log p(m) - \log p(m \mid y, d). \qquad (5.11) $$

As described in Section 5.3, the posterior model probability of model m can be approximated via the Laplace approximation. Denoting the Laplace-based approximation to p(m | y, d) from Equation (5.3) by \hat{p}(m | y, d), the mutual information utility can then be approximated as

$$ u_{MI}(d, y, m) \approx \log p(m) - \log \hat{p}(m \mid y, d). \qquad (5.12) $$

5.4.3 Utility function for dual purpose experiments

When more than one model is being considered to describe the data generated from the process of interest, it is beneficial to design a dual purpose experiment to discriminate between the competing models and estimate the parameters of all competing models. The total entropy (Borth, 1975) about the model and model parameters has been used as a utility function to design dual purpose experiments for model discrimination and parameter estimation (Dehideniya et al., 2018a, McGree, 2017). This utility can be expressed as the sum of the mutual information utility for model discrimination and the KLD utility,

uTE(d,y,m) = uMI(d,y,m) + uKLD(d,y,m). (5.13)

By substituting the approximated uKLD(d,y,m) and uMI(d,y,m) in Equation (5.13), the

total entropy utility can be approximated.
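Combining the Laplace-based log evidences with Equations (5.3), (5.11) and (5.13) gives the following sketch for evaluating the total entropy utility for the data-generating model m; the inputs are hypothetical (log evidences computed as in Section 5.3 and a KLD utility value computed as in Equation (5.9)).

```python
import numpy as np
from scipy.special import logsumexp

def posterior_model_probs(log_evidences, prior_probs):
    """Posterior model probabilities, Equation (5.3), from (Laplace) log evidences."""
    log_post = np.asarray(log_evidences, dtype=float) + np.log(prior_probs)
    return np.exp(log_post - logsumexp(log_post))

def total_entropy_utility(m, log_evidences, prior_probs, u_kld_m):
    """Total entropy utility, Equation (5.13), for data-generating model m:
    the mutual information utility (as written in Equation (5.11)) plus the
    KLD utility u_kld_m for model m."""
    post = posterior_model_probs(log_evidences, prior_probs)
    u_mi = np.log(prior_probs[m]) - np.log(post[m])   # Equation (5.11) as stated above
    return u_mi + u_kld_m
```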


5.4.4 Optimisation algorithm

Locating optimal designs involves maximising U(d) over a design space. In this work, we used the approximate coordinate exchange (ACE) algorithm (Overstall and Woods, 2017) to find optimal designs in a continuous design space. The ACE algorithm iteratively optimises one design variable at a time by emulating the expected utility along the given design dimension and optimising the predicted value given by the emulator. At each iteration, the newly found design is compared with the current design, and is selected based on a Bayesian hypothesis test. When implementing ACE, a number of tuning parameters need to be specified. In the examples described in the following section, 5000 and 500 Monte Carlo samples were used for the hypothesis test and for constructing the emulator, respectively. Otherwise, the default settings for ACE were used as given in the R package acebayes (Overstall et al., 2017a).
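For intuition, the sketch below implements a much-simplified coordinate-wise search over a continuous design space. Unlike ACE, it evaluates the (noisy) expected utility directly on random one-dimensional candidates rather than emulating it with a Gaussian process and applying a Bayesian hypothesis test, so it should be read as illustrative only.

```python
import numpy as np

def coordinate_exchange(expected_utility, d_init, lower, upper,
                        n_candidates=20, n_sweeps=10, rng=None):
    """Simplified coordinate-wise maximisation of an expected utility estimate.

    expected_utility(d) returns a (possibly noisy) estimate of U(d); lower and
    upper bound each design coordinate (e.g. the earliest and latest sampling times).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = np.array(d_init, dtype=float)
    best_u = expected_utility(d)
    for _ in range(n_sweeps):
        for k in range(len(d)):
            # Random one-dimensional candidates for the k-th design coordinate.
            for candidate in rng.uniform(lower, upper, size=n_candidates):
                trial = d.copy()
                trial[k] = candidate
                u = expected_utility(trial)
                if u > best_u:          # keep the exchange if it improves U(d)
                    d, best_u = trial, u
    return np.sort(d), best_u           # sorted, since the coordinates are sampling times
```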

5.5 Examples

In this section, we consider one illustrative example and two motivating examples to demonstrate the practical implications of our proposed methodologies. First, the proposed utility approximation is validated by evaluating the total entropy utility of designs for dual purpose experiments for two epidemiological models, namely the death model and the Susceptible-Infected (SI) model. Secondly, the performance of our approach is demonstrated through designing experiments to learn about foot and mouth disease. Finally, as the third example, we consider designing laboratory microcosm experiments to estimate the parameters of a predator-prey model from ecology.

5.5.1 Example 1 - Dual purpose designs for the death and SI models

The death and SI models can be used to describe the spread of a disease among a closed

population of size N . In this example, optimal time points which yield informative obser-

vations for both discriminating between these competing models and estimating param-

eters of the models are considered.

The death model (Cook et al., 2008) divides the population into two sub-populations,

susceptible and infected. The state of the continuous-time Markov chain (CTMC) at

time t is defined as the number of infected individuals at time t, I(t). Given I(t) = i, the

transition probability of the possible state at t+ ∆t is given by

$$ P[\, i + 1 \mid i \,] = \beta_1 (N - i)\, \Delta t + O(\Delta t), $$

where β1 is the rate at which susceptible individuals become infected due to environmental

sources.


The SI model (Cook et al., 2008) assumes that the infected individuals in the population also contribute to the spread of the disease, represented by an additional parameter β2, in addition to the environmental sources. Given that I(t) = i, the transition probability of the possible state at t + ∆t is given by

$$ P[\, i + 1 \mid i \,] = (\beta_1 + \beta_2 i)(N - i)\, \Delta t + O(\Delta t). $$

The same priors for the unknown parameters considered by Dehideniya et al. (2018a) are used, and they are as follows: log(β1) ∼ N(−0.48, 0.3²) for the death model, and log(β1) ∼ N(−1.1, 0.4²) and log(β2) ∼ N(−4.5, 0.63²) for the SI model.
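For reference, a minimal event-by-event (Gillespie-style) simulator for these two continuous-time Markov chains is sketched below (setting β2 = 0 recovers the death model). The default population size and initial number of infecteds are illustrative assumptions rather than values stated here.

```python
import numpy as np

def simulate_epidemic(beta1, beta2, obs_times, N=50, i0=0, rng=None):
    """Simulate the death model (beta2 = 0) or the SI model as a CTMC and
    return the number of infected individuals at each observation time.

    The infection rate at I(t) = i is (beta1 + beta2 * i) * (N - i); waiting
    times between infections are exponential, so the clock can be restarted
    at each observation time by memorylessness.
    """
    rng = np.random.default_rng() if rng is None else rng
    t, i = 0.0, i0
    counts = []
    for t_obs in np.sort(np.asarray(obs_times, dtype=float)):
        while i < N:
            rate = (beta1 + beta2 * i) * (N - i)
            t_next = t + rng.exponential(1.0 / rate)
            if t_next > t_obs:
                break                      # no further infections before t_obs
            t, i = t_next, i + 1           # one more individual becomes infected
        counts.append(i)
        t = t_obs                          # valid restart by memorylessness
    return np.array(counts)
```

As a usage example, a prior predictive draw for the death model could be obtained by sampling beta1 = np.exp(np.random.normal(-0.48, 0.3)) from the prior above and calling simulate_epidemic(beta1, 0.0, obs_times).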

Forming informative summary statistics for inference is generally a difficult task, as the summaries need to be informative over the entire prior predictive distribution. This difficulty is further compounded in the context of experimental design, as the summary statistics not only need to be informative across the entire prior predictive distribution but also across the entire design space. To handle this, we propose to use summary statistics that are informative across a subset of the design space, where this subset is defined by the perceived informativeness of the data obtained from a given design. That is, it seems intuitively reasonable that, in order to estimate the probability of becoming infectious given an individual is susceptible, data on individuals in both states are needed. To be more precise, designs which observe the process in the latter stages generally yield all individuals being infected (see Figure 5.2a). Consequently, observations of the population taken at this stage of the experiment will yield little information, and such designs are therefore avoided. Thus, it is this intuition that is used in subsetting the design space, and this is achieved through inspection of the prior predictive data from a given design. Here, we propose to subset the design space in such a way that the set of observation times contains at least one time point during the first half of the experiment. In terms of summary statistics, here we propose to use the mean and variance of the observed counts, as these were shown to be informative across the prior predictive distribution for a random selection of designs (see Appendix C.1).

The Hessian approximation proposed in Fasiolo (2016) is based on a set of regression models fitted between the model parameters and the summary statistics. Fasiolo (2016) also proposes an additional regression step, in which each model parameter is regressed against the summary statistics, and the fitted model parameters are used as summary statistics to improve the scalability of the Hessian approximation as the number of summary statistics increases (see Section 4.5.1 of Fasiolo (2016)). In order to ensure reasonable accuracy in the approximation, we propose to assess the validity of these additional linear regression models based on a measure of goodness-of-fit (the coefficient of multiple determination, R²). This enables the identification of datasets (y in Equation (5.4)) which yield poor approximations of the Hessian matrix and consequently of the utility. Then, based on a defined threshold value for R², we propose to substitute poor estimates of the utility function


with a minimum utility value. As such, the estimate of the expected utility will be down-weighted, and the corresponding designs potentially avoided within the optimisation. For the applications in this paper, it was found that for models with a single parameter a threshold value of around 0.7 should be used, while for the other models a lower threshold can be considered (around 0.1).
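As an illustration of this screening step, the sketch below computes the R² of a linear regression of each model parameter on the simulated summary statistics and flags the utility evaluation as unreliable when any R² falls below the chosen threshold; the default threshold and the use of scikit-learn are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def screen_hessian_fit(theta_sims, summary_sims, threshold=0.7):
    """Goodness-of-fit screen for the regression-based Hessian approximation.

    theta_sims has shape (n_sims, q) and summary_sims has shape (n_sims, p).
    Returns (reliable, r2_values); if `reliable` is False, the corresponding
    utility estimate is replaced by a minimum utility value.
    """
    theta_sims = np.asarray(theta_sims, dtype=float)
    if theta_sims.ndim == 1:
        theta_sims = theta_sims[:, None]
    r2 = [LinearRegression().fit(summary_sims, theta_sims[:, k])
                            .score(summary_sims, theta_sims[:, k])
          for k in range(theta_sims.shape[1])]
    return min(r2) >= threshold, r2
```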

Obviously, when this occurs, a bias is introduced into the estimation of the expected utility. Thus, we investigated the effect of this bias by comparing the expected utility of randomly selected designs evaluated based on the actual likelihood with that evaluated under our synthetic likelihood approach, and these results are shown in Figure 5.1.

As is evident from Figure 5.1, the proposed approach preserves a monotonic relationship between the approximated and actual utility values, and reasonably approximates the utility values for designs with higher utility values. It is noted that the proposed approach provides a biased estimate of the mutual information utility for some designs, see (×) in Figure 5.1b. Given that these designs have relatively low expected utility values under the actual utility evaluation, the maximisation of the expected utility should not be affected by the bias introduced in handling the poor approximation of the Hessian matrix. Therefore, we propose that our approximation can be used to locate optimal designs.

Figure 5.1: Comparison of the estimated expected utility of designs with 15 design points according to the (a) total entropy, (b) mutual information and (c) KLD utilities using the Laplace approximation based on synthetic and actual likelihoods. Here, designs with a biased estimate of the expected utility are represented by (×). In each plot, the y = x line indicates a perfect match between the approximated and actual utility evaluations.

Optimal designs under total entropy, mutual information and KLD utility functions were

located by the ACE algorithm. The optimal designs are shown in Figure 5.2b. The

expected utility values of the optimal designs were re-evaluated 100 times with different

draws from the prior predictive distribution, and the mean and standard deviation of

these expected utility values are given in Table 5.1.


Figure 5.2: Prior predictive distribution of the number of infecteds based on the death model (solid) and SI model (dashed), given in sub-figure (a). Here, dot-dashed and dotted lines represent the 10% and 90% prior predictive quantiles of the death and SI models, respectively. Sub-figure (b) illustrates the optimal designs found under the total entropy utility (∗) along with the KLD utility (×) and mutual information utility (+) for the death and SI models.

The optimal designs based on total entropy were compared with the discrimination and estimation designs in terms of addressing each experimental goal individually. Further, for each optimal design, an equally spaced design with the same number of design points was also considered. In this comparison, for each design, 1000 datasets were simulated from both


Table 5.1: Expected utility values (standard deviation) of optimal designs derived under different utility functions.

Utility function        Number of design points |d|    U(d*) (SD)
Mutual Information      8                              -0.444 (0.002)
                        10                             -0.434 (0.002)
                        15                             -0.423 (0.002)
Total Entropy           8                              1.891 (0.006)
                        10                             1.931 (0.006)
                        15                             2.007 (0.007)
KLD                     8                              2.328 (0.007)
                        10                             2.375 (0.007)
                        15                             2.433 (0.007)

models, and posterior inference was undertaken based on the actual likelihood for the death model and an approximated likelihood for the SI model (Sidje, 1998). First, posterior model probabilities of the data generating model were estimated based on the Laplace approximation as described by Equations (5.2) and (5.3) in Section 5.3. These results are shown in Figure 5.3. From this figure, it is evident that all optimal designs perform equally well for model discrimination, with some advantage over the equally spaced design.

Secondly, the designs were compared in terms of parameter estimation for both models. For the death model, the log reciprocal of the posterior variance of β1 was used, and for the SI model the log determinant of the inverse of the posterior variance-covariance matrix of (β1, β2) was used to measure the performance of the designs for parameter estimation. In this validation, Laplace importance sampling (LIS) was considered to approximate the posterior of the model parameters more accurately. LIS is a combination of the Laplace approximation and importance sampling, where the Laplace approximation is used as the importance distribution. Following Ryan et al. (2015), here we multiplied the variance-covariance matrix obtained via the Laplace approximation by 2 in forming the importance distribution to ensure it covered the tails of the target distribution. The results are shown in Figure 5.4. As can be seen, all optimal designs perform similarly well, and consistently outperform the equally spaced design.
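A minimal sketch of this LIS step is given below, assuming an unnormalised log posterior function (for example, the synthetic log-likelihood plus log prior) and the mode and covariance returned by the Laplace approximation; the factor-of-2 inflation of the covariance follows the description above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def laplace_importance_sampling(log_post, theta_star, cov, n_samples=2000,
                                inflate=2.0, rng=None):
    """Laplace importance sampling: an inflated Laplace approximation is used as
    the importance distribution for the (unnormalised) log posterior log_post."""
    rng = np.random.default_rng() if rng is None else rng
    proposal = multivariate_normal(mean=theta_star, cov=inflate * np.asarray(cov))
    samples = np.asarray(proposal.rvs(size=n_samples, random_state=rng)).reshape(n_samples, -1)
    log_w = np.array([log_post(th) for th in samples]) - proposal.logpdf(samples)
    log_w -= log_w.max()                          # stabilise before exponentiating
    w = np.exp(log_w)
    w /= w.sum()                                  # self-normalised importance weights
    post_mean = w @ samples
    centred = samples - post_mean
    post_cov = (centred * w[:, None]).T @ centred  # weighted covariance estimate
    return samples, w, post_mean, post_cov
```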


Figure 5.3: The posterior model probability of the data generating model obtained for observations generated from the (a) death model and (b) SI model according to the optimal designs and equally spaced designs.


Figure 5.4: The log determinant of the inverse of the posterior variance-covariance matrix of the parameters of the data generating model when observations are generated from the (a) death model and (b) SI model according to the optimal designs and equally spaced designs.


5.5.2 Example 2 - Dual purpose designs for foot and mouth disease

Foot and mouth disease (FMD) is a contagious disease which affects livestock such as cattle, pigs and sheep (Knight-Jones and Rushton, 2013). Both the Susceptible-Infectious-Recovered (SIR) model (Orsel et al., 2007) and the Susceptible-Exposed-Infectious-Recovered (SEIR) model (Backer et al., 2012) have been proposed to describe epidemic data from FMD.

The SIR model assumes that susceptible individuals become infectious immediately after they make an infected contact with an infectious individual in the population. The infectious individuals then recover after a time T ∼ exp(α), becoming immune and no longer spreading the disease to other individuals. As in the previous example, the spread of FMD among N individuals can be described by a CTMC model. Given that there are s susceptible and i infectious individuals at time t, the probabilities of the possible events in the next infinitesimal time period ∆t are given by

$$ P[\, s - 1,\, i + 1 \mid s,\, i \,] = \frac{\beta s i}{N}\, \Delta t + O(\Delta t), $$
$$ P[\, s,\, i - 1 \mid s,\, i \,] = \alpha i\, \Delta t + O(\Delta t), $$

where β is the rate at which an infectious individual makes infected contacts per unit time.

In contrast, the SEIR model assumes that susceptible individuals do not become infectious immediately after they have been exposed to the disease, but after a time TE ∼ exp(αE). During this period, exposed individuals do not show any symptoms of being infected, and therefore the number of exposed individuals e at time t is unobservable. Once the exposed individuals become infectious, they contribute to the spread of the disease and recover after a time TI ∼ exp(αI). Given that there are s susceptible, e exposed and i infectious individuals at time t, the probabilities of the possible events in the next infinitesimal time period ∆t are given by

$$ P[\, s - 1,\, e + 1,\, i \mid s,\, e,\, i \,] = \frac{\beta s i}{N}\, \Delta t + O(\Delta t), $$
$$ P[\, s,\, e - 1,\, i + 1 \mid s,\, e,\, i \,] = \alpha_E e\, \Delta t + O(\Delta t), $$
$$ P[\, s,\, e,\, i - 1 \mid s,\, e,\, i \,] = \alpha_I i\, \Delta t + O(\Delta t), $$

where β is the rate at which an infectious individual makes infected contacts per unit time.
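For illustration, a minimal Gillespie-style simulator for the SIR model is sketched below (the SEIR model adds the exposed compartment analogously); the initial numbers of susceptible and infectious individuals follow the experimental set-up described later in this section.

```python
import numpy as np

def simulate_sir(beta, alpha, obs_times, s0=45, i0=5, rng=None):
    """Gillespie simulation of the SIR CTMC, returning (I, R) at each observation time.

    Event rates at state (s, i): infection beta * s * i / N and recovery alpha * i.
    """
    rng = np.random.default_rng() if rng is None else rng
    N = s0 + i0
    t, s, i, r = 0.0, s0, i0, 0
    observations = []
    for t_obs in np.sort(np.asarray(obs_times, dtype=float)):
        while True:
            rates = np.array([beta * s * i / N, alpha * i])
            total = rates.sum()
            if total == 0.0:
                break                       # epidemic has died out
            dt = rng.exponential(1.0 / total)
            if t + dt > t_obs:
                break                       # no event before this observation time
            t += dt
            if rng.random() < rates[0] / total:
                s, i = s - 1, i + 1         # infection event
            else:
                i, r = i - 1, r + 1         # recovery event
        observations.append((i, r))
        t = t_obs                           # restart clock (memorylessness)
    return np.array(observations)
```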


Figure 5.5: Sub-figures (a) and (b) show the prior predictive distributions of infectious and recovered individuals based on the SIR (solid) and SEIR (dashed) models. In both figures, dotted and dot-dashed lines represent the 10% and 90% prior predictive quantiles based on the SIR and SEIR models, respectively. The optimal designs found under the total entropy utility (∗) along with the KLD utility (×) and mutual information utility (+) for the SIR and SEIR models are illustrated in sub-figure (c).


Following Dehideniya et al. (2018a), log(β) ∼ N(−0.09, 0.19²) and log(α) ∼ N(−1.63, 0.32²) were chosen to describe the uncertainty about the parameters of the SIR model, and for the SEIR model log(β) ∼ N(0.44, 0.16²), log(αE) ∼ N(−0.69, 0.2²) and log(αI) ∼ N(−1.31, 0.38²) were chosen as the priors. At the beginning of the experiment, t = 0, there are 5 infectious and 45 susceptible individuals, and the population is observed starting from t = 0.25 days (6 hours) until 30 days. In order to obtain an observation schedule which is feasible to implement, the observation times were selected such that they are at least 0.25 days apart, and we consider up to 20 design points. At each observation time, the numbers of infectious (I) and recovered (R) individuals are recorded. In approximating the expected utility of designs, the mean, median and variance were considered as the summary statistics to approximate the synthetic likelihood of the observed data as described in Section 5.3. Optimal designs found under the three utility functions are illustrated in Figure 5.5. The expected utility of each optimal design was re-evaluated 100 times, and the mean and standard deviation of those estimated utilities are given in Table 5.2.

Table 5.2: Expected utility values (standard deviation) of optimal designs derived under different utility functions.

Utility function        Number of design points |d|    U(d*) (SD)
Mutual Information      8                              -0.275 (0.005)
                        10                             -0.266 (0.005)
                        15                             -0.267 (0.005)
                        20                             -0.261 (0.005)
Total Entropy           8                              1.152 (0.010)
                        10                             1.172 (0.008)
                        15                             1.196 (0.009)
                        20                             1.205 (0.009)
KLD                     8                              1.569 (0.013)
                        10                             1.583 (0.008)
                        15                             1.592 (0.009)
                        20                             1.611 (0.007)

As in Example 1, the optimal designs found using the total entropy utility were assessed

for model discrimination and parameter estimation. For each case, posterior inference

was undertaken using the synthetic likelihood approach described in Section 5.3. The

posterior model probabilities of the data generating model were determined for each

optimal and equally spaced design. As shown in Figure 5.6, designs found using the

model discrimination utility perform well across both SIR and SEIR models. When the

SIR model is the data generating model, KLD designs yield less informative datasets for

model discrimination while both total entropy designs and equally spaced designs perform

equally well. For the SEIR model, a clear difference in discrimination ability of designs

is not visible.


Figure 5.6: The posterior model probability of the data generating model obtained for observations generated from the (a) SIR model and (b) SEIR model according to the optimal designs and equally spaced designs.

In order to assess the performance of the optimal designs for parameter estimation, LIS is used as in Example 1. Figure 5.7 compares the log determinant of the inverse of the posterior variance-covariance matrix of the parameters of the data generating model based


on the optimal and equally spaced designs. It is evident that the total entropy designs perform as well as the designs found under the KLD utility across both models, while the designs found for model discrimination (only) do not provide precise estimation of the parameters and are actually less efficient than the equally spaced designs.

Figure 5.7: The log determinant of the inverse of the posterior variance-covariance matrix of the parameters of the data generating model when observations are generated from the (a) SIR model and (b) SEIR model according to the optimal designs and equally spaced designs.


5.5.3 Example 3 - Design for parameter estimation of a predator-prey model

Laboratory microcosm experiments play a key role in developing and refining ecological theories (Bonsall and Hassell, 2005); single-celled organisms or insects are placed in a controlled environment to imitate complex natural environments. These experiments provide many advantages over field studies, such as the ability to replicate, to control the environmental conditions and to sample conveniently. Consequently, laboratory microcosm experiments have been used in ecology to explore ecological concepts such as intraspecific competition (Hassell et al., 1976, Nicholson, 1954) and predator and prey interaction (Balciunas and Lawler, 1995, Lawler, 1993, Luckinbill, 1973).

Luckinbill (1973) conducted a series of experiments to investigate interactions between Paramecium aurelia (prey) and Didinium nasutum (predator). In this example, we consider Luckinbill's experiment as a motivating application, and find optimal sampling times to obtain data for estimating the parameters of the modified Lotka-Volterra (LV) model with logistic growth of prey. Let the birth rate of prey be given by a and, in the absence of predators, let the prey population follow logistic growth with carrying capacity K. Further, the rate of predation is given by b and the death rate of predators is given by c. Denoting the sizes of the prey and predator populations at time t by x and y, respectively, the probabilities of the possible events in the next infinitesimal time period ∆t are given by

$$ P[\, x + 1,\, y \mid x,\, y \,] = a x\, \Delta t + o(\Delta t), $$
$$ P[\, x - 1,\, y \mid x,\, y \,] = a \left(1 - \frac{x}{K}\right) x\, \Delta t + o(\Delta t), $$
$$ P[\, x - 1,\, y + 1 \mid x,\, y \,] = b x y\, \Delta t + o(\Delta t), $$
$$ P[\, x,\, y - 1 \mid x,\, y \,] = c y\, \Delta t + o(\Delta t). $$

Following the experimental set-up used by Luckinbill (1973), we assume that there are 90 prey and 35 predators at the beginning of the experiment. To obtain oscillatory population dynamics over time, the following priors are chosen for the model parameters: log(K) ∼ N(6.87, 0.20²), log(a) ∼ N(0.01, 0.12²), log(b) ∼ N(−5.03, 0.12²) and log(c) ∼ N(−0.69, 0.16²); see Figures 5.8a and 5.8b.


Figure 5.8: Prior predictive distributions of prey and predators are given in sub-figures (a) and (b), respectively. In both figures, dotted lines represent the 10% and 90% prior predictive quantiles of prey and predators. Plot (c) illustrates the optimal designs found under the KLD utility (+) and NSEL utility (∗) for estimating the parameters of the modified LV model.


The Gillespie algorithm (Gillespie, 1977) simulates every event that changes the state of the system. Compared to the epidemiological models considered in Examples 1 and 2, the LV model can result in a large number of events being observed, depending on the parameter values used for the model simulation. Consequently, the Gillespie algorithm can be computationally expensive when data must be simulated a large number of times. Alternatively, the explicit tau-leap (ETL) method (Gillespie, 2001) can be considered, where data are simulated by sampling the number of times each possible event occurs during a time step τ from a Poisson distribution. The selection of the value of τ is a trade-off between the accuracy and the computational efficiency of the simulation. In this example, we used the ETL method and set τ = 0.08 to simulate data when evaluating the synthetic likelihood, as this provided reasonable accuracy and significantly reduced computing time when compared to the Gillespie algorithm. Here, the mean, log variance and maximum of the observed counts of prey and predators according to the design d were considered as the summary statistics.
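A minimal sketch of such an ETL simulator for the modified LV model is given below; the event rates follow the transition probabilities stated above, the initial populations follow the experimental set-up described earlier, and the truncation at zero is a simple guard against the invalid (negative) states that tau-leaping can otherwise produce.

```python
import numpy as np

def simulate_lv_tau_leap(a, b, c, K, obs_times, x0=90, y0=35, tau=0.08, rng=None):
    """Explicit tau-leap simulation of the modified Lotka-Volterra model.

    In each step of length tau, the number of occurrences of each event is drawn
    from a Poisson distribution with mean rate * tau; the state is recorded at
    the first time-grid point past each requested sampling time.
    """
    rng = np.random.default_rng() if rng is None else rng
    obs_times = np.sort(np.asarray(obs_times, dtype=float))
    t, x, y = 0.0, x0, y0
    observations, next_idx = [], 0
    while next_idx < len(obs_times):
        while next_idx < len(obs_times) and t >= obs_times[next_idx]:
            observations.append((x, y))
            next_idx += 1
        if next_idx >= len(obs_times):
            break
        rates = np.array([a * x,                   # prey birth
                          a * (1.0 - x / K) * x,   # prey death (as in the rates above)
                          b * x * y,               # predation (prey -> predator)
                          c * y])                  # predator death
        births, deaths, predation, pred_deaths = rng.poisson(np.maximum(rates, 0.0) * tau)
        x = max(x + births - deaths - predation, 0)
        y = max(y + predation - pred_deaths, 0)
        t += tau
    return np.array(observations)
```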

Figure 5.8c shows the optimal sampling times for parameter estimation of the modified LV model found under the KLD and NSEL utilities. There is overlap between the selected sampling times which maximise the KLD and NSEL utilities. However, the NSEL designs suggest observing the process at the beginning of the experiment, while the KLD designs consist of sampling times at the end of the experiment. Similar to the previous examples, the expected utility of each optimal design was re-evaluated 100 times, and the mean and standard deviation of these expected utility values are given in Table 5.3.

Table 5.3: Expected utility values (standard deviation) of optimal designs derived under the KLD and NSEL utility functions.

Utility function        Number of design points |d|    U(d*) (SD)
KLD                     10                             2.87 (0.01)
                        15                             2.95 (0.02)
                        20                             2.97 (0.02)
NSEL                    10                             -0.0596 (0.0004)
                        15                             -0.0588 (0.0005)
                        20                             -0.0589 (0.0005)

As in the other two examples, the performance of the optimal designs in estimating parameters was assessed based on the log determinant of the inverse of the posterior variance-covariance matrix of the parameters. Figure 5.9 compares the optimal designs found under the KLD and NSEL utilities with equally spaced designs. Despite noticeable differences between the KLD and NSEL designs, they perform similarly well when compared to the equally spaced designs. As seen in Figures 5.8a and 5.8b, the prior predictive distributions of prey and predators have two oscillations. Thus, there are potentially two regions of the prior predictive distributions where information about the parameters can be obtained. Consequently, the utilities considered here select quite different regions of the design space as being informative.


Figure 5.9: The log determinant of the inverse of the posterior variance-covariance matrix of the parameters of the data generating model when observations are generated from the modified LV model according to the optimal designs and equally spaced designs.

5.6 Discussion

In this work, we proposed a synthetic likelihood-based Laplace approximation to evaluate utility functions when designing experiments to collect a large number of observations in epidemiology and ecology. The proposed Laplace approximation requires a relatively small number of likelihood evaluations compared to other posterior approximation methods, and thus reduces the number of model simulations required for utility evaluations. This approach avoids the use of pre-simulated datasets, which generally require large storage. Further, the computational cost of approximating the likelihood of a large number of observations has been reduced by using summary statistics instead of the full dataset. Consequently, our approach enables the location of high-dimensional designs for models with intractable likelihoods in a continuous design space, providing a significant improvement over what has been proposed previously in the literature.

Although the proposed approach provides an efficient approximation for a wide range of utility functions, there are a few limitations. First, the selection of summary statistics which are informative not only over the entire prior predictive distribution but also across the design space appears to be a difficult task. We addressed this by avoiding designs which yielded low utility values and/or poor approximations to the utility. However, robust methods for estimating the synthetic likelihood have recently been proposed, for instance the extended empirical saddlepoint estimation (Fasiolo et al., 2018) and a semi-parametric approach (An et al., 2018). Exploration of the use of such methods to


improve the approximation of low utility values is a potential research avenue that could be explored in the future.

Secondly, the use of the Gillespie algorithm (Gillespie, 1977) to simulate data can be prohibitively expensive in our approach; see, for example, Example 3. Computationally less expensive alternatives such as the ETL method (Gillespie, 2001) can be used to simulate data. However, depending on the specified value of τ, the ETL method can produce invalid states (negative values for population sizes). Consequently, improved versions of the ETL method, such as the binomial tau-leap method or the optimised tau-leap method, have been proposed. However, employing these methods comes at additional computational cost. Thus, further development of such methods is needed so that they can be used efficiently in design. The exploitation of parallel computation available in graphics processing units (GPUs) could also be useful for alleviating some of the computational burden when finding optimal designs. We plan to explore this in the future.


6 Conclusion

This chapter summarises the key developments proposed in this thesis. Then, the limitations of these developments are discussed along with potential future research directions.

6.1 Summary

The primary aim of this thesis was to develop and implement efficient methods to design

experiments for models with intractable likelihoods. To achieve this aim, the following

objectives were defined.

1. Develop a method to design experiments for discriminating between models with

intractable likelihoods.

2. Develop a new optimisation algorithm for computationally expensive utility func-

tions.

3. Develop a method to design dual purpose experiments for models with intractable

likelihoods.

4. Extend Objective 3 to find high dimensional designs for models with intractable

likelihoods.

To address the first objective, in Chapter 3 we proposed a novel method to efficiently approximate model discrimination utilities using methods from approximate Bayesian computation (ABC), specifically ABC rejection algorithms for parameter estimation and model choice. Three model discrimination utility functions, namely the mutual information utility for model discrimination, the zero-one utility and the Ds-optimal utility, were considered to find optimal time points at which to observe the spread of a disease in a population of individuals. The performance of designs found under these three utilities was compared. It was found that the mutual information designs generally performed better than those found under the other two utilities, yielding data that provided the most certainty about the appropriate model.


To address Objective 2, we proposed an extension to the coordinate exchange (CE) algorithm, called the refined coordinate exchange (RCE) algorithm, to reduce the number of utility evaluations required to locate optimal designs. We compared the performance of the RCE algorithm with the CE algorithm, and found that both algorithms located the same optimal design, but the RCE algorithm required far fewer evaluations of the expected utility. We also compared the performance of the RCE algorithm with an adaptation of the approximate coordinate exchange (ACE) algorithm to handle a discrete design space, called ACE-D. It was found that the RCE algorithm locates highly efficient designs with respect to ACE-D within a small number of iterations. Moreover, the RCE algorithm was shown to be more robust to the initial design chosen. In order to locate high-dimensional designs, we implemented the RCE algorithm to exploit parallel computing architectures. Using the RCE algorithm, we found optimal designs of up to ten design points for discriminating between four competing epidemiological models. Such a design problem would be computationally infeasible to consider with the CE algorithm, and this was achieved without relying on an emulator of the expected utility surface (as used in ACE-D) which, in practice, may suffer from issues of lack-of-fit.

Chapter 4 addressed the third objective by proposing methodology to design dual purpose experiments for estimating parameters and discriminating between models with intractable likelihoods. To do this, we extended the synthetic likelihood approximation by applying a continuity correction for approximating the likelihood of discrete observations via the multivariate normal distribution. Our approximation facilitates the evaluation of a wider range of utility functions than previously proposed methodologies, allowing utilities such as the KLD utility, the mutual information utility for model discrimination and the total entropy utility to be considered. After defining this synthetic likelihood approximation, we validated the proposed utility approximation on an illustrative example which involved designing dual purpose experiments for two epidemiological models, namely the death and Susceptible-Infected (SI) models. It was evident that the proposed approximation was sufficient for locating optimal designs. Given this, we found dual purpose designs for experiments studying foot and mouth disease (FMD), which has previously been described by two competing epidemiological models, the Susceptible-Infectious-Recovered (SIR) and Susceptible-Exposed-Infectious-Recovered (SEIR) models. Then, the performance of the total entropy designs for discriminating between the SIR and SEIR models and estimating the parameters of these models was compared with that of the optimal designs found under the KLD utility and the mutual information utility for model discrimination. Overall, the total entropy designs performed similarly to the designs optimised for each design objective, particularly as the number of design points increased. Therefore, we concluded that our approximation was suitable and useful for finding dual purpose designs for experiments in epidemiology.

The methodology proposed in Chapter 4 is limited to a small to moderate number of design points, as it involves computationally expensive integration across a multivariate normal distribution and requires many likelihood evaluations because importance sampling


was used in approximating the posterior distribution. Thus, Objective 4 addressed these limitations by proposing a synthetic likelihood-based Laplace approximation to the posterior distribution that enables utility functions to be efficiently evaluated for high-dimensional designs. The proposed approach avoids the computational expense of evaluating the likelihood of the data by instead using summary statistics. Further, the use of the Laplace approximation provides a more efficient approximation to the posterior distribution (compared to importance sampling) as the number of observations increases. We evaluated the performance of the synthetic likelihood-based Laplace approximation for estimating utility functions via an illustrative example. Then, we found high-dimensional dual purpose designs of up to 20 dimensions to learn about FMD, and designs to estimate the parameters of the modified Lotka-Volterra (LV) model with logistic growth of prey. Given that we developed a computationally efficient utility approximation, here we considered a continuous design space where potentially more efficient designs can be found when compared to a discretised design space.

6.2 Limitations and future work

There are some limitations of the methods proposed in this thesis which motivate future research. One limitation of the methods in Chapters 4 and 5 is the Gaussian assumption on the distribution of the data and summary statistics. To relax this assumption, more robust likelihood approximations such as the extended empirical saddlepoint approximation (Fasiolo et al., 2018) and SemiBSL (An et al., 2018) could be considered. In Chapter 5, the Laplace approximation assumes that the posterior of the parameters can be well approximated by a Gaussian distribution. The use of more sophisticated methods such as Laplace importance sampling and variational Bayes could potentially improve the posterior approximation, and therefore the approximation to the expected utility. However, these methods will increase the computational cost of utility evaluations. Thus, improvements in the computational efficiency of these methods, and/or implementations that exploit parallel computing architectures, would be needed in order to locate designs in a reasonable amount of time. This is something that could be investigated in the future.

Despite the various developments proposed in this thesis to improve the computational

efficiency in obtaining posterior approximations, the process of locating optimal designs

for models with intractable likelihoods is still quite time-consuming. We suggest three research directions to address this limitation. Firstly, although the explicit tau-leap (ETL) method (Gillespie, 2001) provides computationally inexpensive model simulation, it may produce invalid datasets. Improved versions of the ETL method, such as the binomial tau-leap or optimised tau-leap methods, could be considered, but these come at additional computational cost. Thus, developments are needed to simulate data from such models in an efficient manner; this is an area of research that could be pursued in the future. Secondly, the development of deterministic approaches to

evaluate likelihoods of stochastic models is also a potential future research direction to


reduce or avoid a large number of model simulations. Finally, parallel computational

architectures available in graphics processing units (GPUs) can be exploited to reduce

computational time in utility evaluations.
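To illustrate the simulation step referred to above, the following is a minimal Python sketch of explicit tau-leap simulation for a stochastic SI model; the rate function, population size, leap size and design are illustrative assumptions rather than the settings used in the thesis. Because the number of events in each leap is Poisson-distributed with the rate held fixed, more infections than remaining susceptibles can be proposed, which is the source of the invalid datasets mentioned above; the sketch simply truncates such proposals.

import numpy as np

def etl_si(beta, N, S0, I0, obs_times, tau=0.01, rng=None):
    # Explicit tau-leap simulation of a stochastic SI model: over each leap of
    # length tau, the number of new infections is drawn from a Poisson
    # distribution with the current rate beta * S * I / N held fixed.
    rng = np.random.default_rng() if rng is None else rng
    S, I, t = S0, I0, 0.0
    observations, idx = [], 0
    while idx < len(obs_times):
        while idx < len(obs_times) and t >= obs_times[idx]:
            observations.append(I)  # record the number infected at each observation time
            idx += 1
        k = rng.poisson(beta * S * I / N * tau)
        k = min(k, S)               # truncate proposals that would make S negative
        S, I, t = S - k, I + k, t + tau
    return np.array(observations)

# Example: simulate one dataset at an illustrative design d = (1, 2, 5)
y = etl_si(beta=0.8, N=100, S0=99, I0=1, obs_times=[1.0, 2.0, 5.0],
           rng=np.random.default_rng(1))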

In this thesis, we have mainly considered the design of experiments in epidemiology. However, the methods developed can be used to design experiments in other areas such as queueing systems and systems biology. Further, we have only considered designing experi-

ments without replicates due to the impracticality of collecting multiple measurements at

a single time point. In future research, designing experiments with replicates could also

be considered.


A Supplementary Material for Chapter 3: ‘Optimal Bayesian design for discriminating between models with intractable likelihoods in epidemiology'


A.1 Monte Carlo error of utility estimation in Example 2

[Figure A.1 appears here: a grid of panels indexed by utility (rows) and by 1, 2, 3 and 4 design points (columns); x-axis: number of Monte Carlo samples (Qm); y-axis: Monte Carlo error.]

Figure A.1: Monte Carlo error of the estimated expected utility for the mutual information utility (first row), the Ds-optimality utility (second row) and the Zero-One utility (third row). In each plot, the solid line represents the Monte Carlo error associated with the estimated utility of the optimal design found under each utility, and dashed lines represent the Monte Carlo errors for randomly selected designs.
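One simple way of producing the quantities plotted above, sketched here in Python under the assumption that the Monte Carlo error is taken to be the standard deviation of repeated estimates of the expected utility (the thesis may have computed it differently), is to re-estimate the utility several times for each number of Monte Carlo samples:

import numpy as np

def mc_error_curve(estimate_utility, d, Q_values, repeats=20, seed=0):
    # For each Monte Carlo sample size Q, re-estimate the expected utility of
    # design d 'repeats' times and record the standard deviation of the estimates.
    rng = np.random.default_rng(seed)
    return np.array([np.std([estimate_utility(d, Q, rng) for _ in range(repeats)], ddof=1)
                     for Q in Q_values])

# Toy usage with a stand-in utility estimator (the mean of Q random draws):
errors = mc_error_curve(lambda d, Q, rng: rng.normal(size=Q).mean(),
                        d=None, Q_values=[500, 1000, 2000, 5000])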


A.2 Performance of optimisation algorithms in locating three- and four-point designs for discriminating between Models 1 and 2

Table A.1: Performance of optimisation algorithms in locating three- and four-point designs for discriminating between Models 1 and 2 based on different utility functions.

Utility function      |d|   Optimisation   Optimal design d*        U(d*)   Total run time
                            algorithm                                       (hours)
Mutual information     3    RCE(1)         (0.6, 4.3, 6.0)          -0.45    3.90
                            ACE-D          (0.9, 4.2, 6.9)          -0.45   14.62
                       4    RCE(1)         (0.6, 4.0, 5.1, 10.8)    -0.44    5.67
                            ACE-D          (0.6, 4.1, 5.2, 11.5)    -0.44   19.90
Ds-optimal             3    RCE(1)         (0.7, 3.0, 4.8)          10.44    2.43
                            ACE-D          (0.6, 2.9, 4.7)          10.44   10.10
                       4    RCE(1)         (0.5, 1.0, 3.3, 5.7)     10.48    3.80
                            ACE-D          (0.6, 1.0, 3.0, 4.8)     10.48   11.95
Zero-One (0-1)         3    RCE(1)         (0.4, 5.3, 8.1)          0.790    3.12
                            ACE-D          (0.6, 4.2, 9.2)          0.798   11.91
                       4    RCE(1)         (0.3, 1.2, 3.3, 5.5)     0.798    5.52
                            ACE-D          (0.6, 4.0, 9.4, 13.6)    0.803   17.98


A.3 Performance of the optimal designs

A.3.1 Performance of the optimal designs for model discrimination - Model 2

[Figure A.2 appears here: panels (a) 1 design point, (b) 4 design points, (c) 8 design points and (d) 10 design points; x-axis: ABC posterior model probability; y-axis: empirical cumulative probability; curves labelled 0−1, Ds, MI and RD.]

Figure A.2: Empirical cumulative probabilities of the ABC posterior model probability of Model 2 (true model) obtained for observations generated from Model 2 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs.


A.3.2 Performance of the optimal designs for model discrimination - Model 3

[Figure A.3 appears here: panels (a) 1 design point, (b) 4 design points, (c) 8 design points and (d) 10 design points; x-axis: ABC posterior model probability; y-axis: empirical cumulative probability; curves labelled 0−1, Ds, MI and RD.]

Figure A.3: Empirical cumulative probabilities of the ABC posterior model probability of Model 3 (true model) obtained for observations generated from Model 3 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs.


A.3.3 Performance of the optimal designs for model discrimination - Model 4

[Figure A.4 appears here: panels (a) 1 design point, (b) 4 design points, (c) 8 design points and (d) 10 design points; x-axis: ABC posterior model probability; y-axis: empirical cumulative probability; curves labelled 0−1, Ds, MI and RD.]

Figure A.4: Empirical cumulative probabilities of the ABC posterior model probability of Model 4 (true model) obtained for observations generated from Model 4 according to optimal designs for discriminating between Models 1, 2, 3 and 4, and random designs.


B Supplementary Material for Chapter 4: ‘Dual purpose Bayesian design for parameter estimation and model discrimination in epidemiology using a synthetic likelihood approach'

B.1 Derivation of the total entropy utility for static designs

Let y be n observations obtained by conducting an experiment under a static design d

which consists of n distinct time points, d = {t1, t2, ..., tn}. When the aim of the experi-

ment is to discriminate between K candidate models and to estimate model parameters,

a dual purpose utility function based on total entropy can be used. Total entropy was

originally proposed by Borth (1975), and can be expressed as follows,

\[
H_T(M, \boldsymbol{\theta} \mid \mathbf{y}, \mathbf{d}) = H_M(M \mid \mathbf{y}, \mathbf{d}) + H_P(\boldsymbol{\theta} \mid \mathbf{y}, \mathbf{d}), \qquad (B.1)
\]

where HM (M |y ,d) is the entropy about which model is correct and HP (θ|y ,d) is the

entropy of all parameters across the K models. Then, the expected change in total

entropy based on observations collected under a given design d can be used to measure

the informativeness of the design. In the following subsections, the expected change

in entropy about model indicator (IM ) and model parameters (IP ) will be described.

Following this, the total entropy utility is derived based on the sum of the expected

changes in entropy about the model indicator and parameter values.

B.1.1 Expected change in HM

The entropy about model indicator (Box and Hill, 1967) upon observing y given design

d is given by,


\[
H_M(M \mid \mathbf{y}, \mathbf{d}) = -\sum_{m=1}^{K} p(m \mid \mathbf{y}, \mathbf{d}) \log p(m \mid \mathbf{y}, \mathbf{d}). \qquad (B.2)
\]

The entropy about the model indicator based on prior model probabilities, p(m), is,

\[
H_M(M) = -\sum_{m=1}^{K} p(m) \log p(m). \qquad (B.3)
\]

The expected change in entropy about the model indicator based on observations obtained

under design d can be expressed as,

\[
I_M = H_M(M) - E\!\left[H_M(M \mid \mathbf{y}, \mathbf{d})\right]. \qquad (B.4)
\]

Here,

\[
\begin{aligned}
E\!\left[H_M(M \mid \mathbf{y}, \mathbf{d})\right] &= \sum_{\mathbf{y}} H_M(M \mid \mathbf{y}, \mathbf{d})\, p(\mathbf{y} \mid \mathbf{d}), \\
&= -\sum_{\mathbf{y}} \sum_{m=1}^{K} p(m \mid \mathbf{y}, \mathbf{d}) \log p(m \mid \mathbf{y}, \mathbf{d})\, p(\mathbf{y} \mid \mathbf{d}), \\
&= -\sum_{m=1}^{K} \sum_{\mathbf{y}} p(m \mid \mathbf{y}, \mathbf{d})\, p(\mathbf{y} \mid \mathbf{d}) \log p(m \mid \mathbf{y}, \mathbf{d}). && (B.5)
\end{aligned}
\]

By Bayes' theorem, p(m|y, d) p(y|d) = p(y|m, d) p(m), so (B.5) can be simplified

as,

\[
\begin{aligned}
E\!\left[H_M(M \mid \mathbf{y}, \mathbf{d})\right] &= -\sum_{m=1}^{K} \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d})\, p(m) \log p(m \mid \mathbf{y}, \mathbf{d}), && (B.6) \\
&= -\sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \log p(m \mid \mathbf{y}, \mathbf{d}). && (B.7)
\end{aligned}
\]

Then, by applying (B.3) and (B.6) in Equation (B.4),


\[
\begin{aligned}
I_M &= -\sum_{m=1}^{K} p(m) \log p(m) + \sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \log p(m \mid \mathbf{y}, \mathbf{d}), \\
&= \sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \left\{ \log p(m \mid \mathbf{y}, \mathbf{d}) - \log p(m) \right\}. && (B.8)
\end{aligned}
\]
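For intuition, note that because $p(m, \mathbf{y} \mid \mathbf{d}) = p(m)\, p(\mathbf{y} \mid m, \mathbf{d})$, the expression in (B.8) can equivalently be written as

\[
I_M = \sum_{m=1}^{K} \sum_{\mathbf{y}} p(m, \mathbf{y} \mid \mathbf{d}) \log\!\left[\frac{p(m, \mathbf{y} \mid \mathbf{d})}{p(m)\, p(\mathbf{y} \mid \mathbf{d})}\right],
\]

that is, the mutual information between the model indicator and the data under design d.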

B.1.2 Expected change in HP

The entropy about model parameters across K models (McGree, 2017) upon observing y

given design d is given by,

\[
H_P(\boldsymbol{\theta} \mid \mathbf{y}, \mathbf{d}) = -\sum_{m=1}^{K} p(m \mid \mathbf{y}, \mathbf{d}) \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d})\, d\boldsymbol{\theta}_m. \qquad (B.9)
\]

The entropy of model parameters, θm, based on the prior distribution of θm is given by,

\[
H_P(\boldsymbol{\theta}) = -\sum_{m=1}^{K} p(m) \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m) \log p(\boldsymbol{\theta}_m \mid m)\, d\boldsymbol{\theta}_m. \qquad (B.10)
\]

The expected change in entropy about the model parameters based on observations obtained under design d can be expressed as,

\[
I_P = H_P(\boldsymbol{\theta}) - E\!\left[H_P(\boldsymbol{\theta} \mid \mathbf{y}, \mathbf{d})\right]. \qquad (B.11)
\]

Here,

\[
\begin{aligned}
E\!\left[H_P(\boldsymbol{\theta} \mid \mathbf{y}, \mathbf{d})\right] &= \sum_{\mathbf{y}} H_P(\boldsymbol{\theta} \mid \mathbf{y}, \mathbf{d})\, p(\mathbf{y} \mid \mathbf{d}), && (B.12) \\
&= -\sum_{\mathbf{y}} \sum_{m=1}^{K} p(m \mid \mathbf{y}, \mathbf{d}) \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d})\, p(\mathbf{y} \mid \mathbf{d})\, d\boldsymbol{\theta}_m, && (B.13) \\
&= -\sum_{\mathbf{y}} \sum_{m=1}^{K} p(m \mid \mathbf{y}, \mathbf{d})\, p(\mathbf{y} \mid \mathbf{d}) \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d})\, d\boldsymbol{\theta}_m. && (B.14)
\end{aligned}
\]


By Bayes' theorem, p(m|y, d) p(y|d) = p(y|m, d) p(m), so (B.14) can be expressed

as,

\[
\begin{aligned}
E\!\left[H_P(\boldsymbol{\theta} \mid \mathbf{y}, \mathbf{d})\right]
&= -\sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d})\, d\boldsymbol{\theta}_m, && (B.15) \\
&= -\sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log\!\left[\frac{p(\mathbf{y} \mid \boldsymbol{\theta}_m, m, \mathbf{d})\, p(\boldsymbol{\theta}_m \mid m)}{p(\mathbf{y} \mid m, \mathbf{d})}\right] d\boldsymbol{\theta}_m, && (B.16) \\
&= -\sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log\!\left[\frac{p(\mathbf{y} \mid \boldsymbol{\theta}_m, m, \mathbf{d})}{p(\mathbf{y} \mid m, \mathbf{d})}\right] d\boldsymbol{\theta}_m \\
&\qquad - \sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log p(\boldsymbol{\theta}_m \mid m)\, d\boldsymbol{\theta}_m. && (B.17)
\end{aligned}
\]

Let
\[
A = -\sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log\!\left[\frac{p(\mathbf{y} \mid \boldsymbol{\theta}_m, m, \mathbf{d})}{p(\mathbf{y} \mid m, \mathbf{d})}\right] d\boldsymbol{\theta}_m,
\]
and
\[
B = -\sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log p(\boldsymbol{\theta}_m \mid m)\, d\boldsymbol{\theta}_m.
\]

Then,
\[
\begin{aligned}
A &= -\sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log\!\left[\frac{p(\mathbf{y} \mid \boldsymbol{\theta}_m, m, \mathbf{d})}{p(\mathbf{y} \mid m, \mathbf{d})}\right] d\boldsymbol{\theta}_m, && (B.18) \\
&= -\sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \left\{ \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log p(\mathbf{y} \mid \boldsymbol{\theta}_m, m, \mathbf{d})\, d\boldsymbol{\theta}_m - \log p(\mathbf{y} \mid m, \mathbf{d}) \right\}. && (B.19)
\end{aligned}
\]


\[
\begin{aligned}
B &= -\sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \int_{\boldsymbol{\theta}_m} \frac{p(\mathbf{y} \mid \boldsymbol{\theta}_m, m, \mathbf{d})\, p(\boldsymbol{\theta}_m \mid m)}{p(\mathbf{y} \mid m, \mathbf{d})} \log p(\boldsymbol{\theta}_m \mid m)\, d\boldsymbol{\theta}_m, && (B.20) \\
&= -\sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} \int_{\boldsymbol{\theta}_m} p(\mathbf{y} \mid \boldsymbol{\theta}_m, m, \mathbf{d})\, p(\boldsymbol{\theta}_m \mid m) \log p(\boldsymbol{\theta}_m \mid m)\, d\boldsymbol{\theta}_m, && (B.21) \\
&= -\sum_{m=1}^{K} p(m) \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m) \log p(\boldsymbol{\theta}_m \mid m) \sum_{\mathbf{y}} p(\mathbf{y} \mid \boldsymbol{\theta}_m, m, \mathbf{d})\, d\boldsymbol{\theta}_m, && (B.22) \\
&= -\sum_{m=1}^{K} p(m) \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m) \log p(\boldsymbol{\theta}_m \mid m)\, d\boldsymbol{\theta}_m, && (B.23) \\
&= H_P(\boldsymbol{\theta}). && (B.24)
\end{aligned}
\]

By substituting A and B into Equation (B.17),

\[
\begin{aligned}
E\!\left[H_P(\boldsymbol{\theta} \mid \mathbf{y}, \mathbf{d})\right] = -\sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \bigg\{ &\int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log p(\mathbf{y} \mid \boldsymbol{\theta}_m, m, \mathbf{d})\, d\boldsymbol{\theta}_m \\
&- \log p(\mathbf{y} \mid m, \mathbf{d}) \bigg\} + H_P(\boldsymbol{\theta}). && (B.25)
\end{aligned}
\]

Then, by applying (B.10) and (B.25) in Equation (B.11),

\[
I_P = \sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \left\{ \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log p(\mathbf{y} \mid \boldsymbol{\theta}_m, m, \mathbf{d})\, d\boldsymbol{\theta}_m - \log p(\mathbf{y} \mid m, \mathbf{d}) \right\}. \qquad (B.26)
\]

This can be used as a utility function for parameter estimation under model uncertainty.
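For intuition, applying Bayes' theorem inside the braces of (B.26) shows that the bracketed term is the Kullback-Leibler divergence between the posterior $p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d})$ and the prior $p(\boldsymbol{\theta}_m \mid m)$, so $I_P$ can equivalently be written as

\[
I_P = \sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log\!\left[\frac{p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d})}{p(\boldsymbol{\theta}_m \mid m)}\right] d\boldsymbol{\theta}_m,
\]

the prior-weighted expected Kullback-Leibler divergence between posterior and prior across the K models.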

B.1.3 Expected change in HT

The expected change in the total entropy can be expressed as follows:


\[
\begin{aligned}
I_T &= H_T(M, \boldsymbol{\theta}) - E\!\left[H_T(M, \boldsymbol{\theta} \mid \mathbf{y}, \mathbf{d})\right], && (B.27) \\
&= \left\{ H_M(M) + H_P(\boldsymbol{\theta}) \right\} - \left\{ E\!\left[H_M(M \mid \mathbf{y}, \mathbf{d})\right] + E\!\left[H_P(\boldsymbol{\theta} \mid \mathbf{y}, \mathbf{d})\right] \right\}, && (B.28) \\
&= \left\{ H_M(M) - E\!\left[H_M(M \mid \mathbf{y}, \mathbf{d})\right] \right\} + \left\{ H_P(\boldsymbol{\theta}) - E\!\left[H_P(\boldsymbol{\theta} \mid \mathbf{y}, \mathbf{d})\right] \right\}, && (B.29) \\
&= I_M + I_P. && (B.30)
\end{aligned}
\]

By applying Equations (B.8) and (B.26) in (B.30),

\[
\begin{aligned}
I_T &= \sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \bigg\{ \log p(m \mid \mathbf{y}, \mathbf{d}) - \log p(m) \\
&\qquad\qquad + \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log p(\mathbf{y} \mid \boldsymbol{\theta}_m, m, \mathbf{d})\, d\boldsymbol{\theta}_m - \log p(\mathbf{y} \mid m, \mathbf{d}) \bigg\}, && (B.31) \\
&= \sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \bigg\{ \log\!\left[\frac{p(m \mid \mathbf{y}, \mathbf{d})}{p(\mathbf{y} \mid m, \mathbf{d})\, p(m)}\right] + \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log p(\mathbf{y} \mid \boldsymbol{\theta}_m, m, \mathbf{d})\, d\boldsymbol{\theta}_m \bigg\}, && (B.32) \\
&= \sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \bigg\{ -\log p(\mathbf{y} \mid \mathbf{d}) + \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log p(\mathbf{y} \mid \boldsymbol{\theta}_m, m, \mathbf{d})\, d\boldsymbol{\theta}_m \bigg\}. && (B.33)
\end{aligned}
\]

The utility function based on the expected change in total entropy can be expressed as

follows:

\[
U(\mathbf{d}) = \sum_{m=1}^{K} p(m) \sum_{\mathbf{y}} p(\mathbf{y} \mid m, \mathbf{d}) \left\{ -\log p(\mathbf{y} \mid \mathbf{d}) + \int_{\boldsymbol{\theta}_m} p(\boldsymbol{\theta}_m \mid m, \mathbf{y}, \mathbf{d}) \log p(\mathbf{y} \mid \boldsymbol{\theta}_m, m, \mathbf{d})\, d\boldsymbol{\theta}_m \right\}. \qquad (B.34)
\]
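As a purely illustrative Python sketch of how the expectation in (B.34) could be approximated by Monte Carlo, the following assumes (unlike the intractable-likelihood setting of the thesis) that the likelihood can be evaluated directly, and uses prior draws both to estimate the marginal likelihoods and as importance samples for the inner posterior expectation; the callables collected in 'models' are hypothetical placeholders.

import numpy as np
from scipy.special import logsumexp

def total_entropy_utility(d, models, prior_probs, Q=500, rng=None):
    # Monte Carlo sketch of the total entropy utility in (B.34). Each entry of
    # 'models' is a dict with hypothetical callables:
    #   'sample_prior(rng)'       -> theta drawn from p(theta | m)
    #   'simulate(theta, d, rng)' -> y drawn from p(y | theta, m, d)
    #   'loglik(y, theta, d)'     -> log p(y | theta, m, d)
    rng = np.random.default_rng() if rng is None else rng
    draws = [[mod['sample_prior'](rng) for _ in range(Q)] for mod in models]
    U = 0.0
    for m, mod in enumerate(models):
        for q in range(Q):
            y = mod['simulate'](draws[m][q], d, rng)
            log_evid = np.empty(len(models))
            for k, mod_k in enumerate(models):
                ll = np.array([mod_k['loglik'](y, th, d) for th in draws[k]])
                log_evid[k] = logsumexp(ll) - np.log(Q)   # log p(y | k, d)
                if k == m:
                    # importance sampling (prior proposal) estimate of the
                    # posterior expectation of log p(y | theta, m, d)
                    w = np.exp(ll - logsumexp(ll))
                    inner = float(np.sum(w * ll))
            log_py = logsumexp(np.log(prior_probs) + log_evid)  # log p(y | d)
            U += prior_probs[m] * (inner - log_py) / Q
    return U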


B.2 Comparison between the synthetic likelihood approach and the ABC rejection method

We compare the performance of the proposed synthetic likelihood approach with the

rejection sampling in approximate Bayesian computation (ABC) in approximating the

expected utility using the design scenario described in Example 1 of the main paper.

However, here we only consider the mutual information utility for discriminating between

the death and SI models, as ABC rejection sampling does not provide an efficient approx-

imation to the Kullback-Leibler (KL) divergence utility and the total entropy utility. In

this comparison, we first compare our synthetic likelihood approach with the ABC model

choice (ABC-MC) method of Grelaud et al. (2009) for approximating the expected utility

of designs with two, four, six and eight randomly selected design points. For each design,

we compare both approximations, based on 1 × 10^6 model simulations, to the case where U(d) is estimated by evaluating the actual likelihood of the data; see Figure B.1.

From the first column of Figure B.1, it is evident that the proposed synthetic likelihood

approach approximates the actual mutual information utility well, even for designs with a

moderate number of design points (|d |). However, the ABC approximation, based on the

same number of model simulations, results in some deviation of the approximated utility

from the actual utility value as |d | increases.

As discussed in Dehideniya et al. (2018b), the reduced performance of ABC-MC as |d| increases is due to the increase in the ABC tolerance. This could be remedied by increasing the number of model simulations, but this comes at the cost of computational efficiency. Nonetheless, we next investigate whether the ABC-MC approximation improves as the number of simulated datasets increases. As can be seen in the third column of Figure B.1, potentially some improvement is observed in the ABC-MC approximation by increasing the number of simulated datasets to 2 × 10^6. However, our synthetic likelihood approach still performs better overall, with strong agreement with the utility values based on evaluating the actual likelihood.
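For reference, a minimal Python sketch of ABC rejection for estimating posterior model probabilities (in the spirit of the ABC-MC method of Grelaud et al., 2009, though not their exact algorithm) is given below; each entry of 'models' is a hypothetical callable that draws parameters from that model's prior and returns simulated summary statistics.

import numpy as np

def abc_mc_model_probs(s_obs, models, prior_probs, n_sims=10**5,
                       accept_frac=0.01, rng=None):
    # ABC rejection for model choice: draw a model from the prior model
    # probabilities, simulate summary statistics, keep the simulations closest
    # to the observed summaries, and report the proportion of accepted draws
    # from each model as its posterior model probability.
    rng = np.random.default_rng() if rng is None else rng
    labels = rng.choice(len(models), size=n_sims, p=prior_probs)
    dists = np.empty(n_sims)
    for i, m in enumerate(labels):
        s_sim = models[m](rng)                 # includes drawing theta from its prior
        dists[i] = np.linalg.norm(s_sim - s_obs)
    eps = np.quantile(dists, accept_frac)      # tolerance from a fixed acceptance rate
    accepted = labels[dists <= eps]
    return np.array([np.mean(accepted == m) for m in range(len(models))])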


[Figure B.1 appears here: a grid of panels with columns SL (1 × 10^6 model simulations), ABC (1 × 10^6) and ABC (2 × 10^6), and rows |d| = 2, 4, 6 and 8; x-axis: U(d) based on the actual likelihood; y-axis: approximated U(d).]

Figure B.1: Comparison of the accuracy of the estimated expected utility of the mutual information utility using synthetic likelihood based on 1 × 10^6 model simulations (first column), and ABC-MC based on 1 × 10^6 (second column) and 2 × 10^6 (third column) model simulations. In each plot, the y = x line indicates a perfect match of approximated and actual utility evaluations.


C Supplementary Material for Chapter 5: ‘A synthetic likelihood-based Laplace approximation for efficient design of biological processes'

C.1 Informativeness of summary statistics

C.1.1 Death model


Figure C.1: Scatter plot between model parameter b and summary statistics, (a) mean and (b) variance of observations simulated from the death model according to a random design with 15 design points.
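The relationship shown in Figure C.1 can be reproduced in outline with the Python sketch below, which uses a generic binomial death-model simulator with death rate exp(b); the population size, prior and design are illustrative assumptions rather than the values used in the thesis.

import numpy as np

def simulate_death_counts(b, times, N=50, rng=None):
    # Generic stochastic death model: each of N individuals is independently
    # still alive at time t with probability exp(-exp(b) * t); the observation
    # at each time point is the number still alive.
    rng = np.random.default_rng() if rng is None else rng
    return rng.binomial(N, np.exp(-np.exp(b) * np.asarray(times)))

rng = np.random.default_rng(0)
design = np.sort(rng.uniform(0, 10, size=15))   # a random design with 15 time points
b_draws = rng.normal(-0.5, 0.5, size=1000)      # illustrative prior draws for b
summaries = []
for b in b_draws:
    y = simulate_death_counts(b, design, rng=rng)
    summaries.append([y.mean(), y.var(ddof=1)]) # summary statistics: mean and variance
summaries = np.array(summaries)
# Each column of 'summaries' can then be plotted against b_draws, as in Figure C.1.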


C.1.2 SI model


Figure C.2: Scatter plot between model parameter b1 and summary statistics, (a) mean and (b) variance of observations simulated from the SI model according to a random design with 15 design points.


Figure C.3: Scatter plot between model parameter b2 and summary statistics, (a) mean and (b) variance of observations simulated from the SI model according to a random design with 15 design points.


Bibliography

An, Z., Nott, D. J., and Drovandi, C. C. (2018). Robust Bayesian synthetic likelihood via

a semi-parametric approach. Technical report. URL : arXiv preprint arXiv:1809.05800.

Atkinson, A. (2008). DT-optimum designs for model discrimination and parameter esti-

mation. Journal of Statistical Planning and Inference, 138(1):56 – 64.

Atkinson, A. C. and Bogacka, B. (1997). Compound D- and Ds-optimum designs for

determining the order of a chemical reaction. Technometrics, 39(4):347–356.

Atkinson, A. C. and Fedorov, V. V. (1975a). The design of experiments for discriminating

between two rival models. Biometrika, 62(1):57–70.

Atkinson, A. C. and Fedorov, V. V. (1975b). Optimal design: Experiments for discrimi-

nating between several models. Biometrika, 62(2):289–303.

Backer, J., Hagenaars, T., Nodelijk, G., and van Roermund, H. (2012). Vaccination

against foot-and-mouth disease i: Epidemiological consequences. Preventive Veterinary

Medicine, 107(1-2):27 – 40.

Bailey, D. J. and Gilligan, C. A. (1999). Dynamics of primary and secondary infection in

take-all epidemics. Phytopathology, 89(1):84 – 91.

Bailey, D. J., Kleczkowski, A., and Gilligan, C. A. (2004). Epidemiological dynamics and

the efficiency of biological control of soil-borne disease during consecutive epidemics in

a controlled environment. New Phytologist, 161(2):569–575.

Balciunas, D. and Lawler, S. P. (1995). Effects of basal resources, predation, and alter-

native prey in microcosm food chains. Ecology, 76(4):1327–1336.

Barthelme, S. and Chopin, N. (2014). Expectation propagation for likelihood-free infer-

ence. Journal of the American Statistical Association, 109(505):315–333.

Beaumont, M. A., Zhang, W., and Balding, D. J. (2002). Approximate Bayesian compu-

tation in population genetics. Genetics, 162(4):2025–2035.

Becker, N. G. (1993). Parametric inference for epidemic models. Mathematical Bio-

sciences, 117(1):239 – 251.

Biedermann, S., Dette, H., and Pepelyshev, A. (2007). Optimal discrimination de-

signs for exponential regression models. Journal of Statistical Planning and Inference,

137(8):2579 – 2592. 5th St. Petersburg Workshop on Simulation.

Biedermann, S. and Woods, D. C. (2011). Optimal designs for generalized non-linear

models with application to second-harmonic generation experiments. Journal of the

Royal Statistical Society: Series C (Applied Statistics), 60(2):281–299.


Blum, M. G. B., Nunes, M. A., Prangle, D., and Sisson, S. A. (2013). A comparative

review of dimension reduction methods in approximate Bayesian computation. Statist.

Sci., 28(2):189–208.

Bonsall, M. B. and Hassell, M. P. (2005). Understanding ecological concepts: The role

of laboratory systems. In Population Dynamics and Laboratory Ecology, volume 37 of

Advances in Ecological Research, pages 1 – 36. Academic Press.

Borth, D. M. (1975). A total entropy criterion for the dual problem of model discrim-

ination and parameter estimation. Journal of the Royal Statistical Society. Series B

(Methodological), 37(1):77–87.

Bouma, A., Elbers, A., Dekker, A., de Koeijer, A., Bartels, C., Vellema, P., van der Wal,

P., van Rooij, E., Pluimers, F., and de Jong, M. (2003). The foot-and-mouth disease

epidemic in the Netherlands in 2001. Preventive Veterinary Medicine, 57(3):155 – 166.

Box, G. E. P. and Hill, W. J. (1967). Discrimination among mechanistic models. Tech-

nometrics, 9(1):57–71.

Bradhurst, R. A., Roche, S. E., East, I. J., Kwan, P., and Garner, M. G. (2015). A hy-

brid modeling approach to simulating foot-and-mouth disease outbreaks in Australian

livestock. Frontiers in Environmental Science, 3:17.

Bravo de Rueda, C., de Jong, M. C., Eble, P. L., and Dekker, A. (2015). Quantification of

transmission of foot-and-mouth disease virus caused by an environment contaminated

with secretions and excretions from infected calves. Veterinary Research, 46(1):43.

Cavagnaro, D. R., Myung, J. I., Pitt, M. A., and Kujala, J. V. (2010). Adaptive de-

sign optimization: A mutual information-based approach to model discrimination in

cognitive science. Neural Computation, 22(4):887–905.

Christophe, D. and Petr, S. (2018). randtoolbox: Generating and Testing Random Num-

bers. R package version 1.17.1.

Clyde, M. and Chaloner, K. (1996). The equivalence of constrained and weighted designs

in multiple objective design problems. Journal of the American Statistical Association,

91(435):1236–1244.

Cook, A. R., Gibson, G. J., and Gilligan, C. A. (2008). Optimal observation times in

experimental epidemic processes. Biometrics, 64(3):860–868.

Dehideniya, M. B., Drovandi, C. C., and McGree, J. M. (2018a). Dual pur-

pose Bayesian design for parameter estimation and model discrimination in epi-

demiology using a synthetic likelihood approach. Technical report. URL :

https://eprints.qut.edu.au/118569/.

Dehideniya, M. B., Drovandi, C. C., and McGree, J. M. (2018b). Optimal Bayesian

design for discriminating between models with intractable likelihoods in epidemiology.

Computational Statistics & Data Analysis, 124:277 – 297.

Denman, N., McGree, J., Eccleston, J., and Duffull, S. (2011). Design of experiments for

bivariate binary responses modelled by copula functions. Computational Statistics &

Data Analysis, 55(4):1509 – 1520.

Dror, H. A. and Steinberg, D. M. (2006). Robust experimental design for multivariate

generalized linear models. Technometrics, 48(4):520–529.


Drovandi, C. C., McGree, J. M., and Pettitt, A. N. (2013). Sequential Monte Carlo for

Bayesian sequentially designed experiments for discrete data. Computational Statistics

& Data Analysis, 57(1):320 – 335.

Drovandi, C. C., McGree, J. M., and Pettitt, A. N. (2014). A sequential Monte Carlo

algorithm to incorporate model uncertainty in Bayesian sequential design. Journal of

Computational and Graphical Statistics, 23(1):3–24.

Drovandi, C. C. and Pettitt, A. N. (2013). Bayesian experimental design for models with

intractable likelihoods. Biometrics, 69(4):937–948.

Drovandi, C. C., Pettitt, A. N., and Faddy, M. J. (2011). Approximate Bayesian com-

putation using indirect inference. Journal of the Royal Statistical Society: Series C

(Applied Statistics), 60(3):317–337.

Drovandi, C. C., Pettitt, A. N., and Lee, A. (2015). Bayesian indirect inference using a

parametric auxiliary model. Statistical Science, 30(1):72–95.

Drovandi, C. C. and Tran, M.-N. (2018). Improving the efficiency of fully Bayesian

optimal design of experiments using randomised quasi-Monte Carlo. Bayesian Analysis,

13(1):139–162.

Faller, D., Klingmuller, U., and Timmer, J. (2003). Simulation methods for optimal

experimental design in systems biology. SIMULATION, 79(12):717–725.

Fasiolo, M. (2016). Statistical Methods for Complex Population Dynamics. PhD thesis,

University of Bath.

Fasiolo, M., Wood, S. N., Hartig, F., and Bravington, M. V. (2018). An extended empirical

saddlepoint approximation for intractable likelihoods. Electronic Journal of Statistics,

12(1):1544–1578.

Gallant, A. R. and McCulloch, R. E. (2009). On the determination of general scientific

models with application to asset pricing. Journal of the American Statistical Associa-

tion, 104(485):117–131.

Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. The

Journal of Physical Chemistry, 81(25):2340–2361.

Gillespie, D. T. (2001). Approximate accelerated stochastic simulation of chemically

reacting systems. The Journal of Chemical Physics, 115(4):1716–1733.

Goos, P. and Jones, B. (2011). An Optimal Screening Experiment, chapter 2, pages 9–45.

John Wiley & Sons, Ltd.

Gotwalt, C. M., Jones, B. A., and Steinberg, D. M. (2009). Fast computation of designs

robust to parameter uncertainty for nonlinear settings. Technometrics, 51(1):88–95.

Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and

Bayesian model determination. Biometrika, 82(4):711–732.

Grelaud, A., Robert, C. P., Marin, J.-M., Rodolphe, F., and Taly, J.-F. (2009). ABC

likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis,

4(2):317–335.

Hainy, M., Muller, W. G., and Wagner, H. (2014). Likelihood-free simulation-based

optimal design: An Introduction. In Melas, V., Mignani, S., Monari, P., and Salmaso,

L., editors, Topics in Statistical Simulation, pages 271–278, New York, NY. Springer

New York.


Hainy, M., Muller, W. G., and Wagner, H. (2016). Likelihood-free simulation-based opti-

mal design with an application to spatial extremes. Stochastic Environmental Research

and Risk Assessment, 30(2):481–492.

Hainy, M., Muller, W. G., and Wynn, H. P. (2013). Approximate Bayesian computation

design (ABCD), an Introduction. In Ucinski, D., Atkinson, A. C., and Patan, M.,

editors, mODa 10 – Advances in Model-Oriented Design and Analysis, pages 135–143,

Heidelberg. Springer International Publishing.

Hainy, M., Price, D. J., Restif, O., and Drovandi, C. (2018). Optimal Bayesian design for

model discrimination via classification. arXiv preprint arXiv:1809.05301.

Hassell, M. P., Lawton, J. H., and May, R. M. (1976). Patterns of dynamical behaviour

in single-species populations. Journal of Animal Ecology, 45(2):471–486.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their

applications. Biometrika, 57(1):97–109.

Haydon, D. T., Kao, R. R., and Kitching, R. P. (2004). The UK foot-and-mouth disease

outbreak - the aftermath. Nature Reviews Microbiology, 2(8):675–681.

Hill, W. J., Hunter, W. G., and Wichern, D. W. (1968). A joint design criterion for

the dual problem of model discrimination and parameter estimation. Technometrics,

10(1):145–160.

Hu, B., Gonzales, J. L., and Gubbins, S. (2017). Bayesian inference of epidemiological

parameters from transmission experiments. Scientific Reports, 7(1):16774.

Kelley, C. (1999). Iterative Methods for Optimization. SIAM.

Kim, K. I. and Lin, Z. (2008). Asymptotic behavior of an SEI epidemic model with

diffusion. Mathematical and Computer Modelling, 47(11):1314–1322.

Kleczkowski, A., Bailey, D. J., and Gilligan, C. A. (1996). Dynamically generated variabil-

ity in plant-pathogen systems with biological control. Proceedings of the Royal Society

of London B: Biological Sciences, 263(1371):777–783.

Knight-Jones, T. and Rushton, J. (2013). The economic impacts of foot and mouth disease

- what are they, how big are they and where do they occur? Preventive Veterinary

Medicine, 112(3):161 – 173.

Konstantinou, M., Biedermann, S., and Kimber, A. C. (2015). Optimal designs for full

and partial likelihood information — with application to survival models. Journal of

Statistical Planning and Inference, 165:27 – 37.

Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. The Annals of

Mathematical Statistics, 22(1):79–86.

Lawler, S. P. (1993). Direct and indirect effects in microcosm communities of protists.

Oecologia, 93(2):184–190.

Leclerc, M., Dore, T., Gilligan, C. A., Lucas, P., and Filipe, J. A. N. (2014). Estimat-

ing the delay between host infection and disease (incubation period) and assessing its

significance to the epidemiology of plant diseases. PLOS ONE, 9(1):1–15.

Lee, X. J., Drovandi, C. C., and Pettitt, A. N. (2015). Model choice problems using

approximate Bayesian computation with applications to pathogen transmission data

sets. Biometrics, 71(1):198–207.


Lindley, D. V., Barndorff-Nielsen, O., Elfving, G., Harsaae, E., Thorburn, D., Hald, A.,

and Spjotvoll, E. (1978). The Bayesian approach [with discussion and reply]. Scandi-

navian Journal of Statistics, 5(1):1–26.

Long, Q., Scavino, M., Tempone, R., and Wang, S. (2013). Fast estimation of expected

information gains for Bayesian experimental designs based on Laplace approximations.

Computer Methods in Applied Mechanics and Engineering, 259:24 – 39.

Lopez-Fidalgo, J., Tommasi, C., and Trandafir, P. C. (2007). An optimal experimental

design criterion for discriminating between non-normal models. Journal of the Royal

Statistical Society: Series B (Statistical Methodology), 69(2):231–242.

Luckinbill, L. S. (1973). Coexistence in laboratory populations of paramecium aurelia

and its predator didinium nasutum. Ecology, 54(6):1320–1327.

Marjoram, P., Molitor, J., Plagnol, V., and Tavare, S. (2003). Markov chain Monte Carlo

without likelihoods. Proceedings of the National Academy of Sciences, 100(26):15324–

15328.

McGree, J. M. and Eccleston, J. A. (2010). Investigating design for survival models.

Metrika, 72(3):295–311.

McGree, J. M. (2017). Developments of the total entropy utility function for the dual

purpose of model discrimination and parameter estimation in Bayesian design. Com-

putational Statistics and Data Analysis, 113:207–225.

McGree, J. M., Drovandi, C. C., Thompson, M. H., Eccleston, J. A., Duffull, S. B.,

Mengersen, K., Pettitt, A. N., and Goggin, T. (2012). Adaptive Bayesian compound de-

signs for dose finding studies. Journal of Statistical Planning and Inference, 142(6):1480

– 1492.

McGree, J. M., Drovandi, C. C., White, G., and Pettitt, A. N. (2016). A pseudo-marginal

sequential Monte Carlo algorithm for random effects models in Bayesian sequential

design. Statistics and Computing, 26(5):1121–1136.

McGree, J. M. and Eccleston, J. A. (2008). Probability-based optimal design. Australian

& New Zealand Journal of Statistics, 50(1):13–28.

McGree, J. M. and Eccleston, J. A. (2012). Robust designs for poisson regression models.

Technometrics, 54(1):64–72.

McGree, J. M., Eccleston, J. A., and Duffull, S. B. (2008). Compound optimal design

criteria for nonlinear models. Journal of Biopharmaceutical Statistics, 18(4):646–661.

McKeeman, W. M. (1962). Algorithm 145: Adaptive numerical integration by Simpson’s

rule. Communications of the ACM, 5(12):604.

Meyer, R. K. and Nachtsheim, C. J. (1995). The coordinate-exchange algorithm for

constructing exact optimal experimental designs. Technometrics, 37(1):60–69.

Montgomery, D. C. (2017). Design and Analysis of Experiments. John Wiley & Sons, 9

edition.

Morse, P. M. (1955). Stochastic properties of waiting lines. Journal of the Operations

Research Society of America, 3(3):255–261.

Muller, P. (1999). Simulation-based optimal design. Bayesian Statistics 6, pages 459–474.

Muller, W. G. and Ponce de Leon, A. C. M. (1996). Optimal design of an experiment in

economics. The Economic Journal, 106(434):122–127.


Muroga, N., Hayama, Y., Yamamoto, T., Kurogi, A., Tsuda, T., and Tsutsui, T. (2012).

The 2010 foot-and-mouth disease epidemic in Japan. Journal of Veterinary Medical

Science, 74(4):399–404.

Ng, S. H. and Chick, S. E. (2004). Design of follow-up experiments for improving model

discrimination and parameter estimation. Naval Research Logistics (NRL), 51(8):1129–

1148.

Nicholson, A. J. (1954). An outline of the dynamics of animal populations. Australian

Journal of Zoology, 2(1):9–65.

Ong, V. M. H., Nott, D. J., Tran, M.-N., Sisson, S. A., and Drovandi, C. C. (2018).

Variational bayes with synthetic likelihood. Statistics and Computing, 28(4):971–988.

Orsel, K., de Jong, M., Bouma, A., Stegeman, J., and Dekker, A. (2007). The effect of

vaccination on foot and mouth disease virus transmission among dairy cows. Vaccine,

25(2):327 – 335.

Otten, W., Filipe, J. A. N., Bailey, D. J., and Gilligan, C. A. (2003). Quantification and

analysis of transmission rates for soilborne epidemics. Ecology, 84(12):3232–3239.

Overstall, A. M. and McGree, J. M. (2018). Bayesian design of experiments for intractable

likelihood models using coupled auxiliary models and multivariate emulation. Bayesian

Analysis.

Overstall, A. M., McGree, J. M., and Drovandi, C. C. (2018). An approach for finding

fully Bayesian optimal designs using normal-based approximations to loss functions.

Statistics and Computing, 28(2):343–358.

Overstall, A. M. and Woods, D. C. (2017). Bayesian design of experiments using approx-

imate coordinate exchange. Technometrics, 59(4):458–470.

Overstall, A. M., Woods, D. C., and Adamou, M. (2017a). acebayes: An R package for

Bayesian optimal design of experiments via approximate coordinate exchange. arXiv

preprint arXiv:1705.08096.

Overstall, A. M., Woods, D. C., and Adamou, M. (2017b). acebayes: Optimal Bayesian

Experimental Design using the ACE Algorithm. R package version 1.4.

Owen, A. B. (1997). Scrambled net variance for integrals of smooth functions. The Annals

of Statistics, 25(4):1541–1562.

Pagendam, D. and Pollett, P. (2013). Optimal design of experimental epidemics. Journal

of Statistical Planning and Inference, 143(3):563 – 572.

Palhazi Cuervo, D., Goos, P., and Sorensen, K. (2016). Optimal design of large-scale

screening experiments: a critical look at the coordinate-exchange algorithm. Statistics

and Computing, 26(1):15–28.

Parker, B. M., Gilmour, S., Schormans, J., and Maruri-Aguilar, H. (2015). Optimal design

of measurements on queueing systems. Queueing Systems, 79(3):365–390.

Perrone, E. and Muller, W. (2016). Optimal designs for copula models. Statistics,

50(4):917–929.

Ponce de Leon, A. C. and Atkinson, A. C. (1991). Optimum experimental design for dis-

criminating between two rival models in the presence of prior information. Biometrika,

78(3):601–608.


Price, D. J., Bean, N. G., Ross, J. V., and Tuke, J. (2016). On the efficient determi-

nation of optimal Bayesian experimental designs using ABC: A case study in optimal

observation of epidemics. Journal of Statistical Planning and Inference, 172:1–15.

Price, D. J., Bean, N. G., Ross, J. V., and Tuke, J. (2018a). Designing group dose-response

studies in the presence of transmission. Mathematical Biosciences, 304:62 – 78.

Price, D. J., Bean, N. G., Ross, J. V., and Tuke, J. (2018b). An induced natural selection

heuristic for finding optimal Bayesian experimental designs. Computational Statistics

& Data Analysis, 126:112 – 124.

Price, L. F., Drovandi, C. C., Lee, A., and Nott, D. J. (2018c). Bayesian synthetic

likelihood. Journal of Computational and Graphical Statistics, 27(1):1–11.

Rose, A. D. (2008). Bayesian experimental design for model discrimination. PhD thesis,

University of Southampton.

Ryan, C. M., Drovandi, C. C., and Pettitt, A. N. (2016a). Optimal Bayesian experimen-

tal design for models with intractable likelihoods using indirect inference applied to

biological process models. Bayesian Analysis, 11(3):857–883.

Ryan, E. G., Drovandi, C. C., McGree, J. M., and Pettitt, A. N. (2016b). A review of

modern computational algorithms for Bayesian optimal design. International Statistical

Review, 84(1):128–154.

Ryan, E. G., Drovandi, C. C., and Pettitt, A. N. (2015). Fully Bayesian experimental

design for pharmacokinetic studies. Entropy, 17(3):1063–1089.

Ryan, E. G., Drovandi, C. C., Thompson, M. H., and Pettitt, A. N. (2014). Towards

Bayesian experimental design for nonlinear models that require a large number of sam-

pling times. Computational Statistics & Data Analysis, 70:45 – 60.

Ryan, K. J. (2003). Estimating expected information gains for experimental designs with

application to the random fatigue-limit model. Journal of Computational and Graphical

Statistics, 12(3):585–603.

Sidje, R. B. (1998). Expokit: A software package for computing matrix exponentials.

ACM Transactions on Mathematical Software, 24(1):130–156.

Sisson, S. A., Fan, Y., and Tanaka, M. M. (2007). Sequential Monte Carlo without

likelihoods. Proceedings of the National Academy of Sciences, 104(6):1760–1765.

Solkner, J. (1993). Choice of optimality criteria for the design of crossbreeding experi-

ments. Journal of animal science, 71(11):2867–2873.

Stenfeldt, C., Pacheco, J. M., Brito, B. P., Moreno-Torres, K. I., Branan, M. A., Delgado,

A. H., Rodriguez, L. L., and Arzt, J. (2016). Transmission of foot-and-mouth disease

virus during the incubation period in pigs. Frontiers in Veterinary Science, 3:105.

Stroud, J. R., Muller, P., and Rosner, G. L. (2001). Optimal sampling times in population

pharmacokinetic studies. Journal of the Royal Statistical Society: Series C (Applied

Statistics), 50(3):345–359.

Tommasi, C. (2009). Optimal designs for both model discrimination and parameter esti-

mation. Journal of Statistical Planning and Inference, 139(12):4123 – 4132.

Tommasi, C. and Lopez-Fidalgo, J. (2010). Bayesian optimum designs for discriminat-

ing between models with any distribution. Computational Statistics & Data Analysis,

54(1):143 – 150.


van der Goot, J. A., Koch, G., de Jong, M. C. M., and van Boven, M. (2005). Quantifica-

tion of the effect of vaccination on transmission of avian influenza (h7n7) in chickens.

Proceedings of the National Academy of Sciences, 102(50):18141–18146.

Waterhouse, T., Woods, D., Eccleston, J., and Lewis, S. (2008). Design selection criteria

for discrimination/estimation for nested models and a binomial response. Journal of

Statistical Planning and Inference, 138(1):132–144.

Weir, C. J., Spiegelhalter, D. J., and Grieve, A. P. (2007). Flexible design and efficient im-

plementation of adaptive dose-finding studies. Journal of Biopharmaceutical Statistics,

17(6):1033–1050.

Wood, S. N. (2010). Statistical inference for noisy nonlinear ecological dynamic systems.

Nature, 466(7310):1102–1104.

Woods, D. C., Lewis, S. M., Eccleston, J. A., and Russell, K. G. (2006). Designs for

generalized linear models with several variables and model uncertainty. Technometrics,

48(2):284–292.

Woods, D. C., McGree, J. M., and Lewis, S. M. (2017). Model selection via Bayesian

information capacity designs for generalised linear models. Computational Statistics &

Data Analysis, 113:226–238.

Wu, H.-P. and Stufken, J. (2014). Locally Φ p-optimal designs for generalized linear

models with a single-variable quadratic polynomial predictor. Biometrika, 101(2):365–

375.

Zhang, J. F., Papanikolaou, N. E., Kypraios, T., and Drovandi, C. C. (2018). Optimal

experimental design for predator–prey functional response experiments. Journal of The

Royal Society Interface, 15(144).