Sample size determination for prevalence estimation
in the absence of a gold standard diagnostic test
Elham H. Rahme
Department of Mathematics and Statistics
McGill University, Montréal
November 1996
A thesis submitted to the Faculty of Graduate Studies and Research
in partial fulfillment of the requirements of the degree of Ph.D.
© Elham Rahme, 1996
National Library of Canada
Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Acknowledgments
I would like to express my deepest gratitude to Professor Lawrence Joseph, who directed the writing of this thesis. His invaluable suggestions, constant encouragement, and endless patience are greatly appreciated. I am also deeply grateful to Professor David Wolfson for his advice and his constant support. I would also like to thank Professor Keith Worsley for his comments and his kindness. Many thanks are also due to the staff of the department for their support and their patience.
Abstract
A common problem in medical research is the estimation of the prevalence of a disease in a given population. This is usually accomplished by applying a diagnostic test to a sample of subjects from the target population. In this thesis, we investigate the sample size requirements for the accurate estimation of disease prevalence in such experiments. When a gold standard diagnostic test is available, estimating the prevalence of a disease can be viewed as a problem in estimating a binomial proportion. In this case, we discuss some anomalies in the classical sample size criteria for binomial parameter estimation. These are especially important with small sample sizes. When a gold standard test is not available, one must take misclassification errors into account in order to avoid misleading results. When the sensitivity and the specificity of the diagnostic test are both known, a new adjustment to the maximum likelihood estimator of the prevalence is suggested, and confidence intervals and sample size estimates that arise from this estimator are given. A Bayesian approach is taken when the sensitivity and specificity of the diagnostic test are not exactly known. Here, a method is provided to determine the sample size needed to satisfy a Bayesian sample size criterion that averages over the preposterior marginal distribution of the data. Exact methods are given in some cases, and a sampling importance resampling algorithm is used for more complex situations. A main conclusion is that the degree to which the properties of a diagnostic test are known can have a very large effect on the sample size requirements.
Résumé
Estimating the prevalence of a disease in a given population is a common problem in medical research. This estimation is generally carried out by giving a diagnostic test to a sample from the target population. In this thesis, we study the sample size needed to estimate the prevalence in such experiments. When a perfect diagnostic test, serving as a gold standard, exists, estimating the prevalence of a disease can be viewed as a problem of estimating the proportion of a binomial distribution. In this case, we will show that there are some misconceptions concerning the classical criterion used to calculate the sample size. These misconceptions are particularly important for small samples. When a perfect diagnostic test does not exist, one must take classification errors into account to avoid misleading results. If the sensitivity and the specificity of the diagnostic test are both known, a new adjustment to the maximum likelihood estimator of the prevalence is suggested. Confidence intervals and sample sizes resulting from this estimator are calculated. A Bayesian approach is taken when the sensitivity and the specificity of the diagnostic test are not known. Here, a method is presented for calculating the sample size needed to satisfy a Bayesian criterion that averages with respect to the marginal probability of the data. Exact methods are proposed in some cases, and a "Sampling Importance Resampling" algorithm is used in more complex situations.
Contents
1 Introduction
  1.1 Diagnostic tests
  1.2 Objectives
  1.3 Estimating the prevalence of a disease
  1.4 Thesis outline
2 Preliminaries
  2.1 Some additional characteristics of diagnostic tests
  2.2 Frequentist approaches to sample size determination
    2.2.1 Sample size required for a point estimate of a parameter to fall within a distance d of the true value
    2.2.2 Sample size for a given power in a test of hypothesis
  2.3 Bayesian approaches to sample size determination
    2.3.1 Bayesian statistical inference
    2.3.2 Bayesian sample size criteria
  2.4 Computational techniques
    2.4.1 The SIR algorithm
    2.4.2 The Gibbs sampler algorithm
3 Previous results and literature review
  3.1 Diagnostic tests
  3.2 Sample size
4 Sample size in binomial studies: A new criterion
  4.1 Introduction
  4.2 Anomalies
  4.3 Modified criterion
  4.4 Examples
5 Estimating the disease prevalence when the sensitivity and specificity are exactly known
  5.1 Introduction
  5.2 Definitions
  5.3 Adjustment to the MLE
  5.4 Confidence intervals
    5.4.1 Confidence interval for p
    5.4.2 Confidence interval for θ
  5.5 Sample size for estimating the prevalence
6 Bayesian estimation of disease prevalence and sample size in the absence of a gold standard
  6.1 Introduction
  6.2 The case when the sensitivity and the specificity of the diagnostic test are exactly known
    6.2.1 Prior density for p
    6.2.2 Posterior density of p
    6.2.3 Sample size determination via the average coverage criterion
    6.2.4 Examples
    6.2.5 Comparing the sample sizes from the Bayesian approach to those based on the AMLE
  6.3 The case when the specificity but not the sensitivity of the diagnostic test is exactly known
    6.3.1 Introduction
    6.3.2 Prior density for p
    6.3.3 Posterior density of p
    6.3.4 Posterior mean of θ
    6.3.5 Sample size
    6.3.6 SIR computations
    6.3.7 Examples
  6.4 The case where both the sensitivity and the specificity of the diagnostic test are unknown
    6.4.1 Introduction
    6.4.2 Exact computations for the case of uniform prior distributions
    6.4.3 SIR computations
    6.4.4 Examples
  6.5 Logistic regression models
7 Practical Implications
  7.1 Procedure to find the sample size when the sensitivity and/or the specificity are unknown
8 Discussion
Appendix
  A. Splus program to find the AMLE and the confidence interval for the AMLE
  B. Splus program to calculate the average coverage probability when both the sensitivity and the specificity are known
  C. Computing the posterior average coverage probability when the specificity is known
  D. Regions of integration
  E. The SIR program to calculate the average coverage probability
  F. Logistic models
Chapter 1
Introduction
1.1 Diagnostic tests
Diagnostic tests are widely used in medicine to help determine the presence or absence of a certain disease in an individual. Unfortunately, most tests are not perfect, in the sense that a given test may classify a healthy individual as diseased, or a diseased individual as healthy. Reasons for these errors may include laboratory or human errors, technical imperfections in the tests themselves, and the difficulties of subjective clinical judgments. Very often, the test does not directly measure the presence or absence of the disease, but rather measures the degree to which a marker for the disease, which may also be present in healthy individuals, is manifest. For example, in parasitology, serologic testing may confirm the presence of antibodies associated with a certain parasite long after an individual has been cured.
In general, diagnostic test results may take values on a continuous scale (an interval or the whole real line), or they may be dichotomous (positive/negative), trichotomous (high/medium/low), or of some other form. In this thesis we concentrate on dichotomous test results. Of course, by choosing appropriate cutoff values, all diagnostic tests can ultimately be dichotomized, although often this entails a loss of information. Many of the results presented here can be easily extended to other types of test results.
The degree of imperfection of a diagnostic test can be measured by its sensitivity and specificity.
The sensitivity of a dichotomous diagnostic test is defined to be its ability to detect the disease when the disease is present. It is denoted here by $S$, so that $S = P(T^+ \mid D^+)$, where $T^+$ indicates that the result of the test is positive and $D^+$ indicates the presence of the disease. In other words, the sensitivity is the probability of testing positive given that the disease is present.
The specificity of a diagnostic test is the ability of the test to confirm the absence of the disease when the disease is truly absent. It is denoted here by $C$, so that $C = P(T^- \mid D^-)$, where $T^-$ indicates a negative diagnostic test result and $D^-$ indicates the absence of disease. Thus the specificity of a test is the probability of testing negative given that the disease is absent.
The prevalence of a given disease in a population is the proportion of diseased individuals in the population. The prevalence is denoted here by $\theta$, so that $\theta = P(D^+)$. We now illustrate these definitions.
Example 1.1.1 Centor [1992] considered 773 subjects who were given a serum creatinine kinase (CK) test to determine whether or not they had a myocardial infarction (MI). The results are given in Table 1.1.

Table 1.1: Diagnosing myocardial infarction (MI) by serum creatinine kinase (CK). Abnormal is defined as CK ≥ 120, and Normal subjects have CK < 120.

                    MI present   MI absent   Total
    CK Abnormal             28         251     279
    CK Normal               23         471     494
    Total                   51         722     773

The sensitivity of the test is estimated by $s = 28/(28 + 23) = 0.55$, and the specificity is estimated by $c = 471/(251 + 471) = 0.65$. A point estimate of the prevalence of myocardial infarction is $\hat\theta = (28 + 23)/(28 + 23 + 251 + 471) = 0.066$.

When the results of the test always coincide with the true state of disease, the test is perfectly accurate and is often then referred to as a gold standard. Clearly, the sensitivity and the specificity of a gold standard test must both be equal to one.
Perfect gold standard tests rarely, if ever, exist, since even a theoretically perfect test can be rendered imperfect by human, laboratory, or other errors. Therefore, a test is often referred to as a gold standard if it is the best test available, even if it does not have $S = C = 1$. Examples of imperfect tests that are considered to be gold standards include arteriography, which is regarded as a gold standard diagnosis for coronary artery disease, and the barium enema, which is regarded as a gold standard for colon cancer.
Even when they exist, gold standard tests may be difficult to perform, highly
invasive, very costly, or time consuming, so that alternative tests are often considered.
In developing alternative tests, their performance is often compared to that of the
gold standard. For example, the stress method is a diagnostic test for coronary artery
disease whose test properties can be compared to arteriography. The stool guaiac test
is diagnostic for colon cancer and can be compared to the barium enema test. Table
1.1 provides an example of the results of using serum creatinine kinase as a diagnostic
test for myocardial infarction.
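As a quick arithmetic check of Example 1.1.1, the following Python sketch recomputes the estimates from the counts in Table 1.1, treating MI status as known without error (the gold-standard setting of the example). The function name `estimate_test_properties` is illustrative and not taken from the thesis.

```python
# Counts from Table 1.1 (Centor [1992]): CK result versus true MI status.
TP, FP = 28, 251   # CK abnormal (test positive): MI present, MI absent
FN, TN = 23, 471   # CK normal (test negative):   MI present, MI absent


def estimate_test_properties(tp, fp, fn, tn):
    """Return (sensitivity, specificity, prevalence) estimated from a 2x2
    table in which true disease status is known (gold standard)."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # s = P(T+ | D+)
    specificity = tn / (tn + fp)   # c = P(T- | D-)
    prevalence = (tp + fn) / n     # proportion truly diseased
    return sensitivity, specificity, prevalence


s, c, theta_hat = estimate_test_properties(TP, FP, FN, TN)
print(f"s = {s:.2f}, c = {c:.2f}, prevalence = {theta_hat:.3f}")
# prints: s = 0.55, c = 0.65, prevalence = 0.066
```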
1.2 Objectives
Determination of the sample size requirements during the planning phase of a study is a very important and often difficult problem. A sample that is too small can produce unstable estimates that can be very misleading, while an unnecessarily large sample will be wasteful of resources, including dollar costs and time to complete the study, and may expose study subjects to unnecessary risks.
In developing a new diagnostic test, one first needs to perform a study to estimate
the sensitivity and the specificity of the test. This thesis will address this problem
only very briefly. Separate studies are needed to give this problem the attention it
deserves.
Once a new test is developed, and its properties are estimated, it can be used
in screening studies to estimate the prevalence of a disease in a given population.
Several questions arise in such situations, two of which will be considered in this
thesis:
1. What method should be used to estimate the prevalence of a disease, when
disease status will be determined by an imperfect diagnostic test?
2. What should the sample size be in order to estimate the prevalence using these
methods with sufficient accuracy?
These problems are particularly difficult, since not only are we using an imperfect
test, but the degree of imperfection is only very rarely known exactly. Standard
methods for sample size determination therefore do not apply. In this thesis, we will
develop new methodology to address the above problems, and provide algorithms and
advice to enable researchers to apply the methods in practice.
1.3 Estimating the prevalence of a disease
When a gold standard test is available, estimating the prevalence of a disease from the test results on a random sample of subjects from the target population can be viewed as the straightforward estimation of a binomial parameter. Similarly, if results from two diagnostic tests are available, one of which is a gold standard, then estimation of the sensitivity and the specificity of the other test is also straightforward; see Example 1.1.1. When a gold standard test is not available, however, one must take the misclassification errors into account in order to avoid misleading results. In particular, estimating the prevalence of the disease by the proportion of subjects in the sample who test positive can be very inaccurate when the diagnostic test is not a gold standard. For example, consider the case of a disease that has a low prevalence, say smaller than 0.01, and where the specificity of the diagnostic test is 0.7. We will see in chapter 4 that in this situation, the proportion of subjects with positive test results will be close to 0.3, no matter how large a sample size is taken, since the probability of a positive result, $\theta S + (1 - \theta)(1 - C)$, is then close to $1 - C = 0.3$. Clearly, estimating $\theta$ by that proportion would be very misleading, so that adjustment for the sensitivity and specificity of the test is required.
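The following sketch illustrates this numerically using $P(T^+) = \theta S + (1 - \theta)(1 - C)$. The sensitivity $S = 0.9$ is an assumed value chosen only for illustration (the text fixes only $C = 0.7$), and `corrected_prevalence` is a hypothetical helper that simply solves this relation for $\theta$ and truncates the answer to $[0, 1]$; it is not the adjusted estimator developed in chapter 5.

```python
def apparent_prevalence(theta, sens, spec):
    """P(T+) = theta*S + (1 - theta)*(1 - C): probability of a positive test."""
    return theta * sens + (1 - theta) * (1 - spec)


def corrected_prevalence(p_pos, sens, spec):
    """Solve p_pos = theta*S + (1 - theta)*(1 - C) for theta, then truncate
    to [0, 1]. Requires S + C > 1 (a test better than chance)."""
    if sens + spec <= 1:
        raise ValueError("need S + C > 1")
    theta = (p_pos - (1 - spec)) / (sens + spec - 1)
    return min(max(theta, 0.0), 1.0)


S, C = 0.9, 0.7  # S is an assumed illustrative value; C = 0.7 as in the text
for theta in (0.0, 0.005, 0.01):
    # The proportion of positives stays near 1 - C = 0.3 for low prevalence.
    print(theta, round(apparent_prevalence(theta, S, C), 4))

# An observed positive proportion below 1 - C gives a negative raw solution,
# which the truncation maps to 0.
print(corrected_prevalence(0.29, S, C))
```

When the observed proportion of positives falls below $1 - C$, the untruncated solution is negative, which illustrates how estimates on the boundary $\theta = 0$ can arise when the prevalence is low.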
1.4 Thesis outline
This thesis is mainly concerned with estimating sample size requirements for studies involving imperfect diagnostic tests. It is structured as follows.
We have already defined the prevalence of a disease, the sensitivity and specificity
of a diagnostic test, and in chapter 2 we will present some additional definitions of
quantities related to diagnostic tests. We will also describe both frequentist and
Bayesian criteria for sample size estimation, and briefly outline the ideas behind
several estimation techniques that may be useful for sample size problems, including
those that will be helpful later in the thesis.
In chapter 3 we review some of the previous literature on the calculation of sample
sizes, both in general and those that relate to the estimation of the prevalence of a
disease.
In chapter 4 we consider the classical problem of estimating a binomial parameter. In planning binomial experiments, sample sizes are often calculated to ensure that the point estimate will be within a desired distance from the true value with sufficiently high probability. Since exact calculations resulting from the standard formulation of this problem can be difficult, "conservative" and/or normal approximations are frequently used. Some problems with the current formulations are given, and a modified criterion that leads to some improvement is provided. A simple algorithm that calculates the exact sample sizes under the modified criterion is provided, and these sample sizes are compared to those given by the standard approximate criterion.
Chapter 5 focuses on a maximum likelihood approach for estimating the prevalence of a disease when the sensitivity and the specificity of the test are exactly known. We consider the case of a disease with low prevalence. In this situation, the maximum likelihood estimator (MLE) of $\theta$ is often zero, which is unrealistic. We define an adjusted MLE of $\theta$, which we denote by AMLE. We will show that the AMLE is easy to calculate and can be considered a more reasonable estimator than the MLE, in a sense to be made more precise in chapter 5. We also discuss confidence intervals for the AMLE, and find the sample size needed to estimate $\theta$ to within a given accuracy using the AMLE.
In most situations, however, the sensitivity and specificity of the diagnostic test used to estimate the prevalence of a disease in a given study are not exactly known. While some information is usually available on the tests, it is not normally precise enough for experts to agree on a single value for the sensitivity or the specificity. Furthermore, these values may vary from situation to situation. Nevertheless, it may be possible to construct a probability distribution over a range of sensitivity and specificity values that represents what is known about the tests. In chapter 6 we use Bayesian methods to study the sample size requirements for estimating the prevalence of a disease to within a given accuracy, in several different situations. First, we examine the problem when the sensitivity and the specificity of the diagnostic test are exactly known. Next, we extend these methods to the case where the specificity but not the sensitivity is exactly known. Finally, we extend these methods to the case where neither the sensitivity nor the specificity is exactly known. We demonstrate that in the latter two cases an upper bound on the accuracy exists, so that higher precision cannot be attained regardless of the sample size. We will develop a logistic regression model to approximate the upper limit on the accuracy, and the appropriate sample size needed to approach this upper limit.
Chapter 7 discusses the results of the previous chapters in the context of real examples. We suggest a procedure to calculate the sample size required to estimate the prevalence of a disease in practice.
Finally, chapter 8 contains a discussion and suggestions for further research.
Chapter 2
Preliminaries
In chapter 1, we defined the prevalence of a disease and the sensitivity and specificity of a diagnostic test. The sensitivity and specificity are very important in assessing the performance of a test. As we shall see in this thesis, these properties also play an important role in the estimation of the prevalence of a disease. Other characteristics of diagnostic tests are also of interest. In this chapter we present some additional definitions, as well as theorems and other methods that will be used throughout this thesis.
Throughout the remainder of this thesis, $f(y)$ will denote the density function of a random vector $Y$. In cases where confusion might occur, we will instead use the notation $f_Y(y)$. We will also continue to use the notation introduced in chapter 1 for parameters related to diagnostic tests.
2.1 Some additional characteristics of diagnostic tests
Definition 2.1.1 The positive predictive value (ppv) is defined to be the probability of truly having the disease given that the test result is positive: $ppv = P(D^+ \mid T^+)$.
Definition 2.1.2 The negative predictive value (npv) is defined to be the probability of truly not having the disease given that the test result is negative: $npv = P(D^- \mid T^-)$.
Definition 2.1.3 The positive likelihood ratio of a dichotomous diagnostic test is
$$LR^+ = \frac{\text{sensitivity}}{1 - \text{specificity}} = \frac{S}{1 - C},$$
and the negative likelihood ratio is
$$LR^- = \frac{1 - \text{sensitivity}}{\text{specificity}} = \frac{1 - S}{C}.$$
Example 2.1.1 Consider Example 1.1.1 of chapter 1. In this example the positive predictive value of the test is estimated by $ppv = 28/(28 + 251) = 0.10$, and the negative predictive value is estimated by $npv = 471/(23 + 471) = 0.95$. The positive likelihood ratio of the test is estimated by $LR^+ = (28/51)/(251/722) = 1.58$, and the negative likelihood ratio of the test is estimated by $LR^- = (23/51)/(471/722) = 0.69$.
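The estimates in Example 2.1.1 can be reproduced from the Table 1.1 counts with a few lines of Python. This is only an arithmetic check; the variable names are illustrative.

```python
TP, FP, FN, TN = 28, 251, 23, 471  # Table 1.1 counts

ppv = TP / (TP + FP)          # P(D+ | T+)
npv = TN / (TN + FN)          # P(D- | T-)
sens = TP / (TP + FN)         # s = 28/51
spec = TN / (TN + FP)         # c = 471/722
lr_pos = sens / (1 - spec)    # LR+ = S / (1 - C)
lr_neg = (1 - sens) / spec    # LR- = (1 - S) / C

print(f"ppv={ppv:.2f} npv={npv:.2f} LR+={lr_pos:.2f} LR-={lr_neg:.2f}")
# prints: ppv=0.10 npv=0.95 LR+=1.58 LR-=0.69
```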
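Theorem 2.1.1 (Bayes' theorem) Let $B_1, B_2, \ldots$ be a countable collection of mutually exclusive events such that $P(B_i) > 0$ for $i = 1, 2, \ldots$, and $\Omega = \bigcup_{i=1}^{\infty} B_i$, where $\Omega$ is the sample space. Let $A$ be an event with $P(A) > 0$. Then for $i = 1, 2, \ldots$, we have
$$P(B_i \mid A) = \frac{P(A \mid B_i)\, P(B_i)}{\sum_{j=1}^{\infty} P(A \mid B_j)\, P(B_j)}.$$
Proof: By definition,
$$P(B_i \mid A) = \frac{P(B_i \cap A)}{P(A)},$$
which can be written as
$$P(B_i \mid A) = \frac{P(A \mid B_i)\, P(B_i)}{P(A)}.$$
By the law of total probability,
$$P(A) = \sum_{j=1}^{\infty} P(A \mid B_j)\, P(B_j),$$
and therefore
$$P(B_i \mid A) = \frac{P(A \mid B_i)\, P(B_i)}{\sum_{j=1}^{\infty} P(A \mid B_j)\, P(B_j)}.$$
Let $X$ and $Y$ be two random variables with joint probability function $P(x, y)$. Bayes' theorem can then be written as
$$P(y \mid x) = \frac{P(x \mid y)\, P(y)}{\sum_{y'} P(x \mid y')\, P(y')},$$
whenever the denominator is positive. If, on the other hand, $X$ and $Y$ have a joint continuous density function $f(x, y)$, then Bayes' theorem can be written as
$$f(y \mid x) = \frac{f(x \mid y)\, f(y)}{\int f(x \mid y')\, f(y')\, dy'}.$$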
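Proposition 2.1.1 The ppv and the npv can be written in terms of the prevalence of the disease and the sensitivity and the specificity of the diagnostic test. In fact,
$$ppv = \frac{\theta S}{\theta S + (1 - \theta)(1 - C)}$$
and
$$npv = \frac{(1 - \theta) C}{\theta (1 - S) + (1 - \theta) C}.$$
Proof: By definition, $ppv = P(D^+ \mid T^+)$ and $npv = P(D^- \mid T^-)$. Therefore, by Bayes' theorem,
$$ppv = \frac{P(T^+ \mid D^+)\, P(D^+)}{P(T^+ \mid D^+)\, P(D^+) + P(T^+ \mid D^-)\, P(D^-)}.$$
Similarly, using Bayes' theorem we have
$$npv = \frac{P(T^- \mid D^-)\, P(D^-)}{P(T^- \mid D^-)\, P(D^-) + P(T^- \mid D^+)\, P(D^+)}.$$
Recalling the notation introduced in chapter 1 completes the proof.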
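Proposition 2.1.1 can be checked numerically. The sketch below uses a hypothetical helper `predictive_values`, written for illustration, that evaluates both formulas with exact fractions. Plugging in the Table 1.1 estimates $\hat\theta = 51/773$, $s = 28/51$ and $c = 471/722$ returns exactly the direct estimates $28/279$ and $471/494$ of Example 2.1.1, since for a single 2×2 table the two routes are algebraically identical.

```python
from fractions import Fraction


def predictive_values(theta, sens, spec):
    """ppv and npv from prevalence theta, sensitivity S and specificity C,
    following Proposition 2.1.1."""
    ppv = theta * sens / (theta * sens + (1 - theta) * (1 - spec))
    npv = (1 - theta) * spec / (theta * (1 - sens) + (1 - theta) * spec)
    return ppv, npv


# Table 1.1 estimates as exact fractions.
theta = Fraction(51, 773)
sens = Fraction(28, 51)
spec = Fraction(471, 722)

ppv, npv = predictive_values(theta, sens, spec)
print(ppv, npv)  # prints: 28/279 471/494
```

The same function accepts ordinary floats, so it can also be used to see how strongly the ppv depends on the prevalence for a fixed test.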
2.2 Frequentist approaches to sample size determination
There are several frequentist criteria for determining sample sizes that have appeared
in the literature. Below we review the two most common of these. More complete
reviews of methods for sample size determination are found in Desu and Raghavarao
[1990], Lemeshow et al [1990], and Lachin [1981].
2.2.1 Sample size required for a point estimate of a parameter to fall within a distance d of the true value
Let $X$ be a random variable, and let $f_X(x \mid \theta)$ be its density function, where $\theta$ is an unknown parameter. If the purpose of a study is to estimate $\theta$, a natural question is to determine the sample size such that the estimate of $\theta$ will be "close" to $\theta$ with high probability. Letting $\hat\theta$ be an estimator of $\theta$, the sample size $n$ can be determined such that
$$P(|\hat\theta - \theta| \le d) \ge 1 - \alpha \tag{2.3}$$
is satisfied for prespecified $d$ and $\alpha$. Usually, $\hat\theta$ depends on $n$ in such a way that $\hat\theta \to \theta$ as $n \to \infty$. Therefore, increasing the sample size leads to more accurate estimation, and there is usually an $n_0$ such that for $n \ge n_0$, equation 2.3 will be satisfied. Desu and Raghavarao [1990] discuss this method for a variety of density functions $f_X(x \mid \theta)$. We will return to this criterion in chapters 4 and 5.
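For a binomial proportion with $\hat\theta = X/n$, the probability in (2.3) can be summed exactly. The sketch below, with illustrative values $\theta = 0.1$, $d = 0.05$, $\alpha = 0.05$ and hypothetical function names, finds the first $n$ satisfying (2.3) by brute force and compares it with the familiar normal-approximation formula $n = z_{1-\alpha/2}^2\,\theta(1 - \theta)/d^2$. It is an illustration of the criterion only, not the modified criterion of chapter 4.

```python
import math
from statistics import NormalDist


def binom_pmf(x, n, p):
    """Binomial(n, p) probability of x, computed on the log scale (0 < p < 1)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
               + x * math.log(p) + (n - x) * math.log1p(-p))
    return math.exp(log_pmf)


def exact_coverage(n, theta, d):
    """P(|X/n - theta| <= d) for X ~ Binomial(n, theta): left side of (2.3)."""
    eps = 1e-12  # tolerance so points with |x/n - theta| == d are counted
    return sum(binom_pmf(x, n, theta) for x in range(n + 1)
               if abs(x / n - theta) <= d + eps)


def smallest_exact_n(theta, d, alpha, n_max=5000):
    """First n satisfying (2.3). The exact probability is not monotone in n,
    so some larger n may still fail the criterion."""
    for n in range(1, n_max + 1):
        if exact_coverage(n, theta, d) >= 1 - alpha:
            return n
    return None


def normal_approx_n(theta, d, alpha):
    """Normal-approximation sample size n = z^2 theta (1 - theta) / d^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z ** 2 * theta * (1 - theta) / d ** 2)


print(smallest_exact_n(0.1, 0.05, 0.05), normal_approx_n(0.1, 0.05, 0.05))
```

Printing `exact_coverage` for a run of consecutive values of $n$ shows a saw-tooth pattern rather than a steady increase, which is why the search reports only the first $n$ that works.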
2.2.2 Sample size for a given power in a test of hypothesis
Suppose again that $X$ is a random variable with density $f_X(x \mid \theta)$, where $\theta$ is an unknown parameter. Suppose further that the purpose of the study is to test the null hypothesis $H_0: \theta = \theta_0$ against an alternative hypothesis $H_a$, where $H_a$ can be of one of the following forms:
$$H_a: \theta \neq \theta_0, \qquad H_a: \theta > \theta_0, \qquad H_a: \theta < \theta_0.$$
As is well known, two types of errors can occur in testing a hypothesis:
Definition 2.2.1 The type-I error, usually denoted by $\alpha$, is the probability of rejecting the null hypothesis when it is in fact true.
Definition 2.2.2 The type-II error, usually denoted by $\beta$, is the probability of not rejecting the null hypothesis when the alternative is true.
Definition 2.2.3 The power of the test under a particular alternative is the probability of rejecting the null hypothesis when this alternative is true, so that power $= 1 - \beta$.
Ideally, we would like both $\alpha$ and $\beta$ to be 0, but this is usually impossible except in trivial cases. A commonly used method is to choose the sample size such that the power of the test will be at least $1 - \beta$ when the type-I error is at most $\alpha$. This method is widely used in the planning of clinical trials. In the context of diagnostic tests, it may be relevant when one wishes to test whether the sensitivities (or specificities) of two different diagnostic tests are equivalent. However, one is usually more concerned with the estimation of test properties and population prevalences, so we will not further consider methods based on the power of hypothesis tests in this thesis, except briefly in the literature review on sample size methods. See Desu and Raghavarao [1990] for more on this approach in a wide variety of sampling situations.
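For a one-sample test of a binomial proportion (an assumed setting, simpler than comparing two tests), the standard normal-approximation formula is $n = \left[\left(z_{1-\alpha/2}\sqrt{\theta_0(1-\theta_0)} + z_{1-\beta}\sqrt{\theta_a(1-\theta_a)}\right)/(\theta_a - \theta_0)\right]^2$. The sketch below implements it; the function name and the example (testing whether a sensitivity of 0.80 has improved to 0.90) are hypothetical.

```python
import math
from statistics import NormalDist


def n_for_power(theta0, theta_a, alpha=0.05, beta=0.20, two_sided=True):
    """Approximate n so that a test of H0: theta = theta0 has power at least
    1 - beta at theta = theta_a with type-I error alpha (normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(1 - beta)
    numerator = (z_alpha * math.sqrt(theta0 * (1 - theta0))
                 + z_beta * math.sqrt(theta_a * (1 - theta_a)))
    return math.ceil((numerator / (theta_a - theta0)) ** 2)


# Hypothetical example: sensitivity 0.80 under H0 versus 0.90 under Ha,
# with alpha = 0.05 (two-sided) and power 0.80.
print(n_for_power(0.80, 0.90))
```

A one-sided alternative (`two_sided=False`) uses $z_{1-\alpha}$ in place of $z_{1-\alpha/2}$ and therefore gives a smaller sample size.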
2.3 Bayesian approaches to sample size determination
2.3.1 Bayesian statistical inference
Before many experiments are conducted, some information is often available about the values of the parameters of interest. The Bayesian approach to statistical inference consists of summarizing this information in a joint distribution function for the parameters, and then updating this prior distribution in light of the data collected in an experiment through Bayes' theorem. We will first briefly summarize the Bayesian approach to statistical inference, and then review several Bayesian sample size criteria.
Definitions
Let $X = (X_1, \ldots, X_n)$ be a random sample whose distribution depends on a possibly vector-valued parameter $\gamma$.
Definition 2.3.1 The prior density function (or probability function) is the density function (or probability function) of the parameter $\gamma$ before the experiment is conducted. It is denoted here by $f(\gamma)$.
The prior density function summarizes all of the information known to the investigator before examining the data collected from the current experiment. Since different investigators may have different prior density functions, prior densities are subjective. An important step in any Bayesian analysis is to select an appropriate prior distribution. See Chaloner [1996] for a discussion of a variety of methods for prior density elicitation. Let $f(x \mid \gamma)$ denote the joint conditional density function (or probability function) of the sample $X = (X_1, \ldots, X_n)$ given the parameter $\gamma$. Suppose that $X = x$ is observed.
Definition 2.3.2 The likelihood function is any function of the parameter $\gamma$, $l(x \mid \gamma)$, that is proportional to $f(x \mid \gamma)$. For example, one can choose $l(x \mid \gamma) = f(x \mid \gamma)$. Regarded as a function of $\gamma$ for the observed data $x$, the likelihood function gives the relative likelihoods (heights of the density curve at the data points $x$) under each fixed value of $\gamma$.
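With the above setting, the marginal density of $X$, denoted here by $m(x)$, is given by
$$m(x) = \begin{cases} \int f(x \mid \gamma)\, f(\gamma)\, d\gamma, & \text{if } f(\gamma) \text{ is a density function,} \\ \sum_{\gamma} f(x \mid \gamma)\, f(\gamma), & \text{if } f(\gamma) \text{ is a probability function.} \end{cases}$$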
Definition 2.3.3 The posterior density function (or probability function) of $\gamma$ is the conditional density function (or probability function) of $\gamma$ given $X = x$. It is denoted here by $f(\gamma \mid x)$ and is given by
$$f(\gamma \mid x) = \frac{f(x \mid \gamma)\, f(\gamma)}{m(x)}.$$
The above formula is of course Bayes' theorem. In the remainder of this thesis, when
using the Bayesian approach, only continuous parameters with prior density functions
are considered. Therefore, the following definitions are given only for the continuous
case. Similar definitions can be written when considering discrete parameters with
prior probability functions.
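Definition 2.3.4 The posterior mean of $\gamma$ is the expected value of $\gamma$ given $X = x$. It is denoted here by $\hat\gamma$, so that
$$\hat\gamma = E(\gamma \mid x) = \int \gamma\, f(\gamma \mid x)\, d\gamma,$$
where the integration is over the range of $\gamma$. If $\gamma$ is a scalar (one-dimensional) random variable, then we also have the following: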
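Definition 2.3.5 The posterior coverage probability of an interval $[\gamma_1, \gamma_2]$ given $X = x$ is
$$P(\gamma_1 \le \gamma \le \gamma_2 \mid x) = \int_{\gamma_1}^{\gamma_2} f(\gamma \mid x)\, d\gamma.$$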
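To make Definitions 2.3.4 and 2.3.5 concrete, assume (for illustration only) that $\gamma$ is a binomial proportion with a Beta$(a, b)$ prior, so that after $x$ successes in $n$ trials the posterior is Beta$(a + x, b + n - x)$. With integer $a$ and $b$, the Beta CDF reduces to a binomial tail sum, $I_p(a, b) = P(\text{Binomial}(a + b - 1, p) \ge a)$, so the sketch needs only the Python standard library. All function names are hypothetical.

```python
import math


def binom_pmf(j, m, p):
    """Binomial(m, p) probability of j on the log scale; handles p = 0 and 1."""
    if p <= 0.0:
        return 1.0 if j == 0 else 0.0
    if p >= 1.0:
        return 1.0 if j == m else 0.0
    log_pmf = (math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
               + j * math.log(p) + (m - j) * math.log1p(-p))
    return math.exp(log_pmf)


def beta_cdf_int(p, a, b):
    """CDF of Beta(a, b) at p for integers a, b >= 1, using
    I_p(a, b) = P(Binomial(a + b - 1, p) >= a)."""
    p = min(max(p, 0.0), 1.0)  # the interval mean +/- d may leave [0, 1]
    m = a + b - 1
    return sum(binom_pmf(j, m, p) for j in range(a, m + 1))


def posterior_summary(x, n, a=1, b=1, d=0.05):
    """Posterior mean and posterior coverage of [mean - d, mean + d] for
    x successes in n Bernoulli trials under a Beta(a, b) prior."""
    a_post, b_post = a + x, b + n - x           # conjugate Beta posterior
    mean = a_post / (a_post + b_post)           # Definition 2.3.4
    coverage = (beta_cdf_int(mean + d, a_post, b_post)
                - beta_cdf_int(mean - d, a_post, b_post))  # Definition 2.3.5
    return mean, coverage


mean, coverage = posterior_summary(x=7, n=100)
print(f"posterior mean = {mean:.4f}, coverage of mean +/- 0.05 = {coverage:.3f}")
```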
2.3.2 Bayesian sample size criteria
Suppose that a random variable $X$ depends on an unknown parameter $\gamma$ and that the purpose of a study is to estimate $\gamma$. For example, we might like to determine the minimum sample size $n$ such that the posterior density function of $\gamma$ is "sufficiently narrow". More specifically, we seek an $n$ such that
$$P(|\gamma - \hat\gamma| \le d \mid x) \ge 1 - \alpha$$
for given $d$ and $\alpha$. Thus we are looking for the minimum sample size $n$ such that the posterior credible set defined by $[\hat\gamma - d, \hat\gamma + d]$ has posterior coverage probability of at least $1 - \alpha$, that is,
$$\int_{\hat\gamma - d}^{\hat\gamma + d} f(\gamma \mid x)\, d\gamma \ge 1 - \alpha.$$
The posterior mean $\hat\gamma$ and the credible interval $[\hat\gamma - d, \hat\gamma + d]$ both depend on the data $x$, but $x$ is unknown at the planning stage of the experiment. To sidestep this problem, we can select $n$ such that the average posterior coverage over all possible values of $x$ is greater than $1 - \alpha$. This can be done in several ways.
Definition 2.3.6 An average coverage criterion (ACC) consists of fixing the
desired posterior interval length 2d, and finding the minimum sample size n such
that the expected posterior coverage probability is at least 1 - a, for a predetermined
value of a. Therefore, we seek the smallest n satisfying
In the case where S is a discrete random variable equation 2.7 is written as
In either case, note that the average is taken over the preposterior marginal distribution
of x, m(x). Therefore, the average coverage is in fact a weighted average,
with the weights given by m(x).
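To make the ACC concrete, the following Python sketch evaluates the left-hand side of the discrete form of equation 2.7 in the simplest conjugate setting. It assumes, purely for illustration, that X ~ Binomial(n, γ) with a Beta(a, b) prior on γ, so that the posterior is Beta(a + x, b + n − x) and m(x) is beta-binomial. The coverage integral is approximated by a midpoint rule rather than an exact incomplete beta function, and all function names are hypothetical.

```python
import math

def log_beta(a, b):
    """log B(a, b), the logarithm of the beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_interval_prob(a, b, lo, hi, steps=400):
    """P(lo <= G <= hi) for G ~ Beta(a, b), by the midpoint rule on [lo, hi] clipped to [0, 1]."""
    lo, hi = max(lo, 0.0), min(hi, 1.0)
    if hi <= lo:
        return 0.0
    width = (hi - lo) / steps
    log_norm = log_beta(a, b)
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * width          # midpoints stay strictly inside (0, 1)
        total += math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - log_norm)
    return total * width

def marginal(x, n, a, b):
    """Preposterior (beta-binomial) probability m(x) of x successes in n trials."""
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_choose + log_beta(a + x, b + n - x) - log_beta(a, b))

def average_coverage(n, a, b, d):
    """Left-hand side of the ACC: sum over x of m(x) times the coverage of (mean +/- d)."""
    total = 0.0
    for x in range(n + 1):
        post_a, post_b = a + x, b + n - x          # posterior is Beta(a + x, b + n - x)
        mean = post_a / (post_a + post_b)
        total += marginal(x, n, a, b) * beta_interval_prob(post_a, post_b, mean - d, mean + d)
    return total

def acc_sample_size(a, b, d, alpha, max_n=1000):
    """Smallest n whose average coverage is at least 1 - alpha (simple linear search)."""
    for n in range(1, max_n + 1):
        if average_coverage(n, a, b, d) >= 1 - alpha:
            return n
    raise ValueError("no n <= max_n satisfies the ACC")
```

For example, `acc_sample_size(1, 1, 0.1, 0.05)` searches for the ACC sample size under a uniform prior with interval length 2d = 0.2. A linear search is used because the definition asks for the minimum n, and the average coverage is not guaranteed to be monotone in n.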
Note 2.3.1 In this section and elsewhere, we use posterior credible intervals that
are of the form posterior mean ± d. Other intervals could also be used, for example,
highest posterior density (HPD) intervals. Although it would be easy to define criteria
like the ACC in terms of HPD intervals, these would typically be much more difficult
to compute, and may not substantially change the resulting sample size estimate in
most cases. See Joseph, Wolfson, and du Berger [1993b] for a comparison of HPD
and symmetric intervals in the context of various Bayesian sample size criteria.
Definition 2.3.7 An average length criterion (ALC) consists of fixing the coverage
probability 1 − α of the posterior credible interval for γ, and computing the
sample size n such that the expected length is at most 2d. That is, we seek the
minimum n such that
$$\int_{\mathcal{X}} 2\, d'(x, n)\, m(x)\, dx \le 2d,$$
where d′(x, n) is such that
$$\int_{\hat{\gamma} - d'(x, n)}^{\hat{\gamma} + d'(x, n)} f(\gamma \mid x, n)\, d\gamma = 1 - \alpha.$$
Here d′(x, n) depends on both x and n.
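Definition 2.3.8 A worst outcome criterion (WOC) consists of finding the
smallest n such that the minimum posterior coverage probability of an interval of length 2d, over
all possible x, is at least 1 − α, for predetermined d and α. That is
$$\inf_{x \in \mathcal{X}} \int_{\hat{\gamma} - d}^{\hat{\gamma} + d} f(\gamma \mid x, n)\, d\gamma \ge 1 - \alpha.$$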
While the ACC and the ALC may be criticized on the grounds that they ensure
the desired posterior coverage probability or the desired length only on average, the
WOC ensures that both the desired posterior coverage probability and the desired
length will hold regardless of which data x occur. The WOC may sometimes lead to
unnecessarily large sample sizes, since some values of x leading to large n may have a
very low, but nonzero, probability. Therefore, it may be more realistic to require only
that the desired posterior coverage probability and the desired length hold over
a subset of the data x. For more details on each of the above criteria see Joseph,
Wolfson and du Berger [1995a].
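Under the same illustrative assumptions as for the ACC (binomial data, a Beta(a, b) prior, intervals of the form posterior mean ± d, midpoint-rule integration), the ALC and the WOC can be sketched as follows. The half-width d′(x, n) of Definition 2.3.7 is found by bisection, which relies on the coverage increasing with the half-width. The names are hypothetical; this is a sketch, not a reference implementation.

```python
import math

def log_beta(a, b):
    """log B(a, b), the logarithm of the beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def coverage(a, b, half, steps=200):
    """Probability that G ~ Beta(a, b) lies within +/- half of its mean (midpoint rule)."""
    mean = a / (a + b)
    lo, hi = max(mean - half, 0.0), min(mean + half, 1.0)
    width = (hi - lo) / steps
    log_norm = log_beta(a, b)
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * width
        total += math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - log_norm)
    return total * width

def marginal(x, n, a, b):
    """Beta-binomial preposterior probability m(x)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_choose + log_beta(a + x, b + n - x) - log_beta(a, b))

def half_width(a, b, alpha, iters=30):
    """d'(x, n): half-width at which the coverage reaches 1 - alpha, by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if coverage(a, b, mid) >= 1 - alpha:
            hi = mid
        else:
            lo = mid
    return hi

def alc_average_length(n, a, b, alpha):
    """ALC left-hand side: expected length 2 d'(x, n), weighted by m(x)."""
    return sum(marginal(x, n, a, b) * 2 * half_width(a + x, b + n - x, alpha)
               for x in range(n + 1))

def woc_min_coverage(n, a, b, d):
    """WOC left-hand side: smallest coverage of (posterior mean +/- d) over x = 0, ..., n."""
    return min(coverage(a + x, b + n - x, d) for x in range(n + 1))

def smallest_n(satisfied, max_n=1000):
    """First n in 1..max_n for which satisfied(n) is True (linear search)."""
    for n in range(1, max_n + 1):
        if satisfied(n):
            return n
    raise ValueError("criterion not met for any n <= max_n")
```

For example, `smallest_n(lambda n: woc_min_coverage(n, 1, 1, 0.1) >= 0.95)` gives the WOC sample size for a uniform prior, and the analogous call with `alc_average_length(n, 1, 1, 0.05) <= 0.2` gives the ALC sample size, although the latter is slow because every x requires its own bisection.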
Although Bayesian approaches to sample size estimation were proposed at least
35 years ago (see Raiffa and Schlaifer [1961]), they have not been widely used in
practice. Since all sample size criteria must make at least some use of the available
prior information, Bayesian approaches are natural for such problems. The reason
for their scarcity must therefore lie largely in the computations they involve,
which can be very difficult. For example, in order to calculate the marginal density of
x we need to calculate integrals of the form
$$m(x) = \int f(x \mid \gamma)\, f(\gamma)\, d\gamma.$$
These integrals can be very difficult and often impossible to compute directly, so
that numerical calculations or other methods are needed. However, recent advances
in Bayesian computing (see section 2.4) have largely removed this barrier. We will
make use of these computational advances in this thesis.
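One of the simplest numerical approaches to the integral above is plain Monte Carlo: draw γ from the prior and average the likelihood f(x | γ). The sketch below assumes binomial data and a Beta(a, b) prior, chosen only because the exact beta-binomial answer is then available as a check; in the problems of this thesis no such closed form is generally available. Function names are hypothetical.

```python
import math
import random

def log_beta(a, b):
    """log B(a, b), the logarithm of the beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def likelihood(x, n, gamma):
    """Binomial probability f(x | gamma) of x successes in n trials."""
    return math.comb(n, x) * gamma**x * (1 - gamma)**(n - x)

def marginal_monte_carlo(x, n, a, b, draws=50_000, seed=1):
    """m(x) estimated as the average of f(x | gamma) over gamma drawn from the Beta(a, b) prior."""
    rng = random.Random(seed)
    return sum(likelihood(x, n, rng.betavariate(a, b)) for _ in range(draws)) / draws

def marginal_exact(x, n, a, b):
    """Closed-form beta-binomial m(x), available here only because the prior is conjugate."""
    return math.comb(n, x) * math.exp(log_beta(a + x, b + n - x) - log_beta(a, b))
```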
2.4 Computational techniques
In recent years, algorithms useful for Bayesian analysis, such as sampling importance
resampling (SIR) and the Gibbs sampler, have become very popular among
applied statisticians. This has focused increasing attention on Bayesian approaches,
both for general inference problems (see Gilks [1996] for a recent review) and for
experimental design (see Dasgupta [1996] and Chaloner and Verdinelli [1995] for recent
review articles on Bayesian design). The basic idea behind these algorithms is to
replace difficult analytic computations of integrals involving posterior densities with
summaries of samples from the target densities.
Here we will briefly describe two of the most popular approaches.
2.4.1 The SIR algorithm
In this thesis, we will use the SIR algorithm for many of the computations involving
Bayesian inference. This algorithm is a generally applicable method for approximating
posterior distributions. The SIR algorithm is useful when one is interested
in obtaining a random sample from a probability distribution g(x) that is difficult
to work with analytically or even to simulate from directly, but where there exists
another simpler distribution, h(x), that roughly approximates g(x) and is easier to
sample from. The SIR algorithm consists of the following 3 steps:
1. Draw a random sample of size n, $x_1, \ldots, x_n$, from h(x).
2. Compute sample weights $w(x_i) = g(x_i)/h(x_i)$, $i = 1, \ldots, n$.
3. Draw a new random sample of size m, $x_1^*, \ldots, x_m^*$, with replacement from
$x_1, \ldots, x_n$, with probabilities proportional to $w(x_1), \ldots, w(x_n)$.
The resulting sample $x_1^*, \ldots, x_m^*$ is an approximate independent random sample from
g(x). For example, one can approximate the posterior coverage probability of any
subset A of the real line by the proportion of points from $x_1^*, \ldots, x_m^*$ that are in A.
See Albert [1993] for a simple explanation of the SIR algorithm in the context of
binomial sampling, and see Rubin [1987] for a theoretical justification of the method.
Clearly, increasing m and n will increase the accuracy of the results of calculations
that use SIR samples.
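The three steps can be written as a short generic Python function. This is a sketch, not a reference implementation: the function name, the default sizes n = 50000 and m = 5000, and the illustrative target (a Beta(8, 4) density with a Uniform(0, 1) proposal) are our assumptions. Because only the relative sizes of the weights matter when resampling, g need only be known up to a normalizing constant.

```python
import random
import statistics

def sir(draw_from_h, weight, n=50_000, m=5_000, seed=0):
    """Sampling importance resampling.

    draw_from_h(rng): one draw from the easy-to-sample density h
    weight(x):        a value proportional to g(x) / h(x); g may be unnormalized
    Returns m approximate draws from g.
    """
    rng = random.Random(seed)
    xs = [draw_from_h(rng) for _ in range(n)]       # step 1: sample from h
    ws = [weight(x) for x in xs]                    # step 2: importance weights
    return rng.choices(xs, weights=ws, k=m)         # step 3: resample with replacement

# Illustrative target: g = Beta(8, 4), proposal h = Uniform(0, 1),
# so g/h is proportional to x^7 (1 - x)^3.
sample = sir(lambda rng: rng.random(), lambda x: x**7 * (1 - x)**3)
post_mean = statistics.mean(sample)
coverage_A = sum(0.5 <= x <= 0.8 for x in sample) / len(sample)   # approximates P(0.5 <= X <= 0.8)
```

The last line approximates the coverage probability of A = [0.5, 0.8] by the proportion of resampled points in A, as described above.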
Example 2.4.1 Suppose that a diagnostic test is given to n = 100 subjects to
determine whether or not they have a certain disease. Suppose further that x = 25
subjects test positive. Denote by p the probability of testing positive. Suppose that
prior information on the prevalence θ, the sensitivity S and the specificity C is available, and the problem is to find
the posterior distribution of p. This might be of interest, for example, in estimating
the cost of follow-up for subjects who test positive. By the law of total probability,
we have
$$P(T^+) = P(T^+ \mid D^+)P(D^+) + P(T^+ \mid D^-)P(D^-),$$
which can be written as
$$p = \theta S + (1 - \theta)(1 - C).$$
Suppose that the prior information on θ, S and C can be summarized in the following
prior distributions: θ ~ Beta(3, 25), S ~ Beta(5, 2) and C ~ Beta(6, 1).
Recall that the density function of a Beta(a, b) distribution is given by
$$f(t) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, t^{a-1}(1 - t)^{b-1}, \qquad 0 \le t \le 1,$$
where Γ(·) is the gamma function, and a and b are positive real numbers. The SIR
algorithm can be applied to obtain a random sample from the posterior distribution
of p. Let g(p | x) be the posterior density function of p, and let h(p) be the prior density
function of p. Random samples can be obtained from h(p) as follows.
First we draw a random sample of size n = 50000, say, from the prior distribution
of θ (many algorithms are available for sampling from Beta densities). Random
samples of size n = 50000 are also drawn from the prior distributions of S and C. A
random sample of size n = 50000 from h(p) is then obtained by applying the
formula
$$p_i = \theta_i S_i + (1 - \theta_i)(1 - C_i), \qquad i = 1, \ldots, 50000.$$
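Recall that
$$g(p \mid x) = \frac{l(x \mid p)\, h(p)}{m(x)},$$
where m(x) is the marginal probability function of the data, and l(x | p) is the likelihood
of the data, given by $l(x \mid p) = \binom{100}{25} p^{25}(1 - p)^{75}$, or, up to a constant, $p^{25}(1 - p)^{75}$. Hence the sample weights are given by
$$w_i = \frac{g(p_i \mid x)}{h(p_i)} \propto p_i^{25}(1 - p_i)^{75}, \qquad i = 1, \ldots, 50000.$$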
Applying the SIR algorithm as described above, we resample with replacement from
the $p_i$'s obtained from h(p), using the weights $w_i$, to obtain a simulated sample from
the posterior distribution of p. A plot of the prior and posterior densities of p, obtained
by smoothing the simulated prior and posterior samples, is shown in Figure 2.1.
[Figure 2.1: Plots of the prior and posterior densities of p.]
Here we see that the posterior density function of p seems to be more symmetric
than the prior density of p, and shifted to the right.
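The computations of Example 2.4.1 can be sketched in Python as follows, using the priors and the n = 50000 prior draws given above. The resample size m = 10000, the seed and the function name are our own choices, and this sketch is not claimed to reproduce Figure 2.1 exactly; the printed means simply allow a check of the shift to the right noted above.

```python
import random
import statistics

def example_2_4_1(x=25, n_tested=100, n=50_000, m=10_000, seed=2024):
    """SIR for the posterior of p = theta*S + (1 - theta)*(1 - C) in Example 2.4.1.

    Priors: theta ~ Beta(3, 25), S ~ Beta(5, 2), C ~ Beta(6, 1).
    Returns (prior draws of p, approximate posterior draws of p).
    """
    rng = random.Random(seed)
    prior_p = []
    for _ in range(n):                                   # draws from h(p)
        theta = rng.betavariate(3, 25)
        s = rng.betavariate(5, 2)
        c = rng.betavariate(6, 1)
        prior_p.append(theta * s + (1 - theta) * (1 - c))
    # weights proportional to the likelihood p^x (1 - p)^(n_tested - x)
    weights = [p**x * (1 - p)**(n_tested - x) for p in prior_p]
    posterior_p = rng.choices(prior_p, weights=weights, k=m)
    return prior_p, posterior_p

prior_p, posterior_p = example_2_4_1()
print("prior mean of p:    ", round(statistics.mean(prior_p), 3))
print("posterior mean of p:", round(statistics.mean(posterior_p), 3))
```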
2.4.2 The Gibbs sampler algorithm
Like the SIR algorithm, this technique consists of drawing appropriate random samples
from the posterior distribution in order to make statistical inferences on a set of
unknown parameters, when direct computations are difficult or impossible to perform.
It can be useful in the following situation. Suppose we have k ≥ 2 random variables
$X_1, \ldots, X_k$, where the conditional distributions $f(x_i \mid x_j : 1 \le j \le k,\ j \ne i)$ of each
$X_i$, $1 \le i \le k$, given all other random variables, are either known or can be sampled
from. Suppose further that one is interested in finding the marginal distribution of
one or all of the random variables. The Gibbs sampler consists of the following steps:
1. Choose initial starting values for all the variables but one. For example, you
can start with initial values $x_{21}, \ldots, x_{k1}$ for $X_2, \ldots, X_k$, respectively.
2. Draw a random value $x_{11}$ for $X_1$ from $f(x_1 \mid x_{21}, \ldots, x_{k1})$, which is the conditional
distribution of $X_1$ given $X_2 = x_{21}, \ldots, X_k = x_{k1}$.
3. Repeat this procedure for all variables in turn, that is, after having drawn $x_{11}$,
draw a random value $x_{22}$ for $X_2$ from the conditional distribution of $X_2$ given
$x_{11}, x_{31}, \ldots, x_{k1}$, and so on.
A cycle of the algorithm is completed when all conditional distributions have been
sampled from at least once. The entire cycle is repeated a large number of times,
typically 5000 or 10000. The sample generated for each random
variable $X_i$, $1 \le i \le k$, is then regarded as a (possibly correlated) random sample
from the correct marginal distribution. There is now a very large literature on the
Gibbs sampler and other Markov chain Monte Carlo algorithms. For more details
see Gilks [1996]. Although we do not use the Gibbs sampler in this thesis, it may
be useful in more complex situations, for example, when two or more imperfect
diagnostic tests are simultaneously applied to a sample from a population in order
to estimate the prevalence of a disease (see Joseph, Gyorkos and Coupal [1995]).
Other recently developed algorithms for Bayesian inference are reviewed by Evans
and Swartz [1995].
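As a minimal illustration of the steps above, the following Python sketch applies the Gibbs sampler to k = 2 variables having a standard bivariate normal distribution with correlation ρ, for which both full conditionals are normal. The target, the burn-in of 500 cycles (discarding early draws is a common practice not described above) and all names are our assumptions.

```python
import random
import statistics

def gibbs_bivariate_normal(rho, cycles=10_000, burn_in=500, seed=7):
    """Gibbs sampler for a standard bivariate normal (X1, X2) with correlation rho.

    Full conditionals used: X1 | X2 = x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for X2.
    """
    rng = random.Random(seed)
    cond_sd = (1 - rho**2) ** 0.5
    x2 = 0.0                                  # step 1: starting value for all variables but X1
    draws = []
    for cycle in range(burn_in + cycles):
        x1 = rng.gauss(rho * x2, cond_sd)     # step 2: draw X1 given the current X2
        x2 = rng.gauss(rho * x1, cond_sd)     # step 3: draw X2 given the new X1
        if cycle >= burn_in:                  # discard an initial burn-in (our choice)
            draws.append((x1, x2))
    return draws

draws = gibbs_bivariate_normal(rho=0.8)
x1s = [d[0] for d in draws]
x2s = [d[1] for d in draws]
mean1, mean2 = statistics.mean(x1s), statistics.mean(x2s)
sample_corr = (sum((u - mean1) * (v - mean2) for u, v in draws)
               / ((len(draws) - 1) * statistics.stdev(x1s) * statistics.stdev(x2s)))
```

The sample correlation of the retained draws should be close to ρ, even though successive draws are correlated with one another.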
Chapter 3
Previous results and literature review
In this chapter we will first review the literature on estimation problems in diagnostic
test situations, and then review previous work on sample size determination. These
two topics form the basis for this thesis.
3.1 Diagnostic tests
Many authors have studied the estimation of the prevalence of a disease and the
accuracy of diagnostic tests when these tests are subject to errors. In this section we
will review some of this literature.
Walter and Irwig [1988] reviewed many methods for the estimation of the prevalence
of a disease and the sensitivity and specificity of a diagnostic test when a
gold standard test is not available. In general, suppose that R different diagnostic
tests, none of which are gold standards, are given to each subject in samples from
N different populations. If we suppose that each test provides a dichotomous response
indicating presence or absence of the disease, then for each population we have R false positive
rate parameters (1 − specificity), R false negative rate parameters (1 − sensitivity), and a
prevalence parameter. Therefore, we have N(2R + 1)
parameters to estimate. Most authors have assumed that the results of different
diagnostic tests are independent, conditional on disease status. This is often
called the conditional independence assumption. If the test results are conditionally
independent, there are $N(2^R - 1)$ degrees of freedom in total. The log-likelihood of
the data can be expressed as
$$\log L = \sum_{s=1}^{N} \sum_{x} n_s(x) \log\Big\{ \theta_s \prod_{r=1}^{R} (1 - \beta_{rs})^{x(r)} \beta_{rs}^{1 - x(r)} + (1 - \theta_s) \prod_{r=1}^{R} \alpha_{rs}^{x(r)} (1 - \alpha_{rs})^{1 - x(r)} \Big\}, \qquad (3.1)$$
where $\theta_s$ is the prevalence of the disease in population s, $\alpha_{rs}$ and $\beta_{rs}$ denote the false positive and false negative rates for test r in
population s, x(r) denotes the classification of an individual by test r (1 if positive, 0 if negative), the second
summation is over all combinations of observations given by the different diagnostic
tests, and $n_s(x)$ is the number of individuals in population s who receive a given
set of classifications x. If the number of tests is R ≥ 3, then all parameters can
be estimated by maximum likelihood, since $N(2R + 1) \le N(2^R - 1)$, that is, the number of
degrees of freedom is greater than or equal to the number of unknown parameters.
In this case, many authors, including Dawid and Skene [1979] and White and Landis
[1982], propose the use of the EM algorithm to estimate the parameters in equation
3.1. The EM algorithm consists of adopting initial estimates for each parameter, and
alternating through expectation (E) and maximization (M) steps until convergence.
See Dempster, Laird and Rubin [1977] for more on the EM algorithm. If R < 3,
however, one is limited to estimating only a subset of the parameters, so that
constraints must be imposed on some parameters in order to calculate maximum
likelihood estimates of the others. For example, in the case where we only have one
diagnostic test and one population, there are three parameters to estimate, but only
one degree of freedom, and therefore two constraints must be imposed. A common
option in this case is to consider the sensitivity and the specificity as being exactly
known, and to estimate the prevalence given these values. Rogan and Gladen [1978]
propose the use of the estimate
$$\hat{\theta} = \frac{X/n + c - 1}{s + c - 1},$$
where X is the number of subjects testing positive out of n, s is the sensitivity and c is
the specificity of the diagnostic test, and θ is the prevalence of the disease. A detailed
derivation and discussion of this estimator is given in chapter 3 of this thesis.
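The Rogan-Gladen estimator is a one-line computation. In the sketch below the optional truncation to [0, 1] is an added convention (the raw formula can fall outside this interval), and the function name is hypothetical.

```python
def rogan_gladen(x, n, s, c, clip=True):
    """Rogan-Gladen prevalence estimate (x/n + c - 1) / (s + c - 1).

    x: number testing positive out of n; s, c: sensitivity and specificity,
    treated as exactly known. clip=True truncates the estimate to [0, 1]
    (an added convention, not part of the formula itself).
    """
    if s + c <= 1:
        raise ValueError("the estimator requires s + c > 1")
    estimate = (x / n + c - 1) / (s + c - 1)
    return min(max(estimate, 0.0), 1.0) if clip else estimate
```

For instance, 8 positives out of 96 with s = 0.89 and c = 0.74 give a negative raw estimate, which the truncated version reports as 0.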
Two constraints are also needed in the case of two tests and one population, since
here there are 5 unknown parameters but only 3 degrees of freedom. One possibility is
to regard the sensitivity and the specificity of one test as being exactly known, and to
estimate the sensitivity and the specificity of the other test, along with the prevalence.
Staquet, Rozencweig, Lee et al. [1981] consider this case and compute limit bounds
for the unknown sensitivity and specificity, which in turn lead to a range of values
for the prevalence of the disease. A special case would be to assume that one of the
tests is a gold standard, that is, s = c = 1. For example, in Weiner, Ryan, McCabe et
al. [1979], arteriography is assumed to be a perfect test for coronary artery disease,
and the stress ECG method is the diagnostic test whose characteristic parameters
need to be estimated. Another choice would be to regard the sensitivities of both
tests as being exactly known, and estimate the specificities and the prevalence; see
Staquet, Rozencweig, Lee et al. [1981]. An alternative approach is to consider the
sensitivities of both tests as being unknown but equal, and equal to the specificities;
see Chinn and Burney [1987]. Walter and Irwig [1988] also reviewed cases with irregular
designs, where each test classifies only a subset of individuals. For example, these
methods can be useful when preliminary tests are used to determine whether
or not an individual will go on to be further tested. They also reviewed cases of
response variables with more than two categories, for example when a subject may
test positive, negative or indeterminate.
Lew and Levy [1989] used Bayesian methods to estimate the prevalence of a
disease from the results of a screening test, when the sensitivity and the specificity
of the test are exactly known, and equal to s and c, respectively, where s + c > 1.
They used the fact that
$$p = \theta s + (1 - \theta)(1 - c),$$
which was derived in chapter 2, to write θ as
$$\theta = \frac{p + c - 1}{s + c - 1}. \qquad (3.2)$$
From equation 3.2, and using the invariance property of maximum likelihood
estimators, we have that the maximum likelihood estimator of θ is given by
$$\hat{\theta} = \frac{x/n + c - 1}{s + c - 1}, \qquad (3.3)$$
where x denotes the number of subjects with positive test results, and n is the sample
size; values of this expression outside [0, 1] are truncated to the nearest endpoint.
For rare diseases, the maximum likelihood estimator is often 0. To correct
for this, the authors considered a Bayesian approach. They assumed that θ has a
uniform prior distribution on the interval [0, 1], and proposed the posterior mean as
an estimator of θ. The posterior mean of θ is derived from the posterior mean of p in
the following way. Taking expected values conditional on the data x on both sides of
equation 3.2, we have
$$E(\theta \mid x) = \frac{E(p \mid x) + c - 1}{s + c - 1}.$$
Therefore, the posterior mean of θ is
where
See equation 2.4 in chapter 2. The authors also provide approximations to simplify
the calculations, and calculate credible intervals around the posterior mean. The
posterior mean has the advantage over the MLE in that it always lies strictly between 0 and
1.
The authors propose that Bayesian methods be used for samples with sizes smaller
than 100, and that the MLE be used for larger samples. While this may often be
reasonable, as will be seen in chapter 3, the maximum likelihood estimator could be
0 with probability larger than 0.3 even for sample sizes much larger than 100. One
must also be careful when assuming the sensitivity and the specificity of the test to be
exactly known, since even small deviations could greatly affect the estimate of the
prevalence. Consider the example provided in Lew and Levy [1989]. Here n = 96,
x = 8, and the sensitivity and specificity are assumed to be exactly known, with
s = 0.89 and c = 0.74. Letting p denote the probability of testing positive, we have
$$p = \theta s + (1 - \theta)(1 - c).$$
Replacing s and c by their values, we get p = 0.63θ + 0.26, and
$$P(X \le 8) = \sum_{k=0}^{8} \binom{96}{k} p^{k} (1 - p)^{96 - k}.$$
See Casella and Berger [1990], page 82. Therefore, this probability is decreasing in p,
and reaches its maximum for p = 1 − c = 0.26, that is, for θ = 0. Hence
$$P(X \le 8) \le \sum_{k=0}^{8} \binom{96}{k} (0.26)^{k} (0.74)^{96 - k}.$$
Thus either the authors are considering a highly unusual situation, or the specificity
of the diagnostic test is in fact much higher than 0.74, so that 1 − c will be closer to
8/96 = 0.083.
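The bound can be checked numerically. The following Python sketch computes the binomial probability in the example of Lew and Levy [1989] exactly; it uses only the values n = 96, x = 8, s = 0.89 and c = 0.74 quoted above, and the helper names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by exact summation."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

n, x, s, c = 96, 8, 0.89, 0.74

def prob_at_most_x(theta):
    """P(X <= x) when the prevalence is theta and p = theta*s + (1 - theta)*(1 - c)."""
    p = theta * s + (1 - theta) * (1 - c)
    return binom_cdf(x, n, p)

# The bound in the text: the probability is largest at theta = 0, i.e. p = 1 - c.
upper_bound = binom_cdf(x, n, 1 - c)
print(f"P(X <= {x}) is at most {upper_bound:.2e} for every prevalence")
```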
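Johnson and Gastwirth [1991] considered the screening tests that are used to
detect antibodies to the human immunodeficiency virus (HIV) in donated blood, and
used a Bayesian approach to assess the prevalence of the disease. They developed
approximations to the joint posterior distribution of the prevalence of the disease
and the sensitivity and the specificity of the diagnostic test. The methods apply
to the case where the prevalence is very small, the sample size very large, and the
diagnostic test very accurate. They consider a sample of size n and suppose that
the data report either x or (x, $x_{tp}$), where x denotes the number of subjects with
positive test results, and $x_{tp}$ denotes the number of subjects that are truly positive,
that is, the number of subjects among the x that have the disease. They suppose that
the sensitivity S, the specificity C and the prevalence θ have independent beta prior
distributions, θ ~ Beta(a, b), S ~ Beta(a1, b1), and C ~ Beta(a2, b2). The likelihood
functions given data x or (x, $x_{tp}$) are
$$L(\theta, S, C \mid x) \propto \{\theta S + (1 - \theta)(1 - C)\}^{x} \{\theta(1 - S) + (1 - \theta)C\}^{n - x}$$
and
$$L(\theta, S, C \mid x, x_{tp}) \propto (\theta S)^{x_{tp}} \{(1 - \theta)(1 - C)\}^{x - x_{tp}} \{\theta(1 - S) + (1 - \theta)C\}^{n - x},$$
respectively. From Bayes' theorem, the joint posterior probability density functions
for (θ, S, C) for the above two likelihoods are
$$P(\theta, S, C \mid x) \propto L(\theta, S, C \mid x)\, \theta^{a-1}(1 - \theta)^{b-1} S^{a_1 - 1}(1 - S)^{b_1 - 1} C^{a_2 - 1}(1 - C)^{b_2 - 1}$$
and
$$P(\theta, S, C \mid x, x_{tp}) \propto L(\theta, S, C \mid x, x_{tp})\, \theta^{a-1}(1 - \theta)^{b-1} S^{a_1 - 1}(1 - S)^{b_1 - 1} C^{a_2 - 1}(1 - C)^{b_2 - 1},$$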
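respectively. In the case where θ is near 0, and S and C are both near 1,
$$\theta S + (1 - \theta)(1 - C) \approx \theta + 1 - C$$
and
$$\theta(1 - S) + (1 - \theta)C \approx 1 - (\theta + 1 - C),$$
where in the second approximation θ(1 − S) is approximated by 0 and θC is approximated
by θ. Therefore,
$$\theta(1 - S) + (1 - \theta)C \approx 1 - (\theta + 1 - C) \approx \exp\{-(\theta + 1 - C)\}.$$
The last approximation follows since, for small x, exp(x) ≈ 1 + x, from the Taylor
series expansion of exp(x). Now assuming that b, a1 and a2 are not small relative to
n, and using the approximation exp(x) ≈ 1 + x, we get
$$(1 - \theta)^{b-1} \approx \exp(-b\theta)$$
and
$$S^{a_1 - 1} \approx \exp\{-a_1(1 - S)\}, \qquad C^{a_2 - 1} \approx \exp\{-a_2(1 - C)\}.$$
It follows that
$$P(\theta) \propto \theta^{a-1}\exp(-b\theta), \qquad P(S) \propto (1 - S)^{b_1 - 1}\exp\{-a_1(1 - S)\}$$
and
$$P(C) \propto (1 - C)^{b_2 - 1}\exp\{-a_2(1 - C)\}.$$
Therefore, by Bayes' theorem and the above approximations,
$$P(\theta, S, C \mid x) \propto (\theta + 1 - C)^{x}\exp\{-(n - x)(\theta + 1 - C)\}\, \theta^{a-1}\exp(-b\theta)\, (1 - S)^{b_1 - 1}\exp\{-a_1(1 - S)\}\, (1 - C)^{b_2 - 1}\exp\{-a_2(1 - C)\}$$
and
$$P(\theta, S, C \mid x, x_{tp}) \propto \theta^{a + x_{tp} - 1}\exp\{-\theta(b + n - x)\}\, (1 - S)^{b_1 - 1}\exp\{-a_1(1 - S)\}\, (1 - C)^{b_2 + x - x_{tp} - 1}\exp\{-(1 - C)(a_2 + n - x)\}.$$
Therefore, the marginal posterior densities of θ, S and C given (x, $x_{tp}$) are approximately
$$P(\theta \mid x, x_{tp}) \propto \theta^{a + x_{tp} - 1}\exp\{-\theta(b + n - x)\},$$
$$P(S \mid x, x_{tp}) \propto (1 - S)^{b_1 - 1}\exp\{-a_1(1 - S)\},$$
$$P(C \mid x, x_{tp}) \propto (1 - C)^{b_2 + x - x_{tp} - 1}\exp\{-(1 - C)(a_2 + n - x)\}.$$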
These approximations provide a way to calculate the marginal posterior densities
when direct calculations are impossible. However, one must remember that they were
based on the assumptions that b, a1 and a2 are not small relative to n, and therefore
their application is limited to situations where the prior information matches these
conditions. In particular, the examples provided in Johnson and Gastwirth [1991]
seem to violate the assumption that b, a1 and a2 are not small relative to n. For example, using
a uniform prior on θ violates the assumption that b is not small relative to n, since
b = 1 and n = 94496. Also, the priors S ~ Beta(142, 1) and C ~ Beta(1363, 3) violate the
assumption that a1 and a2 are not small relative to n, since a1 = 142, a2 = 1363 and
n = 3122556. These violations might at least partially explain the inconsistency of the
results found by the authors. For instance, the authors state that "... The posterior
standard deviations tend to decrease with decreasing prior information ... If the prior
means were close to the data this phenomena would not occur ..." This might be
due to the fact that decreasing prior information implies, according to the examples
provided, decreasing b, a1 and a2 while keeping n fixed. This creates larger errors in
the approximations used, and having the prior mean close to the data means that
b, a1 and a2 are closer to n. Therefore, in choosing the prior distributions one has to
verify that the assumptions upon which the approximations were based in fact hold.
Taragin, Wildman and Trout [1994] study the estimation of the prevalence of
disease θ from an imperfect diagnostic test. They also propose the use of the estimate
given by equation 3.3, and assume that the sensitivity and specificity of the diagnostic
test are exactly known and equal to s and c, respectively. If the sensitivity and
specificity are not exactly known, the authors suggest the use of estimates "consistent
with the literature" to determine a range of values for the sensitivity and a range
of values for the specificity, which together provide a range of values within which the
prevalence of disease will lie. Taragin, Wildman and Trout [1994] provide three
examples where equation 3.3 can be applied.
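A minimal sketch of this range computation, assuming the estimate of equation 3.3 without truncation and ranges supplied as (low, high) pairs. Whenever s + c > 1, the expression (x/n + c − 1)/(s + c − 1) is monotone in s for fixed c and monotone in c for fixed s, so its extremes over a rectangle of (s, c) values occur at the corners. The data and ranges in the example are invented for illustration, and the function name is hypothetical.

```python
def prevalence_range(x, n, sens_range, spec_range):
    """Range of (x/n + c - 1) / (s + c - 1) over s in sens_range and c in spec_range.

    sens_range, spec_range: (low, high) pairs, e.g. values judged consistent with the
    literature. Only the four corners are evaluated (see the monotonicity argument).
    Values are not truncated to [0, 1].
    """
    p_hat = x / n
    corners = []
    for s in sens_range:
        for c in spec_range:
            if s + c <= 1:
                raise ValueError("need s + c > 1 at every corner")
            corners.append((p_hat + c - 1) / (s + c - 1))
    return min(corners), max(corners)

low, high = prevalence_range(30, 100, sens_range=(0.85, 0.95), spec_range=(0.90, 0.98))
```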
When using a Bayesian approach, estimation of all unknown parameters is based
on their joint posterior distribution. Even though the formulation of these posterior
distributions via Bayes' theorem is simple, the calculations can be quite involved. In
recent years, with the advent of numerical techniques useful for Bayesian inference
such as the Gibbs sampler algorithm and the SIR algorithm (see chapter 2), one
can draw approximate random samples from the joint posterior distribution of the
parameters, and draw inferences about the parameters based on these samples.
Joseph, Gyorkos and Coupal [1995] provide a method to estimate the prevalence
and the parameters of one or more conditionally independent diagnostic tests in the
absence of a gold standard. In the case of one diagnostic test, they assume that the
prevalence θ of the disease in a given population and the sensitivity S and the specificity C
of the diagnostic test have independent Beta prior distributions, θ ~ Beta(α, β),
S ~ Beta(α1, β1), and C ~ Beta(α2, β2). The data can be represented by the 2 × 2 table given in
Table 3.1. Here the sample size n is known, and a and b are the numbers of positive
and negative tests observed, respectively. The "data" Y1 and Y2, representing the
numbers of truly diseased individuals out of a and b, respectively, are not observed,
and are therefore termed latent data.

                        True state
    Test results    D+          D−              Total
    T+              Y1          a − Y1          a
    T−              Y2          b − Y2          b
    Total           Y1 + Y2     n − Y1 − Y2     n

Table 3.1: Observed and latent data for one diagnostic test

The likelihood function of the observed and latent data is then
$$l(a, b, Y_1, Y_2 \mid \theta, S, C) \propto \theta^{Y_1 + Y_2}(1 - \theta)^{n - Y_1 - Y_2} S^{Y_1}(1 - S)^{Y_2} C^{b - Y_2}(1 - C)^{a - Y_1}.$$
Using Bayes' theorem, the joint posterior distribution of θ, S and C is
$$P(\theta, S, C \mid a, b, Y_1, Y_2) \propto \theta^{Y_1 + Y_2 + \alpha - 1}(1 - \theta)^{n - Y_1 - Y_2 + \beta - 1} \times S^{Y_1 + \alpha_1 - 1}(1 - S)^{Y_2 + \beta_1 - 1} C^{b - Y_2 + \alpha_2 - 1}(1 - C)^{a - Y_1 + \beta_2 - 1}.$$
Note that the conditional distributions of the unobserved latent data Y1 and Y2 given
(a, b, θ, S, C) are
$$Y_1 \mid a, b, \theta, S, C \sim \mathrm{Binomial}\left(a, \frac{\theta S}{\theta S + (1 - \theta)(1 - C)}\right)$$
and
$$Y_2 \mid a, b, \theta, S, C \sim \mathrm{Binomial}\left(b, \frac{\theta(1 - S)}{\theta(1 - S) + (1 - \theta)C}\right).$$
The Gibbs sampler can be applied to obtain samples from the marginal densities for
Y1, Y2, θ, S and C, by sampling in turn from the full conditional distributions
for these variables. As we will show in chapter 6, approximate posterior densities can
also be derived using a SIR algorithm, at least in the case of one test. The authors
applied similar methods to the case where the results of two or more diagnostic tests
are available, none of which can be considered a gold standard.
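The Gibbs sampler described above can be sketched in Python as follows. The binomial draws for Y1 and Y2 and the Beta full conditionals follow from the posterior given above; the starting values, burn-in, number of cycles, seed and example inputs (25 positives out of 100, with the priors of Example 2.4.1) are our assumptions, and this is not the authors' software.

```python
import random
import statistics

def binomial_draw(rng, trials, prob):
    """Binomial(trials, prob) draw as a sum of Bernoulli trials (adequate for small counts)."""
    return sum(rng.random() < prob for _ in range(trials))

def gibbs_one_test(a, b, prior_theta, prior_s, prior_c,
                   cycles=5_000, burn_in=500, seed=11):
    """Gibbs sampler for (theta, S, C) from one imperfect test, using latent Y1 and Y2.

    a, b: observed numbers of positive and negative test results (n = a + b).
    prior_*: (alpha, beta) parameters of independent Beta priors.
    Returns a list of (theta, S, C) draws after burn-in.
    """
    rng = random.Random(seed)
    n = a + b
    # start at the prior means
    theta = prior_theta[0] / sum(prior_theta)
    s = prior_s[0] / sum(prior_s)
    c = prior_c[0] / sum(prior_c)
    draws = []
    for cycle in range(burn_in + cycles):
        # latent data: truly diseased among the a positives (Y1) and the b negatives (Y2)
        y1 = binomial_draw(rng, a, theta * s / (theta * s + (1 - theta) * (1 - c)))
        y2 = binomial_draw(rng, b, theta * (1 - s) / (theta * (1 - s) + (1 - theta) * c))
        # parameters given the completed 2 x 2 table: Beta full conditionals
        theta = rng.betavariate(y1 + y2 + prior_theta[0], n - y1 - y2 + prior_theta[1])
        s = rng.betavariate(y1 + prior_s[0], y2 + prior_s[1])
        c = rng.betavariate(b - y2 + prior_c[0], a - y1 + prior_c[1])
        if cycle >= burn_in:
            draws.append((theta, s, c))
    return draws

draws = gibbs_one_test(25, 75, prior_theta=(3, 25), prior_s=(5, 2), prior_c=(6, 1))
post_mean_theta = statistics.mean(t for t, _, _ in draws)
```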
Joseph and Gyorkos [1996] used similar methods to obtain posterior distributions
for the likelihood ratios of diagnostic tests in the absence of a gold standard.
3.2 Sample size
Sample size determination is an important component of the design of virtually every
experiment. The subject has received much attention, and a large literature is available.
Little work, however, has been done on the question of determining the sample
size required to estimate the prevalence of a disease to within a given accuracy, when
a gold standard test is not available. If the diagnostic test used is a gold standard,
then estimating the prevalence of the disease and determining the sample size needed
for the estimate to be within a given accuracy of the true value reduces to a simple
binomial sample size problem. This case will be discussed in full in the next chapter.
The problem of estimating the sample sizes required to estimate the sensitivity and
specificity of an imperfect diagnostic test, to within a given accuracy, also reduces
to a simple binomial sample size problem when a gold standard test is available. In
this section we will review some of the previous work done on the question of sample
size, giving more attention to the work that is more relevant to diagnostic tests.
Many sample size papers come from the clinical trials literature. Clinical trials are
usually designed to determine if exposure to a certain factor (such as a drug) changes
a disease state. One would therefore like to determine the sample size required to
ensure detection of an association between the factor and the disease state with high
probability. There is a very large literature on estimating sample sizes for clinical
trials, much of which is applicable to other situations as well.
Lachin [1981] summarized the most common methods for estimating sample size
requirements for clinical trials. The main idea behind most of the methods is as
follows. Let X be a random variable that is normally distributed. In particular,
assume X ~ N(μ, Σ), where Σ is usually a function of the variance of each individual
observation, σ², and of the sample size n. Often Σ is taken to be σ²/n. Suppose that
the primary purpose of the study is to test the null hypothesis H0: μ = μ0 against the
alternative hypothesis H1: μ = μ1. Suppose further that under the null hypothesis
σ² = σ0², and under the alternative hypothesis σ² = σ1². One would like to determine
the sample size required for the hypothesis test to have a power of at least 1 − β when
the type I error is at most α, for given α and β. Straightforward algebra (see Lachin
[1981]) leads to a sample size
$$n = \left( \frac{Z_\alpha \sigma_0 + Z_\beta \sigma_1}{|\mu_1 - \mu_0|} \right)^2,$$
where $Z_\alpha$ and $Z_\beta$ are the upper 100α% and 100β% points of the standard normal
distribution (that is, its 100(1 − α)% and 100(1 − β)% quantiles), respectively. In the case where σ² is unknown, it can be approximated
from a preliminary sample of size m, for example by the sample variance
$S^2 = \sum_i (x_i - \bar{x})^2/(m - 1)$. See Desu and Raghavarao [1989] for a discussion of the estimation
of σ². A Student t test based on S² may then be used to find the sample size.
Lachin [1981] applies this general method to a large variety of commonly occurring
cases, including the case where the purpose of the study is to test whether the means
of two independent groups are equal, as would occur in a clinical trial with a placebo
group.
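The sample size formula above translates directly into code. This sketch uses Python's `statistics.NormalDist` for the normal quantiles; treating α as one-sided, rounding up to an integer, and the function name are our assumptions.

```python
import math
from statistics import NormalDist

def lachin_sample_size(mu0, mu1, sigma0, sigma1, alpha=0.05, beta=0.20):
    """n = ((Z_alpha * sigma0 + Z_beta * sigma1) / |mu1 - mu0|)^2, rounded up.

    Z_alpha and Z_beta are upper-tail standard normal points, so alpha is
    one-sided here; pass alpha/2 for a two-sided test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return math.ceil(((z_alpha * sigma0 + z_beta * sigma1) / abs(mu1 - mu0)) ** 2)
```

For example, detecting a shift of half a standard deviation with α = 0.05 (one-sided) and power 0.8 corresponds to `lachin_sample_size(0, 0.5, 1, 1)`.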
Donner [1984] also reviewed approaches to sample size estimation in the design of
randomized clinical trials, where the primary purpose is to compare two treatments
with respect to the occurrence of (or time to) some specified event. In general,
denote by $P_C$ the anticipated T-year event rate among control group patients, and
let $P_E$ be the anticipated T-year event rate among experimental group patients. Let
$\delta = P_E - P_C$, the difference in event rates. Suppose one would like to test H0: δ = 0
against Ha: δ = δa, at level α and power 1 − β. The required sample size n for
each group can be calculated by approximating the binomial distributions with the
appropriate matching normal distributions. The sample size is then given by
$$n = \frac{(Z_\alpha \sigma_0 + Z_\beta \sigma_1)^2}{\delta_a^2},$$
where σ0 and σ1 are the standard deviations of an observation under H0 and Ha,
respectively. Different methods of estimating σ0 and σ1 lead to different formulas.
Fleiss [1981] supposed that $P_C$ is known, and provided the following sample size formula:
$$n = \frac{\left\{ Z_\alpha \sqrt{2\bar{P}(1 - \bar{P})} + Z_\beta \sqrt{P_C(1 - P_C) + P_E(1 - P_E)} \right\}^2}{\delta_a^2}, \qquad (3.9)$$
where $\bar{P} = (P_C + P_E)/2$ and $P_E = P_C + \delta_a$. Lachin [1981] approximates equation 3.9 by
$$n = \frac{2(Z_\alpha + Z_\beta)^2\, \bar{P}(1 - \bar{P})}{\delta_a^2}. \qquad (3.10)$$
Feinstein [1977], Cohen [1977] and Snedecor and Cochran [1980] provide similar formulas
to approximate equation 3.9. None of these formulas accounts for the continuity
correction often used to improve the normal approximation to the binomial distribution.
Fleiss, Tytun and Ury [1980] show that the incorporation of the continuity
correction implies that the value of n given by equation 3.10 should be increased by
an amount of $2/|P_E - P_C|$. They also considered the case of randomization of unequal
numbers of patients into two groups. If, for example, one wishes to randomize
n patients to the experimental group and sn to the control group, where s > 0, then
n is given by
where $\bar{P} = (P_E + sP_C)/(s + 1)$.
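Equations 3.9 and 3.10, and the continuity correction of Fleiss, Tytun and Ury [1980], can be compared with the following sketch. As in the text, $Z_\alpha$ and $Z_\beta$ are upper-tail normal points with α one-sided; equal group sizes, rounding up and the function names are our assumptions.

```python
import math
from statistics import NormalDist

def _z(upper_tail):
    """Upper-tail standard normal point Z_q, with P(Z > Z_q) = q."""
    return NormalDist().inv_cdf(1 - upper_tail)

def fleiss_n(p_c, p_e, alpha=0.05, beta=0.20):
    """Per-group n from equation 3.9, with delta_a = p_e - p_c."""
    p_bar = (p_c + p_e) / 2
    numerator = (_z(alpha) * math.sqrt(2 * p_bar * (1 - p_bar))
                 + _z(beta) * math.sqrt(p_c * (1 - p_c) + p_e * (1 - p_e))) ** 2
    return math.ceil(numerator / (p_e - p_c) ** 2)

def lachin_n(p_c, p_e, alpha=0.05, beta=0.20, continuity=False):
    """Per-group n from equation 3.10, optionally increased by 2/|p_e - p_c|
    (the continuity correction of Fleiss, Tytun and Ury)."""
    p_bar = (p_c + p_e) / 2
    n = 2 * (_z(alpha) + _z(beta)) ** 2 * p_bar * (1 - p_bar) / (p_e - p_c) ** 2
    if continuity:
        n += 2 / abs(p_e - p_c)
    return math.ceil(n)
```

For instance, with $P_C$ = 0.3, $P_E$ = 0.5, α = 0.05 and β = 0.2, equation 3.10 gives a slightly larger n than equation 3.9, and the continuity correction adds 2/0.2 = 10 subjects per group before rounding.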
Schlesselman [1974] provides a formula for the sample size needed to detect a
relative risk $R = P_E/P_C$ of a size regarded as clinically important to detect.
To test H0: R = 1 against Ha: R = Ra, the sample size required for a test of level
α to have power 1 − β is given by
$$n = \frac{\left\{ Z_\alpha \sqrt{2\bar{P}(1 - \bar{P})} + Z_\beta \sqrt{P_C(1 - P_C) + R_a P_C(1 - R_a P_C)} \right\}^2}{P_C^2 (R_a - 1)^2},$$
where $\bar{P} = \frac{1}{2}P_C(1 + R_a)$. Similar formulas are given by Makuch and Simon [1978]
and Blackwelder [1982] for estimating the sample size requirements for clinical trials
of bioequivalence, that is, for trials designed to show that an experimental therapy
is equivalent in efficacy to a control therapy. This is of interest when the control
therapy is invasive or toxic. Donner [1984] also reviews several papers that consider
the stratification of subjects into k categories and cases where one accounts for
Spiegelhalter and Freedman [1986] study the determination of sample size in clin-
ical trials, when the study consists of comparing experimental therapy against a stan-
dard treatment or placebo. They note that the usual method of testing Ho : d = O
against Ha : d = da, and computing the sample size required to have a test of level a
and power 1 - 0, has some weaknesses. In particular, the specification of 6, is often
vague. and usually is Wuggled until it is set at a value that is reasonably plausi-
ble, and yet detectable given the available patients". In this paper Spiegelhalter and
Freedman propose a different approach that takes account of the pnor clinical opinion
about the treatment difference. This method consists of setting the nul1 hypothesis
at d = 6,, the smallest clinically wort h-wile irnprovement necessary to recommend
the new treatment. The prior distribution f (6) of 6, is chosen based on information
about 6 available to the investigator prior to the conduct of the trial. Decisions can
then be made in the following way: Suppose a trial with a maximum n subjects is
envisaged. After data are collected, an interval for this treatment can be formed (this
rnethod applies to both confidence intervals or Bayesian credible intervals). If this
interval lies wholly below 6,, then the new treatment is inferior to the old one. If the
interval contains 6,, then the trial is not conclusive, and if the interval lies wholly
above 6, then the new treatment is superior. Denote by d the event that the new
treatment is superior. The power of the test is a function of 6. For example, if the
statistic of interest follows a normal distribution, ,Y, - N(6, 02/n), then the power
of the test is given by
where a(.) is the standard normal distribution function. For the derivation of equa-
tion 3.11, see Casella and Berger [1990]. The probability of correctly concluding that
the new treatment is superior is then calculated by averaging the power over the prior
distribution of δ, that is, P(A) = ∫ P(A | δ) f(δ) dδ. A sample size can then be calculated
to ensure that P(A) is greater than some predetermined value. The authors acknowledge
that this method is subjective, in the sense that it depends on the prior distribution
of δ, and that this prior distribution may sometimes reflect the overenthusiasm of the
trial planners. The authors hope that with the frequent use of this method, planners
will become trained to think deeply about the trial before they conduct it, so that more
objective priors for δ will be proposed. See Spiegelhalter, Freedman and Parmar [1994]
for further discussion of this method.
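To make the Spiegelhalter and Freedman calculation concrete, here is a minimal Python sketch. It assumes (these are our choices, not the authors') a normal prior δ ~ N(μ₀, τ²) and the two-sided interval X̄_n ± z_{α/2}σ/√n, so that the power at δ is Φ(√n(δ − δ_s)/σ − z_{α/2}). Averaging a normal distribution function over a normal argument gives the closed form P(A) = Φ((√n(μ₀ − δ_s)/σ − z_{α/2}) / √(1 + nτ²/σ²)). The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides cdf() and inv_cdf()


def prob_superior(n, delta_s, sigma, mu0, tau, alpha=0.05):
    """Prior-averaged probability P(A) that the interval lies wholly above delta_s.

    Assumes Xbar_n | delta ~ N(delta, sigma^2/n) and the prior delta ~ N(mu0, tau^2).
    """
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Power at delta is Phi(sqrt(n)(delta - delta_s)/sigma - z); its argument is
    # normal under the prior, with the mean and variance below.
    mean_arg = n ** 0.5 * (mu0 - delta_s) / sigma - z
    var_arg = n * tau ** 2 / sigma ** 2
    return STD_NORMAL.cdf(mean_arg / (1 + var_arg) ** 0.5)


def min_sample_size(target, delta_s, sigma, mu0, tau, alpha=0.05, n_max=10_000):
    """Smallest n <= n_max with P(A) >= target, or None if no such n is found."""
    for n in range(1, n_max + 1):
        if prob_superior(n, delta_s, sigma, mu0, tau, alpha) >= target:
            return n
    return None


if __name__ == "__main__":
    # With tau = 0 the prior is a point mass and P(A) is the ordinary power.
    print(min_sample_size(0.8, delta_s=0.0, sigma=1.0, mu0=0.5, tau=0.0))
```

As n grows, P(A) approaches P(δ > δ_s) = Φ((μ₀ − δ_s)/τ) rather than 1, so a target above that limit returns None; this is one concrete way in which the prior constrains the attainable sample size.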
Arkin, Mitchell, and Wachtel [1990] studied sample size determination for assessing
the properties of a diagnostic test B, that is, the sensitivity, the specificity, and
the positive and the negative predictive values of that test, by comparing it to a
reference diagnostic test A. They denote by π_a and π_b the test characteristic of
interest for tests A and B, respectively, and they test the hypothesis H_0: π_a − π_b = 0
against H_a: π_a − π_b = δ_a at level α, where δ_a is the smallest difference between π_a and
π_b considered to be clinically important. In order to perform this hypothesis test,
independent random samples of sizes n_1 and n_2 are required from tests A and B,
respectively. The authors determine the minimum common sample size n = n_1 = n_2
needed to ensure that the power of the test is 1 − β at a prespecified alternative
δ_a > 0. They find that
where π̄ = (π_a + π_b)/2. Equation 3.12 is equivalent to equation 3.10 given above.
In many studies the main purpose is to estimate a particular test characteristic π
to within ±d. To estimate the sample size, Arkin, Mitchell, and Wachtel [1990]
propose the use of the standard binomial equation
$$n = \frac{z_{\alpha/2}^2\,\pi(1-\pi)}{d^2},$$
derived from equation 2.3.
Simel, Samsa and Matchar [1990] studied the sample size requirements for accurately
estimating likelihood ratios based on widths of confidence intervals. In general,
suppose that one has two random samples of sizes n_1 and n_2 from two binomial
populations with parameters p_1 and p_2, respectively, and that one is interested in
estimating p_1/p_2. The authors propose the estimator p̂_1/p̂_2, where p̂_1 = x_1/n_1 and
p̂_2 = x_2/n_2 are the observed sample proportions. An approximate 100(1 − α)% confidence
interval for p_1/p_2 is
$$\exp\left\{\log\left(\frac{\hat p_1}{\hat p_2}\right) \pm z_{\alpha/2}\sqrt{\frac{1-\hat p_1}{n_1\hat p_1} + \frac{1-\hat p_2}{n_2\hat p_2}}\right\}. \qquad (3.13)$$
To apply this general formula to likelihood ratios, suppose that the results of the
diagnostic test are given as in the 2 × 2 Table 3.2, where a, b, e and f are all observed.

                     Disease present   Disease absent
   Test positive            a                 b
   Test negative            e                 f

Table 3.2: Diagnostic test versus true disease state when a gold standard is available.

The sensitivity is estimated by a/(a + e) and the specificity is estimated by f/(b + f),
so that LR+ is estimated by a(b + f)/((a + e)b), and LR− is estimated by
e(b + f)/((a + e)f). Using the usual normal approximation to the binomial distribution,
an approximate 95% confidence interval for the sensitivity is
$$\text{sens} \pm 1.96\sqrt{\frac{\text{sens}\,(1-\text{sens})}{a+e}},$$
and similarly, an approximate 95% confidence interval for the specificity is
$$\text{spec} \pm 1.96\sqrt{\frac{\text{spec}\,(1-\text{spec})}{b+f}}.$$
Applying equation 3.13, an approximate 95% confidence interval for LR+ is given by
$$\exp\left\{\log\left(\frac{\text{sens}}{1-\text{spec}}\right) \pm 1.96\sqrt{\frac{1-\text{sens}}{a} + \frac{\text{spec}}{b}}\right\},$$
and an approximate 95% confidence interval for LR− is given by
$$\exp\left\{\log\left(\frac{1-\text{sens}}{\text{spec}}\right) \pm 1.96\sqrt{\frac{\text{sens}}{e} + \frac{1-\text{spec}}{f}}\right\},$$
where sens and spec are the estimated sensitivity and specificity, respectively. One
can use these equations to estimate the sample size based on the accuracy of confidence
intervals as described in chapter 2, with n_1 = a + e diseased and n_2 = b + f non-diseased
subjects. The sample sizes are not unique, however, so a constraint on n_1 and n_2 must
be imposed. One common constraint is n_1 = kn_2, where k is predetermined by the
investigator. Often, k = 1.
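The interval formulas above can be evaluated directly. The sketch below is ours, not Simel, Samsa and Matchar's; it assumes the cell labels of Table 3.2 (a and b are the true and false positives, e and f the false and true negatives), requires all four cells to be positive so that the logarithms exist, and uses z = 1.96 for 95% intervals. The function name is hypothetical.

```python
from math import exp, log, sqrt


def lr_intervals(a, b, e, f, z=1.96):
    """Point estimates and approximate CIs for LR+ and LR- from a 2x2 table.

    Cell layout as in Table 3.2: a = T+/D+, b = T+/D-, e = T-/D+, f = T-/D-.
    Each interval is built on the log scale (ratio of two binomial
    proportions, equation 3.13) and then exponentiated.
    """
    if min(a, b, e, f) <= 0:
        raise ValueError("all cells must be positive")
    sens = a / (a + e)
    spec = f / (b + f)

    lr_pos = sens / (1 - spec)
    se_pos = sqrt((1 - sens) / a + spec / b)  # standard error of log(LR+)

    lr_neg = (1 - sens) / spec
    se_neg = sqrt(sens / e + (1 - spec) / f)  # standard error of log(LR-)

    def interval(ratio, se):
        return ratio, exp(log(ratio) - z * se), exp(log(ratio) + z * se)

    return {"LR+": interval(lr_pos, se_pos), "LR-": interval(lr_neg, se_neg)}


if __name__ == "__main__":
    for name, (est, lower, upper) in lr_intervals(80, 20, 20, 80).items():
        print(f"{name}: {est:.3f} ({lower:.3f}, {upper:.3f})")
```

Because the intervals are symmetric on the log scale, they are asymmetric around the point estimate on the original scale.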
Simel, Samsa and Matchar [1990] argue that these methods sometimes result in
a sample size that is too large to be acceptable. This is because in order to use the
formula, one must provide values for sens and spec, and certain values may cause
large sample sizes. In this case, further constraints on the likelihood ratios can be
set by expert opinion. For example, upper and lower bounds can be imposed on the
likelihood ratios, and sample sizes are found taking into account these constraints.
However, a difficulty arises, since sens and spec are unknown before the study is
conducted, and it is not obvious which values will provide "conservative" confidence
intervals, in the sense of guaranteeing the desired interval width for all data sets that
may arise. Confidence intervals for log(LR+) are of the form
$$\log\left(\frac{\text{sens}}{1-\text{spec}}\right) \pm z_{\alpha/2}\sqrt{\frac{1-\text{sens}}{(\text{sens})\,n_1} + \frac{\text{spec}}{(1-\text{spec})\,n_2}}.$$
If we adopt the constraint n_1 = kn_2, and if we impose upper and lower limits on
sens and spec, then the maximum length of the confidence interval is achieved when
sens and 1 − spec take their minimum values within their feasible ranges, since
(1 − sens)/sens decreases in sens and spec/(1 − spec) decreases in 1 − spec. It is
not obvious, however, what an appropriate length of a confidence interval on a log
scale would be. In all three examples studied in this paper, there appears to be some
confusion between the exact true (but unknown) values and estimates of these values.
Consider, for example, their problem 1. Here they assume that the sensitivity is at
least 0.8 and the specificity is assumed to be exactly known and equal to 0.73. This
implies that 2.96 ≤ LR+ ≤ 3.70. Therefore, we certainly do not need a study to
confirm that the confidence interval does not contain the value 0.2, since under their
assumptions, it never will. Similarly, if we consider the sample size suggested by the
authors, n = 73.4, then the upper limit of the confidence interval around a point
estimate of 2.96 is
which exceeds 3.70.
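The worst-case argument above can be turned into a sample size. The sketch below is our own construction with hypothetical names: it takes a lower bound on sens, an upper bound on spec (that is, a lower bound on 1 − spec), the ratio k in n_1 = kn_2, and a target half-width w for the interval for log(LR+). Evaluating the half-width at the worst case and solving for n_2 gives n_2 ≥ z²_{α/2}[(1 − sens_min)/(k·sens_min) + spec_max/(1 − spec_max)]/w².

```python
from math import ceil, sqrt
from statistics import NormalDist


def log_lr_pos_half_width(sens, spec, n1, n2, alpha=0.05):
    """Half-width of the normal-approximation interval for log(LR+)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sqrt((1 - sens) / (sens * n1) + spec / ((1 - spec) * n2))


def worst_case_sizes(sens_min, spec_max, w, k=1.0, alpha=0.05):
    """(n1, n2) with n1 >= k*n2 so that the half-width is <= w whenever
    sens >= sens_min and spec <= spec_max (the widest case is at the bounds)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    factor = (1 - sens_min) / (k * sens_min) + spec_max / (1 - spec_max)
    n2 = ceil(z * z * factor / (w * w))
    n1 = ceil(k * n2)  # rounding up keeps the n1 term within the bound
    return n1, n2


if __name__ == "__main__":
    # Bounds in the spirit of the problem discussed above (sens >= 0.8, spec = 0.73).
    print(worst_case_sizes(0.8, 0.73, w=0.2))
```

A half-width w on the log scale means the interval runs from LR+·e^(−w) to LR+·e^(w), which is one way to interpret, and choose, an interval length on that scale.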
Wickramaratne [1995] notes that the standard methods for determining sample
sizes in epidemiologic studies are based on simplifying assumptions that are often
unrealistic. He also notes that methods that make less restrictive assumptions have
been developed in recent years, and reviews some of these methods.
Joseph, Wolfson and du Berger [1995] studied sample size calculations for estimating
binomial proportions, using a Bayesian approach. They used exact calculations
involving Beta quantiles, and based computations on highest posterior density sets.
Sample sizes from the average coverage criterion, the worst outcome criterion and the
average length criterion, all defined in chapter 2, were compared. Joseph, du Berger
and Belisle [1996] used Monte Carlo techniques to determine the sample sizes required
to estimate the difference between two binomial proportions for the same three
criteria. Similar methods for the case of normal means and the difference between two
normal means were considered by Joseph and Belisle [1997].
In summary, many authors have studied the estimation of the prevalence of a
disease from the results of diagnostic tests. Several authors have proposed a maximum
likelihood estimator when the sensitivity and specificity of the test are exactly known,
but little work has been done on the estimation of the prevalence of a disease in the
absence of a gold standard test. Although the literature on the determination of
sample sizes in medical research is very rich, no attempt has been made so far to
study the sample size needed to estimate the prevalence of a disease to within a given
accuracy when no gold standard is available. In the remainder of this thesis, we shall
study the question of estimating the prevalence using both frequentist and Bayesian
approaches. Most of our work will be focused on the determination of the sample size
needed to estimate the prevalence to within a given accuracy, when the sensitivity
and/or the specificity of the test are unknown, although in the next two chapters we
will first study the simpler case when they are assumed known.
Chapter 4
Sample size in binomial studies: A new criterion
4.1 Introduction
As discussed in chapter 1, estimating the prevalence of a disease in a population
is a frequently occurring problem in medicine. One way to do this is to obtain a
sample from the population and test each individual in the sample for the disease.
If the test used is a gold standard, then the number of diseased individuals in the
sample is the same as the number of individuals with positive test results, and the
problem of estimating the prevalence is the classical problem of estimating a binomial
proportion. In this chapter, we will look, from a frequentist point of view, at the
problem of determining the sample size required to estimate the prevalence of a
disease within a given accuracy when the test used is a gold standard. In future
chapters we will consider the more realistic problem of a test where misclassification
errors are possible.
Let θ denote the prevalence of the disease in the population under study. Consider
a sample of size n from that population and denote by x the number of individuals
from the sample who test positive. Since the diagnostic test is a gold standard, x is
also the number of individuals from the sample who have the disease. Thus in this
case θ = p, where p denotes the probability that a randomly selected individual tests positive.
Many recent textbooks on sample size determination (for example, Desu and Raghavarao
[1990] and Lemeshow et al. [1990]) suggest basing sample size calculations for
binomial experiments on criteria such as
$$P\left(\left|\frac{x}{n} - \theta\right| \le d\right) \ge 1 - \alpha. \qquad (4.1)$$
This formulation ensures that the sample size will be sufficient to estimate the true
binomial parameter θ by the usual unbiased point estimator θ̂ = x/n, in the sense
that |θ̂ − θ| ≤ d with probability at least 1 − α. For suitably chosen x_1 and x_2 (namely
x_1 = max(0, ⌈n(θ − d)⌉) and x_2 = min(n, ⌊n(θ + d)⌋)), the left hand side of 4.1 is equal to
$$\sum_{x=x_1}^{x_2}\binom{n}{x}\theta^{x}(1-\theta)^{n-x}, \qquad (4.2)$$
where n is the sample size. However, both the summand as well as x_1 and x_2 depend
on the unknown value of θ, making direct use of 4.2, and therefore of 4.1, almost
impossible in practice. One "conservative" solution (which we will show to not always
be conservative), suggested by Desu and Raghavarao [1990] and others, is to assume
that 4.2 is minimized when θ = 0.5. More generally, if it is known that θ ≤ m < 0.5
or θ ≥ m > 0.5, for some m, then an alternative solution would be to substitute
θ = m in 4.2. This would still be conservative, but would guard against the possibility of
using an unnecessarily large sample size if θ = 0.5 is used when in fact θ ≪ 0.5 or
θ ≫ 0.5. The intuition behind labelling these substitutions "conservative" is that
the variance function of a binomial random variable, nθ(1 − θ), is maximized over the
interval (a, b) ⊂ [0, 1] by the value in (a, b) closest to 0.5. However, this reasoning is
only partially correct, since the effects of θ on x_1 and x_2 are ignored in focusing only
on the binomial variance.
It is also often suggested that the exact calculation in 4.2 can be replaced by that
given by the normal approximation to the binomial distribution. Letting
$$y_i = \frac{x_i - n\theta}{\sqrt{n\theta(1-\theta)}}, \qquad i = 1, 2,$$
we have
$$\sum_{x=x_1}^{x_2}\binom{n}{x}\theta^{x}(1-\theta)^{n-x} \approx \int_{y_1}^{y_2}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{y^2}{2}\right)dy. \qquad (4.3)$$
The limits y_1 and y_2 are unknown, since θ is unknown. However, conservative sample
sizes are available by substituting θ = 0.5 or θ = m as above, and using quantiles
of the normal distribution to approximate y_1 and y_2. This leads to the sample size
formula
$$n = \left\lceil \frac{z_{\alpha/2}^2\,\theta(1-\theta)}{d^2} \right\rceil, \qquad (4.4)$$
where z_{α/2} is the usual 100(1 − α/2)% quantile of the standard normal distribution, and ⌈a⌉ denotes
the smallest integer greater than or equal to a. In the case where θ = 0.5, 4.4 reduces to
n = ⌈z²_{α/2}/(4d²)⌉. These conservative solutions are correct only to the extent that the normal
distribution approximates the exact underlying binomial probabilities. However,
the degree to which this approximation affects the sample sizes is usually unknown.
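Formula 4.4 amounts to a single line of code. The Python sketch below (the function name is hypothetical) obtains z_{α/2} from `statistics.NormalDist` and substitutes the conservative value θ = m, defaulting to m = 0.5.

```python
from math import ceil
from statistics import NormalDist


def normal_approx_sample_size(d, alpha, m=0.5):
    """Equation 4.4 with theta = m: n = ceil(z_{alpha/2}^2 * m(1 - m) / d^2)."""
    if not 0 < m < 1:
        raise ValueError("m must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    return ceil(z * z * m * (1 - m) / (d * d))


if __name__ == "__main__":
    for m in (0.5, 0.3, 0.1):
        print(m, normal_approx_sample_size(0.05, 0.05, m))
```

These values inherit every limitation of the normal approximation, which is exactly what the exact binomial calculation in 4.2 avoids.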
4.2 Anomalies
The main problem with using the standard formulation 4.1 is that θ is unknown, and
it is difficult or impossible to ascertain which value of θ is the most conservative.
Consider the following example:
Example 4.2.1 Let d = 0.1, 1 − α = 0.6, n = 5, and θ = 0.4. Then 4.2 becomes
P(x = 2) = 0.3456, while for θ = 0.5, 4.2 reduces to P(x = 2) + P(x = 3) =
0.625. Therefore, the minimum probability is not always attained by substituting
θ = 0.5, that is, the value that provides the maximum variance is not always the
most conservative in the sense of minimizing 4.2.
There are also other problems associated with the use of criterion 4.1:
Example 4.2.2 Suppose d = 0.1, 1 − α = 0.6 and n = 5. As above, for θ = 0.5,
4.2 gives 0.625. There are several anomalies associated with this situation. First,
consider the same calculation, but replace θ = 0.5 by θ = 0.50000001. In this case,
4.2 becomes P(x = 3) = 0.3125. Thus the discrete nature of the binomial distribution
is such that a little disturbance in θ reduces the probability by half. Since we will
rarely know θ a priori with a high degree of accuracy, this may be a serious concern.
Second, restore θ = 0.5, but let d = 0.0999999999. Then 4.2 becomes 0! Hence a
small decrease in d costs all of the probability. If strict inequality is considered in
equation 4.1, that is,
$$P\left(\left|\frac{x}{n} - \theta\right| < d\right) \ge 1 - \alpha,$$
the probability is again 0. Furthermore, if θ = 0.5, the smallest n that gives
P(|x/n − θ| ≤ d) ≥ 0.6 is n = 5, but if we take n = 6 then 4.2 becomes P(x = 3) = 0.3125, that is, half of
the probability is lost when considering a larger sample.
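The anomalies in Examples 4.2.1 and 4.2.2 can be reproduced by evaluating 4.2 directly. The sketch below is ours, with hypothetical names. It uses exact rational arithmetic (`fractions.Fraction`) because the boundary cases |x/n − θ| = d are exactly where floating-point rounding could flip an inclusion; passing θ and d as strings such as "0.0999999999" keeps them exact.

```python
from fractions import Fraction
from math import comb


def as_fraction(value):
    """Exact rational version of value; going through str() avoids binary rounding."""
    return value if isinstance(value, Fraction) else Fraction(str(value))


def standard_coverage(n, theta, d):
    """Equation 4.2: P(|x/n - theta| <= d) for x ~ Binomial(n, theta), computed exactly."""
    theta, d = as_fraction(theta), as_fraction(d)
    total = Fraction(0)
    for x in range(n + 1):
        if abs(Fraction(x, n) - theta) <= d:  # x lies between x1 and x2
            total += comb(n, x) * theta**x * (1 - theta) ** (n - x)
    return float(total)


if __name__ == "__main__":
    print(standard_coverage(5, 0.4, 0.1))             # Example 4.2.1 with theta = 0.4
    print(standard_coverage(5, 0.5, 0.1))             # theta = 0.5
    print(standard_coverage(5, "0.50000001", 0.1))    # tiny shift in theta
    print(standard_coverage(5, 0.5, "0.0999999999"))  # tiny decrease in d
    print(standard_coverage(6, 0.5, 0.1))             # one more observation
```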
While for ease of exposition the above examples featured only small values of n,
Table 4.1 indicates that similar problems persist for much larger n. In the next section
a modified criterion is suggested to replace equation 4.1 that improves upon some
of the undesirable features illustrated above. In particular, sample sizes from the
modified criterion can be calculated exactly via an easy-to-program algorithm, and
the problems due to the anomalies illustrated by the above examples are diminished.
4.3 Modified criterion
For any given θ, α and d, criterion 4.1 ensures that
$$P\{-d \le \hat\theta - \theta \le +d\} \ge 1 - \alpha,$$
where θ̂ = x/n is the usual binomial maximum likelihood estimator. However, one
could also consider
$$P\{-a \le \hat\theta - \theta \le +b\} \ge 1 - \alpha,$$
with a + b ≤ 2d. Therefore, instead of the interval of length 2d centered at θ, the
highest density interval of length ≤ 2d containing θ is considered. This is similar in
concept to switching to exact binomial confidence intervals rather than those based
on the normal approximation 4.3. Exact confidence intervals are commonly used
when n is small or θ is near 0 or 1. Let
D_θ = {all intervals I such that θ ∈ I and l(I) ≤ 2d},
where l(I) denotes the length of the interval I. Then the sample size can be defined
as the minimum n satisfying
$$\inf_{\theta}\ \max_{I \in D_\theta}\ \sum_{k:\,k/n \in I}\binom{n}{k}\theta^{k}(1-\theta)^{n-k} \ge 1 - \alpha, \qquad (4.5)$$
where k is an integer, and the infimum is over the range of possible values for θ.
Remark 4.3.1 For given n and θ ≤ 0.5, $P(k;\theta) = \binom{n}{k}\theta^{k}(1-\theta)^{n-k}$ takes its highest
value at the point r, where r/(n + 1) ≤ θ < (r + 1)/(n + 1). For θ = r/(n + 1), P(r − 1; θ) = P(r; θ).
Also P(k; θ) > P(k − 1; θ) if and only if k/(n + 1) < θ. A similar argument can be made
when θ > 0.5. See Rohatgi [1984] for the proofs of these statements.
Remark 4.3.2 Fix an integer k such that 1 ≤ k ≤ n − 1. Then P(k; θ) is differentiable
with respect to θ, and it is easy to see that it increases if and only if θ ≤ k/n.
Definition 4.3.1 For given d and n, define i to be the integer such that i/n ≤ 2d <
(i + 1)/n.
In what follows, it is assumed that i ≥ 1. For i = 0 all of the results proven below
will be trivially true.
Lemma 4.3.1 Given d, α, and n, a point θ can have at most two highest density
intervals of length i, namely [s/n, (s + i)/n] and [(s + 1)/n, (s + i + 1)/n], for some
integer s.
Proof: Suppose [s/n, (s + i)/n] and [u/n, (u + i)/n] are two highest density intervals
of length i corresponding to θ, and suppose without loss of generality that s < u.
It suffices to prove that u = s + 1. Since P(s; θ) ≥ P(s + i + 1; θ), P(s − 1; θ) ≤
P(s + i; θ), P(u; θ) ≥ P(u + i + 1; θ), and P(u − 1; θ) ≤ P(u + i; θ), by Remark
4.3.1, s ≤ r < s + i + 1 and u ≤ r < u + i + 1, where r is the point of maximum
probability. Hence s ≠ u − 1 implies s < u − 1 and s + i < u + i − 1, which implies
that P(u − 1; θ) > P(s; θ) ≥ P(s + i + 1; θ) > P(u + i; θ), which is a contradiction.
Lemma 4.3.2 Given d, α, n and θ_1 < θ_2, let [s/n, (s + i)/n] and [u/n, (u + i)/n] be
highest density intervals corresponding to θ_1 and θ_2, respectively. Then u ≥ s.
Proof: θ_1 < θ_2 implies that r_1 ≤ r_2, where r_1 and r_2 are the points of maximum
probability corresponding to θ_1 and θ_2, respectively, defined in Remark 4.3.1. Suppose
that u < s. Then by Remarks 4.3.1 and 4.3.2, P(u + i + 1; θ_2) > P(u + i + 1; θ_1) > P(s − 1; θ_1) ≥
P(u; θ_1) > P(u; θ_2), which contradicts P(u; θ_2) ≥ P(u + i + 1; θ_2), a consequence of
[u/n, (u + i)/n] being a highest density interval for θ_2.
Lemma 4.3.3 Given d, α, and n, an interval I is a highest density interval of length
i corresponding to some θ ∈ [0, 1] if and only if it is an interval from the set
{[0, i/n], [1/n, (i + 1)/n], …, [(n − i)/n, 1]}.
Proof: Let I_k denote the interval [k/n, (k + i)/n]. To prove the necessary condition,
just note that if j/n < a < (j + 1)/n ≤ (n − i)/n, then the probability of the interval I_j
is at least as large as the probability of the interval [a, a + i/n]. The proof of sufficiency will
proceed by induction. It is clear that I_0 is the highest density interval corresponding
to θ = 0. Suppose I_{j−1} is a highest density interval; then it suffices to prove that I_j
is also a highest density interval. Let
θ_j = max{θ : I_{j−1} is a highest density interval for θ}.
We will prove that both I_{j−1} and I_j are highest density intervals for θ_j. Suppose
I_{j−1} is not a highest density interval for θ_j, and let I be such an interval. Set
ε = P(I; θ_j) − P(I_{j−1}; θ_j). For every k, there exists δ_k > 0 such that |θ_j − θ| < δ_k implies
Let δ = min_k δ_k, and take θ such that 0 < θ_j − θ < δ/2 and such that I_{j−1} is a highest
density interval for θ. Then
Therefore P(I_{j−1}; θ) < P(I; θ), which is a contradiction. To prove that I_j is a highest
density interval for θ_j, let
S = {s : I_s is a highest density interval corresponding to some θ > θ_j}.
S is not empty since n − i ∈ S. Letting v = min S, [v/n, (v + i)/n] is a highest density
interval for θ_j. To prove this, let θ be such that 0 < θ − θ_j < δ/2, let I_v be a highest
density interval for θ, and proceed as above. Hence by Lemmas 4.3.1 and 4.3.2, v = j and
I_j is a highest density interval for θ_j.
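Theorem 4.3.1 Let d, α and n be given. Let i = ⌊2nd⌋, that is, i is the largest
integer not exceeding 2nd. For j ∈ {1, …, n − i}, define
$$r_j = \left[\frac{\binom{n}{j-1}}{\binom{n}{j+i}}\right]^{1/(i+1)} \qquad\text{and}\qquad \theta_j = \frac{r_j}{1 + r_j}.$$
Denote by H(θ) the probability content of a highest density interval corresponding
to θ. Then
$$\inf_{\theta \in [0,1]} H(\theta) = \min_{1 \le j \le n-i} H(\theta_j).$$
Theorem 4.3.1 states that in order to calculate the minimum highest density probability over
θ ∈ [0, 1], it suffices to consider only the n − i values θ_j. Similarly, if θ ≤ m, only
H(θ_j) with θ_j ≤ m, together with H(m), need to be computed, and the sample size is the smallest n such that
$$\min\left(\{H(\theta_j) : \theta_j \le m\} \cup \{H(m)\}\right) \ge 1 - \alpha.$$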
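Proof: From Lemmas 4.3.1 and 4.3.2 and the definition of θ_j, j ∈ {1, …, n − i}, the
highest density interval of a point θ ∈ (θ_j, θ_{j+1}) is unique and equal to I_j. At θ_j,
P(I_{j−1}) = P(I_j), which reduces to
$$\binom{n}{j-1}\theta_j^{\,j-1}(1-\theta_j)^{n-j+1} = \binom{n}{j+i}\theta_j^{\,j+i}(1-\theta_j)^{n-j-i}.$$
For θ ∈ [θ_j, θ_{j+1}], H(θ) = P(x/n ∈ I_j; θ), and therefore
$$H(\theta) = \sum_{x=j}^{n} p(x;\theta) - \sum_{x=j+i+1}^{n} p(x;\theta).$$
On the other hand (see Rohatgi [1984]),
$$\frac{d}{d\theta}\sum_{x=k}^{n} p(x;\theta) = k\binom{n}{k}\theta^{k-1}(1-\theta)^{n-k},$$
so that
$$H'(\theta) = -j\binom{n}{j}\theta^{j-1}(1-\theta)^{n-j}\,g(\theta), \qquad g(\theta) = \frac{\binom{n-1}{j+i}}{\binom{n-1}{j-1}}\left(\frac{\theta}{1-\theta}\right)^{i+1} - 1.$$
The function g(θ) is increasing. Furthermore,
$$g(\theta_j) = \frac{n-i-j}{n-j+1} - 1 < 0 \qquad\text{and}\qquad g(\theta_{j+1}) = \frac{j+i+1}{j} - 1 > 0,$$
so that g(θ) has only one zero in (θ_j, θ_{j+1}). Hence H(θ) has a maximum in (θ_j, θ_{j+1}); therefore, in
this interval, H(θ) is minimum at one of the end points θ_j or θ_{j+1}.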
This suggests the following algorithm:
1. Given d, m ≤ 0.5 and α, select an initial guess for the sample size n. (If
m > 0.5, use m' = 1 − m in place of m. The standard formula 4.4 with
θ = m could be used to obtain the initial guess.)
2. Calculate i = ⌊2nd⌋, $r_j = \left[\binom{n}{j-1}/\binom{n}{j+i}\right]^{1/(i+1)}$ and θ_j = r_j/(1 + r_j), j = 1, 2, …, n − i.
3. Calculate $H(\theta_j) = \sum_{k=j}^{j+i}\binom{n}{k}\theta_j^{k}(1-\theta_j)^{n-k}$.
4. Letting s = max{j : θ_j ≤ m}, calculate $H(m) = \sum_{k=s}^{s+i}\binom{n}{k}m^{k}(1-m)^{n-k}$.
5. (a) If there is no bound for θ, calculate P_min = min{H(θ_j) : 1 ≤ j ≤ n − i}.
   (b) If θ ≤ m ≤ 0.5, calculate P_min = min({H(θ_j) : θ_j ≤ m} ∪ {H(m)}).
6. Repeat steps 2 through 5 with a new value for n, until P_min ≥ 1 − α for n
but not for n − 1. For example, subsequent values for n can be selected via a
bisectional search algorithm.
The above algorithm is straightforward to program in most programming languages.
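A minimal Python sketch of the algorithm follows. It is our reading of steps 1 to 6, with hypothetical function names. Two choices are ours: the θ_j are computed on the log scale with `math.lgamma` so that large binomial coefficients never appear, and step 6 is a linear scan upward from n = 1, which returns the smallest n that meets the criterion, instead of a bisection search. The bounded case assumes 0 < m ≤ 0.5; a bound θ ≥ m' > 0.5 maps to m = 1 − m' by symmetry.

```python
from fractions import Fraction
from math import exp, floor, lgamma, log


def log_comb(n, k):
    """log C(n, k), computed through log-gamma to avoid huge integers."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def pmf(k, n, theta):
    """Binomial probability P(k; theta) for 0 < theta < 1."""
    return exp(log_comb(n, k) + k * log(theta) + (n - k) * log(1 - theta))


def window_prob(start, i, n, theta):
    """P(x/n in I_start), where I_start = [start/n, (start + i)/n]."""
    return sum(pmf(k, n, theta) for k in range(start, start + i + 1))


def theta_points(n, i):
    """Step 2: theta_j = r_j / (1 + r_j) for j = 1, ..., n - i."""
    points = []
    for j in range(1, n - i + 1):
        log_r = (log_comb(n, j - 1) - log_comb(n, j + i)) / (i + 1)
        points.append(1 / (1 + exp(-log_r)))
    return points


def min_hdi_coverage(n, d, m=None):
    """Steps 2 to 5: P_min, the minimum highest density probability over theta."""
    i = floor(2 * n * Fraction(str(d)))  # largest integer not exceeding 2nd
    if i >= n:
        return 1.0  # an interval covering every outcome fits within length 2d
    thetas = list(enumerate(theta_points(n, i), start=1))
    if m is None:  # step 5(a): no bound on theta
        return min(window_prob(j, i, n, t) for j, t in thetas)
    if not 0 < m <= 0.5:
        raise ValueError("expects a bound theta <= m with 0 < m <= 0.5")
    below = [(j, t) for j, t in thetas if t <= m]
    s = max((j for j, _ in below), default=0)  # step 4: I_s is the HDI at m
    values = [window_prob(j, i, n, t) for j, t in below]
    values.append(window_prob(s, i, n, m))  # step 5(b)
    return min(values)


def exact_sample_size(d, alpha, m=None, n_max=2000):
    """Step 6 by linear scan: the smallest n with P_min >= 1 - alpha."""
    for n in range(1, n_max + 1):
        if min_hdi_coverage(n, d, m) >= 1 - alpha:
            return n
    return None


if __name__ == "__main__":
    print(exact_sample_size(0.1, 0.1), exact_sample_size(0.1, 0.1, m=0.2))
```

The scan costs roughly n·i probability evaluations per candidate n, which is negligible for the sample sizes discussed here; a bisection search would reduce the number of candidates for larger problems.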
4.4 Examples
Consider again Example 4.2.1 from Section 4.2. This example illustrated that substituting
θ = 0.5 for the unknown θ does not guarantee a conservative probability
calculation. It is also true that θ = 0.5 is not necessarily conservative when using
the modified criterion. However, Theorem 4.3.1 states that the minimum highest density
interval probability occurs when θ = θ_j for some j ≤ n − i, so that the exact minimum
probability can easily be found, which is not in general the case when using 4.1.
Under the modified criterion, small disturbances in θ do not greatly affect the
probabilities, unlike what happened under the standard formulation in Example 4.2.2. For example,
using θ = 0.50000001, the highest density interval probability remains at 0.625. Small decreases
in d also do not affect the highest density interval probabilities when k/n < 2d <
(k + 1)/n for some integer k. However, when 2d = k/n, a small decrease in d produces
a loss of one of the end points of the interval. In contrast, under equation 4.1 both
end points are lost. When the sample size increases to 6, the probability under the
modified criterion is 0.5469, while under equation 4.1 it is 0.3125.
Table 4.1 provides ten additional examples, for a selection of values for 1 − α, d,
and m. The examples illustrate that in experiments requiring small samples, such as
when α and d are relatively large, the difference between the sample size computed
exactly and the one computed using the normal approximation can be as much as
50%. More interestingly, the differences can still approach 20% even when 1 − α
takes on the usual 0.9 or 0.95 values and the sample sizes are near 100. Furthermore,
differences are seen for sample sizes of up to 3,000. In many cases, these differences may be of practical
importance. Of course, the exact intervals are asymmetric about the point estimate
of the proportion, while the normal approximation is limited to symmetric intervals.
Therefore it is not surprising that the exact intervals lead to smaller sample sizes.
In conclusion, in this chapter we have introduced a new exact criterion for sample
size estimation for estimating binomial parameters. If a gold standard test is used,
this method could be used to select a sample size to estimate the prevalence of a
disease. In the next chapter, we will examine methods for when the sensitivity and
specificity are known, but are not assumed to be identically equal to one.
Table 4.1: Sample sizes (SS) for various values of α, d, and m, using the normal
approximation (4.4) and the modified criterion (4.5).
Chapter 5
Estimating the disease prevalence when the sensitivity and specificity are exactly known
5.1 Introduction
In chapter 4 we discussed the problem of estimating the prevalence of a disease when
the diagnostic test is a gold standard test. If, however, the diagnostic test is not a
gold standard test, the number of diseased individuals in the sample is not directly
observable. Instead one only knows the number of individuals who test positive, and
estimating the prevalence therefore depends on knowing the characteristics of the
test, in particular, the sensitivity and specificity. In this chapter we will study the
problem of estimating the prevalence of a disease in the absence of a gold standard
diagnostic test. We will suppose that the sensitivity S and the specificity C of the test
are both exactly known and equal to s < 1 and c < 1, respectively. Since diagnostic
tests that have the sum of their sensitivity and specificity below 1 can be improved
by reversing what is considered to be a positive test, without loss of generality we will
assume that s + c ≥ 1. Although it is instructive to consider this problem, this model
will usually be at best an approximation to reality, since it is very rarely true that
S and C are exactly known. Chapter 6 will consider the more realistic case when S
and C need to be estimated along with the prevalence.
5.2 Definitions
Let θ denote the prevalence of the disease in a particular population. Consider a
sample of size n from the population under study, and let p denote the probability
of testing positive, which includes both true and false positives. Denote by X the
number of individuals from the sample who test positive. We saw in chapter 2,
equation 2.11, that
$$p = s\theta + (1-c)(1-\theta). \qquad (5.1)$$
Since θ, s and c must all lie in the interval [0, 1], the above formula shows that p
must lie in the interval [1 − c, s]. Solving for θ, we have
$$\theta = \frac{p + c - 1}{s + c - 1}.$$
One common estimator of p is its maximum likelihood estimator (MLE). Owing to
the restriction of p to the interval [1 − c, s], the MLE of p is not always the usual
binomial MLE X/n. In fact, as discussed in Rohatgi [1984],
$$\text{MLE}(p) = \begin{cases} 1 - c, & \text{if } X/n \le 1 - c,\\ X/n, & \text{if } 1 - c < X/n < s,\\ s, & \text{if } X/n \ge s.\end{cases}$$
Using the invariance property of maximum likelihood estimators, the MLE of θ is
$$\text{MLE}(\theta) = \begin{cases} 0, & \text{if } X/n \le 1 - c,\\ \dfrac{X/n + c - 1}{s + c - 1}, & \text{if } 1 - c < X/n < s,\\ 1, & \text{if } X/n \ge s.\end{cases}$$
Many authors have proposed this estimator, including Rogan and Gladen [1978] and
Taragin, Wildman and Trout [1994]. The MLE(θ) performs reasonably well for most
values of θ. When θ is small, however, as is the case for many diseases, the MLE(θ)
is quite often 0, even when the unobserved number of truly diseased subjects in the
sample, Y, is not 0. In fact, P(Y = 0) = (1 − θ)^n, while P(MLE(θ) = 0) = P(X/n ≤
1 − c), and the latter can be much larger than the former.
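Written out, the MLE(θ) clamps X/n to the feasible range [1 − c, s] and then inverts equation 5.1. The minimal Python sketch below uses a hypothetical function name, assumes s + c > 1 so that the inversion is defined, and uses exact fractions so that a boundary case such as x/n = 1 − c is classified correctly.

```python
from fractions import Fraction


def mle_prevalence(x, n, s, c):
    """MLE of theta from x positives out of n, with known sensitivity s and specificity c."""
    s, c = Fraction(str(s)), Fraction(str(c))
    if s + c <= 1:
        raise ValueError("requires s + c > 1")
    p_hat = min(max(Fraction(x, n), 1 - c), s)  # MLE(p): clamp X/n to [1 - c, s]
    return float((p_hat + c - 1) / (s + c - 1))  # invert p = s*theta + (1 - c)(1 - theta)


if __name__ == "__main__":
    for x in (15, 20, 30, 95):
        print(x, mle_prevalence(x, 100, 0.9, 0.8))
```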
Table 5.1 illustrates this phenomenon for various values of θ and n when s = 0.9 and
c = 0.8. In this table we used the normal approximation to the binomial distribution
to calculate P(X/n ≤ 1 − c). Since X/n is approximately normally distributed with
mean p and variance p(1 − p)/n, we have
$$P(\text{MLE}(\theta) = 0) \approx \Phi\left(\frac{(1-c) - p}{\sqrt{p(1-p)/n}}\right),$$
where Φ(t) denotes the standard normal distribution function, that is,
$$\Phi(t) = \int_{-\infty}^{t}\frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}\,du.$$
In this chapter we will suggest an adjustment to the MLE of θ, useful when
MLE(θ) = 0. We will call this new estimator the adjusted MLE, or AMLE. Confidence
intervals for θ based on the AMLE will be derived. Finally, we will derive a
method for calculating the sample size needed for the AMLE to be within a distance
d of θ with probability at least 1 − α.
Table 5.1: P(Y = 0) versus P(MLE(θ) = 0) when s = 0.9 and c = 0.8
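The gap between P(Y = 0) and P(MLE(θ) = 0) can be checked numerically. The sketch below uses hypothetical names and computes three quantities: P(Y = 0) = (1 − θ)^n, the exact binomial probability P(X/n ≤ 1 − c), and one reading of the normal approximation used for Table 5.1, namely X/n approximately N(p, p(1 − p)/n) with no continuity correction. We have not attempted to reproduce the table's entries.

```python
from fractions import Fraction
from math import comb, floor, sqrt
from statistics import NormalDist


def positive_rate(theta, s, c):
    """Equation 5.1: p = s*theta + (1 - c)(1 - theta)."""
    return s * theta + (1 - c) * (1 - theta)


def prob_no_diseased(theta, n):
    """P(Y = 0) = (1 - theta)^n."""
    return (1 - theta) ** n


def prob_mle_zero_exact(theta, n, s, c):
    """P(MLE(theta) = 0) = P(X <= n(1 - c)) with X ~ Binomial(n, p)."""
    p = positive_rate(theta, s, c)
    k_max = floor(n * (1 - Fraction(str(c))))  # largest x with x/n <= 1 - c
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))


def prob_mle_zero_normal(theta, n, s, c):
    """Normal approximation, treating X/n as N(p, p(1 - p)/n)."""
    p = positive_rate(theta, s, c)
    return NormalDist().cdf((1 - c - p) / sqrt(p * (1 - p) / n))


if __name__ == "__main__":
    for theta in (0.01, 0.03, 0.05):
        print(theta, prob_no_diseased(theta, 100),
              prob_mle_zero_exact(theta, 100, 0.9, 0.8),
              prob_mle_zero_normal(theta, 100, 0.9, 0.8))
```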
5.3 Adjustment to the MLE
Suppose we have a sample of size n. Let X and Y be defined as in section 5.2, and let
Z be the unobserved latent data representing the number of truly positive subjects.
See table 1.2.

                     Disease present    Disease absent      Total
   Test positive           Z                X − Z             X
   Test negative         Y − Z          n − X − Y + Z       n − X
   Total                   Y                n − Y             n

Table 5.2: Diagnostic test versus disease state

When choosing a subject at random from this sample, the probability of choosing
a positively testing subject is X/n, the probability of choosing a positively testing
subject given that he is truly diseased is Z/Y, the probability of choosing a subject
that tested positive given that he is not diseased is (X − Z)/(n − Y), and the
probability of choosing a subject that has the disease is Y/n. We have
$$\frac{X}{n} = \frac{Y}{n}\cdot\frac{Z}{Y} + \left(1 - \frac{Y}{n}\right)\frac{X - Z}{n - Y}. \qquad (5.2)$$
S / n = Y / n ( Z / Y ) + ( 1 - k' /n)(X - Z ) / ( n - Y ) . (5.2)
Remark 5.3.1 Note that E(S /n ) = p: E(k'/n) = 0, E ( Z / Y ) = s and E ( ( X -
Z n - Y ) ) = 1 - c. Also, in deriving the MLE(B) we used the equation p =
Os + ( 1 - 8) (1 - c ) , which is equivalent to
E(x/n) = E ( Y / n ) E ( Z / Y ) + ( 1 - E ( Y / n ) ) E ( ( X - Z ) / ( n - Y)). (5.3)
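Let x denote the observed value of X in the experiment. To define the AMLE when
x/n ≤ 1 − c, we will assume that equation 5.3 remains true if x is known. Therefore,
we assume that
$$\frac{x}{n} = E\left(\frac{Y}{n} \mid x\right)E\left(\frac{Z}{Y} \mid x\right) + \left(1 - E\left(\frac{Y}{n} \mid x\right)\right)E\left(\frac{X-Z}{n-Y} \mid x\right); \qquad (5.4)$$
from this we will derive an approximation to the term E(Y/n | x).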
Approximating E(Y/n | x) when x/n ≤ 1 − c: When x/n ≤ 1 − c, Y is usually
small relative to n. By taking a large enough sample, we can make P(Y = 0) = (1 − θ)^n
as small as we like. Therefore, we will suppose that Y ≠ 0, and
hence E(Z/Y | x) can be approximated by s. In addition, for large sample sizes,
(X − Z)/(n − Y) is approximately normally distributed with mean 1 − c and variance
c(1 − c)/n. Therefore, to approximate E((X − Z)/(n − Y) | x) we consider a random
variable H that is normally distributed with mean 1 − c and variance c(1 − c)/n,
but with the added constraint that H ≤ x/n (note that (X − Z)/(n − Y) ≤ X/n exactly
when X/n ≤ Z/Y, and here Z/Y ≈ s > 1 − c ≥ x/n). We then calculate E(H | x), which
approximates E((X − Z)/(n − Y) | x). Substituting these approximations, equation 5.4
becomes
$$\frac{x}{n} \approx E\left(\frac{Y}{n} \mid x\right)s + \left(1 - E\left(\frac{Y}{n} \mid x\right)\right)E(H \mid x).$$
Solving for E(Y/n | x) gives
$$E\left(\frac{Y}{n} \mid x\right) \approx \frac{x/n - E(H \mid x)}{s - E(H \mid x)}.$$
It remains to calculate E(H | x), which will be done after the following definition.
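Definition 5.3.1 We define the adjusted MLE of θ, denoted by AMLE, to be
$$\text{AMLE} = \begin{cases} \dfrac{x/n - E(H \mid x)}{s - E(H \mid x)}, & \text{if } x/n \le 1 - c,\\[1ex] \text{MLE}(\theta), & \text{otherwise}.\end{cases}$$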
Remark 5.3.2 Similarly, an adjustment to the MLE(θ) can be defined when x/n ≥
s, that is, when the MLE(θ) is 1. This can be done by reversing both what is
considered to be a positive test and what is considered to be the disease state. Then
s will become the specificity, c will become the sensitivity, and (n − x)/n will be the
proportion of subjects from the sample who test positive. The AMLE can then be
defined similarly to the above. Therefore, without loss of generality, we will assume
in the remaining part of this chapter that X/n < s with probability 1.
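Calculating E(H | x): To calculate the AMLE when x/n ≤ 1 − c, we need to
calculate the integral
$$E(H \mid x) = \int_{-\infty}^{\infty} h\, f_{H \mid X}(h \mid x)\,dh,$$
where f_{H|X}(h | x) denotes the conditional density function of H given X = x. Let
f_H(h) denote the density function of H, F_H(h) denote the distribution function of H,
and F_{H|X}(h | x) denote the conditional distribution function of H given X = x. We
then have
$$F_{H \mid X}(h \mid x) = \begin{cases} F_H(h)/F_H(x/n), & h \le x/n,\\ 1, & \text{otherwise},\end{cases}$$
and therefore
$$f_{H \mid X}(h \mid x) = \begin{cases} f_H(h)/F_H(x/n), & h \le x/n,\\ 0, & \text{otherwise}.\end{cases}$$
Hence
$$E(H \mid x) = \frac{1}{F_H(x/n)}\int_{-\infty}^{x/n} h\, f_H(h)\,dh.$$
Substituting u = (h − (1 − c))/√(c(1 − c)/n), we obtain
$$\int_{-\infty}^{x/n} h\, f_H(h)\,dh = (1-c)\,F_H(x/n) - \sqrt{\frac{c(1-c)}{2\pi n}}\,\exp\left(-\frac{(x/n - (1-c))^2}{2c(1-c)/n}\right),$$
so that, since $F_H(x/n) = \Phi\left(\frac{x/n - (1-c)}{\sqrt{c(1-c)/n}}\right)$,
$$E(H \mid x) = (1-c) - \sqrt{\frac{c(1-c)}{2\pi n}}\;\frac{\exp\left(-\frac{(x/n - (1-c))^2}{2c(1-c)/n}\right)}{\Phi\left(\frac{x/n - (1-c)}{\sqrt{c(1-c)/n}}\right)}.$$
With E(H | x) given by this expression, the AMLE can be defined as
$$\text{AMLE} = \begin{cases} \dfrac{x/n - E(H \mid x)}{s - E(H \mid x)}, & \text{if } x/n \le 1 - c,\\[1ex] \dfrac{x/n + c - 1}{s + c - 1}, & \text{otherwise}.\end{cases}$$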
Simulations were run to compare the estimates from the AMLE to those from
the MLE(θ) for several common situations. Since the MLE(θ) and the AMLE are
equal when 1 − c < x/n < s, Table 5.3 illustrates several representative cases when
they are not equal, that is, when x/n ≤ 1 − c.
   θ       Sample size   Sensitivity   Specificity
   0.03        300           0.6           0.9
   0.02        100           0.9           0.8
Table 5.3: Selected examples of the value of the AMLE simulated under the indicated
values for the sample size, true value of the prevalence θ, and sensitivity and specificity
of the diagnostic test. The value for x represents the number of positive tests out of
the total sample size. Since x/n ≤ 1 − c, the MLE(θ) is zero for all of these cases.
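The AMLE is straightforward to compute. The Python sketch below (hypothetical names) evaluates E(H | x) as the mean of a normal distribution with mean 1 − c and variance c(1 − c)/n truncated above at x/n, then applies the AMLE formula; on the branch 1 − c < x/n < s it returns the MLE(θ). The branch x/n ≥ s is our reading of Remark 5.3.2 (swap the roles of s and c, replace x by n − x, and reflect the result), which the text does not spell out. For very negative standardized values the normal distribution function can underflow to 0, so extreme inputs need care.

```python
from fractions import Fraction
from math import exp, pi, sqrt
from statistics import NormalDist


def expected_h(x, n, c):
    """E(H | x): mean of N(1 - c, c(1 - c)/n) truncated above at x/n."""
    mu = 1 - c
    sigma = sqrt(c * (1 - c) / n)
    beta = (x / n - mu) / sigma
    density = exp(-beta * beta / 2) / sqrt(2 * pi)
    return mu - sigma * density / NormalDist().cdf(beta)


def amle(x, n, s, c):
    """Adjusted MLE of the prevalence theta (assumes s + c > 1)."""
    p_hat, s_q, c_q = Fraction(x, n), Fraction(str(s)), Fraction(str(c))
    if p_hat >= s_q:
        # Our reading of Remark 5.3.2: relabel test and disease, then reflect.
        return 1 - amle(n - x, n, c, s)
    if p_hat <= 1 - c_q:
        h = expected_h(x, n, c)
        return (x / n - h) / (s - h)
    return float((p_hat + c_q - 1) / (s_q + c_q - 1))  # the usual MLE(theta)


if __name__ == "__main__":
    # The case discussed with Table 5.3: x = 19 positives, n = 100, s = 0.9, c = 0.8.
    print(round(amle(19, 100, 0.9, 0.8), 3))
```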
Note that given the data, the AMLE does not depend on θ, although of course,
θ is needed to generate x in a simulation. For example, the first row of Table 5.3
shows that if θ = 0.04, n = 100, s = 0.9 and c = 0.8, an observed x = 19 leads to
AMLE = 0.039. If instead θ = 0.03 but all values for s, c, and n remain as above,
and we observe x = 19, then the AMLE will still be 0.039. We also see in this table
that when x/n ≤ 1 − c, the further away x/n is from 1 − c, the smaller the AMLE
is. For instance, for n = 100, s = 0.9 and c = 0.8, we have
While the examples in Table 5.3 seem to indicate that the AMLE improves on
the usual MLE in cases where they differ, we performed a more formal simulation
to quantify the improvement. We considered a few common cases where c = 0.8 and
s is either 0.9 or 0.8. For each of the values of θ ∈ {0.05, 0.04, 0.03, 0.02, 0.01}, we
ran 10000 simulations of samples of size 100 and 500. We calculated the number of
times the MLE(θ) was 0 in these simulations, and the mean AMLE for these cases.
The mean squared error (MSE) is defined as the average of the squared deviations
between the estimator and the true parameter value in each simulation. We calculated
the square root of the MSE of the MLE(θ), denoted by e.MLE, so that
$$\text{e.MLE} = \sqrt{\frac{\sum(\text{MLE}(\theta) - \theta)^2}{10000}}.$$
Similarly,
$$\text{e.AMLE} = \sqrt{\frac{\sum(\text{AMLE} - \theta)^2}{10000}}.$$
Since we are interested in the cases where the MLE(θ) and the AMLE differ, we defined the conditional MSE to be
the MSE conditional on x/n ≤ 1 − c. We denote the square root of the conditional
MSE of the AMLE by ce.AMLE, that is,
$$\text{ce.AMLE} = \sqrt{\frac{\sum_{x/n \le 1-c}(\text{AMLE} - \theta)^2}{k}},$$
where k is the number of simulations in which x/n ≤ 1 − c. Note that the square root of the
conditional MSE of the MLE(θ), denoted ce.MLE, is θ, since the MLE(θ) is 0 whenever x/n ≤ 1 − c.
Table 5.4 contains the results of these simulations.
Table 5.4: Mean square errors for the MLE(θ) and AMLE. In this table, sens is
the sensitivity of the test and the specificity is held constant at 0.8. The parameter
k represents the number of times x/n ≤ 1 − c in a simulation of size 10000, mAMLE
is the average of the AMLE for these simulations where x/n ≤ 1 − c, e.MLE is the
square root of the MSE of the MLE(θ), e.AMLE is the square root of the MSE of
the AMLE, and ce.AMLE is the square root of the MSE of the AMLE conditional
on x/n ≤ 1 − c. The square root of the mean square error of the MLE(θ) when
x/n ≤ 1 − c is equal to θ (see text).
In most of the cases reported in table 5.4, the MSE of the AMLE is smaller than that of the MLE(θ). For example, for θ = 0.05, n = 100, s = 0.9 and c = 0.8, the MLE(θ) was 0 in 2286 of the 10000 simulations. For these cases the square root of the conditional MSE of the AMLE is 0.016, while that of the MLE(θ) is much larger, 0.05. If we increase the sample size to 500, the number of times the MLE(θ) is 0 out of 10000 simulations drops to 315. This is because, as the sample size increases, x/n gets closer to the true value p > 1 − c, so the probability of observing x/n ≤ 1 − c decreases. Only in rows 8, 9 and 10 of the table does the AMLE seem to perform worse than the MLE(θ). For example, in row 10 ce.AMLE = 0.03 while ce.MLE = 0.01. This is due to the fact that with θ = 0.01, s = 0.8 and c = 0.8, a sample larger than 100 is needed for the AMLE to perform well. In fact, if we increase the sample size to 500 (row 20), ce.AMLE drops to 0.008 while ce.MLE is still 0.01. In the next two sections, we will derive confidence intervals and sample size requirements for θ based on the AMLE.
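The MLE(θ) part of this simulation can be sketched in Python. The sketch assumes, as in the text, that X ~ binomial(n, p) with p = θ(s + c − 1) + 1 − c, and that MLE(θ) = (x/n + c − 1)/(s + c − 1) truncated to [0, 1]. The AMLE columns are not reproduced because the adjustment term M(X) is not restated in this section. All function names are hypothetical.

```python
import math
import random


def mle_theta(x, n, s, c):
    """MLE of the prevalence, (x/n + c - 1)/(s + c - 1), truncated to [0, 1]."""
    est = (x / n + c - 1) / (s + c - 1)
    return min(max(est, 0.0), 1.0)


def zero_cutoff(n, c):
    """Largest x with x/n <= 1 - c; the tolerance guards against float rounding."""
    return math.floor(n * (1 - c) + 1e-9)


def prob_mle_zero(theta, n, s, c):
    """Exact binomial probability that x/n <= 1 - c, i.e. that MLE(theta) = 0."""
    p = theta * (s + c - 1) + 1 - c
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x)
               for x in range(zero_cutoff(n, c) + 1))


def simulate_mle(theta, n, s, c, reps=10000, seed=1):
    """Return (k, e.MLE, ce.MLE) over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    p = theta * (s + c - 1) + 1 - c
    cutoff = zero_cutoff(n, c)
    k, sq, cond_sq = 0, 0.0, 0.0
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))  # one binomial(n, p) draw
        err2 = (mle_theta(x, n, s, c) - theta) ** 2
        sq += err2
        if x <= cutoff:  # x/n <= 1 - c, so MLE(theta) = 0
            k += 1
            cond_sq += err2
    ce = math.sqrt(cond_sq / k) if k else float("nan")
    return k, math.sqrt(sq / reps), ce
```

For example, `simulate_mle(0.05, 100, 0.9, 0.8)` returns k, e.MLE and ce.MLE. The ratio k/10000 should be close to `prob_mle_zero(0.05, 100, 0.9, 0.8)`. ce.MLE equals θ because the MLE is 0 in every counted run.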
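5.4 Confidence intervals
In this section we will prove that to find a 1 − α confidence interval for θ, it suffices to find a 1 − α confidence interval for p. This can be done, for example, using the MLE of p. In fact, we will prove that if we can find a positive number l such that
$$P\left(|X/n - p| \le l\right) \ge 1 - \alpha, \qquad (5.6)$$
then a 100(1 − α)% confidence interval for θ can be constructed from l (Theorem 5.4.1).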
5.4.1 Confidence interval for p
Finding an approximate confidence interval for p is the classical problem of finding an approximate confidence interval for a binomial proportion. Here there is the added restriction that the interval must be contained within the feasible range of p, the interval [1 − c, s]. Indeed, suppose that an l has been found satisfying equation 5.6. This means that
$$P\left(p \in [X/n - l,\ X/n + l]\right) \ge 1 - \alpha.$$
Since p always lies in [1 − c, s], we have P(p ∈ [0, 1] − [1 − c, s]) = 0. The law of total probability then gives
$$P\left(p \in [X/n - l,\ X/n + l] \cap [1 - c, s]\right) = P\left(p \in [X/n - l,\ X/n + l]\right) \ge 1 - \alpha.$$
Therefore [X/n − l, X/n + l] ∩ [1 − c, s] is a 100(1 − α)% confidence interval for p.
To find l satisfying equation 5.6, we use the classical method found in almost all statistical textbooks. It consists of taking the normal approximation to the binomial(n, p) distribution, and looking for the value l such that
$$P\left(|X/n - p| \le l\right) \approx P\left(|Z| \le \frac{l}{\sqrt{p(1-p)/n}}\right) = 1 - \alpha, \qquad Z \sim N(0, 1).$$
This leads to
$$l = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},$$
where z_{α/2} is the 100(1 − α/2)% quantile of the standard normal distribution. Since p is unknown, it is usually approximated by x/n. In the current context, however, p is restricted to the interval [1 − c, s]. Therefore, to find l satisfying equation 5.8, we approximate p by x/n only when 1 − c < x/n < s, and by 1 − c when x/n ≤ 1 − c. An approximate 1 − α confidence interval for p is then given by the intersection of [x/n − l, x/n + l] with [1 − c, s].
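A minimal Python sketch of this interval for p, assuming the plug-in rule just described. The text does not say which value to plug in when x/n ≥ s, so this sketch assumes s. The name `ci_p` is hypothetical, and z_{α/2} comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def ci_p(x, n, s, c, alpha=0.05):
    """Approximate 100(1 - alpha)% interval for p, intersected with [1 - c, s].

    Returns (lo, hi, l), where l is the half-width, or None when the
    normal-approximation interval misses [1 - c, s]."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    phat = x / n
    # Plug-in value for p: x/n inside (1 - c, s), 1 - c below it;
    # using s when x/n >= s is an assumption of this sketch.
    p_plug = min(max(phat, 1 - c), s)
    l = z * math.sqrt(p_plug * (1 - p_plug) / n)
    lo, hi = max(phat - l, 1 - c), min(phat + l, s)
    return (lo, hi, l) if lo <= hi else None
```

Take x = 19, n = 100, s = 0.9 and c = 0.8. Here x/n = 0.19 lies below 1 − c, so p is replaced by 0.2, giving l ≈ 0.078 and the interval [0.2, 0.268]. With x = n = 100, the interval [x/n − l, x/n + l] misses [1 − c, s] and the function returns None. A smaller α, such as 10⁻⁴, restores a non-empty interval.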
Remark 5.4.1 In most cases (depending on the sample size and on α), the intersection of the two intervals is not empty. Since we are dealing with approximate and not exact confidence intervals, however, the two intervals may fail to intersect. In these cases the above procedure does not lead to a 100(1 − α)% confidence interval for p. However, the half-width l increases without bound as α decreases to 0. The intersection therefore eventually becomes the whole support [1 − c, s] of p, whose length is s + c − 1. We can thus always find a β < α for which the above procedure leads to an approximate 100(1 − β)% confidence interval.
5.4.2 Confidence interval for θ
Assume that an l satisfying equation 5.6 (at least approximately) has been found. We now prove the following theorem:
Theorem 5.4.1 If [X/n − l, X/n + l] ∩ [1 − c, s] is a 100(1 − α)% confidence interval for p, then
$$\left[\frac{X/n + c - 1 - l}{s + c - 1},\ \frac{X/n + c - 1 + l}{s + c - 1}\right] \cap [0, 1]$$
is a 100(1 − α)% confidence interval for θ containing both MLE(θ) and AMLE.
Proof: Recall that θ = (p − (1 − c))/(s + c − 1). By the law of total probability we have
It suffices then to prove that
and
Consider first the case when X/n > 1 − c. The AMLE of θ is then by definition simply the MLE,
$$\text{MLE}(\theta) = \frac{X/n + c - 1}{s + c - 1},$$
so that
Therefore,
so that
$$\frac{X/n + c - 1 - l}{s + c - 1} \le \theta \le \frac{X/n + c - 1 + l}{s + c - 1},$$
and by the law of total probability
Consider next the case when X/n ≤ 1 − c. We then have MLE(θ) = 0, and
$$\text{AMLE} = \frac{X/n - (1 - c) + M(X)}{s - (1 - c) + M(X)}.$$
Thus
If |X/n − p| ≤ l with probability at least equal to 1 − α, then
with probability at least equal to 1 − α. On the other hand,
so that
with probability at least 1 − α. Therefore
with probability at least equal to 1 − α.
The right hand side of 5.9 can be written as
and by the law of total probability we have
$$P\left(\theta \in \left[\frac{X/n + c - 1 - l}{s + c - 1},\ \frac{X/n + c - 1 + l}{s + c - 1}\right] \cap [0, 1]\right) \ge 1 - \alpha.$$
Therefore, a confidence interval for θ is obtained by intersecting [(X/n + c − 1 − l)/(s + c − 1), (X/n + c − 1 + l)/(s + c − 1)] with [0, 1], and this confidence interval contains both MLE(θ) and AMLE.
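The interval in Theorem 5.4.1 is obtained by applying θ = (p − (1 − c))/(s + c − 1) to the endpoints X/n ± l and intersecting the result with [0, 1]. Below is a Python sketch with hypothetical function names. The half-width l is taken as given. The truncated MLE is included only to check the containment claim.

```python
def ci_theta(x, n, s, c, l):
    """Theorem 5.4.1: the interval
    [(x/n + c - 1 - l)/(s + c - 1), (x/n + c - 1 + l)/(s + c - 1)]
    intersected with [0, 1].  Returns None if the intersection is empty."""
    k = s + c - 1
    if k <= 0:
        raise ValueError("s + c must exceed 1 for the test to be informative")
    lo = max((x / n + c - 1 - l) / k, 0.0)
    hi = min((x / n + c - 1 + l) / k, 1.0)
    return (lo, hi) if lo <= hi else None


def mle_theta(x, n, s, c):
    """Truncated MLE of theta, used to check the containment claim."""
    return min(max((x / n + c - 1) / (s + c - 1), 0.0), 1.0)
```

Take l = 0.0784, the normal-approximation half-width when p is replaced by 1 − c = 0.2, n = 100 and α = 0.05. Then `ci_theta(19, 100, 0.9, 0.8, 0.0784)` gives approximately [0, 0.098]. This interval contains MLE(θ) = 0 and the AMLE of 0.039 reported above for x = 19.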
We calculated 95% confidence intervals for some simulated cases, using a sample of size 100, a specificity of 0.8, and various values for the sensitivity. Since here we are mostly interested in the case where x/n ≤ 1 − c, we present several such examples in table 5.5.
Since our methods are only approximate, we ran a simulation to estimate the proportion of times the 95% confidence intervals captured the true θ. We present the results in table 5.6, for a sample size of 100. Ten thousand simulations were run for each case. In most cases reported in table 5.6, the proportion of times the 95% confidence intervals captured θ is very close to 95%. The estimated coverage can be viewed as a binomial proportion, since each time a confidence interval is calculated, the true θ is either in or out of the interval. Therefore, with 10000 simulated intervals, the accuracy of the estimated coverage proportion is
where prop is the proportion of times the 95% confidence intervals captured the true θ. Here, the values of prop are all close to 0.95, so l can be approximated by
which is very small. In summary, the method appears to work well, at least for n = 100. We expect it to work at least as well for larger sample sizes and for values of θ further from 0. Since it is not reasonable to estimate very small prevalences with small samples, the method should work well in all cases of practical importance.
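A numerical illustration follows. It assumes the accuracy is measured by the usual normal-approximation half-width, 1.96·sqrt(prop(1 − prop)/N), of a binomial proportion estimated from N runs. Under that assumption, a coverage estimate near 0.95 from N = 10000 runs is accurate to about ±0.004. The function name is hypothetical.

```python
import math


def coverage_accuracy(prop, runs=10000, z=1.96):
    """Normal-approximation half-width z * sqrt(prop (1 - prop) / runs) for a
    coverage proportion estimated from `runs` independent simulated intervals."""
    return z * math.sqrt(prop * (1 - prop) / runs)


print(round(coverage_accuracy(0.95), 4))  # prints 0.0043
```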
[Table 5.5: columns θ, sens, x, AMLE and conf.int; numerical entries not recoverable]
Table 5.5: Confidence intervals for θ. The specificity is 0.8 and the sample size is 100; sens is the sensitivity, and conf.int is the 95% confidence interval for θ calculated with the method given in section 5.4.2.
Table 5.6: The proportion of times out of 10000 that the 95% confidence intervals captured θ. The sample size is 100 for all simulations; sens is the sensitivity, spec is the specificity and prop is the observed proportion of times the 95% confidence interval captured θ.
5.5 Sample size for estimating the prevalence
In designing a study to estimate the prevalence of a disease in a given population, it is always important to consider how large the sample size should be. We would therefore like to know how large a sample we need for the AMLE to be within a distance d of the true prevalence θ with high probability. From the preceding section, the AMLE is within a distance d of θ if X/n is within a distance l = d(s + c − 1) of p. Therefore, we are looking for the sample size n such that
$$P\left(|X/n - p| \le d(s + c - 1)\right) \ge 1 - \alpha.$$
Again, the normal approximation to the binomial distribution can be used, leading to the usual binomial sample size formula
$$d(s + c - 1) = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},$$
so that
$$n = \frac{z_{\alpha/2}^2\, p(1-p)}{d^2 (s + c - 1)^2}. \qquad (5.10)$$
We see from this formula that the sum of the sensitivity and the specificity has a very large influence on sample size requirements. The closer this sum is to 1, the weaker the test, and the larger the sample size. In the extreme case where s + c = 1, no sample size is sufficient, since the test is completely uninformative. On the other hand, the minimum sample size occurs when we have a perfect gold standard test (s = c = 1).
Table 5.7 provides some illustrative examples. In that table, ssize is the conservative sample size, obtained by replacing p in equation 5.10 by 1/2. ssize.small is the sample size obtained by replacing p in equation 5.10 with 1 − c, the value of p when θ = 0. Since p(1 − p) ≤ 1/4, ssize.small never exceeds ssize. The variable ssize.small estimates the sample size needed when estimating the prevalence of a rare disease: from the formula θ = (p − (1 − c))/(s + c − 1), when θ is small, p is close to 1 − c. When we anticipate a low prevalence, smaller values of d are usually needed for the study to be informative, leading to larger sample sizes.
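Equation 5.10 translates directly into code. Below is a Python sketch with a hypothetical function name. It rounds n up to the next integer, a common convention rather than one stated in the text.

```python
import math
from statistics import NormalDist


def sample_size(d, s, c, p, alpha=0.05):
    """Equation 5.10, n = z^2 p (1 - p) / (d (s + c - 1))^2, rounded up."""
    k = s + c - 1
    if k <= 0:
        # s + c = 1: the test is uninformative and no n is sufficient
        raise ValueError("s + c must exceed 1")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z**2 * p * (1 - p) / (d * k) ** 2)


d, s, c = 0.05, 0.9, 0.8
print(sample_size(d, s, c, p=0.5), sample_size(d, s, c, p=1 - c))  # prints 784 502
```

The conservative value, 784, agrees with the value from table 5.7 quoted in section 6.2.5. For p = 1 − c, rounding up gives 502, whereas section 6.2.5 quotes 501.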
[Table 5.7: columns d, sens, spec, ssize and ssize.small; numerical entries not recoverable]
Table 5.7: Sample size for estimating θ to accuracy ±d, when α = 0.05. Here sens is the sensitivity, spec is the specificity, ssize is the conservative sample size obtained by using the normal approximation with p = 1/2, and ssize.small is the sample size obtained by using the normal approximation with p = 1 − c.
Chapter 6
Bayesian estimation of disease prevalence and sample size in the absence of a gold standard
6.1 Introduction
In chapter 5 we studied the estimation of the prevalence of a disease based on the results of diagnostic tests with known sensitivity and specificity. Such situations are very rare, since most often we only have estimates of the sensitivity and specificity of a test, not exact values. Performing analyses that treat these estimates as exact values, without accounting for the real uncertainty, may produce very misleading results. Standard sample size formulae for a binomial parameter have conservative values. Here, by contrast, we will show that there are no "conservative" values for S and C that can be used to produce conservative sample size estimates, and that all uncertainty must be accounted for. In this chapter we will use a Bayesian approach to estimate the prevalence of a disease in the absence of a gold standard test, when the sensitivity and specificity of the test are not known exactly. Based on this approach, we will then calculate the smallest sample size for which a 1 − α credible interval for the prevalence has total width l, using an average coverage criterion. We will see that even small uncertainties about the sensitivity and the specificity of a diagnostic test may lead to a large increase in the sample size needed to reach the desired accuracy. In many cases this accuracy cannot be reached even with an infinite sample size.
In this chapter we again assume that S + C ≥ 1.
6.2 The case when the sensitivity and the specificity of the diagnostic test are exactly known
We later examine how much the sample size is affected by accounting for the uncertainty in the estimates of the sensitivity and the specificity of the diagnostic test. As a baseline, first suppose that the sensitivity and specificity are known, and equal to s and c respectively. Whereas in the previous chapter we considered a classical approach to this problem, we now take a Bayesian approach. As before, let θ denote the prevalence of the disease and p the probability of a positive test result. Let s and c be the known values of the sensitivity and specificity, respectively.
6.2.1 Prior density for p
Suppose that the prior density function of θ is f(θ), where θ takes values in the interval [a, b] with 0 ≤ a ≤ b ≤ 1. We have seen in chapter 5 that
$$p = (s + c - 1)\theta + 1 - c, \qquad (6.1)$$
so that p is a linear transformation of θ. The Jacobian of this transformation is
$$J = \frac{d\theta}{dp} = \frac{1}{s + c - 1},$$
where dθ/dp denotes the derivative of θ with respect to p. Therefore, the prior density function of p is
$$f_p(p) = \begin{cases} f_\theta\!\left(\dfrac{p - (1 - c)}{s + c - 1}\right)\dfrac{1}{s + c - 1}, & p_1 \le p \le p_2, \\ 0, & \text{otherwise}, \end{cases}$$
where p1 = (s + c − 1)a + 1 − c, and p2 = (s + c − 1)b + 1 − c.
Example 6.2.1 Suppose that the prior distribution of θ is uniform on the interval [a, b]. Then the prior distribution of p is also uniform: it is easy to see that p ~ U[p1, p2], where p1 and p2 are given above.
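Example 6.2.2 Suppose the prior distribution of θ is Beta(α, β), so that [a, b] = [0, 1]. Then
$$f_\theta(\theta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\theta^{\alpha - 1}(1 - \theta)^{\beta - 1}, \qquad 0 \le \theta \le 1,$$
where Γ(t) is the gamma function, and α and β are positive real numbers. Therefore,
$$f_p(p) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\frac{(p - (1 - c))^{\alpha - 1}(s - p)^{\beta - 1}}{(s + c - 1)^{\alpha + \beta - 1}}, \qquad 1 - c \le p \le s. \qquad (6.2)$$
Equation 6.2 is the density function of a Beta(α, β) distribution rescaled from [0, 1] to the interval [1 − c, s].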
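The change of variables behind equation 6.2 can be checked numerically. The Python sketch below builds f_p from f_θ via the Jacobian 1/(s + c − 1). It then confirms that f_p integrates to 1 over [1 − c, s] and has mean (s + c − 1)E(θ) + 1 − c. The function names are hypothetical, the Beta(2, 5) prior is an arbitrary illustrative choice, and the integrals use a simple midpoint rule.

```python
import math


def beta_pdf(t, a, b):
    """Beta(a, b) density on (0, 1), written with the gamma function."""
    if not 0.0 < t < 1.0:
        return 0.0
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * t ** (a - 1) * (1 - t) ** (b - 1)


def prior_p(p, f_theta, s, c):
    """Prior density of p = (s + c - 1) theta + 1 - c: f_theta at the inverse
    transformation, times the Jacobian 1/(s + c - 1)."""
    k = s + c - 1
    return f_theta((p - (1 - c)) / k) / k


def integrate(f, lo, hi, m=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / m
    return h * sum(f(lo + (i + 0.5) * h) for i in range(m))


s, c = 0.9, 0.8


def f_theta(t):
    return beta_pdf(t, 2.0, 5.0)  # illustrative Beta(2, 5) prior for theta


total = integrate(lambda p: prior_p(p, f_theta, s, c), 1 - c, s)
mean_p = integrate(lambda p: p * prior_p(p, f_theta, s, c), 1 - c, s)
print(round(total, 4), round(mean_p, 4))  # prints 1.0 0.4
```

The mean matches 0.7 × (2/7) + 0.2 = 0.4, as equation 6.1 predicts for E(θ) = 2/7.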
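6.2.2 Posterior density of p
Likelihood
If the diagnostic test is given to a sample of n subjects, x of whom test positive for the disease, the likelihood function is binomial, and is given by
$$l(x \mid p) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, 1, \ldots, n.$$
Marginal probability function of X
The marginal probability function of the data x is then
$$m(x) = \begin{cases} \displaystyle\int_{p_1}^{p_2} \binom{n}{x} p^x (1 - p)^{n - x}\, f_\theta\!\left(\frac{p - (1 - c)}{s + c - 1}\right)\frac{dp}{s + c - 1}, & 0 \le x \le n, \\ 0, & \text{otherwise}. \end{cases}$$
Posterior density of p
By Bayes' theorem, the posterior density function of p is
$$f(p \mid x) = \begin{cases} \dfrac{l(x \mid p)\, f_p(p)}{m(x)}, & p_1 \le p \le p_2, \\ 0, & \text{otherwise}, \end{cases}$$
that is,
$$f(p \mid x) = \begin{cases} \dfrac{\binom{n}{x} p^x (1 - p)^{n - x}\, f_\theta\!\left(\frac{p - (1 - c)}{s + c - 1}\right)}{(s + c - 1)\, m(x)}, & p_1 \le p \le p_2, \\ 0, & \text{otherwise}. \end{cases}$$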
Posterior mean of p
The posterior mean of p given data x, denoted here by p̂, is
$$\hat{p} = \int_{p_1}^{p_2} p\, f(p \mid x)\, dp.$$
Posterior mean of θ
Let θ̂ denote the posterior mean of θ. From equation 6.1 we have
$$E(p \mid x) = (s + c - 1)E(\theta \mid x) + 1 - c,$$
where E(· | x) denotes the conditional expectation given X = x. Therefore we have
$$\hat{\theta} = \frac{\hat{p} - (1 - c)}{s + c - 1}. \qquad (6.3)$$
6.2.3 Sample size determination via the average coverage criterion
Suppose one is looking for the sample size n such that the posterior credible set [θ̂ − d, θ̂ + d] has posterior coverage probability of at least 1 − α for a predetermined d and α. That is, suppose we seek n such that
Let cov(x; d) denote the posterior coverage probability of the interval [θ̂ − d, θ̂ + d] for given x. Then
$$\mathrm{cov}(x; d) = P\left(\hat{\theta} - d \le \theta \le \hat{\theta} + d \mid X = x\right).$$
Let cov(d) denote the average coverage probability of the intervals [θ̂ − d, θ̂ + d] over the marginal probability of x, that is
$$\mathrm{cov}(d) = \sum_{x=0}^{n} \mathrm{cov}(x; d)\, m(x). \qquad (6.5)$$
Recall that m(x) is the distribution of x induced by the prior density f(θ). Thus, cov(d) is the average coverage probability of the posterior credible intervals of length 2d centered at θ̂, where the average is weighted by m(x). The average coverage criterion (see chapter 2) states that this weighted coverage must be at least 1 − α.
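By the linear transformation 6.1 and by equation 6.3, cov(x; d) can be written as
$$\mathrm{cov}(x; d) = P\left(\frac{\hat{p} - (1 - c)}{s + c - 1} - d \le \frac{p - (1 - c)}{s + c - 1} \le \frac{\hat{p} - (1 - c)}{s + c - 1} + d \,\middle|\, X = x\right).$$
Multiplying by s + c − 1 and then adding 1 − c,
$$\mathrm{cov}(x; d) = P\left(\hat{p} - d(s + c - 1) \le p \le \hat{p} + d(s + c - 1) \mid X = x\right).$$
Therefore
$$\mathrm{cov}(x; d) = \int_{\hat{p} - d(s + c - 1)}^{\hat{p} + d(s + c - 1)} f(p \mid x)\, dp.$$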
Computation of the sample size
The posterior density function of p, and thus the sample size required to satisfy the average coverage criterion defined in chapter 2, cannot be expressed in closed form. However, we can compute the posterior mean of θ and the sample size needed to satisfy the average coverage criterion using the following algorithm:
Average coverage criterion algorithm:
1. Determine the prior density function of θ, f_θ(θ). Of course, this distribution should be based on the available prior information.
2. Choose values for n, d, and α.
3. Write the likelihood of the data given p.
4. Calculate f_p(p), the prior density function of p.
5. For each value of x, 0 ≤ x ≤ n, perform the following steps:
(a) Calculate f(p, x) = l(x|p) f_p(p), the joint density of p and x,
(b) Calculate m(x), the marginal probability function of x,
(c) Calculate f(p|x), the posterior density function of p given x,
(d) Calculate p̂, the posterior mean of p,
(e) The posterior mean of θ is then θ̂ = (p̂ − (1 − c))/(s + c − 1),
(f) Calculate cov(x; d), the coverage probability of the interval [θ̂ − d, θ̂ + d],
(g) Calculate the posterior average coverage probability over all values of x,
(h) Find the smallest sample size n that gives an average coverage probability of at least 1 − α.
An S-plus program that calculates the sample size needed to satisfy the average coverage criterion, given d, α, and prior information on θ, is provided in Appendix B.
Note 6.2.1 The integrals in steps (d) and (f) cannot always be calculated exactly. However, numerical integration (included as a function in S-plus) can provide accurate approximations to these integrals.
Note 6.2.2 The sample size in step (h) can be found through a bisection search. This consists of evaluating the criterion for a starting value of n, then choosing the next value of n depending on the resulting coverage. The criterion is then evaluated at the new value of n, and the procedure is repeated until the criterion is satisfied for n but not for n − 1.
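Below is a Python sketch of the average coverage algorithm for a uniform prior θ ~ U[a, b], so that p ~ U[p1, p2] (example 6.2.1). It is not the Appendix B program. It replaces the numerical integration of Note 6.2.1 with a midpoint grid on [p1, p2], and it implements the bisection search of Note 6.2.2. The function names and grid size are illustrative, and the search assumes the average coverage is nondecreasing in n over the search range.

```python
import math


def average_coverage(n, d, s, c, a=0.0, b=1.0, grid=1000):
    """cov(d) = sum_x cov(x; d) m(x) for known s, c and theta ~ U[a, b]."""
    k = s + c - 1
    p1, p2 = k * a + 1 - c, k * b + 1 - c
    # Midpoint grid on [p1, p2]; under the uniform prior each point has weight 1/grid.
    ps = [p1 + (i + 0.5) * (p2 - p1) / grid for i in range(grid)]
    logp = [math.log(p) for p in ps]
    logq = [math.log1p(-p) for p in ps]
    half = d * k  # [theta_hat - d, theta_hat + d] expressed on the p scale
    total = 0.0
    for x in range(n + 1):
        logc = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
        lik = [math.exp(logc + x * lp + (n - x) * lq) for lp, lq in zip(logp, logq)]
        norm = sum(lik)
        if norm == 0.0:  # m(x) underflows to zero: negligible contribution
            continue
        m_x = norm / grid                                   # step (b): m(x)
        p_hat = sum(p * w for p, w in zip(ps, lik)) / norm  # step (d): posterior mean of p
        inside = sum(w for p, w in zip(ps, lik) if abs(p - p_hat) <= half)
        total += m_x * inside / norm                        # steps (f) and (g)
    return total


def smallest_n(d, s, c, alpha=0.05, a=0.0, b=1.0, lo=1, hi=8192, grid=1000):
    """Step (h) by bisection: smallest n in [lo, hi] with cov(d) >= 1 - alpha."""
    while lo < hi:
        mid = (lo + hi) // 2
        if average_coverage(mid, d, s, c, a, b, grid) >= 1 - alpha:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

The values returned are grid approximations and need not reproduce table 6.1 exactly. A finer grid reduces the integration error at the cost of run time.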
Table 6.1 provides some representative examples of sample sizes calculated using the above algorithm. The first three examples represent a typical situation where the sensitivity of the diagnostic test is 0.9 and the specificity is 0.8. For this situation, we calculated the sample size needed to have an average coverage of at least 0.95 when d was equal to 0.05, 0.03 and 0.02 respectively. In the last three examples we changed the values of the sensitivity and the specificity of the diagnostic test and the prior distribution of the prevalence of the disease. This investigates the effect these changes have on the sample size needed to maintain an average coverage of at least 0.95.
Table 6.1: Sample size required to have an average coverage probability of at least 0.95 when θ ~ U[a, b]. The parameter d is half the length of the posterior credible interval, ssize is the sample size needed under the given conditions and ssize.perfect is the sample size needed when sens = 1 and spec = 1.

Example   sens   spec   a   b     d      ssize   ssize.perfect
1         0.9    0.8    0   1     0.05   647     275
2         0.9    0.8    0   1     0.03   1809    767
3         0.9    0.8    0   1     0.02   4082    1729
4         0.75   0.75   0   1     0.03   3867    767
5         0.9    0.8    0   0.1   0.02   2887    359
6         0.9    0.9    0   0.1   0.02   1473    359

The first three examples show the degree to which the sample size increases as d decreases. Example 4 shows that the sample size more than doubles when the sensitivity and the specificity are decreased compared to example 2. Comparing example 5 to example 3 shows that changing the prior information about θ can also have a large effect on the sample size. Example 6 compared to example 5 demonstrates that increasing the specificity can substantially lower the required sample size. The last column displays the sample size required to satisfy the given conditions when the diagnostic test is a gold standard, that is, when sens = 1 and spec = 1. Comparing this column to the previous one shows that the sample size required to reach the given accuracy increases substantially when an imperfect test is used instead of a gold standard.
6.2.5 Comparing the sample sizes from the Bayesian approach to those based on the AMLE
Chapter 5 discussed the adjusted maximum likelihood estimator (AMLE) for the prevalence of a disease. In particular, we considered the sample size needed for the AMLE to fall within a distance d of the true θ with probability at least 1 − α, when the sensitivity and the specificity of the diagnostic test were exactly known. For example, take d = 0.05, α = 0.05, sensitivity 0.9 and specificity 0.8. The conservative sample size, obtained by substituting 0.5 for p in the equation
$$n = \frac{z_{\alpha/2}^2\, p(1-p)}{d^2 (s + c - 1)^2}, \qquad (6.6)$$
was 784 (table 5.7). Replacing p in equation 6.6 by its lower bound, 1 − c = 0.2, gave a smaller sample size, 501, appropriate when the prevalence is close to 0. From table 6.1, we note that the Bayesian average coverage sample size based on a uniform prior density for θ, 647, lies between these two values. We ran several simulations with a sample size of 501 to compare the AMLE to the posterior mean with a uniform prior. Since the sample size calculations from both chapter 5 and this chapter are based on interval estimation, larger samples will produce estimates that are at least as accurate. The results are given in table 6.2. For each value of θ, we chose two values for x from the simulations: the first value of x obtained in the simulations such that x/n ≤ 1 − c, and the first value of x such that x/n > 1 − c.
Note that in these simulations the value of θ is needed to generate x but is not relevant to the calculations of the AMLE or the posterior mean.

Table 6.2: Comparing the AMLE to the posterior mean. The sensitivity was taken to be 0.9, and the specificity was 0.8. For the calculations of the posterior mean, the prior distribution for θ was U[0, 1]. Here pmean is the posterior mean of θ.

Example   θ      x     pmean   AMLE
1         0.02   92    0.014   0.027
2         0.02   113   0.041   0.036
3         0.03   99    0.02    0.027

The examples in table 6.2 show that the posterior mean tends to be smaller than the AMLE when x/n ≤ 1 − c and larger when x/n > 1 − c. Of course, many more simulations would have to be carried out in order to verify this observation. Nevertheless, these examples suggest that neither method is clearly superior for estimating the prevalence when the sensitivity and the specificity of the diagnostic test are known. The Bayesian method, however, has the advantage of using the prior information available on the prevalence, and this can have a great effect on the sample size. For example, we calculated the sample size needed for the AMLE to fall within a distance d of the true value of the prevalence. We compared it to the sample size needed for the average posterior coverage of the interval [θ̂ − d, θ̂ + d] to be at least 1 − α, where θ̂ is the posterior mean of θ. We chose d = 0.02, α = 0.05, s = 0.9 and c = 0.8. We assumed that the only prior information about θ was an upper bound b. Therefore, for the Bayesian approach, the prior density was θ ~ U[0, b]. For the AMLE, since θ is bounded above by b, p is bounded above by b(s + c − 1) + 1 − c = 0.7b + 0.2. The results are given in table 6.3.
[Table 6.3: columns b, ssAMLE and ssbayes; numerical entries not recoverable]
Table 6.3: Comparing the sample size, ssAMLE, needed for the AMLE to fall within a distance 0.02 of the true value of the prevalence with probability at least 0.95, to the sample size, ssbayes, needed to have a posterior average coverage of at least 0.95 for an interval of length 0.04 around the posterior mean of θ. The sensitivity is 0.9, the specificity is 0.8, and the prior distribution of θ is U[0, b].
From table 6.3 we see that in the Bayesian approach, the sample size decreases substantially when the prior information increases, as summarized by a uniform prior distribution over the range of θ. The sample size based on the AMLE also decreases, but at a lesser rate.
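The exact rule behind the ssAMLE column is not restated here. One natural reading, which is an assumption of this sketch, applies equation 5.10 with the value of p that maximizes p(1 − p) over the restricted range [1 − c, b(s + c − 1) + 1 − c]. That value is the feasible point closest to 1/2. Below is a Python sketch under that assumption, with a hypothetical function name.

```python
import math
from statistics import NormalDist


def ss_amle_bounded(b, d=0.02, s=0.9, c=0.8, alpha=0.05):
    """Equation 5.10 with p restricted by theta <= b (assumed rule): use the p
    in [1 - c, b(s + c - 1) + 1 - c] that maximizes p(1 - p)."""
    k = s + c - 1
    p_max = b * k + 1 - c               # 0.7b + 0.2 for s = 0.9, c = 0.8
    p = min(max(0.5, 1 - c), p_max)     # feasible value closest to 1/2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z**2 * p * (1 - p) / (d * k) ** 2)


for b in (1.0, 0.5, 0.1):
    print(b, ss_amle_bounded(b))
```

Under this rule the sample size changes only once 0.7b + 0.2 falls below 1/2, that is, for b < 3/7.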
The Bayesian approach has another very important advantage over the frequentist
approach, in that it can be extended to include the uncertainty around the sensitivity
and the specificity of the diagnostic test. We shall take advantage of this flexibility
in the remaining sections of this chapter.
6.3 The case when the specificity but not the sensitivity of the diagnostic test is exactly known
6.3.1 Introduction
Some diagnostic tests are considered to have high specificity, while other diagnostic tests are considered to have high sensitivity. For example, some blood screening tests that are used to detect antibodies to the human immunodeficiency virus have very high sensitivity (Gastwirth [1991]). Another example occurs in parasitology, where stool examinations for certain parasites are considered to have near perfect specificity. This is because the test consists of looking directly for the parasite under a microscope, and a distinctive looking parasite is not likely to be thought present when it is not. Hence it often occurs that one of the test properties is known exactly or almost exactly, while the other may be less accurately known.
In this section we will consider the problem of estimating the prevalence of a disease in the case where the specificity but not the sensitivity of the diagnostic test is exactly known. Similar results can be derived when the sensitivity but not the specificity is exactly known. We will denote by c the known specificity of the diagnostic test, by S the sensitivity, and by θ the prevalence of the disease. As in the previous section, we will use a Bayesian approach to estimate the prevalence of the disease. We will also use it to calculate the sample size needed to satisfy criterion 2.8, now in the presence of unknown S.
Two methods for calculating the posterior mean of θ and the posterior average coverage probability will be used and compared. The first is an exact method. It calculates the posterior coverage probability of an interval exactly or almost exactly by integrating the posterior density function over that interval. As we shall see, this method is feasible in the case where the prior densities of the prevalence and the sensitivity are independent uniforms. The second method uses a sampling importance resampling (SIR) algorithm to draw a sample from the posterior distribution of θ. Since this method uses random samples to make inference about a parameter, unlike the first method, it only provides an approximate solution. However, it has the advantage of generalizability, in that it can be used for any prior density function. The error generated by using the SIR method instead of exact calculations can be reduced by increasing the size of the SIR sample, although this comes at the cost of increased computation time.
6.3.2 Prior density for p
Suppose that S and θ are a priori independent random variables. Let f_θ(θ) denote the prior density function of θ, and f_S(s) the prior density function of S. As before, when there is no chance for confusion we will omit the subscript. Suppose that θ takes values on the interval [a, b], where 0 ≤ a ≤ b ≤ 1, and that S takes values on the interval [a1, b1], where 0 ≤ a1 ≤ b1 ≤ 1. From the prior independence of θ and S, the prior joint density function of θ and S, denoted here by f(θ, s), is
$$f(\theta, s) = f_\theta(\theta)\, f_S(s), \qquad a \le \theta \le b,\ a_1 \le s \le b_1.$$
As before, let p denote the probability that an individual has a positive test result. Consider the transformation of the parameter vector (θ, S) to the parameter vector (p, S), recalling that
$$p = \theta S + (1 - \theta)(1 - c) = \theta(S + c - 1) + 1 - c.$$
The Jacobian of this transformation is 1/(S + c − 1), so that the joint prior density function of p and S, denoted here by f(p, s), is
$$f(p, s) = \begin{cases} f_\theta\!\left(\dfrac{p - (1 - c)}{s + c - 1}\right)\dfrac{f_S(s)}{s + c - 1}, & \text{if } as + (1 - a)(1 - c) \le p \le bs + (1 - b)(1 - c) \text{ and } a_1 \le s \le b_1, \\ 0, & \text{otherwise}. \end{cases}$$
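Example 6.3.1 Suppose that θ ~ U[a, b], where 0 ≤ a ≤ b ≤ 1, and that S ~ U[a1, b1], where 0 ≤ a1 ≤ b1 ≤ 1. Then
$$f(p, s) = \begin{cases} \dfrac{1}{(b - a)(b_1 - a_1)(s + c - 1)}, & \text{if } as + (1 - a)(1 - c) \le p \le bs + (1 - b)(1 - c) \text{ and } a_1 \le s \le b_1, \\ 0, & \text{otherwise}. \end{cases}$$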
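6.3.3 Posterior density of p
Likelihood
The likelihood function of the data, denoted here by l(x|θ, s), has a binomial form, and is given by
$$l(x \mid \theta, s) = \binom{n}{x}\left(\theta s + (1 - \theta)(1 - c)\right)^x \left(1 - \theta s - (1 - \theta)(1 - c)\right)^{n - x}.$$
Using the transformation 6.1, this likelihood can be written in terms of (p, s). Note that l(x|p, s) depends on s only through p, so we can delete the s and write l(x|p) instead.
Define f(θ, s, x) by
$$f(\theta, s, x) = \begin{cases} f(\theta, s)\, l(x \mid \theta, s), & \text{if } a \le \theta \le b \text{ and } a_1 \le s \le b_1, \\ 0, & \text{otherwise}, \end{cases}$$
and define f(p, s, x) by
$$f(p, s, x) = \begin{cases} f(p, s)\, l(x \mid p, s), & \text{if } as + (1 - a)(1 - c) \le p \le bs + (1 - b)(1 - c) \text{ and } a_1 \le s \le b_1, \\ 0, & \text{otherwise}. \end{cases}$$
Example 6.3.2 If θ ~ U[a, b] and S ~ U[a1, b1] as in example 6.3.1, then
$$f(p, s, x) = \begin{cases} \dfrac{\binom{n}{x} p^x (1 - p)^{n - x}}{(b - a)(b_1 - a_1)(s + c - 1)}, & \text{if } as + (1 - a)(1 - c) \le p \le bs + (1 - b)(1 - c) \text{ and } a_1 \le s \le b_1, \\ 0, & \text{otherwise}. \end{cases}$$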
Marginal probability function of X
The marginal probability function of X, denoted here by m(x), is obtained by integrating f(p, s, x) with respect to p and s,
$$m(x) = \int_{a_1}^{b_1} \int_{as + (1 - c)(1 - a)}^{bs + (1 - c)(1 - b)} f(p, s, x)\, dp\, ds.$$
Example 6.3.3 If θ ~ U[a, b], where 0 ≤ a ≤ b ≤ 1, and S ~ U[a1, b1], where 0 ≤ a1 ≤ b1 ≤ 1, then
$$m(x) = \int_{a_1}^{b_1} \int_{as + (1 - c)(1 - a)}^{bs + (1 - c)(1 - b)} \frac{\binom{n}{x} p^x (1 - p)^{n - x}}{(b - a)(b_1 - a_1)(s + c - 1)}\, dp\, ds.$$
Posterior joint density of θ and S
Using Bayes' theorem, the posterior joint density function of θ and S given X = x, denoted here by f(θ, s|x), is
$$f(\theta, s \mid x) = \frac{f(\theta, s, x)}{m(x)}.$$
Posterior joint density of p and S
Using Bayes' theorem, the posterior joint density function of p and S given X = x, denoted here by f(p, s|x), is
$$f(p, s \mid x) = \begin{cases} \dfrac{f(p, s, x)}{m(x)}, & \text{if } as + (1 - a)(1 - c) \le p \le bs + (1 - b)(1 - c) \text{ and } a_1 \le s \le b_1, \\ 0, & \text{otherwise}. \end{cases}$$
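6.3.4 Posterior mean of θ
Using transformation 6.7, we have
$$\theta = \frac{p - (1 - c)}{S + c - 1}.$$
The posterior mean of θ, denoted here by θ̂, is
$$\hat{\theta} = \int_{a_1}^{b_1} \int_{as + (1 - a)(1 - c)}^{bs + (1 - b)(1 - c)} \frac{p - (1 - c)}{s + c - 1}\, f(p, s \mid x)\, dp\, ds.$$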
6.3.5 Sample size
Posterior coverage probability
The posterior coverage probability of the interval [θ̂ − d, θ̂ + d] is
Changing the order of integration,
By the transformation 6.7,
where l1 = (θ̂ − d)(s + c − 1) + 1 − c, and l2 = (θ̂ + d)(s + c − 1) + 1 − c.
Sample size criterion
Suppose we are looking for the sample size n that satisfies the average coverage criterion 2.8 for a given d and α. Therefore, we seek the sample size n such that
$$\sum_{x=0}^{n} \mathrm{cov}(x; d)\, m(x) \ge 1 - \alpha.$$
Given d and α, to find n we have to compute the posterior mean θ̂, the marginal probability m(x) of x, and the coverage probability cov(x; d) for each 0 ≤ x ≤ n.
Exact computations
In this section we suppose that θ ~ U[a, b], where 0 ≤ a ≤ b ≤ 1, and that S ~ U[a1, b1], where 0 ≤ a1 ≤ b1 ≤ 1. We will develop an exact algorithm to compute θ̂, m(x) and cov(x; d) under these conditions. Recall from examples 6.3.1, 6.3.2 and 6.3.3 that we have
and
Therefore,
$$\hat{\theta} = \int_{a_1}^{b_1} \int_{as + (1 - c)(1 - a)}^{bs + (1 - c)(1 - b)} \frac{\binom{n}{x}(p - (1 - c))\, p^x (1 - p)^{n - x}}{(b - a)(b_1 - a_1)(s + c - 1)^2\, m(x)}\, dp\, ds.$$
For ease of computation, we change the order of integration such that s is integrated first, and then p. Exact algorithm 1 is used to integrate with respect to s. A computer program, written in Maple and given in Appendix C, is used to carry out the computations. The resulting functions are then integrated with respect to p. Exact algorithm 2 is then used to compute the average coverage probability, using a program written in S-plus, given in Appendix C. Before describing the algorithms in detail, we first provide the regions of integration.
If ba1 + (1 − c)(1 − b) ≥ ab1 + (1 − c)(1 − a), let R11 be the region bounded by the lines
s = a1, p = as + (1 − c)(1 − a) and p = ab1 + (1 − c)(1 − a),
let R12 be the region bounded by the lines
s = a1, s = b1, p = ab1 + (1 − c)(1 − a) and p = ba1 + (1 − c)(1 − b),
and let R13 be the region bounded by the lines
p = bs + (1 − c)(1 − b), p = ba1 + (1 − c)(1 − b) and s = b1.
Next, if ba1 + (1 − c)(1 − b) < ab1 + (1 − c)(1 − a), let R21 be the region bounded by the lines
s = a1, p = as + (1 − c)(1 − a) and p = ba1 + (1 − c)(1 − b),
let R22 be the region bounded by the lines
p = as + (1 − c)(1 − a), p = ab1 + (1 − c)(1 − a), p = bs + (1 − c)(1 − b) and p = ba1 + (1 − c)(1 − b),
and let R23 be the region bounded by the lines
p = bs + (1 − c)(1 − b), p = ab1 + (1 − c)(1 − a) and s = b1.
Note that if a = 0 then we only have two regions of integration, R1 and R2. R1 is the region bounded by the lines
p = 1 − c, s = a1, s = b1 and p = ba1 + (1 − c)(1 − b),
and R2 is the region bounded by the lines
p = ba1 + (1 − c)(1 − b), s = b1 and p = bs + (1 − c)(1 − b).
Figure 6.1 contains plots of all of the above regions.
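Exact algorithm 1:
1. If a > 0, set s1 = (p − (1 − a)(1 − c))/a, and set s2 = (p − (1 − b)(1 − c))/b.
2. Compute g1, the integral of 1/(s + c − 1) over s when (s, p) ∈ R21, $g_1 = \int_{a_1}^{s_1} \frac{ds}{s + c - 1}$.
3. Compute g2, the integral of 1/(s + c − 1) over s when (s, p) ∈ R22, $g_2 = \int_{s_2}^{s_1} \frac{ds}{s + c - 1}$.
4. Compute g3, the integral of 1/(s + c − 1) over s when (s, p) ∈ R23, $g_3 = \int_{s_2}^{b_1} \frac{ds}{s + c - 1}$.
5. Compute g4, the integral of 1/(s + c − 1) over s when (s, p) ∈ R12, $g_4 = \int_{a_1}^{b_1} \frac{ds}{s + c - 1}$.
6. Compute gp1, the integral of 1/(s + c − 1)² over s when (s, p) ∈ R21, $g_{p1} = \int_{a_1}^{s_1} \frac{ds}{(s + c - 1)^2}$.
7. Compute gp2, the integral of 1/(s + c − 1)² over s when (s, p) ∈ R22, $g_{p2} = \int_{s_2}^{s_1} \frac{ds}{(s + c - 1)^2}$.
8. Compute gp3, the integral of 1/(s + c − 1)² over s when (s, p) ∈ R23, $g_{p3} = \int_{s_2}^{b_1} \frac{ds}{(s + c - 1)^2}$.
9. Compute gp4, the integral of 1/(s + c − 1)² over s when (s, p) ∈ R12, $g_{p4} = \int_{a_1}^{b_1} \frac{ds}{(s + c - 1)^2}$.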
Note 6.3.1 We do not need to integrate over the regions R11 and R13, because these integrals have the same values as the integrals over the regions R21 and R23. Also, in the above algorithm, if a = 0, we only need to find g3, g4, gp3 and gp4. These functions will also be used for the computations of cov(x; d) in algorithm 2. We therefore computed the functions g1, g2, g3, g4, gp1, gp2, gp3 and gp4 in terms of a, b, a1, b1 and c, using a Maple program, for later use.
Note 6.3.2 The functions defined in exact algorithm 1 are then multiplied by the likelihood function of the data, and the result is integrated with respect to p over the corresponding regions. The sum of the integrals of the products of g1, g2, g3 and g4 with the likelihood, divided by (b − a)(b1 − a1), is the marginal probability of x. The sum of the integrals of the products of gp1, gp2, gp3 and gp4 with (p − (1 − c)) times the likelihood, divided by (b − a)(b1 − a1)m(x), is the posterior mean of θ. These computations are used in the following algorithm.
Exact algorithm 2:
1. Choose values for the prior parameters a, b, a1, b1 and c, based on the available prior information.
2. Compute m(x), the marginal probability function of x, by
where
$$i = \begin{cases} 1, & \text{if } ba_1 \le ab_1 + (1 - c)(b - a), \\ 2, & \text{if } ba_1 > ab_1 + (1 - c)(b - a). \end{cases}$$
3. Compute θ̂, the posterior mean of θ, by
4. Let a.new = θ̂ − d, and b.new = θ̂ + d. Define R1.new, R2.new, R3.new in the same way as Ri1, Ri2, Ri3 were defined previously, but with a replaced by a.new and b replaced by b.new.
5. Compute cov(x; d), the posterior coverage probability of the interval [θ̂ − d, θ̂ + d] for a given outcome x, by
6. Compute the posterior average coverage probability over the marginal probability of all possible values of x by
An S-plus program that can be used to calculate the posterior average coverage
probability is given in Appendix C.
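As an independent cross-check on the exact algorithms, the average coverage can also be approximated by brute-force quadrature. The sketch below does not reproduce the region-by-region method of exact algorithms 1 and 2; it assumes uniform priors $\theta \sim U[a, b]$ and $S \sim U[a_1, b_1]$, a known specificity $c$ with $0 < p < 1$ throughout the grid, and a midpoint rule on an $m \times m$ grid over $(\theta, s)$. The function name `average_coverage_grid` and the grid size `m` are our own illustrative choices.

```python
import numpy as np
from math import lgamma

def average_coverage_grid(n, d, a, b, a1, b1, c, m=400):
    """Average coverage of [theta_hat - d, theta_hat + d] over x = 0..n, by
    midpoint-rule quadrature on an m x m grid, assuming theta ~ U[a, b],
    S ~ U[a1, b1] and a known specificity c."""
    theta = a + (np.arange(m) + 0.5) * (b - a) / m     # cell midpoints in theta
    s = a1 + (np.arange(m) + 0.5) * (b1 - a1) / m      # cell midpoints in s
    T, S = np.meshgrid(theta, s, indexing="ij")
    p = T * S + (1 - T) * (1 - c)                      # P(positive test); must lie in (0, 1)
    log_p, log_q = np.log(p), np.log1p(-p)
    avg = 0.0
    for x in range(n + 1):
        # Binomial likelihood on the log scale to avoid overflow of C(n, x)
        log_binom = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
        lik = np.exp(log_binom + x * log_p + (n - x) * log_q)
        # Uniform priors: density times cell area is 1/m^2 for every cell,
        # so the marginal m(x) is the mean of the likelihood over the grid.
        m_x = lik.mean()
        if m_x == 0.0:
            continue                                   # numerically impossible outcome
        post = lik / lik.sum()                         # posterior mass of each cell
        theta_hat = float((post * T).sum())            # posterior mean of theta
        cov_x = post[np.abs(T - theta_hat) <= d].sum() # posterior coverage given x
        avg += cov_x * m_x
    return float(avg)

print(average_coverage_grid(n=100, d=0.05, a=0.0, b=0.5, a1=0.7, b1=0.95, c=0.9))
```

Working on the log scale matters here: for $n$ in the thousands, $\binom{n}{x}$ alone exceeds the range of a double, while the product with $p^x(1-p)^{n-x}$ does not.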
6.3.6 SIR computations
As $n$ increases, it becomes increasingly time consuming to compute the average coverage probability using the exact algorithms, because of the large number of computations involved. A more important limitation of this algorithm is that it covers only the case where the prior distributions of $\theta$ and $S$ are both uniform. Because of the complexity of the algorithm, it is difficult (although it may be possible) to extend this exact method to other prior distributions. We therefore investigated other methods of computation to approximate the average coverage probability. One method that proved feasible is Sampling Importance Resampling (SIR), Rubin [1987]. This method, introduced in chapter 2, is used to obtain approximate random samples from a density function when direct sampling from this density function is difficult. The random samples are then used for inference. For example, sample quantiles may be used to approximate credible intervals, and sample means to approximate posterior means.
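The resampling step at the heart of SIR takes only a few lines. The sketch below uses a toy problem that is not from this thesis: a $U[0, 1]$ prior for a binomial probability with hypothetical data of 12 successes in 50 trials. The exact posterior is then Beta(13, 39), so the SIR posterior mean can be checked against $(x + 1)/(n + 2)$. The helper name `sir_resample` and the sizes $k = r = 20000$ are illustrative choices.

```python
import numpy as np

def sir_resample(draws, weights, r, rng):
    """Resample r values with replacement from `draws`, with probabilities
    proportional to `weights` (the importance weights)."""
    w = np.asarray(weights, dtype=float)
    idx = rng.choice(len(draws), size=r, replace=True, p=w / w.sum())
    return draws[idx]

rng = np.random.default_rng(1)
n, x = 50, 12                                  # hypothetical data: 12 positives out of 50
theta = rng.uniform(0.0, 1.0, size=20000)      # k draws from the U[0, 1] prior
w = theta**x * (1.0 - theta)**(n - x)          # likelihood; the binomial constant cancels
post = sir_resample(theta, w, r=20000, rng=rng)
exact_mean = (x + 1) / (n + 2)                 # mean of the Beta(x + 1, n - x + 1) posterior
print(post.mean(), exact_mean)
```

Because the weights are normalized inside `sir_resample`, any constant factor in the likelihood (here the binomial coefficient, elsewhere the marginal $m(x)$) can be dropped.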
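Simulating samples from the posterior distribution of $\theta$ and $S$
Suppose that $S$ and $\theta$ are a priori independent random variables with prior density functions $f_S(s)$ and $f_\theta(\theta)$, respectively. Recall that the likelihood function of the data is
$$ l(x \mid \theta, s) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad p = \theta s + (1 - \theta)(1 - c), $$
and that $f(\theta, s, x)$ is defined by
$$ f(\theta, s, x) = f_\theta(\theta)\, f_S(s)\, l(x \mid \theta, s). $$
Recall also that the posterior joint density function of $\theta$ and $S$ given $X = x$ is
$$ f(\theta, s \mid x) = \begin{cases} f(\theta, s, x)/m(x), & \text{if } a \le \theta \le b \text{ and } a_1 \le s \le b_1, \\ 0, & \text{otherwise.} \end{cases} $$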
Suppose that one is interested in obtaining samples from the marginal posterior distributions of $\theta$ and $S$. The SIR algorithm can be used in the following way. First obtain a sample $\{\theta_i\}_{1 \le i \le k}$ of size $k$ from the prior distribution $f_\theta(\theta)$ of $\theta$, and a sample $\{s_i\}_{1 \le i \le k}$, also of size $k$, from the prior distribution $f_S(s)$ of $S$. This gives a sample $(\theta_i, s_i)$ from the joint prior density $f(\theta, s)$. The weight function is given by
$$ w(\theta, s) = \frac{f(\theta, s \mid x)}{f(\theta, s)} = \frac{l(x \mid \theta, s)}{m(x)}. $$
Note that we do not need to compute $m(x)$ for the weights, since $w(\theta, s) \propto l(x \mid \theta, s)$.
The SIR method then asserts that one obtains an approximate random sample $\{\theta_i^*\}_{1 \le i \le r}$ from the marginal posterior distribution of $\theta$ by drawing a sample with replacement from $\{\theta_i\}_{1 \le i \le k}$ with unequal probability weights $\{l(x \mid \theta_i, s_i)\}_{1 \le i \le k}$. Similarly, an approximate sample $\{s_i^*\}_{1 \le i \le r}$ from the marginal posterior distribution of $S$ is obtained by drawing a sample with replacement from $\{s_i\}_{1 \le i \le k}$ with the same weights. Of course, if needed, a random sample from the joint posterior density of $(\theta, S)$ can be obtained by similarly resampling $(\theta_i, s_i)$ pairs.
Posterior mean of $\theta$
The posterior mean of $\theta$ is then approximated by the sample mean $\hat\theta = \frac{1}{r} \sum_{i=1}^{r} \theta_i^*$.
The posterior coverage probability of the interval $[\hat\theta - d, \hat\theta + d]$
Given $d > 0$ and $x$, the coverage probability cov(x; d) of the interval $[\hat\theta - d, \hat\theta + d]$ is approximately equal to the proportion of points from the posterior sample of $\theta$ that are contained in the interval $[\hat\theta - d, \hat\theta + d]$.
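The marginal probability function of $x$
The marginal probability function of $x$ can be approximated by $\hat m(x) = \frac{1}{k} \sum_{i=1}^{k} l(x \mid \theta_i, s_i)$.
The posterior average coverage probability of the intervals $[\hat\theta - d, \hat\theta + d]$ over the marginal probabilities of $x$
The posterior average coverage probability $\overline{\mathrm{cov}}(d)$ of the intervals $[\hat\theta - d, \hat\theta + d]$ over the marginal probability of $x$ is then approximated by $\overline{\mathrm{cov}}(d) \approx \sum_{x=0}^{n} \mathrm{cov}(x; d)\, \hat m(x)$.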
We used the following algorithm to carry out the calculations.
SIR algorithm:
1. Take a random sample of size $k$, $\{\theta_i\}_{1 \le i \le k}$, from the prior distribution $f_\theta(\theta)$ of $\theta$.
2. Take a random sample of size $k$, $\{s_i\}_{1 \le i \le k}$, from the prior distribution $f_S(s)$ of $S$.
3. Write the likelihood function of the data given $\theta$ and $S$: $l(x \mid \theta, s) = \binom{n}{x} p^x (1 - p)^{n - x}$, where $p = \theta s + (1 - \theta)(1 - c)$.
4. For each $x$, $x = 0, 1, \ldots, n$, attach the weight $w_i = l(x \mid \theta_i, s_i)$ to each point $(\theta_i, s_i, x)$ of the sample.
5. Resample a sample $\{\theta_i^*\}_{1 \le i \le r}$ of size $r$ with replacement from the original sample $\{\theta_i\}_{1 \le i \le k}$, using probabilities proportional to the weights $\{w_i\}_{1 \le i \le k}$.
6. Resample a sample $\{s_i^*\}_{1 \le i \le r}$ of size $r$ with replacement from the original sample $\{s_i\}_{1 \le i \le k}$, using probabilities proportional to the weights $\{w_i\}_{1 \le i \le k}$.
7. Compute $\hat\theta$, the posterior mean of $\theta$.
8. Approximate the coverage probability of the interval $[\hat\theta - d, \hat\theta + d]$, given $d > 0$, by calculating the proportion of points from the posterior sample of $\theta$ that are in the interval $[\hat\theta - d, \hat\theta + d]$.
9. Compute the marginal probability of $x$.
10. Compute the posterior average coverage probability of the intervals $[\hat\theta - d, \hat\theta + d]$ over the marginal probabilities of $x$.
An S-plus program that performs these calculations is given in Appendix C.
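The ten steps above can be sketched in Python as follows. This is not the S-plus program of Appendix C; it is a minimal sketch assuming uniform priors $\theta \sim U[a, b]$ and $S \sim U[a_1, b_1]$, a known specificity $c$, and the estimate of $m(x)$ as the average likelihood over the prior draws (our assumption). The function name `sir_average_coverage` and the seed are illustrative. Replacing the two `rng.uniform` calls with other samplers gives other priors.

```python
import numpy as np
from math import lgamma

def sir_average_coverage(n, d, a, b, a1, b1, c, k=1000, r=1000, seed=0):
    """SIR approximation of the average coverage of [theta_hat - d, theta_hat + d]
    when theta ~ U[a, b], S ~ U[a1, b1] and the specificity c is known."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(a, b, size=k)          # step 1: prior sample of theta
    s = rng.uniform(a1, b1, size=k)            # step 2: prior sample of S
    p = theta * s + (1 - theta) * (1 - c)      # P(positive test | theta, s)
    log_p, log_q = np.log(p), np.log1p(-p)
    avg = 0.0
    for x in range(n + 1):
        # steps 3-4: binomial likelihood of x at every prior point, used as weights
        log_binom = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
        w = np.exp(log_binom + x * log_p + (n - x) * log_q)
        m_x = w.mean()                         # step 9: average likelihood estimates m(x)
        if m_x == 0.0:
            continue                           # x is (numerically) impossible
        # steps 5-6: resampled indices; theta[idx], s[idx] are joint posterior draws
        idx = rng.choice(k, size=r, replace=True, p=w / w.sum())
        theta_post = theta[idx]
        theta_hat = theta_post.mean()          # step 7: posterior mean of theta
        cov_x = np.mean(np.abs(theta_post - theta_hat) <= d)  # step 8
        avg += cov_x * m_x                     # step 10: weight by m(x)
    return float(avg)

print(sir_average_coverage(n=100, d=0.05, a=0.0, b=0.5, a1=0.7, b1=0.95, c=0.9))
```

Reusing one set of index draws for both $\theta$ and $S$ yields a joint posterior sample; only the $\theta$ margin is needed for the coverage.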
6.3.7 Examples
We computed several examples of posterior average coverage probabilities using the exact computations and compared the results to those obtained from the SIR algorithm. We considered the case where the prior distributions of $\theta$ and $S$ are independent and uniform on the intervals $[a, b]$ and $[a_1, b_1]$, respectively, for various choices of the intervals $[a, b]$ and $[a_1, b_1]$ and different sample sizes $n$. Some of these examples are provided in table 6.4. Throughout this table, $d = 0.05$ and the specificity is 0.9. We first considered the case where the sensitivity takes relatively large values, $[a_1, b_1] = [0.7, 0.95]$. We then slightly increased the information on the prior density of the sensitivity to $[a_1, b_1] = [0.7, 0.9]$, to see the effect this has on the average coverage probability. We also considered the case where more prior information was available on the prevalence, $\theta \sim U[0, 0.2]$, compared with $\theta \sim U[0, 0.5]$. In contrast with $[a_1, b_1] = [0.7, 0.9]$, we considered the case where the sensitivity takes relatively low values, $[a_1, b_1] = [0.3, 0.5]$, while leaving the interval width fixed at $b_1 - a_1 = 0.2$. We also considered the case where there is increased uncertainty around the sensitivity, that is, when the sensitivity takes values in a wider interval $[a_1, b_1] = [0.3, 0.8]$, while $[a, b] = [0, 0.2]$. For each situation, we calculated the exact coverage probability and the approximation from the SIR algorithm for different values of $n$, ranging from $n = 100$ to $n = 1300$. For the SIR algorithm, samples of size $k = 1000$ were taken from the prior distributions of $\theta$ and $S$, and samples also of size $r = 1000$ were taken from the posterior distribution of $\theta$ and $S$ given $x$.
Table 6.4: Comparing the SIR average coverage probability with the exact coverage probability when $S \sim U[a_1, b_1]$, $\theta \sim U[0, b]$, $c = 0.9$ and $d = 0.05$. Here cov.sir is the average coverage probability obtained by using the SIR algorithm, and cov.exact is the exact average coverage probability.
a1     b1     sample size   cov.sir   cov.exact
0.7    0.95   100           0.630     0.632
0.7    0.95   200           0.741     0.739
Table 6.4 shows that adding a small amount of information to the prior distribution of the sensitivity, by decreasing the prior interval of $S$ from $[0.7, 0.95]$ to $[0.7, 0.9]$, increased the average coverage probability. For example, for $n = 500$, the exact average coverage probability increases from 0.843 to 0.866. Similarly, an increase in the prior information on $\theta$ from $[a, b] = [0, 0.5]$ to $[a, b] = [0, 0.2]$, while keeping $[a_1, b_1] = [0.7, 0.9]$, also substantially increased the exact average coverage probability. For example, for $n = 400$, it increased from 0.843 to 0.941. Conversely, increased uncertainty in the prior information for the sensitivity, represented here by a wider prior range for $S$, resulted in a poorer average coverage. For example, the average coverage was 0.941 for $n = 400$ and $[a_1, b_1] = [0.7, 0.95]$, but only 0.814 for $n = 400$ and $[a_1, b_1] = [0.3, 0.8]$, even though $[a, b] = [0, 0.2]$ remained constant.
The table also shows that the exact average coverage probability is very close to the SIR approximate average coverage probability in all cases we considered. Exact values for the average coverage probabilities may be preferable, but they become increasingly time consuming to compute with increasing sample sizes. We found that the SIR program ran about 13 times faster than the exact program. Moreover, the examples in this table indicate that the errors produced by the SIR computations are very small. Therefore, in the sequel, we use the SIR algorithm to compute the average coverage probabilities when larger values of $n$ are considered. The results are illustrated in the plots of figure 6.2, where the average coverage probabilities are plotted against the sample size $n$. The values of the parameters considered in each plot are given in table 6.5.
The plots in figure 6.2 suggest that in many cases, the average coverage probability does not improve substantially when $n$ is increased beyond a certain value. The average coverage probabilities seem to approach an upper limit, and this upper limit can often be much smaller than the desired average coverage of, say, 0.95. For example, in plot 1, as $n$ increases from 1500 to 3500, the average coverage seems to stay near 0.7. Similarly, in plot 3, as $n$ increases from about 600 to 2500, the average coverage remains near 0.55. This phenomenon will occur again in the next section, where we will study it in greater detail.
Table 6.5: The parameters used in the plots of figure 6.2. Here prior.theta is the prior distribution of $\theta$, prior.S is the prior distribution of $S$, and range.n is the range of $n$. In all of these plots $d = 0.05$.
It is also of interest to compare plot 1, where the sensitivity takes values in $[0.3, 0.5]$ and the average coverage seems to have an upper limit of approximately 0.7, to plot 2, where the sensitivity takes values in $[0.7, 0.9]$ and the average coverage seems to approach the desired value of 0.95, at least for $n \ge 1500$. Similarly, in plots 3 and 4, the upper limit seems to increase from about 0.55 to about 0.9 when the prior information on $\theta$ increases from $\theta \sim U[0, 0.5]$ to $\theta \sim U[0, 0.2]$. Therefore, it seems that the upper limit depends on the prior densities for $\theta$ and $S$. The case considered in this section is a special case of that considered in the next section, where we allow both the sensitivity and the specificity of the diagnostic test to be unknown. This case was of interest since we could compare exact calculations to those from the SIR algorithm, and hence evaluate the performance of the latter algorithm. In the next section, we consider the more general case in detail, including modelling the effect that features of the prior densities have on the posterior average coverage probabilities.
Figure 6.2: Average coverage probability plotted against n.
6.4 The case where both the sensitivity and the
specificity of the diagnostic test are unknown
6.4.1 Introduction
In this section, we will consider the sample size problem for estimating the prevalence of a disease from a diagnostic test when neither the sensitivity nor the specificity of the test is exactly known. We will assume that the prevalence, the sensitivity and the specificity are a priori independent random variables, although similar techniques will carry over when this condition is not satisfied. Given prior distributions for each of these quantities, we will derive methods to find the sample size $n$ such that the average coverage criterion (2.8), defined in chapter 2, is satisfied. The procedure will be similar to that of the preceding section. First, we develop an exact algorithm that will be used to calculate the average coverage probability for a given sample size $n$, when the prior distributions for the prevalence, the sensitivity and the specificity are all uniform. The exact algorithm applies only to uniform prior distributions, and is feasible in practice only for relatively small sample sizes. We therefore again consider the SIR algorithm as an alternative method of computation. We investigate the average coverage probability for several typical examples. In many situations, average coverage probabilities do not increase substantially when the sample size $n$ increases beyond a certain value $n_0$. Therefore, we develop a regression model to estimate $n_0$, and use it to find the value of the average coverage probability as $n$ approaches $n_0$.
As in the preceding sections, we let $\theta$ denote the prevalence of the disease, $X$ the number of individuals from the sample who test positive, $x$ the observed value of $X$ in the experiment, and $S$ the sensitivity of the diagnostic test. In this section the specificity of the diagnostic test is also a random variable, which will be denoted by $C$. Let $f_\theta(\theta)$ denote the prior density function of $\theta$, defined on the interval $[a, b]$, where $0 \le a \le b \le 1$. Similarly, let $f_S(s)$ denote the prior density function of $S$, defined on the interval $[a_1, b_1]$, where $0 \le a_1 \le b_1 \le 1$, and let $f_C(c)$ denote the prior density function of $C$, defined on the interval $[a_2, b_2]$, where $0 \le a_2 \le b_2 \le 1$.
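The likelihood of the data is
$$ l(x \mid \theta, s, c) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad p = \theta s + (1 - \theta)(1 - c). $$
By the prior independence of $\theta$, $S$ and $C$, the joint density function of $\theta$, $S$, $C$ is
$$ f(\theta, s, c) = f_\theta(\theta)\, f_S(s)\, f_C(c). $$
Define $f(x, \theta, s, c)$ by
$$ f(x, \theta, s, c) = l(x \mid \theta, s, c)\, f(\theta, s, c). $$
The marginal probability function of $X$ is
$$ m(x) = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \int_{a}^{b} f(x, \theta, s, c)\, d\theta\, dc\, ds, $$
so that, by the usual application of Bayes' theorem, the posterior joint density function of $\theta$, $S$ and $C$ is
$$ f(\theta, s, c \mid x) = \frac{f(x, \theta, s, c)}{m(x)}. $$
The marginal posterior mean of $\theta$ is found by
$$ \hat\theta = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \int_{a}^{b} \theta\, f(\theta, s, c \mid x)\, d\theta\, dc\, ds. $$
The marginal posterior coverage probability of the interval $[\hat\theta - d, \hat\theta + d]$ given $X = x$ and given $d$ is
$$ \mathrm{cov}(x; d) = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \int_{\hat\theta - d}^{\hat\theta + d} f(\theta, s, c \mid x)\, d\theta\, dc\, ds. $$
The average coverage probability of the intervals $[\hat\theta - d, \hat\theta + d]$ over the marginal distribution of $X$, given $d$, is
$$ \sum_{x=0}^{n} \mathrm{cov}(x; d)\, m(x). $$
To calculate the average coverage probability we need to calculate the marginal probability of $X$, the posterior mean $\hat\theta$ of $\theta$, and the posterior probability of the interval $[\hat\theta - d, \hat\theta + d]$ given $d$ and $x$, for each possible value of $x$, that is, for $x = 0, 1, \ldots, n$.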
6.4.2 Exact computations for the case of uniform prior distributions.
In this section we will derive an exact method for sample size calculations. Suppose that a priori $\theta \sim U[a, b]$, where $0 \le a \le b \le 1$, $S \sim U[a_1, b_1]$, where $0 \le a_1 \le b_1 \le 1$, and $C \sim U[a_2, b_2]$, where $0 \le a_2 \le b_2 \le 1$. The joint density function of $\theta$, $S$, $C$ is then given by
$$ f(\theta, s, c) = \begin{cases} \dfrac{1}{(b - a)(b_1 - a_1)(b_2 - a_2)}, & 0 \le a \le \theta \le b \le 1,\ 0 \le a_1 \le s \le b_1 \le 1,\ 0 \le a_2 \le c \le b_2 \le 1, \\ 0, & \text{otherwise.} \end{cases} $$
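For ease of computation we consider the transformation of the parameter vector $(\theta, s, c)$ to the parameter vector $(p, s, c)$. We recall that
$$ p = \theta s + (1 - \theta)(1 - c), $$
so that
$$ \theta = \frac{p - (1 - c)}{s + c - 1}. $$
The Jacobian $J$ of the transformation is by definition the inverse of the determinant of the matrix of partial derivatives of $(p, s, c)$ with respect to $(\theta, s, c)$. Since $\partial p / \partial \theta = s + c - 1$ and $s$ and $c$ are mapped to themselves, this determinant equals $s + c - 1$ (we assume $s + c > 1$, so that $p$ increases with $\theta$). Thus $J = 1/(s + c - 1)$, and $f(x, p, s, c)$ is given by
$$ f(x, p, s, c) = \begin{cases} \dfrac{\binom{n}{x} p^x (1 - p)^{n - x}}{(b - a)(b_1 - a_1)(b_2 - a_2)(s + c - 1)}, & p_1 \le p \le p_2,\ 0 \le a_1 \le s \le b_1 \le 1,\ 0 \le a_2 \le c \le b_2 \le 1, \\ 0, & \text{otherwise,} \end{cases} $$
where $p_1 = a(s + c - 1) + (1 - c)$ and $p_2 = b(s + c - 1) + (1 - c)$. The marginal probability of $X$ is then given by
$$ m(x) = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \int_{p_1}^{p_2} \frac{\binom{n}{x} p^x (1 - p)^{n - x}}{(b - a)(b_1 - a_1)(b_2 - a_2)(s + c - 1)}\, dp\, dc\, ds. $$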
From this, the joint posterior density function of $S$, $C$ and $p$ is
$$ f(p, s, c \mid x) = \begin{cases} f(x, p, s, c)/m(x), & p_1 \le p \le p_2,\ a_1 \le s \le b_1,\ a_2 \le c \le b_2, \\ 0, & \text{otherwise.} \end{cases} $$
The posterior mean of $\theta$ is by definition
$$ \hat\theta = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \int_{p_1}^{p_2} \frac{p - (1 - c)}{s + c - 1}\, f(p, s, c \mid x)\, dp\, dc\, ds. $$
The coverage probability of the interval $[\hat\theta - d, \hat\theta + d]$ is
$$ \mathrm{cov}(x; d) = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \int_{l_1}^{l_2} f(p, s, c \mid x)\, dp\, dc\, ds, $$
where
$l_1 = (\hat\theta - d)(s + c - 1) + (1 - c)$ and $l_2 = (\hat\theta + d)(s + c - 1) + (1 - c)$.
To calculate $m(x)$, $\hat\theta$ and cov(x; d), triple integrals over $s$, $c$ and $p$ need to be calculated. These integrals can be solved exactly by changing the order of integration so that integration is performed first with respect to $c$, followed by $s$ and then $p$. In changing the order of integration, however, 36 different regions of integration arise. In Appendix D we illustrate how the different regions of integration can be obtained, and therefore how the required integrations may be computed to derive the final sample sizes using an exact approach. This approach is applicable only when uniform prior densities for $\theta$, $S$ and $C$ are reasonable, and is practical only when $n$ is small. Nevertheless, it is useful as a point of comparison for approximate methods, which we introduce in the next section.
6.4.3 SIR computations
We will now describe a sampling importance resampling (SIR) algorithm that can be used to provide approximate coverage probabilities for cases when the exact approach described in the previous section cannot be used. This SIR algorithm is similar to the one used in section 6.3, with the additional complication that $C$ is now also unknown. One proceeds as follows:
SIR algorithm:
1. Obtain a random sample $\{\theta_i\}_{1 \le i \le k}$ of size $k$ from the prior distribution $f_\theta(\theta)$ of $\theta$.
2. Obtain a random sample $\{s_i\}_{1 \le i \le k}$ of size $k$ from the prior distribution $f_S(s)$ of $S$.
3. Obtain a random sample $\{c_i\}_{1 \le i \le k}$ of size $k$ from the prior distribution $f_C(c)$ of $C$.
4. The likelihood function of the data given $\theta$, $S$ and $C$ is $l(x \mid \theta, s, c) = \binom{n}{x} p^x (1 - p)^{n - x}$, where $p = \theta s + (1 - \theta)(1 - c)$, so that, for each $x$, $x = 0, 1, \ldots, n$, we attach the weight $w_i = l(x \mid \theta_i, s_i, c_i)$ to each point $(\theta_i, s_i, c_i, x)$, $i = 1, \ldots, k$.
5. Resample a sample $\{\theta_i^*\}_{1 \le i \le r}$ of size $r$ with replacement from the original sample $\{\theta_i\}_{1 \le i \le k}$, using probabilities proportional to the weights $\{w_i\}_{1 \le i \le k}$.
6. Approximate $\hat\theta$, the posterior mean of $\theta$, by $\hat\theta = \frac{1}{r} \sum_{i=1}^{r} \theta_i^*$.
7. Estimate cov(x; d), the posterior coverage probability of the interval $[\hat\theta - d, \hat\theta + d]$ given $d > 0$, by calculating the proportion of resampled points $\theta_i^*$ from step 5 that are inside the interval $[\hat\theta - d, \hat\theta + d]$.
8. The marginal probability of $x$ is estimated by $\hat m(x) = \frac{1}{k} \sum_{i=1}^{k} l(x \mid \theta_i, s_i, c_i)$.
9. An approximation of the average coverage probability of the intervals $[\hat\theta - d, \hat\theta + d]$ over the marginal probability function of $x$ is given by $\sum_{x=0}^{n} \mathrm{cov}(x; d)\, \hat m(x)$.
A full listing of an S-plus program that carries out this algorithm is given in Appendix
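The nine steps above can be sketched in Python as follows; this is our own illustration, not the S-plus listing. Because SIR only needs prior draws, the sketch takes arrays of draws for $\theta$, $S$ and $C$ as input, so any prior (including dependent ones) can be used. The function name `sir_average_coverage_3`, the seed and the Beta(3, 3) example priors are illustrative assumptions; a Beta(3, 3) variable on $[a, b]$ is drawn as $a + (b - a)$ times a standard Beta(3, 3) draw.

```python
import numpy as np
from math import lgamma

def sir_average_coverage_3(n, d, theta, s, c, r=1000, seed=0):
    """SIR estimate of the average coverage of [theta_hat - d, theta_hat + d]
    when prevalence, sensitivity and specificity are all unknown.
    theta, s, c: equal-length numpy arrays of draws from their prior."""
    rng = np.random.default_rng(seed)
    k = len(theta)
    p = theta * s + (1 - theta) * (1 - c)      # P(positive test | theta, s, c)
    log_p, log_q = np.log(p), np.log1p(-p)
    avg = 0.0
    for x in range(n + 1):
        log_binom = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
        w = np.exp(log_binom + x * log_p + (n - x) * log_q)   # weights = likelihood
        m_x = w.mean()                         # estimate of the marginal m(x)
        if m_x == 0.0:
            continue
        theta_post = theta[rng.choice(k, size=r, replace=True, p=w / w.sum())]
        theta_hat = theta_post.mean()          # posterior mean of theta
        avg += np.mean(np.abs(theta_post - theta_hat) <= d) * m_x
    return float(avg)

rng = np.random.default_rng(2)
k = 1000
theta = 0.0 + 0.1 * rng.beta(3, 3, size=k)     # Beta(3,3) on [0, 0.1]
s = 0.85 + 0.10 * rng.beta(3, 3, size=k)       # Beta(3,3) on [0.85, 0.95]
c = 0.85 + 0.10 * rng.beta(3, 3, size=k)       # Beta(3,3) on [0.85, 0.95]
print(sir_average_coverage_3(n=500, d=0.02, theta=theta, s=s, c=c))
```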
6.4.4 Examples
Example 6.4.1 Comparison of the average coverage probability obtained from the SIR algorithm to the exact average coverage probability
We calculated the average coverage probability (equation 6.5) using both the exact and SIR algorithms for a variety of typical prior densities for the sensitivity, the specificity and the prevalence. Due to the limitations of the available hardware and software, the exact computations can only be done for small values of $n$. We considered examples with $n = 50$, $n = 100$ and $n = 200$. Throughout, we used prior and posterior samples of size $k = r = 1000$ in the SIR program. Of course, larger values for $k$ and $r$ will increase the accuracy at the expense of computing time. The results are given in table 6.6.
The results in table 6.6 show that if $n$ is relatively small, the SIR algorithm performs well even with $k = r = 1000$. These results, together with the results of the preceding section and the fact that exact computation can only be done for small values of $n$, encouraged us to use the SIR algorithm as an alternative to the exact method for the computation of average coverage probabilities. The SIR algorithm is convenient in the sense that it is easy to program, its use is not restricted to small sample sizes, and it is flexible, in the sense that it can be used regardless of the form of the prior distributions. It also does not depend on the assumption of prior independence of $S$, $C$ and $\theta$. However, the SIR algorithm has the disadvantage that it relies on random samples for the computation of posterior probabilities, and therefore the results given by the SIR algorithm contain random errors. These errors seem to be small, as we discussed before, and do not constitute a real problem in this setting. This is especially true since we will usually consider average coverage probabilities, so that errors in the individual terms of the average will tend to cancel out when computing the final average. Therefore, in the remaining part of this section, the SIR algorithm will be used.
Example 6.4.2 Comparison with the case of fixed sensitivity and/or fixed specificity
In section 6.2, we considered the case where both the sensitivity and the specificity are fixed, and in section 6.3, the case where only the specificity is known was examined. We now look at several examples to examine the effect that less than perfect knowledge about these quantities has on the average coverage probability.
Table 6.6: Comparison of the SIR average coverage probability with the exact average coverage probability. The prior densities are $S \sim U[a_1, b_1]$, $C \sim U[0.85, 0.95]$ and $\theta \sim U[0, 0.1]$. The parameter $d$ is half the length of the posterior credible interval. Cov.sir is the average coverage probability given by the SIR program. Cov.exact is the average coverage probability given by the exact program.
row   cov
1     0.950
2     0.589
3     0.595
4     0.947
5     0.618
6     0.950
7     0.803
8     0.783
9     0.950
10    0.816
11    0.950
12    0.765
13    0.950
14    0.557
Table 6.7: Variation of the average coverage probability with increasing uncertainty. The prior distribution of $\theta$ is $U[0, 0.1]$, $d$ is half the length of the posterior credible interval, sens is the prior distribution of the sensitivity, spec is the prior distribution of the specificity and cov is the average coverage probability.
Table 6.7 shows that the average coverage probability decreases when the sensitivity and the specificity are not exactly known, compared to when they are exactly known. It is especially interesting to note that if the prior information suggests that the sensitivity and the specificity of the diagnostic test must be larger than a given fixed value, but the exact values are not known, then the coverage probability can decrease substantially. For example, comparing row 1 to row 5 of the table, the probability decreases from 0.95 to 0.62 for the same sample size of 1473. Therefore, the uncertainty greatly decreases the average coverage probability, even though the uncertainty is only in the direction of higher sensitivity and specificity. Similarly, in examples 9 and 10, we see that the average coverage probability decreases from 0.95 to 0.816 when the sensitivity and the specificity change from being fixed at 0.9 to having 0.9 as a lower bound. For small prevalences (prior support for the prevalence here was restricted to the interval $[0, 0.1]$), the uncertainty around the specificity has more effect on the accuracy of estimation of the prevalence than the uncertainty around the sensitivity. This is seen in examples 1, 2 and 4 of the table: if we change the sensitivity from being fixed at 0.9 to $U[0.85, 0.95]$, the average coverage probability changes only from 0.95 to 0.947. It decreased to 0.585, however, when both the sensitivity and the specificity were changed to $U[0.85, 0.95]$. Furthermore, the probability decreased to 0.395 when the sensitivity changed to $U[0.85, 0.95]$ and the specificity to $U[0.9, 1]$. The reason for this effect is found in the equation $p = \theta s + (1 - \theta)(1 - c)$. Since $s$ is multiplied by $\theta$, when $\theta$ is small the effect of $s$ is small, and therefore the effect of $c$ is larger.
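The last argument can be made quantitative: $\partial p / \partial s = \theta$ while $\partial p / \partial c = -(1 - \theta)$, so near $\theta = 0.05$ (the midpoint of the prior range $[0, 0.1]$, chosen here only for illustration) a change in the specificity moves $p$ about $(1 - \theta)/\theta = 19$ times as much as the same change in the sensitivity. The short sketch below checks this for intervals of width 0.1; the function name `positive_rate` is ours.

```python
def positive_rate(theta, s, c):
    """Probability of a positive test: p = theta*s + (1 - theta)*(1 - c)."""
    return theta * s + (1 - theta) * (1 - c)

theta = 0.05
# Spread in p when s ranges over [0.85, 0.95] with c fixed at 0.9
spread_s = positive_rate(theta, 0.95, 0.9) - positive_rate(theta, 0.85, 0.9)
# Spread in p when c ranges over [0.85, 0.95] with s fixed at 0.9
spread_c = positive_rate(theta, 0.9, 0.85) - positive_rate(theta, 0.9, 0.95)
print(spread_s, spread_c, spread_c / spread_s)
```

Since the data inform $\theta$ only through $p$, an unknown specificity blurs the link between $p$ and $\theta$ far more than an unknown sensitivity does when $\theta$ is small.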
Example 6.4.3 Variation of the average coverage probability with respect
to n
In this section we will investigate the average coverage probability for a variety of situations to see the effect of changes to the sample size $n$. We examined a variety of situations typical of those that may arise in practice. In particular, we considered the cases where the prior range of $\theta$ is known to be $[0, 0.1]$, and the prior ranges of the sensitivity and specificity are each known to be one of the 4 intervals $[0.65, 0.75]$, $[0.75, 0.85]$, $[0.85, 0.95]$ and $[0.9, 1]$. We considered both uniform and Beta(3, 3) prior densities on these ranges. Table 6.9 provides average coverage probabilities for prior densities of the form $S \sim U[a_1, b_1]$, $C \sim U[a_2, b_2]$ and $\theta \sim U[0, 0.1]$, while table 6.10 illustrates similar results for $S \sim$ Beta(3, 3), $C \sim$ Beta(3, 3) and $\theta \sim$ Beta(3, 3). Figure 6.3 displays plots of coverage probabilities versus the sample size $n$. The values of the parameters considered in each plot are given in table 6.8.
Plot #   d      dist.     range.S        range.C        range.n
1        0.02   Uniform   [0.75, 0.85]   [0.85, 0.95]   [100, 5000]
2        0.02   Uniform   [0.85, 0.95]   [0.85, 0.95]   [100, 5000]
5        0.03   Uniform   [0.75, 0.85]   [0.85, 0.95]   [100, 5000]
Table 6.8: The parameters used in the plots of figure 6.3. Here range.S is the range of $S$, range.C is the range of $C$, dist. is the prior distribution of $\theta$, $S$ and $C$, and range.n is the range of $n$. The range of $\theta$ is $[0, 0.1]$.
Note 6.4.1 The standard beta distribution has support on the interval $[0, 1]$. It is easy to derive a beta density having different support. For example, a random variable $Y \sim$ Beta(u, v) on the interval $[a, b]$ would have the density function
$$ f_Y(y) = \frac{\Gamma(u + v)}{\Gamma(u)\Gamma(v)} \frac{1}{b - a} \left( \frac{y - a}{b - a} \right)^{u - 1} \left( \frac{b - y}{b - a} \right)^{v - 1}, \qquad a \le y \le b. $$
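The density in Note 6.4.1 is the standard beta density evaluated at $(y - a)/(b - a)$, multiplied by the Jacobian $1/(b - a)$. A minimal sketch follows (the function name `scaled_beta_pdf` is ours; it assumes $u, v \ge 1$ so that the endpoints are well defined). Draws from the same distribution can be obtained as $a + (b - a)B$ with $B \sim$ Beta(u, v).

```python
from math import gamma

def scaled_beta_pdf(y, u, v, a, b):
    """Density of Y ~ Beta(u, v) rescaled to the interval [a, b]
    (assumes u, v >= 1 so the endpoints y = a and y = b are finite)."""
    if not (a <= y <= b):
        return 0.0
    t = (y - a) / (b - a)                              # map y back to [0, 1]
    const = gamma(u + v) / (gamma(u) * gamma(v))       # 30 for Beta(3, 3)
    return const * t ** (u - 1) * (1 - t) ** (v - 1) / (b - a)   # 1/(b - a): Jacobian

# Beta(3, 3) on [0, 0.1], the prior range of theta used in table 6.10
print(scaled_beta_pdf(0.05, 3, 3, 0.0, 0.1))
```

At the midpoint the standard Beta(3, 3) density is $30 \cdot 0.5^2 \cdot 0.5^2 = 1.875$, so on an interval of width 0.1 the rescaled density there is 18.75.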
Table 6.9: Average coverage probability when $S \sim U[a_1, b_1]$, $C \sim U[a_2, b_2]$ and $\theta \sim U[0, 0.1]$. Here cov.sir indicates the average coverage probability computed using the SIR algorithm. The parameter $d$ is half the length of the posterior credible interval.
Table 6.10: Average coverage probability when $S \sim$ Beta(3, 3) on the interval $[a_1, b_1]$, $C \sim$ Beta(3, 3) on the interval $[a_2, b_2]$ and $\theta \sim$ Beta(3, 3) on the interval $[0, 0.1]$. Here cov.sir indicates the average coverage probability computed using the SIR algorithm. The parameter $d$ is half the length of the posterior credible interval.
Figure 6.3: Average coverage probability versus n.
The plots in figure 6.3 and the information in tables 6.9 and 6.10 provide evidence that the average coverage probability approaches an upper limit. For example, in the first four rows of table 6.9 we do not see any improvement in the average coverage probability beyond $n = 500$, even though we quadruple the sample size. Similar effects occur in rows 5 to 8, rows 17 to 20 and rows 21 to 24 of the same table. While increasing the sample size always improves the precision in estimating the probability $p$ of a positive test, past a certain point it does not provide better estimates of the sensitivity and the specificity of the diagnostic test. Since the estimate of $\theta$ depends not only on the estimate of $p$ but also on estimates of the sensitivity and the specificity, for which improvement is limited, the improvement in estimating $\theta$ is also limited. The approximate sample size beyond which further sampling provides little additional precision in estimating $\theta$ is a complex function of the prior information available on $\theta$, $S$ and $C$. In the next section, we will attempt to use the empirical evidence in this section to build a model that can help to explain this phenomenon.
6.4.5 Logistic regression models
We would like to construct a regression model to analyse the variation in average coverage probabilities caused by changes to the variables $n$, $d$, and the prior distributions of $\theta$, $S$ and $C$. Since the average coverage probabilities must be between 0 and 1, a generalized linear model with logit link (see McCullagh and Nelder [1989]) will be appropriate. In a generalized linear model, a function of the mean of the numeric response variable, called the link, is written as a linear combination of the predictor variables.
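Let $\mu$ denote the mean of the response variable. The logit link, denoted by $\eta$, is given by
$$ \eta = \log\left( \frac{\mu}{1 - \mu} \right). $$
This is equivalent to
$$ \mu = \frac{\exp(\eta)}{1 + \exp(\eta)}. $$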
The prior distributions of $\theta$, $S$ and $C$ can be summarized by their respective means, and by their respective prior coverage probabilities of the intervals of length $2d$ centered at the respective means. The following notation is used. Let:
- msens denote the prior mean of the sensitivity,
- mspec denote the prior mean of the specificity,
- mθ denote the prior mean of $\theta$,
- cov.θ.start denote the coverage probability of the interval $[m\theta - d, m\theta + d]$ under the prior distribution of $\theta$,
- cov.sens.start denote the coverage probability of the interval $[msens - d, msens + d]$ under the prior distribution of $S$,
- cov.spec.start denote the coverage probability of the interval $[mspec - d, mspec + d]$ under the prior distribution of $C$, and
Since $w$ was an important factor in sample size determination when the sensitivity and specificity were exactly known, we chose to include it in the model. The average coverage probability should satisfy the constraints
We therefore also included the variable $1/(d(m - d))$ in the model, where $m = \max(m\theta - a, b - m\theta)$. In summary, we model $\log(\mu/(1 - \mu))$ in terms of the variables $n$, $d$, $1/(d(m - d))$, $w$, msens, mspec, mθ, cov.θ.start, cov.sens.start and cov.spec.start.
Since the average coverage probability, as a function of $n$, seems to have an asymptote with value smaller than 1, we looked for a model of the form
$$ \eta = u + \frac{v}{n}, $$
where $u$ and $v$ are constant with respect to $n$, and so depend only on $d$, $w$, msens, mspec, mθ, cov.θ.start, cov.sens.start and cov.spec.start. This model seems appropriate if $v < 0$, since then $\eta$ will be increasing with $n$, and as $n$ approaches infinity $\eta$ approaches $u$. Also, since $\eta = \log(\mu/(1 - \mu))$,
$$ \mu = \frac{\exp(u)\exp(v/n)}{1 + \exp(u)\exp(v/n)}. $$
Therefore, as $n$ approaches infinity, $\mu$ approaches $\exp(u)/(1 + \exp(u))$.
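The asymptote can be seen numerically. The sketch below evaluates $\mu(n)$ under $\eta = u + v/n$ for hypothetical coefficients $u = 1.5$ and $v = -300$, chosen only for illustration and not fitted to any table in this thesis; with $v < 0$, $\mu(n)$ increases with $n$ but never exceeds $\exp(u)/(1 + \exp(u))$.

```python
import math

def expit(eta):
    """Inverse logit: mu = exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def model_mean(n, u, v):
    """Mean response mu(n) under the model eta = u + v / n."""
    return expit(u + v / n)

u, v = 1.5, -300.0   # hypothetical coefficients (v < 0), for illustration only
for n in (100, 500, 1000, 5000, 50000):
    print(n, round(model_mean(n, u, v), 4))
print("limit:", round(expit(u), 4))   # exp(u) / (1 + exp(u))
```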
Fitting the models
The average coverage probability is equal to cov.θ.start for $n = 0$; therefore we looked for a model that satisfies this constraint and fits the description just given. Hence it seemed appropriate to model $\log\big( (\mathrm{cov} - \mathrm{cov.\theta.start}) / (1 - \mathrm{cov} + \mathrm{cov.\theta.start}) \big)$ with respect to the following variables (and possibly their interactions): $1/n$, $d$, $1/(d(m - d))$, $w$, msens, mspec, mθ, cov.θ.start, cov.sens.start and cov.spec.start. We first must compute these variables for the situations we studied (tables 6.13 to 6.22). For the data we considered, $m = 0.05$ and mθ $= 0.05$. We have
$$ msens = \begin{cases} 0.7, & \text{if the range of the sensitivity is } [0.65, 0.75], \\ 0.8, & \text{if the range of the sensitivity is } [0.75, 0.85], \\ 0.9, & \text{if the range of the sensitivity is } [0.85, 0.95], \\ 0.95, & \text{if the range of the sensitivity is } [0.9, 1], \end{cases} $$
$$ mspec = \begin{cases} 0.7, & \text{if the range of the specificity is } [0.65, 0.75], \\ 0.8, & \text{if the range of the specificity is } [0.75, 0.85], \\ 0.9, & \text{if the range of the specificity is } [0.85, 0.95], \\ 0.95, & \text{if the range of the specificity is } [0.9, 1]. \end{cases} $$
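The Beta(3, 3) density on the interval $[a, b]$ is given by
$$ f(t) = \begin{cases} \dfrac{\Gamma(6)}{\Gamma(3)\Gamma(3)} \dfrac{1}{b - a} \left( \dfrac{t - a}{b - a} \right)^2 \left( \dfrac{b - t}{b - a} \right)^2, & \text{if } t \in [a, b], \\ 0, & \text{otherwise,} \end{cases} $$
which can be simplified to
$$ f(t) = \begin{cases} \dfrac{30}{b - a} \left( \dfrac{t - a}{b - a} \right)^2 \left( \dfrac{b - t}{b - a} \right)^2, & \text{if } t \in [a, b], \\ 0, & \text{otherwise.} \end{cases} $$
Thus the distribution function at a point $y$, where $a \le y \le b$, is
$$ F(y) = \int_a^y \frac{30}{b - a} \left( \frac{t - a}{b - a} \right)^2 \left( \frac{b - t}{b - a} \right)^2 dt. $$
Setting $u = (t - a)/(b - a)$, the integral becomes
$$ F(y) = \int_0^{(y - a)/(b - a)} 30\, u^2 (1 - u)^2\, du, $$
so that
$$ F(y) = \mathrm{pbeta}\left( 3, 3, \frac{y - a}{b - a} \right), $$
where $\mathrm{pbeta}(\alpha, \beta, (y - a)/(b - a))$ denotes the distribution function of a Beta($\alpha$, $\beta$) defined on the interval $[a, b]$, evaluated at the point $y$. Furthermore,
$$ \mathrm{cov.\theta.start} = \begin{cases} \dfrac{2d}{b - a}, & \text{if } \theta \text{ has a uniform prior density}, \\ \mathrm{pbeta}\left( 3, 3, \dfrac{m\theta + d - a}{b - a} \right) - \mathrm{pbeta}\left( 3, 3, \dfrac{m\theta - d - a}{b - a} \right), & \text{if } \theta \text{ has a Beta}(3, 3) \text{ prior density}. \end{cases} $$
It is easy to see that in all the cases we studied (tables 6.13 through 6.22) we have
$$ \mathrm{cov.\theta.start} = \mathrm{cov.sens.start} = \mathrm{cov.spec.start}. $$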
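For Beta(3, 3) the distribution function has the closed form $\int_0^u 30 t^2 (1 - t)^2\, dt = 10u^3 - 15u^4 + 6u^5$, so cov.θ.start can be computed without a statistics library; in R the same value is `pbeta(q, 3, 3)`. The sketch below is our own illustration (the names `beta33_cdf` and `cov_theta_start` are hypothetical) and assumes, as in the cases studied, a symmetric prior on $[a, b]$ so that the prior mean is $(a + b)/2$.

```python
def beta33_cdf(u):
    """Distribution function of the standard Beta(3, 3) on [0, 1]:
    the integral of 30 t^2 (1 - t)^2, i.e. 10u^3 - 15u^4 + 6u^5."""
    u = min(max(u, 0.0), 1.0)                  # clamp: F = 0 below 0 and 1 above 1
    return u**3 * (10.0 - 15.0 * u + 6.0 * u**2)

def cov_theta_start(d, a, b, prior):
    """Prior coverage of [m - d, m + d], m = (a + b)/2, for a uniform or a
    Beta(3, 3) prior on [a, b] (both symmetric, so m is the prior mean)."""
    m = 0.5 * (a + b)
    lo = (m - d - a) / (b - a)                 # standardized endpoints on [0, 1]
    hi = (m + d - a) / (b - a)
    if prior == "uniform":
        return min(hi, 1.0) - max(lo, 0.0)     # equals 2d/(b - a) when the interval fits
    if prior == "beta33":
        return beta33_cdf(hi) - beta33_cdf(lo)
    raise ValueError("prior must be 'uniform' or 'beta33'")

for d in (0.02, 0.03, 0.04):
    print(d, cov_theta_start(d, 0.0, 0.1, "uniform"), cov_theta_start(d, 0.0, 0.1, "beta33"))
```

Because every prior range in tables 6.13 through 6.22 has width 0.1 and the same family is used for all three quantities, the same function gives cov.sens.start and cov.spec.start, which is why the three coincide.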
The error in the model arises largely from using the SIR approximation to the true coverage probability. We fitted logistic regression models with both binomial and constant variance, to see which would provide a better fit. We now briefly describe each of these models.
The binomial model: This model uses the logit link function with binomial variance, given by $\mu(1 - \mu)$. We first modelled the dependent variable cov − cov.θ.start in terms of $1/n$, $d$, cov.θ.start, cov.θ.start², $w$ and $1/(d(0.05 - d))$. We believe that in the above list of variables $w$ reflects the effects of both msens and mspec, and since cov.θ.start = cov.sens.start = cov.spec.start, we believe that cov.θ.start, possibly with the quadratic term cov.θ.start², represents the effects of both cov.sens.start and cov.spec.start, while $1/(d(0.05 - d))$ models the constraints (6.10). The S-plus program used to fit this model and the model printouts are given in Appendix F.
Figure 6.4 displays a plot of the residuals from the binomial model. In this plot we see that the mean of the residuals is approximately 0, and the absolute value of the residuals is not larger than 0.04 for most response values. Although this indicated a reasonable fit, the residual variance seems to be roughly constant throughout the range of sample sizes $n$, so the assumption of binomial variance seems unreasonable. This is not surprising, since most of the variance arises from the SIR approximation to the average coverage probability, which we do not expect to differ greatly across different values of $n$. Therefore, we also fitted a quasi-likelihood model with constant variance.
The quasi likelihood model: We considered a quasi-likelihood model with logit link and constant variance, using the same independent and dependent variables that were included in the previous model. This model is referred to as a quasi-likelihood model, since it is not derived from a fully specified distribution of the exponential family, and it uses only the mean and variance assumptions rather than the full likelihood function. The program written in S-plus to fit this model and the model printouts are given in Appendix F. Figure 6.4 displays a plot of the residuals from this model. In this plot we see that the mean of the residuals is approximately 0, and the absolute value of the residuals is usually not larger than 0.04, so the fit seems to be similar to that from the binomial model. Nevertheless, the variance assumption of this model is almost certainly closer to being correct. For more details on quasi-likelihood models see McCullagh and Nelder [1989].
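The only computational difference between the binomial and the constant-variance fits is the weight each observation receives. The Python sketch below is a minimal iteratively reweighted least squares (IRLS) fit of a logit-link model with the single predictor 1/n, the form u + v/n used later in this chapter; it is not the S-plus code of Appendix F, and the function name and the noise-free synthetic data (generated from the coefficients reported in Example 7.2.1) are illustrative assumptions.

```python
import math


def expit(eta: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logit_inverse_n(ns, ys, variance="constant", max_iter=50, tol=1e-12):
    """Fit logit(E[y]) = u + v/n by iteratively reweighted least squares (IRLS).

    variance="binomial": V(mu) = mu(1 - mu)   (binomial-type fit)
    variance="constant": V(mu) = 1            (quasi-likelihood, constant variance)
    Responses must lie strictly inside (0, 1). Returns (u, v).
    """
    if variance not in ("binomial", "constant"):
        raise ValueError(f"unknown variance: {variance!r}")
    xs = [1.0 / n for n in ns]
    etas = [math.log(y / (1.0 - y)) for y in ys]   # start from eta = logit(y)
    u = v = 0.0
    for _ in range(max_iter):
        # Weighted normal equations for the two coefficients (intercept, 1/n).
        s00 = s01 = s11 = t0 = t1 = 0.0
        for x, y, eta in zip(xs, ys, etas):
            mu = expit(eta)
            dmu = mu * (1.0 - mu)                  # d mu / d eta for the logit link
            var = dmu if variance == "binomial" else 1.0
            w = dmu * dmu / var                    # IRLS weight (d mu/d eta)^2 / V(mu)
            z = eta + (y - mu) / dmu               # working response
            s00 += w
            s01 += w * x
            s11 += w * x * x
            t0 += w * z
            t1 += w * x * z
        det = s00 * s11 - s01 * s01
        u_new = (s11 * t0 - s01 * t1) / det
        v_new = (s00 * t1 - s01 * t0) / det
        done = abs(u_new - u) + abs(v_new - v) < tol
        u, v = u_new, v_new
        etas = [u + v * x for x in xs]
        if done:
            break
    return u, v


# Noise-free synthetic responses y = cov - cov.theta.start at n = 100, 200, ..., 2000.
ns = list(range(100, 2001, 100))
ys = [expit(-2.219 - 47.153 / n) for n in ns]
for variance in ("binomial", "constant"):
    print(variance, fit_logit_inverse_n(ns, ys, variance))
```

The binomial fit weights each point by μ(1 - μ), the constant-variance fit by [μ(1 - μ)]²; with noise-free data both recover the generating coefficients, and they differ only when the responses carry noise, such as the SIR error discussed above.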
The quasi.full model: In order to see if adding interaction terms for some of the variables would improve the fit, we tried a generalized linear model with a logit link and a constant variance. The independent variables were w, 1/(d(0.05 - d)), d, cov.θ.start, cov.θ.start², msens, msens², mspec, mspec² and 1/n, and the second-order interactions of the variables d, cov.θ.start, cov.θ.start², msens, msens², mspec and mspec² were added. We selected the model that had the smallest AIC (see Chambers and Hastie [1992]). We call this model the quasi.full model. The S-plus program used to select this model and the model printouts are provided in Appendix F. Figure 6.4 displays a plot of the residuals of the quasi.full model. In this plot we see that the mean of the residuals is again approximately 0, and the range of the residuals seems to have decreased slightly, but not substantially, from those of the preceding models. The variance again appears to be roughly constant throughout the range of n. The fit seems to be similar to that given by the quasi model, but the quasi model is simpler in the sense that it has fewer variables, and is therefore easier to interpret and to use in practice.
In order to investigate the effect of the sample size n on the average coverage probability, we fitted a nested model that models the data with respect to n, with all other variables held constant. Here we consider each possible combination of the variables separately, and examine how the average coverage probabilities change with respect to n.
The nested quasi model: This is a quasi-likelihood model with a logit link and a constant variance, with respect to 1/n, within each of the levels obtained by considering all possible combinations of:
• the levels taken by d, where 1 represents the value 0.01, 2 represents the value 0.02 and 3 represents the value 0.03;
• the levels taken by the prior density of θ, where 1 represents the uniform density and 2 the Beta(3,3) density;
• the levels taken by the range of the sensitivity, where the intervals [0.65,0.75], [0.75,0.85], [0.85,0.95] and [0.9,1] are represented by 1, 2, 3 and 4, respectively; and
• the levels taken by the range of the specificity, where the intervals [0.65,0.75], [0.75,0.85], [0.85,0.95] and [0.9,1] are represented by 1, 2, 3 and 4, respectively.
For more details on nested models, see Chambers and Hastie [1992].
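The bookkeeping behind the nested model can be sketched in a few lines of Python. With the level codes listed above there are 3 × 2 × 4 × 4 = 96 possible cells; whether every cell actually occurs in the data set is not stated in the text, and the record layout and key names below are hypothetical. The same four intervals (as listed for the specificity) are used for both ranges.

```python
from collections import defaultdict
from itertools import product

# Level codes listed above; the same four intervals are used for S and C.
D_LEVELS = {1: 0.01, 2: 0.02, 3: 0.03}
PRIOR_LEVELS = {1: "Uniform", 2: "Beta(3,3)"}
INTERVAL_LEVELS = {1: (0.65, 0.75), 2: (0.75, 0.85), 3: (0.85, 0.95), 4: (0.9, 1.0)}

# Every (d, prior of theta, sensitivity range, specificity range) combination.
cells = list(product(D_LEVELS, PRIOR_LEVELS, INTERVAL_LEVELS, INTERVAL_LEVELS))
print(len(cells))


def group_by_cell(records):
    """Group (n, y) pairs by cell; each group then gets its own curve in 1/n.

    records: iterable of dicts with the hypothetical keys
    "d", "prior", "sens", "spec" (level codes), "n" and "y".
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["d"], rec["prior"], rec["sens"], rec["spec"])
        groups[key].append((rec["n"], rec["y"]))
    return dict(groups)
```

Fitting one curve per cell, rather than one curve with all predictors, is what lets the fitted limit differ freely from one combination of levels to another.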
The S-plus program we used to fit the nested model is given in Appendix F. The plots of the average coverage probability for the nested model within the different combinations of levels are given in Figure 6.5. Here we see clearly how the average coverage probabilities approach an upper limit at a relatively small value of n. This upper limit varies from one level to another depending on the initial values of the predictors.
Figure 6.4: From top to bottom, residuals from the binomial, the quasi-likelihood and the quasi.full models, respectively.
Figure 6.5: Predicted average coverage probability versus n, given by the nested quasi model in each of the levels provided by all possible combinations of the values of d, the prior densities, and the ranges of the sensitivity and the specificity. Each point represents a predicted coverage probability for a point in the original data set.
The residual deviance in the binomial model, 3.308, is small in absolute terms, but is larger by approximately a factor of 12 than the deviance from the quasi-likelihood model, 0.276. The residual deviances from the quasi-likelihood and quasi.full models are very close, the latter value being 0.221. The difference between the binomial and quasi-likelihood deviances is due to the fact that the residual deviance is given by the expression
$$D = 2\sum_i \int_{\mu_i}^{y_i} \frac{y_i - t}{V(t)}\,dt,$$
where V(μ) is the variance function, μ_i is the fitted mean and y = cov - cov.θ.start. In our data, y ≤ 0.3, as seen in the plots of the residuals across the range of the observations. In the binomial model, V(μ) = μ(1 - μ), which increases as μ increases on the interval [0, 0.5], while in the quasi model V(μ) is constant. Therefore, the deviance of the smaller values of y from the binomial model will be larger than the corresponding deviance from the quasi-likelihood model, since in the binomial model we are dividing by a smaller quantity.
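The deviance expression can be evaluated directly. In the Python sketch below (function names ours), the closed forms for V(t) = t(1 - t) and V(t) = 1 are checked against a midpoint-rule evaluation of the integral; the (y, μ) pairs are illustrative, not taken from the thesis data. Because μ(1 - μ) ≤ 1/4, for the same pair (y, μ) each binomial contribution is at least four times the constant-variance one, which is the direction of the difference reported above.

```python
import math


def dev_binomial(y: float, mu: float) -> float:
    """Unit deviance 2 * int_mu^y (y - t) / (t (1 - t)) dt, for 0 < y, mu < 1."""
    return 2.0 * (y * math.log(y / mu) + (1.0 - y) * math.log((1.0 - y) / (1.0 - mu)))


def dev_constant(y: float, mu: float) -> float:
    """Unit deviance 2 * int_mu^y (y - t) dt for the constant variance V(t) = 1."""
    return (y - mu) ** 2


def dev_numeric(y: float, mu: float, V, steps: int = 20000) -> float:
    """Midpoint-rule value of 2 * int_mu^y (y - t) / V(t) dt (h < 0 if y < mu)."""
    h = (y - mu) / steps
    total = 0.0
    for i in range(steps):
        t = mu + (i + 0.5) * h
        total += (y - t) / V(t)
    return 2.0 * h * total


for y, mu in [(0.02, 0.03), (0.10, 0.12), (0.25, 0.22)]:
    print(y, mu, round(dev_binomial(y, mu), 6), round(dev_constant(y, mu), 6))
```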
Predicted values
The plots of the residual errors show that all three models provide a good fit to the data. In order to assess the validity of our models for large values of n, we calculated the predictions of the average coverage probability for n = 5000 and n = 7000, as given by the binomial, quasi, quasi.full and nested quasi models. We compared these values to the corresponding values provided by the SIR algorithm. Throughout, we used situations that were not included in those from which the models were built. The results are shown in tables 6.11 and 6.12.
fit.full  fit.nested  fit.bin  fit.quasi
0.506     0.509       0.503    0.501
0.562     0.562       0.557    0.556
0.534     0.336       0.540    0.539
0.570     0.578       0.573    0.572
0.638     0.636       0.627    0.627
0.741     0.739       0.745    0.743
Table 6.11: Comparing the average coverage probability with the predicted coverage probability from the models when S ~ U[a1, b1], C ~ U[a2, b2], and θ ~ U[0, 0.1]. The parameter d is half the length of the posterior credible interval, cov.sir is the average coverage probability computed using a SIR algorithm, and fit.full, fit.nested, fit.bin and fit.quasi are the values of the average coverage probability given by the quasi.full model, nested quasi model, binomial model and quasi model, respectively.
Table 6.12: Comparing the average coverage probability with the predicted coverage probability when S ~ Beta(3,3) on the interval [a1, b1], C ~ Beta(3,3) on the interval [a2, b2] and θ ~ Beta(3,3) on the interval [0, 0.1]. The half-length of the posterior credible interval is d = 0.02, cov.sir is the average coverage probability computed using a SIR algorithm, and fit.full, fit.nested, fit.bin and fit.quasi are the values of the average coverage probability given by the quasi.full model, nested quasi model, binomial model and quasi model, respectively.
n     a1    b1    a2    b2    cov.sir  fit.full  fit.nested  fit.bin  fit.quasi
5000  0.65  0.75  0.65  0.75  0.728    0.715     0.713       0.714    0.713
5000  0.65  0.75  0.75  0.85  0.746    0.736     0.732       0.739    0.739
5000  0.75  0.85  0.75  0.85  0.745    0.754     0.755       0.764    0.765
5000  0.75  0.85  0.85  0.95  0.817    0.783     0.780       0.787    0.788
5000  0.85  0.95  0.85  0.95  0.810    0.805     0.804       0.807    0.809
We see in tables 6.11 and 6.12 that the values predicted by the four models are very close to each other and also close to the corresponding values obtained from the SIR algorithm.
Sample size
The average coverage probability approaches an upper limit as the sample size increases. It would be of considerable practical interest to compute this upper limit, and to determine the sample size n0 after which the average coverage probability will not improve by more than ε, even if the sample size were to increase to infinity. Since the models we used were of the form
$$\log\frac{p}{1-p} = u + \frac{v}{n},$$
where p = cov - cov.θ.start and u collects all the terms that do not involve n, p approaches exp(u)/(1 + exp(u)) as n approaches infinity. Therefore cov - cov.θ.start approaches exp(u)/(1 + exp(u)) as n approaches infinity. The upper limit of the average coverage probability is then given by
$$\text{cov.}\theta\text{.start} + \frac{\exp(u)}{1+\exp(u)}.$$
We therefore seek the n0 such that
$$\frac{\exp(u)}{1+\exp(u)} - \frac{\exp(u)\exp(v/n)}{1+\exp(u)\exp(v/n)} \le \epsilon \quad \text{for all } n \ge n_0. \tag{6.13}$$
In all three models, the expression
$$\frac{\exp(u)\exp(v/n)}{1+\exp(u)\exp(v/n)}$$
is increasing in n, since its derivative,
$$\frac{\exp(u)\exp(v/n)\,(-v/n^2)}{\left(1+\exp(u)\exp(v/n)\right)^2},$$
is positive throughout its range because v < 0 in the three models. Hence, once inequality (6.13) holds at n0, it holds for every n ≥ n0. We now derive an expression for n0 that applies to all three models. Assuming exp(u) > ε(1 + exp(u)), that is, that the possible improvement exceeds ε, inequality (6.13) is equivalent to
$$\frac{v}{n} \ge \log\frac{\exp(u) - \epsilon(1+\exp(u))}{\left(1+\epsilon(1+\exp(u))\right)\exp(u)},$$
and since both v and the logarithm on the right-hand side are negative,
$$n \ge v \Big/ \log\frac{\exp(u) - \epsilon(1+\exp(u))}{\left(1+\epsilon(1+\exp(u))\right)\exp(u)}.$$
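Once u and v are available, the limit and the bound n0 are direct to compute. The Python sketch below (function names ours) implements the upper limit, the fitted curve and the smallest integer n satisfying (6.13), assuming v < 0 as in the text; the values of u, v and cov.θ.start in the usage lines are hypothetical.

```python
import math


def upper_limit(u: float, cov_start: float) -> float:
    """Limit of cov = cov_start + expit(u + v/n) as n -> infinity."""
    return cov_start + math.exp(u) / (1.0 + math.exp(u))


def predicted_coverage(n: float, u: float, v: float, cov_start: float) -> float:
    """Fitted average coverage cov_start + expit(u + v/n) at sample size n."""
    eta = u + v / n
    return cov_start + math.exp(eta) / (1.0 + math.exp(eta))


def sample_size_n0(u: float, v: float, eps: float) -> int:
    """Smallest integer n satisfying inequality (6.13), assuming v < 0."""
    if v >= 0:
        raise ValueError("the derivation assumes v < 0")
    a = math.exp(u)
    numerator = a - eps * (1.0 + a)
    if numerator <= 0:
        # expit(u) <= eps: the whole possible improvement is below eps.
        return 1
    return math.ceil(v / math.log(numerator / ((1.0 + eps * (1.0 + a)) * a)))


u, v, c0, eps = -1.5, -80.0, 0.3, 0.01   # hypothetical fitted values
m = sample_size_n0(u, v, eps)
print(m, upper_limit(u, c0) - predicted_coverage(m, u, v, c0))
```

Because the fitted curve is increasing in n, checking (6.13) at n0 alone is enough; the usage lines confirm the gap to the limit is at most ε there.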
Tables 6.13 through 6.22 contain a collection of examples of coverage probability limits, together with the sample sizes needed for ε = 0.01 and ε = 0.02, as given by the three models.
[a1, b1]      [a2, b2]      c.bin  c.quasi
[0.65, 0.75]  [0.65, 0.75]  0.236  0.235
Table 6.13: The sample size, and the limit of the average coverage probability, given by the binomial and quasi models, when the prior distribution of θ is U[0, 0.1], the prior distribution of S is U[a1, b1], and the prior distribution of C is U[a2, b2]. The parameters are ε = 0.01, d = 0.01, and cov.θ.start = 0.2. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models. ss.bin, ss.quasi and ss.full are the sample sizes given by the same three models, respectively, after which the average coverage probability will not improve by more than ε = 0.01.

[a1, b1]  [a2, b2]  c.bin  c.quasi  c.full  ss.bin  ss.quasi
Table 6.14: The sample size, and the limit of the average coverage probability, given by the binomial and quasi models, when the prior distribution of θ is U[0, 0.1], the prior distribution of S is U[a1, b1], and the prior distribution of C is U[a2, b2]. The parameters are ε = 0.01, d = 0.02, and cov.θ.start = 0.4. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models. ss.bin, ss.quasi and ss.full are the sample sizes given by the same three models, respectively, after which the average coverage probability will not improve by more than ε = 0.01.
Table 6.16: The sample size, and the limit of the average coverage probability, given by the binomial and quasi models, when the prior distribution of θ is Beta(3,3) on the interval [0, 0.1], the prior distribution of S is Beta(3,3) on the interval [a1, b1], and the prior distribution of C is Beta(3,3) on the interval [a2, b2]. The parameters are ε = 0.01, d = 0.02, and cov.θ.start = 0.674. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models. ss.bin, ss.quasi and ss.full are the sample sizes given by the same three models, respectively, after which the average coverage probability will not improve by more than ε = 0.01.
Table 6.17: The sample size, and the limit of the average coverage probability, given by the binomial and quasi models, when the prior distribution of θ is Beta(3,3) on the interval [0, 0.1], the prior distribution of S is Beta(3,3) on the interval [a1, b1], and the prior distribution of C is Beta(3,3) on the interval [a2, b2]. The parameters are ε = 0.01, d = 0.03, and cov.θ.start = 0.884. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models. ss.bin, ss.quasi and ss.full are the sample sizes given by the same three models, respectively, after which the average coverage probability will not improve by more than ε = 0.01.
[a1, b1]      [a2, b2]      c.bin  c.quasi  c.full  ss.bin  ss.quasi  ss.full
[0.65, 0.75]  [0.65, 0.75]  0.907  0.906    0.905   118     133       128
[0.65, 0.75]  [0.75, 0.85]  0.921  0.921    0.919   275     252       237
[0.65, 0.75]  [0.85, 0.95]  0.936  0.936    0.938   397     366       378
[0.65, 0.75]  [0.9, 1]      0.943  0.943    0.949   453     419       463
Table 6.18: The sample size, and the limit of the average coverage probability, given by the binomial and quasi models, when the prior distribution of θ is U[0, 0.1], the prior distribution of S is U[a1, b1], and the prior distribution of C is U[a2, b2]. The parameters are ε = 0.02, d = 0.01, and cov.θ.start = 0.2. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models. ss.bin, ss.quasi and ss.full are the sample sizes given by the same three models, respectively, after which the average coverage probability will not improve by more than ε = 0.02.
Table 6.20: The sample size, and the limit of the average coverage probability, given by the binomial and quasi models, when the prior distribution of θ is U[0, 0.1], the prior distribution of S is U[a1, b1], and the prior distribution of C is U[a2, b2]. The parameters are ε = 0.02, d = 0.03, and cov.θ.start = 0.6. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models. ss.bin, ss.quasi and ss.full are the sample sizes given by the three models, respectively, after which the average coverage probability will not improve by more than ε = 0.02.
Table 6.21: The sample size, and the limit of the average coverage probability, given by the binomial and quasi models, when the prior distribution of θ is Beta(3,3) on the interval [0, 0.1], the prior distribution of S is Beta(3,3) on the interval [a1, b1], and the prior distribution of C is Beta(3,3) on the interval [a2, b2]. The parameters are ε = 0.02, d = 0.02, and cov.θ.start = 0.674. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models. ss.bin, ss.quasi and ss.full are the sample sizes given by the three models, respectively, after which the average coverage probability will not improve by more than ε = 0.02.
Table 6.22: The sample size, and the limit of the average coverage probability, given by the binomial and quasi models, when the prior distribution of θ is Beta(3,3) on the interval [0, 0.1], the prior distribution of S is Beta(3,3) on the interval [a1, b1], and the prior distribution of C is Beta(3,3) on the interval [a2, b2]. The parameters are ε = 0.02, d = 0.03, and cov.θ.start = 0.884. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models. ss.bin, ss.quasi and ss.full are the sample sizes given by the three models, respectively, after which the average coverage probability will not improve by more than ε = 0.02.
Tables 6.13 through 6.22 confirm that, in general, the posterior average coverage probability has an upper limit that is often much smaller than 1. This limit cannot be surpassed even with an infinite sample size. By fitting a logistic regression model to data consisting of the values of the posterior average coverage probabilities calculated at different values of the sample size, this upper limit can be closely approximated. The sample size needed for the posterior average coverage probability to be within ε of this upper limit increases when the intervals of definition of the prior densities of S and C are shifted to the right. For example, the results of the quasi-likelihood model in table 6.19 indicate that the sample size is 206 when the prior densities of the sensitivity and the specificity are both U[0.65,0.75], but it increases to 707 when these prior densities are both U[0.9,1]. This is because the upper limit of the posterior average coverage probability increases when the intervals of definition of the prior densities of S and C shift to the right, and larger sample sizes are needed to attain this higher upper limit.
For the various values of d and for the prior densities of S, C and θ considered in tables 6.13 through 6.22, we calculated the difference between the upper limit of the posterior average coverage probabilities and cov.θ.start. We call this difference the improvement of the posterior average coverage probability. Table 6.23 shows the maximum and minimum improvement over the different prior distributions of S and C. For example, if d = 0.03 and the prior distribution of θ is U[0, 0.1], the maximum improvement is 0.233 and the minimum improvement is 0.065. These maximum and minimum improvements correspond to prior distributions for the sensitivity and specificity being both U[0.65,0.75] or both U[0.9,1], respectively. When the prior distribution of θ is U[0, 0.1] and the prior distributions of S and C are uniform (first three rows of the table), the size of the improvement increases with increasing d. For example, the maximum improvement is 0.139 for d = 0.01, while it is 0.229 for d = 0.02.

This is not the case, however, in rows 4 and 5, where the prior distribution of θ is Beta(3,3) defined on the interval [0, 0.1], and the prior distributions of S and C are also Beta(3,3), defined on the intervals [a1, b1] and [a2, b2], respectively. Here the improvement does not increase with increasing values of d. For example, for d = 0.02 the maximum improvement is 0.165, while for d = 0.03 it is only 0.089. This reversal is probably due to the shape of the density function of a Beta(3,3) distribution defined on the interval [0, 0.1]. This density function is concentrated around the middle of its interval of support, since its mean is 0.05 and its standard deviation is approximately 0.0189. See Figure 6.6, which plots a Beta(3,3) density function defined on the interval [0, 0.1], together with the posterior density function when x = 94 is observed, computed using the SIR algorithm with k = r = 500000. We see from these plots that the improvement is concentrated in an interval of length approximately 0.03 around the mean 0.05. If d > 0.015 is considered, then the improvement will be smaller, since some probability will be lost around the edges of the interval and some gained around the middle of the interval.
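The spread of the Beta(3,3) prior on [0, 0.1] follows from the standard Beta moment formulas, mean α/(α + β) and variance αβ/((α + β)²(α + β + 1)), rescaled to the interval. The short Python sketch below (function name ours) evaluates them.

```python
import math


def scaled_beta_mean_sd(alpha: float, beta: float, a: float, b: float):
    """Mean and standard deviation of a Beta(alpha, beta) rescaled to [a, b]."""
    s = alpha + beta
    mean01 = alpha / s
    var01 = alpha * beta / (s * s * (s + 1))
    return a + (b - a) * mean01, (b - a) * math.sqrt(var01)


mean, sd = scaled_beta_mean_sd(3, 3, 0.0, 0.1)
print(mean, sd)
```

For Beta(3,3) the variance on [0, 1] is 9/(36 × 7) = 1/28, so on [0, 0.1] the standard deviation is 0.1/√28.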
Table 6.23: Maximum and minimum differences between the upper limit of the posterior average coverage probability and the prior coverage probability of θ. In the first three rows the prior distributions of S and C are U[a1, b1] and U[a2, b2], respectively, where the intervals [a1, b1] and [a2, b2] are the ones considered in tables 6.13 through 6.22. The parameter d is half the width of the posterior credible interval, prior.θ is the prior distribution of θ, cov.θ.start is the prior coverage probability of an interval of total length 2d centered at the prior mean of θ, and max.imp.quasi and min.imp.quasi are, respectively, the maximum and the minimum of the differences over the different values of a1, b1, a2 and b2.
row  d     prior.θ                cov.θ.start  max.imp.quasi  min.imp.quasi
1    0.01  U[0, 0.1]              0.2          0.139          0.035
2    0.02  U[0, 0.1]              0.4          0.229          0.063
3    0.03  U[0, 0.1]              0.6          0.233          0.065
4    0.02  Beta(3,3) on [0, 0.1]  0.674        0.165          0.040
5    0.03  Beta(3,3) on [0, 0.1]  0.884        0.089          0.022
[Figure 6.6 legend: prior density of theta; posterior density of theta]
Figure 6.6: Plot of the prior and posterior density functions of θ. The prior distribution of θ is Beta(3,3) defined on the interval [0, 0.1]. The prior distributions of S and C are both Beta(3,3) defined on the interval [0.9,1]. The posterior density function of θ is computed for x = 94, using a SIR algorithm with prior and posterior samples of sizes k = r = 500000.
Chapter 7
Practical Implications
The estimates of the posterior average coverage probabilities, and therefore also the upper limits of these coverage probabilities as the sample size approaches infinity, depend largely on the prior distributions for the sensitivity and specificity of the diagnostic test. The information on the sensitivity and the specificity of the test provided by the sample depends not only on the disease prevalence, but also on the disease status of each subject in the sample. Since individual disease status is only known if a gold standard test is applied to each subject, the sample will often carry less information on the sensitivity and the specificity of the test than on the disease prevalence. Therefore, it is important to gather as much information as possible on the sensitivity and specificity of the test prior to the start of the experiment, and to include this information in the prior distributions. It is crucial not only to derive point estimates of the sensitivity and specificity, but also to minimize the standard errors of these estimates, since these errors have a large impact on the accuracy of the results of the study. For example, the results presented in table 6.7 showed that the average coverage probability decreases with the introduction of uncertainty about the values of the sensitivity and specificity of the diagnostic test, even when this uncertainty lies entirely in the direction of higher values: a sensitivity that has a lower limit of 0.9 gives a poorer posterior average coverage probability for θ than a sensitivity that is exactly 0.9. Since one very rarely knows these values exactly, one must include the uncertainty surrounding these variables in the estimation procedure. Not accounting for this uncertainty may provide very misleading results. On the other hand, including unnecessarily large uncertainties may render the sample uninformative even if it is very large.

To illustrate how the novel methods presented in this thesis may be used in practice, in this chapter we present several examples based on real studies. First, an observation that will be used later is examined.
Example 7.0.4 Table 7.1 presents the sample size needed for the posterior average coverage probability to be within ε of its upper limit in a variety of examples, when only the ranges of support of the sensitivity and specificity are known. These values are compared to the sample size needed to reach the same posterior average coverage probability when both the sensitivity and the specificity of the diagnostic test are exactly known.
Table 7.1: Sample size needed to reach a desired posterior average coverage probability. The parameter d is half the length of the posterior credible interval, sens is the prior distribution of the sensitivity, spec is the prior distribution of the specificity, cov is the desired average coverage probability, taken here to be the limit of the average coverage probability given by the quasi model minus ε = 0.02, and ssize is the sample size required to reach the desired average coverage probability. The prior density of θ is U[0, 0.1].
d  sens  spec  cov  ssize
It is clear from table 7.1 that the sample size needed to reach a given posterior average coverage probability decreases substantially when the uncertainty around the sensitivity and the specificity is removed. For example, in the first and second rows of the table, the sample size needed to reach a posterior average coverage probability of 0.588 is 610, while this sample size decreases to only 93 when the sensitivity and specificity are known to be 0.8 and 0.95, respectively, all other conditions being held fixed.
In practice, two conclusions can be drawn. First, the properties of the diagnostic test used may severely limit the ability of studies to draw accurate conclusions. Second, it will often be much more useful to improve the knowledge about the test properties than to increase the sample size. Below we discuss a procedure to find a sample size, in the context of two examples of studies carried out in Montreal.
7.1 Procedure to find the sample size when the sensitivity and/or the specificity are unknown
Suppose that we would like to estimate the prevalence of a disease from the results of a diagnostic test applied to a sample from a population. As is obvious from the results of the previous section, it is important to determine the upper limit of the posterior average coverage probability of the prevalence, based on what is known about the properties of the diagnostic test. Then, assuming the potential improvement in the average coverage probability is worthwhile, the sample size needed to approach the upper limit can be determined. For example, tables 6.13 through 6.22 can be used if the conditions of the study are similar to the ones in these tables. If none of the situations reported in these tables apply, then we propose the use of the following procedure:
1. Estimate as accurately as possible the prior densities of S and C. This can be done by consulting with subject matter experts, by performing a search of the available literature, or by conducting new studies, if necessary. The latter may be especially important if there is doubt about whether previous estimates of the diagnostic test properties apply to the population in the present study.
2. Estimate the prior density of θ.
3. Estimate the prior mean of θ.
4. Choose d, half the desired width of the posterior credible interval of θ.
5. Select a positive number α such that 1 - α is the desired posterior average coverage probability.
6. Compute cov.θ.start, the prior coverage probability of an interval of length 2d centered at the prior mean of θ.
7. Use a SIR algorithm to approximate the values of the posterior average coverage probability of θ for several values of the sample size n. For example, values in the range from 100 to 2000 may often be appropriate.
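Step 7 is the computational core of the procedure. The following Python sketch shows one way to organise it, under assumptions that the text does not spell out: (i) the number of positive tests x is binomial with probability θS + (1 - θ)(1 - C), the usual model for an imperfect test; (ii) the credible interval of length 2d is centred at the posterior mean; and (iii) the average is taken over data sets drawn from the prior predictive distribution. It is not the author's SIR program; the function names and default sample sizes are ours.

```python
import math
import random


def sir_posterior_theta(x, n, draw_prior, k=100_000, r=20_000, rng=random):
    """Sampling importance resampling (SIR) for the prevalence theta.

    draw_prior(rng) returns one joint prior draw (theta, S, C). The likelihood
    assumes x ~ Binomial(n, p) with p = theta*S + (1 - theta)*(1 - C).
    Returns r resampled theta values, approximately from the posterior.
    """
    draws = [draw_prior(rng) for _ in range(k)]
    log_w = []
    for theta, s, c in draws:
        p = theta * s + (1.0 - theta) * (1.0 - c)
        p = min(max(p, 1e-12), 1.0 - 1e-12)          # keep the logarithms finite
        log_w.append(x * math.log(p) + (n - x) * math.log(1.0 - p))
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]   # likelihood up to a constant
    resampled = rng.choices(draws, weights=weights, k=r)
    return [theta for theta, _, _ in resampled]


def interval_coverage(thetas, d):
    """Posterior mass of [m - d, m + d], m = posterior mean (an assumed centring)."""
    m = sum(thetas) / len(thetas)
    return sum(abs(t - m) <= d for t in thetas) / len(thetas)


def average_coverage(n, d, draw_prior, n_rep=50, k=1000, r=1000, rng=random):
    """Average of interval_coverage over data sets simulated from the prior predictive."""
    total = 0.0
    for _ in range(n_rep):
        theta, s, c = draw_prior(rng)
        p = theta * s + (1.0 - theta) * (1.0 - c)
        x = sum(rng.random() < p for _ in range(n))   # x ~ Binomial(n, p)
        total += interval_coverage(sir_posterior_theta(x, n, draw_prior, k, r, rng), d)
    return total / n_rep


# Check against a known answer: a perfect test (S = C = 1) with a U[0, 1] prior
# gives a Beta(x + 1, n - x + 1) posterior with mean (x + 1) / (n + 2).
rng = random.Random(2024)
perfect_test = lambda g: (g.random(), 1.0, 1.0)
post = sir_posterior_theta(30, 100, perfect_test, rng=rng)
print(sum(post) / len(post), 31 / 102)

# Priors later used in Example 7.2.1: theta uniform, S and C Beta-distributed.
refugee_prior = lambda g: (g.random(), g.betavariate(4.44, 13.31), g.betavariate(71.25, 3.75))
print(average_coverage(400, 0.05, refugee_prior, n_rep=20, rng=rng))
```

Repeating average_coverage over a grid of n values (for example 100 to 2000 in steps of 100) gives the data to which the logistic model of the next step is fitted; the Monte Carlo error shrinks as k, r and n_rep grow.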
If in step 7 the desired posterior average coverage probability of 1 - α is reached, then n can be determined correspondingly. If, however, the posterior average coverage probabilities in step 7 seem to be stabilizing away from 1 - α, then the following steps are useful:
• Fit a logistic regression model with constant error variance to the data derived in step 7, with dependent variable cov - cov.θ.start and independent variable n. In general, models of the form u + v/n will provide a good fit. Alternative models can be of the form u + v1/n^k1 + v2/n^k2 + ... + vi/n^ki for some integer i, where k1, ..., ki are rational numbers.
• Compute an estimate of the upper limit of the posterior average coverage probability. For any of the models employed in the preceding step, cov.θ.start + exp(u)/(1 + exp(u)) approximates this upper limit.
• If this estimate is considered worthwhile, then choose a positive number ε such that the posterior average coverage probability is considered acceptable if it is within ε of the upper limit, and compute the sample size. The formula
$$n_0 = \left\lceil v \Big/ \log\frac{\exp(u) - \epsilon(1+\exp(u))}{\left(1+\epsilon(1+\exp(u))\right)\exp(u)} \right\rceil$$
can be used for models of the form u + v/n.
We now apply this algorithm to two examples:
Examples
Example 7.2.1 Joseph et al [1995] considered the problem of estimating the prevalence of Strongyloides infection among Kampuchean refugees arriving in Montreal, Canada, in 1982-1983. One diagnostic test that can be used to detect Strongyloides is a stool examination. This test is known to have high specificity, but low sensitivity. We will estimate the upper limit of the posterior average coverage probability for θ using an interval width of 2d = 0.1. In general, d would be selected to match the objectives of the study. We will also calculate the sample size needed for the posterior average coverage probability of θ to be within ε of this upper limit, for a predetermined ε. Prior distributions for θ, S and C were given in Joseph et al [1995], where the prior distribution for the sensitivity was Beta(4.44, 13.31) and the prior distribution for the specificity was Beta(71.25, 3.75), both on the interval [0, 1]. No prior information was available on θ prior to the start of the experiment, so a U[0, 1] prior distribution is assumed. We let d = 0.05, so that cov.θ.start = 0.1, and compute the average coverage probability for values of n ranging from 100 to 2000 in increments of 100. To carry out the computations, we use the SIR program as described in section 7.1, with both initial and resampling sizes of 1000. The results are presented in table 7.2.
Table 7.2: Posterior average coverage probabilities when d = 0.05, the prior distribution of the specificity is Beta(71.25, 3.75), the prior distribution of the sensitivity is Beta(4.44, 13.31), and the prior distribution of θ is U[0, 1]. cov is the posterior average coverage probability of θ for sample size n.
The results from table 7.2 indicate that the posterior average coverage probability seems to have an upper limit of approximately 0.2. We fitted a logistic model of the form u + v/n to the data in table 7.2. The S-plus program that was used to fit this model is given in Appendix G. The coefficients are u = -2.219 and v = -47.153, so that the model is
$$\log\frac{cov - 0.1}{1.1 - cov} = -2.219 - \frac{47.153}{n}.$$
Therefore, the upper limit of the average coverage probability is approximately
$$0.1 + \frac{\exp(-2.219)}{1 + \exp(-2.219)} \approx 0.198.$$
The sample size needed for the posterior average coverage probability to be within ε = 0.02 of this upper limit, that is, to reach an average coverage of 0.178, is
$$n_0 = \left\lceil -47.153 \Big/ \log\frac{\exp(-2.219) - 0.02\,(1+\exp(-2.219))}{\left(1+0.02\,(1+\exp(-2.219))\right)\exp(-2.219)} \right\rceil,$$
where ⌈t⌉ denotes the smallest integer greater than or equal to t. The plots of the fit and of the residuals of this model are shown in Figure 7.1. We hypothesize that virtually all of
the residual error is due to the SIR approximation. We verified this hypothesis by more accurately estimating the average coverage probability in this example when n = 400. The posterior average coverage probability for n = 400 given in table 7.2 is 0.202, while the corresponding value given by the model is 0.188. We recalculated the posterior average coverage probability for n = 400 using a SIR algorithm with prior and posterior samples of size k = r = 5000. The average of 20 repetitions of this procedure should be very close to the exact probability. We obtained a value of 0.188, which very closely matches the value predicted by the model. Therefore, we conclude that most of the residual error from the model is due to random errors from the SIR approximation, and that the model in fact provides average coverage probabilities that are quite close to the true probabilities.
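The arithmetic of this example can be reproduced from the two fitted coefficients alone. The Python sketch below (variable names ours) evaluates the upper limit, the model's prediction at n = 400, and the bound n0 for ε = 0.02 from the sample-size formula for models of the form u + v/n.

```python
import math

u, v, cov_start, eps = -2.219, -47.153, 0.1, 0.02   # coefficients of Example 7.2.1


def expit(eta: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


limit = cov_start + expit(u)                  # upper limit as n -> infinity
pred_400 = cov_start + expit(u + v / 400)     # model prediction at n = 400
a = math.exp(u)
n0 = math.ceil(v / math.log((a - eps * (1 + a)) / ((1 + eps * (1 + a)) * a)))
print(round(limit, 3), round(pred_400, 3), n0)   # prints: 0.198 0.188 189
```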
Obviously, reporting at best a posterior credible set with probability 0.2 is not useful in practice. This indicates that estimating the prevalence of Strongyloides infection with a single stool examination is not worthwhile, even with a very large sample size. Therefore, in order to design a worthwhile experiment, the investigators must find a way to greatly improve the properties of this test (as may be possible by taking repeated stool samples from each subject), or find a different test. (In fact, a combination of stool examination and a serologic test was used. See Joseph et al [1995] for details.)
Figure 7.1: Fit by a quasi model to the refugee data presented in table 7.2.
Example 7.2.2 Here we will consider again the previous example, but vary the prior densities of the sensitivity and the specificity across a reasonable range. We again assume that the prior density of θ is U[0, 1]. For each set of prior densities, we computed the posterior average coverage probabilities for values of n ranging from 100 to 2000, in increments of 100, using the SIR algorithm with 1000 points. The results are given in table 7.3. For ease of comparison, the second column of this table displays the results of Example 7.2.1.
Table 7.3: Posterior average coverage probability of θ for different prior densities for the sensitivity and specificity. Throughout, the prior density of θ is U[0, 1], and d = 0.05. cov1, cov2, cov3, cov4, cov5 and cov6 are the posterior average coverage probabilities when the prior distributions of the sensitivity are Beta(4.44, 13.31), U[0.2,0.3], U[0.2,0.5], U[0.2,0.25], U[0.45,0.5] and 0.2, respectively, and the prior distributions of the specificity are Beta(71.25, 3.75), U[0.95,1], 1, 1, 1 and 1, respectively.
Columns 4 and 5 of table 7.3 demonstrate that the posterior average coverage probability increases when the prior range of the sensitivity narrows, even when it narrows toward its lower end point. This is seen, for example, in comparing column 5 to column 7. It is often better to have a test with poorer but more accurately known properties than a test with better but less accurately known properties. It is interesting to note, however, that this phenomenon is seen here for n ≥ 400, but not for lower values of n. This is probably because with small samples the uncertainty around p is large, so that the uncertainty around the sensitivity has a smaller overall impact. With larger sample sizes, the uncertainty around p is smaller, so that the degree of knowledge about the sensitivity has a greater effect.
In the last column, the sensitivity and specificity are both considered to be exactly
known. From section 6.1 of chapter 6, we know that the upper limit of the posterior
average coverage probability is 1, and an algorithm to compute the exact sample size
required for the posterior average coverage probability of θ to reach a predetermined
value is available.
Example 7.2.3 In this example we examine the sample size requirements of a study
planned by Dr. Theresa Gyorkos at the Montreal General Hospital. The objective of
the study is to estimate the prevalence of Toxoplasma gondii among pregnant women
in the province of Quebec. A kit that detects the presence of antibodies will be used
as the diagnostic test. The prior densities for the sensitivity and specificity of this test were estimated to be S ~ Beta(65, 3) and C ~ Beta(22.1, 0.1), respectively. These estimates were derived from a detailed study by Wilson and Ware [1991]. The prior density of the prevalence was taken to be θ ~ U[0, 1], since very little is known about the prevalence of Toxoplasma gondii in Quebec. We computed the sample size needed for the average posterior coverage probability of credible intervals of total length 0.08 and 0.1, respectively, centered at the posterior mean of θ, to be at least 0.95. The results are displayed in table 7.4.
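One quick way to see how much better characterized the Toxoplasma antibody test is than the single stool examination for Strongyloides is to compare the mean and standard deviation of each Beta prior. The sketch below uses only the standard Beta moments; the Strongyloides parameters are those listed for cov1 in Table 7.3, and the dictionary labels and function name are our own.

```python
import math


def beta_mean_sd(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1))
    return mean, math.sqrt(var)


# Beta priors quoted in Examples 7.2.1 to 7.2.3 (labels are illustrative)
priors = {
    "Strongyloides sensitivity, Beta(4.44, 13.31)": (4.44, 13.31),
    "Strongyloides specificity, Beta(71.25, 3.75)": (71.25, 3.75),
    "Toxoplasma sensitivity, Beta(65, 3)": (65, 3),
    "Toxoplasma specificity, Beta(22.1, 0.1)": (22.1, 0.1),
}

for label, (a, b) in priors.items():
    m, s = beta_mean_sd(a, b)
    print(f"{label}: mean = {m:.4f}, sd = {s:.4f}")
```

The Strongyloides sensitivity prior is centred near 0.25 with a standard deviation near 0.1, whereas both Toxoplasma priors are concentrated within a few hundredths of their means, which is consistent with the much smaller sample sizes found in this example.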
Table 7.4: Posterior average coverage probability of θ. The prior density of θ is U[0, 1]. The prior densities of S and C are Beta(65, 3) and Beta(22.1, 0.1), respectively. Cov.04 and cov.05 are the posterior average coverage probabilities calculated for d = 0.04 and d = 0.05, respectively, where d is half the length of the posterior credible interval.
In table 7.4 we see that for d = 0.04 (column 1) the required sample size is approximately 1500. Since the computations done using the SIR algorithm are based on samples and therefore involve some random error, we repeated the calculations several times. The results are displayed in table 7.5, from which we conclude that the sample size needed for the posterior average coverage probability of a credible interval of length 0.08 to be at least 0.95 is approximately 1500.
Table 7.5: Posterior average coverage probability of θ. The parameter n is the sample size. The prior density of θ is U[0, 1]. The prior densities of S and C are Beta(65, 3) and Beta(22.1, 0.1), respectively. The length of the posterior credible interval is 0.08.
For the data displayed in column 2 of table 7.4, that is for d = 0.05, we again repeated the calculations for n = 500, n = 550 and n = 600. The results are displayed in table 7.6, which shows that the sample size needed for the posterior average coverage probability of a credible interval of length 0.1 to be at least 0.95 is approximately 550. Therefore, unlike the previous example, the properties of the diagnostic test for Toxoplasma gondii are sufficiently well known that narrow credible sets can be expected from the study.
Table 7.6: Posterior average coverage probability of θ. The parameter n is the sample size. The prior density of θ is U[0, 1]. The prior densities of S and C are Beta(65, 3) and Beta(22.1, 0.1), respectively. The length of the posterior credible interval is 0.1.
Chapter 8
Discussion
Estimating the prevalence of a disease in a given population is the aim of many
studies. When an error-free diagnostic test will be applied to each subject in a sample,
standard binomial formulas can be used to determine the sample size required to
estimate the prevalence to any desired accuracy. Unfortunately, perfect gold standard
tests are very rare in practice. In general, one does not know the exact values of the
sensitivity and specificity of an imperfect diagnostic test, so that the classical binomial
formulas for sample size cannot be applied.
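For reference, the classical normal-approximation formula n = z² p(1 − p)/d² can be sketched as follows. The optional rescaling for a test with exactly known sensitivity and specificity is only the simple one implied by the relation p = θS + (1 − θ)(1 − C) used in the appendix programs (a half-width d on θ is d(S + C − 1) on p); it is an illustration, not necessarily the adjusted criterion developed earlier in the thesis. The function name and defaults are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(p, d, alpha=0.05, sens=1.0, spec=1.0):
    """Normal-approximation sample size for estimating a prevalence to within
    +/- d with probability 1 - alpha.

    p is a guess at the probability of a positive test (0.5 is the worst case).
    With sens = spec = 1 this is the classical binomial formula.  For a test
    with known sens and spec, p = theta*sens + (1 - theta)*(1 - spec), so a
    half-width d on theta corresponds to d*(sens + spec - 1) on p.
    """
    if sens + spec <= 1:
        raise ValueError("need sens + spec > 1")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    d_p = d * (sens + spec - 1)  # half-width on the scale of p
    return math.ceil(z ** 2 * p * (1 - p) / d_p ** 2)


print(sample_size(0.5, 0.05))                      # 385
print(sample_size(0.5, 0.05, sens=0.9, spec=0.9))  # 601
```

Even when S and C are known exactly, an imperfect test inflates the required sample size by the factor 1/(S + C − 1)²; when they are uncertain, this formula no longer applies, which is the situation the Bayesian criterion addresses.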
In this thesis, we provided methods for determining the sample size necessary for
the estimation of disease prevalence to within a given accuracy. We first presented
an adjustment to a standard frequentist criterion, useful when the sensitivity and
specificity of the diagnostic test are exactly known. When the sensitivity and speci-
ficity are not known, using a Bayesian method, we showed that it is important when
planning a study to:
1. Estimate the sensitivity and specificity of the diagnostic test as accurately as
possible.
2. Calculate the upper limit of the posterior average coverage probability, using
a method provided in the thesis, to determine if a study using this test is
worthwhile. If yes, one can proceed to the calculation of the sample size using
a procedure similar to that of chapter 7. We therefore stress the importance of
estimating the sensitivity and specificity of a diagnostic test as accurately as
possible before the start of any study.
In this thesis we investigated situations where a dichotomous diagnostic test is to
be used. This situation can be applied more generally, however, since any test with
categorical or continuous outcomes can be dichotomized. Obviously, there are many
other situations for which similar methods can be developed. While we investigated
the case where the sensitivity and specificity of the diagnostic test and the prevalence
of the disease are a priori independent, similar methods can be developed in the case
where they are dependent. For example, in investigating parasitic infections, a subject
with a higher degree of infection may be more likely to test positive on two or more
tests compared to a subject with a lesser degree of infection.
Although this study focused on estimating the prevalence, similar methods could
be applied to estimating test properties. To estimate the sensitivity, for example, one
needs a sample of known positive subjects. If a gold standard test is available, such
a cohort can be assembled, and the problem can be viewed as the classical one of
estimating a binomial proportion. When a gold standard test does not exist, so that
there may be some false positives in the sample, methods similar to the ones developed
in this thesis can be found to determine the sample sizes required to estimate the
sensitivity and specificity of a diagnostic test to within a given accuracy. Similar
methods can also be developed to estimate the positive predictive value, the negative
predictive value and the likelihood ratios of a diagnostic test. For these problems,
one usually needs good prior information on the probabilities that each subject is
truly positive or negative.
Further work should also include sample size determination for studies where two
or more diagnostic tests will be used. This would be useful when no single test provides adequate sensitivity and specificity for accurate estimation, but combinations of tests offer hope. Finally, in this thesis we looked at only one of many possible Bayesian sample size criteria. Other sample size criteria, such as "average length" criteria or conservative criteria such as "worst outcome" criteria (see chapter 1), could be investigated, as well as decision theoretic criteria.
Throughout this thesis, we have assumed random samples from the relevant populations. Another area worthy of investigation would be to consider non-random samples, which can occur in clinic-based studies, when only certain subgroups of the target population may attend a clinic. Therefore, in addition to misclassification, one must handle potential selection biases.
Finally, work should be done on methods for prior elicitation for all parameters involved in diagnostic test situations, since we have shown that these prior distributions can have an enormous effect on the results.
We hope that the work presented in this thesis will convince researchers of the
importance of knowing the properties of a diagnostic test as accurately as possible,
before using it to estimate the prevalence of the disease, and that the methods for
sample size estimation in the absence of a gold standard test will prove useful.
Appendix A
Splus program to find the AMLE
and the confidence interval for the
thetahat<-function(size,theta,alpha,spec,sens,n){
# probability of a positive test result
p<-theta*sens+(1-theta)*(1-spec)
x<-rbinom(size,n,p)
pro<-(1-theta)^n
# normal approximation to P(x/n <= 1-spec), i.e. to P(MLE truncated at 0)
MLE0<-pnorm((1-spec-p)/(sqrt(p*(1-p)/n)))
sigma<-spec*(1-spec)/n
MX<-sqrt(sigma)*exp(-0.5*(1-spec-x/n)^2/sigma)/(sqrt(2*3.1416)*pnorm((x/n-1+spec)/sqrt(sigma)))
chat<-1-spec-MX
AMLE<-(x/n-chat)/(sens-chat)
# unrestricted estimate; the MLE is then truncated at 0
MLE<-(x/n-1+spec)/(sens+spec-1)
for(i in 1:size){if(MLE[i] <= 0) AMLE[i]<-AMLE[i] else AMLE[i]<-MLE[i]}
for(i in 1:size){
if(MLE[i] > 0) MLE[i]<-MLE[i] else MLE[i]<-0}
# Wilson score interval for p, mapped to the theta scale
delta<-(2*x/n+(qnorm(1-alpha/2))^2/n)^2-4*(x/n)^2*(1+((qnorm(1-alpha/2))^2)/n)
p1<-(2*x/n+(qnorm(1-alpha/2))^2/n+sqrt(delta))/(2*(1+qnorm(1-alpha/2)^2/n))
theta1<-(p1-1+spec)/(sens+spec-1)
mserror.MLE<-sqrt(mean((MLE-theta)^2))
mserror.AMLE<-sqrt(mean((AMLE-theta)^2))
diff<-AMLE-MLE
diff.positif<-diff[AMLE > MLE]
len<-length(diff.positif)
mAMLE<-sum(diff.positif)/len
mserror.diff.MLE<-theta
mserror.diff.AMLE<-sqrt(sum((diff.positif-theta)^2)/len)
closer.prop<-sum(as.numeric(abs(diff.positif-theta)<theta))/len
delta<-(2*x/n+(qnorm(1-alpha/2))^2/n)^2-4*(x/n)^2*(1+((qnorm(1-alpha/2))^2)/n)
p1<-(2*x/n+(qnorm(1-alpha/2))^2/n-sqrt(delta))/(2*(1+qnorm(1-alpha/2)^2/n))
p2<-(2*x/n+(qnorm(1-alpha/2))^2/n+sqrt(delta))/(2*(1+qnorm(1-alpha/2)^2/n))
theta1.m2<-(p1-1+spec)/(sens+spec-1)
d[x/n<=1-spec]<-qnorm(1-alpha/2)*sqrt(spec*(1-spec)/n)
percent.m1<-sum(as.numeric(theta1.m1 <= theta & theta <= theta2.m1))/size
percent.m2<-sum(as.numeric(theta1.m2 <= theta & theta <= theta2.m2))/size
percent.m1.null<-sum(as.numeric(theta1.m1 <= theta & theta <= theta2.m1 &
AMLE > MLE))/len
percent.m2.null<-sum(as.numeric(theta1.m2 <= theta & theta <= theta2.m2 &
AMLE > MLE))/len
percent.m1.pos<-sum(as.numeric(theta1.m1 <= theta & theta <= theta2.m1 &
AMLE == MLE))/(size-len)
percent.m2.pos<-sum(as.numeric(theta1.m2 <= theta & theta <= theta2.m2 &
AMLE == MLE))/(size-len)
return(percent.m1,percent.m1.pos,percent.m2.pos,percent.m1.null,
percent.m2.null,
percent.m2,mAMLE,len,pro,MLE0,
mserror.MLE,mserror.AMLE,mserror.diff.MLE,mserror.diff.AMLE,closer.prop,
theta1.m1,theta2.m1,theta1.m2,theta2.m2)}
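The core of the program above can be sketched in Python for readers without Splus. This is a minimal version under stated assumptions: sens and spec are known, the MLE of θ is truncated to [0, 1] (the Splus code truncates only at 0; the upper truncation is our addition), and the interval for θ is the Wilson score interval for p mapped through p = θS + (1 − θ)(1 − C), using the same delta, p1 and p2 algebra as above. The AMLE correction is not reproduced, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def truncated_mle(x, n, sens, spec):
    """MLE of the prevalence from x positives out of n, truncated to [0, 1]."""
    est = (x / n - 1 + spec) / (sens + spec - 1)
    return min(max(est, 0.0), 1.0)


def wilson_theta_interval(x, n, sens, spec, alpha=0.05):
    """Wilson score interval for p, mapped to theta via
    p = theta*sens + (1 - theta)*(1 - spec)."""
    z2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    phat = x / n
    centre = 2 * phat + z2 / n
    delta = centre ** 2 - 4 * phat ** 2 * (1 + z2 / n)  # always >= 0
    denom = 2 * (1 + z2 / n)
    p_lo = (centre - math.sqrt(delta)) / denom
    p_hi = (centre + math.sqrt(delta)) / denom

    def to_theta(p):
        return (p - 1 + spec) / (sens + spec - 1)

    return to_theta(p_lo), to_theta(p_hi)


def simulate_coverage(theta, n, sens, spec, reps=2000, alpha=0.05, seed=1):
    """Monte Carlo coverage of the mapped Wilson interval at a fixed theta."""
    rng = random.Random(seed)
    p = theta * sens + (1 - theta) * (1 - spec)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))  # one binomial draw
        lo, hi = wilson_theta_interval(x, n, sens, spec, alpha)
        hits += lo <= theta <= hi
    return hits / reps


print(simulate_coverage(0.2, 200, 0.9, 0.95))
```

Because the map from p to θ is increasing whenever S + C > 1, the mapped interval covers θ exactly when the Wilson interval covers p.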
Appendix B
Splus program to calculate the
average coverage probability when
both the sensitivity and the
specificity are known
In the following, we display an Splus program to calculate the average coverage probability when both the sensitivity and the specificity are known, and the prior distribution of θ is U[a, b]. A similar program can be written when the prior distribution of θ is Beta(u, v) on an interval [a1, b1].
n<- # the sample size #
f<-function(p,x){dbinom(x,n,p)}
g<-function(p,x){dbinom(x+1,n+1,p)}
marg<-rep(-1,n+1)
differencebeta<-rep(-1,n+1)
if(is.null(y1)==F){for(i in y1){phat1[i]<-integrate(g,lower=a1,upper=b1,x=(i-1))$integral
phatnew[i]<-i/(n+1)*phat1[i]}}
for(i in y2){
phat1[i]<-differencebeta1[i]
phatnew[i]<-differencebeta1[i]}
for(i in t){
phat[i]<-phatnew[i]/marg[i]
l1[i]<-max(a1,(phat[i]-d*(sens+spec-1)))
l2[i]<-min(b1,(phat[i]+d*(sens+spec-1)))
differencebeta2[i]<-1/(n+1)*(pbeta(l2[i],i,n+2-i)-pbeta(l1[i],i,n+2-i))}
z1<-t[differencebeta2==0 | differencebeta2==1]
z2<-t[differencebeta2!=0 & differencebeta2!=1]
if(is.null(z1)==F){for(i in z1){
posterior[i]<-integrate(f,lower=l1[i],upper=l2[i],x=i-1)$integral}}
for(i in z2){
posterior[i]<-differencebeta2[i]}
average.prob<-sum(posterior)/l
return(n,marg,average.prob)}
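The computation in the Splus program above can be sketched in Python as follows. The sketch assumes θ ~ U[a, b] with S and C known, so that p = θ(S + C − 1) + 1 − C is uniform on [a1, b1]; it replaces the pbeta calls with composite Simpson integration of the binomial likelihood (our substitution, to avoid third-party libraries). Function names are hypothetical.

```python
import math


def simpson(f, lo, hi, m=200):
    """Composite Simpson rule for f on [lo, hi] with m (even) subintervals."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def average_coverage_known(n, d, sens, spec, a=0.0, b=1.0, m=200):
    """Posterior average coverage of theta_hat +/- d when sens and spec are
    known and theta ~ U[a, b]; p = theta*k + 1 - spec, k = sens + spec - 1."""
    k = sens + spec - 1
    if k <= 0 or b <= a:
        raise ValueError("need sens + spec > 1 and b > a")
    a1, b1 = a * k + 1 - spec, b * k + 1 - spec
    total = 0.0
    for x in range(n + 1):
        coef = math.comb(n, x)

        def lik(p, x=x, coef=coef):
            return coef * p ** x * (1 - p) ** (n - x)

        mass = simpson(lik, a1, b1, m)                     # (b1 - a1) * m(x)
        p_hat = simpson(lambda p: p * lik(p), a1, b1, m) / mass
        # theta_hat +/- d on the theta scale is p_hat +/- d*k on the p scale
        lo, hi = max(a1, p_hat - d * k), min(b1, p_hat + d * k)
        total += simpson(lik, lo, hi, m)                   # (b1 - a1) * m(x) * cov_x
    return total / (b1 - a1)


print(average_coverage_known(100, 0.05, 0.9, 0.95))
```

Dividing the summed interval integrals by b1 − a1 gives Σ cov_x m(x) directly, which mirrors sum(posterior)/l in the Splus code.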
Splus program to compute the sample size, through a bisectional search,
when the sensitivity and specificity are known.
phat[i]<-differencebeta[i]/marg[i]
if(l1[i] < a1) lowerlimit[i]<-a1 else lowerlimit[i]<-l1[i]
if(l2[i] > b1) upperlimit[i]<-b1 else upperlimit[i]<-l2[i]
posterior[i]<-1/(n+1)*(pbeta(upperlimit[i],i,n+2-i)-pbeta(lowerlimit[i],i,n+2-i))
average.probstart<-average.prob
if(average.probstart < 1-alpha){n2<-ceiling(1*n1)}
phat[i]<-differencebeta[i]/marg[i]
l1[i]<-phat[i]-d*(sens+spec-1)
l2[i]<-phat[i]+d*(sens+spec-1)
if(l1[i] < a1) lowerlimit[i]<-a1 else lowerlimit[i]<-l1[i]
if(l2[i] > b1) upperlimit[i]<-b1 else upperlimit[i]<-l2[i]
posterior[i]<-1/(n+1)*(pbeta(upperlimit[i],i,n+2-i)-
pbeta(lowerlimit[i],i,n+2-i))
post<-posterior[t]
if(average.prob == 1-alpha){nnew<-n2}
n3<-nnew
t<-x[marg != 0]
lowerlimit<-rep(-1,n+1)
upperlimit<-rep(-1,n+1)
posterior<-rep(-1,n+1)
phat<-rep(-1,n+1)
l1<-rep(-1,n+1)
l2<-rep(-1,n+1)
differencebeta<-x/((n+2)*(n+1))*(pbeta(b1,x+1,n+2-x)-pbeta(a1,x+1,n+2-x))
for(i in t){
phat[i]<-differencebeta[i]/marg[i]
l1[i]<-phat[i]-d*(sens+spec-1)
l2[i]<-phat[i]+d*(sens+spec-1)
if(l1[i] < a1) lowerlimit[i]<-a1 else lowerlimit[i]<-l1[i]
if(l2[i] > b1) upperlimit[i]<-b1 else upperlimit[i]<-l2[i]
posterior[i]<-1/(n+1)*(pbeta(upperlimit[i],i,n+2-i)-
pbeta(lowerlimit[i],i,n+2-i))}
post<-posterior[t]
average.prob<-sum(post)/l
if(average.prob == 1-alpha) {break}
else if(average.prob > 1-alpha) {nstop<-(n3-1)
n<-nstop
x<-c(1:(n+1))
a1<-a*(sens+spec-1)+1-spec
b1<-b*(sens+spec-1)+1-spec
l<-b1-a1
marg<-1/(n+1)*(pbeta(b1,x,n+2-x)-pbeta(a1,x,n+2-x))
t<-x[marg != 0]
phat<-rep(-1,n+1)
l1<-rep(-1,n+1)
l2<-rep(-1,n+1)
differencebeta<-x/((n+2)*(n+1))*(pbeta(b1,x+1,n+2-x)-pbeta(a1,x+1,n+2-x))
for(i in t){
phat[i]<-differencebeta[i]/marg[i]
l1[i]<-phat[i]-d*(sens+spec-1)
l2[i]<-phat[i]+d*(sens+spec-1)
if(l1[i] < a1) lowerlimit[i]<-a1 else lowerlimit[i]<-l1[i]
if(l2[i] > b1) upperlimit[i]<-b1 else upperlimit[i]<-l2[i]
posterior[i]<-1/(n+1)*(pbeta(upperlimit[i],i,n+2-i)-
pbeta(lowerlimit[i],i,n+2-i))}
post<-posterior[t]
average.probstop<-sum(post)/l
if(average.probstop < 1-alpha || nstop==1) {break}}
if(average.prob < 1-alpha && average.probold < 1-alpha &&
average.probstart < 1-alpha) {nnew<-ceiling(2*n3)
average.probstart<-average.probold
if(average.prob > 1-alpha && average.probold > 1-alpha &&
average.probstart > 1-alpha) {nnew<-ceiling(n3/2)
average.probstart<-average.probold
average.probold<-average.prob
n1<-n2
n2<-n3
n3<-nnew}
else
if(average.prob > 1-alpha && average.probold > 1-alpha &&
average.probstart < 1-alpha) {nnew<-ceiling((n1+n3)/2)
average.probstart<-average.probstart
average.probold<-average.prob
n1<-n1
n2<-n3
n3<-nnew}
else
if(average.prob > 1-alpha && average.probold < 1-alpha &&
average.probstart > 1-alpha) {nnew<-ceiling((n3+n2)/2)
average.probstart<-average.probold
average.probold<-average.prob
n1<-n2
n2<-n3
n3<-nnew}
else
if(average.prob < 1-alpha && average.probold > 1-alpha &&
average.probstart > 1-alpha) {nnew<-ceiling((n3+n2)/2)
average.probstart<-average.probold
average.probold<-average.prob
n1<-n2
n2<-n3
n3<-nnew}
else
if(average.prob > 1-alpha && average.probold < 1-alpha &&
average.probstart < 1-alpha) {nnew<-ceiling((n3+n2)/2)
average.probstart<-average.probold
average.probold<-average.prob
n1<-n2
n2<-n3
n3<-nnew}
else
if(average.prob < 1-alpha && average.probold < 1-alpha &&
average.probstart > 1-alpha) {nnew<-ceiling((n1+n3)/2)
average.probstart<-average.probstart
average.probold<-average.prob
n1<-n1
n2<-n3
n3<-nnew}
else
if(average.prob < 1-alpha && average.probold > 1-alpha &&
average.probstart < 1-alpha) {nnew<-ceiling((n3+n2)/2)
average.probstart<-average.probold
average.probold<-average.prob
n1<-n2
n2<-n3
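The control flow above brackets the required n by doubling or halving and then narrows the bracket by bisection. A compact Python sketch of the same idea follows, assuming the average coverage probability is nondecreasing in n (in practice it can wobble slightly because x is discrete). The coverage function passed in is a placeholder: `toy_coverage` is a hypothetical stand-in based on the normal approximation for a perfect test, not one of the thesis's criteria.

```python
import math
from statistics import NormalDist


def smallest_n(coverage, target, n_start=10, n_max=100_000):
    """Smallest n with coverage(n) >= target, assuming coverage is
    nondecreasing in n.  Doubles n to bracket the answer, then bisects."""
    lo, hi = 0, n_start
    while coverage(hi) < target:
        lo, hi = hi, 2 * hi
        if hi > n_max:
            raise ValueError("target not reached by n_max")
    # invariant: lo fails (or lo == 0) and coverage(hi) >= target
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if coverage(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


def toy_coverage(n, d=0.05, p=0.5):
    """Normal-approximation coverage of p_hat +/- d for a perfect test
    (hypothetical stand-in for an average coverage routine)."""
    z = d * math.sqrt(n / (p * (1 - p)))
    return 2 * NormalDist().cdf(z) - 1


print(smallest_n(toy_coverage, 0.95))  # 385
```

Any of the average coverage routines in this thesis could be passed as `coverage`; each call is expensive, so the doubling phase keeps the number of evaluations logarithmic in the answer.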
Appendix C
Computing the posterior average
coverage probability when the
specificity is known
In this appendix, we display the programs needed to compute the posterior average coverage probability when the specificity is known, the prior distribution of the sensitivity is U[a1, b1] and the prior distribution of θ is U[a, b].
1. The Maple program to compute the integrals with respect to s, c
being held constant.
c is fixed, a > 0
s1:=(p-(1-a)*(1-c))/a;
s2:=(p-(1-b)*(1-c))/b;
g1:=int(1/(s+c-1),s=a1..s1);
g2:=int(1/(s+c-1),s=a1..b1);
g3:=int(1/(s+c-1),s=s2..b1);
g4:=int(1/(s+c-1),s=s2..s1);
gp1:=int(1/(s+c-1)^2,s=a1..s1);
gp2:=int(1/(s+c-1)^2,s=a1..b1);
gp3:=int(1/(s+c-1)^2,s=s2..b1);
gp4:=int(1/(s+c-1)^2,s=s2..s1);
c is fixed, a = 0
g2:=int(1/(s+c-1),s=a1..b1);
g3:=int(1/(s+c-1),s=s2..b1);
gp2:=int(1/(s+c-1)^2,s=a1..b1);
gp3:=int(1/(s+c-1)^2,s=s2..b1);
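These Maple integrals have elementary closed forms, which can be used to check the Splus functions g1 to g4, gp2 and gp3 that follow (those functions also carry the factor dbinom(x, n, p), and gp2 and gp3 the factor p − 1 + c). Assuming s + c − 1 > 0 throughout, the antiderivatives of 1/(s + c − 1) and 1/(s + c − 1)² are ln(s + c − 1) and −1/(s + c − 1), and from the definitions of s1 and s2:

```latex
\begin{aligned}
s_1 + c - 1 &= \frac{p - (1-a)(1-c)}{a} + c - 1 = \frac{p + c - 1}{a},
\qquad s_2 + c - 1 = \frac{p + c - 1}{b},\\
g_1 &= \ln\frac{p + c - 1}{a} - \ln(a_1 + c - 1), \qquad
g_2 = \ln(b_1 + c - 1) - \ln(a_1 + c - 1),\\
g_3 &= \ln(b_1 + c - 1) - \ln\frac{p + c - 1}{b}, \qquad
g_4 = \ln\frac{p + c - 1}{a} - \ln\frac{p + c - 1}{b},\\
gp_2 &= \frac{1}{a_1 + c - 1} - \frac{1}{b_1 + c - 1}
      = \frac{b_1 - a_1}{(a_1 + c - 1)(b_1 + c - 1)},\\
gp_3 &= \frac{1}{s_2 + c - 1} - \frac{1}{b_1 + c - 1}
      = \frac{b}{p + c - 1} - \frac{1}{b_1 + c - 1}.
\end{aligned}
```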
An Splus program was then written to integrate these functions with respect to p
and calculate the posterior average coverage probability.
2. Splus program to calculate the exact posterior average coverage
probability when the specificity is known, the prior distribution of the
sensitivity is U[a1, b1] and the prior distribution of θ is U[0, b].
g1<-function(p,x,a,b,a1,b1,c0){dbinom(x,n,p)*
(log((p+c0-1)/a)-log(a1+c0-1))}
g2<-function(p,x,a,b,a1,b1,c0){dbinom(x,n,p)*
(log(b1+c0-1)-log(a1+c0-1))}
g3<-function(p,x,a,b,a1,b1,c0){dbinom(x,n,p)*
(log(b1+c0-1)-log((p+c0-1)/b))}
g4<-function(p,x,a,b,a1,b1,c0){dbinom(x,n,p)*
(log((p+c0-1)/a)-log((p+c0-1)/b))}
gp2<-function(p,x,a,b,a1,b1,c0){dbinom(x,n,p)*
((p-1+c0)*(b1-a1)/((b1+c0-1)*(a1+c0-1)))}
gp3<-function(p,x,a,b,a1,b1,c0){dbinom(x,n,p)*
((p-1+c0)*(-1/(b1+c0-1)+b/(p-1+c0)))}
I1<-vector()
I2<-vector()
I3<-vector()
I4<-vector()
Ip2<-vector()
Ip3<-vector()
post.prob<-vector()
fixc.cov<-function(n,d,a.old,b.old,a1.old,b1.old,c.old){
x<-1:(n+1)
p1<-function(a,s,c0){a*s+(1-a)*(1-c0)}
p2<-function(b,s,c0){b*s+(1-b)*(1-c0)}
lp1<-1-c.old
lp2<-p2(b.old,a1.old,c.old)
lp3<-p2(b.old,b1.old,c.old)
for(i in 1:(n+1)){
I1[i]<-integrate(g2,lower=lp1,upper=lp2,x=i-1,a=a.old,b=b.old,
a1=a1.old,b1=b1.old,c0=c.old)$integral
I2[i]<-integrate(g3,lower=lp2,upper=lp3,x=i-1,a=a.old,b=b.old,
a1=a1.old,b1=b1.old,c0=c.old)$integral
Ip2[i]<-integrate(gp2,lower=lp1,upper=lp2,x=i-1,a=a.old,b=b.old,
a1=a1.old,b1=b1.old,c0=c.old)$integral
Ip3[i]<-integrate(gp3,lower=lp2,upper=lp3,x=i-1,a=a.old,b=b.old,
a1=a1.old,b1=b1.old,c0=c.old)$integral}
marg<-I1+I2
w<-1/((b.old-a.old)*(b1.old-a1.old))
thetahat<-(Ip2+Ip3)/marg
l1<-pmax(thetahat-d,a.old)
l2<-pmin(thetahat+d,b.old)
a.new<-l1
b.new<-l2
lp1.new<-p1(a.new,a1.old,c.old)
lp2.new<-p1(a.new,b1.old,c.old)
lp3.new<-p2(b.new,a1.old,c.old)
lp4.new<-p2(b.new,b1.old,c.old)
for(i in 1:(n+1)){if(marg[i] > 0 && a.new[i]==0){
I1[i]<-integrate(g2,lower=lp2.new[i],upper=lp3.new[i],x=i-1,
a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral
I2[i]<-integrate(g3,lower=lp3.new[i],upper=lp4.new[i],x=i-1,
a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral
post.prob[i]<-I1[i]+I2[i]}
if(marg[i]==0){post.prob[i]<-0}
if(marg[i] > 0 && a.new[i] > 0){
if(lp2.new[i] <= lp3.new[i]){
I1[i]<-integrate(g1,lower=lp1.new[i],upper=lp2.new[i],x=i-1,
a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral
I2[i]<-integrate(g2,lower=lp2.new[i],upper=lp3.new[i],x=i-1,
a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral
I3[i]<-integrate(g3,lower=lp3.new[i],upper=lp4.new[i],x=i-1,
a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral}
else if(lp2.new[i] > lp3.new[i]){
I1[i]<-integrate(g1,lower=lp1.new[i],upper=lp3.new[i],x=i-1,
a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral
I2[i]<-integrate(g4,lower=lp3.new[i],upper=lp2.new[i],x=i-1,
a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral
I3[i]<-integrate(g3,lower=lp2.new[i],upper=lp4.new[i],x=i-1,
a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral}
post.prob[i]<-I1[i]+I2[i]+I3[i]}}
average.cov<-sum(post.prob*w)
sum.marg<-sum(marg*w)
return(average.cov,sum.marg)}
In this program, we supposed that a = 0, since this is the case in most practical applications. A similar program can be written when a > 0. In that case, the functions gp1 and gp4 have to be integrated over the same regions as g1 and g4, and the results of these integrations are added to the integrals of gp2 and gp3 to get the posterior mean of θ.
3. The SIR program to calculate the posterior average coverage probability when the specificity is known, the prior distribution of the sensitivity is U[a1, b1] and the prior distribution of θ is U[a, b].
uppertheta<-vector()
tpar<-vector()
marginal<-vector()
sw<-vector()
weight[i,]<-dbinom(i-1,size,(theta*sens+(1-theta)*(1-spec)))
sw[i]<-sum(weight[i,])
if(sw[i] > 0){
posttheta[i,]<-sample(theta,size2,replace=T,prob=weight[i,])}
else {posttheta[i,]<-rep(0,size1)}
marginal[i]<-sum(weight[i,])/size1
lowertheta[i]<-mean(posttheta[i,])-l/2
uppertheta[i]<-mean(posttheta[i,])+l/2
tpar[i]<-length(posttheta[i,][posttheta[i,] >= lowertheta[i] &
posttheta[i,] <= uppertheta[i]])}
totalcov<-sum(probability*marginal)
return(tpar,marginal,weight,sw,totalcov,posttheta,lowertheta,
uppertheta,probability)}
In the above SIR program, the prior distributions of θ and S were taken to be uniform, but a similar SIR program can be written for any other prior distribution.
Appendix D
Regions of integration
In this appendix, we describe the regions of integration needed to calculate the exact posterior average coverage probability when the prior distributions of the sensitivity, the specificity and θ are U[a1, b1], U[a2, b2] and U[a, b], respectively. These regions were introduced in section 6.4.2.
First fix s and look at the two variables c and p. We have $a_2 \le c \le b_2$ and $a(s + c - 1) + (1 - c) \le p \le b(s + c - 1) + (1 - c)$, which means that p lies between the two lines
$$-(1 - a)c + 1 - a + as \le p \le -(1 - b)c + 1 - b + bs.$$
Let
$$p_1 = as + (1 - a)(1 - a_2), \quad p_2 = as + (1 - a)(1 - b_2), \quad p_3 = bs + (1 - b)(1 - a_2)$$
and
$$p_4 = bs + (1 - b)(1 - b_2).$$
Depending on s, two cases may arise: $p_1 \le p_4$ or $p_1 > p_4$. See figure D.1.
Figure D.1: Regions of integration.
The inequality $p_1 \le p_4$ is equivalent to
$$as + (1 - a)(1 - a_2) \le bs + (1 - b)(1 - b_2),$$
so that
$$s \ge \frac{(1 - a)(1 - a_2) - (1 - b)(1 - b_2)}{b - a}.$$
Let $s_1 = \frac{(1 - a)(1 - a_2) - (1 - b)(1 - b_2)}{b - a}$. Therefore if $s \ge s_1$ we have $p_2 \le p_1 \le p_4 \le p_3$, and if $s \le s_1$ we have $p_2 \le p_4 \le p_1 \le p_3$. Fix $s \le s_1$. Let $R_{11}$ be the region bounded by the lines
$$c = b_2, \quad p = -(1 - a)c + 1 - a + as, \quad p = p_4,$$
let $R_{12}$ be the region bounded by the lines
$$p = p_1, \quad p = p_4, \quad p = -(1 - a)c + 1 - a + as, \quad p = -(1 - b)c + 1 - b + bs,$$
and let $R_{13}$ be the region bounded by the lines
Conversely, we may have $s \ge s_1$. In this case, let $R_{21}$ be the region bounded by the lines
$$p = p_1, \quad p = -(1 - a)c + 1 - a + as, \quad c = b_2,$$
let $R_{22}$ be the region bounded by the lines
$$p = p_1, \quad p = p_4, \quad c = a_2, \quad c = b_2,$$
and let $R_{23}$ be the region bounded by the lines
$$p = p_4, \quad p = -(1 - b)c + 1 - b + bs, \quad c = a_2.$$
See figure D.1. With the above notation, the marginal density function of x can be written as the sum of 6 triple integrals,
Similar to the case of known c discussed in section 5.3, we can interchange the order of integration of the first two integrals. We will explain the procedure for the case $a \neq 0$ and for the first integral.
We have $a_1 \le s \le s_1$ and $p_2 \le p \le p_4$, that is, $as + (1 - a)(1 - b_2) \le p \le bs + (1 - b)(1 - b_2)$. Let
and
See figure D.1. If $p_{11} \le p_{22}$, the integral becomes
Conversely, if $p_{11} \ge p_{22}$, the integral becomes
lPz2 P-l 1 lr j-:+l-a-as L-a j (x, P! S, c)dcdpds + p-(t-b)(L-b?)
b j-:+l -a-as f (x, p, s, c)dcdpds.
1 -a
The remaining integrals can be similarly computed.
Note that the case where a = O can be simplified as in the case of fixed specificity
(chapter 6 section 6 .3 ) . We see that each of the 6 integrals will split into three
different integrals. Since we have two cases, we then have a total of 36 triple integrals
to evaluate. We will omit the very tedious mathematical details. Similarly, we can
calculate ê and cov(z; d). As in the previous section we first wrote a program in
MapIe to calculate the double integrals with respect to c and then s as functions of
a, b, a l : bl: a2 and b2. Then we wrote an Splus program to integrate these functions
with respect to p and calculate the marginal probability function of X, the posterior
mean of B and the average coverage probability. The following algorithm was used:
1. Calculate m(x), the marginal distribution of X.
2. Calculate $\hat\theta$, the posterior mean of θ.
3. For each x, calculate the coverage probability $\mathrm{cov}_x = P(\hat\theta - d \le \theta \le \hat\theta + d \mid x)$.
4. Calculate the average coverage probability over all x, $\mathrm{cov} = \sum_{x=0}^{n} \mathrm{cov}_x\, m(x)$.
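The four steps above can be illustrated with a crude midpoint-grid approximation in place of the exact Maple and Splus integration (our substitution). The sketch assumes independent uniform priors θ ~ U[a, b], S ~ U[a1, b1] and C ~ U[a2, b2]; the function name and grid size m are hypothetical, and because θ only takes grid values, the coverage indicator in step 3 is coarse, so results are approximate.

```python
import math


def average_coverage_grid(n, d, a, b, a1, b1, a2, b2, m=20):
    """Approximate steps 1-4 on an m x m x m midpoint grid, assuming
    independent priors theta ~ U[a, b], S ~ U[a1, b1], C ~ U[a2, b2]."""
    def midpoints(lo, hi):
        return [lo + (k + 0.5) * (hi - lo) / m for k in range(m)]

    # (theta, p) for every grid cell, with p = theta*S + (1 - theta)*(1 - C)
    cells = [(t, t * s + (1 - t) * (1 - c))
             for t in midpoints(a, b)
             for s in midpoints(a1, b1)
             for c in midpoints(a2, b2)]
    w = 1.0 / len(cells)  # equal prior mass per cell
    total = 0.0
    for x in range(n + 1):
        coef = math.comb(n, x)
        lik = [(t, coef * p ** x * (1 - p) ** (n - x)) for t, p in cells]
        m_x = w * sum(lk for _, lk in lik)                    # step 1: m(x)
        if m_x == 0.0:
            continue
        theta_hat = w * sum(t * lk for t, lk in lik) / m_x    # step 2
        covered = w * sum(lk for t, lk in lik
                          if theta_hat - d <= t <= theta_hat + d)
        cov_x = covered / m_x                                 # step 3
        total += cov_x * m_x                                  # step 4
    return total


print(average_coverage_grid(30, 0.05, 0, 1, 0.8, 0.9, 0.9, 1.0))
```

A finer grid (larger m) reduces the discretization error at a cost that grows as m³ per value of x, which is one reason exact integration or SIR is preferable for realistic n.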
Since including all the functions that result from integrating first with respect to
c and then s under the different conditions would take more than 40 manuscript
pages, and a listing of the program that implements the methods to calculate the
average coverage probability would take an additional 30 pages, these programs are
not included in this thesis.
Appendix E
The SIR program to calculate the
average coverage probability
The following is the SIR program to calculate the average coverage probability when the prior distribution of θ is U[a, b], the prior distribution of the sensitivity is U[a1, b1] and the prior distribution of the specificity is U[a2, b2]. Similar SIR programs can be written for any other prior distribution.
sens<-runif(size1,a1,b1)
spec<-runif(size1,a2,b2)
theta<-runif(size1,a,b)
for(i in 1:(size+1)){
weight[i,]<-dbinom(i-1,size,(theta*sens+(1-theta)*(1-spec)))
sw[i]<-sum(weight[i,])
if(sw[i] > 0){
posttheta[i,]<-sample(theta,size2,replace=T,prob=weight[i,])}
else {posttheta[i,]<-rep(0,size1)}
marginal[i]<-sum(weight[i,])/size1
lowertheta[i]<-mean(posttheta[i,])-l/2
uppertheta[i]<-mean(posttheta[i,])+l/2
par1[i]<-length(posttheta[i,][posttheta[i,] >= lowertheta[i] &
posttheta[i,] <= uppertheta[i]])}
probability<-par1/size2
totalcov<-sum(probability*marginal)
return(totalcov)}
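For readers who want to experiment with the SIR approach outside Splus, the program above can be sketched in Python as follows. This is a minimal version assuming the same uniform priors; `size1` and `size2` play the roles of the prior and posterior sample sizes, `length` is the total credible interval length l, and the function name and seed argument are our own. As noted in chapter 7, SIR estimates carry Monte Carlo error, so repeating the calculation with different seeds is advisable.

```python
import math
import random


def sir_average_coverage(n, length, a, b, a1, b1, a2, b2,
                         size1=5000, size2=5000, seed=0):
    """SIR estimate of the posterior average coverage probability of the
    interval (posterior mean of theta) +/- length/2, assuming independent
    priors theta ~ U[a, b], S ~ U[a1, b1] and C ~ U[a2, b2]."""
    rng = random.Random(seed)
    theta = [rng.uniform(a, b) for _ in range(size1)]
    # probability of a positive test for each prior draw of (theta, S, C)
    p = [t * rng.uniform(a1, b1) + (1 - t) * (1 - rng.uniform(a2, b2))
         for t in theta]
    half = length / 2
    total = 0.0
    for x in range(n + 1):
        coef = math.comb(n, x)
        w = [coef * q ** x * (1 - q) ** (n - x) for q in p]  # likelihood weights
        marginal = sum(w) / size1         # Monte Carlo estimate of m(x)
        if marginal == 0.0:               # no prior draw supports this x
            continue
        post = rng.choices(theta, weights=w, k=size2)        # resampling step
        centre = sum(post) / size2
        inside = sum(centre - half <= t <= centre + half for t in post)
        total += (inside / size2) * marginal
    return total


print(sir_average_coverage(50, 0.1, 0, 1, 0.8, 0.9, 0.9, 1.0,
                           size1=2000, size2=2000))
```

Reusing one set of prior draws for every x, as the Splus program does, keeps the marginal probabilities summing exactly to one across x.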
Appendix F
Logistic models
In this appendix, we display the binomial, quasi and quasi-full models and their printouts. We will also write the nested model.
1. The binomial model
cov.fit.bin<-glm(cov~cov.theta.start-1+w+I(1/(d*(0.05-d)))+
d+cov.theta.start+I(cov.theta.start^2)+I(1/n),
family=binomial,data=matrix.data.new)
Call: glm(formula = cov ~ cov.pi.start - 1 + w + d + I(1/(d * (0.05 - d))) +
cov.pi.start + I(cov.pi.start^2) + I(1/n), family = binomial, data =
matrix.data.new)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.2166384 -0.03393597 -0.001393636 0.03095628 0.20126
Coefficients:
                        Value          Std. Error     t value
(Intercept)            -1.088243e+00   2.603811e+00   -0.4179425
w                      -1.050587e+00   2.870602e-01   -3.6598139
d                       3.159586e+01   2.472907e+01    1.2776807
I(1/(d * (0.05 - d)))  -1.904313e-04   7.687555e-04   -0.2477138
cov.pi.start            3.973535e+00   5.307734e+00    0.7486310
I(cov.pi.start^2)      -5.378742e+00   4.364519e+00   -1.2323792
I(1/n)                 -8.855294e+01   5.012866e+01   -1.7665132
(Dispersion Parameter for Binomial family taken to be 1)
Null Deviance: 41.67006 on 1258 degrees of freedom
Residual Deviance: 3.308388 on 1252 degrees of freedom
Number of Fisher Scoring Iterations: 5
Correlation of Coefficients:
                        (Intercept)  w           d           I(1/(d * (0.05 - d)))
w                      -0.1843766
d                       0.0698851   -0.0488639
I(1/(d * (0.05 - d)))  -0.9481826    0.0039381  -0.0088116
cov.pi.start           -0.8785893    0.0553071  -0.4203107    0.7670587
I(cov.pi.start^2)       0.8699514   -0.0601915   0.3412218   -0.7515563
I(1/n)                 -0.0121085    0.0500585  -0.0398480   -0.0678532
                        cov.pi.start I(cov.pi.start^2)
w
2. The quasi model
cov.fit.quasi.simple<-glm(cov~cov.theta.start-1+w+I(1/(d*(0.05-d)))+d+
cov.theta.start+I(cov.theta.start^2)+I(1/n),
family=quasi(link=logit,variance=constant),data=matrix.data.new)
Call: glm(formula = cov ~ cov.pi.start - 1 + w + I(1/(d * (0.05 - d))) +
d + cov.pi.start + I(cov.pi.start^2) + I(1/n), family =
quasi(link = logit, variance = constant), data = matrix.data.new)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.04350419 -0.009659104 -0.000379838 0.009870335 0.06229331
Coefficients:
                        Value          Std. Error     t value
(Intercept)            -1.112255e+00   1.329215e-01    -8.367760
w                      -1.066918e+00   1.486106e-02   -71.792844
I(1/(d * (0.05 - d)))  -1.699941e-04   3.812174e-05    -4.459243
d                       2.938892e+01   1.162614e+00    25.278320
cov.pi.start            4.131180e+00   3.020775e-01    13.675895
I(cov.pi.start^2)      -5.468591e+00   2.601563e-01   -21.020407
I(1/n)                 -8.277740e+01   2.598457e+00   -31.856364
(Dispersion Parameter for Quasi-likelihood family taken to be 0.0002206)
Null Deviance: 3.956394 on 1258 degrees of freedom
Residual Deviance: 0.2762186 on 1252 degrees of freedom
Number of Fisher Scoring Iterations: 5
Correlation of Coefficients:
                        (Intercept)  w           I(1/(d * (0.05 - d)))  d
w                      -0.1873668
I(1/(d * (0.05 - d)))  -0.9439987    0.0177735
d                       0.2459611   -0.0627219  -0.1664503
cov.pi.start           -0.9029343    0.0528542   0.7832530   -0.5225833
I(cov.pi.start^2)       0.8951729   -0.0521580  -0.7707318    0.4517733
I(1/n)                 -0.0157598    0.0433608  -0.0611775   -0.0475988
                        cov.pi.start I(cov.pi.start^2)
w
I(1/(d * (0.05 - d)))
d
cov.pi.start
I(cov.pi.start^2)      -0.9905266
I(1/n)                  0.0407937   -0.0450998
3. The quasi.full model. This model includes all the variables and their pairwise interactions; the final model is the one selected by Splus as having the smallest AIC.
cov.fit.quasi<-glm(cov~cov.theta.start-1+w+I(1/(d*(0.05-d)))+
d+cov.theta.start+I(cov.theta.start^2)+
msens+I(msens^2)+mspec+I(mspec^2)+I(1/n)
+(d+cov.theta.start+I(cov.theta.start^2)+
msens+I(msens^2)+mspec+I(mspec^2))^2,
family=quasi(link=logit,variance=constant),data=matrix.data.new)
formula(cov.fit.quasi)
cov.fit.quasi.step<-step(cov.fit.quasi,scope=~.)
Call: glm(formula = structure(.Data = cov ~ cov.pi.start - 1 + w +
I(1/(d * (0.05 - d))) + d + cov.pi.start + I(cov.pi.start^2) +
I(msens^2) + mspec + I(mspec^2) + I(1/n) + d:I(msens^2) +
d:I(mspec^2) + cov.pi.start:I(msens^2) + cov.pi.start:mspec +
I(cov.pi.start^2):I(msens^2) + I(cov.pi.start^2):
mspec, class = "formula"), family = quasi(link = logit, variance =
constant), data = matrix.data.new)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.04529682 -0.009005072 -0.0004487669 0.008003408 0.0546313
Coefficients:
                                      Value    Std. Error     t value
(Intercept)                   -4.621282e+00  7.448138e-01  -6.2046145
w                             -4.918366e-01  5.446493e-02  -9.0303351
I(1/(d * (0.05 - d)))         -1.485416e-04  3.768448e-05
d                              7.271363e+01  9.034557e+00
cov.pi.start                   7.752816e+00  2.277173e+00
I(cov.pi.start^2)             -1.078100e+01  1.956258e+00
I(msens^2)                     2.603876e-01  2.316828e-01
mspec                          1.188436e+00  1.246512e+00
I(mspec^2)                     1.785556e+00  6.329351e-01
I(1/n)                        -8.200729e+01  2.327415e+00
d:I(msens^2)                  -1.638015e+01  ?.245770e+00
d:I(mspec^2)                  -3.694298e+01  1.064003e+01
cov.pi.start:I(msens^2)        2.672344e+00  1.143772e+00
cov.pi.start:mspec            -6.274561e+00  2.457733e+00
I(cov.pi.start^2):I(msens^2)  -2.256186e+00  1.006317e+00
I(cov.pi.start^2):mspec        7.678926e+00  2.092371e+00
(Dispersion Parameter for Quasi-likelihood family taken to be 0.0001778 )
Null Deviance: on 1258 degrees of freedom
Residual Deviance: 0.2210319 on 1243 degrees of freedom
Number of Fisher Scoring Iterations: 6
Correlation of Coefficients:
                              (Intercept) w           I(1/(d * (0.05 - d)))
w                             -0.4809546
I(1/(d * (0.05 - d)))         -0.2397827  -0.0080458
d                              0.2684785   0.0606991  -0.0950453
cov.pi.start                  -0.6525555  -0.0002743   0.2093212
I(cov.pi.start^2)              0.6722751  -0.0501112  -0.1967440
I(msens^2)                    -0.2367838   0.3005772  -0.2456594
mspec                         -0.9177480   0.3248278   0.1150115
I(mspec^2)                     0.7562479  -0.2536979  -0.0685331
I(1/n)                         0.0209653  -0.0034014  -0.0329474
d:I(msens^2)                   0.0126985   0.1320700  -0.1466124
d:I(mspec^2)                  -0.2783140  -0.1346187   0.1628651
cov.pi.start:I(msens^2)        0.0569443  -0.0441566   0.2992395
cov.pi.start:mspec             0.6250244   0.0144031  -0.2151961
I(cov.pi.start^2):I(msens^2)  -0.0541779   0.0123761  -0.3313031
I(cov.pi.start^2):mspec       -0.6529259   0.0481267   0.2196878
                              d           cov.pi.start  I(cov.pi.start^2)
cov.pi.start                  -0.6600397
I(cov.pi.start^2)              0.4914913  -0.9628544
I(msens^2)                    -0.0582930   0.1845356
mspec                         -0.2409717   0.4828702
I(mspec^2)                     0.1705694  -0.2166373
I(1/n)                         0.0163372  -0.0298150
d:I(msens^2)                  -0.3129960   0.0495458
d:I(mspec^2)                  -0.8283379   0.6422962
cov.pi.start:I(msens^2)        0.2101990  -0.1791927
cov.pi.start:mspec             0.5853145  -0.9399950
I(cov.pi.start^2):I(msens^2)  -0.1917929   0.1919941  -0.2142826
I(cov.pi.start^2):mspec       -0.4255888   0.9066909  -0.9395096
                              I(msens^2)  mspec       I(mspec^2)  I(1/n)
mspec                          0.0305303
I(mspec^2)                    -0.0282835  -0.9327618
I(1/n)                        -0.0481272  -0.0135424   0.0065108
d:I(msens^2)                   0.3165893  -0.0793904   0.0727580  -0.0287008
d:I(mspec^2)                  -0.1127687   0.2905244  -0.2150540  -0.0062113
cov.pi.start:I(msens^2)       -0.8672577   0.0875644  -0.0657743   0.0561111
cov.pi.start:mspec             0.0975931  -0.5182995   0.2392215   0.0186181
I(cov.pi.start^2):I(msens^2)   0.8619857  -0.0738785   0.0482081  -0.0637246
I(cov.pi.start^2):mspec       -0.0763776   0.5237750  -0.2276340  -0.0204177
                              d:I(msens^2)  d:I(mspec^2)  cov.pi.start:I(msens^2)
d:I(mspec^2)                  -0.2601305
cov.pi.start:I(msens^2)       -0.6488201   0.1523801
cov.pi.start:mspec             0.1677280  -0.6963843  -0.1437198
I(cov.pi.start^2):I(msens^2)   0.5081962  -0.0884362  -0.9730796
I(cov.pi.start^2):mspec       -0.0946073   0.4934935   0.1171145
                              cov.pi.start:mspec  I(cov.pi.start^2):I(msens^2)
I(cov.pi.start^2):I(msens^2)   0.1153060
I(cov.pi.start^2):mspec       -0.9573718
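The summary figures in this printout fit together as follows. With a constant variance function, the quasi-likelihood deviance is the residual sum of squares of the response y = cov - cov.pi.start, and the dispersion shown is that sum divided by the residual degrees of freedom; each t value is the estimate divided by its standard error. In the worked check below, N and p are inferred from the printed degrees of freedom and the number of coefficients rather than printed directly.

```latex
\begin{aligned}
D &= \sum_{i=1}^{N} (y_i - \hat\mu_i)^2 = 0.2210319,
  \qquad y_i = \texttt{cov}_i - \texttt{cov.pi.start}_i,\\
N - 1 &= 1258 \;\Rightarrow\; N = 1259, \qquad p = 16 \text{ coefficients},\\
\hat\phi &= \frac{D}{N - p} = \frac{0.2210319}{1243} \approx 1.778 \times 10^{-4},\\
t_{(\mathrm{Intercept})} &= \frac{-4.621282}{0.7448138} \approx -6.2046 .
\end{aligned}
```

Both values agree with the printed dispersion parameter (0.0001778) and the intercept's t value (-6.2046145).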
4. The nested model
cov.aov <- glm(cov - cov.theta.start ~ I(1/n) +
Appendix G
Examples
Model to fit the data in the second column of Table 7.3
cov.theta.start <- 0.1
matrix.refugees <- scan("data.refugees.0.05",
    list(n=0, cov1=0, cov2=0, cov3=0, cov4=0, cov5=0, cov6=0))
n <- matrix.refugees$n
cov1 <- matrix.refugees$cov1
cov2 <- matrix.refugees$cov2
cov3 <- matrix.refugees$cov3
cov4 <- matrix.refugees$cov4
cov5 <- matrix.refugees$cov5
cov6 <- matrix.refugees$cov6
# Quasi-likelihood fit (logit link, constant variance) of cov1 - cov.theta.start on 1/n
cov1.refugees <- glm(cov1 - cov.theta.start ~ 1 + I(1/n),
    family = quasi(link = logit, variance = constant),
    data = matrix.refugees)
summary(cov1.refugees)
coef(cov1.refugees)
# Top panel: data with the fitted curve; bottom panel: residuals against fitted values
postscript(file = "refugees1.ps", print = F, horizontal = F)
par(mfrow = c(2, 1))
plot(n, cov1)
par(new = T)
lines(n, fitted(cov1.refugees) + cov.theta.start)
plot(fitted(cov1.refugees), cov1 - cov.theta.start - fitted(cov1.refugees))
dev.off()
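The model fitted in this example belongs to the same family as `cov.fit.quasi`: a quasi-likelihood GLM with logit link and constant variance, which S-Plus fits by iteratively reweighted least squares (the "Fisher Scoring Iterations" reported in the summary printouts). To make that computation concrete, here is a minimal Python sketch of the iteration for a two-coefficient model of the form `cov1 - cov.theta.start ~ 1 + I(1/n)`. It is an illustration under stated assumptions, not the S-Plus code: the function name `fit_quasi_logit_constant`, the starting values (logit of the clipped responses), the convergence tolerance, and the data in the usage lines are all hypothetical.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def fit_quasi_logit_constant(y, x, tol=1e-10, max_iter=50):
    """IRLS (Fisher scoring) for E[y] = expit(b0 + b1 * x) with Var(y) = phi.

    Returns (b0, b1, phi); phi is the residual sum of squares over n - 2,
    which for a constant variance function equals both the Pearson and the
    deviance-based dispersion estimates.
    """
    n = len(y)
    # Hypothetical starting values: logit of the responses, clipped into (0, 1).
    eta = [logit(min(max(yi, 1e-6), 1.0 - 1e-6)) for yi in y]
    b0 = b1 = 0.0
    for _ in range(max_iter):
        mu = [expit(e) for e in eta]
        dmu = [m * (1.0 - m) for m in mu]   # d mu / d eta for the logit link
        w = [g * g for g in dmu]            # (d mu / d eta)^2 / V(mu), with V(mu) = 1
        z = [e + (yi - m) / g for e, yi, m, g in zip(eta, y, mu, dmu)]
        # Weighted least squares of z on (1, x): solve the 2 x 2 normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * xi for xi in x]
        if converged:
            break
    rss = sum((yi - expit(e)) ** 2 for yi, e in zip(y, eta))
    return b0, b1, rss / (n - 2)


# Hypothetical data: cov.theta.start = 0.1 plus expit(-4 + 20/n), with made-up
# coefficients (-4, 20) and a small alternating perturbation.
cov_theta_start = 0.1
ns = [50, 100, 200, 400, 800, 1600]
x = [1.0 / n for n in ns]
cov = [cov_theta_start + expit(-4.0 + 20.0 * xi) + 0.001 * (-1) ** i
       for i, xi in enumerate(x)]
b0, b1, phi = fit_quasi_logit_constant([c - cov_theta_start for c in cov], x)
# Fitted curve on the original scale, as drawn by lines(n, fitted(...) + cov.theta.start).
fitted_cov = [cov_theta_start + expit(b0 + b1 * xi) for xi in x]
```

At convergence the quasi-score equations, the sums of (y - mu) * mu * (1 - mu) and of (y - mu) * mu * (1 - mu) / n, are zero. On the original scale the fitted curve is `cov.theta.start + expit(b0 + b1/n)`, which levels off at `cov.theta.start + expit(b0)` as n grows.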
Bibliography
[1] Albert, J.H. (1993). Teaching Bayesian statistics using sampling methods and MINITAB. The American Statistician 47 (3), 182-191.
[2] Arkin, C.F. and Wachtel, M.S. (1990). How many patients are necessary to assess test performance. Journal of the American Medical Association 263, 275-278.
[3] Berger, J.O. (1985). Statistical decision theory and Bayesian analysis. Springer-Verlag, New York.
[4] Blackwelder, W.C. (1982). Proving the null hypothesis in clinical trials. Controlled Clinical Trials 3, 345-353.
[5] Casella, G. and Berger, R.L. (1990). Statistical inference. Wadsworth & Brooks, California.
[6] Centor, R.M. (1992). Estimating confidence intervals of likelihood ratios. Medical Decision Making 12 (3), 229-233.
[7] Chaloner, K.M. and Verdinelli, I. (1995). Bayesian experimental design: a review. Statistical Science 10 (3), 273-304.
[8] Chaloner, K.M. (1996). Estimation of prior distributions. Bayesian Biostatistics. Marcel Dekker, New York.
[9] Chambers, J.M. and Hastie, T.J. (1992). Statistical models in S. Wadsworth & Brooks, California.
[10] Chinn, S. and Burney, P.G.J. (1987). On measuring repeatability of data from self-administered questionnaires. International Journal of Epidemiology 16 (1).
[11] Cohen, J.R. (1977). Statistical power analysis for the behavioral sciences, Revised Edition. Academic Press, New York.
[12] Dasgupta, A. and Vidakovic, B. (1996). Sample size problems in ANOVA: Bayesian point of view. Submitted.
[13] Dawid, A.P. and Skene, A.M. (1979). Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics 28, 20-28.
[14] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39, 1-38.
[15] Desu, M.M. and Raghavarao, D. (1990). Sample size methodology. Academic Press, New York.
[16] Dobson, A.J. and Gebski, V.J. (1986). Sample size for comparing two independent proportions using the continuity-corrected arc sine transformation. The Statistician 35, 51-53.
[17] Donner, A. (1984). Approaches to sample size estimation in the design of clinical trials: a review. Statistics in Medicine 3, 199-214.
[18] Evans, M. and Swartz, T. (1995). Methods for approximating integrals in statistics with special emphasis on Bayesian integration problems. Statistical Science 10 (3), 254-272.
[19] Feinstein, A.R. (1977). Clinical biostatistics. C.V. Mosby, Saint Louis.
[20] Fleiss, J.L., Tytun, A. and Ury, H.K. (1980). A simple approximation for calculating sample sizes for comparing independent proportions. Biometrics 36, 343-346.
[21] Fleiss, J.L. (1981). Statistical methods for rates and proportions. Wiley, New York.
[22] Gail, M. (1973). The determination of sample sizes for trials involving several independent 2 x 2 tables. Journal of Chronic Diseases 20, 233-239.
[23] Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (1996). Markov chain Monte Carlo in practice. Chapman & Hall, New York.
[24] Johnson, W.O. and Gastwirth, J.L. (1991). Bayesian inference for medical screening tests: approximations useful for the analysis of acquired immune deficiency syndrome. Journal of the Royal Statistical Society, Series B, 53 (2), 427-439.
[25] Joseph, L., Gyorkos, T.W. and Coupal, L. (1995a). Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard. American Journal of Epidemiology 141 (3), 263-272.
[26] Joseph, L., Wolfson, D.B. and du Berger, R. (1995b). Sample size calculations for binomial proportions via highest posterior density intervals. The Statistician 44, 143-154.
[27] Joseph, L., du Berger, R. and Belisle, P. (1996). Bayesian and mixed Bayesian/likelihood criteria for sample size determination. Statistics in Medicine 15 (in press).
[28] Joseph, L. and Belisle, P. (19%'). Bayesian sample size determination for normal means and difference between normal means. The Statistician (to appear).
[29] Joseph, L. and Gyorkos, T.W. (1996). Inferences for likelihood ratios in the absence of a gold standard. Medical Decision Making (to appear).
[30] Lui, K.J. (1991). Sample sizes for related measurements in dichotomous data. Statistics in Medicine 10, 463-472.
[31] Lachin, J.M. (1981). Introduction to sample size determination and power analysis for clinical trials. Controlled Clinical Trials 2, 93-111.
[32] Lemeshow, S., Hosmer Jr., D.W., Klar, J. and Lwanga, S.K. (1990). Adequacy of sample size in health studies. John Wiley & Sons, New York.
[33] Lew, R.A. and Levy, P.S. (1989). Estimation of prevalence on the basis of screening tests. Statistics in Medicine 8, 1225-1230.
[34] Makuch, R. and Simon, R. (1978). Sample size requirements for evaluating a conservative therapy. Cancer Treatment Reports, 1037-1040.
[35] McCullagh, P. and Nelder, J.A. (1989). Generalized linear models. Chapman and Hall, New York.
[36] Pham-Gia, T. and Turkkan, N. (1992). Sample size determination in Bayesian analysis. The Statistician 41, 389-392.
[37] Raiffa, H. and Schlaifer, R. (1961). Applied statistical decision theory. Harvard Business School, Boston.
[38] Rogan, W.J. and Gladen, B. (1978). Estimating prevalence from the results of a screening test. American Journal of Epidemiology 107, 71-76.
[39] Rohatgi, V.K. (1984). Statistical inference. John Wiley & Sons, New York.
[40] Rubin, D. (1987). Multiple imputation for nonresponse in surveys. John Wiley, New York.
[41] Schlesselman, J.J. (1974). Sample size requirements in cohort and case-control studies of disease. American Journal of Epidemiology 99, 381-384.
[42] Schork, M.A. and Remington, R.D. (1967). The determination of sample size in treatment-control comparisons for chronic disease studies in which dropout or non-adherence is a problem. Journal of Chronic Diseases 20, 233-239.
[43] Simel, D.L., Samsa, G.P. and Matchar, D.B. (1991). Likelihood ratios with confidence: sample size estimation for diagnostic test studies. Journal of Clinical Epidemiology 44 (8), 763-770.
[44] Snedecor, G.W. and Cochran, W.G. (1977). Statistical methods, Seventh Edition. Academic Press, New York.
[45] Spiegelhalter, D.J. and Freedman, L.S. (1986). A predictive approach to selecting the size of a clinical trial, based on subjective clinical opinion. Statistics in Medicine 5, 1-13.
[46] Spiegelhalter, D.J., Freedman, L.S. and Parmar, M.K.B. (1994). Bayesian approaches to randomized trials. Journal of the Royal Statistical Society, Series A, 157, 357-416.
[47] Staquet, M., Rozencweig, M., Lee, Y.J. et al. (1981). Methodology for the assessment of new dichotomous diagnostic tests. Journal of Chronic Diseases 34, 599-610.
[48] Taragin, M.I., Wildman, D. and Trout, R. (1993). Assessing disease prevalence from inaccurate test results: teaching an old dog new tricks. Medical Decision Making 14, 269-273.
[49] Turkkan, N. and Pham-Gia, T. (1993). Computation of the highest posterior density intervals in Bayesian analysis. Journal of Statistical Computation and Simulation 44, 243-250.
[50] Viana, M.A.G. and Ramakrishnan, V. (1992). Bayesian estimates of predictive values and related parameters of a diagnostic test. The Canadian Journal of Statistics 20 (3), 311-321.
[51] Viana, M.A.G. (1994). Bayesian small-sample estimation of misclassified multinomial data. Biometrics 50, 237-243.
[52] Walter, S.D. and Irwig, L.M. (1988). Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review. Journal of Clinical Epidemiology 41 (9), 923-937.
[53] Weiner, D.A., Ryan, T.J., McCabe, C.H. et al. (1979). Exercise stress testing: correlation among history of angina, ST-segment response and prevalence of coronary artery disease in the Coronary Artery Surgery Study (CASS). New England Journal of Medicine 301, 230-235.
[54] White, A.A. and Landis, J.R. (1982). A general categorical data methodology for evaluating medical diagnostic tests. Communications in Statistics: Theory and Methods 11, 567-605.
[55] Wickramaratne, P.J. (1995). Sample size determination in epidemiologic studies. Statistical Methods in Medical Research 4, 311-337.
[56] Wilson, M. and Ware, D. (1991). Evaluation of the Diagnostic Pasteur Platelia Toxo IgG and Toxo IgM kits for detection of human antibodies to Toxoplasma gondii. 91st General Meeting, American Society for Microbiology, Dallas, Texas, May 3-9. Abstract number V-23.