
Sample size determination for prevalence estimation

in the absence of a gold standard diagnostic test

Elham H. Rahme

Department of Mathematics and Statistics

McGill University, Montréal

November 1996

A thesis submitted to the Faculty of Graduate Studies and Research

in partial fulfillment of the requirements of the degree of Ph.D.

© Elham Rahme, 1996

National Library of Canada (Bibliothèque nationale du Canada)
Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

Acknowledgments

I would like to express my deepest gratitude to Professor Lawrence Joseph, who directed the writing of this thesis. His invaluable suggestions, constant encouragement, and endless patience are greatly appreciated. I am also deeply grateful to Professor David Wolfson for his advice and his constant support. I would also like to thank Professor Keith Worsley for his comments and his kindness. Many thanks are also due to the staff of the department for their support and their patience.

Abstract

A common problem in medical research is the estimation of the prevalence of a disease in a given population. This is usually accomplished by applying a diagnostic test to a sample of subjects from the target population. In this thesis, we investigate the sample size requirements for the accurate estimation of disease prevalence in such experiments. When a gold standard diagnostic test is available, estimating the prevalence of a disease can be viewed as a problem in estimating a binomial proportion. In this case, we discuss some anomalies in the classical sample size criteria for binomial parameter estimation. These are especially important with small sample sizes. When a gold standard test is not available, one must take into account misclassification errors in order to avoid misleading results. When the sensitivity and the specificity of the diagnostic test are both known, a new adjustment to the maximum likelihood estimator of the prevalence is suggested, and confidence intervals and sample size estimates that arise from this estimator are given. A Bayesian approach is taken when the sensitivity and specificity of the diagnostic test are not exactly known. Here, a method is provided to determine the sample size needed to satisfy a Bayesian sample size criterion that averages over the preposterior marginal distribution of the data. Exact methods are given in some cases, and a sampling importance resampling algorithm is used for more complex situations. A main conclusion is that the degree to which the properties of a diagnostic test are known can have a very large effect on the sample size requirements.

Résumé

The estimation of the prevalence of a disease in a given population is a common problem in medical research. This estimation is generally carried out by giving a diagnostic test to a sample from the target population. In this thesis, we study the sample size needed to estimate the prevalence in such experiments. When a perfect diagnostic test serving as a gold standard exists, the estimation of the prevalence of a disease can be viewed as a problem of estimating the proportion of a binomial distribution. In this case, we show that there are some misconceptions concerning the classical criterion used to calculate the sample size. These misconceptions are particularly important in the case of small samples. When a perfect diagnostic test does not exist, one must take classification errors into account to avoid misleading results. If the sensitivity and the specificity of the diagnostic test are both known, a new adjustment to the maximum likelihood estimator of the prevalence is suggested. Confidence intervals and sample sizes resulting from this estimator are calculated. A Bayesian approach is taken when the sensitivity and the specificity of the diagnostic test are not known. Here, a method is presented to calculate the sample size needed to satisfy a Bayesian criterion that consists of averaging with respect to the marginal probability of the data. Exact methods are proposed in some cases, and a "Sampling Importance Resampling" algorithm is used in more complex situations.

Contents

1 Introduction
1.1 Diagnostic tests
1.2 Objectives
1.3 Estimating the prevalence of a disease
1.4 Thesis outline
2 Preliminaries
2.1 Some additional characteristics of diagnostic tests
2.2 Frequentist approaches to sample size determination
2.2.1 Sample size required for a point estimate of a parameter to fall within a distance d of the true value
2.2.2 Sample size for a given power in a test of hypothesis
2.3 Bayesian approaches to sample size determination
2.3.1 Bayesian statistical inference
2.3.2 Bayesian sample size criteria
2.4 Computational techniques
2.4.1 The SIR algorithm
2.4.2 The Gibbs sampler algorithm
3 Previous results and literature review
3.1 Diagnostic tests
3.2 Sample size
4 Sample size in binomial studies: A new criterion
4.1 Introduction
4.2 Anomalies
4.3 Modified criterion
4.4 Examples
5 Estimating the disease prevalence when the sensitivity and specificity are exactly known
5.1 Introduction
5.2 Definitions
5.3 Adjustment to the MLE
5.4 Confidence intervals
5.4.1 Confidence interval for p
5.4.2 Confidence interval for θ
5.5 Sample size for estimating the prevalence
6 Bayesian estimation of disease prevalence and sample size in the absence of a gold standard
6.1 Introduction
6.2 The case when the sensitivity and the specificity of the diagnostic test are exactly known
6.2.1 Prior density for p
6.2.2 Posterior density of p
6.2.3 Sample size determination via the average coverage criterion
6.2.4 Examples
6.2.5 Comparing the sample sizes from the Bayesian approach to those based on the AMLE
6.3 The case when the specificity but not the sensitivity of the diagnostic test is exactly known
6.3.1 Introduction
6.3.2 Prior density for p
6.3.3 Posterior density of p
6.3.4 Posterior mean of θ
6.3.5 Sample size
6.3.6 SIR computations
6.3.7 Examples
6.4 The case where both the sensitivity and the specificity of the diagnostic test are unknown
6.4.1 Introduction
6.4.2 Exact computations for the case of uniform prior distributions
6.4.3 SIR computations
6.4.4 Examples
6.5 Logistic regression models
7 Practical Implications
7.1 Procedure to find the sample size when the sensitivity and/or the specificity are unknown
8 Discussion
Appendix
A Splus program to find the AMLE and the confidence interval for the AMLE
B Splus program to calculate the average coverage probability when both the sensitivity and the specificity are known
C Computing the posterior average coverage probability when the specificity is known
D Regions of integration
E The SIR program to calculate the average coverage probability
F Logistic models

Chapter 1

Introduction

1.1 Diagnostic tests

Diagnostic tests are widely used in medicine to help determine the presence or absence of a certain disease in an individual. Unfortunately, most tests are not perfect, in the sense that a given test may classify a healthy individual as diseased, or a diseased individual as healthy. Reasons for these errors may include laboratory or human errors, technical imperfections in the tests themselves, and the difficulties of subjective clinical judgments. Very often, the test does not directly measure the presence or absence of the disease, but rather measures the degree to which a marker for the disease, which may also be present in healthy individuals, is manifest. For example, in parasitology, serologic testing may confirm the presence of antibodies associated with a certain parasite long after an individual has been cured.

In general, the sample space for diagnostic test results may be continuous (on an interval or on the whole real line), dichotomous (positive/negative), trichotomous (high/medium/low), or of some other form. In this thesis we will concentrate on dichotomous test results. Of course, by choosing appropriate cutoff values, all diagnostic tests can ultimately be dichotomized, although this often entails a loss of information. Many of the results to be presented here can be easily extended to other types of test results.

The degree of imperfection of a diagnostic test can be measured by its sensitivity and specificity.

The sensitivity of a dichotomous diagnostic test is defined to be its ability to detect the disease when the disease is present. It is denoted here by S, so that S = P(T+ | D+), where T+ indicates that the result of the test is positive, and D+ indicates the presence of the disease. In other words, the sensitivity is the probability of testing positive given that the disease is present.

The specificity of a diagnostic test is the ability of the test to confirm the absence of the disease when the disease is truly absent. It is denoted here by C, so that C = P(T− | D−), where T− indicates a negative diagnostic test result, and D− indicates the absence of disease. Thus the specificity of a test is the probability of testing negative given that the disease is absent.

The prevalence of a given disease in a population is the proportion of diseased individuals in the population. The prevalence is denoted here by θ, so that θ = P(D+). We now illustrate the above definitions.

Example 1.1.1 Centor [1992] considered 773 subjects whose serum creatine kinase (CK) was measured to determine whether or not they had had a myocardial infarction (MI). The results are given in table 1.1.

The sensitivity of the test is estimated by Ŝ = 28/(28 + 23) = 0.55, and the specificity is estimated by Ĉ = 471/(251 + 471) = 0.65. A point estimate of the prevalence of myocardial infarction is θ̂ = (28 + 23)/(28 + 23 + 251 + 471) = 0.066.

Table 1.1: Diagnosing myocardial infarction (MI) by serum creatine kinase (CK). Abnormal is defined as CK ≥ 120, and Normal subjects have CK < 120.

                    MI present   MI absent   Total
CK Abnormal (T+)        28          251        279
CK Normal (T−)          23          471        494
Total                   51          722        773

When the results of the test always coincide with the true state of disease, the test is perfectly accurate and is then often referred to as a gold standard. Clearly the sensitivity and the specificity of a gold standard test must both be equal to one.

Perfect gold standard tests rarely, if ever, exist, since even a theoretically perfect test can be rendered less perfect by human, laboratory, or other errors. Therefore, a test is often referred to as a gold standard if it is the best test available, even if it does not have S = C = 1. Examples of imperfect tests that are considered to be gold standards include arteriography, which is regarded as a gold standard diagnosis for coronary artery disease, and the barium enema, which is regarded as a gold standard for colon cancer.

Even when they exist, gold standard tests may be difficult to perform, highly

invasive, very costly, or time consuming, so that alternative tests are often considered.

In developing alternative tests, their performance is often compared to that of the

gold standard. For example, the stress method is a diagnostic test for coronary artery

disease whose test properties can be compared to arteriography. The stool guaiac test

is diagnostic for colon cancer and can be compared to the barium enema test. Table 1.1 provides an example of the results of using serum creatine kinase as a diagnostic test for myocardial infarction.
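As a small illustration, the estimates of Example 1.1.1 can be reproduced from the four counts used there. The sketch below is in Python rather than the S-PLUS used elsewhere in the thesis; the function name and the cell labels (tp, fn, fp, tn) are illustrative assumptions. Exact fractions are kept so that the rounding happens only at the end.

```python
from fractions import Fraction

def two_by_two_estimates(tp, fn, fp, tn):
    """Estimates from a 2x2 table cross-classifying test result by true status.

    tp: T+ and D+,  fn: T- and D+,  fp: T+ and D-,  tn: T- and D-
    """
    sensitivity = Fraction(tp, tp + fn)                # estimate of S = P(T+ | D+)
    specificity = Fraction(tn, fp + tn)                # estimate of C = P(T- | D-)
    prevalence = Fraction(tp + fn, tp + fn + fp + tn)  # estimate of theta = P(D+)
    return sensitivity, specificity, prevalence

# Counts from Example 1.1.1 (Centor [1992])
s_hat, c_hat, theta_hat = two_by_two_estimates(tp=28, fn=23, fp=251, tn=471)
print(f"S = {float(s_hat):.2f}, C = {float(c_hat):.2f}, theta = {float(theta_hat):.3f}")
# prints: S = 0.55, C = 0.65, theta = 0.066
```

These direct estimates are only available because the true disease status of every subject is known; the rest of the thesis deals with the case where it is not.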

1.2 Objectives

Determining the sample size requirements during the planning phase of a study is a very important and often difficult problem. A sample that is too small can produce unstable estimates that can be very misleading, while an unnecessarily large sample will be wasteful of resources, including dollar costs and the time needed to complete the study, and may expose study subjects to unnecessary risks.

In developing a new diagnostic test, one first needs to perform a study to estimate

the sensitivity and the specificity of the test. This thesis will address this problem

only very briefly. Separate studies are needed to give this problem the attention it

deserves.

Once a new test is developed, and its properties are estimated, it can be used

in screening studies to estimate the prevalence of a disease in a given population.

Several questions arise in such situations, two of which will be considered in this thesis:

1. What method should be used to estimate the prevalence of a disease when disease status will be determined by an imperfect diagnostic test?

2. What should the sample size be in order to estimate the prevalence using these

methods with sufficient accuracy?

These problems are particularly difficult, since not only are we using an imperfect

test, but the degree of imperfection is only very rarely known exactly. Standard

methods for sample size determination therefore do not apply. In this thesis, we will

develop new methodology to address the above problems, and provide algorithms and

advice to enable researchers to apply the methods in practice.

1.3 Estimating the prevalence of a disease

When a gold standard test is available, estimating the prevalence of a disease from the test results on a random sample of subjects from the target population can be viewed as the straightforward estimation of a binomial parameter. Similarly, if results from two diagnostic tests are available, one of which is a gold standard, then estimation of the sensitivity and the specificity of the other test is also straightforward; see example 1.1.1. When a gold standard test is not available, however, one must take into account the misclassification errors in order to avoid misleading results. In particular, estimating the prevalence of the disease by the proportion of subjects in the sample who test positive can be very inaccurate when the diagnostic test is not a gold standard. For example, consider the case of a disease that has a low prevalence, say smaller than 0.01, and where the specificity of the diagnostic test is 0.7. We will see in chapter 4 that in this situation, the proportion of subjects with positive test results will be close to 0.3, no matter how large a sample size is taken, since almost all subjects are disease free and each of them tests positive with probability 1 − C = 0.3. Clearly, estimating θ by that proportion would be very misleading, so that adjustment for the sensitivity and specificity of the test is required.
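The size of this bias follows from the law of total probability: the probability of a positive test is p = P(T+) = θS + (1 − θ)(1 − C). The Python sketch below evaluates p for the low-prevalence example above, taking θ = 0.005 and a sensitivity of S = 0.9 as purely illustrative values (the text fixes only C = 0.7), and then applies the simple inversion θ = (p + C − 1)/(S + C − 1), valid when S + C > 1. This inversion is a standard textbook correction used here for illustration; it is not the adjusted estimator developed in chapter 5, and the function names are hypothetical.

```python
def prob_test_positive(theta, sens, spec):
    """P(T+) = theta*S + (1 - theta)*(1 - C), by the law of total probability."""
    return theta * sens + (1 - theta) * (1 - spec)

def invert_prevalence(p, sens, spec):
    """Solve p = theta*S + (1 - theta)*(1 - C) for theta (requires S + C > 1).

    With an estimated p the solution can fall outside [0, 1], so it is clipped.
    """
    if sens + spec <= 1:
        raise ValueError("uninformative test: need S + C > 1")
    theta = (p + spec - 1) / (sens + spec - 1)
    return min(max(theta, 0.0), 1.0)

theta, sens, spec = 0.005, 0.9, 0.7   # theta and S are illustrative assumptions
p = prob_test_positive(theta, sens, spec)
print(round(p, 4))                    # 0.303: close to 1 - C = 0.3, not to theta
print(round(invert_prevalence(p, sens, spec), 4))   # 0.005
```

The clipping at zero already hints at the difficulty discussed in chapter 5: with a low prevalence, the observed positive proportion often falls below 1 − C, and the inverted estimate is then zero.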

1.4 Thesis outline

This thesis is mainly concerned with estimating sample size requirements for studies involving imperfect diagnostic tests. It is structured as follows.

We have already defined the prevalence of a disease and the sensitivity and specificity of a diagnostic test; in chapter 2 we will present some additional definitions of quantities related to diagnostic tests. We will also describe both frequentist and Bayesian criteria for sample size estimation, and briefly outline the ideas behind several estimation techniques that may be useful for sample size problems, including those that will be used later in the thesis.

In chapter 3 we review some of the previous literature on the calculation of sample sizes, both in general and in relation to the estimation of the prevalence of a disease.

In chapter 4 we consider the classical problem of estimating a binomial parameter. In planning binomial experiments, sample sizes are often calculated to ensure that the point estimate will be within a desired distance from the true value with sufficiently high probability. Since exact calculations resulting from the standard formulation of this problem can be difficult, "conservative" and/or normal approximations are frequently used. Some problems with the current formulations are described, and a modified criterion that leads to some improvement is provided. A simple algorithm that calculates the exact sample sizes under the modified criterion is provided, and these sample sizes are compared to those given by the standard approximate criterion.

Chapter 5 focuses on a maximum likelihood approach for estimating the prevalence of a disease when the sensitivity and the specificity of the test are exactly known. We consider the case of a disease with low prevalence. In this situation, the maximum likelihood estimator (MLE) of θ is often zero, which is unrealistic. We define an adjusted MLE of θ, which we denote by AMLE. We will show that the AMLE is easy to calculate and can be considered a more reasonable estimator than the MLE, in a sense to be made more precise in chapter 5. We also discuss confidence intervals for the AMLE, and find the sample size needed to estimate θ to within a given accuracy using the AMLE.

In most situations, however, the sensitivity and specificity of the diagnostic test used to estimate the prevalence of a disease in a given study are not exactly known. While some information is usually available on the tests, it is not normally precise enough for experts to agree on a single value for the sensitivity or the specificity. Furthermore, these values may vary from situation to situation. Nevertheless, it may be possible to construct a probability distribution over a range of sensitivity and specificity values that represents what is known about the tests. In chapter 6 we use Bayesian methods to study the sample size requirements for estimating the prevalence of a disease to within a given accuracy, in several different situations. First, we examine the problem when the sensitivity and the specificity of the diagnostic test are exactly known. Next, we extend these methods to the case where the specificity, but not the sensitivity, is exactly known. Finally, we extend these methods to the case where neither the sensitivity nor the specificity is exactly known. We demonstrate that in the latter two cases, an upper bound on the accuracy exists, so that higher precision cannot be attained regardless of the sample size. We will develop a logistic regression model to approximate the upper limit on the accuracy, and the appropriate sample size needed to approach this upper limit.

Chapter 7 discusses the results of the previous chapters in the context of real examples. We suggest a procedure to calculate the sample size required to estimate the prevalence of a disease in practice.

Finally, chapter 8 contains a discussion and suggestions for further research.

Chapter 2

Preliminaries

In chapter 1, we defined the prevalence of a disease and the sensitivity and specificity of a diagnostic test. The sensitivity and specificity are very important in assessing the performance of a test. As we shall see in this thesis, these properties also play an important role in the estimation of the prevalence of a disease. Other characteristics of diagnostic tests are also of interest. In this chapter we present some additional definitions, as well as theorems and other methods that will be used throughout this thesis.

Throughout the remainder of this thesis, f(y) will denote the density function of a random vector Y. In cases where confusion might occur, we will instead use the notation f_Y(y). We will also continue to use the notation introduced in chapter 1 for parameters related to diagnostic tests.

2.1 Some additional characteristics of diagnostic tests

Definition 2.1.1 The positive predictive value (ppv) is defined to be the probability of truly having the disease given that the test result is positive: ppv = P(D+ | T+).

Definition 2.1.2 The negative predictive value (npv) is defined to be the probability of truly not having the disease given that the test result is negative: npv = P(D− | T−).

Definition 2.1.3 The positive likelihood ratio of a dichotomous diagnostic test is

$$ LR^{+} = \frac{\text{sensitivity}}{1 - \text{specificity}} = \frac{S}{1 - C}, $$

and the negative likelihood ratio is

$$ LR^{-} = \frac{1 - \text{sensitivity}}{\text{specificity}} = \frac{1 - S}{C}. $$

Example 2.1.1 Consider example 1.1.1 of chapter 1. In this example the positive predictive value of the test is estimated by ppv = 28/(28 + 251) = 0.10, and the negative predictive value is estimated by npv = 471/(23 + 471) = 0.95. The positive likelihood ratio of the test is estimated by LR+ = (28/51)/(251/722) = 1.58, and the negative likelihood ratio of the test is estimated by LR− = (23/51)/(471/722) = 0.69.
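The four quantities of Example 2.1.1 can be computed directly from the cell counts. The Python sketch below does so with exact fractions; the function name and argument labels are illustrative assumptions.

```python
from fractions import Fraction

def predictive_values_and_lrs(tp, fn, fp, tn):
    """ppv, npv, LR+ and LR- estimated directly from 2x2 cell counts.

    tp: T+ and D+,  fn: T- and D+,  fp: T+ and D-,  tn: T- and D-
    """
    ppv = Fraction(tp, tp + fp)        # P(D+ | T+)
    npv = Fraction(tn, fn + tn)        # P(D- | T-)
    sens = Fraction(tp, tp + fn)       # S
    spec = Fraction(tn, fp + tn)       # C
    lr_pos = sens / (1 - spec)         # S / (1 - C)
    lr_neg = (1 - sens) / spec         # (1 - S) / C
    return ppv, npv, lr_pos, lr_neg

vals = predictive_values_and_lrs(tp=28, fn=23, fp=251, tn=471)
print([round(float(v), 2) for v in vals])   # [0.1, 0.95, 1.58, 0.69]
```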

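Theorem 2.1.1 (Bayes' theorem) Let B1, B2, . . . be a countable collection of mutually exclusive events such that P(Bi) > 0 for i = 1, 2, . . . , and Ω = B1 ∪ B2 ∪ . . . , where Ω is the sample space. Let A be an event with P(A) > 0. Then for i = 1, 2, . . . , we have

$$ P(B_i \mid A) = \frac{P(A \mid B_i)\, P(B_i)}{\sum_{j=1}^{\infty} P(A \mid B_j)\, P(B_j)}. $$

Proof: By definition

$$ P(B_i \mid A) = \frac{P(A \cap B_i)}{P(A)}, $$

which can be written as

$$ P(B_i \mid A) = \frac{P(A \mid B_i)\, P(B_i)}{P(A)}. $$

By the law of total probability

$$ P(A) = \sum_{j=1}^{\infty} P(A \mid B_j)\, P(B_j), $$

therefore

$$ P(B_i \mid A) = \frac{P(A \mid B_i)\, P(B_i)}{\sum_{j=1}^{\infty} P(A \mid B_j)\, P(B_j)}. $$

Let X and Y be two random variables with joint probability function P(x, y). Bayes' theorem can then be written as

$$ P(y \mid x) = \frac{P(x \mid y)\, P(y)}{\sum_{y'} P(x \mid y')\, P(y')} $$

whenever the denominator is positive. If, on the other hand, X and Y have a joint continuous density function f(x, y), then Bayes' theorem can be written as

$$ f(y \mid x) = \frac{f(x \mid y)\, f(y)}{\int f(x \mid y')\, f(y')\, dy'}. $$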

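Proposition 2.1.1 The ppv and the npv can be written in terms of the prevalence of the disease and the sensitivity and the specificity of the diagnostic test. In fact

$$ \mathrm{ppv} = \frac{\theta S}{\theta S + (1 - \theta)(1 - C)} $$

and

$$ \mathrm{npv} = \frac{(1 - \theta) C}{\theta (1 - S) + (1 - \theta) C}. $$

Proof: By definition, ppv = P(D+ | T+), and npv = P(D− | T−). Therefore, by Bayes' theorem

$$ \mathrm{ppv} = P(D+ \mid T+) = \frac{P(T+ \mid D+)\, P(D+)}{P(T+ \mid D+)\, P(D+) + P(T+ \mid D-)\, P(D-)}. $$

Similarly, using Bayes' theorem we have

$$ \mathrm{npv} = P(D- \mid T-) = \frac{P(T- \mid D-)\, P(D-)}{P(T- \mid D-)\, P(D-) + P(T- \mid D+)\, P(D+)}. $$

Recalling the notation introduced in chapter 1 completes the proof.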
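Proposition 2.1.1 can be checked numerically. The Python sketch below (function name hypothetical) evaluates the two formulas and confirms, with exact rational arithmetic, that plugging in the estimates θ̂ = 51/773, Ŝ = 28/51 and Ĉ = 471/722 from Example 1.1.1 reproduces the direct estimates ppv = 28/279 and npv = 471/494 of Example 2.1.1. The last line evaluates the ppv of the same test in a hypothetical population with prevalence 0.01, to show how strongly the ppv depends on θ.

```python
from fractions import Fraction

def ppv_npv(theta, sens, spec):
    """Predictive values from prevalence, sensitivity and specificity (Prop. 2.1.1)."""
    ppv = theta * sens / (theta * sens + (1 - theta) * (1 - spec))
    npv = (1 - theta) * spec / (theta * (1 - sens) + (1 - theta) * spec)
    return ppv, npv

# Plug-in estimates from Example 1.1.1
theta, sens, spec = Fraction(51, 773), Fraction(28, 51), Fraction(471, 722)
ppv, npv = ppv_npv(theta, sens, spec)
assert ppv == Fraction(28, 279) and npv == Fraction(471, 494)

# Same test applied in a hypothetical population with prevalence 0.01
print(round(float(ppv_npv(Fraction(1, 100), sens, spec)[0]), 3))
```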

2.2 Frequentist approaches to sample size determination

There are several frequentist criteria for determining sample sizes that have appeared

in the literature. Below we review the two most common of these. More complete

reviews of methods for sample size determination are found in Desu and Raghavarao

[1990], Lemeshow et al. [1990], and Lachin [1981].

2.2.1 Sample size required for a point estimate of a parameter to fall within a distance d of the true value

Let X be a random variable, and let f_X(x | θ) be its density function, where θ is an unknown parameter. If the purpose of a study is to estimate θ, a natural question is to determine the sample size such that the estimate of θ will be "close" to θ with high probability. Letting θ̂ be an estimator of θ, the sample size n can be determined such that

$$ P(|\hat\theta - \theta| \le d) \ge 1 - \alpha \qquad (2.3) $$

is satisfied for prespecified d and α. Usually, θ̂ depends on n in such a way that θ̂ → θ as n → ∞. Therefore, increasing the sample size leads to more accurate estimation, and there is usually an n0 such that for n ≥ n0, equation 2.3 will be satisfied. Desu and Raghavarao [1990] discuss this method for a variety of density functions f_X(x | θ). We will return to this criterion in chapters 4 and 5.
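For the binomial case taken up in chapter 4, criterion 2.3 can be evaluated exactly. The Python sketch below assumes X ~ Binomial(n, θ) with θ̂ = X/n and a given true θ, searches for the first n meeting (2.3), and compares it with the familiar normal-approximation formula n = z²θ(1 − θ)/d², where z is the 1 − α/2 standard normal quantile. The function names are hypothetical. Because the binomial distribution is discrete, the exact probability in (2.3) need not increase monotonically with n, so the first n that satisfies (2.3) does not by itself guarantee that every larger n does.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Binomial(n, p) probability of k successes, evaluated in log space."""
    if p == 0.0:
        return 1.0 if k == 0 else 0.0
    if p == 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def exact_coverage(n, theta, d):
    """Exact P(|X/n - theta| <= d) for X ~ Binomial(n, theta): left side of (2.3)."""
    tol = 1e-12  # guards against rounding when |k/n - theta| equals d exactly
    return sum(binom_pmf(k, n, theta) for k in range(n + 1)
               if abs(k / n - theta) <= d + tol)

def smallest_exact_n(theta, d, alpha, n_max=100_000):
    """First n for which (2.3) holds exactly; None if none up to n_max."""
    for n in range(1, n_max + 1):
        if exact_coverage(n, theta, d) >= 1 - alpha:
            return n
    return None

def normal_approx_n(theta, d, alpha):
    """Usual normal-approximation answer n = z^2 theta (1 - theta) / d^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z**2 * theta * (1 - theta) / d**2)

n_exact = smallest_exact_n(theta=0.5, d=0.05, alpha=0.05)
n_normal = normal_approx_n(theta=0.5, d=0.05, alpha=0.05)
print(n_exact, n_normal)
```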

2.2.2 Sample size for a given power in a test of hypothesis

Suppose again that X is a random variable with density f_X(x | θ), where θ is an unknown parameter. Suppose further that the purpose of the study is to test the null hypothesis H0: θ = θ0 against an alternative hypothesis Ha, where Ha can be of one of the following forms:

$$ H_a: \theta > \theta_0, \qquad H_a: \theta < \theta_0, \qquad H_a: \theta \neq \theta_0. $$

As is well known, two types of errors can occur in testing a hypothesis:

Definition 2.2.1 The type-I error rate, usually denoted by α, is the probability of rejecting the null hypothesis when it is in fact true.

Definition 2.2.2 The type-II error rate, usually denoted by β, is the probability of not rejecting the null hypothesis when the alternative is true.

Definition 2.2.3 The power of the test under a particular alternative is the probability of rejecting the null hypothesis when this alternative is true, so that power = 1 − β.

Ideally, we would like both α and β to be 0, but this is usually impossible except in trivial cases. A commonly used method is to choose the sample size such that the power of the test will be at least 1 − β when the type-I error rate is at most α. This method is widely used in the planning of clinical trials. In the context of diagnostic tests, it may be relevant when one wishes to test whether the sensitivities (or specificities) of two different diagnostic tests are equivalent. However, one is usually more concerned with the estimation of test properties and population prevalences, so we will not further consider methods based on the power of hypothesis tests in this thesis, except briefly in the literature review on sample size methods. See Desu and Raghavarao [1990] for more on this approach in a wide variety of sampling situations.
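For completeness, here is a sketch of the power criterion in its most common textbook form: a one-sided test of H0: θ = θ0 against Ha: θ > θ0 for a binomial proportion, with the normal-approximation sample size n = [z_{1−α}√(θ0(1 − θ0)) + z_{1−β}√(θ1(1 − θ1))]²/(θ1 − θ0)², where θ1 > θ0 is the alternative at which power 1 − β is required. The formula is a standard approximation assumed here for illustration, not a method developed in this thesis, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_for_power(theta0, theta1, alpha=0.05, beta=0.20):
    """Normal-approximation sample size for a one-sided binomial test.

    H0: theta = theta0 vs Ha: theta > theta0, type-I error rate alpha,
    power 1 - beta when the true value is theta1 (> theta0).
    """
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(1 - beta)
    num = z_a * math.sqrt(theta0 * (1 - theta0)) + z_b * math.sqrt(theta1 * (1 - theta1))
    return math.ceil((num / (theta1 - theta0)) ** 2)

print(n_for_power(0.5, 0.6))   # 153
```

Halving the distance θ1 − θ0 roughly quadruples the required n, which is why this criterion can become expensive for the small differences typical of comparisons between diagnostic tests.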

2.3 Bayesian approaches to sample size determination

2.3.1 Bayesian statistical inference

Before the conduct of many experiments, some information is often available about

the values of the parameters of interest. The Bayesian approach to statistical in-

ference consists of summarizing this information in a joint distribution function for

the parameters, and then updating this prior distribution in light of the data collected in an experiment through Bayes' theorem. We will first briefly summarize the

Bayesian approach to statistical inference, and then review several Bayesian sample

size criteria.

Definitions

Let X = (X1, . . . , Xn) be a random sample whose distribution depends on a possibly vector-valued parameter γ.

Definition 2.3.1 The prior density function (or probability function) is the density function (or probability function) of the parameter γ before the experiment is conducted. It is denoted here by f(γ).

The prior density function summarizes all of the information known to the investigator before examining the data collected from the current experiment. Since different investigators may have different prior density functions, prior distributions are subjective. An important step in any Bayesian analysis is to select an appropriate prior distribution. See Chaloner [1996] for a discussion of a variety of methods for prior density elicitation. Let f(x | γ) denote the joint conditional density function (or probability function) of the sample X = (X1, . . . , Xn) given the parameter γ. Suppose that X = x is observed.

Definition 2.3.2 The likelihood function is any function of the parameter γ, l(γ | x), that is proportional to f(x | γ). For example, one can choose l(γ | x) = f(x | γ). For the observed data x, the likelihood function gives the relative likelihoods (heights of the curve) of the different values of γ.

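With the above setting we have the following.

The marginal density of X, denoted here by m(x), is given by

$$ m(x) = \begin{cases} \int f(x \mid \gamma)\, f(\gamma)\, d\gamma, & \text{if } f(\gamma) \text{ is a density function},\\ \sum_{\gamma} f(x \mid \gamma)\, f(\gamma), & \text{if } f(\gamma) \text{ is a probability function}. \end{cases} $$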

Definition 2.3.3 The posterior density function (or probability function) of γ is the conditional density function (or probability function) of γ given X = x. It is denoted here by f(γ | x) and is given by

$$ f(\gamma \mid x) = \frac{f(x \mid \gamma)\, f(\gamma)}{m(x)}. $$

The above formula is of course Bayes' theorem. In the remainder of this thesis, when

using the Bayesian approach, only continuous parameters with prior density functions

are considered. Therefore, the following definitions are given only for the continuous

case. Similar definitions can be written when considering discrete parameters with

prior probability functions.

Definition 2.3.4 The posterior mean of γ is the expected value of γ given X = x. It is denoted here by γ̂, so that

$$ \hat\gamma = \int \gamma\, f(\gamma \mid x)\, d\gamma, $$

where the integration is over the range of γ. If γ is a one-dimensional random variable, then we also have the following:

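Definition 2.3.5 The posterior coverage probability of an interval [γ1, γ2] given X = x is

$$ P(\gamma_1 \le \gamma \le \gamma_2 \mid x) = \int_{\gamma_1}^{\gamma_2} f(\gamma \mid x)\, d\gamma. $$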
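Definitions 2.3.1 to 2.3.5 can be made concrete numerically. The Python sketch below approximates m(x), the posterior density, the posterior mean and the posterior coverage of an interval on a fine grid, for a binomial likelihood X | γ ~ Binomial(n, γ) and any prior density supplied as a function. The binomial model, the midpoint-rule grid and the uniform example prior are illustrative assumptions; the function names are hypothetical. With a uniform prior, x = 3 and n = 10, the exact answers are m(x) = 1/11 and a Beta(4, 8) posterior with mean 1/3, which the grid values can be compared against.

```python
import math

def grid_posterior(x, n, prior, grid_size=2000):
    """Midpoint-rule approximation on (0, 1) for X | gamma ~ Binomial(n, gamma).

    Returns grid points, posterior density values f(gamma | x), m(x), and spacing h.
    """
    h = 1.0 / grid_size
    gammas = [(i + 0.5) * h for i in range(grid_size)]
    # f(x | gamma) * f(gamma) at each grid point
    joint = [math.comb(n, x) * g**x * (1 - g)**(n - x) * prior(g) for g in gammas]
    m_x = sum(joint) * h                 # m(x): integral of f(x | gamma) f(gamma)
    post = [j / m_x for j in joint]      # Bayes' theorem: f(gamma | x)
    return gammas, post, m_x, h

def posterior_mean(gammas, post, h):
    """Definition 2.3.4: integral of gamma * f(gamma | x)."""
    return sum(g * p for g, p in zip(gammas, post)) * h

def posterior_coverage(gammas, post, h, lo, hi):
    """Definition 2.3.5: posterior probability of [lo, hi], to grid accuracy."""
    return sum(p for g, p in zip(gammas, post) if lo <= g <= hi) * h

def uniform_prior(g):
    return 1.0   # Beta(1, 1) density on (0, 1): an illustrative choice

gammas, post, m_x, h = grid_posterior(x=3, n=10, prior=uniform_prior)
mean = posterior_mean(gammas, post, h)
print(round(m_x, 4), round(mean, 4))
print(round(posterior_coverage(gammas, post, h, mean - 0.1, mean + 0.1), 3))
```

The grid approach works for any prior density on (0, 1), which is its main appeal here; the closed-form Beta results are used only as a check.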

2.3.2 Bayesian sample size criteria

Suppose that a random variable X depends on an unknown parameter γ, and that the purpose of a study is to estimate γ. For example, we might like to determine the minimum sample size n such that the posterior density function of γ is "sufficiently narrow". More specifically, we seek an n such that

$$ P(|\gamma - \hat\gamma| \le d \mid x) \ge 1 - \alpha $$

for given d and α. Thus we are looking for the minimum sample size n such that the posterior credible set defined by [γ̂ − d, γ̂ + d] has posterior coverage probability of at least 1 − α, that is

$$ \int_{\hat\gamma - d}^{\hat\gamma + d} f(\gamma \mid x)\, d\gamma \ge 1 - \alpha. $$

The posterior mean γ̂ and the credible interval [γ̂ − d, γ̂ + d] both depend on the data x, but x is unknown at the planning stage of the experiment. To sidestep this problem, we can select n such that the average posterior coverage over all possible values of x is greater than 1 − α. This can be done in several ways.

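Definition 2.3.6 An average coverage criterion (ACC) consists of fixing the desired posterior interval length $2d$, and finding the minimum sample size $n$ such that the expected posterior coverage probability is at least $1 - \alpha$, for a predetermined value of $\alpha$. Therefore, we seek the smallest $n$ satisfying

$$\int \left\{ \int_{\hat{\gamma} - d}^{\hat{\gamma} + d} f(\gamma \mid x, n)\, d\gamma \right\} m(x)\, dx \ge 1 - \alpha, \qquad (2.7)$$

where the outer integral is taken over all possible values of $x$. In the case where $X$ is a discrete random variable, equation 2.7 is written as

$$\sum_{x} \left\{ \int_{\hat{\gamma} - d}^{\hat{\gamma} + d} f(\gamma \mid x, n)\, d\gamma \right\} m(x) \ge 1 - \alpha.$$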

In either case, note that the average is taken over the preposterior marginal distribution of $x$, $m(x)$. Therefore, the average coverage is in fact a weighted average, with the weights given by $m(x)$.
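The discrete form of the ACC can be made concrete in the simplest setting where everything is available in closed form. The Python sketch below assumes a perfect test, so that the number of positives $x$ out of $n$ is binomial with parameter $\gamma$, and it assumes a Beta$(a, b)$ prior on $\gamma$ with $a, b \ge 1$. Under these assumptions $f(\gamma \mid x, n)$ is Beta$(a + x, b + n - x)$ and $m(x)$ is the beta-binomial probability function. The function names and the Simpson's-rule quadrature are illustrative choices, not the computational method used later in this thesis.

```python
import math

def log_beta(a, b):
    """log B(a, b) computed from log-gamma values."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_pdf(t, a, b, lb):
    """Beta(a, b) density at t, with lb = log B(a, b); assumes a, b >= 1."""
    if t <= 0.0:
        return math.exp(-lb) if a == 1 else 0.0
    if t >= 1.0:
        return math.exp(-lb) if b == 1 else 0.0
    return math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - lb)

def beta_prob(lo, hi, a, b, steps=200):
    """P(lo <= T <= hi) for T ~ Beta(a, b), by composite Simpson's rule (steps even)."""
    lo, hi = max(lo, 0.0), min(hi, 1.0)
    if hi <= lo:
        return 0.0
    lb = log_beta(a, b)
    h = (hi - lo) / steps
    total = beta_pdf(lo, a, b, lb) + beta_pdf(hi, a, b, lb)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(lo + i * h, a, b, lb)
    return total * h / 3

def average_coverage(n, d, a=1.0, b=1.0):
    """Sum over x of m(x) times the posterior coverage of [mean - d, mean + d]."""
    total = 0.0
    for x in range(n + 1):
        ap, bp = a + x, b + n - x                 # posterior is Beta(ap, bp)
        mean = ap / (ap + bp)                     # posterior mean
        log_m = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                 + log_beta(ap, bp) - log_beta(a, b))   # beta-binomial m(x)
        total += math.exp(log_m) * beta_prob(mean - d, mean + d, ap, bp)
    return total

def acc_sample_size(d, alpha, a=1.0, b=1.0, n_max=5000):
    """Smallest n whose average posterior coverage is at least 1 - alpha."""
    for n in range(1, n_max + 1):
        if average_coverage(n, d, a, b) >= 1 - alpha:
            return n
    raise ValueError("no n <= n_max satisfies the ACC")
```

For example, `acc_sample_size(0.1, 0.05)` searches upward from $n = 1$ under a uniform prior. Because the search is linear, the first $n$ it returns satisfies the criterion, while $n - 1$ does not.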

Note 2.3.1 In this section and elsewhere, we use posterior credible intervals of the form posterior mean $\pm\, d$. Other intervals could also be used, for example, highest posterior density (HPD) intervals. Although it would be easy to define criteria like the ACC in terms of HPD intervals, these would typically be much more difficult to compute, and may not substantially change the resulting sample size estimate in most cases. See Joseph, Wolfson, and du Berger [1993b] for a comparison of HPD and symmetric intervals in the context of various Bayesian sample size criteria.

Definition 2.3.7 An average length criterion (ALC) consists of fixing the coverage probability $1 - \alpha$ of the posterior credible interval for $\gamma$, and computing the sample size $n$ such that the expected length is at most $2d$. That is, we seek the minimum $n$ such that

$$\int 2\, d'(x, n)\, m(x)\, dx \le 2d,$$

where $d'(x, n)$ is such that

$$\int_{\hat{\gamma} - d'(x, n)}^{\hat{\gamma} + d'(x, n)} f(\gamma \mid x, n)\, d\gamma = 1 - \alpha.$$

Here $d'(x, n)$ depends on both $x$ and $n$.
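The Python sketch below illustrates the ALC in an assumed binomial setting: $x$ positives out of $n$ from a perfect test, with a Beta$(a, b)$ prior ($a, b \ge 1$), so that the posterior is Beta$(a + x, b + n - x)$ and $m(x)$ is beta-binomial. For each $x$, the half-width $d'(x, n)$ is found by bisection, which works because coverage increases with the half-width. The expected length is then the $m(x)$-weighted sum of $2d'(x, n)$. The helper names, the bisection tolerance, and the Simpson's-rule quadrature are all illustrative assumptions.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_pdf(t, a, b, lb):
    """Beta(a, b) density at t (lb = log B(a, b)); assumes a, b >= 1."""
    if t <= 0.0:
        return math.exp(-lb) if a == 1 else 0.0
    if t >= 1.0:
        return math.exp(-lb) if b == 1 else 0.0
    return math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - lb)

def beta_prob(lo, hi, a, b, steps=200):
    """P(lo <= T <= hi) for T ~ Beta(a, b), composite Simpson's rule."""
    lo, hi = max(lo, 0.0), min(hi, 1.0)
    if hi <= lo:
        return 0.0
    lb = log_beta(a, b)
    h = (hi - lo) / steps
    total = beta_pdf(lo, a, b, lb) + beta_pdf(hi, a, b, lb)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(lo + i * h, a, b, lb)
    return total * h / 3

def half_width(ap, bp, alpha, tol=1e-7):
    """d'(x, n): half-width of mean +/- d' with posterior coverage 1 - alpha."""
    mean = ap / (ap + bp)
    lo, hi = 0.0, 1.0              # coverage is 0 at d' = 0 and 1 at d' = 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_prob(mean - mid, mean + mid, ap, bp) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return hi

def average_length(n, alpha, a=1.0, b=1.0):
    """Expected interval length: sum over x of m(x) * 2 d'(x, n)."""
    total = 0.0
    for x in range(n + 1):
        ap, bp = a + x, b + n - x
        log_m = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                 + log_beta(ap, bp) - log_beta(a, b))
        total += math.exp(log_m) * 2 * half_width(ap, bp, alpha)
    return total

def alc_sample_size(d, alpha, a=1.0, b=1.0, n_max=5000):
    """Smallest n whose expected length is at most 2d (linear search, can be slow)."""
    for n in range(1, n_max + 1):
        if average_length(n, alpha, a, b) <= 2 * d:
            return n
    raise ValueError("no n <= n_max satisfies the ALC")
```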

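Definition 2.3.8 A worst outcome criterion (WOC) consists of finding the smallest $n$ such that the minimum posterior coverage probability of an interval of length $2d$, over all possible $x$, is at least $1 - \alpha$, for predetermined $d$ and $\alpha$. That is,

$$\inf_{x} \left\{ \int_{\hat{\gamma} - d}^{\hat{\gamma} + d} f(\gamma \mid x, n)\, d\gamma \right\} \ge 1 - \alpha.$$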
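The WOC can be illustrated in the same kind of assumed binomial/Beta setting as a sketch. The data are $x$ positives out of $n$ from a perfect test, with a Beta$(a, b)$ prior ($a, b \ge 1$) and a Beta$(a + x, b + n - x)$ posterior. Instead of weighting the coverage by $m(x)$, the code takes the minimum over every possible $x = 0, \ldots, n$. The function names and the quadrature are hypothetical choices made for this illustration.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_pdf(t, a, b, lb):
    """Beta(a, b) density at t (lb = log B(a, b)); assumes a, b >= 1."""
    if t <= 0.0:
        return math.exp(-lb) if a == 1 else 0.0
    if t >= 1.0:
        return math.exp(-lb) if b == 1 else 0.0
    return math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - lb)

def beta_prob(lo, hi, a, b, steps=200):
    """P(lo <= T <= hi) for T ~ Beta(a, b), composite Simpson's rule."""
    lo, hi = max(lo, 0.0), min(hi, 1.0)
    if hi <= lo:
        return 0.0
    lb = log_beta(a, b)
    h = (hi - lo) / steps
    total = beta_pdf(lo, a, b, lb) + beta_pdf(hi, a, b, lb)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(lo + i * h, a, b, lb)
    return total * h / 3

def worst_coverage(n, d, a=1.0, b=1.0):
    """Minimum over x = 0..n of the posterior coverage of [mean - d, mean + d]."""
    worst = 1.0
    for x in range(n + 1):
        ap, bp = a + x, b + n - x
        mean = ap / (ap + bp)
        worst = min(worst, beta_prob(mean - d, mean + d, ap, bp))
    return worst

def woc_sample_size(d, alpha, a=1.0, b=1.0, n_max=5000):
    """Smallest n whose worst-case coverage is at least 1 - alpha."""
    for n in range(1, n_max + 1):
        if worst_coverage(n, d, a, b) >= 1 - alpha:
            return n
    raise ValueError("no n <= n_max satisfies the WOC")
```

Under a uniform prior, the worst case occurs for $x$ near $n/2$, where the posterior variance is largest.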

While the ACC and the ALC may be criticized on the grounds that they ensure the desired posterior coverage probability or the desired length only on average, the WOC ensures that both the desired posterior coverage probability and the desired length will hold regardless of which data $x$ occur. The WOC may sometimes lead to unnecessarily high sample sizes, since some values of $x$ leading to large $n$ may have a very low, but nonzero, probability. Therefore, it may be more realistic to require that the desired posterior coverage probability and the desired length hold only over a subset of the data $x$. For more details on each of the above criteria see Joseph, Wolfson and du Berger [1995a].

Although Bayesian approaches to sample size estimation were proposed at least 35 years ago (see Raiffa and Schlaifer [1961]), they have not been widely used in practice. Since all sample size criteria must make at least some use of the available prior information, Bayesian approaches are natural for such problems. The main reason for their scarcity is therefore that these approaches involve computations that can be very difficult. For example, in order to calculate the marginal density of $x$ we need to calculate integrals of the form

$$m(x) = \int l(x \mid \gamma)\, f(\gamma)\, d\gamma.$$

These integrals can be very difficult, and often impossible, to compute directly, so that numerical approximations or other methods are needed. However, recent advances in Bayesian computing (see section 2.4) have largely removed this barrier. We will make use of these computational advances in this thesis.
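The simplest numerical approach to such an integral is plain Monte Carlo: draw $\gamma_1, \ldots, \gamma_N$ from the prior and average $l(x \mid \gamma_i)$. The sketch below checks this idea on an assumed example whose exact answer is known. It uses a binomial likelihood with $n = 10$ and $x = 3$ and a Beta$(2, 3)$ prior, for which $m(x)$ is the beta-binomial probability $\binom{10}{3} B(5, 10)/B(2, 3) = 1440/10010$. The function names are illustrative only.

```python
import math
import random

def binom_lik(x, n, g):
    """Binomial likelihood l(x | g)."""
    return math.comb(n, x) * g ** x * (1 - g) ** (n - x)

def marginal_mc(x, n, a, b, draws=100_000, seed=1):
    """Monte Carlo estimate of m(x): average likelihood over Beta(a, b) prior draws."""
    rng = random.Random(seed)
    return sum(binom_lik(x, n, rng.betavariate(a, b)) for _ in range(draws)) / draws

def marginal_exact(x, n, a, b):
    """Closed-form beta-binomial m(x), for comparison."""
    def log_beta(p, q):
        return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return math.comb(n, x) * math.exp(log_beta(a + x, b + n - x) - log_beta(a, b))

print(marginal_mc(3, 10, 2, 3), marginal_exact(3, 10, 2, 3))
```

This direct approach becomes inefficient when the likelihood is concentrated where the prior has little mass. That situation motivates algorithms such as SIR.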

2.4 Computational techniques

In recent years, algorithms useful for Bayesian analysis, such as sampling importance resampling (SIR) and the Gibbs sampler, have become very popular among applied statisticians. This has focused increasing attention on Bayesian approaches, both for general inference problems (see Gilks [1996] for a recent review) and for experimental design (see Dasgupta [1996] and Chaloner and Verdinelli [1995] for recent review articles on Bayesian design). The basic idea behind these algorithms is to replace difficult analytic computations of integrals involving posterior densities with summaries of samples from the target densities.

Here we will briefly describe two of the most popular approaches.

2.4.1 The SIR algorithm

In this thesis, we will use the SIR algorithm for many of the computations involving Bayesian inference. This algorithm is a generally applicable method for approximating posterior distributions. The SIR algorithm is useful when one is interested in obtaining a random sample from a probability distribution $g(x)$ that is difficult to work with analytically, or even to simulate from directly, but where there exists another, simpler distribution $h(x)$ that roughly approximates $g(x)$ and is easier to sample from. The SIR algorithm consists of the following 3 steps:

1. Draw a random sample of size $n$, $x_1, \ldots, x_n$, from $h(x)$.

2. Compute sample weights $w(x_i) = g(x_i)/h(x_i)$, $i = 1, \ldots, n$.

3. Draw a new random sample of size $m$, $x_1^*, \ldots, x_m^*$, with replacement from $x_1, \ldots, x_n$, with probabilities proportional to $w(x_1), \ldots, w(x_n)$.

The resulting sample $x_1^*, \ldots, x_m^*$ is an approximate independent random sample from $g(x)$. For example, one can approximate the posterior coverage probability of any subset $A$ of the real line by the proportion of points from $x_1^*, \ldots, x_m^*$ that are in $A$. See Albert [1993] for a simple explanation of the SIR algorithm in the context of binomial sampling, and see Rubin [1987] for a theoretical justification of the method. Clearly, increasing $m$ and $n$ will increase the accuracy of the results of calculations that use SIR samples.
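The three SIR steps can be sketched in a few lines of Python using only the standard `random` module. Since step 3 uses probabilities *proportional* to the weights, $g$ only needs to be known up to a multiplicative constant. The target and proposal below are assumptions chosen only so that the answer can be checked: $g$ is Beta$(3, 2)$, supplied as $x^2(1 - x)$, and $h$ is Uniform$(0, 1)$. The last line estimates $P(0.5 \le X \le 0.8)$ as the proportion of resampled points in that interval, whose exact value is $4x^3 - 3x^4$ evaluated between 0.5 and 0.8, about 0.5067.

```python
import random

def sir(draw_h, h_density, g_unnormalized, n, m, seed=1):
    """Sampling importance resampling.

    draw_h(rng) returns one draw from h; h_density and g_unnormalized evaluate
    h and a constant multiple of the target g.
    """
    rng = random.Random(seed)
    xs = [draw_h(rng) for _ in range(n)]                      # step 1: sample from h
    weights = [g_unnormalized(x) / h_density(x) for x in xs]  # step 2: w = g / h
    return rng.choices(xs, weights=weights, k=m)              # step 3: resample

# Hypothetical target Beta(3, 2), known only as x^2 (1 - x); proposal Uniform(0, 1).
sample = sir(lambda r: r.random(), lambda x: 1.0, lambda x: x * x * (1 - x),
             n=20_000, m=5_000)
coverage = sum(0.5 <= x <= 0.8 for x in sample) / len(sample)
```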

Example 2.4.1 Suppose that a diagnostic test is given to $n = 100$ subjects to determine whether or not they have a certain disease. Suppose further that $x = 25$ subjects test positive. Denote by $p$ the probability of testing positive. Suppose that prior information on the prevalence $\theta$, the sensitivity $S$ and the specificity $C$ is available, and the problem is to find the posterior distribution of $p$. This might be of interest, for example, in estimating the cost of follow-up for subjects who test positive. By the law of total probability, we have

$$P(T^+) = P(T^+ \mid D^+)\, P(D^+) + P(T^+ \mid D^-)\, P(D^-),$$

which can be written as

$$p = \theta S + (1 - \theta)(1 - C).$$

Suppose that the prior information on $\theta$, $S$ and $C$ can be summarized in the following prior distributions: $\theta \sim \mathrm{Beta}(3, 25)$, $S \sim \mathrm{Beta}(5, 2)$ and $C \sim \mathrm{Beta}(6, 1)$.

Recall that the density function of a Beta$(a, b)$ distribution is given by

$$f(t) = \frac{\Gamma(a + b)}{\Gamma(a)\, \Gamma(b)}\, t^{a-1} (1 - t)^{b-1}, \qquad 0 < t < 1,$$

where $\Gamma(\cdot)$ is the gamma function, and $a$ and $b$ are positive real numbers. The SIR algorithm can be applied to obtain a random sample from the posterior distribution of $p$. Let $g(p \mid x)$ be the posterior density function of $p$, and let $h(p)$ be the prior density function of $p$. Approximate random samples can be obtained from $h(p)$ as follows (here $n$ denotes the SIR sample size, not the number of subjects). First we draw a random sample of size $n = 50000$, say, from the prior distribution of $\theta$ (many algorithms are available for sampling from Beta densities). Random samples of size $n = 50000$ are also drawn from the prior distributions of $S$ and $C$. An approximate random sample of size $n = 50000$ from $h(p)$ is obtained by applying the formula

$$p_i = \theta_i S_i + (1 - \theta_i)(1 - C_i), \qquad i = 1, \ldots, 50000.$$

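Recall that

$$g(p \mid x) = \frac{l(x \mid p)\, h(p)}{m(x)},$$

where $m(x)$ is the marginal probability function of the data, and $l(x \mid p)$ is the likelihood of the data, given by $l(x \mid p) = \binom{100}{25} p^{25} (1 - p)^{75}$, or, up to a constant, $p^{25}(1 - p)^{75}$. Hence the sample weights are given by

$$w_i = \frac{g(p_i \mid x)}{h(p_i)} \propto p_i^{25} (1 - p_i)^{75}, \qquad i = 1, \ldots, 50000.$$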

Applying the SIR algorithm as described above, we resample with replacement from the $p_i$'s obtained from $h(p)$, using weights $w_i$, to obtain a simulated sample from the posterior distribution of $p$. A plot of the prior and posterior densities of $p$, obtained by smoothing the simulated samples, is shown in Figure 2.1.

Figure 2.1: Plots of the prior and posterior density of p (curves: prior density of p; posterior density of p).

Here we see that the posterior density function of $p$ seems to be more symmetrical than the prior density of $p$, and shifted to the right.
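The computation in Example 2.4.1 can be sketched in Python with the standard library. Prior draws of $\theta$, $S$ and $C$ are transformed into draws $p_i$ from $h(p)$. Each $p_i$ is weighted by $p_i^{25}(1 - p_i)^{75}$, and the $p_i$ are resampled with replacement. The resample size $m = 10000$, the seed and the function name are assumptions, since the example does not state them. The smoothing used for Figure 2.1 is not reproduced; the sketch only compares prior and posterior means. The prior mean of $p$ should be close to $E\theta\, ES + (1 - E\theta)(1 - EC) = 40/196 \approx 0.204$, because $\theta$, $S$ and $C$ are independent.

```python
import random

def sir_example_241(n_prior=50_000, m_post=10_000, x=25, n_subjects=100, seed=1):
    rng = random.Random(seed)
    # Draws from h(p): transform independent prior draws of theta, S and C.
    prior_p = []
    for _ in range(n_prior):
        theta = rng.betavariate(3, 25)   # prevalence
        s = rng.betavariate(5, 2)        # sensitivity
        c = rng.betavariate(6, 1)        # specificity
        prior_p.append(theta * s + (1 - theta) * (1 - c))
    # Weights proportional to the binomial likelihood p^25 (1 - p)^75.
    weights = [p ** x * (1 - p) ** (n_subjects - x) for p in prior_p]
    # Resample with replacement to get an approximate posterior sample of p.
    posterior_p = rng.choices(prior_p, weights=weights, k=m_post)
    return prior_p, posterior_p

prior_p, posterior_p = sir_example_241()
print(sum(prior_p) / len(prior_p), sum(posterior_p) / len(posterior_p))
```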

2.4.2 The Gibbs sampler algorithm

Like the SIR algorithm, this technique consists of drawing appropriate random samples from the posterior distribution in order to make statistical inferences on a set of unknown parameters, when direct computations are difficult or impossible to perform. It can be useful in the following situation. Suppose we have $k \ge 2$ random variables $X_1, \ldots, X_k$, where the conditional distribution $f(x_i \mid x_j : 1 \le j \le k,\ j \ne i)$ of each $X_i$, $1 \le i \le k$, given all other random variables, is either known or can be sampled from. Suppose further that one is interested in finding the marginal distribution of one or all of the random variables. The Gibbs sampler consists of the following steps:

1. Choose initial starting values for all the variables but one. For example, you can start with initial values $x_2^{(0)}, \ldots, x_k^{(0)}$ for $X_2, \ldots, X_k$, respectively.

2. Draw a random value $x_1^{(1)}$ for $X_1$ from $f(x_1 \mid x_2^{(0)}, \ldots, x_k^{(0)})$, which is the conditional distribution of $X_1$ given $X_2 = x_2^{(0)}, \ldots, X_k = x_k^{(0)}$.

3. Repeat this procedure for all variables in turn. That is, after having drawn $x_1^{(1)}$, draw a random value $x_2^{(1)}$ for $X_2$ from the conditional distribution of $X_2$ given $x_1^{(1)}, x_3^{(0)}, \ldots, x_k^{(0)}$, and so on.

A cycle of the algorithm is completed when all conditional distributions have been sampled from at least once. The entire cycle is repeated a large number of times, typically 5000 or 10000. The random sample generated for each random variable $X_i$, $1 \le i \le k$, is then regarded as a (possibly correlated) random sample from the correct marginal distribution. There is now a very large literature on the Gibbs sampler and other Markov chain Monte Carlo algorithms; for more details see Gilks [1996]. Although we do not use the Gibbs sampler in this thesis, it may be useful in more complex situations, for example, when two or more imperfect diagnostic tests are simultaneously applied to a sample from a population in order to estimate the prevalence of a disease; see Joseph, Gyorkos and Coupal [1995]. Other recently developed algorithms for Bayesian inference are reviewed by Evans and Swartz [1995].
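The cycle described above can be sketched on a textbook case that is not taken from this thesis: a standard bivariate normal pair $(X_1, X_2)$ with correlation $\rho$. Here the full conditionals are $X_1 \mid X_2 = x_2 \sim N(\rho x_2, 1 - \rho^2)$, and symmetrically for $X_2$. The starting value, the number of cycles, and the discarded initial "burn-in" cycles are assumptions. Discarding early cycles is common practice but is not part of the steps listed above.

```python
import random

def gibbs_bivariate_normal(rho, cycles=20_000, burn_in=1_000, seed=1):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    sd = (1 - rho ** 2) ** 0.5        # conditional standard deviation
    x2 = 0.0                           # step 1: start every variable except X1
    draws = []
    for t in range(cycles):
        x1 = rng.gauss(rho * x2, sd)   # step 2: draw X1 given the current X2
        x2 = rng.gauss(rho * x1, sd)   # step 3: draw X2 given the new X1
        if t >= burn_in:
            draws.append((x1, x2))
    return draws

draws = gibbs_bivariate_normal(0.8)
```

The retained $x_1$ values form a correlated sample from the $N(0, 1)$ marginal, and the pairs reproduce the correlation $\rho$.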

Chapter 3

Previous results and literature review

In this chapter we will first review the literature on estimation problems in diagnostic test situations, and then review previous work on sample size determination. These two topics form the basis for this thesis.

3.1 Diagnostic tests

Many authors have studied the estimation of the prevalence of a disease and the

accuracy of diagnostic tests when these tests are subject to errors. In this section we

will review some of this literature.

Walter and Irwig [1988] reviewed many methods for the estimation of the prevalence of a disease and the sensitivity and specificity of a diagnostic test when a gold standard test is not available. In general, suppose that $R$ different diagnostic tests, none of which is a gold standard, are given to each subject in a sample from $N$ different populations. If we suppose that each test provides a dichotomous response indicating presence or absence of the disease, then we have $R$ false positive rate parameters (1 - specificity), $R$ false negative rate parameters (1 - sensitivity), and a prevalence parameter associated with each population. Therefore, we have $N(2R + 1)$ parameters to estimate. Most authors have assumed the independence of the diagnostic test results from different tests, conditional on disease status. This is often called the conditional independence assumption. If the test results are conditionally independent, there are $N(2^R - 1)$ degrees of freedom in total. The log-likelihood of the data can be expressed as

$$\log L = \sum_{s=1}^{N} \sum_{x} n_s(x) \log \left[ \theta_s \prod_{r=1}^{R} \beta_{rs}^{1 - x(r)} (1 - \beta_{rs})^{x(r)} + (1 - \theta_s) \prod_{r=1}^{R} \alpha_{rs}^{x(r)} (1 - \alpha_{rs})^{1 - x(r)} \right], \qquad (3.1)$$

where $\alpha_{rs}$ and $\beta_{rs}$ denote the false positive and false negative rates for test $r$ in population $s$, and $\theta_s$ denotes the prevalence in population $s$. Further, $x(r)$ denotes the classification of an individual by test $r$ ($x(r) = 1$ for a positive and $0$ for a negative result), the second summation is over all combinations of observations given by the different diagnostic tests, and $n_s(x)$ is the number of individuals in population $s$ who receive a given set of classifications $x$. If the number of tests is $R \ge 3$, then all parameters can be estimated by maximum likelihood, since $N(2R + 1) \le N(2^R - 1)$, that is, the number of degrees of freedom is greater than or equal to the number of unknown parameters.
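The parameter count $N(2R + 1)$ and the degrees of freedom $N(2^R - 1)$ can be tabulated for small $R$ to see where the count of unknowns stops exceeding the available degrees of freedom. The Python sketch below is a direct transcription of those two expressions; the function name is hypothetical.

```python
def parameters_and_dof(num_tests, num_populations=1):
    """Unknown parameters N(2R + 1) versus degrees of freedom N(2^R - 1)
    under conditional independence, for R tests and N populations."""
    params = num_populations * (2 * num_tests + 1)
    dof = num_populations * (2 ** num_tests - 1)
    return params, dof

for r in range(1, 5):
    params, dof = parameters_and_dof(r)
    print(r, params, dof, "enough" if dof >= params else "constraints needed")
```

With one population, this gives 3 versus 1 for one test and 5 versus 3 for two tests, matching the cases discussed below. For $R = 3$ the two counts are equal at 7.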

In this case, many authors, including Dawid and Skene [1979] and White and Landis [1982], propose the use of the EM algorithm to estimate the parameters in equation 3.1. The EM algorithm consists of adopting initial estimates for each parameter, and alternating between expectation (E) and maximization (M) steps until convergence. See Dempster, Laird and Rubin [1977] for more on the EM algorithm. If $R < 3$, however, one is limited to estimating only a subset of the parameters, so that constraints must be imposed on some parameters in order to calculate maximum likelihood estimates of the others. For example, in the case where we only have one diagnostic test and one population, there are three parameters to estimate but only one degree of freedom, and therefore two constraints must be imposed. A common option in this case is to consider the sensitivity and the specificity as being exactly known, and to estimate the prevalence given these values. Rogan and Gladen [1978] propose the use of the estimate

$$\hat{\theta} = \frac{X/n + c - 1}{s + c - 1},$$

where $X$ is the number of positive test results among the $n$ subjects, $s$ is the sensitivity and $c$ is the specificity of the diagnostic test, and $\theta$ is the prevalence of the disease. A detailed derivation and discussion of this estimator is given in chapter 3 of this thesis.
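As a small illustration, the Rogan and Gladen estimate can be computed as follows. The unadjusted value can fall outside $[0, 1]$ when the observed positive rate $X/n$ lies below $1 - c$ or above $s$. The optional clipping to $[0, 1]$ is therefore an assumption added for this sketch, and the function name is hypothetical.

```python
def rogan_gladen(x, n, s, c, clip=True):
    """Rogan-Gladen prevalence estimate (X/n + c - 1) / (s + c - 1).

    s and c are the (assumed exactly known) sensitivity and specificity.
    """
    if s + c <= 1:
        raise ValueError("requires s + c > 1")
    est = (x / n + c - 1) / (s + c - 1)
    return min(1.0, max(0.0, est)) if clip else est

print(rogan_gladen(25, 100, 0.90, 0.95))   # hypothetical data: 25 positives of 100
```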

Two constraints are also needed in the case of two tests and one population, since here there are 5 unknown parameters but only 3 degrees of freedom. One possibility is to regard the sensitivity and the specificity of one test as being exactly known, and to estimate the sensitivity and the specificity of the other test, along with the prevalence. Staquet, Rozencweig, Lee et al. [1981] consider this case and compute bounds for the unknown sensitivity and specificity, which in turn lead to a range of values for the prevalence of the disease. A special case would be to assume that one of the tests is a gold standard, that is, $s = c = 1$. For example, in Weiner, Ryan, McCabe et al. [1979], arteriography is assumed to be a perfect test for coronary artery disease, and the stress ECG method is the diagnostic test whose characteristic parameters need to be estimated. Another choice would be to regard the sensitivities of both tests as being exactly known, and to estimate the specificities and the prevalence; see Staquet, Rozencweig, Lee et al. [1981]. An alternative approach is to consider the sensitivities of both tests as being unknown but equal, and equal to the specificities; see Chinn and Burney [1987]. Walter and Irwig [1988] also reviewed cases with irregular designs, where each test classifies only a subset of individuals. For example, these methods can be useful when preliminary tests are used to determine whether or not an individual will go on to be further tested. They also reviewed cases of response variables with more than two categories, for example when a subject may test positive, negative or indeterminate.

Lew and Levy [1989] used Bayesian methods to estimate the prevalence of a disease from the results of a screening test, when the sensitivity and the specificity of the test are exactly known and equal to $s$ and $c$, respectively, where $s + c > 1$. They used the fact that

$$p = \theta s + (1 - \theta)(1 - c),$$

which was derived in chapter 2, to write $\theta$ as

$$\theta = \frac{p + c - 1}{s + c - 1}. \qquad (3.2)$$

From equation 3.2, and using the invariance property of maximum likelihood estimators, we have that the maximum likelihood estimator of $\theta$ is given by

$$\hat{\theta} = \max\left\{0, \min\left[1, \frac{x/n + c - 1}{s + c - 1}\right]\right\}, \qquad (3.3)$$

where $x$ denotes the number of subjects with positive test results, and $n$ is the sample size. For rare diseases, the maximum likelihood estimator is often 0. To correct for this, the authors considered a Bayesian approach. They assumed that $\theta$ has a uniform prior distribution on the interval $[0, 1]$, and proposed the posterior mean as an estimator of $\theta$. The posterior mean of $\theta$ is derived from the posterior mean of $p$ in the following way. Taking expected values conditional on the data $x$ on both sides of equation 3.2, we have

$$E(\theta \mid x) = \frac{E(p \mid x) + c - 1}{s + c - 1}.$$

Therefore, the posterior mean of $\theta$ is obtained from the posterior mean of $p$. Since a uniform prior on $\theta$ induces a uniform prior on $p$ over $[1 - c, s]$, the posterior mean of $p$ is

$$E(p \mid x) = \frac{\int_{1-c}^{s} p^{x+1} (1 - p)^{n-x}\, dp}{\int_{1-c}^{s} p^{x} (1 - p)^{n-x}\, dp}.$$

See equation 2.4 in chapter 2. The authors also provide approximations to simplify the calculations, and calculate credible intervals around the posterior mean. The posterior mean has the advantage over the MLE that it always lies strictly between 0 and 1.
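The posterior mean of $\theta$ under a uniform prior can be computed numerically by integrating $p^x(1 - p)^{n - x}$ over $[1 - c, s]$. The sketch below uses Simpson's rule, working in log scale to avoid underflow for large $n$. It assumes $0 < 1 - c < s < 1$. The quadrature, the number of grid steps and the function name are illustrative choices, not the approximations proposed by Lew and Levy.

```python
import math

def lew_levy_posterior_mean(x, n, s, c, steps=2000):
    """E(theta | x) under a uniform prior on theta with s, c treated as known.

    p = theta*s + (1 - theta)*(1 - c) is then uniform on [1 - c, s].
    """
    lo, hi = 1 - c, s
    if not 0 < lo < hi < 1:
        raise ValueError("requires 0 < 1 - c < s < 1")
    h = (hi - lo) / steps                           # steps must be even
    grid = [lo + i * h for i in range(steps + 1)]
    logf = [x * math.log(p) + (n - x) * math.log1p(-p) for p in grid]
    top = max(logf)                                 # rescale before exponentiating
    num = den = 0.0
    for i, (p, lf) in enumerate(zip(grid, logf)):
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)   # Simpson weights
        f = math.exp(lf - top)
        num += w * p * f
        den += w * f
    e_p = num / den                                 # E(p | x)
    return (e_p + c - 1) / (s + c - 1)

print(lew_levy_posterior_mean(8, 96, 0.89, 0.74))   # values from Lew and Levy's example
```

With the values of Lew and Levy's example discussed below, the observed rate $8/96$ lies below $1 - c = 0.26$. The posterior of $p$ therefore piles up near 0.26, and the posterior mean of $\theta$ is small but positive.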

The authors propose that Bayesian methods be used for samples with sizes smaller than 100, and that the MLE be used for larger samples. While this may often be reasonable, as will be seen in chapter 3, the maximum likelihood estimator could be 0 with probability larger than 0.3 even for sample sizes much larger than 100. One must also be careful when assuming the sensitivity and the specificity of the test to be exactly known, since even small deviations could greatly affect the estimate of the prevalence. Consider the example provided in Lew and Levy [1989]. Here $n = 96$, $x = 8$, and the sensitivity and specificity are assumed to be exactly known, with $s = 0.89$ and $c = 0.74$. Letting $p$ denote the probability of testing positive, we have

$$p = \theta s + (1 - \theta)(1 - c).$$

Replacing $s$ and $c$ by their values, we get $p = 0.63\theta + 0.26$, and

$$P(X \le 8 \mid p) = \sum_{k=0}^{8} \binom{96}{k} p^k (1 - p)^{96 - k}.$$

See Casella and Berger [1990], page 82. Therefore, this probability is decreasing in $p$. Since $p \ge 0.26$ for every $\theta \in [0, 1]$, it reaches its maximum at $p = 1 - c = 0.26$. Hence

$$P(X \le 8) \le \sum_{k=0}^{8} \binom{96}{k} (0.26)^k (0.74)^{96 - k}.$$

Thus either the authors are considering a highly unusual situation, or the specificity of the diagnostic test is in fact much higher than 0.74, so that $1 - c$ is closer to $8/96 \approx 0.083$.
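The bound on $P(X \le 8)$ can be evaluated directly from the binomial distribution function. The short Python sketch below does this with `math.comb` and also shows that the probability decreases as $p$ increases. The helper name is hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))

bound = binom_cdf(8, 96, 0.26)   # largest possible P(X <= 8), attained at p = 1 - c
print(bound, binom_cdf(8, 96, 0.30))
```

The printed bound is very small. This is the sense in which observing only 8 positives out of 96 would be highly unusual if the specificity were really 0.74.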

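Johnson and Gastwirth [1991] considered the screening tests that are used to detect antibodies to the human immunodeficiency virus (HIV) in donated blood, and used a Bayesian approach to assess the prevalence of the disease. They developed approximations to the joint posterior distribution of the prevalence of the disease and the sensitivity and the specificity of the diagnostic test. The methods apply to the case where the prevalence is very small, the sample size very large, and the diagnostic test very accurate. They consider a sample of size $n$ and suppose that the data report either $x$ or $(x, x_{tp})$. Here $x$ denotes the number of subjects with positive test results, and $x_{tp}$ denotes the number of subjects that are truly positive, that is, the number of subjects among the $x$ positives that have the disease. They suppose that the sensitivity $S$, the specificity $C$ and the prevalence $\theta$ have independent beta prior distributions, $\theta \sim \mathrm{Beta}(a, b)$, $S \sim \mathrm{Beta}(a_1, b_1)$, and $C \sim \mathrm{Beta}(a_2, b_2)$. The likelihood functions given data $x$ or $(x, x_{tp})$ are

$$L(\theta, S, C \mid x) \propto [\theta S + (1 - \theta)(1 - C)]^{x}\, [\theta (1 - S) + (1 - \theta) C]^{n - x}$$

and

$$L(\theta, S, C \mid x, x_{tp}) \propto (\theta S)^{x_{tp}}\, [(1 - \theta)(1 - C)]^{x - x_{tp}}\, [\theta (1 - S) + (1 - \theta) C]^{n - x},$$

respectively. From Bayes' theorem, the joint posterior probability density functions for $(\theta, S, C)$ under the above two likelihoods are

$$P(\theta, S, C \mid x) \propto L(\theta, S, C \mid x)\, \theta^{a-1} (1 - \theta)^{b-1} S^{a_1 - 1} (1 - S)^{b_1 - 1} C^{a_2 - 1} (1 - C)^{b_2 - 1}$$

and

$$P(\theta, S, C \mid x, x_{tp}) \propto L(\theta, S, C \mid x, x_{tp})\, \theta^{a-1} (1 - \theta)^{b-1} S^{a_1 - 1} (1 - S)^{b_1 - 1} C^{a_2 - 1} (1 - C)^{b_2 - 1},$$

respectively. In the case where $\theta$ is near 0, and $S$ and $C$ are both near 1,

$$\theta S + (1 - \theta)(1 - C) \approx \theta + 1 - C$$

and

$$\theta (1 - S) + (1 - \theta) C \approx C - \theta,$$

where in the second approximation $\theta(1 - S)$ is approximated by 0 and $\theta C$ is approximated by $\theta$. Therefore,

$$\theta (1 - S) + (1 - \theta) C \approx 1 - (\theta + 1 - C) \approx \exp(-(\theta + 1 - C)).$$

The last approximation follows since, for small $x$, $\exp(x) \approx 1 + x$, from the Taylor series expansion of $\exp(x)$. Now, assuming that $b$, $a_1$ and $a_2$ are not small relative to $n$, and using the approximation $\exp(x) \approx 1 + x$, we get

$$(1 - \theta)^{b-1} \approx \exp(-b\theta), \qquad S^{a_1 - 1} \approx \exp(-a_1 (1 - S))$$

and

$$C^{a_2 - 1} \approx \exp(-a_2 (1 - C)).$$

It follows that

$$P(\theta) \propto \theta^{a-1} \exp(-b\theta), \qquad P(S) \propto (1 - S)^{b_1 - 1} \exp(-a_1 (1 - S))$$

and

$$P(C) \propto (1 - C)^{b_2 - 1} \exp(-a_2 (1 - C)).$$

Therefore, by Bayes' theorem and the above approximations,

$$P(\theta, S, C \mid x) \propto (\theta + 1 - C)^{x} \exp(-(n - x)(\theta + 1 - C))\, \theta^{a-1} \exp(-b\theta)\, (1 - S)^{b_1 - 1} \exp(-a_1 (1 - S))\, (1 - C)^{b_2 - 1} \exp(-a_2 (1 - C)),$$

and

$$P(\theta, S, C \mid x, x_{tp}) \propto \theta^{a + x_{tp} - 1} \exp(-\theta (b + n - x))\, (1 - S)^{b_1 - 1} \exp(-a_1 (1 - S))\, (1 - C)^{b_2 + x - x_{tp} - 1} \exp(-(1 - C)(a_2 + n - x)).$$

Therefore, the marginal posterior densities of $\theta$, $S$ and $C$ given $(x, x_{tp})$ are approximately gamma densities (shape, rate):

$$\theta \sim \mathrm{Gamma}(a + x_{tp},\ b + n - x), \qquad 1 - S \sim \mathrm{Gamma}(b_1,\ a_1), \qquad 1 - C \sim \mathrm{Gamma}(b_2 + x - x_{tp},\ a_2 + n - x).$$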

These approximations provide a way to calculate the marginal posterior densities when direct calculations are impossible. However, one must remember that they were based on the assumptions that $b$, $a_1$ and $a_2$ are not small relative to $n$. Their applications are therefore limited to situations where the prior information matches these conditions. In particular, the examples provided in Johnson and Gastwirth [1991] seem to violate the assumption that $b$, $a_1$ and $a_2$ are not small relative to $n$. For example, using a uniform prior on $\theta$ violates the assumption that $b$ is not small relative to $n$, since $b = 1$ and $n = 94496$. Also, the priors $S \sim \mathrm{Beta}(142, 1)$ and $C \sim \mathrm{Beta}(1363, 3)$ violate the assumption that $a_1$ and $a_2$ are not small relative to $n$, since $a_1 = 142$, $a_2 = 1363$ and $n = 3122556$. These violations might at least partially explain the inconsistency of the results found by the authors. For instance, the authors state that ". . . The posterior standard deviations tend to decrease with decreasing prior information . . . If the prior means were close to the data this phenomena would not occur . . ." This might be due to the fact that decreasing prior information implies, according to the examples provided, decreasing $b$, $a_1$ and $a_2$ while keeping $n$ fixed. This creates larger errors in the approximations used, while having the prior mean close to the data means that $b$, $a_1$ and $a_2$ are closer to $n$. Therefore, in choosing the prior distributions one has to verify that the assumptions on which the approximations were based in fact hold.

Taragin, Wildman and Trout [1994] study the estimation of the prevalence of disease $\theta$ from an imperfect diagnostic test. They also propose the use of the estimate given by equation 3.3, and assume that the sensitivity and specificity of the diagnostic test are exactly known and equal to $s$ and $c$, respectively. If the sensitivity and specificity are not exactly known, the authors suggest the use of estimates "consistent with the literature" to determine a range of values for the sensitivity and a range of values for the specificity. Together these provide a range of values within which the prevalence of disease will lie. Taragin, Wildman and Trout [1994] provide three examples where equation 3.3 can be applied.

When using a Bayesian approach, estimation of all unknown parameters is based on their joint posterior distribution. Even though the formulation of these posterior distributions via Bayes' theorem is simple, the calculations can be quite involved. In recent years, with the advent of numerical techniques useful for Bayesian inference such as the Gibbs sampler and the SIR algorithm (see chapter 2), one can draw approximate random samples from the joint posterior distribution of the parameters, and draw inferences about the parameters based on these samples.

Joseph, Gyorkos and Coupal [1995] provide a method to estimate the prevalence and the parameters of one or more conditionally independent diagnostic tests in the absence of a gold standard. In the case of one diagnostic test, they assume that the prevalence of the disease in a given population and the sensitivity and the specificity of the diagnostic test have independent Beta prior distributions, $\theta \sim \mathrm{Beta}(\alpha, \beta)$, $S \sim \mathrm{Beta}(\alpha_1, \beta_1)$, and $C \sim \mathrm{Beta}(\alpha_2, \beta_2)$. The data can be represented by the 2 × 2 table given in table 3.1. Here the sample size $n$ is known, and $a$ and $b$ are the numbers of positive and negative tests observed, respectively. The "data" $Y_1$ and $Y_2$, representing the numbers of truly diseased individuals out of $a$ and $b$, respectively, are not observed, and are therefore termed latent data.

                    True state
Test result     D+            D-               Total
T+              Y1            a - Y1           a
T-              Y2            b - Y2           b
Total           Y1 + Y2       n - Y1 - Y2      n

Table 3.1: Observed and latent data for one diagnostic test

The likelihood function of the observed and latent data is then

$$l(a, b, Y_1, Y_2 \mid \theta, S, C) \propto \theta^{Y_1 + Y_2} (1 - \theta)^{n - Y_1 - Y_2}\, S^{Y_1} (1 - S)^{Y_2}\, C^{b - Y_2} (1 - C)^{a - Y_1}.$$

Using Bayes' theorem, the joint posterior distribution of $\theta$, $S$ and $C$ is

$$P(\theta, S, C \mid a, b, Y_1, Y_2) \propto \theta^{Y_1 + Y_2 + \alpha - 1} (1 - \theta)^{n - Y_1 - Y_2 + \beta - 1} \times S^{Y_1 + \alpha_1 - 1} (1 - S)^{Y_2 + \beta_1 - 1}\, C^{b - Y_2 + \alpha_2 - 1} (1 - C)^{a - Y_1 + \beta_2 - 1}.$$

Note that the conditional distributions of the unobserved latent data $Y_1$ and $Y_2$ given $(a, b, \theta, S, C)$ are

$$Y_1 \mid a, \theta, S, C \sim \mathrm{Binomial}\left(a,\ \frac{\theta S}{\theta S + (1 - \theta)(1 - C)}\right)$$

and

$$Y_2 \mid b, \theta, S, C \sim \mathrm{Binomial}\left(b,\ \frac{\theta (1 - S)}{\theta (1 - S) + (1 - \theta) C}\right).$$

The Gibbs sampler can be applied to obtain samples from the marginal densities for $Y_1$, $Y_2$, $\theta$, $S$, and $C$, by sampling in turn from the full conditional distributions for these variables. As we will show in chapter 6, approximate posterior densities can also be derived using a SIR algorithm, at least in the case of one test. The authors applied similar methods to the case where the results of two or more diagnostic tests are available, none of which can be considered a gold standard.
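A minimal sketch of this Gibbs sampler for one test is given below, written in Python with the standard library. Given the latent counts, the Beta priors combine with the complete-data likelihood to give Beta full conditionals:

- $\theta \sim \mathrm{Beta}(Y_1 + Y_2 + \alpha,\ n - Y_1 - Y_2 + \beta)$;
- $S \sim \mathrm{Beta}(Y_1 + \alpha_1,\ Y_2 + \beta_1)$;
- $C \sim \mathrm{Beta}(b - Y_2 + \alpha_2,\ a - Y_1 + \beta_2)$.

$Y_1$ and $Y_2$ are drawn from the binomial conditionals stated above, by summing Bernoulli draws. Several inputs are illustrative assumptions: the starting values, the number of cycles, the burn-in, and the data ($a = 25$ positives and $b = 75$ negatives, with the priors of Example 2.4.1). This is not the authors' own code.

```python
import random

def gibbs_one_test(a_pos, b_neg, prior_theta, prior_s, prior_c,
                   cycles=5000, burn_in=500, seed=1):
    """Gibbs sampler for prevalence, sensitivity and specificity of one test.

    prior_* are (shape1, shape2) pairs of the Beta priors; returns (theta, S, C) draws.
    """
    rng = random.Random(seed)
    n = a_pos + b_neg
    theta, s, c = 0.1, 0.8, 0.9          # arbitrary starting values
    draws = []
    for t in range(cycles):
        # Latent data: truly diseased among the positives (Y1) and negatives (Y2).
        p1 = theta * s / (theta * s + (1 - theta) * (1 - c))
        p2 = theta * (1 - s) / (theta * (1 - s) + (1 - theta) * c)
        y1 = sum(rng.random() < p1 for _ in range(a_pos))
        y2 = sum(rng.random() < p2 for _ in range(b_neg))
        # Beta full conditionals for theta, S and C given the latent data.
        theta = rng.betavariate(y1 + y2 + prior_theta[0], n - y1 - y2 + prior_theta[1])
        s = rng.betavariate(y1 + prior_s[0], y2 + prior_s[1])
        c = rng.betavariate(b_neg - y2 + prior_c[0], a_pos - y1 + prior_c[1])
        if t >= burn_in:
            draws.append((theta, s, c))
    return draws

draws = gibbs_one_test(25, 75, (3, 25), (5, 2), (6, 1))
```

The probability of a positive test, $p = \theta S + (1 - \theta)(1 - C)$, can be computed for each retained draw, giving a posterior sample of $p$ that is comparable to the SIR sample of Example 2.4.1.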

Joseph and Gyorkos [1996] used similar methods to obtain posterior distributions for the likelihood ratios of diagnostic tests in the absence of a gold standard.

3.2 Sample size

Sample size determination is an important component of the design of virtually every experiment. The subject has received much attention, and a large literature is available. Little work, however, has been done on the question of determining the sample size required to estimate the prevalence of a disease to within a given accuracy when a gold standard test is not available. If the diagnostic test used is a gold standard, then estimating the prevalence of the disease, and determining the sample size needed for the estimate to be within a given accuracy of the true value, reduces to a simple binomial sample size problem. This case will be discussed in full in the next chapter. When a gold standard test is available, the problem of determining the sample sizes required to estimate the sensitivity and specificity of an imperfect diagnostic test to within a given accuracy also reduces to a simple binomial sample size problem. In this section we will review some of the previous work done on the question of sample size, giving more attention to the work that is most relevant to diagnostic tests.

Many sample size papers come from the clinical trials literature. Clinical trials are

usually designed to determine if exposure to a certain factor (such as a drug) changes

a disease state. One would therefore like to determine the sample size required to

ensure detection of an association between the factor and the disease state with high

probability. There is a very large literature on estimating sample sizes for clinical

trials, much of which is applicable to other situations as well.

Lachin [1981] summarized the most common methods for estimating sample size requirements for clinical trials. The main idea behind most of the methods is as follows. Let $X$ be a normally distributed random variable; in particular, assume $X \sim N(\mu, \Sigma)$, where $\Sigma$ is usually a function of the variance of each individual observation, $\sigma^2$, and of the sample size $n$. Often $\Sigma$ is taken to be $\sigma^2/n$. Suppose that the primary purpose of the study is to test the null hypothesis $H_0: \mu = \mu_0$ against the alternative hypothesis $H_1: \mu = \mu_1$. Suppose further that under the null hypothesis $\sigma^2 = \sigma_0^2$, and under the alternative hypothesis $\sigma^2 = \sigma_1^2$. One would like to determine the sample size required for the hypothesis test to have a power of at least $1 - \beta$ when the type I error is at most $\alpha$, for given $\alpha$ and $\beta$. Straightforward algebra (see Lachin [1981]) leads to a sample size

$$n = \left( \frac{Z_\alpha \sigma_0 + Z_\beta \sigma_1}{|\mu_1 - \mu_0|} \right)^2,$$

where $Z_\alpha$ and $Z_\beta$ are the $100(1 - \alpha)\%$ and $100(1 - \beta)\%$ quantiles of the standard normal distribution, respectively. In the case where $\sigma^2$ is unknown, it can be approximated from a preliminary sample of size $m$, for example by the sample variance $S^2 = \sum_{i=1}^{m} (x_i - \bar{x})^2/(m - 1)$. See Desu and Raghavarao [1989] for a discussion of the estimation of $\sigma^2$. A Student $t$ test based on $S^2$ may then be used to find the sample size.
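The normal-theory formula above is straightforward to evaluate. The Python sketch below uses `statistics.NormalDist` for the quantiles $Z_\alpha$ and $Z_\beta$, following the one-sided convention of the formula as written, and rounds up to the next integer. The function name, the default $\alpha = 0.05$ and $\beta = 0.20$, and the example values are assumptions for illustration.

```python
import math
from statistics import NormalDist

def lachin_n(mu0, mu1, sigma0, sigma1, alpha=0.05, beta=0.20):
    """n = ((Z_alpha sigma0 + Z_beta sigma1) / |mu1 - mu0|)^2, rounded up."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha), z(1 - beta)
    return math.ceil(((z_a * sigma0 + z_b * sigma1) / abs(mu1 - mu0)) ** 2)

print(lachin_n(0.0, 0.5, 1.0, 1.0))   # half a standard deviation shift, 80% power
```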

Lachin [1981] applies this general method to a large variety of commonly occurring cases, including the case where the purpose of the study is to test whether the means of two independent groups are equal, as would occur in a clinical trial with a placebo group.

Donner [1984] also reviewed approaches to sample size estimation in the design of randomized clinical trials, where the primary purpose is to compare two treatments with respect to the occurrence of (or time to) some specified event. In general, denote by $P_C$ the anticipated $T$-year event rate among control group patients, and let $P_E$ be the anticipated $T$-year event rate among experimental group patients. Let $\delta = P_E - P_C$, the difference in event rates. Suppose one would like to test $H_0: \delta = 0$ against $H_a: \delta = \delta_a$, at level $\alpha$ and power $1 - \beta$. The required sample size $n$ for each group can be calculated by approximating the binomial distributions with the appropriate matching normal distributions. The sample size is then given by

$$n = \frac{(Z_\alpha \sigma_0 + Z_\beta \sigma_1)^2}{\delta_a^2},$$

where $\sigma_0$ and $\sigma_1$ are the standard deviations of an observation under $H_0$ and $H_a$, respectively. Different methods of estimating $\sigma_0$ and $\sigma_1$ lead to different formulas.

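Fleiss [1981] supposed that $P_C$ is known, and provided the following sample size formula:

$$n = \frac{\left[ Z_\alpha \sqrt{2 \bar{P} (1 - \bar{P})} + Z_\beta \sqrt{P_C (1 - P_C) + P_E (1 - P_E)} \right]^2}{\delta_a^2}, \qquad (3.9)$$

where $\bar{P} = (P_C + P_E)/2$ and $P_E = P_C + \delta_a$. Lachin [1981] approximates equation 3.9 by

$$n = \frac{2 (Z_\alpha + Z_\beta)^2\, \bar{P} (1 - \bar{P})}{\delta_a^2}. \qquad (3.10)$$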
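To compare equations 3.9 and 3.10, the Python sketch below evaluates both for an assumed scenario: $P_C = 0.30$, $\delta_a = 0.15$, $\alpha = 0.05$ (one-sided $Z_\alpha$, as written) and $\beta = 0.20$. The scenario values and function names are hypothetical. Both functions return unrounded values; in practice one rounds up to the next integer.

```python
from statistics import NormalDist

def z_upper(q):
    """Upper-q point of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - q)

def fleiss_n(p_c, delta, alpha=0.05, beta=0.20):
    """Per-group sample size from equation 3.9."""
    p_e = p_c + delta
    p_bar = (p_c + p_e) / 2
    num = (z_upper(alpha) * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_upper(beta) * (p_c * (1 - p_c) + p_e * (1 - p_e)) ** 0.5)
    return num ** 2 / delta ** 2

def lachin_approx_n(p_c, delta, alpha=0.05, beta=0.20):
    """Per-group sample size from the approximation 3.10."""
    p_bar = p_c + delta / 2
    return 2 * (z_upper(alpha) + z_upper(beta)) ** 2 * p_bar * (1 - p_bar) / delta ** 2

print(fleiss_n(0.30, 0.15), lachin_approx_n(0.30, 0.15))
```

In this scenario the two formulas differ by only about one patient per group, which illustrates why 3.10 is a convenient approximation.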

Feinstein [1977], Cohen [1977] and Snedecor and Cochran [1980] provide similar formulas to approximate equation 3.9. None of these formulas accounts for the continuity correction often used to improve the normal approximation to the binomial distribution. Fleiss, Tytun and Ury [1980] show that the incorporation of the continuity correction implies that the value of $n$ given by equation 3.10 should be increased by an amount of $2/|P_E - P_C|$. They also considered the case of randomization of unequal numbers of patients into the two groups. If, for example, one wishes to randomize $n$ patients to the experimental group and $sn$ to the control group, where $s > 0$, then $n$ is given by

$$n = \frac{\left[ Z_\alpha \sqrt{(s + 1) \bar{P} (1 - \bar{P})} + Z_\beta \sqrt{s P_E (1 - P_E) + P_C (1 - P_C)} \right]^2}{s\, \delta_a^2},$$

where $\bar{P} = (P_E + s P_C)/(1 + s)$.
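The unequal-allocation sample size and the continuity correction can be sketched as below. The formula coded here follows from the normal approximation to the difference of two proportions with $n$ experimental and $sn$ control patients, using $\bar{P} = (P_E + sP_C)/(1 + s)$. With $s = 1$ it reduces to the equal-allocation formula. The function names and the example rates are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def unequal_allocation_n(p_c, p_e, s, alpha=0.05, beta=0.20):
    """Unrounded n for n experimental and s*n control patients."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha), z(1 - beta)
    p_bar = (p_e + s * p_c) / (1 + s)            # pooled rate under H0
    num = (z_a * math.sqrt((s + 1) * p_bar * (1 - p_bar))
           + z_b * math.sqrt(s * p_e * (1 - p_e) + p_c * (1 - p_c)))
    return num ** 2 / (s * (p_e - p_c) ** 2)

def continuity_corrected(n, p_c, p_e):
    """Add 2 / |P_E - P_C| to account for the continuity correction."""
    return n + 2 / abs(p_e - p_c)

n_equal = unequal_allocation_n(0.30, 0.45, 1)
n_double_controls = unequal_allocation_n(0.30, 0.45, 2)
print(n_equal, continuity_corrected(n_equal, 0.30, 0.45), n_double_controls)
```

Doubling the control group ($s = 2$) reduces the number of experimental patients needed, at the cost of a larger total sample.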

Schlesselman [1974] provides a formula for the sample size needed to detect a relative risk $R = P_E/P_C$ that is regarded as clinically important. To test $H_0: R = 1$ against $H_a: R = R_a$, the sample size required for a test of level $\alpha$ to have power $1 - \beta$ is given by

$$n = \frac{\left[ Z_\alpha \sqrt{2 \bar{P} (1 - \bar{P})} + Z_\beta \sqrt{P_C (1 - P_C) + P_E (1 - P_E)} \right]^2}{(P_E - P_C)^2}, \qquad P_E = R_a P_C,$$

where $\bar{P} = \frac{1}{2} P_C (1 + R_a)$. Similar formulas are given by Makuch and Simon [1978] and Blackwelder [1982] in estimating the sample size requirements for clinical trials for bioequivalence, that is, for trials designed to show that an experimental therapy is equivalent in efficacy to a control therapy. This is of interest when the control therapy is invasive or toxic. Donner [1984] also reviews several papers that consider the stratification of subjects into $k$ categories and cases where one accounts for

Spiegelhalter and Freedman [1986] study the determination of sample size in clinical trials when the study consists of comparing an experimental therapy against a standard treatment or placebo. They note that the usual method has some weaknesses. That method tests $H_0: \delta = 0$ against $H_a: \delta = \delta_a$ and computes the sample size required for a test of level $\alpha$ and power $1 - \beta$. In particular, the specification of $\delta_a$ is often vague, and it is usually "juggled until it is set at a value that is reasonably plausible, and yet detectable given the available patients". In this paper Spiegelhalter and Freedman propose a different approach that takes account of the prior clinical opinion about the treatment difference. This method consists of setting the null hypothesis at $\delta = \delta_s$, the smallest clinically worthwhile improvement necessary to recommend the new treatment. The prior distribution $f(\delta)$ of $\delta$ is chosen based on information about $\delta$ available to the investigator prior to the conduct of the trial. Decisions can then be made in the following way. Suppose a trial with a maximum of $n$ subjects is envisaged. After the data are collected, an interval for the treatment difference $\delta$ can be formed (this method applies to both confidence intervals and Bayesian credible intervals). If this interval lies wholly below $\delta_s$, then the new treatment is inferior to the old one. If the interval contains $\delta_s$, then the trial is not conclusive. If the interval lies wholly above $\delta_s$, then the new treatment is superior. Denote by $A$ the event that the new treatment is declared superior. The power of the test is a function of $\delta$. For example, if the statistic of interest follows a normal distribution, $\bar{X}_n \sim N(\delta, \sigma^2/n)$, then the power of the test is given by

where a(.) is the standard normal distribution function. For the derivation of equa-

tion 3.11, see Casella and Berger [1990]. The probability of correctly concluding that

the new treatment is superior is then calculated based on the prior distribution of 6.

A sample size can then be calculated to insure that P ( A ) is greater than some prede-

termined value. The authors acknowledge that this method is subjective, in the sense

that it depends on the prior distribution of 6, and that this prior distribution may

sometimes reflect the overenthusiasm of the trial planners. The authors hope that

with the frequent use of this method, planners will become trained to think deeply

about the trial before they conduct it, so that more objective priors for 6 mil1 be

proposed. See Spiegelhalter, Freedman and Parman [1994] for further discussion of

this method.
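The calculation of P(A) can be sketched numerically. The sketch below is an illustration under our own assumptions, not the authors' exact formulation: the interval formed at the end of the trial is taken to be the usual two-sided 100(1 − α)% interval $\bar X_n \pm z_{\alpha/2}\sigma/\sqrt n$, so that the power of declaring superiority at a true difference δ is $\Phi(\sqrt n(\delta - \delta_S)/\sigma - z_{\alpha/2})$, and the prior f(δ) is taken to be normal with mean `mu0` and standard deviation `tau`. All function and parameter names are hypothetical. With a normal prior, averaging this power over f(δ) has a closed form, because marginally $\bar X_n \sim N(\mu_0, \tau^2 + \sigma^2/n)$.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(delta, n, sigma, delta_s, alpha=0.05):
    """P(lower limit of the 100(1-alpha)% interval exceeds delta_s | true difference delta)."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(sqrt(n) * (delta - delta_s) / sigma - z)


def prob_superior(n, mu0, tau, sigma, delta_s, alpha=0.05):
    """P(A): the power averaged over an assumed N(mu0, tau^2) prior for delta.

    Marginally the trial mean is N(mu0, tau^2 + sigma^2/n), which gives a closed form.
    """
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)
    return STD_NORMAL.cdf((mu0 - delta_s - z * se) / sqrt(tau**2 + se**2))


def smallest_n(target, mu0, tau, sigma, delta_s, alpha=0.05, n_max=100_000):
    """Smallest n with P(A) >= target, or None if the target is not reached by n_max."""
    for n in range(1, n_max + 1):
        if prob_superior(n, mu0, tau, sigma, delta_s, alpha) >= target:
            return n
    return None
```

Under these assumptions P(A) can never exceed $\Phi((\mu_0 - \delta_S)/\tau)$, however large n is, so a target above that ceiling is unreachable and `smallest_n` returns `None`. This is one concrete face of the dependence on the prior that the authors acknowledge.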

Arkin, Mitchell, and Wachtel [1990] studied sample size determination for assessing the properties of a diagnostic test B, that is, the sensitivity, the specificity, and the positive and negative predictive values of that test, by comparing it to a reference diagnostic test A. They denote by $\pi_a$ and $\pi_b$ the test characteristic of interest for tests A and B, respectively, and they test the hypothesis $H_0: \pi_a - \pi_b = 0$ against $H_a: \pi_a - \pi_b = \delta_a$ at level α, where $\delta_a$ is the smallest difference between $\pi_a$ and $\pi_b$ considered to be clinically important. In order to perform this hypothesis test, independent random samples of sizes $n_1$ and $n_2$ are required from tests A and B, respectively. The authors determine the minimum common sample size $n = n_1 = n_2$ needed to ensure that the power of the test is 1 − β at a prespecified alternative $\delta_a > 0$. They find that the required n is given by equation 3.12, where $\bar\pi = (\pi_a + \pi_b)/2$. Equation 3.12 is equivalent to equation 3.10 given above.
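A minimal sketch, assuming that equations 3.10 and 3.12 take the familiar normal-approximation form for comparing two proportions with a two-sided level-α test, $n = \lceil [z_{\alpha/2}\sqrt{2\bar\pi(1-\bar\pi)} + z_\beta\sqrt{\pi_a(1-\pi_a) + \pi_b(1-\pi_b)}]^2/\delta_a^2 \rceil$ per group. This form is consistent with the definitions of $\bar\pi$ and $\bar P$ above, but the exact form (one- or two-sided, continuity correction) is our assumption, as are the function names.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_per_group(pi_a, pi_b, alpha=0.05, beta=0.20):
    """Per-group n to detect pi_a - pi_b with a two-sided level-alpha test and power 1 - beta."""
    z_a = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = STD_NORMAL.inv_cdf(1 - beta)
    pbar = (pi_a + pi_b) / 2
    root = z_a * sqrt(2 * pbar * (1 - pbar)) + z_b * sqrt(
        pi_a * (1 - pi_a) + pi_b * (1 - pi_b)
    )
    return ceil(root**2 / (pi_a - pi_b) ** 2)


def n_relative_risk(p_c, r_a, alpha=0.05, beta=0.20):
    """Same formula with P_E = R_a * P_C, so that pbar = P_C (1 + R_a) / 2."""
    return n_per_group(p_c, r_a * p_c, alpha, beta)


print(n_per_group(0.6, 0.4))      # 97 per group for 1 - alpha = 0.95, power 0.8
print(n_relative_risk(0.2, 2.0))  # 82 per group for P_C = 0.2, R_a = 2
```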

In many studies the main purpose is to estimate a particular test characteristic to within ±d. To estimate the sample size, Arkin, Mitchell, and Wachtel [1990] propose the use of the standard binomial equation

$$n = \frac{z_{\alpha/2}^2\,\pi(1-\pi)}{d^2},$$

derived from equation 2.3.

Simel, Samsa and Matchar [1990] studied the sample size requirements for accurately estimating likelihood ratios based on widths of confidence intervals. In general, suppose that one has two random samples of sizes $n_1$ and $n_2$ from two binomial populations with parameters $p_1$ and $p_2$, respectively. Suppose that one is interested in estimating $p_1/p_2$. The authors propose the estimator $(x_1/n_1)/(x_2/n_2)$, where $x_1/n_1$ and $x_2/n_2$ are the observed sample proportions. A 100(1 − α)% confidence interval for $p_1/p_2$ is

$$\exp\left\{\log\left(\frac{x_1/n_1}{x_2/n_2}\right) \pm z_{\alpha/2}\sqrt{\frac{1 - x_1/n_1}{x_1} + \frac{1 - x_2/n_2}{x_2}}\right\}. \qquad (3.13)$$

To apply this general formula to likelihood ratios, suppose that the results of the diagnostic test are given as in the 2 × 2 table 3.2, where a, b, e and f are all observed.

Table 3.2: Diagnostic test versus true disease state when a gold standard is available.

                    True disease state
Test result         D+          D−
+                   a           b
−                   e           f
Total               a + e       b + f

The sensitivity is estimated by a/(a + e) and the specificity is estimated by f/(b + f), so that LR+ is estimated by a(b + f)/((a + e)b), and LR− is estimated by e(b + f)/((a + e)f). Using the usual normal approximation to the binomial distribution, an approximate 95% confidence interval for the sensitivity is

$$sens \pm 1.96\sqrt{\frac{sens(1 - sens)}{a + e}},$$

and similarly, an approximate 95% confidence interval for the specificity is

$$spec \pm 1.96\sqrt{\frac{spec(1 - spec)}{b + f}}.$$

Applying equation 3.13, an approximate 95% confidence interval for LR+ is given by

$$\exp\left\{\log\left(\frac{sens}{1 - spec}\right) \pm 1.96\sqrt{\frac{1 - sens}{a} + \frac{spec}{b}}\right\},$$

and an approximate 95% confidence interval for LR− is given by

$$\exp\left\{\log\left(\frac{1 - sens}{spec}\right) \pm 1.96\sqrt{\frac{sens}{e} + \frac{1 - spec}{f}}\right\},$$

where sens and spec are the estimated sensitivity and specificity, respectively. One can use these equations to estimate the sample size based on the accuracy of confidence intervals, as described in chapter 2. The sample sizes are not unique, however, so a constraint on $n_1$ and $n_2$ must be imposed. One common constraint is $n_1 = k n_2$, where k is predetermined by the investigator. Often, k = 1.
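The intervals above, and a sample size derived from them, can be sketched as follows. The function names are ours, and the sample size helper is an illustration only: it substitutes planning values for sens and spec, uses $a = n_1\,sens$ and $b = n_2(1 - spec)$ in the variance of log(LR+), imposes $n_1 = k n_2$, and solves for the smallest $n_2$ giving a half-width of at most w on the log scale.

```python
from math import ceil, exp, log, sqrt


def lr_plus_ci(a, b, e, f, z=1.96):
    """LR+ and its approximate CI from the 2 x 2 table 3.2 (all cells > 0)."""
    sens, spec = a / (a + e), f / (b + f)
    lr = sens / (1 - spec)
    half = z * sqrt((1 - sens) / a + spec / b)  # half-width on the log scale
    return lr, exp(log(lr) - half), exp(log(lr) + half)


def lr_minus_ci(a, b, e, f, z=1.96):
    """LR- and its approximate CI from the same table."""
    sens, spec = a / (a + e), f / (b + f)
    lr = (1 - sens) / spec
    half = z * sqrt(sens / e + (1 - spec) / f)
    return lr, exp(log(lr) - half), exp(log(lr) + half)


def n2_for_log_halfwidth(sens, spec, w, k=1.0, z=1.96):
    """Smallest n2 (with n1 = k * n2) so that the log(LR+) half-width is at most w."""
    return ceil(z**2 * ((1 - sens) / (k * sens) + spec / (1 - spec)) / w**2)


print(lr_plus_ci(80, 27, 20, 73))
print(n2_for_log_halfwidth(0.8, 0.73, 0.2))
```

Because the interval is symmetric on the log scale, the point estimate is the geometric mean of the two limits; this is also why the natural unit for a target width is a log-scale half-width, whose appropriate size is not obvious, as discussed below.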

Simel, Samsa and Matchar [1990] argue that these methods sometimes result in a sample size that is too large to be acceptable. This is because in order to use the formula, one must provide values for sens and spec, and certain values may lead to large sample sizes. In this case, further constraints on the likelihood ratios can be set by expert opinion. For example, upper and lower bounds can be imposed on the likelihood ratios, and sample sizes are found taking these constraints into account. However, a difficulty arises, since sens and spec are unknown before the study is conducted, and it is not obvious which values will provide "conservative" confidence intervals, in the sense of guaranteeing the desired interval width for all data sets that may arise. Confidence intervals for log(LR+) are of the form

$$\log\left(\frac{sens}{1 - spec}\right) \pm z_{\alpha/2}\sqrt{\frac{1 - sens}{(sens)\,n_1} + \frac{spec}{(1 - spec)\,n_2}}.$$

If we adopt the constraint $n_1 = k n_2$, and if we impose upper and lower limits on sens and spec, then the maximum length of the confidence interval is achieved when sens and 1 − spec take their minimum values within their feasible ranges. It is not obvious, however, what an appropriate length of a confidence interval on a log scale would be. In all three examples studied in this paper, there appears to be some confusion between the exact true (but unknown) values and estimates of these values. Consider, for example, their problem 1. Here they assume that the sensitivity is at least 0.8 and that the specificity is exactly known and equal to 0.73. This implies that 2.96 ≤ LR+ ≤ 3.70. Therefore, we certainly do not need a study to confirm that the confidence interval does not contain the value 0.2, since under their assumptions, it never will. Similarly, if we consider the sample size suggested by the authors, n = 73.4, then the upper limit of the confidence interval around a point estimate of 2.96 exceeds 3.70.

Wickramaratne [1995] notes that the standard methods for determining sample sizes in epidemiologic studies are based on simplifying assumptions that are often unrealistic. He also notes that methods that make less restrictive assumptions have been developed in recent years, and reviews some of these methods.

Joseph, Wolfson and du Berger [1995] studied sample size calculations for estimating binomial proportions, using a Bayesian approach. They used exact calculations involving Beta quantiles, and based computations on highest posterior density sets. Sample sizes from the average coverage criterion, the worst outcome criterion and the average length criterion, all defined in chapter 2, were compared. Joseph, du Berger and Belisle [1996] used Monte Carlo techniques to determine the sample sizes required to estimate the difference between two binomial proportions for the same three criteria. Similar methods for the case of normal means and the difference between two normal means were considered by Joseph and Belisle [1997].

In summary, many authors have studied the estimation of the prevalence of a disease from the results of diagnostic tests. Several authors have proposed a maximum likelihood estimator when the sensitivity and specificity of the test are exactly known, but little work has been done on the estimation of the prevalence of a disease in the absence of a gold standard test. Although the literature on the determination of sample sizes in medical research is very rich, no attempt has been made so far to study the sample size needed to estimate the prevalence of a disease to within a given accuracy when no gold standard is available. In the remainder of this thesis, we shall study the question of estimating the prevalence using both frequentist and Bayesian approaches. Most of our work will be focused on the determination of the sample size needed to estimate the prevalence to within a given accuracy when the sensitivity and/or the specificity of the test are unknown, although in the next two chapters we will first study the simpler case when they are assumed known.

Chapter 4

Sample size in binomial studies: A new criterion

4.1 Introduction

As discussed in chapter 1, estimating the prevalence of a disease in a population is a frequently occurring problem in medicine. One way to do this is to obtain a sample from the population and test each individual in the sample for the disease. If the test used is a gold standard, then the number of diseased individuals in the sample is the same as the number of individuals with positive test results, and the problem of estimating the prevalence is the classical problem of estimating a binomial proportion. In this chapter, we will look, from a frequentist point of view, at the problem of determining the sample size required to estimate the prevalence of a disease within a given accuracy when the test used is a gold standard. In future chapters we will consider the more realistic problem of a test where misclassification errors are possible.

Let θ denote the prevalence of the disease in the population under study. Consider a sample of size n from that population and denote by x the number of individuals from the sample who test positive. Since the diagnostic test is a gold standard, x is also the number of individuals from the sample who have the disease. Thus in this case θ = p, the probability of a positive test.

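Many recent textbooks on sample size determination (for example, Desu and Raghavarao [1990] and Lemeshow et al. [1990]) suggest basing sample size calculations for binomial experiments on criteria such as

$$P\left(\left|\frac{x}{n} - \theta\right| \le d\right) \ge 1 - \alpha. \qquad (4.1)$$

This formulation ensures that the sample size will be sufficient to estimate the true binomial parameter θ by the usual unbiased point estimator $\hat\theta = x/n$, in the sense that $|\hat\theta - \theta| \le d$ with probability at least 1 − α. For suitably chosen $x_1$ and $x_2$ (namely the smallest and largest integers x with $|x/n - \theta| \le d$), the left hand side of 4.1 is equal to

$$\sum_{x = x_1}^{x_2} \binom{n}{x}\theta^x(1-\theta)^{n-x}, \qquad (4.2)$$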

where n is the sample size. However, both the summand and the limits $x_1$ and $x_2$ depend on the unknown value of θ, making direct use of 4.2, and therefore of 4.1, almost impossible in practice. One "conservative" solution (which we will show is not always conservative), suggested by Desu and Raghavarao [1990] and others, is to assume that 4.2 is minimized when θ = 0.5. More generally, if it is known that θ ≤ m < 0.5 or θ ≥ m > 0.5, for some m, then an alternative solution would be to substitute θ = m in 4.2. This would still be conservative, but would guard against the possibility of using an unnecessarily large sample size if θ = 0.5 is used when in fact θ ≪ 0.5 or θ ≫ 0.5. The intuition behind labelling these substitutions "conservative" is that the variance function of a binomial random variable, nθ(1 − θ), is maximized over the interval (a, b) ⊂ [0, 1] by the value in (a, b) closest to 0.5. However, this reasoning is only partially correct, since the effects of θ on $x_1$ and $x_2$ are ignored in focusing only on the binomial variance.

It is also often suggested that the exact calculation in 4.2 can be replaced by that given by the normal approximation to the binomial distribution. Letting $y_i = (x_i/n - \theta)/\sqrt{\theta(1-\theta)/n}$, i = 1, 2, we have

$$\sum_{x=x_1}^{x_2}\binom{n}{x}\theta^x(1-\theta)^{n-x} \approx \int_{y_1}^{y_2}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{y^2}{2}\right)dy. \qquad (4.3)$$

The limits $y_1$ and $y_2$ are unknown, since θ is unknown. However, conservative sample sizes are available by substituting θ = 0.5 or θ = m as above, and using quantiles of the normal distribution to approximate $y_1$ and $y_2$. This leads to the sample size formula

$$n = \left\lceil \frac{z_{\alpha/2}^2\,\theta(1-\theta)}{d^2} \right\rceil, \qquad (4.4)$$

where $z_{\alpha/2}$ is the upper 100(α/2)% point of the standard normal distribution (that is, its 100(1 − α/2)% quantile), and ⌈a⌉ denotes the smallest integer greater than or equal to a. In the case where θ = 0.5, 4.4 reduces to $n = \lceil z_{\alpha/2}^2/(4d^2) \rceil$. These conservative solutions are correct only to the extent that the normal distribution approximates the exact underlying binomial probabilities. However, the degree to which this approximation affects the sample sizes is usually unknown.
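Formula 4.4 and the approximation 4.3 are short to compute. In the sketch below (function names ours), the approximate coverage treats $x_1/n$ and $x_2/n$ as exactly θ − d and θ + d, so it ignores the discreteness of the binomial; that simplification is precisely what the exact calculations in the next section expose.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_normal(d, alpha, m=0.5):
    """Formula 4.4: ceil(z_{alpha/2}^2 m (1 - m) / d^2), with theta replaced by m."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return ceil(z * z * m * (1 - m) / (d * d))


def coverage_normal(n, theta, d):
    """Normal approximation 4.3 to P(|X/n - theta| <= d), taking x_i/n = theta -/+ d."""
    se = sqrt(theta * (1 - theta) / n)
    return STD_NORMAL.cdf(d / se) - STD_NORMAL.cdf(-d / se)


n = n_normal(0.05, 0.05)  # 385 for d = 0.05, 1 - alpha = 0.95 and m = 0.5
print(n, coverage_normal(n, 0.5, 0.05))
```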

4.2 Anomalies

The main problem with using the standard formulation 4.1 is that θ is unknown, and it is difficult or impossible to ascertain which value of θ is the most conservative. Consider the following example.

Example 4.2.1 Let d = 0.1, 1 − α = 0.6, n = 5, and θ = 0.4. Then 4.2 becomes P(x = 2) = 0.3456, while for θ = 0.5, 4.2 reduces to P(x = 2) + P(x = 3) = 0.625. Therefore, the minimum probability is not always attained by substituting θ = 0.5; that is, the value that provides the maximum variance is not always the most conservative in the sense of minimizing 4.2.

There are also other problems associated with the use of criterion 4.1.

Example 4.2.2 Suppose d = 0.1, 1 − α = 0.6 and n = 5. As above, for θ = 0.5, 4.2 gives 0.625. There are several anomalies associated with this situation. First, consider the same calculation, but replace θ = 0.5 by θ = 0.50000001. In this case, 4.2 becomes P(x = 3) ≈ 0.3125. Thus the discrete nature of the binomial distribution is such that a tiny perturbation of θ halves the probability. Since we will rarely know θ a priori with a high degree of accuracy, this may be a serious concern. Second, restore θ = 0.5, but let d = 0.0999999999. Then 4.2 becomes 0! Hence a small decrease in d costs all of the probability. If strict inequality is considered in equation 4.1, that is,

$$P\left(\left|\frac{x}{n} - \theta\right| < d\right) \ge 1 - \alpha,$$

the probability is again 0. Furthermore, if θ = 0.5, the smallest n that gives $P(|x/n - \theta| \le d) \ge 1 - \alpha$ is n = 5, but if we take n = 6 then 4.2 becomes P(x = 3) = 0.3125; that is, half of the probability is lost when considering a larger sample.
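The anomalies in Examples 4.2.1 and 4.2.2 can be reproduced with an exact evaluation of 4.2. The sketch below (function name ours) uses Python's `fractions` module so that boundary cases such as |x/n − θ| = d are compared exactly rather than in floating point. This matters here, because every one of these anomalies sits exactly on such a boundary.

```python
from fractions import Fraction
from math import comb


def coverage_exact(n, theta, d, strict=False):
    """Sum 4.2, i.e. P(|X/n - theta| <= d) for X ~ Bin(n, theta).

    Pass theta and d as decimal strings (e.g. "0.4") so they are represented exactly;
    with strict=True the inequality |X/n - theta| < d is used instead.
    """
    theta, d = Fraction(theta), Fraction(d)
    total = Fraction(0)
    for x in range(n + 1):
        dev = abs(Fraction(x, n) - theta)
        if dev < d or (dev == d and not strict):
            total += comb(n, x) * theta**x * (1 - theta) ** (n - x)
    return float(total)


# Examples 4.2.1 and 4.2.2 (d = 0.1 and n = 5 unless stated otherwise)
for label, args in [
    ("theta = 0.4", (5, "0.4", "0.1")),
    ("theta = 0.5", (5, "0.5", "0.1")),
    ("theta = 0.50000001", (5, "0.50000001", "0.1")),
    ("d = 0.0999999999", (5, "0.5", "0.0999999999")),
    ("n = 6", (6, "0.5", "0.1")),
]:
    print(f"{label:>20}: {coverage_exact(*args):.4f}")
```

The loop prints 0.3456, 0.6250, 0.3125, 0.0000 and 0.3125, matching the values in the two examples.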

While for ease of exposition the above examples featured only small values of n, table 4.1 indicates that similar problems persist for much larger n. In the next section a modified criterion is suggested to replace equation 4.1 that improves upon some of the undesirable features illustrated above. In particular, sample sizes from the modified criterion can be calculated exactly via an easy-to-program algorithm, and the problems due to the anomalies illustrated by the above examples are diminished.

4.3 Modified criterion

For any given θ, α and d, criterion 4.1 ensures that

$$P\{-d \le \hat\theta - \theta \le +d\} \ge 1 - \alpha,$$

where $\hat\theta = x/n$ is the usual binomial maximum likelihood estimator. However, one could also consider

$$P\{-a \le \hat\theta - \theta \le +b\} \ge 1 - \alpha,$$

with a + b ≤ 2d. Therefore, instead of the interval of length 2d centered at θ, the highest density interval of length at most 2d containing θ is considered. This is similar in concept to switching to exact binomial confidence intervals rather than those based on the normal approximation 4.3. Exact confidence intervals are commonly used when n is small or θ is near 0 or 1. Let

$$D_\theta = \{\text{all intervals } I \text{ such that } \theta \in I \text{ and } l(I) \le 2d\},$$

where l(I) denotes the length of the interval I. Then the sample size can be defined as the minimum n satisfying

$$\inf_{\theta}\ \max_{I \in D_\theta} \sum_{k:\,k/n \in I} \binom{n}{k}\theta^k(1-\theta)^{n-k} \ge 1 - \alpha, \qquad (4.5)$$

where k is an integer, and the infimum is over the range of possible values for θ.

Remark 4.3.1 For given n and θ ≤ 0.5, $P(k;\theta) = \binom{n}{k}\theta^k(1-\theta)^{n-k}$ takes its highest value at the point r, where $r/(n+1) \le \theta < (r+1)/(n+1)$. For $\theta = r/(n+1)$, $P(r-1;\theta) = P(r;\theta)$. Also $P(k;\theta) > P(k-1;\theta)$ if and only if $k/(n+1) < \theta$. A similar argument can be made when θ > 0.5. See Rohatgi [1984] for the proofs of these statements.

Remark 4.3.2 Fix an integer k such that 1 ≤ k ≤ n − 1. Then P(k; θ) is differentiable with respect to θ, and it is easy to see that it increases if and only if θ ≤ k/n.

Definition 4.3.1 For given d and n, define i to be the integer such that $i/n \le 2d < (i+1)/n$.

In what follows, it is assumed that i ≥ 1. For i = 0 all of the results proven below will be trivially true.

Lemma 4.3.1 Given d, α, and n, a point θ can have at most two highest density intervals of length i, namely [s/n, (s + i)/n] and [(s + 1)/n, (s + i + 1)/n], for some integer s.

Proof: Suppose [s/n, (s + i)/n] and [u/n, (u + i)/n] are two highest density intervals of length i corresponding to θ, and suppose without loss of generality that s < u. It suffices to prove that u = s + 1. Since P(s; θ) ≥ P(s + i + 1; θ), P(s − 1; θ) ≤ P(s + i; θ), P(u; θ) ≥ P(u + i + 1; θ), and P(u − 1; θ) ≤ P(u + i; θ), by Remark 4.3.1, s ≤ r < s + i + 1 and u ≤ r < u + i + 1, where r is the point of maximum probability. Hence s ≠ u − 1 implies s < u − 1 and s + i < u + i − 1, which implies that P(u − 1; θ) > P(s; θ) ≥ P(s + i + 1; θ) > P(u + i; θ), which is a contradiction.

Lemma 4.3.2 Given d, α, n and $\theta_1 < \theta_2$, let [s/n, (s + i)/n] and [u/n, (u + i)/n] be highest density intervals corresponding to $\theta_1$ and $\theta_2$, respectively. Then u ≥ s.


Proof: $\theta_1 < \theta_2$ implies that $r_1 \le r_2$, where $r_1$ and $r_2$ are the points of maximum probability corresponding to $\theta_1$ and $\theta_2$, respectively, defined in Remark 4.3.1. Suppose that u < s; then by Remarks 4.3.1 and 4.3.2, $P(u+i+1;\theta_2) > P(u+i+1;\theta_1) > P(s-1;\theta_1) \ge P(u;\theta_1) > P(u;\theta_2)$, which is a contradiction.

Lemma 4.3.3 Given d, α, and n, an interval I is a highest density interval of length i corresponding to some θ ∈ [0, 1] if and only if it is an interval from the set $\{[0, i/n], [1/n, (i+1)/n], \ldots, [(n-i)/n, 1]\}$.

Proof: Let $I_k$ denote the interval [k/n, (k + i)/n]. To prove the necessary condition, just note that if j/n < a < (j + 1)/n ≤ (n − i)/n, then the probability of the interval $I_j$ is larger than the probability of the interval [a, a + i/n]. The proof of sufficiency will proceed by induction. It is clear that $I_0$ is the highest density interval corresponding to θ = 0. Suppose $I_{j-1}$ is a highest density interval; then it suffices to prove that $I_j$ is also a highest density interval. Let

$$\theta_j = \max\{\theta : I_{j-1} \text{ is a highest density interval for } \theta\}.$$

We will prove that both $I_{j-1}$ and $I_j$ are highest density intervals for $\theta_j$. Suppose $I_{j-1}$ is not a highest density interval for $\theta_j$, and let I be such an interval. Set $\varepsilon = P(I;\theta_j) - P(I_{j-1};\theta_j)$. For every k, there exists $\delta_k > 0$ such that $|\theta_j - \theta| < \delta_k$ implies $|P(k;\theta) - P(k;\theta_j)| < \varepsilon/(2(i+1))$. Let $\delta = \min_k \delta_k$, and take θ such that $0 < \theta_j - \theta < \delta/2$ and such that $I_{j-1}$ is a highest density interval for θ. Then, since each of these intervals contains i + 1 support points,

$$P(I_{j-1};\theta) < P(I_{j-1};\theta_j) + \varepsilon/2 = P(I;\theta_j) - \varepsilon/2 < P(I;\theta).$$

Therefore $P(I_{j-1};\theta) < P(I;\theta)$, which is a contradiction. To prove that $I_j$ is a highest density interval for $\theta_j$, let

$$S = \{s : I_s \text{ is a highest density interval corresponding to some } \theta > \theta_j\}.$$

S is not empty since n − i ∈ S. Letting v = min S, [v/n, (v + i)/n] is a highest density interval for $\theta_j$. To prove this, let θ be such that $0 < \theta - \theta_j < \delta/2$, let $I_v$ be a highest density interval for θ, and proceed as above. Hence by Lemmas 4.3.1 and 4.3.2, v = j and $I_j$ is a highest density interval for $\theta_j$.

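Theorem 4.3.1 Let d, α and n be given. Let i = ⌊2nd⌋, that is, i is the largest integer not exceeding 2nd. For j ∈ {1, ..., n − i}, define

$$r_j = \left(\frac{\binom{n}{j-1}}{\binom{n}{j+i}}\right)^{1/(i+1)} \quad\text{and}\quad \theta_j = \frac{r_j}{1 + r_j}.$$

Denote by H(θ) the probability content of a highest density interval corresponding to θ. Then

$$\inf_{\theta \in [0,1]} H(\theta) = \min_{1 \le j \le n-i} H(\theta_j).$$

Theorem 4.3.1 states that in order to calculate the minimum highest density probability over θ ∈ [0, 1], it suffices to consider only the n − i values $\theta_j$. Similarly, if θ ≤ m, only the $H(\theta_j)$ with $\theta_j \le m$, together with H(m), need to be computed, and the sample size is the smallest n such that

$$\min\left(\{H(\theta_j) : \theta_j \le m\} \cup \{H(m)\}\right) \ge 1 - \alpha.$$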

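Proof: From Lemmas 4.3.1 and 4.3.2, and the definition of $\theta_j$, j ∈ {1, ..., n − i}, the highest density interval of a point $\theta \in (\theta_j, \theta_{j+1})$ is unique and equal to $I_j$. At $\theta_j$, $P(I_{j-1}) = P(I_j)$, which reduces to

$$\binom{n}{j-1}\theta_j^{\,j-1}(1-\theta_j)^{n-j+1} = \binom{n}{j+i}\theta_j^{\,j+i}(1-\theta_j)^{n-j-i}.$$

For $\theta \in (\theta_j, \theta_{j+1})$, $H(\theta) = P(x/n \in I_j;\theta)$, and therefore $H(\theta) = \sum_{x=j}^{n} p(x;\theta) - \sum_{x=j+i+1}^{n} p(x;\theta)$. On the other hand (see Rohatgi [1984]),

$$\frac{d}{d\theta}\sum_{x=k}^{n} p(x;\theta) = n\binom{n-1}{k-1}\theta^{k-1}(1-\theta)^{n-k},$$

so that

$$H'(\theta) = -n\binom{n-1}{j+i}\theta^{\,j+i}(1-\theta)^{n-j-i-1}\,g(\theta), \qquad g(\theta) = 1 - \frac{\binom{n-1}{j-1}}{\binom{n-1}{j+i}}\left(\frac{1-\theta}{\theta}\right)^{i+1},$$

and g(θ) is increasing. Furthermore,

$$g(\theta_j) = 1 - \frac{n-j+1}{n-i-j} < 0 \quad\text{and}\quad g(\theta_{j+1}) = 1 - \frac{j}{j+i+1} > 0,$$

so that g(θ) has only one zero in $(\theta_j, \theta_{j+1})$. Hence H(θ) has a maximum in $(\theta_j, \theta_{j+1})$; therefore on this interval H(θ) is minimum at one of the end points $\theta_j$ or $\theta_{j+1}$.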

This suggests the following algorithm:

1. Given d, m ≤ 0.5 and α, select an initial guess for the sample size n. (If m > 0.5, use m' = 1 − m in place of m. The standard formula 4.4 with θ = m could be used to obtain the initial guess.)

2. Calculate i = ⌊2nd⌋, $r_j = \left(\binom{n}{j-1}/\binom{n}{j+i}\right)^{1/(i+1)}$ and $\theta_j = r_j/(1 + r_j)$, j = 1, 2, ..., n − i.

3. Calculate $H(\theta_j) = \sum_{k=j}^{j+i}\binom{n}{k}\theta_j^k(1-\theta_j)^{n-k}$.

4. Letting $s = \max\{j : \theta_j \le m\}$, calculate $H(m) = \sum_{k=s}^{s+i}\binom{n}{k}m^k(1-m)^{n-k}$.

5. (a) If there is no bound for θ, calculate $P_{\min} = \min\{H(\theta_j) : 1 \le j \le n - i\}$.

(b) If θ ≤ m ≤ 0.5, calculate $P_{\min} = \min\left(\{H(\theta_j) : \theta_j \le m\} \cup \{H(m)\}\right)$.

6. Repeat steps 2 through 5 with a new value for n, until $P_{\min} \ge 1 - \alpha$ for n but not for n − 1. For example, subsequent values for n can be selected via a bisection search algorithm.

The above algorithm is straightforward to program in most programming languages.
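A minimal Python sketch of steps 2 through 6 follows. It makes a few choices of our own. It works on the log scale (via `math.lgamma`) so that binomial coefficients do not overflow for large n. It replaces the initial guess of step 1 and the bisection search of step 6 with a plain linear scan over n; a linear scan always returns the smallest n satisfying the criterion, which is the simplest way to honour the condition "for n but not for n − 1", given that the minimum probability need not be monotone in n. All function names are hypothetical.

```python
from math import exp, floor, lgamma, log


def log_comb(n, k):
    """log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def pmf(k, n, theta):
    """Binomial probability P(k; theta) for 0 < theta < 1, computed on the log scale."""
    return exp(log_comb(n, k) + k * log(theta) + (n - k) * log(1 - theta))


def interval_prob(n, i, theta, start):
    """Probability of I_start = [start/n, (start + i)/n], i.e. of X in {start, ..., start + i}."""
    return sum(pmf(k, n, theta) for k in range(start, start + i + 1))


def p_min(n, d, m=None):
    """Steps 2-5: minimum highest density probability, over all theta or over theta <= m."""
    i = floor(2 * n * d + 1e-12)  # i = floor(2nd); tiny tolerance absorbs float round-off
    if i >= n:
        return 1.0  # [0, i/n] already contains every possible value of x/n
    critical = []  # pairs (j, theta_j), j = 1, ..., n - i
    for j in range(1, n - i + 1):
        log_r = (log_comb(n, j - 1) - log_comb(n, j + i)) / (i + 1)
        critical.append((j, 1 / (1 + exp(-log_r))))  # theta_j = r_j / (1 + r_j)
    if m is None:
        return min(interval_prob(n, i, t, j) for j, t in critical)
    below = [(j, t) for j, t in critical if t <= m]
    s = max((j for j, _ in below), default=0)  # the highest density interval at m is I_s
    values = [interval_prob(n, i, t, j) for j, t in below]
    values.append(interval_prob(n, i, m, s))
    return min(values)


def sample_size(d, alpha, m=None, n_max=10_000):
    """Step 6 via a linear scan: the smallest n with P_min >= 1 - alpha."""
    if m is not None and m > 0.5:
        m = 1 - m  # step 1: use m' = 1 - m by symmetry
    for n in range(1, n_max + 1):
        if p_min(n, d, m) >= 1 - alpha:
            return n
    raise ValueError("criterion not met for any n <= n_max")
```

For d = 0.1 and n = 5, `p_min(5, 0.1, 0.5)` is about 0.5887, attained at $\theta_2 = \sqrt 2 - 1 \approx 0.414$ rather than at θ = 0.5, where the highest density probability is 0.625. This is an instance of the remark in Section 4.4 that θ = 0.5 is not necessarily conservative under the modified criterion either.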

4.4 Examples

Consider again example 4.2.1 from Section 4.2. This example illustrated that substituting θ = 0.5 for the unknown θ does not guarantee a conservative probability calculation. It is also true that θ = 0.5 is not necessarily conservative when using the modified criterion. However, Theorem 4.3.1 states that the minimum highest density interval probability occurs when $\theta = \theta_j$ for some j ≤ n − i, so that the exact minimum probability can easily be found, which is not in general the case when using 4.1.

Under the modified criterion, small disturbances in θ do not greatly affect the probabilities, unlike the standard formulation in Example 4.2.2. For example, using θ = 0.50000001, the highest density interval probability remains at 0.625. Small decreases in d also do not affect the highest density interval probabilities when k/n < 2d < (k + 1)/n for some integer k. However, when 2d = k/n, a small decrease in d produces a loss of one of the end points of the interval. In contrast, under equation 4.1 both end points are lost. When the sample size increases to 6, the probability under the modified criterion is 0.5469, while under equation 4.1 it is 0.3125.

Table 4.1 provides ten additional examples, for a selection of values for 1 − α, d, and m. The examples illustrate that in experiments requiring small samples, such as when α and d are relatively large, the difference between the sample size computed exactly and the one computed using the normal approximation can be as much as 50%. More interestingly, the differences can still approach 20% even when 1 − α takes on the usual 0.9 or 0.95 values and the sample sizes are near 100. Furthermore, up to sample sizes of 3,000. In many cases, these differences may be of practical importance. Of course, the exact intervals are asymmetric about the point estimate of the proportion, while the normal approximation is limited to symmetric intervals. Therefore it is not surprising that the exact intervals lead to smaller sample sizes.

In conclusion, in this chapter we have introduced a new exact criterion for sample size estimation for estimating binomial parameters. If a gold standard test is used, this method could be used to select a sample size to estimate the prevalence of a disease. In the next chapter, we will examine methods for the case when the sensitivity and specificity are known, but are not assumed to be identically equal to one.

Table 4.1: Sample sizes (SS) for various values of α, d, and m using the normal approximation (4.4) and the modified criterion (4.5).

Chapter 5

Estimating the disease prevalence when the sensitivity and specificity are exactly known

5.1 Introduction

In chapter 4 we discussed the problem of estimating the prevalence of a disease when the diagnostic test is a gold standard test. If, however, the diagnostic test is not a gold standard test, the number of diseased individuals in the sample is not directly observable. Instead one only knows the number of individuals who test positive, and estimating the prevalence therefore depends on knowing the characteristics of the test, in particular, the sensitivity and specificity. In this chapter we will study the problem of estimating the prevalence of a disease in the absence of a gold standard diagnostic test. We will suppose that the sensitivity S and the specificity C of the test are both exactly known and equal to s < 1 and c < 1, respectively. Since diagnostic tests that have the sum of their sensitivity and specificity below 1 can be improved by reversing what is considered to be a positive test, without loss of generality we will assume that s + c ≥ 1. Although it is instructive to consider this problem, this model will usually be at best an approximation to reality, since it is very rarely true that S and C are exactly known. Chapter 6 will consider the more realistic case when S and C need to be estimated along with the prevalence.

5.2 Definitions

Let θ denote the prevalence of the disease in a particular population. Consider a sample of size n from the population under study, and let p denote the probability of testing positive, which includes both true and false positives. Denote by X the number of individuals from the sample who test positive. We saw in chapter 2, equation 2.11, that

$$p = s\theta + (1 - c)(1 - \theta). \qquad (5.1)$$

Since θ, s and c must all lie in the interval [0, 1], the above formula shows that p must lie in the interval [1 − c, s]. Solving for θ, we have

$$\theta = \frac{p + c - 1}{s + c - 1}.$$

One common estimator of p is its maximum likelihood estimator (MLE). Owing to the restriction of p to the interval [1 − c, s], the MLE of p is not always the usual binomial MLE X/n. In fact, as discussed in Rohatgi [1984],

$$MLE(p) = \begin{cases} 1 - c & \text{if } X/n \le 1 - c, \\ X/n & \text{if } 1 - c < X/n < s, \\ s & \text{if } X/n \ge s. \end{cases}$$

Using the invariance property of maximum likelihood estimators (see [SI]), the MLE of θ is

$$MLE(\theta) = \begin{cases} 0 & \text{if } X/n \le 1 - c, \\ \dfrac{X/n + c - 1}{s + c - 1} & \text{if } 1 - c < X/n < s, \\ 1 & \text{if } X/n \ge s. \end{cases}$$

Many authors have proposed this estimator, including Rogan and Gladen [1978] and Taragin, Wildman and Trout [1994]. The MLE(θ) performs reasonably well for most values of θ. When θ is small, however, as is the case for many diseases, the MLE(θ) is quite often 0, even when the unobserved number of truly diseased subjects in the sample, Y, is not 0. In fact, $P(Y = 0) = (1 - \theta)^n$, while $P(MLE(\theta) = 0) = P(X/n \le 1 - c)$, and the latter can be much larger than the former.

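Table 5.1 illustrates this phenomenon for various values of θ and n when s = 0.9 and c = 0.8. In this table we used the normal approximation to the binomial distribution to calculate P(X/n ≤ 1 − c). Since X/n is approximately distributed as N(p, p(1 − p)/n), we have

$$P(MLE(\theta) = 0) \approx \Phi\left(\frac{1 - c - p}{\sqrt{p(1-p)/n}}\right),$$

where Φ(t) denotes the standard normal distribution function, that is,

$$\Phi(t) = \int_{-\infty}^{t}\frac{1}{\sqrt{2\pi}}e^{-u^2/2}\,du.$$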

In this chapter we will suggest an adjustment to the MLE of θ, useful when MLE(θ) = 0. We will call this new estimator the adjusted MLE, or AMLE. Confidence intervals for θ based on the AMLE will be derived. Finally, we will derive a method for calculating the sample size needed for the AMLE to be within a distance d of θ with probability at least 1 − α.
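The comparison behind Table 5.1 can be sketched as follows, for s = 0.9 and c = 0.8. The function names are ours. Alongside the normal approximation used for the table, the sketch also computes the exact binomial probability for comparison. The small tolerance in the exact version guards against floating-point round-off, since 1 − 0.8 evaluates to 0.19999999999999996 rather than 0.2.

```python
from math import comb, floor, sqrt
from statistics import NormalDist


def mle_theta(x, n, s, c):
    """Piecewise MLE(theta): the MLE of p is X/n clipped to [1 - c, s], then inverted."""
    p_hat = min(max(x / n, 1 - c), s)
    return (p_hat + c - 1) / (s + c - 1)


def prob_y_zero(theta, n):
    """P(Y = 0): no truly diseased subjects in the sample."""
    return (1 - theta) ** n


def prob_mle_zero_normal(theta, n, s, c):
    """Normal approximation to P(X/n <= 1 - c), as used for Table 5.1."""
    p = s * theta + (1 - c) * (1 - theta)
    return NormalDist().cdf((1 - c - p) / sqrt(p * (1 - p) / n))


def prob_mle_zero_exact(theta, n, s, c):
    """Exact binomial P(X <= n(1 - c)) for comparison."""
    p = s * theta + (1 - c) * (1 - theta)
    k_max = floor(n * (1 - c) + 1e-9)  # tolerance: 1 - 0.8 is 0.19999999999999996 in floating point
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k_max + 1))


theta, n = 0.05, 100
print(prob_y_zero(theta, n), prob_mle_zero_normal(theta, n, 0.9, 0.8), prob_mle_zero_exact(theta, n, 0.9, 0.8))
```

For θ = 0.05 and n = 100, P(Y = 0) is below 0.006, while the MLE is 0 about one time in five, which is the gap the AMLE is designed to address.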

Table 5.1: P(Y = 0) versus P(MLE(θ) = 0) when s = 0.9 and c = 0.8

θ      sample size      P(Y = 0)      P(MLE(θ) = 0)

5.3 Adjustment to the MLE

Suppose we have a sample of size n. Let X and Y be defined as in section 5.2, and let Z be the unobserved latent data representing the number of truly diseased subjects who test positive. See table 1.2.

Table 5.2: Diagnostic test versus disease state

                           Test results
State of disease      +          −                  Total
D+                    Z          Y − Z              Y
D−                    X − Z      n − X − Y + Z      n − Y
Total                 X          n − X              n

When choosing a subject at random from this sample, the probability of choosing a positively testing subject is X/n, the probability of choosing a positively testing subject given that he is truly diseased is Z/Y, the probability of choosing a subject that tested positive given that he is not diseased is (X − Z)/(n − Y), and the probability of choosing a subject that has the disease is Y/n. We have

$$\frac{X}{n} = \frac{Y}{n}\cdot\frac{Z}{Y} + \left(1 - \frac{Y}{n}\right)\frac{X - Z}{n - Y}. \qquad (5.2)$$

Remark 5.3.1 Note that E(X/n) = p, E(Y/n) = θ, E(Z/Y) = s and E((X − Z)/(n − Y)) = 1 − c. Also, in deriving the MLE(θ) we used the equation p = θs + (1 − θ)(1 − c), which is equivalent to

$$E(X/n) = E(Y/n)\,E(Z/Y) + (1 - E(Y/n))\,E((X - Z)/(n - Y)). \qquad (5.3)$$

Let x denote the observed value of X in the experiment. To define the AMLE when x/n ≤ 1 − c, we will assume that equation 5.3 remains true if x is known. Therefore, we assume that

$$x/n = E(Y/n \mid x)\,E(Z/Y \mid x) + (1 - E(Y/n \mid x))\,E((X - Z)/(n - Y) \mid x); \qquad (5.4)$$

from this we will derive an approximation to the term E(Y/n | x).

Approximating E(Y/n | x) when x/n ≤ 1 − c: When x/n ≤ 1 − c, Y is usually small relative to n. By taking a large enough sample, we can make $P(Y = 0) = (1 - \theta)^n$ as small as we like. Therefore, we will suppose that Y ≠ 0, and hence E(Z/Y | x) can be approximated by s. In addition, for large sample sizes, (X − Z)/(n − Y) is approximately normally distributed with mean 1 − c and variance c(1 − c)/n. Therefore, to approximate E((X − Z)/(n − Y) | x) we consider a random variable H that is normally distributed with mean 1 − c and variance c(1 − c)/n, but with the added constraint that H ≤ x/n. We then calculate E(H | x), which approximates E((X − Z)/(n − Y) | x). Substituting these approximations, equation 5.4 becomes

$$x/n \approx E(Y/n \mid x)\,s + (1 - E(Y/n \mid x))\,E(H \mid x).$$

Solving for E(Y/n | x) gives

$$E(Y/n \mid x) \approx \frac{x/n - E(H \mid x)}{s - E(H \mid x)}.$$

It remains to calculate E(H | x), which will be done after the following definition.

Definition 5.3.1 We define the adjusted MLE of θ, denoted by AMLE, to be

$$AMLE = \begin{cases} \dfrac{x/n - E(H \mid x)}{s - E(H \mid x)} & \text{if } x/n \le 1 - c, \\ MLE(\theta) & \text{otherwise.} \end{cases}$$

Remark 5.3.2 Similarly, an adjustment to the MLE(θ) can be defined when x/n ≥ s, that is, when the MLE(θ) is 1. This can be done by reversing both what is considered to be a positive test and what is considered to be the disease state. Then s will become the specificity, c will become the sensitivity and (n − x)/n will be the proportion of subjects from the sample who test positive. The AMLE can then be defined similarly to the above. Therefore, without loss of generality, we will assume in the remaining part of this chapter that X/n < s with probability 1.

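Calculating E(H | x): To calculate the AMLE when x/n ≤ 1 − c, we need to calculate the integral

$$E(H \mid x) = \int_{-\infty}^{x/n} h\, f_{H|X}(h \mid x)\, dh,$$

where $f_{H|X}(h \mid x)$ denotes the conditional density function of H given X = x. Let $f_H(h)$ denote the density function of H, $F_H(h)$ the distribution function of H, and $F_{H|X}(h \mid x)$ the conditional distribution function of H given X = x. We then have

$$F_{H|X}(h \mid x) = \begin{cases} F_H(h)/F_H(x/n) & \text{if } h \le x/n, \\ 1 & \text{otherwise,} \end{cases}$$

and therefore

$$f_{H|X}(h \mid x) = \begin{cases} f_H(h)/F_H(x/n) & \text{if } h \le x/n, \\ 0 & \text{otherwise.} \end{cases}$$

Hence

$$E(H \mid x) = \frac{1}{F_H(x/n)}\int_{-\infty}^{x/n} h\, f_H(h)\, dh.$$

Substituting $u = (h - (1 - c))/\sqrt{c(1-c)/n}$, we obtain

$$\int_{-\infty}^{x/n} (h - (1 - c))\, f_H(h)\, dh = -\sqrt{\frac{c(1-c)}{2\pi n}}\exp\left(-\frac{(x/n - (1 - c))^2}{2c(1-c)/n}\right),$$

so that, writing $t = (x/n - (1 - c))/\sqrt{c(1-c)/n}$ and noting that $F_H(x/n) = \Phi(t)$,

$$E(H \mid x) = 1 - c - \sqrt{\frac{c(1-c)}{2\pi n}}\,\frac{e^{-t^2/2}}{\Phi(t)}.$$

With this expression for E(H | x), the AMLE can be defined as

$$AMLE = \begin{cases} \dfrac{x/n - E(H \mid x)}{s - E(H \mid x)} & \text{if } x/n \le 1 - c, \\ MLE(\theta) & \text{otherwise.} \end{cases}$$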

Simulations were run to compare the estimates from the AMLE to those from the MLE(θ) for several common situations. Since the MLE(θ) and the AMLE are equal when 1 − c < x/n < s, table 5.3 illustrates several representative cases when they are not equal, that is, when x/n ≤ 1 − c.

θ       sample size   sensitivity   specificity
0.03    300           0.6           0.9
0.02    100           0.9           0.8

Table 5.3: Selected examples of the value of the AMLE simulated under the indicated values for the sample size, true value of the prevalence θ, and sensitivity and specificity of the diagnostic test. The value for x represents the number of positive tests out of the total sample size. Since x/n ≤ 1 − c, the MLE(θ) is zero for all of these cases.
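The AMLE of Definition 5.3.1, with E(H | x) as derived above, is a few lines of code. In this sketch (function name ours), φ(t)/Φ(t) is computed directly unless t is so far in the left tail that Φ(t) underflows; in that case it falls back on the leading terms −t − 1/t of the standard asymptotic expansion of this ratio, which is our own numerical safeguard. With n = 100, s = 0.9, c = 0.8 and x = 19, the sketch gives about 0.0386, that is, the value 0.039 quoted for the first row of table 5.3.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def amle(x, n, s, c):
    """Adjusted MLE of the prevalence (Definition 5.3.1), assuming s + c > 1."""
    p_hat = x / n
    if p_hat <= 1 - c + 1e-12:  # tolerance: 1 - 0.8 evaluates to 0.19999999999999996
        sd = sqrt(c * (1 - c) / n)  # standard deviation of H before truncation
        t = (p_hat - (1 - c)) / sd
        # phi(t) / Phi(t); for t far in the left tail use the asymptotic form -t - 1/t
        ratio = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t) if t > -30 else -t - 1 / t
        e_h = 1 - c - sd * ratio  # E(H | x) for H ~ N(1 - c, sd^2) truncated above at x/n
        return (p_hat - e_h) / (s - e_h)
    # Otherwise the AMLE equals the ordinary MLE(theta).
    return (min(p_hat, s) + c - 1) / (s + c - 1)


for x in (18, 19, 20):
    print(x, round(amle(x, 100, 0.9, 0.8), 4))
```

The loop also shows the pattern noted in the text: the further x/n falls below 1 − c, the smaller the AMLE.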

Note that given the data, the AMLE does not depend on θ, although of course θ is needed to generate x in a simulation. For example, the first row of table 5.3 shows that if θ = 0.04, n = 100, s = 0.9 and c = 0.8, an observed x = 19 leads to AMLE = 0.039. If instead θ = 0.03 but all values for s, c, and n remain as above, and we observe x = 19, then the AMLE will still be 0.039. We also see in this table that when x/n ≤ 1 − c, the further away x/n is from 1 − c, the smaller the AMLE is. For instance, for n = 100, s = 0.9 and c = 0.8, we have

While the examples in table 5.3 seem to indicate that the AMLE improves on the usual MLE in cases where they differ, we performed a more formal simulation to quantify the improvement. We considered a few common cases where c = 0.8 and s is either 0.9 or 0.8. For each of the values of θ ∈ {0.05, 0.04, 0.03, 0.02, 0.01}, we ran 10000 simulations of samples of size 100 and 500. We calculated the number of times the MLE(θ) was 0 in these simulations, and the mean AMLE for these cases.
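The number of times the MLE(θ) equals 0 has a simple exact counterpart: this happens precisely when $x/n \le 1 - c$, and X is binomial(n, p) with $p = (s + c - 1)\theta + 1 - c$. The Python sketch below computes that probability, assuming the truncated MLE $(x/n + c - 1)/(s + c - 1)$ restricted to [0, 1]; the function names are hypothetical, and it does not attempt the AMLE itself, whose correction term is not restated here.

```python
import math

def p_positive(theta, s, c):
    """Probability of a positive test: p = (s + c - 1) * theta + 1 - c."""
    return (s + c - 1) * theta + 1 - c

def prob_mle_zero(theta, n, s, c, tol=1e-9):
    """Exact P(X/n <= 1 - c) for X ~ binomial(n, p), i.e. P(MLE(theta) = 0)."""
    p = p_positive(theta, s, c)
    # Largest x with x/n <= 1 - c; tol absorbs rounding such as
    # 100 * (1 - 0.8) == 19.999999999999996 in floating point.
    cutoff = math.floor(n * (1 - c) + tol)
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x)
               for x in range(cutoff + 1))

if __name__ == "__main__":
    for n in (100, 500):
        prob = prob_mle_zero(0.05, n, s=0.9, c=0.8)
        # Expected number of the 10000 simulated samples with MLE(theta) = 0
        print(n, round(10000 * prob))
```

Multiplying by 10000 gives the expected value of the count k reported in table 5.4; a simulated count fluctuates around it.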

The mean squared error (MSE) is defined as the average of the squared deviations between the estimator and the true parameter value over the simulations. We calculated the square root of the MSE of the MLE(θ), denoted by e.MLE, so that $e.MLE = \sqrt{\sum_{i=1}^{10000} (\mathrm{MLE}_i(\theta) - \theta)^2 / 10000}$. Similarly, $e.AMLE = \sqrt{\sum_{i=1}^{10000} (\mathrm{AMLE}_i - \theta)^2 / 10000}$. Since we are interested in the cases where the MLE(θ) and the AMLE differ, we defined the conditional MSE to be the MSE conditional on $x/n \le 1 - c$. We denote the square root of the conditional MSE of the AMLE by ce.AMLE, that is, $ce.AMLE = \sqrt{\sum_{i:\, x_i/n \le 1 - c} (\mathrm{AMLE}_i - \theta)^2 / k}$, where k is the number of simulations with $x_i/n \le 1 - c$. Note that the square root of the conditional MSE of the MLE(θ) is θ, since

Table 5.4 contains the results of these simulations.
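Because the MLE(θ) depends on the data only through x, its root MSEs can also be computed exactly by summing over the binomial distribution of x instead of simulating; this confirms that the conditional root MSE of the MLE(θ) equals θ. A Python sketch with hypothetical function names, assuming X ~ binomial(n, p) with $p = (s + c - 1)\theta + 1 - c$:

```python
import math

def mle_theta(x, n, s, c):
    """MLE(theta) = (x/n + c - 1) / (s + c - 1), truncated to [0, 1]."""
    return min(max((x / n + c - 1) / (s + c - 1), 0.0), 1.0)

def root_mse_mle(theta, n, s, c, tol=1e-9):
    """Exact (e.MLE, ce.MLE): root MSE of MLE(theta), unconditionally and
    conditionally on x/n <= 1 - c, with X ~ binomial(n, p)."""
    p = (s + c - 1) * theta + 1 - c
    cutoff = math.floor(n * (1 - c) + tol)    # x/n <= 1 - c  <=>  x <= cutoff
    mse = cond_sq = cond_prob = 0.0
    for x in range(n + 1):
        w = math.comb(n, x) * p**x * (1 - p) ** (n - x)   # binomial probability
        sq = (mle_theta(x, n, s, c) - theta) ** 2
        mse += w * sq
        if x <= cutoff:
            cond_sq += w * sq
            cond_prob += w
    return math.sqrt(mse), math.sqrt(cond_sq / cond_prob)

e_mle, ce_mle = root_mse_mle(0.05, 100, s=0.9, c=0.8)
print(round(e_mle, 4), round(ce_mle, 4))   # ce_mle equals theta
```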

θ   ssize   sens

Table 5.4: Mean squared errors for the MLE(θ) and AMLE. In this table, sens is the sensitivity of the test and the specificity is held constant at 0.8. The parameter k represents the number of times $x/n \le 1 - c$ in a simulation of size 10000, mAMLE is the average of the AMLE for those simulations where $x/n \le 1 - c$, e.MLE is the square root of the MSE of the MLE(θ), e.AMLE is the square root of the MSE of the AMLE, and ce.AMLE is the square root of the MSE of the AMLE conditional on $x/n \le 1 - c$. The square root of the mean squared error of the MLE(θ) when $x/n \le 1 - c$ is equal to θ (see text).

In most of the cases reported in table 5.4 we see that the MSE of the AMLE is smaller than that of the MLE(θ). For example, for θ = 0.05, n = 100, s = 0.9 and c = 0.8, the number of times the MLE(θ) was 0 out of 10000 simulations is 2286. The square root of the conditional MSE of the AMLE is 0.016, while that of the MLE(θ) is much larger, 0.05. If we increase the sample size to 500, the number of times the MLE(θ) is 0 out of 10000 simulations drops to 315. This is because when the sample size is increased, x/n gets closer to the true value $p > 1 - c$, so the probability of observing $x/n \le 1 - c$ decreases. Only in rows 8, 9 and 10 of the table does the AMLE seem to perform worse than the MLE(θ). For example, in row 10, ce.AMLE = 0.03 while ce.MLE = 0.01. This is because with θ = 0.01, s = 0.8 and c = 0.8, a sample larger than 100 is needed for the AMLE to perform well. In fact, if we increase the sample size to 500 (row 20), ce.AMLE drops to 0.008 while ce.MLE is still 0.01. In the next two sections, we will derive confidence intervals and sample size requirements for θ based on the AMLE.

5.4 Confidence intervals

In this section we will prove that to find a $1 - \alpha$ confidence interval for θ, it suffices to find a $1 - \alpha$ confidence interval for p, which can be done, for example, using the MLE of p. In fact, we will prove that if we can find a positive number l such that

5.4.1 Confidence interval for p

Finding an approximate confidence interval for p is the classical problem of finding an approximate confidence interval for a binomial proportion, with the added restriction that these intervals must be contained within the feasible range of p, the interval $[1 - c, s]$. In fact, suppose that an l has been found satisfying equation 5.6. This means that

means that

$P(p \in [X/n - l, X/n + l]) \ge 1 - \alpha.$

Since $P(p \in [1 - c, s]) = 1$, and hence $P(p \in [0, 1] \setminus [1 - c, s]) = 0$, by the law of total probability,

Therefore $[X/n - l, X/n + l] \cap [1 - c, s]$ is a $100(1 - \alpha)\%$ confidence interval for p.

To find l satisfying equation 5.6, we use the classical method found in almost all statistical textbooks. It consists of taking the normal approximation to the binomial(n, p) distribution, and looking for the value l such that

This leads to

$l = z_{\alpha/2}\sqrt{p(1 - p)/n}$,

where $z_{\alpha/2}$ is the usual standard normal upper $100(1 - \alpha/2)\%$ quantile. Since p is unknown, it is usually approximated by x/n. In the current context, however, p is restricted to the interval $[1 - c, s]$. Therefore, to find l satisfying equation 5.8, we approximate p by x/n only when $1 - c < x/n < s$, and by $1 - c$ when $x/n \le 1 - c$. An approximate $1 - \alpha$ confidence interval for p is then given by the intersection of $[x/n - l, x/n + l]$ with $[1 - c, s]$.
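The construction of the interval for p can be sketched as follows. This is a minimal Python sketch with a hypothetical function name; replacing p by s when $x/n \ge s$ is our assumption, since the text only specifies the cases $1 - c < x/n < s$ and $x/n \le 1 - c$.

```python
from statistics import NormalDist

def ci_p(x, n, s, c, alpha=0.05):
    """Approximate 100(1 - alpha)% interval for p, intersected with [1 - c, s].

    Returns None when [x/n - l, x/n + l] misses [1 - c, s] (Remark 5.4.1)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)       # z_{alpha/2}
    phat = x / n
    # Plug-in value for p in l: x/n moved into [1 - c, s]
    # (using s when x/n >= s is an assumption, not stated in the text).
    p_plug = min(max(phat, 1 - c), s)
    l = z * (p_plug * (1 - p_plug) / n) ** 0.5
    lo, hi = max(phat - l, 1 - c), min(phat + l, s)
    return (lo, hi) if lo <= hi else None

print(ci_p(19, 100, s=0.9, c=0.8))   # lower end clipped at 1 - c
```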

Remark 5.4.1 In most cases (depending on the sample size and on α), the intersection of the two intervals is not empty. However, since we are dealing with approximate rather than exact confidence intervals, it can happen that the two intervals do not intersect. In these cases the above procedure does not lead to a $100(1 - \alpha)\%$ confidence interval for p. Since the length of a confidence interval increases as α decreases, and goes to $s + c - 1$, the length of the interval of support of p, $[1 - c, s]$, as α goes to 0, we can always find a $\beta < \alpha$ for which the above procedure leads to an approximate $100(1 - \beta)\%$ confidence interval.

5.4.2 Confidence interval for θ

Assume that an l satisfying equation 5.6 (at least approximately) has been found. We now prove the following theorem:

Theorem 5.4.1 If $[X/n - l, X/n + l] \cap [1 - c, s]$ is a $100(1 - \alpha)\%$ confidence interval for p, then $\left[\frac{X/n + c - 1 - l}{s + c - 1}, \frac{X/n + c - 1 + l}{s + c - 1}\right] \cap [0, 1]$ is a $100(1 - \alpha)\%$ confidence interval for θ containing both MLE(θ) and AMLE.

Proof: Recall that $\theta = \frac{p - (1 - c)}{s + c - 1}$. By the law of total probability we have

It suffices then to prove that

and

Consider first the case when $X/n > 1 - c$. The AMLE of θ is then by definition simply $\mathrm{MLE}(\theta) = \frac{X/n + c - 1}{s + c - 1}$, so that

Therefore,

so that $\theta \in \left[\frac{X/n + c - 1 - l}{s + c - 1}, \frac{X/n + c - 1 + l}{s + c - 1}\right]$,

and by the law of total probability

Consider next the case when $X/n \le 1 - c$. We then have MLE(θ) = 0, and $\mathrm{AMLE} = \frac{X/n - (1 - c) + M(X)}{s - (1 - c)}$. Thus

If $|X/n - p| \le l$ with probability at least equal to $1 - \alpha$, then

with probability at least equal to $1 - \alpha$. On the other hand,

so that

with probability at least $1 - \alpha$. Therefore

with probability at least equal to $1 - \alpha$.

The right hand side of 5.9 can be written as

and by the law of total probability we have

$P\left(\theta \in \left[\frac{X/n + c - 1 - l}{s + c - 1}, \frac{X/n + c - 1 + l}{s + c - 1}\right] \cap [0, 1]\right) \ge 1 - \alpha.$

Therefore, a confidence interval for θ is obtained by intersecting $\left[\frac{X/n + c - 1 - l}{s + c - 1}, \frac{X/n + c - 1 + l}{s + c - 1}\right]$ with $[0, 1]$, and this confidence interval contains both MLE(θ) and AMLE.
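Mapping the interval for p through $\theta = (p - (1 - c))/(s + c - 1)$ gives the interval of Theorem 5.4.1. A minimal Python sketch under the same assumptions as for the p-interval (hypothetical function name; p replaced by s when $x/n \ge s$):

```python
from statistics import NormalDist

def ci_theta(x, n, s, c, alpha=0.05):
    """[(x/n + c - 1 - l)/(s + c - 1), (x/n + c - 1 + l)/(s + c - 1)] intersected with [0, 1]."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    phat = x / n
    p_plug = min(max(phat, 1 - c), s)      # plug-in value for p inside l
    l = z * (p_plug * (1 - p_plug) / n) ** 0.5
    k = s + c - 1                          # slope of p = k * theta + 1 - c
    lo = max((phat + c - 1 - l) / k, 0.0)
    hi = min((phat + c - 1 + l) / k, 1.0)
    return (lo, hi) if lo <= hi else None

print(ci_theta(19, 100, s=0.9, c=0.8))   # contains MLE(theta) = 0
```

Intersecting with [0, 1] after the mapping is equivalent to intersecting the p-interval with $[1 - c, s]$ first, because the linear map sends $[1 - c, s]$ onto [0, 1].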

We calculated 95% confidence intervals for some simulated cases, using a sample of size 100, a specificity of 0.8, and various values for the sensitivity. Since here we are mostly interested in the case where $x/n \le 1 - c$, we present several such examples in table 5.5.

Since our methods are only approximate, we ran a simulation to estimate the proportion of times the 95% confidence intervals captured the true θ. We present the results in table 5.6, for a sample size of 100. Ten thousand simulations were run for each case. In most cases reported in table 5.6, the proportion of times the 95% confidence intervals captured the true θ is very close to 95%. The true coverage proportion can be viewed as a binomial proportion, since each time a confidence interval is calculated, the true θ is either in or out of the interval. Therefore, with 10000 simulations, the accuracy of the estimated coverage proportion is

where prop is the proportion of times the 95% confidence intervals captured the true θ. Here, the values of prop are all close to 0.95, so this accuracy can be approximated by

which is very small. In summary, the method appears to work well, at least for n = 100, and should work at least as well for larger sample sizes and for values of θ further from 0. Since it is not reasonable to estimate very small prevalences with small samples, the method should work well in all cases of practical importance.
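As a worked value, assuming the accuracy meant here is the binomial standard error of the estimated coverage proportion, prop ≈ 0.95 and 10000 simulations give:

```latex
\sqrt{\frac{\mathrm{prop}\,(1 - \mathrm{prop})}{10000}}
  \approx \sqrt{\frac{0.95 \times 0.05}{10000}}
  = \sqrt{4.75 \times 10^{-6}}
  \approx 0.0022
```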

θ   sens   x   AMLE   conf.int

Table 5.5: Confidence intervals for θ. The specificity = 0.8 and the sample size = 100; sens is the sensitivity, and conf.int is the 95% confidence interval for θ calculated with the method given in section 5.4.2.

Table 5.6: The proportion of times out of 10000 the 95% confidence intervals captured θ. The sample size = 100 for all simulations, sens is the sensitivity, spec is the specificity and prop is the observed proportion of times the 95% confidence interval captured θ.

5.5 Sample size for estimating the prevalence

In designing a study to estimate the prevalence of a disease in a given population, it is always important to consider how large the sample size should be. We would therefore like to know how large a sample we need for the AMLE to be within a distance d of the true prevalence θ with high probability. From the preceding section, the AMLE is within a distance d of θ if X/n is within a distance $l = d(s + c - 1)$ of p. Therefore, we are looking for the sample size n such that

Again, the normal approximation to the binomial distribution can be used, leading to the usual binomial sample size formula

so that

We see from this formula that the sum of the sensitivity and the specificity has a very large influence on sample size requirements. The closer this sum is to 1, the weaker the test, and the larger the sample size. In the extreme case where $s + c = 1$, no sample size is sufficient, since the test is completely uninformative. On the other hand, the minimum sample size occurs when we have a perfect gold standard test.
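Substituting $l = d(s + c - 1)$ into the usual binomial formula $n = z_{\alpha/2}^2\, p(1 - p)/l^2$ gives $n = z_{\alpha/2}^2\, p(1 - p)/(d^2 (s + c - 1)^2)$. This is the form consistent with the values quoted for table 5.7 and in section 6.2.5 (784 for p = 1/2, and about 501 for p = 1 - c), although equation 5.10 itself is not reproduced above. A minimal Python sketch (the function name is hypothetical; it returns the unrounded value):

```python
from statistics import NormalDist

def amle_sample_size(d, s, c, p=0.5, alpha=0.05):
    """Normal-approximation n such that |X/n - p| <= d(s + c - 1)
    with probability about 1 - alpha (unrounded)."""
    if s + c <= 1:
        raise ValueError("s + c must exceed 1: the test is uninformative")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    l = d * (s + c - 1)                  # required accuracy on the p scale
    return z**2 * p * (1 - p) / l**2

print(amle_sample_size(0.05, 0.9, 0.8))          # conservative, p = 1/2
print(amle_sample_size(0.05, 0.9, 0.8, p=0.2))   # p = 1 - c
```

The factor $(s + c - 1)^{-2}$ makes the dependence on test quality explicit: halving $s + c - 1$ quadruples the required sample size.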

Table 5.7 provides some illustrative examples. In that table, ssize is the conservative sample size, obtained by replacing p in equation 5.10 by 1/2, and ssize.small is the smallest possible sample size under the given conditions, obtained by replacing p in equation 5.10 with $1 - c$. The variable ssize.small estimates the sample size when estimating the prevalence of a rare disease: from the formula $\theta = \frac{p - (1 - c)}{s + c - 1}$, when θ is small, p is close to $1 - c$. When we anticipate a low prevalence, smaller values of d are usually needed for the study to be informative, leading to larger sample sizes.

d   sens   spec   ssize   ssize.small

Table 5.7: Sample size for estimating θ to accuracy ±d, when α = 0.05. Here sens is the sensitivity, spec is the specificity, ssize is the conservative sample size obtained by using the normal approximation with p = 1/2, and ssize.small is the sample size obtained by using the normal approximation with p = 1 - c.

Chapter 6

Bayesian estimation of disease prevalence and sample size in the absence of a gold standard

6.1 Introduction

In chapter 5 we studied the estimation of the prevalence of a disease based on the results of diagnostic tests with known sensitivity and specificity. Such situations are very rare, since most often we only have estimates of the sensitivity and specificity of a test, not exact values. Performing analyses that treat these estimates as exact values, without accounting for the real uncertainty, may produce very misleading results. Unlike with standard sample size formulae for a binomial parameter, we will show that there are no "conservative" values for S and C that can be used to produce conservative sample size estimates, and that all uncertainty must be accounted for. In this chapter we will use a Bayesian approach to estimate the prevalence of a disease in the absence of a gold standard test, when the sensitivity and specificity of the test are not known exactly. Based on this approach, we will then calculate the smallest sample size for which a $1 - \alpha$ credible interval for the prevalence has total width l, using an average coverage criterion. We will see that even small uncertainties about the sensitivity and the specificity of a diagnostic test may lead to a large increase in the sample size needed to reach the desired accuracy, and that in many cases this accuracy cannot be reached even with an infinite sample size.

In this chapter we again assume that $S + C \ge 1$.

6.2 The case when the sensitivity and the specificity of the diagnostic test are exactly known

In order to later examine the degree to which the sample size is affected by accounting for the uncertainty in the estimates of the sensitivity and the specificity of the diagnostic test, we first suppose that the sensitivity and specificity are known, and equal to s and c respectively. Whereas in the previous chapter we took a classical approach to this problem, we now take a Bayesian approach. As before, let θ denote the prevalence of the disease, p denote the probability of a positive test result, and s and c be the known values of the sensitivity and specificity, respectively.

6.2.1 Prior density for p

Suppose that the prior density function of θ is $f(\theta)$, where θ takes values in the interval $[a, b]$ with $0 \le a \le b \le 1$. We have seen in chapter 5 that

$p = (s + c - 1)\theta + 1 - c, \qquad (6.1)$

so that p is a linear transformation of θ. The Jacobian of this transformation is $J = \left|\frac{d\theta}{dp}\right| = \frac{1}{s + c - 1}$, where $\frac{d\theta}{dp}$ denotes the derivative of θ with respect to p. Therefore, the prior density function of p is

$f_p(p) = \frac{1}{s + c - 1}\, f\left(\frac{p - (1 - c)}{s + c - 1}\right)$ if $p_1 \le p \le p_2$, and $f_p(p) = 0$ otherwise,

where $p_1 = (s + c - 1)a + 1 - c$ and $p_2 = (s + c - 1)b + 1 - c$.

Example 6.2.1 Suppose that the prior distribution of θ is uniform on the interval $[a, b]$. Then the prior distribution of p is also uniform: it is easy to see that $p \sim U[p_1, p_2]$, where $p_1$ and $p_2$ are given above.

Example 6.2.2 Suppose the prior distribution of θ is Beta(α, β), so that $[a, b] = [0, 1]$. Then

where Γ(t) is the gamma function, and α and β are positive real numbers. Therefore,

Equation 6.2 represents the equation of a Beta(α, β) density function rescaled to the interval $[1 - c, s]$.
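The change of variables in Example 6.2.2 can be checked numerically. The Python sketch below (hypothetical function name) evaluates $f_p(p) = f_\theta\left(\frac{p - (1 - c)}{s + c - 1}\right)/(s + c - 1)$ for a Beta(α, β) prior on θ, working on the log scale for the Beta function; with α = β = 1 it reduces to the uniform density $1/(s + c - 1)$ of Example 6.2.1.

```python
import math

def prior_p_beta(p, s, c, alpha, beta):
    """Prior density of p when theta ~ Beta(alpha, beta) and s, c are known.

    Uses theta = (p - (1 - c)) / (s + c - 1) and the Jacobian 1 / (s + c - 1)."""
    k = s + c - 1
    if not (1 - c < p < s):          # open interval avoids log(0) at the ends
        return 0.0
    theta = (p - (1 - c)) / k
    log_beta_fn = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_f_theta = ((alpha - 1) * math.log(theta)
                   + (beta - 1) * math.log(1 - theta) - log_beta_fn)
    return math.exp(log_f_theta) / k

# Midpoint-rule check that the density of p integrates to about 1 on [1 - c, s]
m, lo, hi = 10000, 0.2, 0.9
h = (hi - lo) / m
print(sum(prior_p_beta(lo + (i + 0.5) * h, 0.9, 0.8, 2.0, 5.0) for i in range(m)) * h)
```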

6.2.2 Posterior density of p

Likelihood

If the diagnostic test is given to a sample of n subjects, x of whom test positive for the disease, the likelihood function is binomial, and is given by

Marginal probability function of X

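The marginal probability function of the data x is then

$m(x) = \int_{p_1}^{p_2} \binom{n}{x} p^x (1 - p)^{n - x}\, \frac{f\left(\frac{p - (1 - c)}{s + c - 1}\right)}{s + c - 1}\, dp$ for $0 \le x \le n$, and $m(x) = 0$ otherwise.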

Posterior density of p

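By Bayes' theorem, the posterior density function of p is

$f(p \mid x) = \frac{\binom{n}{x} p^x (1 - p)^{n - x} f_p(p)}{m(x)}$ if $p_1 \le p \le p_2$, and 0 otherwise,

that is

$f(p \mid x) = \frac{\binom{n}{x} p^x (1 - p)^{n - x}\, f\left(\frac{p - (1 - c)}{s + c - 1}\right)}{(s + c - 1)\, m(x)}$ if $p_1 \le p \le p_2$, and 0 otherwise.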

Posterior mean of p

The posterior mean of p given data x, denoted here by $\hat p$, is

Posterior mean of 0

Let $\hat\theta$ denote the posterior mean of θ. From equation 6.1 we have

$E(p \mid x) = (s + c - 1)E(\theta \mid x) + 1 - c,$

where $E(\cdot \mid x)$ denotes the conditional expectation given X = x. Therefore we have

$\hat\theta = \frac{\hat p - (1 - c)}{s + c - 1}.$

6.2.3 Sample size determination via the average coverage criterion

Suppose one is looking for the sample size n such that the posterior credible set $[\hat\theta - d, \hat\theta + d]$ has posterior coverage probability of at least $1 - \alpha$ for a predetermined d and α. That is, suppose we seek n such that

Let cov(x; d) denote the posterior coverage probability of the interval $[\hat\theta - d, \hat\theta + d]$ for given x. Then

Let cov(d) denote the average coverage probability of the intervals $[\hat\theta - d, \hat\theta + d]$ over the marginal probability function of x, that is

$\mathrm{cov}(d) = \sum_{x=0}^{n} \mathrm{cov}(x; d)\, m(x). \qquad (6.5)$

Recall that m(x) is the distribution of x induced by the prior density f(θ). Thus, cov(d) is the average coverage probability of the posterior credible intervals of length 2d centered at $\hat\theta$, where the average is weighted by m(x). The average coverage criterion (see chapter 2) states that this weighted coverage must be at least $1 - \alpha$.

By the linear transformation 6.1 and by equation 6.3, cov(x; d) can be written as

$\mathrm{cov}(x; d) = P\left(\frac{\hat p - (1 - c)}{s + c - 1} - d \le \frac{p - (1 - c)}{s + c - 1} \le \frac{\hat p - (1 - c)}{s + c - 1} + d \,\middle|\, X = x\right).$

Multiplying by $s + c - 1$ and then adding $1 - c$,

$\mathrm{cov}(x; d) = P\left(\hat p - d(s + c - 1) \le p \le \hat p + d(s + c - 1) \mid X = x\right).$

Therefore

Computation of the sample size

The posterior density function of p, and thus the sample size required to satisfy the average coverage criterion defined in chapter 2, cannot be expressed in closed form. However, we can compute the posterior mean of θ and the sample size needed to satisfy the average coverage criterion using the following algorithm:

Average coverage criterion algorithm:

1. Determine the prior density function of θ, $f_\theta(\theta)$. Of course, this distribution should be based on the available prior information.

2. Choose values for n, d, and α.

3. Write the likelihood of the data given p.

4. Calculate $f_p(p)$, the prior density function of p.

For each value of x, $0 \le x \le n$, perform the following steps:

(a) Calculate $f(p, x)$,

(b) Calculate m(x), the marginal probability function of x,

(c) Calculate $f(p \mid x)$, the posterior density function of p given x,

(d) Calculate the posterior mean of p,

(e) The posterior mean of θ is then $\hat\theta = \frac{\hat p - (1 - c)}{s + c - 1}$,

(f) Calculate the coverage probability of the interval $[\hat\theta - d, \hat\theta + d]$,

(g) Calculate the posterior average coverage probability over all values of x,

(h) Find the smallest sample size n that gives an average coverage probability greater than $1 - \alpha$.

An S-plus program that calculates the sample size needed to satisfy the average coverage criterion given d, α, and prior information on θ, is provided in Appendix B.

Note 6.2.1 The integrals in steps (d) and (f) cannot always be calculated exactly. However, numerical integration (included as a function in S-plus) can provide accurate approximations to these integrals.

Note 6.2.2 The sample size in step (h) can be found through a bisection search, which consists of evaluating the criterion for a starting value of n, then choosing the next value of n depending on the resulting coverage. The criterion is then evaluated at the new value of n, and the procedure is repeated until the criterion is satisfied for n but not for n - 1.
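A minimal Python sketch of the algorithm above for a uniform prior θ ~ U[a, b], so that p ~ U[p1, p2]. It assumes a midpoint rule on the p scale in place of the S-plus numerical integration of Appendix B; the function names and the grid size are hypothetical choices, and the bisection follows Note 6.2.2 under the assumption that cov(d) increases with n.

```python
import math

def average_coverage(n, d, s, c, a=0.0, b=1.0, grid=1000):
    """cov(d) for [theta_hat - d, theta_hat + d] with theta ~ U[a, b] and
    known s, c, computed on the p scale: p ~ U[p1, p2] and the interval
    becomes [p_hat - d(s + c - 1), p_hat + d(s + c - 1)]."""
    k = s + c - 1
    p1, p2 = k * a + 1 - c, k * b + 1 - c
    h = (p2 - p1) / grid
    ps = [p1 + (i + 0.5) * h for i in range(grid)]      # midpoint nodes
    log_p = [math.log(p) for p in ps]
    log_q = [math.log(1 - p) for p in ps]
    log_prior = -math.log(p2 - p1)                      # density of U[p1, p2]
    total = 0.0
    for x in range(n + 1):
        log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
        lw = [log_choose + x * lp + (n - x) * lq + log_prior
              for lp, lq in zip(log_p, log_q)]
        top = max(lw)
        w = [math.exp(v - top) for v in lw]             # rescaled to avoid underflow
        z = sum(w)
        m_x = math.exp(top) * z * h                     # marginal probability m(x)
        p_hat = sum(wi * p for wi, p in zip(w, ps)) / z # posterior mean of p
        lo, hi = p_hat - d * k, p_hat + d * k
        cov_x = sum(wi for wi, p in zip(w, ps) if lo <= p <= hi) / z   # cov(x; d)
        total += cov_x * m_x
    return total

def smallest_n(d, s, c, alpha=0.05, a=0.0, b=1.0, lo=1, hi=4096, grid=1000):
    """Bisection for the smallest n with cov(d) >= 1 - alpha; assumes the
    coverage increases with n and that hi is large enough."""
    while lo < hi:
        mid = (lo + hi) // 2
        if average_coverage(mid, d, s, c, a, b, grid) >= 1 - alpha:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

With d = 0.05, s = 0.9 and c = 0.8, `average_coverage(647, 0.05, 0.9, 0.8)` should come out close to 0.95, in line with example 1 of table 6.1; a finer grid reduces the quadrature error at the cost of run time.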

Table 6.1 provides some representative examples of sample sizes calculated using the above algorithm. The first three examples represent a typical situation where the sensitivity of the diagnostic test is 0.9 and the specificity is 0.8. For this situation, we calculated the sample size needed to have an average coverage of at least 0.95 when d was equal to 0.05, 0.03 and 0.02, respectively. In the last three examples we changed the values of the sensitivity and the specificity of the diagnostic test and the prior distribution of the prevalence of the disease, to investigate the effect these changes have on the sample size needed to maintain an average coverage of at least 0.95.

Table 6.1: Sample size required to have an average coverage probability of at least 0.95 when $\theta \sim U[a, b]$. The parameter d is half the length of the posterior credible interval, ssize is the sample size needed under the given conditions, and ssize.perfect is the sample size needed when sens = 1 and spec = 1.

Example #   sens   spec   a   b     d      ssize   ssize.perfect
1           0.9    0.8    0   1     0.05   647     275
2           0.9    0.8    0   1     0.03   1809    767
3           0.9    0.8    0   1     0.02   4082    1729
4           0.75   0.75   0   1     0.03   3867    767
5           0.9    0.8    0   0.1   0.02   2887    359
6           0.9    0.9    0   0.1   0.02   1473    359

The first three examples show the degree to which the sample size increases as d decreases. Example 4 shows that the sample size more than doubles when the sensitivity and the specificity are decreased compared to example 2. Comparing example 5 to example 3 shows that changing the prior information about θ can also have a large effect on the sample size. Example 6 compared to example 5 demonstrates that increasing the specificity can substantially lower the required sample size. The last column displays the sample size required to satisfy the given conditions when the diagnostic test is a gold standard, that is, when sens = 1 and spec = 1. Comparing this column to the previous one shows that the sample size required to reach the given accuracy increases substantially when an imperfect test is used instead of a gold standard.

6.2.5 Comparing the sample sizes from the Bayesian approach to those based on the AMLE

Chapter 5 discussed the adjusted maximum likelihood estimator (AMLE) for the prevalence of a disease. In particular, we considered the sample size needed for the AMLE to fall within a distance d of the true θ with probability at least $1 - \alpha$ when the sensitivity and the specificity of the diagnostic test were exactly known. For example, if d = 0.05, α = 0.05, the sensitivity = 0.9 and the specificity = 0.8, the conservative sample size obtained by substituting 0.5 for p in the equation

was 784 (table 5.7). Replacing p in equation 6.6 by its lower bound, $1 - c = 0.2$, resulted in the smallest possible value for the sample size over all p, 501. From table 6.1, we note that the Bayesian average coverage sample size based on a uniform prior density for θ lies between these two extremes, at 647. We ran several simulations with a sample size of 501 to compare the AMLE to the posterior mean with a uniform prior.

Since the sample size calculations from both chapter 5 and this chapter are based on interval estimation, it is clear that larger samples will produce estimates that are at least as accurate. The results are given in table 6.2. For each value of θ, we chose two values for x from the simulations. The first is the first value of x obtained in the simulations such that $x/n \le 1 - c$, and the second is the first value of x such that $x/n > 1 - c$. Note that in these simulations the value of θ is needed to generate x but is not relevant to the calculations of the AMLE or the posterior mean.

Table 6.2: Comparing the AMLE to the posterior mean. The sensitivity was taken to be 0.9, and the specificity was 0.8. For the calculations of the posterior mean, the prior distribution for θ was U[0, 1]. Here pmean is the posterior mean of θ.

Example #   θ      x     pmean   AMLE
1           0.02   92    0.014   0.027
2           0.02   113   0.041   0.036
3           0.03   99    0.02    0.027

From the examples in table 6.2, we see that the posterior mean tends to be smaller than the AMLE when $x/n \le 1 - c$ and larger when $x/n > 1 - c$. Of course, many more simulations would have to be carried out in order to verify this observation. Nevertheless, these examples suggest that neither method is clearly superior for estimating the prevalence when the sensitivity and the specificity of the diagnostic test are known. The Bayesian method, however, has the advantage of using the prior information available on the prevalence, and this can have a great effect on the sample size. For example, we calculated the sample size needed for the AMLE to fall within a distance d of the

true value of the prevalence, and compared it to the sample size needed for the average posterior coverage of the interval $[\hat\theta - d, \hat\theta + d]$ to be at least $1 - \alpha$, where $\hat\theta$ is the posterior mean of θ. We chose d = 0.02, α = 0.05, s = 0.9 and c = 0.8. We assumed that the only prior information about θ was an upper bound b. Therefore, for the Bayesian approach, the prior density was $\theta \sim U[0, b]$. For the AMLE, since θ is bounded above by b, p is bounded above by $b(s + c - 1) + 1 - c = 0.7b + 0.2$. The results are given in table 6.3.

b   ssAMLE   ssbayes

Table 6.3: Comparing the sample size, ssAMLE, needed for the AMLE to fall within a distance 0.02 of the true value of the prevalence with probability at least 0.95, to the sample size, ssbayes, needed to have a posterior average coverage of at least 0.95 for an interval of length 0.04 around the posterior mean of θ. The sensitivity is 0.9, the specificity is 0.8, and the prior distribution of θ is U[0, b].

From table 6.3 we see that in the Bayesian approach, the sample size decreases substantially as the prior information increases, as summarized by a uniform prior distribution over the range of θ. The sample size based on the AMLE also decreases, but at a lesser rate.

The Bayesian approach has another very important advantage over the frequentist

approach, in that it can be extended to include the uncertainty around the sensitivity

and the specificity of the diagnostic test. We shall take advantage of this flexibility

in the remaining sections of this chapter.

6.3 The case when the specificity but not the sensitivity of the diagnostic test is exactly known

6.3.1 Introduction

Some diagnostic tests are considered to have high specificity, while other diagnostic tests are considered to have high sensitivity. For example, some blood screening tests that are used to detect antibodies to the human immunodeficiency virus have very high sensitivity (Gastwirth [1991]). Another example occurs in parasitology, where stool examinations for certain parasites are considered to have near perfect specificity. This is because the test consists of looking directly for the parasite under a microscope, and a distinctive looking parasite is not likely to be thought present when it is not. Hence it often occurs that one of the test properties is known exactly or almost exactly, while the other may be less accurately known.

In this section we will consider the problem of estimating the prevalence of a disease in the case where the specificity but not the sensitivity of the diagnostic test is exactly known. Similar results can be derived when the sensitivity but not the specificity is exactly known. We will denote by c the known specificity of the diagnostic test; the sensitivity will be denoted by S, and θ will denote the prevalence of the disease. As in the previous section, we will use a Bayesian approach to estimate the prevalence of the disease, and to calculate the sample size needed to satisfy criterion 2.8, now in the presence of unknown S.

Two methods for calculating the posterior mean of θ and the posterior average coverage probability will be used and compared. The first is an exact method, which consists of calculating the posterior coverage probability of an interval exactly or almost exactly by integrating the posterior density function over that interval. As we shall see, this method is feasible in the case where the prior densities of the prevalence and the sensitivity are independent uniforms. The second method consists of using a SIR algorithm to draw a sample from the posterior distribution of θ. Since this method uses random samples to make inference about a parameter, unlike the first method, it only provides an approximate solution. However, it has the advantage of generalizability, in that it can be used for any prior density function. The error generated by using the SIR method instead of exact calculations can be reduced by increasing the size of the SIR sample, although this comes at the cost of increased computation time.

6.3.2 Prior density for p

Suppose that S and θ are a priori independent random variables. Let $f_\theta(\theta)$ denote the prior density function of θ, and $f_S(s)$ the prior density function of S, where again, when there is no chance for confusion, we will omit the subscript. Suppose that θ takes values on the interval $[a, b]$, where $0 \le a \le b \le 1$, and that S takes values on the interval $[a_1, b_1]$, where $0 \le a_1 \le b_1 \le 1$. From the prior independence of θ and S, the prior joint density function of θ and S, denoted here by $f(\theta, s)$, is

As before, let p denote the probability that an individual has a positive test result. Consider the transformation of the parameter vector (θ, S) to the parameter vector (p, S), recalling that

$p = (S + c - 1)\theta + 1 - c.$

The Jacobian of this transformation is $1/(s + c - 1)$, so that the joint prior density function of p and S, denoted here by $f(p, s)$, is

$f(p, s) = \frac{f_\theta\left(\frac{p - (1 - c)}{s + c - 1}\right) f_S(s)}{s + c - 1}$ if $as + (1 - a)(1 - c) \le p \le bs + (1 - b)(1 - c)$ and $a_1 \le s \le b_1$, and $f(p, s) = 0$ otherwise.

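Example 6.3.1 Suppose that $\theta \sim U[a, b]$, where $0 \le a \le b \le 1$, and that $S \sim U[a_1, b_1]$, where $0 \le a_1 \le b_1 \le 1$. Then

$f(p, s) = \frac{1}{(b - a)(b_1 - a_1)(s + c - 1)}$ if $as + (1 - a)(1 - c) \le p \le bs + (1 - b)(1 - c)$ and $a_1 \le s \le b_1$, and $f(p, s) = 0$ otherwise.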
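Example 6.3.1 can be sanity-checked by confirming that f(p, s) integrates to 1 over its support: for each s, the p-range has length $(b - a)(s + c - 1)$, which cancels the Jacobian. The sketch below uses hypothetical function names and illustrative parameter values, integrates over s with a midpoint rule and over p on the exact support for each s, and assumes $s + c > 1$ throughout $[a_1, b_1]$.

```python
def joint_prior_ps(p, s, c, a, b, a1, b1):
    """Joint prior density f(p, s) for theta ~ U[a, b], S ~ U[a1, b1]
    (independent) and known specificity c; assumes s + c > 1 on [a1, b1]."""
    if not (a1 <= s <= b1):
        return 0.0
    if not (a * s + (1 - a) * (1 - c) <= p <= b * s + (1 - b) * (1 - c)):
        return 0.0
    return 1.0 / ((b - a) * (b1 - a1) * (s + c - 1))   # Jacobian 1/(s + c - 1)

def total_mass(c, a, b, a1, b1, m=200):
    """Integrate f(p, s): midpoint rule in s, and in p over
    [as + (1 - a)(1 - c), bs + (1 - b)(1 - c)] for each s."""
    hs = (b1 - a1) / m
    total = 0.0
    for j in range(m):
        s = a1 + (j + 0.5) * hs
        lo = a * s + (1 - a) * (1 - c)
        hi = b * s + (1 - b) * (1 - c)
        hp = (hi - lo) / m
        total += sum(joint_prior_ps(lo + (i + 0.5) * hp, s, c, a, b, a1, b1)
                     for i in range(m)) * hp * hs
    return total

print(total_mass(c=0.95, a=0.0, b=0.2, a1=0.7, b1=0.95))   # close to 1
```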

6.3.3 Posterior density of p

Likelihood

The likelihood function of the data, denoted here by $l(x \mid \theta, s)$, has a binomial form, and is given by

Using the transformation 6.1, this likelihood can be written in terms of (p, s). Note that $l(x \mid p, s)$ depends on s only through p, so we can drop the s and write $l(x \mid p)$ instead.

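$f(\theta, s, x) = f(\theta, s)\, l(x \mid \theta, s)$ if $a \le \theta \le b$ and $a_1 \le s \le b_1$, and 0 otherwise,

and define $f(p, s, x)$ by

$f(p, s, x) = f(p, s)\, l(x \mid p, s)$ if $as + (1 - a)(1 - c) \le p \le bs + (1 - b)(1 - c)$ and $a_1 \le s \le b_1$, and 0 otherwise.

Under the uniform priors of Example 6.3.1 this becomes

$f(p, s, x) = \frac{\binom{n}{x} p^x (1 - p)^{n - x}}{(b - a)(b_1 - a_1)(s + c - 1)}$ if $as + (1 - a)(1 - c) \le p \le bs + (1 - b)(1 - c)$ and $a_1 \le s \le b_1$, and 0 otherwise.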

Marginal probability function of X

The marginal probability function of X, denoted here by m(x), is obtained by integrating f(p, s, x) with respect to p and s,

$m(x) = \int_{a_1}^{b_1} \int_{as + (1 - c)(1 - a)}^{bs + (1 - c)(1 - b)} f(p, s, x)\, dp\, ds.$

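Example 6.3.3 If $\theta \sim U[a, b]$, where $0 \le a \le b \le 1$, and $S \sim U[a_1, b_1]$, where $0 \le a_1 \le b_1 \le 1$, then

$m(x) = \int_{a_1}^{b_1} \int_{as + (1 - c)(1 - a)}^{bs + (1 - c)(1 - b)} \frac{\binom{n}{x} p^x (1 - p)^{n - x}}{(b - a)(b_1 - a_1)(s + c - 1)}\, dp\, ds.$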
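For moderate n, m(x) and the posterior mean of θ in Example 6.3.3 can also be approximated by brute-force quadrature, which gives an independent check on the exact algorithms described below. This Python sketch uses a midpoint rule in s and p (our choice, not the Maple and S-plus programs of Appendix C); the function name and grid size m are hypothetical, and for large n with extreme x the likelihood can underflow on every grid cell, so it is intended for small examples only.

```python
import math

def marginal_and_posterior_mean(x, n, c, a, b, a1, b1, m=100):
    """Approximate m(x) and the posterior mean of theta under
    theta ~ U[a, b], S ~ U[a1, b1] and known specificity c."""
    const = 1.0 / ((b - a) * (b1 - a1))
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    hs = (b1 - a1) / m
    m_x = num = 0.0
    for j in range(m):
        s = a1 + (j + 0.5) * hs
        k = s + c - 1
        lo = a * s + (1 - a) * (1 - c)          # support of p for this s
        hi = b * s + (1 - b) * (1 - c)
        hp = (hi - lo) / m
        for i in range(m):
            p = lo + (i + 0.5) * hp
            lik = math.exp(log_choose + x * math.log(p) + (n - x) * math.log(1 - p))
            w = const * lik / k * hp * hs       # f(p, s, x) dp ds
            m_x += w
            num += w * (p - (1 - c)) / k        # theta = (p - (1 - c)) / (s + c - 1)
    return m_x, num / m_x

n = 20
results = [marginal_and_posterior_mean(x, n, 0.95, 0.0, 0.2, 0.7, 0.95, m=40)
           for x in range(n + 1)]
print(sum(mx for mx, _ in results))   # the m(x) sum to about 1 over x = 0..n
```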


Posterior joint density of θ and S

The posterior joint density function of θ and S given X = x, denoted here by $f(\theta, s \mid x)$, is

Posterior joint density of p and S

Using Bayes' theorem, the posterior joint density function of p and S given X = x, denoted here by $f(p, s \mid x)$, is

$f(p, s \mid x) = \frac{f(p, s, x)}{m(x)}$ if $as + (1 - a)(1 - c) \le p \le bs + (1 - b)(1 - c)$ and $a_1 \le s \le b_1$, and 0 otherwise.

6.3.4 Posterior mean of θ

Using transformation 6.7, we have

The posterior mean of θ, denoted here by $\hat\theta$, is

6.3.5 Sample size

Posterior coverage probability

The posterior coverage probability of the interval $[\hat\theta - d, \hat\theta + d]$ is

Changing the order of integration,

By the transformation 6.7,

where $l_1 = (\hat\theta - d)(s + c - 1) + 1 - c$ and $l_2 = (\hat\theta + d)(s + c - 1) + 1 - c$.

Sample size criterion

Suppose we are looking for the sample size n that satisfies the average coverage criterion 2.8 for a given d and α. Therefore, we seek the sample size n such that

Given d and α, to find n we have to compute the posterior mean $\hat\theta$, the marginal probability m(x) of x, and the coverage probability cov(x; d) for each $0 \le x \le n$.

Exact computations

In this section we suppose that $\theta \sim U[a, b]$, where $0 \le a \le b \le 1$, and that $S \sim U[a_1, b_1]$, where $0 \le a_1 \le b_1 \le 1$. We will develop an exact algorithm to compute $\hat\theta$, m(x) and cov(x; d) under these conditions. Recall from examples 6.3.1, 6.3.2 and 6.3.3 that we have

and

Therefore,

$\hat\theta = \int_{a_1}^{b_1} \int_{as + (1 - c)(1 - a)}^{bs + (1 - c)(1 - b)} \frac{\binom{n}{x} (p - (1 - c))\, p^x (1 - p)^{n - x}}{(b - a)(b_1 - a_1)(s + c - 1)^2\, m(x)}\, dp\, ds.$

For ease of computation, we change the order of integration so that s is integrated first, and then p. Exact algorithm 1 is used to integrate with respect to s. A computer program, written in Maple and given in Appendix C, is used to carry out the computations. The resulting functions are then integrated with respect to p. Exact algorithm 2 is then used to compute the average coverage probability, using a program written in S-plus, given in Appendix C. Before describing the algorithms in detail, we first describe the regions of integration.

If $ba_1 + (1 - c)(1 - b) \ge ab_1 + (1 - c)(1 - a)$, let $R_{11}$ be the region bounded by the lines

let $R_{12}$ be the region bounded by the lines

$s = a_1$, $s = b_1$, $p = ab_1 + (1 - c)(1 - a)$ and $p = ba_1 + (1 - c)(1 - b)$,

and let $R_{13}$ be the region bounded by the lines

$p = bs + (1 - c)(1 - b)$, $p = ba_1 + (1 - c)(1 - b)$ and $s = b_1$.

Next, if $ba_1 + (1 - c)(1 - b) < ab_1 + (1 - c)(1 - a)$, let $R_{21}$ be the region bounded by the lines

$s = a_1$, $p = as + (1 - c)(1 - a)$ and $p = ba_1 + (1 - c)(1 - b)$,

let $R_{22}$ be the region bounded by the lines

$p = as + (1 - c)(1 - a)$, $p = ab_1 + (1 - c)(1 - a)$, $p = bs + (1 - c)(1 - b)$ and $p = ba_1 + (1 - c)(1 - b)$,

and let $R_{23}$ be the region bounded by the lines

$p = bs + (1 - c)(1 - b)$, $p = ab_1 + (1 - c)(1 - a)$ and $s = b_1$.

Note that if $a = 0$ then we only have two regions of integration, $R_1$ and $R_2$. $R_1$ is the region bounded by the lines

$p = 1 - c$, $s = a_1$, $s = b_1$ and $p = ba_1 + (1 - c)(1 - b)$,

and $R_2$ is the region bounded by the lines

$p = ba_1 + (1 - c)(1 - b)$, $s = b_1$, and $p = bs + (1 - c)(1 - b)$.

Figure 6.1 contains plots of all of the above regions.

Exact algorithm 1:

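1. If a > 0, set s1 = (p − (1 − a)(1 − c))/a, and set s2 = (p − (1 − b)(1 − c))/b.

2. Compute g1, the integral of 1/(s + c − 1) over s when (s, p) ∈ R21: $g_1=\int_{a_1}^{s_1}\frac{ds}{s+c-1}$.

3. Compute g2, the integral of 1/(s + c − 1) over s when (s, p) ∈ R22: $g_2=\int_{s_2}^{s_1}\frac{ds}{s+c-1}$.

4. Compute g3, the integral of 1/(s + c − 1) over s when (s, p) ∈ R23: $g_3=\int_{s_2}^{b_1}\frac{ds}{s+c-1}$.

5. Compute g4, the integral of 1/(s + c − 1) over s when (s, p) ∈ R12: $g_4=\int_{a_1}^{b_1}\frac{ds}{s+c-1}$.

6. Compute gp1, the integral of 1/(s + c − 1)² over s when (s, p) ∈ R21: $gp_1=\int_{a_1}^{s_1}\frac{ds}{(s+c-1)^2}$.

7. Compute gp2, the integral of 1/(s + c − 1)² over s when (s, p) ∈ R22: $gp_2=\int_{s_2}^{s_1}\frac{ds}{(s+c-1)^2}$.

8. Compute gp3, the integral of 1/(s + c − 1)² over s when (s, p) ∈ R23: $gp_3=\int_{s_2}^{b_1}\frac{ds}{(s+c-1)^2}$.

9. Compute gp4, the integral of 1/(s + c − 1)² over s when (s, p) ∈ R12: $gp_4=\int_{a_1}^{b_1}\frac{ds}{(s+c-1)^2}$.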
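Each function computed in exact algorithm 1 is an integral of 1/(s + c − 1) or of 1/(s + c − 1)² between two of the limits a1, b1, s1 and s2, so each has an elementary closed form, which is presumably what makes a symbolic evaluation straightforward. As a sketch, assuming s + c − 1 > 0 on the range of integration (that is, a1 + c > 1), for any limits ℓ ≤ u:

$$\int_{\ell}^{u}\frac{ds}{s+c-1}=\log\frac{u+c-1}{\ell+c-1},\qquad \int_{\ell}^{u}\frac{ds}{(s+c-1)^{2}}=\frac{1}{\ell+c-1}-\frac{1}{u+c-1}.$$

Each g and gp function is obtained by substituting the limits of its own region.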

Note 6.3.1 We do not need to integrate over the regions R11 and R13, because these integrals have the same values as the integrals over the regions R21 and R23. Also, in the above algorithm, if a = 0, we only need to find g2, g3, gp2 and gp3. Since these functions will also be used for the computation of cov(x; d) in algorithm 2, we computed the functions g1, g2, g3, g4, gp1, gp2, gp3 and gp4 in terms of a, b, a1, b1 and c, using a Maple program, for later use.

Note 6.3.2 The functions defined in exact algorithm 1 are then multiplied by the likelihood function of the data, and the result is integrated with respect to p over the corresponding regions. Up to the constant factor 1/((b − a)(b1 − a1)) of the uniform prior, the sum of the integrals of the products of g1, g2, g3 and g4 with the likelihood is the marginal probability of x. Similarly, once the additional factor p − (1 − c) is included and the result is divided by m(x), the sum of the integrals of the products of gp1, gp2, gp3 and gp4 with the likelihood is the posterior mean of θ. These computations are used in the following algorithm.

Exact algorithm 2:

1. Choose values for the prior parameters a, b, a1, b1 and c, based on the available prior information.

2. Compute m(x), the marginal probability function of x, by

where

$$i=\begin{cases}1, & \text{if } ba_1\le ab_1+(1-c)(b-a),\\ 2, & \text{if } ba_1> ab_1+(1-c)(b-a).\end{cases}$$

3. Compute θ̂, the posterior mean of θ, by

4. Let a.new = θ̂ − d and b.new = θ̂ + d. Define Ri1.new, Ri2.new and Ri3.new in the same way as Ri1, Ri2 and Ri3 were defined previously, but with a replaced by a.new and b replaced by b.new.

5. Compute cov(x; d), the posterior coverage probability of the interval [θ̂ − d, θ̂ + d] for a given outcome x, by

6. Compute the posterior average coverage probability over the marginal probability of all possible values of x by

An S-plus program that can be used to calculate the posterior average coverage

probability is given in Appendix C.
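The following Python sketch is one way to check the exact computations numerically; it is not the Maple/S-plus program of Appendix C. It keeps the inner integrals over s in closed form, as in exact algorithm 1, but instead of splitting the domain into the regions Rij it takes, for each p, the s-limits max(a1, s2) and min(b1, s1), which trace out the same regions, and it replaces the outer integral over p by a midpoint rule. The function names and the grid size are hypothetical; the sketch assumes 0 < c < 1 and a1 + c > 1 (so that s + c − 1 > 0), and intersects [θ̂ − d, θ̂ + d] with [a, b], which does not change the coverage because the posterior vanishes outside [a, b].

```python
import math


def s_range(p, lo_t, hi_t, a1, b1, c):
    """Limits of s, for fixed p, such that lo_t <= theta <= hi_t and a1 <= s <= b1,
    where theta = (p - (1 - c)) / (s + c - 1)."""
    lo, hi = a1, b1
    if hi_t > 0:  # theta <= hi_t  <=>  s >= (p - (1 - hi_t)(1 - c)) / hi_t
        lo = max(lo, (p - (1 - hi_t) * (1 - c)) / hi_t)
    if lo_t > 0:  # theta >= lo_t  <=>  s <= (p - (1 - lo_t)(1 - c)) / lo_t
        hi = min(hi, (p - (1 - lo_t) * (1 - c)) / lo_t)
    elif p < 1 - c:  # with lo_t == 0, theta >= 0 requires p >= 1 - c
        hi = lo
    return lo, hi


def region_integrals(x, n, lo_t, hi_t, a, b, a1, b1, c, grid):
    """Integrals over {lo_t <= theta <= hi_t, a1 <= s <= b1} of likelihood times the
    uniform prior, without and with the factor theta (midpoint rule in p, exact in s)."""
    const = (b - a) * (b1 - a1)               # U[a, b] x U[a1, b1] density is 1/const
    p_lo = lo_t * a1 + (1 - lo_t) * (1 - c)   # smallest p in the region
    p_hi = hi_t * b1 + (1 - hi_t) * (1 - c)   # largest p in the region
    h = (p_hi - p_lo) / grid
    log_comb = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    mass = moment = 0.0
    for j in range(grid):
        p = p_lo + (j + 0.5) * h
        lo, hi = s_range(p, lo_t, hi_t, a1, b1, c)
        if hi <= lo:
            continue
        lik = math.exp(log_comb + x * math.log(p) + (n - x) * math.log1p(-p))
        g = math.log((hi + c - 1) / (lo + c - 1))   # integral of 1/(s+c-1)
        gp = 1 / (lo + c - 1) - 1 / (hi + c - 1)    # integral of 1/(s+c-1)^2
        mass += lik * g * h
        moment += lik * (p - (1 - c)) * gp * h
    return mass / const, moment / const


def average_coverage(n, a, b, a1, b1, c, d, grid=2000):
    """Sum over x of m(x) * cov(x; d) for the interval [theta_hat - d, theta_hat + d]."""
    total = 0.0
    for x in range(n + 1):
        m, moment = region_integrals(x, n, a, b, a, b, a1, b1, c, grid)
        if m == 0.0:          # likelihood underflowed for this x; contribution ~ 0
            continue
        theta_hat = moment / m
        lo_t, hi_t = max(a, theta_hat - d), min(b, theta_hat + d)
        covered, _ = region_integrals(x, n, lo_t, hi_t, a, b, a1, b1, c, grid)
        total += covered      # equals m(x) * cov(x; d)
    return total
```

Increasing `grid` trades computing time for accuracy; since the inner integrals are exact, the only discretisation error comes from the integral over p.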

6.3.6 SIR computations

As n increases, it becomes increasingly time consuming to compute the average coverage probability using the exact algorithms, because of the large number of computations involved. There is also another, more important, limitation to this algorithm, in that it only covers the case when the prior distributions of θ and S are both uniform. Due to the complexity of the algorithm, it is difficult (although it may be possible) to extend this exact method to other prior distributions. We therefore investigated other methods of computation to approximate the average coverage probability. One method that proved feasible is Sampling Importance Resampling (SIR), Rubin [1987]. This method, introduced in chapter 2, is used to obtain approximate random samples from a density function when direct sampling from this density function is difficult. The random samples are then used for inference. For example, sample quantiles may be used to approximate credible intervals, and sample means approximate posterior means.

Simulating samples from the posterior distribution of θ and S

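Suppose that S and θ are a priori independent random variables with prior density functions fS(s) and fθ(θ), respectively. Recall that the likelihood function of the data is

$$l(x\mid\theta,s)=\binom{n}{x}p^{x}(1-p)^{n-x},\qquad p=\theta s+(1-\theta)(1-c),$$

and that f(θ, s, x) is defined by

$$f(\theta,s,x)=l(x\mid\theta,s)\,f_\theta(\theta)\,f_S(s).$$

Recall also that the posterior joint density function of θ and S given X = x is

$$f(\theta,s\mid x)=\begin{cases}\dfrac{f(\theta,s,x)}{m(x)}, & \text{if } a\le\theta\le b \text{ and } a_1\le s\le b_1,\\ 0, & \text{otherwise.}\end{cases}$$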

Suppose that one is interested in obtaining samples from the marginal posterior distributions of θ and S. The SIR algorithm can be used in the following way. First obtain a sample {θi}, 1 ≤ i ≤ k, of size k from the prior distribution fθ(θ) of θ, and a sample {si}, 1 ≤ i ≤ k, also of size k, from the prior distribution fS(s) of S. This gives a sample (θi, si) from the joint prior density f(θ, s). The weight function is given by

$$w(\theta,s)=\frac{f(\theta,s\mid x)}{f_\theta(\theta)\,f_S(s)}=\frac{l(x\mid\theta,s)}{m(x)}.$$

Note that we do not need to compute m(x) for the weights, since w(θ, s) ∝ l(x | θ, s).

Then the SIR method asserts that one obtains an approximate random sample {θi*}, 1 ≤ i ≤ r, from the marginal posterior distribution of θ by drawing a sample with replacement from {θi} with unequal probability weights {l(x | θi, si)}, 1 ≤ i ≤ k. Similarly, an approximate sample {si*}, 1 ≤ i ≤ r, from the marginal posterior distribution of S is obtained by drawing a sample with replacement from {si} with the same weights.

Of course, if needed, a random sample from the joint posterior density of (θ, S) can be obtained by similarly resampling (θi, si) pairs.


Posterior mean of θ

The posterior mean of θ is then approximated by the sample mean

$$\hat\theta\approx\frac{1}{r}\sum_{i=1}^{r}\theta_i^{*}.$$

The posterior coverage probability of the interval [θ̂ − d, θ̂ + d]

Given d > 0 and x, the coverage probability cov(x; d) of the interval [θ̂ − d, θ̂ + d] is approximately equal to the proportion of points from the posterior sample of θ that are contained in the interval [θ̂ − d, θ̂ + d].

The marginal probability function of x

The marginal probability function of x can be approximated by

$$\hat m(x)=\frac{1}{k}\sum_{i=1}^{k}l(x\mid\theta_i,s_i).$$

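The posterior average coverage probability of the intervals [θ̂ − d, θ̂ + d] over the marginal probabilities of x

The posterior average coverage probability, cov(d), of the intervals [θ̂ − d, θ̂ + d] over the marginal probability of x is then approximated by summing, over x = 0, 1, ..., n, the products of the approximations to cov(x; d) and m(x):

$$\overline{\mathrm{cov}}(d)\approx\sum_{x=0}^{n}\mathrm{cov}(x;d)\,m(x).$$

We used the following algorithm to carry out the calculations.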

SIR algorithm:

1. Take a random sample of size k, {θi}, 1 ≤ i ≤ k, from the prior distribution fθ(θ) of θ.

2. Take a random sample of size k, {si}, 1 ≤ i ≤ k, from the prior distribution fS(s) of S.

3. Write the likelihood function of the data given θ and S:

$$l(x\mid\theta,s)=\binom{n}{x}p^{x}(1-p)^{n-x},\qquad p=\theta s+(1-\theta)(1-c).$$

4. For each x, x = 0, 1, ..., n, attach the weight wi = l(x | θi, si) to each point (θi, si, x) of the sample.

5. Resample a sample {θi*}, 1 ≤ i ≤ r, of size r with replacement from the original sample {θi}, 1 ≤ i ≤ k, using probabilities proportional to the weights {wi}, 1 ≤ i ≤ k.

6. Resample a sample {si*}, 1 ≤ i ≤ r, of size r with replacement from the original sample {si}, 1 ≤ i ≤ k, using probabilities proportional to the weights {wi}, 1 ≤ i ≤ k.

7. Compute θ̂, the posterior mean of θ.

8. Approximate the coverage probability of the interval [θ̂ − d, θ̂ + d], given d > 0, by calculating the proportion of points from the posterior sample of θ that are in the interval [θ̂ − d, θ̂ + d].

9. Compute the marginal probability of x.

10. Compute the posterior average coverage probability cov(d) of the intervals [θ̂ − d, θ̂ + d] over the marginal probabilities of x.

An S-plus program that performs these calculations is given in Appendix C.
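A minimal Python sketch of the SIR algorithm above, for uniform priors on θ and S and a known specificity c, is given below. The function name and defaults are hypothetical, and 0 < c < 1 is assumed so that 0 < p < 1. The weights are computed on the log scale and rescaled by their maximum, which leaves the resampling probabilities unchanged but avoids underflow for large n. Step 6 (resampling S) is omitted because only the θ sample enters the coverage calculation, and m(x) is estimated by the average likelihood over the prior sample.

```python
import math
import random


def sir_average_coverage(n, a, b, a1, b1, c, d, k=1000, r=1000, seed=1):
    """SIR approximation of the average coverage of [theta_hat - d, theta_hat + d]
    when theta ~ U[a, b], S ~ U[a1, b1] and the specificity c is known.
    Returns (average coverage, list of estimated m(x) for x = 0..n)."""
    rng = random.Random(seed)
    theta = [rng.uniform(a, b) for _ in range(k)]     # step 1: prior sample of theta
    sens = [rng.uniform(a1, b1) for _ in range(k)]    # step 2: prior sample of S
    p = [t * s + (1 - t) * (1 - c) for t, s in zip(theta, sens)]
    log_p = [math.log(q) for q in p]
    log_1mp = [math.log1p(-q) for q in p]
    marginals = []
    average = 0.0
    for x in range(n + 1):
        log_comb = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
        # step 4: log-likelihood weights, shifted by their maximum for stability
        logw = [log_comb + x * lp + (n - x) * lq for lp, lq in zip(log_p, log_1mp)]
        top = max(logw)
        w = [math.exp(v - top) for v in logw]
        m_x = math.exp(top) * sum(w) / k              # step 9: average likelihood
        marginals.append(m_x)
        post = rng.choices(theta, weights=w, k=r)     # step 5: weighted resampling
        theta_hat = sum(post) / r                     # step 7: posterior mean
        cov_x = sum(theta_hat - d <= t <= theta_hat + d for t in post) / r   # step 8
        average += m_x * cov_x                        # step 10: weight by m(x)
    return average, marginals
```

For example, `sir_average_coverage(200, 0.0, 0.2, 0.7, 0.9, 0.9, 0.05)` returns an SIR estimate of the average coverage together with the estimated marginal probabilities, which sum to one by construction.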

6.3.7 Examples

We computed several examples of posterior average coverage probabilities using the exact computations and compared the results to those obtained from the SIR algorithm. We considered the case where the prior distributions of θ and S are independent and uniform on the intervals [a, b] and [a1, b1], respectively, for various choices of the intervals [a, b] and [a1, b1] and different sample sizes n. Some of these examples are provided in table 6.4. Throughout this table, d = 0.05 and the specificity is 0.9. We considered the case where the sensitivity takes relatively large values, [a1, b1] = [0.7, 0.95]. We then slightly increased the information in the prior density of the sensitivity, to [a1, b1] = [0.7, 0.9], to see the effect this has on the average coverage probability. We also considered the case where more prior information was available on the prevalence, θ ~ U[0, 0.2] in contrast with θ ~ U[0, 0.5], with [a1, b1] = [0.7, 0.9]. We considered the case where the sensitivity takes relatively low values, [a1, b1] = [0.3, 0.5], while leaving the interval width fixed at b1 − a1 = 0.2. We also considered the case where there is increased uncertainty around the sensitivity, that is, when the sensitivity takes values in a wider interval, [a1, b1] = [0.3, 0.8], while [a, b] = [0, 0.2]. For each situation, we calculated the exact coverage probability and the approximation from the SIR algorithm for values of n ranging from n = 100 to n = 1300. For the SIR algorithm, samples of size k = 1000 were taken from the prior distributions of θ and S, and samples also of size r = 1000 were taken from the posterior distribution of θ and S given x.

Table 6.4: Comparing the SIR average coverage probability with the exact average coverage probability when S ~ U[a1, b1], θ ~ U[0, b], c = 0.9 and d = 0.05. Here cov.sir is the average coverage probability obtained by using the SIR algorithm, and cov.exact is the exact average coverage probability.

| b | a1 | b1 | sample size | cov.sir | cov.exact |
|---|---|---|---|---|---|
| [illegible] | 0.7 | 0.95 | 100 | 0.630 | 0.632 |
| [illegible] | [illegible] | 0.95 | 200 | 0.741 | 0.739 |

Table 6.4 shows that adding a small amount of information to the prior distribution of the sensitivity, by decreasing the prior interval of S from [0.7, 0.95] to [0.7, 0.9], increased the average coverage probability. For example, for n = 500, the exact average coverage probability increases from 0.843 to 0.866. Similarly, an increase in the prior information on θ from [a, b] = [0, 0.5] to [a, b] = [0, 0.2], while keeping [a1, b1] = [0.7, 0.9], also substantially increased the exact average coverage probability. For example, for n = 400, it increased from 0.843 to 0.941. Conversely, increased uncertainty in the prior information for the sensitivity, which is represented here by a wider prior range for S, resulted in poorer average coverage. For example, the average coverage was 0.941 for n = 400 and [a1, b1] = [0.7, 0.95], but only 0.814 for n = 400 and [a1, b1] = [0.3, 0.8], even though [a, b] = [0, 0.2] remained constant.

The table also shows that the exact average coverage probability is very close to the SIR approximate average coverage probability in all cases we considered. Exact values for the average coverage probabilities may be preferable, but they become increasingly time consuming to compute as the sample size increases. We found that the SIR program ran about 13 times faster than the exact program. Moreover, the examples in this table indicate that the errors produced by the SIR computations are very small. Therefore, in the sequel, we use the SIR algorithm to compute the average coverage probabilities when larger values of n are considered. The results are illustrated in the plots of figure 6.2, where the average coverage probabilities are plotted against the sample size n. The values of the parameters considered in each plot are given in table 6.5.

The plots in figure 6.2 suggest that, in many cases, the average coverage probability does not improve substantially when n is increased beyond a certain value. The average coverage probabilities seem to approach an upper limit, and this upper limit can often be much smaller than the desired average coverage of, say, 0.95. For example, in plot 1, as n increases from 1500 to 3500, the average coverage seems to stay near 0.7. Similarly, in plot 3, as n increases from about 600 to 2500, the average coverage remains near 0.55. This phenomenon will occur again in the next section, where we will study it in greater detail.

Table 6.5: The parameters used in the plots of figure 6.2. Here prior.theta is the prior distribution of θ, prior.S is the prior distribution of S, and range.n is the range of n. In all of these plots d = 0.05.

It is also of interest to compare plot 1, where the sensitivity takes values in [0.3, 0.5] and the average coverage seems to have an upper limit of approximately 0.7, with plot 2, where the sensitivity takes values in [0.7, 0.9] and the average coverage seems to approach the desired value of 0.95, at least for n ≥ 1500. Similarly, in plots 3 and 4, the upper limit seems to increase from about 0.55 to about 0.9 when the prior information on θ increases from θ ~ U[0, 0.5] to θ ~ U[0, 0.2]. Therefore, it seems that the upper limit depends on the prior densities for θ and S. The case considered in this section is a special case of that considered in the next section, where we will allow both the sensitivity and the specificity of the diagnostic test to be unknown. This case was of interest since we could compare exact calculations to those from the SIR algorithm, and hence evaluate the performance of the latter algorithm. In the next section, we will consider the more general case in detail, including modelling the effect that features of the prior densities have on the posterior average coverage probabilities.

Figure 6.2: Average coverage probability plotted against n.

6.4 The case where both the sensitivity and the specificity of the diagnostic test are unknown

6.4.1 Introduction

In this section, we will consider the sample size problem for estimating the prevalence of a disease from a diagnostic test when neither the sensitivity nor the specificity of the test is exactly known. We will assume that the prevalence, the sensitivity and the specificity are a priori independent random variables, although similar techniques will carry through when this condition is not satisfied. Given prior distributions for each of these quantities, we will derive methods to find the sample size n such that the average coverage criterion (2.8), defined in chapter 2, will be satisfied. The procedure will be similar to that of the preceding section. First, we develop an exact algorithm that will be used to calculate the average coverage probability for a given sample size n when the prior distributions for the prevalence, the sensitivity and the specificity are all uniform. The exact algorithm applies only to uniform prior distributions, and is feasible in practice only for relatively small sample sizes. We therefore again consider the SIR algorithm as an alternative method of computation. We investigate the average coverage probability for several typical examples. In many situations, average coverage probabilities do not increase substantially when the sample size n increases beyond a certain value n0. Therefore, we develop a regression model to estimate n0, and use it to find the value of the average coverage probability as n approaches n0.

As in the preceding sections, we let θ denote the prevalence of the disease, X the number of individuals from the sample who test positive, x the observed value of X in the experiment, and S the sensitivity of the diagnostic test. In this section the specificity of the diagnostic test is also a random variable, which will be denoted by C. Let fθ(θ) denote the prior density function of θ, defined on the interval [a, b], where 0 ≤ a ≤ b ≤ 1. Similarly, let fS(s) denote the prior density function of S, defined on the interval [a1, b1], where 0 ≤ a1 ≤ b1 ≤ 1, and let fC(c) denote the prior density function of C, defined on the interval [a2, b2], where 0 ≤ a2 ≤ b2 ≤ 1.

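The likelihood of the data is

$$l(x\mid\theta,s,c)=\binom{n}{x}p^{x}(1-p)^{n-x},\qquad p=\theta s+(1-\theta)(1-c).$$

By the prior independence of θ, S and C, the joint density function of θ, S, C is

$$f(\theta,s,c)=f_\theta(\theta)\,f_S(s)\,f_C(c).$$

Define f(x, θ, s, c) by

$$f(x,\theta,s,c)=l(x\mid\theta,s,c)\,f_\theta(\theta)\,f_S(s)\,f_C(c).$$

The marginal probability function of X is

$$m(x)=\int_{a_1}^{b_1}\int_{a_2}^{b_2}\int_{a}^{b}f(x,\theta,s,c)\,d\theta\,dc\,ds,$$

so that, by the usual application of Bayes' theorem, the posterior joint density function of θ, S and C is

$$f(\theta,s,c\mid x)=\frac{f(x,\theta,s,c)}{m(x)}.$$

The marginal posterior mean of θ is found by

$$\hat\theta=\int_{a_1}^{b_1}\int_{a_2}^{b_2}\int_{a}^{b}\theta\,f(\theta,s,c\mid x)\,d\theta\,dc\,ds.$$

The marginal posterior coverage probability of the interval [θ̂ − d, θ̂ + d], given X = x and given d, is

$$\mathrm{cov}(x;d)=\int_{a_1}^{b_1}\int_{a_2}^{b_2}\int_{\hat\theta-d}^{\hat\theta+d}f(\theta,s,c\mid x)\,d\theta\,dc\,ds.$$

The average coverage probability of the intervals [θ̂ − d, θ̂ + d] over the marginal of X, given d, is

$$\overline{\mathrm{cov}}(d)=\sum_{x=0}^{n}\mathrm{cov}(x;d)\,m(x).$$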

To calculate the average coverage probability we need to calculate the marginal probability of X, the posterior mean θ̂ of θ, and the posterior probability of the interval [θ̂ − d, θ̂ + d] given d and x, for each possible value of x, that is, for x = 0, 1, ..., n.

6.4.2 Exact computations for the case of uniform prior distributions

In this section we will derive an exact method for sample size calculations. Suppose that a priori θ ~ U[a, b], where 0 ≤ a ≤ b ≤ 1, S ~ U[a1, b1], where 0 ≤ a1 ≤ b1 ≤ 1, and C ~ U[a2, b2], where 0 ≤ a2 ≤ b2 ≤ 1. The joint density function of θ, S, C is then given by

$$f(\theta,s,c)=\begin{cases}\dfrac{1}{(b-a)(b_1-a_1)(b_2-a_2)}, & \text{if } 0\le a\le\theta\le b\le 1,\ 0\le a_1\le s\le b_1\le 1,\ 0\le a_2\le c\le b_2\le 1,\\ 0, & \text{otherwise.}\end{cases}$$

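For ease of computation we consider the transformation of the parameter vector (θ, s, c) to the parameter vector (p, s, c). We recall that

p = θs + (1 − θ)(1 − c),

so that

$$\theta=\frac{p-(1-c)}{s+c-1}.$$

The Jacobian J of the transformation is by definition the inverse of the determinant

$$\det\frac{\partial(p,s,c)}{\partial(\theta,s,c)}=\det\begin{pmatrix}s+c-1 & \theta & \theta-1\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}=s+c-1.$$

Thus J = 1/(s + c − 1), and f(x, p, s, c) is given by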

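$$f(x,p,s,c)=\begin{cases}\dfrac{\binom{n}{x}p^{x}(1-p)^{n-x}}{(b-a)(b_1-a_1)(b_2-a_2)(s+c-1)}, & \text{if } p_1\le p\le p_2,\ 0\le a_1\le s\le b_1\le 1,\ 0\le a_2\le c\le b_2\le 1,\\ 0, & \text{otherwise,}\end{cases}$$

where p1 = a(s + c − 1) + (1 − c) and p2 = b(s + c − 1) + (1 − c). The marginal probability of X is then given by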

$$m(x)=\int_{a_1}^{b_1}\int_{a_2}^{b_2}\int_{p_1}^{p_2}\frac{\binom{n}{x}p^{x}(1-p)^{n-x}}{(b-a)(b_1-a_1)(b_2-a_2)(s+c-1)}\,dp\,dc\,ds.$$

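From this, the joint posterior density function of S, C and p is

$$f(p,s,c\mid x)=\begin{cases}\dfrac{f(x,p,s,c)}{m(x)}, & \text{if } p_1\le p\le p_2,\ a_1\le s\le b_1,\ a_2\le c\le b_2,\\ 0, & \text{otherwise.}\end{cases}$$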

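The posterior mean of θ is by definition

$$\hat\theta=\int_{a_1}^{b_1}\int_{a_2}^{b_2}\int_{p_1}^{p_2}\frac{p-(1-c)}{s+c-1}\,f(p,s,c\mid x)\,dp\,dc\,ds.$$

The coverage probability of the interval [θ̂ − d, θ̂ + d] is

$$\mathrm{cov}(x;d)=\int_{a_1}^{b_1}\int_{a_2}^{b_2}\int_{l_1}^{l_2}f(p,s,c\mid x)\,dp\,dc\,ds,$$

where

l1 = (θ̂ − d)(s + c − 1) + (1 − c) and l2 = (θ̂ + d)(s + c − 1) + (1 − c).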

To calculate m(x), θ̂ and cov(x; d), triple integrals over s, c and p need to be calculated. These integrals can be solved exactly by changing the order of integration so that integration is performed first with respect to c, followed by s and then p. In changing the order of integration, however, 36 different regions of integration arise. In Appendix D we illustrate how the different regions of integration can be obtained, and therefore how the required integrations may be computed to derive the final sample sizes using an exact approach. This approach is applicable only when uniform prior densities for θ, S and C are reasonable, and is practical only when n is small. Nevertheless, it is useful as a point of comparison for the approximate methods, which we introduce in the next section.

6.4.3 SIR computations

We will now describe a sampling importance resampling (SIR) algorithm that can be used to provide approximate coverage probabilities for cases when the exact approach described in the previous section cannot be used. This SIR algorithm is similar to the one used in section 6.3, with the additional complication that C is now also unknown. One proceeds as follows:

SIR algorithm:

1. Obtain a random sample {θi}, 1 ≤ i ≤ k, of size k from the prior distribution fθ(θ) of θ.

2. Obtain a random sample {si}, 1 ≤ i ≤ k, of size k from the prior distribution fS(s) of S.

3. Obtain a random sample {ci}, 1 ≤ i ≤ k, of size k from the prior distribution fC(c) of C.

4. The likelihood function of the data given θ, S and C is

$$l(x\mid\theta,s,c)=\binom{n}{x}p^{x}(1-p)^{n-x},\qquad p=\theta s+(1-\theta)(1-c),$$

so that, for each x, x = 0, 1, ..., n, we attach the weight wi = l(x | θi, si, ci) to each point (θi, si, ci, x), i = 1, ..., k.

5. Resample a sample {θi*}, 1 ≤ i ≤ r, of size r with replacement from the original sample {θi}, using probabilities proportional to the weights {wi}, 1 ≤ i ≤ k.

6. Approximate θ̂, the posterior mean of θ, by

$$\hat\theta\approx\frac{1}{r}\sum_{i=1}^{r}\theta_i^{*}.$$

7. Estimate cov(x; d), the posterior coverage probability of the interval [θ̂ − d, θ̂ + d] given d > 0, by calculating the proportion of resampled points θi* from step 5 that are inside the interval [θ̂ − d, θ̂ + d].

8. The marginal probability of x is estimated by

$$\hat m(x)=\frac{1}{k}\sum_{i=1}^{k}l(x\mid\theta_i,s_i,c_i).$$

9. An approximation of the average coverage probability of the intervals [θ̂ − d, θ̂ + d] over the marginal probability function of x is given by

$$\overline{\mathrm{cov}}(d)\approx\sum_{x=0}^{n}\mathrm{cov}(x;d)\,\hat m(x).$$

A full listing of an S-plus program that carries out this algorithm is given in Appendix
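The same scheme with C also unknown can be sketched in Python as follows. Here each prior is passed as a function that draws one value from a `random.Random` generator, so the sketch is not tied to uniform priors; a Beta(u, v) prior on an interval [lo, hi] is drawn by rescaling a standard Beta draw from [0, 1]. All function names are hypothetical, and the sampled values are assumed to keep p strictly inside (0, 1). As before, the weights are rescaled by their maximum on the log scale before resampling.

```python
import math
import random


def scaled_beta(rng, u, v, lo, hi):
    """Draw from a Beta(u, v) distribution rescaled to the interval [lo, hi]."""
    return lo + (hi - lo) * rng.betavariate(u, v)


def sir_average_coverage_unknown_spec(n, d, draw_theta, draw_sens, draw_spec,
                                      k=1000, r=1000, seed=1):
    """SIR approximation of the average coverage of [theta_hat - d, theta_hat + d]
    when prevalence, sensitivity and specificity all have prior distributions.
    Returns (average coverage, sum of the estimated m(x), which should be 1)."""
    rng = random.Random(seed)
    theta = [draw_theta(rng) for _ in range(k)]   # steps 1-3: prior samples
    sens = [draw_sens(rng) for _ in range(k)]
    spec = [draw_spec(rng) for _ in range(k)]
    p = [t * s + (1 - t) * (1 - cc) for t, s, cc in zip(theta, sens, spec)]
    avg, total_m = 0.0, 0.0
    for x in range(n + 1):
        log_comb = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
        logw = [log_comb + x * math.log(q) + (n - x) * math.log1p(-q) for q in p]
        top = max(logw)
        w = [math.exp(v - top) for v in logw]     # step 4, rescaled weights
        m_x = math.exp(top) * sum(w) / k          # step 8: average likelihood
        post = rng.choices(theta, weights=w, k=r) # step 5: weighted resampling
        theta_hat = sum(post) / r                 # step 6: posterior mean
        cov_x = sum(abs(t - theta_hat) <= d for t in post) / r   # step 7
        avg += m_x * cov_x                        # step 9
        total_m += m_x
    return avg, total_m


if __name__ == "__main__":
    # Illustrative Beta(3, 3) priors on [0, 0.1], [0.75, 0.85] and [0.85, 0.95].
    result = sir_average_coverage_unknown_spec(
        100, 0.02,
        lambda g: scaled_beta(g, 3, 3, 0.0, 0.1),
        lambda g: scaled_beta(g, 3, 3, 0.75, 0.85),
        lambda g: scaled_beta(g, 3, 3, 0.85, 0.95))
    print(result)
```

Passing samplers rather than distribution names keeps the resampling step unchanged whatever priors are used, which is the flexibility of SIR noted in the examples.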

6.4.4 Examples

Example 6.4.1 Comparison of the average coverage probability obtained from the SIR algorithm to the exact average coverage probability

We calculated the average coverage probability (equation 6.5) using both the exact and SIR algorithms for a variety of typical prior densities for the sensitivity, the specificity and the prevalence. Due to the limitations of the available hardware and software, the exact computations can only be done for small values of n. We considered examples with n = 50, n = 100 and n = 200. Throughout, we used prior and posterior samples of size k = r = 1000 in the SIR program. Of course, larger values of k and r will increase the accuracy at the expense of computing time. The results are given in table 6.6.

The results in table 6.6 show that if n is relatively small, the SIR algorithm performs well even with k = r = 1000. These results, together with the results of the preceding section and the fact that exact computation can only be done for small values of n, encouraged us to use the SIR algorithm as an alternative to the exact method for the computation of average coverage probabilities. The SIR algorithm is convenient in the sense that it is easy to program, its use is not restricted to small sample sizes, and it is flexible, in the sense that it can be used regardless of the form of the prior distributions. It also does not depend on the assumption of prior independence of S, C and θ. However, the SIR algorithm has the disadvantage that it relies on random samples for the computation of posterior probabilities, and therefore its results contain random errors. These errors seem to be small, as we discussed before, and do not constitute a real problem in this setting. This is especially true since we usually consider average coverage probabilities, so that errors in the individual terms of the average will tend to cancel out when the final average is computed. Therefore, in the remaining part of this section, the SIR algorithm will be used.

Example 6.4.2 Comparison with the case of fixed sensitivity and/or fixed specificity

In section 6.2, we considered the case where both the sensitivity and the specificity are fixed, and in section 6.3 we examined the case where only the specificity is known. We now look at several examples to examine the effect that less than perfect knowledge about these quantities has on the average coverage probability.

Table 6.6: Comparison of the SIR average coverage probability with the exact average coverage probability. The prior densities are S ~ U[a1, b1], C ~ U[0.85, 0.95] and θ ~ U[0, 0.1]. The parameter d is half the length of the posterior credible interval. Cov.sir is the average coverage probability given by the SIR program. Cov.exact is the average coverage probability given by the exact program.

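| row | cov |
|---|---|
| 1 | 0.950 |
| 2 | 0.589 |
| 3 | 0.595 |
| 4 | 0.947 |
| 5 | 0.618 |
| 6 | 0.950 |
| 7 | 0.803 |
| 8 | 0.783 |
| 9 | 0.950 |
| 10 | 0.816 |
| 11 | 0.950 |
| 12 | 0.765 |
| 13 | 0.950 |
| 14 | 0.557 |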

Table 6.7: Variation of the average coverage probability with increasing uncertainty. The prior distribution of θ is U[0, 0.1], d is half the length of the posterior credible interval, sens is the prior distribution of the sensitivity, spec is the prior distribution of the specificity and cov is the average coverage probability.

Table 6.7 shows that the average coverage probability decreases when the sensitivity and the specificity are not exactly known, compared to when they are exactly known. It is especially interesting to note that if the prior information suggests that the sensitivity and the specificity of the diagnostic test must be larger than a given fixed value, but the exact values are not known, then the coverage probability can decrease substantially. For example, comparing row 1 to row 5 of the table, the probability decreases from 0.95 to 0.62 for the same sample size of 1473. Therefore, the uncertainty greatly decreases the probabilities, even though the uncertainty is only in the direction of higher sensitivity and specificity. Similarly, in rows 9 and 10, we see that the average coverage probability decreases from 0.95 to 0.816 when the sensitivity and the specificity change from being fixed at 0.9 to having 0.9 as a lower bound. For small prevalences (prior support for the prevalence here was restricted to the interval [0, 0.1]), the uncertainty around the specificity has more effect on the accuracy of estimation of the prevalence than the uncertainty around the sensitivity.

This is seen in rows 1 to 4 of the table: if we change the sensitivity from being fixed at 0.9 to U[0.85, 0.95], the average coverage probability changes only from 0.95 to 0.947. It decreased to 0.589, however, when both the sensitivity and the specificity were changed to U[0.85, 0.95]. Furthermore, the probability decreased to 0.595 when the sensitivity changed to U[0.85, 0.95] and the specificity to U[0.9, 1]. The reason for this effect is found in the equation p = θs + (1 − θ)(1 − c). Since s is multiplied by θ, when θ is small the effect of s is small, and therefore the effect of c is larger.
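The arithmetic behind this explanation can be checked directly. Since ∂p/∂s = θ and ∂p/∂c = −(1 − θ), the same change of 0.1 moves p by θ × 0.1 through S but by (1 − θ) × 0.1 through C. The Python lines below use a hypothetical prevalence θ = 0.05, inside the prior range [0, 0.1], and lower either the sensitivity or the specificity from 0.9 to 0.8.

```python
def apparent_prevalence(theta, sens, spec):
    """Probability of a positive test, p = theta*s + (1 - theta)*(1 - c)."""
    return theta * sens + (1 - theta) * (1 - spec)


theta = 0.05                                   # hypothetical small prevalence
base = apparent_prevalence(theta, 0.9, 0.9)
# Lowering the sensitivity by 0.1 changes p by theta * 0.1.
shift_sens = abs(apparent_prevalence(theta, 0.8, 0.9) - base)
# Lowering the specificity by 0.1 changes p by (1 - theta) * 0.1.
shift_spec = abs(apparent_prevalence(theta, 0.9, 0.8) - base)
print(f"p = {base:.3f}, shift from S: {shift_sens:.3f}, shift from C: {shift_spec:.3f}")
```

Here p = 0.14, and the shift through C (0.095) is nineteen times the shift through S (0.005), so uncertainty about c dominates the information the data carry about θ.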

Example 6.4.3 Variation of the average coverage probability with respect to n

In this section we will investigate the average coverage probability for a variety of situations to see the effect of changes to the sample size n. We examined a variety of situations typical of those that may arise in practice. In particular, we considered the cases where the prior range of θ is known to be [0, 0.1], and the prior ranges of the sensitivity and specificity are each known to be one of the 4 intervals [0.65, 0.75], [0.75, 0.85], [0.85, 0.95] and [0.9, 1]. We considered both uniform and Beta(3,3) prior densities on these ranges. Table 6.9 provides average coverage probabilities for prior densities of the form S ~ U[a1, b1], C ~ U[a2, b2] and θ ~ U[0, 0.1], while table 6.10 gives similar results for S ~ Beta(3,3), C ~ Beta(3,3) and θ ~ Beta(3,3). Figure 6.3 displays plots of coverage probabilities versus the sample size n. The values of the parameters considered in each plot are given in table 6.8.

| Plot # | d | dist. | range.S | range.C | range.n |
|---|---|---|---|---|---|
| 1 | 0.02 | Uniform | [0.75, 0.85] | [0.85, 0.95] | [100, 5000] |
| 2 | 0.02 | Uniform | [0.85, 0.95] | [0.85, 0.95] | [100, 5000] |
| 5 | 0.03 | Uniform | [0.75, 0.85] | [0.85, 0.95] | [100, 5000] |

Table 6.8: The parameters used in the plots of figure 6.3. Here range.S is the range of S, range.C is the range of C, dist. is the prior distribution of θ, S and C, and range.n is the range of n. The range of θ is [0, 0.1].

Note 6.4.1 The standard beta distribution has support on the interval [0, 1]. It is easy to derive a beta density having a different support. For example, a random variable Y ~ Beta(u, v) on the interval [a, b] would have the density function

$$f_Y(y)=\frac{\Gamma(u+v)}{\Gamma(u)\,\Gamma(v)}\,\frac{1}{b-a}\left(\frac{y-a}{b-a}\right)^{u-1}\left(\frac{b-y}{b-a}\right)^{v-1},\qquad a\le y\le b.$$

Table 6.9: Average coverage probability when S ~ U[a1, b1], C ~ U[a2, b2] and θ ~ U[0, 0.1]. Here cov.sir indicates the average coverage probability computed using the SIR algorithm. The parameter d is half the length of the posterior credible interval.

Table 6.10: Average coverage probability when S ~ Beta(3,3) on the interval [a1, b1], C ~ Beta(3,3) on the interval [a2, b2] and θ ~ Beta(3,3) on the interval [0, 0.1]. Here cov.sir indicates the average coverage probability computed using the SIR algorithm. The parameter d is half the length of the posterior credible interval.

Figure 6.3: Average coverage probability versus n.

The plots in figure 6.3 and the information in tables 6.9 and 6.10 provide evidence that the average coverage probability approaches an upper limit. For example, in the first four rows of table 6.9 we do not see any improvement in the average coverage probability beyond n = 500, even though the sample size is quadrupled. Similar effects occur in rows 5 to 8, rows 17 to 20 and rows 21 to 24 of the same table. While increasing the sample size always improves the precision in estimating the proportion p of positive tests, past a certain point it does not provide better estimates of the sensitivity and the specificity of the diagnostic test. Since the estimate of θ depends not only on the estimate of p but also on the estimates of the sensitivity and the specificity, for which improvement is limited, the improvement in estimating θ is also limited. The approximate sample size beyond which further sampling provides little additional precision in estimating θ is a complex function of the prior information available on θ, S and C. In the next section, we will attempt to use the empirical evidence in this section to build a model that can help to explain this phenomenon.

6.4.5 Logistic regression models

We would like to construct a regression model to analyse the variation in average coverage probabilities caused by changes to the variables n, d, and the prior distributions of θ, S and C. Since the average coverage probabilities must be between 0 and 1, a generalized linear model with logit link (see McCullagh and Nelder [1989]) will be appropriate. In a generalized linear model, a function of the mean of the numeric response variable, called the link, is written as a linear combination of the predictor variables.

Let μ denote the mean of the response variable. The logit link, denoted by η, is given by

$$\eta=\log\frac{\mu}{1-\mu}.$$

This is equivalent to

$$\mu=\frac{e^{\eta}}{1+e^{\eta}}.$$

The prior distributions of θ, S and C can be summarized by their respective means and by their respective prior coverage probabilities of the intervals of length 2d centered at those means. The following notation is used. Let:

• msens denote the prior mean of the sensitivity,

• mspec denote the prior mean of the specificity,

• mθ denote the prior mean of θ,

• cov.θ.start denote the coverage probability of the interval [mθ − d, mθ + d] under the prior distribution of θ,

• cov.sens.start denote the coverage probability of the interval [msens − d, msens + d] under the prior distribution of S,

• cov.spec.start denote the coverage probability of the interval [mspec − d, mspec + d] under the prior distribution of C, and

Since w was an important factor in sample size determination when the sensitivity and specificity were exactly known, we chose to include it in the model. The average coverage probability should satisfy the constraints

We therefore also included the variable 1/(d(m − d)) in the model, where m = max(mθ − a, b − mθ). In summary, we model log(μ/(1 − μ)) in terms of the variables n, d, 1/(d(m − d)), w, msens, mspec, mθ, cov.θ.start, cov.sens.start and cov.spec.start.

Since the average coverage probability, as a function of n, seems to have an asymptote with value smaller than 1, we looked for a model of the form

$$\eta=u+\frac{v}{n},$$

where u and v are constant with respect to n, and so depend only on d, w, msens, mspec, mθ, cov.θ.start, cov.sens.start and cov.spec.start. This model seems appropriate if v < 0, since then η will be increasing with n, and as n approaches infinity, η approaches u. Also

$$\frac{\mu}{1-\mu}=\exp(u)\exp(v/n),$$

so that

$$\mu=\frac{\exp(u)\exp(v/n)}{1+\exp(u)\exp(v/n)}.$$

Therefore, as n approaches infinity, μ approaches exp(u)/(1 + exp(u)).
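To see how this model produces an upper limit below 1, the following Python sketch evaluates μ(n) and its limit, and solves μ(n) ≥ target for n when v < 0. The values u = 1.5 and v = −400 used in the example are purely illustrative, not fitted values, and the helper names are hypothetical.

```python
import math


def logit(q):
    return math.log(q / (1 - q))


def coverage_model(n, u, v):
    """mu(n) = exp(u + v/n) / (1 + exp(u + v/n)), i.e. the logit model eta = u + v/n."""
    return 1 / (1 + math.exp(-(u + v / n)))


def coverage_limit(u):
    """Limit of mu(n) as n -> infinity: exp(u) / (1 + exp(u))."""
    return 1 / (1 + math.exp(-u))


def smallest_n(u, v, target):
    """Smallest n with mu(n) >= target, or None if target is not below the limit.
    Assumes v < 0, so that mu(n) increases with n."""
    gap = logit(target) - u       # need v / n >= gap, which requires gap < 0
    if gap >= 0:
        return None
    return math.ceil(v / gap)


if __name__ == "__main__":
    u, v = 1.5, -400.0            # illustrative values only
    print(coverage_limit(u), smallest_n(u, v, 0.8), smallest_n(u, v, 0.95))
```

A target at or above exp(u)/(1 + exp(u)) can never be reached, whatever n is, which is the behaviour seen in the plots of average coverage against n.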

Fitting the models

The average coverage probability is equal to cov.θ.start for n = 0, since the posterior is then the prior and θ̂ = mθ. We therefore looked for a model that satisfies this constraint and fits the description just discussed. Hence it seemed appropriate to model

$$\log\frac{\mathrm{cov}-\mathrm{cov.}\theta\mathrm{.start}}{1-\mathrm{cov}+\mathrm{cov.}\theta\mathrm{.start}}$$

with respect to the following variables (and possibly their interactions): 1/n, d, 1/(d(m − d)), w, msens, mspec, mθ, cov.θ.start, cov.sens.start and cov.spec.start. We first must compute these variables for the situations we studied (tables 6.13 to 6.22). For the data we considered, m = 0.05 and mθ = 0.05. We have

$$\text{msens}=\begin{cases}0.7, & \text{if the range of the sensitivity is } [0.65,0.75],\\ 0.8, & \text{if the range of the sensitivity is } [0.75,0.85],\\ 0.9, & \text{if the range of the sensitivity is } [0.85,0.95],\\ 0.95, & \text{if the range of the sensitivity is } [0.9,1],\end{cases}$$

$$\text{mspec}=\begin{cases}0.7, & \text{if the range of the specificity is } [0.65,0.75],\\ 0.8, & \text{if the range of the specificity is } [0.75,0.85],\\ 0.9, & \text{if the range of the specificity is } [0.85,0.95],\\ 0.95, & \text{if the range of the specificity is } [0.9,1].\end{cases}$$

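The Beta(3,3) density on the interval [a, b] is given by

$$f(t) = \begin{cases} \dfrac{\Gamma(6)}{\Gamma(3)\Gamma(3)}\,\dfrac{(t-a)^2(b-t)^2}{(b-a)^5}, & \text{if } t \in [a, b],\\ 0, & \text{otherwise,}\end{cases}$$

which can be simplified to

$$f(t) = \begin{cases} \dfrac{30}{b-a}\left(\dfrac{t-a}{b-a}\right)^2\left(\dfrac{b-t}{b-a}\right)^2, & \text{if } t \in [a, b],\\ 0, & \text{otherwise.}\end{cases}$$

Thus the distribution function at a point y, where a ≤ y ≤ b, is

$$F(y) = \int_a^y \frac{30}{b-a}\left(\frac{t-a}{b-a}\right)^2\left(\frac{b-t}{b-a}\right)^2 dt.$$

Setting $u = (t-a)/(b-a)$, the integral becomes

$$F(y) = \int_0^{(y-a)/(b-a)} 30\,u^2(1-u)^2\,du,$$

so that

$$F(y) = \mathrm{pbeta}\left(3, 3, \frac{y-a}{b-a}\right),$$

where $\mathrm{pbeta}(\alpha, \beta, (y-a)/(b-a))$ denotes the distribution function of a Beta(α, β) defined on the interval [a, b], evaluated at the point y. Furthermore,

$$\mathrm{cov.}\theta\mathrm{.start} = \begin{cases} \dfrac{2d}{b-a}, & \text{if } \theta \text{ has a uniform prior density},\\ \mathrm{pbeta}\left(3, 3, \dfrac{m_\theta + d - a}{b-a}\right) - \mathrm{pbeta}\left(3, 3, \dfrac{m_\theta - d - a}{b-a}\right), & \text{if } \theta \text{ has a Beta(3,3) prior density.}\end{cases}$$

It is easy to see that in all the cases we studied (tables 6.13 through 6.22) we have cov.θ.start = cov.sens.start = cov.spec.start.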
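These formulas are easy to evaluate without special functions, because for integer parameters the Beta distribution function is a finite binomial sum, $I_x(\alpha, \beta) = \sum_{j=\alpha}^{\alpha+\beta-1} \binom{\alpha+\beta-1}{j} x^j (1-x)^{\alpha+\beta-1-j}$. The Python sketch below (helper names are ours; pbeta_int plays the role of the pbeta function in the text) computes cov.θ.start for a uniform or Beta(3,3) prior on [a, b], assuming, as in all the cases studied here, that the prior mean is the midpoint of [a, b]. For θ on [0, 0.1] it reproduces the values 0.2, 0.4 and 0.6 (uniform) and 0.674 and 0.884 (Beta(3,3) with d = 0.02 and 0.03) used in tables 6.13 through 6.22.

```python
from math import comb


def pbeta_int(x: float, alpha: int, beta: int) -> float:
    """CDF of a standard Beta(alpha, beta) at x, for integer alpha, beta >= 1.

    Uses I_x(alpha, beta) = sum_{j=alpha}^{m} C(m, j) x^j (1 - x)^(m - j),
    with m = alpha + beta - 1.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    m = alpha + beta - 1
    return sum(comb(m, j) * x**j * (1.0 - x) ** (m - j) for j in range(alpha, m + 1))


def cov_start(d: float, a: float, b: float, prior: str) -> float:
    """Prior probability of [mid - d, mid + d], mid = (a + b) / 2, for a prior on [a, b].

    prior is "uniform" or "beta33" (Beta(3,3) rescaled to [a, b]); for both
    symmetric priors the prior mean is the midpoint of [a, b].
    """
    mid = (a + b) / 2.0
    lo = (mid - d - a) / (b - a)  # interval end points rescaled to [0, 1]
    hi = (mid + d - a) / (b - a)
    if prior == "uniform":
        return min(hi, 1.0) - max(lo, 0.0)
    if prior == "beta33":
        return pbeta_int(hi, 3, 3) - pbeta_int(lo, 3, 3)
    raise ValueError(f"unknown prior: {prior}")


for d in (0.01, 0.02, 0.03):
    print(d, round(cov_start(d, 0.0, 0.1, "uniform"), 3),
          round(cov_start(d, 0.0, 0.1, "beta33"), 3))
```

Because every sensitivity and specificity range considered also has length 0.1, the same function gives cov.sens.start and cov.spec.start, which is why the three starting coverages coincide.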

The error in the model arises largely from using the approximation to the true

coverage probability provided by the SIR algorithm. We fitted logistic regression

models with both binomial and constant variance, to see which would provide a

better fit. We now describe briefly each of these models.

The binomial model: This model uses the logit link function with binomial variance, given by μ(1 − μ). We first modelled the dependent variable cov − cov.θ.start in terms of 1/n, d, cov.θ.start, cov.θ.start², w and 1/(d(0.05 − d)). We believe that in the above list of variables w reflects the effects of both msens and mspec, and since cov.θ.start = cov.sens.start = cov.spec.start, we believe that cov.θ.start, possibly together with the quadratic term cov.θ.start², represents the effects of both cov.sens.start and cov.spec.start, while 1/(d(0.05 − d)) models the constraints (6.10). The S-plus program used to fit this model and the model printouts are given in Appendix F.

Figure 6.4 displays a plot of the residuals from the binomial model. In this plot we see that the mean of the residuals is approximately 0, and the absolute value of the residuals is not larger than 0.04 for most response values. Although this indicates a reasonable fit, the residual variance seems to be roughly constant throughout the range of the sample size n, so the binomial variance assumption seems unreasonable. This is not surprising, since most of the variance arises from the SIR approximation to the average coverage probability, which we do not expect to differ greatly across values of n. Therefore, we also fitted a quasi likelihood model with constant variance.

The quasi likelihood model: We considered a quasi likelihood model with logit link and constant variance, using the same independent and dependent variables that were included in the previous model. This model is referred to as a quasi likelihood model, since it does not arise from the more commonly used exponential family of distributions and does not make full use of the likelihood function. The S-plus program written to fit this model and the model printouts are given in Appendix F. Figure 6.4 displays a plot of the residuals from this model. In this plot we see that the mean of the residuals is approximately 0, and the absolute value of the residuals is usually not larger than 0.04, so that the fit seems to be similar to that from the binomial model. Nevertheless, the variance assumption of this model is almost certainly closer to being correct. For more details on quasi likelihood models see McCullagh and Nelder [1989].
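To make the difference between the binomial and the constant-variance fits concrete, here is a minimal Python sketch (not the S-plus code of Appendix F) that fits the single-predictor model log(μ/(1 − μ)) = u + v/n by iteratively reweighted least squares. The only difference between the two fits is the variance function V(μ), which enters the working weights (dμ/dη)²/V(μ). The function name fit_logit_inv_n is ours, and the models in the text also contain d, w, cov.θ.start and the other predictors, which this sketch omits.

```python
import math


def expit(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def fit_logit_inv_n(ns, ys, variance="constant", max_iter=100, tol=1e-10):
    """Fit log(mu / (1 - mu)) = u + v / n by IRLS (Fisher scoring).

    variance="binomial": V(mu) = mu (1 - mu)   (the binomial model)
    variance="constant": V(mu) = 1             (the constant-variance quasi model)
    Responses must satisfy 0 < y < 1. Returns (u, v).
    """
    xs = [1.0 / n for n in ns]
    # Starting values: ordinary least squares of logit(y) on 1/n.
    logits = [math.log(y / (1.0 - y)) for y in ys]
    xbar, lbar = sum(xs) / len(xs), sum(logits) / len(logits)
    v = (sum((x - xbar) * (g - lbar) for x, g in zip(xs, logits))
         / sum((x - xbar) ** 2 for x in xs))
    u = lbar - v * xbar
    for _ in range(max_iter):
        s00 = s01 = s11 = r0 = r1 = 0.0
        for x, y in zip(xs, ys):
            eta = u + v * x
            mu = expit(eta)
            dmu = mu * (1.0 - mu)                  # d mu / d eta for the logit link
            var = dmu if variance == "binomial" else 1.0
            w = dmu * dmu / var                    # working weight
            z = eta + (y - mu) / dmu               # working response
            s00 += w
            s01 += w * x
            s11 += w * x * x
            r0 += w * z
            r1 += w * x * z
        det = s00 * s11 - s01 * s01                # solve the 2x2 weighted normal equations
        u_new = (s11 * r0 - s01 * r1) / det
        v_new = (s00 * r1 - s01 * r0) / det
        done = abs(u_new - u) + abs(v_new - v) < tol
        u, v = u_new, v_new
        if done:
            break
    return u, v


ns = list(range(100, 2001, 100))
ys = [expit(-2.219 - 47.153 / n) for n in ns]     # noise-free illustrative data
print(fit_logit_inv_n(ns, ys, "constant"))
print(fit_logit_inv_n(ns, ys, "binomial"))
```

On noise-free data generated from the model both variance choices recover u and v; with noisy data they differ only in how residuals at different response levels are weighted.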

The quasi.full model: In order to see if adding interaction terms for some of the variables would improve the fit, we tried a generalized linear model with a logit link and a constant variance. The independent variables were w, 1/(d(0.05 − d)), d, cov.θ.start, cov.θ.start², msens, msens², mspec, mspec² and 1/n, and the second-order interactions of the variables d, cov.θ.start, cov.θ.start², msens, msens², mspec and mspec² were added. We selected the model that had the smallest AIC (see Chambers and Hastie [1992]). We called this model the quasi.full model. The S-plus program used to select this model and the model printouts are provided in Appendix F. Figure 6.4 displays a plot of the residuals of the quasi.full model. In this plot we see that the mean of the residuals is again approximately 0, and the range of the residuals seems to have decreased slightly, but not substantially, from the preceding models. The variance again appears to be roughly constant throughout the range of n. The fit seems to be similar to that given by the quasi model, but the quasi model is simpler in the sense that it has fewer variables, and is therefore easier to interpret and to use in practice.

In order to investigate the effect of the sample size n on the average coverage probability, we fitted a nested model that models the data with respect to n, with all other variables being held constant. Here we consider each possible combination of the variables separately, and examine how the average coverage probabilities change with respect to n.

The nested quasi model: This is a quasi likelihood model with a logit link and a constant variance, with respect to 1/n, in each of the levels obtained by considering all possible combinations of:

• the levels taken by d, where 1 represents the value 0.01, 2 represents the value 0.02 and 3 represents the value 0.03,

• the levels taken by the prior density of θ, where 1 represents the Uniform density and 2 the Beta(3,3) density,

• the levels taken by the range of the sensitivity, where the intervals [0.65,0.75], [0.75,0.85], [0.85,0.95] and [0.9,1] are represented by 1, 2, 3 and 4, respectively, and

• the levels taken by the range of the specificity, where the intervals [0.65,0.75], [0.75,0.85], [0.85,0.95] and [0.9,1] are represented by 1, 2, 3 and 4, respectively.

For more details on nested models, see Chambers and Hastie [1992].

The S-plus program we used to fit the nested model is given in Appendix F. The plots of the average coverage probability for the nested model within the different combinations of levels are given in Figure 6.5. Here, we see clearly how the average coverage probabilities seem to approach an upper limit at a relatively small value of n. This upper limit varies from one level to another, depending on the initial values of the predictors.

Figure 6.4: From top to bottom, residuals from the binomial, the quasi likelihood and the quasi.full models, respectively.

Figure 6.5: Predicted average coverage probability versus n, given by the nested quasi model in each of the levels provided by all possible combinations of the values of d, the prior densities, and the ranges of the sensitivity and the specificity. Each point represents a predicted coverage probability for a point in the original data set.

The residual deviance in the binomial model, 3.308, is small in absolute terms, but is larger by approximately a factor of 12 than the deviance from the quasi likelihood model, 0.276. The residual deviances from the quasi likelihood and quasi.full models are very close, the latter value being 0.221. This is due to the fact that the residual deviance is given by the expression

$$D = 2\sum_i \int_{\mu_i}^{y_i} \frac{y_i - t}{V(t)}\,dt,$$

where V(μ) is the residual variance function and y = cov − cov.θ.start. In our data, y ≤ 0.3, as seen in the plots of the residuals across the range of the observations. In the binomial model, V(μ) = μ(1 − μ), which increases as μ increases on the interval [0, 0.5], while in the quasi model V(μ) is constant. Therefore, the deviance of the smaller values of y from the binomial model will be larger than the corresponding deviance from the quasi likelihood model, since in the binomial model we are dividing by a smaller quantity.
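To see why the binomial deviance is inflated at small responses, the Python sketch below evaluates a single term 2∫_μ^y (y − t)/V(t) dt of the deviance by composite Simpson's rule, for the binomial variance μ(1 − μ) and for a constant variance equal to 1. The unit dispersion in the constant-variance case is our assumption for illustration, so only the relative pattern, not the factor of 12 reported above, should be read from it. For the binomial variance the integral agrees with the familiar closed form 2[y log(y/μ) + (1 − y) log((1 − y)/(1 − μ))].

```python
import math


def quasi_unit_deviance(y, mu, V, steps=200):
    """2 * integral from mu to y of (y - t) / V(t) dt, by composite Simpson's rule."""
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    if y == mu:
        return 0.0
    h = (y - mu) / steps

    def f(t):
        return (y - t) / V(t)

    total = f(mu) + f(y)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(mu + i * h)
    return 2.0 * total * h / 3.0


def binomial_variance(t):
    return t * (1.0 - t)


def constant_variance(t):
    return 1.0


# The same residual y - mu = -0.01 at a small and a moderate response level.
for y, mu in [(0.05, 0.06), (0.25, 0.26)]:
    print(y, mu,
          quasi_unit_deviance(y, mu, binomial_variance),
          quasi_unit_deviance(y, mu, constant_variance))
```

With constant variance both terms equal (y − μ)² = 0.0001, whereas the binomial terms are larger and grow as the response level decreases, because V(μ) is smaller there.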

Predicted values

The plots of the residual errors show that all three models provide a good fit to the data. In order to assess the validity of our models for large values of n, we calculated the predictions of the average coverage probability for n = 5000 and n = 7000, as given by the binomial, quasi, quasi.full and nested quasi models. We compared these values to the corresponding values provided by the SIR algorithm. Throughout, we used situations that were not included in those from which the models were built. The results are shown in tables 6.11 and 6.12.

fit.full   fit.nested   fit.bin   fit.quasi
0.506      0.509        0.503     0.501
0.562      0.562        0.557     0.556
0.534      0.336        0.540     0.539
0.570      0.578        0.573     0.572
0.638      0.636        0.627     0.627
0.741      0.739        0.745     0.743

Table 6.11: Comparing the average coverage probability with the predicted coverage probability from the models when S ~ U[a1, b1], C ~ U[a2, b2], and θ ~ U[0, 0.1]. The parameter d is half the length of the posterior credible interval, cov.sir is the average coverage probability computed using a SIR algorithm, and fit.full, fit.nested, fit.bin and fit.quasi are the values of the average coverage probability given by the quasi.full model, nested quasi model, binomial model and quasi model, respectively.


Table 6.12: Comparing the average coverage probability with the predicted coverage probability when S ~ Beta(3,3) on the interval [a1, b1], C ~ Beta(3,3) on the interval [a2, b2] and θ ~ Beta(3,3) on the interval [0, 0.1]. Half the length of the posterior credible interval is d = 0.02, cov.sir is the average coverage probability computed using a SIR algorithm, and fit.full, fit.nested, fit.bin and fit.quasi are the values of the average coverage probability given by the quasi.full model, nested quasi model, binomial model and quasi model, respectively.

n      a1     b1     a2     b2     cov.sir   fit.full   fit.nested   fit.bin   fit.quasi
5000   0.65   0.75   0.65   0.75   0.728     0.715      0.713        0.714     0.713
5000   0.65   0.75   0.75   0.85   0.746     0.736      0.732        0.739     0.739
5000   0.75   0.85   0.75   0.85   0.745     0.754      0.755        0.764     0.765
5000   0.75   0.85   0.85   0.95   0.817     0.783      0.780        0.787     0.788
5000   0.85   0.95   0.85   0.95   0.810     0.805      0.804        0.807     0.809

We see in tables 6.11 and 6.12 that the values predicted by the four models are very close to each other and also close to the corresponding values obtained from the SIR algorithm.

Sample size

The average coverage probability approaches an upper limit as the sample size increases. It would be of considerable practical interest to compute this upper limit, and to determine the sample size n0 after which the average coverage probability will not improve by more than ε, even if the sample size were to increase to infinity. Since the models we used were of the form

$$\log\left(\frac{p}{1-p}\right) = u + \frac{v}{n}, \qquad p = \mathrm{cov} - \mathrm{cov.}\theta\mathrm{.start},$$

p approaches exp(u)/(1 + exp(u)) as n approaches infinity. Therefore cov − cov.θ.start approaches exp(u)/(1 + exp(u)) as n approaches infinity. The upper limit of the average coverage probability is then given by

$$\mathrm{cov.}\theta\mathrm{.start} + \frac{\exp(u)}{1 + \exp(u)}.$$

We therefore seek the n0 such that

$$\frac{\exp(u)}{1 + \exp(u)} - \frac{\exp(u)\exp(v/n)}{1 + \exp(u)\exp(v/n)} \le \epsilon \quad \text{for all } n \ge n_0. \tag{6.13}$$

In all three models, the expression

$$\frac{\exp(u)\exp(v/n)}{1 + \exp(u)\exp(v/n)}$$

is increasing in n, since its derivative is positive throughout its range. In fact the derivative is equal to

$$\frac{\exp(u)\exp(v/n)\,(-v/n^2)}{\left(1 + \exp(u)\exp(v/n)\right)^2},$$

which is positive, since v < 0 in the three models. Hence inequality (6.13) holds for all n ≥ n0 as soon as it holds for n = n0. We will now derive an expression that approximates n0, applicable to all three models. Assuming exp(u) > ε(1 + exp(u)), inequality (6.13) is equivalent to

$$\frac{v}{n} \ge \log\left(\frac{\exp(u) - \epsilon(1 + \exp(u))}{\left(1 + \epsilon(1 + \exp(u))\right)\exp(u)}\right),$$

and since both v and the logarithm are negative, this gives

$$n \ge \frac{v}{\log\left(\dfrac{\exp(u) - \epsilon(1 + \exp(u))}{\left(1 + \epsilon(1 + \exp(u))\right)\exp(u)}\right)}.$$
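The derivation above translates directly into a few lines of Python. The sketch below (function names are ours) computes the limit exp(u)/(1 + exp(u)) of cov − cov.θ.start and the smallest integer n satisfying (6.13), treating separately the degenerate case exp(u) ≤ ε(1 + exp(u)), where the limit itself is within ε of 0 and every n qualifies. With the coefficients u = −2.219 and v = −47.153 reported for the refugee data in Example 7.2.1 and ε = 0.02, it gives a limit of about 0.098 for cov − cov.θ.start and n0 = 189.

```python
import math


def upper_limit(u: float) -> float:
    """Limit of p(n) = exp(u + v/n) / (1 + exp(u + v/n)) as n -> infinity."""
    return math.exp(u) / (1.0 + math.exp(u))


def sample_size_n0(u: float, v: float, eps: float) -> int:
    """Smallest integer n with upper_limit(u) - p(n) <= eps, for logit p = u + v/n and v < 0."""
    if v >= 0:
        raise ValueError("the derivation assumes v < 0, so that p(n) increases with n")
    q = math.exp(u)
    numerator = q - eps * (1.0 + q)
    if numerator <= 0.0:
        # The limit itself is at most eps, so every n >= 1 already satisfies (6.13).
        return 1
    ratio = numerator / ((1.0 + eps * (1.0 + q)) * q)  # lies strictly between 0 and 1
    return math.ceil(v / math.log(ratio))


# Coefficients reported for Example 7.2.1 (refugee data).
u, v = -2.219, -47.153
print(round(upper_limit(u), 3), sample_size_n0(u, v, 0.02))
```

Because p(n) is increasing when v < 0, checking (6.13) at the returned n0 and at n0 − 1 is enough to confirm that n0 is the smallest admissible sample size.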

Tables 6.13 through 6.22 contain a collection of examples of coverage probability limits, together with the sample sizes needed for ε = 0.01 and ε = 0.02, as given by the three models.

[a1, b1]       [a2, b2]       c.bin   c.quasi
[0.65, 0.75]   [0.65, 0.75]   0.236   0.235

Table 6.13: The sample size, and the limit of the average coverage probability given by the binomial and quasi models, when the prior distribution of θ is U[0, 0.1], the prior distribution of S is U[a1, b1], and the prior distribution of C is U[a2, b2]. The parameters are ε = 0.01, d = 0.01, and cov.θ.start = 0.2. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models; ss.bin, ss.quasi and ss.full are the sample sizes given by the same three models, respectively, after which the average coverage probability will not improve by more than ε = 0.01.


[a1, b1]   [a2, b2]   c.bin   c.quasi   c.full   ss.bin   ss.quasi

Table 6.14: The sample size, and the limit of the average coverage probability given by the binomial and quasi models, when the prior distribution of θ is U[0, 0.1], the prior distribution of S is U[a1, b1], and the prior distribution of C is U[a2, b2]. The parameters are ε = 0.01, d = 0.02, and cov.θ.start = 0.4. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models; ss.bin, ss.quasi and ss.full are the sample sizes given by the same three models, respectively, after which the average coverage probability will not improve by more than ε = 0.01.


by the binomial and quasi models, when the prior distribution of θ is Beta(3,3) on the interval [0, 0.1], the prior distribution of S is Beta(3,3) on the interval [a1, b1], and the prior distribution of C is Beta(3,3) on the interval [a2, b2]. The parameters are ε = 0.01, d = 0.02, and cov.θ.start = 0.674. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models; ss.bin, ss.quasi and ss.full are the sample sizes given by the same three models, respectively, after which the average coverage probability will not improve by more than ε = 0.01.

by the binomial and quasi models, when the prior distribution of θ is Beta(3,3) on the interval [0, 0.1], the prior distribution of S is Beta(3,3) on the interval [a1, b1], and the prior distribution of C is Beta(3,3) on the interval [a2, b2]. The parameters are ε = 0.01, d = 0.03, and cov.θ.start = 0.884. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models; ss.bin, ss.quasi and ss.full are the sample sizes given by the same three models, respectively, after which the average coverage probability will not improve by more than ε = 0.01.

[a1, b1]       [a2, b2]       c.bin   c.quasi   c.full   ss.bin   ss.quasi   ss.full
[0.65, 0.75]   [0.65, 0.75]   0.907   0.906     0.905    118      133        128
[0.65, 0.75]   [0.75, 0.85]   0.921   0.921     0.919    275      252        237
[0.65, 0.75]   [0.85, 0.95]   0.936   0.936     0.938    397      366        378
[0.65, 0.75]   [0.9, 1]       0.943   0.943     0.949    453      419        463


by the binomial and quasi models, when the prior distribution of θ is U[0, 0.1], the prior distribution of S is U[a1, b1], and the prior distribution of C is U[a2, b2]. The parameters are ε = 0.02, d = 0.01, and cov.θ.start = 0.2. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models; ss.bin, ss.quasi and ss.full are the sample sizes given by the same three models, respectively, after which the average coverage probability will not improve by more than ε = 0.02.

Table 6.20: The sample size, and the limit of the average coverage probability given by the binomial and quasi models, when the prior distribution of θ is U[0, 0.1], the prior distribution of S is U[a1, b1], and the prior distribution of C is U[a2, b2]. The parameters are ε = 0.02, d = 0.03, and cov.θ.start = 0.6. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models; ss.bin, ss.quasi and ss.full are the sample sizes given by the three models, respectively, after which the average coverage probability will not improve by more than ε = 0.02.

Table 6.21: The sample size, and the limit of the average coverage probability given by the binomial and quasi models, when the prior distribution of θ is Beta(3,3) on the interval [0, 0.1], the prior distribution of S is Beta(3,3) on the interval [a1, b1], and the prior distribution of C is Beta(3,3) on the interval [a2, b2]. The parameters are ε = 0.02, d = 0.02, and cov.θ.start = 0.674. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models; ss.bin, ss.quasi and ss.full are the sample sizes given by the three models, respectively, after which the average coverage probability will not improve by more than ε = 0.02.


Table 6.22: The sample size, and the limit of the average coverage probability given by the binomial and quasi models, when the prior distribution of θ is Beta(3,3) on the interval [0, 0.1], the prior distribution of S is Beta(3,3) on the interval [a1, b1], and the prior distribution of C is Beta(3,3) on the interval [a2, b2]. The parameters are ε = 0.02, d = 0.03, and cov.θ.start = 0.884. c.bin, c.quasi and c.full are the upper limits of the average coverage probabilities given by the binomial, quasi and quasi.full models; ss.bin, ss.quasi and ss.full are the sample sizes given by the three models, respectively, after which the average coverage probability will not improve by more than ε = 0.02.

Tables 6.13 through 6.22 confirm that, in general, the posterior average coverage probability has an upper limit that is often much smaller than 1. This limit cannot be surpassed even with an infinite sample size. By fitting a logistic regression model to data consisting of the values of the posterior average coverage probabilities calculated at different values of the sample size, this upper limit can be closely approximated. The sample size needed for the posterior average coverage probability to be within ε of this upper limit increases when the intervals of definition of the prior densities of S and C are shifted to the right. For example, the results of the quasi likelihood model in table 6.19 indicate that the sample size is 206 when the prior densities of the sensitivity and the specificity are both U[0.65,0.75], but it increases to 707 when these prior densities are both U[0.9,1]. This is because the upper limit of the posterior average coverage probability increases when the intervals of definition of the prior densities of S and C shift to the right, and larger sample sizes are needed to attain this higher upper limit.

For the various values of d and for the prior densities of S, C and θ considered in tables 6.13 through 6.22, we calculated the difference between the upper limit of the posterior average coverage probabilities and cov.θ.start. We call this difference the improvement of the posterior average coverage probability. Table 6.23 shows the maximum and minimum improvement over the different prior distributions of S and C. For example, if d = 0.03 and the prior distribution of θ is U[0, 0.1], the maximum improvement is 0.233 and the minimum improvement is 0.065. These maximum and minimum improvements correspond to prior distributions for the sensitivity and specificity being both U[0.9,1] or both U[0.65,0.75], respectively. When the prior distribution of θ is U[0, 0.1] and the prior distributions of S and C are uniform (first three rows of the table), the size of the improvement increases with increasing d. For example, the maximum improvement is 0.139 for d = 0.01, while it is 0.229 for d = 0.02. This is not the case, however, in rows 4 and 5, where the prior distribution of θ is Beta(3,3) defined on the interval [0, 0.1], and the prior distributions of S and C are also Beta(3,3), defined on the intervals [a1, b1] and [a2, b2], respectively. Here the improvement does not increase with increasing values of d. For example, for d = 0.02 the maximum improvement is 0.165, while for d = 0.03 the maximum improvement is only 0.089. This reversal is probably due to the shape of the density function of a Beta(3,3) distribution defined on the interval [0, 0.1]. This density function is concentrated around the middle of its interval of support, since its mean is 0.05 and its standard deviation is only about 0.0189. See Figure 6.6, which plots a Beta(3,3) density function defined on the interval [0, 0.1], together with the posterior density function when x = 94 is observed, computed using the SIR algorithm with k = r = 500000. We see from these plots that the improvement is concentrated in an interval of length approximately 0.03 around the mean 0.05. If d > 0.015 is used, the improvement will be smaller, since some probability will be lost around the edges of the interval and some gained around the middle of the interval.
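The concentration argument can be checked in closed form. For a Beta(α, β) distribution rescaled to [a, b], the mean is a + (b − a)α/(α + β) and the standard deviation is (b − a)√(αβ/((α + β)²(α + β + 1))). The short Python sketch below (our own illustration) evaluates these for Beta(3,3) on [0, 0.1], giving a standard deviation of 0.1/√28 ≈ 0.0189, together with the prior probability of the length-0.03 interval [0.035, 0.065], which is about 0.53. It describes only the prior, not the SIR posterior shown in Figure 6.6.

```python
import math
from math import comb


def beta_int_cdf(x: float, alpha: int, beta: int) -> float:
    """CDF of a standard Beta(alpha, beta) at x in [0, 1], for integer parameters."""
    m = alpha + beta - 1
    return sum(comb(m, j) * x**j * (1.0 - x) ** (m - j) for j in range(alpha, m + 1))


alpha, beta, a, b = 3, 3, 0.0, 0.1
mean = a + (b - a) * alpha / (alpha + beta)
sd = (b - a) * math.sqrt(alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))
# Prior mass within 0.015 of the mean, i.e. in an interval of length 0.03.
lo, hi = (mean - 0.015 - a) / (b - a), (mean + 0.015 - a) / (b - a)
mass = beta_int_cdf(hi, alpha, beta) - beta_int_cdf(lo, alpha, beta)
print(round(mean, 4), round(sd, 4), round(mass, 3))
```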


Table 6.23: Maximum and minimum differences between the upper limit of the posterior average coverage probability and the prior coverage probability of θ. In the first three rows the prior distributions of S and C are U[a1, b1] and U[a2, b2], respectively, and in rows 4 and 5 they are Beta(3,3) on these intervals, where the intervals [a1, b1] and [a2, b2] are the ones considered in tables 6.13 through 6.22. The parameter d is half the width of the posterior credible interval, prior.θ is the prior distribution of θ, cov.θ.start is the prior coverage probability of an interval of total length 2d centered at the prior mean of θ, and max.imp.quasi and min.imp.quasi are, respectively, the maximum and the minimum of the differences over the different values of a1, b1, a2 and b2.

row   d      prior.θ                 cov.θ.start   max.imp.quasi   min.imp.quasi
1     0.01   U[0, 0.1]               0.2           0.139           0.035
2     0.02   U[0, 0.1]               0.4           0.229           0.063
3     0.03   U[0, 0.1]               0.6           0.233           0.065
4     0.02   Beta(3,3) on [0, 0.1]   0.674         0.165           0.040
5     0.03   Beta(3,3) on [0, 0.1]   0.884         0.089           0.022


Figure 6.6: Plot of the prior and posterior density functions of θ. The prior distribution of θ is Beta(3,3) defined on the interval [0, 0.1]. The prior distributions of S and C are both Beta(3,3) defined on the interval [0.9,1]. The posterior density function of θ is computed for x = 94, using a SIR algorithm with samples, both prior and posterior, of sizes k = r = 500000.

Chapter 7

Practical Implications

The estimates of the posterior average coverage probabilities, and therefore also the upper limits of these coverage probabilities as the sample size approaches infinity, depend largely on the prior distributions for the sensitivity and specificity of the diagnostic test. The information on the sensitivity and the specificity of the test provided by the sample depends not only on the disease prevalence, but also on the disease status of each subject in the sample. Since individual disease status is only known if a gold standard test is applied to each subject, the sample will often carry less information on the sensitivity and the specificity of the test than on the disease prevalence. Therefore, it is important to gather as much information as possible on the sensitivity and specificity of the test prior to the start of the experiment, and to include this information in the prior distributions. It is crucial not only to derive point estimates of the sensitivity and specificity, but also to minimize the standard errors of these estimates, since these errors have a large impact on the accuracy of the results of the study. For example, the results presented in table 6.7 showed that the average coverage probability decreases with the introduction of uncertainty around the values of the sensitivity and specificity of the diagnostic test, even when this uncertainty is uniquely in the direction of higher values. In particular, a sensitivity that has a lower limit of 0.9 gives a poorer posterior average coverage probability for θ than a sensitivity that is exactly 0.9. Since one very rarely knows these values exactly, one must include the uncertainty surrounding these variables in the estimation procedure. Not accounting for this uncertainty may provide very misleading results. On the other hand, including unnecessarily large uncertainties may render the sample uninformative even if it is very large. To illustrate how the novel methods presented in this thesis may be used in practice, in this chapter we will present several examples based on real studies. First, an observation that will be used later is examined:

Example 7.0.4 Table 7.1 presents the sample size needed for the posterior average coverage probability to be within ε of its upper limit in a variety of examples, when only the ranges of support of the sensitivity and specificity are known. These values are compared to the sample size needed to reach the same posterior average coverage probability when both the sensitivity and the specificity of the diagnostic test are exactly known.

Table 7.1: Sample size needed to reach a desired posterior average coverage probability. The parameter d is half the length of the posterior credible interval, sens is the prior distribution of the sensitivity, spec is the prior distribution of the specificity, cov is the average coverage probability, taken here to be the difference between the limit of the average coverage probability given by the quasi model and ε = 0.02, and ssize is the sample size required to reach the desired average coverage probability. The prior density of θ is U[0, 0.1].

d   sens   spec   cov   ssize

It is clear from table 7.1 that the sample size needed to reach a given posterior average coverage probability decreases substantially when the uncertainty around the sensitivity and the specificity is removed. For example, in the first and second rows of the table, the sample size needed to reach a posterior average coverage probability of 0.588 is 610, while this sample size decreases to only 93 when the sensitivity and specificity are known to be 0.8 and 0.95, respectively, all other conditions being held fixed.

In practice, two conclusions can be drawn. First, the properties of the diagnostic test used may severely limit the ability of studies to draw accurate conclusions. Secondly, it will often be much more useful to improve the knowledge about the test properties than to increase the sample size. Below we discuss a procedure to find a sample size, in the context of two examples of studies carried out in Montreal.

7.1 Procedure to find the sample size when the sensitivity and/or the specificity are unknown

Suppose that we would like to estimate the prevalence of a disease from the results of a diagnostic test applied to a sample from a population. As is clear from the results of the previous section, it is important to determine the upper limit of the posterior average coverage probability of the prevalence, based on what is known about the properties of the diagnostic test. Then, assuming the potential improvement in the average coverage probability is worthwhile, the sample size needed to approach the upper limit can be determined. For example, tables 6.13 through 6.22 can be used if the conditions of the study are similar to the ones in these tables. If none of the situations reported in these tables apply, then we propose the use of the following procedure:

1. Estimate as accurately as possible the prior densities of S and C. This can be done by consulting with subject matter experts, by performing a search of the available literature, or by conducting new studies, if necessary. The latter may be especially important if there is doubt about whether previous estimates of the diagnostic test properties apply to the population in the present study.

2. Estimate the prior density of θ.

3. Estimate the prior mean of θ.

4. Choose d, half the desired width of the posterior credible interval of θ.

5. Select a positive number α such that 1 − α is the desired posterior average coverage probability.

6. Compute cov.θ.start, the prior coverage probability of an interval of length 2d centered at the prior mean of θ.

7. Use a SIR algorithm to approximate the values of the posterior average coverage probability of θ for several values of the sample size n. For example, values in the range from 100 to 2000 may often be appropriate.
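Step 7 is the computationally demanding part. The S-plus SIR program used in this thesis is not reproduced here; the following is only a minimal Python sketch of the same idea, under assumptions of our own that are flagged in the code: each simulated data set x is drawn from the prior predictive distribution, the k prior draws of (θ, S, C) are weighted by the binomial likelihood of x with P(positive test) = θS + (1 − θ)(1 − C), r draws are resampled in proportion to these weights, and the credible interval is taken to be [m − d, m + d] with m the resampled posterior mean of θ. The function name average_coverage_sir and the n_data argument (number of simulated data sets) are hypothetical.

```python
import math
import random


def scaled_beta(rng, alpha, beta, lo, hi):
    """One draw from a Beta(alpha, beta) distribution rescaled to [lo, hi]."""
    return lo + (hi - lo) * rng.betavariate(alpha, beta)


def average_coverage_sir(n, d, prior_theta, prior_s, prior_c,
                         k=1000, r=1000, n_data=200, seed=1):
    """SIR estimate of the posterior average coverage of theta (illustrative sketch).

    Each prior_* is an (alpha, beta, lo, hi) tuple; U[lo, hi] is (1, 1, lo, hi).
    ASSUMPTIONS: data x come from the prior predictive distribution, and the
    interval is [m - d, m + d] with m the SIR posterior mean of theta.
    """
    rng = random.Random(seed)
    # Sampling step: k joint draws of (theta, S, C) from the prior.
    draws = [(scaled_beta(rng, *prior_theta), scaled_beta(rng, *prior_s),
              scaled_beta(rng, *prior_c)) for _ in range(k)]
    # P(test positive) = theta * S + (1 - theta) * (1 - C), kept away from 0 and 1.
    p_pos = [min(max(t * s + (1 - t) * (1 - c), 1e-12), 1 - 1e-12)
             for t, s, c in draws]
    total = 0.0
    for _ in range(n_data):
        # Simulate one study: x positive results among n subjects.
        t, s, c = (scaled_beta(rng, *prior_theta), scaled_beta(rng, *prior_s),
                   scaled_beta(rng, *prior_c))
        p_true = t * s + (1 - t) * (1 - c)
        x = sum(rng.random() < p_true for _ in range(n))
        # Importance weights: binomial likelihood of x at each prior draw.
        logw = [x * math.log(p) + (n - x) * math.log(1 - p) for p in p_pos]
        top = max(logw)
        weights = [math.exp(lw - top) for lw in logw]
        # Resampling step: r draws with probability proportional to the weights.
        post = [draws[i][0] for i in rng.choices(range(k), weights=weights, k=r)]
        centre = sum(post) / r
        total += sum(abs(th - centre) <= d for th in post) / r
    return total / n_data


# Priors of Example 7.2.1: theta ~ U[0, 1], S ~ Beta(4.44, 13.31), C ~ Beta(71.25, 3.75).
for n in (100, 500, 1000):
    print(n, round(average_coverage_sir(n, 0.05, (1, 1, 0, 1), (4.44, 13.31, 0, 1),
                                        (71.25, 3.75, 0, 1), n_data=50), 3))
```

Like any SIR approximation, the output carries Monte Carlo error that shrinks only as k, r and n_data grow, which is one reason for smoothing such values with a fitted model rather than reading them off directly.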

If in step 7 the desired posterior average coverage probability of 1 − α is reached, then n can be determined correspondingly. If, however, the posterior average coverage probabilities in step 7 seem to be stabilizing away from 1 − α, then the following steps are useful:

• Fit a logistic regression model with constant error variance to the data derived in step 7, with dependent variable cov − cov.θ.start and independent variable n. In general, models of the form u + v/n will provide a good fit. Alternative models can be of the form $u + v_1/n^{k_1} + v_2/n^{k_2} + \cdots + v_i/n^{k_i}$ for some integer i, where $k_1, \ldots, k_i$ are rational numbers.

• Compute an estimate of the upper limit of the posterior average coverage probability. For any of the models employed in the preceding step, exp(u)/(1 + exp(u)) approximates the upper limit of cov − cov.θ.start, so that cov.θ.start + exp(u)/(1 + exp(u)) approximates the upper limit of the posterior average coverage probability.

• If this estimate is considered worthwhile, then choose a positive number ε such that the posterior average coverage probability is considered adequate if it is within ε of the upper limit, and compute the sample size. The formula

$$n_0 = \left\lceil \frac{v}{\log\left(\dfrac{\exp(u) - \epsilon(1 + \exp(u))}{\left(1 + \epsilon(1 + \exp(u))\right)\exp(u)}\right)} \right\rceil$$

can be used for models of the form u + v/n.

We now apply this algorithm to two examples:

Examples

Example 7.2.1 Joseph et al [1995] considered the problem of estimating the prevalence of Strongyloides infection among Kampuchean refugees arriving in Montreal, Canada in 1982-1983. One diagnostic test that can be used to detect Strongyloides is a stool examination. This test is known to have high specificity, but low sensitivity. We will estimate the upper limit of the posterior average coverage probability for θ using an interval width of 2d = 0.1. In general, d would be selected to match the objectives of the study. We will also calculate the sample size needed for the posterior average coverage probability of θ to be within ε of this upper limit, for a predetermined ε. Prior distributions for θ, S and C were given in Joseph et al [1995], where the prior distribution for the sensitivity was Beta(4.44, 13.31) and the prior distribution for the specificity was Beta(71.25, 3.75), both on the interval [0, 1]. No prior information was available on θ prior to the start of the experiment, so a U[0, 1] prior distribution is assumed. We let d = 0.05, so that cov.θ.start = 0.1, and compute the average coverage probability for values of n ranging from 100 to 2000 in increments of 100. To carry out the computations, we use the SIR program as described in section 7.1, with both initial and resampling sizes of 1000. The results are presented in table 7.2.

Table 7.2: Posterior average coverage probabilities when d = 0.05, the prior distribution of the specificity is Beta(71.25, 3.75), the prior distribution of the sensitivity is Beta(4.44, 13.31), and the prior distribution of θ is U[0, 1]. cov is the posterior average coverage probability of θ for sample size n.

The results from table 7.2 indicate that the posterior average coverage probability

seerns to have an upper limit of approximately 0.2. We fitted a logistic model of the

form u + u / n to the data in table 7.2. The Splus program that was used to fit this

model is given in Appendiv G. The coefficients are u = -2.219 and v = -47.153, so

that the model is cou - 0.1

log (1.1 - cou) = -2.219 - 47.153/n.

Therefore, the upper limit of the average coverage probability is approximately

$$0.1 + \frac{e^{-2.219}}{1 + e^{-2.219}} \approx 0.198.$$

The sample size needed for the posterior average coverage probability to be within ε = 0.02 of this upper limit, that is, to reach an average coverage of 0.178, is

$$n = \left\lceil \frac{-47.153}{\log\{(0.178 - 0.1)/(1.1 - 0.178)\} + 2.219} \right\rceil,$$

where ⌈t⌉ denotes the smallest integer greater than or equal to t. The plots of the fit and of the residuals of this model are shown in Figure 7.1. We hypothesize that virtually all of the residual error is due to the SIR approximation. We checked this hypothesis by estimating the average coverage probability more accurately in this example when n = 400. The posterior average coverage probability for n = 400 given in table 7.2 is 0.202, while the corresponding value given by the model is 0.188. We recalculated the posterior average coverage probability for n = 400 using a SIR algorithm with prior and posterior samples of size k = r = 5000. The average of 20 repetitions of this procedure should be very close to the exact probability. We obtained a value of 0.188, which very closely matches the value predicted by the model. Therefore, we conclude that most of the residual error from the model is due to random errors from the SIR approximation, and that the model in fact provides average coverage probabilities that are quite close to the true probabilities.
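Because 1 - (cov - 0.1) = 1.1 - cov, the fitted model can equivalently be written as logit(cov - 0.1) = u + v/n. The limiting coverage is then 0.1 + e^u/(1 + e^u), and the sample size for a target coverage follows by solving the same equation for n and rounding up. A small Python sketch of these two calculations, using the rounded coefficients quoted above (the function names are hypothetical):

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(q):
    return math.log(q / (1.0 - q))


def fitted_coverage(n, u=-2.219, v=-47.153, start=0.1):
    """Average coverage predicted by logit(cov - start) = u + v / n."""
    return start + expit(u + v / n)


def coverage_limit(u=-2.219, start=0.1):
    """Limit of the fitted coverage as n grows without bound."""
    return start + expit(u)


def sample_size_for(target, u=-2.219, v=-47.153, start=0.1):
    """Smallest integer n whose fitted coverage reaches `target` (needs v < 0)."""
    eta = logit(target - start)
    if eta >= u:
        raise ValueError("target is at or above the fitted upper limit")
    return math.ceil(v / (eta - u))
```

With these rounded coefficients, `coverage_limit()` is about 0.198, `fitted_coverage(400)` is about 0.188, and `sample_size_for(0.178)` returns 188.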

Obviously, reporting at best a posterior credible set with probability of 0.2 is not useful in practice. This indicates that estimating the prevalence of Strongyloides infection with a single stool examination is not worthwhile, even with a very large sample size. Therefore, in order to design a worthwhile experiment, the investigators must find a way to greatly improve the properties of this test (as may be possible by taking repeated stool samples from each subject), or find a different test. (In fact, a combination of stool examination and a serologic test was used. See Joseph et al. [1995] for details.)

Figure 7.1: Fit by a quasi model to the refugee data presented in table 7.2.

Example 7.2.2 Here we consider the previous example again, but vary the prior densities of the sensitivity and the specificity across a reasonable range. We again assume that the prior density of θ is U[0, 1]. For each set of prior densities, we computed the posterior average coverage probabilities for values of n ranging from 100 to 2000 in increments of 100, using the SIR algorithm with 1000 points. The results are given in table 7.3. For ease of comparison, the second column of this table displays the results of example 7.2.1.

Table 7.3: Posterior average coverage probability of θ for different prior densities for the sensitivity and specificity. Throughout, the prior density of θ is U[0, 1], and d = 0.05. cov1, cov2, cov3, cov4, cov5 and cov6 are the posterior average coverage probabilities when the prior distributions of the sensitivity are Beta(4.44, 13.31), U[0.2, 0.3], U[0.2, 0.5], U[0.2, 0.25], U[0.45, 0.5] and 0.2, respectively, and the prior distributions of the specificity are Beta(71.25, 3.75), U[0.95, 1], 1, 1, 1 and 1, respectively.

Columns 4 and 5 of table 7.3 demonstrate that the posterior average coverage probability increases when the prior range of the sensitivity decreases, even when the range shrinks toward its lower end point. This is also seen, for example, in comparing column 5 to column 7. It is often better to have a test with poorer but more accurately known properties than a test with better but less accurately known properties. It is interesting to note, however, that this phenomenon is seen here for n ≥ 400, but not for lower values of n. This is probably because, with small samples, the uncertainty around p is large, so that the uncertainty around the sensitivity has a smaller overall impact. With larger sample sizes, the uncertainty around p is smaller, so that the degree of knowledge about the sensitivity has a greater effect.
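A simplified, large-sample way to see why a narrower sensitivity range helps, even when it lies at lower values: in columns 4 to 7 of table 7.3 the specificity is fixed at 1, so p = θS. Even if p were known exactly, θ would only be pinned down to the interval [p/b1, p/a1] when S ranges over [a1, b1]. The sketch below is an illustration only (it ignores sampling error in p, and the function name is hypothetical):

```python
def theta_range_given_p(p, a1, b1):
    """Prevalences compatible with an exactly known p when C = 1 and S is in [a1, b1].

    With specificity 1, p = theta * S, so theta = p / S.
    Values are truncated to the prior support [0, 1] of theta.
    """
    lo = min(p / b1, 1.0)
    hi = min(p / a1, 1.0)
    return lo, hi


# Sensitivity ranges of columns 4 to 7 of table 7.3 (specificity exactly 1).
for a1, b1 in [(0.2, 0.5), (0.2, 0.25), (0.45, 0.5), (0.2, 0.2)]:
    lo, hi = theta_range_given_p(0.05, a1, b1)
    print(f"S in [{a1}, {b1}]: theta in [{lo:.3f}, {hi:.3f}] (width {hi - lo:.3f})")
```

With p = 0.05 the widths are 0.150, 0.050, 0.011 and 0 for S in [0.2, 0.5], [0.2, 0.25], [0.45, 0.5] and S = 0.2; only the first exceeds the credible interval length 2d = 0.1.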

In the last column, the sensitivity and specificity are both considered to be exactly known. From section 6.1 of chapter 6, we know that the upper limit of the posterior average coverage probability is 1, and an algorithm to compute the exact sample size required for the posterior average coverage probability of θ to reach a predetermined value is available.

Example 7.2.3 In this example we examine the sample size requirements of a study planned by Dr. Theresa Gyorkos at the Montreal General Hospital. The objective of the study is to estimate the prevalence of Toxoplasma gondii among pregnant women in the province of Quebec. A kit that detects the presence of antibodies will be used as the diagnostic test. The prior densities for the sensitivity and specificity of this test were estimated to be S ~ Beta(65, 3) and C ~ Beta(22.1, 0.1), respectively. These estimates were derived from a detailed study by Wilson and Ware [1991]. The prior density of the prevalence was taken to be θ ~ U[0, 1], since very little is known about the prevalence of Toxoplasma gondii in Quebec. We computed the sample size needed for the posterior average coverage probability of credible intervals of total length 0.08 and 0.1, respectively, centered at the posterior mean of θ, to be at least 0.95. The results are displayed in table 7.4.

Table 7.4: Posterior average coverage probability of θ. The prior density of θ is U[0, 1]. The prior densities of S and C are Beta(65, 3) and Beta(22.1, 0.1), respectively. cov.04 and cov.05 are the posterior average coverage probabilities calculated for d = 0.04 and d = 0.05, respectively, where d is half the length of the posterior credible interval.

In table 7.4 we see that for d = 0.04 (column 1) the required sample size is approximately 1500. Since the computations done using the SIR algorithm are based on samples and therefore involve some error, we repeated the calculations several times. The results are displayed in table 7.5, from which we conclude that the sample size needed for the posterior average coverage probability of a credible interval of length 0.08 to be at least 0.95 is approximately 1500.

n | Posterior average coverage probability

Table 7.5: Posterior average coverage probability of θ. The parameter n is the sample size. The prior density of θ is U[0, 1]. The prior densities of S and C are Beta(65, 3) and Beta(22.1, 0.1), respectively. The length of the posterior credible interval is 0.08.

For the data displayed in column 2 of table 7.4, that is, for d = 0.05, we again repeated the calculations for n = 500, n = 550 and n = 600. The results are displayed in table 7.6, which shows that the sample size needed for the posterior average coverage probability of a credible interval of length 0.1 to be at least 0.95 is approximately 550. Therefore, unlike in the previous example, the properties of the diagnostic test for Toxoplasma gondii are sufficiently well known that narrow credible sets can be expected from the study.

Table 7.6: Posterior average coverage probability of θ. The parameter n is the sample size. The prior density of θ is U[0, 1]. The prior densities of S and C are Beta(65, 3) and Beta(22.1, 0.1), respectively. The length of the posterior credible interval is 0.1.

n | Posterior average coverage probability
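To make the contrast between Examples 7.2.1 and 7.2.3 concrete, the means and standard deviations of the Beta priors used in the two examples can be computed from the usual formulas, mean a/(a + b) and variance ab/((a + b)^2 (a + b + 1)). A short sketch (the function name is hypothetical):

```python
import math


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


priors = {
    "stool examination, sensitivity Beta(4.44, 13.31)": (4.44, 13.31),
    "stool examination, specificity Beta(71.25, 3.75)": (71.25, 3.75),
    "Toxoplasma kit, sensitivity Beta(65, 3)": (65, 3),
    "Toxoplasma kit, specificity Beta(22.1, 0.1)": (22.1, 0.1),
}
for label, (a, b) in priors.items():
    m, s = beta_mean_sd(a, b)
    print(f"{label}: mean {m:.3f}, sd {s:.3f}")
```

It gives a sensitivity prior with mean about 0.25 and standard deviation about 0.10 for the stool examination, against a mean of about 0.96 and a standard deviation of about 0.025 for the Toxoplasma antibody kit.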

Chapter 8

Discussion

Estimating the prevalence of a disease in a given population is the aim of many studies. When an error-free diagnostic test will be applied to each subject in a sample, standard binomial formulas can be used to determine the sample size required to estimate the prevalence to any desired accuracy. Unfortunately, perfect gold standard tests are very rare in practice. In general, one does not know the exact values of the sensitivity and specificity of an imperfect diagnostic test, so that the classical binomial formulas for sample size cannot be applied.
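For reference, the classical calculation for a perfect test is n = z^2 θ(1 - θ)/d^2, with z the 1 - α/2 standard normal quantile. The sketch below also shows one common adjustment when the sensitivity and specificity are known exactly: it uses the variance q(1 - q)/(n(S + C - 1)^2) of the corrected estimator (x/n - 1 + C)/(S + C - 1) that appears in Appendix A, with q = θS + (1 - θ)(1 - C). This is an illustration under those assumptions, not necessarily the exact frequentist criterion developed in the thesis; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def binomial_sample_size(theta, d, alpha=0.05, sens=1.0, spec=1.0):
    """Normal-approximation sample size to estimate a prevalence within +/- d.

    With sens = spec = 1 this is the classical n = z^2 theta (1 - theta) / d^2.
    Otherwise the corrected estimator (x/n - 1 + spec) / (sens + spec - 1) is
    assumed, whose variance is q (1 - q) / (n (sens + spec - 1)^2).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    j = sens + spec - 1                        # must be positive
    if j <= 0:
        raise ValueError("sens + spec must exceed 1")
    q = theta * sens + (1 - theta) * (1 - spec)  # probability of a positive test
    return math.ceil(z ** 2 * q * (1 - q) / (d * j) ** 2)
```

For θ = 0.5, d = 0.05 and α = 0.05 the perfect-test case gives 385; with θ = 0.3, S = 0.9 and C = 0.95 the adjusted size (451) exceeds the perfect-test size for the same θ (323).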

In this thesis, we provided methods for determining the sample size necessary for the estimation of disease prevalence to within a given accuracy. We first presented an adjustment to a standard frequentist criterion, useful when the sensitivity and specificity of the diagnostic test are exactly known. When the sensitivity and specificity are not known, using a Bayesian method, we showed that it is important when planning a study to:

1. Estimate the sensitivity and specificity of the diagnostic test as accurately as

possible.

2. Calculate the upper limit of the posterior average coverage probability, using

a method provided in the thesis, to determine if a study using this test is

worthwhile. If yes, one can proceed to the calculation of the sample size using

a procedure similar to that of chapter 7. We therefore stress the importance of

estimating the sensitivity and specificity of a diagnostic test as accurately as

possible before the start of any study.

In this thesis we investigated situations where a dichotomous diagnostic test is to be used. The methods can be applied more generally, however, since any test with categorical or continuous outcomes can be dichotomized. Obviously, there are many other situations for which similar methods can be developed. While we investigated the case where the sensitivity and specificity of the diagnostic test and the prevalence of the disease are a priori independent, similar methods can be developed in the case where they are dependent. For example, in investigating parasitic infections, a subject with a higher degree of infection may be more likely to test positive on two or more tests compared to a subject with a lesser degree of infection.

Although this study focused on estimating the prevalence, similar methods could

be applied to estimating test properties. To estimate the sensitivity, for example, one

needs a sample of known positive subjects. If a gold standard test is available, such

a cohort can be assembled, and the problem can be viewed as the classical one of

estimating a binomial proportion. When a gold standard test does not exist, so that

there may be some false positives in the sample, methods similar to the ones developed

in this thesis can be found to determine the sample sizes required to estimate the

sensitivity and specificity of a diagnostic test to within a given accuracy. Similar

methods can also be developed to estimate the positive predictive value, the negative

predictive value and the likelihood ratios of a diagnostic test. For these problems,

one usually needs good prior information on the probabilities that each subject is

truly positive or negative.

Further work should also include sample size determination for studies where two or more diagnostic tests will be used. This would be useful when no single test provides adequate sensitivity and specificity for accurate estimation, but combinations of tests offer hope. In addition, in this thesis we looked at only one of many possible Bayesian sample size criteria. Other sample size criteria such as "average length" criteria or conservative criteria such as "worst outcome" criteria (see chapter 1) could be investigated, as well as decision theoretic criteria.

Throughout this thesis, we have assumed random samples from the relevant populations. Another area worthy of investigation would be to consider non-random samples, which can occur in clinic-based studies, when only certain subgroups of the target population may attend a clinic. In such studies, in addition to misclassification, one must handle potential selection biases.

Finally, work should be done on methods for prior elicitation for all parameters involved in diagnostic test situations, since we have shown that these prior distributions can have an enormous effect on the results.

We hope that the work presented in this thesis will convince researchers of the

importance of knowing the properties of a diagnostic test as accurately as possible,

before using it to estimate the prevalence of the disease, and that the methods for

sample size estimation in the absence of a gold standard test will prove useful.

Appendix A

Splus program to find the AMLE

and the confidence interval for the

thetahat<-function(size,theta,alpha,spec,sens,n){
# probability of a positive test and simulated numbers of positives
p<-theta*sens+(1-theta)*(1-spec)
x<-rbinom(size,n,p)
# probability that no subject in the sample is diseased
pro<-(1-theta)^n
# normal approximation to P(x/n <= 1-spec), i.e. to P(MLE <= 0)
MLE0<-pnorm((1-spec-p)/(sqrt(p*(1-p)/n)))
sigma<-spec*(1-spec)/n
MX<-sqrt(sigma)*exp(-0.5*(1-spec-x/n)^2/sigma)/(sqrt(2*3.1416)*(pnorm((x/n-1+spec)/sqrt(sigma))))
chat<-1-spec-MX
AMLE<-(x/n-chat)/(sens-chat)
# MLE of theta; the AMLE is kept only where the MLE is not positive
MLE<-(x/n-1+spec)/(sens+spec-1)
for(i in 1:size){if(MLE[i] <= 0) AMLE[i]<-AMLE[i] else AMLE[i]<-MLE[i]}
for(i in 1:size){
if(MLE[i] > 0) MLE[i]<-MLE[i] else MLE[i]<-0}
# Wilson score limits for p (roots of a quadratic), mapped to the theta scale
delta<-(2*x/n+(qnorm(1-alpha/2))^2/n)^2-4*(x/n)^2*(1+((qnorm(1-alpha/2))^2)/n)
p1<-(2*x/n+(qnorm(1-alpha/2))^2/n+sqrt(delta))/(2*(1+qnorm(1-alpha/2)^2/n))
theta1<-(p1-1+spec)/(sens+spec-1)
mserror.MLE<-sqrt(mean((MLE-theta)^2))
mserror.AMLE<-sqrt(mean((AMLE-theta)^2))
diff<-AMLE-MLE
diff.positif<-diff[AMLE > MLE]
len<-length(diff.positif)
mAMLE<-sum(diff.positif)/len
mserror.diff.MLE<-theta
mserror.diff.AMLE<-sqrt(sum((diff.positif-theta)^2)/len)
closer.prop<-sum(as.numeric(abs(diff.positif-theta)<theta))/len
delta<-(2*x/n+(qnorm(1-alpha/2))^2/n)^2-4*(x/n)^2*(1+((qnorm(1-alpha/2))^2)/n)
p1<-(2*x/n+(qnorm(1-alpha/2))^2/n-sqrt(delta))/(2*(1+qnorm(1-alpha/2)^2/n))
p2<-(2*x/n+(qnorm(1-alpha/2))^2/n+sqrt(delta))/(2*(1+qnorm(1-alpha/2)^2/n))
theta1.m2<-(p1-1+spec)/(sens+spec-1)
d[x/n<=1-spec]<-qnorm(1-alpha/2)*sqrt(spec*(1-spec)/n)
percent.m1<-sum(as.numeric(theta1.m1 <= theta & theta <= theta2.m1))/size
percent.m2<-sum(as.numeric(theta1.m2 <= theta & theta <= theta2.m2))/size
percent.m1.null<-sum(as.numeric(theta1.m1 <= theta & theta <= theta2.m1 &
 AMLE > MLE))/len
percent.m2.null<-sum(as.numeric(theta1.m2 <= theta & theta <= theta2.m2 &
 AMLE > MLE))/len
percent.m1.pos<-sum(as.numeric(theta1.m1 <= theta & theta <= theta2.m1 &
 AMLE == MLE))/(size-len)
percent.m2.pos<-sum(as.numeric(theta1.m2 <= theta & theta <= theta2.m2 &
 AMLE == MLE))/(size-len)
return(percent.m1,percent.m1.pos,percent.m2.pos,percent.m1.null,
 percent.m2.null,
 percent.m2,mAMLE,len,pro,MLE0,
 mserror.MLE,mserror.AMLE,mserror.diff.MLE,mserror.diff.AMLE,closer.prop,
 theta1.m1,theta2.m1,theta1.m2,theta2.m2)}
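The core of the program above, the estimator (x/n - 1 + spec)/(sens + spec - 1) truncated at 0 and the Wilson score interval for p computed from the roots of a quadratic (the delta, p1 and p2 lines) and mapped to the θ scale, can be sketched in Python as below. The adjusted estimator (AMLE) and the simulation summaries are not reproduced; clipping the interval to [0, 1] and the function name are assumptions of this sketch.

```python
import math
from statistics import NormalDist


def corrected_estimate_and_interval(x, n, sens, spec, alpha=0.05):
    """Prevalence estimate and interval when sens and spec are known exactly.

    The apparent proportion q is mapped to the prevalence scale by
    theta = (q - 1 + spec) / (sens + spec - 1), then clipped to [0, 1].
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    j = sens + spec - 1
    q_hat = x / n
    # Wilson score interval for q: roots of a quadratic in q.
    b = 2 * q_hat + z ** 2 / n
    disc = b ** 2 - 4 * q_hat ** 2 * (1 + z ** 2 / n)
    denom = 2 * (1 + z ** 2 / n)
    q_lo = (b - math.sqrt(disc)) / denom
    q_hi = (b + math.sqrt(disc)) / denom

    def to_theta(q):
        return min(max((q - 1 + spec) / j, 0.0), 1.0)

    return to_theta(q_hat), (to_theta(q_lo), to_theta(q_hi))
```

For instance, with 30 positives out of 100, sens = 0.9 and spec = 0.95, the estimate is 0.25/0.85; with only 2 positives the apparent proportion is below 1 - spec and the estimate is truncated to 0.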

Appendix B

Splus program to calculate the

average coverage probability when

both the sensitivity and the

specificity are known

In the following, we display an Splus program to calculate the average coverage probability when both the sensitivity and the specificity are known, and the prior distribution of θ is U[a, b]. A similar program can be written when the prior distribution of θ is Beta(u, v) on an interval [a1, b1].

n<- # the sample size
# f: likelihood of x positives out of n; g: the same for x+1 out of n+1 (posterior mean)
f<-function(p,x){dbinom(x,n,p)}
g<-function(p,x){dbinom(x+1,n+1,p)}
marg<-rep(-1,n+1)
differencebeta<-rep(-1)
if(is.null(y1)==F){for(i in y1){phat1[i]<-integrate(g,lower=a1,upper=b1,x=(i-1))$integral
phatnew[i]<-i/(n+1)*phat1[i]}}
for(i in y2){
phat1[i]<-differencebeta1[i]
phatnew[i]<-differencebeta1[i]}
# posterior mean of p and interval phat +/- d*(sens+spec-1), clipped to [a1,b1]
for(i in t){
phat[i]<-phatnew[i]/marg[i]
l1[i]<-max(a1,(phat[i]-d*(sens+spec-1)))
l2[i]<-min(b1,(phat[i]+d*(sens+spec-1)))
differencebeta2[i]<-1/(n+1)*(pbeta(l2[i],i,n+2-i)-pbeta(l1[i],i,n+2-i))}
# entries where the pbeta difference is exactly 0 or 1 are recomputed with integrate()
z1<-t[differencebeta2==0 | differencebeta2==1]
z2<-t[differencebeta2!=0 & differencebeta2!=1]
if(is.null(z1)==F){for(i in z1){
posterior[i]<-integrate(f,lower=l1[i],upper=l2[i],x=i-1)$integral}}
for(i in z2){
posterior[i]<-differencebeta2[i]}
average.prob<-sum(posterior)/l
return(n,marg,average.prob)
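The logic of the program above can be sketched in Python. With S and C known and θ ~ U[a, b], p = θ(S + C - 1) + 1 - C is uniform on [a1, b1], the posterior mean of θ follows from the posterior mean of p, and the interval θ̂ ± d corresponds to p̂ ± d(S + C - 1), clipped to [a1, b1]. Where the Splus code uses pbeta, this sketch substitutes composite Simpson integration (an assumption made to stay within the Python standard library); all names are hypothetical.

```python
import math


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p), evaluated on the log scale."""
    if p <= 0.0:
        return 1.0 if x == 0 else 0.0
    if p >= 1.0:
        return 1.0 if x == n else 0.0
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_choose + x * math.log(p) + (n - x) * math.log1p(-p))


def simpson(f, lo, hi, m=400):
    """Composite Simpson rule with m (even) subintervals; 0 for an empty range."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def known_test_average_coverage(n, d, sens, spec, a=0.0, b=1.0, m=400):
    """Average coverage of theta_hat +/- d when sens, spec are known and theta ~ U[a, b].

    Returns (average coverage, sum of m(x) over x); the second value should be 1.
    """
    j = sens + spec - 1                              # must be positive
    a1, b1 = a * j + 1 - spec, b * j + 1 - spec      # range of p = P(test positive)
    average, total_marg = 0.0, 0.0
    for x in range(n + 1):
        def lik(p, x=x):
            return binom_pmf(x, n, p)
        z = simpson(lik, a1, b1, m)                  # integral of the likelihood
        if z == 0.0:
            continue
        marg = z / (b1 - a1)                         # m(x): p is uniform on [a1, b1]
        p_hat = simpson(lambda p: p * lik(p), a1, b1, m) / z  # posterior mean of p
        lo = max(a1, p_hat - d * j)                  # theta_hat +/- d on the p scale
        hi = min(b1, p_hat + d * j)
        average += marg * simpson(lik, lo, hi, m) / z
        total_marg += marg
    return average, total_marg
```

For instance, `known_test_average_coverage(200, 0.05, 0.9, 0.95)` returns the average coverage together with the sum of m(x), which should equal 1 up to rounding.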

Splus program to compute the sample size, through a bisectional search, when the sensitivity and specificity are known.

phat[i]<-differencebeta[i]/marg[i]
if(l1[i] < a1) lowerlimit[i]<-a1 else lowerlimit[i]<-l1[i]
if(l2[i] > b1) upperlimit[i]<-b1 else upperlimit[i]<-l2[i]
posterior[i]<-1/(n+1)*(pbeta(upperlimit[i],i,n+2-i)-pbeta(lowerlimit[i],i,n+2-i))}
average.probstart<-average.prob
if(average.probstart < 1-alpha) {n2<-ceiling(1*n1)}
phat[i]<-differencebeta[i]/marg[i]
l1[i]<-phat[i]-d*(sens+spec-1)
l2[i]<-phat[i]+d*(sens+spec-1)
if(l1[i] < a1) lowerlimit[i]<-a1 else lowerlimit[i]<-l1[i]
if(l2[i] > b1) upperlimit[i]<-b1 else upperlimit[i]<-l2[i]
posterior[i]<-1/(n+1)*(pbeta(upperlimit[i],i,n+2-i)-
 pbeta(lowerlimit[i],i,n+2-i))}
post<-posterior[t]
if(average.prob == 1-alpha) {nnew<-n2}
n3<-nnew
t<-x[marg!=0]
lowerlimit<-rep(-1,n+1)
upperlimit<-rep(-1,n+1)
posterior<-rep(-1,n+1)
phat<-rep(-1,n+1)
l1<-rep(-1,n+1)
l2<-rep(-1,n+1)
# integral of p times the likelihood over [a1,b1]: numerator of the posterior mean of p
differencebeta<-x/((n+2)*(n+1))*(pbeta(b1,x+1,n+2-x)-pbeta(a1,x+1,n+2-x))
for(i in t){
phat[i]<-differencebeta[i]/marg[i]
l1[i]<-phat[i]-d*(sens+spec-1)
l2[i]<-phat[i]+d*(sens+spec-1)
if(l1[i] < a1) lowerlimit[i]<-a1 else lowerlimit[i]<-l1[i]
if(l2[i] > b1) upperlimit[i]<-b1 else upperlimit[i]<-l2[i]
posterior[i]<-1/(n+1)*(pbeta(upperlimit[i],i,n+2-i)-
 pbeta(lowerlimit[i],i,n+2-i))}
post<-posterior[t]
average.prob<-sum(post)/l
if(average.prob == 1-alpha) {break}
else if(average.prob > 1-alpha) {nstop<-(n3-1)
n<-nstop
x<-c(1:(n+1))
a1<-a*(sens+spec-1)+1-spec
b1<-b*(sens+spec-1)+1-spec
l<-b1-a1
# integral of the likelihood of x-1 positives over [a1,b1]
marg<-1/(n+1)*(pbeta(b1,x,n+2-x)-pbeta(a1,x,n+2-x))
t<-x[marg!=0]
phat<-rep(-1,n+1)
l1<-rep(-1,n+1)
l2<-rep(-1,n+1)
differencebeta<-x/((n+2)*(n+1))*(pbeta(b1,x+1,n+2-x)-pbeta(a1,x+1,n+2-x))
for(i in t){
phat[i]<-differencebeta[i]/marg[i]
l1[i]<-phat[i]-d*(sens+spec-1)
l2[i]<-phat[i]+d*(sens+spec-1)
if(l1[i] < a1) lowerlimit[i]<-a1 else lowerlimit[i]<-l1[i]
if(l2[i] > b1) upperlimit[i]<-b1 else upperlimit[i]<-l2[i]
posterior[i]<-1/(n+1)*(pbeta(upperlimit[i],i,n+2-i)-
 pbeta(lowerlimit[i],i,n+2-i))}
post<-posterior[t]
average.probstop<-sum(post)/l
if(average.probstop < 1-alpha || nstop==1) {break}}
# update the bracket (n1, n2, n3) according to where the coverage lies relative to 1-alpha
if(average.prob < 1-alpha && average.probold < 1-alpha &&
 average.probstart < 1-alpha) {nnew<-ceiling(2*n3)
average.probstart<-average.probold
if(average.prob > 1-alpha && average.probold > 1-alpha &&
 average.probstart > 1-alpha) {nnew<-ceiling(n3/2)

average.probold<-average.prob
n1<-n2
n2<-n3
n3<-nnew}
else
if(average.prob > 1-alpha && average.probold > 1-alpha &&
 average.probstart < 1-alpha) {nnew<-ceiling((n1+n3)/2)
average.probstart<-average.probstart

n1<-n1
n2<-n3
n3<-nnew}
else
if(average.prob > 1-alpha && average.probold < 1-alpha &&
 average.probstart > 1-alpha) {nnew<-ceiling((n3+n2)/2)
average.probstart<-average.probold

n1<-n2
n2<-n3
n3<-nnew}
else
if(average.prob < 1-alpha && average.probold > 1-alpha &&
 average.probstart > 1-alpha) {nnew<-ceiling((n3+n2)/2)

average.probold<-average.prob
n1<-n2
n2<-n3
n3<-nnew}
else
if(average.prob > 1-alpha && average.probold < 1-alpha &&
 average.probstart < 1-alpha) {nnew<-ceiling((n3+n2)/2)
average.probstart<-average.probold

n1<-n2
n2<-n3
n3<-nnew}
else
if(average.prob < 1-alpha && average.probold < 1-alpha &&
 average.probstart > 1-alpha) {nnew<-ceiling((n1+n3)/2)
average.probstart<-average.probstart
average.probold<-average.prob
n1<-n1
n2<-n3
n3<-nnew}
else
if(average.prob < 1-alpha && average.probold > 1-alpha &&
 average.probstart < 1-alpha) {nnew<-ceiling((n3+n2)/2)
average.probstart<-average.probold
average.probold<-average.prob
n1<-n2
n2<-n3
Appendix C

Computing the posterior average

coverage probability when the

specificity is known

In this appendix, we display the programs needed to compute the posterior average coverage probability when the specificity is known, the prior distribution of the sensitivity is U[a1, b1] and the prior distribution of θ is U[a, b].

1. The Maple program to compute the integrals with respect to s, c being held constant.

c is fixed, a > 0
s1:=(p-(1-a)*(1-c))/a;
s2:=(p-(1-b)*(1-c))/b;
g1:=int(1/(s+c-1),s=a1..s1);
g2:=int(1/(s+c-1),s=a1..b1);
g3:=int(1/(s+c-1),s=s2..b1);
g4:=int(1/(s+c-1),s=s2..s1);
gp1:=int(1/(s+c-1)^2,s=a1..s1);
gp2:=int(1/(s+c-1)^2,s=a1..b1);
gp3:=int(1/(s+c-1)^2,s=s2..b1);
gp4:=int(1/(s+c-1)^2,s=s2..s1);

c is fixed, a=0
g2:=int(1/(s+c-1),s=a1..b1);
g3:=int(1/(s+c-1),s=s2..b1);
gp2:=int(1/(s+c-1)^2,s=a1..b1);
gp3:=int(1/(s+c-1)^2,s=s2..b1);

An Splus program was then written to integrate these functions with respect to p

and calculate the posterior average coverage probability.
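The integrands above come from the change of variables θ = (p - 1 + c)/(s + c - 1), whose Jacobian for fixed s and c is 1/(s + c - 1); assuming s + c - 1 > 0, the condition a ≤ θ ≤ b becomes s2 ≤ s ≤ s1, with s1 and s2 as defined in the Maple program. The s-integrals have the closed forms log(s + c - 1) and -1/(s + c - 1), which appear in the functions g1 to g4, gp2 and gp3 below. A quick numerical check in Python (a sketch; the names are hypothetical):

```python
import math


def simpson(f, lo, hi, m=1000):
    """Composite Simpson rule on [lo, hi] with m (even) subintervals."""
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def g_closed(lo, hi, c):
    """Integral of 1/(s + c - 1) for s from lo to hi (needs lo + c - 1 > 0)."""
    return math.log(hi + c - 1) - math.log(lo + c - 1)


def gp_closed(lo, hi, c):
    """Integral of 1/(s + c - 1)^2 for s from lo to hi."""
    return (hi - lo) / ((hi + c - 1) * (lo + c - 1))


def s_limits(p, a, b, c):
    """(s2, s1): sensitivities at which theta = (p - 1 + c)/(s + c - 1) equals b and a (a > 0)."""
    s1 = (p - (1 - a) * (1 - c)) / a
    s2 = (p - (1 - b) * (1 - c)) / b
    return s2, s1
```

The closed form used in gp2, (b1 - a1)/((b1 + c - 1)(a1 + c - 1)), is `gp_closed(a1, b1, c)`.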

2. Splus program to calculate the exact posterior average coverage probability when the specificity is known, the prior distribution of the sensitivity is U[a1, b1] and the prior distribution of θ is U[0, b].

g1<-function(p,x,a,b,a1,b1,c0){dbinom(x,n,p)*
 (log((p+c0-1)/a)-log(a1+c0-1))}
g2<-function(p,x,a,b,a1,b1,c0){dbinom(x,n,p)*
 (log(b1+c0-1)-log(a1+c0-1))}
g3<-function(p,x,a,b,a1,b1,c0){dbinom(x,n,p)*
 (log(b1+c0-1)-log((p+c0-1)/b))}
g4<-function(p,x,a,b,a1,b1,c0){dbinom(x,n,p)*
 (log((p+c0-1)/a)-log((p+c0-1)/b))}
gp2<-function(p,x,a,b,a1,b1,c0){dbinom(x,n,p)*
 ((p-1+c0)*(b1-a1)/((b1+c0-1)*(a1+c0-1)))}
gp3<-function(p,x,a,b,a1,b1,c0){dbinom(x,n,p)*
 ((p-1+c0)*(-1/(b1+c0-1)+b/(p-1+c0)))}
I1<-vector()
I2<-vector()
I3<-vector()
I4<-vector()
Ip2<-vector()
Ip3<-vector()
post.prob<-vector()
fixc.cov<-function(n,d,a.old,b.old,a1.old,b1.old,c.old){
x<-1:(n+1)
# p1(a,s,c0), p2(b,s,c0): probability of a positive test when theta equals a (resp. b)
p1<-function(a,s,c0){a*s+(1-a)*(1-c0)}
p2<-function(b,s,c0){b*s+(1-b)*(1-c0)}
lp1<-1-c.old
lp2<-p2(b.old,a1.old,c.old)
lp3<-p2(b.old,b1.old,c.old)
# for each x: marginal (I1+I2) and numerator of the posterior mean (Ip2+Ip3), with a = 0
for(i in 1:(n+1)){
I1[i]<-integrate(g2,lower=lp1,upper=lp2,x=i-1,a=a.old,b=b.old,
 a1=a1.old,b1=b1.old,c0=c.old)$integral
I2[i]<-integrate(g3,lower=lp2,upper=lp3,x=i-1,a=a.old,b=b.old,
 a1=a1.old,b1=b1.old,c0=c.old)$integral
Ip2[i]<-integrate(gp2,lower=lp1,upper=lp2,x=i-1,a=a.old,b=b.old,
 a1=a1.old,b1=b1.old,c0=c.old)$integral
Ip3[i]<-integrate(gp3,lower=lp2,upper=lp3,x=i-1,a=a.old,b=b.old,
 a1=a1.old,b1=b1.old,c0=c.old)$integral}
marg<-I1+I2
# constant joint prior density of (theta, S)
w<-1/((b.old-a.old)*(b1.old-a1.old))
thetahat<-(Ip2+Ip3)/marg
# credible interval thetahat +/- d, clipped to the prior support [a, b]
l1<-pmax(thetahat-d,a.old)
l2<-pmin(thetahat+d,b.old)
a.new<-l1
b.new<-l2
lp1.new<-p1(a.new,a1.old,c.old)
lp2.new<-p1(a.new,b1.old,c.old)
lp3.new<-p2(b.new,a1.old,c.old)
lp4.new<-p2(b.new,b1.old,c.old)
# posterior probability of [a.new, b.new] for each x
for(i in 1:(n+1)){if(marg[i] > 0 && a.new[i]==0){
I1[i]<-integrate(g2,lower=lp2.new[i],upper=lp3.new[i],x=i-1,
 a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral
I2[i]<-integrate(g3,lower=lp3.new[i],upper=lp4.new[i],x=i-1,
 a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral
post.prob[i]<-I1[i]+I2[i]}
if(marg[i]==0){post.prob[i]<-0}
if(marg[i] > 0 && a.new[i] > 0){
if(lp2.new[i] <= lp3.new[i]){
I1[i]<-integrate(g1,lower=lp1.new[i],upper=lp2.new[i],x=i-1,
 a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral
I2[i]<-integrate(g2,lower=lp2.new[i],upper=lp3.new[i],x=i-1,
 a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral
I3[i]<-integrate(g3,lower=lp3.new[i],upper=lp4.new[i],x=i-1,
 a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral}
else if(lp2.new[i] > lp3.new[i]){
I1[i]<-integrate(g1,lower=lp1.new[i],upper=lp3.new[i],x=i-1,
 a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral
I2[i]<-integrate(g4,lower=lp3.new[i],upper=lp2.new[i],x=i-1,
 a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral
I3[i]<-integrate(g3,lower=lp2.new[i],upper=lp4.new[i],x=i-1,
 a=a.new[i],b=b.new[i],a1=a1.old,b1=b1.old,c0=c.old)$integral}
post.prob[i]<-I1[i]+I2[i]+I3[i]}}
average.cov<-sum(post.prob*w)
sum.marg<-sum(marg*w)
return(average.cov,sum.marg)}

In this program, we supposed that a = 0, since in most practical cases it is. A similar program can be written when a > 0. In that case, the functions gp1 and gp4 have to be integrated on the same regions as g1 and g4, and the results of these integrations are added to the integrals of gp2 and gp3 to get the posterior mean of θ.

3. The SIR program to calculate the posterior average coverage probability when the specificity is known, the prior distribution of the sensitivity is U[a1, b1] and the prior distribution of θ is U[a, b].

uppertheta<-vector()
tpar<-vector()
marginal<-vector()
sw<-vector()
# importance weights: binomial likelihood of i-1 positives for each prior draw
weight[i,]<-dbinom(i-1,size,(theta*sens+(1-theta)*(1-spec)))
sw[i]<-sum(weight[i,])
if(sw[i] > 0){
posttheta[i,]<-sample(theta,size2,replace=T,prob=weight[i,])}
else {posttheta[i,]<-rep(0,size1)}
marginal[i]<-sum(weight[i,])/size1
# interval of length l centred at the posterior mean
lowertheta[i]<-mean(posttheta[i,])-l/2
uppertheta[i]<-mean(posttheta[i,])+l/2
tpar[i]<-length(posttheta[i,][posttheta[i,] >= lowertheta[i] &
 posttheta[i,] <= uppertheta[i]])}
totalcov<-sum(probability*marginal)
return(tpar,marginal,weight,sw,totalcov,posttheta,lowertheta,
 uppertheta,probability)}

In the above SIR program, the prior distributions of θ and S were taken to be uniform, but a similar SIR program can be written for any other prior distribution.

Appendix D

Regions of integration

In this Appendix, we describe the regions of integration needed to calculate the exact posterior average coverage probability, when the prior distributions of the sensitivity, the specificity and θ are U[a1, b1], U[a2, b2] and U[a, b], respectively. These regions were introduced in section 6.4.2.

First fix s and look at the two variables c and p. We have $a_2 \le c \le b_2$ and $a(s + c - 1) + (1 - c) \le p \le b(s + c - 1) + (1 - c)$, which means that p is between the two lines

$$-(1-a)c + 1 - a + as \le p \le -(1-b)c + 1 - b + bs.$$

Let

$$p_1 = as + (1-a)(1-a_2), \qquad p_2 = as + (1-a)(1-b_2), \qquad p_3 = bs + (1-b)(1-a_2)$$

and

$$p_4 = bs + (1-b)(1-b_2).$$

Depending on s, two cases may arise: $p_1 \le p_4$ or $p_1 > p_4$. See figure D.1.

Figure D.1: Regions of integration.

The inequality $p_1 \le p_4$ is equivalent to

$$as + (1-a)(1-a_2) \le bs + (1-b)(1-b_2),$$

so that, since b > a, $p_1 \le p_4$ if and only if $s \ge s_1$, where

$$s_1 = \frac{(1-a)(1-a_2) - (1-b)(1-b_2)}{b-a}.$$

Therefore, if $s \ge s_1$ we have $p_2 \le p_1 \le p_4 \le p_3$, and if $s \le s_1$ we have $p_2 \le p_4 \le p_1 \le p_3$. Fix $s \le s_1$. Let $R_{11}$ be the region bounded by the lines

$c = b_2$, $p = -(1-a)c + 1 - a + as$, $p = p_4$,

and let $R_{12}$ be the region bounded by the lines

$p = p_1$, $p = p_4$, $p = -(1-a)c + 1 - a + as$, $p = -(1-b)c + 1 - b + bs$.

Conversely, we may have $s \ge s_1$. In this case, let $R_{21}$ be the region bounded by the lines

$p = p_1$, $p = -(1-a)c + 1 - a + as$, $c = b_2$,

let $R_{22}$ be the region bounded by the lines

$p = p_1$, $p = p_4$, $c = a_2$, $c = b_2$,

and let $R_{23}$ be the region bounded by the lines

$p = p_4$, $p = -(1-b)c + 1 - b + bs$, $c = a_2$.
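The breakpoints p1 to p4 and the threshold s1 that decides between the two orderings can be checked numerically. The sketch below (names hypothetical, assuming b > a) evaluates them for a given s:

```python
def breakpoints(s, a, b, a2, b2):
    """p-values where the lines p = a(s+c-1)+1-c and p = b(s+c-1)+1-c meet c = a2 and c = b2."""
    p1 = a * s + (1 - a) * (1 - a2)
    p2 = a * s + (1 - a) * (1 - b2)
    p3 = b * s + (1 - b) * (1 - a2)
    p4 = b * s + (1 - b) * (1 - b2)
    return p1, p2, p3, p4


def s_threshold(a, b, a2, b2):
    """Sensitivity s1 at which p1 = p4 (requires b > a)."""
    return ((1 - a) * (1 - a2) - (1 - b) * (1 - b2)) / (b - a)
```

For example, with a = 0.1, b = 0.6, a2 = 0.9 and b2 = 0.98, s1 = 0.164; at s = 0.5 the ordering is p2 ≤ p1 ≤ p4 ≤ p3, and at s = 0.15 it is p2 ≤ p4 ≤ p1 ≤ p3.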


See figure D.1. With the above notation, the marginal probability function of x can be written as the sum of 6 triple integrals.

As in the case of known c discussed in section 5.3, we can interchange the order of integration of the first two integrals. We will explain the procedure for the case a ≠ 0 and for the first integral.

We have $a_1 \le s \le s_1$ and $p_2 \le p \le p_4$, that is, $as + (1-a)(1-b_2) \le p \le bs + (1-b)(1-b_2)$. Let

and

See figure D.1. If $p_{11} \le p_{22}$, the integral becomes

Conversely, if $p_{11} \ge p_{22}$, the integral becomes

lPz2 P-l 1 lr j-:+l-a-as L-a j (x, P! S, c)dcdpds + p-(t-b)(L-b?)

b j-:+l -a-as f (x, p, s, c)dcdpds.

1 -a

The remaining integrals can be similarly computed.

Note that the case where a = 0 can be simplified as in the case of fixed specificity (chapter 6, section 6.3). We see that each of the 6 integrals will split into three different integrals. Since we have two cases, we then have a total of 36 triple integrals to evaluate. We will omit the very tedious mathematical details. Similarly, we can calculate θ̂ and cov(x; d). As in the previous section, we first wrote a program in Maple to calculate the double integrals with respect to c and then s as functions of a, b, a1, b1, a2 and b2. Then we wrote an Splus program to integrate these functions with respect to p and calculate the marginal probability function of X, the posterior mean of θ and the average coverage probability. The following algorithm was used:

1. Calculate m(x), the marginal distribution of X.

2. Calculate $\hat\theta$, the posterior mean of θ.

3. For each x, calculate the coverage probability $\mathrm{cov}_x = P(\hat\theta - d \le \theta \le \hat\theta + d \mid x)$.

4. Calculate the average coverage probability over all x, $\mathrm{cov} = \sum_{x=0}^{n} \mathrm{cov}_x\, m(x)$.

Since including all the functions that result from integrating first with respect to c and then s under the different conditions would take more than 40 manuscript pages, and a listing of the program that implements the methods to calculate the average coverage probability would take an additional 30 pages, these programs are not included in this thesis.

Appendix E

The SIR program to calculate the

average coverage probability

Below is the SIR program to calculate the average coverage probability when the prior distribution of θ is U[a, b], the prior distribution of the sensitivity is U[a1, b1] and the prior distribution of the specificity is U[a2, b2]. Similar SIR programs can be written for any other prior distribution.

# draw size1 points from the uniform priors of S, C and theta
sens<-runif(size1,a1,b1)
spec<-runif(size1,a2,b2)
theta<-runif(size1,a,b)
for(i in 1:(size+1)){
# importance weights: binomial likelihood of i-1 positives out of size subjects
weight[i,]<-dbinom(i-1,size,(theta*sens+(1-theta)*(1-spec)))
sw[i]<-sum(weight[i,])
# resample size2 values of theta with probabilities proportional to the weights
if(sw[i] > 0){
posttheta[i,]<-sample(theta,size2,replace=T,prob=weight[i,])}
else {posttheta[i,]<-rep(0,size1)}
marginal[i]<-sum(weight[i,])/size1
# interval of length l centred at the posterior mean
lowertheta[i]<-mean(posttheta[i,])-l/2
uppertheta[i]<-mean(posttheta[i,])+l/2
par1[i]<-length(posttheta[i,][posttheta[i,] >= lowertheta[i] &
 posttheta[i,] <= uppertheta[i]])}
probability<-par1/size2
totalcov<-sum(probability*marginal)
return(totalcov)}

Appendix F

Logistic models

In this appendix, we display the binomial, quasi and quasi-full models and their printouts. We also give the nested model.
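A quasi-likelihood family with logit link and constant variance leads to the same estimating equations as nonlinear least squares for the mean cov - cov.θ.start = e^η/(1 + e^η). For the one-covariate model of Example 7.2.1 (η = u + v/n, cov.θ.start = 0.1), this can be sketched in Python with a Gauss-Newton iteration started from an ordinary least-squares fit on the logit scale. This only illustrates the fitting principle; the models below contain more covariates and were fitted with glm, and the function name is hypothetical.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_quasi_logit(ns, covs, start=0.1, iters=50, tol=1e-12):
    """Fit cov - start = expit(u + v / n) by nonlinear least squares.

    Starts from an ordinary least-squares fit of logit(cov - start) on 1/n,
    then refines (u, v) with Gauss-Newton steps.
    """
    xs = [1.0 / n for n in ns]
    zs = [math.log((y - start) / (1 - (y - start))) for y in covs]  # logit(cov - start)
    xbar, zbar = sum(xs) / len(xs), sum(zs) / len(zs)
    v = sum((x - xbar) * (z - zbar) for x, z in zip(xs, zs)) / sum((x - xbar) ** 2 for x in xs)
    u = zbar - v * xbar
    for _ in range(iters):
        # Normal equations (J^T J) delta = J^T r for the two parameters.
        a11 = a12 = a22 = r1 = r2 = 0.0
        for x, y in zip(xs, covs):
            mu = expit(u + v * x)
            j1 = mu * (1 - mu)     # d mu / d u
            j2 = j1 * x            # d mu / d v
            res = (y - start) - mu
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            r1 += j1 * res
            r2 += j2 * res
        det = a11 * a22 - a12 * a12
        du = (a22 * r1 - a12 * r2) / det
        dv = (a11 * r2 - a12 * r1) / det
        u, v = u + du, v + dv
        if abs(du) + abs(dv) < tol:
            break
    return u, v
```

Applied to coverages generated exactly from u = -2.219 and v = -47.153 at n = 100, 200, ..., 2000, the sketch recovers those coefficients.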

1. The binomial model

cov.fit.bin<-glm(cov-cov.theta.start~1+w+I(1/(d*(0.05-d)))+
 d+cov.theta.start+I(cov.theta.start^2)+I(1/n),
 family=binomial,data=matrix.data.new)

Call: glm(formula = cov - cov.pi.start ~ 1 + w + d + I(1/(d * (0.05 - d))) +
 cov.pi.start + I(cov.pi.start^2) + I(1/n), family = binomial, data =
 matrix.data.new)
Deviance Residuals:
        Min          1Q       Median          3Q     Max
 -0.2166384 -0.03393597 -0.001393636  0.03095628 0.20126
Coefficients:
                               Value   Std. Error    t value
          (Intercept)  -1.088243e+00 2.603811e+00 -0.4179425
                    w  -1.050587e+00 2.870602e-01 -3.6598139
                    d   3.159586e+01 2.472907e+01  1.2776807
I(1/(d * (0.05 - d)))  -1.904313e-04 7.687555e-04 -0.2477138
         cov.pi.start   3.973535e+00 5.307734e+00  0.7486310
    I(cov.pi.start^2)  -5.378742e+00 4.364519e+00 -1.2323792
               I(1/n)  -8.855294e+01 5.012866e+01 -1.7665132
(Dispersion Parameter for Binomial family taken to be 1)
Null Deviance: 41.67006 on 1258 degrees of freedom
Residual Deviance: 3.308388 on 1252 degrees of freedom
Number of Fisher Scoring Iterations: 5
Correlation of Coefficients:
                      (Intercept)          w          d I(1/(d * (0.05 - d)))
                    w  -0.1843766
                    d   0.0698851 -0.0488639
I(1/(d * (0.05 - d)))  -0.9481826  0.0039381 -0.0088116
         cov.pi.start  -0.8785893  0.0553071 -0.4203107  0.7670587
    I(cov.pi.start^2)   0.8699514 -0.0601915  0.3412218 -0.7515563
               I(1/n)  -0.0121085  0.0500585 -0.0398480 -0.0678532
                      cov.pi.start I(cov.pi.start^2)
                    w

2. The quasi model

cov.fit.quasi.simple<-glm(cov-cov.theta.start~1+w+I(1/(d*(0.05-d)))+d+
 cov.theta.start+I(cov.theta.start^2)+I(1/n),
 family=quasi(link=logit,variance=constant),data=matrix.data.new)

Call: glm(formu1a = cov - cov.pi. start - 1 + w + I(l/(d * (0 .O5 - d ) 1) +

d + cov.pi.start + I(cov.pi.start-2) + I(l/n) ,family =

quasi(1in.k = logit, variance = constant), data = matrix.data.new)

Deviance Residuals:
Min 1Q Median 3Q Max
-0.04350419 -0.009659104 -0.000379838 0.009870335 0.06229331

Coefficients:
Value Std. Error t value
(Intercept) -1.112255e+00 1.329215e-01 -8.367760

w -1.066918e+00 1.486106e-02 -71.792844

I(1/(d * (0.05 - d))) -1.699941e-04 3.812174e-05 -4.459243

d 2.938892e+01 1.162614e+00 25.278320

cov.pi.start 4.131180e+00 3.020775e-01 13.675895

I(cov.pi.start-2) -5.468591e+00 2.601563e-01 -21.020407

I(l/n) -8.277740e+01 2.598457e+00 -31.856364

(Dispersion Parameter for Quasi-likelihood family taken to be 0.0002206 )

Null Deviance: 3.956394 on 1258 degrees of freedom

Correlation of Coefficients:
(Intercept) w I(1/(d * (0.05 - d))) d
w -0.1873668
I(1/(d * (0.05 - d))) -0.9439987 0.0177735
d 0.2459611 -0.0627219 -0.1664503
cov.pi.start -0.9029343 0.0528542 0.7832530 -0.5225833
I(cov.pi.start^2) 0.8951729 -0.0521580 -0.7707318 0.4517733
I(1/n) -0.0157598 0.0433608 -0.0611775 -0.0475988

cov.pi.start I(cov.pi.start^2)
w
I(1/(d * (0.05 - d)))
d
cov.pi.start
I(cov.pi.start^2) -0.9905266
I(1/n) 0.0407937 -0.0450998
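The fitted simple quasi model can be evaluated directly from the Value column above. The Python sketch below is illustrative only. predicted_coverage and COEF are hypothetical names, and the sketch assumes that the model's fitted value is the excess of coverage over cov.pi.start, on the inverse-logit scale. That assumption follows the plotting code of Appendix G, which adds cov.theta.start back to the fitted values.

```python
import math

# Value column of the simple quasi model printout, keyed by the printed term labels.
COEF = {
    "(Intercept)": -1.112255,
    "w": -1.066918,
    "I(1/(d * (0.05 - d)))": -1.699941e-04,
    "d": 2.938892e+01,
    "cov.pi.start": 4.131180,
    "I(cov.pi.start^2)": -5.468591,
    "I(1/n)": -8.277740e+01,
}


def predicted_coverage(w, d, start, n, coef=COEF):
    """Hypothetical helper: start + inverse logit of the linear predictor."""
    terms = {
        "(Intercept)": 1.0,
        "w": w,
        "I(1/(d * (0.05 - d)))": 1.0 / (d * (0.05 - d)),  # undefined at d = 0 or d = 0.05
        "d": d,
        "cov.pi.start": start,
        "I(cov.pi.start^2)": start ** 2,
        "I(1/n)": 1.0 / n,
    }
    eta = sum(coef[name] * value for name, value in terms.items())
    # fitted value of cov - cov.pi.start on the response scale
    fitted = 1.0 / (1.0 + math.exp(-eta))
    return start + fitted
```

The coefficient of I(1/n) is negative, so with w, d and cov.pi.start held fixed, the predicted coverage increases with n.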

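3. The quasi.full model. This model starts from all the variables together with the pairwise interactions among d, cov.theta.start, msens, mspec and their squares. The step() function in S-Plus then selects the submodel with the smallest AIC.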

cov.fit.quasi<-glm(cov-cov.theta.start~1+w+I(1/(d*(0.05-d)))+
    d+cov.theta.start+I(cov.theta.start^2)+
    msens+I(msens^2)+mspec+I(mspec^2)+I(1/n)
    +(d+cov.theta.start+I(cov.theta.start^2)+
    msens+I(msens^2)+mspec+I(mspec^2))^2,
    family=quasi(link=logit,variance=constant),data=matrix.data.new)

formula(cov.fit.quasi)

# stepwise search over the terms of the full model
cov.fit.quasi.step<-step(cov.fit.quasi,scope=~.)

Call: glm(formula = structure(.Data = cov - cov.pi.start ~ w +
    I(1/(d * (0.05 - d))) + d + cov.pi.start + I(cov.pi.start^2) +
    I(msens^2) + mspec + I(mspec^2) + I(1/n) + d:I(msens^2) +
    d:I(mspec^2) + cov.pi.start:I(msens^2) + cov.pi.start:mspec +
    I(cov.pi.start^2):I(msens^2) + I(cov.pi.start^2):
    mspec, class = "formula"), family = quasi(link = logit, variance =
    constant), data = matrix.data.new)

Deviance Residuals:
Min 1Q Median 3Q Max
-0.04529682 -0.009005072 -0.0004487669 0.008003408 0.0546313

Coefficients:
Value Std. Error t value
(Intercept) -4.621282e+00 7.448138e-01 -6.2046145

w -4.918366e-01 5.446493e-02 -9.0303351

I(1/(d * (0.05 - d))) -1.485416e-04 3.768448e-05

d 7.271363e+01 9.034557e+00

cov.pi.start 7.752816e+00 2.277173e+00

I(cov.pi.start^2) -1.078100e+01 1.956258e+00

I(msens^2) 2.603876e-01 2.316828e-01

mspec 1.188436e+00 1.246512e+00

I(mspec^2) 1.785556e+00 6.329351e-01

I(1/n) -8.200729e+01 2.327415e+00

d:I(msens^2) -1.638015e+01 ?.245770e+00

d:I(mspec^2) -3.694298e+01 1.064003e+01

cov.pi.start:I(msens^2) 2.672344e+00 1.143772e+00

cov.pi.start:mspec -6.274561e+00 2.457733e+00

I(cov.pi.start^2):I(msens^2) -2.256186e+00 1.006317e+00

I(cov.pi.start^2):mspec 7.678926e+00 2.092371e+00

(Dispersion Parameter for Quasi-likelihood family taken to be 0.0001778 )

Null Deviance: on 1258 degrees of freedom

Correlation of Coefficients:
(Intercept) w I(1/(d * (0.05 - d)))
w -0.4809546
I(1/(d * (0.05 - d))) -0.2397827 -0.0080458
d 0.2684785 0.0606991 -0.0950453
cov.pi.start -0.6525555 -0.0002743 0.2093212
I(cov.pi.start^2) 0.6722751 -0.0501112 -0.1967440
I(msens^2) -0.2367838 0.3005772 -0.2456594
mspec -0.9177480 0.3248278 0.1150115
I(mspec^2) 0.7562479 -0.2536979 -0.0685331
I(1/n) 0.0209653 -0.0034014 -0.0329474
d:I(msens^2) 0.0126985 0.1320700 -0.1466124
d:I(mspec^2) -0.2783140 -0.1346187 0.1628651
cov.pi.start:I(msens^2) 0.0569443 -0.0441566 0.2992395
cov.pi.start:mspec 0.6250244 0.0144031 -0.2151961
I(cov.pi.start^2):I(msens^2) -0.0541779 0.0123761 -0.3313031
I(cov.pi.start^2):mspec -0.6529259 0.0481267 0.2196878

d cov.pi.start I(cov.pi.start^2)
w
I(1/(d * (0.05 - d)))
d
cov.pi.start -0.6600397
I(cov.pi.start^2) 0.4914913 -0.9628544
I(msens^2) -0.0582930 0.1845356
mspec -0.2409717 0.4828702
I(mspec^2) 0.1705694 -0.2166373
I(1/n) 0.0163372 -0.0298150
d:I(msens^2) -0.3129960 0.0495458
d:I(mspec^2) -0.8283379 0.6422962
cov.pi.start:I(msens^2) 0.2101990 -0.1791927
cov.pi.start:mspec 0.5853145 -0.9399950
I(cov.pi.start^2):I(msens^2) -0.1917929 0.1919941 -0.2142826
I(cov.pi.start^2):mspec -0.4255888 0.9066909 -0.9395096

I(msens^2) mspec I(mspec^2) I(1/n)
w
I(1/(d * (0.05 - d)))
d
cov.pi.start
I(cov.pi.start^2)
I(msens^2)
mspec 0.0305303
I(mspec^2) -0.0282835 -0.9327618
I(1/n) -0.0481272 -0.0135424 0.0065108
d:I(msens^2) 0.3165893 -0.0793904 0.0727580 -0.0287008
d:I(mspec^2) -0.1127687 0.2905244 -0.2150540 -0.0062113
cov.pi.start:I(msens^2) -0.8672577 0.0875644 -0.0657743 0.0561111
cov.pi.start:mspec 0.0975931 -0.5182995 0.2392215 0.0186181
I(cov.pi.start^2):I(msens^2) 0.8619857 -0.0738785 0.0482081 -0.0637246
I(cov.pi.start^2):mspec -0.0763776 0.5237750 -0.2276340 -0.0204177

d:I(msens^2) d:I(mspec^2) cov.pi.start:I(msens^2)
w
I(1/(d * (0.05 - d)))
d
cov.pi.start
I(cov.pi.start^2)
I(msens^2)
mspec
I(mspec^2)
I(1/n)
d:I(msens^2)
d:I(mspec^2) -0.2601305
cov.pi.start:I(msens^2) -0.6488201 0.1523801
cov.pi.start:mspec 0.1677280 -0.6963843 -0.1437198
I(cov.pi.start^2):I(msens^2) 0.5081962 -0.0884362 -0.9730796
I(cov.pi.start^2):mspec -0.0946073 0.4934935 0.1171145

cov.pi.start:mspec I(cov.pi.start^2):I(msens^2)
w
I(1/(d * (0.05 - d)))
d
cov.pi.start
I(cov.pi.start^2)
I(msens^2)
mspec
I(mspec^2)
I(1/n)
d:I(msens^2)
d:I(mspec^2)
cov.pi.start:I(msens^2)
cov.pi.start:mspec
I(cov.pi.start^2):I(msens^2) 0.1153060
I(cov.pi.start^2):mspec -0.9573718

4. The nested model

cov.aov<-glm(cov-cov.theta.start~I(1/n)+


Appendix G

Examples

Model to fit the data of the second column of Table 7.3

cov.theta.start<-0.1

matrix.refugees<-scan("data.refugees.0.05",
    list(n=0,cov1=0,cov2=0,cov3=0,cov4=0,cov5=0,cov6=0))
n<-matrix.refugees$n
cov1<-matrix.refugees$cov1

cov2<-matrix.refugees$cov2





# quasi fit (logit link, constant variance) of the shifted coverage on 1/n
cov1.refugees<-glm(cov1-cov.theta.start~1+I(1/n),
    family=quasi(link=logit,variance=constant),
    data=matrix.refugees)
summary(cov1.refugees)
coef(cov1.refugees)
postscript(file="refugees1.ps",print=F,horizontal=F)
par(mfrow=c(2,1))
# top panel: observed coverage, with the fitted values shifted back by cov.theta.start
plot(n,cov1)
par(new=T)
lines(n,fitted(cov1.refugees)+cov.theta.start)
# bottom panel: residuals against fitted values
plot(fitted(cov1.refugees),cov1-cov.theta.start-fitted(cov1.refugees))
dev.off()
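The example above fits the shifted coverage cov1 - cov.theta.start against 1/n. It uses a quasi family with a logit link and a constant variance function. With a constant variance function, the quasi-likelihood estimating equations are the normal equations of nonlinear least squares. In that case, the iteratively reweighted least squares steps used by glm coincide with Gauss-Newton steps. The Python below is a minimal sketch under these assumptions:

* The model has an intercept, so E[y] = expit(a + b/n).
* fit_quasi_logit and expit are hypothetical names.
* The starting values come from a linear fit on the logit scale, which is a choice made for this sketch.
* The dispersion is estimated as the residual sum of squares over n - 2 residual degrees of freedom.

```python
import math


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_quasi_logit(n, y, max_iter=100, tol=1e-10):
    """Least-squares fit of E[y] = expit(a + b/n) by Gauss-Newton.

    Returns (a, b, dispersion), with dispersion = RSS / (len(y) - 2).
    """
    x = [1.0 / ni for ni in n]
    # Starting values (an assumption of this sketch): ordinary least squares
    # of logit(y) on x, with y clamped into (0, 1) so the logit exists.
    eps = 1e-6
    z = []
    for yi in y:
        p = min(max(yi, eps), 1.0 - eps)
        z.append(math.log(p / (1.0 - p)))
    xbar = sum(x) / len(x)
    zbar = sum(z) / len(z)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    b = sxz / sxx
    a = zbar - b * xbar

    for _ in range(max_iter):
        mu = [expit(a + b * xi) for xi in x]
        g = [m * (1.0 - m) for m in mu]          # d mu / d eta for the logit link
        r = [yi - m for yi, m in zip(y, mu)]     # raw residuals (constant variance)
        # Solve (J'J) delta = J'r, where row i of J is (g_i, g_i * x_i)
        s11 = sum(gi * gi for gi in g)
        s12 = sum(gi * gi * xi for gi, xi in zip(g, x))
        s22 = sum((gi * xi) ** 2 for gi, xi in zip(g, x))
        t1 = sum(gi * ri for gi, ri in zip(g, r))
        t2 = sum(gi * xi * ri for gi, xi, ri in zip(g, x, r))
        det = s11 * s22 - s12 * s12
        da = (s22 * t1 - s12 * t2) / det
        db = (s11 * t2 - s12 * t1) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break

    mu = [expit(a + b * xi) for xi in x]
    rss = sum((yi - m) ** 2 for yi, m in zip(y, mu))
    return a, b, rss / (len(y) - 2)
```

In this sketch, y plays the role of cov1 - cov.theta.start. Adding cov.theta.start back to expit(a + b/n) gives the fitted coverage curve drawn by the lines() call.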

Bibliography

[1] Albert, J.H. (1993). Teaching Bayesian statistics using sampling methods and MINITAB. The American Statistician 47 (3), 182-191.

[2] Arkin, C.F. and Wachtel, M.S. (1990). How many patients are necessary to assess test performance? Journal of the American Medical Association 263, 275-278.

[3] Berger, J.O. (1985). Statistical decision theory and Bayesian analysis. Springer-Verlag, New York.

[4] Blackwelder, W.C. (1982). Proving the null hypothesis in clinical trials. Controlled Clinical Trials 3, 345-353.

[5] Casella, G. and Berger, R.L. (1990). Statistical inference. Wadsworth & Brooks, California.

[6] Centor, R.M. (1992). Estimating confidence intervals of likelihood ratios. Medical Decision Making 12 (3), 229-233.

[7] Chaloner, K.M. and Verdinelli, I. (1995). Bayesian experimental design: a review. Statistical Science 10 (3), 273-304.

[8] Chaloner, K.M. (1996). Estimation of prior distributions. Bayesian Biostatistics. Marcel Dekker, New York.

[9] Chambers, J.M. and Hastie, T.J. (1992). Statistical models in S. Wadsworth & Brooks, California.

[10] Chinn, S. and Burney, P.G.J. (1987). On measuring repeatability of data from self-administered questionnaires. International Journal of Epidemiology 16 (1).

[11] Cohen, J.R. (1977). Statistical power analysis for the behavioral sciences, Revised Edition. Academic Press, New York.

[12] Dasgupta, A. and Vidakovic, B. (1996). Sample size problems in ANOVA: Bayesian point of view. Submitted.

[13] Dawid, A.P. and Skene, A.M. (1979). Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics 28, 20-28.

[14] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39, 1-38.

[15] Desu, M.M. and Raghavarao, D. (1990). Sample size methodology. Academic Press, New York.

[16] Dobson, A.J. and Gebski, V.J. (1986). Sample size for comparing two independent proportions using the continuity-corrected arc sine transformation. The Statistician 35, 51-53.

[17] Donner, A. (1984). Approaches to sample size estimation in the design of clinical trials: a review. Statistics in Medicine 3, 199-214.

[18] Evans, M. and Swartz, T. (1995). Methods for approximating integrals in statistics with special emphasis on Bayesian integration problems. Statistical Science 10 (3), 254-272.

[19] Feinstein, A.R. (1977). Clinical biostatistics. C.V. Mosby, Saint Louis.

[20] Fleiss, J.L., Tytun, A. and Ury, H.K. (1980). A simple approximation for calculating sample size for comparing independent proportions. Biometrics 36, 343-346.

[21] Fleiss, J.L. (1981). Statistical methods for rates and proportions. Wiley, New York.

[22] Gail, M. (1973). The determination of sample sizes for trials involving several independent 2 x 2 tables. Journal of Chronic Diseases 20, 233-239.

[23] Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (1996). Markov chain Monte Carlo in practice. Chapman & Hall, New York.

[24] Johnson, W.O. and Gastwirth, J.L. (1991). Bayesian inference for medical screening tests: approximations useful for the analysis of acquired immune deficiency syndrome. Journal of the Royal Statistical Society, Series B, 53 (2), 427-439.

[25] Joseph, L., Gyorkos, T.W. and Coupal, L. (1995a). Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard. American Journal of Epidemiology 141 (3), 263-272.

[26] Joseph, L., Wolfson, D.B. and du Berger, R. (1995b). Sample size calculations for binomial proportions via highest posterior density intervals. The Statistician 44, 143-154.

[27] Joseph, L., du Berger, R. and Belisle, P. (1996). Bayesian and mixed Bayesian/likelihood criteria for sample size determination. Statistics in Medicine 15 (in press).

[28] Joseph, L. and Belisle, P. (19%'). Bayesian sample size determination for normal means and difference between normal means. The Statistician (to appear).

[29] Joseph, L. and Gyorkos, T.W. (1996). Inferences for likelihood ratios in the absence of a gold standard. Medical Decision Making (to appear).

[30] Kung Jong Lui (1991). Sample sizes for related measurements in dichotomous data. Statistics in Medicine 10, 463-472.

[31] Lachin, J.M. (1981). Introduction to sample size determination and power analysis for clinical trials. Controlled Clinical Trials 2, 93-111.

[32] Lemeshow, S., Hosmer Jr., D.W., Klar, J. and Lwanga, S.K. (1990). Adequacy of sample size in health studies. John Wiley & Sons, New York.

[33] Lew, R.A. and Levy, P.S. (1989). Estimation of prevalence on the basis of screening tests. Statistics in Medicine 8, 1225-1230.

[34] Makuch, R. and Simon, R. (1978). Sample size requirements for evaluating a conservative therapy. Cancer Treatment Reports, 1037-1040.

[35] McCullagh, P. and Nelder, J.A. (1989). Generalized linear models. Chapman and Hall, New York.

[36] Pham-Gia, T. and Turkkan, N. (1992). Sample size determination in Bayesian analysis. The Statistician 41, 389-392.

[37] Raiffa, H. and Schlaifer, R. (1961). Applied statistical decision theory. Harvard Business School, Boston.

[38] Rogan, W.J. and Gladen, B. (1978). Estimating prevalence from the results of a screening test. American Journal of Epidemiology 107, 71-76.

[39] Rohatgi, V.K. (1984). Statistical Inference. John Wiley & Sons, New York.

[40] Rubin, D. (1987). Multiple imputation for nonresponse in surveys. John Wiley, New York.

[41] Schlesselman, J.J. (1974). Sample size requirements in cohort and case-control studies of disease. American Journal of Epidemiology 99, 381-384.

[42] Schork, M.A. and Remington, R.D. (1967). The determination of sample size in treatment-control comparisons for chronic disease studies in which dropout or non-adherence is a problem. Journal of Chronic Diseases 20, 233-239.

[43] Simel, D.L., Samsa, G.P. and Matchar, D.B. (1991). Likelihood ratios with confidence: sample size estimation for diagnostic test studies. Journal of Clinical Epidemiology 44 (8), 763-770.

[44] Snedecor, G.W. and Cochran, W.G. (1977). Statistical methods, Seventh Edition. Academic Press, New York.

[45] Spiegelhalter, D.J. and Freedman, L.S. (1986). A predictive approach to selecting the size of a clinical trial, based on subjective clinical opinion. Statistics in Medicine 5, 1-13.

[46] Spiegelhalter, D.J., Freedman, L.S. and Parmar, M.K.B. (1994). Bayesian approaches to randomized trials. Journal of the Royal Statistical Society, Series A, 157, 357-416.

[47] Staquet, M., Rozencweig, M., Lee, Y.J. et al. (1981). Methodology for the assessment of new dichotomous diagnostic tests. Journal of Chronic Diseases 34, 599-610.

[48] Taragin, M.I., Wildman, D. and Trout, R. (1993). Assessing disease prevalence from inaccurate test results: teaching an old dog new tricks. Medical Decision Making 14, 269-273.

[49] Turkkan, N. and Pham-Gia, T. (1993). Computation of the highest posterior density intervals in Bayesian analysis. Journal of Statistical Computation and Simulation 44, 243-250.

[50] Viana, M.A.G. and Ramakrishnan, V. (1992). Bayesian estimates of predictive values and related parameters of a diagnostic test. The Canadian Journal of Statistics 20 (3), 311-321.

[51] Viana, M.A.G. (1994). Bayesian small sample estimation of misclassified multinomial data. Biometrics 50, 237-243.

[52] Walter, S.D. and Irwig, L.M. (1988). Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review. Journal of Clinical Epidemiology 41 (9), 923-937.

[53] Weiner, D.A., Ryan, T.J., McCabe, C.H. et al. (1979). Exercise stress testing: correlation among history of angina, ST-segment response and prevalence of coronary artery disease in the Coronary Artery Surgery Study (CASS). New England Journal of Medicine 301, 230-235.

[54] White, A.A. and Landis, J.R. (1982). A general categorical data methodology for evaluating medical diagnostic tests. Communications in Statistics: Theory and Methods 11, 567-605.

[55] Wickramaratne, P.J. (1995). Sample size determination in epidemiologic studies. Statistical Methods in Medical Research 4, 311-337.

[56] Wilson, M. and Ware, D. (1991). Evaluation of the Diagnostic Pasteur Platelia Toxo IgG and Toxo IgM kits for detection of human antibodies to Toxoplasma gondii. 91st General Meeting, American Society for Microbiology, Dallas, Texas, May 3-9, abstract number V-23.
