
STATISTICS IN MEDICINE
Statist. Med. 2002; 21:3757–3774 (DOI: 10.1002/sim.1330)

A comparison of confidence interval methods for the intraclass correlation coefficient in cluster randomized trials‡

Obioha C. Ukoumunne∗;†

Department of Public Health Sciences; King’s College London; 5th Floor; Capital House; 42 Weston Street; London SE.1 3QD, U.K.

SUMMARY

This study compared different methods for assigning confidence intervals to the analysis of variance estimator of the intraclass correlation coefficient ($\rho$). The context of the comparison was the use of $\rho$ to estimate the variance inflation factor when planning cluster randomized trials. The methods were compared using Monte Carlo simulations of unbalanced clustered data and data from a cluster randomized trial of an intervention to improve the management of asthma in a general practice setting. The coverage and precision of the intervals were compared for data with different numbers of clusters, mean numbers of subjects per cluster and underlying values of $\rho$. The performance of the methods was also compared for data with Normal and non-Normally distributed cluster specific effects. Results of the simulations showed that methods based upon the variance ratio statistic provided greater coverage levels than those based upon large sample approximations to the standard error of $\rho$. Searle’s method provided close to nominal coverage for data with Normally distributed random effects. Adjusted versions of Searle’s method to allow for lack of balance in the data generally did not improve upon it either in terms of coverage or precision. Analyses of the trial data, however, showed that limits provided by Thomas and Hultquist’s method may differ from those of the other variance ratio statistic methods when the arithmetic mean cluster size differs markedly from the harmonic mean cluster size. The simulation results demonstrated that marked non-Normality in the cluster level random effects compromised the performance of all methods. Confidence intervals for the methods were generally wide relative to the underlying size of $\rho$, suggesting that there may be great uncertainty associated with sample size calculations for cluster trials where large clusters are randomized. Data from cluster based studies with sample sizes much larger than those typical of cluster randomized trials are required to estimate $\rho$ with a reasonable degree of precision. Copyright © 2002 John Wiley & Sons, Ltd.

KEY WORDS: intraclass correlation coefficient; cluster randomized trials

∗Correspondence to: Obioha C. Ukoumunne, Department of Public Health Sciences; King’s College London; 5th Floor; Capital House; 42 Weston Street; London SE.1 3QD, U.K.

†E-mail: [email protected]
‡Presented at the Twenty-second Annual Conference of the International Society for Clinical Biostatistics, 19–23 August 2001, Stockholm.

Contract/grant sponsor: Medical Research Council.

Received January 2002; Accepted April 2002
Copyright © 2002 John Wiley & Sons, Ltd.

3758 O. C. UKOUMUNNE

1. INTRODUCTION

Cluster randomized trials are studies for which entire clusters of subjects are allocated to intervention or treatment groups but outcomes are measured on individuals [1, 2]. The clusters may be health organizational units such as general practices [3] and hospitals [4] or geographically defined communities [5, 6]. Correlation between responses in the same cluster invalidates the use of standard statistical methods for sample size calculation and analysis at the individual level, as they assume that observations are independent. The application of standard sample size formulae may result in the number of required subjects for the trial being underestimated. An approach commonly used to adjust standard formulae is to multiply the variance term by the design effect [7] or variance inflation factor (VIF) [1] given by

$$\mathrm{VIF} = 1 + (n - 1)\rho \qquad (1)$$

where $n$ is the number of subjects sampled from each cluster and $\rho$ is the intraclass (intracluster) correlation coefficient of the outcome. $\rho$ quantifies the level of similarity between the responses of subjects within the same cluster or, equivalently when positive, the proportion of variation in the outcome that is attributable to the natural variation between clusters rather than between subjects within clusters. The larger the correlation between subjects with respect to the outcome, the greater $\rho$ is, and hence the greater the sample size required in the trial to compensate for allocating cluster units.

One of the difficulties encountered in the application of the variance inflation factor is that the intraclass correlation coefficient is generally not known at the planning stage of a trial. Several authors have commented on the paucity of knowledge of $\rho$ for health related outcomes and the need to make estimates more widely available to the research community [8–11]. Producing such estimates involves the analysis of data from studies for which individual level outcomes can be linked to cluster membership. There is a risk that estimates of $\rho$ may be imprecise, as they are usually calculated from studies with a small number of cluster units, often fewer than 30 [12]. The importance of allowing for uncertainty in $\rho$, both when calculating the required sample size and when analysing data from cluster randomized trials, is widely accepted [8, 9, 11–15]. Sensitivity analyses may be implemented by using various estimates of $\rho$ within a plausible range. For sample size calculations in public health trials $\rho$ has often been assumed to take values between 0 and 0.05 [8]. Confidence intervals for the intraclass correlation coefficient offer another way of providing a plausible range of values.

There has not yet been a formal evaluation of approximate methods for assigning confidence intervals to $\rho$ for unbalanced data with the large cluster sizes typically encountered in public health or health services research. Factors affecting the performance of these methods include the number of clusters, the average number of subjects per cluster, the degree of imbalance in the number of subjects per cluster, the distribution of the outcome at the cluster level and the underlying value of $\rho$ itself [16, 17]. The construction of exact confidence intervals for $\rho$ is straightforward for data with Normally distributed errors at the individual and cluster levels where all the clusters contain the same number of subjects [10]. Donner and Wells [16], using Monte Carlo simulations, compared six methods for assigning confidence intervals to the intraclass correlation coefficient for Normally distributed outcomes. Their study focused on the applications of the intraclass correlation coefficient to measure the degree of family resemblance with respect to biological characteristics and interrater reliability. They therefore simulated unbalanced data typical of studies with relatively large numbers of small clusters
(mean size 3.12). Lohr and Divan [17] also compared confidence intervals for $\rho$ using simulated data with cluster sizes that are smaller (mean of five observations) and values of $\rho$ that are generally larger (0.1 and above) than might be expected from studies where general practices or hospitals are sampled or randomized. It is possible that these studies did not imitate the type of imbalance in cluster size that might be expected when clusters containing large numbers of subjects are allocated [16]. This study, motivated by and complementing the work of Donner and Wells [16], uses Monte Carlo simulations and trial data to compare approximate methods for assigning confidence intervals to $\rho$ for continuous outcomes in studies typical of cluster randomized trials that contain relatively small numbers of large clusters. The methods compared broadly include those examined by Donner and Wells [16], and the study investigates their robustness to data imbalance and to non-Normality of the random effects at the cluster level. Data are simulated to have design characteristics typical of trials in which health care and administrative units are randomized, covering larger cluster sizes and concentrating on smaller degrees of intraclass correlation than those studied by Donner and Wells [16].

Section 2 describes the random effects model which was used to simulate clustered data and estimate $\rho$. Section 3 briefly describes the methods for assigning confidence intervals to $\rho$ which were evaluated. The simulation design is described in Section 4. The results of the simulations, and application of the methods to data from a trial in which general practices were randomized, appear in Sections 5 and 6, respectively, with a discussion of these findings in Section 7.

2. THE RANDOM EFFECTS MODEL AND THE ANALYSIS OF VARIANCE ESTIMATOR OF INTRACLASS CORRELATION

Under the random effects, or variance components, model, the estimation of $\rho$ is treated as a variance components problem. Variation between units is modelled at each level of clustering and at the level of the individual. In the one-way random effects model the outcome is assumed to be the sum of an overall mean and two random variables with zero mean and constant variance. It may be expressed as

$$Y_{ij} = \mu + a_i + e_{ij} \qquad (2)$$

where $Y_{ij}$ is the observed response for the $j$th subject in the $i$th cluster, $\mu$ is the overall mean, $a_i$ is the random effect associated with the $i$th cluster and $e_{ij}$ is the residual effect of the $j$th subject within the $i$th cluster. The $a_i$ and $e_{ij}$ are assumed to be independently and identically distributed with zero mean and constant variance, $\sigma_a^2$ and $\sigma_e^2$, respectively. The random effects may also be assumed to be Normally distributed. Normality is not a requirement for the point estimation of $\rho$, but it is a necessary condition for the application of methods of inference [18]. In this study the $e_{ij}$ term is assumed to be Normally distributed throughout, in contrast to the $a_i$ term, which is sometimes allowed to be non-Normal. The terms $\sigma_a^2$ and $\sigma_e^2$ represent the components of outcome variation that are between clusters and between subjects within clusters, respectively.

Table I. Analysis of variance for the unbalanced one-way random effects model.

| Source of variation | Degrees of freedom (d.o.f.) | Sum of squares | Mean square | Expected mean square |
|---|---|---|---|---|
| Between clusters | $k - 1$ | $\mathrm{SSA} = \sum_i n_i(\bar Y_{i\bullet} - \bar Y_{\bullet\bullet})^2$ | $\mathrm{MSA} = \mathrm{SSA}/(k-1)$ | $\sigma_e^2 + \left(\frac{N - \sum_i n_i^2/N}{k-1}\right)\sigma_a^2$ |
| Within clusters | $N - k$ | $\mathrm{SSE} = \sum_i\sum_j (Y_{ij} - \bar Y_{i\bullet})^2$ | $\mathrm{MSE} = \mathrm{SSE}/(N-k)$ | $\sigma_e^2$ |
| Total | $N - 1$ | $\mathrm{SST} = \sum_i\sum_j (Y_{ij} - \bar Y_{\bullet\bullet})^2$ | | |

The intraclass correlation coefficient is defined as the proportion of the total outcome variation that is between clusters (equation (3)):

$$\rho = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2} \qquad (3)$$
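
As a worked illustration of equations (3) and (1), take the variance components used for the simulations in Section 4, $\sigma_a^2 = 50$ and $\sigma_e^2 = 950$, together with a cluster size of $n = 20$ (an illustrative value assumed here for arithmetic convenience):

$$\rho = \frac{50}{50 + 950} = 0.05, \qquad \mathrm{VIF} = 1 + (20 - 1) \times 0.05 = 1.95$$

Under these assumptions a trial allocating clusters of 20 subjects would need close to twice as many subjects as an individually randomized trial of the same power.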

The analysis of variance estimator of $\rho$ is the most commonly used estimator within the framework of the random effects model [18]. It is obtained from an analysis of variance of outcome on cluster. The analysis of variance table corresponding to the unbalanced one-way random effects model (equation (2)) is shown in Table I, where $k$ is the number of clusters, $N$ is the total number of subjects, $n_i$ is the number of subjects in the $i$th cluster, $\bar Y_{i\bullet}$ is the mean of the responses in the $i$th cluster, $\bar Y_{\bullet\bullet}$ is the mean of all responses, SSA is the between-cluster sum of squares, SSE is the within-cluster sum of squares, SST is the total sum of squares, MSA is the between-cluster mean square, MSE is the within-cluster mean square, and $\sigma_a^2$ and $\sigma_e^2$ are the between- and within-cluster variance components, respectively. If the number of subjects is the same in each cluster, the table reduces to the analysis of variance for the balanced one-way random effects model. Estimates of the between- and within-cluster components of variance are obtained as linear combinations of the between- and within-cluster mean squares using the formulae

$$\hat\sigma_a^2 = \frac{\mathrm{MSA} - \mathrm{MSE}}{n_0} \qquad (4)$$

and

$$\hat\sigma_e^2 = \mathrm{MSE} \qquad (5)$$

respectively, where $n_0$ is a weighted mean cluster size given by

$$n_0 = \frac{1}{k-1}\left[N - \sum_{i=1}^{k}\frac{n_i^2}{N}\right] \qquad (6)$$

The estimate of $\rho$ is obtained by substituting equations (4) and (5) into equation (3) to give

$$\hat\rho = \frac{\mathrm{MSA} - \mathrm{MSE}}{\mathrm{MSA} + (n_0 - 1)\,\mathrm{MSE}} \qquad (7)$$
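
The substitution behind equation (7) takes a single step: multiplying the numerator and denominator by $n_0$,

$$\hat\rho = \frac{(\mathrm{MSA}-\mathrm{MSE})/n_0}{(\mathrm{MSA}-\mathrm{MSE})/n_0 + \mathrm{MSE}} = \frac{\mathrm{MSA}-\mathrm{MSE}}{\mathrm{MSA}-\mathrm{MSE}+n_0\,\mathrm{MSE}} = \frac{\mathrm{MSA}-\mathrm{MSE}}{\mathrm{MSA}+(n_0-1)\,\mathrm{MSE}}$$

Because $n_0 \ge 1$ whenever every cluster contains at least one subject, the denominator is positive for non-degenerate data, so $\hat\rho$ is negative exactly when $\mathrm{MSA} < \mathrm{MSE}$, that is, when the variance ratio $\mathrm{MSA}/\mathrm{MSE}$ falls below 1.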

Although true negative values are generally considered implausible in cluster randomized trials, negative estimates of $\rho$ may be obtained, even from data with large clusters, as a result of sampling error. For the purpose of applying the variance inflation factor procedure, negative estimates of $\rho$ are truncated at zero, so that the variance inflation factor is never less than unity.
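
A minimal Python sketch of the estimator in equations (4)–(7) is shown below, with the truncation rule just described applied when the variance inflation factor of equation (1) is computed. The function names are hypothetical (the original analyses were carried out in Stata), and the input is assumed to be a list of clusters, each a list of continuous outcomes.

```python
from typing import Dict, List, Sequence


def anova_icc(clusters: Sequence[Sequence[float]]) -> Dict[str, float]:
    """One-way ANOVA estimate of the intraclass correlation, equations (4)-(7)."""
    k = len(clusters)
    sizes: List[int] = [len(c) for c in clusters]
    N = sum(sizes)
    cluster_means = [sum(c) / len(c) for c in clusters]
    grand_mean = sum(sum(c) for c in clusters) / N

    # Sums of squares and mean squares as laid out in Table I.
    ssa = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, cluster_means))
    sse = sum((y - m) ** 2 for c, m in zip(clusters, cluster_means) for y in c)
    msa = ssa / (k - 1)
    mse = sse / (N - k)

    # Weighted mean cluster size n0, equation (6).
    n0 = (N - sum(n * n for n in sizes) / N) / (k - 1)

    # Equation (7); negative values are possible through sampling error.
    rho_hat = (msa - mse) / (msa + (n0 - 1) * mse)
    return {"k": k, "N": N, "n0": n0, "MSA": msa, "MSE": mse, "rho_hat": rho_hat}


def variance_inflation_factor(rho_hat: float, n: float) -> float:
    """Equation (1), truncating a negative estimate at zero so that VIF >= 1."""
    return 1.0 + (n - 1.0) * max(rho_hat, 0.0)


if __name__ == "__main__":
    est = anova_icc([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
    print(est["rho_hat"])                                # 26/29, about 0.8966
    print(variance_inflation_factor(est["rho_hat"], 3))  # 81/29, about 2.7931
```

The returned `rho_hat` is left untruncated, as in the main simulation analyses described later, while `variance_inflation_factor` never returns less than 1.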

3. CONFIDENCE INTERVAL METHODS

Wald [19] devised an exact method for assigning confidence intervals to the intraclass correlation coefficient for unbalanced Normally distributed data. Computing it requires the manipulation of matrices, so it may not be conveniently applied within some statistical software. For example, because of a matrix size limitation, the standard version of the Stata software [20] cannot be used to apply the method to data in which the total number of individual subjects is greater than 800. As a consequence, approximate methods for unbalanced data are considered to be of practical use [16]. The methods compared in this study follow one of three main approaches. Three methods are based on the variance ratio statistic obtained from the analysis of variance of outcome on cluster. Three are based upon large sample approximations to the standard error of $\hat\rho$. One method is based upon a large sample approximation to the standard error of a Normalizing transformation of $\hat\rho$. Confidence intervals are constructed under the latter two approaches on the assumption that the sampling distributions of $\hat\rho$ and of the transformed $\hat\rho$ are approximately Normal.

3.1. Methods based upon the variance ratio statistic

3.1.1. Searle’s method. Searle’s [21] method for assigning confidence intervals to the analysis of variance estimator of $\rho$ was derived for use with Normally distributed data from a balanced design. The limits of $100(1-\alpha)$ per cent intervals are given by

$$\left\{\frac{F/F_U - 1}{n + F/F_U - 1},\; \frac{F/F_L - 1}{n + F/F_L - 1}\right\} \qquad (8)$$

where $F = \mathrm{MSA}/\mathrm{MSE}$ is the variance ratio statistic from the analysis of variance table, and $F_L$ and $F_U$ are quantiles of the central F-distribution with $k-1$ and $N-k$ degrees of freedom chosen such that $\Pr\{F_L \le F_{k-1,N-k} \le F_U\} = 1 - \alpha$, where $F_{k-1,N-k}$ denotes a random variable with that distribution. For unbalanced data the arithmetic mean cluster size, $\bar n$, can be used instead of $n$. $F_L$ and $F_U$ are usually chosen so that the confidence interval is equal-tailed, with the coverage error divided evenly beyond the lower and upper bounds of the interval [22]. Use of this method rests on the assumption that the variance ratio statistic is distributed as a multiple of the central F-distribution with $k-1$ and $N-k$ degrees of freedom. This is equivalent to assuming that the ratio of the between-cluster sum of squares to the expected between-cluster mean square has a central chi-squared distribution with $k-1$ degrees of freedom. For Normally distributed balanced data this assumption is valid and Searle’s method provides nominal coverage. For unbalanced data the assumption is not guaranteed to hold unless the true value of $\rho$ is zero; under these conditions the method provides only an approximation to the limits [16].

3.1.2. Adjustment of Searle’s method using the weighted mean cluster size ($n_0$). A simple adjustment to Searle’s confidence limits when data are unbalanced is to replace the term for the number of subjects per cluster with the weighted mean cluster size, $n_0$. The lower and upper bounds of the $100(1-\alpha)$ per cent interval are then given by

$$\left\{\frac{F/F_U - 1}{n_0 + F/F_U - 1},\; \frac{F/F_L - 1}{n_0 + F/F_L - 1}\right\} \qquad (9)$$
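
A minimal Python sketch of formulae (8) and (9) follows. The quantiles $F_L$ and $F_U$ are passed in by the caller (for example from printed tables, or from `scipy.stats.f.ppf(0.025, k - 1, N - k)` and `scipy.stats.f.ppf(0.975, k - 1, N - k)` if SciPy is available), so the sketch itself needs only the standard library; the helper names are hypothetical.

```python
from typing import Sequence, Tuple


def searle_limits(F: float, n: float, F_L: float, F_U: float) -> Tuple[float, float]:
    """Variance ratio limits: formula (8) with n, or formula (9) with n = n0."""
    lower = (F / F_U - 1.0) / (n + F / F_U - 1.0)
    upper = (F / F_L - 1.0) / (n + F / F_L - 1.0)
    return lower, upper


def arithmetic_mean_size(sizes: Sequence[int]) -> float:
    """n-bar, the substitute for n suggested for unbalanced data in Section 3.1.1."""
    return sum(sizes) / len(sizes)


def weighted_mean_size(sizes: Sequence[int]) -> float:
    """n0, the weighted mean cluster size of equation (6)."""
    N, k = sum(sizes), len(sizes)
    return (N - sum(m * m for m in sizes) / N) / (k - 1)
```

For instance, `searle_limits(F, arithmetic_mean_size(sizes), F_L, F_U)` gives Searle’s limits with $\bar n$, and `searle_limits(F, weighted_mean_size(sizes), F_L, F_U)` gives the $n_0$-adjusted limits of formula (9); for balanced data the two coincide.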

3.1.3. Thomas and Hultquist’s adjustment to Searle’s method. Thomas and Hultquist [23] derived an adjustment to Searle’s procedure that uses a transformation of the variance ratio
statistic. It involves substituting the harmonic mean cluster size for each $n_i$ term in the analysis of variance table and for the cluster size term, $n$, in the denominators of the limits shown in formula (8). The limits are then given by

$$\left\{\frac{F^*/F_U - 1}{\hat n + F^*/F_U - 1},\; \frac{F^*/F_L - 1}{\hat n + F^*/F_L - 1}\right\} \qquad (10)$$

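where $\hat n = k \big/ \sum_{i=1}^{k}(1/n_i)$ is the harmonic mean cluster size and

$$F^* = \hat n\left[\sum_{i=1}^{k}\bar Y_{i\bullet}^2 - \frac{1}{k}\left(\sum_{i=1}^{k}\bar Y_{i\bullet}\right)^2\right] \Big/ \left[(k-1)\,\mathrm{MSE}\right] \qquad (11)$$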

The results of simulation studies [23] indicated that, for some unbalanced designs, the F-distribution provides a better approximation to a certain multiple of Thomas and Hultquist’s transformation of the variance ratio statistic than it does to the same multiple of the untransformed variance ratio statistic. Under the balanced design all three methods based upon the variance ratio statistic provide identical confidence limits for $\rho$.
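
The sketch below computes $F^*$ of equation (11) from the unweighted cluster means and MSE and then applies formula (10). As for Searle’s method, $F_L$ and $F_U$ are supplied by the caller; taking them from the F-distribution with $k-1$ and $N-k$ degrees of freedom is an assumption of this sketch, and the function name is hypothetical. For balanced data $\hat n = n$ and $F^*$ reduces to the ordinary variance ratio $\mathrm{MSA}/\mathrm{MSE}$.

```python
from typing import Sequence, Tuple


def thomas_hultquist_limits(
    cluster_means: Sequence[float],
    sizes: Sequence[int],
    mse: float,
    F_L: float,
    F_U: float,
) -> Tuple[float, Tuple[float, float]]:
    """Return F* of equation (11) and the limits of formula (10)."""
    k = len(sizes)
    # Harmonic mean cluster size, n-hat.
    n_hat = k / sum(1.0 / m for m in sizes)
    # Spread of the unweighted cluster means about their own average.
    total = sum(cluster_means)
    spread = sum(y * y for y in cluster_means) - total * total / k
    f_star = n_hat * spread / ((k - 1) * mse)
    # Same form as Searle's limits, with F* and n-hat substituted.
    lower = (f_star / F_U - 1.0) / (n_hat + f_star / F_U - 1.0)
    upper = (f_star / F_L - 1.0) / (n_hat + f_star / F_L - 1.0)
    return f_star, (lower, upper)
```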

3.2. Methods based upon large sample approximations to the standard error of $\hat\rho$

3.2.1. Smith’s method. Smith [24] derived a large sample approximation to the variance of the intraclass correlation coefficient given by

$$V(\hat\rho) = \frac{2(1-\rho)^2}{n_0^2}\left\{\frac{[1+\rho(n_0-1)]^2}{N-k} + \frac{(k-1)(1-\rho)[1+\rho(2n_0-1)] + \rho^2\left[\sum n_i^2 - 2N^{-1}\sum n_i^3 + N^{-2}\left(\sum n_i^2\right)^2\right]}{(k-1)^2}\right\} \qquad (12)$$

For use of this formula, and others based on the variance of the intraclass correlation coefficient or its transform, the estimated value, $\hat\rho$, is substituted in place of the unknown true value, $\rho$. Approximate $100(1-\alpha)$ per cent confidence limits for $\rho$ may then be constructed as

$$\left\{\hat\rho - Z_0\sqrt{V(\hat\rho)},\; \hat\rho + Z_0\sqrt{V(\hat\rho)}\right\} \qquad (13)$$

where $Z_0$ is the two-sided critical value of the standard Normal distribution putting $100\alpha/2$ per cent of observations in each tail.
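
A minimal sketch of equation (12) and the Normal-theory limits of formula (13) follows, using only the Python standard library (`statistics.NormalDist`, available from Python 3.8, supplies $Z_0$). Following the text, the estimate $\hat\rho$ is plugged in for $\rho$; the function names are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def smith_variance(rho: float, sizes: Sequence[int]) -> float:
    """Smith's large sample variance of rho-hat, equation (12)."""
    k, N = len(sizes), sum(sizes)
    s2 = sum(m ** 2 for m in sizes)
    s3 = sum(m ** 3 for m in sizes)
    n0 = (N - s2 / N) / (k - 1)  # weighted mean cluster size, equation (6)
    # Imbalance term multiplying rho squared in equation (12).
    imbalance = s2 - 2.0 * s3 / N + s2 ** 2 / N ** 2
    first = (1 + rho * (n0 - 1)) ** 2 / (N - k)
    second = ((k - 1) * (1 - rho) * (1 + rho * (2 * n0 - 1)) + rho ** 2 * imbalance) / (k - 1) ** 2
    return 2 * (1 - rho) ** 2 / n0 ** 2 * (first + second)


def normal_approx_limits(rho_hat: float, variance: float,
                         alpha: float = 0.05) -> Tuple[float, float]:
    """Formula (13): rho_hat plus or minus Z0 * sqrt(V)."""
    z0 = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z0 * variance ** 0.5
    return rho_hat - half_width, rho_hat + half_width
```

For an estimate `rho_hat` and a list of cluster sizes `sizes`, `normal_approx_limits(rho_hat, smith_variance(rho_hat, sizes))` gives Smith’s interval. Because the limits are symmetric about $\hat\rho$, the lower limit can fall below zero when $\hat\rho$ is small.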

3.2.2. Method of Swiger and colleagues. Swiger et al. [25] also derived a large sample approximation to the variance of $\hat\rho$, but with a simpler formula than Smith’s:

$$V(\hat\rho) = \frac{2(N-1)(1-\rho)^2[1+(n_0-1)\rho]^2}{n_0^2(N-k)(k-1)} \qquad (14)$$

Confidence intervals for $\rho$ may then be assigned as shown in formula (13). For balanced data, limits produced by this approach are equal to those of Smith’s method.
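
Equation (14) needs only $N$, $k$ and $n_0$. A standard-library sketch that plugs in $\hat\rho$ and applies formula (13) is given below; the helper names are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def swiger_variance(rho: float, sizes: Sequence[int]) -> float:
    """Large sample variance of rho-hat from Swiger et al., equation (14)."""
    k, N = len(sizes), sum(sizes)
    n0 = (N - sum(m * m for m in sizes) / N) / (k - 1)  # equation (6)
    numerator = 2 * (N - 1) * (1 - rho) ** 2 * (1 + (n0 - 1) * rho) ** 2
    return numerator / (n0 ** 2 * (N - k) * (k - 1))


def swiger_limits(rho_hat: float, sizes: Sequence[int],
                  alpha: float = 0.05) -> Tuple[float, float]:
    """Formula (13) with rho-hat substituted for rho in equation (14)."""
    z0 = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z0 * swiger_variance(rho_hat, sizes) ** 0.5
    return rho_hat - half_width, rho_hat + half_width
```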

3.2.3. Fisher’s method. Using a formula similar to that later derived by Swiger et al., Fisher [26] estimated the variance of $\hat\rho$ as

$$V(\hat\rho) = \frac{2(1-\rho)^2\{1+(n-1)\rho\}^2}{n(n-1)k} \qquad (15)$$

In this study Fisher’s method was evaluated first using $\bar n$, the arithmetic mean cluster size, and then using $n_0$, a weighted mean cluster size, in place of $n$. Confidence intervals were constructed using formula (13).
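
The sketch below evaluates equation (15) with either $\bar n$ or $n_0$ substituted for $n$, as in this study, and then applies formula (13). The function names and the `use_n0` switch are hypothetical; the formula requires $n > 1$.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def fisher_variance(rho: float, n: float, k: int) -> float:
    """Fisher's large sample variance of rho-hat, equation (15); needs n > 1."""
    return 2 * (1 - rho) ** 2 * (1 + (n - 1) * rho) ** 2 / (n * (n - 1) * k)


def fisher_limits(rho_hat: float, sizes: Sequence[int], use_n0: bool = False,
                  alpha: float = 0.05) -> Tuple[float, float]:
    """Formula (13) using equation (15) with n-bar (default) or n0 in place of n."""
    k, N = len(sizes), sum(sizes)
    if use_n0:
        n = (N - sum(m * m for m in sizes) / N) / (k - 1)  # weighted mean, equation (6)
    else:
        n = N / k  # arithmetic mean cluster size
    z0 = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z0 * fisher_variance(rho_hat, n, k) ** 0.5
    return rho_hat - half_width, rho_hat + half_width
```

With balanced data $\bar n = n_0$, so the two variants give the same limits; they separate only as cluster sizes become unequal.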

3.3. Fisher’s method based upon a large sample approximation to the standard error of a Normalizing transformation of $\hat\rho$

The validity of methods based on approximations to the standard error of the intraclass correlation coefficient rests on the assumption that $\hat\rho$ is Normally distributed. A modified version of Fisher’s Normalizing transformation [26] of the product moment correlation coefficient may be used to assign confidence intervals to $\rho$. The transformation of the estimated intraclass correlation coefficient, given by

$$Z_F = \frac{1}{2}\ln\left[\frac{1+(n_0-1)\hat\rho}{1-\hat\rho}\right] \qquad (16)$$

is approximately Normal with mean

$$\frac{1}{2}\ln\left[\frac{1+(n_0-1)\rho}{1-\rho}\right]$$

and variance

$$V_Z = 0.5\left\{(k-1)^{-1} + (N-k)^{-1}\right\}$$

Confidence limits are assigned to the transformed estimate using

$$\left\{Z_F - Z_0\sqrt{V_Z},\; Z_F + Z_0\sqrt{V_Z}\right\} \qquad (17)$$

where $Z_0$ is the two-sided critical value of the standard Normal distribution putting $100\alpha/2$ per cent of observations in each tail. These limits are then back-transformed to obtain confidence limits on the original scale.
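
The transformation, its variance and the back-transformation can be sketched as follows (standard library only; the function name is hypothetical). Solving $Z = \tfrac{1}{2}\ln[(1+(n_0-1)r)/(1-r)]$ for $r$ gives $r = (e^{2Z}-1)/(e^{2Z}+n_0-1)$, which is the inversion used for the limits.

```python
import math
from statistics import NormalDist
from typing import Tuple


def fisher_z_limits(rho_hat: float, n0: float, k: int, N: int,
                    alpha: float = 0.05) -> Tuple[float, float]:
    """Formulae (16)-(17) followed by back-transformation to the rho scale."""
    # Transform, equation (16); requires -1/(n0 - 1) < rho_hat < 1.
    z_f = 0.5 * math.log((1 + (n0 - 1) * rho_hat) / (1 - rho_hat))
    v_z = 0.5 * (1 / (k - 1) + 1 / (N - k))
    z0 = NormalDist().inv_cdf(1 - alpha / 2)

    def back_transform(z: float) -> float:
        # Solves z = 0.5 * ln[(1 + (n0 - 1) r) / (1 - r)] for r.
        t = math.exp(2 * z)
        return (t - 1) / (t + n0 - 1)

    half_width = z0 * math.sqrt(v_z)
    return back_transform(z_f - half_width), back_transform(z_f + half_width)
```

Because of the back-transformation the limits are not symmetric about $\hat\rho$, and the lower limit can never fall below $-1/(n_0-1)$. For the analysis of variance estimator (7), the ratio inside the logarithm simplifies to $\mathrm{MSA}/\mathrm{MSE}$, so $Z_F$ is simply half the log of the variance ratio statistic.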

4. SIMULATION METHODS

Monte Carlo simulations were used to evaluate the performance of the methods for assigning confidence intervals to $\rho$. Coverage probabilities and mean interval widths were estimated across a range of values of the design parameters that feature in the formulae of these methods. This entailed varying the total number of clusters, the mean number of subjects per cluster and the true size of the intraclass correlation coefficient. The performance of the methods was also compared for Normal and non-Normal data. All programming for simulations and analyses was implemented using Stata 7 software [20].

Correlated outcome data were simulated, according to the framework of the one-way random effects model described by equation (2), as the sum of two random variables, $a_i$ and $e_{ij}$, assumed to have zero mean and variances $\sigma_a^2$ and $\sigma_e^2$, respectively. The $e_{ij}$ term was simulated to be Normally distributed throughout, whereas the $a_i$ term was sometimes simulated to have a non-Normal distribution. The overall mean, $\mu$, was fixed at zero. For convenience the sum of the variances was constrained to be 1000, with $\sigma_a^2$ and $\sigma_e^2$ systematically manipulated to generate the required value of $\rho$.

For example, data with an underlying intraclass correlation coefficient of 0.05 were generated by simulating $a_i$ to have variance 50 and $e_{ij}$ to have variance 950. For simulations where $a_i$ was non-Normally distributed, a four-parameter probability distribution presented by Ramberg et al. [27] was used to generate this random effect with a skewness coefficient of 1.2 and a kurtosis coefficient of 5.6. The choice of the skewness and kurtosis coefficients was informed by analyses of data from the Health Survey for England, 1994 [28], based on the distribution of random effects at the postcode sector level of clustering. Clusters at this level are similar in size to general practices. These skewness and kurtosis coefficients represent the marked deviation from Normality observed for ‘units of alcohol drunk per week’.

A 6 × 6 × 8 × 2 factorial design was used for the simulations, with replications carried out for different numbers of clusters (10, 15, 20, 25, 30, 50), mean cluster sizes (10, 15, 20, 25, 30, 50), intraclass correlation coefficients (0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3) and distributions of data (Normally versus non-Normally distributed cluster level effects). 10 000 replications of data were simulated for each of the 576 configurations of design parameter values using a different random number generator for each replication. The analysis of variance estimate of the intraclass correlation coefficient was obtained and 95 per cent confidence intervals were estimated using each of the approximate methods. The coverage probability was estimated as the proportion of intervals containing the simulated value of $\rho$. A total of 10 000 replications is sufficient to estimate coverage probabilities for 95 per cent intervals with a standard error of just over 0.2 per cent. The estimated coverage for any method was never lower than 76 per cent, for which 10 000 replications provide a standard error of 0.4 per cent. Precision of the intervals was quantified using the mean interval width.

The patterns of unequal cluster sizes were generated as random Poisson variables truncated below unity. The probability function of the Poisson distribution is given by

$$\Pr(m) = \frac{\lambda^m \exp(-\lambda)}{m!} \qquad (18)$$

and the cumulative distribution function by

$$\sum_{j=0}^{m}\frac{\lambda^{j}\exp(-\lambda)}{j!} \qquad (19)$$

where $m$ is the size of a given cluster for a given replication and $\lambda$ is the mean cluster size for a given set of replications. The Poisson distribution has previously been used to generate cluster sizes typical of public health cluster randomized trials [29]. In the main analyses, data were simulated in such a way that a new pattern of cluster sizes was generated for each replication of data. To test the sensitivity of the results, alternative simulations were carried out in which the pattern of cluster sizes was randomly generated, again using the Poisson distribution, but held fixed for any given configuration of design parameters. The two approaches yielded essentially the same findings.

Although negative intraclass correlation coefficients are considered implausible in studies with clusters that contain large numbers of subjects, negative estimates of $\rho$ and negative confidence limits were not truncated at zero in the main analyses, as this was felt to be inappropriate when comparing precision between the methods. Sensitivity analyses in which $\hat\rho$ and the associated confidence intervals were truncated at zero were carried out to assess the extent to which the effective performance of the methods was dependent on the assumption that the true value of $\rho$ cannot take negative values. For the truncated comparison, Wald’s exact method [19] was also applied, but only for a subset of design parameter configurations due to
the aforementioned matrix size limitation within Stata. The Wald method is applicable only to non-negative values of $\rho$, so comparison with the other methods in their truncated forms is more appropriate. The bisection method [30] was used to obtain Wald’s confidence interval, using an algorithm adapted from Burdick and Graybill [22]. A total of 2000 replications was used for the truncated analyses. This is sufficient to estimate coverage probabilities for 95 per cent intervals with a standard error of 0.5 per cent. The lowest coverage for any truncated method across the range of design parameter values was 84 per cent, for which the standard error was 0.8 per cent.
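
A minimal Python sketch of one replication of the data-generating process described in this section, together with the coverage summary and its binomial Monte Carlo standard error, is given below. It is not the author’s Stata code: the function names are hypothetical, only the Normal cluster effect is implemented (the Ramberg et al. distribution is not), Poisson draws use Knuth’s multiplication method, and redrawing zero-sized clusters is an assumed reading of ‘truncated below unity’.

```python
import math
import random
from typing import List, Sequence, Tuple


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for moderate means."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def cluster_sizes(k: int, mean_size: float, rng: random.Random) -> List[int]:
    """Poisson cluster sizes; zero draws are redrawn (assumed truncation rule)."""
    sizes: List[int] = []
    while len(sizes) < k:
        m = poisson_draw(mean_size, rng)
        if m >= 1:
            sizes.append(m)
    return sizes


def simulate_clusters(k: int, mean_size: float, rho: float, rng: random.Random,
                      total_variance: float = 1000.0) -> List[List[float]]:
    """One data set from equation (2) with mu = 0 and Normal a_i and e_ij."""
    sd_a = math.sqrt(rho * total_variance)          # sigma_a^2 = rho * 1000
    sd_e = math.sqrt((1.0 - rho) * total_variance)  # sigma_e^2 = (1 - rho) * 1000
    data = []
    for m in cluster_sizes(k, mean_size, rng):
        a_i = rng.gauss(0.0, sd_a)
        data.append([a_i + rng.gauss(0.0, sd_e) for _ in range(m)])
    return data


def coverage(intervals: Sequence[Tuple[float, float]], true_rho: float) -> float:
    """Proportion of (lower, upper) intervals that contain the true rho."""
    return sum(lo <= true_rho <= hi for lo, hi in intervals) / len(intervals)


def coverage_se(p: float, replications: int) -> float:
    """Binomial Monte Carlo standard error of an estimated coverage p."""
    return math.sqrt(p * (1.0 - p) / replications)
```

The standard errors quoted above follow from `coverage_se`: about 0.0022 for $p = 0.95$ and 0.0043 for $p = 0.76$ with 10 000 replications, and about 0.0049 for $p = 0.95$ and 0.0082 for $p = 0.84$ with 2000 replications.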

5. SIMULATION RESULTS

The coverage proportions of the non-truncated intervals are summarized for Normal and non-Normal data in Figures 1 and 2, respectively. To aid readability, only Searle’s, Smith’s and Fisher’s transformation methods are shown. Approaches based on the variance ratio statistic provided nearly identical results to each other, as did the approaches based on large sample approximations to the standard error of the untransformed $\hat\rho$. Increasing the number of subjects in each cluster generally made little difference to the coverage levels of the methods. Exceptions to these general findings are commented on where appropriate. In Figure 1 the coverage proportions of the methods for Normally distributed data are plotted against the number of clusters, for data with a mean cluster size of 10. Separate graphs are shown for each value of $\rho$.

Figure 1. Coverage proportions of the non-truncated intervals for Normally distributed data with mean of 10 subjects per cluster.

Figure 2. Coverage proportions of the non-truncated intervals for non-Normally distributed data with mean of 10 subjects per cluster.

Searle’s method based upon the variance ratio statistic provided close to nominal 95 per cent coverage across the range of intraclass correlation coefficients shown. Only for values of $\rho$ above 0.1 was there an indication that coverage levels drop slightly, and the drop was never more than 1 per cent below the nominal level. For larger cluster sizes the loss of coverage at larger values of $\rho$ was barely noticeable, so the results shown represent more pessimistic conditions. The two variance ratio statistic methods that attempt to improve on Searle’s provided no greater coverage. The use of $n_0$ in place of $n$ in Searle’s exact formula provided nearly identical results. For cluster sizes less than 20, Thomas and Hultquist’s method actually yielded marginally poorer coverage for values of $\rho$ below 0.1. All three variance ratio statistic methods depend upon the F-statistic following a multiple of the F-distribution. For unbalanced data this assumption is guaranteed to hold only when the true size of $\rho$ is 0. As the size of $\rho$ increases, the coverage of methods based upon the variance ratio statistic decreases. However, it appears that for values of intraclass correlation below 0.3, variance ratio statistic methods provide close to nominal coverage for Normally distributed data.

Methods based upon the standard error of the untransformed $\hat\rho$ generally provided poor
coverage, particularly when the number of clusters was small. When the number of clusters was 10, the coverage proportions were all below 90 per cent. Of the three methods using this approach, Smith’s generally provided the best coverage. However, only for data containing 50 clusters did the coverage proportion for Smith’s method come close to the nominal level.

Fisher’s method based on the Normalizing transformation of ρ̂ yielded noticeably greater

coverage levels than Smith’s method. However, methods based upon the variance ratio statistic performed better, particularly for data with just 10 or 15 clusters. In common with methods based on the standard error of the untransformed ρ̂, the coverage of Fisher’s transformation method increased as the number of clusters increased.

Figure 2 summarizes the coverage proportions for non-Normally distributed data with a mean of 10 subjects per cluster. Note that the scale differs from that of Figure 1. For non-Normal data, Searle’s method was again superior to those based upon approximations to the standard error of ρ̂, though its superiority over Fisher’s transformation method was slightly less marked than for Normal data when ρ was small. However, the coverage proportions of Searle’s method and the other variance ratio statistic methods were at the nominal level only when the simulated value of ρ was less than or equal to 0.01. As the intraclass correlation increased, the coverage proportions progressively decreased. The negative impact of non-Normality on coverage was slightly more marked for data with larger cluster sizes: the maximum value of ρ for which coverage was close to nominal dropped to 0.005 when the mean cluster size was 50. The results of additional simulations (using 2000 replications) of data in which the mean cluster size was 500 provided further evidence that increasing the number of subjects per cluster made little difference to the coverage levels of the methods for Normally distributed data but reduced the coverage levels for data with non-Normally distributed cluster level effects. For such large cluster sizes, the maximum value of ρ at which nominal coverage was achieved by Searle’s method for non-Normal data was less than 0.001.

Figure 3. Coverage proportions of the truncated intervals for Normally distributed data with a mean of 10 subjects per cluster.

Figure 3 summarizes the coverage proportions from simulations where confidence intervals were truncated at zero. The truncated versions of Searle’s and Smith’s methods are shown together with Wald’s method for Normally distributed data. Searle’s method provided


coverage of around 97.5 per cent when ρ was zero. This might be expected: for Normal data from a balanced design, 2.5 per cent of non-truncated intervals fall wholly below the true value, and when ρ is zero truncation at zero removes this source of non-coverage, leaving only the 2.5 per cent of intervals that lie wholly above it. For a similar reason, the coverage proportion of Wald’s approach was also around 97.5 per cent when ρ was zero. Searle’s and Wald’s methods differed in that, whereas the former provided just nominal coverage for all other values of ρ, the latter provided slightly greater than nominal coverage until ρ was as large as 0.05, after which its coverage decreased to the nominal level as ρ increased. This is due to the nature of the algorithm used to derive the limits under Wald’s method [22]. Smith’s method benefited considerably in terms of coverage from the truncation of the intervals, because the methods based on the standard error of ρ̂ yielded a larger proportion of intervals falling below the true value than the other methods. Smith’s method provided coverage levels that were markedly greater than nominal for values of ρ less than 0.1 when the mean cluster size was 10, and for values of ρ less than 0.05 when the mean cluster size was 50. For larger values of ρ the coverage of Smith’s method was poor and showed much the same pattern of results as for the non-truncated intervals.

The methods based upon approximations to the standard error of ρ̂, in both its transformed

and untransformed states, generally provided slightly narrower intervals than those based upon the variance ratio statistic, but this was offset by their poorer coverage. The mean interval width of Searle’s approach is shown in Table II for Normally distributed data across the different numbers of clusters and mean cluster sizes and three values of ρ (0, 0.05 and 0.3). Widths are shown for both non-truncated and truncated intervals. The mean widths for non-Normally distributed data were virtually identical to two decimal places, and further analyses revealed that they also did not differ from the widths obtained for simulations of balanced data. Confidence intervals for Searle’s method were wide in comparison to the simulated values of the intraclass correlation. For instance, the mean width of the intervals for data with 50 clusters, each containing a mean of 50 subjects, was 0.06 when ρ was 0.05. Searle’s interval is asymmetrical, with a greater difference between the upper limit and the estimate than between the estimate and the lower limit. Had such limits been obtained in a practical setting, they would suggest that the true value of ρ could lie anywhere between around 0.03 and 0.09. Table II also shows that truncation of the confidence intervals did little to improve the precision, particularly when ρ was 0.05 or greater.

The precision of all the confidence interval methods increased as the mean cluster size and

number of clusters increased, and decreased with increasing intraclass correlation. Perhaps surprisingly, the cluster size appeared to play a slightly more important role than the number of clusters in increasing precision when ρ was 0. For any given total number of subjects, the interval widths were narrowed more by increasing the mean cluster size than by increasing the number of clusters. For example, given a total sample size of 500, 10 clusters each containing 50 subjects yielded narrower intervals when the true value of ρ was 0 than 50 clusters each containing 10 subjects. One might expect this for small intraclass correlation coefficients, as the major part of the total outcome variability is then due to σe², which is estimated all the more precisely when the cluster size is large [31]. When ρ was 0.05, increasing the number of clusters was slightly more effective for reducing the interval width than increasing the mean number of subjects, and when ρ was 0.3, increasing the number of clusters was clearly more effective. These findings may inform the design of pilot studies that are specially conducted to estimate the intraclass correlation coefficient for use in the design of cluster randomized trials.
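The variance ratio approach behind Searle’s intervals, and the widths discussed above, can be sketched as follows. This is a minimal sketch, not the implementation used in this study: it assumes the standard balanced-data limits for the one-way random effects model, with a single average cluster size (for example the arithmetic mean n̄ or n0) substituted for the common cluster size, and it leaves the F critical values to the caller. The function names `searle_interval` and `interval_width` are hypothetical.

```python
def searle_interval(f_stat, n_bar, f_upper, f_lower, truncate=False):
    """Variance-ratio confidence limits for the intraclass correlation (sketch).

    f_stat   -- observed variance ratio F = MSB / MSW from the one-way ANOVA
    n_bar    -- average cluster size used in place of the balanced-design n
                (e.g. the arithmetic mean or n0; this substitution is an assumption)
    f_upper  -- upper alpha/2 point of F(k - 1, N - k), e.g. the 97.5th percentile
    f_lower  -- lower alpha/2 point of F(k - 1, N - k), e.g. the 2.5th percentile
    truncate -- if True, negative limits are set to zero
    """
    if not 0.0 < f_lower < f_upper:
        raise ValueError("critical values must satisfy 0 < f_lower < f_upper")
    if n_bar < 1.0:
        raise ValueError("average cluster size must be at least 1")

    def limit(critical_value):
        # Balanced-data formula (F/Fc - 1) / (F/Fc + n - 1), increasing in F/Fc
        ratio = f_stat / critical_value
        return (ratio - 1.0) / (ratio + n_bar - 1.0)

    lower = limit(f_upper)  # dividing by the larger critical value gives the lower limit
    upper = limit(f_lower)
    if truncate:
        lower, upper = max(lower, 0.0), max(upper, 0.0)
    return lower, upper


def interval_width(limits):
    """Width of an interval given as a (lower, upper) pair."""
    lower, upper = limits
    return upper - lower
```

For the Greenwich data (42 practices, 2511 subjects, F = 1.61, n̄ = 59.79), the critical values would come from the F distribution on 41 and 2469 degrees of freedom, which could be obtained, for example, with `scipy.stats.f.ppf(0.975, 41, 2469)` and `scipy.stats.f.ppf(0.025, 41, 2469)`. Setting `truncate=True` mimics the truncation of negative limits at zero described above.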


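Table II. Mean widths of 95 per cent confidence intervals using Searle’s method for Normally distributed data.

| Number of clusters | Mean cluster size | Non-truncated, ρ = 0 | Non-truncated, ρ = 0.05 | Non-truncated, ρ = 0.3 | Truncated, ρ = 0 | Truncated, ρ = 0.05 | Truncated, ρ = 0.3 |
|---|---|---|---|---|---|---|---|
| 10 | 10 | 0.24 | 0.31 | 0.48 | 0.19 | 0.28 | 0.48 |
| 10 | 15 | 0.17 | 0.25 | 0.45 | 0.13 | 0.23 | 0.45 |
| 10 | 20 | 0.13 | 0.22 | 0.44 | 0.10 | 0.21 | 0.44 |
| 10 | 25 | 0.11 | 0.20 | 0.43 | 0.09 | 0.19 | 0.43 |
| 10 | 30 | 0.09 | 0.19 | 0.42 | 0.07 | 0.19 | 0.42 |
| 10 | 50 | 0.05 | 0.16 | 0.41 | 0.04 | 0.16 | 0.41 |
| 15 | 10 | 0.18 | 0.24 | 0.39 | 0.13 | 0.21 | 0.39 |
| 15 | 15 | 0.12 | 0.19 | 0.37 | 0.09 | 0.18 | 0.37 |
| 15 | 20 | 0.09 | 0.16 | 0.35 | 0.07 | 0.16 | 0.35 |
| 15 | 25 | 0.08 | 0.15 | 0.35 | 0.06 | 0.15 | 0.35 |
| 15 | 30 | 0.06 | 0.14 | 0.34 | 0.05 | 0.14 | 0.34 |
| 15 | 50 | 0.04 | 0.12 | 0.33 | 0.03 | 0.12 | 0.33 |
| 20 | 10 | 0.15 | 0.20 | 0.34 | 0.10 | 0.18 | 0.34 |
| 20 | 15 | 0.10 | 0.16 | 0.32 | 0.07 | 0.15 | 0.32 |
| 20 | 20 | 0.08 | 0.14 | 0.30 | 0.05 | 0.13 | 0.30 |
| 20 | 25 | 0.06 | 0.12 | 0.30 | 0.04 | 0.12 | 0.30 |
| 20 | 30 | 0.05 | 0.11 | 0.29 | 0.04 | 0.11 | 0.29 |
| 20 | 50 | 0.03 | 0.10 | 0.28 | 0.02 | 0.10 | 0.28 |
| 25 | 10 | 0.13 | 0.18 | 0.30 | 0.09 | 0.16 | 0.30 |
| 25 | 15 | 0.09 | 0.14 | 0.28 | 0.06 | 0.13 | 0.28 |
| 25 | 20 | 0.07 | 0.12 | 0.27 | 0.04 | 0.12 | 0.27 |
| 25 | 25 | 0.05 | 0.11 | 0.26 | 0.04 | 0.11 | 0.26 |
| 25 | 30 | 0.04 | 0.10 | 0.26 | 0.03 | 0.10 | 0.26 |
| 25 | 50 | 0.03 | 0.08 | 0.25 | 0.02 | 0.08 | 0.25 |
| 30 | 10 | 0.12 | 0.16 | 0.28 | 0.08 | 0.14 | 0.28 |
| 30 | 15 | 0.08 | 0.12 | 0.26 | 0.05 | 0.12 | 0.26 |
| 30 | 20 | 0.06 | 0.11 | 0.25 | 0.04 | 0.10 | 0.25 |
| 30 | 25 | 0.05 | 0.10 | 0.24 | 0.03 | 0.10 | 0.24 |
| 30 | 30 | 0.04 | 0.09 | 0.24 | 0.03 | 0.09 | 0.24 |
| 30 | 50 | 0.02 | 0.07 | 0.23 | 0.02 | 0.07 | 0.23 |
| 50 | 10 | 0.09 | 0.12 | 0.21 | 0.05 | 0.11 | 0.21 |
| 50 | 15 | 0.06 | 0.09 | 0.20 | 0.04 | 0.09 | 0.20 |
| 50 | 20 | 0.04 | 0.08 | 0.19 | 0.03 | 0.08 | 0.19 |
| 50 | 25 | 0.03 | 0.07 | 0.19 | 0.02 | 0.07 | 0.19 |
| 50 | 30 | 0.03 | 0.07 | 0.18 | 0.02 | 0.07 | 0.18 |
| 50 | 50 | 0.02 | 0.06 | 0.18 | 0.01 | 0.06 | 0.18 |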
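The coverage proportions compared in this section are themselves estimates from a finite number of simulated data sets. As a rough aid to reading them, and assuming independent replications, the Monte Carlo standard error of a coverage proportion p estimated from R replications is sqrt(p(1 − p)/R). The sketch below computes both quantities. The function names are hypothetical, and R = 2000 is taken from the additional simulations with a mean cluster size of 500, not from the main simulations, whose replication count is not restated in this section.

```python
import math


def coverage_proportion(intervals, true_rho):
    """Share of simulated (lower, upper) intervals that contain the true rho."""
    if not intervals:
        raise ValueError("need at least one simulated interval")
    hits = sum(1 for lower, upper in intervals if lower <= true_rho <= upper)
    return hits / len(intervals)


def coverage_mc_se(p, replications):
    """Binomial Monte Carlo standard error of a coverage proportion p."""
    return math.sqrt(p * (1.0 - p) / replications)


if __name__ == "__main__":
    se = coverage_mc_se(0.95, 2000)
    low, high = 0.95 - 1.96 * se, 0.95 + 1.96 * se
    print(f"SE = {se:.4f}; approximate 95% band: {low:.3f} to {high:.3f}")
```

With 2000 replications the standard error at 95 per cent coverage is about 0.0049, so an observed coverage between roughly 94.0 and 96.0 per cent would be hard to distinguish from Monte Carlo noise.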

6. AN EXAMPLE: APPLICATION OF METHODS TO ASTHMA SPECIFIC QUALITY OF LIFE SCORES

The methods were applied to data from the Greenwich study [3], in which general practices were randomized and outcomes were measured on asthma patients. The aim of the study was to evaluate the effectiveness of using nurse specialists to educate, advise and support practice nurses in the management of asthma according to the British Thoracic Society’s guidelines. The practice nurses in the intervention arm were offered six teaching sessions on core elements of asthma care and received visits from the nurse specialists to help them organize clinics and manage their patients. Outcomes were measured before and after the intervention on cross-sectional samples of patients, aged 15 to 50 years, who were registered with the study practices. Quality of life was measured, using the Marks scale [32], on all patients who reported asthma symptoms. The square root of the Marks’ quality of life score was approximately Normally distributed. This study analyzed the square root of quality of life scores from the second survey for 2511 subjects from 42 general practices. The analysis of variance estimate of ρ was 0.01025. Table III shows 95 per cent confidence intervals for ρ for the methods under comparison, with different measures of the average cluster size given below the table. All but the Thomas and Hultquist method provided similar limits. This is in keeping with the simulation results, where the different approaches provided approximately the same coverage and mean interval width for Normally distributed data with a total of around 2500 subjects.

Table III. Ninety five per cent confidence intervals for ρ for the square root of the Marks’ quality of life scale score at general practice level.

| Method | Lower limit | Upper limit | Width |
|---|---|---|---|
| Searle using n̄∗ | 0.00145 | 0.02652 | 0.02507 |
| Searle using n0† | 0.00146 | 0.02683 | 0.02537 |
| Thomas and Hultquist‡ | 0.01151 | 0.07331 | 0.06180 |
| Smith | −0.00180 | 0.02231 | 0.02411 |
| Swiger et al. | −0.00141 | 0.02192 | 0.02333 |
| Fisher using n̄∗ | −0.00119 | 0.02170 | 0.02289 |
| Fisher using n0† | −0.00128 | 0.02179 | 0.02307 |
| Fisher’s transformation | 0.00071 | 0.02467 | 0.02396 |

∗Arithmetic mean cluster size (n̄) = 59.79.

†Weighted mean cluster size (n0) = 59.07.

‡Harmonic mean cluster size = 27.86.

For the Greenwich data, Thomas and Hultquist’s intervals were markedly wider and covered

larger values of ρ than the other methods. Certain aspects of the Greenwich data explain this discrepancy. First, the harmonic mean cluster size is about half the arithmetic or weighted mean cluster size, and it is clear from equations (8), (9) and (10) that this has the effect of increasing both the lower and upper bounds of the interval. Secondly, the use of the harmonic mean increases the limits in equation (10) indirectly, by increasing F∗. In the standard analysis of variance table for the one-way random effects model, the squared difference between the outcome mean for each cluster and the overall mean is multiplied by the number of subjects in that cluster. Under Thomas and Hultquist’s approach, however, the squared difference for each cluster is multiplied by the harmonic mean. The cluster with the smallest number of subjects (just two) was the one whose mean value was most different from the overall mean. Use of the harmonic mean therefore gave this cluster markedly greater weight than under Searle’s method. Consequently, F∗ is larger than F


(1.97 versus 1.61) and the limits are larger. Thomas and Hultquist’s method clearly has the potential to provide quite different findings from the other variance ratio statistic methods under certain types of imbalance in cluster size.
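The different average cluster sizes in Table III and the contrast between F and F∗ can be illustrated with a short sketch. It computes the arithmetic mean n̄, the weighted mean n0 = (N − Σni²/N)/(k − 1), the harmonic mean cluster size, the usual one-way ANOVA variance ratio F, and the analysis of variance estimator ρ̂ = (F − 1)/(F + n0 − 1). The form used for F∗ is an assumption: following the description above, each cluster’s squared deviation is weighted by the harmonic mean rather than by its own size, and the deviations are taken here from the unweighted mean of the cluster means. The function name and toy data are hypothetical.

```python
from statistics import fmean, harmonic_mean


def cluster_summaries(clusters):
    """One-way ANOVA summaries for clustered data (illustrative sketch).

    clusters -- list of lists of outcome values, one inner list per cluster
    """
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    n_total = sum(sizes)
    means = [fmean(c) for c in clusters]
    grand_mean = fmean([y for c in clusters for y in c])  # subject-weighted mean

    # Within-cluster mean square on N - k degrees of freedom
    ssw = sum((y - m) ** 2 for c, m in zip(clusters, means) for y in c)
    msw = ssw / (n_total - k)

    # Standard ANOVA: each squared deviation weighted by that cluster's own size
    msb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means)) / (k - 1)

    # Assumed Thomas-Hultquist-style numerator: every squared deviation of a
    # cluster mean from the unweighted mean of means is weighted by the harmonic mean
    n_harm = harmonic_mean(sizes)
    mean_of_means = fmean(means)
    msb_star = n_harm * sum((m - mean_of_means) ** 2 for m in means) / (k - 1)

    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    f = msb / msw
    return {
        "n_bar": n_total / k,
        "n0": n0,
        "n_harm": n_harm,
        "F": f,
        "F_star": msb_star / msw,
        "rho_hat": (f - 1.0) / (f + n0 - 1.0),  # analysis of variance estimator
    }


if __name__ == "__main__":
    # Hypothetical toy data: three clusters of six plus one two-subject cluster
    # whose mean is far from the others
    toy = [[1, 2, 3, 2, 1, 3], [2, 3, 1, 2, 3, 1], [2, 1, 3, 1, 2, 3], [6, 7]]
    s = cluster_summaries(toy)
    print(f"F = {s['F']:.2f}, F* = {s['F_star']:.2f}, rho_hat = {s['rho_hat']:.3f}")
```

In the toy data the two-subject cluster has the most extreme mean, so weighting every cluster by the harmonic mean (4 here) doubles that cluster’s weight and pushes F∗ (about 25.9) well above F (about 15.6), mirroring the Greenwich case. As a consistency check on the estimator, the rounded Greenwich values F = 1.61 and n0 = 59.07 give (1.61 − 1)/(1.61 + 59.07 − 1) ≈ 0.0102, in line with the reported estimate of 0.01025.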

7. DISCUSSION

This study has compared the coverage and precision of approximate methods for assigning confidence intervals to the intraclass correlation coefficient of a continuous outcome. It is informative to quantify the uncertainty with which ρ is estimated when designing cluster randomized trials, as this information can be used to provide a range of suitable values of ρ for assessing the sensitivity of sample size estimates in cluster randomized trials. Confidence limits may also inform the specification of priors for ρ and the between-cluster variance when implementing Bayesian analyses [15].

This study has demonstrated that methods based on the variance ratio statistic are more

appropriate than those based on large sample approximations to the standard error of ρ̂ for data with relatively small numbers of large clusters and low within-cluster correlation. As a cautionary note, the findings may not be generalizable to data which contain large numbers of small clusters, as are typically found in studies of heritability or interrater reliability. Further, the validity of the simulations largely rests on the extent to which the Poisson distribution describes the nature of imbalance in cluster size within cluster randomized trials. One might expect the degree of imbalance to be greater and more irregular in real data. Use of the Poisson distribution may not be the best approach to assess the sensitivity of the confidence interval methods to data in which, say, one or two clusters are extremely small relative to the others. Such a scenario is known to have a negative impact on the precision of at least one of the methods [25]. As commented on in the Results section, the analyses of the Greenwich data highlighted the fact that subtleties in the nature of real data may sometimes have a marked effect on the relative performance of these methods. Nevertheless, this simulation study adds to knowledge in this area, as much of the comparative work on point and interval estimation of ρ has tended to focus on data with large numbers of small clusters.

Donner and Wells [16], based on their simulations of sibship sized clusters, recommended

Smith’s standard error method on the grounds that it provides better coverage over a wider range of intraclass correlation coefficients than other approximate methods. The Stata [20] software provides Smith’s intervals as the default with the ‘loneway’ procedure for estimating ρ, and Searle’s intervals may only be selected when the cluster sizes are equal. The coverage of the standard error methods depends on having a large number of clusters. Cluster randomized trials, however, frequently have small numbers (fewer than 30) of large clusters, and the values of ρ tend to be less than 0.05 and are rarely greater than 0.3 [13, 33–35]. Under these conditions Searle’s method provides better coverage.

Non-Normality in the cluster specific random effects had a strong negative effect on the

coverage of all methods. The degree of non-Normality simulated was marked in comparison to most continuous health outcomes, but was nonetheless based upon real data. Comparisons were not made for data that were non-Normally distributed at the individual level, on the grounds that a suitable transformation can be derived to Normalize the data. A related question is how applicable the evaluated methods are to binary data. Donner [18] points out that whilst the analysis of variance method provides valid point estimates for


the intraclass correlation coefficient of a dichotomous outcome, the application of confidence interval methods that assume Normality may not be justified. Further research is needed to identify the scenarios in which these methods might be expected to perform adequately for binary data. The bootstrap [36] and Bayesian methods [15] offer potential for assigning measures of precision to ρ for non-Normal outcomes.

Data from cluster randomized trials are frequently characterized by more than one level

of clustering. For instance, in a trial randomizing general practices and measuring outcomes on individuals, general practitioners form an intermediate level of clustering. The most appropriate source for estimating the intraclass correlation coefficient for a given outcome may be a survey study, and such surveys may also be characterized by many levels of clustering [28]. All sources of clustering need to be recognized in the analysis, and thus the methods compared here are limited in that they may only be applied in two-level structures, and adjustment for fixed confounders is not straightforward. Multivariate approaches such as multilevel modelling [37] may be used to obtain point estimates and confidence intervals for the components of variance under more complex designs, though confidence intervals for the intraclass correlation coefficient itself are not directly obtained. Further, the confidence intervals of the variance components are constructed using the standard error, on the assumption that the estimated components are Normally distributed. This will only be the case for the between-cluster variance component when there is a sufficiently large number of clusters, a situation which is not common in cluster randomized trials [38]. Again, use of the bootstrap in the context of multilevel models may overcome these limitations.

All methods for assigning intervals were characterized by poor levels of precision relative

to the size of ρ, even for sample sizes that would be considered large by the standards of cluster randomized trials. Truncating the intervals for ρ at zero when calculating the variance inflation factor does little to reduce the level of uncertainty. The variance inflation factor formula shows that when the number of subjects sampled from each cluster is large, even an apparently minor inaccuracy in the estimate of ρ may result in the overall sample size of a cluster randomized trial being vastly overestimated or underestimated. The wide intervals in the simulations therefore draw attention to the problem that, whilst one can quantify uncertainty in ρ, the sample size required in cluster randomized trials to guard against underestimation of ρ may be unattainable in practice. In recognition of the considerable sample sizes required to estimate ρ with reasonable precision, greater use should be made of complex health surveys, such as the Health Survey for England, which sample large numbers of subjects and clusters [34].
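The sensitivity of sample size to ρ described above can be made concrete with a small sketch. It assumes the usual variance inflation factor for clusters of equal size m, 1 + (m − 1)ρ, treats the Greenwich arithmetic mean cluster size (59.79) as if every practice had that many patients, and applies the formula to the Searle (n̄) limits and the point estimate from Table III, truncating any negative limit at zero. The function name `design_effect` is hypothetical.

```python
def design_effect(rho, m):
    """Variance inflation factor 1 + (m - 1) * rho for clusters of equal size m.

    A negative rho (e.g. a non-truncated lower confidence limit) is truncated at zero.
    """
    return 1.0 + (m - 1.0) * max(rho, 0.0)


m = 59.79  # Greenwich arithmetic mean cluster size, used as if all clusters were equal
for label, rho in [("lower", 0.00145), ("estimate", 0.01025), ("upper", 0.02652)]:
    print(f"{label:>8}: rho = {rho:.5f}, inflation = {design_effect(rho, m):.2f}")
```

The inflation factor ranges from about 1.09 at the lower limit to about 2.56 at the upper limit, so the sample size implied by the upper limit is more than twice that implied by the lower limit, even though both limits are below 0.03.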

ACKNOWLEDGEMENTS

I am indebted to Peter Burney for permission to use data from the Greenwich asthma study. I am also grateful to Susan Chinn, Martin Gulliford, Roger Newson and two anonymous referees for constructive comments. This study was supported by an MRC Special Training Fellowship in Health Services and Health of the Public Research.

REFERENCES

1. Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. Arnold: London, 2000.

2. Murray DM. Design and Analysis of Group-Randomized Trials. Oxford University Press: New York, 1998.


3. Premaratne UN, Sterne JA, Marks GB, Webb JR, Azima H, Burney PG. Clustered randomised trial of an intervention to improve the management of asthma: Greenwich asthma study. British Medical Journal 1999; 318:1251–1255.

4. Wyatt JC, Paterson-Brown S, Johanson R, Altman DG, Bradburn MJ, Fisk NM. Randomised trial of educational visits to enhance use of systematic reviews in 25 obstetric units. British Medical Journal 1998; 317:1041–1046.

5. Gail MH, Byar DP, Pechacek TF, Corle DK. Aspects of statistical design for the Community Intervention Trial for Smoking Cessation (COMMIT). Controlled Clinical Trials 1992; 13:6–21.

6. Zucker DM, Lakatos E, Webber LS, Murray DM, McKinlay SM, Feldman HA, Kelder SH. Statistical design of the Child and Adolescent Trial for Cardiovascular Health (CATCH): implications of cluster randomization. Controlled Clinical Trials 1995; 16:96–118.

7. Kish L. Survey Sampling. Wiley: New York, 1965.

8. Hannan PJ, Murray DM, Jacobs Jr DR, McGovern PG. Parameters to aid in the design and analysis of community trials: Intraclass correlations from the Minnesota Heart Health Program. Epidemiology 1994; 5:88–95.

9. Kelder SH, Jacobs Jr DR, Jeffery RW, McGovern PG, Forster JL. The worksite component of variance: design effects and the Healthy Worker Project. Health Education Research 1993; 8:555–566.

10. Ukoumunne OC, Gulliford MC, Chinn S, Sterne JAC, Burney PGJ. Methods for evaluating area-wide and organization-based interventions in health and health care: a systematic review. Health Technology Assessments 1999; 3(5).

11. Campbell MK, Grimshaw JM. Cluster randomised trials: time for improvement. The implications of adopting a cluster design are still largely being ignored. British Medical Journal 1998; 317:1171–1172.

12. Feng Z, Grizzle JE. Correlated binomial variates: properties of estimator of intraclass correlation and its effect on sample size calculation. Statistics in Medicine 1992; 11:1607–1614.

13. Campbell MJ. Cluster randomized trials in general (family) practice research. Statistical Methods in Medical Research 2000; 9:81–94.

14. Piaggio G, Carroli G, Villar J, Pinol A, Bakketeig L, Lumbiganon P, Bergsjø P, Al-Mazrou Y, Ba’aqeel H, Belizán JM, Farnot U, Berendes H. Methodological considerations on the design and analysis of an equivalence stratified cluster randomization trial. Statistics in Medicine 2001; 20:401–416.

15. Turner RM, Omar RZ, Thompson SG. Bayesian methods of analysis for cluster randomized trials with binary outcome data. Statistics in Medicine 2001; 20:453–472.

16. Donner A, Wells G. A comparison of confidence interval methods for the intraclass correlation coefficient. Biometrics 1986; 42:401–412.

17. Lohr SL, Divan M. Comparison of confidence intervals for variance components with unbalanced data. Journal of Statistics and Computer Simulation 1997; 58:83–97.

18. Donner A. A review of inference procedures for the intraclass correlation coefficient in the one-way random effects model. International Statistical Review 1986; 54:67–82.

19. Wald A. A note on the analysis of variance with unequal class frequencies. Annals of Mathematical Statistics 1940; 11:96–100.

20. StataCorp. Stata Statistical Software: Release 7.0. Stata Corporation: College Station, TX, 2000.

21. Searle SR. Linear Models. Wiley: New York, 1971.

22. Burdick RK, Graybill FA. Confidence Intervals on Variance Components. Marcel Dekker: New York, 1992.

23. Thomas JD, Hultquist RA. Interval estimation for the unbalanced case of the one-way random effects model. Annals of Statistics 1978; 6:582–587.

24. Smith CAB. On the estimation of intraclass correlation. Annals of Human Genetics 1956; 21:363–373.

25. Swiger LA, Harvey LR, Everson DO, Gregory KE. The variance of intraclass correlation involving groups with one observation. Biometrics 1964; 20:818–826.

26. Fisher RA. Statistical Methods for Research Workers. Hafner: New York, 1970.

27. Ramberg JS, Dudewicz EJ, Tadikamalla PR, Mykytka EF. A probability distribution and its uses in fitting data. Technometrics 1979; 21:201–214.

28. Colhoun H, Prescott-Clarke P. Health Survey for England 1994. HMSO: London, 1996.

29. Bellamy SL, Gibberd R, Hancock L, Howley P, Kennedy B, Klar N, Lipsitz S. Analysis of dichotomous outcome data for community intervention studies. Statistical Methods in Medical Research 2000; 9:135–159.

30. Burdick RK, Maqsood F, Graybill FA. Confidence intervals on the intraclass correlation in the unbalanced one-way classification. Communications Statistics – Theory and Methods 1986; 15:3353–3378.

31. Giraudeau B, Mary JY. Planning a reproducibility study: how many subjects and how many replicates per subject for an expected width of the 95 per cent confidence interval of the intraclass correlation coefficient. Statistics in Medicine 2001; 20:3205–3214.

32. Marks GB, Dunn SM, Woolcock AJ. A scale for the measurement of quality of life in adults with asthma. Journal of Clinical Epidemiology 1992; 45:461–472.

33. Campbell M, Grimshaw J, Steen N. For the changing professional practice in Europe group. Sample size calculations for cluster randomised trials. Journal of Health Services & Research Policy 2000; 5:12–16.


34. Gulliford MC, Ukoumunne OC, Chinn S. Components of variance and intraclass correlations for the design of community-based surveys and intervention studies: data from the Health Survey for England 1994. American Journal of Epidemiology 1999; 149:876–883.

35. Kerry SM, Bland JM. Unequal cluster sizes for trials in English and Welsh general practice: implications for sample size calculations. Statistics in Medicine 2001; 20:377–390.

36. Davison AC, Hinkley DV. Bootstrap Methods and Their Application. Cambridge University Press: Cambridge, 1997.

37. Goldstein H. Multilevel Statistical Models. Edward Arnold: London, 1995.

38. Evans BA, Feng Z, Peterson AV. A comparison of generalized linear mixed model procedures with estimating equations for variance and covariance parameter estimation in longitudinal studies and group randomised trials. Statistics in Medicine 2001; 20:3353–3373.
