How does the DerSimonian and Laird procedure for random effects meta-analysis compare with its more efficient but harder to compute counterparts?

Dan Jackson a,*, Jack Bowden a, Rose Baker b

a MRC Biostatistics Unit, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 0SR, UK
b Centre for OR and Applied Statistics, School of Business, University of Salford, Salford M5 4WT, UK

Journal of Statistical Planning and Inference 140 (2010) 961-970. doi:10.1016/j.jspi.2009.09.017

Article history: Received 4 June 2009; received in revised form 21 September 2009; accepted 23 September 2009; available online 30 September 2009.

Keywords: Confidence intervals; Efficiency; Meta-analysis; Random effects; Profile likelihood

Abstract

The procedure suggested by DerSimonian and Laird is the simplest and most commonly used method for fitting the random effects model for meta-analysis. Here it is shown that, unless all studies are of similar size, this is inefficient when estimating the between-study variance, but is remarkably efficient when estimating the treatment effect. If formal inference is restricted to statements about the treatment effect, and the sample size is large, there is little point in implementing more sophisticated methodology. However, it is further demonstrated, for a simple special case, that use of the profile likelihood results in actual coverage probabilities for 95% confidence intervals that are closer to nominal levels for smaller sample sizes. Alternative methods for making inferences for the treatment effect may therefore be preferable if the sample size is small, but the DerSimonian and Laird procedure retains its usefulness for larger samples.

(c) 2009 Elsevier B.V. All rights reserved.

1. Introduction

Meta-analysis, the pooling of separate studies concerned with the same treatment or issue, is frequently used in medical and other applications. Although some debate concerning random versus fixed effects modelling continues, the random effects model has become a standard approach.

Although the conventional random effects model is easily implemented, it has often been criticised. First of all, the studies must be large enough to use normal approximations, with known variances, for the within-study distributions. More recently, methods have been proposed using exact conditional distributions (van Houwelingen et al., 1993; Taye et al., 2008; Shi and Copas, 2002) and other developments recognise that the within-study variances are given in the form of estimates and that these are typically functions of the underlying treatment effect (Böhning et al., 2002; Malzahn et al., 2000). We will assume here, however, that studies are large enough to justify the standard within-study normal approximations.

The conventional random effects model also makes the assumption that the random effect is normally distributed, although alternative distributional assumptions have been considered (Lee and Thompson, 2008; Baker and Jackson, 2008). The naive application of the random effects model has also been questioned due to the suspicion that study results may have been distorted by publication and related biases (Baker and Jackson, 2006). Whilst recognising all these issues and concerns, we will assume here that the random effects model is appropriate and hence that all issues relate to which



estimation and related procedures to use. In particular, a now standard method originally proposed by DerSimonian and Laird (1986) is widely used, although this idea has more recently been extended (DerSimonian and Kacker, 2007). The popularity of this procedure is no doubt partly due to its relative simplicity. The DerSimonian and Laird procedure also has the merit of not requiring the assumption of normality for the random effect, an assumption that is sometimes questioned (Hardy and Thompson, 1998). Hence the procedure is 'valid approximately in a distribution-free context when there are many studies' (Higgins et al., 2009). This statement does not reassure us that the DerSimonian and Laird procedure is effective compared to the alternatives, however.

The rest of the paper is set out as follows. In Section 2, the random effects model is described and we also provide a proof that estimates of treatment effect are unbiased under the assumptions of the model. In Section 3 the asymptotic (large number of studies) efficiency of the DerSimonian and Laird estimates is investigated, and the small sample case is considered in Section 4. In Section 5, an investigation into using the profile likelihood suggests that this provides more suitable coverage probabilities of confidence intervals when the sample size is modest, and we conclude with a discussion in Section 6.

2. The random effects model

The conventional fixed and random effects models (DerSimonian and Laird, 1986; Biggerstaff and Tweedie, 1997; Jackson, 2009; Hardy and Thompson, 1996) initially assume that the estimate of treatment effect from the i th study, $Y_i$, is distributed as $Y_i \mid \mu_i \sim N(\mu_i, \sigma_i^2)$, where $\mu_i$ is the true underlying treatment effect of the i th study and $\sigma_i^2$ is the corresponding within-study variance. The variance $\sigma_i^2$ is unknown but is replaced by a consistent estimate in practice. The conventional random effects model further assumes that $\mu_i \sim N(\mu, \tau^2)$, where $\mu$ and $\tau^2$ denote the overall treatment effect and between-study variance, respectively, and that the studies are independent. This provides the marginal distributions $Y_i \sim N(\mu, \sigma_i^2 + \tau^2)$.
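As a concrete illustration, this two-stage model can be simulated directly: draw a study-specific true effect, then add within-study noise, so that the marginal variance of each estimate is $\sigma_i^2 + \tau^2$. A minimal sketch (the function name and parameter values below are ours, not from the paper):

```python
import random
import statistics

def simulate_meta_analysis(mu, tau2, within_vars, rng):
    """One draw from the two-stage random effects model:
    mu_i ~ N(mu, tau2), then Y_i | mu_i ~ N(mu_i, sigma_i^2),
    so that marginally Y_i ~ N(mu, sigma_i^2 + tau2)."""
    return [rng.gauss(rng.gauss(mu, tau2 ** 0.5), s2 ** 0.5)
            for s2 in within_vars]

# Check the marginal moments on many replicates of a single study
rng = random.Random(1)
samples = [simulate_meta_analysis(0.5, 0.3, [0.2], rng)[0] for _ in range(20000)]
sample_mean = statistics.mean(samples)      # should be near mu = 0.5
sample_var = statistics.variance(samples)   # should be near 0.2 + 0.3 = 0.5
```

The empirical variance combining both stages is the point of the check: heterogeneity inflates every study's sampling variance by the same amount $\tau^2$.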

2.1. The standard procedure

The usual procedure begins by estimating $\tau^2$. Once $\hat{\tau}^2$ has been evaluated, irrespective of how this has been obtained, the standard inference for $\mu$ is straightforward, as $\hat{\tau}^2$ is effectively used or 'plugged in' as the true value. The simplest and most commonly used estimate of $\tau^2$ is the DerSimonian and Laird (1986) estimate. This uses the Q statistic,

$$Q = \sum_{i=1}^{n} w_i (y_i - \bar{y})^2,$$

where $w_i = \sigma_i^{-2}$, $\bar{y} = \sum_{i=1}^{n} w_i y_i / \sum_{i=1}^{n} w_i$ and $n$ denotes the number of studies. Under the assumptions of the random effects model it can be shown that the expectation of Q is

$$E[Q] = (n-1) + \left(S_1 - \frac{S_2}{S_1}\right)\tau^2,$$

where $S_r = \sum_{i=1}^{n} w_i^r$, which provides the DerSimonian and Laird estimate

$$\hat{\tau}^2_{DL} = \max\left(0, \frac{Q - (n-1)}{S_1 - S_2/S_1}\right). \qquad (1)$$

The corresponding estimate of treatment effect is

$$\hat{\mu}_{DL} = \frac{\sum_{i=1}^{n} y_i/(\sigma_i^2 + \hat{\tau}^2_{DL})}{\sum_{i=1}^{n} 1/(\sigma_i^2 + \hat{\tau}^2_{DL})}. \qquad (2)$$

Confidence intervals are typically obtained using the approximation $\hat{\mu}_{DL} \sim N(\mu, (\sum_{i=1}^{n} \hat{w}_i^*)^{-1})$, where $\hat{w}_i^* = 1/(\sigma_i^2 + \hat{\tau}^2_{DL})$, which is justified assuming that the studies are sufficiently large and also that there is at least a moderate number of these. For example, $100(1-\alpha)\%$ confidence intervals for $\mu$ are obtained as

$$\hat{\mu}_{DL} \pm Z_{\alpha/2} \left(\sum_{i=1}^{n} \hat{w}_i^*\right)^{-1/2}, \qquad (3)$$

where $Z_{\alpha/2}$ denotes the $\alpha/2$ quantile of the standard normal distribution. Quantiles from the t distribution, with $(n-1)$ degrees of freedom, rather than the standard normal, are sometimes used in (3) instead.
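The whole procedure of Eqs. (1)-(3) takes only a few lines of code. A sketch (function and variable names are of our choosing):

```python
import math
from statistics import NormalDist

def dersimonian_laird(ys, within_vars, alpha=0.05):
    """DerSimonian and Laird procedure: Q statistic, tau^2 from Eq. (1),
    pooled effect from Eq. (2), normal-approximation CI from Eq. (3)."""
    n = len(ys)
    w = [1.0 / v for v in within_vars]                    # w_i = sigma_i^{-2}
    ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, ys))
    s1 = sum(w)
    s2 = sum(wi ** 2 for wi in w)
    tau2 = max(0.0, (q - (n - 1)) / (s1 - s2 / s1))       # Eq. (1), truncated at zero
    wstar = [1.0 / (v + tau2) for v in within_vars]       # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(wstar, ys)) / sum(wstar)   # Eq. (2)
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) / math.sqrt(sum(wstar))
    return tau2, mu, (mu - half, mu + half)               # Eq. (3)

# Two equally precise studies with y = 0 and y = 2: Q = 2, so tau2 = 1 and mu = 1
tau2, mu, ci = dersimonian_laird([0.0, 2.0], [1.0, 1.0])
```

With two unit-variance studies, $S_1 - S_2/S_1 = 1$ and the arithmetic can be checked by hand, which is why the toy example is useful.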


2.2. Three implementations of the random effects model

Standard variations of this procedure use alternative but consistent estimates of $\tau^2$ to (1) in (2) and (3) (Normand, 1999). We identify three main variations in implementing the random effects model:

1. The DerSimonian and Laird procedure in its entirety.
2. Maximum likelihood: the procedure which follows DerSimonian and Laird but where the maximum likelihood estimate of $\tau^2$, $\hat{\tau}^2_{ML}$, is used instead.
3. Restricted maximum likelihood (REML): the procedure which follows DerSimonian and Laird but where the REML estimate of $\tau^2$, $\hat{\tau}^2_{REML}$, is used instead.

Note, however, that although the first of these procedures is referred to here as the procedure of DerSimonian and Laird, they in fact examined all three of these variations in their original paper (DerSimonian and Laird, 1986). Variants 2 and 3 require numerical maximisation and therefore iteration and diagnostic checks that the numerical methods have been successful; they are more computationally intensive but might be expected to outperform DerSimonian and Laird's procedure.
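For illustration, variants 2 and 3 can be implemented by maximising the (restricted) log-likelihood numerically; a crude grid search stands in here for the iterative schemes used in practice, and all names are ours. For equal-sized studies both maximisers have closed forms, which makes the sketch easy to check:

```python
import math

def neg2_profile_loglik(tau2, ys, within_vars):
    """-2 x log-likelihood with mu profiled out at its weighted-mean maximiser."""
    w = [1.0 / (v + tau2) for v in within_vars]
    mu = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    return sum(math.log(v + tau2) + (yi - mu) ** 2 / (v + tau2)
               for v, yi in zip(within_vars, ys))

def neg2_restricted_loglik(tau2, ys, within_vars):
    """-2 x restricted log-likelihood: profile likelihood plus log(sum of weights)."""
    w = [1.0 / (v + tau2) for v in within_vars]
    return neg2_profile_loglik(tau2, ys, within_vars) + math.log(sum(w))

def argmin_grid(f, lo=0.0, hi=10.0, steps=20001):
    """Crude grid search standing in for the iterative maximisation used in practice."""
    xs = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return min(xs, key=f)

ys, vs = [0.0, 2.0, 4.0], [1.0, 1.0, 1.0]
ml_tau2 = argmin_grid(lambda t: neg2_profile_loglik(t, ys, vs))
reml_tau2 = argmin_grid(lambda t: neg2_restricted_loglik(t, ys, vs))
# Equal sizes: ML gives sum((y - ybar)^2)/n - sigma^2 = 8/3 - 1;
# REML gives sum((y - ybar)^2)/(n-1) - sigma^2 = 8/2 - 1 = 3
```

The ML estimate is never larger than the REML estimate here, in line with the comparison made in Section 5.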

2.3. A further alternative implementation of the random effects model

An established but yet more computationally intensive alternative is to use profile likelihood (Hardy and Thompson, 1996). Here the log-likelihood of the data is maximised to give estimates $\hat{\mu}$ and $\hat{\tau}^2$, and hence the same point estimates as for variation 2 are obtained, but the profile likelihood is used to provide confidence intervals for both parameters, either separately or jointly.

2.4. A proof that all standard estimates of treatment effect are unbiased under the random effects model

One important consideration when examining alternative estimators is their bias. Although all conventional estimates of $\tau^2$ are merely asymptotically unbiased, it is not generally appreciated that the corresponding estimates of treatment effect are unbiased under the random effects model. However, it should be emphasised that the assumption that the within-study variances are regarded as fixed and known is critical here (Shuster, 2009). Denote the vector of the $Y_i$ by $\mathbf{Y}$, and let $E[\mathbf{Y}] = \boldsymbol{\mu}$. Emphasising that $\hat{\tau}^2$ is a function of the $Y_i$, we have that

$$\hat{\tau}^2(\mathbf{Y}) = \hat{\tau}^2(\mathbf{Y} - \boldsymbol{\mu}) = \hat{\tau}^2(-(\mathbf{Y} - \boldsymbol{\mu})) \qquad (4)$$

for all conventional estimates of $\tau^2$. Writing

$$\hat{\mu} = \mu + \frac{\sum_{i=1}^{n} (Y_i - \mu)/(\sigma_i^2 + \hat{\tau}^2)}{\sum_{i=1}^{n} 1/(\sigma_i^2 + \hat{\tau}^2)}$$

and taking the expectation gives

$$E[\hat{\mu}] = \mu + E\left[\frac{\sum_{i=1}^{n} (Y_i - \mu)/(\sigma_i^2 + \hat{\tau}^2)}{\sum_{i=1}^{n} 1/(\sigma_i^2 + \hat{\tau}^2)}\right] = \mu + b,$$

where $b$ is the bias. Defining $Z_i$ such that $Z_i - \mu = -(Y_i - \mu)$, by the symmetry of the standard normal density function $\phi(\cdot)$, $E[g(\mathbf{Z} - \boldsymbol{\mu})] = E[g(\mathbf{Y} - \boldsymbol{\mu})]$ for any function $g$ where these expectations exist. Then, using (4) and the definition of $Z_i - \mu$,

$$b = E\left[\frac{\sum_{i=1}^{n} (Y_i - \mu)/(\sigma_i^2 + \hat{\tau}^2(\mathbf{Y} - \boldsymbol{\mu}))}{\sum_{i=1}^{n} 1/(\sigma_i^2 + \hat{\tau}^2(\mathbf{Y} - \boldsymbol{\mu}))}\right] = E\left[\frac{\sum_{i=1}^{n} (Z_i - \mu)/(\sigma_i^2 + \hat{\tau}^2(\mathbf{Z} - \boldsymbol{\mu}))}{\sum_{i=1}^{n} 1/(\sigma_i^2 + \hat{\tau}^2(\mathbf{Z} - \boldsymbol{\mu}))}\right] = -E\left[\frac{\sum_{i=1}^{n} (Y_i - \mu)/(\sigma_i^2 + \hat{\tau}^2(\mathbf{Y} - \boldsymbol{\mu}))}{\sum_{i=1}^{n} 1/(\sigma_i^2 + \hat{\tau}^2(\mathbf{Y} - \boldsymbol{\mu}))}\right] = -b,$$

and hence $b = 0$ and $E[\hat{\mu}] = \mu$.
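The unbiasedness result can be checked by simulation: under the marginal model $Y_i \sim N(\mu, \sigma_i^2 + \tau^2)$, the mean of the DerSimonian and Laird estimates of $\mu$ should match $\mu$ even though $\tau^2$ is estimated and the study sizes are unequal. A sketch with names and parameter values of our choosing:

```python
import random

def tau2_dl(ys, vs):
    """DerSimonian and Laird estimate of the between-study variance, Eq. (1)."""
    n = len(ys)
    w = [1.0 / v for v in vs]
    ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, ys))
    s1, s2 = sum(w), sum(wi * wi for wi in w)
    return max(0.0, (q - (n - 1)) / (s1 - s2 / s1))

def mu_hat(ys, vs, tau2):
    """Weighted treatment-effect estimate, Eq. (2), at a given tau^2."""
    w = [1.0 / (v + tau2) for v in vs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

rng = random.Random(7)
mu, tau2 = 0.7, 0.1
vs = [0.05, 0.1, 0.2, 0.4, 0.8]          # deliberately unequal study sizes
ests = []
for _ in range(20000):
    ys = [rng.gauss(mu, (v + tau2) ** 0.5) for v in vs]   # marginal model draws
    ests.append(mu_hat(ys, vs, tau2_dl(ys, vs)))
mean_est = sum(ests) / len(ests)          # should be close to mu = 0.7
```

The simulation also illustrates why the proof works: flipping the signs of all residuals leaves the estimated weights unchanged, so positive and negative deviations of $\hat{\mu}$ from $\mu$ are equally likely.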

3. Asymptotic efficiency of point estimates

In this section, we will examine the asymptotic efficiency of standard point estimates of both $\mu$ and $\tau^2$ in order to investigate how well the various procedures perform in large samples. We denote $\theta = (\theta_1, \theta_2) = (\mu, \tau^2)$. We make the assumption that the estimators $\hat{\theta}_1 = \hat{\mu}$ and $\hat{\theta}_2 = \hat{\tau}^2$ are asymptotically unbiased, which is the case for all standard estimation procedures when applying the random effects model, as the various possibilities for $\hat{\mu}$ are unbiased (as proved in Section 2.4) and the bias of all conventional estimates $\hat{\tau}^2$ tends towards zero as $n \to \infty$. Emphasising their dependence on the sample $\mathbf{Y}$, we denote these estimators by $W_{\theta_1}(\mathbf{Y})$ and $W_{\theta_2}(\mathbf{Y})$. We let $f(\mathbf{y}; \theta)$ denote the n-variate probability density


function of the data, i.e.

$$f(\mathbf{y}; \theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{\sigma_i^2 + \theta_2}}\, \phi\!\left(\frac{y_i - \theta_1}{\sqrt{\sigma_i^2 + \theta_2}}\right).$$

Assuming estimates are asymptotically unbiased, as $n \to \infty$, we have

$$E[W_{\theta_1}(\mathbf{Y})] = \int w_{\theta_1}(\mathbf{y}) f(\mathbf{y}; \theta)\, d\mathbf{y} \to \theta_1 \qquad (5)$$

and

$$E[W_{\theta_2}(\mathbf{Y})] = \int w_{\theta_2}(\mathbf{y}) f(\mathbf{y}; \theta)\, d\mathbf{y} \to \theta_2, \qquad (6)$$

where these integrals denote n-variate integrals. We assume that the estimators are regular functions, so that

$$\frac{\partial}{\partial \theta_i} E[W_{\theta_j}(\mathbf{Y})] = \frac{\partial}{\partial \theta_i} \int w_{\theta_j}(\mathbf{y}) f(\mathbf{y}; \theta)\, d\mathbf{y} = \int w_{\theta_j}(\mathbf{y}) \frac{\partial}{\partial \theta_i} f(\mathbf{y}; \theta)\, d\mathbf{y} \qquad (7)$$

for $i, j = 1, 2$. This is the case under very general circumstances, although note that the two-sided derivatives in (7) assume that parameter values are not at the edge of parameter spaces. Hence the results are restricted to $\tau^2 > 0$. We denote Fisher's information matrix by $J(\theta)$ and the variance of $\mathbf{W}(\mathbf{Y}) = (W_{\theta_1}(\mathbf{Y}), W_{\theta_2}(\mathbf{Y}))$ by $V(\theta)$.

We show in the Appendix that $V(\theta) - J(\theta)^{-1}$ must be positive semi-definite in large samples. In particular, asymptotically, the diagonal entries of $V(\theta)$ must be greater than or equal to the corresponding entries of $J(\theta)^{-1}$. Effectively, by considering the asymptotic case where estimates are unbiased, we arrive at the multivariate Rao-Cramér inequality (Drygas, 1987). Although minimum variance bounds have been derived without requiring regular functions (Chapman and Robbins, 1951), this assumption is very general and includes all estimation procedures one might consider reasonable in practice.

3.1. Fisher’s information matrix and variance bounds for estimates using the random effects model

The log-likelihood of the data is

$$L(\theta) = -\frac{1}{2} \sum_{i=1}^{n} \left\{ \log(2\pi(\sigma_i^2 + \theta_2)) + \frac{(y_i - \theta_1)^2}{\sigma_i^2 + \theta_2} \right\}. \qquad (8)$$

Upon obtaining all second order partial derivatives, and taking minus the expectation of the resulting expressions over the independent distributions $Y_i \sim N(\theta_1, \sigma_i^2 + \theta_2)$, we obtain

$$J(\theta) = \begin{pmatrix} \displaystyle\sum_{i=1}^{n} \frac{1}{\sigma_i^2 + \theta_2} & 0 \\ 0 & \displaystyle\frac{1}{2}\sum_{i=1}^{n} \frac{1}{(\sigma_i^2 + \theta_2)^2} \end{pmatrix}$$

so that

$$J(\theta)^{-1} = \begin{pmatrix} \left(\displaystyle\sum_{i=1}^{n} \frac{1}{\sigma_i^2 + \theta_2}\right)^{-1} & 0 \\ 0 & 2\left(\displaystyle\sum_{i=1}^{n} \frac{1}{(\sigma_i^2 + \theta_2)^2}\right)^{-1} \end{pmatrix}. \qquad (9)$$

Hence, by restricting ourselves to using asymptotically unbiased estimators, we have that the asymptotic variances of $\hat{\theta}_1 = \hat{\mu}$ and $\hat{\theta}_2 = \hat{\tau}^2$ must be greater than or equal to $v(\mu)$ and $v(\tau^2)$, where

$$v(\mu) = 1 \Big/ \sum_{i=1}^{n} \frac{1}{\sigma_i^2 + \tau^2} \qquad (10)$$

and

$$v(\tau^2) = 2 \Big/ \sum_{i=1}^{n} \frac{1}{(\sigma_i^2 + \tau^2)^2}. \qquad (11)$$


3.2. The implications of these bounds

If maximum likelihood or REML estimation is used then these bounds are achieved; indeed the inverse of the information matrix (9) is typically used as an approximate variance matrix for such estimators. The DerSimonian and Laird estimators, by using a moments estimate of $\tau^2$, do not guarantee this in general and hence these will be investigated further in this section. An important result given by Biggerstaff and Tweedie (1997) is

$$\mathrm{Var}[Q] = 2(n-1) + 4(S_1 - S_2/S_1)\tau^2 + 2(S_2 - 2S_3/S_1 + S_2^2/S_1^2)\tau^4,$$

where $S_r = \sum w_i^r$ and $w_i = \sigma_i^{-2}$. Hence the variance of the untruncated DerSimonian and Laird estimate of $\tau^2$ is $\mathrm{Var}[\hat{\tau}^2_u] = \mathrm{Var}[Q]/(S_1 - S_2/S_1)^2$ and, assuming that the true $\tau^2 > 0$, this is also asymptotically $\mathrm{Var}(\hat{\tau}^2_{DL})$ because the probability of truncation tends to zero as $n \to \infty$. The procedure suggested by Biggerstaff and Jackson (2008) could be used in principle to allow for truncation but requires obtaining high dimensional eigenvalues and applying Farebrother's algorithm (Farebrother, 1984), for calculating the distribution of a positive linear combination of $\chi^2$ random variables, with a very large number of variables. The ratio $v(\tau^2)/\mathrm{Var}(\hat{\tau}^2_u)$, as $n \to \infty$, gives the asymptotic efficiency of the DerSimonian and Laird estimate of $\tau^2$.

3.2.1. The special case where all studies are the same size

An important special case is the scenario where all studies are the same 'size', i.e. $\sigma_i^2 = \sigma^2$ for all $i$. The minimum asymptotic variance $v(\mu)$ (10) simplifies in this instance to $(\sigma^2 + \tau^2)/n$; all three procedures provide $\hat{\mu} = \bar{y}$, so that this bound is achieved. Furthermore (11) simplifies to $2/\sum_{i=1}^{n} 1/(\sigma_i^2 + \tau^2)^2 = 2(\sigma^2 + \tau^2)^2/n$ and $\mathrm{Var}(\hat{\tau}^2_{DL})$, upon replacing $S_r$ with $nw^r$ and approximating $n - 1 \approx n$, simplifies to this same expression. The DerSimonian and Laird estimate of $\tau^2$ is therefore also asymptotically efficient if all studies are the same size.
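A quick Monte Carlo check of this equal-size result: the empirical variance of the untruncated estimator should be close to $2(\sigma^2 + \tau^2)^2/n$. A sketch with illustrative values of our choosing:

```python
import random

def tau2_untruncated_equal(ys, sigma2):
    """Untruncated DL estimate (Q - (n-1))/(S1 - S2/S1) when sigma_i^2 = sigma2;
    in that case S1 - S2/S1 = (n-1)/sigma2."""
    n = len(ys)
    ybar = sum(ys) / n
    q = sum((y - ybar) ** 2 for y in ys) / sigma2
    return sigma2 * (q - (n - 1)) / (n - 1)

rng = random.Random(3)
sigma2, tau2, n = 0.5, 0.5, 200          # illustrative values of our choosing
sims = []
for _ in range(3000):
    ys = [rng.gauss(0.0, (sigma2 + tau2) ** 0.5) for _ in range(n)]
    sims.append(tau2_untruncated_equal(ys, sigma2))
m = sum(sims) / len(sims)                 # unbiased for tau2 = 0.5
emp_var = sum((s - m) ** 2 for s in sims) / (len(sims) - 1)
# theory: approximately 2 * (sigma2 + tau2)**2 / n = 0.01
```

Because all the weights are equal here, the estimator reduces to the sample variance of the $y_i$ minus $\sigma^2$, which is why its distribution is a rescaled $\chi^2_{n-1}$ and its variance is easy to predict.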

3.2.2. More realistic distributions for the within-study variances

Three possibilities are examined. First of all, $0.25\chi^2_1$, truncated to lie within [0.009, 0.6], as suggested by Brockwell and Gordon (2007), was used. This provides a density that is decreasing in $\sigma_i^2$, so that large studies (small $\sigma_i^2$) are more probable than small studies. For this and all other distributions considered, a representative sample of $n$ within-study variances is obtained as the $0, 1/(n-1), 2/(n-1), \ldots, 1$ quantiles. To provide a contrasting scenario, this same scaled and truncated $\chi^2$ distribution, but with 10 degrees of freedom, was also used, as this also provides a decreasing density function in $\sigma_i^2$. Finally, a uniform distribution for $\sigma_i^2$ over the range [0.009, 0.6] was also considered.

Jackson and Bowden (2009) considered other ranges for the distribution for $\sigma_i^2$, but since the results are to be calibrated in terms of $I^2 = \tau^2/(\sigma_t^2 + \tau^2)$, where $\sigma_t^2$ is the typical within-study variance given by Higgins and Thompson (2002), it was suspected that the shape of the distribution of $\sigma_i^2$, rather than the location, is likely to be critical.

3.2.3. The asymptotic efficiency of the DerSimonian and Laird estimate of between-study variance

For large samples,

$$\mathrm{Var}(\hat{\tau}^2_{DL}) \approx \mathrm{Var}(\hat{\tau}^2_u) = \mathrm{Var}[Q]/(S_1 - S_2/S_1)^2 \approx 2(n/S_1^2 + 2\tau^2/S_1 + S_2\tau^4/S_1^2),$$

and it is straightforward to show that $v(\tau^2)/\mathrm{Var}(\hat{\tau}^2_u) \to S_1^2/nS_2$ as $\tau^2 \to 0$ or $\tau^2 \to \infty$. Interpreting $S_r/n$ as $E[w^r]$, we have $v(\tau^2)/\mathrm{Var}(\hat{\tau}^2_u) \approx E[w]^2/E[w^2] = 1 - \mathrm{Var}[w]/E[w^2] \le 1$ for very small and large degrees of heterogeneity, and hence the efficiency can easily be calculated in some instances. In order to investigate this for more moderate degrees of heterogeneity, 1000 representative within-study variances were produced for each of the three distributions of within-study variances and $v(\tau^2)/\mathrm{Var}(\hat{\tau}^2_u)$ was directly computed, using $I^2 = 0.001, 0.002, \ldots, 0.999$, and values of this ratio are plotted against $I^2$ in Fig. 1. The asymptotic efficiency of the DerSimonian and Laird estimate of $\tau^2$ is particularly disappointing for small and large degrees of heterogeneity and, although much better for moderate $I^2$, the impression is that if an accurate estimate of $\tau^2$ is required, unless studies are of similar size, then more effort than the usual DerSimonian and Laird procedure is worthwhile.
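The ratio $v(\tau^2)/\mathrm{Var}(\hat{\tau}^2_u)$ can be computed directly from the formulas above. This sketch uses the uniform within-study variances and our reading of the Higgins and Thompson 'typical' variance, $\sigma_t^2 = (n-1)S_1/(S_1^2 - S_2)$, to convert $I^2$ into $\tau^2$ (treat that conversion as an assumption of ours):

```python
def efficiency_tau2(within_vars, tau2):
    """v(tau2) of Eq. (11) over the Biggerstaff-Tweedie variance of tau2_u."""
    n = len(within_vars)
    w = [1.0 / v for v in within_vars]
    s1 = sum(w)
    s2 = sum(wi ** 2 for wi in w)
    s3 = sum(wi ** 3 for wi in w)
    var_q = (2.0 * (n - 1) + 4.0 * (s1 - s2 / s1) * tau2
             + 2.0 * (s2 - 2.0 * s3 / s1 + s2 ** 2 / s1 ** 2) * tau2 ** 2)
    var_u = var_q / (s1 - s2 / s1) ** 2
    v_bound = 2.0 / sum(1.0 / (v + tau2) ** 2 for v in within_vars)
    return v_bound / var_u

# 1000 uniform within-study variances on [0.009, 0.6] as evenly spaced quantiles
n = 1000
vs = [0.009 + (0.6 - 0.009) * k / (n - 1) for k in range(n)]
w = [1.0 / v for v in vs]
s1, s2 = sum(w), sum(wi ** 2 for wi in w)
sigma_t2 = (n - 1) * s1 / (s1 ** 2 - s2)    # 'typical' within-study variance (our reading)
eff = {i2: efficiency_tau2(vs, i2 * sigma_t2 / (1.0 - i2))
       for i2 in (0.001, 0.5, 0.999)}
```

Evaluating at $I^2 = 0.001$, $0.5$ and $0.999$ reproduces the qualitative shape of Fig. 1: efficiency near the common limit $E[w]^2/E[w^2]$ at both extremes, and substantially higher for moderate heterogeneity.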

3.2.4. The asymptotic efficiency of the DerSimonian and Laird estimate of treatment effect

$\mathrm{Var}(\hat{\mu}_{DL})$ is much harder to derive analytically, so a simulation study was performed, using the same three sets of 1000 within-study variances as in the previous section, and $I^2 = 0.01, 0.05, 0.1, 0.2, \ldots, 0.9, 0.95, 0.99$. Values of $I^2$ can be converted back to a value of $\tau^2$ and sets of $Y_i \sim N(\mu, \sigma_i^2 + \tau^2)$, $i = 1, 2, \ldots, 1000$, can be simulated and the DerSimonian and Laird procedure applied to each set; $\mu = 0$ was adopted but this choice is immaterial. The sample variance of the resulting estimates of $\mu$ can be calculated and, since the sample size is large, it gives an indication of the asymptotic variance of $\hat{\mu}_{DL}$; the ratio of $v(\mu)$ and this variance gives a guide to the asymptotic efficiency of $\hat{\mu}_{DL}$. Using a million simulated meta-analyses for each $I^2$ and each set of within-study variances resulted in the asymptotic efficiencies shown in Fig. 2. These efficiencies are more than 98.5% in every instance. Although estimates appear to be less efficient if the degree of heterogeneity is very small, if $I^2 > 0.2$ then the results respond mainly to Monte Carlo error, rather than trend, which explains the efficiencies that are slightly greater than one, and the efficiency is very close to unity across all simulations.


[Fig. 2 appears here: asymptotic efficiency (vertical axis, roughly 0.990 to 1.002) plotted against $I^2$ (horizontal axis, 0.0 to 1.0).]

Fig. 2. The asymptotic efficiency of the DerSimonian and Laird estimate of $\mu$ as a function of $I^2$ using a representative sample of 1000 within-study variances from each of the three distributions used for the within-study variances. The solid line corresponds to the scaled and truncated $\chi^2$ distribution with one degree of freedom; the dashed line corresponds to this $\chi^2$ distribution with 10 degrees of freedom; the dotted line corresponds to a uniform distribution. A line showing an ideal asymptotic efficiency of unity is shown for comparison.

[Fig. 1 appears here: asymptotic efficiency (vertical axis, roughly 0.2 to 1.0) plotted against $I^2$ (horizontal axis, 0.0 to 1.0).]

Fig. 1. The asymptotic efficiency of the DerSimonian and Laird estimate of $\tau^2$ as a function of $I^2$ using a representative sample of 1000 within-study variances from each of the three distributions used for the within-study variances. The solid line corresponds to the scaled and truncated $\chi^2$ distribution with one degree of freedom; the dashed line corresponds to this $\chi^2$ distribution with 10 degrees of freedom; the dotted line corresponds to a uniform distribution. A line showing an ideal asymptotic efficiency of unity is shown for comparison.


4. Efficiency of the estimates of treatment effect in small samples

Although the results in the previous section show that the DerSimonian and Laird estimate of the treatment effect is surprisingly efficient in large samples, it is of interest to see if this is also the case for the much smaller sample sizes more generally encountered in practice. Since the corresponding estimate of $\tau^2$ has been found to have unsatisfactory asymptotic properties, attention here will focus on the estimate of the treatment effect.

In order to investigate the small sample properties of $\hat{\mu}_{DL}$, the exercise described in Section 3.2.4 was repeated using $n = 5, 10, 25$ and 50, but using all three variants of the standard procedure described in Section 2.2 (but only 50,000 simulated datasets in each instance, due to the computational burden of numerical maximisation and numerical checks required for the likelihood based approaches). The results are shown using the uniform distribution for $\sigma_i^2$ in Fig. 3; note


[Fig. 3 appears here: four panels, for n = 5, n = 10, n = 25 and n = 50, plotting efficiency (vertical axes with differing ranges) against $I^2$ (horizontal axes, 0.0 to 1.0).]

Fig. 3. The asymptotic efficiency of the DerSimonian and Laird estimate of $\mu$ as a function of $I^2$ using a representative sample of within-study variances from the uniform distribution. Solid circles are DerSimonian and Laird estimates, hollow circles are maximum likelihood estimates and hollow rectangles are REML estimates.


that different vertical axes are used for each of the four plots shown in this figure. Although the asymptotic bound for the variance of the treatment effect (10) does not strictly apply for these small sample sizes, the ratio of this and the resulting variances gives an indication of the relative efficiencies. Fig. 3 shows that the DerSimonian and Laird estimate of treatment effect compares well to the corresponding maximum likelihood and REML estimates. As $n$ increases, the efficiencies of all estimation procedures also increase. Maximum likelihood estimation is the most efficient procedure for small $I^2$, but for larger values the DerSimonian and Laird and REML estimates become more efficient than ML. Similar findings were also obtained for the other two distributions for the $\sigma_i^2$; greater efficiencies were obtained using the scaled and truncated $\chi^2_1$ distribution, and smaller efficiencies were obtained using $\chi^2_{10}$, however, suggesting that the distribution of the within-study variances also has implications for the efficiency. Further investigation with alternative distributions for the within-study variances is warranted, but it seems that estimates of treatment effect become more efficient as the amount of information (number and size of studies) increases.

5. The coverage probability of confidence intervals for the treatment effect

The above investigation does not address the perhaps more important issue of the performance of competing methods for constructing confidence intervals for the treatment effect when the sample size is small. Brockwell and Gordon (2001) investigated this via a simulation study and the intention here is to extend this work by exploring this issue analytically. Jackson (2009) considered the special case where all studies are the same size and showed that the actual significance level of hypothesis tests resulting from the DerSimonian and Laird procedure, with nominal 5% significance levels, can be much larger than this in small samples and when the data are highly heterogeneous, by deriving the exact distribution of the test statistic under the null hypothesis. The analysis in Jackson and Bowden (2009) shows that the same distribution applies to the ratio of $(\hat{\mu}_{DL} - \mu)$ to its estimated standard error, and hence the complement of the significance levels in Table 1 of Jackson (2009) gives actual coverage probabilities of confidence intervals using (3), assuming all studies are the same size.

For this special case, the same point estimates and confidence intervals for $\mu$ are obtained using variations 1 and 3, as the DerSimonian and Laird and REML estimates of $\tau^2$ are identical if all studies are the same size. Although the derivation of distributions can easily be amended to the case where maximum likelihood estimation of $\tau^2$ is performed, this is not described here, as it is clear that the low coverage of confidence intervals obtained below using variations 1 and 3 is reduced further by a smaller estimate of $\tau^2$; the maximum likelihood estimate of $\tau^2$ is necessarily less than or equal to the REML estimate, and all confidence intervals for $\mu$ are centred at $\bar{y}$. It is therefore of interest to see if the more computationally intensive likelihood approach (Hardy and Thompson, 1996) provides more accurate coverage probabilities in such instances.


Table 1
The actual coverage probability of nominal 95% confidence intervals for $\mu$ assuming all studies are the same size.

        n = 4                 n = 8                 n = 16                n = 32
I^2     DLZ    DLt    PL      DLZ    DLt    PL      DLZ    DLt    PL      DLZ    DLt    PL
0       0.963  0.999  0.972   0.962  0.987  0.966   0.960  0.974  0.962   0.958  0.966  0.959
0.15    0.951  0.998  0.963   0.951  0.981  0.957   0.950  0.967  0.954   0.950  0.958  0.952
0.3     0.936  0.996  0.951   0.939  0.974  0.948   0.942  0.959  0.947   0.944  0.953  0.947
0.5     0.912  0.990  0.932   0.924  0.963  0.935   0.934  0.953  0.940   0.941  0.950  0.944
0.75    0.877  0.972  0.903   0.911  0.952  0.925   0.931  0.950  0.938   0.941  0.950  0.944
0.9     0.860  0.956  0.889   0.909  0.950  0.924   0.931  0.950  0.938   0.941  0.950  0.944

DLZ and DLt denote intervals from the DerSimonian and Laird procedure using standard normal and t distribution quantiles, respectively, and PL denotes intervals using the likelihood approach to random effects meta-analysis proposed by Hardy and Thompson (i.e. using $D(\mu_p) \sim \chi^2_1$).


5.1. Analysis of the likelihood approach to random effects meta-analysis assuming all studies are the same size

If all studies are the same size then the profile log-likelihood in terms of $\mu_p$ and $\tau_p^2$ is obtained from the log-likelihood (8), upon replacing $\theta_1$ and $\theta_2$ with these terms, respectively. Upon making the simplification that $\sigma_i^2 = \sigma^2$ for all $i$, and after some algebraic manipulation involving the identity $\sum_i (y_i - \mu)^2 = \sum_i (y_i - \bar{y})^2 + n(\bar{y} - \mu)^2$, we obtain, to within a constant, the deviance

\[
D(\mu_p, \tau_p^2) = -2L(\mu_p, \tau_p^2) = n\log(\sigma^2 + \tau_p^2) + \frac{\sigma^2 + \tau^2}{\sigma^2 + \tau_p^2}(X_1 + X_2), \qquad (12)
\]

where $X_1 = \sum_i (y_i - \bar{y})^2/(\sigma^2 + \tau^2)$ and $X_2 = n(\bar{y} - \mu_p)^2/(\sigma^2 + \tau^2)$. $X_1 \sim \chi^2_{n-1}$ and, under the hypothesis that $\mu = \mu_p$, $X_2 \sim \chi^2_1$, and $X_1$ and $X_2$ are independent. For any fixed $\mu_p$, the log-likelihood $L$ is maximised by $\tau_p^2(\mu_p) = \max(0, \sum_i (y_i - \mu_p)^2/n - \sigma^2) = \max(0, (\sigma^2 + \tau^2)(X_1 + X_2)/n - \sigma^2)$, or equivalently

\[
\sigma^2 + \tau_p^2(\mu_p) = \max(\sigma^2, (\sigma^2 + \tau^2)(X_1 + X_2)/n). \qquad (13)
\]

We substitute (13) into (12) to obtain the deviance

\[
D(\mu_p, \tau_p^2(\mu_p)) = n\log(\max(\sigma^2, (\sigma^2 + \tau^2)(X_1 + X_2)/n)) + \frac{(\sigma^2 + \tau^2)(X_1 + X_2)}{\max(\sigma^2, (\sigma^2 + \tau^2)(X_1 + X_2)/n)}. \qquad (14)
\]

This is minimised using $\mu_p = \bar{y}$, which is equivalent to $X_2 = 0$. Hence the deviance statistic used for testing $H_0: \mu = \mu_p$, or equivalently determining if $\mu_p$ lies in the corresponding confidence interval, is the difference between Eq. (14) evaluated without and with $X_2$ set to zero. After further algebra the resulting deviance statistic simplifies to

\[
D(\mu_p) = n\log\left(\frac{\max(c, X_1 + X_2)}{\max(c, X_1)}\right) + n\left(\frac{X_1 + X_2}{\max(c, X_1 + X_2)} - \frac{X_1}{\max(c, X_1)}\right),
\]

where $c = n(1 - I^2)$. Under the hypothesis that $H_0: \mu = \mu_p$, $D(\mu_p)$ is asymptotically distributed as $\chi^2_1$ and hence approximate inferences can be made as suggested by Hardy and Thompson (1996).
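In data terms (rather than in terms of $X_1$ and $X_2$), the profiling step above amounts to replacing $\sigma^2 + \tau_p^2$ with $\max(\sigma^2, \sum_i (y_i - \mu_p)^2/n)$, and the Hardy and Thompson interval collects all $\mu_p$ whose deviance lies within the $\chi^2_1$ critical value of the minimum. The following is a minimal sketch for the equal-variance case of this section (NumPy/SciPy assumed; function names are illustrative, not the authors' code):

```python
import numpy as np
from scipy.optimize import brentq

def profile_deviance(mu_p, y, s2):
    """Deviance (14) with tau_p^2 profiled out; equal within-study variances s2."""
    y = np.asarray(y, float)
    m = max(s2, np.mean((y - mu_p) ** 2))      # sigma^2 + tau_p^2(mu_p), cf. (13)
    return len(y) * np.log(m) + np.sum((y - mu_p) ** 2) / m

def pl_interval(y, s2, crit=3.841):
    """Profile likelihood confidence interval for mu, Hardy and Thompson style."""
    y = np.asarray(y, float)
    ybar = y.mean()                             # the deviance is minimised at mu_p = ybar
    d0 = profile_deviance(ybar, y, s2)
    g = lambda mu: profile_deviance(mu, y, s2) - d0 - crit
    # The deviance increases monotonically away from ybar, so one root lies on each side
    span = 10.0 * (y.std() + np.sqrt(s2)) + 1.0
    return brentq(g, ybar - span, ybar), brentq(g, ybar, ybar + span)
```

The root-finding is safe because the profiled deviance is an increasing function of $\sum_i (y_i - \mu_p)^2$, which is quadratic in $\mu_p$ with its minimum at $\bar{y}$.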

5.2. Coverage probabilities of 95% confidence intervals for the treatment effect

Although $X_1$ and $X_2$ are independently distributed as $\chi^2_{n-1}$ and $\chi^2_1$ under the null, respectively, deriving the resulting density of $D(\mu_p)$ analytically is a very difficult task. However, it is a trivial task to simulate large numbers of $X_1$ and $X_2$ from independent $\chi^2$ distributions, evaluate $D(\mu_p)$ and see how many simulated values lie inside the acceptance region of the $\chi^2$ test, giving the coverage probability of confidence intervals to within Monte Carlo error.

The actual coverage probabilities of nominal 95% confidence intervals for the treatment effect obtained using DerSimonian and Laird's procedure (with both standard normal and $t$ quantiles) and the likelihood approach to random effects meta-analysis are shown in Table 1, for $n = 4, 8, 16, 32$ and a range of $I^2$ values. For the DerSimonian and Laird procedure, coverage probabilities are calculated from Tables 1 and 2 of Jackson (2009) as explained above; for the likelihood approach, a million simulations were used for each of the 24 scenarios considered. The results in Table 1 suggest that $n = 8$ is large enough to provide coverage probabilities within around 2.5% of the nominal 95% using the likelihood approach to meta-analysis. For DerSimonian and Laird's procedure (and therefore also the REML variant of this), $n$ closer to 16 is needed. When $n$ is around 16 or more, all methods appear suitable if formal inference is to be confined to $\mu$.
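The Monte Carlo scheme for the likelihood approach can be sketched as follows (a sketch, not the authors' code; NumPy/SciPy are assumed and `pl_coverage` is an illustrative name):

```python
import numpy as np
from scipy.stats import chi2

def pl_coverage(n, I2, n_sim=400_000, seed=1):
    """Monte Carlo coverage of nominal 95% profile likelihood intervals for mu,
    assuming all n studies are the same size, with heterogeneity I^2."""
    rng = np.random.default_rng(seed)
    c = n * (1.0 - I2)                          # c = n(1 - I^2)
    X1 = rng.chisquare(n - 1, n_sim)            # X1 ~ chi^2_{n-1}
    X2 = rng.chisquare(1, n_sim)                # X2 ~ chi^2_1 under H0
    S = X1 + X2
    # Deviance statistic D(mu_p) derived in Section 5.1
    D = n * (np.log(np.maximum(c, S) / np.maximum(c, X1))
             + S / np.maximum(c, S) - X1 / np.maximum(c, X1))
    # mu_p is covered whenever D falls inside the chi^2_1 acceptance region
    return float(np.mean(D <= chi2.ppf(0.95, df=1)))
```

For example, `pl_coverage(4, 0.9)` should reproduce the corresponding PL entry of Table 1 (0.889) to within Monte Carlo error.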

6. Discussion

Our investigation suggests that if the sample size is large and inferences are restricted to the treatment effect, then we can do little better than DerSimonian and Laird's original procedure under the random effects model. Despite this, if inferences about the between-study variance are considered important, the extra effort of adopting efficient procedures for this is generally worthwhile.

For small samples, the standard methods for constructing confidence intervals for the treatment effect, which involve plugging in $\hat{\tau}^2$, perform poorly, and hence the likelihood approach to meta-analysis might be usefully employed in such instances, despite its more computationally demanding nature. For meta-analyses with just two or three studies it is difficult to suggest anything better than the method of Jackson and Bowden (2009), or perhaps an entirely likelihood based analogue of this, where alternative quantiles for constructing confidence intervals are considered in the context of a sensitivity analysis. If the REML estimate is adopted, then the small sample techniques of Kenward and Roger (1997) might also be useful in reducing the sample size needed to obtain accurate confidence intervals.

Since there is very little information relating to the between-study variance in small samples, a fully Bayesian approach is potentially useful, with informative priors elicited from experts. However, it is evident from Fig. 3 of Jackson and Bowden (2009) that the magnitude of the between-study variance is crucial in determining the width of intervals for the treatment effect, and inferences for both parameters can be expected to be sensitive to prior specification unless the sample size is large. Hence statistical issues are inevitable when fitting the random effects model for meta-analysis using datasets with just a handful of studies, irrespective of the paradigm and procedure ultimately adopted.

Acknowledgements

The authors wish to thank Ian R. White and Julian Higgins for their helpful comments and suggestions. DJ and JB are employed by the UK Medical Research Council (grant codes U.1052.00.006 and U.1052.00.001).

Appendix

As the estimates are asymptotically unbiased, trivially (7) tends towards unity as $n \to \infty$ if $i = j$ and zero otherwise. The vector of score functions, $S(Y) = (S_1(Y), S_2(Y)) = ((\partial/\partial\theta_1)\log f(Y; \theta), (\partial/\partial\theta_2)\log f(Y; \theta))$, has expectation 0 and variance matrix equal to $J(\theta)$. We are interested in the properties of $V(\theta)$ in large samples, subject to the constraints (5) and (6). Since the estimators are regular functions we have

\[
\frac{\partial}{\partial\theta_i} E[W_{\theta_j}(Y)] = \int w_{\theta_j}(y)\,\frac{\partial}{\partial\theta_i} f(y; \theta)\,dy = E\left[W_{\theta_j}(Y)\,\frac{\partial}{\partial\theta_i}\log(f(Y; \theta))\right].
\]

Noting that the expectations of the score functions are zero, this means that

\[
\mathrm{Cov}(W_{\theta_j}(Y), S_i(Y)) = \frac{\partial}{\partial\theta_i} E[W_{\theta_j}(Y)],
\]

which tends towards unity as $n \to \infty$, if $i = j$, and zero otherwise, as noted above. Hence for large $n$ we have

\[
\mathrm{Var}(W(Y), S(Y)) \approx \begin{bmatrix} V(\theta) & I \\ I & J(\theta) \end{bmatrix}, \qquad (15)
\]

where $I$ denotes the two by two identity matrix. Consider the random vector $Z(Y) = IW(Y) - J(\theta)^{-1}S(Y)$. Since $Z = CX \Rightarrow \mathrm{Var}(Z) = C\Sigma C^T$, where $C$ is a matrix of constants and $\mathrm{Var}(X) = \Sigma$, we have $\mathrm{Var}(Z(Y)) \approx V(\theta) - J(\theta)^{-1}$ in large samples, and hence $V(\theta) - J(\theta)^{-1}$ must be positive semi-definite.

References

Baker, R., Jackson, D., 2006. Using journal impact factors to correct for the publication bias of medical studies. Biometrics 62, 785–792.
Baker, R., Jackson, D., 2008. A new approach to outliers in meta-analysis. Health Care Management Science 11, 121–131.
Biggerstaff, B.J., Jackson, D., 2008. The exact distribution of Cochran's heterogeneity statistic in one-way random effects meta-analysis. Statistics in Medicine 27, 6093–6110.
Biggerstaff, B.J., Tweedie, R.L., 1997. Incorporating variability of estimates of heterogeneity in the random effects model in meta-analysis. Statistics in Medicine 16, 753–768.
Böhning, D., Malzahn, U., Dietz, E., Schlattmann, P., Viwatwongkasem, C., Biggeri, A., 2002. Some general points in estimating heterogeneity variance with the DerSimonian–Laird estimator. Biostatistics 3, 445–457.
Brockwell, S.E., Gordon, I.R., 2001. A comparison of statistical methods for meta-analysis. Statistics in Medicine 20, 825–840.
Brockwell, S.E., Gordon, I.R., 2007. A simple method for inference on an overall effect in meta-analysis. Statistics in Medicine 26, 4531–4543.
Chapman, D.G., Robbins, H., 1951. Minimum variance estimation without regularity assumptions. Annals of Mathematical Statistics 22, 581–586.
DerSimonian, R., Kacker, R., 2007. Random-effects model for meta-analysis of clinical trials: an update. Contemporary Clinical Trials 28, 105–114.
DerSimonian, R., Laird, N., 1986. Meta-analysis in clinical trials. Controlled Clinical Trials 7, 177–188.
Drygas, H., 1987. On the multivariate Rao–Cramer inequality. Statistical Papers 28, 19–71.
Farebrother, R.W., 1984. Algorithm AS 204: the distribution of a positive linear combination of chi-squared random variables. Journal of the Royal Statistical Society (Series C) 33, 332–339.
Hardy, R.J., Thompson, S.G., 1996. A likelihood approach to meta-analysis with random effects. Statistics in Medicine 15, 619–629.
Hardy, R.J., Thompson, S.G., 1998. Detecting and describing heterogeneity in meta-analysis. Statistics in Medicine 17, 841–856.
Higgins, J.P.T., Thompson, S.G., 2002. Quantifying heterogeneity in meta-analysis. Statistics in Medicine 21, 1539–1558.
Higgins, J.P.T., Thompson, S.G., Spiegelhalter, D.J., 2009. A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society Series A 172, 137–159.
van Houwelingen, H.C., Zwinderman, K.H., Stijnen, T., 1993. A bivariate approach to meta-analysis. Statistics in Medicine 12, 2285–2303.
Jackson, D., 2009. The significance level of the standard test for a treatment effect in meta-analysis. Statistics in Biopharmaceutical Research 1, 92–100.
Jackson, D., Bowden, J., 2009. A re-evaluation of the 'quantile approximation method' for random effects meta-analysis. Statistics in Medicine 28, 338–348.
Kenward, M.G., Roger, J.H., 1997. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics 53, 983–997.
Lee, K.J., Thompson, S.G., 2008. Flexible parametric models for random effects distributions. Statistics in Medicine 27, 418–434.
Malzahn, U., Böhning, D., Holling, H., 2000. Nonparametric estimation of heterogeneity variance for the standardised difference used in meta-analysis. Biometrika 87, 619–632.
Normand, S.L.T., 1999. Meta-analysis: formulating, evaluating, combining and reporting. Statistics in Medicine 18, 321–359.
Shi, J.Q., Copas, J.B., 2002. Meta-analysis for 2×2 tables using an average Markov chain Monte Carlo EM algorithm. Journal of the Royal Statistical Society Series B 64, 221–236.
Shuster, J.J., 2009. Empirical vs natural weighting in random effects meta-analysis. Statistics in Medicine, to appear.
Taye, H.H., van Houwelingen, H.C., Stijnen, T., 2008. The binomial distribution of meta-analysis was preferred to model within-study variability. Journal of Clinical Epidemiology 61, 41–51.