Modifications of REML algorithm for HGLMs

Stat Comput (2012) 22:959–966
DOI 10.1007/s11222-011-9265-9


Woojoo Lee · Youngjo Lee

Received: 28 October 2009 / Accepted: 9 June 2011 / Published online: 30 June 2011
© Springer Science+Business Media, LLC 2011

Abstract Hierarchical generalized linear models (HGLMs) have become popular in data analysis. However, their maximum likelihood (ML) and restricted maximum likelihood (REML) estimators are often difficult to compute, especially when the random effects are correlated; this is because obtaining the likelihood function involves high-dimensional integration. Recently, an h-likelihood method that does not involve numerical integration has been proposed. In this study, we show how an h-likelihood method can be implemented by modifying the existing ML and REML procedures. A small simulation study is carried out to investigate the performances of the proposed methods for HGLMs with correlated random effects.

Keywords Hierarchical generalized linear model · Hierarchical likelihood · Penalized quasi-likelihood · Restricted likelihood · Variance components

Electronic supplementary material The online version of this article (doi:10.1007/s11222-011-9265-9) contains supplementary material, which is available to authorized users.

W. Lee
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177, Stockholm, Sweden
e-mail: [email protected]

W. Lee
Department of Statistics, Inha University, 253 Yonghyun-dong, Nam-gu, Incheon, 402-751, Korea

Y. Lee (✉)
Department of Statistics, Seoul National University, 599 Gwanangno, Gwanak-gu, Seoul, 151-742, Korea
e-mail: [email protected]

1 Introduction

Hierarchical generalized linear models (HGLMs) have attracted increasing attention because of their wide applicability and easy interpretability. The marginal likelihood function, obtained by integrating out the random effects, is analytically intractable and demands computationally intensive approaches to allow for the computation of maximum likelihood (ML) and restricted maximum likelihood (REML) estimators. The computational problems are magnified when the random effects are correlated, because the multivariate integral for correlated random effects cannot be represented as the product of univariate integrals. Breslow and Clayton (1993) proposed the use of a penalized quasi-likelihood (PQL) method to estimate parameters in generalized linear mixed models (GLMMs), a subclass of HGLMs, assuming a normal random effect. An advantage of the PQL method is the computational simplicity of its implementation, in that it uses existing ML and REML procedures for linear mixed models (LMMs). However, the PQL estimators can suffer from severe biases. To reduce the biases, Lin and Breslow (1996) and Goldstein and Rasbash (1996) considered higher-order corrections. However, Lin and Breslow (1996) reported that their second-order PQL estimators seriously failed, producing large biases and breaking down for large values of the dispersion parameters. Goldstein and Rasbash (1996) also reported serious bias for the variance component estimate in the simulation study; see Tables 1 and 2 of their paper. Another disadvantage is that these methods cannot be used for non-normal random effects, which are frequently encountered (Molenberghs et al. 2010).

Lee and Nelder (1996) extended GLMMs to HGLMs by allowing for non-normal random effects, and they proposed the use of the hierarchical likelihood (h-likelihood) for drawing inferences from them. In binary data, the h-likelihood estimators substantially reduce the biases of the PQL estimators (Lee et al. 2006). Noh and Lee (2007) developed h-likelihood methods for the analysis of binary data with crossed random effects and found that they produced the best results among the approximation methods. An REML algorithm for LMMs with various patterned correlations of random effects has already been developed, and therefore, it would be useful if the existing REML algorithm could be modified to obtain estimators for various HGLMs with non-normal or correlated random effects.

So far, procedures for HGLMs have mainly been studied for independent random effects, and correlation parameters for random effects have been found using a search method (Lee and Nelder 2001). In this study, we first show how the existing algorithm for LMMs can be used to estimate fixed effects. To estimate the dispersion and correlation parameters, we study how to modify the existing REML score equations and the Hessian matrices required for the Newton-Raphson method. All the proofs and computational details are given in the Online Resource. We also discuss the differences between the PQL and h-likelihood estimations. For models with independent random effects, the biases will decrease as the cluster size increases. However, when the random effects are highly correlated, the biases will not decrease to the same degree with increasing cluster size.

2 PQL procedure for GLMMs

Consider an LMM described as

$$y = X\beta + Zv + e,$$

where $X$ and $Z$ are model matrices for the fixed effects $\beta = (\beta_1, \ldots, \beta_p)^T$ and the random effects $v = (v_1, \ldots, v_q)^T$, with $v \sim N_q(0, D)$ and $e \sim N_n(0, W^{-1})$. Let $\tau = (\tau_1, \ldots, \tau_r)^T$ be a vector of dispersion components of $D$ and $W$. In LMMs, we often have $W^{-1} = \phi I$ and $\tau_1 = \phi$. In longitudinal data analysis, a general matrix $W$ can be considered. Let

$$\mathrm{var}(y) = \Sigma = \Sigma_L = ZDZ^T + W^{-1}. \tag{1}$$

In this paper, we use the generic symbol $\Sigma$ for a particular matrix necessary to implement various random-effect models. Here, $\Sigma_L$ denotes such a matrix (the covariance matrix for the responses) in the LMM. Harville (1977) showed that the ML equations for $\beta$ are

$$\mathrm{ML}(y) = X^T \Sigma^{-1}(y - X\hat\beta) = 0, \tag{2}$$

and those for the REML estimator of $\tau_k$ are

$$\mathrm{REML}(y, C_{1k}) = \{(y - X\hat\beta)^T \Sigma^{-1} C_{1k} \Sigma^{-1} (y - X\hat\beta) - \mathrm{tr}(Q C_{1k})\}/2 = 0, \tag{3}$$

where $C_{1k} = \partial\Sigma/\partial\tau_k$ and $Q = \Sigma^{-1} - \Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}$. For LMMs, various statistical software packages are available to provide these estimators, for example, the SAS MIXED procedure (SAS Institute Inc. 2004) and the REML directive in GenStat (Payne et al. 2007).
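As a rough illustration of equations (1) to (3), the following sketch evaluates the ML solution for the fixed effects and the REML score for a random-intercept LMM. It assumes the special case $D = \lambda I$ and $W^{-1} = \phi I$, so that the derivatives of the covariance matrix with respect to $\phi$ and $\lambda$ are $I$ and $ZZ^T$; the function names are hypothetical and not taken from any package mentioned in the text.

```python
import numpy as np

def reml_score(y, X, Z, lam, phi):
    """ML estimate of beta (2) and REML score (3) for tau = (phi, lam).

    Assumes D = lam * I and W^{-1} = phi * I (illustrative special case).
    """
    n = len(y)
    Sigma = lam * Z @ Z.T + phi * np.eye(n)            # equation (1)
    Si = np.linalg.inv(Sigma)
    XtSi = X.T @ Si
    beta = np.linalg.solve(XtSi @ X, XtSi @ y)         # solves equation (2)
    Q = Si - XtSi.T @ np.linalg.solve(XtSi @ X, XtSi)  # Q as defined after (3)
    r = Si @ (y - X @ beta)
    # C_1k = dSigma/dtau_k: identity for phi, Z Z^T for lam
    score = [0.5 * (r @ C @ r - np.trace(Q @ C)) for C in (np.eye(n), Z @ Z.T)]
    return beta, np.array(score)

def restricted_loglik(y, X, Z, lam, phi):
    """Restricted log-likelihood up to an additive constant."""
    n = len(y)
    Sigma = lam * Z @ Z.T + phi * np.eye(n)
    Si = np.linalg.inv(Sigma)
    XtSiX = X.T @ Si @ X
    beta = np.linalg.solve(XtSiX, X.T @ Si @ y)
    e = y - X @ beta
    return -0.5 * (np.linalg.slogdet(Sigma)[1]
                   + np.linalg.slogdet(XtSiX)[1] + e @ Si @ e)
```

The score returned by `reml_score` should agree with a numerical derivative of `restricted_loglik`, which gives a simple sanity check before the score is plugged into a Newton-Raphson iteration.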

Suppose that the $n \times 1$ vector of the response variables $y$ follows an HGLM with the linear predictor

$$\eta = g(\mu) = X\beta + Zv,$$

where $\mu = E(y \mid v)$, and $v = g(u)$ for some strictly monotonic function of $u$. In GLM families, we have

$$\mathrm{var}(y \mid v) = \phi V(\mu),$$

where $\phi$ is the dispersion parameter and $V(\cdot)$ is the GLM variance function. The GLM family for the $y \mid v$ component is characterized by the variance function. HGLMs allow $u$ to follow any conjugate distribution of the GLM families, namely, normal, beta, gamma, and inverse gamma distributions with parameter (dispersion component) $\lambda$. In GLMMs where $u = v \sim N_q(0, D)$, we may define the GLM-adjusted response variables as

$$z_1 = X\beta + Zv + (y - \mu)(\partial\eta/\partial\mu).$$

In GLMMs, let $\Sigma = \Sigma_G = ZDZ^T + W_1^{-1}$, where the GLM weight matrix is given by

$$W_1 = (\partial\mu/\partial\eta)^2 (\phi V(\mu))^{-1}. \tag{4}$$

Lemma 1 In GLMMs, (i) the PQL estimator of $\beta$ can be obtained by solving

$$\mathrm{ML}(z_1) = 0$$

and (ii) the PQL estimators of $\tau_k$ are obtained by solving

$$\mathrm{REML}(z_1, C_{1k}) = 0.$$

An advantage of the PQL approach of Schall (1991) and Breslow and Clayton (1993) is the direct use of existing ML and REML codes by replacing $(y, W)$ with $(z_1, W_1)$; however, the PQL approach requires the iteration of the equations in Lemma 1. Because REML algorithms are well developed for correlated random effects, the PQL method can be easily applied to models with random effects having various spatiotemporal correlations.
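To make the replacement of $(y, W)$ by $(z_1, W_1)$ concrete, here is a minimal sketch of the adjusted response and the GLM weights for one particular case, a Poisson response with log link. The function name is hypothetical, and the choice of family is an assumption made only for illustration.

```python
import numpy as np

def pql_working_quantities(y, eta, phi=1.0):
    """Adjusted response z1 and diagonal of W1 for a Poisson model with log link."""
    y = np.asarray(y, dtype=float)
    eta = np.asarray(eta, dtype=float)
    mu = np.exp(eta)                       # inverse of the log link
    dmu_deta = mu                          # d mu / d eta for the log link
    var_fun = mu                           # Poisson variance function V(mu) = mu
    z1 = eta + (y - mu) / dmu_deta         # eta + (y - mu) * (d eta / d mu)
    w1 = dmu_deta ** 2 / (phi * var_fun)   # equation (4)
    return z1, w1
```

In a PQL iteration, `z1` and `w1` would be recomputed from the current linear predictor and then passed to an LMM routine in place of the response and weights, which is the iteration that Lemma 1 requires.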

3 H-likelihood method for HGLMs

HGLMs extend GLMMs by allowing non-normal random effects, and their h-likelihood is defined as

$$h = h_0 + h_1,$$

where

$$h_0 = \log f_{\beta,\phi}(y \mid v) = \sum_{i=1}^{n} \{y_i\theta(\mu_i) - b(\theta(\mu_i))\}/\phi + g_0(y, \phi). \tag{5}$$

Here, $\theta(\mu_i)$ denotes the canonical parameter, $h_1 = \log f_\lambda(v)$, and $f_{\beta,\phi}(y \mid v)$ and $f_\lambda(v)$ are the probability density functions for the $y \mid v$ and $v$ components, respectively. The subscripts $\lambda$, $\beta$, and $\phi$ denote fixed parameters that determine the characteristics of the probability density function, and the arguments $y$ and $v$ in the parentheses are random variables. For example, $f_\lambda(y \mid \hat\beta)$ denotes the conditional density function for $y$ given $\hat\beta$ with a fixed parameter $\lambda$. A key aspect of HGLMs is the flexibility of the distribution of $u$ among conjugates of the GLM families whose log density $h_1 = \log f_\lambda(v_i)$ is proportional to

$$\sum \{\psi\theta(u_i) - b(\theta(u_i))\}/\lambda,$$

where $v_i = \theta(u_i)$ and the functions $\theta(\cdot)$ and $b(\cdot)$ are the same as those in the log-likelihood of GLM families (5) of the $y \mid v$ component. This allows conjugate HGLMs to be fitted with the existing GLM algorithm (Lee and Nelder 2001) by data augmentation. Table 1 shows $\psi$, $\theta(\cdot)$, and $b(\cdot)$ for conjugate families.

Table 1 GLM families for the conjugate random effects $u$

| $u$-distribution | $v = \theta(u)$ | $b(\theta)$ | $\psi$ |
|---|---|---|---|
| normal | $u$ | $\theta^2/2$ | 0 |
| gamma | $\log(u)$ | $\exp(\theta)$ | 1 |
| beta | $\log\{u/(1-u)\}$ | $\log\{1+\exp(\theta)\}$ | 1/2 |
| inverse-gamma | $-1/u$ | $-\log(-\theta)$ | 1 |

In HGLMs, any conjugate distribution can be used for representing the distributions of the random effects:

$$h_1 = \sum_i [\{\psi_M\theta_M(u_i) - b_M(\theta_M(u_i))\}/\lambda] + g_1(\psi_M, \lambda), \tag{6}$$

where $v_i = \theta_M(u_i)$.

Example In a Poisson-gamma HGLM, we have

$$\mu_{ij} = E(y_{ij} \mid u_i) = \exp(x_{ij}\beta)u_i,$$

where $u_i$ denotes random effects. We have $v_i = \theta_M(u_i) = \log u_i$, $\psi_M = 1$, and $b_M(\theta) = \exp(\theta)$. For a Poisson GLMM, we have $v_i = \theta_M(u_i) = u_i$, $\psi_M = 0$, and $b_M(\theta) = \theta^2/2$.
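The gamma row of Table 1 can be checked numerically. The sketch below writes the log density of $v = \log u$ in the conjugate form (6), assuming the parametrization $E(u) = 1$ and $\mathrm{var}(u) = \lambda$ (shape $1/\lambda$, scale $\lambda$). The normalizing term `g1` is derived here from that gamma density and is not quoted from the paper.

```python
import math

def h1_gamma(v, lam):
    """Log density of v = log(u) for u ~ gamma with E(u) = 1 and var(u) = lam."""
    psi = 1.0                      # psi_M for the gamma family (Table 1)
    theta = v                      # theta_M(u) = log(u) = v
    b = math.exp(theta)            # b_M(theta) = exp(theta)
    # Normalizing term g1(psi_M, lam), derived from the gamma(1/lam, lam) density
    g1 = -math.lgamma(1.0 / lam) - math.log(lam) / lam
    return (psi * theta - b) / lam + g1

def integrate_density(lam, lower=-30.0, upper=5.0, step=1e-3):
    """Riemann sum of exp(h1) over v; should be close to 1."""
    n = int((upper - lower) / step)
    return sum(math.exp(h1_gamma(lower + i * step, lam)) for i in range(n + 1)) * step
```

If the conjugate form is right, `integrate_density` returns a value close to one for any moderate value of `lam`.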

From the h-likelihood, we can obtain the two (log-)likelihoods. For the ML estimation of fixed effects $\beta$, the marginal (log-)likelihood $m$ can be used:

$$m = \log \int \exp(h)\,dv.$$

In LMMs, for the REML estimation of the dispersion parameters $\tau = (\phi, \lambda)$, the following conditional density is used:

$$f_\tau(y \mid \hat\beta) = f_{\beta,\tau}(y)/f_{\beta,\tau}(\hat\beta),$$

where $\hat\beta$ are ML estimators. This density is independent of $\beta$, and therefore, inferences about $\tau$ can be made using the expression

$$r = \log f_\tau(y \mid \hat\beta).$$

The use of this restricted (log-)likelihood of Patterson and Thompson (1971) reduces the finite-sample bias of the ML estimators of $\tau$. However, in HGLMs, $m$ and $r$ are difficult to compute, except in LMMs.

Consider a function class $p_\alpha(\ell)$ defined by

$$p_\alpha(\ell) = \left[\ell - \frac{1}{2}\log\left|(-\partial^2\ell/\partial\alpha^2)/(2\pi)\right|\right]_{\alpha=\hat\alpha},$$

where $|\cdot|$ is the determinant, and $\hat\alpha$ solves $\partial\ell/\partial\alpha = 0$. The function class $p_\alpha(h)$ is useful for obtaining adjusted profile likelihoods by eliminating nuisance effects $\alpha$, irrespective of whether they are fixed or random or both (Lee and Nelder 2001). In general, $p_\beta(m) \simeq r$ to the first order to eliminate $\beta$ by conditioning on $\hat\beta$ (Barndorff-Nielsen 1983; Cox and Reid 1987) and $p_v(h) \simeq m$ to the first order to eliminate $v$ by integrating them out by using the Laplace approximation

$$p_v(h) = \left[h - \frac{1}{2}\log\left|(Z^T W_1 Z + W_2)/(2\pi)\right|\right]_{v=\hat v},$$

where $W_1$ is defined in (4) and

$$W_2 = -\partial^2 h_1/\partial v^2. \tag{7}$$

The function

$$p_{\beta,v}(h) = \left[h - \frac{1}{2}\log\left|\begin{pmatrix} X^T W_1 X & X^T W_1 Z \\ Z^T W_1 X & Z^T W_1 Z + W_2 \end{pmatrix}\right|\right]_{v=\hat v,\,\beta=\hat\beta}$$

can be used to eliminate both the random and the fixed effects simultaneously from $h$.

In LMMs, we have

$$m = p_v(h) \quad \text{and} \quad r = p_\beta(m) = p_\beta(p_v(h)) = p_{\beta,v}(h).$$
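The identity $m = p_v(h)$ for LMMs can be verified numerically. The sketch below assumes a random-intercept LMM with $D = \lambda I$ and $W^{-1} = \phi I$, so that $W_1 = I/\phi$ and $W_2 = I/\lambda$; the function names are illustrative only.

```python
import numpy as np

def pv_h_lmm(y, X, Z, beta, lam, phi):
    """Adjusted profile h-likelihood p_v(h) for an LMM with D = lam*I, W^{-1} = phi*I."""
    n, q = Z.shape
    W1 = np.eye(n) / phi                     # for an LMM, W1 equals W in (1)
    W2 = np.eye(q) / lam                     # equation (7): -d^2 h1 / dv^2 = D^{-1}
    H = Z.T @ W1 @ Z + W2
    v_hat = np.linalg.solve(H, Z.T @ W1 @ (y - X @ beta))   # maximizes h over v
    e = y - X @ beta - Z @ v_hat
    h0 = -0.5 * (n * np.log(2 * np.pi * phi) + e @ e / phi)          # log f(y | v)
    h1 = -0.5 * (q * np.log(2 * np.pi * lam) + v_hat @ v_hat / lam)  # log f(v)
    return h0 + h1 - 0.5 * np.linalg.slogdet(H / (2 * np.pi))[1]

def marginal_loglik_lmm(y, X, Z, beta, lam, phi):
    """Exact marginal log-likelihood m = log N(y; X beta, Sigma)."""
    n = len(y)
    Sigma = lam * Z @ Z.T + phi * np.eye(n)
    e = y - X @ beta
    return -0.5 * (n * np.log(2 * np.pi) + np.linalg.slogdet(Sigma)[1]
                   + e @ np.linalg.solve(Sigma, e))
```

For any values of `beta`, `lam`, and `phi`, the two functions should return the same number up to rounding, because the Laplace approximation is exact when $h$ is quadratic in $v$.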

In HGLMs, $m$ and $r$ are generally not available, whereas their first-order approximations $p_v(h)$ and $p_{\beta,v}(h)$ are. To understand the PQL algorithm for GLMMs, it is helpful to rewrite the REML algorithm for LMMs using the equation

$$p_{\beta,v}(h) = D_0 + D_1 + D_2 = r,$$

where $D_0 = -(y - X\hat\beta - Z\hat v)^T W (y - X\hat\beta - Z\hat v)/2$, $D_1 = -\hat v^T D^{-1}\hat v/2$, $D_2 = -(\log|\Sigma| + \log|X^T\Sigma^{-1}X|)/2$, and $W$ is defined in (1). The REML equations $\partial p_{\beta,v}(h)/\partial\tau_k = 0$ involve the terms $\partial\hat\beta/\partial\tau_k$ and $\partial\hat v/\partial\tau_k$. In LMMs, these two terms can be ignored because $D_2$ does not depend on $\beta$ and $v$. In HGLMs, however, $D_2$ depends on $\beta$ and $v$ in general. Following Lee and Nelder (2001), we ignore $\partial\hat\beta/\partial\tau_k$, but not $\partial\hat v/\partial\tau_k$, because the number of random effects $v$ is often large, whereas the number of fixed effects remains constant as the sample size increases. However, for cases in which the dimension of $\beta$ increases with the sample size, such as in semiparametric frailty models, $\partial\hat\beta/\partial\tau_k$ must be taken into account properly (Ha et al. 2010).

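Lemma 2 In LMMs,

$$\mathrm{REML}(y, C_{1k}) = \partial p_{\beta,v}(h)/\partial\tau_k = A_{0k} + A_{1k} + A_{2k}, \tag{8}$$

where

$$A_{0k} = \partial D_0/\partial\tau_k = (\partial\hat v/\partial\tau_k)^T Z^T W (y - X\hat\beta - Z\hat v) - (y - X\hat\beta - Z\hat v)^T (\partial W/\partial\tau_k)(y - X\hat\beta - Z\hat v)/2,$$

$$A_{1k} = \partial D_1/\partial\tau_k = -(\partial\hat v/\partial\tau_k)^T D^{-1}\hat v - \hat v^T(\partial D^{-1}/\partial\tau_k)\hat v/2,$$

$$A_{2k} = \partial D_2/\partial\tau_k = -\mathrm{tr}(QC_{1k})/2.$$

The PQL estimators for dispersion components can be obtained using the above procedure and by replacing $(y, W)$ with the GLM-adjusted response variables and the GLM weight matrix $(z_1, W_1)$.

Corollary 1 In GLMMs, the PQL estimating equation for $\tau_k$ solves

$$\mathrm{REML}(z_1, C_{1k}) = A^P_{0k} + A_{1k} + A_{2k} = 0, \tag{9}$$

where

$$A^P_{0k} = (\partial\hat v/\partial\tau_k)^T (Z^T W_1 (y - \mu)\,\partial\eta/\partial\mu) - I(\tau_k = \phi) \times ((y - \mu)\,\partial\eta/\partial\mu)^T (\partial W_1/\partial\tau_k)((y - \mu)\,\partial\eta/\partial\mu)/2$$

and $C_{1k}$ is as defined in (3).

Here, $I(\tau_k = \phi)$ implies that the corresponding term is included if $\tau_k$ happens to be the dispersion parameter $\phi$. In LMMs, the second term in $A_{0k}$ is zero when $\tau_k \neq \phi$ because $\partial W/\partial\tau_k = 0$, whereas in a GLMM, $\partial W_1/\partial\tau_k \neq 0$. Therefore, the PQL method ignores this term; this could be a potential source of bias.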

In binary HGLMs, the first-order Laplace approximation is not sufficient to obtain accurate estimates. To resolve this problem, Lee and Nelder (2001) proposed the use of the second-order Laplace approximation

$$p^s_{\beta,v}(h) = p_{\beta,v}(h) - F/24,$$

where

$$F = \mathrm{tr}\left[-\{3(\partial^4 h/\partial v^4) + 5(\partial^3 h/\partial v^3)(-\partial^2 h/\partial v^2)^{-1}(\partial^3 h/\partial v^3)\}(-\partial^2 h/\partial v^2)^{-2}\right]\Big|_{v=\hat v}.$$

For the general expression of the second-order correction term $F/24$, see Appendix B in the Online Resource. Various combinations of criteria have been proposed for binary data (see Table 2). HL(0,1) and HL(0,2) reduce the biases of the PQL estimators; however, the estimators still suffer from non-negligible biases. Yun and Lee (2004) showed that HL(1,1) is effective in reducing biases in the estimation of $\beta$. Lee et al. (2007) and Noh and Lee (2007) showed that HL(1,2) essentially eliminates biases in the estimation of both $\beta$ and $\tau$ in binary HGLMs with beta and normal random effects, respectively. However, except for independent single-random component models, the computation of HL(·,2) is difficult. Noh and Lee (2007) studied the computation of HL(1,2) for models with crossed random effects.

Table 2 Proposed criteria for estimating $\beta$ and $\tau$

| Method | Criterion for $\beta$ | Criterion for $\tau$ | Literature |
|---|---|---|---|
| HL(0,1) | $h$ | $p_{\beta,v}(h)$ | Lee and Nelder (1996) |
| HL(0,2) | $h$ | $p^s_{\beta,v}(h)$ | Lee and Nelder (2001) |
| HL(1,1) | $p_v(h)$ | $p_{\beta,v}(h)$ | Yun and Lee (2004) |
| HL(1,2) | $p_v(h)$ | $p^s_{\beta,v}(h)$ | Lee et al. (2007), Noh and Lee (2007) |
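The effect of the second-order correction $F/24$ can be seen in a case where the marginal likelihood is known in closed form. The sketch below is an illustration only: it takes a single Poisson count with mean $\mu_0 u$ and a gamma random effect $u$ with $E(u) = 1$ and $\mathrm{var}(u) = \lambda$, for which the exact marginal is negative binomial, and applies the scalar version of $F$ to eliminate $v = \log u$. The model, the parametrization, and the function names are assumptions made for this example.

```python
import math

def laplace_poisson_gamma(y, mu0, lam):
    """First- and second-order Laplace approximations to the marginal log-likelihood."""
    a = y + 1.0 / lam                          # -d^2 h / dv^2 at the mode
    v_hat = math.log(a / (mu0 + 1.0 / lam))    # solves dh/dv = 0
    h = (y * (math.log(mu0) + v_hat) - mu0 * math.exp(v_hat) - math.lgamma(y + 1)
         + (v_hat - math.exp(v_hat)) / lam
         - math.lgamma(1.0 / lam) - math.log(lam) / lam)
    first = h - 0.5 * math.log(a / (2 * math.pi))
    d3 = d4 = -a                               # third and fourth derivatives of h at the mode
    F = -(3 * d4 + 5 * d3 * d3 / a) / a ** 2   # scalar version of F
    return first, first - F / 24

def exact_poisson_gamma(y, mu0, lam):
    """Exact marginal log-likelihood (negative binomial with mean mu0)."""
    k = 1.0 / lam
    return (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
            + y * math.log(mu0) - k * math.log(lam) - (y + k) * math.log(mu0 + k))
```

In this example the first-order approximation amounts to Stirling's formula for a log-gamma function, and the correction $-F/24$ adds the next term of that expansion, so the second-order value should lie closer to the exact one.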

For the bias reduction of PQL, higher-order corrections have been studied by Goldstein and Rasbash (1996) and Lin and Breslow (1996). Lin and Breslow (1996) reported that their second-order PQL estimators failed to reduce biases. In Sect. 4, we present REML estimating equations based on higher-order Laplace approximations.

4 Modifications of ML and REML equations

In this section, we show how the adjusted response variable $z_1$ and the REML equations can be modified to fit HGLMs using h-likelihood.

Let

$$z_2 = v + (\psi_M - u)(\partial v/\partial u)$$

and

$$W_2 = (\partial u/\partial v)^2 (\lambda V_M(u))^{-1},$$

where $V_M(u) = b''_M(\theta_M(u))$. Let $w_{1i}$ and $w_{2i}$ be the $i$th diagonal elements of the GLM weight matrices $W_1$ and $W_2$, respectively.
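The quantities $z_2$ and $W_2$ can be written out for each conjugate family in Table 1. In the sketch below, the derivatives $\partial v/\partial u$ and the variance functions $V_M(u) = b''_M(\theta_M(u))$ are worked out by hand from Table 1; the dictionary layout and function name are hypothetical.

```python
import numpy as np

# For each family: theta_M(u), dv/du, V_M(u) = b_M''(theta_M(u)), psi_M
FAMILIES = {
    "normal": (lambda u: u, lambda u: np.ones_like(u), lambda u: np.ones_like(u), 0.0),
    "gamma": (lambda u: np.log(u), lambda u: 1 / u, lambda u: u, 1.0),
    "beta": (lambda u: np.log(u / (1 - u)), lambda u: 1 / (u * (1 - u)),
             lambda u: u * (1 - u), 0.5),
    "inverse-gamma": (lambda u: -1 / u, lambda u: 1 / u ** 2, lambda u: u ** 2, 1.0),
}

def augmented_quantities(u, lam, family):
    """Return (z2, diagonal of W2) for random effects u with dispersion lam."""
    theta, dv_du, var_fun, psi = FAMILIES[family]
    u = np.asarray(u, dtype=float)
    z2 = theta(u) + (psi - u) * dv_du(u)           # z2 = v + (psi_M - u) dv/du
    w2 = (1 / dv_du(u)) ** 2 / (lam * var_fun(u))  # W2 = (du/dv)^2 / (lam V_M(u))
    return z2, w2
```

For normal random effects this gives $z_2 = 0$ and $W_2 = I/\lambda$, and for the gamma family it gives $w_{2i} = u_i/\lambda$, which agrees with $-\partial^2 h_1/\partial v^2$ in (7).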

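Theorem 1 In HGLMs, where $\Sigma = \Sigma_H = ZW_2^{-1}Z^T + W_1^{-1}$,

(i) $\hat\beta^{\mathrm{HL}(i,j)}$ for $i = 0, 1$ and $j = 1, 2$ can be obtained by solving

$$\mathrm{ML}(z^{(i)}) = 0,$$

where $z^{(i)} = z_1 - a^{(i)}$, $a^{(0)} = \Sigma W_1 Z(Z^T W_1 Z + W_2)^{-1} W_2 z_2$, $a^{(1)} = a^{(0)} + \Sigma W_1 s(\partial\eta/\partial\mu)$, and $s$ is defined in Appendix A of the Online Resource, and

(ii) $\hat\tau^{\mathrm{HL}(i,j)}_k$ can be obtained by solving

$$\mathrm{REML}^D_k(j) = A^H_{0k} + A^H_{1k} + A^H_{2k} - I(j = 2)\,(\partial F/\partial\tau_k)/24 = 0,$$

where

$$A^H_{0k} = (\partial\hat v/\partial\tau_k)^T (Z^T W_1 (y - \mu)\,\partial\eta/\partial\mu) - \tfrac{1}{2}\mathrm{tr}(W_1^{-1}\,\partial W_1/\partial\tau_k) + I(\tau_k = \phi)\,(-(y\theta - b(\theta))/\phi^2 + \partial g_0/\partial\phi),$$

$$A^H_{1k} = (\partial\hat v/\partial\tau_k)^T ((\psi_M - u)/\lambda) - \tfrac{1}{2}\mathrm{tr}(W_2^{-1}\,\partial W_2/\partial\tau_k) + I(\tau_k \neq \phi)\,(-(\psi_M\theta_M - b_M(\theta_M))/\lambda^2 + \partial g_1/\partial\tau_k),$$

$$A^H_{2k} = -\mathrm{tr}(Q(C_{1k} + C_{2k}))/2 \quad \text{with} \quad C_{2k} = \sum_i (\partial\Sigma/\partial v_i)(\partial\hat v_i/\partial\tau_k).$$

Here $g_0$ and $g_1$ are defined in (5) and (6), respectively, and $I(\cdot)$ is an indicator function.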

The proof is given in Appendix A of the Online Resource. In HGLMs, both the $y \mid v$ and the $v$ components are non-normal, and therefore, $(A_{0k}, A_{1k}, A_{2k})$ in (8) is replaced with $(A^H_{0k}, A^H_{1k}, A^H_{2k})$. The last term in the expression for $\mathrm{REML}^D_k(j)$ is for second-order correction. For estimating $\beta$, the straightforward replacement technique in $\mathrm{ML}(z^{(i)})$ is possible, whereas for estimating $\tau_k$, the score and Hessian matrices are required to be modified for HGLMs. We may use the Hessian matrix from the PQL method in Lemma 1. We find that it is often satisfactory for practical use, in that it provides reasonable standard errors of estimators, close to those from the exact Hessian matrix in Appendix A of the Online Resource.

For GLMMs with $\mathrm{var}(v) = D = W_2^{-1}$, the REML equation is simplified because $\partial D/\partial\beta_k = 0$. Further simplification is possible to implement HL(·,2), as described in Appendix B of the Online Resource.

Corollary 2 In GLMMs with $\mathrm{var}(v) = D$,

(i) for $i = 0, 1$ and $j = 1, 2$, $\hat\beta^{\mathrm{HL}(i,j)}$ can be obtained by solving

$$\mathrm{ML}(z^{(i)}) = 0,$$

where $z^{(0)} = z_1$ and $z^{(1)} = z_1 - \Sigma W_1 s(\partial\eta/\partial\mu)$, and

(ii) $\hat\tau^{\mathrm{HL}(i,j)}$ for $\tau_k\ (\neq \phi)$ can be obtained by solving

$$\mathrm{REML}^D_k(j) = A^H_{0k} + A_{1k} + A^H_{2k} - I(j = 2)\,(\partial F/\partial\tau_k)/24 = 0.$$

The PQL algorithm is analogous to HL(0,1), which has $z^{(0)} = z_1$ and $\mathrm{REML}^D_k(j) = A^H_{0k} + A_{1k} + A^H_{2k} = 0$. Thus, given dispersion components, the PQL and HL(0,1) involve the same equation for estimating $\beta$. However, in the case of the PQL, $C_{2k}$ in the expression for $A^H_{2k}$ is ignored. Furthermore, the PQL method employs the squared Pearson residuals $-(z_1 - X\hat\beta - Z\hat v)^T W_1 (z_1 - X\hat\beta - Z\hat v)$ instead of the true GLM deviances in $h_0$.

5 Examples

In Sect. 4, we discussed the implementation of h-likelihood methods using a modification of existing ML and REML algorithms. For models with independent random effects, extensive numerical studies have shown the effectiveness of the h-likelihood method. In this section, we present h-likelihood estimators for a binomial-beta HGLM using the proposed method to demonstrate that non-normal random effects can also be fitted by using the proposed method. We also discuss simulation studies carried out to investigate the performance of the h-likelihood methods for correlated random effects.

Example 1 (binomial-beta HGLM) Crowder (1978) presented data on the proportion of seeds that germinated on each of 21 plates with treatments in a 2 × 2 factorial layout of seed variety by type of root extract. Lee and Nelder (1996) used a binomial-beta HGLM to account for the extra variation associated with each plate. The model is

$$\log\{\pi/(1 - \pi)\} = X\beta + v,$$

where $E(y \mid u) = m\pi$, $X$ is the model matrix for fixed effects $\beta$, $v = \log\{u/(1 - u)\}$, and $u$ is assumed to follow a Beta($\tau$, $\tau$) distribution. For covariates, the main effects for seeds and extracts and an interaction term are used. Table 3 lists the parameter estimates obtained with the algorithm presented in Sect. 4, with standard errors in parentheses. Note that we cannot obtain the PQL estimator because $v$ is not normal. However, for the sake of comparison, we provide PQL estimates assuming that $v$ is normal in Table 3. Because the distribution of $v = \log\{u/(1 - u)\}$ is symmetric, the analyses from the GLMM and the HGLM are essentially the same (Lee and Nelder 1996). All the estimators are similar because $1/\tau$ is small. For binomial-beta HGLMs, Lee et al. (2007) showed through numerical studies that HL(1,2) becomes almost identical to the ML estimators even when $1/\tau$ is large.

Table 3 Parameter estimates for the seed germination data

| | PQL | HL(0,1) | HL(1,1) | HL(1,2) |
|---|---|---|---|---|
| Intercept | −0.542 (0.190) | −0.542 (0.190) | −0.564 (0.193) | −0.564 (0.194) |
| Seed | 0.077 (0.308) | 0.077 (0.307) | 0.078 (0.311) | 0.077 (0.313) |
| Extract | 1.339 (0.270) | 1.338 (0.269) | 1.426 (0.273) | 1.429 (0.276) |
| Interaction | −0.825 (0.430) | −0.825 (0.428) | −0.889 (0.434) | −0.892 (0.437) |
| log τ | −2.323 (0.435)ᵃ | 3.041 (0.428) | 2.990 (0.424) | 2.961 (0.422) |
| τ | 0.098ᵃ | 20.935 | 19.879 | 19.325 |

ᵃ For the GLMM, this is a variance component estimate for normal $v$. Note that the estimates of $\tau$ are comparable because $\mathrm{var}(\log(u/(1 - u))) \approx 0.10$ where $u \sim \mathrm{Beta}(20, 20)$.
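The comparison in the footnote of Table 3 can be reproduced with a short calculation. The sketch below relies on the standard result that the variance of the logit of a Beta($a$, $b$) variable is the sum of the trigamma functions at $a$ and $b$. The trigamma function is evaluated by a truncated series with an asymptotic tail correction, an implementation choice made here only to avoid external dependencies.

```python
def trigamma(x, terms=10000):
    """Trigamma function: sum over k >= 0 of 1/(x + k)^2."""
    partial = sum(1.0 / (x + k) ** 2 for k in range(terms))
    # Tail of the series approximated by the first two asymptotic terms
    tail = 1.0 / (x + terms) + 0.5 / (x + terms) ** 2
    return partial + tail

def var_logit_beta(a, b):
    """Variance of log(u / (1 - u)) for u ~ Beta(a, b)."""
    return trigamma(a) + trigamma(b)
```

Under these assumptions, `var_logit_beta(20, 20)` comes out at about 0.10, which puts the Beta dispersion estimates on the same scale as the normal variance component reported for PQL.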

Table 4 Parameter estimates for the lip cancer data

| Parameter | PQL | HL(0,1) | HL(1,1) | HL(1,2) |
|---|---|---|---|---|
| β1 | 0.267 (0.207) | 0.271 (0.207) | 0.238 (0.208) | 0.237 (0.208) |
| β2 | 0.377 (0.122) | 0.376 (0.121) | 0.376 (0.122) | 0.376 (0.122) |
| λ | 0.154 (0.051) | 0.153 (0.052) | 0.155 (0.052) | 0.156 (0.053) |
| ρ | 0.174 (0.002) | 0.174 (0.002) | 0.174 (0.002) | 0.174 (0.002) |

In models with correlated random effects, the dimension of integrals required to obtain the marginal likelihood is often high, and therefore, numerical integration is infeasible. For models with correlated random effects, Lee and Nelder (2001) used the parametrization $v = L(\rho)r$, with $r = (r_1, \ldots, r_s)^T$ being the independent random vector such that $\mathrm{var}(r) = \Lambda$ for some diagonal matrix $\Lambda$. Thus, $\mathrm{var}(v) = D = L(\rho)\Lambda L^T(\rho)$. Various extended forms of $L(\rho)$ by using regression models were studied by Pan and MacKenzie (2003). Lee and Nelder (2006) extended this model to financial data. Given $\rho$, Lee and Nelder (2001) fitted GLMMs with correlated random effects by using HGLMs with an independent random effect using the expression $\eta = X\beta + Zv = X\beta + Z^*r$, where $Z^* = ZL(\rho)$. They searched the values of $\rho$ that maximized $p_{\beta,v}(h)$. Joe (2008) used a function maximization routine for $p_v(h)$. These algorithms are often too slow for data analysis.
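The reparametrization with independent random effects can be sketched as follows. The text does not say how $L(\rho)$ is constructed; the example assumes a unit lower-triangular factor obtained from the Cholesky decomposition of $D$, which is one possible choice, and the function name is hypothetical.

```python
import numpy as np

def decorrelate(Z, D):
    """Return (Z_star, diag of Lambda, L) with D = L Lambda L^T and Z_star = Z L.

    L is taken to be the unit lower-triangular factor from a Cholesky
    decomposition of D (an assumed, illustrative choice of L(rho)).
    """
    C = np.linalg.cholesky(D)     # D = C C^T with C lower triangular
    d = np.diag(C)
    L = C / d                     # divide column j by d[j]: unit diagonal
    return Z @ L, d ** 2, L
```

With this factorization, $Zv$ and $Z^*r$ give the same linear predictor when $v = Lr$, so a routine written for independent random effects with variances `d ** 2` can be reused for a given value of $\rho$.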

To check the performance of the proposed h-likelihood method, we conduct a simulation study. The results of this study, which involves 500 replications of simulated data, are presented here. Let $\hat\beta_i$ be the parameter estimate in the $i$th replication. We report the mean $\bar\beta\ (= \sum_i \hat\beta_i/500)$ and the standard error for the estimate $\hat\sigma_\beta/\sqrt{500}$, where $\hat\sigma_\beta = \sqrt{\sum_i (\hat\beta_i - \bar\beta)^2/499}$. In all tables that pertain to the simulation, estimates with significant biases at the 5% level are indicated by *.
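The summary statistics reported for the simulation can be computed as in the sketch below. The text does not state which test is used to flag significant bias; a two-sided normal test at the 5% level is assumed here, and the function name is hypothetical.

```python
import numpy as np

def summarize(estimates, truth):
    """Mean, standard error of the mean, and a bias flag for replicated estimates."""
    est = np.asarray(estimates, dtype=float)
    mean = est.mean()
    se = est.std(ddof=1) / np.sqrt(est.size)    # sigma_hat / sqrt(number of replications)
    # Assumed rule: flag bias when |mean - truth| exceeds 1.96 standard errors
    flagged = bool(abs(mean - truth) > 1.96 * se)
    return mean, se, flagged
```

With 500 replications, `est.std(ddof=1)` uses the divisor 499, matching the definition of the standard deviation given above.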

Example 2 (spatial model) Clayton and Kaldor (1987) analyzed the observed ($y_i$) and expected ($e_i$) numbers of lip cancer cases in the 56 administrative areas of Scotland with a view to producing a map that would display regional variations in cancer incidence and yet avoid the presentation of unstable rates for the smaller areas. Data were available on the percentage of the work force in each county employed in agriculture, fishing, and forestry ($x_i$). Because all three occupations involve outdoor work, this covariate may reflect exposure to sunlight, the principal known risk factor for lip cancer. Breslow and Clayton (1993) considered the following Poisson HGLM:

$$\eta_i = \log\mu_i = \log e_i + \beta_1 + \beta_2 x_i/10 + v_i, \tag{10}$$

where $v$ follows a Markov random field (MRF) model having

$$[\mathrm{var}(v)]^{-1} = (I - \rho N)/\lambda.$$

Here $\lambda$ is a dispersion parameter reflecting the overall heterogeneity of the underlying risks and $\rho$ is the spatial autocorrelation parameter. The neighborhood matrix $N$ has 0 as the $j$th diagonal element, and the off-diagonal elements in each row equal 1 if the corresponding regions are neighbors and 0 otherwise. In the lip cancer data, $\rho \in (-0.325, 0.175)$ to ensure that $I - \rho N$ is positive definite. Numerical integration for the marginal likelihood is infeasible because the dimension of the integral is 56. The results are presented in Table 4. For the computational details, see Appendix A in the Online Resource. Because the estimates of $\lambda$ are small, all the estimates of $\beta$ are similar.
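The admissible range of the autocorrelation parameter follows from the eigenvalues of the neighborhood matrix: the matrix $I - \rho N$ is positive definite exactly when $1 - \rho e_i > 0$ for every eigenvalue $e_i$ of $N$. The sketch below computes this range for a symmetric $N$ with at least one pair of neighbors, so that $N$ has both negative and positive eigenvalues. The function names are hypothetical, and the Scottish neighborhood structure itself is not reproduced here.

```python
import numpy as np

def rho_range(N):
    """Open interval of rho for which I - rho * N is positive definite."""
    eig = np.linalg.eigvalsh(N)               # N is symmetric with zero diagonal
    return 1.0 / eig.min(), 1.0 / eig.max()   # (negative bound, positive bound)

def mrf_precision(N, rho, lam):
    """Precision matrix [var(v)]^{-1} = (I - rho * N) / lam."""
    return (np.eye(N.shape[0]) - rho * N) / lam
```

For example, four regions arranged in a ring, each with two neighbors, give eigenvalues between −2 and 2 and hence the interval (−0.5, 0.5).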

For the simulation, data are generated from the Poisson GLMM (10) using the same covariates as those in the original dataset. In Table 4, the PQL and HL estimators are similar because $\lambda$ is small. To create apparent differences, we use $\lambda = 1.5$. In this setting, all the estimators of $\rho$ have severe downward biases. However, these biases are found to be caused by the skewed sampling distribution of $\hat\rho$ (Fig. 1). Because $\rho^* = 4\rho + 0.3 \in (-1, 1)$, we apply the Fisher transformation $\gamma = \log\{(1 - \rho^*)/(1 + \rho^*)\}$ (Fisher 1915). Figure 1 shows the histograms of $\hat\rho^{\mathrm{HL}(1,2)}$ and $\hat\gamma^{\mathrm{HL}(1,2)}$. The sampling distribution of $\hat\gamma^{\mathrm{HL}(1,2)}$ becomes somewhat symmetric, and therefore, on the $\gamma$ scale, all the estimators exhibit no biases.

Fig. 1 Histograms of $\hat\rho^{\mathrm{HL}(1,2)}$ (left) and $\hat\gamma^{\mathrm{HL}(1,2)}$ (right) when $\rho = 0.1$ ($\gamma = -1.735$)
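The transformation between the two scales is easy to write down. The sketch below uses the rescaling $\rho^* = 4\rho + 0.3$ quoted for the lip cancer data; the inverse map is derived by solving the transformation for $\rho^*$, and the function names are illustrative.

```python
import math

def rho_to_gamma(rho):
    """Map rho in (-0.325, 0.175) to gamma = log{(1 - rho*) / (1 + rho*)}."""
    rho_star = 4 * rho + 0.3
    return math.log((1 - rho_star) / (1 + rho_star))

def gamma_to_rho(gamma):
    """Inverse of rho_to_gamma."""
    rho_star = (1 - math.exp(gamma)) / (1 + math.exp(gamma))
    return (rho_star - 0.3) / 4
```

At the simulation value $\rho = 0.1$, this gives $\rho^* = 0.7$ and $\gamma = \log(0.3/1.7) \approx -1.735$, the value quoted in the caption of Fig. 1.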

Table 5 Simulation study using the lip cancer data

| Parameter | PQL | HL(0,1) | HL(1,1) | HL(1,2) |
|---|---|---|---|---|
| β1 = 0.25 | 0.295* (0.016) | 0.297* (0.017) | 0.221 (0.017) | 0.218 (0.017) |
| β2 = 0.35 | 0.362 (0.012) | 0.360 (0.012) | 0.360 (0.013) | 0.359 (0.013) |
| λ = 1.5 | 1.327* (0.013) | 1.440* (0.015) | 1.451* (0.015) | 1.470 (0.015) |
| ρ = 0.1 | 0.077* (0.003) | 0.078* (0.003) | 0.075* (0.003) | 0.072* (0.003) |
| γ = −1.735 | −1.718 (0.049) | −1.728 (0.048) | −1.691 (0.049) | −1.677 (0.047) |

\* indicates significant biases at the 5% level

Table 5 shows that the biases in both the PQL and the HL(0,1) estimates of $\beta_1$ and $\lambda$ are significant at the 5% level. HL(1,1) exhibits a downward bias in estimating $\lambda$, whereas HL(1,2) lacks any such biases. In summary, HL(1,2) has the least biases (no significant biases), but it is quite slow compared to HL(1,1). Thus, from the viewpoint of computation, HL(1,1) is preferable; however, it could give a bias for $\lambda$. An anonymous referee points out that although these biases in Table 5 are statistically significant, they may not be practically significant. It is interesting to address how important these biases would be in practice. In disease mapping, the estimates of random effects $v$ are often of interest as relative risks. The standard error estimates of $\hat v$ are affected by the biases of $\hat\lambda$, and therefore, the coverage probability of the prediction interval for $v$ based on the PQL method cannot maintain the stated level because of the downward bias of $\hat\lambda$, whereas those of HL(1,1) and HL(1,2) can (Lee et al. 2011a).

Example 3 (temporal model) Joe (2008) studied the effect of an increase in the dimension of temporally correlated random effects when the number of fixed parameters is constant. For the simulation study, we consider the following Poisson HGLM:

$$\eta_{ij} = \log\mu_{ij} = \beta_1 + x_i\beta_2 + x_{ij}\beta_3 + v_{ij},$$

where the between-cluster covariate $x_i$ is set to −1 for the first $n/2$ subjects and to 1 for the remaining $n/2$ subjects; the within-cluster covariate $x_{ij}$ is set to −1 for the first $d/2$ and to 1 for the remaining $d/2$ within each subject. We also define $v_i = (v_{i1}, \ldots, v_{id})^T \sim N(0, \Sigma(\rho))$, where

$$\Sigma(\rho) = \lambda(\rho^{|j-k|})_{1 \le j,k \le d}.$$
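A data set from this model can be generated as in the sketch below, which builds the covariance matrix with entries $\lambda\rho^{|j-k|}$ and draws Poisson counts. It assumes even $n$ and $d$; the function names and the use of NumPy's random generator are choices made for this illustration and are not taken from the paper.

```python
import numpy as np

def ar1_cov(d, rho, lam):
    """Covariance matrix with (j, k) entry lam * rho^|j - k|."""
    idx = np.arange(d)
    return lam * rho ** np.abs(idx[:, None] - idx[None, :])

def simulate(n, d, beta, rho, lam, rng):
    """Draw an n x d array of Poisson counts from the temporal model."""
    x_between = np.repeat([-1.0, 1.0], n // 2)   # x_i: first half -1, second half 1
    x_within = np.repeat([-1.0, 1.0], d // 2)    # x_ij: same pattern within each subject
    v = rng.multivariate_normal(np.zeros(d), ar1_cov(d, rho, lam), size=n)
    eta = beta[0] + beta[1] * x_between[:, None] + beta[2] * x_within[None, :] + v
    return rng.poisson(np.exp(eta))
```

A call such as `simulate(100, 2, (0.0, 0.0, 0.0), 0.7, 0.5, np.random.default_rng(1))` produces one replicate for a setting with $n = 100$, $d = 2$, $\rho = 0.7$, and $\lambda = 0.5$.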

Joe (2008) considered the model with $\beta_2 = \beta_3 = 0$. We consider the worst case in his numerical study: $\beta_1 = 0, 2$; $\rho = 0, 0.7$; $d = 2$; $n = 100$; and $\lambda = 0.5$. Note that Joe (2008) used $p_v(h)$ to estimate $\lambda$ and $\rho$, whereas we used $p_{\beta,v}(h)$. In the present study, the Fisher transformation of $\rho$ is not necessary because the sampling distribution of $\hat\rho$ is fairly symmetric. Our results are presented in the Online Resource. We also consider the case of $d = 6$ and $n = 50$. The Supplementary Table shows that PQL and HL(0,1) are similar, having biases in almost all parameter estimators. HL(1,1) and HL(1,2) generally reduce biases. When $\rho = 0$ (for independent random effects), these biases are further reduced when the cluster size $d$ or $\beta_1$ increases. When $\beta_1$ is large, the Poisson distribution approaches the normal distribution, for which all the methods provide exact ML and REML estimators. However, when $\rho$ is large (0.7), the biases may not be reduced when the cluster size $d$ or $\beta_1$ increases. We found that HL(1,1) and HL(1,2) often estimate the within-cluster effects without bias. In temporally correlated Poisson HGLMs, HL(1,1) and HL(1,2) are similar, and HL(1,1) would be preferable because it has better computational efficiency.

6 Concluding remarks

The PQL method is easy to implement, but it can produce severe biases, and therefore, the h-likelihood method needs to be implemented. First, we study the modification of the existing IWLS (REML) method to implement the h-likelihood method for HGLMs. The modification of the method used for the ML estimation of $\beta$ is straightforward and simply involves the modification of the adjusted dependent variables. We also investigate the bias reductions of HL($i$, $j$) methods over the PQL method, especially for HGLMs with correlated random effects. A simulation study shows that PQL and HL(0,1) have significant biases, and these biases can be reduced by using HL(1,1) and HL(1,2). For models with independent random effects, biases can be negligible when the cluster size is large. However, such a bias reduction may not hold when the correlation among random effects is large (when $\rho = 0.7$). The simulation study shows that the biases of HL(1,1) and HL(1,2) in within-cluster covariates may be negligible; however, the bias reduction may not be sufficient in most of the other parameter estimators. Thus, it would be interesting to develop a method for further bias reduction. Between HL(1,1) and HL(1,2), HL(1,1) would be preferable because it has better computational efficiency. Markov chain Monte Carlo (MCMC) has been developed for computation in Bayesian statistics, where the posterior mean $E(v_i \mid y)$ is often computed. In the h-likelihood approach, the mode of the $v_i \mid y$ distribution is used for inferences. Lee et al. (2011b) and Lee and Ha (2010) illustrated the advantages of the h-likelihood method over the MCMC method and the best linear unbiased prediction approach, respectively. A more thorough comparison would be interesting future work.

References

Barndorff-Nielsen, O.E.: On a formula for the distribution of the maximum likelihood estimator. Biometrika 70, 343–365 (1983)

Breslow, N.E., Clayton, D.G.: Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc. 88, 9–25 (1993)

Clayton, D., Kaldor, J.: Empirical Bayes estimates of age-standardized relative risks for use in disease mapping. Biometrics 43, 671–681 (1987)

Cox, D.R., Reid, N.: Parameter orthogonality and approximate conditional inference. J. R. Stat. Soc. B 49, 1–39 (1987)

Crowder, M.J.: Beta-binomial ANOVA for proportions. Appl. Stat. 27, 34–37 (1978)

Fisher, R.A.: Frequency distribution of the values of the correlation coefficient in samples of an indefinitely large population. Biometrika 10, 507–521 (1915)

Goldstein, H., Rasbash, J.: Improved approximations for multilevel models with binary responses. J. R. Stat. Soc. A 159, 505–513 (1996)

Ha, I., Noh, M., Lee, Y.: Bias reduction of likelihood estimators in semiparametric frailty models. Scand. J. Stat. 37, 307–320 (2010)

Harville, D.: Maximum likelihood approaches to variance component estimation and related problems. J. Am. Stat. Assoc. 72, 320–340 (1977)

Joe, H.: Accuracy of Laplace approximation for discrete response mixed models. Comput. Stat. Data Anal. 52, 5066–5074 (2008)

Lee, Y., Ha, I.: Orthodox BLUP versus h-likelihood methods for inferences about random effects in Tweedie mixed models. Stat. Comput. 20, 295–303 (2010)

Lee, Y., Nelder, J.A.: Hierarchical generalized linear models (with discussion). J. R. Stat. Soc. B 58, 619–678 (1996)

Lee, Y., Nelder, J.A.: Hierarchical generalized linear models: A synthesis of generalized linear models, random effect model and structured dispersion. Biometrika 88, 987–1006 (2001)

Lee, Y., Nelder, J.A.: Double hierarchical generalized linear models (with discussion). Appl. Stat. 55, 139–185 (2006)

Lee, Y., Nelder, J.A., Noh, M.: H-likelihood: problems and solutions. Stat. Comput. 17, 49–55 (2007)

Lee, Y., Nelder, J.A., Pawitan, Y.: Generalized Linear Models with Random Effects: Unified Approach via h-Likelihood. Chapman and Hall, New York (2006)

Lee, W., Lim, J., Lee, Y., Castillo, J.: The hierarchical likelihood approach to autoregressive stochastic volatility models. Comput. Stat. Data Anal. 55, 248–260 (2011b)

Lee, Y., Jang, M., Lee, W.: Prediction interval for disease mapping using the hierarchical likelihood. Comput. Stat. 26, 159–179 (2011a)

Lin, X., Breslow, N.E.: Bias correction in generalized linear mixed models with multiple components of dispersion. J. Am. Stat. Assoc. 91, 1007–1016 (1996)

Molenberghs, G., Verbeke, G., Demetrio, C.G.B., Vieira, A.: A family of generalized linear models for repeated measures with normal and conjugate random effects. Stat. Sci. 25, 325–347 (2010)

Noh, M., Lee, Y.: REML estimation for binary data in GLMMs. J. Multivar. Anal. 98, 896–915 (2007)

Pan, J., MacKenzie, G.: On modelling mean-covariance structures in longitudinal studies. Biometrika 90, 239–244 (2003)

Patterson, H.D., Thompson, R.: Recovery of interblock information when block sizes are unequal. Biometrika 58, 545–554 (1971)

Payne, R.W., Murray, D.A., Harding, S.A., Baird, D.B., Soutar, D.M.: GenStat for Windows (10th Edition) Introduction. VSN International, Hemel Hempstead (2007)

SAS Institute Inc.: SAS 9.1.3 Help and Documentation. SAS Institute Inc., Cary (2000–2004)

Schall, R.: Estimation in generalized linear models with random effects. Biometrika 78, 719–727 (1991)

Yun, S., Lee, Y.: Comparison of hierarchical and marginal likelihood estimators for binary outcomes. Comput. Stat. Data Anal. 45, 639–650 (2004)
