Modified GEE and Goodness of the Marginal Fit (GOMF) Test with Correlated Binary Responses for Contingency Tables

Modified GEE and Goodness of the Marginal Fit (GOMF) Testwith Correlated Binary Responses for Contingency Tables

Ji-Hyun Lee*; 1 and Bahjat F. Qaqish2

1 H. Lee Moffitt Cancer Center & Research Institute, University of South Florida, Tampa, FL 33612, USA2 Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599,

USA

Received 12 May 2004, revised 16 September 2004, accepted 21 September 2004

Summary

This paper addresses testing the goodness of fit of models for marginal probabilities estimated bygeneralized estimating equations. We develop a modified version of generalized estimating equationand a goodness-of-fit test based on the fitted marginal means. The test statistic is easy to compute andhas a simple reference distribution. Its performance is evaluated asymptotically and in small samples. Itis also compared to the deviance and Pearson X2 statistics. Example applications are given.

Key word: Correlated binary data; Modified GEE; Contingency table; Goodness of the mar-ginal fit test statistic.

1 Introduction

Correlated binary outcomes are common in biostatistics and epidemiology. For example, the coalminers study (McCullagh and Nelder, 1989, p. 230) is a study of respiratory ailments of working coalminers. One aim of the study was to determine how respiratory ailments and age are related. Eachparticipant was classified with respect to breathlessness and wheezing. The two responses on a subjectare naturally correlated. Another example is the smoking relapse data (Brandon et al., in press), whichmotivated this study. The outcome was a binary variable indicating whether or not an ex-smokersmoked any cigarettes over the 7 days preceding at 12, 18, and 24 months after the program entry.Three binary outcomes were measured on each subject over time, and correlation within a subject isexpected.

The method of generalized estimating equations (GEE), proposed by Liang and Zeger (1986), is apopular method for analyzing correlated binary responses. GEE provide consistent estimates for awide class of distributions with few assumptions. Specifically, under correct specification of the modelfor the marginal means, they provide consistent and asymptotically normal estimators of the regres-sion parameters and consistent estimators of their variances.

It is often desirable to assess how close the model’s fitted values are to the corresponding observedvalues, or stated otherwise, the goodness-of-fit. The standard goodness-of-fit statistics for contingencytables, the deviance and the Pearson X2, have the advantages of computational simplicity, and a sim-ple reference distribution in large samples. However, these tests are not applicable to GEE becausethey are based on fitted values for individual cells within a contingency table while GEE providesfitted values of table margins only.

Barnhart and Williamson (1998) proposed score tests that are extensions of Tsiatis (1980) andLipsitz and Buoncristiani (1994). These are score tests for added indicator variables constructed bydividing the covariate space into regions. Horton et al. (1999) proposed tests in which the observations

* Corresponding author: e-mail: [email protected]

Biometrical Journal 46 (2004) 6, 675–686 DOI: 10.1002/bimj.200410071

# 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

are divided into subsets based on the fitted values, thus extending the approach of Hosmer and Leme-show (1989). However, there are concerns with partitions derived from fitted values (Hosmer et al., 1997).

In this paper, we follow the GEE framework of Liang and Zeger (1986), but replace their workingcovariance matrix by one based on the multinomial covariance structure; i.e., we retain the advantagesof the GEE with a parametrically derived covariance structure for the data. Further, based on thismodified GEE we develop a statistic for testing the goodness of the marginal fit (GOMF) for contin-gency tables.

Section 2 introduces the notation and basic results. In Section 3, we introduce the modified GEEfor parameter estimation. The GOMF and its asymptotic distribution are developed in Section 4. InSection 5, the coal miners data and the smoking relapse data are used to illustrate the proposedmethods. Small-sample evaluation of the proposed GOMF test and comparison to MLE type good-ness-of-fit tests are presented in Section 6. Section 7 concludes with a summary discussion.

2 Marginal Regression Models for Contingency Tables

The distribution of dependent Bernoulli variables S1; . . . ; Sn can be viewed as a multinomial withN :¼ 2n cells. Thus, let the N � 1 random vector Z denote the cell frequencies in a 2n contingencytable created by cross-classifying independent and identically distributed responses from m subjects.The distribution will be denoted by

Z � Mult ðm; pÞ ;where p is an N � 1 vector of positive cell probabilities summing to unity. The n margins of a tableare obtained by collapsing appropriate cells and will be collected in a vector Y ¼ CZ, where C is ann� N matrix with entries 0 or 1.

For example, in the case n ¼ 2, the cell counts are depicted by

S2 ¼ 0 S2 ¼ 1

S1 ¼ 0 Z1 Z2

S1 ¼ 1 Z3 Z4 Y1

Y2 m

and

C ¼ 0 0 1 10 1 0 1

� �:

By standard properties of multinomial, EðZÞ ¼ mp and var ðZÞ ¼ mðDp � pp>), whereDp ¼ Diag ðp) . Here Eð:Þ and (.) denote the expectation and variance operators, respectively.

Consequently, it follows that

EðYÞ ¼ mm and var ðYÞ ¼ mG ;

where

m :¼ Cp and G :¼ GðpÞ ¼ CðDp � pp>Þ C> :

Further, let P ¼ m�1Y be the marginal proportions.When several tables are available, they will be indexed by i ¼ 1; . . . ; T; thus, for example,

Yi; ni;pi;mi; mij;mi; and Gi pertain to the i-th table, where j ¼ 1; . . . ; ni.For simplicity we took ni ¼ n for all i, although that is not necessary. In the coal miners’ data there

is a 2� 2 table for each of the nine age groups, thus n ¼ 2 and T ¼ 9. In the smoking relapse data,four groups are defined by two interventions, and there is a 23 table (8 cells) for each, so n ¼ 3 andT ¼ 4.

676 J.-H. Lee and B. F. Qaqish: Modified GEE and GOMF Test


Marginal regression models specify the marginal means mj ¼ Pr ðSj ¼ 1Þ, but not the cell probabil-ities p. We are interested in marginal regression models of the form mi ¼ miðb), where b is an un-known s� 1 parameter vector. Typically, the regression model has the form

hðmijÞ ¼ hij ¼ x>ij b ;

for i ¼ 1; . . . ; T and j ¼ 1; . . . ; n. Here hð:Þ is a known link function and hij is the linear predic-tor. We assume that the parameterization is full rank so there are s linearly independent estimableparameters.

3 Estimation

Let K denote the number of subjects, K ¼PTi¼1

mi. Large-sample arguments are based on T fixed and

each mi !1 andmi

K! ci 2 ð0; 1Þ. A modified generalized estimating equation for b is the solution

to

UðbÞ ¼PTi¼1

miD>i G�1

i ðPi � miÞ ¼ 0 ;

where Di :¼ @[email protected] than use a working correlation matrix as in Liang and Zeger (1986), we replace Gi in (1) by

the consistent estimator ~GGi :¼ G(~ppiÞ, where ~ppi :¼ ðZi þ 2�n1Þðmi þ 1Þ . The term 2�n1 is asymptotically negli-

gible, but it stabilizes ~GGi in small samples.The solution of (1) will be denoted by bb*. Under mild regularity conditions, and by arguments

similar to Liang and Zeger (1986), the asymptotic distribution offfiffiffiffiKpðbb*� b) is multivariate normal

with mean zero and variance matrix

Vb ¼ limK!1

KPTi¼1

miD>i G�1

i Di

� ��1

: ð2Þ

A consistent estimator of Vb is obtained by replacing Gi in (2) by ~GGi.

4 Goodness of the Marginal Fit Test Statistic

Let V denote the block diagonal matrix with diagonal blocks Vi ¼ m�1i Gi and define

D ¼ ½D>1 . . . D>T �> and Q ¼ DðD>V�1DÞ�1 D>.

We propose testing H0 : m ¼ mðbÞ by the goodness of the marginal fit test (GOMF) statistic,

G2 ¼ ðP� mmÞ> WW�ðP� mmÞ ; ð3Þ

where P and mm are the stacked Pi and mmi ¼ miðbb*Þ vectors, respectively, WW is a consistent estimatorof W ¼ V � Q, and WW� is a generalized inverse. Here V ¼ var ðPÞ ¼ M�1G withM :¼ Diag ðm1; . . . ;mTÞ � In, In ¼ n� n identity matrix and � the Kronecker product.

Theorem 1 Assume that regularity conditions and the model m ¼ mðbÞ hold. Then, G2 has anasymptotically chi-squared distribution with degrees of freedom q ¼ nT � s.

The proof will be completed by a series of Lemmas.

Lemma 4.1 Approximated residuals for the fitted model.

ðP� mmÞ � ðI � HÞ ðP� mÞ ; ð4Þwhere I :¼ nT � nT identity matrix and H ¼ QV�1 is an idempotent projection matrix of rank s.

Biometrical Journal 46 (2004) 6 677


Proof. By a Taylor expansion of (1) and following arguments similar to the proof of Theorem 2 inLiang and Zeger (1986), we have that

bb* � bþPTi¼1

miD>i G�1

i Di

� ��1 PTi¼1

miD>i G�1

i ðPi � miÞ

¼ bþ ðD>G�1MDÞ�1 D>G�1MðP� mÞ ; ð5Þwhere � denotes asymptotic equivalence.

Then, the first-order Taylor expansion of the fitted values is

mm � mþ @m

@bðbb*� bÞ

and using (5),

ðmm� mÞ � QV�1ðP� mÞ ¼ HðP� mÞ :Now the distribution of the residuals can be approximated by

ðP� mmÞ ¼ ðP� mÞ � ðmm� mÞ � ðP� mÞ � HðP� mÞ � ðI � HÞ ðP� mÞ :

Lemma 4.2 The covariance matrix of P� mm is W ¼ V � Q.

Proof. W ¼ cov ðP� mmÞ ¼ cov fðI � HÞ ðP� mÞg¼ ðI � HÞ VðI � HÞ> ¼ V � DðD>V�1DÞ�1 D> ¼ V � Q ¼ ðI � HÞ V :

Using Cochran’s Theorem (Sen and Singer, 1993, p. 137), we may conclude that if the modelm ¼ mðb) holds, G2 follows asymptotically a chi-squared distribution with degrees of freedomq ¼ rank ðW):

Since V is a nonsingular matrix with rank nT and H is an nT � nT idempotent projection matrix, itfollows from Theorems 1.8 and 9.1 from Schott (1997) that

q ¼ rank ðWÞ ¼ rank fðI � HÞVg ¼ rank ðI � HÞ ¼ Trace ðI � HÞ ¼ nT � s :

This completes the proof of Theorem 1. Note that a computationally convenient form for G2 in (3) is

G2 ¼ ðP� mmÞ> V�1ðP� mmÞ : (6)

This follows because V�1ðI � HÞ is a generalized inverse of W, and for mm satisfying (1), HðP� mm)¼ 0. The GOMF can be obtained in fairly simple computational manner with (6), and a SAS IMLfunction that implements the modified GEE and GOMF test is available from the authors uponrequest.

5 Examples

5.1 Bivariate model: coal miners study

The first illustration is the coal miners study analyzed in some detail by McCullagh and Nelder (1989,p. 230), so we will report only on a few points of interest. The study focused on two respiratoryailments of working coal miners, breathlessness and wheezing. The subjects were smokers withoutradiological evidence of pneumoconiosis, age 20 to 64 at the time of examination. One aim of theinvestigation was to study how breathlessness, wheezing and their interaction are related to age. Theavailable covariate, age, is a discrete variable grouped into five year intervals.

We first consider a model with linear age effect:

logit ðmi1Þ ¼ b0B þ b1B Agei for Breathlessness

logit ðmi2Þ ¼ b0W þ b1W Agei for Wheezing ;



where Agei is coded �4; . . . ; 4 for the 9 age groups. Table 1 displays the standard GEE estimates (bb)using independence working correlation and their robust and naive standard errors (S.E.), the modifiedGEE estimates (bb*) and their standard errors (S.E.*), and the GOMF test statistic (G2). The estimatesand standard errors obtained from the modified GEE are quite similar to those from the standardGEE.

The GOMF test then suggests that a linear age effect did not fit the data well(G2 ¼ 25:1; p ¼ 0:03). After adding a quadratic age effect, the GOMF test statistic is 10.3 on 12degrees of freedom, a p-value of 0.59, indicating a good fit. This is in agreement with the conclusionsof McCullagh and Nelder (1989, Chapter 6).

5.2 Longitudinal model: smoking relapse study

The second illustration is the preventing smoking relapse study (Brandon et al., in press). As interven-tions, the content of the booklets and the repeated contact were dismantled from the previous studywherein Brandon et al. (2000) found that a series of eight empirically-based relapse-prevention book-lets mailed to ex-smokers significantly reduced relapse. 431 subjects who were abstinent at baselinewere randomly assigned to groups receiving the high contact or low contact and the high content orlow content interventions. As the response variable, smoking relapse status was determined for each of3 visits (at 12, 18, and 24 months). Some subjects never attended follow-up sessions, resulting inmissing observations. The number of participants and the marginal means of smoking relapse by inter-vention groups are summarized in Table 2. The prevalence of smoking relapse in the following timeperiod reveals a constant increase in most intervention groups with a maximum at the last time period.The primary study focuses on whether the smoking relapse has an association with the two interven-tions, controlling possible covariates. First, we fitted the marginal probability of smoking relapse withthe main effects of interventions and time in months, where both content and contact effects takevalues 1 if high or 0 if low. Table 3 displays the parameter estimates and GOMF tests obtained fromthe modified GEE along with those from the standard GEE assuming unstructured working correla-tion.


Table 1 Parameter estimates and GOMF for the coal miners data.

bb robust S.E. naive S.E. bb* S.E.*

Linear age effectIntercept (B) �2.2597 0.0294 0.0301 �2.2753 0.0295Age (B) 0.5125 0.0118 0.0127 0.5187 0.0117Intercept (W) �1.4875 0.0205 0.0206 �1.4935 0.0205Age (W) 0.3259 0.0088 0.0089 0.3271 0.0088

GOMF G2 ¼ 25:09 df ¼ 14 p ¼ 0:03

Quadratic age effectIntercept (B) �2.1993 0.0346 0.0346 �2.2024 0.0344Age (B) 0.5485 0.0168 0.0169 0.5484 0.0166Age2 (B) �0.0197 0.0056 0.0057 �0.0194 0.0055Intercept (W) �1.4492 0.0272 0.0272 �1.4521 0.0273Age (W) 0.3322 0.0095 0.0095 0.3325 0.0095Age2 (W) �0.0082 0.0039 0.0038 �0.0081 0.0039

GOMF G2 ¼ 10:33 df ¼ 12 p ¼ 0:60

(B): Breathless and (W): Wheezing


The GOMF test indicates an adequate fit (G2 ¼ 3:42; p ¼ 0:90) for the main effects model. Next,we removed the contact effect, which showed a non-significant p-value at the 0.05 level based on thechi-square distribution (p ¼ 0:74 from the standard GEE, p ¼ 0.67 from the modified GEE). Exclu-sion of the contact effect has the effect of reducing the G2 from 3.42 on 8 degrees of freedom to 1.15on 3 degrees of freedom. However, the statistical significance still remains (p ¼ 0:76) and we choosethe simpler reduced effects model as the final one. From the final model we observe that the param-eter estimates and the standard errors obtained here are fairly close to those obtained using thestandard GEE, except the standard error for the intercept effect. Notice that the reduced model hasonly one parameter difference compared to the full model; however, the difference in degrees of free-dom of the GOMF between the models is 5. This is because the degrees of freedom is a function ofthe number of tables and the number of responses for each subject, as well as the number of param-eters in the model. The final model implies that probability of smoking relapse is lower for subjectswho have high content intervention versus those who have low content intervention and as time passesthe smoking relapse probability increases.


Table 2 Means of smoking relapse in percent(number of subjects) at three times by the inter-vention groups.

Month

Content Contact 12 18 24

Low Low 26.4 34.1 33.3(87) (91) (96)

Low High 24.3 28.4 31.0(70) (81) (84)

High Low 13.6 20.4 21.9(81) (93) (96)

High High 19.5 20.9 21.1(82) (91) (90)

Table 3 Parameter estimates and GOMF for the smoking relapse study.

bb robust S.E. naive S.E. bb* S.E.*

Main effects modelIntercept �1.0633 0.2313 0.2344 �0.8632 0.5375Time (months) 0.0154 0.0085 0.0087 0.0158 0.0090Contact (High¼ 1, Low¼ 0) �0.0697 0.2130 0.2120 0.1078 0.2522Content (High¼ 1, Low¼ 0) �0.5356 0.2132 0.2120 �0.5858 0.2539

GOMF G2 ¼ 3:43 df ¼ 8 p ¼ 0:90

Reduced effects modelIntercept �1.0954 0.2123 0.2138 �0.8202 0.4219Time (months) 0.0154 0.0085 0.0088 0.0196 0.0095Content (High¼ 1, Low¼ 0) �0.5363 0.2129 0.2125 �0.5346 0.2517

GOMF G2 ¼ 1:15 df ¼ 3 p ¼ 0:76


6 Simulation

The first simulation experiment was designed to compare the GOMF test to the traditional devianceand Pearson X2 tests associated with maximum-likelihood estimation. The calculations for both thedeviance and Pearson X2 are based on the multinomial distribution as in McCullagh and Nelder(1989). Data were simulated from a true model MT , and then a possibly different working model MW

was fitted. The setup based on the coal miners study was as follows:

Model 1 MT : logit mijðxiÞ ¼ b0 þ b1xi þ b2x2i

log OR ðYi1; Yi2Þ ¼ 3:02� 0:13xi ð7ÞMW : logit mijðxiÞ ¼ b0j þ b1jxi ;

where i ¼ 1; . . . ; 9 denotes clusters, j ¼ 1; 2 denotes observations within a cluster, and xi ¼ �4; . . . ; 4.The values of b were chosen so that mij, as a function of xi, takes values mijð�4Þ ¼ 0:1;

mijð0Þ ¼ 0:3þ d and mijð4Þ ¼ 0:6: The value of d was varied from 0 to 0:3 in steps of 0:05, withd ¼ 0 corresponding to the null hypothesis. This allowed a series of models that increase in thestrength of the quadratic term. Table 4 displays the corresponding parameter values.

In order to assess the impact of mis-specification of the within-cluster association, a model withconstant log odds ratio was fitted in addition to the true model which is linear in xi as shown in (7).The values of 3.02 and �0.13 are the maximum likelihood estimates. Note that the GOMF test doesnot require specification of the association model (7).

A total of 10 000 datasets were generated wherein each dataset consists of 9 tables of size 22. Thetotal counts mi ¼ 50; 100; 200, or 400 for each table were investigated.

The results are summarized in Table 5. The GOMF test has a somewhat inflated Type I error forthe smaller sample sizes mi ¼ 50 and 100. However, its power is appreciably higher than the devianceand Pearson X2 tests at all sample sizes. The table also shows severely inflated Type I error for thedeviance and Pearson X2 tests if the within-cluster association is mis-specified.

These results show clearly that the GOMF test is aimed only at how well the table margins arefitted, while the other statistics test all aspects of the model. The deviance and Pearson X2 for contin-gency tables, being based on individual cell counts and their fitted values, test the model for thewhole joint distribution including the marginal means, pairwise correlations, and all higher-order inter-actions. For example, for the coal miners data, they test the adequacy of the correlation (or odds ratio)model in addition to the marginal mean model. A large value of these statistics can be due to eithermodel component being inadequate. In contrast, the GOMF test is aimed at the marginal mean struc-ture only, and does not require specification of an association model.

The second simulation experiment was designed to evaluate the power of the GOMF with the omis-sion of an quadratic term and the change of the magnitude of within-cluster correlation. Specifically,


Table 4 Parameters used in simulating datafor Model 1. mð�4Þ ¼ 0:1; mð4Þ ¼ 0:6 andmð0Þ ¼ 0:3þ d:

d b0 b1 b2

0.0 �0.895 0.325 0.00.05 �0.619 0.325 �0.0170.10 �0.405 0.325 �0.0310.15 �0.201 0.325 �0.0430.20 0.000 0.325 �0.0560.25 0.201 0.325 �0.0690.30 0.405 0.325 �0.081


the log ORðYi1; Yi2Þ component of (7) was replaced with

corr ðYi1; Yi2Þ ¼ q ;

and values q 2 f0:0; 0:2; 0:5; 0:8g were investigated.We generated a total of 1000 datasets. The size, number, and total counts of the tables were the

same from the first simulation experiment.Table 6 summarizes estimated power of the GOMF test with nominal level 0:05. Power is low

against mild departures from linearity, but increases quite rapidly for d � 0:15. Power decreases withincreasing correlation. With larger sample sizes, mi ¼ 400, high power is achieved even against milddepartures from the assumed model, d ¼ 0:1.

Next simulation examines power when an interaction term is omitted. The true and working mod-els, mimicking a longitudinal study, are given by:

Model 2 MT: logit ðmijÞ ¼ b0 þ b1xij1 þ b2xij2 þ b3xij1xij2

corr ðYij; Yij0 Þ ¼ q

MW: logit ðmijÞ ¼ b0 þ b1xij1 þ b2xij2

where i ¼ 1; 2 clusters and j ¼ 1; . . . ; 4 consecutive times.


Table 5 Power (%) of the GOMF and the MLE type goodness-of-fit tests,Deviance and Pearson X2, at a ¼ 0:05 with 10000 replicates. Quadratic depar-ture is parameterized by d in the true model from Model 1 using the correct anda mis-specified log odds ratio models for the Deviance and Pearson X2.

d

log odds ratio GOF mi 0.0 0.05 0.10 0.15 0.20 0.25 0.30

correct Deviance(c2

21)50 8 11 17 28 48 69 86

100 6 10 24 51 80 96 100200 5 13 44 86 99 100 100400 5 23 81 100 100 100 100

Pearson X2

(c221)

50 5 6 10 20 39 61 82100 5 8 20 46 78 95 100200 4 11 42 85 99 100 100400 5 22 79 100 100 100 100

unspecified GOMF(c2

14)50 8 14 25 43 66 85 95

100 7 15 38 70 92 99 100200 6 24 66 96 100 100 100400 6 42 95 100 100 100 100

mis-specified Deviance(c2

21)50 12 14 21 33 53 72 88

100 13 18 35 61 86 97 100200 20 32 66 94 100 100 100400 41 65 96 100 100 100 100

Pearson X2

(c221)

50 7 8 14 24 44 65 84100 10 14 30 56 83 97 100200 18 29 63 93 100 100 100400 40 63 96 100 100 100 100


The covariate xij1 is a time-independent or cluster-level binary covariate taking values 0 or 1, whilexij2 is a time-dependent or within-cluster covariate that represents age centered at 9 years, with valuesð�2; . . . ; 1Þ.

Values of b were chosen such that mijðxij1; xij2Þ takes on the specified values

mijð0;�2Þ ¼ 0:2 ; mijð0; 1Þ ¼ 0:2þ d ;

mijð1;�2Þ ¼ 0:4 ; mijð1; 1Þ ¼ 0:4� d :

Thus, d gauges the degree of non-parallelism of the regression lines in the two groups defined by xij1.Values d 2 f0, 0.05, 0.10, 0.15, 0.2g were investigated.

To see the performance of power under different levels of association, we generated data under MT,assuming exchangeable correlation structure. The values of q ¼ 0.0, 0.2, 0.3, 0.45 were investigated.

A total of 1000 datasets were generated. Each dataset consisted of two 24 tables, each table havingtotal count mi ¼ 50; 100; 200; or 400.

Table 7 presents the estimated power of the GOMF for Model 2. The power for detecting interac-tion departures is quite high. The results reveal that the GOMF test has power that increases withsample sizes, and the strength of the interaction. Power also increases with correlation since informa-tion about the interaction effect is available in within-cluster contrasts.

When a ¼ 0:05 with a relatively smaller sample size of 50 or 100, Type I error rate is inflated andranges from about 0.05 to 0.10 (Tables 6 and 7). The inflation is due to small expected counts inmany cells. In the first simulation setup, with mi ¼ 50; d ¼ 0 and log odds ratio model (7), of the 36cells in the 9 tables, 15 had expected cell counts less than 5. Specifically, suppose that the requiredminimum expected cell is � 5 in each table; i.e., mi � minimum (p; 1� pÞ � 5. This implies that formi ¼ 50, all cell probabilities, p, must be in the interval [0.1, .9] while [0.01, 0.99] for mi ¼ 500. In


Table 6 Power (%) of the GOMF test statistic at a ¼ 0:05 with 1000 replicates.Quadratic departure is parameterized by d in the true model from Model 1.

d

Correlation mi 0.0 0.05 0.10 0.15 0.20 0.25 0.30

q ¼ 0:050 10 17 38 61 81 96 100

100 8 21 56 91 99 100 100200 6 32 88 100 100 100 100400 5 65 100 100 100 100 100

q ¼ 0:250 9 16 31 53 79 93 9

100 8 16 46 84 98 100 100200 5 27 81 99 100 100 100400 6 54 99 100 100 100 100

q ¼ 0:550 10 14 27 44 68 84 97

100 9 14 37 73 93 99 100200 6 24 69 97 100 100 100400 5 43 96 100 100 100 100

q ¼ 0:850 7 11 21 35 56 77 91

100 6 13 32 63 87 97 100200 6 19 59 100 100 100 100400 5 39 92 100 100 100 100


the first our simulation setup, corresponding AGE¼ �4 (the youngest group), the marginal probabilitywas assigned m ¼ 0:1. Recall that m in the table is a function of p. Therefore, within this setup, whenmi ¼ 50 or mi ¼ 100, most tables have quite sparse data. These low cell frequencies (or expected cellcounts) were associated with large values of the GOMF test statistic produced by relative large valuesof residual vectors and small values in covariance matrix. That may explain why the Type I error rateis inflated.

All simulations for this section were performed on a personal computer using programs written inSAS IML (version 8.2). Software is available from the authors.

7 Summary and Discussion

This paper presented an approach to correlated binary outcomes based on a modification GEE ob-tained by replacing the working covariance matrix with one based on the multinomial distribution.

Additionally, based on the modified GEE a method to assess the goodness of the marginal fit wasproposed. Two examples illustrated this method. Simulation results showed that the proposed test hasgood power for detecting both quadratic and interaction departures from the assumed model. We alsocompared the power of the GOMF with the proper deviance and Pearson X2 based on the multinomialdistribution as in McCullagh and Nelder (1989). The simulation study showed that these standardgoodness-of-fit tests were very sensitive to the assumed dependence structure; if the log odds ratiomodel was incorrectly specified, Type I errors were overwhelmingly high. The proposed GOMF test,by comparison, does not require specification of the association model. The Type I errors of theGOMF test were close to the nominal level except when the sample size was small.

The GOMF test statistic can be used to test the effect of adding specific sets of covariates. Twonested models can be compared by referring the difference in the GOMF statistics to a chi-square


Table 7 Power (%) of the GOMF test statistic at a ¼ 0:05 with 1000 replicates.Interaction departure is parameterized by d in the true model from Model 2

d

Correlation mi 0.0 0.05 0.10 0.15 0.20

q ¼ 0:050 9 14 29 51 80

100 8 15 45 79 97200 5 24 75 98 100400 6 40 97 100 100

q ¼ 0:250 8 14 33 58 88

100 6 16 53 87 99200 5 28 83 100 100400 5 50 99 100 100

q ¼ 0:350 7 13 32 63 91

100 6 17 57 92 100200 6 31 89 100 100400 5 57 100 100 100

q ¼ 0:4550 6 13 37 74 97

100 5 19 70 97 100200 5 39 96 100 100400 4 66 100 100 100


distribution with degrees of freedom equal to the difference in the model dimensions. If the differenceof the GOMF statistics shows significance, the larger model can be considered a better fit.

The iteratively reweighted least squares algorithm is used in our model fitting. The estimates ob-tained after the first iteration coincide with those obtained by the weighted least squares (WLS) ap-proach of Koch et al. (1977). Thus the modified GEE can be viewed as an iterated form of the WLSapproach of Koch et al. (1977), and if the sample size is large enough, we would expect that theparameter estimates and standard errors would be reasonably similar for both approaches. A similarconnection has been noted by Miller et al. (1993), who also proposed a goodness-of-fit test for corre-lated ordinal data based on a similar framework. The test against a saturated model by Miller et al.(1993) would be equivalent to GOMF, however, they did not evaluate it in a simulation: i.e., a satu-rated model as an alternative hypothesis was not considered in the simulation. Further, the investiga-tion of power for the test comparing to the method from the full likelihood approach, such as thedeviance or Pearson X2 statistics was not conducted.

Our approach differs from that of Barnhart and Williamson (1998) in the use of a modified GEE, andin estimating the variance of the quasi-score function by using the multinomial covariance structureinstead of either the robust or the naive variance estimators associated with the standard GEE approach.

Some methods for continuous covariates depend on grouping of the covariates into classes (Hosmerand Lemeshow, 1980; Tsiatis, 1980). The power of the resulting tests depends on the number ofgroups, and no guidance is available for optimal grouping. This sensitivity to the grouping has beennoted in other contexts, for example, Section 6 of Jiang (2001). Other methods depend on non-para-metric smoothing. For example le Cessie and van Houwelingen (1991) used kernel smoothing of thestandardized residuals obtained after fitting an assumed parametric form for the covariate(s) effects.The variance of their statistic is quite complicated and computer-intensive even in the independencecase (it involves the correlations between all pairs of residuals). They remarked that the choice of thebandwidth is crucial. They also pointed out that the standard “optimal” bandwidth for kernel smooth-ing is not necessarily optimal for the goodness of fit application. Consequently, these concerns be-come even more difficult to deal with in the case of correlated outcomes. Our focus on categoricalcovariates allows a test statistic that is computationally simple, quite intuitive and has a simple large-sample distribution. The testing procedure is applicable after categorizing continuous covariates, butwith the caveats mentioned above.

Pan (2002) proposed a statistic, G, which is the ordinary Pearson X2 based on the individual binaryresponses, for ungrouped binary data with continuous covariates. However, according to McCullaghand Nelder (1989, Section 4.4.5), the statistic G is not useful for testing goodness of fit (GOF) forindependent binary outcomes. They give the counterexample of a fitted model that contains only anintercept, and for which G is simply the number of observations. Pan (2002) does not give any theore-tical justification as to why G should be a valid test of GOF for correlated binary data.

When several discrete covariates are involved in a model, the contingency tables may be verysparse, and the GOMF test may produce inflated Type I errors. Accuracy of the large-sample approx-imation requires that cells contain sufficient numbers of observations. We recommend applying guide-lines similar to those for the WLS method (Koch et al., 1977): i.e., Zij � 5. The addition of theconstant 2�n to each cell, as described in Section 3, avoids singular estimated multinomial covariancematrices and serves as a type of continuity correction in small samples.

As a referee pointed out, the standard GEE requires the missing completely at random (MCAR)assumption to obtain consistent estimates of the parameters and their covariance. However, Liang andZeger (1986) stated that when the working correlation structure is the true correlation, MCAR can beunnecessary. Our approach, by virtue of using the true multinomial covariance structure, is expectedto be consistent under MCAR. Even in some situations, under only MAR (missing at random) ourapproach may be nearly consistent and efficient. We refer readers to Lipsitz et al. (2000) and Preisseret al. (2002) for more detail. However, this topic does require further investigation.

Acknowledgements The authors thank the Tobacco Research and Intervention Program at the H. Lee MoffittCancer Center & Research Institute (director: Dr. Tom Brandon) for allowing to use their data. The first author’s



work was partially sponsored by the New Researcher Grant 18300 from the University of South Florida. Theauthors thank the referees for many comments that improved the manuscript.

References

Barnhart, H. X. and Williamson, J. (1998). Goodness-of-fit tests for GEE modeling. Biometrics 54, 720–729.Brandon, T. H., Collins, B. N., Juliano, L. M., and Lazev, A. B. (2000). Preventing relapse among former smo-

kers: A comparison of minimal interventions via telephone and mail. Journal of Consulting & Clinical Psy-chology 68, 103–113.

Brandon, T. H., Meade, C. D., Herzog, T. A., Webb, M. S., Chrisikos, T. N., and Cantor, A. B. (in press).Efficacy and cost-effectiveness of a minimal intervention to prevent smoking relapse: dismantling the effectsof content versus contact. Journal of Consulting & Clinical Psychology in press.

Horton, N. J., Bebchuk, J., Jones, C. Lipsitz, S. R., Catalano, P. J., Zahner, G. E. P. and Fitzmaurice, G. M.(1999). Goodness-of-fit for GEE: an example with mental health service utilization. Statistics in Medicine 18,213–222.

Hosmer, D. W., Hosmer, T., LeCessis, S. and Lemeshow, S. (1997). A comparison of goodness-of-fit tests for thelogistic regression model. Statistics in Medicine 16, 965–980.

Hosmer, D. W., and Lemeshow, S. (1980). Goodness-of-fit tests for the multiple logistic regression model. Com-munications in Statistics-Theory and Methods 9, 1043–1069.

Hosmer, D. W., and Lemeshow, S. (1989). Applied logistic regression. New York: John Wiley & Sons, Inc.Jiang, J. M. (2001). Goodness-of-fit tests for mixed model diagnostics. Annals of Statistics 29, 1137–1164.Koch, G. G., Landis, J. R., and Lehnen, R. G. (1977). A general methodology for the analysis of experiments

with repeated measurement of categorical data. Biometrics 33, 135–158.le Cessie, S. and van Houwelingen, J. C. (1991). A goodness-of-fit test for binary regression models, based on

smoothing methods. Biometrics 47, 1267–1282.Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73,

13–22.Lipsitz, S. R. and Buoncristiani, J. F. (1994). A robust goodness-of-fit test statistic with application to ordinal

regression-models. Statistics in Medicine 13, 145–152.Lipsitz S. R., Molenberghs G., Fitzmaurice G. M., and Ibrahim J. (2000). GEE with Gaussian estimation of the

correlations when data are incomplete. Biometrics 56, 528–536.McCullagh, P. and Nelder, J. A. (1989). Generalized linear models, 2nd ed. New York: Chapman Hall.Miller, M. E., Davis, C. S., and Landis, J. R. (1993). The analysis of longitudinal polytomous data: generalized

estimating equations and connections with weighted least squares. Biometrics 49, 1033–1044.Pan, W. (2002). Goodness-of-fit tests for GEE with correlated binary data. Scandinavian Journal of Statistics 29,

101–110.Preisser, J. S., Lohman, K. L., and Rathouz, P. J. (2002). Performance of weighted estimating equations for long-

itudinal binary data with drop-outs missing at random. Statistics in Medicine 21, 3035–3054.Schott, J. R. (1997). Matrix analysis for statistics. New York: John Wiley Sons, Inc.Sen, P. K. and Singer, J. M. (1993). Large Sample Methods in Statistics. London: Chapman and Hall.Tsiatis, A. A. (1980). A note on a goodness-of-fit test for the logistic regression model. Biometrika 67, 250–251.



Documents

Modified GEE and Goodness of the Marginal Fit (GOMF) Test with Correlated Binary Responses for Contingency Tables