
Applied Mathematical Sciences, Vol. 5, 2011, no. 5, 213 - 222

Using Generalized Poisson Log Linear Regression Models in Analyzing Two-Way Contingency Tables

Naser A. Rashwan and Maie M. Kamel

Dept. of Statistics & Mathematics, Faculty of Comm., Tanta University

[email protected]

Abstract

The generalized Poisson log-linear regression model is one of the most important models used for categorical (qualitative) data. We assume a probability distribution for the collected data, so the log-linear model is treated as a hypothesis, and the estimates are obtained under the hypothesis that the probability model is correct. We then compare the estimates with the observed data to evaluate the log-linear model.

Keywords: Poisson Regression (PR), Generalized Poisson Regression (GPR), Categorical Data, Contingency Tables, Over-dispersion, Hepatocellular Carcinomas (HCC), Maximum Likelihood, Pearson Chi-Square

1- Introduction

When the response or dependent variable is count data (taking nonnegative integer values 0, 1, 2, ...), it is not appropriate to use the linear model based on the normal distribution to describe the relationship between the response variable and a set of predictor variables, and the logistic regression model cannot be used because the response variable is not binary (0, 1). In this case the Poisson regression model is the popular tool (Cameron and Trivedi, 1998).

The Poisson regression (PR) model is often applied to study the occurrence of a small number of counts or events as a function of a set of predictor variables (Parodi and Bottarelli, 2006). The PR model has been applied in many disciplines, including medicine, economics, biology and demography. Cameron et al. (1988) used the PR model in health demand studies to model data on the number of times that individuals consume a health service and to estimate the impact of health status and health insurance. Rose (1990) applied this model to estimate the number of accidents experienced by an airline over some period. Winkelmann (1995) applied this model in fertility studies.

Gardner et al. (1995) applied the PR model in biomedical studies, including epidemiology, to investigate the occurrence of selected diseases in exposed and unexposed subjects in experimental and observational studies. Travier et al. (2003) used the PR model to study cancer incidence among male Swedish veterinarians and other workers of the veterinary industry. Roche and Berry (2006) also applied this model to study the periparturient climatic, animal and management factors influencing the incidence of milk fever in cows in grazing systems.

The Poisson regression model assumes that the mean and the variance of the response variable are equal, but in practice the observed variance of the data may be larger or smaller than the corresponding mean.

In these cases the data exhibit over-dispersion if the variance is larger than the mean, or under-dispersion if the variance is smaller than the mean. For such situations the PR model is not appropriate, and the appropriate model is the Generalized Poisson Regression (GPR) model.

The GPR model proposed by Consul and Famoye (1992) and Famoye (1993) is used to model count data that are affected by a number of known predictor variables; the model is based on the generalized Poisson distribution. This model has been used to model a household fertility data set (Wang and Famoye, 1997) and injury data (Wulu et al., 2002), and a number of studies have applied the GPR model to deal with over-dispersion (Breslow 1990, Famoye 1993, and Famoye et al. 2004).

Because the Poisson regression model and the generalized Poisson regression model are still used less often in applications than other regression models, the basic objective of this paper is to highlight these models and to compare them through an application.

2 – Poisson regression model

Let $Y_i$ be a random variable that takes nonnegative integer values, $i = 1, 2, \ldots, n$, where $n$ is the number of observations. Suppose $Y_i$ follows a Poisson distribution with the probability density function:

$$f(y_i; \lambda_i) = P(Y_i = y_i) = \frac{\lambda_i^{y_i} \exp(-\lambda_i)}{y_i!}, \qquad y_i = 0, 1, 2, \ldots \tag{1}$$

The mean and the variance are equal:

$$E(Y_i) = \operatorname{Var}(Y_i) = \lambda_i \tag{2}$$

where $\lambda_i = \exp(X_i'\beta)$, $X_i$ is the $i$th row of the covariate matrix, and $\beta = (\beta_1, \beta_2, \ldots, \beta_k)'$ is an unknown $k$-dimensional vector of regression parameters. The mean of $Y_i$ is given by $E(Y_i \mid X_i)$ and the variance of $Y_i$ by $\operatorname{Var}(Y_i \mid X_i)$.

The parameters $\beta$ can be estimated by the method of maximum likelihood, using the likelihood function:


$$L(\beta) = \prod_{i=1}^{n} \frac{\lambda_i^{y_i} \exp(-\lambda_i)}{y_i!} \tag{3}$$

The log-likelihood function is given by:

$$\ln L(\beta) = \sum_{i=1}^{n} \left[ y_i \ln \lambda_i - \lambda_i - \ln y_i! \right] = \sum_{i=1}^{n} \left[ y_i X_i'\beta - \exp(X_i'\beta) - \ln y_i! \right] \tag{4}$$

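Differentiating equation (4) with respect to $\beta$ and setting the derivatives to zero gives:

$$\frac{\partial \ln L(\beta)}{\partial \beta_j} = \sum_{i=1}^{n} \left( y_i - \exp(X_i'\beta) \right) x_{ij} = 0, \qquad j = 1, 2, \ldots, k \tag{5}$$

where $x_{ij}$ is the $j$th element of $X_i$.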

This yields $k$ nonlinear equations, which are solved for the parameter estimates by the Newton-Raphson method or by the iteratively weighted least squares procedure.

3 – The generalized Poisson regression model

Suppose $Y_i$ is a count response variable that follows a generalized Poisson distribution. The probability density function of $Y_i$, $i = 1, 2, \ldots, n$, is given by (Famoye 1993, Wang and Famoye 1997):

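$$f(y_i; \lambda_i, \alpha) = P(Y_i = y_i) = \left( \frac{\lambda_i}{1 + \alpha\lambda_i} \right)^{y_i} \frac{(1 + \alpha y_i)^{y_i - 1}}{y_i!} \exp\left[ -\frac{\lambda_i (1 + \alpha y_i)}{1 + \alpha\lambda_i} \right], \qquad y_i = 0, 1, 2, \ldots \tag{6}$$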

Then the mean and variance of $Y_i$ are given by:

$$E(Y_i \mid x_i) = \lambda_i, \qquad \operatorname{Var}(Y_i \mid x_i) = \lambda_i (1 + \alpha\lambda_i)^2 \tag{7}$$

where $\alpha$ is called the dispersion parameter. The GP distribution is a natural extension of the Poisson distribution: when $\alpha = 0$, the probability density function in (6) reduces to the probability density function in (1), so that the mean is equal to the variance; this is the case of equi-dispersion. In practical applications this assumption is often untrue, since the variance can be either larger or smaller than the mean. If the variance is not equal to the mean, the estimates in the PR model are still consistent but are inefficient, which leads to the invalidation of inference based on the estimated standard errors (Singh et al., 2000).

When $\alpha > 0$, the variance is larger than the mean, and the GPR model represents count data with over-dispersion; when $\alpha < 0$, the variance is smaller than the mean, and the GPR model represents count data with under-dispersion.
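The role of the dispersion parameter in equations (6) and (7) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names `gp_pmf` and `gp_moments` and the truncation point `y_max` are assumptions made for illustration. It evaluates the density in (6) on the log scale and sums it to approximate the mean and variance, and it is intended for $\alpha \ge 0$.

```python
import math

def gp_pmf(y, lam, alpha):
    """Generalized Poisson probability of the count y, following equation (6)."""
    if 1.0 + alpha * y <= 0.0:
        return 0.0  # outside the support, which can only happen when alpha < 0
    theta = lam / (1.0 + alpha * lam)
    log_p = (y * math.log(theta)
             + (y - 1) * math.log(1.0 + alpha * y)
             - theta * (1.0 + alpha * y)
             - math.lgamma(y + 1))  # lgamma(y + 1) = ln(y!)
    return math.exp(log_p)

def gp_moments(lam, alpha, y_max=400):
    """Approximate mean and variance by summing over y = 0..y_max (hypothetical cut-off)."""
    probs = [gp_pmf(y, lam, alpha) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# alpha = 0 gives the Poisson case (mean = variance);
# alpha > 0 inflates the variance to lam * (1 + alpha * lam)**2, as in (7).
for alpha in (0.0, 0.1):
    mean, var = gp_moments(3.0, alpha)
    print(alpha, round(mean, 4), round(var, 4), round(3.0 * (1 + alpha * 3.0) ** 2, 4))
```

With $\lambda = 3$ the summed variance agrees with $\lambda(1+\alpha\lambda)^2$ for both values of $\alpha$, which is the equi-dispersion versus over-dispersion contrast described above.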

The estimates of $\alpha$ and $\beta$ in the GPR model are obtained using the method of maximum likelihood. The log-likelihood function of the GPR model is given by:


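$$\ln L(\alpha, \beta) = \sum_{i=1}^{n} \left[ y_i \ln\left( \frac{\lambda_i}{1 + \alpha\lambda_i} \right) + (y_i - 1) \ln(1 + \alpha y_i) - \frac{\lambda_i (1 + \alpha y_i)}{1 + \alpha\lambda_i} - \ln y_i! \right] \tag{8}$$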

By finding the partial derivatives with respect to α and β , the likelihood equations are given by:

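$$\frac{\partial \ln L(\alpha, \beta)}{\partial \beta} = \sum_{i=1}^{n} \left[ \frac{y_i x_i}{1 + \alpha\lambda_i} - \frac{\lambda_i (1 + \alpha y_i) x_i}{(1 + \alpha\lambda_i)^2} \right] = \sum_{i=1}^{n} \frac{(y_i - \lambda_i) x_i}{(1 + \alpha\lambda_i)^2} = 0 \tag{9}$$

$$\frac{\partial \ln L(\alpha, \beta)}{\partial \alpha} = \sum_{i=1}^{n} \left[ -\frac{y_i \lambda_i}{1 + \alpha\lambda_i} + \frac{y_i (y_i - 1)}{1 + \alpha y_i} - \frac{\lambda_i (y_i - \lambda_i)}{(1 + \alpha\lambda_i)^2} \right] = 0 \tag{10}$$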

The parameters $\alpha$ and $\beta$ are estimated by the Newton-Raphson method. Alternatively, $\alpha$ can be estimated by the method of moments: $\alpha$ is obtained by equating the Pearson chi-square statistic to its $(n - k)$ degrees of freedom, as suggested by Breslow (1995):

$$\sum_{i=1}^{n} \frac{(y_i - \lambda_i)^2}{\lambda_i (1 + \alpha\lambda_i)^2} = n - k \tag{11}$$

where $n$ denotes the number of observations and $k$ the number of regression parameters.
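Equation (11) has no closed-form solution for $\alpha$ in general, but its left-hand side decreases as $\alpha$ grows (for $\alpha \ge 0$), so a simple bisection is enough. The sketch below is an illustration under stated assumptions: the names `pearson_stat` and `moment_alpha`, the fitted means `lam`, and the toy data are hypothetical, and only the over-dispersed case $\alpha \ge 0$ is handled.

```python
def pearson_stat(y, lam, alpha):
    """Left-hand side of equation (11) for given fitted means lam and dispersion alpha."""
    return sum((yi - li) ** 2 / (li * (1.0 + alpha * li) ** 2)
               for yi, li in zip(y, lam))

def moment_alpha(y, lam, k, tol=1e-10):
    """Solve equation (11) for alpha >= 0 by bisection (sketch, over-dispersion only)."""
    target = len(y) - k
    if target <= 0:
        raise ValueError("need more observations than regression parameters")
    if pearson_stat(y, lam, 0.0) <= target:
        return 0.0  # no over-dispersion by this criterion
    lo, hi = 0.0, 1.0
    while pearson_stat(y, lam, hi) > target:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pearson_stat(y, lam, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical counts and fitted means, for illustration only.
y_toy = [0, 5, 12, 1, 9, 20]
lam_toy = [2.0, 4.0, 6.0, 3.0, 5.0, 8.0]
alpha_hat = moment_alpha(y_toy, lam_toy, k=2)
print(alpha_hat, pearson_stat(y_toy, lam_toy, alpha_hat))
```

In practice the fitted means depend on $\beta$, so this step would be alternated with re-estimating $\beta$; the sketch treats the means as fixed.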

4 – Goodness of fit test

A measure of goodness of fit of these models may be based on the log-likelihood statistic: the regression model with the larger log-likelihood value is better than the one with the smaller log-likelihood value.

The GPR model reduces to the PR model when $\alpha = 0$. To test the adequacy of the GPR model over the PR model, one may test the hypothesis:

$$H_0: \alpha = 0 \quad \text{vs} \quad H_1: \alpha \neq 0 \tag{12}$$

The test of $H_0$ in (12) is a test of the significance of the dispersion parameter; when $H_0$ is rejected, it is recommended to use the GPR model in place of the PR model. To carry out the test in (12), one can use the asymptotically normal Wald-type (t) statistic, defined as the ratio of the estimate of $\alpha$ to its standard error. An alternative test for the hypothesis in (12) is the likelihood ratio test, with statistic $T = 2(L_1 - L_0)$, where $L_1$ and $L_0$ are the models' log-likelihoods under the respective hypotheses. Under $H_0$, $T$ has approximately a chi-square distribution with one degree of freedom (Wang and Famoye 1997, Famoye et al. 2004).
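Both tests of (12) are easy to compute once the two models are fitted. The following minimal sketch assumes hypothetical inputs (the function names and the numeric values are illustrations, not results from this paper). It uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal, so both p-values can be obtained from the complementary error function.

```python
import math

def lr_test(loglik_gpr, loglik_pr):
    """Likelihood ratio statistic T = 2(L1 - L0) and its chi-square(1) p-value."""
    t = max(2.0 * (loglik_gpr - loglik_pr), 0.0)
    p_value = math.erfc(math.sqrt(t / 2.0))  # P(chi-square with 1 df > t)
    return t, p_value

def wald_test(alpha_hat, std_error):
    """Wald-type statistic alpha_hat / se and its two-sided normal p-value."""
    z = alpha_hat / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical log-likelihoods and estimate, for illustration only.
print(lr_test(-100.0, -101.92))
print(wald_test(1.96, 1.0))
```

A small p-value from either function is read as evidence against $H_0: \alpha = 0$, that is, in favour of the GPR model.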

Two other procedures can also be used to measure goodness of fit. The first is based on the Pearson chi-square, which is equal to

$$\sum_{i=1}^{n} \frac{(y_i - \lambda_i)^2}{\operatorname{Var}(y_i \mid x_i)};$$

it has approximately a chi-square distribution with $(n - k)$ degrees of freedom. The second is based on the deviance statistic. The deviance is equal to $2\left( L(y_i; y) - L(\lambda_i; y) \right)$, where $L(y_i; y)$ and $L(\lambda_i; y)$ are the log-likelihoods evaluated under $y$ and $\lambda$ respectively. The deviance also has approximately a chi-square distribution with $(n - k)$ degrees of freedom. Thus, if the values of both the Pearson chi-square and the deviance are close to the degrees of freedom $(n - k)$, the model may be adequate.
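As a concrete reading of these two statistics, the sketch below computes the Pearson chi-square with the variance function from (7) and the deviance for the Poisson case, where $2(L(y; y) - L(\lambda; y))$ simplifies to $2\sum [y_i \ln(y_i/\lambda_i) - (y_i - \lambda_i)]$. The function names and the toy inputs are assumptions made for illustration.

```python
import math

def pearson_chi2(y, mu, alpha=0.0):
    """Pearson chi-square with Var(y_i) = mu_i * (1 + alpha * mu_i)**2 (alpha = 0: Poisson)."""
    return sum((yi - mi) ** 2 / (mi * (1.0 + alpha * mi) ** 2)
               for yi, mi in zip(y, mu))

def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum[y ln(y/mu) - (y - mu)], with y ln(y) taken as 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

# Hypothetical observed counts and fitted means.
y_obs = [0, 3]
mu_fit = [1.0, 2.0]
print(pearson_chi2(y_obs, mu_fit))      # (0-1)^2/1 + (3-2)^2/2 = 1.5
print(poisson_deviance(y_obs, mu_fit))  # 2 * 3 * ln(1.5)
```

Dividing either statistic by $(n - k)$ gives the Value/df ratio that is compared with 1 when judging adequacy.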

5 – Numerical Study

The data come from the American-Egyptian center, for a sample of 456 patients from the National Cancer Institute (NCI), Cairo University. Table (1) presents the data used in this study.

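| Status \ Age | 18-29 | 30-39 | 40-49 | 50-59 | 60 & more |
|---|---|---|---|---|---|
| urban | 2 | 2 | 14 | 31 | 23 |
| Pop-urban | 2030127 | 1998864 | 1757254 | 1315568 | 732603 |
| rural | 2 | 15 | 73 | 153 | 144 |
| Pop-rural | 2681238 | 2639949 | 2320847 | 1737503 | 967567 |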

Variables in the analysis

1. The response variable y is the number of disease cases in each cell of the table, with two predictor variables X1 and X2.

2. The first predictor variable X1 is the age variable. It is separated into five levels taking the values 1 to 5 as follows:
   1: from 18 to 29 years
   2: from 30 to 39 years
   3: from 40 to 49 years
   4: from 50 to 59 years
   5: 60 years and more

3. The second predictor variable X2 is the environment variable, a binary variable taking 1 for urban and 2 for rural.

4. The population is used as an offset variable.

Using SPSS 16: the first step in analyzing the data is to recognize the data characteristics. We compute the mean and the variance of the data; Table (2) presents the mean and the variance (descriptive statistics).

Table (2) Descriptive Statistics

| | N | Mean | Variance |
|---|---|---|---|
| no | 10 | 45.9000 | 3.370E3 |
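The figures in Table (2) can be reproduced directly from the ten cell counts of Table (1). The short sketch below uses only the Python standard library; the variable names are illustrative.

```python
from statistics import mean, variance

# Cell counts from Table (1): urban row followed by rural row.
counts = [2, 2, 14, 31, 23, 2, 15, 73, 153, 144]

m = mean(counts)      # 45.9
v = variance(counts)  # sample variance (divisor n - 1), about 3369.88
print(len(counts), m, round(v, 2), round(v / m, 1))
```

The sample variance is roughly 73 times the mean, which is the raw comparison the text relies on; note that it is computed on the unadjusted counts, before age, residence and population size are taken into account.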

Table (2) shows that the variance of the data is greater than the mean. An important characteristic of the Poisson distribution is that the expected mean is equal to the expected variance. So Poisson regression is not appropriate for these data because of over-dispersion, and the generalized Poisson regression model is a better model for the data than the Poisson regression model.

Table (3) presents the goodness of fit. There are three common measures of goodness of fit:
- Deviance, as defined in Section 4
- Pearson chi-square, as defined in Section 4
- The log-likelihood of the fitted model

Table (3) Goodness of Fit

| | Value | df | Value/df |
|---|---|---|---|
| Deviance | 3.434 | 4 | .858 |
| Pearson Chi-Square | 4.420 | 4 | 1.105 |
| Log Likelihood | -25.194 | | |

All measures have approximately chi-square distributions under the hypothesis that the current model is appropriate for the combinations of independent variables. Table (4) presents the likelihood ratio test for the overall fit of the model; this test is significant.

Table (4) Omnibus Test

| Likelihood Ratio Chi-Square | df | Sig. |
|---|---|---|
| 676.728 | 5 | .000 |
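The tables in this section come from the authors' SPSS run. As a rough, independent illustration of the estimation procedure of Section 2, the sketch below fits a plain Poisson log-linear model to the Table (1) counts by iteratively weighted least squares, with the log of the population as offset. Everything about it is an assumption made for illustration: the dummy coding (age 60 and more and rural as reference levels), the untransformed log-population offset, and the helper names `design_row`, `solve` and `fit_poisson`. It is not the generalized Poisson fit and is not expected to reproduce the reported estimates.

```python
import math

cases = {"urban": [2, 2, 14, 31, 23], "rural": [2, 15, 73, 153, 144]}
pop = {"urban": [2030127, 1998864, 1757254, 1315568, 732603],
       "rural": [2681238, 2639949, 2320847, 1737503, 967567]}

def design_row(age, state):
    """Intercept, dummies for age levels 1-4, dummy for urban (assumed coding)."""
    return ([1.0] + [1.0 if age == a else 0.0 for a in range(1, 5)]
            + [1.0 if state == "urban" else 0.0])

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def fit_poisson(X, y, offset, iters=50, tol=1e-10):
    """Iteratively weighted least squares for a Poisson model with log link and offset."""
    k = len(X[0])
    mu = [max(float(yi), 0.5) for yi in y]  # start the fitted means at the data
    beta = [0.0] * k
    for _ in range(iters):
        # Working response (offset removed) and weights mu_i.
        z = [math.log(m) - o + (yi - m) / m for m, o, yi in zip(mu, offset, y)]
        xtwx = [[sum(m * r[a] * r[b] for m, r in zip(mu, X)) for b in range(k)]
                for a in range(k)]
        xtwz = [sum(m * r[a] * zi for m, r, zi in zip(mu, X, z)) for a in range(k)]
        new_beta = solve(xtwx, xtwz)
        mu = [math.exp(o + sum(bj * xj for bj, xj in zip(new_beta, r)))
              for o, r in zip(offset, X)]
        converged = max(abs(nb - ob) for nb, ob in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta, mu

X, y, offset = [], [], []
for state in ("urban", "rural"):
    for age in range(1, 6):
        X.append(design_row(age, state))
        y.append(cases[state][age - 1])
        offset.append(math.log(pop[state][age - 1]))

beta, mu = fit_poisson(X, y, offset)
print([round(b, 3) for b in beta])
print(round(sum(mu), 3), sum(y))  # fitted and observed totals agree at the solution
```

At the solution the score equations (5) hold, so the fitted totals match the observed totals overall and within each age group and residence category, which gives a simple check that the iteration has converged.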

Table (5) presents the tests of the model effects, all of which are significant.

Table (5) Tests of Model Effects

| Source | Type III Likelihood Ratio Chi-Square | df | Sig. |
|---|---|---|---|
| (Intercept) | 7565.536 | 1 | .000 |
| age | 475.608 | 4 | .000 |
| state | 201.120 | 1 | .000 |

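From Table (6) we found that all parameter estimates in the GPR model are significant.

The fitted model has the estimated linear predictor, with coefficients taken from Table (6):

$$X_i'\hat{\beta} = -1.038 - 4.174\,[\text{age}=1] - 2.721\,[\text{age}=2] - 1.032\,[\text{age}=3] - 0.157\,[\text{age}=4] - 1.561\,[\text{state}=1]$$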

Table (6) Parameter Estimates

| Parameter | B | Std. Error | 95% Wald CI Lower | 95% Wald CI Upper | Wald Chi-Square | df | Sig. |
|---|---|---|---|---|---|---|---|
| (Intercept) | -1.038 | .0245 | -1.086 | -.990 | 1796.094 | 1 | .000 |
| [age=1.00] | -4.174 | .4860 | -5.127 | -3.222 | 73.767 | 1 | .000 |
| [age=2.00] | -2.721 | .0617 | -2.842 | -2.600 | 1943.708 | 1 | .000 |
| [age=3.00] | -1.032 | .0277 | -1.086 | -.978 | 1391.500 | 1 | .000 |
| [age=4.00] | -.157 | .0317 | -.219 | -.095 | 24.685 | 1 | .000 |
| [age=5.00] | 0a | . | . | . | . | . | . |
| [state=1.00] | -1.561 | .0587 | -1.676 | -1.446 | 708.329 | 1 | .000 |
| [state=2.00] | 0a | . | . | . | . | . | . |
| Over-dispersion α | 0.1648 | | | | | | 0.000 |

We also found that the over-dispersion parameter α = 0.1648 is significantly different from zero (Sig. = 0.000).

6- Conclusions

This paper introduced the PR and GPR models as appropriate techniques to describe two-way contingency tables (or count data) of a response variable (number of disease cases) as a function of age and state. This study has shown that for over-dispersed data the GPR model is better than the PR model, because the Poisson distribution has the special property that the mean is equal to the variance, whereas over-dispersion means that the variance is greater than the mean. We also found that the 50-59 age group has more cases of Hepatocellular Carcinomas (HCC) than the other age groups.

References

[1] A. C. Cameron and P. K. Trivedi, Regression Analysis of Count Data, Cambridge University Press, New York, 1998.

[2] A. C. Cameron, P. K. Trivedi, F. Milne and J. Piggott, A Micro-econometric Model of the Demand for Health Care and Health Insurance, Review of Economic Studies, 55 (1988), 85 – 106.

[3] F. Famoye, Restricted Generalized Poisson Regression Model, Communications in Statistics – Theory and Methods, 22 (1993), 1335 – 1354.

[4] F. Famoye, J. T. Wulu and K. P. Singh, On the Generalized Poisson Regression with an Application to Accident Data, Journal of Data Science, 2 (2004), 287 – 295.

[5] J. R. Roche and D. P. Berry, Periparturient Climatic, Animal, and Management Factors Influencing the Incidence of Milk Fever in Grazing Systems, Journal of Dairy Science, 89 (2006), 2775 – 2783.

[6] J. T. Wulu, K. P. Singh, F. Famoye and G. McGwin, Regression Analysis of Count Data, Journal of the Indian Society of Agricultural Statistics, 55 (2002), 220 – 231.

[7] K. P. Singh, J. T. Wulu and A. A. Bartolucci, A Note on Generalized Poisson Regression Model, available at [email protected], [email protected], and [email protected] (2001).

[8] N. E. Breslow, Tests of Hypotheses in Over-Dispersed Poisson Regression and other Quasi-Likelihood Models, Journal of the American Statistical Association, 85 (1990), 565 – 571.

[9] N. Rose, Profitability and Product Quality: Economic Determinants of Airline Safety Performance, Journal of Political Economy, 98 (1990), 944 – 964.

[10] N. Travier, G. Gridley, A. Blair, M. Dosemeci and P. Boffetta, Cancer Incidence among Male Swedish Veterinarians and other Workers of the Veterinary Industry: A Record-Linkage Study, Cancer Causes and Control, 14 (2003), 587 – 593.

[11] P. C. Consul and F. Famoye, Generalized Poisson Regression Model, Communications in Statistics – Theory and Methods, 21 (1992), 89 – 109.

[12] R. Winkelmann, Duration Dependence and Dispersion in Count Data Models, Journal of Business and Economic Statistics, 13 (1995), 467 – 474.

[13] S. Ezzat, M. Abdel-Hamid, S. Eissa, N. Mokhtar, N. Labib, L. El-Ghorory, N. Mikhail, A. Abdel-Hamid, T. Hifnawy, G. Strickland, A. Christopher Loffredo, Associations of Pesticides, HCV, HBV, and Hepatocellular Carcinoma in Egypt, Int. J. Hyg. Environ. Health, 208 (2005), 329 – 339.

[14] S. Parodi and E. Bottarelli, Poisson Regression Model in Epidemiology: An Introduction, Ann. Fac. Medic. Vet. di Parma, XXVI (2006), 25 – 44.

[15] W. Gardner, E. P. Mulvey and E. C. Shaw, Regression Analysis of Counts and Rates: Poisson, Overdispersed Poisson, and Negative Binomial Models, Psychological Bulletin, 118 (1995), 392 – 404.

[16] W. Wang and F. Famoye, Modeling Household Fertility Decisions with Generalized Poisson Regression, Journal of Population Economics, 10 (1997), 273 – 283.

Received: August, 2010