Multilevel Modeling- Logistic Raul Cruz-Cano, HLTH653 Spring 2013


Page 1: Multilevel Modeling-Logistic

Multilevel Modeling-Logistic

Raul Cruz-Cano, HLTH653 Spring 2013

Page 2: Multilevel Modeling-Logistic

Schedule

3/18/2013 = Spring Break
3/25/2013 = Longitudinal Analysis
4/1/2013 = Midterm (Exercises 1-5, not Longitudinal)

Page 3: Multilevel Modeling-Logistic

Introduction

Just as with linear regression, logistic regression allows you to look at the effect of multiple predictors on an outcome.

Consider the following example: 15- and 16-year-old adolescents were asked if they have ever had sexual intercourse.

The outcome of interest is intercourse. The predictors are race (white and black) and gender (male and female).

Example from Agresti, A. Categorical Data Analysis, 2nd ed. 2002.


Page 4: Multilevel Modeling-Logistic

Here is a table of the data:

                    Intercourse
Race    Gender      Yes     No
White   Male         43    134
White   Female       26    149
Black   Male         29     23
Black   Female       22     36

Page 5: Multilevel Modeling-Logistic

Data Set Intercourse

DATA intercourse;
   INPUT white male intercourse count;
   DATALINES;
1 1 1 43
1 1 0 134
1 0 1 26
1 0 0 149
0 1 1 29
0 1 0 23
0 0 1 22
0 0 0 36
;
RUN;
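Before modeling, it can help to confirm that the data step reproduces the table of counts. The following is a minimal sketch, assuming the intercourse data set was created as shown above; the NOCOL and NOPERCENT options only trim the printed output and are a choice made here, not part of the original slides.

```sas
/* Sketch: rebuild the race-by-gender table from the cell counts. */
PROC FREQ DATA = intercourse;
   WEIGHT count;                 /* each row stands for `count` adolescents */
   TABLES white*male*intercourse / NOCOL NOPERCENT;
RUN;
```

The output should show one male-by-intercourse table for each value of white, with cell counts matching the table on the previous slide.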

Page 6: Multilevel Modeling-Logistic

SAS:

“descending” models the probability that intercourse = 1 (yes) rather than = 0 (no).

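“rsquare” requests a generalized R2 value from SAS (together with a max-rescaled version); it plays a role similar to the R2 from linear regression, but it is not a literal proportion of variance explained.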

“lackfit” requests the Hosmer and Lemeshow Goodness-of-Fit Test. This tells you if the model you have created is a good fit for the data.

PROC LOGISTIC DATA = intercourse descending;
   weight count;
   MODEL intercourse = white male / rsquare lackfit;
RUN;
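The code above passes the cell counts through the WEIGHT statement. A common alternative for grouped data is the FREQ statement, which makes SAS treat each row as count separate observations. The sketch below is an illustration only and is not part of the original slides; the coefficient estimates should agree with the WEIGHT version, but statistics that depend on the number of observations may differ.

```sas
/* Sketch: same model, but counts enter as frequencies rather than weights. */
PROC LOGISTIC DATA = intercourse DESCENDING;
   FREQ count;                   /* a row with count = 43 is read as 43 observations */
   MODEL intercourse = white male / RSQUARE LACKFIT;
RUN;
```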

Page 7: Multilevel Modeling-Logistic

SAS Output: R2


Page 8: Multilevel Modeling-Logistic

Interpreting the R2 value

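The reported R2 value is 0.9907. Taken at face value, this would mean that 99.07% of the variability in our outcome (intercourse) is explained by including gender and race in our model. Treat this figure with caution: the R2 from PROC LOGISTIC is a generalized R2, not a literal share of variance, and because the counts enter through WEIGHT rather than FREQ, SAS likely computes it from the 8 data rows rather than the 462 adolescents, which inflates it.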
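For reference, the generalized R2 has the standard form below, where L_0 and L_1 are the likelihoods of the intercept-only and fitted models and n is the number of observations SAS counts. The last two lines are a back-of-the-envelope check under the assumption that SAS used n = 8 (the number of data rows); the likelihood-ratio statistic G is backed out from the reported 0.9907, not read from any output.

```latex
\begin{align*}
R^2 &= 1 - \left(\frac{L_0}{L_1}\right)^{2/n}
     = 1 - \exp\!\left(-\frac{G}{n}\right),
     \qquad G = -2\log\frac{L_0}{L_1} \\
% Assumption: n = 8 rows under WEIGHT
G &= -8\,\ln(1 - 0.9907) \approx 37.4 \\
% Same G, but with n = 462 individuals
R^2 &\approx 1 - \exp\!\left(-\frac{37.4}{462}\right) \approx 0.078
\end{align*}
```

Under that assumption, the same fit would correspond to a generalized R2 of roughly 0.08 when every adolescent is counted as an observation.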

Page 9: Multilevel Modeling-Logistic

PROC LOGISTIC Output

The odds of having had intercourse are 1.911 times as high for males as for females, controlling for race.
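The reported odds ratio is the exponentiated coefficient of male. As a worked check, the block below backs out that coefficient and compares it with the crude (unadjusted) odds ratio computed from the data table by pooling over race. The symbol \hat\beta_{male} is notation introduced here.

```latex
\begin{align*}
\widehat{OR}_{male} &= e^{\hat\beta_{male}} = 1.911
  \quad\Rightarrow\quad \hat\beta_{male} = \ln(1.911) \approx 0.648 \\
\text{crude } OR &= \frac{(43+29)/(134+23)}{(26+22)/(149+36)}
  = \frac{72 \times 185}{157 \times 48} \approx 1.77
\end{align*}
```

The adjusted value (1.911) differs from the crude value (about 1.77) because the model holds race fixed.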

Page 10: Multilevel Modeling-Logistic

Hosmer and Lemeshow GOF Test

Page 11: Multilevel Modeling-Logistic

H-L GOF Test

The Hosmer and Lemeshow Goodness-of-Fit Test tests the hypotheses: Ho: the model is a good fit, vs. Ha: the model is NOT a good fit.

With this test, we want to FAIL to reject the null hypothesis, because that means our model is a good fit (this is different from most of the hypothesis testing you have seen).

Look for a p-value > 0.10 in the H-L GOF test. This indicates the model is a good fit.

In this case, the p-value = 0.2419, so we do NOT reject the null hypothesis, and we conclude the model is a good fit.
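For reference, the Hosmer and Lemeshow statistic groups observations by predicted probability and compares observed with expected event counts. In the standard form below, g is the number of groups, n_k the number of observations in group k, O_k the observed number of events, and \bar{\pi}_k the average predicted probability in that group; these symbols are notation introduced here, not taken from the slides.

```latex
\hat{C} = \sum_{k=1}^{g}
  \frac{\left(O_k - n_k\bar{\pi}_k\right)^2}{n_k\,\bar{\pi}_k\left(1 - \bar{\pi}_k\right)}
\;\sim\; \chi^2_{g-2} \quad \text{(approximately, under } H_0\text{)}
```

A large p-value means the observed and expected counts agree closely, which is why failing to reject is the desired result.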

Page 12: Multilevel Modeling-Logistic

Model Selection in SAS

Often, if you have multiple predictors and interactions in your model, SAS can systematically select significant predictors using forward selection, backwards selection, or stepwise selection.

In forward selection, SAS starts with no predictors in the model. It then selects the predictor with the smallest p-value and adds it to the model. Next, it selects the remaining predictor with the smallest p-value and adds it to the model. It continues doing this until no remaining predictor has a p-value less than 0.05.

In backwards selection, SAS starts with all of the predictors in the model and eliminates the non-significant predictors one at a time, refitting the model after each elimination. It stops once all the predictors remaining in the model are statistically significant.

Page 13: Multilevel Modeling-Logistic

Forward Selection in SAS

We will let SAS select a model for us out of the three predictors: white, male, white*male. Type the following code into SAS:

PROC LOGISTIC DATA = intercourse descending;
   weight count;
   MODEL intercourse = white male white*male / selection = forward lackfit;
RUN;
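The backwards selection described earlier can be requested the same way. The following is a minimal sketch, not part of the original slides; the SLSTAY= value of 0.05 is written out explicitly here so the stay criterion is visible.

```sas
/* Sketch: backward elimination over the same three candidate terms. */
PROC LOGISTIC DATA = intercourse DESCENDING;
   WEIGHT count;
   MODEL intercourse = white male white*male
         / SELECTION = BACKWARD SLSTAY = 0.05 LACKFIT;
RUN;
```

Replacing BACKWARD with STEPWISE would request stepwise selection, in which terms can be both added and removed.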

Page 14: Multilevel Modeling-Logistic

Output from Forward Selection: “white” is added to the model

Page 15: Multilevel Modeling-Logistic

“male” is added to the model

Page 16: Multilevel Modeling-Logistic

No more predictors are found to be statistically significant

Page 17: Multilevel Modeling-Logistic

The Final Model:

Page 18: Multilevel Modeling-Logistic

Hosmer and Lemeshow GOF Test: The model is a good fit

Page 19: Multilevel Modeling-Logistic

Multilevel Modeling (refresher)

Multi-level modeling takes into account the hierarchical structure of the data (e.g. decedents clustered within occupations as in our data).

Such data structure is subject to intra-class correlation, whereby individuals within the same group are more alike than individuals across groups.

Analysis that ignores this intra-class correlation may underestimate the standard error of the regression coefficient of the aggregate risk factor, leading to overestimation of the significance of the risk factor.
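A common way to quantify this effect is the design effect. The sketch below uses standard formulas with an illustrative group size m = 20 and intra-class correlation \rho = 0.05; both numbers are assumptions chosen for the arithmetic, not values from the data. For a random-intercept logistic model, \rho is often computed on the latent scale, with \sigma_u^2 the between-group variance.

```latex
\begin{align*}
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3} \\
\text{deff} &= 1 + (m - 1)\rho = 1 + 19 \times 0.05 = 1.95 \\
\text{SE}_{\text{clustered}} &\approx \sqrt{1.95}\;\text{SE}_{\text{naive}}
  \approx 1.40\;\text{SE}_{\text{naive}}
\end{align*}
```

Even a small intra-class correlation can therefore make the naive standard error of a group-level risk factor noticeably too small.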

To illustrate the above point, we conducted our analysis using two approaches


Page 20: Multilevel Modeling-Logistic

1st Approach

Fit a multiple logistic regression model on the combined data with PROC LOGISTIC. The dependent variable is death from injury (yes/no); the risk factor of interest is exposure to hazardous equipment at work (high/low); the confounders included are gender, race (white/black/other), age (continuous, centered) and a quadratic term for age.

This model ignores the hierarchical structure of the data and treats aggregate exposure as if it were measured at the individual level. The model is expressed by the following equation, where the Greek letters are regression coefficients:

$$\operatorname{logit}(p_{ij}) = \log\frac{p_{ij}}{1 - p_{ij}} = \alpha + \beta\,\mathrm{Exposure}_{i} + \gamma\,\mathrm{Gender}_{ij} + \tau\,\mathrm{Race}_{ij} + \delta_1\,\mathrm{Age}_{ij} + \delta_2\,\mathrm{Age}_{ij}^{2}$$

Page 21: Multilevel Modeling-Logistic

1st Approach

p_ij is the expected probability of death from injury for the jth individual of the ith occupation, conditional on the predictor variables.

$$\operatorname{logit}(p_{ij}) = \log\frac{p_{ij}}{1 - p_{ij}} = \alpha + \beta\,\mathrm{Exposure}_{i} + \gamma\,\mathrm{Gender}_{ij} + \tau\,\mathrm{Race}_{ij} + \delta_1\,\mathrm{Age}_{ij} + \delta_2\,\mathrm{Age}_{ij}^{2}$$

proc logistic data=noms.combined descending;
   class exposure gender race;
   model injury = exposure gender race age age*age;
run;
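For contrast, a model that respects the clustering adds a random intercept for occupation. The following is only a sketch of one way to write it with PROC GLIMMIX, the procedure named in the Li, Alterman, and Deddens reference. The variable name occupation and the coding injury = 1 for a death from injury are assumptions, not taken from the slides.

```sas
/* Sketch: random-intercept logistic model. `occupation` is a hypothetical
   cluster identifier, and injury = 1 is assumed to mean death from injury. */
proc glimmix data=noms.combined;
   class occupation exposure gender race;
   model injury(event='1') = exposure gender race age age*age
         / dist=binary link=logit solution;
   random intercept / subject=occupation;   /* one intercept per occupation */
run;
```

Comparing the standard error of the exposure coefficient between this model and the PROC LOGISTIC fit is one way to see the effect of ignoring intra-class correlation.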

Page 22: Multilevel Modeling-Logistic

Multilevel Example (Allison, 2006)

The sample consists of 1151 girls from the National Longitudinal Survey of Youth who were interviewed annually for nine years, beginning in 1979. For this example, we'll only use data from years 1 through 5 (five observations per girl).

The response variable POV is recorded in each of the years and has a value of 1 if the girl's household was in poverty (as defined by U.S. federal standards) in that year, otherwise 0.

The predictor variables are:
AGE: Age in years at the first interview
BLACK: 1 if respondent is black, otherwise 0
MOTHER: 1 if respondent currently had at least one child, otherwise 0
SPOUSE: 1 if respondent is currently living with a spouse, otherwise 0
INSCHOOL: 1 if respondent is currently enrolled in school, otherwise 0
HOURS: Hours worked during the week of the survey

Page 23: Multilevel Modeling-Logistic

Multilevel Example

5755 observations, five for each of the 1151 girls.

The CLASS statement declares YEAR to be a categorical variable, with the highest year (year 5) being the reference category.

The STRATA statement says that each girl is a separate stratum, which has the consequence of grouping together the five observations for each girl in the process of constructing the likelihood function.

PROC LOGISTIC DATA=teenyrs5 DESC;
   CLASS year;
   MODEL pov = year mother spouse inschool hours;
   STRATA id;
RUN;

In PROC LOGISTIC there is no CLUSTER, just CLASS and STRATA.
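The model that STRATA fits can be sketched as follows. This is the usual fixed-effects (conditional) logistic formulation; the symbols \mu_t, \beta, x_{it}, and \alpha_i are notation introduced here. Each girl i has her own intercept \alpha_i, and conditioning on her total number of poverty years removes \alpha_i from the likelihood.

```latex
\log\frac{p_{it}}{1 - p_{it}} = \mu_t + \beta^{\top} x_{it} + \alpha_i,
\qquad i = 1,\dots,1151,\quad t = 1,\dots,5
```

Because \alpha_i absorbs everything about girl i that does not change over time, predictors that are constant within a girl (such as AGE at first interview and BLACK) cannot be estimated as main effects, which is consistent with their absence from the MODEL statement.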

Page 24: Multilevel Modeling-Logistic

Multilevel Example

In the “Analysis of Maximum Likelihood Estimates” panel, we see that motherhood and school enrollment increase the risk of poverty, while living with a husband and working more hours reduce the risk.

The last panel gives the odds ratios. We see that motherhood increases the odds of poverty by an estimated 79 percent. Living with a husband cuts the odds approximately in half. Each additional hour of employment per week reduces the odds by about 2 percent. Keep in mind that these estimates control for all stable characteristics of the girls, including such things as race, intelligence, place of birth and parents' education.
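Each percentage quoted above is (OR - 1) x 100. The block below restates that conversion and, as an illustration, compounds the per-hour effect over 10 hours. The odds ratio 0.98 is the rounded value implied by "about 2 percent", so the last line is approximate.

```latex
\begin{align*}
\%\ \text{change in odds} &= (OR - 1) \times 100 \\
OR_{mother} \approx 1.79 &\;\Rightarrow\; +79\% \\
OR_{hours} \approx 0.98 &\;\Rightarrow\; -2\%\ \text{per hour} \\
OR_{10\ \text{hours}} \approx 0.98^{10} \approx 0.82 &\;\Rightarrow\; \text{about } -18\%
\end{align*}
```

Effects compound multiplicatively on the odds scale, so ten extra hours do not reduce the odds by a full 20 percent.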

Page 25: Multilevel Modeling-Logistic

Multilevel Example

The next model, for example, includes the interaction between MOTHER and BLACK.

PROC LOGISTIC DATA=teenyrs5 DESC;
   CLASS year;
   MODEL pov = year mother spouse inschool hours mother*black;
   STRATA id;
RUN;

Page 26: Multilevel Modeling-Logistic

Multilevel Example

The interaction is statistically significant at the .05 level.

For nonblack girls, the effect of motherhood is to increase the odds of poverty by a factor of exp(.9821) = 2.67.

For black girls, on the other hand, the effect of motherhood is to increase the odds of poverty by a factor of exp(.9821 - .5989) = 1.47.

Thus, motherhood has a larger effect on poverty status among nonblack girls than among black girls.
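The two factors differ only by the interaction coefficient, so their ratio is its exponential. A short check using the estimates quoted above (the \beta symbols are notation introduced here):

```latex
\begin{align*}
OR_{\text{nonblack}} &= e^{\beta_{mother}} = e^{0.9821} \approx 2.67 \\
OR_{\text{black}} &= e^{\beta_{mother} + \beta_{mother \times black}}
  = e^{0.9821 - 0.5989} = e^{0.3832} \approx 1.47 \\
\frac{OR_{\text{black}}}{OR_{\text{nonblack}}} &= e^{-0.5989} \approx 0.55
\end{align*}
```

In words, the multiplicative effect of motherhood on the odds of poverty for black girls is a little more than half of what it is for nonblack girls.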

Page 27: Multilevel Modeling-Logistic

SAS Weighted Example

A random sample of 300 students is drawn from each of the four classes: freshman, sophomore, junior, and senior.

proc format;
   value Design 1='A' 2='B' 3='C';
   value Rating 1='dislike very much'
                2='dislike'
                3='neutral'
                4='like'
                5='like very much';
   value Class 1='Freshman' 2='Sophomore' 3='Junior' 4='Senior';
run;

data Enrollment;
   format Class Class.;
   input Class _TOTAL_;
   datalines;
1 3734
2 3565
3 3903
4 4196
;
run;

data WebSurvey;
   format Class Class. Design Design. Rating Rating.;
   do Class=1 to 4;
      do Design=1 to 3;
         do Rating=1 to 5;
            input Count @@;
            output;
         end;
      end;
   end;
   datalines;
10 34 35 16 15   8 21 23 26 22   5 10 24 30 21
 1 14 25 23 37  11 14 20 34 21  16 19 30 23 12
19 12 26 18 25  11 14 24 33 18  10 18 32 23 17
 8 15 35 30 12  15 22 34  9 20   2 34 30 18 16
;
run;

data WebSurvey;
   set WebSurvey;
   if Class=1 then Weight=3734/300;
   if Class=2 then Weight=3565/300;
   if Class=3 then Weight=3903/300;
   if Class=4 then Weight=4196/300;
run;
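Each weight above is the class enrollment divided by the 300 students sampled from that class. As an alternative to typing the totals twice, the weights could be derived from the Enrollment data set. This is a minimal sketch, not part of the original slides; WebSurveyW is a hypothetical output name.

```sas
/* Sketch: compute Weight = population total / sample size by merging
   in the Enrollment totals. WebSurveyW is a hypothetical output name. */
data WebSurveyW;
   merge WebSurvey Enrollment;   /* both data sets are already in Class order */
   by Class;
   Weight = _TOTAL_ / 300;       /* 300 students sampled per class */
run;
```

For freshmen, for example, this gives 3734/300, or about 12.4 students represented by each respondent.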

Page 28: Multilevel Modeling-Logistic

PROC Logistic

proc logistic data=WebSurvey;
   freq Count;
   class Design;
   model Rating (ref='neutral') = Design;
   weight Weight;
run;

Page 29: Multilevel Modeling-Logistic

PROC surveylogistic

proc surveylogistic data=WebSurvey total=Enrollment;
   freq Count;
   class Design;
   model Rating (ref='neutral') = Design;
   stratum Class;
   weight Weight;
run;

If you want “better” results...

For the ratings of Design B vs. Design C, compare:
1. The point estimate
2. The 95% confidence interval
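One reason the two procedures can give different confidence intervals is that the TOTAL= data set lets PROC SURVEYLOGISTIC apply a finite population correction within each stratum. As a rough illustration, the sampling fractions implied by the enrollment figures are shown below; n_h, N_h, and f_h are notation introduced here.

```latex
f_h = \frac{n_h}{N_h}:\qquad
\frac{300}{3734} \approx 0.080,\quad
\frac{300}{3565} \approx 0.084,\quad
\frac{300}{3903} \approx 0.077,\quad
\frac{300}{4196} \approx 0.071
```

The variance contribution of stratum h is scaled by (1 - f_h), so with roughly 7 to 8 percent of each class sampled, the correction itself is modest.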

Page 30: Multilevel Modeling-Logistic

More to come…

There are also mixed effects logistic models…which will be studied later


Page 31: Multilevel Modeling-Logistic

References

Paul D. Allison, "Fixed Effects Regression Methods in SAS," SUGI 31 Proceedings (2006), paper 184-31.

Jia Li, Toni Alterman, James A. Deddens, "Analysis of Large Hierarchical Data with Multilevel Logistic Modeling Using PROC GLIMMIX in SAS," SUGI 31 Proceedings (2006), paper 151-31.

David L. Cassell, "Wait Wait, Don't Tell Me… You're Using the Wrong Proc!" SUGI 31 Proceedings (2006), paper 193-31.