149
Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536 / Epi 536 Categorical Data Analysis in Epidemiology Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 6: Review of Regression October 14, 2014

Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 1

1

Biost 536 / Epi 536Categorical Data Analysis in

EpidemiologyScott S. Emerson, M.D., Ph.D.

Professor of BiostatisticsUniversity of Washington

Lecture 6:Review of Regression

October 14, 2014

Page 2: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 2

2

Lecture Outline

• General Regression Model

• Confounding and Precision in Regression– Linear regression with adjustment for a single covariate– Logistic regression with adjustment for a single covariate– PH regression with adjustment for a single covariate– Linear regression with adjustment for multiple covariates

Page 3: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 3

3

General Regression Model

Page 4: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 4

4

Regression Models

• According to the parameter compared across groups– Means Linear regression– Geom Means Linear regression on logs– Odds Logistic regression– Rates Poisson regression– Hazards Proportional Hazards regr– Quantiles Parametric (AFT) survival regr

• A special case for this course– Cumulative incidence Exponential regression

Page 5: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 5

5

General Regression

• General notation for variables and parameter

• The parameter might be the mean, geometric mean, odds, rate, instantaneous risk of an event (hazard), etc.

ofon distributi ofParameter subjectth for the variablesadjustment of Value

subject th for the POI theof Value subject th on the measured Response

,, 21

ii

ii

i

i

Yi

ii

WWXY

Page 6: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 6

6

Multiple Regression

• General notation for multiple regression model

• The link function is usually either none (means) or log (geommean, odds, hazard)

• In this class: complementary log log link for proportions measuring cumulative incidence, which relate to log link on exponential hazards– log (- log p) = log + log Δt

" covariatefor Slope" "Interest of Predfor Slope" Intercept""

modelingfor usedfunction link""

1

1

0

231210

jj

iiii

WX

gWWXg

Page 7: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 7

7

Borrowing Information

• Use other groups to make estimates in groups with sparse data

• Intuitively: 67 and 69 year olds would provide some relevant information about 68 year olds

• Assuming straight line relationship tells us how to adjust data from other (even more distant) age groups– If we do not know about the exact functional relationship, we

might want to borrow information only close to each group

Page 8: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 8

8

Defining “Contrasts”

• Define a comparison across groups to use when answering scientific question

• If straight line relationship in parameter, slope for POI compares parameter between groups differing by 1 unit in X when all othercovariates in model are equal

• If nonlinear relationship in parameter, slope is average comparison of parameter between groups differing by 1 unit in X “holding covariates constant”– Statistical jargon: a “contrast” across the groups

Page 9: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 9

9

Comparison of Models

• The major difference between regression models is interpretationof the parameters

• Summary: Mean, geometric mean, odds, hazards• Comparison of groups: Difference, ratio

• Issues related to inclusion of covariates remain the same• Address the scientific question

– Predictor of interest; Effect modifiers• Address confounding• Increase precision

Page 10: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 10

10

Interpretation of Parameters

• Intercept– Corresponds to a population with all modeled covariates equal to

zero– Most often outside range of data; quite often impossible; very

rarely of interest by itself

• Slope– A comparison between groups differing by 1 unit in corresponding

covariate, but agreeing on all other modeled covariates– Sometimes impossible to use this definition when modeling

interactions or complex curves

Page 11: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 11

11

Regression with Identity Link

• E.g., modeling mean of response Y on predictors X, W1, W2, …

1

12110

1210

0

1210

: of Difference

1

00

Model

ijijii

ijijii

jii

iii

wxwWxXwxwWxX

WX

WX

Page 12: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 12

12

Regression with Log Link

• E.g., modeling geometric mean, odds, rate, hazard of response Yon predictors X, W1, W2, …

1

12110

1210

0

1210

:log of Difference

log1 log

log00

log Model

ijijii

ijijii

jii

iii

wxwWxXwxwWxX

WX

WX

Page 13: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 13

13

Regression with Log Link: Back Transform

• E.g., modeling geometric mean, odds, rate, hazard of response Yon predictors X, W1, W2, …

1

12110

1210

0

:of Ratio

1

00

log Model

1210

e

ewWxXewWxX

eWX

WX

i

i

wxjijii

wxjijii

jii

iii

Page 14: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 14

14

Stratification vs Regression

• Generally, any stratified analysis could be performed as a regression model

• But stratification adjusts for covariates and all interactions among those covariates– E.g, sex, race, and the sex-race interaction

• Our habit in regression is to just adjust for the covariates (the “main effect”), and consider interactions less often

Page 15: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 15

15

Regression with “Saturated” Models

• Regression model is “saturated” if as many regression parameters as distinct combinations of predictors in the data

• No borrowing of information across groups

• Fitted values will correspond to common descriptive statistics– Linear regression: sample means (proportions)– Logistic regression: (log) sample odds– Poisson regression: (log) sample rates

Page 16: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 16

16

Example of “Saturated” Models

• Suppose model sex (two values), race (four values), and categorized age (three levels)

• Suppose we have at least one observation for each combination of sex, race, and categorized age

• Then a regression model that fits r = 24 parameters is saturated

• Model dummy variables with all interactions:– Intercept (1)– Main effects: sex (1), race (3), age (2)– Two-way interactions: sex-race (3), sex-age (2), race-age(6)– Three-way interactions: sex-race-age (6)

• (24 equations in 24 unknowns: Fit descriptive statistics exactly)

Page 17: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 17

17

Regression with Binary Predictor

• Common regression models correspond to common two-sample tests when only a single binary predictor is modeled

• Linear regression (classic SE) t test, equal variance (exact)

• Linear regression (robust SE) t test, unequal variance (approx)

• Logistic regression chi square test (score test)

• Prop hazard regression logrank test (score test)

Page 18: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 18

18

Stata: Multiple Regression

• In Stata, common syntax is used for common regression models – Keyword indicating model (and summary measure-link)– Response variable (except in proportional hazards)– List of predictors / model

• Special syntax for creating dummy variables, interactions– Options

• Output– Table of regression parameters, CI, p values

• (Optional display: back-transformed parameters of RR, OR, HR)– Test of entire regression model also provided

• A test that all slopes are equal to 0

• Post-estimation commands– Joint tests of parameters: test, testparm– Inference about combinations of parameters: lincom, nlcom

Page 19: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 19

19

Stata: Regression Commands

• Linear regression (means, proportions)– regress Y X W1 W2 W3, robust

• Linear regression ( geometric means)– regress logY X W1 W2 W3, robust

• Logistic regression (odds)– logistic Y X W1 W2 W3, [robust]– logit Y X W1 W2 W3, [robust]

• Poisson regression (rates)– poisson Y X W1 W2 W3, robust

• Proportional hazards regression (hazards)– stset Y Event– stcox X W1 W2 W3, robust

Page 20: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 20

20

Stata: glm Regression Commands

• Difference of means, proportions (linear regression)– glm Y X W1, robust family(gaussian) link(identity)

• Difference of log means, log proportions– glm Y X W1 W2, robust link(log)

• Difference of log geometric means (linear regression on logs)– glm logY X W1 W2, robust

• Difference of log odds (logistic regression)– glm Y X W1, [robust] family(binomial) link(logit)

• Difference of log rates (Poisson regression)– glm Y X W1 W2, robust family(poisson) link(log)

• Back transforming regression parameters with log link– glm Y X W1 W2, robust family(…) link(…) eform

Page 21: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 21

21

R: Multiple Regression

• The uwIntroStats package in R adopts similar syntax to Stata

• A single function takes the “functional” as its first argument– “Functional” = “summary measure”– E.g., mean, geometric mean, odds, rates, hazard

• Allows pre-specification of “multiple-partial” tests– Joint significance of dummy variables, polynomials, interactions,

etc.

Page 22: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 22

22

Regression Models withBinary Response

Page 23: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 23

23

Common Regression Models

• According to the parameter and contrast across groups– Mean (differences) Linear regression (a GLM)– Mean (ratios) GLM with log link (a GLM)– Geom Means (ratios) Linear regression on logs (a GLM)– Odds (ratios) Logistic regression (a GLM)– Rates (ratios) Poisson regression (a GLM)– Hazards (ratios) Proportional Hazards regr– Quantiles (ratios) Parametric (AFT) survival regr

• A special case for this course– Cumulative incidence Exponential regression (a GLM)

Page 24: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 24

24

General Regression

• General notation for variables and parameter

• The parameter might be the mean, geometric mean, odds, rate, instantaneous risk of an event (hazard), etc.

• (Sometimes we will use multiple regression predictors to model the scientific POI– E.g., linear and quadratic terms, dummy variables, etc.)

ofon distributi of measure)(summary Parameter subjectth for the variablesadjustment of Value

subject th for the POI theof Value subject th on the measured Response

,, 21

ii

ii

i

i

Yi

ii

WWXY

Page 25: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 25

25

Multiple Regression

• General notation for multiple regression model

• The link function can usually be viewed as either none (means) or log (geom mean, odds, hazard) but some exceptions– log odds viewed by many as logit mean– complementary log log link for proportions measuring cumulative

incidence, which relate to log link on exponential hazards• log (- log p) = log + log Δt

" covariatefor Slope" "Interest of Predfor Slope" Intercept""

modelingfor usedfunction link""

1

1

0

231210

jj

iiii

WX

gWWXg

Page 26: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 26

26

Generalized Linear Model (GLM)

• A special subset of my general regression model is the “Generalized Linear Model”– θ is the mean of response Y– Y has a 1-parameter exponential family distribution

• Includes normal (Gaussian), binomial, Poisson, exponential, gamma, negative binomial, log-normal…

• Estimating equations derived from maximum likelihood theory in these “regular” models (i.e., distributions that satisfy certain technical assumptions)– MLEs are “consistent”: arbitrarily close to the truth with large n– MLEs are “asymptotically efficient”: have greatest precision

possible in general (meet Cramer-Rao lower bound)

Page 27: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 27

27

GLM Estimating Equations

• Use maximum likelihood estimation assuming a parametric distribution for the data

• Quasi-likelihood and generalized estimating equations (GEE) use same general equation, but do not assume parametric distribution– Only model mean and mean-variance relationship

0ˆˆˆ

:satisfy toˆ estimatesparameter regression Find

:equation" estimatingfunction Score",~|

ˆ1

112

221102

ii

in

i i

iij

i

iiii

n

i i

iiin

i i

iij

iiiiiiiii

VYU

xgVYYU

xxxVxXY

Page 28: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 28

28

“Family”: Mean Variance Relationship

• If we trust our linear model and data distribution completely, then using the correct mean-variance relationship will be more efficient– Weights observations less if the variance is greater– Works because every weighted average will correctly estimate

the straight line relationship

in

i i

ii

in

i i

ii

in

i ii

ii

in

i

ii

YU

YU

YU

YU

12

1

1

12

lExponentia

Poisson

1 Bernoulli

Gaussian

Page 29: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 29

29

Role of Link Function

• Conceptually, we can use any link function with any distributionfamily

• There are often advantages of using the “canonical” link for each family: These tend to be the default value in statistical software– Gaussian distribution Identity link– Binomial distribution Logit link (on means)– Poisson distribution Log link– Exponential distribution Inverse link: g(μ) = 1/ μ

Page 30: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 30

30

Binary Data: Role of Link Function

• With Bernoulli data, we might consider three different link functions

ORX

eeYU

RRXeeYU

RDXXX

XYU

i

n

iX

X

i

i

n

iX

Xi

i

n

i ii

ii

i

i

i

i

1

1

1 Logit

1 Log

ˆ1ˆ Identity

Page 31: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 31

31

Uses of Regression

• Modeling questions about associations– “Defining averages” across groups– “Defining contrasts” for associations between response and POI– “Defining contrasts” for detecting effect modification

• Enabling comparisons across POI groups that are “otherwise similar” with respect to other variables– Adjusting for confounding– Adjusting to gain precision

Page 32: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 32

32

Borrowing Information

• Use other groups to make estimates in groups with sparse data

• Intuitively: 67 and 69 year olds would provide some relevant information about 68 year olds

• Assuming straight line relationship in modeled covariates tells us how to adjust data from other (even more distant) age groups– If we do not know about the exact functional relationship, we

might want to borrow information only close to each group

Page 33: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 33

33

Defining “Contrasts”

• Define a comparison across groups to use when answering scientific question

• If straight line relationship in parameter, slope for POI compares parameter between groups differing by 1 unit in X when all othercovariates in model are equal

• If nonlinear relationship in parameter, slope is average comparison of parameter between groups differing by 1 unit in X “holding covariates constant”– Statistical jargon: a “contrast” across the groups

• If multiple regression predictors model the POI, interpetation of the contrast is more difficult

Page 34: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 34

34

Comparison of Models

• The major difference between regression models is interpretationof the parameters– Summary: Mean, geometric mean, odds, hazards– Comparison of groups: Difference, ratio

• Issues related to inclusion of covariates remain the same– Address the scientific question

• Predictor of interest; Effect modifiers– Address confounding– Increase precision

Page 35: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 35

35

Interpretation of Parameters

• Intercept– Corresponds to a population with all modeled covariates equal to

zero– Most often outside range of data; quite often impossible; very

rarely of interest by itself

• Slope– A comparison between groups differing by 1 unit in corresponding

covariate, but agreeing on all other modeled covariates• Identity link: a difference in the summary measure• Log link: a ratio in the summary measure (difference in logs)

– Sometimes impossible to use this definition when modeling interactions or complex curves

• I most often resort to looking at predicted summary measures

Page 36: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 36

36

Regression with Identity Link

• E.g., modeling mean of response Y on predictors X, W1, W2, …

1

12110

1210

0

1210

: of Difference

1

00

Model

ijijii

ijijii

jii

iii

wxwWxXwxwWxX

WX

WX

Page 37: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 37

37

Regression with Identity Link

• Most common example: Linear regression for difference in means– Ordinary (unweighted) least squares

• Optimal (in some sense): independent, homoscedastic response– Weighted least squares: weighting by inverse variance

• Optimal (in some sense): independent, heteroscedastic response– (Biost 540) General least squares: weighting by covariance matrix

• Optimal (in some sense) for correlated response data

• Optimality: For any distribution having a variance, then– “Best linear unbiased estimate”: greatest precision among all

unbiased estimators that are a linear combination of the data• Gauss-Markov theorem

– If response is normal within groups and linear model correct, then• Unbiased and efficient (greatest possible precision)• Also MLE

Page 38: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 38

38

Linear Regression: Ordinary, Weighted

• Use least squares estimation– Ordinary least squares: assume equal variance– Weighted least squares: use actual variances

ˆ

:satisfy toˆ estimatesparameter regression Find

:equation" estimatingfunction Score"

Minimize :squares"Least "

,~|

12

12

12

222110

2

ji

n

i i

iiij

i

ji

n

i i

iiij

n

i i

iii

iiiiiiii

xxYU

xxYU

xY

xxxxxXY

Page 39: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 39

39

Properties of Least Squares Estimates

• In unweighted regression with no covariates– Estimated intercept is the sample mean

• In unweighted regression with a single binary estimator X– Estimated intercept is the sample mean for group with X=0– Estimated slope is the difference in sample means

• Group with X=1 minus group with X=0

• In unweighted regression with as many parameters as distinct combinations of predictors– The fitted value for every group will correspond exactly to the

sample mean

• In weighted regressions the last two may not hold, unless all subjects with the same covariate values have the same weights

Page 40: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 40

40

Linear Regression: Binary Response

• A mean-variance relationship: mean μ = p variance p(1-p)

• Optimal (BLUE) estimates would use weighted least squares– We have to estimate the weights in an iterative search

0ˆ1ˆ

ˆˆ

:satisfy toˆ estimatesparameter regression Find

1

:equation" estimatingfunction Score"

1

112

ji

n

iiiii

iiij

i

ji

n

i iiii

iiiji

n

i i

iiij

xxx

xYU

xxx

xYxxYU

Page 41: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 41

41

Stata: Binary Response with Identity Link

• Weighted least squares using iteratively estimated weights:– glm y x1 …,family(binomial) link(identity)– binreg y x1 …,rd– cs y x1

• Ordinary (unweighted) least squares :– glm y x1 …,family(gaussian) link(identity)– regress y x1 …

• Robust standard errors can be obtained with either– Specify option robust with glm or regress – Specify option vce(robust) with binreg

Page 42: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 42

42

Examples: From Fall 2013 HW #2

• Modeling females’ risk of dying within 4 years as a function of– Estrogen use (0 or 1)– Prior history of cardiovascular disease (0 or 1)– Age (65 – 100 years)

• Derived variable estr_prev = estrogen * prevdis

• Comparisons– No covariates– Binary covariate– Saturated model with two binary covariates– Continuous covariate

Page 43: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 43

43

Fitted Values with Linear Age Fit

• Some fitted values estimate a negative probability– Weights would be negative in weighted regression– Alters the “contrast” across groups

0.0

5.1

.15

.2Fi

tted

valu

es

60.00 70.00 80.00 90.00 100.00age

No estrogen, no CVD No estrogen, prior CVDEstrogen, no CVD Estrogen, prior CVD

fit5: Age Continuous Linear

Page 44: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 44

44

Relative Advantages

• IF the linear model is correct, weighting by the estimated mean-variance relationship is most efficient

• HOWEVER, if the linear model is not correct, we can end up with negative variance estimates– Not a good thing to use when weighting

• The unweighted analysis– is unbiased if the linear model is correct, – interpretable as a linear trend when the linear model is not exactly

correct, and– can be adjusted for the heteroscedasticity when making inference

Page 45: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 45

45

Linear Regression: “Gaussian” Response

• No mean-variance relationship

• Optimal (BLUE) estimates only if homoscedastic– But unbiased regardless

0ˆˆ

:satisfy toˆ estimatesparameter regression Find

:equation" estimatingfunction Score"

1

1

ji

n

iiij

i

ji

n

iiij

xxYU

xxYU

Page 46: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 46

46

Regression with Log Link

• E.g., modeling geometric mean, odds, rate, hazard of response Yon predictors X, W1, W2, …

1

12110

1210

0

1210

:log of Difference

log1 log

log00

log Model

ijijii

ijijii

jii

iii

wxwWxXwxwWxX

WX

WX

Page 47: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 47

47

Regression with Log Link: Back Transform

• E.g., modeling geometric mean, odds, rate, hazard of response Yon predictors X, W1, W2, …

1

12110

1210

0

:of Ratio

1

00

log Model

1210

e

ewWxXewWxX

eWX

WX

i

i

wxjijii

wxjijii

jii

iii

Page 48: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 48

48

Log Link: Binary Response

• A mean-variance relationship: mean μ = p variance p(1-p)

• Log link: log(p) = linear predictor

01

ˆ

:satisfy toˆ estimatesparameter regression Find

1

:equation"estimatingfunction Score"

ˆ

112

ji

n

ix

xi

j

i

xji

n

ixx

xix

ji

n

i i

xi

j

xeeYU

exee

eYexeYU

ii

ii

ii

iiii

iiii

ii

Page 49: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 49

49

Relative Advantages

• IF the linear model is correct, weighting by the estimated mean-variance relationship is most efficient

• HOWEVER, if the linear model is not correct, we can end up with negative variance estimates when the mean estimate > 1– Not a good thing to use when weighting

• We can do an unweighted analysis based on Poisson regression– is consistent if the linear model is correct, – interpretable as a linear trend when the linear model is not exactly

correct, and– can be adjusted for the heteroscedasticity when making inference

• This is generally important to do– (Can use a time variable equal to 1 for all subjects)

Page 50: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 50

50

Poisson Regression

22110

1

22110

loglog:logfor 1 oft coefficienknown a

having exposure"" with regression formulate that weNote

0ˆ:satisfy toˆ estimatesparameter regression Find

link Log

:equation" estimatingfunction Score"

log~|

iiiii

i

j

i

i

n

i

Xii

iiiiiiiii

xxttt

U

XetYU

xxxtPxXY

i

Page 51: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 51

51

Simple Poisson Regression

• Modeling rate of count response Y on predictor X

110

10

0

10

log1 log

log0

loglog,| Model!

|Pr P~ Distn

xxXxxX

X

XTTXTYEk

tetTkYtY

ii

ii

ii

iiiiiii

kii

t

iiiiii

ii

Page 52: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 52

52

Interpretation as Rates

• Exponentiation of parameters

110

10

0

1 0

loglog,| Model!

|Pr P~ Distn

10

eeexXeexX

eX

XTTXTYEk

tetTkYtY

xii

xii

ii

iiiiiii

kii

t

iiiiii

ii

Page 53: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 53

53

Simple Poisson Regression

• Interpretation of the model

• Rate when predictor is 0– Found by exponentiation of the intercept from the Poisson

regression: exp(0)

• Rate ratio between groups differing in the value of the predictor by 1 unit– Found by exponentiation of the slope from the Poisson

regression: exp(1)

Page 54: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 54

54

Stata Commands

• Same form as for other regression models

• Exception:– If the observed counts are measured over different amounts of

time or space, we must specify the length of “exposure”– poisson respvar predvar, exposure(tm) [robust]

• Exposure can also be given as the “offset”, which is just the log of the exposure time– poisson respvar predvar, offset(logtm) [robust]

• By default, Stata reports estimates on the log mean and log mean ratio scale– Specifying the option irr will cause Stata to suppress output of

the intercept and to report “incidence rate ratios”

Page 55: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 55

55

Stata: Poisson regression

• Weighted least squares using iteratively estimated weights:– poisson y x1 …,exposure(t)– glm y x1 …,family(poisson) link(log)

• Robust standard errors can be obtained with either– Specify option robust with glm or poisson– Specify option vce(robust) with binreg

Page 56: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 56

56

Similarity to Other Regressions

• Poisson regression uses maximum likelihood estimation to find parameter estimates

• If a saturated model is fit, the estimated mean in each group will agree exactly with the sample mean

• In large samples, the regression parameter estimates are approximately normally distributed– P values and CI that are displayed for each parameter estimate

are Wald- based estimates

ˆˆ

ˆ

)()()( :statTest

ˆˆˆ )()()( :CI 95%

0

2/1

esZ

errstdnullestimateZ

eszerrstdvaluecritestimate

Page 57: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 57

57

Technical Details

• Unlike linear regression, there is no closed form expression to find the Poisson regression parameter estimates

• Instead, computer programs use an iterative search

• This search may fail in saturated or nearly saturated models if some parameter corresponds to a group having no events– In this setting, Poisson regression parameters modeling the log

mean are trying to estimate negative infinity– The sample size is too small for the model

Page 58: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 58

58

Example

• Prevalence of stroke (cerebrovascular accident- CVA) by age in subset of Cardiovascular Health Study

• Response variable is CVA– Binary variable: 0= no history of prior stroke, 1= prior history of

stroke

• Predictor variable is Age– Continuous predictor

Page 59: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 59

59

Example: Regression Model

• Answer question by assessing linear trends in log odds of strokeby age

• Estimate best fitting line to log odds of CVA within age groups

• An association will exist if the slope (1) is nonzero– In that case, the odds (and probability) of CVA will be different

across different age groups

AgeAgeCVAE 10log

Page 60: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 60

60

Parameter Estimates. poisson cva age, robust

(iteration info deleted)

Poisson regression

Number of obs = 735

Wald chi2(1) = 2.61

Prob > chi2 = 0.1064

Log pseudolklhd = -245.0 Pseudo R2 = 0.0044

| Robust

cva | Coef. StdErr z P>|z| [95% Conf Intvl]

age | .0298 .0184 1.61 0.106 -.00637 .0659

_cons | -4.52 1.40 -3.23 0.001 -7.26 -1.77

Page 61: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 61

61

Interpretation of Stata Output

• Regression model for CVA on age

• Intercept is labeled by “_cons”– Estimated intercept: -4.52

• Slope is labeled by variable name: “age”– Estimated slope: 0.0298

• Estimated linear relationship:– log probability of CVA by age group given by

iAgeCVA 0298.052.4]E[ log

Page 62: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 62

62

Interpretation of Intercept

• Estimated log odds CVA for newborns is -4.52– Probability of CVA for newborns is e-4.52 = 0.0109

• Pretty ridiculous to try to estimate– We never sampled anyone less than 67– In this problem, the intercept is just a tool in fitting the model

iAgeCVA 0298.052.4] E[ log

Page 63: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 63

63

Interpretation of Slope

• Estimated difference in log probability of CVA for two groups differing by one year in age is 0.0298, with older group tending to higher probability– Risk Ratio: e0.0298= 1.0302– For 5 year age difference: e5x0.0298= 1.03025 = 1.161

• (If a straight line relationship is not true, we interpret the slope as an average difference in log odds CVA per one year difference inage)

iAgeCVA 0298.052.4] E[ log

Page 64: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 64

64

Example: Interpretation

• “From Poisson regression analysis, we estimate that for each year difference in age, the probability of stroke is 3.02% higher in the older group, though this estimate is not statistically significant (P = .106). A 95% CI suggests that this observation is not unusual if a group that is one year older might have probability of stroke that was anywhere from 0.6% lower or 6.8% higher than the younger group.”

Page 65: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 65

65

Logistic Regression

• Binary response variable

• Allows continuous (or multiple) grouping variables– But is OK with binary grouping variable also

• Compares odds of response across groups– “Odds ratio”

Page 66: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 66

66

Logistic Regression

0ˆ:satisfy toˆ estimatesparameter regression Find

1 link Logit

:equation" estimatingfunction Score"

1,1~|

1

22110

j

i

i

n

iX

X

i

iiiX

X

iii

U

Xe

eYU

xxxe

eBxXY

i

i

i

i

Page 67: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 67

67

Simple Logistic Regression

• Modeling odds of binary response Y on predictor X

110

10

0

10

oddslog1 oddslog

oddslog0

1loglogit Model

1Pr on Distributi

xxXxxX

X

Xp

pp

pY

i

i

i

ii

ii

ii

Page 68: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 68

68

Interpretation as Odds

• Exponentiation of regression parameters

110

10

0

10

odds1 odds odds0

1 Model

1Pr on Distributi

eeexXeexX

eX

eep

p

pY

xi

xi

i

X

i

i

ii

i

Page 69: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 69

69

Estimating Proportions

• Proportion = odds / (1 + odds)

110

110

10

10

00

10

10

11

1

1/0

1 Model

1Pr on Distributi

eeeeeepxX

eeeepxX

eepX

eeeep

pY

x

x

ii

x

x

ii

ii

X

X

i

ii

i

i

Page 70: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 70

70

Parameter Interpretation

• Interpretation of the logistic regression parameters based on odds

• Odds when predictor is 0– Found by exponentiation of the intercept from the logistic

regression: exp(0)

• Odds ratio between groups differing in the value of the predictor by 1 unit– Found by exponentiation of the slope from the logistic regression:

exp(1)

Page 71: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 71

71

Similarity to Other Regressions

• Logistic regression uses maximum likelihood estimation to find parameter estimates

• If a saturated model is fit, the estimated odds of event in eachgroup will agree exactly with the sample odds

• In large samples, the regression parameter estimates are approximately normally distributed– P values and CI that are displayed for each parameter estimate

are Wald- based estimates

ˆˆ

ˆ

)()()( :statTest

ˆˆˆ )()()( :CI 95%

0

2/1

esZ

errstdnullestimateZ

eszerrstdvaluecritestimate

Page 72: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 72

72

Technical Details

• Unlike linear regression, there is no closed form expression to find the logistic regression parameter estimates

• Instead, computer programs use an iterative search

• This search may fail in saturated or nearly saturated models if some parameter corresponds to a group having all events or no events– In this setting, logistic regression parameters modeling the log

odds are trying to estimate positive or negative infinity– The sample size is too small for the model

Page 73: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 73

73

Stata: Logistic regression

• Weighted least squares using iteratively estimated weights:– logistic y x1 …, or logit y x1 …,– glm y x1 …,family(binomial) link(logit)– binreg y x1 …,

• Robust standard errors can be obtained with either– Specify option robust with glm or logistic / logit– Specify option vce(robust) with binreg

Page 74: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 74

74

Example

• Prevalence of stroke (cerebrovascular accident- CVA) by age in subset of Cardiovascular Health Study

• Response variable is CVA– Binary variable: 0= no history of prior stroke, 1= prior history of

stroke

• Predictor variable is Age– Continuous predictor

Page 75: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 75

75

Example: Regression Model

• Answer question by assessing linear trends in log odds of strokeby age

• Estimate best fitting line to log odds of CVA within age groups

• An association will exist if the slope (1) is nonzero– In that case, the odds (and probability) of CVA will be different

across different age groups

AgeAgeCVAodds 10log

Page 76: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 76

76

Parameter Estimates. logit cva age

(iteration info deleted)

Number of obs = 735

LR chi2(1) = 2.45

Prob > chi2 = 0.1175

Log likelihood = -240.98969

Pseudo R2 = 0.0051

cva | Coef StdErr z P>|z| [95% Conf Int]

age | .0336 .0210 1.59 0.111 -.0077 .0748

_cons | -4.69 1.591 -2.95 0.003 -7.810 -1.572

Page 77: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 77

77

Interpretation of Stata Output

• Regression model for CVA on age

• Intercept is labeled by “_cons”– Estimated intercept: -4.69

• Slope is labeled by variable name: “age”– Estimated slope: 0.0336

• Estimated linear relationship:– log odds CVA by age group given by

iAgeCVA 0336.069.4 odds log

Page 78: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 78

78

Interpretation of Intercept

• Estimated log odds CVA for newborns is -4.69– Odds of CVA for newborns is e-4.69 = 0.0092– Probability of CVA for newborns

• Use prob = odds / (1+odds): .0092 / (1+.0092)= .0091

• Pretty ridiculous to try to estimate– We never sampled anyone less than 67– In this problem, the intercept is just a tool in fitting the model

iAgeCVA 0336.069.4 odds log

Page 79: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 79

79

Interpretation of Slope

• Estimated difference in log odds CVA for two groups differing byone year in age is 0.0336, with older group tending to higher log odds– Odds Ratio: e0.0336= 1.034– For 5 year age difference: e5x0.0336= 1.0345 = 1.183

• (If a straight line relationship is not true, we interpret the slope as an average difference in log odds CVA per one year difference inage)

iAgeCVA 0336.069.4 odds log

Page 80: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 80

80

Stata: “logit” versus “logistic”

• We are rarely interested in the intercept by itself– We do have to use it when estimating odds of an event in a single

group

• Given that we are rarely interested in the intercept, we might as well use the “logistic” command– It will provide inference for the odds ratio– We don’t have to exponentiate the slope estimate

Page 81: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 81

81

Odds Ratios using “logistic”.logistic cva age

Logistic regression Number of obs = 735

LR chi2(1) = 2.45

Prob > chi2 = 0.1175

Log likelihood = -240.98969

Pseudo R2 = 0.0051

cva |Odds Ratio StdErr z P>|z| [95% Conf Int]

age | 1.034 .0218 1.59 0.111 .992 1.078

Page 82: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 82

82

Example: Interpretation

• “From logistic regression analysis, we estimate that for each year difference in age, the odds of stroke is 3.4% higher in the older group, though this estimate is not statistically significant (P = .113). A 95% CI suggests that this observation is not unusual if a group that is one year older might have odds of stroke that was anywhere from 0.8% lower or 7.8% higher than the younger group.”

Page 83: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 83

83

Comments on Interpretation

• I express this as a difference between group odds rather than a change with aging– We did not do a longitudinal study

• To the extent that the true group log odds have a linear relationship, this interpretation applies exactly

• If the true relationship is nonlinear– The slope estimates the “first order trend” for the sampled age

distribution– We should not regard the estimates of individual group

probabilities / odds as accurate

Page 84: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 84

84

Logistic Regression and 2 Test

• Logistic regression with a binary predictor (two groups) corresponds to familiar chi squared test

• Three possible statistics from logistic regression– Wald: The test based on the estimate and SE– Score: Corresponds to chi squared test, but not given in Stata

output– Likelihood ratio test: Can be obtained using post-regression

commands in Stata (covered with adjusting for covariates)

Page 85: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 85

85

Properties of MLEs with Canonical Link

• In regression with no covariates– Estimated intercept is the sample mean

• In regression with a single binary estimator X– Estimated intercept is the sample mean for group with X=0– Estimated slope is the difference in sample means

• Group with X=1 minus group with X=0

• In regression with as many parameters as distinct combinations of predictors– The fitted value for every group will correspond exactly to the

sample mean– A “saturated model”

Page 86: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 86

86

Stata: Regression Commands

• Linear regression (means, proportions)– regress Y X W1 W2 W3, robust

• Linear regression ( geometric means)– regress logY X W1 W2 W3, robust

• Logistic regression (odds)– logistic Y X W1 W2 W3, [robust]– logit Y X W1 W2 W3, [robust]

• Poisson regression (rates)– poisson Y X W1 W2 W3, robust exposure(timevar)

• Proportional hazards regression (hazards)– stset Y Event– stcox X W1 W2 W3, robust

Page 87: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 87

87

Stata: glm Regression Commands

• Difference of means, proportions (linear regression)– glm Y X W1, robust family(gaussian) link(identity)

• Difference of log means, log proportions– glm Y X W1 W2, robust link(log)

• Difference of log geometric means (linear regression on logs)– glm logY X W1 W2, robust

• Difference of log odds (logistic regression)– glm Y X W1, [robust] family(binomial) link(logit)

• Difference of log rates (Poisson regression)– glm Y X W1 W2, robust family(poisson) link(log)

• Back transforming regression parameters with log link– glm Y X W1 W2, robust family(…) link(…) eform

Page 88: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 88

88

Adjustment for Covariates

• We “adjust” for other covariates

• Define groups according to– Predictor of interest, and– Other covariates

• Compare the distribution of response across groups which– differ with respect to the Predictor of Interest, but– are the same with respect to the other covariates

• “holding other variables constant”

Page 89: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 89

89

Adjusted Regression Analyses

Page 90: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 90

90

Controlling Variation

• In a two sample comparison of means, we might control some variable in order to decrease the within group variability– Restrict population sampled– Standardize ancillary treatments and exposures after accrual– Standardize measurement procedure

Page 91: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 91

91

Adjusting for Covariates

• When comparing means using stratified analyses or linear regression, adjustment for precision variables decreases the within group standard deviation– Var (Y | X) vs Var (Y | X, W)

Page 92: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 92

92

Ex: Linear Regression

tindependen if 0 ,orthogonal if 0

|

1ˆˆ

ion)randomizat (completet independen if,|ˆ|ˆ)stratified (balanced, "orthogonal" if

,,1,,~,| :Adj

,,1,,~| :Unadj

22

22

22

2

2

1

2

1

11

11

2210

210

|,|

,||

,|

|

XWXW

XW

W

ii

ind

iii

i

ind

ii

rr

XWVar

rXnVarse

XnVarse

WXEEXE

niWXWXY

niXXY

XYWXY

WXYXY

WXY

XY

Page 93: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 93

93

Precision with Proportions

• When analyzing proportions (means), the mean variance relationship is important

• Precision is greatest when proportion is close to 0 or 1

• Greater homogeneity of groups makes results more deterministic– (At least, I always hope for this)

• Hence, we should get lower within-group variance upon adjusting for prognostic variables

Page 94: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 94

94

Ex: Diff of Indep Proportions

2

22

1

212

221

2212121

2121

ˆ/1

1

ˆˆˆ/;

,,1;2,1,,1~

nnnVserrV

pp

YYpppp

nnrnnn

njipBYind

iii

iiij

Page 95: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 95

95

Precision with Odds

• When analyzing odds (a nonlinear function of the mean), adjusting for a precision variable results in more extreme estimates– odds = p / (1-p)– odds using average of stratum specific p is not the average of

stratum specific odds

• Generally, little “precision” is gained due to the mean-variance relationship– Unless the precision variable is highly prognostic

Page 96: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 96

96

Precision with Hazards

• When analyzing hazards, adjusting for a precision variable results in more extreme estimates

• The standard error tends to still be related to the number of observed events– Higher hazard ratio with same standard error greater precision

Page 97: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 97

97

Adjustment for Covariates

• We “adjust” for other covariates

• Define groups according to– Predictor of interest, and– Other covariates

• Compare the distribution of response across groups which– differ with respect to the Predictor of Interest, but– are the same with respect to the other covariates

• “holding other variables constant”

Page 98: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 98

98

Unadjusted vs Adjusted Models

• Adjustment for covariates changes the scientific question

• Unadjusted models– Slope compares parameters across groups differing by 1 unit in

the modeled predictor• Groups may also differ with respect to other variables

• Adjusted models– Slope compares parameters across groups differing by 1 unit in

the modeled predictor but similar with respect to other modeled covariates

Page 99: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 99

99

Interpretation of Slopes

• Difference in interpretation of slopes

– β1 = Compares for groups differing by 1 unit in X• (The distribution of W might differ across groups being compared)

– γ1 = Compares for groups differing by 1 unit in X, but agreeing in their values of W

iiii WXWXg 210, :Model Adjusted

ii XXg 10 :Model Unadjusted

Page 100: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 100

100

Comparing models

?ˆˆˆˆ is When

?ˆˆ isn Whe:Statistics

?ˆˆ is When

? isen Wh:Science

, Adjusted

Unadjusted

11

11

11

11

210

10

eses

sese

WXWXg

XXg

iiii

ii

Page 101: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 101

101

General Results

• These questions can not be answered precisely in the general case

• However, in linear regression we can derive exact results

• These will serve as a basis for later examination of– Logistic regression– Poisson regression– Proportional hazards regression

Page 102: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 102

102

Linear Regression

• Difference in interpretation of slopes

– β1 = Diff in mean Y for groups differing by 1 unit in X• (The distribution of W might differ across groups being compared)

– γ1 = Diff in mean Y for groups differing by 1 unit in X, but agreeing in their values of W

iiiii WXWXYE 210, :Model Adjusted

iii XXYE 10 :Model Unadjusted

Page 103: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 103

103

Relationships: True Slopes• The slope of the unadjusted model will tend to be

• Hence, true adjusted and unadjusted slopes for X are estimating the same quantity only if

– ρXW = 0 (X and W are truly uncorrelated), OR

– (no association between W and Y after adjusting for X)

211

X

WXW

02

Page 104: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 104

104

Relationships: Estimated Slopes• The estimated slope of the unadjusted model will be

• Hence, estimated adjusted and unadjusted slopes for X are equal only if

– rXW = 0 (X and W are uncorrelated in the sample, which can be arranged by experimental design), OR

– (which cannot be predetermined, because Y is random)

– sW = 0 (W is controlled at a single value in which case rXW = 0)

XWYWYXX

WXW rrrs

sr211 ˆ1ˆˆ

0ˆ2

Page 105: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 105

105

Relationships: True SE

2,|

2|

22

2|

22

22

1

2

1

,|||

1,

ˆ Model Adjusted

ˆ Model Unadjusted

WXYXWXY

XW

WXYVarXWVarXYVar

rXnVarWXYVar

se

XnVarXYVar

se

Page 106: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 106

106

Relationships: True SE

0| OR 0AND

0if ˆˆ Thus,

,|||

1,

ˆ Model Adjusted

ˆ Model Unadjusted

2

11

22

22

1

2

1

XWVar

rsese

WXYVarXWVarXYVar

rXnVarWXYVar

se

XnVarXYVar

se

XW

XW

Page 107: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 107

107

Relationships: Estimated SE

2210

2

10

222

1

2

2

1

ˆˆˆ,|

ˆˆ|

113/,

ˆˆ Model Adjusted

12/ˆˆ Model Unadjusted

iii

ii

XWX

X

WXYWXYSSE

XYXYSSE

rsnnWXYSSE

es

snnXYSSE

es

Page 108: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 108

108

Relationships: Estimated SE

3/,2/AND

0if ˆˆˆˆ Thus,

113/,

ˆˆ Model Adjusted

12/ˆˆ Model Unadjusted

11

222

1

2

2

1

nWXYSSEnXYSSE

reses

rsnnWXYSSE

es

snnXYSSE

es

XW

XWX

X

Page 109: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 109

109

Residual Squared Error

WXYSSEXYSSE

WXYWXYSSE

XYXYSSE

iii

ii

,||:data same on the calculatedWhen

ˆˆˆ,|

ˆˆ|2

210

2

10

Page 110: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 110

110

Relationships: Estimated SE

0ˆ if ,|| and ,0OR

,|| casein which ,0ˆ if ˆˆ Now

ˆˆˆ,|

ˆˆ|

2

2

11

2210

2

10

WXYSSEXYSSEr

WXYSSEXYSSE

WXYWXYSSE

XYXYSSE

XW

iii

ii

Page 111: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 111

111

Special Cases

• Behavior of unadjusted and adjusted models according to whether– X and W are uncorrelated (no association in means)– W is associated with Y after adjustment for X

InflationVar Irrelevant0gConfoundinPrecision0

00

2

2

XWXW rr

Page 112: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 112

112

Simulations

• Unadjusted and adjusted estimates of effect of binary POI as a function of– Effect of a covariate on summary of outcome (mean, odds, …)– Sampling scheme: Association between covariate and POI

• Difference in mean covariate• Difference in median covariate

iiii

ii

XXX

WXW

iii

WXWXg

XXg

WWssr

XXWE

210

10

011

10

, : Adjusted

:Unadjusted

ˆ

: Sampling

Page 113: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 113

113

Linear Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.19)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.28) 1.0 (0.20)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.3 (0.29) 0.0 (0.21)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.22)

Page 114: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 114

114

Linear Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.19)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.28) 1.0 (0.20)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.3 (0.29) 0.0 (0.21)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.22)

Page 115: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 115

115

Linear Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.19)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.28) 1.0 (0.20)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.3 (0.29) 0.0 (0.21)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.22)

Page 116: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 116

116

Linear Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.19)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.28) 1.0 (0.20)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.3 (0.29) 0.0 (0.21)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.22)

Page 117: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 117

117

Linear Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.19)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.28) 1.0 (0.20)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.3 (0.29) 0.0 (0.21)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.22)

Page 118: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 118

118

Linear Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.19)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.28) 1.0 (0.20)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.3 (0.29) 0.0 (0.21)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.22)

Page 119: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 119

119

Take Home Message: Risk Difference

• The magnitude of confounding is a product of– Magnitude of association between covariate and response AND– Difference of mean covariate value across POI groups

• If adjustment would be linear, mean (not median, etc) matters

• If no confounding, adjusted and unadjusted estimates will be equal (approximately)

• Standard errors are– Decreased if added covariates are (conditionally) associated with

response– Increased if added covariates that are (conditionally) correlated

with the POI

• After adjusting for a confounder, the precision of the estimates for response-POI association can be larger or smaller than in an unadjusted analysis

Page 120: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 120

120

Logistic Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.42) 0.0 (0.42)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.40) 0.0 (0.42)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.42) 0.0 (0.43)

Precision 0.0 0.0 0.00 1.0 1.0 0.8 (0.43) 1.0 (0.49)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.43) 0.0 (0.48)

Confound 0.0 0.3 0.15 1.0 0.0 0.2 (0.41) 0.0 (0.47)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.41) 0.0 (0.47)

Page 121: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 121

121

Logistic Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.42) 0.0 (0.42)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.40) 0.0 (0.42)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.42) 0.0 (0.43)

Precision 0.0 0.0 0.00 1.0 1.0 0.8 (0.43) 1.0 (0.49)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.43) 0.0 (0.48)

Confound 0.0 0.3 0.15 1.0 0.0 0.2 (0.41) 0.0 (0.47)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.41) 0.0 (0.47)

Page 122: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 122

122

Logistic Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.42) 0.0 (0.42)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.40) 0.0 (0.42)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.42) 0.0 (0.43)

Precision 0.0 0.0 0.00 1.0 1.0 0.8 (0.43) 1.0 (0.49)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.43) 0.0 (0.48)

Confound 0.0 0.3 0.15 1.0 0.0 0.2 (0.41) 0.0 (0.47)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.41) 0.0 (0.47)

Page 123: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 123

123

Logistic Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.42) 0.0 (0.42)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.40) 0.0 (0.42)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.42) 0.0 (0.43)

Precision 0.0 0.0 0.00 1.0 1.0 0.8 (0.43) 1.0 (0.49)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.43) 0.0 (0.48)

Confound 0.0 0.3 0.15 1.0 0.0 0.2 (0.41) 0.0 (0.47)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.41) 0.0 (0.47)

Page 124: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 124

124

Logistic Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.42) 0.0 (0.42)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.40) 0.0 (0.42)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.42) 0.0 (0.43)

Precision 0.0 0.0 0.00 1.0 1.0 0.8 (0.43) 1.0 (0.49)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.43) 0.0 (0.48)

Confound 0.0 0.3 0.15 1.0 0.0 0.2 (0.41) 0.0 (0.47)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.41) 0.0 (0.47)

Page 125: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 125

125

Take Home Message: Odds Ratios

• The magnitude of confounding is primarily a function of– Magnitude of association between covariate and response AND– Difference of mean covariate value across POI groups

• If adjustment would be linear, mean (not median, etc) matters most

• Even if no confounding, adjusted and unadjusted estimates will be different if new covariate (conditionally) associated with response– Adjusting for a precision variable will “deattenuate” OR (and

estimate)– The amount of “deattenuation” depends on

• The strength of association between the added covariate and the response, and

• The adjusted strength of association between the POI and the response

Page 126: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 126

126

Deattenuation of OR

Page 127: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 127

127

Take Home Message: Odds Ratios

• Standard errors of estimated OR measuring POI – response association are largely driven by the mean-variance relationship– SE of odds behaves like 1 / p(1-p)– Greater homogeneity of groups leads to p closer to 0 or 1– Usually statistical significance of the POI – response association

is affected very little by adjustment for a “precision” variable unless it is very strongly associated with response

• Note that if we persist in estimating the population OR instead of the adjusted OR, we do gain precision by adjustment– We do have to take a weighted average, however– See Tangen & Koch, Stat Med, 2000

Page 128: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 128

128

Poisson Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.18) 0.0 (0.18)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.27) 0.0 (0.10)

Precision - 0.3 0.0 0.00 1.0 0.0 0.4 (0.75) 0.0 (0.10)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.26) 1.0 (0.08)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.09)

Confound 0.0 0.3 0.15 1.0 0.0 0.7 (0.76) 0.0 (0.09)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.12) 0.0 (0.13)

Page 129: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 129

129

Poisson Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.18) 0.0 (0.18)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.27) 0.0 (0.10)

Precision - 0.3 0.0 0.00 1.0 0.0 0.4 (0.75) 0.0 (0.10)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.26) 1.0 (0.08)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.09)

Confound 0.0 0.3 0.15 1.0 0.0 0.7 (0.76) 0.0 (0.09)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.12) 0.0 (0.13)

Page 130: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 130

130

Poisson Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.18) 0.0 (0.18)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.27) 0.0 (0.10)

Precision - 0.3 0.0 0.00 1.0 0.0 0.4 (0.75) 0.0 (0.10)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.26) 1.0 (0.08)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.09)

Confound 0.0 0.3 0.15 1.0 0.0 0.7 (0.76) 0.0 (0.09)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.12) 0.0 (0.13)

Page 131: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 131

131

Poisson Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.18) 0.0 (0.18)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.27) 0.0 (0.10)

Precision - 0.3 0.0 0.00 1.0 0.0 0.4 (0.75) 0.0 (0.10)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.26) 1.0 (0.08)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.09)

Confound 0.0 0.3 0.15 1.0 0.0 0.7 (0.76) 0.0 (0.09)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.12) 0.0 (0.13)

Page 132: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 132

132

Take Home Message: Risk Ratios

• In simulations, I allowed for “overdispersion” of mean-variance

• The magnitude of confounding is primarily a function of– Magnitude of association between covariate and response AND– Difference of mean covariate value across POI groups

• If adjustment would be linear, mean matters most, though the skewness of the predictor distribution does have an effect

• (There is a mixture of Poisson random variables)

• Adjusted and unadjusted estimates can be different if new covariate (conditionally) associated with response– Influence of skewed observations can be great– If added covariate symmetric with relatively small variance, there

is little difference between adjusted, unadjusted estimates

Page 133: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 133

133

Take Home Message: Risk Ratio

• Standard errors tend to be– Decreased if added covariates are (conditionally) associated with

response– Increased if added covariates that are (conditionally) correlated

with the POI

• After adjusting for a confounder, the precision of the estimates for response-POI association can be larger or smaller than in an unadjusted analysis

Page 134: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 134

134

Proportional Hazards Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.22)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.21)

Precision 0.0 0.0 0.00 1.0 1.0 0.7 (0.21) 1.0 (0.22)

Confound 0.3 0.3 0.15 1.0 0.0 0.2 (0.21) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.1 (0.20) 0.0 (0.22)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.23)

Next

Page 135: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 135

135

Proportional Hazards Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.22)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.21)

Precision 0.0 0.0 0.00 1.0 1.0 0.7 (0.21) 1.0 (0.22)

Confound 0.3 0.3 0.15 1.0 0.0 0.2 (0.21) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.1 (0.20) 0.0 (0.22)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.23)

Next

Page 136: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 136

136

Proportional Hazards Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.22)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.21)

Precision 0.0 0.0 0.00 1.0 1.0 0.7 (0.21) 1.0 (0.22)

Confound 0.3 0.3 0.15 1.0 0.0 0.2 (0.21) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.1 (0.20) 0.0 (0.22)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.23)

Next

Page 137: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 137

137

Proportional Hazards Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.22)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.21)

Precision 0.0 0.0 0.00 1.0 1.0 0.7 (0.21) 1.0 (0.22)

Confound 0.3 0.3 0.15 1.0 0.0 0.2 (0.21) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.1 (0.20) 0.0 (0.22)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.23)

Next

Page 138: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 138

138

Take Home Message: Hazard Ratios

• The magnitude of confounding is primarily a function of– Magnitude of association between covariate and response AND– Difference of mean covariate value across POI groups

• If adjustment would be linear, mean (not median, etc) matters most

• Even if no confounding, adjusted and unadjusted estimates will be different if new covariate (conditionally) associated with response (PH regression is related to logistic regression)– Adjusting for a precision variable will “deattenuate” HR (and

estimate)– The amount of “deattenuation” depends on

• The strength of association between the added covariate and the response, and

• The adjusted strength of association between the POI and the response

Page 139: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 139

139

Take Home Message: Hazard Ratios

• Standard errors of estimated HR measuring POI – response association are largely driven by the mean-variance relationship of the sample size ratios– Unless there is a very strong association, the SE tends to be fairly

constant for a range of different HRs

• Holding the SE constant, we will obtain a lower p value with a deattenuated HR

Page 140: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 140

140

Precision: Linear Regression

• E.g., X, W independent in population (or completely randomized experiment) AND W associated with Y independent of X

1111

1111

2

ˆˆˆˆˆˆ Errs Std

ˆˆ Slopes

Estimates Value True

00

esessese

XW

Page 141: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 141

141

Precision: Logistic Regression

• Adjusting for a precision variable • Deattenuates slope away from the null• Standard errors reflect mean-variance relationship

– Substantially increased power only in extreme cases» (OR > 5 for equal samples sizes of binary W)

1111

11111

11111

ˆˆˆˆˆˆ Errs Std

ˆˆ :0

ˆˆ :0 Slopes

Estimates Value True

esessese

Page 142: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 142

142

Precision: Poisson Regression

• Adjusting for a precision variable (with robust SE)• No effect on the slope (similar to linear regression)

– log ratios are linear in log means• Standard errors reflect mean-variance relationship

– Virtually no effect on power

1111

1111

ˆˆˆˆˆˆ Errs Std

ˆˆ Slopes

Estimates Value True

esessese

Page 143: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 143

143

Precision: PH Regression

• Adjusting for a precision variable • Deattenuates slope away from the null• Standard errors stay fairly constant

– (Complicated result of binomial mean-variance)

1111

11111

11111

ˆˆˆˆˆˆ Errs Std

ˆˆ :0

ˆˆ :0 Slopes

Estimates Value True

esessese

Page 144: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 144

144

Lin Reg: Stratified Randomization

• Stratified (orthogonal) randomization in a designed experiment– Also when designing observational study with matching

• Unless we adjust for the stratification variables, we estimate the true standard errors incorrectly

1111

1111

2

ˆˆˆˆˆˆ Errs Std

ˆˆ Slopes

Estimates Value True

00

esessese

rXW

Page 145: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 145

145

Multiple Adjustment

• We can generalize the results about the difference in mean confounder values across a groups defined by a binary POI

• Difference in interpretation of slopes

– β1 = Diff in mean Y for groups differing by 1 unit in X• (The distribution of W might differ across groups being compared)

– γ1 = Diff in mean Y for groups differing by 1 unit in X, but agreeing in their values of all W’s

iiiiiii WWXWWXYE 23121021 ,,, :Model Adjusted

iii XXYE 10 :Model Unadjusted

Page 146: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 146

146

Multiple Adjustment: Good News• For a binary POI, the slope of the unadjusted model will tend to be

• Hence, true adjusted and unadjusted slopes for X are definitelyestimating the same quantity if for each j

• (Of course, it is also possible the various terms cancel each other out…)

p

jXjXjj WW

101111

covariatesother for adjusting afteroutcome of predictivenot is Covariate

0

group POIeach in equal are valuescovariateMean

1

01

j

XjXj

OR

WW

Page 147: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 147

147

Multiple Adjustment: Bad News• We must consider the conditional relationships between a covariate and

the POI after adjustment for other variables• Consider four possible models involving X, W, and Z

• Deciding between models U and W, we only need consider ρXW

• Deciding between models Z and WZ, we need to consider ρXZ and ρWZ as wel

iZiWiXiiii

iZiXiii

iWiXiii

iXii

ZWXZWXYE

ZXZXYE

WXWXYE

XXYE

0

0

0

0

,, : WZModel

, : ZModel

, : WModel

: UModel

Page 148: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 148

148

Multiple Adjustment: Bad News

• Even though W and X might be marginally uncorrelated, our models Z and ZW estimate the same quantity only if – X | Z is uncorrelated with W | Z, OR– Y | X, Z is uncorrelated with W | X, Z

WX

W

XW

WZXZXX

Z

WWZ

X

ZXZ

X

WXW

XW

WXX

XXWX

WXWXX

XW

XW

2

0

2

0

11

iZiWiXiiii

iZiXiii

iWiXiii

iXii

ZWXZWXYE

ZXZXYE

WXWXYE

XXYE

0

0

0

0

,, : WZModel

, : ZModel

, : WModel

: UModel

Page 149: Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014  · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 149

149

Take Home Messages

• “Everything should be as simple as possible, but no simpler”

• Understanding the role of confounding and precision allows some generalizations, but there are details that can mess us up– Linear regression: We have formulas to guide us when modeling

differences in means– Other regressions: Generally we use computers in an iterative

search, but each step is often a linear regression

• When adjusting for multiple variables, each slope parameter is interpreted by conditioning on the other variables in the model– We have to talk about adjusting for “residual confounding”– We have to talk about “added precision”