Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014 · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536

Lecture 6: Review of Regression October 14, 2014

Categorical Data Analysis, AUT 2014 1

1

Biost 536 / Epi 536Categorical Data Analysis in

EpidemiologyScott S. Emerson, M.D., Ph.D.

Professor of BiostatisticsUniversity of Washington

Lecture 6:Review of Regression

October 14, 2014



2

Lecture Outline

• General Regression Model

• Confounding and Precision in Regression– Linear regression with adjustment for a single covariate– Logistic regression with adjustment for a single covariate– PH regression with adjustment for a single covariate– Linear regression with adjustment for multiple covariates



3

General Regression Model



4

Regression Models

• According to the parameter compared across groups– Means Linear regression– Geom Means Linear regression on logs– Odds Logistic regression– Rates Poisson regression– Hazards Proportional Hazards regr– Quantiles Parametric (AFT) survival regr

• A special case for this course– Cumulative incidence Exponential regression



5

General Regression

• General notation for variables and parameter

• The parameter might be the mean, geometric mean, odds, rate, instantaneous risk of an event (hazard), etc.

ofon distributi ofParameter subjectth for the variablesadjustment of Value

subject th for the POI theof Value subject th on the measured Response

,, 21

ii

ii

i

i

Yi

ii

WWXY



6

Multiple Regression

• General notation for multiple regression model

• The link function is usually either none (means) or log (geommean, odds, hazard)

• In this class: complementary log log link for proportions measuring cumulative incidence, which relate to log link on exponential hazards– log (- log p) = log + log Δt

" covariatefor Slope" "Interest of Predfor Slope" Intercept""

modelingfor usedfunction link""

1

1

0

231210

jj

iiii

WX

gWWXg



7

Borrowing Information

• Use other groups to make estimates in groups with sparse data

• Intuitively: 67 and 69 year olds would provide some relevant information about 68 year olds

• Assuming straight line relationship tells us how to adjust data from other (even more distant) age groups– If we do not know about the exact functional relationship, we

might want to borrow information only close to each group



8

Defining “Contrasts”

• Define a comparison across groups to use when answering scientific question

• If straight line relationship in parameter, slope for POI compares parameter between groups differing by 1 unit in X when all othercovariates in model are equal

• If nonlinear relationship in parameter, slope is average comparison of parameter between groups differing by 1 unit in X “holding covariates constant”– Statistical jargon: a “contrast” across the groups



9

Comparison of Models

• The major difference between regression models is interpretationof the parameters

• Summary: Mean, geometric mean, odds, hazards• Comparison of groups: Difference, ratio

• Issues related to inclusion of covariates remain the same• Address the scientific question

– Predictor of interest; Effect modifiers• Address confounding• Increase precision



10

Interpretation of Parameters

• Intercept– Corresponds to a population with all modeled covariates equal to

zero– Most often outside range of data; quite often impossible; very

rarely of interest by itself

• Slope– A comparison between groups differing by 1 unit in corresponding

covariate, but agreeing on all other modeled covariates– Sometimes impossible to use this definition when modeling

interactions or complex curves



11

Regression with Identity Link

• E.g., modeling mean of response Y on predictors X, W1, W2, …

1

12110

1210

0

1210

: of Difference

1

00

Model

ijijii

ijijii

jii

iii

wxwWxXwxwWxX

WX

WX



12

Regression with Log Link

• E.g., modeling geometric mean, odds, rate, hazard of response Yon predictors X, W1, W2, …

1

12110

1210

0

1210

:log of Difference

log1 log

log00

log Model

ijijii

ijijii

jii

iii

wxwWxXwxwWxX

WX

WX



13

Regression with Log Link: Back Transform


1

12110

1210

0

:of Ratio

1

00

log Model

1210

e

ewWxXewWxX

eWX

WX

i

i

wxjijii

wxjijii

jii

iii



14

Stratification vs Regression

• Generally, any stratified analysis could be performed as a regression model

• But stratification adjusts for covariates and all interactions among those covariates– E.g, sex, race, and the sex-race interaction

• Our habit in regression is to just adjust for the covariates (the “main effect”), and consider interactions less often



15

Regression with “Saturated” Models

• Regression model is “saturated” if as many regression parameters as distinct combinations of predictors in the data

• No borrowing of information across groups

• Fitted values will correspond to common descriptive statistics– Linear regression: sample means (proportions)– Logistic regression: (log) sample odds– Poisson regression: (log) sample rates



16

Example of “Saturated” Models

• Suppose model sex (two values), race (four values), and categorized age (three levels)

• Suppose we have at least one observation for each combination of sex, race, and categorized age

• Then a regression model that fits r = 24 parameters is saturated

• Model dummy variables with all interactions:– Intercept (1)– Main effects: sex (1), race (3), age (2)– Two-way interactions: sex-race (3), sex-age (2), race-age(6)– Three-way interactions: sex-race-age (6)

• (24 equations in 24 unknowns: Fit descriptive statistics exactly)



17

Regression with Binary Predictor

• Common regression models correspond to common two-sample tests when only a single binary predictor is modeled

• Linear regression (classic SE) t test, equal variance (exact)

• Linear regression (robust SE) t test, unequal variance (approx)

• Logistic regression chi square test (score test)

• Prop hazard regression logrank test (score test)



18

Stata: Multiple Regression

• In Stata, common syntax is used for common regression models – Keyword indicating model (and summary measure-link)– Response variable (except in proportional hazards)– List of predictors / model

• Special syntax for creating dummy variables, interactions– Options

• Output– Table of regression parameters, CI, p values

• (Optional display: back-transformed parameters of RR, OR, HR)– Test of entire regression model also provided

• A test that all slopes are equal to 0

• Post-estimation commands– Joint tests of parameters: test, testparm– Inference about combinations of parameters: lincom, nlcom



19

Stata: Regression Commands

• Linear regression (means, proportions)– regress Y X W1 W2 W3, robust

• Linear regression ( geometric means)– regress logY X W1 W2 W3, robust

• Logistic regression (odds)– logistic Y X W1 W2 W3, [robust]– logit Y X W1 W2 W3, [robust]

• Poisson regression (rates)– poisson Y X W1 W2 W3, robust

• Proportional hazards regression (hazards)– stset Y Event– stcox X W1 W2 W3, robust



20

Stata: glm Regression Commands

• Difference of means, proportions (linear regression)– glm Y X W1, robust family(gaussian) link(identity)

• Difference of log means, log proportions– glm Y X W1 W2, robust link(log)

• Difference of log geometric means (linear regression on logs)– glm logY X W1 W2, robust

• Difference of log odds (logistic regression)– glm Y X W1, [robust] family(binomial) link(logit)

• Difference of log rates (Poisson regression)– glm Y X W1 W2, robust family(poisson) link(log)

• Back transforming regression parameters with log link– glm Y X W1 W2, robust family(…) link(…) eform



21

R: Multiple Regression

• The uwIntroStats package in R adopts similar syntax to Stata

• A single function takes the “functional” as its first argument– “Functional” = “summary measure”– E.g., mean, geometric mean, odds, rates, hazard

• Allows pre-specification of “multiple-partial” tests– Joint significance of dummy variables, polynomials, interactions,

etc.



22

Regression Models withBinary Response



23

Common Regression Models

• According to the parameter and contrast across groups– Mean (differences) Linear regression (a GLM)– Mean (ratios) GLM with log link (a GLM)– Geom Means (ratios) Linear regression on logs (a GLM)– Odds (ratios) Logistic regression (a GLM)– Rates (ratios) Poisson regression (a GLM)– Hazards (ratios) Proportional Hazards regr– Quantiles (ratios) Parametric (AFT) survival regr

• A special case for this course– Cumulative incidence Exponential regression (a GLM)



24

General Regression

• General notation for variables and parameter

• The parameter might be the mean, geometric mean, odds, rate, instantaneous risk of an event (hazard), etc.

• (Sometimes we will use multiple regression predictors to model the scientific POI– E.g., linear and quadratic terms, dummy variables, etc.)

ofon distributi of measure)(summary Parameter subjectth for the variablesadjustment of Value

subject th for the POI theof Value subject th on the measured Response

,, 21

ii

ii

i

i

Yi

ii

WWXY



25

Multiple Regression

• General notation for multiple regression model

• The link function can usually be viewed as either none (means) or log (geom mean, odds, hazard) but some exceptions– log odds viewed by many as logit mean– complementary log log link for proportions measuring cumulative

incidence, which relate to log link on exponential hazards• log (- log p) = log + log Δt

" covariatefor Slope" "Interest of Predfor Slope" Intercept""

modelingfor usedfunction link""

1

1

0

231210

jj

iiii

WX

gWWXg



26

Generalized Linear Model (GLM)

• A special subset of my general regression model is the “Generalized Linear Model”– θ is the mean of response Y– Y has a 1-parameter exponential family distribution

• Includes normal (Gaussian), binomial, Poisson, exponential, gamma, negative binomial, log-normal…

• Estimating equations derived from maximum likelihood theory in these “regular” models (i.e., distributions that satisfy certain technical assumptions)– MLEs are “consistent”: arbitrarily close to the truth with large n– MLEs are “asymptotically efficient”: have greatest precision

possible in general (meet Cramer-Rao lower bound)



27

GLM Estimating Equations

• Use maximum likelihood estimation assuming a parametric distribution for the data

• Quasi-likelihood and generalized estimating equations (GEE) use same general equation, but do not assume parametric distribution– Only model mean and mean-variance relationship

0ˆˆˆ

:satisfy toˆ estimatesparameter regression Find

:equation" estimatingfunction Score",~|

ˆ1

112

221102

ii

in

i i

iij

i

iiii

n

i i

iiin

i i

iij

iiiiiiiii

VYU

xgVYYU

xxxVxXY



28

“Family”: Mean Variance Relationship

• If we trust our linear model and data distribution completely, then using the correct mean-variance relationship will be more efficient– Weights observations less if the variance is greater– Works because every weighted average will correctly estimate

the straight line relationship

in

i i

ii

in

i i

ii

in

i ii

ii

in

i

ii

YU

YU

YU

YU

12

1

1

12

lExponentia

Poisson

1 Bernoulli

Gaussian



29

Role of Link Function

• Conceptually, we can use any link function with any distributionfamily

• There are often advantages of using the “canonical” link for each family: These tend to be the default value in statistical software– Gaussian distribution Identity link– Binomial distribution Logit link (on means)– Poisson distribution Log link– Exponential distribution Inverse link: g(μ) = 1/ μ



30

Binary Data: Role of Link Function

• With Bernoulli data, we might consider three different link functions

ORX

eeYU

RRXeeYU

RDXXX

XYU

i

n

iX

X

i

i

n

iX

Xi

i

n

i ii

ii

i

i

i

i

1

1ˆ

1

1 Logit

1 Log

ˆ1ˆ Identity



31

Uses of Regression

• Modeling questions about associations– “Defining averages” across groups– “Defining contrasts” for associations between response and POI– “Defining contrasts” for detecting effect modification

• Enabling comparisons across POI groups that are “otherwise similar” with respect to other variables– Adjusting for confounding– Adjusting to gain precision



32

Borrowing Information

• Use other groups to make estimates in groups with sparse data

• Intuitively: 67 and 69 year olds would provide some relevant information about 68 year olds

• Assuming straight line relationship in modeled covariates tells us how to adjust data from other (even more distant) age groups– If we do not know about the exact functional relationship, we

might want to borrow information only close to each group



33

Defining “Contrasts”

• Define a comparison across groups to use when answering scientific question

• If straight line relationship in parameter, slope for POI compares parameter between groups differing by 1 unit in X when all othercovariates in model are equal

• If nonlinear relationship in parameter, slope is average comparison of parameter between groups differing by 1 unit in X “holding covariates constant”– Statistical jargon: a “contrast” across the groups

• If multiple regression predictors model the POI, interpetation of the contrast is more difficult



34

Comparison of Models

• The major difference between regression models is interpretationof the parameters– Summary: Mean, geometric mean, odds, hazards– Comparison of groups: Difference, ratio

• Issues related to inclusion of covariates remain the same– Address the scientific question

• Predictor of interest; Effect modifiers– Address confounding– Increase precision



35

Interpretation of Parameters

• Intercept– Corresponds to a population with all modeled covariates equal to

zero– Most often outside range of data; quite often impossible; very

rarely of interest by itself

• Slope– A comparison between groups differing by 1 unit in corresponding

covariate, but agreeing on all other modeled covariates• Identity link: a difference in the summary measure• Log link: a ratio in the summary measure (difference in logs)

– Sometimes impossible to use this definition when modeling interactions or complex curves

• I most often resort to looking at predicted summary measures



36


• E.g., modeling mean of response Y on predictors X, W1, W2, …

1

12110

1210

0

1210

: of Difference

1

00

Model

ijijii

ijijii

jii

iii

wxwWxXwxwWxX

WX

WX



37


• Most common example: Linear regression for difference in means– Ordinary (unweighted) least squares

• Optimal (in some sense): independent, homoscedastic response– Weighted least squares: weighting by inverse variance

• Optimal (in some sense): independent, heteroscedastic response– (Biost 540) General least squares: weighting by covariance matrix

• Optimal (in some sense) for correlated response data

• Optimality: For any distribution having a variance, then– “Best linear unbiased estimate”: greatest precision among all

unbiased estimators that are a linear combination of the data• Gauss-Markov theorem

– If response is normal within groups and linear model correct, then• Unbiased and efficient (greatest possible precision)• Also MLE



38

Linear Regression: Ordinary, Weighted

• Use least squares estimation– Ordinary least squares: assume equal variance– Weighted least squares: use actual variances

0ˆ

ˆ


:equation" estimatingfunction Score"

Minimize :squares"Least "

,~|

12

12

12

222110

2

ji

n

i i

iiij

i

ji

n

i i

iiij

n

i i

iii

iiiiiiii

xxYU

xxYU

xY

xxxxxXY



39

Properties of Least Squares Estimates

• In unweighted regression with no covariates– Estimated intercept is the sample mean

• In unweighted regression with a single binary estimator X– Estimated intercept is the sample mean for group with X=0– Estimated slope is the difference in sample means

• Group with X=1 minus group with X=0

• In unweighted regression with as many parameters as distinct combinations of predictors– The fitted value for every group will correspond exactly to the

sample mean

• In weighted regressions the last two may not hold, unless all subjects with the same covariate values have the same weights



40

Linear Regression: Binary Response

• A mean-variance relationship: mean μ = p variance p(1-p)

• Optimal (BLUE) estimates would use weighted least squares– We have to estimate the weights in an iterative search

0ˆ1ˆ

ˆˆ


1


1

112

ji

n

iiiii

iiij

i

ji

n

i iiii

iiiji

n

i i

iiij

xxx

xYU

xxx

xYxxYU



41

Stata: Binary Response with Identity Link

• Weighted least squares using iteratively estimated weights:– glm y x1 …,family(binomial) link(identity)– binreg y x1 …,rd– cs y x1

• Ordinary (unweighted) least squares :– glm y x1 …,family(gaussian) link(identity)– regress y x1 …

• Robust standard errors can be obtained with either– Specify option robust with glm or regress – Specify option vce(robust) with binreg



42

Examples: From Fall 2013 HW #2

• Modeling females’ risk of dying within 4 years as a function of– Estrogen use (0 or 1)– Prior history of cardiovascular disease (0 or 1)– Age (65 – 100 years)

• Derived variable estr_prev = estrogen * prevdis

• Comparisons– No covariates– Binary covariate– Saturated model with two binary covariates– Continuous covariate



43

Fitted Values with Linear Age Fit

• Some fitted values estimate a negative probability– Weights would be negative in weighted regression– Alters the “contrast” across groups

0.0

5.1

.15

.2Fi

tted

valu

es

60.00 70.00 80.00 90.00 100.00age

No estrogen, no CVD No estrogen, prior CVDEstrogen, no CVD Estrogen, prior CVD

fit5: Age Continuous Linear



44

Relative Advantages

• IF the linear model is correct, weighting by the estimated mean-variance relationship is most efficient

• HOWEVER, if the linear model is not correct, we can end up with negative variance estimates– Not a good thing to use when weighting

• The unweighted analysis– is unbiased if the linear model is correct, – interpretable as a linear trend when the linear model is not exactly

correct, and– can be adjusted for the heteroscedasticity when making inference



45

Linear Regression: “Gaussian” Response

• No mean-variance relationship

• Optimal (BLUE) estimates only if homoscedastic– But unbiased regardless

0ˆˆ



1

1

ji

n

iiij

i

ji

n

iiij

xxYU

xxYU



46

Regression with Log Link


1

12110

1210

0

1210

:log of Difference

log1 log

log00

log Model

ijijii

ijijii

jii

iii

wxwWxXwxwWxX

WX

WX



47

Regression with Log Link: Back Transform


1

12110

1210

0

:of Ratio

1

00

log Model

1210

e

ewWxXewWxX

eWX

WX

i

i

wxjijii

wxjijii

jii

iii



48

Log Link: Binary Response

• A mean-variance relationship: mean μ = p variance p(1-p)

• Log link: log(p) = linear predictor

01

ˆ


1

:equation"estimatingfunction Score"

1ˆ

ˆ

112

ji

n

ix

xi

j

i

xji

n

ixx

xix

ji

n

i i

xi

j

xeeYU

exee

eYexeYU

ii

ii

ii

iiii

iiii

ii



49

Relative Advantages

• IF the linear model is correct, weighting by the estimated mean-variance relationship is most efficient

• HOWEVER, if the linear model is not correct, we can end up with negative variance estimates when the mean estimate > 1– Not a good thing to use when weighting

• We can do an unweighted analysis based on Poisson regression– is consistent if the linear model is correct, – interpretable as a linear trend when the linear model is not exactly

correct, and– can be adjusted for the heteroscedasticity when making inference

• This is generally important to do– (Can use a time variable equal to 1 for all subjects)



50

Poisson Regression

22110

1

22110

loglog:logfor 1 oft coefficienknown a

having exposure"" with regression formulate that weNote

0ˆ:satisfy toˆ estimatesparameter regression Find

link Log


log~|

iiiii

i

j

i

i

n

i

Xii

iiiiiiiii

xxttt

U

XetYU

xxxtPxXY

i



51

Simple Poisson Regression

• Modeling rate of count response Y on predictor X

110

10

0

10

log1 log

log0

loglog,| Model!

|Pr P~ Distn

xxXxxX

X

XTTXTYEk

tetTkYtY

ii

ii

ii

iiiiiii

kii

t

iiiiii

ii



52

Interpretation as Rates

• Exponentiation of parameters

110

10

0

1 0

loglog,| Model!

|Pr P~ Distn

10

eeexXeexX

eX

XTTXTYEk

tetTkYtY

xii

xii

ii

iiiiiii

kii

t

iiiiii

ii



53

Simple Poisson Regression

• Interpretation of the model

• Rate when predictor is 0– Found by exponentiation of the intercept from the Poisson

regression: exp(0)

• Rate ratio between groups differing in the value of the predictor by 1 unit– Found by exponentiation of the slope from the Poisson

regression: exp(1)



54

Stata Commands

• Same form as for other regression models

• Exception:– If the observed counts are measured over different amounts of

time or space, we must specify the length of “exposure”– poisson respvar predvar, exposure(tm) [robust]

• Exposure can also be given as the “offset”, which is just the log of the exposure time– poisson respvar predvar, offset(logtm) [robust]

• By default, Stata reports estimates on the log mean and log mean ratio scale– Specifying the option irr will cause Stata to suppress output of

the intercept and to report “incidence rate ratios”



55

Stata: Poisson regression

• Weighted least squares using iteratively estimated weights:– poisson y x1 …,exposure(t)– glm y x1 …,family(poisson) link(log)

• Robust standard errors can be obtained with either– Specify option robust with glm or poisson– Specify option vce(robust) with binreg



56

Similarity to Other Regressions

• Poisson regression uses maximum likelihood estimation to find parameter estimates

• If a saturated model is fit, the estimated mean in each group will agree exactly with the sample mean

• In large samples, the regression parameter estimates are approximately normally distributed– P values and CI that are displayed for each parameter estimate

are Wald- based estimates

ˆˆ

ˆ

)()()( :statTest

ˆˆˆ )()()( :CI 95%

0

2/1

esZ

errstdnullestimateZ

eszerrstdvaluecritestimate



57

Technical Details

• Unlike linear regression, there is no closed form expression to find the Poisson regression parameter estimates

• Instead, computer programs use an iterative search

• This search may fail in saturated or nearly saturated models if some parameter corresponds to a group having no events– In this setting, Poisson regression parameters modeling the log

mean are trying to estimate negative infinity– The sample size is too small for the model



58

Example

• Prevalence of stroke (cerebrovascular accident- CVA) by age in subset of Cardiovascular Health Study

• Response variable is CVA– Binary variable: 0= no history of prior stroke, 1= prior history of

stroke

• Predictor variable is Age– Continuous predictor



59

Example: Regression Model

• Answer question by assessing linear trends in log odds of strokeby age

• Estimate best fitting line to log odds of CVA within age groups

• An association will exist if the slope (1) is nonzero– In that case, the odds (and probability) of CVA will be different

across different age groups

AgeAgeCVAE 10log



60

Parameter Estimates. poisson cva age, robust

(iteration info deleted)

Poisson regression

Number of obs = 735

Wald chi2(1) = 2.61

Prob > chi2 = 0.1064

Log pseudolklhd = -245.0 Pseudo R2 = 0.0044

| Robust

cva | Coef. StdErr z P>|z| [95% Conf Intvl]

age | .0298 .0184 1.61 0.106 -.00637 .0659

_cons | -4.52 1.40 -3.23 0.001 -7.26 -1.77



61

Interpretation of Stata Output

• Regression model for CVA on age

• Intercept is labeled by “_cons”– Estimated intercept: -4.52

• Slope is labeled by variable name: “age”– Estimated slope: 0.0298

• Estimated linear relationship:– log probability of CVA by age group given by

iAgeCVA 0298.052.4]E[ log



62

Interpretation of Intercept

• Estimated log odds CVA for newborns is -4.52– Probability of CVA for newborns is e-4.52 = 0.0109

• Pretty ridiculous to try to estimate– We never sampled anyone less than 67– In this problem, the intercept is just a tool in fitting the model

iAgeCVA 0298.052.4] E[ log



63

Interpretation of Slope

• Estimated difference in log probability of CVA for two groups differing by one year in age is 0.0298, with older group tending to higher probability– Risk Ratio: e0.0298= 1.0302– For 5 year age difference: e5x0.0298= 1.03025 = 1.161

• (If a straight line relationship is not true, we interpret the slope as an average difference in log odds CVA per one year difference inage)

iAgeCVA 0298.052.4] E[ log



64

Example: Interpretation

• “From Poisson regression analysis, we estimate that for each year difference in age, the probability of stroke is 3.02% higher in the older group, though this estimate is not statistically significant (P = .106). A 95% CI suggests that this observation is not unusual if a group that is one year older might have probability of stroke that was anywhere from 0.6% lower or 6.8% higher than the younger group.”



65

Logistic Regression

• Binary response variable

• Allows continuous (or multiple) grouping variables– But is OK with binary grouping variable also

• Compares odds of response across groups– “Odds ratio”



66

Logistic Regression

0ˆ:satisfy toˆ estimatesparameter regression Find

1 link Logit


1,1~|

1

22110

j

i

i

n

iX

X

i

iiiX

X

iii

U

Xe

eYU

xxxe

eBxXY

i

i

i

i



67

Simple Logistic Regression

• Modeling odds of binary response Y on predictor X

110

10

0

10

oddslog1 oddslog

oddslog0

1loglogit Model

1Pr on Distributi

xxXxxX

X

Xp

pp

pY

i

i

i

ii

ii

ii



68

Interpretation as Odds

• Exponentiation of regression parameters

110

10

0

10

odds1 odds odds0

1 Model

1Pr on Distributi

eeexXeexX

eX

eep

p

pY

xi

xi

i

X

i

i

ii

i



69

Estimating Proportions

• Proportion = odds / (1 + odds)

110

110

10

10

00

10

10

11

1

1/0

1 Model

1Pr on Distributi

eeeeeepxX

eeeepxX

eepX

eeeep

pY

x

x

ii

x

x

ii

ii

X

X

i

ii

i

i



70

Parameter Interpretation

• Interpretation of the logistic regression parameters based on odds

• Odds when predictor is 0– Found by exponentiation of the intercept from the logistic

regression: exp(0)

• Odds ratio between groups differing in the value of the predictor by 1 unit– Found by exponentiation of the slope from the logistic regression:

exp(1)



71

Similarity to Other Regressions

• Logistic regression uses maximum likelihood estimation to find parameter estimates

• If a saturated model is fit, the estimated odds of event in eachgroup will agree exactly with the sample odds

• In large samples, the regression parameter estimates are approximately normally distributed– P values and CI that are displayed for each parameter estimate

are Wald- based estimates

ˆˆ

ˆ

)()()( :statTest

ˆˆˆ )()()( :CI 95%

0

2/1

esZ

errstdnullestimateZ

eszerrstdvaluecritestimate



72

Technical Details

• Unlike linear regression, there is no closed form expression to find the logistic regression parameter estimates

• Instead, computer programs use an iterative search

• This search may fail in saturated or nearly saturated models if some parameter corresponds to a group having all events or no events– In this setting, logistic regression parameters modeling the log

odds are trying to estimate positive or negative infinity– The sample size is too small for the model



73

Stata: Logistic regression

• Weighted least squares using iteratively estimated weights:– logistic y x1 …, or logit y x1 …,– glm y x1 …,family(binomial) link(logit)– binreg y x1 …,

• Robust standard errors can be obtained with either– Specify option robust with glm or logistic / logit– Specify option vce(robust) with binreg



74

Example

• Prevalence of stroke (cerebrovascular accident- CVA) by age in subset of Cardiovascular Health Study

• Response variable is CVA– Binary variable: 0= no history of prior stroke, 1= prior history of

stroke

• Predictor variable is Age– Continuous predictor



75

Example: Regression Model

• Answer question by assessing linear trends in log odds of strokeby age

• Estimate best fitting line to log odds of CVA within age groups

• An association will exist if the slope (1) is nonzero– In that case, the odds (and probability) of CVA will be different

across different age groups

AgeAgeCVAodds 10log



76

Parameter Estimates. logit cva age

(iteration info deleted)

Number of obs = 735

LR chi2(1) = 2.45

Prob > chi2 = 0.1175

Log likelihood = -240.98969

Pseudo R2 = 0.0051

cva | Coef StdErr z P>|z| [95% Conf Int]

age | .0336 .0210 1.59 0.111 -.0077 .0748

_cons | -4.69 1.591 -2.95 0.003 -7.810 -1.572



77

Interpretation of Stata Output

• Regression model for CVA on age

• Intercept is labeled by “_cons”– Estimated intercept: -4.69

• Slope is labeled by variable name: “age”– Estimated slope: 0.0336

• Estimated linear relationship:– log odds CVA by age group given by

iAgeCVA 0336.069.4 odds log



78

Interpretation of Intercept

• Estimated log odds CVA for newborns is -4.69– Odds of CVA for newborns is e-4.69 = 0.0092– Probability of CVA for newborns

• Use prob = odds / (1+odds): .0092 / (1+.0092)= .0091

• Pretty ridiculous to try to estimate– We never sampled anyone less than 67– In this problem, the intercept is just a tool in fitting the model




79

Interpretation of Slope

• Estimated difference in log odds CVA for two groups differing byone year in age is 0.0336, with older group tending to higher log odds– Odds Ratio: e0.0336= 1.034– For 5 year age difference: e5x0.0336= 1.0345 = 1.183

• (If a straight line relationship is not true, we interpret the slope as an average difference in log odds CVA per one year difference inage)




80

Stata: “logit” versus “logistic”

• We are rarely interested in the intercept by itself– We do have to use it when estimating odds of an event in a single

group

• Given that we are rarely interested in the intercept, we might as well use the “logistic” command– It will provide inference for the odds ratio– We don’t have to exponentiate the slope estimate



81

Odds Ratios using “logistic”.logistic cva age

Logistic regression Number of obs = 735

LR chi2(1) = 2.45

Prob > chi2 = 0.1175

Log likelihood = -240.98969

Pseudo R2 = 0.0051

cva |Odds Ratio StdErr z P>|z| [95% Conf Int]

age | 1.034 .0218 1.59 0.111 .992 1.078



82

Example: Interpretation

• “From logistic regression analysis, we estimate that for each year difference in age, the odds of stroke is 3.4% higher in the older group, though this estimate is not statistically significant (P = .113). A 95% CI suggests that this observation is not unusual if a group that is one year older might have odds of stroke that was anywhere from 0.8% lower or 7.8% higher than the younger group.”



83

Comments on Interpretation

• I express this as a difference between group odds rather than a change with aging– We did not do a longitudinal study

• To the extent that the true group log odds have a linear relationship, this interpretation applies exactly

• If the true relationship is nonlinear– The slope estimates the “first order trend” for the sampled age

distribution– We should not regard the estimates of individual group

probabilities / odds as accurate



84

Logistic Regression and 2 Test

• Logistic regression with a binary predictor (two groups) corresponds to familiar chi squared test

• Three possible statistics from logistic regression– Wald: The test based on the estimate and SE– Score: Corresponds to chi squared test, but not given in Stata

output– Likelihood ratio test: Can be obtained using post-regression

commands in Stata (covered with adjusting for covariates)



85

Properties of MLEs with Canonical Link

• In regression with no covariates– Estimated intercept is the sample mean

• In regression with a single binary estimator X– Estimated intercept is the sample mean for group with X=0– Estimated slope is the difference in sample means

• Group with X=1 minus group with X=0

• In regression with as many parameters as distinct combinations of predictors– The fitted value for every group will correspond exactly to the

sample mean– A “saturated model”



86

Stata: Regression Commands

• Linear regression (means, proportions)– regress Y X W1 W2 W3, robust

• Linear regression ( geometric means)– regress logY X W1 W2 W3, robust

• Logistic regression (odds)– logistic Y X W1 W2 W3, [robust]– logit Y X W1 W2 W3, [robust]

• Poisson regression (rates)– poisson Y X W1 W2 W3, robust exposure(timevar)

• Proportional hazards regression (hazards)– stset Y Event– stcox X W1 W2 W3, robust



87

Stata: glm Regression Commands

• Difference of means, proportions (linear regression)– glm Y X W1, robust family(gaussian) link(identity)

• Difference of log means, log proportions– glm Y X W1 W2, robust link(log)

• Difference of log geometric means (linear regression on logs)– glm logY X W1 W2, robust

• Difference of log odds (logistic regression)– glm Y X W1, [robust] family(binomial) link(logit)

• Difference of log rates (Poisson regression)– glm Y X W1 W2, robust family(poisson) link(log)

• Back transforming regression parameters with log link– glm Y X W1 W2, robust family(…) link(…) eform



88

Adjustment for Covariates

• We “adjust” for other covariates

• Define groups according to– Predictor of interest, and– Other covariates

• Compare the distribution of response across groups which– differ with respect to the Predictor of Interest, but– are the same with respect to the other covariates

• “holding other variables constant”



89

Adjusted Regression Analyses



90

Controlling Variation

• In a two sample comparison of means, we might control some variable in order to decrease the within group variability– Restrict population sampled– Standardize ancillary treatments and exposures after accrual– Standardize measurement procedure



91

Adjusting for Covariates

• When comparing means using stratified analyses or linear regression, adjustment for precision variables decreases the within group standard deviation– Var (Y | X) vs Var (Y | X, W)



92

Ex: Linear Regression

tindependen if 0 ,orthogonal if 0

|

1ˆˆ

ion)randomizat (completet independen if,|ˆ|ˆ)stratified (balanced, "orthogonal" if

,,1,,~,| :Adj

,,1,,~| :Unadj

22

22

22

2

2

1

2

1

11

11

2210

210

|,|

,||

,|

|

XWXW

XW

W

ii

ind

iii

i

ind

ii

rr

XWVar

rXnVarse

XnVarse

WXEEXE

niWXWXY

niXXY

XYWXY

WXYXY

WXY

XY



93

Precision with Proportions

• When analyzing proportions (means), the mean variance relationship is important

• Precision is greatest when proportion is close to 0 or 1

• Greater homogeneity of groups makes results more deterministic– (At least, I always hope for this)

• Hence, we should get lower within-group variance upon adjusting for prognostic variables



94

Ex: Diff of Indep Proportions

2

22

1

212

221

2212121

2121

ˆ/1

1

ˆˆˆ/;

,,1;2,1,,1~

nnnVserrV

pp

YYpppp

nnrnnn

njipBYind

iii

iiij



95

Precision with Odds

• When analyzing odds (a nonlinear function of the mean), adjusting for a precision variable results in more extreme estimates– odds = p / (1-p)– odds using average of stratum specific p is not the average of

stratum specific odds

• Generally, little “precision” is gained due to the mean-variance relationship– Unless the precision variable is highly prognostic



96

Precision with Hazards

• When analyzing hazards, adjusting for a precision variable results in more extreme estimates

• The standard error tends to still be related to the number of observed events– Higher hazard ratio with same standard error greater precision



97

Adjustment for Covariates

• We “adjust” for other covariates

• Define groups according to– Predictor of interest, and– Other covariates

• Compare the distribution of response across groups which– differ with respect to the Predictor of Interest, but– are the same with respect to the other covariates

• “holding other variables constant”



98

Unadjusted vs Adjusted Models

• Adjustment for covariates changes the scientific question

• Unadjusted models– Slope compares parameters across groups differing by 1 unit in

the modeled predictor• Groups may also differ with respect to other variables

• Adjusted models– Slope compares parameters across groups differing by 1 unit in

the modeled predictor but similar with respect to other modeled covariates



99

Interpretation of Slopes

• Difference in interpretation of slopes

– β1 = Compares for groups differing by 1 unit in X• (The distribution of W might differ across groups being compared)

– γ1 = Compares for groups differing by 1 unit in X, but agreeing in their values of W

iiii WXWXg 210, :Model Adjusted

ii XXg 10 :Model Unadjusted



100

Comparing models

?ˆˆˆˆ is When

?ˆˆ isn Whe:Statistics

?ˆˆ is When

? isen Wh:Science

, Adjusted

Unadjusted

11

11

11

11

210

10

eses

sese

WXWXg

XXg

iiii

ii



101

General Results

• These questions can not be answered precisely in the general case

• However, in linear regression we can derive exact results

• These will serve as a basis for later examination of– Logistic regression– Poisson regression– Proportional hazards regression



102

Linear Regression


– β1 = Diff in mean Y for groups differing by 1 unit in X• (The distribution of W might differ across groups being compared)

– γ1 = Diff in mean Y for groups differing by 1 unit in X, but agreeing in their values of W

iiiii WXWXYE 210, :Model Adjusted

iii XXYE 10 :Model Unadjusted



103

Relationships: True Slopes• The slope of the unadjusted model will tend to be

• Hence, true adjusted and unadjusted slopes for X are estimating the same quantity only if

– ρXW = 0 (X and W are truly uncorrelated), OR

– (no association between W and Y after adjusting for X)

211

X

WXW

02



104

Relationships: Estimated Slopes• The estimated slope of the unadjusted model will be

• Hence, estimated adjusted and unadjusted slopes for X are equal only if

– rXW = 0 (X and W are uncorrelated in the sample, which can be arranged by experimental design), OR

– (which cannot be predetermined, because Y is random)

– sW = 0 (W is controlled at a single value in which case rXW = 0)

XWYWYXX

WXW rrrs

sr211 ˆ1ˆˆ

0ˆ2



105

Relationships: True SE

2,|

2|

22

2|

22

22

1

2

1

,|||

1,

ˆ Model Adjusted

ˆ Model Unadjusted

WXYXWXY

XW

WXYVarXWVarXYVar

rXnVarWXYVar

se

XnVarXYVar

se



106

Relationships: True SE

0| OR 0AND

0if ˆˆ Thus,

,|||

1,

ˆ Model Adjusted

ˆ Model Unadjusted

2

11

22

22

1

2

1

XWVar

rsese

WXYVarXWVarXYVar

rXnVarWXYVar

se

XnVarXYVar

se

XW

XW



107

Relationships: Estimated SE

2210

2

10

222

1

2

2

1

ˆˆˆ,|

ˆˆ|

113/,

ˆˆ Model Adjusted

12/ˆˆ Model Unadjusted

iii

ii

XWX

X

WXYWXYSSE

XYXYSSE

rsnnWXYSSE

es

snnXYSSE

es



108


3/,2/AND

0if ˆˆˆˆ Thus,

113/,

ˆˆ Model Adjusted

12/ˆˆ Model Unadjusted

11

222

1

2

2

1

nWXYSSEnXYSSE

reses

rsnnWXYSSE

es

snnXYSSE

es

XW

XWX

X



109

Residual Squared Error

WXYSSEXYSSE

WXYWXYSSE

XYXYSSE

iii

ii

,||:data same on the calculatedWhen

ˆˆˆ,|

ˆˆ|2

210

2

10



110


0ˆ if ,|| and ,0OR

,|| casein which ,0ˆ if ˆˆ Now

ˆˆˆ,|

ˆˆ|

2

2

11

2210

2

10

WXYSSEXYSSEr

WXYSSEXYSSE

WXYWXYSSE

XYXYSSE

XW

iii

ii



111

Special Cases

• Behavior of unadjusted and adjusted models according to whether– X and W are uncorrelated (no association in means)– W is associated with Y after adjustment for X

InflationVar Irrelevant0gConfoundinPrecision0

00

2

2

XWXW rr



112

Simulations

• Unadjusted and adjusted estimates of effect of binary POI as a function of– Effect of a covariate on summary of outcome (mean, odds, …)– Sampling scheme: Association between covariate and POI

• Difference in mean covariate• Difference in median covariate

iiii

ii

XXX

WXW

iii

WXWXg

XXg

WWssr

XXWE

210

10

011

10

, : Adjusted

:Unadjusted

ˆ

: Sampling



113

Linear Regression

• Simulation results

Truth Avg Estimates (SE)

ΔMdn α1 rXW γ2 γ1 β1 γ1

Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.19)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.28) 1.0 (0.20)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.3 (0.29) 0.0 (0.21)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.22)



114

Linear Regression




Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.19)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.28) 1.0 (0.20)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.3 (0.29) 0.0 (0.21)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.22)



115

Linear Regression




Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.19)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.28) 1.0 (0.20)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.3 (0.29) 0.0 (0.21)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.22)



116

Linear Regression




Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.19)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.28) 1.0 (0.20)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.3 (0.29) 0.0 (0.21)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.22)



117

Linear Regression




Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.19)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.28) 1.0 (0.20)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.3 (0.29) 0.0 (0.21)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.22)



118

Linear Regression




Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.19)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.28) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.28) 1.0 (0.20)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.3 (0.29) 0.0 (0.21)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.22)



119

Take Home Message: Risk Difference

• The magnitude of confounding is a product of– Magnitude of association between covariate and response AND– Difference of mean covariate value across POI groups

• If adjustment would be linear, mean (not median, etc) matters

• If no confounding, adjusted and unadjusted estimates will be equal (approximately)

• Standard errors are– Decreased if added covariates are (conditionally) associated with

response– Increased if added covariates that are (conditionally) correlated

with the POI

• After adjusting for a confounder, the precision of the estimates for response-POI association can be larger or smaller than in an unadjusted analysis



120

Logistic Regression




Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.42) 0.0 (0.42)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.40) 0.0 (0.42)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.42) 0.0 (0.43)

Precision 0.0 0.0 0.00 1.0 1.0 0.8 (0.43) 1.0 (0.49)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.43) 0.0 (0.48)

Confound 0.0 0.3 0.15 1.0 0.0 0.2 (0.41) 0.0 (0.47)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.41) 0.0 (0.47)



121

Logistic Regression




Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.42) 0.0 (0.42)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.40) 0.0 (0.42)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.42) 0.0 (0.43)

Precision 0.0 0.0 0.00 1.0 1.0 0.8 (0.43) 1.0 (0.49)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.43) 0.0 (0.48)

Confound 0.0 0.3 0.15 1.0 0.0 0.2 (0.41) 0.0 (0.47)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.41) 0.0 (0.47)



122

Logistic Regression




Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.42) 0.0 (0.42)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.40) 0.0 (0.42)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.42) 0.0 (0.43)

Precision 0.0 0.0 0.00 1.0 1.0 0.8 (0.43) 1.0 (0.49)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.43) 0.0 (0.48)

Confound 0.0 0.3 0.15 1.0 0.0 0.2 (0.41) 0.0 (0.47)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.41) 0.0 (0.47)



123

Logistic Regression




Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.42) 0.0 (0.42)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.40) 0.0 (0.42)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.42) 0.0 (0.43)

Precision 0.0 0.0 0.00 1.0 1.0 0.8 (0.43) 1.0 (0.49)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.43) 0.0 (0.48)

Confound 0.0 0.3 0.15 1.0 0.0 0.2 (0.41) 0.0 (0.47)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.41) 0.0 (0.47)



124

Logistic Regression




Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.42) 0.0 (0.42)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.40) 0.0 (0.42)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.42) 0.0 (0.43)

Precision 0.0 0.0 0.00 1.0 1.0 0.8 (0.43) 1.0 (0.49)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.43) 0.0 (0.48)

Confound 0.0 0.3 0.15 1.0 0.0 0.2 (0.41) 0.0 (0.47)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.41) 0.0 (0.47)



125

Take Home Message: Odds Ratios

• The magnitude of confounding is primarily a function of– Magnitude of association between covariate and response AND– Difference of mean covariate value across POI groups

• If adjustment would be linear, mean (not median, etc) matters most

• Even if no confounding, adjusted and unadjusted estimates will be different if new covariate (conditionally) associated with response– Adjusting for a precision variable will “deattenuate” OR (and

estimate)– The amount of “deattenuation” depends on

• The strength of association between the added covariate and the response, and

• The adjusted strength of association between the POI and the response



126

Deattenuation of OR



127

Take Home Message: Odds Ratios

• Standard errors of estimated OR measuring POI – response association are largely driven by the mean-variance relationship– SE of odds behaves like 1 / p(1-p)– Greater homogeneity of groups leads to p closer to 0 or 1– Usually statistical significance of the POI – response association

is affected very little by adjustment for a “precision” variable unless it is very strongly associated with response

• Note that if we persist in estimating the population OR instead of the adjusted OR, we do gain precision by adjustment– We do have to take a weighted average, however– See Tangen & Koch, Stat Med, 2000



128

Poisson Regression




Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.18) 0.0 (0.18)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.27) 0.0 (0.10)

Precision - 0.3 0.0 0.00 1.0 0.0 0.4 (0.75) 0.0 (0.10)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.26) 1.0 (0.08)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.09)

Confound 0.0 0.3 0.15 1.0 0.0 0.7 (0.76) 0.0 (0.09)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.12) 0.0 (0.13)



129

Poisson Regression




Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.18) 0.0 (0.18)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.27) 0.0 (0.10)

Precision - 0.3 0.0 0.00 1.0 0.0 0.4 (0.75) 0.0 (0.10)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.26) 1.0 (0.08)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.09)

Confound 0.0 0.3 0.15 1.0 0.0 0.7 (0.76) 0.0 (0.09)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.12) 0.0 (0.13)



130

Poisson Regression




Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.18) 0.0 (0.18)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.27) 0.0 (0.10)

Precision - 0.3 0.0 0.00 1.0 0.0 0.4 (0.75) 0.0 (0.10)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.26) 1.0 (0.08)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.09)

Confound 0.0 0.3 0.15 1.0 0.0 0.7 (0.76) 0.0 (0.09)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.12) 0.0 (0.13)



131

Poisson Regression




Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.18) 0.0 (0.18)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.27) 0.0 (0.10)

Precision - 0.3 0.0 0.00 1.0 0.0 0.4 (0.75) 0.0 (0.10)

Precision 0.0 0.0 0.00 1.0 1.0 1.0 (0.26) 1.0 (0.08)

Confound 0.3 0.3 0.15 1.0 0.0 0.3 (0.28) 0.0 (0.09)

Confound 0.0 0.3 0.15 1.0 0.0 0.7 (0.76) 0.0 (0.09)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.12) 0.0 (0.13)



132

Take Home Message: Risk Ratios

• In simulations, I allowed for “overdispersion” of mean-variance


• If adjustment would be linear, mean matters most, though the skewness of the predictor distribution does have an effect

• (There is a mixture of Poisson random variables)

• Adjusted and unadjusted estimates can be different if new covariate (conditionally) associated with response– Influence of skewed observations can be great– If added covariate symmetric with relatively small variance, there

is little difference between adjusted, unadjusted estimates



133

Take Home Message: Risk Ratio

• Standard errors tend to be– Decreased if added covariates are (conditionally) associated with

response– Increased if added covariates that are (conditionally) correlated

with the POI

• After adjusting for a confounder, the precision of the estimates for response-POI association can be larger or smaller than in an unadjusted analysis



134

Proportional Hazards Regression




Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.22)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.21)

Precision 0.0 0.0 0.00 1.0 1.0 0.7 (0.21) 1.0 (0.22)

Confound 0.3 0.3 0.15 1.0 0.0 0.2 (0.21) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.1 (0.20) 0.0 (0.22)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.23)

Next



135





Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.22)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.21)

Precision 0.0 0.0 0.00 1.0 1.0 0.7 (0.21) 1.0 (0.22)

Confound 0.3 0.3 0.15 1.0 0.0 0.2 (0.21) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.1 (0.20) 0.0 (0.22)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.23)

Next



136





Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.22)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.21)

Precision 0.0 0.0 0.00 1.0 1.0 0.7 (0.21) 1.0 (0.22)

Confound 0.3 0.3 0.15 1.0 0.0 0.2 (0.21) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.1 (0.20) 0.0 (0.22)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.23)

Next



137





Irrelevant 0.0 0.0 0.00 0.0 0.0 0.0 (0.20) 0.0 (0.20)

Precision 0.0 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.22)

Precision - 0.3 0.0 0.00 1.0 0.0 0.0 (0.21) 0.0 (0.21)

Precision 0.0 0.0 0.00 1.0 1.0 0.7 (0.21) 1.0 (0.22)

Confound 0.3 0.3 0.15 1.0 0.0 0.2 (0.21) 0.0 (0.21)

Confound 0.0 0.3 0.15 1.0 0.0 0.1 (0.20) 0.0 (0.22)

Var Inflatn 0.0 1.0 0.45 0.0 0.0 0.0 (0.20) 0.0 (0.23)

Next



138

Take Home Message: Hazard Ratios


• If adjustment would be linear, mean (not median, etc) matters most

• Even if no confounding, adjusted and unadjusted estimates will be different if new covariate (conditionally) associated with response (PH regression is related to logistic regression)– Adjusting for a precision variable will “deattenuate” HR (and

estimate)– The amount of “deattenuation” depends on

• The strength of association between the added covariate and the response, and

• The adjusted strength of association between the POI and the response



139

Take Home Message: Hazard Ratios

• Standard errors of estimated HR measuring POI – response association are largely driven by the mean-variance relationship of the sample size ratios– Unless there is a very strong association, the SE tends to be fairly

constant for a range of different HRs

• Holding the SE constant, we will obtain a lower p value with a deattenuated HR



140

Precision: Linear Regression

• E.g., X, W independent in population (or completely randomized experiment) AND W associated with Y independent of X

1111

1111

2

ˆˆˆˆˆˆ Errs Std

ˆˆ Slopes

Estimates Value True

00

esessese

XW



141

Precision: Logistic Regression

• Adjusting for a precision variable • Deattenuates slope away from the null• Standard errors reflect mean-variance relationship

– Substantially increased power only in extreme cases» (OR > 5 for equal samples sizes of binary W)

1111

11111

11111


ˆˆ :0

ˆˆ :0 Slopes


esessese



142

Precision: Poisson Regression

• Adjusting for a precision variable (with robust SE)• No effect on the slope (similar to linear regression)

– log ratios are linear in log means• Standard errors reflect mean-variance relationship

– Virtually no effect on power

1111

1111


ˆˆ Slopes


esessese



143

Precision: PH Regression

• Adjusting for a precision variable • Deattenuates slope away from the null• Standard errors stay fairly constant

– (Complicated result of binomial mean-variance)

1111

11111

11111


ˆˆ :0

ˆˆ :0 Slopes


esessese



144

Lin Reg: Stratified Randomization

• Stratified (orthogonal) randomization in a designed experiment– Also when designing observational study with matching

• Unless we adjust for the stratification variables, we estimate the true standard errors incorrectly

1111

1111

2


ˆˆ Slopes


00

esessese

rXW



145

Multiple Adjustment

• We can generalize the results about the difference in mean confounder values across a groups defined by a binary POI


– β1 = Diff in mean Y for groups differing by 1 unit in X• (The distribution of W might differ across groups being compared)

– γ1 = Diff in mean Y for groups differing by 1 unit in X, but agreeing in their values of all W’s

iiiiiii WWXWWXYE 23121021 ,,, :Model Adjusted

iii XXYE 10 :Model Unadjusted



146

Multiple Adjustment: Good News• For a binary POI, the slope of the unadjusted model will tend to be

• Hence, true adjusted and unadjusted slopes for X are definitelyestimating the same quantity if for each j

• (Of course, it is also possible the various terms cancel each other out…)

p

jXjXjj WW

101111

covariatesother for adjusting afteroutcome of predictivenot is Covariate

0

group POIeach in equal are valuescovariateMean

1

01

j

XjXj

OR

WW



147

Multiple Adjustment: Bad News• We must consider the conditional relationships between a covariate and

the POI after adjustment for other variables• Consider four possible models involving X, W, and Z

• Deciding between models U and W, we only need consider ρXW

• Deciding between models Z and WZ, we need to consider ρXZ and ρWZ as wel

iZiWiXiiii

iZiXiii

iWiXiii

iXii

ZWXZWXYE

ZXZXYE

WXWXYE

XXYE

0

0

0

0

,, : WZModel

, : ZModel

, : WModel

: UModel



148

Multiple Adjustment: Bad News

• Even though W and X might be marginally uncorrelated, our models Z and ZW estimate the same quantity only if – X | Z is uncorrelated with W | Z, OR– Y | X, Z is uncorrelated with W | X, Z

WX

W

XW

WZXZXX

Z

WWZ

X

ZXZ

X

WXW

XW

WXX

XXWX

WXWXX

XW

XW

2

0

2

0

11

iZiWiXiiii

iZiXiii

iWiXiii

iXii

ZWXZWXYE

ZXZXYE

WXWXYE

XXYE

0

0

0

0

,, : WZModel

, : ZModel

, : WModel

: UModel



149

Take Home Messages

• “Everything should be as simple as possible, but no simpler”

• Understanding the role of confounding and precision allows some generalizations, but there are details that can mess us up– Linear regression: We have formulas to guide us when modeling

differences in means– Other regressions: Generally we use computers in an iterative

search, but each step is often a linear regression

• When adjusting for multiple variables, each slope parameter is interpreted by conditioning on the other variables in the model– We have to talk about adjusting for “residual confounding”– We have to talk about “added precision”

Documents

Biost 536 / Epi 536 Categorical Data Analysis in EpidemiologyOct 14, 2014 · Lecture 6: Review of Regression October 14, 2014 Categorical Data Analysis, AUT 2014 1 1 Biost 536