Overview of Linear Models
Webinar: Tuesday, May 22, 2012

Deborah Rosenberg, PhD
Research Associate Professor
Division of Epidemiology and Biostatistics
University of IL School of Public Health

Training Course in MCH Epidemiology

[Slide background graphics, repeated on each slide: density of Student's t with 10 d.f.; chi-square densities for 1, 2, 3, 5, and 8 d.f.; a 2x2 table layout crossing exposure (or person, place, or time variable) with disease or other health outcome.]

Training Course in MCH EPI, 2012

Course Topics Focusing on Multivariable Regression

• Model Building Approaches
• Modeling Ordinal and Nominal Outcomes
• Multilevel Modeling
• Trend Analysis
• Population Attributable Fraction
• Propensity Scores
• Modeling Risk Differences

We need to have some perspective ...


Introduction

So, let's keep this in mind:

"...technical expertise and methodology are not substitutes for conceptual coherence. Or, as one student remarked a few years ago, public health spends too much time on the "p" values of biostatistics and not enough time on values."

Jonathan M. Mann. Medicine and Public Health, Ethics and Human Rights. The Hastings Center Report, Vol. 27, No. 3 (May - Jun., 1997), pp. 6-13. Published by: The Hastings Center.


Introduction

Multivariable analysis implies acknowledging and accounting for the intricacies of the real world reflected in the relationships among a set of variables.

Multivariable analysis is complex, particularly with observational as opposed to experimental data.

The accuracy of estimates from multivariable analysis and therefore the accuracy of conclusions drawn and any public health action taken is dependent on the application of appropriate analytic methods.


Introduction

The challenge for an MCH epidemiologist goes beyond carrying out complex multivariable analysis to include:

• advocating for and facilitating the routine incorporation of complex multivariable methods into the work of public health agencies, and
• guiding interpretation of findings
• working to design reporting templates
• working to build dissemination strategies
• working to link findings with action plans or policy recommendations


Review of the Basics

Basic Components of Any Statistical Analysis

1. Sample statistic(s) (observed value(s)): a mean, a proportion (p), or a rate (r)

2. Population parameter(s) (expected value(s)): the corresponding population mean, proportion, or rate

3. Sample size: n

4. Sample variance(s) / standard error(s)

5. Critical values from the appropriate probability distribution: z, t, chi-square, F


Review of the Basics

The study design and the sampling strategy (cohort, case-control, cross-sectional, longitudinal, etc.) will have an impact on the statistical analysis that can be carried out:

• Which measures of occurrence can be reported
• Which measures of association can be reported
• How standard errors for confidence intervals and statistical testing will be calculated


Review of the Basics

Measures of Occurrence

Means summarize continuous variables and are assumed to follow a normal distribution.

Proportions summarize discrete variables and are assumed to follow the Binomial distribution.

Some proportions are also said to be Poisson distributed if the numerator is very small compared to the denominator.

Rates, also based on discrete variables, are typically said to be Poisson distributed.


Review of the Basics

Measures of Association

Difference Measures

• Between two or more means
• Between two or more proportions (attributable risk)
• Between a mean & a standard
• Between a proportion & a standard

Ratio Measures

• Relative Risk / Relative Prevalence
• Odds Ratio
• Rate Ratio / Hazard Ratio


Review of the Basics

The 2x2 table: framework for constructing the ratio measures

                                   Disease or Other Health Outcome
                                   Yes            No             Total
Exposure or Person,       Yes      a              b              a + b (n1)
Place, or Time Variable   No       c              d              c + d (n2)
                          Total    a + c (m1)     b + d (m2)     a + b + c + d (N)

$$RR \text{ and } RP = \frac{a/(a+b)}{c/(c+d)} = \frac{a/n_1}{c/n_2} = \frac{p_1}{p_2} \ \text{or}\ \frac{r_1}{r_2}$$

$$OR = \frac{a/b}{c/d} = \frac{ad}{bc}$$
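As a minimal sketch of the ratio measures built from the 2x2 table, the Python below computes the relative risk (or relative prevalence) and the odds ratio from the four cell counts a, b, c, d. The function names and the example counts are illustrative assumptions, not values from the webinar.

```python
def relative_risk(a, b, c, d):
    """RR (or RP): risk in the exposed, a/(a+b), over risk in the unexposed, c/(c+d)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR: odds of the outcome in the exposed, a/b, over odds in the unexposed, c/d."""
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed have the outcome.
    a, b, c, d = 30, 70, 10, 90
    print(f"RR = {relative_risk(a, b, c, d):.2f}")
    print(f"OR = {odds_ratio(a, b, c, d):.2f}")
```

With these made-up counts the RR is 3.0 and the OR is about 3.86; the two measures drift apart as the outcome becomes more common.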


Review of the Basics

Assessing the Accuracy of Statistics

We use probability distributions to evaluate how close or far from the “truth” our statistics are by calculating a range of values which includes the “true” population value with a given probability. This range is a confidence interval, and can be calculated around both measures of occurrence, e.g. incidence or prevalence, and measures of association, e.g. odds ratios or relative risks.
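The slide does not name a specific interval method, so the following is only a sketch under stated assumptions: a log-based (Woolf) interval for an odds ratio and a normal-approximation (Wald) interval for a proportion, both using the critical value z = 1.96 for approximately 95% confidence. Function names and example counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Woolf (log-based) confidence interval for the odds ratio of a 2x2 table.

    z is the critical value from the standard normal distribution
    (1.96 gives an approximate 95% interval).
    """
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper


def proportion_ci(events, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = events / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se


if __name__ == "__main__":
    # Hypothetical counts, not data from the webinar.
    print(odds_ratio_ci(30, 70, 10, 90))
    print(proportion_ci(30, 100))
```

The odds ratio interval is symmetric on the log scale, not on the ratio scale, which is why the upper limit sits farther from the estimate than the lower limit.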


Review of the Basics

Tests of Statistical Significance

Confidence intervals around measures of association provide evidence for or against equality.

Statistical tests go beyond this by generating a specific probability: the probability of seeing a difference at least as large as the one in our sample if chance imposed by the sampling process were the only thing operating (that is, if there were no true difference).

This probability is the p-value.


Review of the Basics

We again use probability distributions to formally test hypotheses about sample statistics.
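As one hedged illustration of a formal test, the sketch below computes the Pearson chi-square statistic (without continuity correction) for a 2x2 table and its p-value from the chi-square distribution with 1 d.f. The choice of this particular test, the function name, and the counts are assumptions made for illustration.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and p-value, 1 d.f."""
    n = a + b + c + d
    n1, n2 = a + b, c + d          # row totals (exposed, unexposed)
    m1, m2 = a + c, b + d          # column totals (outcome yes, outcome no)
    chi2 = n * (a * d - b * c) ** 2 / (n1 * n2 * m1 * m2)
    # For 1 d.f., P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed have the outcome.
    chi2, p = chi_square_2x2(30, 70, 10, 90)
    print(f"chi-square = {chi2:.2f}, p = {p:.5f}")
```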


Review of the Basics

Multivariable modeling should be the culmination of an analytic strategy that includes articulating a conceptual framework and carrying out preliminary analysis.

BEFORE any multivariable modeling:

• Select variables of interest
• Define levels of measurement, sometimes more than once, for a given variable
• Examine univariate distributions
• Examine bivariate distributions


Review of the Basics

BEFORE any multivariable modeling—

• Perform single factor stratified analysis to assess confounding and effect modification
• Rethink variables and levels of measurement
• Perform multiple factor stratified analysis with different combinations of potential confounders / effect modifiers

These steps should never be skipped!


Review of the Basics

With confounding, the association between a risk factor and a health outcome is the same (or close to the same) in each stratum, but the adjusted association differs from the crude.

With effect modification, the association between a risk factor and a health outcome varies from stratum to stratum.

Confounding                          Effect Modification
Compare crude v. adjusted OR/RR      Compare stratum-specific OR/RR
No statistical testing               Statistical testing


Review of the Basics

Assessing Effect Modification

• Stratified Analysis: Are the stratum-specific measures of association different (heterogeneous)?

• Regression Analysis: Is the beta coefficient for the product (interaction) term, formed by multiplying two variables, large?

Regardless of the method, if the stratum-specific estimates differ, then reporting a weighted average will mask the important stratum-specific differences.

Stratum-specific differences can be statistically tested.
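The slide does not say which test is meant. As a sketch under that caveat, the code below computes stratum-specific odds ratios and the Woolf chi-square test of homogeneity, one of several possible tests (Breslow-Day is another). The function name and the example strata are hypothetical.

```python
import math


def woolf_homogeneity(strata):
    """Woolf test of homogeneity of odds ratios across strata.

    strata: list of (a, b, c, d) cell counts, one 2x2 table per stratum.
    Returns the stratum-specific ORs, the chi-square statistic, and its d.f.
    """
    log_ors, weights = [], []
    for a, b, c, d in strata:
        log_ors.append(math.log((a * d) / (b * c)))
        weights.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))  # inverse variance of ln(OR)
    # Inverse-variance weighted average of the stratum-specific ln(OR)s.
    pooled = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
    chi2 = sum(w * (l - pooled) ** 2 for w, l in zip(weights, log_ors))
    return [math.exp(l) for l in log_ors], chi2, len(strata) - 1


if __name__ == "__main__":
    # Hypothetical strata: OR = 6 in the first stratum, OR = 1 in the second.
    ors, chi2, df = woolf_homogeneity([(40, 60, 10, 90), (20, 80, 20, 80)])
    print(ors, round(chi2, 2), df)
```

A statistic well above the chi-square critical value for the stated d.f. (3.84 for 1 d.f. at the 0.05 level) suggests the stratum-specific odds ratios are heterogeneous, in which case a single weighted average would hide the differences.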


Review of the Basics

Assessing Confounding

• Standardization: Does the standardized measure differ from the unstandardized measure?

• Stratified Analysis: Does the adjusted measure of association differ from the crude measure of association?

• Regression Analysis: Does the beta coefficient for a variable in a model that includes a potential confounder differ from the beta coefficient for that same variable in a model that does not include the potential confounder?


Review of the Basics

Assessing Confounding

Regardless of the method, if the adjusted estimate differs from the crude estimate of association, then confounding is present.

Determining whether a difference between the crude and adjusted measures is meaningful is a matter of judgment, since there is no formal statistical test for the presence of confounding.

By convention, epidemiologists consider confounding to be present if the adjusted measure of association differs from the crude measure by 10% or more.
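The comparison of crude and adjusted measures can be sketched as follows, assuming a stratified analysis with the Mantel-Haenszel summary odds ratio as the adjusted measure. The percent difference is taken relative to the adjusted estimate, which is an assumption here (some analysts divide by the crude estimate instead); function names and counts are hypothetical.

```python
def crude_or(strata):
    """Odds ratio from the single table obtained by summing all strata."""
    a, b, c, d = (sum(cells) for cells in zip(*strata))
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata of (a, b, c, d) counts."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def percent_difference(crude, adjusted):
    """Difference between crude and adjusted estimates, as a percent of the adjusted."""
    return 100 * abs(crude - adjusted) / adjusted


if __name__ == "__main__":
    # Hypothetical strata: the OR is 2 within each stratum.
    strata = [(80, 20, 20, 10), (10, 20, 20, 80)]
    crude, adjusted = crude_or(strata), mantel_haenszel_or(strata)
    print(f"crude OR = {crude:.2f}, adjusted OR = {adjusted:.2f}")
    print(f"difference = {percent_difference(crude, adjusted):.0f}%")
```

In this made-up example the stratum-specific odds ratios agree (both equal 2) while the crude odds ratio is about 5.06, the pattern described for confounding.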


Review of the Basics

Moving toward Multivariable Modeling: Jointly Assessing a Set (but which set?) of Variables

“A sufficient confounder group is a minimal set of one or more risk factors whose simultaneous control in the analysis will correct for joint confounding in the estimation of the effect of interest. Here, 'minimal' refers to the property that, for any such set of variables, no variable can be removed from the set without sacrificing validity.”

Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic Research: Principles and Quantitative Methods. Van Nostrand Reinhold Company, New York, 1982, p. 276.


Linear Models: General Considerations

The most common regression models used to analyze health data express the hypothesized association between risk or other factors and an outcome as a linear (straight line) relationship:

$$\text{Outcome}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

(Dependent variable on the left; independent variables on the right.)

This equation is relevant to any linear model; what differentiates one modeling approach from another is the structure of the outcome variable and the corresponding structure of the errors.


Linear Models: General Considerations

The straight line relationship includes an intercept and one or more slope parameters.

The differences between the actual data points and the regression line are the errors.

$$\text{Outcome}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
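To make the intercept, slope, and errors concrete, here is a minimal sketch of ordinary least squares with a single independent variable, written in plain Python. The function names and the data points are illustrative assumptions.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Errors: observed outcome minus the value on the fitted line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    # Hypothetical data points scattered around a straight line.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    b0, b1 = fit_line(x, y)
    print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
    print("errors:", [round(e, 3) for e in residuals(x, y, b0, b1)])
```

The fitted errors sum to zero (up to rounding), which is a property of least squares whenever the model includes an intercept.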


Linear Models: General Considerations

Regression analysis is an alternative to and an extension of simpler methods used to test hypotheses about associations:

For means, regression analysis is an extension of t-tests and analysis of variance.

For proportions or rates, regression analysis is an extension of chi-square tests from contingency tables (crude and stratified analysis).
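One way to see the connection for means is to regress a continuous outcome on a single 0/1 variable: the intercept is the mean in the group coded 0 and the slope is the difference in means between the two groups. The sketch below checks this numerically; the helper names and the birth weight values are hypothetical.

```python
def ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Hypothetical birth weights (grams) for two groups, coded X = 0 and X = 1.
group0 = [3200.0, 3400.0, 3100.0, 3300.0]
group1 = [2900.0, 3000.0, 3100.0, 2800.0]

x = [0] * len(group0) + [1] * len(group1)
y = group0 + group1
intercept, slope = ols(x, y)

# Intercept = mean of the X = 0 group; slope = difference in means,
# the quantity a two-sample t-test evaluates.
print(intercept, mean(group0))
print(slope, mean(group1) - mean(group0))
```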


Linear Models: General Considerations

Why not just do stratified analysis? Why Use Regression Modeling Approaches?

Unlike stratified analysis, regression approaches:

1. more efficiently handle many variables and the sparse data that stratification by many factors may imply

2. can accommodate both continuous and discrete variables, both as outcomes and as independent variables.


Linear Models: General Considerations

Unlike stratified analysis, regression approaches:

3. allow for examination of multiple factors (independent variables) simultaneously in relation to an outcome (dependent variable)—all variables can be considered "exposures" or "covariates" depending on the hypotheses

4. provide more flexibility in assessing effect modification and controlling confounding.


Linear Models: General Considerations

The Purpose of Modeling

Sometimes, regression modeling is carried out in order to assess one association; other variables are included to adjust for confounding or account for effect modification. In this scenario, the focus is on obtaining the ‘best’ estimate of the single association.

Sometimes, regression modeling is carried out in order to assess multiple, competing exposures, or to identify a set of variables that together predict the outcome.


Linear Models: General Considerations

The utility of regression models is their ability to simultaneously handle many independent variables.

Models may be quite complex, including both continuous and discrete measures, and measures at the individual level and/or at an aggregate level such as census tract, zip code, or county.

Interpretation of the slopes or “beta coefficients” can be equally complex as they reflect measures of occurrence (means, proportions, rates) or measures of association (odds ratios, relative risks, rate ratios) when used singly or in combination.


Linear Models: General Considerations

The Traditional, 'Normal' Regression Model

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

This model has the following properties:

• The outcome "Y" is continuous & normally distributed.
• The Y values are independent.
• The errors are independent, normally distributed; their sum equals 0, with constant variance across levels of X.
• The expected value (mean) of the Y's is linearly related to X (a straight line relationship exists).


Linear Models: General Considerations

When the outcome variable is not continuous and normally distributed, a linear model cannot be written in the same way, and the properties listed above no longer pertain.

For example, if the outcome variable is a proportion or rate:

• The errors are not normally distributed.
• The variance across levels of X is not constant. (By definition, p(1-p) changes with p and r changes with r.)
• The expected value (proportion or rate) is not linearly related to X (a straight line relationship does not exist).
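A small numeric sketch of the non-constant variance point: the variance of a sample proportion, p(1-p)/n, depends on p itself, so it cannot stay constant across levels of X when p changes with X. The function name and the sample size of 100 are illustrative assumptions.

```python
def proportion_variance(p, n):
    """Variance of a sample proportion under the binomial distribution."""
    return p * (1 - p) / n


if __name__ == "__main__":
    # The variance peaks at p = 0.5 and shrinks toward 0 as p approaches 0 or 1.
    for p in (0.05, 0.25, 0.50, 0.75, 0.95):
        print(f"p = {p:.2f}  variance = {proportion_variance(p, 100):.6f}")
```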


Linear Models: General Considerations

When an outcome is a proportion or rate, its relationship with a risk factor is not linear.

[Figure: proportion with the outcome (vertical axis, 0.0 to 1.0) plotted against x]


Linear Models: General Considerations

Generalized Linear Models

How can a linear modeling approach be applied to the many health outcomes that are proportions or rates?

The normal, binomial, Poisson, exponential, chi-square, and multinomial distributions are all in the exponential family.

Therefore, it is possible to define a “link function” that transforms an outcome variable from any of these distributions so that it is linearly related to a set of independent variables; the error terms can also be defined to correspond to the form of the outcome variable.


Linear Models: General Considerations

Generalized Linear Models

Some common link functions:

• identity (untransformed)
• natural log
• logit
• cumulative logit
• generalized logit

The interpretation of the parameter estimates—the beta coefficients—changes depending on whether and how the outcome variable has been transformed (which link function has been used).


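Linear Models: General Considerations

The logit link function (logistic regression):

Linear equation:

$$\ln\left(\frac{p}{1-p}\right) = b_0 + b_1 X$$

Non-linear equation:

$$p = \frac{e^{b_0 + b_1 X}}{1 + e^{b_0 + b_1 X}}$$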
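The two forms of the logistic model can be sketched in a few lines of Python: the logit turns a proportion into a quantity that is linear in X, and the inverse logit turns the linear predictor back into a proportion between 0 and 1. For a single 0/1 exposure, the slope on the logit scale equals the log odds ratio. The function names and the example proportions are assumptions for illustration, not a fitting routine.

```python
import math


def logit(p):
    """Logit link: natural log of the odds, ln(p / (1 - p))."""
    return math.log(p / (1 - p))


def inverse_logit(eta):
    """Back-transform a linear predictor to a proportion."""
    return 1 / (1 + math.exp(-eta))


# Hypothetical proportions with the outcome: 0.10 if unexposed (X = 0), 0.30 if exposed (X = 1).
p_unexposed, p_exposed = 0.10, 0.30
b0 = logit(p_unexposed)                   # intercept: log odds when X = 0
b1 = logit(p_exposed) - b0                # slope: change in log odds from X = 0 to X = 1
odds_ratio = math.exp(b1)                 # exponentiated slope

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, OR = {odds_ratio:.2f}")
print(f"p at X = 1: {inverse_logit(b0 + b1 * 1):.2f}")
```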


Linear Models: General Considerations

The natural log link function (log-binomial or Poisson regression with count data):

The linear model:

$$\ln(r) = b_0 + b_1 X$$

Non-linear model:

$$r = \frac{\text{count}}{n} = e^{b_0 + b_1 X}, \qquad \text{count} = n \, e^{b_0 + b_1 X}$$
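A brief sketch of the log link with a single 0/1 exposure: the intercept is the log of the rate in the unexposed, the slope is the difference in log rates, and exponentiating the slope gives the rate ratio. The function name, the rates, and the population size of 1,000 are hypothetical.

```python
import math


def expected_count(n, b0, b1, x):
    """Expected count under a log link: count = n * exp(b0 + b1 * x)."""
    return n * math.exp(b0 + b1 * x)


# Hypothetical rates: 10 events per 1,000 unexposed, 30 per 1,000 exposed.
r_unexposed, r_exposed = 10 / 1000, 30 / 1000
b0 = math.log(r_unexposed)                 # ln(r) when X = 0
b1 = math.log(r_exposed) - b0              # change in ln(r) from X = 0 to X = 1
rate_ratio = math.exp(b1)                  # exponentiated slope

print(f"rate ratio = {rate_ratio:.2f}")
print(f"expected count among 1,000 exposed = {expected_count(1000, b0, b1, 1):.1f}")
```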


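Linear Models: General Considerations

'Normal' Regression: Link=Identity, Dist=Normal

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

Logistic Regression: Link=Logit, Dist=Binomial

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$

Log-Binomial or Poisson Regression with Count Data: Link=Log, Dist=Binomial or Dist=Poisson

$$\ln(p) \ \text{or}\ \ln(r) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$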


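36

Linear Models: General Considerations

Ordinal and Nominal Model

For an ordinal outcome with k+1 categories (both the numerator and denominator change):

$$\ln(\text{Odds}_1) = \ln\left(\frac{p_1}{1 - p_1}\right)$$

$$\ln(\text{Odds}_2) = \ln\left(\frac{p_1 + p_2}{1 - (p_1 + p_2)}\right)$$

$$\vdots$$

$$\ln(\text{Odds}_k) = \ln\left(\frac{p_1 + p_2 + \cdots + p_k}{1 - (p_1 + p_2 + \cdots + p_k)}\right)$$

For a nominal outcome with k+1 categories (fixed denominator, the reference category):

$$\ln(\text{Odds}_1) = \ln\left(\frac{p_1}{1 - (p_1 + p_2 + \cdots + p_k)}\right)$$

$$\ln(\text{Odds}_2) = \ln\left(\frac{p_2}{1 - (p_1 + p_2 + \cdots + p_k)}\right)$$

$$\vdots$$

$$\ln(\text{Odds}_k) = \ln\left(\frac{p_k}{1 - (p_1 + p_2 + \cdots + p_k)}\right)$$

http://www.indiana.edu/%7Estatmath/stat/all/cat/2b1.html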


Some Models with Correlated Errors

Mixed Models

♦ Multilevel/clustered data
♦ Repeated measures/longitudinal data
♦ Matched data
♦ Time series analysis
♦ Spatial analysis

37

Linear Models: General Considerations


Some Other Multivariable Statistical Approaches

● Survival Analysis: censored data
    Parametric
    Semi-parametric / proportional hazards

● Structural Equation Modeling / mediation analysis—exploring causal pathways

● Bayesian modeling

38

Linear Models: General Considerations


39

Regression Modeling Results

Measures of Occurrence
Predicted Values: Crude, Adjusted, or Stratum-Specific

The predicted values are points on the regression line given particular values of the set of independent variables

'Normal' model yields means
Logistic model yields ln(odds)
Binomial / Poisson models yield ln(proportions / rates)

Linear Models: General Considerations


40

Linear Models: General Considerations

Regression Modeling Results

Measures of Association
Beta coefficients: Crude, Adjusted, or Stratum-Specific

The measures of association are comparisons of points on the regression line at differing values of the independent variables


41

Linear Models: General Considerations

| Regression Modeling Approaches | Measures of Association |
|---|---|
| 'Normal' regression | Differences between means |
| Log-Binomial or Poisson regression | Differences between log proportions: Relative Risk / Relative Prevalence |
| Logistic regression (binary, cumulative, generalized) | Differences between log odds: Odds Ratio(s) for a single binary outcome, a set of binary outcomes, an ordinal outcome |
| Binomial regression | Differences between proportions: Risk Differences / Attributable Risks |
| Poisson regression (person-time data) | Differences between log rates: Rate Ratio |


42

Linear Models: General Considerations

Regression Modeling Results

Measures of Association

General Form of Confidence Intervals and Hypothesis Testing for a Simple Comparison: a Single Beta Coefficient

$$\text{Test Statistic} = \frac{\text{Observed Beta} - \text{Expected Beta}}{\text{Standard Error of Observed Beta}}$$

$$CI = \left(b_1 \pm z_{1-\alpha/2} \times \text{s.e.}(b_1)\right) \times \text{diff}$$


43

Common Linear Regression Models

Examples with Smoking and Birthweight


44

'Normal' Regression

Predicted Values (Means):

Predicted values use the entire regression equation, including the intercept.

$$Y_{x=\text{Value A}} = b_0 + b_1(\text{Value A})$$

$$Y_{x=\text{Value B}} = b_0 + b_1(\text{Value B})$$

Measures of Association (Differences Between Means):

When comparing two predicted values (a measure of association), the intercept terms cancel out.

$$Y_{x=\text{Value A}} - Y_{x=\text{Value B}} = \left[b_0 + b_1(\text{Value A})\right] - \left[b_0 + b_1(\text{Value B})\right] = b_1(\text{Value A} - \text{Value B})$$


'Normal' Regression in SAS

/* Continuous Birthweight, OLS Regression */
proc reg data=one;
  model dbirwt = smoking;
run;
proc reg data=one;
  model dbirwt = smoking late_no_pnc;
run;

/* Continuous Birthweight, Regression Using ML */
proc genmod data=one;
  model dbirwt = smoking / link=identity dist=normal;
run;
proc genmod data=one;
  model dbirwt = smoking late_no_pnc / link=identity dist=normal;
run;

45


‘Normal’ Regression

Descriptive Statistics and

Simple t-test for Smoking and Birthweight

46

The TTEST Procedure
Variable: DBIRWT (Birth Weight Detail in Grams)

smoking          N     Mean    Std Dev   Std Err      DF   t Value   Pr > |t|
yes           9259   3155.9      575.6    5.9824
no           71549   3352.7      568.2    2.1244
Diff (1-2)           -196.9      569.1    6.2854   80806    -31.32     <.0001
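As a cross-check, the pooled standard deviation, the standard error of the difference, and the t value in this output can be approximately recovered from the two group summaries alone. The sketch below is in Python rather than SAS and uses the rounded means and standard deviations printed above, so it agrees with the SAS values only to within rounding; the variable names are illustrative.

```python
import math

# Group summaries as printed in the TTEST output (rounded values).
n_smk, mean_smk, sd_smk = 9259, 3155.9, 575.6
n_non, mean_non, sd_non = 71549, 3352.7, 568.2

# Pooled (equal-variance) two-sample t-test.
df = n_smk + n_non - 2
pooled_var = ((n_smk - 1) * sd_smk**2 + (n_non - 1) * sd_non**2) / df
pooled_sd = math.sqrt(pooled_var)
se_diff = pooled_sd * math.sqrt(1 / n_smk + 1 / n_non)

diff = mean_smk - mean_non
t_value = diff / se_diff

print(f"df = {df}, pooled SD = {pooled_sd:.1f}, SE = {se_diff:.4f}, t = {t_value:.2f}")
```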


47

'Normal' Regression

“dbirwt” = Birthweight (grams) from vital records

Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                 1        317799174     317799174    981.24   <.0001
Error             80806      26171015889        323875
Corrected Total   80807      26488815063

Root MSE          569.09987    R-Square   0.0120
Dependent Mean   3330.17903    Adj R-Sq   0.0120

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1           3352.73853          2.12758   1575.84     <.0001
smoking      1           -196.88822          6.28538    -31.32     <.0001


'Normal' Regression

model dbirwt = smoking;

Predicted value for smokers:

Mean birthweight = 3155.85 = 3352.74–196.89(1)

Predicted value for non-smokers:

Mean birthweight = 3352.74 = 3352.74–196.89(0)

Measure of Association / comparison of predicted values:

Difference between means = 3155.85-3352.74 = -196.89

95% CI = -196.89 +/- 1.96*6.29 = (-209.2, -184.6)
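The arithmetic on this slide can be reproduced with a short Python sketch (Python is used here only as a calculator; it is not part of the SAS workflow). The coefficients are the unrounded estimates from the `proc reg` output for `model dbirwt = smoking;`, and the function name is illustrative.

```python
# Estimates from the proc reg output for: model dbirwt = smoking;
b0 = 3352.73853      # intercept: mean birthweight (grams) among non-smokers
b1 = -196.88822      # smoking coefficient: difference between means
se_b1 = 6.28538      # standard error of the smoking coefficient

def predicted_mean(smoking):
    """Predicted mean birthweight; smoking is coded 1 = yes, 0 = no."""
    return b0 + b1 * smoking

# The measure of association is the difference of two predicted values;
# the intercept cancels, leaving b1.
diff = predicted_mean(1) - predicted_mean(0)

z = 1.96  # normal critical value for a 95% interval
ci = (diff - z * se_b1, diff + z * se_b1)

print(f"smokers: {predicted_mean(1):.2f}, non-smokers: {predicted_mean(0):.2f}")
print(f"difference: {diff:.2f}, 95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
```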

48


49

'Normal' Regression with OLS in SAS

Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                 2        375465933     187732967    580.92   <.0001
Error             80805      26113349130        323165
Corrected Total   80807      26488815063

Root MSE          568.47605    R-Square   0.0142
Dependent Mean   3330.17903    Adj R-Sq   0.0142
Coeff Var          17.07044

Parameter Estimates

Variable       DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept       1           3364.04030          2.28746   1470.64     <.0001
smoking         1           -190.55737          6.29636    -30.26     <.0001
late_no_pnc     1            -71.69983          5.36744    -13.36     <.0001
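To make the adjusted model concrete, the sketch below (Python, used only as a calculator) evaluates the fitted equation at each combination of the two covariates. It assumes both variables are coded 1 = yes and 0 = no, as in the crude model; the function name is illustrative.

```python
# Estimates from the proc reg output for: model dbirwt = smoking late_no_pnc;
b0 = 3364.04030       # intercept
b_smoking = -190.55737
b_late_no_pnc = -71.69983

def predicted_mean(smoking, late_no_pnc):
    """Adjusted predicted mean birthweight (grams); covariates assumed coded 1/0."""
    return b0 + b_smoking * smoking + b_late_no_pnc * late_no_pnc

for smoking in (0, 1):
    for late in (0, 1):
        print(f"smoking={smoking}, late_no_pnc={late}: "
              f"{predicted_mean(smoking, late):.2f}")

# Without an interaction term, the smoking difference is the same
# within each level of late_no_pnc: it is just the adjusted coefficient.
diff_early = predicted_mean(1, 0) - predicted_mean(0, 0)
diff_late = predicted_mean(1, 1) - predicted_mean(0, 1)
```

The adjusted difference (about -190.6 grams) is slightly smaller in magnitude than the crude difference (about -196.9 grams).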


50

Logistic Regression

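Predicted Values
When the outcome is a proportion with a logistic transformation, the predicted values are log odds

Dichotomous Independent Variable Coded 1 and 0:

$$\ln\left(\frac{p_{x=1}}{1 - p_{x=1}}\right) = b_0 + b_1(1) = b_0 + b_1$$

$$\ln\left(\frac{p_{x=0}}{1 - p_{x=0}}\right) = b_0 + b_1(0) = b_0$$

In general:

$$\ln\left(\frac{p_{x=\text{Value A}}}{1 - p_{x=\text{Value A}}}\right) = b_0 + b_1(\text{Value A})$$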


51

Logistic Regression

Measures of Association—Beta Coefficients—

Differences Between Log Odds, and the Odds Ratio

Dichotomous Independent Variable Coded 1 and 0

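$$\ln\left(\frac{p_{x=1}}{1 - p_{x=1}}\right) - \ln\left(\frac{p_{x=0}}{1 - p_{x=0}}\right) = \left[b_0 + b_1(1)\right] - \left[b_0 + b_1(0)\right] = b_1(1 - 0) = b_1$$

$$OR = e^{\left[b_0 + b_1(1)\right] - \left[b_0 + b_1(0)\right]} = \frac{e^{b_0 + b_1(1)}}{e^{b_0 + b_1(0)}} = e^{b_1(1 - 0)} = e^{b_1}$$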


52

Measures of Association—Beta Coefficients—Differences Between Log Odds, and the Odds Ratio

In general, the beta coefficient is the change in the logit for every unit change in X.

For an ordinal or continuous variable, the test of the beta coefficient will be a test of linear trend.

Logistic Regression

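$$\ln\left(\frac{p_{x=\text{Value A}}}{1 - p_{x=\text{Value A}}}\right) - \ln\left(\frac{p_{x=\text{Value B}}}{1 - p_{x=\text{Value B}}}\right) = \left[b_0 + b_1(\text{Value A})\right] - \left[b_0 + b_1(\text{Value B})\right] = b_1(\text{Value A} - \text{Value B})$$

$$OR = e^{\left[b_0 + b_1(\text{Value A})\right] - \left[b_0 + b_1(\text{Value B})\right]} = \frac{e^{b_0 + b_1(\text{Value A})}}{e^{b_0 + b_1(\text{Value B})}} = e^{b_1(\text{Value A} - \text{Value B})}$$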


53

Confidence Intervals for Estimated Odds Ratios from a Logistic Regression Model

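For dichotomous variables coded 1 and 0:

$$CI_{OR} = e^{\,b_1 \pm z_{1-\alpha/2} \times \text{s.e.}(b_1)}$$

In general, for a single beta coefficient:

$$CI_{OR} = e^{\left(b_1 \pm z_{1-\alpha/2} \times \text{s.e.}(b_1)\right) \times \text{diff}}$$

where "diff" is the difference of interest in the values of the independent variable being analyzed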
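A minimal Python sketch of this calculation follows; it is an illustration, not part of the SAS workflow. The helper name `odds_ratio_ci` and its arguments are hypothetical. The inputs are the crude smoking coefficient (0.6317) and standard error (0.0381) reported by `proc logistic` for the model of low birthweight on smoking, and a 95% interval is assumed (z = 1.96).

```python
import math

def odds_ratio_ci(b, se, diff=1.0, z=1.96):
    """Odds ratio and Wald confidence limits for a difference of `diff`
    units in the independent variable (diff = 1 for a 1/0 variable)."""
    estimate = math.exp(b * diff)
    limit_1 = math.exp((b - z * se) * diff)
    limit_2 = math.exp((b + z * se) * diff)
    # Sorting keeps (lower, upper) in order even when diff is negative.
    lower, upper = sorted((limit_1, limit_2))
    return estimate, lower, upper

# Crude association between smoking and low birthweight.
or_smoking, lower, upper = odds_ratio_ci(0.6317, 0.0381)
print(f"OR = {or_smoking:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the inputs are rounded to four decimals, the limits can differ from the SAS output (1.746, 2.026) in the third decimal place.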

Logistic Regression


Logistic Regression in SAS

/* Dichotomous Birthweight, Logistic Regression */
proc logistic order=formatted data=one;
  model lbw = smoking;
run;
proc logistic order=formatted data=one;
  model lbw = smoking late_no_pnc;
run;

proc genmod data=one;
  model lbw = smoking / link=logit dist=bin;
  estimate 'Crude OR smoking' smoking 1 / exp;
run;
proc genmod data=one;
  model lbw = smoking late_no_pnc / link=logit dist=bin;
  estimate 'AOR smoking' smoking 1 / exp;
  estimate 'AOR Late_no_pnc' late_no_pnc 1 / exp;
run;

54


55

Logistic Regression

First looking at a contingency table using proc freq in SAS
Crude Association between Smoking and Low Birthweight

smoking     lbw

Frequency |
Row Pct   |     yes |      no |   Total
----------+---------+---------+
yes       |     938 |    8321 |    9259
          |   10.13 |   89.87 |
----------+---------+---------+
no        |    4046 |   67503 |   71549
          |    5.65 |   94.35 |
----------+---------+---------+
Total          4984     75824     80808

                                  (Asymptotic) 95%      (Exact) 95%
              Risk      ASE       Confidence Limits     Confidence Limits
Row 1         0.1013    0.0031    0.0952    0.1075      0.0952    0.1076
Row 2         0.0565    0.0009    0.0549    0.0582      0.0549    0.0583
Total         0.0617    0.0008    0.0600    0.0633      0.0600    0.0634
Difference    0.0448    0.0033    0.0384    0.0511

Type of Study                 Value     95% Confidence Limits
Case-Control (Odds Ratio)     1.8807    1.7455    2.0264
Cohort (Col1 Risk)            1.7915    1.6743    1.9169
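The crude measures in this output follow directly from the four cell counts. The Python sketch below (an illustration outside the SAS workflow) recomputes the risks, risk difference, relative risk, and odds ratio; the confidence interval uses the usual large-sample standard error of the log odds ratio, which is assumed here to be what SAS reports as the asymptotic limits.

```python
import math

# Cell counts from the smoking-by-lbw table.
a, b = 938, 8321      # smokers:     low birthweight yes / no
c, d = 4046, 67503    # non-smokers: low birthweight yes / no

risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)
risk_diff = risk_exposed - risk_unexposed
rel_risk = risk_exposed / risk_unexposed
odds_ratio = (a * d) / (b * c)

# Large-sample (Woolf) standard error of ln(OR) and 95% limits.
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
or_lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
or_upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"risks: {risk_exposed:.4f} vs {risk_unexposed:.4f}, RD = {risk_diff:.4f}")
print(f"RR = {rel_risk:.4f}, OR = {odds_ratio:.4f} ({or_lower:.4f}, {or_upper:.4f})")
```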


56

Output from proc logistic

Logistic Regression

Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1    -2.8144           0.0162        30236.4216       <.0001
smoking      1     0.6317           0.0381          275.5238       <.0001

Odds Ratio Estimates

Effect     Point Estimate   95% Wald Confidence Limits
smoking             1.881   1.746    2.026

$$OR = \frac{e^{-2.8144 + 0.6317(1)}}{e^{-2.8144 + 0.6317(0)}} = e^{0.6317(1 - 0)} = e^{0.6317} = 1.88$$


57

Logistic Regression

Table 1 of smoking by lbw
Controlling for late_no_pnc=Late or No PNC

smoking     lbw

Frequency |
Row Pct   |     yes |      no |   Total
----------+---------+---------+
yes       |     289 |    1988 |    2277
          |   12.69 |   87.31 |
----------+---------+---------+
no        |     775 |   10503 |   11278
          |    6.87 |   93.13 |
----------+---------+---------+
Total          1064     12491     13555

Risk Diff     0.0582    0.0438-0.0727
Odds Ratio    1.9701    1.7070-2.2738
Cohort        1.8470    1.6261-2.0979

Table 2 of smoking by lbw
Controlling for late_no_pnc=First Trimester PNC

smoking     lbw

Frequency |
Row Pct   |     yes |      no |   Total
----------+---------+---------+
yes       |     649 |    6333 |    6982
          |    9.30 |   90.70 |
----------+---------+---------+
no        |    3271 |   57000 |   60271
          |    5.43 |   94.57 |
----------+---------+---------+
Total          3920     63333     67253

Risk Diff     0.0387    0.0316-0.0457
Odds Ratio    1.7858    1.6351-1.9503
Cohort        1.7127    1.5803-1.8563

Case-Control (OR) Mantel-Haenszel    1.8355    1.7028-1.9784
Cohort (RP) Mantel-Haenszel          1.7499    1.6349-1.8731

Is there evidence of any confounding or effect modification?
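One way to approach the question on this slide is to recompute the stratum-specific and Mantel-Haenszel summary estimates from the cell counts and compare them with the crude odds ratio of 1.88. The Python sketch below is an illustration outside the SAS workflow; the function names are hypothetical, and the formulas are the standard Mantel-Haenszel estimators.

```python
# Each stratum: (a, b, c, d) = smokers LBW yes/no, non-smokers LBW yes/no.
strata = {
    "Late or No PNC": (289, 1988, 775, 10503),
    "First Trimester PNC": (649, 6333, 3271, 57000),
}

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

def risk_ratio(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))

def mantel_haenszel_or(tables):
    # Weighted sums of a*d/n and b*c/n across strata.
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

def mantel_haenszel_rr(tables):
    num = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in tables)
    den = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in tables)
    return num / den

stratum_or = {name: odds_ratio(*t) for name, t in strata.items()}
stratum_rr = {name: risk_ratio(*t) for name, t in strata.items()}
mh_or = mantel_haenszel_or(strata.values())
mh_rr = mantel_haenszel_rr(strata.values())

for name in strata:
    print(f"{name}: OR = {stratum_or[name]:.4f}, RR = {stratum_rr[name]:.4f}")
print(f"Mantel-Haenszel OR = {mh_or:.4f}, RR = {mh_rr:.4f}")
```

The summary odds ratio (about 1.84) is close to the crude value (1.88), and the two stratum-specific odds ratios (1.97 and 1.79) bracket it; how much difference counts as meaningful confounding or effect modification is a judgment the analyst has to make.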


58

Logistic Regression

Output from proc logistic:

Analysis of Maximum Likelihood Estimates

Parameter      DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept       1    -2.8622           0.0176        26390.9295       <.0001
smoking         1     0.6064           0.0382          251.5499       <.0001
late_no_pnc     1     0.2739           0.0362           57.3687       <.0001

Odds Ratio Estimates

Effect         Point Estimate   95% Wald Confidence Limits
smoking                 1.834   1.701    1.977
late_no_pnc             1.315   1.225    1.412


59

Binomial and Poisson Regression

Predicted Values
When the outcome is a proportion with a natural log transformation, the predicted values are log proportions

$$\ln(\text{proportion/rate}) = b_0 + b_1(1) = b_0 + b_1$$

$$\ln(\text{proportion/rate}) = b_0 + b_1(0) = b_0$$

In general

$$\ln(\text{proportion/rate}) = b_0 + b_1(\text{Value A})$$


Binomial and Poisson Regression

Measures of Association—Beta Coefficients—Differences Between Log Proportions/rates, and the Relative Prevalence / Relative Risk

Dichotomous Independent Variable Coded 1 and 0

60

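$$\ln(p_{x=1}) - \ln(p_{x=0}) = \left[b_0 + b_1(1)\right] - \left[b_0 + b_1(0)\right] = b_1(1 - 0) = b_1$$

$$RP \text{ or } RR = e^{\left[b_0 + b_1(1)\right] - \left[b_0 + b_1(0)\right]} = \frac{e^{b_0 + b_1(1)}}{e^{b_0 + b_1(0)}} = e^{b_1(1 - 0)} = e^{b_1}$$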


Binomial and Poisson Regression

In General, the beta coefficient is the change in the log proportion / rate for every unit change in X.

61

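$$\ln(p_{x=\text{Value A}}) - \ln(p_{x=\text{Value B}}) = \left[b_0 + b_1(\text{Value A})\right] - \left[b_0 + b_1(\text{Value B})\right] = b_1(\text{Value A} - \text{Value B})$$

$$RP \text{ or } RR = e^{\left[b_0 + b_1(\text{Value A})\right] - \left[b_0 + b_1(\text{Value B})\right]} = \frac{e^{b_0 + b_1(\text{Value A})}}{e^{b_0 + b_1(\text{Value B})}} = e^{b_1(\text{Value A} - \text{Value B})}$$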


Binomial and Poisson Regression

The more common the outcome, the greater the difference in the binomial and Poisson standard errors

When the outcome is rare (e.g. per 10,000, per 100,000), the binomial and Poisson standard errors will be almost identical

62


63

Binomial and Poisson Regression

For infant mortality, calculated per 1,000 live births, what difference will using the binomial or Poisson distribution make?

Suppose the IMR is 7 per 1,000, or 0.007:

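$$\text{s.e.}_{\text{binomial}} = \sqrt{\frac{0.007 \times 0.993}{1500}} = \sqrt{\frac{0.00695}{1500}} = 0.00215$$

$$\text{s.e.}_{\text{Poisson}} = \sqrt{\frac{0.007}{1500}} = 0.00216$$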
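The comparison can be repeated for any proportion with a few lines of Python (an illustration outside the SAS workflow). The sample size of 1,500 is taken from the calculation on this slide; the 30% "common outcome" is a hypothetical value added for contrast, and the function names are illustrative.

```python
import math

def se_binomial(p, n):
    """Standard error of a proportion under the binomial distribution."""
    return math.sqrt(p * (1 - p) / n)

def se_poisson(p, n):
    """Standard error of a rate under the Poisson distribution."""
    return math.sqrt(p / n)

n = 1500

# Rare outcome: infant mortality rate of 7 per 1,000.
rare = (se_binomial(0.007, n), se_poisson(0.007, n))

# Hypothetical common outcome (30%), for contrast only.
common = (se_binomial(0.30, n), se_poisson(0.30, n))

print(f"rare:   binomial {rare[0]:.5f}, Poisson {rare[1]:.5f}")
print(f"common: binomial {common[0]:.5f}, Poisson {common[1]:.5f}")
```

The ratio of the Poisson to the binomial standard error is 1/sqrt(1 - p), so it is essentially 1 for a rare outcome and grows as the outcome becomes more common.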


/* Dichotomous Birthweight, Log-Binomial Regression */
proc genmod data=one;
  model lbw = smoking / link=log dist=bin;
  estimate 'Crude RP smoking' smoking 1 / exp;
run;
proc genmod data=one;
  model lbw = smoking late_no_pnc / link=log dist=bin;
  estimate 'ARP smoking' smoking 1 / exp;
  estimate 'ARP Late_no_pnc' late_no_pnc 1 / exp;
run;

/* Dichotomous Birthweight, Poisson Regression */
proc genmod data=one;
  model lbw = smoking / link=log dist=poisson;
  estimate 'Crude RP smoking' smoking 1 / exp;
run;
proc genmod data=one;
  model lbw = smoking late_no_pnc / link=log dist=poisson;
  estimate 'ARP smoking' smoking 1 / exp;
  estimate 'ARP Late_no_pnc' late_no_pnc 1 / exp;
run;

64

Binomial and Poisson Regression in SAS


Binomial and Poisson Regression

Output from proc genmod

65

Analysis Of Maximum Likelihood Parameter Estimates

Parameter   DF   Estimate   Standard Error   Wald 95% Confidence Limits   Wald Chi-Square   Pr > ChiSq
Intercept    1    -2.8727           0.0153      -2.9026      -2.8427              35389.4       <.0001
smoking      1     0.5831           0.0345       0.5154       0.6507               285.37       <.0001
Scale        0     1.0000           0.0000       1.0000       1.0000

NOTE: The scale parameter was held fixed.

Contrast Estimate Results

Label                    Mean Estimate   Mean Confidence Limits   L'Beta Estimate   Standard Error   Alpha
Crude RP smoking                1.7915   1.6743    1.9169                  0.5831           0.0345    0.05
Exp(Crude RP smoking)                                                      1.7915           0.0618    0.05
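Because the link is the natural log, exponentiating the predicted values returns proportions, and exponentiating the smoking coefficient returns the relative prevalence. The Python sketch below (an illustration outside the SAS workflow) uses the rounded estimates from this output and assumes smoking is coded 1 = yes, 0 = no; the results match the crude proportions from the contingency table (0.1013 and 0.0565) to within rounding.

```python
import math

# Estimates from the proc genmod output for: model lbw = smoking / link=log
b0, b1 = -2.8727, 0.5831
b1_lower, b1_upper = 0.5154, 0.6507   # Wald 95% limits for the smoking coefficient

# Predicted values are log proportions; exponentiate to get proportions.
p_nonsmokers = math.exp(b0)
p_smokers = math.exp(b0 + b1)

# The ratio of the two predicted proportions equals exp(b1).
relative_prevalence = p_smokers / p_nonsmokers
rp_ci = (math.exp(b1_lower), math.exp(b1_upper))

print(f"proportion LBW: smokers {p_smokers:.4f}, non-smokers {p_nonsmokers:.4f}")
print(f"RP = {relative_prevalence:.4f}, 95% CI = ({rp_ci[0]:.4f}, {rp_ci[1]:.4f})")
```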


66

Binomial and Poisson Regression

Table 1 of smoking by lbw
Controlling for late_no_pnc=Late or No PNC

smoking     lbw

Frequency |
Row Pct   |     yes |      no |   Total
----------+---------+---------+
yes       |     289 |    1988 |    2277
          |   12.69 |   87.31 |
----------+---------+---------+
no        |     775 |   10503 |   11278
          |    6.87 |   93.13 |
----------+---------+---------+
Total          1064     12491     13555

Risk Diff     0.0582    0.0438-0.0727
Odds Ratio    1.9701    1.7070-2.2738
Cohort        1.8470    1.6261-2.0979

Table 2 of smoking by lbw
Controlling for late_no_pnc=First Trimester PNC

smoking     lbw

Frequency |
Row Pct   |     yes |      no |   Total
----------+---------+---------+
yes       |     649 |    6333 |    6982
          |    9.30 |   90.70 |
----------+---------+---------+
no        |    3271 |   57000 |   60271
          |    5.43 |   94.57 |
----------+---------+---------+
Total          3920     63333     67253

Risk Diff     0.0387    0.0316-0.0457
Odds Ratio    1.7858    1.6351-1.9503
Cohort        1.7127    1.5803-1.8563

Case-Control (OR) Mantel-Haenszel    1.8355    1.7028-1.9784
Cohort (RP) Mantel-Haenszel          1.7499    1.6349-1.8731


Binomial and Poisson Regression

Output from proc genmod

67

Analysis Of Maximum Likelihood Parameter Estimates

Parameter      DF   Estimate   Standard Error   Wald 95% Confidence Limits   Wald Chi-Square   Pr > ChiSq
Intercept       1    -2.9174           0.0166      -2.9500      -2.8848              30804.5       <.0001
smoking         1     0.5593           0.0347       0.4913       0.6272               260.34       <.0001
late_no_pnc     1     0.2548           0.0333       0.1894       0.3201                58.39       <.0001
Scale           0     1.0000           0.0000       1.0000       1.0000

NOTE: The scale parameter was held fixed.

Contrast Estimate Results

Label                   Mean Estimate   Mean Confidence Limits   L'Beta Estimate   Standard Error   Alpha
ARP smoking                    1.7494   1.6345    1.8724                  0.5593           0.0347    0.05
Exp(ARP smoking)                                                          1.7494           0.0606    0.05
ARP Late_no_pnc                1.2901   1.2085    1.3773                  0.2548           0.0333    0.05
Exp(ARP Late_no_pnc)                                                      1.2901           0.0430    0.05


68

Binomial and Poisson Regression

Comparison between Binomial and Poisson Results

Binomial

Poisson


Cumulative and Generalized Logit

/* vlbw, mlbw, and normal bw as an ordinal variable */
proc logistic order=formatted data=one;
  model bwcat = smoking;
run;

/* vlbw, mlbw, and normal bw as a nominal variable */
proc logistic order=formatted data=one;
  model bwcat (ref='normal bw') = smoking / link=glogit;
run;

Since this is logistic regression, predicted values are log(odds) and the measures of association (the beta coefficients) are differences between log odds, which when exponentiated are odds ratios.

69


Cumulative and Generalized Logit

Output from proc logistic: Ordinal Birthweight


Value  bwcat      Frequency
1      vlbw             897
2      mlbw            4087
3      normal bw      75824

Score Test for the Proportional Odds Assumption

Chi-Square  DF  Pr > ChiSq
13.6865     1   0.0002

Analysis of Maximum Likelihood Estimates

Parameter       DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept vlbw  1   -4.5844   0.0343          17824.9677       <.0001
Intercept mlbw  1   -2.8138   0.0162          30242.4897       <.0001
smoking         1    0.6262   0.0381            270.3336       <.0001

Odds Ratio Estimates

Effect   Point Estimate  95% Wald Confidence Limits
smoking  1.871           1.736   2.016
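As a rough check on how to read this output, the Python sketch below converts the two intercepts and the single smoking coefficient into cumulative probabilities and an odds ratio with a Wald interval. It is an illustration, not part of the original slides. It assumes that smoking is coded 0/1 and that the model is for the cumulative probability of the lower-ordered categories (vlbw, then vlbw or mlbw). The function names are hypothetical. Because the inputs are the rounded estimates printed above, results can differ from the SAS output in the last decimal place.

```python
import math

def expit(x):
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Rounded estimates copied from the ordinal proc logistic output.
INTERCEPT_VLBW = -4.5844   # log odds of vlbw when smoking = 0
INTERCEPT_MLBW = -2.8138   # log odds of (vlbw or mlbw) when smoking = 0
BETA_SMOKING = 0.6262      # one common log odds ratio for smoking
SE_SMOKING = 0.0381

def cumulative_probs(smoking):
    """Return (P(vlbw), P(vlbw or mlbw)) for smoking = 0 or 1 (assumed coding)."""
    p_vlbw = expit(INTERCEPT_VLBW + BETA_SMOKING * smoking)
    p_vlbw_or_mlbw = expit(INTERCEPT_MLBW + BETA_SMOKING * smoking)
    return p_vlbw, p_vlbw_or_mlbw

def odds_ratio_with_wald_ci(beta, se, z=1.96):
    """Exponentiate a log odds ratio and its Wald limits."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

or_smoking, or_lower, or_upper = odds_ratio_with_wald_ci(BETA_SMOKING, SE_SMOKING)
print("Odds ratio for smoking:", round(or_smoking, 3),
      "95% CI:", round(or_lower, 3), "-", round(or_upper, 3))
for smoking in (0, 1):
    p1, p2 = cumulative_probs(smoking)
    print("smoking =", smoking, "P(vlbw) =", round(p1, 4),
          "P(vlbw or mlbw) =", round(p2, 4))
```

The same odds ratio applies at both cut points, which is the proportional odds assumption that the score test above evaluates.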


Cumulative and Generalized Logit

Output from proc logistic: Nominal Birthweight


Value  bwcat      Frequency
1      vlbw             897
2      mlbw            4087
3      normal bw      75824

Analysis of Maximum Likelihood Estimates

Parameter  bwcat  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept  vlbw   1   -4.4827   0.0364          15160.6058       <.0001
Intercept  mlbw   1   -3.0234   0.0179          28618.1754       <.0001
smoking    vlbw   1    0.3540   0.0944             14.0650       0.0002
smoking    mlbw   1    0.6865   0.0410            280.0005       <.0001

Odds Ratio Estimates

Effect   bwcat  Point Estimate  95% Wald Confidence Limits
smoking  vlbw   1.425           1.184   1.714
smoking  mlbw   1.987           1.833   2.153
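For comparison with the ordinal model, the Python sketch below turns the generalized logit estimates into predicted category probabilities and category-specific odds ratios, each relative to the reference category 'normal bw'. It is an illustration only. It assumes smoking is coded 0/1, the dictionary layout and function name are hypothetical, and the inputs are the rounded estimates printed above.

```python
import math

# Rounded estimates copied from the nominal proc logistic output;
# each equation compares one category with the reference 'normal bw'.
COEFS = {
    "vlbw": {"intercept": -4.4827, "smoking": 0.3540},
    "mlbw": {"intercept": -3.0234, "smoking": 0.6865},
}

def category_probs(smoking):
    """Predicted probability of each birthweight category (smoking = 0 or 1)."""
    linear = {cat: c["intercept"] + c["smoking"] * smoking
              for cat, c in COEFS.items()}
    # The reference category has linear predictor 0, so it contributes exp(0) = 1.
    denom = 1.0 + sum(math.exp(v) for v in linear.values())
    probs = {cat: math.exp(v) / denom for cat, v in linear.items()}
    probs["normal bw"] = 1.0 / denom
    return probs

# One odds ratio per non-reference category, unlike the ordinal model.
odds_ratios = {cat: math.exp(c["smoking"]) for cat, c in COEFS.items()}

print("Odds ratios vs. normal bw:",
      {cat: round(value, 3) for cat, value in odds_ratios.items()})
for smoking in (0, 1):
    rounded = {cat: round(p, 4) for cat, p in category_probs(smoking).items()}
    print("smoking =", smoking, rounded)
```

Here smoking gets a separate coefficient for vlbw and for mlbw, so the two odds ratios are free to differ.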


Risk Differences

/* Dichotomous Birthweight, Modeling Risk Differences */
proc genmod data=one;
  model dbirwt = smoking / link=identity dist=bin;
run;

proc genmod data=one;
  model dbirwt = smoking late_no_pnc / link=identity dist=bin;
run;

Since the outcome variable is a proportion that is not transformed in any way, the predicted values are the proportions themselves, and the measures of association, the beta coefficients, are the differences in the proportions, or "risk" differences.



Risk Differences

Table 1 of smoking by lbw
Controlling for late_no_pnc=Late or No PNC

smoking     lbw
Frequency
Row Pct     yes       no        Total
yes         289       1988      2277
            12.69     87.31
no          775       10503     11278
            6.87      93.13
Total       1064      12491     13555

Risk Diff   0.0582    0.0438-0.0727
Odds Ratio  1.9701    1.7070-2.2738
Cohort      1.8470    1.6261-2.0979

Table 2 of smoking by lbw
Controlling for late_no_pnc=First Trimester PNC

smoking     lbw
Frequency
Row Pct     yes       no        Total
yes         649       6333      6982
            9.30      90.70
no          3271      57000     60271
            5.43      94.57
Total       3920      63333     67253

Risk Diff   0.0387    0.0316-0.0457
Odds Ratio  1.7858    1.6351-1.9503
Cohort      1.7127    1.5803-1.8563

Case-Control (OR)  Mantel-Haenszel  1.8355  1.7028-1.9784
Cohort (RP)        Mantel-Haenszel  1.7499  1.6349-1.8731
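The stratum-specific and Mantel-Haenszel estimates above can be reproduced by hand from the cell counts. The Python sketch below is one way to do that and is not part of the original slides. The function names are hypothetical, and the pooled estimates use the standard Mantel-Haenszel formulas for the odds ratio and the risk (prevalence) ratio. Confidence limits are not computed.

```python
# Cell counts copied from the two stratified tables, as (a, b, c, d):
# a = smokers with lbw,     b = smokers without lbw,
# c = non-smokers with lbw, d = non-smokers without lbw.
STRATA = {
    "Late or No PNC": (289, 1988, 775, 10503),
    "First Trimester PNC": (649, 6333, 3271, 57000),
}

def stratum_measures(a, b, c, d):
    """Risk difference, risk ratio, and odds ratio for one 2x2 table."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "risk_diff": risk_exposed - risk_unexposed,
        "risk_ratio": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
    }

def mantel_haenszel(tables):
    """Mantel-Haenszel pooled odds ratio and risk ratio across strata."""
    or_num = or_den = rr_num = rr_den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        or_num += a * d / n
        or_den += b * c / n
        rr_num += a * (c + d) / n
        rr_den += c * (a + b) / n
    return or_num / or_den, rr_num / rr_den

for name, cells in STRATA.items():
    rounded = {k: round(v, 4) for k, v in stratum_measures(*cells).items()}
    print(name, rounded)

mh_or, mh_rr = mantel_haenszel(STRATA.values())
print("Mantel-Haenszel OR:", round(mh_or, 4), "RR:", round(mh_rr, 4))
```

The risk difference is larger in the Late or No PNC stratum (about 0.058) than in the First Trimester PNC stratum (about 0.039), while the ratio measures are closer to each other across strata.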


Risk Differences

Output from proc genmod

Crude and Adjusted Risk Differences


Analysis Of Maximum Likelihood Parameter Estimates

Parameter    DF  Estimate  Standard Error  Wald 95% Confidence Limits  Wald Chi-Square  Pr > ChiSq
Intercept    1   0.0565    0.0009          0.0549   0.0582             4288.51          <.0001
smoking      1   0.0448    0.0033          0.0384   0.0511              189.37          <.0001
Scale        0   1.0000    0.0000          1.0000   1.0000

Analysis Of Maximum Likelihood Parameter Estimates

Parameter    DF  Estimate  Standard Error  Wald 95% Confidence Limits  Wald Chi-Square  Pr > ChiSq
Intercept    1   0.0540    0.0009          0.0522   0.0558             3506.35          <.0001
smoking      1   0.0428    0.0033          0.0364   0.0492              172.41          <.0001
late_no_pnc  1   0.0165    0.0025          0.0117   0.0213               45.21          <.0001
Scale        0   1.0000    0.0000          1.0000   1.0000
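With the identity link, the coefficients add directly on the probability scale, so predicted risks can be read off by simple addition. The Python sketch below illustrates this with the rounded estimates above. It is not part of the original slides. It assumes smoking and late_no_pnc are coded 0/1, with late_no_pnc = 1 meaning late or no prenatal care. The function name is hypothetical. The pooled counts used for comparison are summed from the two stratified smoking-by-lbw tables shown earlier.

```python
# Rounded estimates copied from the two proc genmod outputs.
CRUDE = {"intercept": 0.0565, "smoking": 0.0448}
ADJUSTED = {"intercept": 0.0540, "smoking": 0.0428, "late_no_pnc": 0.0165}

def predicted_risk(coefs, **covariates):
    """Predicted proportion under an identity link: intercept + sum(beta * x)."""
    return coefs["intercept"] + sum(
        coefs[name] * value for name, value in covariates.items()
    )

# Crude model: the intercept is the risk among non-smokers, and the
# smoking coefficient is the crude risk difference.
crude_nonsmokers = predicted_risk(CRUDE, smoking=0)
crude_smokers = predicted_risk(CRUDE, smoking=1)

# Observed pooled risks, summing the two stratified tables (assumed same data).
observed_smokers = (289 + 649) / (2277 + 6982)
observed_nonsmokers = (775 + 3271) / (11278 + 60271)

print("Crude predicted:", round(crude_nonsmokers, 4), round(crude_smokers, 4))
print("Observed pooled:", round(observed_nonsmokers, 4), round(observed_smokers, 4))

# Adjusted model: the smoking risk difference is the same within each
# prenatal care group because the model has no interaction term.
for late in (0, 1):
    for smoking in (0, 1):
        risk = predicted_risk(ADJUSTED, smoking=smoking, late_no_pnc=late)
        print("late_no_pnc =", late, "smoking =", smoking, "risk =", round(risk, 4))
```

The adjusted model forces a single smoking risk difference (0.0428) in both prenatal care groups, whereas the stratified tables showed different risk differences in the two strata.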


Linear Models: General Considerations

Conceptual Framework

• Level of measurement of the outcome variable: Continuous, Dichotomous, Polytomous-nominal, Polytomous-ordinal
• Unit of Analysis: Individual, Aggregate, Individual and aggregate
• Error Structure / Distribution: Uncorrelated, Correlated (imposed by study design or by 'natural' structure of the data)
• Hypothesis formulation


Until next week...

Again, let's keep this in mind...

"...technical expertise and methodology are not substitutes for conceptual coherence. Or, as one student remarked a few years ago, public health spends too much time on the "p" values of biostatistics and not enough time on values."

Jonathan M. Mann, "Medicine and Public Health, Ethics and Human Rights," The Hastings Center Report, Vol. 27, No. 3 (May - Jun., 1997), pp. 6-13. Published by: The Hastings Center.
