What is Interaction for A Binary Outcome? Chun Li Department of Biostatistics Center for Human Genetics Research September 19, 2007

What is Interaction for A Binary Outcome?

Chun Li

Department of Biostatistics

Center for Human Genetics Research

September 19, 2007

2

What We Have Learned

• Little.

• Generic.

• In linear regression: y = β0 + β1x1 + β2x2 + β3x1x2

• In any other regression, the right-hand side is β0 + β1x1 + β2x2 + β3x1x2

• For a binary outcome, we often use logistic regression. For example, the log-odds of cancer risk:

log(Oij) = β0 + β1×sex + β2×smoking + β3×sex×smoking

β1×sex and β2×smoking are the “main effect” terms; β3×sex×smoking is the “interaction effect” term.

3

Interaction

• Introduced by R. A. Fisher to generalize the concept of “epistasis” in genetics.

• The concept is ubiquitous.

• The word sounds easy to understand, and is charismatic in some circles.

• Ambiguous without model context.

• Hard to interpret and translate to reality for some models, such as logistic regression.

4

Epistasis

• Example: Genotype BB masks the effect of gene A.

• It is a very special type of interaction.

• Such a phenomenon can be seen in other contexts, e.g. gene-environment interaction.

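[Figure: two grids. The first crosses genotype at gene A (rows aa, Aa, AA) with genotype at gene B (columns bb, Bb, BB). The second crosses genotype at gene A (rows aa, Aa, AA) with exposure (columns No, Yes). Cell contents were not preserved.]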

5

“No Interaction” ≠ Independence

• Interaction is about the joint effect of input variables on an outcome, or how the effect of one input variable changes as the values of the other input variables change.

• Independence is about the statistical relationship between input variables, irrespective of the outcome or the effect on the outcome.

• Using “independent effect” to describe “no interaction” may be confusing.

6

Interaction = Effect Modification

• Effect modification: The effect of one variable on the outcome is modified depending on the values of other variables.

• It depends on how “effect” is measured and on what scale (Kenneth Rothman, Sander Greenland).

• For a binary outcome, “effect” can be measured as

– risk difference

– risk ratio

– odds ratio
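The three effect measures listed above can be sketched in a few lines of Python. The two risks used here are hypothetical numbers chosen only for illustration, and the function names are not from the slides.

```python
def odds(risk):
    """Convert a risk (probability) into odds."""
    return risk / (1.0 - risk)

def risk_difference(risk_exposed, risk_unexposed):
    return risk_exposed - risk_unexposed

def risk_ratio(risk_exposed, risk_unexposed):
    return risk_exposed / risk_unexposed

def odds_ratio(risk_exposed, risk_unexposed):
    return odds(risk_exposed) / odds(risk_unexposed)

# Hypothetical risks, chosen only for illustration.
risk_unexposed = 0.10
risk_exposed = 0.20

rd = risk_difference(risk_exposed, risk_unexposed)
rr = risk_ratio(risk_exposed, risk_unexposed)
or_ = odds_ratio(risk_exposed, risk_unexposed)

print(f"risk difference = {rd:.3f}")
print(f"risk ratio      = {rr:.3f}")
print(f"odds ratio      = {or_:.3f}")
```

The same pair of risks gives three different numbers, which is why an "effect" has no meaning until its scale is stated.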

7

Measuring Effect: Risk Difference

             Smoking
             No (0)    Yes (1)    Marginal
Male (0)     R00       R01        R0•
Female (1)   R10       R11        R1•
Marginal     R•0       R•1

“Effect” of smoking:

R01 – R00 (in males)

R11 – R10 (in females)

If gender doesn’t modify the “effect” of smoking, then

R01 – R00 = R11 – R10 = R•1 – R•0 (!)

Equivalent:

R11 – R00 = (R10 – R00) + (R01 – R00) = (R1• – R0•) + (R•1 – R•0)

RR11 – 1 = (RR10 – 1) + (RR01 – 1), where RRij = Rij / R00

Additive decomposition of risk: Rij = ai + bj

8

Measuring Effect: Risk Ratio

             Smoking
             No (0)    Yes (1)    Marginal
Male (0)     R00       R01        R0•
Female (1)   R10       R11        R1•
Marginal     R•0       R•1

“Effect” of smoking:

R01 / R00 (in males)

R11 / R10 (in females)

If gender doesn’t modify the “effect” of smoking, then

R01 / R00 = R11 / R10 = R•1 / R•0 (!)

Equivalent:

RR11 = RR10 × RR01, where RRij = Rij / R00

RR11 = (R1• / R0•) × (R•1 / R•0)

Multiplicative decomposition of risk: Rij = ci × dj

9

Measuring Effect: Odds Ratio

O** = R**/(1 – R**)

             Smoking
             No (0)    Yes (1)    Marginal
Male (0)     O00       O01        O0•
Female (1)   O10       O11        O1•
Marginal     O•0       O•1

“Effect” of smoking:

O01 / O00 (in males)

O11 / O10 (in females)

If gender doesn’t modify the “effect” of smoking, then

O01 / O00 = O11 / O10 ≠ O•1 / O•0 in general (?!?)

Equivalent:

OR11 = OR10 × OR01, where ORij = Oij / O00

Additive decomposition of log-odds ln(Oij)

Even if gender doesn’t modify the effect of smoking, smoking’s marginal effect may be different from its gender-specific effect !?!
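A small numeric sketch of this point: both sexes are given the same smoking odds ratio, yet the marginal odds ratio comes out different. All numbers are hypothetical, and the marginal risks assume an equal mix of males and females among smokers and among non-smokers (an assumption made here so that sex and smoking are unrelated).

```python
def odds(risk):
    return risk / (1.0 - risk)

def risk_from_odds(o):
    return o / (1.0 + o)

# Hypothetical inputs: a common smoking odds ratio in both sexes.
common_or = 4.0
r00 = 0.10  # male non-smokers
r10 = 0.50  # female non-smokers

# Risks among smokers implied by the common odds ratio.
r01 = risk_from_odds(common_or * odds(r00))  # male smokers
r11 = risk_from_odds(common_or * odds(r10))  # female smokers

# Marginal risks, assuming a 50/50 sex mix in each smoking group.
r_marginal_no = 0.5 * r00 + 0.5 * r10
r_marginal_yes = 0.5 * r01 + 0.5 * r11

or_male = odds(r01) / odds(r00)
or_female = odds(r11) / odds(r10)
or_marginal = odds(r_marginal_yes) / odds(r_marginal_no)

print(f"OR in males   = {or_male:.2f}")
print(f"OR in females = {or_female:.2f}")
print(f"marginal OR   = {or_marginal:.2f}")
```

The sex-specific odds ratios are both 4, while the marginal odds ratio is closer to 2.9, even though sex does not modify the odds ratio and is unrelated to smoking in this setup.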

10

[Figure: four panels, each plotted against p1 from 0 to 1, with curves labeled by fixed odds ratios OR = 10, 5, 2, 1/2, 1/5, 1/10. Vertical axes: p2; p2 – p1; p2 / p1; log p2 – log p1.]

OR = [p2 / (1 – p2)] / [p1 / (1 – p1)]
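For a fixed odds ratio between two risks p1 and p2, p2 can be solved from p1, and the risk difference and risk ratio can then be traced as p1 varies. The sketch below does this for OR = 2 at a few hypothetical values of p1; the function name is illustrative only.

```python
def p2_from_odds_ratio(p1, odds_ratio):
    """Solve OR = [p2/(1-p2)] / [p1/(1-p1)] for p2."""
    odds2 = odds_ratio * p1 / (1.0 - p1)
    return odds2 / (1.0 + odds2)

fixed_or = 2.0
rows = []
for p1 in (0.01, 0.10, 0.50, 0.90):
    p2 = p2_from_odds_ratio(p1, fixed_or)
    rows.append((p1, p2, p2 - p1, p2 / p1))
    print(f"p1={p1:.2f}  p2={p2:.4f}  p2-p1={p2 - p1:.4f}  p2/p1={p2 / p1:.4f}")
```

With the odds ratio held constant, the risk ratio shrinks toward 1 as p1 grows, and the risk difference rises and then falls, so a constant effect on one scale is a varying effect on the others.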

11

Interaction = Effect-Measure Modification

“No interaction” under one definition often means interaction under another definition.

Results from interaction analysis should always be reported with the scale that was used to measure effect.

Some effect measures are intuitive; others are neither intuitive nor even intrinsically consistent.
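A minimal sketch of how the scale changes the conclusion: the four risks below are hypothetical and were chosen to be exactly additive, so there is no interaction on the risk-difference scale, yet the risk ratio and the odds ratio of smoking both differ between the sexes.

```python
def odds(risk):
    return risk / (1.0 - risk)

# Hypothetical risks by (sex, smoking), constructed to be additive.
risks = {
    ("male", "no"): 0.10, ("male", "yes"): 0.20,
    ("female", "no"): 0.30, ("female", "yes"): 0.40,
}

effects = {}
for sex in ("male", "female"):
    r0 = risks[(sex, "no")]
    r1 = risks[(sex, "yes")]
    effects[sex] = {
        "rd": r1 - r0,              # risk difference
        "rr": r1 / r0,              # risk ratio
        "or": odds(r1) / odds(r0),  # odds ratio
    }
    print(sex, {k: round(v, 3) for k, v in effects[sex].items()})
```

Here the risk difference is 0.1 in both sexes, while the risk ratio is 2.0 versus about 1.33 and the odds ratio is 2.25 versus about 1.56.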

12

Biologic Interaction

• Biologic interaction = biologically causal interaction.

• Greenland and Rothman argued that “biologic interaction” is reflected by departure from additive risks.

– Counterfactual arguments

– Causal pie arguments

• Additive definition is difficult to test directly in case-control studies.

13

Advantages of Logistic Regression

• For retrospective studies (e.g., case-control studies), the risk difference and risk ratio cannot be estimated or analyzed, but the odds ratio can.

• The odds ratio has no boundary effect. Both the risk difference and the risk ratio do:

– Interaction effect must exist under some circumstances.

– May cause problems computationally.

• Odds ratio ≈ risk ratio, when risks are very small.
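The boundary effect and the rare-outcome approximation can both be checked numerically. The sketch below uses hypothetical risks: a baseline risk of 0.3 with two factors that each double the risk cannot combine multiplicatively without interaction, because the implied risk exceeds 1, whereas the same construction on the odds scale stays valid.

```python
def odds(risk):
    return risk / (1.0 - risk)

def risk_from_odds(o):
    return o / (1.0 + o)

# Boundary effect (hypothetical numbers): each factor alone doubles the risk.
r00 = 0.30
r11_multiplicative = r00 * 2.0 * 2.0                # "no interaction" on the risk-ratio scale
r11_odds_scale = risk_from_odds(odds(r00) * 2.0 * 2.0)  # "no interaction" on the odds-ratio scale
print(f"multiplicative risks: R11 = {r11_multiplicative:.2f} (not a valid risk)")
print(f"multiplicative odds:  R11 = {r11_odds_scale:.3f}")

# Rare-outcome approximation: the risk ratio is held at 2 while risks shrink.
odds_ratios = {}
for r0 in (0.3, 0.03, 0.003):
    r1 = 2.0 * r0
    odds_ratios[r0] = odds(r1) / odds(r0)
    print(f"baseline risk {r0}: RR = 2.000, OR = {odds_ratios[r0]:.3f}")
```

As the baseline risk falls from 0.3 to 0.003, the odds ratio moves from 3.5 toward the risk ratio of 2.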

14

Misconception 1

Interaction terms are treated the same way as main-effect terms:

– Numerical comparison between an interaction coefficient and a main-effect coefficient.

– (logistic regression) Power to detect interaction when “interaction explains half of the total effect.”

– (logistic regression) “Odds ratio” of the interaction.

– Fact: They are apples and oranges.

15

Misconception Reinforced by Software

• Stata output:

. logistic case v1 v2 v12

Logistic regression                               Number of obs   =       1530
                                                  LR chi2(3)      =      12.93
                                                  Prob > chi2     =     0.0048
Log likelihood = -878.77373                       Pseudo R2       =     0.0073

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          v1 |    1.52674   .8978875     0.72   0.472     .4821329     4.83463
          v2 |   .7779552   .4651644    -0.42   0.675     .2409871    2.511397
         v12 |   1.004005   .3277949     0.01   0.990     .5294554    1.903893
------------------------------------------------------------------------------

16

Interaction in Logistic Regression

μij = log(Oij) = β0 + β1×sex + β2×smoking + β3×sex×smoking

             Smoking
             No (0)    Yes (1)
Male (0)     O00       O01
Female (1)   O10       O11

μ00 = β0

μ01 = β0 + β2

μ10 = β0 + β1

μ11 = β0 + β1 + β2 + β3

Coefficient β                        exp(β)
β1 = μ10 – μ00                       O10 / O00
β2 = μ01 – μ00                       O01 / O00
β3 = (μ11 – μ10) – (μ01 – μ00)       (O11 / O10) / (O01 / O00)

exp(β1) and exp(β2) are baseline ORs. exp(β3) is a ratio of odds ratios.
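The mapping from the four cell odds to the four coefficients can be written out directly. The odds below are hypothetical, and the dictionary keys follow the (sex, smoking) index order used for Oij on this slide.

```python
import math

# Hypothetical odds O[(sex, smoking)], with male = 0 and non-smoker = 0.
odds_table = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.4, (1, 1): 0.9}
mu = {cell: math.log(o) for cell, o in odds_table.items()}

beta0 = mu[(0, 0)]
beta1 = mu[(1, 0)] - mu[(0, 0)]                              # sex effect among non-smokers
beta2 = mu[(0, 1)] - mu[(0, 0)]                              # smoking effect among males
beta3 = (mu[(1, 1)] - mu[(1, 0)]) - (mu[(0, 1)] - mu[(0, 0)])  # difference of log odds ratios

or_smoking_males = odds_table[(0, 1)] / odds_table[(0, 0)]
or_smoking_females = odds_table[(1, 1)] / odds_table[(1, 0)]

print(f"exp(beta1) = {math.exp(beta1):.3f}  (baseline OR for sex)")
print(f"exp(beta2) = {math.exp(beta2):.3f}  (baseline OR for smoking)")
print(f"exp(beta3) = {math.exp(beta3):.3f}  (ratio of odds ratios)")
print(f"OR females / OR males = {or_smoking_females / or_smoking_males:.3f}")
```

exp(β3) reproduces the ratio of the two smoking odds ratios, so it is not an odds ratio for any exposure.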

17

Misconception 2

Interpreting main-effect terms when interaction terms are included in the model:

– Evaluation of statistical significance of “main-effect”.

– Fact: Main-effect term should always be included in the model as long as it is involved in some interaction terms.

– A main-effect coefficient is interpreted as the magnitude of “main effect” or “marginal effect”.

– Fact: Main-effect coefficient of variable X represents its “baseline effect” when all variables “interacting” with X are zero (i.e. at baseline).

– Its interpretation depends on how other variables are coded (i.e. where the baselines are).

18

Significance of a Main-Effect Term in Logistic Regression

μij = log(Oij) = β0 + β1×sex + β2×smoking + β3×sex×smoking

             Smoking
             No (0)    Yes (1)
Male (0)     O00       O01
Female (1)   O10       O11

μ00 = β0

μ01 = β0 + β2

μ10 = β0 + β1

μ11 = β0 + β1 + β2 + β3

Statistical significance of a term ≡ whether it can be removed from the model.

What would happen if β2 = 0? Then μ01 = μ00, i.e. O01 = O00: smoking has no effect in the group coded sex = 0 (here, males).

This means different things when sex is coded differently.
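The dependence on coding can be checked by computing the coefficients twice from the same four log-odds, once with males as the reference sex and once with females. The odds are hypothetical and were chosen so that smoking has no effect in males; the helper name is illustrative only.

```python
import math

# Hypothetical log-odds by (sex, smoking); smoking has no effect in males.
log_odds = {
    ("male", 0): math.log(0.2), ("male", 1): math.log(0.2),
    ("female", 0): math.log(0.4), ("female", 1): math.log(1.2),
}

def coefficients(reference_sex):
    """Return (b0, b1, b2, b3) when reference_sex is coded as sex = 0."""
    other = "female" if reference_sex == "male" else "male"
    b0 = log_odds[(reference_sex, 0)]
    b1 = log_odds[(other, 0)] - b0
    b2 = log_odds[(reference_sex, 1)] - b0
    b3 = (log_odds[(other, 1)] - log_odds[(other, 0)]) - b2
    return b0, b1, b2, b3

coef_male_ref = coefficients("male")
coef_female_ref = coefficients("female")
print("male = 0:  ", [round(b, 3) for b in coef_male_ref])
print("female = 0:", [round(b, 3) for b in coef_female_ref])
```

With males coded 0 the smoking coefficient β2 is 0, and with females coded 0 it is log 3, although both fits describe the same four cells.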

19

One Input Variable is Continuous

Y = β0 + β1G + β2X + β3G×X

G = 0 (group A): YA = β0 + β2X

G = 1 (group B): YB = (β0 + β1) + (β2 + β3)X

[Figure: lines for groups A and B plotted as y against x, with a and b marked on the x-axis.]

β1 = YB – YA when X = 0

β2 = slope for group A

β3 = difference in slopes (B – A)

β1 = 0 → same Y when X = 0 (often extrapolative and meaningless).

β2 = 0 → group A is flat.

β3 = 0 → equal slopes.

β1 and β2 are not marginal effects.
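A short sketch of why β1 depends on where X = 0 sits: under Y = β0 + β1G + β2X + β3G×X, the difference between groups at a given x is β1 + β3x, so re-centering X changes the "main effect" of G. The coefficient values and the centering point are hypothetical.

```python
# Hypothetical coefficients for Y = b0 + b1*G + b2*X + b3*G*X.
b0, b1, b2, b3 = 1.0, 0.0, 0.5, 0.2

def predict(g, x):
    return b0 + b1 * g + b2 * x + b3 * g * x

def group_difference(x):
    """Y_B - Y_A at a given x; equals b1 + b3 * x."""
    return predict(1, x) - predict(0, x)

center = 50.0  # hypothetical value inside the observed range of X

# Coefficient of G if X is replaced by X - center in the model.
b1_after_centering = b1 + b3 * center

print(f"difference at X = 0:       {group_difference(0.0):.2f}")
print(f"difference at X = {center:.0f}:      {group_difference(center):.2f}")
print(f"b1 after centering X:      {b1_after_centering:.2f}")
```

Here β1 = 0 only says the groups agree at X = 0; at X = 50 they differ by 10.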

20

Misconception 3

• If a set of variables/genes together with all possible combinations among them (i.e. allowing full interactions) significantly predict the outcome, then we have found interaction among these variables.

• Fact: Interaction is about departure from additive effects. The variables may just have additive effects without interaction.

21

Do We Want Generic Interaction?

A gene is identified to metabolize a carcinogen. Allele A is the putative susceptibility allele.

Goal: Is the risk elevated for those who have carcinogen exposure and carry the risk allele?

Data from Piegorsch et al. (1994), shown as #case/#control:

        Carcinogen exposure
        No        Yes
aa      14/30     12/34
Aa      8/20      19/19
AA      9/18      18/19

Generic interaction

H0: 4 parameters

Ha: 6 parameters

DF = 2, p = 0.19

Odds ratios relative to aa without exposure (− marks the reference cell):

        Carcinogen
        No        Yes
aa      −         0.76
Aa      0.86      2.14
AA      1.07      2.03
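The odds ratios shown for this table can be recomputed from the case/control counts, taking unexposed aa as the reference cell. The sketch below uses only the counts on the slide; the variable names are illustrative.

```python
# (cases, controls) by genotype and carcinogen exposure, copied from the slide.
counts = {
    ("aa", "No"): (14, 30), ("aa", "Yes"): (12, 34),
    ("Aa", "No"): (8, 20),  ("Aa", "Yes"): (19, 19),
    ("AA", "No"): (9, 18),  ("AA", "Yes"): (18, 19),
}

def cell_odds(cell):
    cases, controls = counts[cell]
    return cases / controls

reference = ("aa", "No")
odds_ratios = {
    cell: cell_odds(cell) / cell_odds(reference)
    for cell in counts if cell != reference
}

for genotype in ("aa", "Aa", "AA"):
    row = []
    for exposure in ("No", "Yes"):
        cell = (genotype, exposure)
        row.append(" ref" if cell == reference else f"{odds_ratios[cell]:.2f}")
    print(genotype, *row)
```

The elevated odds ratios appear only in the exposed Aa and AA cells, which is the pattern the generic 2-DF interaction test fails to detect.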

22

Do We Want Generic Interaction?

Approach 2

H0: 2 groups

Ha: 4 groups

DF = 2, p = 0.037

        Carcinogen
        No        Yes
aa      −         0.77
Aa      −         2.19
AA      −         2.08

Approach 3

H0: 1 group

Ha: 3 groups

DF = 2, p = 0.017

        Carcinogen
        No        Yes
aa      −         −
Aa      −         2.37
AA      −         2.25

Approach 4

H0: 1 group

Ha: 2 groups

DF = 1, p = 0.0043

        Carcinogen
        No        Yes
aa      −         −
Aa      −         2.31
AA      −         2.31
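The three p-values can be reproduced from the case/control counts of Piegorsch et al. (1994) given on the previous slide, under an assumption the slides do not state: that each approach is a likelihood-ratio test between models in which all cells within a group share one case fraction. The groupings below are inferred from the odds-ratio tables (cells marked − are pooled into the reference group), and the chi-square tail uses closed forms that exist for 1 and 2 degrees of freedom.

```python
import math

# (cases, controls) by genotype and carcinogen exposure.
counts = {
    ("aa", "No"): (14, 30), ("aa", "Yes"): (12, 34),
    ("Aa", "No"): (8, 20),  ("Aa", "Yes"): (19, 19),
    ("AA", "No"): (9, 18),  ("AA", "Yes"): (18, 19),
}

def log_lik(groups):
    """Maximized binomial log-likelihood when each group of cells shares one case fraction."""
    total = 0.0
    for cells in groups:
        cases = sum(counts[c][0] for c in cells)
        controls = sum(counts[c][1] for c in cells)
        n = cases + controls
        total += cases * math.log(cases / n) + controls * math.log(controls / n)
    return total

def lrt_p_value(null_groups, alt_groups, df):
    stat = 2.0 * (log_lik(alt_groups) - log_lik(null_groups))
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 DF
    if df == 2:
        return math.exp(-stat / 2.0)             # chi-square tail, 2 DF
    raise ValueError("only df = 1 or 2 is handled in this sketch")

all_cells = list(counts)
unexposed = [c for c in all_cells if c[1] == "No"]
exposed = [c for c in all_cells if c[1] == "Yes"]
exposed_carriers = [("Aa", "Yes"), ("AA", "Yes")]
everyone_else = [c for c in all_cells if c not in exposed_carriers]

p_approach2 = lrt_p_value([unexposed, exposed],
                          [unexposed] + [[c] for c in exposed], df=2)
p_approach3 = lrt_p_value([all_cells],
                          [everyone_else] + [[c] for c in exposed_carriers], df=2)
p_approach4 = lrt_p_value([all_cells], [everyone_else, exposed_carriers], df=1)

print(f"Approach 2: p = {p_approach2:.3f}")
print(f"Approach 3: p = {p_approach3:.3f}")
print(f"Approach 4: p = {p_approach4:.4f}")
```

The generic test with p = 0.19 is not reproduced here because its null model (additive in genotype and exposure on the log-odds scale) needs an iterative logistic fit.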

23

Testing for Interaction While Adjusting for Other Covariates

μage, ij = log(Oage, ij) = β0 + β4×age + β1×sex + β2×smoking + β3×sex×smoking

             Smoking
             No (0)      Yes (1)
Male (0)     Oage, 00    Oage, 01
Female (1)   Oage, 10    Oage, 11

μage, 00 = (β0 + β4×age)

μage, 01 = (β0 + β4×age) + β2

μage, 10 = (β0 + β4×age) + β1

μage, 11 = (β0 + β4×age) + β1 + β2 + β3

We are testing for interaction under the assumption that the effects of sex, smoking, and sex×smoking are the same over the whole ranges of the covariates.