18
Lecture 7 GLMs II Binomial Family Olivier MISSA, [email protected] Advanced Research Skills

Lecture 7 GLMs II Binomial Family Olivier MISSA, [email protected]@york.ac.uk Advanced Research Skills

Embed Size (px)

Citation preview

Page 1: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

Lecture 7

GLMs II

Binomial Family

Olivier MISSA, [email protected]

Advanced Research Skills

Page 2: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

2

Outline

Continue our Introduction to Generalized Linear Models.

In this lecture:

Illustrate the use of GLMs for

proportion and

binary data.

Page 3: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

3

Binary & Proportion data tend to follow the Binomial distribution

The Canonical link of this glm family is the logit function:

The variance reaches a maximum for intermediate values of p and a minimum at either 0% or 100%.

Reminder

)1(

logp

p

ppnVar 1pnMean

Page 4: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

4

In R, binary/proportion data can be entered

into a model as a response in three different ways:

as a numeric vector

(holding the number or proportion of successes)

as a logical vector or a factor

(TRUE or the first factor level will be considered successes).

as a two-column matrix(the first column holding the number of successes and

the second column the number of failures).

Three ways to work with binary data

Page 5: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

5

Toxicity to tobacco budworm (moth) of different doses of trans-cypermethrin.

Batches of 20 moths (of each sex) were put in contact

for three days with increasing doses of the pyrethroid.

1st Example

Dose (micrograms)

Sex 1 2 4 8 16 32

Male 1 4 9 13 18 20

Female 0 2 6 10 12 16

Number of dead moths out of 20 tested

> (dose <- rep(2^(0:5), 2))[1] 1 2 4 8 16 32 1 2 4 8 16 32> numdead <- c(1,4,9,13,18,20,0,2,6,10,12,16)> (sex <- factor( rep( c("M","F"), c(6,6) ) ))[1] M M M M M M F F F F F FLevels: F M> SF <- cbind(numdead, numalive=20-numdead)

Page 6: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

6

1st Example

> modb <- glm(SF ~ sex*dose, family=binomial)> summary(modb)Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.71578 0.32233 -5.323 1.02e-07 ***sexM -0.21194 0.51523 -0.411 0.68082 dose 0.11568 0.02379 4.863 1.16e-06 ***sexM:dose 0.18156 0.06692 2.713 0.00666 ** ---(Dispersion parameter for binomial family taken to be 1)

Null deviance: 124.876 on 11 degrees of freedomResidual deviance: 18.164 on 8 degrees of freedomAIC: 56.275

What is modelled is the proportion of successes

n

i i

iiiii yn

ynynyyyD

1 ˆln)()ˆ/ln(2

Page 7: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

7

1st Example> ldose <- log2(dose)> modb2 <- glm(SF ~ sex*ldose, family=binomial)> summary(modb2)Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -2.9935 0.5527 -5.416 6.09e-08 ***sexM 0.1750 0.7783 0.225 0.822 ldose 0.9060 0.1671 5.422 5.89e-08 ***sexM:ldose 0.3529 0.2700 1.307 0.191 ---(Dispersion parameter for binomial family taken to be 1)

Null deviance: 124.8756 on 11 degrees of freedomResidual deviance: 4.9937 on 8 degrees of freedomAIC: 43.104

Page 8: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

8

1st Example> drop1(modb2, test="Chisq")Single term deletions

Model:SF ~ sex * ldose Df Deviance AIC LRT Pr(Chi)<none> 4.994 43.104 sex:ldose 1 6.757 42.867 1.763 0.1842

> modb3 <- update(modb2, ~. – sex:ldose)> summary(modb3)Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -3.4732 0.4685 -7.413 1.23e-13 ***sexM 1.1007 0.3558 3.093 0.00198 ** ldose 1.0642 0.1311 8.119 4.70e-16 ***---(Dispersion parameter for binomial family taken to be 1)

Null deviance: 124.876 on 11 degrees of freedomResidual deviance: 6.757 on 9 degrees of freedomAIC: 42.867

Page 9: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

9

1st Example> drop1(modb3, test="Chisq")Single term deletions

Model:SF ~ sex + ldose Df Deviance AIC LRT Pr(Chi) <none> 6.757 42.867 sex 1 16.984 51.094 10.227 0.001384 ** ldose 1 118.799 152.909 112.042 < 2.2e-16 ***

> shapiro.test(residuals(modb3), type="deviance")

Shapiro-Wilk normality test

data: residuals(modb3, type = "deviance") W = 0.9666, p-value = 0.8725

Page 10: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

10

1st Example

> par(mfrow=c(2,2))> plot(modb3)

Page 11: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

11

1st Example> plot( c(0,1) ~ c(1,32), type="n", log="x",

xlab="dose", ylab="Probability")> text(dose, numdead/20, labels=as.character(sex) )> ld <- seq(0,32,0.5)> lines (ld, predict(modb3, data.frame(ldose=log2(ld),

sex=factor(rep("M", length(ld)), levels=levels(sex))),type="response") )

> lines (ld, predict(modb3, data.frame(ldose=log2(ld),sex=factor(rep("F", length(ld)), levels=levels(sex))),type="response"), lty=2, col="red" )

Page 12: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

12

1st Example> modbp <- glm(SF ~ sex*ldose,

family=binomial(link="probit"))> AIC(modbp)[1] 41.87836

> modbc <- glm(SF ~ sex*ldose,family=binomial(link="cloglog"))

> AIC(modbc)[1] 43.8663

> AIC(modb3)[1] 42.86747

Page 13: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

13

> summary(modb3)

Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -3.4732 0.4685 -7.413 1.23e-13 ***sexM 1.1007 0.3558 3.093 0.00198 ** ldose 1.0642 0.1311 8.119 4.70e-16 ***---

> exp(modb3$coeff) ## careful it may be misleading(Intercept) sexM ldose 0.031019 3.006400 2.898560 ## odds ration: p / (1-p)

> exp(modb3$coeff[1]+modb3$coeff[2]) ## odds for males(Intercept) 0.09325553

1st Example

logit scale

)1(

logp

p

Every doubling of the dose will lead to an increase in the odds of dying over surviving by a factor of 2.899

Page 14: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

14

Erythrocyte Sedimentation Rate in a group of patients.

Two groups : <20 (healthy) or >20 (ill) mm/hour

Q: Is it related to globulin & fibrinogen level in the blood ?

2nd Example

> data("plasma", package="HSAUR")> str(plasma)'data.frame': 32 obs. of 3 variables: $ fibrinogen: num 2.52 2.56 2.19 2.18 3.41 2.46 3.22 2.21 ... $ globulin : int 38 31 33 31 37 36 38 37 39 41 ... $ ESR : Factor w/ 2 levels "ESR < 20","ESR > 20": 1 1 ...> summary(plasma) fibrinogen globulin ESR Min. :2.090 Min. :28.00 ESR < 20:26 1st Qu.:2.290 1st Qu.:31.75 ESR > 20: 6 Median :2.600 Median :36.00 Mean :2.789 Mean :35.66 3rd Qu.:3.167 3rd Qu.:38.00 Max. :5.060 Max. :46.00

Page 15: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

15

2nd Example> stripchart(globulin ~ ESR, vertical=T, data=plasma, xlab="Erythrocyte Sedimentation Rate (mm/hr)",

ylab="Globulin blood level", method="jitter" )

> stripchart(fibrinogen ~ ESR, vertical=T, data=plasma, xlab="Erythrocyte Sedimentation Rate (mm/hr)",

ylab="Fibrinogen blood level", method="jitter" )

Page 16: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

16

2nd Example> mod1 <- glm(ESR~fibrinogen, data=plasma, family=binomial)> summary(mod1)Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -6.8451 2.7703 -2.471 0.0135 *fibrinogen 1.8271 0.9009 2.028 0.0425 *---(Dispersion parameter for binomial family taken to be 1)

Null deviance: 30.885 on 31 degrees of freedomResidual deviance: 24.840 on 30 degrees of freedomAIC: 28.840

> mod2 <- glm(ESR~fibrinogen+globulin, data=plasma,family=binomial)

> AIC(mod2)[1] 28.97111

factor

Page 17: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

17

2nd Example> anova(mod1, mod2, test="Chisq")Analysis of Deviance Table

Model 1: ESR ~ fibrinogenModel 2: ESR ~ fibrinogen + globulin Resid. Df Resid. Dev Df Deviance P(>|Chi|)1 30 24.8404 2 29 22.9711 1 1.8692 0.1716

> summary(mod2)Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -12.7921 5.7963 -2.207 0.0273 *fibrinogen 1.9104 0.9710 1.967 0.0491 *globulin 0.1558 0.1195 1.303 0.1925 ---(Dispersion parameter for binomial family taken to be 1)

Null deviance: 30.885 on 31 degrees of freedomResidual deviance: 22.971 on 29 degrees of freedomAIC: 28.971

The difference in terms of Deviance between these models is not significant, which leads us to select the least complex model

Page 18: Lecture 7 GLMs II Binomial Family Olivier MISSA, om502@york.ac.ukom502@york.ac.uk Advanced Research Skills

18

2nd Example> shapiro.test(residuals(mod1, type="deviance")) Shapiro-Wilk normality test

data: residuals(mod1, type = "deviance") W = 0.6863, p-value = 5.465e-07> par(mfrow=c(2,2))> plot(mod1)