Survival Case Study

Niels Richard Hansen

June 8, 2012

Contents

1 Introduction

2 Descriptive statistics

3 The Cox model

4 Session information

1 Introduction

We reconsider the case study from Chapter 19 in Frank Harrell’s book. This document is mostly a collection of R code and is not as well commented as it should be.

First, the data set is acquired.

con <- url("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/prostate.sav")

print(load(con)) ## Prints the names of the loaded objects, here "prostate".

## [1] "prostate"

close(con)

Then we load the packages that are needed.

library(rms) ### Harrell's R package ~~ the S-package is 'Design'

library(ggplot2) ### Extensive graphics package

library(reshape2) ### For the 'melt' function

library(lattice) ### More graphics (specifically, scatter plot matrices)

library(hexbin) ### and more graphics

library(survival)

2 Descriptive statistics

The data set contains the following variables with the following class labels:

names(prostate)

## [1] "patno" "stage" "rx" "dtime" "status" "age" "wt"

## [8] "pf" "hx" "sbp" "dbp" "ekg" "hg" "sz"

## [15] "sg" "ap" "bm" "sdate"

sapply(prostate, function(x) class(x)[1])

## patno stage rx dtime status age

## "labelled" "labelled" "factor" "labelled" "factor" "labelled"

## wt pf hx sbp dbp ekg

## "labelled" "factor" "labelled" "labelled" "labelled" "factor"

## hg sz sg ap bm sdate

## "labelled" "labelled" "labelled" "labelled" "labelled" "labelled"

Using the labels that Harrell has attached to the columns as attributes we can get more informative variable names.

sapply(prostate, function(x) {
  if (is.factor(x)) {
    attr(x, "levels")
  } else {
    attr(x, "label")
  }
})

## $patno

## [1] "Patient Number"

##

## $stage

## [1] "Stage"

##

## $rx

## [1] "placebo" "0.2 mg estrogen" "1.0 mg estrogen" "5.0 mg estrogen"

##

## $dtime

## [1] "Months of Follow-up"

##

## $status

## [1] "alive" "dead - prostatic ca"

## [3] "dead - heart or vascular" "dead - cerebrovascular"

## [5] "dead - pulmonary embolus" "dead - other ca"

## [7] "dead - respiratory disease" "dead - other specific non-ca"

## [9] "dead - unspecified non-ca" "dead - unknown cause"

##

## $age

## [1] "Age in Years"

##

## $wt

## [1] "Weight Index = wt(kg)-ht(cm)+200"

##

## $pf

## [1] "normal activity" "in bed < 50% daytime" "in bed > 50% daytime"

## [4] "confined to bed"

##

## $hx

## [1] "History of Cardiovascular Disease"

##

## $sbp

## [1] "Systolic Blood Pressure/10"

##

## $dbp

## [1] "Diastolic Blood Pressure/10"

##

## $ekg

## [1] "normal" "benign"

## [3] "rhythmic disturb & electrolyte ch" "heart block or conduction def"

## [5] "heart strain" "old MI"

## [7] "recent MI"

##

## $hg

## [1] "Serum Hemoglobin (g/100ml)"

##

## $sz

## [1] "Size of Primary Tumor (cm^2)"

##

## $sg

## [1] "Combined Index of Stage and Hist. Grade"

##

## $ap

## [1] "Serum Prostatic Acid Phosphatase"

##

## $bm

## [1] "Bone Metastases"

##

## $sdate

## [1] "Date on study"

##
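As a small illustration of the mechanism used above, the following sketch attaches a "label" attribute to a column of a toy data frame and reads it back in the same way. The data frame `toy` and the helper `describeColumn` are hypothetical and only serve the example; they are not part of the prostate data set.

```r
### Toy data (hypothetical): one numeric column and one factor.
toy <- data.frame(
  age = c(71, 68, 75),
  rx = factor(c("placebo", "5.0 mg estrogen", "placebo"))
)

### Attach a descriptive label as an attribute of the numeric column.
attr(toy$age, "label") <- "Age in Years"

### For factors return the levels, otherwise return the label attribute.
describeColumn <- function(x) {
  if (is.factor(x)) levels(x) else attr(x, "label")
}

described <- lapply(toy, describeColumn)
described
```

Here `lapply` is used instead of `sapply` so that the result is always a list, whatever the number of factor levels.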

A standard R summary can also be obtained.

summary(prostate)

## patno stage rx dtime

## Min. : 1 Min. :3.00 placebo :127 Min. : 0.0

## 1st Qu.:126 1st Qu.:3.00 0.2 mg estrogen:124 1st Qu.:14.2

## Median :252 Median :3.00 1.0 mg estrogen:126 Median :34.0

## Mean :252 Mean :3.42 5.0 mg estrogen:125 Mean :36.1

## 3rd Qu.:377 3rd Qu.:4.00 3rd Qu.:57.8

## Max. :506 Max. :4.00 Max. :76.0

##

## status age wt

## alive :148 Min. :48.0 Min. : 69

## dead - prostatic ca :130 1st Qu.:70.0 1st Qu.: 90

## dead - heart or vascular : 96 Median :73.0 Median : 98

## dead - cerebrovascular : 31 Mean :71.5 Mean : 99

## dead - other specific non-ca: 28 3rd Qu.:76.0 3rd Qu.:107

## dead - other ca : 25 Max. :89.0 Max. :152

## (Other) : 44 NA's :1 NA's :2

## pf hx sbp dbp

## normal activity :450 Min. :0.000 Min. : 8.0 Min. : 4.00

## in bed < 50% daytime: 37 1st Qu.:0.000 1st Qu.:13.0 1st Qu.: 7.00

## in bed > 50% daytime: 13 Median :0.000 Median :14.0 Median : 8.00

## confined to bed : 2 Mean :0.424 Mean :14.4 Mean : 8.15

## 3rd Qu.:1.000 3rd Qu.:16.0 3rd Qu.: 9.00

## Max. :1.000 Max. :30.0 Max. :18.00

##

## ekg hg sz

## normal :168 Min. : 5.9 Min. : 0.0

## heart strain :150 1st Qu.:12.3 1st Qu.: 5.0

## old MI : 75 Median :13.7 Median :11.0

## rhythmic disturb & electrolyte ch: 51 Mean :13.4 Mean :14.6

## heart block or conduction def : 26 3rd Qu.:14.7 3rd Qu.:21.0

## (Other) : 24 Max. :21.2 Max. :69.0

## NA's : 8 NA's :5

## sg ap bm sdate

## Min. : 5.0 Min. : 0.1 Min. :0.000 Min. :2652

## 1st Qu.: 9.0 1st Qu.: 0.5 1st Qu.:0.000 1st Qu.:2860

## Median :10.0 Median : 0.7 Median :0.000 Median :3021

## Mean :10.3 Mean : 12.2 Mean :0.163 Mean :3039

## 3rd Qu.:11.0 3rd Qu.: 3.0 3rd Qu.:0.000 3rd Qu.:3204

## Max. :15.0 Max. : 999.9 Max. :1.000 Max. :3465

## NA's :11

Then we look at the marginal distributions of the variables in terms of histograms and barplots.

To study the continuous variables we exclude the patient number, the factors, the date, and those numeric variables that are, in fact, dichotomous variables encoded as numerics. Then we make histograms. Subsequently, to study the discrete variables, we consider barplots.

conVar <- c("dtime", "age", "wt", "sbp", "dbp","hg", "sz", "sg", "ap")

disVar <- c( "stage", "rx", "status", "pf", "hx", "ekg", "bm")

meltedProstate <- melt(prostate[, conVar])

qplot(value, data = meltedProstate, geom = "histogram",
      xlab = "", ylab = "") +
  facet_wrap(~ variable, scales = "free")

[Figure: histograms of dtime, age, wt, sbp, dbp, hg, sz, sg and ap, one panel per variable with free scales.]

meltedProstate <- melt(prostate[, disVar], id.vars = c())

qplot(value, data = meltedProstate, geom = "bar",
      xlab = "", ylab = "") +
  facet_wrap(~ variable, scales = "free", nrow = 2) +
  opts(axis.text.x = theme_text(angle = -30,
                                size = 8, hjust = 0, vjust = 1))

[Figure: barplots of stage, rx, status, pf, hx, ekg and bm, one panel per variable with free scales.]

The number of missing values for each variable.

sapply(prostate, function(x) sum(is.na(x)))

## patno stage rx dtime status age wt pf hx sbp

## 0 0 0 0 0 1 2 0 0 0

## dbp ekg hg sz sg ap bm sdate

## 0 8 0 5 11 0 0 0
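The per-variable counts do not by themselves tell how many observations are lost if every row with a missing value is dropped. A minimal sketch on toy data (the data frame `toy` is hypothetical) shows the difference between counting missing values per column and counting complete rows:

```r
### Toy data (hypothetical) with missing values in two different rows.
toy <- data.frame(
  age = c(71, NA, 68, 75),
  wt = c(98, 102, NA, 110),
  sz = c(5, 11, 21, 2)
)

### Number of missing values per column, as computed above for prostate.
naCounts <- sapply(toy, function(x) sum(is.na(x)))

### Number of rows without any missing value.
completeRows <- sum(complete.cases(toy))

### na.omit() keeps exactly the complete rows.
reduced <- na.omit(toy)

naCounts
completeRows
nrow(reduced)
```

If the missing values fall in different rows, as in this toy example, the number of dropped rows equals the total number of missing values; if several fall in the same row, fewer rows are dropped.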

The scatter plot matrix.

splom(na.omit(prostate[, conVar]),
      upper.panel = panel.hexbinplot,
      pscale = 0,
      varname.cex = 0.7,
      nbins = 15,
      lower.panel = function(x, y) {
        panel.text(mean(range(x)), mean(range(y)),
                   round(cor(x, y), digits = 2),
                   cex = 0.7)
      })

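[Figure: scatter plot matrix of the continuous variables, with hexagonally binned scatter plots in the upper panels and the pairwise correlations printed in the lower panels. The correlations are:]

          age     wt    sbp    dbp     hg     sz     sg     ap
dtime   -0.18   0.12   0.01   0.05   0.20  -0.19  -0.18  -0.03
age            -0.06   0.10  -0.07  -0.09   0.01  -0.06  -0.06
wt                     0.21   0.23   0.26  -0.05  -0.09  -0.06
sbp                           0.63   0.06   0.05  -0.03  -0.05
dbp                                  0.15  -0.04  -0.07  -0.06
hg                                         -0.13  -0.14  -0.13
sz                                                 0.38   0.09
sg                                                        0.15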

For now we simply drop the observations with missing values, but see Harrell, Chapter 8, for a treatment of imputation for this particular data set. In the following plots we study the relation between the main response (the follow-up time) and the explanatory variables. This is complicated by the censoring.

### Dropping observations with missing variables.

subProstate <- transform(na.omit(prostate),
                         patno = as.factor(patno),
                         dtime = as.numeric(dtime))

meltedProstate <- melt(subProstate[, c("patno", conVar, "status")],
                       id.vars = c("patno", "dtime", "status"))

qplot(value, dtime, data = meltedProstate,
      xlab = "", ylab = "", geom = "point",
      colour = status != "alive", alpha = I(0.3), size = I(3)) +
  facet_wrap(~ variable, scales = "free_x") +
  geom_smooth(size = 1, se = FALSE) +
  coord_cartesian(ylim = c(-10, 90)) +
  geom_smooth(size = 1, se = FALSE, aes(colour = NULL))

[Figure: follow-up time (dtime) against each of age, wt, sbp, dbp, hg, sz, sg and ap, one panel per variable, with points and smoothed curves coloured by status != "alive" (FALSE or TRUE) and an overall smoothed curve.]

meltedProstate <- melt(subProstate[, c("patno", "dtime", disVar)],
                       id.vars = c("patno", "dtime", "status"))

qplot(value, dtime, data = meltedProstate,
      xlab = "", ylab = "", geom = "boxplot") +
  facet_wrap(~ variable, scales = "free_x") +
  geom_boxplot(aes(fill = status != "alive"), alpha = I(0.2)) +
  opts(axis.text.x = theme_text(angle = -30,
                                size = 8, hjust = 0, vjust = 1),
       legend.position = "top")

[Figure: boxplots of follow-up time (dtime) by the levels of stage, rx, pf, hx, ekg and bm, one panel per variable, filled by status != "alive" (FALSE or TRUE).]

The figures shed some light on how survival is related marginally to the explanatory variables, but because of the censoring the smoothed curves (blue) do not represent the mean survival time. To focus on the treatment we turn to the Kaplan-Meier estimator for each of the four treatment groups.

pbcSurv <- survfit(Surv(dtime, status != "alive") ~ rx,

data = prostate)

pbcSurv

## Call: survfit.formula(formula = Surv(dtime, status != "alive") ~ rx,

## data = prostate)

##

## records n.max n.start events median 0.95LCL 0.95UCL

## rx=placebo 127 127 127 95 33.0 27 40

## rx=0.2 mg estrogen 124 124 124 95 30.0 26 41

## rx=1.0 mg estrogen 126 126 126 71 41.5 31 NA

## rx=5.0 mg estrogen 125 125 125 93 35.0 28 46

plot(pbcSurv, mark.time = FALSE, conf.int = FALSE,

col = c("red", "blue", "purple", "cyan"))

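[Figure: Kaplan-Meier survival curves for the four treatment groups, with follow-up time in months on the horizontal axis and the estimated survival probability (0 to 1) on the vertical axis. The colours red, blue, purple and cyan follow the order of the groups in the output above.]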

The plot suggests that the individuals treated with 1.0 mg estrogen have a longer survival time than the other three groups. Adding confidence bands (which will, in general, mess up the reading of the figure) shows that the difference is significant.
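To make explicit how the Kaplan-Meier estimator handles censoring, the following sketch computes it by hand in base R. The times and event indicators are made up for the illustration and are not taken from the prostate data. At each event time the estimate is multiplied by one minus the fraction of deaths among those still at risk; censored individuals count as at risk until their censoring time but never as deaths.

```r
### Toy data (hypothetical): follow-up times and event indicators,
### 1 = death observed, 0 = censored.
time <- c(2, 3, 3, 5, 7, 8, 8, 10)
event <- c(1, 1, 0, 1, 0, 1, 1, 0)

### Distinct times at which at least one death is observed.
eventTimes <- sort(unique(time[event == 1]))

### Number at risk just before, and number of deaths at, each event time.
atRisk <- sapply(eventTimes, function(t) sum(time >= t))
deaths <- sapply(eventTimes, function(t) sum(time == t & event == 1))

### Kaplan-Meier estimate: cumulative product of the conditional
### survival probabilities.
kmEstimate <- cumprod(1 - deaths / atRisk)

data.frame(time = eventTimes, atRisk, deaths, survival = kmEstimate)
```

For these toy data the estimate at the event times 2, 3, 5 and 8 is 0.875, 0.75, 0.6 and 0.2. Applying `survfit(Surv(time, event) ~ 1)` from the survival package to the same vectors should give the same values at the event times.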

3 The Cox model

form <- Surv(dtime, status != "alive") ~ rx + age + wt + pf +

hx + sbp + dbp + ekg + hg + sz + sg + ap + bm

prostateCox <- coxph(form, data = subProstate)

summary(prostateCox)

## Call:

## coxph(formula = form, data = subProstate)

##

## n= 475, number of events= 338

##

## coef exp(coef) se(coef) z

## rx0.2 mg estrogen -0.027299 0.973070 0.154311 -0.18

## rx1.0 mg estrogen -0.309034 0.734155 0.166961 -1.85

## rx5.0 mg estrogen -0.000717 0.999283 0.158288 0.00

## age 0.024865 1.025177 0.009132 2.72

## wt -0.009343 0.990701 0.004710 -1.98

## pfin bed < 50% daytime 0.289166 1.335314 0.207166 1.40

## pfin bed > 50% daytime 0.434667 1.544449 0.331319 1.31

## pfconfined to bed 1.597688 4.941594 0.813367 1.96

## hx 0.503999 1.655328 0.119740 4.21

## sbp -0.030186 0.970265 0.029267 -1.03

## dbp 0.039661 1.040458 0.047883 0.83

## ekgbenign 0.061527 1.063459 0.279862 0.22

## ekgrhythmic disturb & electrolyte ch 0.343086 1.409290 0.193319 1.77

## ekgheart block or conduction def -0.097585 0.907025 0.290307 -0.34

## ekgheart strain 0.448690 1.566260 0.141378 3.17

## ekgold MI 0.063535 1.065597 0.179141 0.35

## ekgrecent MI 0.889271 2.433356 1.021377 0.87

## hg -0.069256 0.933087 0.031750 -2.18

## sz 0.017531 1.017686 0.004576 3.83

## sg 0.071384 1.073993 0.031675 2.25

## ap -0.001566 0.998435 0.000999 -1.57

## bm 0.272781 1.313612 0.170497 1.60

## Pr(>|z|)

## rx0.2 mg estrogen 0.85958

## rx1.0 mg estrogen 0.06418 .

## rx5.0 mg estrogen 0.99639

## age 0.00647 **

## wt 0.04729 *

## pfin bed < 50% daytime 0.16277

## pfin bed > 50% daytime 0.18954

## pfconfined to bed 0.04950 *

## hx 2.6e-05 ***

## sbp 0.30236

## dbp 0.40751

## ekgbenign 0.82599

## ekgrhythmic disturb & electrolyte ch 0.07594 .

## ekgheart block or conduction def 0.73676

## ekgheart strain 0.00151 **

## ekgold MI 0.72284

## ekgrecent MI 0.38394

## hg 0.02916 *

## sz 0.00013 ***

## sg 0.02422 *

## ap 0.11705

## bm 0.10962

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##

## exp(coef) exp(-coef) lower .95

## rx0.2 mg estrogen 0.973 1.028 0.719

## rx1.0 mg estrogen 0.734 1.362 0.529

## rx5.0 mg estrogen 0.999 1.001 0.733

## age 1.025 0.975 1.007

## wt 0.991 1.009 0.982

## pfin bed < 50% daytime 1.335 0.749 0.890

## pfin bed > 50% daytime 1.544 0.647 0.807

## pfconfined to bed 4.942 0.202 1.004

## hx 1.655 0.604 1.309

## sbp 0.970 1.031 0.916

## dbp 1.040 0.961 0.947

## ekgbenign 1.063 0.940 0.614

## ekgrhythmic disturb & electrolyte ch 1.409 0.710 0.965

## ekgheart block or conduction def 0.907 1.103 0.513

## ekgheart strain 1.566 0.638 1.187

## ekgold MI 1.066 0.938 0.750

## ekgrecent MI 2.433 0.411 0.329

## hg 0.933 1.072 0.877

## sz 1.018 0.983 1.009

## sg 1.074 0.931 1.009

## ap 0.998 1.002 0.996

## bm 1.314 0.761 0.940

## upper .95

## rx0.2 mg estrogen 1.317

## rx1.0 mg estrogen 1.018

## rx5.0 mg estrogen 1.363

## age 1.044

## wt 1.000

## pfin bed < 50% daytime 2.004

## pfin bed > 50% daytime 2.957

## pfconfined to bed 24.334

## hx 2.093

## sbp 1.028

## dbp 1.143

## ekgbenign 1.841

## ekgrhythmic disturb & electrolyte ch 2.059

## ekgheart block or conduction def 1.602

## ekgheart strain 2.066

## ekgold MI 1.514

## ekgrecent MI 18.014

## hg 0.993

## sz 1.027

## sg 1.143

## ap 1.000

## bm 1.835

##

## Concordance= 0.654 (se = 0.017 )

## Rsquare= 0.205 (max possible= 1 )

## Likelihood ratio test= 109 on 22 df, p=1.53e-13

## Wald test = 109 on 22 df, p=1.59e-13

## Score (logrank) test = 114 on 22 df, p=2.58e-14

##
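The columns of the summary are related in a simple way. As a worked example, the following sketch recomputes the hazard ratio, the z-statistic, the p-value and the 95% confidence interval for the 1.0 mg estrogen group from the printed coefficient and standard error. The variable names are chosen for the illustration only, and the numbers are copied from the output above, so the results agree with the summary only up to rounding.

```r
### Coefficient and standard error for 'rx1.0 mg estrogen', copied from
### the summary output above.
coefRx1 <- -0.309034
seRx1 <- 0.166961

### Hazard ratio relative to placebo: exp(coef).
hazardRatio <- exp(coefRx1)

### Wald z-statistic and two-sided p-value.
zStat <- coefRx1 / seRx1
pValue <- 2 * pnorm(-abs(zStat))

### 95% Wald confidence interval, computed on the log scale and then
### transformed to the hazard ratio scale.
waldCI <- exp(coefRx1 + c(-1, 1) * qnorm(0.975) * seRx1)

round(c(hazardRatio = hazardRatio, z = zStat, p = pValue), 4)
round(waldCI, 3)
```

The interval contains 1, which corresponds to the p-value for this coefficient being above 0.05.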

drop1(prostateCox, test = "Chisq")

## Single term deletions

##

## Model:

## Surv(dtime, status != "alive") ~ rx + age + wt + pf + hx + sbp +

## dbp + ekg + hg + sz + sg + ap + bm

## Df AIC LRT Pr(>Chi)

## <none> 3731

## rx 3 3730 4.91 0.17887

## age 1 3737 7.80 0.00524 **

## wt 1 3733 3.99 0.04585 *

## pf 3 3730 5.61 0.13245

## hx 1 3746 17.57 2.8e-05 ***

## sbp 1 3730 1.08 0.29940

## dbp 1 3729 0.69 0.40636

## ekg 6 3732 13.37 0.03756 *

## hg 1 3733 4.68 0.03049 *

## sz 1 3742 13.68 0.00022 ***

## sg 1 3734 4.99 0.02553 *

## ap 1 3732 2.89 0.08907 .

## bm 1 3731 2.49 0.11423

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
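The p-values in the table of single term deletions come from comparing the likelihood ratio statistic with a chi-square distribution with Df degrees of freedom, and the AIC of a reduced model differs from the AIC of the full model by LRT - 2 Df. The sketch below checks this for two of the terms. The numbers are copied from the rounded output above, so the agreement is only approximate, and the variable names are chosen for the illustration only.

```r
### Likelihood ratio statistics and degrees of freedom for rx and age,
### copied from the drop1 output above.
lrt <- c(rx = 4.91, age = 7.80)
df <- c(rx = 3, age = 1)

### p-values: upper tail of the chi-square distribution.
pValues <- pchisq(lrt, df = df, lower.tail = FALSE)

### AIC of the reduced models: dropping a term increases -2 log-likelihood
### by LRT and removes 2 * Df from the penalty.
aicFull <- 3731
aicReduced <- aicFull + lrt - 2 * df

round(pValues, 4)
round(aicReduced)
```

Because the printed AIC of the full model is itself rounded, the AIC recomputed in this way can be off by one in the last digit for some of the other terms.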

prostateData <- subProstate[, c("patno", all.vars(form)[-c(1, 2)])]

predProstate <- cbind(prostateData,

data.frame(mgres = residuals(prostateCox))

)

meltedPredProstate <- melt(predProstate, id.vars = c("patno", "mgres"),

measure.vars = conVar[conVar %in%

all.vars(form)[-c(1, 2)]])

qplot(value, mgres, data = meltedPredProstate,
      xlab = "", ylab = "", geom = "point", alpha = I(0.3), size = I(3)) +
  facet_wrap(~ variable, scales = "free_x") +
  geom_smooth(size = 1, fill = "blue") +
  coord_cartesian(ylim = c(-5, 2))

[Figure: martingale residuals (mgres) against age, wt, sbp, dbp, hg, sz, sg and ap, one panel per variable, each with a smoothed curve.]

meltedPredProstate <- melt(predProstate, id.vars = c("patno", "mgres"),
                           measure.vars = disVar[disVar %in%
                                                   all.vars(form)[-c(1, 2)]])

qplot(value, mgres, data = meltedPredProstate,
      xlab = "", ylab = "", geom = "boxplot") +
  facet_wrap(~ variable, scales = "free_x") +
  opts(axis.text.x = theme_text(angle = -30,
                                size = 8, hjust = 0, vjust = 1))

[Figure: boxplots of the martingale residuals (mgres) by the levels of rx, pf, hx, ekg and bm, one panel per variable.]

form <- Surv(dtime, status != "alive") ~ rx + rcs(age, 4) + rcs(wt, 4) + pf + hx +
  rcs(sbp, 4) + rcs(dbp, 4) + ekg + rcs(hg, 4) + rcs(sz, 4) + rcs(sg, 4) +
  rcs(ap, 4) + bm

prostateCox2 <- coxph(form, data = subProstate)

drop1(prostateCox2, test = "Chisq")

## Single term deletions

##

## Model:

## Surv(dtime, status != "alive") ~ rx + rcs(age, 4) + rcs(wt, 4) +

## pf + hx + rcs(sbp, 4) + rcs(dbp, 4) + ekg + rcs(hg, 4) +

## rcs(sz, 4) + rcs(sg, 4) + rcs(ap, 4) + bm

## Df AIC LRT Pr(>Chi)

## <none> 3727

## rx 3 3725 3.40 0.3334

## rcs(age, 4) 3 3737 15.85 0.0012 **

## rcs(wt, 4) 3 3728 7.05 0.0704 .

## pf 3 3730 8.54 0.0360 *

## hx 1 3742 17.09 3.6e-05 ***

## rcs(sbp, 4) 3 3723 1.49 0.6847

## rcs(dbp, 4) 3 3728 7.02 0.0712 .

## ekg 6 3729 13.48 0.0360 *

## rcs(hg, 4) 3 3731 9.23 0.0264 *

## rcs(sz, 4) 3 3733 12.04 0.0073 **

## rcs(sg, 4) 3 3725 3.87 0.2762

## rcs(ap, 4) 3 3733 11.15 0.0109 *

## bm 1 3726 0.36 0.5479

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

predictProstate <- predict(prostateCox2, type = "terms", se.fit = TRUE)

predictData <- melt(subProstate[, c("patno", conVar[-1])],
                    id.vars = "patno")

selectedTerms <- which(all.vars(form)[-c(1, 2)] %in% conVar)

plotData <- cbind(predictData,
                  se = melt(predictProstate$se.fit[, selectedTerms])$value,
                  y = melt(predictProstate$fit[, selectedTerms])$value)

ggplot(plotData, aes(x = value), xlab = "", ylab = "") +
  geom_ribbon(aes(ymin = y - 2*se, ymax = y + 2*se),
              alpha = I(0.2),
              fill = I("blue")) +
  geom_line(aes(y = y), size = 1) +
  geom_rug(data = predictData,
           aes(x = value, y = NULL),
           colour = I("black"),
           alpha = I(0.1)) +
  facet_wrap(~ variable, nrow = 2, scale = "free")

[Figure: estimated term effects for age, wt, sbp, dbp, hg, sz, sg and ap, one panel per variable, with bands of plus and minus two standard errors and a rug showing the observed values; the horizontal axis is labelled "value".]

predictData <- melt(subProstate[, c("patno", disVar[-c(1, 3)])],
                    id.vars = "patno")

selectedTerms <- which(all.vars(form)[-c(1, 2)] %in% disVar)

plotData <- cbind(predictData,
                  se = melt(predictProstate$se.fit[, selectedTerms])$value,
                  y = melt(predictProstate$fit[, selectedTerms])$value)

### Keep one row per combination of variable and level. The index must be
### computed from the new plotData and not from the plotData of the
### previous plot, which has a different number of rows.
plotData <- plotData[!duplicated(plotData[, c("variable", "value")]), ]

ggplot(plotData, aes(x = value), xlab = "", ylab = "") +
  geom_errorbar(aes(ymin = y - 2*se, ymax = y + 2*se),
                colour = I("blue")) +
  geom_point(aes(y = y), size = 3) +
  facet_wrap(~ variable, nrow = 2, scale = "free") +
  opts(axis.text.x = theme_text(angle = -30,
                                size = 8, hjust = 0, vjust = 1))

4 Session information

## R version 2.15.0 (2012-03-30)

## Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

##

## locale:

## [1] C

##

## attached base packages:

## [1] splines grid stats graphics grDevices utils datasets

## [8] methods base

##

## other attached packages:

## [1] knitr_0.5 rms_3.5-0 Hmisc_3.9-3 survival_2.36-12

## [5] ggplot2_0.9.1 hexbin_1.26.0 xtable_1.7-0 lattice_0.20-6

## [9] reshape2_1.2.1

##

## loaded via a namespace (and not attached):

## [1] MASS_7.3-17 RColorBrewer_1.0-5 Rcpp_0.9.10

## [4] cluster_1.14.2 codetools_0.2-8 colorspace_1.1-1

## [7] compiler_2.15.0 dichromat_1.2-4 digest_0.5.2

## [10] evaluate_0.4.1 formatR_0.3-4 highlight_0.3.1

## [13] labeling_0.1 memoise_0.1 munsell_0.3

## [16] parser_0.0-14 plyr_1.7.1 proto_0.3-9.2

## [19] scales_0.2.1 stringr_0.6 tcltk_2.15.0

## [22] tools_2.15.0