Design of Experiments and Survival Analysis

Anders Stockmarr
Technical University of Denmark
Section for Statistics and Data Analysis
[email protected]

AQUAEXCELL2020 Training Course - Planning and Conducting Experimental Infection Trials in Fish
DTU AQUA, 12 November 2019


Two topics for today

• Design of Experiments and Survival Analysis

• Survival Analysis session: R commands uploaded in the script ‘Commands.R’

• Data sets uploaded; should be placed in a folder labeled ‘Data’ in your R working directory, if you want to follow calculations and figure generation simultaneously

• Pdf file ‘Introduction to R’ uploaded


Target Audience

• You have:
  – taken a first course in statistics;
  – heard of the normal distribution;
  – some knowledge of the mean and variance;
  – done some regression analysis (or heard of it);
  – some knowledge of ANOVA (or heard of it);
  – used Windows or Mac based computers;
  – done, or will be conducting, experiments.

• These assumptions form the basis of the communication in this lecture.


Design of Experiments: Introduction


Main Reference

• Douglas C. Montgomery: Design and Analysis of Experiments. Wiley, 2017.

A standard textbook pitched at an appropriate academic level.


Overview

• Introduction

• Basic Statistical Concepts

• The Blocking Principle

• The $2^k$ Factorial Design


Introduction


Design of Experiments: Introduction

• Why is this trip necessary? Goals of the lecture

• Some basic principles and terminology

• The strategy of experimentation

• Guidelines for planning, conducting and analyzing experiments


Introduction to DOX

• An experiment is a test or a series of tests

• Experiments are used widely in the engineering world
  – Process characterization & optimization
  – Evaluation of material properties
  – Product design & development
  – Component & system tolerance determination

• “All experiments are designed experiments, some are poorly designed, some are well-designed”


Experiments

• Reduce time to design/develop new products & processes
• Improve performance of existing processes
• Improve reliability and performance of products
• Achieve product & process robustness
• Evaluation of materials, design alternatives, setting component & system tolerances, etc.


The Basic Principles of DOX

• Randomization
  – Running the trials in an experiment in random order
  – Notion of balancing out effects of “lurking” variables

• Replication
  – Sample size (improving precision of effect estimation, estimation of error or background noise)
  – Replication versus repeat measurements?

• Blocking
  – Dealing with nuisance factors


Strategy of Experimentation

• “Best-guess” experiments
  – Used a lot
  – More successful than you might suspect, but there are disadvantages…

• One-factor-at-a-time (OFAT) experiments
  – Sometimes associated with the “scientific” or “engineering” method
  – Devastated by interaction, also very inefficient

• Statistically designed experiments
  – Based on Fisher’s factorial concept


Factorial Designs

• In a factorial experiment, all possible combinations of factor levels are tested

• The golf experiment:
  – Type of driver
  – Type of ball
  – Walking vs. riding
  – Type of beverage
  – Time of round
  – Weather
  – Type of golf spike
  – Etc, etc, etc…
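To make "all possible combinations of factor levels" concrete, here is a minimal R sketch that enumerates a full factorial for three of the golf factors. The level names (`oversized`, `balata`, and so on) and the column names are illustrative assumptions; the slide only names the factors.

```r
# Hypothetical two-level settings for three of the golf factors (illustration only).
factors <- list(
  driver   = c("oversized", "regular"),
  ball     = c("balata", "three-piece"),
  mobility = c("walk", "ride")
)

# expand.grid() builds every combination of the supplied levels: a full factorial.
design <- expand.grid(factors, stringsAsFactors = FALSE)
n_runs <- nrow(design)   # 2 * 2 * 2 combinations
print(design)
```

Each added two-level factor doubles the number of rows, which is why fractional factorials become attractive when many factors are studied.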


Factorial Design


Factorial Designs with Several Factors


Factorial Designs with Several Factors: A Fractional Factorial


Planning, Conducting & Analyzing an Experiment

1. Recognition of & statement of problem
2. Choice of factors, levels, and ranges
3. Selection of the response variable(s)
4. Choice of design
5. Conducting the experiment
6. Statistical analysis
7. Drawing conclusions, recommendations


Planning, Conducting & Analyzing an Experiment

• Get statistical thinking involved early
• Your non-statistical knowledge is crucial to success
• Pre-experimental planning (steps 1-3) vital
• Think and experiment sequentially (use the KISS principle)
• Reference: Coleman & Montgomery (Technometrics 1993).


Design of Experiments: Basic Statistical Concepts


Design of Experiments: Basic Statistical Concepts

• Simple comparative experiments
  – The hypothesis testing framework
  – The two-sample t-test
  – Checking assumptions, validity

• Comparing more than two factor levels…the analysis of variance
  – ANOVA decomposition of total variability
  – Statistical testing & analysis
  – Checking assumptions, model validity
  – Post-ANOVA testing of means

• Sample size determination


Portland Cement Formulation

16.62, 16.75, 17.37, 17.12, 16.98, 16.87, 17.34, 17.02, 17.08, 17.27


Graphical View of the Data: Dot Diagram


Box Plots


The Hypothesis Testing Framework

• Statistical hypothesis testing is a useful framework for many experimental situations

• Origins of the methodology date from the early 1900s

• We will use a procedure known as the two-sample t-test


The Hypothesis Testing Framework

• Sampling from a normal distribution
• Statistical hypotheses:

$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2$$


Estimation of Parameters

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \quad \text{estimates the population mean } \mu$$

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 \quad \text{estimates the variance } \sigma^2$$
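As a small illustration of the two estimators, the sketch below applies them to the ten observations listed on the Portland Cement Formulation slide and checks them against R's built-in `mean()` and `var()`. The variable names are illustrative; the results round to 17.04 and 0.061, which agrees with the values the deck reports for Formulation 2.

```r
# Ten observations from the Portland Cement Formulation slide.
y <- c(16.62, 16.75, 17.37, 17.12, 16.98, 16.87, 17.34, 17.02, 17.08, 17.27)
n <- length(y)

y_bar <- sum(y) / n                     # sample mean, estimates mu
s2    <- sum((y - y_bar)^2) / (n - 1)   # sample variance, estimates sigma^2

# The built-in functions use the same definitions.
stopifnot(isTRUE(all.equal(y_bar, mean(y))), isTRUE(all.equal(s2, var(y))))
cat("mean =", round(y_bar, 3), " variance =", round(s2, 3), "\n")
```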


Summary Statistics

Formulation 1 (“New recipe”): $\bar{y}_1 = 16.76$, $S_1^2 = 0.100$, $S_1 = 0.316$, $n_1 = 10$

Formulation 2 (“Original recipe”): $\bar{y}_2 = 17.04$, $S_2^2 = 0.061$, $S_2 = 0.248$, $n_2 = 10$


How the Two-Sample t-Test Works:

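Use the sample means to draw inferences about the population means:

$$\bar{y}_1 - \bar{y}_2 = 16.76 - 17.04 = -0.28$$

Compare the difference in sample means with the standard deviation of the difference in sample means, using

$$\sigma_{\bar{y}}^2 = \frac{\sigma^2}{n}$$

This suggests a statistic:

$$Z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$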


How the Two-Sample t-Test Works:

Use $S_1^2$ and $S_2^2$ to estimate $\sigma_1^2$ and $\sigma_2^2$.

The previous ratio becomes

$$\frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

However, we have the case where $\sigma_1^2 = \sigma_2^2 = \sigma^2$. Pool the individual sample variances:

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$


How the Two-Sample t-Test Works:

The test statistic is

$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

• Values of $t_0$ that are near zero are consistent with the null hypothesis

• Values of $t_0$ that are very different from zero are consistent with the alternative hypothesis

• $t_0$ is a “distance” measure: how far apart the averages are, expressed in standard deviation units

• Notice the interpretation of $t_0$ as a signal-to-noise ratio


The Two-Sample (Pooled) t-Test

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{9(0.100) + 9(0.061)}{10 + 10 - 2} = 0.081$$

$$S_p = 0.284$$

$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{16.76 - 17.04}{0.284\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = -2.20$$

The two sample means are a little over two standard deviations apart. Is this a "large" difference?
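The arithmetic on this slide can be re-traced in R from the rounded summary statistics alone. This is only a sketch; the variable names are illustrative. Because the inputs are rounded, the result agrees with the slide's value to about one decimal place, and `t.test()` on the raw data reports a slightly different statistic.

```r
# Summary statistics as reported in the deck.
y1 <- 16.76; s1_sq <- 0.100; n1 <- 10   # Formulation 1
y2 <- 17.04; s2_sq <- 0.061; n2 <- 10   # Formulation 2

sp_sq <- ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance
sp    <- sqrt(sp_sq)                                            # pooled standard deviation
t0    <- (y1 - y2) / (sp * sqrt(1 / n1 + 1 / n2))               # test statistic

cat("Sp^2 =", round(sp_sq, 4), " Sp =", round(sp, 3), " t0 =", round(t0, 2), "\n")
```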


The Two-Sample (Pooled) t-Test

• So far, we haven’t really done any “statistics”

• We need an objective basis for deciding how large the test statistic t0 really is

• In 1908, W. S. Gosset derived the reference distribution for t0 … called the t distribution

• Available in software packages such as R

t0 = -2.20


The Two-Sample (Pooled) t-Test

• A value of t0 between –2.101 and 2.101 is consistent with equality of means

• It is possible for the means to be equal and t0 to exceed either 2.101 or –2.101, but it would be a “rare event” … leads to the conclusion that the means are different

• Could also use the p-value approach

t0 = -2.20


The Two-Sample (Pooled) t-Test

• The test level α is the chosen risk of wrongly rejecting the null hypothesis of equal means. The usual level of α is 0.05.

• The p-value is the probability of getting an event at least as extreme as the one observed, under the hypothesis of equal means (it measures the rareness of the event).

• The null hypothesis is rejected if the p-value is lower than the test level. In our problem, the p-value is p = 0.042

t0 = -2.20
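Both the critical value 2.101 and the p-value can be obtained from the t distribution functions in base R. The sketch below assumes a two-sided test with 18 degrees of freedom and uses the unrounded statistic t = -2.1869 that the deck's `t.test()` output reports; the variable names are illustrative.

```r
alpha <- 0.05
df    <- 18        # n1 + n2 - 2 for two samples of size 10
t0    <- -2.1869   # unrounded statistic from the t.test() output in this deck

t_crit  <- qt(1 - alpha / 2, df)    # two-sided critical value
p_value <- 2 * pt(-abs(t0), df)     # two-sided p-value

reject <- abs(t0) > t_crit          # same decision as p_value < alpha
cat("critical value =", round(t_crit, 3), " p-value =", round(p_value, 4), "\n")
```

Comparing `abs(t0)` with the critical value and comparing the p-value with `alpha` always lead to the same decision.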


R Two-Sample t-Test Results

R command:

    # pooled (equal-variance) two-sample t-test
    t.test(modified, unmodified, var.equal = TRUE)

Output:

        Two Sample t-test

    data:  modified and unmodified
    t = -2.1869, df = 18, p-value = 0.0422
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.54507339 -0.01092661
    sample estimates:
    mean of x mean of y
       16.764    17.042

The p-value is found on the output line that begins with “t = -2.1869”.


Checking Assumptions: The Normal Quantile-Quantile Plot
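A normal quantile-quantile plot can be drawn with base R graphics. The sketch below is only an illustration, using the ten observations from the Portland Cement Formulation slide; it is not necessarily how the deck's figure was produced.

```r
# Ten observations from the Portland Cement Formulation slide.
y <- c(16.62, 16.75, 17.37, 17.12, 16.98, 16.87, 17.34, 17.02, 17.08, 17.27)

# Sample quantiles against theoretical normal quantiles.
qq <- qqnorm(y, main = "Normal Q-Q plot")
# Reference line through the first and third quartiles.
qqline(y)
```

Points that fall close to the reference line are consistent with the normality assumption; with only ten observations, moderate departures are not unusual.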


Importance of the t-Test

• Provides an objective framework for simple comparative experiments

• Could be used to test all relevant hypotheses in a two-level factorial design, because all of these hypotheses involve the mean response at one “side” of the cube versus the mean response at the opposite “side” of the cube


Confidence Intervals

• Hypothesis testing gives an objective statement concerning the difference in means, but it doesn’t specify “how different” they are

• General form of a confidence interval:

$$L \le \theta \le U \quad \text{where} \quad P(L \le \theta \le U) = 1 - \alpha$$

• The 100(1 - α)% confidence interval on the difference in two means:

$$\bar{y}_1 - \bar{y}_2 - t_{\alpha/2,\,n_1+n_2-2}\, S_p \sqrt{(1/n_1) + (1/n_2)} \;\le\; \mu_1 - \mu_2 \;\le\; \bar{y}_1 - \bar{y}_2 + t_{\alpha/2,\,n_1+n_2-2}\, S_p \sqrt{(1/n_1) + (1/n_2)}$$
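The interval formula can be evaluated directly from the summary statistics reported earlier in the deck. This is a sketch with illustrative variable names; since it starts from rounded inputs, its endpoints differ in the third decimal from the interval that `t.test()` prints for the raw data.

```r
# Summary statistics as reported in the deck.
y1 <- 16.76; s1_sq <- 0.100; n1 <- 10
y2 <- 17.04; s2_sq <- 0.061; n2 <- 10
alpha <- 0.05

df     <- n1 + n2 - 2
sp     <- sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df)     # pooled standard deviation
margin <- qt(1 - alpha / 2, df) * sp * sqrt(1 / n1 + 1 / n2)   # half-width of the interval

ci <- c(lower = (y1 - y2) - margin, upper = (y1 - y2) + margin)
print(round(ci, 3))
```

The interval lies entirely below zero, which matches the rejection of equal means at the 5% level.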


What If There Are More Than Two Factor Levels?

• The t-test does not directly apply

• There are lots of practical situations where there are either more than two levels of interest, or there are several factors of simultaneous interest

• The analysis of variance (ANOVA) is the appropriate analysis “engine” for these types of experiments

• The ANOVA was developed by Fisher in the early 1920s, and initially applied to agricultural experiments


An Example

• An engineer is interested in investigating the relationship between the RF power setting and the etch rate for this tool. The objective of an experiment like this is to model the relationship between etch rate and RF power, and to specify the power setting that will give a desired target etch rate.

• The response variable is etch rate.

• She is interested in a particular gas (C2F6) and gap (0.80 cm), and wants to test four levels of RF power: 160W, 180W, 200W, and 220W. She decided to test five wafers at each level of RF power.

• The experimenter chooses 4 levels of RF power: 160W, 180W, 200W, and 220W

• The experiment is replicated 5 times – runs made in random order
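One way to produce a random run order for this design is sketched below in R. The seed value and the column names `power` and `run_order` are arbitrary choices for illustration, not taken from the slides.

```r
set.seed(2019)  # arbitrary seed, only so the run sheet is reproducible

power_levels <- c(160, 180, 200, 220)   # RF power in watts
n_rep        <- 5                       # wafers per power level

# One row per run, then shuffle all 20 rows at once.
runs <- data.frame(power = rep(power_levels, each = n_rep))
runs <- runs[sample(nrow(runs)), , drop = FALSE]
runs$run_order <- seq_len(nrow(runs))
rownames(runs) <- NULL
print(runs)
```

Shuffling all 20 runs together, rather than running the five wafers of one power level back to back, is what guards against drift or other lurking variables being confused with the power effect.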


An Example

• Does changing the power change the mean etch rate?

• Is there an optimum level for power?


The Analysis of Variance

• In general, there will be $a$ levels of the factor, or $a$ treatments, and $n$ replicates of the experiment, run in random order…a completely randomized design (CRD)

• $N = an$ total runs

• Objective is to test hypotheses about the equality of the $a$ treatment means


The Analysis of Variance

• The name “analysis of variance” stems from a partitioning of the total variability in the response variable into components that are consistent with a model for the experiment

• The basic single-factor ANOVA model is

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$$

where $\mu$ = an overall mean, $\tau_i$ = $i$th treatment effect, $\varepsilon_{ij}$ = experimental error, $NID(0, \sigma^2)$


Models for the Data

There are several ways to write a model for the data:

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

is called the effects model. Let $\mu_i = \mu + \tau_i$, then

$$y_{ij} = \mu_i + \varepsilon_{ij}$$

is called the means model. Regression models can also be employed.


The Analysis of Variance

• Total variability is measured by the total sum of squares:

$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2$$

• The basic ANOVA partitioning is:

$$\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n} [(\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.})]^2 = n\sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2$$

$$SS_T = SS_{Treatments} + SS_E$$
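The partition can be checked numerically. The sketch below assumes the raw etch-rate observations from Montgomery's textbook example; these 20 values are not listed in the slides and are included here as an assumption, although they do reproduce the group means and standard deviations shown in the deck's LSD output. The data frame and column names are illustrative.

```r
# ASSUMPTION: raw etch rates as given in Montgomery's textbook example;
# they are not listed in this deck.
etch <- data.frame(
  power = factor(rep(c(160, 180, 200, 220), each = 5)),
  rate  = c(575, 542, 530, 539, 570,
            565, 593, 590, 579, 610,
            600, 651, 610, 637, 629,
            725, 700, 715, 685, 710)
)

grand_mean  <- mean(etch$rate)
group_means <- ave(etch$rate, etch$power)   # each run's own treatment mean

ss_total      <- sum((etch$rate - grand_mean)^2)
ss_treatments <- sum((group_means - grand_mean)^2)   # same as n * sum over the a groups
ss_error      <- sum((etch$rate - group_means)^2)

cat(ss_total, "=", ss_treatments, "+", ss_error, "\n")
```

Under this assumption the three sums of squares round to the values 72210, 66871 and 5339 that appear in the deck's `drop1()` output.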


The Analysis of Variance

$$SS_T = SS_{Treatments} + SS_E$$

• A large value of $SS_{Treatments}$ reflects large differences in treatment means

• A small value of $SS_{Treatments}$ likely indicates no differences in treatment means

• Formal statistical hypotheses are:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_a \qquad H_1: \text{At least one mean is different}$$


The Analysis of Variance

• While sums of squares cannot be directly compared to test the hypothesis of equal means, mean squares can be compared.

• A mean square is a sum of squares divided by its degrees of freedom:

$$df_{Total} = df_{Treatments} + df_{Error}, \qquad an - 1 = (a - 1) + a(n - 1)$$

$$MS_{Treatments} = \frac{SS_{Treatments}}{a - 1}, \qquad MS_E = \frac{SS_E}{a(n - 1)}$$

• If the treatment means are equal, the treatment and error mean squares will be (theoretically) equal.

• If treatment means differ, the treatment mean square will be larger than the error mean square.


The Analysis of Variance is Summarized in a Table

• The reference distribution for $F_0$ is the $F_{a-1,\,a(n-1)}$ distribution

• Reject the null hypothesis (equal treatment means) if

$$F_0 > F_{\alpha,\,a-1,\,a(n-1)}$$
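For a balanced design, the mean squares and F0 can be rebuilt from group means and standard deviations alone. The sketch below uses the per-level means and standard deviations printed in the deck's LSD output for the etch-rate example (n = 5 wafers per level); the variable names are illustrative.

```r
# Group means and standard deviations as printed in the deck's LSD output.
means <- c(551.2, 587.4, 625.4, 707.0)
sds   <- c(20.01749, 16.74216, 20.52559, 15.24795)
n     <- 5                # replicates per power level
a     <- length(means)    # number of treatments

# Balanced design: the grand mean equals the mean of the group means.
ss_treatments <- n * sum((means - mean(means))^2)
ss_error      <- (n - 1) * sum(sds^2)

ms_treatments <- ss_treatments / (a - 1)
ms_error      <- ss_error / (a * (n - 1))
f0            <- ms_treatments / ms_error

f_crit  <- qf(1 - 0.05, a - 1, a * (n - 1))                  # upper 5% point of F(3, 16)
p_value <- pf(f0, a - 1, a * (n - 1), lower.tail = FALSE)    # upper-tail p-value
cat("F0 =", round(f0, 2), " critical value =", round(f_crit, 2), "\n")
```

F0 is far above the critical value, so the hypothesis of equal treatment means is rejected, in line with the F value and p-value in the deck's R output.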


ANOVA Table

• Never done by hand, always with a computer. In R, the lm() function applies (lm for ‘Linear Model’)


ANOVA Table: R code

Executed R code:

    # one-way ANOVA fitted as a linear model; Power is treated as a factor
    my.analysis <- lm(x ~ as.factor(Power), data = etching)
    # F test for removing Power from the model
    drop1(my.analysis, test = "F")

Output:

    Single term deletions

    Model:
    x ~ as.factor(Power)
                     Df Sum of Sq   RSS    AIC F value    Pr(>F)
    <none>                         5339 119.74
    as.factor(Power)  3     66871 72210 165.83  66.797 2.883e-09 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


The Reference Distribution:

In R, you can find $F_{0.05,3,16}$ as qf(1-0.05,3,16), which gives 3.24


Model Adequacy Checking

• Checking assumptions is important
• Normality
• Constant variance
• Independence
• Have we fit the right model?
• We will not discuss what to do if some of these assumptions are violated, because of time issues.


Model Adequacy Checking in the ANOVA

• Examination of residuals:

$$e_{ij} = y_{ij} - \hat{y}_{ij} = y_{ij} - \bar{y}_{i.}$$

• Residual plots are very useful

• Quantile-quantile plot of residuals


Other Important Residual Plots


Post-ANOVA Comparison of Means

• The analysis of variance tests the hypothesis of equal treatment means
• Assume that residual analysis is satisfactory
• If that hypothesis is rejected, we don’t know which specific means are different
• Determining which specific means differ following an ANOVA is a multiple comparisons problem
• There are lots of ways to do this…
• We will use pairwise t-tests on means…sometimes called Fisher’s Least Significant Difference (or Fisher’s LSD) Method


Fisher’s LSD: R code

R code:

    install.packages("agricolae")
    library(agricolae)
    LSD.test(my.analysis, "Power", p.adj = "bonferroni", console = TRUE)

Output:

    Study: my.analysis ~ "Power"

    LSD t Test for x
    P value adjustment method: bonferroni

    Mean Square Error:  333.7

    Power,  means and individual ( 95 %) CI

            x      std r      LCL      UCL Min Max
    160 551.2 20.01749 5 533.8815 568.5185 530 575
    180 587.4 16.74216 5 570.0815 604.7185 565 610
    200 625.4 20.52559 5 608.0815 642.7185 600 651
    220 707.0 15.24795 5 689.6815 724.3185 685 725

    Alpha: 0.05 ; DF Error: 16
    Critical Value of t: 3.008334

    Minimum Significant Difference: 34.75635

    Treatments with the same letter are not significantly different.

            x groups
    220 707.0      a
    200 625.4      b
    180 587.4      c
    160 551.2      d

All different letters, i.e. none of the groups can be collapsed at a 5% Bonferroni-corrected test level.
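The critical value and minimum significant difference in this output can be re-derived in base R from the mean square error, the error degrees of freedom and the group size. This sketch is only a hand check of the reported numbers, not a description of the internals of `LSD.test()`; the variable names are illustrative.

```r
means <- c("160" = 551.2, "180" = 587.4, "200" = 625.4, "220" = 707.0)
mse   <- 333.7   # Mean Square Error from the output
df_e  <- 16      # error degrees of freedom
n     <- 5       # replicates per group
alpha <- 0.05

n_pairs <- choose(length(means), 2)               # 6 pairwise comparisons
t_crit  <- qt(1 - alpha / (2 * n_pairs), df_e)    # Bonferroni-adjusted critical value
msd     <- t_crit * sqrt(2 * mse / n)             # minimum significant difference

# Compare every pairwise difference in means with the MSD.
diffs      <- abs(outer(means, means, "-"))
all_differ <- all(diffs[upper.tri(diffs)] > msd)
cat("t =", round(t_crit, 4), " MSD =", round(msd, 3), " all pairs differ:", all_differ, "\n")
```

The smallest gap between adjacent group means is 36.2 (160 W versus 180 W), which still exceeds the minimum significant difference, so each power level gets its own letter.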


Graphical Comparison of Means


The Regression Model


Why Does the ANOVA Work?

We are sampling from normal populations, so

$$\frac{SS_{Treatments}}{\sigma^2} \sim \chi^2_{a-1} \ \text{ if } H_0 \text{ is true, and } \ \frac{SS_E}{\sigma^2} \sim \chi^2_{a(n-1)}$$

Cochran's theorem gives the independence of these two chi-square random variables.

So

$$F_0 = \frac{SS_{Treatments}/(a-1)}{SS_E/[a(n-1)]} \sim \frac{\chi^2_{a-1}/(a-1)}{\chi^2_{a(n-1)}/[a(n-1)]} \sim F_{a-1,\,a(n-1)}$$

Finally,

$$E(MS_{Treatments}) = \sigma^2 + \frac{n\sum_{i=1}^{a}\tau_i^2}{a-1} \quad \text{and} \quad E(MS_E) = \sigma^2$$

Therefore an upper-tail test is appropriate.
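The distributional claim can be checked by simulation. The sketch below is an illustration only: it assumes a hypothetical design with a = 4 treatments and n = 5 replicates, generates data with H0 true, and checks that the upper-tail F-test rejects in about 5% of the simulated experiments.

```r
set.seed(1)          # arbitrary seed, for reproducibility only
a <- 4; n <- 5       # hypothetical design: 4 treatments, 5 replicates
n.sim <- 2000
group <- rep(seq_len(a), each = n)

f0 <- replicate(n.sim, {
  y <- rnorm(a * n)                       # H0 true: all treatment means equal
  group.means <- tapply(y, group, mean)
  ss.treat <- n * sum((group.means - mean(y))^2)
  ss.error <- sum((y - group.means[group])^2)
  (ss.treat / (a - 1)) / (ss.error / (a * (n - 1)))
})

# Upper-tail critical value of F with (a - 1, a(n - 1)) degrees of freedom
f.crit <- qf(0.95, a - 1, a * (n - 1))
reject.rate <- mean(f0 > f.crit)          # should be close to 0.05
reject.rate
```

When treatment effects are present, E(MS_Treatments) grows while E(MS_E) stays at σ², so F0 is pushed upwards; only large values of F0 speak against H0.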


Sample Size Determination

• A frequently asked question in designed experiments: how large should the sample be?
• The answer depends on lots of things, including what type of experiment is being contemplated, how it will be conducted, resources, and desired sensitivity – how sure do you want to be?
• Sensitivity refers to the difference in means that the experimenter wishes to detect.
• Generally, increasing the number of replications increases the sensitivity, i.e. it makes it easier to detect small differences in means


Sample Size Determination

• Can choose the sample size to detect a specific difference in means and achieve desired values of type I and type II errors
• Type I error – reject H0 when it is true ($\alpha$)
• Type II error – fail to reject H0 when it is false ($\beta$)
• Power = $1 - \beta$
• Operating characteristic curves plot $\beta$ against a parameter $\Phi$, where

$$\Phi^2 = \frac{n\sum_{i=1}^{a}\tau_i^2}{a\sigma^2}$$
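As an illustration of how Φ and the power 1 − β grow with the number of replicates n, here is a minimal sketch. The treatment means and the error standard deviation are hypothetical values chosen for the example, not taken from the slides; `power.anova.test` is the base R (stats) function for balanced one-way ANOVA.

```r
# Hypothetical treatment means and error standard deviation (not from the slides)
mu    <- c(575, 600, 650, 675)
sigma <- 25
a     <- length(mu)
tau   <- mu - mean(mu)          # treatment effects, summing to zero

n    <- 2:6                     # candidate numbers of replicates
phi2 <- n * sum(tau^2) / (a * sigma^2)

# power.anova.test() takes the variance of the group means as between.var
power <- sapply(n, function(k)
  power.anova.test(groups = a, n = k, between.var = var(mu),
                   within.var = sigma^2, sig.level = 0.05)$power)

data.frame(n = n, Phi = sqrt(phi2), power = power, beta = 1 - power)
```

Each row of the resulting table corresponds to one point on an operating characteristic curve: the larger Φ, the smaller the type II error β.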


Sample Size Determination

• Rule of thumb for the t-test: You obtain a power of 80% when

$$n \approx \frac{8\sigma^2}{\Delta^2}$$

where $\sigma^2$ is the residual variance, and $\Delta$ is the difference that you want to be able to detect.

Example: suppose that our measurements are around 20, with a variance of 10, and we want to detect a 10% change (i.e. $\Delta = \pm 2$). Then

$$n \approx 8 \times 10 / 2^2 = 20$$


Sample Size Determination

• The general case of the t-test: For an arbitrary power $1 - \beta$ and an arbitrary test level $\alpha$:

$$n \approx \frac{\sigma^2 \left(z_{1-\beta} + z_{1-\alpha/2}\right)^2}{\Delta^2}$$

where $z_q$ is the $q$-quantile of the standard normal distribution. One can find it in R as qnorm(q).

Example: Suppose that $\alpha = 0.05$, and the desired power is $1 - \beta = 0.8$. Since qnorm(1-0.05/2) is 1.96 and qnorm(0.8) returns the value 0.84, it holds that

$$n \approx \frac{\sigma^2 (2.8)^2}{\Delta^2} = \frac{7.84\,\sigma^2}{\Delta^2}$$

The rule of thumb reappears.
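The formula is easy to wrap in a small helper. The sketch below uses a hypothetical function name, `n.approx`, and reuses the earlier example (variance 10, difference 2). The comparison with `power.t.test` uses `type = "one.sample"`; that choice is an assumption about which t-test the approximation describes, not something stated on the slides. For two independent groups, the usual normal approximation needs roughly twice as many observations in each group.

```r
# Approximate sample size for a t-test, following the formula above
n.approx <- function(sigma2, delta, alpha = 0.05, power = 0.8) {
  sigma2 * (qnorm(power) + qnorm(1 - alpha / 2))^2 / delta^2
}

# Example from the rule-of-thumb slide: variance 10, difference 2
n.rule    <- 8 * 10 / 2^2                    # rule of thumb
n.general <- n.approx(sigma2 = 10, delta = 2)

# t-based answer; type = "one.sample" is an assumption made here
n.exact <- power.t.test(delta = 2, sd = sqrt(10), power = 0.8,
                        type = "one.sample")$n

c(rule = n.rule, general = n.general, exact = n.exact)
```

The t-based answer comes out slightly larger than the normal approximation, because the variance has to be estimated from the data.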


Sample Size Determination

Power and sample size can be explored in R with the function power.t.test. For more general designs, use the pwr package:

Function          Power calculations for
pwr.2p.test       Two proportions (equal n)
pwr.2p2n.test     Two proportions (unequal n)
pwr.anova.test    Balanced one-way ANOVA
pwr.chisq.test    Chi-square test
pwr.f2.test       General linear model
pwr.p.test        Proportion (one-sample)
pwr.r.test        Correlation
pwr.t.test        T-tests (one sample, two sample, paired)
pwr.t2n.test      T-test (two samples with unequal n)


Sample Size Determination – Example:

Let us investigate the Portland Cement formulation example. Here we found a difference between the groups of -0.28, and a pooled sd of 0.284:

$$\bar{y}_1 - \bar{y}_2 = -0.28$$

$$S_p = 0.284$$

If these values were indeed the true difference between the groups and the true sd, how many runs should we have in the experiment to be 80% sure of detecting a statistically significant difference?


Sample Size Determination – Example:

R code:
power.t.test(delta=-0.28,sd=0.284,power=0.8)

Output:
     Two-sample t test power calculation

              n = 17.16492
          delta = 0.28
             sd = 0.284
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

Thus, we need 18 runs in each group to be 80% sure of detecting the difference. Perhaps we were lucky with only 10 in each group.


Sample Size Determination – Example:

R code:
power.t.test(n=10, delta=-0.28,sd=0.284)

Output:
     Two-sample t test power calculation

              n = 10
          delta = 0.28
             sd = 0.284
      sig.level = 0.05
          power = 0.5502385
    alternative = two.sided

NOTE: n is number in *each* group

Thus, the actual power of the experiment was only about 55%, close to 50-50, and we may have been lucky to detect the difference.
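The two calls can be combined into a power curve over the number of runs per group. This is a minimal sketch using the Portland cement estimates quoted on the slides; the grid of sample sizes and the object names are illustrative choices.

```r
# Power of the two-sample t-test as a function of runs per group,
# using the Portland cement estimates from the slides
n.grid <- 5:25
power <- sapply(n.grid, function(n)
  power.t.test(n = n, delta = 0.28, sd = 0.284)$power)

power.table <- data.frame(n = n.grid, power = round(power, 3))
power.table

# Smallest n on the grid that reaches 80% power
n.needed <- min(n.grid[power >= 0.8])
n.needed
```

The table makes the trade-off visible: each extra run per group buys less additional power as n grows.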


Design of Experiments: The Blocking Principle


Design of Experiments: The Blocking Principle

• Blocking and nuisance factors

• The randomized complete block design - the RCBD

• Extension of the ANOVA to the RCBD

• Other blocking scenarios… Latin Square designs


The Blocking Principle

• Blocking is a technique for dealing with nuisance factors

• A nuisance factor is a factor that probably has some effect on the response, but it’s of no interest to the experimenter… however, the variability it transmits to the response needs to be minimized

• Typical nuisance factors include batches of raw material, operators, pieces of test equipment, time (shifts, days, etc.), different experimental units

• Many experiments involve blocking (or should)

• Failure to block is a common flaw in designing an experiment (consequences?)


The Blocking Principle

• If the nuisance factor is known and controllable (i.e. we can choose the values), we use blocking

• If the nuisance factor is known and uncontrollable, sometimes we can use regression analysis to remove the effect of the nuisance factor from the analysis

• If the nuisance factor is unknown and uncontrollable (a “lurking” variable), we hope that randomization balances out its impact across the experiment

• Sometimes several sources of variability are combined in a block, so the block becomes an aggregate variable


The Hardness Testing Example

• We wish to determine whether 4 different tips produce different (mean) hardness readings on a Rockwell hardness tester

• Assignment of the tips to an experimental unit; that is, a test coupon

• Structure of a completely randomized experiment

• The test coupons are a source of nuisance variability

• Alternatively, the experimenter may want to test the tips across coupons of various hardness levels

• The need for blocking


The Hardness Testing Example: Randomized Complete Block Design (RCBD)

• To conduct this experiment as an RCBD, assign all 4 tips to each coupon
• Each coupon is called a “block”; that is, it’s a more homogeneous experimental unit on which to test the tips
• Variability between blocks can be large, variability within a block should be relatively small
• In general, a block is a specific level of the nuisance factor
• A complete replicate of the basic experiment is conducted in each block
• A block represents a restriction on randomization
• All runs within a block are randomized
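The restricted randomization described in these bullets can be sketched in a few lines: every tip appears once on every coupon, and only the run order within a coupon is random. The labels and the seed below are hypothetical choices made for the illustration.

```r
set.seed(42)   # arbitrary seed, for reproducibility only
tips    <- paste("Tip", 1:4)       # treatments
coupons <- paste("Coupon", 1:4)    # blocks (hypothetical labels)

# Each row is one block; the columns give the random run order within it
run.order <- t(sapply(coupons, function(block) sample(tips)))
colnames(run.order) <- paste("Run", 1:4)
run.order
```

Because `sample(tips)` is called separately for each coupon, the randomization never moves a tip from one block to another, which is exactly the restriction that blocking imposes.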


The Hardness Testing Example

• Suppose that we use b = 4 blocks:

• Notice the two-way structure of the experiment
• Once again, we are interested in testing the equality of treatment means, but now we have to remove the variability associated with the nuisance factor (the blocks)


Using ANOVA to model the RCBD

• Suppose that there are a treatments (factor levels) and b blocks
• A statistical model (effects model) for the RCBD is

$$y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}, \quad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, b$$

• The relevant (fixed effects) hypothesis is

$$H_0: \tau_1 = \tau_2 = \cdots = \tau_a$$


Using ANOVA to model the RCBD

ANOVA partitioning of total variability:

$$\sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\left[(\bar{y}_{i.} - \bar{y}_{..}) + (\bar{y}_{.j} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})\right]^2$$

$$= b\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})^2 + a\sum_{j=1}^{b}(\bar{y}_{.j} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2$$

$$SS_T = SS_{Treatments} + SS_{Blocks} + SS_E$$


Using ANOVA to model the RCBD

The degrees of freedom for the sums of squares in

$$SS_T = SS_{Treatments} + SS_{Blocks} + SS_E$$

are as follows:

$$ab - 1 = (a - 1) + (b - 1) + (a - 1)(b - 1)$$

Therefore, ratios of sums of squares to their degrees of freedom result in mean squares, and the ratio of the mean square for treatments to the error mean square is an F statistic that can be used to test the hypothesis of equal treatment means.


ANOVA Display for the RCBD

In R: lm does the job again.


Vascular Graft Example

• To conduct this experiment as an RCBD, assign all 4 pressures to each of the 6 batches of resin

• Each batch of resin is called a “block”; that is, it’s a more homogeneous experimental unit on which to test the extrusion pressures


Vascular Graft Example

• R code:
graftdata<-data.frame(
  x=c(90.3,89.2,98.2,93.9,87.4,97.9,
      92.5,89.5,90.6,94.7,87.0,95.8,
      85.5,90.8,89.6,86.2,88.0,93.4,
      82.5,89.5,85.6,87.4,78.9,90.7),
  PSI=as.factor(rep(c(8500,8700,8900,9100),each=6)),
  batch=as.factor(rep(1:6,4)))

my.analysis<-lm(x~PSI+batch,data=graftdata)
anova(my.analysis)

• Output:
Analysis of Variance Table

Response: x
          Df Sum Sq Mean Sq F value   Pr(>F)
PSI        3 178.17  59.390  8.1071 0.001916 **
batch      5 192.25  38.450  5.2487 0.005532 **
Residuals 15 109.89   7.326
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The batch effect is statistically significant, so correcting for batch is needed.
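To connect the ANOVA table to the partitioning of the total sum of squares, the following sketch recomputes the sums of squares for the vascular graft data directly from treatment and block means. The object names are illustrative; the data are the ones used in the R code on the slide.

```r
x <- c(90.3, 89.2, 98.2, 93.9, 87.4, 97.9,
       92.5, 89.5, 90.6, 94.7, 87.0, 95.8,
       85.5, 90.8, 89.6, 86.2, 88.0, 93.4,
       82.5, 89.5, 85.6, 87.4, 78.9, 90.7)
PSI   <- factor(rep(c(8500, 8700, 8900, 9100), each = 6))
batch <- factor(rep(1:6, 4))
a <- nlevels(PSI)      # number of treatments
b <- nlevels(batch)    # number of blocks

grand     <- mean(x)
trt.means <- tapply(x, PSI, mean)
blk.means <- tapply(x, batch, mean)

ss.treat <- b * sum((trt.means - grand)^2)
ss.block <- a * sum((blk.means - grand)^2)
ss.total <- sum((x - grand)^2)
ss.error <- ss.total - ss.treat - ss.block

# F statistic for treatments and its upper-tail p-value
ms.error <- ss.error / ((a - 1) * (b - 1))
f.treat  <- (ss.treat / (a - 1)) / ms.error
p.treat  <- pf(f.treat, a - 1, (a - 1) * (b - 1), lower.tail = FALSE)

c(SS.PSI = ss.treat, SS.batch = ss.block, SS.E = ss.error,
  F = f.treat, p = p.treat)
```

The values should match the PSI, batch and Residuals rows of the table produced by `anova`.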


Residual Analysis for the Vascular Graft Example


Residual Analysis for the Vascular Graft Example


Residual Analysis for the Vascular Graft Example

• Basic residual plots indicate that the normality and constant variance assumptions are satisfied
• No obvious problems with randomization
• No patterns in the residuals vs. block
• Can also plot residuals versus the (numerical) pressure (residuals by factor)
• These plots provide more information about the constant variance assumption and possible outliers


The Vascular Graft Example – Which Pressure is Different?

Fisher’s LSD. R code:
LSD.test(my.analysis, "PSI", p.adj="bonferroni", console=TRUE)

Output:
Study: my.analysis ~ "PSI"

LSD t Test for x
P value adjustment method: bonferroni

Mean Square Error: 7.32575

PSI, means and individual ( 95 %) CI

            x      std r      LCL      UCL  Min  Max
8500 92.81667 4.577081 6 90.46148 95.17185 87.4 98.2
8700 91.68333 3.304189 6 89.32815 94.03852 87.0 95.8
8900 88.91667 2.966760 6 86.56148 91.27185 85.5 93.4
9100 85.76667 4.445072 6 83.41148 88.12185 78.9 90.7

Alpha: 0.05 ; DF Error: 15
Critical Value of t: 3.036283

Minimum Significant Difference: 4.744688

Treatments with the same letter are not significantly different.

            x groups
8500 92.81667      a
8700 91.68333      a
8900 88.91667     ab
9100 85.76667      b

The two lowest pressures, 8500 and 8700, form the group with the highest means; the highest pressure, 9100, has the lowest mean. 8900 cannot be distinguished from either.


The Latin Square Design

• These designs are used to simultaneously control (or eliminate) two sources of nuisance variability

• A significant assumption is that the three factors (treatments, nuisance factors) do not interact

• If this assumption is violated, the Latin square design will not produce valid results

• The strength of Latin squares is the low number of runs. If resources are not an issue, an RCBD is a possibility.


The Rocket Propellant Problem – A Latin Square Design

• This is a 5 × 5 Latin Square design.
• Corresponding RCBD: a 5 × 5 design for each rocket propellant formula (A-E).
• Statistical analysis: lm


Statistical Analysis of the Latin Square Design

• The statistical (effects) model is

$$y_{ijk} = \mu + \alpha_i + \tau_j + \beta_k + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, p, \quad j = 1, 2, \ldots, p, \quad k = 1, 2, \ldots, p$$

• The statistical analysis (ANOVA) is much like the analysis for the RCBD.


Statistical Analysis of the Latin Square Design

Organizing data for analysis:

rocket.data<-data.frame(
  x=c(24,20,19,24,24,
      17,24,30,27,36,
      18,38,26,27,21,
      26,31,26,23,22,
      22,30,20,29,31),
  operator=as.factor(rep(1:5,5)),
  batch=as.factor(rep(1:5,each=5)),
  formula=as.factor(c("A","B","C","D","E",
                      "B","C","D","E","A",
                      "C","D","E","A","B",
                      "D","E","A","B","C",
                      "E","A","B","C","D")))


Statistical Analysis of the Latin Square Design

Analysis: R commands:
my.analysis<-lm(x~formula+operator+batch, data=rocket.data)
anova(my.analysis)

Outcome:

Analysis of Variance Table

Response: x
          Df Sum Sq Mean Sq F value   Pr(>F)
formula    4    330  82.500  7.7344 0.002537 **
operator   4    150  37.500  3.5156 0.040373 *
batch      4     68  17.000  1.5937 0.239059
Residuals 12    128  10.667
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Batch is not statistically significant, and we can proceed to analyse formula and operator as an RCBD, with formula as the treatment and operator as the block.
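The follow-up analysis can be sketched as below. The object names `reduced` and `reduced.table` are illustrative, and the data are repeated so that the block is self-contained. Note that in this data set each combination of formula and operator occurs exactly once.

```r
rocket.data <- data.frame(
  x = c(24, 20, 19, 24, 24,  17, 24, 30, 27, 36,  18, 38, 26, 27, 21,
        26, 31, 26, 23, 22,  22, 30, 20, 29, 31),
  operator = factor(rep(1:5, 5)),
  batch    = factor(rep(1:5, each = 5)),
  formula  = factor(c("A","B","C","D","E", "B","C","D","E","A",
                      "C","D","E","A","B", "D","E","A","B","C",
                      "E","A","B","C","D")))

# Latin square check: every formula meets every operator and batch once
stopifnot(all(table(rocket.data$formula, rocket.data$operator) == 1),
          all(table(rocket.data$formula, rocket.data$batch) == 1))

# Reduced model without the non-significant batch effect
reduced <- lm(x ~ formula + operator, data = rocket.data)
reduced.table <- anova(reduced)
reduced.table
```

Because the design is balanced, dropping batch leaves the sums of squares for formula and operator unchanged; the batch sum of squares and its 4 degrees of freedom are simply pooled into the residual.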


Other Latin Squares: Examples

4-6 dimensions:

4x4     5x5      6x6
ABDC    ADBEC    ADCEBF
BCAD    DACBE    BAECFD
CDBA    CBEDA    CEDFAB
DACB    BEACD    DCFBEA
        ECDAB    FBADCE
                 EFBADC
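A quick way to check that an arrangement really is a Latin square is to verify that every letter occurs exactly once in each row and each column. The helper `is.latin` below is a hypothetical function written for this illustration. The first row of the 6x6 square is taken to be ADCEBF, the only completion consistent with the remaining five rows.

```r
# Returns TRUE if every symbol occurs exactly once in each row and column
is.latin <- function(rows) {
  m <- do.call(rbind, strsplit(rows, ""))
  symbols <- sort(unique(as.vector(m)))
  ok <- function(v) identical(sort(v), symbols)
  nrow(m) == ncol(m) && length(symbols) == nrow(m) &&
    all(apply(m, 1, ok)) && all(apply(m, 2, ok))
}

sq4 <- c("ABDC", "BCAD", "CDBA", "DACB")
sq5 <- c("ADBEC", "DACBE", "CBEDA", "BEACD", "ECDAB")
sq6 <- c("ADCEBF", "BAECFD", "CEDFAB", "DCFBEA", "FBADCE", "EFBADC")

latin.check <- sapply(list(sq4 = sq4, sq5 = sq5, sq6 = sq6), is.latin)
latin.check
```

In practice, the rows, columns and letters of such a standard square are randomly permuted before the treatments are assigned.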


Design of Experiments

The 2^k Factorial Design


The 2^k Factorial Design

• Special case of the general factorial design; k factors, all at two levels
• The two levels are usually called low and high (they could be either quantitative or qualitative)
• Very widely used in industrial experimentation
• Forms a basic “building block” for other very useful experimental designs
• Special (short-cut) methods for analysis


The Simplest Case: The 2^2

“-” and “+” denote the low and high levels of a factor, respectively

• Low and high are arbitrary terms

• Geometrically, the four runs form the corners of a square

• Factors can be quantitative or qualitative, although their treatment in the final model will be different


Chemical Process Example

A = reactant concentration, B = catalyst amount, y = recovery


Analysis Procedure for a Factorial Design

• Formulate model
• Statistical testing (ANOVA)
• Refine the model
• Analyze residuals (graphical)
• Estimate factor effects
• Interpret results


Model formulation

Organizing data for analysis:
chem.proces.data<-data.frame(
  y=c(28,25,27,36,32,32,18,19,23,31,30,29),
  A=rep(c(-1,1,-1,1),each=3),
  B=rep(c(-1,-1,1,1),each=3))


Statistical Testing – ANOVA

R code:
my.analysis<-lm(y~A+B+A:B,data=chem.proces.data)
anova(my.analysis)

Outcome:
Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value    Pr(>F)
A          1 208.333 208.333 53.1915 8.444e-05 ***
B          1  75.000  75.000 19.1489  0.002362 **
A:B        1   8.333   8.333  2.1277  0.182776
Residuals  8  31.333   3.917
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

A:B is not significant, and we proceed with a reduced model without A:B.


Statistical Testing – ANOVA

R code:
my.analysis<-lm(y~A+B,data=chem.proces.data)
anova(my.analysis)

Outcome:
Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value    Pr(>F)
A          1 208.333 208.333  47.269 7.265e-05 ***
B          1  75.000  75.000  17.017  0.002578 **
Residuals  9  39.667   4.407
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

A and B are both significant, and we proceed to estimate effects.


Statistical Testing – ANOVA

R code:
my.analysis<-lm(y~A+B,data=chem.proces.data)
summary(my.analysis)$coefficients

Outcome:
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 27.500000  0.6060396 45.376576 6.132482e-12
A            4.166667  0.6060396  6.875239 7.265111e-05
B           -2.500000  0.6060396 -4.125143 2.578088e-03

A high concentration of reactant (A) seems to increase the recovery, while a high amount of catalyst (B) seems to decrease it.
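With the -1/+1 coding used here, a regression coefficient is half of the corresponding factor effect, where the effect is the mean response at the high level minus the mean response at the low level. The sketch below illustrates this with the same data; the object names (`chem`, `fit`, `effect.A`, `effect.B`) are illustrative.

```r
chem <- data.frame(
  y = c(28, 25, 27,  36, 32, 32,  18, 19, 23,  31, 30, 29),
  A = rep(c(-1, 1, -1, 1), each = 3),
  B = rep(c(-1, -1, 1, 1), each = 3))

fit   <- lm(y ~ A + B, data = chem)
coefs <- coef(fit)

# With -1/+1 coding, a factor effect (high minus low mean) is twice the slope
effect.A <- mean(chem$y[chem$A == 1]) - mean(chem$y[chem$A == -1])
effect.B <- mean(chem$y[chem$B == 1]) - mean(chem$y[chem$B == -1])

rbind(coefficient = coefs[c("A", "B")],
      effect      = c(effect.A, effect.B))

# Predicted recovery at the four corners of the design
corners <- expand.grid(A = c(-1, 1), B = c(-1, 1))
cbind(corners, fit = predict(fit, newdata = corners))
```

The intercept is the overall mean of the twelve observations, so each corner prediction is the overall mean plus or minus the two coefficients.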


Residuals and Diagnostic Checking


The 2^3 Factorial Design


Table of – and + Signs for the 2^3 Factorial Design


Properties of the Table

• Except for column I, every column has an equal number of + and – signs
• The sum of the product of signs in any two columns is zero
• Multiplying any column by I leaves that column unchanged (identity element)
• The product of any two columns yields a column in the table:

$$A \times B = AB$$

$$AB \times BC = AB^2C = AC$$

• Orthogonal design
• Orthogonality is an important property shared by all factorial designs – we shall not pursue this further
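These properties can be verified numerically by building the sign table for the 2^3 design. The following is a minimal sketch with illustrative object names (`design`, `signs`, `gram`).

```r
# Full 2^3 design in -1/+1 coding
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Interaction columns are products of the main-effect columns
signs <- with(design, cbind(I = 1, A, B, C,
                            AB = A * B, AC = A * C, BC = B * C,
                            ABC = A * B * C))
signs

col.sums <- colSums(signs)[-1]   # all zero: equal number of + and - signs
gram <- crossprod(signs)         # off-diagonal zeros: columns are orthogonal
gram

# AB x BC = AB^2C = AC, because B^2 = I
ab.bc.is.ac <- all(signs[, "AB"] * signs[, "BC"] == signs[, "AC"])
ab.bc.is.ac
```

The matrix `gram` has 8 on the diagonal and 0 everywhere else, which is the orthogonality property in matrix form.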


The General 2^k Factorial Design

• There will be k main effects, and:

$$\binom{k}{2} \text{ two-factor interactions}$$

$$\binom{k}{3} \text{ three-factor interactions}$$

$$\vdots$$

$$1 \ k\text{-factor interaction}$$
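The counts are binomial coefficients, so they are easy to tabulate. A small sketch for a hypothetical k = 5:

```r
k <- 5                            # hypothetical number of factors
ord <- 1:k
n.effects <- choose(k, ord)       # main effects, 2-factor, ..., k-factor
names(n.effects) <- paste0(ord, "-factor")
n.effects

total.effects <- sum(n.effects)   # equals 2^k - 1
total.effects
```

Together with the overall mean, the 2^k − 1 effects use up exactly the 2^k runs of an unreplicated design.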


Concluding Remarks


Conducting an Experiment: The Process

• Plan your experiment!
• Successful experiments depend on how well they are planned.

What are you investigating?
What is the objective of your experiment?
What are you hoping to learn more about?
What are the critical factors?
Which of the factors can be controlled?
What resources will be used?


This presentation is an introduction

• Design of experiments goes much deeper;

• This presentation only covers the simple situations.

• I refer you to the literature, e.g. the Montgomery reference on slide 5.


Survival Analysis



Main Reference



Overview• Introduction;

• Terminology and Notation;

• Data Structures and Kaplan-Meier Curves;

• The Cox proportional Hazards Model;

• Survival Analysis with Time Dependent Covariates.


Introduction


Introduction

• Survival analysis is about analyzing the time until an event occurs.

[Timeline figure: Start of follow-up → Event, along a TIME axis]

• ‘Time’ can be many things: days, months, years, seconds, age, time since the beginning of follow-up of an individual, etc.
• ‘Event’ can be many things, but is generally referred to as the Failure: death, disease incidence, relapse from remission, recovery (e.g. return to work), etc. These are not necessarily negatively loaded concepts.


Example: Secchi Depth

• Turbidity in water bodies may be measured by lowering a Secchi disc until you can no longer see the disc.

• The distance to the water surface at the point where the disc can no longer be seen is the Secchi depth.


Example: Secchi Depth

• Survival analysis framework:
  – TIME is the distance to the water surface;
  – EVENT is when the Secchi disc can no longer be seen;
  – SURVIVAL TIME is the Secchi depth.

• The Secchi depth can be interpreted as a measure of eutrophication.

• How should the event that the Secchi disc hits the sea bed be interpreted?


Example: Secchi Depth

• When the Secchi disc hits the sea bottom and can still be seen, the information is the following:

• The Secchi depth is more than the current depth;
• The disc can't be lowered further to investigate the true Secchi depth; in other words, the current variable can't spend any more TIME (= distance to surface).

• We say that the variable is CENSORED.


Censoring

• A subject is censored at its censoring time if, from that time point on, we can no longer observe the survival of the subject; e.g. the depth at which the Secchi disc hits the seabed.

• Some subjects are censored, while others are not.

• Reasons for (right-) censoring:
  - Loss to follow-up (e.g. the subject may have moved away, does not show up at the clinic, or refuses to continue);
  - Loss to competing risks;
  - Survival past end of study.


Survival Analysis designs: Cohort study (prospective/retrospective)

[Diagram: from the target population, a disease-free cohort is divided into Exposed and Unexposed groups; over TIME, each group splits into Disease and Disease-free.]

Slide design: Kristin Sainani


Survival Analysis designs: Randomized Clinical Trial

[Diagram: from the target population, a disease-free, at-risk cohort is randomly assigned to Intervention or Control; over TIME, each arm splits into Disease and Disease-free.]

Slide design: Kristin Sainani


Survival Analysis designs: Randomized Clinical Trial

[Diagram: from the target population, a patient population is randomly assigned to Treatment or Control; over TIME, each arm splits into Cured and Not cured.]

Slide design: Kristin Sainani


Survival Analysis designs: Randomized Clinical Trial

[Diagram: from the target population, a patient population is randomly assigned to Treatment or Control; over TIME, each arm splits into Dead and Alive.]

Slide design: Kristin Sainani


Why Survival Analysis?

• Why not compare mean time-to-event between groups, using a t-test or linear regression?
  – This ignores censoring.

• Why not compare the proportion of events in groups, using risk/odds ratios or logistic regression?
  – This ignores time.
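The first point can be illustrated with a small simulation. This is only a sketch under assumed values: exponential event times with a true mean of 10 and a follow-up that ends at time 8 are hypothetical choices, not taken from any data set in this course.

```r
# Hypothetical simulation: exponential event times, follow-up ends at time 8
set.seed(1)
n <- 1000
true_time <- rexp(n, rate = 0.1)              # true mean survival time is 10
cens_time <- 8                                # assumed end of follow-up
time   <- pmin(true_time, cens_time)          # observed time T
status <- as.integer(true_time <= cens_time)  # d: 1 = event, 0 = censored

mean(true_time)          # the quantity we would like to estimate
mean(time)               # treats censored times as event times: too small
mean(time[status == 1])  # drops the censored subjects: also too small
```

Both naive means fall well below the true mean, which is why a plain t-test on the observed times is misleading when censoring is present.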


Setting the Scene: Terminology – Observations T and d

• What we observe:
• T: Survival time. T is a random variable.
• d: Failure status:

$$d = \begin{cases} 1 & \text{if failure} \\ 0 & \text{if censored} \end{cases}$$

Observations $(T, d)$:

$(T_A, d_A) = (5, 1)$; $(T_B, d_B) = (12, 0)$
$(T_C, d_C) = (3.5, 0)$; $(T_D, d_D) = (8, 0)$
$(T_E, d_E) = (6, 0)$; $(T_F, d_F) = (3.5, 1)$

Note that C-D also have delayed entry, so there's a third variable in play.
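As a sketch, the six observations can be entered in R as a Surv object from the survival package. The data frame and its column names are chosen here for illustration only; the entry times of the subjects with delayed entry are not given in the text, so they are left out.

```r
library(survival)

# The six observations (T, d) from the slide; column names are illustrative
obs <- data.frame(
  id     = c("A", "B", "C", "D", "E", "F"),
  time   = c(5, 12, 3.5, 8, 6, 3.5),
  status = c(1, 0, 0, 0, 0, 1)    # 1 = failure, 0 = censored
)

y <- Surv(obs$time, obs$status)
print(y)  # censored times are printed with a trailing "+"
```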


Survival Analysis – Terminology and Notation


Terminology – The Survival Function S

• The stochastic variable T has a distribution.
• This is given by the survival function S:

$$S(t) = P(T > t)$$


The Survival Function S: Example

• T = onset of Alzheimer's disease, grouped by the number of E4 alleles in the APOE gene.

• The area between the curves, weighted with general survival, is the average number of years you lose/gain by having a specific genotype relative to another.


Terminology – The Hazard Function h

The hazard function:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

$h(t)$ gives the instantaneous potential per unit time for the event to occur, given that the individual has survived up to time t.

Relationship with the survival function S:

$$h(t) = -\frac{S'(t)}{S(t)}; \qquad S(t) = \exp\left(-\int_0^t h(s)\,ds\right)$$
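The relationship between h and S can be checked numerically. The sketch below assumes a Weibull distribution with shape 1.5 and scale 10 (hypothetical values) and compares the survival probability obtained by integrating the hazard with the one R reports directly.

```r
# Assumed Weibull parameters, for illustration only
shape <- 1.5
scale <- 10

# Hazard as density divided by survival function
h <- function(t) {
  dweibull(t, shape, scale) / pweibull(t, shape, scale, lower.tail = FALSE)
}

t0 <- 7
cum_hazard <- integrate(h, lower = 0, upper = t0, rel.tol = 1e-8)$value
S_from_h <- exp(-cum_hazard)                                  # S(t) = exp(-integral of h)
S_direct <- pweibull(t0, shape, scale, lower.tail = FALSE)    # S(t) = P(T > t)

c(S_from_h = S_from_h, S_direct = S_direct)  # the two values should agree
```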


Terminology – The Hazard Function h

The hazard function is a rate, not a probability:

Suppose that you drive at 60 km/h. This gives you a potential for driving: if you continue for 1 hour, you cover 60 km. However, you may slow down, speed up or stop during the next hour. The 60 km/h gives the instantaneous potential for driving, but says nothing about the distance covered.

Similarly with the hazard rate h: it gives the instantaneous potential for failure, but says nothing about survival over intervals.


The Hazard Function h: Example

Constant hazard: $S(t) = e^{-\lambda t}$. Subjects healthy in the study period.


The Hazard Function h: Example

Increasing Weibull hazard: With no treatment, the risk of dying increases.


The Hazard Function h: Example

Decreasing Weibull hazard: The risk of dying after surgery is highest immediately after.


The Hazard Function h: Example

Lognormal hazard: The risk of dying from TB increases early in the disease progression and decreases later.
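The four hazard shapes (constant, increasing Weibull, decreasing Weibull, lognormal) can be drawn with a few lines of R. This is a sketch only: all parameter values below are assumptions chosen to produce the typical shapes, not values behind the original figures.

```r
# Hazard = density / survival function
hazard <- function(dens, surv) dens / surv

t <- seq(0.1, 10, by = 0.1)

# All parameter values are assumed, for illustration only
h_const <- hazard(dexp(t, rate = 0.2),
                  pexp(t, rate = 0.2, lower.tail = FALSE))
h_incr  <- hazard(dweibull(t, shape = 2, scale = 5),
                  pweibull(t, shape = 2, scale = 5, lower.tail = FALSE))
h_decr  <- hazard(dweibull(t, shape = 0.5, scale = 5),
                  pweibull(t, shape = 0.5, scale = 5, lower.tail = FALSE))
h_lnorm <- hazard(dlnorm(t, meanlog = 1, sdlog = 1),
                  plnorm(t, meanlog = 1, sdlog = 1, lower.tail = FALSE))

matplot(t, cbind(h_const, h_incr, h_decr, h_lnorm), type = "l",
        lty = 1, col = 1:4, xlab = "Time", ylab = "Hazard h(t)")
legend("topright", col = 1:4, lty = 1,
       legend = c("Constant", "Increasing Weibull",
                  "Decreasing Weibull", "Lognormal"))
```

A Weibull shape parameter above 1 gives an increasing hazard and a shape below 1 a decreasing one; the lognormal hazard first rises and then falls.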


The Hazard Function h

Main reasons for studying the hazard function:

• It is a measure of instantaneous potential, whereas a survival curve is a cumulative measure over time;

• It may be used to identify a specific model form, such as an exponential, a Weibull, or a lognormal curve that fits one’s data;

• It is the vehicle by which mathematical modeling of survival data is carried out; that is, the survival model is usually written in terms of the hazard function.


Censoring Revisited

Three assumptions on censoring to make the analysis work:

• Independent (vs. non-independent) censoring

• Random (vs. non-random) censoring (more restrictive than independent censoring)

• Non-informative (vs. informative) censoring

For mathematical formulations, see e.g. Kalbfleisch and Prentice (1980).


Random Censoring

• The subjects who are censored at time t should be representative of all the subjects who remain at risk at time t with respect to their survival experience.

• Thus: the failure rate of those censored at time t is assumed equal to the failure rate of those remaining at time t.

• If there is only one group, random and independent censoring are the same.

• Random censoring implies independent censoring.


Independent Censoring

• Within any subgroup of interest, the subjects who are censored at time t should be representative of all the subjects in that subgroup who remain at risk at time t with respect to their survival experience.

• In other words, censoring is independent provided that it is random within any subgroup of interest.

• Problem: Bias.


Non-Informative Censoring

• Non-informative censoring occurs if the distribution of the failure time T provides no information about the distribution of the censoring time C, and vice versa.

• Often justifiable under random and independent censoring.


Informative Censoring: Example

• Informative censoring:
• In a study comparing disease-free survival after two treatments for cancer, the control arm may be ineffective, leading to more recurrences and patients becoming too sick to follow up.

• On the other hand, patients on the intervention arm may be completely cured by an effective treatment and may no longer feel the need to follow up. If these participants are routinely censored, the true treatment effect will not be picked up and the results of the study will be biased.

• Disease-free survival rates would be based on the patients who continued to be followed up in the study, and would be overestimated for the control arm and underestimated for the treatment arm.

Ranganathan and Pramesh (2012)


Dealing with Issues of Non-Compliance

• Well-structured designs! Rule out the problem by carefully designing your survey.

• Imputation of values (R package: InformativeCensoring);

• Sensitivity analyses.

• See e.g. Leung, Elashoff and Afifi (1997); Campigotto and Weller (2014); Jackson et al. (2014); Hsu and Taylor (2009).


Data Structures and Kaplan-Meier Curves


Goals of Survival Analysis

Goal 1: To estimate and interpret survivor and/or hazard functions from survival data.
- Constant, Weibull, lognormal hazards examples

Goal 2: To compare survivor and/or hazard functions.
- Alzheimer's disease example

Goal 3: To assess the relationship of explanatory variables to survival time.
- Mathematical modelling – to be addressed


Data Structures for Survival Analysis


Data Structures in Practice: The Takafumi data

• R commands:

TAKAFUMI <- read.csv2("Data/TAKAFUMI_nga.csv")
head(TAKAFUMI)

• Output:

  Tank Time Status          Group Infection_model
1   41   19      1 SE-SVA-1033-9C            Bath
2   41   29      0 SE-SVA-1033-9C            Bath
3   41   29      0 SE-SVA-1033-9C            Bath
4   41   29      0 SE-SVA-1033-9C            Bath
5   41   29      0 SE-SVA-1033-9C            Bath
6   41   29      0 SE-SVA-1033-9C            Bath

The Time column is T, and the Status column is d.


Data Structures in Practice: The Takafumi data

Restructuring to add a subject ID and put the important columns first:

• R commands:

# Add a running subject ID; reorder the columns to Time, Status, Tank, Group, Infection_model
TAKAFUMI <- data.frame(ID = seq_len(nrow(TAKAFUMI)), TAKAFUMI[, c(2, 3, 1, 4, 5)])
head(TAKAFUMI)

• Output:

  ID Time Status Tank          Group Infection_model
1  1   19      1   41 SE-SVA-1033-9C            Bath
2  2   29      0   41 SE-SVA-1033-9C            Bath
3  3   29      0   41 SE-SVA-1033-9C            Bath
4  4   29      0   41 SE-SVA-1033-9C            Bath
5  5   29      0   41 SE-SVA-1033-9C            Bath
6  6   29      0   41 SE-SVA-1033-9C            Bath


Alternative Data Structure: The Counting Process Approach

• Several lines per subject;

• TWO time points: Start and Stop

• We shall return to this structure when considering time-dependent covariates.
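A minimal sketch of what the counting process layout could look like in R. The two subjects, the column names and the covariate x are hypothetical, made up for illustration: subject 1 changes covariate value at time 10 and fails at time 19, while subject 2 is censored at time 29.

```r
library(survival)

# Hypothetical counting-process data: one row per (start, stop] interval
cp <- data.frame(
  id    = c(1, 1, 2),
  start = c(0, 10, 0),
  stop  = c(10, 19, 29),
  event = c(0, 1, 0),   # 1 = event at the end of the interval
  x     = c(0, 1, 0)    # hypothetical time-dependent covariate
)

y <- Surv(cp$start, cp$stop, cp$event)
print(y)  # intervals are printed as (start, stop], with "+" when no event occurred
```

With three arguments, Surv() interprets the data as start time, stop time and event indicator, which is the form used later for time-dependent covariates.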


Goal 1: Estimating the Survival Function S

• We observe $(T_1, d_1), (T_2, d_2), \ldots, (T_n, d_n)$ (i.e. $n$ subjects).
• Define the process of events $N$ as

$$N(t) = \sum_{i=1}^{n} 1_{\{T_i \le t,\ d_i = 1\}}$$

The jumps of $N(t)$ indicate the number of events at time $t$.
• Define the population at risk $Y$ as

$$Y(t) = \sum_{i=1}^{n} 1_{\{t \le T_i\}}$$
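As a small sketch, the two processes can be evaluated directly for the six observations A to F listed earlier; the function names N and Y simply mirror the notation above.

```r
# Observations A-F from the terminology slide
time   <- c(5, 12, 3.5, 8, 6, 3.5)
status <- c(1, 0, 0, 0, 0, 1)

N <- function(t) sum(time <= t & status == 1)  # number of events up to time t
Y <- function(t) sum(t <= time)                # number still at risk at time t

grid <- c(0, 3.5, 5, 8)
sapply(grid, N)  # 0 1 2 2
sapply(grid, Y)  # 6 6 4 2
```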


Goal 1: Estimating the Survival Function S

• S is the survival function, $S(t) = P(T > t)$.

• Define the Kaplan-Meier estimator $\hat{S}$ as

$$\hat{S}(t) = \prod_{s \le t} \left(1 - \frac{\Delta N(s)}{Y(s)}\right)$$

If the events up to time $t$ occur at $t_1, \ldots, t_k$, the Kaplan-Meier estimator takes the form

$$\hat{S}(t) = \prod_{i=1}^{k} \left(1 - \frac{\Delta N(t_i)}{Y(t_i)}\right)$$
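The product formula can be computed by hand and compared with survfit from the survival package. The sketch below reuses the six observations A to F from the terminology slide; the variable names are chosen for illustration.

```r
library(survival)

time   <- c(5, 12, 3.5, 8, 6, 3.5)
status <- c(1, 0, 0, 0, 0, 1)

event_times <- sort(unique(time[status == 1]))                         # 3.5 and 5
dN <- sapply(event_times, function(s) sum(time == s & status == 1))    # events at each time
Y  <- sapply(event_times, function(s) sum(time >= s))                  # at risk at each time

km_manual  <- cumprod(1 - dN / Y)                # product over event times
fit        <- survfit(Surv(time, status) ~ 1)
km_survfit <- summary(fit)$surv                  # reported at event times only

cbind(event_times, dN, Y, km_manual, km_survfit)
```

Here the estimate drops to 5/6 at time 3.5 and to 5/6 × 3/4 = 0.625 at time 5; censored observations only reduce the number at risk.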


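Goal 1: The Kaplan-Meier Estimator

• Alternative formulation of the Kaplan-Meier estimator, with $t_1, \ldots, t_k$ the event times up to time $t$:

$$\hat{S}(t) = \prod_{i=1}^{k} \frac{Y(t_i) - \Delta N(t_i)}{Y(t_i)}$$

Thus, the Kaplan-Meier estimator is the successive product of the ratios between those that survive and those that are at risk.

• Variance (Greenwood's formula):

$$\mathrm{Var}\left(\hat{S}(t)\right) = \hat{S}(t)^2 \sum_{t_i \le t} \frac{\Delta N(t_i)}{Y(t_i)\left(Y(t_i) - \Delta N(t_i)\right)}$$

Greenwood (1926)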
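A sketch of the standard error calculation using Greenwood's formula in its usual textbook form, Var = S(t)^2 · Σ ΔN / (Y (Y − ΔN)), again on the six observations A to F. The comparison in the last column assumes that survfit is run with its default, Greenwood-type standard error.

```r
library(survival)

time   <- c(5, 12, 3.5, 8, 6, 3.5)
status <- c(1, 0, 0, 0, 0, 1)

event_times <- sort(unique(time[status == 1]))
dN <- sapply(event_times, function(s) sum(time == s & status == 1))
Y  <- sapply(event_times, function(s) sum(time >= s))

S_hat         <- cumprod((Y - dN) / Y)                      # Kaplan-Meier estimate
var_greenwood <- S_hat^2 * cumsum(dN / (Y * (Y - dN)))      # Greenwood's formula
se_manual     <- sqrt(var_greenwood)

fit <- survfit(Surv(time, status) ~ 1)   # default error type assumed
cbind(event_times, S_hat, se_manual, se_survfit = summary(fit)$std.err)
```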


Goal 1: The Kaplan-Meier Estimator

• R code:

library(survival)  # provides Surv() and survfit()
head(TAKAFUMI, n = 3)
# Kaplan-Meier curves, one per Group
plot(survfit(Surv(Time, Status) ~ Group, data = TAKAFUMI), col = 1:8)
legend("bottomleft", legend = levels(as.factor(TAKAFUMI$Group)),
       col = 1:8, lty = 1)

• Output:

  ID Time Status Tank          Group Infection_model
1  1   19      1   41 SE-SVA-1033-9C            Bath
2  2   29      0   41 SE-SVA-1033-9C            Bath
3  3   29      0   41 SE-SVA-1033-9C            Bath


Goal 1: The Kaplan-Meier Estimator

• One group at a time, with confidence intervals:
• R code:

my.levels <- levels(as.factor(TAKAFUMI$Group))
par(mfrow = c(3, 3))  # 3 x 3 grid of panels
for (i in seq_along(my.levels)) {
  # Kaplan-Meier curve for group i only
  plot(survfit(Surv(Time, Status) ~ 1,
               data = TAKAFUMI[TAKAFUMI$Group == my.levels[i], ]),
       col = i, main = my.levels[i], lwd = 1.5)
}
par(mfrow = c(1, 1))  # reset the plotting layout


Goal 1: The Kaplan-Meier Estimator

• Comparing groups 1 and 8:
• R code:

# Keep only the first and the eighth group level
TAKAFUMI.temp <- TAKAFUMI[TAKAFUMI$Group %in% my.levels[c(1, 8)], ]
# Re-create the factor so that unused levels are dropped
TAKAFUMI.temp$Group <- as.factor(as.character(TAKAFUMI.temp$Group))
plot(survfit(Surv(Time, Status) ~ Group, data = TAKAFUMI.temp),
     conf.int = TRUE, col = c(1, 2),
     main = paste("Comparing", my.levels[1], "and", my.levels[8]))
legend("bottomleft", legend = levels(as.factor(TAKAFUMI.temp$Group)),
       col = 1:2, lty = 1)


Goal 1: The Kaplan-Meier Estimator

• Comparing groups 1 and 8, only infection method "Bath":

• R code:

# Keep the first and the eighth group level, bath infection only
TAKAFUMI.temp <- TAKAFUMI[TAKAFUMI$Group %in% my.levels[c(1, 8)] &
                          TAKAFUMI$Infection_model == "Bath", ]
# Re-create the factor so that unused levels are dropped
TAKAFUMI.temp$Group <- as.factor(as.character(TAKAFUMI.temp$Group))
plot(survfit(Surv(Time, Status) ~ Group, data = TAKAFUMI.temp),
     conf.int = TRUE, col = c(1, 2),
     main = paste("Comparing", my.levels[1], "and", my.levels[8]),
     sub = "Infection Type: Bath")
legend("bottomleft", legend = levels(as.factor(TAKAFUMI.temp$Group)),
       col = 1:2, lty = 1)


Goal 2: The Log Rank Test

• The test statistic for comparing two groups is calculated as follows:

$$Z^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2}$$

where $O_1$ and $O_2$ are the total numbers of observed events in groups 1 and 2, respectively, and $E_1$ and $E_2$ the total numbers of expected events. Under the assumption of identical hazards, $Z^2$ is $\chi^2$-distributed with 1 degree of freedom.

• The total expected number of events for a group is the sum of the expected numbers of events at the time of each event.

• The expected number of events at the time of an event can be calculated as the risk of an event at that time, multiplied by the number at risk in the group.
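The recipe for the expected numbers can be followed step by step in R. The sketch below uses a small hypothetical data set (nine subjects in two groups, g1 and g2, made up for illustration) and compares the hand-computed expected counts with those returned by survdiff.

```r
library(survival)

# Hypothetical data: two groups, 1 = event, 0 = censored
d <- data.frame(
  time   = c(2, 4, 6, 8, 3, 5, 7, 9, 10),
  status = c(1, 1, 0, 1, 1, 0, 1, 0, 1),
  group  = rep(c("g1", "g2"), times = c(4, 5))
)

event_times <- sort(unique(d$time[d$status == 1]))
expected <- c(g1 = 0, g2 = 0)
for (s in event_times) {
  at_risk <- d$time >= s
  # Pooled risk of an event at time s, across both groups
  risk <- sum(d$time == s & d$status == 1) / sum(at_risk)
  # Number at risk in each group at time s
  n_g <- table(factor(d$group[at_risk], levels = names(expected)))
  expected <- expected + risk * as.numeric(n_g)
}
observed <- tapply(d$status, d$group, sum)

fit <- survdiff(Surv(time, status) ~ group, data = d)
rbind(observed, expected, survdiff_expected = fit$exp)

sum((observed - expected)^2 / expected)  # Z^2 as defined above
fit$chisq                                # survdiff divides by the variance instead
```

The expected counts always add up to the total number of observed events, since the pooled risk at each event time is distributed over the groups in proportion to their numbers at risk.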


Goal 2: The Log Rank Test

• Let's make the observations from the comparisons formal:
• In R, the Log Rank test is performed by the survdiff function:

• Group comparisons ignoring infection method, R code:

TAKAFUMI.temp <- TAKAFUMI[TAKAFUMI$Group %in% my.levels[c(1, 8)], ]
TAKAFUMI.temp$Group <- as.factor(as.character(TAKAFUMI.temp$Group))
survdiff(Surv(Time, Status) ~ Group, data = TAKAFUMI.temp)

Output:

Call:
survdiff(formula = Surv(Time, Status) ~ Group, data = TAKAFUMI.temp)

                              N Observed Expected (O-E)^2/E (O-E)^2/V
Group=negative control bath  72        4     28.9     21.42      31.9
Group=SE-SVA-14 wild-type   198       89     64.1      9.64      31.9

 Chisq= 31.9  on 1 degrees of freedom, p= 2e-08


Goal 2: The Log Rank Test

• Group comparisons with infection method "Bath", R code:

TAKAFUMI.temp <- TAKAFUMI[TAKAFUMI$Group %in% my.levels[c(1, 8)] &
                          TAKAFUMI$Infection_model == "Bath", ]
TAKAFUMI.temp$Group <- as.factor(as.character(TAKAFUMI.temp$Group))
survdiff(Surv(Time, Status) ~ Group, data = TAKAFUMI.temp)

Output:

Call:
survdiff(formula = Surv(Time, Status) ~ Group, data = TAKAFUMI.temp)

                              N Observed Expected (O-E)^2/E (O-E)^2/V
Group=negative control bath  72        4     4.93     0.174     0.297
Group=SE-SVA-14 wild-type   102        8     7.07     0.122     0.297

 Chisq= 0.3  on 1 degrees of freedom, p= 0.6


Goal 2: The Log Rank Test

• Alternative formula: divide by the variance of O − E instead of by the expected number E (the (O-E)^2/V column in the survdiff output); the results are approximately similar.

• Alternatives:

• The Wilcoxon test (rank test);

• Maximum Likelihood methods.

• Reference: Fleming and Harrington (1982).


The Cox Proportional Hazards Model


Goal 3: The Cox Proportional Hazards Model

• Goal 3: To assess the relationship of explanatory variables to survival time.

• We need a framework where we can take covariates into account:

> summary(TAKAFUMI)
       ID              Time          Status            Tank                          Group     Infection_model
 Min.   :   1.0   Min.   : 1.0   Min.   :0.0000   Min.   :25.00   SE-SVA-1033-9C       :223   Bath:722
 1st Qu.: 362.2   1st Qu.:29.0   1st Qu.:0.0000   1st Qu.:39.00   SE-SVA-1033-3F       :221   IP  :724
 Median : 723.5   Median :29.0   Median :0.0000   Median :53.00   SE-SVA-14-3D         :221
 Mean   : 723.5   Mean   :25.7   Mean   :0.2075   Mean   :51.06   SE-SVA-1033 wild-type:218
 3rd Qu.:1084.8   3rd Qu.:29.0   3rd Qu.:0.0000   3rd Qu.:65.00   SE-SVA-14-5G         :217
 Max.   :1446.0   Max.   :29.0   Max.   :1.0000   Max.   :76.00   SE-SVA-14 wild-type  :198

Covariates: Tank, Group and Infection_model.


Goal 3: The Cox Proportional Hazards Model

• Semi-parametric model;
• Abstain from parametrizing the hazard function completely, in order to be able to perform comparisons.

For subject $i$ with $k$ covariates $X_{i1}, \ldots, X_{ik}$:

$$h_i(t) = h_0(t) \exp\left(\sum_{j=1}^{k} \theta_j X_{ij}\right)$$

where $h_0(t)$ is a baseline hazard that in general is not estimated.


Goal 3: The Cox Proportional Hazards Model

• In the TAKAFUMI case (at first we ignore the tank):

For subject $i$:

$$h_i(t) = h_0(t) \exp\left(\alpha_{\mathrm{Group}(i)} + \beta_{\mathrm{infection.model}(i)}\right)$$

Individuals within the same Group and infection model have the same hazard. The reference group "negative control bath" has hazard $h_0(t)$.


Goal 3: The Cox Proportional Hazards Model

• Hazard ratio between two individuals, say 1 and 2, with the same Group (i.e. Group(1) = Group(2)), but different infection models (IP and Bath, respectively):

$$\frac{h_1(t)}{h_2(t)} = \frac{h_0(t) \exp\left(\alpha_{\mathrm{Group}(1)} + \beta_{\mathrm{IP}}\right)}{h_0(t) \exp\left(\alpha_{\mathrm{Group}(2)}\right)} = \exp\left(\beta_{\mathrm{IP}}\right)$$

• Thus, the exponentiated coefficient gives the hazard ratio when changing infection model, irrespective of Group status.

• The hazards $h_1(t)$ and $h_2(t)$ are proportional.
• Hence the name…
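The interpretation of an exponentiated coefficient as a hazard ratio can be tried out on simulated data. Everything in this sketch is an assumption made for illustration: the indicator ip, the baseline rate 0.05, the true coefficient 0.7 and censoring at day 29 are hypothetical and are not estimates from the TAKAFUMI data.

```r
library(survival)

set.seed(42)
n  <- 2000
ip <- rbinom(n, 1, 0.5)        # hypothetical indicator: 1 = IP, 0 = Bath
beta_ip <- 0.7                 # assumed true log hazard ratio

# Constant baseline hazard 0.05, multiplied by exp(beta_ip) for IP subjects
event_time <- rexp(n, rate = 0.05 * exp(beta_ip * ip))
time   <- pmin(event_time, 29)               # assumed end of study at day 29
status <- as.integer(event_time <= 29)       # 1 = event, 0 = censored

fit <- coxph(Surv(time, status) ~ ip)
coef(fit)        # estimated log hazard ratio, should be close to 0.7
exp(coef(fit))   # estimated hazard ratio, should be close to exp(0.7)
```

The baseline hazard never has to be specified in the coxph call; only the ratio between the two groups is estimated.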


Goal 3: The Cox Proportional Hazards Model

In R, the coxph function estimates the proportional hazards model. We use the Surv function to specify which variables hold the time-to-event and the censoring status.

• R code:

# Cox model with Group and Infection_model as covariates
my.analysis <- coxph(Surv(Time, Status) ~ Group + Infection_model,
                     data = TAKAFUMI)


Goal 3: The Cox Proportional Hazards Model

• Is the Cox proportional hazards model a good model for these data?

• Model control of the proportional hazards assumption: cox.zph in R.

> cox.zph(my.analysis)
                                rho chisq       p
GroupNegative control IP     0.1604 7.681 0.00558
GroupSE-SVA-1033-3F          0.1122 3.882 0.04880
GroupSE-SVA-1033-9C          0.1238 4.646 0.03112
GroupSE-SVA-1033 wild-type   0.1118 3.722 0.05369
GroupSE-SVA-14-3D            0.0978 2.869 0.09033
GroupSE-SVA-14-5G            0.0940 2.674 0.10202
GroupSE-SVA-14 wild-type     0.1221 4.437 0.03517
Infection_modelIP           -0.0488 0.734 0.39167
GLOBAL                           NA 9.744 0.28349

Overall (GLOBAL row), no problem!

Some individual p-values are less than 0.05! But Bonferroni corrected, the smallest value is only borderline significant.
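The Bonferroni remark can be checked with a few lines of R. This is a minimal sketch that copies the eight covariate-level p-values from the cox.zph output; treating those eight rows (and not the GLOBAL row) as the family of tests is an assumption made here for illustration, and the names p_zph and p_bonf are hypothetical.

```r
# p-values copied from the cox.zph output (covariate rows only, GLOBAL excluded)
p_zph <- c("GroupNegative control IP"   = 0.00558,
           "GroupSE-SVA-1033-3F"        = 0.04880,
           "GroupSE-SVA-1033-9C"        = 0.03112,
           "GroupSE-SVA-1033 wild-type" = 0.05369,
           "GroupSE-SVA-14-3D"          = 0.09033,
           "GroupSE-SVA-14-5G"          = 0.10202,
           "GroupSE-SVA-14 wild-type"   = 0.03517,
           "Infection_modelIP"          = 0.39167)

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p_bonf <- p.adjust(p_zph, method = "bonferroni")
print(round(p_bonf, 4))

# Smallest adjusted p-value
smallest <- min(p_bonf)
```

Under this assumption the smallest adjusted p-value is 8 × 0.00558 ≈ 0.045, just below 0.05, which is in line with the "borderline" reading.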


Goal 3: The Cox Proportional Hazards Model

Investigating significant effects of Group:

• R code:

my.analysis2<-coxph(Surv(Time,Status)~Infection_model,
  data=TAKAFUMI)
anova(my.analysis, my.analysis2)

Output:

Analysis of Deviance Table
 Cox model: response is Surv(Time, Status)
 Model 1: ~ Group + Infection_model
 Model 2: ~ Infection_model
   loglik  Chisq Df P(>|Chi|)
1 -1924.2
2 -2042.2 236.05  7 < 2.2e-16 ***
---

Group, corrected for Infection model, is strongly significant.
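The likelihood ratio test reported by anova can be reproduced by hand from the two log-likelihoods. The sketch below uses the rounded values printed in the table, so the statistic comes out as 236.0 rather than the 236.05 computed from the unrounded fits; the variable names are hypothetical.

```r
# Log partial likelihoods as printed in the anova table (rounded to one decimal)
loglik_full    <- -1924.2   # ~ Group + Infection_model
loglik_reduced <- -2042.2   # ~ Infection_model
df <- 7                     # seven Group coefficients are dropped

# Likelihood ratio statistic: twice the difference in log-likelihood
lr_stat <- 2 * (loglik_full - loglik_reduced)

# Upper tail of the chi-square distribution with df degrees of freedom
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)

cat("LR statistic:", lr_stat, " p-value:", format.pval(p_value), "\n")
```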


Goal 3: The Cox Proportional Hazards Model

Investigating significant effects of Infection Model:

• R code:

my.analysis2<-coxph(Surv(Time,Status)~Group,
  data=TAKAFUMI)
anova(my.analysis, my.analysis2)

Output:

Analysis of Deviance Table
 Cox model: response is Surv(Time, Status)
 Model 1: ~ Group + Infection_model
 Model 2: ~ Group
   loglik  Chisq Df P(>|Chi|)
1 -1924.2
2 -2063.9 279.49  1 < 2.2e-16 ***

Infection Model, corrected for Group, is strongly significant.


Goal 3: The Cox Proportional Hazards Model

• Parameter values:

> summary(my.analysis)$coef
                                  coef   exp(coef)  se(coef)           z     Pr(>|z|)
GroupNegative control IP   -3.12569262  0.04390651 0.8836013 -3.53744669 4.040157e-04
GroupSE-SVA-1033-3F         0.03483477  1.03544861 0.5398640  0.06452509 9.485521e-01
GroupSE-SVA-1033-9C        -1.04266264  0.35251481 0.5629389 -1.85217723 6.400038e-02
GroupSE-SVA-1033 wild-type  0.38183948  1.46497690 0.5360187  0.71236217 4.762405e-01
GroupSE-SVA-14-3D          -0.46135776  0.63042710 0.5481104 -0.84172411 3.999424e-01
GroupSE-SVA-14-5G          -1.67392506  0.18750963 0.5937000 -2.81947963 4.810158e-03
GroupSE-SVA-14 wild-type    0.98034502  2.66537570 0.5312347  1.84540845 6.497815e-02
Infection_modelIP           2.33718070 10.35200996 0.1753049 13.33208884 1.506205e-40

Reference group: "negative control bath"
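To read the table as hazard ratios, the coefficient for Infection_modelIP can be exponentiated by hand. The sketch below also adds a Wald-type 95% confidence interval, exp(coef ± z·se); that interval is an illustration added here, not something shown in the output, and the variable names are hypothetical.

```r
# Estimate and standard error copied from the Infection_modelIP row
coef_ip <- 2.33718070
se_ip   <- 0.1753049

# Hazard ratio IP versus Bath, for any fixed Group
hr_ip <- exp(coef_ip)

# Wald-type 95% confidence interval on the hazard-ratio scale
z     <- qnorm(0.975)
ci_ip <- exp(coef_ip + c(-1, 1) * z * se_ip)

cat("Hazard ratio:", round(hr_ip, 2), "\n")
cat("95% CI:", round(ci_ip, 2), "\n")
```

The hazard ratio reproduces the exp(coef) column (about 10.35): at any time, IP-infected fish have roughly ten times the hazard of bath-infected fish in the same Group.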


Goal 3: The Cox Proportional Hazards Model

• Visualizing: Estimated survival curves from the fitted Cox model

> new.data<-data.frame(
    Group=c(levels(TAKAFUMI$Group)[1:2],
            rep(levels(TAKAFUMI$Group)[-(1:2)],2)),
    Infection_model=c(levels(TAKAFUMI$Infection_model),
            rep(levels(TAKAFUMI$Infection_model),
                each=length(levels(TAKAFUMI$Group))-2)))

> new.data
                    Group Infection_model
1   negative control bath            Bath
2     Negative control IP              IP
3          SE-SVA-1033-3F            Bath
4          SE-SVA-1033-9C            Bath
5   SE-SVA-1033 wild-type            Bath
6            SE-SVA-14-3D            Bath
7            SE-SVA-14-5G            Bath
8     SE-SVA-14 wild-type            Bath
9          SE-SVA-1033-3F              IP
10         SE-SVA-1033-9C              IP
11  SE-SVA-1033 wild-type              IP
12           SE-SVA-14-3D              IP
13           SE-SVA-14-5G              IP
14    SE-SVA-14 wild-type              IP


Goal 3: The Cox Proportional Hazards Model

• Visualizing: Estimated survival curves from the fitted Cox model:

• Drawing:

# factor() makes the line-type coding work whether or not
# Infection_model is stored as character in new.data
plot(survfit(my.analysis,newdata=new.data),col=1:14,
     lty=as.numeric(factor(new.data$Infection_model)))
legend("bottomleft",
       legend=paste(new.data$Group,new.data$Infection_model),
       col=1:14,text.col=1:14,bty="n",cex=1.2,
       lty=as.numeric(factor(new.data$Infection_model)))


Goal 3: The Cox Proportional Hazards Model

[Figure: the estimated survival curves drawn by the plot and legend commands.]

Goal 3: The Cox Proportional Hazards Model

• What if we add Tank to the model?

> my.analysis<-coxph(
    Surv(Time,Status)~Group+Infection_model+as.factor(Tank),
    data=TAKAFUMI)
Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights, :
  Loglik converged before variable 9,15,22,38,39 ; coefficient may be infinite.

• Not a great model, and we have no direct interest in the effect of Tank.

• While we may expect Tank to influence results, the effect is of no value prospectively:

• In the next experiment, the tanks will be under different circumstances.


Goal 3: The Cox Proportional Hazards Model

• Checking the proportional hazards assumption:

cox.zph(my.analysis)

• Some combinations of Group and Infection_model use only 1 tank, so many parameters cannot be estimated.

• But the proportional hazards assumption is no longer questionable: the smallest p-value in cox.zph is 0.025 before Bonferroni correction, and the global p-value is 0.19.


Goal 3: The Cox Proportional Hazards Model

• Let's treat the Tank effect as random:

library(coxme)

my.analysis<-coxme(Surv(Time, Status)~
  Group+Infection_model+(1|as.factor(Tank)),
  data=TAKAFUMI)


Goal 3: The Cox Proportional Hazards Model

• Statistical inference on Group and Infection_model:

# Test of Group: drop Group from the model
my.analysis2<-coxme(Surv(Time, Status)~
  Infection_model+(1|as.factor(Tank)),data=TAKAFUMI)
anova(my.analysis,my.analysis2)

# Test of Infection_model: drop Infection_model from the model
my.analysis2<-coxme(Surv(Time, Status)~
  Group+(1|as.factor(Tank)),data=TAKAFUMI)
anova(my.analysis,my.analysis2)

• Both analyses give strongly significant results.


Goal 3: The Cox Proportional Hazards Model

Parameter estimates in the random effects model:

summary(my.analysis)$fixed

Model: Surv(Time, Status) ~ Group + Infection_model + (1 | as.factor(Tank))

Fixed coefficients
                                 coef  exp(coef)  se(coef)     z       p
GroupNegative control IP   -3.0897083 0.04551523 0.9080259 -3.40 0.00067
GroupSE-SVA-1033-3F         0.1328129 1.14203629 0.5651421  0.24 0.81000
GroupSE-SVA-1033-9C        -0.9921949 0.37076201 0.5907804 -1.68 0.09300
GroupSE-SVA-1033 wild-type  0.3731670 1.45232688 0.5629024  0.66 0.51000
GroupSE-SVA-14-3D          -0.4303877 0.65025694 0.5759943 -0.75 0.45000
GroupSE-SVA-14-5G          -1.6311436 0.19570563 0.6209667 -2.63 0.00860
GroupSE-SVA-14 wild-type    0.9637155 2.62141828 0.5568671  1.73 0.08400
Infection_modelIP           2.2969157 9.94346652 0.1910049 12.03 0.00000


Goal 3: The Cox Proportional Hazards Model

Parameter estimates comparison:

# random effects model
                                 coef  exp(coef)  se(coef)     z       p
GroupNegative control IP   -3.0897083 0.04551523 0.9080259 -3.40 0.00067
GroupSE-SVA-1033-3F         0.1328129 1.14203629 0.5651421  0.24 0.81000
GroupSE-SVA-1033-9C        -0.9921949 0.37076201 0.5907804 -1.68 0.09300
GroupSE-SVA-1033 wild-type  0.3731670 1.45232688 0.5629024  0.66 0.51000
GroupSE-SVA-14-3D          -0.4303877 0.65025694 0.5759943 -0.75 0.45000
GroupSE-SVA-14-5G          -1.6311436 0.19570563 0.6209667 -2.63 0.00860
GroupSE-SVA-14 wild-type    0.9637155 2.62141828 0.5568671  1.73 0.08400
Infection_modelIP           2.2969157 9.94346652 0.1910049 12.03 0.00000

# fixed effects model:
                                  coef   exp(coef)  se(coef)     z     Pr(>|z|)
GroupNegative control IP   -3.12569262  0.04390651 0.8836013 -3.54 4.040157e-04
GroupSE-SVA-1033-3F         0.03483477  1.03544861 0.5398640  0.06 9.485521e-01
GroupSE-SVA-1033-9C        -1.04266264  0.35251481 0.5629389 -1.85 6.400038e-02
GroupSE-SVA-1033 wild-type  0.38183948  1.46497690 0.5360187  0.71 4.762405e-01
GroupSE-SVA-14-3D          -0.46135776  0.63042710 0.5481104 -0.84 3.999424e-01
GroupSE-SVA-14-5G          -1.67392506  0.18750963 0.5937000 -2.82 4.810158e-03
GroupSE-SVA-14 wild-type    0.98034502  2.66537570 0.5312347  1.85 6.497815e-02
Infection_modelIP           2.33718070 10.35200996 0.1753049 13.33 1.506205e-40


Sample Size Determination for the Cox Proportional Hazards Model

• The sample size requires a specific number of events, $N_{EV}$.

• For two equally sized groups, let $\Delta$ be the hazard ratio between them. To detect a hazard ratio of $\Delta$ with a power of $1-\beta$, using a test level $\alpha$, the necessary number of events is

$$N_{EV} = \left(\frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)(\Delta+1)}{\Delta-1}\right)^2$$

• Power:

$$z_{EV} = \sqrt{N_{EV}}\,\frac{\Delta-1}{\Delta+1} - z_{1-\alpha/2}$$

$$\mathrm{Power} = \Phi(z_{EV}),$$

where $\Phi$ is the distribution function for the standard normal:

$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$$

• In R: Φ = pnorm
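A minimal R sketch of these two calculations, assuming the usual Schoenfeld-type form N_EV = ((z_{1-α/2} + z_{1-β})(Δ+1)/(Δ-1))² for two equally sized groups and the matching power expression Φ(√N_EV · |Δ-1|/(Δ+1) − z_{1-α/2}). The function names n_events and power_from_events and the example hazard ratio Δ = 2 are hypothetical choices for illustration.

```r
# Number of events needed to detect hazard ratio delta (two equal groups)
n_events <- function(delta, alpha = 0.05, power = 0.80) {
  ((qnorm(1 - alpha / 2) + qnorm(power)) * (delta + 1) / (delta - 1))^2
}

# Power obtained with n_ev events for hazard ratio delta
power_from_events <- function(n_ev, delta, alpha = 0.05) {
  z_ev <- sqrt(n_ev) * abs(delta - 1) / (delta + 1) - qnorm(1 - alpha / 2)
  pnorm(z_ev)
}

delta <- 2                      # hypothetical hazard ratio to detect
n_ev  <- n_events(delta)        # not yet rounded
cat("Events needed:", ceiling(n_ev), "\n")
cat("Power with that many events:",
    round(power_from_events(ceiling(n_ev), delta), 3), "\n")
```

Rounding the number of events up gives a power slightly above the requested 80%.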


Sample Size Determination for the Cox Proportional Hazards Model

• Because of the semiparametric nature of the Cox proportional hazards model, no general method exists to derive the total sample size $N$ from $N_{EV}$.

• Assume that the probability of an event is $p_1$ in group 1 and $p_2$ in group 2. Then

$$N = \frac{N_{EV}}{(p_1 + p_2)/2}$$

R package: powerSurvEpi (2018).
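The conversion from events to subjects is a one-liner: divide the required number of events by the average event probability of the two groups. In the sketch below the function name total_n, the event count of 71 and the event probabilities 0.9 and 0.5 are hypothetical values chosen for illustration only.

```r
# Total sample size: required events divided by the average event probability
total_n <- function(n_ev, p1, p2) {
  n_ev / ((p1 + p2) / 2)
}

# Hypothetical example: 71 events needed, event probabilities 0.9 and 0.5
n_total <- total_n(n_ev = 71, p1 = 0.9, p2 = 0.5)
cat("Total subjects needed (both groups):", ceiling(n_total), "\n")
```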


Survival Analysis with Time-Dependent Covariates


Survival Analysis with Time-Dependent Covariates

• The addicts dataset: survival times in days of heroin addicts from entry to a clinic until departure.

• Data provided by John Caplehorn, The University of Sydney, Dept of Public Health.

Column 1 = ID of subject
       2 = Clinic (1 or 2)
       3 = status (0=censored, 1=endpoint)
       4 = survival time (days)
       5 = prison record?
       6 = methadone dose (mg/day)


Survival Analysis with Time-Dependent Covariates

• addicts dataset in R:

addicts<-read.table("Data/addicts.txt",header=TRUE)
head(addicts)

  ID Clinic Status Survival prison methodone
1  1      1      1      428      0        50
2  2      1      1      275      1        55
3  3      1      1      262      0        55
4  4      1      1      183      0        30
5  5      1      1      259      1        65
6  6      1      1      714      0        55

• 238 data lines of drug addicts treated with methadone


Survival Analysis with Time-Dependent Covariates

• Suppose that the variable 'methadone dose' violates the proportional hazards assumption, and we are interested in defining a time-varying covariate as the product of dose and the natural log of time (Survival).

• We need to re-organize the data to facilitate this.

• For this, we have the survSplit function in R.


Survival Analysis with Time-Dependent Covariates

addicts.cp<-survSplit(addicts,
  cut=addicts$Survival[addicts$Status==1],
  end="Survival",
  event="Status",
  start="start",
  id="ID2")

Breaks up the addicts dataset into lines corresponding to the intervals between consecutive event times (mimicking continuity).
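To see what the splitting does, here is a minimal sketch on a hypothetical three-subject data frame (toy, with made-up times). It uses the formula interface of survSplit from the survival package rather than the older argument style used for the addicts data; the resulting rows are of the same kind.

```r
library(survival)

# Hypothetical toy data: follow-up time in days, status 1 = event, 0 = censored
toy <- data.frame(id     = 1:3,
                  time   = c(5, 8, 12),
                  status = c(1, 0, 1))

# Cut at the observed event times, as done for the addicts data
event_times <- sort(unique(toy$time[toy$status == 1]))

# One row per subject and per interval between consecutive cut points
toy_split <- survSplit(Surv(time, status) ~ ., data = toy,
                       cut = event_times, start = "tstart")

# Guard against zero-length intervals (the same clean-up is applied to addicts.cp)
toy_split <- toy_split[toy_split$tstart < toy_split$time, ]
print(toy_split)
```

Each subject contributes one row per interval in which it is at risk, and the event indicator is 1 only on the row that ends at that subject's own event time.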


Survival Analysis with Time-Dependent Covariates

> head(addicts.cp)
  ID Clinic prison methodone ID2 start Survival Status
1  1      1      0        50   1     0        7      0
2  1      1      0        50   1     7       13      0
3  1      1      0        50   1    13       17      0
4  1      1      0        50   1    17       19      0
5  1      1      0        50   1    19       26      0
6  1      1      0        50   1    26       29      0

Subject ID 1 is broken into 97 lines:

> addicts.cp[96:98,]
   ID Clinic prison methodone ID2 start Survival Status
96  1      1      0        50   1   394      399      0
97  1      1      0        50   1   399      428      1
98  2      1      1        55   2     0        7      0


Survival Analysis with Time-Dependent Covariates

• Adding dose*log(time):

addicts.cp$logtdose=addicts.cp$methodone*log(addicts.cp$Survival)

# removing intervals of length 0:
addicts.cp<-addicts.cp[addicts.cp$start<addicts.cp$Survival,]

• ID 114 has an event at day 35:

addicts.cp[addicts.cp$ID==114,c("ID","start","Survival","Status",
  "methodone","logtdose")]

       ID start Survival Status methodone  logtdose
10515 114     0        7      0        40  77.83641
10516 114     7       13      0        40 102.59797
10517 114    13       17      0        40 113.32853
10518 114    17       19      0        40 117.77756
10519 114    19       26      0        40 130.32386
10520 114    26       29      0        40 134.69183
10521 114    29       30      0        40 136.04790
10522 114    30       33      0        40 139.86030
10523 114    33       35      1        40 142.21392
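The logtdose column can be verified by hand: on each row it is the dose multiplied by the natural log of the interval's end time. The short sketch below recomputes the values for ID 114 from the numbers in the table; the variable names are hypothetical.

```r
# Interval end times and dose for ID 114, copied from the table
interval_end <- c(7, 13, 17, 19, 26, 29, 30, 33, 35)
dose <- 40

# Time-varying covariate: dose * log(time), evaluated at the end of each interval
logtdose <- dose * log(interval_end)

# Values reported in the table, for comparison
reported <- c(77.83641, 102.59797, 113.32853, 117.77756, 130.32386,
              134.69183, 136.04790, 139.86030, 142.21392)

print(round(logtdose, 5))
print(max(abs(logtdose - reported)))   # only rounding error remains
```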


Survival Analysis with Time-Dependent Covariates

Analysis results:

> my.analysis<-
    coxph(Surv(start,Survival,Status) ~
          prison + methodone + Clinic + logtdose + cluster(ID),data=addicts.cp)

> summary(my.analysis)$coef
                  coef exp(coef)    se(coef)   robust se         z     Pr(>|z|)
prison     0.340633209 1.4058375 0.167474080 0.159717275  2.132726 3.294720e-02
methodone -0.082624866 0.9206965 0.035984407 0.029601316 -2.791257 5.250384e-03
Clinic    -1.019875123 0.3606400 0.215415952 0.236365216 -4.314827 1.597276e-05
logtdose   0.008615205 1.0086524 0.006454814 0.005248135  1.641575 1.006782e-01

• The methadone dose is significant, and the event risk increases if you have been to prison; there is also a difference between the clinics. But logtdose is not significant with methodone in the model.


Survival Analysis with Time-Dependent Covariates

• Relevant cut points for epidemiological studies:

• Time points where exposure changes

• This way, subjects may serve as their own controls


Concluding Remarks

• Survival analysis is a wide study area; half a day only lets a glimmer of light out from the shining world of survival analysis.

• Go explore the areas relevant for you, on the basis of this brief introduction.

• Main references:
  – Kleinbaum & Klein: Survival Analysis. Springer 2012.
  – Andersen, Borgan, Gill and Keiding: Statistical Models Based on Counting Processes. Springer 1997.
  – Martinussen & Scheike: Dynamic Regression Models for Survival Data. Springer 2006.


Thank you for your attention
