Designing longitudinal studies in epidemiology

Donna Spiegelman
Professor of Epidemiologic Methods
Departments of Epidemiology and Biostatistics
stdls@channing.harvard.edu

Xavier Basagaña
Doctoral Student
Department of Biostatistics, Harvard School of Public Health


Background

We develop methods for the design of longitudinal studies for the most common scenarios in epidemiology

There already exist some formulas for power and sample size calculations in this context.

All prior work has been developed for clinical trials applications

Background

Based on clinical trials:

Some are based on test statistics that are not valid, or are less efficient, in an observational context, where baseline means may differ between groups, $E(Y_{i0} \mid X_i = 1) \neq E(Y_{i0} \mid X_i = 0)$ (e.g. ANCOVA).

[Figure: three panels of the response Y versus Time (0 to 5), comparing Control and Treatment groups.]

Background

Based on clinical trials:

In clinical trials:

The time measure of interest is time from randomization, so everyone starts at the same time. We consider situations where, for example, age is the time variable of interest, and subjects do not start at the same age.

Exposures are time-invariant.

Exposure (treatment) prevalence is 50% by design.

Xavier Basagaña’s Thesis

Derive study design formulas based on tests that are valid and efficient for observational studies, for two reasonable alternative hypotheses.

Comprehensively assess the effect of all parameters on power and sample size.

Extend the formulas to a context where not all subjects enter the study at the same time.

Extend the formulas to the case of time-varying covariates, and compare them with the time-invariant covariates case.

Derive the optimal combination of number of subjects (n) and number of repeated measures (r+1) subject to a cost constraint.

Create a computer program to perform design computations, with an intuitive parameterization and easy to use.


Notation and Preliminary Results

We study two alternative hypotheses:

1. Constant Mean Difference (CMD).

The mean response $E(Y_{ij} \mid X_i)$ of the exposed differs from that of the unexposed by the same amount at every time point. The null hypothesis is $H_0: \beta_1 = 0$, where $\beta_1$ is the coefficient of exposure.

[Figure: Y versus Time (0 to 5) for the Unexposed and Exposed groups, together with their Difference.]

2. Linearly Divergent Differences (LDD).

The difference in mean response between exposed and unexposed changes linearly with time, through an exposure-by-time term $X_i \times T_{ij}$. The null hypothesis is $H_0: \beta_3 = 0$, where $\beta_3$ is the coefficient of that term.

[Figure: Y versus Time (0 to 5) for the Unexposed and Exposed groups, together with their Difference.]

Intuitive parameterization of the alternative hypothesis

1) $\mu_{00}$: the mean response at baseline (or at the mean initial time) in the unexposed group, where

$$\mu_{00} = E(Y_{i0} \mid X_i = 0), \quad i = 1, \ldots, N$$

2) $p_1$: the percent difference between exposed and unexposed groups at baseline (or at the mean initial time), where

$$p_1 = \frac{E(Y_{i0} \mid X_i = 1) - E(Y_{i0} \mid X_i = 0)}{E(Y_{i0} \mid X_i = 0)}$$

Intuitive parameterization of the alternative hypothesis (2)

3) $p_2$: the percent change from baseline (or from the mean initial time) to end of follow-up (or to the mean final time) in the unexposed group, where

$$p_2 = \frac{E(Y_{i\tau} \mid X_i = 0) - E(Y_{i0} \mid X_i = 0)}{E(Y_{i0} \mid X_i = 0)}$$

When $\tau$ is not fixed, $p_2$ is defined at time $s$ instead of at time $\tau$.

4) $p_3$: the percent difference between the change from baseline (or from the mean initial time) to end of follow-up (or mean final time) in the exposed group and the unexposed group, where

$$p_3 = \frac{E(Y_{i\tau} - Y_{i0} \mid X_i = 1) - E(Y_{i\tau} - Y_{i0} \mid X_i = 0)}{E(Y_{i\tau} - Y_{i0} \mid X_i = 0)}$$

When $p_2 = 0$, $p_3$ will be defined as the percent change from baseline (or from the mean initial time) to the end of follow-up (or to the mean final time) in the exposed group, i.e.

$$p_3 = \frac{E(Y_{i\tau} \mid X_i = 1) - E(Y_{i0} \mid X_i = 1)}{E(Y_{i0} \mid X_i = 1)}$$
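As a rough illustration of the four quantities defined above, the following Python sketch computes them from four group means. The function name, argument names and the example means are hypothetical and are not taken from the study or from optitxs.r.

```python
def percent_parameters(unexp_start, unexp_end, exp_start, exp_end):
    """Return mu00, p1, p2, p3 from group means at baseline and end of follow-up.

    All four arguments are mean responses: unexposed/exposed at the start
    and at the end of follow-up (hypothetical inputs for illustration).
    """
    mu00 = unexp_start                              # baseline mean, unexposed
    p1 = (exp_start - unexp_start) / unexp_start    # baseline percent difference
    change_unexp = unexp_end - unexp_start          # change over follow-up, unexposed
    change_exp = exp_end - exp_start                # change over follow-up, exposed
    p2 = change_unexp / unexp_start                 # percent change, unexposed
    if p2 == 0:
        # Special case from the slide: p3 becomes the percent change in the exposed
        p3 = change_exp / exp_start
    else:
        p3 = (change_exp - change_unexp) / change_unexp
    return {"mu00": mu00, "p1": p1, "p2": p2, "p3": p3}


# Hypothetical means: unexposed 20 -> 22, exposed 25 -> 28
params = percent_parameters(20.0, 22.0, 25.0, 28.0)
```

With these made-up means, the exposed group starts 25% higher, the unexposed group changes by 10%, and the exposed group's change is 50% larger than the unexposed group's.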

Notation & Preliminary Results

We consider studies where the interval between visits (s) is fixed but the duration of the study is free (e.g. participants may respond to questionnaires every two years). Increasing r involves increasing the duration of the study.

We also consider studies where the duration of the study, $\tau$, is fixed, but the interval between visits is free (e.g. the study is 5 years long). Increasing r involves increasing the frequency of the measurements, i.e. decreasing s.

$\tau = s\,r$.

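Notation & Preliminary Results

Model

$$E(\mathbf{Y}_{ki} \mid \mathbf{X}_{ki}) = \mathbf{X}_{ki}\mathbf{B}, \qquad Var(\mathbf{Y}_{ki} \mid \mathbf{X}_{ki}) = \boldsymbol{\Sigma} \qquad (i = 1, \ldots, N_k;\ k = 0, 1)$$

The generalized least squares (GLS) estimator of B is

$$\hat{\mathbf{B}} = \Big(\sum_k \sum_i \mathbf{X}_{ki}' \boldsymbol{\Sigma}^{-1} \mathbf{X}_{ki}\Big)^{-1} \Big(\sum_k \sum_i \mathbf{X}_{ki}' \boldsymbol{\Sigma}^{-1} \mathbf{Y}_{ki}\Big)$$

$$\sqrt{N}\,(\hat{\mathbf{B}} - \mathbf{B}) \sim N(\mathbf{0}, \boldsymbol{\Sigma}_B), \qquad \boldsymbol{\Sigma}_B = \Big(E\big[\mathbf{X}_{ki}' \boldsymbol{\Sigma}^{-1} \mathbf{X}_{ki}\big]\Big)^{-1}$$

Power formula

$$\text{Power} = \Phi\left(\frac{\sqrt{N}\,\lvert \mathbf{c}'\mathbf{B}_{H_A} \rvert}{\sqrt{\mathbf{c}'\boldsymbol{\Sigma}_B\mathbf{c}}} - z_{1-\alpha/2}\right)$$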
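The power formula can be sketched in a few lines of Python using only the standard library. This is an illustrative reading of the formula, not code from optitxs.r; the function name `wald_power` and its arguments are assumptions, with `effect` standing for c'B under the alternative and `variance` for c'Σ_B c.

```python
from math import sqrt
from statistics import NormalDist


def wald_power(n, effect, variance, alpha=0.05):
    """Power of a two-sided level-alpha Wald test, dominant tail only.

    n        : total number of subjects (N)
    effect   : value of c'B under the alternative hypothesis
    variance : c' Sigma_B c, the asymptotic variance of sqrt(N) * c'B-hat
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)          # z_{1 - alpha/2}
    return NormalDist().cdf(sqrt(n) * abs(effect) / sqrt(variance) - z_crit)


# Hypothetical inputs: 100 subjects, effect 0.28, unit variance
example_power = wald_power(100, 0.28, 1.0)
```

When the effect is zero the function returns alpha/2, because only one tail of the two-sided test is counted, as in the formula on the slide.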

Notation & Preliminary Results

Let $v^{lm}$ be the $(l,m)$th element of $\boldsymbol{\Sigma}^{-1}$, and assume that the time distribution is independent of exposure group.

Then, under CMD, the asymptotic variance of $\sqrt{N}\hat{\beta}_1$ is

$$\mathbf{c}'\boldsymbol{\Sigma}_B\mathbf{c} = \Big[p_e(1-p_e) \sum_{l=0}^{r}\sum_{m=0}^{r} v^{lm}\Big]^{-1}$$

Under LDD, the asymptotic variance of $\sqrt{N}\hat{\beta}_3$ is

$$\mathbf{c}'\boldsymbol{\Sigma}_B\mathbf{c} = \frac{\big[p_e(1-p_e)\big]^{-1} \sum_{l=0}^{r}\sum_{m=0}^{r} v^{lm}}{V(t_0)\Big(\sum_{l=0}^{r}\sum_{m=0}^{r} v^{lm}\Big)^{2} + s^{2}\Big[\sum_{l=0}^{r}\sum_{m=0}^{r} v^{lm} \sum_{l=0}^{r}\sum_{m=0}^{r} l\,m\,v^{lm} - \Big(\sum_{l=0}^{r}\sum_{m=0}^{r} l\,v^{lm}\Big)^{2}\Big]}$$
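A minimal Python sketch of the two variance expressions, assuming (as on the slide) that the time distribution is independent of exposure. The helper names are hypothetical and this is not the implementation used in optitxs.r; the matrix inverse is a plain Gauss-Jordan routine so that the example needs no external libraries.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def cmd_variance(sigma, pe):
    """c' Sigma_B c under CMD: [pe (1 - pe) * sum of all elements of Sigma^{-1}]^{-1}."""
    v = invert(sigma)
    return 1.0 / (pe * (1 - pe) * sum(sum(row) for row in v))


def ldd_variance(sigma, pe, s, var_t0):
    """c' Sigma_B c under LDD, for visits every s time units and V(t0) = var_t0."""
    v = invert(sigma)
    idx = range(len(v))
    a = sum(v[l][m] for l in idx for m in idx)          # sum of v^{lm}
    b = sum(l * v[l][m] for l in idx for m in idx)      # sum of l * v^{lm}
    c = sum(l * m * v[l][m] for l in idx for m in idx)  # sum of l * m * v^{lm}
    return a / (pe * (1 - pe) * (var_t0 * a ** 2 + s ** 2 * (a * c - b ** 2)))


# Hypothetical two-measure design: variance 12, correlation 0.27
sigma_cs = [[12.0, 12.0 * 0.27], [12.0 * 0.27, 12.0]]
var_cmd = cmd_variance(sigma_cs, pe=0.062)
var_ldd = ldd_variance(sigma_cs, pe=0.062, s=4, var_t0=4)
```

Either variance can then be divided into the squared effect and combined with N in the power formula.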

Correlation structures

We consider three common correlation structures:

1. Compound symmetry (CS).

$$Var(\mathbf{Y}_i \mid \mathbf{X}_i) = \boldsymbol{\Sigma}_{(r+1)\times(r+1)} = \sigma^2 \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}$$

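Correlation structures

2. Damped Exponential (DEX)

$$\boldsymbol{\Sigma} = \sigma^2 \begin{pmatrix} 1 & \rho^{s^\theta} & \rho^{(2s)^\theta} & \cdots & \rho^{(rs)^\theta} \\ \rho^{s^\theta} & 1 & \rho^{s^\theta} & \cdots & \rho^{((r-1)s)^\theta} \\ \rho^{(2s)^\theta} & \rho^{s^\theta} & 1 & \cdots & \rho^{((r-2)s)^\theta} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{(rs)^\theta} & \rho^{((r-1)s)^\theta} & \rho^{((r-2)s)^\theta} & \cdots & 1 \end{pmatrix}$$

Examples with a correlation of 0.8 between adjacent measures (lower triangles shown):

$\theta = 0$: CS

1
0.8 1
0.8 0.8 1
0.8 0.8 0.8 1
0.8 0.8 0.8 0.8 1

$\theta = 0.3$: intermediate DEX

1
0.8 1
0.76 0.8 1
0.73 0.76 0.8 1
0.71 0.73 0.76 0.8 1

$\theta = 1$: AR(1)

1
0.8 1
0.64 0.8 1
0.51 0.64 0.8 1
0.41 0.51 0.64 0.8 1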
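The three example matrices can be regenerated with a short Python sketch. It assumes the DEX correlation between measures j and k is ρ raised to the power (|j − k|·s)^θ, with one time unit between visits in the examples; the function name is hypothetical.

```python
def dex_correlation(r, s, rho, theta):
    """(r+1) x (r+1) damped exponential correlation matrix.

    The correlation between measures j and k is rho ** ((|j - k| * s) ** theta),
    so theta = 0 gives compound symmetry and theta = 1 gives AR(1).
    """
    return [[1.0 if j == k else rho ** ((abs(j - k) * s) ** theta)
             for k in range(r + 1)]
            for j in range(r + 1)]


# Five measures (r = 4), unit spacing, adjacent correlation 0.8
cs_like = dex_correlation(4, 1, 0.8, 0.0)
damped = dex_correlation(4, 1, 0.8, 0.3)
ar1_like = dex_correlation(4, 1, 0.8, 1.0)

# First column of each matrix, rounded to two decimals as on the slide
first_columns = {name: [round(row[0], 2) for row in mat]
                 for name, mat in [("theta=0", cs_like),
                                   ("theta=0.3", damped),
                                   ("theta=1", ar1_like)]}
```

Multiplying every entry by σ² turns the correlation matrix into the covariance matrix Σ.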

Correlation structures

3. Random intercepts and slopes (RS).

$$\boldsymbol{\Sigma} = \mathbf{Z}_i \mathbf{D} \mathbf{Z}_i' + \sigma_w^2 \mathbf{I}, \qquad \mathbf{D} = \begin{pmatrix} \sigma_{b_0}^2 & \rho_{b_0 b_1}\sigma_{b_0}\sigma_{b_1} \\ \rho_{b_0 b_1}\sigma_{b_0}\sigma_{b_1} & \sigma_{b_1}^2 \end{pmatrix}$$

Reparameterizing:

$\rho_{t_0} \in [0,1]$ is the reliability coefficient at baseline.

$\rho_{b_1,\tau} \in [0,1]$ is the slope reliability at the end of follow-up ($\rho_{b_1,\tau} = 0$ is CS; $\rho_{b_1,\tau} = 1$ means all variation in slopes is between subjects).

With this correlation structure, the variance of the response changes with time, i.e. this correlation structure gives a heteroscedastic model.
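To make the heteroscedasticity concrete, here is a small Python sketch that builds Z D Z' + σ_w² I directly from the variance components. It assumes each row of Z is (1, t_j); the function name and the numerical values are hypothetical, and the reparameterization in terms of reliabilities is not attempted here.

```python
def rs_covariance(times, var_b0, var_b1, corr_b0b1, var_w):
    """Covariance of the repeated measures under random intercepts and slopes.

    times     : measurement times t_0, ..., t_r (each row of Z is [1, t_j])
    var_b0    : variance of the random intercepts
    var_b1    : variance of the random slopes
    corr_b0b1 : correlation between random intercept and slope
    var_w     : within-subject (residual) variance
    """
    cov01 = corr_b0b1 * (var_b0 * var_b1) ** 0.5
    return [[var_b0 + (tj + tk) * cov01 + tj * tk * var_b1
             + (var_w if j == k else 0.0)
             for k, tk in enumerate(times)]
            for j, tj in enumerate(times)]


# Hypothetical components: three visits two years apart, uncorrelated effects
sigma_rs = rs_covariance([0, 2, 4], var_b0=4.0, var_b1=0.25,
                         corr_b0b1=0.0, var_w=12.0)
variances_over_time = [sigma_rs[j][j] for j in range(3)]
```

In this made-up case the diagonal grows with time, which is the heteroscedasticity the slide refers to; setting `var_b1` to 0 gives equal variances and equal covariances, i.e. compound symmetry.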

Example

Goal is to investigate the effect of indicators of socioeconomic status and post-menopausal hormone use on cognitive function (CMD) and cognitive decline (LDD).

“Pilot study” by Lee S, Kawachi I, Berkman LF, Grodstein F (“Education, other socioeconomic indicators, and cognitive function.” Am J Epidemiol 2003; 157: 712-720). Will be denoted as Grodstein.

Design questions include: the power of the published study to detect effects of specified magnitude; the number and timing of additional tests needed to obtain a study with the desired power to detect effects of specified magnitude; and the optimal number of participants and measurements needed in a de novo study of these issues.

Example

At baseline and at one subsequent time, six cognitive tests were administered to 15,654 participants in the Nurses’ Health Study.

Outcome: Telephone Interview for Cognitive Status (TICS), $\mu_{00} = 32.7$ (SD 4).

$\sigma^2 = 16$, $\sigma_e^2 = 12$. Implies model $R^2 = 0.25$.

$p_2 = -0.3\%$/year, i.e. a decline of about 1 point/10 years of age.

Example

Exposure: Graduate school degree vs. not (GRAD): $p_e = 6.2\%$, $p_1 = 2.3\%$ (0.75 points), Corr(GRAD, age) = -0.01

Exposure: Post-menopausal hormone use (CURRHORM): $p_e = 26.7\%$, $p_1 = 0.7\%$ (0.02 points), Corr(CURRHORM, age) = -0.06

Time: age (years) is the best choice, not questionnaire cycle or calendar year of test. The mean age was 74 and $V(t_0) \approx 4$.

Example

The estimated covariance parameters were

Parameter                     CS      RS
$\rho$ or $\rho_{t_0}$        0.27    0.26
Slope reliability (r = 2)             0.04
$\rho_{b_0 b_1}$                      -0.14

SAS code to fit the LDD model with CS covariance:

    proc mixed;
      class id;
      /* gradage: grad-by-age term, whose coefficient is tested under LDD */
      model tics = grad age gradage / s;
      /* random intercept per subject gives compound symmetry */
      random id;
    run;

SAS code to fit the LDD model with RS covariance:

    proc mixed;
      class id;
      model tics = grad age gradage / s ddfm=bw;
      /* random intercept and random age slope with unstructured D */
      random intercept age / type=un subject=id;
    run;


Program optitxs.r makes it all possible


http://www.hsph.harvard.edu/faculty/spiegelman/software.html


http://www.hsph.harvard.edu/faculty/spiegelman/optitxs.html


Illustration of use of software optitxs.r

• We’ll calculate the power of Grodstein’s published study to detect the observed 70% difference in rates of decline between those with more than high school vs. others

• Recall that 6.2% of NHS had more than high school; there was a –0.3% decline in cognitive function per year

> long.power()
Press <Esc> to quit

Constant mean difference (CMD) or Linearly divergent difference (LDD)? ldd
The alternative is LDD.

Enter the total sample size (N): 15000

Enter the number of post-baseline measures (r>0): 1

Enter the time between repeated measures (s): 2

Enter the exposure prevalence (pe) (0<=pe<=1): 0.062

Enter the variance of the time variable at baseline, V(t0) (enter 0 if all participants begin at the same time): 4

Enter the correlation between the time variable at baseline and exposure, rho[e,t0] (enter 0 if all participants begin at the same time): -0.01

Will you specify the alternative hypothesis on the absolute (beta coefficient) scale (1) or the relative (percent) scale (2)? 2
The alternative hypothesis will be specified on the relative (percent) change scale.

Enter mean response at baseline among unexposed (mu00): 32.7

Enter the percent change from baseline to end of follow-up among unexposed (p2) (e.g. enter 0.10 for a 10% change): -0.006

Enter the percent difference between the change from baseline to end of follow-up in the exposed group and the unexposed group (p3) (e.g. enter 0.10 for a 10% difference): 0.7

Which covariance matrix are you assuming: compound symmetry (1), damped exponential (2) or random slopes (3)? 2
You are assuming DEX covariance

Enter the residual variance of the response given the assumed model covariates (sigma2): 12

Enter the correlation between two measures of the same subject separated by one unit (rho): 0.3

Enter the damping coefficient (theta): 0.10

Power = 0.4206059
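The session above can be approximated by hand with a short Python sketch. This is not the code of optitxs.r: it is a reading of the LDD variance and power formulas from the earlier slides under stated assumptions. The correlation between exposure and baseline time (-0.01) is ignored, the DEX correlation at lag s is taken to be rho ** (s ** theta), and the slope difference is derived from the percent inputs as p3 × p2 × mu00 / τ. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ldd_power_two_measures(n, s, pe, var_t0, mu00, p2, p3,
                           sigma2, rho, theta, alpha=0.05):
    """Approximate LDD power for r = 1 (baseline plus one follow-up measure)."""
    corr = rho ** (s ** theta)                 # assumed DEX correlation at lag s
    scale = sigma2 * (1 - corr ** 2)
    v00 = v11 = 1 / scale                      # elements of the 2 x 2 inverse
    v01 = -corr / scale
    a = v00 + 2 * v01 + v11                    # sum of v^{lm}
    b = v01 + v11                              # sum of l * v^{lm}, l in {0, 1}
    c = v11                                    # sum of l * m * v^{lm}
    variance = a / (pe * (1 - pe) * (var_t0 * a ** 2 + s ** 2 * (a * c - b ** 2)))
    tau = s * 1                                # tau = s * r with r = 1
    slope_diff = p3 * p2 * mu00 / tau          # assumed mapping to the slope difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(sqrt(n) * abs(slope_diff) / sqrt(variance) - z_crit)


# Inputs entered in the session above
approx_power = ldd_power_two_measures(n=15000, s=2, pe=0.062, var_t0=4,
                                      mu00=32.7, p2=-0.006, p3=0.7,
                                      sigma2=12, rho=0.3, theta=0.10)
```

Under these assumptions the sketch gives a power of roughly 0.42, in line with the program's output; small differences are expected because the exposure-time correlation is left out.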

Power of current study

• To detect the observed 70% difference in cognitive decline by GRAD
  – CS: 44%
  – RS: 35%
  – DEX ($\theta = 0.10$): 42%

• To detect a hypothesized ±10% difference in cognitive decline by current hormone use
  – CS & DEX: 7%
  – RS: 6%

How many additional measurements are needed when tests are administered every 2 years, i.e. how many more years of follow-up are needed...

• To detect the observed 70% difference in cognitive decline by GRAD with 90% power?
  – CS, DEX ($\theta = 0.10$), RS: 3 post-baseline measurements, $\tau = 6$
  • one more 5 year grant cycle

• To detect a hypothesized ±20% difference in cognitive decline by current hormone use with 90% power?
  – CS, DEX ($\theta = 0.10$): 6 post-baseline measurements, $\tau = 12$
  • More than two 5 year grant cycles

N=15,000 for these calculations

How many more measurements should be taken in four (one NIH grant cycle) and eight years of follow-up (two NIH grant cycles)...

• To detect the observed 70% difference in cognitive decline by GRAD with 90% power?

                          Duration of follow-up
                          4 years    8 years
CS                        8          1
DEX ($\theta = 0.10$)     10         1
RS                        10         1

• To detect a hypothesized ±20% difference in cognitive decline by current hormone use with 90% power?

                          Duration of follow-up
                          4 years    8 years
CS                        >50        11
DEX ($\theta = 0.10$)     >50        17
RS                        >50        13

Optimize (N,r) in a new study of cognitive decline

• Assume
  – 4 years of follow-up (1 NIH grant cycle);
  – cost of recruitment and baseline measurements is twice that of subsequent measurements

• GRAD ($p_3 = 70\%$):
  – (N,r) = (26,795; 1) CS
  – (N,r) = (26,930; 1) DEX ($\theta = 0.10$)
  – (N,r) = (28,945; 1) RS

• CURRHORM ($p_3 = 20\%$):
  – (N,r) = (97,662; 1) CS
  – (N,r) = (98,155; 1) DEX ($\theta = 0.10$)
  – (N,r) = (105,470; 1) RS

Conclusions

Re: Constant Mean Difference (CMD)

The exposed and unexposed mean responses differ by the same amount at every time point; the null hypothesis is $H_0: \beta_1 = 0$, where $\beta_1$ is the coefficient of exposure.

[Figure: Y versus Time (0 to 5) for the Unexposed and Exposed groups, together with their Difference.]

Conclusions

CMD:

If all observations have the same cost, one would not take repeated measures.

If subsequent measures are cheaper, one would take no repeated measures or just a small number if the correlation between measures is large.

If deviations from CS exist, it is advisable to take more repeated measures.

Power increases as $\rho \to 0$ and as $\theta \to 1$. Power increases as $Var(T_{i0})$ goes to 0.

Conclusions

LDD:

If the follow-up period is not fixed, choose the maximum length of follow-up possible (except when RS is assumed).

If the follow-up period is fixed, one would take more than one repeated measure only when the subsequent measures are more than five times cheaper. When there are departures from CS, cost ratios of around 10 or 20 are needed to justify taking 3 or 4 measures.

Power increases as $\rho \to 1$, as $\theta \to 0$, as slope reliability goes to 0, as $Var(T_{i0})$ increases, and as the correlation between $T_{i0}$ and exposure goes to 0.

Conclusions

LDD:

The optimal (N,r) and the resulting power can strongly depend on the correlation structure. Combinations that are optimal for one correlation may be bad for another.

All these decisions are based on power considerations alone. There might be other reasons to take repeated measures.

Sensitivity analysis. Our program.


Future work

Develop formulas for time-varying exposure.

Include dropout

• For sample size calculations, simply inflate the sample size by a factor of 1/(1-f).

• However, dropout can alter the relationship between N and r.
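The simple inflation rule can be written as a one-line helper. In this sketch f is taken to be the expected fraction of participants who drop out, and the function name is hypothetical; as noted above, the adjustment does not capture how dropout changes the trade-off between N and r.

```python
from math import ceil


def inflate_for_dropout(n, f):
    """Inflate a sample size n by 1 / (1 - f), rounding up to a whole subject."""
    if not 0 <= f < 1:
        raise ValueError("dropout fraction f must satisfy 0 <= f < 1")
    return ceil(n / (1 - f))


# Hypothetical case: 1,000 subjects needed, half expected to drop out
inflated_n = inflate_for_dropout(1000, 0.5)
```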


Thanks!