Correlated data - ku

University of Copenhagen, Department of Biostatistics

Faculty of Health Sciences

Correlated data: Further topics

Lene Theil Skovgaard
December 11, 2012

1 / 96


Further topics

- Specification of mixed models
- Model check and diagnostics
- Explained variation, R²
- Missing values
- Several series for each individual
- Additional examples:
  - Visual acuity example (from Variance components II)
  - Baseline revisited: Simple before and after study
  - Reading ability

2 / 96


Specification of mixed models

- Systematic variation:
  - Between-individual covariates: treatment, sex, age, baseline value...
  - Within-individual covariates: time, cumulative dose, temperature...
  is specified "as usual", including possible interactions
- Random variation
- Interactions between systematic and random effects are always random

3 / 96


Sources of random variation

1. Random effects:

2. Serial correlation:

3. Measurement error:

4 / 96


SAS, PROC MIXED

- model: describes the systematic part (fixed effects, mean value structure)
- random: describes the random effects
- repeated: describes the serial correlation
- local (an option of the repeated statement): adds an additional measurement error

5 / 96


General model specification

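Sorry..., matrix notation for brevity.
Let $Y_i = (y_{i1}, \ldots, y_{ik})'$ denote all outcome values for subject $i$:

$$Y_i = X_i\beta + Z_i b_i + \varepsilon_i$$

where
- $\beta$ denotes systematic effects, $\beta = (\beta_1, \ldots, \beta_p)'$
- $b_i$ denotes random effects, $b_i = (b_{i1}, \ldots, b_{iq})'$
- $\varepsilon_i$ is the serially dependent residual variation, $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{ik})'$

We assume all $b_i$'s and $\varepsilon_i$'s to be independent, with mean zero and
$\mathrm{Var}(b_i) = G$, $\mathrm{Var}(\varepsilon_i) = R_i$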
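Because $b_i$ and $\varepsilon_i$ are independent, the specification above implies the marginal covariance $\mathrm{Var}(Y_i) = Z_i G Z_i' + R_i$. The following is a minimal sketch of that calculation in plain Python for a random intercept with independent residuals; the values `omega2 = 4.0`, `sigma2 = 1.0` and `k = 3` are illustrative assumptions, not taken from any data set in these slides.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def marginal_covariance(Z, G, R):
    """Return V = Z G Z' + R for one subject."""
    zgz = matmul(matmul(Z, G), transpose(Z))
    n = len(R)
    return [[zgz[r][c] + R[r][c] for c in range(n)] for r in range(n)]


# Hypothetical example: random intercept, k = 3 measurements per subject
k = 3
omega2 = 4.0   # assumed between-subject variance (G)
sigma2 = 1.0   # assumed residual variance (diagonal of R)

Z = [[1.0] for _ in range(k)]   # one column of ones: random intercept
G = [[omega2]]
R = [[sigma2 if r == c else 0.0 for c in range(k)] for r in range(k)]

V = marginal_covariance(Z, G, R)
for row in V:
    print(row)
```

With these assumed values every diagonal element is omega2 + sigma2 and every off-diagonal element is omega2, i.e. the compound symmetry pattern induced by a random intercept.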

6 / 96


For non-normal data

$Y_i$ follows some distribution from the exponential family, with mean value

$$E(Y_i) = g^{-1}(\eta_i)$$

where

$$\eta_i = X_i\beta + Z_i b_i$$

and the variance $V(Y_i) = V_i$ is determined by the distribution.

7 / 96


Residual variance

In general, there is no free variance parameter, since the variance is determined from the mean value:

- Normal (link=identity), free variance parameter $\sigma^2$
- Binomial (link=logit), variance $np(1 - p)$
- Poisson (link=log), variance $\lambda = E(Y)$

Overdispersion: The variance is seen to be larger than determined by the distribution.

8 / 96


Overdispersion

can be caused by
- omitted covariates (isn't that always the case?)
- unrecognized clusters
- heterogeneity, e.g. a "zero"-group (non-susceptibles)

Traditional solution: An over-dispersion parameter $\phi$ is estimated and multiplied onto the variance

cheating..... we would like instead

$$\eta_i = X_i\beta + Z_i b_i + \varepsilon_i$$

with a free parameter $\mathrm{Var}(\varepsilon_i) = \Sigma_i$

9 / 96


Swabs example from last week

proc glimmix data=swab;
  class crowding family name;
  model swab=crowding name / dist=poisson link=log s cl;
  random family(crowding);
run;

gave as part of the output

Fit Statistics

-2 Res Log Pseudo-Likelihood    178.54
Gener. Chi-Square / DF            1.96   <--------------------

When this number is greater than 1, it indicates overdispersion, and the P-values in the Poisson analysis might be too small.

10 / 96


Model with overdispersion

proc glimmix data=swab;
  class crowding family name;
  model swab=crowding name / dist=poisson link=log s cl;
  random intercept / subject=family(crowding);
  random _residual_;   /* <--- added line: multiplicative overdispersion parameter */
run;

providing the output

The GLIMMIX Procedure

Fit Statistics

-2 Res Log Pseudo-Likelihood    148.94
Generalized Chi-Square          176.12
Gener. Chi-Square / DF            2.12

11 / 96


Output, continued

Covariance Parameter Estimates

Cov Parm        Subject            Estimate   Standard Error
Intercept       family(crowding)   0.04304    0.03250
Residual (VC)                      2.1219     0.3631

Type III Tests of Fixed Effects

Effect     Num DF   Den DF   F Value   Pr > F
crowding   2        15       5.49      0.0162
name       4        68       17.18     <.0001

with P-values more or less identical to the normality-based results.

12 / 96


But....

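The model is not a real model, since no such distribution exists....
If we should build a real model with overdispersion, we would write:

$$y_{cfn} \sim \mathrm{Poisson}(\lambda_{cfn})$$

$$\log(\lambda_{cfn}) = \mu + \beta_c + \gamma_n + A_{cf} + \varepsilon_{cfn}$$

$$A_{cf} \sim N(0, \omega^2), \quad \varepsilon_{cfn} \sim N(0, \sigma^2)$$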
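To see why a normal error term on the log scale produces overdispersion, the following sketch simulates counts from a simplified version of the model above, keeping only $\mu$ and $\varepsilon$ (no crowding, name or family effects). The values `mu = 1.5` and `sigma = 0.5`, the seed and the sample size are hypothetical choices for illustration only.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count by Knuth's multiplication method."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(n, mu, sigma, rng):
    """Counts with log(lambda) = mu + eps, eps ~ N(0, sigma^2)."""
    return [poisson_draw(math.exp(mu + rng.gauss(0.0, sigma)), rng)
            for _ in range(n)]


def dispersion_ratio(counts):
    """Sample variance divided by sample mean (about 1 for pure Poisson)."""
    return statistics.variance(counts) / statistics.mean(counts)


rng = random.Random(2012)
plain = simulate_counts(5000, mu=1.5, sigma=0.0, rng=rng)   # pure Poisson
over = simulate_counts(5000, mu=1.5, sigma=0.5, rng=rng)    # extra N(0, sigma^2) term

ratio_plain = dispersion_ratio(plain)
ratio_over = dispersion_ratio(over)
print("variance/mean, pure Poisson:      ", round(ratio_plain, 2))
print("variance/mean, with normal error: ", round(ratio_over, 2))
```

The variance-to-mean ratio stays close to 1 for the pure Poisson counts and is clearly above 1 once the normal term is added, which is the pattern that the "Gener. Chi-Square / DF" statistic is meant to flag.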

13 / 96


Real model for overdispersion

will require two random-statements, like this:

proc glimmix data=swab;
  class crowding family name;
  model swab=crowding name / dist=poisson link=log s cl;
  random intercept / subject=family(crowding);
  random intercept / subject=name*family(crowding);
run;

but this model unfortunately cannot be estimated by glimmix:

14 / 96


Convergence problems

The GLIMMIX Procedure
Iteration History

Iteration   Restarts   Subiterations   Objective Function   Change       Max Gradient
15          0          1               155.39914436         0.00000048   0.000018
16          0          1               155.39914341         0.00000048   0.000018
17          0          1               155.39914437         0.00000049   0.000018
18          0          1               155.3991434          0.00000049   0.000019
19          0          1               155.39914438         0.00000050   0.000019

Did not converge.

Cov Parm    Subject                Estimate   Standard Error
Intercept   family(crowding)       0.03672    .
Intercept   family*name(crowdin)   0.1283     .

It probably requires true replicates.....

15 / 96


Model check and diagnostics (normal models)

- Ordinary residuals, from systematic effect, $Y_i - X_i\hat\beta$. These will be correlated
- Conditional residuals, $Y_i - X_i\hat\beta - Z_i\hat b_i$
- Normality of random effects, estimated BLUPs, $\hat b_i$
- Investigation of covariance structure (variogram)
- Detection of influential observations

16 / 96


Normality of random effects

Model checks
- Histogram of BLUPs from the model is not worth much
- Instead, build a model with a mixture of normal distributions and make a test....

If normality does not apply:
- no large effect on estimates for $\beta$, $G$ and $R$
- standard errors become biased, but may be corrected in various ways
- BLUPs $\hat b_i$ become invalid, especially when the residual variance $\sigma^2$ is large

17 / 96


Variogram

Variance of difference between time points:

$$\gamma(u) = \tfrac{1}{2} E(\varepsilon_t - \varepsilon_{t-u})^2 = \tau^2(1 - \rho(u)) + \sigma^2$$

- Nugget: $\sigma^2$
- Sill: $\tau^2 + \sigma^2$
- Variance: $\omega^2 + \tau^2 + \sigma^2$
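As a small numerical illustration of the variogram formula, the sketch below evaluates $\gamma(u)$ under the assumption of a power-type serial correlation, $\rho(u) = \rho^u$. The parameter values (`tau2 = 2.0`, `rho = 0.5`, `sigma2 = 1.0`) are hypothetical and chosen only so that nugget and sill are easy to read off.

```python
def variogram(u, tau2, rho, sigma2):
    """gamma(u) = tau2 * (1 - rho(u)) + sigma2, assuming rho(u) = rho ** u."""
    return tau2 * (1.0 - rho ** u) + sigma2


# Hypothetical variance components
tau2 = 2.0     # serially correlated component
rho = 0.5      # correlation at lag 1
sigma2 = 1.0   # measurement error

for lag in [0, 1, 2, 4, 8, 50]:
    print(lag, round(variogram(lag, tau2, rho, sigma2), 4))

nugget = variogram(0, tau2, rho, sigma2)   # value as the lag tends to 0: sigma2
sill = tau2 + sigma2                       # level approached for large lags
```

The curve starts at the nugget and rises towards the sill as the lag grows; how quickly it does so depends on the assumed correlation function.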

18 / 96


Local influence

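Idea:
- Put some infinitesimal extra weight ($\Delta$) on a single observation ($i$); weight vector:

  $$\omega_{\Delta i} = (1, \cdots, 1 + \Delta, \cdots, 1)$$

- Look at the change in likelihood:

  $$LD(\omega_{\Delta i}) = 2\,\bigl(l(\hat\theta) - l(\hat\theta_{\omega_{\Delta i}})\bigr)$$

- Make a local influence plot $(i, C_i)$, where

  $$C_i = \frac{1}{\Delta} LD(\omega_{\Delta i})$$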

19 / 96


Reasons for a large influence

- unusual combination of covariates $X_i$
- large ordinary residuals (from $X_i\beta$)
- unusual combination of covariates $Z_i$
- bad choice of covariance pattern $V_i(\alpha)$

20 / 96


Explained variation in percent, R2

We have two (or more) different variances to explain!
- Residual variation (variation within individuals, $\sigma^2_W$)
  - decreases (as usual) when we include an important x covariate (level 1)
  - may decrease when we include an important z covariate (level 2)
- Variation between individuals, $\omega^2_B$
  - decreases when we include an important z covariate (level 2)
  - may increase or decrease when we include an important x covariate (level 1)

21 / 96


Hypothetical example

The x's vary between individuals, and the average outcomes (y) are mostly due to this variation:

Levels of y, for fixed x, are quite alike!
$\omega^2$ decreases

22 / 96


Another hypothetical example

The x's vary between individuals, but the average outcomes (y) are almost identical:

Levels of y, for fixed x, are very different!
$\omega^2$ increases

23 / 96


Missing values

Most investigations are planned to be balanced but almost inevitably turn out to have missing values, or drop-out patients

- just by coincidence (blood sample lost or ruined)
- because of exclusion (the patient has recovered)
- we lost track of the patient (may be worrisome)
- the patient is too ill to show up (very serious, i.e. carrying information)

24 / 96


Types of missing data

- Single missing values
- Drop-outs

25 / 96


Possible missing mechanism, I

Low values are good:
When the patient is well treated, he drops out

26 / 96


Possible missing mechanism, II

Low values are bad:
Below some threshold, the patient is too ill to show up (informative missing)

27 / 96


Notation

- Outcome $Y_{git}$
- Parameters $\theta$ (level, slope etc.)
- Covariates $x_{git}$
- Indicator of missing $c_{git}$
- Missing outcomes $Y^*_{git}$ not observed, i.e. corresponding to $c_{git} = 1$

28 / 96


Types of missingness

MCAR: Missing completely at random
  $P(c_{git} = 1)$ depends only on the parameter $\theta$

MAR: Missing at random
  - CDEP: $P(c_{git} = 1)$ depends upon $\theta$ and covariates $x_{git}$
  - YDEP: $P(c_{git} = 1)$ depends upon $\theta$, covariates $x_{git}$ and observed outcomes $Y_{git}$

NI: Non-ignorable (informative missing)
  $P(c_{git} = 1)$ depends upon the unobserved (= missing) outcome $Y^*_{git}$

29 / 96


Hypothetical TLC-observations

Lung capacity measured at regular time intervalsfor two groups, that we want to compare

30 / 96


Average for the two groups

31 / 96


Hypothetical example: Informative missing

Patients who get below 3.5 drop out, averages change
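The effect of this drop-out rule on the averages can be sketched with a small simulation. Only the threshold 3.5 comes from the slide; the population mean 4.0, the standard deviation 0.8, the sample size and the seed are hypothetical assumptions made for illustration.

```python
import random
import statistics


def simulate_dropout(n, mean, sd, threshold, rng):
    """Return (all values, values still observed after drop-out below threshold)."""
    values = [rng.gauss(mean, sd) for _ in range(n)]
    observed = [v for v in values if v >= threshold]
    return values, observed


rng = random.Random(1)
values, observed = simulate_dropout(n=2000, mean=4.0, sd=0.8,
                                    threshold=3.5, rng=rng)

true_mean = statistics.mean(values)         # average if nobody dropped out
observed_mean = statistics.mean(observed)   # average among those who remain

print("patients remaining:", len(observed), "of", len(values))
print("mean of all values:     ", round(true_mean, 3))
print("mean of observed values:", round(observed_mean, 3))
```

Because the missingness depends on the value that would have been measured, the average among the remaining patients is biased upwards, and no analysis of the observed data alone can reveal by how much.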

32 / 96


Traditional handling of missing data

- Complete case analysis
- LOCF: Last observation carried forward
- Time average imputation
- Model prediction imputation
- Likelihood methods

33 / 96


Complete case analysis

Make an analysis including only those individuals who are observed at all available time points

- Information loss
- Potential bias, if there is a specific reason for the missingness

34 / 96


LOCF: Last observation carried forward

If an individual has no observed value at time $t_k$, replace the missing value by the previous observation, at $t_{k-1}$.
For drop-outs, all subsequent values will equal this value from $t_{k-1}$.

- The time effect will be less pronounced
- Large residuals, i.e. overestimation of residual variation
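The LOCF rule itself is simple to state in code. The sketch below is a minimal illustration for one individual's series, with missing values represented by `None`; the function name `locf` and the example series are hypothetical.

```python
def locf(series):
    """Carry the last observed value forward over missing entries (None).

    Missing values before the first observation stay missing, since there
    is nothing to carry forward.
    """
    filled = []
    last = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical series: one intermittent missing value, then drop-out
example = [5.1, None, 4.8, None, None]
print(locf(example))
```

The filled series is flat after drop-out, which illustrates why the time effect becomes less pronounced under LOCF.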

35 / 96


Time average imputation

- Subject effect will be underestimated
- Large residuals, i.e. overestimation of residual variation

36 / 96


Model prediction imputation

- Two-step procedure
- Too small residuals, i.e. downwards bias of SD

37 / 96


Likelihood methods

Mixed models for all available observations

38 / 96


MCAR: Missing completely at random

- Complete case analysis OK, but inefficient
- If only few observations are missing, imputations could work, but the variations will be affected
- Likelihood approaches (mixed models) OK: uses all available information

39 / 96


MAR: Missing at random

- Complete case analysis is biased: We disregard subjects with special characteristics
- If only few observations are missing, imputations could work, but the variations will be affected
- Likelihood approaches (mixed models) OK: uses all available information

40 / 96


Mixed models for MAR

41 / 96


Non-ignorable

Nothing works!

Many attempts have been made to model the missing mechanisms, but they all rely on assumptions that cannot be checked.

42 / 96


Example: Effect of exercise on appetite

53 subjects, in three groups:
- Control
- Moderate exercise: 1/2 hour a day
- Extensive exercise: 1 hour a day

Before and after exercise/placebo:
Exercise test, with blood samples taken every half hour from baseline until 3 hours.
Several hormones are measured, e.g. ghrelin

Mads Rosenkilde

43 / 96


HIGH, week=Pre

44 / 96


HIGH, week=Post

45 / 96


Baseline differences, before exercise?

No, not really
Can we then disregard these?

46 / 96


47 / 96


48 / 96


Aim of investigation

Does the exercise change something?
- Does the time course change from the first to the second week (pre/post exercise)?
- If so, does it change more than for the control group?
- And does it apply equally to the two exercise groups?

49 / 96


Model for each group separately

Fixed effects
- week: Pre or Post
- time: 0, 30, 60, 90, 120, 150, 180 minutes
- Interaction week*time: A change in the pattern from before to after exercise

Random effects
- Patients, Sub: 18 (HIGH, MOD), 17 (XCON)
- Interaction Sub*week

Serial covariance structure
- Autoregressive?
- Local error term?

50 / 96


Analysis for each group separately

proc mixed data=a0 covtest;
  by grp;
  class Sub week time;
  model log_ghrelin=time week*time / ddfm=satterth s cl;
  random intercept / subject=Sub vcorr v;
  repeated time / subject=Sub*week type=sp(pow)(numtime) local rcorr r;
  lsmeans week*time / slice=time;
run;

51 / 96


Output, exercise HIGH

grp=HIGH

The Mixed Procedure

Model Information

Data Set                    WORK.A0
Dependent Variable          log_ghrelin
Covariance Structures       Variance Components, Spatial Power
Subject Effects             Sub, Sub*week
Estimation Method           REML
Residual Variance Method    Profile
Fixed Effects SE Method     Model-Based
Degrees of Freedom Method   Satterthwaite

Class Level Information

Class   Levels   Values
Sub     18       ALMA ANAP ANLO ANTF BRFR CASC CHBE DAHA DERJ GRPE HEJE JAKU MIFH MIMR MIMÂİ MINI NIHA THES
week    2        Post Pre
time    7        0 30 60 90 120 150 180

52 / 96


Output, exercise HIGH, II

Dimensions
Covariance Parameters        4
Columns in X                24
Columns in Z Per Subject     1
Subjects                    18
Max Obs Per Subject         14

Cov Parm    Subject    Estimate   Standard Error   Z Value   Pr Z
Intercept   Sub        0.008593   0.003105         2.77      0.0028
Variance    Sub*week   0.001219   0.000374         3.26      0.0005
SP(POW)     Sub*week   0.6549     0.1744           3.75      0.0002
Residual               0.000817   0.000214         3.81      <.0001

53 / 96


Output, exercise HIGH, III

Type 3 Tests of Fixed Effects

Effect      Num DF   Den DF   F Value   Pr > F
time        6        105      6.77      <.0001
week*time   7        53.8     1.89      0.0885

Effect      Num DF   Den DF   F Value   Pr > F
week        1        16.8     0.24      0.6302
time        6        105      6.77      <.0001
week*time   6        105      2.13      0.0560

54 / 96


Output, exercise HIGH, IV

Tests of Effect Slices

Effect      time   Num DF   Den DF   F Value   Pr > F
week*time   0      1        72.2     0.81      0.3725
week*time   30     1        72.2     0.08      0.7830
week*time   60     1        72.2     2.57      0.1130
week*time   90     1        72.2     0.06      0.8052
week*time   120    1        72.2     2.52      0.1167
week*time   150    1        72.2     0.30      0.5881
week*time   180    1        74.5     1.94      0.1682

Solution for Fixed Effects

Effect      week   time   Estimate   Standard Error   DF     t Value   Pr > |t|
week*time   Post   0      0.01350    0.01504          72.2   0.90      0.3725
week*time   Post   30     0.004157   0.01504          72.2   0.28      0.7830
week*time   Post   60     0.02413    0.01504          72.2   1.60      0.1130
week*time   Post   90     0.003723   0.01504          72.2   0.25      0.8052
week*time   Post   120    -0.02388   0.01504          72.2   -1.59     0.1167
week*time   Post   150    -0.00818   0.01504          72.2   -0.54     0.5881
week*time   Post   180    0.02116    0.01521          74.5   1.39      0.1682

55 / 96


Results from all three groups

Group   ω²         τ²         ρ        σ²         P: week*time (id/int)
HIGH    0.008593   0.001219   0.6549   0.000817   0.089/0.056
MOD     0.01998    0.002811   0.7488   0.001522   0.32/0.23
XCON    0.01528    0.001303   0.2957   0.000398   0.13/0.092
All     0.01471    0.001724   0.6200   0.000946   0.0848*

*: Test for second-order interaction: grp*week*time

56 / 96


All groups simultaneously

proc mixed data=a0 covtest;
  class Sub grp week time;
  model log_ghrelin=grp week time
        grp*week grp*time week*time
        grp*week*time / ddfm=satterth s cl;
  random intercept / subject=Sub(grp) vcorr v;
  repeated time / subject=week*Sub(grp) type=sp(pow)(numtime) local rcorr r;
  lsmeans grp*week*time / slice=week;
  lsmeans grp*week*time / slice=grp;
run;

Class Level Information

Class Levels Values

Sub     53       ALJE ALMA ANAP ANBR ANKR ANLI ANLO ANMO ANTF ASOL ASSA BRFR
                 ...
                 THBR THES THLU TUMY ULRA
grp     3        HIGH MOD XCON
week    2        Post Pre
time    7        0 30 60 90 120 150 180

57 / 96


Dimensions

Covariance Parameters        4
Columns in X                96
Columns in Z Per Subject     1
Subjects                    53
Max Obs Per Subject         14

Number of Observations

Number of Observations Read       736
Number of Observations Used       736
Number of Observations Not Used     0

58 / 96


The Mixed Procedure


Cov Parm    Subject    Estimate   Standard Error   Z Value   Pr Z
Intercept   Sub        0.01471    0.003062         4.80      <.0001
Variance    Sub*week   0.001724   0.000279         6.17      <.0001
SP(POW)     Sub*week   0.6200     0.1007           6.16      <.0001
Residual               0.000946   0.000160         5.93      <.0001

Effect           Num DF   Den DF   F Value   Pr > F
grp              2        50.1     0.36      0.7004
week             1        49.5     0.03      0.8673
hours            6        319      15.66     <.0001
grp*week         2        49.5     0.30      0.7416
grp*hours        12       319      0.48      0.9236
week*hours       6        319      1.71      0.1183
grp*week*hours   12       319      1.62      0.0848

59 / 96


Baseline differences

Even though the baseline values are not significantly different, we know that we ought to include the baseline as a covariate, but which baseline?

- The observation at time 0 in the Pre-exercise week?
- The observation at time 0 at each of the two weeks (i.e. separate values for variable baseline, depending upon the week)

Think about which question you want to answer!

60 / 96


Including first baseline as covariate

proc sort data=a0;
  by grp Sub;
run;

data baseline;
  set a0;
  if week='Pre' and time=0;
  ghrelin0=ghrelin;
  log_ghrelin0=log_ghrelin;
  /* keep only the merge keys and the baseline values, so that the merge
     below cannot overwrite week, time or the outcome in a0 */
  keep grp Sub ghrelin0 log_ghrelin0;
run;

data test;
  merge a0 baseline;
  by grp Sub;
run;

proc mixed data=test covtest;
  where week='Post' or time>0;
  class Sub grp week time;
  model log_ghrelin=log_ghrelin0 grp week time
        grp*week grp*time week*time
        grp*week*time / ddfm=satterth s cl;
  random intercept / subject=Sub vcorr v;
  repeated time / subject=Sub*week type=sp(pow)(numtime) local rcorr r;
run;

61 / 96


Output, with baseline as covariate

Dimensions

Covariance Parameters        4
Columns in X                85
Columns in Z Per Subject     1
Subjects                    52
Max Obs Per Subject         12

Number of Observations

Number of Observations Read       632
Number of Observations Used       621
Number of Observations Not Used    11

Cov Parm    Subject    Estimate   Standard Error   Z Value   Pr Z
Intercept   Sub        0.001280   0.000357         3.59      0.0002
Variance    Sub*week   0.001350   0.000277         4.87      <.0001
SP(POW)     Sub*week   0.4419     0.1439           3.07      0.0021
Residual               0.000877   0.000235         3.73      <.0001

62 / 96



Effect         grp   week   hours   Estimate   Standard Error   DF     t Value
Intercept                           0.5587     0.1159           48.7   4.82
log_ghrelin0                        0.7980     0.04092          47.9   19.50
....

Effect          Num DF   Den DF   F Value   Pr > F
log_ghrelin0    1        47.9     380.34    <.0001
grp             2        48.5     2.31      0.1105
week            1        53.1     0.31      0.5787
time            5        272      7.27      <.0001
grp*week        2        53.1     0.68      0.5113
grp*time        10       272      0.53      0.8649
week*time       5        272      2.07      0.0699
grp*week*time   10       272      1.28      0.2406   <--------------------------

Proceed with some model reduction and model checks.....

63 / 96


Model reduction


Effect         Num DF   Den DF   F Value   Pr > F
log_ghrelin0   1        47.9     380.16    <.0001
grp            2        48       2.31      0.1100
week           1        55.5     0.29      0.5946
time           5        282      7.22      <.0001
week*time      5        282      2.11      0.0638

Effect         Num DF   Den DF   F Value   Pr > F
log_ghrelin0   1        49.9     362.33    <.0001
week           1        55.5     0.29      0.5944
time           5        282      7.23      <.0001
week*time      5        282      2.11      0.0642

64 / 96


Could this have been done simpler?

- Analysis of differences
  - because we have only two occasions
  - and exactly the same time points
- Reducing the time courses
  - to averages and/or slopes, to give a simpler model for derived quantities

65 / 96


Effect of the lens strength on visual acuity

7 individuals are looking at a screen, where a light flash appears.

They are looking through 4 lenses, with powers 6/6, 6/18, 6/36 and 6/60, i.e. 4 magnifications: 1, 3, 6 and 10

with 2 eyes

Outcome:
Visual acuity, the time lag (milliseconds) between the stimulus and the electrical response at the back of the cortex

66 / 96


Data

Crowder & Hand (1990)

67 / 96


68 / 96


Factors to take into account

Main effects:
- 7 individuals (person), $A_p$
- 2 eyes for each individual (eye), $\alpha_e$
- 4 lens magnifications (power), $\beta_m$

Interactions?
- person*eye, $B_{pe}$
- person*power, $C_{pm}$
- eye*power, $\gamma_{em}$

2-order interaction
- person*eye*power = Residual, $\varepsilon_{pem}$

69 / 96


Model formulation

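$p = 1, \ldots, 7$, $e = 1, 2$, $m = 1, 2, 3, 4$

$$Y_{pem} = \mu_{em} + A_p + B_{pe} + C_{pm} + \varepsilon_{pem}$$

where

$$A_p \sim N(0, \omega^2)$$
$$B_{pe} \sim N(0, \tau_e^2)$$
$$C_{pm} \sim N(0, \tau_m^2)$$
$$\varepsilon_{pem} \sim N(0, \sigma^2)$$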

70 / 96


Factor diagram

[Figure: factor diagram with the nodes [I] = [Pa∗Ey∗Po], Ey∗Po, [Pa∗Ey], [Pa∗Po], Ey, Po, [Pa] and 0, connected by arrows]

71 / 96


Not quite a multilevel model, but..

Level   Unit                  Covariates
1       single measurements   Ey*Po
2       interactions
2e      [Pa*Ey]               Ey
2m      [Pa*Po]               Po
3       individuals, [Pa]     overall level

72 / 96


proc mixed data=visual covtest;
  class patient eye power;
  model acuity=eye power eye*power / s ddfm=satterth;
  * random patient patient*eye patient*power;
  random intercept eye power / subject=patient;
run;


Cov Parm    Subject   Estimate   Standard Error   Z Value   Pr > Z
Intercept   patient   20.2857    17.1703          1.18      0.1187
eye         patient   11.6845    8.6646           1.35      0.0887
power       patient   4.0238     4.0849           0.99      0.1623
Residual              12.8333    4.2778           3.00      0.0013

Effect      Num DF   Den DF   F Value   Pr > F
eye         1        6        0.78      0.4112
power       3        18       2.25      0.1177
eye*power   3        18       1.06      0.3925

73 / 96



Effect      eye     power   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept                   114.14     2.6411           6    43.22     <.0001
eye         left            3.5714     2.6467           6    1.35      0.2259
eye         right           0          .                .    .         .
power               1       -2.7143    2.1946           18   -1.24     0.2321
power               3       -1.8571    2.1946           18   -0.85     0.4085
power               6       -2.0000    2.1946           18   -0.91     0.3742
power               10      0          .                .    .         .
eye*power   left    1       -1.1429    2.7080           18   -0.42     0.6780
eye*power   left    3       -1.2857    2.7080           18   -0.47     0.6407
eye*power   left    6       -4.5714    2.7080           18   -1.69     0.1086
eye*power   left    10      0          .                .    .         .
eye*power   right   1       0          .                .    .         .
eye*power   right   3       0          .                .    .         .
eye*power   right   6       0          .                .    .         .
eye*power   right   10      0          .                .    .         .

74 / 96


Predicted mean profiles

75 / 96


Individual predictions

76 / 96


Residual plot

77 / 96


Omit the interaction eye*power


Cov Parm    Subject   Estimate   Standard Error   Z Value   Pr > Z
Intercept   patient   20.2984    17.1692          1.18      0.1186
eye         patient   11.6592    8.6561           1.35      0.0890
power       patient   3.9732     4.0118           0.99      0.1610
Residual              12.9345    3.9917           3.24      0.0006

Effect      eye     power   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept                   115.02     2.5076           6    45.87     <.0001
eye         left            1.8214     2.0628           6    0.88      0.4112
eye         right           0          .                .    .         .
power               1       -3.2857    1.7271           18   -1.90     0.0732
power               3       -2.5000    1.7271           18   -1.45     0.1650
power               6       -4.2857    1.7271           18   -2.48     0.0232
power               10      0          .                .    .         .

Effect   Num DF   Den DF   F Value   Pr > F
eye      1        6        0.78      0.4112
power    3        18       2.25      0.1177

78 / 96


Eye comparisons

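Model:

$$Y_{pem} = \mu_{em} + A_p + B_{pe} + C_{pm} + \varepsilon_{pem}$$

where

$$A_p \sim N(0, \omega^2), \quad B_{pe} \sim N(0, \tau_e^2), \quad C_{pm} \sim N(0, \tau_m^2), \quad \varepsilon_{pem} \sim N(0, \sigma^2)$$

Difference between eye averages:

$$\bar Y_{\cdot e_1 \cdot} - \bar Y_{\cdot e_2 \cdot} = \mu\text{-stuff} + \bar B_{\cdot e_1} - \bar B_{\cdot e_2} + \bar\varepsilon_{\cdot e_1 \cdot} - \bar\varepsilon_{\cdot e_2 \cdot}$$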

79 / 96


Consequence for eye comparisons

$$\mathrm{Var}(\bar Y_{\cdot e_1 \cdot} - \bar Y_{\cdot e_2 \cdot}) = \frac{2}{7}\tau_e^2 + \frac{2}{7 \times 4}\sigma^2$$

- $\tau_e^2$ is rather large (people have different eye preferences)
- We have to demand a larger difference in order to detect it
- P-values rather large

80 / 96


Magnification comparisons

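Model:

$$Y_{pem} = \mu_{em} + A_p + B_{pe} + C_{pm} + \varepsilon_{pem}$$

where

$$A_p \sim N(0, \omega^2), \quad B_{pe} \sim N(0, \tau_e^2), \quad C_{pm} \sim N(0, \tau_m^2), \quad \varepsilon_{pem} \sim N(0, \sigma^2)$$

Difference between magnification averages:

$$\bar Y_{\cdot\cdot m_1} - \bar Y_{\cdot\cdot m_2} = \mu\text{-stuff} + \bar C_{\cdot m_1} - \bar C_{\cdot m_2} + \bar\varepsilon_{\cdot\cdot m_1} - \bar\varepsilon_{\cdot\cdot m_2}$$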

81 / 96


Consequence for magnification comparisons

$$\mathrm{Var}(\bar Y_{\cdot\cdot m_1} - \bar Y_{\cdot\cdot m_2}) = \frac{2}{7}\tau_m^2 + \frac{2}{7 \times 2}\sigma^2$$

- $\tau_m^2$ is not that large (people react more or less identically to the different magnifications)
- We can detect smaller differences
- P-values rather small
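The two variance formulas (for eye and for magnification comparisons) can be checked numerically against the output of the model without the eye*power interaction. The sketch below plugs in the covariance parameter estimates reported there (eye 11.6592, power 3.9732, residual 12.9345); the function names are hypothetical, and the comparison with the printed standard errors (2.0628 and 1.7271) is a plausibility check rather than a reproduction of the SAS computation.

```python
import math


def se_eye_difference(tau2_e, sigma2, persons=7, powers=4):
    """sqrt( 2/7 * tau2_e + 2/(7*4) * sigma2 )."""
    return math.sqrt(2.0 / persons * tau2_e
                     + 2.0 / (persons * powers) * sigma2)


def se_power_difference(tau2_m, sigma2, persons=7, eyes=2):
    """sqrt( 2/7 * tau2_m + 2/(7*2) * sigma2 )."""
    return math.sqrt(2.0 / persons * tau2_m
                     + 2.0 / (persons * eyes) * sigma2)


# Covariance parameter estimates from the model without eye*power
tau2_e = 11.6592    # eye, subject = patient
tau2_m = 3.9732     # power, subject = patient
sigma2 = 12.9345    # residual

se_eye = se_eye_difference(tau2_e, sigma2)
se_power = se_power_difference(tau2_m, sigma2)

print("SE, left vs right eye:       ", round(se_eye, 4))
print("SE, power vs reference power:", round(se_power, 4))
```

The eye comparison averages over more observations per level (7 × 4 against 7 × 2), yet it has the larger standard error, because the person*eye variance component is about three times the person*power component.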

82 / 96


If we ignore correlations

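i.e. a model with no random effects

- Eye differences:

  $$\mathrm{Var}(\bar Y_{\cdot e_1 \cdot} - \bar Y_{\cdot e_2 \cdot}) = \frac{2}{7 \times 4}\sigma^2$$

  but another $\sigma^2$

- Magnification differences:

  $$\mathrm{Var}(\bar Y_{\cdot\cdot m_1} - \bar Y_{\cdot\cdot m_2}) = \frac{2}{7 \times 2}\sigma^2$$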

83 / 96


Incorrect analysis, ignoring random effects

Covariance Parameter Estimates

Cov Parm   Estimate
Residual   46.7518

Effect      eye     power   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept                   115.02     2.0431           51   56.30     <.0001
eye         left            1.8214     1.8274           51   1.00      0.3236
eye         right           0          .                .    .         .
power               1       -3.2857    2.5843           51   -1.27     0.2094
power               3       -2.5000    2.5843           51   -0.97     0.3379
power               6       -4.2857    2.5843           51   -1.66     0.1034
power               10      0          .                .    .         .

Effect   Num DF   Den DF   Chi-Square   F Value   Pr > ChiSq   Pr > F
eye      1        51       0.99         0.99      0.3189       0.3236   <-- too small
power    3        51       3.01         1.00      0.3899       0.3988   <-- too large

84 / 96


Systematic vs. random effects

Could the patients be treated as systematic here? Yes:

Covariance Parameter Estimates

Cov Parm   Subject   Estimate
eye        patient   11.6845
power      patient   4.0238
Residual             12.8333

Effect      Num DF   Den DF   F Value   Pr > F
patient     6        6        3.40      0.0810
eye         1        6        0.78      0.4112
power       3        18       2.25      0.1177
eye*power   3        18       1.06      0.3925

Can you think why?

85 / 96


Example with only two time points (baseline and follow-up)

from Vickers, A.J. & Altman, D.G.: Analysing controlled clinical trials with baseline and follow-up measurements. British Medical Journal 2001; 323: 1123-24.

52 patients with shoulder pain are randomized to either
- Acupuncture (n=25)
- Placebo (n=27)

Pain is evaluated on a 100 point scale before and after treatment.
High scores are good

86 / 96


Results on pain scores

Comparison of the two groups

                   Average pain score                   Treatment effect
                   placebo (n=27)   acupuncture (n=25)  difference (95% CI)   P-value
Baseline           53.9 (14.0)      60.4 (12.3)         6.5                   0.09

Type of analysis
Follow-up          62.3 (17.9)      79.6 (17.1)         17.3 (7.5; 27.1)      0.0008
Changes*           8.4 (14.6)       19.2 (16.1)         10.8 (2.3; 19.4)      0.014
Ancova                                                  12.7 (4.1; 21.3)      0.005

* results published in Kleinhenz et al. Pain 1999; 83:235-41.

87 / 96


Development of pain, actual and hypothetical

88 / 96


Approaches for pain score analysis

Baseline
- The acupuncture group lies somewhat above placebo

Follow-up
- We would expect the acupuncture group to be higher also after treatment
- Therefore, a direct comparison of follow-up times is unreasonable (we see too big a difference)

89 / 96


Approaches for pain score analysis, II

Change
- Low baseline implies an expected large positive change (regression to the mean)
- The placebo group is therefore expected to increase the most
- Therefore, a direct comparison of changes is unreasonable (we see too small a difference)

90 / 96


General approaches for handling baseline

- Ancova: Analysis of covariance, a special case of multiple regression:
  - Outcome: follow-up data
  - Covariates
    - treatment (factor: acupuncture/placebo)
    - baseline measurement (quantitative)
- Repeated measurement analysis
  - Treatment effect appears as an interaction between treatment and time
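The three analyses of the pain scores can be viewed as one adjustment formula with different slopes on the baseline: with a common slope $b$ in both groups, the estimated treatment effect is the follow-up difference minus $b$ times the baseline difference. The sketch below uses the rounded group differences from the pain score table (17.3 at follow-up, 6.5 at baseline). The slope of about 0.71 is back-calculated here from the reported Ancova estimate of 12.7; it is not reported in the source and is only approximate because the inputs are rounded.

```python
def adjusted_difference(followup_diff, baseline_diff, slope):
    """Treatment effect after removing slope * (baseline imbalance)."""
    return followup_diff - slope * baseline_diff


followup_diff = 17.3   # acupuncture minus placebo at follow-up
baseline_diff = 6.5    # acupuncture minus placebo at baseline

# slope 0: ignore the baseline (direct comparison of follow-up values)
raw = adjusted_difference(followup_diff, baseline_diff, 0.0)

# slope 1: comparison of changes from baseline
change = adjusted_difference(followup_diff, baseline_diff, 1.0)

# slope implied by the reported Ancova estimate of 12.7 (an assumption:
# back-calculated from rounded numbers, not taken from the paper)
implied_slope = (followup_diff - 12.7) / baseline_diff
ancova = adjusted_difference(followup_diff, baseline_diff, implied_slope)

print("follow-up only:", round(raw, 1))
print("changes:       ", round(change, 1))
print("implied slope: ", round(implied_slope, 2))
print("ancova:        ", round(ancova, 1))
```

Slopes of 0 and 1 are the two extremes, and the Ancova estimate falls between them because the estimated slope lies between 0 and 1.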

91 / 96


Recommendation

When can we use follow-up data?
- when we have a control group and proper randomisation
- when the correlation is low

When can we use differences?
- when we have a control group and proper randomisation
- when the correlation is large

When can we use analysis of covariance?
- always, as long as baseline imbalance is not related to treatment effect!

92 / 96


Example: Reading ability

as a function of age/training and cohort/age:
Longitudinal (within-individual, $\beta_W$) effect vs. cross-sectional (between-individual, $\beta_B$) effect:

93 / 96


Model

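- Baseline level: $a_{p1}$ at the age $x_{p1}$: $a_{p1} = \alpha + \beta_B x_{p1} + \delta_p$, $\beta_B$ negative
- Baseline measurement: $y_{p1} = a_{p1} + \varepsilon_{p1}$
- Follow-up level: $a_{p2}$ at the age $x_{p2}$: $a_{p2} = a_{p1} + \beta_W (x_{p2} - x_{p1})$
- Follow-up measurement: $y_{p2} = a_{p2} + \varepsilon_{p2}$
- Difference: $y_{p2} - y_{p1} = \beta_W (x_{p2} - x_{p1}) + (\varepsilon_{p2} - \varepsilon_{p1})$ (the individual level $\delta_p$ cancels)

Model for all y-observations:

$$y_{pj} = \alpha + \beta_B x_{p1} + \beta_W (x_{pj} - x_{p1}) + \delta_p + \varepsilon_{pj}$$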
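The separation of the cross-sectional and the longitudinal effect can be illustrated by simulating from the model for all y-observations and estimating each slope from the part of the data that identifies it. All parameter values below (`alpha = 80`, `beta_b = -1.0`, `beta_w = 0.8`, the standard deviations, the age ranges, the sample size and the seed) are hypothetical and unrelated to the reading data.

```python
import random


def simulate(n, alpha, beta_b, beta_w, sd_delta, sd_eps, rng):
    """Simulate (x1, x2, y1, y2) for n individuals from the two-occasion model."""
    rows = []
    for _ in range(n):
        x1 = rng.uniform(20.0, 60.0)          # age at baseline
        x2 = x1 + rng.uniform(1.0, 10.0)      # age at follow-up
        delta = rng.gauss(0.0, sd_delta)      # individual level
        level1 = alpha + beta_b * x1 + delta
        y1 = level1 + rng.gauss(0.0, sd_eps)
        y2 = level1 + beta_w * (x2 - x1) + rng.gauss(0.0, sd_eps)
        rows.append((x1, x2, y1, y2))
    return rows


def ols_slope(xs, ys):
    """Least squares slope in a regression with intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slope_through_origin(xs, ys):
    """Least squares slope in a regression without intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


rng = random.Random(96)
rows = simulate(2000, alpha=80.0, beta_b=-1.0, beta_w=0.8,
                sd_delta=5.0, sd_eps=2.0, rng=rng)

# Cross-sectional effect: baseline outcome against baseline age
est_beta_b = ols_slope([r[0] for r in rows], [r[2] for r in rows])

# Longitudinal effect: differences in outcome against differences in age
est_beta_w = slope_through_origin([r[1] - r[0] for r in rows],
                                  [r[3] - r[2] for r in rows])

print("estimated beta_B:", round(est_beta_b, 3))
print("estimated beta_W:", round(est_beta_w, 3))
```

The two estimates recover slopes of opposite sign from the same individuals, which is why a single regression of y on age, ignoring the distinction, mixes the two effects.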

94 / 96


Actual analyses

Regression with inter- as well as intra-individual effect of age/time:

proc mixed data=reading;
  class id;
  model read=age1 difage / s;
  random id;
run;

Covariance Parameter Estimates

Cov Parm   Estimate
id         245.35
Residual   27.0449

Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept   78.1267    19.1124          4    4.09      0.0150
age1        -1.3615    0.5722           5    -2.38     0.0632
difage      0.8646     0.3121           5    2.77      0.0394

95 / 96


Estimation results

Method                                     Cross-sectional (βB)   Longitudinal (βW)
                                           Cohort effect          Age effect
yi1 vs. xi1                                -1.359 (0.458)         –
yi2 vs. xi2                                -1.245 (0.534)         –
yij vs. xij, no individual effect          -1.000 (0.384)         –
yi2 − yi1 vs. xi2 − xi1, no intercept      –                      0.883 (0.211)
yij vs. xij, random individual effect      –                      0.676 (0.307)
yij vs. xi1 and (xi2 − xi1)                -1.362 (0.572)         0.865 (0.312)

96 / 96
