University of Copenhagen, Department of Biostatistics
Faculty of Health Sciences
Correlated data: Further topics
Lene Theil Skovgaard
December 11, 2012
Further topics
- Specification of mixed models
- Model check and diagnostics
- Explained variation, $R^2$
- Missing values
- Several series for each individual
- Additional examples:
  - Visual acuity example (from Variance components II)
  - Baseline revisited: Simple before and after study
  - Reading ability
Specification of mixed models
- Systematic variation:
  - Between-individual covariates: treatment, sex, age, baseline value...
  - Within-individual covariates: time, cumulative dose, temperature...
  is specified “as usual”, including possible interactions
- Random variation
  - Interactions between systematic and random effects are always random
Sources of random variation
1. Random effects:
2. Serial correlation:
3. Measurement error:
SAS, PROC MIXED
- model: describes the systematic part (fixed effects, mean value structure)
- random: describes the random effects
- repeated: describes the serial correlation
- local (an option on the repeated statement): adds an additional measurement error
General model specification
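Sorry..., matrix notation for brevity. Let $Y_i = (y_{i1}, \ldots, y_{ik})'$ denote all outcome values for subject $i$:

$$Y_i = X_i\beta + Z_i b_i + \varepsilon_i$$

where
- $\beta$ denotes systematic effects, $\beta = (\beta_1, \ldots, \beta_p)'$
- $b_i$ denotes random effects, $b_i = (b_{i1}, \ldots, b_{iq})'$
- $\varepsilon_i$ is the serially dependent residual variation, $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{ik})'$

We assume all $b_i$'s and $\varepsilon_i$'s to be independent, with mean zero and $\mathrm{Var}(b_i) = G$, $\mathrm{Var}(\varepsilon_i) = R_i$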
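
As a sketch of what these assumptions imply (a standard consequence of the model, not spelled out on the slide), the marginal mean and covariance of $Y_i$ follow directly. The random-intercept special case, with $Z_i = (1, \ldots, 1)'$, $G = \omega^2$ and $R_i = \sigma^2 I$, is an illustrative assumption:

```latex
\begin{aligned}
\mathrm{E}(Y_i) &= X_i\beta, \\
\mathrm{Var}(Y_i) &= Z_i G Z_i' + R_i .
\end{aligned}

% Illustrative special case: random intercept only
\mathrm{Var}(y_{ij}) = \omega^2 + \sigma^2, \qquad
\mathrm{Cov}(y_{ij}, y_{ij'}) = \omega^2 \quad (j \neq j').
```

In this special case all pairs of observations on the same subject are equally correlated, with correlation $\omega^2 / (\omega^2 + \sigma^2)$.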
For non-normal data
$Y_i$ follows some distribution from the exponential family, with mean value

$$\mathrm{E}(Y_i) = g^{-1}(\eta_i)$$

where

$$\eta_i = X_i\beta + Z_i b_i$$

and the variance $V(Y_i) = V_i$ is determined by the distribution.
Residual variance
In general, there is no free variance parameter, since the variance is determined from the mean value:

- Normal (link=identity), free variance parameter $\sigma^2$
- Binomial (link=logit), variance $np(1 - p)$
- Poisson (link=log), variance $\lambda = \mathrm{E}(Y)$

Overdispersion: The variance is seen to be larger than determined by the distribution.
Overdispersion
can be caused by
- omitted covariates (isn’t that always the case?)
- unrecognized clusters
- heterogeneity, e.g. a “zero”-group (non-susceptibles)

Traditional solution: An over-dispersion parameter $\phi$ is estimated and multiplied onto the variance

cheating..... we would like instead

$$\eta_i = X_i\beta + Z_i b_i + \varepsilon_i$$

with a free parameter $\mathrm{Var}(\varepsilon_i) = \Sigma_i$
Swabs example from last week
    proc glimmix data=swab;
      class crowding family name;
      /* Poisson model with log link; s and cl request estimates with confidence limits */
      model swab=crowding name / dist=poisson link=log s cl;
      /* families, nested within crowding, as random effect */
      random family(crowding);
    run;

gave as part of the output

    Fit Statistics

    -2 Res Log Pseudo-Likelihood    178.54
    Gener. Chi-Square / DF            1.96   <--------------------

When this number is greater than 1, it indicates overdispersion, and the P-values in the Poisson analysis might be too small.
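
The "traditional solution" of estimating an overdispersion parameter can be sketched in a few lines. The sketch below is in Python rather than SAS, uses made-up counts (not the swabs data), and assumes the usual quasi-likelihood convention that standard errors are multiplied by the square root of the estimated dispersion. The function name `pearson_dispersion` is hypothetical.

```python
import math

def pearson_dispersion(counts, fitted, n_params):
    """Pearson chi-square divided by its degrees of freedom for a Poisson fit."""
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted))
    return chi2 / (len(counts) - n_params)

# Hypothetical counts in four groups of two, with the group means as fitted values.
counts = [0, 4, 1, 7, 2, 8, 4, 12]
fitted = [2.0, 2.0, 4.0, 4.0, 5.0, 5.0, 8.0, 8.0]

phi = pearson_dispersion(counts, fitted, n_params=4)  # 16.1 / 4 = 4.025
se_factor = math.sqrt(phi)  # standard errors are inflated by sqrt(phi)

# Rough indication only, using the value 1.96 reported for the swabs example:
swab_se_factor = math.sqrt(1.96)  # about 1.4
```

The value 1.96 comes from a mixed model fitted by pseudo-likelihood, so the factor of about 1.4 should be read as an order of magnitude for how much the Poisson standard errors may be too small, not as an exact correction.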
Model with overdispersion
    proc glimmix data=swab;
      class crowding family name;
      model swab=crowding name / dist=poisson link=log s cl;
      random intercept / subject=family(crowding);
      random _residual_;   /* <-- adds the overdispersion parameter */
    run;

providing the output

    The GLIMMIX Procedure

    Fit Statistics

    -2 Res Log Pseudo-Likelihood    148.94
    Generalized Chi-Square          176.12
    Gener. Chi-Square / DF            2.12
Output, continued
Covariance Parameter Estimates
                                                    Standard
    Cov Parm         Subject             Estimate      Error
    Intercept        family(crowding)     0.04304    0.03250
    Residual (VC)                          2.1219     0.3631

    Type III Tests of Fixed Effects

                 Num    Den
    Effect        DF     DF    F Value    Pr > F
    crowding       2     15       5.49    0.0162
    name           4     68      17.18    <.0001

with P-values more or less identical to the normality-based results.
But....
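The model is not a real model, since no such distribution exists....
If we should build a real model with overdispersion, we would write:

$$y_{cfn} \sim \mathrm{Poisson}(\lambda_{cfn})$$

$$\log(\lambda_{cfn}) = \mu + \beta_c + \gamma_n + A_{cf} + \varepsilon_{cfn}$$

$$A_{cf} \sim N(0, \omega^2), \quad \varepsilon_{cfn} \sim N(0, \sigma^2)$$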
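
A sketch of why this construction gives overdispersion: condition on the family effect $A_{cf}$ and use the mean and variance of a log-normal variable. The symbol $m_{cfn}$ is introduced here for illustration only and does not appear on the slides.

```latex
m_{cfn} = \mathrm{E}(y_{cfn} \mid A_{cf})
        = \exp\!\left(\mu + \beta_c + \gamma_n + A_{cf} + \tfrac{1}{2}\sigma^2\right)

\mathrm{Var}(y_{cfn} \mid A_{cf})
        = m_{cfn} + m_{cfn}^2\left(e^{\sigma^2} - 1\right) \;\geq\; m_{cfn}
```

The variance equals the mean only when $\sigma^2 = 0$; any positive $\sigma^2$ makes the counts more variable than a Poisson distribution with the same mean.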
Real model for overdispersion
will require two random-statements, like this:
    proc glimmix data=swab;
      class crowding family name;
      model swab=crowding name / dist=poisson link=log s cl;
      random intercept / subject=family(crowding);        /* A_cf: family level */
      random intercept / subject=name*family(crowding);   /* epsilon_cfn: name within family */
    run;
but this model unfortunately cannot be estimated by glimmix:
Convergence problems
    The GLIMMIX Procedure
    Iteration History

                                             Objective                      Max
    Iteration  Restarts  Subiterations        Function        Change   Gradient
           15         0              1    155.39914436    0.00000048   0.000018
           16         0              1    155.39914341    0.00000048   0.000018
           17         0              1    155.39914437    0.00000049   0.000018
           18         0              1     155.3991434    0.00000049   0.000019
           19         0              1    155.39914438    0.00000050   0.000019

    Did not converge.

    Covariance Parameter Estimates

                                                     Standard
    Cov Parm     Subject                 Estimate       Error
    Intercept    family(crowding)         0.03672           .
    Intercept    family*name(crowdin)      0.1283           .

It probably requires true replicates.....
Model check and diagnostics (normal models)
- Ordinary residuals, from the systematic effect, $Y_i - X_i\hat\beta$. These will be correlated
- Conditional residuals, $Y_i - (X_i\hat\beta + Z_i\hat b_i)$
- Normality of random effects, estimated BLUP’s, $\hat b_i$
- Investigation of covariance structure (variogram)
- Detection of influential observations
Normality of random effects
Model checks
- Histogram of BLUP’s from the model is not worth much
- Instead, build model with a mixture of normal distributions and make a test....

If normality does not apply:
- no large effect on estimates for $\beta$, $G$ and $R$
- standard errors become biased, but may be corrected in various ways
- BLUP’s $\hat b_i$ become invalid, especially when the residual variance $\sigma^2$ is large
Variogram
Variance of difference between time points:
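$$\gamma(u) = \tfrac{1}{2}\,\mathrm{E}(\varepsilon_t - \varepsilon_{t-u})^2 = \tau^2(1 - \rho(u)) + \sigma^2$$

- Nugget: $\sigma^2$
- Sill: $\tau^2 + \sigma^2$
- Variance: $\omega^2 + \tau^2 + \sigma^2$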
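
To make nugget and sill concrete, the variogram can be evaluated numerically. This is a sketch in Python (not SAS), assuming a power correlation $\rho(u) = \rho^u$ and borrowing the estimates for the HIGH group from the ghrelin example later in these slides. The function name `variogram` and the chosen lags are illustrative.

```python
def variogram(u, tau2, rho, sigma2):
    """Semi-variogram tau2 * (1 - rho**u) + sigma2 for a lag u > 0,
    assuming the serial correlation decays as rho(u) = rho**u."""
    return tau2 * (1.0 - rho ** u) + sigma2

# Estimates for the HIGH group in the ghrelin example (omega2, tau2, rho, sigma2).
omega2, tau2, rho, sigma2 = 0.008593, 0.001219, 0.6549, 0.000817

nugget = sigma2                          # limit of the variogram as u -> 0
sill = tau2 + sigma2                     # limit as u -> infinity
total_variance = omega2 + tau2 + sigma2  # variance of a single observation

for u in (0.5, 1, 2, 5, 50):
    print(f"u = {u:>4}: gamma = {variogram(u, tau2, rho, sigma2):.6f}")
```

The printed values rise from just above the nugget towards the sill; the gap between the sill and the total variance is the between-subject variance $\omega^2$.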
Local influence
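Idea:
- Put some infinitesimal extra weight ($\Delta$) on a single observation ($i$); weight vector:
  $$\omega_{\Delta i} = (1, \cdots, 1 + \Delta, \cdots, 1)$$
- Look at the change in likelihood:
  $$LD(\omega_{\Delta i}) = 2\left(l(\hat\theta) - l(\hat\theta_{\omega_{\Delta i}})\right)$$
- Make a local influence plot $(i, C_i)$, where
  $$C_i = \frac{1}{\Delta} LD(\omega_{\Delta i})$$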
Reasons for a large influence
- unusual combination of covariates $X_i$
- large ordinary residuals (from $X_i\beta$)
- unusual combination of covariates $Z_i$
- bad choice of covariance pattern $V_i(\alpha)$
Explained variation in percent, $R^2$

We have two (or more) different variances to explain!
- Residual variation (variation within individuals, $\sigma^2_W$)
  - decreases (as usual) when we include an important x covariate (level 1)
  - may decrease when we include an important z covariate (level 2)
- Variation between individuals, $\omega^2_B$
  - decreases when we include an important z covariate (level 2)
  - may increase or decrease, when we include an important x covariate (level 1)
Hypothetical example
The x’s vary between individuals, and the average outcomes (y) are mostly due to this variation:

Levels of y, for fixed x, are quite alike!
$\omega^2$ decreases
Another hypothetical example
The x’s vary between individuals, but the average outcomes (y) are almost identical:

Levels of y, for fixed x, are very different!
$\omega^2$ increases
Missing values
Most investigations are planned to be balanced, but almost inevitably turn out to have missing values, or drop-out patients

- just by coincidence (blood sample lost or ruined)
- because of exclusion (the patient has recovered)
- we lost track of the patient (may be worrisome)
- the patient is too ill to show up (very serious, i.e. carrying information)
Types of missing data
- Single missing values
- Drop-outs
Possible missing mechanism, I
Low values are good:
When the patient is well treated, he drops out
Possible missing mechanism, II
Low values are bad:
Below some threshold, the patient is too ill to show up (informative missing)
Notation
- Outcome $Y_{git}$
- Parameters $\theta$ (level, slope etc.)
- Covariates $x_{git}$
- Indicator of missing $c_{git}$
- Missing outcomes $Y^*_{git}$: not observed, i.e. corresponding to $c_{git} = 1$
Types of missingness
MCAR  Missing completely at random
      $P(c_{git} = 1)$ depends only on the parameter $\theta$

MAR   Missing at random
      - CDEP: $P(c_{git} = 1)$ depends upon $\theta$ and covariates $x_{git}$
      - YDEP: $P(c_{git} = 1)$ depends upon $\theta$, covariates $x_{git}$ and observed outcomes $Y_{git}$

NI    Non-ignorable (informative missing)
      $P(c_{git} = 1)$ depends upon the unobserved (missing) outcome $Y^*_{git}$
Hypothetical TLC-observations
Lung capacity measured at regular time intervals for two groups that we want to compare
Average for the two groups
Hypothetical example: Informative missing
Patients who get below 3.5 drop out, averages change
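
The effect of this kind of drop-out on the group averages can be imitated with a small simulation. This is a Python sketch with invented levels, slopes and noise (only the threshold 3.5 is taken from the slide), and it assumes that the value below the threshold is itself never observed, as in the "too ill to show up" mechanism.

```python
import random

def simulate_dropout(n_subjects=2000, n_times=6, threshold=3.5, seed=1):
    """Return (true_means, observed_means) per time point.

    A subject drops out the first time the value falls below the threshold;
    that value and all later ones are treated as missing.
    """
    rng = random.Random(seed)
    true_sum = [0.0] * n_times
    obs_sum = [0.0] * n_times
    obs_n = [0] * n_times
    for _ in range(n_subjects):
        level = rng.gauss(4.5, 0.6)  # subject-specific level (hypothetical)
        in_study = True
        for t in range(n_times):
            y = level - 0.1 * t + rng.gauss(0.0, 0.2)  # slow decline plus noise
            true_sum[t] += y
            if in_study and y < threshold:
                in_study = False  # too ill to show up from now on
            if in_study:
                obs_sum[t] += y
                obs_n[t] += 1
    true_means = [s / n_subjects for s in true_sum]
    observed_means = [s / k for s, k in zip(obs_sum, obs_n)]
    return true_means, observed_means

true_means, observed_means = simulate_dropout()
for t, (full, seen) in enumerate(zip(true_means, observed_means)):
    print(f"time {t}: all patients {full:.2f}, remaining patients {seen:.2f}")
```

The averages of the remaining patients lie above the averages of all patients, and the gap widens over time, which is the distortion the slide describes.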
Traditional handling of missing data
- Complete case analysis
- LOCF: Last observation carried forward
- Time average imputation
- Model prediction imputation
- Likelihood methods
Complete case analysis
Make an analysis including only those individuals who are observed at all available time points

- Information loss
- Potential bias, if there is a specific reason for the missingness
LOCF: Last observation carried forward

If an individual has no observed value at time $t_k$, replace the missing value by the previous observation, at $t_{k-1}$.
For drop-outs, all subsequent values will equal this value from $t_{k-1}$.

- The time effect will be less pronounced
- Large residuals, i.e. overestimation of residual variation
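
The LOCF rule itself is mechanical, as the following Python sketch shows. Missing values are represented by `None`, and the series is invented for illustration.

```python
def locf(series):
    """Replace each missing value (None) by the most recent observed value.

    Leading missing values stay missing, since there is nothing to carry forward.
    """
    filled = []
    last = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# Hypothetical series with one single missing value and a drop-out after the fourth visit.
observed = [5.1, 4.8, None, 4.2, None, None]
filled = locf(observed)
print(filled)  # [5.1, 4.8, 4.8, 4.2, 4.2, 4.2]
```

After the drop-out the filled series is flat, which is why the time effect is attenuated.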
Time average imputation
- Subject effect will be underestimated
- Large residuals, i.e. overestimation of residual variation
Model prediction imputation
- Two-step procedure
- Too small residuals, i.e. downwards bias of SD
Likelihood methods
Mixed models for all available observations
MCAR: Missing completely at random
- Complete case analysis OK, but inefficient
- If only few observations are missing, imputations could work, but the variations will be affected
- Likelihood approaches (mixed models) OK: uses all available information
MAR: Missing at random
- Complete case analysis is biased: we disregard subjects with special characteristics
- If only few observations are missing, imputations could work, but the variations will be affected
- Likelihood approaches (mixed models) OK: uses all available information
Mixed models for MAR
Non-ignorable
Nothing works!
Many attempts have been made to model the missing mechanisms, but they all rely on assumptions that cannot be checked.
Example: Effect of exercise on appetite
53 subjects, in three groups:
- Control
- Moderate exercise: 1/2 hour a day
- Extensive exercise: 1 hour a day

Before and after exercise/placebo:
Exercise test, with blood samples taken every half hour from baseline until 3 hours.
Several hormones are measured, e.g. ghrelin
Mads Rosenkilde
HIGH, week=Pre
HIGH, week=Post
Baseline differences, before exercise?
No, not really
Can we then disregard these?
Aim of investigation
Does the exercise change something?
- Does the time course change from the first to the second week (pre/post exercise)?
- If so, does it change more than for the control group?
- And does it apply equally to the two exercise groups?
Model for each group separately
Fixed effects
- week: Pre or Post
- time: 0, 30, 60, 90, 120, 150, 180 minutes
- Interaction week*time: A change in the pattern from before to after exercise

Random effects
- Patients, Sub: 18 (HIGH, MOD), 17 (XCON)
- Interaction Sub*week

Serial covariance structure
- Autoregressive?
- Local error term?
Analysis for each group separately
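    proc mixed data=a0 covtest;
      by grp;                                    /* separate analysis for each group */
      class Sub week time;
      model log_ghrelin=time week*time / ddfm=satterth s cl;
      random intercept / subject=Sub vcorr v;    /* random subject level */
      /* spatial power correlation in numtime within each Sub*week series,
         plus a local measurement error */
      repeated time / subject=Sub*week type=sp(pow)(numtime) local rcorr r;
      lsmeans week*time / slice=time;
    run;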
Output, exercise HIGH
grp=HIGH
The Mixed Procedure
Model Information
    Data Set                     WORK.A0
    Dependent Variable           log_ghrelin
    Covariance Structures        Variance Components, Spatial Power
    Subject Effects              Sub, Sub*week
    Estimation Method            REML
    Residual Variance Method     Profile
    Fixed Effects SE Method      Model-Based
    Degrees of Freedom Method    Satterthwaite

    Class Level Information

    Class    Levels    Values
    Sub          18    ALMA ANAP ANLO ANTF BRFR CASC CHBE DAHA DERJ GRPE HEJE JAKU
                       MIFH MIMR MIMÂİ MINI NIHA THES
    week          2    Post Pre
    time          7    0 30 60 90 120 150 180
Output, exercise HIGH, II
    Dimensions

    Covariance Parameters          4
    Columns in X                  24
    Columns in Z Per Subject       1
    Subjects                      18
    Max Obs Per Subject           14

    Covariance Parameter Estimates

                                         Standard        Z
    Cov Parm     Subject     Estimate       Error    Value      Pr Z
    Intercept    Sub         0.008593    0.003105     2.77    0.0028
    Variance     Sub*week    0.001219    0.000374     3.26    0.0005
    SP(POW)      Sub*week      0.6549      0.1744     3.75    0.0002
    Residual                 0.000817    0.000214     3.81    <.0001
Output, exercise HIGH, III
Type 3 Tests of Fixed Effects
                  Num     Den
    Effect         DF      DF    F Value    Pr > F
    time            6     105       6.77    <.0001
    week*time       7    53.8       1.89    0.0885

Type 3 Tests of Fixed Effects

                  Num     Den
    Effect         DF      DF    F Value    Pr > F
    week            1    16.8       0.24    0.6302
    time            6     105       6.77    <.0001
    week*time       6     105       2.13    0.0560
Output, exercise HIGH, IV
Tests of Effect Slices
                          Num     Den
    Effect       time      DF      DF    F Value    Pr > F
    week*time       0       1    72.2       0.81    0.3725
    week*time      30       1    72.2       0.08    0.7830
    week*time      60       1    72.2       2.57    0.1130
    week*time      90       1    72.2       0.06    0.8052
    week*time     120       1    72.2       2.52    0.1167
    week*time     150       1    72.2       0.30    0.5881
    week*time     180       1    74.5       1.94    0.1682

Solution for Fixed Effects

                                             Standard
    Effect       week    time    Estimate       Error      DF    t Value    Pr > |t|
    week*time    Post       0     0.01350     0.01504    72.2       0.90      0.3725
    week*time    Post      30    0.004157     0.01504    72.2       0.28      0.7830
    week*time    Post      60     0.02413     0.01504    72.2       1.60      0.1130
    week*time    Post      90    0.003723     0.01504    72.2       0.25      0.8052
    week*time    Post     120    -0.02388     0.01504    72.2      -1.59      0.1167
    week*time    Post     150    -0.00818     0.01504    72.2      -0.54      0.5881
    week*time    Post     180     0.02116     0.01521    74.5       1.39      0.1682
Results from all three groups
            $\omega^2$    $\tau^2$    $\rho$    $\sigma^2$    P: week*time (id/int)
    HIGH    0.008593      0.001219    0.6549    0.000817      0.089/0.056
    MOD     0.01998       0.002811    0.7488    0.001522      0.32/0.23
    XCON    0.01528       0.001303    0.2957    0.000398      0.13/0.092
    All     0.01471       0.001724    0.6200    0.000946      0.0848*

*: Test for second-order interaction: grp*week*time
All groups simultaneously
    proc mixed data=a0 covtest;
      class Sub grp week time;
      model log_ghrelin=grp week time grp*week grp*time week*time grp*week*time
            / ddfm=satterth s cl;
      random intercept / subject=Sub(grp) vcorr v;
      repeated time / subject=week*Sub(grp) type=sp(pow)(numtime) local rcorr r;
      lsmeans grp*week*time / slice=week;
      lsmeans grp*week*time / slice=grp;
    run;

    Class Level Information

    Class    Levels    Values
    Sub          53    ALJE ALMA ANAP ANBR ANKR ANLI ANLO ANMO ANTF ASOL ASSA BRFR
                       THBR THES THLU TUMY ULRA
    grp           3    HIGH MOD XCON
    week          2    Post Pre
    time          7    0 30 60 90 120 150 180
Dimensions
    Covariance Parameters          4
    Columns in X                  96
    Columns in Z Per Subject       1
    Subjects                      53
    Max Obs Per Subject           14

    Number of Observations

    Number of Observations Read        736
    Number of Observations Used        736
    Number of Observations Not Used      0
The Mixed Procedure
Covariance Parameter Estimates
                                         Standard        Z
    Cov Parm     Subject     Estimate       Error    Value      Pr Z
    Intercept    Sub          0.01471    0.003062     4.80    <.0001
    Variance     Sub*week    0.001724    0.000279     6.17    <.0001
    SP(POW)      Sub*week      0.6200      0.1007     6.16    <.0001
    Residual                 0.000946    0.000160     5.93    <.0001

Type 3 Tests of Fixed Effects

                       Num     Den
    Effect              DF      DF    F Value    Pr > F
    grp                  2    50.1       0.36    0.7004
    week                 1    49.5       0.03    0.8673
    hours                6     319      15.66    <.0001
    grp*week             2    49.5       0.30    0.7416
    grp*hours           12     319       0.48    0.9236
    week*hours           6     319       1.71    0.1183
    grp*week*hours      12     319       1.62    0.0848
Baseline differences
Even though these are not significantly different, we know that we ought to include the baseline as a covariate, but which baseline?

- The observation at time 0 in the Pre-exercise week?
- The observation at time 0 in each of the two weeks (i.e. separate values for variable baseline, depending upon the week)
Think about which question you want to answer!
Including first baseline as covariate
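    proc sort data=a0;
      by grp Sub;
    run;

    /* one record per subject: the Pre-week value at time 0 */
    data baseline;
      set a0;
      if week='Pre' and time=0;
      ghrelin0=ghrelin;
      log_ghrelin0=log_ghrelin;
      /* keep only the new variables, so that week, time and the outcome
         in a0 are not overwritten in the merge */
      keep grp Sub ghrelin0 log_ghrelin0;
    run;

    data test;
      merge a0 baseline;
      by grp Sub;
    run;

    /* the baseline observation itself is excluded from the outcomes */
    proc mixed data=test covtest;
      where week='Post' or time>0;
      class Sub grp week time;
      model log_ghrelin=log_ghrelin0 grp week time grp*week grp*time week*time grp*week*time
            / ddfm=satterth s cl;
      random intercept / subject=Sub vcorr v;
      repeated time / subject=Sub*week type=sp(pow)(numtime) local rcorr r;
    run;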
Output, with baseline as covariate
Dimensions
    Covariance Parameters          4
    Columns in X                  85
    Columns in Z Per Subject       1
    Subjects                      52
    Max Obs Per Subject           12

    Number of Observations

    Number of Observations Read        632
    Number of Observations Used        621
    Number of Observations Not Used     11

    Covariance Parameter Estimates

                                         Standard        Z
    Cov Parm     Subject     Estimate       Error    Value      Pr Z
    Intercept    Sub         0.001280    0.000357     3.59    0.0002
    Variance     Sub*week    0.001350    0.000277     4.87    <.0001
    SP(POW)      Sub*week      0.4419      0.1439     3.07    0.0021
    Residual                 0.000877    0.000235     3.73    <.0001
Solution for Fixed Effects
                                                      Standard
    Effect          grp    week    hours    Estimate     Error      DF    t Value
    Intercept                                 0.5587    0.1159    48.7       4.82
    log_ghrelin0                              0.7980   0.04092    47.9      19.50
    ....

    Type 3 Tests of Fixed Effects

                      Num     Den
    Effect             DF      DF    F Value    Pr > F
    log_ghrelin0        1    47.9     380.34    <.0001
    grp                 2    48.5       2.31    0.1105
    week                1    53.1       0.31    0.5787
    time                5     272       7.27    <.0001
    grp*week            2    53.1       0.68    0.5113
    grp*time           10     272       0.53    0.8649
    week*time           5     272       2.07    0.0699
    grp*week*time      10     272       1.28    0.2406   <--------------------------
Proceed with some model reduction and model checks.....
Model reduction
Type 3 Tests of Fixed Effects
                     Num     Den
    Effect            DF      DF    F Value    Pr > F
    log_ghrelin0       1    47.9     380.16    <.0001
    grp                2      48       2.31    0.1100
    week               1    55.5       0.29    0.5946
    time               5     282       7.22    <.0001
    week*time          5     282       2.11    0.0638

Type 3 Tests of Fixed Effects

                     Num     Den
    Effect            DF      DF    F Value    Pr > F
    log_ghrelin0       1    49.9     362.33    <.0001
    week               1    55.5       0.29    0.5944
    time               5     282       7.23    <.0001
    week*time          5     282       2.11    0.0642
Could this have been done simpler?
- Analysis of differences
  - because we have only two occasions
  - and exactly the same time points
- Reducing the time courses
  - to averages and/or slopes, to give a simpler model for derived quantities
Effect of the lens strength on visual acuity
7 individuals are looking at a screen, where a light flash appears.
They are looking through 4 lenses, with powers 6/6, 6/18, 6/36 and 6/60, i.e. 4 magnifications: 1, 3, 6 and 10, with 2 eyes

Outcome: Visual acuity, the time lag (milliseconds) between the stimulus and the electrical response at the back of the cortex
Data
Crowder & Hand (1990)
Factors to take into account
Main effects:
- 7 individuals (person), $A_p$
- 2 eyes for each individual (eye), $\alpha_e$
- 4 lens magnifications (power), $\beta_m$

Interactions?
- person*eye, $B_{pe}$
- person*power, $C_{pm}$
- eye*power, $\gamma_{em}$

2-order interaction
- person*eye*power = Residual, $\varepsilon_{pem}$
Model formulation
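$p = 1, \ldots, 7$, $e = 1, 2$, $m = 1, 2, 3, 4$

$$Y_{pem} = \mu_{em} + A_p + B_{pe} + C_{pm} + \varepsilon_{pem}$$

where

$$A_p \sim N(0, \omega^2), \quad B_{pe} \sim N(0, \tau_e^2), \quad C_{pm} \sim N(0, \tau_m^2), \quad \varepsilon_{pem} \sim N(0, \sigma^2)$$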
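
As a sketch, the covariance structure implied by this model can be written out, assuming (as is usual, though not stated on the slide) that all the random terms are mutually independent:

```latex
\begin{aligned}
\mathrm{Var}(Y_{pem}) &= \omega^2 + \tau_e^2 + \tau_m^2 + \sigma^2 \\
\mathrm{Cov}(Y_{pem}, Y_{pem'}) &= \omega^2 + \tau_e^2
    && \text{same person, same eye, } m \neq m' \\
\mathrm{Cov}(Y_{pem}, Y_{pe'm}) &= \omega^2 + \tau_m^2
    && \text{same person, same power, } e \neq e' \\
\mathrm{Cov}(Y_{pem}, Y_{pe'm'}) &= \omega^2
    && \text{same person, } e \neq e',\ m \neq m' \\
\mathrm{Cov}(Y_{pem}, Y_{p'e'm'}) &= 0
    && p \neq p'
\end{aligned}
```

Observations sharing an eye or a magnification within a person are thus more alike than observations that only share the person.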
Factor diagram
[Figure: factor diagram with arrows connecting the factors [I] = [Pa*Ey*Po], Ey*Po, [Pa*Ey], [Pa*Po], Ey, Po, [Pa] and 0]
Not quite a multilevel model, but..
Level    Unit                   Covariates
1        single measurements    Ey*Po
2        interactions
2e       [Pa*Ey]                Ey
2m       [Pa*Po]                Po
3        individuals, [Pa]      overall level
    proc mixed data=visual covtest;
      class patient eye power;
      model acuity=eye power eye*power / s ddfm=satterth;
      * random patient patient*eye patient*power;
      random intercept eye power / subject=patient;
    run;

    Covariance Parameter Estimates

                                       Standard        Z
    Cov Parm     Subject    Estimate      Error    Value    Pr > Z
    Intercept    patient     20.2857    17.1703     1.18    0.1187
    eye          patient     11.6845     8.6646     1.35    0.0887
    power        patient      4.0238     4.0849     0.99    0.1623
    Residual                 12.8333     4.2778     3.00    0.0013

    Type 3 Tests of Fixed Effects

                  Num    Den
    Effect         DF     DF    F Value    Pr > F
    eye             1      6       0.78    0.4112
    power           3     18       2.25    0.1177
    eye*power       3     18       1.06    0.3925
Solution for Fixed Effects
                                            Standard
    Effect       eye      power    Estimate    Error    DF    t Value    Pr > |t|
    Intercept                        114.14   2.6411     6      43.22      <.0001
    eye          left                3.5714   2.6467     6       1.35      0.2259
    eye          right                    0        .     .          .           .
    power                 1         -2.7143   2.1946    18      -1.24      0.2321
    power                 3         -1.8571   2.1946    18      -0.85      0.4085
    power                 6         -2.0000   2.1946    18      -0.91      0.3742
    power                 10              0        .     .          .           .
    eye*power    left     1         -1.1429   2.7080    18      -0.42      0.6780
    eye*power    left     3         -1.2857   2.7080    18      -0.47      0.6407
    eye*power    left     6         -4.5714   2.7080    18      -1.69      0.1086
    eye*power    left     10              0        .     .          .           .
    eye*power    right    1               0        .     .          .           .
    eye*power    right    3               0        .     .          .           .
    eye*power    right    6               0        .     .          .           .
    eye*power    right    10              0        .     .          .           .
Predicted mean profiles
Individual predictions
Residual plot
Omit the interaction eye*power
Covariance Parameter Estimates
                                       Standard        Z
    Cov Parm     Subject    Estimate      Error    Value    Pr > Z
    Intercept    patient     20.2984    17.1692     1.18    0.1186
    eye          patient     11.6592     8.6561     1.35    0.0890
    power        patient      3.9732     4.0118     0.99    0.1610
    Residual                 12.9345     3.9917     3.24    0.0006

Solution for Fixed Effects

                                            Standard
    Effect       eye      power    Estimate    Error    DF    t Value    Pr > |t|
    Intercept                        115.02   2.5076     6      45.87      <.0001
    eye          left                1.8214   2.0628     6       0.88      0.4112
    eye          right                    0        .     .          .           .
    power                 1         -3.2857   1.7271    18      -1.90      0.0732
    power                 3         -2.5000   1.7271    18      -1.45      0.1650
    power                 6         -4.2857   1.7271    18      -2.48      0.0232
    power                 10              0        .     .          .           .

Type 3 Tests of Fixed Effects

              Num    Den
    Effect     DF     DF    F Value    Pr > F
    eye         1      6       0.78    0.4112
    power       3     18       2.25    0.1177
Eye comparisons
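Model:

$$Y_{pem} = \mu_{em} + A_p + B_{pe} + C_{pm} + \varepsilon_{pem}$$

where

$$A_p \sim N(0, \omega^2), \quad B_{pe} \sim N(0, \tau_e^2), \quad C_{pm} \sim N(0, \tau_m^2), \quad \varepsilon_{pem} \sim N(0, \sigma^2)$$

Difference between eye averages:

$$\bar Y_{\cdot e_1 \cdot} - \bar Y_{\cdot e_2 \cdot} = \mu\text{-stuff} + \bar B_{\cdot e_1} - \bar B_{\cdot e_2} + \bar\varepsilon_{\cdot e_1 \cdot} - \bar\varepsilon_{\cdot e_2 \cdot}$$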
Consequence for eye comparisons
$$\mathrm{Var}(\bar Y_{\cdot e_1 \cdot} - \bar Y_{\cdot e_2 \cdot}) = \frac{2}{7}\tau_e^2 + \frac{2}{7 \times 4}\sigma^2$$

- $\tau_e^2$ is rather large (people have different eye preferences)
- We have to demand a larger difference in order to detect it
- P-values rather large
Magnification comparisons
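Model:

$$Y_{pem} = \mu_{em} + A_p + B_{pe} + C_{pm} + \varepsilon_{pem}$$

where

$$A_p \sim N(0, \omega^2), \quad B_{pe} \sim N(0, \tau_e^2), \quad C_{pm} \sim N(0, \tau_m^2), \quad \varepsilon_{pem} \sim N(0, \sigma^2)$$

Difference between magnification averages:

$$\bar Y_{\cdot\cdot m_1} - \bar Y_{\cdot\cdot m_2} = \mu\text{-stuff} + \bar C_{\cdot m_1} - \bar C_{\cdot m_2} + \bar\varepsilon_{\cdot\cdot m_1} - \bar\varepsilon_{\cdot\cdot m_2}$$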
Consequence for magnification comparisons
$$\mathrm{Var}(\bar Y_{\cdot\cdot m_1} - \bar Y_{\cdot\cdot m_2}) = \frac{2}{7}\tau_m^2 + \frac{2}{7 \times 2}\sigma^2$$

- $\tau_m^2$ is not that large (people react more or less identically to the different magnifications)
- We can detect smaller differences
- P-values rather small
If we ignore correlations
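i.e. a model with no random effects

- Eye differences:
  $$\mathrm{Var}(\bar Y_{\cdot e_1 \cdot} - \bar Y_{\cdot e_2 \cdot}) = \frac{2}{7 \times 4}\sigma^2$$
  but another $\sigma^2$
- Magnification differences:
  $$\mathrm{Var}(\bar Y_{\cdot\cdot m_1} - \bar Y_{\cdot\cdot m_2}) = \frac{2}{7 \times 2}\sigma^2$$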
Incorrect analysis, ignoring random effects
    Covariance Parameter Estimates

    Cov Parm    Estimate
    Residual     46.7518

    Solution for Fixed Effects

                                           Standard
    Effect       eye      power    Estimate   Error    DF    t Value    Pr > |t|
    Intercept                        115.02  2.0431    51      56.30      <.0001
    eye          left                1.8214  1.8274    51       1.00      0.3236
    eye          right                    0       .     .          .           .
    power                 1         -3.2857  2.5843    51      -1.27      0.2094
    power                 3         -2.5000  2.5843    51      -0.97      0.3379
    power                 6         -4.2857  2.5843    51      -1.66      0.1034
    power                 10              0       .     .          .           .

    Type 3 Tests of Fixed Effects

              Num    Den
    Effect     DF     DF    Chi-Square    F Value    Pr > ChiSq    Pr > F
    eye         1     51          0.99       0.99        0.3189    0.3236   <-- too small
    power       3     51          3.01       1.00        0.3899    0.3988   <-- too large
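
The variance formulas for eye and magnification comparisons can be checked against the standard errors in the two outputs. The following Python sketch plugs in the covariance parameter estimates from the mixed model without eye*power and from the model with no random effects. The function names are illustrative only.

```python
import math

N_PERSONS, N_EYES, N_POWERS = 7, 2, 4

def se_eye(tau2_e, sigma2):
    """SE of the difference between the two eye averages."""
    return math.sqrt(2 / N_PERSONS * tau2_e + 2 / (N_PERSONS * N_POWERS) * sigma2)

def se_power(tau2_m, sigma2):
    """SE of the difference between two magnification averages."""
    return math.sqrt(2 / N_PERSONS * tau2_m + 2 / (N_PERSONS * N_EYES) * sigma2)

# Mixed model without eye*power: estimates for eye, power and Residual.
mixed_eye = se_eye(11.6592, 12.9345)      # about 2.063
mixed_power = se_power(3.9732, 12.9345)   # about 1.727

# No random effects: the tau2 terms vanish and the residual variance is 46.7518.
naive_eye = se_eye(0.0, 46.7518)          # about 1.827
naive_power = se_power(0.0, 46.7518)      # about 2.584
```

These values agree with the standard errors 2.0628 and 1.7271 in the mixed model output and 1.8274 and 2.5843 in the incorrect analysis: ignoring the random effects makes the eye comparison look more precise and the magnification comparisons less precise than they are.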
Systematic vs. random effects
Could the patients be treated as systematic here? Yes:
    Covariance Parameter Estimates

    Cov Parm    Subject    Estimate
    eye         patient     11.6845
    power       patient      4.0238
    Residual                12.8333

    Type 3 Tests of Fixed Effects

                  Num    Den
    Effect         DF     DF    F Value    Pr > F
    patient         6      6       3.40    0.0810
    eye             1      6       0.78    0.4112
    power           3     18       2.25    0.1177
    eye*power       3     18       1.06    0.3925
Can you think why?
Example with only two time points (baseline and follow-up)

from Vickers, A.J. & Altman, D.G.: Analysing controlled clinical trials with baseline and follow-up measurements. British Medical Journal 2001; 323: 1123-24.

52 patients with shoulder pain are randomized to either
- Acupuncture (n=25)
- Placebo (n=27)

Pain is evaluated on a 100 point scale before and after treatment.
High scores are good
Results on pain scores
Comparison of the two groups

                     Average pain score               Treatment effect
                     placebo        acupuncture       difference
                     (n=27)         (n=25)            (95% CI)            P-value
Baseline             53.9 (14.0)    60.4 (12.3)       6.5                 0.09

Type of analysis
Follow-up            62.3 (17.9)    79.6 (17.1)       17.3 (7.5; 27.1)    0.0008
Changes*             8.4 (14.6)     19.2 (16.1)       10.8 (2.3; 19.4)    0.014
Ancova                                                12.7 (4.1; 21.3)    0.005

* results published in Kleinhenz et al. Pain 1999; 83:235-41.
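
The three treatment effects in the table are linked through the baseline difference. In a common-slope analysis of covariance, the adjusted effect equals the follow-up difference minus the slope times the baseline difference, so the slope implied by the table can be backed out. The Python sketch below uses only the summary numbers above; the "implied slope" is a derived quantity, not a value reported in the source.

```python
# Group means as reported in the table.
baseline = {"placebo": 53.9, "acupuncture": 60.4}
follow_up = {"placebo": 62.3, "acupuncture": 79.6}
ancova_effect = 12.7

diff_baseline = baseline["acupuncture"] - baseline["placebo"]     # 6.5
diff_follow_up = follow_up["acupuncture"] - follow_up["placebo"]  # 17.3

# Follow-up comparison: no baseline adjustment (slope 0).
# Change comparison: full adjustment (slope 1).
diff_change = diff_follow_up - 1.0 * diff_baseline                # 10.8

# Ancova: adjustment with the estimated slope b, so b can be solved for.
implied_slope = (diff_follow_up - ancova_effect) / diff_baseline  # about 0.71

print(f"follow-up {diff_follow_up:.1f}, change {diff_change:.1f}, "
      f"ancova {ancova_effect:.1f}, implied slope {implied_slope:.2f}")
```

With an implied slope of about 0.71, between 0 and 1, the Ancova estimate falls between the change-score and follow-up estimates, as in the table.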
Development of pain, actual and hypothetical
Approaches for pain score analysis
Baseline
- The acupuncture group lies somewhat above placebo

Follow-up
- We would expect the acupuncture group to be higher also after treatment
- Therefore, a direct comparison of follow-up values is unreasonable (we see too big a difference)
Approaches for pain score analysis, II
Change
- Low baseline implies an expected large positive change (regression to the mean)
- The placebo group is therefore expected to increase the most
- Therefore, a direct comparison of changes is unreasonable (we see too small a difference)
General approaches for handling baseline
- Ancova: Analysis of covariance, a special case of multiple regression:
  - Outcome: follow-up data
  - Covariates
    - treatment (factor: acupuncture/placebo)
    - baseline measurement (quantitative)
- Repeated measurement analysis
  - Treatment effect appears as an interaction between treatment and time
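
For two groups and one baseline covariate, the Ancova estimate has a closed form: a pooled within-group slope, and a difference in follow-up means adjusted by that slope. The Python sketch below implements this with the standard library only. The data are invented and noise-free, so that the true slope (0.7) and treatment effect (12) are recovered exactly; the function name is hypothetical.

```python
def ancova_two_groups(baseline, follow_up, treated):
    """Return (slope, adjusted treatment effect) for the model
    follow_up = a + effect * treated + slope * baseline + error."""
    groups = {0: [], 1: []}
    for x, y, g in zip(baseline, follow_up, treated):
        groups[g].append((x, y))
    means = {}
    sxy = sxx = 0.0
    for g, pairs in groups.items():
        mx = sum(x for x, _ in pairs) / len(pairs)
        my = sum(y for _, y in pairs) / len(pairs)
        means[g] = (mx, my)
        # pool the within-group sums of products and squares
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    effect = (means[1][1] - means[0][1]) - slope * (means[1][0] - means[0][0])
    return slope, effect

# Hypothetical noise-free data: follow_up = 10 + 0.7 * baseline + 12 * treated.
baseline = [40.0, 50.0, 60.0, 55.0, 65.0, 75.0]
treated = [0, 0, 0, 1, 1, 1]
follow_up = [10 + 0.7 * x + 12 * g for x, g in zip(baseline, treated)]

slope, effect = ancova_two_groups(baseline, follow_up, treated)
```

In this toy data set the raw follow-up difference is 22.5, of which 10.5 is due to the baseline imbalance of 15 points; the adjustment removes exactly that part.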
Recommendation

When can we use follow-up data?
- when we have a control group and proper randomisation
- when the correlation is low

When can we use differences?
- when we have a control group and proper randomisation
- when the correlation is large

When can we use analysis of covariance?
- always, as long as baseline imbalance is not related to treatment effect!
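
The role of the correlation can be made explicit with a short variance comparison. This is a sketch under simplifying assumptions that are not on the slide: $n$ patients per group, the same variance $\sigma^2$ at baseline and follow-up, correlation $\rho$ between the two measurements, and the uncertainty in the estimated Ancova slope ignored.

```latex
\begin{aligned}
\mathrm{Var}(\text{follow-up difference}) &= \frac{2\sigma^2}{n} \\
\mathrm{Var}(\text{difference in changes}) &= \frac{4\sigma^2(1 - \rho)}{n} \\
\mathrm{Var}(\text{Ancova estimate}) &\approx \frac{2\sigma^2(1 - \rho^2)}{n}
\end{aligned}
```

Under these assumptions changes are more precise than follow-up values when $\rho > 1/2$ and less precise when $\rho < 1/2$, while the Ancova estimate is at least as precise as either for every $\rho$.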
Example: Reading ability

as a function of age/training and cohort/age:
Longitudinal (within-individual, $\beta_W$) effect vs. cross-sectional (between-individual, $\beta_B$) effect:
Model
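- Baseline level: $a_{p1}$ at the age $x_{p1}$: $a_{p1} = \alpha + \beta_B x_{p1} + \delta_p$, $\beta_B$ negative
- Baseline measurement: $y_{p1} = a_{p1} + \varepsilon_{p1}$
- Follow-up level: $a_{p2} = a_{p1} + \beta_W (x_{p2} - x_{p1})$ at the age $x_{p2}$
- Follow-up measurement: $y_{p2} = a_{p2} + \varepsilon_{p2}$
- Difference: $y_{p2} - y_{p1} = \beta_W (x_{p2} - x_{p1}) + (\varepsilon_{p2} - \varepsilon_{p1})$ (the individual level $\delta_p$ cancels)

Model for all y-observations:

$$y_{pj} = \alpha + \beta_B x_{p1} + \beta_W (x_{pj} - x_{p1}) + \delta_p + \varepsilon_{pj}$$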
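
Fitting this model requires two covariates per observation: the age at the first visit and the age difference since then. A Python sketch of that data preparation follows. The variable names `age1`, `difage`, `read` and `id` follow the SAS analysis of the reading data, while the records themselves are invented.

```python
def split_age(records):
    """records: list of (id, age, reading score).

    Returns one dict per record with age1 = age at the first visit
    (between-individual covariate) and difage = age - age1
    (within-individual covariate).
    """
    first_age = {}
    for pid, age, _ in sorted(records, key=lambda r: (r[0], r[1])):
        first_age.setdefault(pid, age)  # smallest age for each individual
    return [
        {"id": pid, "read": score, "age1": first_age[pid], "difage": age - first_age[pid]}
        for pid, age, score in records
    ]

# Hypothetical data: two individuals, each measured twice.
records = [(1, 30, 40.0), (1, 35, 44.0), (2, 40, 25.0), (2, 45, 30.0)]
rows = split_age(records)
for row in rows:
    print(row)
```

The coefficient of `age1` then estimates the cross-sectional effect $\beta_B$ and the coefficient of `difage` the longitudinal effect $\beta_W$.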
Actual analyses
Regression with inter- as well as intra-individual effect of age/time:
    proc mixed data=reading;
      class id;
      model read=age1 difage / s;
      random id;
    run;

    Covariance Parameter Estimates

    Cov Parm    Estimate
    id            245.35
    Residual     27.0449

                             Standard
    Effect       Estimate       Error    DF    t Value    Pr > |t|
    Intercept     78.1267     19.1124     4       4.09      0.0150
    age1          -1.3615      0.5722     5      -2.38      0.0632
    difage         0.8646      0.3121     5       2.77      0.0394
Estimation results
                                                          cross sectional ($\beta_B$)    longitudinal ($\beta_W$)
Method                                                    Cohort effect                  Age effect
$y_{i1}$ vs. $x_{i1}$                                     -1.359 (0.458)                 –
$y_{i2}$ vs. $x_{i2}$                                     -1.245 (0.534)                 –
$y_{ij}$ vs. $x_{ij}$, no individual effect               -1.000 (0.384)                 –
$y_{i2} - y_{i1}$ vs. $x_{i2} - x_{i1}$, no intercept     –                              0.883 (0.211)
$y_{ij}$ vs. $x_{ij}$, random individual effect           –                              0.676 (0.307)
$y_{ij}$ vs. $x_{i1}$ and $(x_{i2} - x_{i1})$             -1.362 (0.572)                 0.865 (0.312)