Multivariate Survival Analysis
Alternative approaches
Prof. L. Duchateau, Ghent University
Overview
The different approaches
- The marginal model
- The fixed effects model
- The stratified model
- The copula model
- The frailty model
Efficiency comparisons
The marginal model
The marginal model approach consists of two stages
Stage 1: Fit the model without taking into account the clustering
Stage 2: Adjust for the clustering in the data
Consistency of marginal model parameter estimates
The ML estimate $\hat{\beta}$ from the Independence Working Model (IWM) is a consistent estimator for $\beta$ (Huster, 1989)
More generally, the ML estimate ($\hat{\beta}$ and baseline parameters) from the IWM is also a consistent estimator for the corresponding population parameters
The parameter $\beta$ refers to the whole population
Adjusting the variance of IWM estimates
The variance estimate based on the inverse of the information matrix of $\hat{\beta}$ is an inconsistent estimator of Var($\hat{\beta}$)
One possible solution: jackknife estimation
General expression of jackknife estimator (Wu, 1986)
with N the number of observations and a the number of parameters
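The general expression itself did not survive extraction. A sketch of the usual delete-one jackknife variance estimator, written to agree with the definitions of N and a on the slide (the notation $\hat{\beta}_{-i}$ for the estimate obtained with observation $i$ left out is an assumption):

```latex
\widehat{\mathrm{Var}}_{J}(\hat{\beta})
  = \frac{N-a}{N}\sum_{i=1}^{N}
    \left(\hat{\beta}_{-i}-\hat{\beta}\right)
    \left(\hat{\beta}_{-i}-\hat{\beta}\right)^{T}
```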
The grouped jackknife estimator
For clustered observations: grouped jackknife estimator
with s the number of clusters
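The grouped version presumably leaves out one cluster at a time instead of one observation. A sketch under that assumption, with $\hat{\beta}_{-i}$ (notation assumed) the estimate obtained without cluster $i$:

```latex
\widehat{\mathrm{Var}}_{GJ}(\hat{\beta})
  = \frac{s-a}{s}\sum_{i=1}^{s}
    \left(\hat{\beta}_{-i}-\hat{\beta}\right)
    \left(\hat{\beta}_{-i}-\hat{\beta}\right)^{T}
```

For the reconstitution data with s = 100 cows and a = 2 parameters (intercept and treatment effect), the factor is (100-2)/100 = 0.98, which is the constant used in the R code for the jackknife.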
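Reconstitution: jackknife
#Jackknife estimator
library(survival)
#b1: treatment effect from the IWM fitted to all 100 cows
bdel<-rep(NA,100)
b1<- -survreg(Surv(timerec,stat)~trt,data=reconstitution,dist="exponential")$coeff[2]
#bdel[i]: treatment effect with cow i (one cluster) left out
for (i in 1:100){
temp<-reconstitution[reconstitution$cowid!=i,]
bdel[i]<- -survreg(Surv(timerec,stat)~trt,data=temp,dist="exponential")$coeff[2]
}
#Grouped jackknife variance with factor (s-a)/s=(100-2)/100
var.robust<-0.98*sum((bdel-b1)^2);stderr.robust<-sqrt(var.robust)
var.robust;stderr.robust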
Reconstitution: jackknife in R?
#Jackknife estimator using cluster() and robust=T command
analexp.jc<- survreg(Surv(timerec,stat)~trt+cluster(cowid),robust=T,dist="exponential")
Error in score %*% vv : non-conformable arguments
Example marginal model with jackknife estimator
Example: Time to reconstitution with drug versus placebo
Estimates from the IWM with time-constant hazard rate assumption are given by
Grouped jackknife = approximation
Jackknife estimator
Adjusts for clustering
Reconstitution example: jackknife estimator is smaller
Time to diagnosis example?
What is then the picture? Simulation
Jackknife estimator-simulations(1)
Is the jackknife estimate always smaller than the estimate from the unadjusted model?
Generate data from the frailty model with
We generate 2000 datasets, each of 100 pairs of two subjects, for the settings
1. Matched clusters, no censoring
2. 20% of clusters with 2 treated or 2 untreated subjects, no censoring
3. Matched clusters, 20% censoring
Jackknife estimator-simulations(2)
The fixed effects model
The fixed effects model is given by
with the fixed effect for cluster i,
Assume for simplicity
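The model equation is missing from the extracted slide. A sketch of a fixed effects proportional hazards model, in which the symbol $c_i$ for the cluster effect, the reference constraint $c_1 = 0$, and the constant baseline hazard $\lambda$ are assumptions chosen to match the exponential `survreg` fits used for the reconstitution data:

```latex
h_{ij}(t) = h_0(t)\exp\left(x_{ij}^{T}\beta + c_i\right),
\qquad c_1 = 0,
\qquad h_0(t) = \lambda \ \text{(constant baseline hazard)}
```

Under these assumptions the untreated udder quarter of cow 1 has hazard $\lambda$ and that of cow $i$ has hazard $\lambda\exp(c_i)$.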
The fixed effects model: ML solution
General survival likelihood expression
For fixed effects model using assumptions
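Both likelihood expressions were lost in extraction. A sketch in standard notation, assuming $y_{ij}$ is the observed time and $\delta_{ij}$ the event indicator of subject $j$ in cluster $i$, and assuming a constant baseline hazard $\lambda$ with cluster effects $c_i$ (all symbols are assumptions):

```latex
L = \prod_{i=1}^{s}\prod_{j=1}^{n_i}
    h_{ij}(y_{ij})^{\delta_{ij}}\, S_{ij}(y_{ij})

L(\lambda,\beta,c) = \prod_{i=1}^{s}\prod_{j=1}^{n_i}
    \left[\lambda\exp\left(x_{ij}^{T}\beta + c_i\right)\right]^{\delta_{ij}}
    \exp\left[-\lambda\, y_{ij}\exp\left(x_{ij}^{T}\beta + c_i\right)\right]
```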
Reconstitution: fixed effects model
#Fixed effects model
res.fixed<-survreg(Surv(timerec,stat)~trt+as.factor(cowid),dist="exponential",data=reconstitution)
res.fixed
summary(res.fixed)
Treatment effect for reconstitution data using R-function survreg (loglin. model)
Output treatment effect
Parameter interpretation
The intercept corresponds to the constant hazard of the untreated udder quarter of cow 1
The intercept combined with the cowid coefficient of cow i corresponds to the constant hazard of the untreated udder quarter of cow i
Cowid65: hazard ≈ 0
Cowid100: hazard exp(-21+18.8)=0.11
Treatment effect: HR=exp(0.185)=1.203 with 95% CI [0.83;1.75]
Investigate cow characteristic: heifer
#Fixed effects model
res.fixed<-survreg(Surv(timerec,stat)~heifer+as.factor(cowid),dist="exponential",data=reconstitution)
res.fixed
summary(res.fixed)
Heifer effect for reconstitution data introducing heifer first in the model
Hazard ratio impossibly high
Output heifer effect
Add cow characteristic: heifer after cowid?
#Fixed effects model
res.fixed<-survreg(Surv(timerec,stat)~trt+as.factor(cowid)+heifer,dist="exponential",data=reconstitution)
res.fixed
summary(res.fixed)
Heifer effect for reconstitution data introducing cowid first in the model
Hazard ratio equal to 1
Example: between cluster covariate (2)
Exercise
Investigate method and type of fracture in diagnosis data
Note on overparametrisation and confounding
Cell means model: no overparametrisation
Milk reduction as a function of low and high inoculation dose
Factor effects model: overparametrisation
Milk reduction as a function of low and high inoculation dose
Confounding between temperature in F and C
Effect of temperature on bacterial growth (log(CFU))
Temperature in °C vs °F
Conversion from °F to °C
Infinite number of model representations
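The effect of entering the same temperature on both scales can be illustrated with a small base R sketch. The data are simulated and purely hypothetical (the names tempC, tempF and logcfu are assumptions, not from the slides). Because °F is an exact linear function of °C, the design matrix loses one rank and lm() cannot estimate both slopes.

```r
# Hypothetical data: bacterial growth (log CFU) at five temperatures
set.seed(1)
tempC  <- rep(c(20, 25, 30, 35, 40), each = 4)
tempF  <- 32 + 9 / 5 * tempC              # exact linear function of tempC
logcfu <- 2 + 0.1 * tempC + rnorm(length(tempC), sd = 0.2)

# Fit a linear model with both temperature scales
fit <- lm(logcfu ~ tempC + tempF)
coef(fit)                                 # the tempF coefficient is NA (aliased)

# The design matrix has 3 columns but only rank 2
X <- model.matrix(~ tempC + tempF)
qr(X)$rank
```

Any split of the temperature effect between the two columns gives the same fitted values, which is what is meant by an infinite number of model representations.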
Confounding between blocks and block factors
Cow factor is not confounded with treatment factor
Fitting model with cow and treatment
Model with cow and treatment vs cow alone
Adding the heifer factor
Infinite number of model representations
Example: heifer - cowid confounded
There is complete confounding between fixed heifer effect and cowid
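The complete confounding can be checked on a toy design in base R. The six cows and their heifer status below are hypothetical (not the reconstitution data). The point is only that a covariate which is constant within cow lies in the column space of the cow dummies, while treatment, which varies within cow, does not.

```r
# Hypothetical design: 6 cows, 2 udder quarters each (one treated, one not)
cow    <- factor(rep(1:6, each = 2))
trt    <- rep(c(0, 1), times = 6)
heifer <- rep(c(1, 0, 1, 0, 0, 1), each = 2)   # constant within cow

# Design matrix with cow entered before heifer
X <- model.matrix(~ trt + cow + heifer)
ncol(X)        # intercept, trt, 5 cow dummies, heifer
qr(X)$rank     # one less than the number of columns

# heifer is reproduced exactly by the intercept and the cow dummies
fit.h <- lm(heifer ~ cow)
max(abs(residuals(fit.h)))
```

Which of the confounded columns is dropped or given an extreme estimate depends on the order of the terms, in line with the different heifer results obtained when heifer is entered before or after cowid.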
The stratified model
Based on the Cox model
where now the baseline hazard function is unspecified
Cox (1972) showed that if only the order of events matters, the survival likelihood reduces to the partial likelihood
Partial likelihood for the stratified model
The stratified model is given by
Maximisation of partial likelihood
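The model and partial likelihood formulas are missing from the extracted slides. A sketch in standard notation, assuming a separate unspecified baseline hazard $h_{i0}(t)$ per cluster and writing $R_i(y_{ij})$ for the risk set within cluster $i$ at time $y_{ij}$ (all symbols are assumptions):

```latex
h_{ij}(t) = h_{i0}(t)\exp\left(x_{ij}^{T}\beta\right)

L_{p}(\beta) = \prod_{i=1}^{s}\prod_{j=1}^{n_i}
  \left[
    \frac{\exp\left(x_{ij}^{T}\beta\right)}
         {\sum_{l \in R_i(y_{ij})}\exp\left(x_{il}^{T}\beta\right)}
  \right]^{\delta_{ij}}
```

Risk sets are formed within clusters only, so the cluster-specific baseline hazards cancel and need not be estimated.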
Reconstitution: stratified model
#stratified Cox model
library(survival)
res.strat<-coxph(Surv(timerec,stat)~trt+strata(cowid),data=reconstitution)
res.strat
summary(res.strat)
Example for bivariate data
The partial likelihood for reconstitution data
Estimates
Exercise
Fit the stratified model for the diagnosis data
The copula model
The copula model is often considered to be a two-stage model
First obtain the population (marginal) survival functions for each subject in a cluster.
The copula function then links these population survival functions to generate the joint survival function (Frees et al., 1996).
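In symbols, the linking step can be sketched as follows, assuming clusters of size $n$, population survival functions $S_{j}(t)$ for subject $j$, and a copula $C_\theta$ with parameter $\theta$ (notation assumed):

```latex
S(t_1,\ldots,t_n) = C_\theta\left(S_{1}(t_1),\ldots,S_{n}(t_n)\right)
```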
Example of copula model
Time to diagnosis of being healed
Bivariate copula model likelihood
Four different possible contributions of a cluster
Estimated population survival functions are inserted, only copula parameters unknown
The Clayton copula
The Clayton copula (Clayton, 1978) is
The Clayton copula belongs to the family of Archimedean copulas, i.e.,
with in the Clayton copula case
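The formulas are missing from the extracted slide. The bivariate Clayton form below agrees with the quantity `P<-s1^(-theta)+s2^(-theta)-1` used in the R code for the diagnosis data. The generator $\varphi$ shown is one common convention and is an assumption (other texts scale it differently):

```latex
S(t_1,t_2) = \left[S_1(t_1)^{-\theta} + S_2(t_2)^{-\theta} - 1\right]^{-1/\theta},
\qquad \theta > 0

C(u,v) = \varphi^{-1}\left(\varphi(u) + \varphi(v)\right),
\qquad \varphi(u) = u^{-\theta} - 1,
\qquad \varphi^{-1}(s) = (1+s)^{-1/\theta}
```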
Clayton copula likelihood
Two censored observations
Observation j censored
No observations censored
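The three contributions can be sketched by differentiating the Clayton joint survival function, writing $P = S_1(t_1)^{-\theta} + S_2(t_2)^{-\theta} - 1$ and $f_k$ for the marginal density of the uncensored observation $k$ (notation assumed):

```latex
\text{Two censored observations:}\quad
  S(t_1,t_2) = P^{-1/\theta}

\text{Observation } j \text{ censored, } k \text{ observed:}\quad
  -\frac{\partial S(t_1,t_2)}{\partial t_k}
  = P^{-(1+1/\theta)}\, S_k(t_k)^{-(\theta+1)}\, f_k(t_k)

\text{No observations censored:}\quad
  \frac{\partial^2 S(t_1,t_2)}{\partial t_1\,\partial t_2}
  = (1+\theta)\, P^{-(2+1/\theta)}
    \left[S_1(t_1)S_2(t_2)\right]^{-(\theta+1)} f_1(t_1)\, f_2(t_2)
```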
Example Clayton copula (1)
For diagnosis of being healed data, first fit separate models for RX and US technique
For instance, separate parametric models
Fitting the copula: two stage approach
#Clayton copula for time to diagnosis
library(survival)
timetodiag <- read.table("c:\\docs\\onderwijs\\survival\\flames\\diag.csv", header = T,sep=";")
t1<-timetodiag$t1/30;t2<-timetodiag$t2/30;c1<-timetodiag$c1;c2<-timetodiag$c2;
#Stage 1: separate Weibull fits, converted to lambda (l) and rho (r)
surv1<-survreg(Surv(t1,c1)~1);l1<-exp(-surv1$coeff/surv1$scale);r1<-(1/surv1$scale)
surv2<-survreg(Surv(t2,c2)~1);l2<-exp(-surv2$coeff/surv2$scale);r2<-(1/surv2$scale)
#Estimated marginal survival (s) and density (f) functions
s1<-exp(-l1*t1^(r1));f1<-s1*r1*l1*t1^(r1-1)
s2<-exp(-l2*t2^(r2));f2<-s2*r2*l2*t2^(r2-1)
#Stage 2: negative log likelihood in theta; c=1 event, c=0 censored
#One term per censoring pattern: both censored, only 1 observed, only 2 observed, both observed
loglikcon.gamma<-function(theta){
P<-s1^(-theta)+ s2^(-theta)-1
loglik<- (1-c1)*(1-c2)*(-(1/theta)*log(P))+
c1*(1-c2)*(-(1+1/theta)*log(P)-(theta+1)*log(s1)+log(f1))+
c2*(1-c1)*(-(1+1/theta)*log(P)-(theta+1)*log(s2)+log(f2))+
c1*c2*(log(1+theta)-(2+1/theta)*log(P)-(theta+1)*log(s1)+log(f1)-(theta+1)*log(s2)+log(f2))
-sum(loglik)}
nlm(loglikcon.gamma,c(0.5))
Example Clayton copula (2)
Estimates for marginal models are
Based on these estimates we obtain
which can be inserted in the likelihood expression, which is then maximized for $\theta$
Exercise
Fit the copula model to the diagnosis data as one-stage model
Fitting the copula: one stage approach
#Clayton copula for time to diagnosis - one stage
#param=(theta,l1,l2,r1,r2); t1,t2,c1,c2 as read in for the two stage approach
#c=1 event, c=0 censored; one term per censoring pattern
loglikcon3.gamma<-function(param){
theta<-param[1];l1<-param[2];l2<-param[3];r1<-param[4];r2<-param[5]
s1<-exp(-l1*t1^(r1));f1<-s1*r1*l1*t1^(r1-1)
s2<-exp(-l2*t2^(r2));f2<-s2*r2*l2*t2^(r2-1)
P<-s1^(-theta)+ s2^(-theta)-1
loglik<- (1-c1)*(1-c2)*(-(1/theta)*log(P))+
c1*(1-c2)*(-(1+1/theta)*log(P)-(theta+1)*log(s1)+log(f1))+
c2*(1-c1)*(-(1+1/theta)*log(P)-(theta+1)*log(s2)+log(f2))+
c1*c2*(log(1+theta)-(2+1/theta)*log(P)-(theta+1)*log(s1)+log(f1)-(theta+1)*log(s2)+log(f2))
-sum(loglik)}
nlm(loglikcon3.gamma, c(0.5,1,1,1,1))
Example Clayton copula (3)
For parametric marginal models, the likelihood can also be maximized simultaneously for all parameters leading to
Thus, for small sample sizes, the two-stage approach can differ substantially from the one-stage approach
Example Clayton copula (4)
Alternatives can be used for marginal survival functions
Nonparametric
Semiparametric
leading to
The frailty model
The ‘shared’ frailty model is given by
with the frailty
An alternative formulation is given by
with
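The two formulations are missing from the extracted slide. A sketch in standard notation, where $u_i$ for the frailty of cluster $i$ and $w_i$ for its logarithm are assumed symbols:

```latex
h_{ij}(t) = h_0(t)\, u_i \exp\left(x_{ij}^{T}\beta\right)

h_{ij}(t) = h_0(t) \exp\left(x_{ij}^{T}\beta + w_i\right),
\qquad w_i = \log u_i
```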
The gamma frailty model
Gamma frailty distribution is easiest choice
with mean 1 and variance $\theta$
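The density is missing from the extracted slide. A sketch of the one-parameter gamma density with mean 1 and variance $\theta$, which matches the shape $1/\theta$ and rate $1/\theta$ used in the `qgamma(0.05,1/theta,1/theta)` calls of the plotting code ($u$ as the frailty symbol is an assumption):

```latex
f_U(u) = \frac{u^{1/\theta-1}\exp\left(-u/\theta\right)}
              {\theta^{1/\theta}\,\Gamma\left(1/\theta\right)},
\qquad E(U) = 1,
\qquad \mathrm{Var}(U) = \theta
```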
Marginal likelihood for the gamma frailty model
Start from conditional (on frailty) likelihood
with containing the baseline hazard parameters, e.g., for Weibull
Marginal likelihood: integrating out the frailties …
Integrate out frailties using distribution
with
Closed form expression for marginal likelihood
Integration leads to (homework)
and taking log and summing over s clusters
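The closed form is missing from the extracted slides. The sketch below is reconstructed to agree term by term with the R function `likelihood.weibul` given for the insemination data. The symbols are assumptions: $d_i$ is the number of events in cluster $i$, $h_{ij} = h_0(y_{ij})\exp(x_{ij}^{T}\beta)$, and $H_{i.} = \sum_j H_0(y_{ij})\exp(x_{ij}^{T}\beta)$ is the cumulative hazard summed over cluster $i$.

```latex
L_{i}(u_i) = \prod_{j=1}^{n_i}\left(u_i h_{ij}\right)^{\delta_{ij}}
             \exp\left(-u_i H_{i.}\right)

L_{marg,i} = \int_0^\infty L_i(u)\, f_U(u)\, du
  = \frac{\Gamma\left(d_i + 1/\theta\right)\,\theta^{d_i}\,
          \prod_{j=1}^{n_i} h_{ij}^{\delta_{ij}}}
         {\Gamma\left(1/\theta\right)\left(1+\theta H_{i.}\right)^{d_i+1/\theta}}

l_{marg} = \sum_{i=1}^{s}\left[
   d_i\log\theta
   + \log\Gamma\left(d_i + 1/\theta\right) - \log\Gamma\left(1/\theta\right)
   - \left(d_i + 1/\theta\right)\log\left(1+\theta H_{i.}\right)
   + \sum_{j=1}^{n_i}\delta_{ij}\log h_{ij}
 \right]
```

Since $\Gamma(d_i+1/\theta)/\Gamma(1/\theta) = \prod_{l=1}^{d_i}(d_i + 1/\theta - l)$, the gamma-function terms correspond to the `prod(x+1/exp(p[2])-seq(1,x))` expression in the code.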
Maximisation of marginal likelihood leads to estimates
Marginal likelihood no longer contains frailties. By maximisation, estimates of the model parameters are obtained
Furthermore, the asymptotic variance-covariance matrix can be obtained as the inverse of the observed information matrix
with the Hessian matrix with entries
Entries of Hessian matrix from marginal likelihood
As an example, the entry of the Hessian matrix for is given by
Example for the parametric gamma frailty model
Consider time to first insemination data
Assume Weibull distributed event times and model the heifer effect
We have the following conditional functions
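The conditional functions are missing from the extracted slide. A sketch consistent with the parametrisation in the R function `likelihood.weibul` ($\lambda$, $\rho$, $\beta$), assuming $x_{ij}$ is the heifer covariate and $u_i$ the frailty of herd $i$:

```latex
h_{ij}(t \mid u_i) = \lambda\rho\, t^{\rho-1}\, u_i \exp\left(\beta x_{ij}\right)

H_{ij}(t \mid u_i) = \lambda\, t^{\rho}\, u_i \exp\left(\beta x_{ij}\right)

S_{ij}(t \mid u_i) = \exp\left[-\lambda\, t^{\rho}\, u_i \exp\left(\beta x_{ij}\right)\right]
```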
R program: read the data
#read data
setwd("c://docs//onderwijs//survival//Flames//notas//")
insemfix<-read.table("insemfix.csv", header=T,sep=",")
#Create four column vectors, four different variables
herd<-insemfix$herdnr;timeto<-(insemfix$end*12/365.25)
stat<-insemfix$score;heifer<-insemfix$par2
#Derive some values
n<-length(levels(as.factor(herd)));
di<-aggregate(stat,by=list(herd),FUN=sum)[,2];r<-sum(di)
R program: the function
#Observable likelihood weibull
#l=exp(p[1]), theta=exp(p[2]), beta=p[3], rho=exp(p[4])
#r=total number of events, di=number of events by herd
likelihood.weibul<-function(p){
cumhaz<-exp(heifer*p[3])*(timeto^(exp(p[4])))*exp(p[1])
cumhaz<-aggregate(cumhaz,by=list(herd),FUN=sum)[,2]
lnhaz<-stat*(heifer*p[3]+log((exp(p[4])*timeto^(exp(p[4])-1))*exp(p[1])))
lnhaz<-aggregate(lnhaz,by=list(herd),FUN=sum)[,2]
lik<-r*log(exp(p[2]))-sum((di+1/exp(p[2]))*log(1+cumhaz*exp(p[2])))+sum(lnhaz)+
sum(sapply(di,function(x) ifelse(x==0,0,log(prod(x+1/exp(p[2])-seq(1,x))))))
-lik}
R program: the output
res<-nlm(likelihood.weibul,c(log(0.128),log(0.39),0.15,log(1.76)), hessian=T)
lambda<-exp(res$estimate[1])
theta<-exp(res$estimate[2])
beta<-res$estimate[3]
rho<-exp(res$estimate[4])
Time to first insemination: effect of heifer with herd as cluster
ML estimates, annotated as follows
lambda: monthly hazard rate
rho: monotone increasing hazard
theta: variance of frailties
beta: within herd heifer effect
exp(beta): hazard ratio with 95 % CI
Using parfm library
library(parfm)
#Create four column vectors, four different variables
herd<-as.factor(insemfix$herdnr);timeto<-(insemfix$end*12/365.25)
stat<-insemfix$score;heifer<-insemfix$par2
insem<-data.frame(herd=herd,timeto=timeto,stat=stat,heifer=heifer)
parfm(Surv(timeto,stat)~heifer,cluster="herd",data=insem,frailty="gamma")
Interpretation of frailty variance
The parameter $\theta$ refers to the variability at the hazard level: difficult to interpret!
Maybe plot the hazard function for subjects with a particular frailty
Plotting hazard of insemination for multiparous cows
#Interpretation of parameters
lambda<-0.174;theta<-0.394;rho<-1.769
#Convert lambda from the monthly to the daily time scale
lambda<-lambda*((365.25/12)^(-rho))
time<-seq(1,350);timet<-time+29.5
#Hazard for frailty 1 and for the 5% and 95% quantiles of the frailty distribution
h1f0<-lambda*rho*time^(rho-1)
h1f05<-qgamma(0.05,1/theta,1/theta)*lambda*rho*time^(rho-1)
h1f95<-qgamma(0.95,1/theta,1/theta)*lambda*rho*time^(rho-1)
Plotting hazard of insemination for multiparous cows
#Hazards
par(mfrow=c(1,2)); par(adj=0.5);par(cex=1.2)
#y-axis range from the multiparous curves only (heifer curves h2f05, h2f95 are not yet defined here)
plot(c(0,360),c(min(h1f05),max(h1f95)),type='n',xlab="Time after calving (days)",ylab="hazard")
lines(timet,h1f0,lty=1,lwd=3);lines(timet,h1f05,lty=1,lwd=1)
lines(timet,h1f95,lty=1,lwd=1)
par(adj=0);text(1,0.14,"Multiparous cows")
Exercise
Plot hazard of insemination for heifers
Plotting hazard of insemination for heifers
#Interpretation of parameters
lambda<-0.174;theta<-0.394;rho<-1.769;beta<--0.153
lambda<-lambda*((365.25/12)^(-rho))
h2f0<-lambda*rho*exp(beta)*time^(rho-1)
h2f05<-qgamma(0.05,1/theta,1/theta)*lambda*rho*exp(beta)*time^(rho-1)
h2f95<-qgamma(0.95,1/theta,1/theta)*lambda*rho*exp(beta)*time^(rho-1)
Plotting hazard of insemination
#Hazards
par(mfrow=c(1,2)); par(adj=0.5);par(cex=1.2)
plot(c(0,360),c(min(h1f05,h2f05),max(h1f95,h2f95)),type='n',xlab="Time after calving (days)",ylab="hazard")
lines(timet,h1f0,lty=1,lwd=3);lines(timet,h1f05,lty=1,lwd=1)
lines(timet,h1f95,lty=1,lwd=1)
par(adj=0);text(1,0.14,"Multiparous cows")
par(adj=0.5)
plot(c(0,360),c(min(h1f05,h2f05),max(h1f95,h2f95)),type='n',xlab="Time after calving (days)",ylab="hazard")
lines(timet,h2f0,lty=1,lwd=3);lines(timet,h2f05,lty=1,lwd=1)
lines(timet,h2f95,lty=1,lwd=1)
par(adj=0);text(1,0.14,"Heifers")
Interpretation of frailty variance
The parameter $\theta$ refers to the variability at the hazard level: difficult to interpret!
[Figure: hazard of insemination against time after calving (days), two panels: Multiparous cows, Heifers]
Transformation to median
Density function of transformation of random variable
Median of Weibull distribution
To find the median survival time for cluster i, put the survival function of cluster i equal to 0.5
Density of median for Weibull distribution
The density function is then
with
and
Density of median for Weibull distribution
Leading to
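The derivation steps are missing from the extracted slides. The sketch below reconstructs them for a multiparous cow (covariate 0), so that the final expression agrees with the R function `calcm` used for plotting. The symbols $M_i$ for the median in herd $i$ and $u_i$ for the frailty are assumptions; for heifers $\lambda$ is replaced by $\lambda\exp(\beta)$.

```latex
S_i(M_i) = \exp\left(-\lambda M_i^{\rho} u_i\right) = 0.5
\;\Longrightarrow\;
M_i = \left(\frac{\log 2}{\lambda u_i}\right)^{1/\rho}

u(m) = \frac{\log 2}{\lambda m^{\rho}},
\qquad
\left|\frac{du}{dm}\right| = \frac{\rho\log 2}{\lambda m^{\rho+1}}

f_M(m) = f_U\left(u(m)\right)\left|\frac{du}{dm}\right|
  = \rho\left(\frac{\log 2}{\theta\lambda}\right)^{1/\theta}
    \left(\frac{1}{m}\right)^{1+\rho/\theta}
    \frac{1}{\Gamma\left(1/\theta\right)}
    \exp\left(-\frac{\log 2}{\theta\lambda m^{\rho}}\right)
```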
Plotting density function of median for multiparous cows
lambda<-0.174;theta<-0.394;rho<-1.769;beta<--0.153;lambda<-lambda*((365.25/12)^(-rho))
#Medians
calcm<-function(m){rho * (log(2)/(theta*lambda))^(1/theta) * (1/m)^(1+rho/theta) * (1/gamma(1/theta)) *exp(-log(2)/(theta*lambda*m^(rho)))}
timedens<-seq(1,200,1)
densmd1<-sapply(timedens,calcm)
plot(c(0,230),c(min(densmd1),max(densmd1)),type='n',xlab="Median time to first insemination (days)",ylab="Density function median")
lines(timedens+29.5,densmd1,lty=1,lwd=3)
Exercise
Plot density of median for heifers
Plotting density function of median for multiparous cows and heifers
lambda<-0.174;theta<-0.394;rho<-1.769;beta<--0.153;lambda<-lambda*((365.25/12)^(-rho))
#Medians
calcm<-function(m){rho * (log(2)/(theta*lambda))^(1/theta) * (1/m)^(1+rho/theta) * (1/gamma(1/theta)) *exp(-log(2)/(theta*lambda*m^(rho)))}
timedens<-seq(1,200,1)
densmd1<-sapply(timedens,calcm)
lambda<-lambda*exp(beta)
densmd2<-sapply(timedens,calcm)
plot(c(0,230),c(min(densmd1,densmd2),max(densmd1,densmd2)),type='n',xlab="Median time to first insemination (days)",ylab="Density function median")
lines(timedens+29.5,densmd1,lty=1,lwd=3);lines(timedens+29.5,densmd2,lty=2,lwd=3)
legend(130,0.015,legend=c("Multiparous","Heifer"),lty=c(1,2))
Variability of median time to first insemination between herds
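As a sanity check on the density of the median, the following base R sketch verifies numerically that the function `calcm` from the plotting code integrates to about 1, and compares it with medians simulated directly from gamma frailties. The parameter values are those of the plotting code for multiparous cows; the grid, the seed, the 60-day cut-off and the number of simulated herds are arbitrary choices made for this illustration.

```r
# Parameter values as in the plotting code (lambda converted to the daily scale)
theta <- 0.394; rho <- 1.769
lambda <- 0.174 * ((365.25 / 12)^(-rho))

# Density of the herd-specific median, multiparous cows
calcm <- function(m) {
  rho * (log(2) / (theta * lambda))^(1 / theta) * (1 / m)^(1 + rho / theta) *
    (1 / gamma(1 / theta)) * exp(-log(2) / (theta * lambda * m^rho))
}

# Riemann sum on a fine grid: total probability mass should be close to 1
step  <- 0.5
grid  <- seq(step, 2000, by = step)
total <- sum(calcm(grid)) * step
total

# Simulate frailties with mean 1 and variance theta, transform to medians
set.seed(123)
u   <- rgamma(1e5, shape = 1 / theta, rate = 1 / theta)
med <- (log(2) / (lambda * u))^(1 / rho)

# Proportion of herds with median below 60 days: simulation versus density
p60.mc  <- mean(med <= 60)
p60.num <- sum(calcm(grid[grid <= 60])) * step
c(simulated = p60.mc, from.density = p60.num)
```

Times are measured on the scale used in `calcm`, that is before the shift of 29.5 days applied on the plot axis.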
Exercise
Derive the density function for the percentage survival at a particular time t
Transformation to percentage survival
The percentage in cluster i with first insemination at time t is given by
Thus
and
Interpretation of frailty variance in terms of % events at time t
The density function is then obtained by
and thus
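The formulas are missing from the extracted slides. A sketch of the transformation under the Weibull gamma frailty model, writing $P_i$ for the proportion with first insemination by time $t$ in herd $i$ and $c = \lambda t^{\rho}\exp(\beta x)$ (both symbols are assumptions, with $x = 0$ for multiparous cows and $x = 1$ for heifers):

```latex
P_i = 1 - \exp\left(-c\, u_i\right)
\;\Longrightarrow\;
u(p) = -\frac{\log(1-p)}{c},
\qquad
\frac{du}{dp} = \frac{1}{c\,(1-p)}

f_P(p) = f_U\left(u(p)\right)\frac{du}{dp}
  = \frac{\left[-\log(1-p)\right]^{1/\theta-1}\,(1-p)^{1/(\theta c)-1}}
         {\left(\theta c\right)^{1/\theta}\,\Gamma\left(1/\theta\right)},
\qquad 0 < p < 1
```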
Variability of % first insemination at time t between herds
[Figure, two panels: Multiparous cows, Heifers]
Efficiency comparisons in the reconstitution data example
Estimates (se) for reconstitution data