BSTA 6652 PROBLEM SET FOUR
You should know how to answer these questions by hand calculation.
Problem One: Using the data on aneuploid tumors found in Table 1.6 on page
12, answer the following questions.
1 a. Use Kaplan-Meier to estimate the survival function at 12 weeks and its
standard error.
SAS code for 1 a:
/* Number 1a */
title "Kaplan-Meier estimate";
proc lifetest data=aneuploid method=km plots=survival(cl);
  time time*stat(0);
run;
/* End Number 1a */
SAS output for 1 a:
Since there are no death times between 10 and 12, we use S(10) as our estimate
of S(12):
S(12) ≈ 0.9038
SE(S(12)) ≈ 0.0409
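Hand check for 1 a. The data step below is a minimal sketch, not part of the original solution. It applies the product-limit formula and Greenwood's formula to the four distinct death times up to week 10, with d deaths and y patients at risk counted from the data listed in the appendix. The data set name km_check and the variable names are illustrative assumptions. The last printed row should agree, up to rounding, with the values of S(12) and SE(S(12)) quoted above.

```sas
/* Hand check of 1a: Kaplan-Meier with Greenwood's formula through t = 10. */
/* Columns: t = death time, d = deaths at t, y = number at risk just before t. */
data km_check;
  input t d y;
  retain s 1 gsum 0;
  s    = s * (1 - d / y);            /* product-limit update             */
  gsum = gsum + d / (y * (y - d));   /* running Greenwood sum            */
  se   = s * sqrt(gsum);             /* standard error of S(t)           */
  datalines;
1 1 52
3 2 51
4 1 49
10 1 48
;
run;

proc print data=km_check;
run;
```

Because no observation is censored before week 61, the Greenwood sum telescopes to 1/47 - 1/52, so the standard error equals the binomial form sqrt(S(1-S)/52).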
1 b. Find linear and log-log 95% confidence intervals for survival at 60 weeks.
SAS code for 1 b:
/* Number 1b */
title "1 b. Kaplan-Meier estimate. Aneuploid data. Linear CI.";
proc lifetest data=aneuploid method=km conftype=linear
              outsurv=linear alpha=0.05 noprint;
  time time*stat(0);
run;
proc print data=linear;
run;

title "1 b. Kaplan-Meier estimate. Aneuploid data. Log log CI.";
proc lifetest data=aneuploid method=km conftype=loglog
              outsurv=loglog alpha=0.05 noprint;
  time time*stat(0);
run;
proc print data=loglog;
run;
/* End Number 1b */
SAS output for 1 b linear:
SAS output for 1 b loglog:
Since there are no death times between 51 and 60, we use S(51) as our estimate
of S(60).
Linear C.I. for S(60) = (0.5245, 0.7832)
Log-log C.I. for S(60) = (0.5083, 0.7659)
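Hand check for 1 b. A minimal sketch under stated assumptions: S(51) = 34/52 (18 deaths among 52 patients, none censored before week 61), so Greenwood's standard error reduces to the binomial form. The log-log interval uses theta = exp(z*SE/(S*log S)) with limits S^(1/theta) and S^theta. The data set name ci_check and the variable names are illustrative, not from the original solution.

```sas
/* Hand check of 1b: linear and log-log 95% confidence intervals for S(60). */
data ci_check;
  s  = 34 / 52;                          /* KM estimate S(51), carried to t = 60 */
  se = sqrt(s * (1 - s) / 52);           /* Greenwood SE (no censoring yet)      */
  z  = probit(0.975);                    /* normal quantile, about 1.96          */

  lin_lo = s - z * se;                   /* linear interval                      */
  lin_hi = s + z * se;

  theta = exp(z * se / (s * log(s)));    /* log-log transformation               */
  ll_lo = s ** (1 / theta);
  ll_hi = s ** theta;

  put lin_lo= lin_hi= ll_lo= ll_hi=;     /* results are written to the SAS log   */
run;
```

The values in the log should match the two intervals quoted above to four decimals. Note that the log-log interval is not symmetric about 0.6538 and always stays inside (0, 1).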
1 c. Use Nelson-Aalen to estimate H(60) & S(60). Compare to KM estimate S(60).
SAS code for 1 c:
/* Number 1c */
title "1 c. Nelson-Aalen estimate H. Aneuploid data.";
proc lifetest nelson method=km data=aneuploid alpha=0.05;
  time time*stat(0);
run;
/* End Number 1c */
SAS output for 1 c:
…
Since there are no death times between 51 and 60, we use H(51) as our estimate
of H(60): H(60) ≈ 0.4178.
Thus S(60) ≈ exp(-H(60)) ≈ exp(-0.4178) = 0.6585.
The Kaplan-Meier estimate of S(60) is 0.6538, which is, as expected, slightly
smaller than the estimate of S(60) based on the Nelson-Aalen cumulative hazard.
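Hand check for 1 c. The sketch below accumulates the Nelson-Aalen sum of d/y over the 14 distinct death times up to week 51 and carries the Kaplan-Meier product alongside it for comparison. The counts d and y are read off the data in the appendix. The data set name na_check and the variable names are illustrative assumptions.

```sas
/* Hand check of 1c: Nelson-Aalen H(t), exp(-H(t)), and Kaplan-Meier S(t). */
/* Columns: t = death time, d = deaths at t, y = number at risk just before t. */
data na_check;
  input t d y;
  retain h 0 s_km 1;
  h    = h + d / y;              /* Nelson-Aalen increment           */
  s_na = exp(-h);                /* survival estimate based on H     */
  s_km = s_km * (1 - d / y);     /* Kaplan-Meier for comparison      */
  datalines;
1 1 52
3 2 51
4 1 49
10 1 48
13 2 47
16 2 45
24 1 43
26 1 42
27 1 41
28 1 40
30 2 39
32 1 37
41 1 36
51 1 35
;
run;

proc print data=na_check;
run;
```

The ordering of the two estimates is no accident: exp(-x) >= 1 - x for every x, so each factor exp(-d/y) is at least as large as the matching Kaplan-Meier factor 1 - d/y, and the product exp(-H) can never fall below the Kaplan-Meier estimate.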
1 d. Use the life-table method to estimate S(12) and its standard error, under the
interval setting: [0, 20), [20, 40), [40, 80), [80, 120) and 120 or more.
SAS code for 1 d:
/* Number 1d */
title "1 d. Life table estimate. Aneuploid data.";
proc lifetest data=aneuploid method=act intervals=20, 40, 80, 120;
  time time*stat(0);
run;
/* End Number 1d */
SAS output for 1 d:
Let S* be the life-table estimator. Use linear interpolation to estimate
S(12) from the life-table estimates S*(0)=1 and S*(20)=0.8269.
S*(12) = S*(0) + [(12-0)/(20-0)]*(S*(20)-S*(0)) = 1 + (12/20)*(0.8269-1)
= 1 - 0.10386 = 0.8961
To estimate the standard error of S*(12), use SE(S*(20))=0.0525 and SE(S*(0))=0.
V(S*(12)) = V(1 + (12/20)*(S*(20)-1)) = (12/20)^2*V(S*(20)), so
SE(S*(12)) = (12/20)*SE(S*(20)) = (12/20)*0.0525 = 0.0315
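Hand check for 1 d. A short sketch of the interpolation arithmetic, taking S*(20) = 0.8269 and SE(S*(20)) = 0.0525 from the life-table output as given. The data set name lt_check and the variable names are illustrative assumptions.

```sas
/* Hand check of 1d: linear interpolation of the life-table estimate at t = 12. */
data lt_check;
  s0  = 1;       se0  = 0;         /* S*(0) and its standard error          */
  s20 = 0.8269;  se20 = 0.0525;    /* S*(20) and SE from the life table     */
  w   = (12 - 0) / (20 - 0);       /* interpolation weight, 0.6             */

  s12  = s0 + w * (s20 - s0);      /* interpolated survival estimate        */
  se12 = w * se20;                 /* SE scales by w because S*(0) is fixed */

  put s12= se12=;                  /* results are written to the SAS log    */
run;
```

The standard error scales by the weight alone because S*(0) = 1 is a constant, so the only random term in the interpolated value is (12/20)*S*(20).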
1 e. Compare aneuploid and diploid tumors.
SAS code for 1 e:
/* Number 1e */
/* Data set tumor has variables weeks, status and type (see appendix). */
title "Compare aneuploid to diploid tumor.";
proc lifetest data=tumor method=km plots=survival(cl) graphics outsurv=a;
  time weeks*status(0);
  strata type;
run;

data a2;
  set a;
  logH = log(-log(survival));
run;

proc gplot data=a2;
  symbol1 i=join width=2 value=triangle c=steelblue;
  symbol2 i=join width=2 value=circle c=red;
  plot logH*weeks=type;
run;
/* End Number 1e */
SAS output for 1 e, i:
The survival curves are significantly different. Visual inspection of the
Kaplan-Meier curves shows that the diploid tumor patients die off faster.
Moreover, this is supported by the test results. Although the other two tests
(log-rank and Wilcoxon) are only marginally significant, the likelihood ratio test
concludes that survival differs between aneuploid and diploid tumors, with a
p-value of 0.0469.
Part ii:
The plot of log(-log S) vs. time for aneuploid and diploid tumors does not show
parallel curves. This indicates that the proportional hazards assumption is not
satisfied here.
2a. Table 5.6 verification.
SAS code for 2a:
/* Number 2a */
title "Wean example. Reproduce Table 5.6.";
proc lifetest method=life data=wean
              intervals=2 3 5 7 11 17 25 37 53 plots=(S, H);
  time time*stat(0);
  freq number;
run;
/* End Number 2a */
SAS output for 2a:
Column “Effective Sample Size” matches “Number exposed to weaning” in T. 5.6.
Column “PDF evaluated at midpoint” matches “Est pdf at middle” in T. 5.6. (SAS
rounds to 6 decimals, but the book only rounds to 4.)
Column “PDF standard error” matches “Est s.d. of pdf” in T. 5.6.
Column “Hazard” matches “Est. hazard at middle” in T. 5.6.
Column “Hazard standard error” matches “Est stand dev of hazard” in T. 5.6.
So the output matches with Table 5.6 in the book.
2b. Estimation of h(60).
SAS output for 2b:
Note that the tail of the estimated survival function is nearly linear from t=25
to the last estimated point at t=53. Since 60 is not far from 53, it is
reasonable to extend the same straight line from t=53 to t=60 to estimate S(t).
Unfortunately, this extrapolation gives a negative value for S(60), so it cannot
be used to estimate h(60). Instead we can only say that S(60) is close to zero,
so h(60) = -S’(60)/S(60) is large, larger than h(53). Here is how to estimate
h(53):
From the SAS output table above (or Table 5.6),
S(37)≈0.1296 and S(53)≈0.0313.
The slope is (S(37)-S(53))/(37-53) = (0.1296-0.0313)/(-16) ≈ -0.006144.
h(53) = -S’(53)/S(53) ≈ 0.006144/0.0313 ≈ 0.1963.
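Hand check for 2b. The sketch below repeats the slope calculation, extends the line to t=60 to show why the extrapolated survival is unusable, and computes h(53). It takes S(37) = 0.1296 and S(53) = 0.0313 as given. The data set name haz_check and the variable names are illustrative assumptions.

```sas
/* Hand check of 2b: tail slope, extrapolated S(60), and hazard at t = 53. */
data haz_check;
  s37 = 0.1296;                       /* life-table survival at t = 37       */
  s53 = 0.0313;                       /* life-table survival at t = 53       */

  slope = (s53 - s37) / (53 - 37);    /* slope of the nearly linear tail     */
  s60   = s53 + slope * (60 - 53);    /* straight-line extrapolation to 60   */
  h53   = -slope / s53;               /* h(t) = -S'(t) / S(t) at t = 53      */

  put slope= s60= h53=;               /* results are written to the SAS log  */
run;
```

The extrapolated s60 comes out slightly below zero, which is why only the qualitative statement about h(60) can be made.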
Appendix: SAS Code
*SAS Homework 4;

/*Question 1: Using the data on aneuploid tumors found in Table 1.6 on
page 12, answer the following questions:*/
data aneuploidy;
  input weeks status;
  datalines;
1 1
3 1
3 1
4 1
10 1
13 1
13 1
16 1
16 1
24 1
26 1
27 1
28 1
30 1
30 1
32 1
41 1
51 1
65 1
67 1
70 1
72 1
73 1
77 1
91 1
93 1
96 1
100 1
104 1
157 1
167 1
61 0
74 0
79 0
80 0
81 0
87 0
87 0
88 0
89 0
93 0
97 0
101 0
104 0
108 0
109 0
120 0
131 0
150 0
231 0
240 0
400 0
;
run;

/*Question 1a: Use the Kaplan-Meier method to estimate the survival
function at 12 weeks and its standard error */
proc lifetest data=aneuploidy method=km conftype=loglog
              plots=survival(cl) plots=(s, ls, lls) graphics outsurv=a;
  time weeks*status(0);
run;

/*Question 1b: Find a linear and a complementary log-log confidence
interval for S(60) at 95% confidence level. Compare them*/
proc lifetest data=aneuploidy method=km CONFTYPE=LINEAR
              plots=survival(cl) plots=(s, ls, lls) graphics outsurv=b;
  time weeks*status(0);
run;

/*Question 1c: Use Nelson-Aalen method to estimate the cumulative hazard
rate at 60 weeks. Estimate S(60) by your estimate of H(60) and compare
to the Kaplan-Meier estimate of S(60). Which one is bigger?*/
proc lifetest data=aneuploidy method=km NELSON conftype=loglog
              plots=survival(cl) plots=(s, ls, lls) graphics outsurv=a;
  time weeks*status(0);
run;

/*Question 1d: Repeat a-) using life table method under the interval
setting [0,20), [20,40), [40,80), [80,120), and 120 or more */
proc lifetest data=aneuploidy method=act intervals=20 40 80 120
              plots=(s, ls, lls) graphics outsurv=c;
  time weeks*status(0);
run;

/*Question 1e: Now we compare the survivorship of aneuploid tumor
patients to diploid tumor patients (data on Table 1.6)*/
data tumor;
  length type $ 10;
  input weeks status type $;
  if type='1' then type='aneuploid';
  else type='diploid';
  datalines;
1 1 1
3 1 1
3 1 1
4 1 1
10 1 1
13 1 1
13 1 1
16 1 1
16 1 1
24 1 1
26 1 1
27 1 1
28 1 1
30 1 1
30 1 1
32 1 1
41 1 1
51 1 1
65 1 1
67 1 1
70 1 1
72 1 1
73 1 1
77 1 1
91 1 1
93 1 1
96 1 1
100 1 1
104 1 1
157 1 1
167 1 1
61 0 1
74 0 1
79 0 1
80 0 1
81 0 1
87 0 1
87 0 1
88 0 1
89 0 1
93 0 1
97 0 1
101 0 1
104 0 1
108 0 1
109 0 1
120 0 1
131 0 1
150 0 1
231 0 1
240 0 1
400 0 1
1 1 2
3 1 2
4 1 2
5 1 2
5 1 2
8 1 2
12 1 2
13 1 2
18 1 2
23 1 2
26 1 2
27 1 2
30 1 2
42 1 2
56 1 2
62 1 2
69 1 2
104 1 2
104 1 2
112 1 2
129 1 2
181 1 2
8 0 2
67 0 2
76 0 2
104 0 2
176 0 2
231 0 2
;
run;

/*i-Are their survival curves significantly different?*/
proc lifetest data=tumor method=km conftype=loglog
              plots=survival(cl) plots=(s, ls, lls) graphics outsurv=a;
  time weeks*status(0);
  strata type;
  symbol1 v=none color=black line=1;
  symbol2 v=none color=black line=2;
run;

/*ii-Are their hazard functions proportional? Justify your answer by graphs*/
data a2;
  set a;
  s = survival;
  logH = log(-log(s));
  lweek = log(weeks);
run;

proc gplot data=a2;
  symbol1 i=join width=2 value=triangle c=steelblue;
  symbol2 i=join width=2 value=circle c=red;
  plot logH*weeks=type logH*lweek=type;
run;
quit;

/*Question 2: Use SAS and data of the 3rd and 4th column of Table 5.6 to
verify the life-table estimates on Table 5.6*/
/*a-) Provide your SAS code and output which match with Table 5.6*/
data weaning;
  input time status number;
  datalines;
1 1 77
1 0 2
2.5 1 71
2.5 0 3
4 1 119
4 0 6
6 1 75
6 0 9
9 1 109
9 0 7
14 1 148
14 0 5
21 1 107
21 0 3
31 1 74
31 0 0
45 1 85
45 0 0
60 1 27
60 0 0
;
run;

proc lifetest method=life data=weaning
              intervals=2 3 5 7 11 17 25 37 53 plots=(S, H);
  time time*status(0);
  freq number;
run;