BSTA 6652 PROBLEM SET FOUR

You should know how to answer these questions by hand calculation.

Problem One: Using the data on aneuploid tumors found in Table 1.6 on page 12, answer the following questions.

1 a. Use Kaplan-Meier to estimate the survival function at 12 weeks and its standard error.

SAS code for 1 a (data set and variable names as defined in the appendix):

/* Number 1a */
title "Kaplan-Meier estimate";
proc lifetest data=aneuploidy method=km plots=survival(cl);
  time weeks*status(0);
run;
/* End Number 1a */

SAS output for 1 a:

Since there are no death times between 10 and 12, we use S(10) as our estimate of S(12).

S(12) ≈ 0.9038
SE(S(12)) ≈ 0.0409
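
The hand calculation behind these two numbers can be sketched outside SAS. The Python below is only an illustrative check, not the course's method: the function name `km_at` is made up here, and the data are copied from the appendix data step. It applies the product-limit formula and Greenwood's formula directly.

```python
import math
from collections import Counter

# Aneuploid tumor data (weeks), copied from the appendix data step.
deaths = [1, 3, 3, 4, 10, 13, 13, 16, 16, 24, 26, 27, 28, 30, 30, 32, 41, 51,
          65, 67, 70, 72, 73, 77, 91, 93, 96, 100, 104, 157, 167]
censored = [61, 74, 79, 80, 81, 87, 87, 88, 89, 93, 97, 101, 104, 108, 109,
            120, 131, 150, 231, 240, 400]

def km_at(t, deaths, censored):
    """Return (S(t), Greenwood standard error) for the Kaplan-Meier estimator."""
    all_times = deaths + censored
    surv, greenwood_sum = 1.0, 0.0
    for ti, d in sorted(Counter(deaths).items()):
        if ti > t:
            break
        at_risk = sum(1 for x in all_times if x >= ti)  # Y_i: still under observation at ti
        surv *= 1 - d / at_risk                          # product-limit factor
        greenwood_sum += d / (at_risk * (at_risk - d))   # Greenwood term
    return surv, surv * math.sqrt(greenwood_sum)

s12, se12 = km_at(12, deaths, censored)
print(f"S(12) = {s12:.4f}, SE = {se12:.4f}")  # S(12) = 0.9038, SE = 0.0409
```

Because the last death at or before week 12 occurs at week 10, the loop stops there, which is the same as using S(10) for S(12).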

1 b. Find linear and log-log 95% confidence intervals for the survival function at 60 weeks.

SAS code for 1 b (data set and variable names as defined in the appendix):

/* Number 1b */
title "1 b. Kaplan-Meier estimate. Aneuploid data. Linear CI.";
proc lifetest data=aneuploidy method=km conftype=linear outsurv=linear alpha=0.05 noprint;
  time weeks*status(0);
run;
proc print data=linear;
run;

title "1 b. Kaplan-Meier estimate. Aneuploid data. Log log CI.";
proc lifetest data=aneuploidy method=km conftype=loglog outsurv=loglog alpha=0.05 noprint;
  time weeks*status(0);
run;
proc print data=loglog;
run;
/* End Number 1b */

SAS output for 1 b linear:

SAS output for 1 b loglog:

Since there are no death times between 51 and 60, we use S(51) as our estimate of S(60).

Linear C.I. for S(60) = (0.5245, 0.7832)

Log-log C.I. for S(60) = (0.5083, 0.7659)
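
Both intervals can be reproduced by hand. The following Python is a sketch under two assumptions taken from the appendix data: 18 of the 52 aneuploid patients die at or before week 51, and nobody is censored before week 61, so Greenwood's standard error reduces to the binomial form. The variable names are illustrative.

```python
import math

n = 52              # aneuploid patients at risk at time 0
deaths_by_60 = 18   # deaths at or before week 51; none occur in (51, 60]
z = 1.959964        # 97.5th percentile of the standard normal

s = 1 - deaths_by_60 / n          # Kaplan-Meier estimate of S(60)
se = math.sqrt(s * (1 - s) / n)   # Greenwood SE; binomial form is valid with no earlier censoring

# Linear interval: S +/- z * SE
linear = (s - z * se, s + z * se)

# Log-log interval: S ** exp(+/- z * SE / (S * ln S))
theta = z * se / (s * math.log(s))   # negative, because ln S < 0
loglog = tuple(sorted((s ** math.exp(theta), s ** math.exp(-theta))))

print(f"linear:  ({linear[0]:.4f}, {linear[1]:.4f})")
print(f"log-log: ({loglog[0]:.4f}, {loglog[1]:.4f})")
```

The log-log interval is not symmetric about the estimate and always stays inside (0, 1), which is why it sits slightly lower than the linear interval here.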

1 c. Use Nelson-Aalen to estimate H(60) and S(60). Compare to the KM estimate of S(60).

SAS code for 1 c (data set and variable names as defined in the appendix):

/* Number 1c */
title "1 c. Nelson-Aalen estimate H. Aneuploid data.";
proc lifetest nelson method=km data=aneuploidy alpha=0.05;
  time weeks*status(0);
run;
/* End Number 1c */

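SAS output for 1 c:

Since there are no death times between 51 and 60, we use H(51) as our estimate of H(60).

H(60) ≈ 0.4178

Thus S(60) ≈ exp(-H(60)) ≈ exp(-0.4178) = 0.6585

The Kaplan-Meier estimate of S(60) is 0.6538, which is slightly smaller than the S(60) obtained from the Nelson-Aalen cumulative hazard estimate. This is expected: 1 - x ≤ exp(-x) for every x, so each Kaplan-Meier factor 1 - d/Y is no larger than the matching factor exp(-d/Y) in exp(-H).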
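
As a check on the hand calculation, the sum of d/Y terms can be accumulated directly. This Python sketch uses the data from the appendix; the function name `na_and_km` is hypothetical and is not part of the SAS solution.

```python
import math
from collections import Counter

# Aneuploid tumor data (weeks), copied from the appendix data step.
deaths = [1, 3, 3, 4, 10, 13, 13, 16, 16, 24, 26, 27, 28, 30, 30, 32, 41, 51,
          65, 67, 70, 72, 73, 77, 91, 93, 96, 100, 104, 157, 167]
censored = [61, 74, 79, 80, 81, 87, 87, 88, 89, 93, 97, 101, 104, 108, 109,
            120, 131, 150, 231, 240, 400]

def na_and_km(t, deaths, censored):
    """Return (Nelson-Aalen H(t), Kaplan-Meier S(t)) from the same risk sets."""
    all_times = deaths + censored
    cum_haz, km = 0.0, 1.0
    for ti, d in sorted(Counter(deaths).items()):
        if ti > t:
            break
        at_risk = sum(1 for x in all_times if x >= ti)
        cum_haz += d / at_risk   # Nelson-Aalen increment
        km *= 1 - d / at_risk    # Kaplan-Meier factor
    return cum_haz, km

h60, s60_km = na_and_km(60, deaths, censored)
s60_na = math.exp(-h60)
print(f"H(60) = {h60:.4f}, exp(-H(60)) = {s60_na:.4f}, KM S(60) = {s60_km:.4f}")
```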

1 d. Use the life-table method to estimate S(12) and its standard error, under the interval setting: [0, 20), [20, 40), [40, 80), [80, 120) and 120 or more.

SAS code for 1 d (data set and variable names as defined in the appendix):

/* Number 1d */
title "1 d. Life table estimate. Aneuploid data.";
proc lifetest data=aneuploidy method=act intervals=20, 40, 80, 120;
  time weeks*status(0);
run;
/* End Number 1d */

SAS output for 1 d:

Let S* be the life-table estimator. Use linear interpolation between the life-table estimates S*(0)=1 and S*(20)=0.8269 to estimate S(12).

S*(12) = S*(0) + ((12-0)/(20-0))(S*(20)-S*(0)) = 1 + (12/20)(0.8269-1) = 1 - 0.10386 = 0.8961

To estimate the standard error of S*(12), use the standard error of S*(20), SE(S*(20))=0.0525, and SE(S*(0))=0.

V(S*(12)) = V(1 + (12/20)(S*(20)-1)) = (12/20)^2 * V(S*(20)), so

SE(S*(12)) = (12/20)*SE(S*(20)) = (12/20)*0.0525 = 0.0315
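
The interpolation can be checked numerically. This Python sketch assumes the counts for the first interval as read from the appendix data (9 deaths and no censored observations in [0, 20)) and uses the usual life-table effective sample size; the variable names are illustrative.

```python
import math

n = 52               # patients entering [0, 20)
deaths_0_20 = 9      # deaths at weeks 1, 3, 3, 4, 10, 13, 13, 16, 16
censored_0_20 = 0    # the first censored time is week 61

eff = n - censored_0_20 / 2                      # effective sample size
s20 = 1 - deaths_0_20 / eff                      # life-table S*(20)
se20 = s20 * math.sqrt(deaths_0_20 / (eff * (eff - deaths_0_20)))

w = (12 - 0) / (20 - 0)          # interpolation weight
s12 = 1 + w * (s20 - 1)          # S*(12) on the line from (0, 1) to (20, S*(20))
se12 = w * se20                  # S*(0) = 1 is a constant, so only S*(20) contributes

print(f"S*(20) = {s20:.4f}, SE = {se20:.4f}")
print(f"S*(12) = {s12:.4f}, SE = {se12:.4f}")
```

Note that this estimate (about 0.896) is below the Kaplan-Meier value 0.9038 because interpolation spreads the deaths in [0, 20) evenly over the interval.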

1 e. Compare aneuploid and diploid tumors.

SAS code for 1 e (data set and variable names as defined in the appendix):

/* Number 1e */
title "Compare aneuploid to diploid tumor.";
proc lifetest data=tumor method=km plots=survival(cl) graphics outsurv=a;
  time weeks*status(0);
  strata type;
run;

/* log(-log S) for the proportional hazards check */
data a2;
  set a;
  logH=log(-log(survival));
run;

proc gplot data=a2;
  symbol1 i=join width=2 value=triangle c=steelblue;
  symbol2 i=join width=2 value=circle c=red;
  plot logH*weeks=type;
run;
/* End Number 1e */

SAS output for 1 e, i:

The survival curves are significantly different. Visual inspection of the Kaplan-Meier curves shows that the diploid tumor patients die off faster.

This is supported by the test results. The other two tests in the PROC LIFETEST output (log-rank and Wilcoxon) are only marginally significant, but the likelihood ratio test concludes that the aneuploid and diploid groups differ, with a p-value of 0.0469.

Part ii:

The plot of log(-log S) vs. time does not show parallel curves for the aneuploid and diploid tumors. This indicates that the proportional hazards assumption is not satisfied here.
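
The reason parallel curves are the thing to look for can be written out in one line. The derivation below assumes a proportional hazards model with a constant hazard ratio c between the two groups; the symbols c, h_0, h_1 are introduced here for illustration and do not appear in the SAS output.

```latex
\begin{align*}
h_1(t) = c\,h_0(t)
  &\;\Longrightarrow\; H_1(t) = c\,H_0(t)
   \;\Longrightarrow\; S_1(t) = S_0(t)^{c}, \\
\log\{-\log S_1(t)\} &= \log c + \log\{-\log S_0(t)\}.
\end{align*}
```

Under that assumption the two log(-log S) curves differ by the constant log c at every t, so curves that cross or drift apart are evidence against proportional hazards.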

2a. Table 5.6 verification.

SAS code for 2a (data set and variable names as defined in the appendix):

/* Number 2a */
title "Wean example. Reproduce Table 5.6.";
proc lifetest method=life data=weaning intervals=2 3 5 7 11 17 25 37 53 plots=(S, H);
  time time*status(0);
  freq number;
run;
/* End Number 2a */

SAS output for 2a:

Column “Effective Sample Size” matches “Number exposed to weaning” in Table 5.6.

Column “PDF evaluated at midpoint” matches “Est pdf at middle” in Table 5.6. (SAS rounds to 6 decimals, but the book only rounds to 4.)

Column “PDF standard error” matches “Est s.d. of pdf” in Table 5.6.

Column “Hazard” matches “Est. hazard at middle” in Table 5.6.

Column “Hazard standard error” matches “Est stand dev of hazard” in Table 5.6.

So the output matches Table 5.6 in the book.
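
For a check that does not depend on SAS, the main life-table columns can be rebuilt from the counts in the appendix data step. The Python below is a sketch using the standard actuarial formulas (effective sample size Y - W/2, midpoint pdf S·q/width, midpoint hazard d/(width·(Y' - d/2))); the function and key names are invented for this example.

```python
# Interval left endpoints; the last interval is [53, infinity).
cuts = [0, 2, 3, 5, 7, 11, 17, 25, 37, 53]
deaths = [77, 71, 119, 75, 109, 148, 107, 74, 85, 27]   # weaned in each interval
censored = [2, 3, 6, 9, 7, 5, 3, 0, 0, 0]               # lost to follow-up in each interval

def life_table(cuts, deaths, censored):
    """Return one dict per interval with the main life-table columns."""
    rows = []
    entering = sum(deaths) + sum(censored)   # number entering the first interval
    surv = 1.0                               # survival at the start of the interval
    for j, (d, w) in enumerate(zip(deaths, censored)):
        eff = entering - w / 2               # effective sample size
        q = d / eff                          # conditional probability of the event
        row = {"start": cuts[j], "effective_n": eff, "survival_at_start": surv}
        if j + 1 < len(cuts):                # midpoint estimates need a finite width
            width = cuts[j + 1] - cuts[j]
            row["pdf_mid"] = surv * q / width
            row["hazard_mid"] = d / (width * (eff - d / 2))
        rows.append(row)
        surv *= 1 - q
        entering -= d + w
    return rows

table = life_table(cuts, deaths, censored)
for r in table:
    print(r["start"], r["effective_n"], round(r["survival_at_start"], 4))
```

The rows starting at 37 and 53 give the survival values used in part 2b.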

2b. Estimation of h(60).

SAS output for 2b:

Note that the tail of the estimated survival function is nearly linear from t=25 to the last estimated point, t=53. Since 60 is not far from 53, it is somewhat reasonable to extend that straight line, with the same slope, from t=53 to t=60. Unfortunately, this gives a negative value for S(60) (0.0313 + 7(-0.006143) ≈ -0.0117, using the values below), so it cannot be used to estimate h(60). Instead we can only say that S(60) is close to zero, so h(60) = -S’(60)/S(60) is large, larger than h(53). Here is how you can estimate h(53):

From the SAS output table (or Table 5.6), S(37)≈0.1296 and S(53)≈0.0313.

The slope is (S(37)-S(53))/(37-53) = (0.1296-0.0313)/(-16) = -0.006143.

h(53) = -S’(53)/S(53) ≈ 0.006143/0.0313 = 0.1963.
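
The arithmetic can be verified in a few lines. This Python sketch takes the two survival values quoted from the life table as inputs and treats the tail as a straight line, which is the approximation made in the text and not a fitted model.

```python
# Life-table survival estimates quoted in the text.
s37, s53 = 0.1296, 0.0313

slope = (s37 - s53) / (37 - 53)     # slope of the line through the last two points
s60_linear = s53 + (60 - 53) * slope  # straight-line extrapolation to t = 60
h53 = -slope / s53                  # h(t) = -S'(t) / S(t) evaluated at t = 53

print(f"slope = {slope:.6f}")
print(f"extrapolated S(60) = {s60_linear:.4f}")
print(f"h(53) = {h53:.4f}")
```

The extrapolated S(60) comes out negative, which is why the text falls back on h(53) as a lower bound for h(60).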

Appendix: SAS Code

*SAS Homework 4;

/* Question 1: Using the data on aneuploidy tumors found in Table 1.6 on page 12,
   answer the following questions: */
data aneuploidy;
  input weeks status;
  datalines;
1 1
3 1
3 1
4 1
10 1
13 1
13 1
16 1
16 1
24 1
26 1
27 1
28 1
30 1
30 1
32 1
41 1
51 1
65 1
67 1
70 1
72 1
73 1
77 1
91 1
93 1
96 1
100 1
104 1
157 1
167 1
61 0
74 0
79 0
80 0
81 0
87 0
87 0
88 0
89 0
93 0
97 0
101 0
104 0
108 0
109 0
120 0
131 0
150 0
231 0
240 0
400 0
;
run;

/* Question 1a: Use the Kaplan-Meier method to estimate the survival function
   at 12 weeks and its standard error */
proc lifetest data=aneuploidy method=km conftype=loglog
  plots=survival(cl) plots=(s, ls, lls) graphics outsurv=a;
  time weeks*status(0);
run;

/* Question 1b: Find a linear and a complementary log-log confidence interval
   for S(60) at 95% confidence level. Compare them */
proc lifetest data=aneuploidy method=km CONFTYPE=LINEAR
  plots=survival(cl) plots=(s, ls, lls) graphics outsurv=b;
  time weeks*status(0);
run;

/* Question 1c: Use the Nelson-Aalen method to estimate the cumulative hazard rate
   at 60 weeks. Estimate S(60) by your estimate of H(60) and compare to the
   Kaplan-Meier estimate of S(60). Which one is bigger? */
proc lifetest data=aneuploidy method=km NELSON conftype=loglog
  plots=survival(cl) plots=(s, ls, lls) graphics outsurv=a;
  time weeks*status(0);
run;

/* Question 1d: Repeat (a) using the life-table method under the interval setting
   [0,20), [20,40), [40,80), [80,120), and 120 or more */
proc lifetest data=aneuploidy method=act intervals=20 40 80 120
  plots=(s, ls, lls) graphics outsurv=c;
  time weeks*status(0);
run;

/* Question 1e: Now we compare the survivorship of aneuploidy tumor patients
   to diploid tumor patients (data on Table 1.6) */
data tumor;
  length type $ 10;
  input weeks status type$;
  if type='1' then type='aneuploid';
  else type='diploid';
  datalines;
1 1 1
3 1 1
3 1 1
4 1 1
10 1 1
13 1 1
13 1 1
16 1 1
16 1 1
24 1 1
26 1 1
27 1 1
28 1 1
30 1 1
30 1 1
32 1 1
41 1 1
51 1 1
65 1 1
67 1 1
70 1 1
72 1 1
73 1 1
77 1 1
91 1 1
93 1 1
96 1 1
100 1 1
104 1 1
157 1 1
167 1 1
61 0 1
74 0 1
79 0 1
80 0 1
81 0 1
87 0 1
87 0 1
88 0 1
89 0 1
93 0 1
97 0 1
101 0 1
104 0 1
108 0 1
109 0 1
120 0 1
131 0 1
150 0 1
231 0 1
240 0 1
400 0 1
1 1 2
3 1 2
4 1 2
5 1 2
5 1 2
8 1 2
12 1 2
13 1 2
18 1 2
23 1 2
26 1 2
27 1 2
30 1 2
42 1 2
56 1 2
62 1 2
69 1 2
104 1 2
104 1 2
112 1 2
129 1 2
181 1 2
8 0 2
67 0 2
76 0 2
104 0 2
176 0 2
231 0 2
;
run;

/* i. Are their survival curves significantly different? */
proc lifetest data=tumor method=km conftype=loglog
  plots=survival(cl) plots=(s, ls, lls) graphics outsurv=a;
  time weeks*status(0);
  strata type;
  symbol1 v=none color=black line=1;
  symbol2 v=none color=black line=2;
run;

/* ii. Are their hazard functions proportional? Justify your answer by graphs */
data a2;
  set a;
  s=survival;
  logH=log(-log(s));
  lweek=log(weeks);
run;

proc gplot data=a2;
  symbol1 i=join width=2 value=triangle c=steelblue;
  symbol2 i=join width=2 value=circle c=red;
  plot logH*weeks=type logH*lweek=type;
run;
quit;

/* Question 2: Use SAS and the data in the 3rd and 4th columns of Table 5.6
   to verify the life-table estimates in Table 5.6 */
/* (a) Provide your SAS code and output which match Table 5.6 */
data weaning;
  input time status number;
  datalines;
1 1 77
1 0 2
2.5 1 71
2.5 0 3
4 1 119
4 0 6
6 1 75
6 0 9
9 1 109
9 0 7
14 1 148
14 0 5
21 1 107
21 0 3
31 1 74
31 0 0
45 1 85
45 0 0
60 1 27
60 0 0
;
run;

proc lifetest method=life data=weaning intervals=2 3 5 7 11 17 25 37 53 plots=(S, H);
  time time*status(0);
  freq number;
run;