dwaipayan-mukhopadhyay
Survival Report of 76 breast cancer patients under three different treatments
Summary
This report applies the Kaplan-Meier estimator and life-table analysis to clinical data: the survival times of 76 breast cancer patients under three different treatments. Life tables are presented for each treatment, along with the survival and hazard functions for comparison. The test results suggest that treatment R performs slightly better than the others, although the survival rate is low under all three treatments and the differences in response between them are small.
Introduction
Survival times are data that measure follow-up time from a defined starting point to the occurrence of a given event, for example the time from the beginning to the end of a remission period, or the time from the diagnosis of a disease to death. Standard statistical techniques cannot usually be applied because the underlying distribution is rarely Normal and the data are often 'censored'. A survival time is described as censored when there is a follow-up time but the event has not yet occurred or is not known to have occurred. We consider methods for the analysis of data in which the response of interest is the time until some event occurs; such events are generically referred to as failures. Survival analysis covers both parametric and nonparametric methods; the emphasis here is on the more recent nonparametric developments, with applications to medical research.
The data set comes from the follow-up to a clinical trial conducted in the early 1980s on 76 breast cancer patients to investigate three treatments: Radiotherapy alone (R), Radiotherapy and Chemotherapy (RC), and Chinese traditional medicine (CTM). Over the five years of the study, 25 patients received R, 27 received RC, and the other 24 received CTM. The survival time (in months) is the time until cosmetic deterioration, as determined by the appearance of breast retraction.
Procedure along with computation, output and pictorial representation
Computational Tables from SAS
Summary of censored and uncensored values-
Summary of the Number of Censored and Uncensored Values
Stratum  treatment  Total  Failed  Censored  Percent Censored
1        CTM        24     20      4         16.67
2        R          25     18      7         28.00
3        RC         27     21      6         22.22
Total               76     59      17        22.37
The above table shows the summary of all the censored and uncensored values obtained from the given data set.
1. Kaplan-Meier Estimate
SAS procedure- For each case in the sample we define three variables: Time, Status and Treatment. Time denotes the survival time (exact or censored); Status is a dummy variable with Status = 0 if Time is censored and 1 otherwise; and Treatment takes the value R if the patient received Radiotherapy alone, RC if the patient received Radiotherapy and Chemotherapy, and CTM if the patient received Chinese traditional medicine. The SAS procedure LIFETEST computes the estimates. Adding a STRATA statement after the TIME statement gives separate results for each treatment group and tests the null hypothesis that the groups share the same survival function.
We computed and plotted the product-limit (PL) estimates of S(t) at every observed time for the R, RC and CTM groups.
Using proc lifetest we obtain the following results. The survival distribution function plots are presented in a later section.
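The product-limit calculation itself is short, and a small sketch can make the tables that follow easier to read. The Python code below is not the SAS procedure used in this report; it is an illustrative re-implementation under the usual convention that a subject censored at time t is still counted as at risk at t. The function name `kaplan_meier` and the variable names are hypothetical, the standard error uses Greenwood's formula, and the CTM times and censoring flags are read off the CTM product-limit table in this section.

```python
import math
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate with Greenwood standard errors.

    times  : follow-up times (months)
    events : 1 if the event was observed, 0 if the time is censored
    Returns a list of (time, survival, standard_error), one per event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv, greenwood, rows = 1.0, 0.0, []
    for t in sorted(deaths):
        n = sum(1 for u in times if u >= t)  # number at risk just before t
        d = deaths[t]                        # events observed at t
        surv *= 1 - d / n
        if n > d:
            greenwood += d / (n * (n - d))
            se = surv * math.sqrt(greenwood)
        else:
            se = None  # survival has reached 0; the standard error is undefined
        rows.append((t, surv, se))
    return rows

# CTM group, transcribed from the product-limit table (0 = censored).
ctm_times = [10, 13, 14, 16, 16, 18, 20, 21, 27, 28, 28, 28,
             32, 33, 33, 34, 39, 41, 46, 51, 52, 53, 55, 57]
ctm_events = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0,
              1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1]

km_rows = kaplan_meier(ctm_times, ctm_events)
for t, s, se in km_rows[:5]:
    print(t, round(s, 4), round(se, 4))
```

Under these assumptions the sketch agrees with the tabulated CTM values, for example a survival estimate of about 0.9583 with standard error about 0.0408 at 10 months.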
SAS output-
The following tables are the SAS output of Product Limit (PL) survival estimates under the three treatment groups CTM, R and RC.
(a) CTM-
Product-Limit Survival Estimates
(* marks a censored observation)
Time       Survival  Failure  Survival Standard Error  Number Failed  Number Left
 0.0000    1.0000    0        0                         0             24
10.0000    0.9583    0.0417   0.0408                    1             23
13.0000    0.9167    0.0833   0.0564                    2             22
14.0000    0.8750    0.1250   0.0675                    3             21
16.0000    0.8333    0.1667   0.0761                    4             20
16.0000*   .         .        .                         4             19
18.0000    0.7895    0.2105   0.0838                    5             18
20.0000    0.7456    0.2544   0.0899                    6             17
21.0000    0.7018    0.2982   0.0947                    7             16
27.0000    0.6579    0.3421   0.0984                    8             15
28.0000    .         .        .                         9             14
28.0000    0.5702    0.4298   0.1030                   10             13
28.0000*   .         .        .                        10             12
32.0000    0.5227    0.4773   0.1048                   11             11
33.0000    .         .        .                        12             10
33.0000    0.4276    0.5724   0.1051                   13              9
34.0000    0.3801    0.6199   0.1036                   14              8
39.0000*   .         .        .                        14              7
41.0000    0.3258    0.6742   0.1020                   15              6
46.0000    0.2715    0.7285   0.0984                   16              5
51.0000    0.2172    0.7828   0.0925                   17              4
52.0000    0.1629    0.8371   0.0838                   18              3
53.0000    0.1086    0.8914   0.0713                   19              2
55.0000*   .         .        .                        19              1
57.0000    0         1.0000   .                        20              0
Summary Statistics for Time Variable Time
Quartile Estimates
Percent  Point Estimate  Transform  95% Confidence Interval [Lower, Upper)
75       51.0000         LOGLOG     33.0000   57.0000
50       33.0000         LOGLOG     21.0000   46.0000
25       20.0000         LOGLOG     10.0000   28.0000
Mean     Standard Error
34.0943  3.2433
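The quartile point estimates for the CTM group can be read directly from the product-limit curve: the p-th quantile is the earliest event time at which the estimated failure probability 1 - S(t) reaches p. The Python sketch below illustrates that rule only; the function name `km_quantile` is hypothetical, the curve values are copied from the CTM product-limit table, and the averaging that applies when 1 - S(t) equals p exactly is deliberately left out, since it does not arise for these values.

```python
def km_quantile(curve, p):
    """Earliest event time t with 1 - S(t) >= p, or None if never reached.

    curve : list of (time, survival) pairs at event times, in time order
    p     : failure probability of interest, e.g. 0.5 for the median
    """
    for t, s in curve:
        if 1 - s >= p:
            return t
    return None

# (time, survival) pairs copied from the CTM product-limit table.
ctm_curve = [(10, 0.9583), (13, 0.9167), (14, 0.8750), (16, 0.8333),
             (18, 0.7895), (20, 0.7456), (21, 0.7018), (27, 0.6579),
             (28, 0.5702), (32, 0.5227), (33, 0.4276), (34, 0.3801),
             (41, 0.3258), (46, 0.2715), (51, 0.2172), (52, 0.1629),
             (53, 0.1086), (57, 0.0)]

ctm_quartiles = {p: km_quantile(ctm_curve, p) for p in (0.25, 0.50, 0.75)}
print(ctm_quartiles)
```

The confidence limits in the table rely on the log-log transform applied by SAS and are not reproduced by this sketch.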
(b) R-
Product-Limit Survival Estimates
(* marks a censored observation)
Time       Survival  Failure  Survival Standard Error  Number Failed  Number Left
 0.0000    1.0000    0        0                         0             25
16.0000    0.9600    0.0400   0.0392                    1             24
17.0000*   .         .        .                         1             23
18.0000    0.9183    0.0817   0.0554                    2             22
20.0000    0.8765    0.1235   0.0668                    3             21
24.0000    0.8348    0.1652   0.0755                    4             20
25.0000    0.7930    0.2070   0.0825                    5             19
27.0000    .         .        .                         6             18
27.0000    0.7096    0.2904   0.0925                    7             17
29.0000    0.6678    0.3322   0.0961                    8             16
33.0000    0.6261    0.3739   0.0987                    9             15
35.0000    0.5843    0.4157   0.1006                   10             14
36.0000    0.5426    0.4574   0.1017                   11             13
39.0000*   .         .        .                        11             12
41.0000    0.4974    0.5026   0.1028                   12             11
44.0000*   .         .        .                        12             10
45.0000    0.4477    0.5523   0.1038                   13              9
50.0000    0.3979    0.6021   0.1035                   14              8
52.0000    0.3482    0.6518   0.1018                   15              7
52.0000*   .         .        .                        15              6
56.0000    .         .        .                        16              5
56.0000    0.2321    0.7679   0.0954                   17              4
58.0000*   .         .        .                        17              3
59.0000    0.1547    0.8453   0.0896                   18              2
60.0000*   .         .        .                        18              1
60.0000*   .         .        .                        18              0
Summary Statistics for Time Variable Time
Quartile Estimates
Percent  Point Estimate  Transform  95% Confidence Interval [Lower, Upper)
75       56.0000         LOGLOG     45.0000   .
50       41.0000         LOGLOG     27.0000   56.0000
25       27.0000         LOGLOG     16.0000   36.0000
Mean     Standard Error
41.4362  3.1849
(c) RC-
Product-Limit Survival Estimates
(* marks a censored observation)
Time       Survival  Failure  Survival Standard Error  Number Failed  Number Left
 0.0000    1.0000    0        0                         0             27
 9.0000    0.9630    0.0370   0.0363                    1             26
11.0000    0.9259    0.0741   0.0504                    2             25
17.0000    .         .        .                         3             24
17.0000    0.8519    0.1481   0.0684                    4             23
19.0000    0.8148    0.1852   0.0748                    5             22
21.0000    0.7778    0.2222   0.0800                    6             21
24.0000    0.7407    0.2593   0.0843                    7             20
25.0000    0.7037    0.2963   0.0879                    8             19
27.0000    0.6667    0.3333   0.0907                    9             18
28.0000    0.6296    0.3704   0.0929                   10             17
28.0000*   .         .        .                        10             16
29.0000    0.5903    0.4097   0.0951                   11             15
29.0000*   .         .        .                        11             14
30.0000    0.5481    0.4519   0.0972                   12             13
33.0000    0.5060    0.4940   0.0984                   13             12
37.0000    0.4638    0.5362   0.0989                   14             11
39.0000    0.4216    0.5784   0.0985                   15             10
40.0000*   .         .        .                        15              9
44.0000    0.3748    0.6252   0.0980                   16              8
46.0000*   .         .        .                        16              7
47.0000*   .         .        .                        16              6
51.0000    0.3123    0.6877   0.0996                   17              5
52.0000    0.2499    0.7501   0.0973                   18              4
54.0000    0.1874    0.8126   0.0909                   19              3
56.0000    0.1249    0.8751   0.0792                   20              2
58.0000    0.0625    0.9375   0.0593                   21              1
60.0000*   .         .        .                        21              0
Summary Statistics for Time Variable Time
Quartile Estimates
Percent  Point Estimate  Transform  95% Confidence Interval [Lower, Upper)
75       52.0000         LOGLOG     39.0000   58.0000
50       37.0000         LOGLOG     25.0000   52.0000
25       24.0000         LOGLOG     11.0000   29.0000
Mean     Standard Error
36.9469  3.2011
2. Life-table Analysis
SAS procedure- The SAS procedure for the life-table analysis is the same, except that under proc lifetest we now specify the intervals on which the life table is built.
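The quantities in a life table follow from the interval counts by the standard actuarial method: censored subjects count as half at risk, so the effective sample size is the number entering the interval minus half the number censored. The Python sketch below illustrates those formulas and is not the SAS procedure itself. The function name `life_table` and the dictionary keys are hypothetical; the CTM counts are taken from the CTM life table in this section, with 5-month intervals from 0 to 60.

```python
def life_table(n_start, failed, censored, width):
    """Actuarial life-table quantities for equal-width intervals.

    n_start  : number of subjects entering the first interval
    failed   : events per interval
    censored : censored observations per interval
    width    : interval width (months)
    """
    rows, n, surv = [], n_start, 1.0
    for d, c in zip(failed, censored):
        eff = n - c / 2                      # effective sample size
        q = d / eff if eff > 0 else 0.0      # conditional probability of failure
        # Hazard at the interval midpoint: events per unit of time at risk.
        hazard = d / (width * (eff - d / 2)) if d > 0 else 0.0
        rows.append({"effective": eff, "cond_fail": q,
                     "survival": surv, "hazard": hazard})
        surv *= 1 - q                        # survival at the start of the next interval
        n -= d + c
    return rows

# CTM group: counts per 5-month interval, copied from the life table.
ctm_failed = [0, 0, 3, 2, 2, 3, 4, 0, 1, 1, 3, 1]
ctm_censored = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1]
ctm_table = life_table(24, ctm_failed, ctm_censored, 5)

for lower, row in zip(range(0, 60, 5), ctm_table):
    print(lower, lower + 5, row["effective"],
          round(row["cond_fail"], 4), round(row["survival"], 4),
          round(row["hazard"], 6))
```

For the interval [15, 20), for example, this gives an effective sample size of 20.5 and a conditional probability of failure of about 0.0976, in line with the tabulated CTM values.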
SAS output-
The following tables are the SAS output of life-table analysis under the three treatment groups CTM, R and RC.
(a) CTM-
Life Table Survival Estimates
Columns: interval [Lower, Upper), Number Failed, Number Censored, Effective Sample Size, Conditional Probability of Failure, Conditional Probability Standard Error, Survival, Failure, Survival Standard Error.
Lower  Upper  Failed  Censored  Eff. Size  Cond. Prob.  Cond. SE  Survival  Failure  Surv. SE
 0      5     0       0         24.0       0            0         1.0000    0        0
 5     10     0       0         24.0       0            0         1.0000    0        0
10     15     3       0         24.0       0.1250       0.0675    1.0000    0        0
15     20     2       1         20.5       0.0976       0.0655    0.8750    0.1250   0.0675
20     25     2       0         18.0       0.1111       0.0741    0.7896    0.2104   0.0837
25     30     3       1         15.5       0.1935       0.1004    0.7019    0.2981   0.0946
30     35     4       0         12.0       0.3333       0.1361    0.5660    0.4340   0.1038
35     40     0       1          7.5       0            0         0.3774    0.6226   0.1036
40     45     1       0          7.0       0.1429       0.1323    0.3774    0.6226   0.1036
45     50     1       0          6.0       0.1667       0.1521    0.3235    0.6765   0.1018
50     55     3       0          5.0       0.6000       0.2191    0.2695    0.7305   0.0981
55     60     1       1          1.5       0.6667       0.3849    0.1078    0.8922   0.0709
60     .      0       0          0.0       0            0         0.0359    0.9641   0.0478
Columns: interval [Lower, Upper), Median Residual Lifetime, Median Standard Error, and PDF, PDF Standard Error, Hazard, Hazard Standard Error evaluated at the midpoint of the interval.
Lower  Upper  Med. Resid.  Med. SE   PDF      PDF SE   Hazard    Hazard SE
 0      5     31.7502      2.7046    0        .        0         .
 5     10     26.7502      2.7046    0        .        0         .
10     15     21.7502      2.7046    0.0250   0.0135   0.026667  0.015362
15     20     18.4064      2.5606    0.0171   0.0115   0.020513  0.014486
20     25     14.5375      2.4660    0.0175   0.0118   0.023529  0.016609
25     30     17.4500      8.2677    0.0272   0.0146   0.042857  0.024601
30     35     18.7500      7.5777    0.0377   0.0169   0.08      0.039192
35     40     17.5000      2.1300    0        .        0         .
40     45     12.5000      2.2048    0.0108   0.0104   0.030769  0.030678
45     50      8.3333      2.0412    0.0108   0.0104   0.036364  0.036213
50     55      4.1667      1.8634    0.0323   0.0167   0.171429  0.089424
55     60      3.7500      3.0619    0.0144   0.0126   0.2       0.173205
60     .       .           .         .        .        .         .
(b) R-
Life Table Survival Estimates
Columns: interval [Lower, Upper), Number Failed, Number Censored, Effective Sample Size, Conditional Probability of Failure, Conditional Probability Standard Error, Survival, Failure, Survival Standard Error.
Lower  Upper  Failed  Censored  Eff. Size  Cond. Prob.  Cond. SE  Survival  Failure  Surv. SE
 0      5     0       0         25.0       0            0         1.0000    0        0
 5     10     0       0         25.0       0            0         1.0000    0        0
10     15     0       0         25.0       0            0         1.0000    0        0
15     20     2       1         24.5       0.0816       0.0553    1.0000    0        0
20     25     2       0         22.0       0.0909       0.0613    0.9184    0.0816   0.0553
25     30     4       0         20.0       0.2000       0.0894    0.8349    0.1651   0.0755
30     35     1       0         16.0       0.0625       0.0605    0.6679    0.3321   0.0960
35     40     2       1         14.5       0.1379       0.0906    0.6262    0.3738   0.0987
40     45     1       1         11.5       0.0870       0.0831    0.5398    0.4602   0.1022
45     50     1       0         10.0       0.1000       0.0949    0.4929    0.5071   0.1036
50     55     2       1          8.5       0.2353       0.1455    0.4436    0.5564   0.1043
55     60     3       1          5.5       0.5455       0.2123    0.3392    0.6608   0.1026
60     .      0       2          1.0       0            0         0.1542    0.8458   0.0858
Columns: interval [Lower, Upper), Median Residual Lifetime, Median Standard Error, and PDF, PDF Standard Error, Hazard, Hazard Standard Error evaluated at the midpoint of the interval.
Lower  Upper  Med. Resid.  Med. SE   PDF      PDF SE   Hazard    Hazard SE
 0      5     44.2388      10.6522   0        .        0         .
 5     10     39.2388      10.6522   0        .        0         .
10     15     34.2388      10.6522   0        .        0         .
15     20     29.2388      10.7604   0.0163   0.0111   0.017021  0.012025
20     25     28.4159       9.9318   0.0167   0.0113   0.019048  0.013453
25     30     26.2518       4.4717   0.0334   0.0152   0.044444  0.022085
30     35     25.1418       2.2562   0.00835  0.00817  0.012903  0.012897
35     40     20.7059       2.2219   0.0173   0.0117   0.02963   0.020894
40     45     16.8729       2.1508   0.00939  0.00914  0.018182  0.018163
45     50     12.5071       2.1059   0.00986  0.00958  0.021053  0.021023
50     55      8.1731       2.0558   0.0209   0.0138   0.053333  0.037376
55     60      4.5833       1.9543   0.0370   0.0182   0.15      0.080283
60     .       .            .        .        .        .         .
(c) RC-
Life Table Survival Estimates
Columns: interval [Lower, Upper), Number Failed, Number Censored, Effective Sample Size, Conditional Probability of Failure, Conditional Probability Standard Error, Survival, Failure, Survival Standard Error.
Lower  Upper  Failed  Censored  Eff. Size  Cond. Prob.  Cond. SE  Survival  Failure  Surv. SE
 0      5     0       0         27.0       0            0         1.0000    0        0
 5     10     1       0         27.0       0.0370       0.0363    1.0000    0        0
10     15     1       0         26.0       0.0385       0.0377    0.9630    0.0370   0.0363
15     20     3       0         25.0       0.1200       0.0650    0.9259    0.0741   0.0504
20     25     2       0         22.0       0.0909       0.0613    0.8148    0.1852   0.0748
25     30     4       2         19.0       0.2105       0.0935    0.7407    0.2593   0.0843
30     35     2       0         14.0       0.1429       0.0935    0.5848    0.4152   0.0961
35     40     2       0         12.0       0.1667       0.1076    0.5013    0.4987   0.0989
40     45     1       1          9.5       0.1053       0.0996    0.4177    0.5823   0.0985
45     50     0       2          7.0       0            0         0.3737    0.6263   0.0974
50     55     3       0          6.0       0.5000       0.2041    0.3737    0.6263   0.0974
55     60     2       0          3.0       0.6667       0.2722    0.1869    0.8131   0.0905
60     .      0       1          0.5       0            0         0.0623    0.9377   0.0591
Columns: interval [Lower, Upper), Median Residual Lifetime, Median Standard Error, and PDF, PDF Standard Error, Hazard, Hazard Standard Error evaluated at the midpoint of the interval.
Lower  Upper  Med. Resid.  Med. SE   PDF      PDF SE   Hazard    Hazard SE
 0      5     35.0750      5.7591    0        .        0         .
 5     10     30.0750      5.7591    0.00741  0.00727  0.007547  0.007546
10     15     26.1833      5.6514    0.00741  0.00727  0.007843  0.007842
15     20     22.2917      5.5417    0.0222   0.0121   0.025532  0.014711
20     25     21.1717      9.8772    0.0148   0.0101   0.019048  0.013453
25     30     25.0902      2.2735    0.0312   0.0143   0.047059  0.023366
30     35     22.1765      2.0909    0.0167   0.0113   0.030769  0.021693
35     40     18.2941      1.9358    0.0167   0.0113   0.036364  0.025607
40     45     14.4118      1.8131    0.00879  0.00857  0.022222  0.022188
45     50     10.0000      1.8898    0        .        0         .
50     55      5.0000      2.0412    0.0374   0.0181   0.133333  0.072577
55     60      3.7500      2.1651    0.0249   0.0158   0.2       0.122474
60     .       .           .         .        .        .         .
Goodness of Fit test
In this section we fit several parametric distributions to the data and select the appropriate one according to the AIC value (the lower the better).
Conclusion- From the SAS output tables we select the Log Normal distribution as our model for fitting the data.
Details of the SAS output are given below.
SAS Output-
Exponential Distribution-
Fit Statistics
-2 Log Likelihood           175.956
AIC (smaller is better)     177.956
AICC (smaller is better)    178.010
BIC (smaller is better)     180.287
Analysis of Maximum Likelihood Parameter Estimates
Parameter      DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept      1   3.4570    0.1089          3.2436    3.6704       1007.79     <.0001
Scale          0   1.0000    0.0000          1.0000    1.0000
Weibull Scale  1   31.7213   3.4543          25.6246   39.2684
Weibull Shape  0   1.0000    0.0000          1.0000    1.0000
Weibull distribution-
Fit Statistics
-2 Log Likelihood           154.835
AIC (smaller is better)     158.835
AICC (smaller is better)    159.000
BIC (smaller is better)     163.497
Analysis of Maximum Likelihood Parameter Estimates
Parameter      DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept      1   3.4570    0.0853          3.2899    3.6241       1643.79     <.0001
Scale          1   0.4830    0.0615          0.3763    0.6199
Weibull Scale  1   31.7213   2.7047          26.8394   37.4912
Weibull Shape  1   2.0703    0.2636          1.6131    2.6573
Log Normal Distribution-
Fit Statistics
-2 Log Likelihood           128.013
AIC (smaller is better)     132.013
AICC (smaller is better)    132.177
BIC (smaller is better)     136.674
Analysis of Maximum Likelihood Parameter Estimates
Parameter      DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept      1   3.4570    0.0578          3.3436    3.5703       3573.43     <.0001
Scale          1   0.4830    0.0381          0.4138    0.5637
Log logistic Distribution-
Fit Statistics
-2 Log Likelihood           139.573
AIC (smaller is better)     143.573
AICC (smaller is better)    143.737
BIC (smaller is better)     148.234
Analysis of Maximum Likelihood Parameter Estimates
Parameter      DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept      1   3.4570    0.0969          3.2671    3.6469       1272.89     <.0001
Scale          1   0.4830    0.1005          0.3212    0.7263
Gamma Distribution-
Fit Statistics
-2 Log Likelihood           154.835
AIC (smaller is better)     160.835
AICC (smaller is better)    161.169
BIC (smaller is better)     167.827
Analysis of Maximum Likelihood Parameter Estimates
Parameter      DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept      1   3.4570    0.0853          3.2899    3.6241       1643.79     <.0001
Scale          1   0.4830    0.0615          0.3763    0.6199
Shape          0   1.0000    0.0000          1.0000    1.0000
Interpretation- The AIC value of the Log Normal distribution is 132.013, the smallest among the distributions considered; hence the Log Normal distribution is the appropriate model for fitting the given data set.
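The AIC comparison can be checked by hand, since AIC = -2 log L + 2k, where k is the number of estimated parameters. The short Python sketch below recomputes the AIC values from the -2 Log Likelihood figures in the fit statistics tables. The parameter counts k are an assumption inferred from the gap between the reported AIC and -2 Log Likelihood (1 for the exponential, 2 for the Weibull, Log Normal and Log logistic, 3 for the Gamma); the variable names are hypothetical.

```python
# (-2 log likelihood, assumed number of estimated parameters k) per distribution.
fits = {
    "Exponential": (175.956, 1),
    "Weibull": (154.835, 2),
    "Log Normal": (128.013, 2),
    "Log logistic": (139.573, 2),
    "Gamma": (154.835, 3),
}

# AIC = -2 log L + 2k; the smallest value indicates the preferred model.
aic = {name: neg2ll + 2 * k for name, (neg2ll, k) in fits.items()}
best_model = min(aic, key=aic.get)

for name, value in sorted(aic.items(), key=lambda item: item[1]):
    print(f"{name:13s} {value:.3f}")
print("Smallest AIC:", best_model)
```

Because every candidate is fitted to the same data, the ranking by AIC is meaningful here; it says which of these five fits best, not that the best one fits well in absolute terms.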
Pictorial Representation
Survival function-
Most real-life survival curves are not portrayed as smooth curves as in this example. Instead, they are usually shown as staircase curves with a "step" down each time there is a death. This is because a real-world survival curve represents the actual experience of a particular group of people. At the moment of each death the proportion of survivors decreases, and the proportion of survivors does not change at any other time. Thus the curve steps down at each death and is flat in between deaths, which leads to the classic staircase appearance.
While a staircase does represent the actual experience of the group whose survival is portrayed in the curve, it does not mean that the risk to an individual patient occurs in discrete steps at specific times as shown in these curves.
With staircase curves, the larger the group of patients, the smaller the step down caused by each death. If the times of the deaths are plotted accurately, then as the size of the group increases the staircase comes closer and closer to the ideal of a smooth curve.
Interpretation-
The curves may compare results from different treatments, as in the above graph. If one curve is continuously "above" the other, as with these curves, the conclusion is that the treatment associated with the higher curve was more effective for these patients. There are many ways two curves could compare. They might be very close to each other, indicating there was no difference between the treatments. If a dangerously toxic treatment resulted in more long-term survivors than a less dangerous treatment, the curve for the riskier treatment might be lower than the other curve at first because of early treatment deaths, but end up higher in the long run.
From the above graph it is clear that the curve for R (radiotherapy) lies above the rest, suggesting that it might be the superior treatment compared with RC and CTM. For all of the treatments, however, the survival curve approaches zero in the long run, indicating a low survival rate for all three groups by the end of follow-up.
Often it may be unclear whether two curves are really different or whether the difference between them may be due to chance alone. There are tests of significance for survival curves, such as the log-rank test, and a "p value" is often given with comparative survival curves to indicate whether the difference is statistically significant. This is explained in the next section.
Hazard Function-
The nonparametric hazard plot enables one to examine the hazard function without any distribution assumption. This plot may indicate which parametric distribution would be appropriate for modeling your data should you decide to use parametric estimation methods.
One can interpret the nonparametric hazard plot the same way as one would interpret the parametric hazard plot. The major difference is that the nonparametric hazard plot is a step function whereas the parametric hazard plot is a smoothed function.
Interpretation-
From the above graph it is evident that the hazard curves for the patients receiving CTM and RC rise more steeply than the curve for the patients receiving R.
Hence the breast cancer patients receiving CTM or RC do not respond as well as those receiving R, which suggests that treatment R is better.
Test of Homogeneity data table from SAS
Let us consider the following tests-
H0: the treatments (or characteristics) being compared are all the same vs H1: Not H0.
Using proc lifetest we get the following SAS output for homogeneity test-
Rank Statistics
treatment  Log-Rank  Wilcoxon
CTM         5.0743    200.00
R          -6.1580   -243.00
RC          1.0837     43.00
Covariance Matrix for the Log-Rank Statistics
treatment  CTM       R         RC
CTM        10.5217   -5.5701   -4.9516
R          -5.5701   13.3057   -7.7357
RC         -4.9516   -7.7357   12.6873
Covariance Matrix for the Wilcoxon Statistics
treatment  CTM        R          RC
CTM         26460.0   -13342.1   -13118.0
R          -13342.1    29660.3   -16318.2
RC         -13118.0   -16318.2    29436.2
Test of Equality over Strata
Test       Chi-Square  DF  Pr > Chi-Square
Log-Rank   3.6109      2   0.1644
Wilcoxon   2.3929      2   0.3023
-2Log(LR)  1.1913      2   0.5512
Interpretation- The rank tests for homogeneity do not indicate a significant difference between the treatments (p = 0.1644 for the log-rank test and p = 0.3023 for the Wilcoxon test).
Since the p-value of the log-rank chi-square test is 0.1644, at the α = 0.05 level of significance we fail to reject the null hypothesis H0; that is, there is no significant difference between the treatments.
The p-value of the Wilcoxon chi-square test (0.3023) also supports this conclusion.
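The chi-square statistics in the Test of Equality over Strata table can be recovered from the rank statistics and their covariance matrices. Because the rank statistics sum to zero across the three groups, the covariance matrix is singular, so the usual approach is to drop one group and evaluate the quadratic form u' V^{-1} u on the remaining two. The Python sketch below does this for CTM and R; the function name `strata_chi_square` is hypothetical, and the p-value uses the fact that a chi-square variable with 2 degrees of freedom exceeds x with probability exp(-x/2).

```python
import math

def strata_chi_square(u, v):
    """Quadratic form u' V^{-1} u for two retained groups (2 x 2 case).

    u : rank statistics for the two retained groups
    v : their 2 x 2 covariance matrix
    """
    a, b = v[0]
    c, d = v[1]
    det = a * d - b * c
    u1, u2 = u
    return (d * u1 * u1 - (b + c) * u1 * u2 + a * u2 * u2) / det

# CTM and R entries from the SAS output; RC is the dropped group.
logrank_chi2 = strata_chi_square(
    (5.0743, -6.1580),
    ((10.5217, -5.5701), (-5.5701, 13.3057)),
)
wilcoxon_chi2 = strata_chi_square(
    (200.00, -243.00),
    ((26460.0, -13342.1), (-13342.1, 29660.3)),
)

# Upper-tail probability of a chi-square distribution with 2 degrees of freedom.
logrank_p = math.exp(-logrank_chi2 / 2)
wilcoxon_p = math.exp(-wilcoxon_chi2 / 2)

print(round(logrank_chi2, 4), round(logrank_p, 4))
print(round(wilcoxon_chi2, 4), round(wilcoxon_p, 4))
```

Up to rounding of the inputs, this reproduces the reported log-rank and Wilcoxon chi-square values and their p-values, both of which exceed 0.05.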
Conclusion
When not every patient responds to a treatment, as is nearly always the case in cancer therapy, each trial will accrue some patients who will be responders, and others who, unfortunately, will not be responders. By random chance, some of these trials will happen to get more responders and thus show a higher response rate than others. If the trials are small enough and there are enough trials, probably a few of these identical trials will get a much higher response rate than the others.
From the test of homogeneity we can conclude that the three treatments do not differ significantly for the 76 breast cancer patients studied. From the survival graph, however, one may argue that treatment R appears slightly better than the rest.
Moreover, in the hazard plot the hazard curves of treatments CTM and RC rise more steeply than that of treatment R, which supports this argument.
So with the given small data set we conclude that the treatments do not differ significantly and that the survival rate is low under all of them, although treatment R might be slightly better than the rest.
Recommendation
If there is a high response rate in a small trial, and we conduct one more small trial of the same treatment and also get a high response rate, this is evidence that the true response rate really is relatively high. The chance of randomly getting a much higher response rate than the true one in any single small trial is small, so getting such results twice in a row is unlikely.
If the trials are larger, the chance of getting misleading results in the first place is smaller. So in the long run, that is, if we collect more clinical data, we might get a clearer picture of which treatment is best, or of whether the treatments have significantly different effects at all.