Survival report of 76 breast cancer patients under three different treatments

Summary

This report applies the Kaplan-Meier estimator and life-table analysis to clinical data: the survival times of 76 breast cancer patients under three different treatments. For each treatment we present the life table together with the survival and hazard functions for comparison. The test results suggest that treatment R performs slightly better than the others, although all three treatments show a low survival rate and the differences in response among the patients are small.

Introduction

Survival times are data that measure follow-up time from a defined starting point to the occurrence of a given event, for example the time from the beginning to the end of a remission period, or the time from the diagnosis of a disease to death. Standard statistical techniques cannot usually be applied because the underlying distribution is rarely Normal and the data are often 'censored'. A survival time is described as censored when there is a follow-up time but the event has not yet occurred or is not known to have occurred. We consider methods for the analysis of data in which the response of interest is the time until some event occurs; such events are generically referred to as failures. Survival analysis covers both parametric and nonparametric methods; the emphasis here is on the more recent nonparametric developments, with applications to medical research.

The data set comes from the follow-up of a clinical trial conducted in the early 1980s on 76 breast cancer patients to investigate three different treatments: Radiotherapy alone (R), Radiotherapy and Chemotherapy (RC), and Chinese traditional medicine (CTM). Over the five years of the study, 25 patients received R, 27 received RC, and the other 24 received CTM. The survival time (in months) is the time until cosmetic deterioration, as determined by the appearance of breast retraction.

Procedure along with computation, output and pictorial representation

Computational Tables from SAS

Summary of censored and uncensored values-

Summary of the Number of Censored and Uncensored Values

Stratum  treatment  Total  Failed  Censored  Percent Censored
1        CTM        24     20      4         16.67
2        R          25     18      7         28.00
3        RC         27     21      6         22.22
Total               76     59      17        22.37

The above table shows the summary of all the censored and uncensored values obtained from the given data set.
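The percentages in the summary table can be checked by hand. The short Python sketch below is only an illustration (it is not part of the SAS run); the counts are copied from the table and the helper name percent_censored is our own.

```python
# Counts copied from the summary table: (total, failed) per stratum.
counts = {"CTM": (24, 20), "R": (25, 18), "RC": (27, 21)}

def percent_censored(total, failed):
    """Share of observations that are censored, in percent."""
    return 100.0 * (total - failed) / total

# Per-stratum percentages, rounded to two decimals as in the table.
per_stratum = {name: round(percent_censored(t, f), 2)
               for name, (t, f) in counts.items()}

# Overall percentage across the three strata.
total_n = sum(t for t, _ in counts.values())
total_failed = sum(f for _, f in counts.values())
overall = round(percent_censored(total_n, total_failed), 2)

print(per_stratum, overall)
```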

1. Kaplan-Meier Estimate

SAS procedure- For each case in the sample we define three variables: Time, Status and Treatment. Time denotes the survival time (exact or censored); Status is a dummy variable with Status = 0 if Time is censored and 1 otherwise; and Treatment takes the value R if the patient received Radiotherapy alone, RC if the patient received Radiotherapy and Chemotherapy, and CTM if the patient received Chinese traditional medicine. Procedure LIFETEST can then be used to test the null hypothesis that the three treatments have the same survival distribution; we simply add a STRATA statement after the TIME statement.

We computed and plotted the product-limit (PL) estimates of S(t) at every observed time for the R, RC and CTM groups.

Using proc lifetest we obtain the following results. The plot of the survival distribution function is presented in a later section.
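To make the product-limit calculation concrete, here is a minimal Python sketch of the estimator. It is not the SAS code used for this report; it uses the CTM event and censoring times as they appear in the SAS product-limit output of this report, and the function name kaplan_meier is our own. Ties are handled by treating events at a given time as occurring before censorings at that time.

```python
from collections import Counter

# CTM arm (months): event times and censored times read off the PL output.
events = [10, 13, 14, 16, 18, 20, 21, 27, 28, 28,
          32, 33, 33, 34, 41, 46, 51, 52, 53, 57]
censored = [16, 28, 39, 55]

def kaplan_meier(events, censored):
    """Return [(time, S(time))] at each distinct event time."""
    d = Counter(events)      # number of events at each time
    c = Counter(censored)    # number of censorings at each time
    at_risk = len(events) + len(censored)
    surv, curve = 1.0, []
    for t in sorted(set(d) | set(c)):
        if d[t]:
            # Multiply by the conditional probability of surviving time t.
            surv *= 1.0 - d[t] / at_risk
            curve.append((t, surv))
        # Events and censorings at t both leave the risk set afterwards.
        at_risk -= d[t] + c[t]
    return curve

km = dict(kaplan_meier(events, censored))
for t in (10, 18, 28, 57):
    print(t, round(km[t], 4))
```

Under these assumptions the printed values can be compared directly with the Survival column of the CTM table.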

SAS output-

The following tables are the SAS output of Product Limit (PL) survival estimates under the three treatment groups CTM, R and RC.

(a) CTM-

Product-Limit Survival Estimates

Time       Survival  Failure  Survival        Number  Number
                              Standard Error  Failed  Left
 0.0000    1.0000    0        0                0      24
10.0000    0.9583    0.0417   0.0408           1      23
13.0000    0.9167    0.0833   0.0564           2      22
14.0000    0.8750    0.1250   0.0675           3      21
16.0000    0.8333    0.1667   0.0761           4      20
16.0000*   .         .        .                4      19
18.0000    0.7895    0.2105   0.0838           5      18
20.0000    0.7456    0.2544   0.0899           6      17
21.0000    0.7018    0.2982   0.0947           7      16
27.0000    0.6579    0.3421   0.0984           8      15
28.0000    .         .        .                9      14
28.0000    0.5702    0.4298   0.1030          10      13
28.0000*   .         .        .               10      12
32.0000    0.5227    0.4773   0.1048          11      11
33.0000    .         .        .               12      10
33.0000    0.4276    0.5724   0.1051          13       9
34.0000    0.3801    0.6199   0.1036          14       8
39.0000*   .         .        .               14       7
41.0000    0.3258    0.6742   0.1020          15       6
46.0000    0.2715    0.7285   0.0984          16       5
51.0000    0.2172    0.7828   0.0925          17       4
52.0000    0.1629    0.8371   0.0838          18       3
53.0000    0.1086    0.8914   0.0713          19       2
55.0000*   .         .        .               19       1
57.0000    0         1.0000   .               20       0

* censored observation
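The Survival Standard Error column is consistent with Greenwood's formula, SE[S(t)] = S(t) * sqrt(sum of d_i / (n_i (n_i - d_i))) over event times up to t. The Python sketch below is an illustration under that assumption, using the CTM times from the table; the function name km_with_greenwood is our own, and the standard error is left undefined once the estimate reaches zero, which is where the table prints ".".

```python
import math
from collections import Counter

# CTM arm (months), as listed in the product-limit table.
events = [10, 13, 14, 16, 18, 20, 21, 27, 28, 28,
          32, 33, 33, 34, 41, 46, 51, 52, 53, 57]
censored = [16, 28, 39, 55]

def km_with_greenwood(events, censored):
    """Return {time: (S(time), Greenwood SE or None)} at each event time."""
    d, c = Counter(events), Counter(censored)
    at_risk = len(events) + len(censored)
    surv, gsum, out = 1.0, 0.0, {}
    for t in sorted(set(d) | set(c)):
        if d[t]:
            surv *= 1.0 - d[t] / at_risk
            if at_risk > d[t]:
                # Accumulate the Greenwood sum d / (n * (n - d)).
                gsum += d[t] / (at_risk * (at_risk - d[t]))
                se = surv * math.sqrt(gsum)
            else:
                se = None  # everyone left at risk fails: SE is undefined
            out[t] = (surv, se)
        at_risk -= d[t] + c[t]
    return out

table = km_with_greenwood(events, censored)
for t in (10, 28, 53):
    print(t, round(table[t][0], 4), round(table[t][1], 4))
```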

Summary Statistics for Time Variable Time

Quartile Estimates

Percent  Point Estimate  Transform  95% Confidence Interval
                                    [Lower     Upper)
75       51.0000         LOGLOG     33.0000    57.0000
50       33.0000         LOGLOG     21.0000    46.0000
25       20.0000         LOGLOG     10.0000    28.0000

Mean     Standard Error
34.0943  3.2433
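The quartile point estimates and the mean can be recovered from the product-limit curve. The Python sketch below is a simplified illustration, not the SAS algorithm: it takes each quartile as the first event time at which the estimated failure probability reaches the target (SAS additionally averages two times when the target is hit exactly), and it takes the mean as the area under the step function. The CTM times are copied from the table; the function names are our own.

```python
from collections import Counter

# CTM arm (months), as listed in the product-limit table.
events = [10, 13, 14, 16, 18, 20, 21, 27, 28, 28,
          32, 33, 33, 34, 41, 46, 51, 52, 53, 57]
censored = [16, 28, 39, 55]

def km_curve(events, censored):
    """Kaplan-Meier estimate as a list of (event time, S(time))."""
    d, c = Counter(events), Counter(censored)
    at_risk, surv, curve = len(events) + len(censored), 1.0, []
    for t in sorted(set(d) | set(c)):
        if d[t]:
            surv *= 1.0 - d[t] / at_risk
            curve.append((t, surv))
        at_risk -= d[t] + c[t]
    return curve

def quantile(curve, p):
    """First event time t with 1 - S(t) >= p, or None if never reached."""
    for t, s in curve:
        if 1.0 - s >= p - 1e-12:
            return t
    return None

def mean_survival(curve):
    """Area under the step function S(t) from 0 to the last event time."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area

curve = km_curve(events, censored)
quartiles = {p: quantile(curve, p) for p in (0.25, 0.50, 0.75)}
mean = mean_survival(curve)
print(quartiles, round(mean, 4))
```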

(b) R-



Product-Limit Survival Estimates

Time       Survival  Failure  Survival        Number  Number
                              Standard Error  Failed  Left
 0.0000    1.0000    0        0                0      25
16.0000    0.9600    0.0400   0.0392           1      24
17.0000*   .         .        .                1      23
18.0000    0.9183    0.0817   0.0554           2      22
20.0000    0.8765    0.1235   0.0668           3      21
24.0000    0.8348    0.1652   0.0755           4      20
25.0000    0.7930    0.2070   0.0825           5      19
27.0000    .         .        .                6      18
27.0000    0.7096    0.2904   0.0925           7      17
29.0000    0.6678    0.3322   0.0961           8      16
33.0000    0.6261    0.3739   0.0987           9      15
35.0000    0.5843    0.4157   0.1006          10      14
36.0000    0.5426    0.4574   0.1017          11      13
39.0000*   .         .        .               11      12
41.0000    0.4974    0.5026   0.1028          12      11
44.0000*   .         .        .               12      10
45.0000    0.4477    0.5523   0.1038          13       9
50.0000    0.3979    0.6021   0.1035          14       8
52.0000    0.3482    0.6518   0.1018          15       7
52.0000*   .         .        .               15       6
56.0000    .         .        .               16       5
56.0000    0.2321    0.7679   0.0954          17       4
58.0000*   .         .        .               17       3
59.0000    0.1547    0.8453   0.0896          18       2
60.0000*   .         .        .               18       1
60.0000*   .         .        .               18       0

* censored observation


Quartile Estimates

Percent  Point Estimate  Transform  95% Confidence Interval
                                    [Lower     Upper)
75       56.0000         LOGLOG     45.0000    .
50       41.0000         LOGLOG     27.0000    56.0000
25       27.0000         LOGLOG     16.0000    36.0000

Mean     Standard Error
41.4362  3.1849

(c) RC-



Product-Limit Survival Estimates

Time       Survival  Failure  Survival        Number  Number
                              Standard Error  Failed  Left
 0.0000    1.0000    0        0                0      27
 9.0000    0.9630    0.0370   0.0363           1      26
11.0000    0.9259    0.0741   0.0504           2      25
17.0000    .         .        .                3      24
17.0000    0.8519    0.1481   0.0684           4      23
19.0000    0.8148    0.1852   0.0748           5      22
21.0000    0.7778    0.2222   0.0800           6      21
24.0000    0.7407    0.2593   0.0843           7      20
25.0000    0.7037    0.2963   0.0879           8      19
27.0000    0.6667    0.3333   0.0907           9      18
28.0000    0.6296    0.3704   0.0929          10      17
28.0000*   .         .        .               10      16
29.0000    0.5903    0.4097   0.0951          11      15
29.0000*   .         .        .               11      14
30.0000    0.5481    0.4519   0.0972          12      13
33.0000    0.5060    0.4940   0.0984          13      12
37.0000    0.4638    0.5362   0.0989          14      11
39.0000    0.4216    0.5784   0.0985          15      10
40.0000*   .         .        .               15       9
44.0000    0.3748    0.6252   0.0980          16       8
46.0000*   .         .        .               16       7
47.0000*   .         .        .               16       6
51.0000    0.3123    0.6877   0.0996          17       5
52.0000    0.2499    0.7501   0.0973          18       4
54.0000    0.1874    0.8126   0.0909          19       3
56.0000    0.1249    0.8751   0.0792          20       2
58.0000    0.0625    0.9375   0.0593          21       1
60.0000*   .         .        .               21       0

* censored observation


Quartile Estimates

Percent  Point Estimate  Transform  95% Confidence Interval
                                    [Lower     Upper)
75       52.0000         LOGLOG     39.0000    58.0000
50       37.0000         LOGLOG     25.0000    52.0000
25       24.0000         LOGLOG     11.0000    29.0000

Mean     Standard Error
36.9469  3.2011

2. Life-table Analysis

SAS Procedure- The SAS procedure for the life-table analysis is the same as before, except that under proc lifetest we now define the intervals over which the life table is constructed.
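As an illustration of what the life-table method computes in each interval, here is a minimal Python sketch of the actuarial calculation. It is not the SAS code; it assumes 5-month intervals, the effective sample size n - c/2, the conditional failure probability q = d / (effective size), the density S(start) * q / width and the midpoint hazard 2q / (width * (2 - q)). The CTM times are those listed in the product-limit output of this report, and the function name life_table is our own.

```python
# CTM arm (months): event times and censored times.
events = [10, 13, 14, 16, 18, 20, 21, 27, 28, 28,
          32, 33, 33, 34, 41, 46, 51, 52, 53, 57]
censored = [16, 28, 39, 55]

def life_table(events, censored, width=5, n_intervals=12):
    """Actuarial life table over [0, width), [width, 2*width), ..."""
    rows = []
    at_risk = len(events) + len(censored)
    surv = 1.0  # survival at the start of the current interval
    for k in range(n_intervals):
        lo, hi = k * width, (k + 1) * width
        d = sum(lo <= t < hi for t in events)
        c = sum(lo <= t < hi for t in censored)
        eff = at_risk - c / 2.0            # censored cases count as half
        q = d / eff if eff > 0 else 0.0    # conditional probability of failure
        rows.append({
            "interval": (lo, hi), "failed": d, "censored": c,
            "effective": eff, "cond_prob": q, "survival": surv,
            "pdf": surv * q / width,
            "hazard": 2.0 * q / (width * (2.0 - q)),
        })
        surv *= 1.0 - q
        at_risk -= d + c
    return rows

rows = life_table(events, censored)
for r in rows[2:5]:
    print(r["interval"], r["effective"], round(r["cond_prob"], 4),
          round(r["survival"], 4), round(r["hazard"], 6))
```

The printed rows can be set against the corresponding rows of the CTM life table in the SAS output.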

SAS output-

The following tables are the SAS output of the life-table analysis under the three treatment groups CTM, R and RC.

(a) CTM-

Life Table Survival Estimates

Interval        Number  Number    Effective  Conditional  Conditional  Survival  Failure  Survival  Median    Median
[Lower, Upper)  Failed  Censored  Sample     Probability  Probability                     Standard  Residual  Standard
                                  Size       of Failure   Std Error                       Error     Lifetime  Error
 0    5         0       0         24.0       0            0            1.0000    0        0         31.7502   2.7046
 5   10         0       0         24.0       0            0            1.0000    0        0         26.7502   2.7046
10   15         3       0         24.0       0.1250       0.0675       1.0000    0        0         21.7502   2.7046
15   20         2       1         20.5       0.0976       0.0655       0.8750    0.1250   0.0675    18.4064   2.5606
20   25         2       0         18.0       0.1111       0.0741       0.7896    0.2104   0.0837    14.5375   2.4660
25   30         3       1         15.5       0.1935       0.1004       0.7019    0.2981   0.0946    17.4500   8.2677
30   35         4       0         12.0       0.3333       0.1361       0.5660    0.4340   0.1038    18.7500   7.5777
35   40         0       1          7.5       0            0            0.3774    0.6226   0.1036    17.5000   2.1300
40   45         1       0          7.0       0.1429       0.1323       0.3774    0.6226   0.1036    12.5000   2.2048
45   50         1       0          6.0       0.1667       0.1521       0.3235    0.6765   0.1018     8.3333   2.0412
50   55         3       0          5.0       0.6000       0.2191       0.2695    0.7305   0.0981     4.1667   1.8634
55   60         1       1          1.5       0.6667       0.3849       0.1078    0.8922   0.0709     3.7500   3.0619
60    .         0       0          0.0       0            0            0.0359    0.9641   0.0478     .        .

Evaluated at the Midpoint of the Interval

Interval        PDF      PDF       Hazard    Hazard
[Lower, Upper)           Standard            Standard
                         Error               Error
 0    5         0        .         0         .
 5   10         0        .         0         .
10   15         0.0250   0.0135    0.026667  0.015362
15   20         0.0171   0.0115    0.020513  0.014486
20   25         0.0175   0.0118    0.023529  0.016609
25   30         0.0272   0.0146    0.042857  0.024601
30   35         0.0377   0.0169    0.08      0.039192
35   40         0        .         0         .
40   45         0.0108   0.0104    0.030769  0.030678
45   50         0.0108   0.0104    0.036364  0.036213
50   55         0.0323   0.0167    0.171429  0.089424
55   60         0.0144   0.0126    0.2       0.173205
60    .         .        .         .         .

(b) R-


Life Table Survival Estimates

Interval        Number  Number    Effective  Conditional  Conditional  Survival  Failure  Survival  Median    Median
[Lower, Upper)  Failed  Censored  Sample     Probability  Probability                     Standard  Residual  Standard
                                  Size       of Failure   Std Error                       Error     Lifetime  Error
 0    5         0       0         25.0       0            0            1.0000    0        0         44.2388   10.6522
 5   10         0       0         25.0       0            0            1.0000    0        0         39.2388   10.6522
10   15         0       0         25.0       0            0            1.0000    0        0         34.2388   10.6522
15   20         2       1         24.5       0.0816       0.0553       1.0000    0        0         29.2388   10.7604
20   25         2       0         22.0       0.0909       0.0613       0.9184    0.0816   0.0553    28.4159    9.9318
25   30         4       0         20.0       0.2000       0.0894       0.8349    0.1651   0.0755    26.2518    4.4717
30   35         1       0         16.0       0.0625       0.0605       0.6679    0.3321   0.0960    25.1418    2.2562
35   40         2       1         14.5       0.1379       0.0906       0.6262    0.3738   0.0987    20.7059    2.2219
40   45         1       1         11.5       0.0870       0.0831       0.5398    0.4602   0.1022    16.8729    2.1508
45   50         1       0         10.0       0.1000       0.0949       0.4929    0.5071   0.1036    12.5071    2.1059
50   55         2       1          8.5       0.2353       0.1455       0.4436    0.5564   0.1043     8.1731    2.0558
55   60         3       1          5.5       0.5455       0.2123       0.3392    0.6608   0.1026     4.5833    1.9543
60    .         0       2          1.0       0            0            0.1542    0.8458   0.0858     .         .

Evaluated at the Midpoint of the Interval

Interval        PDF      PDF       Hazard    Hazard
[Lower, Upper)           Standard            Standard
                         Error               Error
 0    5         0        .         0         .
 5   10         0        .         0         .
10   15         0        .         0         .
15   20         0.0163   0.0111    0.017021  0.012025
20   25         0.0167   0.0113    0.019048  0.013453
25   30         0.0334   0.0152    0.044444  0.022085
30   35         0.00835  0.00817   0.012903  0.012897
35   40         0.0173   0.0117    0.02963   0.020894
40   45         0.00939  0.00914   0.018182  0.018163
45   50         0.00986  0.00958   0.021053  0.021023
50   55         0.0209   0.0138    0.053333  0.037376
55   60         0.0370   0.0182    0.15      0.080283
60    .         .        .         .         .

(c) RC-


Life Table Survival Estimates

Interval        Number  Number    Effective  Conditional  Conditional  Survival  Failure  Survival  Median    Median
[Lower, Upper)  Failed  Censored  Sample     Probability  Probability                     Standard  Residual  Standard
                                  Size       of Failure   Std Error                       Error     Lifetime  Error
 0    5         0       0         27.0       0            0            1.0000    0        0         35.0750   5.7591
 5   10         1       0         27.0       0.0370       0.0363       1.0000    0        0         30.0750   5.7591
10   15         1       0         26.0       0.0385       0.0377       0.9630    0.0370   0.0363    26.1833   5.6514
15   20         3       0         25.0       0.1200       0.0650       0.9259    0.0741   0.0504    22.2917   5.5417
20   25         2       0         22.0       0.0909       0.0613       0.8148    0.1852   0.0748    21.1717   9.8772
25   30         4       2         19.0       0.2105       0.0935       0.7407    0.2593   0.0843    25.0902   2.2735
30   35         2       0         14.0       0.1429       0.0935       0.5848    0.4152   0.0961    22.1765   2.0909
35   40         2       0         12.0       0.1667       0.1076       0.5013    0.4987   0.0989    18.2941   1.9358
40   45         1       1          9.5       0.1053       0.0996       0.4177    0.5823   0.0985    14.4118   1.8131
45   50         0       2          7.0       0            0            0.3737    0.6263   0.0974    10.0000   1.8898
50   55         3       0          6.0       0.5000       0.2041       0.3737    0.6263   0.0974     5.0000   2.0412
55   60         2       0          3.0       0.6667       0.2722       0.1869    0.8131   0.0905     3.7500   2.1651
60    .         0       1          0.5       0            0            0.0623    0.9377   0.0591     .        .

Evaluated at the Midpoint of the Interval

Interval        PDF      PDF       Hazard    Hazard
[Lower, Upper)           Standard            Standard
                         Error               Error
 0    5         0        .         0         .
 5   10         0.00741  0.00727   0.007547  0.007546
10   15         0.00741  0.00727   0.007843  0.007842
15   20         0.0222   0.0121    0.025532  0.014711
20   25         0.0148   0.0101    0.019048  0.013453
25   30         0.0312   0.0143    0.047059  0.023366
30   35         0.0167   0.0113    0.030769  0.021693
35   40         0.0167   0.0113    0.036364  0.025607
40   45         0.00879  0.00857   0.022222  0.022188
45   50         0        .         0         .
50   55         0.0374   0.0181    0.133333  0.072577
55   60         0.0249   0.0158    0.2       0.122474
60    .         .        .         .         .

Goodness of Fit test

In this section we fit several candidate distributions to the data and select the most appropriate one according to its AIC value (the lower the better).
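For reference, the information criteria reported in the fit statistics follow from the -2 log likelihood, the number of estimated parameters k and the sample size n. The Python sketch below is an illustration only; it uses the exponential fit from the SAS output in this report (-2 Log Likelihood = 175.956) and assumes k = 1 free parameter and n = 76 observations. The function name information_criteria is our own.

```python
import math

def information_criteria(neg2_loglik, k, n):
    """Return (AIC, AICC, BIC) from -2 log L, k parameters and n observations."""
    aic = neg2_loglik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction
    bic = neg2_loglik + k * math.log(n)
    return aic, aicc, bic

# Exponential model: only the intercept is estimated (scale fixed at 1).
aic, aicc, bic = information_criteria(175.956, k=1, n=76)
print(round(aic, 3), round(aicc, 3), round(bic, 3))
```

Because each additional parameter adds 2 to the AIC, a richer model is preferred only if it lowers -2 log L by more than that penalty.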

Conclusion- From the SAS output tables we select the Log Normal distribution as the model for fitting the data.

The details of the SAS output are given below.

SAS Output-

Exponential Distribution-

Fit Statistics

-2 Log Likelihood          175.956
AIC (smaller is better)    177.956
AICC (smaller is better)   178.010
BIC (smaller is better)    180.287

Analysis of Maximum Likelihood Parameter Estimates

Parameter      DF  Estimate  Standard  95% Confidence Limits  Chi-Square  Pr > ChiSq
                             Error
Intercept      1    3.4570   0.1089     3.2436    3.6704      1007.79     <.0001
Scale          0    1.0000   0.0000     1.0000    1.0000
Weibull Scale  1   31.7213   3.4543    25.6246   39.2684
Weibull Shape  0    1.0000   0.0000     1.0000    1.0000

Weibull distribution-

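Fit Statistics

-2 Log Likelihood          ...5
AIC (smaller is better)    ...5
AICC (smaller is better)   ...0
BIC (smaller is better)    ...7

Analysis of Maximum Likelihood Parameter Estimates

Parameter      DF  Estimate  Standard  95% Confidence Limits  Chi-Square  Pr > ChiSq
                             Error
Intercept      1    3.4570   0.0853     3.2899    3.6241      1643.79     <.0001
Scale          1    0.4830   0.0615     0.3763    0.6199
Weibull Scale  1   31.7213   2.7047    26.8394   37.4912
Weibull Shape  1    2.0703   0.2636     1.6131    2.6573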
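The Weibull Scale and Weibull Shape rows are a reparameterization of the Intercept and Scale rows: the Weibull scale is exp(Intercept) and the Weibull shape is 1/Scale. The Python sketch below checks this against the estimates in the table above; the delta-method standard error for the shape is an assumption on our part about how that column is obtained.

```python
import math

# Estimates copied from the Weibull parameter table.
intercept, scale, scale_se = 3.4570, 0.4830, 0.0615

weibull_scale = math.exp(intercept)   # characteristic life, in months
weibull_shape = 1.0 / scale           # shape > 1 means an increasing hazard

# Delta method: SE(1/scale) is approximately SE(scale) / scale**2.
weibull_shape_se = scale_se / scale ** 2

print(round(weibull_scale, 2), round(weibull_shape, 4), round(weibull_shape_se, 4))
```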

Log Normal Distribution-

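Fit Statistics

-2 Log Likelihood          ...3
AIC (smaller is better)    132.013
AICC (smaller is better)   ...7
BIC (smaller is better)    ...4

Analysis of Maximum Likelihood Parameter Estimates

Parameter  DF  Estimate  Standard  95% Confidence Limits  Chi-Square  Pr > ChiSq
                         Error
Intercept  1   3.4570    0.0578    3.3436    3.5703       3573.43     <.0001
Scale      1   0.4830    0.0381    0.4138    0.5637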

Log logistic Distribution-

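Fit Statistics

-2 Log Likelihood          ...3
AIC (smaller is better)    ...3
AICC (smaller is better)   ...7
BIC (smaller is better)    ...4

Analysis of Maximum Likelihood Parameter Estimates

Parameter  DF  Estimate  Standard  95% Confidence Limits  Chi-Square  Pr > ChiSq
                         Error
Intercept  1   3.4570    0.0969    3.2671    3.6469       1272.89     <.0001
Scale      1   0.4830    0.1005    0.3212    0.7263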

Gamma Distribution-

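Fit Statistics

-2 Log Likelihood          ...5
AIC (smaller is better)    ...5
AICC (smaller is better)   ...9
BIC (smaller is better)    ...7

Analysis of Maximum Likelihood Parameter Estimates

Parameter  DF  Estimate  Standard  95% Confidence Limits  Chi-Square  Pr > ChiSq
                         Error
Intercept  1   3.4570    0.0853    3.2899    3.6241       1643.79     <.0001
Scale      1   0.4830    0.0615    0.3763    0.6199
Shape      0   1.0000    0.0000    1.0000    1.0000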

Interpretation- The AIC value of the Log Normal distribution is 132.013, the smallest among the fitted distributions; hence the Log Normal distribution is the appropriate model for the given data set.

Pictorial Representation

Survival function-

Most real-life survival curves are not portrayed as smooth curves as in this example. Instead, they are usually shown as staircase curves with a "step" down each time there is a death. This is because a real-world survival curve represents the actual experience of a particular group of people. At the moment of each death the proportion of survivors decreases, and the proportion of survivors does not change at any other time. Thus the curve steps down at each death and is flat between deaths, which leads to the classic staircase appearance.

While a staircase does represent the actual experience of the group whose survival is portrayed in the curve, it does not mean that the risk to an individual patient occurs in discrete steps at the specific times shown in these curves.

With staircase curves, the larger the group of patients, the smaller the step down caused by each death. If the times of the deaths are plotted accurately, then as the size of the group increases the staircase comes closer and closer to the ideal of a smooth curve.

Interpretation-

The curves may compare results from different treatments, as in the above graph. If one curve is continuously "above" the other, as with these curves, the conclusion is that the treatment associated with the higher curve was more effective for these patients. There are many ways in which two curves could compare. They might be very close to each other, indicating that there was no difference between the treatments. If a dangerously toxic treatment resulted in more long-term survivors than a less dangerous treatment, the curve for the riskier treatment might be lower than the other curve at first, because of early treatment deaths, but end up higher in the long run.

From the above graph it is clear that the curve for R (radiotherapy) lies above the others, suggesting that it might be the superior treatment compared with RC and CTM. However, for all of the treatments the survival curve approaches zero in the long run, indicating a low survival rate for all, which is unsurprising for a disease as serious as breast cancer.

Often it is unclear whether two curves are really different or whether the difference between them could reasonably be due to chance. There are tests of significance for survival curves, such as the log-rank test, and a "p value" is often given with comparative survival curves to indicate whether the difference is statistically significant. This is addressed in the test of homogeneity below.

Hazard Function-

The nonparametric hazard plot enables one to examine the hazard function without any distribution assumption. This plot may indicate which parametric distribution would be appropriate for modeling your data should you decide to use parametric estimation methods.

One can interpret the nonparametric hazard plot the same way as one would interpret the parametric hazard plot. The major difference is that the nonparametric hazard plot is a step function whereas the parametric hazard plot is a smoothed function.

Interpretation-

From the above graph it is evident that the hazard for the patients receiving the CTM and RC treatments rises more steeply than the hazard for the patients receiving the R treatment.

Hence the breast cancer patients receiving the CTM and RC treatments do not respond as well as those receiving treatment R, which suggests that treatment R is better.

Test of Homogeneity data table from SAS

Let us consider the following tests-

H0: the treatments (or characteristics) being compared are all the same vs H1: Not H0.

Using proc lifetest we get the following SAS output for homogeneity test-

Rank Statistics

treatment  Log-Rank  Wilcoxon
CTM         5.0743    200.00
R          -6.1580   -243.00
RC          1.0837     43.00

Covariance Matrix for the Log-Rank Statistics

treatment  CTM       R         RC
CTM        10.5217   -5.5701   -4.9516
R          -5.5701   13.3057   -7.7357
RC         -4.9516   -7.7357   12.6873

Covariance Matrix for the Wilcoxon Statistics

treatment  CTM        R          RC
CTM         26460.0   -13342.1   -13118.0
R          -13342.1    29660.3   -16318.2
RC         -13118.0   -16318.2    29436.2

Test of Equality over Strata

Test       Chi-Square  DF  Pr > Chi-Square
Log-Rank   3.6109      2   0.1644
Wilcoxon   2.3929      2   0.3023
-2Log(LR)  1.1913      2   0.5512

Interpretation- The rank tests for homogeneity indicate no significant difference between the treatments (p = 0.1644 for the log-rank test and p = 0.3023 for the Wilcoxon test).

Since the chi-square p-value of the log-rank test is 0.1644, at the α = 0.05 level of significance we fail to reject the null hypothesis H0; that is, there is no significant difference between the treatments.

The p-value corresponding to the chi-square value of the Wilcoxon test also supports this conclusion.
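The chi-square values in the Test of Equality over Strata table can be reproduced from the rank statistics and their covariance matrices. Because the three rank statistics sum to zero, the covariance matrix is singular, so the Python sketch below drops the RC component and uses the quadratic form u'V^(-1)u on the CTM and R components, which gives a chi-square with 2 degrees of freedom. This is an illustrative check of the reported values, not the SAS implementation; the function name rank_test is our own.

```python
import math

def rank_test(u, v):
    """Chi-square (2 df) and p-value from two rank statistics and their 2x2 covariance."""
    (a, b), (_, d) = v
    det = a * d - b * b
    # Quadratic form u' V^-1 u written out for a symmetric 2x2 matrix.
    chi2 = (d * u[0] ** 2 - 2 * b * u[0] * u[1] + a * u[1] ** 2) / det
    p = math.exp(-chi2 / 2.0)   # exact chi-square tail probability for 2 df
    return chi2, p

# CTM and R components copied from the SAS output.
logrank = rank_test((5.0743, -6.1580),
                    ((10.5217, -5.5701), (-5.5701, 13.3057)))
wilcoxon = rank_test((200.00, -243.00),
                     ((26460.0, -13342.1), (-13342.1, 29660.3)))

print([round(x, 4) for x in logrank], [round(x, 4) for x in wilcoxon])
```

Both p-values exceed 0.05, which is why the null hypothesis of equal survival across the three treatments is not rejected.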

Conclusion

When not every patient responds to a treatment, as is nearly always the case in cancer therapy, each trial will accrue some patients who respond and others who, unfortunately, do not. By random chance, some trials will happen to get more responders and thus show a higher response rate than others. If the trials are small enough and there are enough of them, a few of these identical trials will probably show a much higher response rate than the rest.

From the test of homogeneity we conclude that the three treatments do not differ significantly for the 76 breast cancer patients studied. From the survival graph, however, one may argue that treatment R appears slightly better than the others.

Moreover, in the hazard plot the hazard curves for treatments CTM and RC rise more steeply than the curve for treatment R, which supports this view.

So, with the given small data set, we conclude that although the treatments show no statistically significant difference and all have a low survival rate, treatment R may be slightly better than the others.

Recommendation

If a small trial shows a high response rate, and a second small trial of the same treatment also shows a high response rate, this is evidence that the true response rate really is relatively high: the chance of randomly getting a response rate much higher than the true rate in any one small trial is small, so getting such a result twice in a row is unlikely.
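The argument can be illustrated with a binomial calculation. The numbers in the Python sketch below are purely hypothetical and are not taken from the trial: we assume a true response rate of 30%, trials of 20 patients, and call a result "misleadingly high" when at least 10 of the 20 patients respond. The two trials are assumed to be independent.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

# Hypothetical setting: true response rate 0.30, 20 patients per trial.
p_once = prob_at_least(10, 20, 0.30)   # one trial shows >= 50% responders
p_twice = p_once ** 2                  # two independent trials both do

print(p_once, p_twice)
```

Squaring an already small probability makes it much smaller, which is why a replicated result is more convincing than a single one.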

If the trials are larger, the chance of getting misleading results in the first place is smaller. So, in the long run, if we collect more clinical data we may get a clearer picture of which treatment is best, or of whether the treatments differ significantly at all.
