U.U.D.M. Project Report 2008:11. Degree project in mathematical statistics, 30 credits (hp). Supervisor and examiner: Hans Garmo. June 2008. Analyzing and modeling the relative survival rate of patients diagnosed with malignant melanoma. Fredrik Sandin, Department of Mathematics, Uppsala University.



Analyzing and modeling the relative survival rate of patients diagnosed with malignant melanoma

Fredrik Sandin

Master of Science Project in Mathematical Statistics

Supervisor and examiner: Hans Garmo, Ph.D.


Abstract

In cause-specific survival analysis, patients dying from causes other than the disease/condition of interest are censored, which means that the method requires accurate knowledge of the specific cause of death. In population-based cancer studies, it is not uncommon for this information to be either unavailable or unreliable, and a preferable alternative is to study the relative survival rate by comparing the observed survival rate of the patients to the expected survival rate of the background population. In this paper, methods of estimating and modeling the relative survival rate are described and then applied to data from the regional malignant melanoma register of central Sweden. Previous analyses of this data material have shown that melanoma patients with tumour regression induced by the body’s own immune system have an improved survival rate compared to patients without tumour regression. The prognostic significance of spontaneous regression is a controversial issue, and further analysis of the effect of this regression phenomenon is therefore performed in the framework of relative survival analysis.


Acknowledgement

I would first and foremost like to thank Mats Lambe for giving me the opportunity to write my thesis at ROC. Many thanks also to my supervisor Hans Garmo for providing support and guidance through the entire project. I would also like to thank Gunnar Wagenius for providing highly valuable opinions, and for showing such great interest in the study. Finally, I would like to thank the entire staff of ROC for providing a friendly working environment.


Contents

1 Introduction
    1.1 Aim of the thesis
    1.2 What is malignant melanoma
        1.2.1 Treatment
        1.2.2 Classification
2 Methods
    2.1 The Cancer Register
        2.1.1 Data included in the study
    2.2 Right censoring
    2.3 Cause-specific survival
        2.3.1 Estimating the survival function
        2.3.2 The Cox proportional hazard model
    2.4 Competing risks
    2.5 Age-standardized incidence
    2.6 Relative survival
        2.6.1 Calculating the observed survival
        2.6.2 Estimating the expected survival
        2.6.3 The standard error of the relative survival rate
    2.7 Modeling excess hazard
        2.7.1 The Esteve et al. full likelihood approach
        2.7.2 The Dickman et al. approach
        2.7.3 The Hakulinen-Tenkanen approach
    2.8 Statistical software
3 Results
    3.1 Descriptive analysis
        3.1.1 Age-standardized incidence
        3.1.2 Cumulative incidence analysis on a subset of the data
        3.1.3 Relative survival
    3.2 Further analysis of the regression phenomenon
4 Discussion
References
Appendix A Variable descriptions


1 Introduction

1.1 Aim of the thesis

The aim of the thesis is to study the regional malignant melanoma register at the Regional Oncologic Center (ROC) in Uppsala, an institution that gathers and analyses data on patients diagnosed with cancer in the Uppsala/Örebro region of Sweden. The main interest lies in studying incidence, distribution of patient and tumour characteristics, and survival after the time of diagnosis. Cause-specific survival analysis has traditionally been the method of choice for studying cancer patient survival, but information on the specific cause of death for patients in population-based studies is often either unavailable or unreliable. It is preferable, in this situation, to study the relative survival rate, which is estimated by comparing the observed survival rate of cancer patients to the expected survival rate of the background population. Approaches to modeling proportional excess hazard, using the framework of Generalized Linear Models, have been proposed [4] [5], which make it possible to adjust the relative survival rate for various patient and tumour characteristics.

In some cases, at the histopathological analysis of a malignant melanoma, tumour regression induced by the body’s own immune system can be observed. Previous analyses of the malignant melanoma register at ROC have shown that patients with this regression phenomenon have an improved survival rate compared to patients without the regression phenomenon. The prognostic significance of the phenomenon is a controversial issue. Results from some [6][7], but not all studies [8][9][10] suggest that spontaneous regression is associated with a poorer survival, and another aim of the thesis is therefore to further investigate the effect of the regression phenomenon.

1.2 What is malignant melanoma

Malignant melanoma is the most serious form of skin cancer, and it is also one of the cancer diseases which have exhibited the largest increase in occurrence in Sweden during recent decades [1]. Much of the increase has been suggested to be caused by the increase in excessive exposure to sunlight by the population, since it is well known that ultraviolet radiation increases the risk of developing melanoma. Another contributing factor to the risk of developing melanoma is the skin type. While Caucasians, freckled individuals, and people who are susceptible to red sunburn have the highest risk of developing melanoma, the disease is quite unusual in northern Africa, Asia, and the Middle East.


The most important tumour characteristic affecting the prognosis of patients with localized melanoma of the skin is the tumour depth, which is measured according to Breslow at the place where the tumour is thickest. Another important characteristic is the level of invasion according to Clark, which is a measure of how far down into the skin the tumour has invaded.

Clark level of invasion

I.   Melanomas confined to the epidermis. Also known as "melanoma in situ".
II.  Penetration into the second layer of the skin, the dermis.
III. Penetration to the junction of the papillary and reticular dermis.
IV.  Penetration into the reticular dermis.
V.   Penetration into the third layer of the skin, the subcutis.

The presence of metastases in nearby lymph nodes (regional metastases) or other parts of the body (remote metastases) obviously has an impact on the prognosis, and certain patient characteristics, such as age and gender, are also important factors to consider.

1.2.1 Treatment

While the occurrence of malignant melanoma has increased during recent decades, the prognosis for those affected has improved. This is most likely due to earlier detection of the tumour, which enables patients to be cured through surgical treatment. The patient is usually alarmed by changes in a nevus on the skin, and if the contacted physician even slightly suspects cancer, primary surgery is performed, during which the nevus is removed along with a margin of the surrounding skin. If the histopathological investigation confirms malignant melanoma, the patient is called back for extended surgery, during which an even larger margin around the site of the tumour is excised. The width of the margin of disease-free tissue that is removed depends on the depth of the tumour. Melanomas with a Breslow depth of less than 1.0 mm have a recommended total excision margin of 1.0 cm, while tumours with depth larger than 1.0 mm should be excised with a total margin of 2.0 cm. The recommended margin for large melanomas was wider a few years back, but a Swedish study [3] showed that there was no significant change in long-term survival for patients with melanomas thicker than 0.8 mm and thinner than 2.0 mm, when they were treated with a 2.0 cm resection margin instead of a 5.0 cm resection margin. Results from a recent Nordic study also showed no significant change in survival for patients with melanomas thicker than 2.0 mm, when they were treated with a 2.0 cm resection margin instead of a 4.0 cm resection margin.


If a malignant tumour is not detected in an early stage, it could grow large enough to invade through the skin, allowing cancer cells to spread through the blood stream to nearby lymph nodes. The cells could form daughter tumours, so-called regional metastases, which have to be surgically removed. If the disease is allowed to progress for a longer period of time, remote metastases could form in other parts of the body. Unlike localized melanomas, metastases are difficult to completely remove through surgical measures, and other forms of treatments are often necessary. These include chemotherapy and radiation, but research is also being performed on other options, such as immunotherapy.

1.2.2 Classification

An excised tumour is usually classified according to the TNM-system (TNM being an acronym for Tumour, Nodes and Metastases), created by the International Union Against Cancer (UICC). This is a classification system based upon such variables as tumour depth, presence of ulceration, Clark level of invasion, presence of regional lymph node metastases, and presence of remote metastases. The stage of the cancer is determined from these classifications, with melanoma in situ being stage zero.

T-classification (tumour depth, ulceration, Clark level)

TX    Invasion non-assessable
T0    Primary tumour unknown
Tis   In situ or LM (or Clark I)

      Tumour depth         Ulceration
T1    ≤ 1.0 mm        T1a  Absence of ulceration, or Clark level II-III
                      T1b  Presence of ulceration, or Clark level IV-V
T2    1.01-2.0 mm     T2a  Absence of ulceration
                      T2b  Presence of ulceration
T3    2.01-4.0 mm     T3a  Absence of ulceration
                      T3b  Presence of ulceration
T4    > 4.0 mm        T4a  Absence of ulceration
                      T4b  Presence of ulceration


N-classification (lymph node metastases)

NX    Lymph nodes not investigated
N0    No lymph node metastases

      No. of pathological lymph nodes (lgl)    Lymph node status
N1    1 lgl                               N1a  Micrometastasis
                                          N1b  Macrometastasis
N2    2-3 lgl                             N2a  Micrometastasis
                                          N2b  Macrometastasis
                                          N2c  Satellite/in-transit metastasis without regional lymph node metastases
N3    ≥ 4 lgl, or conglomerate of lymph nodes, or satellite/in-transit metastasis with regional lymph node metastases

M-classification (remote metastases)

MX    Not investigated
M0    No remote metastases

      Location
M1    M1a  Remote metastasis in skin, subcutis, or lymph nodes, with normal LD-test
      M1b  Lung metastasis with normal LD-value
      M1c  All other visceral metastasis, or remote metastasis with heightened LD-test from two measures, independent of metastasis location

Cancer stage (based on TNM-classification)

M0    T1a   T1b   T2a   T2b   T3a   T3b   T4a   T4b   M1
N0    IA    IB    IB    IIA   IIA   IIB   IIB   IIC   IV
N1a   IIIA  IIIB  IIIA  IIIB  IIIA  IIIB  IIIA  IIIB  IV
N1b   IIIB  IIIC  IIIB  IIIC  IIIB  IIIC  IIIB  IIIC  IV
N2a   IIIA  IIIB  IIIA  IIIB  IIIA  IIIB  IIIA  IIIB  IV
N2b   IIIB  IIIC  IIIB  IIIC  IIIB  IIIC  IIIB  IIIC  IV
N2c   IIIB  IIIB  IIIB  IIIB  IIIB  IIIB  IIIB  IIIB  IV
N3    IIIC  IIIC  IIIC  IIIC  IIIC  IIIC  IIIC  IIIC  IV
M1    IV    IV    IV    IV    IV    IV    IV    IV


2 Methods

2.1 The Cancer Register

The Regional Oncologic Center in Uppsala gathers data on cancer patients diagnosed in the Uppsala/Örebro region of Sweden. This region includes seven counties: Uppsala, Södermanland, Värmland, Örebro, Västmanland, Dalarna and Gävleborg, with a total population of roughly two million. Physicians at hospitals and other health care centers are obligated by law to report diagnosed cancer cases to the Swedish Cancer Registry, but there is no law that requires them to provide records to the register at ROC. As a consequence, a number of variables are missing for some patients in the data set.

2.1.1 Data included in the study

The malignant melanoma register at ROC contained, at the initiation of the study, 5480 patients whose first diagnosis of malignant melanoma occurred after Dec 31, 1993. To attain acceptable coverage (above 95%) of the cases in the Swedish Cancer Registry, only patients diagnosed between the years 1996 and 2006 were included.

Total entries in the melanoma database: 5480
Missing date of diagnosis: 1 (5480 - 1 = 5479 remaining)
Date of diagnosis earlier than Jan 1, 1996 or later than Dec 31, 2006: 527 (5479 - 527 = 4952 remaining)

A total of 4952 patients, each associated with 41 variables (complete list of variables available in the appendix), were included in the study. Patient records are sometimes incomplete, which prohibits the use of the entire data set in every aspect of the analysis. The geographical distribution of all patients can be found in Table I.


Table I. Number of cases in the melanoma register of central Sweden, according to initial registration.

County         1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  Total
Uppsala          29    44    34    36    48    55    60    56    76    69    57    564
Södermanland     56    49    44    50    50    55    64    64    76    79    94    681
Värmland         37    57    55    52    61    68    67    74    86   100   106    763
Örebro           63    79    78    54    54    76    90    91   112    74   114    885
Västmanland      44    53    64    57    47    69    73    77    48    66    81    679
Dalarna          52    54    58    59    47    70    78    76    63    90    87    734
Gävleborg        60    32    55    43    46    57    70    59    72    81    71    646
Total           341   368   388   351   353   450   502   497   533   559   610   4952

The first part of the descriptive analysis in this paper serves to describe the regional melanoma register, and patients were therefore included to the largest extent possible. When studying patient survival after the time of diagnosis, it is obvious that patients with metastases are not in a sensible way comparable to patients without metastases. Therefore, only patients with N-classification N0 and M-classification M0 were included in the survival analysis.

2.2 Right censoring

In survival analysis, we are interested in studying the time to a certain event of interest. In cancer studies, the event is usually death and time is counted since diagnosis. Right censoring occurs when we do not know the specific time of the event, only that it would have occurred after a specific time point. The reason for this could be death from something other than the disease of interest, migration, or simply survival until the end of the study. Besides right censoring, left censoring and interval censoring can occur when all we know is that the event has occurred before a specific time point, or between two specific time points.

In the data set concerning malignant melanoma, the only kind of censoring present is right censoring. We have chosen to include only patients with a confirmed date of diagnosis between the years 1996 and 2006. Patients who were alive at the end of 2006 were automatically censored, with their last date of follow-up set to December 31, 2006.

2.3 Cause-specific survival

Four functions are used in cause-specific survival analysis to characterize the distribution of the time, T, to the event of interest. These are the survival function, the hazard rate, the probability density function, and the cumulative distribution function. If we know one of these functions, the other three can be uniquely determined.


The survival function, S(t), is the probability of experiencing the event of interest after the time point t,

$$S(t) = P(T > t). \tag{1}$$

The cumulative distribution function is the complement of the survival function,

$$F(t) = 1 - S(t) = P(T \le t), \tag{2}$$

and if T is a continuous random variable, the survival function is the integral of the probability density function,

$$S(t) = P(T > t) = \int_t^{\infty} f(u)\,du \iff f(t) = -\frac{dS(t)}{dt}. \tag{3}$$

The survival function is a monotone, nonincreasing function equal to one at time zero and zero at infinity. Another important quantity is the hazard rate, h(t), also known as the hazard function, which is a nonnegative function defined as

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)} = -\frac{d \ln[S(t)]}{dt}. \tag{4}$$

The cumulative hazard function, H(t), is defined as

$$H(t) = \int_0^t h(u)\,du = -\ln[S(t)]. \tag{5}$$

The hazard rate can be interpreted as the risk of experiencing the event of interest immediately after time t, given survival up until time t. Also, h(t)∆t may be viewed as the approximate probability of experiencing the event in the next instant.
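The relationships in (1)-(5) can be checked numerically. The following is a minimal sketch, assuming a Weibull distribution for T purely for illustration (the distribution, the parameter values, and the function names are not taken from the study): it integrates the hazard numerically and compares the result with −ln[S(t)].

```python
import math

def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t/scale)**shape) for a Weibull-distributed T (illustrative choice)."""
    return math.exp(-((t / scale) ** shape))

def weibull_hazard(t, shape, scale):
    """h(t) = f(t)/S(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def cumulative_hazard(hazard, t, steps=10_000):
    """H(t): integrate the hazard from 0 to t with the midpoint rule."""
    width = t / steps
    return sum(hazard((i + 0.5) * width) for i in range(steps)) * width

# Hypothetical parameter values, chosen only for the demonstration.
shape, scale, t = 1.5, 10.0, 4.0
H = cumulative_hazard(lambda u: weibull_hazard(u, shape, scale), t)
S = weibull_survival(t, shape, scale)
print(H, -math.log(S))  # the two values should agree closely, as stated in (5)
```

Any other continuous distribution with a known survival function could be substituted; the identity H(t) = −ln[S(t)] does not depend on the Weibull assumption.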

If T is a discrete random variable, taking on values $t_1 < t_2 < \dots < t_n$, the survival function is defined as

$$S(t) = P(T > t) = \sum_{t_j > t} p(t_j), \tag{6}$$

where $p(t_j) = P(T = t_j)$ is the probability mass function. The hazard function is given by

$$h(t_j) = P(T = t_j \mid T \ge t_j) = \frac{p(t_j)}{S(t_{j-1})}. \tag{7}$$

Since T is discrete,

$$p(t_j) = S(t_{j-1}) - S(t_j) \implies h(t_j) = 1 - \frac{S(t_j)}{S(t_{j-1})}, \tag{8}$$

and because $S(t_0) = 1$, the survival function can be written as the product of conditional survival probabilities,

$$S(t) = \prod_{t_j \le t} \frac{S(t_j)}{S(t_{j-1})} = \prod_{t_j \le t} [1 - h(t_j)]. \tag{9}$$

2.3.1 Estimating the survival function

Let $d_i$ be the number of events at time $t_i$, and let $Y_i$ be the number of individuals at risk at time $t_i$ (i.e. the number of individuals with a time on study larger than or equal to $t_i$). When estimating the survival function from data consisting of n distinct event times, $t_1 < t_2 < \dots < t_n$, we use the fact that $d_i/Y_i$ provides an estimate of the conditional probability of experiencing the event at time $t_i$, given survival to just prior to time $t_i$. Since $S(0) = 1$, $P(T > t_i \mid T \ge t_i) = S(t_i)/S(t_{i-1})$, and $P(T > t_i \mid T \ge t_i)$ is estimated by $1 - d_i/Y_i$, we conclude from (9) that a suitable estimator of S(t) is

$$\hat{S}(t) = \begin{cases} 1, & \text{if } t < t_1, \\ \prod_{t_i \le t} \left[ 1 - \dfrac{d_i}{Y_i} \right], & \text{if } t_1 \le t. \end{cases} \tag{10}$$

This is known as the Kaplan-Meier estimator (sometimes referred to as the Product-Limit estimator) of the survival function at time t. It is a step function with jumps at the distinct event times, and its variance is usually estimated by Greenwood’s formula:

$$V[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}. \tag{11}$$

Pointwise confidence intervals for the survival function can be formed in the usual manner by assuming approximate normality, and by using the square root of (11) as the estimated standard error.
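The estimator (10) and Greenwood’s formula (11) translate almost directly into code. The sketch below is illustrative only: the function name and the toy data are hypothetical and are not taken from the melanoma register.

```python
import math
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t_i, S_hat(t_i), standard error)] at each distinct event time.

    times  -- time on study for each individual
    events -- 1 if the event was observed, 0 if the observation is right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)  # d_i per event time
    survival, greenwood_sum, curve = 1.0, 0.0, []
    for t in sorted(deaths):
        d = deaths[t]
        at_risk = sum(1 for u in times if u >= t)  # Y_i: time on study >= t_i
        survival *= 1.0 - d / at_risk              # product in (10)
        if at_risk > d:                            # term is undefined when everyone at risk fails
            greenwood_sum += d / (at_risk * (at_risk - d))
        curve.append((t, survival, survival * math.sqrt(greenwood_sum)))
    return curve

# Hypothetical toy data: six individuals, two of them censored.
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
for t, s, se in kaplan_meier(times, events):
    print(t, round(s, 4), round(se, 4))
```

An approximate 95% pointwise confidence interval at an event time would then be the estimate plus or minus 1.96 times the standard error, following the normal approximation mentioned above.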

2.3.2 The Cox proportional hazard model

By modeling the time after diagnosis, it is possible to adjust for certain explanatory variables, so-called covariates, such as age, gender, or other characteristics that may affect the survival. For every observation i, i = 1, 2, ..., n, we have the data $(T_i, \delta_i, \mathbf{Z}_i)$, where $T_i$ is the time on study for observation i, $\delta_i$ is an indicator of whether observation i experienced the event or was censored, and $\mathbf{Z}_i$ is a vector of covariates for observation i. We denote the hazard rate at time t for an observation with covariate vector $\mathbf{Z}$ by $h(t \mid \mathbf{Z})$. The most commonly used regression model in cause-specific survival analysis is the proportional hazard model, proposed by Cox:

$$h(t \mid \mathbf{Z}) = h_0(t) \exp(\boldsymbol{\beta}^T \mathbf{Z}). \tag{12}$$

Here, $h_0(t)$ is a baseline hazard rate that is common to all observations, and $\boldsymbol{\beta}$ is a vector of parameters to be estimated. According to the model, the hazard rates for two observations with distinct covariate vectors, $\mathbf{Z}_1$ and $\mathbf{Z}_2$, are proportional:

$$\frac{h(t \mid \mathbf{Z}_1)}{h(t \mid \mathbf{Z}_2)} = \frac{h_0(t) \exp(\boldsymbol{\beta}^T \mathbf{Z}_1)}{h_0(t) \exp(\boldsymbol{\beta}^T \mathbf{Z}_2)} = \frac{\exp(\boldsymbol{\beta}^T \mathbf{Z}_1)}{\exp(\boldsymbol{\beta}^T \mathbf{Z}_2)}. \tag{13}$$

The parameters and their standard errors are estimated by maximizing the partial likelihood, using an iterative numeric procedure such as Newton-Raphson.
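To make the estimation step concrete, the following is a minimal sketch of Newton-Raphson maximization of the partial likelihood for a single covariate. It is a simplification under stated assumptions, not the procedure used by any particular statistical package: it assumes one scalar covariate and no tied event times, and the function names and toy data are hypothetical.

```python
import math

def score_and_information(beta, times, events, z):
    """First derivative (score) and negative second derivative (information)
    of the Cox partial log-likelihood for a single covariate."""
    score, information = 0.0, 0.0
    for t_i, e_i, z_i in zip(times, events, z):
        if e_i != 1:
            continue  # censored observations only contribute through the risk sets
        risk_set = [z_j for t_j, z_j in zip(times, z) if t_j >= t_i]
        s0 = sum(math.exp(beta * z_j) for z_j in risk_set)
        s1 = sum(z_j * math.exp(beta * z_j) for z_j in risk_set)
        s2 = sum(z_j ** 2 * math.exp(beta * z_j) for z_j in risk_set)
        score += z_i - s1 / s0
        information += s2 / s0 - (s1 / s0) ** 2
    return score, information

def fit_cox(times, events, z, max_iter=50, tol=1e-10):
    """Newton-Raphson iterations starting from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = score_and_information(beta, times, events, z)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    _, information = score_and_information(beta, times, events, z)
    return beta, 1.0 / math.sqrt(information)  # estimate and its standard error

# Hypothetical toy data: z = 1 could, for instance, indicate an exposed group.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 0, 1, 1, 0, 1]
z = [1, 0, 1, 1, 0, 1, 0, 0]
beta_hat, se = fit_cox(times, events, z)
print(beta_hat, se, math.exp(beta_hat))  # exp(beta_hat) is the estimated hazard ratio
```

The baseline hazard $h_0(t)$ never appears in the code, which reflects the fact that it cancels out of the partial likelihood just as it cancels in (13).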

2.4 Competing risks

In cause-specific survival analysis, we consider a hypothetical world where the disease of interest is the only possible cause of death. Instead of censoring for deaths from other causes, we now consider them competing risks acting upon the patient. The occurrence of one of the events would then preclude us from observing the others. In general, if we consider K competing risks, we observe the vector $(T_i, \delta_i)$, where $T_i$ is the time until an event, and $\delta_i \in \{0, 1, 2, \dots, K\}$ indicates which event has occurred (with $\delta_i = 0$ meaning censoring). The probability of death from cause k before or at time t is expressed by the cause-specific subdistribution function, $F_k(t)$, also known as the cumulative incidence function:

$$F_k(t) = P(T \le t, \delta = k). \tag{14}$$

$F_k(t)$ is not a true distribution function since $F_k(\infty) = P(\delta = k) < 1$. We now let $t_1 < t_2 < \dots < t_n$ be distinct event times, and $Y_i$ be the number at risk at time $t_i$. We denote by $r_i$ the number of patients experiencing the event of interest (e.g. death from the cancer of interest) at time $t_i$, and we denote by $d_i$ the number of patients experiencing any other event (i.e. any of the other competing risks) at time $t_i$. The cumulative incidence function at time t is then defined as

$$\mathrm{CI}(t) = \begin{cases} 0, & \text{if } t < t_1, \\ \sum_{t_i \le t} \left[ \prod_{j=1}^{i-1} \left[ 1 - \dfrac{d_j + r_j}{Y_j} \right] \right] \dfrac{r_i}{Y_i}, & \text{if } t_1 \le t. \end{cases} \tag{15}$$


For $t \ge t_1$, we have

$$\mathrm{CI}(t) = \sum_{t_i \le t} \hat{S}(t_i-) \frac{r_i}{Y_i}, \tag{16}$$

where $\hat{S}(t_i-)$ is the Kaplan-Meier estimator of the survival function, when treating any of the competing risks as an event, evaluated just before time $t_i$. The cumulative incidence function estimates the probability of the event of interest occurring before or at time t, and occurring before any of the other competing risks. The variance of (15) can be estimated by

$$V[\mathrm{CI}(t)] = \sum_{t_i \le t} \hat{S}(t_i)^2 \left\{ [\mathrm{CI}(t) - \mathrm{CI}(t_i)]^2 \frac{r_i + d_i}{Y_i^2} + [1 - 2(\mathrm{CI}(t) - \mathrm{CI}(t_i))] \frac{r_i}{Y_i^2} \right\}, \tag{17}$$

and pointwise confidence intervals can be formed in the usual manner by assuming approximate normality, and by using the square root of (17) as standard error.
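The point estimate in (15)-(16) can be sketched as follows. The code is illustrative only: the function name, the coding of causes, and the toy data are hypothetical, and the variance estimate (17) is deliberately left out.

```python
def cumulative_incidence(times, causes, cause_of_interest=1):
    """Return [(t_i, CI(t_i))] following (16).

    causes -- 0 for censoring, 1..K for the type of event that occurred
    """
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    overall_survival = 1.0  # Kaplan-Meier estimate for any event, just before t_i
    ci, curve = 0.0, []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)  # Y_i
        r = sum(1 for u, c in zip(times, causes)
                if u == t and c == cause_of_interest)           # events of interest
        d = sum(1 for u, c in zip(times, causes)
                if u == t and c not in (0, cause_of_interest))  # competing events
        ci += overall_survival * r / at_risk
        overall_survival *= 1.0 - (r + d) / at_risk
        curve.append((t, ci))
    return curve

# Hypothetical toy data: cause 1 is the event of interest, cause 2 a competing risk.
times = [1, 2, 3, 4, 5]
causes = [1, 2, 0, 1, 2]
print(cumulative_incidence(times, causes, cause_of_interest=1))
print(cumulative_incidence(times, causes, cause_of_interest=2))
```

In this toy example the last individual at risk experiences an event, so the overall Kaplan-Meier estimate drops to zero and the cumulative incidences for the two causes add up to one.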

2.5 Age-standardized incidence

Incidence is a measure of the risk of developing a disease/condition within a specified period of time. The number of new cases is in itself a fairly incomplete measure, since we have to consider the size of the population at risk. The incidence is often calculated by comparing the occurrence of the disease in different age categories to the population at risk in the same age categories. Comparing this measure over time is unwise, since the age-composition of the underlying population changes. Therefore, it is preferable to study the age-standardized incidence, which is the age-specific incidence, standardized to a pre-specified age-composition.

The incidence proportion is calculated as the number of new cases of the disease of interest divided by the size of the underlying population, for a specified period of time. Since cancer is usually more common in groups of elderly individuals, it is of interest to consider the age-specific incidence, which for age category i, and time period j, is calculated as

$$I_{i,j} = \frac{X_{i,j}}{B_{i,j} \cdot N_j}. \tag{18}$$

Here, $X_{i,j}$ is the number of new cases in age category i during the time period of interest, $B_{i,j}$ is the mean population of the area in age category i during the time period of interest, and $N_j$ is the number of years in the time period of interest. We now consider the age-standardized incidence, $R_j$, which for time period j is calculated as

$$R_j = \sum_{i=1}^{k} w_i \cdot I_{i,j}. \tag{19}$$

Here, k is the number of age categories considered, and $w_i$ are suitable weights. The measure of interest is usually the age-standardized incidence per 100 000 person-years, which we acquire by multiplying $R_j$ by 100 000.

In this study, the population of the seven counties of interest is used as the background population, and the weights in (19) are chosen from the age-composition for the year 2000.
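The calculation in (18) and (19) is a weighted sum, as the following sketch shows. All numbers are invented for the illustration (three age categories, a two-year period); they are not figures from the register, and the function names are hypothetical.

```python
def age_specific_incidence(cases, mean_population, years):
    """I_{i,j} in (18): new cases per person-year in one age category."""
    return cases / (mean_population * years)

def age_standardized_incidence(cases, mean_population, years, weights):
    """R_j in (19): weighted sum of age-specific incidences.

    The weights describe the standard age-composition and must sum to one.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(w * age_specific_incidence(x, b, years)
               for w, x, b in zip(weights, cases, mean_population))

# Hypothetical input: three age categories observed over a two-year period.
cases = [2, 10, 30]
mean_population = [100_000, 80_000, 50_000]
weights = [0.5, 0.3, 0.2]
rate = age_standardized_incidence(cases, mean_population, years=2, weights=weights)
print(rate * 100_000)  # age-standardized incidence per 100 000 person-years
```

Using the same weights for every time period j is what makes the resulting incidences comparable over time, even if the actual age-composition of the population changes.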

2.6 Relative survival

Cause-specific survival analysis provides us with an estimate of the survival probability of a cancer patient in a hypothetical world where the cancer of interest is the only possible cause of death, as death from any other cause leads to the patient being censored. The specific cause of death for a patient can sometimes be hard to determine, and questions may arise around how to handle deaths from treatment complications, or deaths at a time where the cancer was not known to be present.

In relative survival analysis, knowledge of the specific cause of death is not required. The relative survival rate, r(t), is defined as the observed survival probability, S(t), of the patients included in the study, divided by the expected survival probability, S*(t), of a comparable group of the background population,

$$r(t) = \frac{S(t)}{S^*(t)}. \tag{20}$$

The cause of death is not needed, since we compare the mortality from all causes in the patient group to the mortality of the population at large. The mortality of the background population is estimated from life tables, which are stratified by age, gender, and sometimes race. Life tables are calculated for populations that include individuals diagnosed with the cancer of interest, but it has been shown [14] that this does not significantly affect the estimates in practice. Since the general mortality has decreased significantly during the past century, it is of interest to take into account the change in mortality over calendar time. For this purpose, a cohort life table is used, which includes the mortality of the population for different calendar periods. To be able to compare each patient to the correct subgroup of the background population, it is in practice common to split each observation for pre-specified time intervals, and to increase the current year and age accordingly for each new part. If we were interested in yearly relative survival rates, a patient dying in her tenth year after the time of diagnosis would be split into ten parts, with the first nine parts being censored after exactly one year, and the last part having a time on study equal to the time spent alive by the patient during the tenth year.
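The splitting of follow-up into yearly parts can be sketched as below. This is a simplified illustration, assuming yearly intervals and that attained age and calendar year both increase by exactly one per interval; the function name, the record fields, and the example patient are hypothetical.

```python
import math

def split_yearly(age_at_diagnosis, year_of_diagnosis, years_survived, died):
    """Split one patient's follow-up into yearly parts.

    Every part except the last is censored after exactly one year; the last
    part carries the remaining time at risk and the patient's final status.
    """
    n_parts = max(1, math.ceil(years_survived))
    records = []
    for k in range(n_parts):
        last = (k == n_parts - 1)
        records.append({
            "interval": k + 1,
            "age": age_at_diagnosis + k,              # attained age at start of the part
            "calendar_year": year_of_diagnosis + k,   # used to look up the life table
            "time_at_risk": years_survived - k if last else 1.0,
            "died": bool(died) and last,
        })
    return records

# Hypothetical patient: diagnosed at age 60 in 1996, dies 9.4 years later.
parts = split_yearly(60, 1996, 9.4, died=True)
for part in parts:
    print(part)
```

For the example patient, who dies in the tenth year, this yields ten parts: nine censored parts of one year each and a final part of roughly 0.4 years ending in death, matching the description above.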

2.6.1 Calculating the observed survival

Two methods are commonly used for calculating the observed survival proportion: the actuarial approach, and the method of transforming the hazard. The first uses only information on the interval in which patients die or are censored, and provides the following estimate for the interval-specific survival proportion for interval i, i.e. the probability of surviving interval i, given survival up until the start of the interval:

$$p_{i,1} = 1 - \frac{d_i}{l_i - \frac{W_i}{2}}. \tag{21}$$

Here, $d_i$ is the number of deaths in interval i, $W_i$ is the number of censored individuals ("true" censoring, and not censoring due to interval splitting) in interval i, and $l_i$ is the number at risk in interval i. The quantity $l'_i = l_i - W_i/2$ is called the effective number at risk, and is calculated by assuming that the censored patients are at risk for an average of half the interval.

If we know the exact time on study for each individual, we can estimate the interval-specific survival proportion for interval i by transforming the estimated hazard:

$$p_{i,1} = \exp\left( -\frac{d_i}{Y_i} k_i \right). \tag{22}$$

$Y_i$ is the total person-time at risk for interval i, and $k_i$ is the length of the interval in years. We assume that the hazard rate is constant in each interval. The two methods provide very similar estimates if no left-truncation is present, but since exact survival times are available in the data set considered, the method of transforming the hazard is used in this study.
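The two estimates (21) and (22) can be compared on a small example. The sketch below uses invented numbers for a single one-year interval; in particular, the person-time value is a hypothetical figure obtained by assuming that the deceased and censored patients each contribute half a year.

```python
import math

def actuarial_survival(deaths, at_risk, censored):
    """Interval-specific survival (21), using the effective number at risk."""
    effective_at_risk = at_risk - censored / 2.0
    return 1.0 - deaths / effective_at_risk

def hazard_transform_survival(deaths, person_time, interval_length=1.0):
    """Interval-specific survival (22), assuming a constant hazard in the interval."""
    return math.exp(-interval_length * deaths / person_time)

# Hypothetical one-year interval: 100 at risk, 10 deaths, 4 censored.
# Person-time: 86 full years plus half a year for each of the 14 who left early.
p_actuarial = actuarial_survival(deaths=10, at_risk=100, censored=4)
p_hazard = hazard_transform_survival(deaths=10, person_time=86 + 0.5 * 14)
print(p_actuarial, p_hazard)
```

With these inputs the two estimates differ only in the fourth decimal place, which is in line with the remark above that the methods usually agree closely.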

The cumulative observed survival proportion up until the end of interval iis calculated by taking the product of the interval-specific survival proportions:

pi =∏j≤i

pj,1. (23)

The variance of the observed interval-specific survival proportion is usually estimated by using the effective number at risk in Greenwood's formula (11):

$$V[p_{i,1}] = p_{i,1}^2 \left( \frac{d_i}{l'_i (l'_i - d_i)} \right). \tag{24}$$

The variance of the cumulative observed survival proportion is estimated by

$$V[p_i] = p_i^2 \sum_{j \le i} \frac{d_j}{l'_j (l'_j - d_j)}. \tag{25}$$
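As an illustration of equations (22) to (25), the following sketch computes the interval-specific and cumulative observed survival together with their Greenwood variances. It is a minimal example under stated assumptions and not the author's program: the function name `observed_survival` and the dictionary keys are chosen here for illustration, and the per-interval counts are assumed to have been tabulated already.

```python
import math

def observed_survival(deaths, person_time, lengths, effective_at_risk):
    """Observed survival by transforming the hazard, with variances.

    deaths[i]            -- d_i, deaths in interval i
    person_time[i]       -- Y_i, person-years at risk in interval i
    lengths[i]           -- k_i, interval length in years
    effective_at_risk[i] -- l'_i = l_i - W_i / 2
    """
    rows = []
    cumulative = 1.0
    greenwood_sum = 0.0
    for d, y, k, l_eff in zip(deaths, person_time, lengths, effective_at_risk):
        p_interval = math.exp(-d / y * k)        # equation (22)
        cumulative *= p_interval                 # equation (23)
        term = d / (l_eff * (l_eff - d))
        greenwood_sum += term
        rows.append({
            "p_interval": p_interval,
            "var_interval": p_interval ** 2 * term,            # equation (24)
            "p_cumulative": cumulative,
            "var_cumulative": cumulative ** 2 * greenwood_sum,  # equation (25)
        })
    return rows
```

The running sum `greenwood_sum` is kept across intervals so that the cumulative variance in (25) does not have to be recomputed from scratch for each interval.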

2.6.2 Estimating the expected survival

Three methods of estimating the expected survival proportion are discussed here: Ederer I, Ederer II, and the Hakulinen method. The methods differ in how long a patient's counterpart in the background population is considered to be at risk, but in practice the three methods often produce very similar estimates for follow-up times up to 10 years.

The Ederer I method [16] is not used in this study, but the details are fairly simple. The expected probability of patient $j$ surviving until the end of interval $i$, $p^*_i(j)$, is calculated as the product of the expected interval-specific survival proportions, $p^*_{k,1}(j)$, gathered from the life table for corresponding age, gender and year:

$$p^*_i(j) = \prod_{k \le i} p^*_{k,1}(j). \tag{26}$$

The expected cumulative survival rate is then estimated by

$$p^*_i = \frac{\sum_{j=1}^{l_1} p^*_i(j)}{l_1}, \tag{27}$$

where $l_1$ is the number at risk at the start of the first interval. In other words, the matched individuals in the background population are considered to be at risk indefinitely, even if the patients die or are censored.

The Ederer II method [17] is considered more reasonable, since it allows for different lengths of follow-up time. Matched individuals in the background population of patients that die or are censored are no longer considered to be at risk. This is an effect of the estimate of the expected interval-specific survival for interval $k$ being based only upon those patients at risk at the start of the interval:

$$p^*_{k,1} = \sum_{j=1}^{l_k} \frac{p^*_{k,1}(j)}{l_k}. \tag{28}$$

The expected cumulative survival rate is then estimated by

$$p^*_i = \prod_{k \le i} p^*_{k,1}. \tag{29}$$

While this method is more appealing than the previous method, by allowing for heterogeneous follow-up times, they both lead to biased estimates of the relative survival rate. Ederer I usually overestimates the relative survival rate, while Ederer II usually underestimates it [18].
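The difference between the two Ederer estimates can be made concrete with a small sketch of equations (26) to (29). The function names and the input layout are assumptions made for this example; in particular, `intervals_at_risk` is a hypothetical way of encoding which patients are still at risk at the start of each interval.

```python
def ederer_one(p_exp):
    """Ederer I, equations (26)-(27).

    p_exp[j][k] is the expected interval-specific survival of patient j
    in interval k + 1, for every patient at risk at the start of the
    first interval.
    """
    n_patients = len(p_exp)
    cumulative = [1.0] * n_patients
    result = []
    for k in range(len(p_exp[0])):
        # Equation (26): every matched individual stays at risk.
        cumulative = [c * row[k] for c, row in zip(cumulative, p_exp)]
        result.append(sum(cumulative) / n_patients)  # equation (27)
    return result


def ederer_two(p_exp, intervals_at_risk):
    """Ederer II, equations (28)-(29).

    intervals_at_risk[j] is the number of intervals at whose start
    patient j is still alive and under observation.
    """
    result = []
    cumulative = 1.0
    for k in range(len(p_exp[0])):
        # Equation (28): average only over patients at risk at the start.
        at_risk = [row[k] for row, n in zip(p_exp, intervals_at_risk) if n > k]
        if not at_risk:
            break  # nobody left at risk, no further estimates
        cumulative *= sum(at_risk) / len(at_risk)  # equation (29)
        result.append(cumulative)
    return result
```

With two patients whose expected yearly survival is 0.9 and 0.5, Ederer I gives 0.7 and 0.53 for the first two years, whereas Ederer II gives 0.7 and 0.63 if the second patient leaves the risk set after the first year.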

In the Hakulinen method, a censored patient's counterpart in the background population is also censored, while the counterpart of a patient that dies remains at risk. The objective is to create an unbiased estimate of the relative survival rate by canceling the bias of the observed survival with a created bias of the estimated expected survival. For follow-up times longer than 10 years, the Hakulinen method is usually preferred, but since we in this study only consider follow-up times up to 7 years, the Ederer II method was deemed to be sufficient. The main advantage of the Ederer II method is its simplicity, which makes it easy to use in practice. The Hakulinen method is somewhat more complicated, and the full details can be found in Timo Hakulinen's original article [18].

When using Hakulinen's method in practice, we have to know the potential follow-up times for all patients. The follow-up time is divided into intervals of pre-specified lengths (usually one year), and we denote the expected interval-specific survival probability for interval $k$, for a patient having a potential follow-up time stretching past the end of interval $k-1$, by $p^*_{k,1}(j)$. The expected probability of patient $j$ surviving up until the end of interval $i$ is estimated by

$$p^*_i(j) = \prod_{k=1}^{i} p^*_{k,1}(j). \tag{30}$$

We now organize the $l_i$ patients having a potential follow-up time past the end of interval $i-1$, so that $\kappa_{i,a}$ is the set of all patients having a potential follow-up time stretching past the end of interval $i$, and $\kappa_{i,b}$ is the set of all patients withdrawing during interval $i$. We let

$$l^*_i = \begin{cases} \sum_{j=1}^{l_i} p^*_{i-1}(j), & i \ge 2 \\ l_1, & i = 1 \end{cases} \tag{31}$$

be the expected number of patients alive and under observation at the beginning of interval $i$, and we let

$$w^*_i = \begin{cases} \sum_{j \in \kappa_{i,b}} p^*_{i-1}(j) \left\{ p^*_{i,1}(j) \right\}^{1/2}, & i \ge 2 \\ \sum_{j \in \kappa_{i,b}} \left\{ p^*_{i,1}(j) \right\}^{1/2}, & i = 1 \end{cases} \tag{32}$$

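be the expected number of patients withdrawing alive during interval $i$.

$$\delta^*_i = \begin{cases} \sum_{j \in \kappa_{i,b}} p^*_{i-1}(j) \left[ 1 - \left\{ p^*_{i,1}(j) \right\}^{1/2} \right], & i \ge 2 \\ \sum_{j \in \kappa_{i,b}} \left[ 1 - \left\{ p^*_{i,1}(j) \right\}^{1/2} \right], & i = 1 \end{cases} \tag{33}$$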

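is the expected number of patients dying in interval $i$, among the patients withdrawing in interval $i$, and

$$d^*_i = \begin{cases} \left\{ \sum_{j \in \kappa_{i,a}} p^*_{i-1}(j) \left[ 1 - p^*_{i,1}(j) \right] \right\} + \delta^*_i, & i \ge 2 \\ \left\{ \sum_{j \in \kappa_{i,a}} \left[ 1 - p^*_{i,1}(j) \right] \right\} + \delta^*_1, & i = 1 \end{cases} \tag{34}$$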

is the expected total number of patients dying in interval $i$. The expected interval-specific survival proportion can then be calculated by using the actuarial approach:

$$p^*_{i,1} = 1 - \frac{d^*_i}{l^*_i - \frac{w^*_i}{2}}. \tag{35}$$

The expected cumulative survival proportion, up until the end of interval $i$, is calculated as the product of the expected interval-specific survival proportions,

$$p^*_i = \prod_{k=1}^{i} p^*_{k,1}. \tag{36}$$
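Equations (30) to (36) can be collected into one short routine. The sketch below is only an illustration under stated assumptions, not the program written for the study: the function name and input layout are hypothetical, intervals are taken to have unit length, and a potential follow-up time exactly equal to the end of interval $i$ is treated as belonging to $\kappa_{i,a}$. Setting $p^*_0(j) = 1$ lets the cases $i = 1$ and $i \ge 2$ share the same code.

```python
import math

def hakulinen_expected_survival(p_exp, potential_followup):
    """Expected cumulative survival by the Hakulinen method.

    p_exp[j][i - 1]       -- expected interval-specific survival of
                             patient j in interval i
    potential_followup[j] -- potential follow-up time of patient j, in
                             units of the interval length
    """
    # p*_{i-1}(j): expected survival up to the start of the current
    # interval, equal to 1 for the first interval.
    surv_to_start = [1.0] * len(p_exp)
    cumulative = 1.0
    result = []
    for i in range(1, len(p_exp[0]) + 1):
        l_star = w_star = d_star = 0.0
        for j, followup in enumerate(potential_followup):
            if followup <= i - 1:
                continue  # no potential follow-up in interval i
            p = p_exp[j][i - 1]
            l_star += surv_to_start[j]                      # equation (31)
            if followup >= i:
                # kappa_{i,a}: at risk during the whole interval
                d_star += surv_to_start[j] * (1.0 - p)      # equation (34)
            else:
                # kappa_{i,b}: withdraws, at risk for half the interval
                half = math.sqrt(p)
                w_star += surv_to_start[j] * half           # equation (32)
                d_star += surv_to_start[j] * (1.0 - half)   # equation (33)
            surv_to_start[j] *= p                           # equation (30)
        if l_star == 0.0:
            break  # nobody has potential follow-up in this interval
        # Equations (35) and (36)
        cumulative *= 1.0 - d_star / (l_star - w_star / 2.0)
        result.append(cumulative)
    return result
```

When no patient withdraws and all patients share the same expected survival, the routine reproduces that expected survival, which is a convenient sanity check.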

2.6.3 The standard error of the relative survival rate

Since the variance of the expected survival proportion is very small compared to the variance of the observed survival proportion, it is assumed that the expected survival proportion is a fixed constant. Using the familiar formula $V[cX] = c^2 V[X]$, for $c$ constant, we obtain the following expression for the variance of the relative survival rate, $r$:

$$V[r] = V\left[\frac{p}{p^*}\right] = \frac{1}{p^{*2}} V[p]. \tag{37}$$

The expression holds for the interval-specific relative survival rate as well as the cumulative relative survival rate. Greenwood's formula from section 2.6.1 is commonly used as an estimate of the variance of the observed survival proportion. Pointwise confidence intervals for the relative survival rate can be constructed in the usual manner by assuming approximate normality, and by using the square root of (37) as estimated standard error.
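A pointwise interval of this kind can be sketched in a few lines. The function name `relative_survival_ci` is an assumption made for the example, and the default quantile 1.96 corresponds to an approximate 95% interval; the expected survival is treated as a fixed constant, as in (37).

```python
import math

def relative_survival_ci(p_obs, var_obs, p_exp, z=1.96):
    """Relative survival rate with a normal-approximation interval.

    p_obs   -- observed survival proportion
    var_obs -- its estimated variance (e.g. from Greenwood's formula)
    p_exp   -- expected survival proportion, treated as a constant
    """
    r = p_obs / p_exp
    se = math.sqrt(var_obs) / p_exp  # square root of equation (37)
    return r, r - z * se, r + z * se
```

Note that the interval is not constrained to stay within any particular range; a relative survival rate above 1 is possible when the patients survive better than the background population.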

2.7 Modeling excess hazard

Merely estimating the relative survival rate gives us no convenient way of adjusting for interesting covariates. At best, we can estimate the relative survival rate for different values of a covariate and compare the differences in the acquired curves. To perform further analyses would require a way of modeling the relative survival rate, similar to that of Cox regression for cause-specific survival analysis.

The proposed method for modeling the relative survival rate is to assume that the hazard rate at time $t$, of a patient with covariate vector $z$, is the sum of an expected hazard rate, $h^*(t, z)$, and an excess hazard rate, $\nu(t, z)$. The expected hazard rate is estimated from life tables for the background population, which are stratified by a subvector of $z$ (typically age, gender, and year). The hazard rate is also assumed to be constant within pre-specified intervals (typically one year) of the follow-up time. In addition to the covariates of interest, $z$, we will include indicator variables for each interval, and we let $x$ be the vector of these variables in addition to $z$. Similar to the Cox regression model, the excess hazard is assumed to be of the form

$$\nu(t, z) = \exp(x\beta), \tag{38}$$

where $\beta$ is the vector of parameters to be estimated. The proposed model is therefore written as

$$h(t, x) = h^*(t, x) + \exp(x\beta). \tag{39}$$

The excess hazard ratio between a group of patients with covariate vector $x_1$, and a group of patients with covariate vector $x_2$, is constant:

$$\frac{h(t, x_1) - h^*(t, x_1)}{h(t, x_2) - h^*(t, x_2)} = \frac{\exp(x_1\beta)}{\exp(x_2\beta)}. \tag{40}$$

This can be interpreted as relative excess risk, i.e. an excess hazard ratio of 1.2 indicates that the excess hazard, due to the cancer, is 20% higher for the group with covariate vector $x_1$, than for the group with covariate vector $x_2$.

2.7.1 The Esteve et al. full likelihood approach

Data involving right censoring can be represented by the random variable pairs $(T_i, \delta_i)$, $i = 1, 2, \ldots, n$, where $T_i$ is the time on study and $\delta_i$ is a variable indicating if the patient experienced the event ($\delta_i = 1$) or was censored ($\delta_i = 0$). We know from section 2.3 that $f(t_i) = h(t_i) S(t_i)$, so the likelihood function can be written as

$$L = \prod_{i=1}^{n} [f(t_i)]^{\delta_i} [S(t_i)]^{1-\delta_i} = \prod_{i=1}^{n} [h(t_i)]^{\delta_i} \exp\left[ -\int_0^{t_i} h(s)\,ds \right]. \tag{41}$$

By taking the logarithm of (41) and using (39) as the hazard rate, we arrive at the following expression for the log-likelihood function:

$$l(\beta) = \sum_{i=1}^{n} \delta_i \ln\left[ h^*(t_i, x_i) + \exp(x_i\beta) \right] - \sum_{i=1}^{n} \int_0^{t_i} h^*(s, x_i)\,ds - \sum_{i=1}^{n} \int_0^{t_i} \exp(x_i\beta)\,ds. \tag{42}$$

The second term does not contain $\beta$ and can therefore be omitted. Since the hazard rate is assumed to be constant within the pre-specified intervals, we can simplify things by splitting patient observations for each interval, which results in a total of $J$ so-called subject-band observations. The integral in the third term of (42) can then be calculated as the value of the expression within the integral, which is constant, multiplied by the time at risk, $y_j$, for subject-band observation $j$:

$$l(\beta) = \sum_{j=1}^{J} \left[ \delta_j \ln\left[ h^*(t_j, x_j) + \exp(x_j\beta) \right] - y_j \exp(x_j\beta) \right]. \tag{43}$$

The parameter vector $\beta$, along with its standard errors, can be estimated by maximizing the log-likelihood function above. This does not, however, provide means towards assessing goodness-of-fit.
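To make the objective function concrete, the log-likelihood (43) can be evaluated as in the sketch below. The function name and the tuple layout of a subject-band observation are assumptions made for this example; the maximization itself (for instance by Newton-Raphson or a general-purpose optimizer) is not shown.

```python
import math

def excess_hazard_loglik(beta, bands):
    """Log-likelihood (43) for subject-band observations.

    Each band is a tuple (delta, expected_hazard, x, y):
      delta           -- 1 if the patient died in this band, else 0
      expected_hazard -- h*(t_j, x_j), taken from the life table
      x               -- covariate vector, including interval indicators
      y               -- time at risk in the band
    """
    total = 0.0
    for delta, expected_hazard, x, y in bands:
        # Excess hazard exp(x_j beta), constant within the band.
        excess = math.exp(sum(b * v for b, v in zip(beta, x)))
        if delta:
            total += math.log(expected_hazard + excess)
        total -= y * excess
    return total
```

Censored bands contribute only through the term $-y_j \exp(x_j\beta)$, so the expected hazard matters only for bands in which a death occurs.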

2.7.2 The Dickman et al. approach

In an effort towards gaining access to suitable methods of model diagnostics, an approach to modeling the excess hazard using Poisson regression has been proposed [4]. Since we assume that the hazard rate is constant within pre-specified intervals, a Poisson process for the number of deaths is suitable. If we use subject-band observations and assume that $\delta_j \sim \mathrm{Po}(\mu_j) = \mathrm{Po}(h(t_j, x_j) y_j)$, the log-likelihood function is

$$\begin{aligned} l(\beta) = {} & \sum_{j=1}^{J} \delta_j \ln\left[ h^*(t_j, x_j) + \exp(x_j\beta) \right] + \sum_{j=1}^{J} \delta_j \ln(y_j) \\ & - \sum_{j=1}^{J} \left[ h^*(t_j, x_j) + \exp(x_j\beta) \right] y_j - \sum_{j=1}^{J} \ln(\delta_j!). \end{aligned} \tag{44}$$

After the removal of irrelevant terms, i.e. terms not containing the parameter vector $\beta$, we are left with

$$l(\beta) = \sum_{j=1}^{J} \delta_j \ln\left[ h^*(t_j, x_j) + \exp(x_j\beta) \right] - \sum_{j=1}^{J} \exp(x_j\beta) y_j, \tag{45}$$


which is exactly the same as (43). So fitting this model for subject-band observations would result in the exact same parameter estimates as with the full likelihood approach, since we are maximizing the same likelihood function. By estimating the expected hazard for subject-band observation $j$ by $d^*_j / y_j$, with $d^*_j$ being the estimated expected number of deaths from causes other than the cancer of interest, we write

$$\begin{aligned} \mu_j = h(t_j, x_j) y_j & \Leftrightarrow \frac{\mu_j}{y_j} = h^*(t_j, x_j) + \exp(x_j\beta) = \frac{d^*_j}{y_j} + \exp(x_j\beta) \\ & \Leftrightarrow \ln(\mu_j - d^*_j) = x_j\beta + \ln(y_j). \end{aligned} \tag{46}$$

This implies a generalized linear model with outcome $d_j$, Poisson error structure, link $\ln(\mu_j - d^*_j)$, and offset $\ln(y_j)$. By using the framework of generalized linear models, we gain access to methods of model diagnostics, and are therefore better able to assess the goodness-of-fit.
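The claim that the Poisson log-likelihood (44) and the full log-likelihood (43) lead to the same estimates can be checked numerically: their difference should not depend on $\beta$. The sketch below does this for hypothetical subject-band observations; the function names and the data layout are assumptions made for the example.

```python
import math

def linear_predictor(beta, x):
    """The inner product x_j beta."""
    return sum(b * v for b, v in zip(beta, x))

def full_loglik(beta, bands):
    """Equation (43); each band is (delta, expected_hazard, x, y)."""
    total = 0.0
    for delta, h_exp, x, y in bands:
        excess = math.exp(linear_predictor(beta, x))
        total += delta * math.log(h_exp + excess) - y * excess
    return total

def poisson_loglik(beta, bands):
    """Equation (44), with delta_j ~ Po(h(t_j, x_j) * y_j)."""
    total = 0.0
    for delta, h_exp, x, y in bands:
        mu = (h_exp + math.exp(linear_predictor(beta, x))) * y
        # lgamma(delta + 1) equals ln(delta!)
        total += delta * math.log(mu) - mu - math.lgamma(delta + 1)
    return total
```

For any two parameter vectors, `poisson_loglik(beta, bands) - full_loglik(beta, bands)` returns the same value up to rounding, since the difference consists only of the terms in (44) that do not contain $\beta$.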

A convenient way of organizing the data is in the form of collapsed data, i.e. we group together observations with the same covariate pattern and sum over the number of deaths, the expected number of deaths, and the person-time at risk. If we, for example, were interested in the covariates gender and cancer stage (I-IV) over 5 yearly intervals, we would collapse the data into 40 observations (2 genders, 4 stages, 5 intervals, 2·4·5 = 40). The parameter estimates for collapsed data differ slightly from the parameter estimates for subject-band observations, since $d^*_j$ in (46) varies within each covariate pattern. This difference is often very small though, and studying residuals or other goodness-of-fit statistics is not appropriate when estimating the model from subject-band observations.
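Collapsing subject-band observations amounts to a grouped sum, as in the following sketch. The function name `collapse` and the representation of a covariate pattern as a tuple such as (gender, stage, interval) are assumptions made for this illustration.

```python
from collections import defaultdict

def collapse(bands):
    """Sum deaths, expected deaths and person-time per covariate pattern.

    Each band is (pattern, deaths, expected_deaths, person_time), where
    pattern is a hashable tuple such as (gender, stage, interval).
    Returns a dict mapping each pattern to the three totals.
    """
    totals = defaultdict(lambda: [0, 0.0, 0.0])
    for pattern, deaths, expected_deaths, person_time in bands:
        cell = totals[pattern]
        cell[0] += deaths
        cell[1] += expected_deaths
        cell[2] += person_time
    return {pattern: tuple(cell) for pattern, cell in totals.items()}
```

With 2 genders, 4 stages and 5 yearly intervals, the returned dictionary has at most 40 entries, one per covariate pattern that actually occurs in the data.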

2.7.3 The Hakulinen-Tenkanen approach

Hakulinen and Tenkanen [5] have proposed an approach to modeling proportional excess hazard using a binomial error structure. The method uses grouped data, which only contains information on how many patients die or are censored in each interval for the different covariate patterns. We let $h_{ik}$ denote the hazard contributed by group $k$ in follow-up interval (year) $i$, which is assumed to be of the form

$$h_{ik} = h^*_{ik} + \exp(x_{ik}\beta). \tag{47}$$

Here, $h^*_{ik}$ is the expected hazard, $x_{ik}$ is the covariate vector for group $k$ in interval $i$, and $\beta$ is a vector of parameters to be estimated. We consider yearly intervals, and it follows from (47) and (5) that

$$-\ln\left( \frac{p_{ik}}{p^*_{ik}} \right) = \exp(x_{ik}\beta) \Leftrightarrow \ln\left[ -\ln\left( \frac{p_{ik}}{p^*_{ik}} \right) \right] = x_{ik}\beta, \tag{48}$$

where $p_{ik}$ is the observed interval-specific survival probability for group $k$ in interval $i$, and $p^*_{ik}$ is the expected interval-specific survival probability for group $k$ in interval $i$. So, the Hakulinen-Tenkanen approach uses a generalized linear model with binomial error structure, and a complementary log-log link function with a division by $p^*_{ik}$. This model produces similar estimates to those of the Poisson model in section 2.7.2, as the likelihood functions for the two models are similar.
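The link in (48) and its inverse can be written out directly. The function names below are hypothetical and chosen for this sketch only; note that the link is defined only when the observed survival is smaller than the expected survival, since otherwise the inner logarithm is not negative.

```python
import math

def ht_link(p_obs, p_exp):
    """Left-hand side of (48): ln[-ln(p_ik / p*_ik)].

    Requires 0 < p_obs < p_exp.
    """
    return math.log(-math.log(p_obs / p_exp))

def ht_inverse_link(eta, p_exp):
    """Survival probability implied by the linear predictor eta = x_ik beta."""
    return p_exp * math.exp(-math.exp(eta))
```

Applying `ht_inverse_link` to the output of `ht_link` recovers the observed survival probability, which confirms that the two functions are inverses of each other for a fixed expected survival.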

2.8 Statistical software

Most of the statistical analysis in this study was performed with the R Statistical Software Package¹. Functions for cause-specific survival analysis and estimation of the cumulative incidence function are integrated in existing packages, while functions for age-standardized incidence, relative survival, and the modeling of excess hazard were written by the author. The program for estimating the relative survival rate uses essentially the same procedure as Paul Dickman's program [13] for estimating the relative survival rate in SAS², and the program for modeling excess hazard uses code from the relsurv package, written by Maja Pohar [15]. The data was split for yearly intervals, allowing the patient's age and current year to be increased accordingly. This ensured that, in each interval, the patient's expected mortality was estimated from the correct background population. The Ederer II method was used to estimate the expected survival rate, using life tables for Sweden obtained from the Human Mortality Database³. A program using the Hakulinen method to estimate the expected survival rate was written by the author, and the results from this program were very similar to the results from the program using the Ederer II method. An advantage of the Ederer II method is the possibility of comparing the results from the R program to the results from Paul Dickman's SAS program. As a precaution, and to ensure the accuracy of the results, some analyses were performed in both R and SAS.

¹ http://www.r-project.org
² http://www.sas.com/technologies/analytics/statistics/stat/index.html
³ http://www.mortality.org


3 Results

In this section, the results from the analysis of the regional malignant melanoma register are presented. Descriptive analyses of the register, including the distribution of various patient and tumour characteristics, age-standardized incidence, cumulative incidence, and relative survival, are presented in the first sections. In section 3.2, the excess hazard model in section 2.7.2 is used to investigate the effect of the regression phenomenon on the relative survival rate.

3.1 Descriptive analysis

The geographical distribution over time, for the 4952 patients included in the study, is presented in Table I. Table II shows the distribution of patient and tumour characteristics, for all patients, by gender and time period. In later sections, analyses will primarily be based on patients with invasive melanomas. Melanomas in situ are fairly harmless, and the survival rate for those affected is about the same as the survival rate of the background population.

Bar plots for most of the categories in Table II, for patients with invasive tumours, are included in Figures 1 and 2. The gender distribution is fairly even, but the distribution of location differs greatly between men and women. For men, the most common location of malignant melanomas is the trunk, which could be explained by the fact that men expose their trunks to sunlight to a larger extent than women. The most common location for females is the lower extremities, and this could be explained with similar logic. The median age at diagnosis, for patients with invasive melanomas, is 63.2 years, but it ranges from 10.8 years to 107.3 years. SSM (Superficial Spreading Melanoma) is the most common type of malignant melanoma in the register, and NM (Nodular Melanoma) is the second most common type. Together, melanomas in situ and LM (Lentigo Maligna, which is also a change in situ) represent around 20% of the cases, while LMM and ALM are fairly uncommon. Worth mentioning is that only a very small portion of the patients with invasive malignant melanomas have metastases. Table III shows the distribution of metastases, for patients with invasive melanomas, by gender and time period. Table IV shows the distribution of treatment at primary surgery and extended surgery, for all patients, by gender and time period.


Table II. Distribution of patient and tumour characteristics, for all patients, by period and gender.

Columns (left to right): 1996-2000 Male, 1996-2000 Female, 2001-2006 Male, 2001-2006 Female, Total

No. of cases 899 902 1508 1643 4952

Gender (%)

Male 899 (100.0) - - 1508 (100.0) - - 2407 (48.6)

Female - - 902 (100.0) - - 1643 (100.0) 2545 (51.4)

Age

Median (Min-Max) 64.7 (17.8-94.8) 61.9 (16.8-107.3) 65.6 (19.0-97.0) 61.3 (10.8-99.9) 63.5 (10.8-107.3)

Location (%)

Head-Neck 145 (16.1) 161 (17.8) 271 (18.0) 284 (17.3) 861 (17.4)

Top extremity 136 (15.1) 193 (21.4) 245 (16.2) 345 (21.0) 919 (18.6)

Lower extremity 101 (11.2) 273 (30.3) 158 (10.5) 524 (31.9) 1056 (21.3)

Trunk 478 (53.2) 244 (27.1) 813 (53.9) 472 (28.7) 2007 (40.5)

Palm, foot, subungual 16 (1.8) 22 (2.4) 17 (1.1) 17 (1.0) 72 (1.5)

Data unavailable 23 (2.6) 9 (1.0) 4 (0.3) 1 (0.1) 37 (0.7)

Type (%)

Invasive 685 (76.2) 669 (74.2) 1091 (72.3) 1116 (67.9) 3561 (71.9)

In situ 124 (13.8) 152 (16.9) 240 (15.9) 312 (19.0) 828 (16.7)

LM 5 (0.6) 4 (0.4) 93 (6.2) 122 (7.4) 224 (4.5)

Data unavailable 85 (9.5) 77 (8.5) 84 (5.6) 93 (5.7) 339 (6.8)

Type of invasive (%)

SSM 375 (54.7) 396 (59.2) 709 (65.0) 747 (66.9) 2227 (62.5)

LMM 37 (5.4) 35 (5.2) 34 (3.1) 52 (4.7) 158 (4.4)

NM 208 (30.4) 164 (24.5) 277 (25.4) 238 (21.3) 887 (24.9)

ALM 7 (1.0) 18 (2.7) 10 (0.9) 15 (1.3) 50 (1.4)

Other 33 (4.8) 44 (6.6) 56 (5.1) 52 (4.7) 185 (5.2)

Data unavailable 25 (3.6) 12 (1.8) 5 (0.5) 12 (1.1) 54 (1.5)

Size, largest diameter (mm) (%)

≤5 52 (5.8) 110 (12.2) 162 (10.7) 238 (14.5) 562 (11.3)

5.01-6.00 47 (5.2) 67 (7.4) 68 (4.5) 105 (6.4) 287 (5.8)

6.01-7.00 50 (5.6) 73 (8.1) 63 (4.2) 90 (5.5) 276 (5.6)

7.01-8.00 72 (8.0) 65 (7.2) 107 (7.1) 107 (6.5) 351 (7.1)

>8 419 (46.6) 327 (36.3) 876 (58.1) 827 (50.3) 2449 (49.5)

Data unavailable 259 (28.8) 260 (28.8) 232 (15.4) 276 (16.8) 1027 (20.7)

Ulceration (%)

Absent 538 (59.8) 588 (65.2) 891 (59.1) 994 (60.5) 3011 (60.8)

Present 195 (21.7) 162 (18.0) 280 (18.6) 231 (14.1) 868 (17.5)

Non-assessable 0 (0.0) 0 (0.0) 29 (1.9) 27 (1.6) 56 (1.1)

Data unavailable 166 (18.5) 152 (16.9) 308 (20.4) 391 (23.8) 1017 (20.5)

T category (%)

T0 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0)

Tis 129 (14.3) 156 (17.3) 333 (22.1) 434 (26.4) 1052 (21.2)

T1 281 (31.3) 348 (38.6) 482 (32.0) 576 (35.1) 1687 (34.1)

T2 136 (15.1) 140 (15.5) 222 (14.7) 216 (13.1) 714 (14.4)

T3 125 (13.9) 82 (9.1) 165 (10.9) 153 (9.3) 525 (10.6)

T4 100 (11.1) 73 (8.1) 177 (11.7) 132 (8.0) 482 (9.7)

TX/Data unavailable 128 (14.2) 103 (11.4) 129 (8.6) 132 (8.0) 492 (9.9)

N category (%)

N0 754 (83.9) 791 (87.7) 1315 (87.2) 1475 (89.8) 4335 (87.5)

N1 33 (3.7) 19 (2.1) 38 (2.5) 18 (1.1) 108 (2.2)

N2 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0)

N3 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0)

NX/Data unavailable 112 (12.5) 92 (10.2) 155 (10.3) 150 (9.1) 509 (10.3)

M category (%)

M0 754 (83.9) 791 (87.7) 1315 (87.2) 1475 (89.8) 4335 (87.5)

M1a 0 (0.0) 0 (0.0) 1 (0.1) 1 (0.1) 2 (0.0)

M1b 2 (0.2) 1 (0.1) 6 (0.4) 1 (0.1) 10 (0.2)

M1c 9 (1.0) 4 (0.4) 4 (0.3) 5 (0.3) 22 (0.4)

MX/Data unavailable 134 (14.9) 106 (11.8) 182 (12.1) 161 (9.8) 583 (11.8)

Clark level of invasion (%)

I 129 (14.3) 156 (17.3) 333 (22.1) 434 (26.4) 1052 (21.2)

II 157 (17.5) 213 (23.6) 309 (20.5) 327 (19.9) 1006 (20.3)

III 213 (23.7) 205 (22.7) 344 (22.8) 374 (22.8) 1136 (22.9)

IV 231 (25.7) 190 (21.1) 338 (22.4) 293 (17.8) 1052 (21.2)

V 34 (3.8) 35 (3.9) 63 (4.2) 61 (3.7) 193 (3.9)

Non-assessable 2 (0.2) 0 (0.0) 22 (1.5) 37 (2.3) 61 (1.2)

Data unavailable 133 (14.8) 103 (11.4) 99 (6.6) 117 (7.1) 452 (9.1)

Breslow depth (%)

≤1.0 306 (34.0) 373 (41.4) 495 (32.8) 591 (36.0) 1765 (35.6)

1.01-2.00 136 (15.1) 140 (15.5) 222 (14.7) 216 (13.1) 714 (14.4)

2.01-4.00 125 (13.9) 82 (9.1) 165 (10.9) 153 (9.3) 525 (10.6)

>4.0 100 (11.1) 73 (8.1) 177 (11.7) 132 (8.0) 482 (9.7)

Non-assessable 0 (0.0) 0 (0.0) 88 (5.8) 82 (5.0) 170 (3.4)

Data unavailable 232 (25.8) 234 (25.9) 361 (23.9) 469 (28.5) 1296 (26.2)

Stage (%)

0 129 (14.3) 156 (17.3) 333 (22.1) 434 (26.4) 1052 (21.2)

I 369 (41.0) 439 (48.7) 610 (40.5) 708 (43.1) 2126 (42.9)

II 239 (26.6) 177 (19.6) 353 (23.4) 317 (19.3) 1086 (21.9)

III 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0)

IV 11 (1.2) 5 (0.6) 11 (0.7) 6 (0.4) 33 (0.7)

Data incomplete 151 (16.8) 125 (13.9) 201 (13.3) 178 (10.8) 655 (13.2)


Table III. Distribution of metastases at time of diagnosis, for patients with invasive tumours, by period and gender.

Columns (left to right): 1996-2000 Male, 1996-2000 Female, 2001-2006 Male, 2001-2006 Female, Total

No. of cases 685 669 1091 1116 3561

Metastases detected (%)

Yes 60 (8.8) 34 (5.1) 61 (5.6) 37 (3.3) 192 (5.4)

No 625 (91.2) 635 (94.9) 997 (91.4) 1060 (95.0) 3317 (93.1)

Data unavailable 0 (0.0) 0 (0.0) 33 (3.0) 19 (1.7) 52 (1.5)

Regional palpable lymph node metastases (%)

No 1 (1.7) 0 (0.0) 12 (19.7) 4 (10.8) 17 (8.9)

Yes 35 (58.3) 20 (58.8) 23 (37.7) 15 (40.5) 93 (48.4)

Data unavailable 24 (40.0) 14 (41.2) 26 (42.6) 18 (48.6) 82 (42.7)

Satellites/In-transit metastases (%)

No 45 (75.0) 28 (82.4) 28 (45.9) 22 (59.5) 123 (64.1)

Yes 15 (25.0) 6 (17.6) 26 (42.6) 7 (18.9) 54 (28.1)

Data unavailable 0 (0.0) 0 (0.0) 7 (11.5) 8 (21.6) 15 (7.8)

Remote metastases (%)

No 35 (58.3) 25 (73.5) 32 (52.5) 20 (54.1) 112 (58.3)

Yes 22 (36.7) 7 (20.6) 14 (23.0) 6 (16.2) 49 (25.5)

Data unavailable 3 (5.0) 2 (5.9) 15 (24.6) 11 (29.7) 31 (16.1)

Location of remote metastases (%)

Skin, subcutaneous or gland metastasis 0 (0.0) 0 (0.0) 1 (7.1) 1 (16.7) 2 (4.1)

Lung metastasis 4 (18.2) 1 (14.3) 3 (21.4) 1 (16.7) 9 (18.4)

Other visceral metastasis 14 (63.6) 6 (85.7) 1 (7.1) 4 (66.7) 25 (51.0)

Multiple locations 1 (4.5) 0 (0.0) 6 (42.9) 0 (0.0) 7 (14.3)

Unknown location 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0)

Data unavailable 3 (13.6) 0 (0.0) 3 (21.4) 0 (0.0) 6 (12.2)

Table IV. Distribution of treatment, for all patients, by period and gender.

Columns (left to right): 1996-2000 Male, 1996-2000 Female, 2001-2006 Male, 2001-2006 Female, Total

No. of cases 899 902 1508 1643 4952

Free margin at primary surgery (mm)

Median (Min-Max) 3.0 (0.0-60.0) 3.0 (0.0-50.0) 4.0 (0.0-50.0) 3.0 (0.0-50.0) 3.0 (0.0-60.0)

Free margin at primary surgery (%)

<3mm 225 (25.0) 252 (27.9) 340 (22.5) 387 (23.6) 1204 (24.3)

≥3mm 410 (45.6) 355 (39.4) 749 (49.7) 702 (42.7) 2216 (44.7)

Data unavailable 264 (29.4) 295 (32.7) 419 (27.8) 554 (33.7) 1532 (30.9)

Primary surgery (%)

Excision + suture 827 (92.0) 823 (91.2) 1283 (85.1) 1388 (84.5) 4321 (87.3)

Skin transplantation 0 (0.0) 0 (0.0) 4 (0.3) 2 (0.1) 6 (0.1)

Surgical flap plastic 0 (0.0) 0 (0.0) 2 (0.1) 1 (0.1) 3 (0.1)

Amputation 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0)

Other (biopsy etc.) 66 (7.3) 75 (8.3) 160 (10.6) 191 (11.6) 492 (9.9)

Combination 6 (0.7) 3 (0.3) 59 (3.9) 61 (3.7) 129 (2.6)

Data unavailable 0 (0.0) 1 (0.1) 0 (0.0) 0 (0.0) 1 (0.0)

Free margin at extended surgery (mm)

Median (Min-Max) 20.0 (0.0-80.0) 10.0 (0.0-100.0) 10.0 (0.0-100.0) 10.0 (0.0-100.0) 10.0 (0.0-100.0)

Free margin at extended surgery (%)

<10mm 42 (4.7) 59 (6.5) 89 (5.9) 122 (7.4) 312 (6.3)

10.00-19.99mm 251 (27.9) 309 (34.3) 501 (33.2) 632 (38.5) 1693 (34.2)

≥20mm 339 (37.7) 253 (28.0) 412 (27.3) 377 (22.9) 1381 (27.9)

Data unavailable 267 (29.7) 281 (31.2) 506 (33.6) 512 (31.2) 1566 (31.6)

Extended surgery (%)

Excision + suture 563 (62.6) 574 (63.6) 957 (63.5) 1058 (64.4) 3152 (63.7)

Skin transplantation 103 (11.5) 95 (10.5) 150 (9.9) 187 (11.4) 535 (10.8)

Surgical flap plastic 39 (4.3) 27 (3.0) 32 (2.1) 54 (3.3) 152 (3.1)

Amputation 10 (1.1) 19 (2.1) 15 (1.0) 24 (1.5) 68 (1.4)

Other (biopsy etc.) 1 (0.1) 0 (0.0) 6 (0.4) 3 (0.2) 10 (0.2)

Combination 0 (0.0) 0 (0.0) 2 (0.1) 4 (0.2) 6 (0.1)

Data unavailable 183 (20.4) 187 (20.7) 346 (22.9) 313 (19.1) 1029 (20.8)

Recommendations on free margin were followed (tumour depth ≤1.0mm)

No (margin<1.0cm) 18 (5.9) 21 (5.6) 32 (6.5) 47 (8.0) 118 (6.7)

Yes (margin≥1.0cm) 211 (69.0) 257 (68.9) 295 (59.6) 372 (62.9) 1135 (64.3)

Data unavailable 77 (25.2) 95 (25.5) 168 (33.9) 172 (29.1) 512 (29.0)

Recommendations on free margin were followed (tumour depth >1.0mm)

No (margin<2.0cm) 43 (11.9) 51 (17.3) 96 (17.0) 89 (17.8) 279 (16.2)

Yes (margin≥2.0cm) 229 (63.4) 157 (53.2) 313 (55.5) 287 (57.3) 986 (57.3)

Data unavailable 89 (24.7) 87 (29.5) 155 (27.5) 125 (25.0) 456 (26.5)

Days between primary and extended surgery

Median (Min-Max) 33.0 (0.0-392.0) 33.0 (0.0-314.0) 40.0 (0.0-571.0) 39.0 (0.0-1109.0) 36.0 (0.0-1109.0)


[Figure: six panels with per cent on the vertical axis. Age distribution for patients with invasive melanoma (horizontal axis: age at diagnosis; median (min-max): 63.2 (10.8-107.3); mean (sd): 62.3 (16.2)); distribution of type of melanoma; distribution of gender for patients with invasive melanoma; distribution of location for male patients with invasive melanoma; distribution of location for female patients with invasive melanoma; distribution of Clark level for patients with invasive melanoma.]

Figure 1: Bar plots of patient and tumour characteristics.


[Figure: six panels with per cent on the vertical axis. Distribution of presence of ulceration, T-classification, N-classification, M-classification and TNM-stage for patients with invasive melanoma; geographical distribution of all patients (Uppsala, Södermanland, Värmland, Örebro, Västmanland, Dalarna, Gävleborg).]

Figure 2: Bar plots of patient and tumour characteristics.


3.1.1 Age-standardized incidence

In Figure 3, we see the age-standardized incidence of malignant melanoma in the seven counties of interest. The first plot shows the total rate, as well as the rate by type of melanoma. For several years, a certain hospital failed to report the results of the histopathological investigation; as a consequence, information on the histopathological type of the melanoma is missing for many patients. When comparing the years 2006 and 1996, we observe a total increase in age-standardized incidence of almost 70%. For invasive melanomas, we observe an increase of almost 50%. The incidence curve levels out for the years at the end of the 20th century (as can be seen in Figure 3), but a continuing increase can be observed for the first years of the 21st century.

[Figure: age-standardized rate per 100 000 (vertical axis) by year, 1996-2006 (horizontal axis). Curves: Total, Invasive, In situ (incl. LM), Data unavailable for type.]

Figure 3: Age-standardized incidence per 100 000 by type of melanoma.

[Figure: age-standardized rate per 100 000 (vertical axis) by year, 1996-2006 (horizontal axis). Curves: Male, Female.]

Figure 4: Age-standardized incidence per 100 000 by gender, for invasive melanomas.


Figure 4 shows that males have a higher age-standardized incidence of invasive melanomas than females for almost the entire period. The increase in incidence of malignant melanoma is alarming, and there is no doubt among physicians that excessive exposure to sunlight is a large contributing factor. While awareness of the dangers of ultraviolet radiation has increased, it is still too early to detect any stagnation of the rapid increase in incidence.
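As an illustration of the quantity discussed in this subsection, the following is a minimal sketch of an age-standardized rate, assuming direct standardization against a standard population. The function names, age groups, counts, and weights are hypothetical and are not taken from the register; the standard population actually used is not restated here.

```python
def age_standardized_rate(cases, person_years, std_weights, per=100_000):
    """Directly standardized rate: weighted mean of age-specific rates.

    cases, person_years and std_weights are lists with one entry per
    age group. The weights need not sum to one; they are normalized here.
    """
    if not (len(cases) == len(person_years) == len(std_weights)):
        raise ValueError("all inputs must have one entry per age group")
    total_weight = sum(std_weights)
    weighted = sum(
        w * c / py for c, py, w in zip(cases, person_years, std_weights)
    )
    return per * weighted / total_weight


def percent_change(old_rate, new_rate):
    """Relative change between two rates, in percent."""
    return 100.0 * (new_rate - old_rate) / old_rate


# Hypothetical example with three age groups (not register data).
rate = age_standardized_rate(
    cases=[5, 20, 40],
    person_years=[100_000, 80_000, 50_000],
    std_weights=[0.5, 0.3, 0.2],
)
print(f"Age-standardized rate per 100 000: {rate:.1f}")
```

A comparison such as the one between 1996 and 2006 then amounts to calling `percent_change` on the two standardized rates.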

3.1.2 Cumulative incidence analysis on a subset of the data

Information on the specific cause of death was available for a subset of the data, namely for patients with a date of diagnosis between the years 1996 and 2003. The information was assumed to be fairly reliable (even though there was no way of confirming it), and the cumulative incidence function was calculated for patients in this subset, by viewing death from malignant melanoma and death from any other cause as competing risks. In Figure 5, we see the subdistribution functions, estimated from patients with invasive melanomas and no metastases. The two competing risks have similar subdistribution functions. It is worth noting, however, that these estimates are not adjusted for any patient or tumour characteristics, and that the risk of dying from direct consequences of malignant melanoma clearly depends on several important factors. Nevertheless, estimates based on all patients give us important information on the overall mortality of the disease.

[Plot: cause-specific subdistribution function (0 to 0.2) against years after diagnosis (0 to 5); curves: Death from malignant melanoma, Death from other cause.]

Figure 5: Cause-specific subdistribution functions.
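The competing-risks calculation behind curves such as those in Figure 5 can be sketched as follows. This is a minimal implementation of a standard nonparametric cumulative incidence estimator, in which each cause-specific increment is weighted by the all-cause Kaplan-Meier survival just before the event time. That it matches the exact estimator used for the figure is an assumption, and the function name, cause codes, and data below are hypothetical.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Nonparametric cumulative incidence for one cause.

    times:  follow-up time for each patient
    causes: 0 for censored, otherwise an integer cause-of-death code
    Returns a list of (time, cumulative incidence) at each event time.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    surv = 1.0  # all-cause survival just before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_here = d_any = d_cause = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_here += 1
            if data[i][1] != 0:
                d_any += 1
                if data[i][1] == cause_of_interest:
                    d_cause += 1
            i += 1
        if d_any > 0:
            cif += surv * d_cause / n_at_risk
            surv *= 1.0 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= n_here
    return curve


# Hypothetical data: cause 1 = melanoma, cause 2 = other, 0 = censored.
times = [1, 2, 3, 4, 5]
causes = [1, 2, 0, 1, 0]
melanoma_curve = cumulative_incidence(times, causes, cause_of_interest=1)
other_curve = cumulative_incidence(times, causes, cause_of_interest=2)
```

Because both causes are weighted by the same all-cause survival, the two cumulative incidences and the all-cause survival sum to one at every time point, which is what makes the curves interpretable as probabilities in the presence of the competing risk.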


3.1.3 Relative survival

When studying more serious forms of cancer, e.g. lung cancer, most patients die within a few years from the time of diagnosis, and the cause of death is most likely the cancer. But malignant melanoma is, as we have seen in the previous section, not a disease with a high mortality rate. Since knowledge of the specific cause of death is not available for all patients in the data set, and since we are interested in studying the effect of the cancer of interest, studying the relative survival rate is preferable to cause-specific survival analysis.

Only patients with invasive tumours and no metastases were included when estimating the relative survival rate, with the exception of the analysis by type of melanoma. In Figure 6, we see the cumulative relative survival rate, for up to 7 years of follow-up time, by type of melanoma, gender of the patient, location of the melanoma, and age of the patient. Patients diagnosed with an invasive melanoma have a 5-year relative survival rate of around 85%, while patients diagnosed with a melanoma in situ have a survival rate roughly equal to that of the background population. The relative survival curve for melanoma in situ is even above one, which suggests that patients with this type of tumour survive better than their counterparts in the background population. One of the reasons for this could be that patients diagnosed with melanoma in situ have more frequent contact with the health care system.
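To make the quantity plotted in Figure 6 concrete, the sketch below computes a cumulative relative survival curve as the ratio of cumulative observed survival to cumulative expected survival. It assumes that interval-specific observed and expected survival proportions are already available; how the expected survival is derived from population life tables is not shown, and the function name and numbers are hypothetical.

```python
def cumulative_relative_survival(observed, expected):
    """Cumulative relative survival at the end of each follow-up interval.

    observed: interval-specific observed survival proportions
    expected: interval-specific expected survival proportions of a
              comparable group in the background population
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have equal length")
    cum_obs = 1.0
    cum_exp = 1.0
    curve = []
    for p_obs, p_exp in zip(observed, expected):
        cum_obs *= p_obs
        cum_exp *= p_exp
        curve.append(cum_obs / cum_exp)
    return curve


# Hypothetical yearly proportions (not register data).
curve = cumulative_relative_survival(
    observed=[0.95, 0.94, 0.95],
    expected=[0.98, 0.98, 0.98],
)
```

A value above one, as seen for melanoma in situ, simply means that the observed survival exceeds the expected survival of the background population over the same period.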

Men have, as can be seen in Figure 6.b, a lower relative survival rate than women. This is common for most cancer diseases, but nothing is revealed here about whether this is an actual consequence of the gender of the patient, or a consequence of some other influential characteristic that is more common in one gender than the other. Investigating this further would require modeling, which would allow us to adjust for various tumour characteristics.

The relative survival rate by location, for invasive melanomas, is presented in Figure 6.c. As can be seen, all locations have similar relative survival rates, except for Head-Neck. Melanomas located on the head or on the neck are more difficult to excise, but whether or not this is the cause of the lower relative survival rate cannot be determined from this analysis. It is worth noting that the location category Palm, foot, subungual is not included here, since the patients with any of these locations are too few to analyze.


[Figure 6 panels: cumulative relative survival (0 to 1) against years after diagnosis (0 to 7). Panel titles and numbers at risk at years 0 to 7 are listed below.]

a) Cumulative relative survival, for all patients, by type of malignant melanoma (LM included in In situ).

No. at risk     0     1     2     3     4     5     6     7
Invasive     3558  2906  2383  1942  1556  1246   957   756
In situ      1050   862   691   518   400   312   243   194

b) Relative survival, for patients with an invasive tumour and no metastases, by gender.

No. at risk     0     1     2     3     4     5     6     7
Male         1613  1319  1089   887   698   556   428   346
Female       1685  1431  1188   971   795   650   510   397

c) Relative survival, for patients with an invasive tumour and no metastases, by location.

No. at risk     0     1     2     3     4     5     6     7
Head-Neck     438   350   277   215   175   142   104    73
Upper extr.   620   511   436   354   291   233   176   133
Lower extr.   751   635   524   442   355   288   221   190
Trunk        1423  1201   990   812   646   520   420   332

d) Relative survival, for patients with an invasive tumour and no metastases, by age group.

No. at risk     0     1     2     3     4     5     6     7
Age < 50      756   670   586   495   413   347   273   227
Age 50 - 64  1037   893   751   625   511   410   320   250
Age 65 - 80  1040   869   717   574   454   369   298   236
Age >= 80     465   318   223   164   115    80    47    30

Figure 6: Cumulative relative survival by a) type, b) gender, c) location, and d) age group.

What age categories to consider in the analysis is fairly subjective, and the categories were chosen as they are simply because they seemed logical. Even with the expected survival rate being adjusted for age, we see a difference in relative survival between the age categories. Again, nothing can be said at this stage about whether this is actually caused by the age of the patient, or by some other characteristic not adjusted for here.

In Figure 7, we see the cumulative relative survival rate, for up to 5 years of follow-up time, by Clark level of invasion, T-classification, presence of ulceration, and presence of the regression phenomenon. Only patients with invasive tumours and no metastases were included in the analysis resulting in these four plots. Figure 7.a shows, as could be expected, large differences in relative survival between patients with different levels of invasion. Patients with Clark level II have a 5-year relative survival rate almost equal to one, while patients with Clark level V have a 5-year relative survival rate of roughly 0.54. There are few patients with Clark level V, so we would expect the confidence intervals for this category to be wide. It is obvious, when studying Figure 7.b, that tumour depth is a strong prognostic factor. Patients with melanomas thinner than 1.0 mm do not have a significantly lower survival rate than the background population, which clearly reflects the importance of early detection. Another important prognostic factor is the presence or absence of ulceration (Figure 7.c). The analysis of the regional malignant melanoma register shows that patients with ulcerated tumours have a 5-year relative survival rate of roughly 0.67, while patients with non-ulcerated tumours have a 5-year relative survival rate of roughly 0.95.

[Figure 7 panels: cumulative relative survival (0 to 1) against years after diagnosis (0 to 5). Panel titles and numbers at risk at years 0 to 5 are listed below.]

a) Relative survival, for patients with an invasive tumour and no metastases, by Clark level.

No. at risk     0     1     2     3     4     5
II            972   852   738   623   508   413
III          1085   942   792   661   533   422
IV            977   763   601   480   390   324
V             156   109    80    48    31    22

b) Relative survival, for patients with an invasive tumour and no metastases, by T-classification.

No. at risk     0     1     2     3     4     5
T1           1632  1413  1213  1022   846   682
T2            679   592   495   406   319   263
T3            491   378   287   231   179   143
T4            410   299   226   154   112    87

c) Relative survival, for patients with an invasive tumour and no metastases, by presence of ulceration.

No. at risk     0     1     2     3     4     5
Absent       2405  2056  1739  1439  1174   951
Present       776   596   453   354   270   217

d) Relative survival, for patients with an invasive tumour and no metastases, by presence of regression phen.

No. at risk     0     1     2     3     4     5
Absent       2295  1897  1572  1275  1050   995
Present       418   342   261   194   144   136

Figure 7: Cumulative relative survival by a) Clark level of invasion, b) T-classification, c) presence of ulceration, and d) presence of the regression phenomenon.

Previous analyses of the regional malignant melanoma register have shown that patients with tumour regression induced by the body's own immune system have a higher crude survival rate than patients without tumour regression. A question that arose was whether or not this difference remained in the relative survival analysis. As can be seen in Figure 7.d, there is a noticeable difference also in relative survival between patients with and without tumour regression, and this is the subject of further analysis in the next section.

3.2 Further analysis of the regression phenomenon

In order to further investigate the effect of the regression phenomenon on the relative survival rate, the Poisson model in section 2.7.2 was used. The data considered in the model were limited to patients with invasive melanomas and no metastases, i.e. patients with invasive melanomas, N-classification N0, and M-classification M0. When testing hypotheses, p-values < 0.05 were considered statistically significant.
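For reference, the structure of such a model can be written out as below. This is a sketch that assumes the model in section 2.7.2 follows the Poisson formulation of Dickman et al. [4]; the symbols are assumptions introduced here and may differ from the notation used in that section. For each collapsed observation j, d_j is the observed number of deaths, d_j^* the expected number of deaths in the background population, y_j the person-time at risk, and x_j the covariate vector.

```latex
\begin{align}
  d_j &\sim \mathrm{Poisson}(\mu_j), \qquad \mu_j = \lambda_j\, y_j, \\
  \ln\!\left(\mu_j - d_j^{*}\right) &= \ln(y_j) + \mathbf{x}_j \boldsymbol{\beta}, \\
  \mathrm{EHR}_i &= e^{\beta_i}.
\end{align}
```

Under this formulation the excess hazard is exp(x_j β), so each exponentiated coefficient is an excess hazard ratio relative to the baseline category of the corresponding covariate.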

Collapsed data for follow-up times up to 5 years were used, with the first model being adjusted for age and gender. Table V shows the parameter estimates, standard errors, p-values for the test of each parameter being equal to zero, and excess hazard ratios with 95% confidence intervals. The model has a deviance of 72.33 on 70 degrees of freedom (p ≈ 0.4008), and the deviance residual analysis plots in Figure 8 show no evident lack of fit. The parameter for presence of the regression phenomenon is significant (p = 0.025), even when adjusting for age and gender. Also, the parameter for gender and the parameters for the two highest age-groups are significantly different from zero, which reflects the behavior of the relative survival curves in the previous section.

                              Estimate   Std. Error   p         EHR       EHR
                              (βi)                              (e^βi)    95% CI

(intercept)                   −4.348     0.290        <0.001
follow-up interval: 2          0.320     0.262        0.223     1.377     (0.823,2.303)
follow-up interval: 3          0.567     0.257        0.027     1.763     (1.065,2.919)
follow-up interval: 4          0.280     0.301        0.352     1.323     (0.734,2.386)
follow-up interval: 5         −0.531     0.484        0.273     0.588     (0.228,1.519)
gender: woman                 −0.471     0.188        0.012     0.624     (0.432,0.902)
age: 50-64                     0.497     0.265        0.060     1.645     (0.979,2.763)
age: 65-79                     0.972     0.261        <0.001    2.644     (1.584,4.412)
age: 80+                       1.481     0.352        <0.001    4.398     (2.205,8.774)
regression phenomenon: yes    −1.305     0.584        0.025     0.271     (0.086,0.852)

Table V: Estimated parameters and excess hazard ratios (EHR) of the Poisson model.
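The EHR, confidence interval, and p-value columns of Table V are related to the estimate and its standard error in a simple way, which the following sketch illustrates. It assumes Wald-type intervals and two-sided Wald tests based on the normal approximation; that this is how the table was produced is an assumption, although the figures for the regression phenomenon row are reproduced to the reported precision. The function name is hypothetical.

```python
import math


def excess_hazard_ratio(estimate, std_error, z=1.959964):
    """Return (EHR, lower, upper, p) for one model parameter.

    EHR is exp(estimate); the interval is exp(estimate -/+ z * std_error);
    p is the two-sided Wald p-value under a standard normal reference.
    """
    ehr = math.exp(estimate)
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    wald = estimate / std_error
    p_value = math.erfc(abs(wald) / math.sqrt(2.0))
    return ehr, lower, upper, p_value


# Row "regression phenomenon: yes" from Table V.
ehr, lower, upper, p_value = excess_hazard_ratio(-1.305, 0.584)
print(f"EHR = {ehr:.3f}, 95% CI = ({lower:.3f}, {upper:.3f}), p = {p_value:.3f}")
```

An EHR of 0.271 corresponds to an excess hazard roughly 73% lower than in the reference group, but the width of the interval shows how imprecisely this effect is estimated.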


[Plots: a) standardized deviance residuals (−3 to 3) against observation number (0 to 80); b) ordered standardized deviance residuals (−3 to 3) against theoretical quantiles of the standard normal (−3 to 3).]

Figure 8: a) Standardized deviance residuals, and b) Q-Q plot of standardized deviance residuals, for the excess hazard Poisson model in Table V.

When modeling cancer survival, it is common practice to adjust for cancer stage, since the mortality of the disease differs greatly between the stages. Since we have excluded patients with metastases from the analysis, and since it is well known that tumour depth is an important prognostic factor, it seems reasonable to adjust our excess hazard model for T-classification. Table VI shows the parameter estimates, standard errors, p-values, and excess hazard ratios with 95% confidence intervals, for the new adjusted Poisson model.

                              Estimate   Std. Error   p         EHR       EHR
                              (βi)                              (e^βi)    95% CI

(intercept)                   −6.245     0.585        <0.001
follow-up interval: 2          0.340     0.233        0.146     1.405     (0.889,2.219)
follow-up interval: 3          0.640     0.231        0.006     1.896     (1.205,2.983)
follow-up interval: 4          0.434     0.269        0.106     1.543     (0.911,2.614)
follow-up interval: 5         −0.427     0.431        0.322     0.653     (0.281,1.519)
gender: woman                 −0.248     0.171        0.146     0.780     (0.558,1.091)
T-classification: T2           1.665     0.598        0.005     5.286     (1.637,17.069)
T-classification: T3           2.921     0.566        <0.001    18.568    (6.122,56.321)
T-classification: T4           3.553     0.565        <0.001    34.923    (11.539,105.699)
age: 50-64                     0.253     0.265        0.341     1.287     (0.765,2.165)
age: 65-79                     0.427     0.258        0.098     1.533     (0.924,2.544)
age: 80+                       0.833     0.309        0.007     2.300     (1.255,4.218)
regression phenomenon: yes    −0.925     0.577        0.109     0.397     (0.128,1.228)

Table VI: Estimated parameters and excess hazard ratios (EHR) of the adjusted Poisson model.


[Plots: a) standardized deviance residuals (−3 to 3) against observation number (0 to 290); b) ordered standardized deviance residuals (−3 to 3) against theoretical quantiles of the standard normal (−3 to 3).]

Figure 9: a) Standardized deviance residuals, and b) Q-Q plot of standardized deviance residuals, for the excess hazard Poisson model in Table VI.

The model has a deviance of 249.00 on 282 degrees of freedom (p ≈ 0.9221), which suggests that the model fits the data well. The Q-Q plot of the standardized deviance residuals (Figure 9.b) does show some deviation, but we should not expect a perfect line when fitting a generalized linear model. Interestingly enough, when adjusting for tumour stage, the parameter for presence of the regression phenomenon is no longer significantly different from zero. This suggests that at least a part of the improvement in relative survival, for the group of patients with tumour regression, may be caused by a difference in tumour stage distribution. If we look at the distribution of tumour stage (Table VII) for patients with and without the regression phenomenon, we see a noticeable difference.

Table VII: Distribution of T-stage in percent, for patients with invasive melanomas and absence of metastases, by presence/absence of the regression phenomenon.

                                         T1      T2      T3      T4      TX/missing
Presence of the regression phenomenon    69.62   17.70    6.22    5.02   1.44
Absence of the regression phenomenon     46.10   21.57   17.04   13.68   1.61

It is reasonable to find a lower proportion of large melanomas within the group of patients with tumour regression, as the thickness of these patients' tumours could have been reduced by the body's own immune system. The model suggests that the improvement in relative survival could be the result of this group having a higher proportion of thin melanomas. Even though the model suggests no significant improvement in relative survival as a result of the regression phenomenon, there is nothing in the data that indicates the opposite.


When adjusting for T-classification, the parameter for the third age-group and the parameter for gender are no longer significantly different from zero, which suggests a difference in tumour stage distribution between the genders and between the age-groups (Table VIII).

Table VIII: Distribution of T-stage in percent, for patients with invasive melanomas and absence of metastases, by gender and age-group.

               T1      T2      T3      T4      TX/missing
Male           45.23   21.52   17.35   14.39   1.52
Female         53.98   20.46   13.50   10.41   1.65
Age < 50       62.40   22.46   10.14    4.06   0.94
Age 50 - 64    55.24   20.85   13.19    8.72   2.00
Age 65 - 79    43.69   20.79   18.93   15.30   1.29
Age ≥ 80       28.88   19.07   21.25   28.34   2.45

Breslow depth is one of the most important prognostic factors for patients diagnosed with malignant melanoma, but nothing is said here about whether or not the differences in distribution of T-classification for the subgroups of interest account for the entire differences in their relative survival curves. The parameter for the highest age-group is still significantly different from zero, and an excess hazard ratio of 2.30 suggests that the excess risk is 130% higher for patients of age ≥ 80, compared to patients of age < 50.

The excess hazard ratios for the different T-classifications are very high, particularly for T3 and T4, and the reason for this is that we have chosen T1 as baseline. Patients with T1 tumours have a relative survival rate roughly equal to one, which corresponds to an excess hazard rate close to zero. If the excess hazard of the baseline group is very small, it is reasonable to obtain very high excess hazard ratios.


4 Discussion

The increase in occurrence of malignant melanoma has for many years been a major concern of the health care community. When studying the regional malignant melanoma register of central Sweden for the years 1996-2006, we found an increase in the age-standardized incidence of invasive melanomas of roughly 50%. Excessive exposure to sunlight is often mentioned as one of the major risk factors, but genetics also plays an important role. While awareness of the disease is being thoroughly promoted, we have yet to see any stagnation of the rapid increase in incidence. The mortality of malignant melanoma is lower than that of most cancer diseases, since the location of the tumour enables more patients to be completely cured through surgery. The most important prognostic factor is the tumour size, and as with most cancer diseases, it is of vital importance to discover the disease at an early stage.

In this paper, methods of both cause-specific survival analysis and relative survival analysis were described. Since the specific cause of death was not available for all patients in the data set, it was considered preferable to study the relative survival rate. The analysis showed that patients with melanomas in situ had a relative survival rate above one, which could be explained as a selection phenomenon. It is plausible that patients who manage to discover the disease at this stage are individuals who are more concerned with their health. Also, a diagnosis of malignant melanoma most likely results in more contact with the health care system. Patients with invasive melanomas had a 5-year relative survival rate of roughly 85%, and further analyses were focused on this group. The descriptive analysis of the data set revealed that very few patients had any metastases. The survival rate of this small group was not deemed comparable to the survival rate of those without metastases, and therefore, only patients without metastases were included in further analyses. The relative survival analysis verified the importance of tumour depth as a prognostic factor, as patients with melanomas thinner than 1.0 mm did not have a survival rate significantly different from that of the background population. Patients with thick melanomas (≥ 4.0 mm) had a 5-year relative survival rate of roughly 57%, which clearly underlines the importance of creating better awareness of the disease, so that more tumours can be discovered at an early stage. Differences in relative survival rates were also observed between the genders, the age-groups, and the locations of the melanoma.

When comparing patients with and without the regression phenomenon, a significant difference in relative survival was observed. In order to further investigate this result, the Dickman et al. approach to modeling excess hazard was applied. When adjusting for just age and gender, the regression phenomenon seemed to have a significant effect, and an excess hazard ratio of 0.271 was suggested. The parameter for gender and the parameters for the two highest age-groups were also significant. Interestingly enough, when adjusting for T-classification, most of the parameters were no longer significant, which suggested that the differences in the relative survival curves could have been the result of differences in tumour stage distribution. A study of the distribution of T-classification for the subgroups of interest confirmed large differences. Presence of the regression phenomenon seemed to have no significant effect on the relative survival rate, which is interesting, since it has been suggested that patients with the regression phenomenon should exhibit a poorer survival rate than corresponding patients without the regression phenomenon. The issue is still controversial, and further study of the effect of this phenomenon is needed.


References

[1] Socialstyrelsen. Cancer i siffror - Popularvetenskapliga fakta om cancer - dess forekomst, bot och dodlighet. 2005. ISBN: 91-89446-68-2.

[2] Regionalt Onkologiskt Centrum. Vardprogram for malignt hudmelanom och dysplastiska nevus. 2008.

[3] Cohn-Cedermark G, Rutqvist L-E, Andersson R, Breivald M, Ingvar C, Johansson H, Jonsson P-E, Krysander L, Lindholm C, Ringborg U. Longterm results of a randomized study by the Swedish Melanoma Study Group on 2-cm versus 5-cm resection margins for patients with cutaneous melanoma with a tumor thickness of 0.8-2.0 mm. Cancer 2000; 89(7): 1495-501.

[4] Dickman PW, Sloggett A, Hills M, Hakulinen T. Regression models for relative survival. Statist. Med. 2004; 23(1): 51-64.

[5] Hakulinen T, Tenkanen L. Regression Analysis of Relative Survival Rates. Appl. Statist. 1987; 36(3): 309-317.

[6] Sondergaard K, Hou-Jensen K. Partial regression in thin primary cutaneous melanomas clinical stage I. A study of 486 cases. Virchows Arch. 1985; 408: 241-247.

[7] Blessing K, McLaren KM. Histological regression in primary cutaneous melanoma: recognition, prevalence and significance. Histopathology 1992; 20: 315.

[8] Kelly JW, Sagebiel RW, Blois MS. Regression in malignant melanoma. A histologic feature without independent prognostic significance. Cancer 1985; 56(9): 2287-2291.

[9] Brogelli L, Umberto MR, Moretti S, Urso C. The prognostic significance of histologic regression in cutaneous melanoma. Melanoma Res. 1992; 2: 87.

[10] McGovern VJ, Shaw HM, Milton GW. Prognosis in patients with thin malignant melanoma: influence of regression. Histopathology 1983; 7(5): 673-680.

[11] Klein John P, Moeschberger Melvin L. Survival Analysis: Techniques for Censored and Truncated Data, Second Edition. Springer, 2003. ISBN: 0-387-95399-X.

[12] Olsson U. Generalized Linear Models: An Applied Approach. Studentlitteratur, 2002. ISBN: 91-44-04155-1.

[13] Dickman P. Estimating and modelling relative survival using SAS. 2004.

[14] Ederer F, Axtell LM, Cutler SJ. The Relative Survival Rate: A Statistical Methodology. National Cancer Institute Monograph. 1961; 6: 101-121.

[15] Pohar M, Stare J. Relative survival analysis in R. Computer Methods and Programs in Biomedicine. 2006; 81: 272-278.

[16] Ederer F, Axtell LM, Cutler SJ. The Relative Survival Rate: A Statistical Methodology. National Cancer Institute Monograph. 1961; 6: 101-121.

[17] Ederer F, Heise H. Instructions to IBM 650 Programmers in Processing Survival Computations; 1959. Methodological note No. 10, End Results Evaluation Section, National Cancer Institute, Bethesda MD.

[18] Hakulinen T. Cancer Survival Corrected for Heterogeneity in Patient Withdrawal. Biometrics. 1982; 38: 933-942.


A Variable descriptions

Variable    Description

Pn          Swedish personal ID number
Hemkod      Code giving the county, municipality and parish of the patient
Alder       Age at diagnosis
Dtdgn       Date of diagnosis
Lokal       Location of tumour
Sida        The side of the body on which the tumour is located
Diam        Largest diameter of the tumour
Enbpri      Primary tumour only (yes/no)
Regly       Regional palpable lymph node metastases (yes/no)
Trasat      Satellites/in-transit metastases (yes/no)
Klmet       Remote metastases (yes/no)
Klmlok      Location of remote metastases
T           T classification
N           N classification
M           M classification
dtpri       Date of primary surgery
mmpri       Free margin at primary surgery (mm)
kirpri      Procedure at primary surgery
dtutv       Date of extended surgery
mmutv       Free margin at extended surgery (mm)
kirutv      Procedure at extended surgery
kirlg       Lymph node surgery (yes/no)
dtlg        Date of lymph node surgery
lgors       Reason for lymph node surgery
lgaxh       Lymph node region, right axilla (yes/no)
lgaxv       Lymph node region, left axilla (yes/no)
lgljh       Lymph node region, right groin (yes/no)
lgljv       Lymph node region, left groin (yes/no)
lgann       Lymph node region, other (yes/no)
d kon       Gender of the patient
lopnr       Database identification number
dtpad       Date of pathological dissection
typ         Type of melanoma (invasive/in situ/LM)
hist        Type of invasive melanoma
mm          Tumour depth (mm)
ulc         Presence of ulceration (yes/no)
clark       Clark level of invasion
vaxt        Presence of growth in the resection border (yes/no)
regr        Presence of the regression phenomenon (yes/no)
vitaldat    Last date of follow-up
dod         Status at end of follow-up