70
ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2012 Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 817 Molecular Characterisation and Prognostic Biomarker Discovery in Human Non-Small Cell Lung Cancer KAROLINA EDLUND ISSN 1651-6206 ISBN 978-91-554-8482-8 urn:nbn:se:uu:diva-181912

Molecular Characterisation and Prognostic Biomarker

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Molecular Characterisation and Prognostic Biomarker

ACTAUNIVERSITATIS

UPSALIENSISUPPSALA

2012

Digital Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Medicine 817

Molecular Characterisation andPrognostic Biomarker Discoveryin Human Non-Small Cell LungCancer

KAROLINA EDLUND

ISSN 1651-6206ISBN 978-91-554-8482-8urn:nbn:se:uu:diva-181912

Page 2: Molecular Characterisation and Prognostic Biomarker

Dissertation presented at Uppsala University to be publicly examined in Rudbecksalen,Rudbecklaboratoriet, Dag Hammarskjölds väg 20, Uppsala, Friday, November 16, 2012 at13:15 for the degree of Doctor of Philosophy. The examination will be conducted in English.

AbstractEdlund, K. 2012. Molecular Characterisation and Prognostic Biomarker Discovery in HumanNon-Small Cell Lung Cancer. Acta Universitatis Upsaliensis. Digital ComprehensiveSummaries of Uppsala Dissertations from the Faculty of Medicine 817. 68 pp. Uppsala.ISBN 978-91-554-8482-8.

Non-small cell lung cancer (NSCLC) constitutes a clinically, histologically, and geneticallyheterogeneous disease entity that represents a major cause of cancer-related death. Early-stagepatients, who undergo surgery with curative intent, experience high recurrence rates and theeffect of adjuvant treatment is modest. Prognostic biomarkers would be of particular relevanceto guide intensified treatment depending on expected outcome and moreover often infer abiological role in tumourigenesis.

This thesis presents a translational study approach to establish a well-characterised NSCLCfrozen-tissue cohort and to obtain a profile of each specimen with regard to genome-widecopy number alterations, global gene expression levels and somatic mutations in selectedcancer-related genes. Furthermore, the generation of a formalin-fixed, paraffin-embedded tissuemicroarray enabled validation of findings on the protein level using immunohistochemistry.The comprehensive molecular characterisation, combined with data on clinical parameters,enabled the analysis of biomarkers linked to disease outcome. In Paper I, single nucleotidepolymorphism arrays were applied to assess copy number alterations in NSCLC and associationswith overall survival in adenocarcinoma and squamous cell carcinoma were described. InPaper II, we evaluated expression levels of selected stromal proteins in NSCLC usingimmunohistochemistry and the adhesion molecule CD99 was identified as an outcome-related biomarker in two independent cohorts. Paper III presents a strategy for prognosticbiomarker discovery based on gene expression profiling, meta-analysis, and validation ofprotein expression on tissue microarrays, and suggests the putative tumour suppressor CADM1as a candidate biomarker. In Paper IV, we propose a prognostic role for tumour-infiltratingIGKC-expressing plasma cells in the local tumour microenvironment, indicating an involvementof the humoral immune response in anti-tumor activity. In Paper V, we combined next-generation deep sequencing with statistical analysis of the TP53 database to define novelparameters for database curation.

In summary, this thesis exemplifies the benefits of a translational study approach, basedon a comprehensive tumour characterisation, and describes molecular markers associated withclinical outcome in NSCLC.

Keywords: non-small cell lung cancer, biomarker, prognosis, microarray, copy numberaberration

Karolina Edlund, Uppsala University, Department of Immunology, Genetics and Pathology,Molecular and Morphological Pathology, Rudbecklaboratoriet, SE-751 85 Uppsala, Sweden.

© Karolina Edlund 2012

ISSN 1651-6206ISBN 978-91-554-8482-8urn:nbn:se:uu:diva-181912 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-181912)

Page 3: Molecular Characterisation and Prognostic Biomarker

Till pappa.

Page 4: Molecular Characterisation and Prognostic Biomarker
Page 5: Molecular Characterisation and Prognostic Biomarker

List of papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals. Reprints were made with permission from the respective publish-er. #Authors contributed equally. I Micke P#, Edlund K#, Holmberg L, Kultima HG, Mansouri L, Ekman S,

Bergqvist M, Scheibenflug L, Lamberg K, Myrdal G, Berglund A, Anders-son A, Lambe M, Nyberg F, Thomas A, Isaksson A, Botling J. Gene copy number aberrations are associated with survival in histologic subgroups of non-small cell lung cancer. J Thorac Oncol. (2011)

II Edlund K#, Lindskog C#, Saito A, Berglund A, Pontén F, Kultima HG,

Isaksson A, Jirström K, Planck M, Johansson L, Lambe M, Holmberg L, Nyberg F, Ekman S, Bergqvist M, Landelius P, Lamberg K, Botling J, Östman A, Micke P. CD99 is a novel prognostic stromal marker in non-small cell lung cancer. Int J Cancer. (2012)

III Botling J#, Edlund K#, Lohr M, Hellwig B, Holmberg L, Lambe M,

Berglund A, Ekman S, Bergqvist M, Pontén F, König A, Fernandes O, Karlsson M, Helenius G, Karlsson C, Rahnenführer J, Hengstler JG, Micke P. Biomarker discovery in non-small cell lung cancer: integrating gene expression profiling, meta-analysis and tissue microarray validation. Accepted for publication in Clin Cancer Res. (2012)

IV Lohr M#, Edlund K#, Botling J, Hammad S, Hellwig B, Othman A,

Berglund A, Lambe M, Holmberg L, Ekman S, Bergqvist M, Pontén F, Cadenas C, Marchan R, Hengstler JG, Rahnenführer J, Micke P. The prog-nostic relevance of tumour-infiltrating plasma cells and immunoglobulin kappa C expression indicates an important role of the humoral immune response in non-small cell lung cancer. Manuscript.

V Edlund K, Larsson O, Ameur A, Bunikis I, Gyllensten U, Leroy B, Sund-

ström M, Micke P, Botling J, Soussi T. Data-driven unbiased curation of the TP53 tumor suppressor gene mutation database and validation by ultradeep sequencing of human tumors. Proc Natl Acad Sci U S A. (2012)

Page 6: Molecular Characterisation and Prognostic Biomarker

Related publications

Mattsson JS, Imgenberg-Kreuz J, Edlund K, Botling J, Micke P. Consistent mutation status within histologically heterogeneous lung cancer lesions. Histopathology. (2012) Sundström M, Edlund K, Lindell M, Glimelius B, Birgisson H, Micke P, Botling J. KRAS analysis in colorectal carcinoma: analytical aspects of Pyrosequencing and allele-specific PCR in clinical practice. BMC Cancer. (2010) Botling J#, Edlund K#, Segersten U, Tahmasebpoor S, Engström M, Sundström M, Malmström PU, Micke P. Impact of thawing on RNA integrity and gene expression analysis in fresh frozen tissue. Diagn Mol Pathol. (2009) Segersten U, Edlund K, Micke P, de la Torre M, Hamberg H, Edvinsson Å, Anders-son S, Malmström PU, Wester K. A novel strategy based on histological protein profiling in-silico for identifying potential biomarkers in urinary bladder cancer. BJU Int. (2009)

Page 7: Molecular Characterisation and Prognostic Biomarker

Contents

Introduction ................................................................................................... 11 Cancer ...................................................................................................... 12

The cancer cell and genome ................................................................ 12 The tumour microenvironment ............................................................ 13

Lung cancer .............................................................................................. 15 Aetiology ............................................................................................. 15 Epidemiology ....................................................................................... 15 Classification ....................................................................................... 16

Non-small cell lung cancer ....................................................................... 16 Histology ............................................................................................. 16 Staging ................................................................................................. 17 Treatment ............................................................................................. 17 Molecular pathology ............................................................................ 19

Biomarkers in NSCLC ............................................................................. 21 Biomarker definition ............................................................................ 21 Current prognostic and predictive biomarkers ..................................... 21

Array-based NSCLC profiling ................................................................. 22 Gene expression profiling .................................................................... 23 Genomic profiling ................................................................................ 24

Aim ............................................................................................................... 26

Present Investigation ..................................................................................... 27 Materials and methods ............................................................................. 27

Patient cohort and study design ........................................................... 27 DNA and RNA extraction ................................................................... 31 Gene expression array analysis ............................................................ 31 Single nucleotide polymorphism array analysis .................................. 32 Quantitative real-time polymerase chain reaction ............................... 34 Laser capture microdissection ............................................................. 34 Immunohistochemistry ........................................................................ 35 Genomic sequencing ............................................................................ 37 Statistics ............................................................................................... 39

Page 8: Molecular Characterisation and Prognostic Biomarker

Results and Discussion ............................................................................. 40 Paper I .................................................................................................. 40 Paper II ................................................................................................ 42 Paper III ............................................................................................... 44 Paper IV ............................................................................................... 45 Paper V ................................................................................................ 47

Concluding remarks and future perspectives ........................................... 50

Acknowledgements ....................................................................................... 53

References ..................................................................................................... 55

Page 9: Molecular Characterisation and Prognostic Biomarker

Abbreviations

ABL v-abl Abelson murine leukemia viral oncogene homolog 1 ACTA2 actin, alpha 2, smooth muscle AKT1 v-akt murine thymoma viral oncogene homolog 1 ALK anaplastic lymphoma receptor tyrosine kinase APC adenomatous polyposis coli ATM ataxia telangiectasia mutated ATP adenosine triphosphate BCR breakpoint cluster region BRAF v-raf murine sarcoma viral oncogene homolog B1 CADM1 cell adhesion molecule 1 CAF cancer-associated fibroblast CD99 CD99 molecule CDKN2A cyclin-dependent kinase inhibitor 2A CGH comparative genomic hybridisation CI confidence interval CIN chromosomal instability DAB diaminobenzidine dNTP deoxy nucleotide triphosphates ddNTP dideoxy nucleotide triphosphates DDR2 discoidin domain receptor family, member 2 DNA deoxyribonucleic acid ECM extra-cellular matrix EGFR epidermal growth factor receptor EML4 echinoderm microtubule-associated protein-like 4 EPHA3 ephrin type-A receptor 3 ERBB2 v-erb-b2 erythroblastic leukemia viral oncogene homolog 2 ERBB4 v-erb-a erythroblastic leukemia viral oncogene homolog 4 ERCC1 excision repair cross-complementing rodent repair deficiency, com-

plementation group 1 FDR false discovery rate FFPE formalin-fixed paraffin-embedded FGFR1 fibroblast growth factor receptor FISH fluorescence in situ hybridisation GAPDH glyceraldehyde 3-phosphate dehydrogenase HGFR hepatocyte growth factor receptor HR hazard ratio

Page 10: Molecular Characterisation and Prognostic Biomarker

HRP horseradish peroxidase IGKC immunoglobulin kappa constant IHC immunohistochemistry IRF4 interferon regulatory factor 4 KIF5B kinesin family member 5B KRAS v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog LCNEC large cell neuroendocrine carcinoma LKB1 liver kinase B1 LOH loss of heterozygosity MAP2K1 mitogen-activated protein kinase kinase 1 MDM2 mdm2, p53 E3 ubiquitin protein ligase homolog MET met proto-oncogene (hepatocyte growth factor receptor) MIA minimally invasive adenocarcinoma MS4A1 membrane-spanning 4-domains, subfamily A, member 1 MYC v-myc myelocytomatosis viral oncogene homolog NF1 neurofibromin 1 NKX2-1 NK2 homeobox 1 NOS not otherwise specified NSCLC non-small cell lung cancer OR odds ratio PCA principal component analysis PCR polymerase chain reaction PIK3CA α-phosphatidylinositol-4,5-bisphosphate-3-kinase catalytic subunit PTEN phosphatase and tensin homolog PPi pyrophospate PTPRC protein tyrosine phosphatase, receptor type, C qPCR quantitative PCR RET ret proto-oncogene RIN RNA integrity number RMA robust multi-array average RNA ribonucleic acid ROS1 c-ros oncogene 1 receptor tyrosine kinase RRM1 ribonucleoside-diphosphate reductase SCLC small cell lung cancer SDC1 syndecan 1 SNP single nucleotide polymorphism SOLiD sequencing by oligonucleotide ligation and detection SOX2 sex determining region Y box 2 STK11 serine/threonine kinase 11 TERT telomerase reverse transcriptase TMA tissue microarray TP53 tumor protein p53 TTF1 thyroid transcription factor-1 VEGF vascular endothelial growth factor

Page 11: Molecular Characterisation and Prognostic Biomarker

11

Introduction

More than twelve million people are newly diagnosed with cancer world-wide each year [1]. Behind this large number lie initially very small altera-tions in the building blocks that constitute the genetic blueprint. These alterations affect the normal behaviour of the cell – that is, to live and to die in a controlled fashion – and consequently render a serious threat to human health. While sharing many elementary features related to malignancy, e.g. the unrestricted proliferation of genetically transformed cells [2], all cancers are not alike. Cancer is a disease that presents with numerous faces and mani-fests itself in different ways in various organs and tissue types. This diversity is reflected also in the range of therapeutic strategies that have been devel-oped to combat the disease, leading up to the current concept of personalised cancer care, with treatment customised according to the molecular character-istics of each individual tumour [3]. This thesis focuses on non-small cell lung cancer (NSCLC), one of the most common as well as most deadly cancer types. At first, a brief overview is provided with regard to central aspects of understanding cancer as a genetic and cellular disease, as well as features specifically related to NSCLC, to clarify what clinical needs we should aim to satisfy as we try to better under-stand the basic molecular biology of the disease. The five papers included in this thesis will then be presented, with the objective to portray the benefits of an integrative translational study approach and a clinically, histologically, and molecularly well-characterised patient cohort, as well as to exemplify how the vast amount of molecular information generated by comprehensive tumour profiling efforts can be made understandable in connection with patient parameters, including disease outcome, and potential clinical imple-mentations.

Page 12: Molecular Characterisation and Prognostic Biomarker

12

Cancer

The cancer cell and genome The information that is required for an organism to develop its mass, shape and form, and to construct the tools needed in order to sustain its biological processes, lies embedded in the DNA sequence. The flow of biological information in living organisms is summarised in the central dogma of molecular biology, providing a structure to understand how a copy of the DNA sequence is produced (replication) to enable transfer from mother cell to daughter cell, as well as how the stored sequence information is rewritten into mRNA (transcription) to enable the genetic code to be deciphered and understood by the protein synthesis machinery that will build the functional end product specified in the DNA (translation) [4].

In this process, although under stringent control, errors do occur as the wrong nucleotide may be mistakenly incorporated into the newly synthesised DNA strand during replication. Moreover, DNA damage may occur due to mutagenic substances that we encounter while interacting with our surround-ings or from endogenous processes within the cell [5-6]. We are protected from permanent DNA damage by delicate reparation systems that efficiently correct errors as they occur [6], but the system is not bulletproof and if an error remains unrepaired, this may signify the initiation of the tumourigenic process by enabling the clonal selection of tumour cells with a proliferative advantage [7]. A non-synonymous mutation is any nucleotide change in the DNA sequence that introduces a change in the amino acid sequence of the corresponding protein. To the cell, this changed phenotype may provide an advantage, or a disadvantage, which subsequently provides a basis for bio-logical selection. On the other hand, synonymous mutations are biologically silent in the sense that the changed nucleotide does not alter the amino acid sequence [5]. In general, numerous accumulated mutations are required to promote a malignant phenotype and give rise to a tumour cell clone with the capability of unrestricted proliferation [8-9].

In addition to limitless proliferative ability, acquired traits in common to most cancer cells include a capacity to generate their own growth factors while resisting growth-inhibitory signals and escaping the cell’s apoptotic machinery [2]. The acquisition of properties needed for the invasion of surrounding tissue and development of distant metastasis is also a common feature of neoplastic cells [2] and is the one that in the end causes most cancer deaths. Alongside these often cited hallmarks of cancer, mutation acquisition and genome instability have been mentioned as characteristics that enable cancer development [10].

Three main types of genes, oncogenes, tumour suppressor genes and stability genes, contribute to carcinogenesis [7]. An oncogene is a normal gene (proto-oncogene) that due to transformation is continuously activated,

Page 13: Molecular Characterisation and Prognostic Biomarker

13

active in a setting where it should be inactive, or overexpressed to produce an aberrant amount of gene product. One mutated allele of an oncogene is in general sufficient to provide a growth advantage [11]. Tumour suppressor genes guard the cell from malignant transformation by controlling for instance cell cycle and apoptosis-related processes. Inactivation of both alleles of a tumor suppressor gene is required to contribute to tumour devel-opment [12]. Bi-allelic loss may for instance follow the deletion of one allele due to a large-scale chromosomal event, such as loss of a chromosome arm, together with an inactivating point mutation in the second allele [7]. Stability genes (or maintenance genes) ensure that DNA errors are kept at a minimum and if this type of gene is inactivated, mutations will accumulate more rapid-ly in the cell. While some stability genes correct replication errors and mutagen-caused DNA damage, other safeguard processes such as cell divi-sion and chromosomal segregation [6]. Lately, genome-wide mutation screens have revealed that the total number of genes that are mutated in human cancers is larger than previously thought. In addition to well-known mutation hotspot genes, such as TP53 and KRAS, numerous genes appear to be mutated in only a small percentage of tumours [11,13].

Another common feature of many solid tumours is chromosomal instabil-ity (CIN), causing large-scale amplification and deletion of chromosome segments [7]. In a typical cancer specimen, 25% of the genome is affected by somatic copy number alterations that often affect an entire chromosome or a chromosome arm [14]. The mechanism behind chromosomal instability is likely to involve deregulated mitotic checkpoints and/or telomere dysfunc-tion [15-16].

The tumour microenvironment In the local tissue environment, malignant cells are surrounded by cells of the tumour stroma and the various cell types are involved in an intricate and continuous interplay. Today it is generally accepted that many aspects of tumour formation and growth are influenced by non-malignant cell types in the tumour microenvironment and these interactions are being studied in greater detail [17]. The tumour stroma consists of fibroblasts, cells of the immune system, pericytes, vascular cells, and the extracellular matrix (ECM) proteins that make up the inter-cellular space. Throughout the malig-nant process, factors related to the tumour stroma have been shown to either promote or inhibit tumour growth [18-19]. Therapies that target various aspects of the microenvironment have been developed and are today used in clinical practice, for instance to restrain the formation of new blood vessels that are needed to support the expanding tumour mass [20].

Fibroblasts that reside in the tumour microenvironment, commonly termed cancer-associated fibroblasts (CAF), typically present a morphology that differs compared to that of normal fibroblasts. CAFs express character-

Page 14: Molecular Characterisation and Prognostic Biomarker

14

istic mesenchymal markers, such as α-smooth muscle actin and vimentin, and have many features in common with the myofibroblasts involved in non-malignant processes such as wound-healing [18]. Fibroblasts present locally in the tumour tissue are thought to be the predecessors of their cancer-associated counterparts, but pericytes and circulating bone marrow-derived mesenchymal cells are also considered possible CAF precursor cells, as are vascular cells, or even tumour cells, that may transform through endothelial or epithelial to mesenchymal transition [19].

Inflammation in the tumour vicinity is a common feature of many cancers [21]. The capacity of tumour cells to avoid detection and elimination by the host immune system is important for tumour initiation and has been proposed to be included among the characteristics that constitute the cancer hallmarks [10]. On the other hand, it is apparent that concurrent inflamma-tion in the early stages of tumour formation may endorse tumour develop-ment [22]. These two aspects highlight the two-sided role of the immune system in cancer, counteracting as well as promoting the tumourigenic process.

While studying single genes and pathways it is important to remember that every system functions as a whole and that the interactions between different cell types constitute a vital aspect of tumour formation and growth. However, the complex tissue composition also demands that considerations are taken when interpreting data that originates from mixed cell type tissue specimens. In many cancer specimens, such as small tumour needle biopsies, the actual tumour cell population frequently constitutes the minority cell type, which will influence the composite signal and potentially lead to erroneous conclusions if not assessed and accounted for. The application of in situ methods may counteract this problem, as may laser microdissection techniques and in silico approaches.

Page 15: Molecular Characterisation and Prognostic Biomarker

15

Lung cancer

Aetiology Today it is well-known that cigarette smoking is the most important risk factor for lung cancer development and the risk has been shown to increase with the number of smoking years as well as with the number of cigarettes smoked daily [23-25]. Approximately 9/10 male and 8/10 female lung can-cer patients in Sweden are either current or former smokers [26]. Whereas the lung cancer risk in former smokers remains elevated compared to never smokers, to quit smoking is favourable also after many years, as the risk indeed decreases with time [27-28]. In addition to tobacco usage, known risk factors include the exposure to radon, asbestos, chromates, chloromethyl ethers, and polycyclic aromatic hydrocarbons, as well as a number of metals such as arsenic, chromium and nickel. Other suggested risk factors include outdoor air pollution, dietary factors, lack of physical activity, and the presence of other acquired lung disease [29-30]. Although lung cancer risk factors to a large extent can be found in the environment, the risk to acquire disease to some degree probably depends on individual susceptibility along-side exposure to respiratory carcinogens [31-32].

Epidemiology Lung cancer is the most commonly diagnosed cancer type world-wide. In 2008 there was an estimated 1.61 million new lung cancer cases, represent-ing 12.7% of all cancers newly diagnosed in that year. Lung cancer is also the leading cause of cancer-related death [33]. In men, lung cancer is both the most commonly diagnosed cancer (16.5%) and the most common cause of death due to cancer world-wide, while in women lung cancer is now the fourth most frequently diagnosed cancer (8.5%) and the second most common cause of death from cancer [33]. World-wide lung cancer statistics present discouraging numbers and corresponding figures for Sweden paint an analogous picture, with lung cancer constituting a major health care burden as well as a cause of substantial individual distress with a reported 13.9% 5-year relative survival rate across all stages [34]. Almost 25,000 new lung cancer patients were reported to the Swedish National Cancer Registry between the years 2002 and 2009 (53% men; 47% women), a majority of which with an age of 60 years or older at the time of diagnosis [26].

In never smokers, lung cancer seems to be on the rise, a trend especially prominent in Asia [35]. Compared to smoking-related disease, lung cancer in never smokers is clinically characterised by a higher frequency of female patients and a higher frequency of tumours with adenocarcinoma histology [36]. Of the total number of reported cases in Sweden 2002-2009, never smokers constituted 10.2% (2537 patients: 6.1% male; 14.9% female) [26].

Page 16: Molecular Characterisation and Prognostic Biomarker

16

In a recent study, never smokers aged 40 to 79 years in the Uppsala-Örebro region in Sweden were reported to have age-adjusted incidence rates of 4.8 (per 100,000) in men and 14.4 in women [37].

Classification Lung cancer comprises a clinically, histologically, and genetically hetero-geneous group of tumours. If possible, the final diagnosis is based on histology or cytology. An initial important distinction is made between small cell (SCLC) and non-small cell lung cancer (NSCLC), as clinical behaviour and treatment strategies differ between these two groups [38]. SCLC consti-tutes approximately 15% of all diagnosed lung cancers and is believed to have its origin in neuroendocrine cells of the lung [39]. SCLC is sensitive to chemotherapy, but a majority of patients rapidly develop treatment resistance and survival beyond five years is rare. Metastatic disease is seen at the time of diagnosis in a majority of SCLCs [39]. This thesis focuses on NSCLC, which accounts for approximately 85% of all lung cancers and is further subdivided into three main categories based on tumour histology (next section) [38,40].

Non-small cell lung cancer Histology NSCLC histologic subgroups are defined based on differences in cellular morphology and to make this distinction has become increasingly important as therapy choice today is influenced by histology [41-42]. A morphological examination of a stained tissue section in the light microscope is routinely performed to determine the histological subtype according to current classi-fication schemes. Additionally, immunohistochemical evaluation of protein markers may be necessary to reach a definite conclusion, in particular for undifferentiated tumours [43]. In Sweden, adenocarcinoma constitutes the largest subgroup with 40% of all diagnosed lung cancers, while squamous cell and large cell carcinoma make up 21% and 14%, respectively [26].

Adenocarcinoma is characterised by a glandular structure and/or the production of mucin and by the expression of protein markers such as thyroid transcription factor-1 (TTF1) [42]. In the new IASCL/ATS/ERS classification, invasive adenocarcinoma is further categorised according to the predominant growth pattern into lepidic, acinar, papillary, micro-papillary, and solid variants. In addition, the term minimally invasive adeno-carcinoma (MIA) was introduced to describe small lepidic tumours (≤ 3cm) with an invasive component ≤ 5mm; a group of tumours with a generally good prognosis [44]. Adenocarcinoma is the most common type in younger

Page 17: Molecular Characterisation and Prognostic Biomarker

17

men (< 50 years), in women of all ages, and in never smokers [45]. It is also the most frequent subtype in patients harbouring mutations in the EGFR or KRAS gene [46].

In squamous cell carcinoma, the histological picture is characterised by keratinisation and intercellular bridges. Immunohistochemistry for protein markers cytokeratin 5/6 and/or p63 aids to define the squamous subtype in poorly differentiated tumours [41]. Large cell carcinoma presents an un-differentiated histological pattern with no microscopic evidence of squamous or glandular differentiation [38]. Variants of large cell carcinoma include for instance large cell neuroendocrine carcinomas (LCNEC), which are poorly differentiated tumours with histologic features suggestive of neuroendocrine differentiation and expression of markers like chromogranin A and synapto-physin [41,47]. Besides these three major histological subtypes, adenosqua-mous carcinomas, sarcomatoid carcinomas and typical/atypical carcinoids account for a few per cent of all lung malignancies [38].

Staging Assessment of the extent of the tumour burden in the individual patient is based on the size of the tumour, the invasion of organ structures and lymph nodes, and the presence of distant metastasis. The TNM system is the most widely used scheme for categorisation of these parameters and the 7th edition is currently applied (Table 1) [48-49]. T stands for tumour and describes the size of the primary tumour and its growth into neighbouring organs; N stands for node and describes the existence and degree of metastasis to the lymph nodes; M stands for metastasis and describes if distant metastases are present. Common sites of metastasis include brain, pleural cavity, bone, liver, adrenal glands, and skin [50]. The TNM components are then summa-rised to define the cancer stage (Table 2), typically denoted by roman numerals I-IV, where I represents small tumours that are localised to the lung and IV describes metastatic tumours that have spread to distant organ sites. For a more detailed classification, further subdivision into for instance IA and IB, may be implemented. The tumour stage forms a basis for assess-ment of prognosis and for making treatment decisions [51].

Treatment The first symptoms in NSCLC cancer patients are in general unspecific. A majority experience one or more symptoms from the respira-tory tract, such as coughing, hoarseness, or chest pain. Other symptoms, such as weight loss and fatigue, may indicate systemic signs of cancer, or originate from meta-static disease (bone pain, headache) or paraneoplastic syndromes. As there are no lung cancer-specific early symptoms, frequently the cancer is diag-nosed at a late stage where treatment options are inadequate and survival

Page 18: Molecular Characterisation and Prognostic Biomarker

18

rates are poor [52-53]. In stage I and II patients (25-30%) surgical resection is the treatment of choice and there is a possibility of complete cure [54]. Nevertheless, numerous early-stage patients (30-40% of stage I) experience tumour relapse [55]. Table 1. Definition of TNM (7th ed), reproduced from [48].

Primary tumour (T) Description

T1 Tumour ≤ 3 cm in diameter surrounded by lung or visceral pleura, without invasion more proximal than lobar bronchus.

T1a Tumour ≤ 2 cm in diameter T1b Tumour >2 cm but ≤ 3 cm in diameter T2 Tumour >3 cm but ≤ 7 cm in diameter or tumour with: -Involvement of the main bronchus ≥ 2 cm distal to the carina -Invasion of visceral pleura -Associated atelectasis or obstructive pneumonitis that does not

involve the entire lung. T2a Tumour ≤ 5 cm in diameter T2b Tumour >5 cm but ≤ 7 cm in diameter T3 Tumour >3 cm but ≤ 7 cm in diameter or tumour with: -Direct invasion of the chest wall, diaphragm, phrenic nerve -Direct invasion of the mediastinal pleura or parietal pericardium -Associated atelectasis or obstructive pneumonitis that involves

the entire lung. -Tumour within the main bronchus < 2 cm to the carina, without

involvement of the carina. -Satellite tumour nodule(s) in the same lobe. T4 Tumour of any size with: -Invasion of mediastinum -Invasion of heart or great vessels -Invasion of trachea, oesophagus, or recurrent laryngeal nerve -Invasion of a vertebral body or carina -Separate tumour nodules in a different ipsilateral lobe. Regional lymph node N

N0 No regional lymph node metastasis N1 Involvement of ipsilateral hilar or peribronchial nodes N2 Involvement of ipsilateral mediastinal or subcarinal nodes N3 Involvement of contralateral mediastinal or hilar nodes, or ipsi-

lateral/contralateral scalene or supraclavicular nodes. Distant metastasis

M0 No distant metastasis M1 Distant metastasis present M1a Separate tumour nodule(s) in a contralateral lobe or tumour with

pleural nodules or malignant pleural/pericardial effusion M1b Distant metastasis

Page 19: Molecular Characterisation and Prognostic Biomarker

19

Table 2. Tumour stage based on TNM 7th ed, reproduced from [51].

Stage TNM subset Stage TNM subset

0 Carcinoma in situ IA T1a/T1b N0 M0 IB T2a N0 M0 IIA T1a/T1b N1 M0 IIB T2b N1 M0 T2a N1 M0 T3 N0 M0 T2a N0 M0 IIIA T1/T2 N2 M0 IIIB T4 N2 M0 T3 N1/N2 M0 Any T N3 M0 T4 N0/N1 M0 IV Any T Any N M1a/M1b

Adjuvant chemotherapy to decrease the risk of recurrence is currently recommended for stage II patients, but not stage I [54]. Roughly 30% of NSCLC patients are diagnosed with locally advanced stage III disease. This heterogeneous group can be further divided into IIIA and IIIB, with 5-year survival rates of 23% and 7% respectively [56-57]. Stage IIIA patients may be operated like those with stage I and II disease, followed by adjuvant chemotherapy [56]. Stage IIIB tumours are considered inoperable and a combination of radio- and chemotherapy is recommended [57]. Stage IV constitutes around half of all newly diagnosed cases. At this stage, the disease is considered incurable, but treatment, e.g. chemotherapy adminis-tered alone or in combination with a targeted therapy, may improve symptoms and offer additional months, or even years, to individual patients [58]. Examples of targeted therapies approved in non-squamous NSCLC include for instance bevacizumab, a monoclonal antibody directed against VEGF (vascular endothelial growth factor) [20], and small molecule inhibi-tors of EGFR (epidermal growth factor receptor) tyrosine kinase activity [59].

Molecular pathology Today we have gained an extensive knowledge about the various molecular features of lung tumours. Mutations in the TP53 gene is a common genetic event in NSCLC, present in more than half of lung tumours, particularly in smoking-related squamous cell carcinomas [60]. Another common event that occurs in nearly 50% of NSCLCs is the presence of an inactivating mutation in STK11 (serine/threonine kinase 11; also known as LKB1), predominantly in smokers with adenocarcinoma and many times coexisting with mutations in other cancer genes [61].

Activating mutations in KRAS (v-Ki-ras2 Kirsten rat sarcoma viral onco-gene homolog), a majority involving hotspot codons 12 or 13, are reported to occur in approximately 30% of adenocarcinoma patients, while rarely being

Page 20: Molecular Characterisation and Prognostic Biomarker

20

detected in squamous cell carcinomas [62-63]. The occurrence of activating EGFR mutations range from 5-20% in Western populations to 20-40% in Asia, exon 19 deletions and the exon 21 point mutation L858R being the most commonly reported events [63]. EGFR mutations are in general more frequently observed in female patients, never smokers, patients of Asian ethnicity, and in tumours with adenocarcinoma histology [46]. Increased EGFR copy number (defined by gene amplification or high polysomy) has been reported in approximately 30% of NSCLC patients and many times coexists with the presence of mutation [64].

BRAF (v-raf murine sarcoma viral oncogene homolog B1) mutations are detected in only 1–3% of NSCLCs and mostly in adenocarcinomas [65]. Mutations in PIK3CA (α-phosphatidylinositol 4,5-bisphosphate 3-kinase, catalytic subunit) are present in 2-4% of NSCLCs, both squamous cell carcinomas and adenocarcinomas, and may coexist with KRAS and EGFR mutations [65-66]. Other low-frequency mutations in NSCLC include for instance ERBB2 (v-erb-b2 erythroblastic leukemia viral oncogene homolog 2) – 2%, AKT1 (v-akt murine thymoma viral oncogene homolog 1) – 1%, and MAP2K1 (mitogen-activated protein kinase kinase 1) – 1% [65]. Point mutation in DDR2 (discoidin domain receptor family, member 2), a collagen-binding receptor tyrosine kinase [67], was recently detected in squamous cell carcinoma (4%) and have been shown to be associated with response to dasatinib (a BCR/ABL and Src family tyrosine kinase inhibitor) in cell culture and animal models [68].In one of the largest mutations screens in NSCLC to date, by Ding et al., 623 genes were sequenced in 188 adeno-carcinomas [69]. In addition to genes previously known to be frequently mutated in lung adenocarcinoma, recurrent mutations were detected in tu-mour suppressor genes NF1 (neurofibromin 1), ATM (ataxia telangiectasia mutated), and APC (adenomatous polyposis coli), as well as putative onco-gene ERBB4 (v-erb-a erythroblastic leukemia viral oncogene homolog 4) and members of the ephrin family of receptor tyrosine kinases including for instance EPHA3 (ephrin type-A receptor 3).

Amplification of the MET gene, which encodes the hepatocyte growth factor receptor (HGFR) tyrosine kinase, has been reported at frequencies ranging from in 5.6 to 21.0% of NSCLC patients [70-72] and in approximately 20% of patients who has acquired resistance to EGFR-inhibitors [73]. In 3-7% of unselected NSCLC, a small inversion on chromosome 2p causes the formation of a fusion gene that consists of parts from EML4 (echinoderm microtubule-associated protein-like 4) and ALK (anaplastic lymphoma receptor tyrosine kinase) and results in the constitu-tive activation of ALK [74-75]. Recently, additional kinase fusion genes, involving for instance ROS1 (c-ros oncogene 1 receptor tyrosine kinase), have been identified in adenocarcinomas and may represent targets for ther-apy [76-77]. Gene fusions involving RET (ret proto-oncogene) and KIF5B

Page 21: Molecular Characterisation and Prognostic Biomarker

21

(kinesin family member 5B) as well as CCDC6 (coiled-coil domain containing 6) were also recently described [77-78].

Biomarkers in NSCLC Biomarker definition As can be inferred from the word itself, a biomarker is, in its most simple sense, a marker (or indicator) of underlying biology. One common definition states that a biomarker is a “characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic pro-cesses, or pharmacologic responses to a therapeutic intervention” [79]. A useful cancer biomarker provides clinically relevant information that will benefit the individual patient. A prognostic biomarker predicts a patient’s individual cancer outcome (i.e. overall survival or progression-free survival) regardless of given therapy. Prognostic biomarkers can be utilised in risk stratification, for instance to decide how aggressively a patient should be treated. A predictive biomarker provides information about the response to therapy, e.g. if a specific drug is likely to benefit the patient [80].

Current prognostic and predictive biomarkers As operated NSCLC patients are at high risk for tumour recurrence, a prog-nostic biomarker that can stratify NSCLC patients into high and low risk groups, and that is readily applicable in clinical diagnostics, might be valuable in the selection of patients for additional and more aggressive treatment. In general, the value of adjuvant chemotherapy in NSCLC is disputed, in particular with regard to patients with stage I disease. A useful prognostic biomarker would reliably point out the more aggressive tumours, with a high likelihood of recurrence, as well as recognise low-risk patients who would not benefit from adjuvant chemotherapy, thus avoiding over-treatment with associated side effects in early-stage NSCLC. Clinical parameters such as the tumour stage, and the patient’s performance status, are widely used to make predictions on prognosis but are many times inade-quate. A number of candidate prognostic molecular biomarkers have been suggested, but remain to be independently validated in prospective random-ised trials and included in routine diagnostics [81-83].

One example of a candidate prognostic biomarker is RRM1 (ribonucleo-side-diphosphate reductase) which is involved in control of proliferation through the manufacturing of deoxynucleotides and in metastasis through induction of PTEN (phosphatase and tensin homolog). A high expression level of RRM1 has been found to be indicative of longer survival, independ-ent of stage and performance status, and predictive power with regard to

Page 22: Molecular Characterisation and Prognostic Biomarker

22

treatment with gemcitabine in advanced NSCLC has also been linked to RRM1 expression [84-85]. Another example of a candidate prognostic, as well as predictive, biomarker is ERCC1, a member of the excision repair cross-complementing gene family that maintains DNA integrity. Operated NSCLC patients with high ERCC1 levels are shown to have a better prog-nosis, perhaps due to the benefits of an intact pathway for DNA repair by nucleotide excision [86]. On the contrary, low ERCC1 expression is predic-tive of better response to platinum-based chemotherapy [87].

Targeted therapy, directed at specific structures in the tumour cell that differ between the cancer cells and their normal counterparts, have lately entered the spectrum of available treatment options. Consequently, a door has been opened for further individualisation of treatment, and the concur-rent development of suitable tools to predict which subsets of patients will benefit, or not benefit, from treatment. Predictive testing, using molecular techniques, is today routinely performed in clinical pathology laboratories. Activating EGFR mutations predict response to EGFR tyrosine kinase inhibitors [88-91] and is a well-established predictive biomarker, routinely examined in clinical laboratories, using for instance conventional sequencing methods, allele-specific real-time PCR or high-resolution melting curve analysis [92-93]. Unfortunately, only a minority of NSCLC patients harbour these mutations and benefit from the selective inhibition of EGFR signalling [59] and, for those who do benefit, the development of treatment resistance poses a problem, for instance via a secondary T790M point mutation or amplification of MET [94-95]. Increased EGFR copy number has also been linked to improved outcome after gefitinib treatment [96], but is today not analysed on a routine basis.

Another emerging predictive biomarker is the presence of the EML4-ALK fusion gene. ALK-inhibitors, including for instance crizotinib, have been developed and are clinically efficient in the subgroup of NSCLC patients that harbour the EML4-ALK fusion gene [97], and analysis of EML4-ALK using fluorescence in situ hybridisation (FISH) or real-time quantitative PCR-based methods [98-99] is today performed routinely in many molecular pathology laboratories. Secondary mutations within the kinase domain of EML4-ALK have been found to cause resistance to ALK inhibitors [100].

Array-based NSCLC profiling Notwithstanding advancements in the development of novel treatment options and the identification of biomarkers to predict treatment response, improvement of overall NSCLC survival has been modest, prolonging life with a few months in a minority of patients and mostly in the advanced stag-es of NSCLC. With the introduction of array-based technologies for

Page 23: Molecular Characterisation and Prognostic Biomarker

23

genome- and transcriptome-wide analysis, new opportunities emerged to define clinically meaningful subtypes and to build prognostic and predictive molecular signatures based on global gene expression patterns and DNA copy number alterations.

Gene expression profiling While all the basic genetic information needed to build an organism is encoded in the DNA sequence, the dynamic aspects of the genome are dis-played as alterations in gene expression – a tightly regulated process to determine when and where a specific gene should be transcribed and translated. Gene expression microarray analysis allows the simultaneous measurement of expression levels for thousands of genes to provide a snap-shot of the underlying biological activity in cells at a given time. Global gene expression microarray data is often analysed by various clustering methods that group genes and/or samples based on the similarity of gene expression patterns and by statistical methods that highlight the potential relevance of genes in connection with clinical variables. The aim of gene expression profiling studies can often be described as either class compari-son (gene identification), class prediction (prediction of for instance clinical outcome) or class discovery (grouping samples with similar profiles) [101].

Gene expression profiling was initially used to build tumour classification schemes that portray distinct molecular subsets which are indistinguishable by morphological examination [102-106]. In one of the first gene expression array studies in the lung cancer field, Bhattacharjee et al. used unsupervised hierarchical clustering to define distinct molecular subsets of NSCLC patients based on gene expression patterns. In the retrospective analysis, a subclass of adenocarcinomas that comprised tumours with high expression of neuroendocrine markers were characterised by a less favourable outcome [104]. Gene expression signatures based on microarray data have also been used to predict prognosis with regard to lymph node metastasis [107-109], tumour relapse after surgical resection [110-112], and disease-free and over-all survival [113-121]. In addition, gene expression profiling of lung cancer cell lines and tumours were used to predict response, or resistance, to conventional chemotherapy and targeted therapy, such as EGFR tyrosine kinase inhibitors [122-124]. An expression profile representative of smok-ing, generated by comparing gene expression levels in never and current smokers, was found to include an over-representation of cell cycle-related genes, predominantly genes involved in mitotic spindle formation. Also, in former smokers, gene expression levels were found to display persistent alterations even in patients who quit smoking more than 20 years ago [125].

While undoubtedly much knowledge has been gained during a decade of gene expression profiling, the initial enthusiasm, powered by the prospect to make clinically meaningful predictions based on global gene expression

Page 24: Molecular Characterisation and Prognostic Biomarker

24

patterns in NSCLC, have somewhat subsided. No classifier based on gene expression data is today widely established in clinical practice and it seems that many published gene signatures are either not reproducible or do not add information beyond conventional prognostic parameters [126]. This is in contrast to the breast cancer field, where gene expression array-based prognostic signatures have been more thoroughly validated, developed into products for risk assessment, and included in on-going phase III trials [127-129]. Indeed it remains a major challenge to reliably convert information from the complexity of thousands of interconnected data points into a suitable format for clinical implementation and a real benefit to lung cancer patients.

Genomic profiling The identification of frequently altered genomic regions in cancer is a potent way to unravel key genes that are involved in tumourigenesis. Conventional cytogenetic techniques, such as karyotyping methods and FISH, were in the early 1990s supplemented with comparative genomic hybridisation (CGH) [130]. In CGH, differently labelled tumour and reference DNA samples are simultaneously hybridised onto chromosome metaphase spreads from a healthy individual and the fluorescence ratio is measured. CGH was later further developed into an array format (array-based CGH) that replaced metaphase spreads with BAC (bacterial artificial chromosome) or cDNA clones spotted onto a glass slide; an improvement that significantly increased the resolution of detection and was able to provide more exact chromosomal locations of detected aberrations [131-133]. Yet another high-resolution technique was introduced with the commercial single nucleotide polymor-phism (SNP) array chip [134-135]. Using SNP array technology, thousands to hundred thousands of individual SNPs can be simultaneously assessed to detect genome-wide gene copy number alterations at an unprecedented resolution. SNP arrays also have the benefit of being able to detect allelic imbalances, such as loss of heterozygosity (LOH), across the genome [136].

Since the late 1990s, numerous studies have applied conventional CGH [137], array-CGH [138-141] and SNP arrays [142-144] to describe genome-wide recurrent somatic copy number alterations in NSCLC, including 1q, 3q, 5p, 7p, 8q and 17q gain, and 3p, 6q, 8p, 9p, 13q, 17p and 18q loss, as well as revealed significant differences between histologic subtypes. Recurrent gain and loss regions are likely to contain genes that are crucial to pathogenesis and that may represent targets for treatment. The exploration of candidate genes and their clinicopathologic associations is a work still in progress.

In the largest study to date, Weir et al. analysed 371 adenocarcinomas [143] and copy number gain at chromosome 5p was reported to be the most common genomic aberration, occurring in 60% of analysed samples. The most frequently detected focal event was amplification of 14q13.3 (6-12%)

Page 25: Molecular Characterisation and Prognostic Biomarker

25

and the authors suggested, based on functional analyses, that NK2 homeo-box 1 (NKX2-1, also called TITF or TTF1) represents a novel cancer-related gene in this region. Other reported focal regions in the Weir study included for instance amplification of 5p15.33, 12q15, and 17q12 (containing known oncogenes TERT, MDM2, and ERBB2, respectively) and deletion of 10q23.31 and 9p12.3 (PTEN and CDKN2A).

Only a few whole-genome copy number studies in NSCLC to date included data on patient outcome [139-140,144]. The first high-resolution array-CGH analysis of lung adenocarcinoma by Shibata et al. revealed that 13p14.1 loss and 8q24.2 gain were significantly associated with longer disease-free survival [140]. Iwakawa and co-workers also showed that 8q gain is a recurrent event in lung adenocarcinoma, while amplification of 8q24.21, in contrast, correlated with poor prognosis in the Iwakawa study [144]. In squamous cell carcinoma, fibroblast growth factor receptor 1 (FGFR1) amplification on chromosome 8p12 is a common genetic event (16-22%) and the use of small-molecule inhibitors that target FGFR1 may be applicable in patients with metastatic disease [145-146]. Another recurrent event, amplification of the transcription factor and candidate oncogene SOX2 (Sex determining region Y box 2) on 3q26.3, has been associated with better outcome in squamous cell carcinoma [147-149].

Recent publications using SNP arrays have also described an increase in the number of gene amplification events in heavy smokers compared to light and non-smokers [150]. Furthermore, it has been reported that the extent of copy number alteration increases with cancer progression, e.g. comparing early-stage to advanced-stage tumours [151]. However, extreme chromoso-mal instability (CIN) may be deleterious to malignant carcinomas and recent finding suggest that poor outcome in NSCLC and other cancer types rather is linked to intermediate levels of CIN [152].

Page 26: Molecular Characterisation and Prognostic Biomarker

26

Aim

The overall aim of this thesis was

to establish a non-small cell lung cancer cohort and to obtain a comprehen-sive molecular profile of each individual sample with regard to whole-genome gene copy number alterations, global gene expression levels and somatic mutations in selected cancer genes.

to complement the frozen tissue cohort with a formalin-fixed paraffin-embedded tissue microarray to enable validation of findings on the protein level using immunohistochemistry.

to integrate molecular data from various analytical platforms with infor-mation on patient parameters to identify clinically relevant molecular events, linked to overall survival, that may represent candidate prognostic biomarkers. More specifically the aims were

to describe associations between DNA copy number gain/loss and overall survival in histological subtypes of NSCLC (Paper I).

to evaluate the expression pattern of selected stromal proteins in non-small cell lung cancer in order to identify candidate biomarkers related to disease outcome (Paper II).

to present a strategy for prognostic biomarker discovery based on global gene expression array profiling, meta-analysis and in situ validation of protein expression using tissue microarrays (Paper III).

to study the prognostic relevance of tumour-infiltrating plasma cells and immunoglobulin kappa protein expression in NSCLC (Paper IV).

to apply next-generation deep sequencing and statistical analysis of the TP53 database in order to outline parameters for database curation (Paper V).

Page 27: Molecular Characterisation and Prognostic Biomarker

27

Present Investigation

Materials and methods Patient cohort and study design The establishment of this non-small cell lung cancer study cohort was made possible by the presence of a well-organised local biobank infrastructure for the procurement and storage of fresh frozen tissue samples from surgical procedures and of a high-quality cancer registry for the standardised assem-bly of clinical patient information in the Uppsala-Örebro region. The use of patient data and tissue specimens was approved by an ethical review board.

The study population consisted of all operated patients with primary NSCLC that were diagnosed and reported to the Uppsala-Örebro Regional Lung Cancer Registry in the years 1995-2005, and that also had a fresh frozen tissue sample available via the Uppsala frozen tissue Biobank. When the Regional Lung Cancer Registry was cross-linked against the Biobank database, using the Swedish personal identification number as an identifier, 382 patients were recognised and the corresponding frozen tissue blocks (n=706) were retrieved from the biobank low-temperature freezers. Tissue sections (4µm) were prepared from all tissue blocks using a cryostat, stained with haematoxylin-eosin, and subsequently reviewed by a trained pathologist. Study inclusion was based on (i) confirmed NSCLC histology (adenocarcinoma, squamous cell carcinoma, or large cell carcinoma/ NSCLC not otherwise specified (NOS) in the frozen tissue section, (ii) tumour sample size ≥ 5 mm, and (iii) tumour cell fraction ≥ 50%. Samples from patients who had received neoadjuvant treatment were excluded. After DNA and RNA extraction from all eligible cases, an RNA integrity number (RIN) ≥ 7.0 [153] was also required for study inclusion (iv). In total, 196 cases qualified according to these criteria. The comprehensive study design is described and visualised in Figure 1.

At the outset, to increase the chance to identify molecular events inde-pendently associated with prognosis, two patient populations, defined by different survival outcomes, were included in the study (Paper I, II and V). Eligible cases were ranked according to survival time and 100 samples were selected to represent short-term (0-20 months; n=53) and long-term survi-vors (58-172 months; n=47). As only surgically resected patients were included, a majority of selected cases had early-stage disease (IA-IIB 78%,

Page 28: Molecular Characterisation and Prognostic Biomarker

28

IIIA-IV 18%, missing data 4%). The median age at diagnosis was 65 years (range 40-82 years) and 54% of patients were male, while 46% were female. Never smokers constituted 7% of the patients, while former and current smokers comprised 42% and 47%, respectively (missing data 4%). A majori-ty of samples were of adenocarcinoma subtype (50%), followed by squamous cell (28%) and large cell (22%) carcinoma. The mutation status for TP53, KRAS, and EGFR was evaluated and 67%, 25%, and 13% of pa-tient samples, respectively, were found to harbour a mutation (see Paper V, Dataset S2).

In addition to the frozen tissue cohort, formalin-fixed paraffin-embedded (FFPE) archival tissue blocks were available from 94/100 patients. Applying the same study design, with long- and short-term survivors and the same cut-off survival times as for the frozen tissue cohort, these 94 tumours plus 96 additional tumour samples (where the corresponding frozen specimens was ineligible for the molecular analysis) were included in a tissue microarray (TMA) (total n=190) (Paper II).

At a later time, the remaining 96 frozen tissue samples with intermediate survival time, that fulfilled the quality criteria, were included to generate an extended consecutive frozen tissue cohort (total n=196). The FFPE tissue microarray cohort was extended accordingly to include in total 355 patients. Clinical patient information for the extended frozen tissue and FFPE cohorts is presented in Table 3 and Table 4 (Paper III-IV).

Figure 1. Study design: The Regional Lung Cancer Registry was cross-linked against the Uppsala frozen tissue Biobank database to define the study population. Patient samples were selected according to quality criteria and a comprehensive molecular profile was obtained for each individual sample with regard to whole-genome copy number alterations, global gene expression levels and somatic mutations in selected cancer genes. A FFPE TMA was constructed to enable valida-tion of findings on the protein level using immunohistochemistry.

Regional Lung Cancer Registry for Uppsala/Örebro

1995-2005 Quality criteria

Histology

AgeGenderSmoking

StageSurvival

Histology

Uppsala Biobank

n=196

DNA copy number

alteration

n=196

DNA mutation

n=196

mRNAgene expression

n=355

proteinin situ expression

pattern

Page 29: Molecular Characterisation and Prognostic Biomarker

29

Table 3. Clinical characteristics of NSCLC patients included in the extended frozen tissue cohort (Paper III-IV).

No. %

All cases 196 100.0 Gender Male 107 54.6 Female 89 45.4 Age at diagnosis < 60 64 32.7 ≥ 60 132 67.3 Median (range) 65 (39-84) Smoking Current 96 49.0 Former 85 43.4 Never 15 7.7 Tumour stage IA 40 20.4 IB 90 45.9 IIA 6 3.1 IIB 29 14.8 IIIA 21 10.7 IIIB 6 3.1 IV 4 2.0 Histology Adenocarcinoma 106 54.1 Squamous cell carcinoma 66 33.7 Large cell carcinoma/NOS 24 12.2 Performance status (WHO)(1) 0 105 53.6 1 75 38.3 2 12 6.1 3 4 2.0 4 0 0.0 Mutation status(2) KRAS mutation 50 26.0 EGFR mutation 21 11.0

(1) The performance status is an assessment of a patient’s general well-being and daily activities, here reported as the WHO score (or ECOG score): 0-no symptoms, 1-symptomatic but ambulatory, 2-symptomatic and in bed <50%, 3-symptomatic and in bed >50%, 4-bedbound, 5-dead [154].

(2) The KRAS analysis was performed using pyrosequencing of mutation hotspot codons 12-13 and 61. The EGFR analysis was performed using Sanger sequencing of exons 18-21. The TP53 mutation status was not analysed for the complete extended frozen tissue cohort and is therefore not included in this table.

Page 30: Molecular Characterisation and Prognostic Biomarker

30

Table 4. Clinical characteristics of NSCLC patients included in the extended FFPE tissue microarray cohort (Paper III).

No. %

All cases 355 100.0 Gender Male 193 54.4 Female 162 45.6 Age at diagnosis < 60 93 26.2 ≥ 60 262 73.8 Median (range) 67 (40-84) Smoking Current 163 45.9 Former 156 43.9 Never 34 9.6 Missing data 2 0.6 Tumour stage IA 88 24.8 IB 150 42.3 IIA 12 3.4 IIB 44 12.4 IIIA 35 9.9 IIIB 16 4.5 IV 10 2.8 Histology Adenocarcinoma 195 54.9 Squamous cell carcinoma 120 33.8 Large cell carcinoma/NOS 40 11.3 Performance status (WHO) 0 187 52.7 1 134 37.7 2 28 7.9 3 5 1.4 4 1 0.3

In the extended cohort KRAS and EGFR mutations were detected in 26% and 11%, respectively. In KRAS, 86% of mutations were detected in codon 12 (p.Gly12Ala n=5; p.Gly12Cys n=15; p.Gly12Asp n=12; p.Gly12Val n=11), 6% in codon 13 (p.Gly13Cys n=2; p.Gly13Asp n=1) and 8% in codon 61 (p.Gln61His n=3; p.Gln61Leu n=1). The majority of EGFR muta-tions constituted exon 19 deletions (p.Glu746-Ala750 del variants) (n=11) and exon 21 point mutations at codon 858 (p.Leu858Arg) (n=5), while rare variants included for instance two samples with a point mutation in exon 18 (Gly719Ala; Gly719Ser) and one sample with double point mutations in exons 18 and 20 (Ser768Ile+Gly719Cys).

Page 31: Molecular Characterisation and Prognostic Biomarker

31

Follow-up information with regard to tumour recurrence was available only for a subset of the extended frozen tissue and TMA cohorts, as this infor-mation is not routinely incorporated into the Regional Lung Cancer Registry and had to be collected retrospectively from patient hospital records. Infor-mation on recurrence-free survival was available for 157 patients, 80 of whom experienced tumour recurrence, either locally in the lung (41%) or at a distant site including brain (21%), bone (21%), and liver (10%). A majority of patients with recurrent disease (63/80) received additional treatment by chemotherapy (30%), radiotherapy (52%), radiochemotherapy (3%), surgery (10%) or other (5%). Also follow-up data on adjuvant therapy after surgical removal of the primary tumour was available only for a subset of patients (n=162), of which 29% did receive adjuvant chemotherapy. In stage I patients (n=105) this figure was 20%, stage II (n=27) 48%, and stage III-IV (n=30) 43%.

DNA and RNA extraction DNA and RNA was extracted (Paper I-V), using column-based commercial extraction kits, from 5-10 frozen tissue sections (10µm). The DNA and RNA concentration and purity was measured with NanoDrop. In addition, the RNA integrity was assessed using the Agilent 2100 Bioanalyzer to ensure that extensive RNA degradation had not occurred. The Bioanalyzer applies a microfluidics system for electrophoretic separation of molecules. A quality score (RIN) ranging from ten to zero is automatically calculated based on features of the resulting electropherogram, such as the shape of and ratio between the 28S and 18S rRNA peaks, baseline configuration, and presence of aberrant peaks [153].

Gene expression array analysis A microarray consists of a large number of probes that are spotted or synthe-sised onto a solid surface. The basic principle includes labelling a sample with a fluorescent dye and allowing it to hybridise to a matching probe on the array. The hybridisation intensity, which is relative to the amount of target transcript, is then measured by the fluorescence signal from each spot on the array. In this thesis, data from array-based global gene expression analysis of NSCLC, using Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays, were included in Paper I-IV. An Affymetrix probe is a 25-mer oligonucleotide that has been synthesised directly onto a solid surface. A set of probes that bind to different positions along the target sequence is incor-porated to measure each transcript (probe set) [155]. The HG-U133 Plus 2.0 array includes more than 54,000 probe sets, corresponding to over 47,000 transcripts (38500 well-characterised genes) [156].

Page 32: Molecular Characterisation and Prognostic Biomarker

32

In the standard Affymetrix protocol, applied in Paper I-IV, the mRNA sample is first reversibly transcribed to cDNA and the double-stranded cDNA then enters as a template into the next reaction to generate amplified biotin-labelled cRNA. The cRNA is fragmented and hybridised to the array. Following hybridisation, the chip is rinsed and a scanned image is acquired from which individual features (probe locations on the array) are extracted and the signal intensity is calculated for each feature. Before data analysis, the raw data fluorescence signals are log-transformed and preprocessed, including background correction (to remove signal from unspecific binding and incomplete washing), normalisation (to make the different samples in one experiment comparable to each other), and summarisation (to combine measurements from the multiple probes of a probe set into a compound signal) [155]. Different filtering methods can be applied to exclude probe sets that have a signal intensity close to the background noise level (filtering by signal intensity) or that do not change within an experiment (filtering by variation between samples) [155]. In Paper I-IV, the raw data was normal-ised using the robust multi-array average (RMA) method [157]. In Paper I-II, only transcripts with an average signal intensity log2 value > 5 were includ-ed in the analysis. In Paper III-IV, no filtering procedure was implemented.

Single nucleotide polymorphism array analysis In Paper I, Affymetrix GeneChip Mapping 250K NspI arrays were applied to assess genome-wide gene copy number alterations. Analogous to the gene expression array described above, the single nucleotide polymorphism (SNP) array consists of 25-base pair oligonucleotide probes, attached to a solid surface, representing a selection of SNP alleles that are present throughout the genome [158]. The Affymetrix GeneChip 250K NspI array covers 262000 SNPs, with an average distance of approximately 6 kb between neighbouring SNPs. Between 25 and 40 probes are included for each SNP and one array chip comprises millions of individual features [159]. Using standard Affymetrix protocols, genomic DNA is cut into different-length fragments by enzymatic restriction and adapters are ligated onto the frag-ments, followed by PCR amplification with primers complementary to the adapter sequence. The amplified DNA is then yet again fragmented and la-belled with a fluorochrome. The fluorescently labelled fragments are hybridised to the chip and DNA that encompasses a specific SNP allele will bind to the matching probe (Figure 2). The chip is rinsed to dispose unbound DNA and scanned to detect the fluorescence intensity from the bound DNA fragments [158].

Prior to genotype calling, the raw data fluorescence signal must be nor-malised, either to a matched normal sample or a suitable reference data set. In Paper I, a reference set of 82 normal tissue samples that originated from a Caucasian population was used as a reference for normalisation. The

Page 33: Molecular Characterisation and Prognostic Biomarker

33

GeneChip Genotyping Analysis Software (GTYPE) 4.1 from Affymetrix was used for probe level normalisation to produce log2 ratios (Copy Number Analysis Tool, CNAT, 4.0.1) and for single sample quality control and geno-type calling (Dynamic Model algorithm) [160]. Copy number gains and losses were defined using the BioDiscovery Nexus Copy Number 3.0 Rank Segmentation algorithm. In this context, segmentation means to delineate groups of neighbouring probes that share the same DNA copy number. The built-in Rank Segmentation is a variant of the Circular Binary Segmentation algorithm, which segments data by detecting significant change-points [161-162]. The significance threshold for segmentation was set to p<10-6 for at least 30 neighbouring probes in a row and a log2 ratio threshold of ±0.15 was applied to define genomic loss and gain. Regions with recurrent focal copy number alterations (>35%; p<0.05) were identified using the Nexus Significance Testing for Aberrant Copy Number (STAC) method [163].

Figure 2. SNP array analysis using Affymetrix GeneChip Mapping 250K NspI arrays: (A) After enzymatic restriction, adapter ligation and amplification, fluores-cently labelled DNA is hybridised to a chip that contains probes representative of different SNP alleles; modified from [158-159]. (B) Visualisation of DNA copy number data (chromosome 8) from matched normal lung (left) and tumour (right) samples that portrays genomic loss (red) and gain (green). Each dot represents a single SNP.

NspIA.

B.

genomic DNA

adapter ligation

amplificationfragmentation and labelling

Page 34: Molecular Characterisation and Prognostic Biomarker

34

Quantitative real-time polymerase chain reaction

Polymerase chain reaction (PCR) exponentially amplifies template DNA to generate a large number of copies of a sequence that has been specified by the choice of oligonucleotide primers flanking the region of interest. PCR is based on repetitive temperature cycling for denaturation (generation of single-stranded DNA), annealing (primer binding to DNA to form a starting point for the polymerase), and primer extension (incorporation of comple-mentary deoxynucleotide triphosphates to build the new DNA strand). In conventional PCR, the end product is visualised after the completion of the PCR reaction, e.g. on an agarose gel stained with a DNA intercalating dye. In real-time PCR, the accumulation of reaction products is monitored throughout the reaction by fluorescence detection, either using specific fluorescently labelled probes or non-specific labelling of the amplified target by double-stranded DNA-binding dyes [164]. As the number of copies is doubled in every PCR cycle, a quantitative relationship exists between the amount of input DNA and the amount of PCR product in any given cycle. Real-time reverse transcription quantitative PCR is a much used technique for analysis of gene expression levels. Complementary DNA (cDNA) is then synthesised from mRNA, using reverse transcription, and entered into the PCR reaction. The quantification method can either be absolute, linking the signal intensity to a standard curve of known concentrations, or relative, where the transcript level is calculated in relation to the expression of one or multiple housekeeping genes [165-166].

Quantitative real-time PCR was used in Paper I to (i) validate DNA copy number alterations detected by the SNP array analysis, by comparing amplified/deleted chromosomal regions with internal control regions (normal copy number) in matched tumour and normal samples, and (ii) validate results from the gene expression array by analysing expression levels of selected genes relative to that of glyceraldehyde 3-phosphate dehydrogenase (GAPDH) according to the ∆∆Ct method.

Laser capture microdissection Laser capture microdissection techniques allow the high-precision procure-ment of a specified cell population from a heterogeneous tissue sample that consists of multiple cell types. Under the microscope, a focused laser beam is used to cut and retrieve the chosen tissue area, from which for instance DNA or RNA can be extracted. The PALM system for laser microdissection and pressure catapulting was used in Paper II to acquire homogeneous tumour cell populations from frozen tissue sections for RNA extraction and gene expression array analysis.

Page 35: Molecular Characterisation and Prognostic Biomarker

35

Immunohistochemistry The capacity of antibodies to specifically bind to their epitopes is widely exploited in both research and diagnostic pathology to provide information on protein expression and tissue localisation. Various detection systems, either enzyme-based or fluorescence-based, are used to visualise antibody-binding. An antibody can either be directly conjugated to a fluorochrome or an enzyme (direct methods) or a secondary antibody, that binds to the constant part of a primary antibody, can be fluorescently or enzymatically labelled (indirect methods).

Immunohistochemistry (IHC) was first introduced by Albert H. Coons in 1942, at that time using direct fluorescence-labelling to detect antibody staining in tissue sections [167]. In 1966, Nakane et al. showed that enzyme-labelled antibodies could be used to localise tissue antigens [168]. One popu-lar enzymatic system for detection is based on the activity of horseradish peroxidase (HRP) and a chromogen, 3,3-diaminobenzidine (DAB) to upon addition of substrate (hydrogen peroxide) produce a brown-coloured reaction end product that can be counterstained with haematoxylin for better visuali-sation of tissue morphology (Figure 3) [169]. Improvements to enhance the staining intensity included the introduction of more complex detection systems, such as peroxidase-anti-peroxidase (PAP) [170] and the avidin-biotin complex (ABC) [171].

Tissue microarrays A tissue microarray (TMA) is a collection of cylindrical tissue cores that have been removed from individual donor paraffin blocks and inserted into one single paraffin block at high density (Figure 3). The original idea of a multi-tumour tissue block, by Battifora in 1986 [172], was further advanced by Kononen and co-workers through the introducing of a new method to combine numerous tissue specimens into an array format [173]. TMAs provide valuable tools for high throughput visualisation in hundreds of sam-ples simultaneously. A wide range of techniques are applicable on a TMA format, including immunohistochemistry and immunofluorescence for eval-uation of protein expression [174], fluorescence-, silver- or chromogenic in situ hybridisation (FISH/SISH/CISH) for picturing chromosomal anomalies [175-177], and mRNA-based methods for assessment of transcript levels in a tissue context [178]. Antibody-based methods are suitable for candidate biomarker discovery and validation and may aid in clinical implementation of research findings as these methods are relatively inexpensive and routine-ly used in pathology laboratories world-wide. Tissue microarrays that include tumour specimens linked to clinical endpoints enable the evaluation of candidate biomarkers for diagnostic, prognostic, or predictive value.

Page 36: Molecular Characterisation and Prognostic Biomarker

36

Figure 3. (A) IHC using an indirect visualisation strategy. A primary antibody binds to its tissue antigen and an enzymatically labelled secondary antibody, which binds to the primary antibody, is then added. When a substrate for the enzymatic reaction and a chromogen is added, a coloured end product is generated. (B) Tissue micro-array that displays tumour samples (duplicates) with different staining intensity.

In Paper II-IV, automated immunohistochemistry (IHC) was performed on tissue microarrays. The protein expression was manually annotated with regard to staining intensity and fraction of stained cells. In Paper II and IV, immunofluorescence double-staining was applied to envision co-localisation of selected proteins and cell type-specific markers. By using two primary antibodies that were raised in different species and secondary antibodies that were labelled with fluorescent molecules emitting at different wavelengths, the potential co-localisation was evaluated.

A.

B.antigen

primary antibody

secondary antibody

substrate + chromogen visible product

enzyme

Page 37: Molecular Characterisation and Prognostic Biomarker

37

Genomic sequencing Genomic sequencing methods are used to determine the order of nucleotide bases in the DNA. With the availability of a sequence blueprint of the human genome [179], deviations from the reference DNA sequence can be identi-fied, such as single-base nucleotide substitutions as well as deletions or insertions of one or several nucleotides. Various sequencing methods are today used in research and clinical diagnostics to identify and categorise known and novel cancer-associated mutations.

Sanger sequencing Sanger (dideoxy) sequencing, developed in 1977 by Frederick Sanger [180], is still the gold standard in DNA sequencing. Based on the incorporation of fluorescently labelled dideoxy nucleotide triphosphates (ddNTPs) [181] that cause chain-termination of synthesised DNA molecules, DNA fragments of different lengths are produced that can be separated according to size by automated capillary electrophoresis. As part of the basic characterisation of the study cohort, and summarised in Paper V, Sanger sequencing was used to detect mutations in TP53 (exon 4-9) and EGFR (exon 18-21).

Pyrosequencing In pyrosequencing, real-time DNA synthesis is monitored through detection of pyrophosphate release upon dNTP incorporation [182-183]. Single-stranded DNA is immobilised and a complementary strand is generated through successive addition of nucleotide bases. When the added nucleotide is complementary to the first unpaired base in the single-stranded template, the incorporation triggers the release of pyrophosphate (PPi). Through an enzymatic reaction, PPi is converted to adenosine triphosphate (ATP) by ATP sulfurylase, which in turn triggers the luciferase-mediated transfor-mation of luciferin to oxyluciferin. In the process, a light signal is generated that is proportional to the number of incorporated nucleotides. ATP and nu-cleotides that have not been incorporated are degraded by apyrase and the process starts all over again with the addition of another nucleotide [184]. As part of the basic characterisation of the study cohort, and summarised in Paper V, pyrosequencing was used to detect mutations in KRAS in hotspot codons 12, 13 and 61.

Next-generation sequencing on the SOLiD platform. Sanger sequencing throughput is limited by that fact that the different-length DNA fragments must be sorted by capillary electrophoresis. Next-generation technologies increase the sequencing throughput by working in a massively parallel fashion. This is achieved by attaching individual DNA molecules to a solid surface and determining the sequence of each single molecule in situ [185]. Various next-generation platforms are available today, including the

Page 38: Molecular Characterisation and Prognostic Biomarker

38

Illumina Genome Analyzer, which applies bridge amplification and a polymerase-based sequencing-by-synthesis approach with reversible termi-nators, and the SOLiD system (Sequencing by Oligonucleotide Ligation and Detection), which combines DNA amplification by emulsion PCR with liga-tion-based sequencing chemistry [186]. Third generation single-molecule sequencers that abolish the need for a PCR amplification step are under de-velopment [187].

In Paper V, high-coverage SOLiD sequencing was applied to interrogate the presence of low-frequency mutations at a sensitivity of detection beyond that of conventional Sanger sequencing. In this context, coverage (or sequencing depth) describes the average number of sequencing reads for any given nucleotide. As each input DNA molecule acts as a separate template in the sequencing reaction, heterogeneous mixtures of genomes can be analysed simultaneously. Given a sufficient sequencing depth, also minor variants present can be assessed, providing a window into intratumor muta-tional heterogeneity.

For target selection in Paper V, several PCR primer pairs were designed to amplify the TP53 gene. A fragment library was then prepared according to standard protocols for SOLiD sequencing (SOLiD 4 System Library Prepa-ration Guide, March 2010 version). In brief, the DNA was fragmented by sonication, end-repaired and adapters were ligated to the ends of each frag-ment. Each sample was also assigned a unique barcode as part of one of the adapter sequences, which enabled all samples to be pooled. The libraries were amplified and the amplified molecules were attached to beads via one of the adapters. In the following emulsion PCR, individual template mole-cules attached to beads were amplified in separate droplets of water sus-pended in oil; each droplet containing one bead and all the necessary rea-gents for the PCR reaction. Beads with successfully extended PCR products were enriched and deposited onto a slide for sequencing.

A primer complementary to one adapter sequence, fluorescently labelled di-base oligonucleotide probes and the enzyme ligase were added. Each di-base probe consists of eight nucleotide bases. The first two bases at the 3'-end are complementary to the template sequence, while bases 3 through 8 are degenerate, which means that they are able to bind to any template nu-cleotide. After ligation and detection of the fluorescence signal, bases 6 through 8 are cleaved off together with the fluorescent dye attached at the 5’-end and the next cycle of ligation is initiated. The ligation/cleavage cycle is repeated several times, identifying the template sequence at five nucleotide intervals. The next sequencing round is commenced by the removal of the newly synthesised strand and hybridisation of a second sequencing primer followed by another round of ligation. This second primer is set back one base to enable reading some of the nucleotides that were not surveyed in the first round. The one base pair set-back of the sequencing primer also means that each base will be interrogated twice, providing an additional layer of

Page 39: Molecular Characterisation and Prognostic Biomarker

39

quality control [188]. In Paper V, the 50-bp fragment run included ten cycles of ligation, repeated in a total of five primer reset cycles.

Next-generation sequencers generate a large number of short sequence reads that must be aligned back to a reference sequence before data analysis. In Paper V, to generate a list of single base-pair substitutions with corresponding allele frequencies for each sample, the SOLiD sequencing reads were aligned to the human reference sequence (build GRCh37.p5) in colour space using the SOLiD System Analysis Pipeline Tool (Corona Lite) [189]. To detect small indels, either Burrows-Wheeler Alignment Tool [190] or NovoalignCS aligner was used to generate gapped alignments. Variants were called using the SAMtools package [191] and indels sustained by ≥ 100 reads were reported. SNPs and indels not present in dbSNP (v131) were classified as novel variants.

Statistics Statistical analyses (Paper I-V) were performed using Statistical Package for the Social Sciences Version 19.0 (IBM SPSS, Chicago, IL, USA) and the statistical computing language R (http://www.r-project.org). Chi-square and Fisher’s exact tests were used to assess the statistical relationship between clinical parameters and long-/short-term survival (Paper I) or protein expres-sion levels (Paper II), as well as to assess significant associations between copy number gain/loss and long-/short-term survival (Paper I). A two-sided Student’s t-test was used to compare gene expression levels between groups (Paper I). An empirical Bayes moderated t-test was used to identify differen-tially expressed genes between two groups of paired samples (Paper II). The Mann-Whitney U test was used to assess the relation between TP53 mutation status and transcriptional activity (Paper V). Uni- and multivariate logistic regression, with estimated Odds Ratios (OR) and 95% confidence interval (CI), was used to model associations between long-/short-term survival and clinical characteristics (Paper I) or protein expression levels (Paper II). Uni- and multivariate Cox regression models, with estimated Hazard Ratios (HR) and 95% CI, were applied to assess the importance of gene expression levels and clinical parameters in relation to patient survival time (Paper III-IV). Overall survival was calculated from the date of diagnosis until the date of death. Recurrence-free survival was calculated from the date of diagnosis until the date of last reported follow-up. Survival rates were calculated and visualised according to the Kaplan-Meier method and the Mantel-Cox log-rank test was used to compare survival distributions between groups (Paper II-IV). Meta-analysis (fixed effect and random effects models) was used to assess the statistical significance associated with outcome-related genes across multiple patient cohorts (Paper III-IV). The Spearman correlation coefficient was calculated to describe the correlation between gene/protein and protein/protein expression (Paper IV).

Page 40: Molecular Characterisation and Prognostic Biomarker

40

Results and Discussion

Paper I

Gene copy number aberrations are associated with survival in histologic subgroups of non-small cell lung cancer

In Paper I, we characterised DNA copy number alterations on a genome-wide scale, using Affymetrix GeneChip Mapping 250K NspI SNP arrays, and described associations between copy number gains and losses and survival outcome in defined histological NSCLC subtypes. The frequencies of detected amplifications and deletions were compared between two patient groups with short and long survival time, respectively, in order to identify changes associated with survival outcome. In addition, we analysed global gene expression levels in adenocarcinomas and squamous cell carcinomas, applying the Affymetrix Gene Chip Human Genome U133 Plus 2.0 arrays, as the integration of molecular data from different analytical platforms may deliver additional hints towards recognising genes that influence tumour behaviour.

Initially, to look at alterations in NSCLC as one disease entity, we evalu-ated data from all histological subtypes in a combined analysis. As observed in previous studies, numerous frequent large-scale alterations were detected (exemplified in Figure 4). Recurrent genomic aberrations comprised gain on chromosome 1q, 3q, 5p, 8q, and loss on 3p, 8p, 9p, 13q. When frequencies of copy number gains and losses were compared between the long-term and short-term survival group, no chromosomal region was associated with ei-ther a survival benefit or dismal prognosis.

As it is widely accepted that NSCLC histological subtypes differ with regard to molecular aspects, we subsequently compared differences in the frequency distribution of genomic anomalies between the two major histo-logic subtypes: adenocarcinoma and squamous cell carcinoma. As the large cell carcinomas comprised only a small number of cases, no separate analy-sis was performed for this histological subtype. Generally, large scale (>5Mb) gains and losses were detected more frequently in squamous cell carcinomas. Copy number gain on 2p, 3q, 8p, and 20p was more common in squamous cell carcinomas, as was copy number loss on 3p, 4p, 4q, 5q, 9p, 10q, 13q, and 17p, while gain on 1q was more frequently observed in adeno-carcinomas.

We thereafter analysed the histological subgroups separately to assess if any chromosomal region was more commonly amplified or deleted in the long- or short-term survival group, respectively. In the squamous cell carcinomas, recurrent gains on chromosome 14q23.1-q24.3 were associated with shorter survival time, while losses in another region on the same chro-

Page 41: Molecular Characterisation and Prognostic Biomarker

41

mosome, 14q31.1-q32.33, correlated with favourable outcome. Recurrent genomic alterations on the long arm of chromosome 14 have been described previously in NSCLC [143, 192], but the survival associations in our cohort represent new findings. Also, the reported recurrent amplification of fibro-blast growth factor receptor (FGFR1) at 8p12 [145] was confirmed in our study, detected in 35% of squamous cell carcinomas, but no association with survival was observed. Nonetheless, amplified FGFR1 may represent a much needed target for treatment in a subset of squamous cell lung carcinomas.

Figure 4. Illustration of DNA copy number changes identified in a squamous cell carcinoma specimen from a NSCLC patient; red/green bars represent regions of copy number loss/gain and two bars represents regions with high copy number gain (defined as ≥ 3 copies).

In adenocarcinoma, 8q21-q24.3 gain was more frequently detected among long-term survivors. The association between 8q24 gain and longer disease-free survival was previously described in adenocarcinoma by Shibata et al. [140] and now confirmed in our study. In contrast to our results, the focal amplification of the transcription factor and proto-oncogene c-MYC (8q24.2) was recently associated with shorter relapse-free survival [144]. A combined analysis of multiple data sets with linked survival information might further clarify the importance of chromosome 8q amplification events in NSCLC.

To verify that the identified genomic alterations represented true events, quantitative real-time PCR was applied to validate gains and losses in ‘sur-vival regions’ for one sample representing each histology group where both tumour and normal tissue samples were available, allowing a direct compari-son between copy number levels. DNA copy numbers were normalised against copy number neutral internal control regions and the ratio relative to the matched normal sample was calculated, confirming the presence of 8q gain and 14q loss.

Page 42: Molecular Characterisation and Prognostic Biomarker

42

In the end, in order to investigate if the identified copy number alterations were manifested as corresponding changes in mRNA transcript levels, global gene expression levels were analysed, as this may aid to further pinpoint genes that contribute to tumourigenesis. In the squamous cell carcinoma gain region (14q23.1-q24.3), 40% of genes in amplified cases showed significant-ly higher expression levels compared to non-amplified (45/112). Looking at the loss region (14q31.1-q32.33), concordant genomic loss and lower expression was observed in 32% (26/82). Comparing amplified with non-amplified cases in adenocarcinoma, 43% of genes (55/128) within the gain region on chromosome 8 on (8q21-q24.3) showed significantly higher ex-pression levels.

In conclusion, we described genomic gain and loss regions, with corre-sponding increase and decrease in mRNA transcript levels, associated with overall survival in adenocarcinoma and squamous cell carcinoma subgroups, suggesting that gene copy number analysis may deliver prognostic infor-mation of clinical value. Further investigations are warranted to identify and validate target genes in these regions.

Paper II

CD99 is a novel prognostic stromal marker in non-small cell lung cancer

In Paper II, we characterised the protein expression pattern of selected stro-mal proteins in NSCLC using immunohistochemistry on tissue microarrays. Protein expression was analysed in conjunction with clinical parameters, including overall survival, to identify stromal proteins with a potential relevance for NSCLC outcome.

Genes expressed in the NSCLC stroma were recognised by comparing global gene expression profiles of microdissected pure tumour cell popula-tions with the whole-tissue profiles from corresponding samples. Further-more, candidate genes were selected based on published gene expression data sets describing the tumour stroma in various tumour types. For final selection of candidates, we applied a strategy to combine gene expression array data with a subsequent screening of protein expression using the Human Protein Atlas (HPA) public web-portal (www.proteinatlas.org) [193]. Twelve proteins that presented a distinct and heterogeneous stromal staining pattern in the samples included in the HPA were selected for further evaluation on a NSCLC tissue microarray. The staining intensity was scored manually on a 4-graded scale, including the expression in all cell types of the stroma.

In the univariate analysis, a higher level of stromal CD99 expression was associated with long-term survival (OR 2.0, 1.04-3.82, p=0.037). Multivari-ate analysis sustained that the prognostic impact was independent of age,

Page 43: Molecular Characterisation and Prognostic Biomarker

43

tumour stage, smoking history and performance status (OR 2.05, CI 1.04-4.05, p=0.039). The survival association was subsequently confirmed (p=0.008) in an independent consecutive NSCLC cohort (n=240). To further explore which cell type expressed CD99 in the NSCLC stroma, tissue sec-tions were double-stained with CD99 and the cell type specific markers ACTA2 (α-smooth muscle actin) for cancer-associated fibroblasts and PTPRC (CD45) for lymphocytes. Visualised with immunofluorescence and confocal microscopy, CD99 co-localised with both ACTA2 and PTPRC, indicating that cancer-associated fibroblasts as well as lymphocytes do express CD99.

For the remaining eleven investigated proteins, no significant association with survival was observed. However, a high expression level of PDZ and LIM domain protein 5 (PDLIM5), secreted protein acidic and rich in cystein (SPARC) and transgelin (TAGLN) was associated with high tumour cell expression of proliferation marker Ki67. High expression levels of elastin microfibril interface located protein 1 (EMILIN1) and fibrillin 1 (FBN1) were on the contrary associated with a low tumour proliferation rate. A high stromal expression level of SPARC, versican (VCAN), periostin (POSTN), biglycan (BGN) and tenascin-C (TNC), as well as reduced expression of decorin (DCN), has been reported to be associated with poor prognosis in NSCLC and other cancer types [194-199]. These previous observations were not confirmed in our cohort, while a trend towards longer survival was indeed observed with higher expression of VCAN (OR 1.7, CI 0.94–2.96, p=0.082).

CD99 is reported to be down-regulated in for instance osteosarcoma and gastric adenocarcinoma, and has been suggested to function as a tumour suppressor [200-201]. In osteosarcoma cell culture experiments, the forced expression of CD99 was shown to inhibit anchorage-independent growth and cell migration. The tumour suppressing ability of CD99 in osteosarcoma is believed to be transmitted through the regulation of the plasma membrane protein caveolin-1 and inhibition of c-Src kinase [200]. In a recent publica-tion, CD99 was shown to inhibit cell adhesion to extracellular matrix proteins fibronectin, laminin and collagen IV by suppression of integrin β1 [202].

In conclusion, the adhesion molecule CD99 was recognised as a stromal biomarker associated with patient outcome in two independent NSCLC cohorts. High CD99 expression, mediated by cancer-associated fibroblasts and/or lymphocytes in the tumour stroma, may be part of what characterises a microenvironment that inhibits tumour progression, while the mechanism underlying the function of CD99 in NSCLC stroma needs to be further elu-cidated.

Page 44: Molecular Characterisation and Prognostic Biomarker

44

Paper III

Biomarker discovery in non-small cell lung cancer: integrating gene expression profiling, meta-analysis and tissue microarray validation In Paper III, we presented a strategy for prognostic biomarker discovery based on global gene expression array profiling and a meta-analysis approach that combines data from numerous independent public data sets. Moreover, we wanted to assess if the prognostic impact of single mRNA transcript levels were manifested as differences in protein expression that can provide confirmation on a clinically applicable level by the use of immunohistochemistry.

In clinical practice, the prognostic significance of most published NSCLC gene signatures is uncertain, as they rarely outperform conventional para-meters and have not yet been incorporated into the diagnostic routine. Furthermore, global mRNA-based assays are ill-suited to apply on the FFPE tissue samples routinely used in clinical diagnostics, and the biological importance of individual genes included in multigene classifiers is difficult to assess. With this background, instead of trying to generate a prognostic gene signature, we focused on reliable identification of single genes as prog-nostic biomarkers by employing an approach where candidate biomarkers with prognostic relevance in a primary cohort were further evaluated in a meta-analysis of publicly accessible data sets and validated on the protein level using immunohistochemistry in two independent cohorts.

Initially, to identify genes with prognostic relevance in the Uppsala co-hort, a Cox proportional hazards model was applied. After multiple testing adjustment (false discovery rate (FDR) <0.05), no gene displayed a signifi-cant prognostic impact when all probe sets on the array (n=54,675) were included in the analysis. In the adenocarcinoma subgroup, two transcripts (SSR4 and FAM46C) were associated with survival when strictly adjusted p-values were imposed.

While ensuring that false positive findings are kept at a minimum level, strict adjustment for multiple testing may on the other hand exclude relevant genes from further analysis (false negatives), which may be a drawback in exploratory investigations. One alternative is to allow a higher number of false positives in a primary analysis and then require validation in one or multiple independent patient cohorts. When the Uppsala cohort was used in a primary analysis, 450 probe sets with an unadjusted significant level p<0.01 were identified. Of the 450 prognostic probe sets, 62 remained sig-nificant in the meta-analysis (unadjusted p<0.01) and 17 thereof persisted also after rigorous adjustment for multiple testing with a FDR<0.01. These probe sets corresponded to 14 genes and we selected one of these genes, cell adhesion molecule 1 (CADM1), for further analysis on the protein level

Page 45: Molecular Characterisation and Prognostic Biomarker

45

using immunohistochemistry on tissue microarrays constructed from two independent NSCLC cohorts (Uppsala cohort and Örebro cohort).

The CADM1 antibody staining ranged from distinct membranous positivity in variable tumour cell fractions to completely negative. In the Uppsala TMA cohort (n=355), a low protein expression level was signifi-cantly associated with poor prognosis (HR=0.70, p=0.019). Also in the multivariate analysis, including established clinical factors, a clear associa-tion with overall survival was also observed (HR=0.69, p=0.016) and the influence of CADM1 loss of expression was stronger when the analysis was limited to the adenocarcinoma subgroup (HR=0.52, p<0.01). Furthermore, we showed that the prognostic relevance was retained when only adeno-carcinoma patients with stage I disease were included in the analysis (HR=0.53, p=0.013). The significant association between CADM1 protein expression and overall survival was validated by analysis of an independent patient cohort (Örebro cohort) that consisted of 262 NSCLC patient samples (NSCLC: HR=0.55, p=0.011; adenocarcinoma: HR=0.30; p<0.01).

We believe that these findings support the validity of our biomarker discovery strategy and the clinical and biological relevance of CADM1 as a tumour suppressor in NSCLC. Aberrant expression due to inactivation of CADM1 by promoter hyper-methylation has been described previously in NSCLC and in other cancer types including breast and cervical carcinoma [203-205] and it was recently shown that CADM1 null mice spontaneously develop tumours at an early age [206]. The down-regulation of CADM1 expression in cervical carcinoma during progression from cervical intraepi-thelial neoplasia to stage I/II disease suggests a role of CADM1 loss in early invasiveness [205].

Paper IV

The prognostic relevance of tumour-infiltrating plasma cells and immunoglobulin kappa C expression indicates an important role of the humoral immune response in non-small cell lung cancer

A gene expression profile reflecting the involvement of the humoral immune system has been shown to display prognostic power in node-negative breast cancer [207]. Recently, the prognostic impact of immunoglobulin kappa constant (IGKC) gene expression was described in various cancer types, including the Uppsala NSCLC cohort [208]. In Paper IV, IGKC protein expression in the tumour stroma of NSCLC was analysed by immunohisto-chemistry on a tissue microarray and analysed in association with clinical parameters. In addition, the co-localisation of IGKC and cell type-specific markers in NSCLC tissue was assessed using immunofluorescence. Further,

Page 46: Molecular Characterisation and Prognostic Biomarker

46

the immunohistochemical expression of B cell (CD20/MS4A1) and plasma cell (CD138/SDC1) markers was evaluated.

In the Uppsala cohort, IGKC gene expression, dichotomised by median expression, was associated with overall survival in a multivariate analysis, including clinically relevant factors: stage, performance status and age (HR=0.63, p=0.007). In addition to the probe set 214669_x_at included in previous analyses [208], two additional IGKC probe sets are included on the Affymetrix HG-U133 Plus 2.0 chip. While IGKC expression based on the probe set 215217_at showed a tendency towards better prognosis, the third probe set 237625_s_at revealed a clear association between higher gene expression and longer survival in univariate and multivariate analyses. The prognostic impact of all three IGKC probe sets was more pronounced when only the adenocarcinoma subgroup was evaluated, with median survival rates of 86/58/76 month for patients with high IGKC expression (> median) and 37/35/33 months for patients with low expression (≤median), for the three probe sets (214669_x_at; 215217_at; 237625_s_at). In an extended meta-analysis, now including 1260 NSCLC patients from six publicly avail-able data sets, the prognostic impact of IGKC gene expression with regard to survival was further substantiated (p=0.021).

In Paper IV, IGKC cell type-specific expression in NSCLC was visual-ised using immunofluorescence double-staining with antibodies directed towards either CD20 (B lymphocyte marker expressed on mature B cells but not on plasma cells) and IRF4/MUM1 (multiple myeloma oncogene-1; a marker for activated B cells, plasma blasts and plasma cells). Most of the cells that expressed IGKC also stained positive for nuclear IRF4/MUM1, indicating its expression in mature plasma cells. No co-localisation was ob-served between IGKC and CD20.

We then analysed IGKC protein expression on a TMA that included 355 non-small cell lung cancers of various histological types. A high expression level was associated with longer survival (HR=0.72, p=0.03), with particular impact in the adenocarcinoma subgroup (HR=0.57, p=0.01). The median survival benefit over all histological groups was 37 months between the high and low expression groups, while for adenocarcinoma patients a median survival difference of 53 month was observed. The prognostic relevance was retained in the multivariate analysis of the whole cohort (HR=0.72, p=0.03) as well as in the adenocarcinoma subgroup (HR=0.57, p=0.01).

In addition to IGKC, we analysed the protein expression of CD138 and CD20. CD138 is commonly used for detection of plasma cells in blood, but the in situ application of CD138 antibody is problematic as the antibody frequently stains also cells of epithelial origin. When including all histologi-cal subgroups in a combined analysis, the level of stromal CD138 expression was significantly associated with overall survival in the multivariate analysis (HR=0.74, p=0.04). Also in the adenocarcinoma subgroup, higher CD138 levels were clearly associated with longer overall survival, with 2-year

Page 47: Molecular Characterisation and Prognostic Biomarker

47

survival rates of 86% and 70% and median survival time of 80 and 45 months, for the high and low expression group, respectively. IGKC protein levels correlated strongly with CD138 (r=0.69, p<0.0001), while a weaker correlation was seen with CD20 (r=0.29).

In conclusion, we showed that the presence of tumour-infiltrating plasma cells that express IGKC was associated with favourable outcome, with par-ticular influence in the adenocarcinoma subgroup. This finding supports the involvement of the humoral immune response in anti-tumour activity.

Paper V

Data-driven unbiased curation of the TP53 tumour suppressor gene mutation database and validation by ultradeep sequencing of human tumours Mutated TP53 is the most frequently detected event in human cancers [65]. Data compiled from a large number of scientific publications that describe TP53 mutation in various tumour types have been included and categorised in Locus Specific Databases. A number of recent publications, included in the TP53 mutation database, described an unusual mutation distribution, including for instance multiple TP53 mutations per tumour in a large fraction of analysed samples. The accuracy of these reports has been debated and it has been suggested that this specific pattern of mutations could originate from sequencing artefacts due to the use of DNA from FFPE tissue [209]. On the other hand, it is possible that several tumors could be heterogeneous and consist of multiple subclones with different mutations. In Paper V, we combined two different types of analysis, next-generation deep sequencing and statistical analysis of the TP53 database, to define novel parameters to improve the accuracy of the database.

Deep sequencing of the TP53 gene: As the number of TP53 mutations per tumour was noticeably dissimilar among the database publications, we wanted to address if the occurrence of multiple mutation events in a tumour sample might be due to subclonal mutations that remain undetected using conventional sequencing techniques. In next-generation sequencing method-ologies, single DNA molecules functions as templates in the sequencing reaction. This means that diverse genomic variants present in a sample can be concurrently assessed to identify mutations beyond the limit of detection of Sanger sequencing, provided an adequate sequencing depth. This strategy was previously applied to examine for instance the intraclonal diversity at the Ig heavy chain locus in B-cell chronic lymphocytic leukemia, detecting frequencies as low as 1 in 5000 copies [210] and EGFR mutations in lung cancer biopsies below 1% mutated allele frequency at >65000-fold coverage [211].

Page 48: Molecular Characterisation and Prognostic Biomarker

48

One hundred NSCLC samples were first analysed for mutations in TP53 using Sanger sequencing. As anticipated, a high mutation frequency (67%) was observed, as was a large number of G-to-T transversions; a pattern recognised to reflect smoking-induced DNA damage. In addition, mutation frequencies for KRAS and EGFR (25% and 13%) in the NSCLC cohort were found to be in accordance with previously reported findings, as was the frequency and distribution of TP53 mutations, suggesting that we analysed a representative patient cohort. Twenty NSCLC specimens with known TP53 mutation status (five wild type and 15 mutated) were then selected for high-coverage sequencing on the SOLiD platform. In addition, fresh frozen tu-mour samples from 20 colon and 20 breast cancer patients were included. As a control, a subset of lung cancers was also analysed for the presence of mu-tations in TP73, a gene known to be exempt of mutation in cancers.

The TP53 mutation status in the NSCLC sample set, as defined by con-ventional sequencing, was confirmed in all but one tumour. Two novel non-synonymous mutations were identified in exons not previously covered by Sanger sequencing (exon 3 and 10). No mutations were detected in the TP73 gene. In breast cancer sample set, one novel mutation was detected. In the colon cancer sample set, which was previously not analysed by conventional sequencing, 17 TP53 mutations were detected in 14 tumours. In summary, in the 60 tumours analysed by ultradeep SOLiD sequencing, the number of mutation events per tumour was 1.12, which can be compared to 1.11 across the entire TP53 database. Therefore, our results indicate that in these three cancer types, with different mutation aetiology, the majority of the tumours are indeed driven by a single TP53 mutation.

Statistical analysis of the TP53 database: Based on the latest database release, we initially confirmed the previously described inverse relationship between TP53 mutation frequency and the residual transcriptional activity of mutant TP53 [212-213]. In addition to activity, to assess the quality of the publications in the TP53 database we then applied additional criteria, including the presence of synonymous mutations (WT), multiple mutations (EVT, T2, T3, T4+), mutation frequency (FREQ), and mutation recurrence (REC) (summarised in Paper V, Table S1). In brief, frequently detected TP53 mutations demonstrated a substantial loss of transcriptional activity, while infrequent mutations often sustained an activity >50%, which suggests a limited functional influence on tumourigenesis. In almost three quarters of publications, no synonymous mutations were reported, while frequencies ranging from a few percent to 50% were observed in the remaining publica-tions. In about half of the publications, only one mutation was described for all tumour samples. The remaining publications included reports of multiple mutations per tumour (range 2-14), many of these with either partial or no loss of activity.

To include the abovementioned criteria in a combined analysis, principal component analysis (PCA) was applied on the 1319 publications (27048

Page 49: Molecular Characterisation and Prognostic Biomarker

49

mutations) in the TP53 database that described six or more mutations. PCA is used to reduce data dimensionality and to describe non-overlapping variance within a data set. The first four components captured 66% of the total variance and were used to assess publication outlier behaviour. Publica-tions that deviated from the median by more than two standard deviations were defined as outliers. According to this definition, we identified 129 out-lier studies (9.7%). In the most recent release of the TP53 database an incor-porated quality score, based on the aforementioned analysis, will permit the user to select a quality-filtered content.

Page 50: Molecular Characterisation and Prognostic Biomarker

50

Concluding remarks and future perspectives The efforts of the medical research community are in the end celebrated by managing to fulfil the needs of those who suffer from disease. The translation of knowledge from test tubes and incubators to hospital beds and clinical diagnostic laboratories represents one crucial part of this process. Scientific and technological advancements continue to provide an increasing number of options for improvement of health, but also places new demands for large-scale collaborations due to the highly specialised functions that are needed to fully take advantage of its fruits. Building and maintaining necessary infrastructures for tissue biobanking to enable advanced molecular analyses is essential, as is management of high-quality cancer databases for collection and storage of data without breaking patient integrity and trust. In order to not lose momentum in the translation of novel findings into useful tools for improvement of human health, it is important to find ways to ensure that basic science and clinical reality are not two separate worlds.

To date numerous studies have undertaken the task to identify novel bio-markers; yet few have in the end found their way into widely used prognostic and predictive algorithms in a clinical setting. Multiple hurdles must be crossed prior to the implementation of a novel biomarker in routine diagnostics. In order to ensure robustness of biomarker performance, independent validation in multiple patient cohorts is needed to sustain the confidence in a novel candidate. To this end, any effort to encourage the responsible sharing of data and material resources between research groups will be advantageous. The inclusion of biomarker evaluation in randomised prospective clinical trials will most likely provide the final verdict for each candidate. In addition to clinical validation, to further understand the functional properties of candidate biomarkers and their role in tumour development represents another vital side of the coin.

The five papers included in this thesis reflect different aspects of a trans-lational research effort with the aim to characterise NSCLC at the molecular level and to identify factors associated with clinical outcome in this large patient group with a generally poor prognosis. In Paper I, genomic gain and loss regions, with corresponding increase and decrease in mRNA transcript levels, were found to be associated with overall survival in adenocarcinoma and squamous cell carcinoma subgroups, signifying that gene copy number analysis may provide prognostic information. A successful validation of survival-associated chromosomal regions in the extended frozen tissue cohort would add confidence in these findings and, integrated with global gene expression data, provide clues for further exploration of target genes. Furthermore, the evaluation of FGFR1 amplification in the Uppsala TMA cohort, using in situ hybridisation methods, will hopefully deliver a clear picture of the distribution of amplified FGFR1 as a much needed target for treatment in squamous cell lung carcinomas.

Page 51: Molecular Characterisation and Prognostic Biomarker

51

In Paper II, the adhesion molecule CD99 was recognised as a stromal biomarker associated with overall survival in two independent NSCLC cohorts. However, we were not able to provide a definite answer to whether the prognostic influence was mediated by CD99+ lymphocytes or cancer-associated fibroblasts (or both). CD99 is known to be widely expressed on cells of the hematopoietic lineage, regulating for instance the migration of leukocytes (including monocytes, neutrophils and T lymphocytes) through endothelial junctions [214-216]. Further co-localisation studies including cell-type specific immune cell markers, may clarify if the prognostic impact is related to a specific leukocyte subset. The presence of CD4+ T cells in the tumour stroma has for instance been correlated to better prognosis in NSCLC [217]. While the reason behind the survival-association of CD99 expression in NSCLC stroma remains speculative, the repeated observation in two independent cohorts clearly warrants further attention.

In Paper III, a strategy for prognostic biomarker discovery was described based on global gene expression array profiling and a meta-analysis approach that combines information from numerous independent public data sets. Moreover, the manifestation of prognostic mRNA transcript levels as differences in protein expression was assessed, providing confirmation on a clinically applicable level by the use of immunohistochemistry. As a final point, we suggested a clinical relevance of the tumour suppressor CADM1 as a candidate marker for patient stratification for adjuvant chemotherapy in early-stage NSCLC. It has been shown previously that CADM1-mediated cell adhesion functions via homophilic binding between extracellular immunoglobulin-like domains on neighbouring epithelial cells, while the cytoplasmic domain mediates attachment to intracellular actin filaments via DAL-1 (differentially expressed in adenocarcinoma of the lung)/4.1B and binds to membrane-associated guanylate kinase homolog such as MPP3 (MAGUK p55 subfamily member 3) [218-220]. A further characterisation of CADM1 interaction partners may elucidate how loss of CADM1-mediated cell adhesion is linked to invasion and metastasis in adenocarcinoma of the lung. In addition to CADM1, the 14-gene list presented in Paper III provides leads for further investigation of prognostic biomarkers in NSCLC, e.g. CDCP1 (CUB domain containing protein-1), a cell surface glycoprotein with a potential role in cancer metastasis through the regulation of resistance to anoikis, motility and degradation of extracellular matrix [221-222].

In Paper IV, we propose that the presence of tumour-infiltrating IGKC-expressing plasma cells in the local tissue environment is associated with favourable disease outcome; a finding that supports a role of the humoral immune response in NSCLC anti-tumour activity. As mentioned, a number of immune-related prognostic factors have been described in NSCLC tissue previously. The presence of CD4+ T cells did for instance correlate with longer overall survival, CD8+ T cells with longer, as well as shorter, overall and disease-free survival, and FoxP3+ regulatory T cells and tumour-

Page 52: Molecular Characterisation and Prognostic Biomarker

52

associated macrophages were consistently linked to poor prognosis [223]. A 72-gene prognostic profile, enriched for genes associated with the immune response (including immunoglobulins and various cytokines) has also been described in NSCLC [224], substantiating that this finding is not accidental, but reflects a true protective mechanism operating in tumour tissue.

In Paper V, we addressed the importance of accurate locus-specific mutation databases by combining two different types of analysis: next-generation deep sequencing and statistical analysis of the TP53 database, to define novel parameters to improve the accuracy of the database. As a result, a built-in quality score has been imposed in the new release of the TP53 database. It would be attractive to apply a similar deep sequencing approach to look into intratumoral mutation heterogeneity in genes that are targets for directed therapy to assess the presence of putative low-frequency resistant clones prior to treatment with receptor tyrosine kinase inhibitors.

During the course of the completion of this thesis, the genetics field have developed tremendously, with the emergence of next-generation sequencing technologies, and is today facing an exciting future. The complete DNA sequence of NSCLC tumour tissue and corresponding normal tissue from a 51 year old lung adenocarcinoma patient was published in 2010, identifying 540 non-synonymous mutations in protein coding genes [225], and it is apparent that next-generation methods for whole genome, exome, and tran-scriptome analysis will soon become mainstream. Recent studies already applied whole-genome and exome sequencing to comprehensively describe mutation patterns in large cohorts of lung adenocarcinoma and squamous cell carcinoma, validating previously known recurrent mutations as well as presenting several new candidates [226-227]. One question that comes to mind is if array-based methods, like those applied in Paper I-IV, are already obsolete? One thing that array-based tumour profiling and novel sequencing methods do have in common is the generation of an enormous amount of data and a continuing challenge will be to efficiently navigate this labyrinth of data points to arrive at reliable and meaningful conclusions.

In summary, this thesis encapsulates the comprehensive characterisation of a novel NSCLC tissue cohort and describes molecular markers associated with clinical outcome. A high-quality molecular data set has been generated, that may serve as a resource for our group, and others, and hopefully contribute to an enhanced understanding of mechanisms underlying non-small cell lung cancer biology and aid in the identification of much needed novel targets for prognostication and therapeutic intervention.

Page 53: Molecular Characterisation and Prognostic Biomarker

53

Acknowledgements

This work was carried out at the Department of Immunology, Genetics and Pathology (IGP), Uppsala University, and the Molecular Pathology unit at the Department of Clinical Pathology and Cytology, Uppsala University Hospital. Financial support was generously provided by the Lion’s Research Foundation and the Selander Foundation, Uppsala, Sweden, and by an open research grant from AstraZeneca, Manchester, UK.

Initially I would like to thank all lung cancer patients who donated precious tissue samples for medical research and all members of hospital staff who contributed to sample collection and the establishment of a tissue biobank. Without this invaluable resource this work would not have been possible!

I would like to express my sincerest gratitude to my supervisor, Patrick Micke, for convincing me to pursue a doctoral degree and for showing by example the value of hard work, dedication and persistence.

My co-supervisor, Johan Botling, for your enthusiastic commitment to development in the fields of tissue biobanking, molecular pathology and translational research in Uppsala and within national and international net-works.

My other co-supervisor, Fredrik Pontén, for your never-ending optimism and belief in scientific progress and for invaluable support when needed.

I would also like to say thank you to all of you who has supported me in different ways and enabled the completion of this thesis, especially:

Past and present members of the Molecular Pathology unit at the Department of Clinical Pathology and Cytology and the Translational Tumour Pathology group at IGP, including Monica Lindell, Monica Bergfors, Millaray Marincevic, Elin Nordström, Ingrid Thörn, Åsa Håkansson, Juliana Imgen-berg-Kreuz, and Johanna Mattsson. A special thank you to Simin Tah-masebpoor for always lending a helpful hand and to Magnus Sundström for always being ready to provide a sound opinion regardless the matter.

Lars Holmberg, Mats Lambe, Anders Berglund, Simon Ekman, and Michael Bergqvist for your contribution to the Uppsala lung cancer research initiative that provided the foundation for this thesis.

Page 54: Molecular Characterisation and Prognostic Biomarker

54

Anders Isaksson, Hanna Göransson-Kultima, and Maria Rydåker at the Uppsala Array Platform, SciLife Laboratory Uppsala, for nice collaborations and for providing bioinformatic expertise.

The Human Protein Atlas group at IGP, among others Anna, Sandra, Stina, Gabi, Angelika, IngMarie, Dijana, and PH, for adopting me as part of your research group and for excellent assistance with immunohistochemistry and tissue microarrays. A special thank you to Cecilia Lindskog – I learned a lot from you!

Thierry Soussi at Karolinska Institute (KI), Stockholm, for guiding me through sequence analysis and for taking me onboard the p53 project. Ola Larsson, KI, for valuable contributions to the p53 paper. Inger Jonasson and co-workers at the Uppsala Genome Center, SciLife Laboratory Uppsala, for professionally facilitating the deep sequencing project, as well as Adam Ameur and Ignas Bunikis for assistance in the processing and analysis of SOLiD data.

Jan G. Hengstler and colleagues at Leibniz Research Centre for Working Environment and Human Factors at TU Dortmund, as well as Jörg Rahnenführer and Miriam Lohr at TU Dortmund, for nice collaborations on the IGKC paper and for help with applying advanced bioinformatics to dis-sect the complexity of cancer.

Our next door neighbours, the forensics group, in particular Maria Lembring and Viktoria Ahlgren, for providing a friendly work environment. All people at the Rudbeck Laboratory for making this a nice place to work!

In the end, I want to express how grateful I am to have family and friends who support me in every possible way and encourage me to have a curious mind. My mother, Karin Edlund, for being the rock that you are for all of us. My father, Anders Edlund, who I wish would have been able to be here with us today. Theresia Brundin, my dear sister and best friend, for being the person with heart and integrity that I can always turn to when needed. And finally, my dearest Nils for making Kajo play and forget everything else!

Tack.

Page 55: Molecular Characterisation and Prognostic Biomarker

55

References

1. Jemal A, Bray F, Center MM, et al. Global cancer statistics. CA Cancer J Clin. 2011; 61:69-90.

2. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000; 100:57-70.

3. Swanton C, Caldas C. From genomic landscapes to personalized cancer man-agement-is there a roadmap? Ann N Y Acad Sci. 2010; 1210:34-44.

4. The Molecular Biology of the Cell, Chapter 6, p.331, 5th edition. Garland Sci-ence, Taylor & Fransis Group (USA), 2007.

5. Greenman C, Stephens P, Smith R, et al. Patterns of somatic mutation in human cancer genomes. Nature. 2007; 446:153-158.

6. Hakem R. DNA-damage repair; the good, the bad, and the ugly. EMBO J. 2008; 27:589-605.

7. Vogelstein B, Kinzler KW. Cancer genes and the pathways they control. Nat Med. 2004; 10:789-799.

8. Nowell PC. The clonal evolution of tumor cell populations. Science. 1976; 194:23-28.

9. Kinzler KW, Vogelstein B. Lessons from hereditary colorectal cancer. Cell. 1996; 87:159-170.

10. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011; 144:646-674.

11. Bell DW. Our changing view of the genomic landscape of cancer. J Pathol. 2010; 220:231-243.

12. Knudson AG Jr. Mutation and cancer: statistical study of retinoblastoma. Proc Natl Acad Sci U S A. 1971; 68:820-823.

13. Wood LD, Parsons DW, Jones S, et al. The genomic landscapes of human breast and colorectal cancers. Science. 2007; 318:1108-1113.

14. Beroukhim R, Mermel CH, Porter D, et al. The landscape of somatic copy-number alteration across human cancers. Nature. 2010; 463:899-905.

15. Maser RS, DePinho RA. Connecting chromosomes, crisis, and cancer. Science. 2002; 297:565-569.

16. Artandi SE, DePinho RA. Telomeres and telomerase in cancer. Carcinogenesis. 2010; 31:9-18.

17. Pietras K, Ostman A. Hallmarks of cancer: interactions with the tumor stroma. Exp Cell Res. 2010; 316:1324-1331.

18. Micke P, Ostman A. Exploring the tumour environment: cancer-associated fibroblasts as targets in cancer therapy. Expert Opin Ther Targets. 2005; 9:1217-1233.

Page 56: Molecular Characterisation and Prognostic Biomarker

56

19. Bremnes RM, Dønnem T, Al-Saad S, et al. The role of tumor stroma in cancer progression and prognosis: emphasis on carcinoma-associated fibroblasts and non-small cell lung cancer. J Thorac Oncol. 2011; 6:209-217.

20. Langer C, Soria JC. The role of anti-epidermal growth factor receptor and anti-vascular endothelial growth factor therapies in the treatment of non-small-cell lung cancer. Clin Lung Cancer. 2010; 11:82-90.

21. Mantovani A, Allavena P, Sica A, Balkwill F. Cancer-related inflammation. Nature. 2008; 454:436-444.

22. Colotta F, Allavena P, Sica A, at al. Cancer-related inflammation, the seventh hallmark of cancer: links to genetic instability. Carcinogenesis. 2009; 30:1073-1081.

23. WHO Report on the Global Tobacco Epidemic. World Health Organization, Geneva, Switzerland: World Health Organization Press, 2008. (http://www. who.int/tobacco/mpower/mpower_report_full_2008.pdf; accessed September 23, 2012)

24. The Health Consequences of Smoking: A Report of the Surgeon General. Chap-ter 2, p.42-61. Atlanta, GA: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Dis-ease Prevention and Health Promotion, Office on Smoking and Health, 2004. (http://www.cdc.gov/tobacco/data_statistics/sgr/2004/pdfs/chapter 2.pdf; ac-cessed September 23, 2012)

25. Doll R, Peto R. Cigarette smoking and bronchial carcinoma: dose and time relationships among regular smokers and lifelong non-smokers. J Epidemiol Community Health. 1978; 32:303-313.

26. Lungcancer i Sverige: Nationellt register for lungcancer, Redovisning av material 2002-2009. Styrgruppen för nationella lungcancerregistret, 2011. (http://www.cancercentrum.se/Global/RCCUppsalaOrebro/Vårdprocesser/lungcancer/rapporter/nat_rapplunga_11.pdf; accessed September 23, 2012)

27. Peto R, Darby S, Deo H, et al. Smoking, smoking cessation, and lung cancer in the UK since 1950: combination of national statistics with two case-control studies. BMJ. 2000; 321:323-329.

28. Doll R, Peto R, Boreham J, Sutherland I. Mortality in relation to smoking: 50 years' observations on male British doctors. BMJ. 2004; 328:1519.

29. Alberg AJ, Ford JG, Samet JM. Epidemiology of lung cancer: ACCP evidence-based clinical practice guidelines (2nd edition). Chest. 2007; 132:29S-55S.

30. Yang M. A current global view of environmental and occupational cancers. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev. 2011; 29:223-249.

31. Shields PG. Molecular epidemiology of smoking and lung cancer. Oncogene. 2002; 21:6870-6876.

32. Brennan P, Hainaut P, Boffetta P. Genetics of lung-cancer susceptibility. Lancet Oncol. 2011; 12:399-408.

33. Ferlay J, Shin HR, Bray F, et al. Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. Int J Cancer. 2010; 127:2893-2917.

34. Holmberg L, Sandin F, Bray F, et al. National comparisons of lung cancer sur-vival in England, Norway and Sweden 2001-2004: differences occur early in follow-up. Thorax. 2010; 65:436-441.

Page 57: Molecular Characterisation and Prognostic Biomarker

57

35. Yano T, Haro A, Shikada Y, et al. Non-small cell lung cancer in never smokers as a representative 'non-smoking-associated lung cancer': epidemiology and clinical features. Int J Clin Oncol. 2011; 16:287-293.

36. Subramanian J, Govindan R. Lung cancer in never smokers: a review. J Clin Oncol. 2007; 25:561-570.

37. Wakelee HA, Chang ET, Gomez SL, et al. Lung cancer incidence in never smokers. J Clin Oncol. 2007; 25:472-478.

38. Travis WD. Pathology of lung cancer. Clin Chest Med. 2011; 32:669-692.

39. Rosti G, Bevilacqua G, Bidoli P, et al. Small cell lung cancer. Ann Oncol. 2006; 17:ii5-10.

40. Travis WD, Brambilla E, Muller-Hermelink HK, et al. Pathology and genetics: tumours of the lung, pleura, tymus and heart. Chapter 1, p.12-124. Lyon (France):IARC, 2004.

41. Langer CJ, Besse B, Gualberto A, et al. The evolving role of histology in the management of advanced non-small-cell lung cancer. J Clin Oncol. 2010; 28:5311-5320.

42. Cooper WA, O'toole S, Boyer M, et al. What's new in non-small cell lung can-cer for pathologists: the importance of accurate subtyping, EGFR mutations and ALK rearrangements. Pathology. 2011; 43:103-115.

43. Rekhtman N, Ang DC, Sima CS, et al. Immunohistochemical algorithm for differentiation of lung adenocarcinoma and squamous cell carcinoma based on large series of whole-tissue sections with validation in small specimens.Mod Pathol. 2011; 24:1348-1359.

44. Travis WD, Brambilla E, Noguchi M, et al. International association for the study of lung cancer/american thoracic society/european respiratory society in-ternational multidisciplinary classification of lung adenocarcinoma. Thorac On-col. 2011; 6:244-285.

45. Brambilla E, Travis WD, Colby TV, et al. The new World Health Organization classification of lung tumours. Eur Respir J. 2001; 18:1059-1068.

46. Sequist LV, Bell DW, Lynch TJ, Haber DA. Molecular predictors of response to epidermal growth factor receptor antagonists in non-small-cell lung cancer. J Clin Oncol. 2007; 25:587-595.

47. Travis WD. Lung tumours with neuroendocrine differentiation. Eur J Cancer. 2009; 45:251-266.

48. Goldstraw P, Crowley J, Chansky K, et al. The IASLC Lung Cancer Staging Project: proposals for the revision of the TNM stage groupings in the forthcom-ing (seventh) edition of the TNM Classification of malignant tumours. J Thorac Oncol. 2007; 2:706-714.

49. Sobin LH, Compton CC. TNM seventh edition: what's new, what's changed: communication from the International Union Against Cancer and the American Joint Committee on Cancer. Cancer. 2010; 116:5336-5339.

50. Hoffman PC, Mauer AM, Vokes EE. Lung cancer. Lancet. 2000; 355:479-485.

51. Tsim S, O'Dowd CA, Milroy R, Davidson S. Staging of non-small cell lung cancer (NSCLC): a review. Respir Med. 2010; 104:1767-1774.

52. Hammerschmidt S, Wirtz H. Lung cancer: current diagnosis and treatment. Dtsch Arztebl Int. 2009; 106:809-818.

Page 58: Molecular Characterisation and Prognostic Biomarker

58

53. Spiro SG, Gould MK, Colice GL, et al. Initial evaluation of the patient with lung cancer: symptoms, signs, laboratory tests, and paraneoplastic syndromes: ACCP evidenced-based clinical practice guidelines (2nd edition). Chest. 2007; 132:149S-160S.

54. Scott WJ, Howington J, Feigenberg S, et al. Treatment of non-small cell lung cancer stage I and stage II: ACCP evidence-based clinical practice guidelines (2nd edition). Chest. 2007; 132:234S-242S.

55. Strauss GM, Kwiatkowski DJ, Harpole DH, et al. Molecular and pathologic markers in stage I non-small-cell carcinoma of the lung.J Clin Oncol. 1995; 13:1265-1279.

56. Robinson LA, Ruckdeschel JC, Wagner H Jr, et al. Treatment of non-small cell lung cancer-stage IIIA: ACCP evidence-based clinical practice guidelines (2nd edition). Chest. 2007;132:243S-265S.

57. Jett JR, Schild SE, Keith RL, et al. Treatment of non-small cell lung cancer, stage IIIB: ACCP evidence-based clinical practice guidelines (2nd edition). Chest. 2007; 132:266S-276S.

58. Socinski MA, Crowell R, Hensing TE, et al. Treatment of non-small cell lung cancer, stage IV: ACCP evidence-based clinical practice guidelines (2nd edi-tion). Chest. 2007; 132:277S-289S.

59. Laurent-Puig P, Lievre A, Blons H. Mutations and response to epidermal growth factor receptor inhibitors. Clin Cancer Res. 2009; 15:1133-1139.

60. Toyooka S, Tsuda T, Gazdar AF. The TP53 gene, tobacco exposure, and lung cancer. Hum Mutat. 2003; 21:229-239.

61. Sanchez-Cespedes M. The role of LKB1 in lung cancer. Fam Cancer. 2011; 10:447-453.

62. Riely GJ, Marks J, Pao W. KRAS mutations in non-small cell lung cancer. Proc Am Thorac Soc. 2009; 6:201-205.

63. Cheng H, Xu X, Costa DB, et al. Molecular testing in lung cancer: the time is now. Curr Oncol Rep. 2010; 12:335-348.

64. Zhang X, Chang A. Molecular predictors of EGFR-TKI sensitivity in advanced non-small cell lungcancer. Int J Med Sci. 2008; 5:209-217.

65. Pao W, Girard N. New driver mutations in non-small-cell lung cancer. Lancet Oncol. 2011; 12:175-180.

66. Kawano O, Sasaki H, Endo K, et al. PIK3CA mutation status in Japanese lung cancer patients. Lung Cancer. 2006; 54:209-215.

67. Olaso E, Labrador JP, Wang L, et al. Discoidin domain receptor 2 regulates fibroblast proliferation and migration through the extracellular matrix in associ-ation with transcriptional activation of matrix metalloproteinase-2. J Biol Chem. 2002; 277:3606-3613.

68. Hammerman PS, Sos ML, Ramos AH, et al. Mutations in the DDR2 kinase gene identify a novel therapeutic target in squamous cell lung cancer. Cancer Discov. 2011; 1:78-89.

69. Ding L, Getz G, Wheeler DA, et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature. 2008; 455:1069-1075.

70. Beau-Faller M, Ruppert AM, Voegeli AC, et al. MET gene copy number in non-small cell lung cancer: molecular analysis in a targeted tyrosine kinase in-hibitor naïve cohort. J Thorac Oncol. 2008; 3:331-339.

Page 59: Molecular Characterisation and Prognostic Biomarker

59

71. Okuda K, Sasaki H, Yukiue H, et al. Met gene copy number predicts the prog-nosis for completely resected non-small cell lung cancer. Cancer Sci. 2008; 99:2280-2285.

72. Cappuzzo F, Marchetti A, Skokan M, et al. Increased MET gene copy number negatively affects survival of surgically resected non-small-cell lung cancer pa-tients. Clin Oncol. 2009; 27:1667-1674.

73. Bean J, Brennan C, Shih JY, et al. MET amplification occurs with or without T790M mutations in EGFR mutant lung tumors with acquired resistance to ge-fitinib or erlotinib. Proc Natl Acad Sci U S A. 2007; 104:20932-20937.

74. Soda M, Choi YL, Enomoto M, et al. Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer. Nature. 2007; 448:561-566.

75. Horn L, Pao W. EML4-ALK: honing in on a new target in non-small-cell lung cancer. J Clin Oncol. 2009; 27:4232-4235.

76. Jun HJ, Johnson H, Bronson RT, et al. The Oncogenic Lung Cancer Fusion Kinase CD74-ROS Activates a Novel Invasiveness Pathway Through E-Syt1 Phosphorylation. Cancer Res. 2012; 72:3764-3774

77. Takeuchi K, Soda M, Togashi Y, et al. RET, ROS1 and ALK fusions in lung cancer. Nat Med. 2012; 18:378-381.

78. Ju YS, Lee WC, Shin JY, et al. A transforming KIF5B and RET gene fusion in lung adenocarcinoma revealed from whole-genome and transcriptome sequenc-ing. Genome Res. 2012; 22:436-445.

79. Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Ther. 2001; 69:89-95.

80. Oldenhuis CN, Oosting SF, Gietema JA, de Vries EG. Prognostic versus predic-tive value of biomarkers in oncology. Eur J Cancer. 2008; 44:946-953.

81. Konopa K. Do we have markers to select patients for adjuvant therapies of non-small-cell lung cancer? Ann Oncol. 2010; 21:vii199-202.

82. Sudhindra A, Ochoa R, Santos ES. Biomarkers, prediction, and prognosis in non-small-cell lung cancer: a platform for personalized treatment. Clin Lung Cancer. 2011; 12:360-368.

83. Coate LE, John T, Tsao MS, Shepherd FA. Molecular predictive and prognostic markers in non-small-cell lung cancer. Lancet Oncol. 2009; 10:1001-1010.

84. Bepler G, Sharma S, Cantor A, et al. RRM1 and PTEN as prognostic parame-ters for overall and disease-free survival in patients with non-small-cell lung cancer. J Clin Oncol. 2004; 22:1878-1885.

85. Rosell R, Danenberg KD, Alberola V, et al. Ribonucleotide reductase messen-ger RNA expression and survival in gemcitabine/ cisplatin-treated advanced non-small cell lung cancer patients. Clin Cancer Res 2004; 10:1318–1325.

86. Simon GR, Sharma S, Cantor A, et al. ERCC1 expression is a predictor of sur-vival in resected patients with non-small cell lung cancer. Chest. 2005; 127:978-983.

87. Lord RV, Brabender J, Gandara D, et al. Low ERCC1 expression correlates with prolonged survival after cisplatin plus gemcitabine chemotherapy in non-small cell lung cancer. Clin Cancer Res. 2002; 8:2286-2291.

Page 60: Molecular Characterisation and Prognostic Biomarker

60

88. Lynch TJ, Bell DW, Sordella R, et al. Activating mutations in the epidermal growth factor receptor underlying responsiveness of non-small-cell lung cancer to gefitinib. N Engl J Med. 2004; 350:2129-2139.

89. Paez JG, Jänne PA, Lee JC, et al. EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science. 2004; 304:1497-500.

90. Takano T, Ohe Y, Sakamoto H, et al. Epidermal growth factor receptor gene mutations and increased copy numbers predict gefitinib sensitivity in patients with recurrent non-small-cell lung cancer. J Clin Oncol. 2005; 23:6829-3837.

91. Tsao MS, Sakurada A, Cutz JC, et al. Erlotinib in lung cancer - molecular and clinical predictors of outcome. N Engl J Med. 2005; 353:133-144.

92. Goto K, Satouchi M, Ishii G, et al. An evaluation study of EGFR mutation tests utilized for non-small-cell lung cancer in the diagnostic setting. Ann Oncol. 2012 [Epub ahead of print].

93. Do H, Krypuy M, Mitchell PL, et al. High resolution melting analysis for rapid and sensitive EGFR and KRAS mutation detection in formalin fixed paraffin embedded biopsies. BMC Cancer. 2008; 8:142.

94. Kobayashi S, Boggon TJ, Dayaram T et al. EGFR mutation and resistance of non-small-cell lung cancer to gefitinib. N Engl J Med. 2005; 352:786-792.

95. Engelman JA, Zejnullahu K, Mitsudomi T, et al. MET amplification leads to gefitinib resistance in lung cancer by activating ERBB3 signaling. Science. 2007; 316:1039-1043.

96. Hirsch FR, Varella-Garcia M, Cappuzzo F, et al. Combination of EGFR gene copy number and protein expression predicts outcome for advanced non-small-cell lung cancer patients treated with gefitinib. Ann Oncol. 2007; 18:752-760.

97. Kwak EL, Bang YJ, Camidge DR, et al. Anaplastic lymphoma kinase inhibition in non-small-cell lung cancer. N Engl J Med. 2010; 363:1693-1703.

98. Wang R, Pan Y, Li C, et al. The Use of Quantitative Real-time Reverse-Transcriptase PCR for 5' and 3' portions of ALK transcripts to detect ALK Re-arrangements in Lung Cancers. Clin Cancer Res. 2012 Sep; 18:4725-4732.

99. Wallander ML, Geiersbach KB, Tripp SR, Layfield LJ. Comparison of reverse transcription-polymerase chain reaction, immunohistochemistry, and fluores-cence in situ hybridization methodologies for detection of echinoderm microtu-bule-associated proteinlike 4-anaplastic lymphoma kinase fusion-positive non-small cell lung carcinoma: implications for optimal clinical testing. Arch Pathol Lab Med. 2012; 136:796-803.

100. Choi YL, Soda M, Yamashita Y, et al. EML4-ALK mutations in lung cancer that confer resistance to ALK inhibitors. N Engl J Med. 2010; 363:1734-1739.

101. Agulló-Ortuño MT, López-Ríos F, Paz-Ares L. Lung cancer genomic signa-tures. J Thorac Oncol. 2010; 5:1673-1691.

102. Garber ME, Troyanskaya OG, Schluens K, et al. Diversity of gene expression in adenocarcinoma of the lung. Proc Natl Acad Sci U S A. 2001; 98:13784-13789.

103. Bhattacharjee A, Richards WG, Staunton J, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A. 2001; 98:13790-13795.

104. Hayes DN, Monti S, Parmigiani G, et al. Gene expression profiling reveals reproducible human lung adenocarcinoma subtypes in multiple independent pa-tient cohorts. J Clin Oncol. 2006; 24:5079-5090.

Page 61: Molecular Characterisation and Prognostic Biomarker

61

105. Takeuchi T, Tomida S, Yatabe Y, et al. Expression profile-defined classification of lung adenocarcinoma shows close relationship with underlying major genetic changes and clinicopathologic behaviors. J Clin Oncol. 2006; 24:1679-1688.

106. Inamura K, Fujiwara T, Hoshida Y, et al. Two subclasses of lung squamous cell carcinoma with different gene expression profiles and prognosis identified by hierarchical clustering and non-negative matrix factorization. Oncogene. 2005; 24:7105-7113.

107. Kikuchi T, Daigo Y, Katagiri T et al. Expression profiles of non-small cell lung cancers on cDNA microarrays: identification of genes for prediction of lymph-node metastasis and sensitivity to anti-cancer drugs. Oncogene. 2003; 22:2192-2205.

108. Takada M, Tada M, Tamoto E, et al. Prediction of lymph node metastasis by analysis of gene expression profiles in non-small cell lung cancer. J Surg Res. 2004; 122:61-69.

109. Xi L, Lyons-Weiler J, Coello MC, et al. Prediction of lymph node metastasis by analysis of gene expression profiles in primary lung adenocarcinomas. Clin Cancer Res. 2005; 11:4128-4135.

110. Larsen JE, Pavey SJ, Passmore LH, et al. Gene expression signature predicts recurrence in lung adenocarcinoma. Clin Cancer Res. 2007; 13:2946-2954.

111. Tomida S, Takeuchi T, Shimada Y, et al. Relapse-related molecular signature in lung adenocarcinomas identifies patients with dismal prognosis. J Clin Oncol. 2009; 27:2793-2799.

112. Lu Y, Wang L, Liu P, et al. Gene-expression signature predicts postoperative recurrence in stage I non-small cell lung cancer patients. PLoS One. 2012;7:e30880.

113. Beer DG, Kardia SL, Huang CC, et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med. 2002; 8:816-824.

114. Tomida S, Koshikawa K, Yatabe Y, et al. Gene expression-based, individual-ized outcome prediction for surgically treated lung cancer patients. Oncogene. 2004; 23:5360-5370

115. Raponi M, Zhang Y, Yu J, et al. Gene expression signatures for predicting prognosis of squamous cell and adenocarcinomas of the lung. Cancer Res. 2006; 66:7466-7472.

116. Lu Y, Lemon W, Liu PY, et al. A gene expression signature predicts survival of patients with stage I non-small cell lung cancer. PLoS Med. 2006; 3:e467.

117. Chen HY, Yu SL, Chen CH, et al. A five-gene signature and clinical outcome in non-small-cell lung cancer. N Engl J Med. 2007; 356:11-20.

118. Sun Z, Wigle DA, Yang P. Non-overlapping and non-cell-type-specific gene expression signatures predictlung cancer survival. J Clin Oncol. 2008; 26:877-883.

119. Shedden K, Taylor JM, Enkemann SA et al. Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat Med. 2008; 14:822-827.

120. Hou J, Aerts J, den Hamer B, et al. Gene expression-based classification of non-small cell lung carcinomas and survival prediction. PLoS One. 2010; 5:e10312.

121. Zhu CQ, Strumpf D, Li CY, et al. Prognostic gene expression signature for squamous cell carcinoma of lung. Clin Cancer Res. 2010; 16:5038-5047.

Page 62: Molecular Characterisation and Prognostic Biomarker

62

122. Balko JM, Potti A, Saunders C, et al. Gene expression patterns that predict sensitivity to epidermal growth factor receptor tyrosine kinase inhibitors in lung cancer cell lines and human lung tumors. BMC Genomics. 2006; 7:289.

123. Coldren CD, Helfrich BA, Witta SE, et al. Baseline gene expression predicts sensitivity to gefitinib in non-small cell lung cancer cell lines. Mol Cancer Res. 2006; 4:521-528.

124. Györffy B, Surowiak P, Kiesslich O, et al. Gene expression profiling of 30 cancer cell lines predicts resistance towards 11 anticancer drugs at clinically achieved concentrations. Int J Cancer. 2006; 118:1699-1712.

125. Landi MT, Dracheva T, Rotunno M, et al. Gene expression signature of ciga-rette smoking and its role in lung adenocarcinoma development and survival. PLoS One. 2008; 3:e1651.

126. Subramanian J, Simon R. Gene expression-based prognostic signatures in lung cancer: ready for clinical use? J Natl Cancer Inst. 2010; 102:464-474.

127. van de Vijver MJ, He YD, van't Veer LJ, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002; 347:1999-2009.

128. Paik S, Shak S, Tang G, et al. A multigene assay to predict recurrence of tamox-ifen-treated, node-negative breast cancer. N Engl J Med. 2004; 351: 2817-2826.

129. Espinosa E, Gámez-Pozo A, Sánchez-Navarro I, et al. The present and future of gene profiling in breast cancer. Cancer Metastasis Rev. 2012; 31:41-46.

130. Kallioniemi A, Kallioniemi OP, Sudar D, et al. Comparative genomic hybridi-zation for molecular cytogenetic analysis of solid tumors. Science. 1992; 258:818-821.

131. Solinas-Toldo S, Lampel S, Stilgenbauer S, et al. Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes Chromosomes Cancer. 1997; 20:399-407.

132. Pinkel D, Segraves R, Sudar D, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet. 1998; 20:207-211.

133. Pollack JR, Perou CM, Alizadeh AA, et al. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat Genet. 1999; 23:41-46.

134. Mei R, Galipeau PC, Prass C et al. Genome-wide detection of allelic imbalance using human SNPs and high-density DNA arrays. Genome Res. 2000; 10:1126-1137.

135. Bignell GR, Huang J, Greshock J, et al. High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Res. 2004; 14:287-295.

136. Dutt A, Beroukhim R. Single nucleotide polymorphism array analysis of cancer. Curr Opin Oncol. 2007; 19:43-49.

137. Petersen I, Bujard M, Petersen S et al. Patterns of chromosomal imbalances in adenocarcinoma and squamous cell carcinoma of the lung. Cancer Res. 1997; 57:2331-2335.

138. Jiang F, Yin Z, Caraway NP, et al. Genomic profiles in stage I primary non small cell lung cancer using comparative genomic hybridization analysis of cDNA microarrays. Neoplasia. 2004; 6:623-635.

139. Kim TM, Yim SH, Lee JS, et al. Genome-wide screening of genomic alterations and their clinicopathologic implications in non-small cell lung cancers. Clin Cancer Res. 2005; 11:8235-8242.

Page 63: Molecular Characterisation and Prognostic Biomarker

63

140. Shibata T, Uryu S, Kokubu A, et al. Genetic classification of lung adenocarci-noma based on array-based comparative genomic hybridization analysis: its as-sociation with clinicopathologic features. Clin Cancer Res. 2005; 11:6177-6185.

141. Lo KC, Stein LC, Panzarella JA, et al. Identification of genes involved in squa-mous cell carcinoma of the lung using synchronized data from DNA copy num-ber and transcript expression profiling analysis. Lung Cancer. 2008; 59:315-331.

142. Zhao X, Weir BA, , T et al. Homozygous deletions and chromosome amplifica-tions in human lung carcinomas revealed by single nucleotide polymorphism ar-ray analysis. Cancer Res. 2005; 65:5561-5570.

143. Weir BA, Woo MS, Getz G, et al. Characterizing the cancer genome in lung adenocarcinoma. Nature. 2007; 450:893-898.

144. Iwakawa R, Kohno T, Kato M, et al. MYC amplification as a prognostic marker of early-stage lung adenocarcinoma identified by whole genome copy number analysis. Clin Cancer Res. 2011; 17:1481-1489.

145. Weiss J, Sos ML, Seidel D, et al. Frequent and focal FGFR1 amplification asso-ciates with therapeutically tractable FGFR1 dependency in squamous cell lung cancer. Sci Transl Med. 2010; 2:62ra93.

146. Goeke F, Franzen A, Menon R, et al. Rationale for treatment of metastatic squamous cell carcinoma of the lung using FGFR Inhibitors. Chest. 2012 [Epub ahead of print].

147. Bass AJ, Watanabe H, Mermel CH, et al. SOX2 is an amplified lineage-survival oncogene in lung and esophageal squamous cell carcinomas. Nat Genet. 2009; 41:1238-1242.

148. Hussenet T, Dali S, Exinger J, et al. SOX2 is an oncogene activated by recurrent 3q26.3 amplifications in human lung squamous cell carcinomas. PLoS One. 2010; 5:e8960.

149. Wilbertz T, Wagner P, Petersen K, et al. SOX2 gene amplification and protein overexpression are associated with better outcome in squamous cell lung can-cer. Mod Pathol. 2011;24:944-953.

150. Huang YT, Lin X, Liu Y, et al. Cigarette smoking increases copy number al-terations in nonsmall-cell lung cancer. Proc Natl Acad Sci U S A. 2011; 108:16345-16350.

151. Huang YT, Lin X, Chirieac LR, et al. Impact on disease development, genomic location and biological function of copynumber alterations in non-small cell lung cancer. PLoS One. 2011; 6:e22961.

152. Birkbak NJ, Eklund AC, Li Q, et al. Paradoxical relationship between chromo-somal instability and survival outcome in cancer. Cancer Res. 2011; 71:3447-3452.

153. Schroeder A, Mueller O, Stocker S, et al. The RIN: an RNA integrity number for assigning integrity values to RNA measurements. BMC Mol Biol. 2006; 7:3.

154. Oken MM, Creech RH, Tormey DC, et al. Toxicity and response criteria of the Eastern Cooperative Oncology Group. Am J Clin Oncol. 1982; 5:649-655.

155. Göhlmann H, Talloen W. Gene Expression Studies Using Affymetrix Microar-rays. Chapter 2-5. Chapman & Hall/CRC, Taylor & Francis Group (USA), 2009.

Page 64: Molecular Characterisation and Prognostic Biomarker

64

156. Datasheet: GeneChip® Human Genome U133 Arrays. (http://media.affymetrix. com/support/technical/datasheets/hgu133arrays_datasheet.pdf; accessed Sep-tember 20, 2012).

157. Irizarry RA, Hobbs B, Collin F, et al. Exploration, normalization, and summar-ies of high density oligonucleotide array probe level data. Biostatistics. 2003; 4:249-264.

158. Nowak D, Hofmann WK, Koeffler HP. Genome-wide Mapping of Copy Num-ber Variations Using SNP Arrays. Transfus Med Hemother. 2009; 36:246-251.

159. Datasheet: Affymetrix GeneChip® Human Mapping 500K Array Set (http:// me dia.affymetrix.com/support/technical/datasheets/500k_datasheet.pdf; accessed September 20, 2012).

160. Di X, Matsuzaki H, Webster TA, et al. Dynamic model based algorithms for screening and genotyping over 100 K SNPs on oligonucleotide microarrays. Bi-oinformatics. 2005; 21:1958-1963.

161. Simplifying Data Interpretation with Nexus Copy Number. A white paper from BioDiscovery, Inc. (http://www.biodiscovery.com/downloads/pdfs/Simplifying DataInterpretationWithNexusCopyNumber.pdf; accessed September 20, 2012).

162. Olshen AB, Venkatraman ES, Lucito R, Wigler M. Circular binary segmenta-tion for the analysis of array-based DNA copy number data. Biostatistics. 2004; 5:557-572.

163. Diskin SJ, Eck T, Greshock J, et al. STAC: A method for testing the signifi-cance of DNA copy number aberrations across multiple array-CGH experi-ments. Genome Res. 2006; 16:1149-1158.

164. Kubista M, Andrade JM, Bengtsson M, et al. The real-time polymerase chain reaction. Mol Aspects Med. 2006; 27:95-112.

165. Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods. 2001; 25:402-408.

166. Vandesompele J, De Preter K, Pattyn F, et al. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal con-trol genes. Genome Biol. 2002; 3:34.

167. Teruya-Feldstein J. The immunohistochemistry laboratory: looking at molecules and preparing for tomorrow. Arch Pathol Lab Med. 2010; 134:1659-1665.

168. Nakane PK, Pierce GB Jr. Enzyme-labeled antibodies: preparation and applica-tion for the localization of antigens. J Histochem Cytochem. 1966; 14:929-931.

169. Bratthauer GL. Overview of antigen detection through enzymatic activity. Methods Mol Biol. 2010; 588:231-241.

170. Sternberger LA, Hardy PH Jr, Cuculis JJ, Meyer HG. The unlabeled antibody enzyme method of immunohistochemistry: preparation and properties of soluble antigen-antibody complex (horseradish peroxidase-antihorseradish peroxidase) and its use in identification of spirochetes. J Histochem Cytochem. 1970; 18:315-333.

171. Heitzmann H, Richards FM. Use of the avidin-biotin complex for specific stain-ing of biological membranes in electron microscopy. Proc Natl Acad Sci U S A. 1974; 71:3537-3541.

172. Battifora H. The multitumor (sausage) tissue block: novel method for immuno-histochemical antibody testing. Lab Invest. 1986; 55:244-248.

Page 65: Molecular Characterisation and Prognostic Biomarker

65

173. Kononen J, Bubendorf L, Kallioniemi A, et al. Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat Med. 1998; 4:844-847.

174. Pontén F, Schwenk JM, Asplund A, Edqvist PH. The Human Protein Atlas as a proteomic resource for biomarker discovery. J Intern Med. 2011; 270:428-446.

175. Cappuzzo F, Marchetti A, Skokan M, et al. Increased MET gene copy number negatively affects survival of surgically resected non-small-cell lung cancer pa-tients. J Clin Oncol. 2009; 27:1667-1674.

176. Park K, Han S, Kim JY, et al. Silver-Enhanced In Situ Hybridization as an Al-ternative to Fluorescence In Situ Hybridization for Assaying HER2 Amplifica-tion in Clinical Breast Cancer. J Breast Cancer. 2011; 14:276-282.

177. Bhargava R, Gerald WL, Li AR, et al. EGFR gene amplification in breast can-cer: correlation with epidermal growth factor receptor mRNA and protein ex-pression and HER-2 status and absence of EGFR-activating mutations. Mod Pathol. 2005; 18:1027-1033.

178. Kiflemariam S, Andersson S, Asplund A, et al. Scalable in situ hybridization on tissue arrays for validation of novel cance and tissue-specific biomarkers. PLoS One. 2012; 7:e32927.

179. International Human Genome Sequencing Consortium. Finishing the euchro-matic sequence of the human genome. Nature. 2004; 431:931-945.

180. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A. 1977; 74:5463-5467.

181. Smith LM, Sanders JZ, Kaiser RJ, et al. Fluorescence detection in automated DNA sequence analysis. Nature. 1986; 321:674-679.

182. Ronaghi M, Uhlén M, Nyrén P. A sequencing method based on real-time pyro-phosphate. Science. 1998; 281:363-365.

183. Fakhrai-Rad H, Pourmand N, Ronaghi M. Pyrosequencing: an accurate detec-tion platform for single nucleotide polymorphisms. Hum Mutat. 2002; 19:479-485.

184. Ahmadian A, Ehn M, Hober S. Pyrosequencing: history, biochemistry and fu-ture. Clin Chim Acta. 2006; 363:83-94.

185. Aparicio SA, Huntsman DG. Does massively parallel DNA resequencing signi-fy the end of histopathology as we know it? J Pathol. 2010; 220:307-315.

186. Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet. 2008; 24:133-141.

187. Schadt EE, Turner S, Kasarskis A. A window into third-generation sequencing. Hum Mol Genet. 2010; 19:R227-R240.

188. White Paper SOLiD™ System: A theoretical understanding of 2 base color codes and its application to annotation, error detection, and error correction. Methods for Annotating 2 Base Color Encoded Reads in the SOLiD™ System. (http://www3.appliedbiosystems.com/cms/groups/mcb_marketing/documents/generaldocuments/cms_058265.pdf; accessed September 23, 2012.)

189. Zaboli G, Ameur A, Igl W, et al. Sequencing of high-complexity DNA pools for identification of nucleotide and structural variants in regions associated with complex traits. Eur J Hum Genet. 2012; 20:77-83.

190. Li H, Durbin R. Bioinformatics. Fast and accurate short read alignment with Burrows-Wheeler transform. 2009; 25:1754-1760.

Page 66: Molecular Characterisation and Prognostic Biomarker

66

191. Li H, Handsaker B, Wysoker A, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009; 25:2078-2079.

192. Harris T, Pan Q, Sironi J, et al. Both gene amplification and allelic loss occur at 14q13.3 in lung cancer. Clin Cancer Res 2011; 17:690-699.

193. Uhlén M, Björling E, Agaton C, et al. A human protein atlas for normal and cancer tissues based on antibody proteomics. Mol Cell Proteomics. 2005; 4:1920-1932.

194. Chlenski A, Cohn SL. Modulation of matrix remodeling by SPARC in neo-plastic progression. Semin Cell Dev Biol. 2010; 21:55-65.

195. Ricciardelli C, Sakko AJ, Ween MP, et al. The biological role and regulation of versican levels in cancer. Cancer Metastasis Rev. 2009; 28:233-245.

196. Ruan K, Bao S, Ouyang G. The multifaceted role of periostin in tumorigenesis. Cell Mol Life Sci. 2009; 66:2219-2230.

197. Gu X, Ma Y, Xiao J, et al. Up-regulated biglycan expression correlates with the malignancy in human colorectal cancers. Clin Exp Med. 2012; 12:195-199.

198. Orend G, Chiquet-Ehrismann R. Tenascin-C induced signaling in cancer. Can-cer Lett. 2006; 244:143-163.

199. Troup S, Njue C, Kliewer EV, et al. Reduced expression of the small leucine-rich proteoglycans, lumican, and decorin is associated with poor outcome in node-negative invasive breast cancer. Clin Cancer Res. 2003; 9:207-214.

200. Manara MC, Bernard G, Lollini PL, et al. CD99 acts as an oncosuppressor in osteosarcoma. Mol Biol Cell. 2006; 17:1910-1921.

201. Lee JH, Kim SH, Wang LH, et al. Clinical significance of CD99 down-regulation in gastric adenocarcinoma. Clin Cancer Res. 2007; 13:2584-2591.

202. Lee KJ, Lee SH, Yadav BK, et al. The activation of CD99 inhibits cell-extracellular matrix adhesion by suppressing β1 integrin affinity. BMB Rep. 2012; 45:159-164.

203. Kikuchi S, Yamada D, Fukami T, et al. Hypermethylation of the TSLC1/IGSF4 promoter is associated with tobacco smoking and a poor prognosis in primary nonsmall cell lung carcinoma. Cancer. 2006; 106:1751-1758.

204. Takahashi Y, Iwai M, Kawai T et al. Aberrant expression of tumor suppressors CADM1 and 4.1B in invasive lesions of primary breast cancer. Breast Cancer. 2012; 19:242-252.

205. Mazumder Indra D, Mitra S, Roy A, et al. Alterations of ATM and CADM1 in chromosomal 11q22.3-23.2 region are associated with the development of inva-sive cervical carcinoma. Hum Genet. 2011; 130:735-748.

206. van der Weyden L, Arends MJ, Rust AG, et al. Increased tumorigenesis associ-ated with loss of the tumor suppressor gene Cadm1. Mol Cancer. 2012; 11:29.

207. Schmidt M, Böhm D, von Törne C, et al. The humoral immune system has a key prognostic impact in node-negative breast cancer. Cancer Res. 2008; 68:5405-5413.

208. Schmidt M, Hellwig B, Hammad S, et al. A comprehensive analysis of human gene expression profiles identifies stromal immunoglobulin κ C as a compatible prognostic marker in human solid tumors. Clin Cancer Res. 2012; 18:2695-2703.

Page 67: Molecular Characterisation and Prognostic Biomarker

67

209. Soussi T, Asselain B, Hamroun D, et al. Meta-analysis of the p53 mutation database for mutant p53 biological activity reveals a methodologic bias in muta-tion detection. Clin Cancer Res. 2006; 12:62-69.

210. Campbell PJ, Pleasance ED, Stephens PJ, et al. Subclonal phylogenetic struc-tures in cancer revealed by ultra-deep sequencing. Proc Natl Acad Sci U S A. 2008; 105:13081-13086.

211. Thomas RK, Nickerson E, Simons JF, et al. Sensitive mutation detection in heterogeneous cancer specimens by massively parallel picoliter reactor sequenc-ing. Nat Med. 2006; 12:852-855.

212. Kato S, et al. Understanding the function-structure and function-mutation rela-tionships of p53 tumor suppressor protein by high-resolution missense mutation analysis. Proc Natl Acad Sci USA. 2003; 100:8424-8429.

213. Soussi T, Kato S, Levy PP, Ishioka C. Reassessment of the TP53 mutation da-tabase in human disease by data mining with a library of TP53 missense muta-tions. Hum Mutat 2005; 25:6-17.

214. Schenkel AR, Mamdouh Z, Chen X, et al. CD99 plays a major role in the mi-gration of monocytes through endothelial junctions. Nat Immunol. 2002; 3:143-150.

215. Lou O, Alcaide P, Luscinskas FW, Muller WA. CD99 is a key mediator of the transendothelial migration of neutrophils. J Immunol. 2007; 178:1136-1143.

216. Manes TD, Pober JS. Identification of endothelial cell junctional proteins and lymphocyte receptors involved in transendothelial migration of human effector memory CD4+ T cells. J Immunol. 2011; 186:1763-1768.

217. Wakabayashi O, Yamazaki K, Oizumi S, et al. CD4+ T cells in cancer stroma, not CD8+ T cells in cancer cell nests, are associated with favorable prognosis in human non-small cell lung cancers. Cancer Sci. 2003; 94:1003-1009.

218. Murakami Y. Involvement of a cell adhesion molecule, TSLC1/IGSF4, in hu-man oncogenesis. Cancer Sci. 2005; 96:543-552.

219. Zhang Y, Xu R, Li G, et al. Loss of expression of the differentially expressed in adenocarcinoma of the lung (DAL-1) protein is associated with metastasis of non-small cell lung carcinoma cells. Tumour Biol. 2012 [Epub ahead of print].

220. Fukuhara H, Masuda M, Yageta M, et al. Association of a lung tumor suppres-sor TSLC1 with MPP3, a human homologue of Drosophila tumor suppressor Dlg. Oncogene. 2003; 22:6160-6165.

221. Uekita T, Sakai R. Roles of CUB domain-containing protein 1 signaling in cancer invasion and metastasis. Cancer Sci. 2011; 102:1943-1948.

222. Benes CH, Poulogiannis G, Cantley LC, Soltoff SP. The SRC-associated pro-tein CUB Domain-Containing Protein-1 regulates adhesion and motility. Onco-gene. 2012; 31:653-663.

223. Suzuki K, Kachala SS, Kadota K, et al. Prognostic immune markers in non-small cell lung cancer. Clin Cancer Res. 2011; 17:5247-5256.

224. Roepman P, Jassem J, Smit EF, et al. An immune response enriched 72-gene prognostic profile for early-stage non-small-cell lung cancer. Clin Cancer Res. 2009; 15:284-290.

225. Lee W, Jiang Z, Liu J, et al. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature. 2010; 465:473-477.

Page 68: Molecular Characterisation and Prognostic Biomarker

68

226. The Cancer Genome Atlas Research Network. Comprehensive genomic charac-terization of squamous cell lung cancers. Nature. 2012 [Epub ahead of print].

227. Imielinski M, Berger AH, Hammerman PS, et al. Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell. 2012; 150:1107-1120.

Page 69: Molecular Characterisation and Prognostic Biomarker
Page 70: Molecular Characterisation and Prognostic Biomarker

Acta Universitatis UpsaliensisDigital Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Medicine 817

Editor: The Dean of the Faculty of Medicine

A doctoral dissertation from the Faculty of Medicine, UppsalaUniversity, is usually a summary of a number of papers. A fewcopies of the complete dissertation are kept at major Swedishresearch libraries, while the summary alone is distributedinternationally through the series Digital ComprehensiveSummaries of Uppsala Dissertations from the Faculty ofMedicine.

Distribution: publications.uu.seurn:nbn:se:uu:diva-181912

ACTAUNIVERSITATIS

UPSALIENSISUPPSALA

2012