Potamias_etal05

  • 7/28/2019 Potamias_etal05

    Breast Cancer and Biomedical Informatics: The PrognoChip Project

    G. Potamias1,2,*, A. Analyti1, D. Kafetzopoulos3, D. Plexousakis1,2, P. Poirazi3, M. Reczko1, I.G. Tollis1,2, M. E. Sanidas4, E. Stathopoulos5, M. Tsiknakis1, S. Vassilaros6

    Institute of Computer Science

    Foundation for Research & Technology Hellas (FORTH)

    Heraklion, 711 10 Greece

    Phone: +30-2810-391693, Fax: +30-2810-311601, E-mail:[email protected]

    _____________________________

    * Corresponding author

    1Institute of Computer Science (ICS), FORTH; 2Dept. of Computer Science, University of Crete; 3Institute of Molecular Biology and Biotechnology (IMBB), FORTH; 4Dept. of Surgical Oncology, Medical School, University of Crete, Heraklion, Crete, Greece; 5Dept. of Pathology, Medical School, University of Crete, Crete, Greece; 6Prolipsis Diagnostic Breast Center, Athens, Greece.

    Abstract - Breast cancer is the most common malignancy

    affecting women, the lifetime risk being approximately

    10%. Breast cancer is both genetically and

    histopathologically heterogeneous, and the underlying

    development mechanisms remain largely unknown. Global

    expression analysis using microarrays offers unprecedented

    opportunities to obtain molecular signatures of the state of

    activity of diseased cells and patient samples. The predictive

    power of this approach is much greater than that of

    currently used approaches, but remains to be validated in

    prospective clinical studies. The PrognoChip project is

    based on the synergy between Bioinformatics and Medical

    Informatics, along the lines of the emerging discipline

    of Biomedical Informatics. In this context we are moving

    towards the specification and creation of an Integrated

    Clinico-Genomics Information Technology Environment

    (ICG-ITE) in which the smooth integration of the

    clinical and the genomics worlds, together with the intelligent

    processing of the underlying data, enables the identification

    of reliable and clinically valid (i.e., in terms of prognosis)

    molecular (gene) markers.

    Keywords: Breast cancer, Biomedical informatics, semantic

    integration, data-mining

    I. INTRODUCTION

    The completion of the human genome and the

    development of post-genomic applications have

    introduced new holistic approaches and challenges in the

    analysis of diseases that will, in the years to come,

    revolutionize biomedical research and health care. A

    characteristic of medicine in the post-genomic era will be

    the consultation of both the comprehensive genotypic

    information of the patient and the detailed molecular

    classification of the disease in order to specify, with

    precision and high efficiency, an individualized

    treatment.

    Breast cancer is one of the most common malignancies

    affecting women, the lifetime risk being approximately

    10%. Breast cancer is both genetically and histo-

    pathologically heterogeneous, and the mechanisms

    underlying breast cancer development remain largely

    unknown. Breast cancer patients diagnosed with the same

    stage of disease often have remarkably different

    responses to therapy and overall outcome. Even with the

    strongest prognostic indicators such as lymph node status,

    estrogen receptor expression and histological grade, it is

    not possible to accurately classify breast tumors

    according to their clinical behavior. Genomic background

    and variations in the transcriptional programs account for

    much of the observed diversity. The PrognoChip project

    aims at the identification and validation of signature

    gene expression profiles of breast tumors correlating with

    other epidemiological or clinical parameters.

    Towards these goals, scientists from different scientific

    disciplines join forces: Molecular Biology

    (Institute of Molecular Biology & Biotechnology,

    FORTH; http://www.imbb.forth.gr), Medicine

    (University Hospital, University of Crete, Surgical

    Oncology; and Prolipsis, a diagnostic centre in Athens),

    Biostatistics and Computer Science (Institute of

    Computer Science, FORTH; http://www.ics.forth.gr). We

    expect that the synergy between Medicine, Molecular

    Biology, and Biomedical Informatics, will provide us

    with unique means and experience to evaluate gene

    expression signatures that will outperform the currently

    used parameters in therapy prediction and clinical

    prognosis of breast cancer.

    II. POST-GENOMICS, MICROARRAYS AND BREAST CANCER

    Since the discovery of the first oncogene about 25 years

    ago, a large body of research has convincingly

    demonstrated that the initiation and progression of

    cancers involve the accumulation of genetic aberrations in

    the cell. Recently, through studying blood samples of

    families in which there is a history of breast cancer,

    scientists have isolated and identified a gene linked to

    breast cancer. A person who has this modified gene,

    labelled BRCA1, has an 85% lifetime risk of developing

    breast cancer, as well as a significantly higher risk of

    ovarian cancer. By being able to identify these genes

    through particular markers associated with the gene,

    doctors will know which individuals are more susceptible

    to cancer and therefore can follow the proper procedure.

    The recent isolation of the gene BRCA1 has prompted

    investigators to identify other genes that may contribute

    to breast cancer, ovarian cancer and the breast-ovarian

    cancer syndrome. Research and technological

    development have implicated a number of other

    breast-cancer-related genes. These genes and their role in

    starting or growing breast cancer are listed in Table I

    (refer to http://www.breasted.org/genetics.html for a

    detailed description and references).

    Molecular diagnostics is a rapidly advancing field in

    which insights into disease mechanisms are being

    elucidated by use of new gene-based biomarkers. Until

    recently, diagnostic and prognostic assessment of

    diseased tissues and tumours relied heavily on indirect

    indicators that permitted only general classifications into

    broad histological or morphological subtypes and did not

    take into account the alterations in individual gene

    expression.

    In this context, global gene expression analysis using

    microarrays now offers unprecedented opportunities to

    obtain molecular signatures of the state of activity of

    diseased cells and patient samples. This groundbreaking

    approach of studying cancer promises to provide a better

    understanding of the underlying mechanism for

    tumourigenesis, more accurate diagnosis, more

    comprehensive prognosis, and more effective therapeutic

    interventions [KHA, 01].

    Within the past years, two major advances have taken

    place. First, microarray-based expression profiling has

    shown promise with the preliminary demonstration that

    clustering techniques can predict clinical outcome in

    lymphoma [ALI, 00], paediatric leukaemia [YEO, 02],

    and breast cancer [SOR, 01], [VEE, 02]. Related results

    for breast cancer have demonstrated the ability of

    microarray-based expression profiling to detect tumour

    cells in peripheral blood samples, to predict

    chemotherapy responses in fine-needle aspiration samples

    in neoadjuvant chemotherapy, and, most importantly, to

    predict disease-free survival and overall survival from

    profiles in breast cancer surgical specimens [BER, 00],

    [HED, 01]. Second, in breast cancer genetics, genes like

    CHEK2 and HER2/neu receptor tyrosine kinase were

    identified as low-penetrance breast cancer susceptibility

    genes and are targets of specific drugs [LAB, 01]. These

    studies demonstrate the transition of basic biologic

    research to clinical application.

    TABLE I

    BREAST CANCER GENES AND THEIR ROLE

    Gene                        Role

    BRCA1, BRCA2                Tumor suppressor
    BP1                         stimulates cell growth
    HER2, erb-B, Erb-B2, neu    stimulates cell growth
    P65                         stimulates cell growth
    ATM                         controls cell division
    ZNF21                       increases the longevity of cells
    PDGF                        stimulates the growth of blood vessels
    Bcl-1                       regulates the cell cycle
    RB                          regulates the cell cycle
    EK2                         involved in repair of damaged DNA

    Furthermore, analysis of primary tumours and derived

    metastases showed very similar expression profiles

    indicating that the molecular program of a primary

    tumour is generally retained in its metastases [SCH, 03].

    Given the clinical heterogeneity of breast cancer,

    microarrays are an ideal tool to establish a more accurate

    classification [PIN, 03]. The predictive power of this

    approach is much greater than that of currently used

    approaches, but remains to be validated in prospective

    clinical studies. If confirmed in that setting, the

    expression profiling classifier would result, at a minimum, in

    about a four-fold reduction in the number of patients receiving

    adjuvant therapy unnecessarily.

    III. INDIVIDUALIZED MEDICINE AND

    BIOMEDICAL INFORMATICS

    It becomes evident that in order to fully grasp the

    mechanisms of a disease we do not only need an

    understanding of the genetic basis of the disease (dealing

    with large amounts of data and related functional

    genomics approaches such as gene-expression profiling),

    but we also need to integrate the knowledge normally

    processed in the clinical setting.

    The use of genetic and proteomic data in addition to

    clinical symptoms for medical decision-making will

    contribute to the expected, continued shift towards

    evidence-based medicine. This vision can only be

    realized with an enormous investment into: (i) technology

    able to produce the genomic and proteomic data and the

    initial comparison of produced results with reference

    databases; (ii) creation of standardized databases that

    combine clinical history, symptoms and signs, laboratory

    and procedural results, and genetic and proteomic data in

    raw as well as intelligently processed formats; (iii)

    technology that assures confidential access to these data

    by those who need access, and foolproof security against

    unauthorized access; (v) extraction of knowledge out of

    these huge databases, their expert interpretation and

    matching against existing computational models; (vi)

    development of novel explanatory and predictive models

    for the above, abstraction of the results to the clinical

    level, and incorporation of the extracted knowledge into

    algorithms and standardized clinical guidelines where

    feasible; and finally (vii) implementation of the new

    guidelines into the clinical decision-making process.

    In this setting a new discipline, namely Biomedical

    Informatics (BMI), is emerging. BMI aims to offer the

    appropriate technology in order to support the emerging

    individualized medicine environment, and allow

    optimized, individualized healthcare using all relevant

    sources of information. Collaborative efforts between

    Medical Informatics (MI) and Bioinformatics (BI) could

    provide new insights and create the synergy needed to

    meet the challenges of building novel genomic applications in medicine

    (refer to http://bioinfomed.isciii.es for a white

    paper on the field, and to http://www.infobiomed.net

    for a relevant EU-funded NoE

    project).

    BI enables us to acquire fundamental knowledge

    about biological processes. The inclusion of clinical

    information in biomedical informatics opens the gateway

    to genetic risk profiling of patients, new paradigms in

    disease diagnoses and prognoses and novel approaches to

    drug discovery based on the correlation of genetic and

    molecular knowledge of diseases with clinical

    information of the patients. In this setting the respective

    biomedical informatics R&D agenda is directed

    towards the design, development and deployment of an

    integrated clinico-genomics operational framework

    in which functional genomics and disease-combating

    research are coupled and guided by related medical

    knowledge.

    IV. THE PROGNOCHIP PROJECT

    PrognoChip is an ongoing project that joins forces and

    efforts from different scientific disciplines: Molecular

    Biology (Institute of Molecular Biology &

    Biotechnology, FORTH), Medicine (Dept of Surgical

    Oncology, University of Crete, and PROLIPSIS,

    diagnostic breast cancer center), and Computer Science

    (Institute of Computer Science, FORTH). The major tasks

    (already scheduled and initiated) within PrognoChip are

    briefly presented below.

    Medicine/ Tissue collection & Histopathology. (a)

    surgical specimens are collected from breast cancer

    patients who undergo any type of surgical

    treatment; as soon as the specimen is removed from the

    patient it is carried immediately (in less than 20 minutes)

    to the histopathology department in order to avoid ex

    vivo ischemia phenomena; (b) a tissue procurement

    protocol is designed for tissue collection and storage;

    sections are taken from the growing edge of the tumour,

    stored in a -80°C dry freezer for further reference, placed

    in RNAlater reagent for further RNA extraction, and

    covered with optimal cutting temperature compound

    (OCT) intended for immunohistochemistry - a Tissue-Bank

    system was designed and developed (already in use)

    for proper tissue filing and management; (c) a set of

    immunohistology and FISH methods for growth factors

    and their receptors, especially HER-2 (up-regulated in

    30% of breast carcinomas), are assessed for the

    characterization of breast carcinomas; all patients with

    malignant disease are staged according to the new TNM

    system. In the context of PrognoChip the plan is to

    obtain full-genome expression profiles from

    approximately 200 individual breast carcinomas.

    Ethical Issues: Patients are informed and consent to the

    molecular and genetic data analysis of their tissue and

    blood samples. They also consent to the use of the data

    for scientific purposes provided that their anonymity is

    secured. For this purpose, special security and

    authorization mechanisms are provided and made

    operational in the context of the deployed clinical

    information systems (see below).

    Molecular Biology/ Microarrays: A DNA microarray of

    long oligonucleotide probes has been designed,

    representing all known human genes, approximately

    35,000 different transcripts of 27,000 different genes.

    Additional positive and negative control oligos have been

    included for the quality control of the procedure and the

    normalization of data. Oligonucleotide probes are spotted

    on a coated activated glass slide, at a density of

    approximately 2250 elements/cm2. A common reference

    material has been decided for the study, consisting of a

    defined set of cell-line extracts, ensuring accurate

    quantitation of gene expression for most of the genes.

    An RNA extraction, amplification and fluorescent

    labeling protocol has been developed, allowing the

    analysis of small samples. After hybridization,

    fluorescence intensity images are acquired, using a

    confocal laser scanner, as 16-bit TIFF files. From these

    images, fluorescence intensities are obtained using

    dedicated image analysis software. Special plug-ins are

    developed for data pre-processing (filtering,

    normalization) and analysis.
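
    The pre-processing step (filtering, normalization) can be sketched as follows. This is only an illustration under stated assumptions: a two-channel design (sample versus the common reference), a hypothetical intensity cut-off for filtering, and global median centring of log2 ratios as the normalization. The paper does not specify which methods the plug-ins actually implement.

```python
import math
from statistics import median


def preprocess(spots, min_intensity=100.0):
    """Filter weak spots, then median-centre the log2(sample/reference) ratios.

    spots: list of (gene_id, sample_intensity, reference_intensity) tuples.
    min_intensity is a hypothetical cut-off, not a value taken from the paper.
    """
    # Filtering: keep spots whose two channels are both at or above the cut-off.
    kept = [(g, s, r) for g, s, r in spots
            if s >= min_intensity and r >= min_intensity]
    if not kept:
        return {}
    # Log-transform the sample/reference ratio of every surviving spot.
    log_ratios = {g: math.log2(s / r) for g, s, r in kept}
    # Global median normalization: shift so that the median log-ratio is zero.
    centre = median(log_ratios.values())
    return {g: v - centre for g, v in log_ratios.items()}


# Toy intensities (invented for illustration only).
spots = [
    ("geneA", 800.0, 200.0),
    ("geneB", 400.0, 400.0),
    ("geneC", 100.0, 200.0),
    ("geneD", 50.0, 900.0),   # sample channel below the cut-off, so it is dropped
]
normalized = preprocess(spots)
```

    Here the weak spot geneD is removed before normalization, so it does not influence the median used for centring.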

    V. TOWARDS AN INTEGRATED CLINICO-

    GENOMICS ENVIRONMENT

    In the context of the PrognoChip project we have

    planned and initiated efforts towards the

    delivery of an Integrated Clinico-Genomics Information

    Technology Environment (ICG-ITE), with combined

    genetic and individualized medicine being the target.

    Fig. 1. Architectural layout and building blocks of the Integrated

    Clinico-Genomics Information Technology Environment

    The envisioned building blocks of ICG-ITE include (see Figure 1):

    - a set of clinical information systems to keep patients' clinical information (i.e., clinical, laboratory and histo-pathology information systems) based on Electronic Health Care Record (EHCR) standard data models [TSI, 02], [COA, 99], [HL7, 02];

    - a genomic information system (GIS) to store and manage the specifications of the respective microarray experiments (i.e., chip design, hybridizations, etc.), analyze measured bioassays, and store samples' genomic information. GIS is based on the BASE system (http://base.thep.lu.se), whose underlying standard genomic data model ([MIA, 04]) and functionality were extended to meet the project requirements; and

    - a middleware layer for information/data integration and intelligent processing, realized by a set of integrated software components that enable: (i) the seamless and efficient extraction of data from the various data and information sources (clinical and genomic); (ii) uniform information modeling, enabled by the utilization of standard clinical/genomic data models and respective ontologies [XML, 04], [KAR, 03]; (iii) uniform information representation, enabled by the utilization and the appropriate customization of RDF/XML technology; and (iv) intelligent data processing and visualization, enabled by a suite of data-mining components and tools [TIB, 99], [ZWE, 99], [PER, 00], [POT, 04], [SYM, 04].
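
    To make the idea of a uniform RDF/XML representation concrete, the sketch below serializes one sample's clinical and genomic facts into a single RDF/XML description using only the Python standard library. The `icg` vocabulary URI, the property names and the sample values are hypothetical and chosen for illustration; they are not the project's actual data model.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ICG = "http://example.org/icg#"  # hypothetical vocabulary, for illustration only

ET.register_namespace("rdf", RDF)
ET.register_namespace("icg", ICG)


def sample_to_rdf(sample_id, clinical, expression):
    """Serialize one sample's clinical and genomic facts as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": ICG + sample_id})
    # Clinical facts become literal-valued properties of the sample.
    for name, value in clinical.items():
        ET.SubElement(desc, f"{{{ICG}}}{name}").text = str(value)
    # Each expression measurement becomes a nested (blank-node) description.
    for gene, log_ratio in expression.items():
        prop = ET.SubElement(desc, f"{{{ICG}}}expression")
        node = ET.SubElement(prop, f"{{{RDF}}}Description")
        ET.SubElement(node, f"{{{ICG}}}gene").text = gene
        ET.SubElement(node, f"{{{ICG}}}logRatio").text = str(log_ratio)
    return ET.tostring(root, encoding="unicode")


# Invented example values.
rdf_xml = sample_to_rdf(
    "sample-001",
    {"tumourGrade": 2, "lymphNodeStatus": "negative"},
    {"BRCA1": -1.2, "HER2": 2.4},
)
print(rdf_xml)
```

    Because clinical and genomic facts share one resource identifier, a query layer such as RQL [KAR, 03] could in principle address both through the same graph.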

    The demanding clinical and genomic data integration

    environment poses the need to elaborate on the concept of

    Integrated Electronic Health Care Record (IEHCR)

    architectures [TSI, 02], utilize the respective

    technological advances, and extend the standard clinical

    data models to include and amalgamate genomic ones. In

    this context, the provided security and authorisation

    infrastructure is fully employed.

    VI. KNOWLEDGE DISCOVERY AND SYNERGISTIC

    CLINICO-GENOMICS DECISION-MAKING

    The vision of PrognoChip is to realize and operationalize

    integrated clinico-genomics knowledge-discovery and

    decision-making scenarios, along the lines of the tasks and

    procedures outlined below.

    A. From Phenotypes to Genotypes

    Applying advanced data-mining operations (e.g.,

    discriminatory analysis for gene-selection) on the

    acquired gene-expression matrix we are able to identify

    potential discriminatory genes, i.e., genes that

    distinguish between identified phenotypes (e.g.,

    phenotypes A and B; see Figure 2). These genes compose

    and indicate the molecular signature or gene markers of

    the specific patients' phenotypes. In other words, we are

    able to link potential phenotypical profiles to respective

    molecular or genotypical ones. Such advancement may be

    utilised in the course of both prognostic and therapeutic

    decision-making processes. That is, respective patients,

    whose gene-expression profiles match the discovered

    molecular signature, could be detected to belong to one of

    the identified phenotypes. Then, according to established

    guidelines and treatment protocols, prognostic indicators

    may be assessed and patients assigned to (potentially)

    available treatment protocols.
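
    As a minimal sketch of discriminatory gene selection, the code below ranks genes by a signal-to-noise score between two phenotypes, A and B, and keeps the top-ranked ones as a candidate signature. The score is a simple stand-in chosen for illustration; it is not the discretization-based method of [POT, 04], and the expression values are invented.

```python
from statistics import mean, stdev


def rank_genes(expr, labels, top=2):
    """Rank genes by |mean_A - mean_B| / (sd_A + sd_B) and return the best ones.

    expr: {gene: [value per sample]}; labels: 'A' or 'B' for each sample.
    """
    scores = {}
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == "A"]
        b = [v for v, lab in zip(values, labels) if lab == "B"]
        spread = stdev(a) + stdev(b)
        # A tiny constant guards against division by zero for flat genes.
        scores[gene] = abs(mean(a) - mean(b)) / (spread + 1e-9)
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Toy gene-expression matrix: three genes measured in six samples.
labels = ["A", "A", "A", "B", "B", "B"]
expr = {
    "gene1": [2.1, 1.9, 2.0, -2.0, -1.8, -2.2],  # strongly discriminating
    "gene2": [0.1, -0.2, 0.0, 0.1, 0.0, -0.1],   # uninformative
    "gene3": [1.0, 0.2, 0.6, -0.4, 0.1, -0.6],   # weakly discriminating
}
signature = rank_genes(expr, labels, top=2)
```

    A new patient's profile, restricted to the selected genes, could then be compared against the per-phenotype averages to suggest which phenotype it resembles.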

    B. From Genotypes to Phenotypes

    The above scenario could be initiated the other way

    around. That is, applying again data-mining operations

    (e.g., unsupervised learning such as clustering) we are

    able to identify clusters of samples based on their gene-

    expression profiles. These clusters may represent

    potentially interesting genotypes. Assume that two such

    genotypical profiles are discovered and identified, X, and

    Y (based on the exact parameterization of the clustering

    process more clusters may be identified; see Figure 2).

    Having at our disposal recorded phenotypical

    information and data about the samples (i.e., response,

    positive reaction or resistance to specific

    chemotherapeutic agents and/or clinico-histopathological

    state of tumour) we may assign each, yet untreated,

    sample to one of the two classes, X or Y. Then, we may

    initiate a supervised data mining process (e.g.,

    classification) in order to discover respective predictive

    models. Each of these models represents a potential

    phenotype. In this mode of the scenario we may achieve a

    re-classification of breast cancer, i.e., a hierarchical

    organization of different disease-related phenotypes - a

    major task in cancer research. In this context, patients

    with different phenotypical profiles are (potentially)

    subject to different chemo- and/or radio-therapeutic

    protocols. So, a more individualised health-care plan

    may be devised.
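
    The two-stage scenario (unsupervised clustering, then a supervised model) can be sketched as below. The sketch assumes a minimal k-means with k = 2 for the clustering and a nearest-centroid rule as the predictive model; both are stand-ins for whatever clustering and classification methods are actually applied, and the profiles are invented.

```python
import math


def dist(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Component-wise mean of a list of profiles."""
    return [sum(col) / len(points) for col in zip(*points)]


def two_means(profiles, iterations=10):
    """Split profiles into two clusters (a minimal k-means with k = 2)."""
    # Deterministic start: the first profile and the profile farthest from it.
    centres = [profiles[0], max(profiles, key=lambda p: dist(p, profiles[0]))]
    assignment = []
    for _ in range(iterations):
        assignment = [min((0, 1), key=lambda k: dist(p, centres[k]))
                      for p in profiles]
        for k in (0, 1):
            members = [p for p, a in zip(profiles, assignment) if a == k]
            if members:
                centres[k] = centroid(members)
    return assignment, centres


def classify(profile, centres, names=("X", "Y")):
    """Nearest-centroid classifier built from the discovered clusters."""
    return names[min((0, 1), key=lambda k: dist(profile, centres[k]))]


# Toy profiles: three genes per sample, two visibly different patterns.
profiles = [
    [2.0, 1.8, -1.0], [2.2, 2.1, -0.9], [1.9, 2.0, -1.2],
    [-1.5, -1.0, 2.0], [-1.7, -1.2, 2.3], [-1.4, -0.9, 1.8],
]
assignment, centres = two_means(profiles)
predicted = classify([-1.6, -1.1, 2.1], centres)  # a new, untreated sample
```

    In practice the discovered clusters would first be checked against the recorded phenotypical data before any classifier built on them is trusted.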

    Fig. 2. Synergistic clinico-genomics decision-making and

    knowledge-discovery support.

    VII. CONCLUSION & FUTURE WORK

    Much of the genomic data of clinical relevance generated

    so far are in a format that is inappropriate for diagnostic

    testing. Very large epidemiological population samples

    followed prospectively (over a period of years) and

    characterized for their biomarker and genetic variation

    will be necessary to demonstrate the clinical utility of

    these tools. Obstacles to the routine application of these

    data in clinical practice include a cultural gap between the

    approaches to clinical practice that are currently employed

    and those that are possible with these new tools. This will

    require a change of mind of clinical oncologists. In the

    next 10 years clinical protocols will require a

    translational section based on the type of targeted

    treatment under study [CEL, 03].

    In this paper we have presented PrognoChip, a multi-

    disciplinary project that meets the aforementioned

    challenges and targets the rising need for individualised

    medicine (in terms of both prognosis and treatment). In

    the context of the project an Integrated Clinico-Genomics

    Environment was designed. The building-blocks of this

    environment are identified and specified. Various

    enabling components of the environment are already

    developed and deployed (the clinical and genomic

    information systems). Furthermore, experimentation and

    evaluation of known and (developed) innovative data-

    mining techniques is in progress. On-going R&D work (as

    related to information technology) is now directed to

    the development of the integration infrastructure, i.e., to

    the operationalisation of the middleware layer of the

    ICG-ITE. The plan is to have a first (prototype)

    implementation of the whole system by June 2005. By

    then, the clinical and genomic profiles of a number of

    original patients' samples will also be available and

    recorded in the respective information systems.

    PrognoChip is a very demanding project, in terms of both

    human and infrastructure resources. So, resources from

    other, directly related, on-going projects (in which

    organizations in PrognoChip participate) are also utilised.

    In this context, we want to acknowledge INFOBIOMED

    (a network of excellence project; funded by the EU IST

    program; http://www.infobiomed.net), where

    results from a nationally-funded project (as PrognoChip)

    will be utilised and exploited in the context of a trans-

    European one.

    REFERENCES

    [ALI, 00] Alizadeh et al., "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling," Nature, 403, pp. 503-511, 2000.

    [BER, 00] F. Bertucci et al., "Gene expression profiling of primary breast carcinomas using arrays of candidate genes," Hum Mol Genet, 9, pp. 2981-2991, 2000.

    [CEL, 03] J. Celis, "Proteomics and Functional Genomics in Translational Cancer Research: towards an integrated approach." Presentation in Cancer: Molecular Targets for novel Therapies. 3rd Simposio Scientifico, Pabellón San Carlos, Hospital Clinico, Madrid, April 2003.

    [COA, 99] COAS, Clinical Observations Access Service (COAS), Final Submission, OMG Document: corbamed/99-03-25, 1999.

    [HED, 01] I. Hedenfalk et al., "Gene-expression profiles in hereditary breast cancer," N Engl J Med, 344, pp. 539-548, 2001.

    [HL7, 02] HL7, Health Level 7: Reference Information Model (RIM), http://www.hl7.org/library/data-model/RIM/C30118/rim.htm.

    [KAR, 03] G. Karvounarakis, A. Magkanaraki, S. Alexaki, V. Christophides, D. Plexousakis, M. Scholl, and K. Tolle, "Querying the Semantic Web with RQL," Computer Networks and ISDN Systems Journal, 42(5), pp. 617-640, 2003.

    [KHA, 01] J. Khan et al., "Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks," Nat Med, 7, pp. 673-679, 2001.

    [LAB, 01] E. Landesman-Bollag et al., "Protein kinase CK2 in mammary gland tumourigenesis," Oncogene, 20, pp. 3247-3257, 2001.

    [MIA, 04] MIAME Web site, http://www.mged.org/Workgroups/MIAME/miame.html, accessed Dec. 2004.

    [PER, 00] C.M. Perou et al., "Molecular portraits of human breast tumours," Nature, 406, pp. 747-752, 2000.

    [PIN, 03] R. Pinedo, "Cancer Clinical Trials in the next decade." Presentation in Cancer: Molecular Targets for novel Therapies. 3rd Simposio Scientifico, Pabellón San Carlos, Hospital Clinico, Madrid, April 2003.

    [POT, 04] G. Potamias, L. Koumakis, and V. Moustakis, "Gene Selection via Discretized Gene-Expression Profiles and Greedy Feature-Elimination," LECT NOTES ARTIF INT (LNAI), 3025, pp. 256-266, 2004.

    [SCH, 03] U. Schmidt et al., "Cancer diagnosis and microarrays," Int J Biochem Cell Biol, 35(2), pp. 119-124, 2003.

    [SOR, 01] T. Sorlie et al., "Gene expression patterns of breast carcinomas distinguish tumour subclasses with clinical implications," Proc Natl Acad Sci, Sep 11, 98(19), pp. 10869-10874, 2001.

    [SYM, 04] A. Symeonidis and I.G. Tollis, "Visualization of Biological Information with Circular Drawings," LNCS, 3337, pp. 468-478, 2004.

    [TIB, 99] R. Tibshirani, T. Hastie, M. Eisen, D. Ross, D. Botstein, and P. Brown, "Clustering methods for the analysis of DNA microarray data," Technical Report, Department of Statistics, Stanford University, 1999.

    [TSI, 02] M. Tsiknakis, D.G. Katehakis, and S.C. Orphanoudakis, "An Open, Component-based Information Infrastructure for Integrated Health Information Networks," International Journal of Medical Informatics, 68(1-3), pp. 3-26, 2002.

    [VEE, 02] E. van der Veer et al., "Gene expression profiling predicts clinical outcome of breast cancer," Nature, 415(6871), pp. 530-536, 2002.

    [XML, 04] XML Semantics, http://www.w3.org/DesignIssues/Toolbox.html, accessed Dec. 2004.

    [YEO, 02] E.J. Yeoh et al., "Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling," Cancer Cell, 1(2), pp. 133-43, 2002.

    [ZWE, 99] G. Zweiger, "Knowledge discovery in gene-expression-microarray data: mining the information output of the genome," Trends Biotechnol., 17(11), pp. 429-436, 1999.