Potamias_etal05


Breast Cancer and Biomedical Informatics: The PrognoChip Project

G. Potamias1,2*, A. Analyti1, D. Kafetzopoulos3, D. Plexousakis1,2, P. Poirazi3, M. Reczko1, I.G. Tollis1,2, M. E. Sanidas4, E. Stathopoulos5, Tsiknakis1, S. Vassilaros6

Institute of Computer Science

Foundation for Research & Technology Hellas (FORTH)

Heraklion, 711 10 Greece

Phone: +30-2810-391693, Fax: +30-2810-311601, E-mail:[email protected]

_____________________________

* Corresponding author

1Institute of Computer Science (ICS), FORTH, 2Dept. of Computer Science, University of Crete, 3Institute of Molecular Biology and Biotechnology (IMBB), FORTH, 4Dept. of Surgical Oncology, Medical School, University of Crete, Heraklion, Crete, Greece, 5Dept. of Pathology, Medical School, University of Crete, Crete, Greece, 6Prolipsis Diagnostic Breast Center, Athens, Greece.

Abstract - Breast cancer is the most common malignancy affecting women, with a lifetime risk of approximately 10%. Breast cancer is both genetically and histopathologically heterogeneous, and the underlying development mechanisms remain largely unknown. Global expression analysis using microarrays offers unprecedented opportunities to obtain molecular signatures of the state of activity of diseased cells and patient samples. The predictive power of this approach is much greater than that of currently used approaches, but remains to be validated in prospective clinical studies. The PrognoChip project is based on the synergy between Bioinformatics and Medical Informatics, following the lines of the emerging discipline of Biomedical Informatics. In this context we are moving towards the specification and creation of an Integrated Clinico-Genomics Information Technology Environment (ICG-ITE), in which the smooth integration of the clinical and genomic worlds, together with the intelligent processing of the underlying data, enables the identification of reliable and clinically valid (i.e., in terms of prognosis) molecular (gene) markers.

Keywords: Breast cancer, Biomedical informatics, semantic integration, data-mining

I. INTRODUCTION

The completion of the human genome and the development of post-genomic applications have introduced new holistic approaches and challenges in the analysis of diseases that will, in the years to come, revolutionize biomedical research and health care. A characteristic of medicine in the post-genomic era will be the consultation of both the comprehensive genotypic information of the patient and the detailed molecular classification of the disease in order to specify, with precision and high efficiency, an individualized treatment.

Breast cancer is one of the most common malignancies affecting women, with a lifetime risk of approximately 10%. Breast cancer is both genetically and histopathologically heterogeneous, and the mechanisms underlying breast cancer development remain largely unknown. Breast cancer patients diagnosed with the same stage of disease often have remarkably different responses to therapy and overall outcomes. Even with the strongest prognostic indicators, such as lymph node status, estrogen receptor expression and histological grade, it is not possible to accurately classify breast tumors according to their clinical behavior. Genomic background and variations in the transcriptional programs account for much of the observed diversity. The PrognoChip project aims at the identification and validation of signature gene expression profiles of breast tumors that correlate with other epidemiological or clinical parameters.

Towards these goals, scientists from diverse scientific disciplines join forces: Molecular Biology (Institute of Molecular Biology & Biotechnology, FORTH; http://www.imbb.forth.gr), Medicine (University Hospital, University of Crete Surgical Oncology; and Prolipsis, a diagnostic centre in Athens), and Biostatistics and Computer Science (Institute of Computer Science, FORTH; http://www.ics.forth.gr). We expect that the synergy between Medicine, Molecular Biology, and Biomedical Informatics will provide us with unique means and experience to evaluate gene expression signatures that will outperform the currently used parameters in therapy prediction and clinical prognosis of breast cancer.

II. POST-GENOMICS, MICROARRAYS AND BREAST CANCER

Since the discovery of the first oncogene about 25 years ago, a large body of research has convincingly demonstrated that the initiation and progression of cancers involve the accumulation of genetic aberrations in the cell. Recently, by studying blood samples of families with a history of breast cancer, scientists have isolated and identified a gene linked to breast cancer. A person who carries this modified gene, labelled BRCA1, has been reported to have a lifetime risk of developing breast cancer of up to 85%, as well as a significantly higher risk of ovarian cancer. By identifying such genes through particular markers associated with them, doctors will know which individuals are more susceptible to cancer and can therefore follow the proper procedure. The isolation of BRCA1 has prompted investigators to identify other genes that may contribute to breast cancer, ovarian cancer and the breast-ovarian cancer syndrome. Research and technological development have implicated a number of other breast-cancer related genes. These genes and their role in the initiation or growth of breast cancer are listed in Table I (refer to http://www.breasted.org/genetics.html for a detailed description and references).

Molecular diagnostics is a rapidly advancing field in which insights into disease mechanisms are being elucidated by use of new gene-based biomarkers. Until recently, diagnostic and prognostic assessment of diseased tissues and tumours relied heavily on indirect indicators that permitted only general classifications into broad histological or morphological subtypes and did not take into account the alterations in individual gene expression.

In this context, global gene expression analysis using microarrays now offers unprecedented opportunities to obtain molecular signatures of the state of activity of diseased cells and patient samples. This groundbreaking approach to studying cancer promises to provide a better understanding of the underlying mechanism for tumourigenesis, more accurate diagnosis, more comprehensive prognosis, and more effective therapeutic interventions [KHA, 01].

Within the past years, two major advances have taken place. First, microarray-based expression profiling has shown promise with the preliminary demonstration that clustering techniques can predict clinical outcome in lymphoma [ALI, 00], paediatric leukaemia [YEO, 02], and breast cancer [SOR, 01], [VEE, 02]. Related results for breast cancer have demonstrated the ability of microarray-based expression profiling to detect tumour cells in peripheral blood samples, to predict chemotherapy responses in fine-needle aspiration samples in neoadjuvant chemotherapy, and, most importantly, to predict disease-free survival and overall survival from profiles in breast cancer surgical specimens [BER, 00], [HED, 01]. Second, in breast cancer genetics, genes like CHEK2 and the HER2/neu receptor tyrosine kinase were identified as low-penetrance breast cancer susceptibility genes and are targets of specific drugs [LAB, 01]. These studies demonstrate the transition of basic biological research to clinical application.

TABLE I
BREAST CANCER GENES AND THEIR ROLE

Gene                        Role
BRCA1, BRCA2                tumor suppressor
BP1                         stimulates cell growth
HER2, erb-B, Erb-B2, neu    stimulates cell growth
P65                         stimulates cell growth
ATM                         controls cell division
ZNF21                       increases the longevity of cells
PDGF                        stimulates the growth of blood vessels
Bcl-1                       regulates the cell cycle
RB                          regulates the cell cycle
EK2                         involved in repair of damaged DNA

Furthermore, analysis of primary tumours and derived metastases showed very similar expression profiles, indicating that the molecular program of a primary tumour is generally retained in its metastases [SCH, 03]. Given the clinical heterogeneity of breast cancer, microarrays are an ideal tool to establish a more accurate classification [PIN, 03]. The predictive power of this approach is much greater than that of currently used approaches, but remains to be validated in prospective clinical studies. If confirmed in that setting, the expression profiling classifier would result, at minimum, in about a four-fold drop in the number of patients receiving adjuvant therapy unnecessarily.

III. INDIVIDUALIZED MEDICINE AND BIOMEDICAL INFORMATICS

It becomes evident that, in order to fully grasp the mechanisms of a disease, we need not only an understanding of its genetic basis (which involves large amounts of data and related functional genomics approaches such as gene-expression profiling) but also the integration of the knowledge normally processed in the clinical setting.

The use of genetic and proteomic data in addition to clinical symptoms for medical decision-making will contribute to the expected, continued shift towards evidence-based medicine. This vision can only be realized with an enormous investment into: (i) technology able to produce the genomic and proteomic data and to perform the initial comparison of the produced results with reference databases; (ii) creation of standardized databases that combine clinical history, symptoms and signs, laboratory and procedural results, and genetic and proteomic data in raw as well as intelligently processed formats; (iii) technology that assures confidential access to these data by those who need access, and foolproof security against unauthorized access; (iv) extraction of knowledge out of these huge databases, its expert interpretation and matching against existing computational models; (v) development of novel explanatory and predictive models for the above, abstraction of the results to the clinical level, and incorporation of the extracted knowledge into algorithms and standardized clinical guidelines where feasible; and finally (vi) implementation of the new guidelines into the clinical decision-making process.

In this setting a new discipline, namely Biomedical Informatics (BMI), is emerging. BMI aims to offer the appropriate technology to support the emerging individualized medicine environment, and to allow optimized, individualized healthcare using all relevant sources of information. Collaborative efforts between Medical Informatics (MI) and Bioinformatics (BI) could provide new insights and create the synergy needed to meet the challenges of creating novel genomic applications in medicine (refer to http://bioinfomed.isciii.es for a white paper on the field, and to http://www.infobiomed.net for a relevant EU funded NoE project).

BI enables us to acquire fundamental knowledge about biological processes. The inclusion of clinical information in biomedical informatics opens the gateway to genetic risk profiling of patients, new paradigms in disease diagnosis and prognosis, and novel approaches to drug discovery based on the correlation of genetic and molecular knowledge of diseases with clinical information of the patients. In this setting the respective biomedical informatics R&D agenda is directed towards the design, development and deployment of an integrated clinico-genomics operational framework in which functional genomics and disease-combating research are coupled and guided by related medical knowledge.

IV. THE PROGNOCHIP PROJECT

PrognoChip is an ongoing project that joins forces from different scientific disciplines: Molecular Biology (Institute of Molecular Biology & Biotechnology, FORTH), Medicine (Dept. of Surgical Oncology, University of Crete, and PROLIPSIS, a diagnostic breast cancer center), and Computer Science (Institute of Computer Science, FORTH). The major tasks within PrognoChip (already scheduled and initiated) are briefly presented below.

Medicine/ Tissue collection & Histopathology: (a) Surgical specimens are collected from breast cancer patients who undergo any type of surgical treatment; as soon as a specimen is removed from the patient it is carried immediately (in less than 20 minutes) to the histopathology department in order to avoid ex vivo ischemia phenomena. (b) A tissue procurement protocol has been designed for tissue collection and storage: sections are taken from the growing edge of the tumour, stored in a -80 °C dry freezer for further reference, placed in RNAlater reagent for subsequent RNA extraction, and covered with optimal cutting temperature (OCT) compound for immunohistochemistry. A Tissue-Bank system was designed and developed (and is already in use) for proper tissue filing and management. (c) A set of immunohistology and FISH methods for growth factors and their receptors, especially HER-2 (up-regulated in 30% of breast carcinomas), is assessed for the characterization of breast carcinomas; all patients with malignant disease are staged according to the new TNM system. In the context of PrognoChip the plan is to obtain full-genome expression profiles from approximately 200 individual breast carcinomas.

Ethical Issues: Patients are informed and consent to the molecular and genetic data analysis of their tissue and blood samples. They also consent to the use of the data for scientific purposes provided that their anonymity is secured. For this purpose, special security and authorization mechanisms are provided and made operational in the context of the deployed clinical information systems (see below).

Molecular Biology/ Microarrays: A DNA microarray of long oligonucleotide probes has been designed, representing all known human genes: approximately 35,000 different transcripts of 27,000 different genes. Additional positive and negative control oligos have been included for the quality control of the procedure and the normalization of data. Oligonucleotide probes are spotted on a coated activated glass slide, at a density of approximately 2250 elements/cm². A common reference material has been chosen for the study, consisting of a defined set of cell-line extracts, ensuring accurate quantitation of gene expression for most of the genes. An RNA extraction, amplification and fluorescent labeling protocol has been developed, allowing the analysis of small samples. After hybridization, fluorescence intensity images are acquired, using a confocal laser scanner, as 16-bit TIFF files. From these images, fluorescence intensities are obtained using dedicated image analysis software. Special plug-ins are developed for data pre-processing (filtering, normalization) and analysis.
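As an illustration of the pre-processing step, the following Python sketch filters low-intensity spots and normalizes sample-versus-reference log-ratios by median centering. It is only a minimal example under stated assumptions: the intensity threshold, the function names and the choice of median centering are hypothetical and are not taken from the project's plug-ins.

```python
import math
from statistics import median

def log_ratios(sample, reference, min_intensity=100.0):
    """Return {probe: log2(sample/reference)} for spots that pass the filter.

    min_intensity is a hypothetical cut-off, not a value used by the project.
    """
    ratios = {}
    for probe, s in sample.items():
        r = reference.get(probe)
        # Filtering: skip spots that are missing or too dim in either channel.
        if r is None or s < min_intensity or r < min_intensity:
            continue
        ratios[probe] = math.log2(s / r)
    return ratios

def median_center(ratios):
    """Global normalization: shift the log-ratios so that their median is zero."""
    if not ratios:
        return {}
    m = median(ratios.values())
    return {probe: value - m for probe, value in ratios.items()}

# Hypothetical spot intensities on the 16-bit scale (0..65535).
sample = {"g1": 4000.0, "g2": 1000.0, "g3": 500.0, "g4": 50.0}
reference = {"g1": 1000.0, "g2": 1000.0, "g3": 2000.0, "g4": 800.0}
normalized = median_center(log_ratios(sample, reference))
```

Working with log-ratios against the common reference makes measurements from different slides comparable; more elaborate schemes (for example intensity-dependent normalization) could replace the median-centering step.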

V. TOWARDS AN INTEGRATED CLINICO-GENOMICS ENVIRONMENT

In the context of the PrognoChip project we have planned, scheduled, and initiated efforts towards the delivery of an Integrated Clinico-Genomics Information Technology Environment (ICG-ITE), with combined genetic and individualized medicine being the target.



Fig. 1. Architectural layout and building blocks of the Integrated Clinico-Genomics Information Technology Environment

The envisioned building blocks of ICG-ITE include (see Figure 1):

(a) a set of clinical information systems to keep patients' clinical information (i.e., clinical, laboratory and histopathology information systems), based on Electronic Health Care Record (EHCR) standard data models [TSI, 02], [COA, 99], [HL7, 02];

(b) a genomic information system (GIS) to store and manage the specifications of the respective microarray experiments (i.e., chip design, hybridizations, etc.), to analyze the measured bioassays, and to store the samples' genomic information. GIS is based on the BASE system (http://base.thep.lu.se), whose underlying standard genomic data model [MIA, 04] and functionality were extended to meet the project requirements; and

(c) a middleware layer for information/data integration and intelligent processing, realized by a set of integrated software components that enable: (i) the seamless and efficient extraction of data from the various data and information sources (clinical and genomic); (ii) uniform information modeling, enabled by the utilization of standard clinical/genomic data models and respective ontologies [XML, 04], [KAR, 03]; (iii) uniform information representation, enabled by the utilization and appropriate customization of RDF/XML technology; and (iv) intelligent data processing and visualization, enabled by a suite of data-mining components and tools [TIB, 99], [ZWE, 99], [PER, 00], [POT, 04], [SYM, 04].
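The integration role of the middleware layer can be illustrated with a small Python sketch that joins clinical and genomic records on a shared sample identifier and serializes the result as RDF/XML. This is an assumption-laden illustration only: the `http://example.org/icg-ite#` vocabulary, the field names and both functions are hypothetical and do not describe the actual ICG-ITE components.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ICG_NS = "http://example.org/icg-ite#"  # hypothetical vocabulary, for illustration only

def integrate(clinical, genomic):
    """Join clinical and genomic records that share the same sample identifier."""
    merged = {}
    for sample_id, record in clinical.items():
        if sample_id in genomic:
            merged[sample_id] = {**record, **genomic[sample_id]}
    return merged

def to_rdf_xml(merged):
    """Serialize the merged records as one rdf:Description per sample."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("icg", ICG_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for sample_id, fields in sorted(merged.items()):
        description = ET.SubElement(
            root,
            f"{{{RDF_NS}}}Description",
            {f"{{{RDF_NS}}}about": ICG_NS + sample_id},
        )
        for name, value in sorted(fields.items()):
            prop = ET.SubElement(description, f"{{{ICG_NS}}}{name}")
            prop.text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical records keyed by an anonymized sample identifier.
clinical = {
    "S1": {"er_status": "positive", "grade": 2},
    "S2": {"er_status": "negative", "grade": 3},
}
genomic = {"S1": {"array_id": "A-001"}}
merged = integrate(clinical, genomic)
rdf_xml = to_rdf_xml(merged)
```

Only sample S1 appears in both sources, so only S1 is carried into the integrated view; a real middleware would additionally map each source schema onto the shared clinical and genomic data models before serialization.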

The demanding clinical and genomic data integration environment poses the need to elaborate on the concept of Integrated Electronic Health Care Record (IEHCR) architectures [TSI, 02], utilize the respective technological advances, and extend the standard clinical data models to include and amalgamate genomic ones. In this context, the provided security and authorisation infrastructure is fully employed.

VI. KNOWLEDGE DISCOVERY AND SYNERGISTIC CLINICO-GENOMICS DECISION-MAKING

The vision of PrognoChip is to realize and operationalize integrated clinico-genomics knowledge-discovery and decision-making scenarios, along the lines of the tasks and procedures outlined below.

A. From Phenotypes to Genotypes

By applying advanced data-mining operations (e.g., discriminatory analysis for gene selection) to the acquired gene-expression matrix we are able to identify potential discriminatory genes, i.e., genes that distinguish between identified phenotypes (e.g., phenotypes A and B; see Figure 2). These genes compose and indicate the molecular signature, or gene markers, of the specific patients' phenotypes. In other words, we are able to link potential phenotypical profiles to the respective molecular or genotypical ones. Such an advance may be utilised in the course of both prognostic and therapeutic decision-making processes. That is, patients whose gene-expression profiles match the discovered molecular signature could be identified as belonging to one of the identified phenotypes. Then, according to established guidelines and treatment protocols, prognostic indicators may be assessed and patients admitted to (potentially) available treatment protocols.
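A minimal Python sketch of discriminatory gene selection is given below. It ranks genes by the absolute Welch t-statistic between the two phenotypes, which is a generic stand-in chosen for illustration and not the project's own method (the paper cites [POT, 04] for its gene-selection technique); the function names and the toy expression values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch t-statistic between two lists of expression values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def rank_genes(expr, labels, top_k):
    """Return the top_k genes that best separate phenotype 'A' from 'B'.

    expr maps each gene to a list with one value per sample;
    labels holds the phenotype ('A' or 'B') of each sample, in the same order.
    """
    scores = {}
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == "A"]
        b = [v for v, lab in zip(values, labels) if lab == "B"]
        scores[gene] = welch_t(a, b)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:top_k]

# Hypothetical normalized log-ratios for six samples.
labels = ["A", "A", "A", "B", "B", "B"]
expr = {
    "gene1": [2.1, 1.9, 2.0, -2.0, -1.8, -2.2],  # clearly higher in phenotype A
    "gene2": [0.1, -0.2, 0.0, 0.1, 0.0, -0.1],   # uninformative
    "gene3": [-1.0, -1.2, -0.8, 1.1, 0.9, 1.0],  # clearly higher in phenotype B
}
signature = rank_genes(expr, labels, top_k=2)
```

Here the selected signature consists of gene1 and gene3, while the uninformative gene2 is discarded. With thousands of genes and few samples, any such ranking would need correction for multiple testing and validation on independent samples.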

B. From Genotypes to Phenotypes

The above scenario could be initiated the other way around. That is, by again applying data-mining operations (e.g., unsupervised learning such as clustering) we are able to identify clusters of samples based on their gene-expression profiles. These clusters may represent potentially interesting genotypes. Assume that two such genotypical profiles, X and Y, are discovered and identified (depending on the exact parameterization of the clustering process, more clusters may be identified; see Figure 2). Having at our disposal recorded phenotypical information and data about the samples (i.e., response, positive reaction or resistance to specific chemotherapeutic agents and/or the clinico-histopathological state of the tumour), we may assign each, yet untreated, sample to one of the two classes, X or Y. Then, we may initiate a supervised data-mining process (e.g., classification) in order to discover respective predictive models. Each of these models represents a potential phenotype. In this mode of the scenario we may achieve a re-classification of breast cancer, i.e., a hierarchical organization of different disease-related phenotypes, which is a major task in cancer research. In this context, patients with different phenotypical profiles are (potentially) subject to different chemo- and/or radio-therapeutic protocols. So, a more individualised health-care plan may be devised.
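The two steps of this scenario can be sketched in Python as follows: an unsupervised step that splits the samples into two clusters, followed by a simple rule that assigns a new sample to cluster X or Y. The sketch assumes k-means with k = 2 and a nearest-centroid classifier purely for illustration; the paper does not prescribe these particular algorithms, and all names and values are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def centroid(points):
    """Component-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(points) for col in zip(*points)]

def two_means(profiles, max_iter=100):
    """Split the profiles into two clusters (k-means with k = 2)."""
    # Deterministic start: the first profile and the profile farthest from it.
    c0 = profiles[0]
    c1 = max(profiles, key=lambda p: dist(p, c0))
    centers = [list(c0), list(c1)]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
            for p in profiles
        ]
        if new_assignment == assignment:
            break  # converged: no profile changed cluster
        assignment = new_assignment
        for k in (0, 1):
            members = [p for p, a in zip(profiles, assignment) if a == k]
            if members:
                centers[k] = centroid(members)
    return assignment, centers

def classify(profile, centers):
    """Nearest-centroid rule: assign a new profile to cluster 'X' or 'Y'."""
    return "X" if dist(profile, centers[0]) <= dist(profile, centers[1]) else "Y"

# Hypothetical two-gene expression profiles of five samples.
profiles = [[2.0, 1.8], [2.2, 2.1], [1.9, 2.0], [-2.0, -1.9], [-2.1, -2.2]]
assignment, centers = two_means(profiles)
new_label = classify([1.5, 1.7], centers)
```

In practice the clusters would be compared with the recorded phenotypical information before any predictive model is trusted, and the number of clusters would itself be a parameter to explore.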



Fig. 2. Synergistic clinico-genomics decision-making and knowledge-discovery support.

VII. CONCLUSION & FUTURE WORK

Much of the genomic data of clinical relevance generated so far is in a format that is inappropriate for diagnostic testing. Very large epidemiological population samples, followed prospectively (over a period of years) and characterized for their biomarker and genetic variation, will be necessary to demonstrate the clinical utility of these tools. Obstacles to the routine application of these data in clinical practice include a cultural gap between the approach to clinical practice that is currently employed and that which is possible with these new tools. Closing this gap will require a change of mindset among clinical oncologists. In the next 10 years clinical protocols will require a translational section based on the type of targeted treatment under study [CEL, 03].

In this paper we have presented PrognoChip, a multi-disciplinary project that meets the aforementioned challenges and targets the rising need for individualised medicine (in terms of both prognosis and treatment). In the context of the project an Integrated Clinico-Genomics Environment was designed. The building blocks of this environment have been identified and specified. Various enabling components of the environment are already developed and deployed (the clinical and genomic information systems). Furthermore, experimentation with and evaluation of known and newly developed data-mining techniques is in progress. On-going R&D work (as related to information technology) is now directed to the development of the integration infrastructure, i.e., to the operationalisation of the middleware layer of the ICG-ITE. The plan is to have a first (prototype) implementation of the whole system by June 2005. By then, the clinical and genomic profiles of a number of original patient samples will also be available and recorded in the respective information systems.

PrognoChip is a very demanding project, in terms of both human and infrastructure resources. So, resources from other, directly related, on-going projects (in which organizations participating in PrognoChip are involved) are also utilised. In this context, we want to acknowledge INFOBIOMED (a network of excellence project funded by the EU IST program; http://www.infobiomed.net), where results from a nationally funded project such as PrognoChip will be utilised and exploited in the context of a trans-European one.

REFERENCES

[ALI, 00] Alizadeh et al, Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature, 403, pp. 503-511, 2000.

[BER, 00] F. Bertucci et al, Gene expression profiling of primary breast carcinomas using arrays of candidate genes, Hum Mol Genet, 9, pp. 2981-2991, 2000.

[CEL, 03] J. Celis, Proteomics and Functional Genomics in Translational Cancer Research: towards an integrated approach. Presentation in Cancer: Molecular Targets for novel Therapies. 3rd Simposio Scientifico, Pabellón San Carlos, Hospital Clinico, Madrid, April 2003.

[COA, 99] COAS, Clinical Observations Access Service (COAS), Final Submission, OMG Document: corbamed/99-03-25, 1999.

[HED, 01] I. Hedenfalk et al, Gene-expression profiles in hereditary breast cancer, N Engl J Med, 344, pp. 539-548, 2001.

[HL7, 02] HL7 Health Level 7: Reference Information Model (RIM), http://www.hl7.org/library/data-model/RIM/C30118/rim.htm.

[KAR, 03] G. Karvounarakis, A. Magkanaraki, S. Alexaki, V. Christophides, D. Plexousakis, M. Scholl, and K. Tolle, Querying the Semantic Web with RQL, Computer Networks and ISDN Systems Journal, 42(5), pp. 617-640, 2003.

[KHA, 01] J. Khan et al, Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks, Nat Med, 7, pp. 673-679, 2001.

[LAB, 01] E. Landesman-Bollag et al, Protein kinase CK2 in mammary gland tumourigenesis, Oncogene, 20, pp. 3247-3257, 2001.

[MIA, 04] MIAME Web site. http://www.mged.org/Workgroups/MIAME/miame.html, accessed Dec. 2004.

[PER, 00] C.M. Perou et al, Molecular portraits of human breast tumours, Nature, 406, pp. 747-752, 2000.

[PIN, 03] R. Pinedo, Cancer Clinical Trials in the next decade. Presentation in Cancer: Molecular Targets for novel Therapies. 3rd Simposio Scientifico, Pabellón San Carlos, Hospital Clinico, Madrid, April 2003.

[POT, 04] G. Potamias, L. Koumakis, and V. Moustakis, Gene Selection via Discretized Gene-Expression Profiles and Greedy Feature-Elimination, LECT NOTES ARTIF INT (LNAI), 3025, pp. 256-266, 2004.

[SCH, 03] U. Schmidt et al, Cancer diagnosis and microarrays, Int J Biochem Cell Biol, 35(2), pp. 119-124, 2003.

[SOR, 01] T. Sorlie et al, Gene expression patterns of breast carcinomas distinguish tumour subclasses with clinical implications, Proc Natl Acad Sci, Sep 11, 98(19), pp. 10869-10874, 2001.

[SYM, 04] A. Symeonidis and I.G. Tollis, Visualization of Biological Information with Circular Drawings, LNCS, 3337, pp. 468-478, 2004.

[TIB, 99] R. Tibshirani, T. Hastie, M. Eisen, D. Ross, D. Botstein, and P. Brown, Clustering methods for the analysis of DNA microarray data, Technical Report, Department of Statistics, Stanford University, 1999.

[TSI, 02] M. Tsiknakis, D.G. Katehakis, and S.C. Orphanoudakis, An Open, Component-based Information Infrastructure for Integrated Health Information Networks, International Journal of Medical Informatics, 68(1-3), pp. 3-26, 2002.

[VEE, 02] E. van der Veer et al, Gene expression profiling predicts clinical outcome of breast cancer, Nature, 415(6871), pp. 530-536, 2002.

[XML, 04] XML Semantics. http://www.w3.org/DesignIssues/Toolbox.html, accessed Dec. 2004.

[YEO, 02] E.J. Yeoh et al, Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling, Cancer Cell, 1(2), pp. 133-43, 2002.

[ZWE, 99] G. Zweiger, Knowledge discovery in gene-expression-microarray data: mining the information output of the genome, Trends Biotechnol., 17(11), pp. 429-436, 1999.
