7/28/2019 Potamias_etal05
1/6
Breast Cancer and Biomedical Informatics: The PrognoChip Project
G. Potamias1,2,*, A. Analyti1, D. Kafetzopoulos3, D. Plexousakis1,2, P. Poirazi3, M. Reczko1,
I.G. Tollis1,2, M. E. Sanidas4, E. Stathopoulos5, Tsiknakis1, S. Vassilaros6
Institute of Computer Science
Foundation for Research & Technology Hellas (FORTH)
Heraklion, 711 10 Greece
Phone: +30-2810-391693, Fax: +30-2810-311601, E-mail:[email protected]
_____________________________
* Corresponding author
1Institute of Computer Science (ICS), FORTH, 2Dept. of Computer Science, University of Crete, 3Institute of Molecular Biology and Biotechnology (IMBB), FORTH, 4Dept. of Surgical Oncology, Medical School, University of Crete, Heraklion, Crete, Greece, 5Dept. of Pathology, Medical School, University of Crete, Crete, Greece, 6Prolipsis Diagnostic Breast Center, Athens, Greece.
Abstract - Breast cancer is the most common malignancy
affecting women, the lifetime risk being approximately
10%. Breast cancer is both genetically and
histopathologically heterogeneous, and the underlying
development mechanisms remain largely unknown. Global
expression analysis using microarrays offers unprecedented
opportunities to obtain molecular signatures of the state of
activity of diseased cells and patient samples. The predictive
power of this approach is much greater than that of
currently used approaches, but remains to be validated in
prospective clinical studies. The PrognoChip project is
based on the synergy between Bioinformatics and Medical
Informatics, following the lines of the emerging discipline
of Biomedical Informatics. In this context we are moving
towards the specification and creation of an Integrated
Clinico-Genomics Information Technology Environment
(ICG-ITE), where the smooth integration of the
clinical and the genomics worlds, together with the intelligent
processing of the underlying data, enables the identification
of reliable and clinically valid (i.e., in terms of prognosis)
molecular (gene) markers.
Keywords: Breast cancer, Biomedical informatics, semantic
integration, data-mining
I. INTRODUCTION
The completion of the human genome and the
development of post-genomic applications have
introduced new holistic approaches and challenges in the
analysis of diseases that will, in the years to come,
revolutionize biomedical research and health care. A
characteristic of medicine in the post-genomic era will be
the consultation of both the comprehensive genotypic
information of the patient and the detailed molecular
classification of the disease in order to specify, with
precision and high efficiency, an individualized
treatment.
Breast cancer is one of the most common malignancies
affecting women, the lifetime risk being approximately
10%. Breast cancer is both genetically and
histopathologically heterogeneous, and the mechanisms
underlying breast cancer development remain largely
unknown. Breast cancer patients diagnosed with the same
stage of disease often have remarkably different
responses to therapy and overall outcome. Even with the
strongest prognostic indicators such as lymph node status, estrogen receptor expression and histological grade, it is
not possible to accurately classify breast tumors
according to their clinical behavior. Genomic background
and variations in the transcriptional programs account for
much of the observed diversity. The PrognoChip project
aims at the identification and validation of signature
gene expression profiles of breast tumors correlating with
other epidemiological or clinical parameters.
Towards these goals, scientists from distant scientific
disciplines join forces and efforts: Molecular Biology
(Institute of Molecular Biology & Biotechnology,
FORTH; http://www.imbb.forth.gr), Medicine
(University Hospital, University of Crete Surgical
Oncology; and Prolipsis, a diagnostic centre in Athens),
Biostatistics and Computer Science (Institute of
Computer Science, FORTH; http://www.ics.forth.gr). We
expect that the synergy between Medicine, Molecular
Biology, and Biomedical Informatics will provide us
with unique means and experience to evaluate gene
expression signatures that will outperform the currently
used parameters in therapy prediction and clinical
prognosis of breast cancer.
II. POST-GENOMICS, MICROARRAYS AND BREAST CANCER
Since the discovery of the first oncogene about 25 years
ago, a large body of research has convincingly
demonstrated that the initiation and progression of
cancers involve the accumulation of genetic aberrations in
the cell. Recently, through studying blood samples of
families in which there is a history of breast cancer,
scientists have isolated and identified a gene linked to
breast cancer. A person who has this modified gene,
labelled BRCA1, has an 85% lifetime risk of developing
breast cancer, as well as a significantly higher risk of
ovarian cancer. By being able to identify these genes
through particular markers associated with the gene,
doctors will know which individuals are more susceptible to cancer and therefore can follow the proper procedure.
The recent isolation of the gene BRCA1 has prompted
investigators to identify other genes that may contribute
to breast cancer, ovarian cancer and the breast-ovarian
cancer syndrome. Research and technological
development have implicated a number of other
breast-cancer related genes. These genes and their role in
starting or growing breast cancer are listed in Table I
(refer to http://www.breasted.org/genetics.html for a
detailed description and references).
Molecular diagnostics is a rapidly advancing field in
which insights into disease mechanisms are being
elucidated by use of new gene-based biomarkers. Until
recently, diagnostic and prognostic assessment of
diseased tissues and tumours relied heavily on indirect
indicators that permitted only general classifications into
broad histological or morphological subtypes and did not
take into account the alterations in individual gene
expression.
In this context, global gene expression analysis using
microarrays now offers unprecedented opportunities to
obtain molecular signatures of the state of activity of
diseased cells and patient samples. This groundbreaking
approach to studying cancer promises to provide a better
understanding of the underlying mechanism for
tumourigenesis, more accurate diagnosis, more
comprehensive prognosis, and more effective therapeutic
interventions [KHA, 01].
Within the past few years, two major advances have taken
place. First, microarray-based expression profiling has
shown promise with the preliminary demonstration that
clustering techniques can predict clinical outcome in
lymphoma [ALI, 00], paediatric leukaemia [YEO, 02],
and breast cancer [SOR, 01], [VEE, 02]. Related results
for breast cancer have demonstrated the ability of
microarray-based expression profiling to detect tumour
cells in peripheral blood samples, to predict
chemotherapy responses in fine-needle aspiration samples
in neoadjuvant chemotherapy, and, most importantly, to
predict disease-free survival and overall survival from
profiles in breast cancer surgical specimens [BER, 00],
[HED, 01]. Second, in breast cancer genetics, genes like
CHEK2 and the HER2/neu receptor tyrosine kinase were
identified as low-penetrance breast cancer susceptibility
genes and are targets of specific drugs [LAB, 01]. These
studies demonstrate the transition of basic biologic
research to clinical application.
TABLE I
BREAST CANCER GENES AND THEIR ROLE
Gene                       Role
BRCA1, BRCA2               Tumor suppressor
BP1                        stimulates cell growth
HER2, erb-B, Erb-B2, neu   stimulates cell growth
P65                        stimulates cell growth
ATM                        controls cell division
ZNF21                      increases the longevity of cells
PDGF                       stimulates the growth of blood vessels
Bcl-1                      regulates the cell cycle
RB                         regulates the cell cycle
EK2                        involved in repair of damaged DNA
Furthermore, analysis of primary tumours and derived
metastases showed very similar expression profiles
indicating that the molecular program of a primary
tumour is generally retained in its metastases [SCH, 03].
Given the clinical heterogeneity of breast cancer,
microarrays are an ideal tool to establish a more accurate
classification [PIN, 03]. The predictive power of this
approach is much greater than that of currently used
approaches, but remains to be validated in prospective
clinical studies. If confirmed in that setting, the
expression profiling classifier would, at minimum, reduce
about four-fold the number of patients receiving adjuvant
therapy unnecessarily. Recent breast cancer studies have
demonstrated the ability of microarray-based expression
profiling to detect tumor cells in peripheral blood
samples, to predict chemotherapy responses in
fine-needle aspiration samples in neoadjuvant chemotherapy,
and, most importantly, to predict disease-free survival and
overall survival from profiles in breast cancer surgical
specimens.
III. INDIVIDUALIZED MEDICINE AND
BIOMEDICAL INFORMATICS
It becomes evident that, in order to fully grasp the
mechanisms of a disease, we need not only an
understanding of the genetic basis of the disease, which
involves dealing with large amounts of data and related
functional genomics approaches (such as gene-expression
profiling), but also to integrate the knowledge normally
processed in the clinical setting.
The use of genetic and proteomic data in addition to
clinical symptoms for medical decision-making will
contribute to the expected, continued shift towards
evidence-based medicine. This vision can only be
realized with an enormous investment into: (i) technology
able to produce the genomic and proteomic data and the
initial comparison of produced results with reference
databases; (ii) creation of standardized databases that
combine clinical history, symptoms and signs, laboratory
and procedural results, and genetic and proteomic data in
raw as well as intelligently processed formats; (iii)
technology that assures confidential access to these data
by those who need access, and foolproof security against
unauthorized access; (iv) extraction of knowledge out of
these huge databases, their expert interpretation and
matching against existing computational models; (v)
development of novel explanatory and predictive models
for the above, abstraction of the results to the clinical
level, and incorporation of the extracted knowledge into
algorithms and standardized clinical guidelines where
feasible; and finally (vi) implementation of the new
guidelines into the clinical decision-making process.
In this setting a new discipline, namely Biomedical
Informatics (BMI), is arising. BMI aims to offer the
appropriate technology in order to support the emerging
individualized medicine environment, and allow
optimized, individualized healthcare using all relevant
sources of information. Collaborative efforts between
Medical Informatics (MI) and Bioinformatics (BI) could
provide new insights and create the synergy needed to
meet the challenges of creating novel genomic applications
in medicine (refer to http://bioinfomed.isciii.es for a
white paper on the field, and to http://www.infobiomed.net
for a relevant EU-funded NoE project).
BI enables us to acquire fundamental knowledge
about biological processes. The inclusion of clinical
information in biomedical informatics opens the gateway
to genetic risk profiling of patients, new paradigms in
disease diagnoses and prognoses, and novel approaches to
drug discovery based on the correlation of genetic and
molecular knowledge of diseases with clinical
information of the patients. In this setting the respective
biomedical informatics R&D agenda is directed
towards the design, development and deployment of an
integrated clinico-genomics operational framework
where functional genomics and disease-combating
research are coupled and guided by related medical
knowledge.
IV. THE PROGNOCHIP PROJECT
PrognoChip is a (running) project that joins forces and
efforts from different scientific disciplines: Molecular
Biology (Institute of Molecular Biology &
Biotechnology, FORTH), Medicine (Dept of Surgical
Oncology, University of Crete, and PROLIPSIS,
diagnostic breast cancer center), and Computer Science
(Institute of Computer Science, FORTH). The major tasks
(already scheduled and initiated) within PrognoChip are
briefly presented in the sequel.
Medicine/ Tissue collection & Histopathology. (a)
surgical specimens are collected from breast cancer
patients that undergo any type of surgical
treatment; as soon as the specimen is removed from the
patient it is carried immediately (in less than 20 minutes)
to the histopathology department in order to avoid ex
vivo ischemia phenomena; (b) a tissue procurement
protocol is designed for tissue collection and storage;
sections are taken from the growing edge of the tumour,
stored in a -80 °C dry freezer for further reference, placed
in RNAlater reagent for further RNA extraction, and
covered with optimal cutting temperature compound
(OCT) intended for immunohistochemistry - a Tissue-Bank
system was designed and developed (already in use)
for proper tissue filing and management; (c) a set of
immunohistology and FISH methods for growth factors
and their receptors, especially HER-2 (up-regulated in
30% of breast carcinomas), are assessed for the
characterization of breast carcinomas; all patients with
malignant disease are staged according to the new TNM
system. In the context of PrognoChip the plan is to
obtain full-genome expression profiles from
approximately 200 individual breast carcinomas.
Ethical Issues: Patients are informed and consent to the
molecular and genetic data analysis of their tissue and
blood samples. They also consent to the use of the data
for scientific purposes provided that their anonymity is
secured. For this purpose, special security and
authorization mechanisms are provided and made
operational in the context of the deployed clinical
information systems (see below).
Molecular Biology/ Microarrays: A DNA microarray of
long oligonucleotide probes has been designed,
representing all known human genes, approximately
35,000 different transcripts of 27,000 different genes.
Additional positive and negative control oligos have been
included for the quality control of the procedure and the
normalization of data. Oligonucleotide probes are spotted
on a coated activated glass slide, at a density of
approximately 2250 elements/cm2. A common reference
material has been chosen for the study, consisting of a
defined set of cell-line extracts, ensuring accurate
quantitation of gene expression for most of the genes.
An RNA extraction, amplification and fluorescent
labeling protocol has been developed, allowing the
analysis of small samples. After hybridization,
fluorescence intensity images are acquired, using
a confocal laser scanner, as 16-bit TIFF files. From these
images, fluorescence intensities are obtained using
dedicated image analysis software. Special plug-ins are
developed for data pre-processing (filtering,
normalization) and analysis.
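To make the pre-processing step concrete, the following is a minimal sketch of spot filtering and normalization against the common reference. It is an illustration under stated assumptions, not the project's plug-in code: the function names, the intensity cut-off of 100 and the choice of global median centering are all hypothetical.

```python
import math
from statistics import median

def log_ratios(sample, reference, min_intensity=100.0):
    """Return {gene: log2(sample / reference)} for spots that pass the filter.

    Spots whose sample or reference intensity is below `min_intensity`
    (a hypothetical cut-off) are dropped as unreliable.
    """
    ratios = {}
    for gene, s in sample.items():
        r = reference.get(gene)
        if r is None or s < min_intensity or r < min_intensity:
            continue
        ratios[gene] = math.log2(s / r)
    return ratios

def median_center(ratios):
    """Global normalization: shift log-ratios so that their median is zero."""
    if not ratios:
        return {}
    m = median(ratios.values())
    return {gene: value - m for gene, value in ratios.items()}

# Toy intensities (invented for illustration): tumour sample vs. common reference.
sample = {"g1": 800.0, "g2": 1600.0, "g3": 400.0, "g4": 50.0}
reference = {"g1": 400.0, "g2": 400.0, "g3": 400.0, "g4": 400.0}
normalized = median_center(log_ratios(sample, reference))
```

In this toy run the weak spot g4 is filtered out, and the remaining log-ratios are shifted so that the median gene sits at zero.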
V. TOWARDS AN INTEGRATED CLINICO-
GENOMICS ENVIRONMENT
In the context of the PrognoChip project we have
planned, scheduled, and initiated efforts towards the
delivery of an Integrated Clinico-Genomics Information
Technology Environment (ICG-ITE), with combined
genetic and individualized medicine being the target.
Fig. 1. Architectural layout and building blocks of the Integrated
Clinico-Genomics Information technology Environment
The envisioned building blocks of ICG-ITE include (see
Figure 1):
- a set of clinical information systems to keep patients'
  clinical information (i.e., clinical, laboratory and
  histo-pathology information systems) based on
  Electronic Health Care Record (EHCR) standard
  data-models [TSI, 02], [COA, 99], [HL7, 02],
- a genomic information system (GIS) to store and
  manage the specifications of the respective microarray
  experiments (i.e., chip design, hybridizations, etc.),
  analyze measured bioassays, as well as to store
  samples' genomic information. GIS is based on the
  BASE system (http://base.thep.lu.se), where the
  underlying standard genomic data model
  ([MIA, 04]) and functionality were extended to meet
  the project requirements, and
- a middleware layer for information/data integration
  and intelligent processing - realized by a puzzle of
  integrated software components that enable: (i) the
  seamless and efficient extraction of data from the
  various data and information sources (clinical and
  genomic); (ii) uniform information modeling - enabled
  by the utilization of standard clinical/genomic data
  models and respective ontologies [XML, 04], [KAR,
  03]; (iii) uniform information representation - enabled
  by the utilization and the appropriate customization of
  RDF/XML technology; and (iv) intelligent data
  processing and visualization - enabled by a suite of
  data-mining components and tools [TIB, 99], [ZWE,
  99], [PER, 00], [POT, 04], [SYM, 04].
The demanding clinical and genomic data integration
environment poses the need to elaborate on the concept of
Integrated Electronic Health Care Record (IEHCR)
architectures [TSI, 02], utilize the respective
technological advances, and extend the standard clinical
data models to include and amalgamate genomic ones. In
this context, the provided security and authorisation
infrastructure is fully employed.
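The integration idea can be illustrated with a small sketch in which clinical and genomic records of the same sample are mapped to uniform subject-predicate-object statements, in the spirit of RDF. This is only a sketch under assumptions: the namespace URI, the field names and the helper functions are hypothetical, and plain tuples stand in for a real RDF/XML serialization.

```python
from typing import Dict, List, Tuple

# Hypothetical namespace; the actual ICG-ITE vocabulary is not given in the text.
NS = "http://example.org/icg-ite#"

Triple = Tuple[str, str, str]

def to_triples(sample_id: str, record: Dict[str, object], source: str) -> List[Triple]:
    """Turn one record into (subject, predicate, object) statements."""
    subject = f"{NS}sample/{sample_id}"
    triples = [(subject, f"{NS}hasSource", source)]
    for field, value in sorted(record.items()):
        triples.append((subject, f"{NS}{source}/{field}", str(value)))
    return triples

def integrate(clinical: Dict[str, dict], genomic: Dict[str, dict]) -> Dict[str, List[Triple]]:
    """Merge both sources, keeping only samples that appear in both."""
    merged = {}
    for sid in sorted(set(clinical) & set(genomic)):
        merged[sid] = (to_triples(sid, clinical[sid], "clinical")
                       + to_triples(sid, genomic[sid], "genomic"))
    return merged

# Invented example records keyed by an anonymous sample identifier.
clinical = {"S1": {"er_status": "positive", "grade": 2},
            "S2": {"er_status": "negative"}}
genomic = {"S1": {"HER2_log_ratio": 1.8},
           "S3": {"HER2_log_ratio": -0.2}}
merged = integrate(clinical, genomic)
```

Keying both sources on an anonymous sample identifier keeps the merge independent of patient-identifying data, which stays inside the clinical systems.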
VI. KNOWLEDGE DISCOVERY AND SYNERGISTIC
CLINICO-GENOMICS DECISION-MAKING
The vision of PrognoChip is to realize and operationalize
integrated clinico-genomics knowledge-discovery and
decision-making scenarios, along the lines of the tasks and
procedures outlined below.
A. From Phenotypes to Genotypes
Applying advanced data-mining operations (e.g.,
discriminatory analysis for gene-selection) on the
acquired gene-expression matrix we are able to identify
potential discriminatory genes, i.e., genes that
distinguish between identified phenotypes (e.g.,
phenotypes A and B; see Figure 2). These genes compose
and indicate the molecular signature or gene markers of
the specific patients' phenotypes. In other words, we are
able to link potential phenotypical profiles to respective
molecular or genotypical ones. Such advancement may be
utilised in the course of both prognostic and therapeutic
decision-making processes. That is, patients
whose gene-expression profiles match the discovered
molecular signature could be detected to belong to one of
the identified phenotypes. Then, according to established
guidelines and treatment protocols, prognostic indicators
may be assessed, with patients admitted to (potentially)
available treatment protocols.
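As a rough illustration of gene selection between two phenotypes, the sketch below ranks genes by a signal-to-noise score, i.e. the difference of class means divided by the sum of class standard deviations. This score is a generic stand-in chosen for brevity, not the discretization-based method of [POT, 04]; the function names, the `top_k` parameter and the toy data are assumptions.

```python
from statistics import mean, stdev

EPS = 1e-9  # guards against division by zero when a gene is constant in both classes

def signal_to_noise(values_a, values_b):
    """(mean_A - mean_B) / (sd_A + sd_B); needs at least two samples per class."""
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b) + EPS)

def select_genes(expression, labels, top_k=2):
    """Rank genes by absolute score and return the top_k names plus all scores.

    expression: {gene: [value per sample]}, labels: ["A" or "B" per sample].
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == "A"]
        b = [v for v, lab in zip(values, labels) if lab == "B"]
        scores[gene] = signal_to_noise(a, b)
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return ranked[:top_k], scores

# Invented expression matrix: three samples of phenotype A, three of phenotype B.
labels = ["A", "A", "A", "B", "B", "B"]
expression = {
    "g_up":   [5.0, 5.2, 4.8, 1.0, 1.2, 0.8],
    "g_flat": [2.0, 2.1, 1.9, 2.0, 1.9, 2.1],
    "g_down": [0.5, 0.7, 0.6, 3.0, 3.2, 3.1],
}
signature, scores = select_genes(expression, labels)
```

A positive score marks a gene that is more highly expressed in phenotype A, a negative score one that is more highly expressed in phenotype B; genes with scores near zero carry no discriminatory signal.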
B. From Genotypes to Phenotypes
The above scenario could be initiated the other way
around. That is, applying again data-mining operations
(e.g., unsupervised learning such as clustering) we are
able to identify clusters of samples based on their
gene-expression profiles. These clusters may represent
potentially interesting genotypes. Assume that two such
genotypical profiles are discovered and identified, X and
Y (depending on the exact parameterization of the clustering
process, more clusters may be identified; see Figure 2).
Having at our disposal recorded phenotypical
information and data about the samples (i.e., response,
positive reaction or resistance to specific
chemotherapeutic agents and/or clinico-histopathological
state of tumour) we may assign each, yet untreated,
sample to one of the two classes, X or Y. Then, we may
initiate a supervised data mining process (e.g.,
classification) in order to discover respective predictive
models. Each of these models represents a potential
phenotype. In this mode of the scenario we may achieve a
re-classification of breast cancer, i.e., a hierarchical
organization of different disease-related phenotypes - a
major task in cancer research. In this context, patients
with different phenotypical profiles are (potentially)
subject to different chemo- and/or radio-therapeutic
protocols. So, a more individualised health-care plan may
be devised.
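The cluster-then-classify flow of this scenario can be sketched as follows. The example splits samples into two groups with a simple 2-means loop and then assigns a new sample to the nearest cluster centroid; both are deliberately simple stand-ins for the clustering and classification methods the project may actually use, and all names and data are hypothetical. Clusters 0 and 1 play the role of the genotypical profiles X and Y.

```python
import math

def distance(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points):
    """Component-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(points) for col in zip(*points)]

def two_means(profiles, iterations=20):
    """Split samples into clusters 0 and 1.

    Initialization is deterministic: the first profile and the profile
    farthest from it serve as the starting centroids.
    """
    names = list(profiles)
    first = profiles[names[0]]
    far = max(names, key=lambda n: distance(profiles[n], first))
    centers = [list(first), list(profiles[far])]
    assignment = {}
    for _ in range(iterations):
        new_assignment = {
            n: min((0, 1), key=lambda k: distance(profiles[n], centers[k]))
            for n in names
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for k in (0, 1):
            members = [profiles[n] for n in names if assignment[n] == k]
            if members:
                centers[k] = centroid(members)
    return assignment, centers

def classify(profile, centers):
    """Nearest-centroid rule: index of the closest cluster centre."""
    return min(range(len(centers)), key=lambda k: distance(profile, centers[k]))

# Invented two-gene profiles for four samples.
profiles = {"s1": [2.0, 0.1], "s2": [1.8, 0.2], "s3": [0.1, 2.1], "s4": [0.2, 1.9]}
assignment, centers = two_means(profiles)
new_sample_class = classify([1.7, 0.3], centers)
```

In practice the recorded phenotypical data would then be compared across the discovered clusters, and a richer classifier trained on the cluster labels.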
Fig. 2. Synergistic clinico-genomics decision-making and
knowledge-discovery support.
VII. CONCLUSION & FUTURE WORK
Much of the genomic data of clinical relevance generated
so far are in a format that is inappropriate for diagnostic
testing. Very large epidemiological population samples
followed prospectively (over a period of years) and
characterized for their biomarker and genetic variation
will be necessary to demonstrate the clinical utility of
these tools. Obstacles to the routine application of these
data in clinical practice include a cultural gap between the
approaches to clinical practice that are currently employed
and those that are possible with these new tools. Bridging
this gap will require a change of mind among clinical
oncologists. In the
next 10 years clinical protocols will require a
translational section based on the type of targeted
treatment under study [CEL, 03].
In this paper we have presented PrognoChip, a
multi-disciplinary project that meets the aforementioned
challenges and targets the rising need for individualised
medicine (in terms of both prognosis and treatment). In
the context of the project an Integrated Clinico-Genomics
Environment was designed. The building blocks of this
environment are identified and specified. Various
enabling components of the environment are already
developed and deployed (the clinical and genomic
information systems). Furthermore, experimentation with and
evaluation of known and (newly developed) innovative
data-mining techniques is in progress. On-going R&D work (as
related to information technology) is now directed to
the development of the integration infrastructure, i.e., to
the operationalisation of the middleware layer of the
ICG-ITE. The plan is to have a first (prototype)
implementation of the whole system by June 2005. By
then, the clinical and genomic profiles of a number of
original patients' samples will also be available and
recorded in the respective information systems.
PrognoChip is a very demanding project, in terms of both
human and infrastructure resources. So, resources from
other, directly related, on-going projects (in which
organizations in PrognoChip participate) are also utilised.
In this context, we want to acknowledge INFOBIOMED
(a network of excellence project, funded by the EU IST
program; http://www.infobiomed.net), where
results from a nationally-funded project (such as PrognoChip)
will be utilised and exploited in the context of a
trans-European one.
REFERENCES
[ALI, 00] Alizadeh et al, Distinct types of diffuse large B-cell
lymphoma identified by gene expression profiling, Nature, 403,
pp. 503-511, 2000.
[BER, 00] F. Bertucci et al, Gene expression profiling of
primary breast carcinomas using arrays of candidate genes, Hum
Mol Genet, 9, pp. 2981-2991, 2000.
[CEL, 03] Celis, J. Proteomics and Functional Genomics in
Translational Cancer Research: towards an integrated approach.
Presentation in Cancer: Molecular Targets for novel Therapies.
3rd Simposio Scientifico, Pabellón San Carlos, Hospital Clinico,
Madrid, April 2003.
[COA, 99] COAS, Clinical Observations Access Service
(COAS), Final Submission, OMG Document: corbamed/99-03-25,
1999.
[HED, 01] I. Hedenfalk et al, Gene-expression profiles in
hereditary breast cancer, N Engl J Med, 344, pp. 539-548, 2001.
[HL7, 02] HL7 Health Level 7: Reference Information Model
(RIM), http://www.hl7.org/library/data-model/RIM/C30118/rim.htm.
[KAR, 03] G. Karvounarakis, A. Magkanaraki, S. Alexaki, V.
Christophides, D. Plexousakis, M. Scholl, and K. Tolle.
Querying the Semantic Web with RQL. Computer Networks
and ISDN Systems Journal, 42(5), pp. 617-640, 2003.
[KHA, 01] J. Khan et al, Classification and diagnostic
prediction of cancers using gene expression profiling and
artificial neural networks, Nat Med, 7, pp. 673-679, 2001.
[LAB, 01] E. Landesman-Bollag et al, Protein kinase CK2 in
mammary gland tumourigenesis, Oncogene, 20, pp. 3247-3257,
2001.
[MIA, 04] MIAME Web site.
http://www.mged.org/Workgroups/MIAME/miame.html, accessed Dec. 2004.
[PER, 00] C.M. Perou et al, Molecular portraits of human breast
tumours, Nature, 406, pp. 747-752, 2000.
[PIN, 03] R. Pinedo, Cancer Clinical Trials in the next decade.
Presentation in Cancer: Molecular Targets for novel Therapies.
3rd Simposio Scientifico, Pabellón San Carlos, Hospital Clinico,
Madrid, April 2003.
[POT, 04] G. Potamias, L. Koumakis, and V. Moustakis, Gene
Selection via Discretized Gene-Expression Profiles and Greedy
Feature-Elimination, LECT NOTES ARTIF INT (LNAI), 3025,
pp. 256-266, 2004.
[SCH, 03] U. Schmidt et al, Cancer diagnosis and
microarrays, Int J Biochem Cell Biol, 35(2), pp. 119-124,
2003.
[SOR, 01] T. Sorlie et al, Gene expression patterns of breast
carcinomas distinguish tumour subclasses with clinical
implications, Proc Natl Acad Sci, Sep 11, 98(19),
pp. 10869-10874, 2001.
[SYM, 04] A. Symeonidis and I.G. Tollis, Visualization of
Biological Information with Circular Drawings, LNCS, 3337,
pp. 468-478, 2004.
[TIB, 99] R. Tibshirani, T. Hastie, M. Eisen, D. Ross,
D. Botstein, and P. Brown, Clustering methods for the analysis
of DNA microarray data, Technical Report, Department of
Statistics, Stanford University, 1999.
[TSI, 02] M. Tsiknakis, D.G. Katehakis, and S.C.
Orphanoudakis, An Open, Component-based Information
Infrastructure for Integrated Health Information Networks,
International Journal of Medical Informatics, 68(1-3),
pp. 3-26, 2002.
[VEE, 02] E. van der Veer et al, Gene expression profiling
predicts clinical outcome of breast cancer, Nature,
415(6871), pp. 530-536, 2002.
[XML, 04] XML Semantics. http://www.w3.org/DesignIssues/Toolbox.html, accessed Dec. 2004.
[YEO, 02] E.J. Yeoh, et al, Classification, subtype discovery,
and prediction of outcome in pediatric acute lymphoblastic
leukemia by gene expression profiling, Cancer Cell, 1(2),
pp. 133-43, 2002.
[ZWE, 99] G. Zweiger, Knowledge discovery in
gene-expression-microarray data: mining the information output of
the genome. Trends Biotechnol., 17(11), pp. 429-436, 1999.