Verastem, Inc. | 215 First Street | Suite 440 | Cambridge | Massachusetts | 02142
An Alternative Splicing Signature that Identifies Breast Cancer Stem Cells
Irina M. Shapiro, Jiangwen Zhang, Alan G. Derr, Christian M. Vidal, Alissa A. Neill, Qunli Xu, Daniel W. Paterson, Jonathan A. Pachter and David T. Weaver
Verastem, Inc., Cambridge, MA
Tumors frequently contain a sub-population of cells, referred to as cancer stem cells (CSCs) or tumor-initiating (TI) cells, with the abilities to self-renew and to regenerate all cell types within the tumor. Recent studies suggest that the process of epithelial-mesenchymal transition (EMT) may contribute to the generation of CSCs within the tumor. Although alternative splicing has been shown to play a role in the biology of EMT, the occurrence and role of alternative splicing in CSC biology are largely unexplored. Given the role of CSCs in the recurrence and spread of cancer, there is an urgent need to develop new agents that target CSCs. Development of CSC-targeted drugs will be greatly facilitated by biomarkers that can identify CSCs to aid in patient selection and determination of drug response. Analysis of alternative splicing in CSCs may provide valuable new CSC-specific markers. In this study, a TI gene expression signature (Creighton et al., 2009), an EMT gene expression signature (Gupta et al., 2009; Taube et al., 2009) and a Basal B/Luminal breast cancer subtype classifier were used in Support Vector Machine (SVM) analysis of 41 human breast cancer cell lines to identify changes in alternative splicing. We discovered 209 cassette exon splicing events from the union of these 3 classifiers, of which 68 splicing events were concordant. Interestingly, GO and KEGG pathway analysis using these 68 alternatively spliced events demonstrated enrichment of genes encoding key drivers of the CSC phenotype, including cell migration, motility, and cell adhesion pathways, as well as extracellular matrix-receptor interactions. SVM analysis of an independent NCI-60 cancer cell line dataset determined that the top 60 exons from the breast cancer cell line training group identified 96% of the CSC-high cell lines and 90% of the CSC-low cell lines.
To extend the analysis to human tumor samples, we assessed the whole transcriptome data from human breast cancers (81 patients, Lin et al., 2009) using a CSC centroid gene signature model that clustered tumor samples into CSC-high and CSC-low subgroups. Interestingly, a centroid model based on the 68 alternative splicing events similarly identified the CSC-high and CSC-low breast cancers with high accuracy. The CSC-high subgroup contained mostly triple negative breast cancers, known to have increased frequencies of CSC and EMT phenotypes, suggestive of the therapeutic importance of this alternative splicing signature for identification of CSCs in patients. Q-PCR analysis showed that several of the alternative splicing events observed in the CSC-high subgroup were further enriched in tumorspheres grown from human breast cancer cell lines. This supports involvement of these alternative splicing events in the CSC renewal process. The CSC-associated alternative splicing signature identified here will be further refined to develop new CSC-specific diagnostic markers to stratify breast cancer patients and monitor response to novel CSC-targeted therapies.
A) Classifiers used for alternative splicing analysis of 30 Breast Cancer cell lines. B) Venn diagram of the overlap among alternatively spliced cassette exons identified using different classifiers. C) KEGG and GO analysis based on the union of 209 alternative cassette exons (154 genes).
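The overlap shown in the Venn diagram can be illustrated with plain set operations. The sketch below is hedged: it assumes that "concordant" means exons selected by all three classifiers, which the poster does not state explicitly, and the exon identifiers and the helper names `summarize_overlap` and `pairwise_only` are made up for illustration.

```python
from itertools import combinations


def summarize_overlap(exon_sets):
    """Return (union, concordant) for a mapping of classifier name -> set of exon IDs.

    Assumption: "concordant" is taken to mean "selected by every classifier".
    """
    sets = list(exon_sets.values())
    union = set().union(*sets)
    concordant = set.intersection(*sets)
    return union, concordant


def pairwise_only(exon_sets):
    """Exons shared by a pair of classifiers but not by all of them."""
    concordant = set.intersection(*exon_sets.values())
    return {
        (a, b): (exon_sets[a] & exon_sets[b]) - concordant
        for a, b in combinations(sorted(exon_sets), 2)
    }


# Toy exon identifiers, invented for this example (not taken from the poster).
toy = {
    "TI": {"e1", "e2", "e3", "e5"},
    "EMT": {"e2", "e3", "e4"},
    "BasalB_Luminal": {"e3", "e4", "e5", "e6"},
}

union, concordant = summarize_overlap(toy)
shared_by_two = pairwise_only(toy)
print(len(union), sorted(concordant))
```

With real data, each set would hold the cassette exons that pass the P value cutoff for one classifier, and the sizes of `union` and `concordant` would correspond to the 209 and 68 reported above.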
ABSTRACT
INTRODUCTION
• Alternative splicing analysis of 30 BC cell lines identified 68 alternative cassette exons that predict CSC potential in the independent NCI-60 dataset.
• The CSC gene expression centroid model identified CSC-high and CSC-low NCI-60 cell lines and primary breast cancer samples.
• Substantial agreement was observed between classification of breast cancer samples using the exon CSC centroid model and the gene expression CSC centroid model.
• Several alternative cassette exons were enriched in tumorspheres of SUM159 cells, implying involvement of these splicing events in the biology of CSCs.
Fig 1: Identification of alternative splicing events using three independent classifiers.
[Fig 1 panels A and B. A: three classifiers (tumor-initiating gene signature, 463 genes; EMT gene signature, 178 genes; BasalB vs. Luminal breast cancer cell line subtype classifier) applied to 30 BC cell lines using the Splicing Index algorithm and SVM analysis. B: Venn diagram of the TI signature, EMT signature, and BasalB/Luminal classifier exon sets; region counts 16, 62, 28, 68, 33, 6, 36; union of 209 cassette exons, P value < 0.001.]
Fig 2: Hierarchical clustering of 30 BC or NCI-60 cell lines based on the union of 209 alternative exons separates them into subgroups.
[Fig 1 panel C bar charts. GO terms (x-axis: -log10(P value), 0 to 10): cell motion, cell adhesion, biological adhesion, cell migration, blood vessel development, vasculature development, extracellular structure organization, cell motility. KEGG pathways (x-axis: -log10(P value), 0 to 6): Pathways in cancer, Focal adhesion, ECM-receptor interaction, MAPK signaling pathway, ErbB signaling pathway, Toll-like receptor signaling pathway.]
[Fig 2 heatmap labels. Rows: 209 union exons. A: 30 BC cell lines (training dataset), grouped as Luminal, BasalB, and BasalA cell lines. B: NCI-60 cell lines (independent dataset).]
A) Hierarchical clustering of 30 BC cell lines based on the union of 209 alternatively spliced cassette exons. B) Hierarchical clustering of NCI-60 cell lines based on the union of 209 alternatively spliced cassette exons. The 68 concordant exons were ranked using SVM to test their classification power on NCI-60 cell lines.
Schematic of the centroid analysis design to predict CSC potential of cell lines and primary cancer samples.
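A centroid model of this kind can be sketched as follows. This is a minimal illustration, not the poster's implementation: the poster does not say how the centroid score is computed, so the score used here (Pearson correlation with the CSC-high centroid minus correlation with the CSC-low centroid) is an assumption, and the function names and toy profiles are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def build_centroids(profiles, labels):
    """Average the training profiles feature by feature within each class."""
    groups = {}
    for profile, label in zip(profiles, labels):
        groups.setdefault(label, []).append(profile)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in groups.items()
    }


def centroid_score(sample, centroids, high="CSC-high", low="CSC-low"):
    """Assumed score: correlation with the high centroid minus the low centroid."""
    return pearson(sample, centroids[high]) - pearson(sample, centroids[low])


def classify(sample, centroids):
    """Assign the class whose centroid the sample resembles more."""
    return "CSC-high" if centroid_score(sample, centroids) > 0 else "CSC-low"


# Toy training set: 4 features (genes or exons) per cell line, invented values.
train_profiles = [
    [2.0, 1.5, -1.0, -2.0],
    [1.8, 1.0, -0.5, -1.5],
    [-1.5, -1.0, 1.2, 2.0],
    [-2.0, -0.5, 0.8, 1.5],
]
train_labels = ["CSC-high", "CSC-high", "CSC-low", "CSC-low"]
centroids = build_centroids(train_profiles, train_labels)

call_a = classify([1.0, 0.8, -0.2, -1.1], centroids)
call_b = classify([-1.0, -0.3, 0.5, 1.2], centroids)
print(call_a, call_b)
```

The same functions apply whether the features are gene expression values or exon-level values, which is what makes a gene centroid and an exon centroid directly comparable on the same samples.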
METHODS
Fig 3: Analysis of primary breast cancer samples using the CSC gene expression or exon signature centroids.
A) 84 human breast cancer tumor samples (Lin et al., 2009) clustered based on the 68 CSC exon signature centroid. Groups of tumor samples with high or low CSC component cluster together. B) 84 human breast cancer tumor samples clustered based on the CSC gene expression signature centroid. Groups of tumor samples with high or low CSC component cluster together.
Fig 4: CSC gene expression centroid correlates with the CSC SI exon centroid in predicting CSC high/low tumor samples as well as triple negative cancer samples.
[Fig 4 scatter plots. A: Exon CSC centroid score (y) vs. Gene CSC centroid score (x); Cohen Kappa = 0.6. B: Triple Negative Signature score (y) vs. Exon CSC centroid score (x); fit y = -0.4678x - 0.012, R² = 0.7337. C: Triple Negative Signature score (y) vs. Gene CSC centroid score (x); fit y = -0.5207x + 0.0015, R² = 0.6063.]
Fig 5: Experimental validation of alternative splicing events in tumorspheres of SUM159 cells.
[Centroid analysis schematic. Gene arm: CSC gene signature, then CSC gene centroid model (training dataset: 30 Breast Cancer cell lines; 2 clusters: CSC high and CSC low), then application of the CSC gene centroid model to NCI-60 cell lines and primary BC samples. Exon arm: concordant 68 alternative cassette exons, then CSC exon centroid model (training dataset: 41 Breast Cancer cell lines), then application of the CSC exon centroid model to NCI-60 cell lines and primary BC samples. Final step: compare centroids, gene vs. exon.]
RESULTS
SUMMARY
A) 84 human breast cancer patient samples (Lin et al., 2009; 34 HER2-positive, 26 luminal, and 24 basal) analyzed using the CSC gene or alternative exon centroid models. Each dot represents an individual tumor sample. Substantial agreement is observed between the two model predictions (Cohen Kappa = 0.6). B) A centroid model built using a Triple Negative gene signature (Lehmann et al., 2011) is in agreement with the CSC alternative exon centroid used to classify BC patient samples in (A). C) Same as (B) for the CSC gene expression centroid.
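Cohen's kappa measures agreement between two sets of categorical calls after correcting for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two lists of CSC-high/CSC-low calls. The calls are invented toy data, not the 84 tumor samples, and the function name `cohen_kappa` is just an illustrative choice.

```python
from collections import Counter


def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length lists of categorical calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(calls_a)
    # Observed agreement: fraction of samples given the same call by both models.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement: product of the marginal frequencies, summed over categories.
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both models always make the same single call
    return (observed - expected) / (1 - expected)


# Toy calls for 10 samples from a gene-based and an exon-based model (made up).
gene_calls = ["high"] * 5 + ["low"] * 5
exon_calls = ["high"] * 4 + ["low"] + ["low"] * 4 + ["high"]

kappa = cohen_kappa(gene_calls, exon_calls)
print(round(kappa, 3))
```

In this toy case the models agree on 8 of 10 samples and chance agreement is 0.5, so kappa is (0.8 - 0.5) / (1 - 0.5).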
[Heatmap cluster labels: High CSC, Low CSC, mixed.]
Top 60 exons together can identify 96% of cell lines with low CSC potential and 90% of cell lines with high CSC potential from the NCI-60 dataset.
CSC-specific biomarkers:
• Patient stratification
• Drug response
Alternative splicing enriches the pool of biomarkers. Examples:
• FGFR2 (Holzmann K., et al., J. Nucleic Acids, 2012)
• p120Catenin
A) Schematic of the experimental validation of alternative splicing events. Exon expression was normalized to gene expression. B) Bar graph of changes in exon inclusion between tumorspheres of SUM159 cells and cells grown on plastic (2D). Enrichment of CSC-related exons is expected in tumorspheres.
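The normalization described in this caption can be sketched numerically. The poster does not give the exact formula, so the version below is an assumption: the exon signal is divided by the signal of its parent gene in each condition, and the change in inclusion is reported as a log2 ratio of tumorspheres over 2D culture. The function names and input values are hypothetical.

```python
from math import log2


def normalized_exon_level(exon_signal, gene_signal):
    """Exon signal relative to its parent gene (both as positive relative quantities)."""
    if exon_signal <= 0 or gene_signal <= 0:
        raise ValueError("signals must be positive relative quantities")
    return exon_signal / gene_signal


def inclusion_log2_change(exon_sphere, gene_sphere, exon_2d, gene_2d):
    """log2 ratio of gene-normalized exon level, tumorspheres over 2D culture.

    Positive values mean the exon is relatively enriched in tumorspheres.
    """
    sphere = normalized_exon_level(exon_sphere, gene_sphere)
    flat = normalized_exon_level(exon_2d, gene_2d)
    return log2(sphere / flat)


# Toy relative quantities (invented): the exon rises 4-fold while the gene is unchanged.
enriched = inclusion_log2_change(exon_sphere=8.0, gene_sphere=4.0, exon_2d=2.0, gene_2d=4.0)

# Exon and gene both double, so the gene-normalized exon level does not change.
unchanged = inclusion_log2_change(exon_sphere=4.0, gene_sphere=8.0, exon_2d=2.0, gene_2d=4.0)
print(enriched, unchanged)
```

Normalizing to the gene separates a change in exon inclusion from a change in overall transcription of the gene, which is why the second toy case yields no change.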
LB-197
Top 5 exons can achieve 100% accuracy in identifying cell lines with high and low CSC potential from 30 BC cell lines.