Verastem, Inc. | 215 First Street | Suite 440 | Cambridge | Massachusetts | 02142

An Alternative Splicing Signature that Identifies Breast Cancer Stem Cells

Irina M. Shapiro, Jiangwen Zhang, Alan G. Derr, Christian M. Vidal, Alissa A. Neill, Qunli Xu, Daniel W. Paterson, Jonathan A. Pachter and David T. Weaver

Verastem, Inc., Cambridge, MA

ABSTRACT

Tumors frequently contain a sub-population of cells, referred to as cancer stem cells (CSCs) or tumor-initiating (TI) cells, with the abilities to self-renew and to regenerate all cell types within the tumor. Recent studies suggest that epithelial-mesenchymal transition (EMT) may contribute to the generation of CSCs within the tumor. Although alternative splicing has been shown to play a role in the biology of EMT, the occurrence and role of alternative splicing in CSC biology are largely unexplored. Given the role of CSCs in the recurrence and spread of cancer, there is an urgent need to develop new agents that target CSCs. Development of CSC-targeted drugs will be greatly facilitated by biomarkers that can identify CSCs to aid in patient selection and determination of drug response. Analysis of alternative splicing in CSCs may provide valuable new CSC-specific markers.

In this study, a TI gene expression signature (Creighton et al., 2009), an EMT gene expression signature (Gupta et al., 2009; Taube et al., 2009) and a Basal B/Luminal breast cancer subtype classifier were used in Support Vector Machine (SVM) analysis of 41 human breast cancer cell lines to identify changes in alternative splicing. We discovered 209 cassette exon splicing events from the union of these 3 classifiers, of which 68 splicing events were concordant. GO and KEGG pathway analysis using these 68 alternatively spliced events demonstrated enrichment of genes encoding key drivers of the CSC phenotype, including cell migration, motility, and cell adhesion pathways, as well as extracellular matrix-receptor interactions. SVM analysis of an independent NCI-60 cancer cell line dataset determined that the top 60 exons from the breast cancer cell line training group correctly identified 96% of the CSC-high cell lines and 90% of the CSC-low cell lines.

To extend the analysis to human tumor samples, we assessed whole transcriptome data from human breast cancers (81 patients, Lin et al., 2009) using a CSC centroid gene signature model that clustered tumor samples into CSC-high and CSC-low subgroups. A centroid model based on the 68 alternative splicing events similarly identified the CSC-high and CSC-low breast cancers with high accuracy. The CSC-high subgroup contained mostly triple negative breast cancers, which are known to have increased frequencies of CSC and EMT phenotypes, suggesting the therapeutic importance of this alternative splicing signature for identification of CSCs in patients. Q-PCR analysis showed that several of the alternative splicing events observed in the CSC-high subgroup were further enriched in tumorspheres grown from human breast cancer cell lines. This supports involvement of these alternative splicing events in the CSC renewal process. The CSC-associated alternative splicing signature identified here will be further refined to develop new CSC-specific diagnostic markers to stratify breast cancer patients and monitor response to novel CSC-targeted therapies.

Fig 1 panels: A) Classifiers used for alternative splicing analysis of 30 breast cancer cell lines. B) Venn diagram of the overlap among alternatively spliced cassette exons identified using different classifiers. C) KEGG and GO analysis based on the union of 209 alternative cassette exons (154 genes).
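The KEGG and GO analysis is reported as -log10(P value) per term. The poster does not say which test produced those P values, so the following is only a minimal sketch of one common choice, a hypergeometric over-representation test. Apart from the 154 genes carrying the 209 union exons, every count here (`background`, `annotated`, `hits`) is a hypothetical placeholder, and `hypergeom_tail` is an illustrative helper name.

```python
from math import comb, log10


def hypergeom_tail(hits, background, annotated, selected):
    """P(X >= hits) when `selected` genes are drawn from `background` genes,
    of which `annotated` carry the term of interest."""
    upper = min(annotated, selected)
    total = comb(background, selected)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


background = 20000  # hypothetical number of genes on the array
annotated = 400     # hypothetical number of genes annotated with one term
selected = 154      # genes carrying the 209 union exons (from the caption)
hits = 12           # hypothetical overlap between the two gene sets

p_value = hypergeom_tail(hits, background, annotated, selected)
score = -log10(p_value)
print(f"p = {p_value:.3g}, -log10(p) = {score:.2f}")
```

Exact integer arithmetic via `math.comb` avoids any dependency on a statistics package; a real analysis would also correct for testing many terms at once.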

INTRODUCTION

• Alternative splicing analysis of 30 BC cell lines identified 68 alternative cassette exons that predict CSC potential in the independent NCI-60 dataset.

• The CSC gene expression centroid model identified CSC-high and CSC-low NCI-60 cell lines and primary breast cancer samples.

• Substantial agreement was observed between the classification of breast cancer samples by the exon CSC centroid model and by the gene expression CSC centroid model.

• Several alternative cassette exons were enriched in tumorspheres of SUM159 cells, implying involvement of these splicing events in the biology of CSCs.

Fig 1: Identification of alternative splicing events using three independent classifiers.

Panel A (workflow): Tumor initiating gene signature (463 genes), EMT gene signature (178 genes), and BasalB vs. Luminal breast cancer cell line subtype classifier; 30 BC cell lines; Splicing Index algorithm; SVM analysis.

Panel B (Venn diagram of the TI signature, EMT signature, and BasalB/Luminal classifier): counts as extracted, with their assignment to regions not recoverable from the text: 16, 62, 28, 68, 33, 6, 36. Union: 209 cassette exons, P value < 0.001.
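The workflow names a Splicing Index algorithm without giving its formula. As a minimal sketch, assuming the commonly used definition (exon signal divided by its gene's signal, then the log2 ratio of the group means), the calculation could look like this. The function name and all signal values are hypothetical.

```python
from math import log2
from statistics import mean


def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """log2 ratio of mean gene-normalized exon signal, group A over group B.

    Each argument is a list of linear-scale signals, one value per sample.
    """
    ni_a = mean([e / g for e, g in zip(exon_a, gene_a)])
    ni_b = mean([e / g for e, g in zip(exon_b, gene_b)])
    return log2(ni_a / ni_b)


# Hypothetical signals for one cassette exon in two groups of cell lines.
si = splicing_index(
    exon_a=[200.0, 220.0], gene_a=[1000.0, 1100.0],  # e.g. Basal B lines
    exon_b=[800.0, 880.0], gene_b=[1000.0, 1100.0],  # e.g. Luminal lines
)
print(f"splicing index = {si:.2f}")
```

Normalizing to the gene signal separates a change in exon inclusion from a change in overall expression of the gene, which is why the two groups above differ in splicing index even though their gene-level signals are identical.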

Fig 2: Hierarchical clustering of 30 BC or NCI-60 cell lines based on the union of 209 alternative exons separates them into subgroups.

Fig 1C, GO terms (bars show -log10(P value)): cell motion, cell adhesion, biological adhesion, cell migration, blood vessel development, vasculature development, extracellular structure organization, cell motility.

Fig 1C, KEGG pathways (bars show -log10(P value)): Pathways in cancer, Focal adhesion, ECM-receptor interaction, MAPK signaling pathway, ErbB signaling pathway, Toll-like receptor signaling pathway.

Fig 2 heat maps: A) 30 BC cell lines (training dataset); B) NCI-60 cell lines (independent dataset). Rows: 209 union exons. Legend: Luminal, BasalB, and BasalA cell lines.

A) Hierarchical clustering of 30 BC cell lines based on the union of 209 alternatively spliced cassette exons. B) Hierarchical clustering of NCI-60 cell lines based on the union of 209 alternatively spliced cassette exons. The 68 concordant exons were ranked using SVM to test their classification power on the NCI-60 cell lines.

Schematic of the centroid analysis design used to predict the CSC potential of cell lines and primary cancer samples.

METHODS

Fig 3: Analysis of primary breast cancer samples using the CSC gene expression or exon signature centroids.

A) 84 human breast cancer tumor samples (Lin et al., 2009) clustered based on the 68-exon CSC signature centroid. Groups of tumor samples with high or low CSC component cluster together. B) 84 human breast cancer tumor samples clustered based on the CSC gene expression signature centroid. Groups of tumor samples with high or low CSC component cluster together.

Fig 4: CSC gene expression centroid correlates with the CSC SI exon centroid in predicting CSC high/low tumor samples as well as triple negative cancer samples.

Panel A: Exon CSC centroid score plotted against Gene CSC centroid score; Cohen Kappa = 0.6.

Panel B: Triple Negative Signature score (y) plotted against Exon CSC centroid score (x); linear fit y = -0.4678x - 0.012, R² = 0.7337.

Panel C: Triple Negative Signature score (y) plotted against Gene CSC centroid score (x); linear fit y = -0.5207x + 0.0015, R² = 0.6063.
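Fig 4 panels B and C report a fitted line and an R² value for each scatter plot. As a minimal sketch, an ordinary least-squares fit with its coefficient of determination can be computed as follows. The data points are hypothetical and are not the poster's tumor samples; `linear_fit` is an illustrative helper name.

```python
from statistics import mean


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R^2."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical (centroid score, signature score) pairs.
centroid_scores = [-0.4, -0.2, 0.0, 0.2, 0.4, 0.6]
signature_scores = [0.20, 0.08, 0.02, -0.12, -0.18, -0.30]

slope, intercept, r2 = linear_fit(centroid_scores, signature_scores)
print(f"y = {slope:.4f}x + {intercept:.4f}, R^2 = {r2:.4f}")
```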

Fig 5: Experimental validation of alternative splicing events in tumorspheres of SUM159 cells.

Centroid analysis design:

• CSC gene signature: CSC gene centroid model (training dataset: 30 breast cancer cell lines; 2 clusters: CSC high and CSC low), applied to NCI-60 cell lines and primary BC samples.

• Concordant 68 alternative cassette exons: CSC exon centroid model (training dataset: 41 breast cancer cell lines), applied to NCI-60 cell lines and primary BC samples.

• Compare centroids: gene vs. exon.
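The poster does not spell out how the centroid models assign a sample to a class. A minimal sketch, assuming a nearest-centroid rule with Pearson correlation as the similarity measure, is shown below. The four-exon profiles, the class labels used as dictionary keys, and the helper names are all hypothetical.

```python
from math import sqrt
from statistics import mean


def centroid(samples):
    """Per-feature mean of a list of equally long profiles."""
    return [mean(column) for column in zip(*samples)]


def pearson(u, v):
    """Pearson correlation between two profiles."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def classify(sample, centroids):
    """Return the best-correlated class label and all correlation scores."""
    scores = {label: pearson(sample, c) for label, c in centroids.items()}
    return max(scores, key=scores.get), scores


# Hypothetical training profiles (rows are cell lines, columns are exons).
training = {
    "CSC-high": [[0.9, 0.8, 0.1, 0.2], [0.8, 0.9, 0.2, 0.1]],
    "CSC-low": [[0.1, 0.2, 0.9, 0.8], [0.2, 0.1, 0.8, 0.9]],
}
centroids = {label: centroid(rows) for label, rows in training.items()}

label, scores = classify([0.7, 0.9, 0.3, 0.2], centroids)
print(label, scores)
```

The same functions work unchanged for a gene expression centroid and an exon centroid, since only the feature columns differ, which makes the two model outputs directly comparable.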

RESULTS

SUMMARY


A) 84 human breast cancer patient samples (Lin et al., 2009; 34 HER2 positive, 26 luminal, and 24 basal) were analyzed using the CSC gene or alternative exon centroid models. Each dot represents an individual tumor sample. Substantial agreement is observed between the two model predictions (Cohen Kappa = 0.6). B) A centroid model built using a Triple Negative gene signature (Lehmann et al., 2011) is in agreement with the CSC alternative exon centroid used to classify BC patient samples in (A). C) Same as (B) for the CSC gene expression centroid.
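Cohen's kappa measures agreement between two sets of class calls after discounting the agreement expected by chance. A minimal sketch of the calculation follows. The ten label pairs are toy values chosen for illustration, not the poster's 84 samples, and `cohen_kappa` is an illustrative helper name.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of class labels."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    classes = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in classes) / n**2
    return (observed - expected) / (1 - expected)


# Toy calls from a gene-based and an exon-based model on ten samples.
gene_calls = ["high"] * 5 + ["low"] * 5
exon_calls = ["high"] * 4 + ["low"] * 5 + ["high"]

kappa = cohen_kappa(gene_calls, exon_calls)
print(f"kappa = {kappa:.2f}")
```

In this toy case the models agree on 8 of 10 samples and chance agreement is 0.5, so kappa is (0.8 - 0.5) / (1 - 0.5), about 0.6.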


Top 60 exons together can identify 96% of cell lines with low CSC potential and 90% of cell lines with high CSC potential from the NCI-60 dataset.
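Percentages of this kind are per-class fractions: among the cell lines that truly belong to a class, the share that the classifier assigns to that class. A minimal sketch of that bookkeeping is given below with toy labels, not the NCI-60 calls; `per_class_recall` is an illustrative helper name.

```python
def per_class_recall(true_labels, predicted_labels):
    """Fraction of each true class that was predicted as that class."""
    recall = {}
    for cls in sorted(set(true_labels)):
        members = [i for i, t in enumerate(true_labels) if t == cls]
        correct = sum(predicted_labels[i] == cls for i in members)
        recall[cls] = correct / len(members)
    return recall


# Toy reference calls and classifier predictions for eight cell lines.
truth = ["CSC-high"] * 4 + ["CSC-low"] * 4
predicted = ["CSC-high", "CSC-high", "CSC-high", "CSC-low"] + ["CSC-low"] * 4

recall = per_class_recall(truth, predicted)
print(recall)
```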

CSC-specific biomarkers:
• Patient stratification
• Drug response

Alternative splicing enriches the pool of biomarkers. Examples:
• FGFR2 (Holzmann K., et al. J. Nucleic Acids, 2012)
• p120Catenin


A) Schematic of the experimental validation of alternative splicing events. Exon expression was normalized to gene expression. B) Bar graph of changes in exon inclusion between tumorspheres of SUM159 cells and cells grown on plastic (2D). Enrichment of CSC-related exons is expected in tumorspheres.
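The caption states that exon expression was normalized to gene expression but does not give the arithmetic. A minimal sketch, assuming Q-PCR threshold cycle (Ct) values and roughly 100% amplification efficiency (the usual delta-Ct convention), is shown below. All Ct values and function names are hypothetical.

```python
def relative_exon_level(ct_exon, ct_gene):
    """Exon signal relative to its gene: 2 ** -(Ct_exon - Ct_gene)."""
    return 2 ** -(ct_exon - ct_gene)


def inclusion_fold_change(sphere, plastic):
    """Ratio of gene-normalized exon level, tumorsphere over 2D culture.

    Each argument is a (ct_exon, ct_gene) pair.
    """
    return relative_exon_level(*sphere) / relative_exon_level(*plastic)


# Hypothetical Ct values for one cassette exon and its gene.
fold = inclusion_fold_change(sphere=(24.0, 22.0), plastic=(26.0, 22.0))
print(f"exon inclusion fold change (tumorsphere vs. 2D) = {fold:.2f}")
```

A fold change above 1 would correspond to an exon that is more included in tumorspheres than in cells grown on plastic.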

Heat map cluster labels: High CSC, Low CSC, mixed.

LB-197

Top 5 exons can achieve 100% accuracy in identifying cell lines with high and low CSC potential from 30 BC cell lines.