Data Driven Cancer (DDC) or Cancer Driven Data (CDD)? An omics puzzle to be solved for better prognosis in the disease. Alokkumar Jha, PhD Student, Insight Centre for Data Analytics, NUI Galway

Institute meeting - 09/10/15


Page 1: Institute meeting  - 09/10/15

Data Driven Cancer(DDC) or Cancer Driven Data(CDD)?

An omics puzzle to be solved for better prognosis in the disease

Alokkumar Jha, PhD Student

Insight Centre for Data Analytics

NUI Galway

Page 2: Institute meeting  - 09/10/15

Cancer is like the Mafia

• Treatments have variable effect
• Resistance can evolve
• Doesn't work for all people
• Doesn't hit the progenitor

• Looks ordinary (almost)
• Doesn't play by the rules
• Has a competitive advantage
• Eludes detection

Page 3: Institute meeting  - 09/10/15

Current Scenario in cancer research and data science

Page 4: Institute meeting  - 09/10/15

Flood of Data

• Expression
• DNA Methylation
• Structural Variation
• Exome Sequences
• Copy Number Alterations

NextGen Biology Mantra: More data is good.

Page 5: Institute meeting  - 09/10/15

Data Generation Mechanisms

• There are approximately 500 petabytes of healthcare data in existence today, and that number is expected to skyrocket to more than 25,000 petabytes within the next seven years. Groves, P., Kayyali, B., Knott, D. & Van Kuiken, S. (2013). The 'Big-Data' Revolution in Healthcare. McKinsey & Company Report.

Data Driven Cancer (DDC)

Biomarkers are known for certain cancer types, but gene behaviour and alteration in the disease are difficult to understand starting from the gene (Gene -> Data). Examples: biomarker research, targeted therapy.

Cancer Driven Data (CDD)

Molecular-level information for the cancer is not well known, so existing open data are used to discover cancer behaviour (Data -> Cancer). Examples: 1000 Genomes, GWAS.

Page 6: Institute meeting  - 09/10/15

Analysis like investigating a plane crash

Patient Sample 1, Patient Sample 2, Patient Sample 3, ..., Patient Sample N

Page 7: Institute meeting  - 09/10/15

Data Driven Cancer

Indication of novel cancer types based on their signature and targets (Genes/ Proteins) or alternate indications

MLN7243

MET, ITGA2, CAV1, ASPH, LGALS3, F2RL1, SERPINE2, EGFR, CAV2,
SDC4, LMNA, TPM1, DAB2, GNG12, FN1, PTPRM, MYLK, KRT18,
LAMB1, ADAM9, TIMP1, ITGA3, CD44, MIR21, ITGA5, IGFBP3, NRP1,
S100A6, ACTN1, ANXA2, TGFB2, THBS1, FOSL1, YAP1, TJP1, EREG, PTPRF,
TIMP2, EPHA2, KRT8, SNAI2, CTTN, SERPINE1, LAMC2, IGFBP6,
F2RL2, MMP2, TGFBR2, LAMA4, TIMP3, DKK1, JAG1, AXL, AREG, PTN,
KRT7, LAMB3, CDH1, COL4A2, SDC1, PKP2, CLDN1, TGFA, CXCL2,
ITGB4, APP, KRT19, TGFB1I1, PTGS2, LAMA3, COL4A1, EDN1, PLAU,
LOXL2, PPL, CALD1, KLF5, ITGB6, MMP1, PLAT, LOX, CCND1, CTGF,
TGIF1, TFPI2, TUBB6, COL1A1, CLDN7, TACSTD2, CDH2, GJA1,
NID1, DSP, SPARC, CDH3, GNG11, EFNA5, IL1A, RHOB, EPCAM, F11R

Signature Genes

Page 8: Institute meeting  - 09/10/15

Data Driven Cancer

PPI (Protein-Protein Interaction)

Graph Statistics
• Number of genes from your seed list: 100
• Number of intermediate components: 90
• Number of interactions in subnetwork created from seed list: 351
• Total components in the background network: 2086
• Total interactions in the background network: 11429

PPI-based Disease Enrichment

TOPGene, COBAS2.0, BioMyndb, DAVID

PPI databases: HPRD, BIND, IntAct, Vidal, MINT, PID, BioGRID
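The graph statistics on this slide can be reproduced in outline with a few lines of Python. The sketch below is illustrative only: the slides do not say how intermediate components were selected, so it assumes an intermediate is a non-seed node that interacts with at least two seed genes. The function name `seed_subnetwork_stats`, the `min_seed_links` parameter, and the toy node names (`S*` seeds, `I*` and `B*` background nodes) are all hypothetical.

```python
def seed_subnetwork_stats(edges, seeds, min_seed_links=2):
    """Summarise the subnetwork grown from a seed list.

    Assumption (not from the slides): an intermediate component is a
    non-seed node that interacts with at least `min_seed_links` seeds.
    """
    seeds = set(seeds)
    # Undirected, de-duplicated background network without self-loops.
    background = {frozenset(e) for e in edges if e[0] != e[1]}
    neighbours = {}
    for edge in background:
        a, b = tuple(edge)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    intermediates = {
        node for node, nbrs in neighbours.items()
        if node not in seeds and len(nbrs & seeds) >= min_seed_links
    }
    # Seeds that never appear in the background network are dropped.
    kept = {s for s in seeds if s in neighbours} | intermediates
    sub_edges = {edge for edge in background if edge <= kept}
    return {
        "seed_genes": len(kept - intermediates),
        "intermediates": len(intermediates),
        "subnetwork_interactions": len(sub_edges),
        "background_components": len(neighbours),
        "background_interactions": len(background),
    }


# Toy background network with made-up node names (not real interactions).
toy_edges = [
    ("S1", "S2"), ("S1", "I1"), ("S2", "I1"), ("S3", "I1"),
    ("S3", "B1"), ("B1", "B2"), ("S4", "B2"),
]
toy_stats = seed_subnetwork_stats(toy_edges, ["S1", "S2", "S3", "S4"])
```

With real data, `edges` would come from the merged PPI databases and `seeds` from the signature gene list; a different rule for intermediates (for example, nodes on shortest paths between seeds) would change the counts.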

Page 9: Institute meeting  - 09/10/15

DDC

Page 10: Institute meeting  - 09/10/15

TOPgene, Cobas2.0, Biomyndb, DAVID, Disent, GSEA

Background genes from LinkedcanDB

Algorithm-defined background genes

• GTP cyclohydrolase I deficiency
• Dystonia, dopa-responsive; DRD
• Epidermolysis bullosa letalis
• Cirrhosis, familial
• Epidermolysis bullosa, generalized atrophic benign; GABEB

Background gene-based disease enrichment

Top Ten Diseases from this list based on p-value >0.05

• Idiopathic intracranial hypertension with papilledema
• Galactorrhea-Hyperprolactinemia
• Chromosome 13q trisomy
• Intrahepatic cholangiocarcinoma
• Isotretinoin embryopathy like syndrome
• Familial primary gastric lymphoma

LinkedcanDB: OMIM, TTD, CTD, ClinVar, COSMIC, KEGG, WikiPathways, Reactome etc. (32 databases)

LinkedMDBWOR (22 databases)
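Disease enrichment against a background gene set is commonly scored with a one-sided hypergeometric test. The slides do not state which test the listed tools were run with, so the following is a minimal sketch of that general idea, not the method used here. The function name and all numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_disease, n_query, n_overlap):
    """P(X >= n_overlap) for a query list drawn without replacement.

    n_background: genes in the background set
    n_disease:    background genes annotated to the disease
    n_query:      genes in the query (signature) list
    n_overlap:    query genes annotated to the disease
    """
    upper = min(n_disease, n_query)
    total = comb(n_background, n_query)
    # Sum the upper tail; math.comb returns 0 when k exceeds n.
    tail = sum(
        comb(n_disease, k) * comb(n_background - n_disease, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 linked to the disease,
# 4 query genes of which 2 are linked to the disease.
p_example = hypergeom_enrichment_p(20, 5, 4, 2)
```

The choice of background matters: the same overlap gives a different p-value against a database-derived background than against an algorithm-defined one, which is why the slide reports the two separately.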

Page 11: Institute meeting  - 09/10/15

Summary: DDC

• Requirements: integrated dataset for downstream analysis
• Inferred activities reflect the neighbourhood of influence around a gene
• Can boost signal for survival analysis and assessment of mutation impact

Page 12: Institute meeting  - 09/10/15

Cancer Driven Data

Proteasome Subunit: PSMD9

NGS Approach (ChIP-seq + RNA-seq)

LinkedSeq (ENCODE, TCGA, SR, GWAS, GRO-seq, 1000 Genomes etc.)

Page 13: Institute meeting  - 09/10/15

Proteasome Subunit: PSMD9

Microarray Approach

LinkedArray (U133Plus2, U133): GEO, EBI Express

Cell line data

Tissue data

Cancer Driven Data

Tissue | U133Plus2 Cancer | U133Plus2 Normal | U133A Cancer | U133A Normal | Total
Abdomen | 13 | 0 | 0 | 0 | 13
Adipose | 1 | 59 | 0 | 12 | 72
Adrenal gland | 14 | 5 | 0 | 0 | 19
Bladder | 39 | 14 | 87 | 15 | 155
Blood | 4693 | 639 | 3130 | 1099 | 9561
Brain | 785 | 568 | 592 | 1627 | 3572
Breast | 1954 | 251 | 2635 | 91 | 4931
Cervix | 74 | 12 | 64 | 34 | 184
Colon | 1294 | 206 | 256 | 27 | 1783
Endometrium | 72 | 61 | 0 | 9 | 142
Esophagus | 48 | 9 | 24 | 28 | 109
GIST | 64 | 0 | 0 | 0 | 64
Head and neck | 202 | 14 | 21 | 2 | 239
Heart | 0 | 0 | 0 | 41 | 41
Kidney | 573 | 105 | 366 | 66 | 1110
Liver | 182 | 25 | 156 | 52 | 415
Lung | 441 | 225 | 582 | 364 | 1612
Muscle | 0 | 177 | 0 | 331 | 508
Myometrium | 0 | 0 | 0 | 24 | 24
Ovary | 859 | 21 | 341 | 9 | 1230
Pancreas | 132 | 55 | 13 | 8 | 208
Prostate | 308 | 45 | 244 | 83 | 680
Sarcoma | 493 | 0 | 0 | 0 | 493
Skin | 290 | 28 | 499 | 59 | 876
Small intestine | 13 | 6 | 0 | 22 | 41
Stomach | 268 | 57 | 46 | 18 | 389
Testis | 4 | 6 | 184 | 13 | 207
Thyroid | 62 | 25 | 44 | 25 | 156
Tongue | 0 | 11 | 0 | 4 | 15
Uterus | 155 | 12 | 0 | 24 | 191
Vagina | 3 | 5 | 0 | 0 | 8
Vulva | 21 | 14 | 0 | 0 | 35
Total | 13057 | 2655 | 9284 | 4087 | 29083

Page 14: Institute meeting  - 09/10/15

LinkedTheraputics: A linked data approach towards connected omics healthcare

Probes
• U133Plus2: 54,613
• U133A: 22,215

Normal Tissue Network (U133plus2-N + U133A-N)

Cancer Network (U133plus2-C + U133A-C)

Protein synonym problem (PSMD9 = RPN4 = P27)

LinkedMDBWOR (22 databases)
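The synonym problem has to be resolved before the two platforms can be merged, since the same protein may appear under different names. The sketch below shows one simple way to do this with an alias index. It is an illustration under assumptions: the helper names are hypothetical, and in practice the synonym groups would be loaded from the linked databases rather than typed in by hand. Only the PSMD9 = RPN4 = P27 group comes from the slide.

```python
def build_alias_index(synonym_groups):
    """Map every known name (case-insensitive) to one canonical symbol."""
    index = {}
    for canonical, aliases in synonym_groups.items():
        for name in (canonical, *aliases):
            key = name.upper()
            # Refuse to merge silently when two symbols claim one alias.
            if key in index and index[key] != canonical:
                raise ValueError(
                    f"{name!r} maps to both {index[key]} and {canonical}"
                )
            index[key] = canonical
    return index


def normalise_symbols(symbols, index):
    """Replace known aliases by their canonical symbol; keep the rest."""
    return [index.get(symbol.upper(), symbol) for symbol in symbols]


# Synonym group taken from the slide: PSMD9 = RPN4 = P27.
alias_index = build_alias_index({"PSMD9": ["RPN4", "P27"]})
normalised = normalise_symbols(["rpn4", "P27", "PSMD9", "EGFR"], alias_index)
```

Raising on a conflicting alias is a deliberate choice in this sketch: short aliases are often shared between unrelated genes, and a silent merge would corrupt the network built downstream.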

Centrality Measures
• Closeness
• Betweenness
• Eccentricity
• Degree
• Eigenvector
• Radiality
• Shortest path length
• Longest path length

Weighted Network with PCC (Pearson correlation coefficient)

Clustering of Both Networks (Community Clustering)

Topological Stability based on Triangle Counts (Normal vs Cancer): Measure of LOSS/GAIN

Linked Pathways (LinkedPathway): KEGG, REACTOME

Leading Disease by each cluster / Indirect Indications

Linked Visualization & Reporting (LinkedVIZ)
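The network steps on this slide (PCC-weighted network, then triangle counts compared between normal and cancer) can be sketched as follows. This is a minimal illustration, not the platform's implementation: the 0.8 correlation cut-off, the function names, and the toy expression values for genes `G1` to `G4` are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_network(expression, threshold=0.8):
    """Return {frozenset({g1, g2}): PCC} for pairs with |PCC| >= threshold."""
    network = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            network[frozenset((g1, g2))] = r
    return network


def triangle_count(network):
    """Count node triples whose three connecting edges are all present."""
    nodes = sorted({node for edge in network for node in edge})
    return sum(
        1
        for a, b, c in combinations(nodes, 3)
        if frozenset((a, b)) in network
        and frozenset((b, c)) in network
        and frozenset((a, c)) in network
    )


# Made-up expression profiles (4 samples per gene), for illustration only.
normal = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8],
          "G3": [1, 3, 5, 7], "G4": [4, 1, 3, 2]}
cancer = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8],
          "G3": [4, 1, 3, 2], "G4": [4, 1, 3, 2]}

normal_triangles = triangle_count(coexpression_network(normal))
cancer_triangles = triangle_count(coexpression_network(cancer))
# Negative values indicate a loss of triangles in cancer, positive a gain.
triangle_change = cancer_triangles - normal_triangles
```

The brute-force triple scan is fine for a toy example; for networks built from tens of thousands of probes a graph library with a dedicated triangle-counting routine would be the practical choice.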

Page 15: Institute meeting  - 09/10/15

LinkedTheraputics: Results

Survival profile of PSMD9 with LinkedTheraputics Platform (Reactome+HPRD+IntAct+NCI)

GEO dataset | Cancer type
molecular subclasses of high-grade glioma: prognosis, disease progression, and neurogenesis | high-grade glioma
experimentally derived metastasis gene expression profile predicts recurrence and death in colon cancer patients | colon cancer
discovery cohort for genomic predictor of response and survival following neoadjuvant taxane-anthracycline chemotherapy in breast cancer | breast cancer
maqc-ii project: multiple myeloma (mm) data set | multiple myeloma
metastasis gene expression profile predicts recurrence and death in colon cancer patients (moffitt samples) | colon cancer
expression profile-defined classification of lung adenocarcinoma | lung cancer
validation cohort for genomic predictor of response and survival following neoadjuvant taxane-anthracycline chemotherapy in breast cancer | breast cancer
expression data for early stage nsclc | lung cancer
predictive value of prognosis-related gene expression study in primary bladder cancer | bladder cancer
gene expression data for pathological stage i-ii lung adenocarcinomas | lung cancer

Survival PSMD9 based on Reactome

GEO dataset | Cancer type
Relapse-related molecular signature in lung adenocarcinomas identifies patients with dismal prognosis | lung cancer
183 breast tumors from the Helsinki University Central Hospital with survival information | Breast cancer
Analysis of early primary breast cancer to identify prognostic markers and associated pathways: mRNA and miRNA profiling | Breast cancer
Gene expression data for pathological stage I-II lung adenocarcinomas | Lung cancer
The humoral immune system has a key prognostic impact in node-negative breast cancer | Breast cancer
Gene expression of breast cancer tissue in a large population-based cohort of Swedish patients | Breast cancer
Human lung adenocarcinoma | lung cancer
Expression Profile-Defined Classification of Lung Adenocarcinoma | lung cancer
Prediction of survival in diffuse large B cell lymphoma treated with chemotherapy plus Rituximab | diffuse large B cell lymphoma
Survival Related Profile, Pathways and Transcription Factors in Ovarian Cancer | Ovarian Cancer

Survival PSMD9 based on NCI

GEO dataset | Cancer type
183 breast tumors from the Helsinki University Central Hospital with survival information | Breast cancer
Relapse-related molecular signature in lung adenocarcinomas identifies patients with dismal prognosis | lung cancer
Analysis of early primary breast cancer to identify prognostic markers and associated pathways: mRNA and miRNA profiling | Breast cancer
Gene expression data for pathological stage I-II lung adenocarcinomas | Lung cancer
The humoral immune system has a key prognostic impact in node-negative breast cancer | Breast cancer
Gene expression of breast cancer tissue in a large population-based cohort of Swedish patients | Breast cancer
Human lung adenocarcinoma | lung cancer
Expression Profile-Defined Classification of Lung Adenocarcinoma | lung cancer
Survival Related Profile, Pathways and Transcription Factors in Ovarian Cancer | Ovarian Cancer
Prediction of survival in diffuse large B cell lymphoma treated with chemotherapy plus Rituximab | diffuse large B cell lymphoma

Page 16: Institute meeting  - 09/10/15

Multiple data types

• Clinical diagnosis
• Treatment history
• Histologic diagnosis
• Pathologic report/images
• Tissue anatomic site
• Surgical history
• Gene expression/RNA sequence
• Chromosomal copy number
• Loss of heterozygosity
• Methylation patterns
• miRNA expression
• DNA sequence
• RPPA (protein)
• Subset for Mass Spec

CDD

25* forms of cancer
• glioblastoma multiforme (brain)
• squamous carcinoma (lung)
• serous cystadenocarcinoma (ovarian)
• Etc.

Biospecimen Core Resource with more than 150 Tissue Source Sites

6 Cancer Genomic Characterization Centers

3 Genome Sequencing Centers

7 Genome Data Analysis Centers

Data Coordinating Center

Page 17: Institute meeting  - 09/10/15

Future Medicine Practice in cancer research

Chin et al. 2014,Cell

Page 18: Institute meeting  - 09/10/15

Motivation


TCGA has many high quality primary tumor samples,

but metastasis kills

Which primaries will metastasize?

Image courtesy of wikimedia commons

Page 19: Institute meeting  - 09/10/15

Overview of pathway-guided approach

• Integrate many data sources to gain an accurate view of how genes are functioning in pathways
• Predict the functional consequences of mutations by quantifying the effect on the surrounding pathway
• Use pathway signatures to implicate mutations in novel genes to (re-)focus targeting
• Identify critical “Achilles Heels” in the pathways that distinguish a particular sub-type

Page 20: Institute meeting  - 09/10/15

Schema

Data: example linked record for SGPX (CXORF66, equivalent to chromosome X open reading frame 66)

Assembly: GRCh38.p2
Chr location (start-end): X:139,955,72-139,965,520
Cytogenetic band: Xq27.1
Ensembl ID: ENSG00000117592
SNP ID: rs761610936
GO (gene ontology): GO:0016021, integral membrane component
Disease ID (PharmGKB): PA164718516
COSMIC mutation: COSM1249516 (see also c.17G>T; mutation type)
FASTA seq: MNLVICVLLLSIWKNNCMTTNQTNGSSTTGDKPVESMQTKLNYLRRNLLILVGIIIMVFV FICFCYLHYNCLSDDASKAGMVKKKGIAAKSSKTSFSEAKTASQCSPETQPMLSTADKSS DSSSPERASAQS
Cell lines: MCF7, HeLa
KEGG pathway: hsa:347487
Molecular mass: 39944 Da
Interaction: 9606.ENSP00000359571
Proteomes: UP000005640
Modified residue: Glycosylation
Protein abundance across organisms: Q5JRM2

Link types shown: equivalent to, see also, same as (Peroxiredoxin 6), equivalent class

Sources: UniProt, HPA, KEGG, GeneCards, EMBL-EBI, SPD, PaxDb, NCBI, dbSNP, HGNC, PharmGKB, COSMIC, SwissProt, Ensembl

Page 21: Institute meeting  - 09/10/15

Acknowledgements

Dr. Ratnesh Sahay, Group Leader, eHealth and Life Sciences, Insight Centre for Data Analytics @ NUI Galway

Dr. Prasanna Venkatraman, Principal Investigator, Advanced Centre for Treatment, Research and Education in Cancer, Mumbai, India

Dr. Rangapriya Sundarajan, Sr. Research Associate, Advanced Centre for Treatment, Research and Education in Cancer, Mumbai, India

Page 22: Institute meeting  - 09/10/15