Madrid ICGC PCAWG 2016


Canceromatic III - Session I: Pan-Cancer analysis - Changing landscape of data and tools available for reproducible cancer genomics workflows: report from the ICGC trenches. 

Nov 14th 2016, B.F. Francis Ouellette, francis@oicr.on.ca

• Senior Scientist & Associate Director, Informatics and Biocomputing, Ontario Institute for Cancer Research, Toronto, ON

• Associate Professor, Department of Cell and Systems Biology, University of Toronto, Toronto, ON.


ONTARIO INSTITUTE FOR CANCER RESEARCH

You are free to:

• Copy, share, adapt, or re-mix;

• Photograph, film, or broadcast;

• Blog, live-blog, or post video of

this presentation, provided that you attribute the work to its author and respect the rights and licenses associated with its components.

Slide concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only: CC0. Social media icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at: http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites


@bffo

E-mail: francis@oicr.on.ca


Cancer-om-atics: Jul 6-9, 2009

Cancer-om-atics II: Mar 28-30, 2011

Canceromatics III: Nov 13-16, 2016


Disclaimers

I do not (and will not) profit in any way, shape or form, from any of the brands, products or companies I may mention.

I am a big proponent of Open Access, Open Source, Open Data and Open Courseware.

I am on the SAB of many NIH funded projects (SGD, Galaxy, GenomeSpace, H3ABionet, and HMP2), as well as Elixir and Genome Canada’s SIAC, and the NRC’s KMAC. This comes with a bias on how science should be done!


Outline


• Introduction

• ICGC

• PCAWG

• Closing remarks


adapted from https://goo.gl/fQJAz1

ICGC, PCAWG, Docker testing


Cancer is a Disease of the Genome

Challenge in treating cancer:

• Every tumour is different

• Every cancer patient is different

Adapted from Tom Hudson; https://www.cancer.gov/research/areas/genomics


Large-Scale Studies of Cancer Genomes

• Johns Hopkins: > 18,000 genes analyzed for mutations in 11 breast and 11 colon tumors (L.D. Wood et al., Science, Oct. 2007)

• Wellcome Trust Sanger Institute: 518 genes analyzed for mutations in 210 tumors of various types (C. Greenman et al., Nature, Mar. 2007)

• TCGA (NIH): multiple technologies applied to brain (glioblastoma multiforme), lung (squamous carcinoma) and ovarian (serous cystadenocarcinoma) tumors (F.S. Collins & A.D. Barker, Sci. Am., Mar. 2007)


Lessons learned from early studies

• Heterogeneity within and across tumor types

• High rate of abnormalities (driver vs passenger)

• Sample quality matters

• Consent and controlled data access are complicated

MR Stratton et al. Nature 458, 719-724 (2009) doi:10.1038/nature07943


Analysis Data Types

• Simple Somatic Mutations (SSM or SNV)

• Copy Number Alterations (CNA or CNV)

• Structural Variants (SV)

• Germline variants (SNPs)

• Gene Expression (micro-arrays and RNA-Seq)

• miRNA Expression (RNA-Seq)

• Epigenomics (Arrays and Methylation)

• Splicing Variation (RNA-Seq)

• Protein Expression (Arrays)


Rationale for the ICGC:

• Scope is huge

• Reduce duplication of effort

• Standardization and uniform quality measures

• Merging of datasets

• Spectrum of many cancers varies across the world

• Accelerate the dissemination of genomic and analytical methods


International Cancer Genome Consortium

Collect ~500 tumour/normal pairs from each of 50 different major cancer types; 25,000 T/N pairs!

Comprehensive genome analysis of each T/N pair:

• Genome

• Transcriptome

• Methylome

• Clinical data

Make the data available to the research community & public.

Identify genome changes:

…GATTATTCCAGGTAT… …GATTATTGCAGGTAT… …GATTATTGCAGGTAT…

Adapted from Tom Hudson
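The three fragments above show what "identify genome changes" means at the smallest scale: line the sequences up and find the base that differs. Below is a minimal Python sketch of that comparison. It assumes the fragments are already aligned and of equal length, and it treats the first fragment as the reference because the slide does not say which fragment is normal and which is tumour. `point_differences` is a hypothetical helper. Real somatic variant callers work on aligned reads with statistical models, not on two short strings.

```python
def point_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both fragments are already aligned and of equal length, so only
    single-base substitutions are reported (insertions/deletions are not handled).
    """
    if len(reference) != len(sample):
        raise ValueError("fragments must be aligned and of equal length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


if __name__ == "__main__":
    # The three fragments from the slide, with the flanking "…" dropped.
    fragments = ["GATTATTCCAGGTAT", "GATTATTGCAGGTAT", "GATTATTGCAGGTAT"]
    # Assumption: the first fragment is treated as the reference.
    reference = fragments[0]
    for other in fragments[1:]:
        print(point_differences(reference, other))  # prints [(8, 'C', 'G')]
```

Each comparison reports a single C-to-G substitution at position 8.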


International Cancer Genome Consortium: http://icgc.org


Data submission → Validation (dictionary) → Validation (across fields) → Indexing → Happy users

http://goo.gl/1EcyR
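The submission flow above is a staged check: values are first validated against the data dictionary, then against rules that span several fields, and only then indexed. Below is a minimal Python sketch of that ordering, assuming a toy dictionary. The field names, allowed values and the cross-field rule are hypothetical illustrations, not the actual ICGC dictionary or validator.

```python
from dataclasses import dataclass, field

# Hypothetical, heavily simplified data dictionary (not the real ICGC one):
# field name -> set of allowed values, or None for unrestricted fields.
DICTIONARY = {
    "donor_id": None,
    "donor_sex": {"male", "female"},
    "donor_vital_status": {"alive", "deceased"},
    "donor_survival_time": None,
}
REQUIRED_FIELDS = {"donor_id"}


@dataclass
class ValidationReport:
    errors: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_dictionary(record: dict, report: ValidationReport) -> None:
    """Stage 1: required fields present, every field known, values allowed."""
    for name in sorted(REQUIRED_FIELDS - set(record)):
        report.errors.append(f"missing required field: {name}")
    for name, value in record.items():
        if name not in DICTIONARY:
            report.errors.append(f"unknown field: {name}")
        elif DICTIONARY[name] is not None and value not in DICTIONARY[name]:
            report.errors.append(f"{name}: {value!r} is not an allowed value")


def validate_across_fields(record: dict, report: ValidationReport) -> None:
    """Stage 2: rules involving more than one field (one illustrative rule)."""
    if (record.get("donor_vital_status") == "deceased"
            and record.get("donor_survival_time") is None):
        report.errors.append("deceased donor must have donor_survival_time")


def submit(records: list) -> tuple:
    """Validate every record in a submission; index only if all of them pass."""
    report = ValidationReport()
    for record in records:
        validate_dictionary(record, report)
        validate_across_fields(record, report)
    if not report.ok:
        return report, {}
    # Indexing stage: here just a lookup table keyed by donor ID.
    index = {record["donor_id"]: record for record in records}
    return report, index


if __name__ == "__main__":
    report, index = submit([{"donor_id": "D1", "donor_sex": "female",
                             "donor_vital_status": "alive"}])
    print(report.ok, list(index))  # prints True ['D1']
    report, index = submit([{"donor_id": "D2", "donor_vital_status": "deceased"}])
    print(report.errors)  # prints ['deceased donor must have donor_survival_time']
```

One design choice here is all-or-nothing: if any record fails, nothing from that submission is indexed, which keeps partially valid submissions away from users. That policy is an assumption of this sketch, not a documented ICGC rule.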


ICGC needs to deal with different kinds of users!


Biologists/Clinicians: web interface to processed data, providing:

• Affected gene lists with consequences

• Impact on pathways

Power users:

• Application Programming Interface (API) to get to data

• Availability and integration with cloud resources


ICGC Data Coordinating Centre: dcc.icgc.org


BRAF missense mutations in colorectal cancer


https://dcc.icgc.org/


https://dcc.icgc.org/icgc-in-the-cloud


http://www.cancercollaboratory.org/


http://docs.icgc.org/

User and submitter documentation


Software development discussions

https://discuss.icgc.org/


Some challenges:


So, we have lots of data; is it generated the same way?


Every country/group has basically been submitting:


• Simple Somatic Mutations (SSM or SNV)

• Copy Number Alterations (CNA or CNV)

• Structural Variants (SV)

• Germline variants (SNPs)

• Gene Expression (micro-arrays and RNA-Seq)

• miRNA Expression (RNA-Seq)

• Epigenomics (Arrays and Methylation)

• Splicing Variation (RNA-Seq)

• Protein Expression (Arrays)


Are they all using the same pipelines?

No


http://goo.gl/CekF6y

Missing Clinical Data?


Are we all using the same definition for controlled access data?

No


ICGC BAM/FASTQ

TCGA BAM/FASTQ

ICGC Open Data (includes TCGA Open Data)


ICGC Controlled Access Datasets

• Detailed Phenotype and Outcome data: Region of residence, Risk factors, Examination, Surgery, Radiation, Sample, Slide, Specific histological features, Analyte, Aliquot, Donor notes

• Gene Expression (probe-level data)

• Raw genotype calls

• Gene-sample identifier links

• Genome sequence files

ICGC Open Access Datasets

• Cancer Pathology: Histologic type or subtype, Histologic nuclear grade

• Patient/Person: Gender, Age range, Vital status, Survival time, Relapse type, Status at follow-up

• Gene Expression (normalized)

• DNA methylation

• Computed Copy Number and Loss of Heterozygosity

• Newly discovered somatic variants

http://goo.gl/w4mrV


Differences between ICGC & TCGA

• Different tumour types

• Different geographic rules

• Many countries vs one jurisdiction

• Different definitions of what is controlled

• Different data access rules


ICGC Controlled Access Datasets

• Detailed Phenotype and Outcome data

• Gene Expression (probe-level data)

• Raw genotype calls

• Gene-sample identifier links

• Genome sequence files

• Germline variants

ICGC Open Access Datasets

• Cancer Pathology: Histologic type or subtype, Histologic nuclear grade

• Patient/Person: Gender, Age range, Vital status, Survival time, Relapse type, Status at follow-up

• Gene Expression (normalized)

• DNA methylation

• Computed Copy Number and Loss of Heterozygosity

• Somatic variants from Exome or WGS

http://goo.gl/w4mrV
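The ICGC access lists above amount to a lookup from data type to access tier, and TCGA draws the line in different places (for example, TCGA lists somatic variants from whole genome sequencing as controlled). Below is a minimal Python sketch of such a lookup for the ICGC tiers listed here. The short keys are hypothetical identifiers, not official ICGC codes, and defaulting unlisted data types to controlled is an assumption of this sketch.

```python
from enum import Enum


class Tier(Enum):
    OPEN = "open"
    CONTROLLED = "controlled"


# ICGC tiers as listed on the slide; keys are hypothetical short identifiers.
ICGC_ACCESS_TIERS = {
    "detailed_phenotype_and_outcome": Tier.CONTROLLED,
    "gene_expression_probe_level": Tier.CONTROLLED,
    "raw_genotype_calls": Tier.CONTROLLED,
    "gene_sample_identifier_links": Tier.CONTROLLED,
    "genome_sequence_files": Tier.CONTROLLED,
    "germline_variants": Tier.CONTROLLED,
    "cancer_pathology": Tier.OPEN,
    "patient_person": Tier.OPEN,
    "gene_expression_normalized": Tier.OPEN,
    "dna_methylation": Tier.OPEN,
    "copy_number_and_loh": Tier.OPEN,
    "somatic_variants_exome_or_wgs": Tier.OPEN,
}


def access_tier(data_type: str) -> Tier:
    """Look up a data type's tier; unlisted types default to CONTROLLED (assumption)."""
    return ICGC_ACCESS_TIERS.get(data_type, Tier.CONTROLLED)


def requires_authorization(data_types: list) -> list:
    """Return the requested data types that would need controlled-access approval."""
    return [t for t in data_types if access_tier(t) is Tier.CONTROLLED]


if __name__ == "__main__":
    request = ["dna_methylation", "genome_sequence_files", "unknown_type"]
    print(requires_authorization(request))
    # prints ['genome_sequence_files', 'unknown_type']
```

Defaulting to controlled is the conservative choice: a data type missing from the table is withheld rather than exposed.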


TCGA Controlled Access Datasets

• Primary sequence data (BAM and FASTQ files)

• SNP6 array level 1 and level 2 data

• Exon array level 1 and level 2 data

• Somatic variants from whole genome sequencing

• Certain information in MAFs

• A full list of controlled-access data types can be found at: http://goo.gl/K1h7zu

TCGA Open Access Datasets

• De-identified clinical and demographic data

• Gene expression data

• Copy number alterations in regions of the genome

• Epigenetic data

• Summaries of data compiled across individuals

• Anonymized single amplicon DNA sequence data

• Somatic variants from scrubbed exome sequencing

http://goo.gl/A1rMRB


Can we do better?


From ICGC/TCGA


Each group was free to decide whether to sequence exomes or whole genomes. A little more than 10% of all genomes were done with whole-genome sequencing. A steering committee was formed, and we decided to analyze these whole genomes in a robust way, with the primary question of figuring out what was hidden in the genomic sequence of cancer patients!


Steering Committee of PCAWG

• Peter Campbell, Sanger Inst.

• Gady Getz, Broad

• Jan Korbel, EMBL

• Lincoln Stein, OICR

• Josh Stuart, UCSC


PanCancer Analysis of Whole Genomes (PCAWG)

• > 2,800 whole-genome T/N pairs with clinical data, from 20 tumour types

• Aligned with one standard pipeline

• Genomic variants determined with 3 pipelines

• 17 working groups

• Start writing papers now
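Variants were determined with three pipelines, which raises the question of how their outputs are combined. One common approach is a vote: keep a variant only if enough pipelines agree on it. Below is a minimal Python sketch, assuming each pipeline's output has been reduced to a set of (chromosome, position, ref, alt) tuples. The 2-of-3 threshold, the pipeline names and the variants are illustrative assumptions, not the PCAWG consensus rules.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# A variant as (chromosome, position, reference base, alternate base).
Variant = Tuple[str, int, str, str]


def consensus_calls(pipeline_calls: Dict[str, Set[Variant]], min_support: int = 2) -> Set[Variant]:
    """Keep variants reported by at least `min_support` of the pipelines."""
    support: Counter = Counter()
    for calls in pipeline_calls.values():
        support.update(set(calls))  # each pipeline counts at most once per variant
    return {variant for variant, n in support.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical outputs from three hypothetical pipelines.
    calls = {
        "pipeline_a": {("1", 1000, "C", "G"), ("2", 500, "A", "T")},
        "pipeline_b": {("1", 1000, "C", "G")},
        "pipeline_c": {("1", 1000, "C", "G"), ("2", 500, "A", "T"), ("3", 42, "G", "A")},
    }
    print(sorted(consensus_calls(calls)))
    # prints [('1', 1000, 'C', 'G'), ('2', 500, 'A', 'T')]
```

Raising `min_support` to 3 keeps only calls on which every pipeline agrees, trading sensitivity for precision.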


Deliverables for PCAWG will include:

• 1st pan-cancer analysis of > 2,800 tumours from a WGS perspective

• RNA, SSM, CNV, methylation analysis & germline

• Published (executable) pipelines

• Docker / Dockstore

• Multiple cloud access to data

• Multiple portal access to data


https://dcc.icgc.org/pcawg


Working Groups (1/2)

1. Novel somatic mutation calling methods
2. Analysis of mutations in regulatory regions
3. Integration of transcriptome and genome
4. Integration of epigenome and genome
5. Consequences of somatic mutations on pathway and network activity
6. Patterns of structural variations, signatures, genomic correlations, retrotransposons, mobile elements
7. Mutation signatures and processes
8. Germline cancer genome


Working Groups (2/2)

9. Inferring driver mutations and identifying cancer genes and pathways
10. Translating cancer genomes to the clinic
11. Evolution and heterogeneity
12. Exploratory: portals, visualization and software infrastructure
13. Molecular subtypes and classification
14. Analysis of mutations in non-coding RNA
15. Exploratory: mitochondrial
16. Exploratory: pathogens
Tech: Technical working group


http://dockstore.org


PCAWG pipelines now on Dockstore


Dockstore testing group: Andrew Duncan, OICR; Christina Yung, OICR; Denis Yuen, OICR; Zhibin Lu, OICR; Brian O’Connor, UCSC; Alex Buchanan, OHSU; Kyle Ellrott, OHSU; Francis Ouellette, OICR; Gordon Saksena, Broad; Junjun Zhang, OICR; Miguel Vazquez, CNIO; Oliver Hofmann, Australia; Solomon Shorser, OICR; Adam Strucka, OHSU


Challenges:

• Too many conference calls!

• Too many clouds. Even though we learned from what not to do with ICGC, we had to learn what not to do in the clouds.

• TCGA and ICGC have different authorization protocols

• Not all data can exist everywhere

• Dockstore testing is taking too long!


Other projects in planning

• ICGC to finish in spring 2018

• Planning for ICGCmed

ICGC 1: 25,000 tumours (DNA, RNA, Epigenome, Clinical data)

ICGCmed: 200,000 tumours (DNA, RNA, Epigenome, Clinical trial)

ICGC 1 was the picture; ICGCmed will be the movie (before and after treatment).

• Submission system with one place for data and metadata

• Tools/links directory portal


29,647


2,834


1477


915


20


17

http://bioinformatics.ca/


12


0-Toronto, 1-Bethesda, 2-Hinxton, 3-Madrid, 4-Queensland, 5-Kyoto, 6-Cannes, 7-Heidelberg, 8-Toronto, 9-Beijing, 10-Mumbai, 11-Boston


10


Informatics & BioComputing @ OICR


9


1


Bioinformatics.ca workshops Content

http://bioinformatics-ca.github.io/

https://goo.gl/CGu13q1


Acknowledgments

ICGC/OICR project leaders: Tom Hudson, John McPherson, Lincoln Stein, Jared Simpson, Paul Boutros, Vincent Ferretti, Francis Ouellette, Jennifer Jennings, Christine Yung

DCC software developers: Vincent Ferretti, Dusan Andric, Phuong-My Do, Francois Gerthoffert, Terry Lin, Michael Moncada, Vitalii Slobodianyk, Bob Tiernay, Douglas Wong, Linda Xiang, Junjun Zhang

Ouellette Lab: Alysha Moncrieffe, Ann Meyer, Zhibin Lu

Web Dev: Joseph Yamada, Kaman Wu, Kim Cullion, Koji Miyauchi, Miyuki Fukuma

ICGC DCC Biocuration: Hardeep Nahal, Marc Perry

Research IT/Systems: David Sutton, Bob Gibson, David Magda, Rob Naccarato, Brian Ott, Gino Yearwood

EGA: Jordi Rambla De Argila, Arcadi Navarro, Audald Iloret, Mauricio Moldes

http://oicr.on.ca http://icgc.org

… and all the patients and their families who are putting their hopes into our work!


http://icgc.org
http://dcc.icgc.org
http://docs.icgc.org

info@icgc.org http://bioinformatics.ca


We are hiring:

• OICR Director

• Genome Technology Director

• Junior Faculty in Informatics & Biocomputing

• Postdoctoral fellows (PDFs)

Interested? Ask Paul Boutros or me.


Muchas gracias! (Thank you very much!)
