Madrid icgc pcawg_2016_slideshare

Canceromatic III - Session I: Pan-Cancer analysis - Changing landscape of data and tools available for reproducible cancer genomics workflows: report from the ICGC trenches. 

Nov 14th 2016
B.F. Francis Ouellette

• Senior Scientist & Associate Director, Informatics and Biocomputing, Ontario Institute for Cancer Research, Toronto, ON

• Associate Professor, Department of Cell and Systems Biology, University of Toronto, Toronto, ON.

ONTARIO INSTITUTE FOR CANCER RESEARCH

You are free to:

Copy, share, adapt, or re-mix;

Photograph, film, or broadcast;

Blog, live-blog, or post video of;

This presentation. Provided that: You attribute the work to its author and respect the rights and licenses associated with its components.

Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero. Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at: http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites

@bffo

Cancer-om-atics Jul 6-9 2009
Cancer-om-atics II Mar 28-30 2011
Canceromatics III Nov 13-16 2016

Disclaimers

I do not (and will not) profit in any way, shape or form, from any of the brands, products or companies I may mention.

I am a big proponent of Open Access, Open Source, Open Data and Open Courseware.

I am on the SAB of many NIH funded projects (SGD, Galaxy, GenomeSpace, H3ABioNet, and HMP2), as well as Elixir and Genome Canada’s SIAC, and the NRC’s KMAC. This comes with a bias on how science should be done!

Outline

Introduction
ICGC
PCAWG
Closing remarks

adapted from https://goo.gl/fQJAz1

ICGC PCAWG Docker Testing

Cancer is a Disease of the Genome

Challenge in Treating Cancer:
Every tumour is different
Every cancer patient is different

Adapted from Tom Hudson
https://www.cancer.gov/research/areas/genomics

Large-Scale Studies of Cancer Genomes

Johns Hopkins
More than 18,000 genes analyzed for mutations
11 breast and 11 colon tumors
L.D. Wood et al, Science, Oct. 2007

Wellcome Trust Sanger Institute
518 genes analyzed for mutations
210 tumors of various types
C. Greenman et al, Nature, Mar. 2007

TCGA (NIH)
Multiple technologies
brain (glioblastoma multiforme), lung (squamous carcinoma), and ovarian (serous cystadenocarcinoma).
F.S. Collins & A.D. Barker, Sci. Am, Mar. 2007

Lessons learned from early studies

Heterogeneity within and across tumor types
High rate of abnormalities (driver vs passenger)
Sample quality matters
Consent and controlled data access are complicated

MR Stratton et al. Nature 458, 719-724 (2009) doi:10.1038/nature07943

Analysis Data Types

Simple Somatic Mutations (SSM or SNV)
Copy Number Alterations (CNA or CNV)
Structural Variants (SV)
Germline variants (SNPs)
Gene Expression (micro-arrays and RNASeq)
miRNA Expression (RNASeq)
Epigenomics (Arrays and Methylation)
Splicing Variation (RNASeq)
Protein Expression (Arrays)

Rationale for the ICGC:

Scope is huge
Reduce duplication of effort
Standardization and uniform quality measures
Merging of datasets
Spectrum of many cancers varies across the world
Accelerate the dissemination of genomic and analytical methods

International Cancer Genome Consortium

Collect ~500 tumour/normal pairs from each of 50 different major cancer types; 25,000 T/N pairs!

Comprehensive genome analysis of each T/N pair:
Genome
Transcriptome
Methylome
Clinical data

Make the data available to the research community & public.

Identify genome changes

…GATTATTCCAGGTAT…
…GATTATTGCAGGTAT…
…GATTATTGCAGGTAT…

Adapted from Tom Hudson
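
The "identify genome changes" idea on this slide can be illustrated with a tiny sketch. It compares two of the aligned fragments shown on the slide and reports the positions where they differ. The function name `mismatches` is made up for this illustration, the slide does not say which fragment is tumour and which is normal, and real somatic variant calling works on aligned reads with dedicated pipelines rather than on string comparison.

```python
def mismatches(fragment_a, fragment_b):
    """Return (position, base_a, base_b) for every position where two
    equal-length, already-aligned fragments differ (0-based positions)."""
    if len(fragment_a) != len(fragment_b):
        raise ValueError("fragments must be the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(fragment_a, fragment_b))
        if a != b
    ]


# The two distinct fragments shown on the slide (ellipses dropped).
fragment_a = "GATTATTCCAGGTAT"
fragment_b = "GATTATTGCAGGTAT"

for position, base_a, base_b in mismatches(fragment_a, fragment_b):
    print(f"position {position}: {base_a} -> {base_b}")
```

The two fragments differ at a single base, which is the kind of change a tumour/normal comparison is meant to surface.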

International Cancer Genome Consortium: http://icgc.org

Data Submission → Validation (dictionary) → Validation (across fields) → indexing → Happy Users

http://goo.gl/1EcyR
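
The two validation stages in this flow can be sketched roughly as follows: first each field is checked against a dictionary of allowed values, then rules that span several fields are applied. This is only an illustration of the idea. The field names, allowed values and the cross-field rule are all hypothetical and are not taken from the ICGC submission dictionary.

```python
# Hypothetical dictionary of allowed values per field (illustration only).
DICTIONARY = {
    "donor_sex": {"male", "female"},
    "donor_vital_status": {"alive", "deceased"},
}


def validate_dictionary(record):
    """Stage 1: every dictionary-controlled field must hold an allowed value."""
    errors = []
    for field, allowed in DICTIONARY.items():
        value = record.get(field)
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not an allowed value")
    return errors


def validate_across_fields(record):
    """Stage 2: check a rule that involves more than one field.
    Hypothetical rule: follow-up cannot precede diagnosis."""
    errors = []
    diagnosis = record.get("age_at_diagnosis")
    followup = record.get("age_at_last_followup")
    if diagnosis is not None and followup is not None and followup < diagnosis:
        errors.append("age_at_last_followup is lower than age_at_diagnosis")
    return errors


def validate(record):
    """Run the stages in order; cross-field checks only run on records
    that already passed the dictionary stage."""
    errors = validate_dictionary(record)
    if not errors:
        errors = validate_across_fields(record)
    return errors


record = {
    "donor_sex": "female",
    "donor_vital_status": "alive",
    "age_at_diagnosis": 52,
    "age_at_last_followup": 50,
}
print(validate(record))
```

Running the stages in order means a record with an invalid field value is rejected before any cross-field rule tries to interpret it.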

ICGC needs to deal with different kinds of users!

Biologists/Clinicians:
Web interface to processed data, providing:
Affected gene lists with consequences
Impact on pathways

Power users:
Application Programming Interface (API) to get to data
Availability and integration with cloud resources

ICGC Data Coordinating Centre: dcc.icgc.org

BRAF missense mutations in colorectal cancer

https://dcc.icgc.org/

https://dcc.icgc.org/icgc-in-the-cloud

http://www.cancercollaboratory.org/

http://docs.icgc.org/

User and submitter documentation

Software development discussions

https://discuss.icgc.org/

Some challenges:

So, we have lots of data. Is it generated the same way?

Every country/group has basically been submitting:

Simple Somatic Mutations (SSM or SNV)
Copy Number Alterations (CNA or CNV)
Structural Variants (SV)
Germline variants (SNPs)
Gene Expression (micro-arrays and RNASeq)
miRNA Expression (RNASeq)
Epigenomics (Arrays and Methylation)
Splicing Variation (RNASeq)
Protein Expression (Arrays)

Are they all using the same pipelines?

No

http://goo.gl/CekF6y

Missing Clinical Data?

Are we all using the same definition for controlled access data?

No

ICGC BAM/FASTQ

TCGA BAM/FASTQ

ICGC Open Data (includes TCGA Open Data)

ICGC Controlled Access Datasets

• Detailed Phenotype and Outcome data: Region of residence, Risk factors, Examination, Surgery, Radiation, Sample, Slide, Specific histological features, Analyte, Aliquot, Donor notes
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files

ICGC Open Access Datasets

• Cancer Pathology: Histologic type or subtype, Histologic nuclear grade
• Patient/Person: Gender, Age range, Vital status, Survival time, Relapse type, Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and Loss of Heterozygosity
• Newly discovered somatic variants

http://goo.gl/w4mrV

ICGC

TCGA

Differences between ICGC & TCGA
• Different tumour types
• Different geographic rules
• Many countries vs one jurisdiction
• Different definitions of what is controlled
• Different data access rules

ICGC Controlled Access Datasets

• Detailed Phenotype and Outcome data
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files
• Germ line variants

ICGC OpenAccess Datasets

• Cancer Pathology: Histologic type or subtype, Histologic nuclear grade
• Patient/Person: Gender, Age range, Vital status, Survival time, Relapse type, Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and Loss of Heterozygosity
• Somatic variants from Exome or WGS

http://goo.gl/w4mrV

TCGA Controlled Access Datasets

• Primary sequence data (BAM and FASTQ files)
• SNP6 array level 1 and level 2 data
• Exon array level 1 and level 2 data
• Somatic variants from whole genome sequencing
• Certain information in MAFs
• A full list of controlled-access data types can be found at: http://goo.gl/K1h7zu

TCGA OpenAccess Datasets

• De-identified clinical and demographic data
• Gene expression data
• Copy number alterations in regions of the genome
• Epigenetic data
• Summaries of data compiled across individuals
• Anonymized single amplicon DNA sequence data
• Somatic variants from scrubbed exome sequencing

http://goo.gl/A1rMRB
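
The point that ICGC and TCGA define "controlled" differently can be made concrete with a small lookup sketch. The access levels below are transcribed from the ICGC and TCGA dataset lists in this talk. The dictionary keys are shorthand labels invented for this example, and only a handful of data types are included.

```python
OPEN, CONTROLLED = "open", "controlled"

# Access level per data type, as listed on the ICGC and TCGA slides.
# Key names are shorthand made up for this sketch.
ICGC_ACCESS = {
    "primary_sequence_files": CONTROLLED,  # "Genome sequence files"
    "germline_variants": CONTROLLED,
    "wgs_somatic_variants": OPEN,          # "Somatic variants from Exome or WGS"
    "exome_somatic_variants": OPEN,
}

TCGA_ACCESS = {
    "primary_sequence_files": CONTROLLED,  # "BAM and FASTQ files"
    "wgs_somatic_variants": CONTROLLED,    # "Somatic variants from whole genome sequencing"
    "exome_somatic_variants": OPEN,        # "scrubbed exome sequencing"
}


def policy_conflicts(policy_a, policy_b):
    """Return the data types present in both policies whose access level differs."""
    shared = sorted(set(policy_a) & set(policy_b))
    return [data_type for data_type in shared if policy_a[data_type] != policy_b[data_type]]


print(policy_conflicts(ICGC_ACCESS, TCGA_ACCESS))
```

With these entries, somatic variants from whole genome sequencing are the one shared data type that is open under one policy and controlled under the other, which is exactly the kind of mismatch a joint analysis has to handle.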

Can we do better?

From ICGC/TCGA

Each group has been free to decide on its own whether to sequence exomes or whole genomes.
A bit more than 10% of all genomes were done with whole genome sequencing.
A steering committee was formed and we decided to analyze these whole genomes in a robust way, with the primary question of figuring out what was hidden in the genomic sequence of cancer patients!

Steering Committee of PCAWG

Peter Campbell, Sanger Inst.
Gady Getz, Broad
Jan Korbel, EMBL
Lincoln Stein, OICR
Josh Stuart, UCSC

PanCancer Analysis of Whole Genomes (PCAWG)

More than 2,800 T/N pairs with clinical data, from 20 tumour types, for whole genome analysis.
Aligned with one standard pipeline.
Genomic variants determined with 3 pipelines
17 working groups
Start writing papers now

Deliverables for PCAWG will include:

1st PANCANCER analysis on > 2,800 cancer tumours from a WGS perspective
RNA, SSM, CNV, Methylation analysis & germline
Published (executable) pipelines
Docker / Dockstore
Multiple cloud access to data
Multiple portal access to data

https://dcc.icgc.org/pcawg

Working Groups (1/2)
1 Novel somatic mutation calling methods
2 Analysis of mutations in regulatory regions
3 Integration of transcriptome and genome
4 Integration of epigenome and genome
5 Consequences of somatic mutations on pathway and network activity
6 Patterns of structural variations, signatures, genomic correlations, retrotransposons, mobile elements
7 Mutation signatures and processes
8 Germline cancer genome

Working Groups (2/2)
9 Inferring driver mutations and identifying cancer genes and pathways
10 Translating cancer genomes to the clinic
11 Evolution and heterogeneity
12 Exploratory: portals, visualization and software infrastructure
13 Molecular subtypes and classification
14 Analysis of mutations in non-coding RNA
15 Exploratory: mitochondrial
16 Exploratory: pathogens
Tech Technical working group

http://dockstore.org

PCAWG pipelines now on Dockstore

DOCKSTORE testing group
Andrew Duncan, OICR
Christina Yung, OICR
Denis Yuen, OICR
Zhibin Lu, OICR
Brian O’Connor, UCSC
Alex Buchanan, OHSU
Kyle Ellrott, OHSU
Francis Ouellette, OICR
Gordon Saksena, Broad
Junjun Zhang, OICR
Miguel Vazquez, CNIO
Oliver Hofmann, Australia
Solomon Shorser, OICR
Adam Strucka, OHSU

Challenges:
Too many conference calls!
Too many clouds
Even though we learned from what not to do with ICGC, we had to learn what not to do in the clouds.
TCGA and ICGC have different authorization protocols
Not all data can exist everywhere
Dockstore testing is taking too long!

Other projects in planning
ICGC to finish in Spring of 2018
Planning for ICGCmed

ICGC 1: 25,000 tumours (DNA, RNA, Epigenome, Clinical data)
ICGCmed: 200,000 tumours (DNA, RNA, Epigenome, Clinical trial)
ICGC1 was the picture, ICGCmed will be the movie (before and after treatment).

Submission system with one place for data and metadata
Tools/links directory portal

29,647

2,834

1477

915

20

17

http://bioinformatics.ca/

12

0-Toronto
1-Bethesda
2-Hinxton
3-Madrid
4-Queensland
5-Kyoto
6-Cannes
7-Heidelberg
8-Toronto
9-Beijing
10-Mumbai
11-Boston

10

Informatics & BioComputing @ OICR

9

1

Bioinformatics.ca workshops Content

http://bioinformatics-ca.github.io/

https://goo.gl/CGu13q1

Acknowledgments

ICGC/OICR Project leaders: Tom Hudson, John McPherson, Lincoln Stein, Jared Simpson, Paul Boutros, Vincent Ferretti, Francis Ouellette, Jennifer Jennings, Christina Yung

DCC Software Developer: Vincent Ferretti, Dusan Andric, Phuong-My Do, Francois Gerthoffert, Terry Lin, Michael Moncada, Vitalii Slobodianyk, Bob Tiernay, Douglas Wong, Linda Xiang, Junjun Zhang

Ouellette Lab: Alysha Moncrieffe, Ann Meyer, Zhibin Lu

Web Dev: Joseph Yamada, Kaman Wu, Kim Cullion, Koji Miyauchi, Miyuki Fukuma

ICGC DCC Biocuration: Hardeep Nahal, Marc Perry

Research IT/Systems: David Sutton, Bob Gibson, David Magda, Rob Naccarato, Brian Ott, Gino Yearwood

EGA: Jordi Rambla De Argila, Arcadi Navarro, Audald Iloret, Mauricio Moldes

http://oicr.on.ca http://icgc.org

… and all the patients and their families that are putting their hopes into our work!

http://icgc.org
http://dcc.icgc.org
http://docs.icgc.org

http://bioinformatics.ca

We are hiring:

• OICR Director
• Genome Technology Director
• Junior Faculty in Informatics & Biocomputing
• PDFs

Interested? Ask Paul Boutros or me

Muchas gracias!