Upload
gertrude-blake
View
224
Download
0
Tags:
Embed Size (px)
Citation preview
Cancer Sequencing
What is Cancer?
Definitions• A class of diseases
characterized by malignant growth of a group of cells– Growth is uncontrolled– Invasive and Damaging– Often able to metastasize
• An instance of such a disease (a malignant tumor)
• A disease of the genome
http://en.wikipedia.org/wiki/Cancer http://faculty.ksu.edu.sa/tatiah/Pictures%20Library/normal%20male%20karyotyping.jpg
What is Cancer?
Definitions• A class of diseases
characterized by malignant growth of a group of cells– Growth is uncontrolled– Invasive and Damaging– Often able to metastasize
• An instance of such a disease (a malignant tumor)
• A disease of the genome
http://en.wikipedia.org/wiki/Cancer http://www.moffitt.org/CCJRoot/v2n5/artcl2img4.gif
Fundamental Changes in Cancer Cell Physiology
Evasion of anti-cancer control mechanisms• Apoptosis (e.g. p53)• Antigrowth signals (e.g. pRb)• Cell Senescence
Hanahan and Weinberg. 2000. The hallmarks of cancer. Cell 100: 57-70.
Exploitation of natural pathways for cellular growth• Growth Signals (e.g. TGF family)• Angiogenesis• Tissue Invasion & Metastasis
Acceleration of Cellular Evolution Via Genome Instability• DNA Repair• DNA Polymerase
Many Paths Lead to Cancer Self-Sufficiency
Hanahan, Douglas, and Ra Weinberg. 2000. The hallmarks of cancer. Cell 100: 57-70.
Cancer Heterogeneity
Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–13 (2012).
Why Sequence Cancer Genomes?
• Better understand cancer biology– Pathway information– Types of mutations found in
different cancers
Why Sequence Cancer Genomes?
• Better understand cancer biology– Pathway information– Types of mutations found in
different cancers
• Cancer Diagnosis– Genetic signatures of cancer types will
inform diagnosis– Non-invasive means of detecting or
confirming presence of cancer
• Improve cancer therapies– Targeted treatment of cancer subtypes
COSMIC Database, v48, July 2010http://www.sanger.ac.uk/genetics/CGP/cosmic/
Forbes et al. 2011. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Research 39: D945-D950
Samples 544809
Mutations 141212
Papers 10383
Whole Genomes 29
Why Sequence Cancer Genomes?
• Better understand cancer biology– Pathway information– Types of mutations found in
different cancers
• Cancer Diagnosis– Genetic signatures of cancer types will
inform diagnosis– Non-invasive means of detecting or
confirming presence of cancer
• Improve cancer therapies– Targeted treatment of cancer subtypes
COSMIC Database, v71, Oct 2014http://www.sanger.ac.uk/genetics/CGP/cosmic/
Samples 1058292
Mutations 2710449
Papers 20247
Whole Genomes 15047
How Do We Sequence Cancer Genomes?
How Do We Sequence Cancer Genomes?
Read Mapping
Definition of Coverage
Length of genomic segment: LNumber of reads: nLength of each read: l
Definition: Coverage C = n l / L
How much coverage is enough?
Lander-Waterman model:Assuming uniform distribution of reads, C=10 results in 1 gapped region /1,000,000 nucleotides
C
Read Mapping
BWA
Paired-End Read Mapping
ReferencePhysical Coverage: 4Sequence Coverage: 2
• Physical coverage refers to the genomic coverage including the unsequenced regions of each DNA fragment
• Sequence coverage refers to the genomic coverage counting only the sequenced part of each DNA fragment
• Increased gap length between paired reads provides higher physical coverage without incurring increased costs for sequencing, which is useful for detecting certain types of mutations
• Factors that effect mutation signal– Limited genetic material (lower depth)– Mixture of tumor and normal tissue– Cancer Heterogeneity
• Factors that introduce noise– Formalin-fixed and Paraffin-embedded samples– Increased number of mutations and unusual genomic rearrangements
• General Consideration– Each individual has many unique mutations that could be confused with
cancer causing mutations
Considerations for Cancer Sequencing
Human Genome Variation
SNP TGCTGAGATGCCGAGA Novel Sequence TGCTCGGAGA
TGC - - - GAGA
Inversion Mobile Element orPseudogene Insertion
Translocation Tandem Duplication
Microdeletion TGC - - AGATGCCGAGA Transposition
Large Deletion Novel Sequenceat Breakpoint
TGC
Variant TypesVariant Types
Single Nucleotide Variants(SNVs)
Small Insertion / Deletion (indels)
Copy Number Variants (CNVs)
Structural Variants (SVs)
Novel Sequence
SNV CallingVariant Types
Single Nucleotide Variants(SNVs)
Small Insertion / Deletion (indels)
Copy Number Variants (CNVs)
Structural Variants (SVs)
Novel Sequence
• A bayesian approach is the most general and common method of calling SNVs– MAQ, SOAPsnp, Genome Analyis ToolKit (GATK),
SAMtools
SNV CallingVariant Types
Single Nucleotide Variants(SNVs)
Small Insertion / Deletion (indels)
Copy Number Variants (CNVs)
Structural Variants (SVs)
Novel Sequence
http://www.broadinstitute.org/gatk//events/2038/GATKwh0-BP-5-Variant_calling.pdf
SNV CallingVariant Types
Single Nucleotide Variants(SNVs)
Small Insertion / Deletion (indels)
Copy Number Variants (CNVs)
Structural Variants (SVs)
Novel Sequence
• A given human genome (germline) differs from the reference genome at millions of positions.
• A cancer genome differs from the healthy genome of its host by tens of thousands of positions at most, which is several orders of magnitude fewer differences than germline versus reference
• How do we distinguish germline mutations from somatic mutations?
Somatic SNV calling
Tumor TissueNormal Tissue
Compare the alignment results
• Most naïve: use a standard SNV caller on both datasets. If there is a mutation found in the tumor sample but not the normal, it is somatic!
Somatic SNV calling
Roth, A. et al. JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics 28, 907–13 (2012).
JointSNVMix • probabilistic graphical models for joint
tumor-normal SNV calling
Short Indel CallingVariant Types
Single Nucleotide Variants(SNVs)
Short Insertion / Deletion (indels)
Copy Number Variants (CNVs)
Structural Variants (SVs)
Novel Sequence
Reference
Deletion
Insertion
Short Indel CallingVariant Types
Single Nucleotide Variants(SNVs)
Short Insertion / Deletion (indels)
Copy Number Variants (CNVs)
Structural Variants (SVs)
Novel Sequence
Reference
Deletion
Insertion
Reference
Read mappingin practice
Unmappable part of read (just the read end)
Unmapped read (could not be alignedanywhere)
Short Indel Calling – Discordant Reads Pairs
II) Deletion
I) Insertion
i
d
l
l - i
l + d
l
Variant Types
Single Nucleotide Variants(SNVs)
Short Insertion / Deletion (indels)
Copy Number Variants (CNVs)
Structural Variants (SVs)
Novel Sequence
Reference
Reference
Short Indel Calling – Split Read Mapping
Variant Types
Single Nucleotide Variants(SNVs)
Short Insertion / Deletion (indels)
Copy Number Variants (CNVs)
Structural Variants (SVs)
Novel Sequence
Reference
Reference
Deletion
Read mappingin practice
Short Indel Calling – Split Read Mapping
Variant Types
Single Nucleotide Variants(SNVs)
Short Insertion / Deletion (indels)
Copy Number Variants (CNVs)
Structural Variants (SVs)
Novel Sequence
Reference
Reference
Deletion
Read mappingin practice
Remap each end of thesuspicious reads
Paired-end mapping can improve power to detect variants without need for more sequencing
Modified from Meyerson et al. . 2010. Advances in understanding cancer genomes through second-generation sequencing. Nature Reviews Genetics 11, no. 10 (October): 685-696
Copy Number Variants
Ref: A B C D E F G H I K
A B C D C E F G H C I K
A B C D C E F G H C I K
Variant Types
Single Nucleotide Variants(SNVs)
Short Insertion / Deletion (indels)
Copy Number Variants (CNVs)
Structural Variants (SVs)
Novel Sequence
Copy Number Variants
Ref: A B C D E F G H I K
A B C D C E F G H C I K
C C C
C Depth of Coverage
Modified from Dalca and Brudno. 2010. Genome variation discovery with high-throughput sequencing data. Briefings in bioinformatics 11, no. 1: 3-14
Variant Types
Single Nucleotide Variants(SNVs)
Short Insertion / Deletion (indels)
Copy Number Variants (CNVs)
Structural Variants (SVs)
Novel Sequence
• Problems with DOC – Very sensitive to stochastic variance in coverage– Sensitive to bias coverage (e.g. GC content).– Impossible to determine non-reference locations of CNVs
• Graph methods using paired-end reads help overcome some of these problems
Copy Number Variants
Ref: A B C D E F G H I K
A B C D C E F G H C I K
C C C
C Depth of Coverage
Variant Types
Single Nucleotide Variants(SNVs)
Small Insertion / Deletion (indels)
Copy Number Variants (CNVs)
Structural Variants (SVs)
Novel Sequence
Copy Number Variants - CNAnorm
Gusnanto, A., Wood, H. M., Pawitan, Y., Rabbitts, P. & Berri, S. Correcting for cancer genome size and tumour cell content enables better estimation of copy number alterations from next-generation sequence data. Bioinformatics 28, 40–7 (2012).
Overall steps in CNAnorm method, a tool for detecting copy number changes in tumor samples
Data: sequence data from tumor and normal samples
Steps:
1) Count number of reads in fixed windows across the genome
2) Calculate ratio of reads in tumor vs. reads in normal for each window, correcting for sequence biases (e.g. GC)
3) Smooth ratio signal across windows
4) Normalize data
5) Estimate amount of normal contamination in tumor sample
6) Perform segmentation on tumor data
Variant Types
Ref: A B C D E F G H I K
1 2 3 4 5 6 7 8
4 G I K1 2 3
1 2 4 3 5 6 7 8
Structural Rearrangement
Translocation
3 2 1 5 6 7 8 Inversion
1 3 5 9 6 7 8 Large Insertion / Deletion
2̂
Variant Types
Single Nucleotide Variants(SNVs)
Short Insertion / Deletion (indels)
Copy Number Variants (CNVs)
Structural Variants (SVs)
Novel Sequence
Summary of Variant Types
Meyerson et al. . 2010. Advances in understanding cancer genomes through second-generation sequencing. Nature Reviews Genetics 11, no. 10 (October): 685-696
Passenger Mutations and Driver Mutations
Normal
CancerXX
Driver or Passenger?
Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–13 (2012).
Sequence
Passenger Mutations and Driver Mutations
Stratton, Michael R, Peter J Campbell, and P Andrew Futreal. 2009. The cancer genome. Nature 458, no. 7239 (April): 719-24. doi:10.1038/nature07943
Passenger Mutations and Driver Mutations
Distinguishing Features• Presence in many tumors• Predicted to have functional
impact on the cell– Conserved– Not seen in healthy adults
(rare)– Predicted to affect protein
structure
• In pathways known to be involved in cancer
Train Classifier using Machine Learning Approaches
Carter et al. 2009. Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer research, no. 16: 6660-6667
Tracking the Evolution of Cancer
Models of Breast Cancer Progression
Models of Breast Cancer Progression
What we did
• Cancer phylogenetics– Lineage relationship of
neoplastic lesions with cancers using somatic SNVs as lineage markers
– Order of genomic events and drivers
Slide Courtesy of Arend Sidow
Samples
P1 P2 P3 P4 P5 P6LymphNormal
CCLFEADCISIDC
Side 1 Side 2
All samples are FFPE material
Slide Courtesy of Arend Sidow
Samples
Patient 1 Evolution – SNVs
GGATAGATAGCG
GCGTCCTAGCGT
CCATGGCATGGCCATGGC
GGCAAA
Normal sample
Early neoplasia(EN) sample
EN with atypia(ENA) sample
Invasive ductal carcinoma(IDC) sample
GGATAGTGTCCATGGCAAA
Reads from sequencing patient sample
Human genome reference
Patient 1 Evolution – SNVs
Normal sample
Early neoplasia(EN) sample
EN with atypia(ENA) sample
Invasive ductal carcinoma(IDC) sample
Reads from sequencing patient sample
Human genome reference
CCC
Multisample SNV Code
1 0 0 0
Normal EN ENA IDC
Patient 1 Evolution – SNVs
Normal sample
Early neoplasia(EN) sample
EN with atypia(ENA) sample
Invasive ductal carcinoma(IDC) sample
Reads from sequencing patient sample
Human genome reference
Multisample SNV Code
0 1 1 0
Normal EN ENA IDC
C
CC
CCC
Patient 1 Evolution – SNVs
Code Normal EN ENA IDC SUM
1000 89 89
0100 147 147
0010 102 102
0001 46 46
0011 755 755 755
Patient 1 Evolution – SNVs
Code Normal EN ENA IDC SUM
1000 89 89
0100 147 147
0010 102 102
0001 46 46
0011 755 755 755
75546
102
89 147
Normal EN
ENA IDC
Venn diagram view
Patient 1 Evolution – SNVs
Code Normal EN ENA IDC SUM
1000 89 89
0100 147 147
0010 102 102
0001 46 46
0011 755 755 755
75546
102
89 147
Normal EN
ENA IDC
P1
Patient 6 Evolution - SNVs
Lymph CCL_CL CCL FEA DCIS IDC SUM
010000 219 219
001000 305 305
000100 345 345
000010 978 978
000001 608 608
000101 61 61 61
000110 185 185 185
000111 510 510 510 510
01XXXX 0
010101 0
000011 0
3211
LINEAGE CONCEPTSSomatic changes
Cell Divisions in One Generation
~40 celldivisions
~60 newpoint
mutations
“Germline”
?D
You
Your dad
“Germline”
You
Your dad
Mutations detected here ...
... but not here ...
?D
Somatic (Tumor) Lineages
Sampled lesion
Somatic (Tumor) Lineages
Sampled lesion
Somatic (Tumor) Lineages
Sampled lesion 1 Sampled lesion 2
Patient 6 Evolution - SNVs
Patient 6 Evolution
Slide Courtesy of Arend Sidow
Patient 6 Evolution – Copy Number Changes
Slide Courtesy of Arend Sidow
ANEUPLOIDIESPutting the germline SNPs to good use (no somatic SNVs for this!)
Heterozygous Positions
G A Tm
A T Cp
50% 50% 50%
LOH (e.g., paternal chromosome)
G A Tm
A T Cp
0% 0% 0%
Fraction of “lesser allele”
Chromosome Duplication
G A Tm
A T Cp A T C
66% 66% 66%
Chromosome Duplication
G A Tm
A T Cp A T C
33% 33% 33%
Fraction of “lesser allele”
Lesser Allele Fraction Plots
Chromosome
Lesser allele fraction
Running number of germlinehet SNP (N ~ 1.7 million)
Plots are windows of 1000 SNPs, overlapping by 500
Lesser Allele Fraction Plots
Zoom-In
But ... What is the actual ploidy?
Absolute Coverage Pattern of LOH and Gain
Normal LOH Gain
Absolute Coverage in LOH vs Ploidy Gain
Prevalent allele absolute coverage
Lesser allele absolute coverage
Lesser allele FRACTION
LOH
Gain
?
Fractions with normal contribution
GA
Prevalent allele absolute coverage = 7
Lesser allele absolute coverage = 0
GA
Prevalent allele absolute coverage = 14
Lesser allele absolute coverage = 7
Our samples: up to 50% normal (non-tumor)
tissue contentLesser allele fraction = 7/21 = 0.33
Patient 6 Evolution – Copy Number Changes
Slide Courtesy of Arend Sidow
Patient 6 Evolution
Slide Courtesy of Arend Sidow
Patient 2
Lymph Normal CCL FEA IDCDCIS
Patient 2 - normal
Patient 2 – CCL and DCIS
16q: 1N (LOH)
1q: 4N (3:1)
X: 1N (LOH)
16p: 3N (2:1)
Patient 2 – IDC has same as CCL,DCIS
Patient 2 Aneuploidy Evolution
CCL,DCIS IDC
1q,16p16q, X
Patient 2 - IDC
Patient 2 Aneuploidy Evolution
CCL,DCIS IDC IDC’
1q,16p16q, X
Major CrisisInvolving all but 6 chromosomes, including 10 whole-chromosome LOHs
No aneuploidies but ...
Patient 2 – SNVs
515
5
70
0
681 80
133
Allele Freq in CCLCCL: 894
IDC: 1276
DCIS: 884
Patient 2 Evolution
1q,16p16q, X
1p245891011131415171921681
133
80 515
Patient Cancer Phylogeny Trees
Slide Courtesy of Arend Sidow
Mutational Profiles
Branched tree model
Victoria Popic
Automated Inference of Multi-Sample Cancer Phylogenies
Raheleh Salari
Sample 1
Sample 3
SMutH: Somatic Mutation Hierarchies
Sample 2
VAF profiles of SNVs across samples
VAF profiles of SNVs – Clustering
Edge u v :
Cell-Lineage VAF Constraint
u
v
“Possibly mutations in u happened before those in v”
For each node u and its children C :
Tree Construction
Find all spanning trees that satisfy VAF constraints
(extension of Gabow&Myers spanning tree search algorithm)
Rank trees according to their agreement with VAFs
u
vw
Simulation Results
Pred: pairs of nodes ordered correctlyBranch: pairs of nodes correctly assigned to separate branchesShared edges: edges shared between true and reconstructed trees
u
vwu
vw z
yx
ccRCC Study of Renal Carcinoma by Gerlinger et. al
(2014)
HGSC Study of Ovarian Cancer Bashashati et. al (2013)
Reconstruction of Lineage Trees in Recent Literature
PIK3CA H1047R
PIK3CA H1047L
Expanded Breast Cancer Lineage Trees
• Fantastic Cancer Reviews– Hanahan and Weinberg. 2000. The hallmarks of cancer. Cell 100: 57-70.– Hanahan and Weinberg. 2011. Hallmarks of cancer: the next generation. Cell
144, 646–74.• Reviews of Cancer Genomics
– Meyerson, Matthew, Stacey Gabriel, and Gad Getz. 2010. Advances in understanding cancer genomes through second-generation sequencing. Nature Reviews Genetics 11, no. 10 (October): 685-696. doi:10.1038/nrg2841. http://www.nature.com/doifinder/10.1038/nrg2841.
– Yates, L. R. & Campbell, P. J. Evolution of the cancer genome. Nat. Rev. Genet. 13, 795–806 (2012).
• Variant Calling– Dalca, Adrian V, and Michael Brudno. 2010. Genome variation discovery with
high-throughput sequencing data. Briefings in bioinformatics 11, no. 1 (January): http://www.ncbi.nlm.nih.gov/pubmed/20053733.
– Medvedev, Paul, Monica Stanciu, and Michael Brudno. 2009. Computational methods for discovering structural variation with next-generation sequencing. nature methods 6, no. 11 http://www.nature.com/nmeth/journal/v6/n11s/full/nmeth.1374.html.
Further Readings for the Curious