Validating, Enhancing and Using GiaB Reference Materials
GiaB Workshop, Stanford University
January 29, 2016
Bina Technologies, Roche Sequencing
Marghoob Mohiyuddin
Mohammad Sahraeian
Hugo Y. K. Lam
Background
For Research Use Only. Not for use in diagnostic procedures.
What we do
Collaborative scientific innovation
SeqAlto
VarSim
Benchmarking variant-calling
The VarSim framework
Simulate and validate whole genomes
Both somatic and germline simulation supported
Comprehensive simulation and validation of multiple kinds of variants
  SNPs, small indels, SVs
Assessing variant-calling
VarSim simulation of 50x Illumina 2x100bp reads for NA12878
Error profiles from NA12878 platinum genomes sample
SNVs and small indels from GiaB high-confidence set
SVs from multiple sources
  Deletions from 1000Genomes
  Insertions randomly sampled from DGV, with sequences taken from HuRef insertion sequences
  Inversions and duplications randomly sampled from DGV
Small variant calling accuracy
Validating GiaB gold set
GiaB SVs
GiaB released high-confidence SVs for NA12878
  2,676 deletions, 68 insertions
Trio sequences available from Illumina Platinum Genomes
MetaSV calls SVs by integrating calls from multiple methods
  http://bioinform.github.io/metasv/
Validation of GiaB gold set using MetaSV trio analysis ensures quality of GiaB gold set
MetaSV trio analysis
Validate by analyzing trios (50x coverage)
  MetaSV calls for parents (NA12891, NA12892)
  MetaSV calls for NA12878
Criteria
  Deletions >= 100bp considered (2,348/88% in GiaB)
  Reciprocal overlap of 50%
  GiaB deletion validated if
    Detected by MetaSV in any parent (multiple samples)
    Detected by MetaSV as high-confidence in NA12878 (multiple methods)
    Reported in previous literature
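The 50% reciprocal-overlap criterion can be sketched as follows. This is a minimal illustration, not code from MetaSV or GiaB: the function names, the half-open coordinate convention, and the example coordinates are assumptions made for the example.

```python
def reciprocal_overlap(start_a, end_a, start_b, end_b):
    """Return the reciprocal overlap of two half-open intervals [start, end).

    The result is the smaller of the two per-interval overlap fractions,
    so both intervals must be covered by at least that fraction.
    """
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= 0:
        return 0.0
    return min(overlap / (end_a - start_a), overlap / (end_b - start_b))


def deletions_match(del_a, del_b, min_ro=0.5):
    """Decide whether two deletions, each given as (chrom, start, end), match.

    min_ro=0.5 mirrors the 50% reciprocal-overlap threshold on the slide.
    """
    chrom_a, start_a, end_a = del_a
    chrom_b, start_b, end_b = del_b
    if chrom_a != chrom_b:
        return False
    return reciprocal_overlap(start_a, end_a, start_b, end_b) >= min_ro


# Hypothetical coordinates, for illustration only.
giab_del = ("chr1", 1000, 2000)
metasv_del = ("chr1", 1400, 2300)
print(deletions_match(giab_del, metasv_del))
```

Taking the minimum of the two fractions prevents a short call nested inside a much longer one from counting as a match, since the longer call would be covered by only a small fraction.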
MetaSV: an ensemble approach
MetaSV workflow
Merge SVs from multiple methods
  Calls supported by multiple methods are high-confidence
  8 SV callers supported
Enhanced insertion detection
  Soft-clip analysis + assembly
Assembly and alignment to refine breakpoints
Supports Del, Ins, Inv, Dup, Trans.
Supported in bcbio
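The merge step, where calls from several methods are combined and multi-method support marks a call as high-confidence, might look roughly like the sketch below. It is not MetaSV's implementation: the greedy clustering, the `SVCall` type, the caller names, the `LowQual` label, and the two-caller threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SVCall:
    chrom: str
    start: int
    end: int
    svtype: str
    caller: str  # hypothetical caller name


def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between calls a and b."""
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def merge_calls(calls, min_ro=0.5, min_callers=2):
    """Greedily cluster same-type calls and flag multi-caller clusters as PASS."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end)):
        for cluster in clusters:
            rep = cluster[0]  # first call in the cluster acts as representative
            if (rep.chrom == call.chrom and rep.svtype == call.svtype
                    and reciprocal_overlap(rep, call) >= min_ro):
                cluster.append(call)
                break
        else:
            clusters.append([call])

    merged = []
    for cluster in clusters:
        callers = sorted({c.caller for c in cluster})
        merged.append({
            "chrom": cluster[0].chrom,
            "start": min(c.start for c in cluster),
            "end": max(c.end for c in cluster),
            "svtype": cluster[0].svtype,
            "callers": callers,
            "filter": "PASS" if len(callers) >= min_callers else "LowQual",
        })
    return merged


# Hypothetical input calls from two hypothetical callers.
calls = [
    SVCall("chr1", 1000, 2000, "DEL", "callerA"),
    SVCall("chr1", 1050, 2050, "DEL", "callerB"),
    SVCall("chr1", 5000, 5600, "DEL", "callerA"),
]
for record in merge_calls(calls):
    print(record)
```

The sketch takes the outermost coordinates of each cluster as the merged interval. The slides describe assembly and alignment for breakpoint refinement, which this example does not attempt.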
GiaB deletion validation

Set | Total Validated | Total not validated | Additionally Validated
GiaB HC | 0 (0%) | 2,348 (100%) | 0 (0%)
GiaB HC Validated by Parents (MetaSV ALL) | 2,302 (98.0%) | 46 (2.0%) | 2,302 (98.0%)
GiaB HC Validated by Child (MetaSV PASS) | 2,306 (98.2%) | 42 (1.8%) | 4 (0.2%)
GiaB HC Validated by Child (curated) | 2,342 (99.7%) | 6 (0.3%) | 36 (1.5%)
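The validation counts are cumulative: each step adds its newly validated deletions to the running total. The short sketch below reproduces the totals and percentages from the per-step counts reported on the slide. The variable and function names are assumptions made for the example.

```python
TOTAL = 2348  # GiaB high-confidence deletions >= 100 bp

# (validation step, deletions newly validated at that step), as reported on the slide
STEPS = [
    ("Parents (MetaSV ALL)", 2302),
    ("Child (MetaSV PASS)", 4),
    ("Child (curated)", 36),
]


def cumulative_validation(total, steps):
    """Return (step, validated so far, not validated, percent validated) per step."""
    rows = []
    validated = 0
    for name, added in steps:
        validated += added
        percent = round(100 * validated / total, 1)
        rows.append((name, validated, total - validated, percent))
    return rows


for name, validated, remaining, percent in cumulative_validation(TOTAL, STEPS):
    print(f"{name}: {validated} validated ({percent}%), {remaining} not validated")
```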
Mendelian validation
● MetaSV High Quality Trio Deletions:
○ Mendelian Inheritance Consistency with Genotypes
○ Pass in Child and ALL in Parents
○ Considering no call as reference call
Venn diagram: MetaSV Trio Dels vs. GiaB HC Dels
MetaSV PASS Dels: 2,671
96.7% are Mendelian consistent (98.7% if ignoring genotypes)
MetaSV Trio Dels: 2,582
GiaB HC Dels: 2,348
Common: 2,126
GiaB Private: 222
MetaSV Private: 456 (142 not in literature)
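A Mendelian consistency check with no-calls treated as reference calls can be sketched as below. It assumes diploid VCF-style genotype strings such as `0/1`. The function names are hypothetical, and the relaxed "ignoring genotypes" variant (a deletion seen in the child must be seen in at least one parent) is one plausible reading of the slide rather than a confirmed definition.

```python
from itertools import product


def parse_gt(gt):
    """Parse a diploid genotype string like '0/1' into a tuple of allele indices.

    No-call alleles ('.') are treated as reference (0), following the
    "no call as reference call" rule.
    """
    if gt == ".":
        return (0, 0)
    alleles = gt.replace("|", "/").split("/")
    return tuple(0 if a == "." else int(a) for a in alleles)


def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can be formed from one allele of each parent."""
    child_alleles = sorted(parse_gt(child))
    return any(
        sorted((m, f)) == child_alleles
        for m, f in product(parse_gt(mother), parse_gt(father))
    )


def has_variant(gt):
    """True if the genotype carries at least one non-reference allele."""
    return any(allele > 0 for allele in parse_gt(gt))


def is_presence_consistent(child, mother, father):
    """Relaxed check ignoring genotypes: a child variant must appear in a parent."""
    return (not has_variant(child)) or has_variant(mother) or has_variant(father)


# Heterozygous child, heterozygous mother, no-call father (treated as 0/0).
print(is_mendelian_consistent("0/1", "0/1", "./."))
# Homozygous child cannot be explained when the father is treated as reference.
print(is_mendelian_consistent("1/1", "0/1", "./."))
print(is_presence_consistent("1/1", "0/1", "./."))
```

The last two calls show how a site can fail the genotype-aware check yet pass the presence-only check, which is consistent with the higher rate reported when genotypes are ignored.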
Trio analysis summary
GiaB SVs have a high validation rate using MetaSV trio analysis
  Only 6 unvalidated SVs do not have strong support in IGV or SVVIZ
  GiaB deletions are of high quality
Almost all (up to 98.7%) MetaSV PASS calls are Mendelian consistent, making them high-quality
A significant number (456) of MetaSV trio calls are not in GiaB
  Possibly missed due to stringent GiaB requirements, since 321 of those are in the literature
MetaSV trio validation can help validate and extend the gold set
Enhancing GiaB gold set
Work on Jewish trio
Enhanced MetaSV
  Assembly to get evidence for and refine all kinds of SVs
  Optimizations to speed up analysis
Calls for Jewish trio submitted
  Will help build high-confidence SV calls for other trios
MetaSV and Parliament
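Venn diagram: overlap between MetaSV and Parliament calls
MetaSV total 2,809: 2,240 (79.7%) overlapping Parliament, 569 (20.3%) MetaSV only
Parliament total 5,467 (after uniq): 4,490 (82.1%) overlapping MetaSV, 977 (17.9%) Parliament only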
Reciprocal overlap of 50% was used; no genotype matching was performed. With 90% reciprocal overlap, 75.9% of MetaSV calls and 54.0% of Parliament calls overlapped.
Summary
GiaB high-confidence calls are regularly used to assess variant-calling pipelines
  For both simulated and real data
MetaSV was used to validate GiaB deletion SVs for NA12878
  Trio analysis can also be applied to the Jewish and Han samples
Contributing MetaSV calls for Jewish trio to help build better SV gold sets
Acknowledgements
Bina: Mohammad Sahraeian, Jian Li, John Mu
GiaB: Justin Zook, Hemang Parikh