genomeinabottle
Aug 2014 NIST integration plans
Pedigree and Trio Data Integration
Justin Zook, NIST
Integration of Pedigree with NIST arbitrated calls
High-confidence
• In NIST high-confidence set and not in the RTG phase-inconsistent set
• In NIST low-confidence set and polymorphic in either the RTG or PG phase-consistent sets
Uncertain
• Homopolymers not in phase-consistent sets
• In NIST low-confidence set and not polymorphic in either the RTG or PG phase-consistent sets
• In RTG or PG and homozygous reference in NIST
• Calls missing from our high- and low-confidence calls that fall outside our high-confidence regions
• NA12878 SVs in dbVar and known segmental duplications
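The rules on this slide can be sketched as a small classifier. This is a minimal illustration, not the actual NIST pipeline: the `Site` record and all of its field names are hypothetical, and the order in which the rules are checked (the "uncertain" exclusions first) is an assumption, since the slide does not say which rule wins when several apply.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical per-site flags; none of these names come from the slides."""
    nist_high_conf: bool = False
    nist_low_conf: bool = False
    nist_hom_ref: bool = False
    in_rtg_inconsistent: bool = False
    in_rtg_consistent: bool = False
    in_pg_consistent: bool = False
    polymorphic_in_rtg: bool = False
    polymorphic_in_pg: bool = False
    homopolymer: bool = False
    in_sv_or_segdup: bool = False


def classify(site: Site) -> str:
    in_consistent = site.in_rtg_consistent or site.in_pg_consistent
    polymorphic = site.polymorphic_in_rtg or site.polymorphic_in_pg

    # Assumption: the "uncertain" exclusions take precedence.
    if site.in_sv_or_segdup:
        return "uncertain"
    if site.homopolymer and not in_consistent:
        return "uncertain"
    if in_consistent and site.nist_hom_ref:
        return "uncertain"

    # The two high-confidence rules from the slide.
    if site.nist_high_conf and not site.in_rtg_inconsistent:
        return "high-confidence"
    if site.nist_low_conf and polymorphic:
        return "high-confidence"

    # Everything else, including low-confidence non-polymorphic sites and
    # calls absent from both NIST sets, is treated as uncertain.
    return "uncertain"
```

With these assumptions, a NIST low-confidence site becomes high-confidence only when one of the pedigree call sets shows it to be polymorphic.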
Integration of Pedigree with multi-platform calls
[Figure: Venn diagrams comparing NIST-PASS calls with the pedigree call sets. Labels and counts, in extraction order:]
NIST-PASS; Both 3.04M; RTG-PHQ 12.6k
NIST-PASS-noPHQ 23.6k; RTG-PHI (174k); NIST-PASS 55.6k; Both 31.8k; PlatGen
NIST-PASS-noPHQ-noPHI 17k; Both 6.6k; PlatGen
NIST-PASS-noPHQ-PHI (18k); Both 13.5k
Legend
• NIST-PASS = NIST passing calls v2.19
• NIST-All = NIST v2.19 calls, including filtered calls if they are not likely homozygous reference
• RTG-PHQ = Real Time Genomics Phase Consistent calls with any phase quality
• RTG-PHQ>20 = Real Time Genomics Phase Consistent calls with phase quality > 20
• RTG-PHI = Real Time Genomics Phase Inconsistent calls
• PlatGen = Platinum Genomes Phase Consistent calls
• PlatGenPoly = Platinum Genomes Phase Consistent calls that are polymorphic in the pedigree
• Bold means included in the final call set
• Italic means removed, plus 50 bp on either side, from the final bed file
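Counts such as "Both" and the set-specific remainders are the kind of numbers a plain set comparison produces. The sketch below is only an illustration under simplifying assumptions: calls are keyed by a hypothetical `(chrom, pos, ref, alt)` tuple, genotypes are ignored, and the variants are assumed to be already normalized to the same representation. The function name and the toy data are invented for the example.

```python
def overlap_counts(calls_a, calls_b):
    """Count calls unique to each set and calls shared by both.

    Each call is a hashable key, here assumed to be (chrom, pos, ref, alt).
    """
    a, b = set(calls_a), set(calls_b)
    return {
        "only_a": len(a - b),
        "both": len(a & b),
        "only_b": len(b - a),
    }


# Toy data, not taken from the slides.
nist_pass = [("1", 100, "A", "G"), ("1", 250, "C", "T"), ("2", 40, "G", "GA")]
rtg_phq = [("1", 100, "A", "G"), ("2", 40, "G", "GA"), ("3", 7, "T", "C")]

counts = overlap_counts(nist_pass, rtg_phq)
```

Exact-key matching undercounts agreement when the same variant is written differently in two call sets, which is one reason the later slides mention normalization and tools built for variant comparison.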
Integration of Pedigree with multi-platform calls – NIST filtered
[Figure: Venn diagrams comparing NIST-All calls with the pedigree call sets. Labels and counts, in extraction order:]
Both 2.74M; RTG-PHQ>0 61.0k
NIST-All-PHQ>0 364k; NIST-PASS-PHQ (664k); NIST-All; Both 2.37M
NIST-All-PlatGenpoly-noNISTPASS 62k; NIST-All-PHQ>0-noNISTPASS 134k; Both 230k
Both 2.05M; PlatGenpoly 32.7k
NIST-All-PlatGenpoly 364k; NIST-PASS-PHQ (664k); NIST-All; Both 2.37M
Legend
• NIST-PASS = NIST passing calls v2.19
• NIST-All = NIST v2.19 calls, including filtered calls if they are not likely homozygous reference
• RTG-PHQ = Real Time Genomics Phase Consistent calls with any phase quality
• RTG-PHQ>0 = Real Time Genomics Phase Consistent calls with phase quality > 0
• RTG-PHI = Real Time Genomics Phase Inconsistent calls
• PlatGen = Platinum Genomes Phase Consistent calls
• PlatGenPoly = Platinum Genomes Phase Consistent calls that are polymorphic in the pedigree
• Bold means included in the final call set
• Italic means removed, plus 50 bp on either side, from the final bed file
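The legend's note that some call sets are removed from the final bed file with 50 bp of padding on either side can be sketched as interval subtraction. This is a hedged illustration for a single chromosome, using 0-based half-open intervals as in BED; the function names are hypothetical, and a real pipeline would more likely use an existing interval tool than hand-written code.

```python
def pad_sites(positions, pad=50):
    """Turn 0-based site positions into merged [start, end) intervals,
    each extended by `pad` bases on either side."""
    intervals = sorted((max(0, p - pad), p + 1 + pad) for p in positions)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(regions, excluded):
    """Remove `excluded` from `regions`.

    Both inputs are lists of [start, end) tuples on one chromosome;
    `excluded` must be sorted and non-overlapping (as from pad_sites).
    """
    result = []
    for start, end in regions:
        cursor = start
        for ex_start, ex_end in excluded:
            if ex_end <= cursor:
                continue          # excluded interval lies before the cursor
            if ex_start >= end:
                break             # excluded interval lies after this region
            if ex_start > cursor:
                result.append((cursor, ex_start))
            cursor = max(cursor, ex_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


# Toy example: one high-confidence region and one excluded site at position 500.
remaining = subtract([(0, 1000)], pad_sites([500]))
```

Here the region around position 500 is cut out, leaving the two flanking intervals as the remaining high-confidence territory.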
New Platform-specific Integration Method for PGP Trios
• Normalize and take union of calls
• Simple SNPs/indels
– Illumina/SOLiD: GATK HC force calls
– Ion: TVC force calls
– CG: use Ref file
– If all biased or low qual, uncertain
– Else if all concordant, high-conf
– Else if all unbiased are concordant, high-conf
– Else uncertain
• Complex Variants
– Use vcfeval or SMASH for sequential pair-wise comparison
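The decision chain for simple SNPs/indels can be written as a short function. This is a sketch under stated assumptions, not the implemented method: the `PlatformCall` record and its fields are hypothetical, "concordant" is taken to mean identical genotype strings, and how a call is judged biased or low quality is left outside the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PlatformCall:
    """Hypothetical forced call from one platform at one site."""
    platform: str
    genotype: str
    biased: bool = False
    low_quality: bool = False


def integrate_site(calls: Sequence[PlatformCall]) -> str:
    # If all biased or low qual: uncertain.
    if all(c.biased or c.low_quality for c in calls):
        return "uncertain"
    # Else if all concordant: high-confidence.
    if len({c.genotype for c in calls}) == 1:
        return "high-confidence"
    # Else if all unbiased calls are concordant: high-confidence.
    unbiased = [c for c in calls if not c.biased]
    if unbiased and len({c.genotype for c in unbiased}) == 1:
        return "high-confidence"
    # Else: uncertain.
    return "uncertain"


# Toy example: the biased platform disagrees, the unbiased ones agree.
example = integrate_site([
    PlatformCall("Illumina", "0/1"),
    PlatformCall("Ion", "0/1"),
    PlatformCall("SOLiD", "1/1", biased=True),
])
```

Following the slide's wording, the third branch filters on bias only; whether low-quality calls should also be set aside at that step is not stated in the source.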
Integration Method Plans
• Implement new integration methods on the cloud
– Easier for others to reproduce results
• First, analyze NA12878 RM data with new methods to ensure they work well
• Then, apply to PGP trios