Pedigree and Trio Data Integration
Justin Zook, NIST

Aug 2014 NIST integration plans

Integration of Pedigree with NIST arbitrated calls

High-confidence
• In NIST high-confidence set and not in the RTG phase-inconsistent set
• In NIST low-confidence set and polymorphic in either the RTG or PG phase-consistent sets

Uncertain
• Homopolymers not in phase-consistent sets
• In NIST low-confidence set and not polymorphic in either the RTG or PG phase-consistent sets
• In RTG or PG and homozygous reference in NIST
• Calls missing from our high- and low-confidence calls that fall outside our high-confidence regions
• NA12878 SVs in dbVar and known segmental duplications
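The criteria above can be read as a per-site classifier. The Python below is a minimal sketch under stated assumptions, not NIST's implementation: the `Site` record and all of its field names are hypothetical, the slide does not say which rule wins when a site matches both lists (here any "uncertain" rule is assumed to override), and sites matching neither list are returned as "unclassified" because the slide does not say how they are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    """Hypothetical per-site annotations; field names are illustrative only."""
    nist_conf: Optional[str] = None      # "high", "low", or None if absent from NIST calls
    nist_hom_ref: bool = False           # NIST genotype is homozygous reference
    rtg_phase_inconsistent: bool = False # in the RTG phase-inconsistent set
    in_rtg_or_pg: bool = False           # present in RTG or PG phase-consistent sets
    polymorphic_rtg_or_pg: bool = False  # polymorphic in RTG or PG phase-consistent sets
    homopolymer: bool = False
    in_nist_hc_region: bool = False      # inside NIST high-confidence regions
    sv_or_segdup: bool = False           # overlaps NA12878 dbVar SVs or segmental dups


def classify(s: Site) -> str:
    # Assumption: any "uncertain" rule takes precedence over "high-confidence".
    uncertain = (
        (s.homopolymer and not s.in_rtg_or_pg)
        or (s.nist_conf == "low" and not s.polymorphic_rtg_or_pg)
        or (s.in_rtg_or_pg and s.nist_hom_ref)
        or (s.nist_conf is None and not s.in_nist_hc_region)
        or s.sv_or_segdup
    )
    if uncertain:
        return "uncertain"
    if s.nist_conf == "high" and not s.rtg_phase_inconsistent:
        return "high-confidence"
    if s.nist_conf == "low" and s.polymorphic_rtg_or_pg:
        return "high-confidence"
    # The slide does not specify how remaining sites are handled.
    return "unclassified"
```

For example, a NIST high-confidence site that overlaps a known segmental duplication is classified as "uncertain" under the assumed precedence.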

Integration of Pedigree with multi-platform calls

[Figure: Venn diagrams comparing NIST-PASS calls with RTG-PHQ, RTG-PHI, and PlatGen. Labeled counts, in extracted order: Both 3.04M; RTG-PHQ 12.6k; NIST-PASS-noPHQ 23.6k; RTG-PHI (174k); NIST-PASS 55.6k; Both 31.8k; NIST-PASS-noPHQ-noPHI 17k; Both 6.6k; NIST-PASS-noPHQ-PHI (18k); Both 13.5k.]

NIST-PASS = NIST passing calls, v2.19
NIST-All = NIST v2.19 calls, including filtered calls if they are not likely homozygous reference
RTG-PHQ = Real Time Genomics phase-consistent calls with any phase quality
RTG-PHQ>20 = Real Time Genomics phase-consistent calls with phase quality > 20
RTG-PHI = Real Time Genomics phase-inconsistent calls
PlatGen = Platinum Genomes phase-consistent calls
PlatGenPoly = Platinum Genomes phase-consistent calls that are polymorphic in the pedigree
Bold means included in the final call set
Italic means removed, along with 50 bp on either side, from the final BED file
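The legend's rule that removed (italic) calls are cut from the final BED file together with 50 bp on either side is plain interval arithmetic. The Python below is a minimal sketch, not the pipeline NIST used: the input shapes (calls as (chrom, VCF POS, REF) tuples, regions as BED-style 0-based half-open intervals) and the helper names `pad_calls`, `merge`, and `subtract` are hypothetical. In practice `bedtools slop` and `bedtools subtract` perform the same operations.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open

PAD = 50  # bp removed on either side of each excluded call, per the slide legend


def pad_calls(calls: List[Tuple[str, int, str]], pad: int = PAD) -> List[Interval]:
    """Turn removed calls (chrom, 1-based VCF POS, REF) into padded BED intervals."""
    padded: List[Interval] = []
    for chrom, pos, ref in calls:
        start = pos - 1             # VCF POS is 1-based; BED start is 0-based
        end = start + len(ref)      # span of the reference allele
        padded.append((chrom, max(0, start - pad), end + pad))
    return padded


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or abutting intervals on the same chromosome."""
    out: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if out and out[-1][0] == chrom and start <= out[-1][2]:
            out[-1] = (chrom, out[-1][1], max(out[-1][2], end))
        else:
            out.append((chrom, start, end))
    return out


def subtract(regions: List[Interval], excluded: List[Interval]) -> List[Interval]:
    """Return the parts of `regions` not covered by any `excluded` interval."""
    result: List[Interval] = []
    excl = merge(excluded)
    for chrom, start, end in merge(regions):
        cursor = start
        for echrom, estart, eend in excl:
            if echrom != chrom or eend <= cursor or estart >= end:
                continue            # no overlap with the uncovered remainder
            if estart > cursor:
                result.append((chrom, cursor, estart))
            cursor = max(cursor, eend)
        if cursor < end:
            result.append((chrom, cursor, end))
    return result
```

For example, removing a single-base call at POS 500 from the region chr1:0-1000 leaves chr1:0-449 and chr1:550-1000.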

Integration of Pedigree with multi-platform calls – NIST filtered

[Figure: Venn diagrams comparing NIST-All and NIST-PASS calls with RTG-PHQ>0 and PlatGenPoly. Labeled counts, in extracted order: Both 2.74M; RTG-PHQ>0 61.0k; NIST-All-PHQ>0 364k; NIST-PASS-PHQ (664k); Both 2.37M; NIST-All-PlatGenPoly-noNISTPASS 62k; NIST-All-PHQ>0-noNISTPASS 134k; Both 230k; Both 2.05M; PlatGenPoly 32.7k; NIST-All-PlatGenPoly 364k; NIST-PASS-PHQ (664k); Both 2.37M.]

NIST-PASS = NIST passing calls, v2.19
NIST-All = NIST v2.19 calls, including filtered calls if they are not likely homozygous reference
RTG-PHQ = Real Time Genomics phase-consistent calls with any phase quality
RTG-PHQ>0 = Real Time Genomics phase-consistent calls with phase quality > 0
RTG-PHI = Real Time Genomics phase-inconsistent calls
PlatGen = Platinum Genomes phase-consistent calls
PlatGenPoly = Platinum Genomes phase-consistent calls that are polymorphic in the pedigree
Bold means included in the final call set
Italic means removed, along with 50 bp on either side, from the final BED file
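The set labels on the two multi-platform slides read like set operations on variant keys (for example, "noPHQ" as "not in RTG-PHQ" and "noNISTPASS" as "not in NIST-PASS"). The Python below is a sketch of that assumed reading only; the slides do not define the labels this way, and the `Variant` key and function names are hypothetical. It presumes calls from all sources were normalized first so identical variants get identical keys.

```python
from typing import Dict, Set, Tuple

# (chrom, POS, REF, ALT) after normalization; a hypothetical variant key.
Variant = Tuple[str, int, str, str]


def nist_pass_subsets(nist_pass: Set[Variant], rtg_phq: Set[Variant],
                      rtg_phi: Set[Variant]) -> Dict[str, Set[Variant]]:
    """Assumed reading of the labels on the NIST-PASS slide."""
    no_phq = nist_pass - rtg_phq                    # NIST-PASS calls absent from RTG-PHQ
    return {
        "NIST-PASS-noPHQ": no_phq,
        "NIST-PASS-noPHQ-PHI": no_phq & rtg_phi,    # also RTG phase-inconsistent
        "NIST-PASS-noPHQ-noPHI": no_phq - rtg_phi,  # in neither RTG set
    }


def nist_filtered_subsets(nist_all: Set[Variant], nist_pass: Set[Variant],
                          rtg_phq_gt0: Set[Variant],
                          platgen_poly: Set[Variant]) -> Dict[str, Set[Variant]]:
    """Assumed reading of the labels on the NIST-filtered slide."""
    not_pass = nist_all - nist_pass                 # NIST-All calls that did not pass
    return {
        "NIST-All-PHQ>0-noNISTPASS": not_pass & rtg_phq_gt0,
        "NIST-All-PlatGenPoly-noNISTPASS": not_pass & platgen_poly,
    }
```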

New Platform-specific Integration Method for PGP Trios

Normalize and take union of calls
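The first step, normalizing calls and taking their union, matters because the same variant can be written differently by different callers. The Python below is a minimal sketch that only trims bases shared by REF and ALT; full normalization (for example with vt or bcftools norm) also left-aligns indels in repeats, which requires the reference sequence and is not done here. The `Variant` tuple and the function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, 1-based POS, REF, ALT), biallelic


def trim(v: Variant) -> Variant:
    """Remove bases shared by REF and ALT (suffix first, then prefix),
    keeping at least one base in each allele. Does not left-align indels."""
    chrom, pos, ref, alt = v
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return (chrom, pos, ref, alt)


def union_of_calls(*callsets: Iterable[Variant]) -> Set[Variant]:
    """Trim every call, then take the union across all callsets."""
    merged: Set[Variant] = set()
    for calls in callsets:
        merged.update(trim(v) for v in calls)
    return merged
```

For example, a SNP reported as CAT>CGT at POS 100 by one caller and as A>G at POS 101 by another collapses to the single key ("chr1", 101, "A", "G").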

Simple SNPs/indels
• Illumina/SOLiD – GATK HaplotypeCaller (HC) force calls
• Ion – TVC force calls
• CG – use Ref file
• If all calls are biased or low quality, uncertain
• Else if all calls are concordant, high-confidence
• Else if all unbiased calls are concordant, high-confidence
• Else, uncertain

Complex variants
• Use vcfeval or SMASH for sequential pair-wise comparison
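The decision rules for simple SNPs/indels can be sketched in Python. This is an illustration under assumptions, not the method's actual code: the `ForcedCall` record and its `biased`/`low_qual` flags are hypothetical stand-ins for whatever bias and quality tests are used, "unbiased" is assumed to mean neither biased nor low quality, and genotypes are compared as literal strings (so all datasets must use the same representation, e.g. unphased "0/1").

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ForcedCall:
    """Genotype forced at one site in one dataset (hypothetical record)."""
    platform: str   # e.g. "Illumina", "SOLiD", "Ion", "CG"
    genotype: str   # e.g. "0/1"
    biased: bool    # hypothetical flag for a biased call (e.g. strand bias)
    low_qual: bool  # hypothetical flag for low genotype quality


def integrate(calls: List[ForcedCall]) -> str:
    # If all calls are biased or low quality, the site is uncertain.
    if not calls or all(c.biased or c.low_qual for c in calls):
        return "uncertain"
    # Else if every dataset reports the same genotype, high-confidence.
    if len({c.genotype for c in calls}) == 1:
        return "high-confidence"
    # Else if the unbiased calls (assumed: neither biased nor low quality) agree.
    unbiased = [c for c in calls if not (c.biased or c.low_qual)]
    if len({c.genotype for c in unbiased}) == 1:
        return "high-confidence"
    return "uncertain"
```

For example, an unbiased Illumina "0/1" paired with a biased Ion "1/1" is high-confidence, while unbiased Illumina "0/1" and unbiased SOLiD "1/1" are uncertain.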

Integration Method Plans

• Implement new integration methods on the cloud, making it easier for others to reproduce results

• First, analyze NA12878 RM data with new methods to ensure they work well

• Then, apply to PGP trios