genomeinabottle
NIST Reference Material Development Plans
August 2014
NIST RM Development Plans
HG-001/NA12878
• Q4 2014: Release NIST RM8398; Preliminary large deletions
• Subsequently: Refined Structural Variants
HG-002 to HG-004 (Ashkenazim trio)
• Q4 2014: Illumina, Complete Genomics, Ion, BioNano, and SOLiD data
• Q1 2015: Preliminary SNPs/indels; 100x PacBio data; Illumina assembled long reads
• Q2 2015: Refined SNPs/indels; Preliminary SVs
• Q3 2015: Refined Structural Variants
• Q4 2015: NIST RMs 8391/8392 release
HG-005 (son in Asian trio)
• Q4 2014: Illumina, Complete Genomics, Ion, BioNano, and SOLiD data
• Q1 2015: Illumina assembled long reads
• Q2 2015: Preliminary SNPs/indels
• Q3 2015: Refined SNPs/indels; Refined Structural Variants
• Q4 2015: NIST RM8393 release
Preliminary uses of high-confidence NIST-GIAB genotypes for NA12878
• NIST has released several versions of high-confidence genotypes for its pilot RM
• These data are presently being used for benchmarking
– prior to release of RMs
– SNPs & indels
• ~77% of the genome
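As a rough illustration of how a high-confidence callset is typically used for benchmarking, the sketch below counts true positives, false positives, and false negatives only inside the high-confidence regions. It is a minimal sketch, not the NIST-GIAB procedure: the function names, the `(chrom, pos, ref, alt)` tuple encoding, the 0-based half-open intervals, and the exact-match comparison are all assumptions made for the example. Real comparison tools also reconcile different representations of the same variant.

```python
from bisect import bisect_right


def in_regions(regions, chrom, pos):
    """Return True if pos lies in a high-confidence interval on chrom.

    regions: dict mapping chrom -> sorted list of non-overlapping
    (start, end) intervals, 0-based half-open (an assumption here).
    """
    intervals = regions.get(chrom, [])
    # Index of the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


def benchmark(truth, query, regions):
    """Compare query calls to truth calls inside high-confidence regions.

    truth, query: iterables of (chrom, pos, ref, alt) tuples (hypothetical
    encoding). Calls outside the regions are ignored, because nothing is
    asserted about the genome there.
    """
    t = {v for v in truth if in_regions(regions, v[0], v[1])}
    q = {v for v in query if in_regions(regions, v[0], v[1])}
    tp = len(t & q)
    fp = len(q - t)  # inside the regions, truth asserts homozygous reference
    fn = len(t - q)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}
```

Restricting the comparison to the regions is what makes a false positive meaningful: a query call absent from the truth set counts against the caller only where the reference material is confidently homozygous reference.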
Data Release Plans
Individual Datasets
• Uploaded to GIAB FTP site as they are collected
• May include raw reads, aligned reads, and variant/reference calls
Integrated High-confidence Calls
• First develop SNP, indel, and homozygous reference calls
• Then develop SV and non-SV calls
• Released calls are versioned
• Preliminary callsets will be made available to be critiqued
• Data jamboree??
Pilot RM (NA12878)
• HapMap/1000 Genomes sample
• Lots of public data and analyses
• Not consented for commercial redistribution
• Data from pedigree available and analyzed
• ~8000 units for NIST RM
• High-confidence calls released
– integrates multiple datasets and phased pedigree analysis
• Developing SV calls
• Planned release as NIST RM8398 in Q4 2014
Ashkenazim PGP trio
• Personal Genome Project trio (huAA53E0/hu8E87A9/hu6E4515)
• Father/mother/son at Coriell (GM24143/GM24149/GM24385)
• Consented for commercial redistribution
• Most short-read data will be available Q3 2014
• 100x PacBio WGS completed ~Q1 2015
• 10x Illumina assembled long reads for son ~Q1 2015
• Planned NIST RM release ~Q4 2015
– NIST RM 8391 will be only the son (~8000 units)
– NIST RM 8392 will contain all 3 family members (~2500 units)
Asian PGP trio
• Personal Genome Project trio (hu91BD69/hu38168C/huCA017E)
• Father/mother/son at Coriell (GM24695/GM24694/GM24631)
• Only the son planned for NIST RM but trio will be characterized
• Consented for commercial redistribution
• Most short-read data will be available Q3-Q4 2014
• 10x Illumina assembled long reads for son ~Q1 2015
• Planned NIST RM release ~Q4 2015
– NIST RM 8393 will be only the son (~11000 units)
New Platform-specific (-independent?) Integration Method
• Normalize and take union of calls
• Simple SNPs/indels
– Illumina/SOLiD: GATK HC force calls
– Ion: TVC force calls
– CG: use Ref file
– If all biased or low qual, uncertain
– Elseif all concordant, high-conf
– Elseif all unbiased are concordant, high-conf
– Else uncertain
• Complex Variants
– Use vcfeval or SMASH for sequential pair-wise comparison
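The union step and the if/elseif/else rules for simple SNPs/indels can be sketched as follows. This is a minimal illustration of the decision logic as written on the slide, not the NIST implementation. The `PlatformCall` class, its fields, the genotype strings, and the reading of "unbiased" as "neither biased nor low quality" are assumptions made for the example, and how a call gets flagged as biased or low quality is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformCall:
    """One platform's forced call at a candidate site (hypothetical type)."""
    platform: str           # e.g. "Illumina", "Ion", "CG"
    genotype: str           # e.g. "0/1", assumed already normalized
    biased: bool = False    # flagged for systematic bias at this site
    low_qual: bool = False  # flagged as low quality at this site


def union_sites(callsets):
    """Return the sorted union of candidate sites across platforms.

    callsets: dict mapping platform -> dict mapping site -> PlatformCall,
    where a site is a (chrom, pos, ref, alt) tuple (an assumed encoding).
    """
    sites = set()
    for per_site in callsets.values():
        sites.update(per_site)
    return sorted(sites)


def classify_site(calls):
    """Apply the slide's rules to the calls made at one site.

    Returns (status, genotype): status is "high-conf" or "uncertain",
    and genotype is None when the site is uncertain.
    """
    # Assumption: "unbiased" means neither biased nor low quality.
    usable = [c for c in calls if not (c.biased or c.low_qual)]
    if not usable:
        # If all biased or low qual, uncertain.
        return ("uncertain", None)
    if len({c.genotype for c in calls}) == 1:
        # Elseif all concordant, high-conf.
        return ("high-conf", calls[0].genotype)
    if len({c.genotype for c in usable}) == 1:
        # Elseif all unbiased are concordant, high-conf.
        return ("high-conf", usable[0].genotype)
    # Else uncertain.
    return ("uncertain", None)
```

Forcing every platform to emit a call at every site in the union is what lets the rules compare like with like: a platform that is silent at a site is otherwise indistinguishable from one that supports the reference.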
Integration Method Plans
• Implement new integration methods on the cloud
– Easier for…
  • distributed analysis
  • scalability
  • transparency
  • others to reproduce results
• First, analyze NA12878 RM data with new methods to ensure they work well
• Then, apply to PGP trios