NIST Reference Material Development Plans August 2014


Page 1: Aug2014 nist rm development plans

NIST Reference Material Development Plans

August 2014

Page 2: Aug2014 nist rm development plans

NIST RM Development Plans

Timeline by genome, Q4 2014 through Q4 2015

HG-001/NA12878

• Q4 2014: Release NIST RM8398; preliminary large deletions

• After Q4 2014: Refined structural variants

HG-002 to HG-004 (Ashkenazim trio)

• Q4 2014: Illumina, Complete Genomics, Ion, BioNano, and SOLiD data

• Q1 2015: Preliminary SNPs/indels; 100x PacBio data; Illumina assembled long reads

• Q2 2015: Refined SNPs/indels; preliminary SVs

• Q3 2015: Refined structural variants

• Q4 2015: NIST RMs 8391/8392 release

HG-005 (son in Asian trio)

• Q4 2014: Illumina, Complete Genomics, Ion, BioNano, and SOLiD data

• Q1 2015: Illumina assembled long reads

• Q2 2015: Preliminary SNPs/indels

• Q3 2015: Refined SNPs/indels; refined structural variants

• Q4 2015: NIST RM8393 release

Page 3: Aug2014 nist rm development plans

Preliminary uses of high-confidence NIST-GIAB genotypes for NA12878

• NIST has released several versions of high-confidence genotypes for its pilot RM

• These data are presently being used for benchmarking

– prior to release of RMs

– SNPs & indels

– ~77% of the genome

Page 4: Aug2014 nist rm development plans

Data Release Plans

Individual Datasets

• Uploaded to GIAB FTP site as they are collected

• May include raw reads, aligned reads, and variant/reference calls

Integrated High-confidence Calls

• First develop SNP, indel, and homozygous reference calls

• Then develop SV and non-SV calls

• Released calls are versioned

• Preliminary callsets will be made available to be critiqued

• Data jamboree??

Page 5: Aug2014 nist rm development plans

Pilot RM (NA12878)

• HapMap/1000 Genomes sample

• Lots of public data and analyses

• Not consented for commercial redistribution

• Data from pedigree available and analyzed

• ~8000 units for NIST RM

• High-confidence calls released

– integrates multiple datasets and phased pedigree analysis

• Developing SV calls

• Planned release as NIST RM8398 in Q4 2014

Page 6: Aug2014 nist rm development plans

Ashkenazim PGP trio

• Personal Genome Project trio (huAA53E0/hu8E87A9/hu6E4515)

• Father/mother/son at Coriell (GM24143/GM24149/GM24385)

• Consented for commercial redistribution

• Most short-read data will be available Q3 2014

• 100x PacBio WGS completed ~Q1 2015

• 10x Illumina assembled long reads for son ~Q1 2015

• Planned NIST RM release ~Q4 2015

– NIST RM 8391 will be only the son (~8000 units)

– NIST RM 8392 will contain all 3 family members (~2500 units)

Page 7: Aug2014 nist rm development plans

Asian PGP trio

• Personal Genome Project trio (hu91BD69/hu38168C/huCA017E)

• Father/mother/son at Coriell (GM24695/GM24694/GM24631)

• Only the son planned for NIST RM but trio will be characterized

• Consented for commercial redistribution

• Most short-read data will be available Q3-Q4 2014

• 10x Illumina assembled long reads for son ~Q1 2015

• Planned NIST RM release ~Q4 2015

– NIST RM 8393 will be only the son (~11000 units)

Page 8: Aug2014 nist rm development plans

New Platform-specific (-independent?) Integration Method

Normalize and take union of calls

Simple SNPs/indels

• Illumina/SOLiD – GATK HC force calls

• Ion – TVC force calls

• CG – use Ref file

• If all biased or low qual, uncertain

• Elseif all concordant, high-conf

• Elseif all unbiased are concordant, high-conf

• Else uncertain

Complex Variants

• Use vcfeval or SMASH for sequential pair-wise comparison
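
The decision rule for simple SNPs/indels on this slide can be sketched in a few lines of Python. This is only an illustration of the if/elseif chain as written, not the NIST-GIAB implementation. The `PlatformCall` record, its field names, and the `classify_site` function are hypothetical, and the sketch assumes that "unbiased" means a call that is neither flagged as biased nor low quality.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlatformCall:
    """One force-called genotype at a site (hypothetical record layout)."""
    platform: str    # e.g. "Illumina", "SOLiD", "Ion", "CG"
    genotype: str    # e.g. "0/1"
    biased: bool     # assumed flag: evidence of systematic bias
    low_qual: bool   # assumed flag: call is low quality


def classify_site(calls: List[PlatformCall]) -> str:
    """Apply the slide's rule chain to all calls at one site."""
    if not calls:
        return "uncertain"

    # Assumption: a call is usable only if it is neither biased nor low quality.
    usable = [c for c in calls if not (c.biased or c.low_qual)]

    # If all biased or low qual, uncertain
    if not usable:
        return "uncertain"

    # Elseif all concordant, high-conf
    if len({c.genotype for c in calls}) == 1:
        return "high-conf"

    # Elseif all unbiased are concordant, high-conf
    if len({c.genotype for c in usable}) == 1:
        return "high-conf"

    # Else uncertain
    return "uncertain"


if __name__ == "__main__":
    site = [
        PlatformCall("Illumina", "0/1", biased=False, low_qual=False),
        PlatformCall("SOLiD", "0/1", biased=False, low_qual=False),
        PlatformCall("Ion", "1/1", biased=True, low_qual=False),
    ]
    print(classify_site(site))
```

In the example, the biased Ion call disagrees with the two unbiased calls, so the third rule applies and the site is classified as high-confidence.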

Page 9: Aug2014 nist rm development plans

Integration Method Plans

• Implement new integration methods on the cloud

– Easier for…

   • distributed analysis

   • scalability

   • transparency

   • others to reproduce results

• First, analyze NA12878 RM data with new methods to ensure they work well

• Then, apply to PGP trios