Aug2013 performance metrics working group

Genome in a Bottle

GIAB 2013

Workgroup 4: Performance Metrics

Q1. What performance metrics can/should be generated when someone sequences the GIAB RMs?

Sequence group 1: Initial characterization of the RM to develop the ‘truth set’

EVERYTHING!

Chris Mason

Sequence group 2: People using reference materials to benchmark tests.

Probably not much

To do list:

Create a document describing metadata we want to capture (Chris Mason)

Identify fields we can reliably get from sequencers (Chris Mason)

Develop a flat data structure to capture information (Brad Chapman)

Help develop an improved individual genotype reporting format.
  Work with CDC group on this.
  Work with VCF/gVCF/GVF developers.
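The "flat data structure" item above could look something like the following minimal sketch. It is an illustration only: the class name `RunMetadata`, the helper `to_tsv`, and every field name are hypothetical placeholders, not fields the working group agreed on. The point is simply that one record per sequencing run, with no nesting, maps directly onto one row of a tab-separated file.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class RunMetadata:
    # Hypothetical fields chosen for illustration; the real list would come
    # from the metadata document described in the to-do list.
    sample_id: str
    platform: str
    read_length: int
    mean_coverage: float


def to_tsv(records):
    """Serialize flat records as tab-separated text, one row per run."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=[f.name for f in fields(RunMetadata)],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


# Made-up values, used only to show the shape of the output.
example = to_tsv([RunMetadata("NA12878", "example-platform", 100, 30.0)])
```

Keeping the structure flat (no nested objects) means it can be read by spreadsheets and simple scripts alike, at the cost of repeating values that would otherwise be shared.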


Q2. How should performance be subdivided by region?

Q3. How should performance be subdivided by variant type?

Assembly Region Reproducibility Track (for all RMs)
  Highly confident regions
  Less confident regions
  Regions we can’t reliably call

NA12878 high quality genotype calls
  Focus on SNVs and small indels first
  Expand to other variant types as we get more confidence

Update definitions as we add additional reference materials.
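One way to picture Q2 and Q3 together is to count true positives, false positives, and false negatives separately for each combination of region class and variant type. The sketch below is a simplified illustration under stated assumptions, not the group's method: variants are compared by exact match on (chrom, pos, ref, alt), regions are half-open intervals, and all names (`stratified_counts`, `region_label`, the region labels, the toy data) are hypothetical. A real comparison would also need variant normalization, which is omitted here.

```python
from collections import defaultdict


def variant_type(ref, alt):
    """Classify as SNV (single base change) or indel (anything else here)."""
    return "SNV" if len(ref) == 1 and len(alt) == 1 else "indel"


def region_label(chrom, pos, regions):
    """Return the first region class whose interval contains pos."""
    for label, intervals in regions.items():
        for region_chrom, start, end in intervals:
            # Assumption: intervals are half-open, [start, end).
            if region_chrom == chrom and start <= pos < end:
                return label
    return "unlabeled"


def stratified_counts(truth, calls, regions):
    """Count TP/FP/FN per (region class, variant type) by exact match."""
    counts = defaultdict(lambda: {"TP": 0, "FP": 0, "FN": 0})
    for variant in truth | calls:
        chrom, pos, ref, alt = variant
        key = (region_label(chrom, pos, regions), variant_type(ref, alt))
        if variant in truth and variant in calls:
            counts[key]["TP"] += 1
        elif variant in calls:
            counts[key]["FP"] += 1
        else:
            counts[key]["FN"] += 1
    return dict(counts)


def sensitivity(c):
    """TP / (TP + FN), or None when there are no truth variants."""
    denom = c["TP"] + c["FN"]
    return c["TP"] / denom if denom else None


def precision(c):
    """TP / (TP + FP), or None when there are no calls."""
    denom = c["TP"] + c["FP"]
    return c["TP"] / denom if denom else None


# Toy data, invented purely for illustration.
regions = {
    "high_confidence": [("chr1", 100, 200)],
    "low_confidence": [("chr1", 200, 300)],
}
truth = {("chr1", 110, "A", "G"), ("chr1", 150, "C", "T"), ("chr1", 210, "AT", "A")}
calls = {("chr1", 110, "A", "G"), ("chr1", 160, "G", "C"), ("chr1", 210, "AT", "A")}

result = stratified_counts(truth, calls, regions)
```

Reporting each stratum separately keeps strong performance in highly confident regions from masking weaker performance elsewhere, and lets new variant types be added as additional keys.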

Q4. How can GIAB help coordinate the different groups developing performance metrics?

Develop APIs for existing software:

X-prize/Harvard School of Public Health software
  BCBio variation (comparison software)
  O8 (visualization)
  BCBio NextGen (pipeline for running comparison)

Chris Mason’s software suite

Arvados software

GCAT software (Bioplanet)

GeT-RM browser for visualization

