Data Sciences & Data Engineering Broad Institute of Harvard and MIT http://www.broadinstitute.org/gatk
http://iseqtools.org/
@gatk_dev
GATK Best Practices for Variant Discovery UCLA, Los Angeles CA, USA 2-4 Mar, 2016
What/who is the Broad Institute?
• Spinoff of Harvard & MIT -- Eric Lander and philanthropists Eli & Edythe Broad
• Use the full power of genomics to transform the understanding and treatment of disease
Data Science & Data Engineering @ Broad
A new organization bringing together software engineers, computational biologists, and computing infrastructure specialists.
A vision that articulates an advanced computing infrastructure and a set of data and analysis services leveraging modern cloud computing paradigms.
https://www.broadinstitute.org/dsde/
GATK = Genome Analysis Toolkit
• Toolkit focused on variant discovery (SNP & indel)
• Components:
- Engine and infrastructure
- Tools (walkers)
- Also a programming framework for developing genome analysis software
GATK Best Practices = complete reads-to-variants workflows
Data Pre-Processing -> Variant Discovery -> Callset Refinement
FASTQ -> BAM, BAM -> VCF
File formats: FASTQ, SAM/BAM, VCF
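The reads-to-variants workflow can be read as a chain of phases, each consuming the file format the previous one produced. The sketch below is only an illustration of that chaining, not GATK code: the `Phase` type and `describe` function are hypothetical names, and the VCF -> VCF transition for Callset Refinement is an assumption, since the slide only gives FASTQ -> BAM and BAM -> VCF.

```python
from typing import List, NamedTuple


class Phase(NamedTuple):
    """One phase of the workflow (hypothetical helper type)."""
    name: str
    input_format: str
    output_format: str


# Phase names and the first two transitions come from the slide.
# VCF -> VCF for Callset Refinement is an assumption.
BEST_PRACTICES: List[Phase] = [
    Phase("Data Pre-Processing", "FASTQ", "BAM"),
    Phase("Variant Discovery", "BAM", "VCF"),
    Phase("Callset Refinement", "VCF", "VCF"),  # assumed
]


def describe(phases: List[Phase]) -> str:
    """Check that each phase consumes the previous phase's output,
    then return the chain of file formats as a string."""
    for prev, nxt in zip(phases, phases[1:]):
        if prev.output_format != nxt.input_format:
            raise ValueError(f"{prev.name} -> {nxt.name}: format mismatch")
    formats = [phases[0].input_format] + [p.output_format for p in phases]
    return " -> ".join(formats)


if __name__ == "__main__":
    print(describe(BEST_PRACTICES))
```

Reordering the phases, for example putting Variant Discovery first, makes `describe` raise a `ValueError`, which mirrors the point that the phases only make sense in this order.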
GATK development roadmap
1.x 2.x 3.x 4.x
Alpha GATK4: cloud-friendly and more scalable (Apache Spark) + extended functionality (CNVs, Picard)
https://github.com/broadinstitute/gatk