Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
8/11/16
1
SNPDetectionAnalysisManpreetS.Katari
PTCTasting
• Phenylthiocarbamideisachemicalthatnoteveryonecantaste.• Generallyabout70%ofindividualscantastethebitterness.• Canweusegeneticstoexplainthisphenomenon?
• Inordertokeepthedatasetsmallwearegoingtofocusononegenethatisknowntoimportantinthis.
• Thesequenceforthisgenewasobtainedfromfourindividualswhoalsoperformedthetastetest.
Singlenucleotidepolymorphisms(SNPs)l Single-nucleotidepolymorphisms
SNP (Genotypes G/G, reference A/A)
Typesofvariants
Typesofbasesubstitutions
• Transition(Ti):• purine->purinemutation• Or• pyrimidine->pyrimidinemutation
• Transversion (Tv):• Purine->pyrimidinemutation• Or• Pyrimidine->purinemutation
Smallinsertiondeletions(INDELS)
8/11/16
2
l Single-nucleotidepolymorphisms
l Smallinsertion/deletionpolymorphisms
SNP (Genotypes G/G, reference A/A)
Insertion allele in reference
TypesofvariantsMutationdetectionpipeline
• Alignment:SAMformat(BAM=binary)
• SNPdetection:VCFformat(BCF=binary)
• Functionalassessment
Variant Call Format (VCFs): summary of variants in a genome(s) Variant Call Format (VCFs): representation of variants
VCFlineexample• CHROM:PTC_Human• POS:245• ID:.
• REF:T• ALT:C• QUAL:999
• FILTER:.• INFO:DP=7609;VDB=0.0040;AF1=0.625;AC1=5;DP4=1985,855,2346,2276;MQ=59;FQ=999;PV4=2e-60,1,4.6e-
05,1• FORMAT:GT:PL:GQ GT=genotype,PL=genotypelikelihood,GQ=genotypequality• 0/1:244,0,248:99• 0/0:0,255,255:99
• 1/1:255,255,0:99• 1/1:255,255,0:99
Wholegenome-resequencing:snp-calling
Nielsen, Nat. Rev. Genetics 2011
Requires assembled genome
Steps to reduce the “false discovery rate” (FDR)Multi-sample algorithms improve SNP-calling in population studies
Hard-filtering of suspect SNPs is required in most snp-calling and genotyping applications
8/11/16
3
l WhatisaPCRduplicate?
PCR duplicates
Not a duplicate
Not a duplicate
Reference genome
Duplicate handling: PCR Duplicates Indel Re-alignment
l Whyisitnecessarytore-alignreads?
l Outcomeisrefinementofinsertion-deletionpositioningandreductioninfalsepositiveSNPs
l Example
DINDEL
GATK IndelRealignerTargetCreator / IndelRealigner Before:
After:
BaseQualityScoreRecalibration Resequencing workflow
l QCofsequencingdata
l Alignreads- generatesSAM/BAMalignment
l Coordinatesortreads
l Markduplicatereads
l Re-alignmentaroundinsertions/deletions
l SNP-calling
l Filtering/Qualitycontrol
l AssesspredictedeffectsofSNPs
GATKframework AnnotatingVariants
8/11/16
4
Dotheanalysis
IGV:IntegrativeGenomicsViewer
• http://www.broadinstitute.org/igv/• Standalonejavaprogram
• Doesnotrequireamysql databaseserveroranapachewebserver• Limitedtotheresourcesofthemachinethatitisrunningon.• MoreinteractivecomparedtoGbrowse.• BothIGVandGbrowse canuseGFFfileformat.
Loadthefasta fileasthegenome(referencesequence) SelectthePTC_Human.fa file
Noticethesequenceatthebottom NowloadtheAlignmentfiles(.bam)
8/11/16
5
NowloadtheAlignmentfiles(.bam)Coverageistheconsensusofallsequencestogether
Graymeanssequenceisthesameasthegenome,colorshowsachangefromreference.
Loadallthebamfiles
Youcanmovethetracksbyclickingonthenameanddraggingthem.
Zoomintoregionofinterest
Putyourmouseovereachbasetogetmorestatisticsabouteachbase.
VCFFormat LoadtheVCFfile
8/11/16
6
LoadtheVCFfile DetailsofSNP
PlaceyourmouseovertheSNPtogetthedetails.Aretheconclusionsthesame?Whatadditionalfilteringshouldweapply?