53
Read Mapping and Variant Calling

Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

ReadMappingandVariantCalling

Page 2: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

WholeGenomeResequencing•  Sequencingmul:ple

individualsfromthesamespecies

•  Referencegenomeisalreadyavailable

•  Discovervaria:onsinthegenomesbetweenandwithinsamples–  muta:ons–  inser:ons–  dele:ons–  rearrangements–  copynumberchanges

Howlongdothereadsneedtobe?Forthehumangenome,es:matesare:25mers=80%uniquecoverage43mers=90%uniquecoverageButlongerisbePerforcertainapplica:ons.

Page 3: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

Workflow

QualityAssessment

Trimming

QualityAssessment

MappingtoaReference

Visualiza:on

Callingvariants

Assessingfunc:onalimpactofvariants

SubmittoSRA

Page 4: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

Butwe’vealreadycoveredalignmentwithBLAST.

•  Alignmentmethodsmusthavetradeoffsforspeedvsaccuracy

•  Dependingontheapplica:on,maywanttomakedifferenttradeoffs

•  Globalvslocal•  Differenttypesofalignmentobjec:vesleadtodifferent

categoriesofaligners

Page 5: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

ShortReadMappers•  BLASTismuchfasterthanoriginalalgorithms(SmithWatermanforexample)

•  S:lltooslowfortheamountofdataproducedbyNGStechnology

•  Resequencingusuallyinvolvescomparingverysimilarsequences(>90%iden:tyinresidues)toareferencegenome

•  Soawarethatleveragesthishighpercentiden:tycanbefaster

•  Cangenerallyu:lizeaglobalstrategy

Page 6: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

ShortReadMappers

•  OrdersofmagnitudefasterthanBLAST•  severaltensofmillionsofreadsmappedperhourperCPU

•  Onlymatchesof95%iden:tyorgreaterarefound•  Indelsarepar:cularlyproblema:c•  Usuallyonlyoutputthebesthitorthesetofhitsallequivalentlygood–  Thepointisusuallytofindtheorigininthereferencegenome

–  Othergenomicregionsofloweriden:tyarenotconsidereduseful

Page 7: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

Uniqueness•  Somereadscanbemappeduniquelytothereference

•  Somecan’t–>Mul:plealignmentreads•  Mul:plealignmentreadsaredifficulttoapplytodownstreamapplica:ons–  RNASeq–whichgenedotheyrepresent?–  SNP–whichloca:oncarriesthepolymorphism?

•  Howtodealwithmul:plealignmentreads?–  Throwthemaway–  Thisintroducesbiasandignoresrealgenomicregionsthatmaybebiologicallyimportant

Page 8: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

CleverTrickstofind“Best”Alignments

•  Usethequalityvalues– Penalizemismatchesathighqualitybasesmorethanmismatchesatlowqualitybases

•  PairedEndinforma:on–  Ifonereaddoesnotmapuniquely,buttheotherdoes,usethatinforma:ontoplacethenon-uniqueone

– Needtoknowyourinsertsize

Page 9: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

Decisionsfortheenduser

•  Howmanymismatchesareallowedforareadtobeconsideredmapped?– Heterozygositybetweensampleandreference–  Incomplete/lowqualityreference

•  Howmanymatchestoreport?– Doesyourdownstreamanalysisneed/wanttoincludemul:plematches?

Explorethedocumenta1onandparametersforyour

so5wareofchoiceIsitdoingwhatyouthinkitsdoing?

Page 10: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

Lotsofchoices65soawarepackageslistedonthewikipediapageforalignmentsoaware

BlatBWAElandGSNAPMaqRMAPStampySHRiMP

Prizeforbestnamed:VelociMapper

Page 11: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

Howtochoose?

•  Memoryefficient•  Gooddocumenta:on•  Responsivemailinglistorhelpforum•  Maintainedandupdatedwhenbugsarefound– BWA

Page 12: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

Whatmappershaveincommon:IndexingStrategies

•  Usually,thefirststepistotransformpartofthedataintoamoresuitableformforfastsearching

•  Indexing–crea:ngaglossaryorlookuptable

•  Withoutindexingyouwouldhavetoscaneverythingeach:meyoudidasearch

•  Considerwebsearchengines

Page 13: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

•  Hasthreealgorithms•  Individualchromosomescannotbelongerthan2GB

•  OutputinSAMformat

Page 14: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

•  threealgorithms:•  BWA-backtrack

–  Meantforsequencesofupto100bpinlength–  Meantforreadswithlessthan2%error(candosomeendtrimming)

•  BWA-MEM–  Useforanysequencesgreaterthan70bpupto1Mb–  Muchmorewidelyusednowthatsequencersoutputlongerreads–  Willworkwithreadswith

•  2%errorfor100bp•  3%errorfor200bp•  etc

–  Hassplitreadsupport•  structuralvaria:ons,genefusionorreferencemisassembly

•  BWA-MEMismoreaccurateandfaster

hPp://bio-bwa.sourceforge.net/

Page 15: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAMANDBAMFORMAT

Page 16: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAMFormat•  SAM=SequenceAlignment/

Mapformat•  Tabdelimitedplaintext•  Storelargenucleo:de

sequencealignments–  Alignmentofeveryread–  IncludinggapsandSNPs–  Pairingofreads–  Canrecordmorethanone

alignmentloca:oninthegenome

–  Storesqualityvalues–  Storesinforma:onabout

duplica:on

•  Flexible•  Usefulforopera:onsonvery

largesequences•  Extremelydetailed

documenta:on–  hPps://samtools.github.io/hts-

specs/SAMv1.pdf•  Manipula:onsareprimarily

donewiththesoawaresamtools

•  Originallydesignedtostoremappinginforma:on,nowusedasaprimarystorageformatforunmappedsequencesaswell

Page 17: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAM-Header

•  Structure– Op:onalHeaderattopoffile

– Alignmentinforma:on

@@@@@ReadReadRead

Page 18: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAM-Header•  Headerlinesstartwith@symbol•  Alwaysattopoffile•  Containlotsofinforma:onaboutwhatwasmapped,whatitwasmappedto,andhow(metadata)–  theversioninforma:onfortheSAM/BAMfile– whetherornotandhowthefileissorted–  informa:onaboutthereferencesequences–  anyprocessingthatwasusedtogeneratethevariousreadsinthefile

–  soawareversion

Page 19: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SimpleHeader@HD=firstline

VN=versionofSAMformat SO=sortorder(thisissortedbycoordinates)

@HD VN:1.5 SO:coordinate@SQ SN:ref LN:45

@SQ=referencesequence SN=SequencereferenceName LN=seqeuncereferencelength

Decipherheaderinforma:on:hPps://samtools.github.io/hts-specs/SAMv1.pdf

Page 20: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

AlignmentLine•  Belowtheheadersarethealignmentrecords•  Tab-delimitedfields

•  1 QNAME Querytemplate/pairNAME•  2 FLAG bitwiseFLAG•  3 RNAME ReferencesequenceNAME•  4 POS 1-basedleamostPOSi:on/coordinateofclippedsequence•  5 MAPQ MAPpingQuality(Phred-scaled)•  6 CIGAR extendedCIGARstring•  7 MRNM MateReferencesequenceNaMe(`='ifsameasRNAME)•  8 MPOS 1-basedMatePOSis:on•  9 TLEN inferredTemplateLENgth(insertsize)•  10 SEQ querySEQuenceonthesamestrandasthereference•  11 QUAL queryQUALity(ASCII-33givesthePhredbasequality)•  12+ OPT variableOPTionalfieldsintheformatTAG:VTYPE:VALUE

Page 21: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

Letsunpackthisalignmentline,takenfromaSAMfile:SRR030257.2000020 83gi|254160123|ref|NC_012967.1|32957526036M = 3295706-82TGCTGGCGGCGATATCGTCCGTGGTTCCGATCTGGT?%<91<?>>??AAAAAAAAAAAAAAAAAAAAAAAAAXT:A:NM:i:0 SM:i:37 AM:i:37 X0:i:1X1:i:0XM:i:0 XO:i:0 XG:i:0 MD:Z:36

Page 22: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAMField1

Query name

SRR030257.2000020

2. Flag:83

Page 23: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

Field 2:Flag

83

64 + 16 + 2 + 1

1 = Read is paired2 = Read mapped in proper pair16 = Read mapped to reverse strand64 = First in pair

LookupaSAMflag:hPps://broadins:tute.github.io/picard/explain-flags.html

Page 24: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAMField3

Referencesequencename(usefulespeciallyifyouhavemul:plechromosomes)gi|254160123|ref|NC_012967.1|

Page 25: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAMField4

Posi:on-1-basedleamostmappingPOSi:onofthefirstmatchingbase3295752

Page 26: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAMField5

MappingQuality•  equals−10log10Pr{mappingposi:oniswrong},roundedtothenearestinteger

•  Probabilityof99.9%=mapqualityof30•  Probabilityof0%=mapqualityof0•  value255indicatesthatthemappingqualityisnotavailable.

60 .000001% probability wrong

Page 27: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAMField6CIGARString36M 36 nucleotides match (perfect match)8S28M 8 nucleotides clipped, 28 match

Page 28: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

MoreCIGAR

Position: 5CIGAR: 3M1I3M1D5M

Page 29: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAMField7

Referencesequenceforthenextreadinthetemplate•  Foraforwardread,thisisthereferencewherethereversereadmaps

•  Forareverseread,thisisthereferencewheretheforwardreadmaps

= reverse read maps on the same reference

Page 30: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAMField8

Posi:onwherethenextreadmaps

3295706

(Forward read mapped at 3295752. Remember the forward read mapped to the reverse strand)

Page 31: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAMField9

Observedtemplatelength

-82

Page 32: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAMField10

Sequenceoftheread

TGCTGGCGGCGATATCGTCCGTGGTTCCGATCTGGT

Page 33: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAMField11

Qualityoftheread

?%<91<?>>??AAAAAAAAAAAAAAAAAAAAAAAAA

Page 34: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAMField12

Op:onalMOREinforma:onTAG:TYPE:VALUEformat

XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:36

Anything with an X is specified by the user or by the mapping software, and is not part of the SAM spec.

Page 35: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

DecipherthelastfieldsXT:A:U One of Unique/Repeat/N/Mate-swNM:i:0 Edit distance to the referenceSM:i:37 Template-independent mapping qualityAM:i:37 Smallest template-independent mapping

quality of other segmentsX0:i:1 Number of best hitsX1:i:0 Number of suboptimal hits found by BWAXM:i:0 Number of mismatches in the alignmentXO:i:0 Number of gap opensXG:i:0 Number of gap extentionsMD:Z:36 String for mismatching positions

Page 36: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SAMThefirstthreelinesofasamfile:@SQ SN:gi|254160123|ref|NC_012967.1| LN:4629812@PG ID:bwaPN:bwa VN:0.7.12-r1039 CL:/lustre/projects/rnaseq_ws/apps/bwa-0.7.12/bwasampe../raw_data/NC_012967.1.fastaaln_SRR030257_1.saialn_SRR030257_2.sai../raw_data/SRR030257_1.fastq../raw_data/SRR030257_2.fastqSRR030257.1 99 gi|254160123|ref|NC_012967.1|

950180 60 36M = 950295 151TTACACTCCTGTTAATCCATACAGCAACAGTATTGGAAA;A;AA?A?AAAAA?;?A?1A;;????566)=*1 XT:A:UNM:i:1SM:i:37 AM:i:25 X0:i:1 X1:i:0 XM:i:1XO:i:0XG:i:0 MD:Z:32C3

Page 37: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

BAMFormat•  SisterformattoSAM•  BAM–BinaryversionofSAM•  compressedBGZF(BlockedGNUZipFormat)-avariantofGZIP

(GNUZIP),•  filesarebiggerthanGZIPfiles,buttheyaremuchfasterforrandom

access•  Canindexandthenlookupinforma:onembeddedinthefilewith

decompressingthewholefile•  upto75%smallerinsize•  Notreadablebypeople

hPp://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-bePer-gzip.html

Page 38: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

samtools•  View–printalignmentstoyourscreenorconvertbetweenformats.Canreduce

filestoapar:cularregiononly•  Tview-textalignmentviewer,niayforquickviewingoffiles•  Mpileup–generatesaspecialmpileupformaPedfileneededforcallingvariants•  Sort–sortthealignments(bydefault,sortsbycoordinate).Sor:ngisneededfor

mostdownstreamapplica:ons.•  Merge–concatenatebamfilestogether,whilemaintainingsor:ngorder•  Index-indexabamorcramfile,neededformostdownstreamapplica:ons•  Idxstats–getsomestatsaboutyourbamfile•  Faidx-indexafastafile,needformostdownstreamapplica:onsusingabamfile•  Bam2fq–convertabamfiletoafastqfile•  More…

AlwaystheformatSamtoolssubcommand–flags–moreflags

Page 39: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SNPCALLING

Page 40: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SNPCalling

•  SNP=singlenucleo:depolymorphism

•  Indel=inser:on/dele:on

•  Examinethealignmentsofreadsandlookfordifferencesbetweenthereferenceandtheindividual(s)beingsequenced

Fengetal2014.NovelSingleNucleo:dePolymorphismsoftheInsulin-LikeGrowthFactor-IGeneandTheirAssocia:onswithGrowthTraitsinCommonCarp(CyprinuscarpioL.)

Page 41: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SNPCalling•  Difficul:es:–  Cloningprocess(PCR)ar:facts–  Errorsinthesequencingreads–  Incorrectmapping–  Errorsinthereferencegenome

•  HengLi,developerofBWA,lookedatmajorsourcesoferrorsinvariantcalls*:–  erroneousrealignmentinlow-complexityregions–  theincompletereferencegenomewithrespecttothesample

*Li2014TowardbePerunderstandingofar:factsinvariantcallingfromhigh-coveragesamples.Bioinforma:cs.

Page 42: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

Indel

Canbedifficulttodecidewherethebestalignmentactuallyis

*hPps://bioinf.comav.upv.es/courses/sequence_analysis/snp_calling.html

Page 43: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

SNPCallingSoaware•  Samtool’smpileup->bcaools•  GATK(GenomeAnalysisTooklit)•  FreeBayes

•  Takeintoaccountthequalityvalueoftheindividualbaseandthequalityvalueforthealignmentoftheread

•  Canu:lizeinforma:onaboutpreviouslycalled/confirmedSNPs

Page 44: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

Steps

•  Preprocessingreadalignments*– PicardMarkDuplicates– GATKbasequalityscorerecalibra:on– GATKrealignmentaroundindels

•  CalltheSNPswithsoawareofyourchoice•  FiltertheSNPs– Goaltoremovefalseposi:ves

Mayormaynotbeworthpreprocessing:hPps://bcbio.wordpress.com/2013/10/21/updated-comparison-of-variant-detec:on-methods-ensemble-freebayes-and-minimal-bam-prepara:on-pipelines/

Page 45: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

Filtering•  Depth

–  HowmanyreadsdoyouneedtosampletoconfidentlycallaSNP?–  >20X=verygood–  5-20X=okay–  <5X=missingmanyheterozygouscalls

–  Lowdepthmaybeovercomeusingsta:s:cs–seeoverviewofgenotypelikelihoodshere:

–  hPp://www.ncbi.nlm.nih.gov/pmc/ar:cles/PMC3593722/

•  Lowquality–usethequalityes:matesprovidedbythesoaware•  Numberofalleles(monomorphicvsbiallelic)•  Highcoverage–canindicateaduplicatedregioninthegenome•  Highlyvariableregion–canalsoindicateaduplicatedregion•  Lowcomplexityregions

Page 46: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

Yourmileagemayvary•  Differentdecisionsabouthowtoalignreadsandiden:fyvariantscanyieldverydifferentresults

•  5pipelines•  “SNVconcordancebetweenfiveIlluminapipelinesacrossall15exomeswas57.4%,while0.5to5.1%ofvariantswerecalledasuniquetoeachpipeline.Indelconcordancewasonly26.8%betweenthreeindel-callingpipelines”

Page 47: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

bcaools

•  BCFtoolsisasetofu:li:esthatmanipulatevariantcallsintheVariantCallFormat(VCF)anditsbinarycounterpartBCF.

•  Ack,moreformats!!!

Page 48: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

VCF

•  VariantCallFormat•  Officialspec:hPp://samtools.github.io/hts-specs/VCFv4.2.pdf

•  Headerlinesstar:ngwith#signs

•  Lineswithvariantsaaerward

#####ReadReadRead

Page 49: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

VCF(cont)•  Tabdelimitedfields

–  Chromosome–  Loca:on–  ID(ifthisisanamedvariant)–  Referencesequence–  Alternatesequence–  Qualityscore–  Filter(true/false–whetherornotitpassedfiltering)–  Info–lotsofaddi:onalinfosuchasCIGARstring,depthacross

differentsamples,etc.–  Columnsfollowforeachgenotypeifavailable

•  BCFisthecompressedbinaryformat–  SAM<->BAM–  VCF<->BCF

Page 50: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

VCFExample

#CHROM 20POS 14370ID rs6054257REF GALT AQUAL 29FILTER PASSINFO NS=3;DP=14;AF=0.5;DB;H2FORMAT GT:GQ:DP:HQNA00001 0|0:48:1:51,51NA00002 1|0:48:8:51,51NA00003 1/1:43:5:.,.

Page 51: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

bcaools•  BcaoolscallsSNPsfrom

mpileupfileandmanipulatesvcf/bcfformaPedfiles

•  Call–SNP/indelcalling•  Filter–filterthevariantsby

quality•  Merge–mergeVCFfiles

together•  Consensus–resequencedan

individualandgeneratethereferencesequenceforthatindividual

•  Stats-sta:s:cs•  Convert–convertbetween

formats

Page 52: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

Overview

Samtools

Bcaools

•  WorkswithSAM/BAMfiles

•  Producesmpileup

•  CallSNPsfrommpileup•  WorkswithVCF/BCFfiles

AlignmentData

VariantData

Page 53: Read Mapping and Variant Calling - Read the DocsClever Tricks to find “Best” Alignments • Use the quality values – Penalize mismatches at high quality bases more than mismatches

IGV•  high-performancevisualiza:on

toolforinterac:veexplora:onoflarge,integratedgenomicdatasets

•  Freebutrequiresone:meregistra:on

hPp://www.broadins:tute.org/igv/

•  Visualizeslotsofdatatypes–  NGSreadalignments–  Geneannota:on–  Variants–  Etc.