Upload
others
View
10
Download
0
Embed Size (px)
Citation preview
Lecture5,6Localsequencealignment
Chapter6inJonesandPevzner
Fall2019September12,17,2019
Evolutionasatoolforbiologicalinsight
• “Nothinginbiologymakessenseexceptinthelightofevolution”-TheodosiusDobzhansky.
• Thefunctionalityofmany
genesisvirtuallythesameamongmanyorganisms:Canunderstandbiologyinsimplerorganismsthanourselves(“modelorganisms”).
Localalignment:rationale
• Proteinsareoftenmulti-functional,andarecomposedofregions(domains),eachofwhichcontributesaparticularfunction
• Example:
² Homeoboxgeneshaveashortregioncalledthehomeodomainthatishighlyconservedbetweenspecies.
² AglobalalignmentmightnotfindthehomeodomainbecauseitwouldtrytoaligntheENTIREsequence
Drosophila homeodomain PDB: 1ZQ3
Localvs.GlobalAlignment(cont’d)
Global Alignment Local Alignment—better for finding a conserved segment
--T—-CC-C-AGT—-TATGT-CAGGGGACACG—A-GCATGCAGA-GAC | || | || | | | ||| || | | | | |||| | AATTGCCGCC-GTCGT-T-TTCAG----CA-GTTATG—T-CAGAT--C
tccCAGTTATGTCAGgggacacgagcatgcagagac ||||||||||||
aattgccgccgtcgttttcagCAGTTATGTCAGatc
TheLocalAlignmentProblem
• Goal:Findthebestlocalalignmentbetweentwostrings
• Input:Stringsv,wandscoringmatrixδ• Output:Alignmentofsubstringsofvandwwhosealignmentscoreismaximumamongallpossiblealignmentofallpossiblesubstrings
Localvs.globalalignment
Global alignment
Local alignment
Compute a “mini” global alignment to get a local alignment
Localvs.GlobalAlignment
• TheGlobalAlignmentProblemtriestofindthelongestpathbetweenvertices(0,0)and(n,m)intheeditgraph.
• TheLocalAlignmentProblemtriestofindthelongestpathamongpathsbetweenarbitraryvertices(i,j)and(i’,j’)intheeditgraph.
• Inaneditgraphwithnegatively-scorededges,alocalalignmentmayscorehigherthanaglobalalignment
TheProblemwiththisProblem
• Naïvemethod(runtimeO(n4)):
-Inagridofsizenxntherearen2nodes(i,j)thatmayserveasasource.
-Foreachsuchnodecomputingalignmentsfrom(i,j)to(i’,j’)takesO(n2)time.
LocalAlignment:Example
LocalAlignment:Example
LocalAlignment:Example
LocalAlignment:Example
LocalAlignment:Example
LocalAlignment:FreeRides
(0,0)
The dashed edges represent the free rides from (0,0) to every other node.
The image cannot be displayed. Your computer may not have enough memory to open the image, or the image may have been corrupted. Restart your computer, and then open the file again. If the red x still appears, you may have to delete the image and then insert it again.
Yeah, a free ride!
LocalAlignment:Recurrence
0 si-1,j-1 + δ (vi, wj)
s i-1,j + δ (vi, -) s i,j-1 + δ (-, wj)
Power of ZERO: this is the only change from the original recurrence of a global alignment, representing the “free ride” edge
si,j = max
LocalAlignment:Backtrace
• Scoreofbestlocalalignmentisthemaximumentryofsij
• Thealignmentisfoundbyabacktracefromthemaximumnode,toanodeforwhichthescoreis0.
LocalAlignment:SW
• Thislocalalignmentalgorithmisknownasthe“Smith-Waterman”algorithm1.
• T.F.SmithandM.W.Waterman.Identificationofcommonmolecularsubsequences.J.Mol.Biol.147:195-197,1981.
1TheSmith-Watermanalgorithmconsidersamoresophisticatedgappenaltyscheme
ScoringIndels:NaiveApproach
• Afixedpenaltyσisgiventoeveryindel:• -σfor1indel,• -2σfor2consecutiveindels• -3σfor3consecutiveindels,etc.
Canbetooseverepenaltyforaseriesof100consecutiveindels
AffineGapPenalties
• Innature,aseriesofkindelsoftencomesasasingleeventratherthanaseriesofksinglenucleotideevents:
Normal scoring would give the same score for both alignments
This is more likely.
This is less likely.
Affinegappenalty
• Scoreforagapoflengthxis:-(ρ+σx)where:ρ>0isthegapopeningpenalty σ>0isthegapextensionpenalty
• ρislargerelativetoσbecauseyoudonotwanttoadd
toomuchofapenaltyforextendingthegap.
AffineGapPenaltiesandEditGraph
Toreflectaffinegappenaltieswehavetoadd“long”horizontalandverticaledgestotheeditgraph.Eachsuchedgeoflengthxshouldhaveweight -ρ - x *σ
Adding“AffinePenalty”EdgestotheEditGraph
• Therearemanysuchedges!
• Addingthemtothegraphincreasestherunningtimeofthealignmentalgorithmbyafactorofn.
• ThecomplexityincreasesfromO(n2)toO(n3)
The3-leveledManhattanGrid
gaps in w
matches/mismatches
gaps in v
Manhattanin3Layers
ρ
ρ
σ
σ δ δ
δ
δ δ
AffineGapPenaltiesand3LayerManhattanGrid
• We’llhavethreerecurrencesina3-layeredgraph.
• Thetoplevelcreates/extendsgapsinthesequencew.
• Thebottomlevelcreates/extendsgapsinsequencev.
• Themiddlelevelextendsmatchesandmismatches.
SwitchingBetweentheLayers
• Levels:• Themainlevelisfordiagonaledges• Thelowerlevelisforhorizontaledges• Theupperlevelisforverticaledges
• Ajumpingpenaltyisassignedtomovingfromthemainleveltoeithertheupperlevelorthelowerlevel(-ρ- σ)
• Thereisagapextensionpenaltyforeachcontinuationonalevelotherthanthemainlevel(-σ)
AffineGapPenaltyRecurrences
si,j = s i-1,j - σ max s i-1,j - (ρ+σ) si,j = s i,j-1 - σ max s i,j-1 - (ρ+σ) si,j = si-1,j-1 + δ (vi, wj) max s i,j s i,j
Continue gap in w Start gap in w : from middle
Continue gap in v Start gap in v : from middle
Match or Mismatch End gap: from top End gap: from bottom
ShouldwecompareDNAorproteinsequences?
• DNAsequenceislessconservedthanproteinsequence
• TheproteinsequencecontainsmoreinformationthantheDNAsequence
⇒Lesseffectivetocomparecodingregionsatthenucleotidelevel
ScoringMatrices
MakingaScoringMatrix
• Scoringmatricesarecreatedbasedontheintuitionthatsomemutationshaveasmallereffectonthefunctionofaprotein⇒Suchmismatchpenaltiesshouldbelessharshthanothers.
ScoringMatrix:Example
A R N K
A 5 -2 -1 -1
R - 7 -1 3
N - - 7 0
K - - - 6
• AlthoughRandKaredifferentaminoacids,theyhaveapositivescore.
• Why?Theyarebothpositivelychargedaminoacids,thereforesubstitutionwillnotgreatlychangethefunctionoftheprotein.
SubstitutionsofAminoAcidsMutationratesbetweenaminoacidshavedramaticdifferences!
Conservation
• Aminoacidchangesthatpreservethephysico-chemicalpropertiesoftheoriginalresidueshouldreceivehigherscores• polartopolar• aspartateàglutamate
• hydrophobictohydrophobic• alanineàvaline
PercentSequenceIdentity
• Ameasureoftheextenttowhichtwonucleotideoraminoacidsequencesaresimilar
A C C T G A G – A G A C G T G – G C A G
70% identical mismatch
indel
BLOSUM
• BlocksSubstitutionMatrix• Scoresderivedfromobservationsofthefrequenciesofsubstitutionsinalignmentsofrelatedproteins
• Matrixnameindicatesevolutionarydistance:• BLOSUM62wascreatedusingsequencessharingnomorethan62%identity
Henikoff, S. and Henikoff, J. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA. 89(biochemistry): 10915 - 10919. 1992
AnentryfromtheBLOCKSdatabase
Block PR00851A ID XRODRMPGMNTB; BLOCK AC PR00851A; distance from previous block=(52,131) DE Xeroderma pigmentosum group B protein signature BL adapted; width=21; seqs=8; 99.5%=985; strength=1287 XPB_HUMAN|P19447 ( 74) RPLWVAPDGHIFLEAFSPVYK 54XPB_MOUSE|P49135 ( 74) RPLWVAPDGHIFLEAFSPVYK 54P91579 ( 80) RPLYLAPDGHIFLESFSPVYK 67XPB_DROME|Q02870 ( 84) RPLWVAPNGHVFLESFSPVYK 79RA25_YEAST|Q00578 ( 131) PLWISPSDGRIILESFSPLAE 100Q38861 ( 52) RPLWACADGRIFLETFSPLYK 71O13768 ( 90) PLWINPIDGRIILEAFSPLAE 100O00835 ( 79) RPIWVCPDGHIFLETFSAIYK 86//
The BLOCKS database is at: http://blocks.fhcrc.org/
TheBlosum50ScoringMatrix