Michael Schroeder Biotechnology Center TU Dresden Global and Local Alignments


Contents

§ Why compare and align sequences?
§ How to judge an alignment?
  § Z-score, E-value, P-value, structure and function
§ How to compare and align sequences?
  § Levenshtein distance, scoring schemes, longest common subsequence, global and local alignment, substitution matrix
§ How to compute an alignment?
  § Dynamic programming
§ How to compute an alignment fast?
  § Blast
§ How to align many sequences?
  § Multiple sequence alignment, phylogenetic trees
§ Alignments and structure
  § How to predict protein structure from protein sequence

Local Alignment

§ Needleman-Wunsch = globally best alignment
§ Finding domains / exons
§ Maximise local alignments by ignoring terminal gaps
§ How to maximise locally?

Local Alignment

§ Global Alignment
  § Path in distance matrix $d$ from $d_{0,0}$ to $d_{m,n}$
§ Local Alignment
  § Path in $d$ from any $d_{k,l}$ to any $d_{o,p}$ such that $d_{o,p} - d_{k,l} \ge d_{o,p} - d_{i,j}$ for any $i \le m$ and $j \le n$ with $o \ge k$, $p \ge l$, $o \ge i$, and $p \ge j$
  § A path must exist from o,p to k,l and o,p to i,j in db
§ How to
  § chop off right side?
  § chop off left side?

Needleman-Wunsch Algorithm with Substitution Matrix

Global alignment of string a to b

Log-Odds Ratio

$$
\log_2 \frac{P(x, y)}{P(x) \cdot P(y)}
$$

Needle

Let $a = a_1 \dots a_m$ and $b = b_1 \dots b_n$ be strings. Then

$$
\mathrm{needle}_{a,b} = \mathrm{needle}_{a,b}(m, n)
$$

is the global alignment score of a and b with substitution matrix, where

$$
\mathrm{needle}_{a,b}(i, j) =
\begin{cases}
i \cdot s_g & \text{if } j = 0,\\
j \cdot s_g & \text{if } i = 0,\\
\max \begin{cases}
\mathrm{needle}_{a,b}(i-1, j) + s_g\\
\mathrm{needle}_{a,b}(i, j-1) + s_g\\
\mathrm{needle}_{a,b}(i-1, j-1) + d_s(a_i, b_j)
\end{cases} & \text{otherwise,}
\end{cases}
$$

for $0 \le i \le m$ and $0 \le j \le n$, substitution matrix $d_s(a_i, b_j)$, and gap penalty $s_g < 0$.
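The entries of a substitution matrix are commonly log-odds scores of the form given under "Log-Odds Ratio" above. The following is a minimal sketch of that single formula; the function name `log_odds` and the probabilities in the usage lines are hypothetical illustration values, not taken from any published matrix.

```python
import math


def log_odds(p_xy, p_x, p_y):
    """Log-odds score log2(P(x, y) / (P(x) * P(y))).

    p_xy: probability of seeing x aligned with y in related sequences
    p_x, p_y: background probabilities of x and y
    All three values are assumed to be strictly positive.
    """
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical numbers: the pair occurs twice as often as chance predicts.
favoured = log_odds(0.125, 0.25, 0.25)
# Hypothetical numbers: the pair occurs half as often as chance predicts.
disfavoured = log_odds(0.03125, 0.25, 0.25)
print(favoured, disfavoured)
```

A positive score marks a pair that is aligned more often than expected by chance, a negative score one that is aligned less often.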

Smith-Waterman Algorithm

Local alignment of string a to b

Let $a = a_1 \dots a_m$ and $b = b_1 \dots b_n$ be strings. Then

$$
\mathrm{water}_{a,b} = \max_{1 \le i \le m,\; 1 \le j \le n} \{ \mathrm{water}_{a,b}(i, j) \}
$$

is the local alignment score of a and b, where

$$
\mathrm{water}_{a,b}(i, j) =
\begin{cases}
0 & \text{if } \min(i, j) = 0,\\
\max \begin{cases}
0\\
\mathrm{water}_{a,b}(i-1, j) + s_g\\
\mathrm{water}_{a,b}(i, j-1) + s_g\\
\mathrm{water}_{a,b}(i-1, j-1) + d_s(a_i, b_j)
\end{cases} & \text{otherwise,}
\end{cases}
$$

and $0 \le i \le m$ and $0 \le j \le n$, substitution matrix $d_s(a_i, b_j)$, and gap penalty $s_g < 0$.

Local Alignment with Dynamic Programming

[Table: empty dynamic-programming matrix, rows i labelled p, e, d, r, o and columns j labelled p, e, t, r, e, l, l, a]

Global Alignment with Substitution Matrix and Dynamic Programming

needle(a,b,ds):
    let d be a matrix of size (m+1) × (n+1)
    for 0 ≤ i ≤ m: d[i,0] = i * sg
    for 1 ≤ j ≤ n: d[0,j] = j * sg
    for 1 ≤ i ≤ m:
        for 1 ≤ j ≤ n:
            d[i,j] = max(d[i-1,j  ] + sg,
                         d[i  ,j-1] + sg,
                         d[i-1,j-1] + ds[ai,bj])
    return d[m,n]
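The needle pseudocode above can be sketched in runnable Python as follows. Two things are assumptions of this sketch and not part of the slide: the gap penalty `sg` is passed as an explicit argument, and the substitution matrix is represented by a hypothetical function `simple_ds` that scores +1 for a match and -1 for a mismatch.

```python
def needle(a, b, ds, sg):
    """Global alignment score of strings a and b (Needleman-Wunsch).

    ds: function returning the substitution score of two characters
    sg: gap penalty, expected to be negative
    """
    m, n = len(a), len(b)
    # d[i][j] holds the best score for aligning a[:i] with b[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i * sg  # a[:i] aligned against gaps only
    for j in range(1, n + 1):
        d[0][j] = j * sg  # b[:j] aligned against gaps only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = max(
                d[i - 1][j] + sg,      # a[i-1] aligned with a gap
                d[i][j - 1] + sg,      # b[j-1] aligned with a gap
                d[i - 1][j - 1] + ds(a[i - 1], b[j - 1]),  # (mis)match
            )
    return d[m][n]


def simple_ds(x, y):
    """Hypothetical scoring: +1 for identical characters, -1 otherwise."""
    return 1 if x == y else -1


print(needle("pedro", "pedro", simple_ds, -1))
```

Python strings are 0-indexed, so the character written a_i on the slides is `a[i - 1]` here.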

Local Alignment with Dynamic Programming

water(a,b,ds):
    let d be a matrix of size (m+1) × (n+1)
    max = -∞
    for 0 ≤ i ≤ m: d[i,0] = 0
    for 1 ≤ j ≤ n: d[0,j] = 0
    for 1 ≤ i ≤ m:
        for 1 ≤ j ≤ n:
            d[i,j] = max(0,
                         d[i-1,j  ] + sg,
                         d[i  ,j-1] + sg,
                         d[i-1,j-1] + ds[ai,bj])
            if d[i,j] > max: max = d[i,j]
    return max
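A runnable Python sketch of the water pseudocode above, applied to the strings pedro and petrella from the earlier table. The scoring is an assumption of this sketch (hypothetical `simple_ds` with +1 for a match and -1 for a mismatch, gap penalty -1), as is passing `sg` as an argument. Unlike the pseudocode, the running maximum starts at 0, the score of the empty local alignment, so that empty inputs return 0 instead of -∞.

```python
def water(a, b, ds, sg):
    """Local alignment score of strings a and b (Smith-Waterman).

    ds: function returning the substitution score of two characters
    sg: gap penalty, expected to be negative
    """
    m, n = len(a), len(b)
    # First row and column stay 0: a local alignment may start anywhere.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0  # assumption: the empty local alignment scores 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = max(
                0,                     # drop the prefix, start afresh
                d[i - 1][j] + sg,      # a[i-1] aligned with a gap
                d[i][j - 1] + sg,      # b[j-1] aligned with a gap
                d[i - 1][j - 1] + ds(a[i - 1], b[j - 1]),  # (mis)match
            )
            best = max(best, d[i][j])  # a local alignment may end anywhere
    return best


def simple_ds(x, y):
    """Hypothetical scoring: +1 for identical characters, -1 otherwise."""
    return 1 if x == y else -1


print(water("pedro", "petrella", simple_ds, -1))  # prints 2
```

Under this scoring the best local alignments score 2, for example "pe" against "pe". The clamp at 0 chops off a poorly scoring left side, and taking the maximum over all cells chops off the right side.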

Affine Gap Penalties

§ So far: 5 gaps of size 1 are as good as 1 gap of size 5
§ But: Often whole substrings are deleted/inserted
§ Gap score for a gap of length $l$: $s_g = s_o + l \cdot s_e$
  § $s_o$ is the gap opening score
  § $s_e$ is the gap extension score
§ Gap penalty vs. match/mismatch
  § High: shorter, lower-scoring alignments with fewer gaps
  § Low: higher-scoring, longer alignments with more gaps
§ Gap opening vs. gap extension
  § Opening influences number of gaps
  § Extension influences length of gaps
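The difference between the two gap models above can be checked with a few lines of arithmetic. This is a small sketch; the function names and the values sg = -2, so = -10, se = -1 are hypothetical and chosen only for illustration.

```python
def linear_gap_score(length, sg):
    """Score of a gap of the given length under a constant per-position penalty."""
    return length * sg


def affine_gap_score(length, so, se):
    """Score of a gap of the given length: opening score plus length * extension."""
    return so + length * se


sg, so, se = -2, -10, -1  # hypothetical penalties

# Linear model: five gaps of size 1 score the same as one gap of size 5.
print(5 * linear_gap_score(1, sg), linear_gap_score(5, sg))          # -10 -10

# Affine model: one long gap is much cheaper than five short ones.
print(5 * affine_gap_score(1, so, se), affine_gap_score(5, so, se))  # -55 -15
```

Each separate gap pays the opening score once, so raising its magnitude reduces the number of gaps, while the extension score controls how long each gap can grow.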

Needleman-Wunsch with Substitution Matrix and Affine Gap Penalties

Global alignment of string a to b

Let $a = a_1 \dots a_m$ and $b = b_1 \dots b_n$ be strings. Then

$$
\mathrm{needle}_{a,b} = \mathrm{needle}_{a,b}(m, n)
$$

is the global alignment score of a and b with substitution matrix and affine gap penalties, where

$$
\mathrm{needle}_{a,b}(i, j) =
\begin{cases}
0 & \text{if } i = j = 0,\\
s_o + i \cdot s_e & \text{if } j = 0,\\
s_o + j \cdot s_e & \text{if } i = 0,\\
\max \begin{cases}
\mathrm{needle}_{a,b}(i-1, j) + d_{del}(i, j)\\
\mathrm{needle}_{a,b}(i, j-1) + d_{ins}(i, j)\\
\mathrm{needle}_{a,b}(i-1, j-1) + d_s(a_i, b_j)
\end{cases} & \text{otherwise,}
\end{cases}
$$

and $0 \le i \le m$, $0 \le j \le n$, substitution matrix $d_s(a_i, b_j)$, gap opening penalty $s_o < 0$, gap extension penalty $s_e < 0$, and affine gap penalty matrices

$$
d_{del}(i, j) =
\begin{cases}
2 s_o + j \cdot s_e & \text{if } i = 0 \text{ and } j > 0,\\
\max \begin{cases}
d_{del}(i-1, j) + s_e\\
\mathrm{needle}_{a,b}(i-1, j) + s_o + s_e
\end{cases} & \text{otherwise,}
\end{cases}
$$

$$
d_{ins}(i, j) =
\begin{cases}
2 s_o + i \cdot s_e & \text{if } j = 0 \text{ and } i > 0,\\
\max \begin{cases}
d_{ins}(i, j-1) + s_e\\
\mathrm{needle}_{a,b}(i, j-1) + s_o + s_e
\end{cases} & \text{otherwise.}
\end{cases}
$$
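Global alignment with affine gaps can be sketched in Python as shown below. This sketch uses the common three-matrix (Gotoh-style) formulation, in which the deletion and insertion matrices hold complete alignment scores and impossible states are set to -∞; that is an assumption of the sketch and not a literal transcription of the recurrence above. It keeps the same gap score, so + l · se for a gap of length l. The names `needle_affine` and `simple_ds` and the example penalties are hypothetical.

```python
NEG_INF = float("-inf")


def needle_affine(a, b, ds, so, se):
    """Global alignment score with affine gaps (gap of length l: so + l * se)."""
    m, n = len(a), len(b)
    # s: best score overall; dele: best score ending with a[i-1] against a gap;
    # ins: best score ending with b[j-1] against a gap.
    s = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    dele = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    ins = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    s[0][0] = 0
    for i in range(1, m + 1):
        dele[i][0] = so + i * se  # one gap spanning a[:i]
        s[i][0] = dele[i][0]
    for j in range(1, n + 1):
        ins[0][j] = so + j * se   # one gap spanning b[:j]
        s[0][j] = ins[0][j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Either extend an open gap or open a new one.
            dele[i][j] = max(dele[i - 1][j] + se, s[i - 1][j] + so + se)
            ins[i][j] = max(ins[i][j - 1] + se, s[i][j - 1] + so + se)
            s[i][j] = max(
                s[i - 1][j - 1] + ds(a[i - 1], b[j - 1]),  # (mis)match
                dele[i][j],
                ins[i][j],
            )
    return s[m][n]


def simple_ds(x, y):
    """Hypothetical scoring: +1 for identical characters, -1 otherwise."""
    return 1 if x == y else -1


# Two matches plus one gap of length 3: 2 + (-2 + 3 * -1) = -3
print(needle_affine("abcde", "ab", simple_ds, -2, -1))
```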

Global Alignment with Dynamic Programming

[Table: empty dynamic-programming matrix, rows i labelled p, e, d, r, o and columns j labelled a, p, i, e, d]
