Alignment methods
April 26, 2011
Return Quiz 1 today
Return homework #4 today. Next homework due Tues, May 3
Learning objectives - Understand the Smith-Waterman algorithm. Have a general understanding about PAM and BLOSUM scoring matrices.
Workshop - Compare scoring matrices.



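Smith-Waterman Algorithm
Advances in Applied Mathematics, 2:482-489 (1981)

Smith-Waterman algorithm
- The dynamic programming approach can be used for global alignment (no zero term in the recurrence) or local alignment (with the zero term); Smith-Waterman itself is the local form
- Memory intensive
- Common searching programs such as BLAST use a faster heuristic approximation of the SW algorithm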


Smith-Waterman algorithm

$$
M_{i,j} = \max
\begin{cases}
M_{i-1,j-1} + s_{i,j} & \text{(match or mismatch in the diagonal)} \\
M_{i,j-1} + w & \text{(gap in sequence \#1)} \\
M_{i-1,j} + w & \text{(gap in sequence \#2)} \\
0
\end{cases}
$$

Where $M_{i-1,j-1}$ is the value in the cell diagonally juxtaposed to $M_{i,j}$. (The $i-1, j-1$ cell is up and to the left of $M_{i,j}$.)

Where $s_{i,j}$ is the value for the match or mismatch in the $M_{i,j}$ cell.

Where $M_{i,j-1}$ is the value in the cell above $M_{i,j}$.

Where $w$ is the value for the gap penalty.

Where $M_{i-1,j}$ is the value in the cell to the left of $M_{i,j}$.


Two sequences to align

Sequence 1: ABCNJRQCLCRPM

Sequence 2: AJCJNRCKCRBP


Initialization step: create a matrix with M + 1 columns and N + 1 rows, where M = number of letters in sequence 1 and N = number of letters in sequence 2. The first column and first row (the extra column and row placed before the first letter of each sequence) are filled with 0's.

    A B C N J R Q C L C R P M
  0 0 0 0 0 0 0 0 0 0 0 0 0 0
A 0 1
J 0
C 0
J 0
N 0
R 0
C 0
K 0
C 0
R 0
B 0
P 0


Matrix fill step: Each position $M_{i,j}$ is defined to be the MAXIMUM score at position i,j

$$
M_{i,j} = \max
\begin{cases}
M_{i-1,j-1} + s_{i,j} & \text{(match or mismatch in the diagonal)} \\
M_{i,j-1} + w & \text{(gap in sequence \#1)} \\
M_{i-1,j} + w & \text{(gap in sequence \#2)}
\end{cases}
$$

    A B C N J R Q C L C R P M
  0 0 0 0 0 0 0 0 0 0 0 0 0 0
A 0 1 1 1 1 1 1 1 1 1 1 1 1 1
J 0 1
C 0 1
J 0 1
N 0 1
R 0 1
C 0 1
K 0 1
C 0 1
R 0 1
B 0 1
P 0 1


    A B C N J R Q C L C R P M
  0 0 0 0 0 0 0 0 0 0 0 0 0 0
A 0 1 1 1 1 1 1 1 1 1 1 1 1 1
J 0 1 1 1 1 2 2 2 2 2 2 2 2 2
C 0 1 1 2 2 2 2 2 3 3 3 3 3 3
J 0 1 1 2 2 3 3 3 3 3 3 3 3 3
N 0 1 1 2 3 3 3 3 3 3 3 3 3 3
R 0 1 1 2 3 3 4 4 4 4 4 4 4 4
C 0 1 1 2 3 3 4 4 5 5 5 5 5 5
K 0 1 1 2 3 3 4 4 5 5 5 5 5 5
C 0 1 1 2 3 3 4 4 5 5 6 6 6 6
R 0 1 1 2 3 3 4 4 5 5 6 7 7 7
B 0 1 2 2 3 3 4 4 5 5 6 7 7 7
P 0 1 2 2 3 3 4 4 5 5 6 7 8 8


Sequence 1: ABCNJ-RQCLCR-PM
Sequence 2: AJC-JNR-CKCRBP-
Score: 8
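The matrix fill for the two example sequences can be sketched in Python. This is a minimal sketch, not the course's own code: the function name `fill_matrix` and the scoring values (match = 1, mismatch = 0, gap penalty w = 0) are assumptions, chosen because they reproduce the numbers in the worked matrix. Sequence 1 runs along the columns and sequence 2 along the rows, as in the slides.

```python
def fill_matrix(seq1, seq2, match=1, mismatch=0, gap=0):
    """Fill the scoring matrix using the recurrence without the zero term.

    The scoring values are assumed, not stated in the slides.
    """
    rows, cols = len(seq2) + 1, len(seq1) + 1
    # Row 0 and column 0 are initialized to 0.
    M = [[0] * cols for _ in range(rows)]
    for r in range(1, rows):
        for c in range(1, cols):
            s = match if seq2[r - 1] == seq1[c - 1] else mismatch
            M[r][c] = max(
                M[r - 1][c - 1] + s,  # diagonal: match or mismatch
                M[r - 1][c] + gap,    # from the cell above: gap in sequence 1
                M[r][c - 1] + gap,    # from the cell to the left: gap in sequence 2
            )
    return M


M = fill_matrix("ABCNJRQCLCRPM", "AJCJNRCKCRBP")
print(M[-1][-1])  # 8
```

With these assumed scores the bottom-right cell is simply the number of matched letters in the best alignment, which agrees with the score of 8 reported for the example.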


Smith-Waterman (local alignment)

a. Initializes edges of the matrix with zeros.
b. It searches for sequence matches.
c. Assigns a score to each pair of amino acids
   - uses similarity scores
   - uses positive scores for related residues
   - uses negative scores for substitutions and gaps
d. Scores are summed for placement into Mi,j. If any sum result is below 0, a 0 is placed into Mi,j.
e. Backtracing begins at the maximum value found anywhere in the matrix.
f. Backtrace continues until it meets an Mi,j value of 0.


BLOSUM 45 Scoring Matrix


      H  E  A  G  A  W  G  H  E  E
   0  0  0  0  0  0  0  0  0  0  0
P  0  0  0  0  0  0  0  0  0  0  0
A  0  0  0  5  0  5  0  0  0  0  0
W  0  0  0  0  3  0 20 12  4  0  0
H  0 10  2  0  0  1 12 18 22 14  6
E  0  2 16  8  0  0  4 10 18 28 20
A  0  0  8 21 13  5  0  4 10 20 27
E  0  0  6 13 19 12  4  0  4 16 26

A W G H E
A W - H E
Score: 5 15 -8 10 6
Total score: 28
Percent similarity: 4/5 x 100 = 80%
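The total score and percent similarity for this local alignment can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `score_alignment` is hypothetical, and the dictionary holds just the four pair scores and the gap penalty of -8 that appear in the example, not a full scoring matrix.

```python
# Pair scores and gap penalty as read off the worked example.
PAIR_SCORES = {("A", "A"): 5, ("W", "W"): 15, ("H", "H"): 10, ("E", "E"): 6}
GAP_PENALTY = -8


def score_alignment(top, bottom):
    """Sum the column scores and compute the percent of identical columns."""
    total = 0
    identical = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += GAP_PENALTY
        else:
            total += PAIR_SCORES[(a, b)]
            identical += a == b
    percent = 100 * identical / len(top)
    return total, percent


print(score_alignment("AWGHE", "AW-HE"))  # (28, 80.0)
```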


How does one achieve the “perfect database search”?

Consider the following:
- Scoring Matrices (PAM vs. BLOSUM)
- Local alignment algorithm
- Database
- Search Parameters
  - Expect Value: change threshold for score reporting
  - Filtering: remove repeat sequences


Which Scoring Matrix to use?

PAM-1, BLOSUM-100: Small evolutionary distance. High identity within short sequences.

PAM-250, BLOSUM-20: Large evolutionary distance. Low identity within long sequences.


BLOSUM Scoring Matrices

Which BLOSUM Matrix to use?

BLOSUM   Identity (up to)
80       80%
62       62% (usually default value)
35       35%

If you are comparing sequences that are very similar, use BLOSUM 80. Sequences that are more divergent (dissimilar) than 20% are given very low scores in this matrix.


Logic behind PAM scoring matrix


Original amino acid

Replacement amino acid


A ----
R   30 ----
N  109   17 ----
D  154    0  532 ----
C   33   10    0    0 ----
Q   93  120   50   76    0 ----
E  266    0   94  831    0  422 ----
G  579   10  156  162   10   30  112 ----
H   21  103  226   43   10  243   23   10 ----
I   66   20   36   13   17    8   35    0    3 ----
L   95   17   37    0    0   75   15   17   40  253 ----
K   57  477  322   85    0  147  104   50   23   43   39 ----
M   29   17    0    0    0   20    7    7    0   57  207   90 ----
F   20    7    7    0    0    0    0   17   20   90  167    0   17 ----
P  345   67   27   10   10   93   40   49   50    7   43   43    4    7 ----
S  772  137  432   98  117   47   86  450   26   20   32  168   20   40  269 ----
T  590   20  169   57   10   37   31   50   14  129   52  200   28   10   73  696 ----
W    0   27    3    0    0    0    0    0    3    0   13    0    0   10    0   17    0 ----
Y   20    3   36    0   30    0   10    0   40   13   23   10    0  260    0   22   23    6 ----
V  365   20   13   17   33   27   37   97   30  661  303   17   77   10   50   43  186    0   17 ----
     A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V

Figure 4.2 Numbers of accepted point mutations (multiplied by 10). A total of 1572 exchanges are shown. Positions with red dashes are Mjj values. Modified from Dayhoff, 1978.


Relative mutability calculations

Aligned sequences:          A D A
                            A D B

Amino acids                 A     B   D
Changes                     1     1   0
Frequency of occurrences    3     1   2
Relative mutability         0.33  1   0

Figure 4.3 Simplified example to show how relative mutability is calculated.
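The calculation in Figure 4.3 (changes divided by frequency of occurrence) can be reproduced with a short Python sketch. The function name `relative_mutability` is hypothetical, and the sketch assumes two aligned sequences of equal length with no gaps.

```python
from collections import Counter


def relative_mutability(seq_a, seq_b):
    """Relative mutability = number of changes / number of occurrences."""
    occurrences = Counter(seq_a + seq_b)
    changes = Counter()
    for a, b in zip(seq_a, seq_b):
        if a != b:
            # Both amino acids in a mismatched column count as changed.
            changes[a] += 1
            changes[b] += 1
    return {aa: changes[aa] / occurrences[aa] for aa in sorted(occurrences)}


# A: 1/3 (about 0.33), B: 1/1, D: 0/2
print(relative_mutability("ADA", "ADB"))
```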


Development of the Mutation Probability Matrix.

$$M_{ij} = \frac{\lambda\, m_j\, A_{ij}}{\sum_i A_{ij}}$$

where
- $A_{ij}$ is an element of the accepted point mutation matrix (see Fig. 4.2)
- $\lambda$ is the proportionality constant (to be discussed below)
- $m_j$ is the relative mutability of the amino acids on the bottom row

Here is the second equation. This equation applies when the original amino acid and the replacement amino acid are the same. The diagonal elements (all in cells with the location $M_{jj}$) have the value:

$$M_{jj} = 1 - \lambda m_j$$
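The two equations for the off-diagonal and diagonal elements can be combined in a small Python sketch. Everything here is illustrative: the function name `mutation_probability_matrix` is hypothetical, the counts are only the A, R, N entries of Figure 4.2 (a 3-letter toy alphabet rather than all 20 amino acids), and the mutability values and the value of λ are assumed placeholders, not Dayhoff's numbers.

```python
def mutation_probability_matrix(A, m, lam):
    """Build M from accepted point mutation counts A, mutabilities m, and constant lam."""
    n = len(A)
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Sum of accepted point mutations in column j (diagonal excluded).
        column_total = sum(A[i][j] for i in range(n) if i != j)
        for i in range(n):
            if i == j:
                M[i][j] = 1.0 - lam * m[j]
            else:
                M[i][j] = lam * m[j] * A[i][j] / column_total
    return M


counts = [[0, 30, 109], [30, 0, 17], [109, 17, 0]]  # A, R, N entries from Figure 4.2
mutability = [1.0, 0.65, 1.34]                      # assumed values for illustration
lam = 0.01                                          # assumed proportionality constant
M = mutation_probability_matrix(counts, mutability, lam)
column_sums = [sum(M[i][j] for i in range(3)) for j in range(3)]
print(column_sums)
```

Because the off-diagonal entries of a column add up to λ·m_j and the diagonal is 1 − λ·m_j, every column sums to 1 regardless of the values chosen.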


Development of the Mutation Probability Matrix. (2)

Figure 4.4. Mutational Probability Matrix (partial). This only shows 5 of the 20 amino acids in the MPM. Numbers were multiplied by 10,000 to make them easier to read. The numbers in each column add up to 10,000. The replacement amino acids are in the top row and the original amino acids are in the left column. Mjj values shown are 9867, 9913, 9822, 9859 and 9973.


Table 4.1 Normalized frequencies of amino acids (from Dayhoff’s data)

Amino acid   Normalized frequency     Amino acid   Normalized frequency
G            0.089                    R            0.041
A            0.087                    N            0.040
L            0.085                    F            0.040
K            0.081                    Q            0.038
S            0.070                    I            0.037
V            0.065                    H            0.034
T            0.058                    C            0.033
P            0.051                    Y            0.030
E            0.050                    M            0.015
D            0.047                    W            0.010


What is the percent of amino acids that differ in the MPM?

$$100 \times \sum_i f_i M_{ii}$$

Where $f_i$ = normalized frequency of amino acid i
Where $M_{ii} = M_{jj}$ (the diagonal elements of the MPM)

This value totals 99, the percent of amino acids that are unchanged. There is therefore a 1% difference overall.
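One way to see how the proportionality constant λ relates to this 1% figure is to substitute the diagonal formula $M_{jj} = 1 - \lambda m_j$ into the sum. The following is a sketch that assumes λ is chosen so that exactly 99% of positions are unchanged and that the frequencies sum to 1:

```latex
\sum_j f_j M_{jj}
  = \sum_j f_j \left(1 - \lambda m_j\right)
  = 1 - \lambda \sum_j f_j m_j
  = 0.99
\quad\Longrightarrow\quad
\lambda = \frac{0.01}{\sum_j f_j m_j}
```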


Conversion of the PAM1 Mutational Probability Matrix to the PAM1 Scoring Matrix.

$$S_{ij} = 10 \log_{10}\left(\frac{M_{ij}}{f_i}\right) \qquad \text{(equation 4.1)}$$

Where $S_{ij}$ is the log-odds score for amino acid i replacing amino acid j. For a PAM1 scoring matrix let’s take Gly replacing Ala as an example. From the mutation probability matrix (Figure 4.4) we obtain a value of 0.0021 for $M_{ij}$. The $f_i$ of Gly is 0.089 (from Table 4.1). Thus, the $S_{ij}$ for this replacement is:

$$S_{ij} = 10 \log_{10}\left(\frac{0.0021}{0.089}\right) \approx -16$$
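The Gly-replacing-Ala arithmetic can be checked directly. This is a small sketch; the function name `log_odds_score` is hypothetical, and the two inputs are the values quoted in the text.

```python
import math


def log_odds_score(m_ij, f_i):
    """Equation 4.1: ten times the base-10 log of the odds ratio."""
    return 10 * math.log10(m_ij / f_i)


score = log_odds_score(0.0021, 0.089)
print(round(score))  # -16
```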


Conversion of the PAM1 Mutational Probability Matrix to other PAM scoring matrices.

Table 4.2 Correspondence between Observed Percent Amino Acid Differences and PAMs

PAM Mutational Probability Matrices¹    Observed percent amino acid differences²
1                                       1
5                                       5
11                                      10
17                                      15
30                                      25
56                                      40
80                                      50
112                                     60
159                                     70
246                                     80

¹ Mutation Probability Matrices generated by the equation (PAM1 MPM)^n, where n is the number listed in the first column.
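Raising the PAM1 matrix to the n-th power can be illustrated with a toy example. This sketch assumes a hypothetical two-letter alphabet with a made-up 1% mutation probability matrix and equal frequencies; it is not the real 20 x 20 PAM1 matrix, and the helper names are illustrative only.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(M, n):
    """Compute M raised to the n-th power by repeated multiplication."""
    size = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, M)
    return result


def observed_percent_difference(M, f):
    """100 x (1 - sum of f_i * M_ii): percent of positions that differ."""
    return 100 * (1 - sum(f[i] * M[i][i] for i in range(len(f))))


toy_pam1 = [[0.99, 0.01], [0.01, 0.99]]  # hypothetical 2-letter "PAM1"
toy_freqs = [0.5, 0.5]                   # hypothetical equal frequencies
for n in (1, 50, 250):
    percent = observed_percent_difference(mat_pow(toy_pam1, n), toy_freqs)
    print(n, round(percent, 1))
```

Even in this toy case the observed difference grows more slowly than n, because later mutations can hit positions that have already changed or change them back. That is the same pattern Table 4.2 shows for the real matrices.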
