Alignment methods
April 21, 2009

Quiz 1-April 23 (JAM lectures through today)
Writing assignment topic due Tues, April 23
Hand in homework #3
Why has HbS stayed in the population?

Learning objectives: Understand the difference between global alignment and local alignment. Understand the Needleman-Wunsch algorithm. Understand the Smith-Waterman algorithm in global alignment mode.

Workshop: Perform alignment of two nucleotide sequences
Homework #4 due Tues, April 23



Evolutionary Basis of Sequence Alignment

Why are there regions of identity when comparing protein sequences?

1) Conserved function: the amino acid residues participate in the reaction.

2) Structural (for example, conserved cysteine residues that form a disulfide linkage).

3) Historical: residues that are conserved solely due to a common ancestral gene.


Identity Matrix

Simplest type of scoring matrix

     L  I  C  A
L    1  0  0  0
I       1  0  0
C          1  0
A             1
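As a minimal sketch (in Python, with the hypothetical helper name `identity_matrix`), an identity matrix for any alphabet can be built by scoring 1 on the diagonal and 0 everywhere else. The four-letter alphabet L, I, C, A is the one used on the slide.

```python
def identity_matrix(alphabet):
    """Return a nested dict scoring 1 for identical residues and 0 otherwise."""
    return {a: {b: 1 if a == b else 0 for b in alphabet} for a in alphabet}


matrix = identity_matrix("LICA")

# Print the full (symmetric) matrix; the slide shows only the upper triangle.
print("  " + " ".join("LICA"))
for a in "LICA":
    print(a, " ".join(str(matrix[a][b]) for b in "LICA"))
```

The nested dict is symmetric, so `matrix["L"]["I"]` and `matrix["I"]["L"]` are both 0.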


Similarity

It is easy to score if an amino acid is identical to another (the score is 1 if identical and 0 if not). However, it is not easy to give a score for amino acids that are somewhat similar.

[Figure: structures of leucine and isoleucine]

Should they get a 0 (non-identical) or a 1 (identical) or something in between?


One is mouse trypsin and the other is crayfish trypsin. They are homologous proteins. The sequences share 41% identity.
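A percent identity figure like this one can be computed from an existing alignment. The sketch below is in Python and assumes one common convention (identical columns divided by total alignment columns); other conventions divide by the shorter sequence length instead. The helper name `percent_identity` and the toy sequences are hypothetical, not the trypsin data.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both rows.

    Assumption: the denominator is the full alignment length, gaps included.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * identical / len(aligned_a)


# Toy example: 4 identical columns out of 6.
print(round(percent_identity("ACDE-G", "ACNEFG"), 1))
```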


Evolutionary Basis of Sequence Alignment (Cont. 2)

Note: it is possible for two proteins to share a high degree of similarity but have two different functions. For example, human gamma-crystallin is a lens protein that has no known enzymatic activity. It shares a high percentage of identity with E. coli quinone oxidoreductase. These proteins likely had a common ancestor but their functions diverged.

This is analogous to a railroad car and a diner: both have the same form but different functions.


Global Alignment Method

For example, the two hypothetical sequences

    abcdefghajklm
    abbdhijk

could be aligned like this

    abcdefghajklm
    || |   | ||
    abbd...hijk

As shown, there are 6 matches, 2 mismatches, and one gap of length 3.


Global Alignment Method Scored

The alignment is scored according to a payoff matrix

$payoff = {match => $match, mismatch => $mismatch, gap_open => $gap_open, gap_extend => $gap_extend};

For the algorithm to operate correctly, the match score must be positive and the other payoff entries (mismatch, gap_open, gap_extend) must be negative.


Global Alignment Method (cont. 3)

Example

Given the payoff matrix $payoff = {match => 4, mismatch => -3, gap_open => -2, gap_extend => -1};


Global Alignment Method (cont. 4)

The sequences

    abcdefghajklm
    abbdhijk

are aligned and scored like this

              a  b  c  d  e  f  g  h  a  j  k  l  m
              |  |     |           |     |  |
              a  b  b  d  .  .  .  h  i  j  k
match         4  4     4           4     4  4
mismatch            -3                -3
gap_open               -2
gap_extend             -1 -1 -1

for a total score of 24 - 6 - 2 - 3 = 13.
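The arithmetic above can be checked with a short Python sketch. The dict mirrors the slide's Perl-style payoff hash; the function name `score_alignment` is hypothetical. Two assumptions follow the slide's bookkeeping: a gap of length n costs gap_open plus n times gap_extend, and the unaligned trailing letters (l, m) are not scored.

```python
payoff = {"match": 4, "mismatch": -3, "gap_open": -2, "gap_extend": -1}


def score_alignment(top, bottom, payoff, gap="."):
    """Score two already-aligned, equal-length strings column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    in_gap = False
    for x, y in zip(top, bottom):
        if x == gap or y == gap:
            if not in_gap:
                # First column of a gap run pays the opening penalty once.
                total += payoff["gap_open"]
                in_gap = True
            # Every gap column, including the first, pays the extension penalty.
            total += payoff["gap_extend"]
        else:
            in_gap = False
            total += payoff["match"] if x == y else payoff["mismatch"]
    return total


# Only the aligned columns from the slide are passed in (trailing "lm" omitted).
print(score_alignment("abcdefghajk", "abbd...hijk", payoff))  # 13
```

For simplicity, adjacent gap columns are treated as one run even if the gap switches from one sequence to the other.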


Global Alignment Method (cont. 5)

The algorithm should guarantee that no other alignment of these two sequences has a higher score under this payoff matrix.


Let’s align the following with a simple payoff matrix: ABCNJRQCLCRPM and AJCJNRCKCRBP, where

match = 1
mismatch = 0
gap = 0
gap extension = 0

Alignment A

Sequence 1: ABCNJ-RQCLCR-PM
Sequence 2: AJC-JNR-CKCRBP-
Score:      101010101011010
Total Score: 8

Alignment B

Sequence 1: ABC-NJRQCLCR-PM
Sequence 2: AJCJN-R-CKCRBP-
Score:      101010101011010
Total Score: 8


Three steps in Dynamic Programming

1. Initialization

2. Matrix fill or scoring

3. Traceback and alignment


Initialization step


Matrix Fill (bottom two rows)


Matrix Fill (bottom three rows)


Matrix Fill (entire matrix)

Sequence 1: ABCNJ-RQCLCR-PM
Sequence 2: AJC-JNR-CKCRBP-
Score:      101010101011010
Total Score: 8

Sequence 1: ABC-NJRQCLCR-PM
Sequence 2: AJCJN-R-CKCRBP-
Score:      101010101011010
Total Score: 8


Smith-Waterman algorithm

M_{i,j} = MAXIMUM [
    M_{i-1,j-1} + s_{i,j}   (match or mismatch in the diagonal),
    M_{i,j-1} + w           (gap in sequence #1),
    M_{i-1,j} + w           (gap in sequence #2),
    0 ]

Where M_{i-1,j-1} is the value in the cell diagonally juxtaposed to M_{i,j}.

(The i-1, j-1 cell is up and to the left of M_{i,j}.)

Where s_{i,j} is the value for the match or mismatch in the M_{i,j} cell.

Where M_{i,j-1} is the value in the cell above M_{i,j}.

Where w is the value for the gap penalty.

Where M_{i-1,j} is the value in the cell to the left of M_{i,j}.
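The Smith-Waterman recurrence above can be sketched in Python as a matrix fill. This is an illustration under stated assumptions, not the lecture's implementation: the function name `smith_waterman_fill` is hypothetical, the scores (match = 1, mismatch = -1, gap w = -1) are chosen for the example, and sequence 1 runs along the columns while sequence 2 runs along the rows.

```python
def smith_waterman_fill(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Fill a local-alignment matrix; return (matrix, best local score)."""
    cols, rows = len(seq1) + 1, len(seq2) + 1
    # First row and first column stay 0.
    M = [[0] * cols for _ in range(rows)]
    best = 0
    for j in range(1, rows):          # j walks down the rows (sequence 2)
        for i in range(1, cols):      # i walks across the columns (sequence 1)
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            M[j][i] = max(
                M[j - 1][i - 1] + s,  # diagonal: match or mismatch
                M[j - 1][i] + gap,    # cell above: gap in sequence 1
                M[j][i - 1] + gap,    # cell to the left: gap in sequence 2
                0,                    # never go negative: restart the local alignment
            )
            best = max(best, M[j][i])
    return M, best


M, best = smith_waterman_fill("xxACGTyy", "zzACGTww")
print(best)  # 4, from the shared local region ACGT
```

The extra 0 term is what makes the alignment local: a poorly scoring prefix is discarded rather than carried forward.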


Initialization step: Create a matrix with M + 1 columns and N + 1 rows, where M = number of letters in sequence 1 and N = number of letters in sequence 2. The first column (M-1) and first row (N-1) will be filled with 0’s.
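A minimal Python sketch of this initialization step, using the two example sequences from the earlier slide. The helper name `init_matrix` is hypothetical, and the matrix is stored as a list of rows.

```python
def init_matrix(seq1, seq2):
    """Create a matrix of zeros with len(seq1) + 1 columns and len(seq2) + 1 rows."""
    cols = len(seq1) + 1
    rows = len(seq2) + 1
    return [[0] * cols for _ in range(rows)]


M = init_matrix("ABCNJRQCLCRPM", "AJCJNRCKCRBP")
print(len(M), len(M[0]))  # 13 rows, 14 columns
```

Filling the first row and column with 0's matches the simple payoff used here (gap = 0). With a nonzero gap penalty, a global alignment would normally fill them with cumulative gap penalties instead.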


Matrix fill step: Each position M_{i,j} is defined to be the MAXIMUM score at position i,j:

M_{i,j} = MAXIMUM [
    M_{i-1,j-1} + s_{i,j}   (match or mismatch in the diagonal),
    M_{i,j-1} + w           (gap in sequence #1),
    M_{i-1,j} + w           (gap in sequence #2) ]
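The matrix fill step can be sketched in Python as follows, assuming the simple payoff from the earlier slide (match = 1, mismatch = 0, gap = 0). The function name `needleman_wunsch_fill` is hypothetical. Sequence 1 runs along the columns and sequence 2 along the rows.

```python
def needleman_wunsch_fill(seq1, seq2, match=1, mismatch=0, gap=0):
    """Fill a global-alignment matrix and return it."""
    cols, rows = len(seq1) + 1, len(seq2) + 1
    M = [[0] * cols for _ in range(rows)]
    # With gap = 0 the first row and column stay 0, as on the slide;
    # a nonzero gap penalty accumulates along the edges instead.
    for i in range(1, cols):
        M[0][i] = i * gap
    for j in range(1, rows):
        M[j][0] = j * gap
    for j in range(1, rows):
        for i in range(1, cols):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            M[j][i] = max(
                M[j - 1][i - 1] + s,  # diagonal: match or mismatch
                M[j - 1][i] + gap,    # cell above: gap in sequence 1
                M[j][i - 1] + gap,    # cell to the left: gap in sequence 2
            )
    return M


M = needleman_wunsch_fill("ABCNJRQCLCRPM", "AJCJNRCKCRBP")
print(M[-1][-1])  # 8, the total score of the best global alignment
```

The last cell of the matrix holds the optimal score, which agrees with the total score of 8 shown for the example alignments.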


Sequence 1: ABCNJ-RQCLCR-PM
Sequence 2: AJC-JNR-CKCRBP-
Score: 8
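The traceback step that recovers an alignment like this one can be sketched in Python as below. The function name `global_align` is hypothetical and the payoff is the simple one from the example (match = 1, mismatch = 0, gap = 0). Several alignments tie at the best score, so the one returned depends on the tie-breaking order (diagonal first here) and need not be identical to the alignment shown on the slide; only the total score of 8 is guaranteed.

```python
def global_align(seq1, seq2, match=1, mismatch=0, gap=0):
    """Return (aligned seq1, aligned seq2, score) for one optimal global alignment."""
    cols, rows = len(seq1) + 1, len(seq2) + 1
    M = [[0] * cols for _ in range(rows)]
    for i in range(1, cols):
        M[0][i] = i * gap
    for j in range(1, rows):
        M[j][0] = j * gap
    for j in range(1, rows):
        for i in range(1, cols):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            M[j][i] = max(M[j - 1][i - 1] + s, M[j - 1][i] + gap, M[j][i - 1] + gap)

    # Traceback: walk from the last cell back to the first, retracing
    # which neighbor produced each cell's value.
    out1, out2 = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            if M[j][i] == M[j - 1][i - 1] + s:
                out1.append(seq1[i - 1])
                out2.append(seq2[j - 1])
                i, j = i - 1, j - 1
                continue
        if j > 0 and M[j][i] == M[j - 1][i] + gap:
            out1.append("-")          # gap in sequence 1
            out2.append(seq2[j - 1])
            j -= 1
        else:
            out1.append(seq1[i - 1])
            out2.append("-")          # gap in sequence 2
            i -= 1
    return "".join(reversed(out1)), "".join(reversed(out2)), M[-1][-1]


a1, a2, score = global_align("ABCNJRQCLCRPM", "AJCJNRCKCRBP")
print(a1)
print(a2)
print("Score:", score)
```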