
Global vs Local alignment - IIT Kanpur (home.iitk.ac.in/~rsankar/courses/lec03.pdf)


Dr. R. Sankar, BSE 633 (2020)

Global vs Local alignment

Global:

•Matched regions are long

•Cover most of the two aligning sequences

•Often require many gaps

•Negative mismatch scores and gap penalties are deliberately chosen to be small in comparison with the match score

•The score grows roughly in proportion to the length of the sequences

Local:

•Tend to be shorter and do not include many gaps

•Negative mismatch scores and gap penalties are chosen to balance the positive score of a match

•This prevents the alignment from extending into regions that do not match well
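The contrast can be seen in a minimal dynamic-programming sketch that returns either a Needleman-Wunsch-style global score or a Smith-Waterman-style local score. It assumes a simple identity scoring scheme (match +2, mismatch -1, linear gap -2) chosen only for illustration; it is not a substitution-matrix scorer.

```python
def align_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score with a linear gap penalty.

    local=False: Needleman-Wunsch-style global alignment (whole sequences).
    local=True:  Smith-Waterman-style local alignment (best-scoring segment).
    """
    n, m = len(a), len(b)
    # dp[i][j] = best score for a[:i] vs b[:j] (global), or for an
    # alignment ending at a[i-1], b[j-1] (local)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global: leading gaps are penalised along the first row/column
        for i in range(1, n + 1):
            dp[i][0] = i * gap
        for j in range(1, m + 1):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(dp[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                       dp[i - 1][j] + gap,    # gap in b
                       dp[i][j - 1] + gap)    # gap in a
            if local:
                cell = max(cell, 0)  # a local alignment may start afresh here
                best = max(best, cell)
            dp[i][j] = cell
    # Global score sits in the last cell; local score is the best cell anywhere
    return dp[n][m] if not local else best


print(align_score("XXACGTXX", "ACGT"))              # 0
print(align_score("XXACGTXX", "ACGT", local=True))  # 8
```

The global score is pulled down by the four unmatched X residues that must be covered by gaps, while the local score (floored at zero in every cell) reports only the well-matching ACGT segment.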


Limitations

Computationally expensive

Sequences that have two or more matching regions separated by intervening regions that do not match well (Smith-Waterman)


Margaret Dayhoff (1925-1983)


Amino acid substitution matrices

Certain amino acid substitutions commonly occur in related proteins from different species

These substituted amino acids are compatible with protein structure and function

Knowing which types of amino acid changes are most and least common in a large number of proteins can assist with predicting alignments for any set of protein sequences

Amino acid substitution matrices are used for such purposes


Seq #1:  Y   C   D   A
Seq #2:  F   M   E   G
Score:   3  -1   2   0

Total score = 3 - 1 + 2 + 0 = 4

Alignment with gaps

Each value represents an odds score

Odds scores are ratios of probabilities; the odds for the individual aligned pairs are multiplied to give an overall odds score

For convenience, odds scores are converted to log-odds scores so that the values can be summed
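A minimal sketch of scoring the ungapped example above by summing per-pair log-odds values. The dictionary holds only the four pairs from the example (they coincide with the corresponding BLOSUM62 entries). The conversion back to odds uses log base 2 purely as an illustrative assumption, since the slide does not state the base.

```python
import math

# Only the four pairs from the example; a real matrix has all 20 x 20 entries.
LOG_ODDS = {("Y", "F"): 3, ("C", "M"): -1, ("D", "E"): 2, ("A", "G"): 0}


def pair_score(x: str, y: str) -> int:
    """Look up a substitution score; the matrix is symmetric, so try both orders."""
    if (x, y) in LOG_ODDS:
        return LOG_ODDS[(x, y)]
    return LOG_ODDS[(y, x)]


def ungapped_score(seq1: str, seq2: str) -> int:
    """Sum the log-odds scores of an alignment without gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(pair_score(x, y) for x, y in zip(seq1, seq2))


total = ungapped_score("YCDA", "FMEG")
# Summing logs equals taking the log of the product of the odds.
# Assumption for illustration only: the scores are log2 odds.
odds_product = math.prod(2.0 ** pair_score(x, y) for x, y in zip("YCDA", "FMEG"))
print(total, math.log2(odds_product))  # 4 4.0
```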


Amino acid substitution matrix: An example


Amino acid substitution matrices

• Each matrix position is filled with a score

• Scores reflect how often one AA would have been paired with the other in an alignment of related protein sequences

• Probability (A → B) = Probability (B → A)

• Likelihood of replacement depends on:
1. The product of the frequencies of occurrence of the two AAs
2. Their chemical and physical properties


Scoring matrices

PAM (Percent Accepted Mutation) Matrices

BLOSUM (Blocks Amino Acid Substitution Matrices)


Percent Accepted Mutation (PAM) Matrices

Based on evolutionary principles; family of matrices

One matrix gives the AA changes expected in homologous proteins that have diverged only slightly from each other over a relatively short period of time (still 50% or more similar)

Another gives the AA changes expected in proteins that have diverged over a much longer period (< 20% similar)

Predicted changes are used to score the alignment and produce an optimal alignment


PAM Units in PAM matrices

PAM1: A PAM unit is the time period over which 1% of the amino acids in a sequence are expected to undergo accepted mutations, some of which may occur at the same position

Two sequences that have diverged by 100 PAM units are not necessarily different at every position: some positions mutate more than once, while others do not change at all
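A rough illustration under a deliberately simplified assumption that is not Dayhoff's model: at each PAM unit a residue changes with probability 1%, to one of the other 19 amino acids chosen uniformly. For this toy Markov model the expected fraction of identical positions after k PAM units is $\frac{1}{20} + \frac{19}{20}\left(1 - \frac{20}{19}\,0.01\right)^k$, which stays well above zero at 100 PAM.

```python
def fraction_identical(pam_units: int, q: float = 0.01, n_states: int = 20) -> float:
    """Expected fraction of positions still identical after `pam_units` PAM steps.

    Toy model (assumption): at each step a residue changes with probability q,
    to one of the other n_states - 1 residues chosen uniformly.
    """
    # Eigenvalue of the uniform substitution matrix on the non-stationary part
    lam = 1 - q * n_states / (n_states - 1)
    return 1 / n_states + (n_states - 1) / n_states * lam ** pam_units


for k in (1, 100, 250):
    print(k, round(fraction_identical(k), 3))
```

Under this toy model roughly 38% of positions are still identical at 100 PAM units, and identity only approaches the 5% chance level at very large distances.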


Percent Accepted Mutation (PAM) Matrices

Assumptions

Each change in the current AA at a particular site is assumed to be independent of previous mutational events at that site

AA substitutions are viewed as a Markov model: a series of changes of state in a system in which a change from one state to another does not depend on the previous history of the state

AA substitutions observed over short periods of evolutionary history can be extrapolated to longer distances


PAM Matrices (Dayhoff Matrices)

71 groups of protein sequences; at least 85% similar

1572 changes were observed in these closely related proteins

These changes do not significantly alter protein function

They are therefore “accepted” by natural selection, hence “accepted mutations”

Number of changes of each AA

Relative amount of change in each AA

Normalizes the data for variations in AA composition, mutation rate and sequence length


PAM Matrix Construction

• Construct 71 phylogenetic trees of protein families

• Observe the amino acid substitutions on each branch of the tree

• Also need the probability of occurrence of each amino acid ($p_a$)

Slide courtesy: Chris Bailey


PAM Matrix construction

Step 1: Construct a multiple sequence alignment

Step 2: Create a phylogenetic tree from the alignment


PAM Matrix Construction

• Using the substitution data, calculate $f_{ab}$, the observed frequency of the mutation $a \leftrightarrow b$

• Also note that $f_{ab} = f_{ba}$

• Using this information, calculate $f_a$, the total number of mutations in which $a$ is involved:

$$f_a = \sum_{b \neq a} f_{ab}$$
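Counting $f_{ab}$, $f_a$ and the total $f = \sum_a f_a$ is simple bookkeeping. The sketch below assumes a hypothetical input format, a list of (ancestor residue, descendant residue) pairs with one entry per substitution read off the tree branches, and stores each unordered pair once so that $f_{ab} = f_{ba}$ holds automatically.

```python
from collections import Counter


def substitution_counts(substitutions):
    """Return (f_ab, f_a, f) from a list of observed (a, b) substitutions.

    Hypothetical input: one (ancestor, descendant) residue pair per
    substitution observed on a branch of the phylogenetic tree.
    """
    f_ab = Counter()
    for a, b in substitutions:
        if a == b:
            continue  # unchanged residue, not a substitution
        # f_ab = f_ba: key each unordered pair in a fixed (sorted) order
        f_ab[tuple(sorted((a, b)))] += 1
    f_a = Counter()
    for (a, b), n in f_ab.items():
        # every a <-> b event counts toward both f_a and f_b
        f_a[a] += n
        f_a[b] += n
    f = sum(f_a.values())  # total f = sum over a of f_a (= 2 x substitutions)
    return f_ab, f_a, f


def pair_count(f_ab, a, b):
    """Look up f_ab regardless of the order in which a and b are given."""
    return f_ab[tuple(sorted((a, b)))]


# Hypothetical substitutions read off a tree (not data from the slides)
subs = [("A", "G"), ("G", "A"), ("A", "S"), ("L", "I")]
f_ab, f_a, f = substitution_counts(subs)
print(pair_count(f_ab, "G", "A"), f_a["A"], f)  # 2 3 8
```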


PAM Matrix construction

Step 3: For each amino acid type, count the frequency with which it is substituted by each other amino acid

A → G = G → A (substitutions are counted without regard to direction)

F(G,A) = 3


PAM Matrix Construction

• Also calculate $f$, the total number of occurrences of amino acid substitutions:

$$f = \sum_a f_a$$

• From here, calculate the relative mutability:

$$m_a = \frac{f_a}{100\, f\, p_a}$$


Step 4: Relative mutability of Ala

Number of mutations in which Ala is substituted ($f_{Ala}$) = 4

Total number of mutations ($f$) = 6 × 2 = 12 (each of the 6 substitutions is counted for both residues involved)

Relative frequency of Ala ($p_{Ala}$) = 10/63 = 0.159

Relative mutability = 4 / (12 × 0.159 × 100) ≈ 0.0209
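The Step 4 arithmetic can be checked with a direct implementation of $m_a = f_a / (100\, f\, p_a)$. With the unrounded $p_{Ala} = 10/63$ the result is exactly 0.021; the slide's 0.0209 reflects $p_{Ala}$ rounded to 0.159 before dividing.

```python
def relative_mutability(f_a: float, f: float, p_a: float) -> float:
    """Relative mutability m_a = f_a / (100 * f * p_a)."""
    return f_a / (100 * f * p_a)


# Ala example: f_Ala = 4 substitutions involving Ala,
# f = 6 substitutions x 2 = 12, p_Ala = 10/63
m_exact = relative_mutability(4, 12, 10 / 63)
m_rounded_p = relative_mutability(4, 12, 0.159)
print(m_exact, m_rounded_p)
```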


Relative mutability of residue a

• How readily residue a changes compared with other residues

• Relative mutability: the probability that a given amino acid will change in the evolutionary period of interest


PAM Matrix Construction

• 20 × 20 matrix where $M_{ab}$ is the probability of amino acid a changing into amino acid b

• $M_{aa} = 1 - m_a$

• $M_{ab}$ is more complicated and requires conditional probability, e.g. $P(A \text{ and } B) = P(A) \cdot P(B \mid A)$


PAM Matrix Construction

• In this case:

$$M_{ab} = P(a \to b \mid a \text{ changed}) \cdot P(a \text{ changed})$$

• Or:

$$M_{ab} = \frac{m_a f_{ab}}{f_a}$$


Step 5: Mutation probability for each pair of amino acids

Relative mutability of Ala = 0.0209

Frequency of Ala ↔ Gly = 3

Frequency of all amino acid pairs in which Ala is substituted = 4

Mutation probability of Ala → Gly:

$M_{A,G} = (0.0209 \times 3) / 4 \approx 0.0157$
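Step 5 together with the diagonal rule $M_{aa} = 1 - m_a$ fills one row of the mutation probability matrix. In this sketch the partner of Ala's fourth substitution is not named on the slide, so it appears under the hypothetical placeholder key "other". Because $\sum_{b \neq a} f_{ab} = f_a$, the off-diagonal entries add up to $m_a$ and each full row sums to 1.

```python
def pam_row(a, m_a, f_ab_row):
    """One row of the mutation probability matrix M for residue a.

    Off-diagonal: M_ab = m_a * f_ab / f_a, with f_a = sum of f_ab over b != a.
    Diagonal:     M_aa = 1 - m_a.
    """
    f_a = sum(f_ab_row.values())
    row = {b: m_a * n / f_a for b, n in f_ab_row.items()}
    row[a] = 1 - m_a
    return row


# Ala example: m_Ala = 0.0209, Ala <-> Gly observed 3 times, 4 Ala substitutions in total.
# "other" is a hypothetical placeholder for the unnamed fourth partner.
ala_row = pam_row("A", 0.0209, {"G": 3, "other": 1})
print(ala_row)
print(sum(ala_row.values()))  # 1.0 up to floating-point rounding
```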


PAM Matrix Construction

• The final scores in a PAM matrix are expressed as lod (logarithm of odds) scores

• Compare the probability of mutation with the probability of random occurrence

• This gives the odds ratio:

$$\frac{M_{ab}}{p_b}$$

• The scoring matrix S is calculated by:

$$S_{ab} = 10 \log_{10} \frac{M_{ab}}{p_b}$$
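The lod formula $S_{ab} = 10 \log_{10}(M_{ab} / p_b)$ in code, with hypothetical inputs chosen only to show how the sign is read: zero means the substitution is seen as often as chance predicts, positive means more often, negative means less often.

```python
import math


def lod_score(m_ab: float, p_b: float) -> float:
    """PAM log-odds score S_ab = 10 * log10(M_ab / p_b)."""
    return 10 * math.log10(m_ab / p_b)


# Hypothetical values, only to illustrate the interpretation:
print(lod_score(0.05, 0.05))   # as often as chance -> 0
print(lod_score(0.10, 0.05))   # twice as often as chance -> about +3
print(lod_score(0.025, 0.05))  # half as often as chance -> about -3
```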


PAM Matrix Construction

• These equations allow us to calculate a PAM1 matrix

• The number after PAM is the number of amino acid substitutions per 100 residues:
 – PAM40: 40 substitutions per 100 residues
 – PAM250: 250 substitutions per 100 residues

• All other PAM matrices are calculated by repeated multiplication of the PAM1 matrix by itself: $\mathrm{PAM}_n = (\mathrm{PAM}_1)^n$
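Extrapolating PAM1 to larger distances is a Markov-chain calculation, $\mathrm{PAM}_n = (\mathrm{PAM}_1)^n$. The sketch uses a hypothetical 3-residue alphabet instead of the real 20 × 20 matrix so the numbers stay readable; each row gives the probabilities of residue a changing into residue b (row = original, column = replacement, following the $M_{ab}$ convention).

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product of two square matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pam_n(pam1: Matrix, n: int) -> Matrix:
    """PAM_n = PAM1 multiplied by itself n times (starting from the identity)."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, pam1)
    return result


# Hypothetical 3-residue "PAM1": 1% change per step, each row sums to 1
PAM1 = [[0.990, 0.006, 0.004],
        [0.005, 0.990, 0.005],
        [0.004, 0.006, 0.990]]

for n in (1, 100, 250):
    print(n, [round(pam_n(PAM1, n)[i][i], 3) for i in range(3)])  # diagonal shrinks
pam250 = pam_n(PAM1, 250)
```

The diagonal (probability of no net change) falls from 0.99 towards the long-run residue frequencies rather than to zero, which is why distant matrices such as PAM250 still carry usable signal.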


PAM Matrix construction

1. For each amino acid type, count the frequency with which it is substituted by each other amino acid

2. Relative mutability, $m_i$, of each amino acid

3. Mutation probability $M_{ij}$

4. Divide $M_{ij}$ by the frequency of occurrence, $f_i$, of residue i

5. Take the log of these values

6. Fill the matrix $R_{ij}$ (off-diagonal entries)

7. Diagonal entries: $M_{jj} = 1 - m_j$

8. Follow steps 4 to 6 to fill the diagonal entries

PAM1 evolutionary distance

Top row shows the original amino acid; left column shows the replacement amino acid


(Adapted from Figure 82. Atlas of Protein Sequence and Structure, Suppl 3, 1978, M.O. Dayhoff, ed. National Biomedical Research Foundation, 1979.)

Mutation probability matrix for the evolutionary distance of 1 PAM (i.e., one Accepted Point Mutation per 100 amino acids). An element of this matrix, $M_{i,j}$, gives the probability that the amino acid in column j will be replaced by the amino acid in row i after a given evolutionary interval, in this case 1 PAM. Thus, there is a 0.56% probability that Asp will be replaced by Glu. To simplify the appearance, the elements are shown multiplied by 10,000.


PAM250 evolutionary distance


(Adapted from Figure 83. Atlas of Protein Sequence and Structure, Suppl 3, 1978, M.O. Dayhoff, ed. National Biomedical Research Foundation, 1979.)

Mutation probability matrix for the evolutionary distance of 250 PAMs. To simplify the appearance, the elements are shown multiplied by 100.

In comparing two sequences of average amino acid frequency at this evolutionary distance, there is a 13% probability that a position containing Ala in the first sequence will contain Ala in the second.

There is a 3% chance that it will contain Arg, and so forth. The relationship of two sequences at a distance of 250 PAMs can be demonstrated by statistical methods.


http://biomath.geneseo.edu/symposium/talks/darling.pdf

http://www.cse.ucsd.edu/classes/sp04/cse182/slides/L6.pdf


Protein A: amino acid sequence $a_1 a_2 a_3 a_4 \ldots$ (at time t)

Assume that 1% of all amino acids in protein A have undergone substitutions by time t + t′

New amino acid sequence at t + t′: $b_1 b_2 b_3 b_4 \ldots$ (call this sequence protein A′)

PAM1: the probability that a residue of type i in protein A will be replaced by j in protein A′


The relative mutability of amino acids (in decreasing order)

Asn 134, Ser 120, Asp 106, Glu 102, Ala 100, Thr 97, Ile 96, Met 94, Gln 93, Val 74, His 66, Arg 65, Lys 56, Pro 56, Gly 49, Tyr 41, Phe 41, Leu 40, Cys 20, Trp 18

Note that alanine is normalized to a value of 100. Trp and Cys are the least mutable; Asn and Ser are the most mutable.


Normalized frequencies of amino acids: variations in frequency of occurrence

Gly 8.9%, Ala 8.7%, Leu 8.5%, Lys 8.1%, Ser 7.0%, Val 6.5%, Thr 5.8%, Pro 5.1%, Glu 5.0%, Asp 4.7%, Arg 4.1%, Asn 4.0%, Phe 4.0%, Gln 3.8%, Ile 3.7%, His 3.4%, Cys 3.3%, Tyr 3.0%, Met 1.5%, Trp 1.0%

6 codons: Leu, Ser, Arg; 1 codon: Met, Trp

Note: each would be 5% if all 20 amino acids were equally distributed


Comparing two proteins with a PAM1 matrix gives completely different results than PAM250!

Consider two distantly related proteins. A PAM40 matrix is not forgiving of mismatches, and penalizes them severely. Using this matrix you can find almost no match.

A PAM250 matrix is very tolerant of mismatches.

hsrbp,  136 CRLLNLDGTC
btlact,   3 CLLLALALTC
            * ** *  **

24.7% identity in 81 residues overlap; Score: 77.0; Gap frequency: 3.7%

rbp4     26 RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWDV
btlact   21 QTMKGLDIQKVAGTWYSLAMAASD-ISLLDAQSAPLRVYVEELKPTPEGDLEILLQKWEN
                  *     ****  *        * *   *   *               **  *

rbp4     86 --CADMVGTFTDTEDPAKFKM
btlact   80 GECAQKKIIAEKTKIPAVFKI
              **        *  ** **


PAM 250 matrix


Amino acid frequencies

AA   1978    1991
L    0.085   0.091
A    0.087   0.077
G    0.089   0.074
S    0.070   0.069
V    0.065   0.066
E    0.050   0.062
T    0.058   0.059
K    0.081   0.059
I    0.037   0.053
D    0.047   0.052
R    0.041   0.051
P    0.051   0.051
N    0.040   0.043
Q    0.038   0.041
F    0.040   0.040
Y    0.030   0.032
M    0.015   0.024
H    0.034   0.023
C    0.033   0.020
W    0.010   0.014

Source: bioinfo.mbb.yale.edu/course/classes/c10/AAS-matrices3.html


Relative mutabilities of amino acids:

AA   1978   1991
A    100    100
C    20     44
D    106    86
E    102    77
F    41     51
G    49     50
H    66     91
I    96     103
K    56     72
L    40     54
M    94     93
N    134    104
P    56     58
Q    93     84
R    65     83
S    120    117
T    97     107
V    74     98
W    18     25
Y    41     50

Source: http://bioinfo.mbb.yale.edu/course/classes/c10/AAS-matrices3.html


Implicit assumptions in PAM matrix calculations

-The probability of a mutation is independent of the history of the sequence

-The probability of a mutation is independent of the position within the sequence

-The probability of a mutation at any position is independent of the rest of the sequence

-All positions within the sequence mutate at the same rate

-Long-term substitution patterns can be extrapolated from short-term ones

-AA distributions in the set of protein families used to make the scoring matrix are representative of all the families that are likely to be encountered


Sources of Error in PAM


Exercise

Go to the UniProt database (www.uniprot.org)

Extract the following hemoglobin sequences by accession ID:

(a) P69905, (b) P68871, (c) O04985

Use the pairwise alignment tools Needle (global) and Water (local), available at http://www.ebi.ac.uk/Tools/psa/, with all default parameters.

•Align sequences (a) and (b) using both Needle and Water

•What do you observe in the pairwise alignments?

•What is your conclusion?

•Align sequences (a) and (c) using both Needle and Water

•What do you observe in the pairwise alignments?

•What is your conclusion?

Repeat the above exercise with different PAM matrices (PAM50, PAM100, PAM250)

Compare the output you get with lower and higher PAM matrices