
Evolution and Scoring Rules

Example: Score = 5 x (# matches) + (-4) x (# mismatches) + (-7) x (total length of all gaps)

Example: Score = 5 x (# matches) + (-4) x (# mismatches) + (-5) x (# gap openings) + (-2) x (total length of all gaps)
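The two example rules can be compared on a concrete alignment. The sketch below is illustrative only: the function names and the convention of writing a gap as '-' are assumptions, not part of the slides.

```python
def alignment_stats(row1, row2):
    """Count matches, mismatches, gap openings and total gap length.

    row1 and row2 are the two rows of an alignment, with '-' marking a gap.
    Assumes equal length and no column in which both rows contain '-'.
    """
    matches = mismatches = gap_opens = gap_len = 0
    prev_gap_row = None  # which row held a gap in the previous column
    for x, y in zip(row1, row2):
        if x == "-" or y == "-":
            cur_gap_row = 1 if x == "-" else 2
            gap_len += 1
            if cur_gap_row != prev_gap_row:
                gap_opens += 1  # a gap starts (or switches rows) here
            prev_gap_row = cur_gap_row
        else:
            prev_gap_row = None
            if x == y:
                matches += 1
            else:
                mismatches += 1
    return matches, mismatches, gap_opens, gap_len


def linear_score(row1, row2):
    # First example rule: every gap position costs -7.
    m, mm, _, glen = alignment_stats(row1, row2)
    return 5 * m + (-4) * mm + (-7) * glen


def affine_score(row1, row2):
    # Second example rule: -5 per gap opening plus -2 per gap position.
    m, mm, opens, glen = alignment_stats(row1, row2)
    return 5 * m + (-4) * mm + (-5) * opens + (-2) * glen
```

For the alignment ACGTTA / AC--TG (3 matches, 1 mismatch, one gap of length 2), the first rule charges 14 for the gap and the second charges 9, so the second rule is more tolerant of a single long gap.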


Scoring Matrices


Scoring Rules vs. Scoring Matrices

Nucleotide vs. Amino Acid Sequence

•The choice of a scoring rule can strongly influence the outcome of sequence analysis

•Scoring matrices implicitly represent a particular theory of evolution

•Elements of the matrices specify the similarity of one residue to another


Replication: DNA -> DNA

Transcription: DNA (A T G C) -> RNA (A U G C), 1:1

Translation: RNA (A U G C) -> Protein (20 amino acids), 3:1

Translation - Protein Synthesis: Every 3 nucleotides (one codon) are translated into one amino acid
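The 3:1 relationship can be illustrated with a short sketch. The codon table below is deliberately partial (five codons of the standard genetic code, chosen for illustration), and the function name is hypothetical.

```python
# Partial codon table (standard genetic code); '*' marks a stop codon.
# Only a handful of the 64 codons are listed, purely for illustration.
CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GCU": "A",  # alanine
    "UAA": "*",  # stop
}


def translate(rna):
    """Read an RNA string 5' -> 3' three nucleotides at a time."""
    protein = []
    for k in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE.get(rna[k:k + 3], "?")  # '?' = not in this table
        if amino_acid == "*":
            break  # stop codon ends translation
        protein.append(amino_acid)
    return "".join(protein)
```

Twelve nucleotides such as AUGUUUGGCUAA form four codons; the last is a stop codon, so three amino acids are produced.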


Nucleotide sequence determines the amino acid sequence


Translation - Protein Synthesis

RNA 5’ -> 3’ : Protein N-term -> C-term


Log Likelihoods used as Scoring Matrices:

PAM - % Accepted Mutations: 1500 changes in 71 groups w/ > 85% similarity

BLOSUM – Blocks Substitution Matrix: 2000 “blocks” from 500 families


Log Likelihoods used as Scoring Matrices:

BLOSUM

$$S_{ij} = 2\log_2\frac{p_{ij}}{p_i\,p_j}$$


Likelihood Ratio for Aligning a Single Pair of Residues

$$S_{ij} = \log\frac{p_{ij}}{p_i\,p_j} = \log\frac{\Pr(i \text{ aligned with } j \mid \text{common ancestry})}{\Pr(i \text{ aligned with } j \mid \text{by chance})}$$

•Above (numerator): the probability that two residues are aligned by evolutionary descent

•Below (denominator): the probability that they are aligned by chance

•$p_i$, $p_j$ are the frequencies of residues i and j in all protein sequences (abundance)


Likelihood Ratio of Aligning Two Sequences

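$$
\begin{aligned}
\text{log lik ratio of alignment}
&= \log\frac{\Pr(\text{alignment} \mid \text{common ancestry})}{\Pr(\text{alignment} \mid \text{by chance})} \\
&= \sum_{i,j}\log\frac{\Pr(i \text{ aligned with } j \mid \text{common ancestry})}{\Pr(i \text{ aligned with } j \mid \text{by chance})} \\
&= \sum_{i,j}\log\frac{p_{ij}}{p_i\,p_j} = \sum_{i,j} S_{ij}
\end{aligned}
$$

The sums run over the aligned residue pairs (i, j), treating aligned positions as independent.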


The alignment score of aligning two sequences is the log likelihood ratio of the alignment under two models:

•Common ancestry

•By chance


PAM and BLOSUM matrices are all log likelihood matrices

More specifically: an alignment that scores 6 means that the alignment by common ancestry is 2^(6/2) = 8 times as likely as expected by chance.

$$S_{ij} = 2\log_2\frac{p_{ij}}{p_i\,p_j}$$
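The conversion between a score and a likelihood ratio follows directly from the formula. A minimal sketch, with hypothetical function names, assuming scores on the 2 x log2 scale used for BLOSUM above:

```python
from math import log2


def half_bit_score(p_ij, p_i, p_j):
    """S_ij = 2 * log2(p_ij / (p_i * p_j)), left unrounded."""
    return 2 * log2(p_ij / (p_i * p_j))


def odds_from_score(score):
    """Invert the formula: likelihood ratio = 2 ** (score / 2)."""
    return 2 ** (score / 2)
```

With this scaling, each additional 2 points doubles the likelihood ratio, and a score of 0 means the pairing is exactly as likely as expected by chance.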


BLOSUM matrices for Protein

S. Henikoff and J. Henikoff (1992). “Amino acid substitution matrices from protein blocks”. PNAS 89: 10915-10919

Training Data: ~2000 conserved blocks from BLOCKS database. Ungapped, aligned protein segments. Each block represents a conserved region of a protein family


Constructing BLOSUM Matrices of Specific Similarities

Sets of sequences have widely varying similarity. Sequences whose similarity exceeds a threshold are clustered.

If the clustering threshold is 62%, the final matrix is BLOSUM62


A toy example of constructing a BLOSUM matrix from 4 training sequences


Constructing a BLOSUM matrix

1. Counting mutations

2. Tallying mutation frequencies

3. Matrix of mutation probabilities

4. Calculate abundance of each residue (marginal probability)

5. Obtaining a BLOSUM matrix
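The five steps can be sketched in a few lines. This is a simplified illustration, not the procedure from the original figures: the training sequences are supplied by the caller, no clustering is applied, and pairs are counted in both orders so that the pair probabilities form a symmetric matrix summing to 1. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def toy_blosum(seqs):
    """Build a toy log-odds matrix from equal-length, ungapped sequences."""
    # Steps 1-2: count residue pairs column by column, in both orders.
    pair_counts = Counter()
    for column in zip(*seqs):
        for a, b in combinations(column, 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1

    # Step 3: pair probabilities p_ij (they sum to 1 over all ordered pairs).
    total = sum(pair_counts.values())
    p_pair = {pair: n / total for pair, n in pair_counts.items()}

    # Step 4: marginal probability p_i of each residue.
    p_res = Counter()
    for (a, _), p in p_pair.items():
        p_res[a] += p

    # Step 5: S_ij = 2 * log2(p_ij / (p_i * p_j)), rounded to an integer.
    # Pairs never observed get no entry, since log2(0) is undefined.
    scores = {
        (a, b): round(2 * log2(p / (p_res[a] * p_res[b])))
        for (a, b), p in p_pair.items()
    }
    return p_pair, dict(p_res), scores
```

Called on four single-residue sequences A, A, S, S, the one column yields one A-A pair, one S-S pair and four A-S pairs, so A-S is more frequent than the abundances alone predict and receives a positive score.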


Constructing the real BLOSUM62 Matrix


Steps 1, 2, 3: Mutation Frequency Table

$$P_{ij} \times 1000$$


4. Calculate Amino Acid Abundance

$p_i$: the marginal likelihood of each amino acid


5. Obtaining BLOSUM62 Matrix

$$S_{ij} = 2\log_2\frac{p_{ij}}{p_i\,p_j}$$


PAM Matrices (Point Accepted Mutations)

Mutations accepted by natural selection


PAM Matrices: Accepted Point Mutation

Atlas of Protein Sequence and Structure, Suppl 3, 1978, M.O. Dayhoff, ed. National Biomedical Research Foundation, 1

Based on evolutionary principles


Constructing PAM Matrix: Training Data


PAM: Phylogenetic Tree


PAM: Accepted Point Mutation


Mutability


Total Mutation Rate

is the total mutation rate of all amino acids


Normalize Total Mutation Rate


Mutation Probability Matrix, normalized such that the Total Mutation Rate is 1%
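One way to carry out such a normalization is sketched below. It is not the formula from the original slide, which is not reproduced here. The assumptions are that M[i][j] is the probability that residue i is replaced by residue j over some period (each row sums to 1), that freqs[i] is the abundance of residue i, and that the total mutation rate is the abundance-weighted probability of any change. The function name is hypothetical.

```python
def normalize_to_pam1(M, freqs, target=0.01):
    """Rescale off-diagonal entries so the total mutation rate equals target.

    M is row-stochastic: M[i][j] = Pr(i -> j). freqs[i] is the abundance of i.
    """
    n = len(M)
    # Total mutation rate: probability that a randomly chosen residue changes.
    rate = sum(freqs[i] * (1.0 - M[i][i]) for i in range(n))
    scale = target / rate

    normalized = []
    for i in range(n):
        row = [scale * M[i][j] if j != i else 0.0 for j in range(n)]
        row[i] = 1.0 - sum(row)  # diagonal absorbs the remaining probability
        normalized.append(row)
    return normalized
```

Scaling every off-diagonal entry by the same factor keeps the relative preferences between substitutions unchanged while fixing the overall amount of change at 1 mutation per 100 residues.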


Mutation Probability Matrix (transposed) M*10000


-- PAM1 mutation probability matrix: $M^{(1)}$

-- PAM2 mutation probability matrix: $M^{(2)}$ = ?

-- Mutations that happen in twice the evolutionary period of that for a PAM1


PAM Matrix: Assumptions


In two PAM1 periods: {A→R} = {A→A and A→R} or {A→N and N→R} or {A→D and D→R} or … or {A→V and V→R}


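Entries in a PAM-2 Mut. Prob. Matr.

$$
\begin{aligned}
\Pr(A \to R \text{ in 2 periods})
&= \Pr(A \to A \text{ in 1st period})\,\Pr(A \to R \text{ in 2nd period}) \\
&\quad + \Pr(A \to N \text{ in 1st period})\,\Pr(N \to R \text{ in 2nd period}) \\
&\quad + \Pr(A \to D \text{ in 1st period})\,\Pr(D \to R \text{ in 2nd period}) + \dots
\end{aligned}
$$

$$P^{(2)}_{AR} = P_{AA}P_{AR} + P_{AN}P_{NR} + P_{AD}P_{DR} + \dots$$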


PAM-k Mutation Prob. Matrix

$$M^{(2)} = M^{(1)} M^{(1)}$$

$$M^{(K)} = \{M^{(1)}\}^{K}$$
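Repeated matrix multiplication can be sketched in plain Python. The 2 x 2 matrix in the usage note is a made-up stand-in for the real 20 x 20 PAM1 matrix, and the function names are hypothetical.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def pam_k(M1, k):
    """Return M1 raised to the k-th power (k >= 0)."""
    n = len(M1)
    # Start from the identity matrix: zero periods means no change.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, M1)
    return result
```

For example, with M1 = [[0.99, 0.01], [0.02, 0.98]], the off-diagonal entry of pam_k(M1, 2) in the first row is 0.99 x 0.01 + 0.01 x 0.98, which is the sum over intermediate residues described for the PAM-2 entries.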


PAM-1 log likelihood matrix

$$S_{ij} = 10\log_{10}\frac{P^{(1)}_{ij}}{p_i\,p_j}$$


PAM-k log likelihood matrix

$$S^{(k)}_{ij} = 10\log_{10}\frac{P^{(k)}_{ij}}{p_i\,p_j}$$
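A sketch of turning a PAM-k mutation probability matrix into log-odds scores follows. It rests on an interpretation that should be checked against the original slides: the mutation probability matrix is taken to hold conditional probabilities Mk[i][j] = Pr(i -> j), so the joint probability in the numerator is computed as freqs[i] * Mk[i][j]. The function name is hypothetical.

```python
from math import log10


def pam_log_odds(Mk, freqs):
    """Return rounded scores 10 * log10(P_ij / (p_i * p_j)).

    Mk[i][j] is assumed to be Pr(i -> j) after k PAM periods, and
    freqs[i] the abundance p_i. All entries of Mk must be positive.
    """
    n = len(Mk)
    scores = []
    for i in range(n):
        row = []
        for j in range(n):
            joint = freqs[i] * Mk[i][j]  # assumed joint probability P_ij
            row.append(round(10 * log10(joint / (freqs[i] * freqs[j]))))
        scores.append(row)
    return scores
```

When a residue is replaced by j exactly as often as the abundance of j predicts, the ratio is 1 and the score is 0; conserved pairs score above 0 and rare substitutions below.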


PAM-250


PAM60: 60%, PAM80: 50%, PAM120: 40% similarity

The PAM-250 matrix provides a better-scoring alignment than lower-numbered PAM matrices for proteins of 14-27% similarity


Sources of Error in PAM


Comparing Scoring Matrices

PAM

•Based on extrapolation of a small evolutionary period

•Track evolutionary origins

•Homologous sequences during evolution

BLOSUM

•Based on a range of evolutionary periods

•Conserved blocks

•Find conserved domains


Choice of Scoring Matrix


Global Alignment with Affine Gaps

Complex Dynamic Programming


Problem w/ Independent Gap Penalties

•The occurrence of x consecutive deletions/insertions is more likely than the occurrence of x isolated mutations

•We should penalize a gap of length x less than x times the penalty for one gap


Affine Gap Penalty

•w2 is the penalty for each gap

•w1 is the _extra_ penalty for the 1st gap
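Read together with the second example score at the start of the deck, these two definitions suggest the following penalty for a single gap of length x. This is a sketch under that assumption, with w1 = -5 and w2 = -2 borrowed from that example:

```latex
w(x) = w_1 + w_2\,x, \qquad w(3) = -5 + (-2)(3) = -11
```

For comparison, the length-only rule with -7 per gap position would charge (-7)(3) = -21 for the same gap, while three separate gaps of length 1 would cost 3 x (-5 - 2) = -21 under the affine rule.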


Scoring Rule not Additive!

•We need to know if the current gap is a new gap or the continuation of an existing gap

•Use three Dynamic Programming matrices to keep track of the previous step


S1 is the vertical sequence; S2 is the horizontal sequence

•(From Diagonal) a(i,j): current position is a match

•(From Left) b(i,j): current position is a gap in S1

•(From Above) c(i,j): current position is a gap in S2

Filling the next element in each matrix depends on the previous step, which is stored in the three matrices.


Last step:

•a match

•a gap in S2

•a gap in S1

Current step is a gap in S2:

•new gap in S2

•a continued gap in S2

•a gap in S2 following a gap in S1
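The three-matrix scheme can be sketched as follows. The recurrences are a standard affine-gap formulation written to match the a, b, c naming above, not a transcription of the original slide. The default parameters (match 5, mismatch -4, w1 = -5, w2 = -2) are taken from the second example score, and the function name is hypothetical.

```python
NEG_INF = float("-inf")


def affine_global_score(s1, s2, match=5, mismatch=-4, w1=-5, w2=-2):
    """Best global alignment score with gap penalty w1 + w2 * length."""
    n, m = len(s1), len(s2)  # s1 indexes rows (vertical), s2 columns
    a = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends in a (mis)match
    b = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends in a gap in s1
    c = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends in a gap in s2
    a[0][0] = 0
    for j in range(1, m + 1):
        b[0][j] = w1 + w2 * j  # leading gap in s1 of length j
    for i in range(1, n + 1):
        c[i][0] = w1 + w2 * i  # leading gap in s2 of length i

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            # From diagonal: the previous step may be in any of the three states.
            a[i][j] = sub + max(a[i - 1][j - 1], b[i - 1][j - 1], c[i - 1][j - 1])
            # From left: open a new gap in s1 (w1 + w2) or extend one (w2).
            b[i][j] = max(a[i][j - 1] + w1 + w2,
                          b[i][j - 1] + w2,
                          c[i][j - 1] + w1 + w2)
            # From above: open a new gap in s2 (w1 + w2) or extend one (w2).
            c[i][j] = max(a[i - 1][j] + w1 + w2,
                          c[i - 1][j] + w2,
                          b[i - 1][j] + w1 + w2)
    return max(a[n][m], b[n][m], c[n][m])
```

Aligning AAAA against AA shows the effect of the affine rule: one gap of length 2 costs -5 - 4 = -9, whereas two separate gaps of length 1 would cost -14, so the dynamic program prefers the single longer gap. The sketch returns only the score; recovering the alignment itself would additionally require a traceback through the three matrices.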


Decisions in Seq. Alignment

•Local or global alignment?

•Which program to use

•Type of scoring matrix

•Value of gap penalty


Aij*10


PAM-k log-likelihood matrix
