Bioinformatics 2007-I
Prof. Mirko Zimic

Monday
- Pairwise sequence alignment.
- Local and global alignment.
- Scoring matrices
- Dynamic programming algorithms
- Dot plot

Wednesday
Pairwise sequence alignment: using the programs Clustal, Macaw, and online servers


“Nothing in biology makes sense except in the light of evolution”

T. Dobzhansky

“To align” = “To compare”

Finches of the Galápagos Islands observed by Charles Darwin on the voyage of HMS Beagle

Sequence alignment is similar to other types of comparative analysis

Involves scoring similarities and differences among a group of related entities

Homology

“Homology is the central concept for all of biology. Whenever we say that a mammalian hormone is the ‘same’ hormone as a fish hormone, that a human gene sequence is the ‘same’ as a sequence in a chimp or a mouse, that a HOX gene is the ‘same’ in a mouse, a fruit fly, a frog and a human - even when we argue that discoveries about a worm, a fruit fly, a frog, a mouse, or a chimp have relevance to the human condition - we have made a bold and direct statement about homology. The aggressive confidence of modern biomedical science implies that we know what we are talking about.”

David B. Wake

Similarity ≠ Homology

1) 25% similarity over ≥ 100 amino acids is likely homology

2) Homology is an evolutionary statement which means “descent from a common ancestor”
   - common 3D structure
   - usually common function
   - all or nothing: one cannot say “50% homologous”

COMPARATIVE ANALYSIS

Alignment algorithms model evolutionary processes

Derivation from a common ancestor through incremental change due to DNA replication errors, mutations, damage, or unequal crossing-over.

[Figure: the ancestral sequence GATTACCA diverges into descendant sequences through substitution (e.g. GATGACCA), insertion (e.g. GATTGATCA), and deletion (e.g. GATACCA, which has lost one T).]

Only extant sequences are known; ancestral sequences are postulated.

[Figure: the derivation tree with only the extant sequences GATCATCA, GATTGATCA, GATTACCA, and GATACCA retained at its tips.]

The term homology implies a common ancestry, which may be inferred from observations of sequence similarity.

Derivation from a common ancestor through incremental change. Mutations that do not kill the host may carry over to the population. Rarely are mutations kept or rejected by natural selection.

Sequence Alignments

• Why align?
  - Can delineate sequence elements that are functionally significant
  - Illuminates phylogenetic relationships

• Algorithms for sequence alignment
  - Dynamic programming
  - Dot-matrix
  - Word-based algorithms
  - Bayesian methods

What is Meant by Alignment?

Identical nucleotide sequences (trivial example)

ATTCGGCATTCAGTGCTAGA
ATTCGGCATTCAGTGCTAGA

Score = 20 (20 × 1)

Imperfect match

ATTCGGCATTCAGTGCTAGA
ATTCGGCATTGCTAGA

Score = 11

A better alignment

ATTCGGCATTCAGTGCTAGA
ATTCGGCATT----GCTAGA

Score = 14 = 10 + 6 + 4(-0.5), where 4(-0.5) is the gap penalty
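The three scores on this slide can be reproduced with a small sketch. It assumes +1 per match and -0.5 per gap position (both read off the slide) and 0 per mismatch (an assumption inferred from the score of 11). The function name `score_alignment` is hypothetical, and the unaligned 4-residue overhang in the ungapped case is simply left unscored.

```python
def score_alignment(top, bottom, match=1.0, mismatch=0.0, gap=-0.5):
    """Sum column scores of two equal-length aligned strings ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0.0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap        # one penalty per gap position
        elif x == y:
            total += match
        else:
            total += mismatch   # assumed to be 0 in this example
    return total


# Identical sequences: 20 matches
identical = score_alignment("ATTCGGCATTCAGTGCTAGA", "ATTCGGCATTCAGTGCTAGA")
# Ungapped comparison of the shorter sequence against the first 16 positions
ungapped = score_alignment("ATTCGGCATTCAGTGC", "ATTCGGCATTGCTAGA")
# Gapped alignment: 10 + 6 matches and 4 gap positions
gapped = score_alignment("ATTCGGCATTCAGTGCTAGA", "ATTCGGCATT----GCTAGA")
print(identical, ungapped, gapped)
```

Introducing the four-position gap costs 2 points but recovers 5 extra matches, which is why the gapped alignment scores higher than the ungapped one.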

Beware of aligning apples and oranges [and grapefruit]!

Paralogous versus orthologous;

genomic versus cDNA;

mature versus precursor.

Alignments can be performed on DNA sequences as well as on protein sequences…

Why Do We Want To Compare Sequences?

wheat  --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
?????  KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA

[Figure: a known wheat sequence (SwissProt) aligned to an unknown sequence (?????), with identical positions marked. Homology? If so, EXTRAPOLATE what is known about one sequence to the other.]

Why Does It Make Sense To Align Sequences?

- Evolution is our Real Tool.

- Nature is LAZY and Keeps re-using Stuff.

- Evolution is mostly DIVERGENT

Same Sequence → Same Ancestor

Why Does It Make Sense To Align Sequences?

Same Sequence

Same Function

Same 3D Fold

Same Origin

Comparing Is Reconstructing Evolution

An Alignment is a STORY

[Figure: the ancestral sequence ADKPKRPLSAYMLWLN is duplicated into two identical copies; Mutations + Selection then turn them into ADKPKRPKPRLSAYMLWLN and ADKPRRPLS-YMLWLN.]

An Alignment is a STORY

ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN

[Figure: the alignment of the two descendants of ADKPKRPLSAYMLWLN retells their history; its columns are labeled Mutation (K/R), Insertion (KPR), and Deletion (A).]

Evolution is NOT Always Divergent…

[Figure: two AFGP proteins, labeled N and S in the figure:]
- AFGP with (ThrAlaAla)n, similar to trypsinogen
- AFGP with (ThrAlaAla)n, NOT similar to trypsinogen

SIMILAR sequences BUT DIFFERENT origin

…But in MOST cases, you may assume it is.

How Do Sequences Evolve?

CONSTRAINED genome positions evolve SLOWLY

EVERY protein family has its own level of constraint

Family            Ks    Ka
Histone 3         6.4   0
Insulin           4.0   0.1
Interleukin I     4.6   1.4
Globin            5.1   0.6
Apolipoprot. AI   4.5   1.6
Interferon G      8.6   2.8

Rates in substitutions/site/billion years, as measured on mouse vs. human (80 million years). Ks: synonymous mutations; Ka: non-synonymous (non-neutral) mutations.

How Do Sequences Evolve? The Amino Acid Venn Diagram

To make things worse, every residue has its own personality

[Figure: Venn diagram grouping the amino acids into overlapping sets: aliphatic, aromatic, hydrophobic, polar, small. Residue labels in the figure: G, C, L, I, V, A, F, S, T, W, Y, Q, H, K, R, E, D, N, P.]

How Do Sequences Evolve?

In a structure, each amino acid plays a special role

[Figure: OmpR, C-terminal domain, with charged surface positions marked - and +.]

In the core, SIZE MATTERS

On the surface, CHARGE MATTERS

How Do Sequences Evolve?

Accepted mutations depend on the structure

In the core:
- Big -> Big
- Small -> Small
- NO DELETION

On the surface:
- Charged -> Charged
- Small <-> Big or Small
- DELETIONS


How Can We Compare Sequences ?

To Compare Two Sequences, We need:

Their Function

Their Structure

We Do Not Have Them !!!

How Can We Compare Sequences?

We will need to replace structural information with sequence information.

Same Sequence

Same Function

Same 3D Fold

Same Origin

It CANNOT Work ALL THE TIME !!!

How Can We Compare Sequences?

To compare sequences, we need to compare residues.

We need to know how much it COSTS to SUBSTITUTE
- an alanine into an isoleucine
- a tryptophan into a glycine
- …

The table that contains the costs for all the possible substitutions is called the SUBSTITUTION MATRIX.

How to derive that matrix?

How Can We Compare Sequences? Making a Substitution Matrix

- Take 100 nice pairs of protein sequences, easy to align (80% identical).

- Align them…

- Count each mutation in the alignments:
  - 25 tryptophans into phenylalanine
  - 30 isoleucines into leucine
  - …

- For each mutation, set the substitution score to the log-odds ratio:

$$\text{score} = \log\left(\frac{\text{Observed}}{\text{Expected by chance}}\right)$$
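A minimal sketch of the log-odds score, under stated assumptions: the logarithm is taken in base 2 (the slide does not give the base), and both arguments are frequencies the caller has already computed from the counted alignments. The function name `log_odds` and the numbers in the usage lines are hypothetical.

```python
import math


def log_odds(observed_freq, expected_freq):
    """Log-odds substitution score: log2(observed / expected by chance)."""
    if observed_freq <= 0 or expected_freq <= 0:
        raise ValueError("frequencies must be positive")
    return math.log2(observed_freq / expected_freq)


# Hypothetical frequencies: a substitution seen twice as often as chance
# predicts gets a positive score, one seen half as often a negative score.
print(log_odds(0.002, 0.001))
print(log_odds(0.0005, 0.001))
```

A positive score marks a substitution that is accepted more often than chance predicts, a negative score one that is selected against, and zero one that is indistinguishable from chance.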

How Can We Compare Sequences? Making a Substitution Matrix

The diagonal indicates how conserved a residue tends to be. W is VERY conserved.

Some residues are easier to mutate into other, similar residues.

Cysteines that make disulfide bridges and those that do not get averaged.

How Can We Compare Sequences? Using a Substitution Matrix

ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN

[Figure labels on the alignment: Mutation, Insertion, Deletion.]

Given two sequences and a substitution matrix, we must compute the CHEAPEST alignment.

Scoring an Alignment

Most popular substitution matrices
• PAM250
• BLOSUM62 (most widely used)

Raw score

TPEA
¦| |
APGA

Score = 1 + 6 + 0 + 2 = 9

• Question: Is it possible to get such a good alignment by chance only?
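The raw score on this slide is the sum of one substitution-matrix entry per aligned column. The sketch below holds only the four pair scores shown on the slide in a hypothetical partial table `SLIDE_SCORES`; a real run would load a complete matrix instead. The helper names are illustrative.

```python
# Partial table: only the four pair scores that appear on the slide.
SLIDE_SCORES = {("T", "A"): 1, ("P", "P"): 6, ("E", "G"): 0, ("A", "A"): 2}


def pair_score(a, b, table):
    """Look up a symmetric substitution score for residues a and b."""
    if (a, b) in table:
        return table[(a, b)]
    if (b, a) in table:
        return table[(b, a)]
    raise KeyError(f"no score for pair {a}/{b} in this partial table")


def raw_score(seq1, seq2, table):
    """Raw score of an ungapped alignment: sum of per-column scores."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b, table) for a, b in zip(seq1, seq2))


print(raw_score("TPEA", "APGA", SLIDE_SCORES))  # 1 + 6 + 0 + 2
```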

Insertions and Deletions

Gap Penalties

• Opening a gap is more expensive than extending it

Seq A  GARFIELDTHE----CAT
       |||||||||||    |||
Seq B  GARFIELDTHELASTCAT

[Figure labels: the first position of the gap carries the Gap Opening Penalty; each following position carries the Gap Extension Penalty.]
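A sketch of an affine gap cost, assuming the convention cost = GOP + GEP × length (some programs instead charge GEP × (length - 1)). The function name and the default penalty values are hypothetical, chosen only to show why one long gap is cheaper than several short ones.

```python
def affine_gap_cost(length, gop=10.0, gep=0.5):
    """Cost of a single gap of the given length under an affine penalty."""
    if length < 0:
        raise ValueError("gap length cannot be negative")
    if length == 0:
        return 0.0
    return gop + gep * length  # open once, then pay per position


one_long_gap = affine_gap_cost(4)        # the '----' gap in Seq A
four_short_gaps = 4 * affine_gap_cost(1)  # four separate one-position gaps
print(one_long_gap, four_short_gaps)
```

With these values the single four-position gap costs 12.0, while four isolated gaps cost 42.0, so the alignment prefers to group missing residues into one event.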

How Can We Compare Sequences? Limits of the Substitution Matrices

They ignore non-local interactions and assume that identical residues are equal

They assume evolution rate to be constant

[Figure: the ancestor ADKPKRPLSAYMLWLN, duplicated and subjected to Mutations + Selection, yielding ADKPKRPKPRLSAYMLWLN and ADKPRRPLS-YMLWLN.]

How Can We Compare Sequences? Limits of the Substitution Matrices

Substitution matrices cannot work!!!

I know… But at least, could I get some idea of when they are likely to do all right?

How Can We Compare Sequences? The Twilight Zone

[Figure: % sequence identity plotted against length, with axis marks at 100, 30%, and 30. Regions: similar sequence, similar structure (same 3D fold); the Twilight Zone around 30% identity; different sequence, structure ????]

How Can We Compare Sequences? The Twilight Zone

Substitution Matrices Work Reasonably Well on Sequences that have more than 30 % identity over more than 100 residues
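The rule of thumb above can be checked on an existing alignment with a short sketch. It assumes identity is computed over columns where neither sequence has a gap (one of several conventions in use), and that "more than" means strictly greater. Function names and thresholds as parameters are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues among gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def above_twilight_zone(aligned_a, aligned_b, min_identity=30.0, min_length=100):
    """True when identity exceeds 30% over more than 100 aligned residues."""
    aligned_length = sum(1 for x, y in zip(aligned_a, aligned_b)
                         if x != "-" and y != "-")
    return (aligned_length > min_length
            and percent_identity(aligned_a, aligned_b) > min_identity)


print(percent_identity("ACGT", "ACGA"))
print(above_twilight_zone("ACGT", "ACGT"))  # too short to trust
```

A short alignment fails the test even at 100% identity, which reflects the point of the slide: identity only becomes informative over a sufficient length.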

Major Differences between PAM and BLOSUM

PAM | BLOSUM
Built from global alignments | Built from local alignments
Built from a small amount of data | Built from a vast amount of data
Counting is based on minimum replacement or maximum parsimony | Counting is based on groups of related sequences counted as one
Performs better for finding global alignments and remote homologs | Better for finding local alignments
Higher PAM series means more divergence | Lower BLOSUM series means more divergence

How Can We Compare Sequences? Which Matrix Shall I Use?

• PAM: distant proteins, high index (PAM 350)
• BLOSUM: distant proteins, low index (BLOSUM30)

• GONNET 250 > BLOSUM62 > PAM 250.

• But this will depend on:
  • The family.
  • The program used and its tuning.

Choosing the right matrix may be tricky…

• Insertions, deletions?

HOW Can We Align Two Sequences?

- Dot Matrices
- Global Alignments
- Local Alignments

Global Alignments

- Take 2 nice protein sequences

- A good substitution matrix (BLOSUM)

- A gap opening penalty (GOP)

- A gap extension penalty (GEP)

[Figure: affine gap penalty. Gap cost plotted against gap length L; each gap pays the GOP once when it opens and the GEP as it extends.]

Parsimony: evolution takes the simplest path

(So we think…)


Global Alignments

- Take 2 nice protein sequences

- A good substitution matrix (BLOSUM)

- A gap opening penalty (GOP)

- A gap extension penalty (GEP)

- DYNAMIC PROGRAMMING

>Seq1
THEFATCAT
>Seq2
THEFASTCAT

After dynamic programming:

THEFA-TCAT
THEFASTCAT

Global Alignments

Brute Force Enumeration

Aligning FAT with FAST by trying every arrangement:

----FAT
FAST---

---FAT-
FAST---

--F-AT-
FAST---

…

Number of arrangements to enumerate for sequences of lengths L1 and L2:

$$\frac{(L_1 + L_2)!}{(L_1)!\,(L_2)!}$$

The alternative: DYNAMIC PROGRAMMING
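To see how quickly brute-force enumeration becomes impractical, the count given on the slide, (L1 + L2)! / (L1! × L2!), can be evaluated directly. This is a sketch; the function name is hypothetical, and `math.comb` (Python 3.8 or later) computes the same binomial coefficient without building the large factorials by hand.

```python
import math


def brute_force_count(l1, l2):
    """Evaluate (l1 + l2)! / (l1! * l2!) for two sequence lengths."""
    return math.comb(l1 + l2, l1)


print(brute_force_count(3, 4))      # FAT vs FAST
print(brute_force_count(100, 100))  # two sequences of 100 residues
```

For FAT against FAST the count is still small, but for two sequences of 100 residues it already exceeds 10^58, whereas dynamic programming only needs to fill a table with (L1 + 1) × (L2 + 1) cells.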

DYNAMIC PROGRAMMING

Dynamic Programming Example

Construct an optimal alignment of these two sequences:

GATACTA
GATTACCA

Using these scoring rules:

Match: +1
Mismatch: -1
Gap: -1

Arrange the sequence residues along a two-dimensional lattice (GATACTA along one side, GATTACCA along the other)

Vertices of the lattice fall between letters

The goal is to find the optimal path from one corner of the lattice ("from here") to the opposite corner ("to here")

Each path corresponds to a unique alignment

Which one is optimal?

The score for a path is the sum of its incremental edge scores:

- A aligned with A: Match = +1
- A aligned with T: Mismatch = -1
- T aligned with NULL, or NULL aligned with T: Gap = -1

Incrementally extend the path

Remember the best sub-path leading to each point on the lattice

[Figure sequence: the lattice for GATACTA against GATTACCA is filled in step by step, starting from 0 at the origin. Each vertex stores the best score of any path reaching it; the best complete path scores +4.]

Trace-back to get optimal path and alignment

Print out the alignment

GA-TACTA
GATTACCA
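The lattice-filling and trace-back steps of this example can be sketched as a compact global alignment routine in the style of Needleman and Wunsch. It uses the slide's scoring rules (match +1, mismatch -1, gap -1) with a linear gap penalty. The function name is hypothetical, and when several alignments tie for the best score, the tie-breaking order here (diagonal, then up, then left) is an arbitrary choice.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score of any path aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # a[i-1] against a gap
                              score[i][j - 1] + gap)      # b[j-1] against a gap
    # Trace back from the end point to the origin.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("GATACTA", "GATTACCA")
print(best)
print(top)
print(bottom)
```

The table holds only (n + 1) × (m + 1) scores because each vertex remembers just its best incoming sub-path, which is the observation that makes enumeration of every path unnecessary.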

Global Alignments: DYNAMIC PROGRAMMING

Dynamic Programming (Needleman and Wunsch)

Match = 1, Mismatch = -1, Gap = -1

[Figure: the score matrix for FAT against FAST, shown in three stages: the first row and column initialized with cumulative gap penalties (-1, -2, -3, -4), the remaining cells filled in, and the trace-back along the optimal path.]

Resulting alignment:

F A S T
F A - T


Local Alignments

GLOBAL Alignment

LOCAL Alignment

Smith And Waterman (SW)=LOCAL Alignment

Two different types of Alignment

Global Alignment methods:

Needleman & Wunsch (J. Mol. Biol. (1970) 48, 443-453): Problem of finding the best path. Revelation: any partial sub-path that ends at a point along the true optimal path must itself be the optimal path leading to that point. This provides a method to create a matrix of path “scores”, each entry being the score of the best path leading to that point. Trace the optimal path from one end to the other of the two sequences.

Local Alignment methods:

Smith & Waterman (J. Mol. Biol. (1981) 147, 195-197): Use Needleman & Wunsch, but report all non-overlapping paths, starting at the highest-scoring points in the path graph.

FASTP (Lipman & Pearson (1985), Science 227, 1435-1441)

BLAST (Altschul et al. (1990), J. Mol. Biol. 215, 403-410): don’t report all overlapping paths, but only attempt to find paths if there are words that are high-scoring. Speeds up the alignments considerably.

Global vs. Local Alignment

[Figure: two sequences sharing a high-scoring subsequence, shown once as a global alignment (with a gap) and once as a local alignment.]

Global alignment: best overall alignment, independent of whether local high-scoring sequences are included

Local alignment: alignments involving high-scoring sequences take precedence over global features

GLOBAL & LOCAL SIMILARITY

Implementations of dynamic programming for global and local similarities

Optimal global alignment

Needleman & Wunsch (1970)

Sequences align essentially from end to end

Optimal local alignment

Smith & Waterman (1981)

Sequences align only in small, isolated regions
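For contrast with the global case, here is a minimal sketch of local alignment in the style of Smith and Waterman. It assumes the same simple scoring as the earlier example (match +1, mismatch -1, gap -1, linear gaps) and reports only the single best local alignment. The function name and the test sequences are hypothetical.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Scores are floored at 0, so a path may restart anywhere.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the highest-scoring cell until a 0 is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Two unrelated flanks around a shared core: only the core is reported.
print(smith_waterman("XXXGATTACAYYY", "QQGATTACAZZ"))
```

The two differences from the global routine are the floor at zero and the trace-back starting from the best cell rather than the corner, which is what lets the alignment cover only a small, isolated region.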


Filtering low complexity sequences

• Filters out short repeats and low complexity regions from the query sequences before searching the database

• Filtering helps to obtain statistically significant results and reduce the background noise resulting from matches with repeats and low complexity regions

• The output shows which regions of the query sequence were masked
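As an illustration of the idea only, the sketch below masks windows whose Shannon entropy falls under a threshold. This is an assumption made for the example: it is not the filter used by any particular search program, and the window size, threshold, mask character, and function names are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, threshold=1.5, mask_char="X"):
    """Replace every residue inside a low-entropy window with mask_char."""
    masked = list(seq)
    for start in range(0, len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for k in range(start, start + window):
                masked[k] = mask_char
    return "".join(masked)


query = "MKVLAAGT" + "Q" * 15 + "DERWHNCP"
print(query)
print(mask_low_complexity(query))  # the poly-Q run is masked
```

Printing the query next to its masked version mirrors the last bullet above: the output shows which regions were hidden from the search. Windows that only partly overlap the repeat can also fall under the threshold, so a few flanking residues may be masked as well.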

Sequence Periodicities in Kinetoplast DNA

Marini et al. Proc. Natl. Acad. Sci. USA 79, 7664-7668 (1982)


Local Alignments

We now have a PairWise Comparison Algorithm,

We are ready to search Databases

Database Search

[Figure: a QUERY is passed to a comparison engine (SW) that scans the database and returns hits, each with a score and an E-value (values shown range from 1.10e-100 to 10).]

E-values: how many times do we expect such an alignment by chance?
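One common way to put a number on this expectation, assuming the Karlin-Altschul statistics used by BLAST-style searches (the slide itself does not give a formula), is shown below. The symbols are assumptions introduced here: $S$ is the raw alignment score, $m$ the query length, $n$ the total database length, and $K$ and $\lambda$ constants that depend on the scoring system.

```latex
E = K \, m \, n \, e^{-\lambda S}
```

Under this model the expected number of chance hits grows with the size of the search space $m \times n$ and shrinks exponentially with the score, so the same raw score is less convincing in a larger database.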
