Multiple Sequence Alignment

Lesson 5

2

Example

VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWWSNG--
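The example alignment can be scanned for fully conserved columns with a few lines of Python. This is only an illustrative sketch: the function name `conserved_columns` is an assumption, and "conserved" here means that every sequence carries the same non-gap residue in that column.

```python
# The eight aligned sequences of the example slide (27 columns each).
ALIGNMENT = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWWSNG--",
]


def conserved_columns(alignment):
    """Return (1-based position, residue) for every column in which
    all sequences share the same non-gap residue."""
    result = []
    for pos, column in enumerate(zip(*alignment), start=1):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            result.append((pos, column[0]))
    return result


print(conserved_columns(ALIGNMENT))  # [(5, 'C'), (21, 'W')]
```

Only the cysteine and tryptophan columns are identical in all eight sequences.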

3

Why multiple sequence alignment?
• Structural similarity - amino acids that play the same role in each structure are in the same column
• Evolutionary similarity - amino acids derived from the same ancestral residue are in the same column
• Functional similarity - amino acids with the same function are in the same column
• Sequence similarity - the alignment with maximal similarity; it does not necessarily have biological meaning
• When the sequences are closely related, structural, evolutionary and functional similarity are equivalent

4

Why multiple alignment - example
• Histones: small, abundant proteins, present in all eukaryotic chromosomes
• They show a remarkably conserved multiple sequence alignment
• Conservation of structure and function (they aid in DNA packaging)

5

MSA applications
• Generate protein families
• Extrapolation - assign an uncharacterized sequence to a protein family
• Understand evolution - a preliminary step in molecular evolution analysis, used for constructing phylogenetic trees (e.g. is the duck evolutionarily closer to a lion or to a fruit fly?)
• Pattern identification - find the important (conserved) regions in the protein; conserved positions may characterize a function
• Domain identification - build a consensus/profile/motif that describes the protein family and helps to identify new members of the family
• DNA regulatory elements
• Structure prediction (secondary structure and 3D model)

6

Multiple sequence alignment

The optimal pairwise alignment of two sequences might be very different from the way the same two sequences are aligned in the multiple alignment

7

Can we simply generalize the dynamic programming method? In theory yes, in practice no.
• The running time and the memory both grow as N^K, where N is the sequence length and K is the number of sequences
• For K > 3 this is practically infeasible

The computational problem: given a set of sequences, find their alignment to a common frame (by inserting gaps) such that a distance function attains its optimal value.

Unaligned sequences:
SCGPFIRV
MSCGPGLRA
SCTPHLA
MSCPKIRG
MSLPLLRN
MSHKPALRA

Aligned sequences:
-SCGPFIRV
MSCGPGLRA
-SCTPHL-A
MSC-PKIRG
MS-LPLLRN
MSHKPALRA
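To get a feeling for the N^K growth, the size of the full dynamic-programming table can be computed directly. The sketch below assumes the straightforward generalization of pairwise dynamic programming, with one table axis of N + 1 entries per sequence; the function names are illustrative.

```python
def dp_cells(n, k):
    """Number of cells in the full dynamic-programming table
    for k sequences of length n (one axis of n + 1 entries per sequence)."""
    return (n + 1) ** k


def predecessors(k):
    """Number of neighbouring cells each cell is computed from:
    every sequence either advances or takes a gap, excluding the all-gap move."""
    return 2 ** k - 1


for k in (2, 3, 4, 6):
    print(f"K={k}: {dp_cells(100, k):,} cells, {predecessors(k)} predecessors per cell")
```

For sequences of length 100, two sequences need about ten thousand cells, three need about a million, and four already need about a hundred million.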

8

Since the computational problem is hard, we know that any method we propose will not guarantee an optimal solution. We look for a method that gives good results in most cases. The method we choose may depend on the reason for which we perform the alignment.

The problems we have to answer
1. Choosing the sequences
2. Scoring metrics
3. Approximation/heuristic algorithms
4. MSA formats
5. Interpreting the MSA

9

Scoring metrics
• Distance from consensus - The consensus of an alignment is the string made of the most common character in each column of the alignment. The total distance is the number of characters that differ from the consensus character of their column: if C is the consensus sequence, the total distance is $\sum_i D(S_i, C)$
• Evolutionary tree alignment - The weight of the lightest evolutionary tree that can be constructed from the sequences, where the weight of a tree is the number of changes between pairs of sequences that correspond to two adjacent nodes in the tree, summed over all such pairs
• Sum of pairs - The sum of pairwise distances between all pairs of sequences: $\sum_{i<j} D(S_i, S_j)$

10

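Scoring metrics - examples

Sum of pairs

-SCGPFIRV
MSCGPGLRA
-SCTPHL-A

Each pair is scored here by its number of identical residues:

Pair                      Score
-SCGPFIRV / MSCGPGLRA     5
-SCGPFIRV / -SCTPHL-A     3
MSCGPGLRA / -SCTPHL-A     5
Total                     13

Distance from consensus

-SCGPFIRV
MSCGPGLRA
-SCTPHL-A
MSC-PKIRG
MS-LPLLRN
MSHKPALRA

Homogeneity is the count of the most common character in a column; distance is the number of sequences that differ from it.

Column        1  2  3  4  5  6  7  8  9   Total
Homogeneity   4  6  4  2  6  1  4  5  3   35
Distance      2  0  2  4  0  5  2  1  3   19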
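The two worked examples on this slide can be reproduced with a short script. This is a sketch under two assumptions read off the slide's numbers: each pair is scored by its count of identical residues (gap-gap columns are not counted), and a gap counts as an ordinary character when finding the most common character of a column. The function names are illustrative.

```python
from collections import Counter
from itertools import combinations

THREE = ["-SCGPFIRV", "MSCGPGLRA", "-SCTPHL-A"]
SIX = THREE + ["MSC-PKIRG", "MS-LPLLRN", "MSHKPALRA"]


def identical_positions(a, b):
    """Count columns where two aligned sequences carry the same residue."""
    return sum(1 for x, y in zip(a, b) if x == y and x != "-")


def sum_of_pairs(alignment):
    """Sum the pairwise scores over all pairs of sequences."""
    return sum(identical_positions(a, b) for a, b in combinations(alignment, 2))


def column_homogeneity(alignment):
    """Per column, the count of its most common character."""
    return [Counter(col).most_common(1)[0][1] for col in zip(*alignment)]


def distance_from_consensus(alignment):
    """Number of characters that differ from the consensus of their column."""
    return sum(len(alignment) - h for h in column_homogeneity(alignment))


print(sum_of_pairs(THREE))           # 13
print(column_homogeneity(SIX))       # [4, 6, 4, 2, 6, 1, 4, 5, 3]
print(distance_from_consensus(SIX))  # 19
```

Column 6 has six different residues, so any of them can serve as the consensus character there; the homogeneity and distance totals do not depend on that choice.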

11

MSA algorithms

• Progressive methods (CLUSTALW, T-Coffee)
• Iterative methods (Dialign)
• Direct optimization (Monte Carlo, genetic algorithms)
• Local methods: eMotifs, Blocks, PSI-BLAST

12

CLUSTALW algorithm

• Compare all sequence pairs (pairwise alignment)
• Generate a hierarchy for the alignment (guide tree)
• Build the multiple alignment step by step according to the guide tree: first align the most similar pair, then add another sequence or another pairwise alignment, and so on
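The first two steps (all-against-all comparison, then a hierarchy) can be sketched as follows. This is a simplified stand-in, not CLUSTALW's actual procedure: it assumes plain edit distance as the pairwise measure and average-linkage clustering to build the tree, and all names are illustrative.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance computed by dynamic programming, row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # gap in b
                           cur[j - 1] + 1,           # gap in a
                           prev[j - 1] + (x != y)))  # match or substitution
        prev = cur
    return prev[-1]


def guide_tree(seqs):
    """seqs: dict mapping name -> sequence.
    Returns nested tuples giving the order in which to merge."""
    dist = {frozenset(p): edit_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in seqs]

    def average_distance(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1[1] for b in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        # Merge the two closest clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


toy = {"a": "AAAA", "b": "AAAT", "c": "GGGG", "d": "GGGC"}
print(guide_tree(toy))  # (('a', 'b'), ('c', 'd'))
```

The nested tuples give the alignment order: the most similar pairs are aligned first, and the resulting alignments are then aligned to each other.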

13

CLUSTALW algorithm

(1) Pairwise alignment (to prepare a guide tree): 6 pairwise alignments, then cluster analysis

(2) Multiple alignment following the tree from (1): successive alignments

14

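CLUSTALW algorithm

1. Build a table of the edit distance between every two sequences
2. Build a tree by joining similar sequences, in the order of how well they match
   (tree leaves: Hbb Human, Hbb Horse, Hba Human, Hba Horse, Myg Whale)
3. Build the alignment in the order dictated by the tree

There are three cases:
- aligning a pair of sequences
- aligning two alignments
- adding a single sequence to an existing alignment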

15

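Two alignments (or a sequence and an alignment) can be aligned by a fairly natural extension of the dynamic programming method. Insertions and deletions of letters are handled in the usual way. The cost of matching letters is the average cost over the different combinations. A special cost must be given to a gap.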
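The "average cost over the different combinations" can be illustrated for a single pair of alignment columns. The cost values below (0 for a match, 1 for a mismatch, 2 for a residue against a gap) are hypothetical placeholders, not CLUSTALW's parameters, and the function names are illustrative.

```python
GAP = "-"


def pair_cost(a, b, mismatch=1.0, gap=2.0):
    """Cost of placing characters a and b in the same column
    (hypothetical costs: 0 match, `mismatch` substitution, `gap` residue vs gap)."""
    if a == GAP and b == GAP:
        return 0.0
    if a == GAP or b == GAP:
        return gap
    return 0.0 if a == b else mismatch


def column_cost(col1, col2, **costs):
    """Average cost over all combinations of one character from each column."""
    total = sum(pair_cost(a, b, **costs) for a in col1 for b in col2)
    return total / (len(col1) * len(col2))


# Column "AA" from one alignment against column "AT" from another:
# A-A, A-T, A-A, A-T  ->  (0 + 1 + 0 + 1) / 4
print(column_cost("AA", "AT"))  # 0.5
```

A dynamic-programming alignment of two alignments would use `column_cost` wherever the pairwise method uses the cost of matching two letters.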

16

CLUSTALW algorithm

Fine points
• Different weights are given to the sequences, so that if there are several very similar sequences, the relative weight of each one is reduced
• A special method is used to set the cost of inserting gaps

Disadvantages
• Because the construction is irreversible, the result depends very strongly on the quality of the alignment of the first pairs
• Gaps do not disappear: "once a gap, always a gap"
• High sensitivity to the gap penalty

Advantages
• Fast
• Reasonable
• Many parameters can be set
• Everyone uses it

17

CLUSTALW algorithm

• We can deduce a pairwise alignment for every two sequences in the multiple alignment (the projected pairwise alignment)
• The projected pairwise alignment is NOT necessarily the best pairwise alignment for the two sequences
• CLUSTALW is not an optimal algorithm: better alignments might exist. The algorithm yields a possible alignment, but not necessarily the best one

Best pairwise alignment (optimal)

Projected pairwise alignment
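Projecting a pairwise alignment out of a multiple alignment amounts to taking the two rows and dropping the columns in which both have a gap. A minimal sketch (the function name is illustrative):

```python
def project_pair(msa, i, j):
    """Extract rows i and j of an MSA and remove columns where both rows have a gap."""
    kept = [(a, b) for a, b in zip(msa[i], msa[j]) if not (a == "-" and b == "-")]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


msa = ["-SCGPFIRV", "MSCGPGLRA", "-SCTPHL-A",
       "MSC-PKIRG", "MS-LPLLRN", "MSHKPALRA"]

print(project_pair(msa, 0, 2))  # ('SCGPFIRV', 'SCTPHL-A')
```

The projected alignment can then be scored and compared with the score of an optimal pairwise alignment of the same two sequences.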

18

ClustalW at EMBL: http://www.ebi.ac.uk/clustalw
ClustalW at the SRS site at EBI

19

ClustalW output: Aln format

20

MSA editing: Jalview

Conservation

www.es.embnet.org/Services/MolBio/jalview/index.html

21

MSA formats - fasta

22

MSA formats - Aln

23

MSA formats - MSF

24

Example 1a: a good MSA

25

Example 1b: making an MSA of distantly related proteins

26

Example 1c: including more distant relatives in the MSA

27

Example 2: Isopenicillin N Synthase

• Mononuclear iron proteins - electron carrier proteins. Iron atoms are bound to amino acid side chains
• In IPNS the metal ion is coordinated by three protein residues
• IPNS is involved in the biosynthesis of penicillin

Figure: reaction scheme ACV -> isopenicillin N (Fe+2, ascorbate, O2, 2H2O) and the iron centre with ligands His212, Asp214, His268, S-ACV, O2 (NO) and H2O

28

Research IPNS

• Goal: identify the Fe+2-binding residues
• Possible solutions:
  1. In the lab
  2. Bioinformatic approach (comparing different IPNS sequences)

29

Step 1

Multiple alignment of known IPNS

Implementation

1. Obtain the sequences (e.g. from NCBI), using the query: IPNS AND Bacteria[Organism]
2. MSA (clustalw), then search for conserved residues in the MSA

30

31

32

MSA - bacteria only

Not enough variation

33

MSA - bacteria & fungi

Not enough variation

bacteria & fungi

bacteria

34

Step 2

Goal: add more enzymes similar to IPNS

Implementation
• Search at http://www.expasy.org/tools/blast
• BLAST with IPNS_CEPAC as the query
• Select sequences that are similar to the query over its entire length
• Export in FASTA format
• Run CLUSTALW

35

• New multiple alignment: narrowing down the possibilities

36

Simple multiple alignment

• The known IPNS sequences are very similar
• The sequences of closely related enzymes are also quite similar
• Not enough variability to characterize the active sites
• We need to obtain even more distant sequences (distant homologs)

37

Step 3

Using the results of the MSA for further searches

Implementation

1. Obtain an MSA (clustalw)
2. Either construct a consensus sequence and perform a new search, or construct a profile and perform a new search
3. MSA (clustalw), then search for conserved residues in the MSA

38

Consensus Sequence

• We can deduce a consensus sequence from the multiple sequence alignment: each position of the consensus holds the most frequent character found in that column of the alignment

A T C T T G T
A A C T T G T
A A C T T C T

Consensus:
A A C T T G T
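Deriving the consensus from the three aligned sequences on this slide takes one pass over the columns. A minimal sketch (the function name is illustrative; ties are broken by whichever character appears first in the column, which is an arbitrary choice):

```python
from collections import Counter


def consensus(alignment):
    """Most frequent character of each column, joined into one string."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))


print(consensus(["ATCTTGT", "AACTTGT", "AACTTCT"]))  # AACTTGT
```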

39

Profile
• We can deduce a statistical model describing the multiple sequence alignment: a profile holds statistical information about the characters in each column of the alignment
• Each position of the profile reflects the frequency of each character found at that position

A T C T T G T
A A C T T G T
A A C T T C T

     1     2     3     4     5     6     7
A    1     0.67  0     0     0     0     0
T    0     0.33  0     1     1     0     1
C    0     0     1     0     0     0.33  0
G    0     0     0     0     0     0.67  0
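The frequencies of a profile can be computed column by column from the aligned sequences. A minimal sketch, assuming a DNA alphabet and simple relative frequencies rounded to two decimals (no pseudocounts or sequence weights); the function name is illustrative.

```python
def profile(alignment, alphabet="ATCG"):
    """Map each character to its list of per-column relative frequencies."""
    n = len(alignment)
    return {ch: [round(col.count(ch) / n, 2) for col in zip(*alignment)]
            for ch in alphabet}


p = profile(["ATCTTGT", "AACTTGT", "AACTTCT"])
for ch, row in p.items():
    print(ch, row)
```

For this alignment, position 2 comes out as A = 0.67 and T = 0.33, and position 6 as G = 0.67 and C = 0.33; every other position is fully occupied by a single character.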

40

Profile vs Consensus
• The following multiple alignments have the same consensus

Alignment 1:
A A C T T G C
A A G T C G T
C A C T T C T

Alignment 2:
A A C T T G T
A A C T T G T
A A C T T C T

Consensus of both:
A A C T T G T

41

Profile vs Consensus
• But they have different profiles

Alignment 1:
A A C T T G C
A A G T C G T
C A C T T C T

     1     2     3     4     5     6     7
A    0.67  1     0     0     0     0     0
T    0     0     0     1     0.67  0     0.67
C    0.33  0     0.67  0     0.33  0.33  0.33
G    0     0     0.33  0     0     0.67  0

Alignment 2:
A A C T T G T
A A C T T G T
A A C T T C T

     1     2     3     4     5     6     7
A    1     1     0     0     0     0     0
T    0     0     0     1     1     0     1
C    0     0     1     0     0     0.33  0
G    0     0     0     0     0     0.67  0

42

Sequence LOGO

A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C - C G C - T

Logo: A gc c g G C T

http://weblogo.berkeley.edu
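The height of each stack in a sequence logo reflects the information content of the column. A minimal sketch of that calculation for the alignment on this slide, assuming a four-letter DNA alphabet, ignoring gaps, and omitting the small-sample correction that logo tools may apply; the function name is illustrative.

```python
import math
from collections import Counter

ROWS = ["AACCGCTCTT", "AGCCGCGC-T", "A-CAGAGCCT",
        "AAGCACGC-T", "ACGGGTGCTT", "ATGC-CGC-T"]


def information_content(column, alphabet_size=4):
    """Bits of information in one column: log2(alphabet size) minus Shannon entropy."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    entropy = -sum((n / len(residues)) * math.log2(n / len(residues))
                   for n in counts.values())
    return math.log2(alphabet_size) - entropy


for pos, col in enumerate(zip(*ROWS), start=1):
    print(pos, "".join(col), round(information_content(col), 2))
```

Columns 1, 8 and 10 are fully conserved and reach the maximum of 2 bits, which matches the tall A, C and T in the logo.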

43

PSI-BLAST

• Position-Specific Iterated BLAST - an automatic profile-like search

1. Regular BLAST
2. Construct a profile from the BLAST results
3. BLAST search with the profile (steps 2 and 3 can be repeated)
4. Final results
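The search-profile-search loop can be illustrated with a toy model. This is not the BLAST algorithm: it assumes ungapped sequences of equal length, a plain frequency profile, a score equal to the mean profile frequency of a sequence's residues, and a hypothetical threshold of 0.65. All names and data are made up for illustration.

```python
def build_profile(seqs):
    """Per-column residue frequencies of equal-length sequences."""
    n = len(seqs)
    return [{ch: col.count(ch) / n for ch in set(col)} for col in zip(*seqs)]


def profile_score(prof, seq):
    """Mean frequency of the sequence's residues under the profile (1.0 = perfect)."""
    return sum(col.get(ch, 0.0) for col, ch in zip(prof, seq)) / len(seq)


def iterated_search(query, database, threshold=0.65, max_rounds=5):
    """Search, rebuild the profile from the hits, and search again until stable."""
    hits = [query]
    for _ in range(max_rounds):
        prof = build_profile(hits)
        new_hits = [query] + [s for s in database
                              if profile_score(prof, s) >= threshold]
        if set(new_hits) == set(hits):
            break
        hits = new_hits
    return hits[1:]


query = "AAAAAAAAAA"
database = ["AAAAAAAACC",   # close relative: found in the first round
            "AAAAAACCCC",   # distant relative: found only via the profile
            "GGGGGGGGGG"]   # unrelated

first_round = [s for s in database
               if profile_score(build_profile([query]), s) >= 0.65]
print(first_round)                       # ['AAAAAAAACC']
print(iterated_search(query, database))  # ['AAAAAAAACC', 'AAAAAACCCC']
```

The distant relative scores 0.6 against the query alone, but 0.7 once the profile has absorbed the close relative, which is the effect the iterated search relies on.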

44

• Alignment with distantly related proteins

45

Isopenicillin N Synthase

• Experimental evidence supports the finding that His212, Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS

Enzyme      Relative activity   Km (mM)   kcat (min-1)   kcat/Km (mM-1 min-1)
Wild type   100                 0.4       388            969
His48Ala    16                  0.56      75             134
His63Ala    31                  1.0       142            142
His114Ala   28                  0.85      125            147
His124Ala   48                  0.84      321            381
His135Ala   22                  0.59      117            198
His212Ala   <0.007              nd        nd
His268Ala   <0.003              nd        nd
Asp14Ala    5                   0.86      0.56           0.7
Asp113Ala   63                  0.45      238            528
Asp131Ala   68                  0.48      363            755
Asp203Ala   32                  0.91      123            135
Asp214Ala   <0.004              nd        nd

46

- IPNS

Page 4: Multiple Sequence Alignment

4

Why multiple alignment - example

• Histones: small, abundant proteins present in all eukaryotic chromosomes
• They show a remarkably conserved multiple sequence alignment
• Conservation of structure and function (they aid in DNA packaging)

5

MSA applications

• Generate protein families
• Extrapolation: assign an uncharacterized sequence to a protein family
• Understand evolution: a preliminary step in molecular evolution analysis for constructing phylogenetic trees, e.g. is the duck evolutionarily closer to a lion or to a fruit fly?
• Pattern identification: find the important (conserved) regions in the protein; conserved positions may characterize a function
• Domain identification: build a consensus/profile/motif that describes the protein family and helps to identify new members of the family
• DNA regulatory elements
• Structure prediction (secondary structure and 3D model)

6

Multiple sequence alignment

The pairwise solution may be very different from the multiple-alignment solution

7

The computational problem: given a set of sequences, find their alignment to a common frame (by inserting gaps) such that the distance function attains its optimal value.

Input sequences:

SCGPFIRV
MSCGPGLRA
SCTPHLA
MSCPKIRG
MSLPLLRN
MSHKPALRA

Aligned sequences:

-SCGPFIRV
MSCGPGLRA
-SCTPHL-A
MSC-PKIRG
MS-LPLLRN
MSHKPALRA

Can we simply generalize the dynamic programming method? In theory yes, in practice no. The running time and the memory grow as N^K, where N is the sequence length and K is the number of sequences. In practice this is infeasible for K > 3.
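A quick calculation shows why the N^K growth rules out the direct generalization. The sequence length of 300 and the 4 bytes per table cell are assumptions picked only to give concrete numbers; `dp_cells` is a hypothetical helper name.

```python
def dp_cells(n, k):
    """Number of cells in a K-dimensional dynamic programming table
    for K sequences of length N (ignoring the +1 border per axis)."""
    return n ** k

for k in (2, 3, 4, 5):
    cells = dp_cells(300, k)
    gigabytes = cells * 4 / 1e9   # assuming 4 bytes per cell
    print(f"K={k}: {cells:.2e} cells, about {gigabytes:.3g} GB")
```

Going from three to four sequences of this length multiplies the table size by 300, which is why heuristic methods are used instead.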

8

Since the computational problem is hard, we know that no method we propose can guarantee an optimal solution. We look for a method that gives good results in most cases. The method we choose may depend on the reason for performing the alignment.

The problems we have to answer

1. Choosing the sequences
2. Scoring metrics
3. Approximation/heuristic algorithms
4. MSA formats
5. Interpreting the MSA

9

Scoring metrics

• Distance from Consensus: the consensus of an alignment is the string of the most common characters in each column of the alignment. The total distance is the number of characters that differ from the consensus character of their column. Let $C$ be the consensus sequence; then the total distance is $\sum_i D(S_i, C)$
• Evolutionary Tree Alignment: the weight of the lightest evolutionary tree that can be constructed from the sequences, where the weight of a tree is the number of changes between pairs of sequences that correspond to two adjacent nodes in the tree, summed over all such pairs
• Sum of Pairs: the sum of pairwise distances between all pairs of sequences, $\sum_{i<j} D(S_i, S_j)$
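Two of these metrics are easy to write down once a distance D is fixed. The sketch below assumes D is the number of differing columns between two aligned rows (a gap counts as an ordinary character); the slides do not fix D, so this choice and all function names are illustrative.

```python
from collections import Counter
from itertools import combinations

def hamming(a, b):
    """D(a, b): number of columns in which two aligned rows differ."""
    return sum(x != y for x, y in zip(a, b))

def consensus(alignment):
    """Most common character in each column."""
    return "".join(Counter(col).most_common(1)[0][0]
                   for col in zip(*alignment))

def distance_from_consensus(alignment):
    """Sum over all rows S_i of D(S_i, C), with C the consensus."""
    c = consensus(alignment)
    return sum(hamming(s, c) for s in alignment)

def sum_of_pairs(alignment):
    """Sum over all pairs i < j of D(S_i, S_j)."""
    return sum(hamming(a, b) for a, b in combinations(alignment, 2))

rows = ["ATCTTGT", "AACTTGT", "AACTTCT"]
print(distance_from_consensus(rows), sum_of_pairs(rows))
```

Both functions score a given alignment; finding the alignment that minimizes either score is the hard part discussed earlier.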

10

Scoring metrics - examples

Sum of pairs

-SCGPFIRV
MSCGPGLRA
-SCTPHL-A

-SCGPFIRV
MSCGPGLRA     5

-SCGPFIRV
-SCTPHL-A     3

MSCGPGLRA
-SCTPHL-A     5

Total         13

Distance from consensus

-SCGPFIRV
MSCGPGLRA
-SCTPHL-A
MSC-PKIRG
MS-LPLLRN
MSHKPALRA

Homogeneity per column   4 6 4 2 6 1 4 5 3   Total homogeneity: 35
Distance per column      2 0 2 4 0 5 2 1 3   Total distance: 19
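The numbers in this example can be reproduced from the sequences shown. Reading the pair scores as the number of identical residues per pair (columns where both rows have a gap are skipped) is an assumption that happens to match the slide's total; the function names are invented for this sketch.

```python
from collections import Counter
from itertools import combinations

def identities(a, b):
    """Columns where both rows hold the same residue (gap-gap is skipped)."""
    return sum(x == y and x != "-" for x, y in zip(a, b))

def column_homogeneity(alignment):
    """Count of the most common character in each column."""
    return [Counter(col).most_common(1)[0][1] for col in zip(*alignment)]

three = ["-SCGPFIRV", "MSCGPGLRA", "-SCTPHL-A"]
pair_scores = [identities(a, b) for a, b in combinations(three, 2)]
print(pair_scores, sum(pair_scores))

six = three + ["MSC-PKIRG", "MS-LPLLRN", "MSHKPALRA"]
homogeneity = column_homogeneity(six)
distance = [len(six) - h for h in homogeneity]
print(homogeneity, sum(homogeneity))
print(distance, sum(distance))
```

The per-column distance is simply the number of rows minus the column's homogeneity, so the two totals always add up to rows times columns (here 6 x 9 = 54).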

11

MSA algorithms

• Progressive methods (CLUSTALW, T-Coffee)
• Iterative methods (Dialign)
• Direct optimization (Monte Carlo, genetic algorithms)
• Local methods: eMotifs, Blocks, PSI-BLAST

12

CLUSTALW algorithm

• Compare all sequence pairs (pairwise alignment)
• Generate a hierarchy for alignment (guide tree)
• Build the multiple alignment step by step according to the guide tree: first align the most similar pair, then add another sequence or another pairwise alignment, and so on

13

CLUSTALW algorithm

(1) Pairwise alignment (prepare a guide tree): 6 pairwise alignments, then cluster analysis
(2) Multiple alignment following the tree from (1): successive alignments

14

CLUSTALW algorithm

1. Build a table of edit distances between every pair of sequences
2. Build a tree by joining similar sequences in order of their similarity

   Tree leaves: Hbb Human, Hbb Horse, Hba Human, Hba Horse, Myg Whale

3. Build the alignment in the order dictated by the tree. There are three cases:
   - aligning a pair of sequences
   - aligning two alignments
   - adding a single sequence to an existing alignment
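Steps 1 and 2 can be sketched with a plain edit distance and average-linkage clustering. This is a simplification for illustration, not ClustalW's actual procedure (ClustalW derives distances from scored pairwise alignments and builds its tree by neighbor joining). The toy sequences and all names are assumptions.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # (mis)match
        prev = cur
    return prev[-1]

def guide_tree(seqs):
    """Join the two closest clusters until one nested tuple remains.

    `seqs` maps a name to its sequence. Cluster distance is the average
    pairwise edit distance between members (average linkage).
    """
    dist = {frozenset(p): edit_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}

    def average(m1, m2):
        return sum(dist[frozenset((x, y))] for x in m1 for y in m2) / (len(m1) * len(m2))

    clusters = [(name, [name]) for name in seqs]   # (subtree, member names)
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average(clusters[ij[0]][1], clusters[ij[1]][1]))
        (t1, m1), (t2, m2) = clusters[i], clusters[j]
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((t1, t2), m1 + m2)]
    return clusters[0][0]

toy = {"a": "AAAA", "b": "AAAT", "c": "GGGG", "d": "GGGC"}
print(guide_tree(toy))
```

The nested tuple gives the order of step 3: the innermost pairs are aligned first, then the resulting alignments are aligned to each other.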

15

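Two alignments (or a sequence and an alignment) can be aligned by a fairly natural extension of the dynamic programming method. Characters are inserted in the usual way. The cost of matching two columns is the average cost over the different character combinations. A special cost must be assigned to gaps.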
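The averaged column cost described here can be written as a small function. The concrete costs (0 for a match, 1 for a mismatch, 2 for a residue against a gap, nothing for gap against gap) are assumptions for illustration; ClustalW itself uses substitution matrices, sequence weights and position-specific gap penalties.

```python
def column_cost(col_a, col_b, mismatch=1.0, gap=2.0):
    """Average cost over all character pairs taken from two columns.

    `col_a` and `col_b` are the characters of one column from each of the
    two alignments being merged (a single sequence is a column of height 1).
    """
    total = 0.0
    for x in col_a:
        for y in col_b:
            if x == "-" and y == "-":
                continue              # gap against gap costs nothing here
            if x == "-" or y == "-":
                total += gap          # residue against gap
            elif x != y:
                total += mismatch     # two different residues
    return total / (len(col_a) * len(col_b))

print(column_cost("AA", "AC"))   # two of the four pairs mismatch
print(column_cost("A-", "A"))    # one match, one residue-gap pair
```

Plugging this cost into the ordinary pairwise dynamic programming recurrence, with columns in place of single characters, gives the alignment-to-alignment step.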

16

CLUSTALW algorithm

Fine points

• Sequences are given different weights, so that if there are several very similar sequences, the relative weight of each one is reduced
• A special method is used to set the cost of inserting gaps

Disadvantages

• Because the construction is irreversible, the result depends very strongly on the quality of the alignment of the first pairs
• "Once a gap, always a gap": gaps never disappear
• High sensitivity to the gap penalty

Advantages

• Fast
• Reasonable
• Many parameters can be set
• Everybody uses it

17

CLUSTALW algorithm

• We can deduce a pairwise alignment for every two sequences in the multiple alignment (the projected pairwise alignment)
• The projected pairwise alignment is NOT necessarily the best pairwise alignment for the two sequences
• CLUSTALW is not an optimal algorithm: better alignments might exist. The algorithm yields a possible alignment, but not necessarily the best one

Best pairwise alignment (optimal)

Projected pairwise alignment
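Projecting a pairwise alignment out of a multiple alignment means keeping two rows and dropping the columns in which both rows have a gap. A minimal sketch, with the function name `project` chosen here for illustration:

```python
def project(alignment, i, j):
    """Return rows i and j of an alignment without gap-only columns."""
    kept = [(a, b) for a, b in zip(alignment[i], alignment[j])
            if not (a == "-" and b == "-")]
    row_i = "".join(a for a, _ in kept)
    row_j = "".join(b for _, b in kept)
    return row_i, row_j

msa = ["-SCGPFIRV", "MSCGPGLRA", "-SCTPHL-A"]
print(project(msa, 0, 2))
```

For rows 1 and 3 of the earlier example this drops the leading gap column and leaves SCGPFIRV over SCTPHL-A. Comparing the score of such a projection with the score of an independently computed pairwise alignment shows how much the multiple alignment cost that particular pair.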

18

ClustalW at EMBL

http://www.ebi.ac.uk/clustalw

ClustalW at the SRS site at EBI

19

ClustalW Output Aln format

20

MSA Editing Jalview

Conservation

www.es.embnet.org/Services/MolBio/jalview/index.html

21

MSA formats - fasta

22

MSA formats - Aln

23

MSA formats - MSF

24

Example 1a: a good MSA

25

Example 1b: making an MSA of distantly related proteins

26

Example 1c: including more distant relatives in the MSA

27

Example 2: Isopenicillin N Synthase

• Mononuclear iron proteins: electron carrier proteins. Iron atoms are bound to amino acid side chains
• In IPNS the metal ion is coordinated by three protein residues
• IPNS is involved in the biosynthesis of penicillin

[Figure: reaction scheme, ACV to isopenicillin N, with Fe+2, ascorbate and O2 (2 H2O released). Active-site diagram: Fe coordinated by His212, Asp214 and His268, with S-ACV, O2 (NO) and H2O.]

28

Research IPNS

• Goal: identify the Fe+2-binding residues
• Possible solutions:
  1. In the lab
  2. Bioinformatic approach (comparing different IPNS sequences)

29

Step 1

Multiple alignment of known IPNS

Implementation

1. Obtain sequences (e.g. from NCBI):

   IPNS AND Bacteria[Organism]

2. MSA (ClustalW) and search for conserved residues in the MSA

30

31

32

MSA - bacteria only

Not enough variation

33

MSA - bacteria & fungi

Not enough variation

bacteria & fungi

bacteria

34

Step 2

Goal: add more enzymes similar to IPNS

Implementation

• Search at http://www.expasy.org/tools/blast
• BLAST with IPNS_CEPAC as the query
• Select sequences similar to the query over their entire length
• Export in FASTA format
• Run CLUSTALW

35

• New multiple alignment: narrowing down the possibilities

36

Simple multiple alignment

• The known IPNS sequences are very similar
• Sequences of closely related enzymes are also quite similar
• Not enough variability to characterize the active sites
• We need to obtain even more distant sequences (distant homologs)

37

Step 3

Using the results of the MSA for further searches

Implementation

1. Obtain an MSA (ClustalW)
2. Either construct a consensus sequence and perform a new search,
   or construct a profile and perform a new search
3. MSA (ClustalW) and search for conserved residues in the MSA

Page 7: Multiple Sequence Alignment

7

בתיאוריה כן באופן האם אפשר פשוט להכליל את שיטת התיכנות הדינמי פרקטי לא

מדובר בזמן ריצהובגודל זיכרון הגדלים

כאשר NKכ Nהוא אורך הרצף Kמספר הרצפים

למעשה בלתי אפשריKgt3 עבור

הבעיה החישובית בהינתן מספר רצפים מצא את ההתאמה שלהם למסגרת תקבל ערך אופטימלי שפונקצית מרחקמשותפת (עי הוספת רווחים) כך

-SCGPFIRVMSCGPGLRA-SCTPHL-AMSC-PKIRGMS-LPLLRNMSHKPALRA

SCGPFIRVMSCGPGLRASCTPHLAMSCPKIRGMSLPLLRNMSHKPALRA

8

מאחר שהבעיה החישובית קשה אנו יודעים שכל שיטה שנציע לא תבטיח פתרון אופטימלי נחפש שיטה שנותנת תוצאות טובות

ברב המקרים יתכן שהשיטה שנבחר תהיה תלויה בסיבה שבגללה

אנו מבצעים את ההתאמה

1 Choosing the sequencess2 Scoring metrics3 Approximationheuristic algorithms4 MSA formats5 Interpreting the MSA

The problems we have to answer

9

Scoring metricsbull Distance from Consensus - The consensus of an alignment

is a string of the most common characters in each column of the alignment The total distance between the strings is defined as the number of characters that differ from the consensus character of their column let C be the consensus sequence then the total distance is sum_i D(Si C)

bull Evolutionary Tree Alignment - The weight of the lightest evolutionary tree that can be constructed from the sequences with the weight of the tree defined as the number of changes between pairs of sequences that correspond to two adjacent nodes in the tree summed over all such pairs

bull Sum of Pairs - The sum of pairwise distances between all pairs of sequences Sum_iltj D(Si Sj)

10

-SCGPFIRVMSCGPGLRA-SCTPHL-A

-SCGPFIRVMSCGPGLRA

-SCGPFIRV-SCTPHL-A

MSCGPGLRA-SCTPHL-A

5 3 5 13

Scoring metrics -examplesSum of pairs

-SCGPFIRVMSCGPGLRA-SCTPHL-AMSC-PKIRGMS-LPLLRNMSHKPALRA

3 5 4 1 6 2 4 6 354סהכ הומוגניות

3 1 2 5 0 4 2 0 42 19סהכ מרחק

Distance from concensus

11

MSA algorithms

• Progressive methods (CLUSTALW, T-Coffee)

• Iterative methods (Dialign)

• Direct optimization (Monte Carlo, genetic algorithms)

• Local methods: eMotifs, Blocks, PSI-BLAST

12

CLUSTALW algorithm

• Compare all sequence pairs (pairwise alignment)

• Generate a hierarchy for alignment (guide tree)

• Build the multiple alignment step by step according to the guide tree: first align the most similar pair, then add another sequence or another pairwise alignment, and so on

13

CLUSTALW algorithm

(1) Pairwise alignment (to prepare a guide tree): 6 pairwise alignments, then cluster analysis

(2) Multiple alignment following the tree from (1): successive alignments

14

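CLUSTALW algorithm

1. Build a table of the edit distances between every pair of sequences.

2. Build a tree by joining the similar sequences, in order of their similarity.

Tree leaves: Hbb Human, Hbb Horse, Hba Human, Hba Horse, Myg Whale

3. Build the alignment in the order dictated by the tree. There are three cases:
- aligning a pair of sequences
- aligning two alignments
- adding a single sequence to an existing alignment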
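Steps 1 and 2 can be sketched as follows. This is a simplified illustration, not the ClustalW implementation: it assumes plain edit distance as the pairwise measure and average-linkage clustering for the tree, and the function names and example sequences are chosen only for the demonstration.

```python
from itertools import combinations


def edit_distance(a, b):
    """Classic dynamic-programming edit distance, unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def guide_tree(seqs):
    """seqs: dict name -> sequence. Returns (nested-tuple tree, distance table)."""
    names = list(seqs)
    dist = {frozenset((a, b)): edit_distance(seqs[a], seqs[b])
            for a, b in combinations(names, 2)}
    clusters = [(name, [name]) for name in names]  # (subtree, leaf names)

    def linkage(c1, c2):
        # Average distance over all leaf pairs of the two clusters.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the closest pair of clusters first.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0], dist


example = {"s1": "SCGPFIRV", "s2": "MSCGPGLRA", "s3": "SCTPHLA", "s4": "MSCPKIRG"}
tree, distances = guide_tree(example)
print(len(distances), "pairwise distances")
print(tree)
```

The nested tuples give the merge order: the innermost pairs are aligned first, and each enclosing tuple is a later alignment of a sequence or an alignment to what has been built so far.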

15

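Two alignments (or a sequence and an alignment) can be aligned by a fairly natural extension of the dynamic programming method. Letters are inserted in the usual way. The cost of matching two columns is the average cost over the different letter combinations. A special cost must be assigned to a gap.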
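The averaged column cost can be sketched as below. The unit mismatch cost and the gap cost of 0.5 are assumed values picked for the illustration; the lecture only states that the cost is an average over the combinations and that gaps need their own cost.

```python
def column_cost(col_a, col_b, mismatch=1.0, gap=0.5):
    """Average cost over all letter combinations between two alignment columns.

    col_a, col_b: the characters found in one column of each alignment.
    mismatch, gap: assumed costs; identical characters (including gap-gap) cost 0.
    """
    total = 0.0
    for x in col_a:
        for y in col_b:
            if x == y:
                continue
            total += gap if "-" in (x, y) else mismatch
    return total / (len(col_a) * len(col_b))


print(column_cost("AC", "A"))   # one of two combinations mismatches
print(column_cost("A-", "AA"))  # half of the combinations pair a letter with a gap
```

This function takes the place of the single-letter substitution cost in the ordinary pairwise recurrence when a column of one alignment is matched against a column of another.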

16

CLUSTALW algorithm

Fine points
• Sequences are given different weights, so that when several sequences are very similar the relative weight of each one is reduced.
• A special method is used to set the cost of inserting gaps.

Disadvantages
• Because the construction is irreversible, the result depends very strongly on the quality of the alignment of the first pairs.
• "Once a gap, always a gap": gaps never disappear.
• High sensitivity to the gap penalty.

Advantages
• Fast.
• Reasonable results.
• Many parameters can be set.
• Everyone uses it.

17

CLUSTALW algorithm

• We can deduce a pairwise alignment for every two sequences in the multiple alignment (the projected pairwise alignment).

• The projected pairwise alignment is not necessarily the best pairwise alignment for the two sequences.

• CLUSTALW is not an optimal algorithm. Better alignments might exist. The algorithm yields a possible alignment, but not necessarily the best one.

[Figure: best pairwise alignment (optimal) compared with the projected pairwise alignment]
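A projected pairwise alignment is obtained by taking two rows of the multiple alignment and dropping the columns in which both rows have a gap. A minimal sketch, with an assumed function name and using the six-sequence example alignment from earlier in the lecture:

```python
def project_pair(msa, i, j):
    """Pairwise alignment of rows i and j induced by the multiple alignment."""
    cols = [(a, b) for a, b in zip(msa[i], msa[j]) if not (a == "-" and b == "-")]
    return "".join(a for a, _ in cols), "".join(b for _, b in cols)


msa = ["-SCGPFIRV", "MSCGPGLRA", "-SCTPHL-A",
       "MSC-PKIRG", "MS-LPLLRN", "MSHKPALRA"]

row_a, row_b = project_pair(msa, 0, 2)
print(row_a)  # SCGPFIRV
print(row_b)  # SCTPHL-A
```

Comparing the score of this projection with the score of an independently computed optimal pairwise alignment shows how much the multiple alignment has compromised that particular pair.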

18

ClustalW at EMBL

http://www.ebi.ac.uk/clustalw

ClustalW at the SRS site at EBI

19

ClustalW Output Aln format

20

MSA Editing Jalview

Conservation

www.es.embnet.org/Services/MolBio/jalview/index.html

21

MSA formats - fasta

22

MSA formats - Aln

23

MSA formats - MSF

24

Example 1a a good MSA

25

Example 1b making MSA of distantly related proteins

26

Example 1c including more distant relatives in the MSA

27

Example 2: Isopenicillin N Synthase

• Mononuclear iron proteins are electron carrier proteins. Iron atoms are bound to amino acid side chains.

• In IPNS the metal ion is coordinated by three protein residues.

• IPNS is involved in the biosynthesis of penicillin.

[Figure: IPNS reaction scheme, ACV → isopenicillin N (labels: Fe+2, ascorbate, O2, 2 H2O), and the iron centre with ligand labels His212, His268, Asp214, S-ACV, O2 (NO) and H2O]

28

Research IPNS

• Goal: identify the Fe+2 binding residues

• Possible solutions:

1. In the lab

2. Bioinformatic approach (comparing different IPNS sequences)

29

Step 1

Multiple alignment of known IPNS

Implementation

1. Obtain the sequences (e.g. from NCBI), using the query:

IPNS AND Bacteria[Organism]

2. Run an MSA (ClustalW) and search for conserved residues in the MSA

30

31

32

MSA - bacteria only

Not enough variation

33

MSA - bacteria & fungi

Not enough variation

[Figure panels: bacteria & fungi; bacteria]

34

Step 2

Goal: add more enzymes similar to IPNS

Implementation
• Search at http://www.expasy.org/tools/blast
• BLAST with IPNS_CEPAC as the query
• Select sequences that are similar to the query over its entire length
• Export in FASTA format
• Run CLUSTALW

35

• New multiple alignment: narrowing down the possibilities

36

Simple multiple alignment

• The known IPNS sequences are very similar
• The sequences of closely related enzymes are also quite similar
• Not enough variability to identify the active sites
• We need to obtain even more distant sequences (distant homologs)

37

Step 3

Using the results of the MSA for further searches

Implementation

1. Obtain an MSA (ClustalW)

2. Either construct a consensus sequence and perform a new search, or construct a profile and perform a new search

3. Run an MSA (ClustalW) and search for conserved residues in the MSA

38

Consensus Sequence

• We can deduce a consensus sequence from the multiple sequence alignment. The consensus sequence holds the most frequent character of the alignment at each column.

• Consensus: each position reflects the most common character found at that position.

A T C T T G T
A A C T T G T
A A C T T C T

Consensus:

A A C T T G T

39

Profile

• We can deduce a statistical model describing the multiple sequence alignment. A profile holds statistical information about the characters in each column of the alignment.

• Profile: each position reflects the frequency of each character found at that position.

A T C T T G T
A A C T T G T
A A C T T C T

     1     2     3     4     5     6     7
A    1     0.67  0     0     0     0     0
T    0     0.33  0     1     1     0     1
C    0     0     1     0     0     0.33  0
G    0     0     0     0     0     0.67  0

40

Profile vs Consensus

• The following multiple alignments have the same consensus:

Alignment 1:

A A C T T G C
A A G T C G T
C A C T T C T

Alignment 2:

A A C T T G T
A A C T T G T
A A C T T C T

Consensus of both:

A A C T T G T

41

Profile vs Consensus

• But they have different profiles:

Alignment 1:

A A C T T G C
A A G T C G T
C A C T T C T

     1     2     3     4     5     6     7
A    0.67  1     0     0     0     0     0
T    0     0     0     1     0.67  0     0.67
C    0.33  0     0.67  0     0.33  0.33  0.33
G    0     0     0.33  0     0     0.67  0

Alignment 2:

A A C T T G T
A A C T T G T
A A C T T C T

     1     2     3     4     5     6     7
A    1     1     0     0     0     0     0
T    0     0     0     1     1     0     1
C    0     0     1     0     0     0.33  0
G    0     0     0     0     0     0.67  0
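The contrast between consensus and profile can be checked with a small sketch. The function names and the two-decimal rounding are assumptions made for the illustration; the two alignments are the ones compared in the lecture.

```python
from collections import Counter

ALPHABET = "ATCG"


def profile(rows):
    """Per-column frequency of each character, rounded to two decimals."""
    cols = list(zip(*rows))
    return {ch: [round(col.count(ch) / len(col), 2) for col in cols]
            for ch in ALPHABET}


def consensus(rows):
    """Most common character of each column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))


alignment_1 = ["AACTTGC", "AAGTCGT", "CACTTCT"]
alignment_2 = ["AACTTGT", "AACTTGT", "AACTTCT"]

print(consensus(alignment_1), consensus(alignment_2))  # identical strings
for ch in ALPHABET:
    print(ch, profile(alignment_1)[ch], profile(alignment_2)[ch])
```

The consensus keeps only the winner of each column, so it cannot tell a column that is fully conserved from one where the winner holds two rows out of three; the profile keeps that distinction.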

42

Sequence LOGO

A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C - C G C - T

A gc c g G C T

http://weblogo.berkeley.edu
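The height of each stack in a sequence logo is the information content of the column, in bits. A minimal sketch for DNA follows; it assumes that gaps are ignored and it omits the small-sample correction that logo tools usually apply, and the function name is chosen for the illustration.

```python
import math
from collections import Counter


def information_bits(column, alphabet_size=4):
    """log2(alphabet size) minus the Shannon entropy of the column, gaps ignored."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


rows = ["AACCGCTCTT", "AGCCGCGC-T", "A-CAGAGCCT",
        "AAGCACGC-T", "ACGGGTGCTT", "ATGC-CGC-T"]

for position, column in enumerate(zip(*rows), start=1):
    print(position, round(information_bits(column), 2))
```

Fully conserved columns (positions 1, 8 and 10 here) reach the maximum of 2 bits, while a column split evenly between two bases (position 3) carries 1 bit.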

43

Psi Blast

• Position-Specific Iterated BLAST: an automatic profile-like search

1. Regular BLAST
2. Construct a profile from the BLAST results
3. BLAST profile search
4. Final results

44

• Alignment with distantly related proteins

45

Isopenicillin N Synthase

• Experimental evidence supports the finding that His212, Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS.

Enzyme      Relative activity   Km (mM)   kcat (min-1)   kcat/Km (mM-1 min-1)
Wild type   100                 0.4       388            969
His48Ala    16                  0.56      75             134
His63Ala    31                  1.0       142            142
His114Ala   28                  0.85      125            147
His124Ala   48                  0.84      321            381
His135Ala   22                  0.59      117            198
His212Ala   <0.007              nd        nd             nd
His268Ala   <0.003              nd        nd             nd
Asp14Ala    5                   0.86      0.56           0.7
Asp113Ala   63                  0.45      238            528
Asp131Ala   68                  0.48      363            755
Asp203Ala   32                  0.91      123            135
Asp214Ala   <0.004              nd        nd             nd
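The last column of the kinetics table is the ratio of the two columns before it, which gives a quick consistency check. The sketch below assumes the flattened digits are read with a decimal point after each leading zero (0.4, 0.56, 1.0 and so on) and that the kcat and kcat/Km magnitudes are as printed; that reading is an assumption about the extracted slide text, not a verified transcription.

```python
# (Km in mM, kcat in min-1, reported kcat/Km); decimal placement is an assumption.
rows = {
    "Wild type": (0.4, 388, 969),
    "His48Ala": (0.56, 75, 134),
    "His63Ala": (1.0, 142, 142),
    "His114Ala": (0.85, 125, 147),
    "His124Ala": (0.84, 321, 381),
    "His135Ala": (0.59, 117, 198),
    "Asp14Ala": (0.86, 0.56, 0.7),
    "Asp113Ala": (0.45, 238, 528),
    "Asp131Ala": (0.48, 363, 755),
    "Asp203Ala": (0.91, 123, 135),
}


def relative_error(km, kcat, reported):
    """Relative difference between kcat / Km and the reported efficiency."""
    return abs(kcat / km - reported) / reported


for enzyme, (km, kcat, reported) in rows.items():
    print(f"{enzyme:10s} kcat/Km = {kcat / km:8.2f}  reported {reported}")
```

The three mutants with no measurable activity (His212Ala, His268Ala and Asp214Ala) have no kinetic constants and are therefore left out of the check; they are the residues identified as the iron ligands.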

46

- IPNS

47

48

Page 11: Multiple Sequence Alignment

11

MSA algorithms

• Progressive methods (CLUSTALW, T-Coffee)
• Iterative methods (Dialign)
• Direct optimization (Monte Carlo, genetic algorithms)
• Local methods: eMotifs, Blocks, PSI-BLAST

12

CLUSTALW algorithm

• Compare all sequence pairs (pairwise alignment)
• Generate a hierarchy for alignment (guide tree)
• Build the multiple alignment step by step according to the guide tree: first align the most similar pair, then add another sequence or another pairwise alignment, and so on
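The first two steps can be sketched in a few lines of Python. This is a toy illustration only, not ClustalW's actual implementation: it assumes plain edit distance as the pairwise measure and simple average-linkage clustering for the guide tree, and the names `edit_distance` and `guide_tree_order` are made up for this example.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # match or substitute
        prev = curr
    return prev[-1]


def guide_tree_order(seqs):
    """Return the order in which clusters are joined (average linkage).

    seqs maps a sequence name to its residues. Each merge is a pair of
    clusters, and each cluster is a tuple of sequence names.
    """
    # Step 1: compare all sequence pairs, n*(n-1)/2 comparisons.
    dist = {frozenset(pair): edit_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}

    def average(c1, c2):
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    # Step 2: repeatedly join the two closest clusters.
    clusters = [(name,) for name in seqs]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average(*p))
        merges.append((c1, c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges
```

The returned merge order is what the third step would follow: each merge is either a pair of sequences, a sequence added to an alignment, or two alignments joined.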

13

CLUSTALW algorithm

(1) Pairwise alignment (prepare a guide tree): 6 pairwise alignments, then cluster analysis
(2) Multiple alignment following the tree from (1): successive alignments

14

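CLUSTALW algorithm

1. Build a table of the edit distance between every pair of sequences
2. Build a tree by joining the most similar sequences first, in order of their similarity
   [Guide tree figure; leaves: Hbb Human, Hbb Horse, Hba Human, Hba Horse, Myg Whale]
3. Build the alignment in the order dictated by the tree. Three cases arise:
   - aligning a pair of sequences
   - aligning two alignments
   - adding a single sequence to an existing alignment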

15

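• Two alignments (or a sequence and an alignment) can be aligned by a fairly natural extension of the dynamic programming method
• Letters are inserted and deleted in the usual way
• The cost of matching two columns is the average cost over the different letter combinations
• Gaps need a special cost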
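The "average cost over the combinations" idea can be made concrete with a small sketch. It scores one column of alignment X against one column of alignment Y by averaging a letter-pair score over every pair of letters. The `match`, `mismatch` and `gap` values and the neutral treatment of gap-versus-gap pairs are placeholder assumptions for illustration, not ClustalW's parameters.

```python
def column_score(col_x, col_y, match=1.0, mismatch=-1.0, gap=-2.0):
    """Average pairwise score between two alignment columns.

    col_x and col_y are strings holding one character per sequence,
    with '-' marking a gap.
    """
    total = 0.0
    for a in col_x:
        for b in col_y:
            if a == "-" and b == "-":
                total += 0.0       # gap facing gap: neutral here (assumption)
            elif a == "-" or b == "-":
                total += gap       # special cost for a letter facing a gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total / (len(col_x) * len(col_y))
```

In the dynamic programming recurrence this value takes the place of the single-letter substitution score used for two plain sequences.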

16

CLUSTALW algorithm

Fine points
• Sequences are given different weights, so that when several sequences are very similar the relative weight of each one decreases
• A special method sets the cost of inserting gaps

Disadvantages
• Construction is irreversible, so the result depends heavily on the quality of the alignment of the first pairs
• "Once a gap, always a gap": gaps never disappear
• High sensitivity to the gap penalties

Advantages
• Fast
• Reasonable results
• Many parameters can be set
• Everyone uses it

17

CLUSTALW algorithm

• We can deduce a pairwise alignment for any two sequences in the multiple alignment (the projected pairwise alignment)
• The projected pairwise alignment is not necessarily the best pairwise alignment for the two sequences
• CLUSTALW is not an optimal algorithm. Better alignments might exist: the algorithm yields a possible alignment, but not necessarily the best one

[Figure: best (optimal) pairwise alignment compared with the projected pairwise alignment]
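A projected pairwise alignment is obtained by keeping only two rows of the multiple alignment and dropping the columns in which both rows hold a gap. A minimal sketch, assuming the MSA is stored as a dictionary of equal-length gapped strings (the function name `project_pair` is made up for this example):

```python
def project_pair(msa, name_a, name_b):
    """Restrict an MSA to two rows and drop columns where both rows have gaps."""
    kept = [(x, y) for x, y in zip(msa[name_a], msa[name_b])
            if not (x == "-" and y == "-")]
    row_a = "".join(x for x, _ in kept)
    row_b = "".join(y for _, y in kept)
    return row_a, row_b
```

Comparing the score of this projection with the score of an independent pairwise alignment of the same two sequences shows how much the progressive procedure gave up for that pair.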

18

ClustalW at EMBL
http://www.ebi.ac.uk/clustalw
ClustalW at the SRS site at EBI

19

ClustalW Output: Aln format

20

MSA Editing: Jalview

Conservation

www.es.embnet.org/Services/MolBio/jalview/index.html

21

MSA formats - fasta

22

MSA formats - Aln

23

MSA formats - MSF

24

Example 1a: a good MSA

25

Example 1b: making an MSA of distantly related proteins

26

Example 1c: including more distant relatives in the MSA

27

Example 2: Isopenicillin N Synthase

• Mononuclear iron proteins: electron carrier proteins in which iron atoms are bound to amino acid side chains
• In IPNS the metal ion is coordinated by three protein residues
• IPNS is involved in the biosynthesis of penicillin

[Figure: reaction scheme, ACV → isopenicillin N, with Fe+2, ascorbate and O2 (2 H2O released); active-site diagram showing Fe bound to His212, Asp214, His268, the ACV sulfur (S-ACV), O2 (NO) and H2O]

28

Research: IPNS

• Goal: identify the Fe+2 binding residues
• Possible solutions:
  1. In the lab
  2. Bioinformatic approach (comparing different IPNS sequences)

29

Step 1

Multiple alignment of known IPNS

Implementation

1. Obtain the sequences (e.g. from NCBI) with the query:
   IPNS AND Bacteria[Organism]
2. Run an MSA (ClustalW) and search for conserved residues in the MSA

30

31

32

MSA - bacteria only

Not enough variation

33

MSA - bacteria & fungi

Not enough variation

[Figure labels: bacteria & fungi; bacteria]

34

Step 2

Goal: add more enzymes similar to IPNS

Implementation
• Search at http://www.expasy.org/tools/blast
• BLAST with IPNS_CEPAC as the query
• Select sequences similar to the query over its entire length
• Export in FASTA format
• Run CLUSTALW

35

• New multiple alignment: narrowing down the possibilities

36

Simple multiple alignment

• The known IPNS sequences are very similar
• Sequences of closely related enzymes are also quite similar
• Not enough variability to identify the active-site residues
• We need to obtain even more distant sequences (distant homologs)

37

Step 3

Using the results of the MSA for further searches

Implementation

1. Obtain an MSA (ClustalW)
2. Either
   - construct a consensus sequence and perform a new search, or
   - construct a profile and perform a new search
3. Run an MSA (ClustalW) on the results and search for conserved residues in the MSA

38

Consensus Sequence

• We can deduce a consensus sequence from the multiple sequence alignment. The consensus sequence holds the most frequent character of the alignment at each column
• Consensus: each position reflects the most common character found at that position

A T C T T G T
A A C T T G T
A A C T T C T

A A C T T G T   (consensus)
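Deriving the consensus is a per-column majority vote. A minimal sketch, assuming the alignment is a list of equal-length strings and that ties are broken in favour of the character seen first in the column (the tie rule is an assumption of this example, not something the slide specifies):

```python
from collections import Counter


def consensus(rows):
    """Most frequent character in each column of the alignment."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*rows))
```

For the three-sequence alignment on this slide, `consensus(["ATCTTGT", "AACTTGT", "AACTTCT"])` gives `AACTTGT`.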

39

Profile

• We can deduce a statistical model describing the multiple sequence alignment. A profile holds statistical information about the characters of the alignment at each column
• Profile: each position reflects the frequency of each character found at that position

A T C T T G T
A A C T T G T
A A C T T C T

     1     2     3     4     5     6
A    1     0.67  0     0     0     0
T    0     0.33  0     1     1     0
C    0     0     1     0     0     0.33
G    0     0     0     0     0     0.67
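A frequency profile of this kind can be computed directly from the alignment columns. A minimal sketch, assuming a list of equal-length strings and a DNA alphabet (the function name `profile` and the rounding to two decimals are choices made for this example):

```python
from collections import Counter


def profile(rows, alphabet="ATCG"):
    """Relative frequency of each character of the alphabet in each column."""
    n = len(rows)
    columns = [Counter(column) for column in zip(*rows)]
    return {ch: [round(counts[ch] / n, 2) for counts in columns]
            for ch in alphabet}
```

Unlike the consensus, the profile keeps the minority characters, so position 2 of the alignment above is recorded as 0.67 A and 0.33 T rather than just A.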

40

Profile vs Consensus

• The following multiple alignments have the same consensus

Alignment 1        Alignment 2
A A C T T G C      A A C T T G T
A A G T C G T      A A C T T G T
C A C T T C T      A A C T T C T

Consensus of both: A A C T T G T

41

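Profile vs Consensus

• But they have different profiles

Alignment 1        Alignment 2
A A C T T G C      A A C T T G T
A A G T C G T      A A C T T G T
C A C T T C T      A A C T T C T

Profile of alignment 1
     1     2     3     4     5     6
A    0.66  1     0     0     0     0
T    0     0     0     1     0.66  0
C    0.33  0     0.66  0     0.33  0.33
G    0     0     0.33  0     0     0.66

Profile of alignment 2
     1     2     3     4     5     6
A    1     1     0     0     0     0
T    0     0     0     1     1     0
C    0     0     1     0     0     0.33
G    0     0     0     0     0     0.66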

42

Sequence LOGO

A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C - C G C - T

[Figure: sequence logo of the alignment above; letters read "A gc c g G C T"]

http://weblogo.berkeley.edu
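In a sequence logo the height of each column reflects its information content. As a rough sketch of that calculation for a DNA alignment, the snippet below computes the information in bits per column as log2(4) minus the Shannon entropy. It ignores gap characters and omits the small-sample correction that logo tools usually apply; both simplifications, and the function name, are assumptions of this example.

```python
import math
from collections import Counter


def column_information(rows, alphabet="ACGT"):
    """Information content in bits for each column of the alignment."""
    max_bits = math.log2(len(alphabet))
    info = []
    for column in zip(*rows):
        counts = Counter(ch for ch in column if ch in alphabet)  # skip gaps
        total = sum(counts.values())
        if total == 0:               # column made of gaps only
            info.append(0.0)
            continue
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in counts.values())
        info.append(max_bits - entropy)
    return info
```

Fully conserved columns, such as the first and last columns of the alignment above, reach the maximum of 2 bits, while a column split evenly between C and G carries 1 bit.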

43

Psi Blast

• Position-Specific Iterated BLAST: an automatic profile-like search

1. Regular BLAST
2. Construct a profile from the BLAST results
3. BLAST profile search (steps 2 and 3 are iterated)
4. Final results
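The loop behind this scheme can be illustrated with a toy example. This is not how PSI-BLAST itself is implemented: the sketch assumes ungapped DNA sequences of equal length, a log-odds profile with a pseudocount against a uniform background, and a fixed score threshold in place of E-values. All names and parameter values are hypothetical.

```python
import math

ALPHABET = "ACGT"


def build_profile(seqs, pseudocount=1.0):
    """Log-odds score of each letter in each column (uniform background)."""
    background = 1.0 / len(ALPHABET)
    denom = len(seqs) + pseudocount * len(ALPHABET)
    prof = []
    for column in zip(*seqs):
        freqs = {ch: (column.count(ch) + pseudocount) / denom
                 for ch in ALPHABET}
        prof.append({ch: math.log2(f / background) for ch, f in freqs.items()})
    return prof


def score(prof, seq):
    """Sum of the per-column scores of seq against the profile."""
    return sum(column[ch] for column, ch in zip(prof, seq))


def iterated_search(query, database, threshold, max_rounds=5):
    """Search, rebuild the profile from the hits, and search again."""
    hits = [query]
    for _ in range(max_rounds):
        prof = build_profile(hits)
        new_hits = [query] + [s for s in database
                              if score(prof, s) >= threshold]
        if set(new_hits) == set(hits):   # converged: no change in the hit set
            break
        hits = new_hits
    return hits
```

With the query `AAAA`, the database `["AAAT", "AATT", "GGGG"]` and a threshold of 1.0, the first round finds only `AAAT`; the profile rebuilt from both sequences then scores `AATT` above the threshold, which is the sense in which iteration reaches more distant relatives.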

44

• Alignment with distantly related proteins

45

Isopenicillin N Synthase

• Experimental evidence supports the finding that His212, Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS

Enzyme      Relative   Km     kcat      kcat/Km
            activity   (mM)   (min-1)   (mM-1 min-1)
Wild type   100        0.4    388       969
His48Ala    16         0.56   75        134
His63Ala    31         1.0    142       142
His114Ala   28         0.85   125       147
His124Ala   48         0.84   321       381
His135Ala   22         0.59   117       198
His212Ala   <0.007     nd     nd
His268Ala   <0.003     nd     nd
Asp14Ala    5          0.86   0.56      0.7
Asp113Ala   63         0.45   238       528
Asp131Ala   68         0.48   363       755
Asp203Ala   32         0.91   123       135
Asp214Ala   <0.004     nd     nd

46

- IPNS

47

48

Page 15: Multiple Sequence Alignment

15

ניתן להתאים בין התאמות (או בין רצף להתאמה) עי הרחבה די טבעית של שיטת התיכנות הדינמי הוספה

והכנסת אותיות באופן הרגיל המחיר להתאמת אותיות הוא מחיר ממוצע לקומבינציות השונות צריך לתת מחיר מיוחד

לרווח

16

נקודות עדינות

נותנים משקלות שונים לרצפים כך שאם יש מספר רצפים מאד דומים bullהמשקל היחסי של כל אחד יקטן

שיטה מיוחדת לקביעת מחיר להכנסת רווחיםbull

חסרונותמאחר שהבניה היא בלתי הפיכה יש תלות גדולה מאד באיכות ההתאמה bull

של הזוגות הראשוניםOnce a Gap Always a gapרווחים אינם נעלמים bullרגישות רבה לקנס הרווחיםbull

יתרונותמהירbullסבירbullאפשר לקבוע הרבה פרמטריםbullכולם משתמשיםbull

CLUSTALW algorithm

17

bull We can deduce a pairwise alignment for each two sequences in the multiple alignment (projected pairwise alignment)

bull The projected pairwise alignment is NOT the best pairwise alignment for the two sequences

bull CLUTALW is not an optimal algorithm Better alignments might exist The algorithm yields a possible alignment but not necessarily the best one

Best Pairwise alignment (optimal)

Projected Pairwise alignment

CLUSTALW algorithm

18

ClustalW at EMBLhttpwwwebiacukclustalw Clustalw at the SRS site at EBI

19

ClustalW Output Aln format

20

MSA Editing Jalview

Conservation

wwwesembnetorgServicesMolBiojalviewindexhtml

21

MSA formats - fasta

22

MSA formats - Aln

23

MSA formats - MSF

24

Example 1a a good MSA

25

Example 1b making MSA of distantly related proteins

26

Example 1c including more distant relatives in the MSA

27

bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains

bull In IPNS the metal ion is coordinated by three protein residues

bull IPNS is involved in biosynthesis of penicillin

Example 2 Isopenicillin N Synthase

N

SN

OCOOHO

H

H

Me

Me

COOH

NH2

N

SN

OCOOH

NH2 Me

Me

COOHO

ACV

Isopenicillin N

Fe+2

Ascorbate

O2

2H2O Fe

N

NHis268

H

N

NHis212

H

SACV

O2 (NO)

H2O

OAsp214

O

16

CLUSTALW algorithm

Fine points
• Sequences are given different weights, so that when several sequences are very similar the relative weight of each one is reduced
• A special method is used to set the cost of inserting gaps

Disadvantages
• Because the construction is irreversible, the result depends heavily on the quality of the alignment of the first pairs
• Gaps do not disappear ("once a gap, always a gap")
• High sensitivity to the gap penalty

Advantages
• Fast
• Reasonable
• Many parameters can be set
• Everyone uses it

17

CLUSTALW algorithm

• We can deduce a pairwise alignment for any two sequences in the multiple alignment (the projected pairwise alignment)

• The projected pairwise alignment is NOT necessarily the best pairwise alignment for the two sequences

• CLUSTALW is not an optimal algorithm. Better alignments might exist. The algorithm yields a possible alignment, but not necessarily the best one

Best pairwise alignment (optimal)

Projected pairwise alignment
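The projected pairwise alignment can be sketched in a few lines of Python. This is a minimal illustration, not ClustalW code: the function name `project_pairwise` and the toy alignment are assumptions made for the example. Two rows are taken out of the multiple alignment, and columns in which both rows hold a gap are dropped.

```python
def project_pairwise(msa, i, j, gap="-"):
    """Project rows i and j out of a multiple alignment.

    Columns in which both rows hold a gap carry no information for the
    pair, so they are dropped; all other columns are kept unchanged.
    """
    a, b = msa[i], msa[j]
    if len(a) != len(b):
        raise ValueError("rows of an alignment must have equal length")
    kept = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    return "".join(x for x, _ in kept), "".join(y for _, y in kept)


# Hypothetical toy alignment (not taken from the slides).
toy_msa = ["AC-GT-A",
           "AC-GTTA",
           "ACCGT-A"]

row_a, row_b = project_pairwise(toy_msa, 0, 1)
print(row_a)
print(row_b)
```

The projection only reuses gap positions chosen while building the whole alignment, which is why it need not match the optimal alignment of the two sequences on their own.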

18

ClustalW at EMBL

http://www.ebi.ac.uk/clustalw

ClustalW at the SRS site at EBI

19

ClustalW output: Aln format

20

MSA editing: Jalview

Conservation

www.es.embnet.org/Services/MolBio/jalview/index.html

21

MSA formats - fasta

22

MSA formats - Aln

23

MSA formats - MSF

24

Example 1a: a good MSA

25

Example 1b: making an MSA of distantly related proteins

26

Example 1c: including more distant relatives in the MSA

27

Example 2: Isopenicillin N Synthase

• Mononuclear iron proteins - electron carrier proteins. Iron atoms are bound to amino acid side chains

• In IPNS the metal ion is coordinated by three protein residues

• IPNS is involved in the biosynthesis of penicillin

[Figure: reaction scheme ACV → Isopenicillin N (labels: Fe+2, ascorbate, O2, 2 H2O), and the Fe active site with ligands His212, Asp214, His268, S-ACV, O2 (NO) and H2O]

28

Research IPNS

• Goal: identify the Fe+2 binding residues

• Possible solutions:
1. In the lab
2. Bioinformatic approach (comparing different IPNS sequences)

29

Step 1

Multiple alignment of known IPNS

Implementation

1. Obtain sequences (e.g. from NCBI):

IPNS AND Bacteria[Organism]

2. MSA (ClustalW) and search for conserved residues in the MSA
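Searching an MSA for conserved residues can be sketched as follows. This is a minimal illustration under simple assumptions: a column counts as conserved only when every sequence holds the same non-gap character, and the toy protein alignment is hypothetical, not real IPNS data.

```python
def conserved_columns(alignment, gap="-"):
    """Return 1-based indices of columns where all rows share one residue.

    Columns made up entirely of gaps are not reported.
    """
    return [index
            for index, column in enumerate(zip(*alignment), start=1)
            if len(set(column)) == 1 and column[0] != gap]


# Hypothetical toy alignment (not real IPNS sequences).
toy_alignment = ["MKHLDA",
                 "MRHIDA",
                 "MKHVDG"]

print(conserved_columns(toy_alignment))
```

When the aligned sequences are all very similar, almost every column passes this test, which is the "not enough variation" problem the following slides run into.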

30

31

32

MSA - bacteria only

Not enough variation

33

MSA - bacteria & fungi

Not enough variation

bacteria & fungi

bacteria

34

Step 2

Goal: add more enzymes similar to IPNS

Implementation
• Search in http://www.expasy.org/tools/blast
• BLAST with IPNS_CEPAC as the query
• Select sequences similar to the query over its entire length
• Export in FASTA format
• Run CLUSTALW
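The selection and export steps can be sketched as below. This is only an illustration: the hit records, the 90% coverage cutoff and the helper names `covers_query` and `to_fasta` are assumptions made for the example, not output or settings of the BLAST server.

```python
def covers_query(q_start, q_end, query_length, min_coverage=0.9):
    """True if the aligned region spans at least min_coverage of the query.

    q_start and q_end are 1-based, inclusive positions on the query.
    """
    return (q_end - q_start + 1) / query_length >= min_coverage


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping at `width`."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical hits: (name, sequence, first and last aligned query position).
query_length = 10
hits = [
    ("hit_full", "MKHLDAQWER", 1, 10),
    ("hit_partial", "HLDA", 3, 6),
]

selected = [(name, seq) for name, seq, start, end in hits
            if covers_query(start, end, query_length)]
print(to_fasta(selected), end="")
```

Only the hit that aligns along the whole query is kept, and the resulting FASTA text is what would be passed on to the alignment program.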

35

• New multiple alignment: narrowing down the possibilities

36

Simple multiple alignment

• The known IPNS sequences are very similar
• Sequences of closely related enzymes are also quite similar
• Not enough variability to characterize the active sites
• We need to obtain even more distant sequences (distant homologs)

37

Step 3

Using the results of the MSA for further searches

Implementation

1. Obtain an MSA (ClustalW)

2. Either construct a consensus sequence and perform a new search,
   or construct a profile and perform a new search

3. MSA (ClustalW) and search for conserved residues in the MSA

38

Consensus Sequence

• We can deduce a consensus sequence from the multiple sequence alignment. The consensus sequence holds the most frequent character of the alignment at each column

• Consensus: each position reflects the most common character found at that position

A T C T T G T
A A C T T G T
A A C T T C T

Consensus:
A A C T T G T
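A consensus can be computed column by column, as in this minimal sketch. The function name `consensus` is an assumption for the example. Ties go to the character that appears first in the column, a choice the slides do not specify.

```python
from collections import Counter


def consensus(alignment):
    """Most frequent character in each column of an alignment.

    Ties are broken by whichever character appears first in the column,
    which is how Counter.most_common orders equal counts.
    """
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*alignment))


# The three aligned sequences from the slide.
alignment = ["ATCTTGT",
             "AACTTGT",
             "AACTTCT"]

print(consensus(alignment))
```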

39

Profile

• We can deduce a statistical model describing the multiple sequence alignment. A profile holds statistical information about the characters in the alignment at each column

• Profile: each position reflects the frequency of each character found at that position

A T C T T G T
A A C T T G T
A A C T T C T

     1     2     3     4     5     6     7
A    1     0.67  0     0     0     0     0
T    0     0.33  0     1     1     0     1
C    0     0     1     0     0     0.33  0
G    0     0     0     0     0     0.67  0
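The profile of the three-sequence alignment on this slide can be computed directly from the column counts. The sketch below is a plain frequency table with no pseudocounts or sequence weighting; the function name `profile` is an assumption for the example.

```python
def profile(alignment, alphabet="ATCG"):
    """Relative frequency of each character in each alignment column.

    Frequencies are divided by the number of rows, so gap characters
    (if any) lower the totals instead of being redistributed.
    """
    n_rows = len(alignment)
    columns = list(zip(*alignment))
    return {ch: [column.count(ch) / n_rows for column in columns]
            for ch in alphabet}


# The three aligned sequences from the slide.
alignment = ["ATCTTGT",
             "AACTTGT",
             "AACTTCT"]

for ch, freqs in profile(alignment).items():
    print(ch, " ".join(f"{f:.2f}" for f in freqs))
```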

40

Profile vs Consensus

• The following multiple alignments will have the same consensus

Alignment 1:
A A C T T G C
A A G T C G T
C A C T T C T

Alignment 2:
A A C T T G T
A A C T T G T
A A C T T C T

Consensus (both alignments):
A A C T T G T

41

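Profile vs Consensus

• But they have different profiles

Alignment 1:
A A C T T G C
A A G T C G T
C A C T T C T

Alignment 2:
A A C T T G T
A A C T T G T
A A C T T C T

Profile of alignment 1:
     1     2     3     4     5     6     7
A    0.67  1     0     0     0     0     0
T    0     0     0     1     0.67  0     0.67
C    0.33  0     0.67  0     0.33  0.33  0.33
G    0     0     0.33  0     0     0.67  0

Profile of alignment 2:
     1     2     3     4     5     6     7
A    1     1     0     0     0     0     0
T    0     0     0     1     1     0     1
C    0     0     1     0     0     0.33  0
G    0     0     0     0     0     0.67  0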
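The point of these two slides can be checked mechanically. The sketch below, with helper names chosen for the example, computes the consensus and the per-column frequencies of both alignments and lists the columns in which the profiles disagree.

```python
from collections import Counter


def consensus(alignment):
    """Most frequent character per column (first seen wins a tie)."""
    return "".join(Counter(col).most_common(1)[0][0]
                   for col in zip(*alignment))


def column_frequencies(alignment):
    """One dict per column mapping each observed character to its frequency."""
    n_rows = len(alignment)
    return [{ch: n / n_rows for ch, n in Counter(col).items()}
            for col in zip(*alignment)]


alignment_1 = ["AACTTGC", "AAGTCGT", "CACTTCT"]
alignment_2 = ["AACTTGT", "AACTTGT", "AACTTCT"]

same_consensus = consensus(alignment_1) == consensus(alignment_2)
differing_columns = [
    index
    for index, (p, q) in enumerate(
        zip(column_frequencies(alignment_1), column_frequencies(alignment_2)),
        start=1)
    if p != q
]
print(same_consensus, differing_columns)
```

The consensus hides the variation in the first alignment, while the profile keeps it, which is why a profile is the more sensitive starting point for a further search.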

42

Sequence LOGO

A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C - C G C - T

A gc c g G C T

http://weblogo.berkeley.edu
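The height of each stack in a sequence logo reflects how conserved the column is, commonly measured as information content in bits. The sketch below uses the basic formula log2(4) minus the Shannon entropy of the column for DNA. It is a simplification: gaps are ignored and no small-sample correction is applied, so the numbers may differ from what a logo tool draws.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=4, gap="-"):
    """Information content of one alignment column, in bits.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    non-gap characters; a column holding only gaps scores 0.
    """
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n)
                   for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


# The six aligned sequences from the slide.
alignment = ["AACCGCTCTT",
             "AGCCGCGC-T",
             "A-CAGAGCCT",
             "AAGCACGC-T",
             "ACGGGTGCTT",
             "ATGC-CGC-T"]

bits = [information_content(column) for column in zip(*alignment)]
print(" ".join(f"{b:.2f}" for b in bits))
```

Fully conserved columns (1, 8 and 10) reach the maximum of 2 bits, while the variable second column scores close to zero.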

43

PSI-BLAST

• Position-Specific Iterated BLAST: an automatic profile-like search

1. Regular BLAST
2. Construct a profile from the BLAST results
3. BLAST profile search (steps 2 and 3 are iterated)
4. Final results
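The iteration can be illustrated with a toy model. This is not how PSI-BLAST is implemented: the sketch assumes ungapped DNA sequences of equal length, a log-odds profile with a pseudocount of 1 against a uniform background, and a fixed score threshold in place of E-values. All names and data are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pssm(seqs, pseudocount=1.0):
    """Per-column log-odds scores (base 2) against a uniform background."""
    total = len(seqs) + pseudocount * len(ALPHABET)
    background = 1.0 / len(ALPHABET)
    pssm = []
    for col in range(len(seqs[0])):
        counts = Counter(seq[col] for seq in seqs)
        pssm.append({ch: math.log2((counts[ch] + pseudocount) / total / background)
                     for ch in ALPHABET})
    return pssm


def score(pssm, seq):
    """Sum of the column scores for an ungapped sequence."""
    return sum(column[ch] for column, ch in zip(pssm, seq))


def iterated_search(query, database, threshold, max_rounds=5):
    """Search, rebuild the profile from the hits, and repeat until stable."""
    hits = [query]
    for _ in range(max_rounds):
        pssm = build_pssm(hits)
        new_hits = [query] + [s for s in database if score(pssm, s) >= threshold]
        if new_hits == hits:
            break
        hits = new_hits
    return hits


# Hypothetical query and database.
query = "AACTTGT"
database = ["AACTTCT", "ATCTTGT", "ATCTTCT", "GGGAAAC"]
print(iterated_search(query, database, threshold=3.0))
```

In this toy run the sequence with two mismatches scores below the threshold against the query alone and is only picked up after the profile has absorbed the two single-mismatch hits, which is the effect that lets an iterated search reach distant homologs.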

44

• Alignment with distantly related proteins

45

Isopenicillin N Synthase

• Experimental evidence supports the finding that His212, Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS

Enzyme      Relative activity   Km (mM)   kcat (min-1)   kcat/Km (mM-1 min-1)
Wild type   100                 0.4       388            969
His48Ala    16                  0.56      75             134
His63Ala    31                  1.0       142            142
His114Ala   28                  0.85      125            147
His124Ala   48                  0.84      321            381
His135Ala   22                  0.59      117            198
His212Ala   <0.007              nd        nd
His268Ala   <0.003              nd        nd
Asp14Ala    5                   0.86      0.56           0.7
Asp113Ala   63                  0.45      238            528
Asp131Ala   68                  0.48      363            755
Asp203Ala   32                  0.91      123            135
Asp214Ala   <0.004              nd        nd
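The last column of the table is the ratio of the two columns before it. The short check below assumes the values are read as Km in mM and kcat in min-1, with decimal points placed as Km 0.56, 1.0, 0.85, 0.45 and 0.91; only a few rows are included, and the dictionary layout is chosen for the example.

```python
# (Km in mM, kcat in min-1, reported kcat/Km in mM-1 min-1), read from the table.
measurements = {
    "His48Ala": (0.56, 75, 134),
    "His63Ala": (1.0, 142, 142),
    "His114Ala": (0.85, 125, 147),
    "Asp113Ala": (0.45, 238, 528),
    "Asp203Ala": (0.91, 123, 135),
}


def catalytic_efficiency(km, kcat):
    """kcat/Km, the usual measure of catalytic efficiency."""
    return kcat / km


for enzyme, (km, kcat, reported) in measurements.items():
    computed = catalytic_efficiency(km, kcat)
    print(f"{enzyme}: computed {computed:.1f}, reported {reported}")
```

The computed ratios agree with the reported ones to within rounding of the Km values. The three mutants with no measurable kinetics (His212Ala, His268Ala, Asp214Ala) are the ones whose relative activity drops below 0.01, matching the residues proposed as Fe+2 ligands.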

46

- IPNS

47

48

Page 21: Multiple Sequence Alignment

21

MSA formats - fasta

22

MSA formats - Aln

23

MSA formats - MSF

24

Example 1a a good MSA

25

Example 1b making MSA of distantly related proteins

26

Example 1c including more distant relatives in the MSA

27

bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains

bull In IPNS the metal ion is coordinated by three protein residues

bull IPNS is involved in biosynthesis of penicillin

Example 2 Isopenicillin N Synthase

N

SN

OCOOHO

H

H

Me

Me

COOH

NH2

N

SN

OCOOH

NH2 Me

Me

COOHO

ACV

Isopenicillin N

Fe+2

Ascorbate

O2

2H2O Fe

N

NHis268

H

N

NHis212

H

SACV

O2 (NO)

H2O

OAsp214

O

28

Research IPNS

bull Goal Identify Fe+2 binding residues

bull Possible solutions1 In the lab

2 Bioinformatic approach (comparing different IPNS sequences)

29

Step 1

Multiple alignment of known IPNS

Implementation

1 Obtain sequence (eg for MCBI)

IPNS AND Bacteria[Organism]

2 MSA (clustalw) and search for conserved residues in the MSA

30

31

32

MSA ndash bacteria only

Not enough variation

33

MSA ndash bacteria amp fungi

Not enough variation

bacteria amp fungi

bacteria

34

Step 2

Goal Add more enzymes similar to IPNS

ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW

35

bull New multiple alignment narrowing down the possibilities

36

Simple multiple alignment

• The known IPNS sequences are very similar
• Sequences of closely related enzymes are also quite similar
• Not enough variability to identify the active-site residues
• We need to obtain even more distant sequences (distant homologs)

37

Step 3

Using the results of the MSA for further searches

Implementation

1. Obtain an MSA (ClustalW)
2. Either construct a consensus sequence and perform a new search,
   or construct a profile and perform a new search
3. Build a new MSA (ClustalW) and search it for conserved residues

38

Consensus Sequence

• We can deduce a consensus sequence from the multiple sequence alignment. The consensus sequence holds the most frequent character of the alignment at each column
• Consensus: each position reflects the most common character found at that position

A T C T T G T
A A C T T G T
A A C T T C T

Consensus: A A C T T G T
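The consensus rule above (most frequent character per column) is short enough to write out. A minimal sketch, assuming the aligned sequences are equal-length strings; the function name `consensus` is chosen for this example, and gap characters, if present, are simply counted like any other character.

```python
from collections import Counter


def consensus(alignment):
    """Return the most frequent character of each alignment column."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*alignment)
    )


rows = ["ATCTTGT", "AACTTGT", "AACTTCT"]
print(consensus(rows))  # AACTTGT
```

When two characters tie in a column, this sketch picks the one seen first in that column; a real tool may instead report an ambiguity code, so the tie-breaking here is an assumption.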

39

Profile

• We can deduce a statistical model describing the multiple sequence alignment. A profile holds statistical information about the characters at each column of the alignment
• Profile: each position reflects the frequency of each character found at that position

A T C T T G T
A A C T T G T
A A C T T C T

Position  1  2     3  4  5  6     7
A         1  0.67  0  0  0  0     0
T         0  0.33  0  1  1  0     1
C         0  0     1  0  0  0.33  0
G         0  0     0  0  0  0.67  0
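A frequency profile of this kind can be computed directly from the aligned rows. The sketch below is one possible layout, not the format used by any particular tool: the function name `profile` and the dictionary-of-lists result are assumptions made for this example, and the rows are the three-sequence alignment from this slide.

```python
from collections import Counter


def profile(alignment, alphabet="ATCG"):
    """Return {character: [frequency in column 1, column 2, ...]}.

    Frequencies are plain fractions of the rows in each column; no
    pseudocounts or sequence weighting are applied.
    """
    n_rows = len(alignment)
    table = {ch: [] for ch in alphabet}
    for column in zip(*alignment):
        counts = Counter(column)
        for ch in alphabet:
            table[ch].append(counts[ch] / n_rows)
    return table


rows = ["ATCTTGT", "AACTTGT", "AACTTCT"]
for ch, freqs in profile(rows).items():
    print(ch, " ".join(f"{f:.2f}" for f in freqs))
```

Each column of the result sums to 1 as long as every character in the alignment belongs to `alphabet`; gaps or other symbols would need their own entry.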

40

Profile vs Consensus

• The following multiple alignments have the same consensus

Alignment 1:
A A C T T G C
A A G T C G T
C A C T T C T

Alignment 2:
A A C T T G T
A A C T T G T
A A C T T C T

Consensus (both): A A C T T G T

41

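Profile vs Consensus

• But they have different profiles

Alignment 1:
A A C T T G C
A A G T C G T
C A C T T C T

Position  1     2  3     4  5     6     7
A         0.67  1  0     0  0     0     0
T         0     0  0     1  0.67  0     0.67
C         0.33  0  0.67  0  0.33  0.33  0.33
G         0     0  0.33  0  0     0.67  0

Alignment 2:
A A C T T G T
A A C T T G T
A A C T T C T

Position  1  2  3  4  5  6     7
A         1  1  0  0  0  0     0
T         0  0  0  1  1  0     1
C         0  0  1  0  0  0.33  0
G         0  0  0  0  0  0.67  0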
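The point of this comparison can be checked mechanically: two alignments may agree on every consensus character while their column counts differ. A small sketch using the two alignments from this slide; the helper names `column_counts` and `consensus` are chosen for this example.

```python
from collections import Counter


def column_counts(alignment):
    """Count the characters in each column of an equal-length alignment."""
    return [Counter(column) for column in zip(*alignment)]


def consensus(alignment):
    """Most frequent character per column."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(alignment))


first = ["AACTTGC", "AAGTCGT", "CACTTCT"]
second = ["AACTTGT", "AACTTGT", "AACTTCT"]

# Same consensus string for both alignments ...
print(consensus(first), consensus(first) == consensus(second))

# ... but the per-column composition differs at several positions (1-based).
differing = [
    position
    for position, (a, b) in enumerate(
        zip(column_counts(first), column_counts(second)), start=1
    )
    if a != b
]
print(differing)
```

A search with the consensus string treats both alignments identically, whereas a profile search can use the extra variation recorded at the differing positions.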

42

Sequence LOGO

A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C - C G C - T

[Figure: sequence logo generated from the alignment above]

http://weblogo.berkeley.edu
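The quantity a sequence logo displays, the information content of each column in bits, can be approximated as follows. This is a simplified sketch: gaps are ignored, a four-letter DNA alphabet is assumed, and the small-sample correction that logo programs may apply is omitted. The function name `information_content` is chosen for this example.

```python
import math
from collections import Counter


def information_content(alignment, alphabet_size=4):
    """Return the information content (bits) of each alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    non-gap characters in the column.
    """
    bits = []
    for column in zip(*alignment):
        residues = [ch for ch in column if ch != "-"]
        if not residues:
            bits.append(0.0)  # nothing to measure in an all-gap column
            continue
        total = len(residues)
        entropy = -sum(
            (n / total) * math.log2(n / total)
            for n in Counter(residues).values()
        )
        bits.append(math.log2(alphabet_size) - entropy)
    return bits


rows = [
    "AACCGCTCTT",
    "AGCCGCGC-T",
    "A-CAGAGCCT",
    "AAGCACGC-T",
    "ACGGGTGCTT",
    "ATGC-CGC-T",
]
print([round(b, 2) for b in information_content(rows)])
```

Fully conserved columns (the first, eighth, and tenth here) reach the maximum of 2 bits and would appear as the tallest letters in the logo, while mixed columns score lower.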

43

PSI-BLAST

• Position-Specific Iterated BLAST: an automatic profile-like search

1. Regular BLAST
2. Construct a profile from the BLAST results
3. BLAST search with the profile (steps 2 and 3 are repeated)
4. Final results
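The loop in this flow chart can be imitated on toy data. The sketch below is not PSI-BLAST: it assumes ungapped sequences of equal length, a uniform background, a simple pseudocount, and a fixed score threshold, and all names (`build_pssm`, `score`, `iterated_search`) and the toy database are made up for illustration. It only shows how rebuilding the profile from the current hits can pull in a more distant sequence that the first round misses.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pssm(sequences, pseudocount=1.0):
    """Log-odds score per column and character against a uniform background."""
    background = 1.0 / len(ALPHABET)
    pssm = []
    for column in zip(*sequences):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append(
            {
                ch: math.log2(((counts[ch] + pseudocount) / total) / background)
                for ch in ALPHABET
            }
        )
    return pssm


def score(pssm, sequence):
    """Sum of the column scores for an equal-length, ungapped sequence."""
    return sum(column[ch] for column, ch in zip(pssm, sequence))


def iterated_search(query, database, threshold, max_rounds=5):
    """Repeat: build a profile from the hits, rescan, stop when hits are stable."""
    hits = [query]
    for _ in range(max_rounds):
        pssm = build_pssm(hits)
        new_hits = [query] + [s for s in database if score(pssm, s) >= threshold]
        if set(new_hits) == set(hits):
            break
        hits = new_hits
    return hits[1:]  # database hits only, without the query itself


database = ["AACTTCT", "AAGTCGT", "CAGTCCT", "GGGAACA"]
print(iterated_search("AACTTGT", database, threshold=2.5, max_rounds=1))
print(iterated_search("AACTTGT", database, threshold=2.5))
```

With a single round only the two closest sequences pass the threshold; once they are folded into the profile, the more divergent `CAGTCCT` also scores above it, while the unrelated `GGGAACA` never does.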

44

• Alignment with distantly related proteins

45

• Experimental evidence supports the finding that His212, Asp214, and His268 are the endogenous ligands that bind Fe²⁺ in IPNS

Enzyme      Relative activity   Km (mM)   kcat (min⁻¹)   kcat/Km (mM⁻¹ min⁻¹)
Wild type   100                 0.4       388            969
His48Ala    16                  0.56      75             134
His63Ala    31                  1.0       142            142
His114Ala   28                  0.85      125            147
His124Ala   48                  0.84      321            381
His135Ala   22                  0.59      117            198
His212Ala   <0.007              nd        nd
His268Ala   <0.003              nd        nd
Asp14Ala    5                   0.86      0.56           0.7
Asp113Ala   63                  0.45      238            528
Asp131Ala   68                  0.48      363            755
Asp203Ala   32                  0.91      123            135
Asp214Ala   <0.004              nd        nd

Isopenicillin N Synthase

46

– IPNS

47

48

Page 27: Multiple Sequence Alignment

27

bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains

bull In IPNS the metal ion is coordinated by three protein residues

bull IPNS is involved in biosynthesis of penicillin

Example 2 Isopenicillin N Synthase

N

SN

OCOOHO

H

H

Me

Me

COOH

NH2

N

SN

OCOOH

NH2 Me

Me

COOHO

ACV

Isopenicillin N

Fe+2

Ascorbate

O2

2H2O Fe

N

NHis268

H

N

NHis212

H

SACV

O2 (NO)

H2O

OAsp214

O

28

Research IPNS

bull Goal Identify Fe+2 binding residues

bull Possible solutions1 In the lab

2 Bioinformatic approach (comparing different IPNS sequences)

29

Step 1

Multiple alignment of known IPNS

Implementation

1 Obtain sequence (eg for MCBI)

IPNS AND Bacteria[Organism]

2 MSA (clustalw) and search for conserved residues in the MSA

30

31

32

MSA ndash bacteria only

Not enough variation

33

MSA ndash bacteria amp fungi

Not enough variation

bacteria amp fungi

bacteria

34

Step 2

Goal Add more enzymes similar to IPNS

ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW

35

bull New multiple alignment narrowing down the possibilities

36

Simple multiple alignment

bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite

similarbull Not enough variability to categorize the active

sitesbull We need to obtain even more distant sequences

(distant homologs)

37

Step 3

Using the results of the MSA for further searches

Implementation

1 Obtain an MSA (clustalw)

2 - Construct a consensus sequence and perform a

new search

OR

- Construct a profile and perform a new search

3 MSA (clustalw) and search for conserved residues in the MSA

38

Consensus Sequence

bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column

bull Consensus each position reflects the most common character found at a position

A T C T T G T

A A C T T G T

A A C T T C T

A A C T T G T

39

Profilebull We can deduce a statistical model describing

the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column

bull Profile each position reflects the frequency of the character found at a position

A T C T T G T

A A C T T G T

A A C T T C T

1 2 3 4 5 6

A 1 067 0 0

T 0 033 1 1

C 0 0 0 0

G 0 0 0 0

40

Profile vs Consensusbull The following multiple alignments will

have the same consensus

A A C T T G C

A A G T C G T

C A C T T C T

A A C T T G T

A A C T T G T

A A C T T C T

A A C T T G T

41

Profile vs Consensusbull But have a different profile

A A C T T G C

A A G T C G T

C A C T T C T

A A C T T G T

A A C T T G T

A A C T T C T

1 2 3 4 5 6

A 066 1 0 0

T 0 0 0 1

C 033 0 066

0

G 0 0 033

0

1 2 3 4 5 6

A 1 1 0 0

T 0 0 0 1

C 0 0 1 0

G 0 0 0 0

42

Sequence LOGO A A C C G C T C T T

A G C C G C G C - T

A - C A G A G C C T

A A G C A C G C - T

A C G G G T G C T T

A T G C ndash C G C - T

A gc c g G C T

httpweblogoberkeleyedu

43

PSI-BLAST

• Position-Specific Iterated BLAST: an automatic profile-like search

1. Regular BLAST
2. Construct a profile from the BLAST results
3. BLAST profile search
4. Final results
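The search, build profile, search again cycle can be illustrated with a toy model. This is not how PSI-BLAST is implemented: the real program builds a position-specific scoring matrix from gapped alignments and selects hits by E-value. The sketch below assumes ungapped DNA sequences of equal length, pseudocount-smoothed frequencies, and an arbitrary score threshold; every name in it is hypothetical.

```python
def column_frequencies(seqs, alphabet="ACGT", pseudocount=1.0):
    """Per-column character frequencies, smoothed with a pseudocount."""
    table = []
    for column in zip(*seqs):
        counts = {ch: pseudocount + column.count(ch) for ch in alphabet}
        total = sum(counts.values())
        table.append({ch: c / total for ch, c in counts.items()})
    return table


def profile_score(table, seq):
    """Average probability the profile assigns to the sequence's characters."""
    return sum(col[ch] for col, ch in zip(table, seq)) / len(seq)


def iterated_search(query, database, threshold=0.35, max_rounds=5):
    """Repeat: build a profile from the current hits, then rescan the database."""
    hits = [query]
    for _ in range(max_rounds):
        table = column_frequencies(hits)
        new_hits = [query] + [
            s for s in database
            if s != query and profile_score(table, s) >= threshold
        ]
        if set(new_hits) == set(hits):
            break  # converged: this round found nothing new
        hits = new_hits
    return hits


database = ["AACTTCT", "AACTACT", "GGGGGGG"]
print(iterated_search("AACTTGT", database, max_rounds=1))
print(iterated_search("AACTTGT", database))
```

With these toy settings the first round finds only the closest sequence; the profile built from that hit then also recovers the more distant `AACTACT`, which mirrors why an iterated profile search can reach distant homologs that a single search misses.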

44

• Alignment with distantly related proteins

45

Isopenicillin N Synthase

• Experimental evidence supports the finding that His212, Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS

Enzyme      Relative activity   Km (mM)   kcat (min-1)   kcat/Km (mM-1 min-1)
Wild type   100                 0.4       38.8           96.9
His48Ala    16                  0.56      7.5            13.4
His63Ala    31                  1.0       14.2           14.2
His114Ala   28                  0.85      12.5           14.7
His124Ala   48                  0.84      32.1           38.1
His135Ala   22                  0.59      11.7           19.8
His212Ala   <0.007              nd        nd
His268Ala   <0.003              nd        nd
Asp14Ala    5                   0.86      0.56           0.7
Asp113Ala   63                  0.45      23.8           52.8
Asp131Ala   68                  0.48      36.3           75.5
Asp203Ala   32                  0.91      12.3           13.5
Asp214Ala   <0.004              nd        nd
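The last column of the kinetics table, kcat/Km, is the catalytic efficiency: the turnover number divided by the Michaelis constant. The sketch below recomputes it for two mutants. The numeric values assume the table's decimal points are read as Km = 0.56 mM, kcat = 7.5 min-1 for His48Ala and Km = 0.85 mM, kcat = 12.5 min-1 for His114Ala; the function name is chosen for the example.

```python
def catalytic_efficiency(kcat_per_min, km_mM):
    """Return kcat/Km in mM^-1 min^-1."""
    if km_mM <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_min / km_mM


# name: (Km in mM, kcat in min^-1); decimal placement is an assumption.
mutants = {
    "His48Ala": (0.56, 7.5),
    "His114Ala": (0.85, 12.5),
}
for name, (km, kcat) in mutants.items():
    print(name, round(catalytic_efficiency(kcat, km), 1))
```

Under that reading the computed ratios agree with the tabulated kcat/Km values to one decimal place, which is a useful consistency check on the table.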

46

- IPNS

47

48

Page 35: Multiple Sequence Alignment

35

bull New multiple alignment narrowing down the possibilities

36

Simple multiple alignment

bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite

similarbull Not enough variability to categorize the active

sitesbull We need to obtain even more distant sequences

(distant homologs)

37

Step 3

Using the results of the MSA for further searches

Implementation

1 Obtain an MSA (clustalw)

2 - Construct a consensus sequence and perform a

new search

OR

- Construct a profile and perform a new search

3 MSA (clustalw) and search for conserved residues in the MSA

38

Consensus Sequence

bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column

bull Consensus each position reflects the most common character found at a position

A T C T T G T

A A C T T G T

A A C T T C T

A A C T T G T

39

Profile

• We can deduce a statistical model describing the multiple sequence alignment. A profile holds statistical information about the characters in the alignment at each column.

• Profile: each position reflects the frequency of each character found at that position.

A T C T T G T
A A C T T G T
A A C T T C T

Position  1     2     3     4
A         1     0.67  0     0
T         0     0.33  0     1
C         0     0     1     0
G         0     0     0     0
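A profile for the three-sequence alignment on this slide can be computed as follows. This is a minimal sketch: the function name `profile` and the rounding to two decimals are assumptions, and gap characters (none here) would simply count toward the column total.

```python
def profile(alignment, alphabet="ATCG"):
    """Column-wise frequencies: {character: [frequency at column 1, 2, ...]}."""
    n = len(alignment)
    return {
        ch: [round(column.count(ch) / n, 2) for column in zip(*alignment)]
        for ch in alphabet
    }

rows = ["ATCTTGT", "AACTTGT", "AACTTCT"]
for ch, freqs in profile(rows).items():
    print(ch, freqs)
```

Each column sums to 1 (up to rounding). Position 2 gives A = 0.67 and T = 0.33, while position 6 gives G = 0.67 and C = 0.33.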

40

Profile vs Consensus

• The following multiple alignments will have the same consensus:

Alignment 1:
A A C T T G C
A A G T C G T
C A C T T C T

Alignment 2:
A A C T T G T
A A C T T G T
A A C T T C T

Consensus (both alignments):
A A C T T G T

41

Profile vs Consensus

• But they have different profiles:

Alignment 1:
A A C T T G C
A A G T C G T
C A C T T C T

Alignment 2:
A A C T T G T
A A C T T G T
A A C T T C T

Profile of alignment 1:
Position  1     2     3     4
A         0.66  1     0     0
T         0     0     0     1
C         0.33  0     0.66  0
G         0     0     0.33  0

Profile of alignment 2:
Position  1     2     3     4
A         1     1     0     0
T         0     0     0     1
C         0     0     1     0
G         0     0     0     0
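The difference between the two representations can be checked directly. The sketch below builds both profiles, confirms the shared consensus, and scores a query against each profile. The scoring rule (product of per-column frequencies, no pseudocounts) and the query `CAGTCGC` are illustrative assumptions, not part of the slides.

```python
from collections import Counter
from fractions import Fraction

def column_freqs(alignment):
    """One {character: frequency} dict per alignment column."""
    n = len(alignment)
    return [{ch: Fraction(k, n) for ch, k in Counter(col).items()}
            for col in zip(*alignment)]

def consensus(freqs):
    # Most frequent character per column; ties broken alphabetically (assumed rule).
    return "".join(min(f, key=lambda ch: (-f[ch], ch)) for f in freqs)

def match_score(freqs, query):
    # Toy score: product of the frequencies of the query's characters, column by column.
    score = Fraction(1)
    for f, ch in zip(freqs, query):
        score *= f.get(ch, Fraction(0))
    return score

aln1 = ["AACTTGC", "AAGTCGT", "CACTTCT"]
aln2 = ["AACTTGT", "AACTTGT", "AACTTCT"]
p1, p2 = column_freqs(aln1), column_freqs(aln2)

print(consensus(p1), consensus(p2))   # same consensus for both alignments
print(match_score(p1, "CAGTCGC"))     # non-zero: every character was seen in its column
print(match_score(p2, "CAGTCGC"))     # zero: alignment 2 never has C at position 1
```

A consensus-based search treats both families identically, whereas the profile of alignment 1 still gives some weight to variants such as C at position 1 or G at position 3.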

42

Sequence LOGO

A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C - C G C - T

A gc c g G C T

http://weblogo.berkeley.edu

43

PSI-BLAST

• Position-Specific Iterated BLAST: an automatic profile-like search

1. Regular BLAST
2. Construct profile from BLAST results
3. BLAST profile search
4. Final results
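The search, build profile, search again loop can be illustrated with a toy example. This is not BLAST: it assumes ungapped, equal-length DNA strings, and the function names, pseudocount, and both cutoffs are hypothetical values chosen for the demonstration.

```python
def identity(a, b):
    """Fraction of positions where two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def build_profile(seqs, alphabet="ACGT", pseudo=1.0):
    """Per-column character frequencies, smoothed with pseudocounts."""
    n = len(seqs)
    return [{ch: (col.count(ch) + pseudo) / (n + pseudo * len(alphabet))
             for ch in alphabet} for col in zip(*seqs)]

def profile_score(profile, seq):
    """Average per-column frequency of the sequence's characters."""
    return sum(col[ch] for col, ch in zip(profile, seq)) / len(seq)

def iterated_search(query, database, seed_cutoff=0.7, profile_cutoff=0.45, max_rounds=5):
    # Step 1: "regular" search with the query alone.
    hits = {s for s in database if identity(query, s) >= seed_cutoff}
    for _ in range(max_rounds):
        # Step 2: construct a profile from the query plus the current hits.
        profile = build_profile([query, *sorted(hits)])
        # Step 3: search the database again with the profile; keep earlier hits.
        new_hits = hits | {s for s in database
                           if profile_score(profile, s) >= profile_cutoff}
        if new_hits == hits:   # no new sequences found: converged
            break
        hits = new_hits
    return hits                # Step 4: final results

query = "AACTTGTA"
database = ["AACTTGTC", "AAGTTGTC", "CAGTTGTC", "GGGGCCCC"]
print(sorted(iterated_search(query, database)))
```

In this toy database the first round finds only the two closest sequences. The profile built from them tolerates G at position 3 and C at position 8, so the more distant `CAGTTGTC` is picked up in the second round, while the unrelated `GGGGCCCC` is never included.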

44

• Alignment with distantly related proteins

45

Isopenicillin N Synthase

• Experimental evidence supports the finding that His212, Asp214, and His268 are the endogenous ligands that bind Fe2+ in IPNS

Enzyme      Relative   Km     kcat      kcat/Km
            activity   (mM)   (min-1)   (mM-1 min-1)
Wild type   100        0.4    388       969
His48Ala    16         0.56   75        134
His63Ala    31         1.0    142       142
His114Ala   28         0.85   125       147
His124Ala   48         0.84   321       381
His135Ala   22         0.59   117       198
His212Ala   <0.007     nd     nd
His268Ala   <0.003     nd     nd
Asp14Ala    5          0.86   0.56      0.7
Asp113Ala   63         0.45   238       528
Asp131Ala   68         0.48   363       755
Asp203Ala   32         0.91   123       135
Asp214Ala   <0.004     nd     nd
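The last column of the table is the catalytic efficiency kcat/Km. As a sanity check, it can be recomputed from the other two kinetic columns. The values below read the table's Km and kcat entries with decimal points restored (for example Km = 0.4 mM for the wild type), which is an assumption about the original slide. The mutants marked nd are omitted.

```python
# name: (Km in mM, kcat in min^-1, reported kcat/Km in mM^-1 min^-1)
kinetics = {
    "Wild type": (0.40, 388, 969),
    "His48Ala":  (0.56, 75, 134),
    "His63Ala":  (1.00, 142, 142),
    "His114Ala": (0.85, 125, 147),
    "His124Ala": (0.84, 321, 381),
    "His135Ala": (0.59, 117, 198),
    "Asp14Ala":  (0.86, 0.56, 0.7),
    "Asp113Ala": (0.45, 238, 528),
    "Asp131Ala": (0.48, 363, 755),
    "Asp203Ala": (0.91, 123, 135),
}

def efficiency(km, kcat):
    """Catalytic efficiency kcat/Km."""
    return kcat / km

for name, (km, kcat, reported) in kinetics.items():
    print(f"{name:10s} computed {efficiency(km, kcat):7.1f}   reported {reported}")
```

The recomputed ratios agree with the reported column to within rounding, which supports this reading of the Km and kcat values.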

46

- IPNS

47

48
