1
Multiple Sequence Alignment
Lesson 5
2
Example
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWWSNG--
3
Why multiple sequence alignment?
• Structural similarity: amino acids (aa) that play the same role in each structure are in the same column
• Evolutionary similarity: aa descended from the same ancestral residue are in the same column
• Functional similarity: aa with the same function are in the same column
• Sequence similarity: the alignment with maximal similarity; it has no inherent biological meaning
• When the sequences are closely related, the structural, evolutionary, and functional alignments are equivalent
4
Why multiple alignment: an example
• Histones: small, abundant proteins present in all eukaryotic chromosomes
• They show a remarkably conserved multiple sequence alignment
• Conservation of structure and function (they aid in DNA packaging)
5
MSA applications
• Generating protein families
• Extrapolation: assigning an uncharacterized sequence to a protein family
• Understanding evolution: a preliminary step in molecular evolution analysis for constructing phylogenetic trees (e.g., is the duck evolutionarily closer to a lion or to a fruit fly?)
• Pattern identification: finding the important (conserved) regions in the protein; conserved positions may characterize a function
• Domain identification: building a consensus/profile/motif that describes the protein family and helps to describe new members of the family
• DNA regulatory elements
• Structure prediction (secondary structure and 3D model)
6
Multiple sequence alignment
The pairwise solution might be very different from the multiple solution
7
Can we simply generalize the dynamic programming method? In theory yes; in practice no.
Both the running time and the memory grow on the order of N^K, where N is the sequence length and K is the number of sequences.
For K > 3 this is practically impossible.
The computational problem: given several sequences, find an alignment of them into a common frame (by inserting gaps) such that a distance function attains an optimal value.
Input sequences:
SCGPFIRV
MSCGPGLRA
SCTPHLA
MSCPKIRG
MSLPLLRN
MSHKPALRA
Alignment:
-SCGPFIRV
MSCGPGLRA
-SCTPHL-A
MSC-PKIRG
MS-LPLLRN
MSHKPALRA
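The growth can be made concrete with a little arithmetic. In a K-dimensional dynamic-programming table there is one index 0..N per sequence, and each cell is reached from up to 2^K - 1 neighbouring cells (every non-empty subset of the sequences advances by one letter while the rest receive a gap). The sketch below only counts cells and predecessor lookups; it does not align anything, and the sequence length of 300 is an arbitrary illustrative choice.

```python
def dp_table_size(n: int, k: int) -> int:
    """Cells in the K-dimensional DP table: one index 0..N per sequence."""
    return (n + 1) ** k


def dp_predecessor_lookups(n: int, k: int) -> int:
    """Each cell is reached from 2^K - 1 neighbours (any non-empty subset of the
    K sequences advances by one letter, the others receive a gap)."""
    return dp_table_size(n, k) * (2 ** k - 1)


for k in range(2, 7):
    print(f"K={k}: {dp_table_size(300, k):.2e} cells, "
          f"{dp_predecessor_lookups(300, k):.2e} predecessor lookups")
```

Each extra sequence multiplies the table by roughly N and the work per cell by roughly 2, which is why exact alignment stops being practical after a handful of sequences.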
8
Since the computational problem is hard, we know that no method we propose will guarantee an optimal solution; we will look for a method that gives good results in most cases.
The method we choose may depend on the reason we are performing the alignment.
The problems we have to answer
1. Choosing the sequences
2. Scoring metrics
3. Approximation/heuristic algorithms
4. MSA formats
5. Interpreting the MSA
9
Scoring metrics
• Distance from consensus: the consensus of an alignment is a string of the most common characters in each column of the alignment. The total distance is defined as the number of characters that differ from the consensus character of their column. Letting C be the consensus sequence, the total distance is $\sum_i D(S_i, C)$.
• Evolutionary tree alignment: the weight of the lightest evolutionary tree that can be constructed from the sequences, where the weight of a tree is the number of changes between pairs of sequences that correspond to adjacent nodes in the tree, summed over all such pairs.
• Sum of pairs: the sum of pairwise distances between all pairs of sequences, $\sum_{i<j} D(S_i, S_j)$.
10
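Scoring metrics: examples
Sum of pairs (number of identical residues in each pair)
-SCGPFIRV
MSCGPGLRA
-SCTPHL-A
Pair 1-2: -SCGPFIRV / MSCGPGLRA: 5
Pair 1-3: -SCGPFIRV / -SCTPHL-A: 3
Pair 2-3: MSCGPGLRA / -SCTPHL-A: 5
Sum of pairs: 5 + 3 + 5 = 13
Distance from consensus
-SCGPFIRV
MSCGPGLRA
-SCTPHL-A
MSC-PKIRG
MS-LPLLRN
MSHKPALRA
Homogeneity (count of the consensus character) per column: 4 6 4 2 6 1 4 5 3; total homogeneity: 35
Distance (characters that differ from the consensus) per column: 2 0 2 4 0 5 2 1 3; total distance: 19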
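Both example scores can be recomputed directly from the rows. This is a minimal sketch assuming, as in the example, that "sum of pairs" counts identical non-gap residues (a similarity; a distance version would count mismatches instead) and that gaps are ordinary characters when counting agreement with the column consensus. It should reproduce the pair values 5, 3, 5 (total 13) and the total distance 19.

```python
from collections import Counter
from itertools import combinations

GAP = "-"


def identities(a: str, b: str) -> int:
    """Columns where both rows carry the same residue; gap-gap columns are not counted."""
    return sum(1 for x, y in zip(a, b) if x == y and x != GAP)


def sum_of_pairs(rows: list[str]) -> int:
    """Sum over all pairs i < j of the pairwise identity count."""
    return sum(identities(a, b) for a, b in combinations(rows, 2))


def consensus_counts(rows: list[str]) -> tuple[list[int], list[int]]:
    """Per column: count of the most common character (homogeneity) and
    the number of characters that differ from it (distance from consensus)."""
    homogeneity, distance = [], []
    for column in zip(*rows):
        _, count = Counter(column).most_common(1)[0]
        homogeneity.append(count)
        distance.append(len(column) - count)
    return homogeneity, distance


three = ["-SCGPFIRV", "MSCGPGLRA", "-SCTPHL-A"]
six = three + ["MSC-PKIRG", "MS-LPLLRN", "MSHKPALRA"]

print([identities(a, b) for a, b in combinations(three, 2)], sum_of_pairs(three))
h, d = consensus_counts(six)
print(h, sum(h))
print(d, sum(d))
```

Homogeneity and distance always add up to the number of rows in each column, so the two totals sum to 6 × 9 = 54.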
11
MSA algorithms
• Progressive methods (CLUSTALW, T-Coffee)
• Iterative methods (Dialign)
• Direct optimization (Monte Carlo, genetic algorithms)
• Local methods (eMotifs, Blocks, PSI-BLAST)
12
CLUSTALW algorithm
• Compare all sequence pairs (pairwise alignment)
• Generate a hierarchy for the alignment (guide tree)
• Build the multiple alignment step by step according to the guide tree: first align the most similar pair, then add another sequence or another pairwise alignment, and so on
13
CLUSTALW algorithm
(1) Pairwise alignment (to prepare a guide tree): 6 pairwise alignments, then cluster analysis
(2) Multiple alignment following the tree from (1): successive alignments
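The first stage needs one comparison per unordered pair, n(n - 1)/2 in total, which is 6 for four sequences. The sketch below fills that table with plain Levenshtein edit distance as a stand-in; ClustalW itself derives its distances from pairwise alignment scores, so treat this as an illustration of the stage, not of ClustalW's scoring. The labels s1 to s4 are hypothetical; the sequences are four of the unaligned examples above.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: unit cost for substitution, insertion and deletion."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # match or substitute
        prev = curr
    return prev[-1]


def distance_table(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Edit distance for every unordered pair: n * (n - 1) / 2 comparisons."""
    return {(p, q): edit_distance(seqs[p], seqs[q]) for p, q in combinations(seqs, 2)}


# Hypothetical labels for four of the example sequences.
seqs = {"s1": "SCGPFIRV", "s2": "MSCGPGLRA", "s3": "SCTPHLA", "s4": "MSCPKIRG"}
for pair, d in distance_table(seqs).items():
    print(pair, d)
```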
14
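CLUSTALW algorithm
1. Build a table of the edit distance between every two sequences.
2. Build a tree by joining the most similar sequences, in order of their similarity.
(Figure: guide tree over Hbb Human, Hbb Horse, Hba Human, Hba Horse, and Myg Whale)
3. Build the alignment in the order dictated by the tree. There are three cases:
• aligning a pair of sequences
• aligning two alignments
• adding a single sequence to an existing alignment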
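Step 2 is an agglomerative clustering: join the closest pair, then treat the pair as a single cluster. The sketch below uses UPGMA (size-weighted average distances) because it is short; ClustalW's default guide tree is built by neighbour-joining, so this is a simpler substitute that shows the "most similar first" order, not ClustalW's exact tree. The toy distances are made up solely to exercise the code.

```python
from itertools import combinations


def upgma(leaves, dist):
    """Guide tree by UPGMA: repeatedly join the two closest clusters and set the new
    cluster's distance to every other cluster to the size-weighted average.
    leaves: list of names; dist: {frozenset({a, b}): distance} for every leaf pair.
    Returns (tree as nested tuples, list of joins in the order they were made)."""
    d = dict(dist)
    size = {leaf: 1 for leaf in leaves}
    active = list(leaves)
    joins = []
    while len(active) > 1:
        x, y = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        new = (x, y)
        joins.append(new)
        rest = [z for z in active if z != x and z != y]
        for z in rest:
            d[frozenset((new, z))] = (size[x] * d[frozenset((x, z))]
                                      + size[y] * d[frozenset((y, z))]) / (size[x] + size[y])
        size[new] = size[x] + size[y]
        active = rest + [new]
    return active[0], joins


# Toy distances chosen only to exercise the code (not measured data).
toy = {frozenset(p): v for p, v in [
    (("s1", "s2"), 2), (("s3", "s4"), 4),
    (("s1", "s3"), 10), (("s1", "s4"), 10), (("s2", "s3"), 10), (("s2", "s4"), 10)]}
tree, joins = upgma(["s1", "s2", "s3", "s4"], toy)
print(tree)   # (('s1', 's2'), ('s3', 's4'))
```

The join order returned in `joins` is the order in which step 3 would align sequences or sub-alignments.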
15
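Two alignments (or a sequence and an alignment) can be aligned to each other by a fairly natural extension of the dynamic programming method: insertions and deletions are handled as usual, the cost of matching two columns is the average cost over the different letter combinations, and gaps need a special cost.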
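A minimal sketch of that extension, assuming unit mismatch cost, a constant gap cost, and no sequence weights or affine gap penalties (all simplifications relative to ClustalW). Each DP cell compares whole columns, and a column cost is the average over every pairing of a letter from one column with a letter from the other. Existing gaps are never removed; new all-gap columns are inserted where the DP chooses.

```python
from itertools import product

GAP = "-"


def pair_cost(x: str, y: str, gap: float) -> float:
    """0 for identical letters (or two gaps), `gap` for letter vs gap, 1 for a mismatch."""
    if x == y:
        return 0.0
    if GAP in (x, y):
        return gap
    return 1.0


def column_cost(col_a, col_b, gap: float) -> float:
    """Average pair cost over every combination of one letter from each column."""
    pairs = list(product(col_a, col_b))
    return sum(pair_cost(x, y, gap) for x, y in pairs) / len(pairs)


def align_alignments(a_rows, b_rows, gap: float = 1.0):
    """Global DP alignment of two alignments (a single sequence is a one-row alignment)."""
    cols_a, cols_b = list(zip(*a_rows)), list(zip(*b_rows))
    gap_a, gap_b = (GAP,) * len(a_rows), (GAP,) * len(b_rows)
    n, m = len(cols_a), len(cols_b)

    def diag(i, j):
        return column_cost(cols_a[i - 1], cols_b[j - 1], gap)

    def up(i):
        return column_cost(cols_a[i - 1], gap_b, gap)

    def left(j):
        return column_cost(gap_a, cols_b[j - 1], gap)

    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + up(i)
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + left(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + diag(i, j), D[i - 1][j] + up(i), D[i][j - 1] + left(j))

    merged, i, j = [], n, m  # trace back from the bottom-right corner
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + diag(i, j):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + up(i):
            merged.append(cols_a[i - 1] + gap_b)
            i -= 1
        else:
            merged.append(gap_a + cols_b[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)]


print(*align_alignments(["-SCGPFIRV", "MSCGPGLRA"], ["SCTPHLA"]), sep="\n")
```

Because the columns of each input alignment are only ever kept together or padded with new gap columns, a gap placed in an early step survives to the end, which is the "once a gap, always a gap" behaviour.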
16
CLUSTALW algorithm
Fine points
• Sequences are given different weights, so that when there are several very similar sequences, the relative weight of each one is reduced
• A special method is used to set the gap-insertion penalty
Disadvantages
• Because the construction is irreversible, the result depends heavily on the quality of the alignments of the first pairs
• "Once a gap, always a gap": gaps, once introduced, never disappear
• High sensitivity to the gap penalty
Advantages
• Fast
• Reasonable results
• Many parameters can be set
• Everyone uses it
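To illustrate the idea of down-weighting near-duplicates, here is a hypothetical weighting scheme: each row's weight is its total distance to all other rows, normalised to sum to 1. ClustalW actually derives weights from guide-tree branch lengths, so this is a deliberately simpler substitute, not its formula.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of alignment columns in which two rows differ (gaps count as characters)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def simple_weights(rows: list[str]) -> list[float]:
    """Hypothetical weighting: total distance to all other rows, normalised to sum to 1.
    Near-duplicate rows are close to each other, so each gets a smaller share."""
    n = len(rows)
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        d = p_distance(rows[i], rows[j])
        totals[i] += d
        totals[j] += d
    s = sum(totals)
    if s == 0:  # all rows identical: fall back to equal weights
        return [1.0 / n] * n
    return [t / s for t in totals]


print(simple_weights(["ACGT", "ACGT", "TTTT"]))
```

With two identical rows and one outlier, the two identical rows share the weight that a single copy would otherwise carry.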
17
CLUSTALW algorithm
• We can deduce a pairwise alignment for every two sequences in the multiple alignment (the projected pairwise alignment)
• The projected pairwise alignment is not necessarily the best pairwise alignment for the two sequences
• CLUSTALW is not an optimal algorithm; better alignments might exist. The algorithm yields a possible alignment, but not necessarily the best one
(Figure: best pairwise alignment (optimal) vs. projected pairwise alignment)
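Projecting is mechanical: keep the two rows of interest and drop every column in which both of them have a gap. A minimal sketch, applied to rows 1 and 3 of the six-sequence example alignment:

```python
GAP = "-"


def project(row_a: str, row_b: str) -> tuple[str, str]:
    """Projected pairwise alignment: keep two MSA rows, drop columns where both are gaps."""
    kept = [(x, y) for x, y in zip(row_a, row_b) if not (x == GAP and y == GAP)]
    return "".join(x for x, _ in kept), "".join(y for _, y in kept)


print(project("-SCGPFIRV", "-SCTPHL-A"))   # ('SCGPFIRV', 'SCTPHL-A')
```

Comparing the score of such a projection with the score of an optimal pairwise alignment of the same two sequences is one way to see that the two can differ.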
18
ClustalW at EMBL: http://www.ebi.ac.uk/clustalw
ClustalW at the SRS site at EBI
19
ClustalW output: Aln format
20
MSA editing: Jalview
Conservation
www.es.embnet.org/Services/MolBio/jalview/index.html
21
MSA formats: FASTA
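An aligned FASTA file is ordinary FASTA whose sequences contain '-' gap characters and all have the same length. A minimal reader, assuming the record name is the first word after '>' and that sequences may be wrapped over several lines:

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse (aligned) FASTA: '>' header lines, each followed by one or more sequence lines.
    The record name is taken as the first whitespace-separated word of the header."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def is_aligned(records: dict[str, str]) -> bool:
    """An MSA in FASTA form requires every (gapped) sequence to have the same length."""
    return len({len(seq) for seq in records.values()}) <= 1


example = ">seq1 first example\n-SCGPFIRV\n>seq2\nMSCG\nPGLRA\n"
print(read_fasta(example), is_aligned(read_fasta(example)))
```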
22
MSA formats: Aln
23
MSA formats: MSF
24
Example 1a: a good MSA
25
Example 1b: making an MSA of distantly related proteins
26
Example 1c: including more distant relatives in the MSA
27
Example 2: Isopenicillin N Synthase
• Mononuclear iron proteins (electron carrier proteins): iron atoms are bound to amino acid side chains
• In IPNS the metal ion is coordinated by three protein residues
• IPNS is involved in the biosynthesis of penicillin
(Figure: the IPNS reaction, ACV + O2 → isopenicillin N + 2 H2O, with Fe2+ and ascorbate; and the active-site iron coordinated by His212, His268, and Asp214, together with the ACV sulfur, O2 (NO), and H2O)
28
Research: IPNS
• Goal: identify the Fe2+-binding residues
• Possible solutions:
1. In the lab
2. Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS sequences
Implementation
1. Obtain sequences (e.g., from NCBI) with the query: IPNS AND Bacteria[Organism]
2. Run an MSA (ClustalW) and search for conserved residues in the MSA
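The search for conserved residues can start with a simple scan for columns in which every row carries the same residue. The IPNS sequences themselves are not reproduced here, so the sketch uses the eight-row example alignment from the start of the lesson as a stand-in; it only reports strictly identical columns, whereas real analyses often also accept conservative substitutions.

```python
def conserved_columns(rows: list[str], gap: str = "-") -> list[tuple[int, str]]:
    """1-based positions where every row has the same non-gap residue."""
    hits = []
    for pos, column in enumerate(zip(*rows), start=1):
        first = column[0]
        if first != gap and all(c == first for c in column):
            hits.append((pos, first))
    return hits


example = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWWSNG--",
]
print(conserved_columns(example))   # [(5, 'C'), (21, 'W')]
```

In this example only the cysteine and tryptophan columns survive; as the following slides show, a set of very similar sequences will instead leave many columns conserved and fail to narrow down the candidates.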
30
31
32
MSA: bacteria only
Not enough variation
33
MSA: bacteria & fungi
Not enough variation
(Figure labels: bacteria & fungi; bacteria)
34
Step 2
Goal: add more enzymes similar to IPNS
Implementation
• Search at http://www.expasy.org/tools/blast
• Run BLAST with IPNS_CEPAC as the query
• Select sequences that are similar to the query over their entire length
• Export them in FASTA format
• Run CLUSTALW
35
• New multiple alignment, narrowing down the possibilities
36
Simple multiple alignment
• The known IPNS sequences are very similar
• Sequences of closely related enzymes are also quite similar
• There is not enough variability to characterize the active site
• We need to obtain even more distant sequences (distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1. Obtain an MSA (ClustalW)
2. Construct a consensus sequence and perform a new search,
OR
construct a profile and perform a new search
3. Run an MSA (ClustalW) and search for conserved residues in the MSA
38
Consensus sequence
• We can deduce a consensus sequence from the multiple sequence alignment. The consensus sequence holds the most frequent character of the alignment at each column
• Consensus: each position reflects the most common character found at that position
A T C T T G T
A A C T T G T
A A C T T C T
Consensus: A A C T T G T
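The consensus is one `Counter` per column. A minimal sketch, assuming ties are broken by taking the character seen first in the column (Python's `Counter.most_common` orders equal counts by first occurrence); other tools may break ties differently or emit an ambiguity code.

```python
from collections import Counter


def consensus(rows: list[str]) -> str:
    """Most frequent character in each column (ties go to the character seen first)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))


print(consensus(["ATCTTGT", "AACTTGT", "AACTTCT"]))   # AACTTGT
```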
39
Profile
• We can deduce a statistical model describing the multiple sequence alignment: a profile holds statistical information about the characters of the alignment at each column
• Profile: each position reflects the frequency of each character found at that position
A T C T T G T
A A C T T G T
A A C T T C T
     1     2     3     4     5     6     7
A    1     0.67  0     0     0     0     0
T    0     0.33  0     1     1     0     1
C    0     0     1     0     0     0.33  0
G    0     0     0     0     0     0.67  0
40
Profile vs. consensus
• The following multiple alignments have the same consensus:
Alignment 1
A A C T T G C
A A G T C G T
C A C T T C T
Alignment 2
A A C T T G T
A A C T T G T
A A C T T C T
Consensus of both: A A C T T G T
41
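Profile vs. consensus
• But they have different profiles:
Alignment 1
A A C T T G C
A A G T C G T
C A C T T C T
Alignment 2
A A C T T G T
A A C T T G T
A A C T T C T
Profile of alignment 1
     1     2     3     4     5     6     7
A    0.67  1     0     0     0     0     0
T    0     0     0     1     0.67  0     0.67
C    0.33  0     0.67  0     0.33  0.33  0.33
G    0     0     0.33  0     0     0.67  0
Profile of alignment 2
     1     2     3     4     5     6     7
A    1     1     0     0     0     0     0
T    0     0     0     1     1     0     1
C    0     0     1     0     0     0.33  0
G    0     0     0     0     0     0.67  0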
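The profiles above are column-wise character frequencies. A minimal sketch, assuming a DNA alphabet and dividing by the number of rows (so any gaps would lower the residue frequencies rather than being renormalised away):

```python
from collections import Counter


def profile(rows: list[str], alphabet: str = "ACGT") -> dict[str, list[float]]:
    """Frequency of each alphabet character in each column, out of all rows."""
    n = len(rows)
    columns = [Counter(col) for col in zip(*rows)]
    return {a: [c[a] / n for c in columns] for a in alphabet}


p1 = profile(["AACTTGC", "AAGTCGT", "CACTTCT"])
p2 = profile(["AACTTGT", "AACTTGT", "AACTTCT"])
for name, p in (("alignment 1", p1), ("alignment 2", p2)):
    print(name)
    for letter, freqs in p.items():
        print(letter, [round(f, 2) for f in freqs])
```

The two alignments share a consensus but `p1 != p2`: the profile keeps the minority characters that the consensus discards.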
42
Sequence LOGO
A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C - C G C - T
(Figure: the sequence logo of this alignment)
http://weblogo.berkeley.edu
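In a sequence logo the height of each letter stack is the column's information content: log2 of the alphabet size minus the Shannon entropy of the column. A minimal sketch, assuming a 4-letter DNA alphabet and ignoring gaps; tools such as WebLogo can additionally apply a small-sample correction, which is omitted here.

```python
import math
from collections import Counter


def column_information(column, alphabet_size: int = 4, gap: str = "-") -> float:
    """Information content in bits: log2(alphabet size) minus the entropy of the non-gap characters."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


rows = ["AACCGCTCTT", "AGCCGCGC-T", "A-CAGAGCCT", "AAGCACGC-T", "ACGGGTGCTT", "ATGC-CGC-T"]
info = [column_information(col) for col in zip(*rows)]
print([round(x, 2) for x in info])
```

Fully conserved columns (here columns 1, 8 and 10) reach the maximum of 2 bits; each letter within a stack is then drawn with height proportional to its frequency.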
43
PSI-BLAST
• Position-Specific Iterated BLAST: an automatic, profile-like search
Regular BLAST → construct a profile from the BLAST results → BLAST profile search → final results
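The iteration can be illustrated with a toy loop: build a log-odds profile from the current family, scan a database, add the hits, rebuild, and stop when nothing new is found. This is only a sketch of the idea under heavy simplifications (a DNA alphabet, uniform background, +1 pseudocounts, ungapped windows, and an arbitrary score threshold); PSI-BLAST itself works with gapped BLAST alignments, sequence weighting, substitution-matrix-based pseudocounts, and E-value inclusion thresholds.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # toy alphabet; PSI-BLAST itself is used mainly on proteins


def build_pssm(rows, pseudocount=1.0):
    """Log-odds score per column and letter: uniform background, additive pseudocounts."""
    background = 1.0 / len(ALPHABET)
    pssm = []
    for column in zip(*rows):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append({a: math.log2((counts[a] + pseudocount) / total / background)
                     for a in ALPHABET})
    return pssm


def best_hit(pssm, seq):
    """Highest-scoring ungapped window of seq, as (score, window)."""
    width = len(pssm)
    return max((sum(pssm[k][seq[i + k]] for k in range(width)), seq[i:i + width])
               for i in range(len(seq) - width + 1))


def iterative_search(seeds, database, threshold=0.0, max_rounds=5):
    """Toy iteration: profile -> scan -> add hits -> rebuild, until no new hits appear."""
    family = set(seeds)  # duplicates collapse; real tools weight sequences instead
    for _ in range(max_rounds):
        pssm = build_pssm(sorted(family))
        hits = set()
        for seq in database:
            if len(seq) >= len(pssm):
                score, window = best_hit(pssm, seq)
                if score >= threshold:
                    hits.add(window)
        if hits <= family:
            break
        family |= hits
    return family


print(iterative_search(["ACGT", "ACGA"], ["TTACGCTT", "GGGGGGGG"]))
```

Here the new window `ACGC` is picked up in the first round and then becomes part of the profile used in the next round, which is the essence of "position-specific iterated".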
44
• Alignment with distantly related proteins
45
Isopenicillin N Synthase
• Experimental evidence supports the finding that His212, Asp214, and His268 are the endogenous ligands that bind Fe2+ in IPNS
Enzyme      Relative activity  Km (mM)  kcat (min-1)  kcat/Km (mM-1 min-1)
Wild type   100                0.4      388           969
His48Ala    16                 0.56     75            134
His63Ala    31                 1.0      142           142
His114Ala   28                 0.85     125           147
His124Ala   48                 0.84     321           381
His135Ala   22                 0.59     117           198
His212Ala   <0.007             n.d.     n.d.
His268Ala   <0.003             n.d.     n.d.
Asp14Ala    5                  0.86     0.56          0.7
Asp113Ala   63                 0.45     238           528
Asp131Ala   68                 0.48     363           755
Asp203Ala   32                 0.91     123           135
Asp214Ala   <0.004             n.d.     n.d.
(n.d. = not determined)
46
– IPNS
47
48
2
Example
VTISCTGSSSNIGAG-NHVKWYQQLPGVTISCTGTSSNIGS--ITVNWYQQLPGLRLSCSSSGFIFSS--YAMYWVRQAPGLSLTCTVSGTSFDD--YYSTWVRQPPGPEVTCVVVDVSHEDPQVKFNWYVDG--ATLVCLISDFYPGA--VTVAWKADS--AALGCLVKDYFPEP--VTVSWNSG---VSLTCLVKGFYPSD--IAVEWWSNG--
3
Why multiple sequence alignmentbull Structure similarity ndash aa that play the same role in
each structure are in the same columnbull Evolutionary similarity ndash aa related to the same
ancestor are in the same columnbull Functional similarity - aa with the same function are in
the same columnbull Seq similarity ndash alignment with max similarity No
biological meaning
bull When seqs are closely related structure-evolution-functional similarity equivalent
4
bull Histones small abundant proteins Present in all eukaryotic chromosomes
bull Show a remarkable conserved multiple sequence alignment
bull Conservation of structure and function (aid in DNA package)
Why multiple alignment - examplebull Example
5
MSA applicationsbull Generate protein familiesbull Extrapolation ndash membership of uncharacterized sequence to a
protein familybull Understand evolution - preliminary step in molecular evolution
analysis for constructing phylogenetic trees eg is the duck evolutionary closer to a lion or to a fruit fly
bull Pattern identification ndash find the important (conserved) region in the protein - conserved positions may characterize a function
bull Domain identification ndash Build a consensusprofilemotif that describe the protein family help to describe new members of the family
bull DNA regulatory elementsbull Structure prediction (secondary and 3D model)
6
multiple sequence alignment
Pairwise solution might be very different from multiple solution
7
בתיאוריה כן באופן האם אפשר פשוט להכליל את שיטת התיכנות הדינמי פרקטי לא
מדובר בזמן ריצהובגודל זיכרון הגדלים
כאשר NKכ Nהוא אורך הרצף Kמספר הרצפים
למעשה בלתי אפשריKgt3 עבור
הבעיה החישובית בהינתן מספר רצפים מצא את ההתאמה שלהם למסגרת תקבל ערך אופטימלי שפונקצית מרחקמשותפת (עי הוספת רווחים) כך
-SCGPFIRVMSCGPGLRA-SCTPHL-AMSC-PKIRGMS-LPLLRNMSHKPALRA
SCGPFIRVMSCGPGLRASCTPHLAMSCPKIRGMSLPLLRNMSHKPALRA
8
מאחר שהבעיה החישובית קשה אנו יודעים שכל שיטה שנציע לא תבטיח פתרון אופטימלי נחפש שיטה שנותנת תוצאות טובות
ברב המקרים יתכן שהשיטה שנבחר תהיה תלויה בסיבה שבגללה
אנו מבצעים את ההתאמה
1 Choosing the sequencess2 Scoring metrics3 Approximationheuristic algorithms4 MSA formats5 Interpreting the MSA
The problems we have to answer
9
Scoring metricsbull Distance from Consensus - The consensus of an alignment
is a string of the most common characters in each column of the alignment The total distance between the strings is defined as the number of characters that differ from the consensus character of their column let C be the consensus sequence then the total distance is sum_i D(Si C)
bull Evolutionary Tree Alignment - The weight of the lightest evolutionary tree that can be constructed from the sequences with the weight of the tree defined as the number of changes between pairs of sequences that correspond to two adjacent nodes in the tree summed over all such pairs
bull Sum of Pairs - The sum of pairwise distances between all pairs of sequences Sum_iltj D(Si Sj)
10
-SCGPFIRVMSCGPGLRA-SCTPHL-A
-SCGPFIRVMSCGPGLRA
-SCGPFIRV-SCTPHL-A
MSCGPGLRA-SCTPHL-A
5 3 5 13
Scoring metrics -examplesSum of pairs
-SCGPFIRVMSCGPGLRA-SCTPHL-AMSC-PKIRGMS-LPLLRNMSHKPALRA
3 5 4 1 6 2 4 6 354סהכ הומוגניות
3 1 2 5 0 4 2 0 42 19סהכ מרחק
Distance from concensus
11
MSA algorithms
bull Progressive methods (CLUSTALWT-Coffee)
bull Iterative methods (Dialign)
bull Direct optimization (monte carlo genetic algorithms)
bull Local methods eMotifs Blocks Psi-blast
12
CLUSTALW algorithm
bull Compare all sequence pairs (pairwise alignment)
bull Generate a hierarchy for alignment (guide tree)
bull Build the multiple alignment step by step according to the guide tree first aligning the most similar pair then add another sequence or another pairwise alignment etc
13
(1) Pairwise alignment (prepare a guide tree)
6 pairwise alignments
then cluster analysis
(2) Multiple alignment following the tree from (1)
successive alignments
CLUSTALW algorithm
14
בונים עץ עי חיבור הרצפים הדומים לפי סדר התאמתם2
Hbb Hbb Hba Hba Myg Human Horse Human Horse Whale
בונים את ההתאמה לפי הסדר המוכתב3 עי העץ
ישנם שלושה מצביםהתאמת זוג רצפים - התאמה בין התאמות-הוספת רצף בודד להתאמה קיימת-
בונים טבלה של מרחק עריכה בין כל שני רצפים1CLUSTALW algorithm
15
ניתן להתאים בין התאמות (או בין רצף להתאמה) עי הרחבה די טבעית של שיטת התיכנות הדינמי הוספה
והכנסת אותיות באופן הרגיל המחיר להתאמת אותיות הוא מחיר ממוצע לקומבינציות השונות צריך לתת מחיר מיוחד
לרווח
16
נקודות עדינות
נותנים משקלות שונים לרצפים כך שאם יש מספר רצפים מאד דומים bullהמשקל היחסי של כל אחד יקטן
שיטה מיוחדת לקביעת מחיר להכנסת רווחיםbull
חסרונותמאחר שהבניה היא בלתי הפיכה יש תלות גדולה מאד באיכות ההתאמה bull
של הזוגות הראשוניםOnce a Gap Always a gapרווחים אינם נעלמים bullרגישות רבה לקנס הרווחיםbull
יתרונותמהירbullסבירbullאפשר לקבוע הרבה פרמטריםbullכולם משתמשיםbull
CLUSTALW algorithm
17
bull We can deduce a pairwise alignment for each two sequences in the multiple alignment (projected pairwise alignment)
bull The projected pairwise alignment is NOT the best pairwise alignment for the two sequences
bull CLUTALW is not an optimal algorithm Better alignments might exist The algorithm yields a possible alignment but not necessarily the best one
Best Pairwise alignment (optimal)
Projected Pairwise alignment
CLUSTALW algorithm
18
ClustalW at EMBLhttpwwwebiacukclustalw Clustalw at the SRS site at EBI
19
ClustalW Output Aln format
20
MSA Editing Jalview
Conservation
wwwesembnetorgServicesMolBiojalviewindexhtml
21
MSA formats - fasta
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
3
Why multiple sequence alignmentbull Structure similarity ndash aa that play the same role in
each structure are in the same columnbull Evolutionary similarity ndash aa related to the same
ancestor are in the same columnbull Functional similarity - aa with the same function are in
the same columnbull Seq similarity ndash alignment with max similarity No
biological meaning
bull When seqs are closely related structure-evolution-functional similarity equivalent
4
bull Histones small abundant proteins Present in all eukaryotic chromosomes
bull Show a remarkable conserved multiple sequence alignment
bull Conservation of structure and function (aid in DNA package)
Why multiple alignment - examplebull Example
5
MSA applicationsbull Generate protein familiesbull Extrapolation ndash membership of uncharacterized sequence to a
protein familybull Understand evolution - preliminary step in molecular evolution
analysis for constructing phylogenetic trees eg is the duck evolutionary closer to a lion or to a fruit fly
bull Pattern identification ndash find the important (conserved) region in the protein - conserved positions may characterize a function
bull Domain identification ndash Build a consensusprofilemotif that describe the protein family help to describe new members of the family
bull DNA regulatory elementsbull Structure prediction (secondary and 3D model)
6
multiple sequence alignment
Pairwise solution might be very different from multiple solution
7
בתיאוריה כן באופן האם אפשר פשוט להכליל את שיטת התיכנות הדינמי פרקטי לא
מדובר בזמן ריצהובגודל זיכרון הגדלים
כאשר NKכ Nהוא אורך הרצף Kמספר הרצפים
למעשה בלתי אפשריKgt3 עבור
הבעיה החישובית בהינתן מספר רצפים מצא את ההתאמה שלהם למסגרת תקבל ערך אופטימלי שפונקצית מרחקמשותפת (עי הוספת רווחים) כך
-SCGPFIRVMSCGPGLRA-SCTPHL-AMSC-PKIRGMS-LPLLRNMSHKPALRA
SCGPFIRVMSCGPGLRASCTPHLAMSCPKIRGMSLPLLRNMSHKPALRA
8
מאחר שהבעיה החישובית קשה אנו יודעים שכל שיטה שנציע לא תבטיח פתרון אופטימלי נחפש שיטה שנותנת תוצאות טובות
ברב המקרים יתכן שהשיטה שנבחר תהיה תלויה בסיבה שבגללה
אנו מבצעים את ההתאמה
1 Choosing the sequencess2 Scoring metrics3 Approximationheuristic algorithms4 MSA formats5 Interpreting the MSA
The problems we have to answer
9
Scoring metricsbull Distance from Consensus - The consensus of an alignment
is a string of the most common characters in each column of the alignment The total distance between the strings is defined as the number of characters that differ from the consensus character of their column let C be the consensus sequence then the total distance is sum_i D(Si C)
bull Evolutionary Tree Alignment - The weight of the lightest evolutionary tree that can be constructed from the sequences with the weight of the tree defined as the number of changes between pairs of sequences that correspond to two adjacent nodes in the tree summed over all such pairs
bull Sum of Pairs - The sum of pairwise distances between all pairs of sequences Sum_iltj D(Si Sj)
10
-SCGPFIRVMSCGPGLRA-SCTPHL-A
-SCGPFIRVMSCGPGLRA
-SCGPFIRV-SCTPHL-A
MSCGPGLRA-SCTPHL-A
5 3 5 13
Scoring metrics -examplesSum of pairs
-SCGPFIRVMSCGPGLRA-SCTPHL-AMSC-PKIRGMS-LPLLRNMSHKPALRA
3 5 4 1 6 2 4 6 354סהכ הומוגניות
3 1 2 5 0 4 2 0 42 19סהכ מרחק
Distance from concensus
11
MSA algorithms
bull Progressive methods (CLUSTALWT-Coffee)
bull Iterative methods (Dialign)
bull Direct optimization (monte carlo genetic algorithms)
bull Local methods eMotifs Blocks Psi-blast
12
CLUSTALW algorithm
bull Compare all sequence pairs (pairwise alignment)
bull Generate a hierarchy for alignment (guide tree)
bull Build the multiple alignment step by step according to the guide tree first aligning the most similar pair then add another sequence or another pairwise alignment etc
13
(1) Pairwise alignment (prepare a guide tree)
6 pairwise alignments
then cluster analysis
(2) Multiple alignment following the tree from (1)
successive alignments
CLUSTALW algorithm
14
בונים עץ עי חיבור הרצפים הדומים לפי סדר התאמתם2
Hbb Hbb Hba Hba Myg Human Horse Human Horse Whale
בונים את ההתאמה לפי הסדר המוכתב3 עי העץ
ישנם שלושה מצביםהתאמת זוג רצפים - התאמה בין התאמות-הוספת רצף בודד להתאמה קיימת-
בונים טבלה של מרחק עריכה בין כל שני רצפים1CLUSTALW algorithm
15
ניתן להתאים בין התאמות (או בין רצף להתאמה) עי הרחבה די טבעית של שיטת התיכנות הדינמי הוספה
והכנסת אותיות באופן הרגיל המחיר להתאמת אותיות הוא מחיר ממוצע לקומבינציות השונות צריך לתת מחיר מיוחד
לרווח
16
נקודות עדינות
נותנים משקלות שונים לרצפים כך שאם יש מספר רצפים מאד דומים bullהמשקל היחסי של כל אחד יקטן
שיטה מיוחדת לקביעת מחיר להכנסת רווחיםbull
חסרונותמאחר שהבניה היא בלתי הפיכה יש תלות גדולה מאד באיכות ההתאמה bull
של הזוגות הראשוניםOnce a Gap Always a gapרווחים אינם נעלמים bullרגישות רבה לקנס הרווחיםbull
יתרונותמהירbullסבירbullאפשר לקבוע הרבה פרמטריםbullכולם משתמשיםbull
CLUSTALW algorithm
17
bull We can deduce a pairwise alignment for each two sequences in the multiple alignment (projected pairwise alignment)
bull The projected pairwise alignment is NOT the best pairwise alignment for the two sequences
bull CLUTALW is not an optimal algorithm Better alignments might exist The algorithm yields a possible alignment but not necessarily the best one
Best Pairwise alignment (optimal)
Projected Pairwise alignment
CLUSTALW algorithm
18
ClustalW at EMBLhttpwwwebiacukclustalw Clustalw at the SRS site at EBI
19
ClustalW Output Aln format
20
MSA Editing Jalview
Conservation
wwwesembnetorgServicesMolBiojalviewindexhtml
21
MSA formats - fasta
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
4
bull Histones small abundant proteins Present in all eukaryotic chromosomes
bull Show a remarkable conserved multiple sequence alignment
bull Conservation of structure and function (aid in DNA package)
Why multiple alignment - examplebull Example
5
MSA applicationsbull Generate protein familiesbull Extrapolation ndash membership of uncharacterized sequence to a
protein familybull Understand evolution - preliminary step in molecular evolution
analysis for constructing phylogenetic trees eg is the duck evolutionary closer to a lion or to a fruit fly
bull Pattern identification ndash find the important (conserved) region in the protein - conserved positions may characterize a function
bull Domain identification ndash Build a consensusprofilemotif that describe the protein family help to describe new members of the family
bull DNA regulatory elementsbull Structure prediction (secondary and 3D model)
6
multiple sequence alignment
Pairwise solution might be very different from multiple solution
7
בתיאוריה כן באופן האם אפשר פשוט להכליל את שיטת התיכנות הדינמי פרקטי לא
מדובר בזמן ריצהובגודל זיכרון הגדלים
כאשר NKכ Nהוא אורך הרצף Kמספר הרצפים
למעשה בלתי אפשריKgt3 עבור
הבעיה החישובית בהינתן מספר רצפים מצא את ההתאמה שלהם למסגרת תקבל ערך אופטימלי שפונקצית מרחקמשותפת (עי הוספת רווחים) כך
-SCGPFIRVMSCGPGLRA-SCTPHL-AMSC-PKIRGMS-LPLLRNMSHKPALRA
SCGPFIRVMSCGPGLRASCTPHLAMSCPKIRGMSLPLLRNMSHKPALRA
8
מאחר שהבעיה החישובית קשה אנו יודעים שכל שיטה שנציע לא תבטיח פתרון אופטימלי נחפש שיטה שנותנת תוצאות טובות
ברב המקרים יתכן שהשיטה שנבחר תהיה תלויה בסיבה שבגללה
אנו מבצעים את ההתאמה
1 Choosing the sequencess2 Scoring metrics3 Approximationheuristic algorithms4 MSA formats5 Interpreting the MSA
The problems we have to answer
9
Scoring metricsbull Distance from Consensus - The consensus of an alignment
is a string of the most common characters in each column of the alignment The total distance between the strings is defined as the number of characters that differ from the consensus character of their column let C be the consensus sequence then the total distance is sum_i D(Si C)
bull Evolutionary Tree Alignment - The weight of the lightest evolutionary tree that can be constructed from the sequences with the weight of the tree defined as the number of changes between pairs of sequences that correspond to two adjacent nodes in the tree summed over all such pairs
bull Sum of Pairs - The sum of pairwise distances between all pairs of sequences Sum_iltj D(Si Sj)
10
-SCGPFIRVMSCGPGLRA-SCTPHL-A
-SCGPFIRVMSCGPGLRA
-SCGPFIRV-SCTPHL-A
MSCGPGLRA-SCTPHL-A
5 3 5 13
Scoring metrics -examplesSum of pairs
-SCGPFIRVMSCGPGLRA-SCTPHL-AMSC-PKIRGMS-LPLLRNMSHKPALRA
3 5 4 1 6 2 4 6 354סהכ הומוגניות
3 1 2 5 0 4 2 0 42 19סהכ מרחק
Distance from concensus
11
MSA algorithms
• Progressive methods (CLUSTALW, T-Coffee)
• Iterative methods (Dialign)
• Direct optimization (Monte Carlo, genetic algorithms)
• Local methods (eMotifs, Blocks, PSI-BLAST)
12
CLUSTALW algorithm
• Compare all sequence pairs (pairwise alignment)
• Generate a hierarchy for the alignment (guide tree)
• Build the multiple alignment step by step according to the guide tree: first align the most similar pair, then add another sequence or another pairwise alignment, and so on
13
CLUSTALW algorithm
(1) Pairwise alignment (to prepare a guide tree): 6 pairwise alignments (for 4 sequences), then cluster analysis
(2) Multiple alignment following the tree from (1): successive alignments
14
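CLUSTALW algorithm
1. Build a table of edit distances between every two sequences.
2. Build a tree by joining similar sequences in order of their similarity.
Guide tree leaves: Hbb Human, Hbb Horse, Hba Human, Hba Horse, Myg Whale
3. Build the alignment in the order dictated by the tree. There are three cases:
• aligning a pair of sequences
• aligning two alignments
• adding a single sequence to an existing alignment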
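Steps 1 and 2 can be sketched as an edit-distance table followed by repeatedly joining the two closest clusters. As an assumption, the sketch uses average linkage (UPGMA-style) as a simple stand-in; ClustalW itself builds its guide tree with neighbor joining. The toy sequences and function names are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance with unit costs (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


def guide_tree(seqs):
    """seqs: dict name -> sequence. Returns the joining order as nested tuples."""
    # Step 1: edit-distance table between every two sequences.
    dist = {}
    for a, b in combinations(seqs, 2):
        dist[a, b] = dist[b, a] = edit_distance(seqs[a], seqs[b])

    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in seqs]

    def average(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[p] for p in pairs) / len(pairs)

    # Step 2: repeatedly join the two closest clusters.
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average(clusters[ij[0]], clusters[ij[1]]))
        joined = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(joined)
    return clusters[0][0]


toy = {"s1": "ACGT", "s2": "ACGA", "s3": "TTTT", "s4": "TTTA"}
print(guide_tree(toy))  # (('s1', 's2'), ('s3', 's4'))
```

The nested tuple is the order in which step 3 would align: first each inner pair, then the two resulting alignments against each other.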
15
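Alignments can be aligned with each other (or a sequence with an alignment) by a fairly natural extension of the dynamic programming method. Insertions and deletions of letters are handled in the usual way; the cost of matching two columns is the average cost over the different letter combinations. Gaps need to be given a special cost.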
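A minimal sketch of this extension, assuming a toy scoring scheme (match +1, mismatch −1, letter against gap −2, gap against gap 0) and a linear gap cost rather than ClustalW's position-specific gap penalties. The score of two columns is the average pair score over every combination of their characters; a gap column inserted into one alignment is applied to all of its rows, and gaps already present are copied unchanged. The names are hypothetical.

```python
def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Toy scores (assumed): gap/gap 0, letter/gap `gap`, else match/mismatch."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def column_score(col_a, col_b):
    """Average pair score over every combination of characters in two columns."""
    total = sum(pair_score(x, y) for x in col_a for y in col_b)
    return total / (len(col_a) * len(col_b))


def align_alignments(A, B):
    """Global DP aligning alignment A with alignment B (lists of equal-length
    rows). A single sequence is simply an alignment with one row."""
    cols_a = ["".join(col) for col in zip(*A)]
    cols_b = ["".join(col) for col in zip(*B)]
    gap_a, gap_b = "-" * len(A), "-" * len(B)  # all-gap column for A or B
    n, m = len(cols_a), len(cols_b)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + column_score(cols_a[i - 1], gap_b)
        move[i][0] = "up"
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + column_score(gap_a, cols_b[j - 1])
        move[0][j] = "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (S[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]), "diag"),
                (S[i - 1][j] + column_score(cols_a[i - 1], gap_b), "up"),
                (S[i][j - 1] + column_score(gap_a, cols_b[j - 1]), "left"),
            ]
            S[i][j], move[i][j] = max(options, key=lambda o: o[0])
    # Trace back, copying whole columns so existing gaps are never removed.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            out_a.append(cols_a[i - 1])
            out_b.append(cols_b[j - 1])
            i, j = i - 1, j - 1
        elif step == "up":
            out_a.append(cols_a[i - 1])
            out_b.append(gap_b)
            i -= 1
        else:
            out_a.append(gap_a)
            out_b.append(cols_b[j - 1])
            j -= 1
    cols = [a + b for a, b in zip(reversed(out_a), reversed(out_b))]
    rows = ["".join(col[r] for col in cols) for r in range(len(A) + len(B))]
    return rows, S[n][m]


print(align_alignments(["ACGT", "ACGT"], ["ACT"]))
# (['ACGT', 'ACGT', 'AC-T'], 1.0)
```

Because existing columns are moved as whole units, a gap placed at an early stage is carried into every later alignment, which is the "once a gap, always a gap" behaviour of progressive alignment.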
16
CLUSTALW algorithm
Fine points:
• Sequences are given different weights, so that when there are several very similar sequences, the relative weight of each one is reduced.
• A special method is used to set the cost of inserting gaps.
Disadvantages:
• Since the construction is irreversible, the result depends heavily on the quality of the alignment of the first pairs.
• Once a gap, always a gap: gaps never disappear.
• High sensitivity to the gap penalty.
Advantages:
• Fast
• Reasonable results
• Many parameters can be set
• Everyone uses it
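The weighting idea can be illustrated with a simple redundancy-based scheme, shown here as an assumption: each sequence gets weight 1/n, where n is the number of sequences (itself included) that share at least a threshold identity with it. ClustalW actually derives its weights from the branch lengths of the guide tree, so this stand-in only shows the intended effect: near-duplicates share one unit of weight. Function names and the 0.8 threshold are hypothetical.

```python
def identity(a, b):
    """Fraction of columns (gap/gap columns skipped) with equal characters."""
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        return 0.0
    return sum(1 for x, y in cols if x == y) / len(cols)


def redundancy_weights(rows, threshold=0.8):
    """Weight 1/n, where n counts rows (including itself) at >= threshold identity."""
    weights = []
    for a in rows:
        n = sum(1 for b in rows if identity(a, b) >= threshold)
        weights.append(1.0 / max(n, 1))
    return weights


print(redundancy_weights(["ACGTACGT", "ACGTACGA", "TTGGCCAA"]))  # [0.5, 0.5, 1.0]
```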
17
CLUSTALW algorithm
• From the multiple alignment we can deduce a pairwise alignment for every two sequences (the projected pairwise alignment).
• The projected pairwise alignment is NOT necessarily the best pairwise alignment for the two sequences.
• CLUSTALW is not an optimal algorithm, and better alignments might exist: it yields a possible alignment, but not necessarily the best one.
Figure: best (optimal) pairwise alignment vs. projected pairwise alignment
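Projecting a pairwise alignment out of an MSA amounts to taking two rows and dropping the columns where both have a gap. A minimal sketch (the function name is hypothetical), applied to two rows of the six-sequence example alignment:

```python
def project(msa, i, j):
    """Pairwise alignment induced by rows i and j: drop gap/gap columns."""
    kept = [(x, y) for x, y in zip(msa[i], msa[j])
            if not (x == "-" and y == "-")]
    return "".join(x for x, _ in kept), "".join(y for _, y in kept)


msa = ["-SCGPFIRV", "MSCGPGLRA", "-SCTPHL-A",
       "MSC-PKIRG", "MS-LPLLRN", "MSHKPALRA"]
print(project(msa, 0, 2))  # ('SCGPFIRV', 'SCTPHL-A')
```

Comparing such a projection with an optimal pairwise alignment of the same two ungapped sequences is one way to see, for a given pair, what the progressive procedure gave up.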
18
ClustalW at EMBL: http://www.ebi.ac.uk/clustalw
ClustalW at the SRS site at EBI
19
ClustalW output: Aln format
20
MSA editing: Jalview
Conservation
www.es.embnet.org/Services/MolBio/jalview/index.html
21
MSA formats – FASTA
22
MSA formats – Aln (Clustal)
23
MSA formats – MSF
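To show how the FASTA and Aln layouts relate, the sketch below reads aligned FASTA and writes a simplified Clustal-style view: a header line starting with CLUSTAL, blocks of up to 60 columns with the sequence name in front of each row, and a line marking columns identical in every row with '*'. This is a simplification under stated assumptions: real ClustalW output also marks conserved (':') and semi-conserved ('.') substitutions and puts the program version in the header, and MSF is not covered. Function names are hypothetical.

```python
def parse_fasta(text):
    """Read (aligned) FASTA: '>name' lines, sequence lines possibly wrapped."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the ID after '>'
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def to_clustal_like(aln, width=60):
    """Simplified Clustal-style text: name + block of columns, plus a line
    with '*' under columns identical in every row (no ':' or '.' marks)."""
    names = list(aln)
    length = len(aln[names[0]])
    pad = max(len(n) for n in names) + 4
    lines = ["CLUSTAL multiple sequence alignment (simplified)", ""]
    for start in range(0, length, width):
        block = [aln[n][start:start + width] for n in names]
        for n, segment in zip(names, block):
            lines.append(n.ljust(pad) + segment)
        marks = "".join("*" if col[0] != "-" and len(set(col)) == 1 else " "
                        for col in zip(*block))
        lines.append(" " * pad + marks)
        lines.append("")
    return "\n".join(lines)


fasta = ">s1\n-SCGPFIRV\n>s2\nMSCGPGLRA\n>s3\n-SCTPHL-A\n"
print(to_clustal_like(parse_fasta(fasta)))
```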
24
Example 1a: a good MSA
25
Example 1b: making an MSA of distantly related proteins
26
Example 1c: including more distant relatives in the MSA
27
Example 2: Isopenicillin N Synthase (IPNS)
• Mononuclear iron proteins – electron carrier proteins. Iron atoms are bound to amino acid side chains.
• In IPNS the metal ion is coordinated by three protein residues.
• IPNS is involved in the biosynthesis of penicillin.
Figure: IPNS converts ACV to isopenicillin N (Fe2+, ascorbate, O2 → 2 H2O); in the active site, Fe is coordinated by His212, Asp214 and His268, together with the ACV sulfur, O2 (or NO) and H2O.
28
Researching IPNS
• Goal: identify the Fe2+-binding residues
• Possible solutions:
1. In the lab
2. Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS sequences
Implementation:
1. Obtain sequences (e.g., from NCBI) with the query: IPNS AND Bacteria[Organism]
2. Run an MSA (ClustalW) and search for conserved residues in the MSA
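Step 1 ends with a search for conserved residues. A minimal sketch, assuming "conserved" means identical in every row: it lists such columns and translates each column index to a residue number in one reference sequence (gaps in that row are not counted), which is how a column would be reported as, say, His212. The function name is hypothetical, and the example reuses the six-sequence alignment from the scoring slides rather than real IPNS data.

```python
def conserved_positions(msa, ref=0):
    """(column, residue, position in reference row) for fully identical columns.

    Columns are 1-based; the reference position counts only non-gap
    characters of row `ref`, i.e. it is a residue number in that sequence.
    """
    hits = []
    ref_pos = 0
    for col_no, column in enumerate(zip(*msa), start=1):
        if column[ref] != "-":
            ref_pos += 1
        first = column[0]
        if first != "-" and all(c == first for c in column):
            hits.append((col_no, first, ref_pos))
    return hits


msa = ["-SCGPFIRV", "MSCGPGLRA", "-SCTPHL-A",
       "MSC-PKIRG", "MS-LPLLRN", "MSHKPALRA"]
print(conserved_positions(msa))         # [(2, 'S', 1), (5, 'P', 4)]
print(conserved_positions(msa, ref=1))  # [(2, 'S', 2), (5, 'P', 5)]
```

With very similar input sequences, almost every column passes this test, which is the "not enough variation" problem the following slides run into.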
30
31
32
MSA – bacteria only
Not enough variation
33
MSA – bacteria & fungi
Not enough variation
Figure labels: bacteria & fungi; bacteria
34
Step 2
Goal: add more enzymes similar to IPNS
Implementation:
• Search in http://www.expasy.org/tools/blast
• BLAST with IPNS_CEPAC as the query
• Select sequences that are similar to the query along its entire length
• Export in FASTA format
• Run CLUSTALW
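The "similar along the entire length" selection can be automated. This sketch assumes the search is run with command-line BLAST+ and its tabular output (-outfmt 6, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) instead of the ExPASy web form used on the slide. The coverage and E-value cut-offs are illustrative choices, and the hit names in the test data are made up.

```python
def full_length_hits(blast_tab, query_len, min_coverage=0.9, max_evalue=1e-5):
    """Subject IDs with a hit that spans most of the query.

    blast_tab: text in BLAST+ tabular format (-outfmt 6, default 12 columns).
    """
    keep = []
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        sseqid = fields[1]
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        coverage = (abs(qend - qstart) + 1) / query_len
        if coverage >= min_coverage and evalue <= max_evalue and sseqid not in keep:
            keep.append(sseqid)
    return keep
```

The returned IDs would then be fetched in FASTA format and passed to ClustalW, as in the last two bullets.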
35
• New multiple alignment: narrowing down the possibilities
36
Simple multiple alignment
• The known IPNS sequences are very similar.
• Sequences of closely related enzymes are also quite similar.
• There is not enough variability to identify the active sites.
• We need to obtain even more distant sequences (distant homologs).
37
Step 3
Using the results of the MSA for further searches
Implementation:
1. Obtain an MSA (ClustalW)
2. Construct a consensus sequence and perform a new search,
   OR construct a profile and perform a new search
3. Run an MSA (ClustalW) and search for conserved residues in the MSA
38
Consensus sequence
• We can deduce a consensus sequence from the multiple sequence alignment. The consensus sequence holds the most frequent character of the alignment at each column.
• Consensus: each position reflects the most common character found at that position.
A T C T T G T
A A C T T G T
A A C T T C T
Consensus: A A C T T G T
39
Profile
• We can deduce a statistical model describing the multiple sequence alignment. A profile holds statistical information about the characters at each column of the alignment.
• Profile: each position reflects the frequency of each character found at that position.
A T C T T G T
A A C T T G T
A A C T T C T
     1     2     3     4     5     6     7
A    1     0.67  0     0     0     0     0
T    0     0.33  0     1     1     0     1
C    0     0     1     0     0     0.33  0
G    0     0     0     0     0     0.67  0
40
Profile vs. consensus
• The following multiple alignments have the same consensus:
Alignment 1:
A A C T T G C
A A G T C G T
C A C T T C T
Alignment 2:
A A C T T G T
A A C T T G T
A A C T T C T
Consensus of both: A A C T T G T
41
Profile vs. consensus
• But they have different profiles:
Alignment 1:
A A C T T G C
A A G T C G T
C A C T T C T
     1     2     3     4     5     6     7
A    0.67  1     0     0     0     0     0
T    0     0     0     1     0.67  0     0.67
C    0.33  0     0.67  0     0.33  0.33  0.33
G    0     0     0.33  0     0     0.67  0
Alignment 2:
A A C T T G T
A A C T T G T
A A C T T C T
     1     2     3     4     5     6     7
A    1     1     0     0     0     0     0
T    0     0     0     1     1     0     1
C    0     0     1     0     0     0.33  0
G    0     0     0     0     0     0.67  0
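The difference between the two summaries can be reproduced in a few lines. This is a minimal sketch over the DNA alphabet with hypothetical function names; ties in the consensus go to the first character seen in the column, which is one of several possible conventions.

```python
from collections import Counter


def profile(rows, alphabet="ACGT"):
    """Frequency of each character in each column (one dict per column)."""
    n = len(rows)
    result = []
    for col in zip(*rows):
        counts = Counter(col)
        result.append({ch: counts[ch] / n for ch in alphabet})
    return result


def consensus(rows):
    """Most frequent character per column (ties go to the first one seen)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))


alignment_1 = ["AACTTGC", "AAGTCGT", "CACTTCT"]
alignment_2 = ["AACTTGT", "AACTTGT", "AACTTCT"]

print(consensus(alignment_1), consensus(alignment_2))  # AACTTGT AACTTGT
print({ch: round(f, 2) for ch, f in profile(alignment_1)[0].items()})
# {'A': 0.67, 'C': 0.33, 'G': 0.0, 'T': 0.0}
print({ch: round(f, 2) for ch, f in profile(alignment_2)[0].items()})
# {'A': 1.0, 'C': 0.0, 'G': 0.0, 'T': 0.0}
```

The consensus throws away exactly the information that distinguishes the two alignments, while the profile keeps it, which is why a profile search can pick up more distant homologs.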
42
Sequence logo
A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C - C G C - T
Figure: sequence logo of the alignment above
http://weblogo.berkeley.edu
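In a logo, the height of each column is usually its information content. A minimal sketch for DNA, assuming gaps are simply ignored and without the small-sample correction that WebLogo applies: each column gets log2(4) = 2 bits minus the Shannon entropy of its residues, and each letter is drawn at a height proportional to its frequency. Function names are hypothetical.

```python
import math
from collections import Counter


def column_information(rows, alphabet="ACGT"):
    """Per-column information content in bits: log2(len(alphabet)) minus the
    Shannon entropy of the non-gap residues (gaps ignored, no correction)."""
    max_bits = math.log2(len(alphabet))
    result = []
    for col in zip(*rows):
        residues = [c for c in col if c in alphabet]
        n = len(residues)
        if n == 0:
            result.append(0.0)
            continue
        entropy = -sum((k / n) * math.log2(k / n)
                       for k in Counter(residues).values())
        result.append(max_bits - entropy)
    return result


def letter_heights(rows, alphabet="ACGT"):
    """Stack heights per column: each letter's frequency times the column's bits."""
    heights = []
    for col, bits in zip(zip(*rows), column_information(rows, alphabet)):
        residues = [c for c in col if c in alphabet]
        counts = Counter(residues)
        heights.append({ch: bits * counts[ch] / len(residues) for ch in counts})
    return heights


rows = ["AACCGCTCTT", "AGCCGCGC-T", "A-CAGAGCCT",
        "AAGCACGC-T", "ACGGGTGCTT", "ATGC-CGC-T"]
print([round(b, 2) for b in column_information(rows)])
```

Fully conserved columns (1, 8 and 10 here) reach the 2-bit maximum, while column 3 (three C and three G) carries 1 bit split evenly between the two letters.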
43
PSI-BLAST
• Position-Specific Iterated BLAST: an automatic, profile-based search
Regular BLAST → construct a profile from the BLAST results → BLAST search with the profile (repeated) → final results
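The core of each PSI-BLAST iteration is turning the current alignment into a position-specific scoring matrix (PSSM) and scanning with it. A toy sketch of that idea under loud assumptions: DNA instead of protein, a uniform background and a flat pseudocount of 1. PSI-BLAST itself works on amino acids, uses background frequencies and substitution-matrix-based pseudocounts, and embeds the matrix in the full BLAST search machinery. Function names and the example sequence are hypothetical.

```python
import math
from collections import Counter


def build_pssm(rows, alphabet="ACGT", pseudocount=1.0):
    """Log-odds scores (bits) per column against a uniform background (toy)."""
    background = 1.0 / len(alphabet)
    denom = len(rows) + pseudocount * len(alphabet)
    pssm = []
    for col in zip(*rows):
        counts = Counter(col)
        pssm.append({ch: math.log2((counts[ch] + pseudocount) / denom / background)
                     for ch in alphabet})
    return pssm


def score(window, pssm):
    """Sum of position-specific scores for a window as long as the PSSM."""
    return sum(column[ch] for column, ch in zip(pssm, window))


def best_hit(seq, pssm):
    """Slide the PSSM along seq; return (best score, 0-based offset)."""
    width = len(pssm)
    return max((score(seq[i:i + width], pssm), i)
               for i in range(len(seq) - width + 1))


pssm = build_pssm(["AACTTGT", "AACTTGT", "AACTTCT"])
print(best_hit("GGGAACTTGTGG", pssm)[1])  # 3
```

In the iterated scheme, the best-scoring new hits would be added to the alignment, the PSSM rebuilt, and the scan repeated until no new sequences are found.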
44
• Alignment with distantly related proteins
45
Isopenicillin N Synthase
• Experimental evidence supports the finding that His212, Asp214 and His268 are the endogenous ligands that bind Fe2+ in IPNS.
Enzyme      Relative activity (%)  Km (mM)  kcat (min-1)  kcat/Km (mM-1 min-1)
Wild type   100                    0.4      388           969
His48Ala    16                     0.56     75            134
His63Ala    31                     1.0      142           142
His114Ala   28                     0.85     125           147
His124Ala   48                     0.84     321           381
His135Ala   22                     0.59     117           198
His212Ala   <0.007                 nd       nd
His268Ala   <0.003                 nd       nd
Asp14Ala    5                      0.86     0.56          0.7
Asp113Ala   63                     0.45     238           528
Asp131Ala   68                     0.48     363           755
Asp203Ala   32                     0.91     123           135
Asp214Ala   <0.004                 nd       nd
(nd: not determined)
46
– IPNS
47
48
8
מאחר שהבעיה החישובית קשה אנו יודעים שכל שיטה שנציע לא תבטיח פתרון אופטימלי נחפש שיטה שנותנת תוצאות טובות
ברב המקרים יתכן שהשיטה שנבחר תהיה תלויה בסיבה שבגללה
אנו מבצעים את ההתאמה
1 Choosing the sequencess2 Scoring metrics3 Approximationheuristic algorithms4 MSA formats5 Interpreting the MSA
The problems we have to answer
9
Scoring metricsbull Distance from Consensus - The consensus of an alignment
is a string of the most common characters in each column of the alignment The total distance between the strings is defined as the number of characters that differ from the consensus character of their column let C be the consensus sequence then the total distance is sum_i D(Si C)
bull Evolutionary Tree Alignment - The weight of the lightest evolutionary tree that can be constructed from the sequences with the weight of the tree defined as the number of changes between pairs of sequences that correspond to two adjacent nodes in the tree summed over all such pairs
bull Sum of Pairs - The sum of pairwise distances between all pairs of sequences Sum_iltj D(Si Sj)
10
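Scoring metrics - examples
Sum of pairs (here each pair is scored by its number of identical positions; a column in which both rows have a gap does not count)
-SCGPFIRV
MSCGPGLRA
-SCTPHL-A
Rows 1 and 2: 5; rows 1 and 3: 3; rows 2 and 3: 5; total: 13
Distance from consensus
-SCGPFIRV
MSCGPGLRA
-SCTPHL-A
MSC-PKIRG
MS-LPLLRN
MSHKPALRA
Column:        1  2  3  4  5  6  7  8  9   Total
Homogeneity:   4  6  4  2  6  1  4  5  3   35
Distance:      2  0  2  4  0  5  2  1  3   19
(Homogeneity: number of rows that agree with the most common character of the column; distance: number of rows that differ from it)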
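The two example calculations can be checked with a short Python sketch. As the slide's numbers imply, its sum-of-pairs example scores each pair by identical positions. That is a similarity, so higher is better, rather than the distance form used in the definition. The function names are illustrative. Treating gaps as ordinary characters when picking each column's most common character is an assumption that happens not to change any column here.

```python
from collections import Counter
from itertools import combinations

GAP = "-"

def pair_identities(a: str, b: str) -> int:
    """Number of columns where both rows hold the same residue (gap-gap columns ignored)."""
    return sum(1 for x, y in zip(a, b) if x == y and x != GAP)

def sum_of_pairs(msa: list[str]) -> tuple[list[int], int]:
    """Identity count for every pair of rows (order 1-2, 1-3, 2-3, ...) and their total."""
    per_pair = [pair_identities(a, b) for a, b in combinations(msa, 2)]
    return per_pair, sum(per_pair)

def distance_from_consensus(msa: list[str]) -> tuple[list[int], list[int]]:
    """Per column: homogeneity (rows agreeing with the most common character)
    and distance (rows differing from it). Gaps are treated as ordinary characters."""
    homogeneity, distance = [], []
    for column in zip(*msa):
        top = Counter(column).most_common(1)[0][1]
        homogeneity.append(top)
        distance.append(len(column) - top)
    return homogeneity, distance

three = ["-SCGPFIRV", "MSCGPGLRA", "-SCTPHL-A"]
six = three + ["MSC-PKIRG", "MS-LPLLRN", "MSHKPALRA"]

print(sum_of_pairs(three))            # ([5, 3, 5], 13)
hom, dist = distance_from_consensus(six)
print(hom, sum(hom))                  # [4, 6, 4, 2, 6, 1, 4, 5, 3] 35
print(dist, sum(dist))                # [2, 0, 2, 4, 0, 5, 2, 1, 3] 19
```

To turn the sum-of-pairs score into a distance, count mismatching columns instead of identical ones.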
11
MSA algorithms
• Progressive methods (CLUSTALW, T-Coffee)
• Iterative methods (Dialign)
• Direct optimization (Monte Carlo, genetic algorithms)
• Local methods: eMotifs, Blocks, PSI-BLAST
12
CLUSTALW algorithm
• Compare all sequence pairs (pairwise alignment)
• Generate a hierarchy for the alignment (guide tree)
• Build the multiple alignment step by step according to the guide tree: first align the most similar pair, then add another sequence or another pairwise alignment, and so on
13
CLUSTALW algorithm
(1) Pairwise alignment (prepare a guide tree): 6 pairwise alignments (one for each pair of the four sequences), then cluster analysis
(2) Multiple alignment following the tree from (1): successive alignments
14
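CLUSTALW algorithm
1. Build a table of edit distances between every two sequences.
2. Build a tree by joining the sequences in order of their similarity, most similar first.
(Figure: guide tree with leaves Hbb human, Hbb horse, Hba human, Hba horse and Myg whale)
3. Build the alignment in the order dictated by the tree. There are three cases:
• aligning a pair of sequences
• aligning two alignments
• adding a single sequence to an existing alignment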
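Steps 1 and 2 can be sketched in Python. Step 1 uses the edit-distance table the slide describes. For step 2 the sketch uses average-linkage (UPGMA-style) clustering as a simple stand-in. CLUSTALW itself builds its guide tree with neighbor-joining from pairwise alignment scores, so this is not its exact procedure. The four toy sequences and the function names are hypothetical.

```python
from itertools import combinations

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for substitution, insertion and deletion."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]

def guide_tree(seqs: dict[str, str]):
    """Average-linkage (UPGMA-style) clustering on the edit-distance table.

    Returns the tree as nested tuples of sequence names."""
    # Step 1: distance table between every two sequences
    dist = {}
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        dist[n1, n2] = dist[n2, n1] = edit_distance(s1, s2)

    # Step 2: repeatedly join the two closest clusters
    members = {name: [name] for name in seqs}   # cluster label -> sequence names
    trees = {name: name for name in seqs}       # cluster label -> subtree

    def avg(p, q):
        total = sum(dist[x, y] for x in members[p] for y in members[q])
        return total / (len(members[p]) * len(members[q]))

    while len(trees) > 1:
        a, b = min(combinations(trees, 2), key=lambda pair: avg(*pair))
        label = f"({a},{b})"
        trees[label] = (trees.pop(a), trees.pop(b))
        members[label] = members.pop(a) + members.pop(b)
    return next(iter(trees.values()))

seqs = {"s1": "ACGTAC", "s2": "ACGTAA", "s3": "TTGCAA", "s4": "TTGCAT"}
print(edit_distance("kitten", "sitting"))  # 3
print(guide_tree(seqs))                    # (('s1', 's2'), ('s3', 's4'))
```

The nested tuples give the order for step 3: align s1 with s2 and s3 with s4, then align the two resulting alignments.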
15
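Two alignments (or a sequence and an alignment) can be aligned with a fairly natural extension of the dynamic programming method. Insertions and deletions are handled in the usual way. The cost of matching two columns is the average cost over the different letter combinations. Gaps need a special cost.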
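The extension described above can be sketched as Needleman-Wunsch over columns instead of letters. Two columns are scored by averaging the pair scores of every letter combination, and inserting a column of gaps costs a fixed penalty. The scores (match 2, mismatch -1, residue against gap -2, gap column -2) are toy assumptions. CLUSTALW uses substitution matrices, sequence weights and position-specific gap penalties instead.

```python
GAP = "-"

def pair_score(x: str, y: str, match=2, mismatch=-1, gap_residue=-2) -> float:
    """Score of one letter pair; the values are toy assumptions, not CLUSTALW defaults."""
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return gap_residue
    return match if x == y else mismatch

def column_score(col_a, col_b) -> float:
    """Average pair score over all combinations of a letter from col_a with a letter from col_b."""
    return sum(pair_score(x, y) for x in col_a for y in col_b) / (len(col_a) * len(col_b))

def align_alignments(aln_a: list[str], aln_b: list[str], gap: float = -2) -> list[str]:
    """Global (Needleman-Wunsch) alignment of two alignments, treating each column as one unit."""
    cols_a, cols_b = list(zip(*aln_a)), list(zip(*aln_b))
    n, m = len(cols_a), len(cols_b)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + gap
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]),
                          S[i - 1][j] + gap,    # gap column inserted into B
                          S[i][j - 1] + gap)    # gap column inserted into A
    out_a, out_b = [], []   # columns collected in reverse order during traceback
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]):
            out_a.append(cols_a[i - 1]); out_b.append(cols_b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(cols_a[i - 1]); out_b.append((GAP,) * len(aln_b)); i -= 1
        else:
            out_a.append((GAP,) * len(aln_a)); out_b.append(cols_b[j - 1]); j -= 1
    rows_a = ["".join(col[r] for col in reversed(out_a)) for r in range(len(aln_a))]
    rows_b = ["".join(col[r] for col in reversed(out_b)) for r in range(len(aln_b))]
    return rows_a + rows_b

print(align_alignments(["ACGT", "ACGT"], ["AGT"]))  # ['ACGT', 'ACGT', 'A-GT']
```

Gaps already present in either input are kept. This is the "once a gap, always a gap" behaviour of progressive alignment.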
16
CLUSTALW algorithm
Fine points
• Sequences receive different weights, so that when several sequences are very similar, the relative weight of each is reduced.
• A special method sets the cost of inserting gaps.
Disadvantages
• Because the construction is irreversible, the result depends heavily on the quality of the alignments of the first pairs.
• Once a gap, always a gap: gaps never disappear.
• High sensitivity to the gap penalty.
Advantages
• Fast
• Reasonable results
• Many parameters can be set
• Everyone uses it
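The weighting idea in the first fine point can be illustrated with a small sketch. CLUSTALW derives its weights from the branch lengths of the guide tree. This sketch instead uses a different, well-known scheme, the position-based weights of Henikoff & Henikoff (1994), because it fits in a few lines. It shows the same effect: near-duplicate sequences share their weight.

```python
from collections import Counter

def henikoff_weights(msa: list[str]) -> list[float]:
    """Position-based sequence weights (Henikoff & Henikoff, 1994).

    In a column with k distinct characters, a row whose character appears n times
    in that column gets 1 / (k * n). Contributions are summed over columns and
    normalised so that the weights add up to 1."""
    raw = [0.0] * len(msa)
    for column in zip(*msa):
        counts = Counter(column)
        k = len(counts)
        for row, ch in enumerate(column):
            raw[row] += 1.0 / (k * counts[ch])
    total = sum(raw)
    return [w / total for w in raw]

# Two identical rows share the weight that a single distinct row receives on its own.
print(henikoff_weights(["AAAA", "AAAA", "CCCC"]))  # [0.25, 0.25, 0.5]
```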
17
CLUSTALW algorithm
• For any two sequences in the multiple alignment, we can deduce a pairwise alignment (the projected pairwise alignment).
• The projected pairwise alignment is NOT necessarily the best pairwise alignment of the two sequences.
• CLUSTALW is not an optimal algorithm: better alignments might exist. It yields a plausible alignment, but not necessarily the best one.
(Figure: best (optimal) pairwise alignment vs. projected pairwise alignment)
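A projected pairwise alignment is obtained by keeping two rows of the MSA and deleting every column in which both of them have a gap. A minimal sketch follows; the function name is illustrative. The example reuses the three sequences from the sum-of-pairs example. Comparing the output with an optimal pairwise alignment of the same two sequences is how the difference described above would show up.

```python
def project_pair(msa: list[str], i: int, j: int, gap: str = "-") -> tuple[str, str]:
    """Projected pairwise alignment of rows i and j: keep their two rows from the MSA
    and drop every column in which both rows contain a gap."""
    kept = [(x, y) for x, y in zip(msa[i], msa[j]) if not (x == gap and y == gap)]
    return "".join(x for x, _ in kept), "".join(y for _, y in kept)

msa = ["-SCGPFIRV", "MSCGPGLRA", "-SCTPHL-A"]
print(project_pair(msa, 0, 2))  # ('SCGPFIRV', 'SCTPHL-A')
```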
18
ClustalW at EMBL: http://www.ebi.ac.uk/clustalw
ClustalW at the SRS site at EBI
19
ClustalW output: Aln format
20
MSA editing: Jalview
Conservation
www.es.embnet.org/Services/MolBio/jalview/index.html
21
MSA formats - FASTA
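In FASTA format each sequence starts with a ">" header line. In an aligned FASTA file the rows also carry "-" for gaps and all have the same length. A minimal reader, written as an illustration (the function name is hypothetical). It is not a complete parser: residue letters are not validated, and lines before the first header are ignored.

```python
def read_aligned_fasta(text: str) -> list[tuple[str, str]]:
    """Parse aligned FASTA: '>' header lines followed by (possibly wrapped) sequence lines.

    The record name is the first word of the header; every row must have the same length."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    if len({len(seq) for _, seq in records}) > 1:
        raise ValueError("rows of an alignment must all have the same length")
    return records

example = """>seq1 first row
MSC-PK
IRG
>seq2
MS-LPL
LRN
"""
print(read_aligned_fasta(example))  # [('seq1', 'MSC-PKIRG'), ('seq2', 'MS-LPLLRN')]
```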
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a: a good MSA
25
Example 1b: making an MSA of distantly related proteins
26
Example 1c: including more distant relatives in the MSA
27
Example 2: Isopenicillin N Synthase
• Mononuclear iron proteins - electron carrier proteins. Iron atoms are bound to amino acid side chains
• In IPNS the metal ion is coordinated by three protein residues
• IPNS is involved in the biosynthesis of penicillin
(Figure: IPNS converts ACV to isopenicillin N in the presence of Fe2+ and ascorbate, consuming O2 and releasing 2 H2O. In the active site, the Fe is bound by His212, His268 and Asp214, together with the sulfur of ACV, O2 (or NO) and H2O)
28
Research: IPNS
• Goal: identify the Fe2+-binding residues
• Possible solutions:
1. In the lab
2. Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS sequences
Implementation
1. Obtain the sequences (e.g., from NCBI), for example with the query: IPNS AND Bacteria[Organism]
2. Run an MSA (ClustalW) and search for conserved residues in the MSA
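Searching an MSA for conserved residues can be sketched as follows. The sketch lists the columns that are identical in every row and numbers them along one reference sequence, as residue numbers such as His212 are. The toy alignment and the function name are hypothetical. The default restriction to His and Asp is an assumption chosen because those are the candidate Fe2+ ligands in this example.

```python
def conserved_positions(msa: list[str], ref: int = 0, residues: str = "HD", gap: str = "-"):
    """Columns in which every row has the same residue, restricted to `residues`.

    Returns (position, residue) pairs, numbering positions along the ungapped reference row."""
    hits, pos = [], 0
    for column in zip(*msa):
        if column[ref] != gap:
            pos += 1                      # advance the reference numbering
        first = column[0]
        if first != gap and first in residues and all(ch == first for ch in column):
            hits.append((pos, first))
    return hits

toy = ["MH-DAHK", "MHQDSHR", "LH-DAHK"]
print(conserved_positions(toy))  # [(2, 'H'), (3, 'D'), (5, 'H')]
```

With very similar sequences almost every column is conserved, so the list is long. That is the "not enough variation" problem of the following slides.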
30
31
32
MSA - bacteria only
Not enough variation
33
MSA - bacteria & fungi
Not enough variation
(Figure panels: bacteria & fungi; bacteria)
34
Step 2
Goal: add more enzymes similar to IPNS
Implementation
• Search at http://www.expasy.org/tools/blast
• Run BLAST with IPNS_CEPAC as the query
• Select sequences that are similar to the query along its entire length
• Export them in FASTA format
• Run CLUSTALW
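The slide uses the ExPASy web BLAST. Suppose instead that the hits were saved in BLAST+ tabular output (`-outfmt 6`). Then "similar along the entire length" can be approximated by how much of the query each hit covers. The 90% threshold, the hit lines and the function name below are hypothetical. The column order is the 12-column default of that format.

```python
def full_length_hits(lines: list[str], query_length: int, min_coverage: float = 0.9) -> list[str]:
    """Subject IDs whose hit covers at least `min_coverage` of the query.

    Expects the 12 default columns of BLAST+ tabular output (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore."""
    keep = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        subject, qstart, qend = fields[1], int(fields[6]), int(fields[7])
        coverage = (abs(qend - qstart) + 1) / query_length
        if coverage >= min_coverage and subject not in keep:
            keep.append(subject)
    return keep

hits = [
    "query\thitA\t78.2\t331\t70\t1\t1\t331\t5\t335\t1e-150\t540",
    "query\thitB\t35.0\t100\t60\t2\t120\t219\t10\t109\t1e-08\t60",
]
print(full_length_hits(hits, query_length=331))  # ['hitA']
```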
35
• The new multiple alignment narrows down the possibilities
36
Simple multiple alignment
• The known IPNS sequences are very similar
• Sequences of closely related enzymes are also quite similar
• There is not enough variability to pinpoint the active-site residues
• We need to obtain even more distant sequences (distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1. Obtain an MSA (ClustalW)
2. Either construct a consensus sequence and perform a new search,
   or construct a profile and perform a new search
3. Run an MSA (ClustalW) and search for conserved residues in the MSA
38
Consensus sequence
• A consensus sequence can be deduced from the multiple sequence alignment: at each column it holds the most frequent character of that column.
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T   (consensus)
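The consensus rule is a one-liner in Python. `collections.Counter.most_common` orders equal counts by first occurrence, so ties go to the character seen first in the column. That tie rule is an assumption, since the slide does not specify one.

```python
from collections import Counter

def consensus(msa: list[str]) -> str:
    """Most frequent character of each column; ties go to the character seen first in that column."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*msa))

print(consensus(["ATCTTGT", "AACTTGT", "AACTTCT"]))  # AACTTGT
```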
39
Profile
• A statistical model describing the multiple sequence alignment can be deduced from it. A profile holds, for each column, the frequency of every character in that column.
A T C T T G T
A A C T T G T
A A C T T C T
Position  1     2     3     4     5     6     7
A         1     0.67  0     0     0     0     0
T         0     0.33  0     1     1     0     1
C         0     0     1     0     0     0.33  0
G         0     0     0     0     0     0.67  0
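A profile is a table of per-column frequencies. The sketch below computes it for the example alignment; the function name and the A, T, C, G row order are choices made to match the table. Applied to two alignments that share a consensus, it gives different tables, which is the point of the next two slides.

```python
from collections import Counter

def profile(msa: list[str], alphabet: str = "ATCG") -> dict[str, list[float]]:
    """Frequency of each alphabet letter in every column of the alignment."""
    columns = [Counter(column) for column in zip(*msa)]
    return {ch: [counts[ch] / len(msa) for counts in columns] for ch in alphabet}

prof = profile(["ATCTTGT", "AACTTGT", "AACTTCT"])
for ch, row in prof.items():
    print(ch, [round(f, 2) for f in row])
# A [1.0, 0.67, 0.0, 0.0, 0.0, 0.0, 0.0]
# T [0.0, 0.33, 0.0, 1.0, 1.0, 0.0, 1.0]
# C [0.0, 0.0, 1.0, 0.0, 0.0, 0.33, 0.0]
# G [0.0, 0.0, 0.0, 0.0, 0.0, 0.67, 0.0]
```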
40
Profile vs. consensus
• The following multiple alignments have the same consensus:
Alignment 1
A A C T T G C
A A G T C G T
C A C T T C T
Alignment 2
A A C T T G T
A A C T T G T
A A C T T C T
Consensus of both
A A C T T G T
41
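Profile vs. consensus
• But they have different profiles:
Alignment 1
A A C T T G C
A A G T C G T
C A C T T C T
Alignment 2
A A C T T G T
A A C T T G T
A A C T T C T
Profile of alignment 1
Position  1     2     3     4     5     6     7
A         0.67  1     0     0     0     0     0
T         0     0     0     1     0.67  0     0.67
C         0.33  0     0.67  0     0.33  0.33  0.33
G         0     0     0.33  0     0     0.67  0
Profile of alignment 2
Position  1     2     3     4     5     6     7
A         1     1     0     0     0     0     0
T         0     0     0     1     1     0     1
C         0     0     1     0     0     0.33  0
G         0     0     0     0     0     0.67  0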
42
Sequence LOGO
A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C - C G C - T
(Figure: the corresponding sequence logo)
http://weblogo.berkeley.edu
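In a sequence logo, the height of each column is its information content in bits, and each letter gets a share proportional to its frequency. A sketch applied to the six rows above follows. It assumes a 4-letter DNA alphabet and skips gaps when counting. It also omits the small-sample correction that WebLogo can apply, so its heights will not match WebLogo output exactly.

```python
import math
from collections import Counter

def logo_heights(msa: list[str], alphabet_size: int = 4, gap: str = "-") -> list[dict[str, float]]:
    """Letter heights (bits) for each column of a sequence logo.

    Information content R = log2(alphabet_size) - H, where H is the Shannon entropy of the
    column; each letter's height is its frequency times R. Gaps are skipped and no
    small-sample correction is applied."""
    heights = []
    for column in zip(*msa):
        letters = [ch.upper() for ch in column if ch != gap]
        counts = Counter(letters)
        n = len(letters)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info = math.log2(alphabet_size) - entropy
        heights.append({ch: (c / n) * info for ch, c in counts.items()})
    return heights

rows = ["AACCGCTCTT", "AGCCGCGC-T", "A-CAGAGCCT", "AAGCACGC-T", "ACGGGTGCTT", "ATGC-CGC-T"]
h = logo_heights(rows)
print(round(sum(h[0].values()), 2), round(sum(h[9].values()), 2))  # 2.0 2.0
```

Fully conserved columns (1, 8 and 10 here) reach the 2-bit maximum. Highly variable columns such as column 2 are close to zero.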
43
PSI-BLAST
• Position-Specific Iterated BLAST: an automatic, profile-like search
1. Regular BLAST
2. Construct a profile from the BLAST results
3. BLAST search with the profile (steps 2 and 3 are iterated)
4. Final results
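The iterate-build-search loop can be illustrated with a deliberately simplified toy. It is not PSI-BLAST itself. Real PSI-BLAST finds hits with gapped BLAST, weights the sequences, derives pseudocounts from a substitution matrix and includes hits by E-value. This sketch assumes instead that database sequences already have the profile's length and are aligned to it, and it uses a uniform background and a fixed score threshold. All names, the threshold and the toy database are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(msa: list[str], pseudocount: float = 1.0, gap: str = "-") -> list[dict[str, float]]:
    """Log-odds position-specific scoring matrix (bits) against a uniform background.

    freq = (count + pseudocount * bg) / (n + pseudocount); score = log2(freq / bg).
    pseudocount must be > 0 so unseen residues get a finite (negative) score."""
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    bg = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*msa):
        residues = [ch for ch in column if ch != gap]
        counts, n = Counter(residues), len(residues)
        pssm.append({aa: math.log2((counts[aa] + pseudocount * bg) / (n + pseudocount) / bg)
                     for aa in AMINO_ACIDS})
    return pssm

def score(pssm: list[dict[str, float]], seq: str) -> float:
    """Ungapped score of a sequence with the same length as the profile (gaps score 0)."""
    return sum(col.get(aa, 0.0) for col, aa in zip(pssm, seq))

def iterate_search(seed: list[str], database: list[str], threshold: float, max_rounds: int = 5) -> list[str]:
    """Toy PSI-BLAST-like loop: build a profile, add database hits above threshold, repeat."""
    members = list(seed)
    for _ in range(max_rounds):
        pssm = build_pssm(members)
        new = [s for s in database if s not in members and score(pssm, s) >= threshold]
        if not new:          # converged: no new sequences pass the threshold
            break
        members.extend(new)
    return members

seed = ["-SCGPFIRV", "MSCGPGLRA", "-SCTPHL-A"]
database = ["MSCGPLLRA", "WWWWWWWWW"]
print(iterate_search(seed, database, threshold=5.0))
```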
44
• Alignment with distantly related proteins
45
• Experimental evidence supports the conclusion that His212, Asp214 and His268 are the endogenous ligands that bind Fe2+ in IPNS.
Isopenicillin N Synthase
Enzyme      Relative activity   Km (mM)   kcat (min-1)   kcat/Km (mM-1 min-1)
Wild type   100                 0.4       38.8           96.9
His48Ala    16                  0.56      7.5            13.4
His63Ala    31                  1.0       14.2           14.2
His114Ala   28                  0.85      12.5           14.7
His124Ala   48                  0.84      32.1           38.1
His135Ala   22                  0.59      11.7           19.8
His212Ala   <0.007              nd        nd
His268Ala   <0.003              nd        nd
Asp14Ala    5                   0.86      0.56           0.7
Asp113Ala   63                  0.45      23.8           52.8
Asp131Ala   68                  0.48      36.3           75.5
Asp203Ala   32                  0.91      12.3           13.5
Asp214Ala   <0.004              nd        nd
(nd = not determined)
46
IPNS
47
48
9
Scoring metricsbull Distance from Consensus - The consensus of an alignment
is a string of the most common characters in each column of the alignment The total distance between the strings is defined as the number of characters that differ from the consensus character of their column let C be the consensus sequence then the total distance is sum_i D(Si C)
bull Evolutionary Tree Alignment - The weight of the lightest evolutionary tree that can be constructed from the sequences with the weight of the tree defined as the number of changes between pairs of sequences that correspond to two adjacent nodes in the tree summed over all such pairs
bull Sum of Pairs - The sum of pairwise distances between all pairs of sequences Sum_iltj D(Si Sj)
10
-SCGPFIRVMSCGPGLRA-SCTPHL-A
-SCGPFIRVMSCGPGLRA
-SCGPFIRV-SCTPHL-A
MSCGPGLRA-SCTPHL-A
5 3 5 13
Scoring metrics -examplesSum of pairs
-SCGPFIRVMSCGPGLRA-SCTPHL-AMSC-PKIRGMS-LPLLRNMSHKPALRA
3 5 4 1 6 2 4 6 354סהכ הומוגניות
3 1 2 5 0 4 2 0 42 19סהכ מרחק
Distance from concensus
11
MSA algorithms
bull Progressive methods (CLUSTALWT-Coffee)
bull Iterative methods (Dialign)
bull Direct optimization (monte carlo genetic algorithms)
bull Local methods eMotifs Blocks Psi-blast
12
CLUSTALW algorithm
bull Compare all sequence pairs (pairwise alignment)
bull Generate a hierarchy for alignment (guide tree)
bull Build the multiple alignment step by step according to the guide tree first aligning the most similar pair then add another sequence or another pairwise alignment etc
13
(1) Pairwise alignment (prepare a guide tree)
6 pairwise alignments
then cluster analysis
(2) Multiple alignment following the tree from (1)
successive alignments
CLUSTALW algorithm
14
בונים עץ עי חיבור הרצפים הדומים לפי סדר התאמתם2
Hbb Hbb Hba Hba Myg Human Horse Human Horse Whale
בונים את ההתאמה לפי הסדר המוכתב3 עי העץ
ישנם שלושה מצביםהתאמת זוג רצפים - התאמה בין התאמות-הוספת רצף בודד להתאמה קיימת-
בונים טבלה של מרחק עריכה בין כל שני רצפים1CLUSTALW algorithm
15
ניתן להתאים בין התאמות (או בין רצף להתאמה) עי הרחבה די טבעית של שיטת התיכנות הדינמי הוספה
והכנסת אותיות באופן הרגיל המחיר להתאמת אותיות הוא מחיר ממוצע לקומבינציות השונות צריך לתת מחיר מיוחד
לרווח
16
נקודות עדינות
נותנים משקלות שונים לרצפים כך שאם יש מספר רצפים מאד דומים bullהמשקל היחסי של כל אחד יקטן
שיטה מיוחדת לקביעת מחיר להכנסת רווחיםbull
חסרונותמאחר שהבניה היא בלתי הפיכה יש תלות גדולה מאד באיכות ההתאמה bull
של הזוגות הראשוניםOnce a Gap Always a gapרווחים אינם נעלמים bullרגישות רבה לקנס הרווחיםbull
יתרונותמהירbullסבירbullאפשר לקבוע הרבה פרמטריםbullכולם משתמשיםbull
CLUSTALW algorithm
17
bull We can deduce a pairwise alignment for each two sequences in the multiple alignment (projected pairwise alignment)
bull The projected pairwise alignment is NOT the best pairwise alignment for the two sequences
bull CLUTALW is not an optimal algorithm Better alignments might exist The algorithm yields a possible alignment but not necessarily the best one
Best Pairwise alignment (optimal)
Projected Pairwise alignment
CLUSTALW algorithm
18
ClustalW at EMBLhttpwwwebiacukclustalw Clustalw at the SRS site at EBI
19
ClustalW Output Aln format
20
MSA Editing Jalview
Conservation
wwwesembnetorgServicesMolBiojalviewindexhtml
21
MSA formats - fasta
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
10
-SCGPFIRVMSCGPGLRA-SCTPHL-A
-SCGPFIRVMSCGPGLRA
-SCGPFIRV-SCTPHL-A
MSCGPGLRA-SCTPHL-A
5 3 5 13
Scoring metrics -examplesSum of pairs
-SCGPFIRVMSCGPGLRA-SCTPHL-AMSC-PKIRGMS-LPLLRNMSHKPALRA
3 5 4 1 6 2 4 6 354סהכ הומוגניות
3 1 2 5 0 4 2 0 42 19סהכ מרחק
Distance from concensus
11
MSA algorithms
bull Progressive methods (CLUSTALWT-Coffee)
bull Iterative methods (Dialign)
bull Direct optimization (monte carlo genetic algorithms)
bull Local methods eMotifs Blocks Psi-blast
12
CLUSTALW algorithm
bull Compare all sequence pairs (pairwise alignment)
bull Generate a hierarchy for alignment (guide tree)
bull Build the multiple alignment step by step according to the guide tree first aligning the most similar pair then add another sequence or another pairwise alignment etc
13
(1) Pairwise alignment (prepare a guide tree)
6 pairwise alignments
then cluster analysis
(2) Multiple alignment following the tree from (1)
successive alignments
CLUSTALW algorithm
14
בונים עץ עי חיבור הרצפים הדומים לפי סדר התאמתם2
Hbb Hbb Hba Hba Myg Human Horse Human Horse Whale
בונים את ההתאמה לפי הסדר המוכתב3 עי העץ
ישנם שלושה מצביםהתאמת זוג רצפים - התאמה בין התאמות-הוספת רצף בודד להתאמה קיימת-
בונים טבלה של מרחק עריכה בין כל שני רצפים1CLUSTALW algorithm
15
ניתן להתאים בין התאמות (או בין רצף להתאמה) עי הרחבה די טבעית של שיטת התיכנות הדינמי הוספה
והכנסת אותיות באופן הרגיל המחיר להתאמת אותיות הוא מחיר ממוצע לקומבינציות השונות צריך לתת מחיר מיוחד
לרווח
16
נקודות עדינות
נותנים משקלות שונים לרצפים כך שאם יש מספר רצפים מאד דומים bullהמשקל היחסי של כל אחד יקטן
שיטה מיוחדת לקביעת מחיר להכנסת רווחיםbull
חסרונותמאחר שהבניה היא בלתי הפיכה יש תלות גדולה מאד באיכות ההתאמה bull
של הזוגות הראשוניםOnce a Gap Always a gapרווחים אינם נעלמים bullרגישות רבה לקנס הרווחיםbull
יתרונותמהירbullסבירbullאפשר לקבוע הרבה פרמטריםbullכולם משתמשיםbull
CLUSTALW algorithm
17
bull We can deduce a pairwise alignment for each two sequences in the multiple alignment (projected pairwise alignment)
bull The projected pairwise alignment is NOT the best pairwise alignment for the two sequences
bull CLUTALW is not an optimal algorithm Better alignments might exist The algorithm yields a possible alignment but not necessarily the best one
Best Pairwise alignment (optimal)
Projected Pairwise alignment
CLUSTALW algorithm
18
ClustalW at EMBLhttpwwwebiacukclustalw Clustalw at the SRS site at EBI
19
ClustalW Output Aln format
20
MSA Editing Jalview
Conservation
wwwesembnetorgServicesMolBiojalviewindexhtml
21
MSA formats - fasta
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
11
MSA algorithms
bull Progressive methods (CLUSTALWT-Coffee)
bull Iterative methods (Dialign)
bull Direct optimization (monte carlo genetic algorithms)
bull Local methods eMotifs Blocks Psi-blast
12
CLUSTALW algorithm
bull Compare all sequence pairs (pairwise alignment)
bull Generate a hierarchy for alignment (guide tree)
bull Build the multiple alignment step by step according to the guide tree first aligning the most similar pair then add another sequence or another pairwise alignment etc
13
(1) Pairwise alignment (prepare a guide tree)
6 pairwise alignments
then cluster analysis
(2) Multiple alignment following the tree from (1)
successive alignments
CLUSTALW algorithm
14
בונים עץ עי חיבור הרצפים הדומים לפי סדר התאמתם2
Hbb Hbb Hba Hba Myg Human Horse Human Horse Whale
בונים את ההתאמה לפי הסדר המוכתב3 עי העץ
ישנם שלושה מצביםהתאמת זוג רצפים - התאמה בין התאמות-הוספת רצף בודד להתאמה קיימת-
בונים טבלה של מרחק עריכה בין כל שני רצפים1CLUSTALW algorithm
15
ניתן להתאים בין התאמות (או בין רצף להתאמה) עי הרחבה די טבעית של שיטת התיכנות הדינמי הוספה
והכנסת אותיות באופן הרגיל המחיר להתאמת אותיות הוא מחיר ממוצע לקומבינציות השונות צריך לתת מחיר מיוחד
לרווח
16
נקודות עדינות
נותנים משקלות שונים לרצפים כך שאם יש מספר רצפים מאד דומים bullהמשקל היחסי של כל אחד יקטן
שיטה מיוחדת לקביעת מחיר להכנסת רווחיםbull
חסרונותמאחר שהבניה היא בלתי הפיכה יש תלות גדולה מאד באיכות ההתאמה bull
של הזוגות הראשוניםOnce a Gap Always a gapרווחים אינם נעלמים bullרגישות רבה לקנס הרווחיםbull
יתרונותמהירbullסבירbullאפשר לקבוע הרבה פרמטריםbullכולם משתמשיםbull
CLUSTALW algorithm
17
bull We can deduce a pairwise alignment for each two sequences in the multiple alignment (projected pairwise alignment)
bull The projected pairwise alignment is NOT the best pairwise alignment for the two sequences
bull CLUTALW is not an optimal algorithm Better alignments might exist The algorithm yields a possible alignment but not necessarily the best one
Best Pairwise alignment (optimal)
Projected Pairwise alignment
CLUSTALW algorithm
18
ClustalW at EMBLhttpwwwebiacukclustalw Clustalw at the SRS site at EBI
19
ClustalW Output Aln format
20
MSA Editing Jalview
Conservation
wwwesembnetorgServicesMolBiojalviewindexhtml
21
MSA formats - fasta
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
12
CLUSTALW algorithm
bull Compare all sequence pairs (pairwise alignment)
bull Generate a hierarchy for alignment (guide tree)
bull Build the multiple alignment step by step according to the guide tree first aligning the most similar pair then add another sequence or another pairwise alignment etc
13
(1) Pairwise alignment (prepare a guide tree)
6 pairwise alignments
then cluster analysis
(2) Multiple alignment following the tree from (1)
successive alignments
CLUSTALW algorithm
14
בונים עץ עי חיבור הרצפים הדומים לפי סדר התאמתם2
Hbb Hbb Hba Hba Myg Human Horse Human Horse Whale
בונים את ההתאמה לפי הסדר המוכתב3 עי העץ
ישנם שלושה מצביםהתאמת זוג רצפים - התאמה בין התאמות-הוספת רצף בודד להתאמה קיימת-
בונים טבלה של מרחק עריכה בין כל שני רצפים1CLUSTALW algorithm
15
ניתן להתאים בין התאמות (או בין רצף להתאמה) עי הרחבה די טבעית של שיטת התיכנות הדינמי הוספה
והכנסת אותיות באופן הרגיל המחיר להתאמת אותיות הוא מחיר ממוצע לקומבינציות השונות צריך לתת מחיר מיוחד
לרווח
16
נקודות עדינות
נותנים משקלות שונים לרצפים כך שאם יש מספר רצפים מאד דומים bullהמשקל היחסי של כל אחד יקטן
שיטה מיוחדת לקביעת מחיר להכנסת רווחיםbull
חסרונותמאחר שהבניה היא בלתי הפיכה יש תלות גדולה מאד באיכות ההתאמה bull
של הזוגות הראשוניםOnce a Gap Always a gapרווחים אינם נעלמים bullרגישות רבה לקנס הרווחיםbull
יתרונותמהירbullסבירbullאפשר לקבוע הרבה פרמטריםbullכולם משתמשיםbull
CLUSTALW algorithm
17
bull We can deduce a pairwise alignment for each two sequences in the multiple alignment (projected pairwise alignment)
bull The projected pairwise alignment is NOT the best pairwise alignment for the two sequences
bull CLUTALW is not an optimal algorithm Better alignments might exist The algorithm yields a possible alignment but not necessarily the best one
Best Pairwise alignment (optimal)
Projected Pairwise alignment
CLUSTALW algorithm
18
ClustalW at EMBLhttpwwwebiacukclustalw Clustalw at the SRS site at EBI
19
ClustalW Output Aln format
20
MSA Editing Jalview
Conservation
wwwesembnetorgServicesMolBiojalviewindexhtml
21
MSA formats - fasta
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
Isopenicillin N Synthase
• Experimental evidence supports the finding that His212, Asp214 and His268 are the endogenous ligands that bind Fe2+ in IPNS.
Enzyme       Relative       Km      kcat       kcat/Km
             activity (%)   (mM)    (min-1)    (mM-1 min-1)
Wild type    100            0.4     38.8       96.9
His48Ala     16             0.56    7.5        13.4
His63Ala     31             1.0     14.2       14.2
His114Ala    28             0.85    12.5       14.7
His124Ala    48             0.84    32.1       38.1
His135Ala    22             0.59    11.7       19.8
His212Ala    <0.007         nd      nd         nd
His268Ala    <0.003         nd      nd         nd
Asp14Ala     5              0.86    0.56       0.7
Asp113Ala    63             0.45    23.8       52.8
Asp131Ala    68             0.48    36.3       75.5
Asp203Ala    32             0.91    12.3       13.5
Asp214Ala    <0.004         nd      nd         nd
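The kcat/Km column is kcat divided by Km. This minimal sketch recomputes it from the Km and kcat values in the table above (as transcribed, with decimal points restored). It then lists the mutants whose relative activity falls below 1%. The 1% cutoff is an arbitrary threshold chosen for illustration, not one taken from the study. Upper bounds such as "<0.007" are stored as the bound itself.

```python
# (enzyme, relative activity %, Km in mM, kcat in 1/min); None marks "nd".
rows = [
    ("Wild type", 100, 0.4, 38.8),
    ("His48Ala", 16, 0.56, 7.5),
    ("His63Ala", 31, 1.0, 14.2),
    ("His114Ala", 28, 0.85, 12.5),
    ("His124Ala", 48, 0.84, 32.1),
    ("His135Ala", 22, 0.59, 11.7),
    ("His212Ala", 0.007, None, None),
    ("His268Ala", 0.003, None, None),
    ("Asp14Ala", 5, 0.86, 0.56),
    ("Asp113Ala", 63, 0.45, 23.8),
    ("Asp131Ala", 68, 0.48, 36.3),
    ("Asp203Ala", 32, 0.91, 12.3),
    ("Asp214Ala", 0.004, None, None),
]


def catalytic_efficiency(km, kcat):
    """kcat/Km in mM^-1 min^-1, or None when either constant was not determined."""
    if km is None or kcat is None:
        return None
    return kcat / km


# Mutations that (nearly) abolish activity point to candidate Fe2+ ligands.
inactive = [name for name, activity, _, _ in rows if activity < 1]

for name, activity, km, kcat in rows:
    eff = catalytic_efficiency(km, kcat)
    print(f"{name:10s} {activity:>7}  " + ("nd" if eff is None else f"{eff:.1f}"))
print("Activity < 1%:", inactive)
```

The recomputed ratios match the published column to within rounding (for example 97.0 versus 96.9 for the wild type).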
46
IPNS
47
48
13
CLUSTALW algorithm
(1) Pairwise alignment (prepare a guide tree): 6 pairwise alignments, then cluster analysis
(2) Multiple alignment following the tree from (1): successive alignments
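For four sequences, step (1) needs 4 × 3 / 2 = 6 pairwise comparisons. This sketch computes a plain Levenshtein edit distance for every pair. It uses the ungapped versions of the first four sequences from this lesson's example alignment. ClustalW itself derives its distances from alignment scores or identities, so this stands in for that step and does not reproduce it.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance with unit costs for substitution, insertion, deletion."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


seqs = {
    "seq1": "VTISCTGSSSNIGAGNHVKWYQQLPG",
    "seq2": "VTISCTGTSSNIGSITVNWYQQLPG",
    "seq3": "LRLSCSSSGFIFSSYAMYWVRQAPG",
    "seq4": "LSLTCTVSGTSFDDYYSTWVRQPPG",
}
distances = {(x, y): edit_distance(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}
for pair, d in distances.items():
    print(pair, d)
print(len(distances), "pairwise distances")  # 6 pairwise distances
```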
14
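CLUSTALW algorithm
1. Build a table of the edit distance between every two sequences.
2. Build a tree by joining the most similar sequences, in order of their similarity.
[Guide tree leaves: Hbb Human, Hbb Horse, Hba Human, Hba Horse, Myg Whale]
3. Build the alignment in the order dictated by the tree. There are three cases:
• aligning a pair of sequences
• aligning two alignments
• adding a single sequence to an existing alignment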
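Step 2 can be illustrated with a simple clustering that repeatedly joins the closest pair. This sketch uses UPGMA (average linkage) for brevity. ClustalW itself builds its guide tree with neighbor-joining. The leaf names mirror the tree on the slide, but the distance values are hypothetical numbers chosen only for the example. The returned list gives the order in which clusters would be aligned in step 3.

```python
def upgma_merge_order(dist):
    """Return clusters in the order UPGMA joins them.

    `dist` maps frozenset({x, y}) -> distance between leaves x and y.
    """
    size = {leaf: 1 for pair in dist for leaf in pair}
    d = dict(dist)
    merges = []
    while len(size) > 1:
        pair = min(d, key=lambda p: (d[p], sorted(p)))  # smallest distance, ties by name
        a, b = sorted(pair)
        new = f"({a},{b})"
        merges.append(new)
        # average-linkage distance from the new cluster to every remaining cluster
        for c in size:
            if c not in pair:
                d[frozenset({new, c})] = (
                    size[a] * d[frozenset({a, c})] + size[b] * d[frozenset({b, c})]
                ) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
        d = {p: v for p, v in d.items() if a not in p and b not in p}
    return merges


raw = {  # hypothetical distances
    ("Hbb_Human", "Hbb_Horse"): 0.17,
    ("Hbb_Human", "Hba_Human"): 0.59,
    ("Hbb_Human", "Hba_Horse"): 0.59,
    ("Hbb_Horse", "Hba_Human"): 0.60,
    ("Hbb_Horse", "Hba_Horse"): 0.59,
    ("Hba_Human", "Hba_Horse"): 0.13,
    ("Myg_Whale", "Hbb_Human"): 0.77,
    ("Myg_Whale", "Hbb_Horse"): 0.77,
    ("Myg_Whale", "Hba_Human"): 0.75,
    ("Myg_Whale", "Hba_Horse"): 0.75,
}
merges = upgma_merge_order({frozenset(k): v for k, v in raw.items()})
for step, cluster in enumerate(merges, start=1):
    print(step, cluster)
```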
15
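Two alignments (or a sequence and an alignment) can be aligned by a fairly natural extension of the dynamic programming method. Insertions and deletions of characters are handled in the usual way. The cost of matching two columns is the average cost over the different character combinations. Gaps must be given a special cost.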
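This is a minimal sketch of that extension. Needleman-Wunsch runs over columns instead of characters, and two columns score the average substitution score over all character pairs drawn from them. It assumes a toy +1/-1 substitution score and a plain linear gap penalty. ClustalW's real substitution matrices, sequence weights and special gap costs are not modelled. A single sequence is treated as a one-row alignment.

```python
from itertools import product


def column_score(col_a, col_b, sub):
    """Average substitution score over every character pair drawn from two columns."""
    pairs = list(product(col_a, col_b))
    return sum(sub(x, y) for x, y in pairs) / len(pairs)


def simple_sub(x, y):
    """Toy scores (assumption): +1 for identical characters, -1 otherwise."""
    return 1 if x == y else -1


def align_alignments(aln_a, aln_b, sub=simple_sub, gap=-2.0):
    """Needleman-Wunsch over columns with a linear gap penalty."""
    cols_a, cols_b = list(zip(*aln_a)), list(zip(*aln_b))
    n, m = len(cols_a), len(cols_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1], sub),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back, inserting all-gap columns where one side has no partner.
    gap_a, gap_b = ("-",) * len(aln_a), ("-",) * len(aln_b)
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + column_score(
            cols_a[i - 1], cols_b[j - 1], sub
        ):
            out_a.append(cols_a[i - 1])
            out_b.append(cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(cols_a[i - 1])
            out_b.append(gap_b)
            i -= 1
        else:
            out_a.append(gap_a)
            out_b.append(cols_b[j - 1])
            j -= 1
    columns = [a + b for a, b in zip(reversed(out_a), reversed(out_b))]
    return ["".join(row) for row in zip(*columns)]


print(align_alignments(["ACGT", "ACGT"], ["AGT"]))  # ['ACGT', 'ACGT', 'A-GT']
```

Note that gaps already present in an input alignment are kept as they are. This is the "once a gap, always a gap" behaviour of progressive alignment.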
16
CLUSTALW algorithm
Fine points
• Sequences are given different weights, so that when there are several very similar sequences, the relative weight of each one is reduced.
• A special method is used to set the cost of inserting gaps.
Disadvantages
• Since the construction is irreversible, the result depends heavily on the quality of the alignment of the first pairs.
• Once a gap, always a gap: gaps never disappear.
• High sensitivity to the gap penalties.
Advantages
• Fast
• Reasonable results
• Many parameters can be set
• Everyone uses it
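The weighting idea can be shown with a simple scheme. This sketch uses the position-based weights of Henikoff and Henikoff, which is not ClustalW's own method; ClustalW derives its weights from guide-tree branch lengths. In each column, a sequence earns 1/(r·s), where r is the number of distinct characters in the column and s is the number of sequences sharing its character. Gap characters are treated like any other character here, which is a simplification.

```python
from collections import Counter


def henikoff_weights(alignment):
    """Position-based sequence weights, normalized to sum to 1."""
    weights = [0.0] * len(alignment)
    for col in zip(*alignment):
        counts = Counter(col)
        r = len(counts)  # distinct characters in this column
        for k, ch in enumerate(col):
            weights[k] += 1.0 / (r * counts[ch])
    total = sum(weights)
    return [w / total for w in weights]


# Three identical sequences and one unrelated one (toy data).
aln = ["AACTTGT", "AACTTGT", "AACTTGT", "CGGAACA"]
print([round(w, 3) for w in henikoff_weights(aln)])  # [0.167, 0.167, 0.167, 0.5]
```

The three redundant sequences share the weight that a single copy would carry, so together they count no more than the one distinct sequence.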
17
CLUSTALW algorithm
• We can deduce a pairwise alignment for every two sequences in the multiple alignment (the projected pairwise alignment).
• The projected pairwise alignment is NOT necessarily the best pairwise alignment for the two sequences.
• CLUSTALW is not an optimal algorithm: better alignments might exist. The algorithm yields a possible alignment, but not necessarily the best one.
[Figure: best (optimal) pairwise alignment vs. projected pairwise alignment]
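Projecting a pairwise alignment out of an MSA takes two rows and drops every column where both rows have a gap. This is a minimal sketch, assuming '-' is the gap character. Comparing its output with an optimal pairwise alignment of the same two sequences is one way to see the difference described above.

```python
def project_pair(row_a, row_b):
    """Pairwise alignment induced by two MSA rows (gap-gap columns removed)."""
    kept = [(a, b) for a, b in zip(row_a, row_b) if not (a == "-" and b == "-")]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


msa = ["AC-GT", "A--GT", "ACCGT"]
print(project_pair(msa[0], msa[1]))  # ('ACGT', 'A-GT')
```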
18
ClustalW at EMBL: http://www.ebi.ac.uk/clustalw • ClustalW at the SRS site at EBI
19
ClustalW output: Aln format
20
MSA editing: Jalview
Conservation
www.es.embnet.org/Services/MolBio/jalview/index.html
21
MSA formats - fasta
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a: a good MSA
25
Example 1b: making an MSA of distantly related proteins
26
Example 1c: including more distant relatives in the MSA
27
Example 2: Isopenicillin N Synthase
• Mononuclear iron proteins: electron carrier proteins in which iron atoms are bound to amino acid side chains.
• In IPNS the metal ion is coordinated by three protein residues.
• IPNS is involved in the biosynthesis of penicillin.
[Figure: reaction scheme ACV → isopenicillin N (Fe2+, ascorbate, O2 → 2 H2O); in the active site, Fe is bound by His212, His268 and Asp214, with the ACV sulfur, O2 (NO) and H2O as additional ligands]
28
Research: IPNS
• Goal: identify the Fe2+ binding residues
• Possible solutions:
1. In the lab
2. Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS sequences
Implementation
1. Obtain sequences (e.g. from NCBI):
IPNS AND Bacteria[Organism]
2. MSA (ClustalW) and search for conserved residues in the MSA
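Searching an MSA for conserved residues can be automated. The sketch below reports every column where all sequences carry the same residue and no gap. It maps each such column back to a residue number in a chosen reference sequence, which is how a column becomes a label such as "His212". The three sequences are made-up toy data, and `conserved_positions` is a hypothetical helper.

```python
def conserved_positions(msa, ref_index=0):
    """Fully conserved, gap-free columns as (column, residue, ref_position).

    ref_position is the 1-based residue number in msa[ref_index], skipping gaps.
    """
    ref = msa[ref_index]
    result = []
    pos = 0
    for col_idx, col in enumerate(zip(*msa)):
        if ref[col_idx] != "-":
            pos += 1
        if "-" not in col and len(set(col)) == 1:
            result.append((col_idx + 1, col[0], pos))
    return result


toy = ["MH-DAH", "MHKDGH", "LH-DAH"]  # hypothetical aligned sequences
for column, residue, number in conserved_positions(toy):
    print(f"column {column}: {residue}{number}")
```

With closely related sequences almost every column is conserved, so a filter like this returns too many candidates. That is the lack of variation noted in the following slides.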
30
31
32
MSA: bacteria only
Not enough variation
33
MSA: bacteria & fungi
Not enough variation
[Figure labels: bacteria & fungi; bacteria]
34
Step 2
Goal: add more enzymes similar to IPNS
Implementation
• Search at http://www.expasy.org/tools/blast/
• BLAST with IPNS_CEPAC as the query
• Select sequences similar to the query along its entire length
• Export in FASTA format
• Run CLUSTALW
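Exporting the selected hits in FASTA format and reading them back before running the alignment is simple to do by hand. This is a minimal reader and writer, assuming the header is everything after '>' and that sequences may wrap over several lines. The records below are made-up toy data, not real BLAST hits.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_fasta(records, width=60):
    """Format (header, sequence) tuples as FASTA, wrapping lines at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


example = """>query_toy made-up sequence
MKLVSAA
GHDE
>hit_toy another made-up sequence
MKIVSAGHDE
"""
records = parse_fasta(example)
print(write_fasta(records, width=5))
```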
35
• New multiple alignment, narrowing down the possibilities
36
Simple multiple alignment
• The known IPNS sequences are very similar.
• Sequences of closely related enzymes are also quite similar.
• There is not enough variability to identify the active-site residues.
• We need to obtain even more distant sequences (distant homologs).
37
Step 3
Using the results of the MSA for further searches
Implementation
1. Obtain an MSA (ClustalW)
2. Either construct a consensus sequence and perform a new search,
or construct a profile and perform a new search
3. MSA (ClustalW) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
14
בונים עץ עי חיבור הרצפים הדומים לפי סדר התאמתם2
Hbb Hbb Hba Hba Myg Human Horse Human Horse Whale
בונים את ההתאמה לפי הסדר המוכתב3 עי העץ
ישנם שלושה מצביםהתאמת זוג רצפים - התאמה בין התאמות-הוספת רצף בודד להתאמה קיימת-
בונים טבלה של מרחק עריכה בין כל שני רצפים1CLUSTALW algorithm
15
ניתן להתאים בין התאמות (או בין רצף להתאמה) עי הרחבה די טבעית של שיטת התיכנות הדינמי הוספה
והכנסת אותיות באופן הרגיל המחיר להתאמת אותיות הוא מחיר ממוצע לקומבינציות השונות צריך לתת מחיר מיוחד
לרווח
16
נקודות עדינות
נותנים משקלות שונים לרצפים כך שאם יש מספר רצפים מאד דומים bullהמשקל היחסי של כל אחד יקטן
שיטה מיוחדת לקביעת מחיר להכנסת רווחיםbull
חסרונותמאחר שהבניה היא בלתי הפיכה יש תלות גדולה מאד באיכות ההתאמה bull
של הזוגות הראשוניםOnce a Gap Always a gapרווחים אינם נעלמים bullרגישות רבה לקנס הרווחיםbull
יתרונותמהירbullסבירbullאפשר לקבוע הרבה פרמטריםbullכולם משתמשיםbull
CLUSTALW algorithm
17
bull We can deduce a pairwise alignment for each two sequences in the multiple alignment (projected pairwise alignment)
bull The projected pairwise alignment is NOT the best pairwise alignment for the two sequences
bull CLUTALW is not an optimal algorithm Better alignments might exist The algorithm yields a possible alignment but not necessarily the best one
Best Pairwise alignment (optimal)
Projected Pairwise alignment
CLUSTALW algorithm
18
ClustalW at EMBLhttpwwwebiacukclustalw Clustalw at the SRS site at EBI
19
ClustalW Output Aln format
20
MSA Editing Jalview
Conservation
wwwesembnetorgServicesMolBiojalviewindexhtml
21
MSA formats - fasta
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
15
ניתן להתאים בין התאמות (או בין רצף להתאמה) עי הרחבה די טבעית של שיטת התיכנות הדינמי הוספה
והכנסת אותיות באופן הרגיל המחיר להתאמת אותיות הוא מחיר ממוצע לקומבינציות השונות צריך לתת מחיר מיוחד
לרווח
16
נקודות עדינות
נותנים משקלות שונים לרצפים כך שאם יש מספר רצפים מאד דומים bullהמשקל היחסי של כל אחד יקטן
שיטה מיוחדת לקביעת מחיר להכנסת רווחיםbull
חסרונותמאחר שהבניה היא בלתי הפיכה יש תלות גדולה מאד באיכות ההתאמה bull
של הזוגות הראשוניםOnce a Gap Always a gapרווחים אינם נעלמים bullרגישות רבה לקנס הרווחיםbull
יתרונותמהירbullסבירbullאפשר לקבוע הרבה פרמטריםbullכולם משתמשיםbull
CLUSTALW algorithm
17
bull We can deduce a pairwise alignment for each two sequences in the multiple alignment (projected pairwise alignment)
bull The projected pairwise alignment is NOT the best pairwise alignment for the two sequences
bull CLUTALW is not an optimal algorithm Better alignments might exist The algorithm yields a possible alignment but not necessarily the best one
Best Pairwise alignment (optimal)
Projected Pairwise alignment
CLUSTALW algorithm
18
ClustalW at EMBLhttpwwwebiacukclustalw Clustalw at the SRS site at EBI
19
ClustalW Output Aln format
20
MSA Editing Jalview
Conservation
wwwesembnetorgServicesMolBiojalviewindexhtml
21
MSA formats - fasta
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
16
נקודות עדינות
נותנים משקלות שונים לרצפים כך שאם יש מספר רצפים מאד דומים bullהמשקל היחסי של כל אחד יקטן
שיטה מיוחדת לקביעת מחיר להכנסת רווחיםbull
חסרונותמאחר שהבניה היא בלתי הפיכה יש תלות גדולה מאד באיכות ההתאמה bull
של הזוגות הראשוניםOnce a Gap Always a gapרווחים אינם נעלמים bullרגישות רבה לקנס הרווחיםbull
יתרונותמהירbullסבירbullאפשר לקבוע הרבה פרמטריםbullכולם משתמשיםbull
CLUSTALW algorithm
17
bull We can deduce a pairwise alignment for each two sequences in the multiple alignment (projected pairwise alignment)
bull The projected pairwise alignment is NOT the best pairwise alignment for the two sequences
bull CLUTALW is not an optimal algorithm Better alignments might exist The algorithm yields a possible alignment but not necessarily the best one
Best Pairwise alignment (optimal)
Projected Pairwise alignment
CLUSTALW algorithm
18
ClustalW at EMBLhttpwwwebiacukclustalw Clustalw at the SRS site at EBI
19
ClustalW Output Aln format
20
MSA Editing Jalview
Conservation
wwwesembnetorgServicesMolBiojalviewindexhtml
21
MSA formats - fasta
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
17
bull We can deduce a pairwise alignment for each two sequences in the multiple alignment (projected pairwise alignment)
bull The projected pairwise alignment is NOT the best pairwise alignment for the two sequences
bull CLUTALW is not an optimal algorithm Better alignments might exist The algorithm yields a possible alignment but not necessarily the best one
Best Pairwise alignment (optimal)
Projected Pairwise alignment
CLUSTALW algorithm
18
ClustalW at EMBLhttpwwwebiacukclustalw Clustalw at the SRS site at EBI
19
ClustalW Output Aln format
20
MSA Editing Jalview
Conservation
wwwesembnetorgServicesMolBiojalviewindexhtml
21
MSA formats - fasta
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
18
ClustalW at EMBLhttpwwwebiacukclustalw Clustalw at the SRS site at EBI
19
ClustalW Output Aln format
20
MSA Editing Jalview
Conservation
wwwesembnetorgServicesMolBiojalviewindexhtml
21
MSA formats - fasta
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
19
ClustalW Output Aln format
20
MSA Editing Jalview
Conservation
wwwesembnetorgServicesMolBiojalviewindexhtml
21
MSA formats - fasta
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
• The known IPNS sequences are very similar
• Sequences of closely related enzymes are also quite similar
• There is not enough variability to pinpoint the active-site residues
• We need to obtain even more distant sequences (distant homologs)
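One way to make "very similar" concrete is pairwise percent identity over the aligned columns. The sketch below skips any column where either sequence has a gap, which is only one of several conventions in use; the function names and sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


def identity_table(aligned):
    """Percent identity for every pair of named, aligned sequences."""
    return {
        (name1, name2): percent_identity(seq1, seq2)
        for (name1, seq1), (name2, seq2) in combinations(aligned.items(), 2)
    }


# Hypothetical aligned sequences.
aligned = {"seq1": "MHDSGE-K", "seq2": "MHDSGEAK", "seq3": "MHDAGQAR"}
for (n1, n2), pid in identity_table(aligned).items():
    print(f"{n1} vs {n2}: {pid:.1f}%")
```

When every pair scores close to 100%, almost every column is conserved and conservation alone cannot separate the active-site residues from the rest.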
37
Step 3
Using the results of the MSA for further searches
Implementation
1. Obtain an MSA (ClustalW)
2. Either construct a consensus sequence and run a new search with it,
   or construct a profile and run a new search with it
3. Build a new MSA (ClustalW) and search it for conserved residues
38
Consensus Sequence
• A consensus sequence can be deduced from a multiple sequence alignment: at each column it holds the most frequent character found in that column.
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T  (consensus)
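The majority rule can be written directly in Python. A minimal sketch using the three sequences above; the tie-breaking rule (the character seen first in the column wins, because `Counter.most_common` keeps first-encountered order for equal counts) is an assumption, and other tools may resolve ties differently or emit an ambiguity code.

```python
from collections import Counter


def consensus(alignment):
    """Majority-rule consensus: the most frequent character in each column.
    Ties go to the character seen first in the column."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*alignment))


alignment = ["ATCTTGT", "AACTTGT", "AACTTCT"]
print(consensus(alignment))  # AACTTGT
```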
39
Profile
• A profile is a statistical model of the multiple sequence alignment: for each column it records the frequency of every character found at that position.
A T C T T G T
A A C T T G T
A A C T T C T
     1     2     3     4     5     6     7
A    1     0.67  0     0     0     0     0
T    0     0.33  0     1     1     0     1
C    0     0     1     0     0     0.33  0
G    0     0     0     0     0     0.67  0
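The profile table can be computed column by column. This sketch stores raw column frequencies only; profiles used for database searching usually add pseudocounts and convert frequencies to log-odds scores, which is omitted here. The name `profile` is illustrative.

```python
ALPHABET = "ATCG"


def profile(alignment, alphabet=ALPHABET):
    """Position frequency matrix: profile[ch][i] is the fraction of
    sequences with character ch in column i."""
    n = len(alignment)
    columns = list(zip(*alignment))
    return {ch: [col.count(ch) / n for col in columns] for ch in alphabet}


alignment = ["ATCTTGT", "AACTTGT", "AACTTCT"]
for ch, freqs in profile(alignment).items():
    print(ch, " ".join(f"{f:.2f}" for f in freqs))
```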
40
Profile vs Consensus
• The following two multiple alignments have the same consensus:
A A C T T G C
A A G T C G T
C A C T T C T

A A C T T G T
A A C T T G T
A A C T T C T

A A C T T G T  (consensus of both)
41
Profile vs Consensus
• ...but they have different profiles:
A A C T T G C
A A G T C G T
C A C T T C T
     1     2     3     4     5     6     7
A    0.67  1     0     0     0     0     0
T    0     0     0     1     0.67  0     0.67
C    0.33  0     0.67  0     0.33  0.33  0.33
G    0     0     0.33  0     0     0.67  0

A A C T T G T
A A C T T G T
A A C T T C T
     1     2     3     4     5     6     7
A    1     1     0     0     0     0     0
T    0     0     0     1     1     0     1
C    0     0     1     0     0     0.33  0
G    0     0     0     0     0     0.67  0
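The contrast between the two alignments can be checked in code: the sketch recomputes both consensus sequences and lists the columns whose frequency vectors differ. The helper names are illustrative; the tie-breaking rule in `consensus` is an assumption that does not affect these inputs, since no column of three sequences here is tied.

```python
from collections import Counter


def consensus(alignment):
    """Most frequent character per column (ties: first character seen)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))


def profile(alignment, alphabet="ATCG"):
    """Fraction of sequences carrying each character, per column."""
    n = len(alignment)
    return {ch: [col.count(ch) / n for col in zip(*alignment)] for ch in alphabet}


varied = ["AACTTGC", "AAGTCGT", "CACTTCT"]
uniform = ["AACTTGT", "AACTTGT", "AACTTCT"]
print(consensus(varied), consensus(uniform))  # AACTTGT AACTTGT

p_varied, p_uniform = profile(varied), profile(uniform)
differing = [
    i + 1
    for i in range(len(varied[0]))
    if any(p_varied[ch][i] != p_uniform[ch][i] for ch in "ATCG")
]
print(differing)  # [1, 3, 5, 7]
```

The consensus discards exactly the information that distinguishes the two alignments, which is why a profile is the more sensitive query for finding distant homologs.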
42
Sequence LOGO
A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C - C G C - T
[Figure: sequence logo of the alignment above]
http://weblogo.berkeley.edu
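A sequence logo scales each column by its information content, log2(4) minus the Shannon entropy of the column for DNA. A minimal sketch using the alignment above, assuming gaps are simply ignored and no small-sample correction is applied, so the values may differ from those WebLogo reports. The name `column_information` is illustrative.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information content (bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


alignment = [
    "AACCGCTCTT",
    "AGCCGCGC-T",
    "A-CAGAGCCT",
    "AAGCACGC-T",
    "ACGGGTGCTT",
    "ATGC-CGC-T",
]
for i, col in enumerate(zip(*alignment), start=1):
    print(i, round(column_information(col), 2))
```

Fully conserved columns (1, 8 and 10) reach the 2-bit maximum, while column 3, an even split between C and G, carries 1 bit.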
43
PSI-BLAST
• Position-Specific Iterated BLAST: an automatic, profile-based search
  1. Regular BLAST search
  2. Construct a profile from the BLAST results
  3. Search with the profile (steps 2 and 3 repeat until no new sequences are found)
  4. Final results
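The search-profile-search loop can be illustrated with a toy, ungapped version. This is not PSI-BLAST itself: real PSI-BLAST works on proteins, starts from a BLOSUM62 search, uses gapped alignments, sequence weighting and E-value cutoffs. As simplifying assumptions, the first profile here is built from the query alone with pseudocounts, sequences are DNA with a uniform background, hits are accepted by a raw-score threshold, and the query, database and threshold are hypothetical.

```python
import math

ALPHABET = "ACGT"
BACKGROUND = 0.25  # assumed uniform background frequency


def build_pssm(windows, pseudocount=1.0):
    """Log-odds position-specific scoring matrix (bits) from equal-length aligned windows."""
    n = len(windows)
    pssm = []
    for column in zip(*windows):
        scores = {}
        for ch in ALPHABET:
            freq = (column.count(ch) + pseudocount * BACKGROUND) / (n + pseudocount)
            scores[ch] = math.log2(freq / BACKGROUND)
        pssm.append(scores)
    return pssm


def best_window(pssm, seq):
    """Highest-scoring ungapped window of len(pssm) in seq, as (score, window)."""
    width = len(pssm)
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col[ch] for col, ch in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, window)
    return best


def iterative_search(query, database, threshold, max_rounds=5):
    """Search, rebuild the profile from query + hits, repeat until the hits stop changing."""
    hits = {}
    for _ in range(max_rounds):
        pssm = build_pssm([query] + list(hits.values()))
        new_hits = {}
        for name, seq in database.items():
            result = best_window(pssm, seq)
            if result is not None and result[0] >= threshold:
                new_hits[name] = result[1]
        if new_hits == hits:  # converged
            break
        hits = new_hits
    return hits


# Hypothetical query and database.
query = "ACGTAC"
database = {
    "s1": "TTACGTACTT",  # exact match
    "s2": "GGACGTTCGG",  # one mismatch
    "s3": "CCACTTTCCC",  # two mismatches: found only after the profile is updated
    "s4": "GGGGGGGGGG",  # unrelated
}
print(sorted(iterative_search(query, database, threshold=4.0, max_rounds=1)))
print(sorted(iterative_search(query, database, threshold=4.0)))
```

A single round finds only s1 and s2; once their windows are folded into the profile, the column that tolerates T raises the score of s3 above the threshold, which is the point of iterating.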
44
• Alignment with distantly related proteins
45
• Experimental evidence supports the conclusion that His212, Asp214 and His268 are the endogenous ligands that bind Fe2+ in IPNS
Enzyme      Relative       Km      kcat      kcat/Km
            activity (%)   (mM)    (min-1)   (mM-1 min-1)
Wild type   100            0.4     388       969
His48Ala    16             0.56    75        134
His63Ala    31             1.0     142       142
His114Ala   28             0.85    125       147
His124Ala   48             0.84    321       381
His135Ala   22             0.59    117       198
His212Ala   <0.007         nd      nd        nd
His268Ala   <0.003         nd      nd        nd
Asp14Ala    5              0.86    0.56      0.7
Asp113Ala   63             0.45    238       528
Asp131Ala   68             0.48    363       755
Asp203Ala   32             0.91    123       135
Asp214Ala   <0.004         nd      nd        nd
nd = not determined
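The kcat/Km column is kcat divided by Km, which gives a quick consistency check on the transcribed numbers. The sketch below recomputes it and lists the mutants whose relative activity falls below 1% of wild type; the 1% cutoff and the helper names are illustrative assumptions, not criteria from the original study, and the "<" upper bounds are stored as the bound itself.

```python
# Values transcribed from the table above.
# enzyme: (relative activity %, Km in mM, kcat in min^-1); None = not determined.
# For the inactive mutants the table gives only an upper bound (e.g. "<0.007").
KINETICS = {
    "Wild type": (100, 0.4, 388),
    "His48Ala": (16, 0.56, 75),
    "His63Ala": (31, 1.0, 142),
    "His114Ala": (28, 0.85, 125),
    "His124Ala": (48, 0.84, 321),
    "His135Ala": (22, 0.59, 117),
    "His212Ala": (0.007, None, None),
    "His268Ala": (0.003, None, None),
    "Asp14Ala": (5, 0.86, 0.56),
    "Asp113Ala": (63, 0.45, 238),
    "Asp131Ala": (68, 0.48, 363),
    "Asp203Ala": (32, 0.91, 123),
    "Asp214Ala": (0.004, None, None),
}


def catalytic_efficiency(km, kcat):
    """kcat/Km in mM^-1 min^-1, or None if either constant was not determined."""
    if km is None or kcat is None:
        return None
    return kcat / km


def essentially_inactive(data, cutoff=1.0):
    """Mutants whose relative activity (%) is below `cutoff` (an assumed threshold)."""
    return [name for name, (rel, _, _) in data.items() if rel < cutoff]


for name, (rel, km, kcat) in KINETICS.items():
    eff = catalytic_efficiency(km, kcat)
    text = "nd" if eff is None else f"{eff:.1f}"
    print(f"{name:<10} {text}")
print(essentially_inactive(KINETICS))
```

The recomputed values agree with the reported column to within rounding (for example, 388 / 0.4 = 970 versus 969 reported), and the mutants flagged are exactly His212Ala, His268Ala and Asp214Ala.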
46
Isopenicillin N Synthase (IPNS)
47
48
20
MSA Editing Jalview
Conservation
wwwesembnetorgServicesMolBiojalviewindexhtml
21
MSA formats - fasta
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
21
MSA formats - fasta
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
22
MSA formats - Aln
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
23
MSA formats - MSF
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
24
Example 1a a good MSA
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
25
Example 1b making MSA of distantly related proteins
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
26
Example 1c including more distant relatives in the MSA
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
27
bull Mononuclear iron proteins ndash electron carrier proteins Iron atoms are bound to amino acid side chains
bull In IPNS the metal ion is coordinated by three protein residues
bull IPNS is involved in biosynthesis of penicillin
Example 2 Isopenicillin N Synthase
N
SN
OCOOHO
H
H
Me
Me
COOH
NH2
N
SN
OCOOH
NH2 Me
Me
COOHO
ACV
Isopenicillin N
Fe+2
Ascorbate
O2
2H2O Fe
N
NHis268
H
N
NHis212
H
SACV
O2 (NO)
H2O
OAsp214
O
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
• A consensus sequence can be deduced from the multiple sequence alignment: at each column it holds the most frequent character.
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T   (consensus)
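The consensus rule can be written in a few lines of Python. A minimal sketch; how ties are broken is not specified here, so this version assumes the character seen first in the column wins (the order Counter.most_common uses for equal counts).

```python
from collections import Counter


def consensus(alignment: list[str]) -> str:
    """Most frequent character in each column (ties go to the character seen first)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*alignment))


msa = ["ATCTTGT", "AACTTGT", "AACTTCT"]
print(consensus(msa))  # AACTTGT
```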
39
Profile
• A statistical model describing the multiple sequence alignment can also be deduced: a profile holds the frequency of each character at each column of the alignment.
A T C T T G T
A A C T T G T
A A C T T C T
   1     2     3     4     5     6     7
A  1     0.67  0     0     0     0     0
T  0     0.33  0     1     1     0     1
C  0     0     1     0     0     0.33  0
G  0     0     0     0     0     0.67  0
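The profile is simply a per-column frequency count. A minimal sketch for the DNA alphabet; it assumes gap characters are not counted, so a column containing gaps would sum to less than 1.

```python
def profile(alignment: list[str], alphabet: str = "ACGT") -> dict[str, list[float]]:
    """Frequency of each character in each alignment column."""
    n = len(alignment)
    columns = list(zip(*alignment))
    return {ch: [col.count(ch) / n for col in columns] for ch in alphabet}


msa = ["ATCTTGT", "AACTTGT", "AACTTCT"]
for ch, freqs in profile(msa).items():
    print(ch, " ".join(f"{f:.2f}" for f in freqs))
```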
40
Profile vs Consensus
• The following multiple alignments have the same consensus:
A A C T T G C
A A G T C G T
C A C T T C T
and
A A C T T G T
A A C T T G T
A A C T T C T
Consensus of both: A A C T T G T
41
Profile vs Consensus
• But they have different profiles
First alignment:
A A C T T G C
A A G T C G T
C A C T T C T
   1     2     3     4     5     6     7
A  0.67  1     0     0     0     0     0
T  0     0     0     1     0.67  0     0.67
C  0.33  0     0.67  0     0.33  0.33  0.33
G  0     0     0.33  0     0     0.67  0
Second alignment:
A A C T T G T
A A C T T G T
A A C T T C T
   1     2     3     4     5     6     7
A  1     1     0     0     0     0     0
T  0     0     0     1     1     0     1
C  0     0     1     0     0     0.33  0
G  0     0     0     0     0     0.67  0
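The point of these two slides can be checked directly: both alignments give the consensus AACTTGT, yet their profiles differ in four columns. A minimal, self-contained sketch; the helper names are illustrative.

```python
from collections import Counter


def consensus(alignment):
    """Most frequent character per column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))


def profile(alignment, alphabet="ACGT"):
    """Per-column character frequencies, one dict per column."""
    n = len(alignment)
    return [{ch: col.count(ch) / n for ch in alphabet} for col in zip(*alignment)]


first = ["AACTTGC", "AAGTCGT", "CACTTCT"]
second = ["AACTTGT", "AACTTGT", "AACTTCT"]

print(consensus(first), consensus(second))  # AACTTGT AACTTGT
differing = [i + 1 for i, (p, q) in enumerate(zip(profile(first), profile(second))) if p != q]
print(differing)  # [1, 3, 5, 7]
```

The consensus discards how much variation each column carries; the profile keeps it, which is why a profile is the better query for finding distant homologs.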
42
Sequence LOGO
A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C - C G C - T
[Figure: sequence logo of the alignment above]
http://weblogo.berkeley.edu
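In a sequence logo the total height of each column is its information content, and each character's share of that height is proportional to its frequency. A minimal sketch of the per-column information content for the alignment above, assuming gaps are ignored and omitting the small-sample correction that WebLogo applies, so its numbers will not match WebLogo's exactly.

```python
import math
from collections import Counter


def information_content(alignment: list[str], alphabet: str = "ACGT") -> list[float]:
    """Bits per column: log2(alphabet size) minus the Shannon entropy of the column.

    Gaps ('-') are skipped when counting; no small-sample correction is applied.
    """
    max_bits = math.log2(len(alphabet))
    result = []
    for column in zip(*alignment):
        counts = Counter(ch for ch in column if ch in alphabet)
        total = sum(counts.values())
        if total == 0:
            result.append(0.0)
            continue
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        result.append(max_bits - entropy)
    return result


logo_msa = [
    "AACCGCTCTT",
    "AGCCGCGC-T",
    "A-CAGAGCCT",
    "AAGCACGC-T",
    "ACGGGTGCTT",
    "ATGC-CGC-T",
]
for pos, bits in enumerate(information_content(logo_msa), start=1):
    print(pos, round(bits, 2))
```

Fully conserved columns (1, 8 and 10) reach the DNA maximum of 2 bits and appear as tall single letters in the logo.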
43
PSI-BLAST
• Position-Specific Iterated BLAST: an automatic, profile-based search
1. Regular BLAST search
2. Construct a profile from the BLAST results
3. BLAST search with the profile (steps 2 and 3 are iterated)
4. Final results
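The PSI-BLAST cycle (search, build a profile from the hits, search again with the profile, repeat until no new sequences are added) can be illustrated with a toy loop. This is not PSI-BLAST: it is a minimal sketch that assumes ungapped, equal-length DNA sequences, a uniform background, a raw score threshold in place of E-values and no sequence weighting; iterative_search, the database and the threshold of 2.5 are hypothetical.

```python
import math

ALPHABET = "ACGT"


def build_pssm(seqs: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds column scores against a uniform 0.25 background, with pseudocounts."""
    denom = len(seqs) + pseudocount * len(ALPHABET)
    return [
        {b: math.log2((col.count(b) + pseudocount) / denom / 0.25) for b in ALPHABET}
        for col in zip(*seqs)
    ]


def score(pssm: list[dict[str, float]], seq: str) -> float:
    """Ungapped score of an equal-length sequence against the profile."""
    return sum(column[base] for column, base in zip(pssm, seq))


def iterative_search(query: str, database: list[str], threshold: float,
                     max_iterations: int = 5) -> tuple[set[str], int]:
    """Toy PSI-BLAST-like loop: search, rebuild the profile from all hits, repeat."""
    included = {query}
    for iteration in range(1, max_iterations + 1):
        # Round 1 uses a profile of the query alone, i.e. a plain identity-scored search
        pssm = build_pssm(sorted(included))
        hits = {seq for seq in database if score(pssm, seq) >= threshold}
        if hits <= included:
            # No new sequences were added: the search has converged
            return included, iteration
        included |= hits
    return included, max_iterations


database = ["ACGTAA", "ACGAAA", "TTTTTT"]
found, rounds = iterative_search("ACGTAC", database, threshold=2.5)
print(sorted(found), rounds)  # ['ACGAAA', 'ACGTAA', 'ACGTAC'] 3
```

In this toy run, ACGAAA scores below the threshold against the query alone but is pulled in once ACGTAA has been added to the profile, which is the effect that lets iterated profile searches reach more distant homologs.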
44
• Alignment with distantly related proteins
45
• Experimental evidence supports the finding that His212, Asp214 and His268 are the endogenous ligands that bind Fe2+ in IPNS
Enzyme       Relative   Km      kcat      kcat/Km
             activity   (mM)    (min-1)   (mM-1 min-1)
Wild type    100        0.4     38.8      96.9
His48Ala     16         0.56    7.5       13.4
His63Ala     31         1.0     14.2      14.2
His114Ala    28         0.85    12.5      14.7
His124Ala    48         0.84    32.1      38.1
His135Ala    22         0.59    11.7      19.8
His212Ala    <0.007     nd      nd
His268Ala    <0.003     nd      nd
Asp14Ala     5          0.86    0.56      0.7
Asp113Ala    63         0.45    23.8      52.8
Asp131Ala    68         0.48    36.3      75.5
Asp203Ala    32         0.91    12.3      13.5
Asp214Ala    <0.004     nd      nd
(nd = not determined)
Isopenicillin N Synthase
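The kinetic table can be re-derived as a quick check: kcat/Km (catalytic efficiency) is kcat divided by Km, and the three mutants whose kinetic constants were not determined are exactly the proposed Fe2+ ligands. A minimal sketch using the values transcribed from the table; the recomputed ratios agree with the reported kcat/Km column to within the rounding of the tabulated values.

```python
# Km (mM) and kcat (min-1) transcribed from the IPNS mutant table; None stands for "nd"
kinetics = {
    "Wild type": (0.4, 38.8),
    "His48Ala": (0.56, 7.5),
    "His63Ala": (1.0, 14.2),
    "His114Ala": (0.85, 12.5),
    "His124Ala": (0.84, 32.1),
    "His135Ala": (0.59, 11.7),
    "His212Ala": (None, None),
    "His268Ala": (None, None),
    "Asp14Ala": (0.86, 0.56),
    "Asp113Ala": (0.45, 23.8),
    "Asp131Ala": (0.48, 36.3),
    "Asp203Ala": (0.91, 12.3),
    "Asp214Ala": (None, None),
}

wt_km, wt_kcat = kinetics["Wild type"]
wt_efficiency = wt_kcat / wt_km  # kcat/Km of the wild type, in mM-1 min-1

inactive = []
for enzyme, (km, kcat) in kinetics.items():
    if km is None or kcat is None:
        # Kinetic constants not determined: activity too low to measure
        inactive.append(enzyme)
        continue
    efficiency = kcat / km
    print(f"{enzyme:10s} kcat/Km = {efficiency:5.1f}  ({100 * efficiency / wt_efficiency:5.1f}% of wild type)")

print("Kinetics not determined:", inactive)
```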
46
IPNS
47
48
28
Research IPNS
bull Goal Identify Fe+2 binding residues
bull Possible solutions1 In the lab
2 Bioinformatic approach (comparing different IPNS sequences)
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
29
Step 1
Multiple alignment of known IPNS
Implementation
1 Obtain sequence (eg for MCBI)
IPNS AND Bacteria[Organism]
2 MSA (clustalw) and search for conserved residues in the MSA
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
30
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
31
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
32
MSA ndash bacteria only
Not enough variation
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
33
MSA ndash bacteria amp fungi
Not enough variation
bacteria amp fungi
bacteria
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
34
Step 2
Goal Add more enzymes similar to IPNS
ImplementationbullSearch in httpwwwexpasyorgtoolsblastbullblast IPNS_CEPAC as querybullSelect sequences similar to the query in the entire lengthbullExport in FASTA formatbullRun CLUSTALW
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
35
bull New multiple alignment narrowing down the possibilities
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
36
Simple multiple alignment
bull The known IPNS sequences are very similarbull Close enzymes sequences are also quite
similarbull Not enough variability to categorize the active
sitesbull We need to obtain even more distant sequences
(distant homologs)
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
37
Step 3
Using the results of the MSA for further searches
Implementation
1 Obtain an MSA (clustalw)
2 - Construct a consensus sequence and perform a
new search
OR
- Construct a profile and perform a new search
3 MSA (clustalw) and search for conserved residues in the MSA
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
Profile vs Consensusbull But have a different profile
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 066 1 0 0
T 0 0 0 1
C 033 0 066
0
G 0 0 033
0
1 2 3 4 5 6
A 1 1 0 0
T 0 0 0 1
C 0 0 1 0
G 0 0 0 0
42
Sequence LOGO A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C ndash C G C - T
A gc c g G C T
httpweblogoberkeleyedu
43
Psi Blast
bull Position Specific Iterated - automatic profile-like search
Regular blast
Construct profile from blast results
Blast profile search
Final results
44
bull Alignment with distantly related proteins
45
bull Experimental evidence supports the finding that His212 Asp214 and His268 are the endogenous ligands that bind Fe+2 in IPNS
Enzyme Relative Km kcat kcatKm
Activity (mM) (min-1) (mM-1min
-1)Wild type 100 04 388 969
His48Ala 16 056 75 134
His63Ala 31 10 142 142
His114Ala 28 085 125 147
His124Ala 48 084 321 381
His135Ala 22 059 117 198
His212Ala lt0007 nd nd
His268Ala lt0003 nd nd
Asp14Ala 5 086 056 07
Asp113Ala 63 045 238 528
Asp131Ala 68 048 363 755
Asp203Ala 32 091 123 135
Asp214Ala lt0004 nd nd
Isopenicillin N Synthase
46
ndash IPNS
47
48
38
Consensus Sequence
bull We can deduce a consensus sequence from the multiple sequence alignment The consensus sequence holds the most frequent character of the alignment at each column
bull Consensus each position reflects the most common character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
39
Profilebull We can deduce a statistical model describing
the multiple sequence alignment A Profile holds statistical information about characters in alignment at each column
bull Profile each position reflects the frequency of the character found at a position
A T C T T G T
A A C T T G T
A A C T T C T
1 2 3 4 5 6
A 1 067 0 0
T 0 033 1 1
C 0 0 0 0
G 0 0 0 0
40
Profile vs Consensusbull The following multiple alignments will
have the same consensus
A A C T T G C
A A G T C G T
C A C T T C T
A A C T T G T
A A C T T G T
A A C T T C T
A A C T T G T
41
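Profile vs Consensus
• But they have different profiles:
Alignment 1
A A C T T G C
A A G T C G T
C A C T T C T
Profile of alignment 1
   1     2     3     4     5     6     7
A  0.67  1     0     0     0     0     0
T  0     0     0     1     0.67  0     0.67
C  0.33  0     0.67  0     0.33  0.33  0.33
G  0     0     0.33  0     0     0.67  0
Alignment 2
A A C T T G T
A A C T T G T
A A C T T C T
Profile of alignment 2
   1     2     3     4     5     6     7
A  1     1     0     0     0     0     0
T  0     0     0     1     1     0     1
C  0     0     1     0     0     0.33  0
G  0     0     0     0     0     0.67  0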
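The point of these two slides can be checked directly. Both alignments collapse to the same consensus string, but not to the same profile. The short Python sketch below demonstrates this. The helper names are our own. Frequencies are rounded to two decimals to match the tables.

```python
from collections import Counter

def consensus(alignment):
    # Most frequent character per column (ties: first seen in the column wins).
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))

def profile(alignment, alphabet="ATCG"):
    # Relative frequency of each character in each column, rounded to 2 places.
    n = len(alignment)
    cols = [Counter(col) for col in zip(*alignment)]
    return {ch: [round(c[ch] / n, 2) for c in cols] for ch in alphabet}

aln1 = ["AACTTGC", "AAGTCGT", "CACTTCT"]
aln2 = ["AACTTGT", "AACTTGT", "AACTTCT"]

print(consensus(aln1) == consensus(aln2))  # True: both give AACTTGT
print(profile(aln1) == profile(aln2))      # False: the column frequencies differ
```

The consensus keeps only the winning character per column. The profile also keeps how strongly it wins, which is why profiles are the more informative summary.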
42
Sequence Logo
A A C C G C T C T T
A G C C G C G C - T
A - C A G A G C C T
A A G C A C G C - T
A C G G G T G C T T
A T G C - C G C - T
[Sequence logo of the alignment above]
http://weblogo.berkeley.edu
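In a sequence logo, each column is drawn as a stack of letters. The total height of the stack is the column's information content, and each letter's share of the height is proportional to its frequency. The sketch below uses the common formulation R = log2(4) - H, where H = -sum(p * log2 p) over the column's nucleotide frequencies, and letter height = p * R. It makes several assumptions: gaps are skipped, frequencies are taken over the remaining letters only, and no small-sample correction is applied. As a result, the numbers need not match exactly what a logo tool such as WebLogo draws. `logo_heights` is a hypothetical name.

```python
import math
from collections import Counter

def logo_heights(alignment, alphabet="ACGT"):
    """Per-column letter heights (in bits) for a DNA sequence logo.

    Sketch assumptions: gap characters are skipped, frequencies are computed
    over the remaining letters, and no small-sample correction is applied.
    """
    max_bits = math.log2(len(alphabet))  # 2 bits for a 4-letter alphabet
    heights = []
    for column in zip(*alignment):
        counts = Counter(ch for ch in column if ch in alphabet)
        total = sum(counts.values())
        if total == 0:  # all-gap column: nothing to draw
            heights.append({})
            continue
        freqs = {ch: n / total for ch, n in counts.items()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        info = max_bits - entropy  # total stack height for this column
        heights.append({ch: p * info for ch, p in freqs.items()})
    return heights

alignment = [
    "AACCGCTCTT",
    "AGCCGCGC-T",
    "A-CAGAGCCT",
    "AAGCACGC-T",
    "ACGGGTGCTT",
    "ATGC-CGC-T",
]
for i, column in enumerate(logo_heights(alignment), start=1):
    tallest = max(column, key=column.get) if column else "-"
    print(i, tallest, round(sum(column.values()), 2))
```

Fully conserved columns (such as the first A and the last T) reach the 2-bit maximum. Mixed columns are drawn shorter.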
43
PSI-BLAST
• Position-Specific Iterated BLAST: an automatic, profile-like search
Regular BLAST
→ Construct a profile from the BLAST results
→ BLAST search with the profile (profile construction and search are iterated)
→ Final results
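The core of the profile step is a position-specific scoring matrix (PSSM), which gives a log-odds score for every amino acid in each column of the aligned hits. The sketch below is a strongly simplified illustration, not PSI-BLAST's actual algorithm. It assumes a uniform background frequency of 1/20, adds a flat pseudocount of 1 per residue, applies no sequence weighting, and ignores gaps. The stand-in "hits" are the first four rows of the lesson's opening example alignment. `build_pssm` and `score` are hypothetical names.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(hits, pseudocount=1.0):
    """Log-odds position-specific scoring matrix from aligned, equal-length hits.

    Simplifications (not PSI-BLAST's real procedure): uniform background of
    1/20 per amino acid, a flat pseudocount, no sequence weighting, and gap
    characters are not counted.
    """
    background = 1 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*hits):
        counts = Counter(ch for ch in column if ch in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def score(pssm, seq):
    """Sum of per-position log-odds scores; gaps and unknown letters add 0."""
    return sum(column[ch] for column, ch in zip(pssm, seq) if ch in column)

# Stand-in "BLAST hits": four rows of the lesson's opening example alignment.
hits = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
]
pssm = build_pssm(hits)
print(round(score(pssm, "ATLVCLISDFYPGA--VTVAWKADS--"), 2))  # another row of that alignment
print(round(score(pssm, "A" * 27), 2))                         # unrelated poly-alanine
```

In the iterated search, a matrix like this is used to find new hits. Those hits are added to the alignment, and the profile is rebuilt before the next round.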
44
• Alignment with distantly related proteins
45
• Experimental evidence supports the finding that His212, Asp214 and His268 are the endogenous ligands that bind Fe2+ in IPNS.
Isopenicillin N Synthase
Enzyme      Relative activity   Km (mM)   kcat (min-1)   kcat/Km (mM-1 min-1)
Wild type   100                 0.4       388            969
His48Ala    16                  0.56      75             134
His63Ala    31                  1.0       142            142
His114Ala   28                  0.85      125            147
His124Ala   48                  0.84      321            381
His135Ala   22                  0.59      117            198
His212Ala   <0.007              nd        nd
His268Ala   <0.003              nd        nd
Asp14Ala    5                   0.86      0.56           0.7
Asp113Ala   63                  0.45      238            528
Asp131Ala   68                  0.48      363            755
Asp203Ala   32                  0.91      123            135
Asp214Ala   <0.004              nd        nd
nd = not determined
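The last column is derived from the two before it: catalytic efficiency is kcat divided by Km. The sketch below recomputes it as a consistency check and expresses each mutant's efficiency as a percentage of wild type. The decimal placement is our reading of the extracted table and is an assumption (for example, Km = 0.4 mM and kcat = 388 min-1 for the wild type, and Km = 1.0 mM for His63Ala). Mutants marked nd are left out because they have no Km or kcat values.

```python
# (enzyme, Km in mM, kcat in min-1, tabulated kcat/Km in mM-1 min-1)
# Decimal placement is our reading of the IPNS table (an assumption).
ROWS = [
    ("Wild type", 0.4, 388, 969),
    ("His48Ala", 0.56, 75, 134),
    ("His63Ala", 1.0, 142, 142),
    ("His114Ala", 0.85, 125, 147),
    ("His124Ala", 0.84, 321, 381),
    ("His135Ala", 0.59, 117, 198),
    ("Asp14Ala", 0.86, 0.56, 0.7),
    ("Asp113Ala", 0.45, 238, 528),
    ("Asp131Ala", 0.48, 363, 755),
    ("Asp203Ala", 0.91, 123, 135),
]

def efficiency(km_mM: float, kcat_per_min: float) -> float:
    """Catalytic efficiency kcat/Km, in mM-1 min-1."""
    return kcat_per_min / km_mM

wild_type = efficiency(0.4, 388)
for name, km, kcat, reported in ROWS:
    eff = efficiency(km, kcat)
    print(f"{name:<10} computed {eff:8.2f}  reported {reported:6}  "
          f"{100 * eff / wild_type:5.1f}% of wild type")
```

Every recomputed ratio agrees with the tabulated one to within rounding. The largest relative gap is for Asp14Ala, where 0.56 / 0.86 = 0.65 is reported as 0.7.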
46
– IPNS
47
48