CLePAPS : Fast Pair Alignment of Protein Structures
Based on Conformational Letters
BioInfoSummer07 ICE-EM Summer Symposium in BioInformatics, Dec 10-14, Australian National University, Canberra, Australia
Sheng WANG, Wei-Mou ZHENG*
Institute of Theoretical Physics, CAS
zheng@itp.ac.cn *To whom correspondence should be addressed
Outline
• [1] Introduction
• [2] The flow chart of CLePAPS Algorithm
[2-1] Find SFPs by CLeSUM
[2-2] Construct ‘Star-Tree’
[2-3] The ‘Zoom-In’ Strategy
• [3] Result & Discussion
Structure alignment: a self-consistent problem
Correspondence <=> Rigid transformation
However, when aligning two protein structures, at the beginning we know neither the transformation nor the correspondence.
Existing methods: DALI, CE, VAST, STRUCTAL, ProSup
CLePAPS: Conformational Letters based Pairwise Alignment of Protein Structures
Initialization + iteration
• Similar Fragment Pairs (SFPs)
• Anchor-based
• Alignment = as many consistent SFPs as possible
Chapter[1] : Introduction
Anchor-based superposition
[Figure: SFPs superposed on the Anchor SFP; each SFP is marked as consistent or inconsistent with the anchor]
Alignment = Collect as many consistent SFPs as possible
Structure Alignment => a self-consistent problem
[Flow chart] Protein A, Protein B => Initial correspondence (Anchor SFP) => Optimal transformation for the correspondence => Correspondence update (adding consistent SFPs) => Convergence? No: recompute the optimal transformation; Yes: End (Align).
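One common way to write down the two halves of this self-consistent problem is sketched here. It assumes the transformation is chosen by least-squares superposition and that consistency is judged by a distance cutoff; the symbols R, t, C, S, d, a_i and b_j are introduced for illustration only and do not appear on the slides.

```latex
% Given a correspondence C (matched residue pairs (i, j) between proteins A and B),
% the optimal rigid transformation is the least-squares superposition:
(R^{*}, t^{*}) = \arg\min_{R,\,t} \sum_{(i,j) \in C} \lVert R\,a_i + t - b_j \rVert^{2}

% Given a transformation (R, t), the correspondence is updated by keeping those
% candidate pairs from a set S that fall within a cutoff d:
C' = \{\, (i,j) \in S \;:\; \lVert R\,a_i + t - b_j \rVert < d \,\}
```

Iterating the two steps until C' stops changing gives the convergence test in the flow chart.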
Four Main Problems
[1] How can we find SFPs as fast as possible?
[2] How can we balance the Specificity and Sensitivity of the SFPs found?
[3] How can we avoid starting in a Local Trap?
[4] How can we hasten convergence without getting caught in a Local Trap?
[Figure: an example of a LOCAL TRAP]
Chapter[2] : The flow chart of CLePAPS Algorithm
Part_I: SFP
Find SFPs by CLeSUM => SFP List (width 20) for Specificity; SFP List (width 8) for Sensitivity
Part_II: ‘Star-Tree’
Initial correspondence (select an Optimal Anchor SFP): Star-Tree Construct (Top K for anchor, Top J for neighbor) => Optimal Anchor SFP
Part_III: ‘Zoom-In’
Correspondence update (adding consistent SFPs without falling into the Local Trap, and hastening convergence): First Update (d1 blank-filling) => Second Update (d2 blank-filling) => Third Update (d3 blank-filling) => Final Alignment
Chapter[2-1] : Find SFPs by CLeSUM
Hint:
SFP (Similar Fragment Pair)
CLeSUM (Conformational Letter SUbstitution Matrix)
The main difference between CLePAPS and other existing algorithms for structure alignment is the use of conformational letters. Conformational letters are discretized states of 3D segmental conformations. A letter is a cluster of combinations of the three angles formed by the Cα pseudobonds of four contiguous residues; the clusters are obtained according to the probability distribution of these angles.
Fig.1 Centers of 17 conformational letters
Similarity between conformational letters
CLeSUM: Conformational Letter SUbstitution Matrix
M_ij = 20 * log2(P_ij / (P_i P_j)); H ~ 1.05, comparable to BLOSUM83
Constructed using FSSP representatives.
[Figure: the CLeSUM matrix, with the typical helix and typical sheet letters marked; the scores are evolutionary + geometric]
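The log-odds formula above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' construction: the function name `log_odds_matrix`, the toy two-letter alphabet, and the idea of feeding in a flat list of aligned letter pairs are all hypothetical, and the real CLeSUM is built from FSSP representatives with 17 letters.

```python
import math
from collections import Counter

def log_odds_matrix(aligned_pairs, scale=20.0):
    """Return (matrix, H) where matrix[(a, b)] = scale * log2(P_ab / (P_a * P_b)).

    aligned_pairs: iterable of (letter_a, letter_b) taken from aligned positions.
    H is the relative entropy sum(P_ab * log2(P_ab / (P_a * P_b))) in bits.
    Pairs never observed get no entry (their log-odds would be minus infinity).
    """
    counts = Counter()
    for a, b in aligned_pairs:
        # Count both orders so the resulting matrix is symmetric.
        counts[(a, b)] += 1
        counts[(b, a)] += 1
    total = sum(counts.values())
    p_pair = {pair: n / total for pair, n in counts.items()}

    # Marginal frequency of each letter, summed over its partners.
    p_single = Counter()
    for (a, _b), p in p_pair.items():
        p_single[a] += p

    matrix, entropy = {}, 0.0
    for (a, b), p in p_pair.items():
        log_ratio = math.log2(p / (p_single[a] * p_single[b]))
        matrix[(a, b)] = scale * log_ratio
        entropy += p * log_ratio
    return matrix, entropy

# Toy data (hypothetical): two letters that mostly align with themselves.
toy_pairs = [("H", "H")] * 6 + [("E", "E")] * 3 + [("H", "E")]
toy_matrix, toy_entropy = log_odds_matrix(toy_pairs)
```

With this toy input the self-pairs score positive and the mixed pair scores negative, which is the behaviour a substitution matrix is meant to capture.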
SFP => highly scored string pair
• Fast search for SFPs by string comparison
• CLESUM similarity score => importance of SFPs
Guided by CLESUM scores, only the top few SFPs need to be examined to determine the superposition for alignment, and hence a reliable greedy strategy becomes possible.
[Figure: a similar seed fragment located in Protein A and in Protein B (the smaller protein)]
Example: finding SFPs
>1molA
RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCPLDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR
>1cewI
RRCECECAJGBIHHHHHHHHIIHHHIIGPGBLDFFCPLDPLEEFEDPOLCEEEEEEDEFDEAGCAKLAJGKHHIIMNGKLQQQDEEEDEEEEEBPKKOGEEDPLEEER
Similar Fragment Pairs (SFPs), 1molA fragment / 1cewI fragment:
HHHHHHHH / AJGKHHII
FEDECCGA / OLCEEEEE
FEDPLDEQ / EEDPLEEE
PLDDEEED / PLEEFEDP
CEDEEEEE / EEDEEEEE
Score rank (as labeled in the figure): 5, 1, 2, 3, 4
To find SFPs, we take the shorter sequence as the template and record every pair of positions whose fragments, of a given length, score higher than the threshold.
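The search described above can be sketched as a plain window-by-window string comparison. This is an illustrative sketch only: `find_sfps` and `toy_score` are hypothetical names, and `toy_score` (match +2, mismatch -1) is a stand-in for a lookup in the real CLeSUM matrix, whose values are not given on the slides.

```python
def find_sfps(seq_a, seq_b, score, width, threshold):
    """Return [(total_score, pos_in_template, pos_in_other), ...], best score first.

    The shorter string is used as the template. Every pair of windows of the
    given width whose summed letter scores exceed the threshold is recorded.
    """
    template, other = (seq_a, seq_b) if len(seq_a) <= len(seq_b) else (seq_b, seq_a)
    hits = []
    for i in range(len(template) - width + 1):
        for j in range(len(other) - width + 1):
            total = sum(score(template[i + k], other[j + k]) for k in range(width))
            if total > threshold:
                hits.append((total, i, j))
    # Sort by score, highest first, so the top few SFPs can be picked greedily.
    hits.sort(key=lambda hit: -hit[0])
    return hits

def toy_score(x, y):
    """Hypothetical stand-in for a CLeSUM lookup: +2 for a match, -1 otherwise."""
    return 2 if x == y else -1

# Toy strings (not real conformational-letter sequences).
toy_hits = find_sfps("ABCDEFGH", "XXCDEFXX", toy_score, width=4, threshold=7)
```

Because the comparison works on letter strings only, no coordinates are touched at this stage; the same function can be called once with width 20 and once with width 8 to build the two lists.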
[Figure: seed alignment of 1cewI and 1molA]
Chapter[2-2] : Construct ‘Star-Tree’
Hint:
SFP List (width 20) => We create a list of SFPs of length 20 and sort them by CLeSUM score.
Top_K & Top_J (J > K) => We select only the Top_K SFPs of the list as candidate Anchor SFPs and check their consistency against the Top_J SFPs, which serve as neighbors.
Selection of Optimal Anchor SFP
Example: Top K, K = 2; Top J, J = 5
[Figure: the two candidate anchors with SFP score ranks marked. Top_1 anchor: # of consistent SFPs = 4. Top_2 anchor: # of consistent SFPs = 1.]
Top_1 SFP is globally supported by three other SFPs, while Top_2 SFP is supported only by itself.
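The anchor selection can be sketched as a support count over the top of the sorted list. The names `select_anchor` and `same_diagonal` are hypothetical. In particular, `same_diagonal` is only a toy consistency test on sequence offsets; the slides judge consistency spatially, after superposing on the anchor, and that check would be passed in through the `is_consistent` parameter instead.

```python
def select_anchor(sfps, is_consistent, top_k, top_j):
    """Pick the anchor among the first top_k SFPs with the most support.

    sfps must already be sorted by score, best first. Support is the number
    of SFPs among the first top_j that are consistent with the candidate
    anchor, counting the anchor itself. Returns (anchor_index, support).
    """
    neighbors = sfps[:top_j]
    best_index, best_support = None, -1
    for a_idx, anchor in enumerate(sfps[:top_k]):
        support = sum(
            1
            for n_idx, neighbor in enumerate(neighbors)
            if n_idx == a_idx or is_consistent(anchor, neighbor)
        )
        if support > best_support:
            best_index, best_support = a_idx, support
    return best_index, best_support

def same_diagonal(sfp_a, sfp_b, tolerance=1):
    """Toy consistency test (an assumption): similar offset between the two start positions."""
    return abs((sfp_a[0] - sfp_a[1]) - (sfp_b[0] - sfp_b[1])) <= tolerance

# Toy SFPs as (start_in_A, start_in_B), already ranked by score.
toy_sfps = [(10, 12), (40, 5), (20, 22), (30, 33), (50, 52)]
best_anchor = select_anchor(toy_sfps, same_diagonal, top_k=2, top_j=5)
```

The toy data is arranged to mirror the example on the slide: with K = 2 and J = 5, the first candidate is supported by three other SFPs plus itself, and the second only by itself.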
[Figure: an example of ‘Star-Tree’ construction for 1cewI and 1molA. ‘Star-Tree’ view: with the Top_1 SFP as anchor, # of consistent SFPs = 4; with the Top_2 SFP as anchor, # of consistent SFPs = 1.]
Chapter[2-3] : The ‘Zoom-In’ Strategy
Hint:
SFP List (width 8) => We create a list of SFPs of length 8 and sort them by CLeSUM score (descending order).
blank-filling => We add consistent SFPs one by one from the SFP List (width 8) to update the correspondence.
[1] The first transformation is determined by the Optimal Anchor SFP alone, so we use a large cutoff d1 to avoid the LOCAL TRAP.
[2] The later transformations are determined by a set of globally consistent SFPs, so we use lower cutoffs to add new consistent SFPs.
d1 > d2 > d3, with d1 = 8 Å, d2 = 6 Å, d3 = 5 Å
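The shrinking-cutoff idea can be sketched with a deliberately simplified model. Assumptions, all labeled in the code as well: each SFP is reduced to one representative point per protein, the "transformation" is a pure translation rather than a full rigid-body superposition, and SFPs accepted in an earlier round are kept without re-checking (the slides do not say whether they are). The function names are hypothetical.

```python
import math

def fit_translation(pairs):
    """Toy 'optimal transformation': the mean displacement from A points to B points.

    A real implementation would fit a rotation as well (rigid-body superposition).
    """
    n = len(pairs)
    return tuple(sum(b[k] - a[k] for a, b in pairs) / n for k in range(3))

def distance_after(translation, a, b):
    """Distance between point a, moved by the translation, and point b."""
    moved = tuple(a[k] + translation[k] for k in range(3))
    return math.dist(moved, b)

def zoom_in(anchor, candidates, cutoffs=(8.0, 6.0, 5.0)):
    """Run one update per cutoff (d1 > d2 > d3), adding consistent pairs each time.

    anchor and each candidate are (point_in_A, point_in_B) with 3D tuples.
    candidates should be in descending score order.
    """
    accepted = [anchor]
    for cutoff in cutoffs:
        # Refit on everything accepted so far, then fill in with the new cutoff.
        translation = fit_translation(accepted)
        for pair in candidates:
            if pair not in accepted and distance_after(translation, *pair) < cutoff:
                accepted.append(pair)
    return accepted

# Toy coordinates in angstroms (hypothetical).
toy_anchor = ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
near = ((5.0, 0.0, 0.0), (15.0, 0.0, 0.0))      # 0 A off after the first fit
loose = ((0.0, 5.0, 0.0), (10.0, 12.0, 0.0))    # 7 A off: passes d1 = 8 only
far = ((1.0, 1.0, 1.0), (40.0, 1.0, 1.0))       # never consistent
toy_alignment = zoom_in(toy_anchor, [near, loose, far])
```

The `loose` pair shows why the first cutoff is generous: a transformation fitted on a single anchor is rough, so a pair that is 7 Å off is still admitted at d1 = 8 Å, while it would be rejected if the search started at 5 Å.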
[Figure: an example of the ‘Zoom-In’ strategy. Panel labels: First Update, Second Update, Third Update, Elongation, Shrink, Final Alignment. Cutoffs: d1 > d2 > d3 (8 Å, 6 Å, 5 Å).]
Four Main Problems
[1] How can we find SFPs as fast as possible?
[2] How can we balance the Specificity and Sensitivity of the SFPs found?
[3] How can we avoid starting in a Local Trap?
[4] How can we hasten convergence without getting caught in a Local Trap?
CLePAPS’s Solution
[1] Fast search for SFPs by string comparison alone
[2] Width 20 for Specificity and width 8 for Sensitivity, both lists sorted by CLeSUM score
[3] Optimal Anchor SFP selected through the ‘Star-Tree’
[4] Fast ‘Zoom-In’ strategy that converges within only three updates
Chapter[3] : Result & Conclusion
• The Fischer benchmark test
• Database search with CLePAPS
• Multi-Solution of alignments: symmetry, domain move, repeats
• Non-topological alignment and domain shuffling
[pdb:1ihwA] [pdb:1ssoA]
Multi-Solution[1] : Symmetry (red structure fixed)
[pdb:4fgf] [pdb:8i1b]
Solution [A], Solution [B], Solution [C]
[pdb:4fgf] [OGCCFEFAHOGEED] [OGDCEDFAIOGEED] [KGFCEDDAJOGCCC]
Multi-Solution[2] : Domain Move (blue structure fixed)
[pdb:2gbp] [pdb:2liv]
Solution [A], Solution [B]; Domain_1, Domain_2
Multi-Solution[3] : Repeats (blue structure fixed)
[pdb:4cpv] [pdb:1osa]
Solution [A], Solution [B]; Repeat_1, Repeat_2
Conclusion
CLePAPS distinguishes itself from other existing algorithms for pairwise structure alignment in its use of conformational letters.
• Conformational letters: aptly balance precision with simplicity
• CLeSUM: a proper measure of similarity between states
• CLeSUM, extracted from the FSSP database, carries structure-database statistics, which reduces the chance of accidentally matching two irrelevant helices: evolutionary + geometric = specificity gain. For example, two frequent helices are geometrically very similar, but their score is relatively low.
• The CLeSUM similarity score can be used to rank the importance of SFPs for a greedy algorithm. Only the top few SFPs need to be examined.
1, Fast search for SFPs by string comparison alone
2, Width 20 for specificity + width 8 for sensitivity
3, Optimal Anchor SFP selected by checking consistency
4, Local Trap avoided by ‘Zoom-In’
The running time for the 68 pairs of the Fischer benchmark is less than 2% of that of the downloaded CE local version.
Next steps
1, BLOMAPS: fast multiple structure alignment;
SFPs → Highly Similar Fragment Blocks (HSFBs)
2, Include biochemical information into CLESUM by amino acid clustering.
Entropic clustering: AVCFIWLMY (h) + DEGHKNPQRST (p)
Thank you
>1molA (N-Terminal to C-Terminal)
RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCPLDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR
Step 1: get four contiguous Cα atoms
Step 2: get the two bending angles θ and θ’ and the one torsion angle τ
Step 3: select the most similar of the 17 states
Step 4: assign the code
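Steps 1 to 3 can be sketched as follows. Several choices here are assumptions rather than the authors' definitions: the bending angle is taken as the angle at the middle atom between its two neighbours, the torsion sign follows the atan2 formula used in the code, and `nearest_state` picks the closest center by squared angular difference, which is only a stand-in for whatever probabilistic assignment the clustering implies. The 17 state centers are not listed on the slides, so they must be supplied by the caller.

```python
import math

def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def bend(p_prev, p_mid, p_next):
    """Angle in degrees at p_mid between the directions to p_prev and p_next."""
    u, v = _sub(p_prev, p_mid), _sub(p_next, p_mid)
    cosine = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

def torsion(p1, p2, p3, p4):
    """Dihedral angle in degrees, in (-180, 180], about the p2-p3 pseudobond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def segment_angles(p1, p2, p3, p4):
    """Steps 1-2: four contiguous C-alpha positions -> (theta, theta_prime, tau)."""
    return bend(p1, p2, p3), bend(p2, p3, p4), torsion(p1, p2, p3, p4)

def nearest_state(angles, centers):
    """Step 3 (simplified): letter whose center is closest; tau is treated as periodic.

    centers: dict mapping letter -> (theta, theta_prime, tau), supplied by the caller.
    """
    def squared_distance(center):
        d_tau = (angles[2] - center[2] + 180.0) % 360.0 - 180.0
        return (angles[0] - center[0]) ** 2 + (angles[1] - center[1]) ** 2 + d_tau ** 2
    return min(centers, key=lambda letter: squared_distance(centers[letter]))

# Toy coordinates (hypothetical): three mutually perpendicular unit pseudobonds.
toy_angles = segment_angles((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Sliding this four-residue window along the chain and assigning one letter per window turns a structure into a string such as the 1molA sequence above, which is what makes the string-based SFP search possible.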