Comparative modeling
Ole Lund,
Associate Professor,
CBS, BioCentrum, DTU
OL
Comparative modeling
Also known as homology modeling
Uses a template from a related protein to build the model
Based on the finding that
– Protein structures tend to remain approximately the same even when many amino acids have changed during evolution
– Selection for conservation of structure?
Hence, proteins with similar sequences often have similar structures
Why make structural models?
Fast and cheap alternative to experimental determination of structures (X-ray & NMR)
– Not as accurate as experimental methods
– Not all proteins can be modeled with current methods
Applications
– Drug discovery (requires an accurate model)
– Plan new experiments (mutations)
– Understanding of function
Steps in comparative modeling
1. Find template
2. Make alignment
3. Build loops
4. Model side chains
5. Refinement
6. Evaluate model
Recovery from errors
An error made at an earlier step normally cannot be recovered at a later step
– The alignment cannot make up for a wrong choice of template
– Loop modeling cannot make up for a wrong alignment
Errors may be discovered at a later step and corrected by going back to the step where they arose
– e.g. by selecting a new (and better) template
Template identification
Search with sequence
– Blast
– Psi-Blast
– Fold recognition methods
Use significance levels (P or E values), not %ID
BLAST reports E-values:
– the number of random hits expected to be found with a given score or better
Rather than P values:
– the probability of finding at least one such hit by chance
– P = 1 - exp(-E), E = -ln(1 - P)
– http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
Use biological information
– Functional annotation in databases
– Active site/motifs
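As a minimal sketch, the conversion between E-values and P-values (P = 1 - exp(-E)) can be written in a few lines of Python. The function names are illustrative and not part of BLAST, and the formulas assume the Poisson approximation described in the Altschul tutorial linked above.

```python
import math


def p_from_e(e_value):
    """P = 1 - exp(-E): chance of at least one random hit at this score."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-e_value)


def e_from_p(p_value):
    """E = -ln(1 - P): inverse of p_from_e, valid for 0 <= P < 1."""
    return -math.log1p(-p_value)


# For small values, E and P are nearly identical.
print(p_from_e(7e-30))
# For E = 1 the P value is about 0.63, i.e. not a significant hit.
print(p_from_e(1.0))
```

For the significant hits one normally cares about (E well below 0.01), the two numbers can be used interchangeably.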
Example: Query sequence
>gi|2065035|emb|CAA65601.1| beta-lactamase [Chryseobacterium meningosepticum]
MLKKIKISLILALGLTSLQAFGQENPDVKIEKLKDNLYVYTTYNTFNGTKYAANAVYLVTDKGVVVIDCPWGEDKFKSFTDEIYKKHGKKVIMNIATHSHDDRAGGLEYFGKIGAKTYSTKMTDSILAKENKPRAQYTFDNNKSFKVGKSEFQVYYPGKGHTADNVVVWFPKEKVLVGGCIIKSADSKDLGYIGEAYVNDWTQSVHNIQQKFSGAQYVVAGHDDWKDQRSIQHTLDLINEYQQKQKASN
Since the discovery of penicillin, bacteria have developed defense mechanisms against these drugs. This has become a particular problem in recent decades, as certain pathogenic bacteria have become resistant to antibiotics. The primary defense mechanism is the production of beta-lactamases, enzymes that cleave beta-lactam antibiotics.
http://www.matfys.kvl.dk/~antony/
Blast search vs. pdb
>gi|3318914|pdb|1A7T|A Chain A, Metallo-Beta-Lactamase With Mes
 gi|3318915|pdb|1A7T|B Chain B, Metallo-Beta-Lactamase With Mes
 gi|3891997|pdb|1A8T|A Chain A, Metallo-Beta-Lactamase In Complex With L-159,061
 gi|3891998|pdb|1A8T|B Chain B, Metallo-Beta-Lactamase In Complex With L-159,061
 Length = 232
Score = 126 bits (317), Expect = 7e-30
Identities = 62/216 (28%), Positives = 111/216 (51%), Gaps = 1/216 (0%)
Query: 27  DVKIEKLKDNLYVYTTYNTFNG-TKYAANAVYLVTDKGVVVIDCPWGEDKFKSFTDEIYK 85
           D+ I +L D +Y Y +     G     +N + ++ +    ++D P  + + +   + +
Sbjct: 10  DISITQLSDKVYTYVSLAEIEGWGMVPSNGMIVINNHQAALLDTPINDAQTEMLVNWVTD 69

Query: 86  KHGKKVIMNIATHSHDDRAGGLEYFGKIGAKTYSTKMTDSILAKENKPRAQYTFDNNKSF 145
               KV   I  H H D  GGL Y  + G ++Y+ +MT  +  ++  P  ++ F ++ +
Sbjct: 70  SLHAKVTTFIPNHWHGDCIGGLGYLQRKGVQSYANQMTIDLAKEKGLPVPEHGFTDSLTV 129

Query: 146 KVGKSEFQVYYPGKGHTADNVVVWFPKEKVLVGGCIIKSADSKDLGYIGEAYVNDWTQSV 205
            +     Q YY G GH  DN+VVW P E +L GGC++K   +  +G I +A V  W +++
Sbjct: 130 SLDGMPLQCYYLGGGHATDNIVVWLPTENILFGGCMLKDNQTTSIGNISDADVTAWPKTL 189

Query: 206 HNIQQKFSGAQYVVAGHDDWKDQRSIQHTLDLINEY 241
             ++ KF  A+YVV GH ++     I+HT  ++N+Y
Sbjct: 190 DKVKAKFPSARYVVPGHGNYGGTELIEHTKQIVNQY 225
http://www.ncbi.nlm.nih.gov/blast/
Template sequence
1A8TB. Chain B, Metallo-...[gi:3891998] BLink, Domains, Links
LOCUS 1A8T_B 232 aa linear BCT 23-MAR-1998
DEFINITION Chain B, Metallo-Beta-Lactamase In Complex With L-159,061.
ACCESSION 1A8T_B
VERSION 1A8T_B GI:3891998
DBSOURCE pdb: molecule 1A8T, chain 66, release Mar 23, 1998; deposition: Mar 23, 1998; class: Hydrolase; source: Mol_id: 1; Organism_scientific: Bacteroides Fragilis; Strain: Tal3636; Variant: Clinical Isolate; Gene: Ccra; Expression_system: Escherichia Coli; Exp. method: X-Ray Diffraction.
KEYWORDS .
SOURCE Bacteroides fragilis
ORGANISM Bacteroides fragilis
Bacteria; Bacteroidetes; Bacteroides (class); Bacteroidales; Bacteroidaceae; Bacteroides.
……………
ORIGIN
1 aqksvkisdd isitqlsdkv ytyvslaeie gwgmvpsngm ivinnhqaal ldtpindaqt
61 emlvnwvtds lhakvttfip nhwhgdcigg lgylqrkgvq syanqmtidl akekglpvpe
121 hgftdsltvs ldgmplqcyy lggghatdni vvwlptenil fggcmlkdnq ttsignisda
181 dvtawpktld kvkakfpsar yvvpghgnyg gteliehtkq ivnqyiests kp
//
Template recognition: BlaB – beta-lactamase
Template: 1A8T, chain A
Alignment of query and template
Look at the alignment used to find the template
– Are secondary structure elements, active sites and other motifs aligned?
– Can gaps be closed?
– Is there room for the insertions?
Change the alignment manually, or by using a different alignment program or different alignment parameters
– Take care not to change it for the worse
– On average I only make things slightly worse by manual intervention!
Alignment: BlaB – beta-lactamase
BLAB   EKLKDNLYVYTTYNTFNGTKY-AANAVYLVTDKGVVVIDCPWGEDKFKSFTDEIYKKHGKKVIMNIATHS
1A8T.A TQLSDKVYTYVSLAEIEGWGMVPSNGMIVINNHQAALLDTPINDAQTEMLVNWVTDSLHAKVTTFIPNHW

BLAB   HDDRAGGLEYFGKIGAKTYSTKMTDSILAKENKPRAQYTFDNNKSFKVGKSEFQVYYPGKGHTADNVVVW
1A8T.A HGDCIGGLGYLQRKGVQSYANQMTIDLAKEKGLPVPEHGFTDSLTVSLDGMPLQCYYLGGGHATDNIVVW

BLAB   FPKEKVLVGGCIIKSADSKDLGYIGEAYVNDWTQSVHNIQQKFSGAQYVVAGHDDWKDQRSIQHTLDLIN
1A8T.A LPTENILFGGCMLKDNQTTSIGNISDADVTAWPKTLDKVKAKFPSARYVVPGHGNYGGTELIEHTKQIVN

BLAB   EYQQKQK
1A8T.A QYIESTS
Sequence identity 27%
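A figure such as the sequence identity above can be recomputed from any pairwise alignment. The sketch below is one common convention (identical residues divided by the number of columns where neither sequence has a gap); other conventions use a different denominator, so the exact percentage may differ slightly from what a given program reports. The function name is illustrative.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical residues over columns with no gap in either row."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Last block of the BlaB / 1A8T.A alignment: 1 of 7 positions is identical.
print(percent_identity("EYQQKQK", "QYIESTS"))
```

To score a whole alignment, concatenate the blocks of each row before calling the function.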
Template vs alignment identification
If the template was hard to find, the correct alignment will be tough to make
If the template is correct, part of the model will normally be correct
Build loops
Fragment-based methods
– Many implementations (M Levitt, L Holm, D Baker etc.)
– Fast
Energy-based methods
– Avoid stereochemically infeasible solutions
– Can see what is bad but not what is good!
A combination of methods is often used
No method can move the model (very much) towards the native conformation, i.e. reduce the root mean square deviation (RMSD) = how many Ångstrøms you are off
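The RMSD mentioned above is simple to compute once the two structures are superimposed. This is a minimal sketch that assumes both coordinate lists contain the same atoms in the same order (for example C-alpha atoms) and that the superposition has already been done; it performs no fitting itself. The function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between corresponding atoms.
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(model, native))  # one atom is 2 Å off, the other 0 Å
```

With coordinates in Ångstrøms, the result is in Ångstrøms as well.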
Loops: The Rosetta method
Find fragments (10 per amino acid) with the same sequence and secondary structure profile as the query sequence
Combine them using a Monte Carlo scheme to build the loop
Baker et al.
http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php
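To illustrate the general idea of combining fragments with a Monte Carlo scheme, here is a toy sketch. It is not the Rosetta implementation: the torsion-angle representation, the `assemble` function, the fragment dictionary and the energy function are all hypothetical stand-ins, and the acceptance rule is the standard Metropolis criterion.

```python
import math
import random


def metropolis_accept(delta_energy, temperature, rng):
    """Accept downhill moves always, uphill moves with Boltzmann probability."""
    if delta_energy <= 0:
        return True
    return rng.random() < math.exp(-delta_energy / temperature)


def assemble(n_residues, fragments, energy, steps=1000, temperature=1.0, seed=0):
    """Toy fragment assembly over a list of (phi, psi) backbone torsions.

    fragments: dict mapping a start position to candidate fragments,
               each a list of (phi, psi) pairs (hypothetical library).
    energy:    hypothetical scoring function taking the torsion list.
    """
    rng = random.Random(seed)
    torsions = [(-120.0, 120.0)] * n_residues  # extended starting chain
    current = energy(torsions)
    starts = sorted(fragments)
    for _ in range(steps):
        # Pick a random window and a random fragment for that window.
        start = rng.choice(starts)
        fragment = rng.choice(fragments[start])
        end = min(start + len(fragment), n_residues)
        trial = list(torsions)
        trial[start:end] = fragment[: end - start]
        trial_energy = energy(trial)
        if metropolis_accept(trial_energy - current, temperature, rng):
            torsions, current = trial, trial_energy
    return torsions, current


# Toy usage: the "energy" simply counts non-helical residues.
helix = (-60.0, -45.0)
library = {i: [[helix] * 3] for i in range(7)}
final, score = assemble(9, library, lambda t: float(sum(x != helix for x in t)), steps=200)
print(score)
```

A real method would use a physically or statistically motivated energy function and a fragment library selected from known structures, as described on the slide.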
Model side chains
Knowledge-based methods
– SCWRL performed well in CASP4 (http://dunbrack.fccc.edu/SCWRL3.php , http://dunbrack.fccc.edu/scwrl3protsci.pdf )
Energy calculations
– Slow
SCWRL (Bower, Cohen & Dunbrack)
Sidechain placement With a Rotamer Library
Assumes constant bond angles and bond lengths
1. Each residue begins in its most favored rotamer
2. Rotamer search to remove steric clashes between sidechains and backbone
3. Rotamer search to remove steric clashes between sidechains
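The three steps above can be sketched as a simple greedy search. This is only an illustration of the idea, not the actual SCWRL algorithm: the function name, the rotamer lists and the two clash-test callbacks are hypothetical, and a real program uses atomic coordinates and a more sophisticated search.

```python
def place_side_chains(rotamers, clashes_backbone, clashes_pair, max_rounds=10):
    """Greedy rotamer assignment.

    rotamers:         dict residue -> rotamer names, most favored first.
    clashes_backbone: callback (residue, rotamer) -> bool.
    clashes_pair:     callback (res_a, rot_a, res_b, rot_b) -> bool.
    """
    # Step 1: every residue starts in its most favored rotamer.
    choice = {res: options[0] for res, options in rotamers.items()}

    # Step 2: discard rotamers that clash with the backbone.
    allowed = {}
    for res, options in rotamers.items():
        ok = [rot for rot in options if not clashes_backbone(res, rot)]
        allowed[res] = ok or options[:1]  # keep the favorite if all clash
        choice[res] = allowed[res][0]

    # Step 3: resolve clashes between side chains, one residue at a time.
    def has_clash(res, rot):
        return any(clashes_pair(res, rot, other, choice[other])
                   for other in choice if other != res)

    for _ in range(max_rounds):
        changed = False
        for res in choice:
            if has_clash(res, choice[res]):
                for rot in allowed[res]:
                    if not has_clash(res, rot):
                        choice[res] = rot
                        changed = True
                        break
        if not changed:
            break
    return choice


# Toy usage with two residues and made-up rotamer names.
toy = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
result = place_side_chains(
    toy,
    clashes_backbone=lambda res, rot: rot == "b1",
    clashes_pair=lambda ra, x, rb, y: {x, y} == {"a1", "b2"},
)
print(result)
```

The greedy loop may leave clashes unresolved if no clash-free rotamer exists; it stops after `max_rounds` passes either way.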
Model (red) vs template (blue)
Model evaluation
Is the structure unlikely? Distributions of
– Dihedral angles (fraction in most favored regions)
– Bond lengths and angles
Procheck
– www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
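The dihedral angles checked in such an evaluation are computed from four consecutive atoms (for the backbone angle phi: C, N, CA, C; for psi: N, CA, C, N). The following is a minimal sketch of that geometry in plain Python; it is not taken from Procheck, and the function and helper names are illustrative.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Four points forming a 90 degree twist around the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Plotting phi against psi for every residue gives the Ramachandran-style distribution from which the fraction in most favored regions is read.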
Example of Procheck output
Benchmarking comparative modeling
CASP
– Critical Assessment of Structure Prediction
– Sequences from about-to-be-solved structures are given to groups who submit their predictions before the structure is published
EVA
– Newly solved structures are sent to prediction servers
– Evaluates automatic servers
CASP4: Best overall fold
1. Venclovas, C
2. Baker, D
3. Sternberg, M
4. Rychlewski, L (Bioinfo.PL)
5. SBI-AT
Tramontano et al., 2001
CASP4: Best details of models
1. Venclovas, C
2. Sternberg, M
3. Honig, B
4. Baker, D
5. SBI-AT
Tramontano et al., 2001
Accuracy of SwissModel
EVA
http://cubic.bioc.columbia.edu/eva/cm/res/rank.html
Analysis of Fold accuracy (% Equivalent Positions):
Ranking of the methods:
1. sdsc1
2. 3djigsaw
3. SwissModel
4. cphmodels
5. esypred
Links to modeling servers
Database of links – http://mmtsb.scripps.edu/cgi-bin/renderrelres?protmodel
SwissModel – www.expasy.ch/swissmod/SM_FIRST.html
3D-Jigsaw – www.bmm.icnet.uk/servers/3djigsaw/
SDSC1 – http://cl.sdsc.edu/hm.html
ESyPred3D – http://www.fundp.ac.be/urbm/bioinfo/esypred/
CPHmodels – www.cbs.dtu.dk/services/CPHmodels-2.0
Practical conclusions
Several servers exist in the public domain
Template and alignment must be correct
Loops are difficult to model
More info on comparative modeling
– http://speedy.embl-heidelberg.de/gtsp/
– http://www.cmbi.kun.nl/gv/course/index.html
– http://www.umass.edu/microbio/chime/explorer/homolmod.htm