Coordinate handling and exploitation
An overview of coordinate functionality in the CCP4 suite
Coordinate functionality in the REFMAC group of programs (A. Vaguine)
New CCP4 project “Protein Interfaces” (E. Krissinel)
Coordinate support in CCP4
• Old FORTRAN coordinate-related applications not using RWBrook (42%): own coordinate functions
• Refmac group of programs: own coordinate functions
• DNA group: own coordinate functions
• Other: own coordinate functions
• Old FORTRAN coordinate-related applications using RWBrook (33%): RWBrook emulator
• New C & C++ coordinate-related applications (a few): Clipper, Molecular Graphics, Coot, SSM
• MMDB (C++ Coordinate Library)
CCP4 Coordinate Library (MMDB)
Diagram labels: Manager (interface, API; one or more C++ classes); file formats: PDB file, mmCIF file, binary file; data objects: Header, Cryst, Sequence, Model, Chain, Residue, Atom (a Model contains Chains, which contain Residues, which contain Atoms).
• C++ class hierarchy
• PDB/mmCIF support
• Database features
• ~600 interface functions
• Emulate RWBrook
• Wealth of retrieval, selection, transformation and edit tools
• User-defined data
• Built-in high-level functionality (contacts, alignment, superposition etc.)
• Monomer database
• SWIG interface
• Stable and documented
E. Krissinel et al. (2004) Acta Cryst. D60, 2250-2255
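The Model / Chain / Residue / Atom hierarchy and the idea of retrieval, selection and transformation tools can be pictured with a small sketch. The Python classes below are illustrative only: they are not the MMDB API (MMDB is a C++ library), and every class, field and function name here is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Atom:
    name: str
    x: float
    y: float
    z: float


@dataclass
class Residue:
    name: str
    seq_num: int
    atoms: List[Atom] = field(default_factory=list)


@dataclass
class Chain:
    chain_id: str
    residues: List[Residue] = field(default_factory=list)


@dataclass
class Model:
    serial: int
    chains: List[Chain] = field(default_factory=list)

    def atoms(self) -> Iterator[Atom]:
        """Walk the hierarchy Model -> Chain -> Residue -> Atom."""
        for chain in self.chains:
            for residue in chain.residues:
                yield from residue.atoms


def select_atoms(model: Model, atom_name: str) -> List[Atom]:
    """Hypothetical selection helper: all atoms with the given name."""
    return [atom for atom in model.atoms() if atom.name == atom_name]


def translate(model: Model, dx: float, dy: float, dz: float) -> None:
    """Hypothetical transformation helper: shift every atom in place."""
    for atom in model.atoms():
        atom.x += dx
        atom.y += dy
        atom.z += dz


def build_example() -> Model:
    """Two-residue toy chain with made-up coordinates."""
    gly = Residue("GLY", 1, [Atom("N", 0.0, 0.0, 0.0), Atom("CA", 1.0, 0.0, 0.0)])
    ala = Residue("ALA", 2, [Atom("N", 2.0, 0.0, 0.0), Atom("CA", 3.0, 0.0, 0.0)])
    return Model(1, [Chain("A", [gly, ala])])
```

A call such as `select_atoms(build_example(), "CA")` returns the two alpha-carbon atoms of the toy chain; a real library would add file I/O, crystallographic data and far richer selection rules on top of such a hierarchy.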
General remarks
Approximately 40% of the CCP4 suite now uses a common set of coordinate functions provided by MMDB. This should help greatly with maintenance and with adaptation to possible format changes.
Converting older FORTRAN applications that do not use RWBrook to MMDB means, in most cases, a complete rewrite. This does not seem to be necessary at the moment.
All ongoing developments in FORTRAN seem to use their own coordinate functions and libraries.
MMDB delivers its full power only through the C++ interface; most MMDB functionality cannot be expressed in traditional FORTRAN terms. Should we encourage new coordinate developments in C/C++ using MMDB, shifting away from FORTRAN thinking?
The new coordinate-related CCP4 projects (MG, Coot, SSM and Protein Interfaces) are all based on MMDB, and that seems to be an advantage for them.
PIAS Project goals
Develop a tool and a publicly available interactive service to aid the solution of various tasks that involve structural and chemical analysis of protein interactions, such as
• prediction of oligomeric states
• analysis of structure-function relationships
• analysis and prediction of protein interactions
• search for interface homologues
• active site recognition and analysis
• protein surface analysis
• structure specificity analysis
• other
The project started in 2004.
PIAS Project overview
• Interactive Web server
• PIAS database
• Crystal interfaces
• Interface calculations, analysis, scoring & biological significance
• Interfaces & structure similarity searches
• Interface fingerprinting
• Applied studies (e.g. discovery of multispecific proteins)
• Active site recognition
• Docking
• Procedures for CCP4 MG
• Prediction of interfaces
• Prediction of oligomeric states (PQS-3)
• Interfaces & surface similarity searches
Some of these parts are provisional, subject to progress and feasibility.
PIAS Project schedule
Schedule chart: the project components listed above, including the PIAS database, plotted against the periods 2004-2005, 2005-2007, 2006-2008 and 2004-2008.
PIAS Database
Contains interfaces between polypeptides found in all PDB entries: all crystal contacts for X-ray entries and chain contacts for NMR entries. Also contains predicted protein assemblies.
• An interface is defined as the area that becomes inaccessible to solvent upon complex formation
• Databased properties for interfacing structures:
  Size, weight
  Solvent accessible area per residue (+ selection of surface atoms and residues)
  Solvation energy per residue
  SSM data for structure search
  Structure and sequence alignment
• Databased properties for interfaces:
  Interface area per residue (+ selection of interfacing atoms and residues)
  Number of atoms and residues involved
  Solvation energy gain (per residue) and P-value of hydrophobic patches
  List of potential hydrogen bonds and salt bridges
  Complexation significance score
• Databased properties for assemblies:
  Composition, chemical formula
  List of engaged interfaces
  Transformation matrices
  Solvation energy gain
  Solvent accessible and buried surface area
  Dissociation pattern and barrier
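The interface definition above (area that becomes inaccessible to solvent upon complex formation) amounts to a difference of solvent-accessible areas. The sketch below assumes that per-atom accessible areas have already been computed by some other tool, both for the isolated structures and for the complex; the function names, the atom labels and the convention of reporting half of the total buried area as the interface area are assumptions of this example, not a description of the PIAS code.

```python
def buried_area_per_atom(asa_free, asa_complex):
    """Area each atom loses on complex formation (free minus complexed)."""
    return {atom: asa_free[atom] - asa_complex[atom] for atom in asa_free}


def interface_summary(asa_free_a, asa_free_b, asa_complex, threshold=0.0):
    """Interface area and interfacing atoms for two structures A and B.

    asa_free_a, asa_free_b: accessible area per atom of each isolated structure.
    asa_complex: accessible area per atom in the complex (atoms of A and B).
    threshold: minimum buried area for an atom to count as interfacing
               (hypothetical parameter).
    """
    buried_a = buried_area_per_atom(asa_free_a, asa_complex)
    buried_b = buried_area_per_atom(asa_free_b, asa_complex)
    interfacing = sorted(
        atom
        for buried in (buried_a, buried_b)
        for atom, area in buried.items()
        if area > threshold
    )
    total_buried = sum(buried_a.values()) + sum(buried_b.values())
    # Assumed convention: the interface area is half of the total buried area.
    return {"interface_area": total_buried / 2.0, "interfacing_atoms": interfacing}
```

With made-up areas such as `{"A:a1": 10.0, "A:a2": 5.0}` for A, `{"B:b1": 8.0}` for B and `{"A:a1": 4.0, "A:a2": 5.0, "B:b1": 2.0}` for the complex, atoms `A:a1` and `B:b1` are reported as interfacing, and the same per-atom differences can be summed per residue to obtain an interface area per residue.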
Existing tools for the calculation of quaternary structures
PQS server @ MSD (Kim Henrick) (PQS-1)
Method: recursive splitting of the largest complexes allowed by crystal symmetry. The termination criterion is derived from the individual statistical scores of crystal contacts. The results are not curated.
PITA software @ Thornton group, EBI (Hannes Ponstingl) (PQS-2)
Method: progressive build-up by addition of monomeric chains that suit the selection criteria. The results are partly curated.
Graph-chemical approach
• Crystal is represented as a periodic graph of monomers (a “supermolecule”)
• All possible assemblies that obey the symmetry criteria are recursively enumerated as subgraphs covering the whole crystal
• Only sets of chemically stable assemblies are left as an answer:
  (stability criterion: an inequality relating $\Delta G_{int}$ and $T\Delta S$ to 0)
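The enumeration idea can be illustrated on a toy, finite graph. The sketch below is not the PIAS algorithm: it ignores crystal periodicity, it assumes that engaging an interface type engages all of its symmetry copies at once, and the stability test is a caller-supplied placeholder (`is_stable` is a hypothetical name). Monomers are graph nodes, interfaces are edges, and each choice of engaged interface types splits the graph into candidate assemblies.

```python
from itertools import combinations


def connected_components(nodes, edges):
    """Split the monomer graph into assemblies (connected components)."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(frozenset(component))
    return components


def enumerate_assembly_sets(nodes, interfaces):
    """Yield (engaged interface types, resulting assemblies) for every choice.

    interfaces maps an interface type to the list of monomer pairs it links;
    all copies of an engaged type are engaged together (assumption).
    """
    names = sorted(interfaces)
    for k in range(len(names) + 1):
        for engaged in combinations(names, k):
            edges = [pair for name in engaged for pair in interfaces[name]]
            yield engaged, connected_components(nodes, edges)


def stable_sets(nodes, interfaces, is_stable):
    """Keep only the sets in which every assembly passes the stability test."""
    return [
        (engaged, assemblies)
        for engaged, assemblies in enumerate_assembly_sets(nodes, interfaces)
        if all(is_stable(assembly, engaged) for assembly in assemblies)
    ]
```

For four monomers A, B, C, D with a "strong" interface type linking A-B and C-D and a "weak" type linking B-C, the enumeration produces four candidate sets: four monomers, two dimers, one dimer plus two monomers, and a single tetramer. The stability predicate then decides which of these remain.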
Prediction of oligomeric states
Success rate obtained on a benchmark set of 212 structures (H. Ponstingl):
• PQS server @ MSD: 78% (not optimised on the benchmark set)
• PITA software: 84% (optimised with 18 parameters)
• PIAS: 89% (optimised with 8 parameters, underfit)
Early results outside the benchmark set indicate some advantage for PIAS; however, the actual differences may be less significant.
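One rough way to see why the differences may be less significant than they look is to attach a sampling uncertainty to each success rate. The sketch below treats each rate as an independent binomial proportion on 212 structures, which is a simplifying assumption (the three methods were scored on the same benchmark, and two of them were optimised on it); the function and variable names are hypothetical.

```python
import math

BENCHMARK_SIZE = 212
SUCCESS_RATES = {"PQS server": 0.78, "PITA": 0.84, "PIAS": 0.89}


def binomial_standard_error(success_rate, n):
    """Standard error of a proportion estimated from n independent trials."""
    return math.sqrt(success_rate * (1.0 - success_rate) / n)


def summarize(rates=SUCCESS_RATES, n=BENCHMARK_SIZE):
    """Map each method to (approximate number of correct cases, standard error)."""
    return {
        name: (round(rate * n), binomial_standard_error(rate, n))
        for name, rate in rates.items()
    }


if __name__ == "__main__":
    for name, (correct, error) in summarize().items():
        print(f"{name}: about {correct} of {BENCHMARK_SIZE} correct, "
              f"standard error {100 * error:.1f} percentage points")
```

Under these assumptions each rate carries a standard error of roughly 2 to 3 percentage points, so gaps of 5 to 6 points between neighbouring methods are suggestive rather than conclusive.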
Prediction of oligomeric states
PQS may be predicted only up to a certain level of confidence. It seems that 85-90% of correct predictions may be reached. The main reasons why a 100% success rate can never be achieved:
• theoretical models for protein affinity and entropy change upon complexation are primitive
• coordinate (experimental) data are of limited accuracy
• there is no feasible way to take conformation changes into account
• experimental data on multimeric states is very limited and not always reliable - calibration of parameters is difficult
• assemblies may exist in some environments and dissociate in others - a definitive answer is simply not there
Interfaces and structure similarity searches
Searching the PIAS database for structurally similar interfaces and interfaces between similar structures
Questions to answer
• What interfaces are formed by structures similar to the given one(s) in the PDB?
• What are the interface partners of a given structure in the PDB?
• What is the relation between sequence and the biological (complexation) significance of the interface (function)?
• What PQS may be formed by structures similar to the given one(s), and how may the PQS depend on the sequence?
• Is a given structure interaction-specific and/or multispecific?
PIAS web server
A preliminary version of the MSD protein interaction service is set up at
http://www.ebi.ac.uk/msd-srv/prot_int/cgi-bin/piserver
The version includes:
• Calculations for uploaded files or database retrievals by PDB ID code of
  Solvent-accessible surface area
  Crystal contacts / interfaces
  Protein interface parameters and scoring: interface area, solvation energy gain, hydrogen bonds and salt bridges, hydrophobic P-value, biological relevance score, selection of interfacing residues and atoms
  Protein quaternary structures
• Interface and structure searches in the protein interface database derived from the PDB
• Visualisation of the structures, interfaces and PQS
Figure panels: 3gcb hexamer; dissociation of the 3gcb hexamer
Concluding remarks
The PIAS software is almost ready for its first release. It may be released in two months' time, after catching up with
• on-line help and documentation
• minor cleaning and re-design of output pages
• enhancement of structural search options
• further entropy calibration to increase the accuracy of PQS prediction
Further work will concentrate on
• surface calculation and analysis
• surface / active site searches
• possibly docking
• additions to and improvements of existing functions (based on users' feedback and own needs)