Coordinate handling and exploitation

An overview of coordinate functionality in CCP4 suite

Coordinate functionality in REFMAC group of programs (A. Vaguine)

New CCP4 project “Protein Interfaces” (E. Krissinel)

Coordinate support in CCP4

• Old FORTRAN coordinate-related applications not using RWBrook (42%): own coordinate functions
• Refmac group of programs: own coordinate functions
• Old FORTRAN coordinate-related applications using RWBrook (33%)
• New C & C++ coordinate-related applications (a few)
• Clipper
• Molecular Graphics
• Coot
• RWBrook emulator
• MMDB (C++ Coordinate Library)
• SSM
• DNA group: own coordinate functions
• Other: own coordinate functions

CCP4 Coordinate Library (MMDB)

[Diagram of the library layout. Labels: Manager; Interface; API; PDB file; mmCIF file; binary file; one or more C++ classes; Model; Header; Cryst; Sequence; Chain; Residue; Atom]

• C++ class hierarchy
• PDB/mmCIF support
• Database features
• ~600 interface functions
• Emulate RWBrook
• Wealth of retrieval, selection, transformation and edit tools
• User-defined data
• Built-in high-level functionality (contacts, alignment, superposition etc.)
• Monomer database
• SWIG interface
• Stable and documented

E. Krissinel et al. (2004) Acta Cryst. D60, 2250-2255
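To make the Model, Chain, Residue, Atom hierarchy concrete, here is a minimal Python sketch that reads fixed-column PDB ATOM records into nested containers. It is an illustration only and does not use or reproduce the MMDB API: the class names, `format_atom_line` and `parse_atom_lines` are all hypothetical, and only a handful of ATOM fields are handled.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    name: str
    element: str
    xyz: tuple


@dataclass
class Residue:
    name: str
    seq_num: int
    atoms: list = field(default_factory=list)


@dataclass
class Chain:
    chain_id: str
    residues: dict = field(default_factory=dict)  # keyed by (seq_num, insertion code)


@dataclass
class Model:
    chains: dict = field(default_factory=dict)  # keyed by chain ID

    def atoms(self):
        """Iterate over every atom, walking Model -> Chain -> Residue -> Atom."""
        for chain in self.chains.values():
            for residue in chain.residues.values():
                yield from residue.atoms


def format_atom_line(serial, name, res_name, chain_id, seq_num, x, y, z, element):
    """Write one fixed-column PDB ATOM record (occupancy 1.00, B-factor 0.00)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain_id:1s}{seq_num:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}")


def parse_atom_lines(lines):
    """Build the hierarchy from ATOM/HETATM records using PDB column positions."""
    model = Model()
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        chain_id = line[21]
        key = (int(line[22:26]), line[26])  # residue number + insertion code
        chain = model.chains.setdefault(chain_id, Chain(chain_id))
        residue = chain.residues.setdefault(key, Residue(line[17:20].strip(), key[0]))
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residue.atoms.append(Atom(line[12:16].strip(), line[76:78].strip(), xyz))
    return model


# Made-up coordinates, used only to exercise the round trip.
lines = [
    format_atom_line(1, " N", "ALA", "A", 1, 11.104, 6.134, -6.504, "N"),
    format_atom_line(2, " CA", "ALA", "A", 1, 11.639, 6.071, -5.147, "C"),
    format_atom_line(3, " N", "GLY", "B", 1, 20.000, 1.500, 3.250, "N"),
]
model = parse_atom_lines(lines)
ca_atoms = [a for a in model.atoms() if a.name == "CA"]  # a simple "selection"
```

A real coordinate library adds much more on top of this (alternate locations, multiple models, crystallographic data, mmCIF input), which is the kind of functionality the bullet list above attributes to MMDB.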

General remarks

Approximately 40% of the CCP4 suite now uses a common set of coordinate functions provided by MMDB. This should help greatly in maintenance and in adaptation to possible format changes.

Converting older FORTRAN applications that do not use RWBrook to MMDB means, in most cases, a complete rewrite. This does not seem to be necessary at the moment.

All ongoing developments in FORTRAN seem to use their own coordinate functions and libraries.

MMDB delivers its full power only through the C++ interface; most MMDB functionality cannot be expressed in traditional FORTRAN terms. Should we encourage new coordinate developments in C/C++ using MMDB, that is, a shift away from FORTRAN thinking?

New coordinate-related CCP4 projects (MG, Coot, SSM and Protein Interfaces) are all based on MMDB, and that seems to be an advantage for the projects.

PIAS: Protein Interactions, Assemblies and Searches

E. Krissinel

CCP4 - EBI/MSD project

PIAS Project goals

Develop a tool and a publicly available interactive service to aid the solution of various tasks that involve structural and chemical analysis of protein interactions, such as

• prediction of oligomeric states
• analysis of structure-function relationship
• analysis and prediction of protein interactions
• search for interface homologues
• active site recognition and analysis
• protein surface analysis
• structure specificity analysis
• other

Project started in 2004.

PIAS Project overview

Components of the interactive Web server (some parts are provisional, subject to progress and feasibility):

• PIAS database
• Crystal interfaces
• Interface calculations, analysis, scoring & biological significance
• Interfaces & structure similarity searches
• Interfaces & surface similarity searches
• Interface fingerprinting
• Applied studies (e.g. discovery of multispecific proteins)
• Active site recognition
• Docking
• Procedures for CCP4 MG
• Prediction of interfaces
• Prediction of oligomeric states (PQS-3)

PIAS Project schedule

[Timeline of the project components named in the project overview: crystal interfaces; interface calculations, analysis, scoring & biological significance; interfaces & structure similarity searches; interface fingerprinting; applied studies (e.g. discovery of multispecific proteins); active site recognition; docking; procedures for CCP4 MG; prediction of interfaces; prediction of oligomeric states (PQS-3); interfaces & surface similarity searches; PIAS database. Periods shown: 2004-2005, 2005-2007, 2006-2008, 2004-2008.]

PIAS Database

Contains interfaces between polypeptides found in all PDB entries: all crystal contacts for X-ray entries and chain contacts for NMR entries. Also contains predicted protein assemblies.

• Interface is defined as the area that becomes inaccessible to solvent upon complex formation

• Databased properties for interfaces:
  • Interface area per residue (+ selection of interfacing atoms and residues)
  • Number of atoms and residues involved
  • Solvation energy gain (per residue) and P-value of hydrophobic patches
  • List of potential hydrogen bonds and salt bridges
  • Complexation significance score

• Databased properties for interfacing structures:
  • Size, weight
  • Solvent accessible area per residue (+ selection of surface atoms and residues)
  • Solvation energy per residue
  • SSM data for structure search
  • Structure and sequence alignment

• Databased properties for assemblies:
  • Composition, chemical formula
  • List of engaged interfaces
  • Transformation matrices
  • Solvation energy gain
  • Solvent accessible and buried surface area
  • Dissociation pattern and barrier
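The definition of an interface as the area that becomes inaccessible to solvent on complex formation can be sketched as a difference of solvent-accessible areas: area(A) + area(B) - area(AB). The Python sketch below uses a simple Shrake-Rupley style point-sampling estimate. This is an illustrative assumption, not necessarily the method used in PIAS; the function names, the 1.4 Å probe radius, the number of sample points and the atom radii are all choices made for the example.

```python
import math


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return points


def accessible_area(atoms, probe=1.4, n_points=200):
    """Total solvent-accessible area of atoms given as (x, y, z, radius) tuples."""
    unit = sphere_points(n_points)
    total = 0.0
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # radius of the probe-centre sphere around atom i
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                # A sample point is buried if it lies inside another atom's
                # probe-expanded sphere.
                if j != i and (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2:
                    break
            else:
                exposed += 1
        total += 4.0 * math.pi * big_r * big_r * exposed / n_points
    return total


def buried_area(a, b, **kwargs):
    """Area that becomes inaccessible to solvent when a and b form a complex."""
    return (accessible_area(a, **kwargs) + accessible_area(b, **kwargs)
            - accessible_area(a + b, **kwargs))


# Two single-"atom" partners, 3 Å apart (toy input, not real structures).
partner_a = [(0.0, 0.0, 0.0, 1.8)]
partner_b = [(3.0, 0.0, 0.0, 1.8)]
print(f"buried area: {buried_area(partner_a, partner_b):.1f} A^2")
```

The value returned here is the total area buried on both partners. Reporting half of it as "the interface area" is a common convention, but whether PIAS does so is not stated in the slides. The double loop is quadratic in the number of atoms, so a neighbour list would be needed for real structures.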

Existing tools for the calculation of quaternary structures

Prediction of oligomeric states (PQS-3)

PQS server @ MSD (Kim Henrick) (PQS-1)

Method: recursive splitting of the largest complexes allowed by crystal symmetry. The termination criterion is derived from the individual statistical scores of crystal contacts. The results are not curated.

PITA software @ Thornton group EBI (Hannes Ponstingl) (PQS-2)

Method: progressive build-up by addition of monomeric chains that suit the selection criteria. The results are partly curated.

Graph-chemical approach

• Crystal is represented as a periodic graph of monomers (a “supermolecule”)

• All possible assemblies that obey the symmetry criteria are recursively enumerated as subgraphs covering all the crystal

• Only sets of chemically stable assemblies are left as an answer:

[Equation: stability condition in terms of $\Delta G_{int}$, $T\Delta S$ and 0]
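The enumeration idea can be sketched on a toy, finite graph: monomers are nodes, each interface type contributes a set of symmetry-equivalent edges, every subset of engaged interface types defines a set of assemblies (the connected components), and a stability test filters the candidates. The Python below is a simplified illustration under these assumptions and not the PIAS algorithm. It ignores the periodicity of the crystal, and the names `enumerate_assembly_sets`, `is_stable` and the `score` values are hypothetical placeholders for the chemical stability criterion.

```python
from itertools import combinations


def connected_components(nodes, edges):
    """Group nodes into connected components with a small union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


def enumerate_assembly_sets(monomers, interface_types, is_stable):
    """Try every subset of engaged interface types and keep the stable ones.

    interface_types maps a type name to {"pairs": [...], "score": float}.
    All symmetry-equivalent contacts of one type are engaged together, so
    every candidate covers all monomers with symmetry-consistent assemblies.
    """
    accepted = []
    names = sorted(interface_types)
    for k in range(len(names) + 1):
        for engaged in combinations(names, k):
            edges = [p for t in engaged for p in interface_types[t]["pairs"]]
            assemblies = connected_components(monomers, edges)
            # Placeholder stability test: every engaged interface must pass.
            if all(is_stable(interface_types[t]) for t in engaged):
                accepted.append((engaged, assemblies))
    return accepted


# Toy crystal fragment: four monomers, one favourable and one unfavourable
# interface type (scores are invented for the example).
monomers = ["A", "B", "C", "D"]
interface_types = {
    "strong": {"pairs": [("A", "B"), ("C", "D")], "score": 5.0},
    "weak": {"pairs": [("B", "C"), ("D", "A")], "score": -2.0},
}
accepted = enumerate_assembly_sets(monomers, interface_types,
                                   lambda t: t["score"] > 0.0)
best = max(accepted, key=lambda item: len(item[1][0]))  # largest assemblies
```

In this toy case the accepted candidates are the isolated monomers and the two dimers formed through the "strong" interface; the tetramer is rejected because it would engage the "weak" interface. Trying every subset is exponential in the number of interface types, which is affordable only because a crystal usually has few distinct ones.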




Success rate obtained on a benchmark set of 212 structures (H. Ponstingl):

• PQS server @ MSD: 78% (not optimised on the benchmark set)
• PITA software: 84% (optimised with 18 parameters)
• PIAS: 89% (optimised with 8 parameters, underfit)

Early results outside the benchmark set indicate some advantage for PIAS; however, the actual differences may be less significant.





PQS may be predicted only up to a certain level of confidence. It seems that 85-90% of correct predictions may be reached. The main reasons why a 100% success rate can never be achieved:

• theoretical models for protein affinity and entropy change upon complexation are primitive

• coordinate (experimental) data are of limited accuracy

• there is no feasible way to take conformational changes into account

• experimental data on multimeric states are very limited and not always reliable, so calibration of parameters is difficult

• assemblies may exist in some environments and dissociate in others, so a definitive answer is simply not there

Interfaces and structure similarity searches

Searching the PIAS database for structurally similar interfaces and interfaces between similar structures

Questions to answer

• What interfaces are formed by structures similar to the given one(s) in PDB?

• What are the interface partners of a given structure in PDB?

• What is the relation between sequence and biological (complexation) significance of the interface (function)?

• What PQS may be formed by structures similar to the given one(s), and how may the PQS depend on the sequence?

• Is a given structure interaction-specific and/or multispecific?

PIAS web server

A preliminary version of the MSD protein interaction service is set up at

http://www.ebi.ac.uk/msd-srv/prot_int/cgi-bin/piserver

The version includes:

• Calculations, for uploaded files or database retrievals by PDB ID code, of
  • Solvent-accessible surface area
  • Crystal contacts / interfaces
  • Protein interface parameters and scoring: interface area, solvation energy gain, hydrogen bonds and salt bridges, hydrophobic P-value, biological relevance score, selection of interfacing residues and atoms
  • Protein Quaternary Structures

• Interface and structure searches in protein interface database derived from PDB

• Visualisation of the structures, interfaces and PQS


















[Screenshots of PIAS web server pages]


[Figures: 3gcb hexamer; dissociation of 3gcb hexamer]


















Concluding remarks

The PIAS software is almost ready for its first release. It may be released in two months' time, after catching up with

• on-line help and documentation
• minor cleaning and re-design of output pages
• enhancement of structural search options
• further entropy calibration to increase accuracy of PQS prediction

Further work will concentrate on

• surface calculation and analysis
• surface / active sites searches
• possibly docking
• additions to and improvements of existing functions (based on users’ feedback and own needs)
