Page 1: John Liebeschuetz, Peter Carlqvist, Simon Bowden Cambridge Crystallographic Data Centre

www.ccdc.cam.ac.uk

CCDC Tools for Mining Structural Databases Or – Building Solid Foundations for a Structure Based Design Campaign

John Liebeschuetz, Peter Carlqvist, Simon Bowden

Cambridge Crystallographic Data Centre

12 Union Rd., Cambridge, UK

Page 2

Assessment and Comparison of Ligand – Protein Structural Models

• For the Crystallographer

– What is wrong with my model?

– What interesting features or differences with related structures can I highlight in my publication?

• For the Molecular Modeller

– What is wrong with the Crystallographer’s model?

– What interesting features or differences with related structures can I use to inform my structure-based drug design campaign?

– Are there non-homologous structures with similar features that I need to watch out for?

Page 3

Why can’t I take a structure from the PDB and just use it?

• Validation of ligand structures bound to proteins

– 15% of 100 recent PDB entries have ligand geometries that are almost certainly in significant error (in-house analysis using Relibase+/Mogul)

Evaluation of PDB ligand datasets with Mogul and Relibase (pie charts):

– Pre-2000 (dataset from the 1990s): correct 34%, wrong 26%, not unusual 40%

– 2006 (most recent dataset): correct 29%, wrong 16%, not unusual 55%

Page 4

How much ligand strain is accommodated by the protein?

• Accepted view – Many ligands adopt strained conformations when bound to proteins; some (60%) do not bind even in a local minimum conformation. (Perola & Charifson, J. Med. Chem. 2004, 47, 2499-2510)

• Alternative view – Ligands usually (but not always) bind in a local minimum. Many ‘strained’ structures found in the PDB are imperfectly refined. (OpenEye, B. Kelley and G. Warren, EuroCYP)

Page 5

CCDC Tools that can help you

• Relibase/Relibase+ - Web-based database system for searching, retrieving and analysing 3D structures of protein-ligand complexes in the Brookhaven Protein Data Bank (PDB)

– Relibase is freely available for academics

– Relibase+ has extra features (some of these will be used in this workshop)

• The Cambridge Structural Database System - Database of > 400,000 small molecule crystallographic structures, and associated query software

– Mogul and IsoStar knowledge-bases of molecular geometry and intermolecular interactions

– Directly linked access from Relibase+

Page 6

The Workshop

Part 1: Validation of models and structural analysis

• Analysing a protein structure for errors and interesting features

• Comparing a structure with structures related by homology or by functionality

Part 2: Probing the Protein-Ligand Interface

• Substructure searching in Relibase/Relibase+

• Comparing the interactions of different ligands with the same target

• Validating an unusual interaction using substructure searching in Relibase+

Page 7

Relibase+

• Relibase+

– Web-based database system for searching, retrieving and analysing 3D structures of protein-ligand complexes in the Brookhaven Protein Data Bank (PDB)

– Successor to ReLiBase, developed by Manfred Hendlich et al. (Merck, Marburg U.): M. Hendlich, Acta Cryst. D54, 1178-1182, 1998

• Relibase: free on WWW for academics

– http://relibase.ccdc.cam.ac.uk/

– http://relibase.rutgers.edu/

Page 8

Relibase+: Basic Functionality

• Keyword searching

• FASTA protein sequence searching

• 2D substructure searching

• 3D protein-ligand interaction searching

• Protein-protein interaction searching

• Similarity searching for ligands

• SMILES substructure matching

• Automatic superposition of related binding sites to compare ligand binding modes, water positions, etc.

• 3D visualisation with AstexViewer and ReliView (Hermes)

Page 9

Relibase+: Advanced Functionality

• Functionality for generation and search of proprietary databases of protein-ligand complexes alongside the PDB

• Links to the Mogul and IsoStar modules of the CSDS for geometry validation

• Additional modules: Crystal packing, WaterBase, CavBase

• Detailed analysis of superimposed binding sites

• Enhanced treatment of hitlists

• Reliscript: Command-line access via a Python-based toolkit

• Coming Soon: SecBase including Turn Classification

Page 10

CavBase

• Detect unexpected similarities amongst protein cavities (e.g. active sites) that share little or no sequence homology.

• Similarity judged by matching 3D property descriptors (pseudocentres) that encode the shape and chemical characteristics of each cavity

• No sequence information used, can detect similar cavities even if they have no obvious secondary-structure relationship

• Developed by S.Schmitt et al., J.Mol.Biol. (2002)

Page 11

Cambridge Structural Database

• Repository for the world’s small organic and metal-organic crystal structures (up to 500 non-H atoms)

• Experimentally determined 3D structures via X-ray and neutron diffraction methods

• 2007 release contains 423,798 entries

– approximately 32,000 entries added per year

• Derived from around 1200 published sources

– official depository for >80 major journals

– majority of data directly deposited electronically (CIF)

• Increasing number of Private Communications

Page 12

How much Data is Available?

[Figure: Growth of the CSD, 1970-2006, with predicted growth to 2010; vertical axis: number of entries (0 to 600,000); horizontal axis: year]

• 419,768 entries, June 2007

• >500,000 entries predicted during 2009
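As a rough check on the prediction for 2009, a linear extrapolation from the June 2007 count can be sketched as below. The rate of about 32,000 new entries per year is the figure quoted for the 2007 release; constant linear growth is an assumption made only for this illustration, and the function name is hypothetical.

```python
from datetime import date


def years_to_reach(target, current, per_year):
    """Years needed to grow from `current` to `target` at a constant rate."""
    return (target - current) / per_year


entries_june_2007 = 419_768
entries_per_year = 32_000  # approximate rate; assumed constant here

years = years_to_reach(500_000, entries_june_2007, entries_per_year)

# Convert fractional years after June 2007 into a calendar month.
months = round(years * 12)
start = date(2007, 6, 1)
total = start.month - 1 + months
crossing = date(start.year + total // 12, total % 12 + 1, 1)

print(f"{years:.2f} years after June 2007, i.e. around {crossing.year}-{crossing.month:02d}")
```

Under these assumptions the 500,000 mark falls late in 2009, which is consistent with the prediction on the slide.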

Page 13

CSD Information content

• Crystal structure data: atomic coordinates, unit-cell, space-group symmetry (fully validated)

Page 14

CSD Information content

Bibliographic and Chemical Information

• Bibliographic and chemical text and properties (all searchable)

Example entry:

– Compound: 4-Oxonicotinamide-1-(1’-beta-D-2’,3’,5’-tri-O-acetyl-ribofuranoside)

– Source: Rothmannia longiflora

– Colour: pale yellow

– Habit: acicular

– Polymorph: Form IV

– Formula: C17 H20 N2 O9

– Reference: G. Bringmann, M. Ochse, K. Wolf, J. Kraus, K. Peters, E-M. Peters, M. Herderich, L. Ake, F. Tayman, Phytochemistry 51 (1999), p271

– R-factor: 0.0506

• Chemical diagram and chemical connectivity to enable 2D and 3D searching for substructures, pharmacophores and intermolecular interactions

• Cross-referencing between entries

Page 15

Cambridge Structural Database System

• Cambridge Structural Database

• PreQuest: Database production

• VISTA: Statistical analysis

• Mercury: Graphical display, packing analysis

• ConQuest: Database search

• Knowledge Bases

– Mogul: Library of Molecular Geometry

– IsoStar: Library of Intermolecular Interactions

Page 16

Mogul: A Knowledge Base of Molecular Geometries

Bruno et al., J. Chem. Inf. Comput. Sci., 44, 2133-2144, 2004

Page 17

Mogul: Rapid access to CSD information

• Incorporates pre-computed libraries of bond lengths, valence angles and torsion angles, derived entirely from the CSD

• Sketch or import molecule, then click on feature of interest to view distribution, mean values and statistics

• Very fast search speeds, with hyperlinks to the CSD to view specific structures

• Complete geometry: retrieve distributions for all bonds, angles and torsions in the molecule
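The idea of judging a geometric feature against a distribution of observed values can be sketched as follows. This is not Mogul's algorithm or API: the z-score test, the threshold of 2.0 and the function names are assumptions chosen for illustration, and the reference bond lengths are made-up values, not CSD data.

```python
from statistics import mean, stdev


def z_score(observed, reference):
    """Number of sample standard deviations `observed` lies from the reference mean."""
    return (observed - mean(reference)) / stdev(reference)


def classify(observed, reference, threshold=2.0):
    """Label a feature 'unusual' if it lies more than `threshold` s.d. from the mean."""
    if abs(z_score(observed, reference)) > threshold:
        return "unusual"
    return "not unusual"


# Hypothetical bond lengths in Angstrom for one bond type (illustrative only).
reference_lengths = [1.52, 1.53, 1.54, 1.53, 1.52, 1.54, 1.53, 1.55, 1.51, 1.53]

print(classify(1.53, reference_lengths))
print(classify(1.62, reference_lengths))
```

A real validation would also consider how many observations support the distribution and whether it is multimodal, as torsion distributions often are.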

Page 18

IsoStar: A Knowledge Base of Intermolecular Interactions

• Experimental data from:

– Cambridge Structural Database

– Protein Data Bank (protein-ligand complexes only)

– Theoretical potential energy minima (DMA, IMPT)

• Interaction distributions displayed immediately as scatterplots or contour surfaces

• >20,000 CSD scatterplots, >5,500 PDB, 1,500 energy minima

Page 19

IsoStar Methodology

Search CSD or PDB for structures containing desired contact

Superimpose hits and display as scatterplots

Example: central group: -CONH2; contact group: NH
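The superposition step can be sketched as a change of coordinates: each hit's contact atom is expressed in a local frame built from three atoms of the central group, so that all hits share one origin and orientation. This is a minimal sketch, not IsoStar's implementation; the choice of three reference atoms, the function names and the coordinates are assumptions for illustration.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)


def local_frame(p0, p1, p2):
    """Orthonormal axes: x along p0->p1, z normal to the plane of the three atoms."""
    x = unit(sub(p1, p0))
    z = unit(cross(x, sub(p2, p0)))
    y = cross(z, x)
    return x, y, z


def to_local(point, p0, p1, p2):
    """Coordinates of `point` in the frame of the central group (origin at p0)."""
    x, y, z = local_frame(p0, p1, p2)
    d = sub(point, p0)
    return (dot(d, x), dot(d, y), dot(d, z))


# Two hypothetical hits: the second is the first rotated by 90 degrees about z
# and shifted by (5, 5, 5), so both should map to the same scatterplot point.
hit_a = to_local((0.5, 0.5, 2.0), (0, 0, 0), (1, 0, 0), (0, 1, 0))
hit_b = to_local((4.5, 5.5, 7.0), (5, 5, 5), (5, 6, 5), (4, 5, 5))
print(hit_a, hit_b)
```

Collecting the transformed contact positions from every hit gives the point cloud that is displayed as a scatterplot around the central group.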

Page 20

Density Maps

Can also represent distribution as density maps
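One simple way to turn a scatter of contact positions into a density map is to count points per cell of a regular grid. The sketch below assumes a cubic grid with 0.5 Å spacing and uses made-up coordinates; it is an illustration of the idea, not the IsoStar procedure.

```python
import math
from collections import Counter


def density_map(points, spacing=0.5):
    """Count points per cubic grid cell of edge `spacing`; keys are integer cell indices."""
    counts = Counter()
    for x, y, z in points:
        cell = (math.floor(x / spacing),
                math.floor(y / spacing),
                math.floor(z / spacing))
        counts[cell] += 1
    return counts


# Hypothetical contact-atom positions after superposition (illustrative only).
points = [(0.1, 0.1, 2.0), (0.2, 0.3, 2.1), (0.4, 0.4, 2.4), (1.6, 0.0, 0.2)]

grid = density_map(points)
densest_cell, count = grid.most_common(1)[0]
print(densest_cell, count)
```

Contouring the counts at a chosen level then gives a surface enclosing the most populated regions.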

Page 21

The Workshop

Part 1: Validation of models and structural analysis

• Analysing a protein structure for errors and interesting features

• Comparing a structure with structures related by homology or by functionality

Part 2: Probing the Protein-Ligand Interface

• Substructure searching in Relibase/Relibase+

• Comparing the interactions of different ligands with the same target

• Validating an unusual interaction using substructure searching in Relibase+

Page 22

How to access the workshop

Webpage: http://relibase.ccdc.cam.ac.uk/

Email address: [email protected]

Password: s1mple

Page 24

Cavity Detection

[Figure: schematic of a protein cavity lined with N and O atoms]

Based on the LIGSITE program: M. Hendlich et al., J. Mol. Graph. (1997).
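A LIGSITE-style scan can be illustrated on a toy 2D grid: an empty grid point counts as buried along a scan line if protein is met in both directions along that line, and points buried along enough lines are flagged as cavity. The published method works on a 3D grid; the 2D grid, the four scan directions, the threshold of 3 and all names below are simplifying assumptions for this sketch.

```python
# Scan lines through each grid point: x, y and the two diagonals.
DIRECTIONS = [(1, 0), (0, 1), (1, 1), (1, -1)]


def hits_protein(grid, r, c, dr, dc):
    """Walk from (r, c) along (dr, dc); True if a protein cell is met before the edge."""
    r, c = r + dr, c + dc
    while 0 <= r < len(grid) and 0 <= c < len(grid[0]):
        if grid[r][c] == "#":
            return True
        r, c = r + dr, c + dc
    return False


def buriedness(grid, r, c):
    """Number of scan lines through (r, c) that are enclosed by protein on both sides."""
    return sum(
        hits_protein(grid, r, c, dr, dc) and hits_protein(grid, r, c, -dr, -dc)
        for dr, dc in DIRECTIONS
    )


def cavity_cells(grid, min_events=3):
    """Empty cells ('.') whose buriedness reaches `min_events`."""
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, ch in enumerate(row)
        if ch == "." and buriedness(grid, r, c) >= min_events
    ]


# '#' marks protein, '.' marks empty space; a pocket opens towards the top.
grid = [
    ".......",
    ".##.##.",
    ".#...#.",
    ".#...#.",
    ".#####.",
    ".......",
]

print(cavity_cells(grid))
```

Raising `min_events` restricts the result to more deeply buried points; with the value 4 only the two bottom corners of this pocket remain.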

Page 25

The pseudo-centre concept

Coding Molecular Recognition into Simple Descriptors

[Figure: protein functional groups (NH, O, N) mapped to pseudo-centres; legend: donor, acceptor, aliphatic, pi/aromatic]

Page 26

3D Property Description

[Figure: protein and cavity, with O and NH groups labelled]

Page 27

Similarity Search

Page 28

Similarity Search

Clique detection: Bron-Kerbosch
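Clique detection for cavity comparison can be sketched with an association graph: each node pairs a pseudo-centre from one cavity with a same-type pseudo-centre from the other, and two nodes are joined when the corresponding inter-centre distances agree. The largest clique is then the largest mutually consistent match. The code below is a minimal sketch and not the CavBase implementation; the 1.0 Å tolerance, the data layout and the coordinates are assumptions for illustration.

```python
import math
from itertools import combinations


def association_graph(cavity_a, cavity_b, tol=1.0):
    """Nodes pair same-type pseudo-centres; edges join pairs with compatible distances."""
    nodes = [
        (i, j)
        for i, (type_a, _) in enumerate(cavity_a)
        for j, (type_b, _) in enumerate(cavity_b)
        if type_a == type_b
    ]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # each pseudo-centre may be matched only once
        da = math.dist(cavity_a[i][1], cavity_a[k][1])
        db = math.dist(cavity_b[j][1], cavity_b[l][1])
        if abs(da - db) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def bron_kerbosch(adj, r=frozenset(), p=None, x=None):
    """Yield all maximal cliques of `adj` (Bron-Kerbosch with pivoting)."""
    if p is None:
        p, x = set(adj), set()
    if not p and not x:
        yield r
        return
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in list(p - adj[pivot]):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p.remove(v)
        x.add(v)


# Hypothetical pseudo-centres as (type, xyz); cavity_b is cavity_a shifted by
# 10 A along x plus one unrelated donor far away.
cavity_a = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aliphatic", (0, 4, 0))]
cavity_b = [("donor", (10, 0, 0)), ("acceptor", (13, 0, 0)),
            ("aliphatic", (10, 4, 0)), ("donor", (20, 20, 20))]

graph = association_graph(cavity_a, cavity_b)
best = max(bron_kerbosch(graph), key=len)
print(sorted(best))
```

Here the best clique pairs each pseudo-centre of the first cavity with its shifted copy in the second, and the distant donor is left unmatched.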

Page 30

Similarity Analysis

Scoring based on matching pseudo-centres, and the associated surface patches

Page 31

An Example

1OXO/1F2D

• Overlay of PLP ligands

• Matching pseudo-centres and surface patches shown

Page 32

Crystal Packing

Important e.g. when docking ligands

Concanavalin A (1cjp) Binding site in Relibase+

Page 33

1mtw

reference ligand, no packing

reference in green, first-rank solution atom-coloured

Page 34

1mtw, Packing Included

reference ligand, no packing

including neighbouring chains

GOLD’s first-rank solution