PowerPoint Presentation · – Halogen bonds – Salt bridges – Metal-ligand/protein bonds • Powerful geometric search criteria – Distance, angle, dihedral – Centroids, vectors,

ICDD, New Delhi, 2017

An Active Knowledge Base of Structural Data and Protein-Ligand Interactions

Presenter

Presentation Notes

My name is Ishita Aloni and I am the Product Manager for PLDB. I will be giving you a brief overview of the project and showing you some of our features.

What Is the PLDB?

1. Central data store
– Deep integration with Maestro
– Lightweight Web interface

2. Processing pipeline tool
– Processed public PDB data & proprietary data
– Schrödinger scientific software: Protein Prep Wizard, WaterMap, SiteMap
– Custom/proprietary software

3. Geometric and annotation-based search
– Idea generation for lead optimization
– Validation of ideas and models

Presenter

Presentation Notes

What is PLDB? At its core, PLDB is an organized central storage solution designed for structural data, but flexible enough to handle almost anything you may want to archive and query. The points of access for the data store include deep integration with Maestro as well as a lightweight web interface for quick querying. The second core aspect of PLDB is its processing pipeline infrastructure. PLDB uses this infrastructure to post-process the entire PDB, leveraging Schrödinger’s scientific software tools such as Protein Prep Wizard, WaterMap, and SiteMap. Because these pipelines are completely pluggable, we also provide a straightforward way to create custom processing pipelines so you can use your proprietary software on your proprietary data the way you want. Finally, what do we do with all of this data? We query it. Because our database is full of structures, the queries we want to run are based on geometry and structural motifs, so PLDB provides a powerful geometric and annotation-based search that can be used to generate new ideas for workflows like lead optimization, or to validate existing ideas and models.

PLDB: A Knowledge Base of Structural Data

1. Post-processed PDB database
– Crystal structures
– Density data: fo-fc and 2fo-fc maps
– RCSB metadata (deposition date, resolution, etc.)

2. Proprietary structural data
– Crystal structures
– NMR
– Density data
– Docking
– Models (Homology)
– Simulations (MD, FEP, WaterMap)

3. Indexed and made searchable

4. Plugin architecture for easy addition of custom data types

Presenter

Presentation Notes

What exactly does the PLDB contain? It includes crystal structures, which are uploaded and stored in their original form, meaning that what you put into it is what you get out of it. We also generate density data like fo-fc and 2fo-fc maps, and RCSB metadata like the deposition date and the resolution. The PLDB can also hold proprietary data such as crystal structures, NMR data, and simulation trajectories. All of these data are indexed and made searchable. We also provide a plugin architecture for simple customization, meaning you can add new data types with ease.

PLDB Geometric and Interaction Search

• The capability to search for interactions between protein and ligands
– Hydrogen bonds
– Pi-Pi interactions (face to face, face to edge)
– Halogen bonds
– Salt bridges
– Metal-ligand/protein bonds

• Powerful geometric search criteria
– Distance, angle, dihedral
– Centroids, vectors, planes

• Predefined Kinase features
– Gatekeeper residue, Catalytic residues, C-Helix, DFG motif, Hinge Region, Activation Loop, P-Loop, HRD

• And much more …

Presenter

Presentation Notes

As I mentioned, the core of PLDB’s features is its powerful geometric and interaction-based searching tool, which provides the ability to search for protein-ligand interactions like hydrogen bonds and pi-pi interactions. We allow searching on geometric criteria like distances, angles, centroids, and planes. There are kinase feature annotations like the C-Helix, DFG motif, and Activation Loop. And, of course, we provide the ability to search on custom interactions and structural motifs.
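To make the geometric criteria concrete, here is a minimal sketch of how distance, angle, and dihedral measurements can be computed from 3D atom coordinates. This is plain Python for illustration only and is not PLDB's implementation or API; all function names and the 4.0 Å default cutoff are assumptions made for the example.

```python
import math

# Illustrative helpers only; these names are hypothetical, not part of PLDB.

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def distance(p, q):
    """Euclidean distance between two points."""
    return _norm(_sub(p, q))

def angle(p, q, r):
    """Angle p-q-r in degrees, measured at the vertex q."""
    u, v = _sub(p, q), _sub(r, q)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to [-1, 1] to guard against floating-point drift.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def within_cutoff(p, q, cutoff=4.0):
    """True if two points are closer than `cutoff` (assumed to be in Å)."""
    return distance(p, q) < cutoff
```

A search criterion such as "donor and acceptor closer than 4 Å" then reduces to evaluating `within_cutoff` on each candidate atom pair.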

What Sets PLDB Apart from Current Solutions?

Current Repositories: passive data retrieval

• E.g. File hierarchies of proprietary structures, RCSB
• Limited search – no geometric querying

Presenter

Presentation Notes

What is the difference between PLDB and the solutions that we currently have available for storing structural data? Our current solutions are largely passive data repositories that allow you to put structural information in and retrieve that same information. One example of this is the RCSB’s PDB, which you can use to do some rudimentary keyword-type querying, but it’s limited because there’s little to no customization and it does not provide a geometric search.

What Sets PLDB Apart from Current Solutions?

• Data is processed and mined for information

• Search tools give ability to ask complex questions:
– Is a given interaction seen in experimental structures?
– What kinds of groups could I add to form a given interaction?

• Goal: complement and improve intuition

PLDB: An Active Knowledge Base

Current Repositories

Presenter

Presentation Notes

The PLDB is a superset of this kind of repository. Yes, it provides an organized and centralized data store, but PLDB also lets you ask questions that a passive repository can’t answer. PLDB lets you ask whether a given interaction is seen in experimental structures. We can use the tool to figure out what groups to add to a molecule to form a desired interaction. Instead of just holding onto your data and leaving the mining up to you, this tool is meant to complement and improve your intuition by helping you find patterns and trends in your structural data.

Geometric Search

• Draw in Ligand or Receptor mode
• Measure distances, angles, torsions
• Centroid, vector, and plane objects
• Waters and protein residues
• Query protein-ligand and protein-protein

< 4 Å

Search Form

• UniProt, CATH, SCOP, Pfam
• Standard PDB Header fields
• Search domain
• Include problematic structures
• Ligand information
• Custom fields from plugins
• 2D ligand similarity search
• BLAST sequence searching
• Import from 3D workspace and SMILES

Wildcards and Operators

• Wildcard atoms: List or “Any”
• “Any aromatic ring” wildcard
• Specify aromatic or aliphatic atoms
• R-group: something attached at point must satisfy criterion

R-group

Any aromatic ring wildcard

Aliphatic vs. aromatic

Presenter

Presentation Notes

TODO: Update with ring membership
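The wildcard-atom idea from the slide (an element list or “Any”, optionally restricted to aromatic or aliphatic atoms) can be sketched as a small matching rule. This is a hypothetical data model written for illustration; the `Atom` and `QueryAtom` classes and their fields are assumptions, not PLDB's query representation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Atom:
    """A structure atom (hypothetical minimal model)."""
    element: str
    aromatic: bool

@dataclass(frozen=True)
class QueryAtom:
    """A query atom with optional wildcard behaviour (hypothetical)."""
    elements: Optional[FrozenSet[str]] = None  # None means "Any" element
    aromatic: Optional[bool] = None            # None means aromatic or aliphatic

    def matches(self, atom: Atom) -> bool:
        # An element list restricts the match; "Any" accepts every element.
        if self.elements is not None and atom.element not in self.elements:
            return False
        # True requires an aromatic atom, False requires an aliphatic one.
        if self.aromatic is not None and atom.aromatic != self.aromatic:
            return False
        return True

# Example queries: any atom, a list wildcard, and an aromatic-only nitrogen.
any_atom = QueryAtom()
n_or_o = QueryAtom(elements=frozenset({"N", "O"}))
aromatic_n = QueryAtom(elements=frozenset({"N"}), aromatic=True)
```

Keeping `None` as the "no constraint" value means a default `QueryAtom()` behaves as the “Any” wildcard without special-casing.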

3D Visualization in Maestro

Toggle display of structural annotations

3D workspace shows geometric measurements and annotations

Ligand atoms displacing WaterMap hydration sites

Centroid of matched ring

Hydrogen Bonds
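As an illustration of the centroid and plane objects mentioned for rings, the sketch below computes a ring centroid and plane normal and uses them to label a ring pair as face to face or face to edge. It is not PLDB's algorithm: the function names, the 5.5 Å centroid distance limit, and the 30°/60° angle thresholds are all assumptions chosen for the example, and ring atoms are assumed to be listed in ring order.

```python
import math

def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def plane_normal(ring):
    """Unit normal of a (near-)planar ring given atoms in ring order."""
    c = centroid(ring)
    rel = [tuple(p[i] - c[i] for i in range(3)) for p in ring]
    nx = ny = nz = 0.0
    # Sum cross products of consecutive centroid-relative vectors.
    for a, b in zip(rel, rel[1:] + rel[:1]):
        nx += a[1] * b[2] - a[2] * b[1]
        ny += a[2] * b[0] - a[0] * b[2]
        nz += a[0] * b[1] - a[1] * b[0]
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)

def interplanar_angle(ring_a, ring_b):
    """Angle between two ring planes in degrees, in the range 0 to 90."""
    na, nb = plane_normal(ring_a), plane_normal(ring_b)
    cos_t = abs(sum(x * y for x, y in zip(na, nb)))
    return math.degrees(math.acos(min(1.0, cos_t)))

def classify_stacking(ring_a, ring_b, max_centroid_dist=5.5):
    """Label a ring pair; thresholds are illustrative assumptions."""
    if math.dist(centroid(ring_a), centroid(ring_b)) > max_centroid_dist:
        return None
    ang = interplanar_angle(ring_a, ring_b)
    if ang < 30.0:
        return "face-to-face"
    if ang > 60.0:
        return "face-to-edge"
    return "intermediate"

def hexagon(radius=1.39, offset=(0.0, 0.0, 0.0), in_xz_plane=False):
    """Build an idealized six-membered ring for the example."""
    pts = []
    for k in range(6):
        c = radius * math.cos(math.radians(60 * k))
        s = radius * math.sin(math.radians(60 * k))
        x, y, z = (c, 0.0, s) if in_xz_plane else (c, s, 0.0)
        pts.append((x + offset[0], y + offset[1], z + offset[2]))
    return pts
```

For example, `classify_stacking(hexagon(), hexagon(offset=(0.0, 0.0, 3.5)))` compares two parallel rings stacked 3.5 Å apart.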

PLDB Components

Data Repository

Query Nodes

Query Processing

Query Queue

Import Queue

Import Nodes

Structure Preparation

Admin UI (Web)

PLDB Server Master Node

Annotation Scripts

Data Pipelines

Configuration

Maestro Plugin

Web Browser

PyMOL Plugin

Custom Applications

Data Input Utilities

Web Service API Clients

Presenter

Presentation Notes

Extra slide in case of … Power users will access the PLDB through the Maestro plugin. Custom applications can be built on top of the web services API to take advantage of common infrastructure. Query Nodes and Import Nodes can be scaled up depending on the complexity and volume of imports and queries. Annotation Scripts can be managed using the Admin UI. Data Pipelines can be customized, and they control the actions taken during structure import.

Import Processing Pipeline

Uploaded Structural Data

Script Annotators

Postformers

Data Repository

Import Pipeline (Transformers)
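The pluggable import pipeline named on this slide can be pictured as a chain of small stages applied to each uploaded record. The sketch below is only one possible reading: the stage order (transformers, then script annotators, then postformers, then storage), the dictionary record format, and every function name are assumptions, since the slide gives only the component labels.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Stage = Callable[[Record], Record]

def run_import(record: Record,
               transformers: List[Stage],
               annotators: List[Stage],
               postformers: List[Stage],
               repository: List[Record]) -> Record:
    """Run one record through all stages, then store it (assumed order)."""
    for stage in [*transformers, *annotators, *postformers]:
        record = stage(record)
    repository.append(record)
    return record

# Hypothetical stages; real stages would call structure-prep or analysis tools.
def normalize_title(record: Record) -> Record:
    return {**record, "title": record["title"].strip().upper()}

def annotate_atom_count(record: Record) -> Record:
    return {**record, "n_atoms": len(record["atoms"])}

def mark_processed(record: Record) -> Record:
    return {**record, "processed": True}

repository: List[Record] = []
stored = run_import(
    {"title": "  example kinase ", "atoms": ["N", "CA", "C", "O"]},
    transformers=[normalize_title],
    annotators=[annotate_atom_count],
    postformers=[mark_processed],
    repository=repository,
)
```

Because each stage is just a function from record to record, adding custom or proprietary processing amounts to appending another function to one of the lists.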

PLDB Ecosystem

PLDB

3D Mol Viz

Presenter

Presentation Notes

Where does PLDB fit into the larger ecosystem of computational tools and workflows? First, we have 3D molecular visualization

PLDB Ecosystem: 3D Molecular Visualization

Predefined standard viz: WebPyMOL

3D Mol Viz
• E.g. PyMOL, Chimera
• Dedicated tools for presentations and publications

PLDB

Presenter

Presentation Notes

Products like PyMOL and Chimera are used as dedicated tools for presentations and publications. PLDB overlaps with these types of tools by providing standard visualization tools like WebPyMOL.

PLDB Ecosystem

PLDB

3D Mol Viz

Scripting

Presenter

Presentation Notes

Next, we have scripting tools

PLDB Ecosystem: Scripting

Scripts can plug in to PLDB to leverage standard search & visualization

Scripting
• E.g. R/Python

PLDB

Presenter

Presentation Notes

These are things like R and Python that you can use to do analysis and data mining. While PLDB is not a scripting platform, it does provide tools like Python and Web service APIs, which allow users to use the search and visualization capabilities in their own scripts. For example, I recently used the PLDB to mine the PDB for sets of congeneric ligands with proteins that have significant conformational differences. This code wasn’t within the PLDB, but I used the PLDB infrastructure to do these kinds of analyses.

PLDB Ecosystem

PLDB

3D Mol Viz

Scripting

Assay Data

Presenter

Presentation Notes

Next we have tools for tracking assay data

PLDB Ecosystem: Assay Data

Only as pertinent to experimental data

Assay Data
• E.g. SEURAT/LiveDesign, DotMatics, Forge, Genedata screener, Arxspan

PLDB

Presenter

Presentation Notes

Notice that there is less overlap between these two circles. Assay data is typically handled by software like LiveDesign, but if you were interested in storing experimental structures, you could do that with PLDB.

PLDB Ecosystem

PLDB

3D Mol Viz

Scripting

Assay Data

Workflow

Presenter

Presentation Notes

Next we have workflow tools

PLDB Ecosystem: Workflow

PLDB supports automated 3D structure prep and visualization workflows

Workflow
• E.g. Pipeline Pilot, KNIME

PLDB

Presenter

Presentation Notes

These are things like Pipeline Pilot and KNIME that allow you to set up nodes that data can pass through. PLDB provides this kind of processing infrastructure by means of pipelining tools, but our focus is on automated 3D structure preparation and visualization workflows.

PLDB Ecosystem

PLDB

3D Mol Viz

Scripting

Assay Data

Workflow

Lead Opt

Presenter

Presentation Notes

Lead optimization

PLDB Ecosystem: Lead Optimization

PLDB use cases include ideation of changes to support LO based on 3D crystal structures

Lead Opt
• E.g. LiveDesign, Torch, SeeSAR

PLDB

Presenter

Presentation Notes

Tools like LiveDesign, Torch, and SeeSAR

PLDB Ecosystem

PLDB

3D Mol Viz

Scripting

Assay Data

Workflow

Lead Opt

Data Analysis

Presenter

Presentation Notes

Data analysis tools

PLDB Ecosystem: Data Analysis

Lightweight analysis built-in; export from PLDB to data analysis tools for advanced analyses

Data Analysis
• E.g. Spotfire, Tableau, GraphPad Prism, Excel

PLDB

Presenter

Presentation Notes

These are tools like Spotfire, Tableau, GraphPad Prism, and Excel. PLDB has some lightweight data analysis tools like interactive histograms and scatter plots, but we allow you to export from PLDB to these other data analysis tools for advanced analyses.

Additional Features

• Upload from Maestro
• Web-based admin portal
• KNIME integration for complex workflows
• Access control
• Load search from 3D workspace
• Problem flagging
• Search within / excluding existing results
• Sharing searches
• FEP/MD Trajectories in PLDB
• Ligand similarity search
• Store proprietary data
• Web GUI

Key Advantages of the PLDB

• 3D structure search tool integrated with Maestro modeling environment
• Powerful and easy-to-use search
• Simple to integrate custom structures, data, and analysis methods
• Python API for integration with other services
• Search diverse data types
– PDB structures
– In-house structures
– Docking results
– Molecular dynamics/FEP trajectories
– Flexibility to add more…

No Installation Required

• Client now comes pre-installed within Maestro

• Find PLDB Search… in the Structure Analysis section of the Tasks menu (or search for PLDB).

• Tutorial available in Maestro Help menu
