An Active Knowledge Base of Structural Data and Protein-Ligand Interactions
Presenter
Presentation Notes
My name is Ishita Aloni, and I am the Product Manager for PLDB. I will give you a brief overview of the project and show you some of our features.
What Is the PLDB?
1. Central data store
– Deep integration with Maestro
– Lightweight Web interface
2. Processing pipeline tool
– Processed public PDB data & proprietary data
– Schrodinger scientific software: Protein Prep Wizard, WaterMap, SiteMap
– Custom/proprietary software
3. Geometric and annotation-based search
– Idea generation for lead optimization
– Validation of ideas and models
Presenter
Presentation Notes
What is PLDB? At its core, PLDB is an organized central storage solution designed for structural data, but flexible enough to handle almost anything you may want to archive and query. The points of access for the data store include deep integration with Maestro as well as a lightweight web interface for quick querying. The second core aspect of PLDB is its processing pipeline infrastructure. PLDB uses this infrastructure to post-process the entire PDB, leveraging Schrodinger’s scientific software tools such as Protein Prep Wizard, WaterMap, and SiteMap. Because these pipelines are completely pluggable, we also provide a straightforward way to create custom processing pipelines, so you can use your proprietary software on your proprietary data the way you want. Finally, what do we do with all of this data? We query it. Because our database is full of structures, the queries we want to run are based on geometry and structural motifs, so PLDB provides a powerful geometric and annotation-based search that can be used to generate new ideas for workflows like lead optimization, or to validate existing ideas and models.
PLDB: A Knowledge Base of Structural Data
1. Post-processed PDB database
– Crystal structures
– Density data: fo-fc and 2fo-fc maps
– RCSB metadata (deposition date, resolution, etc.)
3. Indexed and made searchable
4. Plugin architecture for easy addition of custom data types
Presenter
Presentation Notes
What exactly does the PLDB contain? It contains crystal structures, which are uploaded and stored in their original form, meaning that what you put into it is what you get out of it. We also generate density data such as fo-fc and 2fo-fc maps, and include RCSB metadata such as the deposition date and the resolution. The PLDB can also hold proprietary data such as crystal structures, NMR data, and simulation trajectories. All of these data are indexed and made searchable. We also provide a plugin architecture for simple customization, meaning you can add new data types with ease.
PLDB Geometric and Interaction Search
• The capability to search for interactions between protein and ligands
– Hydrogen bonds
– Pi-Pi interactions (face to face, face to edge)
– Halogen bonds
– Salt bridges
– Metal-ligand/protein bonds
Region, Activation Loop, P-Loop, HRD
• And much more …
Presenter
Presentation Notes
As I mentioned, the core of PLDB’s features is its powerful geometric and interaction-based search tool, which provides the ability to search for protein-ligand interactions like hydrogen bonds and pi-pi interactions. We support geometric search criteria like distances, angles, centroids, and planes. There are kinase feature annotations like the C-Helix, DFG motif, and Activation Loop. And, of course, we provide the ability to search on custom interactions and structural motifs.
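To make the geometric criteria above (distances, angles, centroids, and planes) concrete, here is a minimal Python sketch that classifies a pair of aromatic rings as face to face or face to edge. This is not PLDB code: the function names, the 5.5 Å centroid cutoff, and the 30°/60° angle cutoffs are assumptions chosen only for illustration.

```python
import math

def centroid(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_normal(points):
    """Unit normal of a ring plane, built from the centroid and the
    first two ring atoms (assumes the ring is planar)."""
    c = centroid(points)
    n = _cross(_sub(points[0], c), _sub(points[1], c))
    length = math.sqrt(sum(x * x for x in n))
    return tuple(x / length for x in n)

def plane_angle(n1, n2):
    """Angle in degrees (0 to 90) between two planes given unit normals."""
    cos_angle = min(1.0, abs(sum(a * b for a, b in zip(n1, n2))))
    return math.degrees(math.acos(cos_angle))

def classify_pi_pi(ring_a, ring_b, max_centroid_dist=5.5):
    """Return 'face-to-face', 'face-to-edge', or None.
    All cutoffs are illustrative assumptions, not PLDB defaults."""
    if math.dist(centroid(ring_a), centroid(ring_b)) > max_centroid_dist:
        return None
    angle = plane_angle(plane_normal(ring_a), plane_normal(ring_b))
    if angle <= 30.0:
        return "face-to-face"
    if angle >= 60.0:
        return "face-to-edge"
    return None

def hexagon(center, axis="z", radius=1.39):
    """Toy six-membered ring whose plane is perpendicular to `axis`."""
    pts = []
    for k in range(6):
        u = radius * math.cos(math.radians(60 * k))
        v = radius * math.sin(math.radians(60 * k))
        offset = (u, v, 0.0) if axis == "z" else (u, 0.0, v)
        pts.append(tuple(c + o for c, o in zip(center, offset)))
    return pts
```

For example, `classify_pi_pi(hexagon((0, 0, 0)), hexagon((0, 0, 3.5)))` describes two stacked rings, while rotating the second ring with `axis="y"` gives a perpendicular arrangement.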
What Sets PLDB Apart from Current Solutions?
Current Repositories: passive data retrieval
• E.g. File hierarchies of proprietary structures, RCSB
• Limited search – no geometric querying
Presenter
Presentation Notes
What is the difference between PLDB and the solutions that we currently have available for storing structural data? Our current solutions are largely passive data repositories that allow you to put structural information in and retrieve that same information. One example of this is the RCSB’s PDB, which you can use to do some rudimentary keyword-type querying, but it is limited because there is little to no customization and it does not provide a geometric search.
What Sets PLDB Apart from Current Solutions?
• Data is processed and mined for information
• Search tools give ability to ask complex questions:
– Is a given interaction seen in experimental structures?
– What kinds of groups could I add to form a given interaction?
• Goal: complement and improve intuition
PLDB: An Active Knowledge Base
Current Repositories
Presenter
Presentation Notes
The PLDB is a superset of this kind of repository. Yes, it provides an organized and centralized data store, but PLDB also lets you ask questions that a passive repository cannot answer. PLDB lets you ask whether a given interaction is seen in experimental structures. We can use the tool to figure out what groups to add to a molecule so that we can form a desired interaction. Instead of just holding onto your data and leaving the mining up to you, this tool is meant to complement and improve your intuition by helping you find patterns and trends in your structural data.
Geometric Search
• Draw in Ligand or Receptor mode
• Measure distances, angles, torsions
• Centroid, vector, and plane objects
• Waters and protein residues
• Query protein-ligand and protein-protein
< 4 Å
Search Form
• Uniprot, CATH, SCOP, Pfam
• Standard PDB Header fields
• Search domain
• Include problematic structures
• Ligand information
• Custom fields from plugins
• 2D ligand similarity search
• BLAST sequence searching
• Import from 3D workspace and SMILES
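The slide does not say how PLDB scores 2D ligand similarity. A common choice in cheminformatics is the Tanimoto coefficient over fingerprint bits, so the sketch below assumes that metric. The fingerprints are toy sets of "on" bit indices, and `similarity_search` and its 0.7 threshold are hypothetical names and values used only for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as
    collections of 'on' bit indices: |A & B| / |A | B|."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)

def similarity_search(query_fp, library, threshold=0.7):
    """Return (name, score) pairs from `library` (name -> fingerprint)
    scoring at or above `threshold`, best match first."""
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets; only the scoring and filtering logic is shown here.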
Wildcards and Operators
• Wildcard atoms: List or “Any”
• “Any aromatic ring” wildcard
• Specify aromatic or aliphatic atoms
• R-group: something attached at point must satisfy criterion
R-group
Any aromatic ring wildcard
Aliphatic vs. aromatic
Presenter
Presentation Notes
TODO: Update with ring membership
3D Visualization in Maestro
Toggle display of structural annotations
3D workspace shows geometric measurements and annotations
Ligand atoms displacing WaterMap hydration sites
Centroid of matched ring
Hydrogen Bonds
PLDB Components
Data Repository
Query Nodes
Query Processing
Query Queue
Import Queue
Import Nodes
Structure Preparation
Admin UI (Web)
PLDB Server Master Node
Annotation Scripts
Data Pipelines
Configuration
Maestro Plugin
Web Browser
PyMOL Plugin
Custom Applications
Data Input Utilities
Web Service API Clients
Presenter
Presentation Notes
Extra slide in case of
Power users will access the PLDB through the Maestro plugin. Custom applications can be built on top of the web services API to take advantage of common infrastructure. Query Nodes and Import Nodes can be scaled up depending on the complexity and volume of imports and queries. Annotation Scripts can be managed using the Admin UI. Data Pipelines can be customized, and they control the actions taken during structure import.
Import Processing Pipeline
Uploaded Structural Data
Script Annotators
Postformers
Data Repository
Import Pipeline (Transformers)
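The diagram labels above suggest a pluggable import flow in which uploaded data passes through transformers, script annotators, and postformers before reaching the data repository. The following Python sketch shows one way such a flow could be wired together. It is not the PLDB API: the `Record` and `ImportPipeline` classes, the stage ordering, and the three example stages are all hypothetical, and the exact role of each stage type is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Record:
    """Hypothetical stand-in for an uploaded structure."""
    name: str
    atoms: List[str]
    annotations: Dict[str, object] = field(default_factory=dict)

Stage = Callable[[Record], Record]

def strip_waters(rec: Record) -> Record:
    """Example transformer (assumed role: modify the structure itself)."""
    kept = [atom for atom in rec.atoms if atom != "HOH"]
    return Record(rec.name, kept, dict(rec.annotations))

def annotate_atom_count(rec: Record) -> Record:
    """Example annotator (assumed role: attach searchable metadata)."""
    rec.annotations["atom_count"] = len(rec.atoms)
    return rec

def mark_ready(rec: Record) -> Record:
    """Example postformer (assumed role: final step after annotation)."""
    rec.annotations["status"] = "ready"
    return rec

@dataclass
class ImportPipeline:
    transformers: List[Stage] = field(default_factory=list)
    annotators: List[Stage] = field(default_factory=list)
    postformers: List[Stage] = field(default_factory=list)

    def run(self, rec: Record, repository: Dict[str, Record]) -> Record:
        """Apply every stage in order, then store the result."""
        for stage in [*self.transformers, *self.annotators, *self.postformers]:
            rec = stage(rec)
        repository[rec.name] = rec
        return rec
```

Because each stage is just a callable, a custom step can be added by appending a function to the relevant list, which is the kind of pluggability the notes describe.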
PLDB Ecosystem
PLDB
3D Mol Viz
Presenter
Presentation Notes
Where does PLDB fit into the larger ecosystem of computational tools and workflows? First, we have 3D molecular visualization.
PLDB Ecosystem: 3D Molecular Visualization
Predefined standard viz: WebPyMOL
3D Mol Viz
• E.g. PyMOL, Chimera
• Dedicated tools for presentations and publications
PLDB
Presenter
Presentation Notes
These are products like PyMOL and Chimera that are used as dedicated tools for presentations and publications. PLDB overlaps with these types of tools by providing standard visualization tools like WebPyMOL.
PLDB Ecosystem
PLDB
3D Mol Viz
Scripting
Presenter
Presentation Notes
Next, we have scripting tools
PLDB Ecosystem: Scripting
Scripts can plug in to PLDB to leverage standard search & visualization
Scripting
• E.g. R/Python
PLDB
Presenter
Presentation Notes
These are things like R and Python that you can use to do analysis and data mining. While PLDB is not a scripting platform, it does provide tools like Python and Web service APIs, which allow users to use the search and visualization capabilities in their own scripts. For example, I recently used the PLDB to mine the PDB for sets of congeneric ligands bound to proteins that show significant conformational differences. This code was not within the PLDB, but I used the PLDB infrastructure to do these kinds of analyses.
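As a rough sketch of the kind of downstream script described above, the following Python groups structures by protein and flags pairs whose coordinates differ by more than an RMSD cutoff. Nothing here calls PLDB: the entry dictionaries stand in for whatever a search would return, the field names (`id`, `protein`, `coords`) and the 2.0 Å cutoff are hypothetical, and the coordinates are assumed to be pre-aligned with matching atom order.

```python
import math
from itertools import combinations

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) points. No superposition is performed."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def flexible_pairs(entries, cutoff=2.0):
    """Return (protein, id_a, id_b) for every pair of entries of the
    same protein whose RMSD exceeds `cutoff` (in angstroms)."""
    by_protein = {}
    for entry in entries:
        by_protein.setdefault(entry["protein"], []).append(entry)
    pairs = []
    for protein, group in by_protein.items():
        for a, b in combinations(group, 2):
            if rmsd(a["coords"], b["coords"]) > cutoff:
                pairs.append((protein, a["id"], b["id"]))
    return pairs
```

A real analysis would superimpose the structures first and restrict the comparison to a consistent atom selection; this sketch only shows the grouping and filtering step.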
PLDB Ecosystem
PLDB
3D Mol Viz
Scripting
Assay Data
Presenter
Presentation Notes
Next we have tools for tracking assay data
PLDB Ecosystem: Assay Data
Only as pertinent to experimental data
Assay Data
• E.g. SEURAT/LiveDesign, DotMatics, Forge, Genedata screener, Arxspan
PLDB
Presenter
Presentation Notes
Notice that there is less overlap between these two circles. Assay data is typically handled by software like LiveDesign, but if you were interested in storing experimental structures, you could do that with PLDB.
PLDB Ecosystem
PLDB
3D Mol Viz
Scripting
Assay Data
Workflow
Presenter
Presentation Notes
Next we have workflow tools
PLDB Ecosystem: Workflow
PLDB supports automated 3D structure prep and visualization workflows
Workflow
• E.g. Pipeline Pilot, KNIME
PLDB
Presenter
Presentation Notes
These are things like Pipeline Pilot and KNIME that allow you to set up nodes that data can pass through. PLDB provides this kind of processing infrastructure by means of pipelining tools, but our focus is on automated 3D structure preparation and visualization workflows.
PLDB Ecosystem
PLDB
3D Mol Viz
Scripting
Assay Data
Workflow
Lead Opt
Presenter
Presentation Notes
Lead optimization
PLDB Ecosystem: Lead Optimization
PLDB use cases include ideation of changes to support LO based on 3D crystal structures
Lead Opt
• E.g. LiveDesign, Torch, SeeSAR
PLDB
Presenter
Presentation Notes
Tools like LiveDesign, Torch, and SeeSAR
PLDB Ecosystem
PLDB
3D Mol Viz
Scripting
Assay Data
Workflow
Lead Opt
Data Analysis
Presenter
Presentation Notes
Data analysis tools
PLDB Ecosystem: Data Analysis
Lightweight analysis built-in; export from PLDB to data analysis tools for advanced analyses
Data Analysis
• E.g. Spotfire, Tableau, Graphpad prism, Excel
PLDB
Presenter
Presentation Notes
These are tools like Spotfire, Tableau, Graphpad prism, and Excel. PLDB has some lightweight data analysis tools like interactive histograms and scatter plots, but we also allow you to export from PLDB to these other data analysis tools for advanced analyses.
Additional Features
• Upload from Maestro
• Web-based admin portal
• KNIME integration for complex workflows
• Access control
• Load search from 3D workspace
• Problem flagging
• Search within / excluding existing results
• Sharing searches
• FEP/MD Trajectories in PLDB
• Ligand similarity search
• Store proprietary data
• Web GUI
Key Advantages of the PLDB
• 3D structure search tool integrated with Maestro modeling environment
• Powerful and easy-to-use search
• Simple to integrate custom structures, data, and analysis methods
• Python API for integration with other services
• Search diverse data types