Computation and visualization of protein topology graphs including ligands

DESCRIPTION

Talk by Tim Schäfer at the German Conference on Bioinformatics 2012, Jena, Germany. Ligand information is of great interest for understanding protein function. Protein structure topology can be modeled as a graph, with secondary structure elements as vertices and spatial contacts between them as edges. Meaningful 2D representations of such graphs are required for the visual inspection, comparison and analysis of protein folds, but their automatic visualization is still challenging. We present an approach that solves this task, supports several graph types and includes ligands. Our method generates a mathematically unique representation and high-quality 2D plots of the secondary structure of proteins based on a protein ligand graph. This graph is computed from 3D atom coordinates in the Protein Data Bank (PDB) and the corresponding secondary structure element (SSE) assignments of the DSSP algorithm. The Visualization of Protein-Ligand Graphs (VPLG) software enables rapid visualization of protein structures and exports graphs in various standard formats for further analysis.

GCB 2012

Computation and visualization of protein topology graphs including ligands

Tim Schäfer, Patrick May, Ina Koch

Goethe-University Frankfurt am Main

Institute for Computer Science

Department for Molecular Bioinformatics

Contents

● Introduction
    ● Protein structure, ligands and graphs

● Methods
    ● Computing protein ligand graphs from PDB and DSSP data
    ● Types of protein ligand graphs
    ● Visualization of protein ligand graphs

● Results
    ● The VPLG software
    ● Case studies

Protein structure

● Primary structure
    ● String over an alphabet of 20 amino acids (AAs)
    ● 3D: peptide bonds, steric constraints, Ramachandran plot

● Secondary structure
    ● local, H-bonds, SSE types
    ● super-secondary structure
    ● protein domains

● Tertiary structure
    ● atom coordinates

● Quaternary structure
    ● multi-chain proteins

Modeling protein structure as a graph

● Graph G = (V, E)

● Pervasive data structure in computer science and bioinformatics

● Use of graphs to model proteins

● CATH [Orengo et al.]

● SCOP [Murzin et al.]

● TOPS+ [Gilbert, Veeramalai]

● PTGL [May, Koch]

Protein ligand interactions

● Proteins interact with their environment to fulfill their biological function: other proteins, ligands, ...

● Knowledge of protein-ligand interactions (PLI) is important in many applications, e.g. drug design, molecular medicine

● > 10,000 different ligands occur in PDB files

[PDB Ligand Expo, 1AVD]

A graph model of proteins on the super-secondary structure level

● Secondary Structure Elements (SSEs)
    ● Few: 40,000 atoms => 400 residues => 40 SSEs
    ● Automated assignment from 3D data possible (e.g., DSSP)

● Protein model
    ● undirected, labeled graph for each chain of a protein

● Protein graph
    ● vertices: SSEs or ligands
    ● edges: contacts and spatial relations between them
    ● graph types: alpha-beta-graph, alpha-graph or beta-graph

[Koch 1997, May et al. 2004]

Protein ligand graphs

● Extension of protein graph model

● Protein ligand graph G = (V, E)
    ● Vertex: SSE or ligand
    ● Edge: Spatial contact (parallel, antiparallel, mixed, ligand, backbone)
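
The graph model above can be sketched as a small data structure. This is only an illustration under assumptions: the class name `ProteinLigandGraph`, its fields and methods, and the example contacts are hypothetical and are not taken from VPLG. Only the vertex types (helix, strand, ligand) and the edge labels come from the slide.

```python
from dataclasses import dataclass, field

# Labels from the slide; everything else in this sketch is hypothetical.
VERTEX_TYPES = {"H", "E", "L"}  # helix, beta strand, ligand
CONTACT_TYPES = {"parallel", "antiparallel", "mixed", "ligand", "backbone"}


@dataclass
class ProteinLigandGraph:
    """Undirected, labeled graph for a single protein chain."""
    vertex_types: list = field(default_factory=list)  # index = sequence order
    edges: dict = field(default_factory=dict)         # frozenset({i, j}) -> label

    def add_vertex(self, vertex_type):
        if vertex_type not in VERTEX_TYPES:
            raise ValueError(f"unknown vertex type: {vertex_type}")
        self.vertex_types.append(vertex_type)
        return len(self.vertex_types) - 1

    def add_edge(self, i, j, contact_type):
        if contact_type not in CONTACT_TYPES:
            raise ValueError(f"unknown contact type: {contact_type}")
        n = len(self.vertex_types)
        if i == j or not (0 <= i < n and 0 <= j < n):
            raise ValueError(f"invalid vertex pair: {i}, {j}")
        # A frozenset key makes the edge undirected: {i, j} == {j, i}.
        self.edges[frozenset((i, j))] = contact_type

    def neighbors(self, i):
        return sorted(j for pair in self.edges if i in pair
                      for j in pair if j != i)


# Example chain with vertices H E H E L; the contacts are made up.
graph = ProteinLigandGraph()
for sse_type in "HEHEL":
    graph.add_vertex(sse_type)
graph.add_edge(1, 3, "antiparallel")
graph.add_edge(3, 4, "ligand")
print(graph.neighbors(3))  # [1, 4]
```

Storing vertices in sequence order keeps the primary structure ordering available for later visualization.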

Computing protein ligand graphs

1. Obtain sequence from PDB file

SKVVVPAQGKKITLQNGKLNVPENPIIPYIEGDGIGVDVTPAM

2. Obtain secondary structure assignments from DSSP file

HHHHHHHHHH EEEEEEE HHHHHH EEEEEEE

3. Add ligand information from PDB file

SKVVVPAQGKKITLQNGKLNVPENPIIPYIEGDGIGVDVTPAMJ

HHHHHHHHHH EEEEEEE HHHHHH EEEEEEE L

4. Build graph: add a vertex for each SSE and ligand

H E H E L

5. Compute spatial contacts and add edges between SSEs
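
Steps 2 to 4 can be sketched as collapsing the per-residue assignment string into one vertex per run. This is a minimal sketch, not the VPLG implementation: the function name `sse_vertices` is hypothetical, a blank is assumed to mean "no helix or strand assigned", and each ligand character is assumed to become its own vertex. The example string is the one shown on the slide, where the original column alignment with the sequence is not preserved, so the positions are illustrative only.

```python
from itertools import groupby


def sse_vertices(sse_string):
    """Collapse a per-residue SSE string into (type, start, end) vertices."""
    vertices = []
    pos = 0
    for label, run in groupby(sse_string):
        length = sum(1 for _ in run)
        if label == "L":
            # Assumption: every ligand is a separate vertex, even if adjacent.
            vertices.extend(("L", p, p) for p in range(pos, pos + length))
        elif label != " ":
            vertices.append((label, pos, pos + length - 1))
        pos += length
    return vertices


assignment = "HHHHHHHHHH EEEEEEE HHHHHH EEEEEEE L"
vertices = sse_vertices(assignment)
print(" ".join(v[0] for v in vertices))  # H E H E L
```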

VPLG – an open source implementation in Java

Overview: Computing protein ligand graphs

Contact computation

● Parser for PDB and DSSP files

● Atom level contacts
    ● vdW radius overlap, radius 2 Å
    ● complexity for n atoms is O(n²)

● Residue level contacts
    ● collision spheres, centered on the Cα atom

● Protein-ligand contacts
    ● collision spheres, min max center

● SSE level contacts
    ● rules depending on SSE type, difficult
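
The atom level check can be sketched as an all-pairs comparison, which is where the quadratic complexity comes from. This sketch assumes that every atom is a sphere of radius 2 Å and that two atoms are in contact when their spheres overlap, i.e. when their centers are less than 4 Å apart. The function name and the coordinates are hypothetical.

```python
import math
from itertools import combinations

ATOM_RADIUS = 2.0  # Å, as stated on the slide


def atom_contacts(coords, radius=ATOM_RADIUS):
    """Return index pairs (i, j), i < j, of atoms whose spheres overlap.

    Every pair is tested once, so n atoms need n * (n - 1) / 2 distance
    computations, i.e. O(n^2).
    """
    cutoff = 2 * radius  # spheres overlap when centers are closer than this
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if math.dist(a, b) < cutoff
    ]


# Three made-up atom positions in Å.
atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(atom_contacts(atoms))  # [(0, 1)]
```

The residue level collision spheres on the slide serve as a cheaper pre-filter for this test: atom pairs only need to be compared for residues whose spheres overlap.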

Types of protein graphs

● Alpha

● Beta

● Alpha-beta

● Ligands

● Not necessarily connected
    ● Connected components

● Backbone option
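
The graph types and the connectivity remark can be sketched as a vertex filter followed by a connected component search. This is only an illustration: the function names and the example edges are hypothetical, and how ligands are treated in each graph type is an assumption here (they are kept only when "L" is in the set of requested types).

```python
def subgraph(vertex_types, edges, keep):
    """Keep vertices whose type is in `keep` and the edges between them."""
    kept = [i for i, t in enumerate(vertex_types) if t in keep]
    kept_set = set(kept)
    kept_edges = [(i, j) for i, j in edges if i in kept_set and j in kept_set]
    return kept, kept_edges


def connected_components(vertices, edges):
    """Return the connected components as sorted lists of vertex indices."""
    adjacency = {v: set() for v in vertices}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search
            v = stack.pop()
            component.append(v)
            for w in adjacency[v] - seen:
                seen.add(w)
                stack.append(w)
        components.append(sorted(component))
    return components


types = ["H", "E", "H", "E", "L"]   # vertices in sequence order
edges = [(0, 2), (1, 3), (3, 4)]    # made-up contacts

beta_vertices, beta_edges = subgraph(types, edges, keep={"E"})
all_vertices, all_edges = subgraph(types, edges, keep={"H", "E", "L"})
print(connected_components(beta_vertices, beta_edges))  # [[1, 3]]
print(connected_components(all_vertices, all_edges))    # [[0, 2], [1, 3, 4]]
```

In this made-up example the full graph with ligands splits into two components, which illustrates why a protein graph is not necessarily connected.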

The VPLG software

● Command line tool

● GUI in Java/Swing

● Screenshots

● Graph file formats

● Database support
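
As an illustration of exporting a protein graph to a standard graph file format, the sketch below writes Graph Modelling Language (GML) text. GML is chosen here only as an example; the slide does not list the formats, and the function name, labels and example edges are hypothetical rather than VPLG output.

```python
def to_gml(vertex_types, edges):
    """Serialize an undirected, labeled graph as GML text.

    `edges` is a list of (source, target, contact_label) tuples.
    """
    lines = ["graph [", "  directed 0"]
    for i, vertex_type in enumerate(vertex_types):
        lines.append(f'  node [ id {i} label "{vertex_type}" ]')
    for source, target, contact in edges:
        lines.append(
            f'  edge [ source {source} target {target} label "{contact}" ]'
        )
    lines.append("]")
    return "\n".join(lines)


gml_text = to_gml(
    ["H", "E", "H", "E", "L"],
    [(1, 3, "antiparallel"), (3, 4, "ligand")],  # made-up contacts
)
print(gml_text)
```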

Protein graphs for the whole PDB – Some statistics

● 62,364 proteins consisting of 151,025 chains

● 1.4 million helices, 1.2 million beta strands, 300k ligands

● Contacts
    ● Mixed: 710,000
    ● Parallel: 450,000
    ● Antiparallel: 870,000
    ● Total non-ligand: 2,000,000 => 0.76 contacts per SSE
    ● Ligand: 760,000 => 2.50 contacts per ligand

● High number of isolated ligands (~30 %) at first run
    ● Atom radius too small?
    ● Contact definition?
    ● Small ligands (few atoms)?
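
The per-SSE and per-ligand ratios on this slide can be approximately reproduced from the rounded counts. The sketch assumes that "per SSE" means helices plus beta strands. Because the inputs are the rounded slide figures, the results only roughly match the slide's 0.76 and 2.50.

```python
# Rounded counts as given on the slide.
helices = 1_400_000
strands = 1_200_000
ligands = 300_000
non_ligand_contacts = 2_000_000
ligand_contacts = 760_000

# Assumption: the SSE count is helices plus beta strands.
contacts_per_sse = non_ligand_contacts / (helices + strands)
contacts_per_ligand = ligand_contacts / ligands

print(f"{contacts_per_sse:.2f} contacts per SSE")
print(f"{contacts_per_ligand:.2f} contacts per ligand")
```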

Outlook and possible improvements

● VPLG software
    ● Post-processing of DSSP output (short SSEs)
    ● New spatial relation 'close to' for ligands
    ● RNA / DNA as ligands

● Backend
    ● Get Web server online
    ● Protein structure comparison (not necessarily graph-based)
    ● Use VPLG to get ligands into the PTGL

Summary

● Ligands are part of the protein graph

● Visualization based on primary structure ordering of SSEs

● VPLG software is an open source implementation
    ● Visualization of protein graphs from PDB and DSSP files
    ● http://www.bioinformatik.uni-frankfurt.de https://sourceforge.net/projects/vplg/

● Evaluation
    ● Statistics
    ● Hemoglobin as an example

Acknowledgments

● Supervisor Prof. Ina Koch

● Molecular Bioinformatics group at Goethe University Frankfurt am Main (MolBI)

Thanks for your attention!

Questions?

(Appendix slides follow)
