BMC Bioinformatics

Methodology article (Open Access)

IDSS: deformation invariant signatures for molecular shape comparison

Yu-Shen Liu*1, Yi Fang1 and Karthik Ramani1,2

Address: 1School of Mechanical Engineering, Purdue University, West Lafayette, IN, 47907, USA and 2School of Electrical Computer Engineering (by courtesy), Purdue University, West Lafayette, IN, 47907, USA

E-mail: Yu-Shen Liu* - [email protected]; Yi Fang - [email protected]; Karthik Ramani - [email protected]
*Corresponding author

Received: 8 March 2009
Accepted: 22 May 2009
Published: 22 May 2009

BMC Bioinformatics 2009, 10:157 doi: 10.1186/1471-2105-10-157

This article is available from: http://www.biomedcentral.com/1471-2105/10/157

© 2009 Liu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: Many molecules of interest are flexible and undergo significant shape deformation as part of their function, but most existing methods of molecular shape comparison (MSC) treat them as rigid bodies, which may lead to incorrect measures of the shape similarity of flexible molecules.

Results: To address this issue we introduce a new shape descriptor, called Inner Distance Shape Signature (IDSS), for describing the 3D shapes of flexible molecules. The inner distance is defined as the length of the shortest path between landmark points within the molecular shape, and it reflects the molecular structure and deformation well without explicit decomposition. Our IDSS is stored as a histogram, which is a probability distribution of inner distances between all sample point pairs on the molecular surface. We show that IDSS is insensitive to shape deformation of flexible molecules and more effective at capturing molecular structures than traditional shape descriptors. Our approach reduces the 3D shape comparison problem of flexible molecules to the comparison of IDSS histograms.

Conclusion: The proposed algorithm is robust and does not require any prior knowledge of the flexible regions. We demonstrate the effectiveness of IDSS within a molecular search engine application for a benchmark containing abundant conformational changes of molecules. Several thousand such comparisons can be carried out per second. The presented IDSS method can be considered an alternative and complementary tool to the existing methods for rigid MSC. The binary executable program for the Windows platform and the database are available from https://engineering.purdue.edu/PRECISE/IDSS.

Background

Molecular shape comparison (MSC) has been playing an increasingly important role in computer aided molecular design, rational drug design, molecular docking and function prediction. The goal of MSC is to find the spatial properties common to two or more molecules. Especially in computer aided drug design, a critical problem of virtual screening, aimed at identifying the drug-like molecules likely to have beneficial biological properties, is comparing molecular shapes. An alternative virtual screening technique consists of searching a molecular database for compounds that most closely resemble a given query molecule [1-4]. The underlying assumption is that the molecules similar to the active query molecule are likely to share similar properties. This similarity can be in terms of molecular geometrical
shapes or descriptors. A number of previous studies have addressed the shape comparison of molecules [1,2,5-10]. Most existing MSC methods are only effective for comparing 3D rigid objects and cannot handle the deformed shapes of flexible objects well. Nevertheless, many molecules of interest are flexible and undergo significant shape deformation as part of their function. When flexible molecules in different conformations are compared to each other as rigid bodies, strong shape similarities might be missed. To address this issue, we developed a new method for comparing molecular shapes that, unlike previous rigid methods, is insensitive to molecular shape deformation.

Methods of molecular shape comparison

The molecular shape has been widely acknowledged as a key factor for biological activity, and it is directly related to the design of selective ligands for protein and DNA binding. To exploit the shape similarity of molecules in shape-based molecular design, a useful tool is MSC, which compares the shapes of two or more molecules and identifies common spatial features [11,12]. Such comparison can lead to alternative models in the process of drug design. An additional advantage of MSC is that no specification of chemical structure is made, and therefore molecules with similar shapes but different chemical structures can be found [1,2]. However, efficient MSC is currently a challenge [1,2,11,12] due to the high complexity of 3D molecular shapes.

Ballester et al. [1,2] divided the MSC methods into two categories: superposition methods and descriptor (or signature) methods. The former rely on finding an optimal superposition of molecules, and the latter (i.e. non-superposition methods) are independent of molecular orientation and position.

Superposition MSC

The superposition methods are a popular family of MSC methods based on the optimal superposition/alignment of two or more molecules. An early superposition method was developed by Meyer and Richards [13] to measure the similarity of molecular shape. Masek et al. [14] compared molecular shapes by optimizing the intersection of molecular surfaces. ROCS (Rapid Overlay of Chemical Structures) is an available superposition method [15] that performs shape-based overlays of two molecules by a local optimization process. The algorithm is based on the earlier implementations of molecular shape comparison described by Masek et al. [16], and it quickly finds and quantifies the maximum overlap of the volume of two molecules [11,12]. Rush et al. [17] described a shape-based 3D scaffold hopping method, which is an application of ROCS to a bacterial protein-protein interaction. Recently, Natarajan et al. [18] compared rigid components of molecules by segmenting their surfaces based on Morse theory. The superposition MSC methods require an a priori superposition/alignment of molecular shapes into a coordinate system, which is difficult to achieve robustly. The reader may consult Refs. [1,2] for a review of many available methods of superposition MSC.

Descriptor/signature MSC

Another category of shape comparison methods uses a descriptor/signature to represent the shape of a molecule. These methods are non-superposition methods: they compute the similarity score by comparing the corresponding descriptors of two molecular shapes. A 3D shape descriptor, also called a signature, is a compact representation of some essence of the shape. The shape descriptor is usually used as an index in a database of shapes and enables fast queries and retrieval. The descriptor methods are simpler than the traditional superposition methods, which require shape superposition/alignment, feature correspondence, or model fitting [6,7]. An early molecular shape description was developed by Bemis et al. [19] by considering each molecule as a collection of its 3-atom submolecules. Nilakantan et al. [20] also introduced a method for the rapid quantitative shape match between two molecules, or between a molecule and a template, using atom triplets as descriptors.

Several recent works on molecular shape comparison using shape descriptors have been developed, including the shape distribution descriptor, the spherical harmonic signature, the 3D Zernike descriptor, etc. [3,5-8,21-25]. These descriptors are invariant to rigid-body transformation, and they are effective for matching rigid objects. Nevertheless, none of these methods is deformation invariant, and they cannot support flexible molecular shape comparison.

Deformation invariant representation of nonrigid or flexible shapes, such as articulated objects, is a challenging problem in the field of shape analysis. Several recent works focus on this problem [26-30]. One class of approaches focuses on topology or graph comparison for determining the deformation [27], but the graph extraction process is often very sensitive to local shape changes. Furthermore, graph comparison cost increases proportionally with the graph size, resulting in relatively slow comparison and retrieval times. In [26], Elad and Kimmel presented a bending invariant representation of surfaces based on multidimensional scaling, but the geodesic distance is sensitive to shape changes [31] and therefore it is not appropriate for protein comparison. Jain et al. [28] presented a spectral approach to shape-based retrieval of deformable 3D models, but this method is not appropriate for protein models with many holes. Recently, Gal et al. [29] proposed the local diameter shape signature, computed from the distance from the surface to the medial axis. Other methods take into account local features on the boundary surface of the shape in the neighborhood of points [30]. Usually, these local techniques are based on matching local descriptors. However, they often do not perform well on global shape matching because, owing to their local nature, they do not provide a good signature of the overall shape [29]. These existing descriptors cannot perform well for flexible molecules due to their complex shape deformation.

Distance signatures

In 3D shape retrieval, the simplest and most widely used shape signature is the distance signature between sampled point pairs on shape surfaces. Our work also belongs to this category. We introduce three representative distance signatures: Euclidean distance (ED), geodesic distance (GD) and inner distance (ID). The ED signature [7] is usually represented by a histogram of distance values and is formed in three steps: 1) sampling uniformly random points from the shape surface, 2) computing the ED between the sampled point pairs, and 3) building the histogram of the corresponding distance values. Once the signature histograms have been computed, the similarity score of two shapes is defined as the distance between their histograms. A histogram is in effect a one-dimensional vector. However, the ED histogram between pairs of points on a shape surface is sensitive to shape deformation. An alternative distance signature replaces ED by geodesic distance (GD) [26]. The GD between any pair of points on a surface is defined as the length of the shortest path on the surface between them. Since the GD is invariant to surface bending, the stretched surface forms a bending invariant signature of the original surface.

Although the GD is insensitive to surface stretch, it is sensitive to shape deformation, as shown by [31]. The GD therefore does not work well for our purpose.
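The three steps of the ED signature described above can be illustrated with a minimal Python sketch. It assumes the surface points have already been sampled; the function names `ed_signature` and `l1_distance`, the bin count, the fixed `max_dist` normalization and the choice of an L1 histogram distance are all illustrative assumptions, not details taken from [7].

```python
import math
from itertools import combinations

def ed_signature(points, bins=8, max_dist=2.0):
    """Normalized histogram of Euclidean distances over all point pairs.

    `points` is a list of already-sampled surface points (assumption:
    sampling has been done elsewhere). Distances are binned on [0, max_dist].
    """
    hist = [0] * bins
    for p, q in combinations(points, 2):
        d = math.dist(p, q)
        # Clamp so that d == max_dist (or larger) falls into the last bin.
        k = min(int(d / max_dist * bins), bins - 1)
        hist[k] += 1
    total = sum(hist)
    return [h / total for h in hist]

def l1_distance(h1, h2):
    """L1 distance between two histograms (one possible dissimilarity score)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Toy example: the corners of a unit square and a translated copy.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
shifted = [(x + 5, y - 2, z + 1) for x, y, z in square]
sig = ed_signature(square)
sig_shifted = ed_signature(shifted)
```

Because the signature depends only on pairwise distances, the translated copy yields the same histogram, which is the rigid-body invariance mentioned in the text; an articulated deformation, by contrast, would change the pairwise Euclidean distances and hence the histogram.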

The work most related to ours is [31], in which a novel 2D inner distance measurement is presented for building 2D shape signatures. The ID signature is robust to articulated deformation and is more effective at capturing shape structures than both ED and GD. In the 2D case, the ID is defined as the length of the shortest path between landmark points within the 2D silhouette. However, owing to the complexity of 3D shapes, no algorithm for ID computation of 3D shapes has been given so far.

Results

Overview of approach

Here we introduce a new technique, called Inner Distance Shape Signature (IDSS), for describing the 3D shapes of flexible molecules. Our work can be regarded as an extension of inner distance from 2D to 3D for computing the deformation invariant shape signatures of flexible molecules. The procedure for computing the IDSS of a molecule is as follows. First, we obtain a set of points sampled uniformly from a molecular surface using Lloyd's algorithm of k-means clustering. Then a new algorithm is presented for checking the inside visibility between sample point pairs; based on their inside visibility, we define a graph and compute the inner distances using a shortest path algorithm in the graph. Finally, we build a signature of inner distances for measuring the global geometric properties of the molecule. The core procedure can be divided into three steps: sampling, calculating inner distance and building signatures (see Figure 1). These techniques have been implemented in a software package called the IDSS program.

Figure 1. Flowchart of IDSS. Given a molecular shape, the three independent steps are sampling (red points), calculating the inner distance (green line segments) between all sample point pairs, and building the signature (blue histogram). Here the input shape is the volumetric data with the simulated 8 Å resolution density map for GroEL (PDB code: 1aon). (Flowchart stages: input model, sampling, sample set, calculating, inner distance, building, signature.)

Figure 2 illustrates the comparison between IDSS and rigid methods. The source molecule comes from Drosophila sp. (PDB code 2spc, chain A), where 2spcA has one long helix and one short helix. The four artificial molecules (A, B, C and D in Figure 2) are formed by fixing the long helix of 2spcA and rotating the short helix by about 10, 45, 90 and 120 degrees around the x-axis, respectively. The four input artificial molecules have the same main chain orientation but different surface shapes, and the IDSS is computed from the surface shapes of the molecules. Note that our inner distance signatures remain largely consistent for the four deformed molecular shapes of the same protein, while the previous rigid descriptor [7] is strongly sensitive to shape deformation.
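The sampling step in the overview can be sketched as follows. This is a minimal Python illustration of Lloyd's k-means iteration on a set of boundary points, not the authors' C++ implementation: the function name `lloyd_sample`, the random initialization, the iteration cap, and the final step of snapping each centroid back to the nearest boundary point are all assumptions made for the example.

```python
import math
import random

def lloyd_sample(points, m, iters=20, seed=0):
    """Pick m roughly evenly spread samples from `points` via Lloyd's algorithm.

    points: list of coordinate tuples (e.g. boundary voxels).
    Returns m points taken from `points` (centroids snapped to the input set).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, m)          # assumed initialization
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(m)]
        for p in points:
            k = min(range(m), key=lambda i: math.dist(p, centers[i]))
            clusters[k].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for k, cluster in enumerate(clusters):
            if not cluster:                  # keep an empty cluster's center
                new_centers.append(centers[k])
                continue
            mean = tuple(sum(c) / len(cluster) for c in zip(*cluster))
            new_centers.append(mean)
        if new_centers == centers:           # converged
            break
        centers = new_centers
    # Centroids generally lie off the surface, so snap them to input points.
    return [min(points, key=lambda p: math.dist(p, c)) for c in centers]

# Toy example: 60 points on a circle, reduced to 8 samples.
ring = [(math.cos(2 * math.pi * i / 60), math.sin(2 * math.pi * i / 60), 0.0)
        for i in range(60)]
samples = lloyd_sample(ring, 8)
```

This brute-force version costs O(|points| · m) per iteration, which is adequate for a sketch; a production implementation would likely use a spatial index.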

Definition of inner distance

First, we extend the definition of inner distance (ID) for 2D objects [31] to 3D shapes. Let O be a 3D shape given as a connected and closed subset of R^3. We denote the boundary surface of O by ∂O. Given two points x, y ∈ ∂O, the ID between x and y, denoted as d(x, y; O), is defined as the length of the shortest path connecting x and y within O.

Figure 2. Our inner distance (ID) signature compared with the Euclidean distance (ED) signature from [7]. The first row shows the four input artificial proteins with the same main chain orientation but different molecular shapes. The second row shows the ID and ED signatures. In each plot, the vertical axis represents the distance distribution. Note that ID is not sensitive to shape deformation, so the four signatures are almost consistent; in contrast, ED is strongly sensitive to deformation.

Figure 3. Illustration of the definition of the inner distance. The red dashed lines denote shortest paths within the shape boundary surface that connect two landmark points x and y. The right object B is an articulated deformation of the left one A, and the relative change of the inner distances between corresponding pairs of points (e.g. x and y) during articulated deformation is small.

Figure 3 illustrates the ID definition, where the red dashed lines denote the ID paths between two landmark points x and y. Note that the object B is an articulated deformation of the object A. In contrast, the Euclidean distance (ED), defined as the length of the line segment between two landmark points (x and y), does not consider whether the line segment crosses the shape boundaries. Intuitively, this example shows that the ID is insensitive to articulated deformation, while the ED does not have this property. The significant advantage of ID is that it reflects shape structure and articulated deformation without explicitly decomposing the shape into parts. Note that there may be multiple shortest paths in rare cases, and we arbitrarily choose one. We are interested in 3D shapes defined by their boundaries, hence only boundary points are used as sampling points. ID reduces to ED when O is convex.
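To make the definition concrete, here is a minimal Python sketch that approximates inner distances on a lattice object by building a visibility graph over sample points and running Dijkstra's algorithm. It is only an illustration under stated assumptions: the visibility test below (stepping along the segment and rounding to the nearest lattice point) is a simple stand-in and not the inside-visibility algorithm of the paper, and the names `visible` and `inner_distances` are hypothetical. The toy shapes are 2D for readability; the code works unchanged on 3D tuples.

```python
import heapq
import math

def visible(p, q, obj, step=0.25):
    """Approximate test: does the segment p-q stay inside the lattice set obj?"""
    n = max(1, int(math.dist(p, q) / step))
    for i in range(n + 1):
        t = i / n
        pt = tuple(round(a + t * (b - a)) for a, b in zip(p, q))
        if pt not in obj:
            return False
    return True

def inner_distances(samples, obj):
    """All-pairs shortest path lengths in the visibility graph of `samples`."""
    m = len(samples)
    adj = [[] for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            if visible(samples[i], samples[j], obj):
                w = math.dist(samples[i], samples[j])
                adj[i].append((j, w))
                adj[j].append((i, w))
    result = []
    for s in range(m):                      # Dijkstra from every sample
        dist = [math.inf] * m
        dist[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        result.append(dist)
    return result

# An L-shaped (non-convex) object and a filled (convex) square.
l_shape = {(x, 0) for x in range(5)} | {(0, y) for y in range(5)}
square = {(x, y) for x in range(5) for y in range(5)}
samples = [(4, 0), (0, 0), (0, 4)]
id_l = inner_distances(samples, l_shape)
id_sq = inner_distances(samples, square)
ed = math.dist(samples[0], samples[2])
```

In the L-shaped object the two end points cannot see each other, so the approximated inner distance follows the corner (4 + 4 = 8) and exceeds the Euclidean distance; in the filled square the direct segment lies inside the object and the two distances coincide, in line with the remark that ID reduces to ED for convex shapes. Pairs with no connecting path in the graph are reported as infinity.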

Articulated deformation and hinge-bending movement

Ling and Jacobs [31] have proven that the ID is insensitive to articulated shape deformation by decomposing the shape into rigid parts connected by junctions. An articulated shape O satisfies the following conditions: 1) O can be decomposed into several parts that are connected by junctions (or hinges); 2) the junctions between parts are very small compared to the parts they connect; 3) the articulation of O as a transformation is rigid when limited to any part but can be non-rigid at the junctions. The relative ID change is very small for articulated objects, so ID is insensitive to articulations. Molecules are flexible and can be regarded as articulated shapes. Many molecules contain flexible structures such as loops and hinge domains. Some recent studies demonstrated that the activity of many molecules induces conformational transitions by hinge-bending, which involves the movement of relatively rigid parts of a molecule about flexible joints [32-34]. In hinge-bending, parts of the molecule rotate with respect to each other as relatively rigid bodies about a common hinge. The hinge-bending of molecules can be treated as a special articulated shape deformation. In Figure 2, each molecule contains two domains (two red helices) that are rigid regions and one hinge (green loop) that is a flexible region.

Data set of molecules

A molecule is represented by a set of overlapping spherical atoms. The exposed surface of these spheres represents a molecular surface that defines the boundary of a single molecule's volume. In this paper, we consider the input data as a volumetric/voxelized representation of molecular shape. There have been numerous works on this representation, such as for binding site determination [35], molecular shape comparison [8], cryo-electron microscopy (cryo-EM) data [36,37], and 3D shape searching [38]. We consider a volumetric model as a uniform 3D lattice consisting of object points O and background points $\bar{O}$. We denote the 3 × 3 × 3 neighborhood of each lattice point x by N(x), the set of 26 lattice points (other than x) that share a common grid edge, face, or cell with x. The boundary surface of O is defined as

\[ \partial O = \{\, x \mid x \in O \ \text{and}\ N(x) \cap \bar{O} \neq \emptyset \,\}. \tag{1} \]

Figure 4 shows all boundary points of ∂O colored in light gray.
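Equation (1) translates almost directly into code. The following Python sketch assumes the object is stored as a set of integer lattice coordinates and treats every lattice point outside that set as background; the function name `boundary_points` and the set-based representation are choices made for this illustration.

```python
from itertools import product

# The 26 offsets of the 3 x 3 x 3 neighborhood N(x), excluding x itself.
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

def boundary_points(obj):
    """Return the object points that have at least one background neighbor.

    obj: set of integer (x, y, z) lattice points; anything not in obj is
    treated as background (assumption of this sketch).
    """
    boundary = set()
    for (x, y, z) in obj:
        if any((x + dx, y + dy, z + dz) not in obj for dx, dy, dz in OFFSETS):
            boundary.add((x, y, z))
    return boundary

# Toy example: a solid 3 x 3 x 3 cube has 26 boundary voxels and 1 interior voxel.
cube = set(product(range(3), repeat=3))
shell = boundary_points(cube)
```

Only the center voxel (1, 1, 1) has all 26 neighbors inside the cube, so it is the single point excluded from the boundary.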

In the preprocessing stage, the molecular shape is built. The molecular shape in the MRC file format is used directly as the default input of our program. The MRC volumetric data can be generated in the way described by [8]. First, the MSROLL [39] program in the Molecular Surface Package is used to compute the Connolly surface (triangle mesh) of the molecule using default parameters. Next, the triangle mesh is placed in a 3D cubic grid of n^3 points (such as n = 64), compactly fitting the molecule to the grid. Each lattice point is assigned either 1 or 0: 1 for object points O and 0 for background points $\bar{O}$. Alternatively, users may use existing software packages, such as Chimera [40] and EMAN [37], to obtain the MRC files of molecular models.

We have implemented the technique presented in the previous section and tested it on a set of molecules. The algorithm described above is implemented in C++. To show the ability of the IDSS to approximate molecular shapes, we first select a few complicated examples and visualize their IDSS. To demonstrate the utility of deformation invariant signatures, we develop a shape search system for flexible molecules and test this system on a benchmark containing abundant conformational changes of molecules.

Examples of simulated data

The ability of inner distance to represent deformation invariant shape signatures of flexible molecules is first tested on some unrelated proteins (PDB codes: 1ctr, 1b7t, 1irk and 2btv). These structures were previously used in the assessment of structure recognition in cryo-EM [41]. The four protein models with the simulated 8 Å resolution density maps are shown in Figure 5. The computed inner distances approximate the global shapes of the four proteins well. Note how the inner distances capture the holes for 1b7t and 1irk. The apparent differences between the global surface shapes are also reflected by the distinctive inner distance signatures shown in Figure 6.

Figure 4. Illustration of computing the inner distance for the protein (PDB code: 1aon). All boundary points of the shape are colored in light gray. Left, the shape with 500 uniform sample points (red) and their inner distances (green). Right, a detail of the middle of the graph. Note that the inner distances capture the shape.

Articulated deformation insensitivity

As described above, the most attractive advantage of IDSS is its deformation insensitivity for 3D articulated shapes, in contrast with the traditional rigid descriptors. In addition, the IDSS also captures some global geometric properties that are scale, translation and rotation invariant. However, in practice the IDSSs of different deformed shapes of the same protein are not exactly identical. This error has two causes. The first is that the molecular surface shape O is discretized into the volumetric format, and only m sample points on the boundary surface ∂O of O are used for approximating the global inner distances. A smaller m does not sufficiently approximate ∂O, while a larger m requires more computation time and space. In our implementation, we typically choose m = 500 to obtain both a small approximation error and a short computation time.

The second cause is that the size of the loop and hinge regions of deformations affects the IDSS computation. Intuitively, the smaller the loop and hinge change compared to the overall size of the molecular shape, the smaller the inner distance changes. Figure 7 shows an example of the articulated deformation insensitivity of the IDSS. Here, the two molecules used are two conformations of the same protein (PDB codes: 1j5nA and 1lwmA), and the relative change of the loop (green) at the top left is large. This results in some errors in the IDSS, but the two IDSSs are still very close, with similar histograms. In contrast, the traditional rigid descriptors fail in deformation detection (see the Euclidean distance signature in this figure).

A search system of flexible molecules

To assess the efficacy of the proposed signature, we have incorporated the new method into a system of molecular shape comparison. We have chosen to test our method on a benchmark set of molecules found in the Database of Macromolecular Movements (MolMovDB) [42]. MolMovDB presents a diverse set of molecules that display large conformational changes in proteins and other macromolecules, which can be found at: http://www.molmovdb.org/. The benchmark data set is classified into 214 groups with a total of 2,695 PDB files, where each file is named with the corresponding group ID. The number of conformations may differ between groups. This benchmark has been used in protein structure prediction and hinge prediction [32,43]. The developed search system for flexible molecules provides a tool with which users can retrieve molecules from the benchmark based on their shape attributes. In our current program, the user selects a query molecule from the database and the program computes the similarity scores for all molecules in the database using the methods described in this paper. The program then shows the query molecule and the similar molecules in the database.

Figure 5. Four test protein models (A-D) with 500 sample points each. (A) 1ctr. (B) 1b7t. (C) 1irk. (D) 2btv. Column 1 shows the isosurfaces with the simulated 8 Å resolution density maps for the four models. Column 2 shows the uniform sample points, while column 3 shows the paths of inner distances. Points on boundary surfaces are colored in gray, sample points are colored in red, and the edges of inner distance are colored in green.

Figure 6. The inner distance signatures of the four models (panels: 1CTR, 1B7T, 1IRK, 2BTV).

Figure 8 shows the framework and its visual appearance. In the interface of our program, the query molecule is displayed on the left and the retrieved results, including the Group ID, are shown in the right dialog box. The example in Figure 8 shows a query molecule from Group 1 and some retrieved molecules. Notably, our database contains only four molecules in Group 1, and our method finds all of them in the first four retrieved results although they have different deformations. Note that the current page in the retrieved results only shows the 15 most related results. To see more results, users can click the button "Next Page" in the dialog box, and the other groups will appear on the next page. Here, we renamed each group "Group + a unique number". If users want to learn more details of the query protein, they can click the button "Link Website" in our program to open the corresponding website in the MolMovDB database and see how the protein deformation works.

For the MolMovDB benchmark we have pre-calculated the inner distance signatures of all molecules in the database, and we display molecules using images in the dialog box of retrieved results. The inner distance signatures allow rapid search on the system because a molecular shape is compactly represented by a 1D vector. If a query molecule has already been transformed into its inner distance signature, a search of the current benchmark data set takes less than half a second.
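Because each molecule is reduced to a 1D histogram, a database query amounts to one vector comparison per stored signature. The sketch below illustrates this in Python; the L1 histogram distance, the function name `rank_by_similarity`, and the entry names and histogram values in the toy database are hypothetical and chosen only for illustration.

```python
def rank_by_similarity(query_sig, database):
    """Rank database entries by increasing L1 distance to the query signature.

    database: dict mapping an entry name to its pre-computed signature
    (a list of histogram bin values of the same length as query_sig).
    Returns a list of (distance, name) pairs, most similar first.
    """
    scored = []
    for name, sig in database.items():
        dist = sum(abs(a - b) for a, b in zip(query_sig, sig))
        scored.append((dist, name))
    scored.sort()
    return scored

# Hypothetical pre-computed signatures (4 bins each, made-up values).
database = {
    "group1_confA": [0.1, 0.4, 0.4, 0.1],
    "group1_confB": [0.1, 0.5, 0.3, 0.1],
    "group2_confA": [0.7, 0.1, 0.1, 0.1],
}
query = [0.1, 0.4, 0.4, 0.1]
ranking = rank_by_similarity(query, database)
```

The cost of a query is linear in the number of stored signatures times the number of bins, which is consistent with the sub-second search time reported for a few thousand entries.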

Figure 7. The ID signature compared with the ED signature. The first row shows the two input conformations 1j5nA (left) and 1lwmA (right) of the same protein. The second row shows the ID and ED signatures. Note that ID is not sensitive to shape deformation, so the two signatures are very close; in contrast, ED is strongly sensitive to deformation.

Figure 8. A screen shot from the flexible molecular shape comparison system.

Comparison with existing methods
The ID shape signature is first compared with two other distance signatures, Euclidean distance (ED) and geodesic distance (GD), in terms of their performance in retrieving similar molecular structures. We use standard evaluation procedures from information retrieval, namely precision-recall curves, to evaluate the shape distance signatures [44]. Precision-recall (PR) curves describe the relationship between precision and recall for an information retrieval method. Precision is the ratio of the relevant models retrieved to the retrieval size. Recall is the fraction of all relevant models that have been retrieved at a given retrieval size. A perfect retrieval retrieves all relevant models consistently at each recall level, producing a horizontal line at precision = 1.0. In practice, however, precision decreases with increasing recall. The closer a PR curve lies to the horizontal line at precision = 1.0, the better the information retrieval method. Figure 9 shows the PR curves of the three distance signatures for the MolMovDB database. The results show that, on average, the ID method performs better than ED and GD for flexible molecules. As discussed previously, although GD is insensitive to surface stretching, it is sensitive to 3D shape deformation [31]. In our experiments we also found that, when GD signatures are used, some molecules with one domain are often judged to be similar to molecules with two or three domains. In the MolMovDB database, GD cannot give good search results as well as ED. Furthermore, we compared our signatures with three known rigid descriptors: the spherical harmonic descriptor, the solid angle histogram and the eigenvalue model [44]. All three methods have been developed and used for searching rigid shapes in computer graphics, engineering, and molecular shape comparison. One recent work [8] compared shape descriptors on the cleaned SCOP protein classification database. In that paper, the 3D Zernike descriptor retrieved better results than the above rigid methods, based on the consistency of the rigid shapes. However, proteins are flexible molecules that undergo significant structural changes and shape deformations as part of their function, and the existing rigid descriptors all fail to detect deformation (e.g. the examples in Figure 2 and Figure 7).
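As an illustration of the evaluation measure described above, the following minimal Python sketch computes the points of a precision-recall curve for one query. It is not the evaluation code used in this paper; the function name `precision_recall` and the use of class labels to decide relevance are assumptions made for the example.

```python
def precision_recall(ranked_labels, query_label):
    """Return (recall, precision) pairs, one per relevant model in the ranking.

    ranked_labels: class labels of the retrieved models, best match first.
    query_label:   class label of the query; a model is relevant if it shares it.
    (Both arguments are hypothetical conventions chosen for this sketch.)
    """
    total_relevant = sum(1 for label in ranked_labels if label == query_label)
    if total_relevant == 0:
        return []
    points = []
    hits = 0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            # precision = relevant retrieved / retrieval size
            # recall    = relevant retrieved / all relevant models
            points.append((hits / total_relevant, hits / rank))
    return points


# A ranking with one irrelevant model in second place.
curve = precision_recall(["a", "b", "a", "a", "c"], "a")
for recall, precision in curve:
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

A ranking that places every relevant model first yields precision 1.0 at every recall level, which corresponds to the horizontal line described in the text.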

Discussion
In this section, we present several potential applications of IDSS, in which it replaces the conventional rigid shape descriptors in molecular shape comparison, and we also discuss some limitations of our approach.

Searching molecular databases for drug design
In rational drug design, a unifying principle is the use of either shape similarity or complementarity to identify compounds expected to be active against a given target [1-3,12]. Shape similarity is the underlying foundation of ligand-based methods, which seek compounds with structure similar to known actives. Shape complementarity is the basis of most receptor-based design methods, which identify compounds complementary in shape to a given receptor. One direction for future work is to apply the IDSS method to large and diverse molecular databases for both ligand- and receptor-based molecular design.

Several methods have focused on searching diverse molecular databases using descriptor-based MSC methods. For instance, Nilakantan et al. [20] searched for the ten molecules with the highest shape similarity score in a database consisting of 22,495 compounds derived from the Cambridge Crystal File. Their technique can be used to screen large databases to eliminate candidates that have a low shape similarity with the template. Hahn [45] described a three-phase database searching strategy for rapidly finding compounds similar in shape to a given shape query. The database used contained 45,579 compounds and 1,949,459 total conformations. Zauhar et al. [3] tested their shape signature method on the Tripos fragment database and the NCI database (113,331 compounds) under two different metrics. Recently, Ballester et al. [1,2] presented ultrafast shape recognition to search several compound databases for similar molecular shapes. The tested databases include the

Figure 9
Precision-recall curves computed for three distance signatures (ID, ED and GD) for the MolMovDB database. Horizontal axis: recall (0 to 1); vertical axis: precision (0 to 1).


Vendor Database (2,433,493 commercially available compounds) and an independent benchmark from DrugBank. Our IDSS may replace the existing shape descriptors used in the above molecular databases. Searching molecular databases for drug design will be the subject of a separate publication.

Protein structure retrieval
With the rapidly increasing amount of known protein structure data, fast structural comparison and retrieval methods are necessary for protein structure databases. Many structural comparison methods for proteins have been proposed for computing similarity scores, and most of them are based on protein structure alignment, such as DALI [46] and CE [47]. Structural alignment aims to compare a pair of structures where the alignment between equivalent residues is not given a priori. Therefore, an optimal sequence alignment needs to be identified, which has been shown to be NP-complete [48]. In addition, several methods consider the hinge regions when aligning the rigid subparts of proteins [33,49]. Recently, we also presented a structural comparison method for flexible proteins using least median of squares [50]. The reader may consult [51] for a comprehensive evaluation of protein structure alignment methods.

Our IDSS method can be used to search for similar protein structures. One main advantage is that the shape-based protein searching method does not need to produce an alignment between two proteins (i.e. a correspondence between amino acids). The standard benchmark data sets used to demonstrate the effectiveness of a similarity search are SCOP and CATH at various homology thresholds. We expect that the presented IDSS method can serve as an alternative and complementary tool to the existing methods for protein structure comparison and rigid molecular shape comparison.

Discovery of high resolution structural homologues from cryo-EM maps
Computer reconstruction of cryo-EM images approximates the overall shape and topology of the 3D volumetric object of a macromolecular complex [41,52-54], where determining the structure information is not a trivial task because of the low resolution. The obtained cryo-EM data is a 3D grid, called a cryo-EM map, in which every voxel is assigned a density value. Only the overall shape and possible component boundaries are visible at low resolution; individual components become apparent at intermediate resolution. Many methods have been presented for fitting high resolution structures of individual subunits into a cryo-EM map of a protein complex. Lasker et al. [54] divided the different approaches into two categories. One class of approaches assumes that the input is a cryo-EM map of a complex and an atomic resolution structure of one of its components, and the aim is to fit the given component into its location in the cryo-EM map. In many cases, however, only the cryo-EM map is available, whereas the atomic structures of the individual components of the complex are unknown. The other class of approaches looks for closely related atomic structures of the complex's components and fits them into the map, which is a challenge. The previous methods search for structural homologues of the complex's domains based on sequence alignment or correlation scores, and then fit them into the map. To align atomic resolution subunits into cryo-EM maps, the EMatch method [54] first identifies helices in an input cryo-EM map. It then uses the spatial arrangements of the helices to query a data set of high resolution folds and finds structures that can be aligned into the cryo-EM map. One key step in EMatch is the detection of helices in the cryo-EM map. However, identification of secondary structure elements in low or intermediate resolution density maps is still a difficult open problem [41]. In addition, Baker et al. [41] discussed a framework for simultaneous identification of both α helices and β sheets in intermediate resolution density maps.

In a spirit similar to the data set search used in EMatch, one possible solution is to first convert all proteins of the database into density maps at the same resolution. We may then search the converted database of protein surfaces for compounds that most closely resemble the input query cryo-EM map. One main advantage of this strategy is that it avoids detecting α helices and β sheets in the input cryo-EM map. Most existing rigid MSC methods can be applied to the above searching step. Our IDSS method can also be used directly, as an alternative method, to search for the proteins most closely related to the complex's components while taking the deformation of flexible proteins into account.

Combining other characteristics into the signature
The IDSS algorithm presented in this paper belongs to the family of molecular shape comparison methods. The current implementation only takes advantage of the geometric information of molecular shapes, without chemical features. However, in many applications, such as matching in protein-protein or protein-ligand (drug) docking and design, chemistry is also very useful [55]. In fact, our current signature can be directly combined with some chemical information. Specifically, other characteristics of a molecular surface, such as electrostatic potentials, might be naturally incorporated into the inner distance signature by considering a higher-dimensional sample point coordinate. For example, the molecular boundary surface ∂O in Eq. 1 can also be described as a set of 4D points ∂O = {p_i = (x_i, y_i, z_i, c_i)}, where x_i, y_i and z_i are the three geometric coordinates of the sample point p_i and c_i denotes its charge value. The inner distance can then be computed as the length of the shortest path between four-dimensional points. In the future, we intend to consider adding other chemical features to our signature.

Limitation
A limitation of our approach is that the calculation of the ID is sensitive to topology changes in the shape. Figure 10 shows two examples of protein conformation pairs with different topological structures. In Figure 10(A), the two molecules are two conformations of GroEL (PDB codes: 1kp8 and 1aon), where the intermediate domain of 1kp8 swings down towards the equatorial domain and the central channel, so that the surfaces of the two domains intersect in 1aon. In Figure 10(B), the two molecules are two conformations of Diphtheria Toxin (PDB codes: 1ddt and 1mdt), where 1ddt has several separate domains but 1mdt shrinks together. The inner distance signatures of such conformations will be very different because of the change in shape topology. However, such special cases with shape topology changes are not very common in protein deformations. Our method works well for most molecular shape deformations without topology changes. In many ways, the definition of a signature that is both effective and highly robust to the object representation remains a challenge [29].

Another limitation of molecular shape comparison is that shapes with similar descriptors may have no evolutionary relationship. Figure 11 shows a pair of proteins, 1barA and 1rro, taken from Ref. [9]. The two proteins have very similar geometric shape descriptors, but very different main-chain orientations. Most shape-based molecular comparison methods cannot distinguish such special cases. To address this, IDSS can be combined with classical structure alignment algorithms for protein shape retrieval. For example, IDSS can first be used to retrieve a small initial subset for a query protein, and conventional structure comparison methods, such as CE and DALI, can then compute main-chain similarity within that small subset.

Conclusion
A new method for molecular shape comparison (MSC), called IDSS (Inner Distance Shape Signature), has been presented. IDSS does not require previous alignment of the molecules being compared. We show that IDSS is deformation insensitive and is good for approximating the complicated shapes of flexible molecules. In contrast, most existing MSC methods are effective for only

Figure 10
Examples of protein conformation pairs with very different shape structures. (A) shows GroEL: 1kp8 (left) and 1aon (right), where 1kp8 has two separate domains and the corresponding domains of 1aon touch each other. (B) shows Diphtheria Toxin: 1ddt (left) and 1mdt (right), where 1ddt has several separate domains and the corresponding domains of 1mdt touch each other.

Figure 11
Example of two proteins, 1barA (left) and 1rro (right), that have similar geometric shapes but different main-chain orientations [9].


comparing rigid objects and cannot handle the shape deformation of flexible objects well. We have evaluated and demonstrated the effectiveness of IDSS within a molecular search engine application on a benchmark from MolMovDB. The new signature achieves good performance and retrieval results for different classes of flexible molecules, with the efficiency of comparing histogram signatures. The presented IDSS method can also be applied to a molecular surface representation, such as the Connolly surface, by verifying whether a segment is inside the molecular surface. Moreover, we showed several potential applications of IDSS as a replacement for the conventional rigid shape descriptors in molecular shape comparison, including searching molecular databases for drug design and protein structure retrieval.

Methods
The IDSS algorithm for computing the inner distance shape signature of an object O is given in Algorithm 1.

Algorithm 1 (IDSS)
1. Uniformly sample m points S = {p_1, ..., p_m} on the boundary surface ∂O of O using Lloyd's algorithm of k-means clustering.
2. Calculate the inner distances of all sample point pairs in S.
   2.1. First, define a graph G over all sample points: connect points p_i and p_j in S if the line segment connecting p_i and p_j falls entirely within the object O, and add the corresponding edge to the graph with its weight equal to the Euclidean distance ||p_i - p_j||.
   2.2. Then, compute the inner distances by applying a shortest path algorithm to the graph.
3. Build the signature of the shape O as the histogram of the inner distance values using 128 bins.

Our algorithm approximates the surface ∂O of O with a set of uniform sample points on ∂O, and the inner distance between each pair of sample points with the length of the shortest path through other sample points. The implementation details of the algorithm are presented next.

Sampling points
The input shape of a molecule in the volumetric data is a point array. If all points of the boundary surface ∂O of the volumetric shape O were used for the final inner distance computation, the storage and computing costs of the shape inner distances would increase. Therefore, we first sample points from the point array of the molecular surface ∂O. One issue of concern is the sample density. The more samples we take, the more accurately and precisely we can reconstruct the shape distribution. However, a large number of sample points increases the storage and computation costs of the inner distances, so there is an accuracy/time tradeoff in the choice of the number m of sample points. In our experiments, we have found that using m ∈ [300, 1000] yields shape distributions with low enough variance and high enough resolution to be useful for our initial experiments.

A second issue is the sampling method. We implemented two sampling methods: random and uniform sampling. Random sampling cannot yield a good approximation of the surface from only a subset of its points. In this paper, we therefore use Lloyd's algorithm of k-means clustering to obtain uniform sampling points on a molecular surface. The uniform sampling method consists of the following steps: 1) first, m random sample points are set as m cluster centers; 2) for each center, we cluster its neighborhood points; 3) each stage of Lloyd's algorithm moves every center point to the centroid of its cluster and then updates the clusters by recomputing the distance from each point to its nearest center; 4) the above steps are repeated until convergence; 5) finally, the point in each cluster that is nearest to the cluster center is chosen as the final sample point. The C++ source code for k-means clustering can be found at: http://www.cs.umd.edu/~mount/Projects/KMeans/.
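The five sampling steps above can be sketched in a few lines of Python. This is a plain, unoptimized illustration and not the C++ k-means package referenced in the text; the function names (`lloyd_sample`, `assign`, `centroid`), the iteration cap and the fixed random seed are assumptions made for the example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Put every point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
        clusters[nearest].append(p)
    return clusters


def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(members)
    return tuple(sum(coords) / n for coords in zip(*members))


def lloyd_sample(points, m, max_iters=100, seed=0):
    """Pick up to m roughly uniform samples from `points` (a list of tuples)."""
    rng = random.Random(seed)
    centers = rng.sample(points, m)            # step 1: random initial centers
    for _ in range(max_iters):
        clusters = assign(points, centers)     # step 2: cluster around centers
        new_centers = [centroid(c) if c else centers[k]   # step 3: move centers
                       for k, c in enumerate(clusters)]
        if new_centers == centers:             # step 4: stop at convergence
            break
        centers = new_centers
    clusters = assign(points, centers)
    # step 5: keep the surface point closest to each center (empty clusters skipped)
    return [min(c, key=lambda p: sq_dist(p, centers[k]))
            for k, c in enumerate(clusters) if c]


surface = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 10), (11, 10, 10), (10, 11, 10)]
print(lloyd_sample(surface, 2))
```

Because step 5 returns actual surface points rather than centroids, every sample lies on ∂O. A cluster that becomes empty is simply skipped here, so fewer than m samples may be returned; a production implementation would typically re-seed such clusters.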

Checking intersection
In the second step of IDSS, we check whether the line segment connecting two sample points falls entirely within the given shape O, which is called inside visibility. We check whether p_i and p_j have inside visibility by computing the intersection between the boundary surface ∂O and the line segment l connecting p_i and p_j. Since ∂O is a point array or point cloud, this section deals with the intersection between l and the point cloud surface ∂O. To solve this intersection problem, we improved our previous algorithm [56], called LPSI (Line and Point Sets Intersecting). This algorithm is fast and robust, and achieves high accuracy without requiring a reconstruction of the underlying surface from the point cloud. Our algorithm first detects whether an intersection has occurred between l and ∂O, and collects the inclusion points. Next, we cluster the inclusion points. Finally, the number of resulting clusters is equal to the number of intersection points, which is used to judge the inside visibility.

We consider a cylinder around the line segment l with radius r. To determine r, we need the density ρ of the point set ∂O, where ρ is the maximum size of a gap in ∂O. Suppose that d is the edge length of a voxel, which is usually set to a unit value, i.e. d ≡ 1; then the longest distance in the neighborhood around a voxel is √3 d. Therefore, the density radius is chosen as ρ = √3 d in this paper. Typically, we choose r = ρ = √3 d as the radius of the cylinder, in order to obtain sufficient intersection points in less time. An intersection is reported if the cylinder contains some points of ∂O. We call the points inside the cylinder inclusion points.
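The cylinder test can be illustrated with the short Python sketch below. It is not the LPSI implementation of [56]; it assumes the cylinder radius is the voxel diagonal, r = √3·d, and the function name `inclusion_points` and its return format (pairs of arc-length position and point) are choices made for the example. Whether surface points near the two endpoints of the segment should be excluded is not specified here and is left to the caller.

```python
import math


def inclusion_points(p_i, p_j, surface_points, d=1.0):
    """Collect surface points lying inside a cylinder around segment p_i-p_j.

    Returns a list of (s, q) pairs, where q is an inclusion point and s is the
    distance from p_i to the projection of q onto the segment.
    Assumes p_i != p_j and a cylinder radius of sqrt(3) * d.
    """
    radius = math.sqrt(3.0) * d
    axis = [b - a for a, b in zip(p_i, p_j)]
    length = math.sqrt(sum(c * c for c in axis))
    unit = [c / length for c in axis]
    hits = []
    for q in surface_points:
        rel = [c - a for a, c in zip(p_i, q)]
        s = sum(x * y for x, y in zip(rel, unit))      # position along the segment
        if 0.0 <= s <= length:
            foot = [a + s * u for a, u in zip(p_i, unit)]
            if math.dist(q, foot) <= radius:           # within the cylinder radius
                hits.append((s, q))
    return hits


cloud = [(5, 1, 0), (5, 3, 0), (-2, 0, 0), (2, 0, 1)]
print(inclusion_points((0, 0, 0), (10, 0, 0), cloud))
```

The projection value s is the one-dimensional parameter that the clustering step described next operates on.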

After collecting the inclusion points, we cluster them. Our clustering method maps 3D points to 1D parameter coordinates by projecting the inclusion points onto the line segment l. Suppose that {q_i} ⊆ ∂O is the set of inclusion points of l and ∂O. First, we project each point q_i onto l and get one corresponding parameter t_i ∈ R, which gives a set {t_i} of parameters. Second, the set {t_i} is sorted in increasing order; below we assume that {t_i} has already been sorted. Finally, we build the initial clusters from {t_i} as follows. Starting from the minimal parameter of {t_i}, a cluster Q_0, which is a set of inclusion points in {q_i}, is built by comparing the distance between adjacent parameters. This cluster is terminated when the distance between two adjacent parameters is larger than a maximum bound (we typically choose 1.5d as the bound). Then, starting from the parameter at which the previous cluster terminated, the next cluster Q_1 is built in the same way. Clustering terminates when the maximal parameter is reached. According to the number of initial clusters, we classify the intersection into the following three cases.

Case 1: Containing only one intersection point (inside visibility).
Case 2: Containing two intersection points (either inside or outside visibility).
Case 3: Containing more than two intersection points (non-visibility).

For Case 1, where the number of intersection points is less than 2, p_i and p_j have inside visibility. For Case 3, where the number of intersection points is more than 2, p_i and p_j have no visibility.

For Case 2, where the number of intersection points is equal to 2, p_i and p_j have either inside or outside visibility. To decide, we collect the inclusion points of l with O (not ∂O) and cluster them using the above strategy. If there is only one intersection point, p_i and p_j have inside visibility; otherwise, p_i and p_j have outside visibility.
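The clustering of projected parameters and the three-way case split can be sketched as follows. The sketch assumes that the parameters t_i are measured in the same length units as d (i.e. as distances along the segment), which the text does not state explicitly; the names `cluster_parameters` and `classify` are hypothetical.

```python
def cluster_parameters(ts, d=1.0, gap_factor=1.5):
    """Group 1D parameters; a gap larger than gap_factor * d starts a new cluster."""
    if not ts:
        return []
    ordered = sorted(ts)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > gap_factor * d:
            clusters.append([cur])       # gap too large: start the next cluster
        else:
            clusters[-1].append(cur)     # still part of the current cluster
    return clusters


def classify(num_intersections):
    """Map the number of clusters (intersection points) to the three cases."""
    if num_intersections < 2:
        return "inside"          # Case 1
    if num_intersections == 2:
        return "undecided"       # Case 2: needs the second test against O
    return "not visible"         # Case 3


params = [0.0, 0.5, 1.2, 6.0, 6.4, 12.0]
groups = cluster_parameters(params)
print(len(groups), classify(len(groups)))
```

For Case 2 the same clustering routine would be run again on the inclusion points of the segment with the solid object O rather than its boundary, as described in the text.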

After all pairs of sample points have been checked for inside visibility, we define the graph G over all sample points by connecting points p_i and p_j, with edge weight equal to the Euclidean distance ||p_i - p_j||, if p_i and p_j have inside visibility. An example is shown in Figure 4.

Computing the shortest paths
We estimate the inner distances between all sampling point pairs by computing their shortest path distances in the graph G. Algorithms for finding shortest paths in a graph are well known. Here we use Dijkstra's algorithm to compute the inner distance between sampling points in the graph G. Dijkstra's algorithm is a graph search algorithm that solves the single-source shortest path problem for a graph. In order to implement Dijkstra's algorithm more efficiently, a Fibonacci heap is used as the priority queue. We use the code package of Dijkstra's algorithm implemented by Tenenbaum et al. [57] (see http://isomap.stanford.edu). In this paper we are interested in the inner distances between all pairs of sample points. The time complexity is O(m^3) for m sample points.
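A compact way to see how the inner distances follow from the visibility graph is the Python sketch below, which runs Dijkstra's algorithm once from every sample point. It uses the binary heap from Python's `heapq` module rather than a Fibonacci heap and is not the Isomap code package mentioned above; the boolean matrix `visible` is an assumed representation of the inside-visibility test.

```python
import heapq
import math


def dijkstra(adj, source):
    """Single-source shortest path lengths on a weighted adjacency list."""
    dist = [math.inf] * len(adj)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if d_u > dist[u]:
            continue                      # stale queue entry
        for v, w in adj[u]:
            nd = d_u + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def inner_distances(points, visible):
    """All-pairs inner distances; visible[i][j] is True if i and j see each other."""
    m = len(points)
    adj = [[] for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            if visible[i][j]:
                w = math.dist(points[i], points[j])   # Euclidean edge weight
                adj[i].append((j, w))
                adj[j].append((i, w))
    return [dijkstra(adj, s) for s in range(m)]


# Three points on an L-shaped path: 0 and 2 cannot see each other directly.
pts = [(0, 0), (3, 0), (3, 4)]
vis = [[False, True, False],
       [True, False, True],
       [False, True, False]]
print(inner_distances(pts, vis)[0][2])
```

In this toy example the inner distance between points 0 and 2 is the path length through point 1, which is longer than their straight-line Euclidean distance of 5. Pairs in disconnected components keep the value `math.inf`.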

Building signatures
The inner distances reflect the complex structure and articulation of a shape well, without explicitly decomposing the shape into parts. We now convert the set of inner distances defined on the boundary of the object into a shape signature. This is done in a manner similar to the shape distribution in [7,29]. Given m sample points, the number of inner distances of the shape is at most m^2/2. Specifically, we evaluate m^2/2 inner distance values from the shape distribution and construct a histogram by counting how many values fall into each of N_bin fixed-size bins. This vector with N_bin entries is an expressive signature, as can be seen in Figure 1. Empirically, we have found that using m = 500 samples and N_bin = 128 bins yields shape signatures with low enough variance and high enough resolution to be useful for our experiments.
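The histogram construction can be sketched as below. Two details are assumptions of this example rather than statements from the text: the bins span the range from 0 to the largest finite inner distance of the shape (or a caller-supplied `max_dist`), and the counts are divided by the number of pairs so that the signature sums to 1. The function name `idss_histogram` is hypothetical.

```python
import math


def idss_histogram(dist_matrix, n_bins=128, max_dist=None):
    """Histogram of pairwise inner distances, normalized to sum to 1."""
    m = len(dist_matrix)
    # Each unordered pair once; unreachable pairs (infinite distance) are skipped.
    values = [dist_matrix[i][j]
              for i in range(m) for j in range(i + 1, m)
              if math.isfinite(dist_matrix[i][j])]
    if not values:
        return [0.0] * n_bins
    top = max_dist if max_dist is not None else max(values)
    counts = [0] * n_bins
    for v in values:
        k = int(v / top * n_bins) if top > 0 else 0
        counts[min(k, n_bins - 1)] += 1     # the largest value goes in the last bin
    return [c / len(values) for c in counts]


D = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 4.0],
     [2.0, 4.0, 0.0]]
print(idss_histogram(D, n_bins=4))
```

Scaling the bin range by the largest distance makes the signature independent of overall size; passing a common `max_dist` for all molecules instead keeps absolute size information. Which of the two is appropriate depends on the application.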

Similarity measurement
A signature of a shape is usually used as an index in a database of shapes and enables fast queries and retrieval. Hence, to achieve accurate results, a similarity measurement between two shape signatures needs to be defined. Note that the shape signature of each molecule is represented by a 1D vector. Many standard ways of comparing two vectors have been investigated in [7]. These include the L_p (p = 1, 2, ..., ∞) norms, the χ² measurement and the Bhattacharyya distance. In fact, we have found that using different metrics on different signatures may slightly affect the query results. Although in our experiments we tested all the different types of metrics for each signature when possible, we have found that metrics such as the L1 and L2 norms are simple and usually give better results. Assume that I_A and I_B represent the signatures of two molecules A and B, respectively. The L1 norm, known as the Manhattan distance, between A and B is defined as

$$ d_M(A, B) = \sum_{i=1}^{N_{bin}} \left| \mathbf{I}_A(i) - \mathbf{I}_B(i) \right|, \tag{2} $$


where I_A(i) is the ith element of the vector I_A, and similarly for I_B(i). The L2 norm, known as the Euclidean distance, between A and B is defined as

$$ d_E(A, B) = \sqrt{\sum_{i=1}^{N_{bin}} \left| \mathbf{I}_A(i) - \mathbf{I}_B(i) \right|^{2}}. \tag{3} $$
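The two distances of Eqs. 2 and 3, and their use for ranking a small database against a query, can be written directly in Python. The function names and the toy signatures are illustrative assumptions; real signatures would have N_bin = 128 entries.

```python
import math


def manhattan(sig_a, sig_b):
    """L1 distance between two signatures (Eq. 2)."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))


def euclidean(sig_a, sig_b):
    """L2 distance between two signatures (Eq. 3)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


def rank_database(query_sig, database, metric=manhattan):
    """Return database keys sorted from most to least similar to the query."""
    return sorted(database, key=lambda name: metric(query_sig, database[name]))


# Toy 4-bin signatures; the molecule names are placeholders.
db = {
    "mol_a": [0.5, 0.5, 0.0, 0.0],
    "mol_b": [0.0, 0.5, 0.5, 0.0],
    "mol_c": [0.0, 0.0, 0.5, 0.5],
}
print(rank_database([0.5, 0.5, 0.0, 0.0], db))
```

Since each comparison is a single pass over two short vectors, a query against a database reduces to one such pass per stored signature followed by a sort.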

Authors' contributions
YL generated the original idea, executed the research, and wrote the manuscript. YF participated in the research. KR supervised the project and edited the paper. All authors read and approved the final manuscript.

Acknowledgements
The authors appreciate the comments and suggestions of all anonymous reviewers, whose comments significantly improved this paper. The database is provided by S Flores and M Gerstein. We also acknowledge support from the National Institutes of Health (GM-075004). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Institutes of Health.

References
1. Ballester P and Richards W: Ultrafast shape recognition for similarity search in molecular databases. Proceedings of the Royal Society A 2007, 463(2081):1307–1321.
2. Ballester P and Richards W: Ultrafast shape recognition to search compound databases for similar molecular shapes. Journal of Computational Chemistry 2007, 28(10):1711–1723.
3. Zauhar R, Moyna G, Tian L, Li Z and Welsh W: Shape signatures: A new approach to computer-aided ligand- and receptor-based drug design. Journal of Medicinal Chemistry 2003, 46(26):5674–5690.
4. Willett P: Searching techniques for databases of two- and three-dimensional chemical structures. Journal of Medicinal Chemistry 2005, 48(13):4183–4199.
5. Daras P, Zarpalas D, Axenopoulos A, Tzovaras D and Strintzis M: Three-dimensional shape-structure comparison method for protein classification. IEEE/ACM Trans Comput Biol Bioinform 2006, 3(3):193–207.
6. Funkhouser T, Min P, Kazhdan M, Chen J, Halderman A, Dobkin D and Jacobs D: A search engine for 3D models. ACM Transactions on Graphics 2003, 22:83–105.
7. Osada R, Funkhouser T, Chazelle B and Dobkin D: Shape distributions. ACM Transactions on Graphics 2002, 21(4):807–832.
8. Sael L, Li B, La D, Fang Y, Ramani K, Rustamov R and Kihara D: Fast protein tertiary structure retrieval based on global surface shape similarity. Proteins: Structure, Function, and Bioinformatics 2008, 72(4):1259–1273.
9. Sael L, La D, Li B, Rustamov R and Kihara D: Rapid comparison of properties on protein surface. Proteins: Structure, Function, and Bioinformatics 2008, 73:1–10.
10. Yeh JS, Chen DY, Chen BY and Ouhyoung M: A web-based three-dimensional protein retrieval system by matching visual similarity. Bioinformatics 2005, 21(13):3056–3057.
11. Grant J and Pickup B: A Gaussian description of molecular shape. Journal of Physical Chemistry 1995, 99(11):3503–3510.
12. Grant J, Gallardo M and Pickup BT: A fast method of molecular shape comparison: A simple application of a Gaussian description of molecular shape. Journal of Computational Chemistry 1996, 17(14):1653–1666.
13. Meyer A and Richards W: Similarity of molecular shape. Journal of Computer-Aided Molecular Design 1991, 5(5):427–439.
14. Masek BB, Merchant A and Matthew JB: Molecular skins: A new concept for quantitative shape matching of a protein with its small molecule mimics. Proteins: Structure, Function, and Bioinformatics 1993, 17(2):193–202.
15. ROCS: OpenEye Scientific Software: Santa Fe, NM.
16. Masek BB, Merchant A and Matthew JB: Molecular shape comparison of angiotensin II receptor antagonists. Journal of Medicinal Chemistry 1993, 36(9):1230–1238.
17. Rush TS, Grant JA, Mosyak L and Nicholls A: A shape-based 3-D scaffold hopping method and its application to a bacterial protein-protein interaction. Journal of Medicinal Chemistry 2005, 48(5):1489–1495.
18. Natarajan V, Wang Y, Bremer PT, Pascucci V and Hamann B: Segmenting molecular surfaces. Computer Aided Geometric Design 2006, 23(6):495–509.
19. Bemis GW and Kuntz ID: A fast and efficient method for 2D and 3D molecular shape description. Journal of Computer-Aided Molecular Design 1992, 6(6):607–628.

20. Nilakantan R, Bauman N and Venkataraghavan R: New method for rapid characterisation of molecular shapes: applications in drug design. Journal of Chemical Information and Computer Sciences 1993, 33:79–85.
21. Mak L, Grandison S and Morris RJ: An extension of spherical harmonics to region-based rotationally-invariant descriptors for molecular shape description and comparison. Journal of Molecular Graphics and Modeling 2008, 26:1035–1045.
22. Morris RJ, Najmanovich RJ, Kahraman A and Thornton JM: Real spherical harmonic expansion coefficients as 3D shape descriptors for protein binding pocket and ligand comparisons. Bioinformatics 2005, 21:2347–2355.
23. Nagarajan K, Zauhar R and Welsh W: Enrichment of ligands for the serotonin receptor using the shape signatures approach. Journal of Chemical Information and Modeling 2005, 45:49–57.
24. Nicola G and Vakser IA: A simple shape characteristic of protein-protein recognition. Bioinformatics 2007, 23(7):789–792.
25. Sommer I, Müller O, Domingues FS, Sander O, Weickert J and Lengauer T: Moment invariants as shape recognition technique for comparing protein binding sites. Bioinformatics 2007, 23(23):3139–3146.
26. Elad A and Kimmel R: On bending invariant signatures for surfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 2003, 25(10):1285–1295.
27. Hilaga M, Shinagawa Y, Kohmura T and Kunii T: Topology matching for fully automatic similarity estimation of 3D shapes. Proceedings of ACM SIGGRAPH 2001, 203–212.
28. Jain V and Zhang H: A spectral approach to shape-based retrieval of articulated 3D models. Computer-Aided Design 2007, 39(5):398–407.
29. Gal R, Shamir A and Cohen-Or D: Pose-oblivious shape signature. IEEE Transactions on Visualization and Computer Graphics 2007, 13(2):261–271.
30. Rustamov RM: Laplace-Beltrami eigenfunctions for deformation invariant shape representation. Proceedings of the fifth Eurographics symposium on Geometry processing 2007.
31. Ling H and Jacobs D: Shape classification using the inner-distance. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 2007, 29(2):286–299.
32. Damm KL and Carlson HA: Gaussian-weighted RMSD superposition of proteins: A structural comparison for flexible proteins and predicted protein structures. Biophysical Journal 2006, 90:4558–4573.
33. Shatsky M, Nussinov R and Wolfson HJ: Flexible protein alignment and hinge detection. Proteins: Structure, Function, and Bioinformatics 2002, 48(2):242–256.
34. Wriggers W and Schulten K: Protein domain movements: detection of rigid domains and visualization of hinges in comparisons of atomic coordinates. Proteins: Structure, Function, and Bioinformatics 1997, 29:1–14.
35. Li B, Turuvekere S, Agrawal M, La D, Ramani K and Kihara D: Characterization of local geometry of protein surfaces with the visibility criterion. Proteins: Structure, Function, and Bioinformatics 2008, 71(2):670–683.
36. Ju T, Baker M and Chiu W: Computing a family of skeletons of volumetric models for shape description. Computer-Aided Design 2007, 39(5):352–360.
37. Ludtke S, Baldwin P and Chiu W: EMAN: semi-automated software for high resolution single particle reconstructions. Journal of Structural Biology 1999, 128:82–97.
38. Iyer N, Jayanti S, Lou K, Kalyanaraman Y and Ramani K: Three dimensional shape searching: State-of-the-art review and future trends. Computer-Aided Design 2005, 37(5):509–530.
39. Connolly M: Solvent-accessible surfaces of proteins and nucleic acids. Science 1983, 221(4612):709–713.

40. Pettersen E, Goddard T, Huang C, Couch GDGS, Meng E and Ferrin T: UCSF Chimera – A visualization system for exploratory research and analysis. Journal of Computational Chemistry 2004, 25(13):1605–1612.

41. Baker M, Ju T and Chiu W: Identification of secondary structureelements in intermediate-resolution density maps. Structure2006, 15:7–19.

42. Echols N, Milburn D and Gerstein M: MolMovDB: Analysis andvisualization of conformational change and structural flex-ibility. Nucleic Acids Research 2003, 31:478–482.

43. Flores S and Gerstein M: FlexOracle: predicting flexible hingesby identification of stable domains. BMC Bioinformatics 2007,8(215).

44. Jayanti S, Kalyanaraman Y, Iyer N and Ramani K: Developing anengineering shape benchmark for CAD models. Computer-Aided Design 2006, 38(9):939–953.

45. Hahn M: Three-dimensional shape-based searching of con-formationally flexible compounds. J Chem Inf Comput Sci 1997,37:80–86.

46. Holm L and Sander C: Protein structure comparison byalignment of distance matrices. Journal of Molecular Biology1993, 233:123–138.

47. Shindyalov I and Bourne P: Protein structure alignment byincremental combinatorial extension (CE) of the optimalpath. Protein Engineering 1998, 11(9):739–747.

48. Lathrop RH: The protein threading problem with sequenceamino acid interaction preferences is NP-complete. ProteinEngineering 1994, 7:1059–1068.

49. Ye Y and Godzik A: Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics 2003, 19:ii246–ii255.

50. Liu YS, Fang Y and Ramani K: Using least median of squares for structural superposition of flexible proteins. BMC Bioinformatics 2009, 10(29).

51. Kolodny R, Koehl P and Levitt M: Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. J Mol Biol 2005, 346(4):1173–1188.

52. Chiu W, Baker M, Jiang W, Dougherty M and Schmid M: Electron cryomicroscopy of biological machines at subnanometer resolution. Structure 2005, 13(3):363–372.

53. Dror O, Lasker K, Nussinov R and Wolfson H: EMatch: an efficient method for aligning atomic resolution subunits into intermediate-resolution cryo-EM maps of large macromolecular assemblies. Acta Crystallogr D Biol Crystallogr 2007, 63:42–49.

54. Lasker K, Dror O, Shatsky M, Nussinov R and Wolfson H: EMatch: discovery of high resolution structural homologues of protein domains in intermediate resolution cryo-EM maps. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2007, 4:28–39.

55. Shulman-Peleg A, Shatsky M, Nussinov R and Wolfson H: MultiBind and MAPPIS: webservers for multiple alignment of protein 3D-binding sites and their interactions. Nucleic Acids Research 2008, 36 Web Server: W260–W264.

56. Liu YS, Yong JH, Zhang H, Yan DM and Sun JG: A quasi-Monte Carlo method for computing areas of point-sampled surfaces. Computer-Aided Design 2006, 38:55–68.

57. Tenenbaum JB, de Silva V and Langford JC: A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290(5500):2319–2323.
