Delivered by Publishing Technology to Sankar BasuIP 11619314116 On Thu 19 Dec 2013 194807
Copyright American Scientific Publishers
COMMUNICATION
Copyright © 2013 American Scientific Publishers. All rights reserved. Printed in the United States of America.
Journal of Bioinformatics and Intelligent Control
Vol. 2, 321–323, 2013
SARAMA: A Standalone Suite of Programs for the Complementarity Plot, A Graphical Structure Validation Tool for Proteins
Sankar Basu 1,*, Dhananjay Bhattacharyya 2, and Rahul Banerjee 1,*
1 Crystallography and Molecular Biology Division, Saha Institute of Nuclear Physics, Kolkata 700064, India
2 Computational Science Division, Saha Institute of Nuclear Physics, Kolkata 700064, India
Structure validation is a crucial component not only in protein crystallography but also in model quality estimation in homology modeling, protein design, and de novo structure prediction. Two key attributes of a correctly determined atomic model are optimal packing between side-chains and the absence of destabilizing unbalanced electric fields within the interior of a protein molecule. The complementarity plot (CP) combines them in a single unified measure. CP has now been compiled into a user-friendly validation package and made available as a standalone suite of programs in the public domain (http://www.saha.ac.in/biop/www/sarama.html). The application of CP in the detection of wrong rotamer assignment has been surveyed.
Keywords: Complementarity, Packing and Electrostatics, Structure Validation.
We report the free availability of a standalone suite of programs (Sarama) for the Complementarity Plot (Linux platform), with detailed features and documentation available at the website http://www.saha.ac.in/biop/www/sarama.html. The basic methodology has already been reported [1].
Briefly, the Complementarity Plot (CP) estimates the shape and electrostatic complementarity of interior residues of a globular protein and is a sensitive indicator of their harmony or disharmony with regard to the short- and long-range forces sustaining the native fold. A correctly determined, natively folded protein structure should have optimal packing between its buried side-chains and no destabilizing unbalanced electric fields within the interior of the molecule. CP has already been demonstrated to be effective in detecting local regions of suboptimal packing or electrostatics, which were found to be highly correlated with coordinate errors. CP has now been compiled into a user-friendly validation package, which should be a useful addition to the existing repertoire of structure validation tools. A set of scores has now been included in the methodology; these give an estimate of the probabilities associated with the distribution of points in the plot and of the propensities of specific residues for different degrees of solvent exposure.

* Authors to whom correspondence should be addressed.

As has been reported previously [1], CP requires the surface ($S_m^{sc}$) and electrostatic ($E_m^{sc}$) complementarity to be computed for buried residues. In this regard, the extent of burial (Bur) of every amino acid residue with respect to the solvent was estimated by the ratio of the solvent accessible area (probe radius 1.4 Å) [2] of the residue (X) in the polypeptide chain to that of an identical residue in a Gly-X-Gly peptide fragment in a fully extended conformation. Only those residues with a burial ratio Bur ≤ 0.30 were considered for the complementarity plot. The van der Waals surface was calculated for the entire polypeptide chain, sampled at 10 dots/Å² [3], and the surface ($S_m^{sc}$) and electrostatic ($E_m^{sc}$) complementarities were calculated for buried or partially buried side-chains [1, 3].
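The burial filter described above can be sketched in a few lines. This is only an illustration, not the code shipped with the suite: the function names, the input layout, and the reference areas are hypothetical, and the solvent accessible areas are assumed to have been computed beforehand by a separate program.

```python
BURIAL_CUTOFF = 0.30  # residues with Bur <= 0.30 enter the plot (from the text)


def burial_ratio(sasa_in_chain, sasa_gly_x_gly):
    """Bur = area of residue X in the chain / area of X in extended Gly-X-Gly."""
    if sasa_gly_x_gly <= 0.0:
        raise ValueError("reference area must be positive")
    return sasa_in_chain / sasa_gly_x_gly


def select_buried(residues, reference_area):
    """Return (residue_id, Bur) for every residue that passes the cutoff.

    residues: iterable of (residue_id, residue_name, sasa_in_chain)
    reference_area: dict mapping residue_name to its Gly-X-Gly area
    (both layouts are assumptions made for this sketch).
    """
    selected = []
    for res_id, name, sasa in residues:
        bur = burial_ratio(sasa, reference_area[name])
        if bur <= BURIAL_CUTOFF:
            selected.append((res_id, bur))
    return selected


# Illustrative numbers only (square angstroms, invented for the demo).
reference = {"LEU": 180.0, "SER": 120.0}
chain = [("A12", "LEU", 9.0), ("A13", "SER", 60.0), ("A14", "LEU", 45.0)]
buried = select_buried(chain, reference)
```

With these made-up areas, residues A12 and A14 pass the cutoff while A13 (half exposed) is left out of the plot.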
For surface complementarity ($S_m^{sc}$), only side-chain surface points of buried residues (target) were considered, and their nearest neighboring surface points were identified from the rest of the polypeptide chain (within a distance of 3.5 Å). Surface points, essentially being area elements, are characterized by their positions (x, y, z) and the direction cosines (dl, dm, dn) of their normals. Then, adapted from Lawrence and Colman [4], the following expression was calculated:

$$S_{ab} = \mathbf{n}_a \cdot \mathbf{n}_b \cdot \exp\left(-w\, d_{ab}^{2}\right) \qquad (1)$$
where $\mathbf{n}_a$ and $\mathbf{n}_b$ are the two unit normal vectors corresponding to the dot surface points a (located on the side-chain surface of the target residue) and b (the dot point nearest to a, within 3.5 Å), respectively, $d_{ab}$ is the distance between them, and w is a scaling constant set to 0.5. $S_m^{sc}$ was defined as the median of the distribution of $S_{ab}$ calculated over all the dot surface points of the side-chain of the target residue.

J. Bioinf. Intell. Control 2013, Vol. 2, No. 4, ISSN 2326-7496, doi:10.1166/jbic.2013.1059

For electrostatic complementarity ($E_m^{sc}$), the electrostatic potential of the molecular surface was estimated using the finite difference Poisson-Boltzmann method as implemented in DelPhi [5]. The potential on the side-chain surface points of a buried residue was then computed twice [1]: first due to all atoms of the target residue, and second due to all atoms from the rest of the polypeptide chain (excluding the target). Thus each surface point was tagged with two values of electrostatic potential. Following McCoy et al. [6], the negative of the Pearson's correlation coefficient between these two sets of potential values over the side-chain dot surface points of the target residue was defined as $E_m^{sc}$:
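$$E_m^{sc} = -\left( \frac{\sum_{i=1}^{N} \left(\phi(i) - \bar{\phi}\right)\left(\phi'(i) - \bar{\phi'}\right)}{\left[ \sum_{i=1}^{N} \left(\phi(i) - \bar{\phi}\right)^{2} \sum_{i=1}^{N} \left(\phi'(i) - \bar{\phi'}\right)^{2} \right]^{1/2}} \right) \qquad (2)$$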
where, for a given residue with a total of N side-chain dot surface points, $\phi(i)$ is the potential on its i-th point due to its own atoms and $\phi'(i)$ that due to the rest of the protein atoms, and $\bar{\phi}$ and $\bar{\phi'}$ are the mean potentials of $\phi(i)$, i = 1, ..., N, and $\phi'(i)$, i = 1, ..., N, respectively.

The plot of $S_m^{sc}$ on the X-axis and $E_m^{sc}$ on the Y-axis (spanning −1 to 1 on both axes) constitutes the 'Complementarity Plot' (CP), which is actually divided into three plots based on the burial ranges 0.00 ≤ Bur ≤ 0.05 (CP1), 0.05 < Bur ≤ 0.15 (CP2), and 0.15 < Bur ≤ 0.30 (CP3). Initially, all the buried residues from a training database (DB2) consisting of 400 highly resolved protein crystal structures [1] were plotted in the CPs, which had been divided into square grids (of width 0.05 × 0.05), and the center of every square grid was assigned an initial probability ($P_{grid}$) equal to the number of points in the grid divided by the total number of points in the plot. The probability of a residue occupying a specific position in the plot was then estimated by bilinear interpolation from the probability values of its four nearest neighboring grid cells. Each CP was contoured based on the initial probability values ($P_{grid}$ ≥ 0.005 for the first contour level and $P_{grid}$ ≥ 0.002 for the second), thus dividing the plot into three distinct regions. The cumulative probabilities of locating a point within the second (outer) contour for the three plots were 91%, 90%, and 88%, respectively, whereas for the first (inner) contour the probability gradually dropped with increasing solvent exposure (82%, 76%, 71%). Inspired by the Ramachandran plot, the region within the first contour was termed 'probable', that between the first and second contours 'less probable', and that outside the second contour 'improbable' (Fig. 1).

In such a plot, residues with low $S_m^{sc}$ and $E_m^{sc}$ (< 0.2 for both) are easily identified. The methodology has already been shown to detect errors in side-chain conformers in obsolete structures with respect to their upgraded counterparts [1].
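For readers who want to reproduce the two coordinates of the plot, Eqs. (1) and (2) can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation in the suite: the function names and the input layout are hypothetical, the dot surface, the nearest-neighbor search, and the DelPhi potentials are assumed to be available already, and the toy numbers are invented.

```python
import math
from statistics import median

W = 0.5  # scaling constant w in Eq. (1), as given in the text


def s_ab(normal_a, normal_b, distance):
    """Eq. (1): n_a . n_b * exp(-w * d_ab^2) for one pair of dot points.

    The sign convention of the normals follows the equation as printed;
    it is not re-derived here.
    """
    dot = sum(x * y for x, y in zip(normal_a, normal_b))
    return dot * math.exp(-W * distance ** 2)


def surface_complementarity(pairs):
    """S_m^sc: median of S_ab over the dot points of one target side-chain.

    pairs: list of (normal_a, normal_b, d_ab), one entry per dot point a that
    already has a nearest neighbour b within 3.5 angstrom (assumed layout).
    """
    return median(s_ab(na, nb, d) for na, nb, d in pairs)


def electrostatic_complementarity(phi_self, phi_rest):
    """Eq. (2): negative Pearson correlation of the two potential sets."""
    n = len(phi_self)
    mean_self = sum(phi_self) / n
    mean_rest = sum(phi_rest) / n
    cov = sum((p - mean_self) * (q - mean_rest)
              for p, q in zip(phi_self, phi_rest))
    var_self = sum((p - mean_self) ** 2 for p in phi_self)
    var_rest = sum((q - mean_rest) ** 2 for q in phi_rest)
    return -cov / math.sqrt(var_self * var_rest)


# Toy data, invented for illustration only.
pairs = [((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), 0.0),
         ((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), 1.0),
         ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.5)]
sm = surface_complementarity(pairs)

phi_self = [1.0, 2.0, 3.0]      # potential due to the residue's own atoms
phi_rest = [-2.0, -4.0, -6.0]   # potential due to the rest of the chain
em = electrostatic_complementarity(phi_self, phi_rest)
```

In the toy case the two potential sets are perfectly anti-correlated, so the negated correlation reaches its maximum; a residue sitting in an unbalanced field would instead give a low or negative value.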
Such side-chains were found to have suboptimal packing and/or electrostatics and thus predominantly lie in the improbable region of the plots. An example of such a calculation is given below.

Fig. 1. CP1: the complementarity plot for the first burial bin. The 'probable', 'less probable', and 'improbable' regions of the plot are colored purple, mauve, and sky-blue, respectively.

A total of 110 pairs of obsolete structures and their corresponding upgraded counterparts were assembled from the PDB (ftp://ftp.wwpdb.org/pub/pdb/data/status/obsolete.dat). In order to ensure that the upgraded structure was genuinely better than its obsoleted counterpart, only those pairs were selected wherein the improvements in resolution and R-factor were better than 0.2 Å and 0.02, respectively. From the upgraded structures, 222 deeply buried residues (0.0 ≤ Bur ≤ 0.05) were identified which were originally located in the probable region of CP1 and whose counterparts in the corresponding obsolete structures differed by more than 40° (involving χ1 and χ2) though belonging to another valid rotamer combination [7].
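Deciding whether a residue lies in the probable region, as in the selection above, comes down to a lookup on the probability grid. The sketch below illustrates the grid construction and the bilinear interpolation described for the CPs. It rests on assumptions that go beyond the text: the function names are hypothetical, the region label is obtained by comparing the interpolated probability directly with the two contour thresholds, and the small hand-made grid stands in for the DB2 training data.

```python
import math

LO, HI, WIDTH = -1.0, 1.0, 0.05       # plot range and grid width from the text
N_CELLS = round((HI - LO) / WIDTH)    # 40 cells along each axis


def cell_index(value):
    """Index of the grid cell containing a coordinate (clamped to the plot)."""
    return min(N_CELLS - 1, max(0, int((value - LO) / WIDTH)))


def build_grid(points):
    """P_grid: number of points in each cell divided by the total number."""
    counts = [[0] * N_CELLS for _ in range(N_CELLS)]
    for s, e in points:
        counts[cell_index(s)][cell_index(e)] += 1
    total = len(points)
    return [[c / total for c in row] for row in counts]


def interpolate(grid, s, e):
    """Bilinear interpolation between the four nearest cell centres."""
    # Position in units of cells, measured from the centre of cell 0.
    fs = (s - LO) / WIDTH - 0.5
    fe = (e - LO) / WIDTH - 0.5
    i0 = min(N_CELLS - 2, max(0, math.floor(fs)))
    j0 = min(N_CELLS - 2, max(0, math.floor(fe)))
    ts = min(1.0, max(0.0, fs - i0))
    te = min(1.0, max(0.0, fe - j0))
    return ((1 - ts) * (1 - te) * grid[i0][j0]
            + ts * (1 - te) * grid[i0 + 1][j0]
            + (1 - ts) * te * grid[i0][j0 + 1]
            + ts * te * grid[i0 + 1][j0 + 1])


def region(probability):
    """Map a probability to a region using the two contour levels."""
    if probability >= 0.005:
        return "probable"
    if probability >= 0.002:
        return "less probable"
    return "improbable"


# Hand-made grid for illustration; real values would come from build_grid()
# applied to the training database.
grid = [[0.0] * N_CELLS for _ in range(N_CELLS)]
grid[30][30] = 0.008
grid[31][30] = 0.003
labels = [region(interpolate(grid, s, e))
          for s, e in [(0.525, 0.525), (0.575, 0.525), (-0.5, -0.5)]]
```

The three query points sit at the centres of a dense cell, a sparser neighboring cell, and an empty part of the plot, so they fall in the probable, less probable, and improbable regions respectively.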
These residues were then replaced by their corresponding counterparts from the obsolete structures. Subsequent to the replacement, 45% of the points were relocated in the improbable region of the plot and 16% were found in the less probable region, whereas 39% were retained in the probable region (Fig. 2). Deviations from the expected distributions (DB2) subsequent to the replacement were estimated by means of χ² (df = 3 − 1: probable, less probable, improbable; χ²(0.05) = 5.991), which was found to be 397.63. Thus CP could have applications when dealing with low-resolution data, where automated side-chain rebuilding methods generally do not work very efficiently.

Fig. 2. Distributions (in CP1) for residues with native side-chain conformers from the upgraded structures and after replacement by rotamers from the corresponding obsolete counterparts. (A) Distribution of residues with native side-chains, all falling into the probable region of CP1, and (B) distribution subsequent to the replacement.

The Complementarity Plot as a validation technique is probabilistic in nature and can be utilized either over the full chain or on any distribution of points. Further, this is the only validation procedure which combines both packing and electrostatics in a single unified measure and displays graphically (apart from actually listing) residues with faulty packing and/or electrostatics. Thus CP should be a useful addition to the existing repertoire of structure validation tools. The output of the program gives $S_m^{sc}$ and $E_m^{sc}$ of buried residues, which can also be used for a wide range of other applications, e.g., fold recognition, analysis of side-chain packing, detection of unbalanced partial charges within protein interiors, protein design, and modeling.

The website contains detailed documentation of the different thresholds for successful validation for all the CP-scores for a given atomic model. The model might be experimentally or computationally derived but must contain coordinates of (geometrically fixed) hydrogen atoms consistent with the format of REDUCE [8].
The suite has been successfully tested on Red Hat Enterprise and openSUSE Linux platforms with Perl and a Fortran compiler (f90, f95, gfortran, or ifort). DelPhi [5] must be pre-installed and running under the command delphi_static.
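As a closing illustration, the χ² comparison reported for the rotamer-replacement survey can be retraced roughly as follows. This is a hedged sketch: it assumes that the expected CP1 fractions are the contour probabilities quoted for the training database (82% probable, hence 9% less probable and 9% improbable), and it uses the rounded percentages from the text, so it should not be expected to reproduce the reported value of 397.63 exactly. The variable and function names are hypothetical.

```python
N_RESIDUES = 222            # deeply buried residues in the survey
CRITICAL_005_DF2 = 5.991    # chi-square critical value at 0.05, df = 2

# Expected fractions for CP1, assumed from the contour probabilities
# (82% inside the first contour, 91% inside the second).
expected_fraction = {"probable": 0.82,
                     "less probable": 0.09,
                     "improbable": 0.09}

# Observed fractions after the rotamer replacement (rounded, from the text).
observed_fraction = {"probable": 0.39,
                     "less probable": 0.16,
                     "improbable": 0.45}


def chi_square(observed, expected, n):
    """Pearson chi-square statistic from class fractions and sample size."""
    total = 0.0
    for label, exp_frac in expected.items():
        exp_count = exp_frac * n
        obs_count = observed[label] * n
        total += (obs_count - exp_count) ** 2 / exp_count
    return total


chi2 = chi_square(observed_fraction, expected_fraction, N_RESIDUES)
significant = chi2 > CRITICAL_005_DF2
```

Under these assumptions the statistic comes out in the high hundreds, of the same order as the published figure and far above the critical value, which is the point of the comparison: the replaced rotamers do not follow the distribution expected for correctly built side-chains.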
References and Notes

1. S. Basu, D. Bhattacharyya, and R. Banerjee, Self-complementarity within proteins: Bridging the gap between binding and folding. Biophys. J. 102, 2605 (2012).
2. B. Lee and F. M. Richards, The interpretation of protein structures: Estimation of static accessibility. J. Mol. Biol. 55, 379 (1971).
3. R. Banerjee, M. Sen, D. Bhattacharyya, and P. Saha, The jigsaw puzzle model: Search for conformational specificity in protein interiors. J. Mol. Biol. 333, 211 (2003).
4. M. C. Lawrence and P. M. Colman, Shape complementarity at protein/protein interfaces. J. Mol. Biol. 234, 946 (1993).
5. A. Nicholls and B. Honig, A rapid finite difference algorithm, utilizing successive over-relaxation to solve the Poisson-Boltzmann equation. J. Comput. Chem. 12, 435 (1991).
6. A. J. McCoy, V. C. Epa, and P. M. Colman, Electrostatic complementarity at protein/protein interfaces. J. Mol. Biol. 268, 570 (1997).
7. M. S. Shapovalov and R. L. Dunbrack Jr., A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure 19, 844 (2011).
8. J. M. Word, S. C. Lovell, J. S. Richardson, and D. C. Richardson, Asparagine and glutamine: Using hydrogen atom contacts in the choice of side-chain amide orientation. J. Mol. Biol. 285, 1735 (1999).
Received: 10 August 2013. Accepted: 3 September 2013.