Amino acid propensity in different protein secondary structural units


Introduction

To understand and map the protein universe, it is necessary to know protein structures, which provide important information about function and mechanism of action. Predicting the native conformation of proteins is one of the most challenging problems in molecular biology. Secondary structure prediction is an important intermediate step in this process.

Basics of Protein Structure

Primary
Secondary
Tertiary
Quaternary

Primary structure: ACDEFGHIKLMNPQRSTVWY

…the way

Traditional experimental methods: X-ray crystallography or NMR to solve three-dimensional structures. These methods require relatively large amounts of pure protein (generally more than milligram quantities), so the structures of many proteins will remain out of reach. 3D structure prediction has therefore been a major open problem for more than three decades.

Strong demand for structure prediction: continual advances in molecular biology provide protein sequence information (primary structures) at a pace that far exceeds the speed with which higher-order protein structures can be determined.

Reasons for Predicting Secondary Structure

Since secondary structure is local, only the amino acid sequence is needed
Accurate secondary structure prediction can provide important information for tertiary structure prediction
Protein function prediction
Predicting structural change
Protein classification
To gain insights into the protein folding process
Facilitate alignment for homology modeling of distantly related proteins
To assist 3D structure modeling from NMR data

Secondary Structure Prediction methods:

1st-generation methods: calculate propensities for each amino acid
    Chou-Fasman method (P.Y. Chou, G.D. Fasman, 1974)

2nd-generation methods: calculate propensities for segments of 3-51 amino acids
    GOR method (Garnier et al., 1978)

3rd-generation methods: use evolutionary information, multiple sequence alignment
    Neural network method (Qian & Sejnowski 1988, Karplus 1996)
    Nearest neighbour methods (Yi & Lander, 1993)
    PHD algorithm (Rost & Sander, 1993)
    Homology or nearest neighbour comparisons (Levin, 1993)
    Evolutionary methods (Barton, Niemann)
    Combined approaches (Rost 1994, Levin, Argos)

Chou-Fasman Algorithm

Empirical (statistical) method for secondary structure prediction [α-helix, β-strand, or coil].

Several prediction methods developed in recent years are considerably more accurate than the original CF method. Nevertheless, many authors still use amino acid propensities or the CF method for 2D structure prediction, for evolutionary studies, and for developing or evaluating new prediction methods.

Uses a small amount of known structural data.

Based on two parameters:
Frequency
Propensity

Methods

The analyses were performed using PDBselect, a set of experimentally determined, non-redundant protein structures from the PDB. The PDBselect list with <25% sequence homology, released in Oct 2007, contained 3693 protein chains. All analyses were performed on human proteins.

The extracted protein sequences [321 human proteins] were classified into four secondary structural classes (all-alpha, all-beta, alpha+beta and alpha/beta) using the structural information provided by the SCOP database.

The 2D structure for every PDBselect entry was assigned by the DSSP algorithm.

Secondary structure alphabets

Standard 3-state alphabet:
H: α-helix
E: β-strand (extended structure)
C: coil (any other structure)

CASP convention (standard):
H = (H, G, I)
E = (E, B)
C = (T, S)

DSSP alphabet by Kabsch & Sander (1983):
H: α-helix
G: 3₁₀-helix
I: π-helix
E: extended strand (β-strand)
B: residue in isolated β-bridge
T: H-bonded turn
S: bend
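The reduction from the DSSP alphabet to the 3-state alphabet can be sketched in a few lines of Python. This is only an illustration of the CASP convention listed above, not the program used in this work. The names `DSSP_TO_3STATE` and `reduce_dssp` are hypothetical, and mapping any unlisted symbol (such as a blank) to coil is an assumption based on "C: coil (any other structure)".

```python
# CASP-style reduction of DSSP states to the 3-state alphabet.
DSSP_TO_3STATE = {
    "H": "H", "G": "H", "I": "H",  # helices
    "E": "E", "B": "E",            # strands and isolated bridges
    "T": "C", "S": "C",            # turns and bends
}


def reduce_dssp(dssp_string):
    """Map a string of DSSP states to H/E/C, one character per residue."""
    # Assumption: symbols not listed above (e.g. blank or '-') count as coil.
    return "".join(DSSP_TO_3STATE.get(state, "C") for state in dssp_string)


if __name__ == "__main__":
    print(reduce_dssp("HGIEBTS -"))
```

The output string has the same length as the input, so it stays aligned residue by residue with the amino acid sequence.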

Propensity

P_{ij} = (n_{ij} / n_i) / (N_j / N_T)

where:
n_{ij} = number of residues of amino acid i that occur in secondary structure j (e.g. α-helix)
n_i = total number of residues of amino acid i in the database
N_j = total number of residues of all amino acids in secondary structure j
N_T = total number of residues of all amino acids in the database

Propensity values of all amino acids were calculated using a Perl program in a Linux environment.
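As an illustration of the propensity formula, the following is a minimal Python sketch. It is not the Perl program mentioned above. The function name `propensities` and its two inputs are assumptions for the example: an amino acid string and a secondary structure string of equal length, aligned residue by residue.

```python
from collections import Counter


def propensities(sequence, states):
    """Return {(amino_acid, state): P_ij} for two aligned strings."""
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have the same length")
    n_i = Counter(sequence)               # residues of amino acid i overall
    N_j = Counter(states)                 # residues in state j overall
    n_ij = Counter(zip(sequence, states)) # residues of amino acid i in state j
    N_T = len(sequence)                   # all residues in the data set
    return {
        (aa, ss): (n_ij[(aa, ss)] / n_i[aa]) / (N_j[ss] / N_T)
        for aa in n_i
        for ss in N_j
    }


if __name__ == "__main__":
    for key, value in sorted(propensities("AAAG", "HHCC").items()):
        print(key, round(value, 3))
```

A value above 1 means the amino acid occurs in that state more often than expected from the overall composition, and a value below 1 means it occurs less often. For a real data set the counts would be accumulated over all chains before the ratio is taken.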

The t-test was used to evaluate the significance of pairwise intra- and inter-class differences in amino acid propensities across the four secondary structural classes of human proteins.
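A pairwise comparison of this kind can be sketched as follows. The slides do not state which variant of the t-test was used, so Welch's unequal-variance form is assumed here purely for illustration. The function name `welch_t` and the sample values are hypothetical. Only the test statistic and the degrees of freedom are computed; the p-value would come from a t-distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's t-test on two samples of propensities."""
    na, nb = len(a), len(b)
    # Squared standard errors of the two sample means.
    sa, sb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(sa + sb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical propensity values for two structural classes.
    t, df = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0])
    print(t, df)
```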

Analysis of GC3

GC3: G+C base composition at the third codon position.

TBLASTN was used to retrieve the corresponding mRNA sequence for each protein sequence.

GC3 values were calculated with the SingleFasta program (developed in the Bioinformatics Center, Bose Institute).

Sequences were divided into two categories: GC3 low (<45%) and GC3 high (>60%).
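For illustration, GC3 and the two categories can be computed as in the Python sketch below. This is not the SingleFasta program. The function names `gc3` and `gc3_category` are hypothetical, and the input is assumed to be an in-frame coding sequence starting at the first codon position.

```python
def gc3(cds):
    """Return the percentage of G or C at third codon positions."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    third = cds[2:usable:3]           # every third base, starting at base 3
    if not third:
        raise ValueError("sequence is shorter than one codon")
    return 100.0 * sum(base in "GC" for base in third) / len(third)


def gc3_category(value, low=45.0, high=60.0):
    """Classify a GC3 percentage with the thresholds given above."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return None  # intermediate values fall in neither category


if __name__ == "__main__":
    value = gc3("ATGGCCAAATAG")
    print(value, gc3_category(value))
```

Sequences with GC3 between 45% and 60% belong to neither category, which is why the classifier returns `None` for them.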

A t-test was performed to evaluate the significance of the relationship between GC3 and the corresponding amino acid propensities in helix, sheet and coil for the four structural classes.

Expression level

Expression data were collected from the GNF SymAtlas database.

Proteins were divided into two categories: expression low (<20%) and expression high (>80%).

A t-test was performed to evaluate the significance of the relationship between expression values and the corresponding amino acid propensities in helix, sheet and coil for the four structural classes, i.e., all-a, all-b, a+b, and a/b.

Result

Calculation of propensities:

At the level of individual significance, the pairwise comparisons between structural classes are:
All-a vs all-b: high significance in helix and sheet, average in coil.
All-a vs a+b: average significance in helix and coil, high in sheet.
All-a vs a/b: low significance in helix, strong in sheet and coil.
All-b vs a+b: good significance in helix, average in sheet, low in coil.
All-b vs a/b: significant differences in helix, low significance in sheet and coil.
A+b vs a/b: low significance in helix, sheet, and coil.

GC3 vs propensities:

Differences between structural classes (inter-class) are more significant than differences within a structural class (intra-class).
For both GC3 high and GC3 low, the inter-class differences are more significant.

Expression vs propensity:

The sheet structure of all protein classes gives significant results.
Low-expression proteins show more significant results.
Val shows a significant result in random match.

Conclusion

The intrinsic propensity of amino acids for secondary structure is influenced by the context of the sequence and by structural organization. This suggests that propensity for secondary structure may not be a truly intrinsic property of each amino acid, but must be viewed as influenced by the context.

Amino acid propensities for secondary structures are influenced not only by protein structural class, but also by genomic GC3 and expression level.

Although other predictive approaches exist and give better results than statistical methods, these results indicate that improvements to statistical methods are still possible.

Reference URLs:

PDBselect: http://bioinfo.tg.fh-giessen.de/pdbselect
SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
DSSP: http://www.embl-heidelberg.de/dssp/
GNF SymAtlas: http://symatlas.gnf.org/Symatlas/
