Biophysics Paper

Embed Size (px)

DESCRIPTION

Physics of protein

Citation preview

  • A Gauge Field Theory of Chirally Folded Homopolymerswith Applications to Folded Proteins

    Ulf H. Danielsson,1, Martin Lundgren,1, and Antti J. Niemi1, 2, 3, 1Department of Physics and Astronomy, Uppsala University, P.O. Box 803, S-75108, Uppsala, Sweden

    2 Laboratoire de Mathematiques et Physique Theorique CNRS UMR 6083,Federation Denis Poisson, Universite de Tours, Parc de Grandmont, F37200, Tours, France

    3Chern Institute of Mathematics, Tianjin 300071, P.R. China

    We combine the principle of gauge invariance with extrinsic string geometry to develop a latticemodel that can be employed to theoretically describe properties of chiral, unbranched homopolymers.We find that in its low temperature phase the model is in the same universality class with proteinsthat are deposited in the Protein Data Bank, in the sense of the compactness index. We apply themodel to analyze various statistical aspects of folded proteins. Curiously we find that it can produceresults that are a very good good match to the data in the Protein Data Bank.

    PACS numbers: 87.15.A- 87.15.Cc 87.14.bd 87.15.ak

    I. INTRODUCTION

    Effective field theory models are often employed andsometimes even with great success, to address compli-cated problems when the exact theoretical principles areeither unknown, or have a structure that is far too com-plex for analytic or numerical treatments. Familiar ex-amples of powerful and predictive effective field theorymodels include the Ginzburg-Landau approach to super-conductivity [1] and the Skyrme model of atomic nuclei[2].

    In polymer physics field theory techniques became pop-ular after de Gennes [3], [4] showed that the self-avoidingrandom walk and theN 0 limit of theO(N) symmetric(2)2 scalar field theory are in the same universality classin the sense of the compactness index. The ensuing fieldtheory approach is very powerful in characterizing criticalproperties of homopolymers. However, to our knowledgethere are no effective field theory models that allow fora detailed description of the geometry of collapsed, chi-ral homopolymers. The goal of the present article is todevelop such a model, in the case of unbranched single-strand homopolymers.

    II. COMPACTNESS INDEX

    We start by recalling the compactness index that de-scribes how the radius of gyration Rg scales in the degreeof polymerization N

    Rg =1

    N

    1

    2

    i,j

    (ri rj)2 LN (1)

    Electronic address: [email protected] address: [email protected] address: [email protected]

    The value of is a universal quantity, in the limit oflarge N [4]. Here ri (i = 1, 2, ..., N) are the locationsof the monomers and L is a form factor that character-izes an effective distance between monomers, it is nota universal quantity. At high temperatures we expectthat quite universally approaches the Flory value [4] 3/5 that corresponds to the universality class of self-avoiding random walk; Monte-Carlo estimates refine thisto 0.588 . . . [5]. On the other hand, for collapsedpolymers we expect to find 1/3. Since coincideswith the inverse Hausdorff dimension of the polymer, thismeans that a collapsed polymer is as compact as ordinarymatter. Finally, between the self-avoiding random walkphase and the collapsed phase we expect to have the -point that describes the universality class of a fully flex-ible chain. At the -point we expect 1/2.

    III. MODEL

    We have found that the scaling law 1/3 of col-lapsed polymers can be computed by the low temperaturefree energy of a discrete version of the two dimensionalAbelian Higgs model with an O(2) U(1) symmetricHiggs field. In its three space dimensional version thismodel was originally introduced to describe superconduc-tivity [1] and it has also found applications in high energyphysics, for example in the description of cosmic strings.We first motivate this model in the present context byconsidering the continuum limit, where the polymer isapproximated by a continuous one-dimensional string.The string is described by the position vector r(s) wherethe parameter s [0, L] measures the distance along thestring that has a total length L. The unit tangent vectorof the string is

    t =dr

    ds.

    arX

    iv:0

    902.

    2920

    v2 [

    cond

    -mat.

    stat-m

    ech]

    26 A

    ug 20

    10

  • 2Together with the unit normal vector n and the unitbinormal vector b = tn we have an orthonormal Frenetframe at each point along the string. In terms of thecomplex combination

    eF = n ibthese three vectors are subject to the Frenet equations[6]

    dt

    ds=

    1

    2(e+F + e

    F ) &

    deFds

    = t ieF

    Here (s) is the extrinsic curvature and (s) is the tor-sion. They specify the extrinsic geometry of the string:Once (s) and (s) are known the shape of the stringcan be constructed by solving the Frenet equations. Thesolution is defined uniquely in R3 up to rigid Galileanmotions.

    The concept of gauge invariance emerges from the fol-lowing simple observation [7]: The vectors n and b spanthe normal plane of the string. But any physical prop-erty of the string must be independent of the choice ofbasis on the normal plane, and instead of eF we couldintroduce another frame which is related to the Frenetframe by a rotation with an angle (s) on the normalplane,

    e+F eie+F e+ .When we substitute this the Frenet equation we concludethat the U(1) rotation redefines

    ei + s (2)

    In the relations (2) we identify the gauge transforma-tion structure of two dimensional Abelian Higgs multi-plet (,Ai). The frame rotation corresponds to a staticU(1) gauge transformation, corresponds to the com-plex scalar field , and corresponds to the spatialcomponent A1 of the U(1) gauge field.

    Since the physical properties of the string are indepen-dent of the choice of a local frame, they must remaininvariant under the U(1) transformation (2). In par-ticular, any effective Landau-Ginzburg energy functionalthat describes a homopolymer and involves the multiplet(, ) (,A1) must be U(1) gauge invariant.

    The following variant of the Abelian Higgs modelHamiltonian [1] is the natural choice for a gauge invariant(internal) Landau-Ginzburg energy functional,

    F =

    L0

    ds{ |(s iA1)|2 + c (||2 2)2 }

    + d

    L0

    dsA1. (3)

    Here the first term is the conventional energy functionalof the Abelian Higgs model including a gauge invariantkinetic and potential terms. For simplicity we choosethe Higgs potential so that it has the canonical quar-tic functional form. When 6= 0 and real valued wehave a spontaneous symmetry breaking with the ensuingHiggs effect, and the ground state of the string acquiresa nonvanishing local curvature. The last term is the one-dimensional version of the Chern-Simons functional [8].We shall find that its presence provides a very simpleexplanation of homochirality, with a negative (positive)parameter giving rise to right-handed (left-handed) chi-rality.

    We determine the thermodynamical properties of (3)from the canonical partition function, defined in the usualmanner by integrating over the fields and A1

    Z =

    [d][dA1] exp{

    0

    d F (,A1)}

    We take the measure to be the canonical measure in the(,A1) space (with appropriate gauge fixing). Alterna-tively we could also introduce the canonical (Polyakov)measure in the coordinate space r, and the two measuresdiffer by a Jacobian factor J [,A] that appears as a cor-rection to the free energy (3),

    F F +L

    0

    ds ln[J ]

    The Jacobian is in general a non-local functional of (s)and A1(s) and we do not have its general form at our dis-posal. But we can expand it in power of the derivativesof these variables: Since the Jacobian is gauge invariant,by general arguments of gauge invariance to lowest non-trivial order in and A1 the result must have the samefunctional as the terms that we have already included in(3). Consequently at the the present level of approxima-tion we strongly suspect that the the only effect of theJacobian would be to renormalize the parameters thatalready appear in (3). It would be very interesting tostudy this issue in more detail.

    IV. DISCRETIZATION

    We now proceed to the discrete lattice version of (3)that we use in our actual computations. We first elimi-nate the explicit gauge dependence by implementing theinvertible change of variables

    ei

    J 12i||2 {2iA1||

    2 ?s+ c.c.}

  • 3where the gauge invariant variable J is called the super-current in the context of superconductivity. The Jaco-bian for this change of variables is . With the identifi-cations and J we then arrive at the followingdiscrete version of the free energy (3) to describe theproperties of general chiral polymers,

    F =

    Ni,j=1

    aij {1 cos[ij(i j)]}

    +

    Ni=1

    {bi

    2i

    2i + ci (2i 2i )2

    }+

    Ni=i

    dii (4)

    The i, j = 1, ..., N label the monomers, and aij , ij andbij are parameters that we have normalized to unity in(3) but now included for completeness; Note that as wewrite it, there is a superfluous overall scale in (4). Thefirst term describes long-distance correlations, it is thediscrete analog of the derivative term of the Higgs fieldin the continuum limit. We have introduced the cosinefunction to tame excessive fluctuations in i. The mid-dle term describes the interaction between i and i, andthe symmetry breaking self-interaction of i. Finally, thelast term is a discretized one-dimensional version of theChern-Simons functional [8] that is the origin of homochi-rality.

    In addition, one could also add the Jacobian thatemerges from the supercurrent change of variables. Wehave tested our model with this Jacobian included andwe have found that it has no essential qualitative conse-quences, thus it will be excluded from the present analy-sis.

    We relate the dynamical variables (i, i) in (4) to thepolymer geometry as follows: The modulus of the Higgsfield i we identify with the signed Frenet curvature ofthe backbone at the site i, and i is the correspondingFrenet torsion. Once the numerical values of i and iare known, the geometric shape of the polymer in thethree dimensional space R3 is obtained by integrating adiscretized version of the Frenet equations. This inte-gration also introduces parameters i, the average finitedistance between the monomers.

    For a general polymer the quantities (aij , ij , bi, ci,i, di) are a priori free site-dependent parameters, anddifferent values of these parameters can be used to de-scribe different kind of monomer structures. Here weshall be interested in the limiting case of homopolymers,where we restrict ourselves to only the nearest neighborinteractions with

    aij =

    {a (i,i+1 + i,i1) (i = 2, ..., N 1)a (i = 1, j = 2) & (i = N 1, j = N)(5)

    and we also select all the remaining parameters to beindependent of the site index i.

    V. NUMERICAL SIMULATIONS

    We have employed (4) to study polymer collapse atlow temperatures using Monte Carlo free energy mini-mization, in the limiting case of a homopolymer whereall the parameters are site-independent; see (5). At eachiteration step of the numerical energy minimization pro-cedure we first generate a new set of values for the curva-ture and torsion (i, i) using the Metropolis algorithm[9] with a finite Metropolis temperature-like parameterTM . We then construct a new polymer configuration bysolving the discrete Frenet equations with a fixed anduniform distance between monomers ,

    |r(si) r(si1)| = i = 2, ..., N. (6)Finally, before accepting the new configuration we ex-clude steric clashes by demanding that the distance be-tween any two monomers in the new configuration satis-fies the bound

    |r(si) r(sj)| z for |i j| 2. (7)Our simulations start from an initial configuration with

    i = i = 0. This corresponds to a straight, untwistedpolymer. Since the initial Metropolis step is determinedrandomly, essentially by a thermal fluctuation, our start-ing point has a large conformational entropy. Conse-quently we expect that statistically our final conforma-tions cover a substantial portion of the landscape of col-lapsed polymers.

    Depending on application we can derive restrictions onthe parameters, for example by comparing the results ofour simulations to the properties of actual polymers. Inthe biologically interesting case where the model is usedto describe statistical properties of folded structures inthe Protein Data Bank (PDB) [10], we would imposethe constraint that in a full 2pi -helix turn there are onaverage about 3.6 monomers (central carbons).

    We have made extensive numerical simulations usingconfigurations where the number N of monomers lies inthe range 75 N 1, 000. For these configurations wetypically arrive at a stable collapsed state after around1,000,000 steps. The folding process takes no more thana few tens of seconds in a MacPro desktop computer,even for the large values of N . But in order to ensure thestability of our final configurations we have extended oursimulations to 22,000,000 steps. Besides thermal fluc-tuations, we observe no essential change in the collapsedstructures after the initial 1,000,000 steps which confirmsthat we have reached a native state.

    VI. COMPARISON WITH PROTEINS

    We have compared the predictions of the homogeneouslimit of the model (4) to the statistical properties of pro-tein structures that have been deposited in the ProteinData Bank. The protein backbones all have an identical

  • 4homogeneous structure. But we recognize that the de-tailed fold of a given protein is presumed to be stronglyinfluenced by the specifics of the interactions that involveits unique amino acid sequence. These include hydropho-bic, hydrophilic, long-range Coulomb, van der Waals, sat-urating hydrogen bonds etc. interactions. Consequentlya given protein should not be approximated by a ho-mopolymer model. However, one can argue that whenone asks questions that relate to the common statisticalproperties of all proteins that are stored in the PDB onecan expect that the inhomogeneities that are due to thedifferent amino acid structures become less relevant andstatistically, in average, these proteins behave very muchlike a homopolymer. We find it interesting to try and seewhether this kind of argumentation is indeed correct.

    In Figure 1 we have placed all single-stranded proteinsthat can be presently harvested from the Protein DataBank, with the number N of central carbons in the rangeof 75 N 1, 000. Using a least square linear fit tothe data we find for the compactness index the valuePDB 0.378 0.0017, which is in line with the resultspreviously reported in the literature [11], [12]. In Figure1 we also show how the compactness index in our modeldepends on N when 75 N 1, 000, using a statisticalsample of 80 runs for each value of N . When we apply aleast square linear fit to our results we find for the com-pactness index the estimate 0.379 0.0081, a some-what surprisingly excellent agreement with the value ob-tained from the Protein Data Bank. Since is a universalquantity, this agreement implies that in the sense of thecompactness index our model resides in the same univer-sality class with proteins in PDB.

    FIG. 1: (Color online) Least square linear fit to the compact-ness index computed in our model ( 0.3790.0081) com-pared with that describing all single strand proteins currentlydeposited at the Protein Data Bank (PDB 0.3780.0017).The error-bars describe standard deviation from the average,and it can be viewed as a measure of conformational entropyin our configurations.

    From the data in Figure 1 we find that our model pre-dicts for the form factor L in (1) the numerical value

    L 2.656 0.049 (A). This compares well with the av-erage value LPDB 2.254 0.021 (A) that we obtainusing a least square fit to the Protein Data Bank data

    displayed in Figure 1.We note that unlike the compactness index , the value

    of L is not a universal quantity and its value can be influ-enced by varying the parameters. The explicit parametervalues that we have used in our simulations are a = 4, = 4.25, b = 0.0005488, c = 0.5, = 24.7, d = 20,and = 3.8, z = 3.7 and we have selected these pa-rameter values by trial-and-error to produce a value forL that is as close as possible to the experimental value.We have verified that by changing the parameter values remains in the vicinity of 0.38 while L can changesubstantially.

    We observe from Figure 1 that the standard devia-tion displayed by our final conformations are comparablein size to the actual spreading of PDB proteins aroundtheir experimentally determined average values. Sincethis standard deviation is a measure of conformationalentropy, we conclude that at each value of N our initialconfiguration appears to have enough conformational en-tropy to cover the entire landscape of native state proteinfolds in PDB.

    We have verified that in our model the value of istemperature independent for a wide range of tempera-tures: The value of is insensitive to an increase in theMetropolis temperature TM until TM reaches a criticalvalue T1. At this critical temperature there is an onsetof a transition towards the -point, and at the -pointwe estimate 0.48 0.49 in line with the expectedvalue 1/2 that characterizes the universality classof a random coil. In the limit of high temperatures wefind 0.65 which is slightly above but in line with theFlory value = 3/5 for a self-avoiding random walk.

    We have also studied the effect of the various operatorsin (4) in determining the universality class:

    We find indications that the value 0.38 is drivenby the presence of the chirality breaking Chern-Simonsterm: When we entirely remove the Chern-Simons termby setting di = 0 while keeping all other parameters in-tact in (4), we find that the compactness index increasesto 0.488 . . . which is very close to the -point value 1/2. This suggests that according to our model thereis some relation between chirality and the transition tothe collapsed phase in the case of homopolymers, thatdeserves to be investigated in more detail.

    When we in addition remove the direct coupling be-tween torsion and curvature by setting bi = di = 0the compactness index remains near its -point value 0.488 . . . .

    Finally, we have observed that when we remove theentire symmetry breaking potential by setting ci = 0 in(4) we find that approaches the value 0.737 . . . .This proposes that we may have a novel universality classin the low temperature phase. Notice that in this casethe local minima of the potential energy are absent andthe discrete symmetry becomes restored at theclassical level.

    At a very high Metropolis temperature TM and whenwe set bi = di = 0 we find that the compactness index,

  • 5as expected, approaches the Flory value 3/5; we now get 0.61 . . . .

    In Figure 2 we show using an example with N = 300,how the compactness index evolves as a function of thenumber of iterations (time), during the first 1,000,000steps. In this figure we also describe how the free energy(4) develops as a function of the iteration steps. We find

    FIG. 2: (Color online) The dotted line shows how the com-pactness index typically evolves as a function of the numberof interation steps (time) when computed as an average overstatistical samples and up to 1,000,000 steps. The continuousline shows similarly how the average energy typically evolvesas a function of the number of interation steps.

    that while generically approaches its asymptotic value 0.38 very rapidly, after only a few thousand itera-tions, the process of energy minimization typically takesabout two orders of magnitude longer. The asymptoticbehaviour of the curves confirms that the final state ishighly stable. The stability is further validated by a com-parison with Figure 1 where we report on results after theiteration process has been continued by 21,000,000 addi-tional steps: For TM < T we find no essential changein the final conformations after N 1, 000, 000 steps,beyond thermal fluctuations.

    While we realize that our Monte Carlo simulation isnot designed to be a reliable method for describing theout-of-equilibrium dynamical time evolution of polymerfolding, we still find it curious that according to Figure 2the process of collapse as described by our model is verymuch like the expected folding process of biological pro-teins: The initial denatured state first rapidly collapsesinto a molten globule, with a large decrease in conforma-tional entropy but only a very small change in the internalenergy. After the initial collapse to the molten globulewith the ensuing formation of secondary structures suchas -helices and -sheets, the process continues with arelatively slow conformational re-arrangement towards alocally stable conformation. The final state has a sub-stantially lower energy than the corresponding moltenglobule state.

    Finally, we have also compared the geometrical shapeof the collapsed configurations as computed in our modelto that of folded proteins using the hierarchical classifi-cation scheme CATH [13]. We find that the geometry

    of the configurations computed using our model are in avery good correspondence with this classification scheme,and they look very much like actual folded proteins. Inparticular, our model appears to produce all the majorsecondary structures of proteins in PDB; See figure 4 fortypical examples.

    FIG. 3: (Color online) On the left two a priori generic ex-amples of folded configurations computed in our model withN = 153. Motifs such as helicity-loop-helicity are clearly vis-ible. On the right, for comparison, are pictures of myoglobin1mbn (above) and 1m6c (below) backbones, constructed usingdata taken from Protein Data Bank.

    VII. SUMMARY

    In summary, we have developed an effective field the-ory model that describes the collapse of a homopolymerin its low temperature phase. We have compared vari-ous properties of our model with the statistical, averageproperties of folded proteins that are stored in the Pro-tein Data Bank. We have found that our model repro-duces the statistical properties of the PDB proteins withsurprisingly good accuracy. For example, it computes ac-curately the compactness index of native state proteinsand correctly describes the phenomenology of protein col-lapse. Furthermore, since the folded states obtained inour model are also in line with the CATH classificationscheme, it appears that our model has promise for a toolto analyze the statistical properties of folded proteins.

    Our research is supported by grants from the SwedishResearch Council (VR). The work by A.J.N is also sup-ported by the Project Grant ANR NT05-142856. A.J.N.thanks H. Orland for discussions and advice. We allthank M. Chernodub for discussions, and N. Johanssonand J. Minahan for comments. A.J.N. also thanks T.Gregory Dewey for communications. A.J.N. thanks theAspen Center for Physics for hospitality during this work.

  • 6[1] P.G. De Gennes, Superconductivity of Metals and Alloys(Westfield Press, New York 1995)

    [2] I. Zahed and G.E. Brown, Physics Reports 142 (1986)1; T. Gisiger and M.B. Paranjape, Physics Reports 306(1998) 109

    [3] P.G. De Gennes, Physics Letters 38A (1972) 339[4] P.G. De Gennes, Scaling Concepts in Polymer Physics

    (Cornell University Press, Ithaca, 1979)[5] B. Li, N. Madras and A. Sokal, Journal of Statistical

    Physics 80 (1995) 661; N. Madras and G. Slade, TheSelf-Avoiding Walk (Birkhauser, Berlin, 1996)

    [6] M. Spivak, A Comprehensive Introduction to DifferentialGeometry Volume Two (Publish or Perish, Inc.. Houston1999)

    [7] A.J. Niemi, Physical Review D67 (2003) 106004[8] S.-S. Chern, J. Simons, Annals of Mathematics 99 (1974)

    48[9] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H.

    Teller, E. Teller, Journal of Chemical Physics 21 (1053)1087

    [10] H.M. Berman et.al., Nucleic Acids Research 28 (2000)235

    [11] L. Hong and J. Lei, http://arxiv.org/abs/0711.3679v1[12] T.G. Dewey, Journal of Chemical Physics 98 (1993) 2250[13] A.L. Cuff et.al., Nucleic Acids Research (Advance Access

    published on November 7, 2008)

    I introductionII Compactness indexIII modelIV discretizationV numerical simulationsVI comparison with proteinsVII summary References