
Loop closure and molecular shapespaces

Evangelos Coutsias, Dept of Mathematics and Statistics,

University of New Mexico

IMA-1/15/08

Hybrid Loop Closure

Sampling protein loops with Sobol’s quasirandom algorithm, using a combination of inverse kinematic and constraint methods

Applications to loop modeling, exhaustive conformational covering of ring molecules, the assembly of helical proteins, Concerted Move MC, ...

Many versions of analytic loop closure

• Go & Scheraga (1970) (transcendental)
• Lee & Liang (Mech. Mach. Theory, 1988) (polynomial, efficient 16x16 resultant, arbitrary chain)
• Wedemeyer & Scheraga (1999) (polynomial, planar tripeptide, Sylvester (24x24) resultant etc.)
• Coutsias, Seok, Jacobson, Dill (JCC, 2004) (polynomial, general tripeptide/arbitrary loop, Sylvester resultant)
• Coutsias, Seok, Wester, Dill (IJQC, 2005) (polynomial, arbitrary structure, Dixon (16x16) resultant)
• many others....

base; end effector: position and orientation are prescribed

arbitrary chain

rotatable arm (R-joint)

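Effect of changing one torsion grows with distance from the torsional axis $\Gamma_i$:

$dR_j = \Gamma_i \times (R_j - R_i)\, dt_i$

Combined effect of several torsions, with $R_{ij} = R_j - R_i$:

$dR_j = \sum_{i} \Gamma_i \times R_{ij}\, dt_i$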

4 torsions can be changed together so that their combined effect cancels at a given point

fixed end

$dR_j = \sum_{i=1}^{4} \Gamma_i \times R_{ij}\, dt_i$

$a_1 u_1 + a_2 u_2 + a_3 u_3 + a_4 u_4 = 0$

Given 4 vectors in 3-space, there is always a way to combine them so that they add up to zero, while the coefficients do not all vanish:

we can always combine 4 rotations so that they leave a given point invariant in 3-space.

$\begin{bmatrix} u_1^j & u_2^j & u_3^j & u_4^j \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} = 0$

System of 3 equations (the u’s are 3-vectors) in 4 unknowns (the a’s): a nontrivial solution (null vector) is guaranteed.
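The null-vector argument above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `cancelling_coefficients` and the random test vectors are illustrative assumptions, not part of the original slides. It finds coefficients a_1..a_4 for four 3-vectors via the singular value decomposition.

```python
import numpy as np

def cancelling_coefficients(u):
    """Return a unit-norm vector a with u @ a = 0 for a 3x4 matrix u.

    Hypothetical helper: the columns of u stand in for the four 3-vectors u_1..u_4.
    """
    # For a 3x4 matrix the last right-singular vector lies in the null space.
    _, _, vt = np.linalg.svd(u)
    return vt[-1]

rng = np.random.default_rng(0)
u = rng.standard_normal((3, 4))      # four arbitrary 3-vectors as columns
a = cancelling_coefficients(u)
residual = np.linalg.norm(u @ a)     # should be at round-off level
```

The SVD is used here only because it is a robust way to obtain a null vector; any other null-space computation would serve equally well.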

$dR_j = \sum_{i} \Gamma_i \times R_{ij}\, dt_i$

(figure labels: $\Gamma_3$, $j+1$, $j+2$)

Although atom j is now fixed, the chain past j rotates about an effective axis through j

base; end effector

The end triad is moved due to change in the j-th torsion

$R_{N-2}$, $R_{N-1}$, $R_N$

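Using six “adjustor” torsions cancels the effect of changing the j-th torsion, keeping the end triad $R_{N-2}$, $R_{N-1}$, $R_N$ (and all subsequent atoms) fixed in space.

$dR_{N-2} = \sum_{i} \Gamma_i \times R_{i,N-2}\, dt_i = 0$

$dR_{N-1} = \sum_{i} \Gamma_i \times R_{i,N-1}\, dt_i = 0$

$dR_{N} = \sum_{i} \Gamma_i \times R_{i,N}\, dt_i = 0$

$t_i = B_i(t_7, \ldots, t_N), \quad i = 1, \ldots, 6$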

Loop closure conditions (arbitrary indexing)

One could determine the values of the adjustor torsions that cancel the change in the j-th torsion by integrating a system of differential equations.

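$\dfrac{\partial t_i}{\partial t_k} = B_{ik}, \quad i = 1, \ldots, 6; \qquad \dfrac{\partial R_j}{\partial t_i} = v_{ij}, \quad v_{ij} = \Gamma_i \times R_{ij}$

$\Delta R_j = \sum_{k} \left\{ v_{kj} + \sum_{i=1}^{6} v_{ij}\, \dfrac{\partial t_i}{\partial t_k} \right\} \Delta t_k$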

This applies to all atoms, including the end triad where displacement must vanish

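$\Delta R_j = 0; \quad j = N-2,\ N-1,\ N$

$\begin{pmatrix} \Gamma_1 \times R_{1,N-2} & \cdots & \Gamma_6 \times R_{6,N-2} \\ \Gamma_1 \times R_{1,N-1} & \cdots & \Gamma_6 \times R_{6,N-1} \\ \Gamma_1 \times R_{1,N} & \cdots & \Gamma_6 \times R_{6,N} \end{pmatrix} \begin{bmatrix} \partial t_1 / \partial t_k \\ \vdots \\ \partial t_6 / \partial t_k \end{bmatrix} = \begin{bmatrix} -\Gamma_k \times R_{k,N-2} \\ -\Gamma_k \times R_{k,N-1} \\ -\Gamma_k \times R_{k,N} \end{bmatrix}$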

Kinematic Jacobian

The rates of change of the “adjustor” torsions with respect to an arbitrary (sampled) torsion can be found by solving such systems. Here, except for degenerate arrangements, the six columns are independent. But there are 9 rows. The system is solvable, because there are 3 relations among those (“left null vectors”), due to the invariance of the distances among the end triad atoms.
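A small numerical sketch of this 9x6 system follows, assuming NumPy. The random axes, the random end triad, and the names `velocity` and `adjustor_rates` are illustrative assumptions rather than data or code from the talk. Each column stacks the velocities Γ_i × (R_j − R_i) of the three end-triad atoms, and the right-hand side is minus the velocity induced by the sampled (driver) torsion.

```python
import numpy as np

def velocity(axis_point, axis_dir, atom):
    """Velocity of `atom` per unit rotation about the axis through axis_point."""
    return np.cross(axis_dir, atom - axis_point)

def adjustor_rates(axis_points, axis_dirs, driver_point, driver_dir, triad):
    """Solve the overdetermined 9x6 system for d t_i / d t_k, i = 1..6."""
    # One column per adjustor torsion; rows are the 3 triad atoms x 3 coordinates.
    A = np.column_stack([
        np.concatenate([velocity(p, g, r) for r in triad])
        for p, g in zip(axis_points, axis_dirs)
    ])
    b = -np.concatenate([velocity(driver_point, driver_dir, r) for r in triad])
    rates = np.linalg.lstsq(A, b, rcond=None)[0]
    return rates, np.linalg.norm(A @ rates - b)

rng = np.random.default_rng(1)
axis_points = rng.standard_normal((6, 3))            # hypothetical axis positions
axis_dirs = rng.standard_normal((6, 3))
axis_dirs /= np.linalg.norm(axis_dirs, axis=1, keepdims=True)
driver_point = rng.standard_normal(3)
driver_dir = np.array([0.0, 0.0, 1.0])
triad = rng.standard_normal((3, 3))                  # hypothetical end-triad atoms

rates, residual = adjustor_rates(axis_points, axis_dirs,
                                 driver_point, driver_dir, triad)
```

Because every column and the right-hand side are infinitesimal rigid motions of the triad, the least-squares residual should vanish to round-off for generic (non-degenerate) axes, which is the solvability statement made above.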

$|R_i - R_j| = \text{const},\ i \neq j \quad\Rightarrow\quad d\,|R_i - R_j|^2 = 2\,(R_i - R_j) \cdot (dR_i - dR_j) = 0$

Solvability conditions for rate-of-change system: 3 relations for i,j=N-2,N-1,N

locked hinge

free hinge

free torsion

pivot torsion

virtual torsion

Our New View of Loop Closure

Old View: 6 rotations / 6 constraints; transcendental equation; in the original frame

New View: 3 rotations / 3 constraints; polynomial equation for a more general case; in the new frame

Representation of Loop Structures

Coutsias, Seok, Jacobson, Dill, JCC 2004

In the body frame of the three carbons, the anchor bonds lie in cones about the fixed base.

Given the distance and angle constraints, three types of virtual motions are encountered in the body frame.

With the base $C^\alpha_{n-1} - C^\alpha_{n+1}$ and the lengths of the two peptide virtual bonds fixed, the vertex $C^\alpha_n$ is constrained to lie on a circle.

Tripeptide Loop Closure

Bond vectors fixed in space

Fixed distance

Peptide axis rotation: with the two end carbons fixed in space, the peptide unit rotates about the virtual bond.

$\sigma = \tau + \delta$

Transfer function for concerted rotations: rotations of two peptide units about their respective virtual bonds couple to preserve the bond angle at the Cα atom:

$\tau = F_{\pm}(\sigma;\ \alpha, \theta, \xi, \eta)$

The 4-bar spherical linkage (angles $\tau$, $\sigma$, $\xi$, $\theta$, $\eta$)

Equation of the Tetrahedral Angle
Raoul Bricard, 1897 (study of flexible octahedra)

In terms of the half-angle tangents $u$, $w$ of the two rotation angles $\sigma$, $\tau$:

$A u^2 w^2 + B w^2 + C u w + D u^2 + E = 0$

where the coefficients $A, B, C, D, E$ are trigonometric functions of the fixed angles $\alpha$, $\theta$, $\xi$, $\eta$.
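Because the tetrahedral equation is quadratic in each half-tangent separately, the partner values of one variable for a given value of the other follow from the quadratic formula. The sketch below assumes the biquadratic form A u²w² + B w² + C uw + D u² + E = 0; the numerical coefficients are made-up illustrative values, not coefficients of any real molecule, and `tetrahedral_partners` is a hypothetical helper name.

```python
import math

def tetrahedral_partners(u, A, B, C, D, E, tol=1e-12):
    """Real solutions w of A u^2 w^2 + B w^2 + C u w + D u^2 + E = 0 for fixed u."""
    a = A * u * u + B          # coefficient of w^2
    b = C * u                  # coefficient of w
    c = D * u * u + E          # constant term
    if abs(a) < tol:           # degenerate case: equation is linear in w
        return [] if abs(b) < tol else [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0.0:             # no real solution: the linkage cannot close at this u
        return []
    root = math.sqrt(disc)
    return [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]

def residual(u, w, A, B, C, D, E):
    """Left-hand side of the biquadratic, used to check a solution."""
    return A * u * u * w * w + B * w * w + C * u * w + D * u * u + E

coeffs = (1.0, 2.0, -6.0, 1.0, 1.0)     # illustrative values only
ws = tetrahedral_partners(1.0, *coeffs)
no_solution = tetrahedral_partners(1.0, 1.0, 2.0, 0.0, 1.0, 1.0)
```

The two roots correspond to the ± branches of the transfer function; an empty list marks a value of u outside the feasible range.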

Bricard Flexible Octahedra

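Constraints: enforced as relations among the 3 torsions, one tetrahedral (biquadratic) equation for each pair of half-tangents:

$A_{22} u_1^2 u_2^2 + A_{20} u_1^2 + A_{11} u_1 u_2 + A_{02} u_2^2 + A_{00} = 0$

$B_{22} u_2^2 u_3^2 + B_{20} u_2^2 + B_{11} u_2 u_3 + B_{02} u_3^2 + B_{00} = 0$

$C_{22} u_3^2 u_1^2 + C_{20} u_3^2 + C_{11} u_3 u_1 + C_{02} u_1^2 + C_{00} = 0$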

General 6-member ring (Octahedron)

6C 3τ1

slightly perturbed 6-member ring may assume 4 distinct conformations

Canonical cyclohexane

DCEEACCEAACE

FABABC

5.3260

==<=<=<

==<==<

( ) ( )( ) ( )( )

( ) ( )γαγγ

αγγαγ

++=−−=

−==−+=

coscoscos1

coscoscoscos

$G_i(u_i, u_{i+1}) = A_i u_i^2 u_{i+1}^2 + B_i \left(u_i^2 + u_{i+1}^2\right) + 2 C_i u_i u_{i+1} + E_i = 0$

$G(u_1, u_2) = 0, \quad G(u_2, u_3) = 0, \quad G(u_3, u_1) = 0$

Chair:

Twist Boat:

⎟⎟⎠

⎞⎜⎜⎝

⎛ +−±=

−====

βγτβαβαγ

sinsincossinsincoscoscoscos

,,3,2,1,tan2

ststiu

rigid, comes from double root of tetrahedral equation

For flexible solutions to exist, the tetrahedral equation must be satisfied by arbitrary triplets! This is not an obvious fact.

$G(u, v) = A u^2 v^2 + B\left(u^2 + v^2\right) + 2 C u v + E = 0$

Bricard Conjecture

For the system of biquadratics to have a continuum of solutions, it must be derivable via successive elimination, from the equations:

$l\, u_2 u_3 + m\, u_3 u_1 + n\, u_1 u_2 + p = 0$

$L u_1 + M u_2 + N u_3 + P u_1 u_2 u_3 = 0$

easily seen to be a sufficient condition, not obvious that it is necessary (and not proven)!

Sufficient conditions for flexibility

The system of biquadratics can be reduced to Bricard’s sufficient condition if:

BCBAEDB

22 −=

It is easy to see that the cyclohexane coefficients satisfy these relations; therefore the three equations in three variables are equivalent to two equations in three variables!

Conformation circle for cyclohexane twist-boat. As u spans the feasible range, so do v and w. Any triplet (u,v,w) corresponds to a molecular conformation.

The conformation space of a flexible octahedron is composed of circles

A. Bushmelev & I. Sabitov

$x = a^2 + b^2 - 2ab\cos(\alpha + \beta), \qquad x = a^2 + b^2 - 2ab\cos(\alpha - \beta)$

The extreme values of x are found when M4 lies on the plane of M1, M2, M3. If the range for the upper half overlaps with that of the lower, there are solutions.

case 2: one circle

cases 3 (resp. 4): 2 circles, fused at 1 (resp. 2) points

case 3: 2 circles

1τ2τ

1δ2δ

( ) ( )

2/tan,1

0023201311

0022203211

0021202111

CvCuvCuCvC

BvBuvBuBvB

AvAuvAuAvA

δ=ΔΔ−

( )( )( )

( ) ( ) ( )

( ) ( ) ( ) 0

301312

302312232

0001202

1102112212

2120221

=+++++

uCuuCuuC

uBuuBuuB

AuAuAuAuAuA

uAuAuA

Three constraint equations: three generalized angles

Elimination in Polynomial systems

• Pairwise elimination: Sylvester resultant
• Simultaneous elimination: Dixon resultant
• Both methods lead to a single polynomial in one of the variables: 16th degree
• Differ in complexity and numerical accuracy
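As a toy illustration of resultant-based elimination (not the 16th-degree loop closure polynomial itself), the sketch below builds the Sylvester matrix of two univariate polynomials with NumPy. Its determinant, the resultant, vanishes exactly when the two polynomials share a root. The function name `sylvester` and the example polynomials are assumptions chosen for illustration.

```python
import numpy as np

def sylvester(p, q):
    """Sylvester matrix of polynomials p, q given as coefficient lists (highest degree first)."""
    m, n = len(p) - 1, len(q) - 1          # degrees of p and q
    S = np.zeros((m + n, m + n))
    for i in range(n):                     # n shifted copies of p
        S[i, i:i + m + 1] = p
    for i in range(m):                     # m shifted copies of q
        S[n + i, i:i + n + 1] = q
    return S

p = [1.0, -3.0, 2.0]           # (x - 1)(x - 2)
q_shared = [1.0, -5.0, 6.0]    # (x - 2)(x - 3): shares the root x = 2 with p
q_disjoint = [1.0, -7.0, 12.0] # (x - 3)(x - 4): no common root with p

res_shared = np.linalg.det(sylvester(p, q_shared))
res_disjoint = np.linalg.det(sylvester(p, q_disjoint))
```

In the multivariate setting the entries of such a matrix are themselves polynomials in the remaining variable, which is what leads to the eigenproblem formulation discussed in the talk.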

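Dixon resultant: details

Dixon polynomial:

$d(u_1, u_2, v_1, v_2) = \dfrac{\det C}{(u_1 - v_1)(u_2 - v_2)}, \qquad C = \begin{bmatrix} p_1(u_1, u_2) & p_2(u_1, u_2) & p_3(u_1, u_2) \\ p_1(v_1, u_2) & p_2(v_1, u_2) & p_3(v_1, u_2) \\ p_1(v_1, v_2) & p_2(v_1, v_2) & p_3(v_1, v_2) \end{bmatrix}$

Vanishes at all common roots of the original system.

Elimination matrix:

$d(u_1, u_2, v_1, v_2) = V^T D\, U$, where $U$ is the vector of monomials in $u_1, u_2$ and $V$ is the vector of monomials in $v_1, v_2$.

Dixon matrix: coefficients of the Dixon polynomial, arranged by monomials in the u and v variables.

If u1, u2 are roots of the original system, U becomes a right null vector. Therefore the vanishing of det D is a necessary condition for the existence of a common root.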

[Matrix display: block structure of the Dixon matrix D, with zero blocks and blocks built from the coefficient sets A, B, C and S]

The Dixon resultant contains an extraneous factor (deg. 16). Coutsias, Seok, Wester, Dill, IJQC 2005

[Matrix display: reduced matrix with blocks L, M, N and S, whose entries are polynomials in $u_3$]

Eliminating the extraneous factor there results a matrix polynomial of degree 2; its determinant can be computed with a companion matrix of size 16X16. Then the real eigenvalues give u3 for all possible conformations. Values for the other two variables are found from the appropriate eigenvector components

Successive elimination (Sylvester resultant method) results in a matrix of size 6X6 but with quartic coefficients, leading to a companion matrix of size 24X24 (but still a 16th degree polynomial)

QZ or Characteristic Polynomial

Either:

• (1) Solve generalized eigenproblem using QZ algorithm (cubic order in matrix size, here approx. 16^3 ~ 4 kflops)

Or:

• (2a) Use Lagrange’s expansion in complementary minors to efficiently compute the characteristic polynomial of the Dixon resultant (~2.2 kflops).

• (2b) Use Sturm sequences (count number of real zeros between two values) for an efficient computation of real zeros from characteristic polynomial (bisection/Newton: variable, but low, cost)
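The step from a degree-2 matrix polynomial to an eigenproblem can be sketched as follows, assuming NumPy. This is a simplified stand-in: it forms the standard companion linearization and then inverts the leading block instead of calling a QZ routine, which is only reasonable when the leading matrix M2 is well conditioned. The 2x2 diagonal matrices are a made-up example with known roots, and `quadratic_matrix_eigs` is a hypothetical name.

```python
import numpy as np

def quadratic_matrix_eigs(M0, M1, M2):
    """Eigenvalues lam with det(M2 lam^2 + M1 lam + M0) = 0 via companion linearization."""
    n = M0.shape[0]
    I, Z = np.eye(n), np.zeros((n, n))
    # With z = [x; lam x]:  A z = lam B z  <=>  (M2 lam^2 + M1 lam + M0) x = 0
    A = np.block([[Z, I], [-M0, -M1]])
    B = np.block([[I, Z], [Z, M2]])
    # Assumption: B is invertible; a QZ solver would avoid this inversion.
    return np.linalg.eigvals(np.linalg.solve(B, A))

# Diagonal test case: (lam - 1)(lam - 2) and (lam - 3)(lam - 4)
M2 = np.eye(2)
M1 = np.diag([-3.0, -7.0])
M0 = np.diag([2.0, 12.0])

lam = quadratic_matrix_eigs(M0, M1, M2)
real_roots = sorted(lam.real[np.abs(lam.imag) < 1e-9])
```

For an 8x8 matrix polynomial of degree 2 the same construction gives the 16x16 companion problem mentioned above; only the real eigenvalues correspond to loop conformations.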

Rotate segments 1,2 about resp. virtual axes by angles τ1, τ2

Rotate entire loop about virtual axis 3 by angle τ3

Choose values of the τ1,τ2,τ3 to fix bond angles θ at 3 junctions

Method calculates all (0-16) possible loop configurations that bridge a given gap.

The anchor bonds define the two ends of the loop.

Triaxial Loop Closure

10 solutions from RNA modeling

Extensions to More General Cases

Longer Loop Closure

General Loop Sampling

Reconstruction Procedure:

1. Rotate about axis 3 by angle $\tau_3$ to place the triangle into space. Compute local space axes.

2. Bring objects 1, 2 into their own body frame, so that the corresponding C atoms are located in the positive-X direction (while the Z-axis is the body axis).

3. Rotate each of 1, 2 about its body axis by $\tau_1$, $\tau_2$, respectively.

4. Place 1, 2 in space, bringing each set of body axes into coincidence with the respective space axes.
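Each step of the procedure above is a rigid rotation about a known axis. A minimal sketch of that primitive, using Rodrigues' rotation formula with NumPy, is given below; the function name `rotate_about_axis` and the test point are illustrative assumptions, not the implementation used in the talk.

```python
import numpy as np

def rotate_about_axis(point, axis_point, axis_dir, angle):
    """Rotate `point` by `angle` (radians) about the line through axis_point along axis_dir."""
    k = np.asarray(axis_dir, dtype=float)
    k = k / np.linalg.norm(k)                      # unit vector along the axis
    v = np.asarray(point, dtype=float) - axis_point
    # Rodrigues' formula: v cos(a) + (k x v) sin(a) + k (k . v)(1 - cos(a))
    v_rot = (v * np.cos(angle)
             + np.cross(k, v) * np.sin(angle)
             + k * np.dot(k, v) * (1.0 - np.cos(angle)))
    return axis_point + v_rot

origin = np.zeros(3)
z_axis = np.array([0.0, 0.0, 1.0])
rotated = rotate_about_axis(np.array([1.0, 0.0, 0.0]), origin, z_axis, np.pi / 2)
```

Applying this function to every atom of a segment, with the virtual bond as the axis, would rotate the whole segment rigidly by the chosen angle.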

α2C 3τ−

2τ1τ

2GCrank

Follower

Two-revolute, two-spherical-pair mechanism

Lee & Liang, Mech. Mach. Theory, 1988

Lee-Liang

Closure

S. Pollock, Undergrad. thesis, UNM ‘07


“The Mt. Everest of kinematics”

A. Freudenstein

The formulation is quite involved; however it eventually leads to a generalized eigenproblem of identical form as the one found in the Triaxial case with the Dixon resultant, so the solution from that point on is identical for the two methods. Key difference is the considerable additional overhead for carrying out the Hartenberg-Denavit transform. The numerical computation in this form is approximately 3 times slower than for the Triaxial-Dixon algorithm (comparison in progress).

1θ 8θ

5θ4θ

3θ2θ

Comparison of algorithms: exhaustive covering of the conformational space of cyclooctane. Two torsions set to arbitrary values, other 6 determined to satisfy closure.

Pollock & Coutsias, (preprint, 2007)

Low res. plots of solution no. and energy (only lowest energy state shown, no solvation)

energy; no. of solutions

AutoEncoder

IsoMap

Cyclooctane as Benchmark for Automated Dimensionality Reduction

• Automated dimensionality reduction algorithms offer the potential for visualization of high dimensional energy landscapes and more efficient sampling in surrogate spaces for general problems in protein folding, molecular recognition, self assembly, etc.

• Examples:
  – Principal Component Analysis (PCA)
  – Locally Linear Embedding (LLE)
  – IsoMap
  – Autoencoder Neural Network (AE)

• How well do they perform on cyclooctane?

with M. Brown, J.-P. Watson, Sandia Nat. Labs

Automated Dimensionality Reduction Must Be Able To…

1. Identify the dimensionality of a manifold
2. Provide a forward map into low dimensional space (Visualization)
3. Provide a reverse map into high dimensional space (Recover meaningful structure from low-dimensional samples)
4. Reconstruction error: error between original structure and structure obtained following forward and reverse mapping.

Results

• Can obtain successful reduction from D=72 (with hydrogens), 24, or 8 (dihedral angles) to ~3-5, but not 2.

3-D Embedding from Isomap

Manifold Tearing?

• Cannot guarantee a smooth embedding of an n-manifold in n dimensions
• For the problems of interest, it may not be necessary to ensure a strict preservation of topology as long as the reconstruction error is minimized.
• Tearing the manifold or considering only a subset of charts at a time may allow for a more efficient encoding
  – Self organizing maps or methods for automating the tearing might be beneficial

Examples

• By restricting 1 of the independent dihedral angles to the (0, pi/2) domain, or by using a single molecular dynamics trajectory, a successful reduction to 2 dimensions is obtained:

Completeness/efficiency issues

• Complete sampling: use perturbation theory of eigenvalues to ensure sampling at sufficient resolution to detect all possible bifurcations and crossings

• Efficiency: .5 degree grid sampling requires ~1hr CPU time on a 2.8GHz Xeon processor (compared to 4 months for calculation by Ros et al JCC 2007, using distance geometry)

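$Ax = \lambda Bx \;\Rightarrow\; \beta A x = \alpha B x, \qquad \lambda = \alpha / \beta$

$\tilde{A} = A + E,\ \tilde{B} = B + F \;\Rightarrow\; \tilde{A}\tilde{x} = \tilde{\lambda}\tilde{B}\tilde{x} \;\Rightarrow\; \tilde{\beta}\tilde{A}\tilde{x} = \tilde{\alpha}\tilde{B}\tilde{x}$

$\langle \tilde{\alpha}, \tilde{\beta} \rangle = \langle y^H \tilde{A} x,\ y^H \tilde{B} x \rangle + O(\varepsilon^2)$

where $x$ and $y$ are the right and left eigenvectors of the unperturbed problem.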

Stewart & Sun, Matrix Perturbation Theory, Academic Press, 1990

Numerical Sensitivity Issues

• Numerical errors accumulate roughly like the square root of the number of algebraic operations
• Setting up the resultant for the LL algorithm involves approx. 4x the amount of computation as for the TLC algorithm
• The two resultants have similar forms, with nonzero entries in the same locations, so that the generalized eigenproblems involve identical computations
• The sensitivity of the eigenproblem depends on the separation of eigenvalues/eigenvectors
• Near degeneracies, the sensitivity in the eigenproblem serves as an amplifier of numerical errors in the matrix elements

Dangerous territory: in a broad sampling need to tread carefully for trouble may be lurking!

Bricard curve

1st chaotic ridge, in 6-8 projection

1st chaotic ridge, shown in t5-t8 projection, is pathological for both TLC and LL algorithms

10^-5 stepsize

Cyclooctane conformation analysis (under the hood!)

• The TLC calculations employ 3 equations and use t5, t8 as control parameters

• The LL computations employ additionally descriptions in terms of (t6,t8) and (t4,t8), not possible for TLC

• The various descriptions are equivalent, as alternative 2d projections of a 2-parameter surface in 8 dimensional space.

• The phenomena seen can be more easily explained by employing a 4-equation generalization of TLC, using as control parameters the angles (a1, a2) which serve to completely characterize the quadrilateral formed by atoms (1-3-5-7)

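$\tau_i = \tan(z_i/2), \qquad \sigma_i = \tan(s_i/2)$

$\sigma_i = \dfrac{\tau_i + \Delta_i}{1 - \Delta_i \tau_i}, \qquad \Delta_i = \tan(\delta_i/2)$

$s_4 = z_4 - \delta$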

( ) ( ) 12/tan2/tan2cos0

2/tan,1

2212142

1241431

2232322

1221211

≤=≤

=ΔΔ+

Δ−=

ααδ

σσσσσσ

ττττττ

σσσσσσ

ττττττ

πααα

γπβ

βαγβ

αγβαγ

≤+≤

−−=−=

=−=+−=

2coscossin2

coscos2coscos

The coefficients for cyclooctane; the canonical values are γ=115, β=32.5 deg.

Determining backbone torsions: spherical trigonometry

cossinsincoscoscossinsincoscoscossinsincoscoscossinsincoscos

βγβγτβαβαβγβγτβαβα

−=−−=−

Canonical cyclooctane: quasi-planar conformation

5.32817

90571135357713115812123

==<=<==<=<

==<==<

( ) ( ) ( ) ( )( ) ( ) ( )( )

( ) ( )γγγ

γπγγγγπγ

sincoscos1

cos2/coscossincos2/coscos

−=−−=

=−==+=−+=

Square scaffold conformation

( ) ( )

:021sin12cos

τττττ

ττττγττττττγ

→→→→

=−−++−++cyclic

$\alpha = 115^\circ, \quad \beta = \delta = 32.5^\circ, \quad \gamma = 90^\circ$

Flexible conformation

{ } [ ]( ) 02

,,,,,,,2222

−−−−==

ECuvvuBvAu

vvuuvvuut ii

When the 1-3-5-7 frame forms a square, t6=-u and t8=v lie on a Bricard curve and atoms 2,4,6,8 may rotate freely about the square frame. Two eigenvalues coincide, leading to the sensitive behavior. This coincidence is accidental, not associated with merger and disappearance of solutions, i.e. not associated with a boundary where solution number changes

Bricard curve

( )( )2/tan

The 1st chaotic ridge is a Bricard curve, shown here in terms of the half-tangents of the control parameters

2nd chaotic ridge, (6-8). Inaccurate solutions found.

another accidental degeneracy, with no simple geometrical interpretation

$10^{-6}$

$10^{-7}$

Hybrid Algorithm

• Triaxial Loop Closure (paired axes) preferable when applicable: faster, more robust
• Lee-Liang (independent axes) loop closure: more general, employed when necessary
• Similar resultant: equivalent solution as eigenproblem or polynomial (new formulation)
• Automatic switching between two methods
• Additional algebraic safeguards against accidental degeneracies (optimal choice of solution variable, nongeneric flexibility detection)

locked hinge

free hinge

free torsion

locked hinge

free hinge

virtual torsions

constraints

virtual axes

5 coupled tetrahedrals => 2*2^5 = 64 possible solutions

example: additional backbone distance constraints

locked hinge

free hinge

first loop

Example: Sampling a contact: sample triad positions and double loop closure

second loop


Sobol sampling

A quasirandom sampling scheme that retains nearly uniform density in an n-dimensional box for arbitrary resolution
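To illustrate what a quasirandom (low-discrepancy) sequence looks like in code, the sketch below generates a Halton sequence in pure Python. Note the substitution: Halton is a simpler relative of the Sobol sequence and is used here only as a stand-in, since a correct Sobol generator needs direction-number tables. The function names are illustrative assumptions.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer index in the given base."""
    result, f = 0.0, 1.0 / base
    while index > 0:
        result += f * (index % base)   # next digit, mirrored about the radix point
        index //= base
        f /= base
    return result

def halton(n_points, primes=(2, 3)):
    """First n_points of the Halton sequence in the unit box (one prime base per dimension)."""
    return [[radical_inverse(i, b) for b in primes] for i in range(1, n_points + 1)]

points = halton(8)   # 8 points in the unit square
```

As with Sobol points, extending the sequence adds points that fill the gaps left by earlier ones, so the density stays nearly uniform at every resolution.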

seed = 1 to 100 +

seed = 1 to 100 + , seed = 101 to 200 +

seed = 1 to 100 + , seed = 101 to 200 o

seed = 1 to 1000 + , seed = 9001 to 10000 +

seed = 1 to 10000 +

General

Proline

Glycine

Ramachandran regions

Ramachandran region sampling

• Sobol samples hypercubes
• Dedicating two dimensions per residue would “waste” ¾ of sample points at the most unlikely areas (probability < 5%)
• Simple: 95% probability region for each residue is pixelated at 5 deg squares, and laid out in a linear array. Each residue is assigned 1 Sobol dimension
• Advanced: each pixel in 95% region stretched according to probability measure
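The “simple” scheme above can be sketched as a lookup from one quasirandom coordinate to a pixel of the allowed region. The code below is a hedged illustration: the pixel list is a made-up toy example (not a real Ramachandran map), and `torsions_from_unit_sample` is a hypothetical helper that returns the pixel center.

```python
def torsions_from_unit_sample(u, pixels, pixel_deg=5.0):
    """Map u in [0, 1) to the center (phi, psi) of one pixel of the allowed region.

    `pixels` is a linear array of (phi_low, psi_low) corners of 5-degree squares.
    """
    idx = min(int(u * len(pixels)), len(pixels) - 1)   # guard against u == 1.0
    phi_low, psi_low = pixels[idx]
    return phi_low + pixel_deg / 2.0, psi_low + pixel_deg / 2.0

# Toy allowed region: two helical and two extended pixels (illustrative values only)
toy_pixels = [(-65.0, -45.0), (-60.0, -45.0), (-120.0, 130.0), (-115.0, 130.0)]

first = torsions_from_unit_sample(0.0, toy_pixels)
third = torsions_from_unit_sample(0.6, toy_pixels)
```

With this layout a single sample coordinate per residue selects a (phi, psi) pair, so no samples are spent on pixels outside the allowed region.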

Applications

• Modeling of ring molecules
• Loop refinement
• Complementing Rosetta Relax algorithm for sampling loop and helix variability
• A simple algorithm for Helical Protein Assembly using maximal hydrophobic packing

refinement: CASP7 target T308

best model

native best 5 among 100 TLC samples

preliminary result

Coutsias, Jacobson, Dill (2007)

Sampling loops and repositioning helices in Rosetta

Mandell, Coutsias, Kortemme (2007, work in progress)

Can apply to longer loops by sampling dihedrals and applying closure

Can also perturb fragments like helices and close surrounding loops

Current Backbone Flexibility in Rosetta: Relax

Ensemble generated using Rosetta Relax from a single PDZ domain

Natural structural variation of a family of PDZ domains

Single PDZ domain with ligand

Fragment insertion + local backbone moves

Loop closure complements fragment insertion

• Combining loop closure with fragment insertion might yield even more realistic ensembles

[Plots: variation per residue C-alpha (residues 1-29). Panels: Rosetta Relax vs. Natural; Rosetta Loop closure vs. Natural. Secondary structure labels: sheets, helix]

Triaxial Loop Closure - An Efficient Method for Sampling Backbones

Modeling the antigen-binding antibody H3 loop

Apo conformation starting from Holo: using Rosetta Relax; using triaxial loop closure (D. Mandell, Kortemme lab, UCSF)

Comparison: Rosetta Relax vs. TLC

• The Rosetta Relax structures are generated by fragment insertion and small torsion moves over the whole structure.

• The TLC loop is generated by choosing 3 fixed pivots and Sobol-sampling the remaining torsions (the rest of the protein is fixed). The best candidate is selected using Rosetta’s energy function.

• The Rosetta Relax protocol thus took much longer (~1000 times, since it is operating on the entire structure).

Modeling PDZ domain variability

• PDZ domains are topologies frequently found in scaffold proteins that organize complexes, particularly those involved in cell trafficking.

• They frequently bind the C-terminus of a protein ligand, as well as peptides internal to the ligand. PDZ domains have a well-defined core typically consisting of 6 strands and 2 helices arranged in a compact, globular structure.

• For example, substrate binding consistently occurs at the interface formed by two strands and a helix.

1be9 helix ensemble, generated by perturbation of helix torsions, corrected by loop closure. The helix forms an interface together with the adjacent beta strand also shown as a backbone trace. The ligand is shown in green with side chains.

To compare the ensemble with naturally occurring variation, the wild type of this PDZ domain is shown in cyan superimposed with another PDZ domain in blue. The helix from the first PDZ domain is in purple backbone trace and the second is in orange backbone trace. The ligands are also shown in dark green and light green (side-chain trace only for the second).

Much of the variation between the naturally occurring helices is captured in the ensemble.

TLC as a refinement strategy in new Rosetta loop modeling protocol (1thg 12 residue loop)

native new

with D. Mandell, T. Kortemme, UCSF

Minimization of Angle Deviation in loop modeling by fragment assembly

Julian Lee, Soongsil U., Chaok Seok, Seoul Natl. U., (S. Korea), and E. Coutsias

• Fragment assembly methods are often applied to protein structure problems.
• When structures are generated for a segment such as a protein loop, assembled fragments do not fit exactly into the frame of the protein of interest.
• Deviation of the dihedral angles of the loop from the fragment angles is minimized here, to maintain the features obtained from the structure database as well as possible.
• Two methods have been tried: 1) Monte Carlo simulation and 2) a local minimization in the space of loop conformations satisfying the loop closure constraint. In both methods, root-mean-square deviation in dihedral angles is used as the objective function.

Application of Our Loop Closure Method to Protein Loop Modeling: Move Sets in Monte Carlo Minimization

Monte Carlo Minimization: Sampling of Local Minima

LTD Move Sets: Baysal and Meirovitch, J. Phys. Chem. A 101, 2185 (1997).

Loop Closure Move Sets

minimize

8 residue loop of turkey egg white lysozyme

Finding Native structure

Loop Modeling Using Monte Carlo Minimization: Example

Loop Modeling: Increased Efficiency and Accuracy

Native

Coutsias et al., J. Comput. Chem. 25, 510 (2004)
Baysal, C. and Meirovitch, H., J. Phys. Chem. A 101, 2185 (1997)

Monte Carlo simulation: 1 driver angle is perturbed randomly within 10 deg; 6 torsion angles are used to close the loop. kT = 0.5 deg and 2000 MC steps. 20 independent simulations, starting from different initial loop closures, were performed for each starting conformation generated from fragment assembly.

Minimization using the kinematic Jacobian: 100 steps of steepest descent minimization, followed by L-BFGS-B minimization (termination criteria: function decrease 10^7 * machine precision, gradient 10^(-3)). 20 independent minimizations starting from different initial loop closures.

Results: two loops, the 8-residue loop of 135l (residues 84-91) and the 12-residue loop of 1eco (residues 35-46), were tested. R_ave is the RMSD averaged over the different conformations generated from fragment assembly. R_min is the minimum RMSD. dphi_init is the initial deviation in angles, and dphi_ave is the final deviation averaged over the conformations. The overall performance of the two methods is similar, but the computation is much faster with the Jac method.

with C. Seok , J. Lee

Protein                                    135l    1eco
Loop length                                8       12
# of conf. from fragment assembly          87      127
dphi_init (deg)                            34.3    28.0
R_ave (A)         MC                       2.5     3.9
                  Jac                      2.7     5.2
R_min (A)         MC                       1.0     1.3
                  Jac                      1.1     1.3
dphi_ave (deg)    MC                       18.1    15.8
                  Jac                      17.0    15.4
Time (sec)        MC                       38.9    91.7
                  Jac                      5.4     5.1

[Plots: angle deviation vs. RMSD (A) for the two loops; RMSD axes span 0-4 A and 0-10 A]

Assembly of helical proteins

A simple heuristic based on fast loop closure and maximal hydrophobic packing, as measured by the radius of gyration of the Ca atoms in hydrophobic residues.

Motivated by the need to improve assembly performance of the Dill group’s Zipping & Assembly strategy for tertiary structure prediction.
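The packing measure named above, the radius of gyration of the Ca atoms of hydrophobic residues (RgH), can be written in a few lines. The sketch assumes NumPy, unit weights for all atoms, and a boolean mask marking hydrophobic residues; the function name and the toy coordinates are illustrative assumptions, not the scoring code used in the cited work.

```python
import numpy as np

def hydrophobic_radius_of_gyration(ca_coords, is_hydrophobic):
    """Radius of gyration of the C-alpha atoms flagged as hydrophobic (unit weights)."""
    pts = np.asarray(ca_coords, dtype=float)[np.asarray(is_hydrophobic, dtype=bool)]
    center = pts.mean(axis=0)                      # centroid of the selected atoms
    return float(np.sqrt(((pts - center) ** 2).sum(axis=1).mean()))

# Toy example: two hydrophobic atoms at distance 1 from their centroid, one polar atom ignored
coords = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
mask = [True, True, False]
rgh = hydrophobic_radius_of_gyration(coords, mask)
```

A smaller value indicates a more compact hydrophobic core, which is the sense in which candidate assemblies are ranked by RgH.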

G.A. Wu, E.A. Coutsias, K.A. Dill, Iterative Assembly of Helical Proteins by Optimal Hydrophobic Packing, (2007, in press)

Related works

• Huang, Samudrala, Ponder (JMB 1999): Rg packing
• Yue & Dill (Prot Sci 2000): special angles
• Fain & Levitt (JMB 2001): database scoring; graph theoretic/combinatorial assembly
• Nanias et al. (PNAS 2003): simplified energy (Ca rep.)
• Narang et al. (Phys Chem Chem Phys 2005): torsion sampling, knowledge constraints
• McAllister et al. (Proteins, 2006): contact prediction, distance restraints
• Zhang, Hu, Kim (PNAS 2002): torsion angle dynamics
• …
• Issues: Assembly, (Loop closure), Scoring

Paths not followed 1: Object assembly en masse: identical equations to loop closure. A possible approach to avoid searching. Assemble all elements at once into geometrically feasible configurations.

1, 2 rings

3 rings

4 rings

all possible graph topologies with up to 4 rings, with only 3-node contacts

http://topology.health.unm.edu

Graph Theory: assemble using topological graphs and combinatorics

Wester, Pollock, Coutsias, Allu, Muresan & Oprea, Topological Analysis of Molecular Scaffolds II: Analysis of Chemical Databases (submitted) (2007)

The assembly algorithm

• Begin with a protein whose secondary structure is known to contain helices (as determined, e.g., by DSSP: Kabsch, Sander, Biopolymers 22, 2577-2637 (1983)); remove loops and consider the problem of placing the helices relative to each other

• Align two helices; score each alignment; select best subset, close loop(s).

• Align next helix with assemblage of first two, close loop; iterate

• Cluster/rank by RgH; select best candidates based on a hydrophobic packing criterion.

Initial arrangement: juxtaposition of Ca atoms for two hydrophobic residues at 10 A

• 1) do shortest loop first
• 2) always add one helix at a time (i.e. the second shortest loop may not be second, unless it is next to the shortest loop)

• starting with long loops, we would need to keep less compact conformations (in addition to compact ones) in order to ensure the native conformation is covered.

• by adding long loop closure at the later steps, we have a more limited conformation space to explore because of excluded volume effects (i.e. steric constraints with the pre-assembled parts).


• Amber force field energy minimization is done after loop closure for better sterics (30-60 sd/cg steps)

Sample all possible hydrophobic pairings between two helices

2 DoF sampled: translation and rotation about alignment axis

Limited by loop closability

Ensemble of structures generated, ranked by RgH, clustered by mutual RMSD, lowest RgH structure kept per cluster

Cluster cutoff heuristic: (n-1) Ang, n = # of helices in assembly

2HEP (2hx)

Topology 1

rmsd = 1.27(32)

1PRB (3hx):

assembly order:

Topology 2

rmsd = 1.5(1)

Larger proteins

• subtle connectivity topology errors may exist, although overall RMSD is still good

2CRO (5 helix) assembly order:

2-1-3-41

Topology 22

Topology 21

rmsd = 3.99(55)

Disulfide bridged proteins

• RgH scoring not discriminating: native not among few best structures

• Imposing known disulfide bonds introduces sufficient restrictions (at least in the 5 proteins we attempted) that still allowed us to sample near-native structures

• TLC applicable to S2 bridges (but not included in current implementation); used amber9 with restraints to close bridges for these studies

1C5A (4hx)

assembly order:

Topology 6 (21); RMSD = 2(103), 2.49(17)

1RPO 1.52(3) 2 helices

1BDD 1.58(24) 3 helices

1HP8 2.45(30) (3 helices, 2 S2 bridges)

2MHR 2.14(8) 4 helices

2ICP 5.10(15), 4.09(137) 5 helices

Subtleties re. clustering rmsd

• a large rmsd cutoff (~ 3 Ang) during clustering, may lead to a smaller number of centroids but may end up throwing the smallest rmsd conformation away, assuming (falsely) it is represented by another in the cluster

• using a low rmsd cutoff for clustering may keep best rmsd conf, but will become computationally too expensive, or may miss the best conf among the top ranking ones by RgH (because of too many candidates, with insufficient variability).

Helical protein gallery: native vs. lowest rmsd model

1.27(32)  1.52(3)  1.58(24)  1.92(19)  1.67(21)

1.51(13)

1.93(6)

1.80(20)  1.50(1)  2.03(13)

1.80(1)  2.21(2)  2.50(34)  2.67(10)  2.96(19)

[3]  [2]  [3]  [2]  [2]

2.29(30)  2.91(8)  4.54(22)  3.21(47)  4.22(42)

2.14(8)  4.00(43)  4.01(12)  5.10(15)  4.59(23)

2.45(30)  1.69(9)  2.49(17)  3.21(2)  2.73(28)

Thanks

• Ken Dill, Albert Wu (now at UC Berkeley), Matthew Jacobson, Tanja Kortemme, Dan Mandell, Pharmaceutical Chemistry, UCSF

• Michael Wester, Sara Pollock (now at Dept of Applied Math, U. Washington), Dept of Math and Division of Biocomputing, UNM

• Mike Brown, Jean-Paul Watson, Sandia National Labs

• Chaok Seok, Dept. of Chemistry, Seoul National University, S. Korea, Julian Lee, Soongsil University, S. Korea

• Bob Lewis, Fordham U., Manfred Minimair, Seton Hall U.

Support: NIH, UNM Division of Biocomputing

PEP25:CLLRMKSAC

Designing a 9-peptide ring

pep virtual bond; 3-pep bridge

design triangle

cysteine bridge; 9-pep ring

Modeling R. Larson’s 9-peptide

(P-G-P-G-dA)

native

sample from exhaustive enumeration of conformations for cyclic pentapeptide P-G-P-G-dA
