4. Modeling of side chains


Protein Structure Prediction:
– given: sequence of protein
– predict: structure of protein

Challenges:
– conformation space
  • goal: describe the continuous, immense space of conformations in an efficient and representative way
– realistic energy function
  • goal: energy minimum at or near the experimentally derived (native) structure
– efficient and reliable search algorithm
  • goal: locate the minimum (global minimum energy conformation, GMEC)

Prediction of side chain conformations:
– subtask of protein structure prediction

Side chain modeling is part of structure prediction


The importance of side chain modeling

Side chain prediction: subtask of protein structure prediction
• given: correct backbone conformation
• predict: side chain conformations (i.e. whole protein)

• successful prediction of protein structure depends on successful prediction of the side chain conformations

• complete details not solved by experiment
• allows evaluation of protocol at detailed, full-atom level
• allows flexibility in docking


Prediction of side chain conformations
1. rotamer libraries
2. dependence on backbone accuracy
3. approaches that locate GMEC or MECs

Rosetta & other approaches: DEE (dead-end elimination), SCWRL, BP (belief propagation), LP (linear/integer programming)

Today’s menu


Side chains are described as rotamers

Dihedral angles χ1-χ4 define the side chain conformation (assuming equilibrium bond lengths and angles)

[Figure from Wikipedia]
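Each χ angle is a dihedral over four consecutive atoms (for χ1: N, Cα, Cβ and the γ atom). The following is a minimal Python sketch of that calculation, assuming Cartesian coordinates given as 3-tuples; the function name `dihedral` and the toy coordinates are illustrative and not part of the lecture.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b0, b1)          # normal of the plane p0, p1, p2
    n2 = _cross(b1, b2)          # normal of the plane p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical toy coordinates: a planar zigzag of four points is "trans".
trans_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
```

With real structures the four points would be the atom coordinates read from a PDB file; the parsing step is left out here.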


Serine χ1 preferences

t = 180°, g- = -60°, g+ = +60°

Side chains assume discrete conformations

Staggered conformations minimize collision with neighboring atoms

Lovell, 2000
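Assigning an observed χ angle to one of the three staggered wells can be sketched as below. The 120°-wide bins centred on +60°, 180° and -60° are an assumption made for this illustration, and the g+/g- naming follows the convention on this slide (g+ = +60°).

```python
def rotamer_class(chi_deg):
    """Assign a chi angle (degrees) to the nearest staggered well."""
    chi = (chi_deg + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)
    if 0.0 <= chi < 120.0:
        return "g+"    # well centred at +60 degrees
    if -120.0 <= chi < 0.0:
        return "g-"    # well centred at -60 degrees
    return "t"         # well centred at 180 degrees

labels = [rotamer_class(chi) for chi in (65.0, -58.0, 175.0, -170.0)]
```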


Rotamer: discrete side chain conformation defined by χ1-χ4

Rotamer libraries contain preferred conformations

Dunbrack, 2002

Shapovalov and Dunbrack* | 2011 | BBDEP | 3854 | 1.8
* Shapovalov & Dunbrack, Structure 2011


Ponder & Richards, 1987: analysis of ~20 proteins (~2000 side chains)

67 rotamers can adequately represent the side chain conformations of 17 of the 20 amino acids

Representative rotamer libraries are surprisingly small


Dunbrack & Karplus, 1993:
• For each φ-ψ (20°×20°) bin, derive statistics on χ1 and χ2 values

• Reflects dependence of side chain conformation on backbone conformation

Backbone dependent rotamer libraries
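The binning idea can be sketched as follows: each observation is assigned to a φ-ψ bin, and rotamer frequencies are tallied per bin. The observation tuples and all function names here are hypothetical, and the sketch uses plain counts rather than the statistics of the published library.

```python
from collections import Counter, defaultdict

def bin_index(angle_deg, width=20):
    """Index of the bin containing an angle, with bins starting at -180 degrees."""
    return int(((angle_deg + 180.0) % 360.0) // width)

def backbone_dependent_counts(observations, width=20):
    """Tally rotamer labels per (phi bin, psi bin)."""
    counts = defaultdict(Counter)
    for phi, psi, rotamer in observations:
        counts[(bin_index(phi, width), bin_index(psi, width))][rotamer] += 1
    return counts

def frequencies(counter):
    total = sum(counter.values())
    return {rotamer: n / total for rotamer, n in counter.items()}

# Hypothetical observations: (phi, psi, chi1 rotamer label)
obs = [(-62, -45, "g-"), (-70, -42, "g-"), (-65, -50, "t"), (-120, 130, "g+")]
counts = backbone_dependent_counts(obs)
helix_bin = frequencies(counts[(bin_index(-62), bin_index(-45))])
```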

[Figure: φ-ψ plot]

The observed frequencies of gauche+, gauche- and trans differ strongly between backbone conformations: sheet, helix, and coil regions
(n = 850 proteins, < 1.7 Å resolution, and pairwise sequence identity < 50%)

Rotamer preferences depend on backbone conformation: example Valine


Use Bayesian statistics to estimate populations
• for all rotamers,
• of all side chain types,
• for each φ-ψ (10°×10°) bin

$P(\chi_2, \chi_3, \chi_4 \mid \chi_1, \phi, \psi)$

Bayesian statistical analysis of rotamer library (Dunbrack, 1997)

Using the Bayesian formalism, combine
• a prior distribution based on $P(\phi) \cdot P(\psi)$
• fully φ,ψ-dependent data

… to describe both
• well-sampled regions
• sparsely sampled regions


Rotamer energy (Edun): a knowledge-based score

1. Calculate pobs: frequencies of rotamers (or any other feature)

2. Convert into effective potential energy using Boltzmann equation

Boas & Harbury, 2007

$\Delta G = -RT \ln (p_{obs}/p_{exp})$
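As a small numeric illustration of this conversion, the sketch below turns an observed and an expected frequency into an effective energy. The function name, the choice of kcal/mol and the default temperature of 298.15 K are assumptions for the example, not values from the lecture.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)

def knowledge_based_energy(p_obs, p_exp, temperature=298.15):
    """Effective energy -RT ln(p_obs / p_exp) in kcal/mol."""
    return -R_KCAL * temperature * math.log(p_obs / p_exp)

# A rotamer seen twice as often as expected gets a favourable (negative) score.
favourable = knowledge_based_energy(0.5, 0.25)
neutral = knowledge_based_energy(0.25, 0.25)
```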


Structure determination revisited

Refit electron density maps

15% of non-rotameric side chains can be refitted to 1 (or 2) rotameric conformations

(Shapovalov & Dunbrack, 2007)

Refit electron density maps

Rotameric side chains have lower entropy (dispersion of electron density around χ) than side chains with multiple conformations in the PDB, or non-rotameric side chains

Structure determination revisited

[Figure: χ1 entropy by residue type]

(Shapovalov & Dunbrack, 2007)

Many good reasons:

1. More structural data

2. Improved set: Electron density calculations - remove highly dynamic side chains

3. Derive accurate and smooth density estimates of rotamer populations (incl. rare rotamers) as continuous function of backbone dihedral angles

4. Derive smooth estimates of the mean values and variances of rotameric side-chain dihedral angles

5. Improve treatment of non-rotameric degrees of freedom

2011: Improved Dunbrack library

Shapovalov & Dunbrack, 2011

• Calculate rotamer preference for a given φ-ψ bin:

• Adaptive kernel density estimation allows:
  – smoother density function (prevents steep derivatives in Rosetta minimizations!)
  – more detailed binning

The 2011 Dunbrack library

1. For each rotamer r of amino acid aa: determine a probability density estimate ρ(φ,ψ|r) (= Ramachandran distribution for each rotamer)

2. Use Bayes’ rule to invert this density to produce an estimate of the rotamer probability:

$P(r \mid \phi, \psi) = \dfrac{\rho(\phi, \psi \mid r)\, P(r)}{\sum_{r'} \rho(\phi, \psi \mid r')\, P(r')}$

P(r): backbone-independent probability of rotamer r
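The Bayes inversion in step 2 amounts to multiplying each rotamer's Ramachandran density at the given (φ, ψ) by its backbone-independent probability and normalising over all rotamers. A minimal sketch, with made-up density and prior values and a hypothetical function name:

```python
def rotamer_posterior(density_at_phi_psi, prior):
    """P(r | phi, psi) from rho(phi, psi | r) and the backbone-independent P(r)."""
    joint = {r: density_at_phi_psi[r] * prior[r] for r in prior}
    total = sum(joint.values())
    return {r: value / total for r, value in joint.items()}

# Made-up numbers for one (phi, psi) point of a three-rotamer residue.
density = {"g+": 2.0, "t": 1.0, "g-": 1.0}    # rho(phi, psi | r)
prior = {"g+": 0.25, "t": 0.25, "g-": 0.5}    # P(r)
posterior = rotamer_posterior(density, prior)
```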


Smoother density function P(r = g+ | φ, ψ, aa = Ser)

[Figure panels: histogram; original probability density; density using adaptive kernels (integrate over a neighborhood of adaptive size)]

Not all side chain atoms show rotameric distribution

Better description of non-rotameric side chains

Met χ1: sp3; Gln χ3: sp2

Example: Gln χ3 angles for (χ1 = g+; χ2 = t)

[Figure: χ3 distributions in the original library and the new library for alpha helix, beta sheet, and loops (polyP II)]

Better description of non-rotameric side chains

Example: Asn χ2 angles for (χ1 = g+)

… leads to a slight improvement in modeling


Rotamer frequency:
• rare conformations reflect increased internal strain, so it is important to take frequency into account
• frequency can be used as an energy term: $E_i = -K \ln P_i$

Increasing availability of high-resolution structures:
• narrows the distribution around each rotamer in the library
• indicates that errors are responsible for outliers

Refitting of electron density maps:
• non-rotameric conformations are often incorrectly modeled and high in entropy

Some conclusions about rotamer libraries


Rotamericity < 100%:
• Include more side chain conformations!
  – Position-dependent rotamers (example: unbound conformations in docking predictions)
  – Additional conformations around rotamer (± sd)
  – Non-rotameric side chain angles: describe as continuous density function

Some conclusions about rotamer libraries


• Most common local backbone move in ultra-high resolution structures (< 1.0 Å)
• Changes side chain orientation without affecting the rest of the backbone
• 3 rotations around Cα-Cα axes
• In 3% of all residues (1/4 = Serine)

Two distinct rotamers related by backrub moves for Ile (tt, mm)

Backrub Motions: “How protein backbone shrugs when side chain dances”

[Figure: change of θ1,3 with compensatory changes of θ1,2 and θ2,3]

Davis, 2006


Prediction of side chain conformations using rotamers

• Given:
  – protein backbone
  – for each residue: set of possible conformations (rotamers from library)
• Wanted: combination of rotamers that results in the lowest total energy

$\mathrm{GMEC} = \min \left( \sum_i E_{i_r} + \sum_{i<j} E_{i_r j_s} \right)$

Locating the GMEC is NP-hard (Fraenkel, 1997; Pierce, 2002)

[Figure: residues i, i+1, i+2, illustrating self energy and pair energy]

Side chain modeling = find best combination of rotamers

How?
1. systematic scan

• for a protein with
  – 50 residues, and
  – 9 rotamers/residue

number of combinations to scan:

$N = 9^{50} \approx 5 \times 10^{47}$!

feasible only for small proteins, so the search space needs to be reduced

Pos   …   ia         ib         …
ja        e(ia,ja)   e(ib,ja)
jb        e(ia,jb)   e(ib,jb)
…

$E_{tot} = \sum_i E_i + \sum_{i<j} E_{ij}$
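A systematic scan can be written directly from this energy sum. The sketch below enumerates every rotamer combination for a toy three-residue problem; the energy values, the data layout (self energies as nested lists, sparse pair energies as a dictionary) and the function names are all assumptions made for the illustration.

```python
from itertools import product

def total_energy(assignment, self_e, pair_e):
    """Sum of self energies plus pair energies over all position pairs i < j."""
    n = len(assignment)
    energy = sum(self_e[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            # Pairs that are not stored do not interact and contribute 0.
            energy += pair_e.get((i, j), {}).get((assignment[i], assignment[j]), 0.0)
    return energy

def brute_force_gmec(self_e, pair_e):
    """Enumerate all combinations; only feasible for very small problems."""
    choices = [range(len(energies)) for energies in self_e]
    return min(product(*choices), key=lambda a: total_energy(a, self_e, pair_e))

# Toy problem: 3 positions with 2 rotamers each (2**3 = 8 combinations).
self_e = [[0.0, 1.0], [0.5, 0.0], [0.0, 2.0]]
pair_e = {(0, 1): {(0, 1): 3.0}, (1, 2): {(1, 0): 1.0}}
gmec = brute_force_gmec(self_e, pair_e)

# For 50 positions with 9 rotamers each the count is 9**50, far beyond enumeration.
n_combinations = 9 ** 50
```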


Deterministic approaches (e.g. DEE):
– Guarantee location of GMEC
– Can be slow
– Advantageous when GMEC is (the only) near-native conformation

Heuristic approaches (e.g. MC):
– Locate population of low-energy models (not necessarily GMEC)
– Faster, often converge

Search strategies for locating GMEC or MECs


DEE (dead-end elimination):
– prune impossible rotamers, determine GMEC from reduced rotamer set

Residue-interacting graphs (SCWRL):
– use dynamic programming on graph to find GMEC
– start with “leaves”: residues with low connectivity in graph

Linear programming (Kingsford):
– solve set of linear constraints
– can locate GMEC for sparsely connected graphs
– dependent on energy function

Guaranteed finding of GMEC


• Approach: remove rotamers that cannot be part of the GMEC

Rotamer r at position i can be eliminated if there exists a rotamer t such that:

$E_{i_r} + \sum_{j \neq i} \min_s E_{i_r j_s} > E_{i_t} + \sum_{j \neq i} \max_s E_{i_t j_s}$

• Iterative application of DEE removes many rotamers, at certain positions only one rotamer is left

• (Note that some rotamers can be removed from the beginning because they clash with the backbone - too high Eit)

Dead End Elimination (DEE)

[Figure: energy E of rotamers r and t across combinations of rotamers at positions j≠i]

Desmet & Lasters, 1992
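The pruning rule of Desmet & Lasters compares the best case for r (its self energy plus the most favourable partner at every other position) with the worst case for t, and removes r when even its best case is worse. The sketch below applies this iteratively; the data layout and the names `pair` and `dee_original` are assumptions made for the illustration.

```python
def pair(pair_e, i, r, j, s):
    """Pair energy lookup; pairs that are not stored contribute 0."""
    if i < j:
        return pair_e.get((i, j), {}).get((r, s), 0.0)
    return pair_e.get((j, i), {}).get((s, r), 0.0)

def dee_original(self_e, pair_e):
    """Apply the original DEE criterion until no more rotamers can be removed."""
    n = len(self_e)
    alive = [set(range(len(energies))) for energies in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in list(alive[i]):
                # Best case for r over all surviving partners.
                best_r = self_e[i][r] + sum(
                    min(pair(pair_e, i, r, j, s) for s in alive[j])
                    for j in range(n) if j != i)
                for t in alive[i] - {r}:
                    # Worst case for the competing rotamer t.
                    worst_t = self_e[i][t] + sum(
                        max(pair(pair_e, i, t, j, s) for s in alive[j])
                        for j in range(n) if j != i)
                    if best_r > worst_t:
                        alive[i].discard(r)   # r is a dead end
                        changed = True
                        break
    return alive

# Toy problem with 2 positions and 2 rotamers each (made-up energies).
self_e = [[0.0, 10.0], [0.0, 1.0]]
pair_e = {(0, 1): {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 0.0}}
survivors = dee_original(self_e, pair_e)
```

In this toy case a single rotamer survives at each position, so the GMEC is found without any enumeration.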


• Approach: remove rotamers that cannot be part of the GMEC, second criterion:

Rotamer r at position i can be eliminated if there exists a rotamer t such that:

$E_{i_r} - E_{i_t} + \sum_{j \neq i} \min_s \left( E_{i_r j_s} - E_{i_t j_s} \right) > 0$

• This criterion allows removing of additional rotamers

Refined DEE

[Figure: energy E of rotamers r and t across combinations of rotamers at positions j≠i]

Goldstein, 1994
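The Goldstein criterion compares r and t within the same environment: r is a dead end if switching from r to t lowers the energy for every choice of partners. A minimal sketch, reusing the same assumed data layout as a plain dictionary of sparse pair energies (all names and numbers are illustrative):

```python
def pair(pair_e, i, r, j, s):
    """Pair energy lookup; pairs that are not stored contribute 0."""
    if i < j:
        return pair_e.get((i, j), {}).get((r, s), 0.0)
    return pair_e.get((j, i), {}).get((s, r), 0.0)

def goldstein_eliminates(i, r, t, self_e, pair_e, alive):
    """True if rotamer r at position i is dominated by rotamer t."""
    gap = self_e[i][r] - self_e[i][t]
    for j in range(len(self_e)):
        if j != i:
            # Smallest advantage of t over r for any surviving partner at j.
            gap += min(pair(pair_e, i, r, j, s) - pair(pair_e, i, t, j, s)
                       for s in alive[j])
    return gap > 0

# Made-up example in which the original criterion cannot prune rotamer 1 at
# position 0 (best case 1.5 versus worst case 5.0), but Goldstein's can.
self_e = [[0.0, 1.0], [0.0, 0.0]]
pair_e = {(0, 1): {(0, 0): 0.0, (0, 1): 5.0, (1, 0): 0.5, (1, 1): 5.5}}
alive = [{0, 1}, {0, 1}]
pruned = goldstein_eliminates(0, 1, 0, self_e, pair_e, alive)
```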


• Approach: remove rotamers that cannot be part of the GMEC - additional criterion:

Rotamer r at position i can be eliminated if there exist rotamers t1 and t2 such that:

• takes more time to compute

• at the end, we are left with 1 combination, or with a few combinations only, that need to be evaluated using other criteria

More sophisticated DEE criteria….

[Figure: energy E of rotamers r, t1 and t2 across combinations of rotamers at positions j≠i]

• DEE guarantees to find the GMEC…
• … but may miss conformations that have only slightly worse energy
• Given that the energy function is not perfect, we also want to find additional conformations with comparable energy

• Approach used in Orbit: use MC to find additional low-energy combinations that resemble GMEC

DEE-based approaches


• Local sampling starting from GMEC reveals conservation pattern of designs

[Figure panels: alignment with Zif268 second finger; conservation across 1000 simulations; ranking of predicted sequences]

Design of a sequence that adopts a zinc finger fold without zinc

Dahiyat & Mayo (1997)


SCWRL - residue-interacting graphs

• DEE: residues left with > 1 rotamer are the “active residues”
• undirected graph of active residues:
  – side chains = vertices
  – interacting rotamer pairs: connected by an edge
• identify
  – articulation points (removing one breaks the cluster apart) &
  – bi-connected components (cannot be broken into different parts by removing one node)

Very simple energy function: only Dunbrack energy and repulsion

Canutescu, 2003
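To make the graph terms concrete, the sketch below finds articulation points by brute force: a vertex is an articulation point if deleting it increases the number of connected components. This is a simple stand-in chosen for clarity, not the algorithm used in SCWRL, and the residue names and contacts are made up.

```python
def components(vertices, edges):
    """Connected components of an undirected graph, by depth-first search."""
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        if a in neighbours and b in neighbours:   # ignore edges to deleted vertices
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, result = set(), []
    for start in vertices:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(neighbours[v] - comp)
        seen |= comp
        result.append(comp)
    return result

def articulation_points(vertices, edges):
    """Vertices whose removal splits the graph into more components."""
    vertices = set(vertices)
    base = len(components(vertices, edges))
    return {v for v in vertices
            if len(components(vertices - {v}, edges)) > base}

# Hypothetical active residues: a chain A-B-C attached to a triangle C-D-E.
residues = {"A", "B", "C", "D", "E"}
contacts = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("C", "E")]
cut_residues = articulation_points(residues, contacts)
```

Here B and C are articulation points, and the triangle C-D-E is a bi-connected component.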


SCWRL - residue-interacting graphs

Solve a cluster using bi-connected components
• For each, calculate best energy given specific rotamer in bi-connected residue

• Pruning is easy since energy function only positive [Backtracking: when certain threshold is used, a specific rotamer (combination) can be deleted]

Canutescu, 2003

• Define cutoff values to prune branches that probably do not contain low-energy conformations

• Mean-field approach, belief propagation
• Self-consistent algorithms
• Monte-Carlo sampling

Heuristic approaches


Side chain modeling in Rosetta: part of a cycle

[Figure: cycle of random perturbation, side chain optimization and rigid body minimization with an MC step, from START to FINISH; energy plotted against rigid body orientations]

• rigid body optimization
• backbone optimization

Side chain modeling protocols in Rosetta

• Monte-Carlo procedure:
  • heuristic
  • does not converge – several runs needed to locate solution

• use backbone-dependent rotamer library (Dunbrack)

• approaches
  • “Repacking” – model side chain conformation from scratch
  • “Rotamer Trial” – refine side chain conformations
  • “Rotamer Trial with minimization” (RTmin) – off-rotamer sampling by minimization

Monte Carlo sampling

• pre-calculate $E_{i_r}$ and $E_{i_r j_t}$ matrix
  • Self energy: energy between rotamer r at position i and the constant part
  • Pairwise energy: between rotamer r at position i and rotamer t at position j (sparse matrix)

$E_{total} = \sum_i E_{i_r} + \sum_i \sum_{j>i} E_{i_r j_t}$

• simulated annealing
  • make random change
  • start with high acceptance rate, gradually lower temperature
  • acceptance based on Boltzmann distribution
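The annealing loop can be sketched on precomputed energy tables as follows. This is an illustrative toy, not Rosetta's packer: the geometric cooling schedule, the step count, the temperatures and all names are assumptions, and the energies are made up.

```python
import math
import random

def pair(pair_e, i, r, j, s):
    """Pair energy lookup; pairs that are not stored contribute 0 (sparse matrix)."""
    if i < j:
        return pair_e.get((i, j), {}).get((r, s), 0.0)
    return pair_e.get((j, i), {}).get((s, r), 0.0)

def local_energy(state, i, r, self_e, pair_e):
    """All energy terms that involve position i when it carries rotamer r."""
    return self_e[i][r] + sum(pair(pair_e, i, r, j, state[j])
                              for j in range(len(state)) if j != i)

def total_energy(state, self_e, pair_e):
    n = len(state)
    return (sum(self_e[i][state[i]] for i in range(n))
            + sum(pair(pair_e, i, state[i], j, state[j])
                  for i in range(n) for j in range(i + 1, n)))

def anneal(self_e, pair_e, steps=2000, t_start=10.0, t_end=0.1, seed=0):
    rng = random.Random(seed)
    state = [rng.randrange(len(e)) for e in self_e]     # random starting rotamers
    energy = total_energy(state, self_e, pair_e)
    best, best_e = list(state), energy
    for k in range(steps):
        temp = t_start * (t_end / t_start) ** (k / (steps - 1))   # geometric cooling
        i = rng.randrange(len(state))                   # random position
        r_new = rng.randrange(len(self_e[i]))           # random rotamer
        d_e = (local_energy(state, i, r_new, self_e, pair_e)
               - local_energy(state, i, state[i], self_e, pair_e))
        if d_e <= 0 or rng.random() < math.exp(-d_e / temp):      # Metropolis criterion
            state[i] = r_new
            energy += d_e
            if energy < best_e:
                best, best_e = list(state), energy
    return best, best_e

self_e = [[0.0, 1.0], [0.5, 0.0], [0.0, 2.0]]
pair_e = {(0, 1): {(0, 1): 3.0}, (1, 2): {(1, 0): 1.0}}
best, best_e = anneal(self_e, pair_e)
```

Only the terms involving the changed position are re-evaluated at each step, which is what makes the precomputed self and pair tables worthwhile.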


“Repacking”: full combinatorial side chain optimization

• remove all side chains
• gradually add side chains: select from backbone-dependent rotamer library
  – add position-specific rotamers (e.g. from unbound conformation): set their energy to minimum rotamer energy, to ensure acceptance

• use simulated annealing to create increasingly well packed side chains

• repeat to sample range of low-energy conformations


“Rotamer trial”: side chain adjustment

• Find better rotamers for existing structure
  • pick residue at random
  • search for rotamer with lower energy
  • replace rotamer

• Repeated until all high-energy positions are improved

• Fast
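A greedy version of this adjustment can be sketched as below: residues are visited in random order and each is switched to its lowest-energy rotamer given the current neighbours, until a full sweep brings no improvement. This is a simplified illustration under assumed energy tables, not Rosetta's implementation, and all names are hypothetical.

```python
import random

def pair(pair_e, i, r, j, s):
    """Pair energy lookup; pairs that are not stored contribute 0."""
    if i < j:
        return pair_e.get((i, j), {}).get((r, s), 0.0)
    return pair_e.get((j, i), {}).get((s, r), 0.0)

def local_energy(state, i, r, self_e, pair_e):
    """All energy terms that involve position i when it carries rotamer r."""
    return self_e[i][r] + sum(pair(pair_e, i, r, j, state[j])
                              for j in range(len(state)) if j != i)

def rotamer_trials(state, self_e, pair_e, seed=0, max_sweeps=100):
    rng = random.Random(seed)
    state = list(state)
    for _ in range(max_sweeps):
        improved = False
        order = list(range(len(state)))
        rng.shuffle(order)                       # visit residues in random order
        for i in order:
            energies = [local_energy(state, i, r, self_e, pair_e)
                        for r in range(len(self_e[i]))]
            best_r = min(range(len(energies)), key=energies.__getitem__)
            if energies[best_r] < energies[state[i]]:
                state[i] = best_r                # replace with the better rotamer
                improved = True
        if not improved:                         # no position could be improved
            break
    return state

self_e = [[0.0, 1.0], [0.5, 0.0], [0.0, 2.0]]
pair_e = {(0, 1): {(0, 1): 3.0}, (1, 2): {(1, 0): 1.0}}
refined = rotamer_trials([1, 1, 1], self_e, pair_e)
```

Because only downhill changes are accepted, the result is a local minimum that depends on the starting structure, which fits the role of refinement rather than full repacking.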


• Side chain modeling based on rotamer libraries is a combinatorial problem

• Approaches for side chain modeling involve smart reduction of combinatorial complexity (heuristic or exact)

• Side chain modeling as a “toy model” for structural modeling

• Side chain modeling can be extended to Design by adding rotamer options of different amino acids

Side chain modeling: Summary
