4. Modeling of side chains
Side chain modeling is part of structure prediction
Protein structure prediction:
– given: sequence of protein
– predict: structure of protein
Challenges:
– conformation space
  • goal: describe the continuous, immense space of conformations in an efficient and representative way
– realistic energy function
  • goal: energy minimum at or near the experimentally derived (native) structure
– efficient and reliable search algorithm
  • goal: locate the minimum (global minimum energy conformation, GMEC)
Prediction of side chain conformations:
– subtask of protein structure prediction
The importance of side chain modeling
Side chain prediction is a subtask of protein structure prediction:
• given: correct backbone conformation
• predict: side chain conformations (i.e. the whole protein)
• successful prediction of protein structure depends on successful prediction of the side chain conformations
• complete side chain details are not always resolved by experiment
• allows evaluation of a protocol at a detailed, full-atom level
• allows flexibility in docking
Today’s menu
Prediction of side chain conformations:
1. rotamer libraries
2. dependence on backbone accuracy
3. approaches that locate the GMEC or MECs (minimum energy conformations)
Rosetta & other approaches: DEE (dead-end elimination), SCWRL, BP (belief propagation), LP (linear integer programming)
Side chains are described as rotamers
Dihedral angles χ1-χ4 define the side chain (assuming equilibrium bond lengths and bond angles)
(Figure from Wikipedia)
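The χ angles are ordinary dihedral angles, each computed from four consecutive atom positions. A minimal sketch in plain Python (no structure library assumed; the helper names and toy coordinates are illustrative):

```python
import math

def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, range (-180, 180], defined by four points.

    For chi1 the points are N, CA, CB and the gamma atom (OG for serine)."""
    b0 = _sub(p1, p0)      # bond p0 -> p1
    b1 = _sub(p2, p1)      # central bond p1 -> p2
    b2 = _sub(p3, p2)      # bond p2 -> p3
    n1 = _cross(b0, b1)    # normal of plane (p0, p1, p2)
    n2 = _cross(b1, b2)    # normal of plane (p1, p2, p3)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# toy geometry: central bond along z, first atom on +x
p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(dihedral(p0, p1, p2, (1.0, 0.0, 1.0)))   # eclipsed (cis)
print(dihedral(p0, p1, p2, (-1.0, 0.0, 1.0)))  # anti (trans)
```

Using `atan2` rather than `acos` keeps the sign of the angle, which is what distinguishes g+ from g-.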
Side chains assume discrete conformations
Serine χ1 preferences: t = 180°, g- = -60°, g+ = +60°
Staggered conformations minimize collisions with neighboring atoms (Lovell, 2000)
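With the serine χ1 centers above, a measured χ1 can be labeled with the nearest staggered rotamer, as long as differences are taken around the circle (-175° is 5° from 180°, not 355°). A sketch; the nearest-center rule and the names are simplifications, since rotamer libraries define their own bin boundaries:

```python
# chi1 rotamer centres (degrees) as given on the slide
ROTAMER_CENTERS = {"g+": 60.0, "t": 180.0, "g-": -60.0}

def angular_difference(a, b):
    """Smallest signed difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def assign_rotamer(chi, centers=ROTAMER_CENTERS):
    """Name of the rotamer whose centre is closest to chi on the circle."""
    return min(centers, key=lambda name: abs(angular_difference(chi, centers[name])))

print(assign_rotamer(-175.0))  # t
```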
Rotamer libraries contain preferred conformations
Rotamer: discrete side chain conformation defined by χ1-χ4
(Table of rotamer libraries: Dunbrack, 2002; added row: Shapovalov & Dunbrack, Structure 2011: BBDEP, 3854, 1.8)
Representative rotamer libraries are surprisingly small
Ponder & Richards, 1987: analysis of ~20 proteins (~2000 side chains)
67 rotamers can adequately represent side chain conformations (for 17 of the 20 amino acids)
Backbone-dependent rotamer libraries
Dunbrack & Karplus, 1993:
• for each φ-ψ bin (20° × 20°), derive statistics on χ1 and χ2 values
• reflects the dependence of side chain conformation on backbone conformation
(Figure: Ramachandran plot, axes φ and ψ)
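Building backbone-dependent statistics amounts to grouping each observed side chain by its (φ, ψ) bin and counting rotamers per bin. A sketch with 20° bins; the input format (tuples of φ, ψ and a rotamer label) and the toy observations are hypothetical:

```python
from collections import Counter, defaultdict

BIN_WIDTH = 20.0  # degrees, as in the 20° x 20° bins above

def phi_psi_bin(phi, psi, width=BIN_WIDTH):
    """Integer (phi, psi) bin indices; angles are wrapped into [-180, 180)."""
    def index(angle):
        wrapped = (angle + 180.0) % 360.0 - 180.0
        return int((wrapped + 180.0) // width)
    return index(phi), index(psi)

def rotamer_frequencies(observations, width=BIN_WIDTH):
    """observations: iterable of (phi, psi, rotamer_label).
    Returns {bin: {rotamer: frequency}}; frequencies sum to 1 within a bin."""
    counts = defaultdict(Counter)
    for phi, psi, rot in observations:
        counts[phi_psi_bin(phi, psi, width)][rot] += 1
    return {b: {r: n / sum(c.values()) for r, n in c.items()}
            for b, c in counts.items()}

toy = [(-61, -41, "g+"), (-65, -45, "g+"), (-62, -44, "t"), (-120, 130, "t")]
print(rotamer_frequencies(toy))
```

With real data, sparsely populated bins give noisy frequencies, which is the problem the Bayesian treatment addresses.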
Rotamer preferences depend on backbone conformation: example Valine
The observed frequencies of gauche+, gauche- and trans are very different in different backbone conformations (sheet, helix, and coil regions)
(n = 850 proteins, < 1.7 Å resolution, pairwise sequence identity < 50%)
Bayesian statistical analysis of rotamer library (Dunbrack, 1997)
Use Bayesian statistics to estimate populations
• for all rotamers,
• of all side chain types,
• for each φ-ψ bin (10° × 10°):
$P(\chi_2, \chi_3, \chi_4 \mid \chi_1, \phi, \psi)$
Using the Bayesian formalism, combine
• a prior distribution based on $P(\phi) \cdot P(\psi)$
• fully φ,ψ-dependent data
… to describe both
• well-sampled regions
• sparsely sampled regions
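The point of the Bayesian treatment is that sparsely sampled bins borrow strength from a prior, while well-sampled bins follow their own data. A simplified sketch with a Dirichlet prior: the posterior mean of each rotamer probability is $(n_r + \alpha \pi_r)/(N + \alpha)$. The prior $\pi_r$ here is, by assumption, the normalized product of φ-only and ψ-only rotamer probabilities, and the weight α is an assumed parameter; this illustrates the idea and is not the exact formula or parameter set of Dunbrack (1997):

```python
def prior_from_marginals(p_given_phi, p_given_psi):
    """Hypothetical prior: product of phi-only and psi-only rotamer
    probabilities, renormalized over rotamers."""
    raw = {r: p_given_phi[r] * p_given_psi[r] for r in p_given_phi}
    total = sum(raw.values())
    return {r: value / total for r, value in raw.items()}

def posterior_mean(counts, prior, alpha=10.0):
    """Posterior mean rotamer probabilities under a Dirichlet prior whose
    pseudo-counts are alpha * prior[r]; alpha is an assumed weight."""
    n_total = sum(counts.get(r, 0) for r in prior)
    return {r: (counts.get(r, 0) + alpha * prior[r]) / (n_total + alpha)
            for r in prior}

prior = prior_from_marginals({"g+": 0.5, "t": 0.3, "g-": 0.2},
                             {"g+": 0.5, "t": 0.3, "g-": 0.2})
print(posterior_mean({"t": 2}, prior))     # sparse bin: prior still carries most weight
print(posterior_mean({"t": 1000}, prior))  # well-sampled bin: dominated by the data
```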
Rotamer energy ($E_{dun}$): a knowledge-based score
1. Calculate $p_{obs}$: frequencies of rotamers (or any other feature)
2. Convert them into an effective potential energy using the Boltzmann equation:
$\Delta G = -RT \ln(p_{obs}/p_{exp})$
(Boas & Harbury, 2007)
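The conversion in step 2 is a one-liner once $p_{obs}$ and $p_{exp}$ are known. A sketch assuming energies in kcal/mol and T = 298 K (both choices are assumptions; the slide does not fix units):

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def knowledge_based_energy(p_obs, p_exp, temperature=298.0):
    """Effective energy Delta G = -RT ln(p_obs / p_exp), in kcal/mol."""
    return -R_KCAL * temperature * math.log(p_obs / p_exp)

print(knowledge_based_energy(0.5, 0.25))  # seen twice as often as expected: favourable (< 0)
print(knowledge_based_energy(0.1, 0.2))   # seen half as often as expected: unfavourable (> 0)
```

A feature observed exactly as often as expected gets zero energy, so only deviations from the reference state contribute.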
Structure determination revisited
Refitting electron density maps: 15% of non-rotameric side chains can be refitted to 1 (or 2) rotameric conformations (Shapovalov & Dunbrack, 2007)
Refitting electron density maps: rotameric side chains have lower entropy (dispersion of electron density around χ) than side chains with multiple conformations in the PDB or non-rotameric side chains
(Figure: χ1 entropy by residue type; Shapovalov & Dunbrack, 2007)
2011: Improved Dunbrack library
Many good reasons:
1. More structural data
2. Improved data set: use electron density calculations to remove highly dynamic side chains
3. Derive accurate and smooth density estimates of rotamer populations (incl. rare rotamers) as a continuous function of the backbone dihedral angles
4. Derive smooth estimates of the mean values and variances of rotameric side-chain dihedral angles
5. Improve treatment of non-rotameric degrees of freedom
(Shapovalov & Dunbrack, 2011)
The 2011 Dunbrack library
• Calculate the rotamer preference for a given φ-ψ bin:
1. For each rotamer r of an amino acid: determine a probability density estimate $\rho(\phi, \psi \mid r)$ (= Ramachandran distribution for each rotamer)
2. Use Bayes’ rule to invert this density to produce an estimate of the rotamer probability:
$P(r \mid \phi, \psi) = \frac{\rho(\phi, \psi \mid r)\, P(r)}{\sum_{r'} \rho(\phi, \psi \mid r')\, P(r')}$
where P(r) is the backbone-independent probability of rotamer r
• Adaptive kernel density estimation allows:
– a smoother density function (prevents steep derivatives in Rosetta minimizations!)
– more detailed binning
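Bayes’ rule here is a normalization over rotamers at the query (φ, ψ). A sketch that takes the per-rotamer density values ρ(φ, ψ | r) at that point as given numbers (hypothetical values; estimating them is the kernel density step):

```python
def rotamer_probabilities(rho_at_phi_psi, p_r):
    """Bayes' rule:
    P(r | phi, psi) = rho(phi, psi | r) P(r) / sum_r' rho(phi, psi | r') P(r').

    rho_at_phi_psi: {rotamer: density value at the query (phi, psi)}
    p_r:            {rotamer: backbone-independent probability P(r)}"""
    joint = {r: rho_at_phi_psi[r] * p_r[r] for r in p_r}
    evidence = sum(joint.values())  # denominator, same for every rotamer
    return {r: value / evidence for r, value in joint.items()}

# hypothetical numbers for one (phi, psi) point
print(rotamer_probabilities({"g+": 0.002, "t": 0.001, "g-": 0.0005},
                            {"g+": 0.3, "t": 0.5, "g-": 0.2}))
```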
Smoother density function: $P(r = g^{+} \mid \phi, \psi, \mathrm{aa} = \mathrm{Ser})$
(Figure panels: original probability density (histogram) vs. density from adaptive kernels, which integrate over a neighborhood of adaptive size)
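The difference between a histogram and a kernel estimate is that every observation contributes a smooth bump instead of a count in one bin. The sketch below uses fixed-width von Mises kernels on a single periodic angle. This is a simplification (the 2011 library uses adaptive kernels on the two-dimensional φ-ψ torus), and the concentration `kappa` is an assumed parameter:

```python
import math

def bessel_i0(x, terms=60):
    """Modified Bessel function I0 from its power series sum_k ((x/2)^k / k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total

def circular_kde(samples_deg, kappa=20.0):
    """Density (per radian) of an angle, built from one von Mises kernel
    per sample; larger kappa means narrower kernels."""
    means = [math.radians(s) for s in samples_deg]
    norm = 2.0 * math.pi * bessel_i0(kappa) * len(means)

    def density(angle_deg):
        x = math.radians(angle_deg)
        return sum(math.exp(kappa * math.cos(x - mu)) for mu in means) / norm

    return density

smooth = circular_kde([55.0, 60.0, 65.0, 170.0])
print(smooth(60.0), smooth(120.0))
```

Because the kernels are periodic, the estimate has no artificial edge at ±180°; an adaptive version would shrink `kappa` where samples are sparse.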
Better description of non-rotameric side chains
Not all side chain dihedral angles show a rotameric distribution
(Figure: Met χ1 (sp3) vs. Gln χ3 (sp2))
Example: Gln χ3 angles for (χ1 = g+; χ2 = t)
(Figure panels: original library vs. new library; axis χ3)
Example: Asn χ2 angles for (χ1 = g+)
(Figure panels: alpha helix, beta sheet, loops (polyproline II))
… leads to a slight improvement in modeling
Some conclusions about rotamer libraries
Rotamer frequency:
• rare conformations reflect increased internal strain, so it is important to take frequency into account
• frequency can be used as an energy term: $E_i = -K \ln P_i$
Increasing availability of high-resolution structures:
• narrows the distribution around the rotamers in the library
• indicates that errors are responsible for outliers
Refitting of electron density maps:
• non-rotameric conformations are often incorrectly modeled and high in entropy
Rotamericity < 100%:
• include more side chain conformations!
– position-dependent rotamers (example: unbound conformations in docking predictions)
– additional conformations around each rotamer (± sd)
– non-rotameric side chain angles: describe as a continuous density function
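Adding conformations around a rotamer (± sd) can be done by offsetting each χ angle by -1, 0 and +1 standard deviation; with two χ angles this gives 3 × 3 = 9 variants per library rotamer. A sketch; the function name and the mean/sd values are hypothetical:

```python
from itertools import product

def expand_rotamer(chi_means, chi_sds, n_sd=(-1, 0, 1)):
    """All combinations chi_k + m * sd_k for m in n_sd, one per chi angle.
    Angles in degrees, wrapped to [-180, 180)."""
    options = [[(mean + m * sd + 180.0) % 360.0 - 180.0 for m in n_sd]
               for mean, sd in zip(chi_means, chi_sds)]
    return [tuple(combo) for combo in product(*options)]

variants = expand_rotamer([60.0, 180.0], [10.0, 12.0])  # hypothetical (g+, t) rotamer
print(len(variants), variants[:3])
```

The number of variants grows as 3 to the power of the number of χ angles, which is why extra sampling is usually restricted to χ1 and χ2.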
Backrub motions: “How protein backbone shrugs when side chain dances”
• Most common local backbone move in ultra-high-resolution structures (< 1.0 Å)
• Changes side chain orientation without affecting the rest of the backbone
• 3 rotations around Cα-Cα axes:
– change of θ1,3
– compensatory changes of θ1,2 and θ2,3
• Observed in 3% of all residues (one quarter of them serine)
(Figure: two distinct Ile rotamers (tt, mm) related by backrub moves; Davis, 2006)
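Each backrub rotation is a rigid rotation about an axis through two Cα atoms. A sketch of that single operation with Rodrigues’ rotation formula; which atoms to move and how the compensatory angles are chosen are not modeled, and the coordinates are illustrative:

```python
import math

def rotate_about_axis(point, axis_start, axis_end, angle_deg):
    """Rotate `point` by angle_deg around the line axis_start -> axis_end
    (Rodrigues' rotation formula, right-hand rule)."""
    k = [axis_end[i] - axis_start[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in k))
    k = [c / norm for c in k]                         # unit axis vector
    v = [point[i] - axis_start[i] for i in range(3)]  # point relative to the axis
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    rotated = [v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
               for i in range(3)]
    return tuple(rotated[i] + axis_start[i] for i in range(3))

# illustrative: swing a central CA by 10 degrees about the axis through its two neighbours
ca_prev, ca_i, ca_next = (0.0, 0.0, 0.0), (1.9, 3.0, 0.0), (3.8, 0.0, 0.0)
print(rotate_about_axis(ca_i, ca_prev, ca_next, 10.0))
```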
Prediction of side chain conformations using rotamers
• Given:
– protein backbone
– for each residue: a set of possible conformations (rotamers from a library)
• Wanted: the combination of rotamers that results in the lowest total energy:
$\mathrm{GMEC} = \min \left( \sum_i E_{ir} + \sum_{i<j} E_{irjs} \right)$
Locating the GMEC is NP-hard (Fraenkel, 1997; Pierce, 2002)
(Figure: residues i, i+1, i+2 with self energy and pair energy terms)
Side chain modeling = find best combination of rotamers
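How?
1. Systematic scan
• for a protein with 50 residues and 9 rotamers/residue, the number of combinations to scan is $N = 9^{50} \approx 5 \times 10^{47}$!
• feasible only for small proteins; the search space needs to be reduced
(Figure: residues i, i+1, i+2 with rotamers ia, ib, ic; pair energies between rotamers of positions i and j:)
Pos | … | ia       | ib       | …
ja  |   | e(ia,ja) | e(ib,ja) |
jb  |   | e(ia,jb) | e(ib,jb) |
…
$E_{tot} = \sum_i E_i + \sum_{i<j} E_{ij}$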
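The systematic scan can be written directly: evaluate $E_{tot}$ for every combination and keep the lowest. It is only usable for a handful of residues. A sketch with a hypothetical data layout (self energies as lists, sparse pair tables keyed by position pairs with i < j):

```python
from itertools import product

def total_energy(assignment, self_e, pair_e):
    """E_tot = sum_i E_i(r_i) + sum_{i<j} E_ij(r_i, r_j).
    self_e[i][r]: self energy; pair_e[(i, j)][(r, s)]: pair energy (i < j, sparse)."""
    e = sum(self_e[i][r] for i, r in enumerate(assignment))
    for (i, j), table in pair_e.items():
        e += table.get((assignment[i], assignment[j]), 0.0)
    return e

def brute_force_gmec(self_e, pair_e):
    """Enumerate every rotamer combination; feasible only for tiny systems."""
    choices = [range(len(rotamers)) for rotamers in self_e]
    return min(product(*choices), key=lambda a: total_energy(a, self_e, pair_e))

# toy system: 2 positions x 2 rotamers (hypothetical energies)
self_e = [[0.0, 1.0], [0.5, 0.0]]
pair_e = {(0, 1): {(0, 1): 5.0, (1, 0): -2.0}}
print(brute_force_gmec(self_e, pair_e))
print(9 ** 50)  # combinations for 50 residues x 9 rotamers: a 48-digit number
```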
Search strategies for locating the GMEC or MECs
Deterministic approaches (e.g. DEE):
– guarantee location of the GMEC
– can be slow
– advantageous when the GMEC is (the only) near-native conformation
Heuristic approaches (e.g. MC):
– locate a population of low-energy models (not necessarily the GMEC)
– faster, often converge
Guaranteed finding of the GMEC
DEE (dead-end elimination):
– prune impossible rotamers, determine the GMEC from the reduced rotamer set
Residue-interacting graphs (SCWRL):
– use dynamic programming on the graph to find the GMEC
– start with “leaves”: residues with low connectivity in the graph
Linear programming (Kingsford):
– solve a set of linear constraints
– can locate the GMEC for sparsely connected graphs
– dependent on the energy function
Dead-end elimination (DEE)
• Approach: remove rotamers that cannot be part of the GMEC
Rotamer r at position i can be eliminated if there exists a rotamer t such that:
$E_{ir} + \sum_{j \neq i} \min_s E_{irjs} > E_{it} + \sum_{j \neq i} \max_s E_{itjs}$
(Figure: energy E of rotamers r and t over all combinations of rotamers at positions j ≠ i; Desmet & Lasters, 1992)
• Iterative application of DEE removes many rotamers; at certain positions only one rotamer is left
• (Note that some rotamers can be removed from the beginning because they clash with the backbone, i.e. their self energy $E_{ir}$ is too high)
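The original DEE criterion can be applied iteratively until no more rotamers fall out. A sketch assuming dense pair-energy tables `pair[i][j][r][s]` (a hypothetical layout, stored symmetrically so that `pair[j][i][s][r]` holds the same value); eliminated rotamers are excluded from the min/max at other positions, which is safe because they cannot be in the GMEC:

```python
def dee(self_e, pair):
    """Iterated original DEE. self_e[i][r]: self energy;
    pair[i][j][r][s]: pair energy between rotamer r at i and s at j.
    Returns the set of surviving rotamer indices at each position."""
    n = len(self_e)
    alive = [set(range(len(self_e[i]))) for i in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in alive[i] - {r}:
                    # best case for r vs. worst case for t
                    best_r = self_e[i][r] + sum(min(pair[i][j][r][s] for s in alive[j])
                                                for j in range(n) if j != i)
                    worst_t = self_e[i][t] + sum(max(pair[i][j][t][s] for s in alive[j])
                                                 for j in range(n) if j != i)
                    if best_r > worst_t:
                        alive[i].discard(r)  # r is a dead end
                        changed = True
                        break
    return alive

# toy system: 2 positions x 2 rotamers; rotamer 1 at position 0 clashes (high self energy)
self_e = [[0.0, 10.0], [0.0, 0.0]]
pair = [[None, [[0.0, 1.0], [0.0, 1.0]]],
        [[[0.0, 0.0], [1.0, 1.0]], None]]
print(dee(self_e, pair))
```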
Refined DEE
• Approach: remove rotamers that cannot be part of the GMEC, second criterion:
Rotamer r at position i can be eliminated if there exists a rotamer t such that:
$E_{ir} - E_{it} + \sum_{j \neq i} \min_s \left( E_{irjs} - E_{itjs} \right) > 0$
(Figure: energy E of rotamers r and t over all combinations of rotamers at positions j ≠ i; Goldstein, 1994)
• This criterion allows the removal of additional rotamers
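Goldstein’s criterion compares r and t within each combination, instead of comparing the best case of r with the worst case of t, so it eliminates at least everything the original criterion does. A sketch assuming dense tables `pair[i][j][r][s]` (hypothetical layout); the toy numbers are chosen so that only Goldstein’s criterion fires:

```python
def desmet_eliminable(i, r, t, self_e, pair, alive):
    """Original criterion: best case of r is worse than worst case of t."""
    others = [j for j in range(len(self_e)) if j != i]
    best_r = self_e[i][r] + sum(min(pair[i][j][r][s] for s in alive[j]) for j in others)
    worst_t = self_e[i][t] + sum(max(pair[i][j][t][s] for s in alive[j]) for j in others)
    return best_r > worst_t

def goldstein_eliminable(i, r, t, self_e, pair, alive):
    """Goldstein: E_ir - E_it + sum_{j != i} min_s (E_irjs - E_itjs) > 0."""
    total = self_e[i][r] - self_e[i][t]
    for j in range(len(self_e)):
        if j != i:
            total += min(pair[i][j][r][s] - pair[i][j][t][s] for s in alive[j])
    return total > 0

# toy system: is rotamer 0 at position 0 dominated by rotamer 1?
self_e = [[1.0, 0.0], [0.0, 0.0]]
pair = [[None, [[5.0, 0.0], [4.0, -0.5]]],
        [[[5.0, 4.0], [0.0, -0.5]], None]]
alive = [{0, 1}, {0, 1}]
print(desmet_eliminable(0, 0, 1, self_e, pair, alive),
      goldstein_eliminable(0, 0, 1, self_e, pair, alive))
```

Here rotamer 1 beats rotamer 0 for every choice at position 1, yet the best case of rotamer 0 (energy 1) is lower than the worst case of rotamer 1 (energy 4), so only the pairwise comparison detects the dead end.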
More sophisticated DEE criteria…
• Approach: remove rotamers that cannot be part of the GMEC, additional criterion:
Rotamer r at position i can be eliminated if there exist rotamers t1 and t2 such that:
(Figure: energy E of rotamers r, t1 and t2 over all combinations of rotamers at positions j ≠ i)
• takes more time to compute
• at the end, we are left with 1 combination, or with only a few combinations, that need to be evaluated using other criteria
DEE-based approaches
• DEE is guaranteed to find the GMEC…
• … but may miss conformations that have only slightly worse energy
• Given that the energy function is not perfect, we also want to find additional conformations with comparable energy
• Approach used in Orbit: use MC to find additional low-energy combinations that resemble the GMEC
Design of a sequence that adopts a zinc finger fold without zinc
• Local sampling starting from the GMEC reveals the conservation pattern of designs
(Figure panels: alignment with the zif268 second finger; conservation across 1000 simulations; ranking of predicted sequences)
(Dahiyat & Mayo, 1997)
SCWRL - residue-interacting graphs
• after DEE, keep the residues with > 1 rotamer: “active residues”
• undirected graph of active residues:
– side chains = vertices
– side chains with interacting rotamer pairs: connected by an edge
• identify:
– articulation points (removing one breaks the cluster apart) and
– biconnected components (cannot be broken into different parts by removing one node)
Very simple energy function: only the Dunbrack rotamer energy and repulsion (Canutescu, 2003)
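Articulation points can be found in linear time with the standard depth-first-search low-link method (Tarjan). This is the textbook algorithm, not necessarily SCWRL’s implementation; the residue graph is a hypothetical adjacency dictionary:

```python
def articulation_points(graph):
    """graph: {vertex: set(neighbours)}, undirected. Returns the vertices
    whose removal disconnects their connected component."""
    disc, low, points = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]  # discovery time and lowest reachable time
        counter[0] += 1
        children = 0
        for v in graph[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # no back edge from v's subtree above u: u separates that subtree
                if parent is not None and low[v] >= disc[u]:
                    points.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])
        # the DFS root is an articulation point if it has several subtrees
        if parent is None and children > 1:
            points.add(u)

    for vertex in graph:
        if vertex not in disc:
            dfs(vertex, None)
    return points

# two triangles of residues joined through C-D (hypothetical)
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
         "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"}}
print(sorted(articulation_points(graph)))
```

Removing C or D splits the cluster, so both are articulation points; the two triangles are biconnected components that can be solved separately.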
Solving a cluster using biconnected components:
• for each component, calculate the best energy given a specific rotamer of the residue that connects it to the rest of the cluster (the articulation point)
• pruning is easy since the energy function is only positive [backtracking: when a certain threshold is used, a specific rotamer (combination) can be deleted]
(Canutescu, 2003)
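On a tree-shaped residue graph, dynamic programming from the leaves inward gives the exact minimum: each subtree is summarized by its best energy for every rotamer of the residue it hangs from. SCWRL applies this idea to biconnected components joined at articulation points; the sketch below only handles the simpler tree case, with a hypothetical data layout:

```python
def tree_gmec_energy(self_e, pair, edges, root=0):
    """Exact minimum energy for a tree-shaped residue graph.
    self_e[i][r]: self energy; pair[(i, j)][r][s] for each edge (i, j) in `edges`."""
    neighbours = {i: [] for i in range(len(self_e))}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    def pair_energy(i, r, j, s):
        # tables are stored once per edge; look up in either orientation
        return pair[(i, j)][r][s] if (i, j) in pair else pair[(j, i)][s][r]

    def best_subtree(i, parent):
        """best[r] = lowest energy of the subtree rooted at i, given rotamer r at i."""
        child_tables = [(c, best_subtree(c, i)) for c in neighbours[i] if c != parent]
        best = []
        for r in range(len(self_e[i])):
            e = self_e[i][r]
            for c, table in child_tables:
                e += min(table[s] + pair_energy(i, r, c, s) for s in range(len(table)))
            best.append(e)
        return best

    return min(best_subtree(root, None))

# chain of 3 residues (0-1-2), 2 rotamers each (hypothetical energies)
self_e = [[0.0, 1.0], [0.0, 0.0], [2.0, 0.0]]
pair = {(0, 1): [[3.0, 0.0], [0.0, 0.0]], (1, 2): [[0.0, 0.0], [0.0, 4.0]]}
print(tree_gmec_energy(self_e, pair, [(0, 1), (1, 2)]))
```

The cost grows linearly with the number of residues (times the square of the rotamer count per edge) instead of exponentially, which is why starting from the leaves pays off.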
Heuristic approaches
• Define cutoff values to prune branches that probably do not contain low-energy conformations
• Mean-field approach, belief propagation
• Self-consistent algorithms
• Monte Carlo sampling
Side chain modeling in Rosetta: part of a cycle
(Figure: Monte Carlo (MC) cycle from START to FINISH: random perturbation → side chain optimization → rigid body minimization; energy plotted against rigid body orientations)
• rigid body optimization
• backbone optimization
Side chain modeling protocols in Rosetta
• Monte Carlo procedure:
– heuristic
– does not converge; several runs are needed to locate a solution
• uses a backbone-dependent rotamer library (Dunbrack)
• approaches:
– “Repacking”: model side chain conformations from scratch
– “Rotamer trial”: refine side chain conformations
– “Rotamer trial with minimization” (RTmin): off-rotamer sampling by minimization
Monte Carlo sampling
• pre-calculate the $E_{ir}$ and $E_{irjt}$ matrices
– self energy: energy between rotamer r at position i and the constant part of the structure
– pairwise energy: between rotamer r at position i and rotamer t at position j (sparse matrix)
$E_{total} = \sum_i E_{ir} + \sum_{i<j} E_{irjt}$
• simulated annealing:
– make a random change
– start with a high acceptance rate, gradually lower the temperature
– acceptance based on the Boltzmann distribution
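A minimal simulated annealing loop over precomputed self and pair energies, with Metropolis acceptance. The geometric cooling schedule, the temperatures, the step count and the data layout (pair tables keyed by (i, j) with i < j) are assumptions for illustration, not Rosetta’s settings:

```python
import math
import random

def simulated_annealing(self_e, pair_e, n_steps=20000, t_start=10.0, t_end=0.05, seed=0):
    """MC simulated annealing over rotamer assignments.
    self_e[i][r]; pair_e[(i, j)][r][s] for i < j (missing pairs interact with 0)."""
    rng = random.Random(seed)
    n = len(self_e)
    neighbours = {i: [] for i in range(n)}
    for i, j in pair_e:
        neighbours[i].append(j)
        neighbours[j].append(i)

    def pair_energy(i, r, j, s):
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    def local_energy(i, r, state):
        # all energy terms involving position i when it carries rotamer r
        return self_e[i][r] + sum(pair_energy(i, r, j, state[j]) for j in neighbours[i])

    state = [rng.randrange(len(rots)) for rots in self_e]
    energy = sum(self_e[i][state[i]] for i in range(n)) + sum(
        table[state[i]][state[j]] for (i, j), table in pair_e.items())
    best_state, best_energy = list(state), energy
    for step in range(n_steps):
        # geometric cooling from t_start down to t_end
        temperature = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        i = rng.randrange(n)                   # random position ...
        new_r = rng.randrange(len(self_e[i]))  # ... and random rotamer
        delta = local_energy(i, new_r, state) - local_energy(i, state[i], state)
        # Metropolis: accept downhill moves always, uphill with Boltzmann probability
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            state[i] = new_r
            energy += delta
            if energy < best_energy:
                best_state, best_energy = list(state), energy
    return best_state, best_energy

print(simulated_annealing([[0.0, 1.0], [0.5, 0.0]], {(0, 1): [[0.0, 5.0], [-2.0, 0.0]]}))
```

Only the terms that involve the changed position are recomputed for each move, which is what makes precomputing the energy matrices worthwhile.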
“Repacking”: full combinatorial side chain optimization
• remove all side chains
• gradually add side chains:
– select from the backbone-dependent rotamer library
– add position-specific rotamers (e.g. from the unbound conformation): set their energy to the minimum rotamer energy to ensure acceptance
• use simulated annealing to create increasingly well-packed side chains
• repeat to sample a range of low-energy conformations
“Rotamer trial”: side chain adjustment
• Find better rotamers for an existing structure:
– pick a residue at random
– search for a rotamer with lower energy
– replace the rotamer
• Repeat until all high-energy positions are improved
• Fast
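A rotamer trial is a greedy, one-position-at-a-time improvement. A sketch that visits positions in random order, switches each to its best rotamer given all other side chains, and repeats until no position improves (hypothetical data layout: pair tables keyed by (i, j) with i < j):

```python
import random

def rotamer_trials(state, self_e, pair_e, seed=0):
    """Greedy rotamer trials starting from an existing assignment `state`.
    self_e[i][r]; pair_e[(i, j)][r][s] for i < j."""
    rng = random.Random(seed)
    state = list(state)
    n = len(self_e)

    def local_energy(i, r):
        # energy of rotamer r at i with all other positions held fixed
        e = self_e[i][r]
        for (a, b), table in pair_e.items():
            if a == i:
                e += table[r][state[b]]
            elif b == i:
                e += table[state[a]][r]
        return e

    improved = True
    while improved:
        improved = False
        order = list(range(n))
        rng.shuffle(order)  # pick residues in random order
        for i in order:
            best_r = min(range(len(self_e[i])), key=lambda r: local_energy(i, r))
            if local_energy(i, best_r) < local_energy(i, state[i]):
                state[i] = best_r  # keep only strict improvements
                improved = True
    return state

print(rotamer_trials([0, 1], [[0.0, 1.0], [0.5, 0.0]], {(0, 1): [[0.0, 5.0], [-2.0, 0.0]]}))
```

Because every accepted change lowers the energy, the loop terminates, but it can stop in a local minimum, which fits its role as a refinement step rather than a full repack.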
Side chain modeling: Summary
• Side chain modeling based on rotamer libraries → combinatorial problem
• Approaches for side chain modeling involve smart reduction of combinatorial complexity (heuristic or exact)
• Side chain modeling as a “toy model” for structural modeling
• Side chain modeling can be extended to design by adding rotamer options of different amino acids