
2. Introduction to Rosetta and structural modeling

• Approaches for structural modeling of proteins
• The Rosetta framework and its prediction modes
• Cartesian and polar coordinates
• Sampling (finding the structure) and scoring (selecting the structure)

Structural Modeling of Proteins - Approaches

Prediction of Structure from Sequence

Flowchart:
1. Compare the query sequence to the nr database.
2. Similar to a sequence of known structure? If yes: Homology Modeling (Comparative Modeling).
3. If not, does it fit a known fold? If yes: Fold Recognition (Threading).
4. If not: Ab initio prediction.

Protocols: ab initio, loops, side chains, active sites…

The Rosetta framework and its prediction modes

The Rosetta Strategy

• Observation: local sequence preferences bias, but do not uniquely define the local structure of a protein

• Goal: mimic interplay of local and global interactions that determine protein structure

Local interactions: fragments
• Derived from known structures
• Sampled for similar sequences/secondary structure propensity
• The fragment library represents the accessible local structures for each short sequence segment

Global (non-local) interactions: scoring function
• Buried hydrophobic residues, paired strands, specific side chain interactions, etc.
• Derived from known structures (statistics on preferred conformations)
• Boltzmann’s principle relates frequency to energy
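Boltzmann's principle is usually applied as E = −kT ln(P_obs / P_ref): a state observed more often in known structures than in some reference gets a lower energy. The sketch below is illustrative only; the function name, the pseudocount, and the burial-bin counts are hypothetical and not taken from Rosetta.

```python
import math

def boltzmann_inversion(observed_counts, reference_counts, kT=1.0, pseudocount=1.0):
    """Convert feature counts into pseudo-energies with E = -kT * ln(P_obs / P_ref).

    States seen more often than in the reference get negative (favorable) energies.
    A pseudocount keeps empty bins from producing log(0).
    """
    obs_total = sum(observed_counts) + pseudocount * len(observed_counts)
    ref_total = sum(reference_counts) + pseudocount * len(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies

# Made-up counts for one residue type in three burial bins
# (exposed, intermediate, buried) against a roughly uniform reference.
observed = [10, 30, 60]
reference = [33, 33, 34]
print([round(e, 2) for e in boltzmann_inversion(observed, reference)])
```

The over-represented buried bin ends up with the lowest energy, which is how "buried hydrophobic residues" becomes a favorable score term.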

A short history of Rosetta

In the beginning: ab initio modeling of protein structure starting from sequence. Short fragments of known proteins are assembled by a Monte Carlo strategy to yield native-like protein conformations.
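A toy version of Monte Carlo fragment assembly helps make the idea concrete: repeatedly copy the torsions of a randomly chosen fragment into a window of the chain and keep the change according to the Metropolis criterion. This is a sketch under strong simplifying assumptions (one torsion per residue, a hypothetical fragment library and score); classic Rosetta works with 9- and 3-residue fragments of backbone torsions and a real low-resolution score.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng):
    """Metropolis criterion: accept downhill moves; accept uphill moves with p = exp(-dE/kT)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / kT)

def fragment_assembly(n_res, frag_lib, score, n_steps=2000, kT=1.0, seed=0):
    """Toy Monte Carlo fragment assembly in torsion space.

    n_res    : number of residues (one torsion per residue in this toy model)
    frag_lib : dict {start position: list of fragments (tuples of torsions)}
    score    : function(list of torsions) -> energy
    Returns the lowest-energy conformation seen and its energy.
    """
    rng = random.Random(seed)
    conf = [180.0] * n_res                      # start from an extended chain
    energy = score(conf)
    best, best_energy = list(conf), energy
    starts = sorted(frag_lib)
    for _ in range(n_steps):
        start = rng.choice(starts)
        frag = rng.choice(frag_lib[start])
        trial = list(conf)
        trial[start:start + len(frag)] = frag   # overwrite a window with fragment torsions
        trial_energy = score(trial)
        if metropolis_accept(trial_energy - energy, kT, rng):
            conf, energy = trial, trial_energy
            if energy < best_energy:
                best, best_energy = list(conf), energy
    return best, best_energy

# Hypothetical 9-residue example: two 3-residue fragments available at every window,
# and a score that simply rewards torsions near -60 degrees.
library = {s: [(-60.0, -60.0, -60.0), (-120.0, 130.0, -120.0)] for s in range(7)}

def helix_score(torsions):
    return sum((t + 60.0) ** 2 for t in torsions) / 1000.0

best, best_energy = fragment_assembly(9, library, helix_score)
```

Downhill fragment insertions are always kept, while strongly uphill ones are almost never accepted at this kT, so the chain drifts toward the score minimum.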

Reliable fold identification for short proteins. Recently improved to high-resolution models (within 2 Å RMSD).

ATCSFFGRKLL…

A short history of Rosetta

Success of the ab initio protocol led to extensions:
• Protein design; design of a new fold: Top7
• Protein loop modeling; homology modeling
• Protein-protein docking; protein interface design
• Protein-ligand docking
• Protein-DNA interactions; RNA modeling
• Many more, e.g. solving the phase problem in X-ray crystallography


More recent additions

• BOINC (Rosetta@home)
• Foldit
• RosettaScripts; RosettaDiagrams
• PyRosetta

Scoring and Sampling

The basic assumption in structure prediction

The native structure is located at the global minimum (free) energy conformation (GMEC).

➜ A good energy function can select the correct model among decoys.

➜ A good sampling technique can find the GMEC in the rugged landscape.

(Figure: energy E over conformation space; the GMEC is the global minimum of a rugged landscape.)

Two-Step Procedure

1. Low-resolution step locates potential minima (fast)

2. Cluster analysis identifies broadest basins in landscape

3. High-resolution step identifies the lowest-energy minimum within these basins (slow)
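Step 2 can be illustrated with a simple greedy clustering: the decoy with the most neighbors within an RMSD cutoff becomes a cluster center, it and its neighbors are removed, and the procedure repeats, so the largest clusters (broadest basins) come out first. This is a generic sketch, not Rosetta's clustering application; it assumes decoys are lists of (x, y, z) points already superimposed on a common frame, and the toy decoys are made up.

```python
import math

def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) points (assumed pre-superimposed)."""
    sq = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(sq / len(a))

def greedy_cluster(decoys, cutoff):
    """Greedy clustering; returns clusters (lists of decoy indices, center first), largest first."""
    remaining = set(range(len(decoys)))
    clusters = []
    while remaining:
        neighbors = {i: [j for j in remaining if rmsd(decoys[i], decoys[j]) <= cutoff]
                     for i in remaining}
        # ties are broken by the lowest index
        center = max(sorted(remaining), key=lambda i: len(neighbors[i]))
        members = sorted(neighbors[center])
        clusters.append([center] + [j for j in members if j != center])
        remaining -= set(members)
    return clusters

def toy_decoy(y):
    """Hypothetical three-atom decoy whose last atom sits at height y."""
    return [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, y, 0.0)]

decoys = [toy_decoy(y) for y in (1.5, 1.6, 1.7, 5.0, 5.1, 9.0)]
clusters = greedy_cluster(decoys, cutoff=0.5)
print(clusters)
```

The high-resolution step would then be run starting from the centers of the first few clusters.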

(Figure: energy landscape over conformation space, GMEC marked.)

Nature uses one scoring function…

Aim: one generic function for different applications

Optimization of parameters: originally from small molecules (experiments and quantum mechanical calculations)

Today: from protein structures solved at high accuracy

How are scoring terms optimized?

Benchmarks:
• Discriminate the ground state from alternative conformations
• Identify the correct side chain conformation
• Predict the effect of point mutations on stability (ΔΔG)

Leaver-Fay, …, & Baker (2013) Methods in Enzymology 523:109

Low-Resolution Step (e.g. score4)

Structure representation:
• Equilibrium bond lengths and angles (Engh & Huber 1991)
• Centroid: average location of the side-chain center of mass, dependent on amino acid type and backbone φ, ψ (centroid | aa, φ, ψ)
• No modeling of side chains
• Fast

Bayes theorem:
• Independent components prevent over-counting

P(str | seq) = P(str) · P(seq | str) / P(seq)

Low-Resolution Scoring Function

Bayes theorem: P(str | seq) = P(str) · P(seq | str) / P(seq)
• P(seq): constant
• P(seq | str): sequence-dependent features
• P(str): structure-dependent features
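Taking the negative logarithm turns Bayes' product into a sum, which is one way to see why the low-resolution score is written as a sum of terms. A short derivation, assuming scores are read as negative log-probabilities and that the sequence-dependent part factorizes into independent components:

```latex
\begin{aligned}
P(\mathrm{str}\mid\mathrm{seq}) &= \frac{P(\mathrm{str})\,P(\mathrm{seq}\mid\mathrm{str})}{P(\mathrm{seq})} \\
-\ln P(\mathrm{str}\mid\mathrm{seq}) &= \underbrace{-\ln P(\mathrm{str})}_{\text{structure-dependent}}
  \;\underbrace{-\,\ln P(\mathrm{seq}\mid\mathrm{str})}_{\text{sequence-dependent}}
  \;+\; \underbrace{\ln P(\mathrm{seq})}_{\text{constant}} \\
-\ln P(\mathrm{seq}\mid\mathrm{str}) &\approx S_{\mathrm{env}} + S_{\mathrm{pair}} + \dots
\end{aligned}
```

Because P(seq) does not depend on the structure, it can be dropped when comparing conformations of the same sequence.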

Sequence-Dependent Components: P(seq | str)

Score = Senv + Spair + …

Neighbors: Cβ–Cβ < 10 Å

Rohl et al. (2004) Methods in Enzymology 383:66. Origin: Simons et al., JMB 1997; Simons et al., Proteins 1999
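The neighbor definition above (Cβ–Cβ < 10 Å) is the kind of burial measure an environment term can be binned on. A minimal sketch of the counting step, assuming one Cβ (or centroid) coordinate per residue; the function name and coordinates are hypothetical.

```python
import math

def neighbor_counts(cb_coords, cutoff=10.0):
    """For each residue, count the other residues whose C-beta lies within `cutoff` angstroms."""
    n = len(cb_coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(cb_coords[i], cb_coords[j]) < cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts

# Made-up C-beta positions along a line (in angstroms)
print(neighbor_counts([(0, 0, 0), (5, 0, 0), (12, 0, 0), (30, 0, 0)]))
```

A buried residue has many neighbors, an exposed one few; the all-pairs loop is fine for a sketch, while production code would typically use a neighbor grid.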

Structure-Dependent Components: P(str)

Score = … + Srg + Scβ + Svdw + …

Structure-Dependent Components: P(str) (continued)

Score = … + Srama

High-Resolution Step (e.g. score12; Talaris; …)

Slow, exact step:
• Locates the global energy minimum

Structure representation:
• All-atom (including polar and non-polar hydrogens, but no water)
• Side chains as rotamers from a backbone-dependent library
• Side chain conformations adjusted frequently

High-Resolution Step: Rotamer Libraries (Dunbrack 1997)

• Side chains have preferred conformations
• These are summarized in rotamer libraries
• Select one rotamer for each position
• Best conformation: lowest-energy combination of rotamers
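Choosing "the lowest-energy combination of rotamers" means scoring each rotamer against the fixed backbone (one-body terms) and against rotamers at other positions (two-body terms). The exhaustive sketch below only works for tiny problems, since the number of combinations is the product of rotamer counts; Rosetta's packer uses a stochastic (Monte Carlo) search instead. The energy tables are hypothetical.

```python
import itertools

def best_rotamer_combination(one_body, two_body):
    """Exhaustively find the lowest-energy rotamer assignment.

    one_body[i][r]             : energy of rotamer r at position i
    two_body[(i, j)][r_i][r_j] : pairwise energy between rotamers at positions i < j
    Returns (assignment tuple, energy).
    """
    best, best_e = None, float("inf")
    for combo in itertools.product(*(range(len(e)) for e in one_body)):
        e = sum(one_body[i][r] for i, r in enumerate(combo))
        for (i, j), table in two_body.items():
            e += table[combo[i]][combo[j]]
        if e < best_e:
            best, best_e = combo, e
    return best, best_e

# Two positions, two rotamers each; the individually best rotamers (0, 0) clash.
one_body = [[0.0, 1.0], [0.0, 0.5]]
two_body = {(0, 1): [[5.0, 0.0], [0.0, 0.0]]}
print(best_rotamer_combination(one_body, two_body))
```

The example shows why rotamers cannot be chosen position by position: the best pair, (0, 1), is not the pair of individually best rotamers.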

Serine χ1 preferences: t = 180°, g− = −60°, g+ = +60°
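Rotamer libraries bin side-chain dihedrals into wells like these. A small sketch that assigns a χ1 value to the nearest well, handling the 360° wrap-around; it uses the slide's labels, and note that g+/g− naming conventions differ between sources.

```python
def chi1_bin(chi1_deg):
    """Assign a chi1 angle (degrees) to the nearest staggered rotamer well,
    using the labels above: t = 180, g- = -60, g+ = +60."""
    wells = {"t": 180.0, "g-": -60.0, "g+": 60.0}

    def angular_distance(a, b):
        d = (a - b) % 360.0
        return min(d, 360.0 - d)

    return min(wells, key=lambda name: angular_distance(chi1_deg, wells[name]))

print([chi1_bin(x) for x in (-175.0, 70.0, -50.0, 185.0)])
```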

High-Resolution Scoring Function

• Major contributions:
– Burial of hydrophobic groups away from water
– Void-free packing of buried groups and atoms
– Buried polar atoms form intra-molecular hydrogen bonds

Packing interactions: Score = SLJ(atr + rep) + …
• Linearized repulsive part
• ε: well depth from CHARMM19
• (New in score12′: starts from minimum)
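The split into attractive and repulsive parts, with a linearized repulsion, can be sketched as follows: the 6-12 potential is cut at its minimum r0, the attractive part is held at −ε inside r0, and the repulsive part is continued as a straight line below some distance so that clashes stay finite. The linearization point (0.6 · r0) and the function names are assumptions for illustration, not Rosetta's exact parameters.

```python
def lj(r, r0, eps):
    """6-12 Lennard-Jones potential with its minimum (-eps) at r = r0."""
    x = (r0 / r) ** 6
    return eps * (x * x - 2.0 * x)

def lj_atr_rep(r, r0, eps, lin_frac=0.6):
    """Split LJ at its minimum into (attractive, repulsive) parts.

    attractive: -eps for r < r0, LJ beyond r0
    repulsive : LJ + eps for r < r0 (0 beyond), continued linearly below
                lin_frac * r0; lin_frac = 0.6 is an assumed value.
    """
    if r >= r0:
        return lj(r, r0, eps), 0.0
    r_lin = lin_frac * r0
    if r >= r_lin:
        return -eps, lj(r, r0, eps) + eps
    x = (r0 / r_lin) ** 6
    slope = eps * (12.0 * x - 12.0 * x * x) / r_lin   # dLJ/dr at r_lin (negative)
    return -eps, lj(r_lin, r0, eps) + eps + slope * (r - r_lin)

# Hypothetical atom pair: r0 = 4.0 A, eps = 0.2
print(lj_atr_rep(4.0, 4.0, 0.2), lj_atr_rep(3.0, 4.0, 0.2), lj_atr_rep(1.0, 4.0, 0.2))
```

Summing the two parts reproduces the original potential down to the linearization point; below it, the linear tail grows far more slowly than the r^-12 wall, which keeps early, clash-heavy models from producing huge scores.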

Implicit solvation

Score = … + Ssolvation + …

Lazaridis & Karplus, Proteins 1999

Solvation free energy density of atom i, with x_ij = (r_ij − R_i)/λ_i
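As we recall the Lazaridis & Karplus (1999) model, the solvation free energy density of atom i at distance r is f_i(r) = 2ΔG_i^free / (4π^(3/2) λ_i r²) · exp(−x_ij²), and atom j with volume V_j removes f_i(r_ij) · V_j of atom i's solvation. The sketch below implements that form with hypothetical parameters; check the paper for actual values, and note it omits the distance cutoff and smoothing a production implementation would likely need.

```python
import math

def lk_density(r, dg_free, lam, radius):
    """Solvation free energy density of atom i at distance r:
    f_i(r) = 2*dG_free / (4 * pi**1.5 * lam * r**2) * exp(-x**2), x = (r - R_i) / lam."""
    x = (r - radius) / lam
    return 2.0 * dg_free / (4.0 * math.pi ** 1.5 * lam * r * r) * math.exp(-x * x)

def lk_desolvation(r, dg_free_i, lam_i, radius_i, volume_j):
    """Solvation lost by atom i when atom j (volume V_j) sits at distance r: -f_i(r) * V_j."""
    return -lk_density(r, dg_free_i, lam_i, radius_i) * volume_j

# Hypothetical polar atom (favorable free solvation, dG_free < 0) near a neighbor atom
for r in (3.0, 4.5, 6.0):
    print(r, round(lk_desolvation(r, dg_free_i=-10.0, lam_i=3.5, radius_i=1.7, volume_j=10.0), 3))
```

For a polar atom the result is a positive penalty that fades with distance, which is how burying polar groups without compensating hydrogen bonds is discouraged.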

Hydrogen Bonding Energy

Based on statistics from high-resolution structures in the PDB

(Kortemme, Morozov & Baker 2003 JMB)

Slide from Jeff Gray

Score = …. + Shb(srbb+lrbb+sc) + ….

srbb: short range, backbone HB

lrbb: long range, backbone HB

sc: HB with side chain atom

Rotamer preference

Score = … + Sdunbrack + ….

Dunbrack, 1997

One long, generic function ….

Score = Senv+ Spair + Srg + Sc+ Svdw + Sss+ Ssheet+ Shs + Srama + Shb (srbb + lrbb) + docking_score + Sdisulf_cent+ Sr+ Sco + Scontact_prediction + Sdipolar+ Sprojection + Spc+ Stether+ S+ S+ Ssymmetry + Ssplicemsd + …..

docking_score = Sd env+ Sd pair + Sd contact+ Sd vdw+ Sd site constr + Sd + Sfab score

Score = SLJ(atr + rep) + Ssolvation + Shb(srbb+lrbb+sc) + Sdunbrack + Spair – Sref + Sprob1b + Sintrares + Sgb_elec + Sgsolt

+ Sh2o(solv + hb) + S_plane

Scoring Function: Summary

One long, generic function …. A weighted sum of different terms

Score12 = w1*SLJatr + w2*SLJrep + w3*Ssolvation + w4*Shb(srbb+lrbb+sc) + w5*Sdunbrack + w6*Spair – Sref
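The weighted sum itself is simple to write down. A minimal sketch, where the term names and weights are made up for illustration and the subtracted reference term (− Sref) is modeled with a weight of −1:

```python
def weighted_score(terms, weights):
    """Total score as a weighted sum of per-term energies: sum_k w_k * S_k.
    Raises KeyError if a term has no weight, rather than silently dropping it."""
    return sum(weights[name] * value for name, value in terms.items())

# Hypothetical per-term energies and weights
terms = {"LJatr": -10.0, "LJrep": 2.0, "solvation": 6.0, "ref": 4.0}
weights = {"LJatr": 1.0, "LJrep": 0.5, "solvation": 0.5, "ref": -1.0}
print(weighted_score(terms, weights))  # → -10.0
```

Keeping the raw terms and the weights separate is what makes it possible to re-fit the weights without recomputing the terms.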

Scoring Function: Summary

How can it be improved?
• Feature Analysis Tool: improve parameters
• OptE: optimize weights

Feature Analysis: improve scoring term

Aim: similar distributions in crystal structures and models

e.g. hydrogen-bond distance H–O in Ser & Thr


After correction: distributions in native and model structures overlap

OptE: optimize weights

Score12 = w1*SLJatr + w2*SLJrep + w3*Ssolvation + w4*Shb(srbb+lrbb+sc) + w5*Sdunbrack + w6*Spair – Sref

Maximum likelihood parameter estimation

Benchmarks:
• Discriminate the ground state from alternative conformations
• Identify the correct side chain conformation
• Sequence recovery in design: choose the correct amino acid residue
• Predict the effect of point mutations on stability (ΔΔG)
• & more …

Aim: best score for the correct prediction

Representations of protein structure: Cartesian and polar coordinates

Position  PHI    PSI     OMEGA    CHI1    CHI2  CHI3  CHI4
1         0.00   -60.00  -180.00  -60.00  0.00  0.00  0.00
2         …
3         …

PDB x y z
ATOM 490 N GLN A 31 52.013 -87.359 -8.797 1.00 7.06 N
ATOM 491 CA GLN A 31 52.134 -87.762 -10.201 1.00 8.67 C
ATOM 492 C GLN A 31 51.726 -89.222 -10.343 1.00 10.90 C
ATOM 493 O GLN A 31 51.015 -89.601 -11.275 1.00 9.63 O
…

2 ways to represent the protein structure

Cartesian coordinates (x, y, z; PDB format)
+ Intuitive: look at molecules in space
+ Easy calculation of the energy score (based on atom-atom distances)
– Difficult to change the conformation of the structure (while keeping bond lengths and bond angles unchanged)

Polar coordinates (dihedral angles, with bond lengths and angles at equilibrium values)
+ Compact (3 values per residue)
+ Easy changes of protein structure (turn around one or more dihedral angles)
– Non-intuitive
– Difficult to evaluate the energy score (calculation of the neighbor matrix is complicated)

A snake in the 2D world

• Cartesian representation:
– points: (0,0), (1,1), (1,2), (2,2), (3,3)
– connections (predefined): 1-2, 2-3, 3-4, 4-5

A snake in the 2D world

• Internal coordinates:
– bond lengths (predefined): √2, 1, 1, √2
– angles: 45°, 90°, 0°, 45° (direction of each bond relative to the x-axis)

From Wikipedia

A snake wiggling in the 2D world

• Constraint: keep bond length fixed
• Move in Cartesian representation: (0,0), (1,1), (1,2), (2,2), (3,3) → (0,0), (1,1), (1,2), (2,2), (3,0)

Bond length changed! The last bond grows from √2 to √5.

A snake wiggling in the 2D world

• Constraint: keep bond length fixed
• Move in polar coordinates: 45°, 90°, 0°, 45° → 45°, 90°, 45°, 45°

Bond lengths unchanged! But a large impact on the structure: every point after the changed angle moves.
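Both snake moves can be replayed numerically. This sketch builds the snake from bond lengths and absolute bond directions, then applies the Cartesian move (drag the last point) and the polar move (change the third angle) and compares bond lengths; the helper names are hypothetical.

```python
import math

def polar_to_cartesian(lengths, angles_deg, start=(0.0, 0.0)):
    """Build 2D points from bond lengths and bond directions (degrees from the x-axis)."""
    pts = [start]
    for r, a in zip(lengths, angles_deg):
        x, y = pts[-1]
        t = math.radians(a)
        pts.append((x + r * math.cos(t), y + r * math.sin(t)))
    return pts

def bond_lengths(pts):
    return [math.dist(p, q) for p, q in zip(pts, pts[1:])]

s2 = math.sqrt(2)
snake = polar_to_cartesian([s2, 1, 1, s2], [45, 90, 0, 45])          # ≈ (0,0),(1,1),(1,2),(2,2),(3,3)

moved_cart = snake[:4] + [(3.0, 0.0)]                                 # Cartesian move
moved_polar = polar_to_cartesian([s2, 1, 1, s2], [45, 90, 45, 45])   # polar move

print([round(b, 3) for b in bond_lengths(moved_cart)])   # last bond is no longer sqrt(2)
print([round(b, 3) for b in bond_lengths(moved_polar)])  # all bond lengths preserved
```

The Cartesian move breaks the bond-length constraint, while the polar move keeps it but shifts both downstream points.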

Polar → Cartesian coordinates: convert r and θ to x and y (x = r cos θ, y = r sin θ, applied bond by bond)

Bond lengths √2, 1, 1, √2 and angles 45°, 90°, 0°, 45° → points (0,0), (1,1), (1,2), (2,2), (3,3)

From Wikipedia

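Cartesian → polar coordinates: convert x and y to r and θ (r = √(x² + y²), θ = atan2(y, x), applied to each bond vector)

Points (0,0), (1,1), (1,2), (2,2), (3,3) → angles 45°, 90°, 0°, 45°; bond lengths √2, 1, 1, √2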
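The Cartesian → polar direction for the snake, as a small sketch: take the difference vector of each consecutive pair of points and convert it with r = √(dx² + dy²) and θ = atan2(dy, dx). The function name is hypothetical.

```python
import math

def cartesian_to_polar(pts):
    """Convert consecutive 2D points into bond lengths and bond directions (degrees)."""
    lengths, angles = [], []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        dx, dy = x1 - x0, y1 - y0
        lengths.append(math.hypot(dx, dy))
        angles.append(math.degrees(math.atan2(dy, dx)))
    return lengths, angles

lengths, angles = cartesian_to_polar([(0, 0), (1, 1), (1, 2), (2, 2), (3, 3)])
print([round(r, 3) for r in lengths], [round(a, 1) for a in angles])
# → [1.414, 1.0, 1.0, 1.414] [45.0, 90.0, 0.0, 45.0]
```

atan2 is used instead of atan(dy/dx) so that vertical bonds (dx = 0) and bonds pointing left get the correct angle.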

Moving the snake to the 3D world

• Cartesian representation:
– points (additional z-axis): (0,0,0), (1,1,0), (1,2,0), (2,2,0), (3,3,0)
– connections (predefined): 1-2, 2-3, 3-4, 4-5
• Internal coordinates:
– bond lengths (predefined): √2, 1, 1, √2
– angles: 45°, 90°, 0°, 45°
– dihedral angles: 180°, 180°

Proteins: bond lengths and angles are fixed. Only dihedral angles are varied.

Dihedral angles

Dihedral angles χ1–χ4 define the side chain

From Wikipedia

• Dihedral angle: defines the geometry of 4 consecutive atoms (given bond lengths and angles)

What we learned from our snake

• Cartesian representation: easy to look at, difficult to move
– Moves do not preserve bond lengths (and angles in 3D)
• Internal coordinates: easy to move, difficult to see
– Calculation of distances between points is not trivial

Proteins: bond lengths and angles are fixed. Only dihedral angles are varied.

Solution: toggle between the two representations

• CALCULATE ENERGY – Cartesian coordinates: derive the distance matrix (neighbor list) for energy score calculation
• MOVE STRUCTURE – polar coordinates: introduce changes in the structure by rotating around dihedral angle(s) (changing their values)
• Transform polar → Cartesian: build positions in space according to the dihedral angles
• Transform Cartesian → polar: calculate dihedral angles from the coordinates

2D example: (0,0), (1,1), (1,2), (2,2), (3,3) ↔ 45°, 90°, 0°, 45°

Position  PHI     PSI     OMEGA    CHI1  CHI2  CHI3  CHI4
…
32        -59.00  -60.00  -180.00  0.00  0.00  0.00  0.00
33        …
34        …

PDB x y z
…
ATOM 490 C GLN A 31 52.013 -87.359 -8.797 1.00 7.06 N
ATOM 491 N GLY A 32 52.134 -87.762 -10.201 1.00 8.67 C
ATOM 492 CA GLY A 32 51.726 -89.222 -10.343 1.00 10.90 C
ATOM 493 O GLY A 32 51.015 -89.601 -11.275 1.00 9.63 O
…

How to calculate polar from Cartesian coordinates, example: φ (C′–N–Cα–C)
– Define the plane perpendicular to the N–Cα vector (b2)
– Project the C′–N bond (b1, taken in the N→C′ direction so that 0° means cis) and the Cα–C bond (b3) onto this plane
– Calculate the angle between the projections
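The projection recipe above translates directly into code. This sketch uses the common atan2 form, which also gives the sign of the angle (0° = cis, ±180° = trans); the helper names are hypothetical, and the test points reuse the 3D snake, whose dihedrals are 180°.

```python
import math

def _sub(a, b): return [a[i] - b[i] for i in range(3)]
def _dot(a, b): return sum(a[i] * b[i] for i in range(3))
def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]
def _unit(a):
    n = math.sqrt(_dot(a, a))
    return [c / n for c in a]

def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) of four consecutive atoms, e.g. phi = C'-N-CA-C."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    u = _unit(b2)                                         # central bond direction
    back = [-c for c in b1]                               # p1 - p2 (b1 reversed)
    v = _sub(back, [_dot(back, u) * c for c in u])        # projection onto the plane
    w = _sub(b3, [_dot(b3, u) * c for c in u])            # projection onto the plane
    x = _dot(v, w)                                        # cos component
    y = _dot(_cross(u, v), w)                             # sin component (gives the sign)
    return math.degrees(math.atan2(y, x))

print(dihedral((0, 0, 0), (1, 1, 0), (1, 2, 0), (2, 2, 0)))  # first dihedral of the 3D snake
```

Using atan2 of the two components avoids the ambiguity of acos, which cannot distinguish +60° from −60°.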

Polar → Cartesian coordinates

Find the x, y, z coordinates of C, based on the atom positions of C′, N and Cα, and a given value of φ (C′–N–Cα–C):
• Create the Cα–C vector:
– length Cα–C = 1.51 Å (equilibrium bond length)
– angle N–Cα–C = 111° (equilibrium value for the N–Cα–C angle)
• Rotate the vector around the N–Cα axis until the projections of Cα–C and N–C′ enclose the wanted φ
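The same construction can be written in closed form: set up a local frame on the three known atoms, express the new atom in that frame from its bond length, bond angle, and dihedral, and transform back. This sketch follows the widely used "natural extension reference frame" (NeRF) formulation; the input coordinates for C′, N and Cα are made up, and 1.51 Å / 111° are the slide's values.

```python
import math

def _sub(a, b): return [a[i] - b[i] for i in range(3)]
def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]
def _unit(a):
    n = math.sqrt(sum(c * c for c in a))
    return [c / n for c in a]

def place_atom(a, b, c, bond_length, bond_angle_deg, dihedral_deg):
    """Place atom d so that |c-d| = bond_length, angle(b, c, d) = bond_angle and
    dihedral(a, b, c, d) = dihedral."""
    theta, phi = math.radians(bond_angle_deg), math.radians(dihedral_deg)
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))      # normal of the a-b-c plane
    m = _cross(n, bc)                      # completes the right-handed local frame
    d = (-bond_length * math.cos(theta),
         bond_length * math.sin(theta) * math.cos(phi),
         bond_length * math.sin(theta) * math.sin(phi))
    return [c[i] + d[0] * bc[i] + d[1] * m[i] + d[2] * n[i] for i in range(3)]

# Place C from C', N and CA with CA-C = 1.51 A, N-CA-C = 111 deg, phi = -60 deg.
c_prime, n_atom, ca = [0.0, 0.0, 0.0], [1.33, 0.0, 0.0], [2.0, 1.2, 0.0]
c_atom = place_atom(c_prime, n_atom, ca, 1.51, 111.0, -60.0)
print([round(x, 3) for x in c_atom])
```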


Representation of protein structure

Rosetta folding: 3 backbone dihedral angles per residue (φ, ψ, ω)

Sampling and minimization in TORSIONAL space: change an angle and rebuild, starting from the changed angle.

Build the coordinates of the structure starting from the first atom, according to the dihedral angles (and equilibrium bond lengths and angles).
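Building a chain atom by atom from dihedrals also shows why a torsion move is local in angle space but global in Cartesian space: changing one dihedral leaves everything upstream untouched and moves every atom built after it. This is a toy chain with uniform, hypothetical bond lengths (1.5 Å) and angles (110°), not real backbone geometry.

```python
import math

def _sub(a, b): return [a[i] - b[i] for i in range(3)]
def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]
def _unit(a):
    n = math.sqrt(sum(c * c for c in a))
    return [c / n for c in a]

def place_atom(a, b, c, length, angle_deg, dihedral_deg):
    """Place atom d from a, b, c given |cd|, angle(b, c, d) and dihedral(a, b, c, d)."""
    t, p = math.radians(angle_deg), math.radians(dihedral_deg)
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))
    m = _cross(n, bc)
    d = (-length * math.cos(t), length * math.sin(t) * math.cos(p), length * math.sin(t) * math.sin(p))
    return [c[i] + d[0] * bc[i] + d[1] * m[i] + d[2] * n[i] for i in range(3)]

def build_chain(dihedrals, length=1.5, angle_deg=110.0):
    """Build len(dihedrals) + 3 atoms from the first atom on, with fixed bond lengths and angles.
    Atom i (i >= 3) is placed from atoms i-3, i-2, i-1 using dihedrals[i - 3]."""
    t = math.radians(angle_deg)
    atoms = [[0.0, 0.0, 0.0], [length, 0.0, 0.0],
             [length - length * math.cos(t), length * math.sin(t), 0.0]]
    for dih in dihedrals:
        atoms.append(place_atom(atoms[-3], atoms[-2], atoms[-1], length, angle_deg, dih))
    return atoms

before = build_chain([180.0, 180.0, -60.0, 180.0, 60.0])
after = build_chain([180.0, 180.0, 90.0, 180.0, 60.0])     # only the third dihedral changed
moved = [i for i, (p, q) in enumerate(zip(before, after)) if math.dist(p, q) > 1e-9]
print(moved)  # → [5, 6, 7]
```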


Based on slides by Chu Wang

Rosetta folding
• 3 backbone dihedral angles per residue
• Sampling and minimization in TORSIONAL space

Rosetta docking
• Backbone dihedral angles fixed (rigid body)
• 6 rigid-body DOFs: 3 translations, 3 rotation angles
• Sampling and minimization in RIGID-BODY space
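A rigid-body move applies the same rotation and translation to every atom of one partner, so all internal distances are preserved. The sketch parameterizes the 6 DOFs as three Euler angles (one of several possible conventions, chosen here as an assumption) plus a translation, rotating about the partner's centroid; the coordinates are made up.

```python
import math

def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def rotation_matrix(ax_deg, ay_deg, az_deg):
    """Rotation about x, then y, then z (R = Rz @ Ry @ Rx)."""
    ax, ay, az = (math.radians(a) for a in (ax_deg, ay_deg, az_deg))
    rx = [[1, 0, 0], [0, math.cos(ax), -math.sin(ax)], [0, math.sin(ax), math.cos(ax)]]
    ry = [[math.cos(ay), 0, math.sin(ay)], [0, 1, 0], [-math.sin(ay), 0, math.cos(ay)]]
    rz = [[math.cos(az), -math.sin(az), 0], [math.sin(az), math.cos(az), 0], [0, 0, 1]]
    return _matmul(rz, _matmul(ry, rx))

def centroid(coords):
    return [sum(p[k] for p in coords) / len(coords) for k in range(3)]

def rigid_body_move(coords, angles_deg, translation):
    """Apply the 6 rigid-body DOFs to one partner: rotate about its centroid, then translate."""
    r = rotation_matrix(*angles_deg)
    c = centroid(coords)
    moved = []
    for p in coords:
        d = [p[k] - c[k] for k in range(3)]
        rd = [sum(r[i][k] * d[k] for k in range(3)) for i in range(3)]
        moved.append([c[i] + rd[i] + translation[i] for i in range(3)])
    return moved

# Made-up coordinates for a small docking partner.
partner = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 1.0]]
shift = [5.0, 0.0, -2.0]
moved = rigid_body_move(partner, (30.0, -45.0, 90.0), shift)
print([[round(x, 3) for x in p] for p in moved])
```

Rotating about the centroid keeps the three rotation angles and the three translations roughly independent, which tends to make sampling in this space better behaved.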

How can these two types of degrees of freedom be combined?

Fold tree representation
• “Long-range” edge (jump): 6 rigid-body DOFs
• “Peptide” edge: 3 backbone dihedral angles per residue

Example: fold-tree-based docking

Originally developed to improve sampling of strand registers in β-sheet proteins. Allows simultaneous optimization of rigid-body and backbone/side-chain torsional degrees of freedom.

Fold tree: Bradley and Baker, Proteins (2006)


Construct fold-trees to treat a variety of protein folding and docking problems.
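The key property of a fold tree is that every degree of freedom (a torsion on a peptide edge or the 6 DOFs of a jump) moves exactly the part of the tree downstream of it. A toy, residue-level sketch of that bookkeeping for a two-chain docking tree; the edge tuples and the labels "peptide"/"jump1" are hypothetical and are not Rosetta's FoldTree API.

```python
def build_children(edges):
    """Map each residue to the residues directly downstream of it.
    edges: (start, stop, kind), kind = 'peptide' or a jump label. Peptide edges cover
    every residue between start and stop, walking away from the root."""
    children = {}
    for start, stop, kind in edges:
        if kind == "peptide":
            step = 1 if stop > start else -1
            path = list(range(start, stop + step, step))
            for a, b in zip(path, path[1:]):
                children.setdefault(a, []).append(b)
        else:  # jump: one rigid-body connection between two anchor residues
            children.setdefault(start, []).append(stop)
    return children

def downstream(children, residue):
    """All residues that move when the DOF leading into `residue` is changed."""
    out, stack = [], [residue]
    while stack:
        r = stack.pop()
        out.append(r)
        stack.extend(children.get(r, []))
    return sorted(out)

# Hypothetical docking tree: chain A = residues 1-4, chain B = residues 5-8,
# root at residue 2, jump from residue 2 (chain A) to residue 6 (chain B).
edges = [(2, 1, "peptide"), (2, 4, "peptide"), (2, 6, "jump1"),
         (6, 5, "peptide"), (6, 8, "peptide")]
children = build_children(edges)
print(downstream(children, 6))  # moved by the jump: all of chain B
print(downstream(children, 3))  # moved by the torsions between residues 2 and 3
```

Changing the jump moves chain B as a rigid body while chain A stays put, and a torsion change in chain A never reaches chain B, which is what lets rigid-body and torsional DOFs be optimized together.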

Fold-trees for different modeling tasks: protein folding (N → C)

N: N-terminus; C: C-terminus; X: chain break; O: root of the tree

Edge types: flexible “peptide” edge; rigid “peptide” edge; rigid “jump” (1 → 1′); flexible “jump” (1 → 1′)

Color: flexible backbone; gray: fixed backbone

Fold-trees for different modeling tasks (figure panels): loop modeling; fully flexible docking; docking with hinge motion; docking with loop modeling

Legend:
• Color: flexible backbone; gray: fixed backbone; pale: symmetry operation
• Filled colored circles: flexible side chains
• Empty colored circles: flexible amino acid (design)

Rosetta3: Object-oriented architecture

Description of object-oriented organization in Rosetta3: Leaver-Fay et al. Methods in Enzymology (2013)

The Rosetta sampling strategy: A general overview
