10
1 Protein-Protein Docking Jeffrey J. Gray Protein Bioinformatics Lecture, May 2005 PROGRAM IN Molecular Biophysics Biomolecular & Nanoscale Modeling Lab Jeffrey J. Gray, Ph.D. – [email protected] – http://graylab.jhu.edu Chemical & Biomolecular Engineering and Program in Molecular & Computational Biophysics 1. Protein-Protein Docking 5. Protein-Surface Interactions 2. Therapeutic Antibodies 3. Proteome Docking Predictions 4. Allostery / Intramolecular Signal Transduction Laminin prediction/experiment Nidogen Anthrax toxin PA4 FAb 14B7 Outline • Motivation Docking Methods Results / Evaluation of Method Blind Prediction Challenge Recent Work: Flexibility & Ensembles Goal: Demonstrate Current Methodologies & Capabilities in Protein-Protein Docking Cellular Function Depends on Protein-Protein Interaction Signaling Regulation Recognition Enzymes/inhibitors Antibodies/antigens Faulty interactions result in diseases nSec1 + syntaxin1a Protein docking tests our fundamental knowledge of biomolecular physics Conformational space Free energy functions Water (solvation) Hydrogen bonding Van der Waals – Electrostatics N O O O N O N O N N O ... ... Computational protein docking could help elucidate biological molecular interactions on a genomic scale Uetz, Fields, Rothberg et al. Nature 2000

PROGRAM IN Molecular Biophysics - Biostatistics

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: PROGRAM IN Molecular Biophysics - Biostatistics

1

Protein-Protein Docking

Jeffrey J. GrayProtein Bioinformatics Lecture, May 2005

PROGRAM IN

Molecular Biophysics

Biomolecular & Nanoscale Modeling LabJeffrey J. Gray, Ph.D. – [email protected] – http://graylab.jhu.edu

Chemical & Biomolecular Engineering and Program in Molecular & Computational Biophysics

1. Protein-Protein Docking

5. Protein-Surface Interactions

2. Therapeutic Antibodies

3. Proteome Docking Predictions

4. Allostery / Intramolecular Signal Transduction

Lamininprediction/experiment

Nidogen

Anthrax toxin PA4

FAb14B7

Outline

• Motivation• Docking Methods• Results / Evaluation of Method• Blind Prediction Challenge• Recent Work: Flexibility & Ensembles

Goal: Demonstrate Current Methodologies & Capabilities in Protein-Protein Docking

Cellular Function Depends on Protein-Protein Interaction

– Signaling– Regulation– Recognition– Enzymes/inhibitors– Antibodies/antigens

Faulty interactions result in diseases

nSec1 + syntaxin1a

Protein docking tests our fundamental knowledge of biomolecular physics• Conformational space• Free energy functions

– Water (solvation)– Hydrogen bonding– Van der Waals– Electrostatics

N

O

O

ON

O

N

O

NN

O

...

...

Computational protein docking could help elucidate biological molecular interactions on a genomic scale

Uetz, Fields, Rothberg et al. Nature 2000

Page 2: PROGRAM IN Molecular Biophysics - Biostatistics

2

Recent data (2004): DrosophilaJoel Bader/Curagen (now at JHU BME)

Giot, Bader et al. Science 2003

Draft: 7048 proteins, 20,405 interactions High confidence: 4679 proteins, 4780 interactions

Protein docking studies may teach us how to design complex devices capable of assemblingthemselves from nanoscopic (macromolecular) components

FTDOCK: Fourier-Transform Docking (Rigid Body)

• Discretize the protein shape:

• And correlate the functions:

• l,m,n,α,β,γ → N6

⎪⎩

⎪⎨

⎧−==

moleculeof outside ,moleculeof core ,

moleculeof surface ,

015

1

,, ρnmla

⎪⎩

⎪⎨

⎧==

moleculeof outside ,moleculeof core ,moleculeof surface ,

01

1

,, δnmlb

∑∑∑ +++⋅=l m n

nmlnml bac γβαγβα ,,,,,,

Ackermann 1998

Katchalski-Katzir, Shariv, Eisenstein, Friesem, Afalo & Vakser 1992

FTDOCK

• Use a Discrete Fourier Transform

• Mulitply in Fourier Space:

• Invert:

• DFT → N3 ln N3

• Then, search over rotation space:

( ) nmll m n

qpo xqnpmolN

iX ,,,,2exp ⋅⎥⎦

⎤⎢⎣⎡ ++−

= ∑∑∑ π

qpoqpoqpo BAC ,,*

,,,, ⋅=

( ) qpoo p q

CqpoN

iN

c ,,3,,2exp1

⋅⎥⎦⎤

⎢⎣⎡ ++= ∑∑∑ γβαπ

γβα

{ }ψφθ ,,

Katchalski-Katzir et al., 1992

FT Docking Solutions

Vakser et al.PNAS 1999

A wide variety of methods have been developed since Katchalski-Katzir

• FFT/Grid (Eisenstein, Sternberg, Weng, Ten Eyck)

• Computer vision / matching knobs & holes / geometric hashing (Wolfson, Nussinov, Norel)

• Electrostatic and VdW filters (Weng, Camacho, Sternberg, Ten Eyck, many others)

• Spherical harmonic shape representations (Ritchie)

• Genetic Algorithm (Gardiner)

• MD (Mustard, Bates) and Minimization (many)

• NMR + docking (Bonvin)

• Residue conservation and co-variance/hotspots (Valencia, Kaznessis)

• Biological information (Sternberg, many others)

• Monte Carlo with physical potentials (Abagyan, US!)

Page 3: PROGRAM IN Molecular Biophysics - Biostatistics

3

Protein Docking is Difficult!• Proteins can be large (50-1000+ residues = 500-10,000+ atoms)• Interactions mediated by water• Proteins are flexible

– Backbone– Side chains

• Ions can be present• Proteins can be post-translationally modified• Environment is crowded

(other proteins, lipids, membranes, nucleic acids…)• Multi-protein interactions (chaperones) could be important

N

O

O

ONO

N

O

NN

O

...

...

Need to simplify!!

Our Approach to Modeling Proteins• Model physical forces when possible:

van der Waals, solvation, hydrogen bonding, electrostatics, …

• Use statistics from the Protein Data Bank to compensate for poor physical models

• Generate large numbers of plausible decoys• Model only necessary degrees of freedom• Employ multi-scale models for both breadth of

search and accuracy of discrimination

Although the problem is tremendously complex,we believe that simple fundamental principles will emerge…

Random Start Position

Low-Resolution Monte Carlo Search

High-Resolution Refinement

105 Clustering

Predictions

RosettaDock AlgorithmOverview

(Gray 2003 JMB)

Low-Resolution Search

• Monte Carlo Search• Rigid body translations

and rotations• Residue-scale interaction

potentials

Protein representation:backbone atoms + average centroids

N

O

OO

N

O

NO

N

N

O

......

Low-Resolution Search

• Monte Carlo Search• Rigid body translations

and rotations• Residue-scale interaction

potentials

Protein representation:backbone atoms + average centroids

N

O

OO

N

O

NO

N

N

O

......

• Mimics physicaldiffusion process

Residue-scale scoring

(biochemical)

(bioinformatic)

Hydrogen bonding electrostatics,

solvation

Solvation

Repulsive van der Waals

Attractive van der Waals

Physical Force

(r – Rij)2Bumps

variesConstraints

-1 for interface residues in Antibody CDRAlignment

-ln(Pij)Residue pair

-ln(Penv)

rcentroid-centroid < 6 Å

Representation

Residue environment

Contacts

Score

Page 4: PROGRAM IN Molecular Biophysics - Biostatistics

4

Repack Side Chains

Rigid-Body Minimization

Filter

High-ResolutionRefinement

Small Rigid-Body Move

50x

Reject

Clustering

Monte Carlo Accept?

Low-Resolution Decoy

• Simultaneous rigid-bodyand side-chain refinement

Side Chain Packing• Build amino acid side chains

– Choose side chains from Dunbrack’sbackbone-dependent rotamer library

– Vary χ1, χ2, χ3, χ4 angles– Minimize a full-atom energy function w.r.t. all

rotamer combinations– With strict VdW parameters, extra angles are

necessary (Chu Wang)

Phenylalanine rotamers(Richardson, 2000)

(Brian Kuhlman & David Baker, Nature Struct. Biol. 2001)

Minimization

• Full atom rigid-body minimization– Use a conjugate-gradient search to find the

local score minimum relative to a rigid body translation and rotation

Refinement Cycle• Simultaneous rigid-body displacement

and side chain minimization

START

randomperturbation

repack

minimization

FINISH

Full-Atom scoring

0.4-15.1 (LR rep)Coulomb model with simple chargesElectrostatics6.9Empirical, Kuhlman & Baker 2000Residue pair probability

14.9 & 6.8 (BB/BB)Empirical, Kortemme et al. 2003Hydrogen bonding19.6Dunbrack & Cohen, 1997Rotamer probability27.2Lazaridis & Karplus, 1999Gaussian solvent-exclusion

28.5Surface area (see Tsai 2003)Surface area solvation45.0Lennard-Jones 6-12Attractive van der Waals 73.0Modified Lennard-Jones 6-12Repulsive van der Waals

Discriminatoryz-valueForm / SourceScore

3.28.3

15.10.4

0.0250.0250.0980.0020

0.0250.0250.0980.0020

----

Simple electrostaticsShort-range repulsiveShort-range attractiveLong-range repulsiveLong-range attractive

6.90.1640.1640.66Residue pair probability

14.96.8

0.4410.4412.1Hydrogen bondingSC/SC + SC/BBBB/BB

19.60.0690.0690.79Rotamer probability27.20.2790.2790.80Gaussian solvent-exclusion28.50.344--Surface area solvation45.00.3380.3380.80Attractive van der Waals

73.00.080.3380.80Repulsive van der Waals z-valueWeight (D)Weight (M)Weight (P)Score

Scoring Weights

Page 5: PROGRAM IN Molecular Biophysics - Biostatistics

5

Hydrogen Bonding Energy

Based on statistics from high-resolution structures in the Protein Data Bank (rcsb.org)

lnG kT P∆ = −

(Kortemme 2003 JMB)

-60

-50

-40

-30

-20

-10

0

-14 -12 -10 -8 -6 -4 -2 0

-log Ka (experimental)

∆ s

core

(cal

cula

ted)

Score correlates with Binding Energy

Filled symbols – targets with funnelsOpen symbols – targets without funnels ∆ score for bound backbone docking

Clustering• Compare all top-scoring decoys pairwise

• Cluster decoys hierarchically

• Decoys within 2.5Å form a cluster

1/ 2

2

ligand

( )i ixS yRM D⎡ ⎤⎢ ⎥= −⎢ ⎥⎢ ⎥⎣ ⎦∑

I1−

9

I1−

4

I1−

5

I2−

8

I1−

3

I1−

7

I1−

8

I2−

7

I2−

4

I1−

0

I2−

0

I1−

2

I2−

3 I2−

9

I1−

1

I2−

2

I1−

6

I2−

1

I2−

5

I2−

6

05

10

15

20

25

30

He

igh

t

RepresentsENTROPY

IBM BladeCenterSupercomputing Facility

60 CPUs0.5 TB storage1.5 GB RAM/node1 GB network

Capable of producing~105 protein structures/day

Benchmark Studies

• Bound-Bound– Start with bound complex

structure, but remove the side chain configurations so they must be predicted

• Unbound-Unbound– Start with the individually-

crystallized component proteins in their unbound conformation

Benchmark set contains 54 targets for which bound and unbound structures are known

http://zlab.bu.edu/~rong/dock/benchmark.shtml

• Bound-Unbound (Semibound)

Binding Funnels

0 10 20 30 40 50−59

0−

580

−57

0−

560

−55

0

1AHW (norepack)

rms

scor

e

0 10 20 30 40−61

0−

590

−57

0−

550

1AHW (bound)

rms

scor

e

10 20 30 40 50

−62

0−

610

−60

0−

590

1AHW (unbound)

rms

scor

e

0 10 30

−10

80−

1020

1FSS (bound)

rms

scor

e

0 10 20 30 40−10

00−

970

1FSS (unbound)

rms

scor

e

Acetylcholinesterase / Fasciculin II

Antibody Fab 5G9 / Tissue FactorDecoys: graylab.jhu.edu

Page 6: PROGRAM IN Molecular Biophysics - Biostatistics

6

trypsin + inhibitor barnase + barstarα-chymotrypsin

+ inhibitorsubtilisin + inhibitor

lysozyme + antibodieshemagglutinin

+ antibody

actin + deoxyribonuclease I subtilisin + prosegment

Benchmark Results

28/5432/5442/54TOTAL0/60/66/6Difficult3/105/105/10Other8/169/1610/16Antibody/Antigen17/2218/2221/22Enzyme/Inhibitor

Global Searches

Unbound Perturbations

BoundPerturbations

Number of successful dockings, starting from either bound or unbound protein backbones and searching either near the native structure or globally.

Benchmark set assembled by R. Chen et al., see Proteins 2003

ZDOCK / RosettaDock(Vajda & Camacho 2004)

CAPRI: Critical Assessment of PRotein Interactions

• International Blind Prediction Challenge• 20-25 Participating Research Groups• Organized by Janin, Wodak, Sternberg

– Rounds 1-2: 2001-2002 (T1-7)– Rounds 3-5: 2003-2004 (T8-19)– Round 6: January 2005 (T20)– Round 7: NOW! May 9-22, 2005 (T21)

• See Janin et al. 2003 Proteins 52:2 and Mendez et al. 2003 Proteins 52:51.

Hemagglutinin+ Antibody

α-amylase + Camelid Antibody

T-Cell Receptor + Strep. pyrogenic

exotoxin A (superantigen)

• α-amylase + VHH, model #1:– 48/65 contacts, distance 1.33Å, rotation 3º, rmsd 1.5Å

Blue—Native αAGreen—Predicted αA

Orange—VHH

Xtal by C. Cambillau, CNRS

Target 6 (Round 2, Mar 2002) Target 8 (Round 3, M. Daily, Jan 2003)• Laminin + Nidogen, model #2:

– 53% contacts, rmsd 4.6 Å, interface rmsd 0.66 Å

Red—Native lamininGreen—Predicted laminin

Blue—Nidogen

Xtal by T. Springer, Harvard

D800, N802, V804 constrained near interface

Page 7: PROGRAM IN Molecular Biophysics - Biostatistics

7

Docking a Homology Model (Round 4, Sep 2003)

Xtal by Romao, Carvalho, Fontes et al., Lisbon

CAPRI T11/12: Cohesin + DockerinModel #6 (T11): 42% contacts, 6.1 Å rmsd, 1.9 Å interface rmsd• Dockerin coordinates modeled by homology via the Robetta server• RosettaDock produced the best model by correct contacts

Red—PredictionGreen—Experimental structure

Prediction using homology model of dockerin

Prediction using bound coordinates of dockerin

Prediction by Mike DailyMethods in Gray et al. 2003 JMB

Target 19: prion + Fab, model #264% contacts, rmsd 3.64 Å, interface rmsd 1.27 Å

Red—Predicted prionGreen—Actual prionBlue—Fab

Prion constructed manually from a 95% identical homologue

Targets 4 and 5 (Round 2)• α-amylase + VHH

– Incorrectly assumed binding occurs at CDRs

Blue—Native VHHGreen—Predicted VHH

Red—Native α-amylaseYellow—Predicted α-amylase

Xtal by C. Cambillau et al., CNRS

Target 7 – “Homology Target”• Streptococcal pyrogenic exotoxin A (superantigen) + T Cell Receptor β

chain– Predicted by overlaying 1SBB using Mastodon– Model #1: 22/37 contacts, distance 3.6Å, rotation 11º– Refinement did not improve model

Blue—Native SAgGreen—Predicted SAg

Red—Native TCRYellow—Predicted TCR

Xtal by Ray Mariuzza et al., NIST

RosettaDock correctly predicts binding sites in 6/10 non-difficult targets

Standard targets; homology targets; not submittedNP: not predicted

1146U-UTBEV envelope trimer10412U-ULicT homodimer9

-NPNPNPNP600U-Bmypt1-PP114*11.648.130.147575H-UGH 10 xylanase – XIP16*-NPNPNPNP552U-BGH11 xylanase - TAXI18-NPNPNPNP474U-Bsag1-fab13-8.7812.910.075464H-UGH11 xylanase - XIP17*

***0.664.630.532427U-BLaminin-nidogen8***1.273.640.642312H-BOvine prion – fab19**1.936.110.425196U-HCohesin-dockerin11***0.510.990.871196U-BCohesin-dockerin12***0.2430.5470.887194BB-BBColicin D – immD15*

Acc.I_rmsdL_rmsdFnatModelNresTypeComplexTarget

Many Docking Players (Vajda/Camacho 2004)

Page 8: PROGRAM IN Molecular Biophysics - Biostatistics

8

CAPRI Submissions (Mendez 2003)RosettaDock Assumptions

• Rigid protein backbones• Side chains in rotamer conformations• Native structure is minimum (free) energy• Entropy captured by clustering or convergence

compensates for poor energy model• Energy functions!

– Linearly separable– Choice of contributions– Parameters…

What RosettaDock study tells us about Proteins

• Packing dominates free energy• Solvation, hydrogen bonding also

important• Electrostatics not important?• Energy function is closer to correct than

past models• A short list of probable best docking

structures

What it doesn’t tell you about Proteins

• THE energy function• Unambiguously the “best” conformation• How specificity is achieved• Binding affinities

Side chain movement(Camacho 2004 PNAS)

• Most side chains do not change rotameric conformation upon binding (Weng)

• “Anchor” residue = deeply buried residue at center of interface, usually no conformational change

• “Latch” residue = peripheral interface residue, moves upon binding

Copyright ©2001 by the National Academy of Sciences

Camacho, Carlos J. and Vajda, Sandor (2001) Proc. Natl. Acad. Sci. USA 98, 10636-10641

Fig. 1. Shapes of the binding free energy landscape as a function of some arbitrary coordinate measuring the rmsd from the native conformation

Fig. 2. Binding free energy funnel

Page 9: PROGRAM IN Molecular Biophysics - Biostatistics

9

Three-step mechanismGrünberg, Leckner & Nilges 2004 Structure

I. diffusionII. free conformer

selection (recognition)III. induced fit (refolding)

Docking EnsemblesGrünberg, Leckner & Nilges 2004 Structure

• Sampled monomer conformations by MD and by PCR-MD (Principal Component Restrained)

• Greatly increased sampling near-native

• Model:

(similar ideas by Smith, Sternberg & Bates 2005)

Loop Flexibility

• Currently exploring ways of moving loops during protein-protein docking to simulate an induced fit binding mechanism

Rohl, CA et al 2004 to appear

Target 1: HPr + HPr Kinase: (Round 1,Sep 2001)

• Model #8 among the closest:

2/52 contactsdistance 2.6Årotation 55ºRMSD 8.8Å

Distance constraint between Ser157C and Asp46A

Xtal by Fieulaine et al., CNRS

HPr

HPr Kinase III

HPr Kinase I

HPr Kinase II

C-terminal helix α4

Kinase I

HPr

No energy funnel for binding the unbound components

scor

e

HPr rmsd (Å)

Backbone Conformational ChangeCAPRI T01: HPr + HPr Kinase (Round 1, Sep 2001)

Terminal helix swings upon docking, nuzzling HPr in a pocket

Torsion Angle Perturbation

Res 290-292

C-terminal helix α4

Kinase I

Torsion angle movement in residues 290-292 would allow the correct conformation to be observed.

Page 10: PROGRAM IN Molecular Biophysics - Biostatistics

10

Helix Minimization

Low-ResolutionSearch

With Flexible Backbone

Small helix perturb

Monte Carlo Accept?

10X50

Initial helix perturb

Initial rigid body perturb

Low resolution output decoy

Rigid-body move

Helix MC perturb

Kinase I

Kinase II

C-terminal helix α4

HPR

scor

e

HPr rmsd

Flexible Docking ResultsWith torsion angle perturbations and explicit minimizations

rmsd=2.0Å

unbound rmsd=5.8Å

helix rmsd

scor

e

18/36 contacts, translation 1.8Å, rotation 18º

Docking into EM maps(Aloy, Bork, Serrano, Russell, et al. Science 2004) Summary

• A variety of protein-protein docking techniques have been developed combining advanced techniques in applied mathematics and biophysics

• Benchmark and CAPRI performance is encouraging – but work remains

• Significant challenges persist in sampling (particularly for flexible backbones and large targets) and correction of the energy function

• RosettaDock Software & Decoys: – graylab.jhu.edu– Gray et al., JMB 331:281, 2003– Gray et al., Proteins 52:118, 2003

Recommended References1. “Protein-protein docking with simultaneous optimization

of rigid-body displacement and side-chain conformations,” Gray, Moughon, Wang, Schueler-Furman, Kuhlman, Rohl & Baker, J. Mol. Biol. 2003 331, 281-299.

2. “Complementarity of structure ensembles in protein-protein binding,” Grunberg, Leckner & Nilges, Structure2004 12, 2125-2136.

3. “Prediction of protein-protein interactions by docking methods,” Smith & Sternberg, Curr. Op. Struct. Biol. 2002 12, 36-40.

4. “Assessment of blind predictions of protein-protein interactions: current status of docking methods,”Mendez, Leplae, De Maria & Wodak, Proteins 2003 52, 51-67.