Lecture 13 Structural Aspects of Protein-Protein Interactions PHAR 201/Bioinformatics I Philip E. Bourne Department of Pharmacology, UCSD Slides modified

Lecture 13Structural Aspects of

Protein-Protein Interactions

PHAR 201/Bioinformatics I

Philip E. BourneDepartment of Pharmacology, UCSD

Slides modified from JoLan Chung

Agenda

• Understand the importance of studying protein-protein interactions at the structural level

• Classify the various types of interactions

• Look in detail at one structure-based method for predicting protein-protein interactions

LINKStill seems the best review

http://www.nature.com/emboj/journal/v22/n14/abs/7590769a.pdf

Types of protein-protein interactions (PPI)

Obligate PPI

the protomers are not found as stable

structures on their own in vivo

Non-obligate PPI

Obligate homodimer

P22 Arc repressor

DNA-binding

Obligate heterodimer

Human cathepsin D

1LYB

Non-obligate homodimer

Sperm lysin

Non-obligate heterodimer

RhoA and RhoGAP signaling complex

http://pdbbeta.rcsb.org/pdb/explore.do?structureId=1lyb


Obligate PPI

usually permanent

the protomers are not found as stable structures on their

own in vivo

Non-obligate PPI


Human cathepsin D

Non-obligate transient homodimer, Sperm lysin (interaction is broken and

formed continuously)

Permanent

(many enzyme-inhibitor complexes)

dissociation constant Kd=[A][B] / [AB]

10-7 - 10-13 M

Transient

Weak

(electron transport

complexes)

Kd mM-M

Non-obligate permanent

heterodimer

Thrombin and rodniin inhibitor

Intermediate

(antibody-antigen, TCR-MHC-peptide, signal transduction PPI), Kd M-nM

Strong

(require a molecular trigger to shift the

oligomeric equilibrium)

Kd nM-fM

Bovine G protein dissociates into G and G subunits upon GTP, but forms a stable trimer upon GDP


Obligate PPI

usually permanent

the protomers are not found as stable structures on their

own in vivo

Non-obligate PPI


Human cathepsin D

Non-obligate transient homodimer, Sperm lysin (interaction is broken and

formed continuously)

Permanent

(many enzyme-inhibitor complexes)

dissociation constant Kd=[A][B] / [AB]

10-7 ÷ 10-13 M

Transient

Weak

(electron transport

complexes)

Kd mM-M

Non-obligate permanent

heterodimer

Thrombin and rodniin inhibitor

Intermediate

(antibody-antigen, TCR-MHC-peptide, signal transduction PPI), Kd M-nM

Strong

(require a molecular trigger to shift the

oligomeric equilibrium)

Kd nM-fM

Bovine G protein dissociates into G and G subunits upon GTP, but forms a stable trimer upon GDP

One Approach Using Structural Bioinformatics: Exploiting Using Sequence and Structure

Homologs to Identify Protein-Protein Binding Sites

Work of Jo-Lan Chung (Former Graduate Student Chemistry/Biochemistry)

Co-mentored by Wei Wang

Proteins: Structure Function and Bioinformatics 2006 62:630-640 [PDF]

http://www.sdsc.edu/pb/papers/chung05.pdf

Current Situation

• We have a relatively small number of protein complexes

• We have a large number of predominantly apo form structures being determined by structural genomics and functionally driven structure determination

• Exploiting the information obtained from these apo structures to identify functional sites, such as protein-protein binding sites, is an important question.

General Properties of Protein-Protein Binding Sites

• More hydrophobic than the rest of the protein surface.

• Relatively flat (except for enzyme-substrate binding sites).

• The binding free energy is not distributed equally across the protein interfaces: a small subset of residues at the interfaces forms energy hot spots, enriched in tyrosine, tryptophan, and arginine.

• Structurally conserved residues, especially polar resides, correspond to the energy hot spots.

Previous Methods to Identify Protein-Protein Binding Sites

• Sequence conservation e.g. Consurf • Docking• Threading and homology modeling• Evolutionary tracing• Correlated mutations• Properties of patches• Hydrophobicity• Neural networks and support vector machines

http://consurf.tau.ac.il/

• The above mentioned methods use evolutionary information from sequence alignments, and/or residue properties, and/or the geometric information from structures.

• It is difficult to judge the predictive powers of these various methods due to the differences of learning sets, the different definitions of binding sites and the lack of benchmarks.

Characteristics of Previous Methods

• None of the above methods consider the residues which are spatially conserved on the surfaces among their structure homologs.

• These residues are reported to have a correspondence to the energy hot spots on protein interfaces and can be derived from multiple structure alignments.

Structurally Conserved Surface Residues?

Approach

• Survey the structurally conserved surface residues in the non-redundant chains of hetero complexes from the Protein Data Bank (PDB).

• Incorporate the structurally conserved surface residues with the information derived from sequence alignments and single structures to identify protein-protein binding sites.

Dataset• All non-redundant hetero-complexes were

collected from the PDB (<30% sequence identity)

• A pair of chains was then selected if the reduction of the accessible solvent area (ASA) was at least 450 Å2

upon binding. • The pairs containing chains belong to SCOP class >=

8 (9 and 10 did not exist) or with length less than 80 amino acids were discarded

Dataset

• Retrieve structure homologs at SCOP family level from Astral database at 40% sequence purge level.

• Align each chain with its homologs with CEMC. • A chain was selected if at least 4 members in its alignment were

aligned together over 60% of its length and with Z score > 4. • Discard the chains with less than 20 interfacial contacts or with

constant B-factors.

The final dataset was composed of 274 non-redundant chains of

hetero-complexes. Each of these chains was accompanied with a

structure alignment with at least 4 members.

Derive the Structurally Conserved Residues

• The structural conservation scores were derived from the multiple structural alignments.

• Each position in the alignment has a structural conservation score, which represents the conservation in 3D space.

• A position has a high conservation score if the aligned residues are spatially conserved.

The Structural Conservation Score

• Raw structural conservation score

where

if a is not gap and b is not gap otherwise

where N is the total number of aligned structures, si(x) is the amino acid at position x in the ith structure in the alignment, M is a modified PET substitution matrix calculated by Valdar et al. d is the distance between Calpha atoms.

N

i

N

ij

ji xsxsLNN

xC ))(),(()1(

2)(

))(),(()))(),((exp())(),(( xsxsMxsxsdxsxsL jijiji

)min()max(

)min(),(

0))(),(( mm

mbam

ji xsxsM

The Structural Conservation Score

• The B-factors determined by X-ray crystallographic experiments provide an indication of the degree of mobility and disorder of an atom in a protein structure

• Raw structural conservation scores were weighted by the normalized B-factors (Bnorm, i) to consider the structure flexibility

where

)()()( xrweighted xCxC

)exp()( , inormBxr

0

10

20

30

40

50

60

Top 50% Top 40% Top 30% Top 20% Top 10%

Structural conservation score

% o

f th

e re

sid

ues

Interface residues

Non-interface residues

Does the Structural Conservation Score Discriminate Interface from Non-interface Residues?

60,834 Residues from 274 Chains

Incorporating the Structural Conservation Scores to Predict the

Interface Residues

A surface residue

↓

Sequence profile + ASA + Structural conservation score

in a window of 13 residues

(The residue to be predicted and 12 spatially nearest surface residues)

↓

Support vector machine classifier trained to distinguish surface

non-interface residues vs surface residues

↓

Interface or non-interface residue ?

↓

Clustering of sites

Predictor 1: Sequence profile + ASA.Predictor 2: Sequence profile + ASA + structural conservation score

The Performances of the Predictors

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

0 5 10 15 20 25 30 35 40 45 50 55 60 65

1-Precision (%)

Rec

all (

%)

Predictor 1

Predictor 2

Clustering of Results

• Clustering was used to remove isolated interface residues and include non-interface residues surrounded by several interface residues

• Any 2 residues were clustered if the distance between the C beta’s was less than 6.5A

• Clusters with only 1or 2 residues were removed• A non-interface residue was converted to an

interface residue if the C beta of the residue and at least 3 interface residues were located within 6A

Precise prediction: at least 70% interface residues were identified.Precise prediction: at least 70% interface residues were identified.Correct prediction: at least 50 % interface residues were identified.Correct prediction: at least 50 % interface residues were identified.Partial prediction: some but less than 50 % interface residues were identified.Partial prediction: some but less than 50 % interface residues were identified.Wrong prediction: no interface residues were identified.Wrong prediction: no interface residues were identified.

The Performances of the Predictors

The Predicted Binding Sites (Example 1)

Protein : domain 1 of the human coxsackie and adenovirus receptor (CAR D1) - yellow• Mediate adenoviruses and coxsackie virus B infection.• CAR is an integral membrane protein expressed in a broad range of human and

murine cell type. CAR D1 is one of its two extracellular domains.

Binding partner: knob domain of the adenoviruses serotype 12 (Ad12) - blue

Predictor 1 Predictor 2

The Predicted Binding Sites (example 1)

Each CAR D1 binds at the interface between two adjacent Ad12 knob domain

• Consistent with the observation that most neutralizing antibodies to knob are directed against the trimer, rather than monomer.


Protein : adrendoxin (Adx)

• In mitochondria of the adrenal cortex, the steroid hydroxylating system requires the transfer of electrons from the membrane-attached flavoprotein AR via the soluble Adx to the membrane-integrated cytochrome P450 of the CYP 11 family.

Binding partner: adrenodoxin reductase (AR) - blue

Predictor 1Predictor 2


Protein : fibroblast growth factor receptor 2 (FGFR2) Ser252Trp Mutant (Gray)

• Apert syndrome (AS) is caused by substitution of one of two adjacent residues, Ser252Trp or Pro253Arg (green).

Binding partner: fibroblast growth factor (FGF2) (blue)



Protein : Nitrogenase molybdenum-iron protein of Clostridium pasteurianum

ß subunit

Binding partner: Nitrogenase molybdenum-iron protein of Clostridium

pasteurianum subunit


Is the Method Sensitive to the Multiple Structure Alignment Algorithm?

Hemoglobin Hemoglobin chain chainLeft: automatic multiple structure alignments (CE-MC)Left: automatic multiple structure alignments (CE-MC)Right: expert curetted Right: expert curetted multiple structure alignments (HOMSTRAD)multiple structure alignments (HOMSTRAD)

Murine t-cell receptor variable domainMurine t-cell receptor variable domainLeft: automatic multiple structure alignments (CE-MC)Left: automatic multiple structure alignments (CE-MC)Right: expert curetted Right: expert curetted multiple structure alignments (HOMSTRAD)multiple structure alignments (HOMSTRAD)


AdrendoxinAdrendoxinLeft: automatic multiple structure alignments (CE-MC)Left: automatic multiple structure alignments (CE-MC)Right: expert curetted Right: expert curetted multiple structure alignments (HOMSTRAD)multiple structure alignments (HOMSTRAD)


What Effects the Performance of Predictor 2?

• Poor structure alignment

eg. More than 30% of the residues of the methane monooxygenase hydroxylase subunit were in the gapped positions of its structure alignment.

• Binding residues located in flexible loops

eg. The deteriorated prediction of the interfaces on VP 1 in p1/mahoney poliovirus may be caused by its long and flexible loops contacting the other two coat proteins, VP 2 and VP 3.

Conclusions

1. Analysis of the structurally conserved surface residues

• The conserved residues were measured by the structural conservation scores derived from the multiple structure alignments followed by a weighting process with the B-factors.

• Although derived from the alignments of single structures, these structurally conserved residues did differentiate the protein interfaces and the rest of the surfaces.

Conclusions

2. Incorporating the structural conservation score improved the prediction significantly.

• Before clustering, the recall increases about 10% at precision 50%, increases about 18% at precision 40%.

• After clustering, at precision about 50 %, the number of correctly predicted binding sites increase close to 16%, the number of precisely predicted binding sites increase close to 13%.

• 53.0% of the binding sites were precisely predicted, 75.9% of the binding sites were correctly predicted, and 20.4% of the binding sites were partially predicted.

Conclusions

3. This study was an initial trial that exploits multiple structure alignments on a large scale for the prediction of functional regions.

4. A more suitable scoring function or weighting function could be developed in the future.

5. This method can be used to guide experiments, such as site-specific mutagenesis or combined with docking procedures to limit the search space.

Acknowledgments

Dr. Wei Wang

Dr. Chittibabu Guda

Dr. BVB Reddy

All the members in Bourne group

Dr. Philip E. Bourne

This work was supported by

the Molecular Biophysics Training Grants at UCSD

Documents

Lecture 13 Structural Aspects of Protein-Protein Interactions PHAR 201/Bioinformatics I Philip E. Bourne Department of Pharmacology, UCSD Slides modified