Upload
janice-rodgers
View
251
Download
4
Tags:
Embed Size (px)
Citation preview
Lecture 13Structural Aspects of
Protein-Protein Interactions
PHAR 201/Bioinformatics I
Philip E. BourneDepartment of Pharmacology, UCSD
Slides modified from JoLan Chung
Agenda
• Understand the importance of studying protein-protein interactions at the structural level
• Classify the various types of interactions
• Look in detail at one structure-based method for predicting protein-protein interactions
LINKStill seems the best review
Types of protein-protein interactions (PPI)
Obligate PPI
the protomers are not found as stable
structures on their own in vivo
Non-obligate PPI
Obligate homodimer
P22 Arc repressor
DNA-binding
Obligate heterodimer
Human cathepsin D
1LYB
Non-obligate homodimer
Sperm lysin
Non-obligate heterodimer
RhoA and RhoGAP signaling complex
Types of protein-protein interactions (PPI)
Obligate PPI
usually permanent
the protomers are not found as stable structures on their
own in vivo
Non-obligate PPI
Obligate heterodimer
Human cathepsin D
Non-obligate transient homodimer, Sperm lysin (interaction is broken and
formed continuously)
Permanent
(many enzyme-inhibitor complexes)
dissociation constant Kd=[A][B] / [AB]
10-7 - 10-13 M
Transient
Weak
(electron transport
complexes)
Kd mM-M
Non-obligate permanent
heterodimer
Thrombin and rodniin inhibitor
Intermediate
(antibody-antigen, TCR-MHC-peptide, signal transduction PPI), Kd M-nM
Strong
(require a molecular trigger to shift the
oligomeric equilibrium)
Kd nM-fM
Bovine G protein dissociates into G and G subunits upon GTP, but forms a stable trimer upon GDP
Types of protein-protein interactions (PPI)
Obligate PPI
usually permanent
the protomers are not found as stable structures on their
own in vivo
Non-obligate PPI
Obligate heterodimer
Human cathepsin D
Non-obligate transient homodimer, Sperm lysin (interaction is broken and
formed continuously)
Permanent
(many enzyme-inhibitor complexes)
dissociation constant Kd=[A][B] / [AB]
10-7 ÷ 10-13 M
Transient
Weak
(electron transport
complexes)
Kd mM-M
Non-obligate permanent
heterodimer
Thrombin and rodniin inhibitor
Intermediate
(antibody-antigen, TCR-MHC-peptide, signal transduction PPI), Kd M-nM
Strong
(require a molecular trigger to shift the
oligomeric equilibrium)
Kd nM-fM
Bovine G protein dissociates into G and G subunits upon GTP, but forms a stable trimer upon GDP
One Approach Using Structural Bioinformatics: Exploiting Using Sequence and Structure
Homologs to Identify Protein-Protein Binding Sites
Work of Jo-Lan Chung (Former Graduate Student Chemistry/Biochemistry)
Co-mentored by Wei Wang
Proteins: Structure Function and Bioinformatics 2006 62:630-640 [PDF]
Current Situation
• We have a relatively small number of protein complexes
• We have a large number of predominantly apo form structures being determined by structural genomics and functionally driven structure determination
• Exploiting the information obtained from these apo structures to identify functional sites, such as protein-protein binding sites, is an important question.
General Properties of Protein-Protein Binding Sites
• More hydrophobic than the rest of the protein surface.
• Relatively flat (except for enzyme-substrate binding sites).
• The binding free energy is not distributed equally across the protein interfaces: a small subset of residues at the interfaces forms energy hot spots, enriched in tyrosine, tryptophan, and arginine.
• Structurally conserved residues, especially polar resides, correspond to the energy hot spots.
Previous Methods to Identify Protein-Protein Binding Sites
• Sequence conservation e.g. Consurf • Docking• Threading and homology modeling• Evolutionary tracing• Correlated mutations• Properties of patches• Hydrophobicity• Neural networks and support vector machines
• The above mentioned methods use evolutionary information from sequence alignments, and/or residue properties, and/or the geometric information from structures.
• It is difficult to judge the predictive powers of these various methods due to the differences of learning sets, the different definitions of binding sites and the lack of benchmarks.
Characteristics of Previous Methods
• None of the above methods consider the residues which are spatially conserved on the surfaces among their structure homologs.
• These residues are reported to have a correspondence to the energy hot spots on protein interfaces and can be derived from multiple structure alignments.
Structurally Conserved Surface Residues?
Approach
• Survey the structurally conserved surface residues in the non-redundant chains of hetero complexes from the Protein Data Bank (PDB).
• Incorporate the structurally conserved surface residues with the information derived from sequence alignments and single structures to identify protein-protein binding sites.
Dataset• All non-redundant hetero-complexes were
collected from the PDB (<30% sequence identity)
• A pair of chains was then selected if the reduction of the accessible solvent area (ASA) was at least 450 Å2
upon binding. • The pairs containing chains belong to SCOP class >=
8 (9 and 10 did not exist) or with length less than 80 amino acids were discarded
Dataset
• Retrieve structure homologs at SCOP family level from Astral database at 40% sequence purge level.
• Align each chain with its homologs with CEMC. • A chain was selected if at least 4 members in its alignment were
aligned together over 60% of its length and with Z score > 4. • Discard the chains with less than 20 interfacial contacts or with
constant B-factors.
The final dataset was composed of 274 non-redundant chains of
hetero-complexes. Each of these chains was accompanied with a
structure alignment with at least 4 members.
Derive the Structurally Conserved Residues
• The structural conservation scores were derived from the multiple structural alignments.
• Each position in the alignment has a structural conservation score, which represents the conservation in 3D space.
• A position has a high conservation score if the aligned residues are spatially conserved.
The Structural Conservation Score
• Raw structural conservation score
where
if a is not gap and b is not gap otherwise
where N is the total number of aligned structures, si(x) is the amino acid at position x in the ith structure in the alignment, M is a modified PET substitution matrix calculated by Valdar et al. d is the distance between Calpha atoms.
N
i
N
ij
ji xsxsLNN
xC ))(),(()1(
2)(
))(),(()))(),((exp())(),(( xsxsMxsxsdxsxsL jijiji
)min()max(
)min(),(
0))(),(( mm
mbam
ji xsxsM
The Structural Conservation Score
• The B-factors determined by X-ray crystallographic experiments provide an indication of the degree of mobility and disorder of an atom in a protein structure
• Raw structural conservation scores were weighted by the normalized B-factors (Bnorm, i) to consider the structure flexibility
where
)()()( xrweighted xCxC
)exp()( , inormBxr
0
10
20
30
40
50
60
Top 50% Top 40% Top 30% Top 20% Top 10%
Structural conservation score
% o
f th
e re
sid
ues
Interface residues
Non-interface residues
Does the Structural Conservation Score Discriminate Interface from Non-interface Residues?
60,834 Residues from 274 Chains
Incorporating the Structural Conservation Scores to Predict the
Interface Residues
A surface residue
↓
Sequence profile + ASA + Structural conservation score
in a window of 13 residues
(The residue to be predicted and 12 spatially nearest surface residues)
↓
Support vector machine classifier trained to distinguish surface
non-interface residues vs surface residues
↓
Interface or non-interface residue ?
↓
Clustering of sites
Predictor 1: Sequence profile + ASA.Predictor 2: Sequence profile + ASA + structural conservation score
The Performances of the Predictors
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
0 5 10 15 20 25 30 35 40 45 50 55 60 65
1-Precision (%)
Rec
all (
%)
Predictor 1
Predictor 2
Clustering of Results
• Clustering was used to remove isolated interface residues and include non-interface residues surrounded by several interface residues
• Any 2 residues were clustered if the distance between the C beta’s was less than 6.5A
• Clusters with only 1or 2 residues were removed• A non-interface residue was converted to an
interface residue if the C beta of the residue and at least 3 interface residues were located within 6A
Precise prediction: at least 70% interface residues were identified.Precise prediction: at least 70% interface residues were identified.Correct prediction: at least 50 % interface residues were identified.Correct prediction: at least 50 % interface residues were identified.Partial prediction: some but less than 50 % interface residues were identified.Partial prediction: some but less than 50 % interface residues were identified.Wrong prediction: no interface residues were identified.Wrong prediction: no interface residues were identified.
The Performances of the Predictors
The Predicted Binding Sites (Example 1)
Protein : domain 1 of the human coxsackie and adenovirus receptor (CAR D1) - yellow• Mediate adenoviruses and coxsackie virus B infection.• CAR is an integral membrane protein expressed in a broad range of human and
murine cell type. CAR D1 is one of its two extracellular domains.
Binding partner: knob domain of the adenoviruses serotype 12 (Ad12) - blue
Predictor 1 Predictor 2
The Predicted Binding Sites (example 1)
Each CAR D1 binds at the interface between two adjacent Ad12 knob domain
• Consistent with the observation that most neutralizing antibodies to knob are directed against the trimer, rather than monomer.
The Predicted Binding Sites (example 2)
Protein : adrendoxin (Adx)
• In mitochondria of the adrenal cortex, the steroid hydroxylating system requires the transfer of electrons from the membrane-attached flavoprotein AR via the soluble Adx to the membrane-integrated cytochrome P450 of the CYP 11 family.
Binding partner: adrenodoxin reductase (AR) - blue
Predictor 1Predictor 2
The Predicted Binding Sites (example 3)
Protein : fibroblast growth factor receptor 2 (FGFR2) Ser252Trp Mutant (Gray)
• Apert syndrome (AS) is caused by substitution of one of two adjacent residues, Ser252Trp or Pro253Arg (green).
Binding partner: fibroblast growth factor (FGF2) (blue)
Predictor 1 Predictor 2
The Predicted Binding Sites (example 4)
Protein : Nitrogenase molybdenum-iron protein of Clostridium pasteurianum
ß subunit
Binding partner: Nitrogenase molybdenum-iron protein of Clostridium
pasteurianum subunit
Predictor 1 Predictor 2
Is the Method Sensitive to the Multiple Structure Alignment Algorithm?
Hemoglobin Hemoglobin chain chainLeft: automatic multiple structure alignments (CE-MC)Left: automatic multiple structure alignments (CE-MC)Right: expert curetted Right: expert curetted multiple structure alignments (HOMSTRAD)multiple structure alignments (HOMSTRAD)
Murine t-cell receptor variable domainMurine t-cell receptor variable domainLeft: automatic multiple structure alignments (CE-MC)Left: automatic multiple structure alignments (CE-MC)Right: expert curetted Right: expert curetted multiple structure alignments (HOMSTRAD)multiple structure alignments (HOMSTRAD)
Is the Method Sensitive to the Multiple Structure Alignment Algorithm?
AdrendoxinAdrendoxinLeft: automatic multiple structure alignments (CE-MC)Left: automatic multiple structure alignments (CE-MC)Right: expert curetted Right: expert curetted multiple structure alignments (HOMSTRAD)multiple structure alignments (HOMSTRAD)
Is the Method Sensitive to the Multiple Structure Alignment Algorithm?
What Effects the Performance of Predictor 2?
• Poor structure alignment
eg. More than 30% of the residues of the methane monooxygenase hydroxylase subunit were in the gapped positions of its structure alignment.
• Binding residues located in flexible loops
eg. The deteriorated prediction of the interfaces on VP 1 in p1/mahoney poliovirus may be caused by its long and flexible loops contacting the other two coat proteins, VP 2 and VP 3.
Conclusions
1. Analysis of the structurally conserved surface residues
• The conserved residues were measured by the structural conservation scores derived from the multiple structure alignments followed by a weighting process with the B-factors.
• Although derived from the alignments of single structures, these structurally conserved residues did differentiate the protein interfaces and the rest of the surfaces.
Conclusions
2. Incorporating the structural conservation score improved the prediction significantly.
• Before clustering, the recall increases about 10% at precision 50%, increases about 18% at precision 40%.
• After clustering, at precision about 50 %, the number of correctly predicted binding sites increase close to 16%, the number of precisely predicted binding sites increase close to 13%.
• 53.0% of the binding sites were precisely predicted, 75.9% of the binding sites were correctly predicted, and 20.4% of the binding sites were partially predicted.
Conclusions
3. This study was an initial trial that exploits multiple structure alignments on a large scale for the prediction of functional regions.
4. A more suitable scoring function or weighting function could be developed in the future.
5. This method can be used to guide experiments, such as site-specific mutagenesis or combined with docking procedures to limit the search space.
Acknowledgments
Dr. Wei Wang
Dr. Chittibabu Guda
Dr. BVB Reddy
All the members in Bourne group
Dr. Philip E. Bourne
This work was supported by
the Molecular Biophysics Training Grants at UCSD