B Cell Epitope Prediction
Immunological Bioinformatics #27685, 3-week course, June 2011
Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark
10/06/2011 B Cell Epitope Prediction Leon Jessen ([email protected])
CBS, Department of Systems Biology
Outline
• What is an epitope?
• How can we identify potential epitopes?
• Note that in this presentation, epitope refers only to B-cell epitopes and is not to be confused with T-cell epitopes (Recall the MHCI/II pathways)
So what is an Epitope?
• An epitope is the part of a protein (antigen) being recognised by soluble antibodies or the B-cell receptor (immobilised antibodies on B-cells)
• Proteins play an absolute key role in pathogenicity: Invasion, adhesion, inhibition etc.
• There is a constant arms race going on inside each and every one of you!
• Pathogens seek to evade immune detection
• The immune system seeks to detect pathogens
• So who is winning?
• Luckily in most cases we are!
Antibody-antigen interaction
[Figure: antibody schematic with labels: heavy chains, light chains, constant region, variable region, antigen binding site, epitope, paratope. Source: https://www.pharmatching.com/blog/wp-content/uploads/2011/01/monoclonal_antibody.jpg]
Binding interactions
• Salt bridges
• Hydrogen bonds
• Hydrophobic interactions
• Van der Waals forces
Binding strength
The interaction is highly specific! One key, one lock principle!
(This is the reason for the high sensitivity of western blots)
Taking a closer look at the ‘button’
• Basic principle – An epitope is an exposed, accessible non-self surface structure
• Consists of – Lipids, sugars, protein, DNA or complexes thereof
Basically anything that the BCR will recognise (bind to)!
In the following we solely focus on epitopes made up of amino acid residues
Types of epitopes
• There are two basic types of B-cell epitopes
– Continuous (linear; made up of primary structure)
– Discontinuous (non-linear; made up of tertiary structure)
• In nature – ~10% linear – ~90% discontinuous (but often with a linear determinant)
Case: PfEMP1 PAM VAR2CSA DBL5ε
• VAR2CSA: Primary pathogenesis protein in pregnancy-associated malaria (PAM)
• Highlighted motif predicted using a computational approach
• Experimentally validated using a high density peptide array
Adapted from: Gnidehou S, Jessen L, et al. 2010. PLoS ONE 5(10): e13105. doi:10.1371/journal.pone.0013105
Homology model of the DBL5ε domain of the Pregnancy Associated Malaria PfEMP1 protein VAR2CSA (3D7 variant)
275-TFKNI-279
- Exposed
- Accessible
High Density Peptide Array
• 24 fields divided by Teflon barriers
• Each field is further subdivided into 5,000 sub-fields, on which peptides are synthesised directly
• High density peptide chip, here applied for VAR2CSA antigenicity analysis. Works for any protein
• Briefly: addition of serum samples (from immunised rats); signal quantified by fluorescence measurements
• ~1,000,000 peptides/chip!
High Density Peptide Array
[Figure: twelve panels titled "VAR2CSA Variant: 3D7, PepArray Conditions #1" to "#12". x-axis: Sequence Position (0 to 2500). y-axis: Z-Score, Normalised average S/N (0 to 20). Each panel marks two thresholds: p = 0.05 and p = 0.05 w/ corr.]
Peptides labelled in each panel:
• #1: 378-SRYDDYVKDFFKKLEA-393, 997-RTMKRGYKN-1005
• #2: 2263-GMDEFKNTFKNIKE-2276
• #3: 420-NSSDANNPSEKI-431, 995-SARTMKRGYK-1004, 2266-E-2266, 2268-K-2268
• #4: 994-GSARTMKRGYKNDNYELC-1011, 1038-FNLFEQW-1044
• #5: 993-CGSARTMKRGYKNDNYELC-1011
• #6: 862-NRK-864, 995-SARTMKRGYK-1004, 1286-KRYGGRSNIK-1295, 2268-K-2268, 2270-T-2270
• #7: 994-GSARTMKRGYKNDNY-1008
• #8: no labelled peptides
• #9: 862-NRKAG-866, 2668-AG-2669
• #10: 1597-GNDRTWSKKYIKKLE-1611
• #11: 1579-YEYNNAEKKNNKS-1591
• #12: 1573-CEQVKYYEYNNAEKK-1587
So why are B-cell epitopes so important?
• In the case of PAM, being ready for the ‘attack’ is in many cases the difference between life and death! (Sadly, ~10,000 women and ~200,000 infants die from PAM each year in sub-Saharan Africa)
• The primary response simply is not enough!
Adapted from http://www.mhhe.com/biosci/esp/2001_gbio/folder_structure/an/m10/s3/assets/images/anm10s3_9.jpg
A small note
• We now turn to the more ‘nerdy’ bioinformatics
• But please do remind yourself that the research you do every day does in fact carry over to the real world, in which real people will benefit from your tedious work
• We are saving the world here, people! (or at least trying to – Right?)
So how do we predict B-cell epitopes?
• Approx. 10^10 combinations of the variable region (minus self-antigens!)
• Millions of B-cells with different B-cell receptors are made each day
• All in all, we are asking a very difficult question!
• To which residues in any given sequence will any of these 10^10 paratopes bind?
• Number of possible combinations?
• A LOT!
Recall the basic principle of epitopes
Exposed accessible surface structure!
So basically we need to predict the surface!
So which residues fulfill the basic principle?
• Most proteins live in aqueous environments
• Aqueous environments are... hydrophilic (surprise)
• The Parker Hydrophilicity Scale provides a quantitative measure of the hydrophilicity of any given amino acid residue (experimentally derived)
Hydrophilicity, from most to least hydrophilic:
D  2.46   E  1.86   N  1.64   S  1.50   Q  1.37
G  1.28   K  1.26   T  1.15   R  0.87   P  0.30
H  0.30   C  0.11   A  0.03   Y -0.78   V -1.27
M -1.41   I -2.45   F -2.78   L -2.87   W -3.00
Propensity Scale
On the primary structure level, each amino acid residue is assigned the average scale value of the residues in a window centred on it (here 7 residues: the residue itself and its three nearest neighbours on each side)
...DEFKNTFKNIKEPDA...
S(N) = mean(T, F, K, N, I, K, E)
= (1.15 - 2.78 + 1.26 + 1.64 - 2.45 + 1.26 + 1.86)/7
= 0.28
So the region is hydrophilic on average!
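The window average can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular server: the function name `window_scores`, the odd window size of 7, and the choice to leave the sequence ends unscored are assumptions made for the example. The scale values are the Parker values listed on the previous slide.

```python
# Parker hydrophilicity values as listed on the previous slide.
PARKER = {
    "D": 2.46, "E": 1.86, "N": 1.64, "S": 1.50, "Q": 1.37,
    "G": 1.28, "K": 1.26, "T": 1.15, "R": 0.87, "P": 0.30,
    "H": 0.30, "C": 0.11, "A": 0.03, "Y": -0.78, "V": -1.27,
    "M": -1.41, "I": -2.45, "F": -2.78, "L": -2.87, "W": -3.00,
}


def window_scores(sequence, scale=PARKER, window=7):
    """Return one averaged score per residue (None where the window does not fit).

    `window` is assumed to be odd so that the window is centred on a residue.
    """
    half = window // 2
    scores = []
    for i in range(len(sequence)):
        if i < half or i + half >= len(sequence):
            scores.append(None)  # assumption: termini are left unscored
            continue
        residues = sequence[i - half : i + half + 1]
        scores.append(sum(scale[aa] for aa in residues) / window)
    return scores


if __name__ == "__main__":
    peptide = "DEFKNTFKNIKEPDA"
    scores = window_scores(peptide)
    # Index 8 is the second N, the centre of the window T-F-K-N-I-K-E.
    print(f"S(N) = {scores[8]:.2f}")
```

Residues scoring above a chosen threshold would then be flagged as candidate epitope residues; the threshold itself is a free parameter.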
Other approaches
• Epitopes are found in:
– Turns
– Loops
– Other, often surface-exposed, secondary structures
http://www.cs.gmu.edu/~ashehu/sites/default/files/images/1hml_loop_ensemble_newcartoon.jpg
But...
• Blythe and Flower (2005) did an extensive evaluation of the propensity scale approach
• The verdict: barely better than random!
• But why? Protein surfaces are not purely hydrophilic: the white regions in the figure are hydrophobic!
Electrostatics for VAR2CSA DBL3x (3BQK) based on the structure by: Higgins, M. K. Journal of Biological Chemistry 283, 21842 (2008).
The BepiPred Server
• Combining:
– Parker Hydrophilicity Scale
– Position Specific Scoring Matrix (PSSM), experimentally derived
• Validated using the Pellequer dataset and epitopes from the HIV Los Alamos database
• Available at: http://www.cbs.dtu.dk/services/BepiPred/
Performance
• Depiction of HIV Los Alamos set
• HIV Los Alamos set:
– Levitt 0.57
– Parker 0.59
– BepiPred 0.60
• Pellequer set:
– Levitt 0.66
– Parker 0.65
– BepiPred 0.68
So, BepiPred is better than the others, but still not too good!
Imposing 2D on what is really 3D
• Predicting linear epitopes from sequence is an oversimplification
• Epitopes live in a 3D world!
• In a 2D world you go to San Francisco
• In the 3D world you go to Miami (At best if you drop out of the sky!)
Going 3D requires protein structures
• Protein Data Bank (Jun 07 2011: 73656 Structures)
• Evolutionary conservation: Structure over function over sequence
• Even if the structure is unknown, often it can be modeled using homology modeling (http://www.cbs.dtu.dk/services/CPHmodels/)
Superimpose sequence with unknown structure
Homology model of VAR2CSA DBL5ε based on template VAR2CSA DBL3x (3BQK). RMS = 0.498
So exactly what is it that an antibody (Ab) ‘sees’?
• Interrogate the surface using a 10 Å probe
• Imagine the probe as a ball rolling over the surface of the protein and reporting back what it touches
• In the figure, all that is ‘touched’ is in green
• This can define ‘Exposed accessible surface structure’ (Basic principle)
• Regardless of hydrophilicity: What is accessible is accessible! (But often hydrophilic will be ‘most’ accessible)
Novotny J. A static accessibility model of protein antigenicity. Int Rev Immunol 1987 Jul;2(4):379-89
[Figure labels: Antibody, Antigen, Probe]
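The rolling-ball idea can be approximated numerically. The sketch below is not Novotny's procedure; it swaps in a simple point-sampling scheme in the style of Shrake and Rupley: candidate probe-centre positions are placed on a sphere around an atom, and a position counts as accessible if the probe would not overlap any other atom there. The function names, the single atom radius of 1.8 Å for all atoms, and the 200 sample points are assumptions made for illustration; only the 10 Å probe comes from the slide.

```python
import math


def sphere_points(n):
    """Roughly even points on a unit sphere (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        phi = k * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def accessible_fraction(atoms, index, probe_radius=10.0, atom_radius=1.8, n_points=200):
    """Fraction of probe-centre positions around atoms[index] not blocked by other atoms."""
    reach = atom_radius + probe_radius  # atom centre to probe centre when touching
    cx, cy, cz = atoms[index]
    free = 0
    for ux, uy, uz in sphere_points(n_points):
        p = (cx + reach * ux, cy + reach * uy, cz + reach * uz)
        # The probe centre is blocked if it would sit closer than `reach` to another atom.
        blocked = any(
            math.dist(p, other) < reach
            for j, other in enumerate(atoms)
            if j != index
        )
        if not blocked:
            free += 1
    return free / n_points


if __name__ == "__main__":
    # Toy "protein": one central atom caged by six neighbours 3 Å away.
    cage = [(0.0, 0.0, 0.0),
            (3.0, 0.0, 0.0), (-3.0, 0.0, 0.0),
            (0.0, 3.0, 0.0), (0.0, -3.0, 0.0),
            (0.0, 0.0, 3.0), (0.0, 0.0, -3.0)]
    print("central atom:", accessible_fraction(cage, 0))
    print("outer atom:  ", accessible_fraction(cage, 1))
```

In this toy cage the large probe cannot touch the central atom at all, while each outer atom remains partly accessible, which is the distinction the figure draws in green.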
The DiscoTope Server
• DiscoTope: Prediction of residues in discontinuous B cell epitopes using protein 3D structures (Andersen PH, Nielsen M and Lund O, Protein Sci 2006)
• http://www.cbs.dtu.dk/services/DiscoTope/
How does it work then?
• Combines propensity scale values of amino acids in discontinuous epitopes with surface exposure
Surface exposure
• Structures of antibody/antigen protein complexes in PDB
• Dr. Andrew Martin’s SACS database (available at http://www.bioinf.org.uk/abs/sacs) was used to get an overview of PDB entries
• Epitopes in the data set were identified by finding residues within 4 Å of the heavy or light chains in the Abs
• Homology grouping and cross-validation were used for training and testing the method, to avoid bias towards specific antigens
• The 5 sets used for cross-validated training/testing are available at: http://www.cbs.dtu.dk/suppl/immunology/DiscoTope.php
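The 4 Å labelling rule can be illustrated with a toy example. This is a sketch under stated assumptions: the function name `label_epitope_residues` and the input layout (antigen atoms as `(residue_id, (x, y, z))` pairs, antibody atoms as plain coordinates) are hypothetical, and the coordinates are made up. Reading a real PDB file is left out.

```python
import math


def label_epitope_residues(antigen_atoms, antibody_atoms, cutoff=4.0):
    """Return sorted antigen residue ids with at least one atom within `cutoff` Å of any antibody atom."""
    epitope = set()
    for residue_id, coord in antigen_atoms:
        if residue_id in epitope:
            continue  # residue already labelled through another of its atoms
        if any(math.dist(coord, ab) <= cutoff for ab in antibody_atoms):
            epitope.add(residue_id)
    return sorted(epitope)


if __name__ == "__main__":
    # Made-up coordinates: (residue id, atom position in Å).
    antigen = [
        (10, (0.0, 0.0, 0.0)),
        (10, (1.5, 0.0, 0.0)),
        (11, (8.0, 0.0, 0.0)),
        (12, (20.0, 0.0, 0.0)),
    ]
    # Heavy- and light-chain atoms are pooled, as the rule treats them alike.
    antibody = [(3.5, 0.0, 0.0), (9.0, 3.0, 0.0)]
    print(label_epitope_residues(antigen, antibody))
```

Residues 10 and 11 each have an atom within 4 Å of an antibody atom and are labelled as epitope residues; residue 12 is too far away.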
Log-odds ratios
• Frequencies of amino acids in epitope residues compared to frequencies of non-epitope residues
• Several discrepancies compared to the Parker hydrophilicity scale
• Predictive performance (AUC) for B cell epitopes:
– Parker hydrophilicity scale 0.614
– Epitope log-odds 0.634
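A log-odds scale of this kind can be sketched as follows. The details here are assumptions for illustration, not the published procedure: the natural logarithm, the pseudocount of 1 (to avoid dividing by zero for unseen residues), the function name `log_odds`, and the toy residue strings are all hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds(epitope_seq, non_epitope_seq, pseudocount=1.0):
    """Per amino acid: ln(frequency among epitope residues / frequency among non-epitope residues)."""
    epi = Counter(epitope_seq)
    non = Counter(non_epitope_seq)
    epi_total = sum(epi[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    non_total = sum(non[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    scores = {}
    for aa in AMINO_ACIDS:
        f_epi = (epi[aa] + pseudocount) / epi_total
        f_non = (non[aa] + pseudocount) / non_total
        scores[aa] = math.log(f_epi / f_non)
    return scores


if __name__ == "__main__":
    # Toy data: residues observed inside and outside epitopes.
    scores = log_odds("KKKKDDNN", "LLLLVVAA")
    for aa in "KDNLW":
        print(aa, round(scores[aa], 3))
```

A positive value means the residue is over-represented in epitopes, a negative value that it is under-represented, and zero that it is equally frequent in both sets.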
Combine log-odds ratio and surface exposure
• By structure, we know which residues are in spatial proximity
• By log-odds ratio, we know which residues are likely to be in an epitope
• S - D - E - K - R - P - E - K are in spatial proximity
• K has 7 contacts
• The score for K is the sum of the log-odds values of K and its 7 contacts
...LIST..FVDEKRPGSDIVED......ALILKDENKTTVI...
-0.145 + 0.691 + 0.346 + 1.136 + 1.180 + 1.164 + 0.346 + 1.136 = 5.854
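The summation over spatial neighbours can be sketched like this. It is a simplified illustration of the idea on this slide only, not the DiscoTope scoring function: the function name `proximity_score`, the 10 Å neighbourhood radius, and the one-coordinate-per-residue toy geometry are assumptions. The log-odds values are the ones that appear in the sum on the slide.

```python
import math

# Log-odds values read off the worked sum on the slide (S, D, E, K, R, P).
LOG_ODDS = {"S": -0.145, "D": 0.691, "E": 0.346, "K": 1.136, "R": 1.180, "P": 1.164}


def proximity_score(index, residues, coords, log_odds=LOG_ODDS, radius=10.0):
    """Sum the log-odds of every residue within `radius` Å of residue `index` (itself included)."""
    centre = coords[index]
    return sum(
        log_odds[aa]
        for aa, xyz in zip(residues, coords)
        if math.dist(centre, xyz) <= radius
    )


if __name__ == "__main__":
    # Toy geometry: eight residues 1 Å apart plus one distant residue at 50 Å.
    residues = list("SDEKRPEK") + ["D"]
    coords = [(float(i), 0.0, 0.0) for i in range(8)] + [(50.0, 0.0, 0.0)]
    # Residue 7 is the K with 7 contacts; the distant D is ignored.
    print(round(proximity_score(7, residues, coords), 3))
```

With these toy coordinates the score of the K residue equals the sum shown on the slide, 5.854.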
DiscoTope Performance (AUC)
• Parker: 0.614 (sequence-based)
• Epitope log-odds: 0.634 (sequence-based)
• Contact numbers: 0.647 (structure-based)
• DiscoTope: 0.711 (sequence/structure-based)
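For readers unfamiliar with the AUC, the sketch below computes it from per-residue scores and epitope labels as the fraction of (epitope, non-epitope) pairs in which the epitope residue scores higher, with ties counting half. An AUC of 0.5 corresponds to random prediction and 1.0 to perfect ranking. The function name `auc` and the toy numbers are illustrative assumptions; the pairwise loop is quadratic and only meant for small examples.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (labels: 1 = epitope, 0 = non-epitope)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one epitope and one non-epitope residue")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy prediction scores for four residues, two of which are true epitope residues.
    print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```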
Evaluation example
• Plasmodium falciparum apical membrane antigen 1 (AMA1)
• Kept completely separate from DiscoTope training
• Two epitopes were identified using phage display, sequence variance analysis and point mutation (green backbone)
• Most residues identified as epitopes were successfully predicted by DiscoTope (black side chains)
Relatively new services
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
BIOINFORMATICS APPLICATIONS NOTE, Vol. 24 no. 12 2008, pages 1459–1460, doi:10.1093/bioinformatics/btn199
Structural bioinformatics
PEPITO: improved discontinuous B-cell epitope prediction using multiple distance thresholds and half sphere exposure
Michael J. Sweredoski (1,2) and Pierre Baldi (1,2,*)
(1) Department of Computer Science and (2) Institute for Genomics and Bioinformatics, University of California, Irvine, 92697-3435, California, USA
Received on March 3, 2008; revised on April 18, 2008; accepted on April 20, 2008
Advance Access publication April 28, 2008
Associate Editor: Anna Tramontano
ABSTRACT
Motivation: Accurate prediction of B-cell epitopes is an important
goal of computational immunology. Up to 90% of B-cell epitopes are
discontinuous in nature, yet most predictors focus on linear
epitopes. Even when the tertiary structure of the antigen is available,
the accurate prediction of B-cell epitopes remains challenging.
Results: Our predictor, PEPITO, uses a combination of amino-acid
propensity scores and half sphere exposure values at multiple
distances to achieve state-of-the-art performance. PEPITO achieves
an area under the curve (AUC) of 75.4 on the Discotope dataset.
Additionally, we benchmark PEPITO as well as the Discotope
predictor on the more recent Epitome dataset, achieving AUCs of
68.3 and 66.0, respectively.
Availability: PEPITO is available as part of the SCRATCH suite of
protein structure predictors via www.igb.uci.edu.
Contact: [email protected]
Supplementary information: Supplementary data are available at
Bioinformatics online.
1 INTRODUCTION
B-cell epitope prediction is an important, but unsolved problem in bioinformatics. The ability to accurately predict B-cell epitopes would aid researchers in a variety of immunological applications.
Initial attempts at predicting B-cell epitopes involved the calculation of propensity scales (Hopp and Woods, 1981). While this information can be useful in predicting B-cell epitopes, Blythe and Flower (2005) showed that propensity scales alone are not enough to accurately predict epitopes.
Many of the previous predictors have focused on linear B-cell epitopes. Some of these methods include ABCpred (Saha and Raghava, 2006), BEPITOPE (Odorico and Pellequer, 2003), Bepipred (Larsen et al., 2006) and PEOPLE (Alix, 1999). However, past surveys have estimated that only 10% of the B-cell epitopes are continuous (van Regenmortel, 1996). Additionally, van Regenmortel (2006) noted that even linear epitopes adopt a conformational structure and therefore the distinction is somewhat blurred. Far fewer predictors have been developed for discontinuous B-cell epitopes. One of the first methods explicitly created for identification of discontinuous epitopes was conformational epitope predictor (CEP) (Kulkarni-Kale et al., 2005). Another method described by Rapberger et al. (2007) incorporates epitope–paratope shape complementarity to predict interaction sites. One of the most recent, state-of-the-art, predictors of discontinuous epitopes is Discotope (Andersen et al., 2006), which uses both contact numbers (i.e. the number of Cα atoms within a certain distance threshold) and an amino-acid propensity scale.
Our predictor, PEPITO, attempts to overcome some of the limitations of previous predictors by incorporating an amino-acid propensity scale along with side chain orientation and solvent accessibility information using half sphere exposure values (Hamelryck, 2005). To increase robustness, PEPITO uses propensity scales and half sphere exposure values at multiple distance thresholds from the target residue.
2 METHODS
2.1 Datasets
We obtained epitope datasets for benchmarking prediction methods from both the Discotope Supplementary Materials (Andersen et al., 2006) and Epitome (Schlessinger et al., 2006). The two datasets contain different sets of protein chains and differ in their epitope/non-epitope classification rules. The Discotope dataset, which consists of 75 protein chains, labels all residues in antigen chains within 4 Å of an antibody as epitopes. The Epitome dataset, which consists of 140 protein chains, seeks to eliminate incidental contacts by labeling residues in the antigen within 6 Å of the complementary determining regions of the antibody chains as epitopes.
We derived two additional datasets, C[Discotope] and C[Epitome], from the set of protein chains that are common to both the Epitome and Discotope datasets. The two datasets differ in the method used to identify epitope residues. Eight hundred and seventy-five of the residues in the derived datasets are defined as epitopes using both methods. Four hundred and seventy-one of the residues in the derived datasets are defined as epitopes using the Epitome method but not the Discotope method. One hundred and nine of the residues in the derived datasets are defined as epitopes using the Discotope method but not the Epitome method. The assertions by Schlessinger et al. (2006) would indicate that the 471 residues are integral to the antigen–antibody binding while the 109 residues result from incidental contacts.
Testing procedures require that the protein chains present in the datasets be clustered to prevent any one family from dominating the performance measures. Protein families were previously annotated for the Discotope dataset. UniqueProt (Mika and Rost, 2003) was used to identify protein families in the Epitome dataset and the two derived datasets.
*To whom correspondence should be addressed.
© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
BioMed Central
BMC Bioinformatics
Open Access
Software
ElliPro: a new structure-based tool for the prediction of antibody epitopes
Julia Ponomarenko* (1,2), Huynh-Hoa Bui (3), Wei Li, Nicholas Fusseder, Philip E Bourne (1,2), Alessandro Sette (4) and Bjoern Peters (4)
Address: (1) San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA, (2) Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA, (3) Isis Pharmaceuticals, Inc., 1896 Rutherford Road, Carlsbad, California 92008, USA and (4) La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, California 92037, USA
Email: Julia Ponomarenko* - [email protected]; Huynh-Hoa Bui - [email protected]; Wei Li - [email protected]; Nicholas Fusseder - [email protected]; Philip E Bourne - [email protected]; Alessandro Sette - [email protected]; Bjoern Peters - [email protected]
* Corresponding author
Abstract
Background: Reliable prediction of antibody, or B-cell, epitopes remains challenging yet highly desirable for the design of vaccines and immunodiagnostics. A correlation between antigenicity, solvent accessibility, and flexibility in proteins was demonstrated. Subsequently, Thornton and colleagues proposed a method for identifying continuous epitopes in the protein regions protruding from the protein's globular surface. The aim of this work was to implement that method as a web-tool and evaluate its performance on discontinuous epitopes known from the structures of antibody-protein complexes.
Results: Here we present ElliPro, a web-tool that implements Thornton's method and, together with a residue clustering algorithm, the MODELLER program and the Jmol viewer, allows the prediction and visualization of antibody epitopes in a given protein sequence or structure. ElliPro has been tested on a benchmark dataset of discontinuous epitopes inferred from 3D structures of antibody-protein complexes. In comparison with six other structure-based methods that can be used for epitope prediction, ElliPro performed the best and gave an AUC value of 0.732, when the most significant prediction was considered for each protein. Since the rank of the best prediction was at most in the top three for more than 70% of proteins and never exceeded five, ElliPro is considered a useful research tool for identifying antibody epitopes in protein antigens. ElliPro is available at http://tools.immuneepitope.org/tools/ElliPro.
Conclusion: The results from ElliPro suggest that further research on antibody epitopes considering more features that discriminate epitopes from non-epitopes may further improve predictions. As ElliPro is based on the geometrical properties of protein structure and does not require training, it might be more generally applied for predicting different types of protein-protein interactions.
Published: 2 December 2008
BMC Bioinformatics 2008, 9:514 doi:10.1186/1471-2105-9-514
Received: 24 September 2008
Accepted: 2 December 2008
This article is available from: http://www.biomedcentral.com/1471-2105/9/514
© 2008 Ponomarenko et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
RECENT DEVELOPMENTS
Friday, 11 June 2010
Summary
• B-cell epitopes are essential in preparing the body for future infections
• Antibodies are constantly monitoring the body
• Due to combinatorics, predicting B-cell epitopes is a highly complex task
• Current best approach: Combine propensity scales with structure
• Immunoinformatics is a new field of research, so there is plenty of room for improvement
• The field is expanding and actively reduces lab-time
• Thanks to Claus Lundegaard for letting me use his slides from last year as inspiration for this talk