Predicting immune response from peptide microarrays
Mitja Luštrek 1 (2), Peter Lorenz 2, Felix Steinbeck 2, Georg Füllen 2, Hans-Jürgen Thiesen 2
1 Department of Intelligent Systems, Jožef Stefan Institute
2 University of Rostock


1. Introduction
2. Immune response prediction
3. Interpretation

Peptide

= part of a protein = short sequence of amino acids

SNDIVLT

= string of letters from a 20-letter alphabet (1 letter = 1 amino acid, 20 standard amino acids)

Image taken from the EMBL website

Epitope

[Figure sequence; labels: antigen protein, epitope, peptide, antibody, antibody binding]

Peptide arrays

Peptide array: peptides (15 amino acids) on a glass slide

IVIg antibody mixture

Red = epitopes (bind antibodies)
Black = non-epitopes

[Figure; labels: peptide, antibody, antibody against antibody + dye, glass slide]

Peptide          Class
PGIGFPGPPGPKGDQ  non-ep.
PNMVFIGGINCANGK  non-ep.
DGIGGAMHKAMLMAQ  non-ep.
REDNLTLDISKLKEQ  non-ep.
TPLAGRGLAERASQQ  non-ep.
DQVHPVDPYDLPPAG  non-ep.
...
RRMISRMPIFYLMSG  epitope
LPPGFKRFTCLSIPR  epitope
EFSQMESYPEDYFPI  epitope
...

1. Introduction
2. Immune response prediction
3. Interpretation

Our task

Peptide
RRKGGLEEPQPPAEQ
SEDLENALKAVINDK
EDHVKLVNEVTEFAK
GEKIIQEFLSKVKQM
ILVSRSLKMRGQAFV
YTCQCRAGYQSTLTR
...

Machine learning

Peptide          Class
RRKGGLEEPQPPAEQ  non-ep.
SEDLENALKAVINDK  non-ep.
EDHVKLVNEVTEFAK  non-ep.
GEKIIQEFLSKVKQM  non-ep.
ILVSRSLKMRGQAFV  epitope
YTCQCRAGYQSTLTR  epitope
...

Training set: 13,638 peptides (3,420 epitopes)
Test set: 13,640 peptides (3,421 epitopes)

Balanced until the final testing

Machine learning

Peptide          Class
PGIGFPGPPGPKGDQ  non-ep. / epitope

Attribute representation

Attribute 1  Attribute 2  ...  Class
value 1      value 2      ...  non-ep. / epitope

ML

Classifier

Probability for epitope p

Machine learning

Peptide          Class
PGIGFPGPPGPKGDQ  non-ep. / epitope

Attribute representation 1 ... Attribute representation 8

ML

Classifier 1 ... Classifier 8

Probabilities for epitope  Class
p1 p2 p3 p4 p5 p6 p7 p8    non-ep. / epitope

ML

Meta classifier

Final probability for epitope p

Algorithms: SVM (SMO), logistic regression, linear regression
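The two-level scheme on this slide, in which the probabilities from the base classifiers become the attributes of a meta classifier, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the meta classifier is a linear regression (fitted here by plain gradient descent), uses three base probabilities instead of eight for brevity, and the probability values as well as the names `fit_linear_meta` and `predict_meta` are hypothetical.

```python
def predict_meta(weights, row):
    """Combine base-classifier probabilities: w0 + sum(w_i * p_i)."""
    return weights[0] + sum(w * p for w, p in zip(weights[1:], row))


def fit_linear_meta(prob_rows, labels, lr=0.1, epochs=2000):
    """Least-squares fit of the label (1 = epitope, 0 = non-epitope)
    on the base probabilities, using batch gradient descent."""
    n_features = len(prob_rows[0])
    n = len(prob_rows)
    weights = [0.0] * (n_features + 1)  # weights[0] is the intercept
    for _ in range(epochs):
        grads = [0.0] * (n_features + 1)
        for row, y in zip(prob_rows, labels):
            error = predict_meta(weights, row) - y
            grads[0] += error
            for j, p in enumerate(row):
                grads[j + 1] += error * p
        weights = [w - lr * g / n for w, g in zip(weights, grads)]
    return weights


# Hypothetical base-classifier outputs (one row per peptide) and true classes.
rows = [[0.9, 0.8, 0.7], [0.8, 0.6, 0.9], [0.2, 0.3, 0.1], [0.1, 0.4, 0.2]]
labels = [1, 1, 0, 0]

weights = fit_linear_meta(rows, labels)
print("final score for an epitope-like peptide:", predict_meta(weights, rows[0]))
print("final score for a non-epitope-like peptide:", predict_meta(weights, rows[2]))
```

In practice the base probabilities fed to the meta classifier would normally come from held-out predictions, so that the meta level does not simply learn the base classifiers' training fit.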

Attribute representation 1

Amino-acid counts

RRMISRMPIFYLMSG

Count of  A C D E F G H I K L M N P Q R S T V W Y
          0 0 0 0 1 1 0 2 0 1 3 0 1 0 3 2 0 0 0 1
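As a minimal sketch, the amino-acid count representation of the example peptide can be computed like this. The function name `amino_acid_counts` is illustrative and not taken from the slides.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one letter each


def amino_acid_counts(peptide):
    """Return a 20-element count vector in alphabetical one-letter order."""
    counts = Counter(peptide)
    return [counts[aa] for aa in AMINO_ACIDS]


vector = amino_acid_counts("RRMISRMPIFYLMSG")
print(dict(zip(AMINO_ACIDS, vector)))
```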

Attribute representation 2

Amino-acid count differences

RRMISRMPIFYLMSG

Difference in counts of  F-G  F-I  F-L  F-M  F-P  F-R  F-S  F-Y  G-F  G-I  ...
                         0    -1   0    -2   0    -2   -1   0    0    -1
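The count differences can be derived from the plain counts, as in the following sketch. It assumes one attribute per ordered pair of distinct amino acids (the slide lists both F-G and G-F); whether the authors enumerated all 380 pairs is not stated, and `count_differences` is a hypothetical name.

```python
from collections import Counter
from itertools import permutations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_differences(peptide):
    """Map each ordered pair 'X-Y' to count(X) - count(Y) in the peptide."""
    counts = Counter(peptide)
    return {f"{a}-{b}": counts[a] - counts[b]
            for a, b in permutations(AMINO_ACIDS, 2)}


diffs = count_differences("RRMISRMPIFYLMSG")
print(diffs["F-I"], diffs["F-M"], diffs["G-F"])
```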

Attribute representation 3

Subsequence counts

RRMISRMPIFYLMSG

Count of  RR  RM  MI  ...  RRM  RMI  MIS  ...  ACDE  ...  ACDEF  ...
          1   2   1        1    1    1         0          0
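Counting contiguous subsequences can be sketched as below. The length range 2 to 5 is an assumption read off the column headers on the slide (RR, RRM, ACDE, ACDEF), and `subsequence_counts` is an illustrative name.

```python
from collections import Counter


def subsequence_counts(peptide, min_len=2, max_len=5):
    """Count every contiguous subsequence of length min_len..max_len."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(peptide) - n + 1):
            counts[peptide[i:i + n]] += 1
    return counts


counts = subsequence_counts("RRMISRMPIFYLMSG")
# A Counter returns 0 for subsequences that never occur, e.g. "ACDE".
print(counts["RR"], counts["RM"], counts["RRM"], counts["ACDE"])
```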

Attribute representation 4

Amino-acid class counts

Size:    l l l l t l l s l l l l l t t   (t = tiny, s = small, l = large)
Peptide: R R M I S R M P I F Y L M S G
Charge:  b b n n n b n n n n n n n n n   (b = basic, n = neutral)

Count of  tiny  small  large  basic  acidic  neutral  ...
          3     1      11     3      0       12
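A sketch of the class counts for the example peptide follows. The size classes are copied from the slide and cover only the residues of this peptide; the slide does not give assignments for the remaining amino acids. The charge grouping (R, K, H basic; D, E acidic) is an assumption based on the usual textbook classification, and the function names are illustrative.

```python
from collections import Counter

# Size classes as shown on the slide, for the residues of the example peptide only.
SIZE_CLASS = {"R": "large", "M": "large", "I": "large", "F": "large",
              "Y": "large", "L": "large", "P": "small", "S": "tiny", "G": "tiny"}

# Assumed charge grouping; everything else is treated as neutral.
BASIC, ACIDIC = set("RKH"), set("DE")


def charge_class(aa):
    if aa in BASIC:
        return "basic"
    if aa in ACIDIC:
        return "acidic"
    return "neutral"


def class_counts(peptide):
    """Count how many residues fall into each size class and each charge class."""
    counts = Counter(SIZE_CLASS[aa] for aa in peptide)
    counts.update(charge_class(aa) for aa in peptide)
    return counts


print(class_counts("RRMISRMPIFYLMSG"))
```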

Attribute representation 5

Amino-acid class subsequence counts

Size:    l l l l t l l s l l l l l t t
Peptide: R R M I S R M P I F Y L M S G
Charge:  b b n n n b n n n n n n n n n

Count of  ll  lt  tl  ls  sl  tt  ...  bb  bn  nb  nn  ...
          8   2   1   1   1   1        1   2   1   10
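The class subsequence counts are subsequence counts over the class strings instead of the amino-acid letters. The sketch below takes the two class strings of the example peptide directly from the slide and counts pairs of adjacent classes; `ngram_counts` is an illustrative name, and only length 2 is shown because that is what the slide lists.

```python
from collections import Counter


def ngram_counts(symbols, n):
    """Count contiguous runs of n symbols in a class string."""
    return Counter(symbols[i:i + n] for i in range(len(symbols) - n + 1))


# Class strings for RRMISRMPIFYLMSG as given on the slide.
size_string = "lllltllsllllltt"    # l = large, t = tiny, s = small
charge_string = "bbnnnbnnnnnnnnn"  # b = basic, n = neutral

print(ngram_counts(size_string, 2))
print(ngram_counts(charge_string, 2))
```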

Attribute representation 6

Amino-acid pair counts
Rationale: antibodies may bind in two places due to their two-chain structure.

RRMISRMPIFYLMSG

Count of pairs at distance  (R,R) at 1  (R,M) at 2  (R,I) at 3  ...  (A,C) at 1  (A,C) at 2  ...
                            1           1           2                0           0

[Figure; labels: antibody, peptide]
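The pair counts can be sketched as counting ordered pairs (first residue, second residue) separated by a given distance. The maximum distance of 3 is an assumption taken from the columns visible on the slide; the real upper limit is not stated, and `pair_counts` is a hypothetical name.

```python
from collections import Counter


def pair_counts(peptide, max_distance=3):
    """Count (first, second, distance) triples for residues `distance` apart."""
    counts = Counter()
    for d in range(1, max_distance + 1):
        for i in range(len(peptide) - d):
            counts[(peptide[i], peptide[i + d], d)] += 1
    return counts


counts = pair_counts("RRMISRMPIFYLMSG")
print(counts[("R", "R", 1)], counts[("R", "M", 2)], counts[("R", "I", 3)])
```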

Attribute representation 7

Amino acids at distances from first + first amino acid
Rationale: antibodies may bind in two places; the first amino acid is the most accessible on the peptide array.

R RMISRMPIFYLMSG

Count of at distance  ...  R at 1  ...  M at 2  ...  A at 3  C at 3  ...  First
                           1            1            0       0            R

[Figure; labels: antibody, peptide]
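One way to read this representation is as a set of 0/1 attributes, one per (amino acid, distance from the first residue), plus the identity of the first residue. The sketch below follows that reading; the attribute naming and the function `distance_from_first_attributes` are illustrative assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def distance_from_first_attributes(peptide):
    """Return the first residue plus an indicator for each amino acid
    at each distance from the first residue."""
    attrs = {"first": peptide[0]}
    for d in range(1, len(peptide)):
        for aa in AMINO_ACIDS:
            attrs[f"{aa} at {d}"] = 1 if peptide[d] == aa else 0
    return attrs


attrs = distance_from_first_attributes("RRMISRMPIFYLMSG")
print(attrs["first"], attrs["R at 1"], attrs["M at 2"], attrs["A at 3"])
```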

Attribute representation 8

Average amino-acid properties

RRMISRMPIFYLMSG

Hydrophobicity  Size   Polarity  Flexibility  Accessibility  ...
0.448           0.596  0.306     0.231        0.376
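Averaging a per-residue property over the peptide can be sketched as follows. The slides do not say which property scales were used, so this example substitutes the Kyte-Doolittle hydropathy scale purely as a stand-in; its output is not expected to match the values on the slide, which appear to be on a different (possibly normalized) scale.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in property scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def average_property(peptide, scale):
    """Mean of a per-residue property over all residues of the peptide."""
    return sum(scale[aa] for aa in peptide) / len(peptide)


print(average_property("RRMISRMPIFYLMSG", KYTE_DOOLITTLE))
```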

Attribute representation 9 (not used)

Amino-acid counts with a difference

RRMISRMPIFYLMSG
RRMISRMPIWYLMSG

Equivalent for epitope prediction?

Count F as:
• 1 F
• 0.8 W
• 0.4 Y
• ...

Count W as:
• 1 W
• 0.7 F
• 0.3 Y
• ...

Amino-acid substitution matrix

     A  C  D  ...  F    W    Y
A    1
C       1
D          1
...
F                  1    0.8  0.4
W                  0.7  1    0.3
Y                            1

Optimize with a genetic algorithm to maximize classification accuracy
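The idea of counting one amino acid partly as another can be sketched as weighted counts driven by the substitution matrix. Only the F and W rows below come from the slide; every other amino acid is assumed to count only as itself, and `soft_counts` is a hypothetical name. The genetic-algorithm optimization of the matrix is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Rows taken from the slide: how much one F (or W) adds to each count.
WEIGHTS = {
    "F": {"F": 1.0, "W": 0.8, "Y": 0.4},
    "W": {"W": 1.0, "F": 0.7, "Y": 0.3},
}


def soft_counts(peptide, weights):
    """Weighted amino-acid counts; residues without a row count only as themselves."""
    counts = {aa: 0.0 for aa in AMINO_ACIDS}
    for aa in peptide:
        for target, weight in weights.get(aa, {aa: 1.0}).items():
            counts[target] += weight
    return counts


with_f = soft_counts("RRMISRMPIFYLMSG", WEIGHTS)
with_w = soft_counts("RRMISRMPIWYLMSG", WEIGHTS)
print(with_f["F"], with_f["W"], with_f["Y"])
print(with_w["F"], with_w["W"], with_w["Y"])
```

With such weights the two example peptides, which differ only in F versus W, get similar rather than disjoint count vectors.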

Results – training set

Attribute representation                 AUC    Accuracy
Amino-acid counts                        0.870  80.7 %
Amino-acid count differences             0.868  80.3 %
Subsequence counts                       0.867  80.5 %
Amino-acid class counts                  0.873  81.2 %
Amino-acid class subsequence counts      0.866  80.5 %
Amino-acid pair counts                   0.865  80.6 %
Amino acids at distances from the first  0.873  81.2 %
Average amino-acid properties            0.863  80.3 %
Combined                                 0.881  83.3 %

Results – test set

Attribute representation / dataset     AUC    Accuracy
Best single / training set (balanced)  0.873  81.2 %
Combined / training set (balanced)     0.881  83.3 %
Combined / test set (balanced)         0.883  83.7 %
Combined / test set (original)         0.884  85.9 %
EL-Manzalawy / test set (balanced)     0.868  82.0 %
EL-Manzalawy / test set (original)     0.874  83.9 %

Balanced: epitope : non-epitope = 1 : 1
Original: epitope : non-epitope = 1 : 3

State of the art: SVM + string kernel (EL-Manzalawy et al., 2008), trained and tested on our data.

Summary (AUC / accuracy)
Our results:   balanced 0.883 / 83.7 %, original 0.884 / 85.9 %
EL-Manzalawy:  balanced 0.868 / 82.0 %, original 0.874 / 83.9 %

1. Introduction
2. Immune response prediction
3. Interpretation

Rules

Interpretable classifier:
• Interpretable attributes (frequencies, properties of amino acids)
• RIPPER (JRip) to induce rules

Property                 Low/high  Applies to peptides
Aromaticity              High      53.8 %
Polarity                 Low       27.7 %
Frequency of tyrosine    High      26.2 %
Hydrophobicity           Low       22.5 %
Frequency of arginine    High      19.7 %
Summary factor 2         High      16.7 %
Acidity                  Low       11.4 %
Preference for β-sheets  Low       4.3 %
Summary factor 5         High      3.0 %

Reading the first row: if a peptide has high aromaticity, it binds antibodies. This applies to 53.8 % of the peptides that bind antibodies.
(Aromaticity is the percentage of aromatic amino acids in the peptide.)

Epitope propensity

Frequency in peptides with epitopes, divided by frequency in peptides without epitopes

[Figures: epitope propensity; panels: Aromatic, Non-polar, Tyrosine]
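The propensity definition above can be sketched for single amino acids as follows. It assumes that "frequency" means the share of an amino acid among all residues of a peptide group; the slides do not spell this out. The peptides are taken from the example tables earlier in the talk, and the function names are illustrative.

```python
from collections import Counter


def frequencies(peptides):
    """Relative frequency of each amino acid over all residues of the group."""
    counts = Counter("".join(peptides))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def epitope_propensity(epitopes, non_epitopes):
    """Frequency among epitopes divided by frequency among non-epitopes.
    Amino acids absent from the non-epitope group are skipped."""
    f_ep, f_non = frequencies(epitopes), frequencies(non_epitopes)
    return {aa: f_ep.get(aa, 0.0) / f_non[aa] for aa in f_non}


epitopes = ["RRMISRMPIFYLMSG", "LPPGFKRFTCLSIPR", "EFSQMESYPEDYFPI"]
non_epitopes = ["PGIGFPGPPGPKGDQ", "PNMVFIGGINCANGK", "DGIGGAMHKAMLMAQ"]
propensity = epitope_propensity(epitopes, non_epitopes)
print(sorted(propensity.items(), key=lambda item: -item[1]))
```

A value above 1 means the amino acid is more common in epitopes than in non-epitopes; with only three peptides per group the numbers are of course not meaningful.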

(Un)classifiable peptides

Simplified classifier:
• Interpretable attributes (frequencies, properties of amino acids)
• Logistic regression to train the classifier

Classifiable = classified correctly
Unclassifiable = classified incorrectly

Peptides        AUC    Accuracy
All             0.860  83.0 %
Classifiable    0.999  98.8 %   Expected
Unclassifiable  0.956  91.5 %   Strange?
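The split into classifiable and unclassifiable peptides can be sketched as a partition by whether the simplified classifier got each peptide right. The peptides and true classes below come from the example table earlier in the talk; the predicted classes and the name `split_by_correctness` are hypothetical, and how the two groups were subsequently evaluated is not shown.

```python
def split_by_correctness(peptides, true_classes, predicted_classes):
    """Partition peptides into correctly classified ('classifiable')
    and incorrectly classified ('unclassifiable') groups."""
    classifiable, unclassifiable = [], []
    for peptide, truth, predicted in zip(peptides, true_classes, predicted_classes):
        group = classifiable if truth == predicted else unclassifiable
        group.append((peptide, truth))
    return classifiable, unclassifiable


peptides = ["RRKGGLEEPQPPAEQ", "SEDLENALKAVINDK", "ILVSRSLKMRGQAFV", "YTCQCRAGYQSTLTR"]
true_classes = ["non-ep.", "non-ep.", "epitope", "epitope"]
predicted = ["non-ep.", "epitope", "epitope", "non-ep."]  # hypothetical predictions

classifiable, unclassifiable = split_by_correctness(peptides, true_classes, predicted)
print(len(classifiable), "classifiable;", len(unclassifiable), "unclassifiable")
```

Applying the same split again to the unclassifiable group alone gives a second-degree partition.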

(Un)classifiable – rules

                       Classifiable      Unclassifiable
Attribute              L/h   Applies     L/h   Applies
Aromaticity            High  74.3 %      Low   53.3 %
Polarity               Low   58.7 %      High  27.5 %
Frequency of arginine  High  31.5 %      Low   34.0 %
Frequency of tyrosine  High  20.7 %      Low   16.9 %
Summary factor 5       High  15.1 %      Low   15.2 %
Antigenicity           High  7.3 %       Low   8.7 %
Hydrophobicity         Low   4.7 %       High  6.5 %

Rules listed for one group only:
Frequency of histidine        Low   3.9 %
Frequency of cysteine         Low   10.4 %
Preference for reverse turns  High  10.4 %
Occurrence in turns           Low   10.4 %
Frequency of alanine          High  8.7 %

All peptides: aromaticity 53.8 %, polarity 27.7 %

(Un)classifiable – epitope propensity

(Un)classifiable peptides

Peptides        AUC    Accuracy
All             0.860  83.0 %
Classifiable    0.999  98.8 %
Unclassifiable  0.956  91.5 %

Strange? Not really!
Inevitable or does it mean something?

2nd degree (un)classifiable peptides

• Unclassifiable peptides only
• Simplified classifier

Peptides                       AUC    Accuracy
All unclassifiable             0.956  91.5 %
Classifiable unclassifiable    0.992  97.8 %   (classified correctly)
Unclassifiable unclassifiable  0.683  65.0 %   (classified incorrectly)

Inevitable or does it mean something? Not inevitable!

2nd degree (un)cl. – epitope propensity

Conclusions

• Epitopes have common characteristics
  – Epitopes are parts of antigens that bind antibodies
  – Our peptides mostly did not come from known antigens
  – Probably partly general and partly antibody-specific binding
• Epitope characteristics are not unexpected
• Two groups of epitopes:
  – around 80 % “typical” (classifiable): mostly general-purpose antibodies?
  – around 20 % “atypical” (unclassifiable): mostly antigen-specific antibodies?