Evolving energy function for protein structure prediction Pawel Widera [email protected] Natalio Krasnogor, Jonathan Garibaldi Department of Computer Science Ben-Gurion University of the Negev Beer Sheva, Israel 2009-06-30

A Genetic Programming Challenge: Evolving the Energy Function for Protein Structure Prediction


DESCRIPTION

In this talk I introduce a computational challenge for GP researchers, namely, the automated synthesis of energy functions for protein structure prediction.


Evolving energy function for protein structure prediction

Paweł Widera [email protected]

Natalio Krasnogor, Jonathan Garibaldi

Department of Computer Science, Ben-Gurion University of the Negev, Beer Sheva, Israel

2009-06-30

Outline

1 Introduction

2 Protein energy models

3 Genetic Programming problem formulation

4 Results

5 Conclusions

Paweł Widera Evolving energy function for PSP 2009-06-30 2 / 26

Protein structure prediction
From 1D sequence to 3D structure

LFSKELRCMMYGFGDDQNPYTESVDILEDLVIEFITEMTHKAMSIFSEEQLNRYEMYRRSAFPKAAIKRLIQSITGTSVSQNVVIAMSGISKVFVGEVVEEALDVCEKWGEMPPLQPKHMREAVRRLKSKGQIP

Protein basics
- 20 amino acid alphabet
- sequence encodes structure
- structure determines activity

International prediction contest
Critical Assessment of techniques for protein Structure Prediction

CASP facts
- biennial competition started in 1994
- parallel prediction and experimental verification
- model assessment by human experts

Prediction difficulty
- comparative modelling (sequence similarity)
- fold recognition (new or existing)
- ab initio modelling (first principles)

Ab initio predictor schema
From sequence to the final model

[Diagram: pipeline stages in the order shown: Target sequence, Secondary structure prediction, Fold recognition and threading, Initial ab initio prediction, Optimisation, Clustering, Final models. Tools named in the diagram: PSIPRED, SAM-T02, JUFO, PSI-BLAST.]


The algorithm of folding
Anfinsen's thermodynamic hypothesis [Anfinsen, 1973]

[Dill and Chan, 1997]

Refolding experiment
- folds to the same native state
- native state is energetically stable

Energy funnel
- roll down free energy hill
- avoid local minima traps

Model assessment
Correlation between energy and similarity to native

Similarity measure

$$\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \delta_i^2}$$

Decoys generated by
- I-TASSER [Wu et al., 2007]
- Robetta [Rohl et al., 2004]
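The RMSD similarity measure can be sketched in Python as follows. This is a minimal illustration, assuming the decoy and the native structure are given as equal-length lists of (x, y, z) coordinates that are already superimposed, so that each delta_i is the Euclidean distance between corresponding atoms. The function name `rmsd` and the input layout are assumptions made for this example, not something stated in the talk.

```python
import math

def rmsd(decoy, native):
    """Root mean square deviation between two equal-length coordinate lists.

    Assumes both structures are already superimposed; no alignment is done here.
    """
    if len(decoy) != len(native) or not decoy:
        raise ValueError("structures must be non-empty and of equal length")
    # delta_i^2 is the squared Euclidean distance between corresponding atoms
    squared = [sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(decoy, native)]
    return math.sqrt(sum(squared) / len(squared))

native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
decoy = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(decoy, native))  # every atom is displaced by exactly 1.0, so this prints 1.0
```

A lower RMSD means a decoy closer to the native structure, which is why a useful energy function should correlate with it.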


All-atom force field
Folding simulation

$$
\sum_{\text{bonds}} \frac{k_i^{l}}{2} (l - l_i^{0})^2
+ \sum_{\text{angles}} \frac{k_i^{\theta}}{2} (\theta - \theta_i^{0})^2
+ \sum_{\text{torsions}} \frac{V_i^{\omega}}{2} \left[ 1 + \cos(n_i \omega_i - \phi_i) \right]
+ \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left\{ 4 \varepsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right] + \frac{q_i q_j}{4 \pi \varepsilon_0 r_{ij}} \right\}
$$

Intermolecular forces
- bond forces (stretching, bending, rotating)
- short range forces (Pauli repulsion, van der Waals' interactions)
- electrostatic forces (Coulomb's law)

Rosetta@home in CASP7
140k computers (37 TFLOPS), 500k CPU hours per domain
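The pairwise part of the force field (the double sum over atoms i < j) can be illustrated with a short Python sketch. It assumes a single `epsilon` and `sigma` shared by all atom pairs and a generic constant `k` standing in for 1 / (4 pi eps0); real force fields use per-pair parameters and fixed units, so the function names and parameters here are hypothetical and only show the shape of the computation.

```python
import math

def lennard_jones(r, epsilon, sigma):
    """4*eps*[(sigma/r)^12 - (sigma/r)^6]: Pauli repulsion plus van der Waals attraction."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)

def coulomb(r, qi, qj, k=1.0):
    """Electrostatic term q_i q_j / (4 pi eps0 r); k stands in for 1 / (4 pi eps0)."""
    return k * qi * qj / r

def nonbonded_energy(coords, charges, epsilon, sigma, k=1.0):
    """Sum the Lennard-Jones and Coulomb terms over all atom pairs i < j."""
    total = 0.0
    n = len(coords)
    for i in range(n - 1):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])  # inter-atomic distance r_ij
            total += lennard_jones(r, epsilon, sigma)
            total += coulomb(r, charges[i], charges[j], k)
    return total
```

The double loop visits N(N-1)/2 pairs, which gives a feel for why all-atom simulation is so expensive for whole protein domains.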


Simplified knowledge-based potential
Protein structure prediction

[Figure: chain positions i - 1, i, i + 1 with unit vectors \hat{n}_i, \hat{b}_i, \hat{v}_i]

Example

$$E_{stiff} = \sum_i \left( -\lambda \, \hat{v}_i \cdot \hat{v}_{i+4} - \lambda \left| \hat{b}_i \cdot \hat{b}_{i+2} \right| - \lambda \, \Theta_1(i) + \Theta_2(i) + \Theta_3(i) \right)$$

$$E_{env} = \sum_i V(NP_i, NA_i, NO_i, A_i)$$

Energy function
Weighted sum of terms vs. evolved function

$$F(\vec{T}) = w_1 T_1 + \ldots + w_n T_n$$ [Zhang et al., 2003]

$$F(\vec{T}) = \frac{T_1 T_3}{w_1 \log(T_2)} + \sin\left( \frac{T_4 - w_2 T_1}{T_5 \exp(\cos(w_1 T_3))} \right)$$

GP input
- terminals: T1, ..., T8
- functions: add, sub, mul, div, sin, cos, exp, log
- random ephemerals in range [0,1]

GP tree example
- size = 60
- depth = 17
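The two forms of energy function, the weighted sum and the example evolved expression, can be written out in Python as a sketch. The talk does not say how division by zero or the logarithm of non-positive values is handled, so the protected operators below (`protected_div`, `protected_log`) are an assumption borrowed from common GP practice, and all function names are hypothetical.

```python
import math

def protected_div(a, b):
    """Division that returns 1.0 for a near-zero denominator (assumed convention)."""
    return a / b if abs(b) > 1e-12 else 1.0

def protected_log(a):
    """Logarithm of |a|, returning 0.0 at zero (assumed convention)."""
    return math.log(abs(a)) if abs(a) > 1e-12 else 0.0

def weighted_sum(terms, weights):
    """F(T) = w1*T1 + ... + wn*Tn."""
    return sum(w * t for w, t in zip(weights, terms))

def evolved_example(terms, w1, w2):
    """The example evolved function, with T1..T5 taken from terms[0..4]."""
    t1, t2, t3, t4, t5 = terms[:5]
    first = protected_div(t1 * t3, w1 * protected_log(t2))
    second = math.sin(protected_div(t4 - w2 * t1, t5 * math.exp(math.cos(w1 * t3))))
    return first + second
```

The weighted sum can only rescale each term, whereas the evolved expression can combine terms non-linearly, which is the extra freedom GP is meant to exploit.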


Fitness evaluation
Evolutionary objective

1 construction of the reference ranking R_R (decoys sorted by similarity to native)

2 ranking decoys using evolved energy function R_E (decoys sorted by energy)

3 rankings comparison - R_R vs. R_E

4 fitness = average distance for all proteins
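The four steps of the fitness evaluation can be sketched as one Python function. This is only an illustration under stated assumptions: each protein is a list of decoys given as (rmsd, terms) pairs, the ranking distance is an unweighted Spearman footrule, and the per-protein distance is normalised by n^2 / 2 so that proteins with different numbers of decoys can be averaged. None of these choices is confirmed by the talk, and the names `ranks` and `fitness` are hypothetical.

```python
def ranks(values):
    """0-based rank of each item when sorted by increasing value (ties broken by index)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

def fitness(proteins, energy_function):
    """Average normalised footrule distance between R_R and R_E over all proteins.

    `proteins` is a list of decoy sets; each decoy is a (rmsd, terms) pair.
    """
    distances = []
    for decoys in proteins:
        # step 1: reference ranking, decoys sorted by similarity to native
        reference = ranks([rmsd for rmsd, _ in decoys])
        # step 2: ranking by the evolved energy function
        evolved = ranks([energy_function(terms) for _, terms in decoys])
        # step 3: distance between the two rankings
        n = len(decoys)
        raw = sum(abs(r - e) for r, e in zip(reference, evolved))
        distances.append(raw / (n * n / 2.0))  # assumed normalisation by n^2 / 2
    # step 4: average over all proteins
    return sum(distances) / len(distances)
```

With this sketch a fitness of 0 means the energy function orders every decoy set exactly as RMSD does, so lower is better.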

Reference ranking construction
Correlation between energy and similarity to native

R0      0    1    2    3    4    5    6    7
RMSD    3.2  2.1  5.2  1.2  3.5  2.1  4.8  3.5
R1      3    1    5    0    4    7    6    2
R2      3.0  1.5  7.0  0.0  4.5  1.5  6.0  4.5

Ranking types
- R1 - permutation of indices
- R2 - averaged ranks
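Both ranking types can be reproduced from the RMSD values on the slide with a few lines of Python. The sketch below assumes that ties in R1 keep the original index order and that R2 uses 0-based ranks with tied decoys sharing the mean of their ranks; these conventions match the numbers shown, but the function names are hypothetical.

```python
def permutation_ranking(values):
    """R1: decoy indices sorted by increasing value (ties keep index order)."""
    return sorted(range(len(values)), key=lambda i: values[i])

def averaged_ranks(values):
    """R2: 0-based rank of each decoy; tied decoys share the mean of their ranks."""
    order = permutation_ranking(values)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the group while the next decoy has the same value
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks

rmsd_values = [3.2, 2.1, 5.2, 1.2, 3.5, 2.1, 4.8, 3.5]
print(permutation_ranking(rmsd_values))  # [3, 1, 5, 0, 4, 7, 6, 2]
print(averaged_ranks(rmsd_values))       # [3.0, 1.5, 7.0, 0.0, 4.5, 1.5, 6.0, 4.5]
```

The averaged form avoids imposing an arbitrary order on decoys with identical RMSD, such as decoys 1 and 5 in this example.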

Rankings comparison
Measure of distance between rankings

4    3    2    1    5
3    4    1    5    2
1    1    1    4    3    → 10
1    4/5  3/5  2/5  1/5  → 4.6

Distance functions
- Levenshtein edit distance - $O(n)$
- Kendall Tau distance - $O\left(\frac{n(n-1)}{2}\right)$
- Spearman footrule distance - $O\left(\frac{1}{2} n^2\right)$

Ranks weighting
- linear
- sigmoid
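The worked example on this slide (distance 10 unweighted, 4.6 with linear weights) can be checked with a short Python sketch. It assumes the two rows hold the rank of each item in the two rankings, that the third row is the element-wise absolute difference, and that the linear weights are 1, 4/5, ..., 1/5 applied by position. The Kendall tau function is included for comparison; all function names are hypothetical.

```python
def footrule(a, b, weights=None):
    """Spearman footrule: sum of |a_i - b_i|, optionally weighted per position."""
    if weights is None:
        weights = [1.0] * len(a)
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

def linear_weights(n):
    """Linearly decreasing position weights 1, (n-1)/n, ..., 1/n."""
    return [(n - k) / n for k in range(n)]

def kendall_tau_distance(a, b):
    """Number of index pairs (i, j) that the two rankings order differently."""
    n = len(a)
    return sum(
        1
        for i in range(n - 1)
        for j in range(i + 1, n)
        if (a[i] - a[j]) * (b[i] - b[j]) < 0
    )

a = [4, 3, 2, 1, 5]
b = [3, 4, 1, 5, 2]
print(footrule(a, b))                     # 10.0
print(footrule(a, b, linear_weights(5)))  # approximately 4.6
```

Weighting by position makes disagreements near the top of the ranking count for more than disagreements near the bottom.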


Decoys sampling
Selection vs. noise reduction

Simple selection
- top
- uniform
- random

Bin based selection
- equal size
- equal distance


Experiment design

Evolutionary progress
Fitness throughout generations

[Figure: fitness against generation (0 to 1000); panels: levenshtein-steadystate, spearman-generational, kendall-generational]

[Figure: fitness against generation (0 to 1000); panels: spearman-linear-ts8-generational, spearman-sigmoid-ts8-elitism, kendall-ts4-generational]

Observations
- Round I - early saturation
- Round II - small but constant improvement

Landscape analysis
Fitness distribution for the random walk

[Figure: fitness (axis from 0.3 to 0.8) for the decoy sets d-100, d-58, f-100, f-42, random-100, top-100, uniform-100, all]

Population diversity analysis
Genotype - Phenotype - Fitness mapping

Diversity measures
- F - fitness entropy (frequency of duplicates)
- P - root mean square distance between rankings
- G - number of unique trees <#T, #NT, depth>
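Two of the diversity measures, F and G, can be illustrated with a small Python sketch. It assumes that fitness entropy means the Shannon entropy (natural logarithm) of the frequencies of distinct fitness values in the population, with values rounded to a fixed number of digits before counting, and that a tree is identified by its (#T, #NT, depth) tuple. The talk does not give the exact definitions, so both functions and their names are hypothetical.

```python
import math
from collections import Counter

def fitness_entropy(fitness_values, digits=6):
    """Shannon entropy (natural log) of the fitness value frequencies."""
    counts = Counter(round(f, digits) for f in fitness_values)
    total = len(fitness_values)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def unique_tree_shapes(shapes):
    """Number of distinct (#terminals, #non-terminals, depth) tuples."""
    return len(set(shapes))
```

Under these assumptions a population in which every individual has the same fitness has entropy 0, and the entropy grows as duplicates become rarer.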


Improvement over random walk
Is the evolution any good?

decoys set     improvement    avg best
all            0.78%          0.710
uniform-100    0.96%          0.711
random-100     1.28%          0.713
top-100        1.93%          0.702
s-42           7.76%          0.713
s-100          7.64%          0.772
d-58           8.21%          0.780
d-100          10.88%         0.804

Best evolved energy functions
Comparison to naive combination of energy terms

Correlation to RMSD
d-100 (generational+ADF)             0.76
all decoys (steady-state+elitism)    0.30
best single term                     0.24
worst single term                    -0.20
naive combination of terms           0.12
original I-TASSER energy             0.44 (0.51/0.65)
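A correlation between energy and RMSD over a set of decoys can be computed as in the Python sketch below. The talk does not state which correlation coefficient was used, so the Pearson coefficient here is an assumption, and the function name `pearson` is hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# toy data: energies that rise in step with RMSD give a correlation close to 1
energies = [-12.0, -9.0, -6.0, -3.0]
rmsd_values = [1.0, 2.0, 3.0, 4.0]
print(pearson(energies, rmsd_values))
```

A value near 1 means low energy goes with low RMSD, which is the behaviour wanted from an energy function used to pick the final model.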


Comparison to weighted sum of terms
Nelder-Mead downhill simplex optimisation

           spearman-sigmoid     correlation
method     d-100     all        d-100     all
simplex    0.734     0.638      0.650     0.166
GP         0.835     *0.714     0.740     *0.200

Distribution of terminals and operators
Did the evolution discover any knowledge?

energy term       correlation
T1 (E_13)         0.03 ± 0.11
T2 (E_14)         0.20 ± 0.17
T3 (E_15)         0.15 ± 0.15
T4 (E_stiff)      0.24 ± 0.22
T5 (E_HB)         −0.16 ± 0.20
T6 (E_pair)       0.01 ± 0.14
T7 (E_electro)    −0.20 ± 0.23
T8 (E_env)        0.04 ± 0.16
average           0.06

Use of energy terms
- most frequent: T4, T5
- least frequent: T1, T6

Use of operators
- most frequent: add, div
- least frequent: sin, cos, log


Summary

Conclusions
- GP-evolved function outperforms a weighted linear combination of terms
- GP choice of energy terms reflects their correlation to RMSD
- decoys from a real prediction process are more difficult to assess
- bloat control is necessary to evolve more compact functions

Ideas for the future
- more complex total fitness
- distance measured using ProCKSI consensus
- Rosetta generated decoys
- additional energy terms (SA, RCH)


Thank you!

Acknowledgements
This work was supported by Marie Curie Action MEST-CT-2004-7597 under the Sixth Framework Programme of the European Community.

Ben Gurion University of the Negev's Distinguished Scientists Visitor Program and Prof. Moshe Sipper.

[email protected]

Publications

1 P. Widera, J.M. Garibaldi, N. Krasnogor. Evolutionary design of the energy function for protein structure prediction. In IEEE Congress on Evolutionary Computation, CEC'09, p1305–1312, Trondheim, Norway, May 2009

2 P. Widera, J.M. Garibaldi, N. Krasnogor. GP challenge: evolving the energy function for protein structure prediction. Submitted to Genetic Programming and Evolvable Machines, 2008

References

Anfinsen, C. (1973). Principles that Govern the Folding of Protein Chains. Science, 181(4096):223–30.

Dill, K. A. and Chan, H. S. (1997). From Levinthal to pathways to funnels. Nat Struct Mol Biol, 4(1):10–19.

Rohl, C. A., Strauss, C. E. M., Misura, K. M. S., and Baker, D. (2004). Protein Structure Prediction Using Rosetta. In Brand, L. and Johnson, M. L., editors, Numerical Computer Methods, Part D, volume 383 of Methods in Enzymology, pages 66–93. Academic Press.

Wu, S., Skolnick, J., and Zhang, Y. (2007). Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biol, 5(1):17.

Zhang, Y., Kolinski, A., and Skolnick, J. (2003). TOUCHSTONE II: A New Approach to Ab Initio Protein Structure Prediction. Biophys. J., 85(2):1145–1164.