

Abstract

The search for an efficient method of predicting protein conformation began in 1972; however, only minor progress has been made towards a 3-D structure prediction algorithm. Our research focuses on a novel search and optimization method based on the concept of natural selection: the Genetic Algorithm (GA). We have developed a GA program, Genetic Algorithm based Protein Structure Search (GAPSS), that minimizes the potential energy of a protein and generates the corresponding Cartesian coordinates. We were also able to visualize the predicted conformations and compare them to their known natural conformations.

Genetic Algorithm

1. The genetic algorithm starts with an initial population of protein conformations. The population size is chosen by the researcher.

2. After the initial population has been determined, a potential energy function is applied to each member of the population.

3. The reproduction process takes place through three genetic operators, followed by an adaptation step. Operators are rules that modify individuals and the population to introduce diversity into the process.

• Selection – elitism within the population

• Crossover – exchange dihedral angles between chromosomes

• Mutation – randomly replace a gene with a new one

• Adaptation – maximize the fitness of each individual

4. The GA stops on one of two occasions: either a solution has been found, or the GA has shown that further reproduction is impossible.
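Steps 1 to 4 above can be sketched as a short loop. This is only an illustrative sketch, not the GAPSS code: `toy_energy` is a hypothetical stand-in for the potential energy function, the chromosome is assumed to be a flat list of dihedral angles in degrees, a fixed generation limit is assumed as the stopping rule, `pop_size` is assumed to be at least 2, and the adaptation step is left out for brevity.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical stand-in for the potential energy (NOT the GAPSS function)."""
    return sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)


def run_ga(n_angles, pop_size, generations, mutation_rate,
           energy=toy_energy, seed=0):
    """Minimal GA loop over dihedral-angle chromosomes (assumes pop_size >= 2)."""
    rng = random.Random(seed)
    # Step 1: initial population of random dihedral-angle chromosomes.
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_angles)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # Step 2: apply the energy function; lower energy means fitter.
        ranked = sorted(population, key=energy)
        best_per_generation.append(energy(ranked[0]))
        # Step 3, selection: the better half become parents (elitism).
        parents = ranked[:max(2, pop_size // 2)]
        children = [ranked[0][:]]  # the best individual survives unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Step 3, crossover: exchange the angles after a random cut point.
            cut = rng.randrange(1, n_angles) if n_angles > 1 else 0
            child = mother[:cut] + father[cut:]
            # Step 3, mutation: replace a gene with a new random angle.
            child = [rng.uniform(-180.0, 180.0)
                     if rng.random() < mutation_rate else angle
                     for angle in child]
            children.append(child)
        population = children
    # Step 4: assumed stopping rule, the generation limit has been reached.
    best = min(population, key=energy)
    return best, best_per_generation


best, history = run_ga(n_angles=6, pop_size=20, generations=10,
                       mutation_rate=0.2)
```

Because the best individual is copied unchanged into the next generation, the recorded minimum energy cannot increase from one generation to the next.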


Protein Folding Prediction*

Rufei Lu**, Lauren M. Yarholar**, Warren Yates**, Armando Diaz and Miguel J. Bagajewicz

University of Oklahoma, Chemical Engineering

*This work was done as part of the capstone Chemical Engineering class at the University of Oklahoma

**Capstone undergraduate students

Figure 6. Performance analysis. (a) The minimum energy of each generation for different initial population sizes, at a generation limit of 3 and 20% mutation; (b) the minimum energy of each generation for different mutation percentages, at a generation limit of 10 and an initial population of 20.

GAPSS Predicted Single Amino Acid Conformations

[Figure panels: Alanine (A, Ala); Asparagine (N, Asn); Aspartic Acid (D, Asp); Cysteine (C, Cys); Glutamine (Q, Gln); Glutamic Acid (E, Glu); Glycine (G, Gly); Isoleucine (I, Ile); Leucine (L, Leu); Methionine (M, Met); Serine (S, Ser); Threonine (T, Thr); Valine (V, Val)]

Energy Function

3 primary energy terms:

• Electrostatic

• Non-bonded (6-12)

• Hydrogen-bonded

“Torsion energy” is ignored:

• Not a real interaction energy

• Only introduces a penalty for positive torsion

Cysteine loop-closing:

• Introduced only when more than one cysteine is present in the protein
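As a rough illustration of two of the terms listed above, the sketch below sums a Coulomb electrostatic term and a 6-12 non-bonded term over all atom pairs. It is not the GAPSS energy function: the dielectric constant, the single `epsilon` and `r_min` shared by every pair, and the `(x, y, z, charge)` atom layout are all assumptions made for the example, and the hydrogen-bonded and cysteine loop-closing terms are omitted. A real force field would also skip directly bonded pairs.

```python
import math

# Conventional factor converting e^2/Angstrom to kcal/mol.
COULOMB = 332.0


def electrostatic(q_i, q_j, r, dielectric=2.0):
    """Coulomb energy of two point charges; the dielectric value is an assumption."""
    return COULOMB * q_i * q_j / (dielectric * r)


def nonbonded_6_12(r, epsilon, r_min):
    """6-12 (Lennard-Jones) energy with well depth epsilon at distance r_min."""
    ratio = (r_min / r) ** 6
    return epsilon * (ratio * ratio - 2.0 * ratio)


def total_energy(atoms, epsilon=0.1, r_min=3.5):
    """Sum both terms over all atom pairs.

    atoms: list of (x, y, z, charge) tuples (hypothetical layout).
    epsilon, r_min: placeholder parameters shared by every pair.
    """
    total = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            r = math.dist(atoms[i][:3], atoms[j][:3])
            total += electrostatic(atoms[i][3], atoms[j][3], r)
            total += nonbonded_6_12(r, epsilon, r_min)
    return total


atoms = [(0.0, 0.0, 0.0, 0.3), (3.5, 0.0, 0.0, -0.3)]
energy = total_energy(atoms)
```

In this form the 6-12 term reaches its minimum value of `-epsilon` exactly at `r = r_min`, which makes the placeholder parameters easy to interpret.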

Set GA Parameters

Initial Population

Fitness Function

Reproduction Process: Selection, Crossover, Mutation

Adaptation

Offspring Generation

Termination Criterion

End GA

GAPSS Flow Chart

Figure 5. Mutation Operators. (a) The uniform mutation operator randomly replaces original values with values ranging from -180° to 180°; (b) the non-uniform mutation operator randomly shifts the value by a varying number of degrees.

Mutation Operator
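The two mutation operators might look like the sketch below. The uniform version follows the caption directly. The caption does not define the non-uniform operator precisely, so the version here is an assumption: it shifts an angle by a random step whose maximum size shrinks linearly as the generations advance. The names `rate`, `max_step`, and `wrap` are hypothetical.

```python
import random


def wrap(angle):
    """Map any angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def uniform_mutation(chromosome, rate, rng):
    """Replace each angle, with probability `rate`, by a value in [-180, 180]."""
    return [rng.uniform(-180.0, 180.0) if rng.random() < rate else angle
            for angle in chromosome]


def nonuniform_mutation(chromosome, rate, generation, max_generations, rng,
                        max_step=180.0):
    """ASSUMED form: shift an angle by a random step that shrinks over time."""
    # The largest allowed step falls linearly from max_step to zero.
    scale = max_step * (1.0 - generation / max_generations)
    return [wrap(angle + rng.uniform(-scale, scale))
            if rng.random() < rate else angle
            for angle in chromosome]


rng = random.Random(1)
chromosome = [10.0, -170.0, 95.0, 0.0]
early = nonuniform_mutation(chromosome, 1.0, 0, 10, rng)   # large shifts
late = nonuniform_mutation(chromosome, 1.0, 10, 10, rng)   # no shift left
```

Under this assumption, early generations explore widely while late generations only fine-tune, which is one common reason for choosing a non-uniform scheme.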


Figure 1. Modified Operators. (a) Crossover: creates α-helices and β-sheets of random lengths at random start positions. Crossover trades these two parameters (length and start position) between two individuals; (b) Mutation: only the circled region is susceptible to mutation.

α-helix/β-sheet Modified Operators
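One way to picture the modified crossover is as a segment of ideal secondary structure imposed on the backbone. The sketch below assumes each residue is stored as a `(phi, psi)` pair and uses common textbook angles for an α-helix and a β-strand; the poster does not state which values GAPSS uses, so `IDEAL`, `random_segment`, and `impose_segment` are hypothetical.

```python
import random

# Textbook backbone angles in degrees; assumed here, not taken from GAPSS.
IDEAL = {
    "helix": (-57.0, -47.0),   # right-handed alpha-helix
    "sheet": (-139.0, 135.0),  # antiparallel beta-strand
}


def random_segment(n_residues, rng):
    """Pick a random start position and a random length that fits the chain."""
    start = rng.randrange(n_residues)
    length = rng.randint(1, n_residues - start)
    return start, length


def impose_segment(residues, kind, start, length):
    """Return a copy of the backbone with an ideal helix or sheet segment."""
    result = list(residues)
    for i in range(start, min(start + length, len(result))):
        result[i] = IDEAL[kind]
    return result


rng = random.Random(0)
backbone = [(180.0, 180.0)] * 8          # fully extended starting chain
start, length = random_segment(len(backbone), rng)
folded = impose_segment(backbone, "helix", start, length)
```

With this representation, trading the two parameters between individuals amounts to swapping their `(start, length)` pairs and imposing the segments again.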


Figure 1. Crossover Operators. (a) The random 2-point crossover operator randomly exchanges 2 angles at a time between parents; (b) the multiple-entries crossover operator applies multiple random exchanges along the chromosome.

Crossover Operator
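A minimal sketch of the two crossover operators follows, under one possible reading of the caption: two positions are picked at random and the angles at those positions are swapped between the parents, and the multiple-entries version repeats that exchange several times. The function names and the `n_exchanges` parameter are hypothetical, and both parents are assumed to have the same length of at least 2.

```python
import random


def exchange_two_angles(parent_a, parent_b, rng):
    """Swap the angles at two random positions between copies of the parents."""
    child_a, child_b = parent_a[:], parent_b[:]
    for pos in rng.sample(range(len(parent_a)), 2):
        child_a[pos], child_b[pos] = child_b[pos], child_a[pos]
    return child_a, child_b


def multiple_exchange(parent_a, parent_b, n_exchanges, rng):
    """Apply the two-angle exchange n_exchanges times along the chromosome."""
    child_a, child_b = parent_a[:], parent_b[:]
    for _ in range(n_exchanges):
        child_a, child_b = exchange_two_angles(child_a, child_b, rng)
    return child_a, child_b


rng = random.Random(3)
parent_a = [1.0, 2.0, 3.0, 4.0, 5.0]
parent_b = [-1.0, -2.0, -3.0, -4.0, -5.0]
child_a, child_b = exchange_two_angles(parent_a, parent_b, rng)
```

Both functions work on copies, so the parents stay available for further selection, and every angle in a child still comes from one of the two parents at the same position.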

Conclusion and Recommendations

• GAPSS predicts short, isolated native protein structures accurately.

• GAPSS has demonstrated its ability to determine natural conformations for unknown proteins.

• The resolution and accuracy of GAPSS depend largely upon the fitness function and the GA parameter optimization process.

• To further improve the method, a more refined fitness function with torsion angle penalty terms, bond stretching, and bond angle bending should be used.

• Solvation energies and entropic effects need to be added.


Figure 1. Adaptation Operators. Linear gradient search on each chromosome to minimize energy.

Adaptation Operator
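The adaptation step can be illustrated with a simple steepest-descent search on the dihedral angles. This is a sketch under assumptions, not the GAPSS routine: the gradient is estimated by central differences, the step is halved whenever a move fails to lower the energy, and `toy_energy`, `step`, `tol`, and `max_iter` are hypothetical.

```python
import math


def toy_energy(angles):
    """Hypothetical energy with a minimum at -60 degrees for every angle."""
    return sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)


def numerical_gradient(energy, angles, h=1e-3):
    """Central-difference estimate of dE/d(angle) for each angle."""
    grad = []
    for i in range(len(angles)):
        up, down = angles[:], angles[:]
        up[i] += h
        down[i] -= h
        grad.append((energy(up) - energy(down)) / (2.0 * h))
    return grad


def adapt(energy, angles, step=1000.0, tol=1e-6, max_iter=500):
    """Move downhill along the gradient until it is (numerically) zero."""
    current = angles[:]
    e_current = energy(current)
    for _ in range(max_iter):
        grad = numerical_gradient(energy, current)
        if max(abs(g) for g in grad) < tol:
            break  # zero gradient: a local minimum has been reached
        trial = [a - step * g for a, g in zip(current, grad)]
        e_trial = energy(trial)
        if e_trial < e_current:
            current, e_current = trial, e_trial  # accept the downhill move
        else:
            step *= 0.5  # overshoot: retry with a shorter step
    return current, e_current


start = [30.0, -100.0, -20.0]
adapted, e_min = adapt(toy_energy, start)
```

Because only energy-lowering moves are accepted, the adapted chromosome is never worse than the one it started from, which matches the role of adaptation as a local refinement of each individual.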

GA Parameter Optimization


Figure 3. Structures with zero gradient after adaptation. The zero gradient suggests that these structures might be natural conformations at local minima, since their total energy is lower than that of the NMR-confirmed structure.

Local Minimum Structures

Figure 4. Comparison of two predicted backbone structures with the theoretical structure. (a) and (d) are the theoretical backbone structures. (c) and (f) are the GAPSS-predicted protein conformations. (b) and (e) are superimposed images of the predicted and theoretical backbone conformations.

Comparison of Predicted and Theoretical Enkephalin

Figure 2. Potential energy profile of the best prediction run (initial population: 50; generation limit: 15; mutation percentage: 90%). The structures at each point are displayed under the chart. The energy trend suggests that more stringent GA parameters might lead to lower energy.

Potential Energy Profile