AutoDock
An Automated Docking Software for Predicting Optimal Protein-Ligand Interaction
By
Susan McClatchy, Milind Misra, Chandreyee Mukherjee, Indu Shrivastava
Introduction
Chandreyee Mukherjee
Automated Docking: Importance
Interactions between biomolecules lie at the core of all metabolic processes and life activities
The number of solved protein structures available in the databases is expanding exponentially
To understand their functions, it is essential to elucidate the interaction mechanisms between the different molecules
Primary importance lies in rational drug design
Depending upon the success of the docked molecules, the docking ligand may be redesigned or its structure further refined
Docking is also important in immunology, for studying antigen-antibody interaction
Inhibitor bound to active site of HIVPR
Surface structure of HIVPR with bound inhibitor
What is docking?
Prediction of the optimal physical configuration and energy between two molecules
The docking problem optimizes:
  Binding between two molecules, such that their orientation maximizes the interaction
  Total energy of interaction, such that the binding energy is a minimum for the best binding configuration
  The resultant structural changes brought about by the interaction
Categories of docking
1. Protein-Protein Docking
   Both molecules are rigid
   Interaction produces no change in conformation
   Similar to the lock-and-key model
2. Protein-Ligand Docking
   Ligand is flexible but the receptor protein is rigid
   Interaction produces conformational changes in the ligand
Docking uses a "search and score" method
It involves:
  Finding useful ways of representing the molecules and molecular properties
  Exploring the configuration spaces available for interaction between ligand and receptor
  Evaluating and ranking configurations using a scoring system, in this case the binding energy
However, since the binding energy is difficult to evaluate because the binding sites may not be easily accessible, it is modeled as follows:
$$\Delta G_{bind} = \Delta G_{vdw} + \Delta G_{hbond} + \Delta G_{elect} + \Delta G_{conform} + \Delta G_{tor} + \Delta G_{sol}$$
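As a minimal sketch of the "score" half of search and score, the modeled binding energy can be treated as a plain sum of its six terms, and candidate configurations can be ranked by that sum. The pose names and per-term values below are invented for illustration; they are not AutoDock output.

```python
# Hypothetical per-term energies for three candidate poses (illustrative numbers only).
TERMS = ("vdw", "hbond", "elect", "conform", "tor", "sol")

def binding_energy(terms):
    """Sum the six contributions of the modeled binding free energy."""
    return sum(terms[name] for name in TERMS)

def rank_poses(poses):
    """Return (pose_name, total_energy) pairs, lowest (best) energy first."""
    totals = [(name, binding_energy(terms)) for name, terms in poses.items()]
    return sorted(totals, key=lambda pair: pair[1])

poses = {
    "pose_a": {"vdw": -6.0, "hbond": -2.0, "elect": -1.0, "conform": 0.5, "tor": 1.5, "sol": 1.0},
    "pose_b": {"vdw": -4.0, "hbond": -1.0, "elect": -0.5, "conform": 0.25, "tor": 1.5, "sol": 0.75},
    "pose_c": {"vdw": -2.0, "hbond": 0.0, "elect": 0.5, "conform": 1.0, "tor": 1.5, "sol": 1.0},
}
ranking = rank_poses(poses)
```

The first entry of `ranking` is the configuration with the lowest summed energy, which is the one a search-and-score method would report as best.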
The AutoDock Software
Developed by A. J. Olson's group in 1990
AutoDock evaluates the free energy of the docking molecules using 3D potential grids
Uses heuristic search to minimize the energy
Search algorithms used:
  Simulated Annealing
  Genetic Algorithm
  Lamarckian GA (GA+LS hybrid)
Algorithms Overview
Simulated Annealing
  Based on temperature effects
  Start with high temperature and global search
  Lower temperature: local search
Genetic Algorithm
  Charles Darwin's Theory of Evolution
  Genotype → Phenotype
Lamarckian Algorithm (Jean-Baptiste de Lamarck)
  Phenotype → Genotype
Project Goal
Study the algorithms used to perform the searches and to calculate the minimum energy
Discuss why the GA+LS hybrid is better than SA
Look at an example, i.e., dock a ligand to a protein molecule using the latest AutoDock version
The Algorithms
Sue McClatchy
Simulated Annealing
Algorithm modeled after the cooling of a solution to form glass, though it's better explained by crystal formation
Given a long enough cooling time, molecules will relax into their lowest energy state to form the largest crystals
  Quick cooling: highly disordered system
  Slow cooling: highly ordered crystal, with each molecule in its lowest energy state
Algorithm simulates either linear or proportional slow cooling
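The two cooling styles can be sketched as temperature lists. This assumes "linear" means subtracting a fixed amount per cycle and "proportional" means multiplying by a constant factor below 1; the function names are illustrative and this is not AutoDock's own schedule code.

```python
def linear_schedule(t0, n_cycles):
    """Temperature falls by a fixed amount each cycle, starting from t0."""
    step = t0 / n_cycles
    return [t0 - i * step for i in range(n_cycles)]

def proportional_schedule(t0, factor, n_cycles):
    """Temperature is multiplied by a constant factor (< 1) each cycle."""
    return [t0 * factor ** i for i in range(n_cycles)]

linear = linear_schedule(1000.0, 50)
proportional = proportional_schedule(1000.0, 0.95, 50)
```

Both lists start at the same temperature; the proportional schedule drops quickly at first and then flattens, while the linear one loses the same amount every cycle.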
The SA Algorithm
Uses neighborhood operator N(s) to generate a set of solutions according to a fixed distribution
New solution is compared to the preceding solution, and is accepted if its energy is lower than that of the previous solution
If the new solution has higher energy, it is accepted probabilistically according to the Boltzmann distribution (see figure above)
At high temperatures, many higher-energy solutions will be accepted; at low temperatures, the majority of probabilistic moves are rejected
Boltzmann acceptance probability: $p = e^{-\Delta E / T}$, where $\Delta E$ = energy of the new solution minus energy of the previous solution, and $T$ = temperature
The Boltzmann distribution gives the probability of finding a system with energy E at temperature T
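A small worked example of the acceptance rule, assuming the energy difference is defined as new energy minus previous energy, so that an uphill move is accepted with probability exp(-ΔE/T). The numbers are arbitrary.

```python
import math

def acceptance_probability(delta_e, temperature):
    """Probability of accepting a move that changes the energy by delta_e."""
    if delta_e <= 0:
        return 1.0  # downhill or equal-energy moves are always accepted
    return math.exp(-delta_e / temperature)

hot = acceptance_probability(5.0, 1000.0)  # same uphill move at high temperature
cold = acceptance_probability(5.0, 1.0)    # and at low temperature
```

Here `hot` is close to 1 and `cold` is below 1%, which matches the statement that most uphill moves are accepted at high temperature and rejected at low temperature.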
Pseudocode for SA
Compute a random initial state s
n = 0, x*_n = s                        // initialize best solution to s and first state to 0
Repeat i = 1, 2, …                     // specify number of temperatures to try
    Repeat j = 1, 2, …, m_i            // no. of steps to perform for each temp. T_i
        Compute a neighbor s' = N(s)   // s' = new solution from N(s)
        if (f(s') <= f(s)) then        // if energy of s' <= energy of s
            s = s'                     // accept new solution s'
            if (f(s) < f(x*_n)) then   // if energy of new solution < energy of best solution of state n,
                x*_n = s               // replace best with new
                n = n + 1
            endif
        else                           // otherwise replace s with s' using
            s = s' with probability e^((f(s) - f(s'))/T_i)   // Boltzmann dist.
        endif
    EndRepeat
EndRepeat
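The pseudocode can be sketched in Python roughly as follows. The one-dimensional energy function, the neighbor move, and the temperature list are toy assumptions chosen only so the loop can run; they stand in for f, N(s), and T_i.

```python
import math
import random

def simulated_annealing(f, neighbor, s, temperatures, steps_per_temp, rng):
    """Minimize f starting from state s; returns the best state seen."""
    best = s
    for t in temperatures:                   # outer loop: one pass per temperature T_i
        for _ in range(steps_per_temp):      # inner loop: steps at this temperature
            s_new = neighbor(s, rng)
            if f(s_new) <= f(s):
                s = s_new                    # downhill: always accept
                if f(s) < f(best):
                    best = s                 # remember the best solution so far
            elif rng.random() < math.exp((f(s) - f(s_new)) / t):
                s = s_new                    # uphill: accept with Boltzmann probability
    return best

def energy(x):
    return (x - 2.0) ** 2                    # toy energy with its minimum at x = 2

def step(x, rng):
    return x + rng.uniform(-0.5, 0.5)        # toy neighborhood operator

rng = random.Random(0)
temps = [10.0 * 0.95 ** i for i in range(50)]
start = 10.0
best = simulated_annealing(energy, step, start, temps, 20, rng)
```

Because `best` is only replaced on a strict improvement, the returned state never has a higher energy than the starting state, even though the walk itself may climb uphill along the way.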
How Genetic Algorithms Work - A Simple Example
Initial population of binary creatures having 6 "genes"
Each gene has two different alleles, either a 0 or a 1
Three operators: crossover, mutation and selection
1 1 1 1 0 0
0 0 0 0 0 1
1 0 0 0 0 1
0 0 0 0 0 0
Selection
Selection based on a fitness function f(x)
This operator chooses those individuals with the lowest values
Those with higher values chosen with a very low probability
Individual     Score
1 1 1 1 0 0    20
0 0 0 0 0 1    13
1 0 0 0 0 1    48
0 0 0 0 0 0    52
Crossover
Parents        Offspring
1 1 1 1 0 0    0 0 0 1 0 0
0 0 0 0 0 1    1 1 1 0 0 1
1 1 1 1 0 0    1 1 1 1 0 1
0 0 0 0 0 1    0 0 0 0 0 0
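The crossover slide can be reproduced with a one-point crossover on the two selected parents, 1 1 1 1 0 0 and 0 0 0 0 0 1. The cut points 3 and 5 are inferred from the bit patterns shown; the slide itself does not state them.

```python
def one_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length gene lists after index `point`."""
    child_1 = parent_b[:point] + parent_a[point:]
    child_2 = parent_a[:point] + parent_b[point:]
    return child_1, child_2

p1 = [1, 1, 1, 1, 0, 0]
p2 = [0, 0, 0, 0, 0, 1]
first_pair = one_point_crossover(p1, p2, 3)   # cut after the third gene
second_pair = one_point_crossover(p1, p2, 5)  # cut after the fifth gene
```

Together the two calls produce the four offspring listed on the slide.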
Mutation
Before mutation    After mutation
0 0 0 1 0 0        0 0 1 1 0 0
1 1 1 0 0 1        1 1 1 0 1 1
1 1 1 1 0 1        1 1 1 1 0 1
0 0 0 0 0 0        0 0 1 0 1 0
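A bit-flip mutation for the binary creatures might look like the sketch below. The per-gene mutation rate is an assumed parameter; the slide only shows which bits ended up flipped.

```python
import random

def mutate(individual, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in individual]

rng = random.Random(1)
child = [0, 0, 0, 1, 0, 0]
mutant = mutate(child, 0.2, rng)
```

With a rate of 0 the individual is returned unchanged, and with a rate of 1 every bit is flipped; realistic rates sit near the low end so that most genes survive.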
Replacement
Lower scoring individuals create more offspring, higher scoring ones create fewer or none at all
Offspring replace parental generation
"Elitism" function allows best individual from parent generation to persist, if it is a better solution than new individuals created
Cycle of selection, mutation, crossover and replacement repeated
Individual     Score    # offspring
0 0 1 1 0 0    15       1
1 1 1 0 1 1    9        1
1 1 1 1 0 1    22       0
0 0 1 0 1 0    1        2
Pseudocode for GA
Select an initial population set x_i^0 = {x_1^0, x_2^0, …, x_M^0}
Determine fitness values f(x_i^0) for each individual
Repeat for g = 1, 2, … # of generations
    Perform selection
    Perform crossover with a set probability (the crossover rate)
    Perform mutation with a set probability (the mutation rate)
    Determine fitness f(x_i^g) for new individuals
    x_g* = argmin_{i=1,…,M} f(x_i^g) and y_g* = f(x_g*)
    Perform replacement
Until stopping criterion (# of generations) is reached
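A compact Python sketch of this loop on 6-gene binary individuals follows. Selection here simply keeps the better half of the population, which is a simplification of the probabilistic selection described earlier, and the fitness function (the number of 1 bits) is a toy assumption.

```python
import random

def run_ga(fitness, n_genes, pop_size, generations, p_cross, p_mut, rng):
    """Minimize `fitness` over bit lists; returns the best individual found."""
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                    # lowest score first
        elite = pop[0]                           # elitism: best parent survives
        parents = pop[: pop_size // 2]           # selection: keep the better half
        children = [elite]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            if rng.random() < p_cross:           # one-point crossover
                cut = rng.randint(1, n_genes - 1)
                a = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in a]  # mutation
            children.append(child)
        pop = children                           # offspring replace the parents
    return min(pop, key=fitness)

best = run_ga(sum, 6, 8, 30, 0.8, 0.02, random.Random(0))
```

Because the elite individual is copied unchanged into every new generation, the best score in the population can never get worse from one generation to the next.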
How GA works in AutoDock
Ligand's "genes" are its x, y and z coordinates, plus its orientation
The orientation is a unit vector, which is given a random rotation angle between 0° and 360° to form a quaternion
Additional genes may represent torsion angles between bonds of the ligand
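The step from a unit vector and a rotation angle to a quaternion can be sketched with the standard axis-angle formula, q = (cos(θ/2), sin(θ/2)·u). How AutoDock stores and samples these values internally is not shown in the slides, so the sampling below is an assumption.

```python
import math
import random

def random_unit_vector(rng):
    """Draw a random direction by normalizing three Gaussian samples."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]

def axis_angle_to_quaternion(axis, angle_deg):
    """Return (w, x, y, z) for a rotation of angle_deg about a unit axis."""
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)

rng = random.Random(0)
axis = random_unit_vector(rng)
angle = rng.uniform(0.0, 360.0)          # random rotation angle in degrees
q = axis_angle_to_quaternion(axis, angle)
```

A quaternion built this way always has unit length, which is what makes it a valid rotation.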
Mapping
In standard GA, the genotype (x, y, z coordinates plus rotation and any torsion angles) is mapped to the fitness function f(x)
The fitness function value corresponds to each individual's phenotype
According to the right-hand side of the figure, genotypes of parents with high f(x) values are mutated to form genotypes of children with lower f(x) values
Selection, Crossover & Mutation
Selection chooses ligands with the lowest fitness (energy) values
Crossover exchanges x, y, z coordinates, or rotations or torsions between these ligands
Example: Two ligands with xyz coordinates Abc and aBc. Crossover results in new individuals with coordinates abc and ABc
Mutation operator mutates coordinate or other angle values by adding a random real number according to a Cauchy distribution, which is similar to a Gaussian but has thicker tails
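A Cauchy-distributed mutation can be sketched with the inverse-CDF formula, alpha + beta·tan(π(u − 1/2)) for uniform u. Treating alpha as the location and beta as the scale of the distribution is an assumption about how those two parameters are meant.

```python
import math
import random

def cauchy_deviate(u, alpha=0.0, beta=1.0):
    """Map a uniform sample u in (0, 1) to a Cauchy deviate (inverse CDF)."""
    return alpha + beta * math.tan(math.pi * (u - 0.5))

def mutate_gene(value, rng, alpha=0.0, beta=1.0):
    """Add a Cauchy-distributed random number to one real-valued gene."""
    u = rng.random()
    while u == 0.0:  # avoid the singular endpoint of the tangent
        u = rng.random()
    return value + cauchy_deviate(u, alpha, beta)

rng = random.Random(0)
mutated_x = mutate_gene(1.0, rng)
```

Most deviates are small, but the thick tails occasionally produce a very large jump, which lets the search escape a local minimum that a Gaussian step would rarely leave.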
Replacement
Individuals with better-than-average fitness receive proportionally more offspring
$$n_o = \frac{f_w - f_i}{f_w - \langle f \rangle}, \qquad f_w \neq \langle f \rangle$$
where
  $n_o$ = number of offspring
  $f_i$ = fitness of individual (energy of ligand)
  $f_w$ = fitness of worst individual in last g generations (typically 10)
  $\langle f \rangle$ = mean fitness of population
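A worked instance of the offspring formula, using three made-up ligand energies and a made-up worst fitness f_w. Lower energy is better, so the lowest-energy individual should receive the largest share.

```python
def offspring_counts(fitness, worst):
    """n_o = (f_w - f_i) / (f_w - <f>) for every individual i."""
    mean = sum(fitness) / len(fitness)
    if worst == mean:
        raise ValueError("f_w must differ from the mean fitness <f>")
    return [(worst - f_i) / (worst - mean) for f_i in fitness]

energies = [-5.0, -3.0, -1.0]  # hypothetical ligand energies
counts = offspring_counts(energies, worst=0.0)
```

With a mean energy of -3, the individual at exactly the mean gets 1 offspring, the better one gets about 1.67, and the worse one about 0.33.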
Lamarckian Genetic Algorithm
According to the left-hand side of the figure, LGA finds lowest fitness function (energy) values first, then maps these values to their respective genotypes
Genetic algorithm plus Solis and Wets local search
Better performance than either simulated annealing or genetic algorithm alone
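The local-search half of the hybrid can be sketched as a simplified Solis-Wets-style routine: take random steps whose size rho grows after repeated successes and shrinks after repeated failures. The growth factor 2.0 and shrink factor 0.5 are assumptions, the bias term of the full method is omitted, and the sphere function stands in for a docking energy.

```python
import random

def solis_wets(f, x, rng, max_iter=300, rho=1.0, rho_min=0.01,
               n_success=4, n_fail=4):
    """Simplified Solis-Wets-style search: adaptive random steps, keep improvements."""
    fx = f(x)
    successes = failures = 0
    for _ in range(max_iter):
        if rho < rho_min:
            break                                  # steps too small to be useful
        step = [rng.gauss(0.0, rho) for _ in x]
        for sign in (1.0, -1.0):                   # try the step, then its opposite
            cand = [xi + sign * di for xi, di in zip(x, step)]
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                successes, failures = successes + 1, 0
                break
        else:                                      # neither direction improved
            successes, failures = 0, failures + 1
        if successes >= n_success:                 # repeated success: widen the search
            rho, successes = rho * 2.0, 0
        elif failures >= n_fail:                   # repeated failure: narrow the search
            rho, failures = rho * 0.5, 0
    return x, fx

def sphere(v):
    return sum(c * c for c in v)

start = [3.0, -2.0]
improved, improved_energy = solis_wets(sphere, start, random.Random(0))
```

In a Lamarckian scheme the improved coordinates would be written back into the individual's genes, so the offspring inherit what the local search found.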
The Application
Milind Misra
HIV-1 Protease and AHA006
HIV-1 Protease in complex with the cyclic sulfamide inhibitor, AHA006
Source: Protein Data Bank
Authors: K. Backbro, T. Unge
Exp. Method: X-ray Diffraction (2 Å res.)
Primary Citation: Backbro et al., J Med Chem 40, p. 898 (1997)
Polymer Chains: A, B; Residues: 198; Atoms: 1632
Protein (HIV-1 Protease)
Ligand (AHA006)
(Source: PDB)
HIV-1 Protease dimer
(Rasmol)
(SYBYL)
Initial X-Ray crystallographic positions of protein and ligand
Docking Preparation – Ligand
Assign charges
Define rotatable bonds
Rename aromatic carbons
Merge non-polar hydrogens
Write .pdbq ligand file
Docking Preparation – Protein
Add essential hydrogens
Load charges
Merge lone-pairs
Add solvation parameters
Write .pdbqs protein file
Docking Preparation – Grid
AutoDock uses grid-based docking
Ligand-protein interaction energies are pre-calculated and then used as a look-up table during simulation
Grid maps are constructed based on atoms of interest in ligand (here C, A, N, O, S, H)
(AutoDockTools)
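The look-up idea can be sketched as trilinear interpolation over a precomputed 3D grid: the energy at an arbitrary atom position is a weighted average of the eight surrounding grid points. The grid layout, origin, and spacing below are toy assumptions, not AutoDock's map file format.

```python
import math

def trilinear(grid, origin, spacing, point):
    """Interpolate a precomputed energy grid at an arbitrary (x, y, z) point."""
    idx, frac = [], []
    for p, o in zip(point, origin):
        g = (p - o) / spacing              # position in grid-index units
        i = int(math.floor(g))
        idx.append(i)
        frac.append(g - i)
    (i, j, k), (fx, fy, fz) = idx, frac
    total = 0.0
    for di, wx in ((0, 1 - fx), (1, fx)):  # weights of the 8 surrounding grid points
        for dj, wy in ((0, 1 - fy), (1, fy)):
            for dk, wz in ((0, 1 - fz), (1, fz)):
                total += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return total

# Toy 3x3x3 map whose value is a linear function of the grid indices.
grid = [[[i + 2 * j + 3 * k for k in range(3)] for j in range(3)] for i in range(3)]
energy = trilinear(grid, origin=(0.0, 0.0, 0.0), spacing=1.0, point=(0.5, 0.25, 0.75))
```

Since the expensive pairwise energies are computed once per grid point, each evaluation during the search costs only eight table reads per atom.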
Docking – Simulated Annealing
Parameters:
  Runs = 100
  Cycles = 50
  Initial Temp (RT) = 1,000
  Temp reduction factor = 0.95
  Linear temperature reduction
  Translation reduction factor = 1
  Quaternion reduction factor = 1
  Torsional reduction factor = 1
  # rotatable bonds = 12
  Initial coordinates = Random
  Initial quaternion = Random
  Initial dihedrals = Random
  Translation step = 2.0 Å
  Quaternion step = 50 deg
  Torsion step = 50 deg
Results:
  100 different clusters
  Energy range: -0.63 to +64,000
  Conformation #81: -0.63
  Conformation #67: +20.02
  Conformation #68: +10.74
Lowest energy conformation is not close to the original ligand's position but is similar to it in conformation
Conf #67 is closest to the position and conformation of the original ligand; higher energy
Conf #68 is close to the position but not the conformation of the original ligand; not as high energy
Figure (SYBYL): Original ligand conf and SA conformation #67
Figure (SYBYL): Close-up of previous
Figure (SYBYL): Original ligand conf and SA conformation #67
100 Clustered SA Conformations
Figure (gOpenMol)
Docking – Genetic Algorithm
Parameters:
  Runs = 50
  # Evaluations = 250,000
  Population size = 50
  Elitism count = 1
  Mutation rate = 0.02
  Crossover rate = 0.8
  Window size = 10
  Cauchy alpha = 0
  Cauchy beta = 1
  # rotatable bonds = 12
  Initial coordinates = Random
  Initial quaternion = Random
  Initial dihedrals = Random
  Translation step = 2.0 Å
  Quaternion step = 50 deg
  Torsion step = 50 deg
Results:
  50 different clusters
  Energy range: -18.66 to +86.28
  Conformation #39: -18.66
  Conformation #9: -10.60
Lowest energy conformation overall closest to original ligand conformation
If only 10 runs had been used instead of 50, then conf #9 would have been the lowest energy conformation
Docking – Local Search
Parameters:
  Runs = 50
  Solis-Wets iterations = 300
  Consecutive successes = 4
  Consecutive failures = 4
  Rho = 1
  Lower bound on rho = 0.01
  LS frequency = 0.06
  # rotatable bonds = 12
  Initial coordinates = Random
  Initial quaternion = Random
  Initial dihedrals = Random
  Translation step = 2.0 Å
  Quaternion step = 50 deg
  Torsion step = 50 deg
Results:
  18 different clusters
  Energy range: +35.92 to +215,200
  Confs #20, 21, 22, 23: +35.92
Lowest energy conformation was most dissimilar to original ligand conformation
Better results could have been obtained by reducing the step sizes
Docking – Lamarckian GA
Parameters:
  Runs = 10
  Max # Evaluations = 250,000
  Max # Generations = 27,000
  Population size = 50
  Elitism count = 1
  Mutation rate = 0.02
  Crossover rate = 0.8
  Window size = 10
  Cauchy alpha = 0
  Cauchy beta = 1
  Solis-Wets iterations = 300
  Consecutive successes = 4
  Consecutive failures = 4
  Rho = 1
  Lower bound on rho = 0.01
  LS frequency = 0.06
  * Gray options *
Results:
  10 different clusters
  Energy range: -18.10 to -8.38
  Conformation #7: -18.10
Lowest energy conformation fairly similar to original ligand conformation
If the number of runs were restricted to 10 for both GA and LGA, LGA would have generated the best structure
Figure (SYBYL): Original ligand conf, best GA conf, best LGA conf, best SA conf, best LS conf
Figure (SYBYL): Original ligand conf, best GA conf, best LGA conf, best SA conf
References
http://cmgm.stanford.edu/biochem218/Projects%201998/Apaydin.pdf
http://www.biz.uiowa.edu/class/6K299_menczer/PPT/Hart/sld018.html
http://cs.felk.cvut.cz/~xobitko/ga/
http://www.bch.msu.edu/labs/kuhn/web/projects/screening/solvation.html
http://wwwcmc.pharm.uu.nl/gillies/thesis/
http://www.chem.uidaho.edu/~honors/boltz.html
S. Kumar et al. "Protein Flexibility and Electrostatic Interactions." IBM Journal of Research and Development, Vol. 45, No. 3/4 (2001)
G. Morris et al. "Automated Docking Using a Lamarckian Genetic Algorithm and an Empirical Binding Free Energy Function." Journal of Computational Chemistry, Vol. 19, No. 14, 1639-1662 (1998)
C. Rosin et al. "A Comparison of Global and Local Search Methods in Drug Docking." UCSD CSE Technical Report #CS97-522 (1997)
C. A. Sotriffer et al. "Automated Docking of Ligands to Antibodies: Methods and Applications." Methods 20, 280-291 (2000)
M. Vieth et al. "Assessing Search Strategies for Flexible Docking."
Practical Handbook of Genetic Algorithms. Edited by Lance Chambers
An Introduction to Genetic Algorithms. Melanie Mitchell
Goodsell and Olson. Prot. Struct. Func. Genet. 8, 195 (1990)
Principles of Biochemistry. Lehninger
R. Durbin, S. Eddy, A. Krogh, G. Mitchison. Biological Sequence Analysis
Wm. E. Hart. "A Theoretical Comparison of Genetic Algorithms and Simulated Annealing." Sandia National Laboratories, www.cs.sandia.gov/~wehart