CHAPTER 2 2. MOLECULAR MODELINGshodhganga.inflibnet.ac.in/bitstream/10603/17255/7/07_chapter 2.pdf · 2. MOLECULAR MODELING Drug discovery is a multidisciplinary approach where drugs

27

CHAPTER 2

2. MOLECULAR MODELING

Drug discovery is a multidisciplinary approach where drugs are designed

and/or discovered. The drug design & discovery (DDD) process involves

the candidate identification, synthesis, screening studies, and pre-

clinical and clinical studies. The total expenditure for R&D to bring a

molecule (new chemical entity) to market (drug) is estimated to be ~ $1.8

billion dollars and time involved in drug discovery is ~12-15 years with

low rate of new discoveries. The lack in efficiency in converting lead

molecule to preclinical candidate and the ratio of converting of candidate

molecule from phase II to phase III trials are major issues.81 This

situation demands an alternate strategies to cut down the cost and time

involved in the drug discovery and as well as increase the success rate.

2.1 Rational Drug Design (RDD)

A quick look into the recent drug discoveries in the pharmaceutical

industry were there by luck and most of them by chance; also traditional

approaches fails to explain why any compound is resulting into active or

inactive and similarly how to optimize the compound, so that we can

enhance its activity and as well as its properties such as solubility. The

knowledge based approaches (RDD) concept originally developed from the

finding of chemoreceptor concept (Paul Ehrlich, 1872) and lock and key

concept (Emil Fischer, 1894) respectively.82 The further development in

28

the area of molecular biology, bioinformatics tools, crystallography leads

to the development of RDD approaches.

The development of pathophysiology and pathogen biochemistry,

leads to the development of rational approaches towards the drug design.

In early 1900s,Combinatorial chemistry approach made a lager impact in

the drug research and developmental strategiesarea and unlocked the

chemical-bottleneckduring optimization of lead moleculefrom “what can

we make” to “which should we make”.83-85In generalized manner, RDD

(Rational drug design) approach starts with the detection of key

biomolecular target(protein) and the associated affected biochemical

pathways due to the protein over-expression/malfunction and followed

by the designing of small molecules based on the lock & key approach

and which leads to modifying/inhibiting the function of these protein

target.Figure 2.1shows the different strategies involved in the RDD

inorder to identify the novel and potent NCE’s.

Figure 2.1: Different

Role of Computer Aided Molecular Design in Drug

The computer

knowledge based models and used to screen and eliminate many

irrelevant compounds and will help in the identification of new leads.

Ability of the identification of structurally diverse comp

binding affinities that can be correlate its biological activities are the

crucial for in any lead discovery programs/approaches. Many

computational approaches and algorithms are developed for

29

Different Strategies of Rational Drug Design

Role of Computer Aided Molecular Design in Drug Discovery

he computer-aided approaches are used to generate the



Ability of the identification of structurally diverse compounds with good




Discovery

aided approaches are used to generate the



ounds with good




30

combinational chemistry to generate a library with molecular diversity,

available vendor databases and fragment based approaches can be used

to generate an in-house library and these databases can be used for

screening purpose. Target based/structure based approaches are one of

the important approach in RDD, it’s about protein structure,

identification of binding site and analysis and docking approaches to

identify the binding mode of the newly designed compound. One of the

major issues in the SBDD is accurate prediction of the binding mode and

its binding affinities are difficult.86 The thermodynamics properties of

binding cannot be modeled analytically without simplifying the entropic

and associated energetic factors.87

RDDApproaches

Rational design approaches (RDD) are divided based on the

availability/existence of 3D-structure of the biomolecular target. These

are Analog based strategies (ABDD) and Structure based strategies

(SBDD).

2.1.1 Analog Based Studies

In analogue based strategies, it will explore the already

known/reported molecules against the biomolecular target and activity.

Based on the analysis of this reported compounds, set of rules

(hypothesis) are developed to identify/design a NCE's and also suggest

31

the modifications to existing compounds. Analog based approach mainly

depends on the available data (known/reported compounds with specific

activity against that particular target).

2.1.1.1 Quantitative Structure Activity Relationship (QSAR)

QSAR is the most common approach in ABDD. It generates the

correlation between the molecular descriptors (physico-chemical and

structural properties) of known/reported compounds and their

corresponding experimental activities. These generated equations

(correlations) are validated with known compounds and later this

information used in the design of potent NCEs and to optimize the

activity of the already known compounds.

At the earlier stages of QSAR development, used to study the effect

of the functional groups (presence/absence) in a series of compounds

(Free-Wilson model) with its experimental activity and later they used to

compare with the physico-chemical properties (Molecular Descriptors

such as., electronic properties, lipophilicity) of the molecules with its

experimental activity of the dataset (Hansch analysis). 3D-QSAR methods

are developed.88 To generate QSAR equation; (1) Enough no of active

ligands against particular target with structure and activity diverse, to

32

develop the structure activity relationships (SAR); (2) the equations

generated from QSAR, parameterized only for the one specific targetand

(3) also address molecular properties (Descriptors) influencing the

activity and how modify those properties.

2.1.1.1.1 Pharmacophore Model Generation

Pharmacophore modeling is another approach of theanalog based

method. The word “pharmacophore” was coined by Paul Ehrlich as the

skeleton (framework)carrying the crucial features (phoros) necessary for

biological activity of a compound (pharmacon). Soon after in 1977,Peter

Gund defined it as “a set of features (structural) in a molecule that is

recognized at a receptor site and is responsible for that molecule’s

biological activity”. Using reported compounds with its biological activity,

pharmacophore models are constructed and are refined using further

data. In addition to the ligand-based models, pharmacophore model can

constructed using from the known receptor-ligand interactions

(Structure based).In a similar manner, dynamic pharmacophore model

can be generated using molecular dynamics trajectories,which consider

the binding site dynamics (flexibility).89 The generated pharmacophore

models (ligand/structure-based)are used in the optimization of the

known compounds and also to for screening databases in order to get

new leads suitable for further development.90

33

Pharmacophore features

1. Hydrophobic

2. Hydrophobic aliphatic

3. Hydrophobic aromatic

4. Hydrogen bond acceptor

5. Hydrogen bond donor

6. Positive ionizable

7. Negative ionizable

8. Ring aromatic

Manual Pharmacophore Generation: Visual Pattern Recognition

Steps for the development of Pharmacophore model

1 Manualassessment and identification of the common features

shared by a bunch of activecompounds and those features

which are absent in the inactive compounds

2 Identification of common features its spatial arrangements with

other models

3 Improvement of a manual Pharmacophore models and followed

by the validation study of the generated models,to study the

pharmacophore mapping to the active ones and the followed by

the mapping to the inactive compounds.

34

4 Generation of Quantitative Pharmacophore modelsusing the

compounds by known biological activity (IC50), until the

preferred model is obtained.

Compound is considered as inactive, when

a) There is a short of the important groups necessary for

biomolecular target recognition, which are not completely mapping

on to the essential Pharmacophore.

b) Though it contains the pharmacophore features, nevertheless it

also contains the unnecessary groups which obstruct with target

recognition/binding and that can be optimized using these

pharmacophore models.

c) It can be less soluble when compared to its bioactive conformation

of the compound.

d) It has bulky groups which will sterically stop them to interact with

the target biomolecule, it’s an additional asset that might be

detected by these models.

If the protein crystallographic structure is reported and the complete

information about the active site and ligand binding pattern is well

explored, thenthe medicinal chemist should depends on the structure

activity for the ligandsgiven. If all the givendataset compounds are active

35

against the receptor and by studying the interaction pattern, we can

describe the commonality between the data set.

The Pharmacophore model can be used for

� To generate new ideas (search for a new candidate)

� Rank/prioritizing hits

� Develop the virtual compound library

� Estimate the activity

� Using the resulting alignments for further studies

� Finalized models are considered asqueries for mining

Using Catalystpackage (Accelry’s Suite), two types of hypotheses or

pharmacophore models can be generated, based on the

availability/accessibility to the reported compounds and its activities.91

While activity is not considered and needs searchfor common

chemical features present in the set of compounds, in that caseHipHop

module in Catalyst can be used for this function.

If activity data is availablefor set compounds, in that case HypoGen

modulein Catalyst is used. Quantitative (Feature based) models can be

36

generated by means of HypoGen can be used as query for virtual

screening process to recognize as potential hits.

HipHop: (Qualitative/Common feature based)

Hypothesis, or Pharmacophore model, contains of a 3D-arrangement of

chemical features encircled by spheres. Each sphere has a tolerance,

defined by the area in cartesian space which can be occupied by a

particular chemical significance. Each and every feature is assigned with

a weight that indicates its significance within the hypothesis. Whereas,

feature with high weight, indicates that the feature is responsible for the

activity compare other features of the hypothesis.

While the data set (ligand) is small (<15 compounds), with

biological activity range also not sufficient, one can generate a

hypotheses based on the common feature alignments with HipHop

module inCatalyst suite.91,92 HipHop module generates a common

chemical features pharmacophore models share by a set of actives.

HipHop findsthe 3D-dimensional spatial arrangements of chemical

features for common set of actives. The configurations are recognized by

pruned exhaustive search,it begins with the small feature combination

37

and widen them till no further configuration is found. Onehas to define

the number of actives that should map the features entirely or number of

actives can moderately ignore the feature mapping. On basis of the user-

defined options and chosen features, it generates the simplest to varied

hypotheses.

� Principle

– It represents the top active/reference compound(s)

– reference configuration models

0 don’t consider these compounds

1 consider this compounds configurations

2 it consider this as active/reference compound

– only for generation of HipHop model

� Max Omit Features

– It specifies number of features can be omitted for each one

0 map all features

1 all features but at least one/more feature should map to

hypotheses

2 no need of mapping all the features

38

– only for generation of HipHop model

HypoGen: (Quantitative Pharmacophore Models)

It generatesa variety of Structure Activity Relationship (SAR)

hypothesis models by making use of data set moleculeswithits

experimental activity data. HypoGen creates many hypotheses models

and recognizes the features which are commonly sharedin the actives,

but they are not shared by the inactives. The top pharmacophore model

is taken for the prediction of the activity of unknown compounds or to

look for new possible leads contained in 3D chemical databases.93,94

HypoGen generates hypotheses with a set of chemical features in

3D space, each feature contains a certain weight and tolerance that fit

entirely to the hypotheses features of the training set, and the

experimental activity is correlated to it.

The HypoGen has three phases:

1. Constructive phase

2. Subtractive phase

3. Optimization phase

In the constructive phase, it locates pharmacophore models that

are common and shared by actives, while the subtractive phase

withdraws pharmacophore models which are common to the inactives,

and lastly the optimization phase will develop the initial models. The

39

resulting models should contain a set of best features with regression

data. The validated models are utilized as queries in screening to explore

new hits (Figure 2.2) from a virtual database.

Figure 2.2: Pharmacophore guided lead optimization

Running HypoGen

Preparation of Training set:

� Training set should contain minimum of 16 molecules to assure

the statistical power of the model

� Experimental Activities rangeshould be minimum of 4 orders

magnitude

� It should contain 3-4 compounds (every order of magnitude)

40

� No redundant information in activity & No problems in

excluded volume

HypoGen goes through3 phases, a constructive, subtractive &

optimization phase (shown in Figure 2.3).

HypoGen calculates the cost of two observed hypotheses,Fixedcost

(minimum cost), and the Null cost (high cost). Ideal hypothesis cost

should contain the value between these two cost values and it must be

close to the Fixed (Ideal) cost compare to the Null cost.

As per the pharmacophoreguidelines and studies shows that if the

cost difference between a returned hypothesis and the Null hypothesis is

around 40-60 bits, there is a 75-90% chance of true correlation.

Another useful number in the pharmacophore is the Entropy of

hypothesis. It should bebelow 17, a careful inspection of pharmacophore

models will be necessary to select the best model out of it.

Constructive phase

The concept of constructive phase is same as HipHopalgorithm.

Steps:

1) It consider the top 2 actives

41

2) Hypotheses (maximum 5 features) from the top two

activeswill begenerated and stored

3) the fit (values) the other actives are same

Subtractive phase

The hypotheses models generated during the constructive phase

are examined and if any models found common to the most of the

inactives, they will be removed.In this phase, the program deletes all

hypotheses which areunlikely to be of use.

Figure 2.3: Hypotheses generation

42

Optimization phase

During the optimization phase, itutilizes small adjustments to the

hypotheses features formed in the constructive phase, in order to

enhance score and regression values.During the optimization phase, it

implements simulated annealing algorithm.

HypoRefine

This algorithm is an additional algorithm for Catalyst HypoGen algorithm

for making Structure activity relationship (SAR)-based pharmacophore

models required to predict the activities of new molecules. HypoRefine

assist to develop the predictive hypothesis models with improved

regression data that correlates hypothesis which has steric properties.

Besides, it can assist to overcome prediction of inactives (false positives)

with pharmacophore features which are common to the other actives

present in the dataset.

Analysis of cost parameters from the HypoGen output files

During the automated hypothesis generation runs, Catalyst

generates and discards no of models. It differentiates the models by

applying costparameters. The general assumption is based on Occam’s

razor; the simplest model is best with good regression parameters. In

general, if the difference (Null and Fixed cost) is above 60 bits, there is

43

avery good chance of the model to represent a true correlation. The list of

details of remaining cost parameters are described in Table 2.1.

Table 2.1 Pharmacophore cost parameters

S.No Parameter name Description

1 Fixed cost Cost of the simplest possible hypothesis

(initial)

2 Null cost

Costs when each molecule estimated as

mean activity acts like a hypothesis with no

features

3 Weight cost

A value that increases in a Gaussian form as

the feature weight in a model deviates from

an idealized value of 2.0. This cost factor

favors hypotheses in which the feature

weights are close to 2. The standard

deviation of this parameter is given by the

weight variation parameter.

4 Error cost

A value that increases as the rms difference

between estimated and measured activities

for the training set molecules increases. This

cost factor is designed to favor models for

which the correlation between estimated and

measured activities is better. The standard

44

deviation of this parameter is given by the

uncertainty parameter.

5 Configuration cost

A fixed cost depends on the complexity of

the hypothesis space being optimized. It is

equal to the entropy of the hypothesis space.

This parameter is constant among all the

hypotheses.

The main goal made by hypotheses is it should map all the actives

present in the dataset and compared to the inactives. In a simple

statement, the compound is inactive because of a) it missing (lack of)

important feature and/or b) the partial/incomplete pharmacophore

mapping because of the orientation.

Metrics for analyzing Pharmacophores and Hit Lists

The generated pharmacophore model further validated based on its

predictive power and its ability to identify/retrieve actives from the

inactives (decoys) (Figure 2.4).

45

Actives

Database molecules Database molecules

Database molecules

Actives

Actives

Hits

Ha

Ht

Figure 2.4: Database searching using pharmacophore models

D = Total no of compounds in database,

A = No of active compounds in database,

Ht = No of compounds in search hit list and

Ha = No of active compounds in hit list

Pharmacophore Validation

Percentage of yield (actives):

% Y = Ha / Ht x 100

Percent ratio of the activities in the hit list:

% A = Ha / Ax 100

Enrichment (enhancement)

46

E = Ha / Ht = Ha x D

A/ D Ht x A

False negatives: A - Ha

False positives: Ht - Ha

When these two conditions (Ha = A and Ha = Ht), and hence (Ha =

Ht= A), are fulfilled, that is impossible to get the true situation.

In actual situation, no of database compounds which are actives

but not short listed as actives, or not tested for activity against this

target. In both cases, these are end up as “False positives”.

When “False negatives” is nil but not able to retrieve the actives

from the database.

The prefect/ideal hit list which retrieves all actives & nothing else

(False negatives = 0, false positives = 0) (i.e., Ht = Ha= A).

Whereas, the worst hit list is which retrieves all data set else but

the known actives (False positives = D-A, False negatives = A) (i.e., Ha =

0, Ht = D-A).

47

The Goodness of Hit (GH) score indicates a how fine the short

listed hit-list is in all aspects total yield and percentage of actives

retrieval. Table 2.2 provides an example of acceptable range of the hit

lists, from ideal& worst (based on the GH score/enrichment value).

Case % Y % A Enrichment False

negatives

False

positives

GH

score

Best 100 100 500 0 0 1

Typical

Good 40 80 200 20 120 0.60

Extreme

Y 100 1 500 99 0 0.50

Extreme

A 0.2 100 1 0 49,900 0.50

Typical

Bad 5 50 25 50 950 0.26

Worst 0 0 0 100 49,900 0

Table 2.2: Goodness of Hit score values

48

2.1.2 Structure Based Studies

Structure based studies (SBDD) uses 3D-structure of the

biomolecular target has many advantages compared to analog based

studies.Itassist to acquire a theoretical depiction of the protein-ligand

interactions and active site orientations, which enable to design new

leads.95A antihypertensive drug, a Angiotensin Converting Enzyme

(ACE)inhibitor was the first lead designed from SBDD.96Table 2.3shows

the complete list of drugs which are identified from structure based

approaches (SBDD).

Table 2.3: List of drugs identified from structure based studies

Enzyme Disease Drugs Trade name

Neuraminidase

Influenza Oseltamivir

Zanamivir

Tamiflu

Relenza®

Carbonic Anhydrase II Glaucoma Dorzolamide Cosopt®

5-Hydroxy Tryptamine

1B Migraine Zolmitriptan Zomig®

Angiotensin II Hypertension Losartan Cozaar®

EGFR Kinase

Bcr-Abl Kinase

Cancer

CML

Erlotinib

Imatinab

Tarceva®

Gleevac®

49

HIV-protease

HIV-Reverse

transcriptase

AIDS

Indinavir

Nelfinavir

Saquinavir

Ritonavir

Lopinavir

Amprenavir

Tipranavir

Rilpivirine

Etravirine

Crixivan

Viracept

Invirase®

Norvir®

Kaletra®

Agenerase®

Aptivus®

Phase II*

Phase III*

* in clinical trials

2.1.2.1 Docking

The protein data bank (PDB) is a freely accessible repository

maintained by RCSB for the 3D structures of biomolecular

targets,determined by x-ray crystallography& NMR

spectroscopy.96,97,98Large-scale protein structureprojects led by various

genomicconsortiums throughout the world has enabled selection of many

target proteins for their therapeutic potential.99

Docking is defined asinsilicoprediction of non-bonded (non-

covalent) protein-ligand complex. The main objectiveof the docking

algorithms to predict the binding mode and orientation of ligand, from

given the protein structure and ligand. During the docking process, it

identifies the binding mode (with lowest free energy) of a ligand within

50

the protein cavity. Docking mainly deals with two components: binding

pose identification and scoring. Protein simulation process is

computationally expensive due to its high flexibility; because of this

reason, most of the docking algorithmare rigid docking [consider protein

as rigid]or flexible docking [allow side chainsflexibility (active site amino

acids)]. Docking algorithm can be classified as: fragment based approach

(as variant 3) and whole molecule approach [variant 1 and 2] shownin

Figure 2.5. Table 2.4list the existing docking methodologies and the

strategies they follow.In general docking algorithmcalculates the forces

associated and concerned in the protein−ligand bindingsuch as.,Van Der

Waals, electrostatic and hydrogen bonding. 100

Figure 2.5: flexible ligand docking process

51

Table 2.4: Different conformational methods used for docking

Searching

Algorithm Description Examples

Monte Carlo

(MC)

Stochastic method of generating

conformations. Selection based on

Metropolis criterion

Ligand Fit

Simulated

Annealing

(SA)

Random thermal motions are induced,

through high temperatures, to explore

the local search space. System is

driven to a minimum energy

conformation by decreasing

temperature. SA usually combined

with MC

MC-DOCK,

ICM-DOCK,

AutoDock

Genetic

Algorithm

(GA)

Based on Darwin principles of

evolution. ‘Chromosome’ encoding

model parameters (like torsion angles)

is varied stochastically. Populations

generated through genetic operations

(crossover, mutation, migration). The

fittest survives in the population.

GOLD,

DARWIN,

AutoDock

Tabu search Stochastic method of generating PRO_LEADS

52

conformation by keeping a record of

previous conformations (tabu).

Generated conformation is retained if

it is not tabu or if it scores better than

that in tabu

Incremental

construction

Systematic method where ligand is

broken into rigid fragments at

rotatable bonds. Fragments docked in

all possible ways and assembled

piecewise to regenerate ligand

Flex-X, DOCK,

HOOK, LUDI,

Hammerhead

Matching

methods

Based on clique detection technique

from graph theory. Ligand atoms

matched to the complimentary atoms

in the receptor

FLOG, DOCK

Simulation

methods

Molecular dynamics simulations used

to generate conformations DOCK

Docking algorithmwill be implemented in the different phases of

the discovery process:

(1) To predict the binding mode of a known actives

(2) To identify the new ligands in virtual screening process

53

(3) To predict/estimate the binding affinities of series of

known actives

Scoring

Scoring functions in a docking program often have twoaims: to

assist the conformational search algorithm to identify the correct

bindingmodeand to predict the binding energy. This dock score and

binding free energy values allowsthe user to compare binding modes of

same ligand and also to prioritize the ligands during the screening

process.101

Scoring functions are divided into3 types.100: (a) Empirical Scoring;

(b)knowledge based and (c) force field basedscoring methods. Empirical

scoring functions are depends on the functions/rules are derived from a

known crystal structures with its affinities for the co-crystals. Force field

based scoring functions usesparameterized terms (with force field)that

score the electrostatic interactions and Van Der Waals interactions

between protein and ligand. It includesinteraction energy (receptor-

ligand) and internal ligand energy (strain energy induced due to the

binding). Knowledge based scoring methodsis based on the assessment of

the frequency of each non-bonded interaction and the observed distance

between specific type of atoms in of protein–ligand databases. The

probability & occurrence of an each type of interaction and well

compared with the idealvalue.

54

2.1.2.1.1 Docking algorithms

A. GOLD

B. GLIDE

C. Ligand Fit

GOLD

Genetic Optimization for Ligand Docking102 is an rigid docking,

that implementsGA (genetic algorithm) to investigate the widerange of

ligand conformations and gold allows partial flexibility (side chain) of the

protein.

GOLD scoring functions

A. The Goldscore function similar to molecular mechanics

parameterized function with 4 terms:

Shb_ext calculates thehydrogen-bond interactions

Svdw_ext it calculates the Van Der Waals score

Shb_int it calculates the intramolecular hydrogen bonds with in the

ligand; by default this term is not considered (GOLD default),

and which gives the better results;

Svdw_int it calculates the intramolecular strain in the ligand.

GOLD Fitness = Shb_ext + Svdw_ext + Shb_int + Svdw_int ---(1)

55

B. GOLD uses the genetic algorithm (GA) with an adjustable

parameters such as., (a) ligand torsion angle adjustments (b) flip ring

corners of the ligand (c) dihedral angle adjustments OH groups and NH3

groupsof protein; and (d) the ligand mapping on to the fitting points (i.e.,

the orientation/position of the ligand within the site). In addition to these

parameters, user can also adjust docking parameters.

C. GOLD uses fitting points to place/orient the ligand molecule in

the site; it generates fitting points which represents the groups involved

in the hydrogen bonding interactions in protein &ligand, and it

superimpose the acceptor points on to the ligand and donor points on to

the protein and vice-versa. In addition to that, GOLD also generates

Steric (hydrophobic fitting points) in the protein cavity, which maps onto

the ligand CH groups.

ChemScore was developed and derived empirically (a set of 82

protein-ligand complexes) which measures binding affinities 103, 104.

The ChemScore function also added with regression data against

measured affinity, but still it has to be improved in all aspects and

GoldScore is superior compare to Chemscore in predicting binding

affinities.

ChemScore :

56

Here, the no ofterms(regression coefficients) and P terms represents the

physical contributions required to binding.

In addition to above terms, in a clash penalty and internal torsion

terms, also added to ChemScore value, which gives penalty for close

contacts and poor conformations. Covalent &constraint scores are also

included as shown in the below equation:

GLIDE

Grid-based ligand docking with energetic (Glide)docking

calculations is performed with version v3.6. The glide docking

involvesthree steps, Protein preparation, Grid generation and Ligand

docking.

Using the protein preparation wizard, contains charged side

chainsrefinements of improper side chains, adjust bond order, and

monomer from dimer. During the grid generation, grid center can be

defined using the bound ligand or specifying the X, Y, Z coordinates. The

box dimensions (12X12X12), which are automatically adjustable from the

57

bound ligand size, which covers the entire active site pocket. A scaling

factor of Van Der Waals radii is 0.9 was applied to ligand atoms.

Scoring Functions 105, 106

GLIDE 3.5 employs two differentscoring systems:

(i) GLIDE Score 3.5 SP(Standard-Precision)

(ii) GLIDE Score 3.5 XP (Extra-Precision)

These two scoring (SP & XP) functions uses the similar terms but

are used forspecial objectives. Specifically, GLIDE SP 3.5 is a “softer”,

has more forgiving function which allows ligands that have a reasonable

binding. Glide SP version will reduce false negatives and it is an

appropriate choice for database screening.GLIDE score 3.5 XP gives the

severe penalties for poses which violatesknown physical chemistry rules

GLIDE XP scorewill reduce the false positives and especially useful in

optimization of lead.

GLIDE score(also known as modified ChemScore) function as follows:

LigandFit

58

LigandFit module (Accelrys Suite)is another rigid docking algorithm

where protein is considered as rigid and ligand is flexible allowing for

conformation search and docked within the pocket.

The steps involved in the Ligand Fit

1. Site search:

The site can be defined by two ways.

a. Protein Shape (Cavity based): it identifies the sitesusing

proteinshape alone. An “eraser” algorithm used to adjust and

editthe grid points.

b. LigandDocking: Site points are defined usingbound ligand.

2. Conformational Search:

The Monte Carlo method used to generateligand conformational

search.

Ligand Fittings

The ligand fitting is carried out in twosteps:

1. The principle moment of inertia (PMI) of active site pocket is

compared with PMI of ligand. Ligand conformation is selected

based on the PMI fit value (threshold).

2. Further optimization will carried out to adjust ligand positions

and orientation and to improve the docking scores using Rigid

body minimization.

59

2.1.3 De Novo Ligand Design

New drug(De novo) design uses active site information to “develop” a

fragment to whole compoundusing fragment databasebased on the active

site shape and key interactions 107.

Figure 2.6: De novo ligand design flowchart

De novo design carried out in different approaches like site point

approach, fragment based approach. Figure 2.6 gives a schematic

description of few of these strategies.

2.2 Virtual Screening(VS)

Virtual Screening (VS) is theone of key technique in CADD, to search

analyze chemical databases (Vendor and in-house) in order to

60

rank/identify possible new candidate molecules (Figure 2.7). VS

(Structure-based and analog based) is used to screen databases to limit

the no of compounds to be screened experimentally.109 The choice of

virtual screening (SBVS/LBVS) methods is depends on the three

parameters: no of available actives, no of crystal structures

available/not..110-112

Figure 2.7The virtual screening methods

2.3 Molecular Dynamics

Molecular Dynamic simulations (MD) are widely used to study the

conformations changes of biological molecules with the associated kinetic

61

and thermodynamic properties within the time steps (Table 2.5). The one

of basic feature of MD is the generation of a molecular trajectory, i.e. a

series of conformations captured at time steps (regular-intervals) with

complete information.113-115

Table 2.5List of various MD techniques

MD Methods Description

Brownian MD accounts for Brownian motion of molecules

particularly evident with solvents of high

viscosity

Langevin MD Uses Langevin equations of motion where

additional frictional and random forces are

added to Newtonian equation

Activated MD Simulates activated processes allowing to

cross barriers existing between the initial and

final stages

Accelerated MD Time treated as statistical quantity.

Conformational sampling through i) modified

potential energy surface ii) non-Boltzmann

type iii) degrees of freedom at the expense of

faster degrees of freedom

Steered MD Based on Atomic Force Microscopy (AFM)

62

principle. Introduces a time and position

dependent force to steer systems along

particular degrees of freedom

Targeted MD Force used to drive simulation towards target

conformation

SWARM-MD Multiple simulations using molecules with

each one subjected to force introducing

cooperative behavior and drives the trajectory

to the mean trajectory of the entire swarm

Replica

Exchange MD

(REMD)

Series of simultaneous non-interacting

simulations (replica) carried over a range of

temperatures and swapped at particular

intervals of temperature

Self Guided MD

(SGMD)

Introduces an additional guiding force that is

a continuously updated time average of the

force of the current simulation

Leap Dynamics A combination of MD and essential dynamics.

Leaps applied to the system to force it over

energy barriers

Multiple Body

O(N) Dynamics

[MBO(N)D]

Combines rigid body dynamics with multiple

time steps where highest frequency harmonic

motions are removed retaining the low-

63

frequency anharmonic motions

λ Dynamics A multiple copy method with reduced

interaction potential of the ligand, lowering

the barriers of conformational transitions.

Fraction of interacting potential, λi2

(additional degree of freedom) allows ligand to

explore conformational space due to reduced

barriers

2.4 ADMET Prediction

ADMET (Absorption, Distribution, Metabolism, Elimination, and

Toxicity) profiles major challenge in discovery. Table 2.6 lists drugs

which are withdrawn due to their unsatisfactory ADMET profile. Various

InsilicoADME models developed to filter compounds with low ADME

profile.116, 117

Table 2.6: Drugs withdrawn from market due to ADMET failure

Drug Therapeutic

Utility

Year of

withdrawal

Withdrawn due

to

Thalidomide Morning

sickness in

pregnancy

1960s Teratogenicity

leading to

deformities in

64

fetal development

Ticrynafen Diuretic 1982 Hepatitis

Sumatriptan Migraine drug-drug

interactions with

MAO inhibitors

Terfinadine Antihistamine 1997 Cardiotoxicity

Mebifradil Anti

hypertension

Angina

1998 Interferes with

metabolism of

other drugs used

for hypertension

Troglitazone Anti diabetic 2000 Hepatotoxicity

Cerivastatin Anti

hyperlipidemic

2001 Rhabdomyolysis

Rofecoxib Anti

inflammatory

2004 Myocardial

Infarction

Phenyl

proponalamine

In cough as

decongestant

2005 Hemorrhagic

stroke

Simulations One Step Ahead−Future Directions

Advances in computations in structural biology have made it

possible to carry out virtual cell simulations that mimic the cell

environment and the cellular events therein.120 A number of programs

65

have been developed in this area. For example NEURON and GENESIS

simulate the electrophysiological behavior of single neurons and

neuronal networks. One step ahead of these, E-CELL constructs a model

of a hypothetical self-sustaining whole-cell with 127 genes sufficient for

transcription, translation and energy production.121-122

Documents

CHAPTER 2 2. MOLECULAR MODELINGshodhganga.inflibnet.ac.in/bitstream/10603/17255/7/07_chapter 2.pdf · 2. MOLECULAR MODELING Drug discovery is a multidisciplinary approach where drugs