Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Computer Aided Drug Design
Qin Xu
Contact
• 4-221 Life Science Building • Email: [email protected] • Tel: 34204348(O) • Website:
http://cbb.sjtu.edu.cn/~qinxu/teaching.htm
Readings • 分子模拟与计算机辅助药物设计
– 魏冬青等编著,上海交通大学出版社 • 计算机辅助药物分子设计
– 徐筱杰等编著,化学工业出版社 • 计算机辅助药物设计导论
– 叶德泳著,化学工业出版社 • Others
– Textbook of Drug Design and Discovery, Third Edition, by Povl Krogsgaard-Larsen, published by Oxford University Press
– Computational Medicinal Chemistry for Drug Discovery, by Patrick Bultinck, et. al,published by Marcel Dekker, 2003
– Molecular Modeling for Beginners,by Alan Hinchliffe,published by John Wiley and Sons,2002
Outline • Introduction and Case Study • Drug Targets
– Sequence analysis – Protein structure prediction – Molecular simulation
• Drug Design – Combinatorial library – De novo Drug Design – Pharmacophore – QSAR
• Molecular Docking
What is a drug?
• Defined composition with a pharmacological effect
• Regulated by the Food and Drug Administration (FDA)
• Most drugs are small molecules, and the interactions they make with proteins determine their effects, and toxicity, to the human body
Sources of Drugs • Small Molecules
– Natural products • fermentation broths(发酵液) • plant extracts • animal fluids (e.g., snake venoms蛇毒)
– Synthetic Medicinal Chemicals • Project medicinal chemistry derived • Combinatorial chemistry derived
• Biologicals – Natural products (isolation) – Recombinant products – Chimeric or novel recombinant products
Drug Discovery and Drug Development
• Drug Discovery – Concept, mechanism, assay, screening, hit
identification, lead demonstration, lead optimization
– In Vivo proof of concept in animals and concomitant demonstration of a therapeutic index
• Drug Development – Begins when the decision is made to put a
molecule into phase I clinical trials
Issues in Drug Discovery • Hits and Leads 成药性
– Is it a “Druggable” target? • Resistance 抗药性 • Pharmacodynamics 药效学 • Delivery - oral and otherwise 投药方式 • ADMET (absorption, distribution, metabolism,
and excretion - toxicity ) • Patentability
Drug Discovery Processes
Molecular Biological Hypothesis (Genomics)
Chemical Hypothesis
Physiological Hypothesis
Primary Assays Biochemical Cellular Pharmacological Physiological
Sources of Molecules Natural Products Synthetic Chemicals Combichem Biologicals
+ Initial Hit Compounds Screening
Drug Discovery Processes - II
Initial Hit Compounds
Secondary Evaluation - Mechanism Of Action - Dose Response
Initial Synthetic Evaluation - analytics - first analogs
Hit to Lead Chemistry - physical properties -in vitro metabolism
First In Vivo Tests - PK, efficacy, toxicity
Drug Discovery Processes - III
Lead Optimization Potency Selectivity Physical Properties PK Metabolism Oral Bioavailability Synthetic Ease Scalability
Pharmacology
Multiple In Vivo Models
Chronic Dosing
Preliminary Tox
Development Candidate (and Backups)
`> 10,000,000 compounds
1 drug
>1,000 “hits”
6 drug candidates
10-15 years $300 to >$800 million
preclinical clinical Phase I III
12 “leads”
Drug Discovery Pipeline
• The time from conception to approval of a new drug is typically 10-15 years
• The vast majority of molecules fail along the way
• The estimated cost to bring to market a successful drug is now $800 million!! (Dimasi, 2000)
活性化合物
先导化合物
药物候选物 药物
Drug Discovery Disciplines • Medicine • Physiology/pathology • Pharmacology • Molecular/cellular biology • Automation/robotics • Medicinal, analytical,and combinatorial
chemistry • Structural and computational chemistry • Bioinformatics
Why we need computers?
• Clinical trials are most expensive part of the pipeline – if failure can be predicted before this point, it saves time and money
A Little History of Computer Aided Drug Design
• 1960’s - review the target - drug interaction • 1980’s- Automation - high throughput target/drug selection • 1980’s- Databases (information technology) - combinatorial libraries • 1980’s- Fast computers - docking • 1990’s- Fast computers - genome assembly - genomic based target selection • 2000’s- Vast information handling - pharmacogenomics
CADD in various stages of drug discovery
Some introductions in this class
先导化合物设计与优化
药物候选物 随机筛选 1,000-2,000 个化合物
临床前研究
临床研究
市场
理论计算、分子模拟 计算机辅助药物设计
2~3年
2~3年 2~3年
2~3年
3~4年
Modeling Docking
Molecular Simulation
Combinatorial library De novo Drug Design Pharmacophore QSAR
Outline • Introduction and Case Study • Drug Targets
– Sequence analysis – Protein structure prediction – Molecular simulation
• Drug Design – Combinatorial library – De novo Drug Design – Pharmacophore – QSAR
• Molecular Docking
Drug Targets • Receptor, enzyme, ion channel, and
nucleic acid which can act with drug. • Therapeutic Target Database (TTD) http://bidd.nus.edu.sg/group/ttd/ttd.asp A database to provide information about the known and explored
therapeutic protein and nucleic acid targets, the targeted disease, pathway information and the corresponding drugs directed at each of these targets.
Also included in this database are links to relevant databases containing information about target function, sequence, 3D structure, ligand binding properties, enzyme nomenclature and drug structure, therapeutic class, clinical development status.
Therapeutic Target Database (TTD)
Drugs and targets • One drug on one target (FDA standard) • Multi target agents
– One drug on multiple target – Side effects
• Drug combinations – Complexed biological metabolism – Interference between drugs – Cock tail to overcome Drug resistance
• Nature derived drugs, traditional medicine
Types of Drug Combinations • Pharmacodynamically (药代动力学)
– synergistic drug combinations due to anti-counteractive actions
– synergistic drug combinations due to complementary actions
– synergistic drug combinations due to facilitating actions
– additive drug combinations – antagonistic drug combinations – potentiative drug combinations – reductive drug combinations
Successful drug developments
• HIV-1 Protease Inhibitors in the market: – Inverase (Hoffman-LaRoche, 1995) – Norvir (Abbot, 1996) – Crixivan (Merck, 1996) – Viracept (Agouron, 1997)
Drug discovery today 2, 261-272 (1997)
HIV-1 virus
Nat
ure
Med
icin
e 5,
740
- 74
2 (1
999)
Integrase inhibitors
Cocktail therapy of AIDS
Serendipity in Drug Discovery
• Surprised effect of one drug designed for other proposes – Tamoxifen (birth control and breast cancer) – Viagra (hypertension 高血压 and erectile
dysfunction 勃起功能障碍) – Salvarsan (Sleeping sickness and syphilis 胎传性梅毒)
– Interferon-α (hairy cell leukemia 多毛细胞白血病 and Hepatitis C丙肝)
Sequence Alignment • Dynamic programming sequence alignment
– Needleman/Wunsch global alignment – Smith/Waterman local alignment – Linear and affine gap penalties
• Improved algorithms • Heuristic algorithms
– FASTA – BLAST – CLUSTAL
Protein structure prediction • Sequence
– From RNA sequence – Protein sequencing
• Structure
– Secondary – Tertiary
• Function
– Activity, specificity – Binding
Protein structure prediction
• Secondary structure prediction – Distribution of amino acids – Featured sequenceFeatured domain
• Tertiary structure prediction
– The 3D structure of target proteins – Structure of active site, binding site,
possible conformational changes – Molecular docking, molecular dynamics
simulations
Protein 3D structure prediction
• Homology Modeling – Homolog template – Similar sequence similar secondary
structure similar featured domain similar backbone similar 3D structure with optimized side chains
• Threading method – Fold recognition method – Long distance homology protein
• Ab initio prediction – Sequence Structure – Conformer search – Energy minimization
Rosetta
MOE, MODELLER, SWISS-MODEL
% structure overlap the fraction of equivalent residues (Cα atoms within 3.5Å each other).
Sequence identity and Accuracy
Sequence identity and applications
Molecular Simulations
• Molecular Dynamics, Langevin Dynamics – Protein structure prediction – Molecular docking – Predictive ADMET
• Monte-Carlo – Combinatorial library – De novo drug design – Molecular docking
Applications of Molecular Simulations
• Search of conformations – Drug molecules – Target proteins – Binding complex
• Energetic determination – Structural optimization – Binding energy – Free energy simulations
Concepts of Molecular Simulations • Statistical Mechanics
– Phase space – Ensembles: N, V, E and N, P, T
• Scales of simulations – Quantum Mechanical – Molecular Mechanical and force fields – Coarse grained
• Related methods – Boundary condition – Potential truncation – Constraints
Forces calculation and Scales of simulations
http://www.sciencedirect.com/science/article/pii/S0079642509000565
Common workflow in running MD simulations 1. Model Generation. (建模) 2. Energy minimization. (能量最小化) 3. Heating-up the system to the
desired temperature. (加温) 4. Equilibration. (平衡) 5. Production run. (采样计算) 6. Analysis. (分析)
Examples of biomedical researches in computer-aided Drug design
• Much more gigantic system – 104~108 atoms
• Much more complicated – target protein – drug ligand – Solution, membrane environment
• Two examples – nicotinice acetylcholine receptor, nAChR – influenza virus proton channel M2.
nAChR
Ruo-Xu Gu, 2012, Structural Bioinformatics Studies of Two Ion Channel Proteins.
Interactions between JN403 and three different nAChRs
the VdW interactions between JN 403 and the residues at the binding cavity the long rang electrostatic interactions between JN403 and the extracellular domain of the receptors are responsible for the ligand’s selectivity for different nAChR subtypes
Ruo-Xu Gu, 2012, Structural Bioinformatics Studies of Two Ion Channel Proteins.
The correlation of motions of the receptor extracellular domain induced
by agonists, varenicline
Cα of α subunits
Ruo
-Xu
Gu,
201
2, S
truct
ural
Bio
info
rmat
ics
Stud
ies
of T
wo
Ion
Cha
nnel
Pro
tein
s.
The correlation of motions of a~h
Ruo
-Xu
Gu,
201
2, S
truct
ural
Bio
info
rmat
ics
Stud
ies
of T
wo
Ion
Cha
nnel
Pro
tein
s.
Possible binding sites of the allosteric modulators
Ruo-Xu Gu, 2012, Structural Bioinformatics Studies of Two Ion Channel Proteins.
Influenza viral infection to host cell
Behrens G., Stoll M., Kamps B.S., et al., Pathogenesis and immunology. Printed by Druckhaus Sud GmbH & Co. KG, D-50968 Cologne, 2006, 19.
Binding to M2 Channel
• Two sites • Three
paths for ligands
Ruo-Xu Gu, 2012
Structural Bioinformatics Studies of Two Ion Channel Proteins.
Binding at P-site
Rotation of the rimantadine molecule during the MD simulation explained the NMR result.
• amantadine (金刚烷胺 ) bound
• Rimantadine (金刚乙胺 ) bound
Ruo-Xu Gu, 2012
Structural Bioinformatics Studies of Two Ion Channel Proteins.
Block of M2 channel
Free energy of four binding states
rimantadine in the M2-lipid environment
P: thermodynamic preferred (hard to bind, but stably bound)
S: kinetic preferred (easy to come, easy to go)
Ruo-Xu Gu, 2012, Structural Bioinformatics Studies of Two Ion Channel Proteins.
Virtual combinatorial library • Fragment screening (分子片段枚举) • RECAP (分子片段化与重组)
– RECAP analysis – RECAP synthesis
• BREED(配体繁殖) – Crossover by Genetic algorithm – superposition
Terms for combinatorial library
Functional group
R-group
Leaving group
Attachment point
Scaffold
Reagent
Reactive atoms
De novo Drug Design
• Not de novo, but by database searching – Fragment-based – Descriptor-based
• Descriptors and Molecular Docking – Scaffold replacement – Fragment evolution – Link & Merge
Fragment-based Fragments
(databases)
Target active site
(3D structure)
Docking Evaluation Link & Merge
Descriptor-based
Fragments
(databases)
Target active site
(3D structure)
Pharmacophore Query & Docking
Receptor surface calculation
Pharmacophore & Link features
Pharmacophore
Geometric arrangement of functional groups of the ligand that are required for “activity”
A pharmacophore is a spatial arrangement of atoms or functional groups believed to be responsible for biological activity
• Pharmacophore-based alignment
Pharmacophore of KZ7088
7 points pharmacophores
The ligand KZ7088 in the active site of SARS-CoV Mpro
A pharmacophore scheme is a collection of functions that define the meaning, appearance and methods of calculation of ligand annotation points and their attached labels. The scheme defines how each ligand in the searched database is annotated. The default scheme is PCH (Polarity-Charge-Hydrophobicity).
Label Definition Don Hydrogen bond donors, including tautomeric donors. Acc Hydrogen bond acceptors, including tautomeric acceptors. Cat Cations, including resonance cations. Ani Anions, including resonance anions. Hyd Hydrophobic areas. Aro Aromatic centers.
Quantitative Structure-Activity Relationships (QSAR)
• Mathematical Models Found a quantitative relationship
Molecular Parameters
log P σ
MR MLP
…
Biological activity IC50 Ki MIC Permeation …
Statistical tool
The key of QSAR
Descriptor Choice
Statistical Model construction
Fingerprint
{ is-aromatic, has-ring, has-C }
{ has-ring, has-C }
{ has-C, has-N, has-O }
{ is-aromatic, has-ring, has-C, has-N, has-F }
If a universe of features U = { is-aromatic, has-ring, has-C, has-N, has-O, has-S, has-P, has-halogen } there are 28=256 possible fingerprints Tanimoto coefficient between fingerprints X and Y is defined to be: # features in intersect(A,B) / # features in union(A,B)
3D-Molecular Field Calculation
Mol3
Mol1Mol2
PLS Bio = cte + a*S001 + b*S002 + ..... + m*S999 +n*E001 + ..... + z*E999
Bio S001 E001 E999....S002 S999....
Interpretation of CoMFA results
Compounds with low activity
Compounds with high activity
3D-QSAR contour
COMFA Electrostatic
3D-QSAR Parameters
• Traditional CoMFA fields – Steric fields, Lennard-Jones potential
– Electrostatic fields, Coulomb potential
E = r + r
r - 2
r + r rj
k = 1
natoms probe k
ij
12probe k
jkε∑ •
•
6
E = q q
rjprobe k
jkk = 1
natoms •
•∑
ε
More 3D-QSAR Parameters
• Additional fields in CoMFA – Interaction energies with functional groups (GRID software)
– hydrophobic field (HINT software)
– Molecular Lipophilicity Field (CLIP software)
Biological activity parameters
– IC50 (50% inhibiting concentration), – Ki (inhibitory constant), – MIC (minimum inhibitory concentration), – Permeation
Statistical models • Linear (quantitative model)
– LR (linear regression) – MLR (multi-linear regression) – PCR (principle component regression) – PLS (partial least square) – SVR (support vector regression)
• Nonlinear (qualitative model) – SVM (support vector machine) – Bayesian statistics
∑∑
−
−−= 2
2exp
)()(
1 Rmeancalc
calc
yyyy
拟合效果 • 相关系数R,
• 均方根偏差RMSE
• Fisher检验值F
1)(
1 RMSE2
exp
−−
−−= ∑
knyycalc
2
2
)1()1(
RkknRF
−−−
=
calcy 为计算活性; expy 为实验活性; meany
为计算活性平均值; n为样本个数; k 为变量个数
Evaluation of QSAR models
Statistical value for 3D-QSAR •For the step of crossvalidation:
•Cross-validated correlation coefficient, q2. •Optimal number of components, N.
•For the final model: •Squared of correlation coefficient, r2. •Standard error of estimate, s. •Residuals. •F values.
Molecular Docking • Multistep procedure
– Prescreening: ligands or binding site – Pose optimization – Binding energy and evaluations
• Representations of the receptor – Atomic – Surface – Grid
• Scoring schemes – Force field-based: Electrostatic, VDW – Empirical – Knowledge-based scoring functions
Molecular Docking • Systematic docking algorithms
– conformational search methods – fragmentation methods – database methods.
• Random or stochastic algorithms – Monte Carlo methods (MC) – Genetic Algorithm methods (GA) – Tabu Search methods
• Simulation methods – Molecular dynamics – Local minimization
• Flexible protein docking – MD and MC methods – rotamer libraries – protein ensemble grids – soft-receptor modeling
Docking
• Prediction of ligand conformation and orientation (or posing) within a targeted binding site – accurate structural modeling – correct prediction of activity
• Free energy of binding (∆G)
Goals of Docking 1) Characterize binding site - make an image of binding site with
interaction points 2) Orient ligand into binding site - grid search - descriptor mapping - energy search 3) Evaluate strength of the interaction ∆G bind= ∆G complex – (∆Gligand +
∆Gtarget)
Multi-step Process • First, find the poses of small molecules in the
active site • Using simple scoring functions evaluating
compound fits on the basis of calculations of approximate shape and electrostatic complementarities.
• Pre-selected conformers are often further evaluated using more complex scoring schemes – More detailed treatment of electrostatic and van der Waals interactions – inclusion of at least some solvation or entropic effects – And other empirical terms
Molecular Representation • Three basic representations of the
receptor: – atomic, surface and grid
• Atomic representation – only used in conjunction with a potential energy function – often only during final RANKING procedures
• Surface-based docking – Typically used in protein–protein docking – Connolly’s surface
• Potential energy – store information about the receptor’s energetic contributions on
grid points so that it only needs to be read during ligand scoring. – two types of potentials: electrostatic and van der Waals
Standard Potential Energy • Electrostatic potential energy:
• Van der Waals interaction – Lennard-Jones 12-6 function
Docking Models • System: ligand, protein, and solvent
molecules. • Solvent molecules
– they are normally excluded from the problem. – Or implicitly modeled in the scoring functions as a way to address
the solvent effect.
• Rigid-body approximation – very popular in early – treats both the ligand and the receptor as rigid and explores only
the 6 degrees of translational and rotational freedom.
• A more common approach – modeling ligand flexibility while assuming a rigid protein receptor,
therefore considering only the conformational space of the ligand.
Docking algorithms • Systematic docking algorithms • Random or stochastic algorithms • Simulation methods • Flexible protein docking
Systematic Docking • Explore all the degrees of freedom in a
molecule – conformational search methods – fragmentation methods – database methods.
• Conformational search methods – All possible combinations have been generated and evaluated
• Fragmentation methods – “the place-and-join approach” – “the incremental approach”
• Database – use libraries of pre-generated conformations
Stochastic Algorithms • Three basic types of
– Monte Carlo methods (MC) – Genetic Algorithm methods (GA) – Tabu Search methods. (tabu = forbidden)
• Monte Carlo methods – Sampling Boltzmann probability function – Significant advantage over molecular dynamics methods (MD)
• Genetic algorithms – GAs start from an initial population of different conformations of the
ligand with respect to the protein.
Simulation Methods • Two major types exist
– molecular dynamics (MD) – pure energy minimization methods
• Molecular dynamics methods – difficulties in navigating a rugged hypersurface of a biological
system and crossing high-energy barriers – the problem in sampling the conformational space within a feasible
simulation period
• Energy minimization methods – simplex, gradient methods, conjugate-gradient methods, etc. – rarely used as a stand-alone search technique in docking because
only local minima can be reached
Flexible Protein Docking • MD and MC methods, rotamer libraries,
protein ensemble grids, soft-receptor modeling
• MD and MC: – docking simulations with a fully flexible target are currently not feasible.
• Rotamer libraries – Represent the protein conformational space as a set of experimentally
observed and preferred rotameric states for each side chain
• Ensemble of protein conformations – Multiple protein structures from crystal structures, NMR, or calculations
• The Soft-receptor modeling approach – combines the information of several different protein conformations to
generate a single “energy weighted average” grid
Scoring Functions • Lack of a suitable scoring function, both
in terms of speed and accuracy, is the major bottleneck in docking
• Generally able to predict binding free energies within 7–10 kJ/mol
• Three major classes – Force field-based – Empirical – Knowledge-based scoring functions
Force Field-based Scoring • Generally the sum of two energies:
– interaction energy between the receptor and the ligand – the internal energy of the ligand – Lennard-Jones + electrostatic terms
• D-Score, G-Score, GoldScore, and the AutoDock 3.0 scoring function
• Complete molecular mechanical force-fields: – AMBER (Assisted Model Building and Energy Refinement) – CHARMM (Chemistry at HARvard Macromolecular Mechanics) – GROMOS(Groningen Molecular Simulation System) – OPLS (Optimized Potentials for Liquid Simulations)
Empirical Scoring Functions • Designed to reproduce experimental data and are
based on the idea that binding energies can be approximated by a sum of several individual uncorrelated terms.
• Experimentally determined binding energies and sometimes a training set of experimentally resolved receptor–ligand complexes are used to determine the coefficients for the various terms by means of a regression analysis.
• Dependence on the experimental data set used in the parameterization process (non versatile and non transferable).
Knowledge-Based Scoring Functions
• Trying to implicitly capture binding effects that are difficult to model explicitly.
• Typically, very simple atomic interactions pair potentials
• Based on the frequency of occurrence of different atom–atom pair contacts and other typical interactions in large datasets of protein–ligand complexes of known structure.
Capabilities and Limitations • Predict known protein-bound poses with
averaged accuracies of about 1.5–2 A with success rates in the range of 70–80%.
• Significant improvement beyond this range seems for now unachievable, even with the inclusion of receptor flexibility.
• Source of problems – Scoring function is the major limiting factor. – solvent effect and the direct participation of water molecules in protein–ligand
interactions, – Limited resolutions of most crystallographic targets – Protein flexibility
• Recent trends – Inclusion of solvation and rotational entropy contributions – development of search algorithms more able to describe and efficiently sample the
conformational space of the protein–ligand system, within a flexible-target paradigm
Software
Docking – Cancer Drug Design • Ligand (analog-based)
• Target (structure-based)
Protein-Ligand Interactions
Quiz
quantitative? ligand or receptor?
energy or geometry?
local or global?
QSAR
Pharmacophore
Molecular Docking De novo
drug design
In developments
quantitative? ligand or receptor?
energy or geometry?
local or global?
QSAR yes ligand energy or geometry global
Pharmacophore no ligand or
receptor geometry local
Molecular Docking yes ligand &
receptor energy or geometry
global or local
De novo drug
design yes ligand &
receptor energy or geometry local