Plan - DTU Bioinformatics · • A drug is a key molecule involved in a particular metabolic or ......

Preview:

Citation preview

Plan Day 1: •  What is Chemoinformatics and Drug Design? •  Methods and Algorithms used in Chemoinformatics

including SVM. •  Cross validation and sequence encoding •  Example and exercise with hERG potassium channel:

Use of SVM in WEKA program

Day 2: •  Exercise on MHC molecules.

•  Lectures and compendium are available on the course program site at CBS.

•  Data set for the hERG exercise and MHC exercises will have to be downloaded on your directory or your machine.

•  WEKA program can be installed in your own machine or can be run from ”life” server at CBS.

Material

Drug discovery

Drug and drug design •  A drug is a key molecule involved in a particular metabolic or signaling pathway that is specific to a disease condition or pathology.

•  Action of activation (agonist) or inhibition (antagonist) to a biological target (protein, receptor, enzymes, cells…).

•  Drug design is the approach of finding drugs by design, based on their biological targets.

•  An important part of drug design is the prediction of small molecules binding to a target protein (pharmacophore, docking, QSAR,…)

F.K. Brown (1998): Annual Reports in Medicinal Chemistry, 33, 375-384 ”The use of information technology and management has become a

critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization.”

Chemoinformatics- definition

Virtual screening- Chemoinformatics

•  Exploit any knowledge of target(s) and/or active ligand(s) and/or gene family.

•  It involves computational technique for a rapid assessment of large libraries of chemical structures in order to guide the selection of likely drug candidates.

Structure based

Docking…

Ligand based

QSAR, similarity search

Pharmacophore based

knowledge based

Chemical-Biology network

VS

Drug-likeness with a simple counting method,

‘rule of five’  Octanol-water partition coefficient (logP) ≤ 5  Molecular weight ≤ 500  No. hydrogen bond acceptors (HBA) ≤ 10  No. hydrogen bond donors (HBD) ≤ 5   If two or more of these rules are violated, the compound

might have problems with oral bioavailability. (Lipinski et al., Adv. Drug Delivery Rev., 23, 1997, 3.)

Rules have always exception.

(antibiotics, antibacterials and antimicrobials,…)

What the chemist sees.

Aspirin Sildenafil

CC(=O)OC1=CC=CC=C1C(=O)O CCCC1=NN(C2=C1NC(=NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C

1D structure, line notation

SMILES – Simplified Molecular Line Entry System http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

SMILES

What a protein sees.

Aspirin Sildenafil

Electrostatic fields (red are negative and blue positive)

What a protein sees.

Aspirin Sildenafil

Green are hydrophilic area and red are hydrophobic areas

Ligand based- QSAR  Based on a set of experimental data (biological activity,

solubility, toxicity, permeability,…) one tries to correlate these data with some descriptors.

 But what can be these descriptors?

MACCS key

1D descriptors:

MW, number of features, sequence based

2D descriptors:

Topological, physichochemical, BCUT,…

Descriptors

QSAR based on 3 D interaction energies (GRID, CoMFA...)

000 001

002 003

004

DRY O N1 H2O

PCA and PLS model Structural model Molecular interaction field (GRID)

GRID: Determines a total interaction energy. Etot = Evdw + Eelec + Ehb

Descriptors

Blue areas represent the favorable electronegative region and red the unfavorable electronegative regions (based on CoMFA)

Green areas represent the favorable steric region and yellow the unfavorable steric regions (based on CoMFA).

The most common method used to correlate data are usually, PLS, SVM, ANN, K-means.

Example with Antimicrobial Peptides (AMPs)

AMPs are small (10-40 amino acids), cationic and amphipatic molecules, which are encoded directly from DNA. They are ubiquitous in nature, appearing in such diverse organisms as fungi, bacteria, plants, amphibians, insects, and mammals, where they establish a first-line defence mechanism against invading pathogens or competing organisms.

CAP18

Defensin Leucocin A

Tritrpticin

Activity S001 S002 S003 … S131

variant1 63

variant2 82

Activity = y + aS001 + bS002 + … + zS131

Novispirin

Classification from a sequence based analysis

•  Translate sequence information to a quantitative description 29 physicochemical descriptors reduced to 3 latent variables

by Principal Component Analysis (PCA) Hellberg et al., J.Med.Chem. 30 (1987) 1126

•  Extended to non-natural amino acids Sandberg et al., J. Med. Chem. 41 (1998) 2481

•  Classification of GPCRs Lapinsh et al., Prot. Sci. 11 (2002) 795

Amino Acid z-Scales

Amino Acid z-Scales

Translate sequence information to a quantitative description 29 physicochemical descriptors reduced to 3 latent variables

by Principal Component Analysis (PCA) z1 (lipophilicity) z2 (steric properties) z3 (electrostatic properties) Ala Phe Lys

0.07 -1.73 0.09 -4.92 1.30 0.45 2.84 1.41 -3.14

Hellberg et al., J.Med.Chem. 30 (1987) 1126

Principal Property

Translation

•  Principal property translation (z-scales) & multivariate data analysis

52 x 20

(52+n) x 60 z-scales PCA scores

and QSAR models

Novispirin: From sequences to data

training set of 52 Novispirin variants with 60 variables.

q2loo=0.40 ; r2=0.72 sdep =0.14

Novispirin: From structures to data • 131 descriptors: - CPSA descriptors (charged partial surface area) computed with the Tripos force field, dipole moment, energies,surface, weight (33).

- VolSurf descriptors defined interaction of molecules with biological membranes which is mediated by surface properties such as shape, electrostatic, hydrogen-bonding and hydrophobicity (94).

- theoretical descriptors according of the hydrophobicity of each amino acid defined by Engleman-Steitz (4).

48 variables selected with a fractional factorial design (FFD).

q2loo=0.64 ; r2=0.79 sdep =0.11

Single mutation Novispirin prediction with a QSAR model. Each position of the Novispirin sequence is mutated with the 20 naturally amino acids. The residual activity of peptide variants is defined according of the wild type activity. Then, all point superior to 0 represent a peptide mutant with a better predict activity compared to the original novispirin.

position 4

Taboureau Methods Mol. Biol. 2010 Taboureau et al. Chem.Biol.Drug.Des. 2006. Raventos et al. CCHTS. 2005 Mygind et al. Nature. 2005

Pharmacophore

A pharmacophore is the ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target structure and to trigger (or to block) its biological response. A pharmacophore does not represent a real molecule or a real association of functional groups, but a purely abstract concept that accounts for the common molecular interaction capacities of a group of compounds towards their target structure.

Definition

Pharmacophore: chemical features •  The chemical features can be hydrogen bonds acceptors, hydrogen bond

donors, charge interactions, hydrophobic areas, aromatic rings, positive or negative ionizable group.) The shape or volume is also considered.

•  Pharmacophores represent chemical functions, valid not only for the curretly bound, but also unknown molecules. The steric hindrance may explain lack of activity.

Hyd Acc

Acc

Acc & Don

Aro

Start to be complex !!!

Pharmacophore

Atom is acceptor if it’s a nitrogen, oxygen or sulfur and not an amide nitrogen, aniline nitrogen and sulfonyl sulfur and nitro group nitrogen

Donor

Acceptor

Acceptor

Aro ring center

Example 1

Pharmacophore

Pharmacophore Example 2 with 3 inhibitors

•  Dopamine (2 rotations and 2 OH groups)

•  Apomorphine (no rotations)

•  5-OH DPAT (one OH group and many rotation)

Agonist at D2 receptor

Example 2

Active agonists define important groups: –  -Aromatic ring –  -meta OH group –  -N atom, righ distance from aromatic ring –  -other molecular ”scaffolding does NOT get in the way at the receptor

Pharmacophore

Pharmacophore

Pharmacophore

Are features of the site unique to hERG?

Structural based design: Docking

Docking

Kellenberger E. et al. Proteins 2004, 57(2): 225-242

Structural based design: Docking

Comparison of some of the docking tools.

Structural based design: Docking

Induced fit docking

Lock and Key

+ Substrate (ligand)

Enzyme (receptor) + Substrate

(ligand) Enzyme (receptor)

Induced Fit

Zhou, Sciences 2007

Structural based design: an example with antidepressant

Chemogenomics and Pharmacogenomics Chemogenomics: Studied the biological effect of a wide array of small molecules on a wide array of macromolecular targets. Pharmacogenomics: Design drugs according of the genetic variation.

Chemical network: example with neurotransmitter transporters

A441G

S438T

A173S

V343S V343N

A173S CIT: 4.5x gain of function Des-F: 16.5x gain of function

S438T CIT: 175x loss of function Monomethyl: 3.5x loss of function

A441G CIT: 2.5x gain of function CIT with “ears”: 2x gain of function

V343S CIT: 4.5x gain of function Des-CN: 4.9x gain of function [Des-F: 7.4x gain of function]

V343N CIT: 3.5x gain of function Des-CN: 35.5x gain of function [Des-F: 3.8x gain of function]

Chemogenomics and Pharmacogenomics: an example with citalopram

A441G

S438T

A173S

V343S V343N

A173S CIT: 4.5x gain of function Des-F: 16.5x gain of function

S438T CIT: 175x loss of function Monomethyl: 3.5x loss of function

A441G CIT: 2.5x gain of function CIT with “ears”: 2x gain of function

V343S CIT: 4.5x gain of function Des-CN: 4.9x gain of function [Des-F: 7.4x gain of function]

V343N CIT: 3.5x gain of function Des-CN: 35.5x gain of function [Des-F: 3.8x gain of function]

Chemogenomics and Pharmacogenomics: an example with citalopram

Conclusion

 According of the information you have, different strategies can be used.

  If you can develop different strategies which come to the same conclusion, that will reinforce your hypothesis.

  The most information (experimental) you have, the better your validation will be.

Time for a break !!!

Recommended