Cheminformatics in Drug Discovery and Chemical Genomics Research

Weifan Zheng, Ph.D.

Associate Professor, Department of Pharmaceutical Sciences

BRITE Institute, NC Central University

Adjunct Associate Professor, Department of Medicinal Chemistry

University of North Carolina at Chapel Hill

UKY Seminar Weifan Zheng, Ph.D.

Topics to Be Covered

• Biotech/Pharma, Orphan Disease, Chemical Genomics
• Computational Needs: Compound Collection, Docking, Scoring, Data Analytics
• CECCR Cheminformatics Center

Drug Discovery & Development Pipeline

Phases and Costs of Drug Discovery

Drug Discovery Process and the Roles of CADD

[Figure: pipeline stages GR, DR, DD, Preclin, IND, and clinical trials (phases I, II, III); additional labels: T, H, L, C, H2L, LO, T2H; CADD]

• GR: Genetic Research; DR: Discovery Research; DD: Drug Discovery
• CADD: computer-assisted drug discovery
• ADMET: absorption, distribution, metabolism, elimination, toxicity

Human Genome Project Success

“Genome announcement 'technological triumph': Milestone in genetics ushers in new era of discovery, responsibility”

CNN, June 26, 2000

Chemogenomics/Chemical Genomics

Chris Austin, F. Collins

Chemical Genomics

Google hits (Oct. 16, 2006):

• Chemogenomics – 69,000
• Chemical genomics – 113,000
• Chemical biology – 4,210,000
• Chemical genetics – 104,000

Chemical genetics is a research method that uses small molecules to change the way proteins work—directly in real time rather than indirectly by manipulating their genes. It is used to identify which proteins regulate different biological processes, to understand in molecular detail how proteins perform their biological functions, and to identify small molecules that may be of medical value.

to create a national resource in chemical probe development. The center uses the latest industrial-scale technologies to collect data that are useful for defining the cross-section between chemical space and biological activity (and to do so on a genomic scale).

NIH Molecular Library Initiative

[Figure: organization of the NIH Molecular Library Initiative (MLI)]

• Chemical Synthesis Centers: CombiChem, parallel synthesis; DOS; 4 centers + DPI; 100K – 1M compounds
• MLSCN (9+1): 9 centers, 1 NIH intramural; 20 x 10 = 200 assays
• ECCR (6): Exploratory Centers
• PubChem (NLM)
• SAR matrix: compounds x 200 assays

Biological Assay Data

• Biochemical assays
• Cell-based functional assays
• Phenotypic assays
• Databases
– PubChem (http://pubchem.ncbi.nlm.nih.gov/)
– ChemBank (http://chembank.broad.harvard.edu/)
– WOMBAT (http://sunsetmolecular.com/index.php)
– Jubilant (http://www.jubilantbiosys.com/)
– GVK Bio (http://www.gvkbio.com/)

High Throughput Chemistry and Screening: Informatics

[Figure: workflow with nodes Virtual Libraries, Diverse Lib Design, Targeted Lib Design, Combinatorial Synthesis, Real Libraries, HTS, SAR Data, KDD (QSAR, P.R.), Rules; applications: Drug Discovery, Chemical Genomics; side labels: Logistics, Scientific]


Challenges in Combinatorial Chemistry

[Figure: three substitution sites, R1 (3000), R2 (3000), R3 (3000)]

• 3,000^3 = 2.7 x 10^10 compounds; at 1,000 per week = ~0.5 million years!!!
• Library Design: rational selection of a subset of building blocks to obtain a maximum amount of information
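The arithmetic behind the "~0.5 million years" figure can be checked directly. This is a minimal sketch using the slide's numbers; the 52-weeks-per-year conversion and the variable names are assumptions added for illustration.

```python
# Size of a full three-position combinatorial library, using the slide's numbers.
building_blocks_per_position = 3000
positions = 3
screening_rate_per_week = 1000  # compounds per week, as quoted on the slide

library_size = building_blocks_per_position ** positions
weeks = library_size / screening_rate_per_week
years = weeks / 52  # assumption: 52 weeks per year

print(f"{library_size:.2e} compounds in the full library")
print(f"{years:,.0f} years to make and screen all of them")
```

The result is roughly half a million years, which is why only a designed subset of building blocks can be pursued.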

Design for Activity: Similarity

• If we know a compound is active, and we want to design a set of compounds that may be active against the same target, we may select
– A set of compounds that are similar to the active compound
• The similarity principle: similar compounds should have similar biological activity

Molecular Identity and Molecular Similarity

           X1   X2   X3   ...   X20
Str. 1      2    5    1   ...     4
Str. 2      4    7    9   ...     7
Str. 3      1    6    8   ...     6
...       ...  ...  ...   ...   ...
Str. 100    0    3    5   ...     1

[Figure: structures 1, 2, 3 plotted as points in the X1-X2 descriptor plane]
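Once each structure is a row of descriptors, similarity can be expressed as a distance between rows. The sketch below assumes Euclidean distance (the slide does not name a metric) and uses only the four descriptor values visible in the table (X1, X2, X3, X20); the elided columns are not reproduced, and the function names are hypothetical.

```python
import math

# Descriptor values visible on the slide (X1, X2, X3, X20 only);
# the elided columns X4..X19 are deliberately left out.
descriptors = {
    "Str. 1": [2, 5, 1, 4],
    "Str. 2": [4, 7, 9, 7],
    "Str. 3": [1, 6, 8, 6],
    "Str. 100": [0, 3, 5, 1],
}

def euclidean(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def most_similar(query, table):
    """Return the name of the other structure closest to `query`."""
    others = (name for name in table if name != query)
    return min(others, key=lambda name: euclidean(table[query], table[name]))

print(most_similar("Str. 2", descriptors))
```

With these truncated vectors, the nearest neighbor of Str. 2 is Str. 3; with all 20 descriptors the ranking could differ.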

Design for General Application: Diversity

Similarity and Diversity

- MaxiMin: maximize the minimum pairwise distance Dij
- Minimize Sum(1 / Dij^2) over all pairs i, j
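The two diversity functions can be written down compactly. This is a sketch under stated assumptions: Dij is taken to be the Euclidean distance between descriptor vectors, all points are distinct, and `greedy_maximin` is a simple illustrative heuristic, not the selection procedure used in the talk.

```python
import math
from itertools import combinations

def dist(u, v):
    """Euclidean distance between two descriptor vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def min_pairwise_distance(points):
    """MaxiMin objective: a larger value means a more diverse set."""
    return min(dist(p, q) for p, q in combinations(points, 2))

def inverse_square_sum(points):
    """Sum of 1/Dij^2 over all pairs: a smaller value means a more diverse set."""
    return sum(1.0 / dist(p, q) ** 2 for p, q in combinations(points, 2))

def greedy_maximin(pool, k):
    """Greedy heuristic: repeatedly add the point farthest from the current picks."""
    picked = [pool[0]]  # arbitrary seed point
    while len(picked) < k:
        rest = [p for p in pool if p not in picked]
        picked.append(max(rest, key=lambda p: min(dist(p, q) for q in picked)))
    return picked

clustered = [(0, 0), (0, 1), (1, 0)]
spread = [(0, 0), (0, 10), (10, 0)]
print(min_pairwise_distance(clustered), min_pairwise_distance(spread))
print(inverse_square_sum(clustered), inverse_square_sum(spread))
```

Both functions rank the spread-out set as more diverse; the inverse-square sum also penalizes every close pair, not just the closest one.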

[Figure: Cluster Hits Obtained by SAGE and Random Sampling. Bar chart; y-axis: Number of Cluster Hits (0 to 12); x-axis categories: 5s, 5r, 10s, 10r, 15s, 15r, 20s, 20r, 25s, 25r, 30s, 30r; legend: Number of Active Clusters (40, 80, 120, 160, 200)]

Drug Discovery & Development Failures

[Figure: pie chart of failure causes; segment labels: poor PK, efficacy, Tox, Market; segment values: 39%, 29%, 21%, 6%]

Venkatesh & Lipper, J. Pharm. Sci. 89, 145-154 (2000)

Multi-Factorial Design

[Figure: score axis from 0 to 1]

Total Score is the Weighted Sum of Individual Terms

$E(S) = \sum_i w_i E_i(S)$
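A weighted-sum score for a candidate library S can be sketched as follows. The two penalty terms, their cutoffs (molecular weight above 500, clogP above 5, taken from Lipinski's rule of five), the weights, and the example compounds are all illustrative assumptions; the talk does not specify the actual terms or weights.

```python
def total_score(library, terms, weights):
    """E(S) = sum_i w_i * E_i(S) for a candidate library S."""
    return sum(weights[name] * term(library) for name, term in terms.items())

# Hypothetical penalty terms; each maps a library to a value in [0, 1].
def mw_penalty(library):
    """Fraction of compounds with molecular weight above 500."""
    return sum(c["mw"] > 500 for c in library) / len(library)

def clogp_penalty(library):
    """Fraction of compounds with clogP above 5."""
    return sum(c["clogp"] > 5 for c in library) / len(library)

terms = {"mw": mw_penalty, "clogp": clogp_penalty}
weights = {"mw": 0.5, "clogp": 0.5}  # illustrative weights only

library = [  # made-up property values
    {"mw": 350.0, "clogp": 2.1},
    {"mw": 620.0, "clogp": 5.6},
    {"mw": 480.0, "clogp": 5.2},
    {"mw": 410.0, "clogp": 3.3},
]
print(total_score(library, terms, weights))
```

Keeping each term as a separate function makes it easy to add further criteria, such as a diversity term, without changing the scoring code.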

[Figure: Penalty Scores vs. Iteration, from Initial Library through Better Library to Optimal Library; scored properties: Lipinski Properties, P450 Activity, Diversity]

[Figure: R1 x R2 building-block selections; panels: Initial ten solutions (undesigned), The final ten solutions (well designed)]

[Figure: Designed Library Has a Better MW-clogP Distribution; axis: clogP]


SPE Algorithm (Agrafiotis)

• Iterative random sampling of a pair of points a, b
• D(a,b): distance in the original space; D’(a,b): distance in the embedding space (2D)
– If D’ > D, move a, b closer
– If D’ < D, move a, b apart
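The pairwise update rule above can be sketched in a few lines. This is a minimal stochastic proximity embedding sketch, assuming Euclidean distances in both spaces and a learning rate that decreases linearly to near zero; the cycle counts, step counts, and the `stress` diagnostic are illustrative choices, not settings from the talk.

```python
import math
import random

def spe_embed(points, dim=2, cycles=100, steps_per_cycle=2000,
              rate=1.0, eps=1e-9, seed=0):
    """Embed `points` in `dim` dimensions by repeated random pairwise refinement."""
    rng = random.Random(seed)
    n = len(points)
    coords = [[rng.random() for _ in range(dim)] for _ in range(n)]
    decay = rate / cycles
    for _ in range(cycles):
        for _ in range(steps_per_cycle):
            a, b = rng.sample(range(n), 2)
            d_orig = math.dist(points[a], points[b])   # D(a, b)
            d_emb = math.dist(coords[a], coords[b])    # D'(a, b)
            # Positive when D' < D (push apart), negative when D' > D (pull closer).
            shift = rate * 0.5 * (d_orig - d_emb) / (d_emb + eps)
            for k in range(dim):
                delta = shift * (coords[a][k] - coords[b][k])
                coords[a][k] += delta
                coords[b][k] -= delta
        rate -= decay  # smaller corrections in later cycles
    return coords

def stress(points, coords):
    """Normalized squared mismatch between original and embedded distances."""
    num = den = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            num += (d - math.dist(coords[i], coords[j])) ** 2
            den += d ** 2
    return num / den

# A flat 3 x 3 grid given in 3D, so an exact 2D embedding exists.
grid = [(float(i), float(j), 0.0) for i in range(3) for j in range(3)]
print(stress(grid, spe_embed(grid)))
```

Because each step touches only one pair, the cost per step does not depend on the size of the collection, which is what makes this style of embedding practical for large compound sets.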

Chemical Space - Compound Collection Comparison


SPE Embedding of ChemSpace


Quantitative Structure-Activity Relationship (QSAR)

Structures   Activity
str1         a1
str2         a2
str3         a3
str4         a4
str5         a5
str6         a6
str7         a7
str8         a8
str9         a9
str10        a10
...          ...

[Figure: two scatter plots of predicted vs. actual activity; q2 = 0.8, R2 = 0.75]

Multiple linear regression (MLR); partial least squares (PLS); artificial neural nets; k-nearest neighbor (kNN)
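The q2 and R2 statistics can be illustrated with the simplest possible model. The sketch below assumes a single descriptor fitted by ordinary least squares, leave-one-out cross-validation for q2, and the coefficient of determination for R2 (one common definition; the talk does not state which is used). The data values are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one descriptor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def q2_leave_one_out(xs, ys):
    """Cross-validated q2 = 1 - PRESS / SS, leaving one compound out at a time."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss

def r2_external(actual, predicted):
    """Coefficient of determination for an external test set."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical descriptor values
activity = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1]    # hypothetical activities
print(round(q2_leave_one_out(descriptor, activity), 3))
```

The same q2 loop applies to MLR, PLS, or kNN models; only the fitting and prediction step inside the loop changes.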

• Structurally similar compounds should have similar biological activities

• Biological similarities are often due to similarities of substructures (pharmacophore)

• Biological activities can be estimated from molecular similarities, which are calculated with pharmacophore-specific descriptors

Basic Assumptions of KNN-QSAR Method
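These assumptions lead directly to a nearest-neighbor estimate of activity. The sketch below shows only the prediction step, assuming Euclidean distance and inverse-distance weighting; the descriptor selection that a full kNN-QSAR procedure would perform is not shown, and the training data are made up.

```python
import math

def knn_predict(query, training, k=3):
    """Predict activity as the distance-weighted mean of the k nearest neighbors.

    `training` is a list of (descriptor_vector, activity) pairs.
    """
    neighbors = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (math.dist(query, vec) + 1e-9) for vec, _ in neighbors]
    return sum(w * act for w, (_, act) in zip(weights, neighbors)) / sum(weights)

training = [  # hypothetical (descriptors, activity) pairs
    ((0.0, 0.0), 5.0),
    ((0.1, 0.0), 5.2),
    ((0.0, 0.1), 4.8),
    ((5.0, 5.0), 9.0),
    ((5.1, 5.0), 9.1),
]
print(knn_predict((0.05, 0.05), training, k=3))
```

A query near the first cluster is predicted from that cluster alone, which is the similarity principle applied numerically.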

[Figure: Comparison of CoMFA, GA-PLS, and KNN-QSAR. Bar chart of q2 (0 to 0.9) by dataset: AChE (60), 5HT1A (14), DHFR (23), D1 ANT (29); series: CoMFA/q2-GRS, GA-PLS, kNN-QSAR]

[Figure: QSAR Based Virtual Screening for GPCR Ligand Design. Plot of % Active Retrieved (0 to 100) vs. % Screened (0 to 100); series: %Random, %Retrieved]


Docking and Scoring

• In the early 1980s, I. D. Kuntz developed the first computerized molecular docking program: DOCK
• GOLD, FRED, GLIDE, FLEXX, AutoDock, ICM

[Figure: X-ray structure]

Our Approach to Derive DT-SCORE

1. Use Delaunay tessellation to derive geometrical chemical descriptors of the protein-ligand interface
2. Establish a correlation between the geometrical chemical descriptors and protein-ligand binding affinity using a perceptron learning algorithm

Flowchart to Derive DT-SCORE

• Descriptor Generation: receptor-ligand complexes; tessellation of the receptor-ligand interface
• Model Generation & Prediction: perceptron learning algorithm; binding constant; DT-SCORE

• Rigorous definition of nearest neighbors in 2D & 3D space - Delaunay tessellation

Nearest neighbors are unambiguously defined in sets of three (in 2D) and in sets of four (in 3D)

Delaunay Tessellation in 2D
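The 2D case can be sketched by brute force using the empty-circumcircle property: three points form a Delaunay triangle when no other point lies inside their circumcircle. This is an illustrative O(n^4) sketch for small point sets in general position, not the tessellation code used in the talk; the function names are hypothetical.

```python
from itertools import combinations

def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if the points are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2

def delaunay_2d(points):
    """Keep every triple of points whose circumcircle contains no other point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
               for n, (px, py) in enumerate(points) if n not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles

# Four corners of a square plus its center point.
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(delaunay_2d(points))
```

Each triangle is a set of three mutually nearest neighbors; in 3D the same idea uses quadruples of points and empty circumspheres, giving tetrahedra. For real structures a library routine such as SciPy's `scipy.spatial.Delaunay` would be the practical choice.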

Delaunay Tessellation of the Receptor-Ligand Interface

A Detailed View of Active Site Tessellation

[Figure: tetrahedra with vertices labeled R and L; an atom is shared by several tetrahedra]

3 Types of Tetrahedra at the Receptor-Ligand Interface

• RLLL: formed by 1 receptor atom and 3 ligand atoms
• RRLL: formed by 2 receptor atoms and 2 ligand atoms
• RRRL: formed by 3 receptor atoms and 1 ligand atom

Each of the above tetrahedron types is further discriminated by atom types on the vertices

Geometrical Descriptors According to Tetrahedron Types

Type         RRRL              RRLL              RLLL
Atom types   NCNO  ONOS  ...   CNOO  NOCS  ...   COSC  OSXN  ...
Count        5     3     ...   8     2     ...   4     0     ...
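Turning tetrahedra into count descriptors can be sketched as follows. The input is a list of tetrahedra (as vertex indices) and a per-atom label; the canonical ordering used to build each key (receptor atoms first, then alphabetical by element) is an assumption, since the slide does not say how the four-letter atom-type codes are ordered. The atoms and tetrahedra are made up.

```python
from collections import Counter

def tetrahedron_descriptor(tetra, atoms):
    """Return (composition, atom-type key) for an interface tetrahedron, else None.

    `atoms[i]` is a (side, element) pair with side "R" (receptor) or "L" (ligand).
    """
    # Receptor atoms first, then alphabetical by element (an assumed convention).
    verts = sorted((atoms[i] for i in tetra), key=lambda a: (a[0] != "R", a[1]))
    composition = "".join(side for side, _ in verts)
    if composition in ("RRRR", "LLLL"):
        return None  # not at the receptor-ligand interface
    return composition, "".join(element for _, element in verts)

def count_descriptors(tetrahedra, atoms):
    """Count how often each (composition, atom-type key) occurs."""
    counts = Counter()
    for tetra in tetrahedra:
        desc = tetrahedron_descriptor(tetra, atoms)
        if desc is not None:
            counts[desc] += 1
    return counts

atoms = [("R", "N"), ("R", "C"), ("R", "O"), ("L", "N"),
         ("L", "O"), ("L", "C"), ("R", "S")]
tetrahedra = [(0, 1, 2, 3), (0, 1, 3, 4), (2, 3, 4, 5), (0, 1, 2, 6)]
print(count_descriptors(tetrahedra, atoms))
```

Each complex then becomes one row of counts, with one column per (composition, atom-type key) pair.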

“QSAR” Input Table (R·L Interaction Pattern – Binding Affinity Relationship Table)

Receptor-Ligand  Binding   RLLL           RRLL           RRRL
Complexes        Affinity  NCNO  ONOS  …  CNOO  NOCS  …  COSC  OSXN  …
(R • L)1         y1        0     3     …  2     8     …  1     3     …
(R • L)2         y2        1     7     …  3     1     …  0     3     …
…                …         …     …     …  …     …     …  …     …     …
(R • L)m-1       ym-1      3     4     …  0     5     …  4     6     …
(R • L)m         ym        2     0     …  2     2     …  1     0     …

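Single-Layer Perceptron Network

[Figure: input layer with neurons 1, 2, 3, ..., N receiving x1, x2, x3, ..., xN, connected through weights w1, w2, w3, ..., wN to a single output-layer neuron with activation fn(.) and output y]

xi = input of neuron i
wi = weight associated with the input xi
fn(.) = activation function of the output neuron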
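A single-layer network of this kind computes a weighted sum of its inputs and passes it through the activation function. The sketch below assumes an identity activation (suitable for predicting a continuous binding affinity) and the delta rule for training; the learning rate, iteration count, and toy data are illustrative and not taken from the talk.

```python
def predict(weights, x):
    """Output y = f(sum_i w_i * x_i), with f assumed to be the identity."""
    return sum(w * xi for w, xi in zip(weights, x))

def train_perceptron(rows, targets, rate=0.1, iterations=1000):
    """Fit weights with the delta rule: w_i += rate * (y - prediction) * x_i."""
    weights = [0.0] * len(rows[0])
    for _ in range(iterations):
        for x, y in zip(rows, targets):
            error = y - predict(weights, x)
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
    return weights

# Toy descriptor counts and affinities generated from weights (2, 3).
rows = [[1, 0], [0, 1], [1, 1]]
targets = [2.0, 3.0, 5.0]
weights = train_perceptron(rows, targets)
print([round(w, 3) for w in weights])
```

With real descriptor tables the inputs would be the tetrahedron counts for each complex and the target would be the measured binding affinity.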

Training vs. Test Set Selection and Validation

• Entire dataset (264 complexes)
– Training set: 80% (214 complexes); model development (q2)
– Test set: 20% (50 complexes); prediction of the test set (R2)
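The split sizes can be reproduced with a simple hold-out. Random selection with a fixed seed is an assumption here (the slide does not say how the 50 test complexes were chosen), and the complex identifiers are placeholders.

```python
import random

def split_dataset(items, n_test, seed=0):
    """Randomly hold out `n_test` items; return (training_set, test_set)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

complexes = [f"complex_{i}" for i in range(264)]  # placeholder identifiers
training, test = split_dataset(complexes, n_test=50)
print(len(training), len(test))
```

The test set is never used during model development, so its R2 is an estimate of how the model behaves on complexes it has not seen.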

Model Stability

• Average value from multiple (ca. 80) models

[Figure: q2 (R2) vs. Number of Iterations (0 to 1000); y-axis from -2.5 to 1; curves: Training Set, Test Set]

[Figure: Actual vs. Predicted Binding Affinity for the Training Set. Scatter plot of Predicted pKd vs. Actual pKd; 214 complexes: q2 = 0.73]

[Figure: Actual vs. Predicted Binding Affinity for the Test Set. Scatter plot of Predicted pKd vs. Actual pKd; 50 complexes: R2 = 0.61]

Acknowledgements

• NCCU and UNC
– Jerry Ebalunode, Ph.D., BRITE
– Min Shen, Ph.D., Lexicon
– Alex Tropsha, Ph.D., Chair of MedChem, UNC-Chapel Hill
• Funding
– NIH P20HG003898
– NIH R21GM076059

• GSK

– Sunny Hung (GSK)

– George Seibel (JNJ)

– Ken Kopple (retired)

– Jeff Wiseman (Locus)

• Lilly

– Minmin Wang

– Greg Durst

– Jim Wikel (retired)
