Development of Novel Geometrical Chemical Descriptors and Their Application to the Prediction of Ligand-Protein Binding Affinity Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling School of Pharmacy University of North Carolina at Chapel Hill May 16, 2022


Page 1

Development of Novel Geometrical Chemical Descriptors and Their Application to the Prediction of Ligand-Protein Binding Affinity

Shuxing Zhang, Alexander Golbraikh and Alex Tropsha

The Laboratory for Molecular Modeling, School of Pharmacy

University of North Carolina at Chapel Hill

April 21, 2023

Page 2

Problem

Given a protein-ligand complex, predict ligand binding affinity.

Page 3

Knowledge-based (Statistical) Potentials

• Two-body potentials
PMF: Muegge, I.; Martin, Y. C. J. Med. Chem. 1999, 42, 791-804
BLEEP: Mitchell, J. B.; Laskowski, R. A.; Alex, A.; Thornton, J. M. J. Comp. Chem. 1999, 20, 1165-1176
DrugScore: Gohlke, H.; Hendlich, M.; Klebe, G. J. Mol. Biol. 2000, 295, 337-356
SMoG: DeWitte, R. S.; Shakhnovich, E. I. J. Am. Chem. Soc. 1996, 118, 11733-11744
SMoG2001: Ishchenko, A. V.; Shakhnovich, E. I. J. Med. Chem. 2002, 45, 2770-2780

• Four-body contact potential (by Jun Feng)

Page 4

Full Atom-based Delaunay Tessellation of Protein-ligand Interface (5HVP)

An example of active-site tessellation: the ribbon diagram represents the two chains of HIV-1 protease. The ligand, acetyl-pepstatin, is shown in spacefill mode, and the yellow tetrahedra are those formed by protein and ligand atoms.

Page 5

Three Types of Tetrahedra at Protein-ligand Interface

• RRRL: formed by 3 receptor atoms and 1 ligand atom
• RRLL: formed by 2 receptor atoms and 2 ligand atoms
• RLLL: formed by 1 receptor atom and 3 ligand atoms
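The three interfacial classes can be told apart simply by counting receptor and ligand vertices. The following is a minimal Python sketch, assuming each tetrahedron is supplied as four labels, 'R' or 'L'; the function name `classify_tetrahedron` and this input encoding are illustrative choices, not part of the original work.

```python
from collections import Counter


def classify_tetrahedron(vertex_origins):
    """Return 'RRRL', 'RRLL' or 'RLLL' for an interfacial tetrahedron,
    or None when all four vertices belong to the same molecule.

    vertex_origins: four labels, 'R' (receptor) or 'L' (ligand).
    This encoding is an assumption made for illustration.
    """
    if len(vertex_origins) != 4 or any(v not in ("R", "L") for v in vertex_origins):
        raise ValueError("expected four labels, each 'R' or 'L'")
    n_receptor = Counter(vertex_origins)["R"]
    if n_receptor in (0, 4):
        return None  # purely ligand or purely receptor: not interfacial
    return "R" * n_receptor + "L" * (4 - n_receptor)


print(classify_tetrahedron(["L", "R", "R", "R"]))  # RRRL
```

Tetrahedra for which the function returns None lie entirely inside the receptor or the ligand and would not be counted as part of the interface.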

Page 6

Earlier work: Four-Body Statistical Contact Scoring Function Based on Delaunay Tessellation

$$E_{RRRL} = \ln \frac{f_{RRRL}}{f_{RRR}\, f_{L}}$$

$$E_{RRLL} = \ln \frac{f_{RRLL}}{f_{RR}\, f_{LL}}$$

$$E_{RLLL} = \ln \frac{f_{RLLL}}{f_{R}\, f_{LLL}}$$

Page 7

[Figure: scatter plot with axes ΔΔG, calc and ΔΔG, exp, both running from -100 to 0; R2 = 0.4678]

$$E = E_{RRRL} + E_{RRLL} + E_{RLLL}$$

Correlation between experimental and calculated binding free energy for the PMF dataset using the four-body scoring function

Page 8

Comparison of Current Scoring Functions

Method      Training set size   Test set size   Test set R2
BLEEP       351                 90              0.53
PMF         697                 77              0.61
SMoG96      120                 46              0.42
SMoG2001    725                 111             0.436
DT2001      319                 67              0.71
DT2002      319                 107             0.54

Page 9

Multiple CG descriptors of protein-ligand interface and correlation with ligand affinity

• Define the ligand-receptor interface by means of Delaunay tessellation (DT)

• Calculate chemical descriptors for nearest-neighbor atom quadruplets

• Use a statistical data modeling approach to correlate descriptors with affinity

Page 10

Descriptors derived from atomic electronegativity

µ: Electronegativity (chemical potentials) of atoms

Q: Partial charges on atoms

η: Hardness kernel

According to studies from Dr. Berkowitz's lab, electronegativity (EN) is closely related to the energy of molecules (see formula). Qualitatively, it is also known to be related to hydrogen bonding, polarity and polarization. We hope to describe structure and binding with this parameter by applying it to Delaunay tessellation. There are several ways to apply EN to our geometrical method.

Page 11

Atom Type Definition Based on EN Values

Ligand Atom Types

O: EN = 3.4
N: EN = 3.0
C: EN = 2.5
S: EN = 2.4
X: P and halogens, EN = 2.0 ~ 2.4, 4.0
M: Metals and all other unexpected atom types, EN = 0.6 ~ 1.6

Receptor Atom Types

O: EN = 3.4
N: EN = 3.0
C: EN = 2.5
S: EN = 2.4

There are 554 possible interfacial quadruplet composition types. After processing 517 complexes, 100 are found to occur with high frequency (at least 50 times).
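The figure of 554 composition types is consistent with counting unordered quadruplets built from 4 receptor types and 6 ligand types, with at least one atom from each side. The short Python check below makes that counting explicit; treating compositions as unordered (multisets) and treating receptor and ligand atoms of the same element as distinct types are assumptions inferred from the total, and the function name is illustrative.

```python
from itertools import combinations_with_replacement

RECEPTOR_TYPES = ["O", "N", "C", "S"]
LIGAND_TYPES = ["O", "N", "C", "S", "X", "M"]


def count_composition_types(receptor_types, ligand_types):
    """Count unordered quadruplet compositions with 1 to 3 receptor atoms."""
    counts = {}
    for n_receptor in (3, 2, 1):
        n_ligand = 4 - n_receptor
        # Multisets of receptor types times multisets of ligand types.
        receptor_part = list(combinations_with_replacement(receptor_types, n_receptor))
        ligand_part = list(combinations_with_replacement(ligand_types, n_ligand))
        counts["R" * n_receptor + "L" * n_ligand] = len(receptor_part) * len(ligand_part)
    return counts


counts = count_composition_types(RECEPTOR_TYPES, LIGAND_TYPES)
print(counts, sum(counts.values()))
# {'RRRL': 120, 'RRLL': 210, 'RLLL': 224} 554
```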

To generate descriptors, the atom types must first be defined. Here we use EN as the criterion; the reasons are discussed on the following slides. First, we want our descriptors to make more physico-chemical sense, in the hope of explaining the complicated binding process mechanistically. Second, we want to keep the number of descriptors from growing too large.

Page 12

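Descriptor Calculation

$$EN_m = \sum_{i=1}^{n} \sum_{j=1}^{4} EN_{ij}$$

m: m-th tetrahedral composition type
j: Vertex of a tetrahedron
n: Number of tetrahedra of the m-th composition type (indexed by i)

[Figure: example tetrahedron with vertices S_L (2.4), C_R (2.5), O_L (3.4) and N_R (3.0)]

Thus, there are 100 descriptors for each protein-ligand complex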
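The descriptor for a composition type is the sum of vertex EN values over all tetrahedra of that type. A minimal Python sketch of that sum follows, assuming each tetrahedron is given as four (atom type, origin) pairs; the function names, the key format, and the restriction to the O, N, C and S types (X and M span ranges of EN values on the atom-type slide, so no single value is assumed for them) are illustrative choices.

```python
from collections import defaultdict

# EN values for the atom types that have a single value on the atom-type slide.
EN = {"O": 3.4, "N": 3.0, "C": 2.5, "S": 2.4}


def composition_key(tetrahedron):
    """Order-independent label for a tetrahedron, e.g. 'C_R,N_R|O_L,S_L'."""
    receptor = sorted(t for t, origin in tetrahedron if origin == "R")
    ligand = sorted(t for t, origin in tetrahedron if origin == "L")
    return ",".join(f"{t}_R" for t in receptor) + "|" + ",".join(f"{t}_L" for t in ligand)


def en_descriptors(tetrahedra):
    """Sum vertex EN values over all tetrahedra of each composition type."""
    descriptors = defaultdict(float)
    for tetrahedron in tetrahedra:
        descriptors[composition_key(tetrahedron)] += sum(EN[t] for t, _ in tetrahedron)
    return dict(descriptors)


# The example tetrahedron from the slide, occurring twice in a hypothetical complex.
example = [("S", "L"), ("C", "R"), ("O", "L"), ("N", "R")]
descriptors = en_descriptors([example, example])
print(descriptors)
```

Each occurrence of the example tetrahedron contributes 2.4 + 2.5 + 3.4 + 3.0 = 11.3, so two occurrences give a descriptor value of about 22.6 for that composition type.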

Page 13

Flowchart of Novel Descriptor Generation

• 264 complexes
• Process files and assign atom types based on EN value
• Define the interaction interface with DT and record all interfacial tetrahedra
• Classify interfacial tetrahedra into different composition types and calculate their EN values (descriptors)
• Correlate with binding

Page 14

Data Modeling

Structure    Binding     CG Descriptors
Comp.1       Value1      D1  D2  D3  D4
Comp.2       Value2      "   "   "   "
Comp.3       Value3      "   "   "   "
...          ...         ...
Comp.264     Value264    "   "   "   "

Goal: Establish correlations between descriptors and binding affinity that are capable of predicting the binding of novel complexes

{Binding affinity} = K̂{descriptor diversity}

Page 15

[Figure: bar chart of the number of complexes (0 to 30) in each complex family]

Diversity of the dataset: 264 complexes, 33 families

Our structures and protein families are highly diverse, which makes their binding affinity hard to predict for most current scoring functions.

Page 16

Data Modeling Workflow

• 264 complexes
• Randomly exclude 24 complexes as an external set
• Split the remaining 240 into multiple training sets and multiple test sets
• Build models using kNN with variable selection
• Only accept models that have q2 > 0.6, R2 > 0.6, etc.
• Y-randomization
• Validate predictive models with the randomly selected external set (24)
• Binding prediction
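The data-splitting part of this workflow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: the splits here are purely random, the test set size of 49 is taken from the later results slide, and `split_dataset`, `n_splits` and `seed` are hypothetical names and parameters.

```python
import random


def split_dataset(complex_ids, n_external=24, n_test=49, n_splits=3, seed=0):
    """Hold out an external set, then draw several training/test splits
    from the remaining complexes."""
    rng = random.Random(seed)
    shuffled = list(complex_ids)
    rng.shuffle(shuffled)
    external, modeling = shuffled[:n_external], shuffled[n_external:]
    splits = []
    for _ in range(n_splits):
        # Each split draws a fresh random test set from the modeling set.
        test = set(rng.sample(modeling, n_test))
        training = [c for c in modeling if c not in test]
        splits.append((training, sorted(test)))
    return external, splits


external, splits = split_dataset(range(264))
print(len(external), [(len(train), len(test)) for train, test in splits])
```

With 264 complexes this leaves 240 for modeling, and each split contains 191 training and 49 test complexes, matching the set sizes reported for the CG model.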

Page 17

k Nearest Neighbor (kNN) with Variable Selection

• Randomly select a subset of descriptors (a hypothetical descriptor pharmacophore)
• Leave out one complex from the training set and calculate the distance between the eliminated complex and all remaining complexes (in the original 100 descriptor space)
• Find the k nearest neighbors in the training set
• Predict the binding affinity of the eliminated complex by weighted kNN using the identified k nearest neighbors
• Repeat N times (once per left-out complex), then calculate the predictive ability (q2) of the model
• Repeat N times with new descriptor subsets, guided by SA (simulated annealing)
• Select acceptable models (with q2 > 0.6)
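The leave-one-out loop and the q2 statistic can be sketched as follows. This is a simplified illustration with several assumptions: distances are Euclidean over the descriptor columns listed in `selected` (passing every column index gives the full descriptor space), neighbors are weighted by inverse distance, and descriptor subsets are explored by plain random search rather than simulated annealing. The function names and parameters are hypothetical.

```python
import math
import random


def loo_q2(X, y, selected, k=3):
    """Leave-one-out q2 of a weighted kNN model that measures distance
    over the descriptor columns listed in `selected`."""
    def distance(a, b):
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in selected))

    predictions = []
    for i in range(len(X)):
        # Distances from the left-out complex i to every remaining complex.
        neighbours = sorted(
            (distance(X[i], X[m]), y[m]) for m in range(len(X)) if m != i
        )[:k]
        # Inverse-distance weights (an assumed weighting scheme).
        weights = [1.0 / (d + 1e-9) for d, _ in neighbours]
        predictions.append(
            sum(w * value for w, (_, value) in zip(weights, neighbours)) / sum(weights)
        )
    mean_y = sum(y) / len(y)
    press = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / total


def random_subset_search(X, y, n_selected, n_trials=200, k=3, seed=0):
    """Random search over descriptor subsets (a stand-in for SA)."""
    rng = random.Random(seed)
    best = (-math.inf, None)
    for _ in range(n_trials):
        subset = sorted(rng.sample(range(len(X[0])), n_selected))
        best = max(best, (loo_q2(X, y, subset, k), subset))
    return best


# Toy data: the first descriptor tracks affinity, the second is constant.
X = [[float(i), 0.0] for i in range(10)]
y = [float(i) for i in range(10)]
print(loo_q2(X, y, [0], k=2), loo_q2(X, y, [1], k=2))
```

On the toy data the informative descriptor gives a q2 above 0.9 while the constant descriptor gives a negative q2, which is the behavior the q2 > 0.6 acceptance threshold is meant to filter on.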

Page 18

[Figure: scatter plot of predicted pKi against actual pKi, both axes running from 0 to 12]

Correlation of Actual ~ Predicted Binding Affinity for 49 Test Set Complexes

Predictions were made with multiple models; this plot shows the best model. R2 is about 0.783 and RMSD is about 0.91 (I will let you know the equivalent binding energy).
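For orientation, an error in pKi units can be translated into a free-energy magnitude with the standard relation ΔG = RT ln(10) × pKi. The sketch below assumes a temperature of 298.15 K, which the slides do not state, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pki_to_kcal_per_mol(delta_pki, temperature=298.15):
    """Free-energy magnitude corresponding to a pKi difference."""
    return R_KCAL * temperature * math.log(10) * delta_pki


print(round(pki_to_kcal_per_mol(0.91), 2))  # 1.24
```

Under that temperature assumption, an RMSD of 0.91 pKi units corresponds to roughly 1.2 kcal/mol.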
Page 19

[Figure: scatter plot of predicted pKi against actual pKi, both axes running from 0 to 12]

Correlation of Actual ~ Predicted Binding Affinity for 24 Complexes with Best Model

Consensus prediction: the training and test sets are combined and used for consensus prediction of the 24 external complexes. R2 is about 0.70 and RMSD is about 0.89.

Page 20

Comparison of Current Scoring Functions

Method      Training set size   Test set size   Test set R2
BLEEP       351                 90              0.53
PMF         697                 77              0.61
SMoG96      120                 46              0.42
SMoG2001    725                 111             0.436
DT2001      319                 67              0.71
DT2002      319                 107             0.54
CG          191                 49              0.78

Page 21

Conclusions

• Novel geometrical chemical descriptors have been developed

• These simple yet fundamental descriptors can be used to predict binding affinity using correlation approaches and have high predictive power for diverse ligand-protein structures

• The statistical models can be used for fast and accurate scoring of complexes resulting from docking studies