Time-Efficient Flexible Superposition of Medium-sized Molecules Presented by Tamar Sharir (Lemmen & Lengauer)


Outline

• Definitions
• Goals in superposition of molecules
• Structure-activity relations
• Problem definition
• Assumptions and simplifications
• Biological background for the algorithm
• The main algorithm
• Modifications and improvements
• Results
• Summary

[Figure: receptor, ligand and receptor pocket]

What does it “look like”?

Definitions

Receptor - a protein molecule that gives a biological response upon uniting with chemically complementary molecules.

Ligand - a small organic molecule that forms a complex compound with the receptor.

Receptor pocket - the binding area (site).

Definitions (cont.)

Pharmacophore model - can be considered as the largest common denominator shared by a set of active molecules. It represents an abstract concept that accounts for the common molecular interaction capacities of a group of compounds towards their target structure.

[Figure: ligands L1 and L2 in the receptor, sharing a pharmacophore]

Areas of Interest

Pharmaceutical research - design molecules that interfere with specific biochemical pathways in living systems.

Drug design - develop small organic molecules with a high binding affinity towards a given receptor (competition).

So we have a receptor and we have a ligand. Where is the problem?

Structure-Activity Relationship

The 3D structure of the receptor would be enough, but it is not always available. In many cases, we only know a set of ligands together with their biological activities towards a receptor.

Structure-activity relationship studies (3D QSAR) aim to correlate measured activities with structure-based properties of the ligands.

What can we do with the results?

• Extract the relevant chemical features of the ligands
• Create a pharmacophore model
• Search for ligands with the same activity
• Provide an estimate of the binding affinity of a novel ligand towards a given receptor
• Take the negative imprint of the set of superimposed ligands as a crude description of the binding pocket (receptor modeling)

The Problem, Visualized

Problem Definition

Input: two molecules
• The reference ligand: rigid, given in its conformation inside the receptor pocket
• The test ligand: flexible, given in an arbitrary conformation

Output: the best structural alignment of the two molecules, computed within a short time ("best" = "highest score")

Overall Goal

Drastically reduce run time, while limiting the inaccuracies of the model and the computation to a tolerable level.

Existing Approaches

• Some methods need to be given the pharmacophore that displays the commonalities of both ligands
• Other methods treat both molecules as rigid
• Methods that handle molecular flexibility without extraneous knowledge of the commonalities of both ligands are rare, but in high demand

This method takes into account the molecular flexibility of the test ligand and needs no predefined information on the pharmacophore shared by the reference and test ligands.

Assumptions & Simplifications

1. Reference and test ligands occupy maximally overlapping areas in space

2. Reference and test ligands usually interact with the same functional group of the amino acids in the binding pocket

3. Only pairs of ligands are considered (no multiple superposition of several ligands)

4. The degrees of freedom are reduced to the torsional degrees of freedom of the test ligand

5. Atoms of the reference ligand are kept fixed in space.
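Assumption 4 means that the conformation of the test ligand changes only by rotating around rotatable bonds. The sketch below shows one such torsional move using Rodrigues' rotation formula. The function name `rotate_about_bond` and the choice of which atoms to rotate are illustrative assumptions, not details from the slides.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate_about_bond(points: List[Vec], axis_start: Vec, axis_end: Vec,
                      angle_deg: float) -> List[Vec]:
    """Rotate `points` by `angle_deg` about the bond axis_start -> axis_end.

    Rodrigues' formula: v' = v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t)),
    where k is the unit bond axis (hypothetical helper, for illustration only).
    """
    axis = _sub(axis_end, axis_start)
    norm = math.sqrt(sum(c * c for c in axis))
    k = (axis[0] / norm, axis[1] / norm, axis[2] / norm)
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    rotated = []
    for p in points:
        v = _sub(p, axis_start)                 # put the axis through the origin
        kxv = _cross(k, v)
        k_dot_v = k[0] * v[0] + k[1] * v[1] + k[2] * v[2]
        r = [v[i] * cos_t + kxv[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
             for i in range(3)]
        rotated.append((r[0] + axis_start[0], r[1] + axis_start[1], r[2] + axis_start[2]))
    return rotated
```

Pass in only the atoms on the moving side of the rotatable bond. The atoms on the other side, and the bond itself, stay fixed.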


Why do we allow these simplifications?

• Strong binding requires optimal space-filling of the binding pocket
• The run time is small enough to perform several runs:
  • with different conformations of the reference ligand
  • for pairwise comparisons among a larger set of ligands
• Runs can be performed independently and in parallel
• Existing methods can be used to refine the superposition
• The more rigid the molecules, the higher their binding affinity

How do we score?

We use physicochemical properties of the ligands not only for scoring, but also for generating the solutions.

The two main contributions to the score are:
1. paired intermolecular interactions
2. overlap volumes of physicochemical properties: van der Waals volume, electrostatic potential, hydrophobicity, and hydrogen-bonding donor and acceptor potentials

How do we score? (cont.)

The contributions to the scoring function are divided into two groups, called hard and soft criteria.

• The hard criteria can be used to generate placements and to reject unsatisfactory ones (example: a minimum threshold for the overlap volume serves as a criterion to reject unlikely placements)
• The soft criteria are used only for scoring, not for eliminating unlikely solutions (example: the scoring terms for the paired intermolecular interactions)
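The split between hard and soft criteria can be sketched as a filter followed by a ranking. In this sketch the `Placement` fields, the threshold `MIN_OVERLAP` and the weights are all hypothetical. The slides give no numbers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Placement:
    """A candidate placement (hypothetical fields, for illustration only)."""
    name: str
    overlap_volume: float                                   # overlap with the reference ligand
    pair_terms: List[float] = field(default_factory=list)   # paired-interaction contributions


MIN_OVERLAP = 50.0  # hypothetical hard threshold, not a value from the slides


def passes_hard_criteria(p: Placement) -> bool:
    """Hard criterion: reject placements whose overlap volume is too small."""
    return p.overlap_volume >= MIN_OVERLAP


def soft_score(p: Placement, w_overlap: float = 1.0, w_pairs: float = 1.0) -> float:
    """Soft criteria only add to the score; they never reject a placement."""
    return w_overlap * p.overlap_volume + w_pairs * sum(p.pair_terms)


def rank(placements: List[Placement]) -> List[Placement]:
    """Drop placements that fail the hard criteria, then sort the rest by score."""
    survivors = [p for p in placements if passes_hard_criteria(p)]
    return sorted(survivors, key=soft_score, reverse=True)
```

A placement that fails a hard criterion is discarded, however large its soft terms are.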

Paired Intermolecular Interactions

Interaction surfaces are defined for the functional groups of each ligand. They amount to sections of a spherical surface surrounding the functional group of interest. Each such interaction center is assigned a particular interaction type.

Intermolecular interactions with a potential receptor atom that are plausible for both ligands are paired and contribute a term to the overall score.

[Figure: reference ligand and test ligand (O, N and H atoms shown) with their interaction surfaces meeting at a hypothetical receptor site]


Sets of paired intermolecular interactions are called matches. To quantify the weight of a match, a scoring function is defined. Summing the contributions of all matches yields the match score.

[Figure: ligands L1 and L2 inside the receptor]
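Here is a minimal sketch of a match score. Each paired interaction is weighted by its interaction type and by how close the two interaction points lie. The type weights in `TYPE_WEIGHT`, the linear falloff and `TOLERANCE` are assumptions chosen for illustration. They are not the scoring function of the original method.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]
# One paired interaction: (type, point on reference surface, point on test surface).
Pair = Tuple[str, Vec, Vec]

TYPE_WEIGHT: Dict[str, float] = {"hbond": 1.0, "metal": 1.5, "aromatic": 0.5}  # hypothetical
TOLERANCE = 1.0  # hypothetical distance (in angstrom) at which a pair stops contributing


def pair_weight(pair: Pair) -> float:
    itype, p_ref, p_test = pair
    d = math.dist(p_ref, p_test)
    # Linear falloff: full weight when the two interaction points coincide.
    return TYPE_WEIGHT[itype] * max(0.0, 1.0 - d / TOLERANCE)


def match_score(matches: List[List[Pair]]) -> float:
    """Sum the weights of all matches; each match is a set of paired interactions."""
    return sum(pair_weight(pair) for match in matches for pair in match)
```
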

Overlap volumes of different chemical properties provide the major contributions to the binding affinity towards the receptor.

We assume that two ligands which achieve a similar binding affinity have similar chemical fingerprints inside the receptor pocket.

The scoring scheme also considers the physicochemical properties of both ligands.
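The slides do not say how the overlap volumes are computed. As an illustration, assume atoms are hard van der Waals spheres. The exact intersection volume of two spheres is then known in closed form. This sketch sums it over atom pairs of the two ligands. Other representations, such as Gaussian-smoothed atomic densities, are also common. The pairwise sum double-counts regions where atoms of the same ligand overlap each other.

```python
import math
from typing import List, Tuple

Atom = Tuple[float, float, float, float]  # (x, y, z, van der Waals radius)


def sphere_overlap(r1: float, r2: float, d: float) -> float:
    """Exact intersection volume of two spheres with radii r1, r2 at center distance d."""
    if d >= r1 + r2:
        return 0.0                                    # disjoint spheres
    if d <= abs(r1 - r2):
        r = min(r1, r2)
        return 4.0 / 3.0 * math.pi * r ** 3           # smaller sphere lies inside the larger
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2 * d * r2 - 3 * r2 * r2 + 2 * d * r1 + 6 * r1 * r2 - 3 * r1 * r1)
            / (12 * d))


def pairwise_overlap(ref: List[Atom], test: List[Atom]) -> float:
    """Approximate the ligand overlap volume as a sum of atom-pair sphere intersections."""
    total = 0.0
    for x1, y1, z1, r1 in ref:
        for x2, y2, z2, r2 in test:
            total += sphere_overlap(r1, r2, math.dist((x1, y1, z1), (x2, y2, z2)))
    return total
```
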

The Algorithm

Phase 1: Fragmentation and determination of a base fragment
Phase 2: Placement of the base fragment (onto the reference ligand)
Phase 3: Iterative incremental construction of the entire test ligand

[Figure: fragmented test ligand, base fragment, remaining fragments and reference ligand]
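Phase 3 can be pictured as a search that attaches one fragment at a time. The sketch below is a generic beam-style incremental construction. At each step it tries every candidate torsion angle for the next fragment and keeps only the `beam_width` best partial solutions. The slides do not say how many partial solutions the method keeps. The candidate angles and the `score` function are hypothetical inputs.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

# A partial solution is the tuple of torsion angles chosen so far.
Partial = Tuple[float, ...]


def incremental_construction(
    options_per_fragment: Sequence[Sequence[float]],
    score: Callable[[Partial], float],
    beam_width: int,
) -> Tuple[Partial, float]:
    """Attach fragments one by one, keeping the `beam_width` best partial solutions.

    options_per_fragment[i] lists candidate torsion angles for fragment i;
    score rates a partial solution (higher is better). Both are hypothetical.
    """
    beam: List[Partial] = [()]                # start from the placed base fragment
    for options in options_per_fragment:
        extended = [partial + (angle,) for partial in beam for angle in options]
        beam = heapq.nlargest(beam_width, extended, key=score)
    best = max(beam, key=score)
    return best, score(best)
```

A wider beam costs more time but is less likely to discard a partial solution that would turn out best once later fragments are attached.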

1. Placing the Base Fragment

1. Approximate the interaction surfaces by sets of points.
2. Search for nearly congruent triangles of such interaction points in both ligands.
3. Each pair of nearly congruent triangles determines a unique transformation that superimposes the triangle in one molecule onto the triangle in the other molecule.

This operation defines a possible placement of the fragment under consideration.
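One simple way to turn a triangle pair into a rigid transformation is shown below. Attach an orthonormal frame to each triangle, rotate one frame onto the other, and move centroid onto centroid. This maps exactly congruent triangles exactly. For nearly congruent ones, a least-squares fit such as the Kabsch algorithm would spread the error more evenly. The slides do not say which construction the method uses.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Mat = List[List[float]]  # 3x3, row-major


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)


def _frame(tri: Sequence[Vec]) -> List[Vec]:
    """Right-handed orthonormal frame (e1, e2, n) attached to triangle (p1, p2, p3)."""
    p1, p2, p3 = tri
    e1 = _unit(_sub(p2, p1))
    n = _unit(_cross(_sub(p2, p1), _sub(p3, p1)))
    return [e1, _cross(n, e1), n]


def _centroid(tri: Sequence[Vec]) -> Vec:
    return (sum(p[0] for p in tri) / 3.0, sum(p[1] for p in tri) / 3.0, sum(p[2] for p in tri) / 3.0)


def triangle_transform(test_tri: Sequence[Vec], ref_tri: Sequence[Vec]) -> Tuple[Mat, Vec, Vec]:
    """Return R, c_test, c_ref such that x -> R (x - c_test) + c_ref maps test_tri onto ref_tri."""
    ft, fr = _frame(test_tri), _frame(ref_tri)
    # R = sum_k fr[k] ft[k]^T sends each test frame axis onto the matching reference axis.
    rot = [[sum(fr[k][i] * ft[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    return rot, _centroid(test_tri), _centroid(ref_tri)


def apply_transform(rot: Mat, c_test: Vec, c_ref: Vec, x: Vec) -> Vec:
    v = _sub(x, c_test)
    return tuple(sum(rot[i][j] * v[j] for j in range(3)) + c_ref[i] for i in range(3))
```

Applying the same transformation to all atoms of the base fragment yields one candidate placement.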

1. Placing the Base Fragment (cont.)

(Data structures) In a preprocessing step, the triangles of the reference ligand are stored in a triangle hash table (RL-table).

A query to this table with a triangle from the test ligand (the query triangle) returns a list of all triangles in the reference ligand that are nearly congruent to it.

Each pair consisting of the query triangle and a triangle from this list defines one placement of the base fragment over the reference ligand.
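A sketch of how an RL-table could support "nearly congruent" queries follows. Corner types must match exactly. Side lengths are binned with a width equal to a tolerance. A query then inspects the triangle's own bin and the neighbouring bins, and checks each side explicitly. The class name, the tolerance and the binning scheme are assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Label = Tuple[int, int, int, float, float, float]  # (t_i, t_j, t_k, l_ij, l_jk, l_ki)
Key = Tuple[int, ...]


class TriangleHashTable:
    """Sketch of an RL-table: reference triangles hashed by types and binned side lengths."""

    def __init__(self, tolerance: float) -> None:
        self.tol = tolerance
        self.buckets: Dict[Key, List[Tuple[Label, int]]] = defaultdict(list)

    def _key(self, types: Sequence[int], lengths: Sequence[float]) -> Key:
        return tuple(types) + tuple(math.floor(l / self.tol) for l in lengths)

    def insert(self, label: Label, triangle_id: int) -> None:
        self.buckets[self._key(label[:3], label[3:])].append((label, triangle_id))

    def query(self, label: Label) -> List[int]:
        """Ids of reference triangles with equal types and every side within `tol`."""
        types, lengths = label[:3], label[3:]
        base = self._key(types, lengths)[3:]
        hits = []
        # A length within tol of l falls into l's bin or an adjacent one.
        for offsets in product((-1, 0, 1), repeat=3):
            key = tuple(types) + tuple(b + o for b, o in zip(base, offsets))
            for ref_label, tid in self.buckets.get(key, []):
                if all(abs(a - b) <= self.tol for a, b in zip(ref_label[3:], lengths)):
                    hits.append(tid)
        return hits
```
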

2. Clustering the Query Triangles

1. We label each query triangle by the types of its corners (t(p1), t(p2) and t(p3), the types of the interaction points p1, p2 and p3) and by the lengths of its sides (l(p1,p2), l(p2,p3) and l(p3,p1)).
2. To make this label unique, the entries of the label [t(pi), t(pj), t(pk), l(pi,pj), l(pj,pk), l(pk,pi)] are ordered such that t(pi) ≤ t(pj) ≤ t(pk).

The number of query triangles is below (# interaction points)^3.

Example: a triangle p1, p2, p3 with side lengths 5.0, 5.6 and 8.1, where t(p1) = t(p2). Ordering by type leaves two possible orderings: (p3, p2, p1) and (p3, p1, p2).
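The labelling rule can be sketched as follows. When corner types tie, as in the example, several orderings satisfy t(pi) ≤ t(pj) ≤ t(pk). This sketch breaks the tie by taking the lexicographically smallest candidate label, which is an assumption. Rounding the lengths to `ndigits` is also assumed, so that identical triangles produce identical labels.

```python
import math
from itertools import permutations
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def triangle_label(points: Sequence[Vec], types: Sequence[int], ndigits: int = 2) -> Tuple:
    """Label [t(pi), t(pj), t(pk), l(pi,pj), l(pj,pk), l(pk,pi)] with t(pi) <= t(pj) <= t(pk).

    Ties in type are broken by taking the smallest candidate label (an assumption).
    """
    candidates = []
    for i, j, k in permutations(range(3)):
        if types[i] <= types[j] <= types[k]:
            lengths = (math.dist(points[i], points[j]),
                       math.dist(points[j], points[k]),
                       math.dist(points[k], points[i]))
            candidates.append((types[i], types[j], types[k])
                              + tuple(round(l, ndigits) for l in lengths))
    return min(candidates)
```

With tied types, a smallest-label rule can still pick different orderings for two nearly congruent triangles. A hash query that tolerates small length differences may therefore need to try the alternative orderings as well.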

2. Clustering the Query Triangles (cont.)

1. All query triangles are compiled in a list (the TL-list), which is sorted lexicographically by triangle label. This yields contiguous segments of triangles with identical labels (L-segments).
2. Each triangle in the TL-list is queried against the RL-table. In fact, the query is performed only for the first triangle in each L-segment.
3. The triangles retrieved from the RL-table are mapped onto each triangle in the L-segment.
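The TL-list idea can be sketched with a sort and `itertools.groupby`. After sorting by label, identical labels are adjacent, so each L-segment costs a single lookup. The `lookup` callable stands in for an RL-table query and is a hypothetical parameter. The saving only materializes if labels are discretized so that equal triangles really share a label.

```python
from itertools import groupby
from typing import Callable, Dict, List, Tuple

Label = Tuple  # e.g. (t_i, t_j, t_k, l_ij, l_jk, l_ki)


def match_query_triangles(
    query: List[Tuple[Label, int]],
    lookup: Callable[[Label], List[int]],
) -> Tuple[Dict[int, List[int]], int]:
    """Map each query triangle id to the reference triangles retrieved for its label.

    `query` holds (label, triangle id) pairs; `lookup` stands in for an RL-table query.
    Returns the mapping and the number of lookups actually performed.
    """
    tl_list = sorted(query, key=lambda item: item[0])       # lexicographic by label
    matches: Dict[int, List[int]] = {}
    n_lookups = 0
    for label, segment in groupby(tl_list, key=lambda item: item[0]):  # one L-segment
        hits = lookup(label)            # queried once, for the first triangle of the segment
        n_lookups += 1
        for _, tid in segment:          # result reused for every triangle in the segment
            matches[tid] = hits
    return matches, n_lookups
```
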

2. Clustering the Query Triangles (cont.)

Normally, we produce between several hundred thousand and millions of triangle matches and, consequently, as many possible placements for the base fragment.

2. Clustering the Query Triangles (cont.)

So how do we reduce the number of candidate placements?

1. Reject matches for which the additional criterion for pairing interactions is missing.
2. Compute van der Waals overlap volumes to filter out unsatisfactory solutions.
3. Run an efficient on-line procedure to cluster similar placements.

3. On-Line Clustering of Placements

The first computed placement p0 is taken as the reference from then on. For every new placement pnew:

1. Determine the RMS deviation dnew of pnew from p0.
2. Check whether there is a cluster, represented by a placement p, that is similar to pnew.
   • YES: merge p and pnew.
   • NO: retain pnew as the representative of a new cluster.

3. On-Line Clustering of Placements (cont.)

• We sort all placements by their RMS distance d to p0.
• The sorted list is maintained as a leaf-chained search tree.
• The search for p is restricted to clusters whose RMS distance d to the reference p0 falls in the range [dnew - delta, dnew + delta]. If "similar" means an RMS distance of at most delta, nothing is missed: by the triangle inequality, RMS(p, pnew) ≤ delta implies |d - dnew| ≤ delta.
• In this tree, the placements within this range form a contiguous segment of the leaf chain.
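A minimal sketch of the on-line clustering follows. A sorted Python list searched with `bisect` stands in for the leaf-chained search tree. This keeps the contiguous-window lookup but makes insertion linear instead of logarithmic. The class name, the representation of a placement as transformed atom coordinates, and the merge policy are assumptions. The policy used here keeps the existing representative and counts members.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Placement = Sequence[Vec]  # transformed coordinates of the same atoms, in a fixed order


def rms(a: Placement, b: Placement) -> float:
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


class OnlineClusterer:
    """On-line clustering windowed by RMS distance to the first placement p0 (sketch)."""

    def __init__(self, delta: float) -> None:
        self.delta = delta
        self.p0: Placement = ()
        self.keys: List[float] = []        # d = RMS(representative, p0), kept sorted
        self.reps: List[Placement] = []    # cluster representatives, parallel to keys
        self.sizes: List[int] = []         # number of placements merged into each cluster

    def add(self, p_new: Placement) -> bool:
        """Return True if p_new was merged into an existing cluster."""
        if not self.reps:
            self.p0 = p_new                # the first placement becomes the reference
        d_new = rms(p_new, self.p0)
        lo = bisect.bisect_left(self.keys, d_new - self.delta)
        hi = bisect.bisect_right(self.keys, d_new + self.delta)
        for i in range(lo, hi):            # only this contiguous window can hold a match
            if rms(self.reps[i], p_new) <= self.delta:
                self.sizes[i] += 1         # merge: keep the existing representative
                return True
        pos = bisect.bisect_right(self.keys, d_new)
        self.keys.insert(pos, d_new)
        self.reps.insert(pos, p_new)
        self.sizes.insert(pos, 1)
        return False
```
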

So how do we know that the alignment we obtained is good?

Evaluation Method

Data sets:

How do we use the data sets?

Let's say we take a receptor R and ligands L1 and L2. From the data sets we know the structures of some receptor-ligand complexes. Let's assume we know the complex of receptor R with ligand L1 and the complex of receptor R with ligand L2.

We wish to find the relation (alignment) between L1 and L2.

By matching the R-L1 and R-L2 complexes, we obtain an alignment of L1 and L2.

Evaluation Method (cont.)

[Figure: the complexes R-L2 and R-L1 are superimposed via the receptor R, giving the real alignment derived from the data sets]

Evaluation Method (cont.)

[Figure: our result for L2 over L1; the RMS deviation from the real alignment gives the accuracy of the result]

RMS Results

• The quality of our results is measured by the RMS deviation of the predicted orientation and conformation of the test ligand from the measured one.
• The mean RMS deviation is below 2 Å, at about 1 Å.
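Presumably the standard RMS deviation is meant. It assumes the N atoms of the test ligand are matched one to one between the predicted and the measured structure. It also assumes both coordinate sets are expressed in the frame of the reference ligand, without refitting, so that errors in orientation and in conformation both count:

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{x}_i^{\mathrm{pred}} - \mathbf{x}_i^{\mathrm{meas}}\right\rVert^{2}}
```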

Run time Results

• The mean run time over all test cases is below 4 minutes per instance
• The time spent on placing the base fragment and on constructing the complex is about equal
• Only a minor fraction of the run time is spent on I/O and preprocessing

Result Example

Black: reference ligand
White: test ligand (computed by our algorithm)
Gray: the real result (from the data set)

Result Example

Receptor            Reference  Test     Run Time (mins)            Accuracy (Å)
                    ligand     ligand   (a)   (b)   (c)   (d)      (a)    (b)    (c)
Carboxypeptidase A  7cpa       1cbx     1     1:47  29    2:17     0.80   0.96   0.96
                    7cpa       2ctc     1     1:39  17    1:57     0.51   0.79   0.79
                    7cpa       3cpa     1     35    34    1:10     0.92   0.94   0.80
                    7cpa       6cpa     2     4:42  2:34  7:18     0.41   0.71   0.71

Disadvantages of Method

• Inaccuracy of the solutions
• Requires a rigid reference ligand in its bound conformation (not always known)
• Does not produce equally good results for large ligands

Advantages of Method

• Reasonable accuracy
• Quick superposition

Method Summary

Structural alignment of medium-sized organic molecules

For applications in 3D QSAR and in receptor modeling

Ligand flexibility is modeled by decomposing the test ligand into molecular fragments

Superimposes a base fragment of the test ligand onto the reference ligand and then attaches the remaining fragments of the test ligand in a step-by-step fashion

The run time on a single problem instance is a few minutes on a standard workstation
