1.16 Boltzmann Maps From Grand Canonical Monte Carlo Simulations - Overview

Boltzmann Fragment Maps From GC Monte Carlo Simulations: Hit Prioritization, Lead Optimization

MetaLeaps, LLC, January 2016


Definition: Boltzmann Maps

Distributions of chemical fragment or water binding configurations on the surface of macromolecules, with populations adhering to Boltzmann energy statistics
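
To make "populations adhering to Boltzmann energy statistics" concrete, here is a minimal sketch that converts configuration energies into relative populations, p_i ∝ exp(-E_i/RT). The function name, the example energies, and the 300 K default are assumptions made for illustration; they do not come from the presentation or from any particular simulation package.

```python
import math

# Gas constant in kcal/(mol*K); energies below are molar, in kcal/mol.
R_KCAL_PER_MOL_K = 0.0019872


def boltzmann_populations(energies_kcal_per_mol, temperature_k=300.0):
    """Return normalized Boltzmann populations for a list of energies."""
    if not energies_kcal_per_mol:
        raise ValueError("need at least one energy")
    beta = 1.0 / (R_KCAL_PER_MOL_K * temperature_k)
    # Shift by the minimum energy so the exponentials cannot overflow.
    e_min = min(energies_kcal_per_mol)
    weights = [math.exp(-beta * (e - e_min)) for e in energies_kcal_per_mol]
    partition = sum(weights)
    return [w / partition for w in weights]


# Hypothetical energies for three binding configurations of one fragment.
example_energies = [-5.0, -4.0, -4.0]
for energy, population in zip(example_energies, boltzmann_populations(example_energies)):
    print(f"E = {energy:5.1f} kcal/mol -> population {population:.3f}")
```

A 1 kcal/mol gap already shifts most of the population to the lower-energy configuration at room temperature, which is why a map built from such populations highlights a small number of dominant sites.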


[Figures: structured water on DNA; high-affinity fragment cluster on the EPO receptor; pyrimidine map for beta catenin]

Point of view

•  Boltzmann maps are useful for a variety of applications in drug discovery and molecular biology research

•  Adequate prediction of differences in binding free energy, between different binding sites and different fragment types, is essential for utility

•  Maps must be easy to use, or they won’t be used

[Figures: P53-MDM2 hot spots; differential binding between isoforms; non-obvious ideas for lead optimization, water penalty]

Boltzmann Maps from Grand Canonical Monte Carlo (GC/MC) Simulations

•  What are they?

•  Why are they important?

•  How are they used?


Thermodynamically-principled modeling of the configuration and energetics of water and chemical fragment binding, with better sampling

Common preconceived notions

•  This must be some form of docking

•  Calculated binding affinity is not predictive

•  Simulations, modeling tools are too complex to use


Molecular interactions are a statistical phenomenon

•  Boltzmann statistics required to accurately characterize ligand binding to proteins

•  Docking is scanning for binding poses that “stick” –  But no statistically-valid distributions of poses

•  Simulations (molecular dynamics, Monte Carlo) produce rigorous Boltzmann statistics –  But vary in the adequacy of sampling, practicality

GC/MC simulations are not docking! Docking sampling predicts binding poses but cannot, in principle, accurately predict binding affinity*

–  Warren et al., J. Med. Chem. 2006, 49, 5912-5931

*But docking is useful for some tasks (e.g. virtual screening)!

Calculating binding affinity is hard, so invert the problem

•  Conventional methods involve summing contributions from many configuration samples –  Adequate sampling of the binding configurations a key limitation

•  Many examples in the literature of successful predictions of binding free energy –  But the computations are ponderous, often of limited generality

In GC/MC, chemical potential* is imposed, and the configurations with a given binding affinity are discovered

–  See also the paper from the Essex group at U. Southampton regarding calculating binding affinity directly from GC/MC data**


*Average free energy per molecule

**“Water Sites, Networks, And Free Energies with Grand Canonical Monte Carlo”, G. A. Ross, M. S. Bodnarchuk, and J. W. Essex, JACS 2015 137 (47), 14930-14943
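
One textbook way to impose a chemical potential is the grand canonical insertion/deletion move in the Adams formulation, sketched below for a single species. The sketch assumes B = βμ + ln(V/Λ³), with ΔU the potential energy change of the trial move. The function names and arguments are hypothetical, and the presentation does not state which acceptance rules or biasing schemes its own platform uses.

```python
import math


def insertion_acceptance(b_param, beta, delta_u, n_before):
    """Acceptance probability for inserting one molecule.

    Standard form: min(1, exp(B - beta*dU) / (N + 1)), with N counted
    before the move and dU = U(N + 1) - U(N).
    """
    log_acc = b_param - beta * delta_u - math.log(n_before + 1)
    # Work with the logarithm so large B values cannot overflow exp().
    return 1.0 if log_acc >= 0.0 else math.exp(log_acc)


def deletion_acceptance(b_param, beta, delta_u, n_before):
    """Acceptance probability for deleting one molecule.

    Standard form: min(1, N * exp(-B - beta*dU)), with N counted
    before the move and dU = U(N - 1) - U(N).
    """
    if n_before == 0:
        return 0.0  # nothing to delete
    log_acc = math.log(n_before) - b_param - beta * delta_u
    return 1.0 if log_acc >= 0.0 else math.exp(log_acc)


# Hypothetical trial moves in reduced units (beta = 1).
print(insertion_acceptance(b_param=2.0, beta=1.0, delta_u=1.0, n_before=3))
print(deletion_acceptance(b_param=2.0, beta=1.0, delta_u=-1.0, n_before=4))
```

Lowering B (that is, lowering the imposed chemical potential) makes insertions harder and deletions easier, so only the more tightly bound molecules remain. This is the sense in which the chemical potential is imposed and the binding configurations are discovered.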

GC/MC efficiently provides information about fragment or water binding, ranked by affinity

•  Fragment maps –  Impose chemical potential (average FE/molecule) –  Find where fragments will bind on a protein –  Challenge is to adequately sample all possibilities

•  Predictive ranking of fragment binding –  Required for practical use of fragment maps –  Lowest chemical potential at which a fragment survives at a site is used for ranking –  To be predictive, must account for at least configurational entropy and ΔΔGs

•  Statistically-significant fragment distributions –  Finding appropriate geometries for linking –  Understand flexibility vs. affinity –  Complements single-pose X-ray co-crystals
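
The ranking rule in the list above (the lowest chemical potential at which a fragment survives at a site) can be sketched as a small bookkeeping step over an annealing trace. The trace format, the site labels, and the chemical potential values below are hypothetical; a real run would also need a criterion for what counts as "surviving", which the slides do not specify.

```python
def rank_sites_by_survival(trace):
    """Rank sites by the lowest chemical potential at which each is occupied.

    `trace` is a list of (chemical_potential, occupied_site_ids) pairs,
    one per annealing stage. Returns (site, lowest_mu) pairs sorted so
    that the site surviving to the lowest chemical potential comes first.
    """
    lowest_mu = {}
    for mu, occupied_sites in trace:
        for site in occupied_sites:
            if site not in lowest_mu or mu < lowest_mu[site]:
                lowest_mu[site] = mu
    return sorted(lowest_mu.items(), key=lambda item: item[1])


# Hypothetical annealing trace: chemical potential is stepped downward
# and fewer sites stay occupied at each stage.
example_trace = [
    (-2.0, {"site_A", "site_B", "site_C"}),
    (-6.0, {"site_A", "site_B"}),
    (-10.0, {"site_A"}),
]
for site, mu in rank_sites_by_survival(example_trace):
    print(site, mu)
```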

ΔG = ΔH - TΔS: the same ΔG can result from different ΔH and ΔS (lower ΔH with smaller ΔS, or greater ΔH with greater ΔS)
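
A short worked example of the relation ΔG = ΔH - TΔS shows how two different enthalpy/entropy combinations can give the same binding free energy. All numbers are hypothetical and chosen only so the arithmetic is easy to follow; units are kcal/mol for ΔH and ΔG and kcal/(mol·K) for ΔS.

```python
def binding_free_energy(delta_h, delta_s, temperature_k=300.0):
    """Return dG = dH - T*dS (kcal/mol, with dS in kcal/(mol*K))."""
    return delta_h - temperature_k * delta_s


# Lower (more negative) dH paired with a smaller (more negative) dS.
tight_pose = binding_free_energy(delta_h=-12.0, delta_s=-0.020)
# Greater (less negative) dH paired with a greater dS.
flexible_pose = binding_free_energy(delta_h=-9.0, delta_s=-0.010)

print(f"tight pose:    dG = {tight_pose:.2f} kcal/mol")
print(f"flexible pose: dG = {flexible_pose:.2f} kcal/mol")
```

Both cases come out at about -6 kcal/mol at 300 K, which is why a ranking that ignores configurational entropy can misorder fragments that differ mainly in flexibility.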

Protein flexibility is important but that doesn’t mean all degrees of freedom are treated the same

•  Protein flexibility can be factored into 4 spatial scales –  Lobes, loops, side-chains, protonation states

•  Fragment binding doesn’t generally impact lobes, loops –  Simulate multiple X-ray structures or MD consensus structures

•  Side-chain rotamer sampling is valuable (not done yet) –  Only in the binding site, important in only 10-20% of cases

•  Use co-crystal structures for chemotype substitution

•  Use MD simulations on larger assembled ligands to evaluate binding (QM/MD even better) –  Especially important for protein-protein/protein-DNA inhibitors –  Protein-ligand binding of larger ligands is an emergent phenomenon, not simply additive of component binding

Constrained fragment annealing (CFA) with bound waters to evaluate binding of assembled ligands


Anneal fragment subject to bond constraints to other fragments. Both can move, rotate. Ranks contribution of added fragment.

Useful for prioritizing chemotype substitutions.

Tightly-bound waters

[Figure: experimental pIC50 vs. predicted free energy; markers distinguish the top 5 compounds by IC50 and the top 5 compounds predicted by analysis]

In a blinded test with big pharma, correctly ranked 87% of predicted binding affinities using CFA

Side-chain conformation changes were an issue for some compounds

Ranking adequate for chemistry decisions
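
The slides do not say how the 87% ranking figure was computed. One common way to score a ranking, shown here only as an assumed illustration, is the fraction of compound pairs whose predicted order matches the experimental order. The function name and the data are hypothetical.

```python
from itertools import combinations


def pairwise_rank_accuracy(predicted_free_energy, experimental_pic50):
    """Fraction of compound pairs ordered the same way by both measures.

    A more negative predicted free energy is expected to go with a
    higher pIC50. Pairs tied in either measure are skipped.
    """
    concordant = 0
    total = 0
    pairs = zip(predicted_free_energy, experimental_pic50)
    for (fe_1, pic_1), (fe_2, pic_2) in combinations(pairs, 2):
        if fe_1 == fe_2 or pic_1 == pic_2:
            continue
        total += 1
        if (fe_1 < fe_2) == (pic_1 > pic_2):
            concordant += 1
    if total == 0:
        raise ValueError("no comparable pairs")
    return concordant / total


# Hypothetical data: four compounds, one pair out of order.
predicted = [-40.0, -30.0, -20.0, -10.0]
measured = [8.0, 7.0, 5.0, 6.0]
print(f"{pairwise_rank_accuracy(predicted, measured):.2f}")
```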

Summary of the SAMPL3 Challenge data hosted by OpenEye summer 2011*

[Figure: calculated affinity (kcal/mol) vs. experimental affinity (kcal/mol); series: no correction, solvation correction; reported fits: R² = 0.6651, R² = 0.7518; panels: FragMap, SFA**, CFA]

•  Solvation correction improves affinity prediction

•  CFA upgrades pose prediction

*Kulp III, J. L.; et al. J. Comput.-Aided Mol. Des., 2012, 26(5), 583-594.

**SFA = Single Fragment Annealing of rigid fragment combinations

Constrained Fragment Annealing: Improves predictability, range, and accuracy

•  P38 compound ranking using CFA

[Figure: -pIC50 vs. calculated relative free energy; R² = 0.86716]
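
The R² values quoted on these correlation plots are coefficients of determination for a straight-line fit. As a minimal sketch, the function below computes slope, intercept, and R² by ordinary least squares. The data points are hypothetical and are not the P38 compounds shown on the slide.

```python
def linear_fit_r2(x_values, y_values):
    """Ordinary least-squares line y = slope*x + intercept, plus R^2."""
    n = len(x_values)
    if n != len(y_values) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    syy = sum((y - mean_y) ** 2 for y in y_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-variable linear fit, R^2 is the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical calculated free energies (kcal/mol) and measured pIC50 values.
calculated = [-22.0, -18.0, -15.0, -9.0, -4.0]
measured = [8.6, 7.9, 7.1, 6.4, 5.2]
slope, intercept, r_squared = linear_fit_r2(calculated, measured)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```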

Water Modeling: Comparing GC Monte Carlo vs. Molecular Dynamics

[Figure panels: GC/MC + annealing; MD (Gromacs)]

Lowest-energy, multi-body configuration not found in 10 ns MD runs

Also not found without chemical potential annealing in GC/MC

GC/MC with simulated annealing of chemical potential is an efficient and accurate water free energy modeling technique.

Finds multi-body water configurations not found with other water mapping methods, especially important for nucleic acids

We believe we have the highest performance, most automated GC/MC simulation platform

•  Multi-variate (T, µ, V, P, σ, ε, ρ, ΔΔGs, ...), adaptive annealing schedules –  Tool box to solve difficult sampling problems –  Highly efficient, puts computation where it matters

•  Learned-bias sampling strategies –  Exponential acceleration of convergence –  Surface insertion bias, location bias, flip sampling, etc.

•  Multi-species GC/MC –  Waters, metal ions, multi-point ion models, co-factors, etc.

•  More accurate electrostatics –  Charge factoring, ion pairing, energy-based cutoffs, optimized partial charges

•  High performance ΔΔGs model –  Integrated with GC/MC sampling

•  Optimized for Intel architectures, clusters –  Multi-threading, SSE-optimized energy calculations

•  Comprehensive support for a broad range of macromolecule structures, small molecule chemistries, ions, co-factors, etc.

A wide diversity of targets have been modeled: The method, tools are general (~50,000 maps)

•  Kinases –  P38 (3 variants) –  Proprietary kinases (3 proteins) –  Ckit –  PhoQ Histidine kinase –  JAK2/JAK3 –  Mapkap-k2 (5 variants) –  cAbl (2 variants)

•  Proteases and hydrolytic enzymes –  Elastase: PPE, HNE serine proteases –  Peptide deformylase (2 variants) –  Renin –  T4 lysozyme –  Peptidyl t-RNA hydrolase –  Trypsin –  HEWL lysozyme

•  Nuclear Hormone Receptors –  ROR-alpha –  RAR-beta –  LXR

•  Transferases –  Accase –  Amino transferase –  Phenylethanolamine methyltransferase

•  Oxygenases/Reductases –  Dihydrofolate reductase (5 types) –  Cox1/Cox2 –  IDO –  CpI hydrogenase

•  Receptors –  EPO receptor –  NOGO –  GPCR

•  Macromolecular Interactions –  PCSK9 –  BPTI (trypsin proteinase inhibitor) –  Fcrn (peptide mimetic) –  Protein/DNA complex –  FABP4 –  P53/MDM2 –  β-catenin

•  Other classes –  Hsp90 –  PTP1B –  PARP –  NS5B RNA polymerase –  Arginase –  Keap1 –  M2 proton pump –  Copper pump –  RNA polymerase IV

Legend: results confirmed experimentally; in silico validation vs. known ligand binding

Infectious diseases

•  Malaria -  Dihydrofolate reductase (P. falciparum) -  Dihydrofolate reductase (P. vivax)

•  Bacterial -  Rec A -  Gyrase -  Mur pathway proteins -  D-Ala-D-ala ligase -  Alanine racemase

•  Tuberculosis -  Nad+ synthetase -  Malate synthase -  pantothenate synthetase -  isocitrate lyase

•  HIV -  GP41 -  HIV protease -  TAR RNA

•  Ebola -  Niemann-Pick C1 -  Tsg101

•  Alphavirus -  nsP2 protease


Representative GC/MC Boltzmann Map Applications in the literature

•  Guarnieri F, Mezei M. "Simulated annealing of chemical potential: A general procedure for locating bound waters. Application to the study of the differential hydration propensities of the major and minor grooves of DNA." J Am Chem Soc. 1996;118(35):8493-4.

•  Kulp III JL, Kulp Jr JL, Pompliano DL, Guarnieri F. "Diverse fragment clustering and water exclusion identify protein hot spots." J Am Chem Soc. 2011;133(28):10740-3.

•  Kulp III, JL, et al. “A fragment-based approach to the SAMPL3 Challenge”, J Comput Aided Mol Des, 2012; DOI 10.1007/s10822-012-9546-1.

•  Clark M, Guarnieri F, Shkurko I, Wiseman J. "Grand canonical Monte Carlo simulation of ligand-protein binding." J Chem Inf Model. 2006;46(1):231-42.

•  G. A. Ross, M. S. Bodnarchuk, and J. W. Essex, “Water Sites, Networks, And Free Energies with Grand Canonical Monte Carlo”, JACS 2015 137 (47), 14930-14943.

•  M. Vallée et al., “Pregnenolone Can Protect the Brain from Cannabis Intoxication”, Science 343, 94 (2014).

•  Marron, TJ et al., “Solvation studies of DMP323 and A76928 bound to HIV protease: Analysis of water sites using grand canonical Monte Carlo simulations”, Protein Science (1998), 7, 573-579.

•  Clark, M et al., “Fragment-Based Computation of Binding Free Energies by Systematic Sampling”, J. Chem. Inf. Model., 2009.

•  Berk, P et al., “Molecular Modeling and Functional Confirmation of a Predicted Fatty Acid Binding Site of Mitochondrial Aspartate Aminotransferase”, J. Mol. Biol. (2011) 412, 412–422.

•  Moore, W.R., Jr., “Maximizing discovery efficiency with a computationally driven fragment approach.” Curr Opin Drug Discov Devel, 2005. 8(3): p. 355-64.

•  Mezei, M., “Grand-canonical ensemble Monte Carlo study of dense liquid Lennard-Jones, soft spheres and water.” Mol. Phys., 1987. 61: p. 565-582.

•  Moffet, K et al., “Discovery of a novel class of non-ATP site DFG-out state p38 inhibitors utilizing computationally assisted virtual fragment-based drug design (vFBDD)”, Bioorganic & Med. Chem. Letters 21 (2011) 7155–7165

Prioritize fragment hits from screens to reduce the number of dead-end chemistry paths taken

•  Multiple binding sites (super-stoichiometric binding)
   –  Determine the highest affinity site
      •  Is it functionally appropriate?
   –  Docking/probing/X-ray can identify sites, but too many
      •  GC/MC does that and also enables ranking by affinity

•  Accessibility of functional groups
   –  Are functional groups oriented to allow extension?
      •  Modified without disrupting the binding (e.g. not buried)?
      •  Oriented towards other accessible, high-affinity hot spots or pockets not blocked by tightly-bound waters?
   –  Statistically-valid pose distributions from GC/MC provide answers

•  Robustness of chemistry progression opportunities
   –  Do other fragments bind in proximity to be linked?
   –  Search fragment maps to enumerate, rank possibilities


Fragment maps can be productively used in a fragment-based ligand engineering process

•  Similar to fragment screening methods¹ –  Abbott², Astex³, Evotec, Carmot, … –  Experimental methods, performance requirements are similar –  Achieve leads with high ligand efficiency (IC50/weight), novel, patentable chemistry

•  But orthogonal, complementary to them –  Experimental screening limited to high solubility, weak binders –  Computed fragment maps are more general, fine-grain, diverse –  Many fewer lead candidates are synthesized and tested

•  High productivity, success rate (3 design chemists) –  >20,000 fragment simulations per year, >4,000 QM calculations/yr. –  Several hundred ligand designs evaluated per month –  Designs translated into drug leads in all fully-funded programs

¹ “Fragment-based lead discovery grows up”, Nature Reviews/Drug Discovery, 2013 Jan, 12, 5-7.

² “A decade of fragment-based drug design: strategic advances”, Nature Reviews/Drug Disc., 2007 Mar, 6, 211-9.

³ “Experiences in fragment-based drug discovery”, Trends in Pharma. Sciences, 2012 May, 33, 224-32.

Designs translate into confirmed hits/leads

Target   Client      Novel designs   Fragment hits*    Ligand hits**    Lead               Status
                                     (IC50 < 25 µM)    (IC50 < 1 µM)    (IC50 < 100 nM)
enzyme   Client 1    ✓               ✓                 ✓                ✓                  In clinic
enzyme   Client 2    ✓               ✓                 ✓                ✓                  Ag. field test
enzyme   Client 3    ✓               ✓                 ✓                ✓                  X-ray confirmed
DHFR     Partner 1   ✓               ✓                 ✓                ✓                  Completed
HSP90    Partner 2   ✓               ✓                                                     Canceled
PARP     Internal    ✓               ✓                                                     Deprioritized
PTP1B    Internal    ✓               ✓                                                     Deprioritized
Renin    Internal    ✓               ✓                 ✓                ✓                  %F data
PCSK9    Internal    ✓               ✓                 ✓                                   Sold
RecA     Internal    ✓               ✓                                                     Waiting funding
NS5B     Collab.     ✓               ✓                 ✓                                   Cell data
PDF      Client      ✓               ✓                 ✓                ✓                  Patent
P38      Internal    ✓               ✓                 ✓                ✓                  Validation

7-20 compounds synthesized/project

17-40 compounds synthesized/project

*Fragment hits: 150-250 Da

**Drug-like, cell activity: 300-400 Da

Delivering on the promise to solve hard problems

•  Identify “hot spots” –  Key interactions for disrupting protein-protein interactions –  Allosteric modulation sites –  Binding site sub-pockets

•  Non-obvious ideas for optimizations that preserve potency, addressing the “SAR Paradox” –  Design for a range of physico-chemical properties for membrane penetration, bioavailability, etc. –  Avoid or strengthen patents

•  Exploiting differential fragment binding patterns between different protein structure variations –  Selectivity between isoforms –  Binding that is not sensitive to mutations –  Multi-targeting protein pathways, patient sub-populations

•  Difficult targets where screening has not yielded good leads –  E.g. protein-protein interactions, peptide mimetics


Examples

Renin, Peptide Deformylase, RecA


Goals in renin inhibitor lead optimization

•  Improve bioavailability (F%)

•  Generate new IP

•  Improve physical properties (cLogD)

•  Lower mol. wt. (<400 Da) while maintaining affinity –  i.e. better ligand efficiency

•  Make a limited number of compounds (< 35)

Renin project

•  Used 2IKO.pdb structure

•  Fragment maps used to: –  Discover novel scaffolds interacting with catalytic aspartates –  Identify sub-pocket binding site not previously exploited –  Determine more optimal linkage to heterocycles –  Improve ligand efficiency (LE), with lower mol. wt. and broader range of cLogD values

•  Round 1 –  15 compounds made and tested –  IC50 range achieved: 600 nM to 10 µM

•  Round 2 –  17 compounds made and tested –  IC50 range achieved: 40 nM to 250 nM

Inhibitor in the 2IKO co-crystal

Two chemists, 6 months, outsourced synthesis & testing

Predicted binding pose of the fragments* used in design (orange) compared to co-crystal ligand (green) in 2IKO showing the interaction of the indole NH with GLY.223 and the ether fragment penetrating S3sp

[Figure labels: ether fragment in S3sp site; GLY.223; novel heterocycle; new orientation; 2IKO ligand; new design]

*From Grand Canonical Monte Carlo simulations

Predicted binding pose of the fragments* used in design (orange) compared to co-crystal ligand (light green) in 2G1R showing the two different NH interactions with GLY.223 for the ligand and new design

[Figure labels: different linkage position; ether fragment; GLY.223; 2G1R ligand; new design; novel heterocycle]

*From Grand Canonical Monte Carlo simulations

Correlation plot of computed FE vs. IC50 shows rank order is predictive within experimental error

[Figure: FE Pred vs. pIC50 for renin compounds; x-axis: FE Pred (kcal/mol); y-axis: pIC50; R² = 0.77423; series: BioLeap scaffold 1, Literature, BioLeap scaffold 2]

Compounds on the same vertical line are literature values (red) and the retested value (blue)

pKa results demonstrate improved bioavailability at lower mol. wt. compared to literature compounds


Renin: What we have done using the technology


•  Identified 2 novel scaffolds, unique chemotypes with novel IP –  Even though renin has been broadly studied for a long time

•  Identified a previously underutilized interaction site on the protein –  Used to drive up affinity while maintaining high ligand efficiency

•  Found new head groups with 3X better affinity at the same mol. wt. –  Mol. wt. < 400 Da for all compounds tested – good drug potential

•  Shown pKa can be modified while maintaining affinity –  Tolerate a range of lipophilicity yet maintain a cLogD in the ideal 2-3 range

•  Modify properties that are key determinants for clinical candidates –  cLogD, mol. wt., polar surface area (PSA)

•  Making and testing only 32 compounds in < 6 months –  Highly productive prioritization of optimization opportunities

Comparison of various renin lead op efforts

Organization   Mol. Wt.   Ligand Efficiency*   F%     Est. # Chemist-years   Reference
Merck          610 Da     .23                  18%    20                     1
Merck          511 Da     .33                  41%    20                     2
Pfizer         521 Da     .26                  74%    30                     3
BI             635 Da     .30                  17%    30                     4
Roche          600 Da     .30                  <10%   60                     5
Vitae          508 Da     .25                  13%    8                      6
BioLeap        396 Da     .33                  58%    2                      7

References:

1. P. Lacombe et al., Bioorg. Med. Chem. Lett. 20 (2010) 5822–5826.

2. A. Chen et al., Bioorg. Med. Chem. Lett. 21 (2011) 3976–3981.

3. R. Saver, Anal. Biochem. 2007 Jan 1; 360(1): 30-40.

4. B. Simoneau et al., Bioorg. Med. Chem. 7 (1999) 489-508.

5. H. P. Märki et al., Il Farmaco 56 (2001) 21–27.

6. C. M. Tice et al., Bioorg. Med. Chem. Lett. 19 (2009) 3541–3545.

7. I. Cloudsdale, et al., Submitted for publication 2015, draft available upon request.

*LE = 1.4(-log(IC50)/N), where N = number of heavy atoms; higher is better
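
The ligand efficiency footnote, LE = 1.4(-log(IC50)/N), translates directly into a small helper. This sketch assumes the logarithm is base 10 and that IC50 is expressed in mol/L, as is usual for pIC50; the function name and the example compound (40 nM, 28 heavy atoms) are hypothetical and are not a compound from the table.

```python
import math


def ligand_efficiency(ic50_molar, heavy_atoms):
    """Return LE = 1.4 * (-log10(IC50)) / N, with IC50 in mol/L."""
    if ic50_molar <= 0.0:
        raise ValueError("IC50 must be positive")
    if heavy_atoms <= 0:
        raise ValueError("heavy atom count must be positive")
    return 1.4 * (-math.log10(ic50_molar)) / heavy_atoms


# Hypothetical compound: IC50 = 40 nM with 28 heavy atoms.
print(f"LE = {ligand_efficiency(40e-9, 28):.2f}")
```

Because N is in the denominator, removing heavy atoms while holding IC50 constant raises LE, which is the sense in which lower molecular weight at the same affinity counts as better ligand efficiency.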

Summary: Boltzmann fragment maps improve productivity in lead op

•  Identify often non-obvious chemistry ideas for scaffolds, linkages, chemotype substitutions or additions –  Find overlooked sub-pockets

•  Rank chemistry modifications by binding affinity –  Whether motivated from fragment maps or from chemist –  Quantitatively assess the impact of tightly-bound waters

•  Prioritize the combinatorial number of possibilities for optimization based on obvious and non-obvious ideas –  Make and test dramatically fewer compounds

•  Enable broader patents –  Using fragment maps to enumerate more possibilities


Peptide Deformylase – Predicting and Experimentally Confirming Water-Mediated Small Molecule Binding and Inhibition


Figure 3. a) The PDF binding site consists of the buried Fe (orange) and Glu133 and the exposed Glu42 and Arg97. SACP predicts a water triplet (b) that bridges Glu42 and Arg97. SACP predicts that the 2-hydroxamic indole (c) binds deeply in the pocket, H-bonding to Glu133, and has a higher affinity than the 3-substituted molecule, which is predicted to rotate out (d) of the pocket. SACP predicts that N-methylating the 3-substituted compound (e) is destabilizing, because the methyl group is floating in a vacuum. The N-isopropyl compound fills the pocket (f) created by the waters and thus is predicted to have higher affinity.

SACP = Simulated annealing of chemical potential

Unanticipated PDF results predicted

BioLeap Confidential

[Figure: PDF: Affinity vs Prediction; axes: pIC50 and FE Pred (kcal/mol); fit: y = 0.3478x - 0.2995, R² = 0.8388]

Non-obvious results easily predicted

Recent discoveries reveal that all antibiotics act through a final common pathway of DNA damage

•  RecA is directly involved in activating the SOS response and DNA repair

•  RecA mediates the abilities of many bacteria to overcome the DNA-damaging radicals induced by a range of antibiotics

•  RecA inhibitors are expected to have broad spectrum effects and no target-based toxicity

•  There is substantial experimental evidence that bacteria lose the ability to develop resistance to antibiotics or perform DNA repair operations after UV exposure if RecA is inhibited

RecA homo-oligomerizes onto single-stranded DNA, forming an activated nucleoprotein filament that induces SOS and can effect recombinational DNA repair

•  One residue, F217 on the homo-oligomer surface, when mutated to Y, causes a 250x increase in binding affinity

•  Determined that the interface can be inhibited by small molecules

Foti JJ, Devadoss B, Winkler JA, Collins JJ, Walker GC. “Oxidation of the guanine nucleotide pool underlies cell death by bactericidal antibiotics.” Science 336, 315–319 (2012).

RecA: DNA Repair Inhibitor, Antibacterial

RecA: ssDNA binding and oligomerization

RecA + ATP

RecA binds to ssDNA to trigger repair and recombination by SOS pathway

CryoEM of dozens of RecAs oligomerized on ssDNA (Micron, 24(3):309–324, 1993)

RecA involved in pathways of bacterial killing and of resistance to antibiotics

RecA knockout recovers up to 5 orders of magnitude efficacy of antibiotics.

Collins JJ et al. Cell 130, 797–810, 2007

F217Y RecA mutation enhances binding by 250x

•  Nine residues (A214-R222) are in the homooligomeric interface and five of the nine residues are identical in 64 RecA sequences

•  K216, F217, and R222 have been shown to be intolerant to most mutations.¹ Least tolerant is F217. Only a mutation to a tyrosine retains full RecA function.

•  Interestingly, the F217Y mutation results in a 250-fold increase in the interaction between RecA subunits.² Why?

¹ M.C. Skiba and K.L. Knight. J Biol Chem, 269(5):3823–3828, Feb 1994

² De Zutter JK, Forget AL, Logan KM, Knight KL (2001) Structure (Camb) 9: 47–55

QM calculations show a greater extent of charge delocalization in Y vs. F; K216 strongly polarizes the phenyl ring.

RecA: Hot-spot clustering identifies 3 biologically relevant sites

Cluster at RecA homo-oligomer interface

Multiple clusters in ATP binding site

Cluster in single stranded DNA binding site

Fragment maps reproduce conserved interactions at RecA-RecA interface

Three amino acids at the protein-protein interface universally conserved.

F217 K216 R222

Validation compound delays bacterial growth in a UV-irradiation-dependent manner

Predicted inhibitor: 2 orders of magnitude delay in growth

Predicted non-binder: no reduction in growth

[Figure: growth with (+) or without (-) inhibitor and with (+) or without (-) UV irradiation (induced DNA damage); time points 0 h, 3 h, 5 h]

•  2 compounds, MW ~250, known pharmaceutical class

•  Designed to block RecA oligomerization

•  Compound inhibits RecA in vitro with IC50 = 23 μM (SOS gene reporter assay)

•  Next set of compounds designed to link PPI site to ATP site
