Artificial Intelligence in Drug Design
Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden
ELRIG Research & Innovation April 2 2019
Drug Design
What to make next? How to make it?
De novo design
Multi-parameter scoring function
Retrosynthesis
What is different now?
Augmented design
Autonomous design
Automatic design
de novo molecular design
Synthesis prediction
Automation
Data generation
It takes two to tango
Artificial Intelligence Chemistry Automation
Science @AZ
Neural Networks & Deep Learning
• Neural Networks known for decades
• Inputs, Hidden Layers, Outputs
• Single layer NNs have been used in QSAR
modelling for years
• Recent Applications use more complex
networks such as
• Multi-layer Feed-Forward NNs
• Convolutional NNs
• biological image processing
• Auto-encoder NNs
• Recurrent NNs
• Recurrent NNs are trained using Maximum Likelihood
Estimation to maximize the likelihood
of the next character
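The maximum likelihood training mentioned above can be written as a negative log-likelihood over a tokenized sequence. The notation here is an assumption for illustration and does not come from the slides: θ denotes the network parameters and x_t the t-th token of a sequence of length T.

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log P_\theta\left(x_t \mid x_1, \dots, x_{t-1}\right)
```

Minimizing this loss is equivalent to maximizing the likelihood of each next token given the tokens that precede it.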
Why? Generation of Novel Compounds in the 10^60 Chemical Space!
Where's the impact?
• Use for de novo Molecular Design
• Scaffold Hopping
• Novelty
• Virtual Screening
• Library Design
10^60 vs. 10^10-10^12
Natural language generation and molecular structure generation
• Can we borrow concepts from natural language processing and
apply to SMILES description of molecular structures to generate
molecules?
• Conditional probability distributions given context
• $P(\text{green} \mid \text{is}, \text{grass}, \text{The})$
• $P(\text{O} \mid \text{=}, \text{C}, \text{C})$
The grass is ?
C C = ?
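To make the idea of a conditional distribution over the next token concrete, here is a minimal sketch that estimates P(next token | context) by simple counting over a toy corpus. This counting model is a stand-in chosen for brevity, not the recurrent network discussed in the talk, and the function name `next_token_probs` and the four-string corpus are hypothetical.

```python
from collections import Counter, defaultdict


def next_token_probs(sequences, context):
    """Estimate P(next token | context) by counting occurrences in tokenized sequences."""
    n = len(context)
    counts = defaultdict(Counter)
    for seq in sequences:
        # Slide a window of length n over the sequence and record what follows it.
        for i in range(len(seq) - n):
            counts[tuple(seq[i:i + n])][seq[i + n]] += 1
    following = counts[tuple(context)]
    total = sum(following.values())
    if total == 0:
        return {}
    return {token: count / total for token, count in following.items()}


# Toy corpus of character-tokenized SMILES strings (illustrative only).
corpus = [list("CC=O"), list("CC=C"), list("CC=O"), list("NC=O")]

# Distribution over the token that follows "C C =" in this corpus.
probs = next_token_probs(corpus, ["C", "C", "="])
```

A recurrent network plays the same role as the count table here, but it can generalize to contexts it has never seen verbatim.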
Recurrent Neural Network & Natural language generation
Tokenization of SMILES
• Tokenize combinations of characters like “Cl” or “[nH]”
• Represent the characters as one-hot vectors
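The two tokenization steps above can be sketched in a few lines. This is an illustrative sketch only: the regular expression covers bracket atoms, "Cl" and "Br" and then falls back to single characters, which is an assumption about a reasonable token set rather than the tokenizer used in the talk. The names `tokenize` and `one_hot` are hypothetical.

```python
import re

# Match multi-character tokens first: bracket atoms such as [nH], then Cl and Br,
# and finally any single remaining character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|.")


def tokenize(smiles):
    """Split a SMILES string into tokens, keeping multi-character tokens whole."""
    return TOKEN_PATTERN.findall(smiles)


def one_hot(tokens, vocab):
    """Encode each token as a list with a single 1 at the token's vocabulary index."""
    index = {token: i for i, token in enumerate(vocab)}
    return [[1 if index[token] == j else 0 for j in range(len(vocab))]
            for token in tokens]


tokens = tokenize("Clc1cc[nH]c1")
vocab = sorted(set(tokens))
vectors = one_hot(tokens, vocab)
```

In practice the vocabulary would be built once from the whole training set, so that every molecule is encoded against the same token indices.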
The generative process
Reinforcement learning
Learning from doing
Action Reward Update behaviour
Design molecule
Active?
Good DMPK?
Synthetically accessible?
Make more like this?
Make something else instead?
Agent
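The action, reward and update loop above can be illustrated with a toy policy-gradient (REINFORCE) agent choosing among a handful of discrete options. This is a deliberately simplified sketch, not the molecule-generating agent from the talk: the three "design choices", the reward function `toy_reward` and all hyperparameters are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_agent(reward_fn, n_actions, steps=500, lr=0.1, seed=0):
    """Run a REINFORCE loop on a softmax policy over n_actions discrete choices."""
    rng = random.Random(seed)
    logits = [0.0] * n_actions
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(n_actions), weights=probs)[0]  # Action
        reward = reward_fn(action)                                # Reward
        for i in range(n_actions):                                # Update behaviour
            # Gradient of log pi(action) with respect to logit i.
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)


def toy_reward(action):
    """Hypothetical reward: only design choice 2 passes all checks."""
    return 1.0 if action == 2 else 0.0


policy = train_agent(toy_reward, n_actions=3)
```

After training, the policy concentrates its probability on the rewarded choice, which is the "make more like this" behaviour described on the slide.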
AI live: Create Structures Similar to Celecoxib
• Key Message
• RNN generates
structures similar
to Celecoxib
• Rapid sampling!
• Average score
describes how
many learning
steps are required
to reach similar
compounds
Some misconceptions about de novo RNN generated molecules
“The molecules are not diverse”
“The molecules are not synthetically feasible”
Answer: The generated molecules follow the properties of the molecules used as prior
Segler et al ACS Central Sci. 2018, 4, 120-131; Ertl et al arXiv:1712.07449
Diversity Synthetic feasibility
“Cambrian explosion” of different DL based molecular de novo generation methods
PyTorch + RDKit + ChEMBL => anyone with a computer can contribute =>
Benchmarking is urgently needed
Which benchmarks? What are the relevant questions?
• Does the same algorithm work best for both scaffold hopping and lead series optimization?
• Which algorithm samples the underlying chemical space most completely?
• Which algorithm zooms most efficiently to the most interesting regions of chemical space?
• Which is the best way to describe molecules, strings or graphs?
Benchmarks published by the scientific community
• MOSES, Polykovskiy et al
  • https://arxiv.org/abs/1811.12823
  • Diversity and quality of generated molecules
• Arus-Pous et al
  • https://chemrxiv.org/articles/Exploring_the_GDB13_Chemical_Space_Using_Deep_Generative_Models/7172849
  • Complete sampling of the relevant chemical space
• Klambauer et al
  • J. Chem. Inf. Model. 2018, 58, 1736
  • Distribution between generated and real molecules
• GuacaMol, Brown et al
  • https://arxiv.org/abs/1811.09621
  • Efficient optimisation of a specific property
AI + Big Data Transforms Synthesis Prediction
Manual heuristic rule generation
Automatic rule generation using Machine Learning
Accelerated by Monte Carlo tree search
What is the impact?
• Improves synthesis success
• Future implementation in iLab
• Route suggestions for Medicinal Chemists
• Invention of “novel” reactions
Segler et al Nature 2018, 555, 604
Embedding Synthesis Prediction in Projects @AZ
• Build of Reaction Knowledge Base
[Diagram labels] Flat files, Reaxys, patent data, MedChem ELN, PharmSci ELN; AZ Reaction Connect (~20M reactions); AZ ChemistryConnect; iLAB; DMTA cycles (Design, Make, Test, Analyse); Drug Discovery Project
Synthesis Prediction from Discovery to Launch
Forward Synthesis Prediction
Synthetic Route Prediction
Synergy between ML and QM
Support high-throughput experimentation
Artificial Intelligence Guided Drug Design Platform
Generation of Novel Chemical Space
Reaction & Synthesis Prediction
iLAB
DMTA: Design, Make, Test, Analyse
Desirability function: Σ IC50, LogP, Novelty etc.
Iterations
Profiling
AI Design Platform
Fully Automated DMTA Cycle
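A desirability function that sums over potency, LogP and novelty can be sketched as a weighted average of per-property scores in [0, 1]. Everything specific here is an assumption for illustration: the linear ramp, the use of pIC50 rather than IC50, the property ranges and the weights are hypothetical and are not taken from the platform described in the talk.

```python
def linear_desirability(value, worst, best):
    """Map a property value to [0, 1]: 0 at `worst`, 1 at `best`, linear in between.

    `worst` may be larger than `best` for properties where lower values are preferred.
    """
    if worst == best:
        raise ValueError("worst and best must differ")
    d = (value - worst) / (best - worst)
    return min(1.0, max(0.0, d))


def desirability(properties, spec):
    """Weighted average of per-property desirabilities.

    `spec` maps a property name to a (worst, best, weight) tuple.
    """
    total_weight = sum(weight for _, _, weight in spec.values())
    score = 0.0
    for name, (worst, best, weight) in spec.items():
        score += weight * linear_desirability(properties[name], worst, best)
    return score / total_weight


# Hypothetical specification: potency counts double, lower LogP is preferred.
spec = {
    "pIC50": (5.0, 8.0, 2.0),
    "logP": (5.0, 3.0, 1.0),
    "novelty": (0.0, 1.0, 1.0),
}

molecule = {"pIC50": 6.5, "logP": 4.0, "novelty": 1.0}
score = desirability(molecule, spec)
```

A weighted average is only one option; a geometric mean, for example, would penalize a molecule more strongly when any single property scores close to zero.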
2018 Proof-of-Principle Pilot Study
1st iteration: Novelty
2nd iteration: Novelty
3rd iteration: Expansion
4th iteration: iLab library
~2 months between successive iterations
Constant re-learning and training
RIA
• Novelty key goal
• Crowded IP space
• Lots of available data
• Selectivity
• New promising series
identified
Oncology
• Selectivity key goal
• Novelty
• Several promising
series identified
CVRM
• Optimising HI series
• Tool compound
Lessons from pilot study
• It works!
• Novel scaffolds were identified in crowded chemical space
• Compound series could be efficiently optimised
• Affinity and ADME predictions are still bottlenecks
• Too many ideas might make prioritization for synthesis challenging
• Chemistry resources need to be frontloaded
• Optimisation under constraints might lead to molecules that are difficult to synthesize
• Synergize with automation
How can we improve affinity prediction?
• Better Machine Learning Models
  • Access to more data (for instance IMI2 Call 14 Topic 3)
  • Experimental descriptors
  • Graph convolution, include protein-based information
  • Multi-task modelling
  • Matrix factorization with side information
• Free energy calculations
  • Progress in speed
  • Combine with machine learning
• Confidence estimation
  • Conformal prediction
  • Bayesian methods
• Benchmarking
  • Public Chemogenomics set available (Excape-DB, Pidgin)
  • Blind competitions (SAMPL, D3R)
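Conformal prediction, listed above under confidence estimation, can be illustrated with the split (inductive) variant for regression: residuals on a held-out calibration set determine an interval width with a chosen coverage level. This is a minimal sketch under the usual exchangeability assumption; the function names and the residual values are hypothetical.

```python
import math


def conformal_half_width(calibration_residuals, alpha):
    """Return q such that prediction +/- q covers the true value with probability >= 1 - alpha.

    Uses the split conformal quantile of absolute calibration residuals.
    """
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for the requested coverage.
        return math.inf
    return scores[k - 1]


def prediction_interval(prediction, half_width):
    """Symmetric interval around a point prediction."""
    return (prediction - half_width, prediction + half_width)


# Hypothetical residuals (observed minus predicted affinity) on a calibration set.
residuals = [0.1, -0.3, 0.2, -0.5, 0.4, 0.05, -0.15, 0.25, -0.35]

q = conformal_half_width(residuals, alpha=0.2)
low, high = prediction_interval(6.0, q)
```

The interval width here is the same for every compound; normalized variants scale it by a per-compound difficulty estimate.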
Deep learning has struggled to improve bioactivity predictions significantly
Ramsundar J. Chem. Inf. Model. 2017, 57, 2068
Tong et al https://www.biorxiv.org/content/10.1101/473074v1
Still potential due to its flexibility
Experimental descriptors can improve predictions
HTS
Imaging
Transcriptomics
Simm et al, bioRxiv; later published in Cell Chem. Biol. 2018, 25, 611.
Uncertainty estimation is key
• Platt scaling
• Isotonic regression
• Venn-ABERS predictor
Calibrated probabilities
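Of the calibration methods listed above, isotonic regression is the easiest to show in a few lines: the pool-adjacent-violators algorithm fits a non-decreasing step function from raw model scores to observed outcome frequencies. The sketch below is illustrative, with hypothetical function names and toy data; in practice a library implementation such as scikit-learn's `IsotonicRegression` would normally be used.

```python
def isotonic_fit(scores, labels):
    """Fit a non-decreasing step function with pool-adjacent-violators.

    Returns a list of (upper_score, calibrated_probability) pairs, sorted by score.
    """
    blocks = []  # each block holds [sum_of_labels, count, highest_score_in_block]
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1, score])
        # Merge backwards while the previous block's mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            label_sum, count, top = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
            blocks[-1][2] = top
    return [(top, label_sum / count) for label_sum, count, top in blocks]


def calibrate(score, fitted):
    """Look up the calibrated probability for a raw score."""
    for top, probability in fitted:
        if score <= top:
            return probability
    return fitted[-1][1]


# Toy data: raw classifier scores and whether the compound was actually active.
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
active = [0, 1, 0, 0, 1, 1]

fitted = isotonic_fit(raw_scores, active)
p = calibrate(0.3, fitted)
```

The fit should be made on data not used to train the underlying model; otherwise the calibrated probabilities tend to be overconfident.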
We are still only at the beginning
Formalize Chemical Intuition
Learn from Diverse Data
Extend to new Modalities
Explore New Deep Learning Architectures
Protein Structure Prediction
Automated Machine Learning
What is the vision?
Augmented
Design
• AI as idea generator
• AI generates and scores
molecules
• All decisions by chemist
based on AI
• Synthesis route design by
chemist
Autonomous
Design
• AI as designer
• Chemist defines,
monitors and updates
the desirability
function
Automated
Design
• AI as designer
• AI monitors and
updates desirability
function when
needed
Tool compounds on “tap”
Green, Engkvist & Pairaudeau Fut. Med. Chem. 2018
Will ML/AI revolutionize drug design?
My personal opinion(s)
• Only time will tell….
• The last commonly agreed revolution was the introduction of DMPK
departments in the 90s, so the bar is high
• ML/AI, like other promising technologies (for instance PROTACs), warrants
further investment
• More data, automation and the ability to learn make ML/AI bound to have a
larger impact on drug design in the future
• During my 19 years in industry it has never been as exciting to work with in
silico drug design
Acknowledgements
Discovery Sciences Molecular AI Team
Thierry Kogej
Hongming Chen
Isabella Feierberg
Atanas Patronov
Esben Jannik Bjerrum
Preeti Iyer
Jiangming Sun (Postdoc 2015-2017)
Noe Sturm (Postdoc 2017-2018)
Philipp Buerger (Postdoc 2017-2020)
Jiazhen He (Postdoc 2019-2022)
Rocio Mercado (Postdoc 2018-2021)
Thomas Blaschke (PhD student 2017-2018)
Josep Arus Pous (PhD student 2018-2019)
Michael Withnall (PhD student 2018-2019)
Oliver Laufkötter (PhD student 2018-2019)
Laurent David (PhD student 2018-2019)
Ave Kuusk (PhD student 2016-2019)
Marcus Olivecrona (AZ GradProgram 2017)
Alexander Aivazidis (AZ GradProgram 2018)
Dhanushka Weerakoon (AZ GradProgram 2018-2019)
Panagiotis-Christos Kotsias (AZ AI GradProgram 2018-2019)
Edvard Lindelöf (Master Thesis Student 2018-2019)
Simon Johansson (Master Thesis Student 2019)
Oleksii Prykhodko (Master Thesis Student 2019)
Academic Collaborators
Marwin Segler (Münster)
Juergen Bajorath (Bonn)
Jean-Louis Reymond (Bern)
Andreas Bender (Cambridge)
Sepp Hochreiter (Linz)
Gunther Klambauer (Linz)
Sami Kaski (Helsinki)
Discovery Sciences
Garry Pairaudeau
Clive Green
Lars Carlsson
Nidhal Selmi
DSM AI Team
Ernst Ahlberg
Suzanne Winiwarter
Ioana Oprisiu
Ruben Buendia (Postdoc 2018)
PharmSci
Per-Ola Norrby
2018 PoP Pilot Study
Werngard Czechtizky
Ina Terstiege
Christian Tyrchan
Anders Johansson
Jonas Boström
Kun Song
Alex Hird
Neil Grimster
Richard Ward
Jeff Johannes
Confidentiality Notice
This file is private and may contain confidential and proprietary information. If you have received this file in error, please notify us and remove
it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the
contents of this file is not permitted and may be unlawful. AstraZeneca PLC, 1 Francis Crick Avenue, Cambridge Biomedical Campus,
Cambridge, CB2 0AA, UK, T: +44(0)203 749 5000, www.astrazeneca.com