
Junction Tree Variational Autoencoder for Molecular Graph Generation

Wengong Jin, Regina Barzilay and Tommi Jaakkola

Presentation adapted from slides by: Wengong Jin Presenter: Yevgeny Tkach

2019 Spring @ https://qdata.github.io/deep2Read/

Executive Summary

• Molecule generation using a VAE. Encoding and decoding are based on a spatial graph message passing algorithm.

• Instead of generating the molecule node by node, which can be viewed as "character level" generation, this work builds a higher-level vocabulary based on a tree decomposition of the molecule graph.

• Using proper "words/parts of speech" helps ensure that the final molecule is valid.

Drug Discovery

[Figure: a library of candidate molecules drawn as 2D structures, each annotated with a predicted potency score (from 5.30 at the top down to about 3.50).]

Generate molecules with high potency

Drug Discovery

Modify molecules to increase potency

[Figure: rows of molecule modifications, each molecule annotated with its property score (-2.05, 6.51, 6.69, 6.80; -2.21, 4.52, 4.94, 5.69; -1.92, 2.69, 3.03, 4.00).]

Molecular Variational Autoencoder

Encoder → latent space → Decoder, with a potency prediction model on top of the latent space.

• Bayesian optimization over the latent space: find the "best" drugs.

• Gradient ascent over the latent space: make "better" drugs.

[1] Gomez-Bombarelli et al., Automatic chemical design using a data-driven continuous representation of molecules, 2016

How to generate graphs?

Node by Node

[Figure: building a molecule atom by atom requires many steps and passes through invalid intermediate states before reaching a valid molecule.]

• Not every graph is chemically valid.

• Invalid intermediate states are hard to validate.

• Very long generation sequences are difficult to train (Li et al., 2018).

[2] Li et al., Learning Deep Generative Models of Graphs, 2018

Functional Groups

[Figure: examples of functional groups and aromatic rings used as building blocks.]

How to generate graphs?

[Figure: node-by-node generation needs many more steps and passes through invalid intermediates, whereas group-by-group generation keeps every intermediate valid.]

Group by Group

• Shorter action sequence

• Easy to check validity

Tree Decomposition

[Figure: a molecule and its junction tree; each tree node is a cluster (ring, bond, functional group) carrying a label from the cluster vocabulary.]

• Generate the junction tree, then generate the graph group by group.

• Vocabulary size: fewer than 800 clusters given 250K molecules (a simplified decomposition sketch follows).
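To make the decomposition concrete, here is a simplified sketch (assuming RDKit is installed) that builds clusters from rings and non-ring bonds and connects clusters that share atoms; the authors' actual implementation additionally merges bridged rings, adds intersection clusters, and takes a spanning tree of the cluster graph.

```python
# Simplified tree decomposition sketch (assumes RDKit is installed).
# Clusters = rings + non-ring bonds; edges connect clusters sharing atoms.
from itertools import combinations
from rdkit import Chem

def tree_decomp_sketch(smiles):
    mol = Chem.MolFromSmiles(smiles)
    # Ring clusters: one cluster per ring (the real method also merges
    # rings that share more than two atoms).
    clusters = [set(ring) for ring in mol.GetRingInfo().AtomRings()]
    # Bond clusters: every bond that is not inside a ring becomes a cluster.
    for bond in mol.GetBonds():
        if not bond.IsInRing():
            clusters.append({bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()})
    # Cluster-graph edges: clusters sharing at least one atom are neighbors
    # (the real method then takes a spanning tree of this cluster graph).
    edges = [(i, j) for i, j in combinations(range(len(clusters)), 2)
             if clusters[i] & clusters[j]]
    return clusters, edges

# Hypothetical example molecule: a benzene ring, an amide, and a cyclopropane.
clusters, edges = tree_decomp_sketch("c1ccccc1C(=O)NC2CC2")
print(len(clusters), "clusters,", len(edges), "cluster-graph edges")
```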

Our Approach

[Figure: the molecular graph G and its junction tree T (obtained by tree decomposition into clusters C_i, C_j) are each encoded into latent vectors z_G and z_T and then decoded back, first into a tree and then into a molecule.]

Graph & Tree Encoder

Neural Message Passing Network (MPN) [3]

[Figure: starting from its node features, each node aggregates information from its 1-hop, then 2-hop, ... neighborhood of the graph.]

Messages are attached to directed edges: the message ν_uv^(t) from node u to node v at iteration t is computed from the node feature of u, the edge feature of (u, v), and the incoming messages ν_wu^(t-1) from the other neighbors w of u. After the final iteration, each node u aggregates its node feature and incoming messages into a hidden vector h_u.

[3] Dai et al., Discriminative embeddings of latent variable models for structured data, 2016
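A minimal PyTorch sketch of this edge-message passing scheme is below; the tensor layout, single-matrix parameterization, and ReLU nonlinearity are assumptions for illustration and do not reproduce the paper's exact equations.

```python
import torch
import torch.nn as nn

class MPNSketch(nn.Module):
    """Minimal edge-message passing sketch (a simplification of the MPN above)."""
    def __init__(self, node_dim, edge_dim, hidden_dim, iters=3):
        super().__init__()
        self.W_node = nn.Linear(node_dim, hidden_dim)
        self.W_edge = nn.Linear(edge_dim, hidden_dim)
        self.W_msg = nn.Linear(hidden_dim, hidden_dim)
        self.readout = nn.Linear(node_dim + hidden_dim, hidden_dim)
        self.iters = iters

    def forward(self, x, edge_attr, edge_index):
        # x: [num_nodes, node_dim], edge_attr: [num_edges, edge_dim]
        # edge_index: [2, num_edges], directed edges u -> v
        src, dst = edge_index
        msg = torch.zeros(edge_attr.size(0), self.W_msg.out_features)
        for _ in range(self.iters):
            # sum of messages currently flowing into each node
            incoming = torch.zeros(x.size(0), msg.size(1)).index_add_(0, dst, msg)
            # new message u -> v: node feature of u, edge feature of (u, v),
            # and messages into u (the real model excludes the reverse message v -> u)
            msg = torch.relu(self.W_node(x[src]) + self.W_edge(edge_attr)
                             + self.W_msg(incoming[src]))
        incoming = torch.zeros(x.size(0), msg.size(1)).index_add_(0, dst, msg)
        # h_u aggregates the node feature and its final incoming messages
        return torch.relu(self.readout(torch.cat([x, incoming], dim=1)))
```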

Tree Encoding

[Figure: the same message passing scheme runs on the junction tree: the message m_ij from cluster i to cluster j aggregates the incoming messages m_ki from the other neighbors k of i.]

Message passing over the tree captures long-range interactions between clusters.

Graph & Tree Encoder

The graph code z_G is obtained by average-pooling the node vectors of the graph; the tree code z_T is read out from the root node of the tree.
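As a concrete (assumed) picture of how a pooled graph vector or the tree root vector turns into a latent code, here is a standard VAE readout with the reparameterization trick; layer names and shapes are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LatentReadout(nn.Module):
    """Turn pooled node vectors into a sampled latent code (z_G or z_T)."""
    def __init__(self, hidden_dim, latent_dim):
        super().__init__()
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, node_vectors, root_index=None):
        # z_G: average-pool all node vectors; z_T: take the root node's vector.
        pooled = (node_vectors[root_index] if root_index is not None
                  else node_vectors.mean(dim=0))
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return z, kl
```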

Tree Decoder

[Figure: starting from z_T, the junction tree is generated node by node in depth-first order (nodes numbered 1-8).]

At every step the decoder makes two predictions, conditioned on z_T and on the message vectors of the partially built tree:

1. Topological Prediction: whether to expand a new child or backtrack to the parent.

2. Label Prediction: what the cluster label of a newly created node is.

[4] Alvarez-Melis & Jaakkola, Tree-structured decoding with doubly-recurrent neural networks

Tree Decoder

Messages along the partially generated tree are computed with a GRU:

h_ij = GRU(x_i, {h_ki : k ∈ N(i) \ j})

where x_i is the feature (label) vector of node i and the h_ki are the incoming messages from i's other neighbors. The message h_ij encodes the entire subtree behind the current state; together with z_T it is fed to feedforward networks that make the topological and label predictions.
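The decoding loop can be summarized as the following greedy, depth-first sketch; topological_net, label_net and gru_cell are hypothetical stand-ins for the paper's networks, and the real decoder tracks per-edge messages and uses teacher forcing during training.

```python
import torch

def decode_tree_sketch(z_T, topological_net, label_net, gru_cell, max_nodes=50):
    """Greedy DFS decoding of a junction tree from the tree latent code z_T."""
    h = torch.zeros_like(z_T)                      # running message/state vector
    root = (0, int(label_net(z_T, h).argmax()))    # predict the root cluster label
    tree, stack = {root: []}, [root]
    while stack and len(tree) < max_nodes:
        parent = stack[-1]
        # Topological prediction: expand a new child or backtrack?
        if torch.sigmoid(topological_net(z_T, h)).item() > 0.5:
            child = (len(tree), int(label_net(z_T, h).argmax()))  # label prediction
            tree[parent].append(child)
            tree[child] = []
            stack.append(child)                    # descend into the new child
        else:
            stack.pop()                            # backtrack toward the root
        h = gru_cell(z_T, h)                       # update the message state (simplified)
    return tree
```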

Graph Decoder

The predicted junction tree is assembled back into a molecular graph:

1. Enumerate how neighboring clusters can be merged together, giving candidate subgraphs G_i.

2. Encode each candidate subgraph with the graph encoder.

3. Score each candidate against the latent graph vector: f^a(G_i) = h_{G_i} · z_G
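A hedged sketch of this scoring step: each enumerated candidate attachment G_i is re-encoded by the graph encoder and scored by a dot product with z_G; graph_encoder and the candidate list are placeholders, not the repository's interfaces.

```python
import torch
import torch.nn.functional as F

def score_candidates(candidates, graph_encoder, z_G):
    """Score enumerated candidate subgraphs G_i with f(G_i) = h_{G_i} . z_G."""
    # candidates: list of candidate attachment subgraphs produced by an
    # enumeration routine (placeholder, not shown here).
    h = torch.stack([graph_encoder(G_i) for G_i in candidates])  # [num_candidates, d]
    scores = h @ z_G                                             # dot-product scores
    probs = F.softmax(scores, dim=0)          # training: cross-entropy on the true candidate
    best = candidates[int(scores.argmax())]   # decoding: keep the highest-scoring attachment
    return best, probs
```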

Training? VAE?

• The KL divergence term on the latent space is not discussed in the paper.

• z_G is only used for ranking the generated subgraphs, so it is not clear how it fits into the VAE paradigm.

• From the code, training uses KL annealing, following "Generating Sentences from a Continuous Space" by Bowman et al. (a minimal schedule is sketched below).
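A minimal sketch of such a KL annealing schedule; the linear shape and constants are assumptions, not the repository's exact settings.

```python
def kl_weight(step, warmup_steps=40000, beta_max=1.0):
    """Linearly anneal the KL weight from 0 to beta_max over warmup_steps."""
    return beta_max * min(1.0, step / warmup_steps)

# Inside a training loop the loss would combine the terms as:
#   loss = reconstruction_loss + kl_weight(step) * kl_divergence
```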

Experiments

• Data: 250K compounds from the ZINC dataset

• Molecule Generation: How many molecules are valid when sampled from the Gaussian prior?

• Molecule Optimization

• Global: Find the best molecule in the entire latent space.

• Local: Modify a molecule to increase its potency

Baselines

SMILES string based:

1. Grammar VAE (GVAE) (Kusner et al., 2017);

2. Syntax-directed VAE (SD-VAE) (Dai et al., 2018)

Graph based:

1. Graph VAE (Simonovsky & Komodakis, 2018)

2. DeepGMG (Li et al., 2018)

[2] Li et al., Learning Deep Generative Models of Graphs, 2018
[5] Kusner et al., Grammar Variational Autoencoder, 2017
[6] Dai et al., Syntax-directed Variational Autoencoder for structured data, 2018
[7] Simonovsky & Komodakis, GraphVAE: Towards generation of small graphs using variational autoencoders

Molecule Generation (Validity)

Percentage of valid molecules when sampling from the Gaussian prior:

GVAE: 7.2
GraphVAE: 13.5
SD-VAE: 43.5
DeepGMG: 89.2
Ours (w/o validity checking): 93.5
Ours: 100
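Validity here means the decoded output corresponds to a molecule that a chemistry toolkit can sanitize; a rough check with RDKit (assumed to be available) looks like this:

```python
from rdkit import Chem

def fraction_valid(smiles_list):
    """Fraction of generated SMILES that RDKit can parse and sanitize."""
    valid = sum(Chem.MolFromSmiles(s) is not None for s in smiles_list)
    return valid / max(1, len(smiles_list))

# The second SMILES has a 5-valent carbon, so RDKit rejects it -> 0.5
print(fraction_valid(["c1ccccc1", "C(C)(C)(C)(C)C"]))
```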

Sampled Molecules

[Figure: a grid of molecules sampled from the prior and decoded by the model.]

Molecule Optimization (Global)

Setup: encoder → latent space → decoder, with a Gaussian process as the property predictor and Bayesian optimization over the latent space.

Property: Solubility + Ease of Synthesis. Property score of the best molecule found:

CVAE: 1.98
GVAE: 2.94
SD-VAE: 4.04
Ours: 5.30

The top three molecules found by our model score 5.30, 4.93 and 4.49.
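A hedged sketch of one Bayesian optimization step over the latent space, using a Gaussian process surrogate and expected improvement; the scikit-learn surrogate, the random candidate pool, and the encode/decode placeholders are assumptions rather than the paper's exact protocol.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def propose_next(Z, y, candidates):
    """Pick the candidate latent point with the highest expected improvement."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
    gp.fit(Z, y)                                   # Z: encoded molecules, y: property scores
    mu, sigma = gp.predict(candidates, return_std=True)
    gamma = (mu - y.max()) / np.maximum(sigma, 1e-9)
    ei = sigma * (gamma * norm.cdf(gamma) + norm.pdf(gamma))  # expected improvement
    return candidates[int(np.argmax(ei))]

# Usage sketch (encode/decode/property_score are placeholders for JT-VAE components):
#   z_next = propose_next(Z_train, y_train, np.random.randn(1000, latent_dim))
#   new_molecule = decode(z_next); y_new = property_score(new_molecule)
```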

Molecule Optimization (Local)

Setup: encoder → latent space → decoder, with a neural network as the property predictor and gradient ascent over the latent space, while preserving similarity to the starting molecule.

Average improvement as a function of the preservation (similarity) constraint:

Preservation 0.0: 1.91
Preservation 0.2: 1.68
Preservation 0.4: 0.84
Preservation 0.6: 0.21

[Figure: example modification trajectories at preservation ≈ 0.6 and ≈ 0.4, each molecule annotated with its property score (e.g. -2.05, 6.51, 6.69, 6.80).]
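A minimal sketch of the gradient-ascent step over the latent space; property_net, the optimizer, step size, and number of steps are assumptions, and the similarity constraint is not enforced here.

```python
import torch

def optimize_latent(z_init, property_net, steps=80, lr=0.05):
    """Gradient ascent on a differentiable property predictor over the latent space."""
    z = z_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -property_net(z)     # ascend the predicted property by descending its negative
        loss.backward()
        opt.step()
    return z.detach()               # decode(z) afterwards to obtain the modified molecule
```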

Discussion

• "Word level" prediction can offer a significant improvement by shortening the decision process.

• Latent space optimization is an interesting and powerful technique.

• Teacher forcing introduces a bias (the model is never exposed to its own mistakes during training), which could be reduced via RL techniques or a GAN-style evaluation of the complete generated graph.

• Similar to SMILES, this paper samples a random order over the graph/tree structure: it uses an arbitrary minimal spanning tree, chooses an arbitrary node as the root of the tree, and chooses a random ordering of the children of each tree node.

Thanks

Original code is available at: https://github.com/wengong-jin/icml18-jtnn

[Figure (backup slide): full pipeline overview: Input Molecule → Junction Tree → Tree & Graph Vector → Reconstructed Tree → Reconstructed Molecule.]