ML for inorganic molecular design: descriptors and similarity in transition metal chemical space

Jon Paul Janet, Heather Kulik

Department of Chemical Engineering, Massachusetts Institute of Technology

255th ACS National Meeting, New Orleans, 03.19.18

Outline: Introduction; Describing inorganic complexes; Similarity and model uncertainty


Data-driven molecular design

Machine learning is transforming how we design new materials...

OLED chemical space: NN ∼ 10^6, DFT ∼ 10^5, Exp. ∼ 10^1

Gomez-Bombarelli, R. et al. Nat. Mater., 15(10):1120-1127, 2016.

Ma, X. et al. J. Phys. Chem. Lett., (18):3528-3533, 2015.

...what about inorganic molecular complexes?

[Figure: metal center M surrounded by six ligands L.] Bignozzi, C. et al. Coord. Chem. Rev., 257(9), 2013.

[Figure: Pt complex with N-donor and Cl ligands.] Periana, R. A. et al. Science, 280(5363), 1998.


Transition metal complexes

[Figure: d-orbital energy diagrams ($t_{2g}$ and $e_g$ levels on an energy axis) comparing low-spin and high-spin states. Panel labels: $\Delta E_{\mathrm{H-L}} < 0$; $\Delta E_{\mathrm{H-L}} < 0$; $\Delta E_{\mathrm{H-L}} > 0$; $\Delta E_{\mathrm{H-L}} \sim 0$; perturbation, $\Delta T$.]

[Figure: oxidation of $M^{2+}$ to $M^{3+}$ with loss of an electron (e), labeled $\Delta E_{\mathrm{III-II}}$.]
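For reference, the two quantities labeled on this slide can be written as energy differences. The sign convention below (high-spin minus low-spin, and M(III) minus M(II)) is an assumption read off the subscripts; the slide itself does not spell it out.

```latex
\Delta E_{\mathrm{H-L}} = E_{\mathrm{HS}} - E_{\mathrm{LS}},
\qquad
\Delta E_{\mathrm{III-II}} = E_{\mathrm{M(III)}} - E_{\mathrm{M(II)}}
```

Under this assumed convention, a positive $\Delta E_{\mathrm{H-L}}$ means the low-spin state lies lower in energy, and a value near zero means the two spin states are nearly degenerate.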


How to estimate properties?

[Diagram: features mapped to a property by three routes.]

• experiment: weeks, months
• density functional theory (DFT), $H\Psi = E\Psi$: days
• model: seconds


Input space design

What would be the ideal feature space?

[Figure: complexes $c_i$ and $c_j$ in chemical space $C_f$ map to points $x_i$ and $x_j$ in descriptor space $X \subset \mathbb{R}^d$, separated by the distance $d(x_i, x_j)$.]

Good descriptors:
• cheap
• small as possible
• preserve similarity
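The idea of a distance in descriptor space can be sketched in a few lines. This is only an illustration: the three-component descriptor vectors are made-up numbers, and the Euclidean metric is an assumed choice for $d(x_i, x_j)$, since the slide does not say which metric is used.

```python
import math

def descriptor_distance(x_i, x_j):
    """Euclidean distance d(x_i, x_j) between two descriptor vectors in R^d."""
    if len(x_i) != len(x_j):
        raise ValueError("descriptor vectors must have the same length d")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))

# Hypothetical descriptors for three complexes c_i, c_j, c_k (illustrative values only)
x_i = [26.0, 2.0, 3.44]
x_j = [26.0, 3.0, 3.44]
x_k = [27.0, 2.0, 2.55]

d_ij = descriptor_distance(x_i, x_j)  # differs in one component only
d_ik = descriptor_distance(x_i, x_k)  # differs in two components
```

A descriptor that preserves similarity would place complexes with similar properties at small distances; whether a given metric achieves that depends on the features and their scaling.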


Data for spin splitting

Data for octahedral complexes¹:

[Figure: octahedral complex with metal M, two axial ligands $L_{ax}$, and four equatorial ligands $L_{eq}$.]

• 1345 (194) complexes
• 7 HF values

Calculation details:
• B3LYP-like DFT
• HF exchange from 0 to 30%
• gas phase optimization
• LANL2DZ/6-31G*
• high- and low-spin
• M(II)/(III)

Coulomb matrix eigenspectrum (CM-ES) descriptor & kernel ridge regression (KRR)

$\Delta E_{\mathrm{H-L}}$ RMSE with CM-ES: 19.2 kcal/mol

Why?

¹ Janet, J.P., and Kulik, H.J. Chem. Sci., 2017, 8, 5137-5152.
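As a rough illustration of what a CM-ES descriptor involves, the sketch below builds a Coulomb matrix and returns its sorted eigenvalues. The slide does not give the formula, so the commonly used form (diagonal $0.5\,Z_i^{2.4}$, off-diagonal $Z_i Z_j / |R_i - R_j|$) is an assumption here, as are the function names, the descending sort, and the toy two-atom geometry. The KRR step is not shown.

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Coulomb matrix from nuclear charges Z and Cartesian positions R.

    Assumed form: M_ii = 0.5 * Z_i**2.4, M_ij = Z_i * Z_j / |R_i - R_j|.
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    n = len(Z)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, j] = 0.5 * Z[i] ** 2.4
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    return M

def cm_eigenspectrum(Z, R):
    """Eigenvalues of the Coulomb matrix, sorted in descending order."""
    eigvals = np.linalg.eigvalsh(coulomb_matrix(Z, R))  # symmetric matrix
    return np.sort(eigvals)[::-1]

# Toy example: a C and an O atom 2.0 length units apart (hypothetical geometry)
Z_toy = [6, 8]
R_toy = [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
cm = coulomb_matrix(Z_toy, R_toy)
spectrum = cm_eigenspectrum(Z_toy, R_toy)
```

Because the eigenvalues depend on every pairwise distance and every nuclear charge, the spectrum is a whole-molecule quantity that grows with molecule size.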

A tale of two complexes

[Figure: two principal component projections (PC 1 vs. PC 2), labeled $\Delta E_{\mathrm{H-L}}$ and size.]

• $\mathrm{Fe[pisc]}_6^{3+}$: $\Delta E_{\mathrm{H-L}}$ = 40.7 kcal/mol
• $\mathrm{Fe[misc]}_6^{3+}$: $\Delta E_{\mathrm{H-L}}$ = 37.7 kcal/mol

MCDL-25

mixed continuous discrete local (MCDL)

• metal properties: identity, oxidation state (e.g. Fe(II))
• local ligand properties: max $\Delta\chi$ (e.g. $\chi = 3.44$, $\chi = 2.55$)
• global ligand properties: Kier index

[Figure: test RMSE (kcal/mol, axis from 0 to 20) by method, CM-ES vs. MCDL.]

Janet, J.P., and Kulik, H.J. Chem. Sci., 2017, 8, 5137-5152.
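A local ligand feature such as max $\Delta\chi$ can be sketched as below. The precise definition used in MCDL-25 is not given on the slide, so this assumes it is the largest Pauling electronegativity difference across any bond in a ligand; the function name and the bond-list input format are hypothetical.

```python
# Pauling electronegativities for a few common ligand elements
PAULING = {"H": 2.20, "C": 2.55, "N": 3.04, "O": 3.44}

def max_delta_chi(elements, bonds):
    """Largest |chi_a - chi_b| over all bonded atom pairs (a, b).

    elements: list of element symbols, indexed by atom
    bonds: list of (a, b) index pairs
    """
    return max(abs(PAULING[elements[a]] - PAULING[elements[b]]) for a, b in bonds)

# Carbonyl ligand: one C-O bond, with chi = 2.55 and chi = 3.44 as on the slide
co_delta = max_delta_chi(["C", "O"], [(0, 1)])

# Ammonia ligand: three N-H bonds
nh3_delta = max_delta_chi(["N", "H", "H", "H"], [(0, 1), (0, 2), (0, 3)])
```

Features like this depend only on the ligand's elements and connectivity, so they need no geometry optimization.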


9/21

Extensible, continuous descriptors - RACs

Based on autocorrelations [2]

[Figure: molecular graph with a metal M, two C atoms and four O atoms]

$d_1$: $\sum_{O,C} Z_O Z_C = 48$

$d_1$: $48 + \sum_{C,O} Z_O Z_C = 144 + 48$

$d_1$: $\sum_i \sum_j Z_i Z_j \, \delta(d_{ij}, 1)$

$d_x$: $\sum_i \sum_j Z_i Z_j \, \delta(d_{ij}, x)$

[Figure: MUE (kcal/mol, axis 8-18) vs. maximum AC depth (0-6), train and test curves]

How to adapt to TM complexes? Restrict the scope to focus on near-metal atoms:

$d_1$: $\sum_{M,O} Z_M Z_O$

$d_2$: $\sum_{M,C} Z_M Z_C$

$d_3$: $\sum_{M,O} Z_M Z_O$

$(Z_i - Z_j)$

Properties: $T$, $\chi$, $Z$, $I$, $S$

∼ 160 features in total

[2] Broto, P., Moreau, G. and Vandycke, C. Eur. J. Med. Chem., 19(1):71-78, 1984.
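The autocorrelation sums on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy complex (a metal chelated by an oxalate-like ligand), the atom labels, and the choice of Fe (Z = 26) for M are assumptions made here, and only the nuclear charge Z is used as the atomic property. The function sums $Z_i Z_j$ over ordered atom pairs whose bond-path distance equals the requested depth, and the optional `starts` argument restricts the first index to chosen atoms, such as the metal.

```python
from collections import deque

# Hypothetical toy complex: metal M chelated by an oxalate-like ligand.
BONDS = [("M", "O1"), ("M", "O2"), ("O1", "C1"), ("O2", "C2"),
         ("C1", "C2"), ("C1", "O3"), ("C2", "O4")]
# Nuclear charges; Fe (Z = 26) is an arbitrary choice for M.
Z = {"M": 26, "O1": 8, "O2": 8, "O3": 8, "O4": 8, "C1": 6, "C2": 6}


def build_adjacency(bonds):
    """Turn a bond list into an undirected adjacency map."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to each atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def autocorrelation(adjacency, prop, depth, starts=None):
    """Sum prop[i] * prop[j] over ordered pairs (i, j) at distance `depth`.

    With `starts=None` every atom is used as i, so each pair is counted
    in both orders. Passing starts=["M"] gives a metal-centered sum.
    """
    starts = list(adjacency) if starts is None else starts
    total = 0
    for i in starts:
        for j, d in bond_distances(adjacency, i).items():
            if d == depth:
                total += prop[i] * prop[j]
    return total


ADJ = build_adjacency(BONDS)
metal_centered = [autocorrelation(ADJ, Z, d, starts=["M"]) for d in (1, 2, 3)]
print(metal_centered)  # [416, 312, 416]
```

In this toy graph the metal-centered sums pick up O atoms at depth 1, C atoms at depth 2, and O atoms again at depth 3, which mirrors the M,O / M,C / M,O pattern listed on the slide.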

10/21

Feature selection

[Figure: RMSE (kcal/mol, axis 1.5-4.0) vs. dimension (axis 50-150). Labeled feature sets: MCDL, RAC155, UV86, RFE43, LS28, rF41]

Janet, J.P., and Kulik, H.J. J. Phys. Chem. A, 2017, 121, 46, 8939-8954.
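As a rough illustration of how a feature set can be reduced in dimension, the sketch below ranks features by the absolute Pearson correlation with the target and keeps the top few. This is a generic univariate filter written for this note. The slide does not expand its feature-set labels, so this is not claimed to be any of the selection methods plotted there. The feature names and data values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def univariate_filter(columns, target, keep):
    """Return the `keep` feature names with the largest |r| against target."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:keep]


# Hypothetical data: four candidate features and one target property.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "f_a": [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly correlated
    "f_b": [5.0, 3.0, 4.0, 1.0, 2.0],    # strongly anti-correlated
    "f_c": [1.0, 1.0, 1.0, 1.0, 1.0],    # constant, carries no signal
    "f_d": [3.0, 1.0, 4.0, 1.0, 5.0],    # weakly correlated
}
selected = univariate_filter(features, target, keep=2)
print(selected)  # ['f_a', 'f_b']
```

A filter like this scores each feature in isolation, so it can keep redundant features that a model-based selection would drop.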

11/21

A tale of two complexes, II

[Figure: four PCA projections (PC 1 vs. PC 2); labels: ∆E_H−L, size]

12/21

Do features depend on properties?

[Figure: molecular graph of a metal center with four N atoms and surrounding C and H atoms]

[Figure panels: spin splitting (randF), spin splitting (randF), bond lengths (randF), redox (randF); annotations: more ‘electronic’, more ‘topological’]

13/21

Mapping TM complex space

[Figure: PCA projection (PC 1 vs. PC 2) colored by E0 (eV), color scale 3-11; one point marked "?"]

random forest selected for redox

14/21

Mapping TM complex space

[Figure: ( complex + complex ) / 2 = ?]

Cr(II) [H2O]5 [misc]: ∆G = 5.3 eV

Co(II) [CO]5 [pyr]: ∆G = 8.1 eV

Fe(II) [CO]4 [pyr][water]: ∆G = 7.8 eV

15/21

Model transferability

Test-set performance is not necessarily a good metric for general transferability [2]:

[Figure: Fe(III), ∆E_H−L (kcal/mol, axis −25 to 50) for pisc−pisc, pisc−NCS, pisc−H2O, pisc−Cl, H2O−H2O, Cl−Cl, NCS−NCS; ANN vs. B3LYP]

[Figure: Fe(III)[pisc]6, ∆E_H−L (kcal/mol, axis 0-60) vs. HFX, % (axis 0.0-0.3); ANN vs. DFT]

[Figure: abs. error (kcal/mol, axis 0-15) for train and test; annotated values 3.13 and 2.97]

[Figure: abs. error (kcal/mol, axis 0-30) for train, test and CSD]

[2] Janet, J.P., and Kulik, H.J. Chem. Sci., 2017, 8, 5137-5152.


Model transferability

Uncertainty estimates are essential for our surrogate model to explore chemical space:

[Figure: parity plots of DFT splitting (kcal/mol) vs. surrogate splitting (kcal/mol); abs. error (kcal/mol) vs. distance]

Uncertainty from MC dropout [1]: the ANN model approximates variational inference with a GP under some conditions:

$\mathrm{var}(y^* \mid x^*) \approx \frac{1}{J} \sum_j y_j^T y_j + \tau^{-1}$

Here $y_j$ is the prediction from the $j$-th of $J$ stochastic forward passes and $\tau$ is the model precision; in the full expression of Gal and Ghahramani, the squared predictive mean is also subtracted.

[1] Gal, Y. and Ghahramani, Z. ICML, 2016, 1050-1059.
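The MC-dropout estimate can be illustrated with a minimal sketch in plain Python. It is not the model from the talk: `toy_network`, its weights, the dropout rate, and the value of `tau` are all hypothetical, chosen only to show the mechanics of repeating stochastic forward passes and combining their spread with the inverse precision. The sketch subtracts the squared predictive mean, as in the full Gal and Ghahramani expression.

```python
import random
import statistics


def mc_dropout_predict(forward, x, n_passes=100, tau=1.0, seed=0):
    """Return (mean, variance) from repeated stochastic forward passes.

    forward(x, rng) must draw a fresh dropout mask on every call.
    tau is an assumed model precision; 1/tau is added as a noise term.
    """
    rng = random.Random(seed)
    samples = [forward(x, rng) for _ in range(n_passes)]
    mean = statistics.fmean(samples)
    # Second moment minus squared mean, plus the inverse precision.
    second_moment = statistics.fmean(y * y for y in samples)
    variance = second_moment - mean * mean + 1.0 / tau
    return mean, variance


def toy_network(x, rng, p_drop=0.2):
    """Hypothetical one-hidden-layer net with fixed weights and dropout."""
    w_hidden = [0.5, -1.0, 0.8, 0.3]
    w_out = [1.2, 0.7, -0.4, 0.9]
    y = 0.0
    for wh, wo in zip(w_hidden, w_out):
        h = max(0.0, wh * x)              # ReLU hidden unit
        keep = rng.random() >= p_drop     # Bernoulli dropout mask
        if keep:
            y += wo * h / (1.0 - p_drop)  # inverted-dropout scaling
    return y


mean, var = mc_dropout_predict(toy_network, x=2.0, n_passes=500, tau=10.0)
print(f"prediction = {mean:.2f}, variance = {var:.2f}")
```

With dropout switched off (`p_drop=0.0`) every pass returns the same value, so the estimated variance collapses to `1/tau`; the spread across passes is what carries the model uncertainty.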


Demonstration

Can we use the ANN model to find new spin-crossover materials, i.e. ∆E_H−L = 0?

Define a design space of 32 ligands and 5 metals, giving ∼5600 possible complexes with enforced axial/equatorial symmetry [3].

[3] Janet, J.P., Chan, L. and Kulik, H.J. J. Phys. Chem. Lett., 2018, 9(5), 1064-1071.
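One way to picture the screening step is as a filter on predicted spin splittings. The sketch below is illustrative only: the complex names, the predicted values, and the ±5 kcal/mol window are hypothetical and are not taken from the talk.

```python
def select_sco_leads(predicted_splittings, window=5.0):
    """Return (name, splitting) pairs with |splitting| <= window.

    Results are sorted so that the splitting closest to zero comes first.
    """
    leads = [
        (name, s)
        for name, s in predicted_splittings.items()
        if abs(s) <= window
    ]
    return sorted(leads, key=lambda item: abs(item[1]))


# Hypothetical surrogate predictions in kcal/mol (illustrative values only).
predicted = {
    "complex_A": 22.4,
    "complex_B": -3.1,
    "complex_C": 0.8,
    "complex_D": -17.9,
    "complex_E": 4.6,
}

leads = select_sco_leads(predicted, window=5.0)
for name, splitting in leads:
    print(name, splitting)
```

Because the surrogate is cheap to evaluate, a filter like this can be applied to the whole design space before any lead is passed on to DFT.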


Demonstration

The ANN is trained on 14 of these ligands and covers only 2% of the design space.

We can visualize the design space using t-SNE [4]:

[Figure: t-SNE map of the design space; axis ticks from −40 to 40, color scale from 0.0 to 2.0]

[4] van der Maaten, L. and Hinton, G. J. Mach. Learn. Res., 2008, 9, 2579-2605.


How accurate are we?

Test 51 leads from the ANN with DFT [5]:

[Figure, two panels: (a) histogram of errors (kcal/mol, from −20 to 10) vs. count for the 51 leads, with "sub. isocyanides" marked; (b) ∆E_H−L(ANN) − ∆E_H−L(GO) (kcal/mol) vs. distance to train, by metal (Cr, Mn, Fe, Co)]

[5] Janet, J.P., Chan, L. and Kulik, H.J. J. Phys. Chem. Lett., 2018, 9(5), 1064-1071.
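The "distance to train" axis can be read as a feature-space reliability indicator. As a minimal sketch, assuming the distance is the Euclidean distance to the nearest training point in a normalized feature space (the exact definition used in the talk is not given here), a candidate can be flagged when it sits too far from the training data. The feature vectors and the cutoff are hypothetical.

```python
import math


def distance_to_train(candidate, training_set):
    """Smallest Euclidean distance from candidate to any training point."""
    return min(math.dist(candidate, t) for t in training_set)


def flag_reliable(candidates, training_set, cutoff):
    """Map each candidate name to (distance, within_cutoff)."""
    report = {}
    for name, features in candidates.items():
        d = distance_to_train(features, training_set)
        report[name] = (d, d <= cutoff)
    return report


# Hypothetical, already-normalized feature vectors (illustrative values only).
training_set = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
candidates = {
    "near_train": (0.1, 0.0, 0.0),
    "far_from_train": (2.0, 2.0, 2.0),
}

report = flag_reliable(candidates, training_set, cutoff=0.5)
for name, (d, ok) in report.items():
    print(f"{name}: distance = {d:.2f}, within cutoff = {ok}")
```

Predictions for candidates outside the cutoff would then be treated with more caution or sent to DFT for confirmation.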


Conclusions

the choice of molecular representation is important

different properties depend unequally on the features

feature-space geometry can provide insight into model reliability

imbuing descriptor construction with ‘chemical intuition’ can drastically improve learning

conversely, feature selection can contribute to understanding systems


Acknowledgments

Thanks to the Kulik group and funding partners.