Page 1

The Materials Project Ecosystem - A Complete Software and Data Platform for Materials Informatics

Shyue Ping Ong, University of California, San Diego

Page 2

“Information wants to be free.” – Stewart Brand, 1984

Page 3

“Information wants to be free and code wants to be wrong.”

– RSA Conference 2008

Page 4

“Materials information and code wants to be free and right.”

Page 5

The Materials Project is an open science project to make the computed properties of all known inorganic materials publicly available to all researchers to accelerate materials innovation.

June 2011: The Materials Genome Initiative is launched. It aims to “fund computational tools, software, new methods for material characterization, and the development of open standards and databases that will make the process of discovery and development of advanced materials faster, less expensive, and more predictable.”

https://www.materialsproject.org

Page 6

As of June 5, 2015:

•  Over 58,000 unique compounds, and growing
•  Diverse set of properties
   ◦  Structural (lattice parameters, atomic positions, etc.)
   ◦  Energetic (formation energies, phase stability, etc.)
   ◦  Electronic structure (DOS, band structures)
   ◦  Elastic constants
•  Suite of web apps for materials analysis

Page 7

User-friendly Web Apps

•  Materials Explorer: Search for materials by formula, elements or properties
•  Battery Explorer: Search for battery materials by voltage, capacity and other properties
•  Crystal Toolkit: Design new materials from existing materials
•  Structure Predictor: Predict novel structures
•  Phase Diagram App: Generate compositional and grand canonical phase diagrams
•  Pourbaix Diagram App: Generate Pourbaix diagrams
•  Reaction Calculator: Balance reactions and calculate their enthalpies

Page 8

Materials Project data in user papers

M. Meinert, M.P. Geisler, Phase stability of chromium based compensated ferrimagnets with inverse Heusler structure, J. Magn. Magn. Mater. 341 (2013) 72–74.

J. Rustad, Density functional calculations of the enthalpies of formation of rare-earth orthophosphates, Am. Mineral. 97 (2012) 791–799.

M. Fondell, T.J. Jacobsson, M. Boman, T. Edvinsson, Optical quantum confinement in low dimensional hematite, J. Mater. Chem. A. 2 (2014) 3352.

Page 9

Web frontend is only the tip of the iceberg…

pymatgen, FireWorks, REST API, custodian, MPWorks, MPEnv, rubicon

Page 10

Page 11

Hierarchical design of codebases keeps infrastructure nimble to changes

WORKFLOW CODE

CHEMISTRY CODE

Page 12

Many types of use cases

•  Crystal workflows: FireWorks, pymatgen, custodian, MPWorks
•  Molecule workflows: FireWorks, pymatgen, custodian, rubicon (private)

pymatgen

FireWorks

external: MAST, MaterialsHub

external: Berlin ML, JGI, MoDeNa

Page 13

Sustainable software development

•  Open-source
   ◦  Managed via
   ◦  More eyes => robustness
   ◦  Contributions from all over the world
•  Benevolent dictators
   ◦  Unified vision
   ◦  Quality control
•  Clear documentation
   ◦  Prevent code rot
   ◦  More users
•  Continuous integration and testing
   ◦  Ensure code is always working

Page 14

Python Materials Genomics (pymatgen)

•  Core materials analysis powering the Materials Project

•  Defines core extensible Python objects for materials data representation.

•  Provides a robust and well-documented set of structure and thermodynamic analysis tools relevant to many applications.

•  Establishes an open platform for researchers to collaboratively develop sophisticated analyses of materials data.

Page 15

Extensive Materials Analysis Capabilities

Input/Output

objects

(Modular, Reusable, Extendable)

Defects and Transformations Electronic Structure

XRD Patterns

Phase and Pourbaix Diagrams

Functional properties

Comprehensively documented

Continuously tested and integrated

Active dev/user community

Page 16

www.pymatgen.org stats
•  > 6000 views per month on average
•  ~50% increase from previous year

v2.9.12 → v3.0.13
•  Python 2/3 compatible!
•  Other improvements: ABINIT support, Defects (Haranczyk/LBNL), Qchem (JCESR), bug fixes & improvements

Very active user community!
•  81 forks (developers making changes and contributing)
•  The commit rate has slowed somewhat, as expected for a maturing and robust code base.

Page 17

Pymatgen-db

•  Database add-on for pymatgen. Enables the creation of Materials Project-style MongoDB (www.mongodb.org) databases for management of materials data.
•  Key features:
   ◦  Query engine for easy translation of MongoDB docs to useful pymatgen objects for analysis purposes.
   ◦  Includes a clean and intuitive web UI (the Materials Genomics UI) for exploring Mongo collections.
   ◦  http://pythonhosted.org//pymatgen-db/

Page 18

Custodian

•  Simple, robust and flexible just-in-time (JIT) job management framework.
   ◦  Wrappers to perform error checking, job management and error recovery.
   ◦  Error recovery is an important aspect for high-throughput (HT) work: O(100,000) jobs + 1% error rate => O(1000) errored jobs.
   ◦  Existing sub-packages for error handling for VASP, NWChem and QChem calculations.
•  Blue: Controlled by subclasses of Job
•  Red: Defined by ErrorHandlers.
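The run, check, correct cycle described on this slide can be sketched in plain Python. This is a toy illustration only: ToyJob, NotConvergedHandler and run_with_recovery are hypothetical names invented here, not custodian's actual classes or API, and the "calculation" is a stand-in that succeeds once a setting is large enough.

```python
class ToyJob:
    """Hypothetical stand-in for a job; not custodian's actual Job class."""

    def __init__(self):
        self.settings = {"max_steps": 10}
        self.output = ""

    def run(self):
        # Pretend the calculation only converges once max_steps reaches 40.
        if self.settings["max_steps"] < 40:
            self.output = "ERROR: not converged"
        else:
            self.output = "converged"


class NotConvergedHandler:
    """Hypothetical handler: detects one kind of error and applies one fix."""

    def check(self, job):
        return "not converged" in job.output

    def correct(self, job):
        job.settings["max_steps"] *= 2


def run_with_recovery(job, handlers, max_errors=5):
    """Run the job, rerunning after each correction until no handler fires."""
    corrections = 0
    while True:
        job.run()
        fired = [h for h in handlers if h.check(job)]
        if not fired:
            return corrections
        if corrections >= max_errors:
            raise RuntimeError("too many errors; giving up")
        for handler in fired:
            handler.correct(job)
        corrections += 1


job = ToyJob()
n_fixes = run_with_recovery(job, [NotConvergedHandler()])
print(n_fixes, job.settings["max_steps"], job.output)
```

The point of the split is that the job knows how to run and the handlers know how to diagnose and repair, so new error rules can be added without touching the job itself.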

Page 19

Concrete Example for VASP calculations

•  An extensive set of rules has been codified for running VASP calculations

•  Significantly reduces error rate of calculations (< 1%)

Page 20

VaspJob class

•  auto_npar: automatically sets NPAR in the INCAR to a near-optimal value based on the detected number of processors. Improves VASP calculation efficiency by ~10-30%!

•  auto_gamma: if this is a gamma-only calculation and a gamma-compiled version of VASP exists, use it. Another 10-20% increase in efficiency!

•  Even without error handling, custodian already significantly improves resource utilization of running VASP calculations!

VaspJob(vasp_cmd, output_file="vasp.out", auto_npar=True, auto_gamma=True, …<other options>...)
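The slide does not say how the "relatively optimal" NPAR is chosen. One plausible heuristic, assumed here purely for illustration, is to pick a divisor of the core count close to its square root. The function name choose_npar is hypothetical.

```python
import math


def choose_npar(ncores):
    """Return the smallest divisor d of ncores with isqrt(ncores) <= d < ncores.

    Returns None when no such divisor exists (for example a prime core count),
    meaning NPAR would be left unchanged in this sketch.
    """
    for npar in range(max(1, math.isqrt(ncores)), ncores):
        if ncores % npar == 0:
            return npar
    return None


for cores in (16, 24, 48):
    print(cores, choose_npar(cores))
```

Under this assumption, 16 and 24 cores both give NPAR = 4, and 48 cores gives NPAR = 6.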

Page 21

FireWorks is the Workflow Manager

Custom material

A cool material !! Lots of information about cool material !!

Submit!

Input generation (parameter choice) Workflow mapping

Supercomputer submission / monitoring

Error handling File Transfer

File Parsing / DB insertion

Page 22

FireWorks as a platform

Community can write any workflow in FireWorks → we can automate it over most supercomputing resources

structure

charge

Band structure

DOS

Optical

phonons

XAFS spectra

GW
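A workflow of this kind is a dependency graph: each calculation runs once the calculations it depends on have finished. The sketch below orders such a graph with Python's standard library rather than FireWorks itself. The dependencies between steps are assumptions made for illustration, since the slide's diagram does not survive in the text.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each step maps to the steps it must wait for.
workflow = {
    "structure": set(),
    "charge": {"structure"},
    "band structure": {"charge"},
    "DOS": {"charge"},
    "optical": {"charge"},
    "phonons": {"structure"},
}


def execution_order(graph):
    """Return one valid order in which every step follows its prerequisites."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(workflow)
print(order)
```

Several orders are valid for the same graph. A workflow manager can use that freedom to run independent steps in parallel.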

Page 23

Workflows in Development by Internal/External Collaborations

•  Elastic constants (in production)
•  Thermal properties (Phonon / GIBBS: in testing)
•  Surfaces (in testing)
•  GW / hybrid calculations
•  ABINIT workflows (Geoffroy Hautier, UCL)
•  Any code can be added and automated

Page 24

Materials Project DB

How do I access MP data?

Page 25

Materials Project DB

How do I access MP data?

Option 1: Direct access

Most flexible and powerful, but
•  User needs to know the database language
•  Security is an issue
•  Fragile: if the database technology or schema changes, the user’s analysis breaks

Page 26

Materials Project DB

How do I access MP data?

Option 2: Web Apps

Pros
•  Intuitive and user-friendly
•  Secure

Cons
•  Significant loss in flexibility and power

Page 27

Materials Project DB

How do I access MP data?

Option 3: Web Apps built on RESTful API

Pros
•  Intuitive and user-friendly
•  Secure
•  Programmatic access for developers and researchers

Page 28

The Materials API

An open platform for accessing Materials Project data based on REpresentational State Transfer (REST) principles.

Flexible and scalable to cater to a large number of users with different access privileges.

Simple to use and code agnostic.

Page 29

A REST API maps a URL to a resource.

Example: GET https://api.dropbox.com/1/account/info
Returns information about a user’s account.

Methods: GET, POST, PUT, DELETE, etc.

Response: Usually JSON or XML or both

Page 30

Who implements REST APIs?

Page 31

https://www.materialsproject.org/rest/v2/materials/Fe2O3/vasp/energy

•  Preamble: https://www.materialsproject.org/rest/v2
•  Request type: materials
•  Identifier, typically a formula (Fe2O3), id (1234) or chemical system (Li-Fe-O): Fe2O3
•  Data type (vasp, exp, etc.): vasp
•  Property: energy
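The URL anatomy above can be expressed as a small helper that joins the parts in order. The function name materials_url is hypothetical, and the assignment of labels to URL segments is read off the example URL rather than taken from an official specification.

```python
PREAMBLE = "https://www.materialsproject.org/rest/v2"


def materials_url(identifier, data_type, prop, request_type="materials"):
    """Join preamble, request type, identifier, data type and property."""
    return "/".join([PREAMBLE, request_type, identifier, data_type, prop])


url = materials_url("Fe2O3", "vasp", "energy")
print(url)
```

Swapping the identifier for a chemical system such as Li-Fe-O changes only one segment, which is what makes the scheme easy to script.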

Page 32

Secure access

An individual API key provides secure access with defined privileges.

All https requests must supply the API key as either an “x-api-key” header or a GET/POST “API_KEY” parameter.

API key available at https://www.materialsproject.org/dashboard
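As a minimal sketch of the header option, the snippet below attaches the key to a request object using only the Python standard library. It builds the request without sending it. build_request is a hypothetical helper and "YOUR_API_KEY" is a placeholder, not a real key.

```python
import urllib.request


def build_request(url, api_key):
    """Build (but do not send) a GET request carrying the key in a header."""
    return urllib.request.Request(url, headers={"x-api-key": api_key})


req = build_request(
    "https://www.materialsproject.org/rest/v2/materials/Fe2O3/vasp/energy",
    "YOUR_API_KEY",  # placeholder, not a real key
)
# urllib stores header names capitalized, hence "X-api-key" in the lookup.
print(req.get_method(), req.full_url, req.has_header("X-api-key"))
```

Sending the key in a header rather than in the URL keeps it out of the query string, though either form is accepted according to the slide.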

Page 33

Sample output (JSON)

•  Intuitive response format

•  Machine-readable (JSON parsers available for most programming languages)

•  Metadata provides provenance for tracking

{
  "created_at": "2014-07-18T11:23:25.415382",
  "valid_response": true,
  "version": {
    "pymatgen": "2.9.9",
    "db": "2014.04.18",
    "rest": "1.0"
  },
  "response": [
    { "energy": -67.16532048, "material_id": "mp-24972" },
    { "energy": -132.33035197, "material_id": "mp-542309" },
    {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}
  ],
  "copyright": "Materials Project, 2012"
}
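To show what "machine-readable" means in practice, the sketch below parses a response of the shape shown on this slide with Python's json module. The sample is cut down to the two entries that are fully visible, and energies_by_id is a hypothetical helper name.

```python
import json

# Sample cut down to the two entries that are fully visible on the slide.
raw = """
{
  "created_at": "2014-07-18T11:23:25.415382",
  "valid_response": true,
  "version": {"pymatgen": "2.9.9", "db": "2014.04.18", "rest": "1.0"},
  "response": [
    {"energy": -67.16532048, "material_id": "mp-24972"},
    {"energy": -132.33035197, "material_id": "mp-542309"}
  ],
  "copyright": "Materials Project, 2012"
}
"""


def energies_by_id(text):
    """Map each material_id in the response to its energy."""
    doc = json.loads(text)
    if not doc.get("valid_response"):
        raise ValueError("server reported an invalid response")
    return {entry["material_id"]: entry["energy"] for entry in doc["response"]}


energies = energies_by_id(raw)
print(energies)
```

The version block in the same document records which pymatgen, database and REST versions produced the data, which is the provenance the slide refers to.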

Page 34

Can I really access any piece of data in the Materials Project?

Github-powered RESTful documentation http://bit.ly/materialsapi

Via the shockingly powerful https://www.materialsproject.org/rest/v2/query
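A rough sketch of how a request body for the query endpoint might be assembled follows. It assumes the endpoint accepts a POST with two form fields, criteria (a MongoDB-style filter) and properties (a list of fields to return), each encoded as a JSON string. That format, and the field names in the example, are assumptions to be checked against the linked documentation. Nothing is sent over the network.

```python
import json
from urllib.parse import parse_qs, urlencode

QUERY_URL = "https://www.materialsproject.org/rest/v2/query"


def build_query_body(criteria, properties):
    """Encode criteria and properties as JSON strings in a form-encoded body."""
    fields = {
        "criteria": json.dumps(criteria),
        "properties": json.dumps(properties),
    }
    return urlencode(fields).encode("utf-8")


# Illustrative filter and property names; assumed, not taken from the slides.
body = build_query_body(
    {"elements": {"$all": ["Li", "Fe", "O"]}},
    ["material_id", "pretty_formula", "energy"],
)

# Decode the body again to confirm the round trip preserves both fields.
decoded = {k: json.loads(v[0]) for k, v in parse_qs(body.decode("utf-8")).items()}
print(decoded)
```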

Page 35

Demo http://localhost:8888/notebooks

Page 36

The Materials API + pymatgen in Education – UCSD’s NANO 106

•  Data mined over the Materials Project’s 49,000+ unique crystals

http://www.bit.ly/sg_stats

P21/c is the most common space group, comprising ~9.8% of all compounds
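The kind of tally behind a statistic like this can be sketched with a counter over space-group symbols. The list below is a made-up toy sample, not Materials Project data, so its fractions do not reproduce the ~9.8% figure. space_group_fractions is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical toy sample of space-group symbols, one per crystal.
space_groups = ["P21/c", "Fm-3m", "P21/c", "Pnma", "P-1", "P21/c", "Fm-3m", "Pnma"]


def space_group_fractions(symbols):
    """Return (symbol, fraction) pairs sorted from most to least common."""
    counts = Counter(symbols)
    total = len(symbols)
    return [(sg, n / total) for sg, n in counts.most_common()]


fractions = space_group_fractions(space_groups)
print(fractions[0])
```

In the real exercise the symbol list would come from the database (for example via the Materials API) rather than being typed in by hand.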

Page 37

The Materials Virtual Lab @ UCSD’s One-click AIMD

Starting candidates

Topological Screening (augmented by DFT)

Stability (phase & EW) screening

Diffusivity

Optimized candidates

Automated “one-click” MD workflow based on pymatgen, custodian and fireworks

AIMD SDSC

Multi-week AIMD simulation

Statistical exclusionary screening

Y. Mo, S. P. Ong, G. Ceder, “Insights into Diffusion Mechanisms in P2 Layered Oxide Materials by First-Principles Calculations”, submitted

Automated pathway extraction + NEB

Page 38

Coming soon (full launch in the next few weeks)!!

Page 39

Sounds good, where do I learn more?

•  The Materials Project
   ◦  https://www.materialsproject.org/open
•  The Materials API Github Doc
   ◦  http://bit.ly/materialsapi
•  The Materials Virtual Lab (MAVRL) @ UCSD
   ◦  Slides from Workshop on MP infrastructure (http://mavrl.org/software)

Page 40

Thank you.