
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data



Stuart J. Chalk

Department of Chemistry, University of North Florida

[email protected]

SCTY 132 - Pacifichem 2015

Overview

Thoughts on Policies and Procedures
Case in Point – the SDS
Development Perspective
The Scientific Data Model
Database Structure for SDM
Mapping NIST DB to SDM DB
Current Status
Management and Tools
To Do List
Acknowledgements

http://aspiresquared.co.uk/2011/01/metadata-what-is-its-purpose/

Thoughts on Policies and Procedures

Scientific publications need to be reimagined.
Describe the work and highlight advancements: yes!
...but also provide access to the original data by default,
for all research (not just federally funded research),
with appropriate attribution/provenance...
...and a timestamp and digital signature.

Thoughts on Policies and Procedures

Implementation needs to be designed by scientists, to engage the community and get the buy-in needed to change the paradigm.
National societies need to promote good practices in the digital collection, annotation and reporting of scientific data...
...and mandate that these practices be taught as part of the chemistry curriculum!
Don’t publish useful scientific data in ways you can’t use it!


Case in Point…

IUPAC Solubility Data Series: 1979 to present, 103 volumes
Paper volumes until 1996 (up to volume 65); electronic articles in JPCRD
Solubility data abstracted from the literature by scientists, with values carefully reported in context
A great resource, but why not electronic?

Activities

Understanding the NIST SDS DB structure
Migration strategy development
Database migration (iterative)
Scientific Data Model development and implementation
UNF DB design
Website development
Ingest of chemical identifiers
Identification of DOIs for references
Database cleanup

Development Perspective

A general model to describe and contain scientific data focuses on the data, not on the view of the data.
Therefore, how the data are stored in a database should not be governed by how the user views them.
Separating the view from the data structure is a major focus.
Our perspective:
Systems (NIST DB) are two different views of SDS data
Reports are a view of data from one reference about one or more chemical systems
Evaluations are an aggregated view of data from many reports
Data in tables are aggregated views of data from reports and, optionally, preparer/evaluator supplemental data
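The separation of stored data from the views built on it can be illustrated with a minimal Python sketch. It assumes a hypothetical flattened record type (`DataPoint`) and two hypothetical view functions (`report_view`, `evaluation_view`); none of these names come from the UNF or NIST databases, and the plain mean is only a stand-in for what, in the SDS, is an expert critical evaluation.

```python
# Hypothetical sketch: store data points once and derive "report" and
# "evaluation" views from them. Names are illustrative, not the UNF/NIST schema.
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DataPoint:
    reference: str        # hypothetical key for the source publication
    system: str           # chemical system the value belongs to
    temperature_k: float  # a conditional datum (context)
    solubility: float     # an experimental datum (units assumed consistent)


def report_view(points: list[DataPoint], reference: str) -> list[DataPoint]:
    """Report view: all data from one reference, for one or more systems."""
    return [p for p in points if p.reference == reference]


def evaluation_view(points: list[DataPoint], system: str,
                    temperature_k: float) -> dict:
    """Evaluation view: one system's data aggregated across many reports.

    Exact temperature matching and a plain mean are simplifications that
    stand in for expert critical evaluation.
    """
    values = [p.solubility for p in points
              if p.system == system and p.temperature_k == temperature_k]
    return {
        "system": system,
        "temperature_k": temperature_k,
        "n_values": len(values),
        "mean": mean(values) if values else None,
    }


if __name__ == "__main__":
    # Illustrative numbers only; not taken from the SDS.
    store = [
        DataPoint("ref-A", "A + water", 298.15, 1.0),
        DataPoint("ref-A", "B + water", 298.15, 2.0),
        DataPoint("ref-B", "A + water", 298.15, 1.2),
    ]
    print(len(report_view(store, "ref-A")))                         # prints 2
    print(evaluation_view(store, "A + water", 298.15)["n_values"])  # prints 2
```

Because both views are computed from the same stored records, adding data from a new reference changes what an evaluation aggregates without any change to how the data are stored.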


Existing NIST DB Model


Scientific Data Model (SDM)


SDM Database Structure

Interpreting Data in Tables

Each table has one to many rows.
Each row has one to many columns.
Data in a row are of different types:
Conditional data – property values that define the context of the data
Experimental data – the measured values in the paper
Supplemental data (other property data) – either from the authors of the evaluated publication or calculated by the preparer or evaluator
Notes
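One way to keep these kinds of row data distinct is sketched below in Python. The class and field names (`TableRow`, `Value`, `SupplementalValue`, `Source`) are hypothetical and are not taken from the SDM schema. Each row holds lists of values, so it can carry one to many columns, and each supplemental value records whether it came from the original authors or from the preparer/evaluator.

```python
# Hypothetical sketch: one row of an SDS-style data table. Class and field
# names are illustrative assumptions, not the SDM schema.
from dataclasses import dataclass, field
from enum import Enum


class Source(Enum):
    """Where a supplemental value came from."""
    AUTHORS = "authors of the evaluated publication"
    PREPARER_OR_EVALUATOR = "calculated by the preparer or evaluator"


@dataclass
class Value:
    prop: str      # property name, e.g. "temperature"
    number: float
    unit: str


@dataclass
class SupplementalValue(Value):
    source: Source = Source.AUTHORS


@dataclass
class TableRow:
    conditions: list[Value] = field(default_factory=list)    # context of the data
    experimental: list[Value] = field(default_factory=list)  # measured values in the paper
    supplemental: list[SupplementalValue] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)

    def columns(self) -> list[Value]:
        """All values in the row, i.e. its one-to-many columns."""
        return self.conditions + self.experimental + list(self.supplemental)

    def calculated(self) -> list[SupplementalValue]:
        """Supplemental values added by the preparer or evaluator."""
        return [v for v in self.supplemental
                if v.source is Source.PREPARER_OR_EVALUATOR]


if __name__ == "__main__":
    # Illustrative values only; not taken from the SDS.
    row = TableRow(
        conditions=[Value("temperature", 298.15, "K")],
        experimental=[Value("mass fraction", 0.1, "1")],
        supplemental=[SupplementalValue("molality", 1.9, "mol/kg",
                                        Source.PREPARER_OR_EVALUATOR)],
    )
    print(len(row.columns()), [v.prop for v in row.calculated()])
    # prints: 3 ['molality']
```

Keeping the source on each supplemental value means author-reported and compiler-calculated numbers can share a row without being confused when the table is rendered.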


Interpreting Data in Tables

(Figure: example table annotated to show conditional data, experimental data and supplemental data)


Interpreting Data in Tables

Current Status

All tables completely transferred except:
SOLDATA – 75% transferred
SOLID_DATA – 50% transferred
GAS_DATA, PARAMETERS_TB, FIGURES – not started

TO DOs:
Separate evaluations from reports
Clean up and link references properly
Get links to references that don’t have a DOI
Additional integration into other APIs
Write up API documentation
Solubility ontology

Management and Tools

Project files are hosted on GitHub: collaborative environment, issue tracking, website (TO DO)
Communication through Slack, integrated with GitHub (issues and commits)
PHP coding using PhpStorm: enforces coding standards and code checks, tracks TO DOs, and integrates with GitHub (commits and issues) and MySQL

To Do List

Finish the transfer of data!
Replicate the current SRD website
Standardize the GUI using Bootstrap CSS
Check browser compatibility (NIST list of browsers?)
Check for Section 508 accessibility compliance
Add exit script() for leaving the NIST website
Documentation: PHP code, SDM schema, GUI, API
Do roundtrip verification of the data
New website will replace the existing one by summer 2016

Acknowledgements

UNF
Matthew Morse (UNF Senior): NIST SDS DB migration to UNF DB, SDM testing
Israel Hurst (UNF Junior): NIST SDS DB data transfer to UNF DB, UNF DB cleanup
John Turner (UNF Senior): website implementation, UNF DB cleanup

NIST
Bob Hanisch (ODI Director), Adam Morey, Peter Linstrom, Don Burgess, Angela Lee

Shameless Plug

Interested in knowing more about chemical data, semantics and knowledge representation?
251st American Chemical Society Meeting, San Diego, CA, USA, March 13-17, 2016
ACS Chemical Informatics Division “CINF Data Summit”: all five days of the meeting, three symposia
“Tomayto vs. Tomahto: Overcoming Incompatibilities in Scientific Data”
“Global Initiatives in Research Data Management & Discovery”
“Chemistry, Data, & the Semantic Web: An Important Triple to Advance Science”

Questions?

[email protected]
Phone: 904-620-5311
Skype: stuartchalk
Twitter: @StuChalk
LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
ORCID: http://orcid.org/0000-0002-0703-7776
ResearcherID: http://www.researcherid.com/rid/D-8577-2013