oreChem: Linking Chemistry Scholarship into the Semantic Web and Web 2.0

Carl Lagoze (Cornell University); Prasenjit Mitra, William Brouwer (Penn State University); Mark Borkum (University of Southampton)


The Fourth Paradigm

• Machine-actionable Substrate
• Integration of Datasets
• Exposure of Process

Scholarship 2.0

Linked Data Cloud

Requirements of Scholarship 2.0

Hubble optical observation (Baltimore, MD)

Basic object information (Strasbourg, France)

A “data-aware document”

text

2006 Astrophysics paper

XMM-Newton X-ray observation (Vilspa, Spain)

Chandra X-ray observation (Cambridge, MA)

Identity

Description

Object-Centered Sociality


Mashup

Reputation

Relationships

Conversation

Groups

Sharing

Collaboration

Actions

Presence

Open Archives Initiative – Object Reuse and Exchange

Triples

describes

aggregation

Resource Map

http://www.openarchives.org/ore/
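The relationship named on this slide (a Resource Map describes an aggregation, expressed as triples) can be sketched in a few lines. This is a minimal illustration, not the oreChem implementation: the `example.org` URIs and the helper `build_resource_map` are hypothetical, and triples are held as plain Python tuples rather than in an RDF library. The predicates `describes` and `aggregates` come from the ORE terms vocabulary.

```python
# Namespace of the OAI-ORE vocabulary terms.
ORE = "http://www.openarchives.org/ore/terms/"


def build_resource_map(rem_uri, aggregation_uri, resource_uris):
    """Return (subject, predicate, object) triples for one Resource Map.

    The Resource Map describes the Aggregation, and the Aggregation
    aggregates each of the listed resources.
    """
    triples = [(rem_uri, ORE + "describes", aggregation_uri)]
    for uri in resource_uris:
        triples.append((aggregation_uri, ORE + "aggregates", uri))
    return triples


# Hypothetical URIs, chosen only for illustration.
triples = build_resource_map(
    "http://example.org/rem/paper-1",
    "http://example.org/aggregation/paper-1",
    [
        "http://example.org/paper-1/text.pdf",
        "http://example.org/paper-1/molecule.cml",
    ],
)

# Print the triples in an N-Triples-like form.
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```

Keeping the Resource Map and the Aggregation as separate URIs mirrors the slide: the map is a document about the aggregation, not the aggregation itself.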

oreChem – The Chemical Semantic Web

• At-source capture of experiment data and research process (Electronic Lab Notebook)
• Compound object authoring
• Retrospective harvesting of chemistry data
• Representation/Reuse through common ORE data model and ontology
• Cloud-based triple store
• Chemical structure search

In the future ideal world …

<?xml version="1.0" ?>
<cml version="3" convention="org-synth-report" xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C" x2="-2.9149999618530273" y2="0.7699999809265137" />
      <atom id="a2" elementType="C" x2="-1.5813208400249916" y2="1.5399999809265137" />
      <atom id="a3" elementType="O" x2="-0.24764171819695613" y2="0.7699999809265134" />
      <atom id="a4" elementType="O" x2="-1.5813208400249912" y2="3.0799999809265137" />
      <atom id="a5" elementType="H" x2="-4.248679083681063" y2="1.5399999809265137" />
      <atom id="a6" elementType="H" x2="-2.914999961853028" y2="-0.7700000190734864" />
      <atom id="a7" elementType="H" x2="-4.248679083681063" y2="-1.907348645691087E-8" />
      <atom id="a8" elementType="H" x2="1.0860374036310796" y2="1.5399999809265132" />
    </atomArray>
    <bondArray>
      <bond atomRefs2="a1 a2" order="1" />
      <bond atomRefs2="a2 a3" order="1" />
      <bond atomRefs2="a2 a4" order="2" />
      <bond atomRefs2="a1 a5" order="1" />
      <bond atomRefs2="a1 a6" order="1" />
      <bond atomRefs2="a1 a7" order="1" />
      <bond atomRefs2="a3 a8" order="1" />
    </bondArray>
  </molecule>
</cml>
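To show what "machine-actionable" means for a CML fragment like the one above, here is a minimal sketch that reads the atoms and bonds with Python's standard library and derives a molecular formula. It is an illustration only: the embedded document is an abbreviated copy of the slide's molecule (2D coordinates and the XML declaration are omitted), and the helper `molecular_formula` is a hypothetical name, not part of any CML toolkit.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# CML namespace as declared in the slide's document.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Abbreviated copy of the slide's molecule: coordinates dropped for brevity.
CML_DOC = """<cml version="3" convention="org-synth-report"
     xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C" /><atom id="a2" elementType="C" />
      <atom id="a3" elementType="O" /><atom id="a4" elementType="O" />
      <atom id="a5" elementType="H" /><atom id="a6" elementType="H" />
      <atom id="a7" elementType="H" /><atom id="a8" elementType="H" />
    </atomArray>
    <bondArray>
      <bond atomRefs2="a1 a2" order="1" /><bond atomRefs2="a2 a3" order="1" />
      <bond atomRefs2="a2 a4" order="2" /><bond atomRefs2="a1 a5" order="1" />
      <bond atomRefs2="a1 a6" order="1" /><bond atomRefs2="a1 a7" order="1" />
      <bond atomRefs2="a3 a8" order="1" />
    </bondArray>
  </molecule>
</cml>"""


def molecular_formula(cml_text):
    """Count atoms by element and return a Hill-order formula string."""
    root = ET.fromstring(cml_text)
    counts = Counter(
        atom.get("elementType") for atom in root.iterfind(".//cml:atom", CML_NS)
    )
    if "C" in counts:
        # Hill order: carbon, then hydrogen, then the rest alphabetically.
        head = [e for e in ("C", "H") if e in counts]
        order = head + sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(f"{e}{counts[e] if counts[e] > 1 else ''}" for e in order)


formula = molecular_formula(CML_DOC)
bond_count = len(ET.fromstring(CML_DOC).findall(".//cml:bond", CML_NS))
print(formula, bond_count)
```

For this molecule the atom counts give C2H4O2 with seven bonds, one of them a C=O double bond, which is consistent with acetic acid.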

Chem4Word - Chemistry Drawing in Word

Relationships: Navigate and link referenced chemistry

Available soon: http://research.microsoft.com/chem4word/

Data: Semantics stored in Chemistry Markup Language

Intent: Recognizes chemical dictionary and ontology terms

Author/edit 1D and 2D chemistry. Change chemical layout styles.

Intelligence: Verifies validity of authored chemistry

Triple store

data

Unfortunately …

PSU

• NMR Spectra and Structural Data
• Experiment data

• Bibliographic metadata
• Citations
• Figures
• Tables
• Chunks

• Reactions
• Molecular Compounds

Cambridge

Indiana

• Computational Chemistry (Gaussian)

triplestore

Southampton

Ontologies

Chemistry Ontology

(Nico Adams – Cambridge)


Experiments Ontology (prototype)

Document Ontology

Reaction Ontology

[Figure: Data (capture), Semantic Graph (storage), Mash-up (reuse). Labels: molecules, text, observations, measurements, documents, data, scientists, datument, lab notebook, experiment]

“May all your problems be technical”

Scholarly communities behave very differently (example: preprint servers)

Physics: arXiv (success)
• 1991, Ginsparg @ LANL
• high-energy physics, step-wise expansion
• societies hands-off, cooperative

Biomedicine: eBiomed / PubMedCentral (modified)
• 1999, Director of NIH
• all of biomedicine
• societies take control

Chemistry: CPS (failure)
• 2000, commercial publisher
• all of chemistry
• societies adverse

Chemistry is particularly challenging

• Commercial value of chemical information (pharmaceuticals)
• Nature of chemistry research culture
  - predominance of synthesis (creation) overshadows the discovery mode typical of physics or biology
  - autonomy: successful research with limited reliance on others
• Monopoly of scholarly societies qua publishers: ACS (CAS), RSC

The Future

• Continue work on technical innovations and infrastructure
• Demonstrate through value-add applications
• Understand socio-technical barriers
  - International workshop/study
  - Chemistry as “canary in coal mine”
• Integrate with larger infrastructure effort
  - Data Conservancy