Hypercube
Chemical Semantics, Inc.
Publication and Retrieval of Computational Chemical-Physics Data via The Semantic Web
Applying the Semantic Web to Computational Chemistry
Chemical Semantics, September 2013
What is this all about?
The principal objective of our enterprise is to create a testbed for a comprehensive exploration of the ideas behind practical applications of the Semantic Web in computational chemistry.
This working testbed, the Chemical Semantics Portal, is initially restricted to computational chemistry and to a small class of users.
Within computational chemistry, we focus on semi-empirical, ab initio and density functional theory (DFT) quantum chemistry calculations and their typical results.
The purpose of this talk is to present the ideas of the Semantic Web and their possible applications in computational chemistry, and to present the working prototype of the Chemical Semantics Portal.
Dr Mirek Sopek
INTRODUCTION: The Basics of the Semantic Web
The evolution of the Web
WEB 1.0 - Web of documents
WEB 2.0 - Social, Read/Write Web
WEB 3.0 - Semantic Web = Web of Data
? WEB 4.0 - Intelligent Web ?
* Assuming Christmas 1990 as its beginning
(http://en.wikipedia.org/wiki/History_of_the_World_Wide_Web)
The Web is only 8287 days* (23 years) old!
Print – 203,800 days
Newspapers – 142,800 days
Radio – 41,200 days
TV – 28,000 days
Web 1.0 – Web of documents
1989-2000 - Web of hyperlinked documents
Web 2.0 – Social/Read-Write Web
2000-2010 - The Web of Social Networks and the “Wisdom of the Crowds”
Web 3.0 – Semantic Web
2010-2020(?) - Web of Data, Linked Data Web
[Figure: resources (Organization, HR, Services, Products, People, Product) connected by typed links such as hasPeople, humanResources, hasServices, hasProducts, hasProduct and colleague]
What is wrong with today’s Web?
Web 1.0 & 2.0 major issues:
The WEB is TOO BIG to know
Social Web dwells in isolated silos
Data Deluge - Scientific data stored in isolated silos
People look at the Web through Google’s Goggles
THE SOLUTION: Semantic Web – Web 3.0
What is the Semantic Web?
The Semantic Web is a Web of data. It is an extension of the current Web that provides an easier way to find, share, reuse and combine information.
“The vision of the Semantic Web is to extend principles of the Web from documents to data. (...) This also means creation of a common framework that allows data to be shared and reused across application, enterprise, and community boundaries, to be processed automatically by tools as well as manually, including revealing possible new relationships among pieces of data.” http://www.w3.org/2001/sw/
Foundations of the Semantic Web
• “Semantic” in “Semantic Web” is about the MEANING of data, not about the syntax it is expressed in. Semantic Web = Web Full of Meaning = Web of meaningful Data.
• The Semantic Web is about representing THINGS (OBJECTS and CONCEPTS) and their properties on the Web, not just documents.
• The Semantic Web uses a global NAMING scheme to identify THINGS, not just to address documents.
• The Semantic Web links THINGS with TYPED LINKS, not with “blind” hyperlinks.
• The Semantic Web allows DISCOVERY of new FACTS about THINGS, not just browsing through pages.
* Picture by Roger Sayle (http://pubs.acs.org/doi/abs/10.1021/ci800243w)
Example
SMILES: COC(=O)[C@H](C1=CC=CC=C1Cl)N2CCC3=C(C2)C=CS3
InChI: InChI=1S/C16H16ClNO2S/c1-20-16(19)15(12-4-2-3-5-13(12)17)18-8-6-14-11(10-18)7-9-21-14/h2-5,7,9,15H,6,8,10H2,1H3/t15-/m0/s1
InChIKey: GKTWGGQPFAXNFI-HNNXBMFYSA-N
“Plavix” (Clopidogrel)
* Based on “Foreign Language Translation of Chemical Nomenclature by Computer” by Roger Sayle (DOI: 10.1021/ci800243w)
http://www.chemspider.com/InChIKey=GKTWGGQPFAXNFI-HNNXBMFYSA-N
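The InChI string on this slide is itself structured: after the version prefix come slash-separated layers, each one after the formula introduced by a letter such as c (connections), h (hydrogens), and t, m and s (stereochemistry). The Python sketch below splits the clopidogrel InChI into those layers and checks the shape of its InChIKey (14, 10 and 1 uppercase letters separated by hyphens). It assumes each layer letter appears only once, which holds for this standard InChI but not for every InChI, and it does not recompute the key, which requires the hashing done by the InChI software.

```python
import re

INCHI = ("InChI=1S/C16H16ClNO2S/c1-20-16(19)15(12-4-2-3-5-13(12)17)"
         "18-8-6-14-11(10-18)7-9-21-14/h2-5,7,9,15H,6,8,10H2,1H3/t15-/m0/s1")
INCHI_KEY = "GKTWGGQPFAXNFI-HNNXBMFYSA-N"


def inchi_layers(inchi: str) -> dict[str, str]:
    """Split an InChI into version, formula and letter-prefixed layers."""
    header, formula, *rest = inchi.split("/")
    layers = {"version": header.removeprefix("InChI="), "formula": formula}
    for layer in rest:
        # The first character names the layer (c, h, t, m, s, ...).
        # Assumption: each letter occurs once, true for this standard InChI.
        layers[layer[0]] = layer[1:]
    return layers


def looks_like_inchikey(key: str) -> bool:
    """Shape check only: 14 letters, hyphen, 10 letters, hyphen, 1 letter."""
    return re.fullmatch(r"[A-Z]{14}-[A-Z]{10}-[A-Z]", key) is not None


layers = inchi_layers(INCHI)
print(layers["formula"])               # C16H16ClNO2S
print(layers["t"], layers["s"])        # 15- 1
print(looks_like_inchikey(INCHI_KEY))  # True
```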
How do we represent THINGS on the Semantic Web?
On the Semantic WEB we represent THINGS using elementary UNITS of data: TRIPLES.
We can create logical and structural relations between elements of the triple, build taxonomies, vocabularies and classes and finally “reason” on large sets of triples.
The triples are stored using RDF (Resource Description Framework), a data model that can be written down in several file formats.
For example:
Subject     Predicate              Object
Thing       Property               Value
:H2O        gnvc:hasInChIString    "1S/H2O/h1H2"
:H2O        :hasMolecularMass      "18.0153"
“RDF is for THINGS as HTML is for DOCUMENTS”
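To make the triple idea concrete, here is a minimal Python sketch that stores the two water statements above as (subject, predicate, object) tuples and answers simple pattern queries, with None acting as a wildcard. The prefixed names are kept as plain strings; this is a toy in-memory store for illustration, not an RDF library.

```python
from typing import Optional

Triple = tuple[str, str, str]

# The two statements about water from the slide, as (subject, predicate, object).
TRIPLES: set[Triple] = {
    (":H2O", "gnvc:hasInChIString", "1S/H2O/h1H2"),
    (":H2O", ":hasMolecularMass", "18.0153"),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return all triples matching the pattern; None matches any value."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything we know about water:
for triple in match(s=":H2O"):
    print(triple)

# Just its molecular mass:
print(match(p=":hasMolecularMass")[0][2])  # 18.0153
```

Query engines such as SPARQL processors generalize exactly this kind of pattern matching to joins over many patterns.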
How do we identify THINGS on the Semantic Web?
For unambiguous identification of things (objects) and their properties on the Web, the Semantic Web uses URIs (Uniform Resource Identifiers), a generalization of URLs, i.e. ordinary Web addresses:
Water – Molecular Mass – “18.0153”
http://www.chemicalsemantics.com/h2o – http://purl.org/chem/ns#MM – a number
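With full URIs in place of prefixed names, the water statement can be written in N-Triples, the simplest line-based RDF serialization. The Python sketch below builds that line; typing the value as xsd:double is our assumption (the slide only says "a number"), and the escaping covers only backslashes, quotes and line breaks.

```python
from typing import Optional

XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def iri(uri: str) -> str:
    """Write a full URI in N-Triples form: <...>."""
    return f"<{uri}>"


def literal(value: str, datatype: Optional[str] = None) -> str:
    """Write a literal, escaping backslashes, quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    text = f'"{escaped}"'
    return f"{text}^^{iri(datatype)}" if datatype else text


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """One N-Triples statement: subject, predicate, object, then ' .'."""
    return f"{iri(subject)} {iri(predicate)} {obj} ."


line = ntriple(
    "http://www.chemicalsemantics.com/h2o",
    "http://purl.org/chem/ns#MM",
    literal("18.0153", XSD_DOUBLE),  # the xsd:double datatype is an assumption
)
print(line)
```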
RDF Serialization – preliminary example
RDF/XML or Turtle (Terse RDF Triple Language):
@prefix cs: <http://ChemicalSemantics.com/chem/dictionary/ns#> .
@prefix mol: <http://ChemicalSemantics.com/chem/molecules/simplewater.ttl#> .
@prefix xs: <http://www.w3.org/2001/XMLSchema#> .
mol:molecule_31 a cs:molecule ;
    cs:name "water" ;
    cs:atom _:atom31_1 ;
    cs:atom _:atom31_2 ;
    cs:atom _:atom31_3 ;
    cs:bond _:bond31_1 ;
    cs:bond _:bond31_2 .
_:atom31_1 cs:atomType cs:O ;
    cs:x3 "-0.381950"^^xs:double ;
    cs:y3 "0.243825"^^xs:double ;
    cs:z3 "0.000000"^^xs:double .
_:atom31_2 cs:atomType cs:H ;
    cs:x3 "-0.381950"^^xs:double ;
    cs:y3 "1.203825"^^xs:double ;
    cs:z3 "0.000000"^^xs:double .
_:atom31_3 cs:atomType cs:H ;
    cs:x3 "0.523148"^^xs:double ;
(.....)
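The coordinates in the Turtle excerpt are enough for a quick sanity check. This Python sketch computes the distance between the oxygen (_:atom31_1) and the first hydrogen (_:atom31_2). The slide does not state units, so ångström is an assumption, and the second O–H distance cannot be computed because the rest of _:atom31_3 is elided.

```python
import math

# Coordinates copied from the Turtle example (cs:x3, cs:y3, cs:z3).
# Units are not given on the slide; angstrom is assumed here.
oxygen = (-0.381950, 0.243825, 0.000000)      # _:atom31_1, cs:O
hydrogen_1 = (-0.381950, 1.203825, 0.000000)  # _:atom31_2, cs:H

oh_distance = math.dist(oxygen, hydrogen_1)
print(f"O-H distance: {oh_distance:.4f}")  # O-H distance: 0.9600
```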
Semantic Web allows Discovery
Semantic Web tools for building “intelligent” vocabularies, RDFS (RDF Schema) and OWL ontologies, allow for simple logical INFERENCES and the discovery of IMPLICIT facts. For example, when a user searches for a molecule with specific properties, the system can automatically offer other molecules that belong to the same “class” of molecules.
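The kind of discovery described above can be illustrated with two RDFS rules: rdfs:subClassOf is transitive, and an instance of a class is also an instance of all its superclasses. In this Python sketch all ex: class and molecule names are hypothetical. A query for other members of ex:Alcohol returns methanol only through inference, because methanol is asserted to be an ex:PrimaryAlcohol, not an ex:Alcohol.

```python
from collections import defaultdict

# Hypothetical toy data (illustrative names, not a real chemical ontology).
SUBCLASS_OF = {  # (subclass, superclass) pairs, i.e. rdfs:subClassOf
    ("ex:PrimaryAlcohol", "ex:Alcohol"),
    ("ex:Alcohol", "ex:OrganicCompound"),
}
TYPE_OF = {  # (instance, class) pairs, i.e. rdf:type
    ("ex:ethanol", "ex:PrimaryAlcohol"),
    ("ex:methanol", "ex:PrimaryAlcohol"),
    ("ex:isopropanol", "ex:Alcohol"),
}


def superclasses(cls: str) -> set[str]:
    """All classes reachable via rdfs:subClassOf (transitive, includes cls)."""
    result, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in SUBCLASS_OF:
            if sub == current and sup not in result:
                result.add(sup)
                frontier.append(sup)
    return result


def inferred_types() -> dict[str, set[str]]:
    """Apply the RDFS rule: x a C, C subClassOf D  =>  x a D."""
    types = defaultdict(set)
    for thing, cls in TYPE_OF:
        types[thing] |= superclasses(cls)
    return dict(types)


def same_class_as(molecule: str, cls: str) -> list[str]:
    """Other molecules that explicitly or implicitly belong to cls."""
    types = inferred_types()
    return sorted(m for m, ts in types.items() if cls in ts and m != molecule)


print(same_class_as("ex:ethanol", "ex:Alcohol"))  # ['ex:isopropanol', 'ex:methanol']
```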
Semantic Web = GGG (Giant Global Graph)
[Figure: graph of resources (Organization, HR, Services, Products, People, Product) connected by typed links such as hasPeople, humanResources, hasServices, hasProducts, hasProduct and colleague]
GGG – a term coined by Tim Berners-Lee in 2007
Oops… sorry, but it’s BIG
Core Semantic Web Technologies
RDF – Resource Description Framework
RDFa – RDF “in attributes”
RDFS – RDF Schema (Resource Description Framework Schema Language)
OWL – Web Ontology Language
SPARQL – SPARQL Protocol and RDF Query Language
RIF – Rule Interchange Format
RDF deals with THINGS
RDFa enables embedding RDF in ordinary HTML Web pages
RDFS deals with SETS and CLASSES of THINGS
OWL deals with intelligent VOCABULARIES (with logical relations between concepts)
SPARQL allows searching through graphs of triples stored in “triple stores”
RIF allows expressing and interchanging generalized IF...THEN constructs
... and a word about Semantic Web philosophy:
AAA – Anyone can say Anything about Any topic.
OWA – Open World Assumption. We must assume that a new piece of information may arrive at any time, so we cannot assume that we hold ALL the information at the moment we consume it. It also means that not knowing something does not necessarily imply that it is false!
Hendler Hypothesis: “A Little Semantics Goes A Long Way”
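The difference between the open and closed world views shows up as soon as we query for a missing fact. In this Python sketch (hypothetical ex: data), a closed-world store answers "false" while an open-world consumer can only say "unknown". A real OWL reasoner can still derive "false" from explicit axioms such as class disjointness; that is not modelled here.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical facts we currently hold: (subject, predicate, object).
KNOWN = {("ex:water", "ex:hasMeltingPointK", "273.15")}


def ask_closed_world(triple) -> Answer:
    """Closed World Assumption: anything not stated is false."""
    return Answer.TRUE if triple in KNOWN else Answer.FALSE


def ask_open_world(triple) -> Answer:
    """Open World Assumption: a missing statement is merely unknown."""
    return Answer.TRUE if triple in KNOWN else Answer.UNKNOWN


query = ("ex:water", "ex:hasBoilingPointK", "373.15")
print(ask_closed_world(query))  # Answer.FALSE
print(ask_open_world(query))    # Answer.UNKNOWN
```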
Linked Data four principles:
• Use WEB ADDRESSES (URLs) as names for things.
• Use ADDRESSES THAT WORK ON THE WEB - so that people can look up those names.
• When someone looks up a URL, PROVIDE USEFUL INFORMATION, USING THE STANDARDS (like RDF).
• Include LINKS TO OTHER URLs, so that they can discover more things.
Hendler Hypothesis in action...
The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data. (Tim Berners-Lee)
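The Linked Data principles amount to a "follow your nose" procedure: look up a URI, read the triples it returns, and follow the URIs found among them. The Python sketch below simulates this over a tiny in-memory dictionary standing in for the Web. All ex: names (including ex:sameStructureAs) are hypothetical, no HTTP requests are made, and a hop limit keeps the crawl bounded.

```python
from collections import deque

# Hypothetical "Web": each URI dereferences to the triples published about it.
WEB = {
    "ex:clopidogrel": [
        ("ex:clopidogrel", "ex:brandName", '"Plavix"'),
        ("ex:clopidogrel", "ex:sameStructureAs", "ex:chemspider-entry"),
    ],
    "ex:chemspider-entry": [
        ("ex:chemspider-entry", "ex:hasInChIKey", '"GKTWGGQPFAXNFI-HNNXBMFYSA-N"'),
    ],
}


def is_link(term: str) -> bool:
    """Quoted terms are literals; everything else is a URI we may follow."""
    return not term.startswith('"')


def follow_your_nose(start: str, max_hops: int = 3) -> list[tuple[str, str, str]]:
    """Breadth-first: look up a URI, collect its triples, follow object links."""
    seen, queue, collected = {start}, deque([(start, 0)]), []
    while queue:
        uri, hops = queue.popleft()
        for triple in WEB.get(uri, []):  # stands in for dereferencing the URI
            collected.append(triple)
            target = triple[2]
            if is_link(target) and target not in seen and hops < max_hops:
                seen.add(target)
                queue.append((target, hops + 1))
    return collected


for t in follow_your_nose("ex:clopidogrel"):
    print(t)
```

Starting from one URI, the crawl discovers the InChIKey that was never published at the starting point, which is the point of the Berners-Lee quote.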
Ontologies
“An ontology formally represents knowledge as a set of concepts within a domain, and the relationships between pairs of concepts. It can be used to model a domain and support reasoning about concepts.” (Wikipedia)
The fundamental goals of ontologies:
Define concepts used in Semantic graphs (like RDF)
Enable terminological standardisation
Provide tools for building intelligent dictionaries with synonyms and cross-references
Enable encoding of taxonomies (hierarchical definitions)
Enable reasoning and inferencing – discovering implicit knowledge
Antoine Lavoisier “Traité élémentaire de chimie”
Early ideas in ontology
"We think only through the medium of words. --Languages are true analytical methods. (…) The art of reasoning is nothing more than a language well arranged.
Thus, while I thought myself employed only in forming a Nomenclature, and while I proposed to myself nothing more than to improve the chemical language, my work transformed itself by degrees, without my being able to prevent it, into a treatise upon the Elements of Chemistry."
Nivaldo J. Tro “Chemistry: A Molecular Approach”
Example of an ontology: “Hello world”
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix chem: <http://purl.org/chem/simple_classification#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foo: <http://example.com/this/> .
## Classes
chem:Matter a rdfs:Class ;
    rdfs:label "Matter"@en, "Matière"@fr, "Materia"@pl .
chem:PureSubstances a rdfs:Class ;
    rdfs:label "Pure Substances"@en, "Substances Pures"@fr, "Substancja"@pl ;
    rdfs:subClassOf chem:Matter .
chem:Mixture a rdfs:Class ;
    rdfs:label "Mixture"@en, "Mélange"@fr, "Mieszanina"@pl ;
    rdfs:subClassOf chem:Matter .
chem:Heterogeneous a rdfs:Class ;
    rdfs:label "Heterogeneous"@en, "Hétérogène"@fr, "Heterogeniczny"@pl ;
    rdfs:subClassOf chem:Mixture .
chem:Homogeneous a rdfs:Class ;
    rdfs:label "Homogeneous"@en, "Homogène"@fr, "Jednorodny"@pl ;
    rdfs:subClassOf chem:Mixture .
## Properties
chem:atomicNumber a rdf:Property ;
    rdfs:domain chem:Element ; rdfs:range rdfs:Literal .
chem:moleculeName a rdf:Property ;
    rdfs:domain chem:Compound ; rdfs:range rdfs:Literal .
chem:componentName a rdf:Property ;
    rdfs:domain chem:Mixture ; rdfs:range chem:Matter .
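Two things this small ontology already buys us can be sketched in Python: picking a label in the reader's language, and the RDFS domain rule, which lets a consumer infer the class of a resource from the properties used on it. The ex:oxygen instance is hypothetical, and the label lookup with an English fallback is our own convention, not part of RDFS.

```python
# A few statements from the "Hello world" ontology, as Python data.
LABELS = {
    "chem:Mixture": {"en": "Mixture", "fr": "Mélange", "pl": "Mieszanina"},
    "chem:Homogeneous": {"en": "Homogeneous", "fr": "Homogène", "pl": "Jednorodny"},
}
DOMAIN = {  # property -> its rdfs:domain
    "chem:atomicNumber": "chem:Element",
    "chem:moleculeName": "chem:Compound",
    "chem:componentName": "chem:Mixture",
}


def label(cls: str, lang: str, fallback: str = "en") -> str:
    """Pick the rdfs:label for the requested language, else the fallback."""
    labels = LABELS.get(cls, {})
    return labels.get(lang, labels.get(fallback, cls))


def infer_types_from_domain(triples):
    """RDFS domain rule: (x, p, y) and p rdfs:domain C  =>  (x, rdf:type, C)."""
    return {(s, "rdf:type", DOMAIN[p]) for s, p, _ in triples if p in DOMAIN}


# Hypothetical instance data: nothing states that ex:oxygen is an Element.
data = [("ex:oxygen", "chem:atomicNumber", "8")]
print(infer_types_from_domain(data))  # {('ex:oxygen', 'rdf:type', 'chem:Element')}
print(label("chem:Mixture", "pl"))    # Mieszanina
# There is no German label, so the English one is returned:
print(label("chem:Mixture", "de"))    # Mixture
```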
Non-Trivial Ontologies in Chemistry
ChEBI – Chemical Entities of Biological Interest
http://www.ebi.ac.uk/chebi/
Project of EMBL-EBI
European Bioinformatics Institute (Cambridge), part of the European Molecular Biology Laboratory (Heidelberg)
OBO Foundry Ontology (http://www.obofoundry.org/)
The Open Biological and Biomedical Ontologies
Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds. The term ‘molecular entity’ refers to any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity. The molecular entities in question are either products of nature or synthetic products used to intervene in the processes of living organisms. ChEBI incorporates an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified.
Non-Trivial Ontologies in Chemistry
ChemINF – Chemical Information Ontology
https://code.google.com/p/semanticchemistry/
Janna Hastings, Nico Adams, Christoph Steinbeck (EBI)
Leonid Chepelev, Michel Dumontier, Egon Willighagen, Nico Adams
OBO Foundry Candidate
ChemINF describes:
• Chemical graphs, and various formats for encoding them.
• Chemical descriptors, with definitions and axioms describing what they are specifically about.
• Specifications for certain descriptors.
• Algorithms and their software implementations, and axioms describing their inputs and outputs.
• Chemical data representation formalisms and formats.
Chemical Semantics Ontology
http://purl.org/gc/gc.owl
Gainesville Core (alpha edition)
Gainesville Core describes:
• Molecular Publications
• Molecular Systems
• Molecular Calculations
• Molecular Systems contain Molecules
• The Molecules may have Residues (for biopolymers and polymers)
• Molecular Calculations contain Initial Data and Results
• The Initial Data may have Methods, Basis Sets, Functionals, etc.
• The Results may have Energies, Wave Functions, Spectra, etc.
GC aims at a complete description of a typical computational chemistry experiment.
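The containment structure listed above can be mirrored as nested records. This Python sketch is only an illustration of the hierarchy: the class and field names are hypothetical rather than gc.owl terms, and the DFT/B3LYP/6-31G* calculation on water is an invented example with no results filled in.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mirror of the Gainesville Core containment described above.
# Names are illustrative only; they are not the terms defined in gc.owl.


@dataclass
class Molecule:
    name: str
    residues: list[str] = field(default_factory=list)  # only for (bio)polymers


@dataclass
class InitialData:
    method: str
    basis_set: Optional[str] = None
    functional: Optional[str] = None  # relevant for DFT calculations


@dataclass
class Results:
    energies: dict[str, float] = field(default_factory=dict)
    wave_functions: list[str] = field(default_factory=list)
    spectra: list[str] = field(default_factory=list)


@dataclass
class MolecularCalculation:
    initial_data: InitialData
    results: Results = field(default_factory=Results)


@dataclass
class MolecularSystem:
    molecules: list[Molecule]


@dataclass
class MolecularPublication:
    system: MolecularSystem
    calculations: list[MolecularCalculation] = field(default_factory=list)


# A publication of a single DFT calculation on water (no results filled in).
pub = MolecularPublication(
    system=MolecularSystem(molecules=[Molecule("water")]),
    calculations=[
        MolecularCalculation(InitialData(method="DFT", basis_set="6-31G*", functional="B3LYP"))
    ],
)
print(pub.calculations[0].initial_data.functional)  # B3LYP
```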
Chemical Semantics Ontology
gc.owl with Protégé
Related Ontologies ...
SIO – Semanticscience Integrated Ontology
OPB – Ontology of Physics for Biology
RXNO – Name Reaction Ontology
CMO – Chemical Methods Ontology
MOP – Molecular Processes Ontology
SO – The Sequence Ontology Project
Importance of Structural Data
Structures
CML – Chemical Markup Language
“CML is not 'just another file format'; it is capable of holding extremely complex information structures and so acting as an interchange mechanism or for archival. It interfaces easily with modern database architectures such as relational databases or object-oriented databases. Most importantly, a large amount of generic XML software to process and transform it is already available from the community.”
P. Murray-Rust, H. S. Rzepa, 2001
CML “paved the road” to semantics in chemistry.
It is extremely useful as an interchange format between computational chemistry (CC) software and the Semantic Web.
Our position: Chemical Semantics will use CSX, a similar structural format enriched by an explicit description of molecular constituents and an enriched description of computation inputs and results.
A timeline of Semantic Web
RDF – 1999
CML (Chemical Markup Language) – 1999
FOAF – 2000
RDFa – 2004
DBpedia – 2007
ChEBI (Chemical Entities of Biological Interest) – 2007
GoodRelations – 2008 (Google adoption: November 2, 2010)
Schema.org – June 2011
Google’s Knowledge Graph – May 2012
Facebook Graph Search – January 2013
Conclusion
An emerging successor to the web, the Semantic Web, will likely profoundly change the very nature of how scientific knowledge is produced and shared, in ways that we can now barely imagine.
Chemical Semantics Portal
http://portal.chemicalsemantics.com/cs
CS Portal main targets
• Interoperable PUBLISHING of Computational Chemistry calculations
• FEDERATION of published data with existing web-based chemical datasets
• Cloud-like ARCHIVING of Computational Chemistry calculation results, input/output files, etc.
http://portal.chemicalsemantics.com/cs
http://portal.chemicalsemantics.com/cs
• Manual publication (upload)
• Automated publication directly from modelling software via a Web API
http://portal.chemicalsemantics.com/cs
Automated generation of permanent URIs
Permanent Chemical URIs
Automated generation of permanent URIs
http://purl.org/chem/pub/2013-08-04-quercetin
• Owned & controlled by OCLC (Online Computer Library Center); claimed to be persistent and eternal.
• Owned by OCLC, controlled by Chemical Semantics, Inc.
• Generated by Chemical Semantics, Inc. for the user; owned by the user.
URI naming scheme
Publication: http://purl.org/chem/pub/2013-08-05-betacyanin
Molecular Calculations: http://purl.org/chem/pub/2013-08-05-betacyanin/mol-calc
Molecular System: http://purl.org/chem/pub/2013-08-05-betacyanin/molSys
A Molecule of the system: http://purl.org/chem/pub/2013-08-05-betacyanin/molSys/m1
Bonds between atoms in the molecule: http://purl.org/chem/pub/2013-08-05-betacyanin/molSys/m1/a1a12
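Because the naming scheme is hierarchical, every URI can be derived from the publication URI. A minimal Python sketch follows; the helper names are hypothetical, and reading "a1a12" as "the bond between atoms a1 and a12" is our interpretation of the slide, not a documented rule of the portal.

```python
BASE = "http://purl.org/chem/pub"


def publication_uri(date: str, name: str) -> str:
    """Publication URI: <base>/<YYYY-MM-DD>-<name>."""
    return f"{BASE}/{date}-{name}"


def calculations_uri(pub: str) -> str:
    return f"{pub}/mol-calc"


def system_uri(pub: str) -> str:
    return f"{pub}/molSys"


def molecule_uri(pub: str, index: int) -> str:
    return f"{system_uri(pub)}/m{index}"


def bond_uri(pub: str, mol_index: int, atom_a: int, atom_b: int) -> str:
    """Bond URI from the two atom labels, e.g. a1a12 (our reading of the slide)."""
    return f"{molecule_uri(pub, mol_index)}/a{atom_a}a{atom_b}"


pub = publication_uri("2013-08-05", "betacyanin")
print(pub)
print(calculations_uri(pub))
print(system_uri(pub))
print(molecule_uri(pub, 1))
print(bond_uri(pub, 1, 1, 12))
```

Deriving child URIs from the parent keeps every resource under the publication's persistent prefix, so the whole publication can be resolved through a single PURL.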
Dual nature of the URIs
Realizes Linked Data Principles
For Humans (i.e. as seen via a web browser):
http://purl.org/chem/pub/2013-08-02-pyridine_base
Returns:
Dual nature of the URIs
Realizes Linked Data Principles
For Machines (i.e. as seen via tools such as rdfEditor or Fiddler):
http://purl.org/chem/pub/2013-08-02-pyridine_base
Returns:
Content negotiation: “One gets what one asks for”
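The same URI returns a Web page to a browser and RDF to a semantic tool. The server-side choice can be sketched in framework-free Python based on the HTTP Accept header and its q-values. The list of media types the portal actually serves is an assumption, and the parser ignores media-type parameters other than q.

```python
from typing import Optional

# Media types the server can produce, in order of server preference.
# Which formats the CS Portal really offers is an assumption of this sketch.
SUPPORTED = ["text/html", "text/turtle", "application/rdf+xml"]


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media, q = pieces[0].lower(), 1.0
        for param in pieces[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        if media:
            ranges.append((media, q))
    return ranges


def specificity(media_range: str) -> int:
    """*/* is least specific, type/* in between, type/subtype most specific."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def matches(media_range: str, media_type: str) -> bool:
    """True if a media range such as 'text/*' or '*/*' covers media_type."""
    if media_range in ("*/*", media_type):
        return True
    main, _, sub = media_range.partition("/")
    return sub == "*" and media_type.startswith(main + "/")


def quality(media_type: str, ranges: list[tuple[str, float]]) -> float:
    """q of the most specific matching range, or 0.0 if nothing matches."""
    best: Optional[tuple[int, float]] = None
    for media_range, q in ranges:
        if matches(media_range, media_type):
            candidate = (specificity(media_range), q)
            if best is None or candidate[0] > best[0]:
                best = candidate
    return best[1] if best is not None else 0.0


def negotiate(accept: str) -> Optional[str]:
    """Pick a representation; None means nothing acceptable was found."""
    ranges = parse_accept(accept or "*/*")  # no Accept header: anything goes
    scored = [(quality(t, ranges), -i, t) for i, t in enumerate(SUPPORTED)]
    q, _, media_type = max(scored)  # ties go to the server's preferred type
    return media_type if q > 0 else None


# A browser gets HTML; an RDF-aware client gets Turtle.
print(negotiate("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(negotiate("text/turtle, application/rdf+xml;q=0.5"))
```

When negotiate returns None, a server would typically answer 406 Not Acceptable or fall back to a default representation.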
More on “Human-oriented” views
“Results” – a prototype for future publication “digest”
More on “Human-oriented” views
“Molecules” – generic, WebGL-based molecular viewer
More on “Human-oriented” views
“Wave function” – visualization of orbital energies
More on “Human-oriented” views
“Graph” – explore the knowledge structure about your system
More on “Human-oriented” views
“Data Federation” – explore Semantic Links to external resources
More on “Human-oriented” views
“Data sets” – use CS Portal for archiving purposes
SPARQL queries on CS Portal
Counting the number of triples in each graph of the CS Portal
SELECT ?graph (COUNT(*) AS ?count)
WHERE {
  GRAPH ?graph { ?s ?p ?o . }
}
GROUP BY ?graph
ORDER BY DESC(?count)
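For readers less familiar with SPARQL aggregation, this query is a group-and-count over (graph, subject, predicate, object) quads. A Python equivalent over a handful of hypothetical quads (graph and resource names invented for illustration) looks like this:

```python
from collections import Counter

# Hypothetical quads: (graph, subject, predicate, object).
QUADS = [
    ("g:water", "ex:m1", "rdf:type", "ex:Molecule"),
    ("g:water", "ex:m1", "ex:hasAtom", "ex:a1"),
    ("g:water", "ex:a1", "ex:isElement", '"O"'),
    ("g:methane", "ex:m2", "rdf:type", "ex:Molecule"),
    ("g:methane", "ex:m2", "ex:hasAtom", "ex:a2"),
]

# Same idea as: SELECT ?graph (COUNT(*) AS ?count) ... GROUP BY ?graph ORDER BY DESC(?count)
counts = Counter(graph for graph, _s, _p, _o in QUADS)
for graph, count in counts.most_common():
    print(graph, count)
```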
SPARQL queries on CS Portal
Counting the atoms of each element in all molecular systems on the CS Portal
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX gc: <http://purl.org/gc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?element (COUNT(*) AS ?count)
WHERE { ?atom gc:isElement ?element . }
GROUP BY ?element
ORDER BY DESC(?count)
SPARQL queries on CS Portal
Number of calculation results of each type in all molecular systems of the CS Portal
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX gc: <http://purl.org/gc/>
SELECT ?resultType (COUNT(*) AS ?count)
WHERE {
  GRAPH ?graph {
    ?calc rdf:type gc:Calculation ;
          gc:hasResult ?result .
    ?result rdf:type ?resultType .
  }
}
GROUP BY ?resultType
ORDER BY DESC(?count)
SPARQL queries on CS Portal
Number of molecular systems with halogen atoms on the CS Portal
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX gc: <http://purl.org/gc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?graph
WHERE {
  GRAPH ?graph {
    ?something gc:hasAtom ?atom ;
               rdf:type ?somethingType ;
               rdfs:label ?somethingLabel .
    ?atom gc:isElement ?element .
    FILTER (?element IN ("F", "Cl", "Br", "I", "At"))
  }
}
SPARQL queries on CS Portal
Number of inorganic molecular systems
## Show all molecular systems that contain atoms other than C, O, N, H
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX gc: <http://purl.org/gc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?graph
WHERE {
  GRAPH ?graph {
    ?mol gc:hasAtom ?atom .
    ?atom gc:isElement ?element .
    FILTER (?element NOT IN ("C", "O", "N", "H"))
  }
}
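Two readings of "contains atoms other than C, O, N, H" are easy to confuse: a system with at least one such atom, and a system in which every atom is such. A query built from chained MINUS blocks over whole graphs gives the second reading; a FILTER over each atom's element gives the first. This Python sketch, over hypothetical systems, shows how the two results differ.

```python
# Hypothetical molecular systems: graph name -> elements of its atoms.
SYSTEMS = {
    "g:water": ["O", "H", "H"],
    "g:chloromethane": ["C", "H", "H", "H", "Cl"],
    "g:sodium-chloride": ["Na", "Cl"],
}
CHON = {"C", "H", "O", "N"}


def has_non_chon_atom(elements) -> bool:
    """At least one atom is not C, H, O or N (per-atom FILTER reading)."""
    return any(e not in CHON for e in elements)


def has_no_chon_atom(elements) -> bool:
    """No atom at all is C, H, O or N (whole-graph MINUS reading)."""
    return all(e not in CHON for e in elements)


print(sorted(g for g, els in SYSTEMS.items() if has_non_chon_atom(els)))
# ['g:chloromethane', 'g:sodium-chloride']
print(sorted(g for g, els in SYSTEMS.items() if has_no_chon_atom(els)))
# ['g:sodium-chloride']
```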
SPARQL queries on CS Portal
Energy values computed for all molecular systems
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gc: <http://purl.org/gc/>
SELECT ?sysEnergy ?energyValue ?energyName
WHERE {
  GRAPH ?graph {
    ?molSys rdf:type gc:MolecularSystem ;
            gc:hasCalculationOn ?molCalc .
    ?molCalc rdf:type gc:Calculation ;
             gc:hasResult ?sysEnergy .
    ?sysEnergy rdf:type gc:SystemEnergies ;
               ?p ?o .
    ?o gc:hasFloatValue ?energyValue ;
       rdfs:label ?energyName .
  }
}
ORDER BY ?energyName
Stay tuned ...
If you want to work with us, or just share your opinions, do not hesitate to contact us.
Thank you…
Neil Ostlund, Hypercube, Inc.
1115 NW 4th St. Gainesville, FL 32608, USA
Phone: (352) 371 7744
Web: www.hyper.com
Mirek Sopek, MakoLab SA
Demokratyczna 46, 93-430 Lodz, Poland
Phone: +48 600 814 537
Web: www.makolab.com