22
The role of ontologies within the semantic web Wider data integration Wider data integration Simon Jupp, James Malone [email protected] , [email protected]

Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone [email protected] , [email protected]

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

The role of ontologies within the semantic web

Wider data integrationWider data integration

Simon Jupp, James Malone

[email protected] , [email protected]

Page 2: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Outline

• Overview of the Semantic Web

• Resource Description Framework

• The role of ontologies in the semantic web

• Linked data

• Demo

Page 3: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Evolution of the web

• 1st generation web - linked documents

• Hand coded HTML

• 2nd generation web – the web as platform

• Web service, XML and REST APIs

• 3rd generation web – semantic web

• The web as a platform for publishing data

• Semantic markup that adds meaning to data for machine processing

Page 4: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

A resource of information http://en.wikipedia.org/wiki/Barack_Obama

Lots of data hidden in this page

How could a machine pull data about US presidents from Illinois?

Page 5: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Data as a graph

• Data naturally fits into graph based datastructures

Barack Obama

Person

Illlinois

Unites States Presidents

USA State

USA

How do we publish this data on the Web so a machine can process and “understand” it?

Page 6: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

RDF – Resource Description Framework

• RDF is a graphical language used for representing information about resources on the web.

• Resources are described in terms of properties and property values using RDF statements

• All statements in RDF are triple, consisting of a subject, predicate and object.

Page 7: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Triple statement

Barack Obama

Honolulubirth place

Subject

Predicate

Object

Page 8: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Identify things on the web

• Using existing web technology – the URI

http://dbpedia.org/page/Barack_Obama

http://dbpedia.org/page/Honolulu

http://dbpedia.org/property/birthPlace

Subject

Predicate

Object

Page 9: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Publishing and RDF as XML

• URIs give us a mechanism to identify “things” on the web

• RDF provides a base vocabulary describing how those things are related

• We need other vocabularies that give us different kinds of relationships

• Add shared meaning to things using additional vocabularies and ontologies

Page 10: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Ontologies provide the meaning

Barack Obama

Person

Honolulu

Unites States Presidents

USA State

USA

rdfs:subClassOf

rdf:type

dbpedia:birthPlace dbpedia:partOf

dbpedia:partOf

dbpedia:birthPlace

rdf:type

Page 11: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Dbpedia project

• Convert wikipedia info boxes into structured RDF

• RDF even has a query language called SPARQL

http://dbpedia.org/page/Barack_Obamahttp://dbpedia.org/data/Barack_Obama.rdf

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>PREFIX umbel:<http://umbel.org/umbel/rc/>PREFIX dbpediaowl:<http://dbpedia.org/ontology/>PREFIX resource:<http://dbpedia.org/resource/>

SELECT ?subject WHERE {?subject rdf:type umbel:PresidentOfOrganization .?subject dbpediaowl:birthPlace resource:Honolulu} http://tinyurl.com/cpbmjjf

Page 12: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Building a web of linked data

Life sciences

Page 13: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Uniprot resourcehttp://www.uniprot.org/uniprot/Q0S7M9

Page 14: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Now serving up RDFhttp://purl.uniprot.org/core/Q5UA70

Page 15: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Atlas - Which genes are expressed where and when?• Questions (we collected) that biologists would like to ask of EBI data:

“Differentially expressed genes in adult mice, bred in oxygen rich vs oxygen poor environments? Of this set, which biological processes (GO) are enriched?”

“Where are genes with antigen binding function differentially expressed, which disease and which associated pathways?”

“Get metformin associated pathways with differentially expressed genes, find any proteins that are targets for known diabetes drugs”

How can we help to answer these?

Page 16: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

HTML request (Human view)

Page 17: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

GXA schema as an RDF graph

Page 18: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Mapping ontologies

Page 19: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Integration adds knowledge

liver cancer

Pathway x Protein a

Gene Y

Gene X

species

Gene ZRegulates

Page 20: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

GXA

Data integration

• Primary use-case is to link to pathways• Reactome already publishing RDF

http://purl.uniprot.org/uniprot/Q5UAB1

Differentially expressed gene

Sample

Assay

Experiment

http://identifiers.org/ensembl/ENSG00000175793

Reactome Pathway

UniProt

GOAChEMBL

sio:encodes

sio:’is attribute of’

Page 21: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Demo queries collected from users (live demo)

• http://wwwdev.ebi.ac.uk/fgpt/gxa-sparql

• “Get top 10 overexpressed genes where the experimental factor is asthma”

• “Is the NID1 gene differentially in other respiratory system diseases?”

• “For the genes overexpressed in asthma, get the associated Reactome pathways.”

• “What is the function of genes from the previous query? (This query retrieves GO annotations from the UniProt)”

• “Which ChEMBL molecules target these proteins (retrieves data from ChEMBL”?

• Detecting inconsistencies in data annotations

Page 22: Wider data integration - BioMedBridges · The role of ontologies within the semantic web Wider data integration Simon Jupp, James Malone jupp@ebi.ac.uk , malone@ebi.ac.uk

Summary

• Semantic Web has promised much and delivered little to date

• Recently things have began to improve

• Google’s Knowledge Graph, Good Relations such examples

• EBI now has a trial group for RDF

• Not mature technology but something to keep in mind