Bio2RDF: A Semantic Web Atlas of post genomic knowledge about Human and Mouse

François Belleau, Nicole Tourigny, Benjamin Good and Jean Morissette

● Centre de Recherche du CHUL, Université Laval
● Département d'informatique et de génie logiciel, Université Laval

Evry, June 27, 2008, CHUL Research Center, Laval University

Vaugondy, geographer to Louis XV: view of the world in the 18th century

Google Maps view of the world in the 21st century

Outline

Introduction

− Problem definition
− Proposed approach
− The 4 rules of linked data
− Related work

Results

− Bio2RDF first knowledge map
− Semantic ranking

Paget query demo with SPARQL

Future work and Conclusion

Problem definition

● The objective of data integration is to make data distributed over a number of distinct, heterogeneous databases accessible via a single interface [Davidson 1995].

● We already use global text search engines on the web (Google, Yahoo).

● There are many specialized integrated search tools in bioinformatics (NCBI Entrez, EBI search, KEGG GenomeNet).

What is known about "Paget disease"?

But first...

What is known about the mouse and human genomes?

Popular web search engines without semantics

Some bioinformatics integrated search tools

● EMBL-EBI EB-eye search
● KEGG GenomeNet
● NCBI Entrez

EMBL-EBI search

NCBI Entrez: life science search across databases

KEGG GenomeNet search

Bio2RDF search

What is known about Paget disease in the mouse and human genomes?

Proposed approach

● Apply the semantic web model to data integration in bioinformatics;

● Use LinkRank, a variation of PageRank [Brin 1998] adapted to semantic graphs and analogous to the work of Aleman-Meza's group;

● Adopt standards (RDF, OWL) and use existing software (Sesame, Virtuoso, PiggyBank).
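The slides do not give the LinkRank formula, so as a point of reference here is a minimal sketch of plain PageRank [Brin 1998], computed by power iteration over a directed graph of URIs with the conventional damping factor of 0.85. It is not the authors' LinkRank; `pagerank` is an illustrative name, and the `geneid:example` and `pubmed:example` nodes are placeholders rather than real records.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Standard PageRank by power iteration over a directed link graph.

    `links` maps each node to the nodes it links to. Dangling nodes
    (no outgoing links) spread their rank uniformly over all nodes.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node first receives the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: redistribute its rank over the whole graph.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph; geneid/pubmed identifiers are placeholders, not real records.
links = {
    "geneid:example": ["go:0004396", "pubmed:example"],
    "pubmed:example": ["go:0004396"],
    "go:0004396": ["ec:2.7.1.1"],
}
for node, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{node:16} {score:.3f}")
```

Ranks sum to 1, and nodes that receive many links (or links from well-ranked nodes) score highest; a semantic-graph variant such as LinkRank would change how links are counted or weighted.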

The 4 rules of linked data

http://www.w3.org/DesignIssues/LinkedData

Rule #1: Use URIs as names for things.

● Using normalized identifiers to name concepts is already a reality in the biology domain.

● Hexokinase is GO:0004396
● Definition:

− Catalysis of the reaction: ATP + D-hexose = ADP + D-hexose 6-phosphate.

● Synonym of EC:2.7.1.1

Rule #2: Use HTTP URIs so that people can look up those names.

● Dereferenceable URLs
● The Banff Manifesto rule for URNs:

− urn:bm:public_namespace:private_identifier

● Normalized URL according to the Banff Manifesto:

− http://bio2rdf.org/public_namespace:private_identifier

● Example: http://bio2rdf.org/go:0004396
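The two naming patterns above can be produced mechanically from a `namespace:identifier` pair. A minimal sketch, assuming the namespace is lower-cased (inferred from GO:0004396 becoming go:0004396) and that only the first colon separates namespace from identifier; the function names are hypothetical.

```python
from typing import Tuple


def split_curie(curie: str) -> Tuple[str, str]:
    """Split 'GO:0004396' into ('go', '0004396').

    Assumption: the namespace is lower-cased, as in GO:0004396 -> go:0004396;
    only the first ':' separates namespace from identifier.
    """
    namespace, sep, identifier = curie.partition(":")
    if not sep or not namespace or not identifier:
        raise ValueError(f"expected namespace:identifier, got {curie!r}")
    return namespace.lower(), identifier


def banff_urn(curie: str) -> str:
    """URN form: urn:bm:public_namespace:private_identifier."""
    namespace, identifier = split_curie(curie)
    return f"urn:bm:{namespace}:{identifier}"


def bio2rdf_url(curie: str) -> str:
    """Dereferenceable form: http://bio2rdf.org/public_namespace:private_identifier."""
    namespace, identifier = split_curie(curie)
    return f"http://bio2rdf.org/{namespace}:{identifier}"


print(bio2rdf_url("GO:0004396"))  # http://bio2rdf.org/go:0004396
print(banff_urn("EC:2.7.1.1"))    # urn:bm:ec:2.7.1.1
```

Splitting on the first colon keeps identifiers such as 2.7.1.1 intact; identifiers that themselves contain colons would need a namespace-aware rule.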

Rule #3: When someone looks up a URI, provide useful information.

● http://bio2rdf.org/go:0004396 returns the RDF graph describing this topic
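Looking up a URI is an ordinary HTTP GET. A minimal sketch with Python's standard `urllib.request`, assuming the server uses content negotiation so that an `Accept: application/rdf+xml` header selects RDF rather than an HTML page. The slide only states that the URI returns the RDF graph, and today's bio2rdf.org may behave differently from the 2008 server; `rdf_request` and `fetch_rdf` are illustrative names.

```python
from urllib.request import Request, urlopen

RDF_XML = "application/rdf+xml"


def rdf_request(uri: str, accept: str = RDF_XML) -> Request:
    """Build a GET request that asks the server for RDF rather than HTML."""
    return Request(uri, headers={"Accept": accept})


def fetch_rdf(uri: str, timeout: float = 10.0) -> bytes:
    """Dereference a URI and return the raw RDF document (needs network access)."""
    with urlopen(rdf_request(uri), timeout=timeout) as response:
        return response.read()


req = rdf_request("http://bio2rdf.org/go:0004396")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# To actually dereference the URI (requires the server to be reachable):
# rdf_bytes = fetch_rdf("http://bio2rdf.org/go:0004396")
```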

Rule #4: Include links to other URIs so that they can discover more things.

● Openness Ratio > 0 (defined later)

Related work

● DBpedia
● YeastHub
● UniProt
● HCLS linked data
● Bio2RDF architecture

Related work – Linked data map

http://wiki.dbpedia.org/Interlinking

Related work – Linked data map

● If we were to draw a map of the existing relations between linked data from bioinformatics database providers, what would it look like?

● Could we measure the amount of post-genomic knowledge available about a mouse or human genome sequence?

● Could it help answer the "what is known" question?

Related work – YeastHub

Related work – UniProt beta

Related work – HCLS demo

Bio2RDF architecture

Current Bio2RDF datasources loaded in the Atlas graph

What is known about the human and mouse genomes in 2008?

What is bioinformatics linked data?

http://bio2rdf.org/map

The Bio2RDF linked data map is a first attempt at an answer

Semantic Web Ranking

● Openness Ratio
● Average Link Rank
● Semantic Weight

Openness Ratio

Average Link Rank

Semantic Weight

The semantic mashup effect

MeSH: OR = 0, ALR = 2

GeneID: OR = 1, ALR = 1

PubMed: OR = 0.5, ALR = 1.5

Mean OR = 0.5, mean ALR = 1.5

The semantic mashup effect

MeSH: OR = 0, ALR = 2.3

GeneID: OR = 1, ALR = 1

PubMed: OR = 0.5, ALR = 1.5

Mean OR = 0.4, mean ALR = 1.6

Bio2RDF statistics by datasource

The 30 current Bio2RDF datasources

MeSH: OR = 0

PubMed: OR = 0.5

GeneID: OR = 1

Bio2RDF: OR = 0.6, 30 datasources, 225 namespaces

Knowledge gain of 0.19

From 0.77 to 0.58
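Reading the two figures as the openness ratio before and after the datasources are merged into one graph (the Atlas statistics give an OR of 0.58 and a 19% knowledge gain from the mashup effect), the gain is simply the drop in OR:

```latex
\text{knowledge gain} = \mathrm{OR}_{\text{before}} - \mathrm{OR}_{\text{after}} = 0.77 - 0.58 = 0.19
```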

Bio2RDF Semantic Web Atlas in numbers

● 30 different datasources, 30 different namespaces

− go, geneid, uniprot, pubmed, pdb, reactome, omim, etc.

● 195 namespaces referencing non-rdfized datasources

− cog, genethon, tigr, cath, goa, etc.

● 8 million topics
● 65 million triples
● 973 MB, size of the compressed N3 data

− http://bio2rdf.org/download/bio2rdf-atlas-080414.n3.gz

Bio2RDF Semantic Web Atlas in statistics

● Openness Ratio (OR) of 0.58
● Average Link Rank (ALR) of 4.7
● 8 million topics are connected by 19 million relations within the graph
● 58% of URIs reference the open world outside the graph
● 19% knowledge gain because of the mashup effect
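These statistics equate an OR of 0.58 with 58% of URIs referencing the open world outside the graph. Reading the Openness Ratio that way, as the share of URI links whose target is not described in the graph, here is a minimal sketch under that assumption. It treats a URI as "inside" when it appears as a subject, ignores literal objects, and uses hypothetical toy triples and a hypothetical predicate `http://example.org/ref`. Merging the toy graphs shows the mashup effect: links that left each source land inside the combined graph, so OR drops.

```python
from typing import List, Set, Tuple

# A triple is (subject, predicate, object); URIs are plain strings.
Triple = Tuple[str, str, str]

# Hypothetical predicate used only for this toy example.
REF = "http://example.org/ref"


def uri(curie: str) -> str:
    """Build a Bio2RDF-style URI from a namespace:identifier pair."""
    return "http://bio2rdf.org/" + curie


def openness_ratio(triples: List[Triple]) -> float:
    """Share of URI objects that are not described as subjects in the graph.

    Assumption: a URI is 'inside' the graph if it appears as a subject;
    literal objects (not starting with http://) are ignored.
    """
    described: Set[str] = {s for s, _, _ in triples}
    targets = [o for _, _, o in triples if o.startswith("http://")]
    if not targets:
        return 0.0
    outside = sum(1 for o in targets if o not in described)
    return outside / len(targets)


# Toy graphs echoing the MeSH / GeneID / PubMed example (hypothetical ids).
mesh = [
    (uri("mesh:D1"), REF, uri("mesh:D2")),
    (uri("mesh:D2"), REF, uri("mesh:D1")),
]
geneid = [
    (uri("geneid:G1"), REF, uri("pubmed:P1")),
]
pubmed = [
    (uri("pubmed:P1"), REF, uri("mesh:D1")),
    (uri("pubmed:P1"), REF, uri("pubmed:P2")),
    (uri("pubmed:P2"), REF, '"an article title"'),
]

for name, graph in [("MeSH", mesh), ("GeneID", geneid), ("PubMed", pubmed)]:
    print(name, openness_ratio(graph))
# prints MeSH 0.0, GeneID 1.0, PubMed 0.5 (one per line)

# Mashup: once merged, every link resolves to a topic described in the graph.
print("merged", openness_ratio(mesh + geneid + pubmed))  # merged 0.0
```

In this toy case the merged graph is fully closed; in the real Atlas many references (e.g. to the 195 non-rdfized namespaces) still point outside, which is why OR stays above zero.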

Bio2RDF search demo with SPARQL

What is known about Paget disease in the mouse and human genomes?

Submitted at http://bio2rdf.org:8890/sparql

Submit the SPARQL query to Virtuoso

SPARQL query in a URL

http://bio2rdf.org:8890/sparql?default-graph-uri=&query=CONSTRUCT+%7B%0D%0A%3Fs1+%3Fp1+%3Fo1+.%0D%0A%3Fs1+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23type%3E+%3Ftype+.+%0D%0A%3Fs1+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23label%3E+%3Flabel.+%0D%0A%3Fs1+%3Chttp%3A%2F%2Fbio2rdf.org%2Fbio2rdf%23linkRank%3E+%3FlinkRank.+%0D%0A%7D%0D%0AWHERE+%7B%0D%0A%3Fs1+%3Fp1+%3Fo1+.+%0D%0A%3Fo1+bif%3Acontains+%22paget%22+.%0D%0A%3Fs1+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23type%3E+%3Ftype+.+%0D%0A%3Fs1+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23label%3E+%3Flabel.+%0D%0A%3Fs1+%3Chttp%3A%2F%2Fbio2rdf.org%2Fbio2rdf%23linkRank%3E+%3FlinkRank.+%0D%0A%7D%0D%0A%0D%0A%0D%0A%0D%0A&format=application%2Frdf%2Bxml&debug=on
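This URL is the GET form of a SPARQL request: the query text is percent-encoded into the `query` parameter. A minimal Python sketch that rebuilds such a URL from a readable query with `urllib.parse.urlencode`. The query text is a decoding of the URL above; `bif:contains` is Virtuoso's full-text extension rather than standard SPARQL; the 2008 endpoint may no longer answer; and `sparql_url` is an illustrative name. The output differs cosmetically from the slide's URL (line breaks become %0A rather than %0D%0A).

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint given on the slide (2008); it may no longer be online.
ENDPOINT = "http://bio2rdf.org:8890/sparql"

# Readable form of the query encoded in the slide's URL.
# bif:contains is Virtuoso's full-text search extension, not standard SPARQL.
PAGET_QUERY = """CONSTRUCT {
?s1 ?p1 ?o1 .
?s1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .
?s1 <http://www.w3.org/2000/01/rdf-schema#label> ?label .
?s1 <http://bio2rdf.org/bio2rdf#linkRank> ?linkRank .
}
WHERE {
?s1 ?p1 ?o1 .
?o1 bif:contains "paget" .
?s1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .
?s1 <http://www.w3.org/2000/01/rdf-schema#label> ?label .
?s1 <http://bio2rdf.org/bio2rdf#linkRank> ?linkRank .
}"""


def sparql_url(query: str, endpoint: str = ENDPOINT,
               fmt: str = "application/rdf+xml") -> str:
    """Percent-encode a SPARQL query into a GET URL for the endpoint."""
    params = {
        "default-graph-uri": "",
        "query": query,
        "format": fmt,
        "debug": "on",
    }
    # urlencode uses quote_plus: spaces become '+', '{' becomes %7B, etc.
    return endpoint + "?" + urlencode(params)


url = sparql_url(PAGET_QUERY)
# Decoding the URL recovers the original query text unchanged.
params = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(params["query"][0] == PAGET_QUERY)
```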

View results in HTML

View results with Sesame

View results with Piggy Bank

Future work

● Create new rdfizers for public data sources;
● Build a community of users around the Bio2RDF project (visit the Google group);
● Connect more datasources to Bio2RDF by building collaborations between research groups;
● Offer a public SPARQL endpoint based on the Virtuoso server:

− http://bio2rdf.org:8890/sparql

Conclusion

Those devices in the hands of scientists have forged our understanding of nature. 

Conclusion

We have started to map the knowledge space of biology, and we have a first impression of what the bioinformatics nation looks like. The time has come to explore it; the time has come to build the knowledgescope.

Acknowledgments

Jean Morissette, Nicole Tourigny, Benjamin Good

Bioinformatics lab team at the CHUL Research Center: Philippe Rigault, Marc-Alexandre Nolin

Thanks to the essential annotators and data providers, and to the developers of the open source projects Sesame, Virtuoso and PiggyBank.

François Belleau was a recipient of a studentship from Génome Québec. This work has been financed in part by the Atlas of Genomic Profiles of Steroid Action, a Genome Canada project. BMG is funded by Pacific Century and University of British Columbia Graduate Fellowships.

http://bio2rdf.org

Query the graph with SPARQL: http://bio2rdf.org:8890/sparql

Download our software: http://sourceforge.net/projects/bio2rdf/

Download the Atlas data in N3 format: http://bio2rdf.org/download

Join our group: http://groups.google.ca/group/bio2rdf

Contact us at [email protected]