
DisGeNET: Applying Semantic Web and Network Analysis Approaches for the Integration and Analysis of Gene-Disease Associations for Translational Research and Drug Discovery

Janet Piñero, Núria Queralt-Rosinach, Àlex Bravo, Ferran Sanz and Laura I. Furlong
Integrative Biomedical Informatics Group, Research Programme on Biomedical Informatics; Hospital del Mar Medical Research Institute; Pompeu Fabra University

Acknowledgements: The authors thank the Open PHACTS partners, Michel Dumontier and the OpenLink staff for their input, collaboration and help. Funding: We received support from ISCIII-FEDER (PI13/00082, CP10/00524), from the IMI-JU under grant agreements nº 115002 (eTOX), nº 115191 (Open PHACTS), nº 115372 (EMIF) and nº 115735 (iPiE), resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contribution, and from the EU H2020 Programme 2014-2020 under grant agreements no. 634143 (MedBioinformatics) and no. 676559 (Elixir-Excelerate). The Research Programme on Biomedical Informatics (GRIB) is a node of the Spanish National Institute of Bioinformatics (INB).

DISCOVERY PLATFORM [2]

KNOWLEDGE BASE TOOLS

EVIDENCE-BASED DISCOVERY

CLINICIAN INTEROPERABILITY

METADATA

DATABASES & LITERATURE STANDARDS

INTEGRATION

OPEN

http://www.disgenet.org/

RESEARCHER

CURATOR

BIOINFORMATICIAN & DEVELOPER

DISCOVERABILITY

COMMUNITY USE

LARGE-SCALE EXTRACTION

DIGITAL PUBLICATION, SHARING AND LINKING

Usage stats (Aug 2014-Aug 2015): • 12,040 users, 22,696 sessions • 14,494 downloads • DisGeNET used in more than 20 publications, cited in more than 60 articles

Registered: • biosharing • OMICtools • NeuroLex • Datahub

Integrated knowledge: • Text mining extraction • Integration with well-curated data • Analysis • Discovery and decision-making

Present in the Semantic Web: • URI/RDF/nanopublications • Machine-processable • Semantic integration • Links to the Linked Open Data cloud • Data analysis across domains

IMPLEMENTATION

STANDARDIZATION

TRANSLATIONAL RESEARCH INTEROPERABILITY

INTEGRATION

EVIDENCE

SEMANTIC WEB

NETWORK ANALYSIS

DRUG DISCOVERY

WEB-BASED EXPLORATION

KNOWLEDGE BASE TOOLS FOR EXPLORATION AND ANALYSIS

REPRODUCIBILITY ACCESSIBILITY

LHGDN

$S = W_{CURATED} + W_{PREDICTED} + W_{LITERATURE}$
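
The score formula above can be read as a plain sum of three evidence weights. The following is a minimal sketch of that sum only, assuming each weight has already been computed elsewhere; the class and function names are hypothetical, and the numeric values are placeholders rather than real DisGeNET weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceWeights:
    """Evidence weights for one gene-disease association.

    Field names mirror the subscripts of the formula; how each weight
    is derived is not covered by this sketch.
    """
    curated: float
    predicted: float
    literature: float


def disgenet_score(w: EvidenceWeights) -> float:
    """Return S = W_CURATED + W_PREDICTED + W_LITERATURE."""
    return w.curated + w.predicted + w.literature


# Placeholder values for illustration only; not real DisGeNET weights.
example = EvidenceWeights(curated=0.5, predicted=0.25, literature=0.125)
score = disgenet_score(example)
```

Keeping the three components separate until the final sum makes it easy to report which kind of evidence contributes to a given association.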

SYNTACTIC

NORMALIZATION

SEMANTIC

Downloads

Web Interface

SPARQL endpoint

Open Database License

Programmatic access

Metadata

Digital objects

• Data item-level description • Dataset-level description Transparency and validation

• Tab separated plain text • SQLite • RDF • Trusty nanopublications

http://www.disgenet.org/

http://rdf.disgenet.org/sparql/

http://opendatacommons.org/licenses/odbl/1.0/

Linked Data Browser http://rdf.disgenet.org/fct/

• Automatic analysis • Higher speed • Reduce error • Share results • Embed in workflows

DisGeNET association type ontology

Semanticscience Integrated Ontology (SIO) [4]

• 11 common ontologies in NCBO BioPortal

• RDF [1]

• Nanopublications [3]

• NCBI Gene ID • UMLS Concept Unique Identifiers

• Normalized Identification Scheme http://rdf.disgenet.org/resource/gda/ + ID
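
The identification scheme above concatenates a fixed base with an association ID. A minimal sketch of that rule follows, assuming the ID is a single token; the function name and the `EXAMPLE_ID` value are hypothetical placeholders, not real DisGeNET identifiers.

```python
GDA_BASE = "http://rdf.disgenet.org/resource/gda/"


def gda_uri(association_id: str) -> str:
    """Build a gene-disease association URI as base + ID."""
    # Reject empty IDs and IDs containing whitespace, which would not
    # form a usable URI (an assumption of this sketch).
    if not association_id or any(ch.isspace() for ch in association_id):
        raise ValueError("association ID must be a non-empty token without whitespace")
    return GDA_BASE + association_id


# "EXAMPLE_ID" is a made-up placeholder, not a real association ID.
uri = gda_uri("EXAMPLE_ID")
```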

What are the diseases associated with melanocortin 4 receptor (MC4R)?

What are the genes associated with Obesity?
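
A question such as the gene lookup for Obesity could be posed to the SPARQL endpoint listed on this poster. The sketch below only builds the query text and the request URL and performs no network call. The graph pattern is an assumption, not something stated on the poster: it supposes that associations point to genes and diseases through `sio:SIO_000628`, that genes and diseases are typed with the NCIt classes `C16612` and `C7057`, and that diseases carry a `dcterms:title`. The function names are hypothetical.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "http://rdf.disgenet.org/sparql/"

# Assumed schema (see lead-in): predicates and classes are not taken
# from the poster and should be checked against the RDF documentation.
QUERY_TEMPLATE = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>

SELECT DISTINCT ?gene
WHERE {{
  ?gda sio:SIO_000628 ?gene, ?disease .
  ?gene rdf:type ncit:C16612 .
  ?disease rdf:type ncit:C7057 ;
           dcterms:title ?diseaseName .
  FILTER (lcase(str(?diseaseName)) = "{disease}")
}}
LIMIT {limit}
"""


def genes_for_disease_query(disease_name: str, limit: int = 20) -> str:
    """Return SPARQL text selecting genes linked to a disease name."""
    # Lowercase to match the lcase() filter, then escape characters
    # that would break out of the SPARQL string literal.
    escaped = disease_name.lower().replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE.format(disease=escaped, limit=int(limit))


def request_url(query: str) -> str:
    """Encode the query as a GET request URL (standard `query` parameter)."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query})


url = request_url(genes_for_disease_query("Obesity"))
```

The resulting URL can be fetched with any HTTP client; requesting a JSON result format through the `Accept` header is the usual choice for programmatic use.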

429,111 Gene-Disease Associations

What is the pattern of tissue expression of the genes associated with Obesity?

References
1. Queralt-Rosinach, N., Piñero, J., Bravo, À., Sanz, F. and Furlong, L.I. DisGeNET-RDF: harnessing the innovative power of the Semantic Web to explore the genetic basis of diseases, 2015 (submitted).
2. Piñero, J., Queralt-Rosinach, N., Bravo, À., Deu-Pons, J., Bauer-Mehren, A., Baron, M., … Furlong, L. I. (2015). DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database, 2015(0), bav028–bav028.
3. Queralt-Rosinach, N., Kuhn, T., Chichester, C., Dumontier, M., Sanz, F., and Furlong, L.I., Publishing DisGeNET as Nanopublications. Semantic Web Journal, (to appear), 1-10, 2015.
4. Dumontier, M., Baker, C. J., Baran, J., Callahan, A., Chepelev, L., Cruz-Toledo, J., … Hoehndorf, R. (2014). The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. Journal of Biomedical Semantics, 5(1), 2014.
5. Gray, A. J. G., Groth, P., Loizou, A., Askjaer, S., Brenninkmeijer, C., Burger, K., … Williams, A. J. (2014, January 1). Applying linked data approaches to pharmacology: Architectural decisions and implementation. Semantic Web. IOS Press. doi:10.3233/SW-2012-0088

Several formats and models

• Provenance (PubMed ID, source)

• DisGeNET score (evidence)

Context metadata for each G-D pair

Sentence description

• 17,181 Genes • PANTHER class

• 14,610 Diseases • MeSH class

• Semantic Web platform to answer complex questions for the pharmacological field [5]

What is the pattern of tissue protein expression of the genes associated with Obesity that are involved in the same pathway, and which bioactive small molecules hit each target (filter minEx-pChembl=5)?

• Large-scale integration across domains

• DisGeNET Cytoscape plugin

60% complex, 36% rare/Mendelian, and 4% infectious diseases

DO    MSH   OMIM  NCI   ORDO  ICD9
19    58    38    33    13    12
