Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data

26th World Wide Web Conference (WWW)

Perth, 4th – 8th April 2017

Maulik R. Kamdar and Mark A. Musen

Stanford Center for Biomedical Informatics [email protected]

Linked Open Data (LOD) Cloud

Cyganiak, Richard et al. 2014

Life Sciences Linked Open Data (LSLOD) Cloud

Semantic Web: Publishing Data as a Graph

Gleevec (Mol. Wt.: 589.25 g/mol, Half-Life: 18 hours) inhibits PDGFR, involved in signal transduction.

Resource Description Framework (RDF)

[Figure: the statement above drawn as an RDF graph. Nodes: Gleevec (DrugB: DB00619), Gleevec (KEGG: D01441), PDGFR, GO:0007165 (Signal Transduction), and the literals 589.25 and "18 hours". Edge labels: mol_weight, half-life, x-ref, target, name, type (Inhibits), process.]

Uniform Resource Identifier

http://bio2rdf.org/drugbank:DB00619
http://bio2rdf.org/kegg:D01441
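As a rough illustration of the graph on this slide, the statements can be written down as subject-predicate-object triples. The sketch below uses plain Python tuples rather than an RDF library. The two URIs and the edge labels come from the slide; the blank-node name `_:t1`, the helper `objects`, and the exact choice of which node each edge hangs off are assumptions made for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
DRUGBANK = "http://bio2rdf.org/drugbank:DB00619"  # URI shown on the slide
KEGG = "http://bio2rdf.org/kegg:D01441"           # URI shown on the slide
TARGET = "_:t1"  # hypothetical intermediate node for the drug-target relation

# Assumption: the attachment of each edge to a node is guessed from the figure.
triples = [
    (DRUGBANK, "half-life", "18 hours"),
    (DRUGBANK, "x-ref", KEGG),
    (KEGG, "mol_weight", 589.25),
    (KEGG, "target", TARGET),
    (TARGET, "type", "Inhibits"),
    (TARGET, "name", "PDGFR"),
    (TARGET, "process", "GO:0007165"),
]


def objects(subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

Calling `objects(DRUGBANK, "x-ref")` follows the cross-reference edge from one source's node to the other's, which is the link that later makes querying across sources possible.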

Semantic Web: Querying the Graph

What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction?

SPARQL Query Language

[Figure: the question drawn as a query graph. Unknown nodes: two unnamed nodes (?), ?half-life, ?target. Constraints: mol_weight < 1000, x-ref, name, type (Inhibits), process GO:0007165 (Signal Transduction).]
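To make the idea of matching a query graph against a data graph concrete, here is a minimal sketch of triple-pattern matching in plain Python. It is not SPARQL and not any engine's actual algorithm; it only shows that terms starting with `?` act as variables that must bind consistently across patterns. The shortened identifiers, the node `_:t1`, and the function name `match` are assumptions.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if isinstance(term, str) and term.startswith("?"):
                # A variable: bind it, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


# Hypothetical data graph, loosely following the Gleevec example.
triples = [
    ("drugbank:DB00619", "half-life", "18 hours"),
    ("drugbank:DB00619", "x-ref", "kegg:D01441"),
    ("kegg:D01441", "mol_weight", 589.25),
    ("kegg:D01441", "target", "_:t1"),
    ("_:t1", "type", "Inhibits"),
    ("_:t1", "process", "GO:0007165"),
]

query = [
    ("?drug", "half-life", "?hl"),
    ("?drug", "x-ref", "?kegg"),
    ("?kegg", "mol_weight", "?mw"),
    ("?kegg", "target", "?t"),
    ("?t", "type", "Inhibits"),
    ("?t", "process", "GO:0007165"),
]

# The numeric condition plays the role of a filter on the matched bindings.
half_lives = [b["?hl"] for b in match(query, triples) if b["?mw"] < 1000]
```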

Life Sciences Linked Open Data Cloud – query federation

• Challenges associated with retrieving information from LSLOD sources
• Pattern-based method to rewrite queries across LSLOD sources
• An application in mechanism-based pharmacovigilance - PhLeGrA

What this talk is about …

Query Federation: Rewriting and executing queries across different sources

QUERY FEDERATION

What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction?

[Figure: the query is split across sources.
Query: Drug, molecular-weight < 1000, target, process = "GO:0007165", half-life
Sub-query 1: Drug, molecular-weight < 1000, target, half-life
Sub-query 2: Drug, molecular-weight < 1000, target, process = "GO:0007165"]

Schwarte, et al. ISWC 2012
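The split shown on this slide can be sketched as two sub-queries, each answered by a different source, followed by a join on the shared drug. This is only an illustration of the idea, not the federation engine cited above. The record layout, the function names, and joining on the drug name (rather than on cross-reference links) are all assumptions.

```python
# Hypothetical per-source records; values mirror the slides where available.
SOURCE_A = [  # assumed to hold molecular weight and half-life
    {"drug": "Gleevec", "molecular_weight": 493.61, "half_life": "18 hours"},
]
SOURCE_B = [  # assumed to hold drug-target-process links
    {"drug": "Gleevec", "target": "PDGFR", "process": "GO:0007165"},
]


def subquery_a(max_weight):
    """Drugs below the weight threshold, with their half-lives."""
    return {
        r["drug"]: r["half_life"]
        for r in SOURCE_A
        if r["molecular_weight"] < max_weight
    }


def subquery_b(process):
    """Drugs whose target is involved in the given process."""
    return {r["drug"] for r in SOURCE_B if r["process"] == process}


def federated_half_lives(max_weight, process):
    """Run both sub-queries and join their results on the drug."""
    half_lives = subquery_a(max_weight)
    drugs = subquery_b(process)
    return {d: hl for d, hl in half_lives.items() if d in drugs}
```

Neither source alone can answer the question: the first lacks process annotations and the second lacks half-lives, so the join is where the answer appears.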

Heterogeneity in the LSLOD Cloud

Label Mismatch: Different labels for classes, relations and attributes

Source with clinical features: Gleevec molecular-weight 493.61
Source with biological features: Gleevec mol_weight 589.25

Heterogeneity in the LSLOD Cloud

Model Mismatch: Different graph patterns to capture granularity

Direct edge: Gleevec drug-target PDGFR
Intermediate node: Gleevec target (node with name PDGFR, type Inhibits, source PubMed: 21152856)

Heterogeneity in the LSLOD Cloud

• Inconsistent Meanings

• Inconsistent URI labels for classes, relations and attributes

• Inconsistent Attribute values for entities

• Inconsistent Graph patterns for SPARQL queries

• Incomplete Relations between entities

Query Rewriting fails over the LSLOD Cloud

What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction?

Query:
?s a <Drug>
?s <molecular-weight> ?mw
?s <target> ?protein
?s <half-life> ?hl
?mw < 1000 g/mol
?protein <hasGO> <GO:0007165>

Query Rewriting

Rewritten query 1:
?s a <Drug>
{?s <molecular-weight> ?mw}
{?s <half-life> ?hl}
?mw < 1000 g/mol

Rewritten query 2:
?s a <Drug>
{?s <target> ?protein}
?protein <hasGO> <GO:0007165>

Using Graph Patterns for Query Rewriting

Mapping Rules:

?Drug hasTarget ?Protein
maps to: ?Drug DrugBank:drug-target ?Protein
maps to: ?Drug KEGG:target ?blank KEGG:link ?Protein

Using Graph Patterns for Query Rewriting

Query:
?s a <Drug>
?s <hasMolWt> ?mw
?s <hasTarget> ?protein
?s <hasHalfLife> ?hl
?mw < 1000 g/mol
?protein <hasGO> <GO:0007165>

Query Rewriting

Rewritten query 1:
?s a <Drug>
{?s <molecular-weight> ?mw}
?s <drug-target> ?protein
{?s <half-life> ?hl}
?mw < 1000 g/mol

Rewritten query 2:
?s a <Drug>
?s <mol_wt> ?mw
{?s <target> ?protein_blank
?protein_blank <link> ?protein}
?protein <hasGO> <GO:0007165>
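A pattern-based rewrite of this kind can be sketched as a lookup from each general relation to the path of source-specific predicates that expresses it. The sketch below is an assumption-laden illustration, not the authors' implementation: the predicate names are taken from the slides, but the table `VOCAB`, the assignment of predicates to DrugBank or KEGG, the generated `?blank...` variable names, and the choice to silently drop patterns a source cannot answer are all hypothetical.

```python
# General relation -> path of source-specific predicates.
# Assumption: which source offers which predicate is inferred from the slides.
VOCAB = {
    "DrugBank": {
        "hasMolWt": ["molecular-weight"],
        "hasTarget": ["drug-target"],
        "hasHalfLife": ["half-life"],
    },
    "KEGG": {
        "hasMolWt": ["mol_wt"],
        "hasTarget": ["target", "link"],  # two hops via an intermediate node
        "hasGO": ["hasGO"],
    },
}


def rewrite_query(query, source):
    """Rewrite general triple patterns into one source's graph patterns."""
    out = []
    for i, (s, p, o) in enumerate(query):
        path = VOCAB[source].get(p)
        if path is None:
            continue  # this source cannot answer the pattern
        # Introduce a fresh variable for every intermediate node on the path.
        nodes = [s] + [f"?blank{i}_{k}" for k in range(1, len(path))] + [o]
        out.extend(zip(nodes, path, nodes[1:]))
    return out


query = [
    ("?s", "hasMolWt", "?mw"),
    ("?s", "hasTarget", "?protein"),
    ("?s", "hasHalfLife", "?hl"),
    ("?protein", "hasGO", "GO:0007165"),
]
```

`rewrite_query(query, "KEGG")` expands the single `hasTarget` pattern into the two-hop `target` / `link` pattern, while `rewrite_query(query, "DrugBank")` keeps it as one `drug-target` edge.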

PhLeGrA – Linked Graph Analytics in Pharmacology

Phlegra is a spider genus of the Salticidae family, commonly termed jumping spiders.

k-partite network will be generated as output

Entities and Relations from 4 different sources are retrieved to create the k-partite Network
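A k-partite network groups nodes into k entity types and only allows edges between nodes of different types. The following sketch shows one way such a structure could be held in memory and searched for paths; it is not the PhLeGrA data model. The class name, the entity type names, and the node `ADR:placeholder` are hypothetical, since the transcript does not list the entity types.

```python
from collections import defaultdict


class KPartiteNetwork:
    """Undirected network whose edges must join different entity types."""

    def __init__(self):
        self.partition = {}            # node -> entity type
        self.edges = defaultdict(set)  # node -> neighbouring nodes

    def add_node(self, node, entity_type):
        self.partition[node] = entity_type

    def add_edge(self, a, b):
        if self.partition[a] == self.partition[b]:
            raise ValueError("edges must join different entity types")
        self.edges[a].add(b)
        self.edges[b].add(a)

    def paths(self, start, end, max_len=4):
        """Enumerate simple paths with at most `max_len` nodes (depth-first)."""
        stack = [[start]]
        while stack:
            path = stack.pop()
            if path[-1] == end:
                yield path
                continue
            if len(path) < max_len:
                for nxt in self.edges[path[-1]]:
                    if nxt not in path:
                        stack.append(path + [nxt])


net = KPartiteNetwork()
net.add_node("Gleevec", "Drug")
net.add_node("PDGFR", "Protein")
net.add_node("GO:0007165", "Process")
net.add_node("ADR:placeholder", "AdverseReaction")  # hypothetical node
net.add_edge("Gleevec", "PDGFR")
net.add_edge("PDGFR", "GO:0007165")
net.add_edge("GO:0007165", "ADR:placeholder")
mechanisms = list(net.paths("Gleevec", "ADR:placeholder"))
```

Each path from a drug node to an outcome node is one candidate mechanism, which is why several mechanisms can coexist for the same pair of endpoints.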

This k-partite network is generated in < 1 day

Query Federation overcomes heterogeneous Distribution of Entities and Relations

E1: Drug
R1: Drug hasTarget Protein

• Similar and completely unique entities and relations exist across data sources
• Necessary to get the complete picture, but also to determine sources of noise

Several underlying mechanisms are possible …

http://onto-apps.stanford.edu/phlegra

A graph analytics module to rank the mechanisms

Preliminary results using network-based Apriori Algorithm for ranking mechanisms
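The transcript does not say how Apriori is adapted to networks, so the sketch below is the textbook itemset version only, included to show the level-wise "count, keep frequent sets, join, prune" loop the name refers to. Treating each transaction as a set of entity labels is an assumption, as are the function name and the `min_support` parameter.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    candidates = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent sets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {
            c
            for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical transactions with placeholder entity labels.
example = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
result = apriori(example, min_support=3)
```

Itemsets with higher support counts would rank higher under this scheme; how the talk scores mechanisms in the network setting is not recoverable from the slides.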

The story so far …

Pattern-based federation methods can retrieve data from multiple sources in the Life Sciences Linked Open Data Cloud, and can enable development of advanced methods for mechanism-based pharmacovigilance.

Acknowledgments

Musen Lab, Stanford

Biomedical Informatics Training Program

Michel Dumontier

US NIH Grant HG004028

PhLeGrA – Linked Graph Analytics in Pharmacology

www.stanford.edu/~maulikrk/research.html
www.onto-apps.stanford.edu/phlegra

