Transcript
Page 1: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013

GraphConnect

Page 2

the power of graphs to analyze biological data

Page 3

about me

who am i ...

Davy Suvee (@DSUVEE)

➡ big data architect @ datablend - continuum
• provide big data and nosql consultancy
• 5 years of hands-on expertise in the pharma/biotech sector

Page 4

massive data

big data in pharma

full genome sequencing

complex data

biological networks

scalable number crunching platform

visual insights-driven platform

graphs!!

Page 5

big data in pharma (2 specific use cases)

outlier detection platform
neo4j, mongodb/cassandra and gephi

euretos - brain
neo4j, mongodb, solr and prefuse

Page 6

gene expression clustering

➡ oncology data set:
★ 4,800 samples
★ 27,000 genes

➡ Question:
★ for a particular subset of samples, which genes are co-expressed?

Page 7

storing gene expressions (mongodb)

{ "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3" },
  "sample_name" : "122551hp133a21.cel",
  "genomics_id" : 122551,
  "sample_id" : 343981,
  "donor_id" : 143981,
  "sample_type" : "Tissue",
  "sample_site" : "Ascending colon",
  "pathology_category" : "MALIGNANT",
  "pathology_morphology" : "Adenocarcinoma",
  "pathology_type" : "Primary malignant neoplasm of colon",
  "primary_site" : "Colon",
  "expressions" : [ { "gene" : "X1_at", "expression" : 5.54217719084415 },
                    { "gene" : "X10_at", "expression" : 3.92335121981739 },
                    { "gene" : "X100_at", "expression" : 7.81638155662255 },
                    { "gene" : "X1000_at", "expression" : 5.44318512260619 },
                    … ] }
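A minimal sketch, in Python, of how one such document could be flattened into a gene-to-expression lookup before any correlation is computed. The document literal is abridged from the slide, and `expression_by_gene` is a hypothetical helper name, not something shown in the talk.

```python
# Abridged copy of the sample document shown on the slide.
sample = {
    "sample_name": "122551hp133a21.cel",
    "genomics_id": 122551,
    "pathology_category": "MALIGNANT",
    "primary_site": "Colon",
    "expressions": [
        {"gene": "X1_at", "expression": 5.54217719084415},
        {"gene": "X10_at", "expression": 3.92335121981739},
        {"gene": "X100_at", "expression": 7.81638155662255},
        {"gene": "X1000_at", "expression": 5.44318512260619},
    ],
}


def expression_by_gene(document):
    """Return a {gene: expression} dict for one sample document (hypothetical helper)."""
    return {entry["gene"]: entry["expression"] for entry in document["expressions"]}


profile = expression_by_gene(sample)
print(profile["X100_at"])
```

Keeping all expressions embedded in the sample document means a single read returns the whole expression vector for that sample.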

Page 8

correlating samples (mongodb/map-reduce)

pearson correlation

x    y
43   99
21   65
25   79
42   75
57   87
59   81

correlation: 0.52
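As an illustration of the calculation on this slide, here is a small plain-Python sketch of the Pearson correlation applied to the x/y values above. It is not the MongoDB map-reduce job from the talk, only the formula itself; the function name `pearson` is an assumption. For these six pairs it yields roughly 0.53, close to the value shown on the slide.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of every value from its own mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance = sum(a * b for a, b in zip(dx, dy))
    spread = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / spread


# The x/y pairs from the slide.
x = [43, 21, 25, 42, 57, 59]
y = [99, 65, 79, 75, 87, 81]
r = pearson(x, y)
print(round(r, 4))
```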

Page 9

co-expression graph (neo4j)

➡ create a node for each sample
➡ if correlation between two samples >= 0.8, create an edge between both nodes

[diagram: sample nodes 122551, 122552 and 122553; a "correlated" edge labelled value: 0.86]
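The two rules above can be sketched in plain Python as follows. This is only an in-memory stand-in for the Neo4j graph described in the talk: the expression vectors are made up for illustration, and `pearson` and `build_coexpression_graph` are hypothetical names.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


def build_coexpression_graph(profiles, threshold=0.8):
    """Return (nodes, edges): one node per sample, one edge per correlated pair."""
    nodes = sorted(profiles)
    edges = {}
    for a, b in combinations(nodes, 2):
        value = pearson(profiles[a], profiles[b])
        if value >= threshold:
            # Edge carries the correlation value, like the "correlated" edge on the slide.
            edges[(a, b)] = value
    return nodes, edges


# Made-up expression vectors (same gene order in every sample), for illustration only.
profiles = {
    122551: [5.5, 3.9, 7.8, 5.4],
    122552: [5.7, 4.1, 7.5, 5.6],
    122553: [7.9, 5.2, 3.8, 6.1],
}

nodes, edges = build_coexpression_graph(profiles)
for (a, b), value in edges.items():
    print(f"{a} -[correlated, value: {value:.2f}]- {b}")
```

Comparing every pair of samples grows quadratically with the number of samples, which is one reason a batch approach such as map-reduce is a natural fit for the correlation step.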

Page 10

co-expression visualisation (gephi)

Page 11

euretos - brain

➡ pubmed: 23 million biomedical articles
• 1300 new ones added every day
• google-like search interface

➡ reading an article ...
• malaria is transferred by mosquitoes

Page 12

euretos - brain

authors references

Page 13

euretos - brain

ooooooh crap ...

Page 14

euretos - brain

➡ nanopub (nanopub.org)
• the smallest unit of publishable information

➡ assertion
• subject: malaria
• predicate: transferred by
• object: mosquito

➡ provenance
• how this came to be (meta-data)
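To make the structure above concrete, here is a rough Python sketch of a nanopub as an assertion triple plus a provenance record. The class and field names are assumptions made for illustration, not the nanopub.org schema, and the provenance entry is a made-up placeholder.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assertion:
    """A subject / predicate / object triple."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Nanopub:
    """An assertion together with meta-data on how it came to be."""
    assertion: Assertion
    provenance: dict = field(default_factory=dict)


nanopub = Nanopub(
    assertion=Assertion(subject="malaria", predicate="transferred by", obj="mosquito"),
    # Placeholder provenance; a real record would point at the source publication.
    provenance={"source": "example article"},
)

a = nanopub.assertion
print(f"{a.subject} -[{a.predicate}]-> {a.obj}")
```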

Page 15

euretos - brain

➡ unfortunately, malaria is encoded in various ways ...

db1      db2     db3
malaria  P22384  AQ879

malaria
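One way to picture the fix for this is a lookup that maps every database-specific identifier onto a single shared concept. The Python sketch below assumes the three identifiers on the slide belong to db1, db2 and db3 in that order; the table and the `resolve` helper are hypothetical.

```python
# Hypothetical lookup from (database, local identifier) to one shared concept.
# The pairing of identifiers with databases is assumed from the slide layout.
CONCEPT_BY_IDENTIFIER = {
    ("db1", "malaria"): "malaria",
    ("db2", "P22384"): "malaria",
    ("db3", "AQ879"): "malaria",
}


def resolve(database, identifier):
    """Return the shared concept for a database-specific identifier, or None."""
    return CONCEPT_BY_IDENTIFIER.get((database, identifier))


for database, identifier in CONCEPT_BY_IDENTIFIER:
    print(database, identifier, "->", resolve(database, identifier))
```

Once every source identifier resolves to the same concept, assertions from different databases can attach to a single node in the graph.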

Page 16

euretos - brain

malaria --[transferred by]--> mosquito

Page 17

euretos - brain

➡ brain (http://www.euretos.com/brain)
• exploration and analysis platform
• millions of concepts/triples/nanopubs
• pubmed, uniprot, omim, pubchem, ...

➡ architectural stack
• meta-data is stored in mongodb
• graph in neo4j
• swing interface connecting to rest endpoints

Page 18

brain

Page 19

brain

Page 20

brain

Page 21

brain

Page 22

brain

Page 23

brain

Page 24

brain

Page 25

brain

Page 26

Questions?