Transcript
Page 1: Using MongoDB as a graph database - 2014 redux

Using MongoDB as a Graph Database

Chris Clarke

NoSQL Birmingham

16th October 2014

Page 2: Using MongoDB as a graph database - 2014 redux

Graphs 101: For the uninitiated

Page 3: Using MongoDB as a graph database - 2014 redux

John --knows-- Jane

Page 4: Using MongoDB as a graph database - 2014 redux

John --knows-- Jane

John knows Jane

Jane knows John

Page 5: Using MongoDB as a graph database - 2014 redux

John --knows--> Jane

Page 6: Using MongoDB as a graph database - 2014 redux

John --knows--> Jane

John knows Jane

Jane ? John

Page 7: Using MongoDB as a graph database - 2014 redux

John --knows--> Jane

Jane --knows--> John

John knows Jane

Jane knows John

Page 8: Using MongoDB as a graph database - 2014 redux

RDF

Page 9: Using MongoDB as a graph database - 2014 redux

John knows Jane

Entity Property Value

Page 10: Using MongoDB as a graph database - 2014 redux

John knows Jane

Subject Predicate Object

Page 11: Using MongoDB as a graph database - 2014 redux

John knows Jane

Jane knows John

Subject Predicate Object

Page 12: Using MongoDB as a graph database - 2014 redux

http://example.com/John foaf:knows http://example.com/Jane

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

Subject Predicate Object

Page 13: Using MongoDB as a graph database - 2014 redux

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

Subject                  Predicate   Object
http://example.com/John  foaf:knows  http://example.com/Jane
http://example.com/John  foaf:name   "John"
http://example.com/John  rdf:type    foaf:Person
http://example.com/Jane  foaf:name   "Jane"
http://example.com/Jane  rdf:type    foaf:Person
http://example.com/Jane  foaf:knows  http://example.com/John
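As a rough illustration of the subject/predicate/object table above, the six triples can be held as plain arrays and queried with a filter. This is only a sketch: the array representation and the helper name `objectsOf` are assumptions made for this example, not part of RDF or of Tripod.

```javascript
// Each triple is [subject, predicate, object]; prefixed names are kept as plain strings.
const triples = [
  ["example:John", "foaf:knows", "example:Jane"],
  ["example:John", "foaf:name", "John"],
  ["example:John", "rdf:type", "foaf:Person"],
  ["example:Jane", "foaf:name", "Jane"],
  ["example:Jane", "rdf:type", "foaf:Person"],
  ["example:Jane", "foaf:knows", "example:John"],
];

// Hypothetical helper: every object attached to a subject by a given predicate.
function objectsOf(graph, subject, predicate) {
  return graph
    .filter(([s, p]) => s === subject && p === predicate)
    .map(([, , o]) => o);
}

console.log(objectsOf(triples, "example:John", "foaf:knows"));
```

Because every statement has the same three-part shape, the same helper answers "who does Jane know?" or "what is John's name?" without any schema change.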

Page 14: Using MongoDB as a graph database - 2014 redux

[Graph diagram: example:John and example:Jane are each rdf:type foaf:Person, have foaf:name "John" and "Jane" respectively, and are linked by foaf:knows in both directions]

Page 15: Using MongoDB as a graph database - 2014 redux

– Jack Fullstack

“WTF! Surely this is easier in JSON!”

Page 16: Using MongoDB as a graph database - 2014 redux

> db.people.find()
{ _id: ObjectId('123'), name: 'John', knows: [ObjectId('456')] },
{ _id: ObjectId('456'), name: 'Jane', knows: [ObjectId('123')] }

Page 17: Using MongoDB as a graph database - 2014 redux

foaf:Person

Page 18: Using MongoDB as a graph database - 2014 redux

Dataset A: example:John foaf:name "John"

Dataset B: example:John foaf:age 24

Page 19: Using MongoDB as a graph database - 2014 redux

Dataset A+B: example:John foaf:name "John"; example:John foaf:age 24
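The merge of Dataset A and Dataset B can be sketched as a set union of triples: because both datasets use the same identifier for John, the statements simply accumulate on one node. The function name `mergeGraphs` and the array representation are assumptions for illustration only.

```javascript
// Merge any number of graphs (arrays of [subject, predicate, object]) and drop exact duplicates.
function mergeGraphs(...graphs) {
  const seen = new Set();
  const merged = [];
  for (const graph of graphs) {
    for (const triple of graph) {
      const key = JSON.stringify(triple); // identical triples serialise identically
      if (!seen.has(key)) {
        seen.add(key);
        merged.push(triple);
      }
    }
  }
  return merged;
}

const datasetA = [["example:John", "foaf:name", "John"]];
const datasetB = [["example:John", "foaf:age", 24]];
const merged = mergeGraphs(datasetA, datasetB);
console.log(merged);
```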

Page 20: Using MongoDB as a graph database - 2014 redux

SPARQL: An RDF Query Language

Page 21: Using MongoDB as a graph database - 2014 redux

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
WHERE {
  ?person a foaf:Person .
  ?person foaf:name ?name .
  ?person foaf:mbox ?email .
}
ORDER BY ?name
LIMIT 50

Page 22: Using MongoDB as a graph database - 2014 redux

Query form  Result
CONSTRUCT   Graph
DESCRIBE    Graph
SELECT      Tabular
ASK         Boolean

Page 23: Using MongoDB as a graph database - 2014 redux

Graphs and Talis: A bit of history

Page 24: Using MongoDB as a graph database - 2014 redux

Over time…

• Our apps became popular. Last week we averaged 4M requests per day, with 600k+ per hour at peak times

• Our dataset is growing in size: about 350M triples this week

• Our apps needed more queries and more expensive queries

• Our in-house triple store was EoL and out of date

Page 25: Using MongoDB as a graph database - 2014 redux

Project Tripod

http://github.com/talis/tripod-php

http://github.com/talis/tripod-node

Page 26: Using MongoDB as a graph database - 2014 redux

System characteristics

• 99:1 read:write

• Well shared, tenant based system. Our largest single customer has 35M triples

• Graph data structures and operations (merges, sub-graphs etc.) well entrenched in the codebase, over 2M lines of code (inc. libraries)

• Actually not that many distinct query shapes

Page 27: Using MongoDB as a graph database - 2014 redux

Simple Queries, and how they influenced our core data model

Page 28: Using MongoDB as a graph database - 2014 redux

Give me all the triples about John as a graph:

DESCRIBE <http://example.com/John>

Give me properties name, age of John as tabular data:

SELECT ?name ?age
WHERE {
  <http://example.com/John> foaf:name ?name .
  <http://example.com/John> foaf:age ?age .
}

Page 29: Using MongoDB as a graph database - 2014 redux

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

Subject                  Predicate   Object
http://example.com/John  foaf:knows  http://example.com/Jane
http://example.com/John  foaf:name   "John"
http://example.com/John  rdf:type    foaf:Person
http://example.com/Jane  foaf:name   "Jane"
http://example.com/Jane  rdf:type    foaf:Person
http://example.com/Jane  foaf:knows  http://example.com/John

Page 30: Using MongoDB as a graph database - 2014 redux

Concise Bound Description of http://example.com/John

http://example.com/John  foaf:knows  http://example.com/Jane
http://example.com/John  foaf:name   "John"
http://example.com/John  rdf:type    foaf:Person

Concise Bound Description of http://example.com/Jane

http://example.com/Jane  foaf:name   "Jane"
http://example.com/Jane  rdf:type    foaf:Person
http://example.com/Jane  foaf:knows  http://example.com/John

Page 31: Using MongoDB as a graph database - 2014 redux

Concise Bound Description of http://example.com/John

http://example.com/John  foaf:knows  http://example.com/Jane
http://example.com/John  foaf:name   "John"
http://example.com/John  rdf:type    foaf:Person

{
  _id: "example:John",
  "foaf:knows": { u: "example:Jane" },
  "rdf:type": { u: "foaf:Person" },
  "foaf:name": { l: "John" }
}
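The step from triples to one document per subject can be sketched as a simple grouping. This is not Tripod's actual code: the function name `toCbdDocuments` is hypothetical, and storing repeated predicates as an array is an assumption the slides do not show.

```javascript
// Input triples already mark each object as a URI ({u: ...}) or a literal ({l: ...}).
const triples = [
  { s: "example:John", p: "foaf:knows", o: { u: "example:Jane" } },
  { s: "example:John", p: "rdf:type", o: { u: "foaf:Person" } },
  { s: "example:John", p: "foaf:name", o: { l: "John" } },
  { s: "example:Jane", p: "foaf:name", o: { l: "Jane" } },
];

// Group triples by subject: the subject becomes _id, each predicate becomes a field.
function toCbdDocuments(graph) {
  const docs = new Map();
  for (const { s, p, o } of graph) {
    if (!docs.has(s)) docs.set(s, { _id: s });
    const doc = docs.get(s);
    // Assumption: a predicate with several values is stored as an array of values.
    doc[p] = doc[p] === undefined ? o : [].concat(doc[p], o);
  }
  return [...docs.values()];
}

const cbds = toCbdDocuments(triples);
console.log(JSON.stringify(cbds, null, 2));
```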

Page 33: Using MongoDB as a graph database - 2014 redux

{
  _id: "example:John",
  "foaf:knows": { u: "example:Jane" },
  "rdf:type": { u: "foaf:Person" },
  "foaf:name": { l: "John" }
}

_id is the unique primary key. There can only be one John

Page 34: Using MongoDB as a graph database - 2014 redux

{
  _id: "example:John",
  "foaf:knows": { u: "example:Jane" },
  "rdf:type": { u: "foaf:Person" },
  "foaf:name": { l: "John" }
}

_id is the unique primary key. There can only be one John

l means value is a literal text value

Page 35: Using MongoDB as a graph database - 2014 redux

{
  _id: "example:John",
  "foaf:knows": { u: "example:Jane" },
  "rdf:type": { u: "foaf:Person" },
  "foaf:name": { l: "John" }
}

_id is the unique primary key. There can only be one John

u means value is a URI, or another node

l means value is a literal text value

Page 36: Using MongoDB as a graph database - 2014 redux

{
  _id: "example:John",
  "foaf:knows": { u: "example:Jane" },
  "rdf:type": { u: "foaf:Person" },
  "foaf:name": { l: "John" }
}

DESCRIBE <http://example.com/John>

SELECT ?name ?age
WHERE {
  <http://example.com/John> foaf:name ?name .
  <http://example.com/John> foaf:age ?age .
}

Page 37: Using MongoDB as a graph database - 2014 redux

{
  _id: "example:John",
  "foaf:knows": { u: "example:Jane" },
  "rdf:type": { u: "foaf:Person" },
  "foaf:name": { l: "John" }
}

DESCRIBE <http://example.com/John>

mongo$ col.findOne({_id: "example:John"});

SELECT ?name ?age
WHERE {
  <http://example.com/John> foaf:name ?name .
  <http://example.com/John> foaf:age ?age .
}

mongo$ col.findOne({_id: "example:John"}, {"foaf:name.l": 1, "foaf:age.l": 1});
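Once the document has been fetched by `_id`, both query forms reduce to reshaping it in memory. The sketch below assumes the CBD layout shown above; `describe` and `selectLiterals` are hypothetical helper names, not Tripod API.

```javascript
const john = {
  _id: "example:John",
  "foaf:knows": { u: "example:Jane" },
  "rdf:type": { u: "foaf:Person" },
  "foaf:name": { l: "John" },
};

// DESCRIBE: turn the document back into triples (a graph).
function describe(doc) {
  const triples = [];
  for (const [predicate, value] of Object.entries(doc)) {
    if (predicate === "_id") continue;
    // [].concat copes with a single value or an array of values.
    for (const o of [].concat(value)) triples.push({ s: doc._id, p: predicate, o });
  }
  return triples;
}

// SELECT: pick literal values for the requested predicates (a tabular row).
function selectLiterals(doc, predicates) {
  const row = {};
  for (const p of predicates) {
    const value = doc[p];
    row[p] = value && value.l !== undefined ? value.l : null; // unbound becomes null
  }
  return row;
}

console.log(describe(john));
console.log(selectLiterals(john, ["foaf:name", "foaf:age"]));
```

In this example John has no foaf:age, so that column comes back as null rather than dropping the row.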

Page 38: Using MongoDB as a graph database - 2014 redux

{ s: "example:John", p: "foaf:knows", o: { u: "example:Jane" } },
{ s: "example:John", p: "rdf:type", o: { u: "foaf:Person" } },
{ s: "example:John", p: "foaf:name", o: { l: "John" } }

Page 39: Using MongoDB as a graph database - 2014 redux

{ s: "example:John", p: "foaf:knows", o: { u: "example:Jane" } },
{ s: "example:John", p: "rdf:type", o: { u: "foaf:Person" } },
{ s: "example:John", p: "foaf:name", o: { l: "John" } }

DESCRIBE <http://example.com/John>

mongo$ var s = col.find({s: "example:John"});
mongo$ while (s.hasNext()) { addToGraph(s.next()) }

SELECT ?name ?age
WHERE {
  <http://example.com/John> foaf:name ?name .
  <http://example.com/John> foaf:age ?age .
}

mongo$ col.find({s: "example:John", p: "foaf:name"}, {"o": 1});
mongo$ col.find({s: "example:John", p: "foaf:age"}, {"o": 1});

Page 40: Using MongoDB as a graph database - 2014 redux

{ s: "example:John", p: "foaf:knows", o: { u: "example:Jane" } },
{ s: "example:John", p: "rdf:type", o: { u: "foaf:Person" } },
{ s: "example:John", p: "foaf:name", o: { l: "John" } }

DESCRIBE ?person WHERE { ?person foaf:name "John" . }

mongo$ var s = col.find({p: "foaf:name", "o.l": "John"}); // BasicCursor = slow

{
  _id: "example:John",
  "foaf:knows": { u: "example:Jane" },
  "rdf:type": { u: "foaf:Person" },
  "foaf:name": { l: "John" }
}

DESCRIBE ?person WHERE { ?person foaf:name "John" . }

mongo$ col.ensureIndex({"foaf:name.l": 1});
mongo$ var s = col.find({"foaf:name.l": "John"}); // BTreeCursor = fast

Page 41: Using MongoDB as a graph database - 2014 redux

Complex Queries

Page 42: Using MongoDB as a graph database - 2014 redux

DESCRIBE <http://example.com/foo> ?sectionOrItem ?resource ?document ?authorList ?author ?usedBy ?creator ?libraryNote ?publisher
WHERE {
  OPTIONAL {
    <http://example.com/foo> resource:contains ?sectionOrItem .
    OPTIONAL {
      ?sectionOrItem resource:resource ?resource .
      OPTIONAL { ?resource dcterms:isPartOf ?document . }
      OPTIONAL {
        ?resource bibo:authorList ?authorList .
        OPTIONAL { ?authorList ?p ?author . }
      }
      OPTIONAL { ?resource dcterms:publisher ?publisher . }
    }
    OPTIONAL { ?libraryNote bibo:annotates ?sectionOrItem }
  } .
  OPTIONAL { <http://example.com/foo> resource:usedBy ?usedBy } .
  OPTIONAL { <http://example.com/foo> sioc:has_creator ?creator }
}

Page 44: Using MongoDB as a graph database - 2014 redux

– Project Tripod Team, sometime 2012

“We don’t need dynamic queries”

Page 45: Using MongoDB as a graph database - 2014 redux

Precomputed views: Remember those from the RDBMS?

Page 46: Using MongoDB as a graph database - 2014 redux

{
  _id: "example:John",
  "foaf:knows": { u: "example:Jane" },
  "rdf:type": { u: "foaf:Person" },
  "foaf:name": { l: "John" }
}

{
  _id: "example:Jane",
  "foaf:knows": { u: "example:John" },
  "rdf:type": { u: "foaf:Person" },
  "foaf:name": { l: "Jane" }
}

DESCRIBE example:John ?knownPerson WHERE { example:John foaf:knows ?knownPerson . }

mongo$ var john = col.findOne({_id: "example:John"});
mongo$ var knows = [].concat(john["foaf:knows"]); // one value or an array of values
mongo$ for (var i = 0; i < knows.length; i++) {
         // one extra round trip per known person
         var knownPerson = col.findOne({_id: knows[i].u});
       }

Page 47: Using MongoDB as a graph database - 2014 redux

System characteristics

• 99:1 read:write

• Well shared, tenant based system. Our largest single customer has 35M triples

• Graph data structures and operations (merges, sub-graphs etc.) well entrenched in the codebase, over 2M lines of code (inc. libraries).

• Actually not that many distinct query shapes.

Page 48: Using MongoDB as a graph database - 2014 redux

{
  _id: { r: "example:John", t: "v_knows" },
  graphs: [
    { _id: "example:John", "foaf:knows": { u: "example:Jane" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "John" } },
    { _id: "example:Jane", "foaf:knows": { u: "example:John" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "Jane" } }
  ]
}

DESCRIBE example:John ?knownPerson WHERE { example:John foaf:knows ?knownPerson . }

mongo$ viewsCol.findOne({_id: {r: "example:John", t: "v_knows"}})

Page 49: Using MongoDB as a graph database - 2014 redux

{
  _id: { r: "example:John", t: "v_knows" },
  graphs: [
    { _id: "example:John", "foaf:knows": { u: "example:Jane" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "John" } },
    { _id: "example:Jane", "foaf:knows": { u: "example:John" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "Jane" } }
  ],
  _impactIndex: ["example:Jane", "example:John"]
}
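One plausible use of `_impactIndex` is to work out which precomputed views go stale when a resource changes. The sketch below shows that lookup in memory; the function name `viewsAffectedBy` and the second view (for a made-up `example:Mary`) are assumptions for illustration, and the slides do not show how Tripod performs this step.

```javascript
const views = [
  { _id: { r: "example:John", t: "v_knows" }, _impactIndex: ["example:Jane", "example:John"] },
  // Hypothetical extra view so the filter has something to exclude.
  { _id: { r: "example:Mary", t: "v_knows" }, _impactIndex: ["example:Mary"] },
];

// Return the ids of every view whose impact index mentions a changed resource.
function viewsAffectedBy(allViews, changedIds) {
  const changed = new Set(changedIds);
  return allViews
    .filter((view) => view._impactIndex.some((id) => changed.has(id)))
    .map((view) => view._id);
}

console.log(viewsAffectedBy(views, ["example:Jane"]));
```

Against a real collection the same question could be asked with a query on the `_impactIndex` array field, which is why keeping it as a flat list of ids is convenient.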

Page 50: Using MongoDB as a graph database - 2014 redux

View specification

{
  "_id": "v_knows",
  "type": ["foaf:Person"],
  "from": "CBD_people",
  "joins": {
    "foaf:knows": {}
  }
}
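To make the specification concrete, here is a minimal sketch of how a `joins` tree could drive view generation over an in-memory store. It is not the Tripod implementation: `buildView` and `collect` are hypothetical names, and the `type`, `from`, `followSequence` and `maxJoins` options are deliberately ignored.

```javascript
// In-memory stand-in for the CBD collection, keyed by _id.
const store = new Map([
  ["example:John", { _id: "example:John", "foaf:knows": { u: "example:Jane" }, "foaf:name": { l: "John" } }],
  ["example:Jane", { _id: "example:Jane", "foaf:knows": { u: "example:John" }, "foaf:name": { l: "Jane" } }],
]);

const spec = { _id: "v_knows", type: ["foaf:Person"], from: "CBD_people", joins: { "foaf:knows": {} } };

// Add doc to the view, then follow each predicate named in the joins tree.
function collect(doc, joins, cbds, graphs, seen) {
  if (!doc || seen.has(doc._id)) return; // missing target or already included
  seen.add(doc._id);
  graphs.push(doc);
  for (const [predicate, joinSpec] of Object.entries(joins || {})) {
    for (const value of [].concat(doc[predicate] || [])) {
      if (value.u) collect(cbds.get(value.u), joinSpec.joins, cbds, graphs, seen);
    }
  }
}

function buildView(viewSpec, rootId, cbds) {
  const graphs = [];
  collect(cbds.get(rootId), viewSpec.joins, cbds, graphs, new Set());
  return {
    _id: { r: rootId, t: viewSpec._id },
    graphs,
    _impactIndex: graphs.map((g) => g._id).sort(),
  };
}

const view = buildView(spec, "example:John", store);
console.log(JSON.stringify(view, null, 2));
```

Because the `foaf:knows` join has no nested `joins`, Jane is included but her own links are not followed, which keeps the view bounded.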

Page 51: Using MongoDB as a graph database - 2014 redux

More complex example

{
  "_id": "v_resources",
  "type": ["resourcelist:Resource"],
  "from": "CBD_resources",
  "joins": {
    "dct:partOf": {
      "joins": {
        "bibo:authorList": { "joins": { "followSequence": { "maxJoins": 50 } } },
        "bibo:editorList": { "joins": { "followSequence": { "maxJoins": 50 } } },
        "dct:publisher": {}
      }
    },
    "dct:isPartOf": {
      "joins": {
        "bibo:authorList": { "joins": { "followSequence": { "maxJoins": 50 } } },
        "bibo:editorList": { "joins": { "followSequence": { "maxJoins": 50 } } },
        "dct:publisher": {}
      }
    },
    "bibo:authorList": { "joins": { "followSequence": { "maxJoins": 50 } } },
    "bibo:editorList": { "joins": { "followSequence": { "maxJoins": 50 } } },
    "dct:publisher": {}
  }
}

Page 52: Using MongoDB as a graph database - 2014 redux

What about tabular data?

• We also have tables and table specs

• Conceptually the same as views

• Instead of an array of graphs we have computed columns for complex tabular queries

• You can page, limit, offset results just like you’d expect
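Paging over precomputed rows can be sketched as sort, skip and take. The rows below are invented for illustration apart from the title "Feminism & psychology", which appears in the deck's sample row; the helper name `page` and the `value.title` column are assumptions, not Tripod's table API.

```javascript
// Hypothetical table rows: one document per row, computed columns under "value".
const rows = [
  { _id: { r: "example:bookmark3", type: "t_user_resources" }, value: { title: "Gender studies" } },
  { _id: { r: "example:bookmark1", type: "t_user_resources" }, value: { title: "Applied ethics" } },
  { _id: { r: "example:bookmark2", type: "t_user_resources" }, value: { title: "Feminism & psychology" } },
];

// Sort by one computed column, then apply offset and limit.
function page(allRows, { sortBy, offset = 0, limit = 10 }) {
  const sorted = [...allRows].sort((a, b) => {
    const x = a.value[sortBy];
    const y = b.value[sortBy];
    return x < y ? -1 : x > y ? 1 : 0;
  });
  return sorted.slice(offset, offset + limit);
}

const secondPage = page(rows, { sortBy: "title", offset: 1, limit: 2 });
console.log(secondPage.map((row) => row.value.title));
```

Since each row is a flat document, the equivalent on a real collection would be an ordinary sorted query with skip and limit.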

Page 53: Using MongoDB as a graph database - 2014 redux

{
  "_id" : {
    "r" : "http://example.com/users/FC44E153-161C-C199-DBAB-4DDE13F76F9B/bookmarks/1ABE1B4B-A68C-90E4-41DB-AF132854770F",
    "type" : "t_user_resources"
  },
  "value" : {
    "_impactIndex" : [
      {
        "r" : "http://example.com/users/FC44E153-161C-C199-DBAB-4DDE13F76F9B/bookmarks/1ABE1B4B-A68C-90E4-41DB-AF132854770F",
        "c" : "tenantContexts:DefaultGraph"
      },
      {
        "r" : "tenantResources:7AB1D8E3-5D74-D07F-41E7-56206CFEC8EE",
        "c" : "tenantContexts:DefaultGraph"
      }
    ],
    "collection" : "http://example.com/users/FC44E153-161C-C199-DBAB-4DDE13F76F9B/bookmarks",
    "createdDate" : "2011-02-08T15:59:45+00:00",
    "resourceUri" : "tenantResources:7AB1D8E3-5D74-D07F-41E7-56206CFEC8EE",
    "note" : "ELECTRONIC",
    "title" : "Feminism & psychology",
    "type" : [
      "resourcelist:Resource",
      "bibo:Journal"
    ]
  }
}

Page 54: Using MongoDB as a graph database - 2014 redux

Database layout

talis-rs:PRIMARY> show collections
CBD_config          (r/w)
CBD_draft           (r/w)
CBD_events          (r/w)
CBD_jobs            (r/w)
CBD_lists           (r/w)
CBD_nodes           (r/w)
CBD_resources       (r/w)
CBD_reviews         (r/w)
CBD_service         (r/w)
CBD_user_lists      (r/w)
CBD_user_resources  (r/w)
CBD_users           (r/w)
table_rows          (read only)
views               (read only)

Page 55: Using MongoDB as a graph database - 2014 redux

Fast and slow saves, you decide.

Page 56: Using MongoDB as a graph database - 2014 redux

Tripod save()

• Based on change sets, you supply the old and new graphs

• CBDs updated immediately. Write ahead transaction log for multi-CBD writes

• Choice per save on whether to update views/tables sync or async (eventually consistent)

• Async adds jobs to a Mongo based queue
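The change-set idea can be sketched as a diff between the old and new graphs, followed by a per-save choice between regenerating views immediately or queuing the work. Everything here is hypothetical scaffolding: the function names, the `background` flag and the plain array standing in for the Mongo based queue are assumptions, and the write-ahead transaction log is not modelled.

```javascript
// Serialise a triple so that equal triples produce the same key.
const key = (t) => JSON.stringify([t.s, t.p, t.o]);

// Diff two graphs into the triples to remove and the triples to add.
function changeSet(oldGraph, newGraph) {
  const oldKeys = new Set(oldGraph.map(key));
  const newKeys = new Set(newGraph.map(key));
  return {
    removals: oldGraph.filter((t) => !newKeys.has(key(t))),
    additions: newGraph.filter((t) => !oldKeys.has(key(t))),
  };
}

// Hypothetical save(): compute the change set, then update views now or later.
function save(oldGraph, newGraph, { background, queue, regenerate }) {
  const changes = changeSet(oldGraph, newGraph);
  const touched = [...changes.removals, ...changes.additions].map((t) => t.s);
  const subjects = [...new Set(touched)];
  if (background) queue.push({ subjects }); // eventually consistent
  else regenerate(subjects);                // views are fresh when save() returns
  return changes;
}

const oldGraph = [{ s: "example:John", p: "foaf:name", o: { l: "John" } }];
const newGraph = [{ s: "example:John", p: "foaf:name", o: { l: "Johnny" } }];
const queue = [];
const regenerated = [];
const changes = save(oldGraph, newGraph, {
  background: true,
  queue,
  regenerate: (subjects) => regenerated.push(...subjects),
});
console.log(changes, queue);
```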

Page 57: Using MongoDB as a graph database - 2014 redux

Measure everything

Page 58: Using MongoDB as a graph database - 2014 redux

[Chart: Query volume, complex vs. simple]

Page 59: Using MongoDB as a graph database - 2014 redux

[Chart: Query volume, graph vs. tabular]

Page 60: Using MongoDB as a graph database - 2014 redux

[Chart: Query speed, complex vs. simple graph query]

Page 61: Using MongoDB as a graph database - 2014 redux

Hardware

• Real tin, 2x Dell low-end rack mount servers

• 96GB RAM, 24 cores

• RAID-10 disks, non-SSD

• Keep ‘em on the same LAN as your app servers

• About the same to lease per month as a couple of c3.4xlarge (30GB, 32 vCPU)

• We’re about to add a similar second cluster, 144GB

Page 62: Using MongoDB as a graph database - 2014 redux

Why Mongo? RTFM, not HN comment feeds.

But seriously it could have been n other document DBs

Page 63: Using MongoDB as a graph database - 2014 redux

There’s lots more: Search, named graphs (quads), data functions

Page 64: Using MongoDB as a graph database - 2014 redux

Future roadmap

• Multi-cluster <- IN PROGRESS

• NodeJS port <- IN PROGRESS

• Choose a better solution for the transaction log (tlog), probably PostgreSQL

• Background queue -> redis and resque

• Chainable API

• Spout of updates for Apache Storm

• Versioned views/tables config

Page 65: Using MongoDB as a graph database - 2014 redux

Aperture: Annotate your models to persist to graph

Page 67: Using MongoDB as a graph database - 2014 redux

tripod-php code…

…same in aperture

Page 68: Using MongoDB as a graph database - 2014 redux

@talis

facebook.com/talisgroup

+44 (0) 121 374 2740

[email protected]

48 Frederick Street
Birmingham
B1 3HN