Transcript
Page 1: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Building a Social Platform with MongoDB

MongoDB Inc
Darren Wood & Asya Kamsky

#MongoDBWorld

Page 2: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Building a Social Platform

Part 2: Managing the Social Graph

Page 3: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Socialite

• Open Source
• Reference Implementation
– Various Fanout Feed Models
– User Graph Implementation
– Content storage
• Configurable models and options
• REST API in Dropwizard (Yammer)
– https://dropwizard.github.io/dropwizard/
• Built-in benchmarking

https://github.com/10gen-labs/socialite

Page 4: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Architecture

[Architecture diagram; recoverable labels: Graph Service, Proxy, Content Proxy]

Page 5: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Graph Data - Social

[Diagram: John, Kate, Bob and Pete connected by four "follows" edges]

Page 6: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Graph Data - Social

[Diagram: John, Kate, Bob and Pete connected by four "follows" edges, plus the annotation "Recommendation ?"]

Page 7: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Graph Data - Promotional

[Diagram: John, Kate, Bob, Pete and Acme Soda connected by "follows" and "Mention" edges, plus the annotation "Recommendation ?"]

Page 8: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Graph Data - Everywhere

• Retail
– Complex product catalogues
– Product recommendation engines
• Manufacturing and Logistics
– Tracing failures to faulty component batches
– Determining fallout from supply interruption
• Healthcare
– Patient/Physician interactions

Page 9: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Design Considerations

Page 10: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

The Tale of Two Biebers

VS

Page 12: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Follower Churn

• Tempting to focus on scaling content
• Follow requests rival message send rates
• Twitter enforces per-day follow limits

Page 13: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Edge Metadata

• Models – friends/followers
• Requirements typically start simple
• Add Groups, Favorites, Relationships

Page 14: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Storing Graphs in MongoDB

Page 15: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Option One – Embedding Edges

Page 16: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Embedded Edge Arrays

• Storing connections with user (popular choice)
– Most compact form
– Efficient for reads
• However…
– User documents grow
– Upper limit on degree (document size)
– Difficult to annotate (and index) edges

{
  "_id" : "djw",
  "fullname" : "Darren Wood",
  "country" : "Australia",
  "followers" : [ "jsr", "ian" ],
  "following" : [ "jsr", "pete" ]
}
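The trade-off above can be sketched with a small in-memory model. This is only an illustration, not the MongoDB API or Socialite's code: the `users` map, `addToSet` helper and `follow` function are hypothetical names, and `addToSet` merely mimics the behaviour of MongoDB's `$addToSet` update operator.

```javascript
// In-memory stand-in for a users collection keyed by _id (assumption, not a real driver).
const users = new Map([
  ["djw", { _id: "djw", followers: [], following: [] }],
  ["jsr", { _id: "jsr", followers: [], following: [] }],
]);

// Append a value only if it is absent, similar in spirit to $addToSet.
function addToSet(array, value) {
  if (!array.includes(value)) array.push(value);
}

// With embedded edges, one follow touches two user documents,
// and both arrays keep growing with the user's degree.
function follow(followerId, targetId) {
  addToSet(users.get(followerId).following, targetId);
  addToSet(users.get(targetId).followers, followerId);
}

follow("djw", "jsr");
follow("djw", "jsr"); // a repeated follow leaves the arrays unchanged
```

Reads stay cheap (one document holds every edge), but each write has to keep two documents in step.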

Page 17: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Embedded Edge Arrays

• Creating Rich Graph Information
– Can become cumbersome

{
  "_id" : "djw",
  "fullname" : "Darren Wood",
  "country" : "Australia",
  "friends" : [
    { "uid" : "jsr", "grp" : "school" },
    { "uid" : "ian", "grp" : "work" }
  ]
}

{
  "_id" : "djw",
  "fullname" : "Darren Wood",
  "country" : "Australia",
  "friends" : [ "jsr", "ian" ],
  "group" : [ "school", "work" ]
}
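A rough sketch of why both layouts get cumbersome, using plain JavaScript objects shaped like the two documents above. The helper names (`friendsInGroupA`, `friendsInGroupB`, `removeFriendB`) are made up for illustration.

```javascript
// Layout A: one subdocument per edge.
const userA = {
  _id: "djw",
  friends: [{ uid: "jsr", grp: "school" }, { uid: "ian", grp: "work" }],
};

// Layout B: parallel arrays; position i of "group" annotates position i of "friends".
const userB = { _id: "djw", friends: ["jsr", "ian"], group: ["school", "work"] };

function friendsInGroupA(user, grp) {
  return user.friends.filter((f) => f.grp === grp).map((f) => f.uid);
}

function friendsInGroupB(user, grp) {
  return user.friends.filter((_, i) => user.group[i] === grp);
}

// In layout B every removal must splice both arrays at the same index,
// otherwise the annotations silently drift out of alignment.
function removeFriendB(user, uid) {
  const i = user.friends.indexOf(uid);
  if (i === -1) return;
  user.friends.splice(i, 1);
  user.group.splice(i, 1);
}
```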

Page 18: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Option Two – Edge Collection

Page 19: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Edge Collections

• Document per edge

• Very flexible for adding edge data

> db.followers.findOne()
{
  "_id" : ObjectId(…),
  "from" : "djw",
  "to" : "jsr"
}

> db.friends.findOne()
{
  "_id" : ObjectId(…),
  "from" : "djw",
  "to" : "jsr",
  "grp" : "work",
  "ts" : Date("2013-07-10")
}
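The flexibility of a document per edge can be illustrated with an in-memory array standing in for the `friends` collection. This is a sketch under that assumption; `addEdge` is a hypothetical helper, not part of Socialite or the MongoDB driver.

```javascript
const friends = []; // in-memory stand-in for the friends edge collection

// One document per edge; extra fields (grp, ts, ...) simply ride along on the edge.
function addEdge(collection, from, to, metadata = {}) {
  const edge = { from, to, ...metadata };
  collection.push(edge);
  return edge;
}

addEdge(friends, "djw", "jsr", { grp: "work", ts: new Date("2013-07-10") });
addEdge(friends, "djw", "ian"); // a plain edge with no annotations

// Edge metadata is queryable like any other field.
const workFriends = friends
  .filter((e) => e.from === "djw" && e.grp === "work")
  .map((e) => e.to);
```

Adding a new annotation later needs no change to the user documents, only to the edges that carry it.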

Page 20: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Operational issues

• Updates of embedded arrays
– cost grows non-linearly with the number of indexed array elements
• Updating edge collection => inserts
– cost grows close to linearly with the existing number of edges per user

Page 21: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Edge Insert Rate

Page 22: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Edge Collection Indexing Strategies

Page 23: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Finding Followers

Consider our single follower collection:

> db.followers.find({from : "djw"}, {_id:0, to:1})
{ "to" : "jsr" }

Using index:

{
  "v" : 1,
  "key" : { "from" : 1, "to" : 1 },
  "unique" : true,
  "ns" : "socialite.followers",
  "name" : "from_1_to_1"
}

• "key": covered index when searching on "from" for all followers
• "unique": specify only if multiple edges cannot exist
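To make the "covered" idea concrete, here is a toy model of a compound unique index on (from, to). It is an assumption-laden sketch: `EdgeIndex` and its methods are hypothetical, and a real MongoDB index is a B-tree rather than a re-sorted array. The point is only that the scan answers the query from index entries alone, without reading edge documents.

```javascript
// Toy model of a { from: 1, to: 1 } unique index: sorted [from, to] pairs.
class EdgeIndex {
  constructor() {
    this.entries = [];
  }

  insert(from, to) {
    // "unique": true rejects a second identical edge.
    if (this.entries.some(([f, t]) => f === from && t === to)) {
      throw new Error(`duplicate edge ${from} -> ${to}`);
    }
    this.entries.push([from, to]);
    // Keep entries ordered by "from", then by "to".
    this.entries.sort(([f1, t1], [f2, t2]) =>
      f1 !== f2 ? (f1 < f2 ? -1 : 1) : t1 < t2 ? -1 : t1 > t2 ? 1 : 0
    );
  }

  // Prefix scan on "from"; the "to" values come straight out of the index entries.
  scanFrom(from) {
    return this.entries.filter(([f]) => f === from).map(([, t]) => t);
  }
}

const index = new EdgeIndex();
index.insert("djw", "pete");
index.insert("djw", "jsr");
index.insert("ian", "djw");
const result = index.scanFrom("djw"); // ["jsr", "pete"]
```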

Page 24: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Finding Following

What about who a user is following?
Can use a reverse covered index:

{
  "v" : 1,
  "key" : { "from" : 1, "to" : 1 },
  "unique" : true,
  "ns" : "socialite.followers",
  "name" : "from_1_to_1"
}
{
  "v" : 1,
  "key" : { "to" : 1, "from" : 1 },
  "unique" : true,
  "ns" : "socialite.followers",
  "name" : "to_1_from_1"
}

Notice the flipped field order in the "key" of the second index.

Page 25: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Finding Following

Wait! There is an issue with the reverse index: sharding!

{
  "v" : 1,
  "key" : { "from" : 1, "to" : 1 },
  "unique" : true,
  "ns" : "socialite.followers",
  "name" : "from_1_to_1"
}
{
  "v" : 1,
  "key" : { "to" : 1, "from" : 1 },
  "unique" : true,
  "ns" : "socialite.followers",
  "name" : "to_1_from_1"
}

If we shard this collection by "from", looking up followers for a specific user is "targeted" to a single shard.

To find who the user is following, however, the query must be scatter-gathered across all shards.
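The routing difference can be sketched as follows. Everything here is assumed for illustration: the three-shard cluster, the toy string hash in `shardFor`, and the `shardsToQuery` helper. MongoDB's actual router works from chunk ranges of the shard key, but the targeted versus scatter-gather distinction is the same.

```javascript
const SHARD_COUNT = 3; // assumed cluster size

// Hypothetical placement rule: hash the shard key value to pick a shard.
function shardFor(value) {
  let hash = 0;
  for (const ch of value) hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
  return hash % SHARD_COUNT;
}

// Which shards must a query touch when the collection is sharded on "from"?
function shardsToQuery(query) {
  if (query.from !== undefined) {
    return [shardFor(query.from)]; // targeted: the shard key is in the query
  }
  // No shard key in the query: every shard has to be asked.
  return Array.from({ length: SHARD_COUNT }, (_, i) => i);
}

const followerLookup = shardsToQuery({ from: "djw" }); // one shard
const followingLookup = shardsToQuery({ to: "djw" });  // all shards
```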

Page 26: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Dual Edge Collections

Page 27: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Dual Edge Collections

• When "following" queries are common
– Not always the case
– Consider overhead carefully
• Can use dual collections for storing edges
– One for each direction
– Edges are duplicated, reversed
– Can be sharded independently
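A minimal in-memory sketch of the dual-collection idea. It assumes, as the queries in the slides suggest, that each collection is always looked up by its "from" field: in `followers` that field holds the user being followed, in `following` it holds the user doing the following. The arrays and helper names are hypothetical.

```javascript
const followers = []; // edge: { from: followed user, to: follower }
const following = []; // edge: { from: follower, to: followed user }

// Every follow writes two edge documents, one per direction.
function follow(followerId, targetId) {
  followers.push({ from: targetId, to: followerId });
  following.push({ from: followerId, to: targetId });
}

// Both lookups filter on "from", so each collection can be sharded on "from"
// and each lookup can be routed to a single shard.
const followersOf = (id) => followers.filter((e) => e.from === id).map((e) => e.to);
const followedBy = (id) => following.filter((e) => e.from === id).map((e) => e.to);

follow("jsr", "djw");
follow("ian", "djw");
follow("djw", "pete");
```

The cost is the overhead the slide warns about: twice the writes and twice the storage per edge.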

Page 28: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Edge Query Rate Comparison
Number of shards vs number of queries

Shards | Followers collection with forward and reverse indexes | Two collections (followers, following), one index each
1 | 10,000 | 10,000
3 | 90,000 | 30,000
6 | 360,000 | 60,000
12 | 1,440,000 | 120,000
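One reading that reproduces the figures in the table: assume the workload scales at 10,000 lookups per shard, so N shards receive 10,000 × N lookups in total. If each lookup is scatter-gathered it lands on all N shards (10,000 × N × N shard-level queries); if it is targeted it lands on one. This interpretation and the names below are assumptions, not something the slides state.

```javascript
const QUERIES_PER_SHARD = 10000; // assumed baseline, taken from the 1-shard row

function shardLevelQueries(shards) {
  const userQueries = QUERIES_PER_SHARD * shards; // workload grows with the cluster
  return {
    singleCollection: userQueries * shards, // scatter-gather: every shard sees every query
    dualCollections: userQueries,           // targeted: one shard per query
  };
}

const rows = [1, 3, 6, 12].map((n) => ({ shards: n, ...shardLevelQueries(n) }));
```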

Page 29: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Follower Counts

How to determine these counts?

Can use the edge indexes:

> db.followers.find({_f : "djw"}).count()
> db.following.find({_f : "djw"}).count()

However, this can be heavyweight
– Especially for rendering the landing page
– Consider maintaining counts on the user document
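Maintaining counts on the user document might look like the sketch below. It is an in-memory illustration only: the `followerCount` and `followingCount` field names are hypothetical, and in MongoDB the increments would typically be issued as `$inc` updates alongside the edge insert.

```javascript
// In-memory stand-ins for the user documents and the edge set (assumptions).
const users = new Map([
  ["djw", { _id: "djw", followerCount: 0, followingCount: 0 }],
  ["jsr", { _id: "jsr", followerCount: 0, followingCount: 0 }],
]);
const edges = new Set(); // "follower->target" strings

function follow(followerId, targetId) {
  const key = `${followerId}->${targetId}`;
  if (edges.has(key)) return false; // only a new edge may change the counts
  edges.add(key);
  users.get(followerId).followingCount += 1;
  users.get(targetId).followerCount += 1;
  return true;
}

const first = follow("jsr", "djw");
const repeat = follow("jsr", "djw"); // duplicate follow, counts stay put
```

Rendering a profile then reads two integers from one document instead of running two count queries.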

Page 30: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Socialite User Service

• Manages user profiles and the follower graph
• Supports arbitrary user data passthrough
• Options for graph storage
– Uses edge collections (can shard by _f)
– Options for maintaining separate follower/ing graphs
– Storing counts vs counting

{
  "_id" : ObjectId("52cd1d32a0ee9a1a76d369bb"),
  "_f" : "jsr",
  "_t" : "djw"
}
{
  "v" : 1,
  "key" : { "_f" : 1, "_t" : 1 },
  "unique" : true
}

Page 31: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Next up @ 11:50am : Scaling the Data Feed

• Delivering user content to followers

• Comparing fanout models

• Caching user timelines for fast retrieval

• Embedding vs Linking Content
