Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Building a Social Platform with MongoDB

MongoDB Inc
Darren Wood & Asya Kamsky

#MongoDBWorld

Building a Social Platform

Part 2: Managing the Social Graph

Socialite

• Open Source
• Reference Implementation
  – Various Fanout Feed Models
  – User Graph Implementation
  – Content storage
• Configurable models and options
• REST API in Dropwizard (Yammer)
  – https://dropwizard.github.io/dropwizard/
• Built-in benchmarking

https://github.com/10gen-labs/socialite

Architecture

Graph Service

Graph Data - Social

[Diagram: users John, Kate and Pete connected by "follows" edges]

Graph Data - Social

[Diagram: users John, Kate and Pete connected by "follows" edges, plus a candidate edge labeled "Recommendation ?"]

Graph Data - Promotional

[Diagram: users John, Kate and Pete connected by "follows" edges, a brand node "Acme Soda" attached by a "Mention" edge, and a candidate edge labeled "Recommendation ?"]

Graph Data - Everywhere

• Retail
  – Complex product catalogues
  – Product recommendation engines
• Manufacturing and Logistics
  – Tracing failures to faulty component batches
  – Determining fallout from supply interruption
• Healthcare
  – Patient/Physician interactions

Design Considerations

The Tale of Two Biebers

Follower Churn

• Tempting to focus on scaling content
• Follow requests rival message send rates
• Twitter enforces per-day follow limits

Edge Metadata

• Models – friends/followers
• Requirements typically start simple
• Add Groups, Favorites, Relationships

Storing Graphs in MongoDB

Option One – Embedding Edges

Embedded Edge Arrays

• Storing connections with user (popular choice)
  – Most compact form
  – Efficient for reads
• However…
  – User documents grow
  – Upper limit on degree (document size)
  – Difficult to annotate (and index) edges

{
  "_id" : "djw",
  "fullname" : "Darren Wood",
  "country" : "Australia",
  "followers" : [ "jsr", "ian" ],
  "following" : [ "jsr", "pete" ]
}
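A minimal sketch of the embedded layout above, using plain JavaScript objects in place of MongoDB documents (no database is involved). The names `followEmbedded` and `approxSize` are hypothetical helpers, and the JSON text length is only a rough proxy for document size. It illustrates two of the drawbacks listed: every follow touches two user documents, and those documents grow with the user's degree.

```javascript
// In-memory stand-ins for user documents with embedded edge arrays.
const users = new Map([
  ["djw", { _id: "djw", followers: ["jsr", "ian"], following: ["jsr", "pete"] }],
  ["jsr", { _id: "jsr", followers: ["djw"], following: ["djw"] }],
  ["ian", { _id: "ian", followers: [], following: ["djw"] }],
  ["pete", { _id: "pete", followers: ["djw"], following: [] }],
]);

// One follow changes two user documents: the follower's "following"
// array and the target's "followers" array.
function followEmbedded(follower, target) {
  const f = users.get(follower);
  const t = users.get(target);
  if (!f || !t || f.following.includes(target)) return false;
  f.following.push(target);
  t.followers.push(follower);
  return true;
}

// Rough proxy for document size: length of the JSON text.
function approxSize(userId) {
  return JSON.stringify(users.get(userId)).length;
}
```

Calling `followEmbedded("pete", "djw")` appends to both documents, so `approxSize("djw")` increases with every new follower.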

Embedded Edge Arrays

• Creating Rich Graph Information
  – Can become cumbersome

{
  "_id" : "djw",
  "fullname" : "Darren Wood",
  "country" : "Australia",
  "friends" : [
    { "uid" : "jsr", "grp" : "school" },
    { "uid" : "ian", "grp" : "work" }
  ]
}

{
  "_id" : "djw",
  "fullname" : "Darren Wood",
  "country" : "Australia",
  "friends" : [ "jsr", "ian" ],
  "group" : [ "school", "work" ]
}

Option Two – Edge Collection

Edge Collections

• Document per edge

• Very flexible for adding edge data

> db.followers.findOne()
{
  "_id" : ObjectId(…),
  "from" : "djw",
  "to" : "jsr"
}

> db.friends.findOne()
{
  "_id" : ObjectId(…),
  "from" : "djw",
  "to" : "jsr",
  "grp" : "work",
  "ts" : Date("2013-07-10")
}

Operational issues

• Updates of embedded arrays
  – Cost grows non-linearly with the number of indexed array elements
• Updating edge collection => inserts
  – Cost grows close to linearly with the existing number of edges per user
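To illustrate why the edge-collection layout turns graph updates into plain inserts, here is a small in-memory sketch (an array of objects standing in for the collection, not real MongoDB calls). The helpers `addEdge` and `annotateEdge` are hypothetical names. The point is that adding or annotating an edge touches one small edge document and never a growing array inside a user document.

```javascript
// In-memory stand-in for an edge collection: one object per edge.
const friends = [];

// Adding an edge is a plain insert; `meta` holds optional annotations.
function addEdge(from, to, meta = {}) {
  const edge = { ...meta, from, to };
  friends.push(edge);
  return edge;
}

// Annotating an edge later changes only that one edge document.
function annotateEdge(from, to, meta) {
  const edge = friends.find((e) => e.from === from && e.to === to);
  if (!edge) return null;
  Object.assign(edge, meta);
  return edge;
}

addEdge("djw", "jsr", { grp: "work", ts: new Date("2013-07-10") });
addEdge("djw", "ian");
annotateEdge("djw", "ian", { grp: "school" });

// Edges can then be filtered on their own fields.
const workFriends = friends
  .filter((e) => e.from === "djw" && e.grp === "work")
  .map((e) => e.to);
```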

Edge Insert Rate

Edge Collection Indexing Strategies

Finding Followers

Consider our single follower collection:

> db.followers.find({from : "djw"}, {_id:0, to:1})
{ "to" : "jsr" }

Using index:

{
  "v" : 1,
  "key" : { "from" : 1, "to" : 1 },
  "unique" : true,
  "ns" : "socialite.followers",
  "name" : "from_1_to_1"
}

The index covers the query when searching on "from" for all followers.

Specify "unique" only if multiple edges between the same pair of users cannot exist.
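The idea behind the covered query can be sketched with a toy model of the { from, to } index: a sorted array of key pairs. This is an assumption-laden illustration, not how MongoDB implements its B-tree indexes, and `buildIndex`, `lowerBound` and `findTo` are hypothetical names. It shows that all edges for one "from" value sit next to each other, and that the "to" values can be read straight from the index keys without fetching the edge documents.

```javascript
// Order keys by "from" first, then by "to", like a compound index.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  if (a[1] !== b[1]) return a[1] < b[1] ? -1 : 1;
  return 0;
}

// Toy model of the { from: 1, to: 1 } index: sorted [from, to] pairs.
function buildIndex(edges) {
  return edges.map((e) => [e.from, e.to]).sort(compareKeys);
}

// Binary search for the first key whose "from" is >= the target.
function lowerBound(index, from) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < from) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Prefix scan: walk consecutive keys with the same "from" and
// return the "to" part of each key (no document lookup needed).
function findTo(index, from) {
  const out = [];
  for (let i = lowerBound(index, from); i < index.length && index[i][0] === from; i++) {
    out.push(index[i][1]);
  }
  return out;
}

const edges = [
  { from: "djw", to: "jsr" },
  { from: "ian", to: "djw" },
  { from: "djw", to: "ian" },
];
const fromToIndex = buildIndex(edges);
```

Here `findTo(fromToIndex, "djw")` returns the "to" values for "djw" in index order. Building the same structure over [to, from] pairs models the reverse index.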

Finding Following

What about who a user is following? Can use a reverse covered index:

{
  "v" : 1,
  "key" : { "to" : 1, "from" : 1 },
  "unique" : true,
  "ns" : "socialite.followers",
  "name" : "to_1_from_1"
}

Notice the flipped field order here.

Finding Following

Wait! There is an issue with the reverse index… SHARDING!

{
  "v" : 1,
  "key" : { "to" : 1, "from" : 1 },
  "unique" : true,
  "ns" : "socialite.followers",
  "name" : "to_1_from_1"
}

If we shard this collection by "from", looking up followers for a specific user is "targeted" to a single shard.

To find who the user is following, however, the query must be scatter-gathered across all shards.
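The difference between a targeted and a scatter-gather query can be sketched with an in-memory model of a collection sharded by "from". Everything here is illustrative: `shardFor` is a hypothetical string hash standing in for real shard-key routing, and the shard count of 3 is arbitrary.

```javascript
const SHARD_COUNT = 3; // arbitrary, for illustration only

// Hypothetical stand-in for shard-key routing: hash the "from" value.
function shardFor(key) {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % SHARD_COUNT;
}

const shards = Array.from({ length: SHARD_COUNT }, () => []);

function insertEdge(from, to) {
  shards[shardFor(from)].push({ from, to });
}

// Query on the shard key: routed to exactly one shard.
function findByFrom(from) {
  const shard = shards[shardFor(from)];
  return {
    shardsQueried: 1,
    result: shard.filter((e) => e.from === from).map((e) => e.to),
  };
}

// Query on "to": the shard key is unknown, so every shard is asked.
function findByTo(to) {
  const result = [];
  for (const shard of shards) {
    for (const e of shard) if (e.to === to) result.push(e.from);
  }
  return { shardsQueried: shards.length, result };
}

insertEdge("djw", "jsr");
insertEdge("ian", "jsr");
insertEdge("pete", "jsr");
insertEdge("djw", "pete");
```

`findByFrom("djw")` reports one shard queried, while `findByTo("jsr")` reports all of them, even though a reverse index on each shard makes the per-shard lookup cheap.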

Dual Edge Collections

When "following" queries are common
– Not always the case
– Consider overhead carefully

Can use dual collections
– One for each direction
– Edges are duplicated, reversed
– Can be sharded independently
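A minimal in-memory sketch of the dual-collection idea, using two arrays in place of the `followers` and `following` collections. The direction convention (each collection keyed by the user being asked about in `_f`) is an assumption inferred from the queries shown in this deck, and `followDual`, `followersOf` and `followingOf` are hypothetical names.

```javascript
const followers = []; // { _f: user, _t: one of that user's followers }
const following = []; // { _f: user, _t: someone that user follows }

// Record "follower follows target" once in each direction.
// Note: the two writes are independent here; a real system has to
// decide what happens if one succeeds and the other fails.
function followDual(follower, target) {
  following.push({ _f: follower, _t: target });
  followers.push({ _f: target, _t: follower });
}

// Both lookups filter on _f, so each collection could be sharded
// by _f and queried with a targeted lookup.
function followersOf(user) {
  return followers.filter((e) => e._f === user).map((e) => e._t);
}

function followingOf(user) {
  return following.filter((e) => e._f === user).map((e) => e._t);
}

followDual("jsr", "djw");
followDual("ian", "djw");
followDual("djw", "pete");
```

The cost is the overhead the slide warns about: every follow is two writes, and the edge data is stored twice.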

Edge Query Rate Comparison
Number of shards vs. number of queries

Number of shards   Followers collection with        Two collections (followers, following),
                   forward and reverse indexes      one index each
1                  10,000                           10,000
3                  90,000                           30,000
6                  360,000                          60,000
12                 1,440,000                        120,000
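The slides do not say how these figures were derived. They are consistent with a simple model, offered here purely as an assumption: client load scales with the cluster at 10,000 queries per shard, a targeted query touches one shard, and a scatter-gather query touches every shard. Under that model the single-collection column grows with the square of the shard count. The constant `QUERIES_PER_SHARD` and the function name are hypothetical.

```javascript
// Assumption: baseline load read off the one-shard row of the table.
const QUERIES_PER_SHARD = 10000;

// Shard-level query counts under the assumed model.
function shardLevelQueries(shardCount) {
  const clientQueries = QUERIES_PER_SHARD * shardCount;
  return {
    shards: shardCount,
    // Scatter-gather: every client query is sent to every shard.
    scatterGather: clientQueries * shardCount,
    // Targeted: every client query is sent to exactly one shard.
    targeted: clientQueries,
  };
}

const rows = [1, 3, 6, 12].map(shardLevelQueries);
```

For example, with 3 shards the model gives 30,000 client queries, hence 30,000 targeted shard-level queries but 90,000 when each query fans out to all 3 shards.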

Follower Counts

How to determine these counts?

Can use the edge indexes:

> db.followers.find({_f : "djw"}).count()
> db.following.find({_f : "djw"}).count()

However, this can be heavyweight
– Especially for rendering the landing page
– Consider maintaining counts on the user document
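Maintaining counts on the user document could look roughly like the following in-memory sketch. The field names `followerCount` and `followingCount`, the helper names, and the use of a `Set` to play the role of the unique edge index are all assumptions made for illustration. The counters change only when an edge is actually added or removed, so reading a count never needs a scan of the edges.

```javascript
const userDocs = new Map(); // userId -> { _id, followerCount, followingCount }
const edgeKeys = new Set(); // plays the role of a unique { _f, _t } index

function getUser(id) {
  if (!userDocs.has(id)) {
    userDocs.set(id, { _id: id, followerCount: 0, followingCount: 0 });
  }
  return userDocs.get(id);
}

function edgeKey(follower, target) {
  return `${follower}\u0000${target}`;
}

// Add the edge and bump both counters; duplicates leave counts alone.
function followWithCounts(follower, target) {
  const key = edgeKey(follower, target);
  if (edgeKeys.has(key)) return false;
  edgeKeys.add(key);
  getUser(follower).followingCount += 1;
  getUser(target).followerCount += 1;
  return true;
}

// Remove the edge and decrement both counters, if the edge existed.
function unfollowWithCounts(follower, target) {
  if (!edgeKeys.delete(edgeKey(follower, target))) return false;
  getUser(follower).followingCount -= 1;
  getUser(target).followerCount -= 1;
  return true;
}
```

The trade-off is extra writes on every follow and unfollow, and the possibility of counters drifting from the edges if one of those writes fails.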

Socialite User Service

• Manages user profiles and the follower graph
• Supports arbitrary user data passthrough
• Options for graph storage
  – Uses edge collections (can shard by _f)
  – Options for maintaining separate follower/following graphs
  – Storing counts vs counting

{
  "_id" : ObjectId("52cd1d32a0ee9a1a76d369bb"),
  "_f" : "jsr",
  "_t" : "djw"
}

{
  "v" : 1,
  "key" : { "_f" : 1, "_t" : 1 },
  "unique" : true,
  …
}

Next up @ 11:50am : Scaling the Data Feed

• Delivering user content to followers

• Comparing fanout models

• Caching user timelines for fast retrieval

• Embedding vs Linking Content

