Triple R – Riak, Redis and RabbitMQ at XING Dr. Stefan Kaes, Sebastian Röbke
NoSQL matters Cologne, April 27, 2013
ActivityStream Intro
3 Types of Feeds
News Feed
Me Feed
Company Feed
Activity Creation
ActivityStream
POST /activitystream/activities
ActivityStream
Events App: events.event.created, events.participation.changed, ...
Groups App: groups.member.joined, groups.article.created, ...
User App: users.contact.created, users.profile.updated, ...
etc.
All of these events flow into the ActivityStream.
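The events above are routed by key (e.g. via a RabbitMQ topic exchange). A minimal sketch of AMQP-style topic matching, which decides which domain events become activities; the helper is illustrative, since RabbitMQ's topic exchange does this natively:

```ruby
# AMQP-style topic matching on routing keys like "groups.member.joined".
# "*" matches exactly one word; "#" matches a run of words (simplified
# here to one or more, whereas AMQP's "#" also matches zero words).
def topic_match?(pattern, routing_key)
  regex = pattern.split('.').map do |part|
    case part
    when '#' then '[^.]+(\.[^.]+)*'
    when '*' then '[^.]+'
    else Regexp.escape(part)
    end
  end.join('\.')
  /\A#{regex}\z/.match?(routing_key)
end
```

For example, a binding on `events.#` would receive both `events.event.created` and `events.participation.changed`.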
Old Approach
Activity: INSERT INTO `activities` ...
Comment: INSERT INTO `comments` ...
Like: INSERT INTO `likes` ...
several hundred millions
Efficient Activity Creation
But ... slow reads
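Writes were three cheap INSERTs, but reads were not. A hypothetical sketch of the kind of read query involved (table and column names are illustrative): rendering one news feed meant joining activities against comments and likes across several hundred million rows, after first fetching contact ids from a remote service.

```ruby
# Illustrative read-path query for the old, purely relational design.
# The IN (?) placeholder is filled with contact ids that first have to
# be fetched from the User App over HTTP on every feed render.
FEED_QUERY = <<~SQL
  SELECT a.*,
         COUNT(DISTINCT c.id) AS comment_count,
         COUNT(DISTINCT l.id) AS like_count
  FROM activities a
  LEFT JOIN comments c ON c.activity_id = a.id
  LEFT JOIN likes    l ON l.activity_id = a.id
  WHERE a.actor_id IN (?)
  GROUP BY a.id
  ORDER BY a.created_at DESC
  LIMIT 20
SQL
```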
ActivityStream (activities, likes, comments, etc.) reads from:
‣ User App: GET contacts, GET privacy settings
‣ Groups App: GET group memberships
‣ Companies App: GET followed companies
‣ Settings
?
Activities immediately visible
Consistency
SQL databases are well understood
Database master is a single point of failure
Sharding
Unsatisfactory Read Performance
New Approach
Materialized Feeds
ActivityStream stores activities, likes and comments together with materialized news feeds, me feeds and company feeds in its own Activity Storage. External data (User App: GET contacts, etc.) is consulted when an activity is created.
Requirements
‣ Better read performance
‣ Activities created by me must be visible to myself immediately
‣ Activities created by others should appear in my stream within a reasonable time frame
‣ Storage layer must tolerate high read and write loads
‣ Storage layer must provide easy capacity scaling
‣ Low maintenance
Option 1: Do-it-yourself SQL database design
Option 2: Off-the-shelf NoSQL database
We chose Riak
We tend to view it as a highly available distributed hash table
Eventual consistency/conflict resolution is the hard part
Bounded-size feeds are easy (see http://www.paperplanes.de/2011/12/15/storing-timelines-in-riak.html)
Unbounded feeds are much harder
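The bounded-timeline idea from the linked post can be sketched as follows; the limit, the `[timestamp, id]` entry shape and the function names are illustrative:

```ruby
FEED_LIMIT = 500  # illustrative bound on feed length

# Bounded feed in the style of "storing timelines in Riak": writers
# prepend the newest entry and trim to a fixed size. When Riak hands
# back siblings (concurrent writes), resolution is a union of both
# lists, re-sorted newest-first and trimmed again.
def push(feed, entry)
  ([entry] + feed).first(FEED_LIMIT)
end

def resolve(feed_a, feed_b)
  (feed_a | feed_b).sort_by { |timestamp, _id| -timestamp }.first(FEED_LIMIT)
end
```

The trim is what makes this easy: conflicting replicas converge because both sides apply the same bound. Unbounded feeds get no such luxury, which is why chunking (below) was needed.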
Object Model
JSON Documents:
‣ Activities: 2P-Set
‣ Feeds: bounded list of chunk references, each carrying a chunk sequence number, a youngest activity ref, an oldest activity ref, and the size of the referenced chunk
‣ FeedChunk: 2P-Set
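A 2P-set (two-phase set) like the one used for Activities and FeedChunk documents can be sketched generically; this is a textbook CRDT sketch, not XING's actual implementation:

```ruby
require 'set'

# 2P-set: an add-set plus a remove-set ("tombstones"). An element is
# present iff it was added and never removed; once removed, it can
# never be re-added. Merging two replicas is a pairwise union, which
# makes concurrent writes (Riak siblings) resolvable deterministically.
class TwoPhaseSet
  attr_reader :adds, :removes

  def initialize(adds = Set.new, removes = Set.new)
    @adds    = adds
    @removes = removes
  end

  def add(element)
    @adds.add(element)
  end

  def remove(element)
    # Only elements that were added can be removed.
    @removes.add(element) if @adds.include?(element)
  end

  def member?(element)
    @adds.include?(element) && !@removes.include?(element)
  end

  # Sibling resolution: union both sides; removals win over additions.
  def merge(other)
    TwoPhaseSet.new(@adds | other.adds, @removes | other.removes)
  end
end
```

The "removals win" property is exactly what a likes/comments set wants: a delete observed on either replica survives the merge.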
The Migration
Incremental rollout
Part 1:
From old to new
Let’s start simple!
Replicating some data
Old ActivityStream publishes
activity.created / activity.deleted
comment.created / comment.deleted
like.created / like.deleted
to the data migration processors, which feed the New ActivityStream.
Measuring performance
Old ActivityStream publishes
newsfeed.viewed / mefeed.viewed / companyfeed.viewed
to shadow query processors, which (alongside the data migration processors) exercise the New ActivityStream.
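A shadow query processor can be sketched like this; `new_system`, `fetch_feed` and the event shape are illustrative names, not the actual API:

```ruby
# On every *.viewed event, replay the same read against the new system
# and record how long it took, so the new stack is measured under real
# traffic while the old system still serves all responses.
def shadow_query(event, new_system, timings)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  new_system.fetch_feed(event[:feed_type], event[:user_id])
  timings << Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
rescue StandardError
  # Errors in the shadow path must never affect the user-facing request.
end
```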
Part 2:
From new to old
Old ActivityStream
New ActivityStream
DELETE /activitystream/activities/{id}
POST /activitystream/activities
DELETE /activitystream/activities/{activity_id}/comments/{id}
POST /activitystream/activities/{activity_id}/comments
DELETE /activitystream/activities/{activity_id}/likes/{user_id}
PUT /activitystream/activities/{activity_id}/likes/{user_id}
Beta Users A, B and C write to the new ActivityStream, which replays each call (including the activity id) against the old ActivityStream via the routes above.
Part 3:
What about the old data?
Bulk Data Migration: Failed Version 1
1. Reset data in the new system
2. Query the old system REST API for the feeds
3. Store them in the new system
4. Switch to the new system
This was naive: the old system was way too slow to return the millions of feeds in their full length.
Bulk Data Migration: Failed Version 2
1. Reset data in the new system
2. Read all activities based on a dump of the old system
3. Publish “created” messages to RabbitMQ for each activity/comment/like
4. Let the new system build its data structures
5. Switch to the new system
This was naive: you can’t replay the history of 2.5 years this way.
Bulk Data Migration: Successful Version
1. Reset data in the new system
2. Obtain data dump from old system
3. Extract data from the dumps and compute a representation of the feeds in Redis with a massive amount of batch processors
4. Use this data to build up the structures in the new system
5. Switch to the new system
It worked!
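The core of step 3 can be sketched as a fold over the activity dump; a plain Hash stands in for Redis here, and the `me:`/`news:` key scheme plus `contacts_of` are illustrative. The real batch processors wrote to Redis (e.g. one list per feed key) so many workers could share intermediate state:

```ruby
# Fold a dump of old activities into per-user feed representations:
# each activity lands in its author's me feed and in the news feed of
# every contact of the author.
def build_feed_representation(activity_dump, contacts_of)
  feeds = Hash.new { |h, k| h[k] = [] }
  activity_dump.each do |activity|
    feeds["me:#{activity[:actor]}"] << activity[:id]
    (contacts_of[activity[:actor]] || []).each do |reader|
      feeds["news:#{reader}"] << activity[:id]
    end
  end
  feeds
end
```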
But...
A lot of additional code
Fragile, manual steps involved
A lot of technology:
RabbitMQ, Riak, Redis, Varnish
One run took 5 days
But it worked!
Current Status
New system is live for all users since 12/12/12
Old and new system were kept in sync till April 2013
In case of serious trouble, we could have switched to the old system within seconds
Performance goals have been met
                            Old system   New system
happy        t < 0.1s          0.17%       62.01%
satisfied    t < 0.5s         41.36%       99.71%
tolerating   0.5s ≤ t < 2s    58.20%        0.28%
frustrated   t ≥ 2s            0.44%        0.00%
Apdex Score                     0.70         1.00
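The Apdex score above follows from the same buckets, with T = 0.5s (so tolerating ends at 4T = 2s, matching the table; the sub-0.1s "happy" bucket simply counts as satisfied). A sketch of the calculation:

```ruby
# Apdex = (satisfied + tolerating / 2) / total samples,
# with satisfied: t < T and tolerating: T <= t < 4T.
def apdex(response_times, t = 0.5)
  satisfied  = response_times.count { |x| x < t }
  tolerating = response_times.count { |x| x >= t && x < 4 * t }
  ((satisfied + tolerating / 2.0) / response_times.size).round(2)
end
```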
Production setup
‣ 10 Riak Servers as Primary Cluster
‣ 10 Riak Servers as Backup Cluster (Multi-DC Replication)
‣ SSDs, Raid 0 and a proper Linux file I/O scheduler setting (noop)
‣ Bitcask storage backend
‣ 4 REST API Servers
‣ 4 Background Worker Servers
‣ Monitoring using Ganglia and Logjam (App Performance)
Lessons learned
‣ Eventual consistency sounds easy, but is hard to implement correctly in practice
‣ There’s a steep learning curve at the beginning
‣ High update rates and large objects don’t go together well if your storage system offers just get, put and delete operations
‣ Achieving high performance requires careful thought about data structures, algorithms and access patterns
‣ Building a new system from scratch is a lot easier than migrating an existing system
‣ Protobuffs API is faster than HTTP
‣ Use the best-performing JSON library you can find (Ruby: Oj gem)
‣ Avoid a full-blown ORM for Riak if you care about performance (Ruby: Ripple gem)
‣ At one point we saturated the Gigabit network cards on the Riak cluster
‣ This led to compressing all data before storing it on the cluster and breaking news feeds into chunks
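The compress-and-chunk fix can be sketched with Ruby's standard library; the chunk size is illustrative (the talk does not state the real value), and the actual Riak store/fetch calls are omitted:

```ruby
require 'json'
require 'zlib'

CHUNK_SIZE = 50  # activities per feed chunk; illustrative value

# Split a feed into fixed-size chunks and deflate each chunk's JSON
# before it goes over the wire to the Riak cluster.
def chunk_feed(activity_ids)
  activity_ids.each_slice(CHUNK_SIZE).to_a
end

def pack_chunk(chunk)
  Zlib::Deflate.deflate(JSON.generate(chunk))
end

def unpack_chunk(blob)
  JSON.parse(Zlib::Inflate.inflate(blob))
end
```

Chunking keeps individual Riak objects small despite unbounded feeds, and compression cuts the bytes per request on the saturated network links.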
The professional network www.xing.com
Thank you for your attention!
Dr. Stefan Kaes
Twitter: @stkaes
Sebastian Röbke
Twitter: @boosty
We’re hiring: [email protected]