Migration from SQL to MongoDB - A Case Study at TheKnot.com

© 2014 XO GROUP INC. ALL RIGHTS RESERVED.

XO Group Inc.

Membership and Community Team

Alexander Copquin - Senior Software Engineer

Vladimir Carballo - Senior Software Engineer

Favorites API Re-platforming

…a case study

Favorites API Re-platforming

• Architectures: SQL / .NET and Ruby / Mongo
• Reasons for migration
• Schema design
• RoR model design and implementation
• Migration strategies and systems
• Lessons learned

Our Favorites Feature

Favorites API

Features

• Add / Edit / Delete Object
• Manage Boards
• Get counts & stats
• RESTful API
• Rails
• JavaScript
• iOS
• Android

Stats

• 100,000,000+ “favorited” objects
• 760,000+ boards
• Avg. 55,000 new objects per day

Legacy Architecture

Legacy Benchmarks

• Database: 55 GB and growing…
• Avg. 45 rpm at peak times
• Avg. 80 ms response time for POST
• Avg. 460 ms response time for GET

Maxed

• DB reaching max. capacity for its setup
• Scalability problems
• Hard to modify schema
• Bad response times
• Very complex caching layer
• Out of line with company’s strategy

New Architecture

Scalable

• Easy to scale
• Flexible schema
• Fast response
• No cache layer
• Fast iteration / deploy
• TDD first and foremost
• At-a-glance monitoring of all layers

Implementation

What we persisted in the legacy schema

• UserId (primary key)
• UniqueId
• Url (unique per user)
• ImageUrl
• Name
• Description
• ObjectId (unique per application adding favorites)
• Category
• Timestamps
• Other

Favorites DB Legacy Schema

Sample queries

select top 10 UserFavoriteId, Name, Description, Url, ImageUrl
from userFavorites
where userId = '5174181997807393'

Sample queries

select top 5 grp.groupId, grp.Name as GroupName, fav.userFavoriteId,
       fav.name, fav.Description, fav.Url, fav.ImageUrl
from userFavoritesGroups grp
inner join userFavoritesGroupsItems grpItm on grp.GroupId = grpItm.GroupId
inner join userFavorites fav on grpItm.userFavoriteId = fav.userFavoriteId
where grp.userId = '5174181997807393'

Towards a new Schema and Persistence Layer

• Start with a clean slate
• Break with the past
• Persist only the relevant minimum data points
• Think and rethink relationships
• High performance
• Flexible
• Prototype different scenarios

First attempt

UserFavorites

Cons

• The document contains embedded documents that need to be accessed on their own
• Documents would grow without bound
• Most queries would be slow
• Indexes would be very expensive
• Tries too hard to imitate legacy

Second attempt

Board document with one recent favorite

Board document with more recent favorites

Favorite document located on different boards

Pros

• Document structure matches the data required on the view
• A Board document includes the 4 most recent favorites
• A Favorite document includes the list of boards it was added to
• Faster queries
• More control over the size of each document
• Better implementation of UX intent
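The two document shapes described on this slide can be sketched in plain Ruby, with hashes standing in for Mongo documents. This is only an illustration: the field name `recent_favorites` and the helper `add_favorite_to_board` are assumptions made for the example, not the authors' schema. The cap of 4 recent favorites per board and the list of boards on each favorite come from the slide.

```ruby
require 'securerandom'

RECENT_LIMIT = 4 # from the slide: a board embeds its 4 most recent favorites

# Hypothetical helper: link a favorite to a board and keep both documents in sync.
def add_favorite_to_board(board, favorite)
  favorite[:boards] |= [board[:_id]] # a favorite lists every board it was added to
  summary = { _id: favorite[:_id], name: favorite[:name] }
  board[:recent_favorites].reject! { |f| f[:_id] == favorite[:_id] }
  board[:recent_favorites].unshift(summary) # newest first
  board[:recent_favorites] = board[:recent_favorites].first(RECENT_LIMIT)
  board
end

board = { _id: SecureRandom.uuid, name: 'Simple Reception Decor', recent_favorites: [] }
favorites = (1..6).map { |i| { _id: SecureRandom.uuid, name: "Favorite #{i}", boards: [] } }
favorites.each { |fav| add_favorite_to_board(board, fav) }

# Only the last four favorites remain embedded, newest first.
puts board[:recent_favorites].map { |f| f[:name] }.inspect
```

Because the embedded list is capped, the size of a board document stays bounded no matter how many favorites the board holds, which is the "more control over the size of each document" point above.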

Sample queries

db.favorites.find({'member': 'e1606ed5-4ac8-48b4-aee6-bc4203937903'}).limit(1)
db.favorites.find({'boards': '7557acf8-b7b1-4eab-a64d-57449034cfc6'}).limit(1)
db.favorites.find({'application': 'marketplace'}).limit(1)
db.boards.find({'member': 'e1606ed5-4ac8-48b4-aee6-bc4203937903'}).limit(1)
db.boards.find({'member': 'e1606ed5-4ac8-48b4-aee6-bc4203937903', 'default_board': true})
db.boards.find({'name': 'Simple Reception Decor'}).limit(1)

Some implementation details

• Rails web application framework
• We speak RoR and JS
• MongoDB as a data repository (we love NoSQL)
• Two collections, one for Boards and one for Favorites
• No joins, no foreign keys
• Referential integrity is handled in a different fashion
• Mongoid gem (pros & cons)
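With no joins and no foreign keys, keeping references consistent becomes the application's job. The following is a minimal sketch of that idea, not the authors' code: in-memory arrays stand in for the two collections, and `delete_board` is a hypothetical helper. Only the `boards` field on favorites is taken from the sample queries.

```ruby
# In-memory stand-ins for the two collections (assumption: the real
# application would go through its Mongoid models instead).
boards = [
  { _id: 'b1', name: 'Decor' },
  { _id: 'b2', name: 'Cakes' }
]
favorites = [
  { _id: 'f1', boards: %w[b1 b2] },
  { _id: 'f2', boards: %w[b1] }
]

# Hypothetical helper: delete a board and remove the references to it,
# a cleanup that a foreign-key constraint would have enforced in SQL.
def delete_board(boards, favorites, board_id)
  boards.reject! { |b| b[:_id] == board_id }
  favorites.each { |f| f[:boards].delete(board_id) }
end

delete_board(boards, favorites, 'b1')
puts favorites.inspect
```

Against MongoDB itself, the same cleanup would typically be a multi-document update on the favorites collection using the `$pull` operator.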

Favorites re-platform

Board class

Favorite class

Scaling reads with replica sets

Scaling reads with sharding

Migration

Clients switchover

[Diagram: four clients, the Legacy API, and the New API, with the flow marked ONE WAY.]

Migration Timeline

[Timeline diagram with two phases, Development and Migration. Steps shown: new API, Continuous Migr., Implement Monitors, Bulk Migr., Turn on Continuous, Data Catch-up, Plug Clients.]

Bulk Migration

ETL

Bulk Migration

SQL Tables                  Mongo Collections
UserFavorites               Favorites
UserFavoritesGroups         Boards
UserFavoritesGroupsItems
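One way to read this mapping is as a denormalizing transform: rows of the join table fold into a list of boards on each favorite. The sketch below shows that reading in plain Ruby. It is not the authors' Pentaho job; the sample rows are invented, the `uuid` columns and the use of the UUID as `_id` are assumptions, and only the column names and the `member` and `boards` fields are taken from the sample queries.

```ruby
# Sample legacy rows (values invented for illustration).
user_favorites = [
  { userFavoriteId: 1, uuid: 'fav-uuid-1', userId: 'u1', name: 'Cake',  url: 'http://example.com/cake' },
  { userFavoriteId: 2, uuid: 'fav-uuid-2', userId: 'u1', name: 'Dress', url: 'http://example.com/dress' }
]
groups = [
  { groupId: 10, uuid: 'board-uuid-10', userId: 'u1', name: 'Reception' }
]
group_items = [
  { groupId: 10, userFavoriteId: 1 }
]

# Look up a group's UUID by its auto-increment id.
group_uuid = groups.to_h { |g| [g[:groupId], g[:uuid]] }

# Fold the join table into a list of board UUIDs per favorite.
boards_by_favorite = Hash.new { |hash, key| hash[key] = [] }
group_items.each do |item|
  boards_by_favorite[item[:userFavoriteId]] << group_uuid.fetch(item[:groupId])
end

favorite_docs = user_favorites.map do |row|
  { _id: row[:uuid], member: row[:userId], name: row[:name], url: row[:url],
    boards: boards_by_favorite[row[:userFavoriteId]] }
end
board_docs = groups.map { |g| { _id: g[:uuid], member: g[:userId], name: g[:name] } }

puts favorite_docs.inspect
puts board_docs.inspect
```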

Favorites Job

Pentaho Steps

Auto-increment Id vs. UUID

UserFavoriteId → Favorites UUID
GroupId → Groups UUID

Continuous Migration

Get UUID from the Get Go

• Add a column to the legacy DB (100M+ records!!) with the new Mongo UUID
• Then the migrations take care of inserting it into the new documents

[Diagram: SQL has all the new ids (xxxxx-xxxx-xxxx, xxxxx-xxxx-xxxx, xxxxx-xxxx-xxxx); the ids are inserted into Mongo.]

Migration Systems

Add UUID Columns in SQL

100M recs!!!
Alter table add UUID uniqueidentifier

New Favs temp table with UUID:

    SELECT *, uuid = NEWID()
    INTO NewUserFavorites
    FROM UserFavorites

Add indexes

Rename & drop
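The point of generating the UUIDs on the SQL side first is that every legacy record gets its new identifier exactly once, so any later migration step can translate an auto-increment id into the same UUID. A minimal Ruby sketch of that idea follows, assuming invented sample rows; `SecureRandom.uuid` plays the role that `NEWID()` plays in the SQL on this slide.

```ruby
require 'securerandom'

# Invented sample rows standing in for the legacy UserFavorites table.
legacy_rows = [
  { userFavoriteId: 101, name: 'Cake' },
  { userFavoriteId: 102, name: 'Dress' }
]

# Assign each row a UUID once, up front.
rows_with_uuid = legacy_rows.map { |row| row.merge(uuid: SecureRandom.uuid) }

# Lookup used later to translate an auto-increment id into the new id.
uuid_for = rows_with_uuid.to_h { |row| [row[:userFavoriteId], row[:uuid]] }

puts uuid_for[101]
```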

Facts

• SQL needed some sanitation
• SQL prep scripts: approx. 4 hours
• Pentaho ETL on a local workstation: 8 hours
• Restore into the production Mongo cluster: 4 hours

We’ve got data!!

Continuous Migration Architecture

[Diagram: Clients, Legacy API, New API, SQS Queue, and Messenger. ONE WAY SYNC…]

Continuous Migration: Favorites Legacy Messenger

• Ruby
• Consumer of an SQS queue coming from Legacy that generates 1 message per operation
• Issues an API call to the new app for each operation
• Runs as a worker in the background

[Diagram: Legacy API → SQS → Legacy Messenger → New API → Mongo]
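The core of such a messenger is a translation from one queue message into one call on the new API. The sketch below shows only that translation step, with no SQS or HTTP client involved. The message format (JSON with `operation`, `uuid`, and `payload` keys), the verb mapping, and the `/favorites` route are all assumptions made for illustration; the slides do not show the real format.

```ruby
require 'json'

# Hypothetical mapping from a legacy operation to an HTTP verb on the new API.
VERBS = { 'add' => :post, 'edit' => :put, 'delete' => :delete }.freeze

# Turn one queue message (a JSON string) into a description of the API request.
def request_for(message_body)
  msg = JSON.parse(message_body)
  verb = VERBS.fetch(msg['operation']) # raises KeyError on an unknown operation
  path = '/favorites'
  path += "/#{msg['uuid']}" unless verb == :post
  { verb: verb, path: path, body: msg['payload'] }
end

message = { operation: 'edit', uuid: 'abc-123', payload: { name: 'New name' } }.to_json
p request_for(message)
```

A background worker would poll the queue, pass each message body through a function like this, and issue the resulting request against the new app.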

SQS Queue is not a FIFO Friend

Sent by Legacy:        1 2 3 4 5 6 7
Consumed by Messenger: 5 3 1 2 4 6 7

Challenges

• Queue is not FIFO
• Objects may not exist yet when a message arrives
• Queue bloats fast
• Syncing can fall behind real time
• Data is different

Solutions

• Verify that the entity exists (API call); otherwise, throw the message back in the queue
• Set message expiration
• Sanitize data
• Run multiple workers to achieve near real-time syncing
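The first two solutions can be sketched as a small worker loop: a message whose referenced entity does not exist yet goes back to the end of the queue, and a message past its expiration is dropped so the queue cannot bloat forever. This is a toy model under stated assumptions, not the authors' worker: an array stands in for SQS, hashes stand in for the new API, and a logical `clock` with an `expires_at` field stands in for real message expiration.

```ruby
# Messages arrive out of order: a favorite shows up before its board.
queue = [
  { type: :favorite, id: 'f1', board: 'b1', expires_at: 10 },
  { type: :favorite, id: 'f2', board: 'b9', expires_at: 3 }, # its board never arrives
  { type: :board,    id: 'b1', expires_at: 10 }
]

boards = {}    # stand-in for boards known to the new API
favorites = {} # stand-in for favorites known to the new API
dropped = []
clock = 0      # logical time, so the example needs no sleeping

until queue.empty?
  clock += 1
  msg = queue.shift
  if msg[:expires_at] < clock
    dropped << msg[:id]            # expired: give up on this message
  elsif msg[:type] == :board
    boards[msg[:id]] = true
  elsif boards.key?(msg[:board])   # verify that the entity exists
    favorites[msg[:id]] = msg[:board]
  else
    queue.push(msg)                # otherwise, throw it back in the queue
  end
end

puts favorites.inspect
puts dropped.inspect
```

Here `f1` is retried once and succeeds after its board arrives, while `f2` keeps being requeued until it expires.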

Take away

• Favor a simple document structure
• Try different schema paradigms
• Bypass native ObjectId generation in favor of UUIDs
• Break with the past
• Queues can be deceiving
• Gems can simplify the application-layer implementation
• Manage referential integrity in the application layer
• No cache required

New Benchmarks

• Avg. 85 rpm at peak times
• Avg. 58 ms response time for POST
• Avg. 18 ms response time for GET

New vs. Legacy

Good

• Overall performance increase
• 18 ms vs. 460 ms for GET
• 58 ms vs. 80 ms for POST
• Easy schema changes
• Scalable
• Simpler architecture
• No cache layer
• Fast code iteration, testing and deployment
• In line with company’s technology strategy
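To put the benchmark figures side by side, the ratios can be worked out directly from the numbers on the two benchmark slides. The variable names below are made up for the example; the values are the averages reported in the deck.

```ruby
legacy  = { get_ms: 460.0, post_ms: 80.0, peak_rpm: 45.0 }
current = { get_ms: 18.0,  post_ms: 58.0, peak_rpm: 85.0 }

get_speedup  = legacy[:get_ms] / current[:get_ms]       # 460 / 18, about 25.6x
post_speedup = legacy[:post_ms] / current[:post_ms]     # 80 / 58, about 1.38x
rpm_growth   = current[:peak_rpm] / legacy[:peak_rpm]   # 85 / 45, about 1.89x

puts format('GET %.1fx faster, POST %.2fx faster, peak load %.2fx higher',
            get_speedup, post_speedup, rpm_growth)
```

So the gain is concentrated on reads: GET responses are roughly 25 times faster, POST responses improve by a little over a third, and this holds at close to twice the peak request rate.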

Acknowledgments

• Dmitri Nesterenko
• Jason Sherman
• Nelly Santoso
• Phillip Chiu
• Sean Lipkin
• George Taveras
• Alison Fay
• Diana Taykhman
• Rajendra Prashad
• Josh Keys
• Lewis DiFelice

contact, questions, inquiries?

memcomtech@xogrp.com
