MongoDB: How We Did It – Reanimating Identity at AOL

DESCRIPTION

AOL experienced explosive growth and needed a new database that was both flexible and easy to deploy with little effort. They chose MongoDB. Due to the complexity of internal systems and the data, most of the migration process was spent building a new identity platform and adapters for legacy apps to talk to MongoDB. Systems were migrated in 4 phases to ensure that users were not impacted during the switch. Turning on dual reads/writes to both legacy databases and MongoDB also helped get production traffic into MongoDB during the process. Ultimately, the project was successful with the help of MongoDB support. Today, the team has 15 shards, with 60-70 GB per shard.

Page 1: MongoDB: How We Did It – Reanimating Identity at AOL

MongoDB: How We Did It – Reanimating Identity at AOL

Page 2: MongoDB: How We Did It – Reanimating Identity at AOL

Topics

• Motivation
• Challenges
• Approach
• MongoDB Testing
• Deployment
• Collections
• Problem/Solution
• Lessons Learned
• Going Forward

Page 3: MongoDB: How We Did It – Reanimating Identity at AOL

Motivation

Page 4: MongoDB: How We Did It – Reanimating Identity at AOL

Motivation

• Cluttered data
• Ambiguous data
• Functionally shared data
• Immutable data model

Page 5: MongoDB: How We Did It – Reanimating Identity at AOL

Challenges

Page 6: MongoDB: How We Did It – Reanimating Identity at AOL

Challenges

• Leaving behind fault-tolerant (Non-Stop) platform/transactional integrity
• Merge/extricate Identity data
• Scaling to handle consolidated traffic
• Continue to support Legacy

Page 7: MongoDB: How We Did It – Reanimating Identity at AOL

Approach

Page 8: MongoDB: How We Did It – Reanimating Identity at AOL

Approach

• Document-based data model: use MongoDB
• Migrate data
• Build adapter/interceptor layer
• Production testing with no impacts

Page 9: MongoDB: How We Did It – Reanimating Identity at AOL

Approach

• Audit of setup with MongoDB
• Tweak mongo settings, including driver, to optimize for performance
• Leverage eventual consistency to overcome transactional integrity loss
• Switch Identity to new data model using MongoDB

Page 10: MongoDB: How We Did It – Reanimating Identity at AOL

Migration

Page 11: MongoDB: How We Did It – Reanimating Identity at AOL

Migration

• Adapters support 4 stages:
  1. Read/write legacy
  2. Read/write legacy, write MongoDB (shadow read MongoDB)
  3. Read/write MongoDB, write legacy
  4. Read/write MongoDB
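
The four stages above can be sketched as a small routing layer. This is only an illustration, assuming dict-backed stand-ins for the legacy store and MongoDB; the class and method names (`Stage`, `IdentityAdapter`, `mismatches`) are hypothetical and not taken from AOL's adapters.

```python
from enum import Enum


class Stage(Enum):
    LEGACY_ONLY = 1    # read/write legacy
    SHADOW_MONGO = 2   # read/write legacy, write MongoDB, shadow-read MongoDB
    MONGO_PRIMARY = 3  # read/write MongoDB, write legacy
    MONGO_ONLY = 4     # read/write MongoDB


class IdentityAdapter:
    """Routes reads and writes to two dict-backed stores according to the stage."""

    def __init__(self, stage, legacy, mongo):
        self.stage = stage
        self.legacy = legacy
        self.mongo = mongo
        self.mismatches = []  # keys whose shadow read disagreed with legacy

    def write(self, key, doc):
        if self.stage is not Stage.MONGO_ONLY:
            self.legacy[key] = doc
        if self.stage is not Stage.LEGACY_ONLY:
            self.mongo[key] = doc

    def read(self, key):
        if self.stage in (Stage.LEGACY_ONLY, Stage.SHADOW_MONGO):
            result = self.legacy.get(key)
            # Shadow read: compare only; the caller still gets legacy data.
            if self.stage is Stage.SHADOW_MONGO and self.mongo.get(key) != result:
                self.mismatches.append(key)
            return result
        return self.mongo.get(key)
```

In this sketch, stage 2 puts production read and write traffic on MongoDB without users ever seeing its answers, and the recorded mismatches point at data that still needs migrating or repairing.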

Page 12: MongoDB: How We Did It – Reanimating Identity at AOL

Stage 1

Page 13: MongoDB: How We Did It – Reanimating Identity at AOL

Stage 2

Page 14: MongoDB: How We Did It – Reanimating Identity at AOL

Stage 3

Page 15: MongoDB: How We Did It – Reanimating Identity at AOL

Stage 4

Page 16: MongoDB: How We Did It – Reanimating Identity at AOL

MongoDB Testing

Page 17: MongoDB: How We Did It – Reanimating Identity at AOL

Production Testing

• “Chaos Monkey” testing of MongoDB
• 4 million requests/minute (production load, read-to-write ratio 99%)
• Test primary failover (graceful)
• Kill primary

Page 18: MongoDB: How We Did It – Reanimating Identity at AOL

Production Testing

• Test secondary failure
• Shut down all secondaries
• Manually shut down interface on primary
• Performance benchmarking

Page 19: MongoDB: How We Did It – Reanimating Identity at AOL

Production Testing

• Performance very good, shard key reads ~2-3 ms
• Scatter-gather reads ~12 ms
• Writes good as well, ~3-20 ms
• Failovers 4-5 minutes

Page 20: MongoDB: How We Did It – Reanimating Identity at AOL

MongoDB Healthcheck

• Use dedicated machines for config servers
• Place config servers in different data centers
• Handle failover in the application: on a network exception, fall back to a secondary
• Set lower TCP keepalive values (5 minutes)

Page 21: MongoDB: How We Did It – Reanimating Identity at AOL

Deployment

Page 22: MongoDB: How We Did It – Reanimating Identity at AOL
Page 23: MongoDB: How We Did It – Reanimating Identity at AOL

Deployment

• Version 2.4.9
• All 75 mongod instances on separate switches
• 2 x 12-core CPUs, 192 GB of RAM, and internal controller-based RAID 10 with ext4 file systems
• Using default chunk size (64 MB)

Page 24: MongoDB: How We Did It – Reanimating Identity at AOL

Deployment

• Have dedicated slaves for backup (configured as hidden members with priority 0). Backup runs during 6-8am window

• Enable powerOf2Sizes for collections to reduce fragmentation

• Balancer restricted to 4-6am daily
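
As a rough sketch, the three settings above correspond to the following documents in MongoDB 2.4, shown here as plain Python dicts. The host name and member `_id` are placeholders, and applying the collMod command to UserIdentity is an assumption. The helper `in_window` is hypothetical and only checks the window arithmetic.

```python
from datetime import time

# Replica set member entry for a hidden backup node: it can never become
# primary (priority 0) and is invisible to client reads (hidden).
backup_member = {"_id": 4, "host": "backup-host.example:27017",
                 "priority": 0, "hidden": True}

# Document upserted into config.settings to restrict the balancer to 04:00-06:00.
balancer_window = {"_id": "balancer",
                   "activeWindow": {"start": "04:00", "stop": "06:00"}}

# collMod command that enables power-of-2 record allocation on a collection.
use_power_of_2 = {"collMod": "UserIdentity", "usePowerOf2Sizes": True}


def in_window(now, window):
    """Return True if `now` (a datetime.time) falls inside a same-day HH:MM window."""
    start = time.fromisoformat(window["start"])
    stop = time.fromisoformat(window["stop"])
    return start <= now < stop
```

With these values the balancer window closes as the 6-8am backup window opens, so chunk migrations and backups never run at the same time.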

Page 25: MongoDB: How We Did It – Reanimating Identity at AOL

Collections

Page 26: MongoDB: How We Did It – Reanimating Identity at AOL

Document Model

• Entire data set must be in memory to meet performance demands

• Document field names abbreviated, but descriptive

• Don’t store default values (Legacy document is 80% defaults)

• Working hard to keep legacy artifacts out, but always about trade-offs
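
One way to avoid storing default values is to strip them on write and merge them back on read. The sketch below assumes a hypothetical `DEFAULTS` table; the real default values and field names are not given in the slides.

```python
# Hypothetical defaults; the actual ones are not in the slides.
DEFAULTS = {"lang": "en_US", "cc": "US", "locked": False}


def strip_defaults(doc):
    """Drop fields whose value equals the default before writing the document."""
    return {k: v for k, v in doc.items()
            if k not in DEFAULTS or DEFAULTS[k] != v}


def apply_defaults(stored):
    """Rebuild the full view on read; stored values override defaults."""
    return {**DEFAULTS, **stored}
```

Smaller documents help with the first bullet too, since the whole data set has to fit in memory.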

Page 27: MongoDB: How We Did It – Reanimating Identity at AOL

UserIdentity Collection

• Core data model for Identity
• Heterogeneous collection (some documents are “aliases”, which are pointers to the primary document)
• Index on user+namespace
• Shard key is guid (UUID Type 1, flipped: node, then time)
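
A minimal sketch of the "flipped" guid, assuming the flip swaps the two 64-bit halves of a standard type 1 UUID so that the clock sequence and node come first and the time fields last. That layout is inferred from the sample ids in these slides (they end in `11e1`, `11e2`, `11e3`, which look like version 1 time-high fields); it is not confirmed by the authors, and the function names are hypothetical.

```python
import uuid


def flip_guid(u):
    """Return 32 hex chars: clock_seq + node first, then the time fields."""
    h = u.hex  # standard order: time_low, time_mid, time_hi, clock_seq, node
    return h[16:] + h[:16]


def unflip_guid(flipped):
    """Recover the standard UUID from a flipped guid string."""
    return uuid.UUID(flipped[16:] + flipped[:16])
```

For example, under this assumption `flip_guid(uuid.UUID("9bf70221-d817-11e2-baeb-c8bcc8e14f6e"))` yields the `_id` shown in the UserIdentity sample document, and `flip_guid(uuid.uuid1())` would mint a new one.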

Page 28: MongoDB: How We Did It – Reanimating Identity at AOL

UserIdentity

{
  _id: "baebc8bcc8e14f6e9bf70221d81711e2",
  user: "jdoe",
  ns: "aol",
  …
  "profile": { "cc": "US", "firstNm": "John", "lang": "en_US", "lastNm": "Doe" },
  "sysTime": ISODate("2014-05-03T04:43:49.899Z")
}

Page 29: MongoDB: How We Did It – Reanimating Identity at AOL

Relationship Collection

• Support all cardinalities
• Equivalent to RDBMS intersection table (guid on each end of relationship)
• Use eventually consistent framework for non-atomic writes
• Shard key is parent+child+type (parent lookup is primary use case)
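
The choice of parent as the leading shard key field matters because mongos can route a query to specific shards only when the query constrains a leading prefix of the shard key; otherwise it has to broadcast to all shards. A small sketch of that rule, with hypothetical helper names:

```python
SHARD_KEY = ("parent", "child", "type")


def shard_key_prefix(query, shard_key=SHARD_KEY):
    """Return the leading shard key fields that the query constrains."""
    prefix = []
    for field in shard_key:
        if field not in query:
            break
        prefix.append(field)
    return prefix


def is_targeted(query, shard_key=SHARD_KEY):
    """True if the query can be routed by shard key instead of broadcast."""
    return bool(shard_key_prefix(query, shard_key))
```

A lookup by parent is targeted, while a lookup by child alone (finding all parents of a guid) is a scatter-gather query under this shard key.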

Page 30: MongoDB: How We Did It – Reanimating Identity at AOL

Relationship Collection

{
  "_id" : "baa000163e5ff405b8083d5f164c11e3",
  "child" : "8a9e00237d617f08df7f1685527711e2",
  "createTime" : ISODate("2013-09-05T17:00:51.209Z"),
  "modTime" : ISODate("2013-09-05T17:00:51.209Z"),
  "attributes" : null,
  "parent" : "baebc8bcc8e14f6e9bf70221d81711e2",
  "type" : "CLASSROOM"
}

Page 31: MongoDB: How We Did It – Reanimating Identity at AOL

Legacy Collection

• Bridge collection to facilitate migration from old data model to new

• Near-image of old data model but with some refactoring (3 tables into 1 document)

• Once migration is complete, plan is to drop this collection

• Defaults not stored, 1-2 character field names

Page 32: MongoDB: How We Did It – Reanimating Identity at AOL

Legacy Collection

{
  "_id" : "jdoe",
  "subData" : {
    "f" : NumberLong(1018628731),
    "g" : "jdoe",
    "d" : false,
    "e" : NumberLong(1018628731),
    "b" : NumberLong(434077116),
    "a" : "JDoe",
    "l" : NumberLong("212200907100000000"),
    "i" : NumberLong(659952670)
  },
  "guid" : "baebc8bcc8e14f6e9bf70221d81711e2",
  "st" : ISODate("2013-06-24T20:13:16.627Z")
}

Page 33: MongoDB: How We Did It – Reanimating Identity at AOL

Reservation Collection

• Namespace protection
• Enforce uniqueness of user/namespace from application side because shard key for UserIdentity collection is guid
• Shard key is username+namespace
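
The reservation step can be sketched as "claim the name first, then create the identity". The example below is an in-memory stand-in, not AOL's implementation: `ReservationStore` and its parameters are hypothetical, and the one-hour default expiry simply mirrors the gap between `createTime` and `expires` in the sample document.

```python
import time


class ReservationStore:
    """In-memory stand-in for the Reservation collection, keyed by (user, ns)."""

    def __init__(self):
        self._rows = {}

    def reserve(self, user, ns, ttl_seconds=3600, now=None):
        """Claim user+namespace; return False if a live reservation exists."""
        now = time.time() if now is None else now
        key = (user, ns)
        existing = self._rows.get(key)
        if existing is not None and existing["expires"] > now:
            return False
        self._rows[key] = {"user": user, "ns": ns, "expires": now + ttl_seconds}
        return True
```

In a real sharded deployment the check-and-insert would need to be a single atomic operation; because this collection is sharded on username+namespace, a unique index on those fields is possible here, which it would not be on the guid-sharded UserIdentity collection.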

Page 34: MongoDB: How We Did It – Reanimating Identity at AOL

Reservation Collection

{
  "_id" : "b13a00163e062d8ee9dc9eaf3e2411e1",
  "createTime" : ISODate("2012-01-13T20:26:46.111Z"),
  "user" : "jdoe",
  "expires" : ISODate("2012-01-13T21:26:46.111Z"),
  "rsvId" : "e9bddfe1-1c84-42c9-8f4c-1a7a96920ff4",
  "data" : { "k1" : "v1", "k2" : "v2" },
  "ns" : "aol",
  "type" : "R"
}

Page 35: MongoDB: How We Did It – Reanimating Identity at AOL

Problems/Solutions

Page 36: MongoDB: How We Did It – Reanimating Identity at AOL

Problem

Writes spanning multiple documents sometimes fail part way

Page 37: MongoDB: How We Did It – Reanimating Identity at AOL

Solution

• Developed eventually consistent framework “synchronizer”

• Events sent to framework to validate, repair, or finish

• Events retryable until success or ttl is expired
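
The retry behavior described above might look roughly like the loop below. It is a sketch only: the slides do not show the synchronizer's API, so `process_event`, its parameters, and the fixed retry delay are all assumptions.

```python
import time


def process_event(event, handler, ttl_seconds,
                  clock=time.monotonic, sleep=time.sleep, delay=0.1):
    """Retry handler(event) until it succeeds or the event's TTL would expire.

    Returns ("done", attempts) or ("expired", attempts).
    """
    deadline = clock() + ttl_seconds
    attempts = 0
    while True:
        attempts += 1
        try:
            handler(event)  # validate, repair, or finish the multi-document write
            return ("done", attempts)
        except Exception:
            if clock() + delay > deadline:
                return ("expired", attempts)
            sleep(delay)
```

For retries to be safe, the handler has to be idempotent: repeating a repair that already succeeded must leave the documents unchanged.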

Page 38: MongoDB: How We Did It – Reanimating Identity at AOL

Problem

Scatter-gather queries slower, 100% performance impact on failover

Page 39: MongoDB: How We Did It – Reanimating Identity at AOL

Solution

• Use Memcached to map non-shard key to shard key (99% hit ratio for one mapping, 55% for other)

• Use Memcached to map potentially expensive intermediary results (88% hit ratio)
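
The first mapping above is a cache-aside lookup from a non-shard key (user+namespace) to the shard key (guid). A minimal sketch, with a plain dict standing in for Memcached and a hypothetical `lookup_guid` callback standing in for the scatter-gather query:

```python
class ShardKeyCache:
    """Cache-aside mapping from user+namespace to guid."""

    def __init__(self, cache, lookup_guid):
        self.cache = cache              # dict standing in for Memcached
        self.lookup_guid = lookup_guid  # slow path: scatter-gather query
        self.hits = 0
        self.misses = 0

    def guid_for(self, user, ns):
        key = f"{ns}:{user}"
        guid = self.cache.get(key)
        if guid is not None:
            self.hits += 1
            return guid
        self.misses += 1
        guid = self.lookup_guid(user, ns)
        if guid is not None:
            self.cache[key] = guid  # the next lookup can target a single shard
        return guid
```

Once the guid is known, the follow-up read is a shard key read (~2-3 ms in the test results) rather than a scatter-gather read (~12 ms).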

Page 40: MongoDB: How We Did It – Reanimating Identity at AOL

Problem

Querying lists of users required parallel processing for performance -- increasing connection requirements

Page 41: MongoDB: How We Did It – Reanimating Identity at AOL

Solution

Use $in operator to query lists of users rather than looping through individual queries
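
As an illustration, a list of guids can be turned into a few `$in` filters instead of one query per user. The sketch assumes `_id` holds the guid, as the sample UserIdentity document suggests; the batch size is an arbitrary assumption, not a figure from the talk.

```python
def in_queries(guids, batch_size=500):
    """Yield one {"_id": {"$in": [...]}} filter per batch of guids."""
    for i in range(0, len(guids), batch_size):
        yield {"_id": {"$in": guids[i:i + batch_size]}}
```

Each yielded filter would be passed to a single find call, so a list of N users costs roughly N / batch_size round trips and no extra parallel connections.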

Page 42: MongoDB: How We Did It – Reanimating Identity at AOL

Problem

At application startup a large number of requests failed because of overhead in creating mongos connections

Page 43: MongoDB: How We Did It – Reanimating Identity at AOL

Solution

Build into application a “warm-up” stage that executes stock queries prior to going online and taking traffic
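
A warm-up stage could be structured as below. This is a sketch with hypothetical names (`Service`, `run_query`, `stock_queries`); the actual stock queries used at AOL are not listed in the slides.

```python
class Service:
    """Runs stock queries to open connections before accepting traffic."""

    def __init__(self, run_query, stock_queries):
        self.run_query = run_query
        self.stock_queries = stock_queries
        self.ready = False

    def warm_up(self):
        """Execute every stock query once, then mark the service ready.

        Returns the number of warm-up queries that failed.
        """
        failures = 0
        for query in self.stock_queries:
            try:
                self.run_query(query)
            except Exception:
                failures += 1  # count it, but do not block startup forever
        self.ready = True
        return failures
```

The load balancer health check would report healthy only once `ready` is true, so the cost of creating mongos connections is paid before real requests arrive.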

Page 44: MongoDB: How We Did It – Reanimating Identity at AOL

Problem

During failovers or other slow periods, application queues back up and recovery takes too long

Page 45: MongoDB: How We Did It – Reanimating Identity at AOL

Solution

Determine each request’s time in queue; if it exceeds the client’s timeout, drop the request instead of processing it
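
A sketch of this fail-fast check, assuming each queued item carries its enqueue timestamp; `drain` and its parameters are hypothetical names.

```python
import time
from collections import deque


def drain(queue, handle, client_timeout, now=time.monotonic):
    """Process queued (enqueued_at, request) pairs, dropping stale ones.

    Returns (processed, dropped).
    """
    processed = dropped = 0
    while queue:
        enqueued_at, request = queue.popleft()
        if now() - enqueued_at > client_timeout:
            dropped += 1  # the client has already given up; skip the work
            continue
        handle(request)
        processed += 1
    return processed, dropped
```

After a failover, the backlog is mostly requests whose clients have already timed out, so skipping them lets the queue empty quickly.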

Page 46: MongoDB: How We Did It – Reanimating Identity at AOL

Problem

Application-applied optimistic locking hits lock errors during concurrent writes (the entire document is updated)

Page 47: MongoDB: How We Did It – Reanimating Identity at AOL

Solution

Use the $set operator to target writes to just the impacted elements, and let MongoDB enforce atomicity
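
To illustrate, the update document can be derived by diffing the old and new versions and emitting only the changed fields as dotted paths. This is a sketch under simplifying assumptions: removed fields (which would need `$unset`) and arrays are not handled, and `set_update` is a hypothetical helper.

```python
def set_update(old, new, prefix=""):
    """Build {"$set": {...}} containing only fields that differ from `old`."""
    changes = {}
    for key, value in new.items():
        path = f"{prefix}{key}"
        before = old.get(key)
        if isinstance(value, dict) and isinstance(before, dict):
            # Recurse so that only the changed leaf is written.
            changes.update(set_update(before, value, path + ".")["$set"])
        elif key not in old or before != value:
            changes[path] = value
    # Callers should skip the write entirely when the result is empty.
    return {"$set": changes}
```

Two writers that change different fields of the same document no longer overwrite each other, because each update touches only its own paths and MongoDB applies each single-document update atomically.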

Page 48: MongoDB: How We Did It – Reanimating Identity at AOL

Problem

Reads go to the primary, but when the secondaries are lost the primary steps down and reads fail

Page 49: MongoDB: How We Did It – Reanimating Identity at AOL

Solution

Use primaryPreferred for reads. Want the freshest data (password for example), but still want reads to work if no primary exists
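
The selection rule behind primaryPreferred can be sketched in a few lines. This is a simulation of the rule, not driver code; in a real application the read preference is set through the driver's read preference options.

```python
def pick_member(members):
    """primaryPreferred-style choice: the primary if one is up, else a live secondary.

    Each member is a dict with "host", "role", and "up" keys (hypothetical shape).
    Returns the chosen host, or None if nothing is reachable.
    """
    live = [m for m in members if m["up"]]
    for m in live:
        if m["role"] == "primary":
            return m["host"]
    for m in live:
        if m["role"] == "secondary":
            return m["host"]
    return None
```

When the secondaries are lost and the former primary has stepped down to secondary, this rule still returns it, so reads keep working even though writes cannot.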

Page 50: MongoDB: How We Did It – Reanimating Identity at AOL

Problem

Large number of connections to mongos/mongod is extending the failover times and nearing limits

Page 51: MongoDB: How We Did It – Reanimating Identity at AOL

Solution

• Application DAOs share connections to same Mongo cluster

• Connection params initially set too high
• Set connectionsPerHost and connectionMultiplier plus a buffer to cover the fixed number of worker threads per application (15/5 for 32 worker threads)

• Went from 15K connections to 2K connections
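
The sizing arithmetic can be checked with a small helper. It assumes that "15/5" means connectionsPerHost = 15 with a multiplier of 5, and that connectionMultiplier refers to the Java driver's threadsAllowedToBlockForConnectionMultiplier, under which connectionsPerHost × multiplier threads may wait for a connection. Both readings are assumptions, and `pool_budget` is a hypothetical name.

```python
def pool_budget(connections_per_host, multiplier, worker_threads):
    """Check whether a pool setting can serve a fixed number of worker threads."""
    max_waiting = connections_per_host * multiplier
    capacity = connections_per_host + max_waiting  # in use + allowed to wait
    return {
        "max_waiting_threads": max_waiting,
        "capacity": capacity,
        "covers_workers": capacity >= worker_threads,
    }
```

Under these assumptions, 15 connections plus 75 waiting slots comfortably cover 32 worker threads, while each application instance opens at most 15 connections per host.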

Page 52: MongoDB: How We Did It – Reanimating Identity at AOL

Benefits

Page 53: MongoDB: How We Did It – Reanimating Identity at AOL

Benefits

• An unanticipated benefit was the ability for all eligible users to use the AOL client
• Easily added Identity extensions leveraging the new data model
• Support for multiple namespaces made building APIs for multi-tenancy straightforward
• Model is positioned to make the vision for AOL Identity feasible

Page 54: MongoDB: How We Did It – Reanimating Identity at AOL

Lessons Learned

Page 55: MongoDB: How We Did It – Reanimating Identity at AOL

Lessons Learned

• Keep connections as low as possible
  – Higher connection numbers increase failover times
• Avoid scatter-gather reads (use cache if possible to get to shard key)
• Keep data set in memory
• Fail fast on application side to lower recovery time

Page 56: MongoDB: How We Did It – Reanimating Identity at AOL

Going Forward

Page 57: MongoDB: How We Did It – Reanimating Identity at AOL

Going forward

• Implement tagging to target secondaries
• Further reduction in scatter-gather reads
• Reduce failover window to as short as possible

• Contact: [email protected]