Description
AOL experienced explosive growth and needed a new database that was both flexible and easy to deploy with little effort. They chose MongoDB. Due to the complexity of internal systems and the data, most of the migration process was spent building a new identity platform and adapters for legacy apps to talk to MongoDB. Systems were migrated in 4 phases to ensure that users were not impacted during the switch. Turning on dual reads/writes to both legacy databases and MongoDB also helped get production traffic into MongoDB during the process. Ultimately, the project was successful with the help of MongoDB support. Today, the team has 15 shards, with 60-70 GB per shard.
MongoDB: How We Did It – Reanimating Identity at AOL
Topics
• Motivation
• Challenges
• Approach
• MongoDB Testing
• Deployment
• Collections
• Problem/Solution
• Lessons Learned
• Going Forward
Motivation
• Cluttered data
• Ambiguous data
• Functionally shared data
• Immutable data model
Challenges
• Leaving behind a fault-tolerant (NonStop) platform and its transactional integrity
• Merging/extricating Identity data
• Scaling to handle consolidated traffic
• Continuing to support legacy systems
Approach
• Document-based data model: use MongoDB
• Migrate data
• Build adapter/interceptor layer
• Production testing with no impacts
• Audit of setup with MongoDB
• Tweak mongo settings, including the driver, to optimize for performance
• Leverage eventual consistency to overcome transactional integrity loss
• Switch Identity to the new data model using MongoDB
Migration
• Adapters support 4 stages:
1. Read/write legacy
2. Read/write legacy, write MongoDB (shadow read MongoDB)
3. Read/write MongoDB, write legacy
4. Read/write MongoDB
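The four stages above can be sketched as a routing table inside a stage-aware adapter. This is a hypothetical illustration (the store names and in-memory dicts are stand-ins), not AOL's actual adapter code:

```python
# Illustrative sketch of a stage-aware migration adapter.
# Routing per stage follows the slide; everything else is assumed.

STAGES = {
    1: {"read": "legacy",  "write": ["legacy"],            "shadow_read": None},
    2: {"read": "legacy",  "write": ["legacy", "mongodb"], "shadow_read": "mongodb"},
    3: {"read": "mongodb", "write": ["mongodb", "legacy"], "shadow_read": None},
    4: {"read": "mongodb", "write": ["mongodb"],           "shadow_read": None},
}

class MigrationAdapter:
    def __init__(self, stage, stores):
        self.plan = STAGES[stage]
        self.stores = stores  # name -> dict acting as a datastore

    def write(self, key, value):
        # Dual writes: every store listed for the current stage gets the write.
        for name in self.plan["write"]:
            self.stores[name][key] = value

    def read(self, key):
        if self.plan["shadow_read"]:
            # Shadow read: exercise the new store, but serve from the old one.
            self.stores[self.plan["shadow_read"]].get(key)
        return self.stores[self.plan["read"]].get(key)
```

Flipping the stage number then walks the system from legacy-only (stage 1) to MongoDB-only (stage 4) without changing application code.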
(Diagram slides illustrating Stages 1 through 4 of the migration)
MongoDB Testing
Production Testing
• “Chaos Monkey” testing of MongoDB
• 4 million requests/minute (production load, read-to-write ratio 99%)
• Test primary failover (graceful)
• Kill primary
• Test secondary failure
• Shut down all secondaries
• Manually shut down the interface on the primary
• Performance benchmarking
• Performance very good: shard-key reads ~2-3 ms
• Scatter-gather reads ~12 ms
• Writes good as well: ~3-20 ms
• Failovers: 4-5 minutes
MongoDB Healthcheck
• Use dedicated machines for config servers
• Place config servers in different data centers
• Handle failover in the application: on a network exception, fall back to a secondary
• Set lower TCP keepalive values (5 minutes)
Deployment
• Version 2.4.9
• All 75 mongod’s on separate switches
• 2 x 12-core CPUs, 192 GB of RAM, and internal controller-based RAID 10 ext4 file systems
• Using default chunk size (64 MB)
• Dedicated slaves for backup (configured as hidden members with priority 0); backup runs during the 6-8 am window
• Enable powerOf2Sizes for collections to reduce fragmentation
• Balancer restricted to 4-6 am daily
Collections
Document Model
• Entire data set must be in memory to meet performance demands
• Document field names abbreviated, but descriptive
• Don’t store default values (Legacy document is 80% defaults)
• Working hard to keep legacy artifacts out, but it is always about trade-offs
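Since the legacy document is 80% defaults, dropping default-valued fields before storing shrinks documents considerably. A minimal sketch (the default values shown are illustrative, not AOL's actual schema):

```python
# Illustrative defaults; the real schema's defaults are not published.
DEFAULTS = {"lang": "en_US", "cc": "US", "optIn": False}

def strip_defaults(doc, defaults=DEFAULTS):
    """Return a copy of doc without fields equal to their default value.

    Readers must re-apply defaults for missing fields on the way out.
    """
    sentinel = object()  # distinguishes "no default" from a None default
    return {k: v for k, v in doc.items() if defaults.get(k, sentinel) != v}
```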
UserIdentity Collection
• Core data model for Identity
• Heterogeneous collection (some documents are “aliases”, which are pointers to the primary document)
• Index on user+namespace
• Shard key is guid (UUID Type 1, flipped: node then time)
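One plausible reading of “flipped: node then time” is rearranging the Type 1 UUID hex fields so the node (and clock sequence) lead and the timestamp trails, spreading inserts across shards instead of clustering them by creation time. The exact field order below is an assumption; the slide only says “node then time”:

```python
def flip_v1_uuid(hex32):
    """Rearrange a 32-char Type 1 UUID hex string from
    time_low|time_mid|time_hi|clock|node to node|clock|time_low|time_mid|time_hi.

    NOTE: this field order is a guess consistent with the sample guids in
    the deck (which end in the version nibble, e.g. ...11e2), not a
    documented AOL algorithm.
    """
    time_low, time_mid, time_hi = hex32[0:8], hex32[8:12], hex32[12:16]
    clock, node = hex32[16:20], hex32[20:32]
    return node + clock + time_low + time_mid + time_hi
```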
UserIdentity
{
  "_id": "baebc8bcc8e14f6e9bf70221d81711e2",
  "user": "jdoe",
  "ns": "aol",
  ...
  "profile": {
    "cc": "US",
    "firstNm": "John",
    "lang": "en_US",
    "lastNm": "Doe"
  },
  "sysTime": ISODate("2014-05-03T04:43:49.899Z")
}
Relationship Collection
• Supports all cardinalities
• Equivalent to an RDBMS intersection table (guid on each end of the relationship)
• Use eventually consistent framework for non-atomic writes
• Shard key is parent+child+type (parent lookup is the primary use case)
Relationship Collection
{
  "_id": "baa000163e5ff405b8083d5f164c11e3",
  "child": "8a9e00237d617f08df7f1685527711e2",
  "createTime": ISODate("2013-09-05T17:00:51.209Z"),
  "modTime": ISODate("2013-09-05T17:00:51.209Z"),
  "attributes": null,
  "parent": "baebc8bcc8e14f6e9bf70221d81711e2",
  "type": "CLASSROOM"
}
Legacy Collection
• Bridge collection to facilitate migration from old data model to new
• Near-image of old data model but with some refactoring (3 tables into 1 document)
• Once migration is complete, plan is to drop this collection
• Defaults not stored, 1-2 character field names
Legacy Collection
{
  "_id": "jdoe",
  "subData": {
    "f": NumberLong(1018628731),
    "g": "jdoe",
    "d": false,
    "e": NumberLong(1018628731),
    "b": NumberLong(434077116),
    "a": "JDoe",
    "l": NumberLong("212200907100000000"),
    "i": NumberLong(659952670)
  },
  "guid": "baebc8bcc8e14f6e9bf70221d81711e2",
  "st": ISODate("2013-06-24T20:13:16.627Z")
}
Reservation Collection
• Namespace protection
• Enforce uniqueness of user/namespace from the application side, because the shard key for the UserIdentity collection is guid
• Shard key is username+namespace
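Because UserIdentity is sharded by guid, no unique index there can span user+namespace; the Reservation collection (sharded by username+namespace) carries that constraint instead. A sketch of the application-side check, with a dict standing in for the collection:

```python
class ReservationStore:
    """In-memory stand-in for the Reservation collection.

    In the real system, sharding Reservation by username+namespace lets a
    unique index enforce this; the dict key plays that role here.
    """
    def __init__(self):
        self._docs = {}

    def reserve(self, user, ns, rsv_id):
        key = (user, ns)
        if key in self._docs:
            return False  # user/namespace already taken: reject
        self._docs[key] = {"user": user, "ns": ns, "rsvId": rsv_id}
        return True
```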
Reservation Collection
{
  "_id": "b13a00163e062d8ee9dc9eaf3e2411e1",
  "createTime": ISODate("2012-01-13T20:26:46.111Z"),
  "user": "jdoe",
  "expires": ISODate("2012-01-13T21:26:46.111Z"),
  "rsvId": "e9bddfe1-1c84-42c9-8f4c-1a7a96920ff4",
  "data": { "k1": "v1", "k2": "v2" },
  "ns": "aol",
  "type": "R"
}
Problems/Solutions
Problem
Writes spanning multiple documents sometimes fail partway through
Solution
• Developed eventually consistent framework “synchronizer”
• Events sent to framework to validate, repair, or finish
• Events retryable until success or ttl is expired
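The retry-until-success-or-TTL idea behind the “synchronizer” can be sketched as below. This is a minimal illustration (the real framework is event-driven and persistent; the clock injection here is just for testability):

```python
import time

def process_with_retry(event, handler, ttl_seconds=60, now=time.time):
    """Retry handler(event) until it succeeds or the event's TTL expires.

    A sketch of the eventually consistent 'synchronizer' idea: handlers
    validate, repair, or finish a multi-document write, and may fail
    transiently, so events stay retryable until success or TTL expiry.
    """
    deadline = event["created"] + ttl_seconds
    while now() < deadline:
        if handler(event):
            return True  # repair/finish succeeded
    return False  # TTL expired without success
```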
Problem
Scatter-gather queries are slower, with a 100% performance impact during failover
Solution
• Use Memcached to map non-shard key to shard key (99% hit ratio for one mapping, 55% for other)
• Use Memcached to map potentially expensive intermediary results (88% hit ratio)
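The non-shard-key-to-shard-key mapping can be sketched as a read-through cache: on a hit, the query becomes a targeted single-shard read; only a miss pays for the scatter-gather lookup. A dict stands in for Memcached here (the class and method names are illustrative):

```python
class ShardKeyCache:
    """Map a non-shard key (username) to the shard key (guid) via a cache.

    On a hit the caller can issue a targeted query by guid; on a miss it
    falls back to the expensive scatter-gather lookup and caches the result.
    """
    def __init__(self, lookup_guid):
        self._cache = {}          # stand-in for Memcached
        self._lookup = lookup_guid  # scatter-gather fallback
        self.hits = self.misses = 0

    def guid_for(self, user):
        guid = self._cache.get(user)
        if guid is not None:
            self.hits += 1
            return guid
        self.misses += 1
        guid = self._lookup(user)  # expensive scatter-gather query
        self._cache[user] = guid
        return guid
```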
Problem
Querying lists of users required parallel processing for performance, increasing connection requirements
Solution
Use $in operator to query lists of users rather than looping through individual queries
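Building the batched query is a one-liner; one `$in` query replaces N parallel single-user queries (and their N connections):

```python
def users_query(usernames):
    """Build a single MongoDB $in query document for a list of users,
    instead of issuing one query per user."""
    return {"user": {"$in": list(usernames)}}
```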
Problem
At application startup a large number of requests failed because of overhead in creating mongos connections
Solution
Build into application a “warm-up” stage that executes stock queries prior to going online and taking traffic
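A sketch of that warm-up stage, assuming a list of representative “stock” queries (names and error handling are illustrative):

```python
def warm_up(run_query, stock_queries):
    """Execute stock queries before taking traffic, so mongos connections
    and pools are created ahead of real load. Warm-up failures are
    tolerated; returns the number of queries that succeeded."""
    ok = 0
    for q in stock_queries:
        try:
            run_query(q)
            ok += 1
        except Exception:
            pass  # a failed warm-up query still primed a connection attempt
    return ok
```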
Problem
During failovers or other slow periods, application queues back up and recovery takes too long
Solution
Determine the request’s time in queue; if it exceeds the client’s timeout, don’t process it: drop the request
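The shedding check itself is tiny; the payoff is that a backed-up queue drains in O(queue length) cheap checks instead of O(queue length) expensive queries for clients that have already given up:

```python
def should_process(enqueued_at, client_timeout, now):
    """Drop requests whose time in queue already exceeds the client's
    timeout: the client has given up, so doing the work only delays
    recovery for requests that can still be answered."""
    return (now - enqueued_at) <= client_timeout
```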
Problem
Application-applied optimistic locking encounters lock errors during concurrent writes (the entire document is updated)
Solution
Use the $set operator to target writes to just the impacted fields; let MongoDB enforce atomicity
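Computing the targeted update from a before/after pair can be sketched as follows; two writers touching disjoint fields then no longer conflict, because MongoDB applies each `$set` atomically:

```python
def build_set_update(before, after):
    """Build a $set update document covering only changed fields, instead
    of rewriting the whole document under an optimistic lock."""
    changed = {k: v for k, v in after.items() if before.get(k) != v}
    return {"$set": changed} if changed else {}
```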
Problem
Reads go to the primary, but when the primary is lost, reads fail
Solution
Use primaryPreferred for reads: we want the freshest data (passwords, for example), but still want reads to work if no primary exists
Problem
Large number of connections to mongos/mongod is extending the failover times and nearing limits
Solution
• Application DAOs share connections to the same Mongo cluster
• Connection params were initially set too high
• Set connectionsPerHost and connectionMultiplier, plus a buffer, to cover the fixed number of worker threads per application (15/5 for 32 worker threads)
• Went from 15K connections to 2K connections
Benefits
• Unanticipated benefit: all eligible users can now use the AOL client
• Easily added Identity extensions leveraging the new data model
• Support for multiple namespaces made building APIs for multi-tenancy straightforward
• The model is positioned to make the vision for AOL Identity feasible
Lessons Learned
• Keep connections as low as possible
– Higher connection numbers increase failover times
• Avoid scatter-gather reads (use a cache if possible to get to the shard key)
• Keep the data set in memory
• Fail fast on the application side to lower recovery time
Going Forward
• Implement tagging to target secondaries
• Further reduction in scatter-gather reads
• Reduce the failover window to as short as possible
• Contact: [email protected]