What Drove Wordnik Non-Relational?

DESCRIPTION

Wordnik's technical co-founder Tony Tam describes why Wordnik went NoSQL. In this talk, Tony discusses the selection criteria, testing and evaluation, and the successful zero-downtime migration to MongoDB. He also covers Wordnik's speed and stability, and how NoSQL technologies have changed the way Wordnik scales.

Page 1: What Drove Wordnik Non-Relational?

NoSQL Now 2011

Why Wordnik went Non-Relational

Tony Tam
@fehguy

Page 2: What Drove Wordnik Non-Relational?

What this Talk is About

• 5 Key reasons why Wordnik migrated into a Non-Relational database
• Process for selection, migration
• Optimizations and tips from living survivors of the battle field

Page 3: What Drove Wordnik Non-Relational?

Why Should You Care?

• MongoDB user for almost 2 years
• Lessons learned, analysis, benefits from process
• We migrated from MySQL to MongoDB with no downtime
• We have interesting/challenging data needs, likely relevant to you

Page 4: What Drove Wordnik Non-Relational?

More on Wordnik

• World’s fastest updating English dictionary
  • Based on input of text up to 8k words/second
  • Word Graph as basis to our analysis
  • Synchronous & asynchronous processing
• 10’s of Billions of documents in NR storage
• 20M daily REST API calls, billions served
  • Powered by Swagger OSS API framework

swagger.wordnik.com

Page 5: What Drove Wordnik Non-Relational?

Architectural History

• 2008: Wordnik was born as a LAMP AWS EC2 stack
• 2009: Introduced public REST API, powered wordnik.com, partner APIs
• 2009: drank NoSQL cool-aid
• 2010: Scala
• 2011: Micro SOA

Page 6: What Drove Wordnik Non-Relational?

Non-relational by Necessity

• Moved to NR because of “4S”
  • Speed
  • Stability
  • Scaling
  • Simplicity
• But…
  • MySQL can go a LONG way
    • Takes right team, right reasons (+ patience)
  • NR offerings simply too compelling to focus on scaling MySQL

Page 7: What Drove Wordnik Non-Relational?

Wordnik’s 5 Whys for NoSQL

Page 8: What Drove Wordnik Non-Relational?

Why #1: Speed bumps with MySQL

• Inserting data fast (50k recs/second) caused MySQL mayhem
  • Maintaining indexes largely to blame
  • Operations for consistency unnecessary but "cannot be turned off"
• Devised twisted schemes to avoid client blocking
  • Aka the “master/slave tango”

Page 9: What Drove Wordnik Non-Relational?

Why #2: Retrieval Complexity

• Objects typically mapped to tables
  • Object Hierarchy always => inner + outer joins
• Lots of static data, so why join?
  • “Noun” is not getting renamed in my code’s lifetime!
  • Logic like this is probably in application logic
• Since storage is cheap
  • I’ll choose speed

Page 10: What Drove Wordnik Non-Relational?

Why #2: Retrieval Complexity

One definition = 10+

50 requests per second!

Page 11: What Drove Wordnik Non-Relational?

Why #2: Retrieval Complexity

• Embed objects in rows “sort of works”
  • Filtering gets really nasty
  • Native XML in MySQL?
    • If a full table-scan is OK…
• OK, then cache it!
  • Layers of caching introduced layers of complexity
    • Stale data/corruption
    • Object versionitis
    • Cache stampedes

Page 12: What Drove Wordnik Non-Relational?

Why #3: Object Modeling

• Object models being compromised for sake of persistence
  • This is backwards!
  • Extra abstraction for the wrong reason
• OK, then performance suffers
  • In-application joins across objects
  • “Who ran the fetch all query against production?!” – any sysadmin
• “My zillionth ORM layer that only I understand” (and can maintain)

Page 13: What Drove Wordnik Non-Relational?

Why #4: Scaling

• Needed "cloud friendly storage"
  • Easy up, easy down!
    • Startup: Sync your data, and announce to clients when ready for business
    • Shutdown: Announce your departure and leave
• Adding MySQL instances was a dance
  • Snapshot + bin files

    mysql> change master to MASTER_HOST='db1',
        MASTER_USER='xxx', MASTER_PASSWORD='xxx',
        MASTER_LOG_FILE='master-relay.000431',
        MASTER_LOG_POS=1035435402;

Page 14: What Drove Wordnik Non-Relational?

Why #4: Scaling

• What about those VMs?
  • So convenient! But… they kind of suck
  • Can the database succeed on a VM?
• VM Performance:
  • Memory, CPU or I/O: pick only one
  • Can your database really reduce CPU or disk I/O with lots of RAM?

Page 15: What Drove Wordnik Non-Relational?

Why #5: Big Picture

• BI tools use relational constraints for discovery
  • Is this the right reason for them?
  • Can we work around this?
  • Let’s have a BI tool revolution, too!
• True service architecture makes relational constraints impractical/impossible
• Distributed sharding makes relational constraints impractical/impossible

Page 16: What Drove Wordnik Non-Relational?

Why #5: Big Picture

• Is your app smarter than your database?
  • The logic line is probably blurry!
• What does count(*) really mean when you add 5k records/sec?
  • Maybe eventual consistency is not so bad…
• 2PC? Do some reading and decide!
  http://eaipatterns.com/docs/IEEE_Software_Design_2PC.pd

Page 17: What Drove Wordnik Non-Relational?

Ok, I’m in!

• I thought deciding was easy!?
  • Many quickly maturing products
  • Divergent features tackle different needs
• Wordnik spent 8 weeks researching and testing NoSQL solutions
  • This is a long time! (for a startup)
  • Wrote ODM classes and migrated our data
• Surprise! There were surprises
  • Be prepared to compromise

Page 18: What Drove Wordnik Non-Relational?

Choice Made, Now What?

• We went with MongoDB ***
  • Fastest to implement
  • Most reliable
  • Best community
• Why?
  • Why #1: Fast loading/retrieval
  • Why #2: Fast ODM (50 tps => 1000 tps!)
  • Why #3: Document Models === Object models
  • Why #4: MMF => Kernel-managed memory + RS
  • Why #5: It’s 2011, is there no progress?

Page 19: What Drove Wordnik Non-Relational?

More on Why MongoDB

• Testing, testing, testing
  • Used our migration tools to load test
    • Read from MySQL, write to MongoDB
  • We loaded 5+ billion documents, many times over
• In the end, one server could…
  • Insert 100k records/sec sustained
  • Read 250k records/sec sustained
  • Support concurrent loading/reading

Page 20: What Drove Wordnik Non-Relational?

Migration & Testing

• Iterated ODM mapping multiple times
• Some issues
  • Type Safety

    cur.next.get("iWasAnIntOnce").asInstanceOf[Long]

  • Dates as Strings

    obj.put("a_date", "2011-12-31") !=
    obj.put("a_date", new Date("2011-12-31"))

  • Storage Size

    obj.put("very_long_field_name", true) >>
    obj.put("vsfn", true)
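The first two issues on this slide come from reading untyped values back out of documents. A minimal sketch of defensive readers follows, assuming values arrive as `Any`; the `FieldReads` object and its method names are hypothetical and not part of any driver or of Wordnik's mapper.

```scala
import java.time.LocalDate
import scala.util.Try

// Hypothetical helpers for reading loosely typed document fields.
object FieldReads {
  // A field written as Int in old documents and Long in new ones arrives
  // boxed as Integer or Long. A direct asInstanceOf[Long] on a boxed
  // Integer throws ClassCastException; matching on Number accepts both.
  def asLong(value: Any): Option[Long] = value match {
    case n: java.lang.Number => Some(n.longValue)
    case _                   => None
  }

  // Accept a date stored either as a date value or as an ISO-8601 string
  // such as "2011-12-31", and normalize both to LocalDate.
  def asDate(value: Any): Option[LocalDate] = value match {
    case d: LocalDate => Some(d)
    case s: String    => Try(LocalDate.parse(s)).toOption
    case _            => None
  }
}
```

Readers like these only paper over mixed data; the slide's point still stands that the stored types should be made consistent during migration.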

Page 21: What Drove Wordnik Non-Relational?

Migration & Testing

• Expect data model iterations
  • Wordnik migrated table to Mongo collection "as-is"
    • Easier to migrate, test
    • _id field used same MySQL PK
• Auto Increment?
  • Used MySQL to “check-out” sequences
    • One row per mongo collection
    • Run out of sequences => get more
  • Need exclusive locks here!

Page 22: What Drove Wordnik Non-Relational?

Migration & Testing

• Sequence generator in-process

    SequenceGenerator.checkout("doc_metadata,100")

• Sequence generator as web service
  • Centralized UID management
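The check-out scheme can be sketched as a block allocator: a central store reserves a range of ids under an exclusive lock, and each process hands out ids from its range until it runs out. This is an illustration only. The class names, method signatures and the in-memory store are assumptions, and the talk's central store was a MySQL table with one row per collection.

```scala
import scala.collection.mutable

// Stand-in for the central sequence table (one entry per collection).
class SequenceStore {
  private val next = mutable.Map.empty[String, Long].withDefaultValue(1L)

  // Reserve `size` consecutive ids and return the first one.
  // `synchronized` plays the role of the exclusive lock.
  def checkout(name: String, size: Int): Long = synchronized {
    val start = next(name)
    next(name) = start + size
    start
  }
}

// In-process generator: serves ids from its current block and returns to
// the store only when the block is exhausted.
class SequenceGenerator(store: SequenceStore, name: String, blockSize: Int) {
  private var current = 0L
  private var limit = 0L // exclusive upper bound of the current block

  def nextId(): Long = synchronized {
    if (current >= limit) {
      current = store.checkout(name, blockSize)
      limit = current + blockSize
    }
    val id = current
    current += 1
    id
  }
}
```

With a block size of 100, a process touches the central store once per 100 ids. The trade-off is that ids are unique but not globally ordered across processes, and unused ids in a block are lost on restart.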

Page 23: What Drove Wordnik Non-Relational?

Migration & Testing

• Expect data access pattern iterations
  • So much more flexibility!
    • Reach into objects

      > db.dictionary_entry.find({"hdr.sr":"cmu"})

  • Access to a whole object tree at query time
• Overwrite a whole object at once… when desired
  • Not always! This clobbers the whole record

      > db.foo.save({foo:"bar"})

  • Update a single field:

      > db.foo.update({_id:8727353},{$set:{foo:"bar"}})
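The difference between overwriting a document and setting one field can be modelled without a database. The sketch below uses immutable Scala maps as a stand-in for a collection; `UpdateSemantics`, `save` and `setField` are hypothetical names that mirror the two shell commands above, not driver calls.

```scala
object UpdateSemantics {
  type Doc = Map[String, Any]

  // Whole-document overwrite: whatever was stored under the id is replaced,
  // so fields missing from `doc` are lost.
  def save(coll: Map[Long, Doc], id: Long, doc: Doc): Map[Long, Doc] =
    coll.updated(id, doc)

  // Field-level update in the spirit of $set: only the named field changes
  // and every other field of the stored document is kept.
  def setField(coll: Map[Long, Doc], id: Long, field: String, value: Any): Map[Long, Doc] =
    coll.get(id) match {
      case Some(existing) => coll.updated(id, existing.updated(field, value))
      case None           => coll // no matching document, nothing to update
    }
}
```

Starting from a document with `foo` and one other field, `save` with only `foo` leaves a one-field document, while `setField` keeps the other field intact.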

Page 24: What Drove Wordnik Non-Relational?

Flip the Switch

• Migrate production with zero downtime
  • We temporarily halted loading data
  • Added a switch to flip between MySQL/MongoDB
  • Instrument, monitor, flip it, analyze, flip back
• Profiling your code is key
  • What is slow?
  • Build this in your app from day 1
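One way to build profiling in from day 1 is a small timing wrapper around each storage call, so that the two backends can be compared after a flip. The sketch below is an assumption about how such instrumentation might look; the `Profiler` object and the label names are hypothetical and not Wordnik's code.

```scala
import scala.collection.mutable

object Profiler {
  // Per label: (total elapsed nanoseconds, number of calls).
  private val totals = mutable.Map.empty[String, (Long, Long)]

  // Run `body`, record how long it took under `label`, and return its result.
  // Timing is recorded even if `body` throws.
  def timed[A](label: String)(body: => A): A = {
    val start = System.nanoTime()
    try body
    finally {
      val elapsed = System.nanoTime() - start
      synchronized {
        val (nanos, n) = totals.getOrElse(label, (0L, 0L))
        totals(label) = (nanos + elapsed, n + 1)
      }
    }
  }

  def calls(label: String): Long =
    synchronized(totals.get(label).map(_._2).getOrElse(0L))

  def averageMicros(label: String): Option[Double] = synchronized {
    totals.get(label).map { case (nanos, n) => nanos / 1000.0 / n }
  }
}
```

A call site would look like `Profiler.timed("sentence.find") { dao.find(word) }`, where `dao` is whichever storage implementation is active.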

Page 25: What Drove Wordnik Non-Relational?

Flip the Switch

Page 26: What Drove Wordnik Non-Relational?

Flip the Switch

• Storage selected at runtime

    val h = shouldUseMongoDb match {
      case true => new MongoDbSentenceDAO
      case _ => new MySQLDbSentenceDAO
    }
    h.find(...)

• Hot-swappable storage via configuration
  • It worked!
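For the switch to work, both DAOs need a common interface that callers depend on. A self-contained sketch follows, assuming a `SentenceDAO` trait and a mutable flag in a `Storage` object. Only the two class names and `shouldUseMongoDb` come from the slide; the trait, the `backend` field and the method bodies are placeholders.

```scala
// Common interface the rest of the application codes against (assumed).
trait SentenceDAO {
  def backend: String
  def find(word: String): Seq[String]
}

// Placeholder implementations; real ones would query each database.
class MongoDbSentenceDAO extends SentenceDAO {
  val backend = "mongodb"
  def find(word: String): Seq[String] = Seq(s"[$backend] sentence containing $word")
}

class MySQLDbSentenceDAO extends SentenceDAO {
  val backend = "mysql"
  def find(word: String): Seq[String] = Seq(s"[$backend] sentence containing $word")
}

object Storage {
  // Assumed to be driven by runtime configuration. @volatile makes a flip
  // visible to all request threads without a restart.
  @volatile var shouldUseMongoDb: Boolean = false

  // Resolved on every call, so flipping the flag takes effect immediately.
  def sentenceDAO: SentenceDAO =
    if (shouldUseMongoDb) new MongoDbSentenceDAO else new MySQLDbSentenceDAO
}
```

Because the DAO is looked up per call, flipping the flag back is as cheap as flipping it forward, which is what makes "flip it, analyze, flip back" practical.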

Page 27: What Drove Wordnik Non-Relational?

Then What?

• Watch our deployment, many iterations to mapping layer
  • Settled on in-house, type-safe mapper
    https://github.com/fehguy/mongodb-benchmark-tools
• Some gotchas (of course)
  • Locking issues on long-running updates (more in a minute)
• We want more of this!
  • Migrated shared files to Mongo GridFS
  • Easy-IT

Page 28: What Drove Wordnik Non-Relational?

Performance + Optimization

• Loading data is fast!
  • Fixed collection padding, similarly-sized records
  • Tail of collection is always in memory
  • Append faster than MySQL in every case tested
• But... random access started getting slow
  • Indexes in RAM? Yes
  • Data in RAM? No, > 2TB per server
  • Limited by disk I/O /seek performance
  • EC2 + EBS for storage?

Page 29: What Drove Wordnik Non-Relational?

Performance + Optimization

• Moved to physical data center
  • DAS & 72GB RAM => great uncached performance
• Good move? Depends on use case
  • If “access anything anytime”, not many options
  • You want to support this?

Page 30: What Drove Wordnik Non-Relational?

Performance + Optimization

• Inserts are fast, how about updates?
  • Well… update => find object, update it, save
  • Lock acquired at “find”, released after “save”
    • If hitting disk, lock time could be large
• Easy answer, pre-fetch on update
• Oh, and NEVER do “update all records” against a large collection
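The pre-fetch idea is to read the document before updating it, so that any disk access happens outside the write lock. The toy model below only counts how often an update finds its document "cold". The `PagedStore` class, its `resident` set and the counter are illustrative assumptions and say nothing about MongoDB internals.

```scala
import scala.collection.mutable

// Single-threaded toy model: `resident` stands in for documents already
// paged into RAM; comments mark where a write lock would be held.
class PagedStore(initial: Map[Long, String]) {
  private var docs = initial
  private val resident = mutable.Set.empty[Long]
  private var coldReads = 0

  // A read pages the document in without taking the write lock.
  def find(id: Long): Option[String] = {
    resident += id
    docs.get(id)
  }

  def update(id: Long, value: String): Unit = {
    // write lock would be acquired here
    if (!resident.contains(id)) {
      coldReads += 1 // disk access while holding the lock: the slow case
      resident += id
    }
    docs = docs.updated(id, value)
    // write lock would be released here
  }

  def coldReadsUnderLock: Int = coldReads
}

object Updates {
  // Pre-fetch on update: warm the document first, then update it.
  def prefetchThenUpdate(store: PagedStore, id: Long, value: String): Unit = {
    store.find(id)
    store.update(id, value)
  }
}
```

In this model a plain `update` on an untouched document increments the cold-read counter, while `prefetchThenUpdate` does not, at the cost of one extra read per update.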

Page 31: What Drove Wordnik Non-Relational?

Performance + Optimization

• Indexes
  • Can't always keep index in RAM. MMF "does its thing"
  • Right-balanced b-tree keeps necessary index hot
  • Indexes hit disk => mute your pager

Page 32: What Drove Wordnik Non-Relational?

More Mongo, Please!

• We modeled our word graph in mongo
  • …0M Nodes
  • …0M Edges
  • …0μS edge fetch

Page 33: What Drove Wordnik Non-Relational?

More Mongo, Please!

• Analytics rolled-up from aggregation jobs
  • Send to Hadoop, load to mongo for fast access

Page 34: What Drove Wordnik Non-Relational?

What’s next

• Liberate our models
  • stop worrying about how to store them (for the most part)
• New features almost always NR
• Some MySQL left
  • Less on each release

Page 35: What Drove Wordnik Non-Relational?

Questions?

• See more about Wordnik APIs
  http://developer.wordnik.com
• Migrating from MySQL to MongoDB
  http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordn
• Maintaining your MongoDB Installation
  http://www.slideshare.net/fehguy/mongo-sv-tony-tam
• Swagger API Framework
  http://swagger.wordnik.com
• Mapping Benchmark
  https://github.com/fehguy/mongodb-benchmark-tools
• Wordnik OSS Tools
  https://github.com/wordnik/wordnik-oss