Wordnik's technical co-founder Tony Tam describes the reasons for going NoSQL. In this talk, Tony discusses the selection criteria, testing and evaluation, and the successful zero-downtime migration to MongoDB. He also covers Wordnik's speed and stability, and how NoSQL technologies have changed the way Wordnik scales.
NoSQL Now 2011
Why Wordnik went Non-Relational
Tony Tam
@fehguy
What this Talk is About
• 5 Key reasons why Wordnik migrated into a Non-Relational database
• Process for selection, migration
• Optimizations and tips from living survivors of the battle field
Why Should You Care?
• MongoDB user for almost 2 years
• Lessons learned, analysis, benefits from process
• We migrated from MySQL to MongoDB with no downtime
• We have interesting/challenging data needs, likely relevant to you
More on Wordnik
• World's fastest updating English dictionary
  • Based on input of text up to 8k words/second
  • Word Graph as basis to our analysis
  • Synchronous & asynchronous processing
• 10's of Billions of documents in NR storage
• 20M daily REST API calls, billions served
  • Powered by Swagger OSS API framework
  swagger.wordnik.com
Architectural History
• 2008: Wordnik was born as a LAMP AWS EC2 stack
• 2009: Introduced public REST API, powered wordnik.com, partner APIs
• 2009: drank NoSQL Kool-Aid
• 2010: Scala
• 2011: Micro SOA
Non-relational by Necessity
• Moved to NR because of "4S"
  • Speed
  • Stability
  • Scaling
  • Simplicity
• But…
  • MySQL can go a LONG way
  • Takes right team, right reasons (+ patience)
  • NR offerings simply too compelling to focus on scaling MySQL
Wordnik's 5 Whys for NoSQL
Why #1: Speed bumps with MySQL
• Inserting data fast (50k recs/second) caused MySQL mayhem
  • Maintaining indexes largely to blame
  • Operations for consistency unnecessary but "cannot be turned off"
• Devised twisted schemes to avoid client blocking
  • Aka the "master/slave tango"
Why #2: Retrieval Complexity
• Objects typically mapped to tables
  • Object Hierarchy always => inner + outer joins
• Lots of static data, so why join?
  • "Noun" is not getting renamed in my code's lifetime!
  • Logic like this is probably in application logic
• Since storage is cheap
  • I'll choose speed
Why #2: Retrieval Complexity
One definition = 10+
50 requests per second!
Why #2: Retrieval Complexity
• Embed objects in rows "sort of works"
  • Filtering gets really nasty
  • Native XML in MySQL?
• If a full table-scan is OK…
• OK, then cache it!
  • Layers of caching introduced layers of complexity
  • Stale data/corruption
  • Object versionitis
  • Cache stampedes
Why #3: Object Modeling
• Object models being compromised for sake of persistence
  • This is backwards!
  • Extra abstraction for the wrong reason
• OK, then performance suffers
  • In-application joins across objects
  • "Who ran the fetch all query against production?!" (any sysadmin)
• "My zillionth ORM layer that only I understand" (and can maintain)
Why #4: Scaling
• Needed "cloud friendly storage"
  • Easy up, easy down!
• Startup: Sync your data, and announce to clients when ready for business
• Shutdown: Announce your departure and leave
• Adding MySQL instances was a dance
  • Snapshot + bin files
  mysql> change master to MASTER_HOST='db1',
         MASTER_USER='xxx', MASTER_PASSWORD='xxx',
         MASTER_LOG_FILE='master-relay.000431',
         MASTER_LOG_POS=1035435402;
Why #4: Scaling
• What about those VMs?
  • So convenient! But… they kind of suck
  • Can the database succeed on a VM?
• VM Performance:
  • Memory, CPU or I/O: pick only one
  • Can your database really reduce CPU or disk I/O with lots of RAM?
Why #5: Big Picture
• BI tools use relational constraints for discovery
  • Is this the right reason for them?
  • Can we work around this?
  • Let's have a BI tool revolution, too!
• True service architecture makes relational constraints impractical/impossible
• Distributed sharding makes relational constraints impractical/impossible
Why #5: Big Picture
• Is your app smarter than your database?
  • The logic line is probably blurry!
• What does count(*) really mean when you add 5k records/sec?
  • Maybe eventual consistency is not so bad…
• 2PC? Do some reading and decide!
  http://eaipatterns.com/docs/IEEE_Software_Design_2PC.pd
Ok, I'm in!
• I thought deciding was easy!?
  • Many quickly maturing products
  • Divergent features tackle different needs
• Wordnik spent 8 weeks researching and testing NoSQL solutions
  • This is a long time! (for a startup)
  • Wrote ODM classes and migrated our data
• Surprise! There were surprises
  • Be prepared to compromise
Choice Made, Now What?
• We went with MongoDB ***
  • Fastest to implement
  • Most reliable
  • Best community
• Why?
  • Why #1: Fast loading/retrieval
  • Why #2: Fast ODM (50 tps => 1000 tps!)
  • Why #3: Document Models === Object models
  • Why #4: MMF => Kernel-managed memory + RS
  • Why #5: It's 2011, is there no progress?
More on Why MongoDB
• Testing, testing, testing
  • Used our migration tools to load test
  • Read from MySQL, write to MongoDB
  • We loaded 5+ billion documents, many times over
• In the end, one server could…
  • Insert 100k records/sec sustained
  • Read 250k records/sec sustained
  • Support concurrent loading/reading
Migration & Testing
• Iterated ODM mapping multiple times
• Some issues
  • Type Safety
    cur.next.get("iWasAnIntOnce").asInstanceOf[Long]
  • Dates as Strings
    obj.put("a_date", "2011-12-31") !=
    obj.put("a_date", new Date("2011-12-31"))
  • Storage Size
    obj.put("very_long_field_name", true) >>
    obj.put("vsfn", true)
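The type-safety and date issues above can be illustrated without a database. This is a minimal sketch, assuming a document is modeled as a plain `Map[String, Any]`; `FieldReads`, `readLong`, and `toDate` are hypothetical names, not part of Wordnik's mapper or any MongoDB driver. It widens whatever numeric type was stored rather than casting blindly (a blind `asInstanceOf[Long]` on a value stored as an `Int` throws a `ClassCastException`), and it converts an ISO date string into a real `Date` so the stored value is a date rather than text.

```scala
import java.time.{LocalDate, ZoneOffset}
import java.util.Date

object FieldReads {
  // Widen any stored integer type to Long; return None for anything else
  // instead of failing with a ClassCastException at read time.
  def readLong(doc: Map[String, Any], key: String): Option[Long] =
    doc.get(key).flatMap {
      case i: Int  => Some(i.toLong)
      case l: Long => Some(l)
      case _       => None
    }

  // Parse an ISO-8601 date (yyyy-MM-dd) into a Date at midnight UTC,
  // so the field is stored as a date rather than as a string.
  def toDate(iso: String): Date =
    Date.from(LocalDate.parse(iso).atStartOfDay(ZoneOffset.UTC).toInstant)
}
```

For example, `FieldReads.readLong(Map("iWasAnIntOnce" -> 1), "iWasAnIntOnce")` yields `Some(1L)` whether the field was written as an `Int` or a `Long`.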
Migration & Testing
• Expect data model iterations
  • Wordnik migrated table to Mongo collection "as-is"
  • Easier to migrate, test
  • _id field used same MySQL PK
• Auto Increment?
  • Used MySQL to "check-out" sequences
    • One row per mongo collection
    • Run out of sequences => get more
  • Need exclusive locks here!
Migration & Testing
• Sequence generator in-process
  SequenceGenerator.checkout("doc_metadata,100")
• Sequence generator as web service
  • Centralized UID management
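The "check out a block of sequences, then get more when they run out" idea can be sketched as follows. This is an illustration under stated assumptions, not Wordnik's `SequenceGenerator`: `SequenceStore`, `InMemorySequenceStore`, and `BlockSequence` are hypothetical names, and the in-memory store stands in for the MySQL table with one row per collection. The `synchronized` blocks play the role of the exclusive lock the slides call for.

```scala
// Hypothetical backing store: reserves a block and returns its first id.
trait SequenceStore {
  def reserve(name: String, blockSize: Int): Long
}

// In-memory stand-in for the "one row per collection" sequence table.
class InMemorySequenceStore extends SequenceStore {
  private val counters = scala.collection.mutable.Map.empty[String, Long]

  // The exclusive lock: only one caller may advance a counter at a time.
  def reserve(name: String, blockSize: Int): Long = synchronized {
    val start = counters.getOrElse(name, 1L)
    counters(name) = start + blockSize
    start
  }
}

// In-process generator: hands out ids locally and only goes back to the
// store when the current block is exhausted.
class BlockSequence(store: SequenceStore, name: String, blockSize: Int) {
  private var next  = 0L
  private var limit = 0L // exclusive upper bound of the current block

  def nextId(): Long = synchronized {
    if (next >= limit) {
      next = store.reserve(name, blockSize)
      limit = next + blockSize
    }
    val id = next
    next += 1
    id
  }
}
```

Two generators sharing one store never hand out the same id, because each works inside its own reserved block. The trade-off is that ids are unique but not strictly ordered across processes.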
Migration & Testing
• Expect data access pattern iterations
  • So much more flexibility!
• Reach into objects
  > db.dictionary_entry.find({"hdr.sr":"cmu"})
• Access to a whole object tree at query time
• Overwrite a whole object at once… when desired
  • Not always! This clobbers the whole record
  > db.foo.save({foo:"bar"})
• Update a single field:
  > db.foo.update({_id:18727353},{$set:{foo:"bar"}})
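The difference between overwriting a whole record and updating a single field can be modeled with immutable maps. This is only a toy model of the semantics, assuming a document is a `Map[String, Any]`; `DocOps`, `overwrite`, and `setFields` are hypothetical names and do not call MongoDB.

```scala
object DocOps {
  type Doc = Map[String, Any]

  // Whole-document overwrite: every field not present in the replacement
  // is lost; only the _id of the existing document is carried over.
  def overwrite(existing: Doc, replacement: Doc): Doc =
    replacement.updated("_id", existing("_id"))

  // Field-level update in the spirit of $set: only the named fields
  // change, everything else is preserved.
  def setFields(existing: Doc, fields: Doc): Doc =
    existing ++ fields
}
```

Given a document with `_id`, `foo`, and `hits`, `overwrite` with `Map("foo" -> "bar")` drops `hits`, while `setFields` with the same argument keeps it.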
Flip the Switch
• Migrate production with zero downtime
  • We temporarily halted loading data
  • Added a switch to flip between MySQL/MongoDB
  • Instrument, monitor, flip it, analyze, flip back
• Profiling your code is key
  • What is slow?
  • Build this in your app from day 1
Flip the Switch
• Storage selected at runtime
  val h = shouldUseMongoDb match {
    case true => new MongoDbSentenceDAO
    case _ => new MySQLDbSentenceDAO
  }
  h.find(...)
• Hot-swappable storage via configuration
  • It worked!
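One way to make the storage choice flippable while the application is running is to put both DAOs behind a common interface and route each call through a flag. This is a sketch under stated assumptions, not Wordnik's code: the `SentenceDAO` trait, its `find(id)` signature, `MapBackedSentenceDAO`, and `SwitchableSentenceDAO` are all hypothetical, and the two backends are in-memory stand-ins.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical DAO interface shared by both storage backends.
trait SentenceDAO {
  def find(id: Long): Option[String]
}

// In-memory stand-in used here for both the MySQL and MongoDB backends.
class MapBackedSentenceDAO(data: Map[Long, String]) extends SentenceDAO {
  def find(id: Long): Option[String] = data.get(id)
}

// Routes every call to whichever backend the flag currently selects,
// so the switch can be flipped (and flipped back) without a restart.
class SwitchableSentenceDAO(mysql: SentenceDAO, mongo: SentenceDAO) extends SentenceDAO {
  private val useMongo = new AtomicBoolean(false)

  def flip(toMongo: Boolean): Unit = useMongo.set(toMongo)

  def find(id: Long): Option[String] =
    (if (useMongo.get()) mongo else mysql).find(id)
}
```

In a real deployment the flag would presumably be driven by configuration or an admin endpoint, with timing instrumentation around each call so the two backends can be compared before committing to the flip.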
Then What?
• Watch our deployment, many iterations to mapping layer
  • Settled on in-house, type-safe mapper
  https://github.com/fehguy/mongodb-benchmark-tools
• Some gotchas (of course)
  • Locking issues on long-running updates (more in a minute)
• We want more of this!
  • Migrated shared files to Mongo GridFS
  • Easy-IT
Performance + Optimization
• Loading data is fast!
  • Fixed collection padding, similarly-sized records
  • Tail of collection is always in memory
  • Append faster than MySQL in every case tested
• But... random access started getting slow
  • Indexes in RAM? Yes
  • Data in RAM? No, > 2TB per server
  • Limited by disk I/O /seek performance
  • EC2 + EBS for storage?
Performance + Optimization
• Moved to physical data center
  • DAS & 72GB RAM => great uncached performance
• Good move? Depends on use case
  • If "access anything anytime", not many options
  • You want to support this?
Performance + Optimization
• Inserts are fast, how about updates?
  • Well… update => find object, update it, save
  • Lock acquired at "find", released after "save"
  • If hitting disk, lock time could be large
• Easy answer, pre-fetch on update
• Oh, and NEVER do "update all records" against a large collection
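The pre-fetch idea is to read the document before updating it, so the slow disk read happens outside the write lock and the update itself finds the data already in memory. The toy model below only illustrates that ordering; `ToyStore`, its counter, and `prefetchThenUpdate` are hypothetical and simulate the behavior described on the slide rather than MongoDB's actual locking.

```scala
// Toy store: tracks which documents are "in memory" and counts how many
// simulated disk reads happen while the write lock is held.
class ToyStore(initial: Map[Long, Map[String, Any]]) {
  private var docs = initial
  private val inMemory = scala.collection.mutable.Set.empty[Long]
  var diskReadsUnderLock = 0

  // Plain read: pages the document in without taking the write lock.
  def find(id: Long): Option[Map[String, Any]] = {
    inMemory += id
    docs.get(id)
  }

  // Update: holds the (simulated) write lock across find, modify and save.
  // If the document is not in memory yet, the disk read is paid under the lock.
  def update(id: Long, fields: Map[String, Any]): Unit = synchronized {
    if (!inMemory(id)) {
      diskReadsUnderLock += 1
      inMemory += id
    }
    docs.get(id).foreach(d => docs = docs.updated(id, d ++ fields))
  }

  // Pre-fetch on update: warm the document first, then update it.
  def prefetchThenUpdate(id: Long, fields: Map[String, Any]): Unit = {
    find(id)
    update(id, fields)
  }
}
```

Calling `update` on a cold document increments `diskReadsUnderLock`, while `prefetchThenUpdate` on a cold document leaves it unchanged, which is the effect the pre-fetch is meant to achieve.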
Performance + Optimization
• Indexes
  • Can't always keep index in RAM. MMF "does its thing"
  • Right-balanced b-tree keeps necessary index hot
  • Indexes hit disk => mute your pager
More Mongo, Please!
• We modeled our word graph in mongo
  …0M Nodes
  …0M Edges
  …0μS edge fetch
More Mongo, Please!
• Analytics rolled-up from aggregation jobs
  • Send to Hadoop, load to mongo for fast access
What's next
• Liberate our models
  • stop worrying about how to store them (for the most part)
• New features almost always NR
• Some MySQL left
  • Less on each release
Questions?
• See more about Wordnik APIs
  http://developer.wordnik.com
• Migrating from MySQL to MongoDB
  http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordn
• Maintaining your MongoDB Installation
  http://www.slideshare.net/fehguy/mongo-sv-tony-tam
• Swagger API Framework
  http://swagger.wordnik.com
• Mapping Benchmark
  https://github.com/fehguy/mongodb-benchmark-tools
• Wordnik OSS Tools
  https://github.com/wordnik/wordnik-oss