NoSQL
MongoDB and Redis as alternatives to traditional RDBMS
Then...
...and now
*This thing weighs less than 50g
Meaning of NoSQL
1970 = We have no SQL
1980 = Know SQL
2000 = No SQL!
2005 = Not only SQL
2014 = No, SQL
(slide adapted from @markmadsen)
MongoDB
MongoDB
● It is the “new MySQL”
● Project started in 2007 by 10gen (now MongoDB Inc)
● Cross-platform, open-source
● 5th most used DBMS & most used Document Store* (next Document Store: CouchDB, 21st)
* According to db-engines.com as of Oct 2014
Characteristics
● “It's really a hybrid database with features from a few different places.” (Gaetan Voyer-Perrault on Quora)
● Document Oriented but NO SCHEMA!
● Documents grouped in Collections
● Binary JSON (BSON) format
● Load Balancing (automated sharding, sharding key can be user defined)
● Replication (Replica Sets)
● Automated failover
Characteristics - continued
● Primary and Secondary Indexes
● JavaScript for user-defined functions (UDF)
● MapReduce
● Capped Collections
● Aggregation Framework since 2.2
● Ad-hoc Query Support
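The behaviour of a capped collection can be illustrated with a plain-Python analogy. This is not MongoDB code: it only mimics the idea of a fixed-size collection that keeps insertion order and discards the oldest entries once full. The cap of 3 documents and the document fields are hypothetical.

```python
from collections import deque

# Hypothetical cap of 3 documents; a real capped collection is sized in bytes,
# optionally with a maximum document count on top.
capped_log = deque(maxlen=3)

for n in range(1, 6):
    capped_log.append({"seq": n, "msg": f"event {n}"})

# Only the newest entries remain, still in insertion order.
remaining = [doc["seq"] for doc in capped_log]
print(remaining)
```

This is why capped collections suit logs and buffers: writes never grow the collection, and old data ages out without an explicit delete.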
Caveats
Generic performance tips
● Use 64-bit OS
● Lots of RAM, fast disks (was anyone expecting something else?)
● Ensure that at least indexes + working set fit in RAM (db.stats(), db.<coll>.stats()) - if not, you might want to try TokuMX
● Design for de-normalized data models
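The "indexes + working set fit in RAM" check from the tips above can be sketched as a small helper. It assumes a dict shaped like the output of db.stats(), using its `dataSize` and `indexSize` fields (both in bytes). The function name `fits_in_ram`, the `working_set_fraction` and `headroom` parameters, and the sample figures are all hypothetical.

```python
def fits_in_ram(stats, ram_bytes, working_set_fraction=1.0, headroom=0.8):
    """Return True if indexes plus the estimated working set fit in RAM.

    stats: dict with 'dataSize' and 'indexSize' in bytes, as in db.stats().
    working_set_fraction: assumed share of the data that is actively used.
    headroom: assumed share of RAM that is available to the database.
    """
    needed = stats["indexSize"] + stats["dataSize"] * working_set_fraction
    return needed <= ram_bytes * headroom


GIB = 1024 ** 3
# Hypothetical figures, not measurements from any deployment described here.
sample_stats = {"dataSize": 6 * GIB, "indexSize": 1 * GIB}

# Worst case: the whole data set is hot.
print(fits_in_ram(sample_stats, ram_bytes=8 * GIB))
# Assumption: only half of the data is accessed regularly.
print(fits_in_ram(sample_stats, ram_bytes=8 * GIB, working_set_fraction=0.5))
```

The working set is rarely known exactly, so treating the whole data set as hot gives a conservative answer; lowering the fraction is a guess that should be checked against real access patterns.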
Generic performance tips
● Write-Concerns
● Shard early
● Fixed (or at least bounded) record size => better write performance
● Use short attribute names (reduces index & data size, OFC!)
● EXT4 or XFS
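The effect of short attribute names can be estimated with a rough sketch. Because there is no schema, every document stores its own field names, so longer names cost bytes in each document. The example below uses compact JSON as a stand-in for BSON (the exact byte counts differ, but the difference in name length carries over); the field names and values are hypothetical.

```python
import json


def encoded_size(doc):
    """Size in bytes of the document as compact JSON (a rough proxy for BSON)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


# Hypothetical documents holding the same values under long and short names.
long_doc = {"customer_identifier": 42, "last_login_timestamp": 1414800000}
short_doc = {"cid": 42, "ts": 1414800000}

saved = encoded_size(long_doc) - encoded_size(short_doc)
print(f"bytes saved per document: {saved}")
print(f"bytes saved over 130M documents: {saved * 130_000_000}")
```

A few dozen bytes per document looks negligible, but multiplied over a collection of the size mentioned in the IRL slide it adds up to gigabytes that would otherwise compete with the working set for RAM.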
IRL
● virtualized server 8G RAM, 4 vCPU - no sharding, no replica sets
● 100 inserts/s, 130M doc collection WITH secondary index (avg doc size 0.6k)
● 20 inserts/s 3M doc collection WITH 18 secondary indexes (avg doc size 10k)
Use Cases
● Logs
● Location Data (Mongo has built-in Geospatial ops)
● Account and User Profiles
● Messaging
● (complex) Config Data
● http://www.mongodb.com/who-uses-mongodb (hint: Expedia, Business Insider, The Weather Channel, Foursquare, eBay)
Redis
Redis
● Salvatore Sanfilippo (@antirez)
● Started in 2009
● Key-Value Store
● 11th most used DBMS & most used KV Store* (next KVS: memcached, 19th)
● Sponsored by Pivotal (spinoff EMC/VMware)
* According to db-engines.com as of Oct 2014
Characteristics
● Holds all data in memory, persists on disk
● Data Models
○ Strings/Blobs/Bit-Maps (not really Bitmaps)
○ Hashtables
○ Linked Lists
○ Sets
○ Sorted Sets
● HyperLogLog (since 2.8.9 - trades accuracy for memory)
● Master Slave Replication
● High Availability (through Sentinel)
Characteristics - continued
● Redis Cluster in the works (not production ready yet) - sharding
○ asynchronous replication
○ does not guarantee strong consistency (may ‘forget’ writes)
● AOF sync - default 2s
● Does not support secondary indexes
● Pub/Sub mode since 2.0
● Key expiry
● Server scripting with Lua
IRL
● virtualized server 4G RAM, 1 vCPU
● +50k get/set per second (redis-benchmark)
● only 128 queries out of 1165550375 over 10ms (0.00001%)
○ uptime_in_days:439
○ used_memory_human:424.09M
○ used_memory_peak_human:834.94M
○ total_connections_received:1352935
○ db0:keys=610884,expires=355397
Generic performance tips
● Use short key names (reduces data size, OFC!)
● You can create secondary indexes (but you have to maintain them yourself, e.g. using SET)
● You can have ad-hoc queries (actually, a query): using SORT
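The hand-maintained secondary index mentioned above can be sketched as follows, assuming "SET" refers to the Set data type. To keep the example runnable without a server, it uses a tiny in-memory stand-in (`FakeRedis`, hypothetical) whose method names mirror the Redis commands HSET, HGET, SADD, SREM and SMEMBERS. The key layout (`user:<id>`, `idx:city:<name>`) and the helper functions are illustrative assumptions, not a convention of Redis itself.

```python
from collections import defaultdict


class FakeRedis:
    """In-memory stand-in for the few commands used below (not a real client)."""

    def __init__(self):
        self.hashes = defaultdict(dict)
        self.sets = defaultdict(set)

    def hset(self, name, key, value):
        self.hashes[name][key] = value

    def hget(self, name, key):
        return self.hashes[name].get(key)

    def sadd(self, name, *values):
        self.sets[name].update(values)

    def srem(self, name, *values):
        self.sets[name].difference_update(values)

    def smembers(self, name):
        return set(self.sets[name])


def save_user(r, user_id, city):
    """Store the user and keep the hand-made city index in sync."""
    old_city = r.hget(f"user:{user_id}", "city")
    if old_city is not None and old_city != city:
        r.srem(f"idx:city:{old_city}", user_id)  # drop the stale index entry
    r.hset(f"user:{user_id}", "city", city)
    r.sadd(f"idx:city:{city}", user_id)


def users_in_city(r, city):
    """Look up user ids through the index set instead of scanning all keys."""
    return r.smembers(f"idx:city:{city}")


r = FakeRedis()
save_user(r, "1", "Cluj")
save_user(r, "2", "Cluj")
save_user(r, "1", "Bucharest")  # user 1 moves; the index must follow
print(users_in_city(r, "Cluj"))
print(users_in_city(r, "Bucharest"))
```

The cost of this approach is visible in `save_user`: every write has to update the index as well as the data, and against a real server the two steps would need to be grouped (for example in a transaction or a Lua script) to stay consistent.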
Use Cases
● Cache
● IPSS/IPC
● Queue mechanisms (see e.g. Resque)
● Log/Task buffers
● Statistics and aggregation datastore
● (anywhere you use memcached)
● http://redis.io/topics/whos-using-redis (hint: Twitter, GitHub, Snapchat, StackOverflow a.o.)
Recap
One size does NOT fit all!
Further reading
● Must read: http://blog.andreamostosi.name/big-data/ (almost exhaustive list of all things NoSQL and BigData)