Riak @ Shareaholic: Robby Grossman, [email protected], @freerobby

Riak at shareaholic

DESCRIPTION

Slides from my talk on using Riak at Shareaholic

Page 1: Riak at shareaholic

Riak @ Shareaholic

Robby Grossman

[email protected]

@freerobby

Page 2: Riak at shareaholic

Agenda

Shareaholic: Product & Tech

Why Riak: The Search for a Big Data Store

Transitioning to Riak

Riak Use Cases

Deploying to EC2

Page 3: Riak at shareaholic

What’s Shareaholic?

Page 4: Riak at shareaholic

Browser Tools

Page 5: Riak at shareaholic

Sharing Buttons

Page 6: Riak at shareaholic

Recommendations

Page 7: Riak at shareaholic

Social Analytics

Page 8: Riak at shareaholic

Monthly @ Shareaholic

Thousands of developers hitting API

Hundreds of thousands of publishers

Tens of millions of shares & clicks

Hundreds of millions of pageviews & events

Page 9: Riak at shareaholic

Tech @ Shareaholic

JRuby on Rails (via Torquebox)

MySQL (Master, Read Slave)

Elastic MapReduce (similar to Hadoop)

Redis

Formerly Mongo, Now Riak

Page 10: Riak at shareaholic

Why Not Mongo?

Working set needs to fit in memory

Global write lock blocks all queries, despite not having transactions/joins

Standbys not “hot”

Page 11: Riak at shareaholic

Why Riak?

Page 12: Riak at shareaholic

Next @ Shareaholic

Options:

HBase

Cassandra

Riak

Goals:

Linear scalability

Full-text search

Flexible indexing

Easier Devops

Page 13: Riak at shareaholic

HBase

Pros

Battle tested

High performance

Cons

Complex Architecture

SPOFs

Requires Hive for Indexing/Querying

Expensive to deploy at small scale

Page 14: Riak at shareaholic

Cassandra

Pros

Native secondary indices

Linear scalability

Tunable CAP

Cons

Known users all domain experts

Search requires Lucene

Heavy Weight MapReduce

Page 15: Riak at shareaholic

Riak

Pros

Operationally simpler

Linear scalability

Integrated search

Secondary indices

Tunable CAP

Vector clocks solve time-sync problems

Cons

Multi-data center replication requires Enterprise product

leveldb puts high strain on CPU
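
The vector-clock item in the Riak pros above refers to ordering updates by per-actor counters instead of wall-clock timestamps. The following is a minimal Python sketch of that comparison, not Riak's actual implementation; the dict representation and the function names `descends` and `compare` are assumptions made for illustration.

```python
def descends(a, b):
    """True if clock `a` has seen every update recorded in clock `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify two vector clocks without consulting any wall-clock time."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a_newer"
    if descends(b, a):
        return "b_newer"
    # Neither side has seen the other's updates: a genuine conflict (siblings).
    return "concurrent"


# Hypothetical clocks: each key is an actor (client or node), each value a counter.
v1 = {"node_a": 2, "node_b": 1}
v2 = {"node_a": 1, "node_b": 1}
v3 = {"node_a": 1, "node_b": 2}

print(compare(v1, v2))
print(compare(v1, v3))
```

Because the result depends only on the counters, two servers with drifting system clocks still agree on which write supersedes which, and on when two writes truly conflict.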

Page 16: Riak at shareaholic

From Mongo to Riak

Page 17: Riak at shareaholic

Migration Goals

No time where database goes “offline”

Product parity throughout migration

Page 18: Riak at shareaholic

Migration Process

1. App writes to Mongo and Riak

2. Verify data integrity

3. Import historical data

4. App reads from Riak

5. Decommission Mongo
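
The five steps above can be sketched as a dual-write wrapper. This is an illustrative Python sketch only: plain dicts stand in for the Mongo and Riak clients, and the class `DualWriteStore` and its method names are hypothetical, not taken from the talk.

```python
class DualWriteStore:
    """Hypothetical wrapper; dicts stand in for the real database clients."""

    def __init__(self, old_store, new_store):
        self.old = old_store
        self.new = new_store
        self.read_from_new = False  # flipped at step 4

    def write(self, key, value):
        # Step 1: every write goes to both stores.
        self.old[key] = value
        self.new[key] = value

    def mismatched_keys(self):
        # Step 2: keys in the new store whose value differs from the old store.
        return sorted(k for k in self.new if self.old.get(k) != self.new[k])

    def backfill(self):
        # Step 3: copy historical records without overwriting dual-written ones.
        for key, value in self.old.items():
            self.new.setdefault(key, value)

    def read(self, key):
        # Reads come from the old store until step 4 switches them over.
        source = self.new if self.read_from_new else self.old
        return source[key]


mongo = {"share:1": {"url": "http://example.com/a"}}  # pre-existing data
riak = {}
store = DualWriteStore(mongo, riak)

store.write("share:2", {"url": "http://example.com/b"})  # step 1
assert store.mismatched_keys() == []                      # step 2
store.backfill()                                          # step 3
store.read_from_new = True                                # step 4
print(store.read("share:1"))
# Step 5: once reads are stable, the old store can be retired.
```

Reads stay on the old store until the new one holds both live and historical data, which is what keeps the database from ever going "offline" during the move.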

Page 19: Riak at shareaholic

Use Cases

Page 20: Riak at shareaholic

Share API

Save shared content

Uses MapReduce to populate user dashboard

Page 21: Riak at shareaholic

Recommendations

Sets of related pages

Generated on-demand

Page 22: Riak at shareaholic

Publisher Analytics

Generated nightly via Hadoop

Typical stored “document” (JSON)

80 KB to 1 MB

Page 23: Riak at shareaholic

Riak Successes

Page 25: Riak at shareaholic

Tunable CAP @ Shareaholic

Replication: primary/secondary authority

Read failure tolerance: speed/consistency

Write failure tolerance
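
The read and write tolerance trade-offs on this slide follow from the usual quorum arithmetic over N replicas, R read acknowledgements, and W write acknowledgements. The Python sketch below only illustrates that arithmetic; the helper names are hypothetical, the sample values are not Shareaholic's settings, and it ignores details such as sloppy quorums.

```python
def quorums_overlap(n, r, w):
    """True when every read quorum shares at least one replica with every write quorum."""
    return r + w > n


def tolerated_failures(n, quorum):
    """Replicas that can be unavailable while a quorum of this size is still reachable."""
    return n - quorum


# Illustrative settings only.
for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
    print(
        f"N={n} R={r} W={w}: overlap={quorums_overlap(n, r, w)}, "
        f"reads survive {tolerated_failures(n, r)} failures, "
        f"writes survive {tolerated_failures(n, w)} failures"
    )
```

Lower R or W answers faster and tolerates more failed replicas, while higher values make it more likely that a read sees the latest write.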

Page 26: Riak at shareaholic

Full Text Search

Built on Lucene

Make user content searchable

Make arbitrary keys queryable

“Just turn it on”

Hiccup: corrupt merge indexes

Page 27: Riak at shareaholic

Query Example

# Search the "links" bucket for a 60-second period, map each match to its user_id,
# then reduce to the minimum. Riak.reduceMin: [[2],[5],[9],[13]] => [[2]]
curl -XPOST http://localhost:8098/mapred \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": {
      "bucket": "links",
      "query": "timestamp:[1346350877 TO 1346350937}"
    },
    "query": [
      {"map": {"language": "javascript", "source": "function(riakObject) { return [[Riak.mapValuesJson(riakObject)[0].user_id]]; }"}},
      {"reduce": {"language": "javascript", "name": "Riak.reduceMin"}}
    ]
  }'

Who’s our oldest user who’s shared something in the last minute?

[[2197]]
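
To make the two phases of the query above easier to follow, here is a local Python sketch that mimics them on sample data. It does not talk to Riak; the records are made up (using the user IDs from the slide's own comment), and `map_user_id` and `reduce_min` are hypothetical stand-ins for the JavaScript map function and `Riak.reduceMin`.

```python
import json


def map_user_id(stored_value):
    # Mirrors the map phase: parse the stored JSON and emit [user_id].
    return [[json.loads(stored_value)["user_id"]]]


def reduce_min(values):
    # Mirrors the reduce phase as described on the slide: keep the smallest entry.
    return [min(values)] if values else []


# Made-up records standing in for the search results in the time window.
records = [
    '{"user_id": 9, "timestamp": 1346350880}',
    '{"user_id": 2, "timestamp": 1346350901}',
    '{"user_id": 13, "timestamp": 1346350910}',
    '{"user_id": 5, "timestamp": 1346350930}',
]

mapped = [item for record in records for item in map_user_id(record)]
result = reduce_min(mapped)
print(result)
```

Assuming user IDs are assigned in increasing order, the lowest ID belongs to the oldest account, which is why a minimum answers the "oldest user" question.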

Page 28: Riak at shareaholic

Riak on EC2

Page 29: Riak at shareaholic

In a Nutshell

EC2 specs poorly proportioned for leveldb

Multiple AZs in one location works well

Scale vertically for better latency & consistency

Scale horizontally for more throughput/$

Page 30: Riak at shareaholic

Benchmarks

Top Graph: c1.medium (1.7G, 5 CPU)

Middle: m1.large (7.5G, 4 CPU)

Bottom: cc1.4xlarge (23G, 33.5 CPU)

Page 31: Riak at shareaholic

Throughput

Page 32: Riak at shareaholic

Latency (Typical)

Page 33: Riak at shareaholic

Latency (Worst Case)

Page 34: Riak at shareaholic

Calculations

c1.medium (1.7G, 5 CPU)

1758 IOPS/$-hr

Worst 1% of queries: 300ms/800ms

m1.large (7.5G, 4 CPU)

1167 IOPS/$-hr

Worst 1% of queries: 110ms/200ms

cc1.4xlarge (23G, 33.5 CPU)

872 IOPS/$-hr

Worst 1% of queries: 47ms/139ms
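
The trade-off in these figures is easier to see as ratios against the largest instance. The Python sketch below uses only the numbers on this slide. The slide does not say what the two worst-case latency values represent, so they are kept as an unlabeled pair and only the first is compared; the dictionary layout is an assumption for illustration.

```python
# Figures copied from the slide; the meaning of the latency pair is not stated there.
benchmarks = {
    "c1.medium": {"iops_per_dollar_hour": 1758, "worst_1pct_ms": (300, 800)},
    "m1.large": {"iops_per_dollar_hour": 1167, "worst_1pct_ms": (110, 200)},
    "cc1.4xlarge": {"iops_per_dollar_hour": 872, "worst_1pct_ms": (47, 139)},
}

baseline = benchmarks["cc1.4xlarge"]

# Throughput per dollar-hour relative to the largest instance.
relative_value = {
    name: round(b["iops_per_dollar_hour"] / baseline["iops_per_dollar_hour"], 2)
    for name, b in benchmarks.items()
}

# First worst-case latency figure relative to the largest instance.
relative_latency = {
    name: round(b["worst_1pct_ms"][0] / baseline["worst_1pct_ms"][0], 2)
    for name, b in benchmarks.items()
}

for name in benchmarks:
    print(f"{name}: {relative_value[name]}x IOPS/$-hr, {relative_latency[name]}x worst-case latency")
```

Read this way, the smallest instance delivers roughly twice the throughput per dollar of the largest but at several times the tail latency, matching the deck's advice to scale horizontally for throughput per dollar and vertically for latency.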

Page 35: Riak at shareaholic

Benchmark Takeaways

You can’t go “by spec”

IO is limiting factor

RAM never limiting factor for 1% of keyspace to be in memory

Page 36: Riak at shareaholic

Fin. Questions?

Thanks:

Tom Santero

Justin Sheehy

Ryan Zezeski

Reid Draper

#freenode riak crew

We’re Hiring!

Robby Grossman

[email protected]

@freerobby

Page 37: Riak at shareaholic

Fin.