Upload
jared-rosoff
View
31.030
Download
2
Tags:
Embed Size (px)
DESCRIPTION
Slides from my talk at RubyConfChina 2010.
Citation preview
Scalable Event Analytics with Ruby on Rails & MongoDB
Ruby Conf China 2010Jared Rosoff (@forjared)
Yottaa!!!! (www.yottaa.com)
Overview
• Ruby at Scale
• What is Event Analytics?
• What are the different ways you could do it?
• How we did it
Ruby At Scale?http://www.flickr.com/photos/laughingsquid
Event Analytics
Event Analytics
Data Source
EventEvent
Event
Data Source
EventEvent
Event
Data Source
EventEvent
Event
Data Source
EventEvent
Event
UserQuery
Report
User
Query
Report
High Write Volume•Each new data source adds X requests per second•Data never stops arriving
Continuous Data Growth•We only add more data•Historical data is valuable
Flexible Data Exploration•Ad hoc queries •Complex aggregations
Oh and we are a startup
Our requirements:On Launch Day
# of data sources 15# of events per minute 80# GBs data stored 20
3 months later (projected)# of data sources 45# of events per minute 5600# GBs data stored 100
Rails default architecture
MySQL
Data Source Collection Server
User Reporting Server
Rails default architecture
MySQL
Data Source Collection Server
User Reporting Server
“Just” a Rails App
Rails default architecture
MySQL
Data Source Collection Server
User Reporting Server
“Just” a Rails App
Performance Bottleneck: Too much load
Let’s add replication!
MySQLMasterMySQL
MasterMySQLMaster
MySQLMaster
Replication
Data Source Collection Server
User Reporting Server
Let’s add replication!
MySQLMasterMySQL
MasterMySQLMaster
MySQLMaster
Replication
Data Source Collection Server
User Reporting Server
Off the shelf!Scalable Reads!
Let’s add replication!
MySQLMasterMySQL
MasterMySQLMaster
MySQLMaster
Replication
Data Source Collection Server
User Reporting Server
Off the shelf!Scalable Reads!
Performance Bottleneck: Still can’t scale
writes
What about sharding?
MySQLMasterMySQL
MasterMySQLMaster
Data Source Collection Server
User Reporting Server
Shar
ding
Shar
ding
What about sharding?
MySQLMasterMySQL
MasterMySQLMaster
Data Source Collection Server
User Reporting Server
Shar
ding
Shar
ding
Scalable Writes!
What about sharding?
MySQLMasterMySQL
MasterMySQLMaster
Data Source Collection Server
User Reporting Server
Shar
ding
Shar
ding
Scalable Writes!
Development Bottleneck:
Need to write custom code
Key Value stores to the rescue?
MySQLMasterMySQL
MasterCassandra
orVoldemort
Data Source Collection Server
User Reporting Server
Key Value stores to the rescue?
MySQLMasterMySQL
MasterCassandra
orVoldemort
Data Source Collection Server
User Reporting Server
Scalable Writes!
Key Value stores to the rescue?
MySQLMasterMySQL
MasterCassandra
orVoldemort
Data Source Collection Server
User Reporting Server
Scalable Writes!
Development Bottleneck:
Reporting is limited / hard
Can I Hadoop my way out of this?
MySQLMasterMySQL
MasterCassandra
orVoldemort
Data Source Collection Server
User Reporting Server
Hadoop
MySQLMasterMySQL
MasterMySQLSlave
MySQLMaster
Can I Hadoop my way out of this?
MySQLMasterMySQL
MasterCassandra
orVoldemort
Data Source Collection Server
User Reporting Server
Hadoop
MySQLMasterMySQL
MasterMySQLSlave
MySQLMaster
Scalable Writes!
Can I Hadoop my way out of this?
MySQLMasterMySQL
MasterCassandra
orVoldemort
Data Source Collection Server
User Reporting Server
Hadoop
MySQLMasterMySQL
MasterMySQLSlave
MySQLMaster
Scalable Writes!
Flexible Reports!
Can I Hadoop my way out of this?
MySQLMasterMySQL
MasterCassandra
orVoldemort
Data Source Collection Server
User Reporting Server
Hadoop
MySQLMasterMySQL
MasterMySQLSlave
MySQLMaster
Scalable Writes!
Flexible Reports!
“Just” a Rails App
Can I Hadoop my way out of this?
MySQLMasterMySQL
MasterCassandra
orVoldemort
Data Source Collection Server
User Reporting Server
Hadoop
MySQLMasterMySQL
MasterMySQLSlave
MySQLMaster
Scalable Writes!
Flexible Reports!
“Just” a Rails App
Development Bottleneck:
Too many systems!
MongoDB!
MySQLMasterMySQL
MasterMongoDB
Data Source Collection Server
User Reporting Server
MongoDB!
MySQLMasterMySQL
MasterMongoDB
Data Source Collection Server
User Reporting Server
Scalable Writes!
MongoDB!
MySQLMasterMySQL
MasterMongoDB
Data Source Collection Server
User Reporting Server
Scalable Writes!
Flexible Reporting!
MongoDB!
MySQLMasterMySQL
MasterMongoDB
Data Source Collection Server
User Reporting Server
Scalable Writes!“Just” a rails app
Flexible Reporting!
MongoD
MongoD
MongoD
Data Source
App Server
CollectionN
ginx
Pass
enge
r
Mon
gos
ReportingUser
LoadBalancer
MongoD
MongoD
MongoD
Data Source
App Server
CollectionN
ginx
Pass
enge
r
Mon
gos
ReportingUser
Sharding!
LoadBalancer
MongoD
MongoD
MongoD
Data Source
App Server
CollectionN
ginx
Pass
enge
r
Mon
gos
ReportingUser
Sharding!
High Concurrency
LoadBalancer
MongoD
MongoD
MongoD
Data Source
App Server
CollectionN
ginx
Pass
enge
r
Mon
gos
ReportingUser
Sharding!
High ConcurrencyScale-Out
LoadBalancer
MongoDB Sharding
MongoDB Sharding
Replica Sets let us scale storage &
transaction capacity for each shard
MongoDB Sharding
Replica Sets let us scale storage &
transaction capacity for each shard
Mongos routes transactions to shards based on “shard key”
MongoDB Sharding
Replica Sets let us scale storage &
transaction capacity for each shard
Mongos routes transactions to shards based on “shard key”
Config servers store information about which shards exist
Inserting
1 insert { ‘name’ : bob }
2Shard key == namebob Shard 2
3 Insert { ‘name’ : bob }
Querying
1 Query { ‘name’ : bob }
2Shard key == namebob Shard 2
3 Query { ‘name’ : bob }
Map Reduce
1 Map-reduce( … )
2
Map-reduce(…)
2 2 2
Working with Mongo
• MongoMapper makes it look like ActiveRecord
• Documents are more natural than rows in many cases
• Map-Reduce rocks (but needs better support in rails)
http://www.flickr.com/photos/elhamalawy/2526783078/
Ruby
Mongo
Runs over all the objects in the views table, counting how many times a page was viewed
Adds up all the counts for a unique url / date combination
Run the map reduce job and return a collection containing the results
Results• Version 1 of our analytics system took 2 weeks with 1
engineer – We have since added a lot more complexity, but we did it
incrementally
• We replaced MySQL entirely with MongoDB – No need for joins, transactions – Every table is now a document collection
• It’s fast! – 63ms – Average response time for sending data to server– 93ms – Average response time for displaying reports